A Voice Of Reason On Voice Over WiFi
Voice over WiFi is scary. Retries, packet errors (due to lots of Retries) and high latency (usually due to packet errors that happen because of lots of Retries) will murder a WiFi network's ability to handle Voice and leave your users screaming (not actually screaming) like they were cast in a horror movie (or, at the very least, annoyed like a character from Office Space). There's one thing, though, that sometimes scares people but really shouldn't: Voice Arbitration. It's not going to kill your WiFi voice calls. In fact, it will almost certainly help.
Arbitration is a process defined in the 802.11 standard. Every device (client/station and AP) goes through it.
The simplest way I can describe 802.11 Arbitration is like so:
If your AP or station has heard a quiet channel for 37 microseconds (0.000037 seconds), then your AP or station transmits a frame (what most people call a packet, but I call a frame).
If your AP or station has been hearing a busy channel for the past 37 microseconds, then it enters Arbitration. Arbitration is the process by which APs and stations go through a series of procedures in order to avoid devices sending at the same time (and, thus, causing a collision).
The big worry some people have is that Voice will mess up Arbitration. That it will make it MORE likely that collisions will happen. The Voice QoS category (as originally defined in the 802.11e amendment) reduces the amount of randomness in Arbitration. For ordinary traffic (called Best Effort in 802.11e), APs and stations choose a random value between 0 and 15 (on their first try; retries choose from exponentially larger random pools of values). Voice applications (at least, applications that get tagged as voice by the AP or station) choose values between 0 and 3. The problem comes when two devices (APs or stations or one of each) choose the same random value during Arbitration. If that happens, then both devices try to transmit at the same time and most likely a collision occurs. Two devices choosing the same number between 0 and 15 could happen, but maybe not all that often. Two devices choosing the same number between 0 and 3 would be far more likely.
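To put numbers on that worry, here is a minimal Python sketch. It assumes two devices are already in Arbitration at the same moment and that each picks its value uniformly and independently from the first-try ranges quoted above. The function name is my own, not anything from the standard.

```python
from fractions import Fraction


def match_probability(cw_min: int) -> Fraction:
    """Chance that two devices pick the same value from 0..cw_min.

    Assumes both picks are uniform and independent (first try only).
    """
    return Fraction(1, cw_min + 1)


best_effort = match_probability(15)  # values 0 through 15
voice = match_probability(3)         # values 0 through 3

print(f"Best Effort match chance: {float(best_effort):.2%}")  # 6.25%
print(f"Voice match chance:       {float(voice):.2%}")        # 25.00%
```

A match is four times as likely under Voice, but only for two devices that are contending at the same moment.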
The flaw in this thinking is that devices have to be in Arbitration at the EXACT same moment for the 802.11e Voice category to become a problem. Most likely, they won't be.
Even though voice applications seem like they're always running, they're really not. Each voice packet only uses about 300 microseconds* (assuming a 39 Mbps data rate -- because smartphones don't support MIMO -- and the use of RTS/CTS) or 150 microseconds (if RTS/CTS is not used) of channel time. And a modern voice application is going to have at most about 90 packets go over the wireless channel per second (combined send and receive). That's not a whole lot of channel time. If we take the example of an iPhone on a typical WLAN -- where RTS/CTS is used when the iPhone sends, but not used when the AP sends -- that's only 20,250 total microseconds being used. 20,250 microseconds is 20.25 ms. Or 0.02025 seconds. In other words, each phone is only using about 2% of the available channel time when a call is active.
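The arithmetic behind that 2% can be checked with a short Python sketch. The per-packet times and the 90 packets per second come from the paragraph above. The even 45/45 split between phone and AP is my assumption, chosen because it is the split that reproduces the 20,250 figure.

```python
# Per-packet channel time in microseconds, as quoted above.
WITH_RTS_CTS_US = 300     # phone sends, RTS/CTS used
WITHOUT_RTS_CTS_US = 150  # AP sends, no RTS/CTS

PACKETS_PER_SECOND = 90   # combined send and receive

# Assumption: half the packets come from the phone, half from the AP.
phone_packets = PACKETS_PER_SECOND // 2
ap_packets = PACKETS_PER_SECOND - phone_packets

busy_us = phone_packets * WITH_RTS_CTS_US + ap_packets * WITHOUT_RTS_CTS_US
channel_share = busy_us / 1_000_000  # one second is 1,000,000 microseconds

print(f"Busy time per second: {busy_us} microseconds")  # 20250
print(f"Share of channel time: {channel_share:.1%}")
```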
If all of these numbers are too much, here is a visualization of how often a typical voice app is actually using the wireless channel:
The red and blue mean that the channel is occupied, while the empty boxes mean that the channel is unused**.
Look at all of that open space. And remember that open space is not the only way that collisions are avoided. Collisions are avoided if:
-A phone becomes ready to send data while the channel is clear.
-A phone becomes ready to send data while another phone or AP is already sending data (assuming the devices can hear each other's transmissions).
The only way a collision can occur due to an Arbitration match is if two or more devices begin attempting to send data at the exact same time. For applications that use as little channel time as voice, that's just not very likely to happen.
I have found the 802.11e Voice category to not cause problems in high density/congested environments. In fact, I've found it to help. When voice applications use the 802.11e/WMM Best Effort category instead of the Voice category, an extra 63 microseconds*** gets added to the packet transmission process. That's about a 20% jump in used channel time for devices that do use RTS/CTS and a whopping 40% jump for devices that don't. That means more channel time per packet. Using more channel time per packet is an actual problem, because it leads to the real reason that collisions tend to happen. Collisions most often occur because one device is in the middle of sending and another device can't hear it. More channel time being used by each packet is most likely going to exacerbate the problem of collisions, not minimize it.
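Here is a rough Python sketch of where the 63 microseconds and the percentage jumps come from. It uses the SIFS, slot time and AIFS values from the footnotes. The AIFSN values (2 for Voice, 3 for Best Effort) are the usual WMM defaults and are my assumption, as is reading the six extra slot times as the difference in average backoff between the two categories. The function names are hypothetical.

```python
SIFS_US = 10  # microseconds, from the footnotes
SLOT_US = 9   # microseconds per slot time

# Assumed WMM defaults; they reproduce the 28 and 37 microsecond AIFS values.
VOICE_AIFSN, BEST_EFFORT_AIFSN = 2, 3
# First-try random ranges from the post: 0-3 for Voice, 0-15 for Best Effort.
VOICE_CW_MIN, BEST_EFFORT_CW_MIN = 3, 15


def aifs_us(aifsn: int) -> int:
    """AIFS is one SIFS plus AIFSN slot times."""
    return SIFS_US + aifsn * SLOT_US


def mean_backoff_slots(cw_min: int) -> float:
    """Average of a uniform pick from 0..cw_min."""
    return cw_min / 2


extra_aifs_us = aifs_us(BEST_EFFORT_AIFSN) - aifs_us(VOICE_AIFSN)
extra_slots = mean_backoff_slots(BEST_EFFORT_CW_MIN) - mean_backoff_slots(VOICE_CW_MIN)
extra_us = extra_aifs_us + extra_slots * SLOT_US

print(f"Extra time as Best Effort: {extra_us:.0f} microseconds")
for label, base_us in (("with RTS/CTS", 300), ("without RTS/CTS", 150)):
    print(f"Jump {label}: {extra_us / base_us:.0%}")
```

The exact ratios come out to 21% and 42%, which is where the "about 20%" and "40%" figures come from.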
***
*All time calculations include 28 microseconds for a Voice AIFS, 18 microseconds for the Random Backoff (that's two slot times), 10 microseconds per SIFS, 50 microseconds for non-HT physical layer headers (used by RTS, CTS and Block Ack frames) and 40 microseconds for HT physical layer headers (used by Data frames).
**That little graph has 10,000 boxes, meaning that each box represents 100 microseconds of one second. The red dots are a visualization of how much channel time is being used when the iPhone sends (300 microseconds, including RTS/CTS) and the blue dots are a visualization of the AP sending (since 150 microseconds would mean 1.5 boxes, I just represented half of the AP's packets with two boxes and half with one box).
***63 microseconds is calculated by adding an extra 9 microseconds to the AIFS (the Best Effort AIFS is 37 microseconds) and by adding six extra 9 microsecond slot times for the Random Backoff.
***
If you like my blog, you can support it by shopping through my Amazon link or donating Bitcoin to 1N8m1o9phSkFXpa9VUrMVHx4LJWfratseU
ben at sniffwifi dot com
Twitter: @Ben_SniffWiFi
Great post Ben. I wrote a guide to deploying WiFi for voice a little while ago. It's a bit dated and doesn't go as technically deep as you do but your readers might find it interesting. I've more recently reposted it on LinkedIn Pulse https://www.linkedin.com/pulse/voip-over-wireless-networking-some-things-you-need-know-wolff?trk=mp-author-card
Michael