Thursday, December 15, 2011

Why wireless (802.11) roaming is a nightmare (and why CCX can help)

Part 2: Why CCX can help

In part 1, we saw the three main generations of roaming algorithms implemented in 802.11 wireless devices to handle roaming, and saw the limitations related to the fact that the wireless client has little or no visibility on the neighbouring environment. Because of this blindness, the wireless laptop is compelled to make roaming decisions built on "educated guesses" about the changes in its RF environment. In this second part, we will see how the Cisco Compatible Extensions can help the wireless device improve roaming efficiency.
This second part is not intended to be general praise for "Great Cisco", but to show how a dynamic interaction between the wireless infrastructure and the wireless client can improve roaming. It is interesting to notice that the 802.11 standard did not implement such a dynamic interaction, and that it takes proprietary implementations to solve the "client blindness" issue.

- Tell me what you see: CCX message S36
The first issue you need to solve for efficient roaming is knowledge about the environment. If the wireless client knows about the neighbouring APs, roaming becomes a walk in the park! Yes, but we said in part 1 that the core problem was to collect this information without having to spend precious milliseconds and battery time looking for it. In other words, you do not want each client to have to scan all the time looking for potential neighbours, because this decreases the efficiency of the client in the current cell (time wasted away from the cell frequency), consumes a lot of energy (strain on the client battery), and is inefficient (who knows if the client is going to detect the best APs, or just the first ones to reply).
A nice way to alleviate this problem is to centralize this information, for example on the AP. With CCX message S36 (from the number S3x, you can recognize that this was introduced in CCX version 2), an AP can ask a client to report about its environment. The process is as follows:
  1. The AP sends an S36 Radio Measurement request (S36 RRM request) to the client, basically saying "hey, report what you know of your environment".
  2. The client has several ways to react. 
  • The client can ignore the AP request ("leave me alone, I am busy, or I do not know anything about my environment, or I am not CCX anyway!").
  • The client can reply immediately with the information it knows.
  • The client can explore its environment and come back with answers.
Most clients are set to explore their environment first and then come back with more detailed information. What is nice about S36 is that the AP can specify the target channel (tell what you know about APs on channel 11), to limit the information that the client needs to collect.
For this exploration, the client can use one of three modes:
  • Active scanning: in this mode, the client jumps to the requested channel(s), and sends broadcast probe requests. Each request contains a CCX S36 proprietary Information Element. 
  • Passive scanning: in this mode, the client jumps to the requested channel(s), and passively listens for beacons.
  • Beacon table: in this mode, the client simply returns to the requesting AP the information about the channel collected previously. In other words, the client does not jump to scan, but directly replies with the information that the client has about APs on the requested channel(s). 
These 3 modes are quite flexible: depending on your client battery conservation policy, any of these 3 modes can be used.
Where it gets smart is that the AP can ask the client to collect specific elements of information. In the S36 Radio Measurement request, the AP can specify what element is to be reported:
  • Channel load: check the channel for a specified duration and tell me how loaded this channel is.
  • Noise histogram: check noise on the channel, make a certain number of measurements, and report them to me.
  • Beacon: probe and collect the beacons you hear. Report them to me.
  • Frame: listen on the target channel and report what frames you hear.
The Radio measurement request can also specify if the client should perform all measurements then report, or if the client should report after each measurement. This is useful when more than one channel is to be explored, or if more than one measurement is expected (for example, the client may report each beacon heard, or collect all beacons on a given channel and report them all at one time).
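
To make these options a bit more concrete, here is a minimal Python sketch of what a request carries and how the two reporting styles differ. All names (`MeasurementRequest`, `run_requests`, `report_after_each`, `fake_measure`) are hypothetical and chosen for illustration only; this is not the actual CCX frame layout, and `fake_measure` merely stands in for the real radio work.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable, Iterator, List


class ScanMode(Enum):
    ACTIVE = "active"              # jump to the channel and send probe requests
    PASSIVE = "passive"            # jump to the channel and listen for beacons
    BEACON_TABLE = "beacon_table"  # answer from what the client already knows


class MeasurementType(Enum):
    CHANNEL_LOAD = "channel_load"
    NOISE_HISTOGRAM = "noise_histogram"
    BEACON = "beacon"
    FRAME = "frame"


@dataclass
class MeasurementRequest:
    # Hypothetical fields; the real S36 element is not reproduced here.
    channel: int
    mode: ScanMode
    kind: MeasurementType
    duration_ms: int


def run_requests(
    requests: Iterable[MeasurementRequest],
    measure: Callable[[MeasurementRequest], str],
    report_after_each: bool,
) -> Iterator[List[str]]:
    """Yield one list per report sent back to the AP."""
    pending: List[str] = []
    for req in requests:
        result = measure(req)
        if report_after_each:
            yield [result]          # report right after this measurement
        else:
            pending.append(result)  # hold until everything is measured
    if pending:
        yield pending               # single report with all results


def fake_measure(req: MeasurementRequest) -> str:
    # Placeholder for the actual measurement on the radio.
    return f"{req.kind.value}@ch{req.channel}"


requests = [
    MeasurementRequest(1, ScanMode.PASSIVE, MeasurementType.BEACON, 100),
    MeasurementRequest(11, ScanMode.PASSIVE, MeasurementType.CHANNEL_LOAD, 50),
]
batched = list(run_requests(requests, fake_measure, report_after_each=False))
streamed = list(run_requests(requests, fake_measure, report_after_each=True))
```

With `report_after_each=False` the client sends one report holding both results; with `True` it sends two reports, one per measurement.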

The client will then report each of the elements requested in the Radio Measurement report (S36 RRM Report):
  • Channel load: this is a simple measurement, the client reports the measurement duration and the percentage of time the network was busy (someone was sending) on the tested channel.
  • Noise histogram: this is also quite simple, the client reports the measurement duration, then a certain number of noise values collected over the measurement duration. These noise values are always taken when the network is idle (i.e. no 802.11 frame is detected, and what is reported is the real background noise)
  • Beacon: this is a more complete list of elements. The report contains of course the measurement duration, then for each received beacon the received signal power, signal type (OFDM, ERP, DSSS, HT, etc), the beacon interval and main components (such as BSSID, supported data rates, etc.). This is an important element: with this report, the AP knows all the neighbouring WLANs detected by the client.
  • Frame: this report contains of course the measurement duration, then the number of frames detected, with the BSSID and the received power at which the frames were heard. With this piece of information, even if the client did not hear the AP itself on the tested channel, the requesting AP knows which clients of which SSID are communicating on the tested channel.
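
The two simplest reports in the list above boil down to a bit of arithmetic and filtering. The sketch below is only an illustration under stated assumptions: the names (`ChannelLoadReport`, `channel_load`, `idle_noise_samples`) and the microsecond units are hypothetical, not taken from the CCX specification.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ChannelLoadReport:
    duration_us: int      # how long the client measured
    busy_percent: float   # share of that time the channel was busy


def channel_load(duration_us: int, busy_us: int) -> ChannelLoadReport:
    """Channel load = percentage of the measurement time the medium was busy."""
    if duration_us <= 0:
        raise ValueError("measurement duration must be positive")
    return ChannelLoadReport(duration_us, 100.0 * busy_us / duration_us)


def idle_noise_samples(samples: List[Tuple[float, bool]]) -> List[float]:
    """Keep only the noise readings taken while no 802.11 frame was detected.

    Each sample is (power_dbm, medium_busy); the pair layout is an assumption.
    """
    return [dbm for dbm, medium_busy in samples if not medium_busy]


report = channel_load(duration_us=50_000, busy_us=12_500)
noise = idle_noise_samples([(-95.0, False), (-60.0, True), (-93.0, False)])
```

Here the channel was busy for a quarter of the measurement, and the reading taken while a frame was on the air is dropped from the noise values.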

Okay, but how does that help? You might think that the client does not need the AP S36 message if all that fuss is just to go scan, something that the client could do on its own. You would be right if CCX only had S36, but this is combined with some other CCX messages, namely S51 and S68.
So far, CCX does a nice job at sharing the information: when a client scans other channels and learns about other APs and BSSIDs, S36 allows this information to be shared with the AP. Nice, because the AP is at the center of the cell, and is heard by all other clients.

- This is how you roam, S51
The CCX S51 message was introduced in CCXv4, as you may be able to guess from the S5x structure. CCXv2 already had an ancestor to S51, called S32. S32 was updated with S51, so we will ignore the old S32 and describe S51. This S51 message exchange is all about roaming. What is a bit complex about this exchange is that it can occur at different points in time, and contains several possible sub-components:

The first and probably most important element is the RF Parameter Element. The AP can send a S51 message to the client, containing the following "CCXv4 S51 RF parameter elements":
  • Minimum RSSI needed: the AP tells the client "this is the minimum RSSI you need to get if you want to associate to my cell".
  • AP Tx Power: the AP communicates its transmit power (in dBm).
  • Scanning threshold: the AP tells the client "if you get my signal below this RSSI level, start scanning for another AP".
  • Hysteresis: the AP tells the client "if while scanning you hear a neighbouring AP at a signal that many dB above mine, jump to that AP".
  • Transition time: the AP tells the client "when you roam, you have that number of seconds to get to another AP. After that, I'll disconnect you".
S51 contains other elements, which will come into play when we look at S51 in more detail. So how is S51 used? And how does S36 work with S51?

First of all, a client is turned on, scans and discovers an AP with a probe request. The AP probe response can contain a S51 Channel Load Information element. This tells the client how busy the AP is (how many clients taking what percentage of the AP resources). This is a CCX S51 message, but is in fact an ancestor to the QBSS Load IE (QoS Basic Service Set Load Information Element) defined by 802.11e. If your station and AP are WMM, the station will receive the same information from the 802.11e/WMM QoS section of the beacon, without the need for a proprietary CCX S51 element in the beacon.

Just after the client's successful association to the cell, the AP sends a CCX S51 Neighbour List Update message to the client. This is a unicast unsolicited message. In other words, the AP sends the message to the client without being asked, as soon as the CCXv4 client is associated. This S51 Neighbour List Update contains the list of the 6 best neighbouring APs. For each neighbour, the client learns the channel, the AP Minimum acceptable received signal level, the last known AP TX power, the AP beacon interval, and roaming parameters such as roaming Hysteresis, adaptive Scan Threshold and transition time.
In a logic where cells are next to each other, the 6 best neighbors for the AP give the client the best chances to find a better AP in case of roaming.

The AP collected this information thanks to the S36 messages reported by other clients. A client entering the cell is also a good candidate for a S36 question. As the client is entering the cell, it may have just been turned on, or it may be coming from another cell. So the AP can ask "send me the list of neighbours you know, just your beacon table, no need to scan". This is not battery-expensive for the client (no scanning required), and allows the AP to be informed about neighboring APs.
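
Here is a rough Python sketch of how an AP could turn those client reports into a neighbour list. It assumes that "best" means the strongest RSSI reported by any client, which is my guess and not something CCX mandates; the names `BeaconEntry` and `build_neighbour_list` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

MAX_NEIGHBOURS = 6  # the neighbour list update carries the 6 best neighbours


@dataclass(frozen=True)
class BeaconEntry:
    # One neighbour as seen by one client (hypothetical, simplified fields).
    bssid: str
    channel: int
    rssi_dbm: int


def build_neighbour_list(
    own_bssid: str, reports: Iterable[BeaconEntry]
) -> List[BeaconEntry]:
    """Merge beacon entries from many clients and keep the strongest ones."""
    best: Dict[str, BeaconEntry] = {}
    for entry in reports:
        if entry.bssid == own_bssid:
            continue  # the AP is not its own neighbour
        current = best.get(entry.bssid)
        # Assumption: keep the strongest sighting of each neighbour.
        if current is None or entry.rssi_dbm > current.rssi_dbm:
            best[entry.bssid] = entry
    ranked = sorted(best.values(), key=lambda e: e.rssi_dbm, reverse=True)
    return ranked[:MAX_NEIGHBOURS]


reports = [
    BeaconEntry("aa", 1, -70),
    BeaconEntry("bb", 6, -60),
    BeaconEntry("aa", 1, -55),   # a second client hears "aa" much better
    BeaconEntry("me", 11, -30),  # the requesting AP itself, to be ignored
]
neighbours = build_neighbour_list("me", reports)
```

A real AP would probably weigh more than raw RSSI (age of the report, number of clients that saw the neighbour, and so on), but the principle of merging what many clients see stays the same.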

With this information about neighbors and the information about the current client power level, the AP can then calculate the overlap between cells. The AP can then communicate its own RF parameter elements to the client, including the Scanning Threshold (also called Adaptive Scan Threshold).
With this piece of information, the AP tells the client:
  • As long as my RSSI is above this value, you do not need to scan. You are in my inner/good coverage area, so conserve your battery. This is great. In practical tests, you can see the client get a 30 to 35% longer battery time when CCX is turned on, just thanks to this feature.
  • When my RSSI gets below this value, start scanning. Here are the possible neighbors. This way, the client only starts scanning when needed, and scans in priority the useful channels, here again saving on battery.
  • When you roam, only jump to another AP if its RSSI is that much better than mine ("that much" is given in the Roaming Hysteresis value). This is very useful for stability, and prevents a client from jumping back and forth between 2 APs of about the same RSSI.
  • As a side note, the AP can also use the S31 (CCXv2 and later, as you may have guessed) AP Controlled Client Transmit Power. If the client is close to the AP, the AP can tell the client: "no need to be that loud, I can hear you even if you are quieter", thus allowing the client to reduce its power level, and here again save battery power. The AP can also ask the client to increase its power level if the client goes out of range of the AP.
  • This S51/S36 message system can be very helpful with sticky clients. You may remember from part 1 the RSA algorithm (sticky clients that do not roam when they are far away, even if they are close to another, better AP). If the client signal gets too weak, the AP can send an S36 (give me the list of APs you hear), and if the client reports a better AP, the current AP can send a S51 Direct Roam message. This message basically says "Roam to AP X on channel Y, I know it's there and you can reach it". This way, the AP can dynamically compensate for a weak roaming algorithm.
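
The scan threshold and hysteresis rules from the list above can be sketched on the client side as follows. This is a simplified illustration, not the behaviour of any real driver: the names (`RfParameters`, `should_scan`, `pick_roam_target`) are hypothetical, and the handling of the exact boundary values is my assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RfParameters:
    # Two of the values the AP hands to the client in S51 (simplified).
    scan_threshold_dbm: int  # below this RSSI, start scanning
    hysteresis_db: int       # a neighbour must be this much stronger


def should_scan(current_rssi_dbm: int, params: RfParameters) -> bool:
    """Scan only once the current AP drops below the scan threshold."""
    return current_rssi_dbm < params.scan_threshold_dbm


def pick_roam_target(
    current_rssi_dbm: int,
    neighbours: Dict[str, int],  # BSSID -> RSSI heard while scanning
    params: RfParameters,
) -> Optional[str]:
    """Return the BSSID to roam to, or None to stay on the current AP."""
    if not should_scan(current_rssi_dbm, params):
        return None  # still in the good coverage area: save the battery
    needed = current_rssi_dbm + params.hysteresis_db
    candidates = {b: r for b, r in neighbours.items() if r >= needed}
    if not candidates:
        return None  # nobody is better by the hysteresis margin: stay put
    return max(candidates, key=lambda b: candidates[b])


params = RfParameters(scan_threshold_dbm=-72, hysteresis_db=5)
stay_good_signal = pick_roam_target(-65, {"ap2": -50}, params)
roam = pick_roam_target(-75, {"ap2": -72, "ap3": -68}, params)
stay_no_margin = pick_roam_target(-75, {"ap2": -73}, params)
```

In the first call the client does not even scan, although a louder AP exists. In the second, only `ap3` clears the 5 dB margin. In the third, the neighbour is a little better but not by enough, which is exactly what keeps the client from bouncing between two APs of about the same RSSI.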
CCX has other features to facilitate roaming. For example, CCXv5 introduces the S68 message (S6x, logically CCXv5), the gratuitous probe response. So that a client running a passive scan does not have to stay close to 100 ms on the scanned channel, the APs can send broadcast probe responses between beacons, thus speeding up the discovery process.
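
A quick back-of-the-envelope calculation shows why this helps. The sketch below assumes the extra probe responses are evenly spaced between two beacons, which is my simplification and not something stated by CCX; the function name is hypothetical.

```python
def worst_case_wait_ms(beacon_interval_ms: float, extra_probe_responses: int) -> float:
    """Longest time a passive scanner waits before hearing the AP.

    Assumes the gratuitous probe responses split the beacon interval
    into equal slices (an illustrative simplification).
    """
    if extra_probe_responses < 0:
        raise ValueError("number of probe responses cannot be negative")
    return beacon_interval_ms / (extra_probe_responses + 1)


beacons_only = worst_case_wait_ms(100.0, 0)
with_three_extra = worst_case_wait_ms(100.0, 3)
```

With beacons only, the client may have to wait a full 100 ms interval on the channel; with three evenly spaced gratuitous probe responses, that worst case drops to 25 ms per channel under this assumption.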
The good thing about CCX is that most silicon vendors for client devices implement CCX (check here for the list). Cisco gives it away to anyone who wants to implement it into their clients (they still have to pay for the certification)... but only Cisco access points are CCX on the infrastructure side.
The complexity of CCX lies in the fact that it has many features, and that certifying for a new generation of CCX takes quite some development work (and certification costs). For this reason, not all vendors implement the newest and latest CCX, and you may find that some features are not available in this or that client. Sometimes, a simple firmware upgrade is enough to get a newer CCX, but sometimes the vendor implements the newer CCX only in the newer clients and you may be stuck with, for example, a CCX v3 client, doing S31 and S36, but no S51 or S68... Some vendors tell you which CCX they run in the properties window of their adapters, some others don't. But if you associate that client to a Cisco AP, the AP will always tell you which CCX version the client is running. You may then have to spend hours trying to figure out if you can get a newer CCX for this adapter or not...