Thursday, December 15, 2011

Why wireless (802.11) roaming is a nightmare (and why CCX can help)

Part 2: Why CCX can help

In part 1, we saw the three main generations of roaming algorithms implemented in 802.11 wireless devices, and their limitations, which stem from the fact that the wireless client has little or no visibility into its neighbouring environment. Because of this blindness, the wireless laptop is forced to make roaming decisions based on "educated guesses" about changes in its RF environment. In this second part, we will see how the Cisco Compatible Extensions (CCX) can help the wireless device roam more efficiently.
This second part is not intended as general praise of "Great Cisco", but as a demonstration of how a dynamic interaction between the wireless infrastructure and the wireless client can improve roaming. It is interesting to note that the 802.11 standard did not implement such a dynamic interaction, and that it took proprietary implementations to solve the "client blindness" issue.

- Tell me what you see: CCX message S36
The first issue you need to solve for efficient roaming is knowledge about the environment. If the wireless client knows about the neighbouring APs, roaming becomes a walk in the park! Yes, but we said in part 1 that the core problem was to collect this information without spending precious milliseconds and battery time looking for it. In other words, you do not want each client to scan all the time looking for potential neighbours, because this reduces the client's efficiency in the current cell (time spent away from the cell frequency), consumes a lot of energy (strain on the client battery), and is unreliable (who knows whether the client will detect the best APs, or just the first ones to reply).
A nice way to alleviate this problem is to centralize this information, for example on the AP. With CCX message S36 (from the number S3x, you can tell that it was introduced in CCX version 2), an AP can ask a client to report on its environment. The process is as follows:
  1. The AP sends an S36 Radio Measurement request (S36 RRM request) to the client, basically saying "hey, report what you know of your environment".
  2. The client has several ways to react:
  • The client can ignore the AP request ("leave me alone, I am busy, or I do not know anything about my environment, or I am not CCX anyway!").
  • The client can reply immediately with the information it already knows.
  • The client can explore its environment and come back with answers.
Most clients are set to explore their environment first and then come back with more detailed information. What is nice about S36 is that the AP can specify the target channel ("tell me what you know about APs on channel 11"), to limit the information that the client needs to collect.
For this exploration, the client can use one of three modes:
  • Active scanning: in this mode, the client jumps to the requested channel(s), and sends broadcast probe requests. Each request contains a CCX S36 proprietary Information Element. 
  • Passive scanning: in this mode, the client jumps to the requested channel(s), and passively listens for beacons.
  • Beacon table: in this mode, the client simply returns to the requesting AP the information about the channel collected previously. In other words, the client does not jump to scan, but directly replies with the information that the client has about APs on the requested channel(s). 
These 3 modes are quite flexible: depending on your client battery conservation policy, any of these 3 modes can be used.
Where it gets smart is that the AP can ask the client to collect specific elements of information. In the S36 Radio Measurement request, the AP can specify which elements are to be reported:
  • Channel load: check the channel for a specified duration and tell me how loaded this channel is.
  • Noise histogram: check noise on the channel, make a certain number of measurements, and report them to me.
  • Beacon: probe and collect the beacons you hear. Report them to me.
  • Frame: listen on the target channel and report what frames you hear.
The Radio Measurement request can also specify whether the client should perform all measurements and then report, or report after each measurement. This is useful when more than one channel is to be explored, or when more than one measurement is expected (for example, the client may report each beacon heard, or collect all beacons on a given channel and report them all at once).
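To make the structure of an S36 request more concrete, here is a minimal sketch that models the options described above: target channels, the four measurement types, the three scan modes, and the "report after each measurement" choice. The class and field names are hypothetical illustrations, not the CCX wire format.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class ScanMode(Enum):
    ACTIVE = "active"              # jump to the channel and send probe requests
    PASSIVE = "passive"            # jump to the channel and listen for beacons
    BEACON_TABLE = "beacon_table"  # no channel change: answer from cached data


class MeasurementType(Enum):
    CHANNEL_LOAD = "channel_load"
    NOISE_HISTOGRAM = "noise_histogram"
    BEACON = "beacon"
    FRAME = "frame"


@dataclass
class RadioMeasurementRequest:
    channels: list[int]
    measurements: list[MeasurementType]
    scan_mode: ScanMode = ScanMode.BEACON_TABLE
    report_after_each: bool = False  # False: measure everything, then report once


def plan_reports(req: RadioMeasurementRequest) -> list[list[tuple[int, MeasurementType]]]:
    """Group the (channel, measurement) tasks into the reports the client sends."""
    tasks = [(ch, m) for ch in req.channels for m in req.measurements]
    if req.report_after_each:
        return [[task] for task in tasks]  # one report per measurement
    return [tasks] if tasks else []        # a single report carrying everything
```

The scan mode only changes how the client gathers the data (and how much battery it spends), not how the results are grouped into reports, which is why `plan_reports` ignores it.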


The client will then report each of the elements requested in the Radio Measurement report (S36 RRM Report):
  • Channel load: this is a simple measurement, the client reports the measurement duration and the percentage of time the network was busy (someone was sending) on the tested channel.
  • Noise histogram: this is also quite simple, the client reports the measurement duration, then a certain number of noise values collected over the measurement duration. These noise values are always taken when the network is idle (i.e. no 802.11 frame is detected), so what is reported is the real background noise.
  • Beacon: this is a more complete list of elements. The report contains, of course, the measurement duration, then for each received beacon the received signal power, the signal type (OFDM, ERP, DSSS, HT, etc.), the beacon interval and the main components (such as BSSID, supported data rates, etc.). This is an important element: with this report, the AP knows all the neighbouring WLANs detected by the client.
  • Frame: this report contains, of course, the measurement duration, then the number of frames detected, with the BSSID and the received power at which the frames were heard. With this piece of information, even if the client did not hear the AP itself on the tested channel, the AP learns which BSSIDs have stations communicating on that channel.
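The channel load and noise histogram values are easy to picture if you assume that the client samples the medium at regular instants during the measurement duration. The sketch below works under that assumption (the `Sample` type and the bin edges are hypothetical): channel load is the share of busy samples, and the histogram only counts idle samples, so that it reflects background noise rather than 802.11 traffic.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Sample:
    busy: bool        # True if an 802.11 transmission was detected at this instant
    level_dbm: float  # energy measured on the channel at this instant


def channel_load_percent(samples: list[Sample]) -> float:
    """Percentage of samples during which the medium was busy."""
    if not samples:
        return 0.0
    return 100.0 * sum(s.busy for s in samples) / len(samples)


def noise_histogram(samples: list[Sample], edges: list[float]) -> list[int]:
    """Count idle-time readings into bins delimited by ascending `edges`.

    Busy samples are skipped so only background noise is reported.
    Bin i holds readings with edges[i] <= level < edges[i + 1].
    """
    counts = [0] * (len(edges) - 1)
    for s in samples:
        if s.busy:
            continue  # someone was transmitting: not background noise
        for i in range(len(edges) - 1):
            if edges[i] <= s.level_dbm < edges[i + 1]:
                counts[i] += 1
                break
    return counts
```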

Okay, but how does that help? You might think that the client does not need the AP's S36 message if all that fuss is just to go scan, something the client could do on its own. You would be right if CCX only had S36, but S36 is combined with other CCX messages, namely S51 and S68.
So far, CCX does a nice job at sharing the information: when a client scans other channels and learns about other APs and BSSIDs, S36 allows this information to be shared with the AP. Nice, because the AP is at the center of the cell, and is heard by all other clients.

- This is how you roam, S51
The CCX S51 message was introduced in CCXv4, as you may be able to guess from the S5x numbering. CCXv2 already had an ancestor of S51, called S32. Since S51 supersedes S32, we will ignore the old S32 and describe S51. The S51 message exchange is all about roaming. What makes this exchange a bit complex is that it can occur at different points in time, and contains several possible sub-components:

The first and probably most important element is the RF Parameter Element. The AP can send an S51 message to the client, containing the following "CCXv4 S51 RF parameter elements":
  • Minimum RSSI needed: the AP tells the client "this is the minimum RSSI you need to get if you want to associate to my cell".
  • AP Tx Power: the AP communicates its transmit power (in dBm).
  • Scanning threshold: the AP tells the client "if you get my signal below this RSSI level, start scanning for another AP".
  • Hysteresis: the AP tells the client "if while scanning you hear a neighbouring AP at a signal that many dB above mine, jump to that AP".
  • Transition time: the AP tells the client "when you roam, you have that number of seconds to get to another AP. After that, I'll disconnect you."
S51 contains other elements, which will come into play when we look at S51 in more detail. So how is S51 used? And how does S36 work with S51?


First of all, a client is turned on, scans, and discovers an AP with a probe request. The AP probe response can contain an S51 Channel Load Information element. This tells the client how busy the AP is (how many clients, taking what percentage of the AP resources). This is a CCX S51 message, but it is in fact an ancestor of the QBSS Load IE (QoS Basic Service Set Load Information Element) defined by 802.11e. If both your station and your AP support WMM, the station will receive the same information from the 802.11e QBSS Load element in the beacon, without the need for a proprietary CCX S51 element.

Just after the client successfully associates to the cell, the AP sends a CCX S51 Neighbour List Update message to the client. This is an unsolicited unicast message: the AP sends it to the client without being asked, as soon as the CCXv4 client is associated. This S51 Neighbour List Update contains the 6 best neighbouring APs. For each neighbour, the client learns the channel, the AP minimum acceptable received signal level, the last known AP Tx power, the AP beacon interval, and roaming parameters such as roaming hysteresis, adaptive scan threshold and transition time.
Because cells are laid out next to each other, the AP's 6 best neighbours give the client the best chance of finding a better AP when it needs to roam.


The AP collects this information thanks to the S36 messages reported by other clients. A client entering the cell is also a good candidate for an S36 question: it may have just been turned on, or it may be coming from another cell. So the AP can ask "send me the list of neighbours you know, just your beacon table, no need to scan". This is not battery-expensive for the client (no scanning required), and it keeps the AP informed about neighbouring APs.
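How an AP turns many clients' beacon reports into a 6-entry neighbour list is not spelled out here, so the following is only a plausible sketch: for each BSSID it keeps the strongest RSSI any client reported, then ranks neighbours by that value. The ranking rule, `BeaconEntry` and `build_neighbour_list` are assumptions for illustration, not Cisco's actual algorithm.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BeaconEntry:
    bssid: str
    channel: int
    rssi_dbm: int


def build_neighbour_list(reports: list[list[BeaconEntry]], own_bssid: str,
                         size: int = 6) -> list[BeaconEntry]:
    """Merge beacon reports from many clients and keep the `size` best neighbours.

    Ranking by the strongest RSSI ever reported is an illustrative choice.
    """
    best: dict[str, BeaconEntry] = {}
    for report in reports:
        for entry in report:
            if entry.bssid == own_bssid:
                continue  # the AP does not list itself as a neighbour
            known = best.get(entry.bssid)
            if known is None or entry.rssi_dbm > known.rssi_dbm:
                best[entry.bssid] = entry
    ranked = sorted(best.values(), key=lambda e: e.rssi_dbm, reverse=True)
    return ranked[:size]
```

A real implementation would also need to age out stale entries and weigh where each reporting client was; this sketch ignores both.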

With this information about neighbours and about the client's current power level, the AP can calculate the overlap between cells. The AP can then communicate its own RF parameter elements to the client, including the Scanning Threshold (also called Adaptive Scan Threshold).
With this piece of information, the AP tells the client:
  • As long as my RSSI is above this value, you do not need to scan. You are in my inner/good coverage area, so conserve your battery. This is great: in practical tests, you can see the client get 30 to 35% longer battery life when CCX is turned on, just thanks to this feature.
  • When my RSSI gets below this value, start scanning. Here are the possible neighbours. This way, the client only starts scanning when needed, and scans the useful channels first, here again saving battery.
  • When you roam, only jump to another AP if its RSSI is that much better than mine ("that much" is given by the Roaming Hysteresis value). This is very useful for stability, and prevents a client from jumping back and forth between 2 APs with about the same RSSI.
  • As a side note, the AP can also use S31 (CCXv2 and later, as you may have guessed), AP Controlled Client Transmit Power. If the client is close to the AP, the AP can tell the client: "no need to be that loud, I can hear you even if you are quieter", allowing the client to reduce its power level and, here again, save battery power. The AP can also ask the client to increase its power level as the client moves away from the AP.
  • This S51/S36 message system can be very helpful with sticky clients. You may remember the SRA algorithm from part 1 (sticky clients that do not roam when they are far away from their AP, even if they are close to another, better AP). If the client signal gets too weak, the AP can send an S36 request (give me the list of APs you hear), and if the client reports a better AP, the current AP can send an S51 Direct Roam message. This message basically says "Roam to AP X on channel Y, I know it's there and you can reach it". This way, the AP can dynamically compensate for a weak roaming algorithm.
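The client-side rules described in the bullets above (scan only below the scan threshold, respect the neighbour's minimum RSSI, and require the hysteresis margin before jumping) can be sketched in a few lines. This is a minimal illustration assuming RSSI values in dBm; `RfParameters`, `should_scan` and `pick_roam_target` are hypothetical names, and the transition time is not modelled.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class RfParameters:
    min_rssi_dbm: int        # below this, the AP does not want the client
    scan_threshold_dbm: int  # start scanning when the AP drops below this
    hysteresis_db: int       # a neighbour must beat the current AP by this much


def should_scan(current_rssi_dbm: int, params: RfParameters) -> bool:
    """Inside the good coverage area the client stays put and saves battery."""
    return current_rssi_dbm < params.scan_threshold_dbm


def pick_roam_target(current_rssi_dbm: int, params: RfParameters,
                     candidates: dict[str, tuple[int, int]]) -> Optional[str]:
    """Return the neighbour to roam to, or None to stay on the current AP.

    `candidates` maps a neighbour BSSID to (RSSI heard, that neighbour's
    minimum RSSI), as learned from the neighbour list and a scan.
    """
    eligible = {
        bssid: rssi
        for bssid, (rssi, neighbour_min_dbm) in candidates.items()
        if rssi >= neighbour_min_dbm                         # neighbour accepts us
        and rssi >= current_rssi_dbm + params.hysteresis_db  # clearly better
    }
    if not eligible:
        return None
    return max(eligible, key=eligible.get)
```

For example, with a 6 dB hysteresis and the current AP at -75 dBm, a neighbour heard at -72 dBm is ignored (only 3 dB better), which is exactly the ping-pong protection the hysteresis is meant to provide.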
CCX has other features to facilitate roaming. For example, CCXv5 introduces the S68 message (S6x, logically CCXv5): the gratuitous probe response. So that a client running a passive scan does not have to stay close to 100 ms on each scanned channel, the APs can send broadcast probe responses between beacons, which speeds up the discovery process.
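A quick back-of-the-envelope calculation shows why extra broadcast probe responses help. Assuming (hypothetically) that an AP sends `n` gratuitous probe responses evenly spaced between its beacons, the longest a passive scanner may have to wait on the channel shrinks by a factor of `n + 1`. The function name and the even-spacing assumption are illustrative only; the actual S68 timing is configured on the AP.

```python
TU_MS = 1.024  # one 802.11 time unit (TU) is 1024 microseconds


def worst_case_dwell_ms(beacon_interval_tu: int = 100,
                        gratuitous_per_interval: int = 0) -> float:
    """Longest wait before hearing the AP on a passively scanned channel.

    Assumes the gratuitous probe responses are spaced evenly between beacons.
    """
    return beacon_interval_tu * TU_MS / (gratuitous_per_interval + 1)
```

With the default 100 TU beacon interval and no gratuitous responses the worst case is 102.4 ms; with three evenly spaced responses per interval it drops to about 25.6 ms.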
The good thing about CCX is that most silicon vendors for client devices implement CCX (check here for the list). Cisco gives it away to anyone who wants to implement it into their clients (they still have to pay for the certification)... but only Cisco access points are CCX on the infrastructure side.
The complexity of CCX lies in the fact that it has many features, and that certifying for a new generation of CCX requires quite some development work (plus certification costs). For this reason, not all vendors implement the latest CCX version, and you may find that some features are not available in this or that client. Sometimes a simple firmware upgrade is enough to get a newer CCX version, but sometimes the vendor implements the newer CCX only in newer clients, and you may be stuck with, for example, a CCXv3 client doing S31 and S36, but no S51 or S68... Some vendors show which CCX version they run in the properties window of their adapters, others don't. But if you associate that client to a Cisco AP, the AP will always tell you which CCX version the client is running. You may then have to spend hours trying to figure out whether you can get a newer CCX version for this adapter...

Thursday, November 10, 2011

Why it's the right time to start working in 802.11 wireless: new protocols are coming

So you thought that 802.11n was the ultimate protocol, allowing 300 Mbps, maybe up to 450 Mbps? If you are not working in wireless yet, now is the right time to think about switching careers and starting to get a few certifications in the 802.11 wireless field: new protocols are coming that will change the game for a long time, and make 802.11 THE protocol you want to be an expert in. So get some Cisco wireless training and prepare for the storm to come:

  • 802.11ac is the first big one. This amendment is planned for the end of 2012, and will push wireless speeds in the 5 GHz band beyond the 1 Gbps mark. It also brings very clever enhancements. For example, in a typical cell about 70% of the traffic flows from the AP to the clients. Knowing that, 802.11ac has mechanisms by which the AP could use a 160 MHz wide channel and allocate sub-sections of this mega channel to groups of clients, allowing several clients to communicate with the AP at the same time! When 802.11ac comes out, new APs with up to 8 antennas will appear on the market, and your favourite wireless hardware vendor will have a few golden years ahead of them replacing the old 802.11n APs!

  • 802.11ad does not make a lot of noise, but it is an important amendment, also due by the end of 2012. 802.11ad brings the 802.11 protocol to the 57-66 GHz band. Why there? Because this is the range of frequencies your home devices will use to communicate. With 802.11ad, your TVs, hifi system, speakers and any other electronic device in your home will be able to communicate and exchange data. This way, you will be able to watch a movie on one TV, hear the sound wirelessly through your HD speakers, then move the movie seamlessly to another TV. These may look like geeky accessories, but soon you will see booming demand for 802.11 professionals to install, maintain and troubleshoot home systems.

  • 802.11ah brings 802.11 below 1 GHz, into the many unlicensed bands available at these low frequencies. This allows 802.11 to be used at longer range: to send 802.11 signals along highways to provide tons of information to travelling users, but also to communicate over several miles from one antenna. Throughput will not be very high (100 kbps is the target), but there are countless businesses that employ regional staff and need to stay in touch with them and send data. Here again, a big demand for 802.11 professionals will appear as these systems are sold and deployed... worldwide. 802.11ah should come in 2014.
  • 802.11af is a great scavenger amendment. TV signals were analog, and became digital. At the same time, they changed frequency. There is a rich collection of low frequencies abandoned by analog TV that are available to whoever wants them... and 802.11af is there to take them! 802.11af should be published some time in 2014. Its exact scope is still changing, but every day brings new possible applications, from internet access for rural areas to monitoring sensors reporting to central stations, or nationwide alert systems... and all that, using the good old 802.11 protocol.
All this is exciting! 802.11ac will be the first wave driving sales and demand, and the other amendments to follow will deepen the need for wireless expertise. So start your journey and join us in the 802.11 world! :-)

Wednesday, October 12, 2011

Why wireless (802.11) roaming is a nightmare (and why CCX can help)

Part 1: the nightmare

Have you ever suffered from roaming issues, had your VoWLAN call disconnected when jumping from one AP to another, and wondered why? Why do some devices roam just fine while others roam poorly, dropping packets if not the entire session? Are some vendors so dumb that they don't know how to implement a proper roaming algorithm? Well, they are probably not. Roaming algorithms are always an arbitration between contradicting needs. This article explains what happens when your wireless device decides to roam, and which choices can make your roaming experience either a non-event or a nightmare.
Notice that this article is full of notes designed to add to your knowledge. You can skip the notes if you are only focusing on the roaming and scanning issue.

Your wireless device and its environment

To understand how roaming happens, you have to put yourself in the shoes of your wireless device (yes, wireless devices have shoes sometimes). For your wireless device, the world is full of unknowns. It does not have the nice administrator view of the entire wireless infrastructure. If you were the wireless device, the only things you would know would be:
  • that there is one AP
  • communication with this AP is possible.
You know a lot about yourself, but you don't know much about this AP. You know:
  • the AP channel,
  • the BSSID (the AP MAC address associated to the SSID, or network name, your user configured you to join).

As you receive frames from the AP, you analyze the frame RSSI and deduce the data rate you could use to send unicast frames back to the AP. You know your current power level, but you have no real idea of the AP power level.

Suppose that you send a frame and it is not acknowledged... You start asking yourself many questions:
  • Is it that your power is too low and the AP did not hear you?
  • Should you resend at the same data rate with a higher power level, if possible?
  • Was there a collision because someone else sent at the same time? Should you just wait an EIFS (extended interframe space, used after receiving a frame with errors, as happens with a collision), and resend the frame at the same data rate and the same power level?
  • Did the user move away from the AP, and is your current modulation/data rate not adapted anymore?
  • Should you resend the frame at a lower data rate? With the same power level? Or a higher power level, to be safe?
  • Are you getting to the edge of the cell? Is there a cell edge?
  • Did the administrator determine an area where you are not supposed to be in the current cell anymore?
  • How is this edge determined? Based on AP power limitation? Based on physical obstacles (door, wall, etc. Remember that you do not see, so you do not know if your user brought you behind a wall or not)? Based on allowed rates limitation?
  • If there is a cell edge and if you are getting close to it, should you start to scan for another AP?

Based on these 4 possible scenarios (power too low for the current distance to the AP, collision, increased distance to the AP, edge of the cell reached, with or without another potential AP), your wireless client driver has to decide what to do next...

Power level or data rate?

For all these possibilities, power level is a critical issue. If you are a mobile wireless device, your ability to conserve your battery energy and spend it sparingly is what makes you popular (who wants to buy a VoWLAN phone with a one-hour battery life if another brand offers the same type of VoWLAN phone with a 6-hour battery life?). For this reason, it is common to see wireless devices make initial power level decisions when joining a cell. Based on the AP RSSI and SNR, an internal algorithm decides on the right power level. For example, suppose your power level ranges from 1 mW to 40 mW. Your user turns you on, and you are set to automatically join an SSID. You send a probe request at the lowest mandatory rate you support and at maximum power (40 mW).
_______________________
This is a mandatory requirement as per the 802.11 standard. Stations discovering the network by sending a probe request always send the request at highest possible power, and lowest possible data rate. This ensures that the request is heard as far as possible.
This is a probe request:

The AP is expected to answer with a probe response:

A probe response has the same format as a beacon. The only difference is that a beacon contains an additional field that the probe response does not contain (the TIM, or Traffic Indication Map, which lists the stations for which the AP has traffic buffered. All other fields are the same).
__________________________


You receive a probe response from the AP and you determine that this frame was received at -37 dBm RSSI and 41 dB SNR. Your internal power determination algorithm immediately thinks: “Wow! These are awesome conditions! I must be very close to the AP! I bet I can send and be heard at 5 mW!” You then start sending your next frames at 5 mW, and check your success rate. This success rate is often expressed as PER (packet error rate): how many frames get dropped and not acknowledged when I use this power level? If there are significantly more drops than at the previous power level (and my vendor-specific power level decision-making algorithm is going to tell me how much “significantly more” is), I might need to increase my power level until the packet error rate falls below an acceptable threshold.
Depending on the vendor, this power change test is made often... or not. If your priority is to conserve battery power, you may want to run this test and lower your power level when first joining the cell, then keep the power low as long as you can. This means that if you have determined a comfortable power level and your packet error rate starts to increase, you may choose to revert to a lower data rate at the same power level, rather than increase your power level.
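Here is a minimal sketch of that trade-off: pick a starting power from the AP signal, then, each time the measured PER is too high, either step the data rate down (battery-first policy) or step the power up. The power table, rate table, RSSI thresholds and function names are all hypothetical; every vendor uses its own proprietary tables and rules.

```python
POWER_LEVELS_MW = [1, 5, 10, 20, 40]            # hypothetical radio power steps
DATA_RATES_MBPS = [6, 9, 12, 18, 24, 36, 48, 54]  # slowest first


def initial_power_index(rssi_dbm: int) -> int:
    """Pick a starting power from the AP signal (thresholds are illustrative)."""
    if rssi_dbm >= -45:
        return 1  # very strong AP: try 5 mW
    if rssi_dbm >= -65:
        return 2  # decent AP: try 10 mW
    return len(POWER_LEVELS_MW) - 1  # weak AP: stay at full power


def adapt(power_idx: int, rate_idx: int, per: float, max_per: float,
          battery_first: bool) -> tuple[int, int]:
    """One adaptation step after measuring the packet error rate (PER)."""
    if per <= max_per:
        return power_idx, rate_idx          # acceptable: keep current settings
    can_lower_rate = rate_idx > 0
    can_raise_power = power_idx < len(POWER_LEVELS_MW) - 1
    if battery_first and can_lower_rate:
        return power_idx, rate_idx - 1      # trade speed for battery
    if can_raise_power:
        return power_idx + 1, rate_idx      # trade battery for robustness
    if can_lower_rate:
        return power_idx, rate_idx - 1
    return power_idx, rate_idx              # nothing left to change
```

With the -37 dBm probe response from the example, `initial_power_index(-37)` starts at 5 mW, matching the "I bet I can send and be heard at 5 mW" reasoning.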

To scan or not to scan

If you send a unicast frame to the AP and it does not get acknowledged, your first move is probably to give that frame a second chance. You wait an EIFS, pick a random backoff number, count down from there, then resend the frame at the same power level and the same data rate.
What if the frame does not get acknowledged a second time? It is time to think. Here again, each vendor proprietary algorithm will determine how you think:
  • Should you try a third time? 
  • Should you revert to a lower, more robust, data rate? 
  • Should you increase your power level? 
  • Should you start scanning for other APs? Are there any other APs? 
If you are in a SOHO environment with only one AP, scanning is simply a waste of battery energy... and nothing tells you what type of environment you are in (the user of the wireless device might know, but the wireless device itself has no clue). You will not know about other APs until you start scanning the other channels... which consumes time and energy, maybe just to discover that there is no other AP, or that the other APs do not serve your SSID.
Reverting to a lower data rate may be a safer solution from a battery conservation standpoint.

SRA

The first algorithms that implemented this type of logic are commonly called SRAs: sticky roaming algorithms. With SRA, you try to hang on to the current AP as long as you can, lowering your data rate down to the lowest rate if you have to.
This is usually done one data rate at a time. For example, if you were transmitting at 48 Mbps, you would first revert to 36 Mbps and try that data rate (for one or several frames, depending on your internal proprietary algorithm). If 36 Mbps does not provide a satisfactory loss rate (i.e. the loss rate, or packet error rate, is still above your internal algorithm's "acceptable level"), you would try 24 Mbps, etc.
If reverting to lower data rates is not enough, lab experiments determined that increasing your local power level would be less energy-costly than jumping into the unknown of scanning and roaming. So once you reached your lowest possible data rate, you would increase your power level to keep your packet error rate below the acceptable level.
___________________________
Notice that some vendors implement an "intelligent SRA" algorithm that takes the AP signal into account. For example, if the AP RSSI was -41 dBm and suddenly drops to -71 dBm, the "intelligent algorithm" would determine that increasing the power level is needed even before lowering the data rate. Each vendor has a table of specifications that determines which data rate is possible at which power level.
___________________________
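The SRA ladder described above (step down one data rate at a time, raise power only once the lowest rate is reached) and its "intelligent" variant (raise power first after a sharp RSSI drop) can be sketched as a single step function. The rate list covers only the OFDM rates, and the power steps and the 20 dB drop trigger are hypothetical values chosen for illustration.

```python
RATES_MBPS = [54, 48, 36, 24, 18, 12, 9, 6]  # OFDM rates, fastest first
POWERS_MW = [5, 10, 20, 40]                 # hypothetical power steps


def sra_next(rate_idx: int, power_idx: int, rssi_drop_db: float = 0.0,
             drop_trigger_db: float = 20.0) -> tuple[int, int]:
    """Next (rate_idx, power_idx) after the current setting failed the PER check.

    Plain SRA: step down one data rate at a time; only once the lowest rate
    is reached, step the power up. The "intelligent" variant raises power
    first when the AP RSSI dropped sharply (the trigger value is illustrative).
    """
    can_raise_power = power_idx < len(POWERS_MW) - 1
    if rssi_drop_db >= drop_trigger_db and can_raise_power:
        return rate_idx, power_idx + 1
    if rate_idx < len(RATES_MBPS) - 1:
        return rate_idx + 1, power_idx
    if can_raise_power:
        return rate_idx, power_idx + 1
    return rate_idx, power_idx  # stuck: a pure SRA client only scans now
```

Starting at 48 Mbps, a failure moves to 36 Mbps, as in the example above; with the -41 to -71 dBm drop (30 dB), the intelligent variant raises power instead.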

From an admin standpoint, these clients are the sticky clients, poorly suited to enterprise environments. These clients stick to their old AP, even when they are far away from it and right next to another AP that would provide far better performance. But understand that, from the client's perspective, clinging to the AP is a matter of survival, in order to conserve battery power. This client will start scanning only as a last resort, because it does not know whether there are any other APs out there, and is not expecting any other AP anyway.

Better algorithms

But is this clinging behavior really conserving battery? If you are transmitting at 54 Mbps, for example, sending 2346 bytes of data in a frame takes about 350 microseconds, while transmitting the same frame at 1 Mbps takes nearly 19 milliseconds (18,768 bits at one bit per microsecond). Simple math shows that your radio is on roughly 50 times longer when you send at 1 Mbps, and therefore consumes roughly 50 times more energy for that frame. Also, if you are in the 1 Mbps area, you are far away from the AP. Your frame on its way to the AP has far more chances to collide with another RF signal than if you were close to the AP and sending at 54 Mbps.
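Airtime figures like these are easy to check: at R Mbps the radio sends R bits per microsecond, so the time on air for the frame body is its size in bits divided by the rate. The helper below is a simple sketch that ignores preambles, interframe spaces and the ACK.

```python
def payload_airtime_us(frame_bytes: int, rate_mbps: float) -> float:
    """Time on air for the frame bits alone, in microseconds.

    At R Mbps the radio sends R bits per microsecond. Preamble, PLCP header,
    interframe spaces and the ACK are ignored, so real airtime is a bit longer.
    """
    return frame_bytes * 8 / rate_mbps
```

For a 2346-byte frame this gives about 348 µs at 54 Mbps and 18,768 µs at 1 Mbps, a ratio of 54. Adding the fixed overheads (about 20 µs of OFDM preamble and SIGNAL field at 54 Mbps, 192 µs of long DSSS preamble and PLCP header at 1 Mbps) brings the ratio down to roughly 50, which does not change the conclusion.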

_________________________
This increased risk is related to 2 factors:
  • the signal has a longer path to travel, so it has more chances to hit another signal on this long path: you are less likely to hit another signal if you travel 3 meters than if you travel 100 meters.
  • the signal takes longer to cross that distance, so it stays longer in the air, and the longer a signal is in the air, the more chances it has to be hit by something suddenly starting to send.
_________________________
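The second factor (a longer time in the air means more exposure) can be quantified with a simple model. Assume, as a simplification, that transmissions from stations that cannot hear you (hidden nodes or other interferers, which do not defer to your frame) start at random as a Poisson process with a given rate. The probability that at least one of them starts while your frame is on the air then grows with the airtime. The rate values are illustrative, and real 802.11 stations that can hear you do defer.

```python
import math


def p_overlap(airtime_us: float, starts_per_second: float) -> float:
    """Chance that at least one non-deferring transmission starts during ours.

    Assumes those transmissions start as a Poisson process (a simplification).
    """
    return 1.0 - math.exp(-starts_per_second * airtime_us * 1e-6)
```

With 10 such transmissions per second, a 348 µs frame (2346 bytes at 54 Mbps) is hit about 0.3% of the time, while an 18.8 ms frame (the same frame at 1 Mbps) is hit about 17% of the time.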


This means that your error rate is “naturally” going to be higher at the edge of the cell than close to the AP. If the error rate is higher, the retry rate is also going to be higher. The more you retry, the more your device is using its battery to re-send instead of peacefully going to sleep/doze mode.
This means that SRA algorithms were not that battery-efficient after all. As your device moves away from the AP, its energy consumption increases (because frames take longer to send and have to be resent more often).
For this reason, second generation algorithms were built that determined that the station should start scanning before getting to the extreme situation of completely losing contact with the current AP.

Some drivers were even designed to allow you to determine the roaming (and therefore scanning) aggressiveness. If you know that you are in a corporate environment with many APs, you can set the behavior to “aggressive roaming” (scan early and jump if you find a better AP). If you know that you are in a one AP environment, you can set the behavior to “conserve power” (stick to the current AP as long as you can). For example, the Intel 4965 (win7 driver):



ERA: scanning doubts

Okay, so you decided to throw away your first generation SRA algorithm, and implement instead the "enterprise roaming algorithm" (ERA), starting to scan as you move away from the AP, in order to jump to another AP and maintain a good data rate, because a good data rate equals better battery conservation.
But wait a minute. This is easier said than done. Scanning in itself is not going to solve all your problems. Here again, try to think like a wireless card. Scanning is going to consume power... also, scanning can be done passively or actively.

Passive scanning: energy-efficient but time-consuming

Passive scanning is the most “energy efficient” mode. You set your radio to the next channel, and listen to detect if any beacon is heard. If your original AP is on channel 1, you may want to jump to channel 6 and listen there, because it is the next adjacent (non-overlapping) channel.
_________________________
In the IEEE 802.11 standard, a channel is adjacent if it is not overlapping with the other channel. In the 2.4 GHz band, channels are 5 MHz apart. Channel 1 peak frequency is 2412 MHz, channel 2 peak frequency is 2417 MHz (so channel 1 and channel 2 are 5 MHz apart).
Two channels are not overlapping for 802.11b if their peak frequencies are 25 MHz apart. Two channels are not overlapping for 802.11g if their peak frequencies are 20 MHz apart (but as most 802.11g systems are built to be compatible with older 802.11b clients, 25 MHz is typically used for 802.11b/g networks).
Channel 6 peak frequency is 2437 MHz, 25 MHz away from channel 1, so channels 1 and 6 are not overlapping. Again, 2 channels are adjacent if they are non-overlapping. Channels 1 and 6 are adjacent.
You will find many vendors who erroneously call 2 neighbouring, overlapping channels "adjacent", for example 1 and 2. As long as you understand what the vendor means, all is perfect, but be aware that these channels are not adjacent for the 802.11 standard, they are overlapping.
_________________________
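The channel arithmetic in the note above fits in two small functions. In the 2.4 GHz band, channel n (1 to 13) is centred on 2407 + 5 × n MHz, and channel 14 sits off the grid at 2484 MHz. The function names are illustrative; the 25 MHz and 20 MHz spacings are the 802.11b and OFDM-only 802.11g values quoted above.

```python
def channel_center_mhz(channel: int) -> int:
    """Center (peak) frequency of a 2.4 GHz 802.11 channel."""
    if channel == 14:
        return 2484  # Japan-only channel, not on the 5 MHz grid
    if 1 <= channel <= 13:
        return 2407 + 5 * channel
    raise ValueError(f"not a 2.4 GHz channel: {channel}")


def non_overlapping(a: int, b: int, min_spacing_mhz: int = 25) -> bool:
    """True if the channels are far enough apart to be adjacent (non-overlapping).

    Use 25 MHz for 802.11b or mixed b/g cells, 20 MHz for OFDM-only 802.11g.
    """
    return abs(channel_center_mhz(a) - channel_center_mhz(b)) >= min_spacing_mhz
```

`non_overlapping(1, 6)` is `True` (25 MHz apart), while `non_overlapping(1, 2)` is `False` (5 MHz apart).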


Jumping from channel 1 to channel 6 to scan is nice, but here again, you do not know the environment. Maybe the next AP is set to channel 3, and is far enough away that you will not be able to decode its signal when listening on channel 6. So you probably have to scan each channel in turn, starting with 2, then 3, 4, etc.

_________________________
Can you hear a signal on channel 3 from channel 6? Well, maybe. Any signal spreads beyond its main frequency, although it is weaker as you move away from the main frequency. This phenomenon is related to the spectral mask. This is a typical spectral mask for OFDM signals:
You can see here that the 802.11 specification dictates that your signal should be at least 28 dB weaker than the main signal when you are 20 MHz away from the center frequency. So it is weaker, but your card may still be able to hear it, and maybe decode it.
By the way, beacons contain a field, called DS Parameter Set, that indicates which channel the AP is supposed to be on. If the beacon was captured while scanning another channel, at least your station will know which channel the AP is on:

___________________________
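To estimate how much weaker a neighbouring channel's signal is, you can interpolate the OFDM transmit spectrum mask. The corner points below (0 dBr out to 9 MHz from the center, -20 dBr at 11 MHz, -28 dBr at 20 MHz, -40 dBr at 30 MHz and beyond) are those of the 20 MHz OFDM mask in IEEE 802.11; linear interpolation in dB between them is the usual reading. Keep in mind the mask is an upper limit: a good transmitter may leak less.

```python
# (offset from center in MHz, limit in dBr) for the 20 MHz OFDM transmit mask
OFDM_MASK = [(9.0, 0.0), (11.0, -20.0), (20.0, -28.0), (30.0, -40.0)]


def mask_limit_dbr(offset_mhz: float) -> float:
    """Maximum allowed level, relative to the in-band peak, at a given offset.

    Between the corner points the limit is interpolated linearly in dB.
    """
    x = abs(offset_mhz)
    if x <= OFDM_MASK[0][0]:
        return 0.0
    for (x0, y0), (x1, y1) in zip(OFDM_MASK, OFDM_MASK[1:]):
        if x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return OFDM_MASK[-1][1]
```

An AP on channel 3 heard from channel 6 is 15 MHz off center, so its energy there can be as high as about -23.6 dBr: weaker, but possibly still decodable if the AP is close.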



How long should you listen on each channel? You know that beacons are sent by default every 100 TU (or 102.4 ms). Reason would say that you should stay at least that long on each channel... but this would mean that it would take you more than one second to scan all channels in the 2.4 GHz spectrum, and even worse, maybe more than 2 seconds (depending on your regulatory domain and the number of allowed channels) in the 5 GHz spectrum...
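The arithmetic behind these scan times is simple: dwell one default beacon interval (100 TU, or 102.4 ms) on each channel. The sketch below ignores channel-switching time; the 11-channel count matches the FCC 2.4 GHz plan, while the 24-channel 5 GHz count used in the example is only illustrative, since it depends on the regulatory domain.

```python
TU_US = 1024  # one 802.11 time unit, in microseconds


def full_passive_scan_ms(num_channels: int, dwell_tu: int = 100) -> float:
    """Time to dwell one default beacon interval on each channel.

    Channel switching time is ignored, so real scans take a bit longer.
    """
    return num_channels * dwell_tu * TU_US / 1000
```

`full_passive_scan_ms(11)` gives 1126.4 ms for the 2.4 GHz band, and `full_passive_scan_ms(24)` gives 2457.6 ms for a 24-channel 5 GHz plan.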
What if your current AP has traffic to send to you in the meantime? Luckily, the 802.11 standard thought about this issue. Before leaving your current channel, you need to send a frame to the AP with your Power Management bit set to 1, so that the AP knows that you are not available.
 _________________________
With non-WMM stations, this frame is a null data frame, in other words a data frame with an empty body and just the Power Management bit set to 1:
With WMM stations, any frame can be sent by the station to the AP, as long as the Power Management bit is set to 1.
When returning to the channel, non-WMM stations must ask the AP whether any traffic arrived and was buffered in the meantime, using a specific PS-Poll frame.
WMM stations can simply send any frame to the AP to signal their return to the channel.
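Where does that bit live? It is bit 12 of the 2-octet Frame Control field at the start of every 802.11 frame (bits 0-1 protocol version, 2-3 type, 4-7 subtype, then the flags, with To DS at bit 8). The sketch below only builds that field, not a whole frame, to show the values you would see in a capture: a null data frame (type 2, subtype 4) sent to the AP with the Power Management bit set, a QoS Null (subtype 12) as one possible frame for a WMM station, and a PS-Poll (control type 1, subtype 10). The helper names are hypothetical.

```python
TYPE_CONTROL, TYPE_DATA = 1, 2
SUBTYPE_NULL, SUBTYPE_PS_POLL, SUBTYPE_QOS_NULL = 4, 10, 12

TO_DS_BIT = 1 << 8      # bit 8 of the Frame Control field
PWR_MGT_BIT = 1 << 12   # bit 12: Power Management


def frame_control(ftype: int, subtype: int, to_ds: bool = False,
                  pwr_mgt: bool = False) -> bytes:
    """Build the 2-octet Frame Control field (protocol version 0), little-endian as sent."""
    fc = (ftype << 2) | (subtype << 4)
    if to_ds:
        fc |= TO_DS_BIT
    if pwr_mgt:
        fc |= PWR_MGT_BIT
    return fc.to_bytes(2, "little")


def power_management_set(fc: bytes) -> bool:
    """True if the Power Management bit is set in a received Frame Control field."""
    return bool(int.from_bytes(fc, "little") & PWR_MGT_BIT)


# Non-WMM station about to leave the channel: null data frame, To DS = 1, PM = 1
leaving = frame_control(TYPE_DATA, SUBTYPE_NULL, to_ds=True, pwr_mgt=True)
print(leaving.hex())  # 4811
# One possible frame for a WMM station: QoS Null with PM = 1
print(frame_control(TYPE_DATA, SUBTYPE_QOS_NULL, to_ds=True, pwr_mgt=True).hex())  # c811
# First octet of the control frame used to retrieve buffered traffic
print(frame_control(TYPE_CONTROL, SUBTYPE_PS_POLL)[0] == 0xA4)  # True
```

The field is transmitted least significant octet first, which is why the null data frame reads 48 11 in a hex dump.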
__________________________

Does that Power Management bit solve your "I'm off channel for a while to detect other APs" issue? Not completely. You are still supposed to be back on the active channel for the next DTIM.
A DTIM is the beacon that announces that the AP has broadcast or multicast traffic to send to the cell. A DTIM can be sent with every beacon, or at a longer interval (every 2, 5, or even 200 beacons if you configure your AP that way).
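The stay-away budget can be computed from the TIM element of the last beacon heard: its DTIM Count field says how many beacons remain before the next DTIM (0 meaning this beacon was itself a DTIM), and DTIM Period says how often DTIMs occur. A minimal sketch, where the 5 ms guard time for switching channels and getting back early is a purely hypothetical value:

```python
TU_MS = 1.024  # 1 TU = 1024 microseconds


def ms_until_next_dtim(beacon_interval_tu: int, dtim_count: int, dtim_period: int) -> float:
    """Time from the beacon just received to the next DTIM beacon.

    dtim_count and dtim_period come from that beacon's TIM element:
    DTIM Count is how many beacons remain before the next DTIM,
    and 0 means the beacon just received was itself a DTIM.
    """
    beacons_ahead = dtim_count if dtim_count > 0 else dtim_period
    return beacons_ahead * beacon_interval_tu * TU_MS


def off_channel_budget_ms(beacon_interval_tu: int, dtim_count: int, dtim_period: int,
                          guard_ms: float = 5.0) -> float:
    """Scanning time available; guard_ms is a hypothetical margin for switching back."""
    remaining = ms_until_next_dtim(beacon_interval_tu, dtim_count, dtim_period)
    return max(0.0, remaining - guard_ms)


print(f"DTIM every beacon: {off_channel_budget_ms(100, 0, 1):.1f} ms away at most")
print(f"DTIM every 3 beacons: {off_channel_budget_ms(100, 0, 3):.1f} ms away at most")
```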
If there is a DTIM in every beacon, you have to be back in less than a beacon interval... so you CANNOT be away for an entire beacon interval! Do you really need to be away that long? One way around this issue is simply to scan, and jump back to your main channel as soon as you hear a beacon on the scanned channel. For example, suppose that you are scanning channel 3. In the worst case, there is no AP there, and you stay until it is time to jump back to channel 1 and listen to the next DTIM. In the best case, you hear a beacon on channel 3 after just a few milliseconds of scanning, and happily return immediately to channel 1, knowing that there is an AP on channel 3.
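The early-return strategy itself fits in a few lines. In this sketch, the list of beacon arrival times on the scanned channel is hypothetical input (measured from the moment the radio lands on the channel), and the budget is the time left before you must be back for the next DTIM:

```python
from typing import Sequence, Tuple


def early_return_dwell(beacon_arrivals_ms: Sequence[float],
                       budget_ms: float) -> Tuple[float, bool]:
    """Stay on the scanned channel until the first beacon is heard or the budget runs out.

    Returns (time spent on the channel, whether an AP was found).
    """
    heard = [t for t in beacon_arrivals_ms if 0 <= t <= budget_ms]
    if heard:
        return min(heard), True
    return budget_ms, False


print(early_return_dwell([3.0, 41.5], budget_ms=97.4))  # best case: (3.0, True)
print(early_return_dwell([], budget_ms=97.4))           # worst case: (97.4, False)
```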

Does that make an efficient scanning algorithm? Not really... In fact, none of these behaviors is entirely satisfying:
  • if there is an AP in channel 3 and it has the same beacon interval as your main AP in channel 1, and if by coincidence both APs send their beacon at the same time, you will not discover the AP in channel 3... although it is there!
    Luckily, there is a solution for this issue: go back to scan channel 3 a few times... why would that solve the problem if both APs are set to send their beacons at the same time and with the same interval? Because a beacon is just like any other frame. Suppose the beacon interval is 100 TU. 100 TU after having started to send the previous beacon, the AP will try to send the next beacon. In order to do so, the AP needs the medium to be idle (if someone is sending at that time, the AP has to wait until the medium is free). The AP also needs to pick a random number and count down from there, just like for any other frame. As a result, although the beacon interval is set to 100 TU, practical cell conditions mean that there is usually not exactly 100 TU between two beacons. By coming back to channel 3 a few times, you will eventually hear the AP beacon.
  • If after listening to channel 3, you hear a beacon (and better yet, a beacon from an AP supporting your SSID, with an acceptable RSSI and SNR), should you be satisfied? Probably not! The AP you heard may be far away, even if its signal is acceptable. There may be another, closer AP that you haven't heard yet.
    Jumping to the conclusion that the AP you heard is your next best candidate may be a mistake. Some drivers make this mistake, which leads to poor roaming decisions (and comments from the network admin along the lines of "why on earth is that client jumping to this AP on the lower floor when there is a better AP just above the client, in the same room??"). Wisdom states that you should make sure that you detected all APs before deciding that you know about channel 3. This brings you back to the scenario where you have to stay longer on channel 3...
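The difference between settling for the first AP and waiting for the whole picture can be sketched as follows. The beacons listed are hypothetical, collected over several short visits to channel 3, and the ranking uses RSSI only to keep the example short; a real driver would also weigh SNR and other criteria.

```python
from typing import Iterable, NamedTuple, Optional


class Beacon(NamedTuple):
    time_ms: float  # when it was heard, across one or several visits to the channel
    bssid: str
    ssid: str
    rssi_dbm: int


def first_heard(beacons: Iterable[Beacon], ssid: str) -> Optional[Beacon]:
    """The shortcut some drivers take: the first AP heard with our SSID."""
    matching = [b for b in beacons if b.ssid == ssid]
    return min(matching, key=lambda b: b.time_ms, default=None)


def best_heard(beacons: Iterable[Beacon], ssid: str) -> Optional[Beacon]:
    """Wait until all visits are done, then pick the strongest AP with our SSID."""
    matching = [b for b in beacons if b.ssid == ssid]
    return max(matching, key=lambda b: b.rssi_dbm, default=None)


# Hypothetical beacons collected on channel 3 over several short visits
heard = [
    Beacon(4.0, "ap-lower-floor", "corp", -74),
    Beacon(12.0, "cafe-hotspot", "guest", -50),
    Beacon(137.0, "ap-same-room", "corp", -58),
]
print(first_heard(heard, "corp").bssid)  # ap-lower-floor
print(best_heard(heard, "corp").bssid)   # ap-same-room
```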
The sad conclusion is that there is no easy solution: in order to passively scan, you need to spend time away from the main channel and listen. Passive scanning is energy-efficient but time consuming...


Active scanning: time-efficient but energy-consuming

Another way to detect the environment is active scanning. Instead of passively listening on the other channels, you send a probe request. This is faster, because APs are supposed to answer probe requests: within a few milliseconds, you can learn which APs are on the scanned channel.
Once again, this is still not a perfect solution:
  • Some environments disallow active scanning (flight-safe mode, for example). Therefore, some wireless clients are not set to actively scan by default... my Nokia E71 phone is a perfect example.
  • Some APs are set to hide their SSID and not answer probes... but this is a deviation from the standard.
  • Just like for passive scanning, should you be happy with the first probe response you get? Probably not. Here again, the contention mechanisms apply, and the probe response is just like any other frame: in order to send it, the AP must determine that the medium is idle, pick a random number and count down from there before sending. Therefore, you may very well receive the probe response from a distant AP before the response from a closer AP. You need to spend some time on the channel before you can be sure that you got all the answers.
Your request, just like the AP response, may collide, and you may not be aware of that collision. If you send a request and get no response, is it that there is no AP, or that your request was not received because of a collision? If you receive 2 answers, does that mean that there are 2 APs on this channel, or could there be a third AP that did not get your probe, or whose response was lost because of a collision? In most cases, you will have to probe a few times before deciding that you know which APs are on the scanned channel.
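How many probes are "a few"? A rough model, under the simplifying assumption that each probe/response exchange is lost independently with probability p, gives a miss probability of p^n after n probes. In real cells collisions are not independent, so treat this as an order of magnitude, not a rule. The 20% loss rate below is a made-up figure for a busy channel.

```python
def miss_probability(loss_per_probe: float, probes: int) -> float:
    """Chance an AP stays undiscovered after `probes` attempts, assuming every
    probe/response exchange is lost independently with probability loss_per_probe."""
    return loss_per_probe ** probes


def probes_needed(loss_per_probe: float, max_miss: float) -> int:
    """Smallest number of probes that brings the miss probability to max_miss or below."""
    if not 0.0 <= loss_per_probe < 1.0:
        raise ValueError("loss_per_probe must be in [0, 1)")
    if not 0.0 < max_miss < 1.0:
        raise ValueError("max_miss must be in (0, 1)")
    n = 1
    while miss_probability(loss_per_probe, n) > max_miss:
        n += 1
    return n


# Hypothetical busy channel: 20% of exchanges lost to collisions
print(probes_needed(0.2, 0.01))
```

With 20% loss, 3 probes bring the chance of missing an AP below 1% (0.2³ = 0.008).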
Spending time means spending energy. This is made worse by the fact that active scanning implies both sending and receiving, which consumes more battery power than simply receiving. Active scanning is more time-efficient than passive scanning (although it does take time), but also more energy-consuming.

HARA: hybrid adaptive

So scanning while conserving battery power is a fine balance between passive and active scanning, while trying to fall back to lower data rates first before deciding that scanning is needed. The exact formula (at what AP RSSI/SNR drop level do you start scanning, how many times do you retry a lost frame before reverting to a lower data rate, when do you increase your local power) depends on the vendor. The exact algorithm is of course kept secret, not only because you don't want to help your competitors by handing them tools to become more efficient, but also because the behavior depends in great part on the exact hardware you use (circuit efficiency, card and antenna position in the device, and their performance). Algorithms implementing this type of adaptive behavior are usually grouped under the common name Hybrid Adaptive Roaming Algorithm (HARA).
Some drivers are directly adaptive and implement different types of scanning behavior. When your AP RSSI/SNR/packet error rate reaches a specific (vendor-dependent) threshold, your client starts passively scanning the other channels. If you reach another, lower threshold and are about to lose the current connection to your AP, and if passive scanning did not provide any good candidate AP to roam to, your client switches to panic mode and starts frantically and actively scanning the other channels, to find another AP before it is too late and your connection is lost. Smart, eh? Some vendors (for example Intel) even do a pre-assessment. When you first switch on your wireless device, it actively scans all channels and deduces the environment type. If multiple APs with the same SSID are detected, the environment is seen as Enterprise, and your client will start passively scanning in the background early in the roaming process (because it knows that there are other APs out there). If no other AP is detected with the same SSID, the client reverts to a sticky behavior, closer to SRA, because it assumes a home or SOHO environment. It is still not pure SRA, because you may be in a meeting room where only one AP is detected while still being in an enterprise environment. Therefore, the HARA will still switch to passive/active scanning when your AP RSSI/SNR levels drop.
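The decision logic described above can be sketched as a small function. Everything numeric here is hypothetical: vendors do not publish their thresholds, and real drivers also look at SNR, packet error rate and retries rather than RSSI alone. The pre-assessment follows the idea described in the text: more than one BSSID advertising your SSID in the initial active scan means enterprise.

```python
from enum import Enum
from typing import Iterable, Tuple


class Env(Enum):
    HOME = "home/SOHO"
    ENTERPRISE = "enterprise"


# Hypothetical thresholds (dBm). Real values are vendor secrets and hardware dependent.
PASSIVE_SCAN_BELOW = {Env.ENTERPRISE: -65, Env.HOME: -72}
PANIC_BELOW = -80


def classify_environment(initial_scan: Iterable[Tuple[str, str]], my_ssid: str) -> Env:
    """Pre-assessment: several BSSIDs advertising our SSID suggest an enterprise network."""
    bssids = {bssid for bssid, ssid in initial_scan if ssid == my_ssid}
    return Env.ENTERPRISE if len(bssids) > 1 else Env.HOME


def next_action(rssi_dbm: int, env: Env, candidate_found: bool) -> str:
    """Pick what the driver does next from the current AP RSSI."""
    if rssi_dbm <= PANIC_BELOW:
        # About to lose the link: roam now, or probe aggressively if nothing was found
        return "roam" if candidate_found else "active_scan"
    if rssi_dbm <= PASSIVE_SCAN_BELOW[env]:
        return "passive_scan"  # enterprise starts earlier: it expects other APs nearby
    return "stay"


env = classify_environment([("00:11:22:33:44:01", "corp"),
                            ("00:11:22:33:44:02", "corp")], "corp")
for rssi in (-60, -68, -76, -82):
    print(rssi, next_action(rssi, env, candidate_found=False))
```

For the enterprise case in the example, the client stays put at -60 dBm, scans passively at -68 and -76 dBm, and goes into active "panic" scanning at -82 dBm.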

Going further: the need for scanned channel AP power

All these constraints lead you to the conclusion that you will have to spend some time scanning before deciding that there is another AP you can jump to. Why do some devices roam well and others don't? It depends of course on which algorithm they implement. Recent devices may still implement an SRA or ERA algorithm, simply because the vendor never updated the roaming algorithm for this type of device (for many possible reasons, ranging from cost to typical device use cases). Even if the device implements HARA, its adaptation to your environment will depend heavily on whether the device's expected behavior matches your deployment conditions. Drivers are heavily tested and optimized for specific roam events (for example an office environment where a sudden roam is needed because a door closes between the device and the AP, or a warehouse where a barcode scanner in a high-multipath environment is expected to roam rarely and stay sticky). Although you cannot find out the exact roaming behavior the device is optimized for (unless you work for the driver development team of one of these vendors, haha), reading the driver release notes with an "educated eye" will give you some hints on what the vendor tried to achieve in each driver release.
In all cases, blame the environment, not the device!

Regardless of how carefully the vendor designed its driver, there are still many parameters out of your device's control. We will name just one, as a typical example. Even when you scan and discover other APs, a doubt will remain in your wireless device's mind (yes, wireless devices also have minds, sometimes :-)): what is the power of the AP I just discovered?
Does this power matter? Yes, it is key in the roaming decision.
Look at the following scenario:

This is a bird's-eye view, so you know that the laptop is moving toward the right. The laptop was connected to an AP somewhere on the lower left, and decided that it had to roam. Scanning the other channels, the laptop discovers 2 APs. One AP offers a -71 dBm RSSI (AP to laptop signal), and the other a -72 dBm RSSI. Which AP should the laptop roam to?
Even an admin with a global view would have to stop and think for a second. AP1's power level is 5 mW, just like the laptop's. This power symmetry allows for the same data rate on the way up and on the way down, 24 Mbps in both cases. Good! On the other hand, the laptop is moving away from this AP, so this nice connection is only going to last for a few seconds.
___________________________
For how long exactly? It depends of course on how far from the AP the client is. A general rule is that you lose 46 dB at 1 meter from the AP (so if the AP signal is 20 dBm, the client should read an RSSI of about -26 dBm), and you lose roughly 10 dB each time you double the distance (-36 dBm at 2 m, -46 dBm at 4 m, etc.). This is very general and the environment characteristics dictate the exact values, but these are common references for indoor open-space deployments. Typical walking speed for a roaming user is about 1.2 m/s...
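Those rules of thumb are enough for a back-of-the-envelope estimate of how long the AP1 connection lasts. The sketch assumes the 46 dB at 1 m / 10 dB per doubling rule from the text, a laptop walking straight away from the AP at 1.2 m/s (the worst case), and a hypothetical -80 dBm floor below which the link is considered too weak.

```python
import math

WALKING_SPEED_M_S = 1.2  # figure quoted in the text


def mw_to_dbm(mw: float) -> float:
    return 10 * math.log10(mw)


def rssi_at(tx_dbm: float, distance_m: float) -> float:
    """Rule of thumb from the text: 46 dB lost at 1 m, then about 10 dB per doubling."""
    return tx_dbm - 46 - 10 * math.log2(distance_m)


def distance_for(tx_dbm: float, rssi_dbm: float) -> float:
    """Inverse of rssi_at(): distance at which the rule predicts rssi_dbm."""
    return 2 ** ((tx_dbm - 46 - rssi_dbm) / 10)


def seconds_until(tx_dbm: float, rssi_now: float, rssi_floor: float,
                  speed_m_s: float = WALKING_SPEED_M_S) -> float:
    """Time to fall from rssi_now to rssi_floor when walking straight away from the AP."""
    return (distance_for(tx_dbm, rssi_floor) - distance_for(tx_dbm, rssi_now)) / speed_m_s


# AP1 from the scenario: 5 mW, heard at -71 dBm; -80 dBm is a hypothetical floor
print(f"{seconds_until(mw_to_dbm(5), -71, -80):.1f} s")
```

Under these assumptions AP1 goes from -71 dBm to -80 dBm in roughly 6.6 seconds, which is indeed only a few seconds.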
___________________________

So connecting to AP2 may be better after all... except that for now, AP2 offers a lower RSSI (-72 dBm). AP2's power is also very different from the laptop's: 40 mW against 5 mW. As a result, frames from the AP may reach the laptop at 24 Mbps, but frames from the laptop to the AP will probably have to be sent slower, for example at 6 Mbps. This is something the laptop will have to discover by sending frames and stepping down. Without knowing the AP power, the laptop has to assume that if the AP RSSI is -72 dBm, frames sent to the AP will also be heard by the AP at -72 dBm. It's only by failing to get ACK frames that the laptop will understand that there is a power mismatch and that frames have to be sent slower. It will try 18, then 12, then 9 before succeeding at 6 Mbps. What a waste of time! Yet, as the laptop moves toward AP2, this is more of a time investment, as the connection to AP2 will improve after a few seconds...
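If the laptop did know the AP transmit power, the mismatch would be simple arithmetic. A minimal sketch, assuming the path loss is identical in both directions and both ends have similar antennas and receivers (real devices differ): the AP hears the client weaker by the difference between the two transmit powers. The rate ladder is the one from the text; mapping an RSSI to a working rate would need receiver sensitivity figures that are not modelled here, so the 6 Mbps end point is simply taken from the scenario.

```python
import math
from typing import List

# OFDM rates the laptop steps down through in the text, highest first (Mbps)
RATE_LADDER = [24, 18, 12, 9, 6]


def mw_to_dbm(mw: float) -> float:
    return 10 * math.log10(mw)


def estimated_uplink_rssi(downlink_rssi_dbm: float, ap_tx_mw: float,
                          client_tx_mw: float) -> float:
    """What the AP should hear from the client, assuming a reciprocal path and
    similar antennas and receivers at both ends."""
    return downlink_rssi_dbm - (mw_to_dbm(ap_tx_mw) - mw_to_dbm(client_tx_mw))


def failed_rates(start_mbps: int, working_mbps: int,
                 ladder: List[int] = RATE_LADDER) -> List[int]:
    """Rates tried (and lost, no ACK) before reaching the one that actually works."""
    return ladder[ladder.index(start_mbps):ladder.index(working_mbps)]


print(f"AP1 hears the laptop at about {estimated_uplink_rssi(-71, 5, 5):.0f} dBm")
print(f"AP2 hears the laptop at about {estimated_uplink_rssi(-72, 40, 5):.0f} dBm")
print("rates lost on the way to 6 Mbps:", failed_rates(24, 6))
```

AP2 would hear the laptop around -81 dBm, about 9 dB below what the laptop assumes, and the four rates in the list are the attempts lost before 6 Mbps works.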
You know, what would be great is if the laptop could know the AP power, so that, as it moves toward the right, it could evaluate both APs' RSSI variations and determine the best strategy: jump temporarily to AP1 and then to AP2, or go directly to AP2. But there is no mechanism in the 802.11 standard to allow for this kind of power level information exchange between APs and stations...

Cisco Wireless VoIP 802.11n phone?

Cisco announced a few days ago the end of life / end of sale of their Cisco Unified Wireless IP Phone 7921 (see here). The 7925 is the last end-user product in the wireless line (even the CB21AG card, with its PCMCIA format and lack of Windows 7 support, is dying away)... so will Cisco give up and soon retire the 7925 as well, or will they come up with a new product, such as an 802.11n phone?
Cisco is clearly moving away from the end-user market, but WiFi phones are an enterprise product that complements the large range of Cisco wired IP phones.
But an 802.11n phone? Cisco has already certified the Cius as an 802.11n phone. Cisco and Dell are the only ones with a Wi-Fi-only 802.11n phone (see here), but there are hundreds of dual band GSM/WiFi certified 802.11n phones (see here).
All these phones are single radio, single stream. This is allowed by the 802.11n amendment, and is expected to last for a while. The main concern in a phone is battery conservation. As soon as you put 2 radio modules in a device, you consume twice as much power as a device with a single radio module. Although you can find clever ways to use the second module sparingly, your phone battery is still going to last a lot less than your competitor's single-radio/single-stream phone... not to mention the technical challenge of fitting 2 or 3 radio modules, with 2, 3 or 4 antennas, into a device that has to be light and fit in your hand.
Isn't single radio/single stream enough anyway? If you use your phone for what it is... a phone, you need to send and receive 50 packets per second, consuming about 200 kbps. So why bother with 802.11n? The gain in range might be useful to you, but all design guides work hard at convincing wireless network designers that they should design small cells, to control the data rate and number of devices in their network. So you might be able to take advantage of the additional range provided by 802.11n, but only if the network you join is poorly designed... :-D
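Where does a figure like 200 kbps come from? A quick sketch, assuming a G.711 call with 20 ms packetization (which gives the 50 packets per second), IPv4/UDP/RTP headers and 802.11 QoS data frames with LLC/SNAP and FCS, and ignoring encryption overhead; none of these codec or header choices come from the text.

```python
# Assumptions (not from the text): G.711 at 64 kbps, 20 ms packetization, IPv4/UDP/RTP,
# 802.11 QoS data frames with LLC/SNAP and FCS, no encryption overhead.
PACKETS_PER_SECOND = 50        # one voice packet every 20 ms
G711_PAYLOAD = 160             # bytes: 64 kbps * 20 ms / 8
RTP, UDP, IPV4 = 12, 8, 20     # header sizes in bytes
QOS_MAC_HEADER, LLC_SNAP, FCS = 26, 8, 4


def bits_per_second(frame_bytes: int, pps: int = PACKETS_PER_SECOND) -> int:
    return frame_bytes * 8 * pps


ip_packet = G711_PAYLOAD + RTP + UDP + IPV4               # 200 bytes
wlan_frame = ip_packet + QOS_MAC_HEADER + LLC_SNAP + FCS  # 238 bytes
per_direction = bits_per_second(wlan_frame)
print(f"IP layer: {bits_per_second(ip_packet) / 1000:.1f} kbps per direction")
print(f"802.11 frames: {per_direction / 1000:.1f} kbps per direction, "
      f"{2 * per_direction / 1000:.1f} kbps for a two-way call")
```

This gives 80 kbps per direction at the IP layer and about 95 kbps per direction once 802.11 headers are added, so about 190 kbps for a two-way call, close to the 200 kbps figure. Preambles, ACKs and interframe spaces are not counted; on the air, they cost more airtime than the bit count suggests.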
Most dual band phones certified for 802.11n offer this certification to allow for a more comfortable browsing (and downloading) experience, not really for the voice part itself.
So it would probably make sense for Cisco to work on 802.11n hybrid devices (a phone plus something else that needs bandwidth, like the Cius and its telepresence feature), but probably not on a pure "phone only" device... I have no internal insight into Cisco's secret plans, so this is just a thought I figured I should share, as I often get this Cisco 802.11n phone question, but this is by no means an informed or even educated guess on Cisco strategy... :-)