Message ID | 20200602001018.83640-2-matthewmwang@chromium.org |
---|---|
State | Changes Requested |
Headers | show |
Series | [1/6] Use LUT instead of macro | expand |
On Mon, Jun 01, 2020 at 05:10:14PM -0700, Matthew Wang wrote: > The roaming logic is currently too aggressive and full of holes. In > particular, the (arbitrary) values we currently assign to the roaming > difficulty are too small; an AP can reasonably move up or down 5dB in > RSSI in an enterprise environment even when the device is static. The > logic needs to be more resilient to these small fluctuations. > > The following notes describe each of the changes made and the rationale > behind it. > 1. Remove all short-circuiting and allow the roaming logic below to run > its course. This way we avoid overlooking important factors. > 2. Add a to_2ghz flag that indicates if we are moving from a 5ghz AP to > a 2ghz AP. This should increase the difficulty to roam in that > direction and mirror the to_5ghz flag. > 3. Adjust the min_diff upward across the board. Originally, we would > roam with a RSSI imporvement of 1dB in the weakest bucket and 5dB in > the strongest bucket. RSSI is fairly fickle and APs can easily > fluctuate by more than that in a normal enterprise environment. > Additionally, there should be more buckets. We currently stop > differentiating after the RSSI is at least -70dB. This is still > pretty weak, and the difficulty should continue to increase above > that. Move the threshold up to 4dB in the weakest bucket and 10dB in > the strongest bucket, and add two more buckets (up to -60dB). > 4. We adjust the roam difficulty up and down based on the estimated > throughput of the selected AP relative to the estimated throughput of > the current AP. Small gains or losses in estimated throughput should > not be rewarded or penalized as heavily as they are now since > estimated throughput is a hand-wavy measurement to begin with. Relax > these adjustments. This seems to prevent roaming from non-HT to HT AP if those APs have the same signal strength. This does not sound desirable. Being able to use HT will likely get better throughput and reduced airtime need. For example, the prefix_ht20_during_roam test case fails because of these.
Thanks for the response. I ran the prefer_ht20_during_roam test and it looks like it expects a roam from a non-HT AP at -30 dBm with 54 Mbps est_tpt to an HT AP at -30 dBm with 65 Mbps est_tpt. I'm not too clear on all the phy layer details of HT, but in my opinion, roaming from an AP at such a strong signal level (-30 dBm) needs a fairly strong justification, and a 20% increase (54 -> 65) in estimated throughput is not that strong. That being said, if there are benefits to moving from non-HT to HT that aren't captured in the {signal, est_tpt} pair, then I'd advocate for using a flag that behaves similarly to the to_{2,5}ghz flags, which adjusts the roaming difficulty up/down 2 dBm depending on the direction of the roam. We can assign some value for {to,from}_ht flags that subtracts from or adds to the roaming difficulty depending on how valuable HT vs non-HT is. Does this make sense? On Tue, Jun 23, 2020 at 2:30 PM Matthew Wang <matthewmwang@chromium.org> wrote: > > Thanks for the response. I ran the prefer_ht20_during_roam test and it > looks like it expects a roam from a non-HT AP at -30 dBm with 54 Mbps > est_tpt to an HT AP at -30 dBm with 65 Mbps est_tpt. I'm not too clear > on all the phy layer details of HT, but in my opinion, roaming from an > AP at such a strong signal level (-30 dBm) needs a fairly strong > justification, and a 20% increase (54 -> 65) in estimated throughput > is not that strong. That being said, if there are benefits to moving > from non-HT to HT that aren't captured in the {signal, est_tpt} pair, > then I'd advocate for using a flag that behaves similarly to the > to_{2,5}ghz flags, which adjusts the roaming difficulty up/down 2 dBm > depending on the direction of the roam. We can assign some value for > {to,from}_ht flags that subtracts from or adds to the roaming > difficulty depending on how valuable HT vs non-HT is. Does this make > sense? > > On Fri, Jun 19, 2020 at 8:31 AM Jouni Malinen <j@w1.fi> wrote: > > > > On Mon, Jun 01, 2020 at 05:10:14PM -0700, Matthew Wang wrote: > > > The roaming logic is currently too aggressive and full of holes. In > > > particular, the (arbitrary) values we currently assign to the roaming > > > difficulty are too small; an AP can reasonably move up or down 5dB in > > > RSSI in an enterprise environment even when the device is static. The > > > logic needs to be more resilient to these small fluctuations. > > > > > > The following notes describe each of the changes made and the rationale > > > behind it. > > > 1. Remove all short-circuiting and allow the roaming logic below to run > > > its course. This way we avoid overlooking important factors. > > > 2. Add a to_2ghz flag that indicates if we are moving from a 5ghz AP to > > > a 2ghz AP. This should increase the difficulty to roam in that > > > direction and mirror the to_5ghz flag. > > > 3. Adjust the min_diff upward across the board. Originally, we would > > > roam with a RSSI imporvement of 1dB in the weakest bucket and 5dB in > > > the strongest bucket. RSSI is fairly fickle and APs can easily > > > fluctuate by more than that in a normal enterprise environment. > > > Additionally, there should be more buckets. We currently stop > > > differentiating after the RSSI is at least -70dB. This is still > > > pretty weak, and the difficulty should continue to increase above > > > that. Move the threshold up to 4dB in the weakest bucket and 10dB in > > > the strongest bucket, and add two more buckets (up to -60dB). > > > 4. We adjust the roam difficulty up and down based on the estimated > > > throughput of the selected AP relative to the estimated throughput of > > > the current AP. Small gains or losses in estimated throughput should > > > not be rewarded or penalized as heavily as they are now since > > > estimated throughput is a hand-wavy measurement to begin with. Relax > > > these adjustments. > > > > This seems to prevent roaming from non-HT to HT AP if those APs have the > > same signal strength. This does not sound desirable. Being able to use > > HT will likely get better throughput and reduced airtime need. For > > example, the prefix_ht20_during_roam test case fails because of these. > > > > -- > > Jouni Malinen PGP id EFC895FA
On Tue, Jun 23, 2020 at 02:41:19PM -0700, Matthew Wang wrote: > Thanks for the response. I ran the prefer_ht20_during_roam test and it > looks like it expects a roam from a non-HT AP at -30 dBm with 54 Mbps > est_tpt to an HT AP at -30 dBm with 65 Mbps est_tpt. I'm not too clear > on all the phy layer details of HT, but in my opinion, roaming from an > AP at such a strong signal level (-30 dBm) needs a fairly strong > justification, and a 20% increase (54 -> 65) in estimated throughput > is not that strong. That being said, if there are benefits to moving > from non-HT to HT that aren't captured in the {signal, est_tpt} pair, > then I'd advocate for using a flag that behaves similarly to the > to_{2,5}ghz flags, which adjusts the roaming difficulty up/down 2 dBm > depending on the direction of the roam. We can assign some value for > {to,from}_ht flags that subtracts from or adds to the roaming > difficulty depending on how valuable HT vs non-HT is. Does this make > sense? That behavior is not limited to so high signal strength value. Same happens with all other values and the impact is much more severe in some cases. For example, with both APs at -86 dBm non-HT AP gets 2 Mbps est_tpt and HT AP gets 8.666 Mbps. Still, roaming is prevented ("Skip roam - too small difference in signal level (0 < 4)"). In other words, even that min_diff -= 2 style adjustment that is given for to_5ghz would not help here. In fact, it looks like this patch would make it impossible to roam to an AP that shows better throughput estimate (regardless of how large a difference) on the same band unless there is also at least 4 dB increase in the signal level when cur_level < -85. That does not look reasonable. And yes, HT does come with possibility of using A-MPDU and A-MSDU aggregation which can result in significant improvement in throughput and airtime efficiency and that is in addition to the estimated throughput values from TX rates. So it might indeed make sense to add/subtract difficulty for roaming, but still, the main issue I see here is in that behavior that seems to completely prevent roaming unless signal level increases significantly regardless of other factors.
> the main issue I see here is in that behavior that seems to completely prevent roaming unless signal level increases significantly regardless of other factors. This is only the case for the lowest bucket (< -85dBm), as you've pointed out, and in this case, I would argue that roaming for estimated throughput gains doesn't actually provide much (if any) benefit. -86dBm is hardly usable WiFi, and the roam will only add to the disruption in connectivity, resulting in a decrease in the actual throughput. Putting the non-HT vs HT question aside for a moment, if we take two APs that are both at approximately -86 dBm from the device, according to the original algorithm, any fluctuation in signal level where the current signal is any worse than the signal of the other AP will cause a roam, and as I'm sure you're aware, fluctuation in signal level happens constantly. What we end up seeing is the device aggressively roaming back and forth for no real signal or throughput gains. The poor signal level (combined with constant roaming) far outweighs any gains in estimated throughput. Even taking into consideration the 2 Mbps non-HT AP vs the 8.666 Mbps HT AP situation, I think it makes sense to encode some slight bias towards the HT AP for the reasons you mentioned (A-MPDU and A-MSDU), but there's no point in roaming to the HT AP if we're just going to roam away from it because the signal fluctuated a couple dBm. The point I'm trying to make is that RSSI measurements are moving targets and can see quite a bit of variance, and the algorithm should be able to absorb some of this. If we encoded a preference for HT, we could see something like: at -86 dBm, roam to HT AP when signal is 2 dBm greater, roam to non-HT AP when signal is 6 dBm greater. In this case, even if two APs have the same "real" RSSI relative to the device, it's extremely likely, and, in fact, almost inevitable, that the HT AP will be > 2 dBm greater at some point, and it's much less likely that the non-HT AP will be > 6 dBm greater at some point. Using the mac80211_hwsim module to test roaming has this shortcoming - that the RSSI measurements are too precise and don't actually simulate real-world environments accurately in this regard. My arguments up to now have been pretty much entirely conjecture, but I'd like to offer some (maybe slightly hand-wavy) data. ChromeOS is one of the largest (if not the largest) consumers of the roaming functionality in wpa_supplicant since Android offloads roaming to the chipset. We've gotten many complaints about roaming from enterprise customers in the past, but ever since we've adopted these roaming changes nine or so months ago (along with a few more since I first submitted this patch) in our tree, we've had customers tell us that their issues were resolved and roaming complaints as a whole have become more or less non-existent. Additionally, we've since added integration tests to simulate movement between APs using programmable attenuators, and we've seen measurable improvements in stability (device is able to withstand minor fluctuations in signal level without roaming) and no decreases in mobility (device is still able to roam before getting deauth-ed for leaving the original AP). I hope this makes sense, and I'm more than happy to continue discussing anything you still think needs to be improved or is unsatisfactory. I think encoding the HT preference is reasonable and I'll go ahead and do that when I get a chance, but beyond that, I still believe this is a step in the right direction. Sorry for the long response, and if you've made it this far, thanks for reading :) On Sun, Feb 28, 2021 at 1:22 AM Jouni Malinen <j@w1.fi> wrote: > > On Tue, Jun 23, 2020 at 02:41:19PM -0700, Matthew Wang wrote: > > Thanks for the response. I ran the prefer_ht20_during_roam test and it > > looks like it expects a roam from a non-HT AP at -30 dBm with 54 Mbps > > est_tpt to an HT AP at -30 dBm with 65 Mbps est_tpt. I'm not too clear > > on all the phy layer details of HT, but in my opinion, roaming from an > > AP at such a strong signal level (-30 dBm) needs a fairly strong > > justification, and a 20% increase (54 -> 65) in estimated throughput > > is not that strong. That being said, if there are benefits to moving > > from non-HT to HT that aren't captured in the {signal, est_tpt} pair, > > then I'd advocate for using a flag that behaves similarly to the > > to_{2,5}ghz flags, which adjusts the roaming difficulty up/down 2 dBm > > depending on the direction of the roam. We can assign some value for > > {to,from}_ht flags that subtracts from or adds to the roaming > > difficulty depending on how valuable HT vs non-HT is. Does this make > > sense? > > That behavior is not limited to so high signal strength value. Same > happens with all other values and the impact is much more severe in some > cases. For example, with both APs at -86 dBm non-HT AP gets 2 Mbps > est_tpt and HT AP gets 8.666 Mbps. Still, roaming is prevented ("Skip > roam - too small difference in signal level (0 < 4)"). In other words, > even that min_diff -= 2 style adjustment that is given for to_5ghz would > not help here. In fact, it looks like this patch would make it > impossible to roam to an AP that shows better throughput estimate > (regardless of how large a difference) on the same band unless there is > also at least 4 dB increase in the signal level when cur_level < -85. > That does not look reasonable. > > And yes, HT does come with possibility of using A-MPDU and A-MSDU > aggregation which can result in significant improvement in throughput > and airtime efficiency and that is in addition to the estimated > throughput values from TX rates. So it might indeed make sense to > add/subtract difficulty for roaming, but still, the main issue I see > here is in that behavior that seems to completely prevent roaming unless > signal level increases significantly regardless of other factors. > > -- > Jouni Malinen PGP id EFC895FA
diff --git a/wpa_supplicant/events.c b/wpa_supplicant/events.c index f0f91892f..d6e4be3c7 100644 --- a/wpa_supplicant/events.c +++ b/wpa_supplicant/events.c @@ -1782,8 +1782,10 @@ static int wpa_supplicant_need_to_roam(struct wpa_supplicant *wpa_s, struct wpa_bss *current_bss = NULL; #ifndef CONFIG_NO_ROAMING int min_diff, diff; - int to_5ghz; + int to_2ghz, to_5ghz; int cur_level; + int adjust = 0; + double adjust_factor, est_ratio; unsigned int cur_est, sel_est; struct wpa_signal_info si; int cur_snr = 0; @@ -1871,65 +1873,60 @@ static int wpa_supplicant_need_to_roam(struct wpa_supplicant *wpa_s, cur_level, cur_snr, cur_est); } - if (sel_est > cur_est + 5000) { - wpa_dbg(wpa_s, MSG_DEBUG, - "Allow reassociation - selected BSS has better estimated throughput"); - return 1; - } - + to_2ghz = current_bss->freq > 4000 && selected->freq < 4000; to_5ghz = selected->freq > 4000 && current_bss->freq < 4000; - if (cur_level < 0 && cur_level > selected->level + to_5ghz * 2 && - sel_est < cur_est * 1.2) { - wpa_dbg(wpa_s, MSG_DEBUG, "Skip roam - Current BSS has better " - "signal level"); - return 0; - } - - if (cur_est > sel_est + 5000) { - wpa_dbg(wpa_s, MSG_DEBUG, - "Skip roam - Current BSS has better estimated throughput"); - return 0; + /* + * Set the minimum RSSI difference needed to roam (`min_diff`) and the + * estimated throughput weight (`adjust_factor`) based on the RSSI of + * the current AP. At low RSSI (< -70 dBm), we ignore estimated + * throughput gains and only consider RSSI gains. At higher RSSI, + * `adjust_factor` is multiplied by `adjust` (set below according to the + * selected AP's estimated throughput relative to the current AP's + * estimated throughput) to influence the `min_diff`. + */ + if (cur_level < -85) { /* ..-86 dBm */ + min_diff = 4; + adjust_factor = 0; + } else if (cur_level < -80) { /* -85..-81 dBm */ + min_diff = 5; + adjust_factor = 0.2; + } else if (cur_level < -75) { /* -80..-76 dBm */ + min_diff = 6; + adjust_factor = 0.4; + } else if (cur_level < -70) { /* -75..-71 dBm */ + min_diff = 7; + adjust_factor = 0.6; + } else if (cur_level < -65) { /* -70..-66 dBm */ + min_diff = 8; + adjust_factor = 0.8; + } else if (cur_level < -60) { /* -65..-61 dBm */ + min_diff = 9; + adjust_factor = 1.0; + } else if (cur_level < 0) { /* -60..-1 dBm */ + min_diff = 10; + adjust_factor = 1.2; + } else { /* unspecified units (not in dBm) */ + min_diff = 5; + adjust_factor = 1.0; } - if (cur_snr > GREAT_SNR) { - wpa_dbg(wpa_s, MSG_DEBUG, - "Skip roam - Current BSS has good SNR (%u > %u)", - cur_snr, GREAT_SNR); - return 0; + sel_est = selected->est_throughput; + if (cur_est > sel_est) { + adjust = 1; + est_ratio = sel_est == 0 ? 2 : (double) cur_est / sel_est; + } else { + adjust = -1; + est_ratio = cur_est == 0 ? 2 : (double) sel_est / cur_est; } + if (est_ratio > 2) + est_ratio = 2; + adjust *= (int) ((est_ratio - 1) * 10); + min_diff += adjust * adjust_factor; - if (cur_level < -85) /* ..-86 dBm */ - min_diff = 1; - else if (cur_level < -80) /* -85..-81 dBm */ - min_diff = 2; - else if (cur_level < -75) /* -80..-76 dBm */ - min_diff = 3; - else if (cur_level < -70) /* -75..-71 dBm */ - min_diff = 4; - else if (cur_level < 0) /* -70..-1 dBm */ - min_diff = 5; - else /* unspecified units (not in dBm) */ - min_diff = 2; - - if (cur_est > sel_est * 1.5) - min_diff += 10; - else if (cur_est > sel_est * 1.2) - min_diff += 5; - else if (cur_est > sel_est * 1.1) + if (to_2ghz) min_diff += 2; - else if (cur_est > sel_est) - min_diff++; - else if (sel_est > cur_est * 1.5) - min_diff -= 10; - else if (sel_est > cur_est * 1.2) - min_diff -= 5; - else if (sel_est > cur_est * 1.1) - min_diff -= 2; - else if (sel_est > cur_est) - min_diff--; - - if (to_5ghz) + else if (to_5ghz) min_diff -= 2; diff = selected->level - cur_level; if (diff < min_diff) {
The roaming logic is currently too aggressive and full of holes. In particular, the (arbitrary) values we currently assign to the roaming difficulty are too small; an AP can reasonably move up or down 5dB in RSSI in an enterprise environment even when the device is static. The logic needs to be more resilient to these small fluctuations. The following notes describe each of the changes made and the rationale behind it. 1. Remove all short-circuiting and allow the roaming logic below to run its course. This way we avoid overlooking important factors. 2. Add a to_2ghz flag that indicates if we are moving from a 5ghz AP to a 2ghz AP. This should increase the difficulty to roam in that direction and mirror the to_5ghz flag. 3. Adjust the min_diff upward across the board. Originally, we would roam with a RSSI imporvement of 1dB in the weakest bucket and 5dB in the strongest bucket. RSSI is fairly fickle and APs can easily fluctuate by more than that in a normal enterprise environment. Additionally, there should be more buckets. We currently stop differentiating after the RSSI is at least -70dB. This is still pretty weak, and the difficulty should continue to increase above that. Move the threshold up to 4dB in the weakest bucket and 10dB in the strongest bucket, and add two more buckets (up to -60dB). 4. We adjust the roam difficulty up and down based on the estimated throughput of the selected AP relative to the estimated throughput of the current AP. Small gains or losses in estimated throughput should not be rewarded or penalized as heavily as they are now since estimated throughput is a hand-wavy measurement to begin with. Relax these adjustments. 5. At lower signal levels, we shouldn't care as much about estimated throughput gains. In particular, below a certain threshold, we should only be chasing RSSI gains in search of a more stable connection. For example, it doesn't matter that we gain 10% estimated throughput if our RSSI is bad. Introduce an adjust_factor so that we can weight estimated throughput differently. Signed-off-by: Matthew Wang <matthewmwang@chromium.org> --- wpa_supplicant/events.c | 103 +++++++++++++++++++--------------------- 1 file changed, 50 insertions(+), 53 deletions(-)