Inconsistent WiFi

It can’t be working all that well, you’ve got 25% packet loss. You’re dropping a quarter of the total datagrams sent. I’m not sure that is all that great… -_-

Correct, still ~54Mbit/s is not bad. When lowering specified send rate from 100M to the 54M you used, the drop was 1%.

Did some more testing, and also with 802.11n/wmm enabled it will eventually reboot… It just seems to take longer/occur much less frequent.

I am also looking for a work around for the wifi issue. Here are some tests that I completed today.

  1. Open Embedded build (rpb-console-image)
    Replicated high ping times (some > 1000ms)
    Ping time seems to get worse as router signal strength is lower

  2. I added a USB WiFI adapter to the dragonboard on the openembedded rpb-console-image
    rtl8188cus chipset (enabled driver in menuconfig)
    I see the same ping times as the 3620. (>1000ms at times)
    This indicates the issue is not with the driver

  3. I added a USB to ethernet adapter and tried ping again
    1ms ping times to router
    20ms to 8.8.8.8 (similar to my laptop)

  4. Disabled network manager service and retested 3620 ping
    Connected using iw and wpa_supplicant commands
    Significantly better rstults
    <5ms pings to router
    20ms average to 8.8.8.8 with a few upto 50ms

  5. Repeated results on latest debian image with newer network manager
    OE network manager version: 1.0.12
    Debian network manager version: 1.6.x (forgot the exact version)

Can somebody confirm with another USB to WiFi adapter?

Takeaway: It appears to be less about the 36xx driver and more about networkmanager.

For sure it is easy to have unrealistic expectations of wireless and its definitely worth running checks with other well supported adapters to make sure we don’t start looking for ethernet levels of reliability over a wireless link.

In particular if the WiFi device is built around a single tuner then it will have high ping times during an SSID scan. It is part of the hardware design and NetworkManager cannot change that: NetworkManager and WiFi Scans – Dan Williams’ blog

If you would expect >1000ms ping times with any single tuner wifi, then why would linaro use networkmanager at all?

Why don’t we see this with Android if it is a fundamental issue with single tuner wifi?

Ultimately NetworkManager is the tool that gves us roaming support.

Ultimately scanning decisions are heuristics and NM and Android can
implement different approaches. The last time I looked into this there
was a very active discussion within NetworkManager on alternative
heuristics. In fact it looks like that work has just landed in the
upstream development branches:
https://bugzilla.gnome.org/show_bug.cgi?id=766482

Could still be a while before it hits a released image though…

We don’t expect 1s ping times due to NetworkManager, this is much more likely the result of you running wcn36xx with power save enabled; which does trigger some bug showing up as ridiculous rtt. Try running “iw dev wlan0 set power_save off” and there’s a good chance you’re seeing a huge improvement.

The network manager periodically issue a network scan, which does cause a measurable difference in the throughput and latency, but not on the scale affecting TCP traffic.

Here is what I see with power save off, network manager on

root@:~# iw dev wlan0 set power_save off
root@:~# ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=57 time=925.628 ms
64 bytes from 8.8.8.8: seq=1 ttl=57 time=12.352 ms
64 bytes from 8.8.8.8: seq=2 ttl=57 time=12.170 ms
64 bytes from 8.8.8.8: seq=3 ttl=57 time=12.537 ms
64 bytes from 8.8.8.8: seq=4 ttl=57 time=12.217 ms
64 bytes from 8.8.8.8: seq=5 ttl=57 time=14.719 ms
64 bytes from 8.8.8.8: seq=6 ttl=57 time=952.531 ms
64 bytes from 8.8.8.8: seq=7 ttl=57 time=11.568 ms
64 bytes from 8.8.8.8: seq=8 ttl=57 time=11.998 ms
64 bytes from 8.8.8.8: seq=9 ttl=57 time=1012.062 ms
64 bytes from 8.8.8.8: seq=10 ttl=57 time=12.633 ms
64 bytes from 8.8.8.8: seq=11 ttl=57 time=11.972 ms
64 bytes from 8.8.8.8: seq=12 ttl=57 time=11.848 ms
64 bytes from 8.8.8.8: seq=13 ttl=57 time=11.751 ms
64 bytes from 8.8.8.8: seq=14 ttl=57 time=11.758 ms
64 bytes from 8.8.8.8: seq=15 ttl=57 time=12.942 ms
64 bytes from 8.8.8.8: seq=16 ttl=57 time=1012.024 ms
64 bytes from 8.8.8.8: seq=17 ttl=57 time=11.984 ms
64 bytes from 8.8.8.8: seq=18 ttl=57 time=1012.240 ms
64 bytes from 8.8.8.8: seq=19 ttl=57 time=15.943 ms
64 bytes from 8.8.8.8: seq=20 ttl=57 time=11.678 ms
64 bytes from 8.8.8.8: seq=21 ttl=57 time=11.872 ms
64 bytes from 8.8.8.8: seq=22 ttl=57 time=564.177 ms

same build, disable NetworkManager.service
ip link set wlan0 up
wpa_supplicant -B -D wext -i wlan0 -c /etc/wpa_supplicant.conf
dhclient wlan0

root@:~# iw dev wlan0 set power_save off
root@:~# ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=57 time=11.601 ms
64 bytes from 8.8.8.8: seq=1 ttl=57 time=11.482 ms
64 bytes from 8.8.8.8: seq=2 ttl=57 time=11.448 ms
64 bytes from 8.8.8.8: seq=3 ttl=57 time=11.491 ms
64 bytes from 8.8.8.8: seq=4 ttl=57 time=11.523 ms
64 bytes from 8.8.8.8: seq=5 ttl=57 time=13.511 ms
64 bytes from 8.8.8.8: seq=6 ttl=57 time=13.004 ms
64 bytes from 8.8.8.8: seq=7 ttl=57 time=11.780 ms
64 bytes from 8.8.8.8: seq=8 ttl=57 time=11.616 ms
64 bytes from 8.8.8.8: seq=9 ttl=57 time=12.711 ms
64 bytes from 8.8.8.8: seq=10 ttl=57 time=11.579 ms
64 bytes from 8.8.8.8: seq=11 ttl=57 time=11.516 ms
64 bytes from 8.8.8.8: seq=12 ttl=57 time=11.395 ms
64 bytes from 8.8.8.8: seq=13 ttl=57 time=11.466 ms
64 bytes from 8.8.8.8: seq=14 ttl=57 time=11.426 ms
64 bytes from 8.8.8.8: seq=15 ttl=57 time=11.383 ms
64 bytes from 8.8.8.8: seq=16 ttl=57 time=11.368 ms
64 bytes from 8.8.8.8: seq=17 ttl=57 time=11.667 ms
64 bytes from 8.8.8.8: seq=18 ttl=57 time=11.376 ms
64 bytes from 8.8.8.8: seq=19 ttl=57 time=11.620 ms
64 bytes from 8.8.8.8: seq=20 ttl=57 time=12.670 ms
64 bytes from 8.8.8.8: seq=21 ttl=57 time=11.575 ms
64 bytes from 8.8.8.8: seq=22 ttl=57 time=11.549 ms
64 bytes from 8.8.8.8: seq=23 ttl=57 time=11.396 ms
64 bytes from 8.8.8.8: seq=24 ttl=57 time=11.458 ms
64 bytes from 8.8.8.8: seq=25 ttl=57 time=11.421 ms
64 bytes from 8.8.8.8: seq=26 ttl=57 time=11.412 ms
64 bytes from 8.8.8.8: seq=27 ttl=57 time=11.424 ms
64 bytes from 8.8.8.8: seq=28 ttl=57 time=12.370 ms
64 bytes from 8.8.8.8: seq=29 ttl=57 time=25.271 ms
64 bytes from 8.8.8.8: seq=30 ttl=57 time=11.516 ms
64 bytes from 8.8.8.8: seq=31 ttl=57 time=12.590 ms
64 bytes from 8.8.8.8: seq=32 ttl=57 time=11.633 ms
64 bytes from 8.8.8.8: seq=33 ttl=57 time=13.570 ms
64 bytes from 8.8.8.8: seq=34 ttl=57 time=11.742 ms
64 bytes from 8.8.8.8: seq=35 ttl=57 time=13.009 ms
64 bytes from 8.8.8.8: seq=36 ttl=57 time=15.340 ms
64 bytes from 8.8.8.8: seq=37 ttl=57 time=11.456 ms

Note that at each scan, 37 channels are scanned.

$ echo 0x400 > /sys/module/wcn36xx/parameters/debug_mask
$ dmesg
wcn36xx: mac80211 scan 37 channels worker

Looking at the driver, we can find the scan method (wcn36xx_hw_scan_worker):
for (i = 0; i < req->n_channels; i++) {
...
wcn36xx_smd_start_scan(wcn, req->channels[i]->hw_value);
msleep(30);
wcn36xx_smd_end_scan(wcn, req->channels[i]->hw_value);
...
}

So we can deduct that the scan loop will take about 1110ms (30ms * 37 channels).
I would say that this seems to match the different observations.
That said, I don’t know what is really done inside the firmware, if some sort of antenna time-sharing is performed.

The remaining question is why scanning 37 channels since there are only 14 channels in the 2.4Ghz band.

Debug shows that the driver actually try to scan 5Ghz channels:
[ 1908.495625] wcn36xx: hal start scan channel 14
[ 1908.538978] wcn36xx: hal end scan channel 14
[ 1908.539629] wcn36xx: hal start scan channel 36
[ 1908.586980] wcn36xx: hal end scan channel 36

This should not happen with this 2.4Ghz(only) chip (we could save 700ms).
iw phy0 info reports 5GHz support.

It looks like there is already a bug report for 5GHz.

https://bugs.96boards.org/show_bug.cgi?id=493

and there was a quick failed attempt and removing 5GHz. Can this be picked up again?

Yes this one seems coming from a different root cause since disabling 5GHz band capability from driver src does not seem to help.

However, for this particular scan issue, I think adding some chip/band-support detection could help.
We are going to have a look.

Thanks for looking into it. As of now this is killing us. How difficult do you think it will be to add chip/band-support detection?

Patch is currently under review, latency during scan is reduced as expected.

Hi @Loic,

I’ve integrated your patch that is under review for the chip/band support detection, and we still experience the reboot issue when pushing UDP packets through the interface with iperf. However, it seems that it is a little bit more reliable from the standpoint that it won’t try to connect to a network that it cannot support.

The iperf test in Inconsistent WiFi - #24 by Rob_Gries will trigger the bug if the -t switch is increased to 5 Minutes (300s). It seems that sustained activity at 30s is no longer able to trigger the bug reliably.

fwiw, the patch has not been merge into linux-next, so it will show up in the next Linux kernel version, and it was backported into the Linaro kernel branch (based on 4.9), so it will be included in the next Linaro release as well.

Hi @anon91830841,

That’s good news. I’ve been following the work that is on going in the Linaro Git repos, so I’ve been backporting some of the new patches into the stable branch #260 build to see if anything improves in the way of wifi support.

I’ve been messing with the following patches from ath-next, qcom’s codeaurora rpb repo, and upstream linaro and applying them to the #260 build that I’ve been using as a baseline:

The result is that the WiFi adaptor is a little bit more stable, however the UDP bug still exists and can be triggered with iperf with around ~3-5 minutes of sustained activity. I’ve also seen that the wifi module will stall when it doesn’t crash and cause a loss of connectivity. As @2048Bit has said before, this is killing us and making it hard to work with this particular chip. Hopefully this helps shed some additional light on some of the issues that we are having.

I’ll continue to mess around with this and will share if I find a particular configuration that works reliably.

-Rob

Has anyone tried building and using Dan Williams’ version of network manager from Daniel T’s post here:

I’m building it now inside of build #260, along with the patches that I found and cataloged.

FYI - this is the repo - NetworkManager/NetworkManager - Network connection manager and user applications (mirrored from https://gitlab.freedesktop.org/NetworkManager/NetworkManager)

This hack is known to eliminate the crash with UDP - imm/kernel - Unnamed repository

I’ve actually tried that one. It makes the crash harder to reproduce but it is definitely there. Try running iperf for an extended period of time when that patch is incorporated.