Facing an issue while pairing a custom device with the mobile application

Hi @Loic,

We are working on a product based on the APQ8016E and the WCN3660B WiFi chip.

Issue description:
The pairing process includes:

  1. Connection to the Wi-Fi network
  2. Connection to the server and mobile application

The failure happens in the second phase, during the server connection.

The server connection process consists of a few HTTPS requests, most of which fail with a timeout.

Additional notes for the issue:

  1. Normally this command takes 1-2 seconds to finish; when it gets stuck, it finishes after 30-60 seconds
  2. We tested it with Linaro builds 18 and 20, and the behavior is the same
  3. It happens only with specific routers. We reproduce it consistently with Galaxy Note 10 and Pocophone F1 hotspots
  4. In Israel, it looks like 50% of the routers provided by ISPs have this issue
  5. With G1 devices (based on a TI processor) we don’t see this issue

The detailed report and logs are uploaded at the link below.
https://arrowelectronics-my.sharepoint.com/:f:/r/personal/parth_y_shah_einfochips_com/Documents/Attachments/issue_6?csf=1&web=1&e=gVFatm

Please help further.

Regards,
Darshak

Sounds like you’re dealing with an external network issue, not something that anyone here can really help you with except for providing some general ideas to help you overcome the problem.

For example, consider implementing a timeout of 5-10 seconds and retrying the connection.
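
A minimal Python sketch of that idea, assuming the client can wrap its HTTPS calls in a helper: each attempt is bounded by a short timeout and the request is retried a few times. The helper name `fetch_with_retry`, the 7-second timeout, the three attempts and the linear backoff are illustrative choices, not something taken from the product's code.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, attempts=3, timeout=7.0, backoff=1.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Return the response body, retrying on timeouts and connection errors.

    `timeout` bounds each blocking socket operation (connect, read), so a
    stalled handshake is abandoned after a few seconds instead of hanging.
    `opener` and `sleep` are injectable only to make the helper easy to test.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            with opener(url, timeout=timeout) as response:
                return response.read()
        except urllib.error.HTTPError:
            # The server answered with an HTTP error status: the connection
            # itself worked, so retrying would not help here.
            raise
        except OSError as exc:
            # Covers socket timeouts, connection resets and urllib's URLError.
            last_error = exc
            if attempt < attempts:
                sleep(backoff * attempt)  # short linear backoff before retrying
    raise last_error
```

With these defaults a stuck request is given up after roughly 7 seconds per attempt, which keeps the worst case well below the 30-60 seconds reported above.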

And FYI: Your link is useless. Seems to require some kind of 3rd party login.

Thanks for your reply @doitright

It has been shared with Loic and he can access this.

Hi All,

Please find the logs and detailed report at the link below.

https://drive.google.com/drive/folders/1K3trY--MiCUctvm-QRLyUHLC0fWhlSNb?usp=sharing

The DUT receives a duplicated SYN-ACK, which means that our previous ACK has either been lost or not yet been received by the server (or the server did not receive any data within a certain time when the TCP_DEFER_ACCEPT option is set). Since the connection is ‘established’ on the client side, the rule is to send an empty ACK containing the current send-sequence number and an acknowledgment number indicating the next sequence number expected. This is what is done here.

But as you mentioned, the TSecr value in the resent ACK packet is not updated with the TSval of this duplicate SYN-ACK. I am not sure that is your problem, though, since the sequence number of that ACK is probably not what the server expects either. It would be good to capture packets on the server as well, to check whether the server receives the ACK packets or not.

You can also test this patch to check if it fixes your issue (and capture packets): loic.poulain/linux.git - [no description]

Regarding packet latency: the server replies to the client with a SYN-ACK that (sometimes) has the DSCP field set to class 3 (I assume for better TCP quality of service). I suppose that when the AP detects that field, it tries to translate it into the WiFi domain and creates a new block-ack session with a new TID (TID=4). This setup delays (a bit) the delivery of the SYN-ACK to the client, which (probably) ACKs it too late, causing a retransmission of the SYN-ACK by the server. This behavior is purely controlled by the AP, so there is not much we can do on the DUT WiFi side.

In the case of your G1 device, the behavior seems a bit different: the AP does not wait for the block-ack setup to complete but directly sends the packet with TID=4, which seems enough to prevent the duplicate SYN-ACK…

Hi @Loic,

Please find below captures for G1 and G2 devices:

https://arrowelectronics-my.sharepoint.com/:f:/r/personal/parth_y_shah_einfochips_com/Documents/Attachments/issue_6_logs_comparison_G1_G2?csf=1&web=1&e=A45Zrp

We’ve found some additional differences between the G1 and G2 behavior.

Description of the differences:
AP = Access point
DUT = Tyto device = station
BA = Block Ack

G1 flow:

  1. 1344: DUT receives “Action packet” as the start of a BA session
  2. 1347: DUT receives SYN-ACK from the server
  3. 1352: DUT sends the response to the “Action packet” received in #1
  4. 1355: DUT sends ACK to the server (response to the SYN-ACK)
  5. 1357: DUT receives BA request
  6. 1358: DUT sends BA response
  7. 1366: DUT sends Client Hello to the server, 23 ms after packet 1358

G2 flow:

  1. 2253: DUT receives “Action packet” as the start of a BA session
  2. 2256: DUT receives SYN-ACK from the server
  3. 2258: DUT sends ACK to the server (response to the SYN-ACK)
  4. 2260: DUT sends the response to the “Action packet” received in #1
  5. 2262: DUT receives BA request
  6. 2263: DUT sends BA response
  7. 2310: DUT sends Client Hello to the server, 235 ms after packet 2263

Observation: It looks like the wcn36xx driver takes more time to process and respond to the ADDBA request frame.

Can you please advise on how to overcome this? It is a very critical issue for us.

Regards,
Parth Y Shah

Have you tried the proposed patch?

This does not seem to be the problem: you still have repeated SYN-ACKs from the server, which does not receive (or accept) the DUT's ACK. Have you been able to capture on the server side?

Hi @Loic

Yes, we tested this patch and it doesn’t fix the issue.

No, we can’t check logs on the server side.

Regards,
Parth Y Shah

But does it fix the timestamp?
What is the hit rate of that issue?
Is there a simple way to reproduce it with a db410c with another service?

It is reproducible with a specific setup: it reproduces with some mobile hotspots and some APs.

Steps to reproduce:

  1. Connect the G2 device to a mobile hotspot or an AP using the mobile application
  2. After this pairing, the app will try to connect to an application/web server

The issue happens while performing these steps.

It is randomly reproducible (roughly 4 to 5 times in 10 trials).

Hi @loic,

Could the sequence difference below between G1 and G2 be the problem here?

G2 capture: ADDBA request → SYN-ACK → ACK → ADDBA response

G1 capture: ADDBA request → SYN-ACK → ADDBA response → ACK

AFAIK, no. The packet is correctly ACKed and the action frame is correctly replied to.

No @loic,

Please find the tcpdump at the location below:
https://arrowelectronics-my.sharepoint.com/:f:/r/personal/parth_y_shah_einfochips_com/Documents/Attachments/TCP_issue_logs?csf=1&web=1&e=3lnC9b

Description:
The TSecr of the duplicated ACK still shows the TSval of the initial packet.
For example (tcp2.cap): frame 70 is an answer to frame 69, so its TSecr should be “217593384” (the TSval of frame 69). In fact, its TSecr is the same as in frame 66 (the initial ACK packet).
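
The check described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the segments are assumed to be already extracted from the capture into plain records, and the rule "TSecr must equal the TSval of the last server segment" is a simplification of the TS.Recent logic in RFC 7323, so a mismatch is a hint rather than proof of a bug. The `Segment` type and `stale_tsecr_frames` function are hypothetical names, and every value in the sample except 217593384 is a placeholder, not data from tcp2.cap.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    frame: int         # frame number in the capture
    from_server: bool  # True for server -> DUT, False for DUT -> server
    tsval: int         # TCP timestamp value carried by the segment
    tsecr: int         # TCP timestamp echo reply carried by the segment


def stale_tsecr_frames(segments: List[Segment]) -> List[int]:
    """Return frames of client segments whose TSecr does not echo the
    TSval of the most recent server segment seen before them."""
    stale = []
    last_server_tsval = None
    for seg in segments:
        if seg.from_server:
            last_server_tsval = seg.tsval
        elif last_server_tsval is not None and seg.tsecr != last_server_tsval:
            stale.append(seg.frame)
    return stale


# Placeholder sample shaped like the case above. Only 217593384 (TSval of
# frame 69) comes from the report; frame 65 and all other values are made up.
example = [
    Segment(frame=65, from_server=True, tsval=217592384, tsecr=5000),   # SYN-ACK
    Segment(frame=66, from_server=False, tsval=5010, tsecr=217592384),  # initial ACK
    Segment(frame=69, from_server=True, tsval=217593384, tsecr=5000),   # duplicate SYN-ACK
    Segment(frame=70, from_server=False, tsval=6010, tsecr=217592384),  # ACK with old TSecr
]

print(stale_tsecr_frames(example))
```

Running the same check on captures taken with and without the patch would show quickly whether the patch changes the echoed timestamp at all.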

How do you know the packet is dropped by the AP?

@Loic, my latest comment is about the TCP issue for which you provided the patch; after the patch was applied, the TCP issue was not resolved.

https://arrowelectronics-my.sharepoint.com/:u:/r/personal/parth_y_shah_einfochips_com/Documents/Attachments/issue_6/Captures%20from%20WAN+air+tcpdump.zip?csf=1&web=1&e=NUCHfx
Here you can find a few “sets” of captures: a capture from the device, one from the air, and one from the ppp interface of the Linux machine that acts as the WAN for the router. One of the sets was taken with outgoing traffic from the device marked with DSCP 0x60.

From the ppp captures we can see that the ACK from our device never left the router. But sometimes we can see that the Client Hello sent after the ACK did go out successfully.
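
For anyone who wants to repeat the test with marked outgoing traffic, here is a rough Python sketch of one way to do it per socket. It assumes that "DSCP 0x60" above refers to the full TOS byte, since 0x60 is DSCP 24 (class selector 3) shifted left by two bits; the helper names are made up for the example, and the exact method used for the captures above is not stated in this thread.

```python
import socket

DSCP_CS3 = 24  # class selector 3, binary 011000


def dscp_to_tos(dscp: int) -> int:
    """Place a 6-bit DSCP value in the upper six bits of the TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in 6 bits")
    return dscp << 2


def tos_to_dscp(tos: int) -> int:
    """Extract the DSCP value from a TOS byte, dropping the two ECN bits."""
    return (tos & 0xFF) >> 2


def open_marked_socket(dscp: int = DSCP_CS3) -> socket.socket:
    """Create an IPv4 TCP socket whose outgoing packets carry the given DSCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock


print(hex(dscp_to_tos(DSCP_CS3)))
```

The option has to be set before `connect()` so that the SYN and the handshake ACK are marked as well.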

Hi @loic,

Is there any way or patch available to increase the 3-way handshake timings? We would like to re-check this issue after doing so.