ESP32 mqtt client ws_read() failure

PeterR
Posts: 568
Joined: Mon Jun 04, 2018 2:47 pm

Re: ESP32 mqtt client ws_read() failure

Postby PeterR » Wed Sep 30, 2020 11:39 am

Thanks; great call!
A task delay works.
At the start of the investigation I added logic to test the timeout and set it to 1000 ms if zero, but it still failed. I must have put that in the wrong place!

The takeaway is:
(1) The fault happens when the ESP-IDF MQTT client receives a fragmented WS frame, i.e. when the header & payload do not arrive at the same time.
(2) I use the ESP-IDF websocket server to transmit MQTT server packets (both httpd_ws_send_frame_async() & httpd_ws_send_frame() produce this error).

Checking httpd_ws.c, the WS header & payload are indeed sent separately.

I would suggest that this is a fault in the ESP-IDF Websocket transport layer. It is perfectly possible for header and payload to land at different times & indeed this is guaranteed when using the ESP-IDF Websocket server (albeit the segments may land close enough together not to matter)!
I will go back and look at my timeout logic again and see if I can suggest a more polished fix.
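To make the failure mode concrete, here is a minimal host-side sketch in C. It is not the ESP-IDF code. It assumes a hypothetical mock_read() that hands out the byte stream in small chunks, so the 2-byte WS header and the payload arrive in separate reads, and a read_exact() loop that keeps reading until the requested byte count is in hand. All names are made up for illustration, and only short unmasked frames (payload under 126 bytes) are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a socket: returns the stream in small chunks,
 * so a frame header and its payload arrive in separate reads. */
typedef struct {
    const unsigned char *data;  /* bytes "on the wire" */
    size_t len;                 /* total bytes available */
    size_t pos;                 /* bytes consumed so far */
    size_t chunk;               /* maximum bytes returned per read */
} mock_stream_t;

/* Returns the number of bytes copied, or 0 once the stream is exhausted. */
static int mock_read(mock_stream_t *s, unsigned char *buf, size_t want)
{
    size_t left = s->len - s->pos;
    size_t n = want < s->chunk ? want : s->chunk;
    if (n > left) {
        n = left;
    }
    memcpy(buf, s->data + s->pos, n);
    s->pos += n;
    return (int)n;
}

/* Keeps reading until exactly `want` bytes are collected.
 * Returns 0 on success, -1 if the stream ends early. */
static int read_exact(mock_stream_t *s, unsigned char *buf, size_t want)
{
    size_t got = 0;
    assert(s != NULL && buf != NULL);
    while (got < want) {
        int n = mock_read(s, buf + got, want - got);
        if (n <= 0) {
            return -1;
        }
        got += (size_t)n;
    }
    return 0;
}

/* Reads one unmasked WS frame that uses the 7-bit length form.
 * Returns the payload length, or -1 on error. */
static int read_small_frame(mock_stream_t *s, unsigned char *payload, size_t cap)
{
    unsigned char hdr[2];
    size_t len;
    if (read_exact(s, hdr, sizeof hdr) != 0) {
        return -1;
    }
    len = hdr[1] & 0x7F;
    /* Reject masked frames and the extended length forms in this sketch. */
    if ((hdr[1] & 0x80) || len >= 126 || len > cap) {
        return -1;
    }
    if (read_exact(s, payload, len) != 0) {
        return -1;
    }
    return (int)len;
}
```

A reader that issues a single read for the payload and treats a short result as an error would fail on this stream. The loop in read_exact() is what tolerates the split.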

Thanks again for the support & happy holidays!
& I also believe that IDF CAN should be fixed.

PeterR
Posts: 568
Joined: Mon Jun 04, 2018 2:47 pm

Re: ESP32 mqtt client ws_read() failure

Postby PeterR » Wed Sep 30, 2020 1:28 pm

I had my timeout logic in the wrong place.
Setting:

Code:

timeout_ms = 1000;
after the header has been grabbed also works.

We discussed the timeout at the start of this topic & I think I now understand why you do not want a timeout on header search but you definitely want a timeout after!
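The idea of polling for the header with the caller's timeout but enforcing a minimum wait once the header is in can be sketched as follows. This is a host-side illustration under stated assumptions, not the actual transport code: mock_poll_readable(), read_frame() and PAYLOAD_MIN_TIMEOUT_MS are hypothetical names, arrival times are simulated as plain integers, and only non-negative timeouts are considered.

```c
#include <assert.h>

/* Minimum wait for the payload once a header has been read.
 * 1000 ms is the value mentioned in the post; it is an assumption here. */
#define PAYLOAD_MIN_TIMEOUT_MS 1000

typedef enum { WS_STAGE_HEADER, WS_STAGE_PAYLOAD } ws_stage_t;

/* Header: honour the caller's timeout, which may be 0 for a pure poll.
 * Payload: never wait less than PAYLOAD_MIN_TIMEOUT_MS. */
static int ws_stage_timeout_ms(ws_stage_t stage, int caller_timeout_ms)
{
    assert(caller_timeout_ms >= 0);
    if (stage == WS_STAGE_HEADER) {
        return caller_timeout_ms;
    }
    return caller_timeout_ms < PAYLOAD_MIN_TIMEOUT_MS
               ? PAYLOAD_MIN_TIMEOUT_MS
               : caller_timeout_ms;
}

/* Simulated poll: the data becomes readable `arrives_in_ms` from now. */
static int mock_poll_readable(int arrives_in_ms, int timeout_ms)
{
    return arrives_in_ms <= timeout_ms;
}

/* Returns 1 if a whole frame was read, 0 if no header was pending,
 * and -1 if the header was read but the payload timed out. */
static int read_frame(int header_delay_ms, int payload_delay_ms,
                      int caller_timeout_ms, int use_floor)
{
    int payload_timeout_ms;
    if (!mock_poll_readable(header_delay_ms,
                            ws_stage_timeout_ms(WS_STAGE_HEADER, caller_timeout_ms))) {
        return 0;   /* nothing to read yet; not an error */
    }
    payload_timeout_ms = use_floor
        ? ws_stage_timeout_ms(WS_STAGE_PAYLOAD, caller_timeout_ms)
        : caller_timeout_ms;
    if (!mock_poll_readable(payload_delay_ms, payload_timeout_ms)) {
        return -1;  /* header consumed, payload missing: stream is now out of sync */
    }
    return 1;
}
```

With a caller timeout of 0 and a payload that lands 5 ms after the header, read_frame(0, 5, 0, 0) fails while read_frame(0, 5, 0, 1) succeeds, which matches the behaviour described above.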

Postby PeterR » Tue Oct 06, 2020 9:17 am

Adding a timeout seems to fix it.
There seems to be another websocket issue as detailed here:

ESP-Marius
Posts: 55
Joined: Wed Oct 23, 2019 1:49 am

Re: ESP32 mqtt client ws_read() failure

Postby ESP-Marius » Fri Oct 09, 2020 10:13 am

Mind testing if this works for you? (without any of your timeouts)

Attached a diff
Attachments
diff.txt
(1011 Bytes)

Postby PeterR » Mon Oct 12, 2020 11:26 am

Hi,

The patch does not apply to my IDF: SHA-1: 84b51781c80740fda92784dafcfc96c13b0d8b66
The patch needs to be applied to latest IDF: SHA-1: 8bc19ba893e5544d571a753d82b44a84799b94b1
If I swap over to latest IDF & make -j8 flash then:

Code:

The following Python requirements are not satisfied:
gdbgui==0.13.2.0
pygdbmi<=0.9.0.2
reedsolo>=1.5.3,<=1.5.4
bitstring>=3.1.6
The recommended way to install a packages is via "pacman". Please run "pacman -Ss <package_name>" for searching the package database and if found then "pacman -S mingw-w64-i686-python-<package_name>" for installing it.
NOTE: You may need to run "pacman -Syu" if your package database is older and run twice if the previous run updated "pacman" itself.
Please read https://github.com/msys2/msys2/wiki/Using-packages for further information about using "pacman"
Diagnostic information:
    IDF_PYTHON_ENV_PATH: (not set)
    Python interpreter used: C:/msys32/mingw32/bin/python.exe
    Warning: python interpreter not running from IDF_PYTHON_ENV_PATH
    PATH: C:\msys32\mingw32\bin;C:\msys32\opt\xtensa-esp32-elf\bin;C:\msys32\mingw32\bin;C:\msys32\usr\local\bin;C:\msys32\usr\bin;C:\msys32\usr\bin;C:\Windows\System32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\msys32\usr\bin\site_perl;C:\msys32\usr\bin\vendor_perl;C:\msys32\usr\bin\core_perl
I've been down this path before, and even with a clean MINGW toolchain I was unable to resolve it quickly.

I have a working change (time-out). Happy to try yours though if you make a 84b51781c80740fda92784dafcfc96c13b0d8b66 patch.

Postby PeterR » Thu Oct 22, 2020 10:49 am

@ESP-Marius

Hi,
My timeout logic is not perfect. I still get:

Code:

TRANSPORT_WS: Error read data
TRANSPORT_WS: Error reading payload data
Would you please create your patch for my SHA 84b51781c80740fda92784dafcfc96c13b0d8b66 ?

Postby ESP-Marius » Fri Oct 23, 2020 10:44 am

PeterR wrote:
Thu Oct 22, 2020 10:49 am
@ESP-Marius

Hi,
My timeout logic is not perfect. I still get:

Code:

TRANSPORT_WS: Error read data
TRANSPORT_WS: Error reading payload data
Would you please create your patch for my SHA 84b51781c80740fda92784dafcfc96c13b0d8b66 ?
Hi,

You can try the one I've attached now and see if that applies/helps.
Attachments
ws.diff.txt
(8.36 KiB)

Postby PeterR » Tue Oct 27, 2020 10:32 am

Thanks, that looks good. I will comment again in a couple of weeks when it has been bedded in.
Had a couple of whitespace issues fixed with:

Code:

git apply --whitespace=fix ws.diff.txt
EDIT: PS - would you describe the change please? You modified the ESP MQTT client but packet to frame may be (0..1 : 0..1). Not had a chance to review in detail but interested in what you think was wrong with ESP MQTT client.
I know; gift horse etc....

Postby PeterR » Mon Nov 02, 2020 6:28 pm

Hi,
I think that the patch fixes the fragmentation issue; however, there are other issues behind this.

(1) On occasion I get:

Code:

httpd_ws: httpd_ws_recv_frame: WS frame is not properly masked
This error is generated from my internal MQTT server & (I believe) only as a result of my own MQTT client's PUBLISH.
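For context on the masking error: RFC 6455 requires every client-to-server frame to set the MASK bit (the top bit of the second header byte) and to XOR the payload with a 4-byte masking key. The sketch below shows just that rule in plain C. The function names are hypothetical and this is not the httpd_ws implementation. One consequence worth noting is that a receiver which has lost its place in the byte stream will read an arbitrary byte as "the second header byte", so a masking error can be a symptom of an earlier short read rather than of a genuinely unmasked frame. Whether that is what happens here is only a guess.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WS_MASK_BIT 0x80  /* top bit of the second header byte (RFC 6455) */

/* XORs the payload in place with the 4-byte masking key.
 * The same call masks and unmasks, since XOR is its own inverse. */
static void ws_apply_mask(uint8_t *payload, size_t len, const uint8_t key[4])
{
    size_t i;
    assert(payload != NULL || len == 0);
    for (i = 0; i < len; i++) {
        payload[i] ^= key[i % 4];
    }
}

/* Returns non-zero if the header bytes announce a masked frame. */
static int ws_frame_is_masked(const uint8_t *hdr, size_t hdr_len)
{
    assert(hdr != NULL);
    return hdr_len >= 2 && (hdr[1] & WS_MASK_BIT) != 0;
}
```

Masking "Hello" with the key 37 fa 21 3d gives 7f 9f 4d 51 58, which is the masked text frame example in RFC 6455.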

(2) On occasion I get:

Code:

MQTTS: Session ending for socket: 53
This is my own message and is made from my MQTT server's

Code:

mqtts_set_session_context(session, &on_session_end, newContext);
The log is made for my IDF MQTT client's socket & (even on MQTT_CLIENT verbose) without any other messages.

(3) and very infrequently:
no mem for receive buffer
I have about 35KB data free.
Now (3) seems to be the clue; emac_esp32_rx_task() was unable to allocate memory & pass the packet on. Wonder if MQTT CLIENT is also scratching around for memory.....?

I suspect that (2) is also the result of a memory shortage, and that (1) might be an alternative path/race that is also triggered by a lack of memory (i.e. if the IDF MQTT WS transport sends in sections...).
It is clear that available memory depends on emac_esp32_rx_task() (i.e. its malloc()) and so on network traffic, processing etc.

Ideally an MQTT connection should be maintained. Dropping a Websocket MQTT connection is a big deal for my application & results in an outage of 1 second or so. AJAX would be relatively immune to this issue because, whilst a single request might fail (lack of memory), AJAX is not 'connected'. So if you issue AJAX requests fast enough you'll only see a judder, but you are clearly otherwise limited.

SO: memory indeterminism seems to be the result of emac_esp32_rx_task() & its mallocs. This then gives rise to WS frame errors and some other requests to close the client socket, which I believe emanate from the ESP-IDF MQTT websocket client service.

I wonder if the MQTT client library and websocket transport could just return 'try again' or fail? They certainly could add better failure logging. I suspect that an IDF MQTT client packet send() ends up in multiple parts and that the WS transport might also end up in multiple TCP/IP packets. If instead the MQTT packet send() went out in one section, then we could both report the transport error back & stop an MQTT server thinking that there was a protocol error due to fragmented WS frames & killing the connection (following emac_esp32_rx_task() 'eating all the pies'). There would no longer be any incomplete MQTT packets & so no reason to kick an ESP-IDF MQTT client into the bin!
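The "send in one section, or fail before sending anything" idea can be sketched as building the whole WS frame in a single buffer first. The code below is a minimal illustration under stated assumptions: ws_build_binary_frame() is a hypothetical helper, not an ESP-IDF function. It builds an unmasked binary frame as a server would (a client would also need a masking key), and it skips the 64-bit length form. If the allocation fails, the caller gets NULL and nothing has been put on the wire, so a clean error can be returned instead of leaving a half-sent frame behind.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Builds an unmasked binary WS frame (header + payload) in one heap buffer.
 * Returns NULL if the payload is too large for this sketch or if malloc()
 * fails. On success the caller owns the buffer and must free() it. */
static uint8_t *ws_build_binary_frame(const uint8_t *payload, size_t len,
                                      size_t *out_len)
{
    size_t hdr_len;
    uint8_t *frame;

    assert(out_len != NULL);
    assert(payload != NULL || len == 0);

    if (len > 0xFFFF) {
        return NULL;                 /* 64-bit length form left out */
    }
    hdr_len = (len < 126) ? 2 : 4;   /* 7-bit or 16-bit length form */

    frame = malloc(hdr_len + len);
    if (frame == NULL) {
        return NULL;                 /* fail before anything is sent */
    }

    frame[0] = 0x82;                 /* FIN set, opcode 0x2 (binary) */
    if (len < 126) {
        frame[1] = (uint8_t)len;
    } else {
        frame[1] = 126;              /* marker: 16-bit big-endian length follows */
        frame[2] = (uint8_t)(len >> 8);
        frame[3] = (uint8_t)(len & 0xFF);
    }
    if (len > 0) {
        memcpy(frame + hdr_len, payload, len);
    }
    *out_len = hdr_len + len;
    return frame;
}
```

Handing the resulting buffer to a single send call does not guarantee a single TCP segment, so the receiving side still needs to cope with split reads. It does remove the window in which a header has gone out but the payload cannot follow.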

Assuming WS fragmentation has been fixed then this is just an IDF MQTT client fragmentation issue.

Would be really keen for pointers and/or a patch!

EDIT: PS It's QoS0 after all (& would be hard to achieve better on an embedded server), so just return fail!
EDIT: PPS Bring on common browser uni/multicast support :)
