esp_netif_receive eating all my ram (SLIP protocol) (IDFGH-4141)

fpena_inusual
Posts: 9
Joined: Tue Oct 20, 2020 8:58 am

esp_netif_receive eating all my ram (SLIP protocol) (IDFGH-4141)

Postby fpena_inusual » Tue Oct 20, 2020 10:00 am

Hi there,

We're using SLIP protocol to communicate an esp32 device with a mcu and all seems to be ok, but we're having reboots from time to time depending on the data received by the esp32, caused by an abort on an internall memory allocation, so there's a memory leak somewhere...
We've also included a heap memory tracking task to determine from where the memory leak comes from, and we found our slip reception task is consuming all the memory for some reason...
As the task is pretty simple (no dynamic memory allocation) we determined the problem was wit esp_netif_receive function.
We started from the slip_udp example (examples/protocols/slip/slip_udp/) without much changes, we've only changed the receiving task to make it more robust (managing fifo overtflows, etc...).
¿Any clue about how to solve/trace the issue?

P.S.: Here is our receiving task, in case you wonder what we've changed

Code: Select all

static void esp_slip_modem_uart_rx_task(void *arg) {

        uart_event_t event;

        esp_slip_modem_t *slip_modem = (esp_slip_modem_t *) arg;

        int len;

        while (slip_modem->running == true) {
                if(xQueueReceive(slip_modem->uart.uart_queue, (void * )&event, 1)) {
                        switch(event.type) {
                                case UART_DATA:
                                        // Read data from the UART
                                        len = uart_read_bytes(slip_modem->uart.uart_dev, slip_modem->buffer, slip_modem->buffer_len, 1);
                                        if (len > 0) {
                                                // Ensure null termination
                                                slip_modem->buffer[len] = '\0';
                                                esp_netif_receive(slip_modem->base.netif, slip_modem->buffer, len, NULL);
                                        }
                                        if (len < 0) {
                                                // Asume error and clear everything
                                                uart_flush_input(slip_modem->uart.uart_dev);
                                                xQueueReset(slip_modem->uart.uart_queue);
                                        }
                                        break;
                                case UART_FIFO_OVF:
                                case UART_BUFFER_FULL:
                                        uart_flush_input(slip_modem->uart.uart_dev);
                                        xQueueReset(slip_modem->uart.uart_queue);
                                        break;
                                case UART_FRAME_ERR:
                                default:
                                        break;
                         }
                } else {
                        vTaskDelay(pdMS_TO_TICKS(1));
                }
        }
        vTaskDelete(NULL);
}

P.S.2: Here is the error message it give us

Code: Select all

Memory allocation failed

Backtrace:0x4008e9e6:0x3ffd0d60 0x4008f049:0x3ffd0d80 0x400f60ab:0x3ffd0da0 0x4008565b:0x3ffd0dc0 0x40085738:0x3ffd0df0 0x40085b5a:0x3ffd0e10 0x40085b99:0x3ffd0e50 0x40151c24:0x3ffd0e70 0x40151cc2:0x3ffd0e90 0x40151d30:0x3ffd0eb0 0x400941f4:0x3ffd0ed0 0x4015c3fb:0x3ffd0ef0 0x4015c4c7:0x3ffd0f10 0x4015c623:0x3ffd0f30 0x4014ec65:0x3ffd0f50 0x4017d571:0x3ffd0f70 0x400f7065:0x3ffd0f90
0x4008e9e6: panic_abort at /home/desarrollo/devel/esp32/esp-idf/components/esp_system/panic.c:360

0x4008f049: esp_system_abort at /home/desarrollo/devel/esp32/esp-idf/components/esp_system/system_api.c:104

0x400f60ab: heap_caps_alloc_failed at /home/desarrollo/devel/esp32/esp-idf/components/heap/heap_caps.c:67

0x4008565b: heap_caps_malloc at /home/desarrollo/devel/esp32/esp-idf/components/heap/heap_caps.c:155

0x40085738: heap_caps_malloc_default at /home/desarrollo/devel/esp32/esp-idf/components/heap/heap_caps.c:187

0x40085b5a: trace_malloc at /home/desarrollo/devel/esp32/esp-idf/components/heap/include/heap_trace.inc:95

0x40085b99: __wrap_malloc at /home/desarrollo/devel/esp32/esp-idf/components/heap/include/heap_trace.inc:157

0x40151c24: mem_malloc at /home/desarrollo/devel/esp32/esp-idf/components/lwip/lwip/src/core/mem.c:237

0x40151cc2: do_memp_malloc_pool at /home/desarrollo/devel/esp32/esp-idf/components/lwip/lwip/src/core/memp.c:254

0x40151d30: memp_malloc at /home/desarrollo/devel/esp32/esp-idf/components/lwip/lwip/src/core/memp.c:350 (discriminator 2)

0x400941f4: pbuf_alloc at /home/desarrollo/devel/esp32/esp-idf/components/lwip/lwip/src/core/pbuf.c:247

0x4015c3fb: slipif_rxbyte at /home/desarrollo/devel/esp32/esp-idf/components/lwip/lwip/src/netif/slipif.c:265 (discriminator 4)

0x4015c4c7: slipif_rxbyte_enqueue at /home/desarrollo/devel/esp32/esp-idf/components/lwip/lwip/src/netif/slipif.c:494

0x4015c623: slipif_received_bytes at /home/desarrollo/devel/esp32/esp-idf/components/lwip/lwip/src/netif/slipif.c:555 (discriminator 3)

0x4014ec65: esp_netif_lwip_slip_input at /home/desarrollo/devel/esp32/esp-idf/components/esp_netif/lwip/esp_netif_lwip_slip.c:165 (discriminator 3)

0x4017d571: esp_netif_receive at /home/desarrollo/devel/esp32/esp-idf/components/esp_netif/lwip/esp_netif_lwip.c:803

0x400f7065: esp_slip_modem_uart_rx_task at /home/desarrollo/devel/esp32/eth2slip/main/src/driver/slip_modem.c:221

ESP_Alvin
Posts: 195
Joined: Thu May 17, 2018 2:26 am

Re: esp_netif_receive eating all my ram (SLIP protocol) (IDFGH-4141)

Postby ESP_Alvin » Tue Oct 20, 2020 11:30 am

Moderator's note: edit the topic title for issue tracking, thanks for reporting.

fpena_inusual
Posts: 9
Joined: Tue Oct 20, 2020 8:58 am

Re: esp_netif_receive eating all my ram (SLIP protocol) (IDFGH-4141)

Postby fpena_inusual » Wed Oct 21, 2020 10:53 am

Hello again,

I have more information on this.
It seems to occur only when the received datagram is fragmented, that is when the actual datagram is bigger than the interface MTU and it's chopped by the ip layer. The memory leak is 1520 bytes per datagram.
I assumed the slip MTU is 1500 bytes, so I tried to ping to the device with up to 1472 bytes (1500 bytes (MTU) - 20 byte (IP header) - 8 byte (ICMP header)) and it works fine, no memory leak so far.
If I try to ping with more than 1472 bytes I have no response from the device and it starts to consume the memory, 1520 bytes per lost packet as I mentioned.

Any idea how I can overcome this?

ESP_cermak
Posts: 69
Joined: Thu Nov 01, 2018 8:32 am

Re: esp_netif_receive eating all my ram (SLIP protocol) (IDFGH-4141)

Postby ESP_cermak » Thu Oct 22, 2020 4:34 pm

Hi fpena_inusual,

One possible reason could be that the slip netif doesn't support packets longer than 256 bytes, as per limitation in lwIP (len parameter is uint8_t). There a fix ready for internal review: https://github.com/espressif/esp-idf/pull/5928, Could you please check if that helps?

fpena_inusual
Posts: 9
Joined: Tue Oct 20, 2020 8:58 am

Re: esp_netif_receive eating all my ram (SLIP protocol) (IDFGH-4141)

Postby fpena_inusual » Fri Oct 23, 2020 9:15 am

Ok, patch applied, but sadly it doesn't fix this issue.

Thanks anyway

ESP_cermak
Posts: 69
Joined: Thu Nov 01, 2018 8:32 am

Re: esp_netif_receive eating all my ram (SLIP protocol) (IDFGH-4141)

Postby ESP_cermak » Mon Oct 26, 2020 5:17 pm

Would be it acceptable to workaround this by increasing the MTU size here? https://github.com/espressif/esp-lwip/b ... ipif.c#L79
I might be wrong, but it looks like the extra bytes are simply dropped on netif level (here in slipif.c) before getting fragmented.

Please note that it is possible to add target compile flags to other components in your project makefile:

Code: Select all

idf_component_get_property(lwip lwip COMPONENT_LIB)
target_compile_definitions(${lwip} PRIVATE "-DSLIP_MAX_SIZE=1600")

fpena_inusual
Posts: 9
Joined: Tue Oct 20, 2020 8:58 am

Re: esp_netif_receive eating all my ram (SLIP protocol) (IDFGH-4141)

Postby fpena_inusual » Tue Oct 27, 2020 9:24 am

@ESP_cermak
It would be acceptable, but I think that fragmentation is only part of the problem.
The memory leak occurs in many other situations, it's just when you try to ping with a data packet bigger than the MTU it goes wild.
Even connecting to a http server increases the memory consumption.

fpena_inusual
Posts: 9
Joined: Tue Oct 20, 2020 8:58 am

Re: esp_netif_receive eating all my ram (SLIP protocol) (IDFGH-4141)

Postby fpena_inusual » Tue Oct 27, 2020 1:07 pm

Hello again...

I think I have something, but I don't know how interpret it.
Here it is the capture of a HTTP transaction: https://pastebin.com/raw/qz8GpB3d
As you can see, there's a retransmission packet, and all packets are below 1500 bytes long
From my tests, every time there's a retransmission (or any other error, I guess) the memory leak is incremented in 1520 bytes...

fpena_inusual
Posts: 9
Joined: Tue Oct 20, 2020 8:58 am

Re: esp_netif_receive eating all my ram (SLIP protocol) (IDFGH-4141)

Postby fpena_inusual » Tue Oct 27, 2020 5:29 pm

Ok, the problem seems to be with lwip itself...
I wrote a new driver only using the lwip basic functions bypassing the ESP-NETIF layer (I had to comment the sio functions from the idf code) and it fails in the same way...
I think the lwip code for slip is broken in some way.

fpena_inusual
Posts: 9
Joined: Tue Oct 20, 2020 8:58 am

Re: esp_netif_receive eating all my ram (SLIP protocol) (IDFGH-4141)

Postby fpena_inusual » Wed Oct 28, 2020 12:18 pm

Ok, another little update, with a very small improvement...

I changed my reception task to this:

Code: Select all

static void esp_slip_modem_uart_rx_task(void *arg) {

        uart_event_t event;

        esp_slip_modem_t *slip_modem = (esp_slip_modem_t *) arg;

        int len;

        while (slip_modem->running == true) {
                if(xQueueReceive(slip_modem->uart.uart_queue, (void * )&event, pdMS_TO_TICKS(65))) {
                        switch(event.type) {
                                case UART_DATA:
                                        // Read data from the UART
                                        memset((void *)slip_modem->buffer, 0, slip_modem->buffer_len);
                                        len = uart_read_bytes(slip_modem->uart.uart_dev, slip_modem->buffer, slip_modem->buffer_len, 0);
                                        if (len > 0) {
                                                const int max_batch = 255;
                                                int sent = 0;
                                                while(sent < len) {
                                                        int batch = (len - sent) > max_batch ? max_batch : (len - sent);
                                                        esp_netif_receive(slip_modem->base.netif, slip_modem->buffer+sent, batch, NULL);
                                                        sent += batch;
                                                }
                                        } else if (len < 0) {
                                                // Asume error and clear everything
                                                ESP_LOGE(TAG, "Receiving error (len < 0)!!!");
                                                uart_flush_input(slip_modem->uart.uart_dev);
                                                xQueueReset(slip_modem->uart.uart_queue);
                                        }
                                        break;
                                case UART_FRAME_ERR:
                                case UART_BUFFER_FULL:
                                case UART_FIFO_OVF:
                                        ESP_LOGE(TAG, "Receiving error!!!");
                                        uart_flush_input(slip_modem->uart.uart_dev);
                                        xQueueReset(slip_modem->uart.uart_queue);
                                        break;
                                default:
                                        break;
                         }
                } else {
                        vTaskDelay(pdMS_TO_TICKS(65));
                }
        }
        vTaskDelete(NULL);
}
As you can see I decided not to patch the idf but pass the received data in small chuncks to esp_netif_receive (just what the patch is supposed to do).
Now I can ping with data bigger than the MTU and it will reply without incrementing the memory usage (in fact it does, but it's freed afterwards)...
But it still fails with regular http transactions...

Any hint on how I can track the problem (I'm not familiar with lwip inners)?

P.S.: I'm sorry for being so boring, but our product depends on this to be working and if doesn't work we will lose a 500 units contract (two esp32 devices on each unit) just on the first batch (a lot more planned)... so it's quite important to us :P

Who is online

Users browsing this forum: Baidu [Spider] and 103 guests