SOLVED: NVS data set or erase FAILS; NVS becomes read-only

jcolebaker
Posts: 60
Joined: Thu Mar 18, 2021 12:23 am

SOLVED: NVS data set or erase FAILS; NVS becomes read-only

Postby jcolebaker » Sun Jun 12, 2022 9:42 pm

We use the "Non Volatile Storage" library to store user registrations and other application settings in our ESP32 project (See https://docs.espressif.com/projects/esp ... flash.html).

We have run into an intermittent problem where nvs_set_* functions start to fail: They do not return any error code, but the new value has NOT been added to the NVS. This effects updating an existing key (value doesn't change) and adding a new key (key not added; dump of all keys shows only keys which already existed).

Here is my "set" code to store a value:

Code: Select all

    // Initialisation:
    static const char *NVS_NAMESPACE = "wc_nvs";
    esp_err_t result = nvs_flash_init();
    if (result == ESP_ERR_NVS_NO_FREE_PAGES || result == ESP_ERR_NVS_NEW_VERSION_FOUND)
    {
        ESP_ERROR_CHECK( nvs_flash_erase() );
        result = nvs_flash_init();
    }
    ESP_ERROR_CHECK( result );
    ESP_ERROR_CHECK( nvs_open(NVS_NAMESPACE, NVS_READWRITE, &l_nvs_handle) );

...
    // Write a key / value:
    ESP_ERROR_CHECK( nvs_set_blob(l_nvs_handle, key, data, size) );
    ESP_ERROR_CHECK( nvs_commit(l_nvs_handle) );
Once the problem has occurred, no further writes using the NVS library work, although reading values is OK. The problem persists after a power cycle.

I have a "factory reset" triggered by a button press, but this does not fix the problem. Here is the "factory reset" code and log output, which demonstrate the issue:

Code: Select all

static void OnStandbyButtonHold(void)
{
    ESP_LOGI(TAG, "************* FACTORY RESET *************");
    WcNvsEraseFlash();
    WC_EXIT_ALWAYS();
}
...
void WcNvsEraseFlash(void)
{
    nvs_close(l_nvs_handle);

    ESP_LOGI(TAG, "ERASE NVS partition...");
    ESP_ERROR_CHECK( nvs_flash_erase() );
    ESP_LOGI(TAG, "NVS Erase finished.");

    // After an erase, the flash is de-initialised. Need to initialise and re-open:
    ESP_ERROR_CHECK( nvs_flash_init() );
    ESP_ERROR_CHECK( nvs_open(NVS_NAMESPACE, NVS_READWRITE, &l_nvs_handle) );

    // ***** Debug flash erase issue: print flash contents - should be empty! **********
    DebugPrintKeys();
}
"DebugPrintKeys" is a debug function which iterates through all keys in our nvs name space and prints the key names to the log. It should print "No keys found" after a nvs_flash_erase, which it does when things are working correctly.

After the error has occurred, the log shows the following for a factory reset:

Code: Select all

I (2360495) WCUI: ************* FACTORY RESET *************
I (2360495) NVSC: ERASE NVS partition...
I (2360495) NVSC: NVS Erase finished.
I (2360525) NVSC: NVS Keys:
I (2360525) NVSC: Key: 'name_zone_1'; type 66
I (2360525) NVSC: Key: 'name_zone_2'; type 66
I (2360535) NVSC: Key: 'name_zone_3'; type 66
I (2360535) NVSC: Key: 'name_zone_4'; type 66
I (2360545) NVSC: Key: 'name_zone_5'; type 66
I (2360545) NVSC: Key: 'name_zone_6'; type 66
I (2360555) NVSC: Key: 'serial_engr'; type 33
I (2360565) NVSC: Key: '01_reg_data'; type 66
I (2360565) NVSC: Key: '02_reg_data'; type 66
I (2360575) NVSC: Key: 'use_stand_ble'; type 4
I (2360575) NVSC: Key: 'use_target_ble'; type 4
I (2360585) NVSC: Key: 'name_engr'; type 66
I (2360585) NVSC: NVS Stats: used entries 104; free entries 904; all entries 1008
I.e. All the previously stored keys are still there (not erased). After rebooting, attempts to read the nvs keys confirm that they are still there, and the bug persists (adding / updating values fails).

The only way I've found to restore the correct operation of the NVS is to perform a full flash erase using the boot loader mode, i.e.:

esptool.py -p com4 erase_flash

Then reinstall the firmware. This does work.

I've tried uploading and analysing the content of the NVS partition from a device with the bug, in case it was corrupt. The content looks OK (based on the documentation for the NVS library) and I can see the stored namespaces, keys and data. Note that a "factory reset" as above had been performed, so all the entries SHOULD have been erased, but they appeared to be still intact.

Also, I was able to download the nvs partition data from the failed board to another board using "write_flash..." command. This worked OK - the original keys and values from the failed board were there, and I was able to update and erase as normal.

Any suggestions? I'm pretty stuck because I can't replicate the bug; it's intermittent, i.e. only happens when the device is given to the testers! However it has now happened 3 or 4 times and renders the device useless until a flash erase is done.

ESP-IDF 4.3; ESP32-WROOM module, chip revision: 3.
Last edited by jcolebaker on Mon Jun 27, 2022 2:52 am, edited 1 time in total.

ESP_igrr
Posts: 2067
Joined: Tue Dec 01, 2015 8:37 am

Re: NVS data set or erase FAILS; NVS becomes read-only

Postby ESP_igrr » Mon Jun 13, 2022 2:52 pm

Could you please open an issue at https://github.com/espressif/esp-idf/issues and attach the NVS partition binary for us to analyze? If you can share the code which could be used to reproduce the issue, it is even better!
Thanks in advance.

chegewara
Posts: 2228
Joined: Wed Jun 14, 2017 9:00 pm

Re: NVS data set or erase FAILS; NVS becomes read-only

Postby chegewara » Mon Jun 13, 2022 10:36 pm

I saw your issue on github, but it is still missing the most important part, which is partition table.
Here is definition of function, and if your partition name is different than set in sdkconfig or you have more than 1 with the same name it may not erase the partition you want

Code: Select all

extern "C" esp_err_t nvs_flash_erase(void)
{
    return nvs_flash_erase_partition(NVS_DEFAULT_PART_NAME);
}

jcolebaker
Posts: 60
Joined: Thu Mar 18, 2021 12:23 am

Re: NVS data set or erase FAILS; NVS becomes read-only

Postby jcolebaker » Tue Jun 14, 2022 12:20 am

@chegewara I've attached a zip to the github issue with our custom partition table (the source CSV and also the generated bin file).

I don't think this is the problem because everything works fine most of the time; the problem is intermittent, although one of the testers has now had it happen about 4 times during testing.

I also wondered whether the partition table could have been corrupted in the flash. However, I uploaded it from a device with the error (see flash.bin in flash_data.zip, which contains the first 128 kB of the flash) and did a byte-by-byte compare with the original bin file, and they are identical.

chegewara
Posts: 2228
Joined: Wed Jun 14, 2017 9:00 pm

Re: NVS data set or erase FAILS; NVS becomes read-only

Postby chegewara » Tue Jun 14, 2022 12:38 am

This may be a bug and may be related to size of "nvs" partition, but its only theory and needs to be confirmed by espressif.
I never tested or saw nvs data partition this size (32kB), and full erase is a bit slow. Maybe you could try "nvs_erase_all" instead?

jcolebaker
Posts: 60
Joined: Thu Mar 18, 2021 12:23 am

Re: NVS data set or erase FAILS; NVS becomes read-only

Postby jcolebaker » Tue Jun 14, 2022 7:55 pm

I have considered trying "nvs_erase_all", but that would not help with the original problem - which is that we can't add or update values in NVS with "nvs_set_...".

Also, looking at the implementation of "nvs_flash_erase", it ultimately calls "esp_partition_erase_range" in the spi_flash component. It is concerning that basic spi_flash functions seem to failing (without returning any error code). "esp_partition_erase_range" is independent of the NVS library and should only require that the partition table be valid (which it does seem to be).

jcolebaker
Posts: 60
Joined: Thu Mar 18, 2021 12:23 am

Re: NVS data set or erase FAILS; NVS becomes read-only

Postby jcolebaker » Mon Jun 27, 2022 2:50 am

SOLVED:

We are now 99% sure this problem is actually caused by a hardware design issue with our USB programming board.

During the condition where the main power supply to the board is OFF but the programmer is still connected to USB, we see stray voltages on the pins connected to the ESP32, including the MTDI pin. This pin has an internal pull down in the ESP, but it is not active until after power is applied. Where it has floated high during power down, it looks like the pin is not pulled low fast enough at power on, and gets read as "1". This causes the VDD_SDIO to be set to 1.8v instead of 3.3v. We observe that the pin strapping shown by the bootloader is 0x33 or 0x3f instead of 0x13. The condition persists a software reboot, and our testers were usually experiencing a recurrence of the problem if they cycled the power. However, reprogramming the board would reset it until some future power interruption caused it to recur.

The problem did not happen on our original boards because the flash still worked correctly even at the lower voltage. However, more recent boards have a later ESP module with a different flash chip (Manufacturer: c8; Device: 4017). I have now confirmed that on these boards, the flash can be read but cannot be written when VDD_SDIO has been set to 1.8v.

I have worked around the problem on our dev boards by running this espefuse command to set the VDD_SDIO to 3.3v via the e-fuses:

Code: Select all

python path\to\esp\tools\espefuse.py -p comX set_flash_voltage 3.3V
Finally, a bug I can blame on the hardware designer...

Who is online

Users browsing this forum: ESP_Roland, hugerobber, SvanteKaiser and 92 guests