spi_flash_read() behavior when flash is going bad.

Inquisitor
Posts: 22
Joined: Thu Dec 14, 2017 10:53 am
Contact:

Post by Inquisitor » Tue Sep 28, 2021 3:03 pm

I am writing a class to do wear leveling within a flash sector. I see that spi_flash_read() and spi_flash_write() return a SpiFlashOpResult, which can signal an error or a timeout. I am also considering having the class avoid writing to regions of flash that have gone bad. Can someone describe...

  1. What causes SPI_FLASH_RESULT_ERR?
  2. What causes SPI_FLASH_RESULT_TIMEOUT?
  3. As flash wears out, what happens when a region becomes faulty? Does the call return one of the above errors, or does the ESP8266 reboot?

Thanks.
3 lines of code = App/Web Server w/ GUI Admin, File Mngr, OTA, AP Mgr, Perf Metrics, WebSocket Comms, App API, All running on ESP8266... Even usable on ESP-01S
https://www.esp8266.com/viewtopic.php?f=11&t=23535
https://InqOnThat.com

Her Mary
Posts: 537
Joined: Mon Oct 27, 2014 11:09 am

Re: spi_flash_read() behavior when flash is going bad.

Post by Her Mary » Fri Oct 08, 2021 2:29 pm

Could this example help? https://github.com/espressif/esp-idf/tr ... _levelling


Post by Inquisitor » Sat Oct 09, 2021 8:57 pm

Her Mary wrote: Could this example help? https://github.com/espressif/esp-idf/tr ... _levelling


Thanks for your reply.
I looked through the example you referenced and searched for details on its one API call (esp_vfs_fat_spiflash_mount). It implements a wear-leveling scheme but gives no details about the error conditions it handles. Creating a wear-leveling class is fairly trivial: (1) write through the whole sector, then (2) erase it and start again, or move to another sector. The difficult part is handling the error conditions above.
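
For readers following along, here is a minimal sketch of the "write through the whole sector, then erase and start again" scheme described above. It is not the code from my class and not anything from the SDK: the flash is a plain RAM array so the logic can be tried on a PC, and every name in it (sim_sector, sim_erase, sim_write, wl_state, wl_append, wl_latest, SLOT_SIZE) is made up for illustration. On a real ESP8266 the sim_* helpers would be replaced by spi_flash_erase_sector() and spi_flash_write(), which as far as I know expect 4-byte aligned addresses and sizes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 4096u                 /* 4 KB erase unit */
#define SLOT_SIZE   16u                   /* hypothetical record size (multiple of 4) */
#define SLOT_COUNT  (SECTOR_SIZE / SLOT_SIZE)

/* RAM stand-in for one flash sector (assumption: host-side simulation only). */
static uint8_t  sim_sector[SECTOR_SIZE];
static uint32_t sim_erase_count;

/* Erasing sets every byte to 0xFF, as NOR flash does. */
void sim_erase(void) {
    memset(sim_sector, 0xFF, SECTOR_SIZE);
    sim_erase_count++;
}

/* Programming can only clear bits (1 -> 0), so the data is ANDed in. */
void sim_write(uint32_t off, const uint8_t *src, uint32_t len) {
    assert(off + len <= SECTOR_SIZE);
    for (uint32_t i = 0; i < len; i++) {
        sim_sector[off + i] &= src[i];
    }
}

typedef struct {
    uint32_t next_slot;                   /* index of the next unused slot */
} wl_state;

/* Start from a freshly erased sector. */
void wl_init(wl_state *s) {
    sim_erase();
    s->next_slot = 0;
}

/* Append one record. The sector is erased only after every slot has been
 * used once, so one erase is spread over SLOT_COUNT writes.
 * Returns the slot index that was written. */
uint32_t wl_append(wl_state *s, const uint8_t rec[SLOT_SIZE]) {
    if (s->next_slot == SLOT_COUNT) {
        sim_erase();
        s->next_slot = 0;
    }
    uint32_t slot = s->next_slot++;
    sim_write(slot * SLOT_SIZE, rec, SLOT_SIZE);
    return slot;
}

/* Most recent record, or NULL if nothing has been written since the last erase. */
const uint8_t *wl_latest(const wl_state *s) {
    if (s->next_slot == 0) {
        return NULL;
    }
    return &sim_sector[(s->next_slot - 1) * SLOT_SIZE];
}
```

With 16-byte records, a 4 KB sector takes 256 writes per erase. The "move to another sector" variant would only change what happens in the branch where the sector is full.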

Turns out those error conditions never occur, at least not in my testing. I set up a test program to abuse a sector, hoping to catch them. After many hundreds of thousands of erase/read/write cycles, the whole ESP8266 board simply fails without ever returning one of those errors. I have unfortunately destroyed two ESP01s with that test. In other words, the three API methods always returned SPI_FLASH_RESULT_OK. There is no hint of the board's impending doom.
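
Since the return code stayed at SPI_FLASH_RESULT_OK throughout, one thing a wear-leveling class could do instead is read every write back and compare. The sketch below is only an illustration of that idea, run against a simulated sector. The fault model (a bit that always reads back as 0) is an assumption and not something I observed, and all names (op_result, sim_read, sim_inject_stuck_bit, write_verified) are invented. On the device, sim_write() and sim_read() would be spi_flash_write() and spi_flash_read().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 4096u

/* Local stand-in for the three SpiFlashOpResult outcomes (OK, error, timeout). */
typedef enum { RESULT_OK, RESULT_ERR, RESULT_TIMEOUT } op_result;

static uint8_t sim_sector[SECTOR_SIZE];     /* simulated flash contents */
static uint8_t sim_stuck_low[SECTOR_SIZE];  /* hypothetical fault: bits that always read 0 */

/* Mark bits at one offset as worn out (simulation only). */
void sim_inject_stuck_bit(uint32_t off, uint8_t mask) {
    assert(off < SECTOR_SIZE);
    sim_stuck_low[off] |= mask;
}

op_result sim_erase(void) {
    memset(sim_sector, 0xFF, SECTOR_SIZE);
    return RESULT_OK;
}

/* Always reports OK, mirroring what the abuse test saw. */
op_result sim_write(uint32_t off, const uint8_t *src, uint32_t len) {
    for (uint32_t i = 0; i < len; i++) {
        sim_sector[off + i] &= src[i];
    }
    return RESULT_OK;
}

/* Reads apply the fault mask, so a worn bit comes back as 0. */
op_result sim_read(uint32_t off, uint8_t *dst, uint32_t len) {
    for (uint32_t i = 0; i < len; i++) {
        dst[i] = (uint8_t)(sim_sector[off + i] & (uint8_t)~sim_stuck_low[off + i]);
    }
    return RESULT_OK;
}

/* Write, read back, compare. True only if the data survived the round trip,
 * regardless of what the return codes claimed. */
bool write_verified(uint32_t off, const uint8_t *src, uint32_t len) {
    uint8_t check[64];
    assert(len <= sizeof check && off + len <= SECTOR_SIZE);
    if (sim_write(off, src, len) != RESULT_OK) {
        return false;
    }
    if (sim_read(off, check, len) != RESULT_OK) {
        return false;
    }
    return memcmp(check, src, len) == 0;
}
```

A class built on this could skip any slot where write_verified() returns false. Whether real worn flash fails in a way this would catch before the board dies is exactly the open question in this thread.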

From those two tests, I did note spi_flash_read() slowing down over the life of the flash. I'm now running a third ESP01 across many sectors to try to quantify that "aging". I suspect Espressif was trying to do the same thing by returning SPI_FLASH_RESULT_TIMEOUT when an operation takes longer than a certain duration. It's what I would (will) do. Unfortunately, their internal limit is too long: the ESP01 board becomes unusable before the flash on a current production ESP01 reaches that "age".
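
A rough sketch of what such a self-imposed "age" check could look like: record a baseline from the first few read durations, then flag any later read that is slower than the baseline by some factor. The numbers (8 baseline samples, 1.5x threshold) are placeholders and not measured values, and the names (aging_tracker, aging_init, aging_is_slow) are invented for this example. On the ESP8266 the duration would come from bracketing spi_flash_read() with two calls to system_get_time(), which returns microseconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BASELINE_SAMPLES 8u    /* hypothetical: reads averaged to form the baseline */
#define SLOW_FACTOR_PCT  150u  /* hypothetical: flag reads slower than 1.5x baseline */

typedef struct {
    uint32_t baseline_sum_us;  /* running sum while the baseline is collected */
    uint32_t samples;          /* baseline samples seen so far */
    uint32_t baseline_us;      /* average duration of a "young" read */
} aging_tracker;

void aging_init(aging_tracker *t) {
    assert(t != 0);
    t->baseline_sum_us = 0;
    t->samples = 0;
    t->baseline_us = 0;
}

/* Feed one measured read duration in microseconds.
 * Returns false while the baseline is still being collected, and afterwards
 * true only when this read exceeds SLOW_FACTOR_PCT percent of the baseline. */
bool aging_is_slow(aging_tracker *t, uint32_t duration_us) {
    if (t->samples < BASELINE_SAMPLES) {
        t->baseline_sum_us += duration_us;
        t->samples++;
        if (t->samples == BASELINE_SAMPLES) {
            t->baseline_us = t->baseline_sum_us / BASELINE_SAMPLES;
        }
        return false;
    }
    /* 64-bit math avoids overflow when scaling by the percentage. */
    return (uint64_t)duration_us * 100u > (uint64_t)t->baseline_us * SLOW_FACTOR_PCT;
}
```

The baseline would have to be stored somewhere that survives reboots for this to track aging over the life of the board. The right threshold depends on the data from the third test board.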

I'd like to find more technical details about flash memory behavior (irrespective of its use on an ESP8266 board). It currently doesn't make sense to me why a failure in one sector makes the whole flash unusable and thus causes the ESP8266 processor to fail. I would have thought that simply avoiding the failed portion of memory, or if necessary the full sector, would be sufficient.
