I need to fill the FIFO as fast as possible, so I ended up using "pure" asm to squeeze the most out of it. I'm using a "plain" RAM-to-FIFO copy and the CCOUNT register to count the cycles needed to complete it.
Here is the "main" code snippet (tmr holds the cycle count at the end):
Code:
uint32 spiFIFO[16];
uint32 data, tmr; // data: scratch register for the moves, tmr: cycle count result
asm volatile (
"l32i.n %0, %2, 0 \r\n" // this is here to preload the registers with correct address so to count only cycles needed to actually transfer data from RAM to FIFO
"l32i.n %0, %3, 0 \r\n"
"rsr.ccount %1 \r\n"
"l32i.n %0, %2, 0 \r\n" // this line and
"s32i.n %0, %3, 0 \r\n" // this one are a "move pair", moving one DWORD from RAM to FIFO
"l32i.n %0, %2, 4 \r\n"
"s32i.n %0, %3, 4 \r\n"
"l32i.n %0, %2, 8 \r\n"
"s32i.n %0, %3, 8 \r\n"
"l32i.n %0, %2, 12 \r\n"
"s32i.n %0, %3, 12 \r\n"
... // "pairs" are repeated for various tests
"rsr.ccount %0 \r\n"
"sub %1, %0, %1 \r\n"
: "=&r"(data),"=&r"(tmr):"r"(spiFIFO),"r"(SPI_W0(ESP_SPI_HSPI)));
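Since the move pairs are retyped by hand for each test, one way to vary the count is to build that part of the asm text from string-literal macros. This is only a sketch: MOVE_PAIR, MOVE_PAIRS_4, move_pairs_8 and count_lines are names made up for illustration and are not part of the code above. The block only builds the text so it can be inspected on a PC; on the ESP the macro expansion would go between the two rsr.ccount lines of the asm statement.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the original code): expands to one
   load/store "move pair" for byte offset `off`, using the same operand
   numbering as the asm statement above (%0 scratch, %2 RAM, %3 FIFO). */
#define MOVE_PAIR(off)               \
    "l32i.n %0, %2, " #off " \r\n"   \
    "s32i.n %0, %3, " #off " \r\n"

/* Four consecutive pairs, to keep longer sequences readable. */
#define MOVE_PAIRS_4(a, b, c, d) \
    MOVE_PAIR(a) MOVE_PAIR(b) MOVE_PAIR(c) MOVE_PAIR(d)

/* Example: the text for an 8-pair test (offsets 0..28). */
static const char move_pairs_8[] =
    MOVE_PAIRS_4(0, 4, 8, 12)
    MOVE_PAIRS_4(16, 20, 24, 28);

/* Counts the instruction lines in a generated asm string. */
static size_t count_lines(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        if (*s == '\n') {
            ++n;
        }
    }
    return n;
}
```

With this, count_lines(move_pairs_8) gives 16, i.e. 8 loads and 8 stores, which is the instruction count to compare the measured cycles against.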
So, here is the strange thing:
5 pairs = 11 cycles (10 l32i/s32i + rsr) which looks great
8 pairs = 17 cycles again great
10 pairs = 28 cycles slowing down - 1.4 cy/inst
12 pairs = 51 cycles
16 pairs = 64 cycles - 2 times slower?!
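To make the slowdown easier to compare, the arithmetic behind these numbers can be sketched as below. It assumes the best case is one cycle per instruction (2 per pair plus the closing rsr.ccount), which matches the 5 and 8 pair results; the function and struct names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Measurements copied from the list above. */
struct measurement {
    int pairs;
    int cycles;
};

static const struct measurement measured[] = {
    {5, 11}, {8, 17}, {10, 28}, {12, 51}, {16, 64},
};
static const size_t measured_count = sizeof measured / sizeof measured[0];

/* Assumed best case: one cycle per instruction, i.e. one l32i.n and one
   s32i.n per pair, plus the final rsr.ccount. */
static int ideal_cycles(int pairs)
{
    assert(pairs > 0);
    return 2 * pairs + 1;
}

/* Cycles spent beyond the assumed one-cycle-per-instruction best case. */
static int extra_cycles(int pairs, int cycles)
{
    return cycles - ideal_cycles(pairs);
}

/* Average cycles per l32i.n/s32i.n instruction (the "cy/inst" figure). */
static double cycles_per_move_inst(int pairs, int cycles)
{
    assert(pairs > 0);
    return (double)cycles / (2.0 * pairs);
}
```

For the five measurements this gives 0, 0, 7, 26 and 31 extra cycles, so the penalty first appears between 8 and 10 pairs and grows fastest between 10 and 12 pairs.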
WHY?? Or better - how to avoid that?
It seems like some sort of "cache overrun" but the statement "ESP has no caches" is everywhere in the forum(s)...
Reading the FIFO register (or GPIO) is "constantly slow": 12 cycles for a single l32i read instruction.
Any hint?
regards,
mculibrk