Memory mapped GPIO access and memw

Post by apuder » Thu Jan 14, 2021 6:48 pm


I looked at the generated assembly for a simple memory mapped GPIO access:

GPIO.out_w1ts = SOME_VAL;
Here is the assembly I see:

l32r    a9, SOME_VAL
l32r    a8, GPIO
memw
s32i.n  a9, a8, 8
I have two questions about the memw instruction. First, why does it come before the s32i instruction? I would have expected it after the store. My main question, though, is whether memw is needed at all. I assume the compiler emits it because GPIO is declared volatile, but memw can also introduce noticeable delays. Is memw necessary for this type of access? Is the memory-mapped GPIO hardware behind a cache? What would happen if I removed the memw instruction?
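For reference, the kind of declaration that produces such a store can be sketched on a host machine. This is only an illustration: `fake_gpio_dev_t` and `set_pins` are hypothetical names, and the struct is a cut-down stand-in, not the real ESP-IDF header. GCC's Xtensa backend documents a `-mserialize-volatile` option (on by default) that inserts memw before volatile memory references, which is consistent with the assumption that volatile is what triggers it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down register block (NOT the real ESP-IDF header).
   Every field is volatile, so the compiler must perform each access. */
typedef struct {
    volatile uint32_t reserved0; /* byte offset 0 */
    volatile uint32_t out;       /* byte offset 4 */
    volatile uint32_t out_w1ts;  /* byte offset 8: write 1 to set */
    volatile uint32_t out_w1tc;  /* byte offset 12: write 1 to clear */
} fake_gpio_dev_t;

/* The store in the listing above uses offset 8; check the layout agrees. */
_Static_assert(offsetof(fake_gpio_dev_t, out_w1ts) == 8,
               "out_w1ts must sit at byte offset 8");

/* A single volatile 32-bit store, the C-level equivalent of
   `GPIO.out_w1ts = SOME_VAL;`. */
void set_pins(fake_gpio_dev_t *dev, uint32_t mask)
{
    dev->out_w1ts = mask;
}
```

On a host the struct is backed by ordinary RAM, so this only demonstrates the layout and the volatile store, not any bus behaviour.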


Re: Memory mapped GPIO access and memw

Post by ESP_Sprite » Fri Jan 15, 2021 3:29 am

It is there to make sure that all previous writes have reached their respective devices. An example where this would matter: you write to an RTC peripheral register, then write to a GPIO W1TS register to make a pin high, then write to a W1TC register to make the pin low.

The write to the RTC peripheral may take a while to complete (because of the different clock domains), so the bus may stay busy for a bit after that write. The CPU doesn't care; it can continue executing while the bus is crunching away at its task. It then tries to execute the W1TS write. This fails: the bus is still busy and can't accept another write. The CPU still doesn't want to wait, so it sets the write aside in a load-store buffer and continues execution. The next instruction is the W1TC write. At that specific moment the RTC peripheral write completes, so the W1TC write can actually be executed: the GPIO goes low. After this is done, the CPU re-executes what is in the load-store buffer, which is the W1TS write: the GPIO goes high. This is effectively the inverse of what you would expect the code to do.
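The scenario above can be played through with a small host-side toy model. This is a sketch under loud assumptions: the single-slot buffer, the cycle counts and every name (`toy_bus_t`, `issue`, `run_sequence`) are invented for illustration and do not describe the real Xtensa pipeline. It only shows how a deferred write that is overtaken by a later one inverts the final pin state, and how draining before each write (the role memw plays) avoids that.

```c
#include <stdbool.h>

typedef enum { WR_RTC, WR_W1TS, WR_W1TC } write_kind_t;

/* Toy state: a bus that stays busy for some cycles, one deferred write. */
typedef struct {
    int          busy;          /* cycles until the bus is free again */
    bool         has_deferred;  /* is a write parked in the buffer?   */
    write_kind_t deferred;
    bool         pin_high;
} toy_bus_t;

static void apply(toy_bus_t *t, write_kind_t w)
{
    if (w == WR_W1TS) t->pin_high = true;
    else if (w == WR_W1TC) t->pin_high = false;
    /* WR_RTC does not touch the pin. */
}

static void tick(toy_bus_t *t) { if (t->busy > 0) t->busy--; }

/* Retry the parked write once the bus is free. */
static void retry_deferred(toy_bus_t *t)
{
    if (t->has_deferred && t->busy == 0) {
        apply(t, t->deferred);
        t->has_deferred = false;
    }
}

/* Wait until every earlier write has landed (what memw stands for here). */
static void drain(toy_bus_t *t)
{
    while (t->busy > 0 || t->has_deferred) { tick(t); retry_deferred(t); }
}

static void issue(toy_bus_t *t, write_kind_t w, int latency, bool memw)
{
    if (memw) drain(t);
    if (t->busy > 0) {            /* bus busy: park the write, move on */
        t->deferred = w;
        t->has_deferred = true;
    } else {                      /* bus free: the write goes out now,  */
        apply(t, w);              /* even ahead of a parked write (the  */
        t->busy = latency;        /* hypothesised hazard)               */
    }
    tick(t);                      /* one cycle passes per instruction   */
}

/* RTC write, then W1TS, then W1TC; returns the final pin level. */
bool run_sequence(bool memw)
{
    toy_bus_t t = { 0, false, WR_RTC, false };
    issue(&t, WR_RTC, 2, memw);   /* slow write keeps the bus busy */
    issue(&t, WR_W1TS, 1, memw);
    issue(&t, WR_W1TC, 1, memw);
    drain(&t);                    /* let any parked write land */
    return t.pin_high;
}
```

In this model `run_sequence(false)` leaves the pin high because the parked W1TS lands last, while `run_sequence(true)` leaves it low, as the code order intends.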

(Note that I'm not 100% sure if this particular example would be an issue in practice. If the Xtensa core tries to keep coherency on the APB/Dport buses, it would never swap around memory writes like that, but I can't be sure. Even if there is memory coherency, depending on the peripheral, you may still get weirdness in other situations without the memw instruction, for instance if you write something over the Dport but read it back over APB.)

As for why the memw is executed before the write rather than after: one consideration could be that this is more efficient. With a memw before every volatile write, the previous memory access will already have had a few cycles to complete (e.g. while the CPU was loading the address and data to write). If the memw came after the write, it would always wait for the entire duration of that write.
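The efficiency argument can be put into a back-of-the-envelope cost model. This is a sketch only: the function names and the cycle numbers are made up, and it assumes that a memw simply stalls until the most recent write has finished.

```c
/* memw placed BEFORE the next volatile store: the previous write has
   already had `setup_cycles` (address and data loads) to make progress,
   so only the remainder, if any, is spent stalling. */
unsigned stall_memw_before(unsigned prev_write_cycles, unsigned setup_cycles)
{
    return prev_write_cycles > setup_cycles
               ? prev_write_cycles - setup_cycles
               : 0u;
}

/* memw placed AFTER the store: the write has only just been issued,
   so the CPU waits for its full duration. */
unsigned stall_memw_after(unsigned write_cycles)
{
    return write_cycles;
}
```

With a hypothetical 4-cycle write and 2 cycles of setup work, the memw-before placement stalls for 2 cycles and the memw-after placement for 4; if the setup work takes longer than the write, the memw-before stall drops to zero.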
