Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

machine: add SPI DMA support for the rp2040 and samd51 #3985

Draft
wants to merge 3 commits into
base: dev
Choose a base branch
from
Draft

Conversation

aykevl
Copy link
Member

@aykevl aykevl commented Nov 4, 2023

This adds SPI DMA support for the rp2040, and a fallback for all the other devices.

For more details, see: tinygo-org/tinygo-site#342

Draft, until:

This prepares all devices for actual async SPI support.
Copy link

github-actions bot commented Nov 4, 2023

Size difference with the dev branch:

Binary size difference
 before   after   diff
  60960   60960      0   0.00%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/adt7410/main.go
   9736    9736      0   0.00%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/adxl345/main.go
  13268   13268      0   0.00%  tinygo build -size short -o ./build/test.hex -target=pybadge ./examples/amg88xx
   8724    8724      0   0.00%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/apa102/main.go
  11612   11612      0   0.00%  tinygo build -size short -o ./build/test.hex -target=nano-33-ble ./examples/apds9960/proximity/main.go
   9784    9784      0   0.00%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/apa102/itsybitsy-m0/main.go
   8168    8168      0   0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/at24cx/main.go
   8344    8344      0   0.00%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/bh1750/main.go
   7632    7632      0   0.00%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/blinkm/main.go
  70492   70492      0   0.00%  tinygo build -size short -o ./build/test.hex -target=pinetime     ./examples/bma42x/main.go
  63444   63444      0   0.00%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/bmi160/main.go
  27836   27836      0   0.00%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/bmp180/main.go
  63544   63544      0   0.00%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/bmp280/main.go
  12352   12352      0   0.00%  tinygo build -size short -o ./build/test.hex -target=trinket-m0 ./examples/bmp388/main.go
   8128    8128      0   0.00%  tinygo build -size short -o ./build/test.hex -target=bluepill ./examples/ds1307/sram/main.go
  22100   22100      0   0.00%  tinygo build -size short -o ./build/test.hex -target=bluepill ./examples/ds1307/time/main.go
  69332   69332      0   0.00%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/ds3231/main.go
   4704    4704      0   0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/easystepper/main.go
  24932   24932      0   0.00%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/espat/espconsole/main.go
  25080   25080      0   0.00%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/espat/esphub/main.go
  24932   24932      0   0.00%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/espat/espstation/main.go
  68844   68844      0   0.00%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/flash/console/spi
  64916   64916      0   0.00%  tinygo build -size short -o ./build/test.hex -target=pyportal ./examples/flash/console/qspi
   7040    7040      0   0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/gc9a01/main.go
  68148   68148      0   0.00%  tinygo build -size short -o ./build/test.hex -target=feather-m0 ./examples/gps/i2c/main.go
  68544   68544      0   0.00%  tinygo build -size short -o ./build/test.hex -target=feather-m0 ./examples/gps/uart/main.go
   8372    8372      0   0.00%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/hcsr04/main.go
   5612    5612      0   0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/hd44780/customchar/main.go
   5656    5656      0   0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/hd44780/text/main.go
  10564   10564      0   0.00%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/hd44780i2c/main.go
  14516   14516      0   0.00%  tinygo build -size short -o ./build/test.hex -target=nano-33-ble ./examples/hts221/main.go
  16940   16940      0   0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/hub75/main.go
  10052   10052      0   0.00%  tinygo build -size short -o ./build/test.hex -target=pyportal ./examples/ili9341/basic
  10832   10832      0   0.00%  tinygo build -size short -o ./build/test.hex -target=xiao ./examples/ili9341/basic
  29052   29052      0   0.00%  tinygo build -size short -o ./build/test.hex -target=pyportal ./examples/ili9341/pyportal_boing
  10080   10080      0   0.00%  tinygo build -size short -o ./build/test.hex -target=pyportal ./examples/ili9341/scroll
  10908   10908      0   0.00%  tinygo build -size short -o ./build/test.hex -target=xiao ./examples/ili9341/scroll
 263564  263564      0   0.00%  tinygo build -size short -o ./build/test.hex -target=pyportal ./examples/ili9341/slideshow
  12052   12052      0   0.00%  tinygo build -size short -o ./build/test.hex -target=circuitplay-express ./examples/lis3dh/main.go
  13912   13912      0   0.00%  tinygo build -size short -o ./build/test.hex -target=nano-33-ble ./examples/lps22hb/main.go
  26124   26124      0   0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/lsm303agr/main.go
  12520   12520      0   0.00%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/lsm6ds3/main.go
  10996   10996      0   0.00%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/mag3110/main.go
  10176   10176      0   0.00%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/mcp23017/main.go
  10604   10604      0   0.00%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/mcp23017-multiple/main.go
   9816    9816      0   0.00%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/mcp3008/main.go
  66920   66920      0   0.00%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/mcp2515/main.go
  22984   22984      0   0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/microbitmatrix/main.go
  22936   22936      0   0.00%  tinygo build -size short -o ./build/test.hex -target=microbit-v2 ./examples/microbitmatrix/main.go
   8444    8444      0   0.00%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/mma8653/main.go
   8356    8356      0   0.00%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/mpu6050/main.go
  75196   75196      0   0.00%  tinygo build -size short -o ./build/test.hex -target=p1am-100 ./examples/p1am/main.go
  12148   12148      0   0.00%  tinygo build -size short -o ./build/test.hex -target=pico ./examples/pca9685/main.go
   6080    6080      0   0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/pcd8544/setbuffer/main.go
   5100    5100      0   0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/pcd8544/setpixel/main.go
   2681    2681      0   0.00%  tinygo build -size short -o ./build/test.hex -target=arduino ./examples/servo
   7944    7944      0   0.00%  tinygo build -size short -o ./build/test.hex -target=pybadge ./examples/shifter/main.go
  56608   56608      0   0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/sht3x/main.go
  56664   56664      0   0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/sht4x/main.go
  56580   56580      0   0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/shtc3/main.go
   6456    6456      0   0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/ssd1306/i2c_128x32/main.go
   5960    5960      0   0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/ssd1306/spi_128x64/main.go
   5676    5676      0   0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/ssd1331/main.go
   6376    6376      0   0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/st7735/main.go
   6232    6232      0   0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/st7789/main.go
  17156   17156      0   0.00%  tinygo build -size short -o ./build/test.hex -target=circuitplay-express ./examples/thermistor/main.go
  10312   10312      0   0.00%  tinygo build -size short -o ./build/test.hex -target=circuitplay-bluefruit ./examples/tone
  10096   10096      0   0.00%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/tm1637/main.go
   9404    9404      0   0.00%  tinygo build -size short -o ./build/test.hex -target=pyportal ./examples/touch/resistive/fourwire/main.go
  12468   12468      0   0.00%  tinygo build -size short -o ./build/test.hex -target=pyportal ./examples/touch/resistive/pyportal_touchpaint/main.go
  15804   15804      0   0.00%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/vl53l1x/main.go
  13848   13848      0   0.00%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/vl6180x/main.go
   6452    6452      0   0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/waveshare-epd/epd2in13/main.go
   6004    6004      0   0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/waveshare-epd/epd2in13x/main.go
   6260    6260      0   0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/waveshare-epd/epd4in2/main.go
 137504  137504      0   0.00%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/wifinina/ntpclient/main.go
 137512  137512      0   0.00%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/wifinina/udpstation/main.go
 137496  137496      0   0.00%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/wifinina/tcpclient/main.go
 137792  137792      0   0.00%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/wifinina/webclient/main.go
   6924    6924      0   0.00%  tinygo build -size short -o ./build/test.hex -target=circuitplay-express ./examples/ws2812
   5228    5228      0   0.00%  tinygo build -size short -o ./build/test.bin -target=m5stamp-c3          ./examples/ws2812
  61872   61872      0   0.00%  tinygo build -size short -o ./build/test.hex -target=feather-nrf52840 ./examples/is31fl3731/main.go
   1549    1549      0   0.00%  tinygo build -size short -o ./build/test.hex -target=arduino   ./examples/ws2812
    880     880      0   0.00%  tinygo build -size short -o ./build/test.hex -target=digispark ./examples/ws2812
  32248   32248      0   0.00%  tinygo build -size short -o ./build/test.hex -target=trinket-m0 ./examples/bme280/main.go
  16612   16612      0   0.00%  tinygo build -size short -o ./build/test.hex -target=circuitplay-express ./examples/microphone/main.go
  11268   11268      0   0.00%  tinygo build -size short -o ./build/test.hex -target=circuitplay-express ./examples/buzzer/main.go
  12976   12976      0   0.00%  tinygo build -size short -o ./build/test.hex -target=trinket-m0 ./examples/veml6070/main.go
   6800    6800      0   0.00%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/l293x/simple/main.go
   8720    8720      0   0.00%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/l293x/speed/main.go
   6764    6764      0   0.00%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/l9110x/simple/main.go
   9336    9336      0   0.00%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/l9110x/speed/main.go
   7304    7304      0   0.00%  tinygo build -size short -o ./build/test.hex -target=nucleo-f103rb ./examples/shiftregister/main.go
   6932    6932      0   0.00%  tinygo build -size short -o ./build/test.hex -target=hifive1b ./examples/ssd1351/main.go
  13172   13172      0   0.00%  tinygo build -size short -o ./build/test.hex -target=circuitplay-express ./examples/lis2mdl/main.go
   8428    8428      0   0.00%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/max72xx/main.go
  77152   77152      0   0.00%  tinygo build -size short -o ./build/test.hex -target=feather-m0 ./examples/dht/main.go
  36608   36608      0   0.00%  tinygo build -size short -o ./build/test.hex -target=feather-rp2040 ./examples/pcf8523/
  71136   71136      0   0.00%  tinygo build -size short -o ./build/test.hex -target=xiao ./examples/pcf8563/alarm/
   7400    7400      0   0.00%  tinygo build -size short -o ./build/test.hex -target=xiao ./examples/pcf8563/clkout/
  70768   70768      0   0.00%  tinygo build -size short -o ./build/test.hex -target=xiao ./examples/pcf8563/time/
  71056   71056      0   0.00%  tinygo build -size short -o ./build/test.hex -target=xiao ./examples/pcf8563/timer/
  12076   12076      0   0.00%  tinygo build -size short -o ./build/test.hex -target=pico ./examples/qmi8658c/main.go
   8996    8996      0   0.00%  tinygo build -size short -o ./build/test.hex -target=feather-m0 ./examples/ina260/main.go
   9280    9280      0   0.00%  tinygo build -size short -o ./build/test.hex -target=nucleo-l432kc ./examples/aht20/main.go
  72288   72288      0   0.00%  tinygo build -size short -o ./build/test.hex -target=feather-m4 ./examples/sdcard/console/
  82580   82580      0   0.00%  tinygo build -size short -o ./build/test.hex -target=wioterminal ./examples/rtl8720dn/webclient/
  71996   71996      0   0.00%  tinygo build -size short -o ./build/test.hex -target=wioterminal ./examples/rtl8720dn/webserver/
  98644   98644      0   0.00%  tinygo build -size short -o ./build/test.hex -target=wioterminal ./examples/rtl8720dn/mqttsub/
  60756   60756      0   0.00%  tinygo build -size short -o ./build/test.hex -target=feather-m4 ./examples/i2csoft/adt7410/
  10152   10152      0   0.00%  tinygo build -size short -o ./build/test.elf -target=wioterminal ./examples/axp192/m5stack-core2-blinky/
   8916    8916      0   0.00%  tinygo build -size short -o ./build/test.uf2 -target=pico ./examples/xpt2046/main.go
  14564   14564      0   0.00%  tinygo build -size short -o ./build/test.hex -target=nucleo-wl55jc ./examples/sx126x/lora_rxtx/
  26260   26260      0   0.00%  tinygo build -size short -o ./build/test.uf2 -target=pico ./examples/ssd1289/main.go
  11188   11188      0   0.00%  tinygo build -size short -o ./build/test.hex -target=pico ./examples/irremote/main.go
  11204   11204      0   0.00%  tinygo build -size short -o ./build/test.hex -target=badger2040 ./examples/uc8151/main.go
  10312   10312      0   0.00%  tinygo build -size short -o ./build/test.uf2 -target=pico ./examples/scd4x/main.go
   8692    8692      0   0.00%  tinygo build -size short -o ./build/test.uf2 -target=circuitplay-express ./examples/makeybutton/main.go
   9576    9576      0   0.00%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/ds18b20/main.go
  81312   81312      0   0.00%  tinygo build -size short -o ./build/test.hex -target=nucleo-wl55jc ./examples/lora/lorawan/atcmd/
  15732   15732      0   0.00%  tinygo build -size short -o ./build/test.uf2 -target=pico ./examples/as560x/main.go
   9800    9800      0   0.00%  tinygo build -size short -o ./build/test.uf2 -target=pico ./examples/mpu6886/main.go
   7840    7840      0   0.00%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/ttp229/main.go
  66056   66056      0   0.00%  tinygo build -size short -o ./build/test.hex -target=pico ./examples/ndir/main_ndir.go
  61368   61368      0   0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/ndir/main_ndir.go
  64616   64616      0   0.00%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/ndir/main_ndir.go
   9216    9216      0   0.00%  tinygo build -size short -o ./build/test.uf2 -target=pico ./examples/mpu9150/main.go
3826946 3826946      0   0.00%   sum

Copy link
Member

@kenbell kenbell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be helpful to add one (maybe two?) examples showing best practice on how to use StartTx / Wait in a driver (like tinygo drivers) and directly in an app (if that use-case is expected).

@@ -306,6 +325,15 @@ func (spi SPI) tx(tx []byte) error {
dreq = 18 // DREQ_SPI1_TX
}

// Wait for the previous transmission to complete.
for ch.CTRL_TRIG.Get()&rp.DMA_CH0_CTRL_TRIG_BUSY != 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a busy loop, why not yield to the scheduler here - it's a busy loop either way, and yielding would allow stuff like timers to happen?

ch = &dmaChannels[spi0DMAChannel]
} else { // SPI1
ch = &dmaChannels[spi1DMAChannel]
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above - this goes through to a busy loop. I'm not sure how this enables async behavior if it's busy-loop. How does:

spi.StartTx()
spi.Wait()

work better than doing:

spi.Tx()

It seems like it only helps in one very specific case where the code between StartTx and Wait can do something meaningful for an app - and I'm not sure how that works in a driver in tinygo drivers. If a display-chip driver wants to use DMA how does the driver yield to the app logic, so the app can do some useful processing ready for the next frame?

I can see how this works if the app is itself calling spi.StartTx and spi.Wait, but not how it works when there's a driver.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In short, I plan on adding methods like StartDrawBitmap and Wait to the display drivers. I already have these changes locally, but would like to improve them a bit before submitting.
Basically, that way you can do display updates and calculating the next frame at the same time. Pseudocode:

for part of display:
    fill half of the buffer
    wait until previous buffer has been sent
    send buffer to screen
    continue with other half of the buffer
wait until previous buffer has been sent

I've managed to hit nearly the maximum possible FPS on a st7789 this way, because the app could fill the next buffer to be sent while the previous buffer was still transmitting - thereby doing the two in parallel.

If you'd like to see these changes first in the driver I understand.

Copy link
Contributor

@soypat soypat Dec 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not convinced this could not be written as application code with the old blocking SPI abstraction, i.e:

workerDisplayThread(workToDoChannel):
   for:
      ourWork <- workToDoChannel;
      if !(ourWork && displayOn):
         return
      fill our part of buffer given ourWork
      lock spi bus
      send buffer to screen while blocking
      unlock spi bus

Some advantages of the above abstraction:

  • machine.SPI kept simple, which in my opinion is a huge gain. I really don't want to end up with something like smbus or the micropython i2c abstraction...
  • Multiplexing on to more than just 2 threads is pretty straightforward in case the rendering is the bottleneck, given you have multicore processing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if I understand you correctly, you propose doing the work in two separate goroutines that both have half the buffer?
One thing I ran into when I tried this is that this part:

send buffer to screen while blocking

...might actually not happen in parallel. So what could happen is that the first goroutine renders a buffer, then locks the SPI bus (or does something else that can cause a context switch), and then the second goroutine starts rendering into its own buffer before the first goroutine has started the SPI transfer. Which means there is no speed benefit at all. This is especially likely to occur when the SPI code sprinkles some runtime.Gosched() calls throughout, almost guaranteeing that it will switch over to the second goroutine before starting the SPI transfer.

This is really hard to reason about and I don't see a good way around it. I've found the async transmit and wait calls to be much easier to reason about.

Perhaps you know of a way to avoid this trap, though?

Copy link
Contributor

@soypat soypat Dec 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really hard to reason about and I don't see a good way around it. I've found the async transmit and wait calls to be much easier to reason about.

I agree, it is harder to reason about, but at least it can be abstracted and though about in a way such that we could develop a generic algorithm to solve this problem using current SPI so that we don't expose two abstractions on the same type (AsyncSPI+SPI) at the user level.

Perhaps you know of a way to avoid this trap, though?

Hmm, what about two channels? One for starting rendering with the display space to be rendered and another to signal it finished rendering? You'd then need a high level orchestrator to synchronize the processes, but I'm guessing maybe it could consume the display interface plus an interface that renders the scene so that the complexity is implemented once and used for all displays drivers.

Maybe it's not a long lasting solution if moving on to multiple screens but I think it's worth the effort to think about how we can keep the SPI interface minimalist.

As far as I can tell we already have the async functionality exposed via the current SPI interface in a out-of-band sort of way.

  • Users know that calling SPI.Tx will block and most likely schedule the current goroutine while it performs I/O.
  • When Tx returns it is a signal that the transaction has ended. Time to deassert CS.

Something I don't like about this new interface is it leaks coupling between StartTx and Wait regarding the CS line. People who write drivers need to implement both the StartAction and Wait methods on their driver device API, and that bubbles up to the user level where users also need to write and use the StartAction and Wait methods to ensure the CS line is deasserted after the action.

As a final note here's a quote by Rob Pike from his dotGo talk "Simplicity is complicated" which I feels applies to the SPI 2 method interface design we enjoy today: "Simplicity is complicated but clarity is worth the fight".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A should block on an asynchronous DMA call ideally that runs in the background while B starts drawing window.

Ideally yes. The problem is that there are two asynchronous DMA calls. One in the setWindow method (to set the x/y/width of the bitmap), and the other for the actual bitmap.
So A does the first call, blocks, so the CPU switches over to B before the bitmap has started transmitting.
The setWindow call is very short so barely useful to do in the background. But because it's a blocking call, it pushes A to the background. And because we have a cooperative scheduler, A is only resumed once B is finished rendering.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh.

Yeah, that makes a lot of sense.

It does seem like the point right after setWindow in goroutine A could send over a buffered channel of length 1 to guarantee goroutine B does not steal control. B would be waiting right before starting rendering on the channel receive. This could be done vice-versa, so B would also send over a channel (maybe the same one!) when done rendering to signal A should also start rendering right after B starts SPI.

This is considering that a send on a buffered not-full channel does not block, which if I'm not mistaken it does not (?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm - setWindow looks like this?

	x += d.columnOffset
	y += d.rowOffset
	copy(d.buf[:4], []uint8{uint8(x >> 8), uint8(x), uint8((x + w - 1) >> 8), uint8(x + w - 1)})
	d.sendCommand(CASET, d.buf[:4])
	copy(d.buf[:4], []uint8{uint8(y >> 8), uint8(y), uint8((y + h - 1) >> 8), uint8(y + h - 1)})
	d.sendCommand(RASET, d.buf[:4])
	d.sendCommand(RAMWR, nil)

If so, that's like 3 DMA transfers just to set window size? @aykevl design is growing on me - but i wonder if buses could/should expose a startTx that takes multiple buffers and does them all in one contiguous async operation. In most cases I guess this would mean using interrupts to switch between buffers when each transaction is complete. I guess this could theoretically be faster since setWindow currently does three blocking SPI transactions that could be async?

Copy link
Member Author

@aykevl aykevl Dec 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does seem like the point right after setWindow in goroutine A could send over a buffered channel of length 1 to guarantee goroutine B does not steal control. B would be waiting right before starting rendering on the channel receive.

Yes, this could work. It's a bit finnicky to get right, but I don't see why this wouldn't work.
It does seem to have a lot more non-obvious complexity though. While the API may look simpler, users of the API would have to introduce a lot more complexity to be fast - and it's certainly not very intuitive as evidenced by this discussion!

This is considering that a send on a buffered not-full channel does not block, which if I'm not mistaken it does not (?)

Correct. (I'd say this is implementation defined, but it makes the most sense to not block when doing a send on a channel with space in the buffer).

Hmm - setWindow looks like this?
[...]
If so, that's like 3 DMA transfers just to set window size?

Actually five, because each sendCommand call is actually a single byte transaction to send the command, and then (optionally, depending on the command) some parameter bytes sent in a separate transfer. And it has to be five separate transactions because the command/data pin has to be toggled between each transfer so that the display controller knows which bytes are part of the command and which are part of the command parameters.

(There is an optimization opportunity here if the window didn't change, but I'm talking about the general case).

This is also the reason why the SetPixel API is so slow: it would have to do six SPI transactions (13 bytes) per pixel in the common (worst) case!

[...] I wonder if buses could/should expose a startTx that takes multiple buffers and does them all in one contiguous async operation.

This is actually part of the design: StartTx can be called multiple times before waiting for the transactions to complete. (The implementation I wrote doesn't make use of this, but it's possible).
...if it were possible, which unfortunately it isn't because of the command/data pin.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StartTx can be called multiple times before waiting for the transactions to complete.

Isn't there an issue with Lock contention when calling StartTx? When there's several StartTx calls the ordering is not guaranteed since they'd be waiting on a lock free which in upstream Go has no guaranteed ordering?

While the API may look simpler, users of the API would have to introduce a lot more complexity to be fast

While I do agree this is more work for the user of machine.SPI there are some other points worth mentioning:

  1. Go and async: Go's design is built on channels and virtual threads. Async API design is not what's usually found in the Go ecosystem (with exception of exec and few other packages) and users are more likely to be unfamiliar with this API.

  2. We have one use case for this API and it may not even be strictly necessary to solve the problem. There may be a way to solve this using channels and maintaining performance characteristics of this design.

Given these points above I'd err on the side of including this API in a external package, not machine, and seeing how well it fares on the basis that Yes is forever, No is temporary.

@aykevl aykevl changed the title machine: add SPI DMA support for the rp2040 machine: add SPI DMA support for the rp2040 and samd51 Nov 11, 2023
@aykevl
Copy link
Member Author

aykevl commented Dec 8, 2023

Here is a driver that shows how this async API can be used: tinygo-org/drivers#625
And here is how I plan to use this driver in a graphics library: https://github.com/aykevl/tinygl/compare/async-update

While the graphics library needs to be careful to support async operations, in most cases it's pretty simple to use (get a new buffer, write to it, send to the display) and in fact users of the graphics library don't need to know about it at all unless they implement a custom widget.

In practice, I've found that fast display updates (where rendering takes less time than a display update) are limited by the time it takes to update the display (plus some small extra overhead). This means that it's possible to do most display updates on the Gopher Badge in about 20ms (the hardware limit) and it will only take longer when rendering takes longer than 20ms.
So in short, it works and it speeds up display performance substantially - about as much as DMA theoretically could.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants