Camera data rate throughput issue with Avenger96

Hi All,

We are trying to interface a camera to the Avenger96, but we are not able to achieve the maximum throughput supported by
DCMI and st-mipid02.

As per Figure 2 in the STMIPID02 datasheet (Datasheet - STMIPID02 - Dual mode MIPI CSI-2 / SMIA CCP2 de-serializer),
the MIPI CSI-2 data rate into the STMIPID02 should be between 80 Mbps and 800 Mbps per lane.
DCMI operates at 208 MHz in 8-bit mode, so the maximum theoretical throughput is 208 MiB/s (~1600 Mbps).

In our case, we get a proper image only if the data rate of the sensor module interface is 320 Mbps (set in the sensor module)
and the st-mipid02 data rate is set to 333 Mbps (via clk_lane_reg1).

When we set the st-mipid02 data rate to 400 or 500 Mbps per lane and the data rate of the camera sensor module to 380 or 480 Mbps respectively (both well within the range of the STMIPID02 and DCMI),
we do not get a proper image and we see the following overrun errors from DCMI:
stm32-dcmi 4c006000.dcmi: Some errors found while streaming: errors=6963 (overrun=6966), buffers=4

It seems that DCMI is not able to provide the expected throughput. Could you please provide information on
the limitations of st-mipid02 and DCMI?

Also, section 3.3 of AN5470 from ST (dm00693021-stm32mp1-series-interfacing-with-a-mipi-csi2-camera-stmicroelectronics.pdf)
states that “maximum performance is achieved is 24MPixel/sec, which is equivalent of 1.3Mpixel @18 fps”. In this case the total data rate would be ~390 Mbps if the data format is 16 bit. Does that mean that we cannot achieve more than 390 Mbps of throughput?
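
For reference, the ~390 Mbps figure can be reproduced with a few lines of Python. This is only a sketch of the arithmetic; the function name is made up for illustration, and it assumes 16 bits per pixel with no blanking or protocol overhead.

```python
def pixel_rate_to_mbps(pixels_per_second, bits_per_pixel):
    """Convert a pixel rate into a payload bit rate in Mbps (10^6 bit/s)."""
    return pixels_per_second * bits_per_pixel / 1e6

# Figure quoted from AN5470 section 3.3: 24 MPixel/s at 16 bit per pixel.
quoted_limit = pixel_rate_to_mbps(24e6, 16)

# The equivalent given in the same sentence: 1.3 Mpixel frames at 18 fps.
example_mode = pixel_rate_to_mbps(1.3e6 * 18, 16)

print(f"24 MPixel/s        -> {quoted_limit:.1f} Mbps")
print(f"1.3 Mpixel @ 18fps -> {example_mode:.1f} Mbps")
```

Both results land a little below 390 Mbps, which is where the rounded figure in the question comes from.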

This is likely due to DRAM bandwidth limitations of the AV96 in combination with LTDC supplying the HDMI output.
On AV96, the D3 mezzanine board with OV5640 camera is supposed to work.

Keep in mind that the STM32MP1 platform might not be able to cope with all the resolutions and framerates the sensor supports. Especially when the LTDC is driving video output (e.g. HDMI) and the GPU is used for compositing as well, it is possible to exhaust DRAM bandwidth, and DCMI/DMA will then start indicating overflow. A reasonable resolution which the STM32MP1 can handle is around 1280x720 @ 30 FPS packed YUV; above that it depends on the system load.

The DCMI operates at 208 MHz in 8-bit mode, so the maximum theoretical throughput is 208 MiB/s under ideal conditions (i.e. without additional load caused by LTDC and GPU). Below are examples of DCMI bandwidth used by different camera settings:

1280x720 @ 30 FPS packed YUV ~= 56 MiB/s

1920x1080 @ 30 FPS packed YUV ~= 125 MiB/s
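
A rough way to reproduce these numbers, assuming packed YUV means 16 bits per pixel (e.g. YUYV) and ignoring blanking. The helper name is only for illustration. Note that the two figures above match decimal MB/s (10^6 bytes) more closely than MiB/s (2^20 bytes), which is why the 1080p value comes out a bit lower when computed with ">> 20".

```python
def stream_rate(width, height, bits_per_pixel, fps):
    """Return (MB/s, MiB/s) for an uncompressed video stream."""
    bytes_per_second = width * height * bits_per_pixel // 8 * fps
    return bytes_per_second / 1e6, bytes_per_second / 2**20

DCMI_PEAK = 208  # theoretical DCMI peak quoted above (8 bit at 208 MHz)

for name, (w, h) in {"1280x720": (1280, 720), "1920x1080": (1920, 1080)}.items():
    mb, mib = stream_rate(w, h, 16, 30)
    share = 100 * mb / DCMI_PEAK
    print(f"{name} @ 30 FPS: {mb:.1f} MB/s = {mib:.1f} MiB/s "
          f"(~{share:.0f}% of the theoretical DCMI peak)")
```

Even 1080p30 stays well below the theoretical DCMI peak on its own, so the DCMI clock is unlikely to be the first limit you hit.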

Now, let’s do the simple math for viewing a high-resolution camera image.

You will be doing sustained READ from DRAM by the LTDC, to supply the HDMI with image. If you have a 1920x1080 @ 60 Hz panel with 32bpp RGBx colors, that READ is ~475 MiB/s (1920x1080x(32/8)x60 >> 20)

You will be doing sustained WRITE to DRAM by the camera pipeline, which writes captured frames into the DRAM. If you have 1920x1080 frames in e.g. YUYV format at 30 fps, that’s ~118 MiB/s (1920x1080x(16/8)x30 >> 20)

And then there is the conversion from packed YUV to RGBx, which does 118 MiB/s READs and 475 MiB/s WRITEs, so you end up with roughly 600 MiB/s READs and 600 MiB/s WRITEs to the DDR3 DRAM of the STM32MP1, which is likely close to its limit.
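
The same budget as a small Python sketch, in case you want to plug in your own resolution and framerate. The names are illustrative, and it keeps the assumption made above that the YUV to RGBx conversion writes at the same rate the LTDC reads.

```python
MIB = 1 << 20  # bytes per MiB, matching the ">> 20" used above

def stream_mib_per_s(width, height, bits_per_pixel, fps):
    """Sustained bandwidth of an uncompressed frame stream in MiB/s."""
    return width * height * (bits_per_pixel // 8) * fps / MIB

scanout_read = stream_mib_per_s(1920, 1080, 32, 60)  # LTDC reading RGBx
camera_write = stream_mib_per_s(1920, 1080, 16, 30)  # camera pipeline writing YUYV

# Conversion reads every camera frame and (assumption carried over from
# the text) writes RGBx at the rate the LTDC consumes it.
convert_read = camera_write
convert_write = scanout_read

total_read = scanout_read + convert_read
total_write = camera_write + convert_write
print(f"READ  ~{total_read:.0f} MiB/s")
print(f"WRITE ~{total_write:.0f} MiB/s")
```

Both totals come out at about 593 MiB/s, i.e. the "600 MiB/s each way" figure.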

But there is a bonus catch to it. Various compositors have a special case for fullscreen windows: if they get a suitable buffer which they can pass directly to the scanout engine (LTDC) as is, they bypass compositing. If your application is windowed, then the compositing pass is needed, which means one round of rendering, often by the GPU: a READ of the source texture (the camera image) and a WRITE of the destination buffer (the LTDC buffer). In a really bad case, like the weston header on top and the camera window just below it, you would end up doing basically another 1920x1080x60 at 32bpp copy from memory to memory, which is an extra 475 MiB/s of READ plus 475 MiB/s of WRITE.
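
To put a number on the windowed case, here is a sketch of the extra traffic from one compositing pass. The function is hypothetical, and it assumes the worst case described above: the whole 32bpp output is re-rendered at the display rate, and a fullscreen window bypasses compositing entirely.

```python
MIB = 1 << 20

def compositing_overhead_mib(width, height, fps, fullscreen):
    """Extra READ and extra WRITE traffic (MiB/s each) from one 32bpp pass."""
    if fullscreen:
        # Assumes the compositor hands the buffer straight to the LTDC.
        return 0.0
    return width * height * 4 * fps / MIB

base_each_way = 600  # rounded READ and WRITE totals from the paragraph above

for fullscreen in (True, False):
    extra = compositing_overhead_mib(1920, 1080, 60, fullscreen)
    mode = "fullscreen" if fullscreen else "windowed"
    print(f"{mode}: ~{base_each_way + extra:.0f} MiB/s READ "
          f"and ~{base_each_way + extra:.0f} MiB/s WRITE")
```

In the windowed worst case the DRAM sees over 1 GiB/s in each direction, which makes DCMI overruns much more likely.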

You can measure the exact bandwidth using perf and the DDR performance counter driver; see perf list for all the events available (if you need details on this, let me know).