Camera data rate throughput issue with Avenger96

Hi All,

We are trying to interface a camera with the Avenger96, but we are not able to reach the maximum throughput supported by
the DCMI and the st-mipid02.

As per Figure 2 in the STMIPID02 datasheet (Datasheet - STMIPID02 - Dual mode MIPI CSI-2 / SMIA CCP2 de-serializer),
the MIPI CSI-2 data rate into the STMIPID02 should be between 80 Mbps and 800 Mbps per lane.
The DCMI operates at 208 MHz in 8-bit mode, so its maximum theoretical throughput is 208 MiB/s (~1600 Mbps).
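
To put the two limits quoted above into the same unit, a minimal Python sketch could look like this. The lane count is not stated in this thread, so `lanes=2` is purely an assumption for illustration, and the calculation ignores CSI-2 protocol overhead and blanking.

```python
def csi2_aggregate_mbps(per_lane_mbps, lanes):
    """Raw aggregate CSI-2 link rate in Mbit/s (no protocol overhead)."""
    return per_lane_mbps * lanes


def dcmi_peak_mbps(clock_mhz, bus_width_bits=8):
    """Theoretical parallel-bus peak in Mbit/s: one bus word per clock."""
    return clock_mhz * bus_width_bits


if __name__ == "__main__":
    # 208 MHz and 8 bits come from the post; lanes=2 is an assumption.
    dcmi = dcmi_peak_mbps(208)
    for rate in (80, 333, 400, 500, 800):
        link = csi2_aggregate_mbps(rate, lanes=2)
        print(f"{rate} Mbps/lane x 2 lanes = {link} Mbps "
              f"({100 * link / dcmi:.0f}% of the {dcmi} Mbps DCMI peak)")
```

This only compares raw link rates; it says nothing about what the DMA and DRAM behind the DCMI can sustain.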

In our case, we get a proper image only if the data rate of the sensor module interface is 320 Mbps (set in the sensor module)
and the st-mipid02 data rate is set to 333 Mbps (via clk_lane_reg1).

When we set the st-mipid02 data rate to 400 or 500 Mbps per lane and the data rate of the camera sensor module to 380 Mbps or 480 Mbps respectively (which is well within the range of the STMIPID02 and the DCMI),
we do not get a proper image and we see overrun errors from the DCMI, as shown below:
stm32-dcmi 4c006000.dcmi: Some errors found while streaming: errors=6963 (overrun=6966), buffers=4

It seems that the DCMI is not able to provide the expected throughput. Could you please provide information on the
limitations of the st-mipid02 and the DCMI?

Also, section 3.3 of AN5470 (STM32MP1 series interfacing with a MIPI CSI-2 camera) from ST states that
“maximum performance is achieved is 24MPixel/sec, which is equivalent of 1.3Mpixel @18 fps”. In this case the total data rate would be ~390 Mbps if the data format is 16-bit. Does that mean that we cannot achieve more than 390 Mbps of throughput?
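
For reference, the arithmetic behind the ~390 Mbps figure can be sketched as follows. The helper names are made up for this illustration; the 24 MPixel/s, 1.3 Mpixel, 18 fps and 16-bit values are the ones quoted above.

```python
def pixel_rate_mpix_s(megapixels, fps):
    """Pixel rate in MPixel/s for a sensor of the given size and frame rate."""
    return megapixels * fps


def data_rate_mbps(mpix_per_s, bits_per_pixel):
    """Payload data rate in Mbit/s for a given pixel rate and pixel depth."""
    return mpix_per_s * bits_per_pixel


if __name__ == "__main__":
    # 1.3 Mpixel @ 18 fps is a little under the quoted 24 MPixel/s.
    print(f"1.3 Mpixel @ 18 fps = {pixel_rate_mpix_s(1.3, 18):.1f} MPixel/s")
    # 24 MPixel/s at 16 bits per pixel.
    print(f"24 MPixel/s @ 16 bpp = {data_rate_mbps(24, 16)} Mbps")
```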

This is likely due to DRAM bandwidth limitations of the AV96 in combination with LTDC supplying the HDMI output.
On AV96, the D3 mezzanine board with OV5640 camera is supposed to work.

Keep in mind that the STM32MP1 platform might not be able to cope with all the resolutions and framerates the sensor supports. Especially when the LTDC is driving video output (e.g. HDMI) and the GPU is used for compositing as well, it is possible to exhaust DRAM bandwidth, and the DCMI/DMA will then start indicating overflow. A reasonable resolution which the STM32MP1 can handle is about 1280x720 @ 30 FPS packed YUV; above that it depends on the system load.

The DCMI operates at 208 MHz in 8bit mode, so the maximum theoretical throughput is 208 MiB/s under ideal conditions (i.e. without additional load caused by LTDC and GPU). Below are examples of DCMI bandwidth used by different camera settings:

1280x720 @ 30 FPS packed YUV ~= 53 MiB/s

1920x1080 @ 30 FPS packed YUV ~= 118 MiB/s

Now, let’s do the simple math for viewing a high-resolution camera image.

You will be doing a sustained READ from DRAM by the LTDC, to supply the HDMI with the image. If you have a 1920x1080 @ 60 Hz panel with 32bpp RGBx colors, that READ is ~475 MiB/s (1920x1080x(32/8)x60 >> 20).

You will be doing a sustained WRITE to DRAM by the camera pipeline, which writes captured frames into the DRAM. If you have 1920x1080 frames in e.g. YUYV format at 30 fps, that’s ~118 MiB/s (1920x1080x(16/8)x30 >> 20).

And then there is the conversion from packed YUV to RGBx, which does 118 MiB/s READs and 475 MiB/s WRITEs, so you end up with roughly 600 MiB/s READs (475 + 118) and 600 MiB/s WRITEs (118 + 475) to that DDR3 DRAM of the STM32MP1, which is likely close to its limit.
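
The budget above can be reproduced with a short Python sketch. It assumes exactly the figures from the post (1920x1080, 32bpp at 60 Hz for scanout, 16bpp YUYV at 30 fps for capture) and, like the post, counts the RGBx output of the conversion at the scanout rate; the function names are made up for this illustration.

```python
MIB = 1 << 20  # bytes per MiB


def stream_mib_s(width, height, bits_per_pixel, fps):
    """Sustained bandwidth of an uncompressed video stream, in MiB/s."""
    return width * height * bits_per_pixel / 8 * fps / MIB


def dram_budget():
    """Reproduce the READ/WRITE totals from the post (values in MiB/s)."""
    ltdc_read = stream_mib_s(1920, 1080, 32, 60)     # LTDC scanout, RGBx
    camera_write = stream_mib_s(1920, 1080, 16, 30)  # DCMI capture, YUYV
    # As in the post: the conversion reads the YUYV stream, and its RGBx
    # output is counted at the same rate as the scanout.
    convert_read, convert_write = camera_write, ltdc_read
    return {
        "read": ltdc_read + convert_read,
        "write": camera_write + convert_write,
    }


if __name__ == "__main__":
    for direction, rate in dram_budget().items():
        print(f"{direction}: {rate:.0f} MiB/s")
```

Changing the arguments to `stream_mib_s` gives a quick estimate for other resolutions or pixel formats before trying them on the board.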

But there is a bonus catch to it. Various compositors have a special case for showing fullscreen windows, where they bypass compositing: if they get a suitable buffer which they can pass directly to the scanout engine (LTDC) as is, they do so. If your application is windowed, then you need to do the compositing pass, which means one set of rendering, often by the GPU, which means a READ of the source texture (the camera image) and a WRITE of the destination buffer (the LTDC buffer). If you have a really bad case, like the weston header on top and the camera window just below that, then you would end up basically doing another 1920x1080x60 at 32bpp READ+WRITE from memory to memory, which is an extra 475 MiB/s of READs and 475 MiB/s of WRITEs.
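
As a rough illustration of that worst case, the sketch below adds one full-surface compositing pass on top of a baseline. The baseline of 593 MiB/s in each direction is the rounded-down sum of the figures from the post (475 + 118), and the function name is hypothetical.

```python
MIB = 1 << 20  # bytes per MiB


def compositing_pass_mib_s(width=1920, height=1080, bits_per_pixel=32, fps=60):
    """Extra DRAM traffic in MiB/s, per direction, for one full-surface
    memory-to-memory compositing pass at the output refresh rate."""
    return width * height * bits_per_pixel / 8 * fps / MIB


if __name__ == "__main__":
    baseline = 593  # MiB/s per direction, fullscreen case from the post
    extra = compositing_pass_mib_s()
    print(f"windowed worst case: ~{baseline + extra:.0f} MiB/s READ "
          f"and ~{baseline + extra:.0f} MiB/s WRITE")
```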

You can measure the exact bandwidth using perf and the DDR performance counter driver; see perf list for all the events available and select them with -e (if you need details on this, let me know).

Thank you for the detailed response. It is informative and helped us to proceed further.
We have one more query in this regard.
We are now trying to interface a 13 MP camera (4208 (H) x 3120 (V)) with an AP1302 ISP to the Avenger96 board. During this development we observed that there is a limit of 2592 x 2592 in the dcmi driver (linux/stm32-dcmi.c at master · torvalds/linux, github.com).
We tried to find out why this limit is in the driver by going through all the related documents, but we have not yet identified its source.
Currently we are not able to stream a 13 MP preview to the HDMI display, even though we raised the limit in the dcmi driver to 4208 x 3120.
We have also tried increasing the CMA memory to 512 MB, but the preview is still not successful.
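
For scale, the size of a single capture buffer at these resolutions can be estimated as below. The pixel format is not stated in this post, so 16 bits per pixel (packed YUV) is an assumption, and the buffer count of 4 is taken only as an example from the `buffers=4` log line earlier in the thread.

```python
MIB = 1 << 20  # bytes per MiB


def frame_bytes(width, height, bits_per_pixel=16):
    """Size in bytes of one uncompressed frame (16 bpp is an assumption)."""
    return width * height * bits_per_pixel // 8


if __name__ == "__main__":
    buffers = 4  # example count only, not confirmed for this setup
    for width, height in ((2592, 2592), (4208, 3120)):
        size = frame_bytes(width, height)
        print(f"{width}x{height}: {size / MIB:.1f} MiB per frame, "
              f"{buffers * size / MIB:.1f} MiB for {buffers} buffers")
```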
The preview was tried at 4208 x 3120 @ 1 FPS. The meminfo logs follow.
When AV96 is idle:
root@dh-stm32mp1-dhcor-avenger96:~# cat /proc/meminfo
MemTotal: 1027396 kB
MemFree: 900804 kB
MemAvailable: 930804 kB
Buffers: 6916 kB
Cached: 56540 kB
SwapCached: 0 kB
Active: 20116 kB
Inactive: 55544 kB
Active(anon): 1608 kB
Inactive(anon): 31076 kB
Active(file): 18508 kB
Inactive(file): 24468 kB
Unevictable: 9708 kB
Mlocked: 0 kB
HighTotal: 262140 kB
HighFree: 181380 kB
LowTotal: 765256 kB
LowFree: 719424 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 4 kB
Writeback: 0 kB
AnonPages: 21980 kB
Mapped: 22820 kB
Shmem: 20480 kB
KReclaimable: 11384 kB
Slab: 22424 kB
SReclaimable: 11384 kB
SUnreclaim: 11040 kB
KernelStack: 936 kB
PageTables: 888 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 513696 kB
Committed_AS: 102928 kB
VmallocTotal: 245760 kB
VmallocUsed: 920 kB
VmallocChunk: 0 kB
Percpu: 208 kB
CmaTotal: 524288 kB
CmaFree: 511852 kB

When AV96 tries to start streaming:
root@dh-stm32mp1-dhcor-avenger96:~# cat /proc/meminfo
MemTotal: 1027396 kB
MemFree: 418232 kB
MemAvailable: 453208 kB
Buffers: 6268 kB
Cached: 350180 kB
SwapCached: 0 kB
Active: 20508 kB
Inactive: 68372 kB
Active(anon): 1644 kB
Inactive(anon): 38712 kB
Active(file): 18864 kB
Inactive(file): 29660 kB
Unevictable: 297152 kB
Mlocked: 0 kB
HighTotal: 262140 kB
HighFree: 576 kB
LowTotal: 765256 kB
LowFree: 417656 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 36 kB
Writeback: 0 kB
AnonPages: 29652 kB
Mapped: 27172 kB
Shmem: 307924 kB
KReclaimable: 10812 kB
Slab: 22720 kB
SReclaimable: 10812 kB
SUnreclaim: 11908 kB
KernelStack: 992 kB
PageTables: 1008 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 513696 kB
Committed_AS: 453364 kB
VmallocTotal: 245760 kB
VmallocUsed: 932 kB
VmallocChunk: 0 kB
Percpu: 208 kB
CmaTotal: 524288 kB
CmaFree: 331412 kB
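
To compare two such dumps without reading them line by line, a small Python helper along these lines could be used. It is only a sketch; the function names and the 1024 kB threshold are arbitrary choices for this illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: value_in_kB} dict."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def diff_meminfo(before, after, min_delta_kb=1024):
    """Return {field: after - before} for fields that changed noticeably."""
    a, b = parse_meminfo(before), parse_meminfo(after)
    deltas = {key: b[key] - a[key] for key in a if key in b}
    return {k: d for k, d in deltas.items() if abs(d) >= min_delta_kb}


if __name__ == "__main__":
    # A few fields copied from the two dumps above.
    idle = "MemFree: 900804 kB\nShmem: 20480 kB\nCmaFree: 511852 kB"
    streaming = "MemFree: 418232 kB\nShmem: 307924 kB\nCmaFree: 331412 kB"
    for field, delta in diff_meminfo(idle, streaming).items():
        print(f"{field}: {delta:+d} kB")
```

Applied to the two dumps above, this shows for example CmaFree dropping by 180440 kB and Shmem growing by 287444 kB when streaming starts.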