Venus encoding support

Hi,

I’m using gstreamer to encode h264 video. For testing purposes, I’m using videotestsrc but I can’t achieve the maximum framerate for the given input. Have you experienced this issue?

Format required: NV12, fullhd, 60fps

This is the gstreamer pipeline I’m using (for debugging I’m using the GstShark tracers):

GST_DEBUG="*:3,GST_TRACER:7" GST_TRACERS="framerate" \
gst-launch-1.0 videotestsrc pattern=0 num-buffers=3000 ! \
video/x-raw, width=1920, height=1080, framerate=60/1, format=NV12 ! \
tee name=tee0 \
tee0. ! queue name=qvideotestsrc ! fakesink \
tee0. ! queue name=qv4l2h264 max-size-buffers=10 leaky=2 ! v4l2h264enc extra-controls="controls,h264_profile=4,video_bitrate_mode=1,video_peak_bitrate=25000000,video_bitrate=25000000;" ! fakesink

Here is the output:

As you can see in the plot, videotestsrc is giving an output of 60fps but the venus encoder (v4l2h264enc) can’t achieve that. I tried to parallelize the branches into threads by using the queues, but I’m not sure if that is the issue. Also, one of the CPU cores is at 100%, which I assume is the encoding part.

Any idea why this is happening or how can I tweak the encoder to get 60fps?
I tried to add queues or modify the bitrate but still not able to get the proper output.

Thanks in advance!

I think the CPU core at 100% is the cause of your issue (the bottleneck): encoding is done by the hardware encoder, so it should have only a ‘limited’ impact on the CPU load. AFAIR @leo-yan already performed some perf analysis on this kind of pipeline, but I don’t remember the outcome; maybe something is wrong with videotestsrc…

  • @sumit.garg Sumit has a lot of experience with this issue; could you give some suggestions? :slight_smile:

It’s most likely that videotestsrc is the bottleneck here: it is a single-threaded element performing heavy byte-wise operations, which slows down the rate of video frame generation. With such a slow input source feeding the gstreamer pipeline, the encoder threads could be starved.

So I would suggest pre-generating the video frames and then using them as the input source to the encoder. Something like:

$ gst-launch-1.0 videotestsrc pattern=0 num-buffers=3000 ! video/x-raw, width=1920, height=1080, framerate=60/1, format=NV12 ! filesink location=testfile.ts

$ gst-launch-1.0 filesrc num-buffers=3000 location=testfile.ts blocksize=3110400 '!' video/x-raw, width=1920, height=1080, framerate=60/1, format=NV12 ! v4l2h264enc extra-controls="controls,h264_profile=4,video_bitrate_mode=1,video_peak_bitrate=25000000,video_bitrate=25000000;" ! fakesink

Hi guys,

Thanks for your feedback. I agree that the bottleneck should be in the CPU, so I followed @sumit.garg’s steps, but I couldn’t get the 2nd pipeline working (the one that reads and encodes the generated ‘raw’ file). Basically, the error says 'Filter caps do not completely specify the output format':

Additional debug info:
../../../gst-plugins-good-1.14.4/sys/v4l2/v4l2_calls.c(587): gst_v4l2_get_selection_capabilities (): /GstPipeline:pipeline0/v4l2h264enc:v4l2h264enc0:
system error: Invalid argument
ERROR: from element /GstPipeline:pipeline0/GstCapsFilter:capsfilter0: Filter caps do not completely specify the output format
Additional debug info:
../../../gstreamer-1.14.4/plugins/elements/gstcapsfilter.c(455): gst_capsfilter_prepare_buf (): /GstPipeline:pipeline0/GstCapsFilter:capsfilter0:
Output caps are unfixed: video/x-raw, width=(int)1920, height=(int)1080, framerate=(fraction)60/1, format=(string)NV12, interlace-mode=(string){ progressive, interleaved }, colorimetry=(string){ bt601, smpte240m, bt709, 2:4:5:2, 2:4:5:3, 1:4:7:1, 2:4:7:1, 2:4:12:8, bt2020, 2:0:0:0 }
ERROR: pipeline doesn't want to preroll.
Setting pipeline to NULL ...
Freeing pipeline ...

It seems to be an error with 'blocksize', but I think the value is correct for NV12 fullHD: 3110400 = 1920x1080x1.5.
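For reference, the blocksize arithmetic can be checked with a small sketch. It assumes a tightly packed NV12 frame with no row padding (the helper name nv12_frame_size is made up for this example, it is not a GStreamer API):

```python
def nv12_frame_size(width: int, height: int) -> int:
    """Bytes in one NV12 frame, assuming no stride/row padding."""
    luma = width * height                                 # full-resolution Y plane, 1 byte per pixel
    chroma = 2 * ((width + 1) // 2) * ((height + 1) // 2)  # interleaved U/V plane, subsampled 2x2
    return luma + chroma


frame = nv12_frame_size(1920, 1080)
print(frame)         # 3110400 bytes per frame (the filesrc blocksize)
print(frame * 60)    # 186624000 bytes per second the source must deliver at 60 fps
print(frame * 3000)  # 9331200000 bytes for a 3000-frame raw file
```

So the blocksize value itself looks right. The numbers also show that the raw file is roughly 9.3 GB and that reading it at 60 fps needs about 187 MB/s from storage, which may be worth keeping in mind when using filesrc as the test source.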

The weird thing is that I’ve also tried the previous pipeline with a video device I have connected (v4l2src on /dev/videoX) instead of videotestsrc. In this case the result is the same: I can’t get 60fps out of the venus encoder even though my camera streams 60fps NV12 fullHD and, most importantly, all CPU cores are below 50%.

Here is an output log of gstreamer for fullHD, 30fps, NV12 (in this case, venus enc is working at 24fps):

Pipeline:

GST_DEBUG="*:3,GST_TRACER:7" GST_TRACERS="framerate;cpuusage" \
gst-launch-1.0 v4l2src device=/dev/qcam_video0 num-buffers=600 ! \
video/x-raw, width=1920, height=1080, framerate=30/1, format=NV12 ! \
tee name=tee0 \
tee0. ! queue name=qv4l2src ! fakesink \
tee0. ! queue name=qv4l2h264 max-size-buffers=10 leaky=2 ! v4l2h264enc extra-controls="controls,h264_profile=4,video_bitrate_mode=1,video_peak_bitrate=25000000,video_bitrate=25000000;" ! fakesink

Log:

0:00:09.673702554  4047 0xaaaae24ebcc0 TRACE             GST_TRACER :0:: cpuusage, number=(uint)0, load=(double)11.340206;
0:00:09.673889274  4047 0xaaaae24ebcc0 TRACE             GST_TRACER :0:: cpuusage, number=(uint)1, load=(double)15.463917;
0:00:09.674041775  4047 0xaaaae24ebcc0 TRACE             GST_TRACER :0:: cpuusage, number=(uint)2, load=(double)0.000000;
0:00:09.674187140  4047 0xaaaae24ebcc0 TRACE             GST_TRACER :0:: cpuusage, number=(uint)3, load=(double)0.000000;
0:00:09.674347401  4047 0xaaaae24ebcc0 TRACE             GST_TRACER :0:: framerate, pad=(string)v4l2src0_src, fps=(uint)30;
0:00:09.674494798  4047 0xaaaae24ebcc0 TRACE             GST_TRACER :0:: framerate, pad=(string)capsfilter0_src, fps=(uint)30;
0:00:09.674638132  4047 0xaaaae24ebcc0 TRACE             GST_TRACER :0:: framerate, pad=(string)qv4l2src_src, fps=(uint)30;
0:00:09.674783132  4047 0xaaaae24ebcc0 TRACE             GST_TRACER :0:: framerate, pad=(string)qv4l2h264_src, fps=(uint)23;
0:00:09.674926987  4047 0xaaaae24ebcc0 TRACE             GST_TRACER :0:: framerate, pad=(string)tee0_src_1, fps=(uint)30;
0:00:09.675068759  4047 0xaaaae24ebcc0 TRACE             GST_TRACER :0:: framerate, pad=(string)tee0_src_0, fps=(uint)30;
0:00:09.675211520  4047 0xaaaae24ebcc0 TRACE             GST_TRACER :0:: framerate, pad=(string)v4l2h264enc0_src, fps=(uint)23;

The logs for 60fps are the same but in this case, encoding works up to 45fps.
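To compare pads without reading the raw trace, the framerate records can be summarized with a short script. This is only a sketch: the regular expression is written against the log lines shown above, and the helper names (fps_by_pad, slow_pads) are made up for the example, they are not part of GstShark.

```python
import re
from collections import defaultdict

# Matches the "framerate" records as they appear in the GST_TRACER log above.
FRAMERATE_RE = re.compile(
    r"framerate, pad=\(string\)(?P<pad>\w+), fps=\(uint\)(?P<fps>\d+);"
)


def fps_by_pad(log_text):
    """Return {pad_name: [fps samples]} collected from the log text."""
    samples = defaultdict(list)
    for match in FRAMERATE_RE.finditer(log_text):
        samples[match.group("pad")].append(int(match.group("fps")))
    return dict(samples)


def slow_pads(samples, target_fps):
    """Return {pad_name: mean fps} for pads whose mean is below target_fps."""
    means = {pad: sum(values) / len(values) for pad, values in samples.items()}
    return {pad: mean for pad, mean in means.items() if mean < target_fps}


# Three records copied from the log above.
LOG = """\
0:00:09.674347401  4047 0xaaaae24ebcc0 TRACE             GST_TRACER :0:: framerate, pad=(string)v4l2src0_src, fps=(uint)30;
0:00:09.674783132  4047 0xaaaae24ebcc0 TRACE             GST_TRACER :0:: framerate, pad=(string)qv4l2h264_src, fps=(uint)23;
0:00:09.675211520  4047 0xaaaae24ebcc0 TRACE             GST_TRACER :0:: framerate, pad=(string)v4l2h264enc0_src, fps=(uint)23;
"""

print(slow_pads(fps_by_pad(LOG), target_fps=30))
# {'qv4l2h264_src': 23.0, 'v4l2h264enc0_src': 23.0}
```

On this sample only the encoder branch (the queue in front of v4l2h264enc and the encoder's own src pad) falls below the source rate, which matches what the plots show.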

Maybe try to use videoparse or rawvideoparse instead of a caps filter, e.g.:

gst-launch-1.0 filesrc num-buffers=3000 location=testfile.ts ! videoparse format=nv12 width=1920 height=1080 ! v4l2h264enc...

Thanks @Loic,

I’m not sure if the raw file is good enough for the encoder, but here are the pipelines and graphs:

  • gen raw file
GST_DEBUG="*:3,GST_TRACER:7" GST_TRACERS="framerate;cpuusage" \
gst-launch-1.0 videotestsrc pattern=0 num-buffers=3000 ! \
video/x-raw, width=1920, height=1080, framerate=60/1, format=NV12 ! \
filesink location=testfile.ts

Here is the CPU usage, which I think is dragging the fps of videotestsrc down to 0 at some points.

  • encode raw file
GST_DEBUG="*:3,GST_TRACER:7" GST_TRACERS="framerate;cpuusage" \
gst-launch-1.0 filesrc num-buffers=3000 location=testfile.ts \
! rawvideoparse format=nv12 width=1920 height=1080 framerate=60/1 \
! v4l2h264enc extra-controls="controls,h264_profile=4,video_bitrate_mode=1,video_peak_bitrate=25000000,video_bitrate=25000000;" ! \
fakesink

Here is the documentation for the GstShark framerate tracer: https://developer.ridgerun.com/wiki/index.php?title=GstShark_-_Framerate_tracer
It says: ‘The frame rate is the measurement of the frame frequency, that means that it is the measurement of the number of frames that go through the source pad of certain element in a given time. Normally frame rate is expressed in frames per second (FPS)’.
Which I guess is the one I need for this test.

Can you share CPU usage graph for this as well?

Sure!

I just want to add more information and clarify my comment about v4l2src:

The weird thing is that I’ve also tried the previous pipeline with a video device I have connected (v4l2src on /dev/videoX) instead of videotestsrc. In this case the result is the same: I can’t get 60fps out of the venus encoder even though my camera streams 60fps NV12 fullHD and, most importantly, all CPU cores are below 50%.

Here are some plots for the video device (v4l2src: /dev/qcam_video0):

  • cpu (cores) vs framerate:

I know there are a lot of graphs, but you can still see a line at 60 fps for v4l2src and another line at around 45 fps for v4l2h264enc. Also, CPU core usage is below 50%.

  • Here just the framerate plot:

  • Here the pipeline:
GST_DEBUG="*:3,GST_TRACER:7" GST_TRACERS="framerate;cpuusage" \
gst-launch-1.0 v4l2src device=/dev/qcam_video0 num-buffers=3000 ! \
video/x-raw, width=1920, height=1080, framerate=60/1, format=NV12 ! \
tee name=tee0 \
tee0. ! queue name=qv4l2src ! fakesink \
tee0. ! queue name=qv4l2h264 max-size-buffers=10 leaky=2 ! v4l2h264enc extra-controls="controls,h264_profile=4,video_bitrate_mode=1,video_peak_bitrate=25000000,video_bitrate=25000000;" ! fakesink

Could you please report the value of /sys/kernel/debug/clk/video_core_clk/clk_rate while pipeline is running?

I think it’s a question/problem for @svarbanov
Also, on a recent build, the pipeline breaks if I specify a framerate in the gstreamer pipeline filter:

gst-launch-1.0 videotestsrc ! video/x-raw, width=1920, height=1080, format=NV12, framerate=60/1  ! v4l2h264enc ! fpsdisplaysink video-sink=fakesink -v 

any idea?

@Loic, to get the gstreamer tracers you need to install GstShark

Regarding venus encoder driver… I was testing the above with:

  • This patch reverted: 5354ea814aebe6f066d55a6a0840831fa638d20a

And as you are pointing:

@Loic
Could you please report the value of /sys/kernel/debug/clk/video_core_clk/clk_rate while pipeline is running?

  • I also changed the frequency for 1080p@30 as suggested here:
diff --git a/drivers/media/platform/qcom/venus/core.c b/drivers/media/platform/qcom/venus/core.c
index 5b8350e87e75..b786ead6a285 100644
--- a/drivers/media/platform/qcom/venus/core.c
+++ b/drivers/media/platform/qcom/venus/core.c
@@ -433,7 +433,7 @@ static const struct freq_tbl msm8996_freq_table[] = {
        { 1944000, 490000000 }, /* 4k UHD @ 60 */
        {  972000, 320000000 }, /* 4k UHD @ 30 */
        {  489600, 150000000 }, /* 1080p @ 60 */
-       {  244800,  75000000 }, /* 1080p @ 30 */
+       {  244800,  76000000 }, /* 1080p @ 30 */
 };
 
 static const struct reg_val msm8996_reg_preset[] = {

I had applied the freq patch but somehow the venus-enc.ko driver was not up to date on my board. Maybe I changed it and then reverted it, since I couldn’t verify that it was really needed for my setup. My fault.

Therefore, my previous clk_rate was for 1080p@30:

cat /sys/kernel/debug/clk/video_core_clk/clk_rate
75000000

And now it is (:question:):

cat /sys/kernel/debug/clk/video_core_clk/clk_rate
150000000

And here the results for 1080p@30:

But still the same results for 1080p@60 (with the same clock rate 150000000):

So, I guess it is something related to the clock for 1080p@60?

I can’t understand why, for the 30 fps pipeline, the frequency used is 150000000 and not 76000000.

BTW, can someone point me to the downstream driver for venus? I guess it is not here http://git.linaro.org/landing-teams/working/qualcomm/kernel.git but in some AOSP repo.

Could you please try the following venus-core patch:
https://git.linaro.org/people/loic.poulain/linux.git/commit/?h=qcomlt-4.14-venus&id=41e9dd9c103d221ea694ad173a0364c51a06e6af

Yes! It worked! Thanks, @Loic for the patch.

I was always pointing to the working/qualcomm/kernel.git - Qualcomm Landing Team kernel and I wasn’t aware of your repo.

Results:

But let me ask you more questions about this frequency table:

  • Could you point me to the downstream kernel where the venus encoder driver is? I’m just curious about that; I guess it is in AOSP but I have no idea where it is located.
  • Is there a way to calculate these frequency values for more resolutions and frame rates (fps)?

Thanks again! :smile:

Yes, that’s the right repo; mine is just a personal repo for development and for sharing some fixes. The plan is to upstream a fix and to backport it to the qualcomm repo.

Well, there are several ‘downstream’ repos; you can find the frequencies defined in a devicetree at:
https://android.googlesource.com/kernel/msm/+/android-msm-wahoo-4.4-oreo-dr1/arch/arm/boot/dts/qcom/msm8996-vidc.dtsi#41
And associated driver:
https://android.googlesource.com/kernel/msm/+/android-msm-wahoo-4.4-oreo-dr1/drivers/media/platform/msm/vidc/

It’s quite obscure to me, but it relies on the encoder/decoder load, which is based on the throughput of the (decoding/encoding) instance(s), so it depends on frame width/height/fps:
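As a rough illustration of that load idea: the load values in the msm8996_freq_table quoted earlier in this thread are consistent with 16x16 macroblocks per second (1080p@30 gives 120 x 68 x 30 = 244800). The sketch below reproduces those numbers. The lookup policy in pick_freq (lowest frequency whose load covers the request) is an assumption for illustration, not a statement of what the driver does, and it does not explain the 150000000 reading seen at 30 fps.

```python
# Load/frequency pairs copied from msm8996_freq_table as quoted in this thread
# (unpatched 1080p @ 30 value), highest load first.
FREQ_TABLE = [
    (1944000, 490000000),  # 4k UHD @ 60
    (972000, 320000000),   # 4k UHD @ 30
    (489600, 150000000),   # 1080p @ 60
    (244800, 75000000),    # 1080p @ 30
]


def mbs_per_sec(width, height, fps):
    """Load in 16x16 macroblocks per second, rounding partial blocks up."""
    mbs_per_frame = -(-width // 16) * -(-height // 16)  # ceiling division
    return mbs_per_frame * fps


def pick_freq(load, table=FREQ_TABLE):
    """Assumed policy: lowest frequency whose table load covers the request."""
    candidates = [freq for max_load, freq in table if load <= max_load]
    # If the load exceeds every entry, fall back to the fastest clock.
    return min(candidates) if candidates else table[0][1]


for w, h, fps in [(1920, 1080, 30), (1920, 1080, 60), (1920, 1080, 120), (3840, 2160, 30)]:
    load = mbs_per_sec(w, h, fps)
    print(f"{w}x{h}@{fps}: load={load} -> {pick_freq(load)} Hz")
```

Under these assumptions 1080p@120 comes out at a load of 979200, just above the 4k UHD @ 30 entry, so it would land on the 490000000 entry unless a dedicated row is added to the table.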

Thanks @Loic for the links.

Okay, I will try to create and test additional modes since it should work up to 1080p@120.