Long Running Venus Encoder Fault

Hello, I am working on a streaming video application and have been seeing issues when running video through Venus for long periods of time. The video stream starts getting corrupted after several hours of use. I am not sure exactly when it fails, but if I do not refresh the encoder by taking the RTP H.264 stream down and bringing it back up again, the stream becomes unplayable.

Is there a maximum length that I am hitting when leaving the encoder open for multiple hours? The only indicator of failure is a dmesg log entry indicating an “active buffer mismatch” from CAMSS. Could this be the culprit?

@svarbanov any idea?

Hi Rob, I cannot find such a print in camss. Could you point me to the exact message and also tell me which kernel version you are using? I haven’t seen such encoder behavior, so more info will be needed on how it can be reproduced.

I was finally able to circle back to this issue; sorry to bump a 7-month-old post. I was able to do some additional diagnostics with the help of the following GStreamer pipeline:

gst-launch-1.0 -m -e v4l2src device=/dev/video3 ! video/x-raw,format=NV12,width=1280,height=720,framerate=30/1 ! v4l2h264enc extra-controls="controls,h264_level=9" ! video/x-h264, stream-format=byte-stream, alignment=au, profile=constrained-baseline, width=1280, height=720, framerate=30/1 ! h264parse ! fakesink

I see the following events come across my console when the duration of the encode approaches ~40 hours:

Unfortunately I do not have additional logging from this particular test; however, I will be spinning up another test soon where I’ll log to a file and save the entire encode to a file. This way we’ll have the log at the transition to the fault and a video file to parse. As far as I can tell, the error develops when a slice header field overflows past 65535, creating a non-conformant H.264 stream. The resulting streams appear to decode on some software decoders (x264 and openh264), but hardware decoders like the ones found in iPhones have trouble decoding them once the encode session enters this state. Can the Venus encoder be used for protracted periods of encoding in a security camera-like use case?

I have some additional information pertaining to this fault:

I was able to capture the bitstream to a file and run it through CodecVisa, a stream analysis program, and it appears to have identified the offending field in the slice header. The field containing the invalid data appears to be idr_pic_id, which identifies an IDR picture to the decoder. The IDR (instantaneous decoder refresh) frame is responsible for clearing the reference picture buffer and invalidating all reference frames before it (see “Everything You Ever Wanted to Know About IDR Frames but Were Afraid to Ask”, Streaming Learning Center).
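As a rough sanity check on the ~40 hour onset, here is a small back-of-the-envelope sketch in Python. It assumes (this is not confirmed for Venus) that idr_pic_id starts at 0, increases by one per IDR frame, and is never wrapped by the encoder. The function names are made up for illustration.

```python
MAX_IDR_PIC_ID = 65535  # upper bound quoted for idr_pic_id


def implied_idr_interval_frames(hours_to_fault: float, fps: float) -> float:
    """Frames between IDRs that would exhaust 0..65535 in the given time.

    Assumes the counter starts at 0 and increments once per IDR frame.
    """
    total_frames = hours_to_fault * 3600 * fps
    return total_frames / (MAX_IDR_PIC_ID + 1)


def hours_until_overflow(idr_interval_frames: float, fps: float) -> float:
    """Hours of encoding before the counter would pass MAX_IDR_PIC_ID."""
    seconds = (MAX_IDR_PIC_ID + 1) * idr_interval_frames / fps
    return seconds / 3600


if __name__ == "__main__":
    # 30 fps and ~40 hours are the figures from this thread.
    print(round(implied_idr_interval_frames(40, 30), 1))
    print(round(hours_until_overflow(66, 30), 2))
```

Under those assumptions, a fault at around 40 hours at 30 fps would correspond to an IDR roughly every 66 frames, i.e. a bit over two seconds. I have not verified what IDR period the encoder is actually using, so treat this only as a plausibility check.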

Slice header decoded:

Full encoder log (matching data found on line 49 “0:00:01.750390843 786 0xc5dd190 WARN codecparsers_h264 gsth264parser.c:2128:gst_h264_parser_parse_slice_hdr: value greater than max. value: 78636, max 65535” ): https://gist.github.com/RobGries/91a91d659bf0c506b6f945ad0a6442a6
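For anyone who wants to check a captured file without a stream analyzer, here is a minimal Python sketch that pulls idr_pic_id out of IDR slice headers and flags values above 65535. It is not the GStreamer parser's code, and it makes several assumptions: the file is an Annex B byte stream (as requested by stream-format=byte-stream in the pipeline), the profile is constrained-baseline (so there is no colour_plane_id or field_pic_flag before idr_pic_id), and the caller supplies frame_num_bits, which is log2_max_frame_num_minus4 + 4 from the SPS. All function names are hypothetical.

```python
MAX_IDR_PIC_ID = 65535


def strip_emulation_prevention(data: bytes) -> bytes:
    """Drop the 0x03 byte that the encoder inserts after 0x00 0x00."""
    out = bytearray()
    zeros = 0
    for b in data:
        if zeros >= 2 and b == 3:
            zeros = 0
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


class BitReader:
    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current bit offset

    def bits(self, n: int) -> int:
        """Read n bits, most significant bit first."""
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value

    def ue(self) -> int:
        """Read an unsigned Exp-Golomb code, ue(v)."""
        zeros = 0
        while self.bits(1) == 0:
            zeros += 1
        return (1 << zeros) - 1 + self.bits(zeros)


def iter_nals(stream: bytes):
    """Yield NAL units by splitting on the 00 00 01 start code."""
    for chunk in stream.split(b"\x00\x00\x01"):
        if chunk:
            yield chunk


def idr_pic_id(nal: bytes, frame_num_bits: int):
    """Return idr_pic_id for an IDR slice (nal_unit_type 5), else None."""
    if not nal or (nal[0] & 0x1F) != 5:
        return None
    r = BitReader(strip_emulation_prevention(nal[1:]))
    r.ue()                  # first_mb_in_slice
    r.ue()                  # slice_type
    r.ue()                  # pic_parameter_set_id
    r.bits(frame_num_bits)  # frame_num
    return r.ue()           # idr_pic_id


def out_of_range_idr_ids(stream: bytes, frame_num_bits: int):
    """List every idr_pic_id in the stream that exceeds 65535."""
    ids = (idr_pic_id(n, frame_num_bits) for n in iter_nals(stream))
    return [i for i in ids if i is not None and i > MAX_IDR_PIC_ID]
```

Usage would be something like out_of_range_idr_ids(open("capture.h264", "rb").read(), frame_num_bits), where both the file name and the frame_num_bits value are placeholders that depend on the actual capture and its SPS.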

Additionally, http://www.ramugedia.com/h-264-avc-error-handling states that idr_pic_id cannot exceed 65535 and rates the severity of the error as Middle/Low. The site also suggests that a valid recovery step might be to set idr_pic_id to zero when it overflows.
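To make the suggested recovery concrete, here is a minimal Python sketch of a counter that returns to zero instead of passing the maximum. This is only an illustration of the idea; it is not Venus firmware or driver code, and the function name is made up.

```python
MAX_IDR_PIC_ID = 65535


def next_idr_pic_id(current: int) -> int:
    """Advance the counter, returning to 0 rather than exceeding 65535."""
    return 0 if current >= MAX_IDR_PIC_ID else current + 1
```

As far as I understand the H.264 requirement, consecutive IDR pictures only need to carry different idr_pic_id values, so stepping from 65535 back to 0 should still be acceptable to a decoder.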

@svarbanov, should Venus detect this fault and set the idr_pic_id to zero when it overflows past its maximum?