media: cedrus: skip invalid H.264 reference list entries

Message ID 20260324080856.56787-1-pengpeng@iscas.ac.cn (mailing list archive)
State New
Headers
Series media: cedrus: skip invalid H.264 reference list entries |

Commit Message

Pengpeng Hou March 24, 2026, 8:08 a.m. UTC
Cedrus consumes H.264 ref_pic_list0/ref_pic_list1 entries from the
stateless slice control and later uses their indices to look up
decode->dpb[] in _cedrus_write_ref_list().

Rejecting such controls in cedrus_try_ctrl() would break existing
userspace, since stateless H.264 reference lists may legitimately carry
out-of-range indices for missing references. Instead, guard the actual
DPB lookup in Cedrus and skip entries whose indices do not fit the fixed
V4L2_H264_NUM_DPB_ENTRIES array.

This keeps the fix local to the driver use site and avoids out-of-bounds
reads from malformed or unsupported reference list entries.

Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn>
---
 drivers/staging/media/sunxi/cedrus/cedrus_h264.c | 3 +++
 1 file changed, 3 insertions(+)
  

Comments

Jernej Skrabec March 29, 2026, 9:21 a.m. UTC | #1
Dne torek, 24. marec 2026 ob 09:08:56 Srednjeevropski poletni čas je Pengpeng Hou napisal(a):
> Cedrus consumes H.264 ref_pic_list0/ref_pic_list1 entries from the
> stateless slice control and later uses their indices to look up
> decode->dpb[] in _cedrus_write_ref_list().
> 
> Rejecting such controls in cedrus_try_ctrl() would break existing
> userspace, since stateless H.264 reference lists may legitimately carry
> out-of-range indices for missing references. Instead, guard the actual
> DPB lookup in Cedrus and skip entries whose indices do not fit the fixed
> V4L2_H264_NUM_DPB_ENTRIES array.
> 
> This keeps the fix local to the driver use site and avoids out-of-bounds
> reads from malformed or unsupported reference list entries.
> 
> Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn>

Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com>

Best regards,
Jernej
  
Chen-Yu Tsai March 29, 2026, 12:44 p.m. UTC | #2
On Sun, Mar 29, 2026 at 5:21 PM Jernej Škrabec <jernej.skrabec@gmail.com> wrote:
>
> Dne torek, 24. marec 2026 ob 09:08:56 Srednjeevropski poletni čas je Pengpeng Hou napisal(a):
> > Cedrus consumes H.264 ref_pic_list0/ref_pic_list1 entries from the
> > stateless slice control and later uses their indices to look up
> > decode->dpb[] in _cedrus_write_ref_list().
> >
> > Rejecting such controls in cedrus_try_ctrl() would break existing
> > userspace, since stateless H.264 reference lists may legitimately carry
> > out-of-range indices for missing references. Instead, guard the actual
> > DPB lookup in Cedrus and skip entries whose indices do not fit the fixed
> > V4L2_H264_NUM_DPB_ENTRIES array.
> >
> > This keeps the fix local to the driver use site and avoids out-of-bounds
> > reads from malformed or unsupported reference list entries.
> >
> > Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn>
>
> Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com>

Tested-by: Chen-Yu Tsai <wens@kernel.org>

This fixes a KASAN slab-use-after-free warning when running fluster H.264
tests.
  
Nicolas Dufresne March 30, 2026, 3:54 p.m. UTC | #3
Le mardi 24 mars 2026 à 16:08 +0800, Pengpeng Hou a écrit :
> Cedrus consumes H.264 ref_pic_list0/ref_pic_list1 entries from the
> stateless slice control and later uses their indices to look up
> decode->dpb[] in _cedrus_write_ref_list().
> 
> Rejecting such controls in cedrus_try_ctrl() would break existing
> userspace, since stateless H.264 reference lists may legitimately carry
> out-of-range indices for missing references. Instead, guard the actual
> DPB lookup in Cedrus and skip entries whose indices do not fit the fixed
> V4L2_H264_NUM_DPB_ENTRIES array.
> 
> This keeps the fix local to the driver use site and avoids out-of-bounds
> reads from malformed or unsupported reference list entries.
> 
> Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn>
> ---
>  drivers/staging/media/sunxi/cedrus/cedrus_h264.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> --- a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> +++ b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> @@ -210,6 +210,9 @@ static void _cedrus_write_ref_list(struct cedrus_ctx *ctx,
>  		u8 dpb_idx;
>  
>  		dpb_idx = ref_list[i].index;
> +		if (dpb_idx >= V4L2_H264_NUM_DPB_ENTRIES)
> +			continue;

Matches how we skip inactive references (in this diff, though most userspace
just don't pass them). Now, if I looked lower, we set a position for each
references. My understanding is that if no bits are set, it means "no position".
How much testing have you done to confirm the HW behaves properly ?

Despite this question, I think this is going to work better then doing memory
overrun:

	Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>

Nicolas

> +
>  		dpb = &decode->dpb[dpb_idx];
>  
>  		if (!(dpb->flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE))
  
Nicolas Dufresne March 30, 2026, 3:55 p.m. UTC | #4
Le dimanche 29 mars 2026 à 20:44 +0800, Chen-Yu Tsai a écrit :
> On Sun, Mar 29, 2026 at 5:21 PM Jernej Škrabec <jernej.skrabec@gmail.com> wrote:
> > 
> > Dne torek, 24. marec 2026 ob 09:08:56 Srednjeevropski poletni čas je Pengpeng Hou napisal(a):
> > > Cedrus consumes H.264 ref_pic_list0/ref_pic_list1 entries from the
> > > stateless slice control and later uses their indices to look up
> > > decode->dpb[] in _cedrus_write_ref_list().
> > > 
> > > Rejecting such controls in cedrus_try_ctrl() would break existing
> > > userspace, since stateless H.264 reference lists may legitimately carry
> > > out-of-range indices for missing references. Instead, guard the actual
> > > DPB lookup in Cedrus and skip entries whose indices do not fit the fixed
> > > V4L2_H264_NUM_DPB_ENTRIES array.
> > > 
> > > This keeps the fix local to the driver use site and avoids out-of-bounds
> > > reads from malformed or unsupported reference list entries.
> > > 
> > > Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn>
> > 
> > Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com>
> 
> Tested-by: Chen-Yu Tsai <wens@kernel.org>
> 
> This fixes a KASAN slab-use-after-free warning when running fluster H.264
> tests.

Ah, very good, can you cite which test caused that ? I didn't expect fluster to
cover cases with missing references. I think it will be handy for future
testing.

Nicolas
  
Chen-Yu Tsai March 30, 2026, 4:45 p.m. UTC | #5
On Mon, Mar 30, 2026 at 11:55 PM Nicolas Dufresne
<nicolas.dufresne@collabora.com> wrote:
>
> Le dimanche 29 mars 2026 à 20:44 +0800, Chen-Yu Tsai a écrit :
> > On Sun, Mar 29, 2026 at 5:21 PM Jernej Škrabec <jernej.skrabec@gmail.com> wrote:
> > >
> > > Dne torek, 24. marec 2026 ob 09:08:56 Srednjeevropski poletni čas je Pengpeng Hou napisal(a):
> > > > Cedrus consumes H.264 ref_pic_list0/ref_pic_list1 entries from the
> > > > stateless slice control and later uses their indices to look up
> > > > decode->dpb[] in _cedrus_write_ref_list().
> > > >
> > > > Rejecting such controls in cedrus_try_ctrl() would break existing
> > > > userspace, since stateless H.264 reference lists may legitimately carry
> > > > out-of-range indices for missing references. Instead, guard the actual
> > > > DPB lookup in Cedrus and skip entries whose indices do not fit the fixed
> > > > V4L2_H264_NUM_DPB_ENTRIES array.
> > > >
> > > > This keeps the fix local to the driver use site and avoids out-of-bounds
> > > > reads from malformed or unsupported reference list entries.
> > > >
> > > > Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn>
> > >
> > > Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com>
> >
> > Tested-by: Chen-Yu Tsai <wens@kernel.org>
> >
> > This fixes a KASAN slab-use-after-free warning when running fluster H.264
> > tests.
>
> Ah, very good, can you cite which test caused that ? I didn't expect fluster to
> cover cases with missing references. I think it will be handy for future
> testing.

Looks like it is FM1_BT_B. And it only happens on the first run after reboot,
or KASAN just only reports it once.

BTW, this would be a lot easier to figure out if we could get fluster to
output a system timestamp for each decode run (at least in single job mode).

I had to hack in delays between each decode rune, and then look at `dmesg -w`
and switching back to the window that has fluster running once the warning
triggers.


ChenYu
  
Nicolas Dufresne March 30, 2026, 5:25 p.m. UTC | #6
Le mardi 31 mars 2026 à 00:45 +0800, Chen-Yu Tsai a écrit :
> On Mon, Mar 30, 2026 at 11:55 PM Nicolas Dufresne
> <nicolas.dufresne@collabora.com> wrote:
> > 
> > Le dimanche 29 mars 2026 à 20:44 +0800, Chen-Yu Tsai a écrit :
> > > On Sun, Mar 29, 2026 at 5:21 PM Jernej Škrabec <jernej.skrabec@gmail.com> wrote:
> > > > 
> > > > Dne torek, 24. marec 2026 ob 09:08:56 Srednjeevropski poletni čas je Pengpeng Hou napisal(a):
> > > > > Cedrus consumes H.264 ref_pic_list0/ref_pic_list1 entries from the
> > > > > stateless slice control and later uses their indices to look up
> > > > > decode->dpb[] in _cedrus_write_ref_list().
> > > > > 
> > > > > Rejecting such controls in cedrus_try_ctrl() would break existing
> > > > > userspace, since stateless H.264 reference lists may legitimately carry
> > > > > out-of-range indices for missing references. Instead, guard the actual
> > > > > DPB lookup in Cedrus and skip entries whose indices do not fit the fixed
> > > > > V4L2_H264_NUM_DPB_ENTRIES array.
> > > > > 
> > > > > This keeps the fix local to the driver use site and avoids out-of-bounds
> > > > > reads from malformed or unsupported reference list entries.
> > > > > 
> > > > > Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn>
> > > > 
> > > > Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com>
> > > 
> > > Tested-by: Chen-Yu Tsai <wens@kernel.org>
> > > 
> > > This fixes a KASAN slab-use-after-free warning when running fluster H.264
> > > tests.
> > 
> > Ah, very good, can you cite which test caused that ? I didn't expect fluster to
> > cover cases with missing references. I think it will be handy for future
> > testing.
> 
> Looks like it is FM1_BT_B. And it only happens on the first run after reboot,
> or KASAN just only reports it once.

Thanks, its one of the unsupported stream that we didn't find how to detect
ahead of time, and so we try to decode it.

> 
> BTW, this would be a lot easier to figure out if we could get fluster to
> output a system timestamp for each decode run (at least in single job mode).


Well, that's not magical, they have to trace the same timestamp. An example, the
kernel and gstreamer both uses their own uptime, which is of course not helping
it at all.

> 
> I had to hack in delays between each decode rune, and then look at `dmesg -w`
> and switching back to the window that has fluster running once the warning
> triggers.

If all you care is which streams caused what kernel trace, I think the least
amount of effort is to propose a patch against fluster to syslog the start of
tests. Your logger will aggregate. Note that its only going to work for single
job run since the kernel error trace don't give enough context to trace back the
error into the V4L2 FD and back to the owning process.

Nicolas

> 
> 
> ChenYu
  
Paul Kocialkowski April 9, 2026, 1:33 p.m. UTC | #7
Hi,

On Tue 24 Mar 26, 16:08, Pengpeng Hou wrote:
> Cedrus consumes H.264 ref_pic_list0/ref_pic_list1 entries from the
> stateless slice control and later uses their indices to look up
> decode->dpb[] in _cedrus_write_ref_list().
> 
> Rejecting such controls in cedrus_try_ctrl() would break existing
> userspace, since stateless H.264 reference lists may legitimately carry
> out-of-range indices for missing references. Instead, guard the actual
> DPB lookup in Cedrus and skip entries whose indices do not fit the fixed
> V4L2_H264_NUM_DPB_ENTRIES array.

Could you explain why it is legitimate that userspace would pass indices that
are not in the dpb list? As far as I remember from the H.264 spec, the L0/L1
lists are constructed from active references only and the number of items there
should be given by num_ref_idx_l0_active_minus1/num_ref_idx_l1_active_minus1.
We can tolerate invalid data beyond these indices, but certainly not as part
of the indices that should be valid.

However I agree that cedrus_try_ctrl is maybe not the right place to check it
since I'm not sure we are guaranteed that the slice params control will be
checked before the new DPB (from the same request) is applied, so we might end
up checking against the dpb from the previous decode request.

But I think we should error out and not just skip the invalid reference.

All the best,

Paul

> 
> This keeps the fix local to the driver use site and avoids out-of-bounds
> reads from malformed or unsupported reference list entries.
> 
> Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn>
> ---
>  drivers/staging/media/sunxi/cedrus/cedrus_h264.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> --- a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> +++ b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> @@ -210,6 +210,9 @@ static void _cedrus_write_ref_list(struct cedrus_ctx *ctx,
>  		u8 dpb_idx;
>  
>  		dpb_idx = ref_list[i].index;
> +		if (dpb_idx >= V4L2_H264_NUM_DPB_ENTRIES)
> +			continue;
> +
>  		dpb = &decode->dpb[dpb_idx];
>  
>  		if (!(dpb->flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE))
> -- 
> 2.50.1
>
  
Nicolas Dufresne April 9, 2026, 2 p.m. UTC | #8
Le jeudi 09 avril 2026 à 15:33 +0200, Paul Kocialkowski a écrit :
> Hi,
> 
> On Tue 24 Mar 26, 16:08, Pengpeng Hou wrote:
> > Cedrus consumes H.264 ref_pic_list0/ref_pic_list1 entries from the
> > stateless slice control and later uses their indices to look up
> > decode->dpb[] in _cedrus_write_ref_list().
> > 
> > Rejecting such controls in cedrus_try_ctrl() would break existing
> > userspace, since stateless H.264 reference lists may legitimately carry
> > out-of-range indices for missing references. Instead, guard the actual
> > DPB lookup in Cedrus and skip entries whose indices do not fit the fixed
> > V4L2_H264_NUM_DPB_ENTRIES array.
> 
> Could you explain why it is legitimate that userspace would pass indices that
> are not in the dpb list? As far as I remember from the H.264 spec, the L0/L1
> lists are constructed from active references only and the number of items
> there
> should be given by num_ref_idx_l0_active_minus1/num_ref_idx_l1_active_minus1.
> We can tolerate invalid data beyond these indices, but certainly not as part
> of the indices that should be valid.
> 
> However I agree that cedrus_try_ctrl is maybe not the right place to check it
> since I'm not sure we are guaranteed that the slice params control will be
> checked before the new DPB (from the same request) is applied, so we might end
> up checking against the dpb from the previous decode request.
> 
> But I think we should error out and not just skip the invalid reference.

Its been a long time I haven't looked into this. But what happens here is that
once you lost a reference, the userspace DPB will hold a gap picture, which as
no backing storage. Since it has no backing storage, there is no cookie
(timestamp) associated with it. This gap picture will still make it to the
reference lists, since the position of the reference in the list is important
(you cannot just remove an item). It is an established practice in userspace to
simply fill the void with an invalid index, typically 0xff, which is always
invalid. Because that's what some userspace do, it became part of our ABI.

Decoders are expected be fault tolerant, though the tolerance level is hardware
specific, and so failing in the common code would be inappropriate (failing in
Cedrus could be acceptable, assuming it can't work with missing references,
which the implementation seems to be fine with).

Hantro G1 notably have a flag to report missing reference to the HW, and it will
manage concealement internally. G2/RKVDEC don't, and we try and pick the most
recent frame as a replacement backing storage, which most of the time minimises
the damages.

As future refinement, we need drivers in the long term to properly report the
damages (perhaps through additional RO request controls). As discussed few years
ago in the error handling wip for rkvdec, the V4L2 doc specify that any sort of
damages known to exist in a frame shall results in the ERROR flag being set. We
can deduce that the error flag with a payload of 0 indicates to userspace to not
use the frame (which typically happen on hard errors, or errror at entropy
decode staged) and ERROR flag with a correct payload signal some level of
corruption, and its left to the application to decide what to do.

Nicolas

> 
> All the best,
> 
> Paul
> 
> > 
> > This keeps the fix local to the driver use site and avoids out-of-bounds
> > reads from malformed or unsupported reference list entries.
> > 
> > Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn>
> > ---
> >  drivers/staging/media/sunxi/cedrus/cedrus_h264.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> > b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> > --- a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> > +++ b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> > @@ -210,6 +210,9 @@ static void _cedrus_write_ref_list(struct cedrus_ctx
> > *ctx,
> >  		u8 dpb_idx;
> >  
> >  		dpb_idx = ref_list[i].index;
> > +		if (dpb_idx >= V4L2_H264_NUM_DPB_ENTRIES)
> > +			continue;
> > +
> >  		dpb = &decode->dpb[dpb_idx];
> >  
> >  		if (!(dpb->flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE))
> > -- 
> > 2.50.1
> >
  
Nicolas Dufresne April 9, 2026, 2 p.m. UTC | #9
Le jeudi 09 avril 2026 à 22:30 +0800, Pengpeng Hou a écrit :
> Hi Paul,
> 
> Thanks, that makes sense.
> 
> I agree Cedrus should not silently skip an invalid index in the active
> portion of ref_pic_list0/ref_pic_list1.
> 
> I'll respin this to keep the check at the Cedrus use site rather than
> cedrus_try_ctrl(), but return -EINVAL when one of the active reference
> entries points past V4L2_H264_NUM_DPB_ENTRIES. Entries beyond
> num_ref_idx_l0_active_minus1 / num_ref_idx_l1_active_minus1 will still
> be ignored as before.

Please, let the discussion continue before respinning.

Nicolas

> 
> Thanks,
> Pengpeng
>
  
Pengpeng Hou April 9, 2026, 2:30 p.m. UTC | #10
Hi Paul,

Thanks, that makes sense.

I agree Cedrus should not silently skip an invalid index in the active
portion of ref_pic_list0/ref_pic_list1.

I'll respin this to keep the check at the Cedrus use site rather than
cedrus_try_ctrl(), but return -EINVAL when one of the active reference
entries points past V4L2_H264_NUM_DPB_ENTRIES. Entries beyond
num_ref_idx_l0_active_minus1 / num_ref_idx_l1_active_minus1 will still
be ignored as before.

Thanks,
Pengpeng
  
Paul Kocialkowski April 9, 2026, 2:31 p.m. UTC | #11
Hi Nicolas,

On Thu 09 Apr 26, 10:00, Nicolas Dufresne wrote:
> Le jeudi 09 avril 2026 à 15:33 +0200, Paul Kocialkowski a écrit :
> > Hi,
> > 
> > On Tue 24 Mar 26, 16:08, Pengpeng Hou wrote:
> > > Cedrus consumes H.264 ref_pic_list0/ref_pic_list1 entries from the
> > > stateless slice control and later uses their indices to look up
> > > decode->dpb[] in _cedrus_write_ref_list().
> > > 
> > > Rejecting such controls in cedrus_try_ctrl() would break existing
> > > userspace, since stateless H.264 reference lists may legitimately carry
> > > out-of-range indices for missing references. Instead, guard the actual
> > > DPB lookup in Cedrus and skip entries whose indices do not fit the fixed
> > > V4L2_H264_NUM_DPB_ENTRIES array.
> > 
> > Could you explain why it is legitimate that userspace would pass indices that
> > are not in the dpb list? As far as I remember from the H.264 spec, the L0/L1
> > lists are constructed from active references only and the number of items
> > there
> > should be given by num_ref_idx_l0_active_minus1/num_ref_idx_l1_active_minus1.
> > We can tolerate invalid data beyond these indices, but certainly not as part
> > of the indices that should be valid.
> > 
> > However I agree that cedrus_try_ctrl is maybe not the right place to check it
> > since I'm not sure we are guaranteed that the slice params control will be
> > checked before the new DPB (from the same request) is applied, so we might end
> > up checking against the dpb from the previous decode request.
> > 
> > But I think we should error out and not just skip the invalid reference.
> 
> Its been a long time I haven't looked into this. But what happens here is that
> once you lost a reference, the userspace DPB will hold a gap picture, which as
> no backing storage. Since it has no backing storage, there is no cookie
> (timestamp) associated with it. This gap picture will still make it to the
> reference lists, since the position of the reference in the list is important
> (you cannot just remove an item). It is an established practice in userspace to
> simply fill the void with an invalid index, typically 0xff, which is always
> invalid. Because that's what some userspace do, it became part of our ABI.

Right we definitely need to keep the order of the L0/L1 lists even with missing
references and the question is whether the hardware can deal with it or not.

Our uAPI specification currently doesn't say anything about handling missing
references. I'm generally not very keen on considering that undefined behavior
becomes de-facto uAPI that should never be broken, because there are cases where
it is obviously incorrect and the fact that it didn't fail previously is the
result of a bug in the implementation.

But in this situation I agree we do need a way to indicate that references are
missing and using 0xff sounds like a good plan to me, given that we provide a
uAPI header define with this value and that the doc mentions it.

> Decoders are expected be fault tolerant, though the tolerance level is hardware
> specific, and so failing in the common code would be inappropriate (failing in
> Cedrus could be acceptable, assuming it can't work with missing references,
> which the implementation seems to be fine with).

Okay I agree that we should not fail in common code and tolerate a value to
indicate a missing reference.

What the current proposal is doing (skipping the reference) results in the
SRAM entry for the reference remaining untouched, which will keep its value
from the previous frame. This seems clearly incorrect.

> Hantro G1 notably have a flag to report missing reference to the HW, and it will
> manage concealement internally. G2/RKVDEC don't, and we try and pick the most
> recent frame as a replacement backing storage, which most of the time minimises
> the damages.

It sounds like an approach that could work for cedrus too.

> As future refinement, we need drivers in the long term to properly report the
> damages (perhaps through additional RO request controls). As discussed few years
> ago in the error handling wip for rkvdec, the V4L2 doc specify that any sort of
> damages known to exist in a frame shall results in the ERROR flag being set. We
> can deduce that the error flag with a payload of 0 indicates to userspace to not
> use the frame (which typically happen on hard errors, or errror at entropy
> decode staged) and ERROR flag with a correct payload signal some level of
> corruption, and its left to the application to decide what to do.

I think it make sense yes, but it would be good to document it in the uAPI
document too.

All the best,

Paul

> Nicolas
> 
> > 
> > All the best,
> > 
> > Paul
> > 
> > > 
> > > This keeps the fix local to the driver use site and avoids out-of-bounds
> > > reads from malformed or unsupported reference list entries.
> > > 
> > > Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn>
> > > ---
> > >  drivers/staging/media/sunxi/cedrus/cedrus_h264.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > > 
> > > diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> > > b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> > > --- a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> > > +++ b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> > > @@ -210,6 +210,9 @@ static void _cedrus_write_ref_list(struct cedrus_ctx
> > > *ctx,
> > >  		u8 dpb_idx;
> > >  
> > >  		dpb_idx = ref_list[i].index;
> > > +		if (dpb_idx >= V4L2_H264_NUM_DPB_ENTRIES)
> > > +			continue;
> > > +
> > >  		dpb = &decode->dpb[dpb_idx];
> > >  
> > >  		if (!(dpb->flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE))
> > > -- 
> > > 2.50.1
> > >
  
Nicolas Dufresne April 9, 2026, 2:39 p.m. UTC | #12
Hi,

Le jeudi 09 avril 2026 à 16:31 +0200, Paul Kocialkowski a écrit :
> I think it make sense yes, but it would be good to document it in the uAPI
> document too.

Basically, extend in the M2M decoder spec(s) on the existing documentation:
   
   V4L2_BUF_FLAG_ERROR:
   -
   When this flag is set, the buffer has been dequeued successfully, although
   the data might **have been corrupted**. This is recoverable, streaming may
   continue as normal and the buffer may be reused normally. Drivers set this
   flag when the VIDIOC_DQBUF ioctl is called.
   
   
cheers,
Nicolas
  
Paul Kocialkowski April 9, 2026, 3:31 p.m. UTC | #13
Hi,

On Thu 09 Apr 26, 10:39, Nicolas Dufresne wrote:
> Hi,
> 
> Le jeudi 09 avril 2026 à 16:31 +0200, Paul Kocialkowski a écrit :
> > I think it make sense yes, but it would be good to document it in the uAPI
> > document too.
> 
> Basically, extend in the M2M decoder spec(s) on the existing documentation:
>    
>    V4L2_BUF_FLAG_ERROR:
>    -
>    When this flag is set, the buffer has been dequeued successfully, although
>    the data might **have been corrupted**. This is recoverable, streaming may
>    continue as normal and the buffer may be reused normally. Drivers set this
>    flag when the VIDIOC_DQBUF ioctl is called.

Well this part is about v4l2 buffers in general, not just m2m/decoders.
But I guess this mechanism would make sense for more device classes than just
decoders, so we could indeed specify it there. Maybe with a sufficiently broad
wording.

But it would be good to also update the stateless decode document (and maybe
stateful too) where V4L2_BUF_FLAG_ERROR is already mentionned a few times.
We could indicate how this behavior related to reference frames there.

If we agree I could make a series with the following:
- Introduce a V4L2_H264_REF_MISSING 0xff define (same for HEVC)
- Update the v4l2_h264_reference doc to mention it
- Update the cedrus driver to error out (zero-size payload) when the L0/L1 index
  is either V4L2_H264_REF_MISSING or an invalid index that doesn't exist in the
  DPB (same for HEVC)
- Update the v4l2 buffer and stateless(+stateful) documents to mention that
  buffers marked with V4L2_BUF_FLAG_ERROR may or may not contain usable (yet
  corrupted) data depending on the payload size and how it relates to reference
  frames.

Then we could later envision having a mechanism (hopefully common) to figure out
the best replacement to a given missing reference, which would allow cedrus
(and maybe other drivers too) to return a frame with incorrect data instead of
a zero-size payload error.

What do you think?

All the best,

Paul
  
Nicolas Dufresne April 9, 2026, 5:48 p.m. UTC | #14
Le jeudi 09 avril 2026 à 17:31 +0200, Paul Kocialkowski a écrit :
> Hi,
> 
> On Thu 09 Apr 26, 10:39, Nicolas Dufresne wrote:
> > Hi,
> > 
> > Le jeudi 09 avril 2026 à 16:31 +0200, Paul Kocialkowski a écrit :
> > > I think it make sense yes, but it would be good to document it in the uAPI
> > > document too.
> > 
> > Basically, extend in the M2M decoder spec(s) on the existing documentation:
> >    
> >    V4L2_BUF_FLAG_ERROR:
> >    -
> >    When this flag is set, the buffer has been dequeued successfully, although
> >    the data might **have been corrupted**. This is recoverable, streaming may
> >    continue as normal and the buffer may be reused normally. Drivers set this
> >    flag when the VIDIOC_DQBUF ioctl is called.
> 
> Well this part is about v4l2 buffers in general, not just m2m/decoders.
> But I guess this mechanism would make sense for more device classes than just
> decoders, so we could indeed specify it there. Maybe with a sufficiently broad
> wording.

This is current spec, I did meant to use that as basis and improve that codec
specific part.

> 
> But it would be good to also update the stateless decode document (and maybe
> stateful too) where V4L2_BUF_FLAG_ERROR is already mentionned a few times.
> We could indicate how this behavior related to reference frames there.
> 
> If we agree I could make a series with the following:
> - Introduce a V4L2_H264_REF_MISSING 0xff define (same for HEVC)
> - Update the v4l2_h264_reference doc to mention it
> - Update the cedrus driver to error out (zero-size payload) when the L0/L1 index
>   is either V4L2_H264_REF_MISSING or an invalid index that doesn't exist in the
>   DPB (same for HEVC)

Base on what you reported, this is currently the shortest and safe thing to do.

> - Update the v4l2 buffer and stateless(+stateful) documents to mention that
>   buffers marked with V4L2_BUF_FLAG_ERROR may or may not contain usable (yet
>   corrupted) data depending on the payload size and how it relates to reference
>   frames.

wfm

> 
> Then we could later envision having a mechanism (hopefully common) to figure out
> the best replacement to a given missing reference, which would allow cedrus
> (and maybe other drivers too) to return a frame with incorrect data instead of
> a zero-size payload error.

Certainly, though from my experience best or any works quite well, and quite
better then skipping. But if it makes the HW unstable, or sending uninitialized
data to the HW, I agree this is better to hard fail that frame now, enhance
later. Also, be aware that some HW (RK3399 iirc) have error counters, so you
know how many macroblocks are not decoded properly. Other HW have error IRQ,
with a register that tells which macroblock it failed on. The reference decoder
have optional support for software concealment for the failing macroblock, the
down side is that in the worse case you get an irq per block, which is quite a
disaster performance whise.

> 
> What do you think?

Yes, I'm happy with the proposal to get this moving forward.

Nicolas

> 
> All the best,
> 
> Paul
  

Patch

diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
--- a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
+++ b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
@@ -210,6 +210,9 @@  static void _cedrus_write_ref_list(struct cedrus_ctx *ctx,
 		u8 dpb_idx;
 
 		dpb_idx = ref_list[i].index;
+		if (dpb_idx >= V4L2_H264_NUM_DPB_ENTRIES)
+			continue;
+
 		dpb = &decode->dpb[dpb_idx];
 
 		if (!(dpb->flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE))