| Message ID | 20260324080856.56787-1-pengpeng@iscas.ac.cn (mailing list archive) |
|---|---|
| State | New |
| Headers |
Received: from cstnet.cn (smtp21.cstnet.cn [159.226.251.21]) (using TLSv1.2 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8662B359A8C; Tue, 24 Mar 2026 08:09:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=159.226.251.21 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774339751; cv=none; b=WiV/ZaTZGefFjXOn9qft64s4pFOp7UUgTyyGYcVbO690UYXeZnc7HejxVanT9yawTfu6kgeglzMeRnCPfQvDB4RZn9FT46g6paPrJ1tJ2W0L6aGF6R1oPV1UTuQdeIYBUTjvBxc6J7+HNwj/54SOsDfNtu7i5ujJ1pL7f01vPcI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774339751; c=relaxed/simple; bh=cqH5rURx3Dp+FJG2C8XwFT90pQsy89+ITQTIFqIQwEE=; h=From:To:Cc:Subject:Date:Message-ID:MIME-Version; b=E+uEaFsleGDbPbI3f2BZvvdQpRtwhJnLZID1ekiySR0TbK0YRfbNe79HPmVVJiuucr+qmA0kvGwGO5rKCfr2CgwBT2scD71gQRP9xAtHkBtyVDh4rZ+IuYIkNDhiinXQ7IUdq7MeKmuoxAzFwRUQkOibNE2AJ41nmqX7GPGQXo0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=iscas.ac.cn; spf=pass smtp.mailfrom=iscas.ac.cn; arc=none smtp.client-ip=159.226.251.21 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=iscas.ac.cn Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=iscas.ac.cn Received: from localhost.localdomain (unknown [111.196.245.197]) by APP-01 (Coremail) with SMTP id qwCowADnj2uYRsJpr877Cg--.1755S2; Tue, 24 Mar 2026 16:08:56 +0800 (CST) From: Pengpeng Hou <pengpeng@iscas.ac.cn> To: mripard@kernel.org Cc: paulk@sys-base.io, mchehab@kernel.org, gregkh@linuxfoundation.org, wens@kernel.org, jernej.skrabec@gmail.com, samuel@sholland.org, nicolas.dufresne@collabora.com, linux-media@vger.kernel.org, linux-staging@lists.linux.dev, linux-arm-kernel@lists.infradead.org, linux-sunxi@lists.linux.dev, linux-kernel@vger.kernel.org, pengpeng@iscas.ac.cn Subject: [PATCH] media: cedrus: skip invalid H.264 reference list entries Date: Tue, 24 Mar 2026 16:08:56 +0800 Message-ID: <20260324080856.56787-1-pengpeng@iscas.ac.cn> X-Mailer: git-send-email 2.50.1 Precedence: bulk X-Mailing-List: linux-sunxi@lists.linux.dev List-Id: <linux-sunxi.lists.linux.dev> List-Subscribe: <mailto:linux-sunxi+subscribe@lists.linux.dev> List-Unsubscribe: <mailto:linux-sunxi+unsubscribe@lists.linux.dev> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-CM-TRANSID: qwCowADnj2uYRsJpr877Cg--.1755S2 X-Coremail-Antispam: 1UD129KBjvJXoW7tFWrZF43AFyUGw4xtFWxtFb_yoW8Gry5pF 4jgw12ya1DJF45GF47AF40vFyYgas5ZrWfCry5K3WxZF15uF1xuFnY9348Xrs8Za98C3ya yF1ag345Xa1DA37anT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUU9Y14x267AKxVW8JVW5JwAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2ocxC64kIII0Yj41l84x0c7CEw4AK67xGY2AK02 1l84ACjcxK6xIIjxv20xvE14v26ryj6F1UM28EF7xvwVC0I7IYx2IY6xkF7I0E14v26r4U JVWxJr1l84ACjcxK6I8E87Iv67AKxVW0oVCq3wA2z4x0Y4vEx4A2jsIEc7CjxVAFwI0_Gc CE3s1le2I262IYc4CY6c8Ij28IcVAaY2xG8wAqx4xG64xvF2IEw4CE5I8CrVC2j2WlYx0E 2Ix0cI8IcVAFwI0_JrI_JrylYx0Ex4A2jsIE14v26r4j6F4UMcvjeVCFs4IE7xkEbVWUJV W8JwACjcxG0xvY0x0EwIxGrwACjI8F5VA0II8E6IAqYI8I648v4I1lFIxGxcIEc7CjxVA2 Y2ka0xkIwI1lc7CjxVAaw2AFwI0_Jw0_GFyl42xK82IYc2Ij64vIr41l4I8I3I0E4IkC6x 0Yz7v_Jr0_Gr1lx2IqxVAqx4xG67AKxVWUJVWUGwC20s026x8GjcxK67AKxVWUGVWUWwC2 zVAF1VAY17CE14v26r1q6r43MIIYrxkI7VAKI48JMIIF0xvE2Ix0cI8IcVAFwI0_Jr0_JF 4lIxAIcVC0I7IYx2IY6xkF7I0E14v26r4j6F4UMIIF0xvE42xK8VAvwI8IcIk0rVWUJVWU CwCI42IY6I8E87Iv67AKxVW8JVWxJwCI42IY6I8E87Iv6xkF7I0E14v26r4j6r4UJbIYCT nIWIevJa73UjIFyTuYvjfUOlkVUUUUU X-CM-SenderInfo: pshqw1xhqjqxpvfd2hldfou0/ Status: O |
| Series |
media: cedrus: skip invalid H.264 reference list entries
|
|
Commit Message
Pengpeng Hou
March 24, 2026, 8:08 a.m. UTC
Cedrus consumes H.264 ref_pic_list0/ref_pic_list1 entries from the
stateless slice control and later uses their indices to look up
decode->dpb[] in _cedrus_write_ref_list().
Rejecting such controls in cedrus_try_ctrl() would break existing
userspace, since stateless H.264 reference lists may legitimately carry
out-of-range indices for missing references. Instead, guard the actual
DPB lookup in Cedrus and skip entries whose indices do not fit the fixed
V4L2_H264_NUM_DPB_ENTRIES array.
This keeps the fix local to the driver use site and avoids out-of-bounds
reads from malformed or unsupported reference list entries.
Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn>
---
drivers/staging/media/sunxi/cedrus/cedrus_h264.c | 3 +++
1 file changed, 3 insertions(+)
Comments
Dne torek, 24. marec 2026 ob 09:08:56 Srednjeevropski poletni čas je Pengpeng Hou napisal(a): > Cedrus consumes H.264 ref_pic_list0/ref_pic_list1 entries from the > stateless slice control and later uses their indices to look up > decode->dpb[] in _cedrus_write_ref_list(). > > Rejecting such controls in cedrus_try_ctrl() would break existing > userspace, since stateless H.264 reference lists may legitimately carry > out-of-range indices for missing references. Instead, guard the actual > DPB lookup in Cedrus and skip entries whose indices do not fit the fixed > V4L2_H264_NUM_DPB_ENTRIES array. > > This keeps the fix local to the driver use site and avoids out-of-bounds > reads from malformed or unsupported reference list entries. > > Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn> Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com> Best regards, Jernej
On Sun, Mar 29, 2026 at 5:21 PM Jernej Škrabec <jernej.skrabec@gmail.com> wrote: > > Dne torek, 24. marec 2026 ob 09:08:56 Srednjeevropski poletni čas je Pengpeng Hou napisal(a): > > Cedrus consumes H.264 ref_pic_list0/ref_pic_list1 entries from the > > stateless slice control and later uses their indices to look up > > decode->dpb[] in _cedrus_write_ref_list(). > > > > Rejecting such controls in cedrus_try_ctrl() would break existing > > userspace, since stateless H.264 reference lists may legitimately carry > > out-of-range indices for missing references. Instead, guard the actual > > DPB lookup in Cedrus and skip entries whose indices do not fit the fixed > > V4L2_H264_NUM_DPB_ENTRIES array. > > > > This keeps the fix local to the driver use site and avoids out-of-bounds > > reads from malformed or unsupported reference list entries. > > > > Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn> > > Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com> Tested-by: Chen-Yu Tsai <wens@kernel.org> This fixes a KASAN slab-use-after-free warning when running fluster H.264 tests.
Le mardi 24 mars 2026 à 16:08 +0800, Pengpeng Hou a écrit : > Cedrus consumes H.264 ref_pic_list0/ref_pic_list1 entries from the > stateless slice control and later uses their indices to look up > decode->dpb[] in _cedrus_write_ref_list(). > > Rejecting such controls in cedrus_try_ctrl() would break existing > userspace, since stateless H.264 reference lists may legitimately carry > out-of-range indices for missing references. Instead, guard the actual > DPB lookup in Cedrus and skip entries whose indices do not fit the fixed > V4L2_H264_NUM_DPB_ENTRIES array. > > This keeps the fix local to the driver use site and avoids out-of-bounds > reads from malformed or unsupported reference list entries. > > Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn> > --- > drivers/staging/media/sunxi/cedrus/cedrus_h264.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c > b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c > --- a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c > +++ b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c > @@ -210,6 +210,9 @@ static void _cedrus_write_ref_list(struct cedrus_ctx *ctx, > u8 dpb_idx; > > dpb_idx = ref_list[i].index; > + if (dpb_idx >= V4L2_H264_NUM_DPB_ENTRIES) > + continue; Matches how we skip inactive references (in this diff, though most userspace just don't pass them). Now, if I looked lower, we set a position for each references. My understanding is that if no bits are set, it means "no position". How much testing have you done to confirm the HW behaves properly ? Despite this question, I think this is going to work better then doing memory overrun: Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com> Nicolas > + > dpb = &decode->dpb[dpb_idx]; > > if (!(dpb->flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE))
Le dimanche 29 mars 2026 à 20:44 +0800, Chen-Yu Tsai a écrit : > On Sun, Mar 29, 2026 at 5:21 PM Jernej Škrabec <jernej.skrabec@gmail.com> wrote: > > > > Dne torek, 24. marec 2026 ob 09:08:56 Srednjeevropski poletni čas je Pengpeng Hou napisal(a): > > > Cedrus consumes H.264 ref_pic_list0/ref_pic_list1 entries from the > > > stateless slice control and later uses their indices to look up > > > decode->dpb[] in _cedrus_write_ref_list(). > > > > > > Rejecting such controls in cedrus_try_ctrl() would break existing > > > userspace, since stateless H.264 reference lists may legitimately carry > > > out-of-range indices for missing references. Instead, guard the actual > > > DPB lookup in Cedrus and skip entries whose indices do not fit the fixed > > > V4L2_H264_NUM_DPB_ENTRIES array. > > > > > > This keeps the fix local to the driver use site and avoids out-of-bounds > > > reads from malformed or unsupported reference list entries. > > > > > > Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn> > > > > Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com> > > Tested-by: Chen-Yu Tsai <wens@kernel.org> > > This fixes a KASAN slab-use-after-free warning when running fluster H.264 > tests. Ah, very good, can you cite which test caused that ? I didn't expect fluster to cover cases with missing references. I think it will be handy for future testing. Nicolas
On Mon, Mar 30, 2026 at 11:55 PM Nicolas Dufresne <nicolas.dufresne@collabora.com> wrote: > > Le dimanche 29 mars 2026 à 20:44 +0800, Chen-Yu Tsai a écrit : > > On Sun, Mar 29, 2026 at 5:21 PM Jernej Škrabec <jernej.skrabec@gmail.com> wrote: > > > > > > Dne torek, 24. marec 2026 ob 09:08:56 Srednjeevropski poletni čas je Pengpeng Hou napisal(a): > > > > Cedrus consumes H.264 ref_pic_list0/ref_pic_list1 entries from the > > > > stateless slice control and later uses their indices to look up > > > > decode->dpb[] in _cedrus_write_ref_list(). > > > > > > > > Rejecting such controls in cedrus_try_ctrl() would break existing > > > > userspace, since stateless H.264 reference lists may legitimately carry > > > > out-of-range indices for missing references. Instead, guard the actual > > > > DPB lookup in Cedrus and skip entries whose indices do not fit the fixed > > > > V4L2_H264_NUM_DPB_ENTRIES array. > > > > > > > > This keeps the fix local to the driver use site and avoids out-of-bounds > > > > reads from malformed or unsupported reference list entries. > > > > > > > > Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn> > > > > > > Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com> > > > > Tested-by: Chen-Yu Tsai <wens@kernel.org> > > > > This fixes a KASAN slab-use-after-free warning when running fluster H.264 > > tests. > > Ah, very good, can you cite which test caused that ? I didn't expect fluster to > cover cases with missing references. I think it will be handy for future > testing. Looks like it is FM1_BT_B. And it only happens on the first run after reboot, or KASAN just only reports it once. BTW, this would be a lot easier to figure out if we could get fluster to output a system timestamp for each decode run (at least in single job mode). I had to hack in delays between each decode rune, and then look at `dmesg -w` and switching back to the window that has fluster running once the warning triggers. ChenYu
Le mardi 31 mars 2026 à 00:45 +0800, Chen-Yu Tsai a écrit : > On Mon, Mar 30, 2026 at 11:55 PM Nicolas Dufresne > <nicolas.dufresne@collabora.com> wrote: > > > > Le dimanche 29 mars 2026 à 20:44 +0800, Chen-Yu Tsai a écrit : > > > On Sun, Mar 29, 2026 at 5:21 PM Jernej Škrabec <jernej.skrabec@gmail.com> wrote: > > > > > > > > Dne torek, 24. marec 2026 ob 09:08:56 Srednjeevropski poletni čas je Pengpeng Hou napisal(a): > > > > > Cedrus consumes H.264 ref_pic_list0/ref_pic_list1 entries from the > > > > > stateless slice control and later uses their indices to look up > > > > > decode->dpb[] in _cedrus_write_ref_list(). > > > > > > > > > > Rejecting such controls in cedrus_try_ctrl() would break existing > > > > > userspace, since stateless H.264 reference lists may legitimately carry > > > > > out-of-range indices for missing references. Instead, guard the actual > > > > > DPB lookup in Cedrus and skip entries whose indices do not fit the fixed > > > > > V4L2_H264_NUM_DPB_ENTRIES array. > > > > > > > > > > This keeps the fix local to the driver use site and avoids out-of-bounds > > > > > reads from malformed or unsupported reference list entries. > > > > > > > > > > Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn> > > > > > > > > Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com> > > > > > > Tested-by: Chen-Yu Tsai <wens@kernel.org> > > > > > > This fixes a KASAN slab-use-after-free warning when running fluster H.264 > > > tests. > > > > Ah, very good, can you cite which test caused that ? I didn't expect fluster to > > cover cases with missing references. I think it will be handy for future > > testing. > > Looks like it is FM1_BT_B. And it only happens on the first run after reboot, > or KASAN just only reports it once. Thanks, its one of the unsupported stream that we didn't find how to detect ahead of time, and so we try to decode it. > > BTW, this would be a lot easier to figure out if we could get fluster to > output a system timestamp for each decode run (at least in single job mode). Well, that's not magical, they have to trace the same timestamp. An example, the kernel and gstreamer both uses their own uptime, which is of course not helping it at all. > > I had to hack in delays between each decode rune, and then look at `dmesg -w` > and switching back to the window that has fluster running once the warning > triggers. If all you care is which streams caused what kernel trace, I think the least amount of effort is to propose a patch against fluster to syslog the start of tests. Your logger will aggregate. Note that its only going to work for single job run since the kernel error trace don't give enough context to trace back the error into the V4L2 FD and back to the owning process. Nicolas > > > ChenYu
Hi, On Tue 24 Mar 26, 16:08, Pengpeng Hou wrote: > Cedrus consumes H.264 ref_pic_list0/ref_pic_list1 entries from the > stateless slice control and later uses their indices to look up > decode->dpb[] in _cedrus_write_ref_list(). > > Rejecting such controls in cedrus_try_ctrl() would break existing > userspace, since stateless H.264 reference lists may legitimately carry > out-of-range indices for missing references. Instead, guard the actual > DPB lookup in Cedrus and skip entries whose indices do not fit the fixed > V4L2_H264_NUM_DPB_ENTRIES array. Could you explain why it is legitimate that userspace would pass indices that are not in the dpb list? As far as I remember from the H.264 spec, the L0/L1 lists are constructed from active references only and the number of items there should be given by num_ref_idx_l0_active_minus1/num_ref_idx_l1_active_minus1. We can tolerate invalid data beyond these indices, but certainly not as part of the indices that should be valid. However I agree that cedrus_try_ctrl is maybe not the right place to check it since I'm not sure we are guaranteed that the slice params control will be checked before the new DPB (from the same request) is applied, so we might end up checking against the dpb from the previous decode request. But I think we should error out and not just skip the invalid reference. All the best, Paul > > This keeps the fix local to the driver use site and avoids out-of-bounds > reads from malformed or unsupported reference list entries. > > Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn> > --- > drivers/staging/media/sunxi/cedrus/cedrus_h264.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c > --- a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c > +++ b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c > @@ -210,6 +210,9 @@ static void _cedrus_write_ref_list(struct cedrus_ctx *ctx, > u8 dpb_idx; > > dpb_idx = ref_list[i].index; > + if (dpb_idx >= V4L2_H264_NUM_DPB_ENTRIES) > + continue; > + > dpb = &decode->dpb[dpb_idx]; > > if (!(dpb->flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE)) > -- > 2.50.1 >
Le jeudi 09 avril 2026 à 15:33 +0200, Paul Kocialkowski a écrit : > Hi, > > On Tue 24 Mar 26, 16:08, Pengpeng Hou wrote: > > Cedrus consumes H.264 ref_pic_list0/ref_pic_list1 entries from the > > stateless slice control and later uses their indices to look up > > decode->dpb[] in _cedrus_write_ref_list(). > > > > Rejecting such controls in cedrus_try_ctrl() would break existing > > userspace, since stateless H.264 reference lists may legitimately carry > > out-of-range indices for missing references. Instead, guard the actual > > DPB lookup in Cedrus and skip entries whose indices do not fit the fixed > > V4L2_H264_NUM_DPB_ENTRIES array. > > Could you explain why it is legitimate that userspace would pass indices that > are not in the dpb list? As far as I remember from the H.264 spec, the L0/L1 > lists are constructed from active references only and the number of items > there > should be given by num_ref_idx_l0_active_minus1/num_ref_idx_l1_active_minus1. > We can tolerate invalid data beyond these indices, but certainly not as part > of the indices that should be valid. > > However I agree that cedrus_try_ctrl is maybe not the right place to check it > since I'm not sure we are guaranteed that the slice params control will be > checked before the new DPB (from the same request) is applied, so we might end > up checking against the dpb from the previous decode request. > > But I think we should error out and not just skip the invalid reference. Its been a long time I haven't looked into this. But what happens here is that once you lost a reference, the userspace DPB will hold a gap picture, which as no backing storage. Since it has no backing storage, there is no cookie (timestamp) associated with it. This gap picture will still make it to the reference lists, since the position of the reference in the list is important (you cannot just remove an item). It is an established practice in userspace to simply fill the void with an invalid index, typically 0xff, which is always invalid. Because that's what some userspace do, it became part of our ABI. Decoders are expected be fault tolerant, though the tolerance level is hardware specific, and so failing in the common code would be inappropriate (failing in Cedrus could be acceptable, assuming it can't work with missing references, which the implementation seems to be fine with). Hantro G1 notably have a flag to report missing reference to the HW, and it will manage concealement internally. G2/RKVDEC don't, and we try and pick the most recent frame as a replacement backing storage, which most of the time minimises the damages. As future refinement, we need drivers in the long term to properly report the damages (perhaps through additional RO request controls). As discussed few years ago in the error handling wip for rkvdec, the V4L2 doc specify that any sort of damages known to exist in a frame shall results in the ERROR flag being set. We can deduce that the error flag with a payload of 0 indicates to userspace to not use the frame (which typically happen on hard errors, or errror at entropy decode staged) and ERROR flag with a correct payload signal some level of corruption, and its left to the application to decide what to do. Nicolas > > All the best, > > Paul > > > > > This keeps the fix local to the driver use site and avoids out-of-bounds > > reads from malformed or unsupported reference list entries. > > > > Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn> > > --- > > drivers/staging/media/sunxi/cedrus/cedrus_h264.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c > > b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c > > --- a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c > > +++ b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c > > @@ -210,6 +210,9 @@ static void _cedrus_write_ref_list(struct cedrus_ctx > > *ctx, > > u8 dpb_idx; > > > > dpb_idx = ref_list[i].index; > > + if (dpb_idx >= V4L2_H264_NUM_DPB_ENTRIES) > > + continue; > > + > > dpb = &decode->dpb[dpb_idx]; > > > > if (!(dpb->flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE)) > > -- > > 2.50.1 > >
Le jeudi 09 avril 2026 à 22:30 +0800, Pengpeng Hou a écrit : > Hi Paul, > > Thanks, that makes sense. > > I agree Cedrus should not silently skip an invalid index in the active > portion of ref_pic_list0/ref_pic_list1. > > I'll respin this to keep the check at the Cedrus use site rather than > cedrus_try_ctrl(), but return -EINVAL when one of the active reference > entries points past V4L2_H264_NUM_DPB_ENTRIES. Entries beyond > num_ref_idx_l0_active_minus1 / num_ref_idx_l1_active_minus1 will still > be ignored as before. Please, let the discussion continue before respinning. Nicolas > > Thanks, > Pengpeng >
Hi Paul, Thanks, that makes sense. I agree Cedrus should not silently skip an invalid index in the active portion of ref_pic_list0/ref_pic_list1. I'll respin this to keep the check at the Cedrus use site rather than cedrus_try_ctrl(), but return -EINVAL when one of the active reference entries points past V4L2_H264_NUM_DPB_ENTRIES. Entries beyond num_ref_idx_l0_active_minus1 / num_ref_idx_l1_active_minus1 will still be ignored as before. Thanks, Pengpeng
Hi Nicolas, On Thu 09 Apr 26, 10:00, Nicolas Dufresne wrote: > Le jeudi 09 avril 2026 à 15:33 +0200, Paul Kocialkowski a écrit : > > Hi, > > > > On Tue 24 Mar 26, 16:08, Pengpeng Hou wrote: > > > Cedrus consumes H.264 ref_pic_list0/ref_pic_list1 entries from the > > > stateless slice control and later uses their indices to look up > > > decode->dpb[] in _cedrus_write_ref_list(). > > > > > > Rejecting such controls in cedrus_try_ctrl() would break existing > > > userspace, since stateless H.264 reference lists may legitimately carry > > > out-of-range indices for missing references. Instead, guard the actual > > > DPB lookup in Cedrus and skip entries whose indices do not fit the fixed > > > V4L2_H264_NUM_DPB_ENTRIES array. > > > > Could you explain why it is legitimate that userspace would pass indices that > > are not in the dpb list? As far as I remember from the H.264 spec, the L0/L1 > > lists are constructed from active references only and the number of items > > there > > should be given by num_ref_idx_l0_active_minus1/num_ref_idx_l1_active_minus1. > > We can tolerate invalid data beyond these indices, but certainly not as part > > of the indices that should be valid. > > > > However I agree that cedrus_try_ctrl is maybe not the right place to check it > > since I'm not sure we are guaranteed that the slice params control will be > > checked before the new DPB (from the same request) is applied, so we might end > > up checking against the dpb from the previous decode request. > > > > But I think we should error out and not just skip the invalid reference. > > Its been a long time I haven't looked into this. But what happens here is that > once you lost a reference, the userspace DPB will hold a gap picture, which as > no backing storage. Since it has no backing storage, there is no cookie > (timestamp) associated with it. This gap picture will still make it to the > reference lists, since the position of the reference in the list is important > (you cannot just remove an item). It is an established practice in userspace to > simply fill the void with an invalid index, typically 0xff, which is always > invalid. Because that's what some userspace do, it became part of our ABI. Right we definitely need to keep the order of the L0/L1 lists even with missing references and the question is whether the hardware can deal with it or not. Our uAPI specification currently doesn't say anything about handling missing references. I'm generally not very keen on considering that undefined behavior becomes de-facto uAPI that should never be broken, because there are cases where it is obviously incorrect and the fact that it didn't fail previously is the result of a bug in the implementation. But in this situation I agree we do need a way to indicate that references are missing and using 0xff sounds like a good plan to me, given that we provide a uAPI header define with this value and that the doc mentions it. > Decoders are expected be fault tolerant, though the tolerance level is hardware > specific, and so failing in the common code would be inappropriate (failing in > Cedrus could be acceptable, assuming it can't work with missing references, > which the implementation seems to be fine with). Okay I agree that we should not fail in common code and tolerate a value to indicate a missing reference. What the current proposal is doing (skipping the reference) results in the SRAM entry for the reference remaining untouched, which will keep its value from the previous frame. This seems clearly incorrect. > Hantro G1 notably have a flag to report missing reference to the HW, and it will > manage concealement internally. G2/RKVDEC don't, and we try and pick the most > recent frame as a replacement backing storage, which most of the time minimises > the damages. It sounds like an approach that could work for cedrus too. > As future refinement, we need drivers in the long term to properly report the > damages (perhaps through additional RO request controls). As discussed few years > ago in the error handling wip for rkvdec, the V4L2 doc specify that any sort of > damages known to exist in a frame shall results in the ERROR flag being set. We > can deduce that the error flag with a payload of 0 indicates to userspace to not > use the frame (which typically happen on hard errors, or errror at entropy > decode staged) and ERROR flag with a correct payload signal some level of > corruption, and its left to the application to decide what to do. I think it make sense yes, but it would be good to document it in the uAPI document too. All the best, Paul > Nicolas > > > > > All the best, > > > > Paul > > > > > > > > This keeps the fix local to the driver use site and avoids out-of-bounds > > > reads from malformed or unsupported reference list entries. > > > > > > Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn> > > > --- > > > drivers/staging/media/sunxi/cedrus/cedrus_h264.c | 3 +++ > > > 1 file changed, 3 insertions(+) > > > > > > diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c > > > b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c > > > --- a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c > > > +++ b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c > > > @@ -210,6 +210,9 @@ static void _cedrus_write_ref_list(struct cedrus_ctx > > > *ctx, > > > u8 dpb_idx; > > > > > > dpb_idx = ref_list[i].index; > > > + if (dpb_idx >= V4L2_H264_NUM_DPB_ENTRIES) > > > + continue; > > > + > > > dpb = &decode->dpb[dpb_idx]; > > > > > > if (!(dpb->flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE)) > > > -- > > > 2.50.1 > > >
Hi, Le jeudi 09 avril 2026 à 16:31 +0200, Paul Kocialkowski a écrit : > I think it make sense yes, but it would be good to document it in the uAPI > document too. Basically, extend in the M2M decoder spec(s) on the existing documentation: V4L2_BUF_FLAG_ERROR: - When this flag is set, the buffer has been dequeued successfully, although the data might **have been corrupted**. This is recoverable, streaming may continue as normal and the buffer may be reused normally. Drivers set this flag when the VIDIOC_DQBUF ioctl is called. cheers, Nicolas
Hi, On Thu 09 Apr 26, 10:39, Nicolas Dufresne wrote: > Hi, > > Le jeudi 09 avril 2026 à 16:31 +0200, Paul Kocialkowski a écrit : > > I think it make sense yes, but it would be good to document it in the uAPI > > document too. > > Basically, extend in the M2M decoder spec(s) on the existing documentation: > > V4L2_BUF_FLAG_ERROR: > - > When this flag is set, the buffer has been dequeued successfully, although > the data might **have been corrupted**. This is recoverable, streaming may > continue as normal and the buffer may be reused normally. Drivers set this > flag when the VIDIOC_DQBUF ioctl is called. Well this part is about v4l2 buffers in general, not just m2m/decoders. But I guess this mechanism would make sense for more device classes than just decoders, so we could indeed specify it there. Maybe with a sufficiently broad wording. But it would be good to also update the stateless decode document (and maybe stateful too) where V4L2_BUF_FLAG_ERROR is already mentionned a few times. We could indicate how this behavior related to reference frames there. If we agree I could make a series with the following: - Introduce a V4L2_H264_REF_MISSING 0xff define (same for HEVC) - Update the v4l2_h264_reference doc to mention it - Update the cedrus driver to error out (zero-size payload) when the L0/L1 index is either V4L2_H264_REF_MISSING or an invalid index that doesn't exist in the DPB (same for HEVC) - Update the v4l2 buffer and stateless(+stateful) documents to mention that buffers marked with V4L2_BUF_FLAG_ERROR may or may not contain usable (yet corrupted) data depending on the payload size and how it relates to reference frames. Then we could later envision having a mechanism (hopefully common) to figure out the best replacement to a given missing reference, which would allow cedrus (and maybe other drivers too) to return a frame with incorrect data instead of a zero-size payload error. What do you think? All the best, Paul
Le jeudi 09 avril 2026 à 17:31 +0200, Paul Kocialkowski a écrit : > Hi, > > On Thu 09 Apr 26, 10:39, Nicolas Dufresne wrote: > > Hi, > > > > Le jeudi 09 avril 2026 à 16:31 +0200, Paul Kocialkowski a écrit : > > > I think it make sense yes, but it would be good to document it in the uAPI > > > document too. > > > > Basically, extend in the M2M decoder spec(s) on the existing documentation: > > > > V4L2_BUF_FLAG_ERROR: > > - > > When this flag is set, the buffer has been dequeued successfully, although > > the data might **have been corrupted**. This is recoverable, streaming may > > continue as normal and the buffer may be reused normally. Drivers set this > > flag when the VIDIOC_DQBUF ioctl is called. > > Well this part is about v4l2 buffers in general, not just m2m/decoders. > But I guess this mechanism would make sense for more device classes than just > decoders, so we could indeed specify it there. Maybe with a sufficiently broad > wording. This is current spec, I did meant to use that as basis and improve that codec specific part. > > But it would be good to also update the stateless decode document (and maybe > stateful too) where V4L2_BUF_FLAG_ERROR is already mentionned a few times. > We could indicate how this behavior related to reference frames there. > > If we agree I could make a series with the following: > - Introduce a V4L2_H264_REF_MISSING 0xff define (same for HEVC) > - Update the v4l2_h264_reference doc to mention it > - Update the cedrus driver to error out (zero-size payload) when the L0/L1 index > is either V4L2_H264_REF_MISSING or an invalid index that doesn't exist in the > DPB (same for HEVC) Base on what you reported, this is currently the shortest and safe thing to do. > - Update the v4l2 buffer and stateless(+stateful) documents to mention that > buffers marked with V4L2_BUF_FLAG_ERROR may or may not contain usable (yet > corrupted) data depending on the payload size and how it relates to reference > frames. wfm > > Then we could later envision having a mechanism (hopefully common) to figure out > the best replacement to a given missing reference, which would allow cedrus > (and maybe other drivers too) to return a frame with incorrect data instead of > a zero-size payload error. Certainly, though from my experience best or any works quite well, and quite better then skipping. But if it makes the HW unstable, or sending uninitialized data to the HW, I agree this is better to hard fail that frame now, enhance later. Also, be aware that some HW (RK3399 iirc) have error counters, so you know how many macroblocks are not decoded properly. Other HW have error IRQ, with a register that tells which macroblock it failed on. The reference decoder have optional support for software concealment for the failing macroblock, the down side is that in the worse case you get an irq per block, which is quite a disaster performance whise. > > What do you think? Yes, I'm happy with the proposal to get this moving forward. Nicolas > > All the best, > > Paul
diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c --- a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c +++ b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c @@ -210,6 +210,9 @@ static void _cedrus_write_ref_list(struct cedrus_ctx *ctx, u8 dpb_idx; dpb_idx = ref_list[i].index; + if (dpb_idx >= V4L2_H264_NUM_DPB_ENTRIES) + continue; + dpb = &decode->dpb[dpb_idx]; if (!(dpb->flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE))