Message ID | 20230923005856.2538223-1-wangzhaolong1@huawei.com |
---|---|
State | New |
Series | [RFC] mtd: Fix error code loss in mtdchar_read() function. |
Hello,

Richard, your advice is welcome here.

wangzhaolong1@huawei.com wrote on Sat, 23 Sep 2023 08:58:56 +0800:

> In the first while loop, if the mtd_read() function returns -EBADMSG

s/the//
s/function//
,

> and 'retlen' returns 0, the loop break and the function returns value

s/and//
remains to 0. The loop breaks and the function returns 'total_retlen' which is 0 instead of the error code.

> 'total_retlen' is 0, not the error code.

Actually after looking at the code, I have no strong opinion regarding whether we should return 0 or an error code in this case.

There is this comment right above, and I'm not sure it is still up to date because I believe many drivers just don't provide the data upon ECC error:

	/* Nand returns -EBADMSG on ECC errors, but it returns
	 * the data. For our userspace tools it is important
	 * to dump areas with ECC errors!
	 * For kernel internal usage it also might return -EUCLEAN
	 * to signal the caller that a bitflip has occurred and has
	 * been corrected by the ECC algorithm.
	 * Userspace software which accesses NAND this way
	 * must be aware of the fact that it deals with NAND
	 */

> This problem causes the user-space program to encounter EOF when it has
> not finished reading the mtd partion, and this also violates the read
> system call standard in POSIX.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217939
> Signed-off-by: ZhaoLong Wang <wangzhaolong1@huawei.com>
> ---
>  drivers/mtd/mtdchar.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/mtd/mtdchar.c b/drivers/mtd/mtdchar.c
> index 8dc4f5c493fc..ba60dc6bef98 100644
> --- a/drivers/mtd/mtdchar.c
> +++ b/drivers/mtd/mtdchar.c
> @@ -211,7 +211,7 @@ static ssize_t mtdchar_read(struct file *file, char __user *buf, size_t count,
>  	}
>  
>  	kfree(kbuf);
> -	return total_retlen;
> +	return total_retlen ? total_retlen : ret;

This is kind of wrong: if ret is 0, then you return ret while you should return total_retlen. In practice it does not really matter, the result is the same, but it makes it harder to understand the code IMHO.

>  } /* mtdchar_read */
>  
>  static ssize_t mtdchar_write(struct file *file, const char __user *buf, size_t count,

Thanks,
Miquèl
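Miquèl's objection is about readability rather than behavior: when `ret` is 0 the ternary returns `ret`, which only coincidentally equals `total_retlen`. A minimal sketch of two equivalent forms side by side, using hypothetical helper names (nothing here is from mtdchar.c), assuming `ret` is either 0 or a negative errno:

```c
#include <errno.h>
#include <sys/types.h>

/* Hypothetical helper: the return expression proposed in the RFC. */
static ssize_t finish_read_ternary(ssize_t total_retlen, int ret)
{
	return total_retlen ? total_retlen : ret;
}

/*
 * Hypothetical helper: the same result, written so the intent is explicit.
 * Report the error only when nothing was transferred and there really was
 * an error; in every other case return the byte count.
 */
static ssize_t finish_read_explicit(ssize_t total_retlen, int ret)
{
	if (total_retlen == 0 && ret < 0)
		return ret;
	return total_retlen;
}
```

For example, both return -EBADMSG for (0, -EBADMSG) and 4096 for (4096, -EBADMSG); the second form simply never "returns `ret`" in the success case.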
----- Original Message -----
>> 'total_retlen' is 0, not the error code.
>
> Actually after looking at the code, I have no strong opinion
> regarding whether we should return 0 or an error code in this case.
>
> There is this comment right above, and I'm not sure it is still up to
> date because I believe many drivers just don't provide the data upon
> ECC error:
>
> /* Nand returns -EBADMSG on ECC errors, but it returns
>  * the data. For our userspace tools it is important
>  * to dump areas with ECC errors!
>  * For kernel internal usage it also might return -EUCLEAN
>  * to signal the caller that a bitflip has occurred and has
>  * been corrected by the ECC algorithm.
>  * Userspace software which accesses NAND this way
>  * must be aware of the fact that it deals with NAND
>  */
>
>> This problem causes the user-space program to encounter EOF when it has
>> not finished reading the mtd partion, and this also violates the read
>> system call standard in POSIX.

This is a special purpose device file and not a regular file.
Please explain in detail why this violates POSIX and which program breaks.

As pointed out by Miquel, the comment makes it clear that this behavior is
on purpose. If we now suddenly return -EBADMSG for the described
scenario, we might even break existing MTD userspace.

Thanks,
//richard
Hi Richard,

richard@nod.at wrote on Mon, 25 Sep 2023 11:14:40 +0200 (CEST):

> ----- Original Message -----
> >> 'total_retlen' is 0, not the error code.
> >
> > Actually after looking at the code, I have no strong opinion
> > regarding whether we should return 0 or an error code in this case.
> >
> > There is this comment right above, and I'm not sure it is still up to
> > date because I believe many drivers just don't provide the data upon
> > ECC error:
> >
> > /* Nand returns -EBADMSG on ECC errors, but it returns
> >  * the data. For our userspace tools it is important
> >  * to dump areas with ECC errors!
> >  * For kernel internal usage it also might return -EUCLEAN
> >  * to signal the caller that a bitflip has occurred and has
> >  * been corrected by the ECC algorithm.
> >  * Userspace software which accesses NAND this way
> >  * must be aware of the fact that it deals with NAND
> >  */
> >
> >> This problem causes the user-space program to encounter EOF when it has
> >> not finished reading the mtd partion, and this also violates the read
> >> system call standard in POSIX.
>
> This is a special purpose device file and not a regular file.
> Please explain in detail why this violates POSIX and which program breaks.
>
> As pointed out by Miquel, the comment makes it clean that this behavior is
> on purpose. If we return now all of a sudden -EBADMSG for the described
> scenario we might even break existing MTD userspace.

The bugzilla link in the commit log [1] mentions:
* dd would just stop in the middle without showing errors
  -> we probably don't care, we expect the userspace to know this is
  NAND when dealing with mtd devices directly, dd is not mtd-aware
  anyway.
* ubiformat would loop forever
  -> that one needs attention I guess :)

[1] https://bugzilla.kernel.org/show_bug.cgi?id=217939

Thanks,
Miquèl
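The practical question is how MTD-aware user space should treat a 0 return. A tool that knows how much it expects to read (for instance from the `size` field that the MEMGETINFO ioctl reports) can treat a 0 before that point as an error rather than a normal end of file. The helper below is hypothetical, not code from dd, ubiformat or mtd-utils; it only sketches the distinction the thread is drawing.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read exactly `len` bytes from `fd` into `buf`.
 * Returns len on success, or -1 with errno set. A 0 from read() before
 * `len` bytes have arrived is reported as EIO instead of being mistaken
 * for a normal end of the device.
 */
static ssize_t read_exact(int fd, void *buf, size_t len)
{
	size_t done = 0;

	while (done < len) {
		ssize_t n = read(fd, (char *)buf + done, len - done);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted by a signal: retry */
			return -1;		/* pass read()'s error (e.g. EBADMSG) up */
		}
		if (n == 0) {
			errno = EIO;		/* premature EOF: fail loudly */
			return -1;
		}
		done += (size_t)n;
	}
	return (ssize_t)done;
}
```

A loop written this way cannot spin forever on repeated 0 returns, which is one way a reader could end up looping as described for ubiformat.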
----- Original Message -----
> From: "Miquel Raynal" <miquel.raynal@bootlin.com>
>> As pointed out by Miquel, the comment makes it clean that this behavior is
>> on purpose. If we return now all of a sudden -EBADMSG for the described
>> scenario we might even break existing MTD userspace.
>
> The bugzilla link in the commit log [1] mentions:

Oops.

> * dd would just stop in the middle without showing errors
>   -> we probably don't care, we expect the userspace to know this is
>   NAND when dealing with mtd devices directly, dd is not mtd-aware
>   anyway.

Yep. That's fine.

> * ubiformat would loop forever
>   -> that one needs attention I guess :)

Hmm. Let me check the source.

Thanks,
//richard
----- Original Message -----
> From: "ZhaoLong Wang" <wangzhaolong1@huawei.com>
> To: "Miquel Raynal" <miquel.raynal@bootlin.com>, "richard" <richard@nod.at>, "Vignesh Raghavendra" <vigneshr@ti.com>
> CC: "linux-mtd" <linux-mtd@lists.infradead.org>, "linux-kernel" <linux-kernel@vger.kernel.org>, "chengzhihao1"
> <chengzhihao1@huawei.com>, "ZhaoLong Wang" <wangzhaolong1@huawei.com>, "yi zhang" <yi.zhang@huawei.com>, "yangerkun"
> <yangerkun@huawei.com>
> Sent: Saturday, 23 September 2023 02:58:56
> Subject: [RFC] mtd: Fix error code loss in mtdchar_read() function.

> In the first while loop, if the mtd_read() function returns -EBADMSG
> and 'retlen' returns 0, the loop break and the function returns value
> 'total_retlen' is 0, not the error code.

Gave this a second thought. I don't think a NAND driver is allowed to return
fewer bytes than requested and set EBADMSG.
UBI's IO path has a comment on that:

	/*
	 * The driver should never return -EBADMSG if it failed to read
	 * all the requested data. But some buggy drivers might do
	 * this, so we change it to -EIO.
	 */
	if (read != len && mtd_is_eccerr(err)) {
		ubi_assert(0);
		err = -EIO;
	}

Thanks,
//richard
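The UBI check can be lifted into a standalone helper to make the rule explicit: -EBADMSG is only a valid outcome when the full length was read. This is a hypothetical user-space model; `is_eccerr()` stands in for the kernel's `mtd_is_eccerr()` (which tests for -EBADMSG), and `ubi_assert()` is left out.

```c
#include <errno.h>
#include <stddef.h>

/* Stand-in for the kernel's mtd_is_eccerr(): true for -EBADMSG. */
static int is_eccerr(int err)
{
	return err == -EBADMSG;
}

/*
 * Hypothetical helper modelled on the UBI check: a short read reported as
 * -EBADMSG breaks the "data is still returned" contract, so it is
 * downgraded to -EIO. Every other combination passes through unchanged.
 */
static int sanitize_read_err(int err, size_t read, size_t len)
{
	if (read != len && is_eccerr(err))
		return -EIO;
	return err;
}
```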
> There is this comment right above, and I'm not sure it is still up to
> date because I believe many drivers just don't provide the data upon
> ECC error:

After reading the nand_base framework code, I think the current nand_base framework can limit the length of retlen to 0 when an ECC error occurs. The prerequisite is that NAND driver developers return the values that the chip->ecc.read_page() callback is required to return.

However, the read_page() callback comment does not mention the special meaning of the following two error codes:

* -EUCLEAN - Returned by the MTD layer when the maximum number of bitflips is greater than or equal to bitflip_threshold.
* -EBADMSG - Returned by the NAND generic layer when the ECC error statistics change and the retries are exhausted.

These two error codes are handled by the upper layers and should not be returned by NAND drivers. But some driver developers don't realize this.

So I don't think it's worth fixing right now. But is the description of the callback's return value too simplistic? Is there a more detailed document for reference?
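To make the layering concrete, here is a simplified, hypothetical model of where the two codes are produced: the NAND core turns a change in its failed-ECC statistics into -EBADMSG, and the MTD core turns a bitflip count of at least `bitflip_threshold` into -EUCLEAN. Retries, OOB handling and the real structures are omitted, and the function names are invented for illustration; the point is that a driver's read_page() callback is expected to report only a bitflip count or a hard error.

```c
#include <errno.h>

/*
 * Hypothetical model of the NAND core: if the failed-ECC counter moved
 * during the read, report -EBADMSG; otherwise pass on the highest
 * bitflip count seen in any ECC step.
 */
static int nand_core_result(unsigned int failed_before,
			    unsigned int failed_after,
			    unsigned int max_bitflips)
{
	if (failed_after != failed_before)
		return -EBADMSG;
	return (int)max_bitflips;
}

/*
 * Hypothetical model of the MTD core: negative codes pass through, and a
 * bitflip count at or above bitflip_threshold becomes -EUCLEAN.
 */
static int mtd_core_result(int ret_code, unsigned int ecc_strength,
			   unsigned int bitflip_threshold)
{
	if (ret_code < 0)
		return ret_code;	/* -EBADMSG or a hard error */
	if (ecc_strength == 0)
		return 0;		/* device without ECC */
	return (unsigned int)ret_code >= bitflip_threshold ? -EUCLEAN : 0;
}
```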
Hi Richard,

richard@nod.at wrote on Mon, 25 Sep 2023 16:03:03 +0200 (CEST):

> ----- Original Message -----
> > From: "ZhaoLong Wang" <wangzhaolong1@huawei.com>
> > To: "Miquel Raynal" <miquel.raynal@bootlin.com>, "richard" <richard@nod.at>, "Vignesh Raghavendra" <vigneshr@ti.com>
> > CC: "linux-mtd" <linux-mtd@lists.infradead.org>, "linux-kernel" <linux-kernel@vger.kernel.org>, "chengzhihao1"
> > <chengzhihao1@huawei.com>, "ZhaoLong Wang" <wangzhaolong1@huawei.com>, "yi zhang" <yi.zhang@huawei.com>, "yangerkun"
> > <yangerkun@huawei.com>
> > Sent: Saturday, 23 September 2023 02:58:56
> > Subject: [RFC] mtd: Fix error code loss in mtdchar_read() function.
>
> > In the first while loop, if the mtd_read() function returns -EBADMSG
> > and 'retlen' returns 0, the loop break and the function returns value
> > 'total_retlen' is 0, not the error code.
>
> Given this a second thought. I don't think a NAND driver is allowed to return
> less than requests bytes and setting EBADMSG.
> UBI's IO path has a comment on that:
>
> 	/*
> 	 * The driver should never return -EBADMSG if it failed to read
> 	 * all the requested data. But some buggy drivers might do
> 	 * this, so we change it to -EIO.
> 	 */
> 	if (read != len && mtd_is_eccerr(err)) {
> 		ubi_assert(0);
> 		err = -EIO;
> 	}

Interesting. Shall we add this check to the mtd_read() path as well?

Maybe with a WARN_ON()?

Thanks,
Miquèl
----- Original Message -----
> From: "Miquel Raynal" <miquel.raynal@bootlin.com>
>> Given this a second thought. I don't think a NAND driver is allowed to return
>> less than requests bytes and setting EBADMSG.
>> UBI's IO path has a comment on that:
>>
>> 	/*
>> 	 * The driver should never return -EBADMSG if it failed to read
>> 	 * all the requested data. But some buggy drivers might do
>> 	 * this, so we change it to -EIO.
>> 	 */
>> 	if (read != len && mtd_is_eccerr(err)) {
>> 		ubi_assert(0);
>> 		err = -EIO;
>> 	}
>
> Interesting. Shall we add this check to the mtd_read() path as well?
>
> Maybe with a WARN_ON()?

WARN_ON_ONCE(), please. But yes, let's add it.

Thanks,
//richard
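The direction agreed here is to perform a check like UBI's in the mtd_read() path and flag offending drivers with WARN_ON_ONCE(). A minimal user-space sketch of that idea, in which `warn_on_once()` and `check_short_eccerr()` are hypothetical stand-ins rather than the kernel macro or the eventual patch; converting the code to -EIO is an assumption borrowed from UBI's behavior.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Counts how many warnings were actually printed (for testing). */
static int warn_count;

/*
 * User-space stand-in for WARN_ON_ONCE(): evaluates to `cond`, but only
 * reports the first time it is true. The kernel macro keeps one such flag
 * per call site and prints a backtrace; this sketch just prints a line.
 */
static int warn_on_once(int cond, const char *what)
{
	static int warned;

	if (cond && !warned) {
		warned = 1;
		warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/*
 * Hypothetical post-processing for an mtd_read()-style wrapper: a driver
 * reporting -EBADMSG on a short read is warned about (once) and the
 * result is turned into -EIO.
 */
static int check_short_eccerr(int ret, size_t retlen, size_t len)
{
	if (warn_on_once(retlen != len && ret == -EBADMSG,
			 "driver returned -EBADMSG on a short read"))
		return -EIO;
	return ret;
}
```

Using a once-only warning keeps a misbehaving driver from flooding the log when user space reads the same bad area repeatedly, while still converting every occurrence.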
richard@nod.at wrote on Mon, 25 Sep 2023 16:59:31 +0200 (CEST):

> ----- Original Message -----
> > From: "Miquel Raynal" <miquel.raynal@bootlin.com>
> >> Given this a second thought. I don't think a NAND driver is allowed to return
> >> less than requests bytes and setting EBADMSG.
> >> UBI's IO path has a comment on that:
> >>
> >> 	/*
> >> 	 * The driver should never return -EBADMSG if it failed to read
> >> 	 * all the requested data. But some buggy drivers might do
> >> 	 * this, so we change it to -EIO.
> >> 	 */
> >> 	if (read != len && mtd_is_eccerr(err)) {
> >> 		ubi_assert(0);
> >> 		err = -EIO;
> >> 	}
> >
> > Interesting. Shall we add this check to the mtd_read() path as well?
> >
> > Maybe with a WARN_ON()?
>
> WARN_ON_ONCE(), please. But yes, let's add it.

Zhaolong, can you take care of it?

>
> Thanks,
> //richard

Thanks,
Miquèl
> richard@nod.at wrote on Mon, 25 Sep 2023 16:59:31 +0200 (CEST):
>
>> ----- Original Message -----
>>> From: "Miquel Raynal" <miquel.raynal@bootlin.com>
>>>> Given this a second thought. I don't think a NAND driver is allowed to return
>>>> less than requests bytes and setting EBADMSG.
>>>> UBI's IO path has a comment on that:
>>>>
>>>> 	/*
>>>> 	 * The driver should never return -EBADMSG if it failed to read
>>>> 	 * all the requested data. But some buggy drivers might do
>>>> 	 * this, so we change it to -EIO.
>>>> 	 */
>>>> 	if (read != len && mtd_is_eccerr(err)) {
>>>> 		ubi_assert(0);
>>>> 		err = -EIO;
>>>> 	}
>>> Interesting. Shall we add this check to the mtd_read() path as well?
>>>
>>> Maybe with a WARN_ON()?
>> WARN_ON_ONCE(), please. But yes, let's add it.
> Zhaolong, can you take care of it?
>
>> Thanks,
>> //richard
>
> Thanks,
> Miquèl

Yes! That is a good idea, and I am pleased to do this.

Thanks,
Zhaolong
diff --git a/drivers/mtd/mtdchar.c b/drivers/mtd/mtdchar.c
index 8dc4f5c493fc..ba60dc6bef98 100644
--- a/drivers/mtd/mtdchar.c
+++ b/drivers/mtd/mtdchar.c
@@ -211,7 +211,7 @@ static ssize_t mtdchar_read(struct file *file, char __user *buf, size_t count,
 	}
 
 	kfree(kbuf);
-	return total_retlen;
+	return total_retlen ? total_retlen : ret;
 } /* mtdchar_read */
 
 static ssize_t mtdchar_write(struct file *file, const char __user *buf, size_t count,
In the first while loop, if the mtd_read() function returns -EBADMSG
and 'retlen' returns 0, the loop breaks and the function returns
'total_retlen', which is 0, instead of the error code.

This problem causes the user-space program to encounter EOF when it has
not finished reading the mtd partition, and this also violates the read
system call standard in POSIX.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217939
Signed-off-by: ZhaoLong Wang <wangzhaolong1@huawei.com>
---
 drivers/mtd/mtdchar.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
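The scenario in the commit message can be reproduced with a simplified user-space model of the mtdchar_read() loop. This is a sketch under stated assumptions: it drops copy_to_user(), the OOB modes and file-position handling, and `script` is a hypothetical list of the (return code, retlen) pairs that mtd_read() would produce on successive iterations. With `patched` set, it applies the RFC's return expression.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* One simulated mtd_read() call: what it returns and how much it "read". */
struct fake_result {
	int ret;	/* 0, -EUCLEAN, -EBADMSG or another negative errno */
	size_t retlen;	/* bytes the driver claims to have read */
};

static ssize_t model_read(const struct fake_result *script, size_t nsteps,
			  size_t count, size_t chunk, int patched)
{
	ssize_t total_retlen = 0;
	int ret = 0;
	size_t i = 0;

	while (count && i < nsteps) {
		size_t len = count < chunk ? count : chunk;
		size_t retlen = script[i].retlen < len ? script[i].retlen : len;

		ret = script[i++].ret;
		if (!ret || ret == -EUCLEAN || ret == -EBADMSG) {
			/* data, even with ECC errors, is handed to the caller */
			total_retlen += retlen;
			count -= retlen;
			if (retlen == 0)
				count = 0;	/* nothing read: leave the loop */
		} else {
			return ret;		/* other errors are returned directly */
		}
	}
	/* Unpatched: byte count only. Patched: fall back to ret when 0. */
	return patched ? (total_retlen ? total_retlen : ret) : total_retlen;
}
```

With a single { -EBADMSG, 0 } step, the unpatched model returns 0 (which the caller sees as EOF) and the patched one returns -EBADMSG; when an earlier chunk succeeded, both return the byte count read so far.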