Message ID | 20240507160546.130255-3-miquel.raynal@bootlin.com |
---|---|
State | New |
Headers | show |
Series | mtd: rawnand: NAND early identification fixes | expand |
Hello Miquel, Am Tue, May 07, 2024 at 06:05:46PM +0200 schrieb Miquel Raynal: > Early during NAND identification, mtd_info fields have not yet been > initialized (namely, writesize and oobsize) and thus cannot be used for > sanity checks yet. Of course if there is a misuse of > nand_change_read_column_op() so early we won't be warned, but there is > anyway no actual check to perform at this stage as we do not yet know > the NAND geometry. > > So, if the fields are empty, especially mtd->writesize which is *always* > set quite rapidly after identification, let's skip the sanity checks. > > nand_change_read_column_op() is subject to be used early for ONFI/JEDEC > identification in the very unlikely case of: > - bitflips appearing in the parameter page, > - the controller driver not supporting simple DATA_IN cycles. > > Fixes: c27842e7e11f ("mtd: rawnand: onfi: Adapt the parameter page read to constraint controllers") > Fixes: daca31765e8b ("mtd: rawnand: jedec: Adapt the parameter page read to constraint controllers") > Cc: stable@vger.kernel.org > Reported-by: Alexander Dahl <ada@thorsis.com> > Closes: https://lore.kernel.org/linux-mtd/20240306-shaky-bunion-d28b65ea97d7@thorsis.com/ > Reported-by: Steven Seeger <steven.seeger@flightsystems.net> > Closes: https://lore.kernel.org/linux-mtd/DM6PR05MB4506554457CF95191A670BDEF7062@DM6PR05MB4506.namprd05.prod.outlook.com/ > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> > --- > drivers/mtd/nand/raw/nand_base.c | 12 +++++++----- > 1 file changed, 7 insertions(+), 5 deletions(-) > > diff --git a/drivers/mtd/nand/raw/nand_base.c b/drivers/mtd/nand/raw/nand_base.c > index 248e654ecefd..a66e73cd68cb 100644 > --- a/drivers/mtd/nand/raw/nand_base.c > +++ b/drivers/mtd/nand/raw/nand_base.c > @@ -1440,12 +1440,14 @@ int nand_change_read_column_op(struct nand_chip *chip, > if (len && !buf) > return -EINVAL; > > - if (offset_in_page + len > mtd->writesize + mtd->oobsize) > - return -EINVAL; > + if (mtd->writesize) { > + if ((offset_in_page + len > mtd->writesize + mtd->oobsize)) > + return -EINVAL; These doubled (( )) are new and I think not necessary? Greets Alex > > - /* Small page NANDs do not support column change. */ > - if (mtd->writesize <= 512) > - return -ENOTSUPP; > + /* Small page NANDs do not support column change. */ > + if (mtd->writesize <= 512) > + return -ENOTSUPP; > + } > > if (nand_has_exec_op(chip)) { > const struct nand_interface_config *conf = > -- > 2.40.1 >
Hi Alexander, ada@thorsis.com wrote on Wed, 8 May 2024 08:41:44 +0200: > Hello Miquel, > > Am Tue, May 07, 2024 at 06:05:46PM +0200 schrieb Miquel Raynal: > > Early during NAND identification, mtd_info fields have not yet been > > initialized (namely, writesize and oobsize) and thus cannot be used for > > sanity checks yet. Of course if there is a misuse of > > nand_change_read_column_op() so early we won't be warned, but there is > > anyway no actual check to perform at this stage as we do not yet know > > the NAND geometry. > > > > So, if the fields are empty, especially mtd->writesize which is *always* > > set quite rapidly after identification, let's skip the sanity checks. > > > > nand_change_read_column_op() is subject to be used early for ONFI/JEDEC > > identification in the very unlikely case of: > > - bitflips appearing in the parameter page, > > - the controller driver not supporting simple DATA_IN cycles. > > > > Fixes: c27842e7e11f ("mtd: rawnand: onfi: Adapt the parameter page read to constraint controllers") > > Fixes: daca31765e8b ("mtd: rawnand: jedec: Adapt the parameter page read to constraint controllers") > > Cc: stable@vger.kernel.org > > Reported-by: Alexander Dahl <ada@thorsis.com> > > Closes: https://lore.kernel.org/linux-mtd/20240306-shaky-bunion-d28b65ea97d7@thorsis.com/ > > Reported-by: Steven Seeger <steven.seeger@flightsystems.net> > > Closes: https://lore.kernel.org/linux-mtd/DM6PR05MB4506554457CF95191A670BDEF7062@DM6PR05MB4506.namprd05.prod.outlook.com/ > > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> > > --- > > drivers/mtd/nand/raw/nand_base.c | 12 +++++++----- > > 1 file changed, 7 insertions(+), 5 deletions(-) > > > > diff --git a/drivers/mtd/nand/raw/nand_base.c b/drivers/mtd/nand/raw/nand_base.c > > index 248e654ecefd..a66e73cd68cb 100644 > > --- a/drivers/mtd/nand/raw/nand_base.c > > +++ b/drivers/mtd/nand/raw/nand_base.c > > @@ -1440,12 +1440,14 @@ int nand_change_read_column_op(struct nand_chip *chip, > > if (len && !buf) > > return -EINVAL; > > > > - if (offset_in_page + len > mtd->writesize + mtd->oobsize) > > - return -EINVAL; > > + if (mtd->writesize) { > > + if ((offset_in_page + len > mtd->writesize + mtd->oobsize)) > > + return -EINVAL; > > These doubled (( )) are new and I think not necessary? Oops, true. Any chances you'll be able to test the patchset? Same question for Steven! Cheers, Miquèl
Hi Miquel, On Tue, May 07, 2024 at 06:05:46PM +0200, Miquel Raynal wrote: > Early during NAND identification, mtd_info fields have not yet been > initialized (namely, writesize and oobsize) and thus cannot be used for > sanity checks yet. Of course if there is a misuse of > nand_change_read_column_op() so early we won't be warned, but there is > anyway no actual check to perform at this stage as we do not yet know > the NAND geometry. > > So, if the fields are empty, especially mtd->writesize which is *always* > set quite rapidly after identification, let's skip the sanity checks. > > nand_change_read_column_op() is subject to be used early for ONFI/JEDEC > identification in the very unlikely case of: > - bitflips appearing in the parameter page, > - the controller driver not supporting simple DATA_IN cycles. > > Fixes: c27842e7e11f ("mtd: rawnand: onfi: Adapt the parameter page read to constraint controllers") > Fixes: daca31765e8b ("mtd: rawnand: jedec: Adapt the parameter page read to constraint controllers") > Cc: stable@vger.kernel.org > Reported-by: Alexander Dahl <ada@thorsis.com> > Closes: https://lore.kernel.org/linux-mtd/20240306-shaky-bunion-d28b65ea97d7@thorsis.com/ > Reported-by: Steven Seeger <steven.seeger@flightsystems.net> > Closes: https://lore.kernel.org/linux-mtd/DM6PR05MB4506554457CF95191A670BDEF7062@DM6PR05MB4506.namprd05.prod.outlook.com/ > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> > --- > drivers/mtd/nand/raw/nand_base.c | 12 +++++++----- > 1 file changed, 7 insertions(+), 5 deletions(-) > > diff --git a/drivers/mtd/nand/raw/nand_base.c b/drivers/mtd/nand/raw/nand_base.c > index 248e654ecefd..a66e73cd68cb 100644 > --- a/drivers/mtd/nand/raw/nand_base.c > +++ b/drivers/mtd/nand/raw/nand_base.c > @@ -1440,12 +1440,14 @@ int nand_change_read_column_op(struct nand_chip *chip, > if (len && !buf) > return -EINVAL; > > - if (offset_in_page + len > mtd->writesize + mtd->oobsize) > - return -EINVAL; > + if (mtd->writesize) { > + if ((offset_in_page + len > mtd->writesize + mtd->oobsize)) > + return -EINVAL; > > - /* Small page NANDs do not support column change. */ > - if (mtd->writesize <= 512) > - return -ENOTSUPP; > + /* Small page NANDs do not support column change. */ > + if (mtd->writesize <= 512) > + return -ENOTSUPP; > + } This is not enough. A few lines further down nand_fill_column_cycles() is called which also uses mtd->writesize. This function also needs to know if we have a large page or small page NAND, so bypassing the checks won't be enough there. Sascha
On Tuesday, May 7, 2024, Miquel Raynal wrote: >So, if the fields are empty, especially mtd->writesize which is *always* >set quite rapidly after identification, let's skip the sanity checks. I noticed this when first looking at my board with the bitflip in a NAND chip's parameter page. I just assumed that since I was setting it up to do column change operations, I needed to add this in at the time. Looking at it now, since this information is being supplied by me before the scan, it's wrong. So I agree it's a bug, but I didn't think about it again since I was tackling the bug of trying to read additional parameter page copies further down the page due to the bitflip. I don't have access to the board right now, but when I get to it again I can try this patch. I will need to remove what I already added in to check and reply back. It may be a few weeks, though. On another note, I think that this entire API would benefit from discouraging hybrid approaches. I implement function overloads for things like ecc.read_page, write_page, read_page_raw, etc, but also use the exec function for things like erase, read id, read parameter page, etc. I maybe did it "wrong" but it works. Past drivers I've done use the legacy cmdfunc, so this was my first attempt at using the command parser. I suspect there are a lot of people like me writing drivers for proprietary hardware that uses FPGAs to do some of the NAND interaction, rather than direct chip access as the API was originally designed for. So, to explain further, read_page triggers my addr/chip select, read page command, and retrieving the buffer. Read parameter page goes through the command parser, as does the column change op, with some state variables to keep track of where in the read cycle we are so that each copy of the parameter page data can be accessed in the buffer. I lament the lack of consistency here. But, it works and the customer is unlikely to want to change anything at this point. :) Steven
Hi Steven, steven.seeger@flightsystems.net wrote on Tue, 14 May 2024 17:57:46 +0000: > On Tuesday, May 7, 2024, Miquel Raynal wrote: > > >So, if the fields are empty, especially mtd->writesize which is *always* > >set quite rapidly after identification, let's skip the sanity checks. > > I noticed this when first looking at my board with the bitflip in a NAND chip's parameter page. I just assumed that since I was setting it up to do column change operations, I needed to add this in at the time. Looking at it now, since this information is being supplied by me before the scan, it's wrong. So I agree it's a bug, but I didn't think about it again since I was tackling the bug of trying to read additional parameter page copies further down the page due to the bitflip. > > I don't have access to the board right now, but when I get to it again I can try this patch. I will need to remove what I already added in to check and reply back. It may be a few weeks, though. > > On another note, I think that this entire API would benefit from discouraging hybrid approaches. I implement function overloads for things like ecc.read_page, write_page, read_page_raw, etc, but also use the exec function for things like erase, read id, read parameter page, etc. I maybe did it "wrong" but it works. Past drivers I've done use the legacy cmdfunc, so this was my first attempt at using the command parser. I suspect there are a lot of people like me writing drivers for proprietary hardware that uses FPGAs to do some of the NAND interaction, rather than direct chip access as the API was originally designed for. I don't know what you mean with direct chip access. ->cmd_ctrl() and ->cmdfunc() were desperately too simple and many drivers started guessing what the core was trying to do, making it very hard to extend/modify the core without breaking them all. This was sign of an inadequate design and hence ->exec_op() (providing all the operation) was introduced. Just to make it clear, the original APIs were totally fine back then, but controllers evolved, became smarter^W more complex, until the APIs were no longer fitting. > So, to explain further, read_page triggers my addr/chip select, read page command, and retrieving the buffer. Read parameter page goes through the command parser, as does the column change op, with some state variables to keep track of where in the read cycle we are so that each copy of the parameter page data can be accessed in the buffer. I lament the lack of consistency here. But, it works and the customer is unlikely to want to change anything at this point. :) The logic is: - Early at boot you need to identify the chip, its parameters, its configuration, etc. -> exec_op() is used - During normal operation, it's time for I/Os. Using ->exec_op() again can work, but most of the time these operations can be done faster with a more custom approach, especially since most controller drivers embed and ECC engine that also needs to be managed during these accesses. -> your page helpers are here for that - During debugging you might want to perform raw page reads, performance does not matter here but the data layout in the chip is NAND controller and ECC engine specific, while the user expected layout is: [all the data][all the oob]. -> your raw page helpers are here for that And there are standard helpers provided in the core if your controller does not need specific implementations. You may want to use them because it makes your life easier, they will use ->exec_op(). Thanks, Miquèl
diff --git a/drivers/mtd/nand/raw/nand_base.c b/drivers/mtd/nand/raw/nand_base.c index 248e654ecefd..a66e73cd68cb 100644 --- a/drivers/mtd/nand/raw/nand_base.c +++ b/drivers/mtd/nand/raw/nand_base.c @@ -1440,12 +1440,14 @@ int nand_change_read_column_op(struct nand_chip *chip, if (len && !buf) return -EINVAL; - if (offset_in_page + len > mtd->writesize + mtd->oobsize) - return -EINVAL; + if (mtd->writesize) { + if ((offset_in_page + len > mtd->writesize + mtd->oobsize)) + return -EINVAL; - /* Small page NANDs do not support column change. */ - if (mtd->writesize <= 512) - return -ENOTSUPP; + /* Small page NANDs do not support column change. */ + if (mtd->writesize <= 512) + return -ENOTSUPP; + } if (nand_has_exec_op(chip)) { const struct nand_interface_config *conf =
Early during NAND identification, mtd_info fields have not yet been initialized (namely, writesize and oobsize) and thus cannot be used for sanity checks yet. Of course if there is a misuse of nand_change_read_column_op() so early we won't be warned, but there is anyway no actual check to perform at this stage as we do not yet know the NAND geometry. So, if the fields are empty, especially mtd->writesize which is *always* set quite rapidly after identification, let's skip the sanity checks. nand_change_read_column_op() is subject to be used early for ONFI/JEDEC identification in the very unlikely case of: - bitflips appearing in the parameter page, - the controller driver not supporting simple DATA_IN cycles. Fixes: c27842e7e11f ("mtd: rawnand: onfi: Adapt the parameter page read to constraint controllers") Fixes: daca31765e8b ("mtd: rawnand: jedec: Adapt the parameter page read to constraint controllers") Cc: stable@vger.kernel.org Reported-by: Alexander Dahl <ada@thorsis.com> Closes: https://lore.kernel.org/linux-mtd/20240306-shaky-bunion-d28b65ea97d7@thorsis.com/ Reported-by: Steven Seeger <steven.seeger@flightsystems.net> Closes: https://lore.kernel.org/linux-mtd/DM6PR05MB4506554457CF95191A670BDEF7062@DM6PR05MB4506.namprd05.prod.outlook.com/ Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> --- drivers/mtd/nand/raw/nand_base.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-)