Message ID | ea0422cd-a8e6-3c36-f551-a0142893301b@marvell.com |
---|---|
State | Accepted |
Commit | aad8aa56d96b1305ae5a9708b604b2f2a4e97c4e |
Delegated to: | Dario Binacchi |
Headers | show |
Series | mtd: nand: pxa3xx: Incorrect bitflip return on page read | expand |
On Sun, Apr 28, 2024 at 4:15 AM Ravi Minnikanti <rminnikanti@marvell.com> wrote: > > Once a page is read with higher bitflips all subsequent reads > are returning the same bitflip value even though they have none. > max_bitflip variable is not being reset to 0 across page reads. > > This is causing problems like incorrectly > marking erase blocks bad by UBI and causing read failures. > > Verified the change with both MTD reads and UBI. > This change is inline with other NFC drivers. > > Sample error log where a block is marked bad incorrectly: > > ubi0: fixable bit-flip detected at PEB 125 > ubi0: run torture test for PEB 125 > ubi0: fixable bit-flip detected at PEB 125 > ubi0 error: torture_peb: read problems on freshly erased PEB 125, > must be bad > ubi0 error: erase_worker: failed to erase PEB 125, error -5 > ubi0: mark PEB 125 as bad > > Signed-off-by: rminnikanti <rminnikanti@marvell.com> Looks good to me Reviewed-by: Chris Packham <judge.packham@gmail.com> > --- > drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c b/drivers/mtd/nand/raw/pxa3xx_nand.c > index 1d9a6d107b..d2a4faad56 100644 > --- a/drivers/mtd/nand/raw/pxa3xx_nand.c > +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c > @@ -800,6 +800,11 @@ static void prepare_start_command(struct pxa3xx_nand_info *info, int command) > info->ecc_err_cnt = 0; > info->ndcb3 = 0; > info->need_wait = 0; > + /* > + * Reset max_bitflips to zero. Once command is complete, > + * max_bitflips for this READ is returned in ecc.read_page() > + */ > + info->max_bitflips = 0; > > switch (command) { > case NAND_CMD_READ0: > -- > 2.17.1
On Mon, Apr 29, 2024 at 6:22 PM Chris Packham <judge.packham@gmail.com> wrote: > > On Sun, Apr 28, 2024 at 4:15 AM Ravi Minnikanti <rminnikanti@marvell.com> wrote: > > > > Once a page is read with higher bitflips all subsequent reads > > are returning the same bitflip value even though they have none. > > max_bitflip variable is not being reset to 0 across page reads. > > > > This is causing problems like incorrectly > > marking erase blocks bad by UBI and causing read failures. > > > > Verified the change with both MTD reads and UBI. > > This change is inline with other NFC drivers. > > > > Sample error log where a block is marked bad incorrectly: > > > > ubi0: fixable bit-flip detected at PEB 125 > > ubi0: run torture test for PEB 125 > > ubi0: fixable bit-flip detected at PEB 125 > > ubi0 error: torture_peb: read problems on freshly erased PEB 125, > > must be bad > > ubi0 error: erase_worker: failed to erase PEB 125, error -5 > > ubi0: mark PEB 125 as bad > > > > Signed-off-by: rminnikanti <rminnikanti@marvell.com> > > Looks good to me > > Reviewed-by: Chris Packham <judge.packham@gmail.com> > > > --- > > drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++ > > 1 file changed, 5 insertions(+) > > > > diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c b/drivers/mtd/nand/raw/pxa3xx_nand.c > > index 1d9a6d107b..d2a4faad56 100644 > > --- a/drivers/mtd/nand/raw/pxa3xx_nand.c > > +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c > > @@ -800,6 +800,11 @@ static void prepare_start_command(struct pxa3xx_nand_info *info, int command) > > info->ecc_err_cnt = 0; > > info->ndcb3 = 0; > > info->need_wait = 0; > > + /* > > + * Reset max_bitflips to zero. Once command is complete, > > + * max_bitflips for this READ is returned in ecc.read_page() > > + */ > > + info->max_bitflips = 0; > > Why this should not be put to 0 in read_page instead on prepare_start_command? Michael > > switch (command) { > > case NAND_CMD_READ0: > > -- > > 2.17.1
On 4/29/24 09:59, Michael Nazzareno Trimarchi wrote: > ---------------------------------------------------------------------- > On Mon, Apr 29, 2024 at 6:22 PM Chris Packham <judge.packham@gmail.com> wrote: >> >> On Sun, Apr 28, 2024 at 4:15 AM Ravi Minnikanti <rminnikanti@marvell.com> wrote: >>> >>> Once a page is read with higher bitflips all subsequent reads >>> are returning the same bitflip value even though they have none. >>> max_bitflip variable is not being reset to 0 across page reads. >>> >>> This is causing problems like incorrectly >>> marking erase blocks bad by UBI and causing read failures. >>> >>> Verified the change with both MTD reads and UBI. >>> This change is inline with other NFC drivers. >>> >>> Sample error log where a block is marked bad incorrectly: >>> >>> ubi0: fixable bit-flip detected at PEB 125 >>> ubi0: run torture test for PEB 125 >>> ubi0: fixable bit-flip detected at PEB 125 >>> ubi0 error: torture_peb: read problems on freshly erased PEB 125, >>> must be bad >>> ubi0 error: erase_worker: failed to erase PEB 125, error -5 >>> ubi0: mark PEB 125 as bad >>> >>> Signed-off-by: rminnikanti <rminnikanti@marvell.com> >> >> Looks good to me >> >> Reviewed-by: Chris Packham <judge.packham@gmail.com> >> >>> --- >>> drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++ >>> 1 file changed, 5 insertions(+) >>> >>> diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c b/drivers/mtd/nand/raw/pxa3xx_nand.c >>> index 1d9a6d107b..d2a4faad56 100644 >>> --- a/drivers/mtd/nand/raw/pxa3xx_nand.c >>> +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c >>> @@ -800,6 +800,11 @@ static void prepare_start_command(struct pxa3xx_nand_info *info, int command) >>> info->ecc_err_cnt = 0; >>> info->ndcb3 = 0; >>> info->need_wait = 0; >>> + /* >>> + * Reset max_bitflips to zero. Once command is complete, >>> + * max_bitflips for this READ is returned in ecc.read_page() >>> + */ >>> + info->max_bitflips = 0; >>> > > Why this should not be put to 0 in read_page instead on prepare_start_command? > > Michael > ecc.read_page is invoked after the read command execution. First chip->cmdfunc is executed with NAND_CMD_READ0 and then ecc.read_page is invoked to read the page from buffer. So, by the time read_page is invoked, info->max_bitflips must already have the bit flip value. Thanks, Ravi. >>> switch (command) { >>> case NAND_CMD_READ0: >>> -- >>> 2.17.1 >
Hi Ravi On Tue, Apr 30, 2024 at 6:25 AM Ravi Minnikanti <rminnikanti@marvell.com> wrote: > > On 4/29/24 09:59, Michael Nazzareno Trimarchi wrote: > > > ---------------------------------------------------------------------- > > On Mon, Apr 29, 2024 at 6:22 PM Chris Packham <judge.packham@gmail.com> wrote: > >> > >> On Sun, Apr 28, 2024 at 4:15 AM Ravi Minnikanti <rminnikanti@marvell.com> wrote: > >>> > >>> Once a page is read with higher bitflips all subsequent reads > >>> are returning the same bitflip value even though they have none. > >>> max_bitflip variable is not being reset to 0 across page reads. > >>> > >>> This is causing problems like incorrectly > >>> marking erase blocks bad by UBI and causing read failures. > >>> > >>> Verified the change with both MTD reads and UBI. > >>> This change is inline with other NFC drivers. > >>> > >>> Sample error log where a block is marked bad incorrectly: > >>> > >>> ubi0: fixable bit-flip detected at PEB 125 > >>> ubi0: run torture test for PEB 125 > >>> ubi0: fixable bit-flip detected at PEB 125 > >>> ubi0 error: torture_peb: read problems on freshly erased PEB 125, > >>> must be bad > >>> ubi0 error: erase_worker: failed to erase PEB 125, error -5 > >>> ubi0: mark PEB 125 as bad > >>> > >>> Signed-off-by: rminnikanti <rminnikanti@marvell.com> > >> > >> Looks good to me > >> > >> Reviewed-by: Chris Packham <judge.packham@gmail.com> > >> > >>> --- > >>> drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++ > >>> 1 file changed, 5 insertions(+) > >>> > >>> diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c b/drivers/mtd/nand/raw/pxa3xx_nand.c > >>> index 1d9a6d107b..d2a4faad56 100644 > >>> --- a/drivers/mtd/nand/raw/pxa3xx_nand.c > >>> +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c > >>> @@ -800,6 +800,11 @@ static void prepare_start_command(struct pxa3xx_nand_info *info, int command) > >>> info->ecc_err_cnt = 0; > >>> info->ndcb3 = 0; > >>> info->need_wait = 0; > >>> + /* > >>> + * Reset max_bitflips to zero. Once command is complete, > >>> + * max_bitflips for this READ is returned in ecc.read_page() > >>> + */ > >>> + info->max_bitflips = 0; > >>> > > > > Why this should not be put to 0 in read_page instead on prepare_start_command? > > > > Michael > > > > ecc.read_page is invoked after the read command execution. > First chip->cmdfunc is executed with NAND_CMD_READ0 and then ecc.read_page is invoked > to read the page from buffer. So, by the time read_page is invoked, info->max_bitflips > must already have the bit flip value. > All the other implementation has a slight different way to handle. From what you said the reset should be done on for NAND_CMD_READ0 command and should be sufficient. Technically should be moved in switch and not unconditionally. Michael > Thanks, Ravi. > > >>> switch (command) { > >>> case NAND_CMD_READ0: > >>> -- > >>> 2.17.1 > > >
On 5/6/24 00:35, Michael Nazzareno Trimarchi wrote: > Hi Ravi > > On Tue, Apr 30, 2024 at 6:25 AM Ravi Minnikanti <rminnikanti@marvell.com> wrote: >> >> On 4/29/24 09:59, Michael Nazzareno Trimarchi wrote: >> >>> ---------------------------------------------------------------------- >>> On Mon, Apr 29, 2024 at 6:22 PM Chris Packham <judge.packham@gmail.com> wrote: >>>> >>>> On Sun, Apr 28, 2024 at 4:15 AM Ravi Minnikanti <rminnikanti@marvell.com> wrote: >>>>> >>>>> Once a page is read with higher bitflips all subsequent reads >>>>> are returning the same bitflip value even though they have none. >>>>> max_bitflip variable is not being reset to 0 across page reads. >>>>> >>>>> This is causing problems like incorrectly >>>>> marking erase blocks bad by UBI and causing read failures. >>>>> >>>>> Verified the change with both MTD reads and UBI. >>>>> This change is inline with other NFC drivers. >>>>> >>>>> Sample error log where a block is marked bad incorrectly: >>>>> >>>>> ubi0: fixable bit-flip detected at PEB 125 >>>>> ubi0: run torture test for PEB 125 >>>>> ubi0: fixable bit-flip detected at PEB 125 >>>>> ubi0 error: torture_peb: read problems on freshly erased PEB 125, >>>>> must be bad >>>>> ubi0 error: erase_worker: failed to erase PEB 125, error -5 >>>>> ubi0: mark PEB 125 as bad >>>>> >>>>> Signed-off-by: rminnikanti <rminnikanti@marvell.com> >>>> >>>> Looks good to me >>>> >>>> Reviewed-by: Chris Packham <judge.packham@gmail.com> >>>> >>>>> --- >>>>> drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++ >>>>> 1 file changed, 5 insertions(+) >>>>> >>>>> diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c b/drivers/mtd/nand/raw/pxa3xx_nand.c >>>>> index 1d9a6d107b..d2a4faad56 100644 >>>>> --- a/drivers/mtd/nand/raw/pxa3xx_nand.c >>>>> +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c >>>>> @@ -800,6 +800,11 @@ static void prepare_start_command(struct pxa3xx_nand_info *info, int command) >>>>> info->ecc_err_cnt = 0; >>>>> info->ndcb3 = 0; >>>>> info->need_wait = 0; >>>>> + /* >>>>> + * Reset max_bitflips to zero. Once command is complete, >>>>> + * max_bitflips for this READ is returned in ecc.read_page() >>>>> + */ >>>>> + info->max_bitflips = 0; >>>>> >>> >>> Why this should not be put to 0 in read_page instead on prepare_start_command? >>> >>> Michael >>> >> >> ecc.read_page is invoked after the read command execution. >> First chip->cmdfunc is executed with NAND_CMD_READ0 and then ecc.read_page is invoked >> to read the page from buffer. So, by the time read_page is invoked, info->max_bitflips >> must already have the bit flip value. >> > > All the other implementation has a slight different way to handle. > From what you said the reset should > be done on for NAND_CMD_READ0 command and should be sufficient. > Technically should be moved > in switch and not unconditionally. > > Michael > max_bitflip is not being reset to 0 across page reads. Once a page is read with higher bitflips all subsequent reads are returning the same bitflip value even though they have none. This is causing problems like incorrectly marking erase blocks bad by UBI and read failures. Tested it with both MTD reads and UBI attach. This change is inline with other NFC drivers. Sample error log where a block is marked bad incorrectly: ubi0: fixable bit-flip detected at PEB 125 ubi0: run torture test for PEB 125 ubi0: fixable bit-flip detected at PEB 125 ubi0 error: torture_peb: read problems on freshly erased PEB 125, must be bad ubi0 error: erase_worker: failed to erase PEB 125, error -5 ubi0: mark PEB 125 as bad Signed-off-by: rminnikanti <rminnikanti@marvell.com> --- drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c b/drivers/mtd/nand/raw/pxa3xx_nand.c index 1d9a6d107b..97f250483f 100644 --- a/drivers/mtd/nand/raw/pxa3xx_nand.c +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c @@ -803,6 +803,11 @@ static void prepare_start_command(struct pxa3xx_nand_info *info, int command) switch (command) { case NAND_CMD_READ0: + /* + * Reset max_bitflips to zero. Once command is complete, + * max_bitflips for this READ is returned in ecc.read_page() + */ + info->max_bitflips = 0; case NAND_CMD_READOOB: case NAND_CMD_PAGEPROG: if (!info->force_raw)
Hi Ravi On Mon, May 6, 2024 at 7:33 PM Ravi Minnikanti <rminnikanti@marvell.com> wrote: > > On 5/6/24 00:35, Michael Nazzareno Trimarchi wrote: > > Hi Ravi > > > > On Tue, Apr 30, 2024 at 6:25 AM Ravi Minnikanti <rminnikanti@marvell.com> wrote: > >> > >> On 4/29/24 09:59, Michael Nazzareno Trimarchi wrote: > >> > >>> ---------------------------------------------------------------------- > >>> On Mon, Apr 29, 2024 at 6:22 PM Chris Packham <judge.packham@gmail.com> wrote: > >>>> > >>>> On Sun, Apr 28, 2024 at 4:15 AM Ravi Minnikanti <rminnikanti@marvell.com> wrote: > >>>>> > >>>>> Once a page is read with higher bitflips all subsequent reads > >>>>> are returning the same bitflip value even though they have none. > >>>>> max_bitflip variable is not being reset to 0 across page reads. > >>>>> > >>>>> This is causing problems like incorrectly > >>>>> marking erase blocks bad by UBI and causing read failures. > >>>>> > >>>>> Verified the change with both MTD reads and UBI. > >>>>> This change is inline with other NFC drivers. > >>>>> > >>>>> Sample error log where a block is marked bad incorrectly: > >>>>> > >>>>> ubi0: fixable bit-flip detected at PEB 125 > >>>>> ubi0: run torture test for PEB 125 > >>>>> ubi0: fixable bit-flip detected at PEB 125 > >>>>> ubi0 error: torture_peb: read problems on freshly erased PEB 125, > >>>>> must be bad > >>>>> ubi0 error: erase_worker: failed to erase PEB 125, error -5 > >>>>> ubi0: mark PEB 125 as bad > >>>>> > >>>>> Signed-off-by: rminnikanti <rminnikanti@marvell.com> > >>>> > >>>> Looks good to me > >>>> > >>>> Reviewed-by: Chris Packham <judge.packham@gmail.com> > >>>> > >>>>> --- > >>>>> drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++ > >>>>> 1 file changed, 5 insertions(+) > >>>>> > >>>>> diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c b/drivers/mtd/nand/raw/pxa3xx_nand.c > >>>>> index 1d9a6d107b..d2a4faad56 100644 > >>>>> --- a/drivers/mtd/nand/raw/pxa3xx_nand.c > >>>>> +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c > >>>>> @@ -800,6 +800,11 @@ static void prepare_start_command(struct pxa3xx_nand_info *info, int command) > >>>>> info->ecc_err_cnt = 0; > >>>>> info->ndcb3 = 0; > >>>>> info->need_wait = 0; > >>>>> + /* > >>>>> + * Reset max_bitflips to zero. Once command is complete, > >>>>> + * max_bitflips for this READ is returned in ecc.read_page() > >>>>> + */ > >>>>> + info->max_bitflips = 0; > >>>>> > >>> > >>> Why this should not be put to 0 in read_page instead on prepare_start_command? > >>> > >>> Michael > >>> > >> > >> ecc.read_page is invoked after the read command execution. > >> First chip->cmdfunc is executed with NAND_CMD_READ0 and then ecc.read_page is invoked > >> to read the page from buffer. So, by the time read_page is invoked, info->max_bitflips > >> must already have the bit flip value. > >> > > > > All the other implementation has a slight different way to handle. > > From what you said the reset should > > be done on for NAND_CMD_READ0 command and should be sufficient. > > Technically should be moved > > in switch and not unconditionally. > > > > Michael > > > > max_bitflip is not being reset to 0 across page reads. > Once a page is read with higher bitflips all subsequent reads > are returning the same bitflip value even though they have none. > > This is causing problems like incorrectly > marking erase blocks bad by UBI and read failures. > > Tested it with both MTD reads and UBI attach. > This change is inline with other NFC drivers. > > Sample error log where a block is marked bad incorrectly: > > ubi0: fixable bit-flip detected at PEB 125 > ubi0: run torture test for PEB 125 > ubi0: fixable bit-flip detected at PEB 125 > ubi0 error: torture_peb: read problems on freshly erased PEB 125, > must be bad > ubi0 error: erase_worker: failed to erase PEB 125, error -5 > ubi0: mark PEB 125 as bad > > Signed-off-by: rminnikanti <rminnikanti@marvell.com> > --- > drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c b/drivers/mtd/nand/raw/pxa3xx_nand.c > index 1d9a6d107b..97f250483f 100644 > --- a/drivers/mtd/nand/raw/pxa3xx_nand.c > +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c > @@ -803,6 +803,11 @@ static void prepare_start_command(struct pxa3xx_nand_info *info, int command) > > switch (command) { > case NAND_CMD_READ0: > + /* > + * Reset max_bitflips to zero. Once command is complete, > + * max_bitflips for this READ is returned in ecc.read_page() > + */ > + info->max_bitflips = 0; > case NAND_CMD_READOOB: > case NAND_CMD_PAGEPROG: > if (!info->force_raw) > -- > 2.17.1 > > Thanks Michael. Fixed it. > > >> Thanks, Ravi. > >> > >>>>> switch (command) { > >>>>> case NAND_CMD_READ0: > >>>>> -- > >>>>> 2.17.1 > >>> > >> > > > > Acked-by: Michael Trimarchi <michael@amarulasolutions.com>
Hi, Can you please merge this PR, if there are no more review comments? Thanks, Ravi On 5/6/24 11:28, Michael Nazzareno Trimarchi wrote: > ---------------------------------------------------------------------- > Hi Ravi > > On Mon, May 6, 2024 at 7:33 PM Ravi Minnikanti <rminnikanti@marvell.com> wrote: >> >> On 5/6/24 00:35, Michael Nazzareno Trimarchi wrote: >>> Hi Ravi >>> >>> On Tue, Apr 30, 2024 at 6:25 AM Ravi Minnikanti <rminnikanti@marvell.com> wrote: >>>> >>>> On 4/29/24 09:59, Michael Nazzareno Trimarchi wrote: >>>> >>>>> ---------------------------------------------------------------------- >>>>> On Mon, Apr 29, 2024 at 6:22 PM Chris Packham <judge.packham@gmail.com> wrote: >>>>>> >>>>>> On Sun, Apr 28, 2024 at 4:15 AM Ravi Minnikanti <rminnikanti@marvell.com> wrote: >>>>>>> >>>>>>> Once a page is read with higher bitflips all subsequent reads >>>>>>> are returning the same bitflip value even though they have none. >>>>>>> max_bitflip variable is not being reset to 0 across page reads. >>>>>>> >>>>>>> This is causing problems like incorrectly >>>>>>> marking erase blocks bad by UBI and causing read failures. >>>>>>> >>>>>>> Verified the change with both MTD reads and UBI. >>>>>>> This change is inline with other NFC drivers. >>>>>>> >>>>>>> Sample error log where a block is marked bad incorrectly: >>>>>>> >>>>>>> ubi0: fixable bit-flip detected at PEB 125 >>>>>>> ubi0: run torture test for PEB 125 >>>>>>> ubi0: fixable bit-flip detected at PEB 125 >>>>>>> ubi0 error: torture_peb: read problems on freshly erased PEB 125, >>>>>>> must be bad >>>>>>> ubi0 error: erase_worker: failed to erase PEB 125, error -5 >>>>>>> ubi0: mark PEB 125 as bad >>>>>>> >>>>>>> Signed-off-by: rminnikanti <rminnikanti@marvell.com> >>>>>> >>>>>> Looks good to me >>>>>> >>>>>> Reviewed-by: Chris Packham <judge.packham@gmail.com> >>>>>> >>>>>>> --- >>>>>>> drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++ >>>>>>> 1 file changed, 5 insertions(+) >>>>>>> >>>>>>> diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c b/drivers/mtd/nand/raw/pxa3xx_nand.c >>>>>>> index 1d9a6d107b..d2a4faad56 100644 >>>>>>> --- a/drivers/mtd/nand/raw/pxa3xx_nand.c >>>>>>> +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c >>>>>>> @@ -800,6 +800,11 @@ static void prepare_start_command(struct pxa3xx_nand_info *info, int command) >>>>>>> info->ecc_err_cnt = 0; >>>>>>> info->ndcb3 = 0; >>>>>>> info->need_wait = 0; >>>>>>> + /* >>>>>>> + * Reset max_bitflips to zero. Once command is complete, >>>>>>> + * max_bitflips for this READ is returned in ecc.read_page() >>>>>>> + */ >>>>>>> + info->max_bitflips = 0; >>>>>>> >>>>> >>>>> Why this should not be put to 0 in read_page instead on prepare_start_command? >>>>> >>>>> Michael >>>>> >>>> >>>> ecc.read_page is invoked after the read command execution. >>>> First chip->cmdfunc is executed with NAND_CMD_READ0 and then ecc.read_page is invoked >>>> to read the page from buffer. So, by the time read_page is invoked, info->max_bitflips >>>> must already have the bit flip value. >>>> >>> >>> All the other implementation has a slight different way to handle. >>> From what you said the reset should >>> be done on for NAND_CMD_READ0 command and should be sufficient. >>> Technically should be moved >>> in switch and not unconditionally. >>> >>> Michael >>> >> >> max_bitflip is not being reset to 0 across page reads. >> Once a page is read with higher bitflips all subsequent reads >> are returning the same bitflip value even though they have none. >> >> This is causing problems like incorrectly >> marking erase blocks bad by UBI and read failures. >> >> Tested it with both MTD reads and UBI attach. >> This change is inline with other NFC drivers. >> >> Sample error log where a block is marked bad incorrectly: >> >> ubi0: fixable bit-flip detected at PEB 125 >> ubi0: run torture test for PEB 125 >> ubi0: fixable bit-flip detected at PEB 125 >> ubi0 error: torture_peb: read problems on freshly erased PEB 125, >> must be bad >> ubi0 error: erase_worker: failed to erase PEB 125, error -5 >> ubi0: mark PEB 125 as bad >> >> Signed-off-by: rminnikanti <rminnikanti@marvell.com> >> --- >> drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++ >> 1 file changed, 5 insertions(+) >> >> diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c b/drivers/mtd/nand/raw/pxa3xx_nand.c >> index 1d9a6d107b..97f250483f 100644 >> --- a/drivers/mtd/nand/raw/pxa3xx_nand.c >> +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c >> @@ -803,6 +803,11 @@ static void prepare_start_command(struct pxa3xx_nand_info *info, int command) >> >> switch (command) { >> case NAND_CMD_READ0: >> + /* >> + * Reset max_bitflips to zero. Once command is complete, >> + * max_bitflips for this READ is returned in ecc.read_page() >> + */ >> + info->max_bitflips = 0; >> case NAND_CMD_READOOB: >> case NAND_CMD_PAGEPROG: >> if (!info->force_raw) >> -- >> 2.17.1 >> >> Thanks Michael. Fixed it. >> >>>> Thanks, Ravi. >>>> >>>>>>> switch (command) { >>>>>>> case NAND_CMD_READ0: >>>>>>> -- >>>>>>> 2.17.1 >>>>> >>>> >>> >>> > > Acked-by: Michael Trimarchi <michael@amarulasolutions.com> > > >
Hi Dario Can you add to next pull? Michael On Tue, May 21, 2024, 4:31 PM Ravi Minnikanti <rminnikanti@marvell.com> wrote: > Hi, > > Can you please merge this PR, if there are no more review comments? > > Thanks, > Ravi > On 5/6/24 11:28, Michael Nazzareno Trimarchi wrote: > > > ---------------------------------------------------------------------- > > Hi Ravi > > > > On Mon, May 6, 2024 at 7:33 PM Ravi Minnikanti <rminnikanti@marvell.com> > wrote: > >> > >> On 5/6/24 00:35, Michael Nazzareno Trimarchi wrote: > >>> Hi Ravi > >>> > >>> On Tue, Apr 30, 2024 at 6:25 AM Ravi Minnikanti < > rminnikanti@marvell.com> wrote: > >>>> > >>>> On 4/29/24 09:59, Michael Nazzareno Trimarchi wrote: > >>>> > >>>>> > ---------------------------------------------------------------------- > >>>>> On Mon, Apr 29, 2024 at 6:22 PM Chris Packham < > judge.packham@gmail.com> wrote: > >>>>>> > >>>>>> On Sun, Apr 28, 2024 at 4:15 AM Ravi Minnikanti < > rminnikanti@marvell.com> wrote: > >>>>>>> > >>>>>>> Once a page is read with higher bitflips all subsequent reads > >>>>>>> are returning the same bitflip value even though they have none. > >>>>>>> max_bitflip variable is not being reset to 0 across page reads. > >>>>>>> > >>>>>>> This is causing problems like incorrectly > >>>>>>> marking erase blocks bad by UBI and causing read failures. > >>>>>>> > >>>>>>> Verified the change with both MTD reads and UBI. > >>>>>>> This change is inline with other NFC drivers. > >>>>>>> > >>>>>>> Sample error log where a block is marked bad incorrectly: > >>>>>>> > >>>>>>> ubi0: fixable bit-flip detected at PEB 125 > >>>>>>> ubi0: run torture test for PEB 125 > >>>>>>> ubi0: fixable bit-flip detected at PEB 125 > >>>>>>> ubi0 error: torture_peb: read problems on freshly erased PEB 125, > >>>>>>> must be bad > >>>>>>> ubi0 error: erase_worker: failed to erase PEB 125, error -5 > >>>>>>> ubi0: mark PEB 125 as bad > >>>>>>> > >>>>>>> Signed-off-by: rminnikanti <rminnikanti@marvell.com> > >>>>>> > >>>>>> Looks good to me > >>>>>> > >>>>>> Reviewed-by: Chris Packham <judge.packham@gmail.com> > >>>>>> > >>>>>>> --- > >>>>>>> drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++ > >>>>>>> 1 file changed, 5 insertions(+) > >>>>>>> > >>>>>>> diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c > b/drivers/mtd/nand/raw/pxa3xx_nand.c > >>>>>>> index 1d9a6d107b..d2a4faad56 100644 > >>>>>>> --- a/drivers/mtd/nand/raw/pxa3xx_nand.c > >>>>>>> +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c > >>>>>>> @@ -800,6 +800,11 @@ static void prepare_start_command(struct > pxa3xx_nand_info *info, int command) > >>>>>>> info->ecc_err_cnt = 0; > >>>>>>> info->ndcb3 = 0; > >>>>>>> info->need_wait = 0; > >>>>>>> + /* > >>>>>>> + * Reset max_bitflips to zero. Once command is complete, > >>>>>>> + * max_bitflips for this READ is returned in > ecc.read_page() > >>>>>>> + */ > >>>>>>> + info->max_bitflips = 0; > >>>>>>> > >>>>> > >>>>> Why this should not be put to 0 in read_page instead on > prepare_start_command? > >>>>> > >>>>> Michael > >>>>> > >>>> > >>>> ecc.read_page is invoked after the read command execution. > >>>> First chip->cmdfunc is executed with NAND_CMD_READ0 and then > ecc.read_page is invoked > >>>> to read the page from buffer. So, by the time read_page is invoked, > info->max_bitflips > >>>> must already have the bit flip value. > >>>> > >>> > >>> All the other implementation has a slight different way to handle. > >>> From what you said the reset should > >>> be done on for NAND_CMD_READ0 command and should be sufficient. > >>> Technically should be moved > >>> in switch and not unconditionally. > >>> > >>> Michael > >>> > >> > >> max_bitflip is not being reset to 0 across page reads. > >> Once a page is read with higher bitflips all subsequent reads > >> are returning the same bitflip value even though they have none. > >> > >> This is causing problems like incorrectly > >> marking erase blocks bad by UBI and read failures. > >> > >> Tested it with both MTD reads and UBI attach. > >> This change is inline with other NFC drivers. > >> > >> Sample error log where a block is marked bad incorrectly: > >> > >> ubi0: fixable bit-flip detected at PEB 125 > >> ubi0: run torture test for PEB 125 > >> ubi0: fixable bit-flip detected at PEB 125 > >> ubi0 error: torture_peb: read problems on freshly erased PEB 125, > >> must be bad > >> ubi0 error: erase_worker: failed to erase PEB 125, error -5 > >> ubi0: mark PEB 125 as bad > >> > >> Signed-off-by: rminnikanti <rminnikanti@marvell.com> > >> --- > >> drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++ > >> 1 file changed, 5 insertions(+) > >> > >> diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c > b/drivers/mtd/nand/raw/pxa3xx_nand.c > >> index 1d9a6d107b..97f250483f 100644 > >> --- a/drivers/mtd/nand/raw/pxa3xx_nand.c > >> +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c > >> @@ -803,6 +803,11 @@ static void prepare_start_command(struct > pxa3xx_nand_info *info, int command) > >> > >> switch (command) { > >> case NAND_CMD_READ0: > >> + /* > >> + * Reset max_bitflips to zero. Once command is complete, > >> + * max_bitflips for this READ is returned in > ecc.read_page() > >> + */ > >> + info->max_bitflips = 0; > >> case NAND_CMD_READOOB: > >> case NAND_CMD_PAGEPROG: > >> if (!info->force_raw) > >> -- > >> 2.17.1 > >> > >> Thanks Michael. Fixed it. > >> > >>>> Thanks, Ravi. > >>>> > >>>>>>> switch (command) { > >>>>>>> case NAND_CMD_READ0: > >>>>>>> -- > >>>>>>> 2.17.1 > >>>>> > >>>> > >>> > >>> > > > > Acked-by: Michael Trimarchi <michael@amarulasolutions.com> > > > > > > >
Hi Can this be merged? Let me know if I missed any process. Thanks, Ravi On 5/21/24 13:15, Michael Nazzareno Trimarchi wrote: > Hi Dario > > Can you add to next pull? > > Michael > > On Tue, May 21, 2024, 4:31 PM Ravi Minnikanti <rminnikanti@marvell.com> > wrote: > >> Hi, >> >> Can you please merge this PR, if there are no more review comments? >> >> Thanks, >> Ravi >> On 5/6/24 11:28, Michael Nazzareno Trimarchi wrote: >> >>> ---------------------------------------------------------------------- >>> Hi Ravi >>> >>> On Mon, May 6, 2024 at 7:33 PM Ravi Minnikanti <rminnikanti@marvell.com> >> wrote: >>>> >>>> On 5/6/24 00:35, Michael Nazzareno Trimarchi wrote: >>>>> Hi Ravi >>>>> >>>>> On Tue, Apr 30, 2024 at 6:25 AM Ravi Minnikanti < >> rminnikanti@marvell.com> wrote: >>>>>> >>>>>> On 4/29/24 09:59, Michael Nazzareno Trimarchi wrote: >>>>>> >>>>>>> >> ---------------------------------------------------------------------- >>>>>>> On Mon, Apr 29, 2024 at 6:22 PM Chris Packham < >> judge.packham@gmail.com> wrote: >>>>>>>> >>>>>>>> On Sun, Apr 28, 2024 at 4:15 AM Ravi Minnikanti < >> rminnikanti@marvell.com> wrote: >>>>>>>>> >>>>>>>>> Once a page is read with higher bitflips all subsequent reads >>>>>>>>> are returning the same bitflip value even though they have none. >>>>>>>>> max_bitflip variable is not being reset to 0 across page reads. >>>>>>>>> >>>>>>>>> This is causing problems like incorrectly >>>>>>>>> marking erase blocks bad by UBI and causing read failures. >>>>>>>>> >>>>>>>>> Verified the change with both MTD reads and UBI. >>>>>>>>> This change is inline with other NFC drivers. >>>>>>>>> >>>>>>>>> Sample error log where a block is marked bad incorrectly: >>>>>>>>> >>>>>>>>> ubi0: fixable bit-flip detected at PEB 125 >>>>>>>>> ubi0: run torture test for PEB 125 >>>>>>>>> ubi0: fixable bit-flip detected at PEB 125 >>>>>>>>> ubi0 error: torture_peb: read problems on freshly erased PEB 125, >>>>>>>>> must be bad >>>>>>>>> ubi0 error: erase_worker: failed to erase PEB 125, error -5 >>>>>>>>> ubi0: mark PEB 125 as bad >>>>>>>>> >>>>>>>>> Signed-off-by: rminnikanti <rminnikanti@marvell.com> >>>>>>>> >>>>>>>> Looks good to me >>>>>>>> >>>>>>>> Reviewed-by: Chris Packham <judge.packham@gmail.com> >>>>>>>> >>>>>>>>> --- >>>>>>>>> drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++ >>>>>>>>> 1 file changed, 5 insertions(+) >>>>>>>>> >>>>>>>>> diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c >> b/drivers/mtd/nand/raw/pxa3xx_nand.c >>>>>>>>> index 1d9a6d107b..d2a4faad56 100644 >>>>>>>>> --- a/drivers/mtd/nand/raw/pxa3xx_nand.c >>>>>>>>> +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c >>>>>>>>> @@ -800,6 +800,11 @@ static void prepare_start_command(struct >> pxa3xx_nand_info *info, int command) >>>>>>>>> info->ecc_err_cnt = 0; >>>>>>>>> info->ndcb3 = 0; >>>>>>>>> info->need_wait = 0; >>>>>>>>> + /* >>>>>>>>> + * Reset max_bitflips to zero. Once command is complete, >>>>>>>>> + * max_bitflips for this READ is returned in >> ecc.read_page() >>>>>>>>> + */ >>>>>>>>> + info->max_bitflips = 0; >>>>>>>>> >>>>>>> >>>>>>> Why this should not be put to 0 in read_page instead on >> prepare_start_command? >>>>>>> >>>>>>> Michael >>>>>>> >>>>>> >>>>>> ecc.read_page is invoked after the read command execution. >>>>>> First chip->cmdfunc is executed with NAND_CMD_READ0 and then >> ecc.read_page is invoked >>>>>> to read the page from buffer. So, by the time read_page is invoked, >> info->max_bitflips >>>>>> must already have the bit flip value. >>>>>> >>>>> >>>>> All the other implementation has a slight different way to handle. >>>>> From what you said the reset should >>>>> be done on for NAND_CMD_READ0 command and should be sufficient. >>>>> Technically should be moved >>>>> in switch and not unconditionally. >>>>> >>>>> Michael >>>>> >>>> >>>> max_bitflip is not being reset to 0 across page reads. >>>> Once a page is read with higher bitflips all subsequent reads >>>> are returning the same bitflip value even though they have none. >>>> >>>> This is causing problems like incorrectly >>>> marking erase blocks bad by UBI and read failures. >>>> >>>> Tested it with both MTD reads and UBI attach. >>>> This change is inline with other NFC drivers. >>>> >>>> Sample error log where a block is marked bad incorrectly: >>>> >>>> ubi0: fixable bit-flip detected at PEB 125 >>>> ubi0: run torture test for PEB 125 >>>> ubi0: fixable bit-flip detected at PEB 125 >>>> ubi0 error: torture_peb: read problems on freshly erased PEB 125, >>>> must be bad >>>> ubi0 error: erase_worker: failed to erase PEB 125, error -5 >>>> ubi0: mark PEB 125 as bad >>>> >>>> Signed-off-by: rminnikanti <rminnikanti@marvell.com> >>>> --- >>>> drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++ >>>> 1 file changed, 5 insertions(+) >>>> >>>> diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c >> b/drivers/mtd/nand/raw/pxa3xx_nand.c >>>> index 1d9a6d107b..97f250483f 100644 >>>> --- a/drivers/mtd/nand/raw/pxa3xx_nand.c >>>> +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c >>>> @@ -803,6 +803,11 @@ static void prepare_start_command(struct >> pxa3xx_nand_info *info, int command) >>>> >>>> switch (command) { >>>> case NAND_CMD_READ0: >>>> + /* >>>> + * Reset max_bitflips to zero. Once command is complete, >>>> + * max_bitflips for this READ is returned in >> ecc.read_page() >>>> + */ >>>> + info->max_bitflips = 0; >>>> case NAND_CMD_READOOB: >>>> case NAND_CMD_PAGEPROG: >>>> if (!info->force_raw) >>>> -- >>>> 2.17.1 >>>> >>>> Thanks Michael. Fixed it. >>>> >>>>>> Thanks, Ravi. >>>>>> >>>>>>>>> switch (command) { >>>>>>>>> case NAND_CMD_READ0: >>>>>>>>> -- >>>>>>>>> 2.17.1 >>>>>>> >>>>>> >>>>> >>>>> >>> >>> Acked-by: Michael Trimarchi <michael@amarulasolutions.com> >>> >>> >>> >> >
diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c b/drivers/mtd/nand/raw/pxa3xx_nand.c index 1d9a6d107b..d2a4faad56 100644 --- a/drivers/mtd/nand/raw/pxa3xx_nand.c +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c @@ -800,6 +800,11 @@ static void prepare_start_command(struct pxa3xx_nand_info *info, int command) info->ecc_err_cnt = 0; info->ndcb3 = 0; info->need_wait = 0; + /* + * Reset max_bitflips to zero. Once command is complete, + * max_bitflips for this READ is returned in ecc.read_page() + */ + info->max_bitflips = 0; switch (command) { case NAND_CMD_READ0:
Once a page is read with higher bitflips all subsequent reads are returning the same bitflip value even though they have none. max_bitflip variable is not being reset to 0 across page reads. This is causing problems like incorrectly marking erase blocks bad by UBI and causing read failures. Verified the change with both MTD reads and UBI. This change is inline with other NFC drivers. Sample error log where a block is marked bad incorrectly: ubi0: fixable bit-flip detected at PEB 125 ubi0: run torture test for PEB 125 ubi0: fixable bit-flip detected at PEB 125 ubi0 error: torture_peb: read problems on freshly erased PEB 125, must be bad ubi0 error: erase_worker: failed to erase PEB 125, error -5 ubi0: mark PEB 125 as bad Signed-off-by: rminnikanti <rminnikanti@marvell.com> --- drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++ 1 file changed, 5 insertions(+)