diff mbox series

mtd: nand: pxa3xx: Incorrect bitflip return on page read

Message ID ea0422cd-a8e6-3c36-f551-a0142893301b@marvell.com
State Accepted
Commit aad8aa56d96b1305ae5a9708b604b2f2a4e97c4e
Delegated to: Dario Binacchi
Headers show
Series mtd: nand: pxa3xx: Incorrect bitflip return on page read | expand

Commit Message

Ravi Minnikanti April 27, 2024, 4:15 p.m. UTC
Once a page is read with higher bitflips all subsequent reads
are returning the same bitflip value even though they have none.
max_bitflip variable is not being reset to 0 across page reads.

This is causing problems like incorrectly
marking erase blocks bad by UBI and causing read failures.

Verified the change with both MTD reads and UBI.
This change is inline with other NFC drivers.

Sample error log where a block is marked bad incorrectly:

ubi0: fixable bit-flip detected at PEB 125
ubi0: run torture test for PEB 125
ubi0: fixable bit-flip detected at PEB 125
ubi0 error: torture_peb: read problems on freshly erased PEB 125,
must be bad
ubi0 error: erase_worker: failed to erase PEB 125, error -5
ubi0: mark PEB 125 as bad

Signed-off-by: rminnikanti <rminnikanti@marvell.com>
---
 drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Chris Packham April 29, 2024, 4:22 p.m. UTC | #1
On Sun, Apr 28, 2024 at 4:15 AM Ravi Minnikanti <rminnikanti@marvell.com> wrote:
>
> Once a page is read with higher bitflips all subsequent reads
> are returning the same bitflip value even though they have none.
> max_bitflip variable is not being reset to 0 across page reads.
>
> This is causing problems like incorrectly
> marking erase blocks bad by UBI and causing read failures.
>
> Verified the change with both MTD reads and UBI.
> This change is inline with other NFC drivers.
>
> Sample error log where a block is marked bad incorrectly:
>
> ubi0: fixable bit-flip detected at PEB 125
> ubi0: run torture test for PEB 125
> ubi0: fixable bit-flip detected at PEB 125
> ubi0 error: torture_peb: read problems on freshly erased PEB 125,
> must be bad
> ubi0 error: erase_worker: failed to erase PEB 125, error -5
> ubi0: mark PEB 125 as bad
>
> Signed-off-by: rminnikanti <rminnikanti@marvell.com>

Looks good to me

Reviewed-by: Chris Packham <judge.packham@gmail.com>

> ---
>  drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c b/drivers/mtd/nand/raw/pxa3xx_nand.c
> index 1d9a6d107b..d2a4faad56 100644
> --- a/drivers/mtd/nand/raw/pxa3xx_nand.c
> +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c
> @@ -800,6 +800,11 @@ static void prepare_start_command(struct pxa3xx_nand_info *info, int command)
>         info->ecc_err_cnt       = 0;
>         info->ndcb3             = 0;
>         info->need_wait         = 0;
> +       /*
> +        * Reset max_bitflips to zero. Once command is complete,
> +        * max_bitflips for this READ is returned in ecc.read_page()
> +        */
> +       info->max_bitflips      = 0;
>
>         switch (command) {
>         case NAND_CMD_READ0:
> --
> 2.17.1
Michael Nazzareno Trimarchi April 29, 2024, 4:59 p.m. UTC | #2
On Mon, Apr 29, 2024 at 6:22 PM Chris Packham <judge.packham@gmail.com> wrote:
>
> On Sun, Apr 28, 2024 at 4:15 AM Ravi Minnikanti <rminnikanti@marvell.com> wrote:
> >
> > Once a page is read with higher bitflips all subsequent reads
> > are returning the same bitflip value even though they have none.
> > max_bitflip variable is not being reset to 0 across page reads.
> >
> > This is causing problems like incorrectly
> > marking erase blocks bad by UBI and causing read failures.
> >
> > Verified the change with both MTD reads and UBI.
> > This change is inline with other NFC drivers.
> >
> > Sample error log where a block is marked bad incorrectly:
> >
> > ubi0: fixable bit-flip detected at PEB 125
> > ubi0: run torture test for PEB 125
> > ubi0: fixable bit-flip detected at PEB 125
> > ubi0 error: torture_peb: read problems on freshly erased PEB 125,
> > must be bad
> > ubi0 error: erase_worker: failed to erase PEB 125, error -5
> > ubi0: mark PEB 125 as bad
> >
> > Signed-off-by: rminnikanti <rminnikanti@marvell.com>
>
> Looks good to me
>
> Reviewed-by: Chris Packham <judge.packham@gmail.com>
>
> > ---
> >  drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c b/drivers/mtd/nand/raw/pxa3xx_nand.c
> > index 1d9a6d107b..d2a4faad56 100644
> > --- a/drivers/mtd/nand/raw/pxa3xx_nand.c
> > +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c
> > @@ -800,6 +800,11 @@ static void prepare_start_command(struct pxa3xx_nand_info *info, int command)
> >         info->ecc_err_cnt       = 0;
> >         info->ndcb3             = 0;
> >         info->need_wait         = 0;
> > +       /*
> > +        * Reset max_bitflips to zero. Once command is complete,
> > +        * max_bitflips for this READ is returned in ecc.read_page()
> > +        */
> > +       info->max_bitflips      = 0;
> >

Why this should not be put to 0 in read_page instead on prepare_start_command?

Michael

> >         switch (command) {
> >         case NAND_CMD_READ0:
> > --
> > 2.17.1
Ravi Minnikanti April 30, 2024, 4:25 a.m. UTC | #3
On 4/29/24 09:59, Michael Nazzareno Trimarchi wrote:

> ----------------------------------------------------------------------
> On Mon, Apr 29, 2024 at 6:22 PM Chris Packham <judge.packham@gmail.com> wrote:
>>
>> On Sun, Apr 28, 2024 at 4:15 AM Ravi Minnikanti <rminnikanti@marvell.com> wrote:
>>>
>>> Once a page is read with higher bitflips all subsequent reads
>>> are returning the same bitflip value even though they have none.
>>> max_bitflip variable is not being reset to 0 across page reads.
>>>
>>> This is causing problems like incorrectly
>>> marking erase blocks bad by UBI and causing read failures.
>>>
>>> Verified the change with both MTD reads and UBI.
>>> This change is inline with other NFC drivers.
>>>
>>> Sample error log where a block is marked bad incorrectly:
>>>
>>> ubi0: fixable bit-flip detected at PEB 125
>>> ubi0: run torture test for PEB 125
>>> ubi0: fixable bit-flip detected at PEB 125
>>> ubi0 error: torture_peb: read problems on freshly erased PEB 125,
>>> must be bad
>>> ubi0 error: erase_worker: failed to erase PEB 125, error -5
>>> ubi0: mark PEB 125 as bad
>>>
>>> Signed-off-by: rminnikanti <rminnikanti@marvell.com>
>>
>> Looks good to me
>>
>> Reviewed-by: Chris Packham <judge.packham@gmail.com>
>>
>>> ---
>>>  drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++
>>>  1 file changed, 5 insertions(+)
>>>
>>> diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c b/drivers/mtd/nand/raw/pxa3xx_nand.c
>>> index 1d9a6d107b..d2a4faad56 100644
>>> --- a/drivers/mtd/nand/raw/pxa3xx_nand.c
>>> +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c
>>> @@ -800,6 +800,11 @@ static void prepare_start_command(struct pxa3xx_nand_info *info, int command)
>>>         info->ecc_err_cnt       = 0;
>>>         info->ndcb3             = 0;
>>>         info->need_wait         = 0;
>>> +       /*
>>> +        * Reset max_bitflips to zero. Once command is complete,
>>> +        * max_bitflips for this READ is returned in ecc.read_page()
>>> +        */
>>> +       info->max_bitflips      = 0;
>>>
> 
> Why this should not be put to 0 in read_page instead on prepare_start_command?
> 
> Michael
>

ecc.read_page is invoked after the read command execution.
First chip->cmdfunc is executed with NAND_CMD_READ0 and then ecc.read_page is invoked
to read the page from buffer. So, by the time read_page is invoked, info->max_bitflips
must already have the bit flip value.

Thanks, Ravi.

>>>         switch (command) {
>>>         case NAND_CMD_READ0:
>>> --
>>> 2.17.1
>
Michael Nazzareno Trimarchi May 6, 2024, 7:35 a.m. UTC | #4
Hi Ravi

On Tue, Apr 30, 2024 at 6:25 AM Ravi Minnikanti <rminnikanti@marvell.com> wrote:
>
> On 4/29/24 09:59, Michael Nazzareno Trimarchi wrote:
>
> > ----------------------------------------------------------------------
> > On Mon, Apr 29, 2024 at 6:22 PM Chris Packham <judge.packham@gmail.com> wrote:
> >>
> >> On Sun, Apr 28, 2024 at 4:15 AM Ravi Minnikanti <rminnikanti@marvell.com> wrote:
> >>>
> >>> Once a page is read with higher bitflips all subsequent reads
> >>> are returning the same bitflip value even though they have none.
> >>> max_bitflip variable is not being reset to 0 across page reads.
> >>>
> >>> This is causing problems like incorrectly
> >>> marking erase blocks bad by UBI and causing read failures.
> >>>
> >>> Verified the change with both MTD reads and UBI.
> >>> This change is inline with other NFC drivers.
> >>>
> >>> Sample error log where a block is marked bad incorrectly:
> >>>
> >>> ubi0: fixable bit-flip detected at PEB 125
> >>> ubi0: run torture test for PEB 125
> >>> ubi0: fixable bit-flip detected at PEB 125
> >>> ubi0 error: torture_peb: read problems on freshly erased PEB 125,
> >>> must be bad
> >>> ubi0 error: erase_worker: failed to erase PEB 125, error -5
> >>> ubi0: mark PEB 125 as bad
> >>>
> >>> Signed-off-by: rminnikanti <rminnikanti@marvell.com>
> >>
> >> Looks good to me
> >>
> >> Reviewed-by: Chris Packham <judge.packham@gmail.com>
> >>
> >>> ---
> >>>  drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++
> >>>  1 file changed, 5 insertions(+)
> >>>
> >>> diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c b/drivers/mtd/nand/raw/pxa3xx_nand.c
> >>> index 1d9a6d107b..d2a4faad56 100644
> >>> --- a/drivers/mtd/nand/raw/pxa3xx_nand.c
> >>> +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c
> >>> @@ -800,6 +800,11 @@ static void prepare_start_command(struct pxa3xx_nand_info *info, int command)
> >>>         info->ecc_err_cnt       = 0;
> >>>         info->ndcb3             = 0;
> >>>         info->need_wait         = 0;
> >>> +       /*
> >>> +        * Reset max_bitflips to zero. Once command is complete,
> >>> +        * max_bitflips for this READ is returned in ecc.read_page()
> >>> +        */
> >>> +       info->max_bitflips      = 0;
> >>>
> >
> > Why this should not be put to 0 in read_page instead on prepare_start_command?
> >
> > Michael
> >
>
> ecc.read_page is invoked after the read command execution.
> First chip->cmdfunc is executed with NAND_CMD_READ0 and then ecc.read_page is invoked
> to read the page from buffer. So, by the time read_page is invoked, info->max_bitflips
> must already have the bit flip value.
>

All the other implementation has a slight different way to handle.
From what you said the reset should
be done on for NAND_CMD_READ0 command and should be sufficient.
Technically should be moved
in switch and not unconditionally.

Michael

> Thanks, Ravi.
>
> >>>         switch (command) {
> >>>         case NAND_CMD_READ0:
> >>> --
> >>> 2.17.1
> >
>
Ravi Minnikanti May 6, 2024, 5:32 p.m. UTC | #5
On 5/6/24 00:35, Michael Nazzareno Trimarchi wrote:
> Hi Ravi
> 
> On Tue, Apr 30, 2024 at 6:25 AM Ravi Minnikanti <rminnikanti@marvell.com> wrote:
>>
>> On 4/29/24 09:59, Michael Nazzareno Trimarchi wrote:
>>
>>> ----------------------------------------------------------------------
>>> On Mon, Apr 29, 2024 at 6:22 PM Chris Packham <judge.packham@gmail.com> wrote:
>>>>
>>>> On Sun, Apr 28, 2024 at 4:15 AM Ravi Minnikanti <rminnikanti@marvell.com> wrote:
>>>>>
>>>>> Once a page is read with higher bitflips all subsequent reads
>>>>> are returning the same bitflip value even though they have none.
>>>>> max_bitflip variable is not being reset to 0 across page reads.
>>>>>
>>>>> This is causing problems like incorrectly
>>>>> marking erase blocks bad by UBI and causing read failures.
>>>>>
>>>>> Verified the change with both MTD reads and UBI.
>>>>> This change is inline with other NFC drivers.
>>>>>
>>>>> Sample error log where a block is marked bad incorrectly:
>>>>>
>>>>> ubi0: fixable bit-flip detected at PEB 125
>>>>> ubi0: run torture test for PEB 125
>>>>> ubi0: fixable bit-flip detected at PEB 125
>>>>> ubi0 error: torture_peb: read problems on freshly erased PEB 125,
>>>>> must be bad
>>>>> ubi0 error: erase_worker: failed to erase PEB 125, error -5
>>>>> ubi0: mark PEB 125 as bad
>>>>>
>>>>> Signed-off-by: rminnikanti <rminnikanti@marvell.com>
>>>>
>>>> Looks good to me
>>>>
>>>> Reviewed-by: Chris Packham <judge.packham@gmail.com>
>>>>
>>>>> ---
>>>>>  drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++
>>>>>  1 file changed, 5 insertions(+)
>>>>>
>>>>> diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c b/drivers/mtd/nand/raw/pxa3xx_nand.c
>>>>> index 1d9a6d107b..d2a4faad56 100644
>>>>> --- a/drivers/mtd/nand/raw/pxa3xx_nand.c
>>>>> +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c
>>>>> @@ -800,6 +800,11 @@ static void prepare_start_command(struct pxa3xx_nand_info *info, int command)
>>>>>         info->ecc_err_cnt       = 0;
>>>>>         info->ndcb3             = 0;
>>>>>         info->need_wait         = 0;
>>>>> +       /*
>>>>> +        * Reset max_bitflips to zero. Once command is complete,
>>>>> +        * max_bitflips for this READ is returned in ecc.read_page()
>>>>> +        */
>>>>> +       info->max_bitflips      = 0;
>>>>>
>>>
>>> Why this should not be put to 0 in read_page instead on prepare_start_command?
>>>
>>> Michael
>>>
>>
>> ecc.read_page is invoked after the read command execution.
>> First chip->cmdfunc is executed with NAND_CMD_READ0 and then ecc.read_page is invoked
>> to read the page from buffer. So, by the time read_page is invoked, info->max_bitflips
>> must already have the bit flip value.
>>
> 
> All the other implementation has a slight different way to handle.
> From what you said the reset should
> be done on for NAND_CMD_READ0 command and should be sufficient.
> Technically should be moved
> in switch and not unconditionally.
> 
> Michael
> 

max_bitflip is not being reset to 0 across page reads.
Once a page is read with higher bitflips all subsequent reads
are returning the same bitflip value even though they have none.

This is causing problems like incorrectly
marking erase blocks bad by UBI and read failures.

Tested it with both MTD reads and UBI attach.
This change is inline with other NFC drivers.

Sample error log where a block is marked bad incorrectly:

ubi0: fixable bit-flip detected at PEB 125
ubi0: run torture test for PEB 125
ubi0: fixable bit-flip detected at PEB 125
ubi0 error: torture_peb: read problems on freshly erased PEB 125,
must be bad
ubi0 error: erase_worker: failed to erase PEB 125, error -5
ubi0: mark PEB 125 as bad

Signed-off-by: rminnikanti <rminnikanti@marvell.com>
---
 drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c b/drivers/mtd/nand/raw/pxa3xx_nand.c
index 1d9a6d107b..97f250483f 100644
--- a/drivers/mtd/nand/raw/pxa3xx_nand.c
+++ b/drivers/mtd/nand/raw/pxa3xx_nand.c
@@ -803,6 +803,11 @@ static void prepare_start_command(struct pxa3xx_nand_info *info, int command)
 
 	switch (command) {
 	case NAND_CMD_READ0:
+		/*
+		 * Reset max_bitflips to zero. Once command is complete,
+		 * max_bitflips for this READ is returned in ecc.read_page()
+		 */
+		info->max_bitflips = 0;
 	case NAND_CMD_READOOB:
 	case NAND_CMD_PAGEPROG:
 		if (!info->force_raw)
Michael Nazzareno Trimarchi May 6, 2024, 6:28 p.m. UTC | #6
Hi Ravi

On Mon, May 6, 2024 at 7:33 PM Ravi Minnikanti <rminnikanti@marvell.com> wrote:
>
> On 5/6/24 00:35, Michael Nazzareno Trimarchi wrote:
> > Hi Ravi
> >
> > On Tue, Apr 30, 2024 at 6:25 AM Ravi Minnikanti <rminnikanti@marvell.com> wrote:
> >>
> >> On 4/29/24 09:59, Michael Nazzareno Trimarchi wrote:
> >>
> >>> ----------------------------------------------------------------------
> >>> On Mon, Apr 29, 2024 at 6:22 PM Chris Packham <judge.packham@gmail.com> wrote:
> >>>>
> >>>> On Sun, Apr 28, 2024 at 4:15 AM Ravi Minnikanti <rminnikanti@marvell.com> wrote:
> >>>>>
> >>>>> Once a page is read with higher bitflips all subsequent reads
> >>>>> are returning the same bitflip value even though they have none.
> >>>>> max_bitflip variable is not being reset to 0 across page reads.
> >>>>>
> >>>>> This is causing problems like incorrectly
> >>>>> marking erase blocks bad by UBI and causing read failures.
> >>>>>
> >>>>> Verified the change with both MTD reads and UBI.
> >>>>> This change is inline with other NFC drivers.
> >>>>>
> >>>>> Sample error log where a block is marked bad incorrectly:
> >>>>>
> >>>>> ubi0: fixable bit-flip detected at PEB 125
> >>>>> ubi0: run torture test for PEB 125
> >>>>> ubi0: fixable bit-flip detected at PEB 125
> >>>>> ubi0 error: torture_peb: read problems on freshly erased PEB 125,
> >>>>> must be bad
> >>>>> ubi0 error: erase_worker: failed to erase PEB 125, error -5
> >>>>> ubi0: mark PEB 125 as bad
> >>>>>
> >>>>> Signed-off-by: rminnikanti <rminnikanti@marvell.com>
> >>>>
> >>>> Looks good to me
> >>>>
> >>>> Reviewed-by: Chris Packham <judge.packham@gmail.com>
> >>>>
> >>>>> ---
> >>>>>  drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++
> >>>>>  1 file changed, 5 insertions(+)
> >>>>>
> >>>>> diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c b/drivers/mtd/nand/raw/pxa3xx_nand.c
> >>>>> index 1d9a6d107b..d2a4faad56 100644
> >>>>> --- a/drivers/mtd/nand/raw/pxa3xx_nand.c
> >>>>> +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c
> >>>>> @@ -800,6 +800,11 @@ static void prepare_start_command(struct pxa3xx_nand_info *info, int command)
> >>>>>         info->ecc_err_cnt       = 0;
> >>>>>         info->ndcb3             = 0;
> >>>>>         info->need_wait         = 0;
> >>>>> +       /*
> >>>>> +        * Reset max_bitflips to zero. Once command is complete,
> >>>>> +        * max_bitflips for this READ is returned in ecc.read_page()
> >>>>> +        */
> >>>>> +       info->max_bitflips      = 0;
> >>>>>
> >>>
> >>> Why this should not be put to 0 in read_page instead on prepare_start_command?
> >>>
> >>> Michael
> >>>
> >>
> >> ecc.read_page is invoked after the read command execution.
> >> First chip->cmdfunc is executed with NAND_CMD_READ0 and then ecc.read_page is invoked
> >> to read the page from buffer. So, by the time read_page is invoked, info->max_bitflips
> >> must already have the bit flip value.
> >>
> >
> > All the other implementation has a slight different way to handle.
> > From what you said the reset should
> > be done on for NAND_CMD_READ0 command and should be sufficient.
> > Technically should be moved
> > in switch and not unconditionally.
> >
> > Michael
> >
>
> max_bitflip is not being reset to 0 across page reads.
> Once a page is read with higher bitflips all subsequent reads
> are returning the same bitflip value even though they have none.
>
> This is causing problems like incorrectly
> marking erase blocks bad by UBI and read failures.
>
> Tested it with both MTD reads and UBI attach.
> This change is inline with other NFC drivers.
>
> Sample error log where a block is marked bad incorrectly:
>
> ubi0: fixable bit-flip detected at PEB 125
> ubi0: run torture test for PEB 125
> ubi0: fixable bit-flip detected at PEB 125
> ubi0 error: torture_peb: read problems on freshly erased PEB 125,
> must be bad
> ubi0 error: erase_worker: failed to erase PEB 125, error -5
> ubi0: mark PEB 125 as bad
>
> Signed-off-by: rminnikanti <rminnikanti@marvell.com>
> ---
>  drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c b/drivers/mtd/nand/raw/pxa3xx_nand.c
> index 1d9a6d107b..97f250483f 100644
> --- a/drivers/mtd/nand/raw/pxa3xx_nand.c
> +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c
> @@ -803,6 +803,11 @@ static void prepare_start_command(struct pxa3xx_nand_info *info, int command)
>
>         switch (command) {
>         case NAND_CMD_READ0:
> +               /*
> +                * Reset max_bitflips to zero. Once command is complete,
> +                * max_bitflips for this READ is returned in ecc.read_page()
> +                */
> +               info->max_bitflips = 0;
>         case NAND_CMD_READOOB:
>         case NAND_CMD_PAGEPROG:
>                 if (!info->force_raw)
> --
> 2.17.1
>
> Thanks Michael. Fixed it.
>
> >> Thanks, Ravi.
> >>
> >>>>>         switch (command) {
> >>>>>         case NAND_CMD_READ0:
> >>>>> --
> >>>>> 2.17.1
> >>>
> >>
> >
> >

Acked-by: Michael Trimarchi <michael@amarulasolutions.com>
Ravi Minnikanti May 21, 2024, 2:30 p.m. UTC | #7
Hi,

Can you please merge this PR, if there are no more review comments?

Thanks,
Ravi
On 5/6/24 11:28, Michael Nazzareno Trimarchi wrote:

> ----------------------------------------------------------------------
> Hi Ravi
> 
> On Mon, May 6, 2024 at 7:33 PM Ravi Minnikanti <rminnikanti@marvell.com> wrote:
>>
>> On 5/6/24 00:35, Michael Nazzareno Trimarchi wrote:
>>> Hi Ravi
>>>
>>> On Tue, Apr 30, 2024 at 6:25 AM Ravi Minnikanti <rminnikanti@marvell.com> wrote:
>>>>
>>>> On 4/29/24 09:59, Michael Nazzareno Trimarchi wrote:
>>>>
>>>>> ----------------------------------------------------------------------
>>>>> On Mon, Apr 29, 2024 at 6:22 PM Chris Packham <judge.packham@gmail.com> wrote:
>>>>>>
>>>>>> On Sun, Apr 28, 2024 at 4:15 AM Ravi Minnikanti <rminnikanti@marvell.com> wrote:
>>>>>>>
>>>>>>> Once a page is read with higher bitflips all subsequent reads
>>>>>>> are returning the same bitflip value even though they have none.
>>>>>>> max_bitflip variable is not being reset to 0 across page reads.
>>>>>>>
>>>>>>> This is causing problems like incorrectly
>>>>>>> marking erase blocks bad by UBI and causing read failures.
>>>>>>>
>>>>>>> Verified the change with both MTD reads and UBI.
>>>>>>> This change is inline with other NFC drivers.
>>>>>>>
>>>>>>> Sample error log where a block is marked bad incorrectly:
>>>>>>>
>>>>>>> ubi0: fixable bit-flip detected at PEB 125
>>>>>>> ubi0: run torture test for PEB 125
>>>>>>> ubi0: fixable bit-flip detected at PEB 125
>>>>>>> ubi0 error: torture_peb: read problems on freshly erased PEB 125,
>>>>>>> must be bad
>>>>>>> ubi0 error: erase_worker: failed to erase PEB 125, error -5
>>>>>>> ubi0: mark PEB 125 as bad
>>>>>>>
>>>>>>> Signed-off-by: rminnikanti <rminnikanti@marvell.com>
>>>>>>
>>>>>> Looks good to me
>>>>>>
>>>>>> Reviewed-by: Chris Packham <judge.packham@gmail.com>
>>>>>>
>>>>>>> ---
>>>>>>>  drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++
>>>>>>>  1 file changed, 5 insertions(+)
>>>>>>>
>>>>>>> diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c b/drivers/mtd/nand/raw/pxa3xx_nand.c
>>>>>>> index 1d9a6d107b..d2a4faad56 100644
>>>>>>> --- a/drivers/mtd/nand/raw/pxa3xx_nand.c
>>>>>>> +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c
>>>>>>> @@ -800,6 +800,11 @@ static void prepare_start_command(struct pxa3xx_nand_info *info, int command)
>>>>>>>         info->ecc_err_cnt       = 0;
>>>>>>>         info->ndcb3             = 0;
>>>>>>>         info->need_wait         = 0;
>>>>>>> +       /*
>>>>>>> +        * Reset max_bitflips to zero. Once command is complete,
>>>>>>> +        * max_bitflips for this READ is returned in ecc.read_page()
>>>>>>> +        */
>>>>>>> +       info->max_bitflips      = 0;
>>>>>>>
>>>>>
>>>>> Why this should not be put to 0 in read_page instead on prepare_start_command?
>>>>>
>>>>> Michael
>>>>>
>>>>
>>>> ecc.read_page is invoked after the read command execution.
>>>> First chip->cmdfunc is executed with NAND_CMD_READ0 and then ecc.read_page is invoked
>>>> to read the page from buffer. So, by the time read_page is invoked, info->max_bitflips
>>>> must already have the bit flip value.
>>>>
>>>
>>> All the other implementation has a slight different way to handle.
>>> From what you said the reset should
>>> be done on for NAND_CMD_READ0 command and should be sufficient.
>>> Technically should be moved
>>> in switch and not unconditionally.
>>>
>>> Michael
>>>
>>
>> max_bitflip is not being reset to 0 across page reads.
>> Once a page is read with higher bitflips all subsequent reads
>> are returning the same bitflip value even though they have none.
>>
>> This is causing problems like incorrectly
>> marking erase blocks bad by UBI and read failures.
>>
>> Tested it with both MTD reads and UBI attach.
>> This change is inline with other NFC drivers.
>>
>> Sample error log where a block is marked bad incorrectly:
>>
>> ubi0: fixable bit-flip detected at PEB 125
>> ubi0: run torture test for PEB 125
>> ubi0: fixable bit-flip detected at PEB 125
>> ubi0 error: torture_peb: read problems on freshly erased PEB 125,
>> must be bad
>> ubi0 error: erase_worker: failed to erase PEB 125, error -5
>> ubi0: mark PEB 125 as bad
>>
>> Signed-off-by: rminnikanti <rminnikanti@marvell.com>
>> ---
>>  drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c b/drivers/mtd/nand/raw/pxa3xx_nand.c
>> index 1d9a6d107b..97f250483f 100644
>> --- a/drivers/mtd/nand/raw/pxa3xx_nand.c
>> +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c
>> @@ -803,6 +803,11 @@ static void prepare_start_command(struct pxa3xx_nand_info *info, int command)
>>
>>         switch (command) {
>>         case NAND_CMD_READ0:
>> +               /*
>> +                * Reset max_bitflips to zero. Once command is complete,
>> +                * max_bitflips for this READ is returned in ecc.read_page()
>> +                */
>> +               info->max_bitflips = 0;
>>         case NAND_CMD_READOOB:
>>         case NAND_CMD_PAGEPROG:
>>                 if (!info->force_raw)
>> --
>> 2.17.1
>>
>> Thanks Michael. Fixed it.
>>
>>>> Thanks, Ravi.
>>>>
>>>>>>>         switch (command) {
>>>>>>>         case NAND_CMD_READ0:
>>>>>>> --
>>>>>>> 2.17.1
>>>>>
>>>>
>>>
>>>
> 
> Acked-by: Michael Trimarchi <michael@amarulasolutions.com>
> 
> 
>
Michael Nazzareno Trimarchi May 21, 2024, 8:15 p.m. UTC | #8
Hi Dario

Can you add to next pull?

Michael

On Tue, May 21, 2024, 4:31 PM Ravi Minnikanti <rminnikanti@marvell.com>
wrote:

> Hi,
>
> Can you please merge this PR, if there are no more review comments?
>
> Thanks,
> Ravi
> On 5/6/24 11:28, Michael Nazzareno Trimarchi wrote:
>
> > ----------------------------------------------------------------------
> > Hi Ravi
> >
> > On Mon, May 6, 2024 at 7:33 PM Ravi Minnikanti <rminnikanti@marvell.com>
> wrote:
> >>
> >> On 5/6/24 00:35, Michael Nazzareno Trimarchi wrote:
> >>> Hi Ravi
> >>>
> >>> On Tue, Apr 30, 2024 at 6:25 AM Ravi Minnikanti <
> rminnikanti@marvell.com> wrote:
> >>>>
> >>>> On 4/29/24 09:59, Michael Nazzareno Trimarchi wrote:
> >>>>
> >>>>>
> ----------------------------------------------------------------------
> >>>>> On Mon, Apr 29, 2024 at 6:22 PM Chris Packham <
> judge.packham@gmail.com> wrote:
> >>>>>>
> >>>>>> On Sun, Apr 28, 2024 at 4:15 AM Ravi Minnikanti <
> rminnikanti@marvell.com> wrote:
> >>>>>>>
> >>>>>>> Once a page is read with higher bitflips all subsequent reads
> >>>>>>> are returning the same bitflip value even though they have none.
> >>>>>>> max_bitflip variable is not being reset to 0 across page reads.
> >>>>>>>
> >>>>>>> This is causing problems like incorrectly
> >>>>>>> marking erase blocks bad by UBI and causing read failures.
> >>>>>>>
> >>>>>>> Verified the change with both MTD reads and UBI.
> >>>>>>> This change is inline with other NFC drivers.
> >>>>>>>
> >>>>>>> Sample error log where a block is marked bad incorrectly:
> >>>>>>>
> >>>>>>> ubi0: fixable bit-flip detected at PEB 125
> >>>>>>> ubi0: run torture test for PEB 125
> >>>>>>> ubi0: fixable bit-flip detected at PEB 125
> >>>>>>> ubi0 error: torture_peb: read problems on freshly erased PEB 125,
> >>>>>>> must be bad
> >>>>>>> ubi0 error: erase_worker: failed to erase PEB 125, error -5
> >>>>>>> ubi0: mark PEB 125 as bad
> >>>>>>>
> >>>>>>> Signed-off-by: rminnikanti <rminnikanti@marvell.com>
> >>>>>>
> >>>>>> Looks good to me
> >>>>>>
> >>>>>> Reviewed-by: Chris Packham <judge.packham@gmail.com>
> >>>>>>
> >>>>>>> ---
> >>>>>>>  drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++
> >>>>>>>  1 file changed, 5 insertions(+)
> >>>>>>>
> >>>>>>> diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c
> b/drivers/mtd/nand/raw/pxa3xx_nand.c
> >>>>>>> index 1d9a6d107b..d2a4faad56 100644
> >>>>>>> --- a/drivers/mtd/nand/raw/pxa3xx_nand.c
> >>>>>>> +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c
> >>>>>>> @@ -800,6 +800,11 @@ static void prepare_start_command(struct
> pxa3xx_nand_info *info, int command)
> >>>>>>>         info->ecc_err_cnt       = 0;
> >>>>>>>         info->ndcb3             = 0;
> >>>>>>>         info->need_wait         = 0;
> >>>>>>> +       /*
> >>>>>>> +        * Reset max_bitflips to zero. Once command is complete,
> >>>>>>> +        * max_bitflips for this READ is returned in
> ecc.read_page()
> >>>>>>> +        */
> >>>>>>> +       info->max_bitflips      = 0;
> >>>>>>>
> >>>>>
> >>>>> Why this should not be put to 0 in read_page instead on
> prepare_start_command?
> >>>>>
> >>>>> Michael
> >>>>>
> >>>>
> >>>> ecc.read_page is invoked after the read command execution.
> >>>> First chip->cmdfunc is executed with NAND_CMD_READ0 and then
> ecc.read_page is invoked
> >>>> to read the page from buffer. So, by the time read_page is invoked,
> info->max_bitflips
> >>>> must already have the bit flip value.
> >>>>
> >>>
> >>> All the other implementation has a slight different way to handle.
> >>> From what you said the reset should
> >>> be done on for NAND_CMD_READ0 command and should be sufficient.
> >>> Technically should be moved
> >>> in switch and not unconditionally.
> >>>
> >>> Michael
> >>>
> >>
> >> max_bitflip is not being reset to 0 across page reads.
> >> Once a page is read with higher bitflips all subsequent reads
> >> are returning the same bitflip value even though they have none.
> >>
> >> This is causing problems like incorrectly
> >> marking erase blocks bad by UBI and read failures.
> >>
> >> Tested it with both MTD reads and UBI attach.
> >> This change is inline with other NFC drivers.
> >>
> >> Sample error log where a block is marked bad incorrectly:
> >>
> >> ubi0: fixable bit-flip detected at PEB 125
> >> ubi0: run torture test for PEB 125
> >> ubi0: fixable bit-flip detected at PEB 125
> >> ubi0 error: torture_peb: read problems on freshly erased PEB 125,
> >> must be bad
> >> ubi0 error: erase_worker: failed to erase PEB 125, error -5
> >> ubi0: mark PEB 125 as bad
> >>
> >> Signed-off-by: rminnikanti <rminnikanti@marvell.com>
> >> ---
> >>  drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++
> >>  1 file changed, 5 insertions(+)
> >>
> >> diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c
> b/drivers/mtd/nand/raw/pxa3xx_nand.c
> >> index 1d9a6d107b..97f250483f 100644
> >> --- a/drivers/mtd/nand/raw/pxa3xx_nand.c
> >> +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c
> >> @@ -803,6 +803,11 @@ static void prepare_start_command(struct
> pxa3xx_nand_info *info, int command)
> >>
> >>         switch (command) {
> >>         case NAND_CMD_READ0:
> >> +               /*
> >> +                * Reset max_bitflips to zero. Once command is complete,
> >> +                * max_bitflips for this READ is returned in
> ecc.read_page()
> >> +                */
> >> +               info->max_bitflips = 0;
> >>         case NAND_CMD_READOOB:
> >>         case NAND_CMD_PAGEPROG:
> >>                 if (!info->force_raw)
> >> --
> >> 2.17.1
> >>
> >> Thanks Michael. Fixed it.
> >>
> >>>> Thanks, Ravi.
> >>>>
> >>>>>>>         switch (command) {
> >>>>>>>         case NAND_CMD_READ0:
> >>>>>>> --
> >>>>>>> 2.17.1
> >>>>>
> >>>>
> >>>
> >>>
> >
> > Acked-by: Michael Trimarchi <michael@amarulasolutions.com>
> >
> >
> >
>
Ravi Minnikanti June 16, 2024, 10:24 a.m. UTC | #9
Hi

Can this be merged?
Let me know if I missed any process.

Thanks,
Ravi

On 5/21/24 13:15, Michael Nazzareno Trimarchi wrote:
> Hi Dario
> 
> Can you add to next pull?
> 
> Michael
> 
> On Tue, May 21, 2024, 4:31 PM Ravi Minnikanti <rminnikanti@marvell.com>
> wrote:
> 
>> Hi,
>>
>> Can you please merge this PR, if there are no more review comments?
>>
>> Thanks,
>> Ravi
>> On 5/6/24 11:28, Michael Nazzareno Trimarchi wrote:
>>
>>> ----------------------------------------------------------------------
>>> Hi Ravi
>>>
>>> On Mon, May 6, 2024 at 7:33 PM Ravi Minnikanti <rminnikanti@marvell.com>
>> wrote:
>>>>
>>>> On 5/6/24 00:35, Michael Nazzareno Trimarchi wrote:
>>>>> Hi Ravi
>>>>>
>>>>> On Tue, Apr 30, 2024 at 6:25 AM Ravi Minnikanti <
>> rminnikanti@marvell.com> wrote:
>>>>>>
>>>>>> On 4/29/24 09:59, Michael Nazzareno Trimarchi wrote:
>>>>>>
>>>>>>>
>> ----------------------------------------------------------------------
>>>>>>> On Mon, Apr 29, 2024 at 6:22 PM Chris Packham <
>> judge.packham@gmail.com> wrote:
>>>>>>>>
>>>>>>>> On Sun, Apr 28, 2024 at 4:15 AM Ravi Minnikanti <
>> rminnikanti@marvell.com> wrote:
>>>>>>>>>
>>>>>>>>> Once a page is read with higher bitflips all subsequent reads
>>>>>>>>> are returning the same bitflip value even though they have none.
>>>>>>>>> max_bitflip variable is not being reset to 0 across page reads.
>>>>>>>>>
>>>>>>>>> This is causing problems like incorrectly
>>>>>>>>> marking erase blocks bad by UBI and causing read failures.
>>>>>>>>>
>>>>>>>>> Verified the change with both MTD reads and UBI.
>>>>>>>>> This change is inline with other NFC drivers.
>>>>>>>>>
>>>>>>>>> Sample error log where a block is marked bad incorrectly:
>>>>>>>>>
>>>>>>>>> ubi0: fixable bit-flip detected at PEB 125
>>>>>>>>> ubi0: run torture test for PEB 125
>>>>>>>>> ubi0: fixable bit-flip detected at PEB 125
>>>>>>>>> ubi0 error: torture_peb: read problems on freshly erased PEB 125,
>>>>>>>>> must be bad
>>>>>>>>> ubi0 error: erase_worker: failed to erase PEB 125, error -5
>>>>>>>>> ubi0: mark PEB 125 as bad
>>>>>>>>>
>>>>>>>>> Signed-off-by: rminnikanti <rminnikanti@marvell.com>
>>>>>>>>
>>>>>>>> Looks good to me
>>>>>>>>
>>>>>>>> Reviewed-by: Chris Packham <judge.packham@gmail.com>
>>>>>>>>
>>>>>>>>> ---
>>>>>>>>>  drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++
>>>>>>>>>  1 file changed, 5 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c
>> b/drivers/mtd/nand/raw/pxa3xx_nand.c
>>>>>>>>> index 1d9a6d107b..d2a4faad56 100644
>>>>>>>>> --- a/drivers/mtd/nand/raw/pxa3xx_nand.c
>>>>>>>>> +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c
>>>>>>>>> @@ -800,6 +800,11 @@ static void prepare_start_command(struct
>> pxa3xx_nand_info *info, int command)
>>>>>>>>>         info->ecc_err_cnt       = 0;
>>>>>>>>>         info->ndcb3             = 0;
>>>>>>>>>         info->need_wait         = 0;
>>>>>>>>> +       /*
>>>>>>>>> +        * Reset max_bitflips to zero. Once command is complete,
>>>>>>>>> +        * max_bitflips for this READ is returned in
>> ecc.read_page()
>>>>>>>>> +        */
>>>>>>>>> +       info->max_bitflips      = 0;
>>>>>>>>>
>>>>>>>
>>>>>>> Why this should not be put to 0 in read_page instead on
>> prepare_start_command?
>>>>>>>
>>>>>>> Michael
>>>>>>>
>>>>>>
>>>>>> ecc.read_page is invoked after the read command execution.
>>>>>> First chip->cmdfunc is executed with NAND_CMD_READ0 and then
>> ecc.read_page is invoked
>>>>>> to read the page from buffer. So, by the time read_page is invoked,
>> info->max_bitflips
>>>>>> must already have the bit flip value.
>>>>>>
>>>>>
>>>>> All the other implementation has a slight different way to handle.
>>>>> From what you said the reset should
>>>>> be done on for NAND_CMD_READ0 command and should be sufficient.
>>>>> Technically should be moved
>>>>> in switch and not unconditionally.
>>>>>
>>>>> Michael
>>>>>
>>>>
>>>> max_bitflip is not being reset to 0 across page reads.
>>>> Once a page is read with higher bitflips all subsequent reads
>>>> are returning the same bitflip value even though they have none.
>>>>
>>>> This is causing problems like incorrectly
>>>> marking erase blocks bad by UBI and read failures.
>>>>
>>>> Tested it with both MTD reads and UBI attach.
>>>> This change is inline with other NFC drivers.
>>>>
>>>> Sample error log where a block is marked bad incorrectly:
>>>>
>>>> ubi0: fixable bit-flip detected at PEB 125
>>>> ubi0: run torture test for PEB 125
>>>> ubi0: fixable bit-flip detected at PEB 125
>>>> ubi0 error: torture_peb: read problems on freshly erased PEB 125,
>>>> must be bad
>>>> ubi0 error: erase_worker: failed to erase PEB 125, error -5
>>>> ubi0: mark PEB 125 as bad
>>>>
>>>> Signed-off-by: rminnikanti <rminnikanti@marvell.com>
>>>> ---
>>>>  drivers/mtd/nand/raw/pxa3xx_nand.c | 5 +++++
>>>>  1 file changed, 5 insertions(+)
>>>>
>>>> diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c
>> b/drivers/mtd/nand/raw/pxa3xx_nand.c
>>>> index 1d9a6d107b..97f250483f 100644
>>>> --- a/drivers/mtd/nand/raw/pxa3xx_nand.c
>>>> +++ b/drivers/mtd/nand/raw/pxa3xx_nand.c
>>>> @@ -803,6 +803,11 @@ static void prepare_start_command(struct
>> pxa3xx_nand_info *info, int command)
>>>>
>>>>         switch (command) {
>>>>         case NAND_CMD_READ0:
>>>> +               /*
>>>> +                * Reset max_bitflips to zero. Once command is complete,
>>>> +                * max_bitflips for this READ is returned in
>> ecc.read_page()
>>>> +                */
>>>> +               info->max_bitflips = 0;
>>>>         case NAND_CMD_READOOB:
>>>>         case NAND_CMD_PAGEPROG:
>>>>                 if (!info->force_raw)
>>>> --
>>>> 2.17.1
>>>>
>>>> Thanks Michael. Fixed it.
>>>>
>>>>>> Thanks, Ravi.
>>>>>>
>>>>>>>>>         switch (command) {
>>>>>>>>>         case NAND_CMD_READ0:
>>>>>>>>> --
>>>>>>>>> 2.17.1
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>
>>> Acked-by: Michael Trimarchi <michael@amarulasolutions.com>
>>>
>>>
>>>
>>
>
diff mbox series

Patch

diff --git a/drivers/mtd/nand/raw/pxa3xx_nand.c b/drivers/mtd/nand/raw/pxa3xx_nand.c
index 1d9a6d107b..d2a4faad56 100644
--- a/drivers/mtd/nand/raw/pxa3xx_nand.c
+++ b/drivers/mtd/nand/raw/pxa3xx_nand.c
@@ -800,6 +800,11 @@  static void prepare_start_command(struct pxa3xx_nand_info *info, int command)
 	info->ecc_err_cnt	= 0;
 	info->ndcb3		= 0;
 	info->need_wait		= 0;
+	/*
+	 * Reset max_bitflips to zero. Once command is complete,
+	 * max_bitflips for this READ is returned in ecc.read_page()
+	 */
+	info->max_bitflips	= 0;
 
 	switch (command) {
 	case NAND_CMD_READ0: