[2/3] mtd: rawnand: arasan: Ensure program page operations are successful

Message ID 20230717194221.229778-2-miquel.raynal@bootlin.com
State Accepted
Series [1/3] mtd: rawnand: marvell: Ensure program page operations are successful

Commit Message

Miquel Raynal July 17, 2023, 7:42 p.m. UTC
The NAND core complies with the ONFI specification, which itself
mentions that after any program or erase operation, a status check
should be performed to see whether the operation was finished *and*
successful.

The NAND core offers helpers to finish a page write (sending the
"PAGE PROG" command, waiting for the NAND chip to be ready again, and
checking the operation status). But in some cases, advanced controller
drivers might want to optimize this and craft their own page write
helper to leverage additional hardware capabilities, thus not always
using the core facilities.

Some drivers, like this one, do not use the core helper to finish a page
write because the final cycles are automatically managed by the
hardware. In this case, additional care must be taken to manually
perform the final status check.

Let's read the NAND chip status at the end of the page write helper and
return -EIO upon error.

Cc: Michal Simek <michal.simek@amd.com>
Cc: stable@vger.kernel.org
Fixes: 88ffef1b65cf ("mtd: rawnand: arasan: Support the hardware BCH ECC engine")
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>

---

Hello Michal,

I have not tested this, but based on a report on another driver, I
believe the status check is also missing here and could sometimes
lead to unnoticed partial writes.

Please test on your side that everything still works and let me
know how it goes.

Thanks a lot.
Miquèl
---
 drivers/mtd/nand/raw/arasan-nand-controller.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)
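
The check described in the commit message can be illustrated outside the
kernel. The following is a minimal userspace sketch, not the driver code:
`struct mock_chip`, `mock_status_op()` and `finish_page_write()` are
hypothetical stand-ins (the actual patch calls `nand_status_op()` on a
`struct nand_chip`), and `NAND_STATUS_FAIL` is redefined locally as bit 0 of
the status byte.

```c
#include <errno.h>
#include <stdint.h>

/* Local copy for this sketch: bit 0 of the NAND status byte means "fail". */
#define NAND_STATUS_FAIL 0x01

/* Hypothetical stand-in for a NAND chip, for illustration only. */
struct mock_chip {
	int bus_error;   /* non-zero: the status read itself fails */
	uint8_t status;  /* byte the chip would return to a status read */
};

/* Hypothetical stand-in for the kernel's nand_status_op(). */
int mock_status_op(const struct mock_chip *chip, uint8_t *status)
{
	if (chip->bus_error)
		return -ETIMEDOUT;

	*status = chip->status;
	return 0;
}

/*
 * Final step of a page write when the hardware has already issued the
 * last program cycles: read the chip status and turn a set fail bit
 * into -EIO instead of silently reporting success.
 */
int finish_page_write(const struct mock_chip *chip)
{
	uint8_t status;
	int ret;

	/* A failure to read the status is propagated unchanged. */
	ret = mock_status_op(chip, &status);
	if (ret)
		return ret;

	/* The chip answered, but reported a failed program operation. */
	if (status & NAND_STATUS_FAIL)
		return -EIO;

	return 0;
}
```

With these definitions, a chip reporting status 0xE1 (fail bit set) makes
`finish_page_write()` return -EIO, a status of 0xE0 returns 0, and an error
from the status read itself is returned as is.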

Comments

Miquel Raynal Sept. 11, 2023, 3:52 p.m. UTC | #1
Hi Michal,

miquel.raynal@bootlin.com wrote on Mon, 17 Jul 2023 21:42:20 +0200:

> The NAND core complies with the ONFI specification, which itself
> mentions that after any program or erase operation, a status check
> should be performed to see whether the operation was finished *and*
> successful.
> 
> The NAND core offers helpers to finish a page write (sending the
> "PAGE PROG" command, waiting for the NAND chip to be ready again, and
> checking the operation status). But in some cases, advanced controller
> drivers might want to optimize this and craft their own page write
> helper to leverage additional hardware capabilities, thus not always
> using the core facilities.
> 
> Some drivers, like this one, do not use the core helper to finish a page
> write because the final cycles are automatically managed by the
> hardware. In this case, the additional care must be taken to manually
> perform the final status check.
> 
> Let's read the NAND chip status at the end of the page write helper and
> return -EIO upon error.
>
> Cc: Michal Simek <michal.simek@amd.com>
> Cc: stable@vger.kernel.org
> Fixes: 88ffef1b65cf ("mtd: rawnand: arasan: Support the hardware BCH ECC engine")
> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> 
> ---
> 
> Hello Michal,
> 
> I have not tested this, but based on a report on another driver, I
> believe the status check is also missing here and could sometimes
> lead to unnoticed partial writes.
> 
> Please test on your side that everything still works and let me
> know how it goes.

Any news from the testing team about patches 2/3 and 3/3?

Thanks,
Miquèl
Michal Simek Sept. 12, 2023, 1:55 p.m. UTC | #2
Hi Miquel,

On 9/11/23 17:52, Miquel Raynal wrote:
> Hi Michal,
> 
> miquel.raynal@bootlin.com wrote on Mon, 17 Jul 2023 21:42:20 +0200:
> 
>> The NAND core complies with the ONFI specification, which itself
>> mentions that after any program or erase operation, a status check
>> should be performed to see whether the operation was finished *and*
>> successful.
>>
>> The NAND core offers helpers to finish a page write (sending the
>> "PAGE PROG" command, waiting for the NAND chip to be ready again, and
>> checking the operation status). But in some cases, advanced controller
>> drivers might want to optimize this and craft their own page write
>> helper to leverage additional hardware capabilities, thus not always
>> using the core facilities.
>>
>> Some drivers, like this one, do not use the core helper to finish a page
>> write because the final cycles are automatically managed by the
>> hardware. In this case, the additional care must be taken to manually
>> perform the final status check.
>>
>> Let's read the NAND chip status at the end of the page write helper and
>> return -EIO upon error.
>>
>> Cc: Michal Simek <michal.simek@amd.com>
>> Cc: stable@vger.kernel.org
>> Fixes: 88ffef1b65cf ("mtd: rawnand: arasan: Support the hardware BCH ECC engine")
>> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
>>
>> ---
>>
>> Hello Michal,
>>
>> I have not tested this, but based on a report on another driver, I
>> believe the status check is also missing here and could sometimes
>> lead to unnoticed partial writes.
>>
>> Please test on your side that everything still works and let me
>> know how it goes.
> 
> Any news from the testing team about patches 2/3 and 3/3?

I asked Amit to test and he didn't get back to me, even though I asked a couple of
times.
Can you please tell me how to test it? I will set up the hardware myself, test it, and
get back to you.

Thanks,
Michal
Miquel Raynal Sept. 12, 2023, 2:17 p.m. UTC | #3
Hi Michal,

michal.simek@amd.com wrote on Tue, 12 Sep 2023 15:55:23 +0200:

> Hi Miquel,
> 
> On 9/11/23 17:52, Miquel Raynal wrote:
> > Hi Michal,
> > 
> > miquel.raynal@bootlin.com wrote on Mon, 17 Jul 2023 21:42:20 +0200:
> >   
> >> The NAND core complies with the ONFI specification, which itself
> >> mentions that after any program or erase operation, a status check
> >> should be performed to see whether the operation was finished *and*
> >> successful.
> >>
> >> The NAND core offers helpers to finish a page write (sending the
> >> "PAGE PROG" command, waiting for the NAND chip to be ready again, and
> >> checking the operation status). But in some cases, advanced controller
> >> drivers might want to optimize this and craft their own page write
> >> helper to leverage additional hardware capabilities, thus not always
> >> using the core facilities.
> >>
> >> Some drivers, like this one, do not use the core helper to finish a page
> >> write because the final cycles are automatically managed by the
> >> hardware. In this case, the additional care must be taken to manually
> >> perform the final status check.
> >>
> >> Let's read the NAND chip status at the end of the page write helper and
> >> return -EIO upon error.
> >>
> >> Cc: Michal Simek <michal.simek@amd.com>
> >> Cc: stable@vger.kernel.org
> >> Fixes: 88ffef1b65cf ("mtd: rawnand: arasan: Support the hardware BCH ECC engine")
> >> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> >>
> >> ---
> >>
> >> Hello Michal,
> >>
> >> I have not tested this, but based on a report on another driver, I
> >> believe the status check is also missing here and could sometimes
> >> lead to unnoticed partial writes.
> >>
> >> Please test on your side that everything still works and let me
> >> know how it goes.  
> > 
> > Any news from the testing team about patches 2/3 and 3/3?  
> 
> I asked Amit to test and he didn't get back to me even I asked for it couple of times.

Ok.

> Can you please tell me how to test it? I will setup HW myself and test it and get back to you.

I believe it would be enough to set up the board to use the hardware
BCH engine, run a basic erase/write/read test with a known file, and
check that it still behaves correctly. You can also run

	nandbiterrs -i /dev/mtdx

as a second step and verify there is no difference with and without the
patch, and finally check the performance impact:

	flash_speed -d -c 10 /dev/mtdx
	(be careful: this is a destructive operation)

Thanks,
Miquèl
Michal Simek Sept. 20, 2023, 7:55 a.m. UTC | #4
Hi Miquel,

On 9/12/23 16:17, Miquel Raynal wrote:
> Hi Michal,
> 
> michal.simek@amd.com wrote on Tue, 12 Sep 2023 15:55:23 +0200:
> 
>> Hi Miquel,
>>
>> On 9/11/23 17:52, Miquel Raynal wrote:
>>> Hi Michal,
>>>
>>> miquel.raynal@bootlin.com wrote on Mon, 17 Jul 2023 21:42:20 +0200:
>>>    
>>>> The NAND core complies with the ONFI specification, which itself
>>>> mentions that after any program or erase operation, a status check
>>>> should be performed to see whether the operation was finished *and*
>>>> successful.
>>>>
>>>> The NAND core offers helpers to finish a page write (sending the
>>>> "PAGE PROG" command, waiting for the NAND chip to be ready again, and
>>>> checking the operation status). But in some cases, advanced controller
>>>> drivers might want to optimize this and craft their own page write
>>>> helper to leverage additional hardware capabilities, thus not always
>>>> using the core facilities.
>>>>
>>>> Some drivers, like this one, do not use the core helper to finish a page
>>>> write because the final cycles are automatically managed by the
>>>> hardware. In this case, the additional care must be taken to manually
>>>> perform the final status check.
>>>>
>>>> Let's read the NAND chip status at the end of the page write helper and
>>>> return -EIO upon error.
>>>>
>>>> Cc: Michal Simek <michal.simek@amd.com>
>>>> Cc: stable@vger.kernel.org
>>>> Fixes: 88ffef1b65cf ("mtd: rawnand: arasan: Support the hardware BCH ECC engine")
>>>> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
>>>>
>>>> ---
>>>>
>>>> Hello Michal,
>>>>
>>>> I have not tested this, but based on a report on another driver, I
>>>> believe the status check is also missing here and could sometimes
>>>> lead to unnoticed partial writes.
>>>>
>>>> Please test on your side that everything still works and let me
>>>> know how it goes.
>>>
>>> Any news from the testing team about patches 2/3 and 3/3?
>>
>> I asked Amit to test and he didn't get back to me even I asked for it couple of times.
> 
> Ok.
> 
>> Can you please tell me how to test it? I will setup HW myself and test it and get back to you.
> 
> I believe setting up the board to use the hardware BCH engine and
> performing basic erase/write/read testing with a known file and check
> it still behaves correctly would work. You can also run
> 
> 	nandbiterrs -i /dev/mtdx
> 
> as a second step and verify there is no difference with and without the
> patch and finally check the impact:
> 
> 	flash_speed -d -c 10 /dev/mtdx
> 	(be careful: this is a destructive operation)

I ran this myself.

pl353 test log before the patch.

# cat /proc/mtd
dev:    size   erasesize  name
mtd0: 10000000 00020000 "pl35x-nand-controller"
# nandbiterrs -i /dev/mtd0
incremental biterrors test
Successfully corrected 0 bit errors per subpage
Inserted biterror @ 0/5
Read reported 1 corrected bit errors
Successfully corrected 1 bit errors per subpage
Inserted biterror @ 0/2
Failed to recover 1 bitflips
Read error after 2 bit errors per page
# flash_speed -d -c 10 /dev/mtd0
scanning for bad eraseblocks
scanned 10 eraseblocks, 0 are bad
testing eraseblock write speed
eraseblock write speed is 4555 KiB/s
testing eraseblock read speed
eraseblock read speed is 5765 KiB/s
testing page write speed
page write speed is 4383 KiB/s
testing page read speed
page read speed is 5614 KiB/s
testing 2 page write speed
2 page write speed is 4444 KiB/s
testing 2 page read speed
2 page read speed is 5688 KiB/s
Testing erase speed
erase speed is 320000 KiB/s
Testing 2x multi-block erase speed
2x multi-block erase speed is 320000 KiB/s
Testing 4x multi-block erase speed
4x multi-block erase speed is 320000 KiB/s
Testing 8x multi-block erase speed
8x multi-block erase speed is 320000 KiB/s
Testing 16x multi-block erase speed
16x multi-block erase speed is 320000 KiB/s
Testing 32x multi-block erase speed
32x multi-block erase speed is 320000 KiB/s
Testing 64x multi-block erase speed
64x multi-block erase speed is 320000 KiB/s
finished
# dmesg | grep nand
[    2.876719] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xda
[    2.883130] nand: Micron MT29F2G08ABAEAWP
[    2.887230] nand: 256 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
#


With the patch applied:

# cat /proc/mtd
dev:    size   erasesize  name
mtd0: 10000000 00020000 "pl35x-nand-controller"
# nandbiterrs -i /dev/mtd0
incremental biterrors test
Successfully corrected 0 bit errors per subpage
Inserted biterror @ 0/5
Read reported 1 corrected bit errors
Successfully corrected 1 bit errors per subpage
Inserted biterror @ 0/2
Failed to recover 1 bitflips
Read error after 2 bit errors per page
# flash_speed -d -c 10 /dev/mtd0
scanning for bad eraseblocks
scanned 10 eraseblocks, 0 are bad
testing eraseblock write speed
eraseblock write speed is 4522 KiB/s
testing eraseblock read speed
eraseblock read speed is 5765 KiB/s
testing page write speed
page write speed is 4383 KiB/s
testing page read speed
page read speed is 5638 KiB/s
testing 2 page write speed
2 page write speed is 4444 KiB/s
testing 2 page read speed
2 page read speed is 5714 KiB/s
Testing erase speed
erase speed is 320000 KiB/s
Testing 2x multi-block erase speed
2x multi-block erase speed is 320000 KiB/s
Testing 4x multi-block erase speed
4x multi-block erase speed is 320000 KiB/s
Testing 8x multi-block erase speed
8x multi-block erase speed is 320000 KiB/s
Testing 16x multi-block erase speed
16x multi-block erase speed is 320000 KiB/s
Testing 32x multi-block erase speed
32x multi-block erase speed is 320000 KiB/s
Testing 64x multi-block erase speed
64x multi-block erase speed is 320000 KiB/s
finished
# dmesg | grep nand
[    2.896206] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xda
[    2.902648] nand: Micron MT29F2G08ABAEAWP
[    2.906667] nand: 256 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64

Behavior is the same. Speed varies slightly from run to run.


I don't have a ZynqMP board here, but I will try to get data ASAP.


Thanks,
Michal
Michal Simek Sept. 21, 2023, 10:25 a.m. UTC | #5
On 9/12/23 16:17, Miquel Raynal wrote:
> Hi Michal,
> 
> michal.simek@amd.com wrote on Tue, 12 Sep 2023 15:55:23 +0200:
> 
>> Hi Miquel,
>>
>> On 9/11/23 17:52, Miquel Raynal wrote:
>>> Hi Michal,
>>>
>>> miquel.raynal@bootlin.com wrote on Mon, 17 Jul 2023 21:42:20 +0200:
>>>    
>>>> The NAND core complies with the ONFI specification, which itself
>>>> mentions that after any program or erase operation, a status check
>>>> should be performed to see whether the operation was finished *and*
>>>> successful.
>>>>
>>>> The NAND core offers helpers to finish a page write (sending the
>>>> "PAGE PROG" command, waiting for the NAND chip to be ready again, and
>>>> checking the operation status). But in some cases, advanced controller
>>>> drivers might want to optimize this and craft their own page write
>>>> helper to leverage additional hardware capabilities, thus not always
>>>> using the core facilities.
>>>>
>>>> Some drivers, like this one, do not use the core helper to finish a page
>>>> write because the final cycles are automatically managed by the
>>>> hardware. In this case, the additional care must be taken to manually
>>>> perform the final status check.
>>>>
>>>> Let's read the NAND chip status at the end of the page write helper and
>>>> return -EIO upon error.
>>>>
>>>> Cc: Michal Simek <michal.simek@amd.com>
>>>> Cc: stable@vger.kernel.org
>>>> Fixes: 88ffef1b65cf ("mtd: rawnand: arasan: Support the hardware BCH ECC engine")
>>>> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
>>>>
>>>> ---
>>>>
>>>> Hello Michal,
>>>>
>>>> I have not tested this, but based on a report on another driver, I
>>>> believe the status check is also missing here and could sometimes
>>>> lead to unnoticed partial writes.
>>>>
>>>> Please test on your side that everything still works and let me
>>>> know how it goes.
>>>
>>> Any news from the testing team about patches 2/3 and 3/3?
>>
>> I asked Amit to test and he didn't get back to me even I asked for it couple of times.
> 
> Ok.
> 
>> Can you please tell me how to test it? I will setup HW myself and test it and get back to you.
> 
> I believe setting up the board to use the hardware BCH engine and
> performing basic erase/write/read testing with a known file and check
> it still behaves correctly would work. You can also run
> 
> 	nandbiterrs -i /dev/mtdx
> 
> as a second step and verify there is no difference with and without the
> patch and finally check the impact:
> 
> 	flash_speed -d -c 10 /dev/mtdx
> 	(be careful: this is a destructive operation)

The testing team didn't see any issue, so feel free to add my
Acked-by: Michal Simek <michal.simek@amd.com>

Thanks,
Michal
Miquel Raynal Sept. 22, 2023, 9:14 a.m. UTC | #6
Hi Michal,

michal.simek@amd.com wrote on Thu, 21 Sep 2023 12:25:10 +0200:

> On 9/12/23 16:17, Miquel Raynal wrote:
> > Hi Michal,
> > 
> > michal.simek@amd.com wrote on Tue, 12 Sep 2023 15:55:23 +0200:
> >   
> >> Hi Miquel,
> >>
> >> On 9/11/23 17:52, Miquel Raynal wrote:  
> >>> Hi Michal,
> >>>
> >>> miquel.raynal@bootlin.com wrote on Mon, 17 Jul 2023 21:42:20 +0200:  
> >>>
> >>>> The NAND core complies with the ONFI specification, which itself
> >>>> mentions that after any program or erase operation, a status check
> >>>> should be performed to see whether the operation was finished *and*
> >>>> successful.
> >>>>
> >>>> The NAND core offers helpers to finish a page write (sending the
> >>>> "PAGE PROG" command, waiting for the NAND chip to be ready again, and
> >>>> checking the operation status). But in some cases, advanced controller
> >>>> drivers might want to optimize this and craft their own page write
> >>>> helper to leverage additional hardware capabilities, thus not always
> >>>> using the core facilities.
> >>>>
> >>>> Some drivers, like this one, do not use the core helper to finish a page
> >>>> write because the final cycles are automatically managed by the
> >>>> hardware. In this case, the additional care must be taken to manually
> >>>> perform the final status check.
> >>>>
> >>>> Let's read the NAND chip status at the end of the page write helper and
> >>>> return -EIO upon error.
> >>>>
> >>>> Cc: Michal Simek <michal.simek@amd.com>
> >>>> Cc: stable@vger.kernel.org
> >>>> Fixes: 88ffef1b65cf ("mtd: rawnand: arasan: Support the hardware BCH ECC engine")
> >>>> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> >>>>
> >>>> ---
> >>>>
> >>>> Hello Michal,
> >>>>
> >>>> I have not tested this, but based on a report on another driver, I
> >>>> believe the status check is also missing here and could sometimes
> >>>> lead to unnoticed partial writes.
> >>>>
> >>>> Please test on your side that everything still works and let me
> >>>> know how it goes.  
> >>>
> >>> Any news from the testing team about patches 2/3 and 3/3?  
> >>
> >> I asked Amit to test and he didn't get back to me even I asked for it couple of times.  
> > 
> > Ok.
> >   
> >> Can you please tell me how to test it? I will setup HW myself and test it and get back to you.  
> > 
> > I believe setting up the board to use the hardware BCH engine and
> > performing basic erase/write/read testing with a known file and check
> > it still behaves correctly would work. You can also run
> > 
> > 	nandbiterrs -i /dev/mtdx
> > 
> > as a second step and verify there is no difference with and without the
> > patch and finally check the impact:
> > 
> > 	flash_speed -d -c 10 /dev/mtdx
> > 	(be careful: this is a destructive operation)  
> 
> Testing team won't see any issue that's why feel free to add my
> Acked-by: Michal Simek <michal.simek@amd.com>

I think you told me in the last e-mail you tested the pl353 patch, not
the one for the Arasan controller. Shall I add your Acked-by here and
your Tested-by in the other?

Thanks,
Miquèl
Michal Simek Sept. 22, 2023, 9:16 a.m. UTC | #7
On 9/22/23 11:14, Miquel Raynal wrote:
> Hi Michal,
> 
> michal.simek@amd.com wrote on Thu, 21 Sep 2023 12:25:10 +0200:
> 
>> On 9/12/23 16:17, Miquel Raynal wrote:
>>> Hi Michal,
>>>
>>> michal.simek@amd.com wrote on Tue, 12 Sep 2023 15:55:23 +0200:
>>>    
>>>> Hi Miquel,
>>>>
>>>> On 9/11/23 17:52, Miquel Raynal wrote:
>>>>> Hi Michal,
>>>>>
>>>>> miquel.raynal@bootlin.com wrote on Mon, 17 Jul 2023 21:42:20 +0200:
>>>>>
>>>>>> The NAND core complies with the ONFI specification, which itself
>>>>>> mentions that after any program or erase operation, a status check
>>>>>> should be performed to see whether the operation was finished *and*
>>>>>> successful.
>>>>>>
>>>>>> The NAND core offers helpers to finish a page write (sending the
>>>>>> "PAGE PROG" command, waiting for the NAND chip to be ready again, and
>>>>>> checking the operation status). But in some cases, advanced controller
>>>>>> drivers might want to optimize this and craft their own page write
>>>>>> helper to leverage additional hardware capabilities, thus not always
>>>>>> using the core facilities.
>>>>>>
>>>>>> Some drivers, like this one, do not use the core helper to finish a page
>>>>>> write because the final cycles are automatically managed by the
>>>>>> hardware. In this case, the additional care must be taken to manually
>>>>>> perform the final status check.
>>>>>>
>>>>>> Let's read the NAND chip status at the end of the page write helper and
>>>>>> return -EIO upon error.
>>>>>>
>>>>>> Cc: Michal Simek <michal.simek@amd.com>
>>>>>> Cc: stable@vger.kernel.org
>>>>>> Fixes: 88ffef1b65cf ("mtd: rawnand: arasan: Support the hardware BCH ECC engine")
>>>>>> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
>>>>>>
>>>>>> ---
>>>>>>
>>>>>> Hello Michal,
>>>>>>
>>>>>> I have not tested this, but based on a report on another driver, I
>>>>>> believe the status check is also missing here and could sometimes
>>>>>> lead to unnoticed partial writes.
>>>>>>
>>>>>> Please test on your side that everything still works and let me
>>>>>> know how it goes.
>>>>>
>>>>> Any news from the testing team about patches 2/3 and 3/3?
>>>>
>>>> I asked Amit to test and he didn't get back to me even I asked for it couple of times.
>>>
>>> Ok.
>>>    
>>>> Can you please tell me how to test it? I will setup HW myself and test it and get back to you.
>>>
>>> I believe setting up the board to use the hardware BCH engine and
>>> performing basic erase/write/read testing with a known file and check
>>> it still behaves correctly would work. You can also run
>>>
>>> 	nandbiterrs -i /dev/mtdx
>>>
>>> as a second step and verify there is no difference with and without the
>>> patch and finally check the impact:
>>>
>>> 	flash_speed -d -c 10 /dev/mtdx
>>> 	(be careful: this is a destructive operation)
>>
>> Testing team won't see any issue that's why feel free to add my
>> Acked-by: Michal Simek <michal.simek@amd.com>
> 
> I think you told me in the last e-mail you tested the pl353 patch, not
> the one for the Arasan controller. Shall I add your Acked-by here and
> your Tested-by in the other?

Yes exactly.
I tested pl353 myself. If that log looks good, feel free to add my Tested-by tag.
And I got information from the testing team that they tested the Arasan one, hence only
an Acked-by for this one.

Thanks,
Michal
Miquel Raynal Sept. 22, 2023, 9:17 a.m. UTC | #8
Hi Michal,

michal.simek@amd.com wrote on Fri, 22 Sep 2023 11:16:20 +0200:

> On 9/22/23 11:14, Miquel Raynal wrote:
> > Hi Michal,
> > 
> > michal.simek@amd.com wrote on Thu, 21 Sep 2023 12:25:10 +0200:
> >   
> >> On 9/12/23 16:17, Miquel Raynal wrote:  
> >>> Hi Michal,
> >>>
> >>> michal.simek@amd.com wrote on Tue, 12 Sep 2023 15:55:23 +0200:  
> >>>
> >>>> Hi Miquel,
> >>>>
> >>>> On 9/11/23 17:52, Miquel Raynal wrote:  
> >>>>> Hi Michal,
> >>>>>
> >>>>> miquel.raynal@bootlin.com wrote on Mon, 17 Jul 2023 21:42:20 +0200:  
> >>>>>
> >>>>>> The NAND core complies with the ONFI specification, which itself
> >>>>>> mentions that after any program or erase operation, a status check
> >>>>>> should be performed to see whether the operation was finished *and*
> >>>>>> successful.
> >>>>>>
> >>>>>> The NAND core offers helpers to finish a page write (sending the
> >>>>>> "PAGE PROG" command, waiting for the NAND chip to be ready again, and
> >>>>>> checking the operation status). But in some cases, advanced controller
> >>>>>> drivers might want to optimize this and craft their own page write
> >>>>>> helper to leverage additional hardware capabilities, thus not always
> >>>>>> using the core facilities.
> >>>>>>
> >>>>>> Some drivers, like this one, do not use the core helper to finish a page
> >>>>>> write because the final cycles are automatically managed by the
> >>>>>> hardware. In this case, the additional care must be taken to manually
> >>>>>> perform the final status check.
> >>>>>>
> >>>>>> Let's read the NAND chip status at the end of the page write helper and
> >>>>>> return -EIO upon error.
> >>>>>>
> >>>>>> Cc: Michal Simek <michal.simek@amd.com>
> >>>>>> Cc: stable@vger.kernel.org
> >>>>>> Fixes: 88ffef1b65cf ("mtd: rawnand: arasan: Support the hardware BCH ECC engine")
> >>>>>> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> >>>>>>
> >>>>>> ---
> >>>>>>
> >>>>>> Hello Michal,
> >>>>>>
> >>>>>> I have not tested this, but based on a report on another driver, I
> >>>>>> believe the status check is also missing here and could sometimes
> >>>>>> lead to unnoticed partial writes.
> >>>>>>
> >>>>>> Please test on your side that everything still works and let me
> >>>>>> know how it goes.  
> >>>>>
> >>>>> Any news from the testing team about patches 2/3 and 3/3?  
> >>>>
> >>>> I asked Amit to test and he didn't get back to me even I asked for it couple of times.  
> >>>
> >>> Ok.  
> >>>
> >>>> Can you please tell me how to test it? I will setup HW myself and test it and get back to you.
> >>>
> >>> I believe setting up the board to use the hardware BCH engine and
> >>> performing basic erase/write/read testing with a known file and check
> >>> it still behaves correctly would work. You can also run
> >>>
> >>> 	nandbiterrs -i /dev/mtdx
> >>>
> >>> as a second step and verify there is no difference with and without the
> >>> patch and finally check the impact:
> >>>
> >>> 	flash_speed -d -c 10 /dev/mtdx
> >>> 	(be careful: this is a destructive operation)  
> >>
> >> Testing team won't see any issue that's why feel free to add my
> >> Acked-by: Michal Simek <michal.simek@amd.com>
> > 
> > I think you told me in the last e-mail you tested the pl353 patch, not
> > the one for the Arasan controller. Shall I add your Acked-by here and
> > your Tested-by in the other?  
> 
> Yes exactly.
> I tested pl353 myself. If that log looks good feel free to add my Tested-by tag.
> And I got information from testing team that they tested Arasan one hence only Ack one.

Perfect. Thanks a lot!

Miquèl
Miquel Raynal Sept. 22, 2023, 2:51 p.m. UTC | #9
On Mon, 2023-07-17 at 19:42:20 UTC, Miquel Raynal wrote:
> The NAND core complies with the ONFI specification, which itself
> mentions that after any program or erase operation, a status check
> should be performed to see whether the operation was finished *and*
> successful.
> 
> The NAND core offers helpers to finish a page write (sending the
> "PAGE PROG" command, waiting for the NAND chip to be ready again, and
> checking the operation status). But in some cases, advanced controller
> drivers might want to optimize this and craft their own page write
> helper to leverage additional hardware capabilities, thus not always
> using the core facilities.
> 
> Some drivers, like this one, do not use the core helper to finish a page
> write because the final cycles are automatically managed by the
> hardware. In this case, the additional care must be taken to manually
> perform the final status check.
> 
> Let's read the NAND chip status at the end of the page write helper and
> return -EIO upon error.
> 
> Cc: Michal Simek <michal.simek@amd.com>
> Cc: stable@vger.kernel.org
> Fixes: 88ffef1b65cf ("mtd: rawnand: arasan: Support the hardware BCH ECC engine")
> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> Acked-by: Michal Simek <michal.simek@amd.com>

Applied to https://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux.git mtd/fixes.

Miquel

Patch

diff --git a/drivers/mtd/nand/raw/arasan-nand-controller.c b/drivers/mtd/nand/raw/arasan-nand-controller.c
index 906eef70cb6d..487c139316fe 100644
--- a/drivers/mtd/nand/raw/arasan-nand-controller.c
+++ b/drivers/mtd/nand/raw/arasan-nand-controller.c
@@ -515,6 +515,7 @@  static int anfc_write_page_hw_ecc(struct nand_chip *chip, const u8 *buf,
 	struct mtd_info *mtd = nand_to_mtd(chip);
 	unsigned int len = mtd->writesize + (oob_required ? mtd->oobsize : 0);
 	dma_addr_t dma_addr;
+	u8 status;
 	int ret;
 	struct anfc_op nfc_op = {
 		.pkt_reg =
@@ -561,10 +562,21 @@  static int anfc_write_page_hw_ecc(struct nand_chip *chip, const u8 *buf,
 	}
 
 	/* Spare data is not protected */
-	if (oob_required)
+	if (oob_required) {
 		ret = nand_write_oob_std(chip, page);
+		if (ret)
+			return ret;
+	}
 
-	return ret;
+	/* Check write status on the chip side */
+	ret = nand_status_op(chip, &status);
+	if (ret)
+		return ret;
+
+	if (status & NAND_STATUS_FAIL)
+		return -EIO;
+
+	return 0;
 }
 
 static int anfc_sel_write_page_hw_ecc(struct nand_chip *chip, const u8 *buf,