Message ID | 20090820153644.631dbd7b@lappy.seanm.ca |
---|---|
State | Not Applicable |
Headers | show |
Hi Sean, The change is necessary in both Linux and u-boot. Without this change customer are seeing the problem. Best Regards, Victor Gallardo > -----Original Message----- > From: linuxppc-dev-bounces+vgallardo=amcc.com@lists.ozlabs.org [mailto:linuxppc-dev- > bounces+vgallardo=amcc.com@lists.ozlabs.org] On Behalf Of Sean MacLennan > Sent: Thursday, August 20, 2009 12:37 PM > To: Stefan Roese > Cc: u-boot@lists.denx.de; Feng Kan; linux-mtd@lists.infradead.org; linuxppc-dev@ozlabs.org > Subject: Re: [U-Boot] NAND ECC Error with wrong SMC ording bug > > On Thu, 20 Aug 2009 07:01:21 +0200 > Stefan Roese <sr@denx.de> wrote: > > > On Thursday 20 August 2009 06:38:51 Sean MacLennan wrote: > > > > I see other boards using SMC as well, can someone comment on the > > > > change I am proposing. > > > > Should I change the correction algorithm or the calculate > > > > function? If the later is preferred > > > > it would mean the change must be pushed in both U-Boot and Linux. > > > > > > Odds are the calculate function is wrong. The correction algo is > > > used by many nand drivers, I *assume* it is correct. The calculate > > > function was set to agree with u-boot (1.3.0). > > > > Yes, it seems that you changed the order in the calculation function > > while reworking the NDFC driver for arch/powerpc. So we should > > probably change this order back to the original version. And change > > it in U-Boot as well. > > > > BTW: I didn't see any problems with ECC so far with the current code. > > Feng, how did you spot this problem? > > Ok, I think I have reproduced the problem programmatically. Basically, > I force a one bit error with the following patch: > > diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c > index 8c21b89..91dd5b4 100644 > --- a/drivers/mtd/nand/nand_base.c > +++ b/drivers/mtd/nand/nand_base.c > @@ -1628,11 +1628,22 @@ static void nand_write_page_hwecc(struct mtd_info *mtd, struct nand_chip > *chip, > uint8_t *ecc_calc = chip->buffers->ecccalc; > const uint8_t *p = buf; > uint32_t *eccpos = chip->ecc.layout->eccpos; > + static int count; > > for (i = 0; eccsteps; eccsteps--, i += eccbytes, p += eccsize) { > chip->ecc.hwctl(mtd, NAND_ECC_WRITE); > - chip->write_buf(mtd, p, eccsize); > - chip->ecc.calculate(mtd, p, &ecc_calc[i]); > + if (count == 0) { > + count = 1; > + printk("Corrupt one bit: %08x => %08x\n", > + *p, *p ^ 8); > + *(uint8_t *)p ^= 8; > + chip->write_buf(mtd, p, eccsize); > + *(uint8_t *)p ^= 8; > + nand_calculate_ecc(mtd, p, &ecc_calc[i]); > + } else { > + chip->write_buf(mtd, p, eccsize); > + chip->ecc.calculate(mtd, p, &ecc_calc[i]); > + } > } > > for (i = 0; i < chip->ecc.total; i++) > > Basically I write a one bit error to the NAND, but calculate with the > correct bit. This assumes nand_calculate_ecc is correct. > > I then added debugs to the correction to make sure it corrected > properly: > > diff --git a/drivers/mtd/nand/nand_ecc.c b/drivers/mtd/nand/nand_ecc.c > index c0cb87d..57dcaa1 100644 > --- a/drivers/mtd/nand/nand_ecc.c > +++ b/drivers/mtd/nand/nand_ecc.c > @@ -483,14 +483,20 @@ int nand_correct_data(struct mtd_info *mtd, unsigned char *buf, > byte_addr = (addressbits[b2 & 0x3] << 8) + > (addressbits[b1] << 4) + addressbits[b0]; > bit_addr = addressbits[b2 >> 2]; > + > + printk("Single bit error: correct %08x => %08x\n", > + buf[byte_addr], buf[byte_addr] ^ (1 << bit_addr)); > + > /* flip the bit */ > buf[byte_addr] ^= (1 << bit_addr); > return 1; > > } > /* count nr of bits; use table lookup, faster than calculating it */ > - if ((bitsperbyte[b0] + bitsperbyte[b1] + bitsperbyte[b2]) == 1) > + if ((bitsperbyte[b0] + bitsperbyte[b1] + bitsperbyte[b2]) == 1) { > + printk("ECC DATA BAD\n"); // SAM DBG > return 1; /* error in ecc data; no action needed */ > + } > > printk(KERN_ERR "uncorrectable error : "); > return -1; > > With the current ndfc code, the error correction gets the bits wrong. > Switching it back to the original way and the correction is correct. > > diff --git a/drivers/mtd/nand/ndfc.c b/drivers/mtd/nand/ndfc.c > index 89bf85a..497e175 100644 > --- a/drivers/mtd/nand/ndfc.c > +++ b/drivers/mtd/nand/ndfc.c > @@ -101,9 +101,8 @@ static int ndfc_calculate_ecc(struct mtd_info *mtd, > > wmb(); > ecc = in_be32(ndfc->ndfcbase + NDFC_ECC); > - /* The NDFC uses Smart Media (SMC) bytes order */ > - ecc_code[0] = p[2]; > - ecc_code[1] = p[1]; > + ecc_code[0] = p[1]; > + ecc_code[1] = p[2]; > ecc_code[2] = p[3]; > > return 0; > > Does anybody see a problem with my method of reproducing the bug? This > bug is deadly for our customers. I don't want to make the change unless > it is absolutely necessary. > > Cheers, > Sean > _______________________________________________ > Linuxppc-dev mailing list > Linuxppc-dev@lists.ozlabs.org > https://lists.ozlabs.org/listinfo/linuxppc-dev
<snip> > With the current ndfc code, the error correction gets the bits wrong. > Switching it back to the original way and the correction is correct. > > diff --git a/drivers/mtd/nand/ndfc.c b/drivers/mtd/nand/ndfc.c > index 89bf85a..497e175 100644 > --- a/drivers/mtd/nand/ndfc.c > +++ b/drivers/mtd/nand/ndfc.c > @@ -101,9 +101,8 @@ static int ndfc_calculate_ecc(struct mtd_info *mtd, > > wmb(); > ecc = in_be32(ndfc->ndfcbase + NDFC_ECC); > - /* The NDFC uses Smart Media (SMC) bytes order */ > - ecc_code[0] = p[2]; > - ecc_code[1] = p[1]; > + ecc_code[0] = p[1]; > + ecc_code[1] = p[2]; > ecc_code[2] = p[3]; > > return 0; > > Does anybody see a problem with my method of reproducing the bug? This > bug is deadly for our customers. I don't want to make the change unless > it is absolutely necessary.. Just one question: did you enabled MTD_NAND_ECC_SMC in configs? -vimal > > Cheers, > Sean > > ______________________________________________________ > Linux MTD discussion mailing list > http://lists.infradead.org/mailman/listinfo/linux-mtd/ >
On Fri, 21 Aug 2009 10:47:09 +0530
vimal singh <vimal.newwork@gmail.com> wrote:
> Just one question: did you enabled MTD_NAND_ECC_SMC in configs?
It is automagically selected when you select the NDFC driver.
Cheers,
Sean
On Friday 21 August 2009 07:17:09 vimal singh wrote: > > diff --git a/drivers/mtd/nand/ndfc.c b/drivers/mtd/nand/ndfc.c > > index 89bf85a..497e175 100644 > > --- a/drivers/mtd/nand/ndfc.c > > +++ b/drivers/mtd/nand/ndfc.c > > @@ -101,9 +101,8 @@ static int ndfc_calculate_ecc(struct mtd_info *mtd, > > > > wmb(); > > ecc = in_be32(ndfc->ndfcbase + NDFC_ECC); > > - /* The NDFC uses Smart Media (SMC) bytes order */ > > - ecc_code[0] = p[2]; > > - ecc_code[1] = p[1]; > > + ecc_code[0] = p[1]; > > + ecc_code[1] = p[2]; > > ecc_code[2] = p[3]; > > > > return 0; > > > > Does anybody see a problem with my method of reproducing the bug? This > > bug is deadly for our customers. I don't want to make the change unless > > it is absolutely necessary.. > > Just one question: did you enabled MTD_NAND_ECC_SMC in configs? Yes, MTD_NAND_ECC_SMC is selected via Kconfig for this driver. Cheers, Stefan -- DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-0 Fax: (+49)-8142-66989-80 Email: office@denx.de
Hi Vimal, > > With the current ndfc code, the error correction gets the bits wrong. > > Switching it back to the original way and the correction is correct. > > > > diff --git a/drivers/mtd/nand/ndfc.c b/drivers/mtd/nand/ndfc.c > > index 89bf85a..497e175 100644 > > --- a/drivers/mtd/nand/ndfc.c > > +++ b/drivers/mtd/nand/ndfc.c > > @@ -101,9 +101,8 @@ static int ndfc_calculate_ecc(struct mtd_info *mtd, > > > > wmb(); > > ecc = in_be32(ndfc->ndfcbase + NDFC_ECC); > > - /* The NDFC uses Smart Media (SMC) bytes order */ > > - ecc_code[0] = p[2]; > > - ecc_code[1] = p[1]; > > + ecc_code[0] = p[1]; > > + ecc_code[1] = p[2]; > > ecc_code[2] = p[3]; > > > > return 0; > > > > Does anybody see a problem with my method of reproducing the bug? This > > bug is deadly for our customers. I don't want to make the change unless > > it is absolutely necessary.. > > Just one question: did you enabled MTD_NAND_ECC_SMC in configs? Yes, it was set. Best Regards, Victor Gallardo
diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c index 8c21b89..91dd5b4 100644 --- a/drivers/mtd/nand/nand_base.c +++ b/drivers/mtd/nand/nand_base.c @@ -1628,11 +1628,22 @@ static void nand_write_page_hwecc(struct mtd_info *mtd, struct nand_chip *chip, uint8_t *ecc_calc = chip->buffers->ecccalc; const uint8_t *p = buf; uint32_t *eccpos = chip->ecc.layout->eccpos; + static int count; for (i = 0; eccsteps; eccsteps--, i += eccbytes, p += eccsize) { chip->ecc.hwctl(mtd, NAND_ECC_WRITE); - chip->write_buf(mtd, p, eccsize); - chip->ecc.calculate(mtd, p, &ecc_calc[i]); + if (count == 0) { + count = 1; + printk("Corrupt one bit: %08x => %08x\n", + *p, *p ^ 8); + *(uint8_t *)p ^= 8; + chip->write_buf(mtd, p, eccsize); + *(uint8_t *)p ^= 8; + nand_calculate_ecc(mtd, p, &ecc_calc[i]); + } else { + chip->write_buf(mtd, p, eccsize); + chip->ecc.calculate(mtd, p, &ecc_calc[i]); + } } for (i = 0; i < chip->ecc.total; i++) Basically I write a one bit error to the NAND, but calculate with the correct bit. This assumes nand_calculate_ecc is correct. I then added debugs to the correction to make sure it corrected properly: diff --git a/drivers/mtd/nand/nand_ecc.c b/drivers/mtd/nand/nand_ecc.c index c0cb87d..57dcaa1 100644 --- a/drivers/mtd/nand/nand_ecc.c +++ b/drivers/mtd/nand/nand_ecc.c @@ -483,14 +483,20 @@ int nand_correct_data(struct mtd_info *mtd, unsigned char *buf, byte_addr = (addressbits[b2 & 0x3] << 8) + (addressbits[b1] << 4) + addressbits[b0]; bit_addr = addressbits[b2 >> 2]; + + printk("Single bit error: correct %08x => %08x\n", + buf[byte_addr], buf[byte_addr] ^ (1 << bit_addr)); + /* flip the bit */ buf[byte_addr] ^= (1 << bit_addr); return 1; } /* count nr of bits; use table lookup, faster than calculating it */ - if ((bitsperbyte[b0] + bitsperbyte[b1] + bitsperbyte[b2]) == 1) + if ((bitsperbyte[b0] + bitsperbyte[b1] + bitsperbyte[b2]) == 1) { + printk("ECC DATA BAD\n"); // SAM DBG return 1; /* error in ecc data; no action needed */ + } printk(KERN_ERR "uncorrectable error : "); return -1; With the current ndfc code, the error correction gets the bits wrong. Switching it back to the original way and the correction is correct. diff --git a/drivers/mtd/nand/ndfc.c b/drivers/mtd/nand/ndfc.c index 89bf85a..497e175 100644 --- a/drivers/mtd/nand/ndfc.c +++ b/drivers/mtd/nand/ndfc.c @@ -101,9 +101,8 @@ static int ndfc_calculate_ecc(struct mtd_info *mtd, wmb(); ecc = in_be32(ndfc->ndfcbase + NDFC_ECC); - /* The NDFC uses Smart Media (SMC) bytes order */ - ecc_code[0] = p[2]; - ecc_code[1] = p[1]; + ecc_code[0] = p[1]; + ecc_code[1] = p[2]; ecc_code[2] = p[3]; return 0;