Message ID | 1461961285-24159-2-git-send-email-kdasu.kdev@gmail.com |
---|---|
State | Superseded |
Headers | show |
On Fri, 29 Apr 2016 16:21:25 -0400 Kamal Dasu <kdasu.kdev@gmail.com> wrote: > This change provides a fix for controller bug where nand > controller could have a possible sticky error after a PIO > followed by a DMA read. The fix retries a read if we see > a uncorr_ecc after read to detect such sticky errors. > > Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com> > --- > drivers/mtd/nand/brcmnand/brcmnand.c | 15 ++++++++++++++- > 1 file changed, 14 insertions(+), 1 deletion(-) > > diff --git a/drivers/mtd/nand/brcmnand/brcmnand.c b/drivers/mtd/nand/brcmnand/brcmnand.c > index 29a9abd..13c7784 100644 > --- a/drivers/mtd/nand/brcmnand/brcmnand.c > +++ b/drivers/mtd/nand/brcmnand/brcmnand.c > @@ -1555,9 +1555,11 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip, > struct brcmnand_controller *ctrl = host->ctrl; > u64 err_addr = 0; > int err; > + bool retry = true; > > dev_dbg(ctrl->dev, "read %llx -> %p\n", (unsigned long long)addr, buf); > > +try_dmaread: > brcmnand_write_reg(ctrl, BRCMNAND_UNCORR_COUNT, 0); > > if (has_flash_dma(ctrl) && !oob && flash_dma_buf_ok(buf)) { > @@ -1579,7 +1581,18 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip, > > if (mtd_is_eccerr(err)) { > int ret; > - > + /* > + * On controller version >=7.0 if we are doing a DMA read > + * after a prior PIO read that reported uncorrectable error, > + * the DMA engine captures this error following DMA read > + * cleared only on subsequent DMA read, so just retry once > + * to clear a possible false error reported for current DMA > + * read > + */ Hm, shouldn't this BRCMNAND_UNCORR_COUNT bit be cleared just after doing the PIO/DMA read instead of doing it before executing a new read? This would solve your problem without the need for this extra retry, or am I missing something? > + if ((ctrl->nand_version >= 0x0700) && retry) { > + retry = false; > + goto try_dmaread; > + } > ret = brcmstb_nand_verify_erased_page(mtd, chip, buf, addr); > if (ret < 0) { > dev_dbg(ctrl->dev, "uncorrectable error at 0x%llx\n",
Boris, On Mon, May 30, 2016 at 4:50 AM, Boris Brezillon <boris.brezillon@free-electrons.com> wrote: > On Fri, 29 Apr 2016 16:21:25 -0400 > Kamal Dasu <kdasu.kdev@gmail.com> wrote: > >> This change provides a fix for controller bug where nand >> controller could have a possible sticky error after a PIO >> followed by a DMA read. The fix retries a read if we see >> a uncorr_ecc after read to detect such sticky errors. >> >> Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com> >> --- >> drivers/mtd/nand/brcmnand/brcmnand.c | 15 ++++++++++++++- >> 1 file changed, 14 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/mtd/nand/brcmnand/brcmnand.c b/drivers/mtd/nand/brcmnand/brcmnand.c >> index 29a9abd..13c7784 100644 >> --- a/drivers/mtd/nand/brcmnand/brcmnand.c >> +++ b/drivers/mtd/nand/brcmnand/brcmnand.c >> @@ -1555,9 +1555,11 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip, >> struct brcmnand_controller *ctrl = host->ctrl; >> u64 err_addr = 0; >> int err; >> + bool retry = true; >> >> dev_dbg(ctrl->dev, "read %llx -> %p\n", (unsigned long long)addr, buf); >> >> +try_dmaread: >> brcmnand_write_reg(ctrl, BRCMNAND_UNCORR_COUNT, 0); >> >> if (has_flash_dma(ctrl) && !oob && flash_dma_buf_ok(buf)) { >> @@ -1579,7 +1581,18 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip, >> >> if (mtd_is_eccerr(err)) { >> int ret; >> - >> + /* >> + * On controller version >=7.0 if we are doing a DMA read >> + * after a prior PIO read that reported uncorrectable error, >> + * the DMA engine captures this error following DMA read >> + * cleared only on subsequent DMA read, so just retry once >> + * to clear a possible false error reported for current DMA >> + * read >> + */ > > Hm, shouldn't this BRCMNAND_UNCORR_COUNT bit be cleared just after > doing the PIO/DMA read instead of doing it before executing a new read? > This would solve your problem without the need for this extra retry, or > am I missing something? > Clearing the count registers or the intr registers does not clear the condition. Only a clean read (a page that does not have errors) clears the condition. So if this was a false error ( page is really clean) and we read again, it will clear the condition. Thanks Kamal
On Wed, 1 Jun 2016 12:50:56 -0400 Kamal Dasu <kamal.dasu@broadcom.com> wrote: > Boris, > > On Mon, May 30, 2016 at 4:50 AM, Boris Brezillon > <boris.brezillon@free-electrons.com> wrote: > > On Fri, 29 Apr 2016 16:21:25 -0400 > > Kamal Dasu <kdasu.kdev@gmail.com> wrote: > > > >> This change provides a fix for controller bug where nand > >> controller could have a possible sticky error after a PIO > >> followed by a DMA read. The fix retries a read if we see > >> a uncorr_ecc after read to detect such sticky errors. > >> > >> Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com> > >> --- > >> drivers/mtd/nand/brcmnand/brcmnand.c | 15 ++++++++++++++- > >> 1 file changed, 14 insertions(+), 1 deletion(-) > >> > >> diff --git a/drivers/mtd/nand/brcmnand/brcmnand.c b/drivers/mtd/nand/brcmnand/brcmnand.c > >> index 29a9abd..13c7784 100644 > >> --- a/drivers/mtd/nand/brcmnand/brcmnand.c > >> +++ b/drivers/mtd/nand/brcmnand/brcmnand.c > >> @@ -1555,9 +1555,11 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip, > >> struct brcmnand_controller *ctrl = host->ctrl; > >> u64 err_addr = 0; > >> int err; > >> + bool retry = true; > >> > >> dev_dbg(ctrl->dev, "read %llx -> %p\n", (unsigned long long)addr, buf); > >> > >> +try_dmaread: > >> brcmnand_write_reg(ctrl, BRCMNAND_UNCORR_COUNT, 0); > >> > >> if (has_flash_dma(ctrl) && !oob && flash_dma_buf_ok(buf)) { > >> @@ -1579,7 +1581,18 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip, > >> > >> if (mtd_is_eccerr(err)) { > >> int ret; > >> - > >> + /* > >> + * On controller version >=7.0 if we are doing a DMA read > >> + * after a prior PIO read that reported uncorrectable error, > >> + * the DMA engine captures this error following DMA read > >> + * cleared only on subsequent DMA read, so just retry once > >> + * to clear a possible false error reported for current DMA > >> + * read > >> + */ > > > > Hm, shouldn't this BRCMNAND_UNCORR_COUNT bit be cleared just after > > doing the PIO/DMA read instead of doing it before executing a new read? > > This would solve your problem without the need for this extra retry, or > > am I missing something? > > > > Clearing the count registers or the intr registers does not clear the > condition. Only a clean read (a page that does not have errors) clears > the condition. So if this was a false error ( page is really clean) > and we read again, it will clear the condition. This sounds like an expensive workaround, but you know the IP better than I do, so I'll trust you ;).
On Wed, 1 Jun 2016 12:50:56 -0400 Kamal Dasu <kamal.dasu@broadcom.com> wrote: > Boris, > > On Mon, May 30, 2016 at 4:50 AM, Boris Brezillon > <boris.brezillon@free-electrons.com> wrote: > > On Fri, 29 Apr 2016 16:21:25 -0400 > > Kamal Dasu <kdasu.kdev@gmail.com> wrote: > > > >> This change provides a fix for controller bug where nand > >> controller could have a possible sticky error after a PIO > >> followed by a DMA read. The fix retries a read if we see > >> a uncorr_ecc after read to detect such sticky errors. > >> > >> Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com> > >> --- > >> drivers/mtd/nand/brcmnand/brcmnand.c | 15 ++++++++++++++- > >> 1 file changed, 14 insertions(+), 1 deletion(-) > >> > >> diff --git a/drivers/mtd/nand/brcmnand/brcmnand.c b/drivers/mtd/nand/brcmnand/brcmnand.c > >> index 29a9abd..13c7784 100644 > >> --- a/drivers/mtd/nand/brcmnand/brcmnand.c > >> +++ b/drivers/mtd/nand/brcmnand/brcmnand.c > >> @@ -1555,9 +1555,11 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip, > >> struct brcmnand_controller *ctrl = host->ctrl; > >> u64 err_addr = 0; > >> int err; > >> + bool retry = true; > >> > >> dev_dbg(ctrl->dev, "read %llx -> %p\n", (unsigned long long)addr, buf); > >> > >> +try_dmaread: > >> brcmnand_write_reg(ctrl, BRCMNAND_UNCORR_COUNT, 0); > >> > >> if (has_flash_dma(ctrl) && !oob && flash_dma_buf_ok(buf)) { > >> @@ -1579,7 +1581,18 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip, > >> > >> if (mtd_is_eccerr(err)) { > >> int ret; > >> - > >> + /* > >> + * On controller version >=7.0 if we are doing a DMA read > >> + * after a prior PIO read that reported uncorrectable error, > >> + * the DMA engine captures this error following DMA read > >> + * cleared only on subsequent DMA read, so just retry once > >> + * to clear a possible false error reported for current DMA > >> + * read > >> + */ > > > > Hm, shouldn't this BRCMNAND_UNCORR_COUNT bit be cleared just after > > doing the PIO/DMA read instead of doing it before executing a new read? > > This would solve your problem without the need for this extra retry, or > > am I missing something? > > > > Clearing the count registers or the intr registers does not clear the > condition. Only a clean read (a page that does not have errors) clears > the condition. So if this was a false error ( page is really clean) > and we read again, it will clear the condition. > Can you try that patch [1], just to make sure the extra read cycle is really necessary? [1]http://code.bulix.org/uzb7m8-99997
Yes have tried this, but its the controller internals that causes this condition. Kamal
diff --git a/drivers/mtd/nand/brcmnand/brcmnand.c b/drivers/mtd/nand/brcmnand/brcmnand.c index 29a9abd..13c7784 100644 --- a/drivers/mtd/nand/brcmnand/brcmnand.c +++ b/drivers/mtd/nand/brcmnand/brcmnand.c @@ -1555,9 +1555,11 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip, struct brcmnand_controller *ctrl = host->ctrl; u64 err_addr = 0; int err; + bool retry = true; dev_dbg(ctrl->dev, "read %llx -> %p\n", (unsigned long long)addr, buf); +try_dmaread: brcmnand_write_reg(ctrl, BRCMNAND_UNCORR_COUNT, 0); if (has_flash_dma(ctrl) && !oob && flash_dma_buf_ok(buf)) { @@ -1579,7 +1581,18 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip, if (mtd_is_eccerr(err)) { int ret; - + /* + * On controller version >=7.0 if we are doing a DMA read + * after a prior PIO read that reported uncorrectable error, + * the DMA engine captures this error following DMA read + * cleared only on subsequent DMA read, so just retry once + * to clear a possible false error reported for current DMA + * read + */ + if ((ctrl->nand_version >= 0x0700) && retry) { + retry = false; + goto try_dmaread; + } ret = brcmstb_nand_verify_erased_page(mtd, chip, buf, addr); if (ret < 0) { dev_dbg(ctrl->dev, "uncorrectable error at 0x%llx\n",
This change provides a fix for controller bug where nand controller could have a possible sticky error after a PIO followed by a DMA read. The fix retries a read if we see a uncorr_ecc after read to detect such sticky errors. Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com> --- drivers/mtd/nand/brcmnand/brcmnand.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-)