Message ID | 20190726074434.21627-1-m.felsch@pengutronix.de |
---|---|
State | Changes Requested |
Delegated to: | Miquel Raynal |
Series | mtd: rawnand: micron: handle "ecc off" devices correctly |
Hi Marco,

+ Richard
+ Working e-mail address for Boris

Marco Felsch <m.felsch@pengutronix.de> wrote on Fri, 26 Jul 2019
09:44:34 +0200:

> Some devices don't support ecc "official". By "official" I mean that the
> feature can be set trough the "SET FEATURE (EFh)" command but isn't
> reported to the "READ ID Parameter Tables". Because the "ECC Field"
> still says that it is disabled. This is applicable at least
> for the MT29F2G08ABAGA and MT29F2G08ABBGA devices. Even worse the
> datasheet describes the ECC feature in chapter "ECC Protection".
>
> Currently the driver checks the "READ ID Parameter" field directly after
> we enabled the feature. If the check fails we return immediately but
> leave the ECC on. Now all future read/program cycles goes trough the ecc
> and the host nfc gets confused and reports ECC errors.
>
> To address this in a common way we need to turn off the ECC directly
> after reading the "READ ID Parameter" and before checking the
> "ECC status".
>
> Signed-off-by: Marco Felsch <m.felsch@pengutronix.de>

Good catch! However you report that on-die ECC correction is working
but you still disable it; any reason to do so? Would it be better to
actually enable on-die ECC and explicitly mark these two chips as
buggy (see [1] for checking the chip IDs)?

[1] https://elixir.bootlin.com/linux/v5.3-rc1/source/drivers/mtd/nand/raw/nand_macronix.c#L83

> ---
>  drivers/mtd/nand/raw/nand_micron.c | 14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/mtd/nand/raw/nand_micron.c b/drivers/mtd/nand/raw/nand_micron.c
> index 1622d3145587..fb199ad2f1a6 100644
> --- a/drivers/mtd/nand/raw/nand_micron.c
> +++ b/drivers/mtd/nand/raw/nand_micron.c
> @@ -390,6 +390,14 @@ static int micron_supports_on_die_ecc(struct nand_chip *chip)
>  	    (chip->id.data[4] & MICRON_ID_INTERNAL_ECC_MASK) != 0x2)
>  		return MICRON_ON_DIE_UNSUPPORTED;
>
> +	/*
> +	 * It seems that there are devices which do not support ECC official.
> +	 * At least the MT29F2G08ABAGA / MT29F2G08ABBGA devices supports
> +	 * enabling the ECC feature but don't reflect that to the READ_ID table.
> +	 * So we have to guarantee that we disable the ECC feature directly
> +	 * after we did the READ_ID table command. Later we can evaluate the
> +	 * ECC_ENABLE support.
> +	 */
>  	ret = micron_nand_on_die_ecc_setup(chip, true);
>  	if (ret)
>  		return MICRON_ON_DIE_UNSUPPORTED;
> @@ -398,13 +406,13 @@ static int micron_supports_on_die_ecc(struct nand_chip *chip)
>  	if (ret)
>  		return MICRON_ON_DIE_UNSUPPORTED;
>
> -	if (!(id[4] & MICRON_ID_ECC_ENABLED))
> -		return MICRON_ON_DIE_UNSUPPORTED;
> -
>  	ret = micron_nand_on_die_ecc_setup(chip, false);
>  	if (ret)
>  		return MICRON_ON_DIE_UNSUPPORTED;
>
> +	if (!(id[4] & MICRON_ID_ECC_ENABLED))
> +		return MICRON_ON_DIE_UNSUPPORTED;
> +
>  	ret = nand_readid_op(chip, 0, id, sizeof(id));
>  	if (ret)
>  		return MICRON_ON_DIE_UNSUPPORTED;

Thanks,
Miquèl

+ Actual address for Boris

Hi Miquel,

On Friday, 26.07.2019 at 10:28 +0200, Miquel Raynal wrote:
> Good catch! However you report that on-die ECC correction is working
> but you still disable it; any reason to do so ? Would it be better to
> actually enable on-die ECC and explicitly mark these two chips as
> buggy (see [1] for checking the chip IDs)?

It's the other way around. The chip is not supposed to have on-die ECC
according to the datasheet and correctly reflects this fact in the
READ_ID, so Linux should not try to use the on-die ECC.

The bug is that the NAND is not supposed to have on-die ECC and reports
this correctly, but then actually enables an on-die ECC unit when asked
to, probably due to the same die being used for on-die ECC and ECC off
devices. The consequence is that Linux (correctly) assumes that the
full OOB size is available to the controller, but the on-die ECC unit
scribbles over some of the OOB data.

I think this fix is the most robust solution, as it makes sure to disable
the on-die ECC unit to avoid the issue, which might also be present on
other NAND chips we don't know about yet.

Regards,
Lucas

> [1] https://elixir.bootlin.com/linux/v5.3-rc1/source/drivers/mtd/nand/raw/nand_macronix.c#L83

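The ordering problem described above can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: `struct sketch_chip`, the `sketch_*` helpers and `SKETCH_ID_ECC_ENABLED` are hypothetical names invented for the illustration, not the kernel's `struct nand_chip` or the helpers in nand_micron.c, and the second READ ID check that the real function performs after disabling ECC is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit value, chosen for this sketch only. */
#define SKETCH_ID_ECC_ENABLED 0x80u

/* Toy model of a NAND chip; not the kernel's struct nand_chip. */
struct sketch_chip {
	bool ecc_on;          /* current state of the on-die ECC unit */
	bool id_reflects_ecc; /* does READ ID byte 4 report that state? */
};

/* Stand-in for the SET FEATURE step that switches on-die ECC on or off. */
static void sketch_set_ecc(struct sketch_chip *chip, bool enable)
{
	assert(chip != NULL);
	chip->ecc_on = enable;
}

/* Stand-in for READ ID: an "ECC off" part never sets the enabled bit. */
static uint8_t sketch_read_id4(const struct sketch_chip *chip)
{
	assert(chip != NULL);
	return (chip->ecc_on && chip->id_reflects_ecc) ?
		SKETCH_ID_ECC_ENABLED : 0;
}

/* Old ordering: the ID check comes before ECC is switched off again. */
static bool sketch_probe_old(struct sketch_chip *chip)
{
	uint8_t id4;

	sketch_set_ecc(chip, true);
	id4 = sketch_read_id4(chip);
	if (!(id4 & SKETCH_ID_ECC_ENABLED))
		return false; /* early return: on-die ECC is left enabled */

	sketch_set_ecc(chip, false);
	return true;
}

/* Patched ordering: always switch ECC off first, then evaluate the ID. */
static bool sketch_probe_new(struct sketch_chip *chip)
{
	uint8_t id4;

	sketch_set_ecc(chip, true);
	id4 = sketch_read_id4(chip);
	sketch_set_ecc(chip, false);

	return (id4 & SKETCH_ID_ECC_ENABLED) != 0;
}
```

For a modelled "ECC off" part (`id_reflects_ecc == false`), both probes report the feature as unsupported, but only `sketch_probe_new()` leaves `ecc_on` cleared afterwards, which is the behaviour the patch is after.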
On Fri, 26 Jul 2019 10:34:41 +0200
Miquel Raynal <miquel.raynal@bootlin.com> wrote:

> > Marco Felsch <m.felsch@pengutronix.de> wrote on Fri, 26 Jul 2019
> > 09:44:34 +0200:
> >
> > > To address this in a common way we need to turn off the ECC directly
> > > after reading the "READ ID Parameter" and before checking the
> > > "ECC status".
> > >
> > > Signed-off-by: Marco Felsch <m.felsch@pengutronix.de>

Duh! Yet another bug on those Micron chips. I can't say I'm
surprised :-). Anyway, the change looks good:

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>

> > Good catch! However you report that on-die ECC correction is working
> > but you still disable it; any reason to do so ? Would it be better to
> > actually enable on-die ECC and explicitly mark these two chips as
> > buggy (see [1] for checking the chip IDs)?

That's a solution, but are we even sure ECC works correctly on those
NANDs? Given all the problems we have with on-die ECC on Micron chips, I
think it might be a good thing to base the "on-die ECC support"
detection on the full ID (or even better, the part name provided by the
ONFI param page) instead of trying to be smart. This way we can
whitelist the NANDs that are known to work correctly and stop adding
more quirks every time we find a new bug...

> > [1] https://elixir.bootlin.com/linux/v5.3-rc1/source/drivers/mtd/nand/raw/nand_macronix.c#L83

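The whitelist idea suggested above could look roughly like the following. It is a minimal sketch, not code from the driver: `sketch_ecc_whitelist` and `sketch_on_die_ecc_allowed()` are hypothetical names, the table entries are placeholders rather than validated part names, and the model string is assumed to have already been extracted from the ONFI parameter page by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Placeholder entries only: these are NOT real, validated part names.
 * A real table would list parts whose on-die ECC is known to work.
 */
static const char *const sketch_ecc_whitelist[] = {
	"EXAMPLE-PART-A",
	"EXAMPLE-PART-B",
};

/*
 * Return true only if the model string (assumed to come from the ONFI
 * parameter page) exactly matches a whitelisted entry.
 */
static bool sketch_on_die_ecc_allowed(const char *model)
{
	size_t count = sizeof(sketch_ecc_whitelist) /
		       sizeof(sketch_ecc_whitelist[0]);
	size_t i;

	assert(model != NULL);

	for (i = 0; i < count; i++) {
		if (strcmp(model, sketch_ecc_whitelist[i]) == 0)
			return true;
	}

	/* Unknown parts default to "no on-die ECC". */
	return false;
}
```

The design choice here is that unknown parts fall through to "unsupported", so a chip that misbehaves like the two mentioned in this thread would never have its on-die ECC unit switched on unless someone adds it to the table deliberately.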
Hi Lucas, Marco,

Lucas Stach <l.stach@pengutronix.de> wrote on Fri, 26 Jul 2019 10:54:11
+0200:

> > Marco Felsch <m.felsch@pengutronix.de> wrote on Fri, 26 Jul 2019
> > 09:44:34 +0200:
> >
> > > Some devices don't support ecc "official". By "official" I mean that the
                                 ^ uppercase ECC
> > > feature can be set trough the "SET FEATURE (EFh)" command but isn't
> > > reported to the "READ ID Parameter Tables". Because the "ECC Field"
> > > still says that it is disabled. This is applicable at least
> > > for the MT29F2G08ABAGA and MT29F2G08ABBGA devices. Even worse the
> > > datasheet describes the ECC feature in chapter "ECC Protection".

What about:

"Some devices are not supposed to support on-die ECC but
experience shows that internal ECC machinery can actually be enabled
through the "SET FEATURE (EFh)" command, even if a read of the "READ ID
Parameter Tables" returns that it is not."

> > > Currently the driver checks the "READ ID Parameter" field directly after
> > > we enabled the feature. If the check fails we return immediately but
> > > leave the ECC on. Now all future read/program cycles goes trough the ecc
> > > and the host nfc gets confused and reports ECC errors.

And here:

"Currently, the driver checks the "READ ID Parameter" field
directly after having enabled the feature. If the check fails it returns
immediately but leaves the ECC on. When using buggy chips like
MT29F2G08ABAGA and MT29F2G08ABBGA, all future read/program cycles will
go through the on-die ECC, confusing the host controller which is
supposed to be the one handling correction."

> > > To address this in a common way we need to turn off the ECC directly
> > > after reading the "READ ID Parameter" and before checking the
> > > "ECC status".
> > >
> > > Signed-off-by: Marco Felsch <m.felsch@pengutronix.de>
> >
> > Good catch! However you report that on-die ECC correction is working
> > but you still disable it; any reason to do so ? Would it be better to
> > actually enable on-die ECC and explicitly mark these two chips as
> > buggy (see [1] for checking the chip IDs)?
>
> It's the other way around. The chip is not supposed to have on-die ECC
> according to the datasheet and correctly reflects this fact in the
> READ_ID, so Linux should not try to use the on-die ECC.

Ok, I understood the opposite because of the "Even worse the datasheet
describes the ECC feature [...]" which implied to me that the on-die ECC
feature was actually expected despite the status bit not being set.

Marco, can you rephrase a bit the commit log? I proposed something,
feel free to adapt.

> The bug is that the NAND is not supposed to have on-die ECC and reports
> this correctly, but then actually enables an on-die ECC unit when asked
> to, probably due to the same die being used for on-die ECC and ECC off
> devices. The consequence is that Linux (correctly) assumes that the
> full OOB size is available to the controller, but the on-die ECC unit
> scribbles over some of the OOB data.
>
> I think this fix is the most robust solution, as it makes sure to disable
> the on-die ECC unit to avoid the issue, which might also be present on
> other NAND chips we don't know about yet.
>
> Regards,
> Lucas

Thanks,
Miquèl

Wrong address for Boris again, sorry for the noise.

Thanks,
Miquèl

Hi Miquel,

On 19-07-26 11:20, Miquel Raynal wrote:
> > Marco, can you rephrase a bit the commit log? I proposed something,
> > feel free to adapt.

Thanks for the fast reply :) Of course I can adapt it and add Boris' rb-tag.

Regards,
  Marco

Hi Marco,

Marco Felsch <m.felsch@pengutronix.de> wrote on Fri, 26 Jul 2019
11:40:10 +0200:

> > > Marco, can you rephrase a bit the commit log? I proposed something,
> > > feel free to adapt.
>
> Thanks for the fast reply :) Of course I can adapt it and add Boris' rb-tag.

I suppose you can also add Fixes and Stable tags.

Thanks,
Miquèl

diff --git a/drivers/mtd/nand/raw/nand_micron.c b/drivers/mtd/nand/raw/nand_micron.c
index 1622d3145587..fb199ad2f1a6 100644
--- a/drivers/mtd/nand/raw/nand_micron.c
+++ b/drivers/mtd/nand/raw/nand_micron.c
@@ -390,6 +390,14 @@ static int micron_supports_on_die_ecc(struct nand_chip *chip)
 	    (chip->id.data[4] & MICRON_ID_INTERNAL_ECC_MASK) != 0x2)
 		return MICRON_ON_DIE_UNSUPPORTED;
 
+	/*
+	 * It seems that there are devices which do not support ECC official.
+	 * At least the MT29F2G08ABAGA / MT29F2G08ABBGA devices supports
+	 * enabling the ECC feature but don't reflect that to the READ_ID table.
+	 * So we have to guarantee that we disable the ECC feature directly
+	 * after we did the READ_ID table command. Later we can evaluate the
+	 * ECC_ENABLE support.
+	 */
 	ret = micron_nand_on_die_ecc_setup(chip, true);
 	if (ret)
 		return MICRON_ON_DIE_UNSUPPORTED;
@@ -398,13 +406,13 @@ static int micron_supports_on_die_ecc(struct nand_chip *chip)
 	if (ret)
 		return MICRON_ON_DIE_UNSUPPORTED;
 
-	if (!(id[4] & MICRON_ID_ECC_ENABLED))
-		return MICRON_ON_DIE_UNSUPPORTED;
-
 	ret = micron_nand_on_die_ecc_setup(chip, false);
 	if (ret)
 		return MICRON_ON_DIE_UNSUPPORTED;
 
+	if (!(id[4] & MICRON_ID_ECC_ENABLED))
+		return MICRON_ON_DIE_UNSUPPORTED;
+
 	ret = nand_readid_op(chip, 0, id, sizeof(id));
 	if (ret)
 		return MICRON_ON_DIE_UNSUPPORTED;
Some devices don't support ecc "official". By "official" I mean that the
feature can be set trough the "SET FEATURE (EFh)" command but isn't
reported to the "READ ID Parameter Tables". Because the "ECC Field"
still says that it is disabled. This is applicable at least
for the MT29F2G08ABAGA and MT29F2G08ABBGA devices. Even worse the
datasheet describes the ECC feature in chapter "ECC Protection".

Currently the driver checks the "READ ID Parameter" field directly after
we enabled the feature. If the check fails we return immediately but
leave the ECC on. Now all future read/program cycles goes trough the ecc
and the host nfc gets confused and reports ECC errors.

To address this in a common way we need to turn off the ECC directly
after reading the "READ ID Parameter" and before checking the
"ECC status".

Signed-off-by: Marco Felsch <m.felsch@pengutronix.de>
---
 drivers/mtd/nand/raw/nand_micron.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)
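
The effect of the reordering can be illustrated with a small user-space model. This is a minimal sketch and not the kernel code: `struct sketch_chip`, `sketch_set_ecc()`, `sketch_read_id()`, `probe_old_order()`, `probe_new_order()` and the `SKETCH_*` names are hypothetical stand-ins for `micron_nand_on_die_ecc_setup()`, `nand_readid_op()` and the `MICRON_*` definitions, and the bit value is only illustrative. It assumes a chip that accepts the SET FEATURE request but never sets the ECC-enabled bit in its READ ID data, as reported for the MT29F2G08ABAGA and MT29F2G08ABBGA. The second READ ID and the rest of the real function are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative value only; stands in for MICRON_ID_ECC_ENABLED. */
#define SKETCH_ID_ECC_ENABLED 0x80u

enum sketch_support { SKETCH_UNSUPPORTED, SKETCH_SUPPORTED };

/* Hypothetical model of a NAND chip with an on-die ECC engine. */
struct sketch_chip {
	bool ecc_on;         /* real state of the on-die ECC engine */
	bool id_reports_ecc; /* does READ ID byte 4 mirror that state? */
};

/* Models SET FEATURE: the chip always accepts the request. */
static int sketch_set_ecc(struct sketch_chip *chip, bool enable)
{
	chip->ecc_on = enable;
	return 0;
}

/* Models READ ID: byte 4 carries the ECC-enabled bit, if reported. */
static int sketch_read_id(const struct sketch_chip *chip,
			  uint8_t *id, size_t len)
{
	assert(len >= 5);
	memset(id, 0, len);
	if (chip->ecc_on && chip->id_reports_ecc)
		id[4] |= SKETCH_ID_ECC_ENABLED;
	return 0;
}

/* Order before the patch: check the ID bit first, then disable. */
static enum sketch_support probe_old_order(struct sketch_chip *chip)
{
	uint8_t id[5];

	if (sketch_set_ecc(chip, true))
		return SKETCH_UNSUPPORTED;
	if (sketch_read_id(chip, id, sizeof(id)))
		return SKETCH_UNSUPPORTED;
	/* Early return: the on-die ECC engine is left enabled. */
	if (!(id[4] & SKETCH_ID_ECC_ENABLED))
		return SKETCH_UNSUPPORTED;
	if (sketch_set_ecc(chip, false))
		return SKETCH_UNSUPPORTED;
	return SKETCH_SUPPORTED;
}

/* Order after the patch: disable first, then evaluate the ID bit. */
static enum sketch_support probe_new_order(struct sketch_chip *chip)
{
	uint8_t id[5];

	if (sketch_set_ecc(chip, true))
		return SKETCH_UNSUPPORTED;
	if (sketch_read_id(chip, id, sizeof(id)))
		return SKETCH_UNSUPPORTED;
	if (sketch_set_ecc(chip, false))
		return SKETCH_UNSUPPORTED;
	/* The engine is already off whichever way this check goes. */
	if (!(id[4] & SKETCH_ID_ECC_ENABLED))
		return SKETCH_UNSUPPORTED;
	return SKETCH_SUPPORTED;
}
```

For a chip with `id_reports_ecc = false`, both functions return `SKETCH_UNSUPPORTED`, but only `probe_new_order()` leaves `ecc_on` cleared; `probe_old_order()` returns with the engine still enabled, which is the state in which the host controller later reports ECC errors.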