Message ID | 20170703134609.0ea9cd63@bbrezillon |
---|---|
State | Not Applicable |
Headers | show |
+ Thomas, as I don't really want to dig up the original patch to review right now On Mon, Jul 03, 2017 at 01:46:09PM +0200, Boris Brezillon wrote: > Hi, > > Here's my PR for 4.13. This v2 includes Dan's fix ("mtd: nand: mtk: > release lock on error path"). > > As usual, let me know if you see any problem. > > Thanks, > > Boris > > The following changes since commit 2ea659a9ef488125eb46da6eb571de5eae5c43f6: > > Linux 4.12-rc1 (2017-05-13 13:19:49 -0700) > > are available in the git repository at: > > ssh://git.infradead.org/srv/git/l2-mtd.git tags/nand/for-4.13 I happen to have SSH access to that repository :) but a publicly-readable link is usually a better idea, in the highly unlikely case that someone else wants to review your pull request. > for you to fetch changes up to 81667e9c8ad827365f2d1e8b924caf062a19c593: > > mtd: nand: mtk: release lock on error path (2017-07-03 13:39:09 +0200) > > ---------------------------------------------------------------- > This pull request contains the following core changes: > > * addition of on-ecc support to Micron driver This still works by overriding chip->ecc.{read,write}_page{,_raw}() callbacks... I never liked that, and there have been multiple submissions that tried this already, IIRC. It's for sure not going to work with at least one in-tree driver (brcmnand); I suspect others (qcom_nand? mtk_nand?). Can this be improved to either bail out when this clobbers a controller's existing callback, or even better, to utilize a controller's existing _raw() implementation, instead of assuming it can use the nand_base defaults? The latter seems difficult to do in time for this merge, but maybe the former can be done within the release? I'm still tempted to pull this, since at least it does require somebody to opt in via the device tree binding. So they should probably make sure it works with their driver before using it. > * addition of helpers to help drivers choose most appropriate ECC > settings > * deletion of dead-code (cached programming and ->errstat() hook) > * make sure drivers that do not support the SET/GET FEATURES command > return ENOTSUPP use a dummy ->set/get_features implementation > returning -ENOTSUPP (required for Micron on-die ECC) > * change the semantic of ecc->write_page() for drivers setting the > NAND_ECC_CUSTOM_PAGE_ACCESS flag > * support exiting 'GET STATUS' command in default ->cmdfunc() > implementations > * change the prototype of ->setup_data_interface() > > A bunch of driver related changes: > > * various cleanup, fixes and improvements of the MTK driver > * OMAP DT bindings fixes > * support for ->setup_data_interface() in the fsmc driver > * support for imx7 in the gpmi driver > * finalization of the denali driver rework (thanks to Masahiro for the > work he's done on this driver) > * fix "bitflips in erased pages" handling in the ifc driver > * addition of PM ops and dynamic timing configuration to the atmel > driver > > And as usual we also have a few minor cleanup/fixes/improvements > patches across the subsystem. > Brian
Le Tue, 11 Jul 2017 18:01:15 -0700, Brian Norris <computersforpeace@gmail.com> a écrit : > + Thomas, as I don't really want to dig up the original patch to review > right now > > On Mon, Jul 03, 2017 at 01:46:09PM +0200, Boris Brezillon wrote: > > Hi, > > > > Here's my PR for 4.13. This v2 includes Dan's fix ("mtd: nand: mtk: > > release lock on error path"). > > > > As usual, let me know if you see any problem. > > > > Thanks, > > > > Boris > > > > The following changes since commit 2ea659a9ef488125eb46da6eb571de5eae5c43f6: > > > > Linux 4.12-rc1 (2017-05-13 13:19:49 -0700) > > > > are available in the git repository at: > > > > ssh://git.infradead.org/srv/git/l2-mtd.git tags/nand/for-4.13 > > I happen to have SSH access to that repository :) but a > publicly-readable link is usually a better idea, in the highly unlikely > case that someone else wants to review your pull request. My bad. I have 2 remotes for l2-mtd (l2-mtd and l2-mtd-pub), just passed the wrong one to 'git request-pull' :-/. > > > for you to fetch changes up to 81667e9c8ad827365f2d1e8b924caf062a19c593: > > > > mtd: nand: mtk: release lock on error path (2017-07-03 13:39:09 +0200) > > > > ---------------------------------------------------------------- > > This pull request contains the following core changes: > > > > * addition of on-ecc support to Micron driver > > This still works by overriding chip->ecc.{read,write}_page{,_raw}() > callbacks... I never liked that, and there have been multiple > submissions that tried this already, IIRC. It's for sure not going to > work with at least one in-tree driver (brcmnand); I suspect others > (qcom_nand? mtk_nand?). Yes, I know it doesn't work with all drivers, but given that those drivers are already not supporting ECC_NONE or ECC_SW, I don't think it's a real program. > Can this be improved to either bail out when > this clobbers a controller's existing callback, or even better, to > utilize a controller's existing _raw() implementation, instead of > assuming it can use the nand_base defaults? IMO, if there's something to fix it's the NAND controller drivers that are only partially complying with core expectations. I mean, '->cmdfunc(READ0) + ->read_buf()' is a valid sequence and should work with all NAND controllers, but for some (good?) reasons, some drivers are taking liberties with this semantic. Note that we are working on improving the situation with the ->exec_op() approach [4], which should provide NAND controller drivers with all the information they need to execute a NAND operation (CMD, ADDR, DATA, WAIT cycles), which, AFAIU was one blocking aspect with '->cmdfunc()/->cmd_ctrl() + ->read/write_buf/byte/word()'. It will also let drivers return -ENOTSUPP to report when they simply don't support a specific NAND operation (for example, when the CMD+ADDR+DATA+WAIT sequence is not supported). > > The latter seems difficult to do in time for this merge, but maybe the > former can be done within the release? > > I'm still tempted to pull this, since at least it does require somebody > to opt in via the device tree binding. So they should probably make sure > it works with their driver before using it. There really no risk with this new feature because: 1/ the option has to be opted-in by the NAND controller driver (or through a DT property) 2/ even if it's enabled by mistake on one of these controllers, they either check ecc->mode [1][2] or simply override chip->ecc content unconditionally [3] If one NAND controller driver is not doing #2, the driver should be fixed not the on-die ECC logic. > > > * addition of helpers to help drivers choose most appropriate ECC > > settings > > * deletion of dead-code (cached programming and ->errstat() hook) > > * make sure drivers that do not support the SET/GET FEATURES command > > return ENOTSUPP use a dummy ->set/get_features implementation > > returning -ENOTSUPP (required for Micron on-die ECC) > > * change the semantic of ecc->write_page() for drivers setting the > > NAND_ECC_CUSTOM_PAGE_ACCESS flag > > * support exiting 'GET STATUS' command in default ->cmdfunc() > > implementations > > * change the prototype of ->setup_data_interface() > > > > A bunch of driver related changes: > > > > * various cleanup, fixes and improvements of the MTK driver > > * OMAP DT bindings fixes > > * support for ->setup_data_interface() in the fsmc driver > > * support for imx7 in the gpmi driver > > * finalization of the denali driver rework (thanks to Masahiro for the > > work he's done on this driver) > > * fix "bitflips in erased pages" handling in the ifc driver > > * addition of PM ops and dynamic timing configuration to the atmel > > driver > > > > And as usual we also have a few minor cleanup/fixes/improvements > > patches across the subsystem. > > > > Brian [1]http://elixir.free-electrons.com/linux/latest/source/drivers/mtd/nand/brcmnand/brcmnand.c#L2121 [2]http://elixir.free-electrons.com/linux/latest/source/drivers/mtd/nand/mtk_nand.c#L1171 [3]http://elixir.free-electrons.com/linux/latest/source/drivers/mtd/nand/qcom_nandc.c#L1840 [4]https://github.com/linux-nand/linux/commits/nand/exec_op
Hi, On Wed, Jul 12, 2017 at 09:19:19AM +0200, Boris Brezillon wrote: > Le Tue, 11 Jul 2017 18:01:15 -0700, > Brian Norris <computersforpeace@gmail.com> a écrit : > > > + Thomas, as I don't really want to dig up the original patch to review > > right now > > > > On Mon, Jul 03, 2017 at 01:46:09PM +0200, Boris Brezillon wrote: > > > Hi, > > > > > > Here's my PR for 4.13. This v2 includes Dan's fix ("mtd: nand: mtk: > > > release lock on error path"). > > > > > > As usual, let me know if you see any problem. > > > > > > Thanks, > > > > > > Boris > > > > > > The following changes since commit 2ea659a9ef488125eb46da6eb571de5eae5c43f6: > > > > > > Linux 4.12-rc1 (2017-05-13 13:19:49 -0700) > > > > > > are available in the git repository at: > > > > > > ssh://git.infradead.org/srv/git/l2-mtd.git tags/nand/for-4.13 > > > > I happen to have SSH access to that repository :) but a > > publicly-readable link is usually a better idea, in the highly unlikely > > case that someone else wants to review your pull request. > > My bad. I have 2 remotes for l2-mtd (l2-mtd and l2-mtd-pub), just passed > the wrong one to 'git request-pull' :-/. > > > > > > for you to fetch changes up to 81667e9c8ad827365f2d1e8b924caf062a19c593: > > > > > > mtd: nand: mtk: release lock on error path (2017-07-03 13:39:09 +0200) > > > > > > ---------------------------------------------------------------- > > > This pull request contains the following core changes: > > > > > > * addition of on-ecc support to Micron driver > > > > This still works by overriding chip->ecc.{read,write}_page{,_raw}() > > callbacks... I never liked that, and there have been multiple > > submissions that tried this already, IIRC. It's for sure not going to > > work with at least one in-tree driver (brcmnand); I suspect others > > (qcom_nand? mtk_nand?). > > Yes, I know it doesn't work with all drivers, but given that those > drivers are already not supporting ECC_NONE or ECC_SW, I don't think > it's a real program. I'll take 's/program/problem/' as a fun typo? :D I suppose that's mostly fine reasoning (though ECC_NONE is a pretty stupid mode). And I see that the "unsupported drivers" are mostly going to be resilient to this anyway, as noted below. > > Can this be improved to either bail out when > > this clobbers a controller's existing callback, or even better, to > > utilize a controller's existing _raw() implementation, instead of > > assuming it can use the nand_base defaults? > > IMO, if there's something to fix it's the NAND controller drivers that > are only partially complying with core expectations. I mean, > '->cmdfunc(READ0) + ->read_buf()' is a valid sequence and should work > with all NAND controllers, but for some (good?) reasons, some drivers > are taking liberties with this semantic. How is that supposed to communicate things like "read with ECC" vs. "read without ECC"? Overall, that command semantic looked designed mostly for stupid controllers that integrated very little, and I completely understand why many vendors went with drivers that sidestepped that and implemented operations differently. (Disclosure: I implemented one.) If you expect cmdfunc(READ0) + ->read_buf() to always give you raw data, then why should any controller driver be allowed to implement ecc.read_page_raw()? Or conversely, if there are good reasons to allow controllers to override ecc.read_page_raw(), then why should the on-die *flash* implementation circumvent that? Flash support should be orthogonal to controller support. > Note that we are working on improving the situation with the > ->exec_op() approach [4], which should provide NAND controller drivers > with all the information they need to execute a NAND operation > (CMD, ADDR, DATA, WAIT cycles), which, AFAIU was one blocking > aspect with '->cmdfunc()/->cmd_ctrl() + ->read/write_buf/byte/word()'. > > It will also let drivers return -ENOTSUPP to report when they simply > don't support a specific NAND operation (for example, when the > CMD+ADDR+DATA+WAIT sequence is not supported). I'm pretty sure that won't cover why several of the drivers I pointed at don't implement the "semantics" you expect. > > > > The latter seems difficult to do in time for this merge, but maybe the > > former can be done within the release? > > > > I'm still tempted to pull this, since at least it does require somebody > > to opt in via the device tree binding. So they should probably make sure > > it works with their driver before using it. > > There really no risk with this new feature because: The following reasons are OK, but I object to the "no risk" assessment. There's a reason this approach was rejected before; it's setting a precedent of supporting new flash features with a "works for me", but without regard for some drivers that don't use the same approach. Yes, sometimes controller drivers need to be reworked or updated, but at the same time, there are perfectly reasonable ways to implement this while seamlessly supporting these drivers -- namely, *wrap* the existing ecc->{read,write}_page_raw() instead of replacing them. > 1/ the option has to be opted-in by the NAND controller driver (or > through a DT property) I noted that already, which is why I'm not necessarily rejecting the pull request. > 2/ even if it's enabled by mistake on one of these controllers, they > either check ecc->mode [1][2] Right, thanks for pointing those out. That helps a bit more. > or simply override chip->ecc content > unconditionally [3] Really? That seems a little error prone to have two different setters trying to clobber each other. Before this patch set, at least we mostly just respect the controller driver (see all the 'if (!ecc->foo)' in nand_scan_tail()); the only clobbering was on the 'mode' field, because it can be read from the device tree, in nand_scan_ident(). > If one NAND controller driver is not doing #2, the driver should be > fixed not the on-die ECC logic. I'm not sure those are mutually exclusive. The (new) on-die ECC logic can easily recognize that ecc.foo is already configured, and skip clobbering it. As noted above, nand_scan_tail() already does this for most other similar features. If I don't see someone else fix this up soon, I'll look at doing it myself. > > > > > * addition of helpers to help drivers choose most appropriate ECC > > > settings > > > * deletion of dead-code (cached programming and ->errstat() hook) > > > * make sure drivers that do not support the SET/GET FEATURES command > > > return ENOTSUPP use a dummy ->set/get_features implementation > > > returning -ENOTSUPP (required for Micron on-die ECC) > > > * change the semantic of ecc->write_page() for drivers setting the > > > NAND_ECC_CUSTOM_PAGE_ACCESS flag > > > * support exiting 'GET STATUS' command in default ->cmdfunc() > > > implementations > > > * change the prototype of ->setup_data_interface() > > > > > > A bunch of driver related changes: > > > > > > * various cleanup, fixes and improvements of the MTK driver > > > * OMAP DT bindings fixes > > > * support for ->setup_data_interface() in the fsmc driver > > > * support for imx7 in the gpmi driver > > > * finalization of the denali driver rework (thanks to Masahiro for the > > > work he's done on this driver) > > > * fix "bitflips in erased pages" handling in the ifc driver > > > * addition of PM ops and dynamic timing configuration to the atmel > > > driver > > > > > > And as usual we also have a few minor cleanup/fixes/improvements > > > patches across the subsystem. Anyway, I think I'm satisfied enough that we're OK for now. I've pulled the set. Brian > [1]http://elixir.free-electrons.com/linux/latest/source/drivers/mtd/nand/brcmnand/brcmnand.c#L2121 > [2]http://elixir.free-electrons.com/linux/latest/source/drivers/mtd/nand/mtk_nand.c#L1171 > [3]http://elixir.free-electrons.com/linux/latest/source/drivers/mtd/nand/qcom_nandc.c#L1840 > [4]https://github.com/linux-nand/linux/commits/nand/exec_op
On Wed, 12 Jul 2017 18:44:33 -0700 Brian Norris <computersforpeace@gmail.com> wrote: > Hi, > > On Wed, Jul 12, 2017 at 09:19:19AM +0200, Boris Brezillon wrote: > > Le Tue, 11 Jul 2017 18:01:15 -0700, > > Brian Norris <computersforpeace@gmail.com> a écrit : > > > > > + Thomas, as I don't really want to dig up the original patch to review > > > right now > > > > > > On Mon, Jul 03, 2017 at 01:46:09PM +0200, Boris Brezillon wrote: > > > > Hi, > > > > > > > > Here's my PR for 4.13. This v2 includes Dan's fix ("mtd: nand: mtk: > > > > release lock on error path"). > > > > > > > > As usual, let me know if you see any problem. > > > > > > > > Thanks, > > > > > > > > Boris > > > > > > > > The following changes since commit 2ea659a9ef488125eb46da6eb571de5eae5c43f6: > > > > > > > > Linux 4.12-rc1 (2017-05-13 13:19:49 -0700) > > > > > > > > are available in the git repository at: > > > > > > > > ssh://git.infradead.org/srv/git/l2-mtd.git tags/nand/for-4.13 > > > > > > I happen to have SSH access to that repository :) but a > > > publicly-readable link is usually a better idea, in the highly unlikely > > > case that someone else wants to review your pull request. > > > > My bad. I have 2 remotes for l2-mtd (l2-mtd and l2-mtd-pub), just passed > > the wrong one to 'git request-pull' :-/. > > > > > > > > > for you to fetch changes up to 81667e9c8ad827365f2d1e8b924caf062a19c593: > > > > > > > > mtd: nand: mtk: release lock on error path (2017-07-03 13:39:09 +0200) > > > > > > > > ---------------------------------------------------------------- > > > > This pull request contains the following core changes: > > > > > > > > * addition of on-ecc support to Micron driver > > > > > > This still works by overriding chip->ecc.{read,write}_page{,_raw}() > > > callbacks... I never liked that, and there have been multiple > > > submissions that tried this already, IIRC. It's for sure not going to > > > work with at least one in-tree driver (brcmnand); I suspect others > > > (qcom_nand? mtk_nand?). > > > > Yes, I know it doesn't work with all drivers, but given that those > > drivers are already not supporting ECC_NONE or ECC_SW, I don't think > > it's a real program. > > I'll take 's/program/problem/' as a fun typo? :D Hehe. Yep, it's a typo. > > I suppose that's mostly fine reasoning (though ECC_NONE is a pretty > stupid mode). I don't want to debate the usefulness of ECC_NONE here, but I don't find it stupid, just not so useful in real-life :-P. > And I see that the "unsupported drivers" are mostly going > to be resilient to this anyway, as noted below. I'm more optimistic here. Of course, it requires reworking the logic in most of them, and a few might never be able to support on-die ECC because of hardcoded logic in the NAND controller IP preventing one from disabling the ECC engine. > > > > Can this be improved to either bail out when > > > this clobbers a controller's existing callback, or even better, to > > > utilize a controller's existing _raw() implementation, instead of > > > assuming it can use the nand_base defaults? > > > > IMO, if there's something to fix it's the NAND controller drivers that > > are only partially complying with core expectations. I mean, > > '->cmdfunc(READ0) + ->read_buf()' is a valid sequence and should work > > with all NAND controllers, but for some (good?) reasons, some drivers > > are taking liberties with this semantic. > > How is that supposed to communicate things like "read with ECC" vs. "read > without ECC"? It's not supposed to. > Overall, that command semantic looked designed mostly for > stupid controllers that integrated very little, and I completely > understand why many vendors went with drivers that sidestepped that and > implemented operations differently. (Disclosure: I implemented one.) Well, that's where most of the misunderstanding comes from. People tend to mix the ECC engine and NAND controller concepts, simply because it's most of the time embedded in the same HW block. Let me explain my understanding of the NAND framework. You have 3 different concepts: 1/ the NAND chip (struct nand_chip) 2/ the NAND controller (struct nand_hw_ctrl) 3/ the ECC engine (struct nand_ecc_ctrl) The NAND chip and NAND controller are pretty easy to identify and separate, and you don't have a choice here. It's a bit more complicated for the ECC engine, because it may be in the placed in the NAND controller, directly on the chip or handled completely in SW. ecc->read/write_page[_raw]() methods are here to let the ECC engine take the appropriate action based on the required operation, but those functions have to be implemented by the block taking care of ECC: - on-die ECC: NAND vendor driver sends a command to disable/enable ECC before launching the read/prog page operation, and check for errors with another NAND command (only for READ page operations). - ECC engine embedded in NAND controllers: NAND controller driver sets the ECC_EN bit before trigerring the NAND read/prog operation and check for errors when the read is done > > If you expect cmdfunc(READ0) + ->read_buf() to always give you raw data, > then why should any controller driver be allowed to implement > ecc.read_page_raw()? Because ECC engine might have to be disabled before you launch the cmdfunc(READ0) + ->read_buf() sequence, and only the ECC controller can do that. For ECC engines embedded in NAND controllers it's usually about setting the ECC_EN bit to 0 and skipping ECC correction step. For on-die ECC, it may require sending an extra NAND command. If you let the NAND controller override ecc->read_page_raw() and still want to use on-die ECC, then you're screwed, because on-die ECC logic embedded in the NAND might stay enabled even when ecc->read_page_raw() is called. > > Or conversely, if there are good reasons to allow controllers to > override ecc.read_page_raw(), then why should the on-die *flash* > implementation circumvent that? Flash support should be orthogonal to > controller support. The only case where on-die ECC logic should override NAND controller hooks is when ecc->mode is set to ONDIE_ECC. But that also means that, when on-die ECC is selected by the NAND controller driver (at probe/init time), it should make sure that the ECC engine embedded in the NAND controller is disabled when this particular NAND chip is accessed (can be done in ->select_chip() for instance, or it can be disabled at probe time assuming you only have one NAND chip connected to the NAND controller). > > > Note that we are working on improving the situation with the > > ->exec_op() approach [4], which should provide NAND controller drivers > > with all the information they need to execute a NAND operation > > (CMD, ADDR, DATA, WAIT cycles), which, AFAIU was one blocking > > aspect with '->cmdfunc()/->cmd_ctrl() + ->read/write_buf/byte/word()'. > > > > It will also let drivers return -ENOTSUPP to report when they simply > > don't support a specific NAND operation (for example, when the > > CMD+ADDR+DATA+WAIT sequence is not supported). > > I'm pretty sure that won't cover why several of the drivers I pointed at > don't implement the "semantics" you expect. Of course, this ->exec_op() hook won't magically solve all problems, but it will help transition to something that is easier to maintain. As mentioned above, the NAND controller drivers will still be able to decide whether it supports on-die ECC or not, and when it does, it should make sure the ECC engine embedded in the controller is disabled when read/prog accesses are done to this specific chip. Now, maybe I'm wrong, and ->exec_op() will not fit all controllers. But in this case, I'd rather add chip->read/write_page() hooks to let the NAND controller do raw accesses to the NAND rather than abuse the ecc->read/write_page[_raw]() ones, because we're likely to need a specific implementation for these methods for some funky on-die ECC engines (a specific action might be required to disable ECC before a read/write access). > > > > > > > The latter seems difficult to do in time for this merge, but maybe the > > > former can be done within the release? > > > > > > I'm still tempted to pull this, since at least it does require somebody > > > to opt in via the device tree binding. So they should probably make sure > > > it works with their driver before using it. > > > > There really no risk with this new feature because: > > The following reasons are OK, but I object to the "no risk" assessment. > There's a reason this approach was rejected before; it's setting a > precedent of supporting new flash features with a "works for me", but > without regard for some drivers that don't use the same approach. But that's not what Thomas did here. The feature has to be explicitly enabled, and most (all?) drivers check/override ecc->mode before calling nand_scan_tail(). So, unless someone decides to remove ecc->mode checks without testing, we should be safe. > Yes, > sometimes controller drivers need to be reworked or updated, but at the > same time, there are perfectly reasonable ways to implement this while > seamlessly supporting these drivers -- namely, *wrap* the existing > ecc->{read,write}_page_raw() instead of replacing them. Then we need to clarify things, and adding chip->read/write_page() page hooks in nand_chip to do all raw accesses to the NAND is one solution. By raw, I mean raw from the NAND controller PoV. That is, if the controller embeds an ECC engine it has to disable it. Note that we still need the ecc->read/write_page_raw() methods, because accessing the NAND in raw mode from the NAND controller PoV does not necessarily mean we've disabled ECC (this is the case when on-die ECC is in use). > > > 1/ the option has to be opted-in by the NAND controller driver (or > > through a DT property) > > I noted that already, which is why I'm not necessarily rejecting the > pull request. > > > 2/ even if it's enabled by mistake on one of these controllers, they > > either check ecc->mode [1][2] > > Right, thanks for pointing those out. That helps a bit more. > > > or simply override chip->ecc content > > unconditionally [3] > > Really? That seems a little error prone to have two different setters > trying to clobber each other. Before this patch set, at least we mostly > just respect the controller driver (see all the 'if (!ecc->foo)' in > nand_scan_tail()); the only clobbering was on the 'mode' field, because > it can be read from the device tree, in nand_scan_ident(). > > > If one NAND controller driver is not doing #2, the driver should be > > fixed not the on-die ECC logic. > > I'm not sure those are mutually exclusive. The (new) on-die ECC logic > can easily recognize that ecc.foo is already configured, and skip > clobbering it. As noted above, nand_scan_tail() already does this for > most other similar features. > > If I don't see someone else fix this up soon, I'll look at doing it > myself. Actually, this patch [1] should fix the problem. By moving the vendor init step in nand_scan_tail() we make sure ecc->mode is set to its final value when Micron's driver decides whether on-die ECC should be activated or not. [1]https://patchwork.ozlabs.org/patch/770228/