Message ID | 20240123014517.5787-1-mcpratt@pm.me |
---|---|
Headers | show |
Series | fw_devlink: generically handle bad links to child fwnodes | expand |
On Mon, Jan 22, 2024 at 5:46 PM Michael Pratt <mcpratt@pm.me> wrote: > > Hi all, > > First off, if you are CC'ed here, it's likely because you were > involved in or CC'ed in a patch series by Saravana Kannan for fw_devlink, > and for consistency, I'm kindly asking for review from you all as well. > If you think this may not affect your use cases, feel free to ignore this. > I'm also CC'ing Christian and Rafal from the Openwrt project. > > This series is following up on some excellent work from Saravana [1] > which is another patch series that includes some changes > that helped make it possible for fw_devlink to work with MTD partitions > that are a supplier fwnode by being a NVMEM provider. For an MTD partition > to become an NVMEM provider, it must be registered as a platform device > using of_platform_populate() or similar function > which was done in a previous commit [2] > but this also resulted in fw_devlink to apply > to those fwnodes and their child fwnodes. > > That regression caused consumer devices to defer indefinitely > while waiting for probing that will never happen or for device links > to form from fwnode links with fwnodes that are not associated > with any real device or driver that probes > (e.g. describing the location of a MAC address). > > Saravana's patch series helped in two ways: > First, by moving consumers from a child fwnode (in this case, > the "nvmem-cells" compatible node) to an ancestor fwnode > that represents the actual supplier device when that device probes, > which handles the case where > the supplier device has a probe attempt first. [3] [4] > And secondly, by specifically marking "nvmem-cells" compatible nodes > as populated during MTD partition parsing which also occurs during > the supplier device probe, which handles both cases of initial probe order, > but only for that node type. [5] > > However, there's some drawbacks to the second solution Oh, somehow missed this thread entirely until it saw some activity today. > from having to manually name which nodes need the workaround > for the corner case with the compatible string. > Most notably, that it's possible for this corner case > to apply to other nodes types, but also the fact that initial probe order > deciding whether or not everything probes in the intended order, if at all, > through driver core alone is still an issue with fw_devlink, > despite the fact that controlling probe order in driver core > is one of it's goals. In other words, the real problem is in driver core, > but the fix is in the mtd driver. It's been a while since I looked at MTD code, but the real problem is not in driver core, but how it's used incorrectly by the MTD/nvmem frameworks. Adding devices to a bus that'll never probe is wrong. I think there's also a case of two devices being created off the same DT node. While not technically wrong, it's weird when one of them never probes. I'll take a closer look and respond to this series later. Hopefully this week, but if not, then next week. As I said in the other patch, I don't like the series in the current form because it's changing APIs in not so great ways. fwnode_link_add() is supposed to be super dumb and simple. It's literally there just to avoid reparsing DT nodes multiple times. Nothing more ever. It just denotes the "pointers" between fwnode or DT nodes. The "smarts" should either be where fwnode links are converted into device links (and not have to fix them up) or where nvmem-cells is being parsed and converted to fwnode links. As I said in the other patch, let me take a closer look this week or next and get back. These things needs several hours of uninterrupted time for me to debug and unwind. -Saravana > > Unfortunately, with the Openwrt project > we are experiencing this regression again > by implementing the new NVMEM layout "fixed-layout" > after it was accepted into the kernel. [6] > This causes some subsystems of an SOC, like > ethernet or wireless or both depending on hardware and DTS, > and in some cases a completely different function like USB > to never probe once again, and the temporary fix, like before, > is by disabling fw_devlink. [7] > > Below is a simplified DTS of an Atheros device with the NVMEM layout: > > > ð0 { > nvmem-cells = <&macaddr_art_0>; > nvmem-cell-names = "mac-address"; > }; > > &wifi { > nvmem-cells = <&caldata_art_1000>; > nvmem-cell-names = "calibration"; > }; > > &spi { > status = "okay"; > > flash@0 { > compatible = "jedec,spi-nor"; > > partitions { > compatible = "fixed-partitions"; > > partition@ff0000 { > label = "art"; > reg = <0xff0000 0x010000>; > read-only; > > nvmem-layout { > compatible = "fixed-layout"; > > macaddr_art_0: macaddr@0 { > reg = <0x0 0x6>; > }; > > caldata_art_1000: caldata@1000 { > reg = <0x1000 0x440>; > }; > }; > }; > }; > }; > }; > > > When the DTS is written this way, not only is there a different node > involved for NVMEM, but this node is a new node that is yet another > descendant in the tree. In other words, the "partition@ff0000" node > used to be what designated this device as the NVMEM provider > with the "nvmem-cells" compatible, so the node that represents > the actual probing device is now 4 parent nodes up in the tree > in this example instead of 3 parent nodes up in the tree as before. > > For the past year, and even before the "fixed-layout" issue came up, > I had been experimenting with a way to handle these reverse probe order > and linking of distant descendant node issues in a generic way instead of > naming exceptions with the compatible string, and this series is the > culmination of those efforts. It may look strange at first, > but there are a myriad set of cases to handle and other reasons > that led me to develop this approach of using existing bad device links > in order to find the correct fwnode to link to, and then redo > the relevant links for that consumer device altogether. > I'm concerned that doing this another way would be a much more massive > project that would basically rewrite what the fw_devlink feature does. > Or perhaps there would have to be a new, third form of device links > that would be a "placeholder" before it becomes a fwnode link. > Eventually, I came to the conclusion that > there simply is not enough information to form the correct fwnode link > before the real supplier device has a successful probe. > > Some of the changes proposed here are made on the extreme side of caution, > for example, checking for null dereference when it might not be necessary, > and reducing the activity of some functions in order to reduce > the amount of assumptions taking place in the middle of driver core > in cases where the new functions proposed here handles that just as well > and closer to a possible probe defer event > (e.g. not declaring a fwnode as NOT_DEVICE before > a probe attempt is expected to have happened). > > I have tried to make the commit messages as self-explanatory as I can, > but they might have ended up a little too verbose, and therefore confusing > but I'm ready to explain whatever has not been explained well already. > Lastly, this is my first attempt at sending a larger change to the kernel, > so I appreciate your time and patience very much. > > MCP > > > [1] https://lore.kernel.org/lkml/20230207014207.1678715-1-saravanak@google.com/ > > [2] bcdf0315a61a29eb753a607d3a85a4032de72d94 > > [3] 3a2dbc510c437ca392516b0105bad8e7970e6614 > > [4] 411c0d58ca6faa9bc4b9f5382118a31c7bb92a6f > > [5] fb42378dcc7f247df56f0ecddfdae85487495fbc > > [6] 27f699e578b1a72df89dfa3bc42e093a01dc8d10 > > [7] https://github.com/openwrt/openwrt/commit/703d667a0cdf6dfa402c08e72d0c77a257ca5009 > > > Michael Pratt (4): > driver core: fw_devlink: Use driver to determine probe ability > driver core: fw_devlink: Link to supplier ancestor if no device > driver core: fw_devlink: Add function device_links_fixup_suppliers() > mtd: mtdpart: Allow fwnode links to NVMEM compatible fwnodes > > drivers/base/base.h | 1 + > drivers/base/core.c | 144 ++++++++++++++++++++++++++++++++++++++--- > drivers/base/dd.c | 2 + > drivers/mtd/mtdpart.c | 10 --- > include/linux/fwnode.h | 4 ++ > 5 files changed, 143 insertions(+), 18 deletions(-) > > > base-commit: b0d326da462e20285236e11e4cbc32085de9f363 > -- > 2.30.2 > >
On Mon, Feb 5, 2024 at 12:06 PM Saravana Kannan <saravanak@google.com> wrote: > > On Mon, Jan 22, 2024 at 5:46 PM Michael Pratt <mcpratt@pm.me> wrote: > > > > Hi all, > > > > First off, if you are CC'ed here, it's likely because you were > > involved in or CC'ed in a patch series by Saravana Kannan for fw_devlink, > > and for consistency, I'm kindly asking for review from you all as well. > > If you think this may not affect your use cases, feel free to ignore this. > > I'm also CC'ing Christian and Rafal from the Openwrt project. > > > > This series is following up on some excellent work from Saravana [1] > > which is another patch series that includes some changes > > that helped make it possible for fw_devlink to work with MTD partitions > > that are a supplier fwnode by being a NVMEM provider. For an MTD partition > > to become an NVMEM provider, it must be registered as a platform device > > using of_platform_populate() or similar function > > which was done in a previous commit [2] > > but this also resulted in fw_devlink to apply > > to those fwnodes and their child fwnodes. > > > > That regression caused consumer devices to defer indefinitely > > while waiting for probing that will never happen or for device links > > to form from fwnode links with fwnodes that are not associated > > with any real device or driver that probes > > (e.g. describing the location of a MAC address). > > > > Saravana's patch series helped in two ways: > > First, by moving consumers from a child fwnode (in this case, > > the "nvmem-cells" compatible node) to an ancestor fwnode > > that represents the actual supplier device when that device probes, > > which handles the case where > > the supplier device has a probe attempt first. [3] [4] > > And secondly, by specifically marking "nvmem-cells" compatible nodes > > as populated during MTD partition parsing which also occurs during > > the supplier device probe, which handles both cases of initial probe order, > > but only for that node type. [5] > > > > However, there's some drawbacks to the second solution > > Oh, somehow missed this thread entirely until it saw some activity today. > > > from having to manually name which nodes need the workaround > > for the corner case with the compatible string. > > Most notably, that it's possible for this corner case > > to apply to other nodes types, but also the fact that initial probe order > > deciding whether or not everything probes in the intended order, if at all, > > through driver core alone is still an issue with fw_devlink, > > despite the fact that controlling probe order in driver core > > is one of it's goals. In other words, the real problem is in driver core, > > but the fix is in the mtd driver. > > It's been a while since I looked at MTD code, but the real problem is > not in driver core, but how it's used incorrectly by the MTD/nvmem > frameworks. Adding devices to a bus that'll never probe is wrong. I > think there's also a case of two devices being created off the same DT > node. While not technically wrong, it's weird when one of them never > probes. > > I'll take a closer look and respond to this series later. Hopefully > this week, but if not, then next week. > > As I said in the other patch, I don't like the series in the current > form because it's changing APIs in not so great ways. > > fwnode_link_add() is supposed to be super dumb and simple. It's > literally there just to avoid reparsing DT nodes multiple times. > Nothing more ever. It just denotes the "pointers" between fwnode or DT > nodes. > > The "smarts" should either be where fwnode links are converted into > device links (and not have to fix them up) or where nvmem-cells is > being parsed and converted to fwnode links. > > As I said in the other patch, let me take a closer look this week or > next and get back. These things needs several hours of uninterrupted > time for me to debug and unwind. As promised, I took a closer look and left some comments in Patch 1 and 2. Patch 1 is 100% broken/wrong so the series will not be accepted. Just for the sake of understanding, can you send a patch that'll add these additional compatible strings similar to how I handled nvmem-cells and show my how much worse it is than this series? > > > > > Unfortunately, with the Openwrt project > > we are experiencing this regression again > > by implementing the new NVMEM layout "fixed-layout" > > after it was accepted into the kernel. [6] > > This causes some subsystems of an SOC, like > > ethernet or wireless or both depending on hardware and DTS, > > and in some cases a completely different function like USB > > to never probe once again, and the temporary fix, like before, > > is by disabling fw_devlink. [7] > > > > Below is a simplified DTS of an Atheros device with the NVMEM layout: > > > > > > ð0 { > > nvmem-cells = <&macaddr_art_0>; > > nvmem-cell-names = "mac-address"; > > }; > > > > &wifi { > > nvmem-cells = <&caldata_art_1000>; > > nvmem-cell-names = "calibration"; > > }; > > > > &spi { > > status = "okay"; > > > > flash@0 { > > compatible = "jedec,spi-nor"; > > > > partitions { > > compatible = "fixed-partitions"; > > > > partition@ff0000 { > > label = "art"; > > reg = <0xff0000 0x010000>; > > read-only; > > > > nvmem-layout { > > compatible = "fixed-layout"; > > > > macaddr_art_0: macaddr@0 { > > reg = <0x0 0x6>; > > }; > > > > caldata_art_1000: caldata@1000 { > > reg = <0x1000 0x440>; > > }; > > }; > > }; > > }; > > }; > > }; In this example, can you walk me through the probe attempts/successes of the nvmem supplier and it's consumers and at what point does fw_devlink makes the wrong decision? You also said fw_devlink depends on the order in which devices probe to work correctly. This is definitely not the case/intention. So, if you can show me an example of that, I'll fix that too. I'm fairly certain this example will end up being a case of the nvmem framework creating pointless devices or using a "bus" when it needs a "class". But I don't want to assume. I'm asking all these question to understand your case better and maybe suggest a better fix that doesn't break fw_devlink or effectively disable it. Thanks, Saravana > > > > > > When the DTS is written this way, not only is there a different node > > involved for NVMEM, but this node is a new node that is yet another > > descendant in the tree. In other words, the "partition@ff0000" node > > used to be what designated this device as the NVMEM provider > > with the "nvmem-cells" compatible, so the node that represents > > the actual probing device is now 4 parent nodes up in the tree > > in this example instead of 3 parent nodes up in the tree as before. > > > > For the past year, and even before the "fixed-layout" issue came up, > > I had been experimenting with a way to handle these reverse probe order > > and linking of distant descendant node issues in a generic way instead of > > naming exceptions with the compatible string, and this series is the > > culmination of those efforts. It may look strange at first, > > but there are a myriad set of cases to handle and other reasons > > that led me to develop this approach of using existing bad device links > > in order to find the correct fwnode to link to, and then redo > > the relevant links for that consumer device altogether. > > I'm concerned that doing this another way would be a much more massive > > project that would basically rewrite what the fw_devlink feature does. > > Or perhaps there would have to be a new, third form of device links > > that would be a "placeholder" before it becomes a fwnode link. > > Eventually, I came to the conclusion that > > there simply is not enough information to form the correct fwnode link > > before the real supplier device has a successful probe. > > > > Some of the changes proposed here are made on the extreme side of caution, > > for example, checking for null dereference when it might not be necessary, > > and reducing the activity of some functions in order to reduce > > the amount of assumptions taking place in the middle of driver core > > in cases where the new functions proposed here handles that just as well > > and closer to a possible probe defer event > > (e.g. not declaring a fwnode as NOT_DEVICE before > > a probe attempt is expected to have happened). > > > > I have tried to make the commit messages as self-explanatory as I can, > > but they might have ended up a little too verbose, and therefore confusing > > but I'm ready to explain whatever has not been explained well already. > > Lastly, this is my first attempt at sending a larger change to the kernel, > > so I appreciate your time and patience very much. > > > > MCP > > > > > > [1] https://lore.kernel.org/lkml/20230207014207.1678715-1-saravanak@google.com/ > > > > [2] bcdf0315a61a29eb753a607d3a85a4032de72d94 > > > > [3] 3a2dbc510c437ca392516b0105bad8e7970e6614 > > > > [4] 411c0d58ca6faa9bc4b9f5382118a31c7bb92a6f > > > > [5] fb42378dcc7f247df56f0ecddfdae85487495fbc > > > > [6] 27f699e578b1a72df89dfa3bc42e093a01dc8d10 > > > > [7] https://github.com/openwrt/openwrt/commit/703d667a0cdf6dfa402c08e72d0c77a257ca5009 > > > > > > Michael Pratt (4): > > driver core: fw_devlink: Use driver to determine probe ability > > driver core: fw_devlink: Link to supplier ancestor if no device > > driver core: fw_devlink: Add function device_links_fixup_suppliers() > > mtd: mtdpart: Allow fwnode links to NVMEM compatible fwnodes > > > > drivers/base/base.h | 1 + > > drivers/base/core.c | 144 ++++++++++++++++++++++++++++++++++++++--- > > drivers/base/dd.c | 2 + > > drivers/mtd/mtdpart.c | 10 --- > > include/linux/fwnode.h | 4 ++ > > 5 files changed, 143 insertions(+), 18 deletions(-) > > > > > > base-commit: b0d326da462e20285236e11e4cbc32085de9f363 > > -- > > 2.30.2 > > > >