Message ID | f2a14edb5761d372ec939ccbea4fb8dfd1fdab91.1731685185.git.pnewman@connecttech.com |
---|---|
State | New |
Headers | show |
Series | net: stmmac: dwmac-tegra: Read iommu stream id from device tree | expand |
On 11/15/24 17:31, Parker Newman wrote: > From: Parker Newman <pnewman@connecttech.com> > > Read the iommu stream id from device tree rather than hard coding to mgbe0. > Fixes kernel panics when using mgbe controllers other than mgbe0. It's better to include the full Oops backtrace, possibly decoded. > Tested with Orin AGX 64GB module on Connect Tech Forge carrier board. Since this looks like a fix, you should include a suitable 'Fixes' tag here, and specify the 'net' target tree in the subj prefix. > @@ -241,6 +243,12 @@ static int tegra_mgbe_probe(struct platform_device *pdev) > if (IS_ERR(mgbe->xpcs)) > return PTR_ERR(mgbe->xpcs); > > + /* get controller's stream id from iommu property in device tree */ > + if (!tegra_dev_iommu_get_stream_id(mgbe->dev, &mgbe->iommu_sid)) { > + dev_err(mgbe->dev, "failed to get iommu stream id\n"); > + return -EINVAL; > + } I *think* it would be better to fallback (possibly with a warning or notice) to the previous default value when the device tree property is not available, to avoid regressions. Thanks, Paolo
On Fri, 15 Nov 2024 18:17:07 +0100 Paolo Abeni <pabeni@redhat.com> wrote: > On 11/15/24 17:31, Parker Newman wrote: > > From: Parker Newman <pnewman@connecttech.com> > > > > Read the iommu stream id from device tree rather than hard coding to mgbe0. > > Fixes kernel panics when using mgbe controllers other than mgbe0. > > It's better to include the full Oops backtrace, possibly decoded. > Will do, there are many different ones but I can add the most common. > > Tested with Orin AGX 64GB module on Connect Tech Forge carrier board. > > Since this looks like a fix, you should include a suitable 'Fixes' tag > here, and specify the 'net' target tree in the subj prefix. > Sorry I missed the "net" tag. The bug has existed since dwmac-tegra.c was added. I can add a Fixes tag but in the past I was told they aren't needed in that situation? > > @@ -241,6 +243,12 @@ static int tegra_mgbe_probe(struct platform_device *pdev) > > if (IS_ERR(mgbe->xpcs)) > > return PTR_ERR(mgbe->xpcs); > > > > + /* get controller's stream id from iommu property in device tree */ > > + if (!tegra_dev_iommu_get_stream_id(mgbe->dev, &mgbe->iommu_sid)) { > > + dev_err(mgbe->dev, "failed to get iommu stream id\n"); > > + return -EINVAL; > > + } > > I *think* it would be better to fallback (possibly with a warning or > notice) to the previous default value when the device tree property is > not available, to avoid regressions. > I debated this as well... In theory the iommu must be setup for the mgbe controller to work anyways. Doing it this way means the worst case is probe() fails and you lose an ethernet port. Having it fall back to mgbe0's SID adds the risk of the entire system crashing. I can see arguments for both methods. I can add the fallback to mgbe0's SID and change the message to a warning when I send V2 if you like. Thanks! Parker > Thanks, > > Paolo >
On Fri, Nov 15, 2024 at 01:59:40PM -0500, Parker Newman wrote: > On Fri, 15 Nov 2024 18:17:07 +0100 > Paolo Abeni <pabeni@redhat.com> wrote: > > > On 11/15/24 17:31, Parker Newman wrote: > > > From: Parker Newman <pnewman@connecttech.com> > > > > > > Read the iommu stream id from device tree rather than hard coding to mgbe0. > > > Fixes kernel panics when using mgbe controllers other than mgbe0. > > > > It's better to include the full Oops backtrace, possibly decoded. > > > > Will do, there are many different ones but I can add the most common. > > > > Tested with Orin AGX 64GB module on Connect Tech Forge carrier board. > > > > Since this looks like a fix, you should include a suitable 'Fixes' tag > > here, and specify the 'net' target tree in the subj prefix. > > > > Sorry I missed the "net" tag. > > The bug has existed since dwmac-tegra.c was added. I can add a Fixes tag but > in the past I was told they aren't needed in that situation? > > > > @@ -241,6 +243,12 @@ static int tegra_mgbe_probe(struct platform_device *pdev) > > > if (IS_ERR(mgbe->xpcs)) > > > return PTR_ERR(mgbe->xpcs); > > > > > > + /* get controller's stream id from iommu property in device tree */ > > > + if (!tegra_dev_iommu_get_stream_id(mgbe->dev, &mgbe->iommu_sid)) { > > > + dev_err(mgbe->dev, "failed to get iommu stream id\n"); > > > + return -EINVAL; > > > + } > > > > I *think* it would be better to fallback (possibly with a warning or > > notice) to the previous default value when the device tree property is > > not available, to avoid regressions. > > > > I debated this as well... In theory the iommu must be setup for the > mgbe controller to work anyways. Doing it this way means the worst case is > probe() fails and you lose an ethernet port. New DT properties are always optional. Take the example of a board only using a single controller. It should happily work. It probably does not have this property because it is not needed. Your change is likely to cause a regression on such a board. Also, is a binding patch needed? Andrew
On Sat, 16 Nov 2024 20:22:53 +0100 Andrew Lunn <andrew@lunn.ch> wrote: > On Fri, Nov 15, 2024 at 01:59:40PM -0500, Parker Newman wrote: > > On Fri, 15 Nov 2024 18:17:07 +0100 > > Paolo Abeni <pabeni@redhat.com> wrote: > > > > > On 11/15/24 17:31, Parker Newman wrote: > > > > From: Parker Newman <pnewman@connecttech.com> > > > > > > > > Read the iommu stream id from device tree rather than hard coding to mgbe0. > > > > Fixes kernel panics when using mgbe controllers other than mgbe0. > > > > > > It's better to include the full Oops backtrace, possibly decoded. > > > > > > > Will do, there are many different ones but I can add the most common. > > > > > > Tested with Orin AGX 64GB module on Connect Tech Forge carrier board. > > > > > > Since this looks like a fix, you should include a suitable 'Fixes' tag > > > here, and specify the 'net' target tree in the subj prefix. > > > > > > > Sorry I missed the "net" tag. > > > > The bug has existed since dwmac-tegra.c was added. I can add a Fixes tag but > > in the past I was told they aren't needed in that situation? > > > > > > @@ -241,6 +243,12 @@ static int tegra_mgbe_probe(struct platform_device *pdev) > > > > if (IS_ERR(mgbe->xpcs)) > > > > return PTR_ERR(mgbe->xpcs); > > > > > > > > + /* get controller's stream id from iommu property in device tree */ > > > > + if (!tegra_dev_iommu_get_stream_id(mgbe->dev, &mgbe->iommu_sid)) { > > > > + dev_err(mgbe->dev, "failed to get iommu stream id\n"); > > > > + return -EINVAL; > > > > + } > > > > > > I *think* it would be better to fallback (possibly with a warning or > > > notice) to the previous default value when the device tree property is > > > not available, to avoid regressions. > > > > > > > I debated this as well... In theory the iommu must be setup for the > > mgbe controller to work anyways. Doing it this way means the worst case is > > probe() fails and you lose an ethernet port. > > New DT properties are always optional. Take the example of a board > only using a single controller. It should happily work. It probably > does not have this property because it is not needed. Your change is > likely to cause a regression on such a board. > > Also, is a binding patch needed? > > Andrew This is not a new dt property, the "iommus" property is an existing property that is parsed by the Nvidia implementation of the arm-smmu driver. Here is a snippet from the device tree: smmu_niso0: iommu@12000000 { compatible = "nvidia,tegra234-smmu", "nvidia,smmu-500"; ... } /* MGBE0 */ ethernet@6800000 { compatible = "nvidia,tegra234-mgbe"; ... iommus = <&smmu_niso0 TEGRA234_SID_MGBE>; ... } /* MGBE1 */ ethernet@6900000 { compatible = "nvidia,tegra234-mgbe"; ... iommus = <&smmu_niso0 TEGRA234_SID_MGBE_VF1>; ... } The 2nd field of the iommus propert is the "Stream ID" which arm-smmu stores in the device's struct iommu_fwspec *fwspec. This is what the existing tegra_dev_iommu_get_stream_id() function uses to get the SID. If the iommus property is missing completely from the MGBE's device tree node it causes secure read/write errors which spam the kernel log and can cause crashes. I can add the fallback in V2 with a warning if that is preferred. Thanks, Parker
> This is not a new dt property, the "iommus" property is an existing property > that is parsed by the Nvidia implementation of the arm-smmu driver. > > Here is a snippet from the device tree: > > smmu_niso0: iommu@12000000 { > compatible = "nvidia,tegra234-smmu", "nvidia,smmu-500"; > ... > } > > /* MGBE0 */ > ethernet@6800000 { > compatible = "nvidia,tegra234-mgbe"; > ... > iommus = <&smmu_niso0 TEGRA234_SID_MGBE>; > ... > } > > /* MGBE1 */ > ethernet@6900000 { > compatible = "nvidia,tegra234-mgbe"; > ... > iommus = <&smmu_niso0 TEGRA234_SID_MGBE_VF1>; > ... > } What i was meaning does the nvidia,tegra234-mgbe binding allow iommus? I just checked, yes it does. > If the iommus property is missing completely from the MGBE's device tree node it > causes secure read/write errors which spam the kernel log and can cause crashes. > > I can add the fallback in V2 with a warning if that is preferred. The fact it crashed makes me think it is optional. Any existing users must work, otherwise it would crash, and then be debugged. I guess you are pushing the usage further, and so have come across this condition. Is the iommus a SoC property, or a board property? If it is a SoC property, could you review all the SoC .dtsi files and fix up any which are missing the property? Adding a warning is O.K, but ideally the missing property should be added first. The merge window is open now, so patches will need to wait two weeks. Andrew
On Tue, 19 Nov 2024 01:50:18 +0100 Andrew Lunn <andrew@lunn.ch> wrote: > > This is not a new dt property, the "iommus" property is an existing property > > that is parsed by the Nvidia implementation of the arm-smmu driver. > > > > Here is a snippet from the device tree: > > > > smmu_niso0: iommu@12000000 { > > compatible = "nvidia,tegra234-smmu", "nvidia,smmu-500"; > > ... > > } > > > > /* MGBE0 */ > > ethernet@6800000 { > > compatible = "nvidia,tegra234-mgbe"; > > ... > > iommus = <&smmu_niso0 TEGRA234_SID_MGBE>; > > ... > > } > > > > /* MGBE1 */ > > ethernet@6900000 { > > compatible = "nvidia,tegra234-mgbe"; > > ... > > iommus = <&smmu_niso0 TEGRA234_SID_MGBE_VF1>; > > ... > > } > > What i was meaning does the nvidia,tegra234-mgbe binding allow iommus? > I just checked, yes it does. > > > If the iommus property is missing completely from the MGBE's device tree node it > > causes secure read/write errors which spam the kernel log and can cause crashes. > > > > I can add the fallback in V2 with a warning if that is preferred. > > The fact it crashed makes me think it is optional. Any existing users > must work, otherwise it would crash, and then be debugged. I guess you > are pushing the usage further, and so have come across this condition. > > Is the iommus a SoC property, or a board property? If it is a SoC > property, could you review all the SoC .dtsi files and fix up any > which are missing the property? > > Adding a warning is O.K, but ideally the missing property should be > added first. I think there is some confusion here. I will try to summarize: - Ihe iommu is supported by the Tegra SOC. - The way the mgbe driver is written the iommu DT property is REQUIRED. - "iommus" is a SOC DT property and is defined in tegra234.dtsi. - The mgbe device tree nodes in tegra234.dtsi DO have the iommus property. - There are no device tree changes required to to make this patch work. - This patch works fine with existing device trees. I will add the fallback however in case there is changes made to the iommu subsystem in the future. > The merge window is open now, so patches will need to wait two weeks. > Ok thanks, I will wait a couple weeks to resend. Parker
> I think there is some confusion here. I will try to summarize: > - Ihe iommu is supported by the Tegra SOC. > - The way the mgbe driver is written the iommu DT property is REQUIRED. If it is required, please also include a patch to nvidia,tegra234-mgbe.yaml and make iommus required. > - "iommus" is a SOC DT property and is defined in tegra234.dtsi. > - The mgbe device tree nodes in tegra234.dtsi DO have the iommus property. > - There are no device tree changes required to to make this patch work. > - This patch works fine with existing device trees. > > I will add the fallback however in case there is changes made to the iommu > subsystem in the future. I would suggest you make iommus a required property and run the tests over the existing .dts files. I looked at the history of tegra234.dtsi. The ethernet nodes were added in: 610cdf3186bc604961bf04851e300deefd318038 Author: Thierry Reding <treding@nvidia.com> Date: Thu Jul 7 09:48:15 2022 +0200 arm64: tegra: Add MGBE nodes on Tegra234 and the iommus property is present. So the requires is safe. Please expand the commit message. It is clear from all the questions and backwards and forwards, it does not provide enough details. I just have one open issue. The code has been like this for over 2 years. Why has it only now started crashing? Andrew
On Tue, 19 Nov 2024 20:18:00 +0100 Andrew Lunn <andrew@lunn.ch> wrote: > > I think there is some confusion here. I will try to summarize: > > - Ihe iommu is supported by the Tegra SOC. > > - The way the mgbe driver is written the iommu DT property is REQUIRED. > > If it is required, please also include a patch to > nvidia,tegra234-mgbe.yaml and make iommus required. > I will add this when I submit a v2 of the patch. > > - "iommus" is a SOC DT property and is defined in tegra234.dtsi. > > - The mgbe device tree nodes in tegra234.dtsi DO have the iommus property. > > - There are no device tree changes required to to make this patch work. > > - This patch works fine with existing device trees. > > > > I will add the fallback however in case there is changes made to the iommu > > subsystem in the future. > > I would suggest you make iommus a required property and run the tests > over the existing .dts files. > > I looked at the history of tegra234.dtsi. The ethernet nodes were > added in: > > 610cdf3186bc604961bf04851e300deefd318038 > Author: Thierry Reding <treding@nvidia.com> > Date: Thu Jul 7 09:48:15 2022 +0200 > > arm64: tegra: Add MGBE nodes on Tegra234 > > and the iommus property is present. So the requires is safe. > > Please expand the commit message. It is clear from all the questions > and backwards and forwards, it does not provide enough details. > I will add more details when I submit V2. > I just have one open issue. The code has been like this for over 2 > years. Why has it only now started crashing? > It is rare for Nvidia Jetson users to use the mainline kernel. Nvidia provides a custom kernel package with many out of tree drivers including a driver for the mgbe controllers. Also, while the Orin AGX SOC (tegra234) has 4 instances of the mgbe controller, the Nvidia Orin AGX devkit only uses mgbe0. Connect Tech has carrier boards that use 2 or more of the mgbe controllers which is why we found the bug. Thanks, Parker
On Tue, Nov 19, 2024 at 02:47:29PM -0500, Parker Newman wrote: > On Tue, 19 Nov 2024 20:18:00 +0100 > Andrew Lunn <andrew@lunn.ch> wrote: > > > > I think there is some confusion here. I will try to summarize: > > > - Ihe iommu is supported by the Tegra SOC. > > > - The way the mgbe driver is written the iommu DT property is REQUIRED. > > > > If it is required, please also include a patch to > > nvidia,tegra234-mgbe.yaml and make iommus required. > > > > I will add this when I submit a v2 of the patch. > > > > - "iommus" is a SOC DT property and is defined in tegra234.dtsi. > > > - The mgbe device tree nodes in tegra234.dtsi DO have the iommus property. > > > - There are no device tree changes required to to make this patch work. > > > - This patch works fine with existing device trees. > > > > > > I will add the fallback however in case there is changes made to the iommu > > > subsystem in the future. > > > > I would suggest you make iommus a required property and run the tests > > over the existing .dts files. > > > > I looked at the history of tegra234.dtsi. The ethernet nodes were > > added in: > > > > 610cdf3186bc604961bf04851e300deefd318038 > > Author: Thierry Reding <treding@nvidia.com> > > Date: Thu Jul 7 09:48:15 2022 +0200 > > > > arm64: tegra: Add MGBE nodes on Tegra234 > > > > and the iommus property is present. So the requires is safe. > > > > Please expand the commit message. It is clear from all the questions > > and backwards and forwards, it does not provide enough details. > > > > I will add more details when I submit V2. > > > I just have one open issue. The code has been like this for over 2 > > years. Why has it only now started crashing? > > > > It is rare for Nvidia Jetson users to use the mainline kernel. Nvidia > provides a custom kernel package with many out of tree drivers including a > driver for the mgbe controllers. > > Also, while the Orin AGX SOC (tegra234) has 4 instances of the mgbe controller, > the Nvidia Orin AGX devkit only uses mgbe0. Connect Tech has carrier boards > that use 2 or more of the mgbe controllers which is why we found the bug. Correct. Also, this was a really stupid thing that I overlooked. I don't recall the exact circumstances, but I vaguely recall there had been discussions about adding the tegra_dev_iommu_get_stream_id() helper (that this patch uses) around the time that this driver was created. In the midst of all of this I likely forgot to update the driver after the discussions had settled. Anyway, I agree with the conclusion that we don't need a compatibility fallback for this, both because it would be actively wrong to do it and we've had the required IOMMU properties in device tree since the start, so there can't be any regressions caused by this. I don't think it's necessary to make the iommus property required, though, because there's technically no requirement for these devices to be attached to an IOMMU. They usually are, and it's better if they are, but they should be able to work correctly without an IOMMU. Thanks, and apologies for dropping the ball on this, Thierry
On Wed, 4 Dec 2024 17:23:53 +0100 Thierry Reding <thierry.reding@gmail.com> wrote: > On Tue, Nov 19, 2024 at 02:47:29PM -0500, Parker Newman wrote: > > On Tue, 19 Nov 2024 20:18:00 +0100 > > Andrew Lunn <andrew@lunn.ch> wrote: > > > > > > I think there is some confusion here. I will try to summarize: > > > > - Ihe iommu is supported by the Tegra SOC. > > > > - The way the mgbe driver is written the iommu DT property is REQUIRED. > > > > > > If it is required, please also include a patch to > > > nvidia,tegra234-mgbe.yaml and make iommus required. > > > > > > > I will add this when I submit a v2 of the patch. > > > > > > - "iommus" is a SOC DT property and is defined in tegra234.dtsi. > > > > - The mgbe device tree nodes in tegra234.dtsi DO have the iommus property. > > > > - There are no device tree changes required to to make this patch work. > > > > - This patch works fine with existing device trees. > > > > > > > > I will add the fallback however in case there is changes made to the iommu > > > > subsystem in the future. > > > > > > I would suggest you make iommus a required property and run the tests > > > over the existing .dts files. > > > > > > I looked at the history of tegra234.dtsi. The ethernet nodes were > > > added in: > > > > > > 610cdf3186bc604961bf04851e300deefd318038 > > > Author: Thierry Reding <treding@nvidia.com> > > > Date: Thu Jul 7 09:48:15 2022 +0200 > > > > > > arm64: tegra: Add MGBE nodes on Tegra234 > > > > > > and the iommus property is present. So the requires is safe. > > > > > > Please expand the commit message. It is clear from all the questions > > > and backwards and forwards, it does not provide enough details. > > > > > > > I will add more details when I submit V2. > > > > > I just have one open issue. The code has been like this for over 2 > > > years. Why has it only now started crashing? > > > > > > > It is rare for Nvidia Jetson users to use the mainline kernel. Nvidia > > provides a custom kernel package with many out of tree drivers including a > > driver for the mgbe controllers. > > > > Also, while the Orin AGX SOC (tegra234) has 4 instances of the mgbe controller, > > the Nvidia Orin AGX devkit only uses mgbe0. Connect Tech has carrier boards > > that use 2 or more of the mgbe controllers which is why we found the bug. > > Correct. Also, this was a really stupid thing that I overlooked. I don't > recall the exact circumstances, but I vaguely recall there had been > discussions about adding the tegra_dev_iommu_get_stream_id() helper > (that this patch uses) around the time that this driver was created. In > the midst of all of this I likely forgot to update the driver after the > discussions had settled. > > Anyway, I agree with the conclusion that we don't need a compatibility > fallback for this, both because it would be actively wrong to do it and > we've had the required IOMMU properties in device tree since the start, > so there can't be any regressions caused by this. > > I don't think it's necessary to make the iommus property required, > though, because there's technically no requirement for these devices to > be attached to an IOMMU. They usually are, and it's better if they are, > but they should be able to work correctly without an IOMMU. > Thanks for confirming from the Nvidia side! I wasn't sure if they would work without the iommu. That said, if you did NOT want to use the iommu and removed the iommu DT property then the probe will fail after my patch. Would we not need a guard around the writes to MGBE_WRAP_AXI_ASID0_CTRL as well? Thanks, Parker > Thanks, and apologies for dropping the ball on this, > Thierry
On Wed, Dec 04, 2024 at 11:53:17AM -0500, Parker Newman wrote: > On Wed, 4 Dec 2024 17:23:53 +0100 > Thierry Reding <thierry.reding@gmail.com> wrote: > > > On Tue, Nov 19, 2024 at 02:47:29PM -0500, Parker Newman wrote: > > > On Tue, 19 Nov 2024 20:18:00 +0100 > > > Andrew Lunn <andrew@lunn.ch> wrote: > > > > > > > > I think there is some confusion here. I will try to summarize: > > > > > - Ihe iommu is supported by the Tegra SOC. > > > > > - The way the mgbe driver is written the iommu DT property is REQUIRED. > > > > > > > > If it is required, please also include a patch to > > > > nvidia,tegra234-mgbe.yaml and make iommus required. > > > > > > > > > > I will add this when I submit a v2 of the patch. > > > > > > > > - "iommus" is a SOC DT property and is defined in tegra234.dtsi. > > > > > - The mgbe device tree nodes in tegra234.dtsi DO have the iommus property. > > > > > - There are no device tree changes required to to make this patch work. > > > > > - This patch works fine with existing device trees. > > > > > > > > > > I will add the fallback however in case there is changes made to the iommu > > > > > subsystem in the future. > > > > > > > > I would suggest you make iommus a required property and run the tests > > > > over the existing .dts files. > > > > > > > > I looked at the history of tegra234.dtsi. The ethernet nodes were > > > > added in: > > > > > > > > 610cdf3186bc604961bf04851e300deefd318038 > > > > Author: Thierry Reding <treding@nvidia.com> > > > > Date: Thu Jul 7 09:48:15 2022 +0200 > > > > > > > > arm64: tegra: Add MGBE nodes on Tegra234 > > > > > > > > and the iommus property is present. So the requires is safe. > > > > > > > > Please expand the commit message. It is clear from all the questions > > > > and backwards and forwards, it does not provide enough details. > > > > > > > > > > I will add more details when I submit V2. > > > > > > > I just have one open issue. The code has been like this for over 2 > > > > years. Why has it only now started crashing? > > > > > > > > > > It is rare for Nvidia Jetson users to use the mainline kernel. Nvidia > > > provides a custom kernel package with many out of tree drivers including a > > > driver for the mgbe controllers. > > > > > > Also, while the Orin AGX SOC (tegra234) has 4 instances of the mgbe controller, > > > the Nvidia Orin AGX devkit only uses mgbe0. Connect Tech has carrier boards > > > that use 2 or more of the mgbe controllers which is why we found the bug. > > > > Correct. Also, this was a really stupid thing that I overlooked. I don't > > recall the exact circumstances, but I vaguely recall there had been > > discussions about adding the tegra_dev_iommu_get_stream_id() helper > > (that this patch uses) around the time that this driver was created. In > > the midst of all of this I likely forgot to update the driver after the > > discussions had settled. > > > > Anyway, I agree with the conclusion that we don't need a compatibility > > fallback for this, both because it would be actively wrong to do it and > > we've had the required IOMMU properties in device tree since the start, > > so there can't be any regressions caused by this. > > > > I don't think it's necessary to make the iommus property required, > > though, because there's technically no requirement for these devices to > > be attached to an IOMMU. They usually are, and it's better if they are, > > but they should be able to work correctly without an IOMMU. > > > Thanks for confirming from the Nvidia side! I wasn't sure if they would > work without the iommu. That said, if you did NOT want to use the iommu > and removed the iommu DT property then the probe will fail after my patch. > Would we not need a guard around the writes to MGBE_WRAP_AXI_ASID0_CTRL as well? Well... frankly, I don't know. There's an override in the memory controller which we set when a device is attached to an IOMMU. That's usually how the stream ID is programmed. If we don't do that it will typically default to a special passthrough stream ID (technically the firmware can lock down those register and force them to a specific stream ID all the time). I'm not sure what exactly the impact is of these ASID registers (there's a few other instances where those are needed). They are required if you want to use the IOMMU, but I don't know what their meaning is if the IOMMU is not enabled. There's also different cases: the IOMMU can be disabled altogether, in which case no page tables and such exist for any device and no translation should happen whatsoever. But there's also the case where an IOMMU is enabled, but certain devices shouldn't attach to them. I should make it very clear that both of these are not recommended setups and I don't know if they'll work. And they are mostly untested. I've been meaning, but haven't found the time, to test some of these cases. Thierry
On Wed, 4 Dec 2024 19:06:30 +0100 Thierry Reding <thierry.reding@gmail.com> wrote: > On Wed, Dec 04, 2024 at 11:53:17AM -0500, Parker Newman wrote: > > On Wed, 4 Dec 2024 17:23:53 +0100 > > Thierry Reding <thierry.reding@gmail.com> wrote: > > > > > On Tue, Nov 19, 2024 at 02:47:29PM -0500, Parker Newman wrote: > > > > On Tue, 19 Nov 2024 20:18:00 +0100 > > > > Andrew Lunn <andrew@lunn.ch> wrote: > > > > > > > > > > I think there is some confusion here. I will try to summarize: > > > > > > - Ihe iommu is supported by the Tegra SOC. > > > > > > - The way the mgbe driver is written the iommu DT property is REQUIRED. > > > > > > > > > > If it is required, please also include a patch to > > > > > nvidia,tegra234-mgbe.yaml and make iommus required. > > > > > > > > > > > > > I will add this when I submit a v2 of the patch. > > > > > > > > > > - "iommus" is a SOC DT property and is defined in tegra234.dtsi. > > > > > > - The mgbe device tree nodes in tegra234.dtsi DO have the iommus property. > > > > > > - There are no device tree changes required to to make this patch work. > > > > > > - This patch works fine with existing device trees. > > > > > > > > > > > > I will add the fallback however in case there is changes made to the iommu > > > > > > subsystem in the future. > > > > > > > > > > I would suggest you make iommus a required property and run the tests > > > > > over the existing .dts files. > > > > > > > > > > I looked at the history of tegra234.dtsi. The ethernet nodes were > > > > > added in: > > > > > > > > > > 610cdf3186bc604961bf04851e300deefd318038 > > > > > Author: Thierry Reding <treding@nvidia.com> > > > > > Date: Thu Jul 7 09:48:15 2022 +0200 > > > > > > > > > > arm64: tegra: Add MGBE nodes on Tegra234 > > > > > > > > > > and the iommus property is present. So the requires is safe. > > > > > > > > > > Please expand the commit message. It is clear from all the questions > > > > > and backwards and forwards, it does not provide enough details. > > > > > > > > > > > > > I will add more details when I submit V2. > > > > > > > > > I just have one open issue. The code has been like this for over 2 > > > > > years. Why has it only now started crashing? > > > > > > > > > > > > > It is rare for Nvidia Jetson users to use the mainline kernel. Nvidia > > > > provides a custom kernel package with many out of tree drivers including a > > > > driver for the mgbe controllers. > > > > > > > > Also, while the Orin AGX SOC (tegra234) has 4 instances of the mgbe controller, > > > > the Nvidia Orin AGX devkit only uses mgbe0. Connect Tech has carrier boards > > > > that use 2 or more of the mgbe controllers which is why we found the bug. > > > > > > Correct. Also, this was a really stupid thing that I overlooked. I don't > > > recall the exact circumstances, but I vaguely recall there had been > > > discussions about adding the tegra_dev_iommu_get_stream_id() helper > > > (that this patch uses) around the time that this driver was created. In > > > the midst of all of this I likely forgot to update the driver after the > > > discussions had settled. > > > > > > Anyway, I agree with the conclusion that we don't need a compatibility > > > fallback for this, both because it would be actively wrong to do it and > > > we've had the required IOMMU properties in device tree since the start, > > > so there can't be any regressions caused by this. > > > > > > I don't think it's necessary to make the iommus property required, > > > though, because there's technically no requirement for these devices to > > > be attached to an IOMMU. They usually are, and it's better if they are, > > > but they should be able to work correctly without an IOMMU. > > > > > Thanks for confirming from the Nvidia side! I wasn't sure if they would > > work without the iommu. That said, if you did NOT want to use the iommu > > and removed the iommu DT property then the probe will fail after my patch. > > Would we not need a guard around the writes to MGBE_WRAP_AXI_ASID0_CTRL as well? > > Well... frankly, I don't know. There's an override in the memory > controller which we set when a device is attached to an IOMMU. That's > usually how the stream ID is programmed. If we don't do that it will > typically default to a special passthrough stream ID (technically the > firmware can lock down those register and force them to a specific > stream ID all the time). I'm not sure what exactly the impact is of > these ASID registers (there's a few other instances where those are > needed). They are required if you want to use the IOMMU, but I don't > know what their meaning is if the IOMMU is not enabled. > > There's also different cases: the IOMMU can be disabled altogether, in > which case no page tables and such exist for any device and no > translation should happen whatsoever. But there's also the case where an > IOMMU is enabled, but certain devices shouldn't attach to them. I should > make it very clear that both of these are not recommended setups and I > don't know if they'll work. And they are mostly untested. I've been > meaning, but haven't found the time, to test some of these cases. > Yes I agree, all of those situations are very niche and not recommended for a Tegra platform that should always have the iommu enabled anyways. Adding an iommu bypass would probably be outside of the scope of this patch. I will not add a patch marking the iommu as required in the device tree bindings when I submit v2 unless anyone else feels different. Thanks again, Parker
On Fri, Dec 06, 2024 at 08:42:03AM -0500, Parker Newman wrote: > On Wed, 4 Dec 2024 19:06:30 +0100 > Thierry Reding <thierry.reding@gmail.com> wrote: > > > On Wed, Dec 04, 2024 at 11:53:17AM -0500, Parker Newman wrote: > > > On Wed, 4 Dec 2024 17:23:53 +0100 > > > Thierry Reding <thierry.reding@gmail.com> wrote: > > > > > > > On Tue, Nov 19, 2024 at 02:47:29PM -0500, Parker Newman wrote: > > > > > On Tue, 19 Nov 2024 20:18:00 +0100 > > > > > Andrew Lunn <andrew@lunn.ch> wrote: > > > > > > > > > > > > I think there is some confusion here. I will try to summarize: > > > > > > > - Ihe iommu is supported by the Tegra SOC. > > > > > > > - The way the mgbe driver is written the iommu DT property is REQUIRED. > > > > > > > > > > > > If it is required, please also include a patch to > > > > > > nvidia,tegra234-mgbe.yaml and make iommus required. > > > > > > > > > > > > > > > > I will add this when I submit a v2 of the patch. > > > > > > > > > > > > - "iommus" is a SOC DT property and is defined in tegra234.dtsi. > > > > > > > - The mgbe device tree nodes in tegra234.dtsi DO have the iommus property. > > > > > > > - There are no device tree changes required to to make this patch work. > > > > > > > - This patch works fine with existing device trees. > > > > > > > > > > > > > > I will add the fallback however in case there is changes made to the iommu > > > > > > > subsystem in the future. > > > > > > > > > > > > I would suggest you make iommus a required property and run the tests > > > > > > over the existing .dts files. > > > > > > > > > > > > I looked at the history of tegra234.dtsi. The ethernet nodes were > > > > > > added in: > > > > > > > > > > > > 610cdf3186bc604961bf04851e300deefd318038 > > > > > > Author: Thierry Reding <treding@nvidia.com> > > > > > > Date: Thu Jul 7 09:48:15 2022 +0200 > > > > > > > > > > > > arm64: tegra: Add MGBE nodes on Tegra234 > > > > > > > > > > > > and the iommus property is present. So the requires is safe. > > > > > > > > > > > > Please expand the commit message. It is clear from all the questions > > > > > > and backwards and forwards, it does not provide enough details. > > > > > > > > > > > > > > > > I will add more details when I submit V2. > > > > > > > > > > > I just have one open issue. The code has been like this for over 2 > > > > > > years. Why has it only now started crashing? > > > > > > > > > > > > > > > > It is rare for Nvidia Jetson users to use the mainline kernel. Nvidia > > > > > provides a custom kernel package with many out of tree drivers including a > > > > > driver for the mgbe controllers. > > > > > > > > > > Also, while the Orin AGX SOC (tegra234) has 4 instances of the mgbe controller, > > > > > the Nvidia Orin AGX devkit only uses mgbe0. Connect Tech has carrier boards > > > > > that use 2 or more of the mgbe controllers which is why we found the bug. > > > > > > > > Correct. Also, this was a really stupid thing that I overlooked. I don't > > > > recall the exact circumstances, but I vaguely recall there had been > > > > discussions about adding the tegra_dev_iommu_get_stream_id() helper > > > > (that this patch uses) around the time that this driver was created. In > > > > the midst of all of this I likely forgot to update the driver after the > > > > discussions had settled. > > > > > > > > Anyway, I agree with the conclusion that we don't need a compatibility > > > > fallback for this, both because it would be actively wrong to do it and > > > > we've had the required IOMMU properties in device tree since the start, > > > > so there can't be any regressions caused by this. > > > > > > > > I don't think it's necessary to make the iommus property required, > > > > though, because there's technically no requirement for these devices to > > > > be attached to an IOMMU. They usually are, and it's better if they are, > > > > but they should be able to work correctly without an IOMMU. > > > > > > > Thanks for confirming from the Nvidia side! I wasn't sure if they would > > > work without the iommu. That said, if you did NOT want to use the iommu > > > and removed the iommu DT property then the probe will fail after my patch. > > > Would we not need a guard around the writes to MGBE_WRAP_AXI_ASID0_CTRL as well? > > > > Well... frankly, I don't know. There's an override in the memory > > controller which we set when a device is attached to an IOMMU. That's > > usually how the stream ID is programmed. If we don't do that it will > > typically default to a special passthrough stream ID (technically the > > firmware can lock down those register and force them to a specific > > stream ID all the time). I'm not sure what exactly the impact is of > > these ASID registers (there's a few other instances where those are > > needed). They are required if you want to use the IOMMU, but I don't > > know what their meaning is if the IOMMU is not enabled. > > > > There's also different cases: the IOMMU can be disabled altogether, in > > which case no page tables and such exist for any device and no > > translation should happen whatsoever. But there's also the case where an > > IOMMU is enabled, but certain devices shouldn't attach to them. I should > > make it very clear that both of these are not recommended setups and I > > don't know if they'll work. And they are mostly untested. I've been > > meaning, but haven't found the time, to test some of these cases. > > > > Yes I agree, all of those situations are very niche and not recommended for > a Tegra platform that should always have the iommu enabled anyways. Adding an > iommu bypass would probably be outside of the scope of this patch. > > I will not add a patch marking the iommu as required in the device tree > bindings when I submit v2 unless anyone else feels different. I was able to find a bit more information on this. Starting with Tegra234 it's usually no longer possible to even enable bypass. It can still be done, but it needs to be carefully coordinated between the secure bootloader/firmware and the OS. Basically the secure firmware can lock down the ability to bypass the IOMMU. If the firmware was configured to allow bypass, the driver can then do so by writing the special stream ID 0x7f into the stream ID register. On these newer chips the memory controller override no longer has any effect and writing the per-device stream ID registers is the only way to attach to the IOMMU. There's still the case where you can disable the IOMMU altogether, in which case the IOMMU will still be bypassed, no matter what the firmware did. My understanding is that it doesn't matter in those cases whether we write the stream ID registers or not, they will simply get ignored. With one exception perhaps being the bypass SID. If you write that, then there's a protection mechanism that kicks in. Well, after all this this still isn't entirely clear to me, but I think what it means in a nutshell is that a) we'll want to keep the IOMMU always on for security and because the firmware is by default configured to not allow bypassing, b) IOMMU isn't strictly required because the IOMMU might be completely disabled and c) we shouldn't need to conditionalize the stream ID register writes. That said, the tegra_dev_iommu_get_stream_id() function returns a bool specifically to support the latter case. So the intention with the design was that drivers would call that function and only need to write the stream ID register if it returns true. That's not always great because you may want (or need) to rewrite the register after suspend/ resume, in which case you probably want to resort to a cached value rather than call that API. On the other hand the API is quite simple, so it could serve as a cache as well. Meh. Thierry
On Fri, 6 Dec 2024 17:01:01 +0100 Thierry Reding <thierry.reding@gmail.com> wrote: > On Fri, Dec 06, 2024 at 08:42:03AM -0500, Parker Newman wrote: > > On Wed, 4 Dec 2024 19:06:30 +0100 > > Thierry Reding <thierry.reding@gmail.com> wrote: > > > > > On Wed, Dec 04, 2024 at 11:53:17AM -0500, Parker Newman wrote: > > > > On Wed, 4 Dec 2024 17:23:53 +0100 > > > > Thierry Reding <thierry.reding@gmail.com> wrote: > > > > > > > > > On Tue, Nov 19, 2024 at 02:47:29PM -0500, Parker Newman wrote: > > > > > > On Tue, 19 Nov 2024 20:18:00 +0100 > > > > > > Andrew Lunn <andrew@lunn.ch> wrote: > > > > > > > > > > > > > > I think there is some confusion here. I will try to summarize: > > > > > > > > - Ihe iommu is supported by the Tegra SOC. > > > > > > > > - The way the mgbe driver is written the iommu DT property is REQUIRED. > > > > > > > > > > > > > > If it is required, please also include a patch to > > > > > > > nvidia,tegra234-mgbe.yaml and make iommus required. > > > > > > > > > > > > > > > > > > > I will add this when I submit a v2 of the patch. > > > > > > > > > > > > > > - "iommus" is a SOC DT property and is defined in tegra234.dtsi. > > > > > > > > - The mgbe device tree nodes in tegra234.dtsi DO have the iommus property. > > > > > > > > - There are no device tree changes required to to make this patch work. > > > > > > > > - This patch works fine with existing device trees. > > > > > > > > > > > > > > > > I will add the fallback however in case there is changes made to the iommu > > > > > > > > subsystem in the future. > > > > > > > > > > > > > > I would suggest you make iommus a required property and run the tests > > > > > > > over the existing .dts files. > > > > > > > > > > > > > > I looked at the history of tegra234.dtsi. The ethernet nodes were > > > > > > > added in: > > > > > > > > > > > > > > 610cdf3186bc604961bf04851e300deefd318038 > > > > > > > Author: Thierry Reding <treding@nvidia.com> > > > > > > > Date: Thu Jul 7 09:48:15 2022 +0200 > > > > > > > > > > > > > > arm64: tegra: Add MGBE nodes on Tegra234 > > > > > > > > > > > > > > and the iommus property is present. So the requires is safe. > > > > > > > > > > > > > > Please expand the commit message. It is clear from all the questions > > > > > > > and backwards and forwards, it does not provide enough details. > > > > > > > > > > > > > > > > > > > I will add more details when I submit V2. > > > > > > > > > > > > > I just have one open issue. The code has been like this for over 2 > > > > > > > years. Why has it only now started crashing? > > > > > > > > > > > > > > > > > > > It is rare for Nvidia Jetson users to use the mainline kernel. Nvidia > > > > > > provides a custom kernel package with many out of tree drivers including a > > > > > > driver for the mgbe controllers. > > > > > > > > > > > > Also, while the Orin AGX SOC (tegra234) has 4 instances of the mgbe controller, > > > > > > the Nvidia Orin AGX devkit only uses mgbe0. Connect Tech has carrier boards > > > > > > that use 2 or more of the mgbe controllers which is why we found the bug. > > > > > > > > > > Correct. Also, this was a really stupid thing that I overlooked. I don't > > > > > recall the exact circumstances, but I vaguely recall there had been > > > > > discussions about adding the tegra_dev_iommu_get_stream_id() helper > > > > > (that this patch uses) around the time that this driver was created. In > > > > > the midst of all of this I likely forgot to update the driver after the > > > > > discussions had settled. > > > > > > > > > > Anyway, I agree with the conclusion that we don't need a compatibility > > > > > fallback for this, both because it would be actively wrong to do it and > > > > > we've had the required IOMMU properties in device tree since the start, > > > > > so there can't be any regressions caused by this. > > > > > > > > > > I don't think it's necessary to make the iommus property required, > > > > > though, because there's technically no requirement for these devices to > > > > > be attached to an IOMMU. They usually are, and it's better if they are, > > > > > but they should be able to work correctly without an IOMMU. > > > > > > > > > Thanks for confirming from the Nvidia side! I wasn't sure if they would > > > > work without the iommu. That said, if you did NOT want to use the iommu > > > > and removed the iommu DT property then the probe will fail after my patch. > > > > Would we not need a guard around the writes to MGBE_WRAP_AXI_ASID0_CTRL as well? > > > > > > Well... frankly, I don't know. There's an override in the memory > > > controller which we set when a device is attached to an IOMMU. That's > > > usually how the stream ID is programmed. If we don't do that it will > > > typically default to a special passthrough stream ID (technically the > > > firmware can lock down those register and force them to a specific > > > stream ID all the time). I'm not sure what exactly the impact is of > > > these ASID registers (there's a few other instances where those are > > > needed). They are required if you want to use the IOMMU, but I don't > > > know what their meaning is if the IOMMU is not enabled. > > > > > > There's also different cases: the IOMMU can be disabled altogether, in > > > which case no page tables and such exist for any device and no > > > translation should happen whatsoever. But there's also the case where an > > > IOMMU is enabled, but certain devices shouldn't attach to them. I should > > > make it very clear that both of these are not recommended setups and I > > > don't know if they'll work. And they are mostly untested. I've been > > > meaning, but haven't found the time, to test some of these cases. > > > > > > > Yes I agree, all of those situations are very niche and not recommended for > > a Tegra platform that should always have the iommu enabled anyways. Adding an > > iommu bypass would probably be outside of the scope of this patch. > > > > I will not add a patch marking the iommu as required in the device tree > > bindings when I submit v2 unless anyone else feels different. > > I was able to find a bit more information on this. Starting with > Tegra234 it's usually no longer possible to even enable bypass. It can > still be done, but it needs to be carefully coordinated between the > secure bootloader/firmware and the OS. Basically the secure firmware > can lock down the ability to bypass the IOMMU. If the firmware was > configured to allow bypass, the driver can then do so by writing the > special stream ID 0x7f into the stream ID register. > > On these newer chips the memory controller override no longer has any > effect and writing the per-device stream ID registers is the only way to > attach to the IOMMU. > > There's still the case where you can disable the IOMMU altogether, in > which case the IOMMU will still be bypassed, no matter what the firmware > did. My understanding is that it doesn't matter in those cases whether > we write the stream ID registers or not, they will simply get ignored. > With one exception perhaps being the bypass SID. If you write that, then > there's a protection mechanism that kicks in. > > Well, after all this this still isn't entirely clear to me, but I think > what it means in a nutshell is that a) we'll want to keep the IOMMU > always on for security and because the firmware is by default configured > to not allow bypassing, b) IOMMU isn't strictly required because the > IOMMU might be completely disabled and c) we shouldn't need to > conditionalize the stream ID register writes. > > That said, the tegra_dev_iommu_get_stream_id() function returns a bool > specifically to support the latter case. So the intention with the > design was that drivers would call that function and only need to write > the stream ID register if it returns true. That's not always great > because you may want (or need) to rewrite the register after suspend/ > resume, in which case you probably want to resort to a cached value > rather than call that API. On the other hand the API is quite simple, so > it could serve as a cache as well. > I guess I could add an IS_ENABLED(CONFIG_IOMMU_API) check if tegra_dev_iommu_get_stream_id() returns false in probe? That would cover if the IOMMU_API is not enabled but not if the Tegra's iommu is disabled in other ways. -Parker
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-tegra.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-tegra.c index 3827997d2132..dc903b846b1b 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-tegra.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-tegra.c @@ -1,4 +1,5 @@ // SPDX-License-Identifier: GPL-2.0-only +#include <linux/iommu.h> #include <linux/platform_device.h> #include <linux/of.h> #include <linux/module.h> @@ -19,6 +20,8 @@ struct tegra_mgbe { struct reset_control *rst_mac; struct reset_control *rst_pcs; + u32 iommu_sid; + void __iomem *hv; void __iomem *regs; void __iomem *xpcs; @@ -50,7 +53,6 @@ struct tegra_mgbe { #define MGBE_WRAP_COMMON_INTR_ENABLE 0x8704 #define MAC_SBD_INTR BIT(2) #define MGBE_WRAP_AXI_ASID0_CTRL 0x8400 -#define MGBE_SID 0x6 static int __maybe_unused tegra_mgbe_suspend(struct device *dev) { @@ -84,7 +86,7 @@ static int __maybe_unused tegra_mgbe_resume(struct device *dev) writel(MAC_SBD_INTR, mgbe->regs + MGBE_WRAP_COMMON_INTR_ENABLE); /* Program SID */ - writel(MGBE_SID, mgbe->hv + MGBE_WRAP_AXI_ASID0_CTRL); + writel(mgbe->iommu_sid, mgbe->hv + MGBE_WRAP_AXI_ASID0_CTRL); value = readl(mgbe->xpcs + XPCS_WRAP_UPHY_STATUS); if ((value & XPCS_WRAP_UPHY_STATUS_TX_P_UP) == 0) { @@ -241,6 +243,12 @@ static int tegra_mgbe_probe(struct platform_device *pdev) if (IS_ERR(mgbe->xpcs)) return PTR_ERR(mgbe->xpcs); + /* get controller's stream id from iommu property in device tree */ + if (!tegra_dev_iommu_get_stream_id(mgbe->dev, &mgbe->iommu_sid)) { + dev_err(mgbe->dev, "failed to get iommu stream id\n"); + return -EINVAL; + } + res.addr = mgbe->regs; res.irq = irq; @@ -346,7 +354,7 @@ static int tegra_mgbe_probe(struct platform_device *pdev) writel(MAC_SBD_INTR, mgbe->regs + MGBE_WRAP_COMMON_INTR_ENABLE); /* Program SID */ - writel(MGBE_SID, mgbe->hv + MGBE_WRAP_AXI_ASID0_CTRL); + writel(mgbe->iommu_sid, mgbe->hv + MGBE_WRAP_AXI_ASID0_CTRL); plat->flags |= STMMAC_FLAG_SERDES_UP_AFTER_PHY_LINKUP;