Message ID | 8b4fa91380fc4754ea80f47330c613e4f6b6592c.1724159867.git.andrea.porta@suse.com |
---|---|
State | New |
Headers | show |
Series | Add support for RaspberryPi RP1 PCI device using a DT overlay | expand |
On Tue, Aug 20, 2024 at 04:36:05PM +0200, Andrea della Porta wrote: > The of_pci_set_address() function parse devicetree PCI range specifier s/parse/parses/ ? > assuming the address is 'sanitized' at the origin, i.e. without checking > whether the incoming address is 32 or 64 bit has specified in the flags. > In this way an address with no OF_PCI_ADDR_SPACE_MEM64 set in the flagss s/flagss/flags/ > could leak through and the upper 32 bits of the address will be set too, > and this violates the PCI specs stating that ion 32 bit address the upper s/ion/in/ > bit should be zero. I don't understand this code, so I'm probably missing something. It looks like the interesting path here is: of_pci_prop_ranges res = &pdev->resource[...]; for (j = 0; j < num; j++) { val64 = res[j].start; of_pci_set_address(..., val64, 0, flags, false); + if (OF_PCI_ADDR_SPACE_MEM64) + prop[1] = upper_32_bits(val64); + else + prop[1] = 0; OF_PCI_ADDR_SPACE_MEM64 tells us about the size of the PCI bus address, but the address (val64) is a CPU physical address, not a PCI bus address, so I don't understand why of_pci_set_address() should use OF_PCI_ADDR_SPACE_MEM64 to clear part of the CPU address. Add blank lines between paragraphs. > This could cause mapping translation mismatch on PCI devices (e.g. RP1) > that are expected to be addressed with a 64 bit address while advertising > a 32 bit address in the PCI config region. > Add a check in of_pci_set_address() to set upper 32 bits to zero in case > the address has no 64 bit flag set. Is this an indication of a DT error? Have you seen this cause a problem? If so, what does it look like to a user? I.e., how could a user find this patch if they saw a problem? > Signed-off-by: Andrea della Porta <andrea.porta@suse.com> > --- > drivers/pci/of_property.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/drivers/pci/of_property.c b/drivers/pci/of_property.c > index 5a0b98e69795..77865facdb4a 100644 > --- a/drivers/pci/of_property.c > +++ b/drivers/pci/of_property.c > @@ -60,7 +60,10 @@ static void of_pci_set_address(struct pci_dev *pdev, u32 *prop, u64 addr, > prop[0] |= flags | reg_num; > if (!reloc) { > prop[0] |= OF_PCI_ADDR_FIELD_NONRELOC; > - prop[1] = upper_32_bits(addr); > + if (FIELD_GET(OF_PCI_ADDR_FIELD_SS, flags) == OF_PCI_ADDR_SPACE_MEM64) > + prop[1] = upper_32_bits(addr); > + else > + prop[1] = 0; > prop[2] = lower_32_bits(addr); > } > } > -- > 2.35.3 >
Hi Bjorn, On 10:24 Wed 21 Aug , Bjorn Helgaas wrote: > On Tue, Aug 20, 2024 at 04:36:05PM +0200, Andrea della Porta wrote: > > The of_pci_set_address() function parse devicetree PCI range specifier > > s/parse/parses/ ? Ack. > > > assuming the address is 'sanitized' at the origin, i.e. without checking > > whether the incoming address is 32 or 64 bit has specified in the flags. > > In this way an address with no OF_PCI_ADDR_SPACE_MEM64 set in the flagss > > s/flagss/flags/ Ack. > > > could leak through and the upper 32 bits of the address will be set too, > > and this violates the PCI specs stating that ion 32 bit address the upper > > s/ion/in/ Ack. > > > bit should be zero. > > I don't understand this code, so I'm probably missing something. It > looks like the interesting path here is: > > of_pci_prop_ranges > res = &pdev->resource[...]; > for (j = 0; j < num; j++) { > val64 = res[j].start; > of_pci_set_address(..., val64, 0, flags, false); > + if (OF_PCI_ADDR_SPACE_MEM64) > + prop[1] = upper_32_bits(val64); > + else > + prop[1] = 0; > > OF_PCI_ADDR_SPACE_MEM64 tells us about the size of the PCI bus > address, but the address (val64) is a CPU physical address, not a PCI > bus address, so I don't understand why of_pci_set_address() should use > OF_PCI_ADDR_SPACE_MEM64 to clear part of the CPU address. > It all starts from of_pci_prop_ranges(), that is the caller of of_pci_set_address(). val64 (i.e. res[j].start) is the address part of a struct resource that has its own flags. Those flags are directly translated to of_pci_range flags by of_pci_get_addr_flags(), so any IORESOURCE_MEM_64 / IORESOURCE_MEM in the resource flag will respectively become OF_PCI_ADDR_SPACE_MEM64 / OF_PCI_ADDR_SPACE_MEM32 in pci range. What is advertised as 32 bit at the origin (val64) should not become a 64 bit PCI address at the output of of_pci_set_address(), so the upper 32 bit portion should be dropped. This is explicitly stated in [1] (see page 5), where a space code of 0b10 implies that the upper 32 bit of the address must be zeroed out. Please note that of_pci_prop_ranges() will be called only in case CONFIG_PCI_DYNAMIC_OF_NODES is enabled to populate ranges for pci bridges and pci endpoint for which a quirk is declared, so I would say not a very often recurring use case. [1] - https://www.devicetree.org/open-firmware/bindings/pci/pci2_1.pdf > Add blank lines between paragraphs. Ack. > > > This could cause mapping translation mismatch on PCI devices (e.g. RP1) > > that are expected to be addressed with a 64 bit address while advertising > > a 32 bit address in the PCI config region. > > Add a check in of_pci_set_address() to set upper 32 bits to zero in case > > the address has no 64 bit flag set. > > Is this an indication of a DT error? Have you seen this cause a > problem? If so, what does it look like to a user? I.e., how could a > user find this patch if they saw a problem? Not neccessarily a DT error, but an inconsistent representation of addresses wrt the specs. I incidentally encountered this on RaspberryPi 5, where the PCI config space for the RP1 endpoint shows 32 bit BARs (basically an offset from zero) but the effective address to which the CPU can access the device is 64 bit nonetheless (0x1f_00000000). I believe this is backed by some non standard hw wiring. Without this patch the range translation chain is broken, like this: pcie@120000: <0x2000000 0x00 0x00 0x1f 0x00 0x00 0xfffffffc>; ~~~ chain breaks here ~~~ pci@0 : <0x82000000 0x1f 0x00 0x82000000 0x1f 0x00 0x00 0x600000>; dev@0,0 : <0x01 0x00 0x00 0x82010000 0x1f 0x00 0x00 0x400000>; rp1@0 : <0xc0 0x40000000 0x01 0x00 0x00 0x00 0x400000>; while with the patch applied the chain correctly become: pcie@120000: <0x2000000 0x00 0x00 0x1f 0x00 0x00 0xfffffffc>; pci@0 : <0x82000000 0x00 0x00 0x82000000 0x00 0x00 0x00 0x600000>; dev@0,0 : <0x01 0x00 0x00 0x82010000 0x00 0x00 0x00 0x400000>; rp1@0 : <0xc0 0x40000000 0x01 0x00 0x00 0x00 0x400000>; I'm not advocating here that this patch is fixing the behaviour on Rpi5, this is just a nice side effect of the correct address representation. I think we can also probably fix it by patching the pcie node in the devicetree like this: pcie@120000: <0x2000000 0xi1f 0x00 0x1f 0x00 0x00 0xfffffffc>; but this is of course just a 1:1 mapping, while the address will still be at least 'virtually' unconsistent, showing 64 bit address wihile the 32 bit flag is set. The net symptoms to the user would be, in the case of the RP1, a platform driver of one of its sub-peripheral that fails to be probed. Many thanks, Andrea > > > Signed-off-by: Andrea della Porta <andrea.porta@suse.com> > > --- > > drivers/pci/of_property.c | 5 ++++- > > 1 file changed, 4 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/pci/of_property.c b/drivers/pci/of_property.c > > index 5a0b98e69795..77865facdb4a 100644 > > --- a/drivers/pci/of_property.c > > +++ b/drivers/pci/of_property.c > > @@ -60,7 +60,10 @@ static void of_pci_set_address(struct pci_dev *pdev, u32 *prop, u64 addr, > > prop[0] |= flags | reg_num; > > if (!reloc) { > > prop[0] |= OF_PCI_ADDR_FIELD_NONRELOC; > > - prop[1] = upper_32_bits(addr); > > + if (FIELD_GET(OF_PCI_ADDR_FIELD_SS, flags) == OF_PCI_ADDR_SPACE_MEM64) > > + prop[1] = upper_32_bits(addr); > > + else > > + prop[1] = 0; > > prop[2] = lower_32_bits(addr); > > } > > } > > -- > > 2.35.3 > >
On Mon, Aug 26, 2024 at 09:51:02PM +0200, Andrea della Porta wrote: > On 10:24 Wed 21 Aug , Bjorn Helgaas wrote: > > On Tue, Aug 20, 2024 at 04:36:05PM +0200, Andrea della Porta wrote: > > > The of_pci_set_address() function parses devicetree PCI range > > > specifier assuming the address is 'sanitized' at the origin, > > > i.e. without checking whether the incoming address is 32 or 64 > > > bit has specified in the flags. In this way an address with no > > > OF_PCI_ADDR_SPACE_MEM64 set in the flags could leak through and > > > the upper 32 bits of the address will be set too, and this > > > violates the PCI specs stating that in 32 bit address the upper > > > bit should be zero. > > I don't understand this code, so I'm probably missing something. It > > looks like the interesting path here is: > > > > of_pci_prop_ranges > > res = &pdev->resource[...]; > > for (j = 0; j < num; j++) { > > val64 = res[j].start; > > of_pci_set_address(..., val64, 0, flags, false); > > + if (OF_PCI_ADDR_SPACE_MEM64) > > + prop[1] = upper_32_bits(val64); > > + else > > + prop[1] = 0; > > > > OF_PCI_ADDR_SPACE_MEM64 tells us about the size of the PCI bus > > address, but the address (val64) is a CPU physical address, not a PCI > > bus address, so I don't understand why of_pci_set_address() should use > > OF_PCI_ADDR_SPACE_MEM64 to clear part of the CPU address. > > It all starts from of_pci_prop_ranges(), that is the caller of > of_pci_set_address(). > val64 (i.e. res[j].start) is the address part of a struct resource > that has its own flags. Those flags are directly translated to > of_pci_range flags by of_pci_get_addr_flags(), so any > IORESOURCE_MEM_64 / IORESOURCE_MEM in the resource flag will > respectively become OF_PCI_ADDR_SPACE_MEM64 / > OF_PCI_ADDR_SPACE_MEM32 in pci range. > What is advertised as 32 bit at the origin (val64) should not become > a 64 bit PCI address at the output of of_pci_set_address(), so the > upper 32 bit portion should be dropped. > This is explicitly stated in [1] (see page 5), where a space code of 0b10 > implies that the upper 32 bit of the address must be zeroed out. OK, I was confused and thought IORESOURCE_MEM_64 was telling us something about the *CPU* address, but it's actually telling us something about what *PCI bus* addresses are possible, i.e., whether it's a 32-bit BAR or a 64-bit BAR. However, the CPU physical address space and the PCI bus address are not the same. Generic code paths should account for that different by applying an offset (the offset will be zero on many platforms where CPU and PCI bus addresses *look* the same). So a generic code path like of_pci_prop_ranges() that basically copies a CPU physical address to a PCI bus address looks broken to me. Maybe my expectation of this being described in DT is mistaken. Bjorn
Hi Bjorn, On 17:26 Tue 03 Sep , Bjorn Helgaas wrote: > On Mon, Aug 26, 2024 at 09:51:02PM +0200, Andrea della Porta wrote: > > On 10:24 Wed 21 Aug , Bjorn Helgaas wrote: > > > On Tue, Aug 20, 2024 at 04:36:05PM +0200, Andrea della Porta wrote: > > > > The of_pci_set_address() function parses devicetree PCI range > > > > specifier assuming the address is 'sanitized' at the origin, > > > > i.e. without checking whether the incoming address is 32 or 64 > > > > bit has specified in the flags. In this way an address with no > > > > OF_PCI_ADDR_SPACE_MEM64 set in the flags could leak through and > > > > the upper 32 bits of the address will be set too, and this > > > > violates the PCI specs stating that in 32 bit address the upper > > > > bit should be zero. > > > > I don't understand this code, so I'm probably missing something. It > > > looks like the interesting path here is: > > > > > > of_pci_prop_ranges > > > res = &pdev->resource[...]; > > > for (j = 0; j < num; j++) { > > > val64 = res[j].start; > > > of_pci_set_address(..., val64, 0, flags, false); > > > + if (OF_PCI_ADDR_SPACE_MEM64) > > > + prop[1] = upper_32_bits(val64); > > > + else > > > + prop[1] = 0; > > > > > > OF_PCI_ADDR_SPACE_MEM64 tells us about the size of the PCI bus > > > address, but the address (val64) is a CPU physical address, not a PCI > > > bus address, so I don't understand why of_pci_set_address() should use > > > OF_PCI_ADDR_SPACE_MEM64 to clear part of the CPU address. > > > > It all starts from of_pci_prop_ranges(), that is the caller of > > of_pci_set_address(). > > > val64 (i.e. res[j].start) is the address part of a struct resource > > that has its own flags. Those flags are directly translated to > > of_pci_range flags by of_pci_get_addr_flags(), so any > > IORESOURCE_MEM_64 / IORESOURCE_MEM in the resource flag will > > respectively become OF_PCI_ADDR_SPACE_MEM64 / > > OF_PCI_ADDR_SPACE_MEM32 in pci range. > > > What is advertised as 32 bit at the origin (val64) should not become > > a 64 bit PCI address at the output of of_pci_set_address(), so the > > upper 32 bit portion should be dropped. > > > This is explicitly stated in [1] (see page 5), where a space code of 0b10 > > implies that the upper 32 bit of the address must be zeroed out. > > OK, I was confused and thought IORESOURCE_MEM_64 was telling us > something about the *CPU* address, but it's actually telling us > something about what *PCI bus* addresses are possible, i.e., whether > it's a 32-bit BAR or a 64-bit BAR. > > However, the CPU physical address space and the PCI bus address are > not the same. Generic code paths should account for that different by > applying an offset (the offset will be zero on many platforms where > CPU and PCI bus addresses *look* the same). > > So a generic code path like of_pci_prop_ranges() that basically copies > a CPU physical address to a PCI bus address looks broken to me. Hmmm, I'd say that a translation from one bus type to the other is going on nonetheless, and this is done in the current upstream function as well. This patch of course does not add the translation (which is already in place), just to do it avoiding generating inconsistent address. > > Maybe my expectation of this being described in DT is mistaken. Not sure what you mean here, the address being translated are coming from DT, in fact they are described by "ranges" properties. Many thanks, Andrea > Bjorn
[+cc Lizhi] On Thu, Sep 05, 2024 at 06:43:35PM +0200, Andrea della Porta wrote: > On 17:26 Tue 03 Sep , Bjorn Helgaas wrote: > > On Mon, Aug 26, 2024 at 09:51:02PM +0200, Andrea della Porta wrote: > > > On 10:24 Wed 21 Aug , Bjorn Helgaas wrote: > > > > On Tue, Aug 20, 2024 at 04:36:05PM +0200, Andrea della Porta wrote: > > > > > The of_pci_set_address() function parses devicetree PCI range > > > > > specifier assuming the address is 'sanitized' at the origin, > > > > > i.e. without checking whether the incoming address is 32 or 64 > > > > > bit has specified in the flags. In this way an address with no > > > > > OF_PCI_ADDR_SPACE_MEM64 set in the flags could leak through and > > > > > the upper 32 bits of the address will be set too, and this > > > > > violates the PCI specs stating that in 32 bit address the upper > > > > > bit should be zero. > > > > > > I don't understand this code, so I'm probably missing something. It > > > > looks like the interesting path here is: > > > > > > > > of_pci_prop_ranges > > > > res = &pdev->resource[...]; > > > > for (j = 0; j < num; j++) { > > > > val64 = res[j].start; > > > > of_pci_set_address(..., val64, 0, flags, false); > > > > + if (OF_PCI_ADDR_SPACE_MEM64) > > > > + prop[1] = upper_32_bits(val64); > > > > + else > > > > + prop[1] = 0; > ... > > However, the CPU physical address space and the PCI bus address are > > not the same. Generic code paths should account for that different by > > applying an offset (the offset will be zero on many platforms where > > CPU and PCI bus addresses *look* the same). > > > > So a generic code path like of_pci_prop_ranges() that basically copies > > a CPU physical address to a PCI bus address looks broken to me. > > Hmmm, I'd say that a translation from one bus type to the other is > going on nonetheless, and this is done in the current upstream function > as well. This patch of course does not add the translation (which is > already in place), just to do it avoiding generating inconsistent address. I think I was looking at this backwards. I assumed we were *parsing" a "ranges" property, but I think in fact we're *building* a "ranges" property to describe an existing PCI device (either a PCI-to-PCI bridge or an endpoint). For such devices there is no address translation. Any address translation would only occur at a PCI host bridge that has CPU address space on the upstream side and PCI address space on the downstream side. Since (IIUC), we're building "ranges" for a device in the interior of a PCI hierarchy where address translation doesn't happen, I think both the parent and child addresses in "ranges" should be in the PCI address space. But right now, I think they're both in the CPU address space, and we basically do this: of_pci_prop_ranges(struct pci_dev *pdev, ...) res = &pdev->resource[...]; for (j = 0; j < num; j++) { # iterate through BARs or windows val64 = res[j].start; # CPU physical address # <convert to PCI address space> of_pci_set_address(..., rp[i].parent_addr, val64, ...) rp[i].parent_addr = val64 if (pci_is_bridge(pdev)) memcpy(rp[i].child_addr, rp[i].parent_addr) else rp[i].child_addr[0] = j # child addr unset/unused Here "res" is a PCI BAR or bridge window, and it contains CPU physical addresses, so "val64" is a CPU physical address. It looks to me like we should convert to a PCI bus address at the point noted above, based on any translation described by the PCI host bridge. That *should* naturally result in a 32-bit value if OF_PCI_ADDR_SPACE_MEM64 is not set. > > Maybe my expectation of this being described in DT is mistaken. > > Not sure what you mean here, the address being translated are coming from > DT, in fact they are described by "ranges" properties. Right, for my own future reference since I couldn't find a generic description of "ranges" in Documentation/devicetree/: [1] https://devicetree-specification.readthedocs.io/en/latest/chapter2-devicetree-basics.html#ranges
diff --git a/drivers/pci/of_property.c b/drivers/pci/of_property.c index 5a0b98e69795..77865facdb4a 100644 --- a/drivers/pci/of_property.c +++ b/drivers/pci/of_property.c @@ -60,7 +60,10 @@ static void of_pci_set_address(struct pci_dev *pdev, u32 *prop, u64 addr, prop[0] |= flags | reg_num; if (!reloc) { prop[0] |= OF_PCI_ADDR_FIELD_NONRELOC; - prop[1] = upper_32_bits(addr); + if (FIELD_GET(OF_PCI_ADDR_FIELD_SS, flags) == OF_PCI_ADDR_SPACE_MEM64) + prop[1] = upper_32_bits(addr); + else + prop[1] = 0; prop[2] = lower_32_bits(addr); } }
The of_pci_set_address() function parse devicetree PCI range specifier assuming the address is 'sanitized' at the origin, i.e. without checking whether the incoming address is 32 or 64 bit has specified in the flags. In this way an address with no OF_PCI_ADDR_SPACE_MEM64 set in the flagss could leak through and the upper 32 bits of the address will be set too, and this violates the PCI specs stating that ion 32 bit address the upper bit should be zero. This could cause mapping translation mismatch on PCI devices (e.g. RP1) that are expected to be addressed with a 64 bit address while advertising a 32 bit address in the PCI config region. Add a check in of_pci_set_address() to set upper 32 bits to zero in case the address has no 64 bit flag set. Signed-off-by: Andrea della Porta <andrea.porta@suse.com> --- drivers/pci/of_property.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)