Message ID: 20120525193716.GA8817@google.com
State: Not Applicable

On 05/25/2012 12:37 PM, Bjorn Helgaas wrote:
>
> That's goofy.  You're proposing to make only x86_64 and x86-PAE try to put
> 64-bit BARs above 4GB.  Why should this be specific to x86?  I acknowledge
> that there's risk in doing this, but if it's a good idea for x86_64, it
> should also be a good idea for other 64-bit architectures.
>

And so it is (assuming, of course, that we can address that memory.)

	-hpa

On Fri, May 25, 2012 at 12:37 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Fri, May 25, 2012 at 11:39:26AM -0700, Yinghai Lu wrote:
>> On Fri, May 25, 2012 at 10:53 AM, Yinghai Lu <yinghai@kernel.org> wrote:
>> >> I don't really like the dependency on PCIBIOS_MAX_MEM_32 + 1ULL
>> >> overflowing to zero -- that means the reader has to know what the
>> >> value of PCIBIOS_MAX_MEM_32 is, and things would break in non-obvious
>> >> ways if we changed it.
>>
>> please check if the attached one is clearer.
>>
>> make max and bottom only apply to _MEM resources, not the default:
>>
>> -	if (!(res->flags & IORESOURCE_MEM_64))
>> -		max = PCIBIOS_MAX_MEM_32;
>> +	if (res->flags & IORESOURCE_MEM) {
>> +		if (!(res->flags & IORESOURCE_MEM_64))
>> +			max = PCIBIOS_MAX_MEM_32;
>> +		else if (PCIBIOS_MAX_MEM_32 != -1)
>> +			bottom = (resource_size_t)(1ULL<<32);
>> +	}
>>
>> it will still not affect other arches.
>
> That's goofy.  You're proposing to make only x86_64 and x86-PAE try to put
> 64-bit BARs above 4GB.  Why should this be specific to x86?  I acknowledge
> that there's risk in doing this, but if it's a good idea for x86_64, it
> should also be a good idea for other 64-bit architectures.
>
> And testing for "is this x86_32 without PAE?" with
> "PCIBIOS_MAX_MEM_32 == -1" is just plain obtuse and hides an
> important bit of arch-specific behavior.
>
> Tangential question about allocate_resource(): Is its "max" argument
> really necessary?  We'll obviously only allocate from inside the root
> resource, so "max" is just a way to artificially avoid the end of
> that resource.  Is there really a case where that's required?
>
> "min" makes sense because in a case like this, it's valid to allocate from
> anywhere in the root resource, but we want to try to allocate from the >4GB
> part first, then fall back to allocating from the whole resource.  I'm not
> sure there's a corresponding case for "max."
>
> Getting back to this patch, I don't think we should need to adjust "max" at
> all.  For example, this:
>
> commit cb1c8e46244cfd84a1a2fe91be860a74c1cf4e25
> Author: Bjorn Helgaas <bhelgaas@google.com>
> Date:   Thu May 24 22:15:26 2012 -0600
>
>     PCI: try to allocate 64-bit mem resources above 4GB
>
>     If we have a 64-bit mem resource, try to allocate it above 4GB first.  If
>     that fails, we'll fall back to allocating space below 4GB.
>
> diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> index 4ce5ef2..075e5b1 100644
> --- a/drivers/pci/bus.c
> +++ b/drivers/pci/bus.c
> @@ -121,14 +121,16 @@ pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
>  {
>         int i, ret = -ENOMEM;
>         struct resource *r;
> -       resource_size_t max = -1;
> +       resource_size_t start = 0;
> +       resource_size_t end = MAX_RESOURCE;

yeah, MAX_RESOURCE is better than -1.

>         type_mask |= IORESOURCE_IO | IORESOURCE_MEM;
>
> -       /* don't allocate too high if the pref mem doesn't support 64bit*/
> -       if (!(res->flags & IORESOURCE_MEM_64))
> -               max = PCIBIOS_MAX_MEM_32;

we can not remove this one; otherwise we could allocate an above-4G
range to a non-MEM64 resource.

> +       /* If this is a 64-bit mem resource, try above 4GB first */
> +       if (res->flags & IORESOURCE_MEM_64)
> +               start = (resource_size_t) (1ULL << 32);

this could affect other arches.  let's see if other arches are ok.

please check the merged version.

also we have

    include/linux/range.h:#define MAX_RESOURCE ((resource_size_t)~0)
    arch/x86/kernel/e820.c:#define MAX_RESOURCE_SIZE ((resource_size_t)-1)

we should merge them later?

Thanks

Yinghai

On Fri, May 25, 2012 at 2:19 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Fri, May 25, 2012 at 12:37 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
[...]
>> -       /* don't allocate too high if the pref mem doesn't support 64bit*/
>> -       if (!(res->flags & IORESOURCE_MEM_64))
>> -               max = PCIBIOS_MAX_MEM_32;
>
> we can not remove this one; otherwise we could allocate an above-4G
> range to a non-MEM64 resource.

Yeah, I convince myself of the dumbest things sometimes.  It occurred
to me while driving home that we need this, but you beat me to it :)

I think we actually have a separate bug here.  On 64-bit non-x86
architectures, PCIBIOS_MAX_MEM_32 is a 64-bit -1, so the following
attempt to avoid putting a 32-bit BAR above 4G only works on x86,
where PCIBIOS_MAX_MEM_32 is 0xffffffff.

    /* don't allocate too high if the pref mem doesn't support 64bit*/
    if (!(res->flags & IORESOURCE_MEM_64))
        max = PCIBIOS_MAX_MEM_32;

I think we should fix this with a separate patch that removes
PCIBIOS_MAX_MEM_32 altogether, replacing this use with an explicit
0xffffffff (or some other "max 32-bit value" symbol).  I don't think
there's anything arch-specific about this.

So I'd like to see two patches here:
  1) Avoid allocating 64-bit regions for 32-bit BARs
  2) Try to allocate regions above 4GB for 64-bit BARs

> also we have
>
>     include/linux/range.h:#define MAX_RESOURCE ((resource_size_t)~0)
>     arch/x86/kernel/e820.c:#define MAX_RESOURCE_SIZE ((resource_size_t)-1)
>
> we should merge them later?

I would support that.

Bjorn

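For readers following along, here is a minimal sketch of the two-pass
idea Bjorn describes: use allocate_resource()'s "min" argument to
prefer space above 4GB, then retry over the whole range.  The helper
name is made up; the real change lives in pci_bus_alloc_resource(),
which iterates over the bus resources rather than taking a single
root.  MAX_RESOURCE is the include/linux/range.h definition quoted
above.

#include <linux/ioport.h>
#include <linux/range.h>	/* MAX_RESOURCE */

/*
 * Sketch only -- not the actual patch.  First try to place the
 * resource above 4GB, then fall back to the full root resource.
 * alignf/alignf_data are omitted (NULL).
 */
static int alloc_try_high_first(struct resource *root, struct resource *res,
				resource_size_t size, resource_size_t align)
{
	int ret;

	ret = allocate_resource(root, res, size,
				(resource_size_t)(1ULL << 32),	/* min */
				MAX_RESOURCE,			/* max */
				align, NULL, NULL);
	if (ret)
		ret = allocate_resource(root, res, size,
					0, MAX_RESOURCE,
					align, NULL, NULL);
	return ret;
}
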
On 05/25/2012 02:55 PM, Bjorn Helgaas wrote:
>
> I think we actually have a separate bug here.  On 64-bit non-x86
> architectures, PCIBIOS_MAX_MEM_32 is a 64-bit -1, so the following
> attempt to avoid putting a 32-bit BAR above 4G only works on x86,
> where PCIBIOS_MAX_MEM_32 is 0xffffffff.
>
>     /* don't allocate too high if the pref mem doesn't support 64bit*/
>     if (!(res->flags & IORESOURCE_MEM_64))
>         max = PCIBIOS_MAX_MEM_32;
>
> I think we should fix this with a separate patch that removes
> PCIBIOS_MAX_MEM_32 altogether, replacing this use with an explicit
> 0xffffffff (or some other "max 32-bit value" symbol).  I don't think
> there's anything arch-specific about this.
>
> So I'd like to see two patches here:
>   1) Avoid allocating 64-bit regions for 32-bit BARs
>   2) Try to allocate regions above 4GB for 64-bit BARs
>

Do we also need to track the maximum address available to the CPU?

	-hpa

On Fri, May 25, 2012 at 3:58 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 05/25/2012 02:55 PM, Bjorn Helgaas wrote:
[...]
>> So I'd like to see two patches here:
>>   1) Avoid allocating 64-bit regions for 32-bit BARs
>>   2) Try to allocate regions above 4GB for 64-bit BARs
>
> Do we also need to track the maximum address available to the CPU?

We are allocating from the resources available on this bus, which
means the host bridge apertures or a subset.  If the arch supplies
those, that should be enough.  If it doesn't, we default to
iomem_resource.

On x86, we trim that down to only the supported address space:

    iomem_resource.end = (1ULL << boot_cpu_data.x86_phys_bits) - 1;

But I'm sure some other architectures don't do that yet.  I guess
that's one of the risks -- if an arch doesn't tell us the apertures
and doesn't trim iomem_resource, we could allocate a non-addressable
region.

Bjorn

On Fri, May 25, 2012 at 2:55 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> I think we should fix this with a separate patch that removes
> PCIBIOS_MAX_MEM_32 altogether, replacing this use with an explicit
> 0xffffffff (or some other "max 32-bit value" symbol).  I don't think
> there's anything arch-specific about this.
>
> So I'd like to see two patches here:
>   1) Avoid allocating 64-bit regions for 32-bit BARs
>   2) Try to allocate regions above 4GB for 64-bit BARs

Sure.  please check the two updated patches.

Thanks

Yinghai

On Fri, May 25, 2012 at 5:10 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Fri, May 25, 2012 at 2:55 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
[...]
> Sure.  please check the two updated patches.

I think the first one looks good.

I'm curious about the second.  Why did you add the IORESOURCE_MEM
test?  That doesn't affect the "start =" piece because IORESOURCE_MEM
is always set if IORESOURCE_MEM_64 is set.

But it does affect the "end =" part.  Previously we limited all I/O
and 32-bit mem BARs to the low 4GB.  This patch makes it so we limit
32-bit mem BARs to the low 4GB, but we don't limit I/O BARs.  But I/O
BARs can only have 32 bits of address, so it seems like we should
limit them the same way as 32-bit mem BARs.  So I expected something
like this:

    if (res->flags & IORESOURCE_MEM_64) {
        start = (resource_size_t) (1ULL << 32);
        end = PCI_MAX_RESOURCE;
    } else {
        start = 0;
        end = PCI_MAX_RESOURCE_32;
    }

On Fri, May 25, 2012 at 6:12 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
[...]
> But it does affect the "end =" part.  Previously we limited all I/O
> and 32-bit mem BARs to the low 4GB.  This patch makes it so we limit
> 32-bit mem BARs to the low 4GB, but we don't limit I/O BARs.

Another bug here: we're trying to restrict the *bus* addresses we
allocate, but we're applying the limits to *CPU* addresses.
Therefore, this only works as intended when CPU addresses are the same
as bus addresses, i.e., when the host bridge applies no address
translation.  That happens to be the case for x86, but is not the case
in general.

I think we need a third patch to fix this problem.

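To make the bus-vs-CPU distinction concrete, here is a sketch of what
"applying the limit to bus addresses" could look like.  The helper
name is made up; pcibios_resource_to_bus(), which in kernels of this
era took a pci_dev, performs the CPU-to-bus translation.

#include <linux/pci.h>

/*
 * Sketch: test a resource's *bus* address range, not its CPU range.
 * On x86 the two are identical (no host bridge translation); on
 * sparc or ia64 they generally are not.
 */
static bool res_bus_range_below_4g(struct pci_dev *dev, struct resource *res)
{
	struct pci_bus_region region;

	pcibios_resource_to_bus(dev, &region, res);
	return region.end <= 0xffffffff;
}
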
On Fri, May 25, 2012 at 5:12 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
[...]
> So I expected something like this:
>
>     if (res->flags & IORESOURCE_MEM_64) {
>         start = (resource_size_t) (1ULL << 32);
>         end = PCI_MAX_RESOURCE;
>     } else {
>         start = 0;
>         end = PCI_MAX_RESOURCE_32;
>     }

x86 is using 16 bits.

some others use 32 bits:

    #define IO_SPACE_LIMIT 0xffffffff

ia64 and sparc are using 64 bits:

    #define IO_SPACE_LIMIT 0xffffffffffffffffUL

but pci only supports 16 bits and 32 bits.

maybe later we can add

    PCI_MAX_RESOURCE_16

to handle 16-bit and 32-bit io ports.

Yinghai

On Sat, May 26, 2012 at 8:01 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> Another bug here: we're trying to restrict the *bus* addresses we
> allocate, but we're applying the limits to *CPU* addresses.
> Therefore, this only works as intended when CPU addresses are the same
> as bus addresses, i.e., when the host bridge applies no address
> translation.  That happens to be the case for x86, but is not the case
> in general.
>
> I think we need a third patch to fix this problem.

yes.

On 05/29/2012 10:55 AM, Yinghai Lu wrote:
>
> x86 is using 16 bits.
>
> some others use 32 bits:
>     #define IO_SPACE_LIMIT 0xffffffff
>
> ia64 and sparc are using 64 bits:
>     #define IO_SPACE_LIMIT 0xffffffffffffffffUL
>
> but pci only supports 16 bits and 32 bits.
>
> maybe later we can add
>     PCI_MAX_RESOURCE_16
> to handle 16-bit and 32-bit io ports.
>

Shouldn't this be dealt with by root port apertures?

	-hpa

On Tue, May 29, 2012 at 10:57 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 05/29/2012 10:55 AM, Yinghai Lu wrote:
[...]
>
> Shouldn't this be dealt with by root port apertures?

a pci bridge can support 16-bit or 32-bit io ports, but we do not
record whether 32 bits is supported.

so during allocation, we could have allocated an above-64k address to
a bridge that only decodes 16 bits.

x86 is ok, because ioport_resource.end is always set to 0xffff.  other
arches with IO_SPACE_LIMIT of 0xffffffff or 0xffffffffffffffffUL may
have a problem.

Yinghai

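For reference, the decode width Yinghai mentions is discoverable from
the bridge's I/O Base register; the kernel reads the same bits in
pci_read_bridge_io().  A sketch, with a made-up helper name:

#include <linux/pci.h>

/*
 * Sketch: the low nibble of a P2P bridge's I/O Base register reports
 * the I/O window decode type; PCI_IO_RANGE_TYPE_32 means the bridge
 * decodes 32-bit I/O addresses, otherwise only 16 bits are decoded.
 */
static bool bridge_decodes_32bit_io(struct pci_dev *bridge)
{
	u8 io_base_lo;

	pci_read_config_byte(bridge, PCI_IO_BASE, &io_base_lo);
	return (io_base_lo & PCI_IO_RANGE_TYPE_MASK) == PCI_IO_RANGE_TYPE_32;
}
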
On 05/29/2012 11:17 AM, Yinghai Lu wrote:
>
> a pci bridge can support 16-bit or 32-bit io ports, but we do not
> record whether 32 bits is supported.
>

Okay, so this is the actual problem, no?

> so during allocation, we could have allocated an above-64k address to
> a bridge that only decodes 16 bits.
>
> x86 is ok, because ioport_resource.end is always set to 0xffff.  other
> arches with IO_SPACE_LIMIT of 0xffffffff or 0xffffffffffffffffUL may
> have a problem.

The latter is nonsense, the PCI-side address space is only 32 bits wide.

	-hpa

On Tue, May 29, 2012 at 12:17 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Tue, May 29, 2012 at 10:57 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>> Shouldn't this be dealt with by root port apertures?
>
> a pci bridge can support 16-bit or 32-bit io ports, but we do not
> record whether 32 bits is supported.
>
> so during allocation, we could have allocated an above-64k address to
> a bridge that only decodes 16 bits.
>
> x86 is ok, because ioport_resource.end is always set to 0xffff.  other
> arches with IO_SPACE_LIMIT of 0xffffffff or 0xffffffffffffffffUL may
> have a problem.

I think current IO_SPACE_LIMIT usage is a little confused.  The
"ioport_resource.end = IO_SPACE_LIMIT" in kernel/resource.c refers to
a CPU-side address, not a bus address.  Other uses, e.g., in
__pci_read_base(), apply it to bus addresses from BARs, which is
wrong.  Host bridges apply I/O port offsets just like they apply
memory offsets.  The ia64 IO_SPACE_LIMIT of 0xffffffffffffffffUL means
there's no restriction on CPU-side I/O port addresses, but any given
host bridge will translate its I/O port aperture to bus addresses that
fit in 32 bits.

None of this is really relevant to the question I asked, namely, "why
doesn't Yinghai's patch limit I/O BAR values to 32 bits?"  That
constraint is clearly a requirement because I/O BARs are only 32 bits
wide, but I don't think it needs to be enforced in the code here.  The
host bridge or upstream P2P bridge apertures should already take care
of that automatically.  I don't think the 16- or 32-bitness of P2P
bridge apertures is relevant here, because the I/O resources available
on the secondary bus already reflect that.

After all that discussion, I think my objection here boils down to
"you shouldn't change the I/O BAR constraints in a patch that claims
to allocate 64-bit *memory* BARs above 4GB."

I think the code below is still the clearest way to set the constraints:

    if (res->flags & IORESOURCE_MEM_64) {
        start = (resource_size_t) (1ULL << 32);
        end = PCI_MAX_RESOURCE;
    } else {
        start = 0;
        end = PCI_MAX_RESOURCE_32;
    }

It's not strictly necessary to limit I/O BARs to PCI_MAX_RESOURCE_32
because host bridge apertures should already enforce that, but I think
the code above just makes it clearer.

On Tue, May 29, 2012 at 12:23 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
[...]
> I think the code below is still the clearest way to set the constraints:
>
>     if (res->flags & IORESOURCE_MEM_64) {
>         start = (resource_size_t) (1ULL << 32);
>         end = PCI_MAX_RESOURCE;
>     } else {
>         start = 0;
>         end = PCI_MAX_RESOURCE_32;
>     }
>
> It's not strictly necessary to limit I/O BARs to PCI_MAX_RESOURCE_32
> because host bridge apertures should already enforce that, but I think
> the code above just makes it clearer.

ok, please check the version that puts back PCI_MAX_RESOURCE_32 for io
ports.

also an RFC to limit 16-bit ioport handling.  it only helps other
arches that do support 32-bit ioports but have bridges that only
support 16-bit io ports.

Thanks

Yinghai

On Tue, May 29, 2012 at 12:03 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 05/29/2012 11:17 AM, Yinghai Lu wrote:
>>
>> a pci bridge can support 16-bit or 32-bit io ports, but we do not
>> record whether 32 bits is supported.
>>
>
> Okay, so this is the actual problem, no?

their fw could not need kernel help to allocate io ports, or they are
only use device that support 32bit ioport.

>> so during allocation, we could have allocated an above-64k address to
>> a bridge that only decodes 16 bits.
>>
>> x86 is ok, because ioport_resource.end is always set to 0xffff.  other
>> arches with IO_SPACE_LIMIT of 0xffffffff or 0xffffffffffffffffUL may
>> have a problem.
>
> The latter is nonsense, the PCI-side address space is only 32 bits wide.

maybe they have unified io include ioport and mem io?

Yinghai

On 05/29/2012 01:46 PM, Yinghai Lu wrote:
> On Tue, May 29, 2012 at 12:03 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>> Okay, so this is the actual problem, no?
>
> their fw could not need kernel help to allocate io ports, or they are
> only use device that support 32bit ioport.
>

That barely parses, never mind makes sense.

>> The latter is nonsense, the PCI-side address space is only 32 bits wide.
>>
> maybe they have unified io include ioport and mem io?
>

The bus-side address space should not be more than 32 bits no matter
what.  As Bjorn indicates, you seem to be mixing up bus and cpu
addresses all over the place.

	-hpa

From: Yinghai Lu <yinghai@kernel.org>
Date: Tue, 29 May 2012 13:46:03 -0700

> On Tue, May 29, 2012 at 12:03 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>> The latter is nonsense, the PCI-side address space is only 32 bits wide.
>>
> maybe they have unified io include ioport and mem io?

The resource values stored are the actual physical I/O addresses on
some platforms.  Sparc, for example, falls into this category.
Therefore ~0UL is the only feasible limit for them.

We have to distinguish explicitly when we are operating upon actual
PCI bus view addresses vs. the values that end up in the resources.
They are entirely different, and have completely different address
ranges.

On Tue, May 29, 2012 at 2:40 PM, Yinghai Lu <yinghai@kernel.org> wrote:
[...]
> ok, please check the version that puts back PCI_MAX_RESOURCE_32 for io
> ports.

I like the fact that this patch no longer changes anything for I/O
resources.

I assume this is part of fixing some bug (Steven's?)  I'd like to have
a pointer in the changelog to a bugzilla or discussion about the bug.

The effect of this patch is similar to what I did earlier with
b126b4703afa4 and e7f8567db9a7 (allocate space from top down), though
this one is more limited and it won't change quite as much.  We ran
into problems (BIOS defects, I think) and had to revert my patches, so
it's quite possible that we'll run into similar problems here.

I'm a little nervous because this is a fundamental area that explores
new areas of the address space minefield.  I think we're generally
safer if we follow a path similar to where Windows has been.  I think
Windows also prefers space above 4GB for 64-bit BARs, but I suspect
that's just a natural consequence of allocating from the top down.  So
we'll place things just above 4GB, and Windows will place things as
high as possible.

I don't know the best solution here.  This patch ("bottom-up above
4GB") is one possibility.  Another is to allocate only 64-bit BARs
top-down.  Or maybe allocate everything top-down on machines newer
than some date.  They all seem ugly.  What makes me uneasy is that
your patch strikes out on a new path that is different from what we've
done before *and* different from what Windows does.

On Tue, May 29, 2012 at 2:40 PM, Yinghai Lu <yinghai@kernel.org> wrote:
[...]
> ok, please check the version that puts back PCI_MAX_RESOURCE_32 for io
> ports.
>
> also an RFC to limit 16-bit ioport handling.  it only helps other
> arches that do support 32-bit ioports but have bridges that only
> support 16-bit io ports.

I don't understand this one at all.  It looks like you mashed together
at least two changes: (1) prefer I/O space above 64K if available, and
(2) mark secondary bus resources with IORESOURCE_IO_32 when the P2P
bridge I/O window address decode type is PCI_IO_RANGE_TYPE_32 and use
that to limit allocations.

I don't see the justification for (1).  What problem does this solve?

I don't see the need for (2).  Even without the start/end constraints
in pci_bus_alloc_resource(), we only allocate from the resources
available on the bus.  Apparently you think this list might be
incorrect, i.e., it might include I/O space above 64K when it
shouldn't.  I don't think that's the case, but if it is, we should fix
the list of bus resources, not add a hack to avoid parts of it.

On Tue, May 29, 2012 at 4:27 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> I don't understand this one at all.  It looks like you mashed together
> at least two changes: (1) prefer I/O space above 64K if available, and
> (2) mark secondary bus resources with IORESOURCE_IO_32 when the P2P
> bridge I/O window address decode type is PCI_IO_RANGE_TYPE_32 and use
> that to limit allocations.

let's drop the one about IORESOURCE_IO_32.

that should only fix one reallocation bug, in theory, outside of x86.

Thanks

Yinghai

On Tue, May 29, 2012 at 5:33 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> let's drop the one about IORESOURCE_IO_32.
>
> that should only fix one reallocation bug, in theory, outside of x86.

If there's a bug, we should try to fix it.  I just don't understand
what the bug *is*.

On 30/05/12 00:27, Bjorn Helgaas wrote:
> On Tue, May 29, 2012 at 2:40 PM, Yinghai Lu <yinghai@kernel.org> wrote:

*SNIP*

>> ok, please check the version that puts back PCI_MAX_RESOURCE_32 for io
>> ports.
>>
>> also an RFC to limit 16-bit ioport handling.  it only helps other
>> arches that do support 32-bit ioports but have bridges that only
>> support 16-bit io ports.
>
> I don't understand this one at all.  It looks like you mashed together
> at least two changes: (1) prefer I/O space above 64K if available, and
> (2) mark secondary bus resources with IORESOURCE_IO_32 when the P2P
> bridge I/O window address decode type is PCI_IO_RANGE_TYPE_32 and use
> that to limit allocations.
>
> I don't see the justification for (1).  What problem does this solve?
>

I can potentially see this helping with hotplug, where a new device
has only 16 bit io ports, but if there's no space remaining under 64k,
allocation would fail.  Preferring above 64k means preserving that
limited resource.  This is exactly equivalent to my original
motivation for preferring 64 bit PCI mem in order to preserve 32 bit
address space for hotplug devices without 64 bit BARs.

On Wed, May 30, 2012 at 1:40 AM, Steven Newbury <steve@snewbury.org.uk> wrote:
[...]
> I can potentially see this helping with hotplug, where a new device
> has only 16 bit io ports, but if there's no space remaining under 64k,
> allocation would fail.  Preferring above 64k means preserving that
> limited resource.  This is exactly equivalent to my original
> motivation for preferring 64 bit PCI mem in order to preserve 32 bit
> address space for hotplug devices without 64 bit BARs.

You're right, the spec does allow the upper 16 bits of I/O BARs to be
hardwired to zero (PCI spec rev 3.0, p. 226), so this part does make
some sense.  I don't think it applies to x86, since I don't think
there's a way to generate an I/O access to anything above 64K, but it
could help other arches.

I'm inclined to be conservative and wait until we find a problem where
a patch like this would help.

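The property Bjorn cites can be observed with the standard BAR sizing
handshake.  A sketch, assuming the device is idle and "pos" is the
BAR's config-space offset (the helper name is made up): an I/O BAR
whose upper 16 bits are hardwired to zero returns a size mask with the
top half clear.

#include <linux/pci.h>

/* Sketch of the usual write-all-ones BAR sizing sequence. */
static u32 io_bar_size_mask(struct pci_dev *dev, int pos)
{
	u32 orig, sz;

	pci_read_config_dword(dev, pos, &orig);
	pci_write_config_dword(dev, pos, 0xffffffff);
	pci_read_config_dword(dev, pos, &sz);
	pci_write_config_dword(dev, pos, orig);

	/* for a 16-bit-only I/O BAR, (sz & 0xffff0000) reads back 0 */
	return sz & PCI_BASE_ADDRESS_IO_MASK;
}
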
On 05/30/2012 09:27 AM, Bjorn Helgaas wrote:
>
> You're right, the spec does allow the upper 16 bits of I/O BARs to be
> hardwired to zero (PCI spec rev 3.0, p. 226), so this part does make
> some sense.  I don't think it applies to x86, since I don't think
> there's a way to generate an I/O access to anything above 64K, but it
> could help other arches.
>
> I'm inclined to be conservative and wait until we find a problem where
> a patch like this would help.
>

The really conservative thing is to just use 16-bit addresses for I/O
on all platforms.  I think a lot of non-x86 platforms do that.

	-hpa

On Wed, May 30, 2012 at 9:27 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>
> You're right, the spec does allow the upper 16 bits of I/O BARs to be
> hardwired to zero (PCI spec rev 3.0, p. 226), so this part does make
> some sense.  I don't think it applies to x86, since I don't think
> there's a way to generate an I/O access to anything above 64K, but it
> could help other arches.
>
> I'm inclined to be conservative and wait until we find a problem where
> a patch like this would help.

I agree.  I would be *very* suspicious of "it might help with other
architectures".

Sure, other architectures might be using non-16-bit IO addresses for
their motherboard resources (just because they can).  However, any
hotplug device is most likely (by far) to have been tested and
designed for x86, which means that even if it might look like it has
more than 16 bits of IO space, it will never have been tested that
way.

So rather than make it likely to help other architectures, I'd say
that it would make it likely to break things.

            Linus

On Tue, May 29, 2012 at 1:50 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>
> The bus-side address space should not be more than 32 bits no matter
> what.  As Bjorn indicates, you seem to be mixing up bus and cpu
> addresses all over the place.

please check the updated patches, which use the converted pci bus
address for boundary checking.

Thanks

Yinghai Lu

On Fri, Jun 1, 2012 at 4:30 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Tue, May 29, 2012 at 1:50 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>>
>> The bus-side address space should not be more than 32 bits no matter
>> what.  As Bjorn indicates, you seem to be mixing up bus and cpu
>> addresses all over the place.
>
> please check the updated patches, which use the converted pci bus
> address for boundary checking.

What problem does this fix?  There's significant risk that this
allocation change will make us trip over something, so it must fix
something to make it worth considering.

Steve's problem doesn't count because that's a "pci=nocrs" case that
will always require special handling.  A general solution is not
possible without a BIOS change (to describe >4GB apertures) or a
native host bridge driver (to discover >4GB apertures from the
hardware).  These patches only make Steve's machine work by accident
-- they make us put the video device above 4GB, and we're just lucky
that the host bridge claims that region.

One possibility is some sort of boot-time option to force a PCI device
to a specified address.  That would be useful for debugging as well as
for Steve's machine.

On Sun, Jun 3, 2012 at 6:05 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Fri, Jun 1, 2012 at 4:30 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> please check the updated patches, which use the converted pci bus
>> address for boundary checking.
>
> What problem does this fix?  There's significant risk that this
> allocation change will make us trip over something, so it must fix
> something to make it worth considering.

If we do not enable that, we would not find the problem.

On one my test setup that _CRS does state 64bit resource range, but
when I clear some device resource manually and let kernel allocate
high, just then find out those devices does not work with drivers.
It turns out _CRS have more big range than what the chipset setting
states.  with fixing in BIOS, allocate high is working now on that
platform.

> Steve's problem doesn't count because that's a "pci=nocrs" case that
> will always require special handling.

but pci=nocrs is still supported, even some systems does not work with
pci=use_crs

> A general solution is not possible without a BIOS change (to describe
> >4GB apertures) or a native host bridge driver (to discover >4GB
> apertures from the hardware).  These patches only make Steve's machine
> work by accident -- they make us put the video device above 4GB, and
> we're just lucky that the host bridge claims that region.

Some bios looks enabling the non-stated range default to legacy chain.
Some bios does not do that.  only stated range count.
So with pci=nocrs we still have some chance to get allocate high
working.

> One possibility is some sort of boot-time option to force a PCI device
> to a specified address.  That would be useful for debugging as well as
> for Steve's machine.

yeah, how about

    pci=alloc_high

and default to disabled?

Thanks

Yinghai

On Mon, Jun 4, 2012 at 7:37 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Sun, Jun 3, 2012 at 6:05 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> What problem does this fix?  There's significant risk that this
>> allocation change will make us trip over something, so it must fix
>> something to make it worth considering.
>
> If we do not enable that, we would not find the problem.

Sorry, that didn't make any sense to me.  I'm hoping you will point us
to a bug report that is fixed by this patch.

> On one my test setup that _CRS does state 64bit resource range, but
> when I clear some device resource manually and let kernel allocate
> high, just then find out those devices does not work with drivers.
> It turns out _CRS have more big range than what the chipset setting
> states.  with fixing in BIOS, allocate high is working now on that
> platform.

I didn't understand this either, sorry.  Are you saying that this
patch helps us work around a BIOS defect?

> Some bios looks enabling the non-stated range default to legacy chain.
> Some bios does not do that.  only stated range count.
> So with pci=nocrs we still have some chance to get allocate high
> working.

The patch as proposed changes behavior for all systems, whether we're
using _CRS or not (in fact, it even changes the behavior for non-x86
systems).  The only case we know of where it fixes something is
Steve's system, where he already has to use "pci=nocrs" in order for
it to help.  My point is that it would be safer to leave things as
they are for everybody, and merely ask Steve to use "pci=nocrs
pci=alloc_high" or something similar.

> yeah, how about
>
>     pci=alloc_high
>
> and default to disabled?

I was actually thinking of something more specific, e.g., a way to
place one device at an exact address.  I've implemented that a couple
times already for testing various things.  But maybe a more general
option like "pci=alloc_high" would make sense, too.

Linux has a long history of allocating bottom-up.  Windows has a long
history of allocating top-down.  You're proposing a third alternative,
allocating bottom-up starting at 4GB for 64-bit BARs.  If we change
this area, I would prefer something that follows Windows because I
think it will be closer to what's been tested by Windows.  Do you
think your alternative is better?

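If "pci=alloc_high" were pursued, the plumbing could look like the
sketch below.  This is hypothetical: the flag and option name come
from this thread, not mainline, and the real kernel parses "pci="
options inside pci_setup() in drivers/pci/pci.c; a standalone
early_param is shown only for clarity.

#include <linux/init.h>
#include <linux/string.h>
#include <linux/types.h>

/* Hypothetical switch the allocator would consult before preferring
 * space above 4GB for 64-bit BARs; off unless requested on the
 * command line. */
bool pci_alloc_high;

static int __init parse_pci_alloc_high(char *str)
{
	if (str && !strcmp(str, "alloc_high"))
		pci_alloc_high = true;
	return 0;
}
early_param("pci", parse_pci_alloc_high);
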
On Mon, Jun 4, 2012 at 9:50 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Mon, Jun 4, 2012 at 7:37 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> If we do not enable that, we would not find the problem.
>
> Sorry, that didn't make any sense to me.  I'm hoping you will point us
> to a bug report that is fixed by this patch.

currently it only helps Steve's test case.

> I didn't understand this either, sorry.  Are you saying that this
> patch helps us work around a BIOS defect?

it helped us find one BIOS defect.

> I was actually thinking of something more specific, e.g., a way to
> place one device at an exact address.  I've implemented that a couple
> times already for testing various things.  But maybe a more general
> option like "pci=alloc_high" would make sense, too.

yeah.

> Linux has a long history of allocating bottom-up.  Windows has a long
> history of allocating top-down.  You're proposing a third alternative,
> allocating bottom-up starting at 4GB for 64-bit BARs.  If we change
> this area, I would prefer something that follows Windows because I
> think it will be closer to what's been tested by Windows.  Do you
> think your alternative is better?

hope we can figure out how windows is making it work.

Steve, can you check if Windows is working with your test case?

If it works, we may try to do the same thing from Linux, so you will
not need to append "pci=nocrs pci=alloc_high"...

Thanks

Yinghai

On Tue, 5 Jun 2012, 06:04:57 BST, Yinghai Lu <yinghai@kernel.org> wrote:
>> Linux has a long history of allocating bottom-up.  Windows has a long
>> history of allocating top-down.  You're proposing a third alternative,
>> allocating bottom-up starting at 4GB for 64-bit BARs.  If we change
>> this area, I would prefer something that follows Windows because I
>> think it will be closer to what's been tested by Windows.  Do you
>> think your alternative is better?
>
> hope we can figure out how windows is making it work.
>
> Steve, can you check if Windows is working with your test case?
>
> If it works, we may try to do the same thing from Linux, so you will
> not need to append "pci=nocrs pci=alloc_high"...
>

Unfortunately I don't have a 64 bit version of Windows to test with.
Vista (32 bit) fails to even boot when docked; hot-plugging fails to
allocate resources, but at least doesn't crash.

From what I've read about the (64 bit) Windows allocation strategy,
it's closer to Yinghai's method than the Linux default, preferring 64
bit resources (>4G) when possible.  I'll try to find the specification
document again.

On Wed, Jun 6, 2012 at 2:44 AM, Steven Newbury <steve@snewbury.org.uk> wrote:
> On Tue, 5 Jun 2012, 06:04:57 BST, Yinghai Lu <yinghai@kernel.org> wrote:
>> > Linux has a long history of allocating bottom-up.  Windows has a long
>> > history of allocating top-down.  You're proposing a third alternative,
>> > allocating bottom-up starting at 4GB for 64-bit BARs.  If we change
>> > this area, I would prefer something that follows Windows because I
>> > think it will be closer to what's been tested by Windows.  Do you
>> > think your alternative is better?
>>
>> I hope we can figure out how Windows makes it work.
>>
>> Steve, can you check whether Windows works with your test case?
>>
>> If it works, we may try to do the same thing in Linux, so you will not
>> need to append "pci=nocrs pci=alloc_high"...
>>
> Unfortunately I don't have a 64-bit version of Windows to test with.
> Vista (32-bit) fails to even boot when docked; hot-plugging fails to
> allocate resources, but at least it doesn't crash.
>
> From what I've read about the (64-bit) Windows allocation strategy, it
> is closer to Yinghai's method than to the Linux default, preferring
> 64-bit resources (>4G) when possible.  I'll try to find the
> specification document again.

Here's the host bridge info from the BIOS (from
https://bugzilla.kernel.org/show_bug.cgi?id=10461 attachment
https://bugzilla.kernel.org/attachment.cgi?id=72869):

ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
pci_root PNP0A03:00: host bridge window [io  0x0000-0x0cf7]
pci_root PNP0A03:00: host bridge window [io  0x0d00-0xffff]
pci_root PNP0A03:00: host bridge window [mem 0x000a0000-0x000bffff]
pci_root PNP0A03:00: host bridge window [mem 0x000d0000-0x000dffff]
pci_root PNP0A03:00: host bridge window [mem 0xe0000000-0xf7ffffff]
pci_root PNP0A03:00: host bridge window [mem 0xfc000000-0xfebfffff]
pci_root PNP0A03:00: host bridge window [mem 0xfec10000-0xfecfffff]
pci_root PNP0A03:00: host bridge window [mem 0xfed1c000-0xfed1ffff]
pci_root PNP0A03:00: host bridge window [mem 0xfed90000-0xfed9ffff]
pci_root PNP0A03:00: host bridge window [mem 0xfed40000-0xfed44fff]
pci_root PNP0A03:00: host bridge window [mem 0xfeda7000-0xfedfffff]
pci_root PNP0A03:00: host bridge window [mem 0xfee10000-0xff9fffff]
pci_root PNP0A03:00: host bridge window [mem 0xffc00000-0xffdfffff]

There's no aperture above 4GB.  So I don't think any version of Windows
will ever assign a BAR above 4GB on this machine.
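[Editor's note: Bjorn's point is mechanical to check: the ">4GB first"
pass can only ever succeed when some root-bus MMIO window actually
extends past 4GB.  A minimal sketch of that test follows; the helper
name is invented, while pci_bus_for_each_resource() is the real
iterator from <linux/pci.h>.]

/*
 * Sketch: returns true only if at least one MMIO window on the bus
 * reaches above 4GB.  Illustrative helper, not kernel code.
 */
#include <linux/ioport.h>
#include <linux/pci.h>

static bool bus_has_mem_above_4g(struct pci_bus *bus)
{
	struct resource *r;
	int i;

	pci_bus_for_each_resource(bus, r, i) {
		if (r && (r->flags & IORESOURCE_MEM) &&
		    r->end >= (1ULL << 32))
			return true;
	}
	return false;
}

For the window list above, every [mem ...] range ends below 4GB
(0xffdfffff at most), so this check would return false, matching
Bjorn's conclusion that no OS could place a BAR above 4GB there.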
On Wed, 2012-07-04 at 10:56 +0800, lee joey wrote:
>
> ---------- Forwarded message ----------
> From: Bjorn Helgaas <bhelgaas@google.com>
> Date: 2012/6/7
> Subject: Re: [PATCH 02/11] PCI: Try to allocate mem64 above 4G at first
> To: Steven Newbury <steve@snewbury.org.uk>
> Cc: Yinghai Lu <yinghai@kernel.org>, "H. Peter Anvin" <hpa@zytor.com>,
> David Miller <davem@davemloft.net>, Tony Luck <tony.luck@intel.com>,
> Linus Torvalds <torvalds@linux-foundation.org>, Andrew Morton
> <akpm@linux-foundation.org>, linux-pci@vger.kernel.org,
> linux-kernel@vger.kernel.org
>
> [Bjorn's message, quoted in full above, trimmed]
>
> Hope this helps...

Here is a document from MSDN that talks about the PCI allocation
strategy on Windows Server 2003, XP and Vista:
http://msdn.microsoft.com/en-us/library/windows/hardware/gg462986.aspx

Per page 4, it looks like Microsoft uses a different strategy on each
Windows version.

On XP and Server 2003: they ignore the BIOS's boot configuration and
first allocate below 4GB.  If that fails, they then try to allocate
above 4GB.

On Vista: it always respects the boot configuration of devices above
4GB.

This document doesn't cover the behavior of Windows 7, though, so I'm
not sure whether it is the same as Vista's.

Thanks
Joey Lee
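[Editor's note: a toy model of the XP/Server 2003 strategy described in
the MSDN document may make the fallback order concrete.  Everything
here, the try_alloc() helper included, is invented for illustration; it
is neither Windows nor kernel code.]

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define FOUR_GB (1ULL << 32)

/* Pretend allocator: succeeds iff [min, max] is big enough for size.
 * Stands in for the real per-window resource search. */
static bool try_alloc(uint64_t min, uint64_t max, uint64_t size,
		      uint64_t *where)
{
	if (max - min + 1ULL < size)
		return false;
	*where = min;
	return true;
}

/* XP / Server 2003 per the document: ignore the BIOS boot config,
 * prefer below 4GB, fall back to above 4GB.  (Vista instead keeps a
 * BIOS assignment that already sits above 4GB.) */
static bool xp_style(uint64_t size, uint64_t *where)
{
	if (try_alloc(0, FOUR_GB - 1, size, where))
		return true;
	return try_alloc(FOUR_GB, UINT64_MAX, size, where);
}

int main(void)
{
	uint64_t where;

	if (xp_style(1 << 20, &where))
		printf("placed at %#llx\n", (unsigned long long)where);
	return 0;
}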
diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 4ce5ef2..075e5b1 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -121,14 +121,16 @@ pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
 {
 	int i, ret = -ENOMEM;
 	struct resource *r;
-	resource_size_t max = -1;
+	resource_size_t start = 0;
+	resource_size_t end = MAX_RESOURCE;
 
 	type_mask |= IORESOURCE_IO | IORESOURCE_MEM;
 
-	/* don't allocate too high if the pref mem doesn't support 64bit*/
-	if (!(res->flags & IORESOURCE_MEM_64))
-		max = PCIBIOS_MAX_MEM_32;
+	/* If this is a 64-bit mem resource, try above 4GB first */
+	if (res->flags & IORESOURCE_MEM_64)
+		start = (resource_size_t) (1ULL << 32);
 
+again:
 	pci_bus_for_each_resource(bus, r, i) {
 		if (!r)
 			continue;
@@ -145,12 +147,18 @@ pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
 		/* Ok, try it out.. */
 		ret = allocate_resource(r, res, size,
-					r->start ? : min,
-					max, align,
+					max(start, r->start ? : min),
+					end, align,
 					alignf, alignf_data);
 		if (ret == 0)
-			return 0;
+			return 0;
+	}
+
+	if (start != 0) {
+		start = 0;
+		goto again;
 	}
+
 	return ret;
 }
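[Editor's note: one construct in the hunk above that often trips
readers up is "r->start ? : min".  This is the GNU C conditional with
an omitted middle operand: "a ?: b" means "a ? a : b", with "a"
evaluated only once.  A tiny standalone illustration, compilable with
GCC or Clang:]

#include <stdio.h>

int main(void)
{
	unsigned long start = 0, min = 0x1000;

	/* GNU C extension: start ?: min yields start if non-zero,
	 * otherwise min; start is evaluated only once. */
	unsigned long lo = start ?: min;

	printf("%#lx\n", lo);	/* prints 0x1000 */
	return 0;
}

So in the patch, the first allocation argument is the bus resource's
own start if it has one, otherwise the caller's min, and the first
pass then clamps that bottom to 4GB via max(start, ...) before the
retry loop drops start back to 0.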