Message ID | 20240716193246.1909697-1-stewart.hildebrand@amd.com (mailing list archive) |
---|---|
Headers | show |
Series | PCI: Align small (<4k) BARs | expand |
From: Stewart Hildebrand > Sent: 16 July 2024 20:33 > > This series sets the default minimum resource alignment to 4k for memory > BARs. In preparation, it makes an optimization and addresses some corner > cases observed when reallocating BARs. I consider the prepapatory > patches to be prerequisites to changing the default BAR alignment. Should the BARs be page aligned on systems with large pages? At least as an option for hypervisor pass-through and any than can be mmap()ed into userspace. Does any hardware actually have 'silly numbers' of small memory BARs? I have a vague memory of some ethernet controllers having lots of (?) virtual devices that might have separate registers than can be mapped out to a hypervisor. Expanding those to a large page might be problematic - but needed for security. For more normal hardware just ensuring that two separate targets don't share a page while allowing (eg) two 1k BAR to reside in the same 64k page would give some security. Aligning a small MSIX BAR is unlikely to have any effect on the address space utilisation (for PCIe) since the bridge will assign a power of two sized block - with a big pad (useful for generating pcie errors!) David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
From: David Laight > Sent: 17 July 2024 14:15 > > From: Stewart Hildebrand > > Sent: 16 July 2024 20:33 > > > > This series sets the default minimum resource alignment to 4k for memory > > BARs. In preparation, it makes an optimization and addresses some corner > > cases observed when reallocating BARs. I consider the prepapatory > > patches to be prerequisites to changing the default BAR alignment. > > Should the BARs be page aligned on systems with large pages? > At least as an option for hypervisor pass-through and any than can be mmap()ed > into userspace. The actual patch says PAGE_SIZE .. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On 7/17/24 09:15, David Laight wrote: > From: Stewart Hildebrand >> Sent: 16 July 2024 20:33 >> >> This series sets the default minimum resource alignment to 4k for memory >> BARs. In preparation, it makes an optimization and addresses some corner >> cases observed when reallocating BARs. I consider the prepapatory >> patches to be prerequisites to changing the default BAR alignment. > > Should the BARs be page aligned on systems with large pages? > At least as an option for hypervisor pass-through and any than can be mmap()ed > into userspace. It is sort of an option already using the pci=resource_alignment= option, but you'd need to spell out every device and manually set the alignment value, and you'd still end up with fake BAR sizes. I had actually prepared locally a patch to make this less painful to do and preserve the BAR size (introduce "pci=resource_alignment=all" option), but I'd like Bjorn's opinion before sending since there has been some previous reluctance to making changes to the pci=resource_alignment= option [2]. [2] https://lore.kernel.org/linux-pci/20160929115422.GA31048@localhost/ Anyway, 4k is more defensible because that is what the PCIe 6.1 spec calls out, and that is better than the current situation of no default minimum alignment. I feel PAGE_SIZE is also justified, and that is why the actual patch now says max(SZ_4K, PAGE_SIZE) as you pointed out elsewhere. This is a change from v1 that simply had 4k (sorry, I forgot to update the cover letter). PowerNV has been using PAGE_SIZE since commits 0a701aa63784 and 382746376993, I think with 64k page size. I don't have a strong opinion whether the common default should be max(SZ_4K, PAGE_SIZE) or simply SZ_4K. > Does any hardware actually have 'silly numbers' of small memory BARs? > > I have a vague memory of some ethernet controllers having lots of (?) > virtual devices that might have separate registers than can be mapped > out to a hypervisor. > Expanding those to a large page might be problematic - but needed for security. This series does not change alignment of SRIOV / VF BARs. See commits 62d9a78f32d9, ea629d873f3e, and PCIe 6.1 spec section 9.2.1.1.1. > For more normal hardware just ensuring that two separate targets don't share > a page while allowing (eg) two 1k BAR to reside in the same 64k page would > give some security. Allow me to understand this better, with an example: PCI Device A BAR 1 (1k) BAR 2 (1k) PCI Device B BAR 1 (1k) BAR 2 (1k) We align all BARs to 4k. Additionally, are you saying it would be ok to let both device A BARs to reside in the same 64k page, while device B BARs would need to reside in a separate 64k page? I.e. having two levels of alignment: PAGE_SIZE on a per-device basis, and 4k on a per-BAR basis? If I understand you correctly, there's currently no logic in the PCI subsystem to easily support this, so that is a rather large ask. I'm also not sure that it's necessary. > Aligning a small MSIX BAR is unlikely to have any effect on the address > space utilisation (for PCIe) since the bridge will assign a power of two > sized block - with a big pad (useful for generating pcie errors!) > > David
From: Stewart Hildebrand > Sent: 17 July 2024 19:31 ... > > For more normal hardware just ensuring that two separate targets don't share > > a page while allowing (eg) two 1k BAR to reside in the same 64k page would > > give some security. > > Allow me to understand this better, with an example: > > PCI Device A > BAR 1 (1k) > BAR 2 (1k) > > PCI Device B > BAR 1 (1k) > BAR 2 (1k) > > We align all BARs to 4k. Additionally, are you saying it would be ok to > let both device A BARs to reside in the same 64k page, while device B > BARs would need to reside in a separate 64k page? I.e. having two levels > of alignment: PAGE_SIZE on a per-device basis, and 4k on a per-BAR > basis? > > If I understand you correctly, there's currently no logic in the PCI > subsystem to easily support this, so that is a rather large ask. I'm > also not sure that it's necessary. That is what I was thinking, but it probably doesn't matter. It would only be necessary if the system would otherwise run out of PCI(e) address space. Even after I reduced our FPGAs BARs from 32MB to 'only' 4MB (1MB + 1MB + 8k) we still get issues with some PC bios failing to allocate the resources in some slots - but these are old x86-64 systems that might have been expected to run 32bit windows. The requirement to use a separate BAR for MSIX pretty much doubles the required address space. As an aside, if a PCIe device asks for: BAR-0 (4k) BAR-1 (8k) BAR-2 (4k) (which is a bit silly) does it get packed into 16k with no padding by assigning BAR-2 between BAR-0 and BAR-1, or is it all padded out to 32k. I'd probably add a comment to say it isn't done :-) David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On 7/18/24 06:01, David Laight wrote: > From: Stewart Hildebrand >> Sent: 17 July 2024 19:31 > ... >>> For more normal hardware just ensuring that two separate targets don't share >>> a page while allowing (eg) two 1k BAR to reside in the same 64k page would >>> give some security. >> >> Allow me to understand this better, with an example: >> >> PCI Device A >> BAR 1 (1k) >> BAR 2 (1k) >> >> PCI Device B >> BAR 1 (1k) >> BAR 2 (1k) >> >> We align all BARs to 4k. Additionally, are you saying it would be ok to >> let both device A BARs to reside in the same 64k page, while device B >> BARs would need to reside in a separate 64k page? I.e. having two levels >> of alignment: PAGE_SIZE on a per-device basis, and 4k on a per-BAR >> basis? >> >> If I understand you correctly, there's currently no logic in the PCI >> subsystem to easily support this, so that is a rather large ask. I'm >> also not sure that it's necessary. > > That is what I was thinking, but it probably doesn't matter. > It would only be necessary if the system would otherwise run out > of PCI(e) address space. > > Even after I reduced our FPGAs BARs from 32MB to 'only' 4MB (1MB + 1MB + 8k) > we still get issues with some PC bios failing to allocate the resources > in some slots - but these are old x86-64 systems that might have been expected > to run 32bit windows. I expect this series will not make any difference with that particular scenario since the BARs are >4k (and PAGE_SIZE == 4k on x86). > The requirement to use a separate BAR for MSIX pretty much doubles the > required address space. 4k region, not BAR. > As an aside, if a PCIe device asks for: > BAR-0 (4k) > BAR-1 (8k) > BAR-2 (4k) > (which is a bit silly) > does it get packed into 16k with no padding by assigning BAR-2 between > BAR-0 and BAR-1, or is it all padded out to 32k. > I'd probably add a comment to say it isn't done :-) On a system with 4k page size, this series should not affect the example you've provided since those BARs are all 4k or larger. If you are testing with this series applied to your kernel and notice any regression, please let me know.