
[v2] PCI: vmd: Enable Hotplug based on BIOS setting on VMD rootports

Message ID 20231127211729.42668-1-nirmal.patel@linux.intel.com
State New
Series [v2] PCI: vmd: Enable Hotplug based on BIOS setting on VMD rootports

Commit Message

Nirmal Patel Nov. 27, 2023, 9:17 p.m. UTC
Currently VMD copies the root bridge settings to enable hotplug on its
rootports. This mechanism works fine for the Host OS and no issue has
been observed. However, in the case of a VM, hypervisors do not pass
the hotplug setting to the guest BIOS, which results in default values
being assigned and the hotplug capability being disabled in the guest;
this has been observed by many OEMs.

VMD hotplug can be enabled or disabled based on the VMD rootports'
hotplug configuration in BIOS. is_hotplug_bridge is set on each
VMD rootport based on the Hot-Plug Capable bit in SltCap in probe.c.
Check is_hotplug_bridge and enable or disable native_pcie_hotplug
based on that value.

This patch makes sure that hotplug is enabled properly in the Host
as well as in the VM, honoring both the _OSC settings and the VMD
hotplug setting.

Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
---
v1->v2: Updating commit message.
---
 drivers/pci/controller/vmd.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

Comments

Bjorn Helgaas Dec. 2, 2023, 12:07 a.m. UTC | #1
On Mon, Nov 27, 2023 at 04:17:29PM -0500, Nirmal Patel wrote:
> Currently VMD copies root bridge setting to enable Hotplug on its
> rootports. This mechanism works fine for Host OS and no issue has
> been observed. However in case of VM, all the HyperVisors don't
> pass the Hotplug setting to the guest BIOS which results in
> assigning default values and disabling Hotplug capability in the
> guest which have been observed by many OEMs.

Can we make this a little more specific?  I'm lacking a lot of
virtualization background here, so I'm going to ask a bunch of stupid
questions here:

"VMD copies root bridge setting" refers to _OSC and the copying done
in vmd_copy_host_bridge_flags()?  Obviously this must be in the Host
OS.

"This mechanism works fine for Host OS and no issue has been
observed."  I guess this means that if the platform grants hotplug
control to the Host OS via _OSC, pciehp in the Host OS works fine to
manage hotplug on Root Ports below the VMD?

If the platform does *not* grant hotplug control to the Host OS, I
guess it means the Host OS pciehp can *not* manage hotplug below VMD?
Is that also what you expect?

"However in case of VM, all the HyperVisors don't pass the Hotplug
setting to the guest BIOS"  What is the mechanism by which they would
pass the hotplug setting?  I guess the guest probably doesn't see the
VMD endpoint itself, right?  The guest sees either virtualized or
passed-through VMD Root Ports?

I assume the guest OS sees a constructed ACPI system (different from
the ACPI environment the host sees), so it sees a PNP0A03 host bridge
with _OSC?  And presumably VMD Root Ports below that host bridge?

So are you saying the _OSC seen by the guest doesn't grant hotplug
control to the guest OS?  And it doesn't do that because the guest
BIOS hasn't implemented _OSC?  Or maybe the guest BIOS relies on the
Slot Capabilities registers of VMD Root Ports to decide whether _OSC
should grant hotplug control?  And those Slot Capabilities don't
advertise hotplug support?

"... which results in assigning default values and disabling Hotplug
capability in the guest."  What default values are these?  Is this
talking about the default host_bridge->native_pcie_hotplug value set
in acpi_pci_root_create(), where the OS assumes hotplug is owned by
the platform unless _OSC grants control to the OS?

> VMD Hotplug can be enabled or disabled based on the VMD rootports'
> Hotplug configuration in BIOS. is_hotplug_bridge is set on each
> VMD rootport based on Hotplug capable bit in SltCap in probe.c.
> Check is_hotplug_bridge and enable or disable native_pcie_hotplug
> based on that value.
> 
> This patch will make sure that Hotplug is enabled properly in Host
> as well as in VM while honoring _OSC settings as well as VMD hotplug
> setting.

"Hotplug is enabled properly in Host as well as in VM" sounds like
we're talking about both the host OS and the guest OS.

In the host OS, this overrides whatever was negotiated via _OSC, I
guess on the principle that _OSC doesn't apply because the host BIOS
doesn't know about the Root Ports below the VMD.  (I'm not sure it's
100% resolved that _OSC doesn't apply to them, so we should mention
that assumption here.)

In the guest OS, this still overrides whatever was negotiated via
_OSC, but presumably the guest BIOS *does* know about the Root Ports,
so I don't think the same principle about overriding _OSC applies
there.

I think it's still a problem that we handle
host_bridge->native_pcie_hotplug differently than all the other
features negotiated via _OSC.  We should at least explain this
somehow.

I suspect part of the reason for doing it differently is to avoid the
interrupt storm that we avoid via 04b12ef163d1 ("PCI: vmd: Honor ACPI
_OSC on PCIe features").  If so, I think 04b12ef163d1 should be
mentioned here because it's an important piece of the picture.

> Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
> ---
> v1->v2: Updating commit message.
> ---
>  drivers/pci/controller/vmd.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> index 769eedeb8802..e39eaef5549a 100644
> --- a/drivers/pci/controller/vmd.c
> +++ b/drivers/pci/controller/vmd.c
> @@ -720,6 +720,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
>  	resource_size_t membar2_offset = 0x2000;
>  	struct pci_bus *child;
>  	struct pci_dev *dev;
> +	struct pci_host_bridge *vmd_bridge;
>  	int ret;
>  
>  	/*
> @@ -886,8 +887,16 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
>  	 * and will fail pcie_bus_configure_settings() early. It can instead be
>  	 * run on each of the real root ports.
>  	 */
> -	list_for_each_entry(child, &vmd->bus->children, node)
> +	vmd_bridge = to_pci_host_bridge(vmd->bus->bridge);
> +	list_for_each_entry(child, &vmd->bus->children, node) {
>  		pcie_bus_configure_settings(child);
> +		/*
> +		 * When Hotplug is enabled on vmd root-port, enable it on vmd
> +		 * bridge.
> +		 */
> +		if (child->self->is_hotplug_bridge)
> +			vmd_bridge->native_pcie_hotplug = 1;

"is_hotplug_bridge" basically means PCI_EXP_SLTCAP_HPC is set, which
means "the hardware of this slot is hot-plug *capable*."

Whether hotplug is *enabled* is a separate policy decision about
whether the OS is allowed to operate the hotplug functionality, so I
think saying "When Hotplug is enabled" in the comment is a little bit
misleading.

Bjorn

> +	}
>  
>  	pci_bus_add_devices(vmd->bus);
>  
> -- 
> 2.31.1
>
Nirmal Patel Dec. 5, 2023, 10:20 p.m. UTC | #2
On Fri, 2023-12-01 at 18:07 -0600, Bjorn Helgaas wrote:
> On Mon, Nov 27, 2023 at 04:17:29PM -0500, Nirmal Patel wrote:
> > Currently VMD copies root bridge setting to enable Hotplug on its
> > rootports. This mechanism works fine for Host OS and no issue has
> > been observed. However in case of VM, all the HyperVisors don't
> > pass the Hotplug setting to the guest BIOS which results in
> > assigning default values and disabling Hotplug capability in the
> > guest which have been observed by many OEMs.
> 
> Can we make this a little more specific?  I'm lacking a lot of
> virtualization background here, so I'm going to ask a bunch of stupid
> questions here:
> 
> "VMD copies root bridge setting" refers to _OSC and the copying done
> in vmd_copy_host_bridge_flags()?  Obviously this must be in the Host
> OS.
Yes, we are talking about vmd_copy_host_bridge_flags(). It gets called
in both Host and Guest, but it assigns different values. In the Host it
assigns the correct values, while in the Guest it assigns the power-on
default value of 0, which in simple terms disables everything including
hotplug.
> 
> "This mechanism works fine for Host OS and no issue has been
> observed."  I guess this means that if the platform grants hotplug
> control to the Host OS via _OSC, pciehp in the Host OS works fine to
> manage hotplug on Root Ports below the VMD?
Yes. When the platform hotplug setting is enabled, _OSC assigns the
correct value to the VMD root ports via vmd_copy_host_bridge_flags().
> 
> If the platform does *not* grant hotplug control to the Host OS, I
> guess it means the Host OS pciehp can *not* manage hotplug below VMD?
> Is that also what you expect?
> 
> "However in case of VM, all the HyperVisors don't pass the Hotplug
> setting to the guest BIOS"  What is the mechanism by which they would
> pass the hotplug setting?  I guess the guest probably doesn't see the
> VMD endpoint itself, right?  The guest sees either virtualized or
> passed-through VMD Root Ports?
In the Guest, VMD is passed through, including the PCI topology behind
VMD. The way we verify hotplug capability is to read the Slot
Capabilities register's Hot-Plug Capable bit on the VMD root ports in
the Guest.

The values copied in vmd_copy_host_bridge_flags() are different in the
Host than in the Guest. I do not know which component is responsible
for this in each hypervisor, but I tested this on ESXi, Hyper-V, and
KVM, and they all had the same behavior: the _OSC flags are set to the
power-on default values of 0 by vmd_copy_host_bridge_flags() in the
Guest OS, which disables hotplug.
> 
> I assume the guest OS sees a constructed ACPI system (different from
> the ACPI environment the host sees), so it sees a PNP0A03 host bridge
> with _OSC?  And presumably VMD Root Ports below that host bridge?
> 
> So are you saying the _OSC seen by the guest doesn't grant hotplug
> control to the guest OS?  And it doesn't do that because the guest
> BIOS hasn't implemented _OSC?  Or maybe the guest BIOS relies on the
> Slot Capabilities registers of VMD Root Ports to decide whether _OSC
> should grant hotplug control?  And those Slot Capabilities don't
> advertise hotplug support?
Currently hotplug is controlled by _OSC in both the Host and the Guest.
In the case of the Guest, it seems the guest BIOS hasn't implemented
_OSC, since all the flags are set to 0, which is not the case in the
Host.

VMD is passed through in the Guest, so the Slot Capabilities registers
are still the same as what the Host driver would see. That is how we
can still control hotplug on VMD in both the Guest and the Host.
> 
> "... which results in assigning default values and disabling Hotplug
> capability in the guest."  What default values are these?
0. Everything is disabled in guest. So basically _OSC is writing wrong
values in guest.
>   Is this
> talking about the default host_bridge->native_pcie_hotplug value set
> in acpi_pci_root_create(), where the OS assumes hotplug is owned by
> the platform unless _OSC grants control to the OS?
The vmd driver calls pci_create_root_bus(), which eventually ends up
calling pci_init_host_bridge(), where native_pcie_hotplug is set to 1.
vmd_copy_host_bridge_flags() then overwrites native_pcie_hotplug with
the _OSC setting of 0 in the Guest.
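
For reference, pci_init_host_bridge() starts every new host bridge with
the OS owning all the PCIe services, roughly like this (paraphrasing
drivers/pci/probe.c from memory, so the exact lines may differ):

  /* Sketch of the relevant defaults set in pci_init_host_bridge() */
  bridge->native_pcie_hotplug = 1;
  bridge->native_shpc_hotplug = 1;
  bridge->native_aer = 1;
  bridge->native_pme = 1;
  bridge->native_ltr = 1;
  bridge->native_dpc = 1;
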
> 
> > VMD Hotplug can be enabled or disabled based on the VMD rootports'
> > Hotplug configuration in BIOS. is_hotplug_bridge is set on each
> > VMD rootport based on Hotplug capable bit in SltCap in probe.c.
> > Check is_hotplug_bridge and enable or disable native_pcie_hotplug
> > based on that value.
> > 
> > This patch will make sure that Hotplug is enabled properly in Host
> > as well as in VM while honoring _OSC settings as well as VMD
> > hotplug
> > setting.
> 
> "Hotplug is enabled properly in Host as well as in VM" sounds like
> we're talking about both the host OS and the guest OS.
Yes. VM means guest.
> 
> In the host OS, this overrides whatever was negotiated via _OSC, I
> guess on the principle that _OSC doesn't apply because the host BIOS
> doesn't know about the Root Ports below the VMD.  (I'm not sure it's
> 100% resolved that _OSC doesn't apply to them, so we should mention
> that assumption here.)
_OSC still controls every flag including hotplug. I have observed that
the Slot Capabilities register has hotplug enabled only when the
platform has enabled hotplug, so technically we are not overriding it
in the Host. The patch does override it in the Guest, because _OSC is
passing wrong/different information than the _OSC information in the
Host. Also, as I mentioned, the Slot Capabilities registers are not
changed in the Guest because VMD is passed through.
> 
> In the guest OS, this still overrides whatever was negotiated via
> _OSC, but presumably the guest BIOS *does* know about the Root Ports,
> so I don't think the same principle about overriding _OSC applies
> there.
> 
> I think it's still a problem that we handle
> host_bridge->native_pcie_hotplug differently than all the other
> features negotiated via _OSC.  We should at least explain this
> somehow.
Right now this is the only way to know in the Guest OS whether the
platform has enabled hotplug or not. We have many customers complaining
about hotplug being broken in the Guest, so we are just trying to honor
_OSC while also fixing its limitation in the Guest.
> 
> I suspect part of the reason for doing it differently is to avoid the
> interrupt storm that we avoid via 04b12ef163d1 ("PCI: vmd: Honor ACPI
> _OSC on PCIe features").  If so, I think 04b12ef163d1 should be
> mentioned here because it's an important piece of the picture.
I can add a note about this patch as well. Let me know if you want me
to add something specific.

Thank you for taking the time. This is a very critical issue for us,
and we have many customers complaining about it, so I would appreciate
getting it resolved as soon as possible.

Thanks
nirmal
> 
> > Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
> > ---
> > v1->v2: Updating commit message.
> > ---
> >  drivers/pci/controller/vmd.c | 11 ++++++++++-
> >  1 file changed, 10 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/pci/controller/vmd.c
> > b/drivers/pci/controller/vmd.c
> > index 769eedeb8802..e39eaef5549a 100644
> > --- a/drivers/pci/controller/vmd.c
> > +++ b/drivers/pci/controller/vmd.c
> > @@ -720,6 +720,7 @@ static int vmd_enable_domain(struct vmd_dev
> > *vmd, unsigned long features)
> >  	resource_size_t membar2_offset = 0x2000;
> >  	struct pci_bus *child;
> >  	struct pci_dev *dev;
> > +	struct pci_host_bridge *vmd_bridge;
> >  	int ret;
> >  
> >  	/*
> > @@ -886,8 +887,16 @@ static int vmd_enable_domain(struct vmd_dev
> > *vmd, unsigned long features)
> >  	 * and will fail pcie_bus_configure_settings() early. It can
> > instead be
> >  	 * run on each of the real root ports.
> >  	 */
> > -	list_for_each_entry(child, &vmd->bus->children, node)
> > +	vmd_bridge = to_pci_host_bridge(vmd->bus->bridge);
> > +	list_for_each_entry(child, &vmd->bus->children, node) {
> >  		pcie_bus_configure_settings(child);
> > +		/*
> > +		 * When Hotplug is enabled on vmd root-port, enable it
> > on vmd
> > +		 * bridge.
> > +		 */
> > +		if (child->self->is_hotplug_bridge)
> > +			vmd_bridge->native_pcie_hotplug = 1;
> 
> "is_hotplug_bridge" basically means PCI_EXP_SLTCAP_HPC is set, which
> means "the hardware of this slot is hot-plug *capable*."
> 
> Whether hotplug is *enabled* is a separate policy decision about
> whether the OS is allowed to operate the hotplug functionality, so I
> think saying "When Hotplug is enabled" in the comment is a little bit
> misleading.
I will rewrite the comment.
> 
> Bjorn
> 
> > +	}
> >  
> >  	pci_bus_add_devices(vmd->bus);
> >  
> > -- 
> > 2.31.1
> >
Bjorn Helgaas Dec. 6, 2023, 12:27 a.m. UTC | #3
On Tue, Dec 05, 2023 at 03:20:27PM -0700, Nirmal Patel wrote:
> On Fri, 2023-12-01 at 18:07 -0600, Bjorn Helgaas wrote:
> > On Mon, Nov 27, 2023 at 04:17:29PM -0500, Nirmal Patel wrote:
> > > Currently VMD copies root bridge setting to enable Hotplug on its
> > > rootports. This mechanism works fine for Host OS and no issue has
> > > been observed. However in case of VM, all the HyperVisors don't
> > > pass the Hotplug setting to the guest BIOS which results in
> > > assigning default values and disabling Hotplug capability in the
> > > guest which have been observed by many OEMs.
> > 
> > Can we make this a little more specific?  I'm lacking a lot of
> > virtualization background here, so I'm going to ask a bunch of stupid
> > questions here:
> > 
> > "VMD copies root bridge setting" refers to _OSC and the copying done
> > in vmd_copy_host_bridge_flags()?  Obviously this must be in the Host
> > OS.
>
> Yes. we are talking about vmd_copy_host_bridge_flags. It gets called in
> both Host and Guest. But it assigns different values. In Host, it
> assigns correct values, while in guest it assigns power-on default
> value 0 which in simple term disable everything including hotplug.

If vmd_copy_host_bridge_flags() is called both in the Host and the
Guest, I guess that means both the VMD endpoint and the VMD Root Ports
below it are passed through to the Guest?  I expected only the VMD
Root Ports would be passed through, but maybe that's my
misunderstanding.

The VMD endpoint is under host bridge A, and the VMD creates a new PCI
domain under a new host bridge B.  vmd_copy_host_bridge_flags() copies
all the feature ownership flags from A to B.
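
For reference, vmd_copy_host_bridge_flags() is basically a straight
copy of those ownership flags, roughly (quoting from memory, so the
exact field list may differ):

  static void vmd_copy_host_bridge_flags(struct pci_host_bridge *root_bridge,
                                         struct pci_host_bridge *vmd_bridge)
  {
          /* B inherits whatever ownership A ended up with */
          vmd_bridge->native_pcie_hotplug = root_bridge->native_pcie_hotplug;
          vmd_bridge->native_shpc_hotplug = root_bridge->native_shpc_hotplug;
          vmd_bridge->native_aer = root_bridge->native_aer;
          vmd_bridge->native_pme = root_bridge->native_pme;
          vmd_bridge->native_ltr = root_bridge->native_ltr;
          vmd_bridge->native_dpc = root_bridge->native_dpc;
  }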

On ACPI systems (like both Host and Guest) all these flags are
initially cleared for host bridge A in acpi_pci_root_create() because
these features are owned by the platform until _OSC grants ownership
to the OS.  They are initially *set* for host bridge B in
pci_init_host_bridge() because it's created by the vmd driver instead
of being enumerated via ACPI.

If the Host _OSC grants ownership to the OS, A's native_pcie_hotplug
will be set, and vmd_copy_host_bridge_flags() leaves it set for B as
well.  If the Host _OSC does *not* grant hotplug ownership to the OS,
native_pcie_hotplug will be cleared for both A and B.

Apparently the Guest doesn't supply _OSC (seems like a spec violation;
we could tell from the dmesg), so A's native_pcie_hotplug remains
cleared, which means vmd_copy_host_bridge_flags() will also clear it
for B.

[The lack of a Guest BIOS _OSC would also mean that if you passed
through a normal non-VMD PCIe Switch Downstream Port to the Guest, the
Guest OS would never be able to manage hotplug below that Port.  Maybe
nobody passes through Switch Ports.]

> > "This mechanism works fine for Host OS and no issue has been
> > observed."  I guess this means that if the platform grants hotplug
> > control to the Host OS via _OSC, pciehp in the Host OS works fine to
> > manage hotplug on Root Ports below the VMD?
>
> Yes. When platform hotplug setting is enabled, then _OSC assigns
> correct value to vmd root ports via vmd_copy_host_bridge_flags.
>
> > If the platform does *not* grant hotplug control to the Host OS, I
> > guess it means the Host OS pciehp can *not* manage hotplug below VMD?
> > Is that also what you expect?
> > 
> > "However in case of VM, all the HyperVisors don't pass the Hotplug
> > setting to the guest BIOS"  What is the mechanism by which they would
> > pass the hotplug setting?  I guess the guest probably doesn't see the
> > VMD endpoint itself, right?  The guest sees either virtualized or
> > passed-through VMD Root Ports?
>
> In guest, vmd is passthrough including pci topology behind vmd. The way
> we verify Hotplug capability is to read Slot Capabilities registers'
> hotplug enabled bit set on vmd root ports in Guest.

I guess maybe this refers to set_pcie_hotplug_bridge(), which checks
PCI_EXP_SLTCAP_HPC?  If it's not set, pciehp won't bind to the device.
This behavior is the same in Host and Guest.
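
For reference, set_pcie_hotplug_bridge() in drivers/pci/probe.c does
roughly the following (sketch from memory; the exact code may differ):

  static void set_pcie_hotplug_bridge(struct pci_dev *pdev)
  {
          u32 reg32;

          /* is_hotplug_bridge just mirrors the Hot-Plug Capable bit in SltCap */
          pcie_capability_read_dword(pdev, PCI_EXP_SLTCAP, &reg32);
          if (reg32 & PCI_EXP_SLTCAP_HPC)
                  pdev->is_hotplug_bridge = 1;
  }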

> The values copied in vmd_copy_host_bridge_flags are different in Host
> than in Guest. I do not know what component is responsible for this in
> every HyperVisor. But I tested this in ESXi, HyperV, KVM and they all
> had the same behavior where the _OSC flags are set to power-on default
> values of 0 by vmd_copy_host_bridge_flags in Guest OS which is
> disabling hotplug.
>
> > I assume the guest OS sees a constructed ACPI system (different from
> > the ACPI environment the host sees), so it sees a PNP0A03 host bridge
> > with _OSC?  And presumably VMD Root Ports below that host bridge?
> > 
> > So are you saying the _OSC seen by the guest doesn't grant hotplug
> > control to the guest OS?  And it doesn't do that because the guest
> > BIOS hasn't implemented _OSC?  Or maybe the guest BIOS relies on the
> > Slot Capabilities registers of VMD Root Ports to decide whether _OSC
> > should grant hotplug control?  And those Slot Capabilities don't
> > advertise hotplug support?
>
> Currently Hotplug is controlled by _OSC in both host and Guest. In case
> of guest, it seems guest BIOS hasn't implemented _OSC since all the
> flags are set to 0 which is not the case in host.

I think you want pciehp to work on the VMD Root Ports in the Guest,
no matter what, on the assumption that any _OSC that applies to host
bridge A does not apply to the host bridge created by the vmd driver.

If so, we should just revert 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC
on PCIe features").  Since host bridge B was not enumerated via ACPI,
the OS owns all those features under B by default, so reverting
04b12ef163d1 would leave that state.

Obviously we'd have to first figure out another solution for the AER
message flood that 04b12ef163d1 resolved.

> VMD is passthrough in Guest so the slot capabilities registers are
> still same as what Host driver would see. That is how we can still
> control hotplug on vmd on both Guest and Host.
> > 
> > "... which results in assigning default values and disabling Hotplug
> > capability in the guest."  What default values are these?
>
> 0. Everything is disabled in guest. So basically _OSC is writing wrong
> values in guest.
>
> > Is this talking about the default host_bridge->native_pcie_hotplug
> > value set in acpi_pci_root_create(), where the OS assumes hotplug
> > is owned by the platform unless _OSC grants control to the OS?
>
> vmd driver calls pci_create_root_bus which eventually ends up
> calling pci_init_host_bridge where native_pcie_hotplug is set to 1.
> Now vmd_copy_host_bridge_flags overwrites the native_pcie_hotplug to
> _OSC setting of 0 in guest.
>
> > > VMD Hotplug can be enabled or disabled based on the VMD
> > > rootports' Hotplug configuration in BIOS. is_hotplug_bridge is
> > > set on each VMD rootport based on Hotplug capable bit in SltCap
> > > in probe.c.  Check is_hotplug_bridge and enable or disable
> > > native_pcie_hotplug based on that value.
> > > 
> > > This patch will make sure that Hotplug is enabled properly in
> > > Host as well as in VM while honoring _OSC settings as well as
> > > VMD hotplug setting.
> > 
> > "Hotplug is enabled properly in Host as well as in VM" sounds like
> > we're talking about both the host OS and the guest OS.
>
> Yes. VM means guest.
> > 
> > In the host OS, this overrides whatever was negotiated via _OSC, I
> > guess on the principle that _OSC doesn't apply because the host BIOS
> > doesn't know about the Root Ports below the VMD.  (I'm not sure it's
> > 100% resolved that _OSC doesn't apply to them, so we should mention
> > that assumption here.)
>
> _OSC still controls every flag including Hotplug. I have observed
> that slot capabilities register has hotplug enabled only when
> platform has enabled the hotplug. So technically not overriding it
> in the host.
>
> It overrides in guest because _OSC is passing wrong/different
> information than the _OSC information in Host.  Also like I
> mentioned, slot capabilities registers are not changed in Guest
> because vmd is passthrough.
> > 
> > In the guest OS, this still overrides whatever was negotiated via
> > _OSC, but presumably the guest BIOS *does* know about the Root
> > Ports, so I don't think the same principle about overriding _OSC
> > applies there.
> > 
> > I think it's still a problem that we handle
> > host_bridge->native_pcie_hotplug differently than all the other
> > features negotiated via _OSC.  We should at least explain this
> > somehow.
>
> Right now this is the only way to know in Guest OS if platform has
> enabled Hotplug or not. We have many customers complaining about
> Hotplug being broken in Guest. So just trying to honor _OSC but also
> fixing its limitation in Guest.
> > 
> > I suspect part of the reason for doing it differently is to avoid
> > the interrupt storm that we avoid via 04b12ef163d1 ("PCI: vmd:
> > Honor ACPI _OSC on PCIe features").  If so, I think 04b12ef163d1
> > should be mentioned here because it's an important piece of the
> > picture.
>
> I can add a note about this patch as well. Let me know if you want
> me to add something specific.
> 
> Thank you for taking the time. This is a very critical issue for us
> and we have many customers complaining about it, I would appreciate
> to get it resolved as soon as possible.

> > > Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
> > > ---
> > > v1->v2: Updating commit message.
> > > ---
> > >  drivers/pci/controller/vmd.c | 11 ++++++++++-
> > >  1 file changed, 10 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/pci/controller/vmd.c
> > > b/drivers/pci/controller/vmd.c
> > > index 769eedeb8802..e39eaef5549a 100644
> > > --- a/drivers/pci/controller/vmd.c
> > > +++ b/drivers/pci/controller/vmd.c
> > > @@ -720,6 +720,7 @@ static int vmd_enable_domain(struct vmd_dev
> > > *vmd, unsigned long features)
> > >  	resource_size_t membar2_offset = 0x2000;
> > >  	struct pci_bus *child;
> > >  	struct pci_dev *dev;
> > > +	struct pci_host_bridge *vmd_bridge;
> > >  	int ret;
> > >  
> > >  	/*
> > > @@ -886,8 +887,16 @@ static int vmd_enable_domain(struct vmd_dev
> > > *vmd, unsigned long features)
> > >  	 * and will fail pcie_bus_configure_settings() early. It can
> > > instead be
> > >  	 * run on each of the real root ports.
> > >  	 */
> > > -	list_for_each_entry(child, &vmd->bus->children, node)
> > > +	vmd_bridge = to_pci_host_bridge(vmd->bus->bridge);
> > > +	list_for_each_entry(child, &vmd->bus->children, node) {
> > >  		pcie_bus_configure_settings(child);
> > > +		/*
> > > +		 * When Hotplug is enabled on vmd root-port, enable it
> > > on vmd
> > > +		 * bridge.
> > > +		 */
> > > +		if (child->self->is_hotplug_bridge)
> > > +			vmd_bridge->native_pcie_hotplug = 1;
> > 
> > "is_hotplug_bridge" basically means PCI_EXP_SLTCAP_HPC is set, which
> > means "the hardware of this slot is hot-plug *capable*."
> > 
> > Whether hotplug is *enabled* is a separate policy decision about
> > whether the OS is allowed to operate the hotplug functionality, so I
> > think saying "When Hotplug is enabled" in the comment is a little bit
> > misleading.
>
> I will rewrite the comment.
> > 
> > Bjorn
> > 
> > > +	}
> > >  
> > >  	pci_bus_add_devices(vmd->bus);
> > >  
> > > -- 
> > > 2.31.1
> > > 
>
Nirmal Patel Dec. 11, 2023, 11:05 p.m. UTC | #4
On Tue, 2023-12-05 at 18:27 -0600, Bjorn Helgaas wrote:
> On Tue, Dec 05, 2023 at 03:20:27PM -0700, Nirmal Patel wrote:
> > On Fri, 2023-12-01 at 18:07 -0600, Bjorn Helgaas wrote:
> > > On Mon, Nov 27, 2023 at 04:17:29PM -0500, Nirmal Patel wrote:
> > > > Currently VMD copies root bridge setting to enable Hotplug on
> > > > its
> > > > rootports. This mechanism works fine for Host OS and no issue
> > > > has
> > > > been observed. However in case of VM, all the HyperVisors don't
> > > > pass the Hotplug setting to the guest BIOS which results in
> > > > assigning default values and disabling Hotplug capability in
> > > > the
> > > > guest which have been observed by many OEMs.
> > > 
> > > Can we make this a little more specific?  I'm lacking a lot of
> > > virtualization background here, so I'm going to ask a bunch of
> > > stupid
> > > questions here:
> > > 
> > > "VMD copies root bridge setting" refers to _OSC and the copying
> > > done
> > > in vmd_copy_host_bridge_flags()?  Obviously this must be in the
> > > Host
> > > OS.
> > 
> > Yes. we are talking about vmd_copy_host_bridge_flags. It gets
> > called in
> > both Host and Guest. But it assigns different values. In Host, it
> > assigns correct values, while in guest it assigns power-on default
> > value 0 which in simple term disable everything including hotplug.
> 
> If vmd_copy_host_bridge_flags() is called both in the Host and the
> Guest, I guess that means both the VMD endpoint and the VMD Root
> Ports
> below it are passed through to the Guest?  I expected only the VMD
> Root Ports would be passed through, but maybe that's my
> misunderstanding.
> 
> The VMD endpoint is under host bridge A, and the VMD creates a new
> PCI
> domain under a new host bridge B.  vmd_copy_host_bridge_flags()
> copies
> all the feature ownership flags from A to B.
> 
> On ACPI systems (like both Host and Guest) all these flags are
> initially cleared for host bridge A in acpi_pci_root_create() because
> these features are owned by the platform until _OSC grants ownership
> to the OS.  They are initially *set* for host bridge B in
> pci_init_host_bridge() because it's created by the vmd driver instead
> of being enumerated via ACPI.
> 
> If the Host _OSC grants ownership to the OS, A's native_pcie_hotplug
> will be set, and vmd_copy_host_bridge_flags() leaves it set for B as
> well.  If the Host _OSC does *not* grant hotplug ownership to the OS,
> native_pcie_hotplug will be cleared for both A and B.
> 
> Apparently the Guest doesn't supply _OSC (seems like a spec
> violation;
> could tell from the dmesg), so A's native_pcie_hotplug remains
> cleared, which means vmd_copy_host_bridge_flags() will also clear it
> for B.
> 
> [The lack of a Guest BIOS _OSC would also mean that if you passed
> through a normal non-VMD PCIe Switch Downstream Port to the Guest,
> the
> Guest OS would never be able to manage hotplug below that
> Port.  Maybe
> nobody passes through Switch Ports.]
> 
> > > "This mechanism works fine for Host OS and no issue has been
> > > observed."  I guess this means that if the platform grants
> > > hotplug
> > > control to the Host OS via _OSC, pciehp in the Host OS works fine
> > > to
> > > manage hotplug on Root Ports below the VMD?
> > 
> > Yes. When platform hotplug setting is enabled, then _OSC assigns
> > correct value to vmd root ports via vmd_copy_host_bridge_flags.
> > 
> > > If the platform does *not* grant hotplug control to the Host OS,
> > > I
> > > guess it means the Host OS pciehp can *not* manage hotplug below
> > > VMD?
> > > Is that also what you expect?
> > > 
> > > "However in case of VM, all the HyperVisors don't pass the
> > > Hotplug
> > > setting to the guest BIOS"  What is the mechanism by which they
> > > would
> > > pass the hotplug setting?  I guess the guest probably doesn't see
> > > the
> > > VMD endpoint itself, right?  The guest sees either virtualized or
> > > passed-through VMD Root Ports?
> > 
> > In guest, vmd is passthrough including pci topology behind vmd. The
> > way
> > we verify Hotplug capability is to read Slot Capabilities
> > registers'
> > hotplug enabled bit set on vmd root ports in Guest.
> 
> I guess maybe this refers to set_pcie_hotplug_bridge(), which checks
> PCI_EXP_SLTCAP_HPC?  If it's not set, pciehp won't bind to the
> device.
> This behavior is the same in Host and Guest.
> 
> > The values copied in vmd_copy_host_bridge_flags are different in
> > Host
> > than in Guest. I do not know what component is responsible for this
> > in
> > every HyperVisor. But I tested this in ESXi, HyperV, KVM and they
> > all
> > had the same behavior where the _OSC flags are set to power-on
> > default
> > values of 0 by vmd_copy_host_bridge_flags in Guest OS which is
> > disabling hotplug.
> > 
> > > I assume the guest OS sees a constructed ACPI system (different
> > > from
> > > the ACPI environment the host sees), so it sees a PNP0A03 host
> > > bridge
> > > with _OSC?  And presumably VMD Root Ports below that host bridge?
> > > 
> > > So are you saying the _OSC seen by the guest doesn't grant
> > > hotplug
> > > control to the guest OS?  And it doesn't do that because the
> > > guest
> > > BIOS hasn't implemented _OSC?  Or maybe the guest BIOS relies on
> > > the
> > > Slot Capabilities registers of VMD Root Ports to decide whether
> > > _OSC
> > > should grant hotplug control?  And those Slot Capabilities don't
> > > advertise hotplug support?
> > 
> > Currently Hotplug is controlled by _OSC in both host and Guest. In
> > case
> > of guest, it seems guest BIOS hasn't implemented _OSC since all the
> > flags are set to 0 which is not the case in host.
> 
> I think you want pciehp to work on the VMD Root Ports in the Guest,
> no matter what, on the assumption that any _OSC that applies to host
> bridge A does not apply to the host bridge created by the vmd driver.
> 
> If so, we should just revert 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC
> on PCIe features").  Since host bridge B was not enumerated via ACPI,
> the OS owns all those features under B by default, so reverting
> 04b12ef163d1 would leave that state.
> 
> Obviously we'd have to first figure out another solution for the AER
> message flood that 04b12ef163d1 resolved.
Hi Bjorn,

If we revert 04b12ef163d1, then the VMD driver will still enable AER
blindly, which is a bug. So we need to find a way to make the VMD
driver aware of the platform's AER setting and use that information to
enable or disable AER for its child devices.

There is a setting in BIOS that allows us to enable OS native AER
support on the platform. This setting is located in EDK Menu ->
Platform configuration -> system event log -> IIO error enabling -> OS
native AER support.

This setting is assigned to the VMD bridge by vmd_copy_host_bridge_flags()
in patch 04b12ef163d1. Without 04b12ef163d1, the VMD driver will enable
AER even if the platform has disabled OS native AER support, as
mentioned earlier. This will result in the AER flood mentioned in
04b12ef163d1 since there are no AER handlers.

I believe a rate limit alone will not fix the issue; rather, it will
act as a workaround. If VMD is aware of the OS native AER support
setting, then we will not see the AER flooding issue.

Do you have any suggestion on how to make the VMD driver aware of the
AER setting and keep it in sync with the platform setting?

Thanks
nirmal
> 
> > VMD is passthrough in Guest so the slot capabilities registers are
> > still same as what Host driver would see. That is how we can still
> > control hotplug on vmd on both Guest and Host.
> > > "... which results in assigning default values and disabling
> > > Hotplug
> > > capability in the guest."  What default values are these?
> > 
> > 0. Everything is disabled in guest. So basically _OSC is writing
> > wrong
> > values in guest.
> > 
> > > Is this talking about the default host_bridge->native_pcie_hotplug
> > > value set in acpi_pci_root_create(), where the OS assumes hotplug
> > > is owned by the platform unless _OSC grants control to the OS?
> > 
> > vmd driver calls pci_create_root_bus which eventually ends up
> > calling pci_init_host_bridge where native_pcie_hotplug is set to 1.
> > Now vmd_copy_host_bridge_flags overwrites the native_pcie_hotplug
> > to
> > _OSC setting of 0 in guest.
> > 
> > > > VMD Hotplug can be enabled or disabled based on the VMD
> > > > rootports' Hotplug configuration in BIOS. is_hotplug_bridge is
> > > > set on each VMD rootport based on Hotplug capable bit in SltCap
> > > > in probe.c.  Check is_hotplug_bridge and enable or disable
> > > > native_pcie_hotplug based on that value.
> > > > 
> > > > This patch will make sure that Hotplug is enabled properly in
> > > > Host as well as in VM while honoring _OSC settings as well as
> > > > VMD hotplug setting.
> > > 
> > > "Hotplug is enabled properly in Host as well as in VM" sounds
> > > like
> > > we're talking about both the host OS and the guest OS.
> > 
> > Yes. VM means guest.
> > > In the host OS, this overrides whatever was negotiated via _OSC,
> > > I
> > > guess on the principle that _OSC doesn't apply because the host
> > > BIOS
> > > doesn't know about the Root Ports below the VMD.  (I'm not sure
> > > it's
> > > 100% resolved that _OSC doesn't apply to them, so we should
> > > mention
> > > that assumption here.)
> > 
> > _OSC still controls every flag including Hotplug. I have observed
> > that slot capabilities register has hotplug enabled only when
> > platform has enabled the hotplug. So technically not overriding it
> > in the host.
> > 
> > It overrides in guest because _OSC is passing wrong/different
> > information than the _OSC information in Host.  Also like I
> > mentioned, slot capabilities registers are not changed in Guest
> > because vmd is passthrough.
> > > In the guest OS, this still overrides whatever was negotiated via
> > > _OSC, but presumably the guest BIOS *does* know about the Root
> > > Ports, so I don't think the same principle about overriding _OSC
> > > applies there.
> > > 
> > > I think it's still a problem that we handle
> > > host_bridge->native_pcie_hotplug differently than all the other
> > > features negotiated via _OSC.  We should at least explain this
> > > somehow.
> > 
> > Right now this is the only way to know in Guest OS if platform has
> > enabled Hotplug or not. We have many customers complaining about
> > Hotplug being broken in Guest. So just trying to honor _OSC but
> > also
> > fixing its limitation in Guest.
> > > I suspect part of the reason for doing it differently is to avoid
> > > the interrupt storm that we avoid via 04b12ef163d1 ("PCI: vmd:
> > > Honor ACPI _OSC on PCIe features").  If so, I think 04b12ef163d1
> > > should be mentioned here because it's an important piece of the
> > > picture.
> > 
> > I can add a note about this patch as well. Let me know if you want
> > me to add something specific.
> > 
> > Thank you for taking the time. This is a very critical issue for us
> > and we have many customers complaining about it, I would appreciate
> > to get it resolved as soon as possible.
> > > > Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
> > > > ---
> > > > v1->v2: Updating commit message.
> > > > ---
> > > >  drivers/pci/controller/vmd.c | 11 ++++++++++-
> > > >  1 file changed, 10 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/pci/controller/vmd.c
> > > > b/drivers/pci/controller/vmd.c
> > > > index 769eedeb8802..e39eaef5549a 100644
> > > > --- a/drivers/pci/controller/vmd.c
> > > > +++ b/drivers/pci/controller/vmd.c
> > > > @@ -720,6 +720,7 @@ static int vmd_enable_domain(struct vmd_dev
> > > > *vmd, unsigned long features)
> > > >  	resource_size_t membar2_offset = 0x2000;
> > > >  	struct pci_bus *child;
> > > >  	struct pci_dev *dev;
> > > > +	struct pci_host_bridge *vmd_bridge;
> > > >  	int ret;
> > > >  
> > > >  	/*
> > > > @@ -886,8 +887,16 @@ static int vmd_enable_domain(struct
> > > > vmd_dev
> > > > *vmd, unsigned long features)
> > > >  	 * and will fail pcie_bus_configure_settings() early.
> > > > It can
> > > > instead be
> > > >  	 * run on each of the real root ports.
> > > >  	 */
> > > > -	list_for_each_entry(child, &vmd->bus->children, node)
> > > > +	vmd_bridge = to_pci_host_bridge(vmd->bus->bridge);
> > > > +	list_for_each_entry(child, &vmd->bus->children, node) {
> > > >  		pcie_bus_configure_settings(child);
> > > > +		/*
> > > > +		 * When Hotplug is enabled on vmd root-port,
> > > > enable it
> > > > on vmd
> > > > +		 * bridge.
> > > > +		 */
> > > > +		if (child->self->is_hotplug_bridge)
> > > > +			vmd_bridge->native_pcie_hotplug = 1;
> > > 
> > > "is_hotplug_bridge" basically means PCI_EXP_SLTCAP_HPC is set,
> > > which
> > > means "the hardware of this slot is hot-plug *capable*."
> > > 
> > > Whether hotplug is *enabled* is a separate policy decision about
> > > whether the OS is allowed to operate the hotplug functionality,
> > > so I
> > > think saying "When Hotplug is enabled" in the comment is a little
> > > bit
> > > misleading.
> > 
> > I will rewrite the comment.
> > > Bjorn
> > > 
> > > > +	}
> > > >  
> > > >  	pci_bus_add_devices(vmd->bus);
> > > >  
> > > > -- 
> > > > 2.31.1
> > > >
Bjorn Helgaas Dec. 12, 2023, 9:13 p.m. UTC | #5
On Mon, Dec 11, 2023 at 04:05:25PM -0700, Nirmal Patel wrote:
> On Tue, 2023-12-05 at 18:27 -0600, Bjorn Helgaas wrote:
> > On Tue, Dec 05, 2023 at 03:20:27PM -0700, Nirmal Patel wrote:
> > ...

> > > Currently Hotplug is controlled by _OSC in both host and Guest.
> > > In case of guest, it seems guest BIOS hasn't implemented _OSC
> > > since all the flags are set to 0 which is not the case in host.
> > 
> > I think you want pciehp to work on the VMD Root Ports in the
> > Guest, no matter what, on the assumption that any _OSC that
> > applies to host bridge A does not apply to the host bridge created
> > by the vmd driver.
> > 
> > If so, we should just revert 04b12ef163d1 ("PCI: vmd: Honor ACPI
> > _OSC on PCIe features").  Since host bridge B was not enumerated
> > via ACPI, the OS owns all those features under B by default, so
> > reverting 04b12ef163d1 would leave that state.
> > 
> > Obviously we'd have to first figure out another solution for the
> > AER message flood that 04b12ef163d1 resolved.
> 
> If we revert 04b12ef163d1, then VMD driver will still enable AER
> blindly which is a bug. So we need to find a way to make VMD driver
> aware of AER platform setting and use that information to enable or
> disable AER for its child devices.
> 
> There is a setting in BIOS that allows us to enable OS native AER
> support on the platform. This setting is located in EDK Menu ->
> Platform configuration -> system event log -> IIO error enabling ->
> OS native AER support.
> 
> This setting is assigned to VMD bridge by vmd_copy_host_bridge_flags
> in patch 04b12ef163d1. Without the patch 04b12ef163d1, VMD driver
> will enable AER even if platform has disabled OS native AER support
> as mentioned earlier. This will result in an AER flood mentioned in
> 04b12e f163d1 since there is no AER handlers. 
> 
> I believe the rate limit will not alone fix the issue rather will
> act as a work around.

I agree with this part.  A rate limit would not solve the problem of
an interrupt storm consuming the CPU.  That could be addressed by
switching to polling when the rate is high or possibly other ways.

> If VMD is aware of OS native AER support setting, then we will not
> see AER flooding issue.
> 
> Do you have any suggestion on how to make VMD driver aware of AER
> setting and keep it in sync with platform setting.

Well, this is the whole problem.  IIUC, you're saying we should use
_OSC to negotiate for AER control below the VMD bridge, but ignore
_OSC for hotplug control.

I don't see how to read the _OSC spec and decide which things apply
below a VMD bridge and which things don't.

But I think a proposal that "_OSC doesn't apply to any devices below a
VMD bridge," that *is* a pretty generic principle.  If we think that's
the right way to use _OSC, shouldn't the vmd driver be able to
configure AER for devices below the VMD bridge in any way it wants?

I'm not sure what "_OSC doesn't apply below a VMD" would mean for
things like firmware-first error logging below the VMD.  Maybe
firmware-first doesn't make sense below VMD anyway because firmware
and the OS have different ideas about the Segment for those devices?

Bjorn
Nirmal Patel Dec. 14, 2023, 1:07 a.m. UTC | #6
On Tue, 2023-12-12 at 15:13 -0600, Bjorn Helgaas wrote:
> On Mon, Dec 11, 2023 at 04:05:25PM -0700, Nirmal Patel wrote:
> > On Tue, 2023-12-05 at 18:27 -0600, Bjorn Helgaas wrote:
> > > On Tue, Dec 05, 2023 at 03:20:27PM -0700, Nirmal Patel wrote:
> > > ...
> > > > Currently Hotplug is controlled by _OSC in both host and Guest.
> > > > In case of guest, it seems guest BIOS hasn't implemented _OSC
> > > > since all the flags are set to 0 which is not the case in host.
> > > 
> > > I think you want pciehp to work on the VMD Root Ports in the
> > > Guest, no matter what, on the assumption that any _OSC that
> > > applies to host bridge A does not apply to the host bridge
> > > created
> > > by the vmd driver.
> > > 
> > > If so, we should just revert 04b12ef163d1 ("PCI: vmd: Honor ACPI
> > > _OSC on PCIe features").  Since host bridge B was not enumerated
> > > via ACPI, the OS owns all those features under B by default, so
> > > reverting 04b12ef163d1 would leave that state.
> > > 
> > > Obviously we'd have to first figure out another solution for the
> > > AER message flood that 04b12ef163d1 resolved.
> > 
> > If we revert 04b12ef163d1, then VMD driver will still enable AER
> > blindly which is a bug. So we need to find a way to make VMD driver
> > aware of AER platform setting and use that information to enable or
> > disable AER for its child devices.
> > 
> > There is a setting in BIOS that allows us to enable OS native AER
> > support on the platform. This setting is located in EDK Menu ->
> > Platform configuration -> system event log -> IIO error enabling ->
> > OS native AER support.
> > 
> > This setting is assigned to VMD bridge by
> > vmd_copy_host_bridge_flags
> > in patch 04b12ef163d1. Without the patch 04b12ef163d1, VMD driver
> > will enable AER even if platform has disabled OS native AER support
> > as mentioned earlier. This will result in an AER flood mentioned in
> > 04b12ef163d1 since there are no AER handlers.
> > 
> > I believe the rate limit will not alone fix the issue rather will
> > act as a work around.
> 
> I agree with this part.  A rate limit would not solve the problem of
> an interrupt storm consuming the CPU.  That could be addressed by
> switching to polling when the rate is high or possibly other ways.
> 
> > If VMD is aware of OS native AER support setting, then we will not
> > see AER flooding issue.
> > 
> > Do you have any suggestion on how to make VMD driver aware of AER
> > setting and keep it in sync with platform setting.
> 
> Well, this is the whole problem.  IIUC, you're saying we should use
> _OSC to negotiate for AER control below the VMD bridge, but ignore
> _OSC for hotplug control.
Because VMD has its own hotplug BIOS setting, which allows VMD to
enable or disable hotplug on its domain. However, we don't have
VMD-specific AER, DPC, or LTR settings. Is it okay if we include an
additional check for hotplug, i.e. the Hot-Plug Capable bit in the
SltCap register, which reflects the VMD BIOS hotplug setting?

Another way is to overwrite the _OSC setting for hotplug only in the
Guest OS. If we make the VMD driver aware of the Host or Guest
environment, then only in the case of the Guest would we overwrite the
_OSC hotplug setting using the SltCap register, and we don't revert
04b12ef163d1.
> 
> I don't see how to read the _OSC spec and decide which things apply
> below a VMD bridge and which things don't.
> 
> But I think a proposal that "_OSC doesn't apply to any devices below
> a
> VMD bridge," that *is* a pretty generic principle.  If we think
> that's
> the right way to use _OSC, shouldn't the vmd driver be able to
> configure AER for devices below the VMD bridge in any way it wants?
> 
> I'm not sure what "_OSC doesn't apply below a VMD" would mean for
> things like firmware-first error logging below the VMD.  Maybe
> firmware-first doesn't make sense below VMD anyway because firmware
> and the OS have different ideas about the Segment for those devices?
> 
> Bjorn
Bjorn Helgaas Dec. 14, 2023, 7:23 p.m. UTC | #7
On Wed, Dec 13, 2023 at 06:07:02PM -0700, Nirmal Patel wrote:
> On Tue, 2023-12-12 at 15:13 -0600, Bjorn Helgaas wrote:
> > On Mon, Dec 11, 2023 at 04:05:25PM -0700, Nirmal Patel wrote:
> > > On Tue, 2023-12-05 at 18:27 -0600, Bjorn Helgaas wrote:
> > > > On Tue, Dec 05, 2023 at 03:20:27PM -0700, Nirmal Patel wrote:
> > > > ...
> > > > > Currently Hotplug is controlled by _OSC in both host and Guest.
> > > > > In case of guest, it seems guest BIOS hasn't implemented _OSC
> > > > > since all the flags are set to 0 which is not the case in host.
> > > > 
> > > > I think you want pciehp to work on the VMD Root Ports in the
> > > > Guest, no matter what, on the assumption that any _OSC that
> > > > applies to host bridge A does not apply to the host bridge
> > > > created
> > > > by the vmd driver.
> > > > 
> > > > If so, we should just revert 04b12ef163d1 ("PCI: vmd: Honor ACPI
> > > > _OSC on PCIe features").  Since host bridge B was not enumerated
> > > > via ACPI, the OS owns all those features under B by default, so
> > > > reverting 04b12ef163d1 would leave that state.
> > > > 
> > > > Obviously we'd have to first figure out another solution for the
> > > > AER message flood that 04b12ef163d1 resolved.
> > > 
> > > If we revert 04b12ef163d1, then VMD driver will still enable AER
> > > blindly which is a bug. So we need to find a way to make VMD driver
> > > aware of AER platform setting and use that information to enable or
> > > disable AER for its child devices.
> > > 
> > > There is a setting in BIOS that allows us to enable OS native AER
> > > support on the platform. This setting is located in EDK Menu ->
> > > Platform configuration -> system event log -> IIO error enabling ->
> > > OS native AER support.
> > > 
> > > This setting is assigned to VMD bridge by
> > > vmd_copy_host_bridge_flags in patch 04b12ef163d1. Without the
> > > patch 04b12ef163d1, VMD driver will enable AER even if platform
> > > has disabled OS native AER support as mentioned earlier. This
> > > will result in an AER flood mentioned in 04b12ef163d1 since
> > > there is no AER handlers. 

I missed this before.  What does "there are no AER handlers" mean?  Do
you mean there are no *firmware* AER handlers?  

The dmesg log at https://bugzilla.kernel.org/attachment.cgi?id=299571
(from https://bugzilla.kernel.org/show_bug.cgi?id=215027, the bug
fixed by 04b12ef163d1), shows this:

  acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI HPX-Type3]
  acpi PNP0A08:00: _OSC: platform does not support [AER]
  acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug SHPCHotplug PME PCIeCapability LTR]

so the firmware did not grant AER control to the OS (I think "platform
does not support" is a confusing description).

Prior to 04b12ef163d1, Linux applied _OSC above the VMD bridge but not
below it, so Linux had native control below VMD and it did handle AER
interrupts from the 10000:e0:06.0 Root Port below VMD:

  vmd 0000:00:0e.0: PCI host bridge to bus 10000:e0
  pcieport 10000:e0:06.0: AER: Corrected error received: 10000:e1:00.0

After 04b12ef163d1, Linux applied _OSC below the VMD bridge as well,
so it did not enable or handle any AER interrupts.  I suspect the
platform didn't handle AER interrupts below VMD either, so those
errors were probably just ignored.

> > > If VMD is aware of OS native AER support setting, then we will not
> > > see AER flooding issue.
> > > 
> > > Do you have any suggestion on how to make VMD driver aware of AER
> > > setting and keep it in sync with platform setting.
> > 
> > Well, this is the whole problem.  IIUC, you're saying we should use
> > _OSC to negotiate for AER control below the VMD bridge, but ignore
> > _OSC for hotplug control.
>
> Because VMD has its own hotplug BIOS setting which allows vmd to
> enable or disable hotplug on its domain. However we don't have VMD
> specific AER, DPC, LTR settings. 

I don't quite follow.  The OS knows nothing about whether BIOS
settings exist, so they can't be used to determine where _OSC applies.

> Is it okay if we include an additional check for hotplug? i.e.
> Hotplug capable bit in SltCap register which reflects VMD BIOS
> hotplug setting.

I don't know what check you have in mind, but the OS can definitely
use SltCap to decide what features to enable.
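
E.g., something along these lines (purely illustrative and untested;
the helper name is made up, not part of the patch):

  /* Hypothetical sketch: derive hotplug ownership below VMD from SltCap
   * instead of from the _OSC result copied from host bridge A. */
  static void vmd_set_native_hotplug(struct pci_host_bridge *vmd_bridge,
                                     struct pci_dev *rp)
  {
          u32 sltcap;

          pcie_capability_read_dword(rp, PCI_EXP_SLTCAP, &sltcap);
          if (sltcap & PCI_EXP_SLTCAP_HPC)
                  vmd_bridge->native_pcie_hotplug = 1;
  }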

You suggest above that you want vmd to be aware of AER ownership per
_OSC, but it sounds more like you just want AER disabled below VMD
regardless.  Or do you have platforms that can actually handle AER
interrupts from Root Ports below VMD?

> Another way is to overwrite _OSC setting for hotplug only in Guest
> OS.  If we make VMD driver aware of Host or Guest environment, only
> in case of Guest we overwrite _OSC hotplug using SltCap register and
> we don't revert the 04b12ef163d1.

Making vmd aware of being in host or guest sounds kind of ugly to me.

Bjorn
Nirmal Patel Dec. 14, 2023, 10:22 p.m. UTC | #8
On Thu, 2023-12-14 at 13:23 -0600, Bjorn Helgaas wrote:
> On Wed, Dec 13, 2023 at 06:07:02PM -0700, Nirmal Patel wrote:
> > On Tue, 2023-12-12 at 15:13 -0600, Bjorn Helgaas wrote:
> > > On Mon, Dec 11, 2023 at 04:05:25PM -0700, Nirmal Patel wrote:
> > > > On Tue, 2023-12-05 at 18:27 -0600, Bjorn Helgaas wrote:
> > > > > On Tue, Dec 05, 2023 at 03:20:27PM -0700, Nirmal Patel wrote:
> > > > > ...
> > > > > > Currently Hotplug is controlled by _OSC in both host and
> > > > > > Guest.
> > > > > > In case of guest, it seems guest BIOS hasn't implemented
> > > > > > _OSC
> > > > > > since all the flags are set to 0 which is not the case in
> > > > > > host.
> > > > > 
> > > > > I think you want pciehp to work on the VMD Root Ports in the
> > > > > Guest, no matter what, on the assumption that any _OSC that
> > > > > applies to host bridge A does not apply to the host bridge
> > > > > created
> > > > > by the vmd driver.
> > > > > 
> > > > > If so, we should just revert 04b12ef163d1 ("PCI: vmd: Honor
> > > > > ACPI
> > > > > _OSC on PCIe features").  Since host bridge B was not
> > > > > enumerated
> > > > > via ACPI, the OS owns all those features under B by default,
> > > > > so
> > > > > reverting 04b12ef163d1 would leave that state.
> > > > > 
> > > > > Obviously we'd have to first figure out another solution for
> > > > > the
> > > > > AER message flood that 04b12ef163d1 resolved.
> > > > 
> > > > If we revert 04b12ef163d1, then VMD driver will still enable
> > > > AER
> > > > blindly which is a bug. So we need to find a way to make VMD
> > > > driver
> > > > aware of AER platform setting and use that information to
> > > > enable or
> > > > disable AER for its child devices.
> > > > 
> > > > There is a setting in BIOS that allows us to enable OS native
> > > > AER
> > > > support on the platform. This setting is located in EDK Menu ->
> > > > Platform configuration -> system event log -> IIO error
> > > > enabling ->
> > > > OS native AER support.
> > > > 
> > > > This setting is assigned to VMD bridge by
> > > > vmd_copy_host_bridge_flags in patch 04b12ef163d1. Without the
> > > > patch 04b12ef163d1, VMD driver will enable AER even if platform
> > > > has disabled OS native AER support as mentioned earlier. This
> > > > will result in an AER flood mentioned in 04b12ef163d1 since
> > > > there is no AER handlers. 
> 
> I missed this before.  What does "there are no AER handlers"
> mean?  Do
> you mean there are no *firmware* AER handlers?  
Sorry for the confusing wording. Your understanding is correct. The
patch 04b12ef163d1 is used to disable AER on VMD and avoid the AER
flooding by applying the _OSC settings, since the vmd driver doesn't
give a choice to toggle AER, DPC, LTR, etc.
> 
> The dmesg log at https://bugzilla.kernel.org/attachment.cgi?id=299571
> (from https://bugzilla.kernel.org/show_bug.cgi?id=215027, the bug
> fixed by 04b12ef163d1), shows this:
> 
>   acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM
> Segments MSI HPX-Type3]
>   acpi PNP0A08:00: _OSC: platform does not support [AER]
>   acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug SHPCHotplug PME
> PCIeCapability LTR]
> 
> so the firmware did not grant AER control to the OS (I think
> "platform
> does not support" is a confusing description).
> 
> Prior to 04b12ef163d1, Linux applied _OSC above the VMD bridge but
> not
> below it, so Linux had native control below VMD and it did handle AER
> interrupts from the 10000:e0:06.0 Root Port below VMD:
> 
>   vmd 0000:00:0e.0: PCI host bridge to bus 10000:e0
>   pcieport 10000:e0:06.0: AER: Corrected error received:
> 10000:e1:00.0
> 
> After 04b12ef163d1, Linux applied _OSC below the VMD bridge as well,
> so it did not enable or handle any AER interrupts.  I suspect the
> platform didn't handle AER interrupts below VMD either, so those
> errors were probably just ignored.
> 
> > > > If VMD is aware of OS native AER support setting, then we will
> > > > not
> > > > see AER flooding issue.
> > > > 
> > > > Do you have any suggestion on how to make VMD driver aware of
> > > > AER
> > > > setting and keep it in sync with platform setting.
> > > 
> > > Well, this is the whole problem.  IIUC, you're saying we should
> > > use
> > > _OSC to negotiate for AER control below the VMD bridge, but
> > > ignore
> > > _OSC for hotplug control.
> > 
> > Because VMD has its own hotplug BIOS setting which allows vmd to
> > enable or disable hotplug on its domain. However we don't have VMD
> > specific AER, DPC, LTR settings. 
> 
> I don't quite follow.  The OS knows nothing about whether BIOS
> settings exist, so they can't be used to determine where _OSC
> applies.
> 
> > Is it okay if we include an additional check for hotplug? i.e.
> > Hotplug capable bit in SltCap register which reflects VMD BIOS
> > hotplug setting.
> 
> I don't know what check you have in mind, but the OS can definitely
> use SltCap to decide what features to enable.
Yes I agree. That is what I am also suggesting in this patch.
> 
> You suggest above that you want vmd to be aware of AER ownership per
> _OSC, but it sounds more like you just want AER disabled below VMD
> regardless.  Or do you have platforms that can actually handle AER
> interrupts from Root Ports below VMD?
No, I don't want AER disabled below VMD all the time. I am suggesting
vmd should stay in sync with all the _OSC flags, since vmd doesn't give
a choice to toggle them. However, the issue arises in the guest, where
the _OSC setting for hotplug is always 0 even though hotplug is 1 in
the host and the Hotplug Capable bit in the SltCap register is 1 in
both host and guest. So we can use that bit to enable hotplug in the
guest, like you suggested in your comment above.
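
To make the idea concrete, the check has roughly this shape (an
illustrative sketch only, simplified; the helper name here is made up,
not the literal patch):

  /*
   * While walking the root ports on the VMD bus, keep native hotplug
   * below VMD if any of them advertises a hotplug-capable slot.
   * pdev->is_hotplug_bridge is set during enumeration from the Hotplug
   * Capable bit in SltCap, and that bit survives passthrough into the
   * guest even when the guest _OSC does not grant PCIeHotplug.
   */
  static void vmd_sync_hotplug_from_sltcap(struct pci_bus *vmd_bus,
                                           struct pci_host_bridge *vmd_bridge)
  {
          struct pci_dev *rp;

          list_for_each_entry(rp, &vmd_bus->devices, bus_list) {
                  if (rp->is_hotplug_bridge) {
                          vmd_bridge->native_pcie_hotplug = 1;
                          break;
                  }
          }
  }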
> 
> > Another way is to overwrite _OSC setting for hotplug only in Guest
> > OS.  If we make VMD driver aware of Host or Guest environment, only
> > in case of Guest we overwrite _OSC hotplug using SltCap register
> > and
> > we don't revert the 04b12ef163d1.
> 
> Making vmd aware of being in host or guest sounds kind of ugly to me.
I agree.
> 
> Bjorn
Nirmal Patel Jan. 12, 2024, 12:02 a.m. UTC | #9
On Thu, 2023-12-14 at 15:22 -0700, Nirmal Patel wrote:
> On Thu, 2023-12-14 at 13:23 -0600, Bjorn Helgaas wrote:
> > On Wed, Dec 13, 2023 at 06:07:02PM -0700, Nirmal Patel wrote:
> > > On Tue, 2023-12-12 at 15:13 -0600, Bjorn Helgaas wrote:
> > > > On Mon, Dec 11, 2023 at 04:05:25PM -0700, Nirmal Patel wrote:
> > > > > On Tue, 2023-12-05 at 18:27 -0600, Bjorn Helgaas wrote:
> > > > > > On Tue, Dec 05, 2023 at 03:20:27PM -0700, Nirmal Patel
> > > > > > wrote:
> > > > > > ...
> > > > > > > Currently Hotplug is controlled by _OSC in both host and
> > > > > > > Guest.
> > > > > > > In case of guest, it seems guest BIOS hasn't implemented
> > > > > > > _OSC
> > > > > > > since all the flags are set to 0 which is not the case in
> > > > > > > host.
> > > > > > 
> > > > > > I think you want pciehp to work on the VMD Root Ports in
> > > > > > the
> > > > > > Guest, no matter what, on the assumption that any _OSC that
> > > > > > applies to host bridge A does not apply to the host bridge
> > > > > > created
> > > > > > by the vmd driver.
> > > > > > 
> > > > > > If so, we should just revert 04b12ef163d1 ("PCI: vmd: Honor
> > > > > > ACPI
> > > > > > _OSC on PCIe features").  Since host bridge B was not
> > > > > > enumerated
> > > > > > via ACPI, the OS owns all those features under B by
> > > > > > default,
> > > > > > so
> > > > > > reverting 04b12ef163d1 would leave that state.
> > > > > > 
> > > > > > Obviously we'd have to first figure out another solution
> > > > > > for
> > > > > > the
> > > > > > AER message flood that 04b12ef163d1 resolved.
> > > > > 
> > > > > If we revert 04b12ef163d1, then VMD driver will still enable
> > > > > AER
> > > > > blindly which is a bug. So we need to find a way to make VMD
> > > > > driver
> > > > > aware of AER platform setting and use that information to
> > > > > enable or
> > > > > disable AER for its child devices.
> > > > > 
> > > > > There is a setting in BIOS that allows us to enable OS native
> > > > > AER
> > > > > support on the platform. This setting is located in EDK Menu
> > > > > ->
> > > > > Platform configuration -> system event log -> IIO error
> > > > > enabling ->
> > > > > OS native AER support.
> > > > > 
> > > > > This setting is assigned to VMD bridge by
> > > > > vmd_copy_host_bridge_flags in patch 04b12ef163d1. Without the
> > > > > patch 04b12ef163d1, VMD driver will enable AER even if
> > > > > platform
> > > > > has disabled OS native AER support as mentioned earlier. This
> > > > > will result in an AER flood mentioned in 04b12ef163d1 since
> > > > > there is no AER handlers. 
> > 
> > I missed this before.  What does "there are no AER handlers"
> > mean?  Do
> > you mean there are no *firmware* AER handlers?  
> Sorry for confusing wordings. Your understanding is correct. The
> patch
> 04b12ef163d1 is used to disable AER on vmd and avoid the AER flooding
> by applying _OSC settings since vmd driver doesn't give a choice to
> toggle AER, DPC, LTR, etc.
> > The dmesg log at 
> > https://bugzilla.kernel.org/attachment.cgi?id=299571
> > (from https://bugzilla.kernel.org/show_bug.cgi?id=215027, the bug
> > fixed by 04b12ef163d1), shows this:
> > 
> >   acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM
> > Segments MSI HPX-Type3]
> >   acpi PNP0A08:00: _OSC: platform does not support [AER]
> >   acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug SHPCHotplug
> > PME
> > PCIeCapability LTR]
> > 
> > so the firmware did not grant AER control to the OS (I think
> > "platform
> > does not support" is a confusing description).
> > 
> > Prior to 04b12ef163d1, Linux applied _OSC above the VMD bridge but
> > not
> > below it, so Linux had native control below VMD and it did handle
> > AER
> > interrupts from the 10000:e0:06.0 Root Port below VMD:
> > 
> >   vmd 0000:00:0e.0: PCI host bridge to bus 10000:e0
> >   pcieport 10000:e0:06.0: AER: Corrected error received:
> > 10000:e1:00.0
> > 
> > After 04b12ef163d1, Linux applied _OSC below the VMD bridge as
> > well,
> > so it did not enable or handle any AER interrupts.  I suspect the
> > platform didn't handle AER interrupts below VMD either, so those
> > errors were probably just ignored.
> > 
> > > > > If VMD is aware of OS native AER support setting, then we
> > > > > will
> > > > > not
> > > > > see AER flooding issue.
> > > > > 
> > > > > Do you have any suggestion on how to make VMD driver aware of
> > > > > AER
> > > > > setting and keep it in sync with platform setting.
> > > > 
> > > > Well, this is the whole problem.  IIUC, you're saying we should
> > > > use
> > > > _OSC to negotiate for AER control below the VMD bridge, but
> > > > ignore
> > > > _OSC for hotplug control.
> > > 
> > > Because VMD has its own hotplug BIOS setting which allows vmd to
> > > enable or disable hotplug on its domain. However we don't have
> > > VMD
> > > specific AER, DPC, LTR settings. 
> > 
> > I don't quite follow.  The OS knows nothing about whether BIOS
> > settings exist, so they can't be used to determine where _OSC
> > applies.
> > 
> > > Is it okay if we include an additional check for hotplug? i.e.
> > > Hotplug capable bit in SltCap register which reflects VMD BIOS
> > > hotplug setting.
> > 
> > I don't know what check you have in mind, but the OS can definitely
> > use SltCap to decide what features to enable.
> Yes I agree. That is what I am also suggesting in this patch.
> > You suggest above that you want vmd to be aware of AER ownership
> > per
> > _OSC, but it sounds more like you just want AER disabled below VMD
> > regardless.  Or do you have platforms that can actually handle AER
> > interrupts from Root Ports below VMD?
> No I dont want AER disabled below VMD all the time. I am suggesting
> vmd
> should be in sync with all the _OSC flags since vmd doesn't give a
> choice to toggle. However, the issue arises in guest where _OSC
> setting
> for hotplug is always 0 eventhough hotplug is 1 in host and hotplug
> enable bit in SltCap register is 1 in both host and guest. So we can
> use that to enable hotplug in guest. Like you suggested in your above
> comment. 
> > > Another way is to overwrite _OSC setting for hotplug only in
> > > Guest
> > > OS.  If we make VMD driver aware of Host or Guest environment,
> > > only
> > > in case of Guest we overwrite _OSC hotplug using SltCap register
> > > and
> > > we don't revert the 04b12ef163d1.
> > 
> > Making vmd aware of being in host or guest sounds kind of ugly to
> > me.
> I agree.
> > Bjorn
Hi Bjorn,

Happy New Year. Sending a gentle reminder to help conclude the
discussion.

Thanks
nirmal
Bjorn Helgaas Jan. 12, 2024, 10:55 p.m. UTC | #10
[+cc Rafael, Kai-Heng]

On Thu, Dec 14, 2023 at 03:22:21PM -0700, Nirmal Patel wrote:
> On Thu, 2023-12-14 at 13:23 -0600, Bjorn Helgaas wrote:
> > On Wed, Dec 13, 2023 at 06:07:02PM -0700, Nirmal Patel wrote:
> > > On Tue, 2023-12-12 at 15:13 -0600, Bjorn Helgaas wrote:
> > > > On Mon, Dec 11, 2023 at 04:05:25PM -0700, Nirmal Patel wrote:
> > > > > On Tue, 2023-12-05 at 18:27 -0600, Bjorn Helgaas wrote:
> > > > > > On Tue, Dec 05, 2023 at 03:20:27PM -0700, Nirmal Patel wrote:
> > > > > > ...
> > > > > > > Currently Hotplug is controlled by _OSC in both host and
> > > > > > > Guest.  In case of guest, it seems guest BIOS hasn't
> > > > > > > implemented _OSC since all the flags are set to 0 which
> > > > > > > is not the case in host.
> > > > > > 
> > > > > > I think you want pciehp to work on the VMD Root Ports in
> > > > > > the Guest, no matter what, on the assumption that any _OSC
> > > > > > that applies to host bridge A does not apply to the host
> > > > > > bridge created by the vmd driver.
> > > > > > 
> > > > > > If so, we should just revert 04b12ef163d1 ("PCI: vmd:
> > > > > > Honor ACPI _OSC on PCIe features").  Since host bridge B
> > > > > > was not enumerated via ACPI, the OS owns all those
> > > > > > features under B by default, so reverting 04b12ef163d1
> > > > > > would leave that state.
> > > > > > 
> > > > > > Obviously we'd have to first figure out another solution
> > > > > > for the AER message flood that 04b12ef163d1 resolved.
> > > > > 
> > > > > If we revert 04b12ef163d1, then VMD driver will still enable
> > > > > AER blindly which is a bug. So we need to find a way to make
> > > > > VMD driver aware of AER platform setting and use that
> > > > > information to enable or disable AER for its child devices.
> > > > > 
> > > > > There is a setting in BIOS that allows us to enable OS
> > > > > native AER support on the platform. This setting is located
> > > > > in EDK Menu -> Platform configuration -> system event log ->
> > > > > IIO error enabling -> OS native AER support.
> > > > > 
> > > > > This setting is assigned to VMD bridge by
> > > > > vmd_copy_host_bridge_flags in patch 04b12ef163d1. Without
> > > > > the patch 04b12ef163d1, VMD driver will enable AER even if
> > > > > platform has disabled OS native AER support as mentioned
> > > > > earlier. This will result in an AER flood mentioned in
> > > > > 04b12ef163d1 since there is no AER handlers. 
> > 
> > I missed this before.  What does "there are no AER handlers" mean?
> > Do you mean there are no *firmware* AER handlers?  
>
> Sorry for confusing wordings. Your understanding is correct. The patch
> 04b12ef163d1 is used to disable AER on vmd and avoid the AER flooding
> by applying _OSC settings since vmd driver doesn't give a choice to
> toggle AER, DPC, LTR, etc.
> > 
> > The dmesg log at https://bugzilla.kernel.org/attachment.cgi?id=299571
> > (from https://bugzilla.kernel.org/show_bug.cgi?id=215027, the bug
> > fixed by 04b12ef163d1), shows this:
> > 
> >   acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM
> > Segments MSI HPX-Type3]
> >   acpi PNP0A08:00: _OSC: platform does not support [AER]
> >   acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug SHPCHotplug PME
> > PCIeCapability LTR]
> > 
> > so the firmware did not grant AER control to the OS (I think
> > "platform does not support" is a confusing description).
> > 
> > Prior to 04b12ef163d1, Linux applied _OSC above the VMD bridge but
> > not below it, so Linux had native control below VMD and it did
> > handle AER interrupts from the 10000:e0:06.0 Root Port below VMD:
> > 
> >   vmd 0000:00:0e.0: PCI host bridge to bus 10000:e0
> >   pcieport 10000:e0:06.0: AER: Corrected error received:
> > 10000:e1:00.0
> > 
> > After 04b12ef163d1, Linux applied _OSC below the VMD bridge as well,
> > so it did not enable or handle any AER interrupts.  I suspect the
> > platform didn't handle AER interrupts below VMD either, so those
> > errors were probably just ignored.
> > 
> > > > > If VMD is aware of OS native AER support setting, then we
> > > > > will not see AER flooding issue.
> > > > > 
> > > > > Do you have any suggestion on how to make VMD driver aware
> > > > > of AER setting and keep it in sync with platform setting.
> > > > 
> > > > Well, this is the whole problem.  IIUC, you're saying we
> > > > should use _OSC to negotiate for AER control below the VMD
> > > > bridge, but ignore _OSC for hotplug control.
> > > 
> > > Because VMD has its own hotplug BIOS setting which allows vmd to
> > > enable or disable hotplug on its domain. However we don't have
> > > VMD specific AER, DPC, LTR settings. 
> > 
> > I don't quite follow.  The OS knows nothing about whether BIOS
> > settings exist, so they can't be used to determine where _OSC
> > applies.
> > 
> > > Is it okay if we include an additional check for hotplug? i.e.
> > > Hotplug capable bit in SltCap register which reflects VMD BIOS
> > > hotplug setting.
> > 
> > I don't know what check you have in mind, but the OS can
> > definitely use SltCap to decide what features to enable.
>
> Yes I agree. That is what I am also suggesting in this patch.

I should have said "the OS can use SltCap to *help* decide what
features to enable."  For ACPI host bridges, _OSC determines whether
hotplug should be operated by the platform or the OS.

I think we're converging on the idea that since VMD is effectively
*not* an ACPI host bridge and doesn't have its own _OSC, the _OSC that
applies to the VMD endpoint does *not* apply to the hierarchy below
the VMD.  In that case, the default is that the OS owns all the
features (hotplug, AER, etc) below the VMD.
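
(Roughly, from memory: pci_init_host_bridge() starts every new host
bridge with all of these set, e.g.

    bridge->native_pcie_hotplug = 1;
    bridge->native_aer = 1;
    bridge->native_dpc = 1;
    /* ... likewise for SHPC hotplug, PME and LTR */

and only an applicable _OSC negotiation clears them, so a host bridge
that no _OSC covers stays fully OS-owned.)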

> > You suggest above that you want vmd to be aware of AER ownership per
> > _OSC, but it sounds more like you just want AER disabled below VMD
> > regardless.  Or do you have platforms that can actually handle AER
> > interrupts from Root Ports below VMD?
>
> No I dont want AER disabled below VMD all the time. I am suggesting
> vmd should be in sync with all the _OSC flags since vmd doesn't give
> a choice to toggle.

This is what I don't understand.  If we decide that _OSC doesn't apply
below VMD because the VMD host bridge isn't described in ACPI, the
idea of being in sync with _OSC doesn't make sense.

> However, the issue arises in guest where _OSC setting for hotplug is
> always 0 even though hotplug is 1 in host and hotplug enable bit in
> SltCap register is 1 in both host and guest. So we can use that to
> enable hotplug in guest. Like you suggested in your above comment. 

All this got paged out over the holidays, so I need to refresh my
understanding here.  Maybe it will help if we can make the situation
more concrete.  I'm basing this on the logs at
https://bugzilla.kernel.org/show_bug.cgi?id=215027.  I assume the
10000:e0:06.0 Root Port and the 10000:e1:00.0 NVMe device are both
passed through to the guest.  I'm sure I got lots wrong here, so
please correct me :)

  Host OS sees:

    PNP0A08 host bridge to 0000 [bus 00-ff]
      _OSC applies to domain 0000
      OS owns [PCIeHotplug SHPCHotplug PME PCIeCapability LTR] in domain 0000
    vmd 0000:00:0e.0: PCI host bridge to domain 10000 [bus e0-ff]
      no _OSC applies in domain 10000;
      host OS owns all PCIe features in domain 10000
    pci 10000:e0:06.0: [8086:464d]             # VMD root port
    pci 10000:e0:06.0: PCI bridge to [bus e1]
    pci 10000:e0:06.0: SltCap: HotPlug+        # Hotplug Capable
    pci 10000:e1:00.0: [144d:a80a]             # nvme

  Guest OS sees:

    PNP0A03 host bridge to 0000 [bus 00-ff]
      _OSC applies to domain 0000
      platform owns [PCIeHotplug ...]          # _OSC doesn't grant to OS
    pci 0000:e0:06.0: [8086:464d]              # VMD root port
    pci 0000:e0:06.0: PCI bridge to [bus e1]
    pci 0000:e0:06.0: SltCap: HotPlug+         # Hotplug Capable
    pci 0000:e1:00.0: [144d:a80a]             # nvme

So the guest OS sees that the VMD Root Port supports hotplug, but it
can't use it because it was not granted ownership of the feature?

Bjorn
Nirmal Patel Jan. 16, 2024, 8:37 p.m. UTC | #11
On Fri, 2024-01-12 at 16:55 -0600, Bjorn Helgaas wrote:
> [+cc Rafael, Kai-Heng]
> 
> On Thu, Dec 14, 2023 at 03:22:21PM -0700, Nirmal Patel wrote:
> > On Thu, 2023-12-14 at 13:23 -0600, Bjorn Helgaas wrote:
> > > On Wed, Dec 13, 2023 at 06:07:02PM -0700, Nirmal Patel wrote:
> > > > On Tue, 2023-12-12 at 15:13 -0600, Bjorn Helgaas wrote:
> > > > > On Mon, Dec 11, 2023 at 04:05:25PM -0700, Nirmal Patel wrote:
> > > > > > On Tue, 2023-12-05 at 18:27 -0600, Bjorn Helgaas wrote:
> > > > > > > On Tue, Dec 05, 2023 at 03:20:27PM -0700, Nirmal Patel
> > > > > > > wrote:
> > > > > > > ...
> > > > > > > > Currently Hotplug is controlled by _OSC in both host
> > > > > > > > and
> > > > > > > > Guest.  In case of guest, it seems guest BIOS hasn't
> > > > > > > > implemented _OSC since all the flags are set to 0 which
> > > > > > > > is not the case in host.
> > > > > > > 
> > > > > > > I think you want pciehp to work on the VMD Root Ports in
> > > > > > > the Guest, no matter what, on the assumption that any
> > > > > > > _OSC
> > > > > > > that applies to host bridge A does not apply to the host
> > > > > > > bridge created by the vmd driver.
> > > > > > > 
> > > > > > > If so, we should just revert 04b12ef163d1 ("PCI: vmd:
> > > > > > > Honor ACPI _OSC on PCIe features").  Since host bridge B
> > > > > > > was not enumerated via ACPI, the OS owns all those
> > > > > > > features under B by default, so reverting 04b12ef163d1
> > > > > > > would leave that state.
> > > > > > > 
> > > > > > > Obviously we'd have to first figure out another solution
> > > > > > > for the AER message flood that 04b12ef163d1 resolved.
> > > > > > 
> > > > > > If we revert 04b12ef163d1, then VMD driver will still
> > > > > > enable
> > > > > > AER blindly which is a bug. So we need to find a way to
> > > > > > make
> > > > > > VMD driver aware of AER platform setting and use that
> > > > > > information to enable or disable AER for its child devices.
> > > > > > 
> > > > > > There is a setting in BIOS that allows us to enable OS
> > > > > > native AER support on the platform. This setting is located
> > > > > > in EDK Menu -> Platform configuration -> system event log
> > > > > > ->
> > > > > > IIO error enabling -> OS native AER support.
> > > > > > 
> > > > > > This setting is assigned to VMD bridge by
> > > > > > vmd_copy_host_bridge_flags in patch 04b12ef163d1. Without
> > > > > > the patch 04b12ef163d1, VMD driver will enable AER even if
> > > > > > platform has disabled OS native AER support as mentioned
> > > > > > earlier. This will result in an AER flood mentioned in
> > > > > > 04b12ef163d1 since there is no AER handlers. 
> > > 
> > > I missed this before.  What does "there are no AER handlers"
> > > mean?
> > > Do you mean there are no *firmware* AER handlers?  
> > 
> > Sorry for confusing wordings. Your understanding is correct. The
> > patch
> > 04b12ef163d1 is used to disable AER on vmd and avoid the AER
> > flooding
> > by applying _OSC settings since vmd driver doesn't give a choice to
> > toggle AER, DPC, LTR, etc.
> > > The dmesg log at 
> > > https://bugzilla.kernel.org/attachment.cgi?id=299571
> > > (from https://bugzilla.kernel.org/show_bug.cgi?id=215027, the bug
> > > fixed by 04b12ef163d1), shows this:
> > > 
> > >   acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM
> > > Segments MSI HPX-Type3]
> > >   acpi PNP0A08:00: _OSC: platform does not support [AER]
> > >   acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug SHPCHotplug
> > > PME
> > > PCIeCapability LTR]
> > > 
> > > so the firmware did not grant AER control to the OS (I think
> > > "platform does not support" is a confusing description).
> > > 
> > > Prior to 04b12ef163d1, Linux applied _OSC above the VMD bridge
> > > but
> > > not below it, so Linux had native control below VMD and it did
> > > handle AER interrupts from the 10000:e0:06.0 Root Port below VMD:
> > > 
> > >   vmd 0000:00:0e.0: PCI host bridge to bus 10000:e0
> > >   pcieport 10000:e0:06.0: AER: Corrected error received:
> > > 10000:e1:00.0
> > > 
> > > After 04b12ef163d1, Linux applied _OSC below the VMD bridge as
> > > well,
> > > so it did not enable or handle any AER interrupts.  I suspect the
> > > platform didn't handle AER interrupts below VMD either, so those
> > > errors were probably just ignored.
> > > 
> > > > > > If VMD is aware of OS native AER support setting, then we
> > > > > > will not see AER flooding issue.
> > > > > > 
> > > > > > Do you have any suggestion on how to make VMD driver aware
> > > > > > of AER setting and keep it in sync with platform setting.
> > > > > 
> > > > > Well, this is the whole problem.  IIUC, you're saying we
> > > > > should use _OSC to negotiate for AER control below the VMD
> > > > > bridge, but ignore _OSC for hotplug control.
> > > > 
> > > > Because VMD has its own hotplug BIOS setting which allows vmd
> > > > to
> > > > enable or disable hotplug on its domain. However we don't have
> > > > VMD specific AER, DPC, LTR settings. 
> > > 
> > > I don't quite follow.  The OS knows nothing about whether BIOS
> > > settings exist, so they can't be used to determine where _OSC
> > > applies.
> > > 
> > > > Is it okay if we include an additional check for hotplug? i.e.
> > > > Hotplug capable bit in SltCap register which reflects VMD BIOS
> > > > hotplug setting.
> > > 
> > > I don't know what check you have in mind, but the OS can
> > > definitely use SltCap to decide what features to enable.
> > 
> > Yes I agree. That is what I am also suggesting in this patch.
> 
> I should have said "the OS can use SltCap to *help* decide what
> features to enable."  For ACPI host bridges, _OSC determines whether
> hotplug should be operated by the platform or the OS.
> 
> I think we're converging on the idea that since VMD is effectively
> *not* an ACPI host bridge and doesn't have its own _OSC, the _OSC
> that
> applies to the VMD endpoint does *not* apply to the hierarchy below
> the VMD.  In that case, the default is that the OS owns all the
> features (hotplug, AER, etc) below the VMD.
Well, there will be a few problems if VMD owns/assigns all the flags
by itself. We discussed all of the potential problems, but due to the
holidays I think I should summarize them again.

#1 : Currently there is no way to pass information about AER, DPC,
etc. to the VMD driver from BIOS or from a boot parameter. For example,
if VMD blindly enables AER and the platform has AER disabled, then we
will see AERs from devices under VMD but the user has no way to toggle
it, since we rejected the idea of adding a boot parameter for AER/DPC
under VMD. I believe you also didn't like the idea of a sysfs knob
suggested by Kai-Heng.

#2 : Even if we use a VMD hardware register to store AER, DPC and make
the UEFI VMD driver write information about Hotplug, DPC and AER, we
still don't have a way to change the settings if the user wants to
alter them. Also, the issue will still persist on client platforms
since we don't own their UEFI VMD driver. It would be a huge effort.

#3 : At this moment, the user can enable/disable only Hotplug in the
VMD BIOS settings (meaning no AER, DPC, LTR, etc.) and the VMD driver
can read it from the SltCap register. This means BIOS would need to add
the other settings and VMD would need to figure out how to read them,
which at this moment VMD can't do.

#4 : Consistency between the Host OS and the Guest OS.

I believe the currently proposed patch is the best option: it requires
minimal changes, doesn't break other platform features, and unblocks
our customers. This issue has been a blocker for us.

Regarding your concern about how VMD can own all the _OSC features, I
am open to other ideas and will discuss with the architects whether
they have any suggestions.


> 
> > > You suggest above that you want vmd to be aware of AER ownership
> > > per
> > > _OSC, but it sounds more like you just want AER disabled below
> > > VMD
> > > regardless.  Or do you have platforms that can actually handle
> > > AER
> > > interrupts from Root Ports below VMD?
> > 
> > No I dont want AER disabled below VMD all the time. I am suggesting
> > vmd should be in sync with all the _OSC flags since vmd doesn't
> > give
> > a choice to toggle.
> 
> This is what I don't understand.  If we decide that _OSC doesn't
> apply
> below VMD because the VMD host bridge isn't described in ACPI, the
> idea of being in sync with _OSC doesn't make sense.
> 
> > However, the issue arises in guest where _OSC setting for hotplug
> > is
> > always 0 even though hotplug is 1 in host and hotplug enable bit in
> > SltCap register is 1 in both host and guest. So we can use that to
> > enable hotplug in guest. Like you suggested in your above comment. 
> 
> All this got paged out over the holidays, so I need to refresh my
> understanding here.  Maybe it will help if we can make the situation
> more concrete.  I'm basing this on the logs at
> https://bugzilla.kernel.org/show_bug.cgi?id=215027.  I assume the
> 10000:e0:06.0 Root Port and the 10000:e1:00.0 NVMe device are both
> passed through to the guest.  I'm sure I got lots wrong here, so
> please correct me :)
> 
>   Host OS sees:
> 
>     PNP0A08 host bridge to 0000 [bus 00-ff]
>       _OSC applies to domain 0000
>       OS owns [PCIeHotplug SHPCHotplug PME PCIeCapability LTR] in
> domain 0000
>     vmd 0000:00:0e.0: PCI host bridge to domain 10000 [bus e0-ff]
>       no _OSC applies in domain 10000;
>       host OS owns all PCIe features in domain 10000
>     pci 10000:e0:06.0: [8086:464d]             # VMD root port
>     pci 10000:e0:06.0: PCI bridge to [bus e1]
>     pci 10000:e0:06.0: SltCap: HotPlug+        # Hotplug Capable
>     pci 10000:e1:00.0: [144d:a80a]             # nvme
> 
>   Guest OS sees:
> 
>     PNP0A03 host bridge to 0000 [bus 00-ff]
>       _OSC applies to domain 0000
>       platform owns [PCIeHotplug ...]          # _OSC doesn't grant
> to OS
>     pci 0000:e0:06.0: [8086:464d]              # VMD root port
>     pci 0000:e0:06.0: PCI bridge to [bus e1]
>     pci 0000:e0:06.0: SltCap: HotPlug+         # Hotplug Capable
>     pci 0000:e1:00.0: [144d:a80a]             # nvme
> 
> So the guest OS sees that the VMD Root Port supports hotplug, but it
> can't use it because it was not granted ownership of the feature?
You are correct about _OSC not granting access in the Guest. This is
what I see on my setup:

	Host OS:

	ACPI: PCI Root Bridge [PC11] (domain 0000 [bus e2-fa])
        acpi PNP0A08:0b: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI EDR HPX-Type3]
        acpi PNP0A08:0b: _OSC: platform does not support [SHPCHotplug AER DPC]
        acpi PNP0A08:0b: _OSC: OS now controls [PCIeHotplug PME PCIeCapability LTR]
	PCI host bridge to bus 0000:e2

	vmd 0000:e2:00.5: PCI host bridge to bus 10007:00
	vmd 0000:e2:00.5: Bound to PCI domain 10007

	Guest OS:

	ACPI: PCI Root Bridge [PC0G] (domain 0000 [bus 03])
        acpi PNP0A08:01: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI EDR HPX-Type3]
        acpi PNP0A08:01: _OSC: platform does not support [PCIeHotplug SHPCHotplug PME AER LTR DPC]
        acpi PNP0A08:01: _OSC: OS now controls [PCIeCapability]

	vmd 0000:03:00.0: Bound to PCI domain 10000

	SltCap: PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
	

	
> 
> Bjorn
Bjorn Helgaas Jan. 16, 2024, 11:54 p.m. UTC | #12
On Tue, Dec 05, 2023 at 03:20:27PM -0700, Nirmal Patel wrote:
> ...

> In guest, vmd is passthrough including pci topology behind vmd. ...

IIUC this says the VMD bridge itself is passed through to the guest,
as well as the VMD Root Ports and devices below them.

Why is the VMD bridge itself passed through to the guest?  I assume
the host OS vmd driver enables the VMD bridge functionality, i.e., in
vmd_enable_domain()?

I gleaned this from the dmesg logs at
https://bugzilla.kernel.org/show_bug.cgi?id=215027.  I assume these
are from the host:

  pci 0000:00:0e.0: [8086:467f] type 00         # VMD Endpoint
  vmd 0000:00:0e.0: PCI host bridge to bus 10000:e0
  pci_bus 10000:e0: root bus resource [bus e0-ff]
  pci 10000:e0:06.0: [8086:464d] type 01        # VMD Root Port
  pci 10000:e0:06.0: PCI bridge to [bus e1]
  pci 10000:e1:00.0: [144d:a80a] type 00        # NVMe
  nvme 10000:e1:00.0: PCI INT A: not connected

What does it look like in the guest?  Does the guest see all three of
these devices (VMD Endpoint, VMD Root Port, NVMe)?

If the guest sees the VMD Endpoint, what does it do with it?  Does the
host vmd driver coordinate with the guest vmd driver?

Bjorn
Bjorn Helgaas Jan. 17, 2024, 12:49 a.m. UTC | #13
On Tue, Jan 16, 2024 at 01:37:32PM -0700, Nirmal Patel wrote:
> On Fri, 2024-01-12 at 16:55 -0600, Bjorn Helgaas wrote:
> ...

> > Maybe it will help if we can make the
> > situation more concrete.  I'm basing this on the logs at
> > https://bugzilla.kernel.org/show_bug.cgi?id=215027.  I assume the
> > 10000:e0:06.0 Root Port and the 10000:e1:00.0 NVMe device are both
> > passed through to the guest.  I'm sure I got lots wrong here, so
> > please correct me :)
> > 
> >   Host OS sees:
> > 
> >     PNP0A08 host bridge to 0000 [bus 00-ff]
> >       _OSC applies to domain 0000
> >       OS owns [PCIeHotplug SHPCHotplug PME PCIeCapability LTR] in
> > domain 0000
> >     vmd 0000:00:0e.0: PCI host bridge to domain 10000 [bus e0-ff]
> >       no _OSC applies in domain 10000;
> >       host OS owns all PCIe features in domain 10000
> >     pci 10000:e0:06.0: [8086:464d]             # VMD root port
> >     pci 10000:e0:06.0: PCI bridge to [bus e1]
> >     pci 10000:e0:06.0: SltCap: HotPlug+        # Hotplug Capable
> >     pci 10000:e1:00.0: [144d:a80a]             # nvme
> > 
> >   Guest OS sees:
> > 
> >     PNP0A03 host bridge to 0000 [bus 00-ff]
> >       _OSC applies to domain 0000
> >       platform owns [PCIeHotplug ...]          # _OSC doesn't grant
> > to OS
> >     pci 0000:e0:06.0: [8086:464d]              # VMD root port
> >     pci 0000:e0:06.0: PCI bridge to [bus e1]
> >     pci 0000:e0:06.0: SltCap: HotPlug+         # Hotplug Capable
> >     pci 0000:e1:00.0: [144d:a80a]             # nvme
> > 
> > So the guest OS sees that the VMD Root Port supports hotplug, but
> > it can't use it because it was not granted ownership of the
> > feature?
>
> You are correct about _OSC not granting access in Guest.

I was assuming the VMD Endpoint itself was not visible in the guest
and the VMD Root Ports appeared in domain 0000 described by the guest
PNP0A03.  The PNP0A03 _OSC would then apply to the VMD Root Ports.
But my assumption appears to be wrong:

> This is what I see on my setup.
> 
>   Host OS:
> 
>     ACPI: PCI Root Bridge [PC11] (domain 0000 [bus e2-fa])
>     acpi PNP0A08:0b: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI EDR HPX-Type3]
>     acpi PNP0A08:0b: _OSC: platform does not support [SHPCHotplug AER DPC]
>     acpi PNP0A08:0b: _OSC: OS now controls [PCIeHotplug PME PCIeCapability LTR]
>     PCI host bridge to bus 0000:e2
> 
>     vmd 0000:e2:00.5: PCI host bridge to bus 10007:00
>     vmd 0000:e2:00.5: Bound to PCI domain 10007
> 
>   Guest OS:
> 
>     ACPI: PCI Root Bridge [PC0G] (domain 0000 [bus 03])
>     acpi PNP0A08:01: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI EDR HPX-Type3]
>     acpi PNP0A08:01: _OSC: platform does not support [PCIeHotplug SHPCHotplug PME AER LTR DPC]
>     acpi PNP0A08:01: _OSC: OS now controls [PCIeCapability]
> 
>     vmd 0000:03:00.0: Bound to PCI domain 10000
> 
>     SltCap: PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-

Your example above suggests that the guest has a PNP0A08 device for
domain 0000, with an _OSC, the guest sees the VMD Endpoint at
0000:03:00.0, and the VMD Root Ports and devices below them are in
domain 10000.  Right?

If we decide the _OSC that covers the VMD Endpoint does *not* apply to
the domain below the VMD bridge, the guest has no _OSC for domain
10000, so the guest OS should default to owning all the PCIe features
in that domain.

Bjorn
Nirmal Patel Feb. 1, 2024, 6:38 p.m. UTC | #14
On Tue, 2024-01-16 at 13:37 -0700, Nirmal Patel wrote:
> On Fri, 2024-01-12 at 16:55 -0600, Bjorn Helgaas wrote:
> > [+cc Rafael, Kai-Heng]
> > 
> > On Thu, Dec 14, 2023 at 03:22:21PM -0700, Nirmal Patel wrote:
> > > On Thu, 2023-12-14 at 13:23 -0600, Bjorn Helgaas wrote:
> > > > On Wed, Dec 13, 2023 at 06:07:02PM -0700, Nirmal Patel wrote:
> > > > > On Tue, 2023-12-12 at 15:13 -0600, Bjorn Helgaas wrote:
> > > > > > On Mon, Dec 11, 2023 at 04:05:25PM -0700, Nirmal Patel
> > > > > > wrote:
> > > > > > > On Tue, 2023-12-05 at 18:27 -0600, Bjorn Helgaas wrote:
> > > > > > > > On Tue, Dec 05, 2023 at 03:20:27PM -0700, Nirmal Patel
> > > > > > > > wrote:
> > > > > > > > ...
> > > > > > > > > Currently Hotplug is controlled by _OSC in both host
> > > > > > > > > and
> > > > > > > > > Guest.  In case of guest, it seems guest BIOS hasn't
> > > > > > > > > implemented _OSC since all the flags are set to 0
> > > > > > > > > which
> > > > > > > > > is not the case in host.
> > > > > > > > 
> > > > > > > > I think you want pciehp to work on the VMD Root Ports
> > > > > > > > in
> > > > > > > > the Guest, no matter what, on the assumption that any
> > > > > > > > _OSC
> > > > > > > > that applies to host bridge A does not apply to the
> > > > > > > > host
> > > > > > > > bridge created by the vmd driver.
> > > > > > > > 
> > > > > > > > If so, we should just revert 04b12ef163d1 ("PCI: vmd:
> > > > > > > > Honor ACPI _OSC on PCIe features").  Since host bridge
> > > > > > > > B
> > > > > > > > was not enumerated via ACPI, the OS owns all those
> > > > > > > > features under B by default, so reverting 04b12ef163d1
> > > > > > > > would leave that state.
> > > > > > > > 
> > > > > > > > Obviously we'd have to first figure out another
> > > > > > > > solution
> > > > > > > > for the AER message flood that 04b12ef163d1 resolved.
> > > > > > > 
> > > > > > > If we revert 04b12ef163d1, then VMD driver will still
> > > > > > > enable
> > > > > > > AER blindly which is a bug. So we need to find a way to
> > > > > > > make
> > > > > > > VMD driver aware of AER platform setting and use that
> > > > > > > information to enable or disable AER for its child
> > > > > > > devices.
> > > > > > > 
> > > > > > > There is a setting in BIOS that allows us to enable OS
> > > > > > > native AER support on the platform. This setting is
> > > > > > > located
> > > > > > > in EDK Menu -> Platform configuration -> system event log
> > > > > > > ->
> > > > > > > IIO error enabling -> OS native AER support.
> > > > > > > 
> > > > > > > This setting is assigned to VMD bridge by
> > > > > > > vmd_copy_host_bridge_flags in patch 04b12ef163d1. Without
> > > > > > > the patch 04b12ef163d1, VMD driver will enable AER even
> > > > > > > if
> > > > > > > platform has disabled OS native AER support as mentioned
> > > > > > > earlier. This will result in an AER flood mentioned in
> > > > > > > 04b12ef163d1 since there is no AER handlers. 
> > > > 
> > > > I missed this before.  What does "there are no AER handlers"
> > > > mean?
> > > > Do you mean there are no *firmware* AER handlers?  
> > > 
> > > Sorry for confusing wordings. Your understanding is correct. The
> > > patch
> > > 04b12ef163d1 is used to disable AER on vmd and avoid the AER
> > > flooding
> > > by applying _OSC settings since vmd driver doesn't give a choice
> > > to
> > > toggle AER, DPC, LTR, etc.
> > > > The dmesg log at 
> > > > https://bugzilla.kernel.org/attachment.cgi?id=299571
> > > > (from https://bugzilla.kernel.org/show_bug.cgi?id=215027, the
> > > > bug
> > > > fixed by 04b12ef163d1), shows this:
> > > > 
> > > >   acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM
> > > > ClockPM
> > > > Segments MSI HPX-Type3]
> > > >   acpi PNP0A08:00: _OSC: platform does not support [AER]
> > > >   acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug
> > > > SHPCHotplug
> > > > PME
> > > > PCIeCapability LTR]
> > > > 
> > > > so the firmware did not grant AER control to the OS (I think
> > > > "platform does not support" is a confusing description).
> > > > 
> > > > Prior to 04b12ef163d1, Linux applied _OSC above the VMD bridge
> > > > but
> > > > not below it, so Linux had native control below VMD and it did
> > > > handle AER interrupts from the 10000:e0:06.0 Root Port below
> > > > VMD:
> > > > 
> > > >   vmd 0000:00:0e.0: PCI host bridge to bus 10000:e0
> > > >   pcieport 10000:e0:06.0: AER: Corrected error received:
> > > > 10000:e1:00.0
> > > > 
> > > > After 04b12ef163d1, Linux applied _OSC below the VMD bridge as
> > > > well,
> > > > so it did not enable or handle any AER interrupts.  I suspect
> > > > the
> > > > platform didn't handle AER interrupts below VMD either, so
> > > > those
> > > > errors were probably just ignored.
> > > > 
> > > > > > > If VMD is aware of OS native AER support setting, then we
> > > > > > > will not see AER flooding issue.
> > > > > > > 
> > > > > > > Do you have any suggestion on how to make VMD driver
> > > > > > > aware
> > > > > > > of AER setting and keep it in sync with platform setting.
> > > > > > 
> > > > > > Well, this is the whole problem.  IIUC, you're saying we
> > > > > > should use _OSC to negotiate for AER control below the VMD
> > > > > > bridge, but ignore _OSC for hotplug control.
> > > > > 
> > > > > Because VMD has its own hotplug BIOS setting which allows vmd
> > > > > to
> > > > > enable or disable hotplug on its domain. However we don't
> > > > > have
> > > > > VMD specific AER, DPC, LTR settings. 
> > > > 
> > > > I don't quite follow.  The OS knows nothing about whether BIOS
> > > > settings exist, so they can't be used to determine where _OSC
> > > > applies.
> > > > 
> > > > > Is it okay if we include an additional check for hotplug?
> > > > > i.e.
> > > > > Hotplug capable bit in SltCap register which reflects VMD
> > > > > BIOS
> > > > > hotplug setting.
> > > > 
> > > > I don't know what check you have in mind, but the OS can
> > > > definitely use SltCap to decide what features to enable.
> > > 
> > > Yes I agree. That is what I am also suggesting in this patch.
> > 
> > I should have said "the OS can use SltCap to *help* decide what
> > features to enable."  For ACPI host bridges, _OSC determines
> > whether
> > hotplug should be operated by the platform or the OS.
> > 
> > I think we're converging on the idea that since VMD is effectively
> > *not* an ACPI host bridge and doesn't have its own _OSC, the _OSC
> > that
> > applies to the VMD endpoint does *not* apply to the hierarchy below
> > the VMD.  In that case, the default is that the OS owns all the
> > features (hotplug, AER, etc) below the VMD.
> Well there will be few problems if VMD owns/assigns all the flags by
> itself. We discussed all of the potential problems but due to the
> holidays I think I should summarize them again.
> #1 : Currently there is no way to pass the information about AER,
> DPC,
> etc to VMD driver from BIOS or from boot parameter. For example, If
> VMD
> blindly enables AER and platform has AER disabled, then we will see
> AERs from devices under VMD but user have no way to toggle it since
> we
> rejected idea of adding boot parameter for AER, DPC under VMD. I
> believe you also didn't like the idea of sysfs knob suggested by Kai-
> Heng.
> 
> #2 : Even if we use VMD hardware register to store AER, DPC and make
> UEFI VMD driver to write information about Hotplug, DPC, AER, we
> still
> dont have a way to change the setting if user wants to alter them.
> Also
> the issue will still persist in client platforms since we don't own
> their UEFI VMD driver. It will be a huge effort.
> 
> #3 : At this moment, user can enable/disable only Hotplug in VMD BIOS
> settings (meaning no AER, DPC, LTR, etc)and VMD driver can read it
> from
> SltCap register. This means BIOS needs to add other settings and VMD
> needs to figure out to read them which at this moment VMD can't do
> it.
> 
> #4 : consistency with Host OS and Guest OS.
> 
> I believe the current propesed patch is the best option which
> requires
> minimal changes without breaking other platform features and unblock
> our customers. This issues has been a blocker for us.
> 
> For your concerns regarding how VMD can own all the _OSC features, i
> am
> open to other ideas and will discuss with the architect if they have
> any suggestions.
> 
> 
> > > > You suggest above that you want vmd to be aware of AER
> > > > ownership
> > > > per
> > > > _OSC, but it sounds more like you just want AER disabled below
> > > > VMD
> > > > regardless.  Or do you have platforms that can actually handle
> > > > AER
> > > > interrupts from Root Ports below VMD?
> > > 
> > > No I dont want AER disabled below VMD all the time. I am
> > > suggesting
> > > vmd should be in sync with all the _OSC flags since vmd doesn't
> > > give
> > > a choice to toggle.
> > 
> > This is what I don't understand.  If we decide that _OSC doesn't
> > apply
> > below VMD because the VMD host bridge isn't described in ACPI, the
> > idea of being in sync with _OSC doesn't make sense.
> > 
> > > However, the issue arises in guest where _OSC setting for hotplug
> > > is
> > > always 0 even though hotplug is 1 in host and hotplug enable bit
> > > in
> > > SltCap register is 1 in both host and guest. So we can use that
> > > to
> > > enable hotplug in guest. Like you suggested in your above
> > > comment. 
> > 
> > All this got paged out over the holidays, so I need to refresh my
> > understanding here.  Maybe it will help if we can make the
> > situation
> > more concrete.  I'm basing this on the logs at
> > https://bugzilla.kernel.org/show_bug.cgi?id=215027.  I assume the
> > 10000:e0:06.0 Root Port and the 10000:e1:00.0 NVMe device are both
> > passed through to the guest.  I'm sure I got lots wrong here, so
> > please correct me :)
> > 
> >   Host OS sees:
> > 
> >     PNP0A08 host bridge to 0000 [bus 00-ff]
> >       _OSC applies to domain 0000
> >       OS owns [PCIeHotplug SHPCHotplug PME PCIeCapability LTR] in
> > domain 0000
> >     vmd 0000:00:0e.0: PCI host bridge to domain 10000 [bus e0-ff]
> >       no _OSC applies in domain 10000;
> >       host OS owns all PCIe features in domain 10000
> >     pci 10000:e0:06.0: [8086:464d]             # VMD root port
> >     pci 10000:e0:06.0: PCI bridge to [bus e1]
> >     pci 10000:e0:06.0: SltCap: HotPlug+        # Hotplug Capable
> >     pci 10000:e1:00.0: [144d:a80a]             # nvme
> > 
> >   Guest OS sees:
> > 
> >     PNP0A03 host bridge to 0000 [bus 00-ff]
> >       _OSC applies to domain 0000
> >       platform owns [PCIeHotplug ...]          # _OSC doesn't grant
> > to OS
> >     pci 0000:e0:06.0: [8086:464d]              # VMD root port
> >     pci 0000:e0:06.0: PCI bridge to [bus e1]
> >     pci 0000:e0:06.0: SltCap: HotPlug+         # Hotplug Capable
> >     pci 0000:e1:00.0: [144d:a80a]             # nvme
> > 
> > So the guest OS sees that the VMD Root Port supports hotplug, but
> > it
> > can't use it because it was not granted ownership of the feature?
> You are correct about _OSC not granting access in Guest. This is what
> I
> see on my setup.
> 
> 	Host OS:
> 
> 	ACPI: PCI Root Bridge [PC11] (domain 0000 [bus e2-fa])
> 	acpi PNP0A08:0b: _OSC: OS supports [ExtendedConfig ASPM ClockPM
> Segments MSI EDR HPX-Type3]
> 	acpi PNP0A08:0b: _OSC: platform does not support [SHPCHotplug
> AER DPC]
> 	acpi PNP0A08:0b: _OSC: OS now controls [PCIeHotplug PME
> PCIeCapability LTR]
> 	PCI host bridge to bus 0000:e2
> 
> 	vmd 0000:e2:00.5: PCI host bridge to bus 10007:00
> 	vmd 0000:e2:00.5: Bound to PCI domain 10007
> 
> 	Guest OS:
> 
> 	ACPI: PCI Root Bridge [PC0G] (domain 0000 [bus 03])
> 	acpi PNP0A08:01: _OSC: OS supports [ExtendedConfig ASPM ClockPM
> Segments MSI EDR HPX-Type3]
> 	acpi PNP0A08:01: _OSC: platform does not support [PCIeHotplug
> SHPCHotplug PME AER LTR DPC]
> 	acpi PNP0A08:01: _OSC: OS now controls [PCIeCapability]
> 
> 	vmd 0000:03:00.0: Bound to PCI domain 10000
> 
> 	SltCap: PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
> 	
> 
> 	
> > Bjorn

Hi Bjorn, Lorenzo,

Gentle ping. 

Thanks.
Bjorn Helgaas Feb. 1, 2024, 9:16 p.m. UTC | #15
On Tue, Jan 16, 2024 at 06:49:33PM -0600, Bjorn Helgaas wrote:
> On Tue, Jan 16, 2024 at 01:37:32PM -0700, Nirmal Patel wrote:
> > On Fri, 2024-01-12 at 16:55 -0600, Bjorn Helgaas wrote:
> > ...
> > > Maybe it will help if we can make the
> > > situation more concrete.  I'm basing this on the logs at
> > > https://bugzilla.kernel.org/show_bug.cgi?id=215027.  I assume the
> > > 10000:e0:06.0 Root Port and the 10000:e1:00.0 NVMe device are both
> > > passed through to the guest.  I'm sure I got lots wrong here, so
> > > please correct me :)
> > > 
> > >   Host OS sees:
> > > 
> > >     PNP0A08 host bridge to 0000 [bus 00-ff]
> > >       _OSC applies to domain 0000
> > >       OS owns [PCIeHotplug SHPCHotplug PME PCIeCapability LTR] in
> > > domain 0000
> > >     vmd 0000:00:0e.0: PCI host bridge to domain 10000 [bus e0-ff]
> > >       no _OSC applies in domain 10000;
> > >       host OS owns all PCIe features in domain 10000
> > >     pci 10000:e0:06.0: [8086:464d]             # VMD root port
> > >     pci 10000:e0:06.0: PCI bridge to [bus e1]
> > >     pci 10000:e0:06.0: SltCap: HotPlug+        # Hotplug Capable
> > >     pci 10000:e1:00.0: [144d:a80a]             # nvme
> > > 
> > >   Guest OS sees:
> > > 
> > >     PNP0A03 host bridge to 0000 [bus 00-ff]
> > >       _OSC applies to domain 0000
> > >       platform owns [PCIeHotplug ...]          # _OSC doesn't grant
> > > to OS
> > >     pci 0000:e0:06.0: [8086:464d]              # VMD root port
> > >     pci 0000:e0:06.0: PCI bridge to [bus e1]
> > >     pci 0000:e0:06.0: SltCap: HotPlug+         # Hotplug Capable
> > >     pci 0000:e1:00.0: [144d:a80a]             # nvme
> > > 
> > > So the guest OS sees that the VMD Root Port supports hotplug, but
> > > it can't use it because it was not granted ownership of the
> > > feature?
> >
> > You are correct about _OSC not granting access in Guest.
> 
> I was assuming the VMD Endpoint itself was not visible in the guest
> and the VMD Root Ports appeared in domain 0000 described by the guest
> PNP0A03.  The PNP0A03 _OSC would then apply to the VMD Root Ports.
> But my assumption appears to be wrong:
> 
> > This is what I see on my setup.
> > 
> >   Host OS:
> > 
> >     ACPI: PCI Root Bridge [PC11] (domain 0000 [bus e2-fa])
> >     acpi PNP0A08:0b: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI EDR HPX-Type3]
> >     acpi PNP0A08:0b: _OSC: platform does not support [SHPCHotplug AER DPC]
> >     acpi PNP0A08:0b: _OSC: OS now controls [PCIeHotplug PME PCIeCapability LTR]
> >     PCI host bridge to bus 0000:e2
> > 
> >     vmd 0000:e2:00.5: PCI host bridge to bus 10007:00
> >     vmd 0000:e2:00.5: Bound to PCI domain 10007
> > 
> >   Guest OS:
> > 
> >     ACPI: PCI Root Bridge [PC0G] (domain 0000 [bus 03])
> >     acpi PNP0A08:01: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI EDR HPX-Type3]
> >     acpi PNP0A08:01: _OSC: platform does not support [PCIeHotplug SHPCHotplug PME AER LTR DPC]
> >     acpi PNP0A08:01: _OSC: OS now controls [PCIeCapability]
> > 
> >     vmd 0000:03:00.0: Bound to PCI domain 10000
> > 
> >     SltCap: PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
> 
> Your example above suggests that the guest has a PNP0A08 device for
> domain 0000, with an _OSC, the guest sees the VMD Endpoint at
> 0000:03:00.0, and the VMD Root Ports and devices below them are in
> domain 10000.  Right?

Is my assumption here correct?  Are the VMD Endpoint, the VMD Root
Ports, and the devices below those Root Ports all visible to the
guest?

If so, it sounds like the VMD Endpoint is visible to both the host and
the guest?  I assume you only want it claimed by the vmd driver in
*one* of them?  If it's visible in both, how do you prevent one from
claiming it?

> If we decide the _OSC that covers the VMD Endpoint does *not* apply to
> the domain below the VMD bridge, the guest has no _OSC for domain
> 10000, so the guest OS should default to owning all the PCIe features
> in that domain.

I assume the guest sees the VMD Endpoint in domain 0000, right?

Does the guest have an _OSC for domain 0000?  If so, that would
clearly apply to the VMD Endpoint.

I think the guest sees the VMD Root Ports in domain 10000, along with
the devices below them, right?

I'm pretty sure the guest has no _OSC for domain 10000, i.e., for the
the VMD Root Ports and below, right?

Bjorn
Bjorn Helgaas Feb. 1, 2024, 10:22 p.m. UTC | #16
On Tue, Jan 16, 2024 at 01:37:32PM -0700, Nirmal Patel wrote:
> On Fri, 2024-01-12 at 16:55 -0600, Bjorn Helgaas wrote:
> ...

> > I think we're converging on the idea that since VMD is effectively
> > *not* an ACPI host bridge and doesn't have its own _OSC, the _OSC
> > that applies to the VMD endpoint does *not* apply to the hierarchy
> > below the VMD.  In that case, the default is that the OS owns all
> > the features (hotplug, AER, etc) below the VMD.
>
> Well there will be few problems if VMD owns/assigns all the flags by
> itself. We discussed all of the potential problems but due to the
> holidays I think I should summarize them again.
>
> #1 : Currently there is no way to pass the information about AER,
> DPC, etc to VMD driver from BIOS or from boot parameter. For
> example, if VMD blindly enables AER and platform has AER disabled,
> then we will see AERs from devices under VMD but user have no way to
> toggle it since we rejected idea of adding boot parameter for AER,
> DPC under VMD. I believe you also didn't like the idea of sysfs knob
> suggested by Kai-Heng.
> 
> #2 : Even if we use VMD hardware register to store AER, DPC and make
> UEFI VMD driver to write information about Hotplug, DPC, AER, we
> still dont have a way to change the setting if user wants to alter
> them. Also the issue will still persist in client platforms since we
> don't own their UEFI VMD driver. It will be a huge effort.
> 
> #3 : At this moment, user can enable/disable only Hotplug in VMD
> BIOS settings (meaning no AER, DPC, LTR, etc)and VMD driver can read
> it from SltCap register. This means BIOS needs to add other settings
> and VMD needs to figure out to read them which at this moment VMD
> can't do it.
> 
> #4 : consistency with Host OS and Guest OS.
> 
> I believe the current proposed patch is the best option which
> requires minimal changes without breaking other platform features
> and unblock our customers. This issues has been a blocker for us.
> 
> For your concerns regarding how VMD can own all the _OSC features, i
> am open to other ideas and will discuss with the architect if they
> have any suggestions.

As I understand it, the basic model of PCIe features is:

  - If platform doesn't have a way to negotiate ownership of PCIe
    features, the OS owns them by default, e.g., on non-ACPI systems.

  - If platform does have a way to negotiate, e.g., ACPI _OSC, the
    platform owns the features until it grants ownership to the OS.

  - If the OS has ownership (either by default or granted by
    platform), it can use the feature if the hardware device
    advertises support.

I think this model applies to all PCIe features, including hotplug,
AER, DPC, etc., and the OS uses _OSC and the Capabilities in device
config space to determine whether to use the features.
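
For hotplug specifically, the decision in the port driver today is
roughly "capability AND ownership" (paraphrased from memory, not a
verbatim quote of portdrv):

    struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);

    /*
     * Enable pciehp only if the slot advertises hotplug (SltCap.HotPlug,
     * cached in dev->is_hotplug_bridge) *and* the OS owns PCIe hotplug
     * for this hierarchy (granted by _OSC, forced by pcie_ports=native,
     * or the default when no _OSC applies).
     */
    if (dev->is_hotplug_bridge &&
        (pcie_ports_native || host->native_pcie_hotplug))
            services |= PCIE_PORT_SERVICE_HP;

AER, PME and DPC follow the same pattern with their own capability
checks and native_* flags.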

_OSC is the obvious way for a platform to use BIOS settings to
influence what the OS does.  I think the problem with VMD is that you
have a guest OS running on a platform (the guest VM) where you want a
host BIOS setting to control things in that guest platform, but
there's currently no way to influence the _OSC seen by the guest.

I think we need to:

  - Clarify whether _OSC only applies in the domain of the host bridge
    where it appears, i.e., an _OSC in a host bridge to domain 0000
    obviously applies to a VMD Endpoint in domain 0000; does it also
    apply to devices in the domain 10000 *below* the VMD Endpoint?

  - Describe what the VMD bridge does with events below it.  E.g.,
    normally a Root Port that receives an error Message generates an
    interrupt depending on its Interrupt Disable and Error Reporting
    Enable bits.  What happens when it's a VMD Root Port?  Does it
    forward an error Message up via the VMD Endpoint?  Generate an
    interrupt via the VMD Endpoint?  If so, which interrupt?

The question of where _OSC applies feels like an architectural thing.

The question of how AER, DPC, hotplug, etc. events are forwarded
across the VMD Root Port/Endpoint might be, too, or maybe that's all
driver-specific, I dunno.  Either way, I think it's worth documenting
somehow.

Bjorn
Bjorn Helgaas Feb. 1, 2024, 11 p.m. UTC | #17
On Thu, Feb 01, 2024 at 11:38:47AM -0700, Nirmal Patel wrote:
> Gentle ping. 

Thanks for the ping, I was waiting for you, so we were deadlocked ;)
Nirmal Patel Feb. 7, 2024, 12:27 a.m. UTC | #18
On Thu, 2024-02-01 at 17:00 -0600, Bjorn Helgaas wrote:
> On Thu, Feb 01, 2024 at 11:38:47AM -0700, Nirmal Patel wrote:
> > Gentle ping. 
> 
> Thanks for the ping, I was waiting for you, so we were deadlocked ;)
Hi Bjorn,

Sorry I missed that.

Did you have a chance to look at my response on January 16th to your
questions? I tried to summarize all of the potential problems and
issues with the different fixes. Please let me know if it would be
easier for me to resend the explanation. Thanks.

-nirmal
Bjorn Helgaas Feb. 7, 2024, 6:55 p.m. UTC | #19
On Tue, Feb 06, 2024 at 05:27:29PM -0700, Nirmal Patel wrote:
> ...
> Did you have a chance to look at my response on January 16th to your
> questions? I tried to summarize all of the potential problems and
> issues with different fixes. Please let me know if it is easier if I
> resend the explanation. Thanks.

I did see your Jan 16 response, thanks.

I had more questions after reading it, but they're more about
understanding the topology seen by the host and the guest:
  Jan 16: https://lore.kernel.org/r/20240117004933.GA108810@bhelgaas
  Feb  1: https://lore.kernel.org/r/20240201211620.GA650432@bhelgaas

As I mentioned in my second Feb 1 response
(https://lore.kernel.org/r/20240201222245.GA650725@bhelgaas), the
usual plan envisioned by the PCI Firmware spec is that an OS can use a
PCIe feature if the platform has granted the OS ownership via _OSC and
a device advertises the feature via a Capability in config space.

My main concern with the v2 patch
(https://lore.kernel.org/r/20231127211729.42668-1-nirmal.patel@linux.intel.com)
is that it overrides _OSC for native_pcie_hotplug, but not for any of
the other features (AER, PME, LTR, DPC, etc.)

I think it's hard to read the specs and conclude that PCIe hotplug is
a special case, and I think we're likely to have similar issues with
other features in the future.

But if you think this is the best solution, I'm OK with merging it.

Bjorn
Nirmal Patel Feb. 13, 2024, 5:47 p.m. UTC | #20
On Wed, 2024-02-07 at 12:55 -0600, Bjorn Helgaas wrote:
> On Tue, Feb 06, 2024 at 05:27:29PM -0700, Nirmal Patel wrote:
> > ...
> > Did you have a chance to look at my response on January 16th to
> > your
> > questions? I tried to summarize all of the potential problems and
> > issues with different fixes. Please let me know if it is easier if
> > I
> > resend the explanation. Thanks.
> 
> I did see your Jan 16 response, thanks.
> 
> I had more questions after reading it, but they're more about
> understanding the topology seen by the host and the guest:
>   Jan 16: https://lore.kernel.org/r/20240117004933.GA108810@bhelgaas
>   Feb  1: https://lore.kernel.org/r/20240201211620.GA650432@bhelgaas
> 
> As I mentioned in my second Feb 1 response
> (https://lore.kernel.org/r/20240201222245.GA650725@bhelgaas), the
> usual plan envisioned by the PCI Firmware spec is that an OS can use
> a
> PCIe feature if the platform has granted the OS ownership via _OSC
> and
> a device advertises the feature via a Capability in config space.
> 
> My main concern with the v2 patch
> (
> https://lore.kernel.org/r/20231127211729.42668-1-nirmal.patel@linux.intel.com
> )
> is that it overrides _OSC for native_pcie_hotplug, but not for any of
> the other features (AER, PME, LTR, DPC, etc.)
> 
> I think it's hard to read the specs and conclude that PCIe hotplug is
> a special case, and I think we're likely to have similar issues with
> other features in the future.
> 
> But if you think this is the best solution, I'm OK with merging it.
I sincerely apologize for the late responses. I just found out that my
Evolution mail client has been automatically sending linux-pci emails
to junk since January 2024.

At the moment Hotplug is an exception for us, but I do share your
concern about the other flags. We have done a lot of testing, and so
far patch v2 is the best solution we have.

Thanks
nirmal
> 
> Bjorn
Lorenzo Pieralisi Feb. 14, 2024, 1:43 p.m. UTC | #21
On Tue, Dec 05, 2023 at 03:20:27PM -0700, Nirmal Patel wrote:

[...]

> > In the host OS, this overrides whatever was negotiated via _OSC, I
> > guess on the principle that _OSC doesn't apply because the host BIOS
> > doesn't know about the Root Ports below the VMD.  (I'm not sure it's
> > 100% resolved that _OSC doesn't apply to them, so we should mention
> > that assumption here.)
> _OSC still controls every flag including Hotplug. I have observed that
> slot capabilities register has hotplug enabled only when platform has
> enabled the hotplug. So technically not overriding it in the host.
> It overrides in guest because _OSC is passing wrong/different
> information than the _OSC information in Host.
> Also like I mentioned, slot capabilities registers are not changed in
> Guest because vmd is passthrough.
> > 
> > In the guest OS, this still overrides whatever was negotiated via
> > _OSC, but presumably the guest BIOS *does* know about the Root Ports,
> > so I don't think the same principle about overriding _OSC applies
> > there.
> > 
> > I think it's still a problem that we handle
> > host_bridge->native_pcie_hotplug differently than all the other
> > features negotiated via _OSC.  We should at least explain this
> > somehow.
> Right now this is the only way to know in Guest OS if platform has
> enabled Hotplug or not. We have many customers complaining about
> Hotplug being broken in Guest. So just trying to honor _OSC but also
> fixing its limitation in Guest.

The question is: should _OSC settings apply to the VMD hierarchy?

Yes or no (not maybe)?

If yes, the guest firmware is buggy (and I appreciate it is hard to
fix). If no, this patch should also change vmd_copy_host_bridge_flags()
and remove the native_pcie_hotplug initialization from the root bridge.
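
To illustrate that second option, a rough sketch (not a proposed patch,
and the exact field list is approximate) of vmd_copy_host_bridge_flags()
with the hotplug bits left out of the _OSC copy and simply kept native:

static void vmd_copy_host_bridge_flags(struct pci_host_bridge *root_bridge,
				       struct pci_host_bridge *vmd_bridge)
{
	/* Hotplug ownership stays with the OS below VMD, not with _OSC. */
	vmd_bridge->native_pcie_hotplug = 1;
	vmd_bridge->native_shpc_hotplug = 1;

	/* Everything else still follows the root bridge's _OSC result. */
	vmd_bridge->native_aer = root_bridge->native_aer;
	vmd_bridge->native_pme = root_bridge->native_pme;
	vmd_bridge->native_ltr = root_bridge->native_ltr;
	vmd_bridge->native_dpc = root_bridge->native_dpc;
}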

As a matter of fact, this patch overrides the _OSC settings in both host
and guest, and that is not acceptable because it is not based on any
documented policy (if there is a policy, please describe it).

> > I suspect part of the reason for doing it differently is to avoid the
> > interrupt storm that we avoid via 04b12ef163d1 ("PCI: vmd: Honor ACPI
> > _OSC on PCIe features").  If so, I think 04b12ef163d1 should be
> > mentioned here because it's an important piece of the picture.
> I can add a note about this patch as well. Let me know if you want me
> to add something specific.

At the very least you need to explain the problem you are solving in the
commit log - sentences like:

"This patch will make sure that Hotplug is enabled properly in Host
as well as in VM while honoring _OSC settings as well as VMD hotplug
setting."

aren't useful to someone who will look into this in the future.

If _OSC does not grant the host kernel HP handling and *any* slot
capability (that's another choice that should be explained) reports that
the VMD root port is HP capable, we do override _OSC; we are not
honouring it, AFAICS.

Then we can say that in your user experience the stars always align and
what I mention above is not a problem because it can't happen, but that
is hard to grok just by reading the code and, more importantly, it is
not based on any standard documentation.
 
> Thank you for taking the time. This is a very critical issue for us and
> we have many customers complaining about it, I would appreciate to get
> it resolved as soon as possible.

Sure, but we need to solve it in a way that does not break tomorrow;
otherwise we will keep applying plasters on top of plasters.

Ignoring _OSC hotplug settings for VMD bridges in *both* host and guests
may be an acceptable - though a tad controversial - policy (if based on
software/hardware guidelines).

A mixture thereof in my opinion is not acceptable.

Thanks,
Lorenzo
Nirmal Patel March 6, 2024, 10:27 p.m. UTC | #22
On Tue, 2024-02-13 at 10:47 -0700, Nirmal Patel wrote:
> On Wed, 2024-02-07 at 12:55 -0600, Bjorn Helgaas wrote:
> > On Tue, Feb 06, 2024 at 05:27:29PM -0700, Nirmal Patel wrote:
> > > ...
> > > Did you have a chance to look at my response on January 16th to
> > > your
> > > questions? I tried to summarize all of the potential problems and
> > > issues with different fixes. Please let me know if it is easier
> > > if
> > > I
> > > resend the explanation. Thanks.
> > 
> > I did see your Jan 16 response, thanks.
> > 
> > I had more questions after reading it, but they're more about
> > understanding the topology seen by the host and the guest:
> >   Jan 16: 
> > https://lore.kernel.org/r/20240117004933.GA108810@bhelgaas
> >   Feb  1: 
> > https://lore.kernel.org/r/20240201211620.GA650432@bhelgaas
> > 
> > As I mentioned in my second Feb 1 response
> > (https://lore.kernel.org/r/20240201222245.GA650725@bhelgaas), the
> > usual plan envisioned by the PCI Firmware spec is that an OS can
> > use
> > a
> > PCIe feature if the platform has granted the OS ownership via _OSC
> > and
> > a device advertises the feature via a Capability in config space.
> > 
> > My main concern with the v2 patch
> > (
> > https://lore.kernel.org/r/20231127211729.42668-1-nirmal.patel@linux.intel.com
> > )
> > is that it overrides _OSC for native_pcie_hotplug, but not for any
> > of
> > the other features (AER, PME, LTR, DPC, etc.)
> > 
> > I think it's hard to read the specs and conclude that PCIe hotplug
> > is
> > a special case, and I think we're likely to have similar issues
> > with
> > other features in the future.
> > 
> > But if you think this is the best solution, I'm OK with merging it.
Hi Bjorn,

We tried some other ways to pass the information about all of the flags,
but I couldn't retrieve it in the guest OS, and the VMD hardware doesn't
have empty registers to write this information to. So as of now we have
this solution, which only overwrites the Hotplug flag. If you can accept
it, that would be great.

Thanks
nirmal.

> I sincerely apologize for late responses. I just found out that my
> evolution mail client is automatically sending linux-pci emails to
> junk
> since January 2024.
> 
> At the moment Hotplug is an exception for us, but I do share your
> concern about other flags. We have done lot of testing and so far
> patch
> v2 is the best solution we have.
> 
> Thanks
> nirmal
> > Bjorn
Kai-Heng Feng March 7, 2024, 6:44 a.m. UTC | #23
Hi Nirmal,

On Thu, Mar 7, 2024 at 6:20 AM Nirmal Patel
<nirmal.patel@linux.intel.com> wrote:
>
> On Tue, 2024-02-13 at 10:47 -0700, Nirmal Patel wrote:
> > On Wed, 2024-02-07 at 12:55 -0600, Bjorn Helgaas wrote:
> > > On Tue, Feb 06, 2024 at 05:27:29PM -0700, Nirmal Patel wrote:
> > > > ...
> > > > Did you have a chance to look at my response on January 16th to
> > > > your
> > > > questions? I tried to summarize all of the potential problems and
> > > > issues with different fixes. Please let me know if it is easier
> > > > if
> > > > I
> > > > resend the explanation. Thanks.
> > >
> > > I did see your Jan 16 response, thanks.
> > >
> > > I had more questions after reading it, but they're more about
> > > understanding the topology seen by the host and the guest:
> > >   Jan 16:
> > > https://lore.kernel.org/r/20240117004933.GA108810@bhelgaas
> > >   Feb  1:
> > > https://lore.kernel.org/r/20240201211620.GA650432@bhelgaas
> > >
> > > As I mentioned in my second Feb 1 response
> > > (https://lore.kernel.org/r/20240201222245.GA650725@bhelgaas), the
> > > usual plan envisioned by the PCI Firmware spec is that an OS can
> > > use
> > > a
> > > PCIe feature if the platform has granted the OS ownership via _OSC
> > > and
> > > a device advertises the feature via a Capability in config space.
> > >
> > > My main concern with the v2 patch
> > > (
> > > https://lore.kernel.org/r/20231127211729.42668-1-nirmal.patel@linux.intel.com
> > > )
> > > is that it overrides _OSC for native_pcie_hotplug, but not for any
> > > of
> > > the other features (AER, PME, LTR, DPC, etc.)
> > >
> > > I think it's hard to read the specs and conclude that PCIe hotplug
> > > is
> > > a special case, and I think we're likely to have similar issues
> > > with
> > > other features in the future.
> > >
> > > But if you think this is the best solution, I'm OK with merging it.
> Hi Bjorn,
>
> We tried some other ways to pass the information about all of the flags
> but I couldn't retrieve it in guest OS and VMD hardware doesn't have
> empty registers to write this information. So as of now we have this
> solution which only overwrites Hotplug flag. If you can accept it that
> would be great.

My original commit "PCI: vmd: Honor ACPI _OSC on PCIe features" was a
logical conclusion based on the assumption that VMD ports are just
apertures. Apparently there are more nuances than that, and people
outside of Intel can't possibly know them. And yes, VMD creates lots of
headaches during hardware enablement.

So is it possible to document the right way to use _OSC, as Bjorn suggested?

Kai-Heng

> > In the host OS, this overrides whatever was negotiated via _OSC, I
> > guess on the principle that _OSC doesn't apply because the host BIOS
> > doesn't know about the Root Ports below the VMD.  (I'm not sure it's
> > 100% resolved that _OSC doesn't apply to them, so we should mention
> > that assumption here.)
> _OSC still controls every flag including Hotplug. I have observed that
> slot capabilities register has hotplug enabled only when platform has
> enabled the hotplug. So technically not overriding it in the host.
> It overrides in guest because _OSC is passing wrong/different
> information than the _OSC information in Host.
> Also like I mentioned, slot capabilities registers are not changed in
> Guest because vmd is passthrough.
> >
> > In the guest OS, this still overrides whatever was negotiated via
> > _OSC, but presumably the guest BIOS *does* know about the Root Ports,
> > so I don't think the same principle about overriding _OSC applies
> > there.
> >
> > I think it's still a problem that we handle
> > host_bridge->native_pcie_hotplug differently than all the other
> > features negotiated via _OSC.  We should at least explain this
> > somehow.
> Right now this is the only way to know in Guest OS if platform has
> enabled Hotplug or not. We have many customers complaining about
> Hotplug being broken in Guest. So just trying to honor _OSC but also
> fixing its limitation in Guest.
>
> Thanks
> nirmal.
>
> > I sincerely apologize for late responses. I just found out that my
> > evolution mail client is automatically sending linux-pci emails to
> > junk
> > since January 2024.
> >
> > At the moment Hotplug is an exception for us, but I do share your
> > concern about other flags. We have done lot of testing and so far
> > patch
> > v2 is the best solution we have.
> >
> > Thanks
> > nirmal
> > > Bjorn
>
Nirmal Patel March 8, 2024, 12:09 a.m. UTC | #24
On Thu, 7 Mar 2024 14:44:23 +0800
Kai-Heng Feng <kai.heng.feng@canonical.com> wrote:

> Hi Nirmal,
> 
> On Thu, Mar 7, 2024 at 6:20 AM Nirmal Patel
> <nirmal.patel@linux.intel.com> wrote:
> >
> > On Tue, 2024-02-13 at 10:47 -0700, Nirmal Patel wrote:  
> > > On Wed, 2024-02-07 at 12:55 -0600, Bjorn Helgaas wrote:  
> > > > On Tue, Feb 06, 2024 at 05:27:29PM -0700, Nirmal Patel wrote:  
> > > > > ...
> > > > > Did you have a chance to look at my response on January 16th
> > > > > to your
> > > > > questions? I tried to summarize all of the potential problems
> > > > > and issues with different fixes. Please let me know if it is
> > > > > easier if
> > > > > I
> > > > > resend the explanation. Thanks.  
> > > >
> > > > I did see your Jan 16 response, thanks.
> > > >
> > > > I had more questions after reading it, but they're more about
> > > > understanding the topology seen by the host and the guest:
> > > >   Jan 16:
> > > > https://lore.kernel.org/r/20240117004933.GA108810@bhelgaas
> > > >   Feb  1:
> > > > https://lore.kernel.org/r/20240201211620.GA650432@bhelgaas
> > > >
> > > > As I mentioned in my second Feb 1 response
> > > > (https://lore.kernel.org/r/20240201222245.GA650725@bhelgaas),
> > > > the usual plan envisioned by the PCI Firmware spec is that an
> > > > OS can use
> > > > a
> > > > PCIe feature if the platform has granted the OS ownership via
> > > > _OSC and
> > > > a device advertises the feature via a Capability in config
> > > > space.
> > > >
> > > > My main concern with the v2 patch
> > > > (
> > > > https://lore.kernel.org/r/20231127211729.42668-1-nirmal.patel@linux.intel.com
> > > > )
> > > > is that it overrides _OSC for native_pcie_hotplug, but not for
> > > > any of
> > > > the other features (AER, PME, LTR, DPC, etc.)
> > > >
> > > > I think it's hard to read the specs and conclude that PCIe
> > > > hotplug is
> > > > a special case, and I think we're likely to have similar issues
> > > > with
> > > > other features in the future.
> > > >
> > > > But if you think this is the best solution, I'm OK with merging
> > > > it.  
> > Hi Bjorn,
> >
> > We tried some other ways to pass the information about all of the
> > flags but I couldn't retrieve it in guest OS and VMD hardware
> > doesn't have empty registers to write this information. So as of
> > now we have this solution which only overwrites Hotplug flag. If
> > you can accept it that would be great.  
> 
> My original commit "PCI: vmd: Honor ACPI _OSC on PCIe features" was a
> logically conclusion based on VMD ports are just apertures.
> Apparently there are more nuances than that, and people outside of
> Intel can't possibly know. And yes VMD creates lots of headaches
> during hardware enablement.
> 
> So is it possible to document the right way to use _OSC, as Bjorn
> suggested?
> 
> Kai-Heng
Well, we are stuck here with two issues which are kind of a
chicken-and-egg problem.
1) VMD respects _OSC, which works well in the Host, but _OSC gives wrong
information in the Guest OS, which ends up disabling Hotplug, AER, DPC,
etc. We can live with AER and DPC disabled in a VM, but not Hotplug.

2) VMD takes ownership of all the flags. For this to work, VMD needs to
know these settings from the BIOS. The VMD hardware doesn't have an empty
register where the VMD UEFI driver can write this information. So
honoring user settings for AER, Hotplug, etc. from the BIOS is not
possible.

Where do you want me to add documentation? Will more information in the
comments of the SltCap code change help?

Architecture-wise, VMD must own all of these flags, but we have a
hardware limitation in successfully passing the necessary information to
the Host OS and the Guest OS.

Thanks
nirmal
> 
> > > In the host OS, this overrides whatever was negotiated via _OSC, I
> > > guess on the principle that _OSC doesn't apply because the host
> > > BIOS doesn't know about the Root Ports below the VMD.  (I'm not
> > > sure it's 100% resolved that _OSC doesn't apply to them, so we
> > > should mention that assumption here.)  
> > _OSC still controls every flag including Hotplug. I have observed
> > that slot capabilities register has hotplug enabled only when
> > platform has enabled the hotplug. So technically not overriding it
> > in the host. It overrides in guest because _OSC is passing
> > wrong/different information than the _OSC information in Host.
> > Also like I mentioned, slot capabilities registers are not changed
> > in Guest because vmd is passthrough.  
> > >
> > > In the guest OS, this still overrides whatever was negotiated via
> > > _OSC, but presumably the guest BIOS *does* know about the Root
> > > Ports, so I don't think the same principle about overriding _OSC
> > > applies there.
> > >
> > > I think it's still a problem that we handle
> > > host_bridge->native_pcie_hotplug differently than all the other
> > > features negotiated via _OSC.  We should at least explain this
> > > somehow.  
> > Right now this is the only way to know in Guest OS if platform has
> > enabled Hotplug or not. We have many customers complaining about
> > Hotplug being broken in Guest. So just trying to honor _OSC but also
> > fixing its limitation in Guest.
> >
> > Thanks
> > nirmal.
> >  
> > > I sincerely apologize for late responses. I just found out that my
> > > evolution mail client is automatically sending linux-pci emails to
> > > junk
> > > since January 2024.
> > >
> > > At the moment Hotplug is an exception for us, but I do share your
> > > concern about other flags. We have done lot of testing and so far
> > > patch
> > > v2 is the best solution we have.
> > >
> > > Thanks
> > > nirmal  
> > > > Bjorn  
> >
Kai-Heng Feng March 15, 2024, 1:29 a.m. UTC | #25
On Fri, Mar 8, 2024 at 8:09 AM Nirmal Patel
<nirmal.patel@linux.intel.com> wrote:
>
> On Thu, 7 Mar 2024 14:44:23 +0800
> Kai-Heng Feng <kai.heng.feng@canonical.com> wrote:
>
> > Hi Nirmal,
> >
> > On Thu, Mar 7, 2024 at 6:20 AM Nirmal Patel
> > <nirmal.patel@linux.intel.com> wrote:
> > >
> > > On Tue, 2024-02-13 at 10:47 -0700, Nirmal Patel wrote:
> > > > On Wed, 2024-02-07 at 12:55 -0600, Bjorn Helgaas wrote:
> > > > > On Tue, Feb 06, 2024 at 05:27:29PM -0700, Nirmal Patel wrote:
> > > > > > ...
> > > > > > Did you have a chance to look at my response on January 16th
> > > > > > to your
> > > > > > questions? I tried to summarize all of the potential problems
> > > > > > and issues with different fixes. Please let me know if it is
> > > > > > easier if
> > > > > > I
> > > > > > resend the explanation. Thanks.
> > > > >
> > > > > I did see your Jan 16 response, thanks.
> > > > >
> > > > > I had more questions after reading it, but they're more about
> > > > > understanding the topology seen by the host and the guest:
> > > > >   Jan 16:
> > > > > https://lore.kernel.org/r/20240117004933.GA108810@bhelgaas
> > > > >   Feb  1:
> > > > > https://lore.kernel.org/r/20240201211620.GA650432@bhelgaas
> > > > >
> > > > > As I mentioned in my second Feb 1 response
> > > > > (https://lore.kernel.org/r/20240201222245.GA650725@bhelgaas),
> > > > > the usual plan envisioned by the PCI Firmware spec is that an
> > > > > OS can use
> > > > > a
> > > > > PCIe feature if the platform has granted the OS ownership via
> > > > > _OSC and
> > > > > a device advertises the feature via a Capability in config
> > > > > space.
> > > > >
> > > > > My main concern with the v2 patch
> > > > > (
> > > > > https://lore.kernel.org/r/20231127211729.42668-1-nirmal.patel@linux.intel.com
> > > > > )
> > > > > is that it overrides _OSC for native_pcie_hotplug, but not for
> > > > > any of
> > > > > the other features (AER, PME, LTR, DPC, etc.)
> > > > >
> > > > > I think it's hard to read the specs and conclude that PCIe
> > > > > hotplug is
> > > > > a special case, and I think we're likely to have similar issues
> > > > > with
> > > > > other features in the future.
> > > > >
> > > > > But if you think this is the best solution, I'm OK with merging
> > > > > it.
> > > Hi Bjorn,
> > >
> > > We tried some other ways to pass the information about all of the
> > > flags but I couldn't retrieve it in guest OS and VMD hardware
> > > doesn't have empty registers to write this information. So as of
> > > now we have this solution which only overwrites Hotplug flag. If
> > > you can accept it that would be great.
> >
> > My original commit "PCI: vmd: Honor ACPI _OSC on PCIe features" was a
> > logically conclusion based on VMD ports are just apertures.
> > Apparently there are more nuances than that, and people outside of
> > Intel can't possibly know. And yes VMD creates lots of headaches
> > during hardware enablement.
> >
> > So is it possible to document the right way to use _OSC, as Bjorn
> > suggested?
> >
> > Kai-Heng
> Well we are stuck here with two issues which are kind of chicken and egg
> problem.
> 1) VMD respects _OSC; which works excellent in Host but _OSC gives
> wrong information in Guest OS which ends up disabling Hotplug, AER,
> DPC, etc. We can live with AER, DPC disabled in VM but not the Hotplug.
>
> 2) VMD takes ownership of all the flags. For this to work VMD needs to
> know these settings from BIOS. VMD hardware doesn't have empty register
> where VMD UEFI driver can write this information. So honoring user
> settings for AER, Hotplug, etc from BIOS is not possible.
>
> Where do you want me to add documentation? Will more information in
> the comment section of the Sltcap code change help?
>
> Architecture wise VMD must own all of these flags but we have a
> hardware limitation to successfully pass the necessary information to
> the Host OS and the Guest OS.

If there's an official document on intel.com, it can make many things
clearer and easier.
Stating what VMD does and what VMD expects the OS to do can be really
helpful. Basically, put what you wrote in an official document.

Kai-Heng

>
> Thanks
> nirmal
> >
> > > > In the host OS, this overrides whatever was negotiated via _OSC, I
> > > > guess on the principle that _OSC doesn't apply because the host
> > > > BIOS doesn't know about the Root Ports below the VMD.  (I'm not
> > > > sure it's 100% resolved that _OSC doesn't apply to them, so we
> > > > should mention that assumption here.)
> > > _OSC still controls every flag including Hotplug. I have observed
> > > that slot capabilities register has hotplug enabled only when
> > > platform has enabled the hotplug. So technically not overriding it
> > > in the host. It overrides in guest because _OSC is passing
> > > wrong/different information than the _OSC information in Host.
> > > Also like I mentioned, slot capabilities registers are not changed
> > > in Guest because vmd is passthrough.
> > > >
> > > > In the guest OS, this still overrides whatever was negotiated via
> > > > _OSC, but presumably the guest BIOS *does* know about the Root
> > > > Ports, so I don't think the same principle about overriding _OSC
> > > > applies there.
> > > >
> > > > I think it's still a problem that we handle
> > > > host_bridge->native_pcie_hotplug differently than all the other
> > > > features negotiated via _OSC.  We should at least explain this
> > > > somehow.
> > > Right now this is the only way to know in Guest OS if platform has
> > > enabled Hotplug or not. We have many customers complaining about
> > > Hotplug being broken in Guest. So just trying to honor _OSC but also
> > > fixing its limitation in Guest.
> > >
> > > Thanks
> > > nirmal.
> > >
> > > > I sincerely apologize for late responses. I just found out that my
> > > > evolution mail client is automatically sending linux-pci emails to
> > > > junk
> > > > since January 2024.
> > > >
> > > > At the moment Hotplug is an exception for us, but I do share your
> > > > concern about other flags. We have done lot of testing and so far
> > > > patch
> > > > v2 is the best solution we have.
> > > >
> > > > Thanks
> > > > nirmal
> > > > > Bjorn
> > >
>
Nirmal Patel March 22, 2024, 8:57 p.m. UTC | #26
On Fri, 15 Mar 2024 09:29:32 +0800
Kai-Heng Feng <kai.heng.feng@canonical.com> wrote:

> On Fri, Mar 8, 2024 at 8:09 AM Nirmal Patel
> <nirmal.patel@linux.intel.com> wrote:
> >
> > On Thu, 7 Mar 2024 14:44:23 +0800
> > Kai-Heng Feng <kai.heng.feng@canonical.com> wrote:
> >  
> > > Hi Nirmal,
> > >
> > > On Thu, Mar 7, 2024 at 6:20 AM Nirmal Patel
> > > <nirmal.patel@linux.intel.com> wrote:  
> > > >
> > > > On Tue, 2024-02-13 at 10:47 -0700, Nirmal Patel wrote:  
> > > > > On Wed, 2024-02-07 at 12:55 -0600, Bjorn Helgaas wrote:  
> > > > > > On Tue, Feb 06, 2024 at 05:27:29PM -0700, Nirmal Patel
> > > > > > wrote:  
> > > > > > > ...
> > > > > > > Did you have a chance to look at my response on January
> > > > > > > 16th to your
> > > > > > > questions? I tried to summarize all of the potential
> > > > > > > problems and issues with different fixes. Please let me
> > > > > > > know if it is easier if
> > > > > > > I
> > > > > > > resend the explanation. Thanks.  
> > > > > >
> > > > > > I did see your Jan 16 response, thanks.
> > > > > >
> > > > > > I had more questions after reading it, but they're more
> > > > > > about understanding the topology seen by the host and the
> > > > > > guest: Jan 16:
> > > > > > https://lore.kernel.org/r/20240117004933.GA108810@bhelgaas
> > > > > >   Feb  1:
> > > > > > https://lore.kernel.org/r/20240201211620.GA650432@bhelgaas
> > > > > >
> > > > > > As I mentioned in my second Feb 1 response
> > > > > > (https://lore.kernel.org/r/20240201222245.GA650725@bhelgaas),
> > > > > > the usual plan envisioned by the PCI Firmware spec is that
> > > > > > an OS can use
> > > > > > a
> > > > > > PCIe feature if the platform has granted the OS ownership
> > > > > > via _OSC and
> > > > > > a device advertises the feature via a Capability in config
> > > > > > space.
> > > > > >
> > > > > > My main concern with the v2 patch
> > > > > > (
> > > > > > https://lore.kernel.org/r/20231127211729.42668-1-nirmal.patel@linux.intel.com
> > > > > > )
> > > > > > is that it overrides _OSC for native_pcie_hotplug, but not
> > > > > > for any of
> > > > > > the other features (AER, PME, LTR, DPC, etc.)
> > > > > >
> > > > > > I think it's hard to read the specs and conclude that PCIe
> > > > > > hotplug is
> > > > > > a special case, and I think we're likely to have similar
> > > > > > issues with
> > > > > > other features in the future.
> > > > > >
> > > > > > But if you think this is the best solution, I'm OK with
> > > > > > merging it.  
> > > > Hi Bjorn,
> > > >
> > > > We tried some other ways to pass the information about all of
> > > > the flags but I couldn't retrieve it in guest OS and VMD
> > > > hardware doesn't have empty registers to write this
> > > > information. So as of now we have this solution which only
> > > > overwrites Hotplug flag. If you can accept it that would be
> > > > great.  
> > >
> > > My original commit "PCI: vmd: Honor ACPI _OSC on PCIe features"
> > > was a logically conclusion based on VMD ports are just apertures.
> > > Apparently there are more nuances than that, and people outside of
> > > Intel can't possibly know. And yes VMD creates lots of headaches
> > > during hardware enablement.
> > >
> > > So is it possible to document the right way to use _OSC, as Bjorn
> > > suggested?
> > >
> > > Kai-Heng  
> > Well we are stuck here with two issues which are kind of chicken
> > and egg problem.
> > 1) VMD respects _OSC; which works excellent in Host but _OSC gives
> > wrong information in Guest OS which ends up disabling Hotplug, AER,
> > DPC, etc. We can live with AER, DPC disabled in VM but not the
> > Hotplug.
> >
> > 2) VMD takes ownership of all the flags. For this to work VMD needs
> > to know these settings from BIOS. VMD hardware doesn't have empty
> > register where VMD UEFI driver can write this information. So
> > honoring user settings for AER, Hotplug, etc from BIOS is not
> > possible.
> >
> > Where do you want me to add documentation? Will more information in
> > the comment section of the Sltcap code change help?
> >
> > Architecture wise VMD must own all of these flags but we have a
> > hardware limitation to successfully pass the necessary information
> > to the Host OS and the Guest OS.  
> 
> If there's an official document on intel.com, it can make many things
> clearer and easier.
> States what VMD does and what VMD expect OS to do can be really
> helpful. Basically put what you wrote in an official document.
> 
> Kai-Heng

Hi Kai-Heng/Bjorn,
Thanks for the suggestion. I can certainly find the official VMD
architecture document and add the required information to
Documentation/PCI/controller/vmd.rst. Will that be okay?

I also need some help/suggestions from you on the following alternate
solution. We have been looking at the VMD HW registers to find some
empty registers. The Cache Line Size register at offset 0Ch is not being
used by VMD. This is the explanation in PCI spec 5.0, section 7.5.1.1.7:
"This read-write register is implemented for legacy compatibility
purposes but has no effect on any PCI Express device behavior."
Can this register be used for passing _OSC settings from the BIOS to the
VMD OS driver?

These 8 bits are more than enough for the UEFI VMD driver to store all
_OSC flags, and the VMD OS driver can read them during OS boot. This
would solve all of our issues.
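
A back-of-the-envelope sketch of how the OS side of that could look (the
bit layout and the helper name are hypothetical; PCI_CACHE_LINE_SIZE is
just the config offset 0Ch mentioned above):

#include <linux/bits.h>
#include <linux/pci.h>

/* Hypothetical layout the UEFI VMD driver would write at offset 0Ch. */
#define VMD_FW_OSC_HOTPLUG	BIT(0)
#define VMD_FW_OSC_AER		BIT(1)
#define VMD_FW_OSC_PME		BIT(2)
#define VMD_FW_OSC_LTR		BIT(3)
#define VMD_FW_OSC_DPC		BIT(4)

/*
 * Read back, e.g. from vmd_probe(), whatever firmware stashed in the
 * VMD endpoint's Cache Line Size register.
 */
static u8 vmd_read_fw_osc_flags(struct pci_dev *vmd_dev)
{
	u8 flags;

	pci_read_config_byte(vmd_dev, PCI_CACHE_LINE_SIZE, &flags);
	return flags;
}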

Thanks
nirmal

> 
> >
> > Thanks
> > nirmal  
> > >  
> > > > > In the host OS, this overrides whatever was negotiated via
> > > > > _OSC, I guess on the principle that _OSC doesn't apply
> > > > > because the host BIOS doesn't know about the Root Ports below
> > > > > the VMD.  (I'm not sure it's 100% resolved that _OSC doesn't
> > > > > apply to them, so we should mention that assumption here.)  
> > > > _OSC still controls every flag including Hotplug. I have
> > > > observed that slot capabilities register has hotplug enabled
> > > > only when platform has enabled the hotplug. So technically not
> > > > overriding it in the host. It overrides in guest because _OSC
> > > > is passing wrong/different information than the _OSC
> > > > information in Host. Also like I mentioned, slot capabilities
> > > > registers are not changed in Guest because vmd is passthrough.  
> > > > >
> > > > > In the guest OS, this still overrides whatever was negotiated
> > > > > via _OSC, but presumably the guest BIOS *does* know about the
> > > > > Root Ports, so I don't think the same principle about
> > > > > overriding _OSC applies there.
> > > > >
> > > > > I think it's still a problem that we handle
> > > > > host_bridge->native_pcie_hotplug differently than all the
> > > > > other features negotiated via _OSC.  We should at least
> > > > > explain this somehow.  
> > > > Right now this is the only way to know in Guest OS if platform
> > > > has enabled Hotplug or not. We have many customers complaining
> > > > about Hotplug being broken in Guest. So just trying to honor
> > > > _OSC but also fixing its limitation in Guest.
> > > >
> > > > Thanks
> > > > nirmal.
> > > >  
> > > > > I sincerely apologize for late responses. I just found out
> > > > > that my evolution mail client is automatically sending
> > > > > linux-pci emails to junk
> > > > > since January 2024.
> > > > >
> > > > > At the moment Hotplug is an exception for us, but I do share
> > > > > your concern about other flags. We have done lot of testing
> > > > > and so far patch
> > > > > v2 is the best solution we have.
> > > > >
> > > > > Thanks
> > > > > nirmal  
> > > > > > Bjorn  
> > > >  
> >
Bjorn Helgaas March 22, 2024, 9:37 p.m. UTC | #27
On Fri, Mar 22, 2024 at 01:57:00PM -0700, Nirmal Patel wrote:
> On Fri, 15 Mar 2024 09:29:32 +0800
> Kai-Heng Feng <kai.heng.feng@canonical.com> wrote:
> ...

> > If there's an official document on intel.com, it can make many things
> > clearer and easier.
> > States what VMD does and what VMD expect OS to do can be really
> > helpful. Basically put what you wrote in an official document.
> 
> Thanks for the suggestion. I can certainly find official VMD
> architecture document and add that required information to
> Documentation/PCI/controller/vmd.rst. Will that be okay?

I'd definitely be interested in whatever you can add to illuminate
these issues.

> I also need your some help/suggestion on following alternate solution.
> We have been looking at VMD HW registers to find some empty registers.
> Cache Line Size register offset OCh is not being used by VMD. This is
> the explanation in PCI spec 5.0 section 7.5.1.1.7:
> "This read-write register is implemented for legacy compatibility
> purposes but has no effect on any PCI Express device behavior."
> Can these registers be used for passing _OSC settings from BIOS to VMD
> OS driver?
> 
> These 8 bits are more than enough for UEFI VMD driver to store all _OSC
> flags and VMD OS driver can read it during OS boot up. This will solve
> all of our issues.

Interesting idea.  I think you'd have to do some work to separate out
the conventional PCI devices, where PCI_CACHE_LINE_SIZE is still
relevant, to make sure nothing breaks.  But I think we overwrite it in
some cases even for PCIe devices where it's pointless, and it would be
nice to clean that up.
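
A toy example of the kind of generic write being referred to (a
simplified sketch, not actual kernel code), showing how easily a value
stashed there by firmware would be clobbered:

/*
 * Generic setup code may program Cache Line Size for any device,
 * overwriting whatever firmware left at offset 0Ch.
 */
static void generic_cacheline_setup(struct pci_dev *dev)
{
	/* The register is in units of 32-bit dwords, e.g. 64 bytes -> 16. */
	pci_write_config_byte(dev, PCI_CACHE_LINE_SIZE, 64 >> 2);
}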

Bjorn
Paul M Stillwell Jr March 22, 2024, 10:43 p.m. UTC | #28
On 3/22/2024 2:37 PM, Bjorn Helgaas wrote:
> On Fri, Mar 22, 2024 at 01:57:00PM -0700, Nirmal Patel wrote:
>> On Fri, 15 Mar 2024 09:29:32 +0800
>> Kai-Heng Feng <kai.heng.feng@canonical.com> wrote:
>> ...
> 
>>> If there's an official document on intel.com, it can make many things
>>> clearer and easier.
>>> States what VMD does and what VMD expect OS to do can be really
>>> helpful. Basically put what you wrote in an official document.
>>
>> Thanks for the suggestion. I can certainly find official VMD
>> architecture document and add that required information to
>> Documentation/PCI/controller/vmd.rst. Will that be okay?
> 
> I'd definitely be interested in whatever you can add to illuminate
> these issues.
> 
>> I also need your some help/suggestion on following alternate solution.
>> We have been looking at VMD HW registers to find some empty registers.
>> Cache Line Size register offset OCh is not being used by VMD. This is
>> the explanation in PCI spec 5.0 section 7.5.1.1.7:
>> "This read-write register is implemented for legacy compatibility
>> purposes but has no effect on any PCI Express device behavior."
>> Can these registers be used for passing _OSC settings from BIOS to VMD
>> OS driver?
>>
>> These 8 bits are more than enough for UEFI VMD driver to store all _OSC
>> flags and VMD OS driver can read it during OS boot up. This will solve
>> all of our issues.
> 
> Interesting idea.  I think you'd have to do some work to separate out
> the conventional PCI devices, where PCI_CACHE_LINE_SIZE is still
> relevant, to make sure nothing breaks.  But I think we overwrite it in
> some cases even for PCIe devices where it's pointless, and it would be
> nice to clean that up.
> 

I think the suggestion here is to use the VMD device's Cache Line Size
register, not the other PCI devices'. In that case we don't have to worry
about conventional PCI devices because we aren't touching them.

Paul
Bjorn Helgaas March 22, 2024, 11:36 p.m. UTC | #29
On Fri, Mar 22, 2024 at 03:43:27PM -0700, Paul M Stillwell Jr wrote:
> On 3/22/2024 2:37 PM, Bjorn Helgaas wrote:
> > On Fri, Mar 22, 2024 at 01:57:00PM -0700, Nirmal Patel wrote:
> > > On Fri, 15 Mar 2024 09:29:32 +0800
> > > Kai-Heng Feng <kai.heng.feng@canonical.com> wrote:
> > > ...
> > 
> > > > If there's an official document on intel.com, it can make many things
> > > > clearer and easier.
> > > > States what VMD does and what VMD expect OS to do can be really
> > > > helpful. Basically put what you wrote in an official document.
> > > 
> > > Thanks for the suggestion. I can certainly find official VMD
> > > architecture document and add that required information to
> > > Documentation/PCI/controller/vmd.rst. Will that be okay?
> > 
> > I'd definitely be interested in whatever you can add to illuminate
> > these issues.
> > 
> > > I also need your some help/suggestion on following alternate solution.
> > > We have been looking at VMD HW registers to find some empty registers.
> > > Cache Line Size register offset OCh is not being used by VMD. This is
> > > the explanation in PCI spec 5.0 section 7.5.1.1.7:
> > > "This read-write register is implemented for legacy compatibility
> > > purposes but has no effect on any PCI Express device behavior."
> > > Can these registers be used for passing _OSC settings from BIOS to VMD
> > > OS driver?
> > > 
> > > These 8 bits are more than enough for UEFI VMD driver to store all _OSC
> > > flags and VMD OS driver can read it during OS boot up. This will solve
> > > all of our issues.
> > 
> > Interesting idea.  I think you'd have to do some work to separate out
> > the conventional PCI devices, where PCI_CACHE_LINE_SIZE is still
> > relevant, to make sure nothing breaks.  But I think we overwrite it in
> > some cases even for PCIe devices where it's pointless, and it would be
> > nice to clean that up.
> 
> I think the suggestion here is to use the VMD devices Cache Line Size
> register, not the other PCI devices. In that case we don't have to worry
> about conventional PCI devices because we aren't touching them.

Yes, but there is generic code that writes PCI_CACHE_LINE_SIZE for
every device in some cases.  If we wrote the VMD PCI_CACHE_LINE_SIZE,
it would obliterate whatever you want to pass there.

Bjorn
Paul M Stillwell Jr March 25, 2024, 3:10 p.m. UTC | #30
On 3/22/2024 4:36 PM, Bjorn Helgaas wrote:
> On Fri, Mar 22, 2024 at 03:43:27PM -0700, Paul M Stillwell Jr wrote:
>> On 3/22/2024 2:37 PM, Bjorn Helgaas wrote:
>>> On Fri, Mar 22, 2024 at 01:57:00PM -0700, Nirmal Patel wrote:
>>>> On Fri, 15 Mar 2024 09:29:32 +0800
>>>> Kai-Heng Feng <kai.heng.feng@canonical.com> wrote:
>>>> ...
>>>
>>>>> If there's an official document on intel.com, it can make many things
>>>>> clearer and easier.
>>>>> States what VMD does and what VMD expect OS to do can be really
>>>>> helpful. Basically put what you wrote in an official document.
>>>>
>>>> Thanks for the suggestion. I can certainly find official VMD
>>>> architecture document and add that required information to
>>>> Documentation/PCI/controller/vmd.rst. Will that be okay?
>>>
>>> I'd definitely be interested in whatever you can add to illuminate
>>> these issues.
>>>
>>>> I also need your some help/suggestion on following alternate solution.
>>>> We have been looking at VMD HW registers to find some empty registers.
>>>> Cache Line Size register offset OCh is not being used by VMD. This is
>>>> the explanation in PCI spec 5.0 section 7.5.1.1.7:
>>>> "This read-write register is implemented for legacy compatibility
>>>> purposes but has no effect on any PCI Express device behavior."
>>>> Can these registers be used for passing _OSC settings from BIOS to VMD
>>>> OS driver?
>>>>
>>>> These 8 bits are more than enough for UEFI VMD driver to store all _OSC
>>>> flags and VMD OS driver can read it during OS boot up. This will solve
>>>> all of our issues.
>>>
>>> Interesting idea.  I think you'd have to do some work to separate out
>>> the conventional PCI devices, where PCI_CACHE_LINE_SIZE is still
>>> relevant, to make sure nothing breaks.  But I think we overwrite it in
>>> some cases even for PCIe devices where it's pointless, and it would be
>>> nice to clean that up.
>>
>> I think the suggestion here is to use the VMD devices Cache Line Size
>> register, not the other PCI devices. In that case we don't have to worry
>> about conventional PCI devices because we aren't touching them.
> 
> Yes, but there is generic code that writes PCI_CACHE_LINE_SIZE for
> every device in some cases.  If we wrote the VMD PCI_CACHE_LINE_SIZE,
> it would obliterate whatever you want to pass there.
>

Ah, got it. Thanks for clarifying.

Paul
Nirmal Patel March 26, 2024, 12:17 a.m. UTC | #31
On Fri, 22 Mar 2024 18:36:37 -0500
Bjorn Helgaas <helgaas@kernel.org> wrote:

> On Fri, Mar 22, 2024 at 03:43:27PM -0700, Paul M Stillwell Jr wrote:
> > On 3/22/2024 2:37 PM, Bjorn Helgaas wrote:  
> > > On Fri, Mar 22, 2024 at 01:57:00PM -0700, Nirmal Patel wrote:  
> > > > On Fri, 15 Mar 2024 09:29:32 +0800
> > > > Kai-Heng Feng <kai.heng.feng@canonical.com> wrote:
> > > > ...  
> > >   
> > > > > If there's an official document on intel.com, it can make
> > > > > many things clearer and easier.
> > > > > States what VMD does and what VMD expect OS to do can be
> > > > > really helpful. Basically put what you wrote in an official
> > > > > document.  
> > > > 
> > > > Thanks for the suggestion. I can certainly find official VMD
> > > > architecture document and add that required information to
> > > > Documentation/PCI/controller/vmd.rst. Will that be okay?  
> > > 
> > > I'd definitely be interested in whatever you can add to illuminate
> > > these issues.
> > >   
> > > > I also need your some help/suggestion on following alternate
> > > > solution. We have been looking at VMD HW registers to find some
> > > > empty registers. Cache Line Size register offset OCh is not
> > > > being used by VMD. This is the explanation in PCI spec 5.0
> > > > section 7.5.1.1.7: "This read-write register is implemented for
> > > > legacy compatibility purposes but has no effect on any PCI
> > > > Express device behavior." Can these registers be used for
> > > > passing _OSC settings from BIOS to VMD OS driver?
> > > > 
> > > > These 8 bits are more than enough for UEFI VMD driver to store
> > > > all _OSC flags and VMD OS driver can read it during OS boot up.
> > > > This will solve all of our issues.  
> > > 
> > > Interesting idea.  I think you'd have to do some work to separate
> > > out the conventional PCI devices, where PCI_CACHE_LINE_SIZE is
> > > still relevant, to make sure nothing breaks.  But I think we
> > > overwrite it in some cases even for PCIe devices where it's
> > > pointless, and it would be nice to clean that up.  
> > 
> > I think the suggestion here is to use the VMD devices Cache Line
> > Size register, not the other PCI devices. In that case we don't
> > have to worry about conventional PCI devices because we aren't
> > touching them.  
> 
> Yes, but there is generic code that writes PCI_CACHE_LINE_SIZE for
> every device in some cases.  If we wrote the VMD PCI_CACHE_LINE_SIZE,
> it would obliterate whatever you want to pass there.
> 
> Bjorn
Our initial testing showed no change in value from overwrites by the PCI
core. The registers didn't change in the Host or the Guest OS when some
data was written from the BIOS. I will perform more testing and also make
sure to write every register, just in case.

Thanks for the response.

-nirmal
Kai-Heng Feng March 26, 2024, 1:59 a.m. UTC | #32
On Tue, Mar 26, 2024 at 8:17 AM Nirmal Patel
<nirmal.patel@linux.intel.com> wrote:
>
> On Fri, 22 Mar 2024 18:36:37 -0500
> Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> > On Fri, Mar 22, 2024 at 03:43:27PM -0700, Paul M Stillwell Jr wrote:
> > > On 3/22/2024 2:37 PM, Bjorn Helgaas wrote:
> > > > On Fri, Mar 22, 2024 at 01:57:00PM -0700, Nirmal Patel wrote:
> > > > > On Fri, 15 Mar 2024 09:29:32 +0800
> > > > > Kai-Heng Feng <kai.heng.feng@canonical.com> wrote:
> > > > > ...
> > > >
> > > > > > If there's an official document on intel.com, it can make
> > > > > > many things clearer and easier.
> > > > > > States what VMD does and what VMD expect OS to do can be
> > > > > > really helpful. Basically put what you wrote in an official
> > > > > > document.
> > > > >
> > > > > Thanks for the suggestion. I can certainly find official VMD
> > > > > architecture document and add that required information to
> > > > > Documentation/PCI/controller/vmd.rst. Will that be okay?
> > > >
> > > > I'd definitely be interested in whatever you can add to illuminate
> > > > these issues.
> > > >
> > > > > I also need your some help/suggestion on following alternate
> > > > > solution. We have been looking at VMD HW registers to find some
> > > > > empty registers. Cache Line Size register offset OCh is not
> > > > > being used by VMD. This is the explanation in PCI spec 5.0
> > > > > section 7.5.1.1.7: "This read-write register is implemented for
> > > > > legacy compatibility purposes but has no effect on any PCI
> > > > > Express device behavior." Can these registers be used for
> > > > > passing _OSC settings from BIOS to VMD OS driver?
> > > > >
> > > > > These 8 bits are more than enough for UEFI VMD driver to store
> > > > > all _OSC flags and VMD OS driver can read it during OS boot up.
> > > > > This will solve all of our issues.
> > > >
> > > > Interesting idea.  I think you'd have to do some work to separate
> > > > out the conventional PCI devices, where PCI_CACHE_LINE_SIZE is
> > > > still relevant, to make sure nothing breaks.  But I think we
> > > > overwrite it in some cases even for PCIe devices where it's
> > > > pointless, and it would be nice to clean that up.
> > >
> > > I think the suggestion here is to use the VMD devices Cache Line
> > > Size register, not the other PCI devices. In that case we don't
> > > have to worry about conventional PCI devices because we aren't
> > > touching them.
> >
> > Yes, but there is generic code that writes PCI_CACHE_LINE_SIZE for
> > every device in some cases.  If we wrote the VMD PCI_CACHE_LINE_SIZE,
> > it would obliterate whatever you want to pass there.
> >
> > Bjorn
> Our initial testing showed no change in value by overwrite from pci. The
> registers didn't change in Host as well as in Guest OS when some data
> was written from BIOS. I will perform more testing and also make sure
> to write every register just in case.

If the VMD hardware is designed in this way and there's an official
document that states "VMD ports should follow _OSC except for
hotplugging", then IMO there's no need to find an alternative.

Kai-Heng

> Thanks for the response.
>
> -nirmal
>
Paul M Stillwell Jr March 26, 2024, 3:51 p.m. UTC | #33
On 3/25/2024 6:59 PM, Kai-Heng Feng wrote:
> On Tue, Mar 26, 2024 at 8:17 AM Nirmal Patel 
> <nirmal.patel@linux.intel.com> wrote:
>> 
>> On Fri, 22 Mar 2024 18:36:37 -0500 Bjorn Helgaas
>> <helgaas@kernel.org> wrote:
>> 
>>> On Fri, Mar 22, 2024 at 03:43:27PM -0700, Paul M Stillwell Jr
>>> wrote:
>>>> On 3/22/2024 2:37 PM, Bjorn Helgaas wrote:
>>>>> On Fri, Mar 22, 2024 at 01:57:00PM -0700, Nirmal Patel
>>>>> wrote:
>>>>>> On Fri, 15 Mar 2024 09:29:32 +0800 Kai-Heng Feng
>>>>>> <kai.heng.feng@canonical.com> wrote: ...
>>>>> 
>>>>>>> If there's an official document on intel.com, it can
>>>>>>> make many things clearer and easier. States what VMD does
>>>>>>> and what VMD expect OS to do can be really helpful.
>>>>>>> Basically put what you wrote in an official document.
>>>>>> 
>>>>>> Thanks for the suggestion. I can certainly find official
>>>>>> VMD architecture document and add that required information
>>>>>> to Documentation/PCI/controller/vmd.rst. Will that be
>>>>>> okay?
>>>>> 
>>>>> I'd definitely be interested in whatever you can add to
>>>>> illuminate these issues.
>>>>> 
>>>>>> I also need your some help/suggestion on following
>>>>>> alternate solution. We have been looking at VMD HW
>>>>>> registers to find some empty registers. Cache Line Size
>>>>>> register offset OCh is not being used by VMD. This is the
>>>>>> explanation in PCI spec 5.0 section 7.5.1.1.7: "This
>>>>>> read-write register is implemented for legacy compatibility
>>>>>> purposes but has no effect on any PCI Express device
>>>>>> behavior." Can these registers be used for passing _OSC
>>>>>> settings from BIOS to VMD OS driver?
>>>>>> 
>>>>>> These 8 bits are more than enough for UEFI VMD driver to
>>>>>> store all _OSC flags and VMD OS driver can read it during
>>>>>> OS boot up. This will solve all of our issues.
>>>>> 
>>>>> Interesting idea.  I think you'd have to do some work to
>>>>> separate out the conventional PCI devices, where
>>>>> PCI_CACHE_LINE_SIZE is still relevant, to make sure nothing
>>>>> breaks.  But I think we overwrite it in some cases even for
>>>>> PCIe devices where it's pointless, and it would be nice to
>>>>> clean that up.
>>>> 
>>>> I think the suggestion here is to use the VMD devices Cache
>>>> Line Size register, not the other PCI devices. In that case we
>>>> don't have to worry about conventional PCI devices because we
>>>> aren't touching them.
>>> 
>>> Yes, but there is generic code that writes PCI_CACHE_LINE_SIZE
>>> for every device in some cases.  If we wrote the VMD
>>> PCI_CACHE_LINE_SIZE, it would obliterate whatever you want to
>>> pass there.
>>> 
>>> Bjorn
>> Our initial testing showed no change in value by overwrite from
>> pci. The registers didn't change in Host as well as in Guest OS
>> when some data was written from BIOS. I will perform more testing
>> and also make sure to write every register just in case.
> 
> If the VMD hardware is designed in this way and there's an official 
> document states that "VMD ports should follow _OSC expect for 
> hotplugging" then IMO there's no need to find alternative.
> 

There isn't any official documentation for this; it's just the way VMD
is designed :). VMD assumes that everything behind it is hotpluggable, so
the bits don't *really* matter. In other words, VMD supports hotplug and
you can't turn that off, so hotplug is always on regardless of what the
bits say.

I believe prior to 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe
features") VMD ignored the flags (which was OK, but led to the issue that
the patch corrected). I think that patch is fine except for the hotplug
bits, because the hypervisor BIOS isn't filling them in correctly (if I
understand the problem correctly). As I mentioned earlier, the VMD design
is such that VMD is going to handle all the hotplug events, so the bits
in the root bridge for hotplug are irrelevant WRT VMD.

Paul

> Kai-Heng
> 
>> Thanks for the response.
>> 
>> -nirmal
>>
Paul M Stillwell Jr March 26, 2024, 4:03 p.m. UTC | #34
On 3/26/2024 8:51 AM, Paul M Stillwell Jr wrote:
> On 3/25/2024 6:59 PM, Kai-Heng Feng wrote:
>> On Tue, Mar 26, 2024 at 8:17 AM Nirmal Patel 
>> <nirmal.patel@linux.intel.com> wrote:
>>>
>>> On Fri, 22 Mar 2024 18:36:37 -0500 Bjorn Helgaas
>>> <helgaas@kernel.org> wrote:
>>>
>>>> On Fri, Mar 22, 2024 at 03:43:27PM -0700, Paul M Stillwell Jr
>>>> wrote:
>>>>> On 3/22/2024 2:37 PM, Bjorn Helgaas wrote:
>>>>>> On Fri, Mar 22, 2024 at 01:57:00PM -0700, Nirmal Patel
>>>>>> wrote:
>>>>>>> On Fri, 15 Mar 2024 09:29:32 +0800 Kai-Heng Feng
>>>>>>> <kai.heng.feng@canonical.com> wrote: ...
>>>>>>
>>>>>>>> If there's an official document on intel.com, it can
>>>>>>>> make many things clearer and easier. States what VMD does
>>>>>>>> and what VMD expect OS to do can be really helpful.
>>>>>>>> Basically put what you wrote in an official document.
>>>>>>>
>>>>>>> Thanks for the suggestion. I can certainly find official
>>>>>>> VMD architecture document and add that required information
>>>>>>> to Documentation/PCI/controller/vmd.rst. Will that be
>>>>>>> okay?
>>>>>>
>>>>>> I'd definitely be interested in whatever you can add to
>>>>>> illuminate these issues.
>>>>>>
>>>>>>> I also need your some help/suggestion on following
>>>>>>> alternate solution. We have been looking at VMD HW
>>>>>>> registers to find some empty registers. Cache Line Size
>>>>>>> register offset OCh is not being used by VMD. This is the
>>>>>>> explanation in PCI spec 5.0 section 7.5.1.1.7: "This
>>>>>>> read-write register is implemented for legacy compatibility
>>>>>>> purposes but has no effect on any PCI Express device
>>>>>>> behavior." Can these registers be used for passing _OSC
>>>>>>> settings from BIOS to VMD OS driver?
>>>>>>>
>>>>>>> These 8 bits are more than enough for UEFI VMD driver to
>>>>>>> store all _OSC flags and VMD OS driver can read it during
>>>>>>> OS boot up. This will solve all of our issues.
>>>>>>
>>>>>> Interesting idea.  I think you'd have to do some work to
>>>>>> separate out the conventional PCI devices, where
>>>>>> PCI_CACHE_LINE_SIZE is still relevant, to make sure nothing
>>>>>> breaks.  But I think we overwrite it in some cases even for
>>>>>> PCIe devices where it's pointless, and it would be nice to
>>>>>> clean that up.
>>>>>
>>>>> I think the suggestion here is to use the VMD devices Cache
>>>>> Line Size register, not the other PCI devices. In that case we
>>>>> don't have to worry about conventional PCI devices because we
>>>>> aren't touching them.
>>>>
>>>> Yes, but there is generic code that writes PCI_CACHE_LINE_SIZE
>>>> for every device in some cases.  If we wrote the VMD
>>>> PCI_CACHE_LINE_SIZE, it would obliterate whatever you want to
>>>> pass there.
>>>>
>>>> Bjorn
>>> Our initial testing showed no change in value by overwrite from
>>> pci. The registers didn't change in Host as well as in Guest OS
>>> when some data was written from BIOS. I will perform more testing
>>> and also make sure to write every register just in case.
>>
>> If the VMD hardware is designed in this way and there's an official 
>> document states that "VMD ports should follow _OSC expect for 
>> hotplugging" then IMO there's no need to find alternative.
>>
> 
> There isn't any official documentation for this, it's just the way VMD
> is designed :). VMD assumes that everything behind it is hotpluggable so
> the bits don't *really* matter. In other words, VMD supports hotplug and
> you can't turn that off so hotplug is always on regardless of what the
> bits say.
> 
> I believe prior to 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe
> features") VMD ignored the flags (which was ok, but led to the issue
> that the patch corrected). I think that patch is fine except for the 
> hotplug bits because the hypervisor BIOS isn't filling them in correctly 
> (if I understand the problem correctly). As I mentioned earlier the VMD 
> design is such that VMD is going to handle all the hotplug events so the 
> bits in the root bridge for hotplug are irrelevant WRT VMD.
> 

I should have been clearer in one aspect of this response: the
hypervisor BIOS sets all the _OSC bits to zero, not just the hotplug
bits. It's just that the hotplug bits cause us issues when VMD is being
used in a VM, because they disable hotplug and we need it to be enabled:
our customers expect to always be able to hotplug devices owned by VMD.

> Paul
> 
>> Kai-Heng
>>
>>> Thanks for the response.
>>>
>>> -nirmal
>>>
> 
>
Bjorn Helgaas March 26, 2024, 9:08 p.m. UTC | #35
On Tue, Mar 26, 2024 at 08:51:45AM -0700, Paul M Stillwell Jr wrote:
> On 3/25/2024 6:59 PM, Kai-Heng Feng wrote:
> > On Tue, Mar 26, 2024 at 8:17 AM Nirmal Patel
> > <nirmal.patel@linux.intel.com> wrote:
> > > 
> > > On Fri, 22 Mar 2024 18:36:37 -0500 Bjorn Helgaas
> > > <helgaas@kernel.org> wrote:
> > > 
> > > > On Fri, Mar 22, 2024 at 03:43:27PM -0700, Paul M Stillwell Jr
> > > > wrote:
> > > > > On 3/22/2024 2:37 PM, Bjorn Helgaas wrote:
> > > > > > On Fri, Mar 22, 2024 at 01:57:00PM -0700, Nirmal Patel
> > > > > > wrote:
> > > > > > > On Fri, 15 Mar 2024 09:29:32 +0800 Kai-Heng Feng
> > > > > > > <kai.heng.feng@canonical.com> wrote: ...
> > > > > > 
> > > > > > > > If there's an official document on intel.com, it can
> > > > > > > > make many things clearer and easier. States what VMD does
> > > > > > > > and what VMD expect OS to do can be really helpful.
> > > > > > > > Basically put what you wrote in an official document.
> > > > > > > 
> > > > > > > Thanks for the suggestion. I can certainly find official
> > > > > > > VMD architecture document and add that required information
> > > > > > > to Documentation/PCI/controller/vmd.rst. Will that be
> > > > > > > okay?
> > > > > > 
> > > > > > I'd definitely be interested in whatever you can add to
> > > > > > illuminate these issues.
> > > > > > 
> > > > > > > I also need your some help/suggestion on following
> > > > > > > alternate solution. We have been looking at VMD HW
> > > > > > > registers to find some empty registers. Cache Line Size
> > > > > > > register offset OCh is not being used by VMD. This is the
> > > > > > > explanation in PCI spec 5.0 section 7.5.1.1.7: "This
> > > > > > > read-write register is implemented for legacy compatibility
> > > > > > > purposes but has no effect on any PCI Express device
> > > > > > > behavior." Can these registers be used for passing _OSC
> > > > > > > settings from BIOS to VMD OS driver?
> > > > > > > 
> > > > > > > These 8 bits are more than enough for UEFI VMD driver to
> > > > > > > store all _OSC flags and VMD OS driver can read it during
> > > > > > > OS boot up. This will solve all of our issues.
> > > > > > 
> > > > > > Interesting idea.  I think you'd have to do some work to
> > > > > > separate out the conventional PCI devices, where
> > > > > > PCI_CACHE_LINE_SIZE is still relevant, to make sure nothing
> > > > > > breaks.  But I think we overwrite it in some cases even for
> > > > > > PCIe devices where it's pointless, and it would be nice to
> > > > > > clean that up.
> > > > > 
> > > > > I think the suggestion here is to use the VMD devices Cache
> > > > > Line Size register, not the other PCI devices. In that case we
> > > > > don't have to worry about conventional PCI devices because we
> > > > > aren't touching them.
> > > > 
> > > > Yes, but there is generic code that writes PCI_CACHE_LINE_SIZE
> > > > for every device in some cases.  If we wrote the VMD
> > > > PCI_CACHE_LINE_SIZE, it would obliterate whatever you want to
> > > > pass there.
> > > > 
> > > Our initial testing showed no change in value by overwrite from
> > > pci. The registers didn't change in Host as well as in Guest OS
> > > when some data was written from BIOS. I will perform more testing
> > > and also make sure to write every register just in case.
> > 
> > If the VMD hardware is designed in this way and there's an official
> > document states that "VMD ports should follow _OSC expect for
> > hotplugging" then IMO there's no need to find alternative.
> 
> There isn't any official documentation for this, it's just the way VMD
> is designed :). VMD assumes that everything behind it is hotpluggable so
> the bits don't *really* matter. In other words, VMD supports hotplug and
> you can't turn that off so hotplug is always on regardless of what the
> bits say.

This whole discussion is about who *owns* the feature, not whether the
hardware supports the feature.

The general rule is "if _OSC covers the feature, the platform owns the
feature and the OS shouldn't touch it unless the OS requests and is
granted control of it."
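
Concretely, that rule is what the host bridge's native_* flags record.
Roughly (a from-memory sketch of what acpi_pci_root_create() does, not
the exact code):

	/* After _OSC negotiation the ACPI core records what the OS was NOT
	 * granted by clearing the corresponding native_* flag on the host
	 * bridge; pciehp, AER, etc. check these flags before touching the
	 * hardware.
	 */
	if (!(root->osc_control_set & OSC_PCI_EXPRESS_NATIVE_HP_CONTROL))
		host_bridge->native_pcie_hotplug = 0;
	if (!(root->osc_control_set & OSC_PCI_SHPC_NATIVE_HP_CONTROL))
		host_bridge->native_shpc_hotplug = 0;
	if (!(root->osc_control_set & OSC_PCI_EXPRESS_AER_CONTROL))
		host_bridge->native_aer = 0;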

VMD wants special treatment, and we'd like a simple description of
what that treatment is.  Right now it sounds like you want the OS to
own *hotplug* below VMD regardless of what _OSC says.

But I still have no idea about other features like AER, etc., so we're
kind of in no man's land about how to handle them.

Bjorn
Paul M Stillwell Jr April 2, 2024, 4:10 p.m. UTC | #36
On 3/26/2024 2:08 PM, Bjorn Helgaas wrote:
> On Tue, Mar 26, 2024 at 08:51:45AM -0700, Paul M Stillwell Jr wrote:
>> On 3/25/2024 6:59 PM, Kai-Heng Feng wrote:
>>> On Tue, Mar 26, 2024 at 8:17 AM Nirmal Patel
>>> <nirmal.patel@linux.intel.com> wrote:
>>>>
>>>> On Fri, 22 Mar 2024 18:36:37 -0500 Bjorn Helgaas
>>>> <helgaas@kernel.org> wrote:
>>>>
>>>>> On Fri, Mar 22, 2024 at 03:43:27PM -0700, Paul M Stillwell Jr
>>>>> wrote:
>>>>>> On 3/22/2024 2:37 PM, Bjorn Helgaas wrote:
>>>>>>> On Fri, Mar 22, 2024 at 01:57:00PM -0700, Nirmal Patel
>>>>>>> wrote:
>>>>>>>> On Fri, 15 Mar 2024 09:29:32 +0800 Kai-Heng Feng
>>>>>>>> <kai.heng.feng@canonical.com> wrote: ...
>>>>>>>
>>>>>>>>> If there's an official document on intel.com, it can
>>>>>>>>> make many things clearer and easier. Stating what VMD does
>>>>>>>>> and what VMD expects the OS to do can be really helpful.
>>>>>>>>> Basically, put what you wrote in an official document.
>>>>>>>>
>>>>>>>> Thanks for the suggestion. I can certainly find the official
>>>>>>>> VMD architecture document and add the required information
>>>>>>>> to Documentation/PCI/controller/vmd.rst. Will that be
>>>>>>>> okay?
>>>>>>>
>>>>>>> I'd definitely be interested in whatever you can add to
>>>>>>> illuminate these issues.
>>>>>>>
>>>>>>>> I also need some help/suggestions from you on the following
>>>>>>>> alternate solution. We have been looking at the VMD HW
>>>>>>>> registers to find an unused register. The Cache Line Size
>>>>>>>> register at offset 0Ch is not used by VMD. This is the
>>>>>>>> explanation in the PCI spec 5.0, section 7.5.1.1.7: "This
>>>>>>>> read-write register is implemented for legacy compatibility
>>>>>>>> purposes but has no effect on any PCI Express device
>>>>>>>> behavior." Can this register be used for passing the _OSC
>>>>>>>> settings from the BIOS to the VMD OS driver?
>>>>>>>>
>>>>>>>> These 8 bits are more than enough for the UEFI VMD driver to
>>>>>>>> store all the _OSC flags, and the VMD OS driver can read them
>>>>>>>> during OS boot up. This would solve all of our issues.
>>>>>>>
>>>>>>> Interesting idea.  I think you'd have to do some work to
>>>>>>> separate out the conventional PCI devices, where
>>>>>>> PCI_CACHE_LINE_SIZE is still relevant, to make sure nothing
>>>>>>> breaks.  But I think we overwrite it in some cases even for
>>>>>>> PCIe devices where it's pointless, and it would be nice to
>>>>>>> clean that up.
>>>>>>
>>>>>> I think the suggestion here is to use the VMD device's Cache
>>>>>> Line Size register, not the other PCI devices'. In that case we
>>>>>> don't have to worry about conventional PCI devices because we
>>>>>> aren't touching them.
>>>>>
>>>>> Yes, but there is generic code that writes PCI_CACHE_LINE_SIZE
>>>>> for every device in some cases.  If we wrote the VMD
>>>>> PCI_CACHE_LINE_SIZE, it would obliterate whatever you want to
>>>>> pass there.
>>>>>
>>>> Our initial testing showed no change in the value from an overwrite
>>>> by the PCI core. The register didn't change in the Host or the Guest
>>>> OS when data was written by the BIOS. I will perform more testing
>>>> and also make sure to write to every register just in case.
>>>
>>> If the VMD hardware is designed in this way and there's an official
>>> document stating that "VMD ports should follow _OSC except for
>>> hotplugging", then IMO there's no need to find an alternative.
>>
>> There isn't any official documentation for this, it's just the way VMD
>> is designed :). VMD assumes that everything behind it is hotpluggable so
>> the bits don't *really* matter. In other words, VMD supports hotplug and
>> you can't turn that off so hotplug is always on regardless of what the
>> bits say.
> 
> This whole discussion is about who *owns* the feature, not whether the
> hardware supports the feature.
> 

I'm not disagreeing about who owns the feature :) I'm trying to find a
solution for when the data that the driver gets is wrong.

> The general rule is "if _OSC covers the feature, the platform owns the
> feature and the OS shouldn't touch it unless the OS requests and is
> granted control of it."
> 

The code is fine in a bare metal system, but completely broken in a 
hypervisor because all the _OSC bits are set to 0. So I am trying to 
find a solution where the hotplug bits can be set correctly in this 
situation.
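
To make it concrete, the copy that bites us in the guest is roughly this
(abbreviated from memory, so treat it as a sketch rather than the exact
code):

	/* vmd_copy_host_bridge_flags(): the VMD host bridge inherits the
	 * root bridge's _OSC results.  In a guest where _OSC granted
	 * nothing, root_bridge->native_pcie_hotplug is already 0, so the
	 * VMD bridge ends up with hotplug disabled and pciehp never
	 * manages the VMD rootports.
	 */
	vmd_bridge->native_pcie_hotplug = root_bridge->native_pcie_hotplug;
	vmd_bridge->native_shpc_hotplug = root_bridge->native_shpc_hotplug;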

> VMD wants special treatment, and we'd like a simple description of
> what that treatment is.  Right now it sounds like you want the OS to
> own *hotplug* below VMD regardless of what _OSC says.
> 

What I want is for hotplug to be enabled in a hypervisor :) The Slot
Capabilities register has the correct bits set, but it doesn't make sense
(to me) to use a register in a root port to set the host bridge's settings.
That's why this patch just removed the hotplug bits from the code.
Everything is set up correctly (as far as I can tell) in both bare metal
and hypervisors if we omit those 2 lines.
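
In other words, the change would look something like this (sketch only;
I'm assuming the 2 lines in question are the hotplug assignments in
vmd_copy_host_bridge_flags()):

static void vmd_copy_host_bridge_flags(struct pci_host_bridge *root_bridge,
				       struct pci_host_bridge *vmd_bridge)
{
	/* Hotplug assignments dropped on purpose: the VMD bridge keeps its
	 * default (enabled) native hotplug regardless of what _OSC said.
	 */
	vmd_bridge->native_aer = root_bridge->native_aer;
	vmd_bridge->native_pme = root_bridge->native_pme;
	vmd_bridge->native_ltr = root_bridge->native_ltr;
	vmd_bridge->native_dpc = root_bridge->native_dpc;
}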

Does anyone have a better suggestion on how to handle the case where the 
_OSC bits are incorrect?
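
If we end up going with the Cache Line Size idea Nirmal floated, the
driver side would be close to trivial; something like this (sketch; the
bit layout is invented and would have to be agreed with the UEFI VMD
driver):

	/* Sketch: recover the BIOS's intended _OSC ownership from the VMD
	 * endpoint's Cache Line Size register (offset 0Ch).  Bit 0 meaning
	 * "native hotplug granted to the OS" is a made-up layout, purely
	 * for illustration.
	 */
	u8 osc_flags;

	pci_read_config_byte(vmd->dev, PCI_CACHE_LINE_SIZE, &osc_flags);
	vmd_bridge->native_pcie_hotplug = !!(osc_flags & BIT(0));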

Paul

> But I still have no idea about other features like AER, etc., so we're
> kind of in no man's land about how to handle them.
> 
> Bjorn
diff mbox series

Patch

diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index 769eedeb8802..e39eaef5549a 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -720,6 +720,7 @@  static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
 	resource_size_t membar2_offset = 0x2000;
 	struct pci_bus *child;
 	struct pci_dev *dev;
+	struct pci_host_bridge *vmd_bridge;
 	int ret;
 
 	/*
@@ -886,8 +887,16 @@  static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
 	 * and will fail pcie_bus_configure_settings() early. It can instead be
 	 * run on each of the real root ports.
 	 */
-	list_for_each_entry(child, &vmd->bus->children, node)
+	vmd_bridge = to_pci_host_bridge(vmd->bus->bridge);
+	list_for_each_entry(child, &vmd->bus->children, node) {
 		pcie_bus_configure_settings(child);
+		/*
+		 * When Hotplug is enabled on vmd root-port, enable it on vmd
+		 * bridge.
+		 */
+		if (child->self->is_hotplug_bridge)
+			vmd_bridge->native_pcie_hotplug = 1;
+	}
 
 	pci_bus_add_devices(vmd->bus);