diff mbox series

PCI: fixup PCI_INTERRUPT_LINE for VMD downstream devices

Message ID 20240724170040.5193-1-nirmal.patel@linux.intel.com
State New
Headers show
Series PCI: fixup PCI_INTERRUPT_LINE for VMD downstream devices | expand

Commit Message

Nirmal Patel July 24, 2024, 5 p.m. UTC
VMD does not support legacy interrupts for devices downstream from a
VMD endpoint. So initialize the PCI_INTERRUPT_LINE to 0 for these
devices to ensure we don't try to set up a legacy irq for them.

Note: This patch was proposed by Jim, I am trying to upstream it.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
---
 arch/x86/pci/fixup.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

Comments

Christoph Hellwig July 24, 2024, 6:36 p.m. UTC | #1
On Wed, Jul 24, 2024 at 10:00:40AM -0700, Nirmal Patel wrote:
> +#if IS_ENABLED(CONFIG_VMD)
> +/* 
> + * VMD does not support legacy interrupts for downstream devices.
> + * So PCI_INTERRPUT_LINE needs to be initialized to 0 to ensure OS
> + * doesn't try to configure a legacy irq.
> + */

The wording is a bit weird, Linux is the OS normally.  Or is this
about guest OSes in virtualized environments?  Given how VMD bypasses
a lot of boundaries can we even assign individual devices to VFIO?
And if so is that actually safe or should we prohibit it?
Bjorn Helgaas July 24, 2024, 7:10 p.m. UTC | #2
On Wed, Jul 24, 2024 at 10:00:40AM -0700, Nirmal Patel wrote:
> VMD does not support legacy interrupts for devices downstream from a
> VMD endpoint. So initialize the PCI_INTERRUPT_LINE to 0 for these
> devices to ensure we don't try to set up a legacy irq for them.

s/legacy interrupts/INTx/
s/legacy irq/INTx/

> Note: This patch was proposed by Jim, I am trying to upstream it.
> 
> Signed-off-by: Jim Harris <james.r.harris@intel.com>
> Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
> ---
>  arch/x86/pci/fixup.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
> index b33afb240601..a3b34a256e7f 100644
> --- a/arch/x86/pci/fixup.c
> +++ b/arch/x86/pci/fixup.c
> @@ -653,6 +653,20 @@ static void quirk_no_aersid(struct pci_dev *pdev)
>  DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, PCI_ANY_ID,
>  			      PCI_CLASS_BRIDGE_PCI, 8, quirk_no_aersid);
>  
> +#if IS_ENABLED(CONFIG_VMD)
> +/* 
> + * VMD does not support legacy interrupts for downstream devices.
> + * So PCI_INTERRPUT_LINE needs to be initialized to 0 to ensure OS
> + * doesn't try to configure a legacy irq.

s/legacy interrupts/INTx/
s/PCI_INTERRPUT_LINE/PCI_INTERRUPT_LINE/

> + */
> +static void quirk_vmd_interrupt_line(struct pci_dev *dev)
> +{
> +	if (is_vmd(dev->bus))
> +		pci_write_config_byte(dev, PCI_INTERRUPT_LINE, 0);
> +}
> +DECLARE_PCI_FIXUP_HEADER(PCI_ANY_ID, PCI_ANY_ID, quirk_vmd_interrupt_line);

A quirk for every PCI device, even on systems without VMD, seems like
kind of a clumsy way to deal with this.

Conceptually, I would expect a host bridge driver (VMD acts like a
host bridge in this case) to know whether it supports INTx, and if the
driver knows it doesn't support INTx or it has no _PRT or DT
description of INTx routing to use, an attempt to configure INTx
should just fail naturally.

I don't claim this is how host bridge drivers actually work; I just
think it's the way they *should* work.

> +#endif
> +
>  static void quirk_intel_th_dnv(struct pci_dev *dev)
>  {
>  	struct resource *r = &dev->resource[4];
> -- 
> 2.39.1
>
Manivannan Sadhasivam July 25, 2024, 4:10 a.m. UTC | #3
On Wed, Jul 24, 2024 at 02:10:30PM -0500, Bjorn Helgaas wrote:
> On Wed, Jul 24, 2024 at 10:00:40AM -0700, Nirmal Patel wrote:
> > VMD does not support legacy interrupts for devices downstream from a
> > VMD endpoint. So initialize the PCI_INTERRUPT_LINE to 0 for these
> > devices to ensure we don't try to set up a legacy irq for them.
> 
> s/legacy interrupts/INTx/
> s/legacy irq/INTx/
> 
> > Note: This patch was proposed by Jim, I am trying to upstream it.
> > 
> > Signed-off-by: Jim Harris <james.r.harris@intel.com>
> > Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
> > ---
> >  arch/x86/pci/fixup.c | 14 ++++++++++++++
> >  1 file changed, 14 insertions(+)
> > 
> > diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
> > index b33afb240601..a3b34a256e7f 100644
> > --- a/arch/x86/pci/fixup.c
> > +++ b/arch/x86/pci/fixup.c
> > @@ -653,6 +653,20 @@ static void quirk_no_aersid(struct pci_dev *pdev)
> >  DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, PCI_ANY_ID,
> >  			      PCI_CLASS_BRIDGE_PCI, 8, quirk_no_aersid);
> >  
> > +#if IS_ENABLED(CONFIG_VMD)
> > +/* 
> > + * VMD does not support legacy interrupts for downstream devices.
> > + * So PCI_INTERRPUT_LINE needs to be initialized to 0 to ensure OS
> > + * doesn't try to configure a legacy irq.
> 
> s/legacy interrupts/INTx/
> s/PCI_INTERRPUT_LINE/PCI_INTERRUPT_LINE/
> 
> > + */
> > +static void quirk_vmd_interrupt_line(struct pci_dev *dev)
> > +{
> > +	if (is_vmd(dev->bus))
> > +		pci_write_config_byte(dev, PCI_INTERRUPT_LINE, 0);
> > +}
> > +DECLARE_PCI_FIXUP_HEADER(PCI_ANY_ID, PCI_ANY_ID, quirk_vmd_interrupt_line);
> 
> A quirk for every PCI device, even on systems without VMD, seems like
> kind of a clumsy way to deal with this.
> 
> Conceptually, I would expect a host bridge driver (VMD acts like a
> host bridge in this case) to know whether it supports INTx, and if the
> driver knows it doesn't support INTx or it has no _PRT or DT
> description of INTx routing to use, an attempt to configure INTx
> should just fail naturally.
> 
> I don't claim this is how host bridge drivers actually work; I just
> think it's the way they *should* work.
> 

Absolutely! This patch is fixing the issue in a wrong place. There are existing
DT based host bridge drivers that disable INTx due to lack of hardware
capability. The driver just need to nullify pci_host_bridge::map_irq callback.

- Mani
Nirmal Patel July 25, 2024, 9:22 p.m. UTC | #4
On Thu, 25 Jul 2024 09:40:13 +0530
Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> wrote:

> On Wed, Jul 24, 2024 at 02:10:30PM -0500, Bjorn Helgaas wrote:
> > On Wed, Jul 24, 2024 at 10:00:40AM -0700, Nirmal Patel wrote:  
> > > VMD does not support legacy interrupts for devices downstream
> > > from a VMD endpoint. So initialize the PCI_INTERRUPT_LINE to 0
> > > for these devices to ensure we don't try to set up a legacy irq
> > > for them.  
> > 
> > s/legacy interrupts/INTx/
> > s/legacy irq/INTx/
> >   
> > > Note: This patch was proposed by Jim, I am trying to upstream it.
> > > 
> > > Signed-off-by: Jim Harris <james.r.harris@intel.com>
> > > Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
> > > ---
> > >  arch/x86/pci/fixup.c | 14 ++++++++++++++
> > >  1 file changed, 14 insertions(+)
> > > 
> > > diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
> > > index b33afb240601..a3b34a256e7f 100644
> > > --- a/arch/x86/pci/fixup.c
> > > +++ b/arch/x86/pci/fixup.c
> > > @@ -653,6 +653,20 @@ static void quirk_no_aersid(struct pci_dev
> > > *pdev) DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL,
> > > PCI_ANY_ID, PCI_CLASS_BRIDGE_PCI, 8, quirk_no_aersid);
> > >  
> > > +#if IS_ENABLED(CONFIG_VMD)
> > > +/* 
> > > + * VMD does not support legacy interrupts for downstream devices.
> > > + * So PCI_INTERRPUT_LINE needs to be initialized to 0 to ensure
> > > OS
> > > + * doesn't try to configure a legacy irq.  
> > 
> > s/legacy interrupts/INTx/
> > s/PCI_INTERRPUT_LINE/PCI_INTERRUPT_LINE/
> >   
> > > + */
> > > +static void quirk_vmd_interrupt_line(struct pci_dev *dev)
> > > +{
> > > +	if (is_vmd(dev->bus))
> > > +		pci_write_config_byte(dev, PCI_INTERRUPT_LINE,
> > > 0); +}
> > > +DECLARE_PCI_FIXUP_HEADER(PCI_ANY_ID, PCI_ANY_ID,
> > > quirk_vmd_interrupt_line);  
> > 
> > A quirk for every PCI device, even on systems without VMD, seems
> > like kind of a clumsy way to deal with this.
> > 
> > Conceptually, I would expect a host bridge driver (VMD acts like a
> > host bridge in this case) to know whether it supports INTx, and if
> > the driver knows it doesn't support INTx or it has no _PRT or DT
> > description of INTx routing to use, an attempt to configure INTx
> > should just fail naturally.
> > 
> > I don't claim this is how host bridge drivers actually work; I just
> > think it's the way they *should* work.
> >   
> 
> Absolutely! This patch is fixing the issue in a wrong place. There
> are existing DT based host bridge drivers that disable INTx due to
> lack of hardware capability. The driver just need to nullify
> pci_host_bridge::map_irq callback.
> 
> - Mani
> 

Thanks for the response and suggestion. I will make adjustments in VMD
driver code.

-nirmal
Nirmal Patel July 29, 2024, 8:08 p.m. UTC | #5
On Thu, 25 Jul 2024 09:40:13 +0530
Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> wrote:

> On Wed, Jul 24, 2024 at 02:10:30PM -0500, Bjorn Helgaas wrote:
> > On Wed, Jul 24, 2024 at 10:00:40AM -0700, Nirmal Patel wrote:  
> > > VMD does not support legacy interrupts for devices downstream
> > > from a VMD endpoint. So initialize the PCI_INTERRUPT_LINE to 0
> > > for these devices to ensure we don't try to set up a legacy irq
> > > for them.  
> > 
> > s/legacy interrupts/INTx/
> > s/legacy irq/INTx/
> >   
> > > Note: This patch was proposed by Jim, I am trying to upstream it.
> > > 
> > > Signed-off-by: Jim Harris <james.r.harris@intel.com>
> > > Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
> > > ---
> > >  arch/x86/pci/fixup.c | 14 ++++++++++++++
> > >  1 file changed, 14 insertions(+)
> > > 
> > > diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
> > > index b33afb240601..a3b34a256e7f 100644
> > > --- a/arch/x86/pci/fixup.c
> > > +++ b/arch/x86/pci/fixup.c
> > > @@ -653,6 +653,20 @@ static void quirk_no_aersid(struct pci_dev
> > > *pdev) DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL,
> > > PCI_ANY_ID, PCI_CLASS_BRIDGE_PCI, 8, quirk_no_aersid);
> > >  
> > > +#if IS_ENABLED(CONFIG_VMD)
> > > +/* 
> > > + * VMD does not support legacy interrupts for downstream devices.
> > > + * So PCI_INTERRPUT_LINE needs to be initialized to 0 to ensure
> > > OS
> > > + * doesn't try to configure a legacy irq.  
> > 
> > s/legacy interrupts/INTx/
> > s/PCI_INTERRPUT_LINE/PCI_INTERRUPT_LINE/
> >   
> > > + */
> > > +static void quirk_vmd_interrupt_line(struct pci_dev *dev)
> > > +{
> > > +	if (is_vmd(dev->bus))
> > > +		pci_write_config_byte(dev, PCI_INTERRUPT_LINE,
> > > 0); +}
> > > +DECLARE_PCI_FIXUP_HEADER(PCI_ANY_ID, PCI_ANY_ID,
> > > quirk_vmd_interrupt_line);  
> > 
> > A quirk for every PCI device, even on systems without VMD, seems
> > like kind of a clumsy way to deal with this.
> > 
> > Conceptually, I would expect a host bridge driver (VMD acts like a
> > host bridge in this case) to know whether it supports INTx, and if
> > the driver knows it doesn't support INTx or it has no _PRT or DT
> > description of INTx routing to use, an attempt to configure INTx
> > should just fail naturally.
> > 
> > I don't claim this is how host bridge drivers actually work; I just
> > think it's the way they *should* work.
> >   
> 
> Absolutely! This patch is fixing the issue in a wrong place. There
> are existing DT based host bridge drivers that disable INTx due to
> lack of hardware capability. The driver just need to nullify
> pci_host_bridge::map_irq callback.
> 
> - Mani
> 
For VMD as a host bridge, pci_host_bridge::map_irq is null. Even all
VMD rootports' PCI_INTERRUPT_LINE registers are set to 0. Since VMD
doesn't explicitly set PCI_INTERRUPT_LINE register to 0 for all of its
sub-devices (i.e. NVMe), if some NVMes has non-zero value set for
PCI_INTERRUPT_LINE (i.e. 0xff) then some software like SPDK can read
it and make wrong assumption about INTx support.

Based Bjorn's and your suggestion, it might be better if VMD sets
PCI_INTERRUPT_LINE register for all of its sub-devices during VMD
enumeration.

-nirmal.
Nirmal Patel July 29, 2024, 8:10 p.m. UTC | #6
On Wed, 24 Jul 2024 14:10:30 -0500
Bjorn Helgaas <helgaas@kernel.org> wrote:

> On Wed, Jul 24, 2024 at 10:00:40AM -0700, Nirmal Patel wrote:
> > VMD does not support legacy interrupts for devices downstream from a
> > VMD endpoint. So initialize the PCI_INTERRUPT_LINE to 0 for these
> > devices to ensure we don't try to set up a legacy irq for them.  
> 
> s/legacy interrupts/INTx/
> s/legacy irq/INTx/
> 
> > Note: This patch was proposed by Jim, I am trying to upstream it.
> > 
> > Signed-off-by: Jim Harris <james.r.harris@intel.com>
> > Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
> > ---
> >  arch/x86/pci/fixup.c | 14 ++++++++++++++
> >  1 file changed, 14 insertions(+)
> > 
> > diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
> > index b33afb240601..a3b34a256e7f 100644
> > --- a/arch/x86/pci/fixup.c
> > +++ b/arch/x86/pci/fixup.c
> > @@ -653,6 +653,20 @@ static void quirk_no_aersid(struct pci_dev
> > *pdev) DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL,
> > PCI_ANY_ID, PCI_CLASS_BRIDGE_PCI, 8, quirk_no_aersid);
> >  
> > +#if IS_ENABLED(CONFIG_VMD)
> > +/* 
> > + * VMD does not support legacy interrupts for downstream devices.
> > + * So PCI_INTERRPUT_LINE needs to be initialized to 0 to ensure OS
> > + * doesn't try to configure a legacy irq.  
> 
> s/legacy interrupts/INTx/
> s/PCI_INTERRPUT_LINE/PCI_INTERRUPT_LINE/
> 
> > + */
> > +static void quirk_vmd_interrupt_line(struct pci_dev *dev)
> > +{
> > +	if (is_vmd(dev->bus))
> > +		pci_write_config_byte(dev, PCI_INTERRUPT_LINE, 0);
> > +}
> > +DECLARE_PCI_FIXUP_HEADER(PCI_ANY_ID, PCI_ANY_ID,
> > quirk_vmd_interrupt_line);  
> 
> A quirk for every PCI device, even on systems without VMD, seems like
> kind of a clumsy way to deal with this.
> 
> Conceptually, I would expect a host bridge driver (VMD acts like a
> host bridge in this case) to know whether it supports INTx, and if the
> driver knows it doesn't support INTx or it has no _PRT or DT
> description of INTx routing to use, an attempt to configure INTx
> should just fail naturally.
> 
> I don't claim this is how host bridge drivers actually work; I just
> think it's the way they *should* work.

For VMD as a host bridge, pci_host_bridge::map_irq is null. Even all
VMD rootports' PCI_INTERRUPT_LINE registers are set to 0. Since VMD
doesn't explicitly set PCI_INTERRUPT_LINE register to 0 for all of its
sub-devices (i.e. NVMe), if some NVMes has non-zero value set for
PCI_INTERRUPT_LINE (i.e. 0xff) then some software like SPDK can read
it and make wrong assumption about INTx support.

Based your suggestion, it might be better if VMD sets
PCI_INTERRUPT_LINE register to 0 for all of its sub-devices during
VMD enumeration.

-nirmal.
> 
> > +#endif
> > +
> >  static void quirk_intel_th_dnv(struct pci_dev *dev)
> >  {
> >  	struct resource *r = &dev->resource[4];
> > -- 
> > 2.39.1
> >
Manivannan Sadhasivam July 30, 2024, 5:28 a.m. UTC | #7
On Mon, Jul 29, 2024 at 01:08:59PM -0700, Nirmal Patel wrote:
> On Thu, 25 Jul 2024 09:40:13 +0530
> Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> wrote:
> 
> > On Wed, Jul 24, 2024 at 02:10:30PM -0500, Bjorn Helgaas wrote:
> > > On Wed, Jul 24, 2024 at 10:00:40AM -0700, Nirmal Patel wrote:  
> > > > VMD does not support legacy interrupts for devices downstream
> > > > from a VMD endpoint. So initialize the PCI_INTERRUPT_LINE to 0
> > > > for these devices to ensure we don't try to set up a legacy irq
> > > > for them.  
> > > 
> > > s/legacy interrupts/INTx/
> > > s/legacy irq/INTx/
> > >   
> > > > Note: This patch was proposed by Jim, I am trying to upstream it.
> > > > 
> > > > Signed-off-by: Jim Harris <james.r.harris@intel.com>
> > > > Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
> > > > ---
> > > >  arch/x86/pci/fixup.c | 14 ++++++++++++++
> > > >  1 file changed, 14 insertions(+)
> > > > 
> > > > diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
> > > > index b33afb240601..a3b34a256e7f 100644
> > > > --- a/arch/x86/pci/fixup.c
> > > > +++ b/arch/x86/pci/fixup.c
> > > > @@ -653,6 +653,20 @@ static void quirk_no_aersid(struct pci_dev
> > > > *pdev) DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL,
> > > > PCI_ANY_ID, PCI_CLASS_BRIDGE_PCI, 8, quirk_no_aersid);
> > > >  
> > > > +#if IS_ENABLED(CONFIG_VMD)
> > > > +/* 
> > > > + * VMD does not support legacy interrupts for downstream devices.
> > > > + * So PCI_INTERRPUT_LINE needs to be initialized to 0 to ensure
> > > > OS
> > > > + * doesn't try to configure a legacy irq.  
> > > 
> > > s/legacy interrupts/INTx/
> > > s/PCI_INTERRPUT_LINE/PCI_INTERRUPT_LINE/
> > >   
> > > > + */
> > > > +static void quirk_vmd_interrupt_line(struct pci_dev *dev)
> > > > +{
> > > > +	if (is_vmd(dev->bus))
> > > > +		pci_write_config_byte(dev, PCI_INTERRUPT_LINE,
> > > > 0); +}
> > > > +DECLARE_PCI_FIXUP_HEADER(PCI_ANY_ID, PCI_ANY_ID,
> > > > quirk_vmd_interrupt_line);  
> > > 
> > > A quirk for every PCI device, even on systems without VMD, seems
> > > like kind of a clumsy way to deal with this.
> > > 
> > > Conceptually, I would expect a host bridge driver (VMD acts like a
> > > host bridge in this case) to know whether it supports INTx, and if
> > > the driver knows it doesn't support INTx or it has no _PRT or DT
> > > description of INTx routing to use, an attempt to configure INTx
> > > should just fail naturally.
> > > 
> > > I don't claim this is how host bridge drivers actually work; I just
> > > think it's the way they *should* work.
> > >   
> > 
> > Absolutely! This patch is fixing the issue in a wrong place. There
> > are existing DT based host bridge drivers that disable INTx due to
> > lack of hardware capability. The driver just need to nullify
> > pci_host_bridge::map_irq callback.
> > 
> > - Mani
> > 
> For VMD as a host bridge, pci_host_bridge::map_irq is null. Even all
> VMD rootports' PCI_INTERRUPT_LINE registers are set to 0. 

If map_irq is already NULL, then how INTx is being configured? In your patch
description:

"So initialize the PCI_INTERRUPT_LINE to 0 for these devices to ensure we don't
try to set up a legacy irq for them."

Who is 'we'? For sure the PCI core wouldn't set INTx in your case. Does 'we'
refer to device firmware?

>Since VMD
> doesn't explicitly set PCI_INTERRUPT_LINE register to 0 for all of its
> sub-devices (i.e. NVMe), if some NVMes has non-zero value set for
> PCI_INTERRUPT_LINE (i.e. 0xff) then some software like SPDK can read
> it and make wrong assumption about INTx support.
> 

Is this statement is true (I haven't heard of before), then don't we need to set
PCI_INTERRUPT_LINE to 0 for all devices irrespective of host bridge? 

> Based Bjorn's and your suggestion, it might be better if VMD sets
> PCI_INTERRUPT_LINE register for all of its sub-devices during VMD
> enumeration.
> 

What about hotplug devices? 

- Mani
Nirmal Patel July 30, 2024, 5:51 p.m. UTC | #8
On Tue, 30 Jul 2024 10:58:30 +0530
Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> wrote:

> On Mon, Jul 29, 2024 at 01:08:59PM -0700, Nirmal Patel wrote:
> > On Thu, 25 Jul 2024 09:40:13 +0530
> > Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> wrote:
> >   
> > > On Wed, Jul 24, 2024 at 02:10:30PM -0500, Bjorn Helgaas wrote:  
> > > > On Wed, Jul 24, 2024 at 10:00:40AM -0700, Nirmal Patel wrote:
> > > >  
> > > > > VMD does not support legacy interrupts for devices downstream
> > > > > from a VMD endpoint. So initialize the PCI_INTERRUPT_LINE to 0
> > > > > for these devices to ensure we don't try to set up a legacy
> > > > > irq for them.    
> > > > 
> > > > s/legacy interrupts/INTx/
> > > > s/legacy irq/INTx/
> > > >     
> > > > > Note: This patch was proposed by Jim, I am trying to upstream
> > > > > it.
> > > > > 
> > > > > Signed-off-by: Jim Harris <james.r.harris@intel.com>
> > > > > Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
> > > > > ---
> > > > >  arch/x86/pci/fixup.c | 14 ++++++++++++++
> > > > >  1 file changed, 14 insertions(+)
> > > > > 
> > > > > diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
> > > > > index b33afb240601..a3b34a256e7f 100644
> > > > > --- a/arch/x86/pci/fixup.c
> > > > > +++ b/arch/x86/pci/fixup.c
> > > > > @@ -653,6 +653,20 @@ static void quirk_no_aersid(struct
> > > > > pci_dev *pdev)
> > > > > DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL,
> > > > > PCI_ANY_ID, PCI_CLASS_BRIDGE_PCI, 8, quirk_no_aersid); 
> > > > > +#if IS_ENABLED(CONFIG_VMD)
> > > > > +/* 
> > > > > + * VMD does not support legacy interrupts for downstream
> > > > > devices.
> > > > > + * So PCI_INTERRPUT_LINE needs to be initialized to 0 to
> > > > > ensure OS
> > > > > + * doesn't try to configure a legacy irq.    
> > > > 
> > > > s/legacy interrupts/INTx/
> > > > s/PCI_INTERRPUT_LINE/PCI_INTERRUPT_LINE/
> > > >     
> > > > > + */
> > > > > +static void quirk_vmd_interrupt_line(struct pci_dev *dev)
> > > > > +{
> > > > > +	if (is_vmd(dev->bus))
> > > > > +		pci_write_config_byte(dev,
> > > > > PCI_INTERRUPT_LINE, 0); +}
> > > > > +DECLARE_PCI_FIXUP_HEADER(PCI_ANY_ID, PCI_ANY_ID,
> > > > > quirk_vmd_interrupt_line);    
> > > > 
> > > > A quirk for every PCI device, even on systems without VMD, seems
> > > > like kind of a clumsy way to deal with this.
> > > > 
> > > > Conceptually, I would expect a host bridge driver (VMD acts
> > > > like a host bridge in this case) to know whether it supports
> > > > INTx, and if the driver knows it doesn't support INTx or it has
> > > > no _PRT or DT description of INTx routing to use, an attempt to
> > > > configure INTx should just fail naturally.
> > > > 
> > > > I don't claim this is how host bridge drivers actually work; I
> > > > just think it's the way they *should* work.
> > > >     
> > > 
> > > Absolutely! This patch is fixing the issue in a wrong place. There
> > > are existing DT based host bridge drivers that disable INTx due to
> > > lack of hardware capability. The driver just need to nullify
> > > pci_host_bridge::map_irq callback.
> > > 
> > > - Mani
> > >   
> > For VMD as a host bridge, pci_host_bridge::map_irq is null. Even all
> > VMD rootports' PCI_INTERRUPT_LINE registers are set to 0.   
> 
> If map_irq is already NULL, then how INTx is being configured? In
> your patch description:
VMD uses MSIx.
> 
> "So initialize the PCI_INTERRUPT_LINE to 0 for these devices to
> ensure we don't try to set up a legacy irq for them."
> 
> Who is 'we'? For sure the PCI core wouldn't set INTx in your case.
> Does 'we' refer to device firmware?
> 
> >Since VMD
> > doesn't explicitly set PCI_INTERRUPT_LINE register to 0 for all of
> > its sub-devices (i.e. NVMe), if some NVMes has non-zero value set
> > for PCI_INTERRUPT_LINE (i.e. 0xff) then some software like SPDK can
> > read it and make wrong assumption about INTx support.
> >   
> 
> Is this statement is true (I haven't heard of before), then don't we
> need to set PCI_INTERRUPT_LINE to 0 for all devices irrespective of
> host bridge? 
Since VMD doesn't support legacy interrupt, BIOS sets
PCI_INTERRUPT_LINE registers to 0 for all of the VMD rootports but
not the NVMes'.

According to PCIe base specs, "Values in this register are
programmed by system software and are system architecture specific.
The Function itself does not use this value; rather the value in this
register is used by device drivers and operating systems."

We had an issue raised on us sometime back because some SSDs have 0xff
(i.e. Samsung) set to these registers by firmware and SPDK was reading
them when SSDs were behind VMD which led them to believe VMD had INTx
support enabled. After some testing, it made more sense to clear these
registers for all of the VMD owned devices.

> 
> > Based Bjorn's and your suggestion, it might be better if VMD sets
> > PCI_INTERRUPT_LINE register for all of its sub-devices during VMD
> > enumeration.
> >   
> 
> What about hotplug devices?
That is a good question and because of that I thought of putting the
fix in fixup.c. But I am open to your suggestion since fixup is not the
right place.

Thanks
- nirmal
> 
> - Mani
>
Manivannan Sadhasivam July 31, 2024, 3:07 a.m. UTC | #9
On Tue, Jul 30, 2024 at 10:51:15AM -0700, Nirmal Patel wrote:
> On Tue, 30 Jul 2024 10:58:30 +0530
> Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> wrote:
> 
> > On Mon, Jul 29, 2024 at 01:08:59PM -0700, Nirmal Patel wrote:
> > > On Thu, 25 Jul 2024 09:40:13 +0530
> > > Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> wrote:
> > >   
> > > > On Wed, Jul 24, 2024 at 02:10:30PM -0500, Bjorn Helgaas wrote:  
> > > > > On Wed, Jul 24, 2024 at 10:00:40AM -0700, Nirmal Patel wrote:
> > > > >  
> > > > > > VMD does not support legacy interrupts for devices downstream
> > > > > > from a VMD endpoint. So initialize the PCI_INTERRUPT_LINE to 0
> > > > > > for these devices to ensure we don't try to set up a legacy
> > > > > > irq for them.    
> > > > > 
> > > > > s/legacy interrupts/INTx/
> > > > > s/legacy irq/INTx/
> > > > >     
> > > > > > Note: This patch was proposed by Jim, I am trying to upstream
> > > > > > it.
> > > > > > 
> > > > > > Signed-off-by: Jim Harris <james.r.harris@intel.com>
> > > > > > Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
> > > > > > ---
> > > > > >  arch/x86/pci/fixup.c | 14 ++++++++++++++
> > > > > >  1 file changed, 14 insertions(+)
> > > > > > 
> > > > > > diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
> > > > > > index b33afb240601..a3b34a256e7f 100644
> > > > > > --- a/arch/x86/pci/fixup.c
> > > > > > +++ b/arch/x86/pci/fixup.c
> > > > > > @@ -653,6 +653,20 @@ static void quirk_no_aersid(struct
> > > > > > pci_dev *pdev)
> > > > > > DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL,
> > > > > > PCI_ANY_ID, PCI_CLASS_BRIDGE_PCI, 8, quirk_no_aersid); 
> > > > > > +#if IS_ENABLED(CONFIG_VMD)
> > > > > > +/* 
> > > > > > + * VMD does not support legacy interrupts for downstream
> > > > > > devices.
> > > > > > + * So PCI_INTERRPUT_LINE needs to be initialized to 0 to
> > > > > > ensure OS
> > > > > > + * doesn't try to configure a legacy irq.    
> > > > > 
> > > > > s/legacy interrupts/INTx/
> > > > > s/PCI_INTERRPUT_LINE/PCI_INTERRUPT_LINE/
> > > > >     
> > > > > > + */
> > > > > > +static void quirk_vmd_interrupt_line(struct pci_dev *dev)
> > > > > > +{
> > > > > > +	if (is_vmd(dev->bus))
> > > > > > +		pci_write_config_byte(dev,
> > > > > > PCI_INTERRUPT_LINE, 0); +}
> > > > > > +DECLARE_PCI_FIXUP_HEADER(PCI_ANY_ID, PCI_ANY_ID,
> > > > > > quirk_vmd_interrupt_line);    
> > > > > 
> > > > > A quirk for every PCI device, even on systems without VMD, seems
> > > > > like kind of a clumsy way to deal with this.
> > > > > 
> > > > > Conceptually, I would expect a host bridge driver (VMD acts
> > > > > like a host bridge in this case) to know whether it supports
> > > > > INTx, and if the driver knows it doesn't support INTx or it has
> > > > > no _PRT or DT description of INTx routing to use, an attempt to
> > > > > configure INTx should just fail naturally.
> > > > > 
> > > > > I don't claim this is how host bridge drivers actually work; I
> > > > > just think it's the way they *should* work.
> > > > >     
> > > > 
> > > > Absolutely! This patch is fixing the issue in a wrong place. There
> > > > are existing DT based host bridge drivers that disable INTx due to
> > > > lack of hardware capability. The driver just need to nullify
> > > > pci_host_bridge::map_irq callback.
> > > > 
> > > > - Mani
> > > >   
> > > For VMD as a host bridge, pci_host_bridge::map_irq is null. Even all
> > > VMD rootports' PCI_INTERRUPT_LINE registers are set to 0.   
> > 
> > If map_irq is already NULL, then how INTx is being configured? In
> > your patch description:
> VMD uses MSIx.
> > 
> > "So initialize the PCI_INTERRUPT_LINE to 0 for these devices to
> > ensure we don't try to set up a legacy irq for them."
> > 
> > Who is 'we'? For sure the PCI core wouldn't set INTx in your case.
> > Does 'we' refer to device firmware?
> > 
> > >Since VMD
> > > doesn't explicitly set PCI_INTERRUPT_LINE register to 0 for all of
> > > its sub-devices (i.e. NVMe), if some NVMes has non-zero value set
> > > for PCI_INTERRUPT_LINE (i.e. 0xff) then some software like SPDK can
> > > read it and make wrong assumption about INTx support.
> > >   
> > 
> > Is this statement is true (I haven't heard of before), then don't we
> > need to set PCI_INTERRUPT_LINE to 0 for all devices irrespective of
> > host bridge? 
> Since VMD doesn't support legacy interrupt, BIOS sets
> PCI_INTERRUPT_LINE registers to 0 for all of the VMD rootports but
> not the NVMes'.
> 
> According to PCIe base specs, "Values in this register are
> programmed by system software and are system architecture specific.
> The Function itself does not use this value; rather the value in this
> register is used by device drivers and operating systems."
> 
> We had an issue raised on us sometime back because some SSDs have 0xff
> (i.e. Samsung) set to these registers by firmware and SPDK was reading
> them when SSDs were behind VMD which led them to believe VMD had INTx
> support enabled. After some testing, it made more sense to clear these
> registers for all of the VMD owned devices.
> 

This is a valuable information that should've been present in the patch
description. Now I can understand the intention of your patch. Previously I
couldn't.

> > 
> > > Based Bjorn's and your suggestion, it might be better if VMD sets
> > > PCI_INTERRUPT_LINE register for all of its sub-devices during VMD
> > > enumeration.
> > >   
> > 
> > What about hotplug devices?
> That is a good question and because of that I thought of putting the
> fix in fixup.c. But I am open to your suggestion since fixup is not the
> right place.
> 

How about the below change?

diff --git a/drivers/pci/irq.c b/drivers/pci/irq.c
index 4555630be9ec..140df1138f14 100644
--- a/drivers/pci/irq.c
+++ b/drivers/pci/irq.c
@@ -147,6 +147,13 @@ void pci_assign_irq(struct pci_dev *dev)
        struct pci_host_bridge *hbrg = pci_find_host_bridge(dev->bus);
 
        if (!(hbrg->map_irq)) {
+               /*
+                * Some userspace applications like SPDK reads
+                * PCI_INTERRUPT_LINE to decide whether INTx is enabled or not.
+                * So write 0 to make sure they understand that INTx is disabled
+                * for the device.
+                */
+               pci_write_config_byte(dev, PCI_INTERRUPT_LINE, 0);
                pci_dbg(dev, "runtime IRQ mapping not provided by arch\n");
                return;
        }


So this sets PCI_INTERRUPT_LINE to 0 for _all_ devices that don't support INTx.
As per your explanation above, the issue you are seeing is not just applicable
to VMD, but for all devices.

- Mani
Nirmal Patel July 31, 2024, 7:21 p.m. UTC | #10
On Wed, 24 Jul 2024 11:36:31 -0700
Christoph Hellwig <hch@infradead.org> wrote:

> On Wed, Jul 24, 2024 at 10:00:40AM -0700, Nirmal Patel wrote:
> > +#if IS_ENABLED(CONFIG_VMD)
> > +/* 
> > + * VMD does not support legacy interrupts for downstream devices.
> > + * So PCI_INTERRPUT_LINE needs to be initialized to 0 to ensure OS
> > + * doesn't try to configure a legacy irq.
> > + */  
> 
> The wording is a bit weird, Linux is the OS normally.  Or is this
> about guest OSes in virtualized environments?  Given how VMD bypasses
> a lot of boundaries can we even assign individual devices to VFIO?
> And if so is that actually safe or should we prohibit it?
> 

This is for host OS. I will add better description as suggested by
Mani.
Nirmal Patel Aug. 1, 2024, 6:57 p.m. UTC | #11
On Wed, 31 Jul 2024 08:37:39 +0530
Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> wrote:

> On Tue, Jul 30, 2024 at 10:51:15AM -0700, Nirmal Patel wrote:
> > On Tue, 30 Jul 2024 10:58:30 +0530
> > Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> wrote:
> >   
> > > On Mon, Jul 29, 2024 at 01:08:59PM -0700, Nirmal Patel wrote:  
> > > > On Thu, 25 Jul 2024 09:40:13 +0530
> > > > Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> wrote:
> > > >     
> > > > > On Wed, Jul 24, 2024 at 02:10:30PM -0500, Bjorn Helgaas
> > > > > wrote:    
> > > > > > On Wed, Jul 24, 2024 at 10:00:40AM -0700, Nirmal Patel
> > > > > > wrote: 
> > > > > > > VMD does not support legacy interrupts for devices
> > > > > > > downstream from a VMD endpoint. So initialize the
> > > > > > > PCI_INTERRUPT_LINE to 0 for these devices to ensure we
> > > > > > > don't try to set up a legacy irq for them.      
> > > > > > 
> > > > > > s/legacy interrupts/INTx/
> > > > > > s/legacy irq/INTx/
> > > > > >       
> > > > > > > Note: This patch was proposed by Jim, I am trying to
> > > > > > > upstream it.
> > > > > > > 
> > > > > > > Signed-off-by: Jim Harris <james.r.harris@intel.com>
> > > > > > > Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
> > > > > > > ---
> > > > > > >  arch/x86/pci/fixup.c | 14 ++++++++++++++
> > > > > > >  1 file changed, 14 insertions(+)
> > > > > > > 
> > > > > > > diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
> > > > > > > index b33afb240601..a3b34a256e7f 100644
> > > > > > > --- a/arch/x86/pci/fixup.c
> > > > > > > +++ b/arch/x86/pci/fixup.c
> > > > > > > @@ -653,6 +653,20 @@ static void quirk_no_aersid(struct
> > > > > > > pci_dev *pdev)
> > > > > > > DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL,
> > > > > > > PCI_ANY_ID, PCI_CLASS_BRIDGE_PCI, 8, quirk_no_aersid); 
> > > > > > > +#if IS_ENABLED(CONFIG_VMD)
> > > > > > > +/* 
> > > > > > > + * VMD does not support legacy interrupts for downstream
> > > > > > > devices.
> > > > > > > + * So PCI_INTERRPUT_LINE needs to be initialized to 0 to
> > > > > > > ensure OS
> > > > > > > + * doesn't try to configure a legacy irq.      
> > > > > > 
> > > > > > s/legacy interrupts/INTx/
> > > > > > s/PCI_INTERRPUT_LINE/PCI_INTERRUPT_LINE/
> > > > > >       
> > > > > > > + */
> > > > > > > +static void quirk_vmd_interrupt_line(struct pci_dev *dev)
> > > > > > > +{
> > > > > > > +	if (is_vmd(dev->bus))
> > > > > > > +		pci_write_config_byte(dev,
> > > > > > > PCI_INTERRUPT_LINE, 0); +}
> > > > > > > +DECLARE_PCI_FIXUP_HEADER(PCI_ANY_ID, PCI_ANY_ID,
> > > > > > > quirk_vmd_interrupt_line);      
> > > > > > 
> > > > > > A quirk for every PCI device, even on systems without VMD,
> > > > > > seems like kind of a clumsy way to deal with this.
> > > > > > 
> > > > > > Conceptually, I would expect a host bridge driver (VMD acts
> > > > > > like a host bridge in this case) to know whether it supports
> > > > > > INTx, and if the driver knows it doesn't support INTx or it
> > > > > > has no _PRT or DT description of INTx routing to use, an
> > > > > > attempt to configure INTx should just fail naturally.
> > > > > > 
> > > > > > I don't claim this is how host bridge drivers actually
> > > > > > work; I just think it's the way they *should* work.
> > > > > >       
> > > > > 
> > > > > Absolutely! This patch is fixing the issue in a wrong place.
> > > > > There are existing DT based host bridge drivers that disable
> > > > > INTx due to lack of hardware capability. The driver just need
> > > > > to nullify pci_host_bridge::map_irq callback.
> > > > > 
> > > > > - Mani
> > > > >     
> > > > For VMD as a host bridge, pci_host_bridge::map_irq is null.
> > > > Even all VMD rootports' PCI_INTERRUPT_LINE registers are set to
> > > > 0.     
> > > 
> > > If map_irq is already NULL, then how INTx is being configured? In
> > > your patch description:  
> > VMD uses MSIx.  
> > > 
> > > "So initialize the PCI_INTERRUPT_LINE to 0 for these devices to
> > > ensure we don't try to set up a legacy irq for them."
> > > 
> > > Who is 'we'? For sure the PCI core wouldn't set INTx in your case.
> > > Does 'we' refer to device firmware?
> > >   
> > > >Since VMD
> > > > doesn't explicitly set PCI_INTERRUPT_LINE register to 0 for all
> > > > of its sub-devices (i.e. NVMe), if some NVMes has non-zero
> > > > value set for PCI_INTERRUPT_LINE (i.e. 0xff) then some software
> > > > like SPDK can read it and make wrong assumption about INTx
> > > > support. 
> > > 
> > > Is this statement is true (I haven't heard of before), then don't
> > > we need to set PCI_INTERRUPT_LINE to 0 for all devices
> > > irrespective of host bridge?   
> > Since VMD doesn't support legacy interrupt, BIOS sets
> > PCI_INTERRUPT_LINE registers to 0 for all of the VMD rootports but
> > not the NVMes'.
> > 
> > According to PCIe base specs, "Values in this register are
> > programmed by system software and are system architecture specific.
> > The Function itself does not use this value; rather the value in
> > this register is used by device drivers and operating systems."
> > 
> > We had an issue raised on us sometime back because some SSDs have
> > 0xff (i.e. Samsung) set to these registers by firmware and SPDK was
> > reading them when SSDs were behind VMD which led them to believe
> > VMD had INTx support enabled. After some testing, it made more
> > sense to clear these registers for all of the VMD owned devices.
> >   
> 
> This is a valuable information that should've been present in the
> patch description. Now I can understand the intention of your patch.
> Previously I couldn't.
> 
> > >   
> > > > Based Bjorn's and your suggestion, it might be better if VMD
> > > > sets PCI_INTERRUPT_LINE register for all of its sub-devices
> > > > during VMD enumeration.
> > > >     
> > > 
> > > What about hotplug devices?  
> > That is a good question and because of that I thought of putting the
> > fix in fixup.c. But I am open to your suggestion since fixup is not
> > the right place.
> >   
> 
> How about the below change?
> 
> diff --git a/drivers/pci/irq.c b/drivers/pci/irq.c
> index 4555630be9ec..140df1138f14 100644
> --- a/drivers/pci/irq.c
> +++ b/drivers/pci/irq.c
> @@ -147,6 +147,13 @@ void pci_assign_irq(struct pci_dev *dev)
>         struct pci_host_bridge *hbrg = pci_find_host_bridge(dev->bus);
>  
>         if (!(hbrg->map_irq)) {
> +               /*
> +                * Some userspace applications like SPDK reads
> +                * PCI_INTERRUPT_LINE to decide whether INTx is
> enabled or not.
> +                * So write 0 to make sure they understand that INTx
> is disabled
> +                * for the device.
> +                */
> +               pci_write_config_byte(dev, PCI_INTERRUPT_LINE, 0);
>                 pci_dbg(dev, "runtime IRQ mapping not provided by
> arch\n"); return;
>         }
> 
> 
> So this sets PCI_INTERRUPT_LINE to 0 for _all_ devices that don't
> support INTx. As per your explanation above, the issue you are seeing
> is not just applicable to VMD, but for all devices.
> 
> - Mani
> 

Thanks for the suggestion. Let me test the changes.

-nirmal
diff mbox series

Patch

diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
index b33afb240601..a3b34a256e7f 100644
--- a/arch/x86/pci/fixup.c
+++ b/arch/x86/pci/fixup.c
@@ -653,6 +653,20 @@  static void quirk_no_aersid(struct pci_dev *pdev)
 DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, PCI_ANY_ID,
 			      PCI_CLASS_BRIDGE_PCI, 8, quirk_no_aersid);
 
+#if IS_ENABLED(CONFIG_VMD)
+/* 
+ * VMD does not support legacy interrupts for downstream devices.
+ * So PCI_INTERRPUT_LINE needs to be initialized to 0 to ensure OS
+ * doesn't try to configure a legacy irq.
+ */
+static void quirk_vmd_interrupt_line(struct pci_dev *dev)
+{
+	if (is_vmd(dev->bus))
+		pci_write_config_byte(dev, PCI_INTERRUPT_LINE, 0);
+}
+DECLARE_PCI_FIXUP_HEADER(PCI_ANY_ID, PCI_ANY_ID, quirk_vmd_interrupt_line);
+#endif
+
 static void quirk_intel_th_dnv(struct pci_dev *dev)
 {
 	struct resource *r = &dev->resource[4];