diff mbox series

pci: irq: Add an early parameter to limit pci irq numbers

Message ID 20230524093623.3698134-1-chenhuacai@loongson.cn
State New
Headers show
Series pci: irq: Add an early parameter to limit pci irq numbers | expand

Commit Message

Huacai Chen May 24, 2023, 9:36 a.m. UTC
Some platforms (such as LoongArch) cannot provide enough irq numbers as
many as logical cpu numbers. So we should limit pci irq numbers when
allocate msi/msix vectors, otherwise some device drivers may fail at
initialization. This patch add a cmdline parameter "pci_irq_limit=xxxx"
to control the limit.

The default pci msi/msix number limit is defined 32 for LoongArch and
NR_IRQS for other platforms.

Signed-off-by: Juxin Gao <gaojuxin@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
---
 drivers/pci/msi/msi.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

Comments

Bjorn Helgaas May 24, 2023, 3:21 p.m. UTC | #1
[+cc Marc, LKML]

On Wed, May 24, 2023 at 05:36:23PM +0800, Huacai Chen wrote:
> Some platforms (such as LoongArch) cannot provide enough irq numbers as
> many as logical cpu numbers. So we should limit pci irq numbers when
> allocate msi/msix vectors, otherwise some device drivers may fail at
> initialization. This patch add a cmdline parameter "pci_irq_limit=xxxx"
> to control the limit.
> 
> The default pci msi/msix number limit is defined 32 for LoongArch and
> NR_IRQS for other platforms.

The IRQ experts can chime in on this, but this doesn't feel right to
me.  I assume arch code should set things up so only valid IRQ numbers
can be allocated.  This doesn't seem necessarily PCI-specific, I'd
prefer to avoid an arch #ifdef here, and I'd also prefer to avoid a
command-line parameter that users have to discover and supply.

> Signed-off-by: Juxin Gao <gaojuxin@loongson.cn>
> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
> ---
>  drivers/pci/msi/msi.c | 26 +++++++++++++++++++++++++-
>  1 file changed, 25 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
> index ef1d8857a51b..6617381e50e7 100644
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -402,12 +402,34 @@ static int msi_capability_init(struct pci_dev *dev, int nvec,
>  	return ret;
>  }
>  
> +#ifdef CONFIG_LOONGARCH
> +#define DEFAULT_PCI_IRQ_LIMITS 32
> +#else
> +#define DEFAULT_PCI_IRQ_LIMITS NR_IRQS
> +#endif
> +
> +static int pci_irq_limits = DEFAULT_PCI_IRQ_LIMITS;
> +
> +static int __init pci_irq_limit(char *str)
> +{
> +	get_option(&str, &pci_irq_limits);
> +
> +	if (pci_irq_limits == 0)
> +		pci_irq_limits = DEFAULT_PCI_IRQ_LIMITS;
> +
> +	return 0;
> +}
> +
> +early_param("pci_irq_limit", pci_irq_limit);
> +
>  int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
>  			   struct irq_affinity *affd)
>  {
>  	int nvec;
>  	int rc;
>  
> +	maxvec = clamp_val(maxvec, 0, pci_irq_limits);
> +
>  	if (!pci_msi_supported(dev, minvec) || dev->current_state != PCI_D0)
>  		return -EINVAL;
>  
> @@ -776,7 +798,9 @@ static bool pci_msix_validate_entries(struct pci_dev *dev, struct msix_entry *en
>  int __pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries, int minvec,
>  			    int maxvec, struct irq_affinity *affd, int flags)
>  {
> -	int hwsize, rc, nvec = maxvec;
> +	int hwsize, rc, nvec;
> +
> +	nvec = clamp_val(maxvec, 0, pci_irq_limits);
>  
>  	if (maxvec < minvec)
>  		return -ERANGE;
> -- 
> 2.39.1
>
Marc Zyngier May 25, 2023, 9:08 a.m. UTC | #2
On Wed, 24 May 2023 16:21:09 +0100,
Bjorn Helgaas <helgaas@kernel.org> wrote:
> 
> [+cc Marc, LKML]
> 
> On Wed, May 24, 2023 at 05:36:23PM +0800, Huacai Chen wrote:
> > Some platforms (such as LoongArch) cannot provide enough irq numbers as
> > many as logical cpu numbers. So we should limit pci irq numbers when
> > allocate msi/msix vectors, otherwise some device drivers may fail at
> > initialization. This patch add a cmdline parameter "pci_irq_limit=xxxx"
> > to control the limit.
> > 
> > The default pci msi/msix number limit is defined 32 for LoongArch and
> > NR_IRQS for other platforms.
> 
> The IRQ experts can chime in on this, but this doesn't feel right to
> me.  I assume arch code should set things up so only valid IRQ numbers
> can be allocated.  This doesn't seem necessarily PCI-specific, I'd
> prefer to avoid an arch #ifdef here, and I'd also prefer to avoid a
> command-line parameter that users have to discover and supply.

I'd tend to agree. The irqchip driver that provides the interrupt
numbers should perform the capping, not the core PCI code.

Furthermore, MSI allocation is never guaranteed anyway, and drivers
shouldn't expect to get all the interrupts they request. If they do,
they need fixing.

Overall, interrupt architectures that provide as few as 32 possible
interrupts for MSIs will be crippled, and there isn't much we can do
about it. This also applies to a large number of ARM systems that use
the Designware IP.

	M.
Huacai Chen May 25, 2023, 9:14 a.m. UTC | #3
Hi, Bjorn,

On Wed, May 24, 2023 at 11:21 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> [+cc Marc, LKML]
>
> On Wed, May 24, 2023 at 05:36:23PM +0800, Huacai Chen wrote:
> > Some platforms (such as LoongArch) cannot provide enough irq numbers as
> > many as logical cpu numbers. So we should limit pci irq numbers when
> > allocate msi/msix vectors, otherwise some device drivers may fail at
> > initialization. This patch add a cmdline parameter "pci_irq_limit=xxxx"
> > to control the limit.
> >
> > The default pci msi/msix number limit is defined 32 for LoongArch and
> > NR_IRQS for other platforms.
>
> The IRQ experts can chime in on this, but this doesn't feel right to
> me.  I assume arch code should set things up so only valid IRQ numbers
> can be allocated.  This doesn't seem necessarily PCI-specific, I'd
> prefer to avoid an arch #ifdef here, and I'd also prefer to avoid a
> command-line parameter that users have to discover and supply.
The problem we meet: LoongArch machines can have as many as 256
logical cpus, and the maximum of msi vectors is 192. Even on a 64-core
machine, 192 irqs can be easily exhausted if there are several NICs
(NIC usually allocates msi irqs depending on the number of online
cpus). So we want to limit the msi allocation.

This is not a LoongArch-specific problem, because I think other
platforms can also meet if they have many NICs. But of course,
LoongArch can meet it more easily because the available msi vectors
are very few. So, adding a cmdline parameter is somewhat reasonable.

After some investigation, I think it may be possible to modify
drivers/irqchip/irq-loongson-pch-msi.c and override
msi_domain_info::domain_alloc_irqs() to limit msi allocation. However,
doing that need to remove the "static" before
__msi_domain_alloc_irqs(), which means revert
762687ceb31fc296e2e1406559e8bb5 ("genirq/msi: Make
__msi_domain_alloc_irqs() static"), I don't know whether that is
acceptable.

If such a revert is not acceptable, it seems that we can only use the
method in this patch. Maybe rename pci_irq_limits to pci_msi_limits is
a little better.

Huacai

>
> > Signed-off-by: Juxin Gao <gaojuxin@loongson.cn>
> > Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
> > ---
> >  drivers/pci/msi/msi.c | 26 +++++++++++++++++++++++++-
> >  1 file changed, 25 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
> > index ef1d8857a51b..6617381e50e7 100644
> > --- a/drivers/pci/msi/msi.c
> > +++ b/drivers/pci/msi/msi.c
> > @@ -402,12 +402,34 @@ static int msi_capability_init(struct pci_dev *dev, int nvec,
> >       return ret;
> >  }
> >
> > +#ifdef CONFIG_LOONGARCH
> > +#define DEFAULT_PCI_IRQ_LIMITS 32
> > +#else
> > +#define DEFAULT_PCI_IRQ_LIMITS NR_IRQS
> > +#endif
> > +
> > +static int pci_irq_limits = DEFAULT_PCI_IRQ_LIMITS;
> > +
> > +static int __init pci_irq_limit(char *str)
> > +{
> > +     get_option(&str, &pci_irq_limits);
> > +
> > +     if (pci_irq_limits == 0)
> > +             pci_irq_limits = DEFAULT_PCI_IRQ_LIMITS;
> > +
> > +     return 0;
> > +}
> > +
> > +early_param("pci_irq_limit", pci_irq_limit);
> > +
> >  int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
> >                          struct irq_affinity *affd)
> >  {
> >       int nvec;
> >       int rc;
> >
> > +     maxvec = clamp_val(maxvec, 0, pci_irq_limits);
> > +
> >       if (!pci_msi_supported(dev, minvec) || dev->current_state != PCI_D0)
> >               return -EINVAL;
> >
> > @@ -776,7 +798,9 @@ static bool pci_msix_validate_entries(struct pci_dev *dev, struct msix_entry *en
> >  int __pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries, int minvec,
> >                           int maxvec, struct irq_affinity *affd, int flags)
> >  {
> > -     int hwsize, rc, nvec = maxvec;
> > +     int hwsize, rc, nvec;
> > +
> > +     nvec = clamp_val(maxvec, 0, pci_irq_limits);
> >
> >       if (maxvec < minvec)
> >               return -ERANGE;
> > --
> > 2.39.1
> >
Bjorn Helgaas May 25, 2023, 9:40 p.m. UTC | #4
On Thu, May 25, 2023 at 05:14:28PM +0800, Huacai Chen wrote:
> On Wed, May 24, 2023 at 11:21 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Wed, May 24, 2023 at 05:36:23PM +0800, Huacai Chen wrote:
> > > Some platforms (such as LoongArch) cannot provide enough irq numbers as
> > > many as logical cpu numbers. So we should limit pci irq numbers when
> > > allocate msi/msix vectors, otherwise some device drivers may fail at
> > > initialization. This patch add a cmdline parameter "pci_irq_limit=xxxx"
> > > to control the limit.
> > >
> > > The default pci msi/msix number limit is defined 32 for LoongArch and
> > > NR_IRQS for other platforms.
> >
> > The IRQ experts can chime in on this, but this doesn't feel right to
> > me.  I assume arch code should set things up so only valid IRQ numbers
> > can be allocated.  This doesn't seem necessarily PCI-specific, I'd
> > prefer to avoid an arch #ifdef here, and I'd also prefer to avoid a
> > command-line parameter that users have to discover and supply.
>
> The problem we meet: LoongArch machines can have as many as 256
> logical cpus, and the maximum of msi vectors is 192. Even on a 64-core
> machine, 192 irqs can be easily exhausted if there are several NICs
> (NIC usually allocates msi irqs depending on the number of online
> cpus). So we want to limit the msi allocation.
> 
> This is not a LoongArch-specific problem, because I think other
> platforms can also meet if they have many NICs. But of course,
> LoongArch can meet it more easily because the available msi vectors
> are very few. So, adding a cmdline parameter is somewhat reasonable.

The patch contains "#ifdef CONFIG_LOONGARCH", which makes this
solution LoongArch-specific.  I'm not willing for that yet.

It sounds like the LoongArch MSI limit is known at compile-time, or at
least at boot-time, so the kernel ought to be able to figure out what
to do without a command-line parameter.

> After some investigation, I think it may be possible to modify
> drivers/irqchip/irq-loongson-pch-msi.c and override
> msi_domain_info::domain_alloc_irqs() to limit msi allocation. However,
> doing that need to remove the "static" before
> __msi_domain_alloc_irqs(), which means revert
> 762687ceb31fc296e2e1406559e8bb5 ("genirq/msi: Make
> __msi_domain_alloc_irqs() static"), I don't know whether that is
> acceptable.

I guess you mean msi_domain_ops::domain_alloc_irqs() (not
msi_domain_info).  If this is really a generic problem, I'm surprised
that no other arch has needed to override .domain_alloc_irqs().

I think you'll have better luck getting feedback if you can post the
complete working patch.  At the very least, you'll learn more about
the problem by doing that.

Bjorn
Huacai Chen May 26, 2023, 8:17 a.m. UTC | #5
Hi, Bjorn and Marc,

On Fri, May 26, 2023 at 5:40 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Thu, May 25, 2023 at 05:14:28PM +0800, Huacai Chen wrote:
> > On Wed, May 24, 2023 at 11:21 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Wed, May 24, 2023 at 05:36:23PM +0800, Huacai Chen wrote:
> > > > Some platforms (such as LoongArch) cannot provide enough irq numbers as
> > > > many as logical cpu numbers. So we should limit pci irq numbers when
> > > > allocate msi/msix vectors, otherwise some device drivers may fail at
> > > > initialization. This patch add a cmdline parameter "pci_irq_limit=xxxx"
> > > > to control the limit.
> > > >
> > > > The default pci msi/msix number limit is defined 32 for LoongArch and
> > > > NR_IRQS for other platforms.
> > >
> > > The IRQ experts can chime in on this, but this doesn't feel right to
> > > me.  I assume arch code should set things up so only valid IRQ numbers
> > > can be allocated.  This doesn't seem necessarily PCI-specific, I'd
> > > prefer to avoid an arch #ifdef here, and I'd also prefer to avoid a
> > > command-line parameter that users have to discover and supply.
> >
> > The problem we meet: LoongArch machines can have as many as 256
> > logical cpus, and the maximum of msi vectors is 192. Even on a 64-core
> > machine, 192 irqs can be easily exhausted if there are several NICs
> > (NIC usually allocates msi irqs depending on the number of online
> > cpus). So we want to limit the msi allocation.
> >
> > This is not a LoongArch-specific problem, because I think other
> > platforms can also meet if they have many NICs. But of course,
> > LoongArch can meet it more easily because the available msi vectors
> > are very few. So, adding a cmdline parameter is somewhat reasonable.
>
> The patch contains "#ifdef CONFIG_LOONGARCH", which makes this
> solution LoongArch-specific.  I'm not willing for that yet.
>
> It sounds like the LoongArch MSI limit is known at compile-time, or at
> least at boot-time, so the kernel ought to be able to figure out what
> to do without a command-line parameter.
>
> > After some investigation, I think it may be possible to modify
> > drivers/irqchip/irq-loongson-pch-msi.c and override
> > msi_domain_info::domain_alloc_irqs() to limit msi allocation. However,
> > doing that need to remove the "static" before
> > __msi_domain_alloc_irqs(), which means revert
> > 762687ceb31fc296e2e1406559e8bb5 ("genirq/msi: Make
> > __msi_domain_alloc_irqs() static"), I don't know whether that is
> > acceptable.
>
> I guess you mean msi_domain_ops::domain_alloc_irqs() (not
> msi_domain_info).  If this is really a generic problem, I'm surprised
> that no other arch has needed to override .domain_alloc_irqs().
Yes, I mean msi_domain_ops::domain_alloc_irqs() here.

>
> I think you'll have better luck getting feedback if you can post the
> complete working patch.  At the very least, you'll learn more about
> the problem by doing that.
Emm, I found I can do some small modification on
msi_domain_prepare_irqs(), and solve the problem by overriding
msi_domain_ops::msi_prepare(), thanks. And patches is coming soon.

Huacai
>
> Bjorn
Manivannan Sadhasivam May 28, 2023, 4:57 p.m. UTC | #6
On Thu, May 25, 2023 at 05:14:28PM +0800, Huacai Chen wrote:
> Hi, Bjorn,
> 
> On Wed, May 24, 2023 at 11:21 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > [+cc Marc, LKML]
> >
> > On Wed, May 24, 2023 at 05:36:23PM +0800, Huacai Chen wrote:
> > > Some platforms (such as LoongArch) cannot provide enough irq numbers as
> > > many as logical cpu numbers. So we should limit pci irq numbers when
> > > allocate msi/msix vectors, otherwise some device drivers may fail at
> > > initialization. This patch add a cmdline parameter "pci_irq_limit=xxxx"
> > > to control the limit.
> > >
> > > The default pci msi/msix number limit is defined 32 for LoongArch and
> > > NR_IRQS for other platforms.
> >
> > The IRQ experts can chime in on this, but this doesn't feel right to
> > me.  I assume arch code should set things up so only valid IRQ numbers
> > can be allocated.  This doesn't seem necessarily PCI-specific, I'd
> > prefer to avoid an arch #ifdef here, and I'd also prefer to avoid a
> > command-line parameter that users have to discover and supply.
> The problem we meet: LoongArch machines can have as many as 256
> logical cpus, and the maximum of msi vectors is 192. Even on a 64-core
> machine, 192 irqs can be easily exhausted if there are several NICs
> (NIC usually allocates msi irqs depending on the number of online
> cpus). So we want to limit the msi allocation.
> 

If the MSI allocation fails with multiple vectors, then the NIC driver should
revert to a single MSI vector. Is that happening in your case?

- Mani

> This is not a LoongArch-specific problem, because I think other
> platforms can also meet if they have many NICs. But of course,
> LoongArch can meet it more easily because the available msi vectors
> are very few. So, adding a cmdline parameter is somewhat reasonable.
> 
> After some investigation, I think it may be possible to modify
> drivers/irqchip/irq-loongson-pch-msi.c and override
> msi_domain_info::domain_alloc_irqs() to limit msi allocation. However,
> doing that need to remove the "static" before
> __msi_domain_alloc_irqs(), which means revert
> 762687ceb31fc296e2e1406559e8bb5 ("genirq/msi: Make
> __msi_domain_alloc_irqs() static"), I don't know whether that is
> acceptable.
> 
> If such a revert is not acceptable, it seems that we can only use the
> method in this patch. Maybe rename pci_irq_limits to pci_msi_limits is
> a little better.
> 
> Huacai
> 
> >
> > > Signed-off-by: Juxin Gao <gaojuxin@loongson.cn>
> > > Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
> > > ---
> > >  drivers/pci/msi/msi.c | 26 +++++++++++++++++++++++++-
> > >  1 file changed, 25 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
> > > index ef1d8857a51b..6617381e50e7 100644
> > > --- a/drivers/pci/msi/msi.c
> > > +++ b/drivers/pci/msi/msi.c
> > > @@ -402,12 +402,34 @@ static int msi_capability_init(struct pci_dev *dev, int nvec,
> > >       return ret;
> > >  }
> > >
> > > +#ifdef CONFIG_LOONGARCH
> > > +#define DEFAULT_PCI_IRQ_LIMITS 32
> > > +#else
> > > +#define DEFAULT_PCI_IRQ_LIMITS NR_IRQS
> > > +#endif
> > > +
> > > +static int pci_irq_limits = DEFAULT_PCI_IRQ_LIMITS;
> > > +
> > > +static int __init pci_irq_limit(char *str)
> > > +{
> > > +     get_option(&str, &pci_irq_limits);
> > > +
> > > +     if (pci_irq_limits == 0)
> > > +             pci_irq_limits = DEFAULT_PCI_IRQ_LIMITS;
> > > +
> > > +     return 0;
> > > +}
> > > +
> > > +early_param("pci_irq_limit", pci_irq_limit);
> > > +
> > >  int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
> > >                          struct irq_affinity *affd)
> > >  {
> > >       int nvec;
> > >       int rc;
> > >
> > > +     maxvec = clamp_val(maxvec, 0, pci_irq_limits);
> > > +
> > >       if (!pci_msi_supported(dev, minvec) || dev->current_state != PCI_D0)
> > >               return -EINVAL;
> > >
> > > @@ -776,7 +798,9 @@ static bool pci_msix_validate_entries(struct pci_dev *dev, struct msix_entry *en
> > >  int __pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries, int minvec,
> > >                           int maxvec, struct irq_affinity *affd, int flags)
> > >  {
> > > -     int hwsize, rc, nvec = maxvec;
> > > +     int hwsize, rc, nvec;
> > > +
> > > +     nvec = clamp_val(maxvec, 0, pci_irq_limits);
> > >
> > >       if (maxvec < minvec)
> > >               return -ERANGE;
> > > --
> > > 2.39.1
> > >
Huacai Chen May 29, 2023, 2:02 a.m. UTC | #7
Hi, Manivannan,

On Mon, May 29, 2023 at 12:57 AM Manivannan Sadhasivam
<manivannan.sadhasivam@linaro.org> wrote:
>
> On Thu, May 25, 2023 at 05:14:28PM +0800, Huacai Chen wrote:
> > Hi, Bjorn,
> >
> > On Wed, May 24, 2023 at 11:21 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > >
> > > [+cc Marc, LKML]
> > >
> > > On Wed, May 24, 2023 at 05:36:23PM +0800, Huacai Chen wrote:
> > > > Some platforms (such as LoongArch) cannot provide enough irq numbers as
> > > > many as logical cpu numbers. So we should limit pci irq numbers when
> > > > allocate msi/msix vectors, otherwise some device drivers may fail at
> > > > initialization. This patch add a cmdline parameter "pci_irq_limit=xxxx"
> > > > to control the limit.
> > > >
> > > > The default pci msi/msix number limit is defined 32 for LoongArch and
> > > > NR_IRQS for other platforms.
> > >
> > > The IRQ experts can chime in on this, but this doesn't feel right to
> > > me.  I assume arch code should set things up so only valid IRQ numbers
> > > can be allocated.  This doesn't seem necessarily PCI-specific, I'd
> > > prefer to avoid an arch #ifdef here, and I'd also prefer to avoid a
> > > command-line parameter that users have to discover and supply.
> > The problem we meet: LoongArch machines can have as many as 256
> > logical cpus, and the maximum of msi vectors is 192. Even on a 64-core
> > machine, 192 irqs can be easily exhausted if there are several NICs
> > (NIC usually allocates msi irqs depending on the number of online
> > cpus). So we want to limit the msi allocation.
> >
>
> If the MSI allocation fails with multiple vectors, then the NIC driver should
> revert to a single MSI vector. Is that happening in your case?
Thank you for pointing this out. Yes, I know  most existing drivers
will fallback to use single msi or legacy irqs when failed. However,
as I
replied in another thread (the new solution of this problem [1]), we
want to do some proactive throttling rather than consume msi vectors
aggressively. For example, if we have two NICs, we want both of them
to get 32 msi vectors; not one exhaust all available vectors, and the
other fallback to use single msi or legacy irq.

I hope I have explained clearly, thanks.

[1] https://lore.kernel.org/lkml/20230527054633.704916-1-chenhuacai@loongson.cn/T/#t

Huacai
>
> - Mani
>
> > This is not a LoongArch-specific problem, because I think other
> > platforms can also meet if they have many NICs. But of course,
> > LoongArch can meet it more easily because the available msi vectors
> > are very few. So, adding a cmdline parameter is somewhat reasonable.
> >
> > After some investigation, I think it may be possible to modify
> > drivers/irqchip/irq-loongson-pch-msi.c and override
> > msi_domain_info::domain_alloc_irqs() to limit msi allocation. However,
> > doing that need to remove the "static" before
> > __msi_domain_alloc_irqs(), which means revert
> > 762687ceb31fc296e2e1406559e8bb5 ("genirq/msi: Make
> > __msi_domain_alloc_irqs() static"), I don't know whether that is
> > acceptable.
> >
> > If such a revert is not acceptable, it seems that we can only use the
> > method in this patch. Maybe rename pci_irq_limits to pci_msi_limits is
> > a little better.
> >
> > Huacai
> >
> > >
> > > > Signed-off-by: Juxin Gao <gaojuxin@loongson.cn>
> > > > Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
> > > > ---
> > > >  drivers/pci/msi/msi.c | 26 +++++++++++++++++++++++++-
> > > >  1 file changed, 25 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
> > > > index ef1d8857a51b..6617381e50e7 100644
> > > > --- a/drivers/pci/msi/msi.c
> > > > +++ b/drivers/pci/msi/msi.c
> > > > @@ -402,12 +402,34 @@ static int msi_capability_init(struct pci_dev *dev, int nvec,
> > > >       return ret;
> > > >  }
> > > >
> > > > +#ifdef CONFIG_LOONGARCH
> > > > +#define DEFAULT_PCI_IRQ_LIMITS 32
> > > > +#else
> > > > +#define DEFAULT_PCI_IRQ_LIMITS NR_IRQS
> > > > +#endif
> > > > +
> > > > +static int pci_irq_limits = DEFAULT_PCI_IRQ_LIMITS;
> > > > +
> > > > +static int __init pci_irq_limit(char *str)
> > > > +{
> > > > +     get_option(&str, &pci_irq_limits);
> > > > +
> > > > +     if (pci_irq_limits == 0)
> > > > +             pci_irq_limits = DEFAULT_PCI_IRQ_LIMITS;
> > > > +
> > > > +     return 0;
> > > > +}
> > > > +
> > > > +early_param("pci_irq_limit", pci_irq_limit);
> > > > +
> > > >  int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
> > > >                          struct irq_affinity *affd)
> > > >  {
> > > >       int nvec;
> > > >       int rc;
> > > >
> > > > +     maxvec = clamp_val(maxvec, 0, pci_irq_limits);
> > > > +
> > > >       if (!pci_msi_supported(dev, minvec) || dev->current_state != PCI_D0)
> > > >               return -EINVAL;
> > > >
> > > > @@ -776,7 +798,9 @@ static bool pci_msix_validate_entries(struct pci_dev *dev, struct msix_entry *en
> > > >  int __pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries, int minvec,
> > > >                           int maxvec, struct irq_affinity *affd, int flags)
> > > >  {
> > > > -     int hwsize, rc, nvec = maxvec;
> > > > +     int hwsize, rc, nvec;
> > > > +
> > > > +     nvec = clamp_val(maxvec, 0, pci_irq_limits);
> > > >
> > > >       if (maxvec < minvec)
> > > >               return -ERANGE;
> > > > --
> > > > 2.39.1
> > > >
>
> --
> மணிவண்ணன் சதாசிவம்
Manivannan Sadhasivam May 29, 2023, 5:39 a.m. UTC | #8
On Mon, May 29, 2023 at 10:02:20AM +0800, Huacai Chen wrote:
> Hi, Manivannan,
> 
> On Mon, May 29, 2023 at 12:57 AM Manivannan Sadhasivam
> <manivannan.sadhasivam@linaro.org> wrote:
> >
> > On Thu, May 25, 2023 at 05:14:28PM +0800, Huacai Chen wrote:
> > > Hi, Bjorn,
> > >
> > > On Wed, May 24, 2023 at 11:21 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > >
> > > > [+cc Marc, LKML]
> > > >
> > > > On Wed, May 24, 2023 at 05:36:23PM +0800, Huacai Chen wrote:
> > > > > Some platforms (such as LoongArch) cannot provide enough irq numbers as
> > > > > many as logical cpu numbers. So we should limit pci irq numbers when
> > > > > allocate msi/msix vectors, otherwise some device drivers may fail at
> > > > > initialization. This patch add a cmdline parameter "pci_irq_limit=xxxx"
> > > > > to control the limit.
> > > > >
> > > > > The default pci msi/msix number limit is defined 32 for LoongArch and
> > > > > NR_IRQS for other platforms.
> > > >
> > > > The IRQ experts can chime in on this, but this doesn't feel right to
> > > > me.  I assume arch code should set things up so only valid IRQ numbers
> > > > can be allocated.  This doesn't seem necessarily PCI-specific, I'd
> > > > prefer to avoid an arch #ifdef here, and I'd also prefer to avoid a
> > > > command-line parameter that users have to discover and supply.
> > > The problem we meet: LoongArch machines can have as many as 256
> > > logical cpus, and the maximum of msi vectors is 192. Even on a 64-core
> > > machine, 192 irqs can be easily exhausted if there are several NICs
> > > (NIC usually allocates msi irqs depending on the number of online
> > > cpus). So we want to limit the msi allocation.
> > >
> >
> > If the MSI allocation fails with multiple vectors, then the NIC driver should
> > revert to a single MSI vector. Is that happening in your case?
> Thank you for pointing this out. Yes, I know  most existing drivers
> will fallback to use single msi or legacy irqs when failed. However,
> as I
> replied in another thread (the new solution of this problem [1]), we
> want to do some proactive throttling rather than consume msi vectors
> aggressively. For example, if we have two NICs, we want both of them
> to get 32 msi vectors; not one exhaust all available vectors, and the
> other fallback to use single msi or legacy irq.
> 
> I hope I have explained clearly, thanks.
> 

The problem you are facing is not specific to Loongsoon but rather generic. And
the solution we have currently is what you were also aware of it seems. So if
you want to propose an alternative solution, it should be generic and also a
good justification needs to be provided to the maintainers i.e., comparing two
solutions and why yours is better.

But IMO what you are proposing seems like usecase driven and may not work all
the time due to architecture limitation. This again proves that the existing
solution is sufficient enough.

- Mani

> [1] https://lore.kernel.org/lkml/20230527054633.704916-1-chenhuacai@loongson.cn/T/#t
> 
> Huacai
> >
> > - Mani
> >
> > > This is not a LoongArch-specific problem, because I think other
> > > platforms can also meet if they have many NICs. But of course,
> > > LoongArch can meet it more easily because the available msi vectors
> > > are very few. So, adding a cmdline parameter is somewhat reasonable.
> > >
> > > After some investigation, I think it may be possible to modify
> > > drivers/irqchip/irq-loongson-pch-msi.c and override
> > > msi_domain_info::domain_alloc_irqs() to limit msi allocation. However,
> > > doing that need to remove the "static" before
> > > __msi_domain_alloc_irqs(), which means revert
> > > 762687ceb31fc296e2e1406559e8bb5 ("genirq/msi: Make
> > > __msi_domain_alloc_irqs() static"), I don't know whether that is
> > > acceptable.
> > >
> > > If such a revert is not acceptable, it seems that we can only use the
> > > method in this patch. Maybe rename pci_irq_limits to pci_msi_limits is
> > > a little better.
> > >
> > > Huacai
> > >
> > > >
> > > > > Signed-off-by: Juxin Gao <gaojuxin@loongson.cn>
> > > > > Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
> > > > > ---
> > > > >  drivers/pci/msi/msi.c | 26 +++++++++++++++++++++++++-
> > > > >  1 file changed, 25 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
> > > > > index ef1d8857a51b..6617381e50e7 100644
> > > > > --- a/drivers/pci/msi/msi.c
> > > > > +++ b/drivers/pci/msi/msi.c
> > > > > @@ -402,12 +402,34 @@ static int msi_capability_init(struct pci_dev *dev, int nvec,
> > > > >       return ret;
> > > > >  }
> > > > >
> > > > > +#ifdef CONFIG_LOONGARCH
> > > > > +#define DEFAULT_PCI_IRQ_LIMITS 32
> > > > > +#else
> > > > > +#define DEFAULT_PCI_IRQ_LIMITS NR_IRQS
> > > > > +#endif
> > > > > +
> > > > > +static int pci_irq_limits = DEFAULT_PCI_IRQ_LIMITS;
> > > > > +
> > > > > +static int __init pci_irq_limit(char *str)
> > > > > +{
> > > > > +     get_option(&str, &pci_irq_limits);
> > > > > +
> > > > > +     if (pci_irq_limits == 0)
> > > > > +             pci_irq_limits = DEFAULT_PCI_IRQ_LIMITS;
> > > > > +
> > > > > +     return 0;
> > > > > +}
> > > > > +
> > > > > +early_param("pci_irq_limit", pci_irq_limit);
> > > > > +
> > > > >  int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
> > > > >                          struct irq_affinity *affd)
> > > > >  {
> > > > >       int nvec;
> > > > >       int rc;
> > > > >
> > > > > +     maxvec = clamp_val(maxvec, 0, pci_irq_limits);
> > > > > +
> > > > >       if (!pci_msi_supported(dev, minvec) || dev->current_state != PCI_D0)
> > > > >               return -EINVAL;
> > > > >
> > > > > @@ -776,7 +798,9 @@ static bool pci_msix_validate_entries(struct pci_dev *dev, struct msix_entry *en
> > > > >  int __pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries, int minvec,
> > > > >                           int maxvec, struct irq_affinity *affd, int flags)
> > > > >  {
> > > > > -     int hwsize, rc, nvec = maxvec;
> > > > > +     int hwsize, rc, nvec;
> > > > > +
> > > > > +     nvec = clamp_val(maxvec, 0, pci_irq_limits);
> > > > >
> > > > >       if (maxvec < minvec)
> > > > >               return -ERANGE;
> > > > > --
> > > > > 2.39.1
> > > > >
> >
> > --
> > மணிவண்ணன் சதாசிவம்
Huacai Chen May 29, 2023, 6:52 a.m. UTC | #9
Hi, Manivannan,

On Mon, May 29, 2023 at 1:39 PM Manivannan Sadhasivam
<manivannan.sadhasivam@linaro.org> wrote:
>
> On Mon, May 29, 2023 at 10:02:20AM +0800, Huacai Chen wrote:
> > Hi, Manivannan,
> >
> > On Mon, May 29, 2023 at 12:57 AM Manivannan Sadhasivam
> > <manivannan.sadhasivam@linaro.org> wrote:
> > >
> > > On Thu, May 25, 2023 at 05:14:28PM +0800, Huacai Chen wrote:
> > > > Hi, Bjorn,
> > > >
> > > > On Wed, May 24, 2023 at 11:21 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > >
> > > > > [+cc Marc, LKML]
> > > > >
> > > > > On Wed, May 24, 2023 at 05:36:23PM +0800, Huacai Chen wrote:
> > > > > > Some platforms (such as LoongArch) cannot provide enough irq numbers as
> > > > > > many as logical cpu numbers. So we should limit pci irq numbers when
> > > > > > allocate msi/msix vectors, otherwise some device drivers may fail at
> > > > > > initialization. This patch add a cmdline parameter "pci_irq_limit=xxxx"
> > > > > > to control the limit.
> > > > > >
> > > > > > The default pci msi/msix number limit is defined 32 for LoongArch and
> > > > > > NR_IRQS for other platforms.
> > > > >
> > > > > The IRQ experts can chime in on this, but this doesn't feel right to
> > > > > me.  I assume arch code should set things up so only valid IRQ numbers
> > > > > can be allocated.  This doesn't seem necessarily PCI-specific, I'd
> > > > > prefer to avoid an arch #ifdef here, and I'd also prefer to avoid a
> > > > > command-line parameter that users have to discover and supply.
> > > > The problem we meet: LoongArch machines can have as many as 256
> > > > logical cpus, and the maximum of msi vectors is 192. Even on a 64-core
> > > > machine, 192 irqs can be easily exhausted if there are several NICs
> > > > (NIC usually allocates msi irqs depending on the number of online
> > > > cpus). So we want to limit the msi allocation.
> > > >
> > >
> > > If the MSI allocation fails with multiple vectors, then the NIC driver should
> > > revert to a single MSI vector. Is that happening in your case?
> > Thank you for pointing this out. Yes, I know  most existing drivers
> > will fallback to use single msi or legacy irqs when failed. However,
> > as I
> > replied in another thread (the new solution of this problem [1]), we
> > want to do some proactive throttling rather than consume msi vectors
> > aggressively. For example, if we have two NICs, we want both of them
> > to get 32 msi vectors; not one exhaust all available vectors, and the
> > other fallback to use single msi or legacy irq.
> >
> > I hope I have explained clearly, thanks.
> >
>
> The problem you are facing is not specific to Loongsoon but rather generic. And
> the solution we have currently is what you were also aware of it seems. So if
> you want to propose an alternative solution, it should be generic and also a
> good justification needs to be provided to the maintainers i.e., comparing two
> solutions and why yours is better.
Yes, I think we are facing a generic problem, but it is more obvious
on platforms which provide less MSI vectors. And my solution seems
generic enough. :)

At least in my example, "proactive throttling" is better than
"aggressive consuming", because two (or more) NICs have more balanced
throughput.

>
> But IMO what you are proposing seems like usecase driven and may not work all
> the time due to architecture limitation. This again proves that the existing
> solution is sufficient enough.
Yes, it's a usecase driven solution, so I provide a cmdline parameter
to let the user decide.

Huacai
>
> - Mani
>
> > [1] https://lore.kernel.org/lkml/20230527054633.704916-1-chenhuacai@loongson.cn/T/#t
> >
> > Huacai
> > >
> > > - Mani
> > >
> > > > This is not a LoongArch-specific problem, because I think other
> > > > platforms can also meet if they have many NICs. But of course,
> > > > LoongArch can meet it more easily because the available msi vectors
> > > > are very few. So, adding a cmdline parameter is somewhat reasonable.
> > > >
> > > > After some investigation, I think it may be possible to modify
> > > > drivers/irqchip/irq-loongson-pch-msi.c and override
> > > > msi_domain_info::domain_alloc_irqs() to limit msi allocation. However,
> > > > doing that need to remove the "static" before
> > > > __msi_domain_alloc_irqs(), which means revert
> > > > 762687ceb31fc296e2e1406559e8bb5 ("genirq/msi: Make
> > > > __msi_domain_alloc_irqs() static"), I don't know whether that is
> > > > acceptable.
> > > >
> > > > If such a revert is not acceptable, it seems that we can only use the
> > > > method in this patch. Maybe rename pci_irq_limits to pci_msi_limits is
> > > > a little better.
> > > >
> > > > Huacai
> > > >
> > > > >
> > > > > > Signed-off-by: Juxin Gao <gaojuxin@loongson.cn>
> > > > > > Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
> > > > > > ---
> > > > > >  drivers/pci/msi/msi.c | 26 +++++++++++++++++++++++++-
> > > > > >  1 file changed, 25 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
> > > > > > index ef1d8857a51b..6617381e50e7 100644
> > > > > > --- a/drivers/pci/msi/msi.c
> > > > > > +++ b/drivers/pci/msi/msi.c
> > > > > > @@ -402,12 +402,34 @@ static int msi_capability_init(struct pci_dev *dev, int nvec,
> > > > > >       return ret;
> > > > > >  }
> > > > > >
> > > > > > +#ifdef CONFIG_LOONGARCH
> > > > > > +#define DEFAULT_PCI_IRQ_LIMITS 32
> > > > > > +#else
> > > > > > +#define DEFAULT_PCI_IRQ_LIMITS NR_IRQS
> > > > > > +#endif
> > > > > > +
> > > > > > +static int pci_irq_limits = DEFAULT_PCI_IRQ_LIMITS;
> > > > > > +
> > > > > > +static int __init pci_irq_limit(char *str)
> > > > > > +{
> > > > > > +     get_option(&str, &pci_irq_limits);
> > > > > > +
> > > > > > +     if (pci_irq_limits == 0)
> > > > > > +             pci_irq_limits = DEFAULT_PCI_IRQ_LIMITS;
> > > > > > +
> > > > > > +     return 0;
> > > > > > +}
> > > > > > +
> > > > > > +early_param("pci_irq_limit", pci_irq_limit);
> > > > > > +
> > > > > >  int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
> > > > > >                          struct irq_affinity *affd)
> > > > > >  {
> > > > > >       int nvec;
> > > > > >       int rc;
> > > > > >
> > > > > > +     maxvec = clamp_val(maxvec, 0, pci_irq_limits);
> > > > > > +
> > > > > >       if (!pci_msi_supported(dev, minvec) || dev->current_state != PCI_D0)
> > > > > >               return -EINVAL;
> > > > > >
> > > > > > @@ -776,7 +798,9 @@ static bool pci_msix_validate_entries(struct pci_dev *dev, struct msix_entry *en
> > > > > >  int __pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries, int minvec,
> > > > > >                           int maxvec, struct irq_affinity *affd, int flags)
> > > > > >  {
> > > > > > -     int hwsize, rc, nvec = maxvec;
> > > > > > +     int hwsize, rc, nvec;
> > > > > > +
> > > > > > +     nvec = clamp_val(maxvec, 0, pci_irq_limits);
> > > > > >
> > > > > >       if (maxvec < minvec)
> > > > > >               return -ERANGE;
> > > > > > --
> > > > > > 2.39.1
> > > > > >
> > >
> > > --
> > > மணிவண்ணன் சதாசிவம்
>
> --
> மணிவண்ணன் சதாசிவம்
Jason Gunthorpe May 31, 2023, 7:14 p.m. UTC | #10
On Mon, May 29, 2023 at 02:52:29PM +0800, Huacai Chen wrote:

> > But IMO what you are proposing seems like usecase driven and may not work all
> > the time due to architecture limitation. This again proves that the existing
> > solution is sufficient enough.

> Yes, it's a usecase driven solution, so I provide a cmdline parameter
> to let the user decide.

The NIC drivers should be consuming interrupts based on the number of
queues they are using, and that is something you can control from the
command line, eg ethtool IIRC. Usually it defaults to the number of
CPUs.

Basically, you want to enable the user to configure the system with a
user specified reduced number of NIC queues, and we already have way
to do that.

Jason
Huacai Chen June 1, 2023, 4:19 a.m. UTC | #11
Hi, Jason,

On Thu, Jun 1, 2023 at 3:14 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Mon, May 29, 2023 at 02:52:29PM +0800, Huacai Chen wrote:
>
> > > But IMO what you are proposing seems like usecase driven and may not work all
> > > the time due to architecture limitation. This again proves that the existing
> > > solution is sufficient enough.
>
> > Yes, it's a usecase driven solution, so I provide a cmdline parameter
> > to let the user decide.
>
> The NIC drivers should be consuming interrupts based on the number of
> queues they are using, and that is something you can control from the
> command line, eg ethtool IIRC. Usually it defaults to the number of
> CPUs.
>
> Basically, you want to enable the user to configure the system with a
> user specified reduced number of NIC queues, and we already have way
> to do that.
Yes, ethtool is a possible way, thank you very much.

Huacai
>
> Jason
diff mbox series

Patch

diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
index ef1d8857a51b..6617381e50e7 100644
--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -402,12 +402,34 @@  static int msi_capability_init(struct pci_dev *dev, int nvec,
 	return ret;
 }
 
+#ifdef CONFIG_LOONGARCH
+#define DEFAULT_PCI_IRQ_LIMITS 32
+#else
+#define DEFAULT_PCI_IRQ_LIMITS NR_IRQS
+#endif
+
+static int pci_irq_limits = DEFAULT_PCI_IRQ_LIMITS;
+
+static int __init pci_irq_limit(char *str)
+{
+	get_option(&str, &pci_irq_limits);
+
+	if (pci_irq_limits == 0)
+		pci_irq_limits = DEFAULT_PCI_IRQ_LIMITS;
+
+	return 0;
+}
+
+early_param("pci_irq_limit", pci_irq_limit);
+
 int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
 			   struct irq_affinity *affd)
 {
 	int nvec;
 	int rc;
 
+	maxvec = clamp_val(maxvec, 0, pci_irq_limits);
+
 	if (!pci_msi_supported(dev, minvec) || dev->current_state != PCI_D0)
 		return -EINVAL;
 
@@ -776,7 +798,9 @@  static bool pci_msix_validate_entries(struct pci_dev *dev, struct msix_entry *en
 int __pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries, int minvec,
 			    int maxvec, struct irq_affinity *affd, int flags)
 {
-	int hwsize, rc, nvec = maxvec;
+	int hwsize, rc, nvec;
+
+	nvec = clamp_val(maxvec, 0, pci_irq_limits);
 
 	if (maxvec < minvec)
 		return -ERANGE;