Message ID: 20230524093623.3698134-1-chenhuacai@loongson.cn
State: New
Series: pci: irq: Add an early parameter to limit pci irq numbers
[+cc Marc, LKML]

On Wed, May 24, 2023 at 05:36:23PM +0800, Huacai Chen wrote:
> Some platforms (such as LoongArch) cannot provide as many irq numbers
> as logical cpu numbers, so we should limit the number of PCI irqs when
> allocating msi/msix vectors, otherwise some device drivers may fail at
> initialization. This patch adds a cmdline parameter "pci_irq_limit=xxxx"
> to control the limit.
>
> The default pci msi/msix limit is defined as 32 for LoongArch and
> NR_IRQS for other platforms.

The IRQ experts can chime in on this, but this doesn't feel right to me.
I assume arch code should set things up so only valid IRQ numbers can be
allocated. This doesn't seem necessarily PCI-specific, I'd prefer to
avoid an arch #ifdef here, and I'd also prefer to avoid a command-line
parameter that users have to discover and supply.

> Signed-off-by: Juxin Gao <gaojuxin@loongson.cn>
> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
> ---
>  drivers/pci/msi/msi.c | 26 +++++++++++++++++++++++++-
>  1 file changed, 25 insertions(+), 1 deletion(-)
On Wed, 24 May 2023 16:21:09 +0100, Bjorn Helgaas <helgaas@kernel.org> wrote:
> The IRQ experts can chime in on this, but this doesn't feel right to me.
> I assume arch code should set things up so only valid IRQ numbers can be
> allocated. This doesn't seem necessarily PCI-specific, I'd prefer to
> avoid an arch #ifdef here, and I'd also prefer to avoid a command-line
> parameter that users have to discover and supply.

I'd tend to agree. The irqchip driver that provides the interrupt numbers
should perform the capping, not the core PCI code.

Furthermore, MSI allocation is never guaranteed anyway, and drivers
shouldn't expect to get all the interrupts they request. If they do, they
need fixing.

Overall, interrupt architectures that provide as few as 32 possible
interrupts for MSIs will be crippled, and there isn't much we can do about
it. This also applies to a large number of ARM systems that use the
Designware IP.

M.
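(Editorial note: Marc's point that drivers must tolerate getting fewer vectors than they ask for is the standard pci_alloc_irq_vectors() contract. The following is a minimal sketch of that driver-side pattern, not code from this thread; the helper name and the "want" parameter are invented for illustration, only pci_alloc_irq_vectors() and pci_irq_vector() are real kernel APIs.)

/*
 * Illustrative sketch: request a range of MSI-X/MSI vectors and size the
 * driver's queues to whatever the platform actually grants, falling back
 * as far as a single legacy INTx interrupt.
 */
#include <linux/pci.h>
#include <linux/interrupt.h>

static int example_nic_setup_irqs(struct pci_dev *pdev, unsigned int want)
{
	int nvec;

	nvec = pci_alloc_irq_vectors(pdev, 1, want,
				     PCI_IRQ_MSIX | PCI_IRQ_MSI | PCI_IRQ_LEGACY);
	if (nvec < 0)
		return nvec;	/* not even one interrupt available */

	/*
	 * 'nvec' may be far smaller than 'want'; create that many queues and
	 * use pci_irq_vector(pdev, i) to get the Linux IRQ number for vector
	 * i when calling request_irq().
	 */
	return nvec;
}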
Hi, Bjorn,

On Wed, May 24, 2023 at 11:21 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> The IRQ experts can chime in on this, but this doesn't feel right to me.
> I assume arch code should set things up so only valid IRQ numbers can be
> allocated. This doesn't seem necessarily PCI-specific, I'd prefer to
> avoid an arch #ifdef here, and I'd also prefer to avoid a command-line
> parameter that users have to discover and supply.

The problem we meet: LoongArch machines can have as many as 256 logical
cpus, but the maximum number of MSI vectors is 192. Even on a 64-core
machine, 192 irqs can easily be exhausted if there are several NICs (a NIC
usually allocates MSI irqs depending on the number of online cpus). So we
want to limit the MSI allocation.

This is not a LoongArch-specific problem; I think other platforms can also
hit it if they have many NICs. But of course, LoongArch hits it more
easily because the available MSI vectors are very few. So, adding a
cmdline parameter is somewhat reasonable.

After some investigation, I think it may be possible to modify
drivers/irqchip/irq-loongson-pch-msi.c and override
msi_domain_info::domain_alloc_irqs() to limit MSI allocation. However,
doing that needs removing the "static" before __msi_domain_alloc_irqs(),
which means reverting 762687ceb31fc296e2e1406559e8bb5 ("genirq/msi: Make
__msi_domain_alloc_irqs() static"); I don't know whether that is
acceptable.

If such a revert is not acceptable, it seems that we can only use the
method in this patch. Maybe renaming pci_irq_limits to pci_msi_limits
would be a little better.

Huacai
On Thu, May 25, 2023 at 05:14:28PM +0800, Huacai Chen wrote:
> The problem we meet: LoongArch machines can have as many as 256 logical
> cpus, but the maximum number of MSI vectors is 192. Even on a 64-core
> machine, 192 irqs can easily be exhausted if there are several NICs (a
> NIC usually allocates MSI irqs depending on the number of online cpus).
> So we want to limit the MSI allocation.
>
> This is not a LoongArch-specific problem; I think other platforms can
> also hit it if they have many NICs. But of course, LoongArch hits it
> more easily because the available MSI vectors are very few. So, adding a
> cmdline parameter is somewhat reasonable.

The patch contains "#ifdef CONFIG_LOONGARCH", which makes this solution
LoongArch-specific. I'm not willing for that yet.

It sounds like the LoongArch MSI limit is known at compile-time, or at
least at boot-time, so the kernel ought to be able to figure out what to
do without a command-line parameter.

> After some investigation, I think it may be possible to modify
> drivers/irqchip/irq-loongson-pch-msi.c and override
> msi_domain_info::domain_alloc_irqs() to limit MSI allocation. However,
> doing that needs removing the "static" before __msi_domain_alloc_irqs(),
> which means reverting 762687ceb31fc296e2e1406559e8bb5 ("genirq/msi: Make
> __msi_domain_alloc_irqs() static"); I don't know whether that is
> acceptable.

I guess you mean msi_domain_ops::domain_alloc_irqs() (not
msi_domain_info). If this is really a generic problem, I'm surprised that
no other arch has needed to override .domain_alloc_irqs().

I think you'll have better luck getting feedback if you can post the
complete working patch. At the very least, you'll learn more about the
problem by doing that.

Bjorn
Hi, Bjorn and Marc,

On Fri, May 26, 2023 at 5:40 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> I guess you mean msi_domain_ops::domain_alloc_irqs() (not
> msi_domain_info). If this is really a generic problem, I'm surprised
> that no other arch has needed to override .domain_alloc_irqs().

Yes, I mean msi_domain_ops::domain_alloc_irqs() here.

> I think you'll have better luck getting feedback if you can post the
> complete working patch. At the very least, you'll learn more about the
> problem by doing that.

Emm, I found I can make a small modification to msi_domain_prepare_irqs()
and solve the problem by overriding msi_domain_ops::msi_prepare(), thanks.
Patches are coming soon.

Huacai
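(Editorial note: struct msi_domain_ops does provide an msi_prepare() callback that runs before vectors are allocated, which is the hook Huacai mentions. The sketch below only illustrates the shape such an override in the irqchip driver might take; the function name, the 32-vector cap, and especially the idea that the core would treat a positive return value as a reduced vector count, which is exactly the "small modification" to msi_domain_prepare_irqs() referred to above, are assumptions, not the posted follow-up patch.)

/*
 * Illustrative sketch only: cap per-device MSI vector requests in the
 * irqchip driver's msi_prepare() hook. The cap value and the convention
 * that a positive return value is taken as the reduced vector count are
 * assumptions made for this example.
 */
#include <linux/irqdomain.h>
#include <linux/msi.h>
#include <linux/string.h>

#define PCH_MSI_VEC_CAP		32	/* hypothetical per-device cap */

static int pch_msi_prepare_capped(struct irq_domain *domain,
				  struct device *dev, int nvec,
				  msi_alloc_info_t *arg)
{
	memset(arg, 0, sizeof(*arg));

	/* Ask the core to allocate at most PCH_MSI_VEC_CAP vectors. */
	return nvec > PCH_MSI_VEC_CAP ? PCH_MSI_VEC_CAP : 0;
}

static struct msi_domain_ops pch_msi_capped_ops = {
	.msi_prepare	= pch_msi_prepare_capped,
};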
On Thu, May 25, 2023 at 05:14:28PM +0800, Huacai Chen wrote:
> The problem we meet: LoongArch machines can have as many as 256 logical
> cpus, but the maximum number of MSI vectors is 192. Even on a 64-core
> machine, 192 irqs can easily be exhausted if there are several NICs (a
> NIC usually allocates MSI irqs depending on the number of online cpus).
> So we want to limit the MSI allocation.

If the MSI allocation fails with multiple vectors, then the NIC driver
should revert to a single MSI vector. Is that happening in your case?

- Mani
Hi, Manivannan,

On Mon, May 29, 2023 at 12:57 AM Manivannan Sadhasivam
<manivannan.sadhasivam@linaro.org> wrote:
> If the MSI allocation fails with multiple vectors, then the NIC driver
> should revert to a single MSI vector. Is that happening in your case?

Thank you for pointing this out. Yes, I know most existing drivers will
fall back to a single MSI or legacy irq when that fails. However, as I
replied in another thread (the new solution to this problem [1]), we want
to do some proactive throttling rather than consume MSI vectors
aggressively. For example, if we have two NICs, we want both of them to
get 32 MSI vectors, rather than one exhausting all available vectors and
the other falling back to a single MSI or legacy irq.

I hope I have explained clearly, thanks.

[1] https://lore.kernel.org/lkml/20230527054633.704916-1-chenhuacai@loongson.cn/T/#t

Huacai
On Mon, May 29, 2023 at 10:02:20AM +0800, Huacai Chen wrote:
> Thank you for pointing this out. Yes, I know most existing drivers will
> fall back to a single MSI or legacy irq when that fails. However, as I
> replied in another thread (the new solution to this problem [1]), we
> want to do some proactive throttling rather than consume MSI vectors
> aggressively. For example, if we have two NICs, we want both of them to
> get 32 MSI vectors, rather than one exhausting all available vectors
> and the other falling back to a single MSI or legacy irq.
>
> I hope I have explained clearly, thanks.

The problem you are facing is not specific to Loongson but generic, and
the solution we currently have is the one you are already aware of. So if
you want to propose an alternative solution, it should be generic, and a
good justification needs to be provided to the maintainers, i.e.,
comparing the two solutions and explaining why yours is better.

But IMO what you are proposing seems usecase driven and may not work all
the time due to the architecture limitation. This again proves that the
existing solution is sufficient.

- Mani
Hi, Manivannan,

On Mon, May 29, 2023 at 1:39 PM Manivannan Sadhasivam
<manivannan.sadhasivam@linaro.org> wrote:
> The problem you are facing is not specific to Loongson but generic, and
> the solution we currently have is the one you are already aware of. So
> if you want to propose an alternative solution, it should be generic,
> and a good justification needs to be provided to the maintainers, i.e.,
> comparing the two solutions and explaining why yours is better.

Yes, I think we are facing a generic problem, but it is more obvious on
platforms that provide fewer MSI vectors. And my solution seems generic
enough. :) At least in my example, "proactive throttling" is better than
"aggressive consuming", because two (or more) NICs get more balanced
throughput.

> But IMO what you are proposing seems usecase driven and may not work
> all the time due to the architecture limitation. This again proves that
> the existing solution is sufficient.

Yes, it's a usecase-driven solution, so I provide a cmdline parameter to
let the user decide.

Huacai
On Mon, May 29, 2023 at 02:52:29PM +0800, Huacai Chen wrote:
> > But IMO what you are proposing seems usecase driven and may not work
> > all the time due to the architecture limitation. This again proves
> > that the existing solution is sufficient.
>
> Yes, it's a usecase-driven solution, so I provide a cmdline parameter
> to let the user decide.

The NIC drivers should be consuming interrupts based on the number of
queues they are using, and that is something you can control from the
command line, eg ethtool IIRC. Usually it defaults to the number of CPUs.

Basically, you want to enable the user to configure the system with a
user-specified reduced number of NIC queues, and we already have a way to
do that.

Jason
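(Editorial note: the knob Jason refers to is the NIC channel/queue count, which is typically lowered with ethtool's -L/--set-channels option, for example "ethtool -L eth0 combined 8"; the interface name and count here are only an example, and the exact queue parameters available are driver-specific.)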
Hi, Jason,

On Thu, Jun 1, 2023 at 3:14 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> The NIC drivers should be consuming interrupts based on the number of
> queues they are using, and that is something you can control from the
> command line, eg ethtool IIRC. Usually it defaults to the number of
> CPUs.
>
> Basically, you want to enable the user to configure the system with a
> user-specified reduced number of NIC queues, and we already have a way
> to do that.

Yes, ethtool is a possible way, thank you very much.

Huacai
diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
index ef1d8857a51b..6617381e50e7 100644
--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -402,12 +402,34 @@ static int msi_capability_init(struct pci_dev *dev, int nvec,
 	return ret;
 }
 
+#ifdef CONFIG_LOONGARCH
+#define DEFAULT_PCI_IRQ_LIMITS 32
+#else
+#define DEFAULT_PCI_IRQ_LIMITS NR_IRQS
+#endif
+
+static int pci_irq_limits = DEFAULT_PCI_IRQ_LIMITS;
+
+static int __init pci_irq_limit(char *str)
+{
+	get_option(&str, &pci_irq_limits);
+
+	if (pci_irq_limits == 0)
+		pci_irq_limits = DEFAULT_PCI_IRQ_LIMITS;
+
+	return 0;
+}
+
+early_param("pci_irq_limit", pci_irq_limit);
+
 int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
 			   struct irq_affinity *affd)
 {
 	int nvec;
 	int rc;
 
+	maxvec = clamp_val(maxvec, 0, pci_irq_limits);
+
 	if (!pci_msi_supported(dev, minvec) || dev->current_state != PCI_D0)
 		return -EINVAL;
 
@@ -776,7 +798,9 @@ static bool pci_msix_validate_entries(struct pci_dev *dev, struct msix_entry *en
 int __pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries, int minvec,
 			    int maxvec, struct irq_affinity *affd, int flags)
 {
-	int hwsize, rc, nvec = maxvec;
+	int hwsize, rc, nvec;
+
+	nvec = clamp_val(maxvec, 0, pci_irq_limits);
 
 	if (maxvec < minvec)
 		return -ERANGE;
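(Editorial note: with this patch applied, the cap would be supplied on the kernel command line, e.g. booting with pci_irq_limit=64; passing 0 or omitting the parameter keeps the built-in DEFAULT_PCI_IRQ_LIMITS, i.e. 32 on LoongArch and NR_IRQS elsewhere.)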