Message ID | 20200501224042.141366-2-helgaas@kernel.org |
---|---|
State | New |
Headers | show |
Series | PCI: Check for platform_get_irq() failure consistently | expand |
On Fri, May 01, 2020 at 05:40:41PM -0500, Bjorn Helgaas wrote: > From: Bjorn Helgaas <bhelgaas@google.com> > > These interfaces return a negative error number or an IRQ: > > platform_get_irq() > platform_get_irq_optional() > platform_get_irq_byname() > platform_get_irq_byname_optional() > > The function comments suggest checking for error like this: > > irq = platform_get_irq(...); > if (irq < 0) > return irq; > > which is what most callers (~900 of 1400) do, so it's implicit that IRQ 0 > is invalid. But some callers check for "irq <= 0", and it's not obvious > from the source that we never return an IRQ 0. > > Make this more explicit by updating the comments to say that an IRQ number > is always non-zero and adding a WARN() if we ever do return zero. If we do > return IRQ 0, it likely indicates a bug in the arch-specific parts of > platform_get_irq(). I worry about adding WARN() as there are systems that do panic_on_warn() and syzbot trips over this as well. I don't think that for this issue it would be a problem, but what really is this warning about that someone could do anything with? Other than that minor thing, this looks good to me, thanks for finally clearing this up. greg k-h
On Sat, May 02, 2020 at 08:15:37AM +0200, Greg Kroah-Hartman wrote: > On Fri, May 01, 2020 at 05:40:41PM -0500, Bjorn Helgaas wrote: > > From: Bjorn Helgaas <bhelgaas@google.com> > > > > These interfaces return a negative error number or an IRQ: > > > > platform_get_irq() > > platform_get_irq_optional() > > platform_get_irq_byname() > > platform_get_irq_byname_optional() > > > > The function comments suggest checking for error like this: > > > > irq = platform_get_irq(...); > > if (irq < 0) > > return irq; > > > > which is what most callers (~900 of 1400) do, so it's implicit that IRQ 0 > > is invalid. But some callers check for "irq <= 0", and it's not obvious > > from the source that we never return an IRQ 0. > > > > Make this more explicit by updating the comments to say that an IRQ number > > is always non-zero and adding a WARN() if we ever do return zero. If we do > > return IRQ 0, it likely indicates a bug in the arch-specific parts of > > platform_get_irq(). > > I worry about adding WARN() as there are systems that do panic_on_warn() > and syzbot trips over this as well. I don't think that for this issue > it would be a problem, but what really is this warning about that > someone could do anything with? > > Other than that minor thing, this looks good to me, thanks for finally > clearing this up. What I'm concerned about is an arch that returns 0. Most drivers don't check for 0 so they'll just try to use it, and things will fail in some obscure way. My assumption is that if there really is no IRQ, we should return -ENOENT or similar instead of 0. I could be convinced that it's not worth warning about at all, or we could do something like the following: diff --git a/drivers/base/platform.c b/drivers/base/platform.c index 084cf1d23d3f..4afa5875e14d 100644 --- a/drivers/base/platform.c +++ b/drivers/base/platform.c @@ -220,7 +220,11 @@ int platform_get_irq_optional(struct platform_device *dev, unsigned int num) ret = -ENXIO; #endif out: - WARN(ret == 0, "0 is an invalid IRQ number\n"); + /* Returning zero here is likely a bug in the arch IRQ code */ + if (ret == 0) { + pr_warn("0 is an invalid IRQ number\n"); + dump_stack(); + } return ret; } EXPORT_SYMBOL_GPL(platform_get_irq_optional); @@ -312,7 +316,11 @@ static int __platform_get_irq_byname(struct platform_device *dev, r = platform_get_resource_byname(dev, IORESOURCE_IRQ, name); if (r) { - WARN(r->start == 0, "0 is an invalid IRQ number\n"); + /* Returning zero here is likely a bug in the arch IRQ code */ + if (r->start == 0) { + pr_warn("0 is an invalid IRQ number\n"); + dump_stack(); + } return r->start; }
On Mon, May 04, 2020 at 01:08:22PM -0500, Bjorn Helgaas wrote: > On Sat, May 02, 2020 at 08:15:37AM +0200, Greg Kroah-Hartman wrote: > > On Fri, May 01, 2020 at 05:40:41PM -0500, Bjorn Helgaas wrote: > > > From: Bjorn Helgaas <bhelgaas@google.com> > > > > > > These interfaces return a negative error number or an IRQ: > > > > > > platform_get_irq() > > > platform_get_irq_optional() > > > platform_get_irq_byname() > > > platform_get_irq_byname_optional() > > > > > > The function comments suggest checking for error like this: > > > > > > irq = platform_get_irq(...); > > > if (irq < 0) > > > return irq; > > > > > > which is what most callers (~900 of 1400) do, so it's implicit that IRQ 0 > > > is invalid. But some callers check for "irq <= 0", and it's not obvious > > > from the source that we never return an IRQ 0. > > > > > > Make this more explicit by updating the comments to say that an IRQ number > > > is always non-zero and adding a WARN() if we ever do return zero. If we do > > > return IRQ 0, it likely indicates a bug in the arch-specific parts of > > > platform_get_irq(). > > > > I worry about adding WARN() as there are systems that do panic_on_warn() > > and syzbot trips over this as well. I don't think that for this issue > > it would be a problem, but what really is this warning about that > > someone could do anything with? > > > > Other than that minor thing, this looks good to me, thanks for finally > > clearing this up. > > What I'm concerned about is an arch that returns 0. Most drivers > don't check for 0 so they'll just try to use it, and things will fail > in some obscure way. My assumption is that if there really is no IRQ, > we should return -ENOENT or similar instead of 0. > > I could be convinced that it's not worth warning about at all, or we > could do something like the following: > > diff --git a/drivers/base/platform.c b/drivers/base/platform.c > index 084cf1d23d3f..4afa5875e14d 100644 > --- a/drivers/base/platform.c > +++ b/drivers/base/platform.c > @@ -220,7 +220,11 @@ int platform_get_irq_optional(struct platform_device *dev, unsigned int num) > ret = -ENXIO; > #endif > out: > - WARN(ret == 0, "0 is an invalid IRQ number\n"); > + /* Returning zero here is likely a bug in the arch IRQ code */ > + if (ret == 0) { > + pr_warn("0 is an invalid IRQ number\n"); > + dump_stack(); > + } > return ret; > } > EXPORT_SYMBOL_GPL(platform_get_irq_optional); > @@ -312,7 +316,11 @@ static int __platform_get_irq_byname(struct platform_device *dev, > > r = platform_get_resource_byname(dev, IORESOURCE_IRQ, name); > if (r) { > - WARN(r->start == 0, "0 is an invalid IRQ number\n"); > + /* Returning zero here is likely a bug in the arch IRQ code */ > + if (r->start == 0) { > + pr_warn("0 is an invalid IRQ number\n"); > + dump_stack(); > + } > return r->start; > } > I like that, but you said this is something that the platform people should only see when bringing up a new system, so maybe the WARN() is fine. It's not user-triggerable, so your original is ok. sorry for the noise, greg k-h
On Mon, May 04, 2020 at 09:07:21PM +0200, Greg Kroah-Hartman wrote: > On Mon, May 04, 2020 at 01:08:22PM -0500, Bjorn Helgaas wrote: > > On Sat, May 02, 2020 at 08:15:37AM +0200, Greg Kroah-Hartman wrote: > > > On Fri, May 01, 2020 at 05:40:41PM -0500, Bjorn Helgaas wrote: > > > > From: Bjorn Helgaas <bhelgaas@google.com> > > > > > > > > These interfaces return a negative error number or an IRQ: > > > > > > > > platform_get_irq() > > > > platform_get_irq_optional() > > > > platform_get_irq_byname() > > > > platform_get_irq_byname_optional() > > > > > > > > The function comments suggest checking for error like this: > > > > > > > > irq = platform_get_irq(...); > > > > if (irq < 0) > > > > return irq; > > > > > > > > which is what most callers (~900 of 1400) do, so it's implicit > > > > that IRQ 0 is invalid. But some callers check for "irq <= 0", > > > > and it's not obvious from the source that we never return an > > > > IRQ 0. > > > > > > > > Make this more explicit by updating the comments to say that > > > > an IRQ number is always non-zero and adding a WARN() if we > > > > ever do return zero. If we do return IRQ 0, it likely > > > > indicates a bug in the arch-specific parts of > > > > platform_get_irq(). > > > > > > I worry about adding WARN() as there are systems that do > > > panic_on_warn() and syzbot trips over this as well. I don't > > > think that for this issue it would be a problem, but what really > > > is this warning about that someone could do anything with? > > > > > > Other than that minor thing, this looks good to me, thanks for > > > finally clearing this up. > > > > What I'm concerned about is an arch that returns 0. Most drivers > > don't check for 0 so they'll just try to use it, and things will > > fail in some obscure way. My assumption is that if there really > > is no IRQ, we should return -ENOENT or similar instead of 0. > > > > I could be convinced that it's not worth warning about at all, or > > we could do something like the following: > > > > diff --git a/drivers/base/platform.c b/drivers/base/platform.c > > index 084cf1d23d3f..4afa5875e14d 100644 > > --- a/drivers/base/platform.c > > +++ b/drivers/base/platform.c > > @@ -220,7 +220,11 @@ int platform_get_irq_optional(struct platform_device *dev, unsigned int num) > > ret = -ENXIO; > > #endif > > out: > > - WARN(ret == 0, "0 is an invalid IRQ number\n"); > > + /* Returning zero here is likely a bug in the arch IRQ code */ > > + if (ret == 0) { > > + pr_warn("0 is an invalid IRQ number\n"); > > + dump_stack(); > > + } > > return ret; > > } > > ... > I like that, but you said this is something that the platform people > should only see when bringing up a new system, so maybe the WARN() is > fine. It's not user-triggerable, so your original is ok. Is that an ack? Thomas, any thoughts? I suspect we could see this given a broken DT, too, so I'm not sure it's strictly a bringup problem. I would probably argue that even this case would be an arch defect: the kernel should validate data from a DT at least enough to avoid giving a bogus, useless IRQ to a driver. Bjorn
On Mon, May 04, 2020 at 05:26:59PM -0500, Bjorn Helgaas wrote: > On Mon, May 04, 2020 at 09:07:21PM +0200, Greg Kroah-Hartman wrote: > > On Mon, May 04, 2020 at 01:08:22PM -0500, Bjorn Helgaas wrote: > > > On Sat, May 02, 2020 at 08:15:37AM +0200, Greg Kroah-Hartman wrote: > > > > On Fri, May 01, 2020 at 05:40:41PM -0500, Bjorn Helgaas wrote: > > > > > From: Bjorn Helgaas <bhelgaas@google.com> > > > > > > > > > > These interfaces return a negative error number or an IRQ: > > > > > > > > > > platform_get_irq() > > > > > platform_get_irq_optional() > > > > > platform_get_irq_byname() > > > > > platform_get_irq_byname_optional() > > > > > > > > > > The function comments suggest checking for error like this: > > > > > > > > > > irq = platform_get_irq(...); > > > > > if (irq < 0) > > > > > return irq; > > > > > > > > > > which is what most callers (~900 of 1400) do, so it's implicit > > > > > that IRQ 0 is invalid. But some callers check for "irq <= 0", > > > > > and it's not obvious from the source that we never return an > > > > > IRQ 0. > > > > > > > > > > Make this more explicit by updating the comments to say that > > > > > an IRQ number is always non-zero and adding a WARN() if we > > > > > ever do return zero. If we do return IRQ 0, it likely > > > > > indicates a bug in the arch-specific parts of > > > > > platform_get_irq(). > > > > > > > > I worry about adding WARN() as there are systems that do > > > > panic_on_warn() and syzbot trips over this as well. I don't > > > > think that for this issue it would be a problem, but what really > > > > is this warning about that someone could do anything with? > > > > > > > > Other than that minor thing, this looks good to me, thanks for > > > > finally clearing this up. > > > > > > What I'm concerned about is an arch that returns 0. Most drivers > > > don't check for 0 so they'll just try to use it, and things will > > > fail in some obscure way. My assumption is that if there really > > > is no IRQ, we should return -ENOENT or similar instead of 0. > > > > > > I could be convinced that it's not worth warning about at all, or > > > we could do something like the following: > > > > > > diff --git a/drivers/base/platform.c b/drivers/base/platform.c > > > index 084cf1d23d3f..4afa5875e14d 100644 > > > --- a/drivers/base/platform.c > > > +++ b/drivers/base/platform.c > > > @@ -220,7 +220,11 @@ int platform_get_irq_optional(struct platform_device *dev, unsigned int num) > > > ret = -ENXIO; > > > #endif > > > out: > > > - WARN(ret == 0, "0 is an invalid IRQ number\n"); > > > + /* Returning zero here is likely a bug in the arch IRQ code */ > > > + if (ret == 0) { > > > + pr_warn("0 is an invalid IRQ number\n"); > > > + dump_stack(); > > > + } > > > return ret; > > > } > > > ... > > > I like that, but you said this is something that the platform people > > should only see when bringing up a new system, so maybe the WARN() is > > fine. It's not user-triggerable, so your original is ok. > > Is that an ack? Thomas, any thoughts? Sorry, yes: Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff --git a/drivers/base/platform.c b/drivers/base/platform.c index 5255550b7c34..084cf1d23d3f 100644 --- a/drivers/base/platform.c +++ b/drivers/base/platform.c @@ -152,23 +152,24 @@ EXPORT_SYMBOL_GPL(devm_platform_ioremap_resource_byname); * if (irq < 0) * return irq; * - * Return: IRQ number on success, negative error number on failure. + * Return: non-zero IRQ number on success, negative error number on failure. */ int platform_get_irq_optional(struct platform_device *dev, unsigned int num) { + int ret; #ifdef CONFIG_SPARC /* sparc does not have irqs represented as IORESOURCE_IRQ resources */ if (!dev || num >= dev->archdata.num_irqs) return -ENXIO; - return dev->archdata.irqs[num]; + ret = dev->archdata.irqs[num]; + goto out; #else struct resource *r; - int ret; if (IS_ENABLED(CONFIG_OF_IRQ) && dev->dev.of_node) { ret = of_irq_get(dev->dev.of_node, num); if (ret > 0 || ret == -EPROBE_DEFER) - return ret; + goto out; } r = platform_get_resource(dev, IORESOURCE_IRQ, num); @@ -176,7 +177,7 @@ int platform_get_irq_optional(struct platform_device *dev, unsigned int num) if (r && r->flags & IORESOURCE_DISABLED) { ret = acpi_irq_get(ACPI_HANDLE(&dev->dev), num, r); if (ret) - return ret; + goto out; } } @@ -190,13 +191,17 @@ int platform_get_irq_optional(struct platform_device *dev, unsigned int num) struct irq_data *irqd; irqd = irq_get_irq_data(r->start); - if (!irqd) - return -ENXIO; + if (!irqd) { + ret = -ENXIO; + goto out; + } irqd_set_trigger_type(irqd, r->flags & IORESOURCE_BITS); } - if (r) - return r->start; + if (r) { + ret = r->start; + goto out; + } /* * For the index 0 interrupt, allow falling back to GpioInt @@ -209,11 +214,14 @@ int platform_get_irq_optional(struct platform_device *dev, unsigned int num) ret = acpi_dev_gpio_irq_get(ACPI_COMPANION(&dev->dev), num); /* Our callers expect -ENXIO for missing IRQs. */ if (ret >= 0 || ret == -EPROBE_DEFER) - return ret; + goto out; } - return -ENXIO; + ret = -ENXIO; #endif +out: + WARN(ret == 0, "0 is an invalid IRQ number\n"); + return ret; } EXPORT_SYMBOL_GPL(platform_get_irq_optional); @@ -231,7 +239,7 @@ EXPORT_SYMBOL_GPL(platform_get_irq_optional); * if (irq < 0) * return irq; * - * Return: IRQ number on success, negative error number on failure. + * Return: non-zero IRQ number on success, negative error number on failure. */ int platform_get_irq(struct platform_device *dev, unsigned int num) { @@ -303,8 +311,10 @@ static int __platform_get_irq_byname(struct platform_device *dev, } r = platform_get_resource_byname(dev, IORESOURCE_IRQ, name); - if (r) + if (r) { + WARN(r->start == 0, "0 is an invalid IRQ number\n"); return r->start; + } return -ENXIO; } @@ -316,7 +326,7 @@ static int __platform_get_irq_byname(struct platform_device *dev, * * Get an IRQ like platform_get_irq(), but then by name rather then by index. * - * Return: IRQ number on success, negative error number on failure. + * Return: non-zero IRQ number on success, negative error number on failure. */ int platform_get_irq_byname(struct platform_device *dev, const char *name) { @@ -338,7 +348,7 @@ EXPORT_SYMBOL_GPL(platform_get_irq_byname); * Get an optional IRQ by name like platform_get_irq_byname(). Except that it * does not print an error message if an IRQ can not be obtained. * - * Return: IRQ number on success, negative error number on failure. + * Return: non-zero IRQ number on success, negative error number on failure. */ int platform_get_irq_byname_optional(struct platform_device *dev, const char *name)