Message ID | 20191025140621.43417-1-andriy.shevchenko@linux.intel.com |
---|---|
State | New |
Headers | show |
Series | [v1] pinctrl: baytrail: Move IRQ valid mask initialization to a dedicated callback | expand |
On Fri, Oct 25, 2019 at 05:06:21PM +0300, Andy Shevchenko wrote: > There is a logical continuation of the commit 5fbe5b5883f8 ("gpio: Initialize > the irqchip valid_mask with a callback") to split IRQ initialization to > hardware and valid mask setup parts. > Hans, it will be nice if you would be able to test this as well. The better is to use our for-next branch with this applied on top.
On Fri, Oct 25, 2019 at 05:06:21PM +0300, Andy Shevchenko wrote: > There is a logical continuation of the commit 5fbe5b5883f8 ("gpio: Initialize > the irqchip valid_mask with a callback") to split IRQ initialization to > hardware and valid mask setup parts. > > Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Hi, On 25-10-2019 16:06, Andy Shevchenko wrote: > There is a logical continuation of the commit 5fbe5b5883f8 ("gpio: Initialize > the irqchip valid_mask with a callback") to split IRQ initialization to > hardware and valid mask setup parts. > > Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> So I've been running some tests based on https://git.kernel.org/pub/scm/linux/kernel/git/pinctrl/intel.git/log/?h=for-next + this patch. That combo causes several issues, so I tried reverting the patches one by one, if I drop this patch and use just: https://git.kernel.org/pub/scm/linux/kernel/git/pinctrl/intel.git/log/?h=for-next Then the kernel does not even boot. I believe this is caused by: 88583e340a0e ("pinctrl: intel: baytrail: Pass irqchip when adding gpiochip") The problem here is that gpiochip_add_data_with_key() calls gpiochip_irqchip_init_hw() before it calls gpiochip_irqchip_init_valid_mask(), so after commit 88583e340a0e when byt_gpio_irq_init_hw runs gc->irq.valid_mask is NULL and we crash with a NULL pointer exception (or so I believe, the kernel never gets far enough to get any info out of it without extra work). Note that this ("[PATCH v1] pinctrl: baytrail: Move IRQ valid mask initialization to a dedicated callback") patch fixes this since it moves the gc->irq.valid_mask accesses to byt_init_irq_valid_mask. But this change itself triggers another variant of this ordering issue, it causes these 2 new errors to get logged: byt_gpio INT33FC:01: GPIO interrupt error, pins misconfigured. INT_STAT0: 0x01800000 byt_gpio INT33FC:02: GPIO interrupt error, pins misconfigured. INT_STAT0: 0x00400000 The problem is that before this change the code calculating the valid_mask would also disable interrupts on GPIOs which do not have their BYT_DIRECT_IRQ_EN bit set. This now happens after the check done in byt_gpio_irq_init_hw() causing these false-positive error messages. Even if we ignore the NULL pointer deref problem for now and we ignore these 2 new error messages for now. Things are still broken with the current changes in pinctrl/intel.git/for-next switching to letting devm_gpiochip_add_data register the irqchip means that acpi_gpiochip_request_interrupts() gets called before gpiochip_add_pin_range() is called from pinctrl-baytrail.c, causing the GPIO lookup of any ACPI _AEI handlers to fail, resulting in errors like this one: byt_gpio INT33FC:02: Failed to request GPIO for pin 0x13: -517 And none of the _AEI handlers working TL;DR: commit 88583e340a0e ("pinctrl: intel: baytrail: Pass irqchip when adding gpiochip") breaks a bunch of stuff and should be dropped from pinctrl/intel.git/for-next and this needs some more work before it is ready for mainline. Regards, Hans > --- > drivers/pinctrl/intel/pinctrl-baytrail.c | 25 ++++++++++-------------- > 1 file changed, 10 insertions(+), 15 deletions(-) > > diff --git a/drivers/pinctrl/intel/pinctrl-baytrail.c b/drivers/pinctrl/intel/pinctrl-baytrail.c > index beb26550c25f..08e2b940cc11 100644 > --- a/drivers/pinctrl/intel/pinctrl-baytrail.c > +++ b/drivers/pinctrl/intel/pinctrl-baytrail.c > @@ -1432,22 +1432,10 @@ static void byt_init_irq_valid_mask(struct gpio_chip *chip, > unsigned long *valid_mask, > unsigned int ngpios) > { > - /* > - * FIXME: currently the valid_mask is filled in as part of > - * initializing the irq_chip below in byt_gpio_irq_init_hw(). > - * when converting this driver to the new way of passing the > - * gpio_irq_chip along when adding the gpio_chip, move the > - * mask initialization into this callback instead. Right now > - * this callback is here to make sure the mask gets allocated. > - */ > -} > - > -static int byt_gpio_irq_init_hw(struct gpio_chip *gc) > -{ > - struct byt_gpio *vg = gpiochip_get_data(gc); > + struct byt_gpio *vg = gpiochip_get_data(chip); > struct device *dev = &vg->pdev->dev; > void __iomem *reg; > - u32 base, value; > + u32 value; > int i; > > /* > @@ -1468,13 +1456,20 @@ static int byt_gpio_irq_init_hw(struct gpio_chip *gc) > > value = readl(reg); > if (value & BYT_DIRECT_IRQ_EN) { > - clear_bit(i, gc->irq.valid_mask); > + clear_bit(i, valid_mask); > dev_dbg(dev, "excluding GPIO %d from IRQ domain\n", i); > } else if ((value & BYT_PIN_MUX) == byt_get_gpio_mux(vg, i)) { > byt_gpio_clear_triggering(vg, i); > dev_dbg(dev, "disabling GPIO %d\n", i); > } > } > +} > + > +static int byt_gpio_irq_init_hw(struct gpio_chip *gc) > +{ > + struct byt_gpio *vg = gpiochip_get_data(gc); > + void __iomem *reg; > + u32 base, value; > > /* clear interrupt status trigger registers */ > for (base = 0; base < vg->soc_data->npins; base += 32) { >
On Wed, Oct 30, 2019 at 10:31 AM Hans de Goede <hdegoede@redhat.com> wrote: > The problem here is that gpiochip_add_data_with_key() calls gpiochip_irqchip_init_hw() > before it calls gpiochip_irqchip_init_valid_mask(), so after commit 88583e340a0e > when byt_gpio_irq_init_hw runs gc->irq.valid_mask is NULL and we crash with a NULL > pointer exception (or so I believe, the kernel never gets far enough to get > any info out of it without extra work). > > Note that this ("[PATCH v1] pinctrl: baytrail: Move IRQ valid mask initialization to a dedicated callback") > patch fixes this since it moves the gc->irq.valid_mask accesses to > byt_init_irq_valid_mask. OK so we have a halfway fix there. > But this change itself triggers another variant of this ordering issue, > it causes these 2 new errors to get logged: > > byt_gpio INT33FC:01: GPIO interrupt error, pins misconfigured. INT_STAT0: 0x01800000 > byt_gpio INT33FC:02: GPIO interrupt error, pins misconfigured. INT_STAT0: 0x00400000 > > The problem is that before this change the code calculating the valid_mask > would also disable interrupts on GPIOs which do not have their > BYT_DIRECT_IRQ_EN bit set. This now happens after the check done in > byt_gpio_irq_init_hw() causing these false-positive error messages. Isn't that easily fixed with something like this: diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c index 9afbc0612126..e865c889ba8d 100644 --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -1411,11 +1411,11 @@ int gpiochip_add_data_with_key(struct gpio_chip *chip, void *data, machine_gpiochip_add(chip); - ret = gpiochip_irqchip_init_hw(chip); + ret = gpiochip_irqchip_init_valid_mask(chip); if (ret) goto err_remove_acpi_chip; - ret = gpiochip_irqchip_init_valid_mask(chip); + ret = gpiochip_irqchip_init_hw(chip); if (ret) goto err_remove_acpi_chip; (I sent a separate patch for this.) It isn't super-easy to know the right ordering semantics for init_hw vs init_valid_mask I think. Sadly we need to test it out in practice. > Even if we ignore the NULL pointer deref problem for now and we ignore > these 2 new error messages for now. Things are still broken with the > current changes in pinctrl/intel.git/for-next switching to letting > devm_gpiochip_add_data register the irqchip means that > acpi_gpiochip_request_interrupts() gets called before > gpiochip_add_pin_range() is called from pinctrl-baytrail.c, causing > the GPIO lookup of any ACPI _AEI handlers to fail, resulting in > errors like this one: > > byt_gpio INT33FC:02: Failed to request GPIO for pin 0x13: -517 > > And none of the _AEI handlers working I just vaguely understand this... If what you're saying is that the Baytrail driver is dependent on registering the pin ranges *before* registering the GPIO chip can we then: diff --git a/drivers/pinctrl/intel/pinctrl-baytrail.c b/drivers/pinctrl/intel/pinctrl-baytrail.c index beb26550c25f..b308567c5153 100644 --- a/drivers/pinctrl/intel/pinctrl-baytrail.c +++ b/drivers/pinctrl/intel/pinctrl-baytrail.c @@ -1549,16 +1549,20 @@ static int byt_gpio_probe(struct byt_gpio *vg) girq->handler = handle_bad_irq; } - ret = devm_gpiochip_add_data(&vg->pdev->dev, gc, vg); + /* + * Needs to happen first since the gpiochip is using pin + * control as back-end. + */ + ret = gpiochip_add_pin_range(gc, dev_name(&vg->pdev->dev), + 0, 0, vg->soc_data->npins); if (ret) { - dev_err(&vg->pdev->dev, "failed adding byt-gpio chip\n"); + dev_err(&vg->pdev->dev, "failed to add GPIO pin range\n"); return ret; } - ret = gpiochip_add_pin_range(&vg->chip, dev_name(&vg->pdev->dev), - 0, 0, vg->soc_data->npins); + ret = devm_gpiochip_add_data(&vg->pdev->dev, gc, vg); if (ret) { - dev_err(&vg->pdev->dev, "failed to add GPIO pin range\n"); + dev_err(&vg->pdev->dev, "failed adding byt-gpio chip\n"); return ret; } (Tell me if I should send this as a separate patch.) It's not entirely logical to have this semantic ordering so the extra comment explains it, I hope, in case it actually works. > TL;DR: commit 88583e340a0e ("pinctrl: intel: baytrail: Pass irqchip when adding gpiochip") > breaks a bunch of stuff and should be dropped from pinctrl/intel.git/for-next > and this needs some more work before it is ready for mainline. I don't know if that is such a good idea if this is a global problem, like something that would potentially disturb any ACPI-based GPIO chip. We might leave something else broken even if we fix the issue locally. Yours, Linus Walleij
Hi, On 30-10-2019 13:42, Linus Walleij wrote: > On Wed, Oct 30, 2019 at 10:31 AM Hans de Goede <hdegoede@redhat.com> wrote: > >> The problem here is that gpiochip_add_data_with_key() calls gpiochip_irqchip_init_hw() >> before it calls gpiochip_irqchip_init_valid_mask(), so after commit 88583e340a0e >> when byt_gpio_irq_init_hw runs gc->irq.valid_mask is NULL and we crash with a NULL >> pointer exception (or so I believe, the kernel never gets far enough to get >> any info out of it without extra work). >> >> Note that this ("[PATCH v1] pinctrl: baytrail: Move IRQ valid mask initialization to a dedicated callback") >> patch fixes this since it moves the gc->irq.valid_mask accesses to >> byt_init_irq_valid_mask. > > OK so we have a halfway fix there. > >> But this change itself triggers another variant of this ordering issue, >> it causes these 2 new errors to get logged: >> >> byt_gpio INT33FC:01: GPIO interrupt error, pins misconfigured. INT_STAT0: 0x01800000 >> byt_gpio INT33FC:02: GPIO interrupt error, pins misconfigured. INT_STAT0: 0x00400000 >> >> The problem is that before this change the code calculating the valid_mask >> would also disable interrupts on GPIOs which do not have their >> BYT_DIRECT_IRQ_EN bit set. This now happens after the check done in >> byt_gpio_irq_init_hw() causing these false-positive error messages. > > Isn't that easily fixed with something like this: > > diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c > index 9afbc0612126..e865c889ba8d 100644 > --- a/drivers/gpio/gpiolib.c > +++ b/drivers/gpio/gpiolib.c > @@ -1411,11 +1411,11 @@ int gpiochip_add_data_with_key(struct > gpio_chip *chip, void *data, > > machine_gpiochip_add(chip); > > - ret = gpiochip_irqchip_init_hw(chip); > + ret = gpiochip_irqchip_init_valid_mask(chip); > if (ret) > goto err_remove_acpi_chip; > > - ret = gpiochip_irqchip_init_valid_mask(chip); > + ret = gpiochip_irqchip_init_hw(chip); > if (ret) > goto err_remove_acpi_chip; > > (I sent a separate patch for this.) Yes I just replied to that patch, this seems like a good fix to me. > It isn't super-easy to know the right ordering semantics > for init_hw vs init_valid_mask I think. Sadly we need to > test it out in practice. Ack. >> Even if we ignore the NULL pointer deref problem for now and we ignore >> these 2 new error messages for now. Things are still broken with the >> current changes in pinctrl/intel.git/for-next switching to letting >> devm_gpiochip_add_data register the irqchip means that >> acpi_gpiochip_request_interrupts() gets called before >> gpiochip_add_pin_range() is called from pinctrl-baytrail.c, causing >> the GPIO lookup of any ACPI _AEI handlers to fail, resulting in >> errors like this one: >> >> byt_gpio INT33FC:02: Failed to request GPIO for pin 0x13: -517 >> >> And none of the _AEI handlers working > > I just vaguely understand this... > > If what you're saying is that the Baytrail driver is dependent > on registering the pin ranges *before* registering the GPIO > chip Yes I think so, I debugged the _AEI handlers not working a bit yesterday and the problem is that pinctrl_gpio_request() fails, first pinctrl_get_device_gpio_range fails with -EPROBEDEFER (*) and then pinctrl_match_gpio_range() also fails. I added some debug pr_err calls to pinctrl_match_gpio_range() and when it runs the range for the gpiochip for which acpi_gpiochip_request_interrupts() is processing _AEI event-handlers is not yet in the registered ranges. *) Which is not treated specially by acpi_gpiochip_request_interrupts() as that normally gets called from the gpiochip driver itself, so the device is expected to alreayd be probed. > can we then: > > diff --git a/drivers/pinctrl/intel/pinctrl-baytrail.c > b/drivers/pinctrl/intel/pinctrl-baytrail.c > index beb26550c25f..b308567c5153 100644 > --- a/drivers/pinctrl/intel/pinctrl-baytrail.c > +++ b/drivers/pinctrl/intel/pinctrl-baytrail.c > @@ -1549,16 +1549,20 @@ static int byt_gpio_probe(struct byt_gpio *vg) > girq->handler = handle_bad_irq; > } > > - ret = devm_gpiochip_add_data(&vg->pdev->dev, gc, vg); > + /* > + * Needs to happen first since the gpiochip is using pin > + * control as back-end. > + */ > + ret = gpiochip_add_pin_range(gc, dev_name(&vg->pdev->dev), > + 0, 0, vg->soc_data->npins); > if (ret) { > - dev_err(&vg->pdev->dev, "failed adding byt-gpio chip\n"); > + dev_err(&vg->pdev->dev, "failed to add GPIO pin range\n"); > return ret; > } > > - ret = gpiochip_add_pin_range(&vg->chip, dev_name(&vg->pdev->dev), > - 0, 0, vg->soc_data->npins); > + ret = devm_gpiochip_add_data(&vg->pdev->dev, gc, vg); > if (ret) { > - dev_err(&vg->pdev->dev, "failed to add GPIO pin range\n"); > + dev_err(&vg->pdev->dev, "failed adding byt-gpio chip\n"); > return ret; > } > > (Tell me if I should send this as a separate patch.) If you want me to test if this fixes the issue, then yes please. > It's not entirely logical to have this semantic ordering so > the extra comment explains it, I hope, in case it actually > works. > >> TL;DR: commit 88583e340a0e ("pinctrl: intel: baytrail: Pass irqchip when adding gpiochip") >> breaks a bunch of stuff and should be dropped from pinctrl/intel.git/for-next >> and this needs some more work before it is ready for mainline. > > I don't know if that is such a good idea if this is a global problem, > like something that would potentially disturb any ACPI-based > GPIO chip. We might leave something else broken even if we > fix the issue locally. Right, I did a quick check and at least these x86 pinctrl drivers all 3 have this ordering problem once the irq chip registration is moved to the gpiochip_add_data() call. drivers/pinctrl/intel/pinctrl-baytrail.c drivers/pinctrl/intel/pinctrl-cherryview.c drivers/pinctrl/intel/pinctrl-intel.c drivers/pinctrl/pinctrl-amd.c And it seems that drivers/gpio/gpio-merrifield.c is already suffering from this problem in 5.4! So some more generic solution would be ideal... Regards, Hans
On Wed, Oct 30, 2019 at 02:11:50PM +0100, Hans de Goede wrote: > On 30-10-2019 13:42, Linus Walleij wrote: > > On Wed, Oct 30, 2019 at 10:31 AM Hans de Goede <hdegoede@redhat.com> wrote: > > > TL;DR: commit 88583e340a0e ("pinctrl: intel: baytrail: Pass irqchip when adding gpiochip") > > > breaks a bunch of stuff and should be dropped from pinctrl/intel.git/for-next > > > and this needs some more work before it is ready for mainline. > > > > I don't know if that is such a good idea if this is a global problem, > > like something that would potentially disturb any ACPI-based > > GPIO chip. We might leave something else broken even if we > > fix the issue locally. > > Right, I did a quick check and at least these x86 pinctrl drivers > all 3 have this ordering problem once the irq chip registration is > moved to the gpiochip_add_data() call. > > drivers/pinctrl/intel/pinctrl-baytrail.c > drivers/pinctrl/intel/pinctrl-cherryview.c > drivers/pinctrl/intel/pinctrl-intel.c > drivers/pinctrl/pinctrl-amd.c Hmm.. do we have cherryview broken in next / vanilla? > > And it seems that drivers/gpio/gpio-merrifield.c is already > suffering from this problem in 5.4! > > So some more generic solution would be ideal...
On Wed, Oct 30, 2019 at 2:11 PM Hans de Goede <hdegoede@redhat.com> wrote: > > (Tell me if I should send this as a separate patch.) > > If you want me to test if this fixes the issue, then yes please. I sent a patch fixing all the instances we can immediately spot. > Right, I did a quick check and at least these x86 pinctrl drivers > all 3 have this ordering problem once the irq chip registration is > moved to the gpiochip_add_data() call. > > drivers/pinctrl/intel/pinctrl-baytrail.c > drivers/pinctrl/intel/pinctrl-cherryview.c > drivers/pinctrl/intel/pinctrl-intel.c > drivers/pinctrl/pinctrl-amd.c > > And it seems that drivers/gpio/gpio-merrifield.c is already > suffering from this problem in 5.4! I fixed all of these in my patch, let's see what happens. If we need to partially backport it to v5.4 then let's just do that. Yours, Linus Walleij
Hi, On 30-10-2019 15:47, Andy Shevchenko wrote: > On Wed, Oct 30, 2019 at 02:11:50PM +0100, Hans de Goede wrote: >> On 30-10-2019 13:42, Linus Walleij wrote: >>> On Wed, Oct 30, 2019 at 10:31 AM Hans de Goede <hdegoede@redhat.com> wrote: > >>>> TL;DR: commit 88583e340a0e ("pinctrl: intel: baytrail: Pass irqchip when adding gpiochip") >>>> breaks a bunch of stuff and should be dropped from pinctrl/intel.git/for-next >>>> and this needs some more work before it is ready for mainline. >>> >>> I don't know if that is such a good idea if this is a global problem, >>> like something that would potentially disturb any ACPI-based >>> GPIO chip. We might leave something else broken even if we >>> fix the issue locally. >> >> Right, I did a quick check and at least these x86 pinctrl drivers >> all 3 have this ordering problem once the irq chip registration is >> moved to the gpiochip_add_data() call. >> >> drivers/pinctrl/intel/pinctrl-baytrail.c >> drivers/pinctrl/intel/pinctrl-cherryview.c >> drivers/pinctrl/intel/pinctrl-intel.c >> drivers/pinctrl/pinctrl-amd.c > > Hmm.. do we have cherryview broken in next / vanilla? In 5.4-rc# the irq chip is still registered as a separate step after the gpiochip_add_pin_range calls. I'm also not seen any troublesome patches here: https://git.kernel.org/pub/scm/linux/kernel/git/pinctrl/intel.git/log/?h=for-next >> And it seems that drivers/gpio/gpio-merrifield.c is already >> suffering from this problem in 5.4! >> >> So some more generic solution would be ideal... merrifield OTOH definitely seems broken wrt this (when looking at the code). Regards, Hans
diff --git a/drivers/pinctrl/intel/pinctrl-baytrail.c b/drivers/pinctrl/intel/pinctrl-baytrail.c index beb26550c25f..08e2b940cc11 100644 --- a/drivers/pinctrl/intel/pinctrl-baytrail.c +++ b/drivers/pinctrl/intel/pinctrl-baytrail.c @@ -1432,22 +1432,10 @@ static void byt_init_irq_valid_mask(struct gpio_chip *chip, unsigned long *valid_mask, unsigned int ngpios) { - /* - * FIXME: currently the valid_mask is filled in as part of - * initializing the irq_chip below in byt_gpio_irq_init_hw(). - * when converting this driver to the new way of passing the - * gpio_irq_chip along when adding the gpio_chip, move the - * mask initialization into this callback instead. Right now - * this callback is here to make sure the mask gets allocated. - */ -} - -static int byt_gpio_irq_init_hw(struct gpio_chip *gc) -{ - struct byt_gpio *vg = gpiochip_get_data(gc); + struct byt_gpio *vg = gpiochip_get_data(chip); struct device *dev = &vg->pdev->dev; void __iomem *reg; - u32 base, value; + u32 value; int i; /* @@ -1468,13 +1456,20 @@ static int byt_gpio_irq_init_hw(struct gpio_chip *gc) value = readl(reg); if (value & BYT_DIRECT_IRQ_EN) { - clear_bit(i, gc->irq.valid_mask); + clear_bit(i, valid_mask); dev_dbg(dev, "excluding GPIO %d from IRQ domain\n", i); } else if ((value & BYT_PIN_MUX) == byt_get_gpio_mux(vg, i)) { byt_gpio_clear_triggering(vg, i); dev_dbg(dev, "disabling GPIO %d\n", i); } } +} + +static int byt_gpio_irq_init_hw(struct gpio_chip *gc) +{ + struct byt_gpio *vg = gpiochip_get_data(gc); + void __iomem *reg; + u32 base, value; /* clear interrupt status trigger registers */ for (base = 0; base < vg->soc_data->npins; base += 32) {
There is a logical continuation of the commit 5fbe5b5883f8 ("gpio: Initialize the irqchip valid_mask with a callback") to split IRQ initialization to hardware and valid mask setup parts. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> --- drivers/pinctrl/intel/pinctrl-baytrail.c | 25 ++++++++++-------------- 1 file changed, 10 insertions(+), 15 deletions(-)