Message ID | 20240520220104.3602-1-asmaa@nvidia.com |
---|---|
Series | UBUNTU: SAUCE: gpio-mlxbf3: During reboot test, ipmb driver fails to load intermittently |
On 5/20/24 4:01 PM, Asmaa Mnebhi wrote:
> BugLink: https://bugs.launchpad.net/bugs/2066198
>
> SRU Justification:
>
> [Impact]
>
> The ipmb driver failing to load is just the result of i2c-mlxbf
> not receiving interrupts.
> In fact, any driver dependent on the i2c-mlxbf driver will not work.
>
> How to reproduce this issue?
>
> - modprobe gpio-mlxbf3
> - modprobe pwr-mlxbf
> - modprobe mlxbf-gige -> this calls into the gpio driver which enables the PHY interrupt (gpio10)
> - reboot linux
>   -> graceful reboot does not remove modules so it doesn't disable the PHY interrupt via
>   mlxbf3_gpio_irq_disable. Hence, the interrupt remains enabled.
> - In anolis, we don't enforce the dependency between gpio-mlxbf3 and mlxbf-gige.
>   So the next time linux boots and loads the driver in this order, we encounter the issue:
>   - modprobe mlxbf-gige. The gige driver uses polling in the case where it loads before the gpio
>     driver. Note that the interrupt at GPIO10 is still enabled at this point so if the interrupt
>     triggers, there is nothing to clear it.
>   - modprobe gpio-mlxbf3
>   - modprobe i2c-mlxbf. The interrupt wouldn't work here because it is shared with the gpio
>     interrupts which was not cleared.
>
> [Fix]
>
> * The solution is to add a shutdown function to the gpio driver to clear and disable all interrupts.
> * Also make sure to clear the interrupt after disabling it in the disable irq function.
>
> [Test Case]
>
> * Do the reboot test (2000-3000 iterations)
> * Check that all following drivers are loaded without errors: gpio-mlxbf3, pwr_mlxbf, mlxbf-gige, i2c-mlxbf
> * check that the ipmb drivers are loaded and functional (send ipmb command to the bmc and vice versa)
>
> [Regression Potential]
>
> * No known regression.
>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
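For readers following the [Impact] and [Fix] sections quoted above, the failure and the fix can be sketched as a small user-space model. This is only an illustration, not the driver code from the patch: `struct gpio_model`, its two fields, `PHY_LINE`, and every function name below are hypothetical, and the model assumes that the interrupt state survives a graceful reboot and that a cause bit latches only for lines whose interrupt is enabled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the GPIO block's interrupt state (not the real register map). */
struct gpio_model {
    uint32_t evt_enable; /* one bit per GPIO line: interrupt enabled */
    uint32_t cause;      /* one bit per GPIO line: interrupt latched, not yet cleared */
};

#define PHY_LINE 10u /* the PHY interrupt line (gpio10 in the cover letter) */

/* The shared IRQ stays asserted while any enabled line has a latched cause. */
int shared_irq_asserted(const struct gpio_model *g)
{
    return (g->cause & g->evt_enable) != 0;
}

void gpio_irq_enable(struct gpio_model *g, unsigned int line)
{
    assert(line < 32);
    g->evt_enable |= UINT32_C(1) << line;
}

/* Second part of the fix: disable the line, then clear any cause it latched. */
void gpio_irq_disable(struct gpio_model *g, unsigned int line)
{
    assert(line < 32);
    g->evt_enable &= ~(UINT32_C(1) << line);
    g->cause &= ~(UINT32_C(1) << line);
}

/* First part of the fix: on shutdown, disable and clear every line at once. */
void gpio_shutdown(struct gpio_model *g)
{
    g->evt_enable = 0;
    g->cause = 0;
}

/* Model assumption: an event latches a cause bit only if the line is enabled. */
void hw_event(struct gpio_model *g, unsigned int line)
{
    assert(line < 32);
    g->cause |= (UINT32_C(1) << line) & g->evt_enable;
}

/*
 * Replays the reproduction steps. Returns 1 if the shared IRQ is left stuck
 * after the reboot (which would starve i2c-mlxbf in the real system), else 0.
 */
int simulate_reboot(int with_shutdown)
{
    struct gpio_model hw = { 0, 0 };

    /* First boot: mlxbf-gige enables the PHY interrupt through the GPIO driver. */
    gpio_irq_enable(&hw, PHY_LINE);

    /* Graceful reboot: modules are not removed, so only a shutdown hook can run. */
    if (with_shutdown)
        gpio_shutdown(&hw);

    /* Next boot, GPIO driver not loaded yet: the PHY event fires unhandled. */
    hw_event(&hw, PHY_LINE);

    return shared_irq_asserted(&hw);
}
```

Under these assumptions, `simulate_reboot(0)` leaves the shared line asserted, matching the reported failure, while `simulate_reboot(1)` leaves it clear. `gpio_irq_disable` is not exercised by the reboot path; it is included only to show the second bullet of the [Fix] section.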
Acked-by: Bartlomiej Zolnierkiewicz <bartlomiej.zolnierkiewicz@canonical.com>
Applied to jammy:linux-bluefield/master-next. Thanks.

--
Best regards,
Bartlomiej