[v2] Do not mark ACPI devices as irq safe

Message ID 20240813161254.3509409-1-leitao@debian.org
State Handled Elsewhere

Commit Message

Breno Leitao Aug. 13, 2024, 4:12 p.m. UTC
On ACPI machines, the tegra i2c module encounters an issue due to a
mutex being called inside a spinlock. This leads to the following bug:

	BUG: sleeping function called from invalid context at kernel/locking/mutex.c:585
	in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 1282, name: kssif0010
	preempt_count: 0, expected: 0
	RCU nest depth: 0, expected: 0
	irq event stamp: 0

	Call trace:
	__might_sleep
	__mutex_lock_common
	mutex_lock_nested
	acpi_subsys_runtime_resume
	rpm_resume
	tegra_i2c_xfer

The problem arises because during __pm_runtime_resume(), the spinlock
&dev->power.lock is acquired before rpm_resume() is called. Later,
rpm_resume() invokes acpi_subsys_runtime_resume(), which relies on
mutexes, triggering the error.

To address this issue, devices on ACPI are now marked as not IRQ-safe,
considering the dependency of acpi_subsys_runtime_resume() on mutexes.

Co-developed-by: Michael van der Westhuizen <rmikey@meta.com>
Signed-off-by: Michael van der Westhuizen <rmikey@meta.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Andy Shevchenko <andy@kernel.org>
---
Changelog:
v2:
  * Replaced ACPI_HANDLE() by has_acpi_companion() (Andy Shevchenko)
  * Expanded the comment before the change (Andy Shevchenko)
  * Simplified the stack in the summary (Andy Shevchenko)

 drivers/i2c/busses/i2c-tegra.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
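
The failure mode described in the commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct toy_dev` and every `toy_*` function are hypothetical names invented for illustration, not kernel APIs. The model assumes that an IRQ-safe device has its runtime-resume callback invoked with interrupts still disabled, while a device that is not IRQ-safe has interrupts re-enabled around the callback.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model; none of these names are kernel APIs. */
struct toy_dev {
	bool irq_safe;         /* models dev->power.irq_safe */
	bool irqs_disabled;    /* models the local interrupt state */
	int  sleep_violations; /* sleeping locks taken with IRQs off */
};

/* Models a mutex-taking callback such as acpi_subsys_runtime_resume(). */
void toy_sleeping_resume_cb(struct toy_dev *dev)
{
	/* Stand-in for the might_sleep() check performed on mutex_lock(). */
	if (dev->irqs_disabled)
		dev->sleep_violations++;
}

/* Models the resume path: take the lock with IRQs off, run the callback. */
int toy_runtime_resume(struct toy_dev *dev)
{
	assert(dev);

	dev->irqs_disabled = true;  /* models taking dev->power.lock */

	/* Only a device that is not IRQ-safe gets IRQs back for the callback. */
	if (!dev->irq_safe)
		dev->irqs_disabled = false;

	toy_sleeping_resume_cb(dev);

	dev->irqs_disabled = false; /* models releasing the lock */
	return dev->sleep_violations;
}
```

In this model, a device with `irq_safe` set records one violation per resume, which corresponds to the "sleeping function called from invalid context" report, and a device with `irq_safe` cleared records none.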

Comments

Andy Shevchenko Aug. 13, 2024, 7:45 p.m. UTC | #1
On Tue, Aug 13, 2024 at 7:13 PM Breno Leitao <leitao@debian.org> wrote:
>
> On ACPI machines, the tegra i2c module encounters an issue due to a
> mutex being called inside a spinlock. This leads to the following bug:
>
>         BUG: sleeping function called from invalid context at kernel/locking/mutex.c:585
>         in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 1282, name: kssif0010
>         preempt_count: 0, expected: 0
>         RCU nest depth: 0, expected: 0
>         irq event stamp: 0
>
>         Call trace:
>         __might_sleep
>         __mutex_lock_common
>         mutex_lock_nested
>         acpi_subsys_runtime_resume
>         rpm_resume
>         tegra_i2c_xfer

The above stack trace is still too verbose. The Submitting Patches
documentation is clear about this. Please remove unrelated,
insignificant lines, such as
"irq event stamp: 0", which give no valuable information. So, at the
end it will be ~5-6 lines only. Other than that, LGTM.

> The problem arises because during __pm_runtime_resume(), the spinlock
> &dev->power.lock is acquired before rpm_resume() is called. Later,
> rpm_resume() invokes acpi_subsys_runtime_resume(), which relies on
> mutexes, triggering the error.
>
> To address this issue, devices on ACPI are now marked as not IRQ-safe,
> considering the dependency of acpi_subsys_runtime_resume() on mutexes.
Wolfram Sang Aug. 13, 2024, 7:47 p.m. UTC | #2
> end it will be ~5-6 lines only. Other than that, LGTM.

What about a proper prefix in the subject?
Andi Shyti Aug. 13, 2024, 10:53 p.m. UTC | #3
Hi Breno,

You don't need to resend the patch. Because the changes are only
in the commit log, I can take care of them.

First of all, we need to fix the title to be:

"i2c: tegra: Do not mark ACPI devices as irq safe"

On Tue, Aug 13, 2024 at 09:12:53AM GMT, Breno Leitao wrote:
> On ACPI machines, the tegra i2c module encounters an issue due to a
> mutex being called inside a spinlock. This leads to the following bug:
> 
> 	BUG: sleeping function called from invalid context at kernel/locking/mutex.c:585
> 	in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 1282, name: kssif0010
> 	preempt_count: 0, expected: 0
> 	RCU nest depth: 0, expected: 0
> 	irq event stamp: 0
> 
> 	Call trace:
> 	__might_sleep
> 	__mutex_lock_common
> 	mutex_lock_nested
> 	acpi_subsys_runtime_resume
> 	rpm_resume
> 	tegra_i2c_xfer

We can keep the trace as:

	BUG: sleeping function called from invalid context at kernel/locking/mutex.c:585
	...

	Call trace:
	__might_sleep
	__mutex_lock_common
	mutex_lock_nested
	acpi_subsys_runtime_resume
	rpm_resume
	tegra_i2c_xfer

> The problem arises because during __pm_runtime_resume(), the spinlock
> &dev->power.lock is acquired before rpm_resume() is called. Later,
> rpm_resume() invokes acpi_subsys_runtime_resume(), which relies on
> mutexes, triggering the error.
> 
> To address this issue, devices on ACPI are now marked as not IRQ-safe,
> considering the dependency of acpi_subsys_runtime_resume() on mutexes.
> 
> Co-developed-by: Michael van der Westhuizen <rmikey@meta.com>
> Signed-off-by: Michael van der Westhuizen <rmikey@meta.com>
> Signed-off-by: Breno Leitao <leitao@debian.org>
> Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
> Reviewed-by: Andy Shevchenko <andy@kernel.org>

I haven't seen Andy explicitly tagging this patch. Andy, can we
keep it? Or have I missed it?

Besides, you also need:

Fixes: ede2299f7101 ("i2c: tegra: Support atomic transfers")
Cc: <stable@vger.kernel.org> # v5.6+

Can you please check whether this is right?

This patch won't apply as far back as 5.6, though, so you should
expect to provide some support for the stable backports.

Andi
Breno Leitao Aug. 14, 2024, 8:47 a.m. UTC | #4
Hello Andi,

On Tue, Aug 13, 2024 at 11:53:17PM +0100, Andi Shyti wrote:
> Hi Breno,
> 
> You don't need to resend the patch. Because the changes are only
> in the commit log, I can take care of them.

In fact, the changes are in the code itself, see the changelog:

  * Replaced ACPI_HANDLE() by has_acpi_companion() (Andy Shevchenko)
  * Expanded the comment before the change (Andy Shevchenko)

> Besides, you also need:
> 
> Fixes: ede2299f7101 ("i2c: tegra: Support atomic transfers")
> Cc: <stable@vger.kernel.org> # v5.6+
> 
> Can you please check whether this is right?

I would say that we probably want to blame the support for ACPI device,
which came later than ede2299f7101 ("i2c: tegra: Support atomic
transfers").

I'd suggest the following:

 Fixes: bd2fdedbf2ba ("i2c: tegra: Add the ACPI support")
 CC: <stable@vger.kernel.org> # v5.17+

I am not planning to submit a new patch with these changes, please let
me know if you need action on my side.

Thanks for handling this fix,
--breno
Andi Shyti Aug. 14, 2024, 11:02 a.m. UTC | #5
Hi Breno,

On Wed, Aug 14, 2024 at 01:47:54AM GMT, Breno Leitao wrote:
> On Tue, Aug 13, 2024 at 11:53:17PM +0100, Andi Shyti wrote:
> > You don't need to resend the patch. Because the changes are only
> > in the commit log, I can take care of them.
> 
> In fact, the changes are in the code itself, see the changelog:
> 
>   * Replaced ACPI_HANDLE() by has_acpi_companion() (Andy Shevchenko)
>   * Expanded the comment before the change (Andy Shevchenko)

I meant no need to send a v3.

> > Besides, you also need:
> > 
> > Fixes: ede2299f7101 ("i2c: tegra: Support atomic transfers")
> > Cc: <stable@vger.kernel.org> # v5.6+
> > 
> > Can you please check whether this is right?
> 
> I would say that we probably want to blame the support for ACPI device,
> which came later than ede2299f7101 ("i2c: tegra: Support atomic
> transfers").
> 
> I'd suggest the following:
> 
>  Fixes: bd2fdedbf2ba ("i2c: tegra: Add the ACPI support")
>  CC: <stable@vger.kernel.org> # v5.17+

Makes sense.

> I am not planning to submit a new patch with these changes, please let
> me know if you need action on my side.

Not for now, you might need to still support the backports to
stable as there might be some differences and I can already see
that it doesn't apply that far back (from 6.1, basically).

Andi

> Thanks for handling this fix,
> --breno
Andy Shevchenko Aug. 14, 2024, 12:56 p.m. UTC | #6
On Tue, Aug 13, 2024 at 09:47:02PM +0200, Wolfram Sang wrote:
> 
> > end it will be ~5-6 lines only. Other than that, LGTM.
> 
> What about a proper prefix in the subject?

Indeed, but it seems Andi is kind enough to amend this whilst applying.
Breno Leitao Aug. 14, 2024, 1:40 p.m. UTC | #7
On Wed, Aug 14, 2024 at 12:02:57PM +0100, Andi Shyti wrote:
> Hi Breno,
> 
> On Wed, Aug 14, 2024 at 01:47:54AM GMT, Breno Leitao wrote:
> > On Tue, Aug 13, 2024 at 11:53:17PM +0100, Andi Shyti wrote:
> > > You don't need to resend the patch. Because the changes are only
> > > in the commit log, I can take care of them.
> > 
> > In fact, the changes are in the code itself, see the changelog:
> > 
> >   * Replaced ACPI_HANDLE() by has_acpi_companion() (Andy Shevchenko)
> >   * Expanded the comment before the change (Andy Shevchenko)
> 
> I meant no need to send a v3.
> 
> > > Besides, you also need:
> > > 
> > > Fixes: ede2299f7101 ("i2c: tegra: Support atomic transfers")
> > > Cc: <stable@vger.kernel.org> # v5.6+
> > > 
> > > Can you please check whether this is right?
> > 
> > I would say that we probably want to blame the support for ACPI device,
> > which came later than ede2299f7101 ("i2c: tegra: Support atomic
> > transfers").
> > 
> > I'd suggest the following:
> > 
> >  Fixes: bd2fdedbf2ba ("i2c: tegra: Add the ACPI support")
> >  CC: <stable@vger.kernel.org> # v5.17+
> 
> Makes sense.
> 
> > I am not planning to submit a new patch with these changes, please let
> > me know if you need action on my side.
> 
> Not for now, you might need to still support the backports to
> stable as there might be some differences and I can already see
> that it doesn't apply that far back (from 6.1, basically).

Sure, count me in if you need backports to stable.

Thanks for getting this fixed
--breno
Andi Shyti Aug. 14, 2024, 10:30 p.m. UTC | #8
Hi Breno,

On Tue, Aug 13, 2024 at 09:12:53AM GMT, Breno Leitao wrote:
> On ACPI machines, the tegra i2c module encounters an issue due to a
> mutex being called inside a spinlock. This leads to the following bug:
> 
> 	BUG: sleeping function called from invalid context at kernel/locking/mutex.c:585
> 	in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 1282, name: kssif0010
> 	preempt_count: 0, expected: 0
> 	RCU nest depth: 0, expected: 0
> 	irq event stamp: 0
> 
> 	Call trace:
> 	__might_sleep
> 	__mutex_lock_common
> 	mutex_lock_nested
> 	acpi_subsys_runtime_resume
> 	rpm_resume
> 	tegra_i2c_xfer
> 
> The problem arises because during __pm_runtime_resume(), the spinlock
> &dev->power.lock is acquired before rpm_resume() is called. Later,
> rpm_resume() invokes acpi_subsys_runtime_resume(), which relies on
> mutexes, triggering the error.
> 
> To address this issue, devices on ACPI are now marked as not IRQ-safe,
> considering the dependency of acpi_subsys_runtime_resume() on mutexes.
> 
> Co-developed-by: Michael van der Westhuizen <rmikey@meta.com>
> Signed-off-by: Michael van der Westhuizen <rmikey@meta.com>
> Signed-off-by: Breno Leitao <leitao@debian.org>
> Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
> Reviewed-by: Andy Shevchenko <andy@kernel.org>

Merged to i2c/i2c-host-fixes.

Thanks,
Andi

Patch

diff --git a/drivers/i2c/busses/i2c-tegra.c b/drivers/i2c/busses/i2c-tegra.c
index 85b31edc558d..1df5b4204142 100644
--- a/drivers/i2c/busses/i2c-tegra.c
+++ b/drivers/i2c/busses/i2c-tegra.c
@@ -1802,9 +1802,9 @@  static int tegra_i2c_probe(struct platform_device *pdev)
 	 * domain.
 	 *
 	 * VI I2C device shouldn't be marked as IRQ-safe because VI I2C won't
-	 * be used for atomic transfers.
+	 * be used for atomic transfers. ACPI device is not IRQ safe also.
 	 */
-	if (!IS_VI(i2c_dev))
+	if (!IS_VI(i2c_dev) && !has_acpi_companion(i2c_dev->dev))
 		pm_runtime_irq_safe(i2c_dev->dev);
 
 	pm_runtime_enable(i2c_dev->dev);
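
The condition changed by the hunk above can be read as a two-input predicate. The following is a minimal sketch, not driver code: `toy_should_mark_irq_safe()` and `toy_selftest()` are hypothetical helpers, and the two boolean parameters stand in for the results of `IS_VI(i2c_dev)` and `has_acpi_companion(i2c_dev->dev)`.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper mirroring the patched condition: a device is marked
 * IRQ-safe only if it is neither the VI I2C controller nor an ACPI device.
 */
bool toy_should_mark_irq_safe(bool is_vi, bool has_acpi_companion)
{
	return !is_vi && !has_acpi_companion;
}

/* Walks the full truth table of the predicate. */
void toy_selftest(void)
{
	assert(toy_should_mark_irq_safe(false, false));  /* plain DT device */
	assert(!toy_should_mark_irq_safe(true, false));  /* VI I2C */
	assert(!toy_should_mark_irq_safe(false, true));  /* ACPI device */
	assert(!toy_should_mark_irq_safe(true, true));   /* both */
}
```

Before the patch, the second input did not exist, so an ACPI-enumerated controller that was not VI I2C was still marked IRQ-safe.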