mbox series

[v2,0/4] gpio: fix SRCU bugs

Message ID 20240214084419.6194-1-brgl@bgdev.pl
Headers show
Series gpio: fix SRCU bugs | expand

Message

Bartosz Golaszewski Feb. 14, 2024, 8:44 a.m. UTC
From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>

Here are four fixes to some bugs in recent SRCU changes. The first one fixes
an actual race condition. The other three just make lockdep happy.

v1 -> v2:
- use srcu_dereference() instead of rcu_dereference_protected() as
  advised by Paul
- add a patch using rcu_dereference_check(..., 1) in deprecated
  interfaces that return the address of the RCU-protected chip structure
  to external users (who shouldn't use it anyway but well...)
- pick up review tags for patches 1/4 and 2/4

Bartosz Golaszewski (4):
  gpio: take the SRCU read lock in gpiod_hog()
  gpio: cdev: use correct pointer accessors with SRCU
  gpio: use srcu_dereference() with SRCU-protected pointers
  gpio: don't let lockdep complain about inherently dangerous RCU usage

 drivers/gpio/gpiolib-cdev.c  | 25 ++++++++++++-------------
 drivers/gpio/gpiolib-sysfs.c |  5 +++--
 drivers/gpio/gpiolib.c       | 32 ++++++++++++++++++--------------
 drivers/gpio/gpiolib.h       |  3 ++-
 4 files changed, 35 insertions(+), 30 deletions(-)

Comments

Andy Shevchenko Feb. 14, 2024, 12:53 p.m. UTC | #1
On Wed, Feb 14, 2024 at 09:44:15AM +0100, Bartosz Golaszewski wrote:
> From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
> 
> Here are four fixes to some bugs in recent SRCU changes. The first one fixes
> an actual race condition. The other three just make lockdep happy.

Same comment here, can we simply redo this work so we won't have tons of fixes
on top of the nice RCU rework?
Bartosz Golaszewski Feb. 14, 2024, 1:02 p.m. UTC | #2
On Wed, Feb 14, 2024 at 1:53 PM Andy Shevchenko
<andriy.shevchenko@linux.intel.com> wrote:
>
> On Wed, Feb 14, 2024 at 09:44:15AM +0100, Bartosz Golaszewski wrote:
> > From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
> >
> > Here are four fixes to some bugs in recent SRCU changes. The first one fixes
> > an actual race condition. The other three just make lockdep happy.
>
> Same comment here, can we simply redo this work so we won't have tons of fixes
> on top of the nice RCU rework?
>

The rework is clearly a new development - not meant for backporting. I
don't see any benefit from rebasing. These are normal fixes for a big
rework.

Bart
Mark Brown Feb. 14, 2024, 1:17 p.m. UTC | #3
On Wed, Feb 14, 2024 at 09:44:15AM +0100, Bartosz Golaszewski wrote:
> From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
> 
> Here are four fixes to some bugs in recent SRCU changes. The first one fixes
> an actual race condition. The other three just make lockdep happy.

This doesn't fix the issue I reported yesterday when applied on top of
today's next:

   https://lava.sirena.org.uk/scheduler/job/585469

[    1.995518] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000078

...

[    2.176162] Call trace:
[    2.178610]  check_init_srcu_struct+0x1c/0xa0
[    2.182974]  synchronize_srcu+0x1c/0x100
[    2.186904]  gpiod_request_commit+0xec/0x1e0
[    2.191183]  gpiochip_request_own_desc+0x58/0x124
[    2.195894]  gpiod_hog+0x114/0x1b4
[    2.199305]  of_gpiochip_add+0x208/0x370
[    2.203232]  gpiochip_add_data_with_key+0x71c/0xf10
[    2.208117]  devm_gpiochip_add_data_with_key+0x30/0x7c
[    2.213261]  mxc_gpio_probe+0x208/0x4b0
Bartosz Golaszewski Feb. 14, 2024, 1:22 p.m. UTC | #4
On Wed, Feb 14, 2024 at 2:17 PM Mark Brown <broonie@kernel.org> wrote:
>
> On Wed, Feb 14, 2024 at 09:44:15AM +0100, Bartosz Golaszewski wrote:
> > From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
> >
> > Here are four fixes to some bugs in recent SRCU changes. The first one fixes
> > an actual race condition. The other three just make lockdep happy.
>
> This doesn't fix the issue I reported yesterday when applied on top of
> today's next:
>
>    https://lava.sirena.org.uk/scheduler/job/585469
>
> [    1.995518] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000078
>
> ...
>
> [    2.176162] Call trace:
> [    2.178610]  check_init_srcu_struct+0x1c/0xa0
> [    2.182974]  synchronize_srcu+0x1c/0x100
> [    2.186904]  gpiod_request_commit+0xec/0x1e0
> [    2.191183]  gpiochip_request_own_desc+0x58/0x124
> [    2.195894]  gpiod_hog+0x114/0x1b4
> [    2.199305]  of_gpiochip_add+0x208/0x370
> [    2.203232]  gpiochip_add_data_with_key+0x71c/0xf10
> [    2.208117]  devm_gpiochip_add_data_with_key+0x30/0x7c
> [    2.213261]  mxc_gpio_probe+0x208/0x4b0

No, the fix for this issue is in my tree as commit ba5c5effe02c
("gpio: initialize descriptor SRCU structure before adding OF-based
chips"). These are mostly fixes for lockdep warnings.

Bart
Paul E. McKenney Feb. 14, 2024, 6:44 p.m. UTC | #5
On Wed, Feb 14, 2024 at 09:44:15AM +0100, Bartosz Golaszewski wrote:
> From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
> 
> Here are four fixes to some bugs in recent SRCU changes. The first one fixes
> an actual race condition. The other three just make lockdep happy.

For 1/4-3/4:

Acked-by: Paul E. McKenney <paulmck@kernel.org>

For 4/4, you are playing with fire, but I will assume that you know what
you are doing.  ;-)

							Thanx, Paul

> v1 -> v2:
> - use srcu_dereference() instead of rcu_dereference_protected() as
>   advised by Paul
> - add a patch using rcu_dereference_check(..., 1) in deprecated
>   interfaces that return the address of the RCU-protected chip structure
>   to external users (who shouldn't use it anyway but well...)
> - pick up review tags for patches 1/4 and 2/4
> 
> Bartosz Golaszewski (4):
>   gpio: take the SRCU read lock in gpiod_hog()
>   gpio: cdev: use correct pointer accessors with SRCU
>   gpio: use srcu_dereference() with SRCU-protected pointers
>   gpio: don't let lockdep complain about inherently dangerous RCU usage
> 
>  drivers/gpio/gpiolib-cdev.c  | 25 ++++++++++++-------------
>  drivers/gpio/gpiolib-sysfs.c |  5 +++--
>  drivers/gpio/gpiolib.c       | 32 ++++++++++++++++++--------------
>  drivers/gpio/gpiolib.h       |  3 ++-
>  4 files changed, 35 insertions(+), 30 deletions(-)
> 
> -- 
> 2.40.1
>
Bartosz Golaszewski Feb. 14, 2024, 7:08 p.m. UTC | #6
On Wed, Feb 14, 2024 at 7:44 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Wed, Feb 14, 2024 at 09:44:15AM +0100, Bartosz Golaszewski wrote:
> > From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
> >
> > Here are four fixes to some bugs in recent SRCU changes. The first one fixes
> > an actual race condition. The other three just make lockdep happy.
>
> For 1/4-3/4:
>
> Acked-by: Paul E. McKenney <paulmck@kernel.org>
>
> For 4/4, you are playing with fire, but I will assume that you know what
> you are doing.  ;-)
>

Up until this rework, this gdev->chip pointer could go from under any
user at any point. Now we have this gpio_device wrapper that provides
an entry point to using the chip safely while protected by the SRCU
read lock. Anyone who is still accessing gpio_chip directly (and not
being the GPIO provider themselves) is asking for trouble. There's
however no point in spamming lockdep splats in this case. I may end up
adding a warning to these routines.

Unfortunately, it's hard to fix 15 years of technical debt. :(

Thanks for the Acks.
Bartosz

[snip]
Paul E. McKenney Feb. 14, 2024, 8:44 p.m. UTC | #7
On Wed, Feb 14, 2024 at 08:08:33PM +0100, Bartosz Golaszewski wrote:
> On Wed, Feb 14, 2024 at 7:44 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Wed, Feb 14, 2024 at 09:44:15AM +0100, Bartosz Golaszewski wrote:
> > > From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
> > >
> > > Here are four fixes to some bugs in recent SRCU changes. The first one fixes
> > > an actual race condition. The other three just make lockdep happy.
> >
> > For 1/4-3/4:
> >
> > Acked-by: Paul E. McKenney <paulmck@kernel.org>
> >
> > For 4/4, you are playing with fire, but I will assume that you know what
> > you are doing.  ;-)
> 
> Up until this rework, this gdev->chip pointer could go from under any
> user at any point. Now we have this gpio_device wrapper that provides
> an entry point to using the chip safely while protected by the SRCU
> read lock. Anyone who is still accessing gpio_chip directly (and not
> being the GPIO provider themselves) is asking for trouble. There's
> however no point in spamming lockdep splats in this case. I may end up
> adding a warning to these routines.
> 
> Unfortunately, it's hard to fix 15 years of technical debt. :(

Indeed, life is sometimes hard!

							Thanx, Paul

> Thanks for the Acks.
> Bartosz
> 
> [snip]
Bartosz Golaszewski Feb. 15, 2024, 7:43 a.m. UTC | #8
On Wed, Feb 14, 2024 at 9:44 AM Bartosz Golaszewski <brgl@bgdev.pl> wrote:
>
> From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
>
> Here are four fixes to some bugs in recent SRCU changes. The first one fixes
> an actual race condition. The other three just make lockdep happy.
>
> v1 -> v2:
> - use srcu_dereference() instead of rcu_dereference_protected() as
>   advised by Paul
> - add a patch using rcu_dereference_check(..., 1) in deprecated
>   interfaces that return the address of the RCU-protected chip structure
>   to external users (who shouldn't use it anyway but well...)
> - pick up review tags for patches 1/4 and 2/4
>
> Bartosz Golaszewski (4):
>   gpio: take the SRCU read lock in gpiod_hog()
>   gpio: cdev: use correct pointer accessors with SRCU
>   gpio: use srcu_dereference() with SRCU-protected pointers
>   gpio: don't let lockdep complain about inherently dangerous RCU usage
>
>  drivers/gpio/gpiolib-cdev.c  | 25 ++++++++++++-------------
>  drivers/gpio/gpiolib-sysfs.c |  5 +++--
>  drivers/gpio/gpiolib.c       | 32 ++++++++++++++++++--------------
>  drivers/gpio/gpiolib.h       |  3 ++-
>  4 files changed, 35 insertions(+), 30 deletions(-)
>
> --
> 2.40.1
>

Series queued.

Bart
Linus Walleij Feb. 21, 2024, 9:45 p.m. UTC | #9
On Wed, Feb 14, 2024 at 8:08 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote:

> Unfortunately, it's hard to fix 15 years of technical debt. :(

If it's any consolation I didn't create this one technical debt on purpose.

The actual mistake creating it goes something like: /dev/gpiochipN files are
nice and we can clean up after a file handle is closed or terminated,
I wonder why people insist on using sysfs for so many things.

All the file semantics of handles going away by being used etc, that's
why sysfs is good. I was too inexperienced to understand that, or I would
have paid more attention...

Yet I think we ended up in a reasonable place.

Yours,
Linus Walleij