diff mbox

[RFC] i2c: Don't wait for device release in i2c_del_adapter

Message ID 1421082050-10213-1-git-send-email-pantelis.antoniou@konsulko.com
State Deferred
Headers show

Commit Message

Pantelis Antoniou Jan. 12, 2015, 5 p.m. UTC
Waiting for the device release method to be called when
going through i2c_del_adapter is wrong and leads to deadlock
when removing an i2c mux device.

For instance when using the OF i2c mux unitest removal we get this.

[<c055dfdc>] (__schedule) from [<c0561b74>] (schedule_timeout+0x18/0x198)
[<c0561b74>] (schedule_timeout) from [<c055ee74>] (wait_for_common+0xf8/0x138)
[<c055ee74>] (wait_for_common) from [<c03f724c>] (i2c_del_adapter+0x174/0x1cc)
[<c03f724c>] (i2c_del_adapter) from [<c03f8480>] (i2c_del_mux_adapter+0x48/0x60)
[<c03f8480>] (i2c_del_mux_adapter) from [<c046cf00>] (selftest_i2c_mux_remove+0x28/0x34)
[<c046cf00>] (selftest_i2c_mux_remove) from [<c03f5234>] (i2c_device_remove+0x34/0x70)
[<c03f5234>] (i2c_device_remove) from [<c03322ec>] (__device_release_driver+0x7c/0xc0)
[<c03322ec>] (__device_release_driver) from [<c0332968>] (driver_detach+0x8c/0xb4)
[<c0332968>] (driver_detach) from [<c0332088>] (bus_remove_driver+0x64/0x8c)
[<c0332088>] (bus_remove_driver) from [<c07f1858>] (of_selftest+0x1f84/0x20f0)
[<c07f1858>] (of_selftest) from [<c0008b04>] (do_one_initcall+0x104/0x1b4)
[<c0008b04>] (do_one_initcall) from [<c07c4d9c>] (kernel_init_freeable+0x110/0x1d8)
[<c07c4d9c>] (kernel_init_freeable) from [<c05591ec>] (kernel_init+0x8/0xe4)
[<c05591ec>] (kernel_init) from [<c000deb8>] (ret_from_fork+0x14/0x3c)
2 locks held by swapper/0/1:
 #0:  (&dev->mutex){......}, at: [<c0332944>] driver_detach+0x68/0xb4
 #1:  (&dev->mutex){......}, at: [<c0332954>] driver_detach+0x78/0xb4
Kernel panic - not syncing: hung_task: blocked tasks
CPU: 0 PID: 16 Comm: khungtaskd Not tainted 3.19.0-rc4-00022-g261647c #443
Hardware name: Generic AM33XX (Flattened Device Tree)
[<c00140a8>] (unwind_backtrace) from [<c00110dc>] (show_stack+0x10/0x14)
[<c00110dc>] (show_stack) from [<c055c3ac>] (dump_stack+0x70/0x8c)
[<c055c3ac>] (dump_stack) from [<c055ad10>] (panic+0x88/0x1f0)
[<c055ad10>] (panic) from [<c00adef8>] (watchdog+0x2d4/0x350)
[<c00adef8>] (watchdog) from [<c004e984>] (kthread+0xd8/0xec)
[<c004e984>] (kthread) from [<c000deb8>] (ret_from_fork+0x14/0x3c)

The device release method is never called and we hang waiting for it.

This patch is marked as an [RFC] since the original code seems to
really want to protect against a race from sysfs accessors, which
I don't see how it could be a problem.

Signed-off-by: Pantelis Antoniou <pantelis.antoniou@konsulko.com>
---
 drivers/i2c/i2c-core.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Comments

Wolfram Sang Jan. 13, 2015, 3:29 p.m. UTC | #1
On Mon, Jan 12, 2015 at 07:00:50PM +0200, Pantelis Antoniou wrote:
> Waiting for the device release method to be called when
> going through i2c_del_adapter is wrong and leads to deadlock
> when removing an i2c mux device.

Adding Jean to the list, maybe he knows more details from the past.

> 
> For instance when using the OF i2c mux unitest removal we get this.
> 
> [<c055dfdc>] (__schedule) from [<c0561b74>] (schedule_timeout+0x18/0x198)
> [<c0561b74>] (schedule_timeout) from [<c055ee74>] (wait_for_common+0xf8/0x138)
> [<c055ee74>] (wait_for_common) from [<c03f724c>] (i2c_del_adapter+0x174/0x1cc)
> [<c03f724c>] (i2c_del_adapter) from [<c03f8480>] (i2c_del_mux_adapter+0x48/0x60)
> [<c03f8480>] (i2c_del_mux_adapter) from [<c046cf00>] (selftest_i2c_mux_remove+0x28/0x34)
> [<c046cf00>] (selftest_i2c_mux_remove) from [<c03f5234>] (i2c_device_remove+0x34/0x70)
> [<c03f5234>] (i2c_device_remove) from [<c03322ec>] (__device_release_driver+0x7c/0xc0)
> [<c03322ec>] (__device_release_driver) from [<c0332968>] (driver_detach+0x8c/0xb4)
> [<c0332968>] (driver_detach) from [<c0332088>] (bus_remove_driver+0x64/0x8c)
> [<c0332088>] (bus_remove_driver) from [<c07f1858>] (of_selftest+0x1f84/0x20f0)
> [<c07f1858>] (of_selftest) from [<c0008b04>] (do_one_initcall+0x104/0x1b4)
> [<c0008b04>] (do_one_initcall) from [<c07c4d9c>] (kernel_init_freeable+0x110/0x1d8)
> [<c07c4d9c>] (kernel_init_freeable) from [<c05591ec>] (kernel_init+0x8/0xe4)
> [<c05591ec>] (kernel_init) from [<c000deb8>] (ret_from_fork+0x14/0x3c)
> 2 locks held by swapper/0/1:
>  #0:  (&dev->mutex){......}, at: [<c0332944>] driver_detach+0x68/0xb4
>  #1:  (&dev->mutex){......}, at: [<c0332954>] driver_detach+0x78/0xb4
> Kernel panic - not syncing: hung_task: blocked tasks
> CPU: 0 PID: 16 Comm: khungtaskd Not tainted 3.19.0-rc4-00022-g261647c #443
> Hardware name: Generic AM33XX (Flattened Device Tree)
> [<c00140a8>] (unwind_backtrace) from [<c00110dc>] (show_stack+0x10/0x14)
> [<c00110dc>] (show_stack) from [<c055c3ac>] (dump_stack+0x70/0x8c)
> [<c055c3ac>] (dump_stack) from [<c055ad10>] (panic+0x88/0x1f0)
> [<c055ad10>] (panic) from [<c00adef8>] (watchdog+0x2d4/0x350)
> [<c00adef8>] (watchdog) from [<c004e984>] (kthread+0xd8/0xec)
> [<c004e984>] (kthread) from [<c000deb8>] (ret_from_fork+0x14/0x3c)
> 
> The device release method is never called and we hang waiting for it.
> 
> This patch is marked as an [RFC] since the original code seems to
> really want to protect against a race from sysfs accessors, which
> I don't see how it could be a problem.
> 
> Signed-off-by: Pantelis Antoniou <pantelis.antoniou@konsulko.com>
> ---
>  drivers/i2c/i2c-core.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/i2c/i2c-core.c b/drivers/i2c/i2c-core.c
> index 39d25a8..e020a16 100644
> --- a/drivers/i2c/i2c-core.c
> +++ b/drivers/i2c/i2c-core.c
> @@ -1185,7 +1185,7 @@ EXPORT_SYMBOL_GPL(i2c_new_dummy);
>  static void i2c_adapter_dev_release(struct device *dev)
>  {
>  	struct i2c_adapter *adap = to_i2c_adapter(dev);
> -	complete(&adap->dev_released);
> +	/* XXX complete(&adap->dev_released); */
>  }
>  
>  /*
> @@ -1797,11 +1797,11 @@ void i2c_del_adapter(struct i2c_adapter *adap)
>  	dev_dbg(&adap->dev, "adapter [%s] unregistered\n", adap->name);
>  
>  	/* clean up the sysfs representation */
> -	init_completion(&adap->dev_released);
> +	/* XXX init_completion(&adap->dev_released); */
>  	device_unregister(&adap->dev);
>  
>  	/* wait for sysfs to drop all references */
> -	wait_for_completion(&adap->dev_released);
> +	/* XXX wait_for_completion(&adap->dev_released); */
>  
>  	/* free bus id */
>  	mutex_lock(&core_lock);
> -- 
> 1.7.12
>
Jean Delvare Jan. 14, 2015, 1:49 p.m. UTC | #2
Hi Wolfram, Pantelis,

On Tue, 13 Jan 2015 16:29:57 +0100, Wolfram Sang wrote:
> On Mon, Jan 12, 2015 at 07:00:50PM +0200, Pantelis Antoniou wrote:
> > Waiting for the device release method to be called when
> > going through i2c_del_adapter is wrong and leads to deadlock
> > when removing an i2c mux device.
> 
> Adding Jean to the list, maybe he knows more details from the past.
> 
> > For instance when using the OF i2c mux unitest removal we get this.

I can't parse the sentence above, sorry. What is "unitest"?

> > [<c055dfdc>] (__schedule) from [<c0561b74>] (schedule_timeout+0x18/0x198)
> > [<c0561b74>] (schedule_timeout) from [<c055ee74>] (wait_for_common+0xf8/0x138)
> > [<c055ee74>] (wait_for_common) from [<c03f724c>] (i2c_del_adapter+0x174/0x1cc)
> > [<c03f724c>] (i2c_del_adapter) from [<c03f8480>] (i2c_del_mux_adapter+0x48/0x60)
> > [<c03f8480>] (i2c_del_mux_adapter) from [<c046cf00>] (selftest_i2c_mux_remove+0x28/0x34)
> > [<c046cf00>] (selftest_i2c_mux_remove) from [<c03f5234>] (i2c_device_remove+0x34/0x70)
> > [<c03f5234>] (i2c_device_remove) from [<c03322ec>] (__device_release_driver+0x7c/0xc0)
> > [<c03322ec>] (__device_release_driver) from [<c0332968>] (driver_detach+0x8c/0xb4)
> > [<c0332968>] (driver_detach) from [<c0332088>] (bus_remove_driver+0x64/0x8c)
> > [<c0332088>] (bus_remove_driver) from [<c07f1858>] (of_selftest+0x1f84/0x20f0)
> > [<c07f1858>] (of_selftest) from [<c0008b04>] (do_one_initcall+0x104/0x1b4)
> > [<c0008b04>] (do_one_initcall) from [<c07c4d9c>] (kernel_init_freeable+0x110/0x1d8)
> > [<c07c4d9c>] (kernel_init_freeable) from [<c05591ec>] (kernel_init+0x8/0xe4)
> > [<c05591ec>] (kernel_init) from [<c000deb8>] (ret_from_fork+0x14/0x3c)
> > 2 locks held by swapper/0/1:
> >  #0:  (&dev->mutex){......}, at: [<c0332944>] driver_detach+0x68/0xb4
> >  #1:  (&dev->mutex){......}, at: [<c0332954>] driver_detach+0x78/0xb4
> > Kernel panic - not syncing: hung_task: blocked tasks
> > CPU: 0 PID: 16 Comm: khungtaskd Not tainted 3.19.0-rc4-00022-g261647c #443
> > Hardware name: Generic AM33XX (Flattened Device Tree)
> > [<c00140a8>] (unwind_backtrace) from [<c00110dc>] (show_stack+0x10/0x14)
> > [<c00110dc>] (show_stack) from [<c055c3ac>] (dump_stack+0x70/0x8c)
> > [<c055c3ac>] (dump_stack) from [<c055ad10>] (panic+0x88/0x1f0)
> > [<c055ad10>] (panic) from [<c00adef8>] (watchdog+0x2d4/0x350)
> > [<c00adef8>] (watchdog) from [<c004e984>] (kthread+0xd8/0xec)
> > [<c004e984>] (kthread) from [<c000deb8>] (ret_from_fork+0x14/0x3c)
> > 
> > The device release method is never called and we hang waiting for it.
> > 
> > This patch is marked as an [RFC] since the original code seems to
> > really want to protect against a race from sysfs accessors, which
> > I don't see how it could be a problem.

The code is not from me, it's from Greg KH. It was written in August
2003 when Greg converted the i2c subsystem to somewhat fit into the
(then) new device driver model. So that was a long time ago and it is
possible that this is no longer necessary. FWIW I can't see anything
similar in other subsystems.

In fact I'm not sure what purpose the completion serves exactly. I
expected it to delay the i2c adapter removal until no sysfs user of it
was left (as the comment suggests.) However I just tested unloading an
i2c bus driver while its adapter's new_device attribute was opened and
rmmod returned immediately. So it doesn't look like accessing sysfs
attributes actually takes a reference to the underlying i2c_adapter.

That being said, nobody complained about this in 11 years, so I would
be surprised if it was plain wrong. I'd rather question the test code.
Where is it?
Pantelis Antoniou Jan. 14, 2015, 1:54 p.m. UTC | #3
Hi Jean,

> On Jan 14, 2015, at 15:49 , Jean Delvare <jdelvare@suse.de> wrote:
> 
> Hi Wolfram, Pantelis,
> 
> On Tue, 13 Jan 2015 16:29:57 +0100, Wolfram Sang wrote:
>> On Mon, Jan 12, 2015 at 07:00:50PM +0200, Pantelis Antoniou wrote:
>>> Waiting for the device release method to be called when
>>> going through i2c_del_adapter is wrong and leads to deadlock
>>> when removing an i2c mux device.
>> 
>> Adding Jean to the list, maybe he knows more details from the past.
>> 
>>> For instance when using the OF i2c mux unitest removal we get this.
> 
> I can't parse the sentence above, sorry. What is "unitest”?
> 

The DT overlay patchset contains a number of OF unitests for its features.

One of the new tests is performing a runtime insertion and removal of an i2c mux containing
an i2c device on one of the channels.

When removing the mux device we get a deadlock.

>>> [<c055dfdc>] (__schedule) from [<c0561b74>] (schedule_timeout+0x18/0x198)
>>> [<c0561b74>] (schedule_timeout) from [<c055ee74>] (wait_for_common+0xf8/0x138)
>>> [<c055ee74>] (wait_for_common) from [<c03f724c>] (i2c_del_adapter+0x174/0x1cc)
>>> [<c03f724c>] (i2c_del_adapter) from [<c03f8480>] (i2c_del_mux_adapter+0x48/0x60)
>>> [<c03f8480>] (i2c_del_mux_adapter) from [<c046cf00>] (selftest_i2c_mux_remove+0x28/0x34)
>>> [<c046cf00>] (selftest_i2c_mux_remove) from [<c03f5234>] (i2c_device_remove+0x34/0x70)
>>> [<c03f5234>] (i2c_device_remove) from [<c03322ec>] (__device_release_driver+0x7c/0xc0)
>>> [<c03322ec>] (__device_release_driver) from [<c0332968>] (driver_detach+0x8c/0xb4)
>>> [<c0332968>] (driver_detach) from [<c0332088>] (bus_remove_driver+0x64/0x8c)
>>> [<c0332088>] (bus_remove_driver) from [<c07f1858>] (of_selftest+0x1f84/0x20f0)
>>> [<c07f1858>] (of_selftest) from [<c0008b04>] (do_one_initcall+0x104/0x1b4)
>>> [<c0008b04>] (do_one_initcall) from [<c07c4d9c>] (kernel_init_freeable+0x110/0x1d8)
>>> [<c07c4d9c>] (kernel_init_freeable) from [<c05591ec>] (kernel_init+0x8/0xe4)
>>> [<c05591ec>] (kernel_init) from [<c000deb8>] (ret_from_fork+0x14/0x3c)
>>> 2 locks held by swapper/0/1:
>>> #0:  (&dev->mutex){......}, at: [<c0332944>] driver_detach+0x68/0xb4
>>> #1:  (&dev->mutex){......}, at: [<c0332954>] driver_detach+0x78/0xb4
>>> Kernel panic - not syncing: hung_task: blocked tasks
>>> CPU: 0 PID: 16 Comm: khungtaskd Not tainted 3.19.0-rc4-00022-g261647c #443
>>> Hardware name: Generic AM33XX (Flattened Device Tree)
>>> [<c00140a8>] (unwind_backtrace) from [<c00110dc>] (show_stack+0x10/0x14)
>>> [<c00110dc>] (show_stack) from [<c055c3ac>] (dump_stack+0x70/0x8c)
>>> [<c055c3ac>] (dump_stack) from [<c055ad10>] (panic+0x88/0x1f0)
>>> [<c055ad10>] (panic) from [<c00adef8>] (watchdog+0x2d4/0x350)
>>> [<c00adef8>] (watchdog) from [<c004e984>] (kthread+0xd8/0xec)
>>> [<c004e984>] (kthread) from [<c000deb8>] (ret_from_fork+0x14/0x3c)
>>> 
>>> The device release method is never called and we hang waiting for it.
>>> 
>>> This patch is marked as an [RFC] since the original code seems to
>>> really want to protect against a race from sysfs accessors, which
>>> I don't see how it could be a problem.
> 
> The code is not from me, it's from Greg KH. It was written in August
> 2003 when Greg converted the i2c subsystem to somewhat fit into the
> (then) new device driver model. So that was a long time ago and it is
> possible that this is no longer necessary. FWIW I can't see anything
> similar in other subsystems.
> 
> In fact I'm not sure what purpose the completion serves exactly. I
> expected it to delay the i2c adapter removal until no sysfs user of it
> was left (as the comment suggests.) However I just tested unloading an
> i2c bus driver while its adapter's new_device attribute was opened and
> rmmod returned immediately. So it doesn't look like accessing sysfs
> attributes actually takes a reference to the underlying i2c_adapter.
> 

That’s what got me stumped too. I haven’t seen something like this before.

> That being said, nobody complained about this in 11 years, so I would
> be surprised if it was plain wrong. I'd rather question the test code.
> Where is it?
> 

No-one has apparently tested removing an i2c mux device on a running system
before.

It’s in a new patchset I posted earlier.

Search for "of: unitest: Add I2C overlay unit tests.”

https://www.mail-archive.com/devicetree@vger.kernel.org/msg57577.html

> -- 
> Jean Delvare
> SUSE L3 Support

Regards

— Pantelis

--
To unsubscribe from this list: send the line "unsubscribe linux-i2c" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michael Lawnick Jan. 14, 2015, 2:06 p.m. UTC | #4
Am 14.01.2015 um 14:54 schrieb Pantelis Antoniou:
> Hi Jean,
>
>> On Jan 14, 2015, at 15:49 , Jean Delvare <jdelvare@suse.de> wrote:
...
>> That being said, nobody complained about this in 11 years, so I would
>> be surprised if it was plain wrong. I'd rather question the test code.
>> Where is it?
>>
>
> No-one has apparently tested removing an i2c mux device on a running system
> before.

We do it day by day and it works perfectly with pca954x from 2.6.32 to 
3.14.27
So if there are problems they must have been invented lately.
Pantelis Antoniou Jan. 14, 2015, 2:12 p.m. UTC | #5
Hi Michael,

> On Jan 14, 2015, at 16:06 , Michael Lawnick <ml.lawnick@gmx.de> wrote:
> 
> Am 14.01.2015 um 14:54 schrieb Pantelis Antoniou:
>> Hi Jean,
>> 
>>> On Jan 14, 2015, at 15:49 , Jean Delvare <jdelvare@suse.de> wrote:
> ...
>>> That being said, nobody complained about this in 11 years, so I would
>>> be surprised if it was plain wrong. I'd rather question the test code.
>>> Where is it?
>>> 
>> 
>> No-one has apparently tested removing an i2c mux device on a running system
>> before.
> 
> We do it day by day and it works perfectly with pca954x from 2.6.32 to 3.14.27
> So if there are problems they must have been invented lately.

I had a report that someone had the same problem with overlays containing a pca mux,
let me see if I can find the person that sent it and get him to supply more info.

Kernel was at least 3.17, so…

> -- 
> JM2C
> KR
> Michael
> 

Regards

— Pantelis

--
To unsubscribe from this list: send the line "unsubscribe linux-i2c" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Guenter Roeck Jan. 14, 2015, 2:14 p.m. UTC | #6
On 01/14/2015 05:54 AM, Pantelis Antoniou wrote:

>> That being said, nobody complained about this in 11 years, so I would
>> be surprised if it was plain wrong. I'd rather question the test code.
>> Where is it?
>>
>
> No-one has apparently tested removing an i2c mux device on a running system
> before.
>
Hi Pantelis,

We didn't see the problem with 3.14. That was with an earlier version of the
overlay patchset, though.

Guenter

--
To unsubscribe from this list: send the line "unsubscribe linux-i2c" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Pantelis Antoniou Jan. 14, 2015, 2:18 p.m. UTC | #7
Hi Guenter,

> On Jan 14, 2015, at 16:14 , Guenter Roeck <linux@roeck-us.net> wrote:
> 
> On 01/14/2015 05:54 AM, Pantelis Antoniou wrote:
> 
>>> That being said, nobody complained about this in 11 years, so I would
>>> be surprised if it was plain wrong. I'd rather question the test code.
>>> Where is it?
>>> 
>> 
>> No-one has apparently tested removing an i2c mux device on a running system
>> before.
>> 
> Hi Pantelis,
> 
> We didn't see the problem with 3.14. That was with an earlier version of the
> overlay patchset, though.
> 

There have been changes to the overlay patchset since then true, but I don’t see
how that would trigger a deadlock in i2c. The completion wait just seems bogus to me.

> Guenter
> 

Regards

— Pantelis

--
To unsubscribe from this list: send the line "unsubscribe linux-i2c" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jean Delvare Jan. 14, 2015, 3:15 p.m. UTC | #8
On Wed, 14 Jan 2015 15:54:52 +0200, Pantelis Antoniou wrote:
> On Jan 14, 2015, at 15:49 , Jean Delvare <jdelvare@suse.de> wrote:
> > That being said, nobody complained about this in 11 years, so I would
> > be surprised if it was plain wrong. I'd rather question the test code.
> > Where is it?
> 
> No-one has apparently tested removing an i2c mux device on a running system
> before.

I did that. On my system the i2c-i801 driver instantiates an
i2c-mux-gpio device. Unloading the i2c-i801 driver removes that device
and it works fine. Note however that I am not able to unload module
i2c-i801 immediately, I have to unload module i2c-mux-gpio first, as
the latter holds a reference to the former. This is because
i2c-mux-gpio calls i2c_get_adapter() on the parent I2C bus segment at
probing time.

I see that i2c-mux-pinctrl does the same, but i2c-mux-pca954x and
i2c-mux-pca9541 do not. This might be an issue. I doubt this is related
to your problem though, as these are module references and not device
references.

Were you able to figure out who is holding a reference to your
i2c_adapter? This would help us understand who is doing something wrong
here, i2c-core or your test code.
Guenter Roeck Jan. 14, 2015, 4:19 p.m. UTC | #9
On Wed, Jan 14, 2015 at 04:18:56PM +0200, Pantelis Antoniou wrote:
> Hi Guenter,
> 
> > On Jan 14, 2015, at 16:14 , Guenter Roeck <linux@roeck-us.net> wrote:
> > 
> > On 01/14/2015 05:54 AM, Pantelis Antoniou wrote:
> > 
> >>> That being said, nobody complained about this in 11 years, so I would
> >>> be surprised if it was plain wrong. I'd rather question the test code.
> >>> Where is it?
> >>> 
> >> 
> >> No-one has apparently tested removing an i2c mux device on a running system
> >> before.
> >> 
> > Hi Pantelis,
> > 
> > We didn't see the problem with 3.14. That was with an earlier version of the
> > overlay patchset, though.
> > 
> 
> There have been changes to the overlay patchset since then true, but I don’t see
> how that would trigger a deadlock in i2c. The completion wait just seems bogus to me.
> 

Maybe the problem is triggered by a change in the infrastructure.

Guenter
--
To unsubscribe from this list: send the line "unsubscribe linux-i2c" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Guenter Roeck Jan. 14, 2015, 4:24 p.m. UTC | #10
On Wed, Jan 14, 2015 at 04:15:25PM +0100, Jean Delvare wrote:
> On Wed, 14 Jan 2015 15:54:52 +0200, Pantelis Antoniou wrote:
> > On Jan 14, 2015, at 15:49 , Jean Delvare <jdelvare@suse.de> wrote:
> > > That being said, nobody complained about this in 11 years, so I would
> > > be surprised if it was plain wrong. I'd rather question the test code.
> > > Where is it?
> > 
> > No-one has apparently tested removing an i2c mux device on a running system
> > before.
> 
> I did that. On my system the i2c-i801 driver instantiates an
> i2c-mux-gpio device. Unloading the i2c-i801 driver removes that device
> and it works fine. Note however that I am not able to unload module
> i2c-i801 immediately, I have to unload module i2c-mux-gpio first, as
> the latter holds a reference to the former. This is because
> i2c-mux-gpio calls i2c_get_adapter() on the parent I2C bus segment at
> probing time.
> 
> I see that i2c-mux-pinctrl does the same, but i2c-mux-pca954x and
> i2c-mux-pca9541 do not. This might be an issue. I doubt this is related
> to your problem though, as these are module references and not device
> references.
> 
Should we add get/put functions to those drivers ?

Guenter
--
To unsubscribe from this list: send the line "unsubscribe linux-i2c" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jean Delvare Jan. 14, 2015, 5:14 p.m. UTC | #11
Hi Guenter,

On Wed, 14 Jan 2015 08:24:43 -0800, Guenter Roeck wrote:
> On Wed, Jan 14, 2015 at 04:15:25PM +0100, Jean Delvare wrote:
> > I did that. On my system the i2c-i801 driver instantiates an
> > i2c-mux-gpio device. Unloading the i2c-i801 driver removes that device
> > and it works fine. Note however that I am not able to unload module
> > i2c-i801 immediately, I have to unload module i2c-mux-gpio first, as
> > the latter holds a reference to the former. This is because
> > i2c-mux-gpio calls i2c_get_adapter() on the parent I2C bus segment at
> > probing time.
> > 
> > I see that i2c-mux-pinctrl does the same, but i2c-mux-pca954x and
> > i2c-mux-pca9541 do not. This might be an issue. I doubt this is related
> > to your problem though, as these are module references and not device
> > references.
>
> Should we add get/put functions to those drivers ?

I'm not sure. If you unload a driver which instantiated a pca9540-like
device, without unloading i2c-mux-pca954x first, does something bad
happen?
Pantelis Antoniou Jan. 14, 2015, 5:24 p.m. UTC | #12
Hi Jean,

> On Jan 14, 2015, at 19:14 , Jean Delvare <jdelvare@suse.de> wrote:
> 
> Hi Guenter,
> 
> On Wed, 14 Jan 2015 08:24:43 -0800, Guenter Roeck wrote:
>> On Wed, Jan 14, 2015 at 04:15:25PM +0100, Jean Delvare wrote:
>>> I did that. On my system the i2c-i801 driver instantiates an
>>> i2c-mux-gpio device. Unloading the i2c-i801 driver removes that device
>>> and it works fine. Note however that I am not able to unload module
>>> i2c-i801 immediately, I have to unload module i2c-mux-gpio first, as
>>> the latter holds a reference to the former. This is because
>>> i2c-mux-gpio calls i2c_get_adapter() on the parent I2C bus segment at
>>> probing time.
>>> 
>>> I see that i2c-mux-pinctrl does the same, but i2c-mux-pca954x and
>>> i2c-mux-pca9541 do not. This might be an issue. I doubt this is related
>>> to your problem though, as these are module references and not device
>>> references.
>> 
>> Should we add get/put functions to those drivers ?
> 
> I'm not sure. If you unload a driver which instantiated a pca9540-like
> device, without unloading i2c-mux-pca954x first, does something bad
> happen?
> 

I’m not sure this got anything to do with module reference. The problem
lies at the device model’s reference counts.

I’ll try to dig around tomorrow and see what the real device reference counts
are, but my hunch goes like this:

MUX
+—- ADAPTER
    +— DEV.

Mux remove method is called, i2c_del_mux_adapter is called on all the channels
of the mux, calling in turn i2c_del_adapter which hangs on completion of the
dev_released.

The call to device_unregister never calls the device_type callback (i2c_adapter_dev_release)
because the reference count is not 1 at that point, someone else is having another
reference.

I guess we could use Greg’s ideas on the matter.

> -- 
> Jean Delvare
> SUSE L3 Support

Regards

— Pantelis

--
To unsubscribe from this list: send the line "unsubscribe linux-i2c" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Greg KH Jan. 14, 2015, 8:41 p.m. UTC | #13
On Wed, Jan 14, 2015 at 07:24:22PM +0200, Pantelis Antoniou wrote:
> I’ll try to dig around tomorrow and see what the real device reference counts
> are, but my hunch goes like this:
> 
> MUX
> +—- ADAPTER
>     +— DEV.
> 
> Mux remove method is called, i2c_del_mux_adapter is called on all the channels
> of the mux, calling in turn i2c_del_adapter which hangs on completion of the
> dev_released.
> 
> The call to device_unregister never calls the device_type callback (i2c_adapter_dev_release)
> because the reference count is not 1 at that point, someone else is having another
> reference.

Who has that reference?  It's not sysfs, right?  Or is it?  How is the
remove method being called, is that coming from a sysfs file?

Having that call "wait" for the other release call to happen is really
old, as Jean points out, from 2003.  We have "fixed" sysfs since then to
detach the files from the devices easier, we used to have some nasy
reference count issues in that area.  Perhaps this isn't needed anymore,
I don't remember the i2c core code at all, that was a long time ago.

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-i2c" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Pantelis Antoniou Jan. 15, 2015, 4:25 p.m. UTC | #14
Hi Greg,

> On Jan 14, 2015, at 22:41 , Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
> 
> On Wed, Jan 14, 2015 at 07:24:22PM +0200, Pantelis Antoniou wrote:
>> I’ll try to dig around tomorrow and see what the real device reference counts
>> are, but my hunch goes like this:
>> 
>> MUX
>> +—- ADAPTER
>>    +— DEV.
>> 
>> Mux remove method is called, i2c_del_mux_adapter is called on all the channels
>> of the mux, calling in turn i2c_del_adapter which hangs on completion of the
>> dev_released.
>> 
>> The call to device_unregister never calls the device_type callback (i2c_adapter_dev_release)
>> because the reference count is not 1 at that point, someone else is having another
>> reference.
> 

First of all, my head hurts. Tracking device references ain’t easy. Is there some kind
of debugging method you’d recommend for this?

> Who has that reference?  It's not sysfs, right?  Or is it?  How is the
> remove method being called, is that coming from a sysfs file?
> 

No, it’s not sysfs. Sysfs is not used at all.

The remove method of the mux gets called and on it’s remove method i2c_del_mux_adapter is called,
which in turn calls i2c_del_adapter() which hangs.

Anyway, finally figured out what the problem was, a reference was held, and not released properly in
the i2c notifier pathc.

I’ll send out an updated patch later today.

> Having that call "wait" for the other release call to happen is really
> old, as Jean points out, from 2003.  We have "fixed" sysfs since then to
> detach the files from the devices easier, we used to have some nasy
> reference count issues in that area.  Perhaps this isn't needed anymore,


It’s awfully easy to get a hang with this blocking wait. Turns a memory leak into a hard hang.

> I don't remember the i2c core code at all, that was a long time ago.
> 

Bummer.

> thanks,
> 
> greg k-h

Regards

— Pantelis

--
To unsubscribe from this list: send the line "unsubscribe linux-i2c" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Greg KH Jan. 15, 2015, 10:55 p.m. UTC | #15
On Thu, Jan 15, 2015 at 06:25:22PM +0200, Pantelis Antoniou wrote:
> Hi Greg,
> 
> > On Jan 14, 2015, at 22:41 , Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
> > 
> > On Wed, Jan 14, 2015 at 07:24:22PM +0200, Pantelis Antoniou wrote:
> >> I’ll try to dig around tomorrow and see what the real device reference counts
> >> are, but my hunch goes like this:
> >> 
> >> MUX
> >> +—- ADAPTER
> >>    +— DEV.
> >> 
> >> Mux remove method is called, i2c_del_mux_adapter is called on all the channels
> >> of the mux, calling in turn i2c_del_adapter which hangs on completion of the
> >> dev_released.
> >> 
> >> The call to device_unregister never calls the device_type callback (i2c_adapter_dev_release)
> >> because the reference count is not 1 at that point, someone else is having another
> >> reference.
> > 
> 
> First of all, my head hurts. Tracking device references ain’t easy. Is there some kind
> of debugging method you’d recommend for this?

You can turn on debugging for kobjects and the driver core if you want
to slow down your system log a bunch, but it can be helpful for trickier
issues.  Or just sprinkle a few printks around.

> > I don't remember the i2c core code at all, that was a long time ago.
> > 
> 
> Bummer.

Do you remember code you wrote 12 year ago and haven't looked at for at
least 11?  Why expect others to as well? :)

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-i2c" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Pantelis Antoniou Jan. 16, 2015, 7:19 a.m. UTC | #16
Hi Greg,

> On Jan 16, 2015, at 00:55 , Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
> 
> On Thu, Jan 15, 2015 at 06:25:22PM +0200, Pantelis Antoniou wrote:
>> Hi Greg,
>> 
>>> On Jan 14, 2015, at 22:41 , Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
>>> 
>>> On Wed, Jan 14, 2015 at 07:24:22PM +0200, Pantelis Antoniou wrote:
>>>> I’ll try to dig around tomorrow and see what the real device reference counts
>>>> are, but my hunch goes like this:
>>>> 
>>>> MUX
>>>> +—- ADAPTER
>>>>   +— DEV.
>>>> 
>>>> Mux remove method is called, i2c_del_mux_adapter is called on all the channels
>>>> of the mux, calling in turn i2c_del_adapter which hangs on completion of the
>>>> dev_released.
>>>> 
>>>> The call to device_unregister never calls the device_type callback (i2c_adapter_dev_release)
>>>> because the reference count is not 1 at that point, someone else is having another
>>>> reference.
>>> 
>> 
>> First of all, my head hurts. Tracking device references ain’t easy. Is there some kind
>> of debugging method you’d recommend for this?
> 
> You can turn on debugging for kobjects and the driver core if you want
> to slow down your system log a bunch, but it can be helpful for trickier
> issues.  Or just sprinkle a few printks around.
> 

Turning on debugging for kobjects is like opening the floodgates…

I managed to make do with a few printks, but some kind of tracking would
be helpful. 

>>> I don't remember the i2c core code at all, that was a long time ago.
>>> 
>> 
>> Bummer.
> 
> Do you remember code you wrote 12 year ago and haven't looked at for at
> least 11?  Why expect others to as well? :)
> 

You’re supposed to be a super-hero maintainer god right? How am I supposed
to continue living with my belief system crumbled like that? :)

> thanks,
> 
> greg k-h

Regards

— Pantelis

--
To unsubscribe from this list: send the line "unsubscribe linux-i2c" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jean Delvare Jan. 16, 2015, 7:36 a.m. UTC | #17
Hi Pantelis,

On Thu, 15 Jan 2015 18:25:22 +0200, Pantelis Antoniou wrote:
> The remove method of the mux gets called and on it’s remove method i2c_del_mux_adapter is called,
> which in turn calls i2c_del_adapter() which hangs.
> 
> Anyway, finally figured out what the problem was, a reference was held, and not released properly in
> the i2c notifier pathc.
> 
> I’ll send out an updated patch later today.
> 
> > Having that call "wait" for the other release call to happen is really
> > old, as Jean points out, from 2003.  We have "fixed" sysfs since then to
> > detach the files from the devices easier, we used to have some nasy
> > reference count issues in that area.  Perhaps this isn't needed anymore,
> 
> It’s awfully easy to get a hang with this blocking wait. Turns a memory leak into a hard hang.

Turns an invisible memory leak into a visible bug, giving you the
opportunity to fix it. You should be grateful ;-)

That being said, if this completion serves no other purpose then it
can go away, I don't mind.
diff mbox

Patch

diff --git a/drivers/i2c/i2c-core.c b/drivers/i2c/i2c-core.c
index 39d25a8..e020a16 100644
--- a/drivers/i2c/i2c-core.c
+++ b/drivers/i2c/i2c-core.c
@@ -1185,7 +1185,7 @@  EXPORT_SYMBOL_GPL(i2c_new_dummy);
 static void i2c_adapter_dev_release(struct device *dev)
 {
 	struct i2c_adapter *adap = to_i2c_adapter(dev);
-	complete(&adap->dev_released);
+	/* XXX complete(&adap->dev_released); */
 }
 
 /*
@@ -1797,11 +1797,11 @@  void i2c_del_adapter(struct i2c_adapter *adap)
 	dev_dbg(&adap->dev, "adapter [%s] unregistered\n", adap->name);
 
 	/* clean up the sysfs representation */
-	init_completion(&adap->dev_released);
+	/* XXX init_completion(&adap->dev_released); */
 	device_unregister(&adap->dev);
 
 	/* wait for sysfs to drop all references */
-	wait_for_completion(&adap->dev_released);
+	/* XXX wait_for_completion(&adap->dev_released); */
 
 	/* free bus id */
 	mutex_lock(&core_lock);