diff mbox

[BUGFIX,2/9] ACPIPHP: fix device destroying order issue when handling dock notification

Message ID 2301639.HY0nSRLtah@vostro.rjw.lan
State Not Applicable
Headers show

Commit Message

Rafael J. Wysocki June 14, 2013, 12:23 p.m. UTC
On Thursday, June 13, 2013 09:59:44 PM Rafael J. Wysocki wrote:
> On Friday, June 14, 2013 12:32:25 AM Jiang Liu wrote:
> > Current ACPI glue logic expects that physical devices are destroyed
> > before destroying companion ACPI devices, otherwise it will break the
> > ACPI unbind logic and cause following warning messages:
> > [  185.026073] usb usb5: Oops, 'acpi_handle' corrupt
> > [  185.035150] pci 0000:1b:00.0: Oops, 'acpi_handle' corrupt
> > [  185.035515] pci 0000:18:02.0: Oops, 'acpi_handle' corrupt
> > [  180.013656]  port1: Oops, 'acpi_handle' corrupt
> > Please refer to https://bugzilla.kernel.org/attachment.cgi?id=104321
> > for full log message.
> 
> So my question is, did we have this problem before commit 3b63aaa70e1?
> 
> If we did, then when did it start?  Or was it present forever?
> 
> > Above warning messages are caused by following scenario:
> > 1) acpi_dock_notifier_call() queues a task (T1) onto kacpi_hotplug_wq
> > 2) kacpi_hotplug_wq handles T1, which invokes acpi_dock_deferred_cb()
> >    ->dock_notify()-> handle_eject_request()->hotplug_dock_devices()
> > 3) hotplug_dock_devices() first invokes registered hotplug callbacks to
> >    destroy physical devices, then destroys all affected ACPI devices.
> >    Everything seems perfect until now. But the acpiphp dock notification
> >    handler will queue another task (T2) onto kacpi_hotplug_wq to really
> >    destroy affected physical devices.
> 
> Would not the solution be to modify it so that it didn't spawn the other
> task (T2), but removed the affected physical devices synchronously?
> 
> > 4) kacpi_hotplug_wq finishes T1, and all affected ACPI devices have
> >    been destroyed.
> > 5) kacpi_hotplug_wq handles T2, which destroys all affected physical
> >    devices.
> > 
> > So it breaks ACPI glue logic's expection because ACPI devices are destroyed
> > in step 3 and physical devices are destroyed in step 5.
> > 
> > Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
> > Reported-by: Alexander E. Patrakov <patrakov@gmail.com>
> > Cc: Bjorn Helgaas <bhelgaas@google.com>
> > Cc: Yinghai Lu <yinghai@kernel.org>
> > Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
> > Cc: linux-pci@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Cc: stable@vger.kernel.org
> > ---
> > Hi Bjorn and Rafael,
> >      The recursive lock changes haven't been tested yet, need help
> > from Alexander for testing.
> 
> Well, let's just say I'm not a fan of recursive locks.  Is that unavoidable
> here?

What about the appended patch (on top of [1/9], untested)?

Rafael


---
 drivers/pci/hotplug/acpiphp_glue.c |   13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Alexander E. Patrakov June 14, 2013, 12:30 p.m. UTC | #1
2013/6/14 Rafael J. Wysocki <rjw@sisk.pl>:

> What about the appended patch (on top of [1/9], untested)?
>
> Rafael

Sorry, I have lost the track of which patches, on top of 3.10-rc5, I
should include in my testing and which I shouldn't. Could you please
restore my understanding? I.e., please provide a full list of LKML or
Bugzilla links to patches which I should test during the next boot of
the laptop.

Thanks in advance.

> ---
>  drivers/pci/hotplug/acpiphp_glue.c |   13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> Index: linux-pm/drivers/pci/hotplug/acpiphp_glue.c
> ===================================================================
> --- linux-pm.orig/drivers/pci/hotplug/acpiphp_glue.c
> +++ linux-pm/drivers/pci/hotplug/acpiphp_glue.c
> @@ -145,9 +145,20 @@ static int post_dock_fixups(struct notif
>         return NOTIFY_OK;
>  }
>
> +static void handle_dock_event_func(acpi_handle handle, u32 event, void *context)
> +{
> +       if (event == ACPI_NOTIFY_EJECT_REQUEST) {
> +               struct acpiphp_func *func = context;
> +
> +               if (!acpiphp_disable_slot(func->slot))
> +                       acpiphp_eject_slot(func->slot);
> +       } else {
> +               handle_hotplug_event_func(handle, event, context);
> +       }
> +}
>
>  static const struct acpi_dock_ops acpiphp_dock_ops = {
> -       .handler = handle_hotplug_event_func,
> +       .handler = handle_dock_event_func,
>  };
>
>  /* Check whether the PCI device is managed by native PCIe hotplug driver */
>



--
Alexander E. Patrakov
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki June 14, 2013, 12:53 p.m. UTC | #2
On Friday, June 14, 2013 06:30:58 PM Alexander E. Patrakov wrote:
> 2013/6/14 Rafael J. Wysocki <rjw@sisk.pl>:
> 
> > What about the appended patch (on top of [1/9], untested)?
> >
> > Rafael
> 
> Sorry, I have lost the track of which patches, on top of 3.10-rc5, I
> should include in my testing and which I shouldn't. Could you please
> restore my understanding? I.e., please provide a full list of LKML or
> Bugzilla links to patches which I should test during the next boot of
> the laptop.
> 
> Thanks in advance.

As far as I'm concerned, please test these two for now:

https://patchwork.kernel.org/patch/2717851/
https://patchwork.kernel.org/patch/2721271/

They are targeted at the ordering problems only, however, so they won't address
the resource allocation issues you're seeing.

There are some other patches Jiang Liu and Yinghai are working on as far as
I can say, but I'm not sure what their status is at the moment.

Thanks,
Rafael
Jiang Liu June 14, 2013, 1:57 p.m. UTC | #3
On 06/14/2013 08:23 PM, Rafael J. Wysocki wrote:
> On Thursday, June 13, 2013 09:59:44 PM Rafael J. Wysocki wrote:
>> On Friday, June 14, 2013 12:32:25 AM Jiang Liu wrote:
>>> Current ACPI glue logic expects that physical devices are destroyed
>>> before destroying companion ACPI devices, otherwise it will break the
>>> ACPI unbind logic and cause following warning messages:
>>> [  185.026073] usb usb5: Oops, 'acpi_handle' corrupt
>>> [  185.035150] pci 0000:1b:00.0: Oops, 'acpi_handle' corrupt
>>> [  185.035515] pci 0000:18:02.0: Oops, 'acpi_handle' corrupt
>>> [  180.013656]  port1: Oops, 'acpi_handle' corrupt
>>> Please refer to https://bugzilla.kernel.org/attachment.cgi?id=104321
>>> for full log message.
>>
>> So my question is, did we have this problem before commit 3b63aaa70e1?
>>
>> If we did, then when did it start?  Or was it present forever?
>>
>>> Above warning messages are caused by following scenario:
>>> 1) acpi_dock_notifier_call() queues a task (T1) onto kacpi_hotplug_wq
>>> 2) kacpi_hotplug_wq handles T1, which invokes acpi_dock_deferred_cb()
>>>    ->dock_notify()-> handle_eject_request()->hotplug_dock_devices()
>>> 3) hotplug_dock_devices() first invokes registered hotplug callbacks to
>>>    destroy physical devices, then destroys all affected ACPI devices.
>>>    Everything seems perfect until now. But the acpiphp dock notification
>>>    handler will queue another task (T2) onto kacpi_hotplug_wq to really
>>>    destroy affected physical devices.
>>
>> Would not the solution be to modify it so that it didn't spawn the other
>> task (T2), but removed the affected physical devices synchronously?
>>
>>> 4) kacpi_hotplug_wq finishes T1, and all affected ACPI devices have
>>>    been destroyed.
>>> 5) kacpi_hotplug_wq handles T2, which destroys all affected physical
>>>    devices.
>>>
>>> So it breaks ACPI glue logic's expection because ACPI devices are destroyed
>>> in step 3 and physical devices are destroyed in step 5.
>>>
>>> Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
>>> Reported-by: Alexander E. Patrakov <patrakov@gmail.com>
>>> Cc: Bjorn Helgaas <bhelgaas@google.com>
>>> Cc: Yinghai Lu <yinghai@kernel.org>
>>> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
>>> Cc: linux-pci@vger.kernel.org
>>> Cc: linux-kernel@vger.kernel.org
>>> Cc: stable@vger.kernel.org
>>> ---
>>> Hi Bjorn and Rafael,
>>>      The recursive lock changes haven't been tested yet, need help
>>> from Alexander for testing.
>>
>> Well, let's just say I'm not a fan of recursive locks.  Is that unavoidable
>> here?
> 
> What about the appended patch (on top of [1/9], untested)?
> 
> Rafael
It should have similar effect as patch 2/9, and it will encounter the
same deadlock scenario as 2/9 too.

> 
> 
> ---
>  drivers/pci/hotplug/acpiphp_glue.c |   13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> Index: linux-pm/drivers/pci/hotplug/acpiphp_glue.c
> ===================================================================
> --- linux-pm.orig/drivers/pci/hotplug/acpiphp_glue.c
> +++ linux-pm/drivers/pci/hotplug/acpiphp_glue.c
> @@ -145,9 +145,20 @@ static int post_dock_fixups(struct notif
>  	return NOTIFY_OK;
>  }
>  
> +static void handle_dock_event_func(acpi_handle handle, u32 event, void *context)
> +{
> +	if (event == ACPI_NOTIFY_EJECT_REQUEST) {
> +		struct acpiphp_func *func = context;
> +
> +		if (!acpiphp_disable_slot(func->slot))
> +			acpiphp_eject_slot(func->slot);
> +	} else {
> +		handle_hotplug_event_func(handle, event, context);
> +	}
> +}
>  
>  static const struct acpi_dock_ops acpiphp_dock_ops = {
> -	.handler = handle_hotplug_event_func,
> +	.handler = handle_dock_event_func,
>  };
>  
>  /* Check whether the PCI device is managed by native PCIe hotplug driver */
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki June 14, 2013, 2:12 p.m. UTC | #4
On Friday, June 14, 2013 09:57:15 PM Jiang Liu wrote:
> On 06/14/2013 08:23 PM, Rafael J. Wysocki wrote:
> > On Thursday, June 13, 2013 09:59:44 PM Rafael J. Wysocki wrote:
> >> On Friday, June 14, 2013 12:32:25 AM Jiang Liu wrote:
> >>> Current ACPI glue logic expects that physical devices are destroyed
> >>> before destroying companion ACPI devices, otherwise it will break the
> >>> ACPI unbind logic and cause following warning messages:
> >>> [  185.026073] usb usb5: Oops, 'acpi_handle' corrupt
> >>> [  185.035150] pci 0000:1b:00.0: Oops, 'acpi_handle' corrupt
> >>> [  185.035515] pci 0000:18:02.0: Oops, 'acpi_handle' corrupt
> >>> [  180.013656]  port1: Oops, 'acpi_handle' corrupt
> >>> Please refer to https://bugzilla.kernel.org/attachment.cgi?id=104321
> >>> for full log message.
> >>
> >> So my question is, did we have this problem before commit 3b63aaa70e1?
> >>
> >> If we did, then when did it start?  Or was it present forever?
> >>
> >>> Above warning messages are caused by following scenario:
> >>> 1) acpi_dock_notifier_call() queues a task (T1) onto kacpi_hotplug_wq
> >>> 2) kacpi_hotplug_wq handles T1, which invokes acpi_dock_deferred_cb()
> >>>    ->dock_notify()-> handle_eject_request()->hotplug_dock_devices()
> >>> 3) hotplug_dock_devices() first invokes registered hotplug callbacks to
> >>>    destroy physical devices, then destroys all affected ACPI devices.
> >>>    Everything seems perfect until now. But the acpiphp dock notification
> >>>    handler will queue another task (T2) onto kacpi_hotplug_wq to really
> >>>    destroy affected physical devices.
> >>
> >> Would not the solution be to modify it so that it didn't spawn the other
> >> task (T2), but removed the affected physical devices synchronously?
> >>
> >>> 4) kacpi_hotplug_wq finishes T1, and all affected ACPI devices have
> >>>    been destroyed.
> >>> 5) kacpi_hotplug_wq handles T2, which destroys all affected physical
> >>>    devices.
> >>>
> >>> So it breaks ACPI glue logic's expection because ACPI devices are destroyed
> >>> in step 3 and physical devices are destroyed in step 5.
> >>>
> >>> Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
> >>> Reported-by: Alexander E. Patrakov <patrakov@gmail.com>
> >>> Cc: Bjorn Helgaas <bhelgaas@google.com>
> >>> Cc: Yinghai Lu <yinghai@kernel.org>
> >>> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
> >>> Cc: linux-pci@vger.kernel.org
> >>> Cc: linux-kernel@vger.kernel.org
> >>> Cc: stable@vger.kernel.org
> >>> ---
> >>> Hi Bjorn and Rafael,
> >>>      The recursive lock changes haven't been tested yet, need help
> >>> from Alexander for testing.
> >>
> >> Well, let's just say I'm not a fan of recursive locks.  Is that unavoidable
> >> here?
> > 
> > What about the appended patch (on top of [1/9], untested)?
> > 
> > Rafael
> It should have similar effect as patch 2/9, and it will encounter the
> same deadlock scenario as 2/9 too.

And why exactly?

I'm looking at acpiphp_disable_slot() and I'm not seeing where the
problematic lock is taken.  Similarly for power_off_slot().

It should take the ACPI scan lock, but that's a different matter.

Thanks,
Rafael


> > ---
> >  drivers/pci/hotplug/acpiphp_glue.c |   13 ++++++++++++-
> >  1 file changed, 12 insertions(+), 1 deletion(-)
> > 
> > Index: linux-pm/drivers/pci/hotplug/acpiphp_glue.c
> > ===================================================================
> > --- linux-pm.orig/drivers/pci/hotplug/acpiphp_glue.c
> > +++ linux-pm/drivers/pci/hotplug/acpiphp_glue.c
> > @@ -145,9 +145,20 @@ static int post_dock_fixups(struct notif
> >  	return NOTIFY_OK;
> >  }
> >  
> > +static void handle_dock_event_func(acpi_handle handle, u32 event, void *context)
> > +{
> > +	if (event == ACPI_NOTIFY_EJECT_REQUEST) {
> > +		struct acpiphp_func *func = context;
> > +
> > +		if (!acpiphp_disable_slot(func->slot))
> > +			acpiphp_eject_slot(func->slot);
> > +	} else {
> > +		handle_hotplug_event_func(handle, event, context);
> > +	}
> > +}
> >  
> >  static const struct acpi_dock_ops acpiphp_dock_ops = {
> > -	.handler = handle_hotplug_event_func,
> > +	.handler = handle_dock_event_func,
> >  };
> >  
> >  /* Check whether the PCI device is managed by native PCIe hotplug driver */
> > 
>
Jiang Liu June 14, 2013, 3:30 p.m. UTC | #5
On 06/14/2013 10:12 PM, Rafael J. Wysocki wrote:
> On Friday, June 14, 2013 09:57:15 PM Jiang Liu wrote:
>> On 06/14/2013 08:23 PM, Rafael J. Wysocki wrote:
>>> On Thursday, June 13, 2013 09:59:44 PM Rafael J. Wysocki wrote:
>>>> On Friday, June 14, 2013 12:32:25 AM Jiang Liu wrote:
>>>>> Current ACPI glue logic expects that physical devices are destroyed
>>>>> before destroying companion ACPI devices, otherwise it will break the
>>>>> ACPI unbind logic and cause following warning messages:
>>>>> [  185.026073] usb usb5: Oops, 'acpi_handle' corrupt
>>>>> [  185.035150] pci 0000:1b:00.0: Oops, 'acpi_handle' corrupt
>>>>> [  185.035515] pci 0000:18:02.0: Oops, 'acpi_handle' corrupt
>>>>> [  180.013656]  port1: Oops, 'acpi_handle' corrupt
>>>>> Please refer to https://bugzilla.kernel.org/attachment.cgi?id=104321
>>>>> for full log message.
>>>>
>>>> So my question is, did we have this problem before commit 3b63aaa70e1?
>>>>
>>>> If we did, then when did it start?  Or was it present forever?
>>>>
>>>>> Above warning messages are caused by following scenario:
>>>>> 1) acpi_dock_notifier_call() queues a task (T1) onto kacpi_hotplug_wq
>>>>> 2) kacpi_hotplug_wq handles T1, which invokes acpi_dock_deferred_cb()
>>>>>    ->dock_notify()-> handle_eject_request()->hotplug_dock_devices()
>>>>> 3) hotplug_dock_devices() first invokes registered hotplug callbacks to
>>>>>    destroy physical devices, then destroys all affected ACPI devices.
>>>>>    Everything seems perfect until now. But the acpiphp dock notification
>>>>>    handler will queue another task (T2) onto kacpi_hotplug_wq to really
>>>>>    destroy affected physical devices.
>>>>
>>>> Would not the solution be to modify it so that it didn't spawn the other
>>>> task (T2), but removed the affected physical devices synchronously?
>>>>
>>>>> 4) kacpi_hotplug_wq finishes T1, and all affected ACPI devices have
>>>>>    been destroyed.
>>>>> 5) kacpi_hotplug_wq handles T2, which destroys all affected physical
>>>>>    devices.
>>>>>
>>>>> So it breaks ACPI glue logic's expection because ACPI devices are destroyed
>>>>> in step 3 and physical devices are destroyed in step 5.
>>>>>
>>>>> Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
>>>>> Reported-by: Alexander E. Patrakov <patrakov@gmail.com>
>>>>> Cc: Bjorn Helgaas <bhelgaas@google.com>
>>>>> Cc: Yinghai Lu <yinghai@kernel.org>
>>>>> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
>>>>> Cc: linux-pci@vger.kernel.org
>>>>> Cc: linux-kernel@vger.kernel.org
>>>>> Cc: stable@vger.kernel.org
>>>>> ---
>>>>> Hi Bjorn and Rafael,
>>>>>      The recursive lock changes haven't been tested yet, need help
>>>>> from Alexander for testing.
>>>>
>>>> Well, let's just say I'm not a fan of recursive locks.  Is that unavoidable
>>>> here?
>>>
>>> What about the appended patch (on top of [1/9], untested)?
>>>
>>> Rafael
>> It should have similar effect as patch 2/9, and it will encounter the
>> same deadlock scenario as 2/9 too.
> 
> And why exactly?
> 
> I'm looking at acpiphp_disable_slot() and I'm not seeing where the
> problematic lock is taken.  Similarly for power_off_slot().
> 
> It should take the ACPI scan lock, but that's a different matter.
> 
> Thanks,
> Rafael
The deadlock scenario is the same:
        hotplug_dock_devices()
                mutex_lock(&ds->hp_lock)
                        dd->ops->handler()
				destroy pci bus
                                	unregister_hotplug_dock_device()
                                        	mutex_lock(&ds->hp_lock)


> 
> 
>>> ---
>>>  drivers/pci/hotplug/acpiphp_glue.c |   13 ++++++++++++-
>>>  1 file changed, 12 insertions(+), 1 deletion(-)
>>>
>>> Index: linux-pm/drivers/pci/hotplug/acpiphp_glue.c
>>> ===================================================================
>>> --- linux-pm.orig/drivers/pci/hotplug/acpiphp_glue.c
>>> +++ linux-pm/drivers/pci/hotplug/acpiphp_glue.c
>>> @@ -145,9 +145,20 @@ static int post_dock_fixups(struct notif
>>>  	return NOTIFY_OK;
>>>  }
>>>  
>>> +static void handle_dock_event_func(acpi_handle handle, u32 event, void *context)
>>> +{
>>> +	if (event == ACPI_NOTIFY_EJECT_REQUEST) {
>>> +		struct acpiphp_func *func = context;
>>> +
>>> +		if (!acpiphp_disable_slot(func->slot))
>>> +			acpiphp_eject_slot(func->slot);
>>> +	} else {
>>> +		handle_hotplug_event_func(handle, event, context);
>>> +	}
>>> +}
>>>  
>>>  static const struct acpi_dock_ops acpiphp_dock_ops = {
>>> -	.handler = handle_hotplug_event_func,
>>> +	.handler = handle_dock_event_func,
>>>  };
>>>  
>>>  /* Check whether the PCI device is managed by native PCIe hotplug driver */
>>>
>>

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki June 14, 2013, 11:12 p.m. UTC | #6
On Friday, June 14, 2013 11:30:12 PM Jiang Liu wrote:
> On 06/14/2013 10:12 PM, Rafael J. Wysocki wrote:
> > On Friday, June 14, 2013 09:57:15 PM Jiang Liu wrote:
> >> On 06/14/2013 08:23 PM, Rafael J. Wysocki wrote:
> >>> On Thursday, June 13, 2013 09:59:44 PM Rafael J. Wysocki wrote:
> >>>> On Friday, June 14, 2013 12:32:25 AM Jiang Liu wrote:
> >>>>> Current ACPI glue logic expects that physical devices are destroyed
> >>>>> before destroying companion ACPI devices, otherwise it will break the
> >>>>> ACPI unbind logic and cause following warning messages:
> >>>>> [  185.026073] usb usb5: Oops, 'acpi_handle' corrupt
> >>>>> [  185.035150] pci 0000:1b:00.0: Oops, 'acpi_handle' corrupt
> >>>>> [  185.035515] pci 0000:18:02.0: Oops, 'acpi_handle' corrupt
> >>>>> [  180.013656]  port1: Oops, 'acpi_handle' corrupt
> >>>>> Please refer to https://bugzilla.kernel.org/attachment.cgi?id=104321
> >>>>> for full log message.
> >>>>
> >>>> So my question is, did we have this problem before commit 3b63aaa70e1?
> >>>>
> >>>> If we did, then when did it start?  Or was it present forever?
> >>>>
> >>>>> Above warning messages are caused by following scenario:
> >>>>> 1) acpi_dock_notifier_call() queues a task (T1) onto kacpi_hotplug_wq
> >>>>> 2) kacpi_hotplug_wq handles T1, which invokes acpi_dock_deferred_cb()
> >>>>>    ->dock_notify()-> handle_eject_request()->hotplug_dock_devices()
> >>>>> 3) hotplug_dock_devices() first invokes registered hotplug callbacks to
> >>>>>    destroy physical devices, then destroys all affected ACPI devices.
> >>>>>    Everything seems perfect until now. But the acpiphp dock notification
> >>>>>    handler will queue another task (T2) onto kacpi_hotplug_wq to really
> >>>>>    destroy affected physical devices.
> >>>>
> >>>> Would not the solution be to modify it so that it didn't spawn the other
> >>>> task (T2), but removed the affected physical devices synchronously?
> >>>>
> >>>>> 4) kacpi_hotplug_wq finishes T1, and all affected ACPI devices have
> >>>>>    been destroyed.
> >>>>> 5) kacpi_hotplug_wq handles T2, which destroys all affected physical
> >>>>>    devices.
> >>>>>
> >>>>> So it breaks ACPI glue logic's expection because ACPI devices are destroyed
> >>>>> in step 3 and physical devices are destroyed in step 5.
> >>>>>
> >>>>> Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
> >>>>> Reported-by: Alexander E. Patrakov <patrakov@gmail.com>
> >>>>> Cc: Bjorn Helgaas <bhelgaas@google.com>
> >>>>> Cc: Yinghai Lu <yinghai@kernel.org>
> >>>>> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
> >>>>> Cc: linux-pci@vger.kernel.org
> >>>>> Cc: linux-kernel@vger.kernel.org
> >>>>> Cc: stable@vger.kernel.org
> >>>>> ---
> >>>>> Hi Bjorn and Rafael,
> >>>>>      The recursive lock changes haven't been tested yet, need help
> >>>>> from Alexander for testing.
> >>>>
> >>>> Well, let's just say I'm not a fan of recursive locks.  Is that unavoidable
> >>>> here?
> >>>
> >>> What about the appended patch (on top of [1/9], untested)?
> >>>
> >>> Rafael
> >> It should have similar effect as patch 2/9, and it will encounter the
> >> same deadlock scenario as 2/9 too.
> > 
> > And why exactly?
> > 
> > I'm looking at acpiphp_disable_slot() and I'm not seeing where the
> > problematic lock is taken.  Similarly for power_off_slot().
> > 
> > It should take the ACPI scan lock, but that's a different matter.
> > 
> > Thanks,
> > Rafael
> The deadlock scenario is the same:

Well, you didn't answer my question.

>         hotplug_dock_devices()
>                 mutex_lock(&ds->hp_lock)
>                         dd->ops->handler()
> 				destroy pci bus

And this wasn't particularly helpful.

What about mentioning acpi_pci_remove_bus()?  I don't remember all changes
made recently, sorry.

>                                 	unregister_hotplug_dock_device()
>                                         	mutex_lock(&ds->hp_lock)

I see the problem.

ds->hp_lock is used to make both addition and removal of hotplug devices wait
for us to complete walking ds->hotplug_devices.  However, if we are in the
process of removing hotplug devices, we don't need removals to block on
ds->hp_lock (in fact, we don't even want them to block on it).  Conversely, if
we are in the process of adding hotplug devices, we don't want additions to
block on ds->hp_lock.

So, why don't we do the following:

(1) Introduce a 'hotplug_status' field into struct_dock station with possible
    values representing "removal", "addition" and "no action" and a wait queue
    associated with it.  We can use ds->dd_lock to protect that field.

(2) hotplug_status will be modified by hotplug_dock_devices() depending on the
    event.  For example, on eject it will set hotplug_status to "removal".
    Then, after completing the walk, it will reset hotplug_status to
    "no action" and wake up its wait queue.

(3) dock_del_hotplug_device() will check if hotplug_status is "removal" or
    "no_action" and if so, it will just do the removal under ds->dd_lock.  If
    it is "addition", though, it will go to sleep (in the hotplug_status' wait
    queue) until hotplug_dock_devices() wakes it up.

(4) Analogously for dock_add_hotplug_device().

For this to work the "addition" and "deletion" of hotplug devices should better
be implemented as setting and clearing a 'hotplug' flag in
struct dock_dependent_device (instead of using ds->hotplug_devices).

Thanks,
Rafael
diff mbox

Patch

Index: linux-pm/drivers/pci/hotplug/acpiphp_glue.c
===================================================================
--- linux-pm.orig/drivers/pci/hotplug/acpiphp_glue.c
+++ linux-pm/drivers/pci/hotplug/acpiphp_glue.c
@@ -145,9 +145,20 @@  static int post_dock_fixups(struct notif
 	return NOTIFY_OK;
 }
 
+static void handle_dock_event_func(acpi_handle handle, u32 event, void *context)
+{
+	if (event == ACPI_NOTIFY_EJECT_REQUEST) {
+		struct acpiphp_func *func = context;
+
+		if (!acpiphp_disable_slot(func->slot))
+			acpiphp_eject_slot(func->slot);
+	} else {
+		handle_hotplug_event_func(handle, event, context);
+	}
+}
 
 static const struct acpi_dock_ops acpiphp_dock_ops = {
-	.handler = handle_hotplug_event_func,
+	.handler = handle_dock_event_func,
 };
 
 /* Check whether the PCI device is managed by native PCIe hotplug driver */