
[ovs-dev] netdev-dpdk: Fixed RCU deadlock when deleting vhostuser port

Message ID 20240808095717.75932-1-15957197901@163.com
State: Changes Requested
Delegated to: Kevin Traynor
Series: [ovs-dev] netdev-dpdk: Fixed RCU deadlock when deleting vhostuser port

Commit Message

Xinxin Zhao Aug. 8, 2024, 9:57 a.m. UTC
When the OVS control thread deletes a vhost-user port and
the vhost-event thread handles that port going down concurrently,
the main thread may fall into a deadlock.

E.g., a vhostuser port is created as a client.
The OVS control thread executes the following call path:
rte_vhost_driver_unregister->fdset_try_del.
At the same time, the vhost-event thread executes the following call path:
fdset_event_dispatch->vhost_user_read_cb->destroy_device.
At this point, the vhost-event thread waits in ovsrcu_synchronize() for all
threads to quiesce, while the OVS control thread is waiting for
pfdentry->busy to become 0.
The two threads are waiting for each other and fall into a deadlock.
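
Roughly, the two call paths interact like this (paraphrased sketch, not
verbatim code):

  ovs control thread                    vhost-event thread
  ------------------                    ------------------
  rte_vhost_driver_unregister()         fdset_event_dispatch()
    fdset_try_del()                       marks pfdentry->busy
      retries until busy == 0             vhost_user_read_cb()
                                            destroy_device()
                                              ovsrcu_synchronize()
                                                waits for every thread,
                                                including the retrying
                                                control thread, to quiesce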

Fixes: afee281 ("netdev-dpdk: Fix dpdk_watchdog failure to quiesce.")

Signed-off-by: Xinxin Zhao <15957197901@163.com>
---
 lib/netdev-dpdk.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

Comments

Kevin Traynor Sept. 5, 2024, 4:35 p.m. UTC | #1
On 08/08/2024 10:57, Xinxin Zhao wrote:
> When the ovs control thread del vhost-user port and
> the vhost-event thread process the vhost-user port down concurrently,
> the main thread may fall into a deadlock.
> 
> E.g., vhostuser port is created as client.
> The ovs control thread executes the following process:
> rte_vhost_driver_unregister->fdset_try_del.
> At the same time, the vhost-event thread executes the following process:
> fdset_event_dispatch->vhost_user_read_cb->destroy_device.
> At this time, vhost-event will wait for rcu scheduling,
> and the ovs control thread is waiting for pfdentry->busy to be 0.
> The two threads are waiting for each other and fall into a deadlock.
> 

Hi Xinxin,

Thanks for the patch. I managed to reproduce this with a little bit of
hacking. Indeed, a deadlock can occur with some unlucky timing.

Acked-by: Kevin Traynor <ktraynor@redhat.com>

> Fixes: afee281 ("netdev-dpdk: Fix dpdk_watchdog failure to quiesce.")
> 
> Signed-off-by: Xinxin Zhao <15957197901@163.com>
> ---
>  lib/netdev-dpdk.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 02cef6e45..0c02357f5 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -1808,7 +1808,16 @@ dpdk_vhost_driver_unregister(struct netdev_dpdk *dev OVS_UNUSED,
>      OVS_EXCLUDED(dpdk_mutex)
>      OVS_EXCLUDED(dev->mutex)
>  {
> -    return rte_vhost_driver_unregister(vhost_id);
> +    int ret;
> +    /* Due to the rcu wait of the vhost-event thread,
> +     * rte_vhost_driver_unregister() may loop endlessly.
> +     * So the unregister action needs to be removed from the rcu_list.
> +     */
> +    ovsrcu_quiesce_start();
> +    ret = rte_vhost_driver_unregister(vhost_id);
> +    ovsrcu_quiesce_end();
> +
> +    return ret;
>  }
>  
>  static void
Eelco Chaudron Sept. 6, 2024, 6:38 a.m. UTC | #2
On 5 Sep 2024, at 18:35, Kevin Traynor wrote:

> On 08/08/2024 10:57, Xinxin Zhao wrote:
>> When the ovs control thread del vhost-user port and
>> the vhost-event thread process the vhost-user port down concurrently,
>> the main thread may fall into a deadlock.
>>
>> E.g., vhostuser port is created as client.
>> The ovs control thread executes the following process:
>> rte_vhost_driver_unregister->fdset_try_del.
>> At the same time, the vhost-event thread executes the following process:
>> fdset_event_dispatch->vhost_user_read_cb->destroy_device.
>> At this time, vhost-event will wait for rcu scheduling,
>> and the ovs control thread is waiting for pfdentry->busy to be 0.
>> The two threads are waiting for each other and fall into a deadlock.
>>
>
> Hi Xinxin,
>
> Thanks for the patch. I managed to reproduced this with a little bit of
> hacking. Indeed, a deadlock can occur with some unlucky timing.
>
> Acked-by: Kevin Traynor <ktraynor@redhat.com>

Kevin or Xinxin, can you add some more explanation on where the deadlock is occurring?

Also, how do we guarantee that it’s safe to go into the quiesce state and that nothing else in the call chain holds/uses any RCU-protected data?

Thanks,

Eelco

>> Fixes: afee281 ("netdev-dpdk: Fix dpdk_watchdog failure to quiesce.")
>>
>> Signed-off-by: Xinxin Zhao <15957197901@163.com>
>> ---
>>  lib/netdev-dpdk.c | 11 ++++++++++-
>>  1 file changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>> index 02cef6e45..0c02357f5 100644
>> --- a/lib/netdev-dpdk.c
>> +++ b/lib/netdev-dpdk.c
>> @@ -1808,7 +1808,16 @@ dpdk_vhost_driver_unregister(struct netdev_dpdk *dev OVS_UNUSED,
>>      OVS_EXCLUDED(dpdk_mutex)
>>      OVS_EXCLUDED(dev->mutex)
>>  {
>> -    return rte_vhost_driver_unregister(vhost_id);
>> +    int ret;
>> +    /* Due to the rcu wait of the vhost-event thread,
>> +     * rte_vhost_driver_unregister() may loop endlessly.
>> +     * So the unregister action needs to be removed from the rcu_list.
>> +     */
>> +    ovsrcu_quiesce_start();
>> +    ret = rte_vhost_driver_unregister(vhost_id);
>> +    ovsrcu_quiesce_end();
>> +
>> +    return ret;
>>  }
>>
>>  static void
>
Kevin Traynor Sept. 6, 2024, 2:03 p.m. UTC | #3
On 06/09/2024 07:38, Eelco Chaudron wrote:
> 
> 
> On 5 Sep 2024, at 18:35, Kevin Traynor wrote:
> 
>> On 08/08/2024 10:57, Xinxin Zhao wrote:
>>> When the ovs control thread del vhost-user port and
>>> the vhost-event thread process the vhost-user port down concurrently,
>>> the main thread may fall into a deadlock.
>>>
>>> E.g., vhostuser port is created as client.
>>> The ovs control thread executes the following process:
>>> rte_vhost_driver_unregister->fdset_try_del.
>>> At the same time, the vhost-event thread executes the following process:
>>> fdset_event_dispatch->vhost_user_read_cb->destroy_device.
>>> At this time, vhost-event will wait for rcu scheduling,
>>> and the ovs control thread is waiting for pfdentry->busy to be 0.
>>> The two threads are waiting for each other and fall into a deadlock.
>>>
>>
>> Hi Xinxin,
>>
>> Thanks for the patch. I managed to reproduced this with a little bit of
>> hacking. Indeed, a deadlock can occur with some unlucky timing.
>>
>> Acked-by: Kevin Traynor <ktraynor@redhat.com>
> 
> Kevin or Xinxin, can you add some more explanation on where the deadlock is occurring?
> 

The vhost-event thread is blocking in ovsrcu_synchronize(), waiting for the
main thread to quiesce, i.e.:

#3  0x0000000003690c09 in ovsrcu_synchronize () at lib/ovs-rcu.c:237
#4  0x00000000037162fa in destroy_device (vid=0) at lib/netdev-dpdk.c:4919
#5  0x0000000003238a12 in vhost_destroy_device_notify (dev=0x14ba693c0)
at ../lib/vhost/vhost.c:756

While the main thread is looping in the DPDK unregister call, waiting for the
vhost-event callback to finish, i.e.:

#1  0x00000000032336e8 in rte_vhost_driver_unregister (path=0xd217a30
"/tmp/vhost0") at ../lib/vhost/socket.c:1075
#2  0x000000000370d5f3 in dpdk_vhost_driver_unregister (dev=0x17d6e8b40,
vhost_id=0xd217a30 "/tmp/vhost0") at lib/netdev-dpdk.c:1811
#3  0x000000000370d709 in netdev_dpdk_vhost_destruct
(netdev=0x17d6e8bc0) at lib/netdev-dpdk.c:1843

> Also, how do we guarantee that it’s safe to go to quiesce state and that no others in the call chain hold/use any RCU-protected data?
> 

It should be fine for anything RCU-freed in the main thread (possibly a
mirror or bridge struct) as the other threads would need to quiesce too, but
you have a point about anything used in the main thread. We are deep into
bridge_run(), so I'm not sure how we could test for every scenario.
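
For the rcu-freed case I mean something like this (simplified sketch, not
actual OVS code; 'struct obj'/'objp' are made-up names, only the RCU
contract matters):

    static OVSRCU_TYPE(struct obj *) objp;

    static void
    replace_obj(struct obj *new_obj)
    {
        struct obj *old = ovsrcu_get_protected(struct obj *, &objp);

        ovsrcu_set(&objp, new_obj);   /* Unpublish the old version. */
        ovsrcu_postpone(free, old);   /* Runs only after *every* thread,
                                       * PMDs included, has quiesced, so a
                                       * brief quiesce in the main thread
                                       * does not let this free happen
                                       * early for other readers. */
    }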

If we can't guarantee it, then maybe another approach is needed. Perhaps
we could hijack the ovs_vhost thread mechanism to call the unregister?
But I'm not sure if there are other implications to doing it asynchronously.
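
Something along these lines (purely hypothetical sketch -- none of these
names exist in OVS today, it only illustrates the idea of deferring the
call):

    static struct ovs_mutex unreg_mutex = OVS_MUTEX_INITIALIZER;
    static struct ovs_list unreg_list OVS_GUARDED_BY(unreg_mutex)
        = OVS_LIST_INITIALIZER(&unreg_list);

    struct unreg_request {
        struct ovs_list node;
        char *vhost_id;
    };

    /* Would be called from the destruct path instead of unregistering
     * inline; a helper thread (e.g. ovs_vhost) would then pop requests
     * and call rte_vhost_driver_unregister() where blocking is harmless. */
    static void
    queue_vhost_unregister(const char *vhost_id)
    {
        struct unreg_request *req = xmalloc(sizeof *req);

        req->vhost_id = xstrdup(vhost_id);
        ovs_mutex_lock(&unreg_mutex);
        ovs_list_push_back(&unreg_list, &req->node);
        ovs_mutex_unlock(&unreg_mutex);
    }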

> Thanks,
> 
> Eelco
> 
>>> Fixes: afee281 ("netdev-dpdk: Fix dpdk_watchdog failure to quiesce.")
>>>
>>> Signed-off-by: Xinxin Zhao <15957197901@163.com>
>>> ---
>>>  lib/netdev-dpdk.c | 11 ++++++++++-
>>>  1 file changed, 10 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>> index 02cef6e45..0c02357f5 100644
>>> --- a/lib/netdev-dpdk.c
>>> +++ b/lib/netdev-dpdk.c
>>> @@ -1808,7 +1808,16 @@ dpdk_vhost_driver_unregister(struct netdev_dpdk *dev OVS_UNUSED,
>>>      OVS_EXCLUDED(dpdk_mutex)
>>>      OVS_EXCLUDED(dev->mutex)
>>>  {
>>> -    return rte_vhost_driver_unregister(vhost_id);
>>> +    int ret;
>>> +    /* Due to the rcu wait of the vhost-event thread,
>>> +     * rte_vhost_driver_unregister() may loop endlessly.
>>> +     * So the unregister action needs to be removed from the rcu_list.
>>> +     */
>>> +    ovsrcu_quiesce_start();
>>> +    ret = rte_vhost_driver_unregister(vhost_id);
>>> +    ovsrcu_quiesce_end();
>>> +
>>> +    return ret;
>>>  }
>>>
>>>  static void
>>
>
Eelco Chaudron Sept. 9, 2024, 12:34 p.m. UTC | #4
On 6 Sep 2024, at 16:03, Kevin Traynor wrote:

> On 06/09/2024 07:38, Eelco Chaudron wrote:
>>
>>
>> On 5 Sep 2024, at 18:35, Kevin Traynor wrote:
>>
>>> On 08/08/2024 10:57, Xinxin Zhao wrote:
>>>> When the ovs control thread del vhost-user port and
>>>> the vhost-event thread process the vhost-user port down concurrently,
>>>> the main thread may fall into a deadlock.
>>>>
>>>> E.g., vhostuser port is created as client.
>>>> The ovs control thread executes the following process:
>>>> rte_vhost_driver_unregister->fdset_try_del.
>>>> At the same time, the vhost-event thread executes the following process:
>>>> fdset_event_dispatch->vhost_user_read_cb->destroy_device.
>>>> At this time, vhost-event will wait for rcu scheduling,
>>>> and the ovs control thread is waiting for pfdentry->busy to be 0.
>>>> The two threads are waiting for each other and fall into a deadlock.
>>>>
>>>
>>> Hi Xinxin,
>>>
>>> Thanks for the patch. I managed to reproduced this with a little bit of
>>> hacking. Indeed, a deadlock can occur with some unlucky timing.
>>>
>>> Acked-by: Kevin Traynor <ktraynor@redhat.com>
>>
>> Kevin or Xinxin, can you add some more explanation on where the deadlock is occurring?
>>
>
> vhost-event thread is blocking on the synchronize, waiting for main
> thread to queisce, i.e.
>
> #3  0x0000000003690c09 in ovsrcu_synchronize () at lib/ovs-rcu.c:237
> #4  0x00000000037162fa in destroy_device (vid=0) at lib/netdev-dpdk.c:4919
> #5  0x0000000003238a12 in vhost_destroy_device_notify (dev=0x14ba693c0)
> at ../lib/vhost/vhost.c:756
>
> While main thread is looping in dpdk unregister, waiting for vhost-event
> callback to finish, i.e.:
>
> #1  0x00000000032336e8 in rte_vhost_driver_unregister (path=0xd217a30
> "/tmp/vhost0") at ../lib/vhost/socket.c:1075
> #2  0x000000000370d5f3 in dpdk_vhost_driver_unregister (dev=0x17d6e8b40,
> vhost_id=0xd217a30 "/tmp/vhost0") at lib/netdev-dpdk.c:1811
> #3  0x000000000370d709 in netdev_dpdk_vhost_destruct
> (netdev=0x17d6e8bc0) at lib/netdev-dpdk.c:1843
>
>> Also, how do we guarantee that it’s safe to go to quiesce state and that no others in the call chain hold/use any RCU-protected data?
>>
>
> It should be fine for anything rcu freed in the main thread (possible
> mirror bridge struct) as other threads would need to quiesce too, but
> you have a point about anything used in main thread. We are deep into
> bridge_run(), so I'm not sure how we could test for every scenario.
>
> If we can't guarantee it, then maybe another approach is needed, perhaps
> we could hijack the ovs_vhost thread mechanism to call the unregister ?
> but i'm not sure if there's other implications doing it asynchronously.


This callback is called by netdev_unref(), which is called for example by netdev_close(). netdev_close() is called all over the place, which makes it unsafe to just go through the quiesce state here. I guess we need another way to fix this.
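
For example, any caller doing something like this (hypothetical sketch, the
names are made up) would silently break if the close path can now quiesce
underneath it:

    /* Hypothetical caller -- not real OVS code. */
    const struct cfg *cfg = ovsrcu_get(const struct cfg *, &cfg_ptr);

    netdev_close(netdev);   /* May reach netdev_dpdk_vhost_destruct() ->
                             * dpdk_vhost_driver_unregister(), which would
                             * now quiesce; the grace period can complete and
                             * 'cfg' can be freed while we still hold it. */

    use_cfg(cfg);           /* Potential use-after-free. */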

>> Thanks,
>>
>> Eelco
>>
>>>> Fixes: afee281 ("netdev-dpdk: Fix dpdk_watchdog failure to quiesce.")
>>>>
>>>> Signed-off-by: Xinxin Zhao <15957197901@163.com>
>>>> ---
>>>>  lib/netdev-dpdk.c | 11 ++++++++++-
>>>>  1 file changed, 10 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>>> index 02cef6e45..0c02357f5 100644
>>>> --- a/lib/netdev-dpdk.c
>>>> +++ b/lib/netdev-dpdk.c
>>>> @@ -1808,7 +1808,16 @@ dpdk_vhost_driver_unregister(struct netdev_dpdk *dev OVS_UNUSED,
>>>>      OVS_EXCLUDED(dpdk_mutex)
>>>>      OVS_EXCLUDED(dev->mutex)
>>>>  {
>>>> -    return rte_vhost_driver_unregister(vhost_id);
>>>> +    int ret;
>>>> +    /* Due to the rcu wait of the vhost-event thread,
>>>> +     * rte_vhost_driver_unregister() may loop endlessly.
>>>> +     * So the unregister action needs to be removed from the rcu_list.
>>>> +     */
>>>> +    ovsrcu_quiesce_start();
>>>> +    ret = rte_vhost_driver_unregister(vhost_id);
>>>> +    ovsrcu_quiesce_end();
>>>> +
>>>> +    return ret;
>>>>  }
>>>>
>>>>  static void
>>>
>>
Kevin Traynor Sept. 11, 2024, 9:19 a.m. UTC | #5
On 09/09/2024 13:34, Eelco Chaudron wrote:
> 
> 
> On 6 Sep 2024, at 16:03, Kevin Traynor wrote:
> 
>> On 06/09/2024 07:38, Eelco Chaudron wrote:
>>>
>>>
>>> On 5 Sep 2024, at 18:35, Kevin Traynor wrote:
>>>
>>>> On 08/08/2024 10:57, Xinxin Zhao wrote:
>>>>> When the ovs control thread del vhost-user port and
>>>>> the vhost-event thread process the vhost-user port down concurrently,
>>>>> the main thread may fall into a deadlock.
>>>>>
>>>>> E.g., vhostuser port is created as client.
>>>>> The ovs control thread executes the following process:
>>>>> rte_vhost_driver_unregister->fdset_try_del.
>>>>> At the same time, the vhost-event thread executes the following process:
>>>>> fdset_event_dispatch->vhost_user_read_cb->destroy_device.
>>>>> At this time, vhost-event will wait for rcu scheduling,
>>>>> and the ovs control thread is waiting for pfdentry->busy to be 0.
>>>>> The two threads are waiting for each other and fall into a deadlock.
>>>>>
>>>>

Hi Xinxin,

Just wondering if you observed this during normal usage, or did you
identify it from code inspection/tools or some other stress test, etc.?

thanks,
Kevin.

>>>> Hi Xinxin,
>>>>
>>>> Thanks for the patch. I managed to reproduced this with a little bit of
>>>> hacking. Indeed, a deadlock can occur with some unlucky timing.
>>>>
>>>> Acked-by: Kevin Traynor <ktraynor@redhat.com>
>>>
>>> Kevin or Xinxin, can you add some more explanation on where the deadlock is occurring?
>>>
>>
>> vhost-event thread is blocking on the synchronize, waiting for main
>> thread to queisce, i.e.
>>
>> #3  0x0000000003690c09 in ovsrcu_synchronize () at lib/ovs-rcu.c:237
>> #4  0x00000000037162fa in destroy_device (vid=0) at lib/netdev-dpdk.c:4919
>> #5  0x0000000003238a12 in vhost_destroy_device_notify (dev=0x14ba693c0)
>> at ../lib/vhost/vhost.c:756
>>
>> While main thread is looping in dpdk unregister, waiting for vhost-event
>> callback to finish, i.e.:
>>
>> #1  0x00000000032336e8 in rte_vhost_driver_unregister (path=0xd217a30
>> "/tmp/vhost0") at ../lib/vhost/socket.c:1075
>> #2  0x000000000370d5f3 in dpdk_vhost_driver_unregister (dev=0x17d6e8b40,
>> vhost_id=0xd217a30 "/tmp/vhost0") at lib/netdev-dpdk.c:1811
>> #3  0x000000000370d709 in netdev_dpdk_vhost_destruct
>> (netdev=0x17d6e8bc0) at lib/netdev-dpdk.c:1843
>>
>>> Also, how do we guarantee that it’s safe to go to quiesce state and that no others in the call chain hold/use any RCU-protected data?
>>>
>>
>> It should be fine for anything rcu freed in the main thread (possible
>> mirror bridge struct) as other threads would need to quiesce too, but
>> you have a point about anything used in main thread. We are deep into
>> bridge_run(), so I'm not sure how we could test for every scenario.
>>
>> If we can't guarantee it, then maybe another approach is needed, perhaps
>> we could hijack the ovs_vhost thread mechanism to call the unregister ?
>> but i'm not sure if there's other implications doing it asynchronously.
> 
> 
> This callback is called by netdev_unref(), which is called for example by netdev_close(). netdev_close() is called all over the place which makes it unsafe to just go through quiesce state. I guess we need another way to fix this.
> 
>>> Thanks,
>>>
>>> Eelco
>>>
>>>>> Fixes: afee281 ("netdev-dpdk: Fix dpdk_watchdog failure to quiesce.")
>>>>>
>>>>> Signed-off-by: Xinxin Zhao <15957197901@163.com>
>>>>> ---
>>>>>  lib/netdev-dpdk.c | 11 ++++++++++-
>>>>>  1 file changed, 10 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>>>> index 02cef6e45..0c02357f5 100644
>>>>> --- a/lib/netdev-dpdk.c
>>>>> +++ b/lib/netdev-dpdk.c
>>>>> @@ -1808,7 +1808,16 @@ dpdk_vhost_driver_unregister(struct netdev_dpdk *dev OVS_UNUSED,
>>>>>      OVS_EXCLUDED(dpdk_mutex)
>>>>>      OVS_EXCLUDED(dev->mutex)
>>>>>  {
>>>>> -    return rte_vhost_driver_unregister(vhost_id);
>>>>> +    int ret;
>>>>> +    /* Due to the rcu wait of the vhost-event thread,
>>>>> +     * rte_vhost_driver_unregister() may loop endlessly.
>>>>> +     * So the unregister action needs to be removed from the rcu_list.
>>>>> +     */
>>>>> +    ovsrcu_quiesce_start();
>>>>> +    ret = rte_vhost_driver_unregister(vhost_id);
>>>>> +    ovsrcu_quiesce_end();
>>>>> +
>>>>> +    return ret;
>>>>>  }
>>>>>
>>>>>  static void
>>>>
>>>
>

Patch

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 02cef6e45..0c02357f5 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -1808,7 +1808,16 @@  dpdk_vhost_driver_unregister(struct netdev_dpdk *dev OVS_UNUSED,
     OVS_EXCLUDED(dpdk_mutex)
     OVS_EXCLUDED(dev->mutex)
 {
-    return rte_vhost_driver_unregister(vhost_id);
+    int ret;
+    /* Due to the rcu wait of the vhost-event thread,
+     * rte_vhost_driver_unregister() may loop endlessly.
+     * So the unregister action needs to be removed from the rcu_list.
+     */
+    ovsrcu_quiesce_start();
+    ret = rte_vhost_driver_unregister(vhost_id);
+    ovsrcu_quiesce_end();
+
+    return ret;
 }
 
 static void