diff mbox series

nbd: Prevent NULL pointer dereference in nbd_blockdev_client_closed()

Message ID 20240607150021.121536-1-alexander.ivanov@virtuozzo.com
State New
Headers show
Series nbd: Prevent NULL pointer dereference in nbd_blockdev_client_closed() | expand

Commit Message

Alexander Ivanov June 7, 2024, 3 p.m. UTC
In some cases, the NBD server can be stopped before
nbd_blockdev_client_closed() is called, causing the nbd_server variable
to be nullified. This leads to a NULL pointer dereference when accessing
nbd_server.

Add a NULL check for nbd_server to the nbd_blockdev_client_closed()
function to prevent NULL pointer dereference.

Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
---
 blockdev-nbd.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Alexander Ivanov June 8, 2024, 9:36 a.m. UTC | #1
There is a bug reproducer in the attachment.


On 6/7/24 17:00, Alexander Ivanov wrote:
> In some cases, the NBD server can be stopped before
> nbd_blockdev_client_closed() is called, causing the nbd_server variable
> to be nullified. This leads to a NULL pointer dereference when accessing
> nbd_server.
>
> Add a NULL check for nbd_server to the nbd_blockdev_client_closed()
> function to prevent NULL pointer dereference.
>
> Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
> ---
>   blockdev-nbd.c | 3 +++
>   1 file changed, 3 insertions(+)
>
> diff --git a/blockdev-nbd.c b/blockdev-nbd.c
> index 213012435f..fb1f30ae0d 100644
> --- a/blockdev-nbd.c
> +++ b/blockdev-nbd.c
> @@ -52,6 +52,9 @@ int nbd_server_max_connections(void)
>   static void nbd_blockdev_client_closed(NBDClient *client, bool ignored)
>   {
>       nbd_client_put(client);
> +    if (nbd_server == NULL) {
> +        return;
> +    }
>       assert(nbd_server->connections > 0);
>       nbd_server->connections--;
>       nbd_update_server_watch(nbd_server);
Eric Blake June 10, 2024, 12:33 p.m. UTC | #2
On Sat, Jun 08, 2024 at 11:36:59AM GMT, Alexander Ivanov wrote:
> There is a bug reproducer in the attachment.

Summarizing the reproducer, you are repeatedly calling QMP
nbd-server-start/nbd-server-stop on qemu as NBD server in one thread,
and repeatedly calling 'qemu-nbd -L' in another, to try and provoke
the race where the server is stopped while qemu-nbd -L is still trying
to grab information.

> 
> 
> On 6/7/24 17:00, Alexander Ivanov wrote:
> > In some cases, the NBD server can be stopped before
> > nbd_blockdev_client_closed() is called, causing the nbd_server variable
> > to be nullified. This leads to a NULL pointer dereference when accessing
> > nbd_server.

Am I correct that the NULL pointer deref was in qemu-nbd in your
reproducer, and not in qemu-kvm?

> > 
> > Add a NULL check for nbd_server to the nbd_blockdev_client_closed()
> > function to prevent NULL pointer dereference.
> > 
> > Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
> > ---
> >   blockdev-nbd.c | 3 +++
> >   1 file changed, 3 insertions(+)
> > 
> > diff --git a/blockdev-nbd.c b/blockdev-nbd.c
> > index 213012435f..fb1f30ae0d 100644
> > --- a/blockdev-nbd.c
> > +++ b/blockdev-nbd.c
> > @@ -52,6 +52,9 @@ int nbd_server_max_connections(void)
> >   static void nbd_blockdev_client_closed(NBDClient *client, bool ignored)
> >   {
> >       nbd_client_put(client);
> > +    if (nbd_server == NULL) {
> > +        return;
> > +    }
> >       assert(nbd_server->connections > 0);

While this does indeed avoid the NULL dereference right here, I still
want to understand why nbd_server is getting set to NULL while there
is still an active client, and make sure we don't have any other NULL
derefs.  I'll respond again once I've studied the code a bit more.

> >       nbd_server->connections--;
> >       nbd_update_server_watch(nbd_server);
> 
> -- 
> Best regards,
> Alexander Ivanov
Alexander Ivanov June 10, 2024, 12:52 p.m. UTC | #3
On 6/10/24 14:33, Eric Blake wrote:
> On Sat, Jun 08, 2024 at 11:36:59AM GMT, Alexander Ivanov wrote:
>> There is a bug reproducer in the attachment.
> Summarizing the reproducer, you are repeatedly calling QMP
> nbd-server-start/nbd-server-stop on qemu as NBD server in one thread,
> and repeatedly calling 'qemu-nbd -L' in another, to try and provoke
> the race where the server is stopped while qemu-nbd -L is still trying
> to grab information.
Yes, you are right.
>
>>
>> On 6/7/24 17:00, Alexander Ivanov wrote:
>>> In some cases, the NBD server can be stopped before
>>> nbd_blockdev_client_closed() is called, causing the nbd_server variable
>>> to be nullified. This leads to a NULL pointer dereference when accessing
>>> nbd_server.
> Am I correct that the NULL pointer deref was in qemu-nbd in your
> reproducer, and not in qemu-kvm?
No, it happened in qemu-kvm.
>
>>> Add a NULL check for nbd_server to the nbd_blockdev_client_closed()
>>> function to prevent NULL pointer dereference.
>>>
>>> Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
>>> ---
>>>    blockdev-nbd.c | 3 +++
>>>    1 file changed, 3 insertions(+)
>>>
>>> diff --git a/blockdev-nbd.c b/blockdev-nbd.c
>>> index 213012435f..fb1f30ae0d 100644
>>> --- a/blockdev-nbd.c
>>> +++ b/blockdev-nbd.c
>>> @@ -52,6 +52,9 @@ int nbd_server_max_connections(void)
>>>    static void nbd_blockdev_client_closed(NBDClient *client, bool ignored)
>>>    {
>>>        nbd_client_put(client);
>>> +    if (nbd_server == NULL) {
>>> +        return;
>>> +    }
>>>        assert(nbd_server->connections > 0);
> While this does indeed avoid the NULL dereference right here, I still
> want to understand why nbd_server is getting set to NULL while there
> is still an active client, and make sure we don't have any other NULL
> derefs.  I'll respond again once I've studied the code a bit more.
I agree that the patch fixes only symptoms, but haven't succeeded with the
bug roots investigation, and decided to raise this issue with the community.

Also, later I reproduced another bug with the patch applied and with the
same reproducer. There was an

    assert(nbd_server->connections > 0);

violation in nbd_blockdev_client_closed(). It happens much less frequently,
however. Think the reasons are similar to the original bug.
>
>>>        nbd_server->connections--;
>>>        nbd_update_server_watch(nbd_server);
>> -- 
>> Best regards,
>> Alexander Ivanov
Alexander Ivanov June 18, 2024, 8:44 a.m. UTC | #4
Hello Eric,

Do you have any ideas about the bug?

Thank you.

On 6/10/24 14:33, Eric Blake wrote:
> On Sat, Jun 08, 2024 at 11:36:59AM GMT, Alexander Ivanov wrote:
>> There is a bug reproducer in the attachment.
> Summarizing the reproducer, you are repeatedly calling QMP
> nbd-server-start/nbd-server-stop on qemu as NBD server in one thread,
> and repeatedly calling 'qemu-nbd -L' in another, to try and provoke
> the race where the server is stopped while qemu-nbd -L is still trying
> to grab information.
>
>>
>> On 6/7/24 17:00, Alexander Ivanov wrote:
>>> In some cases, the NBD server can be stopped before
>>> nbd_blockdev_client_closed() is called, causing the nbd_server variable
>>> to be nullified. This leads to a NULL pointer dereference when accessing
>>> nbd_server.
> Am I correct that the NULL pointer deref was in qemu-nbd in your
> reproducer, and not in qemu-kvm?
>
>>> Add a NULL check for nbd_server to the nbd_blockdev_client_closed()
>>> function to prevent NULL pointer dereference.
>>>
>>> Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
>>> ---
>>>    blockdev-nbd.c | 3 +++
>>>    1 file changed, 3 insertions(+)
>>>
>>> diff --git a/blockdev-nbd.c b/blockdev-nbd.c
>>> index 213012435f..fb1f30ae0d 100644
>>> --- a/blockdev-nbd.c
>>> +++ b/blockdev-nbd.c
>>> @@ -52,6 +52,9 @@ int nbd_server_max_connections(void)
>>>    static void nbd_blockdev_client_closed(NBDClient *client, bool ignored)
>>>    {
>>>        nbd_client_put(client);
>>> +    if (nbd_server == NULL) {
>>> +        return;
>>> +    }
>>>        assert(nbd_server->connections > 0);
> While this does indeed avoid the NULL dereference right here, I still
> want to understand why nbd_server is getting set to NULL while there
> is still an active client, and make sure we don't have any other NULL
> derefs.  I'll respond again once I've studied the code a bit more.
>
>>>        nbd_server->connections--;
>>>        nbd_update_server_watch(nbd_server);
>> -- 
>> Best regards,
>> Alexander Ivanov
Alexander Ivanov June 28, 2024, 9:58 a.m. UTC | #5
Ping?

On 6/7/24 17:00, Alexander Ivanov wrote:
>   static void nbd_blockdev_client_closed(NBDClient *client, bool ignored)
>   {
>       nbd_client_put(client);
> +    if (nbd_server == NULL) {
> +        return;
> +    }
>       assert(nbd_server->connections > 0);
>       nbd_server->connections--;
>       nbd_update_server_watch(nbd_server);
Eric Blake July 30, 2024, 2:35 a.m. UTC | #6
On Fri, Jun 28, 2024 at 11:58:37AM GMT, Alexander Ivanov wrote:
> Ping?
> 
> On 6/7/24 17:00, Alexander Ivanov wrote:
> >   static void nbd_blockdev_client_closed(NBDClient *client, bool ignored)
> >   {
> >       nbd_client_put(client);
> > +    if (nbd_server == NULL) {
> > +        return;
> > +    }
> >       assert(nbd_server->connections > 0);
> >       nbd_server->connections--;
> >       nbd_update_server_watch(nbd_server);
>

I've spent more time looking at this, and I'm stumped.  Maybe Dan can
help out.  If I understand correctly, here's the race situation we
have:

qemu main loop                coroutine                    client
QMP nbd-server-start
 - calls nbd_server_start
   - assigns nbd_server = g_new0()
   - calls qio_net_listener_open_sync
   - calls nbd_server_update_watch
     - calls qio_net_listener_set_client_func(, nbd_accept, )
     - waiting for client is now in coroutine
- returns control to QMP main loop
                                                           opens TCP socket
                              - qio notices incoming connection
                                - calls nbd_accept callback
                                - starts NBD handshake nbd_client_new(, nbd_blockdev_client_closed)
QMP nbd-server-stop
 - calls nbd_server_stop
   - calls nbd_server_free
     - calls qio_net_listener_disconnect()
       - marks qio channel as useless
   - sets nbd_server = NULL
                                - qio sees channel is useless, disconnects
                                                           client deals gracefully with dead connection
                                - calls nbd_blockdev_client_closed callback
                                - segfault since nbd_server->connections derefs NULL


Thread 1 "qemu-kvm" received signal SIGSEGV, Segmentation fault.
0x000055555600af59 in nbd_blockdev_client_closed (client=0x55555810dfc0, 
    ignored=false) at ../blockdev-nbd.c:56
56	    assert(nbd_server->connections > 0);
(gdb) p nbd_server
$1 = (NBDServerData *) 0x0
(gdb) bt
#0  0x000055555600af59 in nbd_blockdev_client_closed
    (client=0x55555810dfc0, ignored=false) at ../blockdev-nbd.c:56
#1  0x0000555555ff72f9 in client_close
    (client=0x55555810dfc0, negotiated=false) at ../nbd/server.c:1624
#2  0x0000555555ffbc49 in nbd_co_client_start (opaque=0x55555810dfc0)
    at ../nbd/server.c:3198
#3  0x0000555556220629 in coroutine_trampoline (i0=1488434896, i1=21845)

Worse, the race could go another way: if another QMP nbd-server-start
is called fast enough before the coroutine finishes the nbd_accept
code from the first connection, it could attempt to modify the second
nbd_server object, which may be associated with a completely different
socket.

As far as I can tell, the problem stems from the fact that the attempt
to establish the connection with the client is continuing in a
background coroutine despite our efforts to honor the QMP
nbd-server-stop command.  But I'm not sure on the proper way to inform
qio to abandon all incoming traffic to a given net listener.  Or maybe
I should just change the semantics of QMP nbd-server-stop to fail if
there are known connections from and nbd_accept() - but I still don't
want that to wait indefinitely, so I still want some way to tell the
qio channel that the server is going away and to tear down sockets as
soon as possible.

As a stopgap, something like this might prevent the SEGV:

diff --git i/blockdev-nbd.c w/blockdev-nbd.c
index 213012435f4..2f5ea094407 100644
--- i/blockdev-nbd.c
+++ w/blockdev-nbd.c
@@ -277,6 +277,12 @@ void qmp_nbd_server_stop(Error **errp)

     blk_exp_close_all_type(BLOCK_EXPORT_TYPE_NBD);

+    if (qio_net_listener_is_connected(nbd_server->listener) ||
+        nbd_server->connections > 0) {
+        error_setg(errp, "NBD server still has connected clients");
+        return;
+    }
+
     nbd_server_free(nbd_server);
     nbd_server = NULL;
 }

but it's not as graceful as I'd like (it would be nicer to have the
nbd-server-stop command wait until it knows all connections are
cleaned, instead of failing up front).
diff mbox series

Patch

diff --git a/blockdev-nbd.c b/blockdev-nbd.c
index 213012435f..fb1f30ae0d 100644
--- a/blockdev-nbd.c
+++ b/blockdev-nbd.c
@@ -52,6 +52,9 @@  int nbd_server_max_connections(void)
 static void nbd_blockdev_client_closed(NBDClient *client, bool ignored)
 {
     nbd_client_put(client);
+    if (nbd_server == NULL) {
+        return;
+    }
     assert(nbd_server->connections > 0);
     nbd_server->connections--;
     nbd_update_server_watch(nbd_server);