Message ID: 20220314203818.3681277-4-eblake@redhat.com
State: New
Series: nbd: MULTI_CONN for shared writable exports
The patches seem OK to me, but I don't really know enough about the
internals of qemu-nbd to give a line-by-line review. I did however
build and test qemu-nbd with the patches:

$ ./build/qemu-nbd /var/tmp/test.qcow2
$ nbdinfo nbd://localhost
...
    can_multi_conn: false

$ ./build/qemu-nbd -e 2 /var/tmp/test.qcow2
$ nbdinfo nbd://localhost
...
    can_multi_conn: false

^^^ Is this expected?  It also happens with -e 0.

$ ./build/qemu-nbd -e 2 -m on /var/tmp/test.qcow2
$ nbdinfo nbd://localhost
...
    can_multi_conn: true

$ ./build/qemu-nbd -e 2 -m off /var/tmp/test.qcow2
$ nbdinfo nbd://localhost
...
    can_multi_conn: false

Rich.
On Tue, Mar 15, 2022 at 01:14:41PM +0000, Richard W.M. Jones wrote:
> The patches seem OK to me, but I don't really know enough about the
> internals of qemu-nbd to give a line-by-line review. I did however
> build and test qemu-nbd with the patches:
>
> $ ./build/qemu-nbd /var/tmp/test.qcow2
> $ nbdinfo nbd://localhost
> ...
>     can_multi_conn: false
>
> $ ./build/qemu-nbd -e 2 /var/tmp/test.qcow2
> $ nbdinfo nbd://localhost
> ...
>     can_multi_conn: false
>
> ^^^ Is this expected?  It also happens with -e 0.

Yes, because qemu-nbd defaults to read-write connections, but to be
conservative, this patch defaults '-m auto' to NOT advertise
multi-conn for read-write; you need to be explicit:

> $ ./build/qemu-nbd -e 2 -m on /var/tmp/test.qcow2
> $ nbdinfo nbd://localhost
> ...
>     can_multi_conn: true

either with '-m on' as you did here, or with

  build/qemu-nbd -r -e 2 /var/tmp/test.qcow2

where the '-m auto' default exposes multi-conn for a readonly client.

> $ ./build/qemu-nbd -e 2 -m off /var/tmp/test.qcow2
> $ nbdinfo nbd://localhost
> ...
>     can_multi_conn: false
>
> Rich.
On Wed, Mar 16, 2022 at 04:07:21PM -0500, Eric Blake wrote: > On Tue, Mar 15, 2022 at 01:14:41PM +0000, Richard W.M. Jones wrote: > > The patches seem OK to me, but I don't really know enough about the > > internals of qemu-nbd to give a line-by-line review. I did however > > build and test qemu-nbd with the patches: > > > > $ ./build/qemu-nbd /var/tmp/test.qcow2 > > $ nbdinfo nbd://localhost > > ... > > can_multi_conn: false > > > > > > $ ./build/qemu-nbd -e 2 /var/tmp/test.qcow2 > > $ nbdinfo nbd://localhost > > ... > > can_multi_conn: false > > > > ^^^ Is this expected? It also happens with -e 0. > > Yes, because qemu-nbd defaults to read-write connections, but to be > conservative, this patch defaults '-m auto' to NOT advertise > multi-conn for read-write; you need to be explicit: > > > > > > > $ ./build/qemu-nbd -e 2 -m on /var/tmp/test.qcow2 > > $ nbdinfo nbd://localhost > > ... > > can_multi_conn: true > > either with '-m on' as you did here, or with > > build/qemu-nbd -r -e 2 /var/tmp/test.qcow2 > > where the '-m auto' default exposes multi-conn for a readonly client. I hit send prematurely - one more thought I wanted to make clear. For _this_ series, the behavior of '-m auto' depends solely on readonly vs. read-write; that being the most conservative choice of preserving pre-existing qemu-nbd behavior (that is, if you don't use -m, the only change in behavior should be the fact that --help output now lists -m). But in future versions of qemu, we might be able to improve '-m auto' to also take into consideration whether the backing image of a read-write device is known-cache-consistent (such as a local file system), where we can then manage to default to advertising multi-conn for those writable images, while still avoiding the advertisement when exposing other devices such as network-mounted devices where we are less confident on whether there are sufficient cache consistency guarantees. Use of explicit '-m off' or '-m on' gives guaranteed results no matter the qemu version, so it is only explicit (or implied) '-m auto' where we might get smarter defaults over time. Thus testing whether advertising multi-conn makes a difference in other tools like nbdcopy thus requires checking if qemu-nbd is new enough to support -m, and then being explicit that we are opting in to using it.
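To make that last point concrete, a tool that wants to opt in could probe the qemu-nbd --help output for the new option before passing it. A minimal Python sketch of that idea (the probe strategy, image path, and client count are hypothetical, not part of the patch):

```python
import subprocess

def qemu_nbd_supports_multi_conn(qemu_nbd='qemu-nbd'):
    # Hypothetical probe: the new option shows up in --help as --multi-conn.
    help_text = subprocess.run([qemu_nbd, '--help'],
                               capture_output=True, text=True).stdout
    return '--multi-conn' in help_text

# Opt in to multi-conn only when this qemu-nbd understands -m;
# older builds simply keep their pre-existing behaviour.
args = ['qemu-nbd', '-e', '4', '/var/tmp/test.qcow2']
if qemu_nbd_supports_multi_conn():
    args[1:1] = ['-m', 'on']
server = subprocess.Popen(args)
```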
On Wed, Mar 16, 2022 at 04:15:53PM -0500, Eric Blake wrote: > On Wed, Mar 16, 2022 at 04:07:21PM -0500, Eric Blake wrote: > > On Tue, Mar 15, 2022 at 01:14:41PM +0000, Richard W.M. Jones wrote: > > > The patches seem OK to me, but I don't really know enough about the > > > internals of qemu-nbd to give a line-by-line review. I did however > > > build and test qemu-nbd with the patches: > > > > > > $ ./build/qemu-nbd /var/tmp/test.qcow2 > > > $ nbdinfo nbd://localhost > > > ... > > > can_multi_conn: false > > > > > > > > > $ ./build/qemu-nbd -e 2 /var/tmp/test.qcow2 > > > $ nbdinfo nbd://localhost > > > ... > > > can_multi_conn: false > > > > > > ^^^ Is this expected? It also happens with -e 0. > > > > Yes, because qemu-nbd defaults to read-write connections, but to be > > conservative, this patch defaults '-m auto' to NOT advertise > > multi-conn for read-write; you need to be explicit: > > > > > > > > > > > $ ./build/qemu-nbd -e 2 -m on /var/tmp/test.qcow2 > > > $ nbdinfo nbd://localhost > > > ... > > > can_multi_conn: true > > > > either with '-m on' as you did here, or with > > > > build/qemu-nbd -r -e 2 /var/tmp/test.qcow2 > > > > where the '-m auto' default exposes multi-conn for a readonly client. > > I hit send prematurely - one more thought I wanted to make clear. For > _this_ series, the behavior of '-m auto' depends solely on readonly > vs. read-write; that being the most conservative choice of preserving > pre-existing qemu-nbd behavior (that is, if you don't use -m, the only > change in behavior should be the fact that --help output now lists > -m). > > But in future versions of qemu, we might be able to improve '-m auto' > to also take into consideration whether the backing image of a > read-write device is known-cache-consistent (such as a local file > system), where we can then manage to default to advertising multi-conn > for those writable images, while still avoiding the advertisement when > exposing other devices such as network-mounted devices where we are > less confident on whether there are sufficient cache consistency > guarantees. Use of explicit '-m off' or '-m on' gives guaranteed > results no matter the qemu version, so it is only explicit (or > implied) '-m auto' where we might get smarter defaults over time. > > Thus testing whether advertising multi-conn makes a difference in > other tools like nbdcopy thus requires checking if qemu-nbd is new > enough to support -m, and then being explicit that we are opting in to > using it. I see. The commit message of patch 3 confused me because superficially it seems to say that multi-conn is safe (and therefore I assumed safe == enabled) when backed by a local filesystem. Could the commit message be improved to list the default for common combinations of flags? Rich.
Am 14.03.2022 um 21:38 hat Eric Blake geschrieben:
> According to the NBD spec, a server that advertises
> NBD_FLAG_CAN_MULTI_CONN promises that multiple client connections will
> not see any cache inconsistencies: when properly separated by a single
> flush, actions performed by one client will be visible to another
> client, regardless of which client did the flush.  We satisfy these
> conditions in qemu when our block layer is backed by the local
> filesystem (by virtue of the semantics of fdatasync(), and the fact
> that qemu itself is not buffering writes beyond flushes).  It is
> harder to state whether we satisfy these conditions for network-based
> protocols, so the safest course of action is to allow users to opt-in
> to advertising multi-conn.

Do you have an example of how this could be unsafe? As I understand it,
the NBD server has a single BlockBackend and therefore is a single
client for the backend, be it file-posix or any network-based protocol.
It doesn't really make a difference for the storage from how many
different NBD clients the requests are coming.

I would have expected that cache coherency of the protocol level driver
would only matter if you had two QEMU processes accessing the same file
concurrently.

In fact, I don't think we even need the flush restriction from the NBD
spec. All clients see the same state (that of the NBD server
BlockBackend) even without anyone issuing any flush. The flush is only
needed to make sure that cached data is written to the backing storage
when writeback caches are involved.

Please correct me if I'm misunderstanding something here.

> We may later tweak defaults to advertise
> by default when the block layer can confirm that the underlying
> protocol driver is cache consistent between multiple writers, but for
> now, this at least allows savvy users (such as virt-v2v or nbdcopy) to
> explicitly start qemu-nbd (new -m command-line option) or
> qemu-storage-daemon (new qapi field 'multi-conn') with multi-conn
> advertisement in a known-safe setup where the client end can then
> benefit from parallel clients.
>
> Note, however, that we don't want to advertise MULTI_CONN when we know
> that a second client cannot connect (for historical reasons, qemu-nbd
> defaults to a single connection while nbd-server-add and QMP commands
> default to unlimited connections; but we already have existing means
> to let either style of NBD server creation alter those defaults).  The
> harder part of this patch is setting up an iotest to demonstrate
> behavior of multiple NBD clients to a single server.  It might be
> possible with parallel qemu-io processes, but I found it easier to do
> in python with the help of libnbd, and help from Nir and Vladimir in
> writing the test.
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> Suggested-by: Nir Soffer <nsoffer@redhat.com>
> Suggested-by: Vladimir Sementsov-Ogievskiy <v.sementsov-og@ya.ru>

> @@ -709,6 +714,17 @@ int main(int argc, char **argv)
>                  exit(EXIT_FAILURE);
>              }
>              break;
> +        case 'm':
> +        {
> +            Error *err = NULL;
> +            multi_conn = qapi_enum_parse(&OnOffAuto_lookup, optarg,
> +                                         ON_OFF_AUTO_AUTO, &err);
> +            if (err) {
> +                error_report_err(err);
> +                exit(EXIT_FAILURE);
> +            }

I think this is the same as passing &error_fatal.

> +            break;
> +        }
>          case 'f':
>              fmt = optarg;
>              break;
> diff --git a/tests/qemu-iotests/tests/nbd-multiconn b/tests/qemu-iotests/tests/nbd-multiconn
> new file mode 100755
> index 000000000000..7d1179b33b05
> --- /dev/null
> +++ b/tests/qemu-iotests/tests/nbd-multiconn
> @@ -0,0 +1,157 @@
> +#!/usr/bin/env python3
> +# group: rw auto quick
> +#
> +# Test cases for NBD multi-conn advertisement
> +#
> +# Copyright (C) 2022 Red Hat, Inc.
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 2 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +import os
> +from contextlib import contextmanager
> +import iotests
> +from iotests import qemu_img_create, qemu_io_silent

qemu_io_silent() doesn't exist any more, commit 72cfb937 removed it.

Kevin
On Wed, Apr 27, 2022 at 05:52:09PM +0200, Kevin Wolf wrote: > Am 14.03.2022 um 21:38 hat Eric Blake geschrieben: > > According to the NBD spec, a server that advertises > > NBD_FLAG_CAN_MULTI_CONN promises that multiple client connections will > > not see any cache inconsistencies: when properly separated by a single > > flush, actions performed by one client will be visible to another > > client, regardless of which client did the flush. We satisfy these > > conditions in qemu when our block layer is backed by the local > > filesystem (by virtue of the semantics of fdatasync(), and the fact > > that qemu itself is not buffering writes beyond flushes). It is > > harder to state whether we satisfy these conditions for network-based > > protocols, so the safest course of action is to allow users to opt-in > > to advertising multi-conn. > > Do you have an example of how this could be unsafe? Nothing direct. I tried to turn this on unconditionally in an earlier version, and we waffled about whether we could prove that network block backends (such as gluster) provide us the safety that the NBD spec demands: https://lists.gnu.org/archive/html/qemu-devel/2021-09/msg00038.html https://lists.gnu.org/archive/html/qemu-devel/2021-10/msg06744.html > > As I understand it, the NBD server has a single BlockBackend and > therefore is a single client for the backend, be it file-posix or any > network-based protocol. It doesn't really make a difference for the > storage from how many different NBD clients the requests are coming. > > I would have expected that cache coherency of the protocol level driver > would only matter if you had two QEMU processes accessing the same file > concurrently. Or a multi-pathed connection to network storage, where one QEMU process accesses the network device, but those accesses may round-robin which server they reach, and where any caching at an individual server may be inconsistent with what is seen on another server unless flushing is used to force the round-robin access to synchronize between the multi-path views. > > In fact, I don't think we even need the flush restriction from the NBD > spec. All clients see the same state (that of the NBD server > BlockBackend) even without anyone issuing any flush. The flush is only > needed to make sure that cached data is written to the backing storage > when writeback caches are involved. > > Please correct me if I'm misunderstanding something here. Likewise me, if I'm being overly cautious. I can certainly write a simpler v4 that just always advertises MULTI_CONN if we allow more than one client, without any knob to override it; it's just that it is harder to write a commit message justifying why I think it is safe to do so. > > > We may later tweak defaults to advertise > > by default when the block layer can confirm that the underlying > > protocol driver is cache consistent between multiple writers, but for > > now, this at least allows savvy users (such as virt-v2v or nbdcopy) to > > explicitly start qemu-nbd (new -m command-line option) or > > qemu-storage-daemon (new qapi field 'multi-conn') with multi-conn > > advertisement in a known-safe setup where the client end can then > > benefit from parallel clients. 
> > > > Note, however, that we don't want to advertise MULTI_CONN when we know > > that a second client cannot connect (for historical reasons, qemu-nbd > > defaults to a single connection while nbd-server-add and QMP commands > > default to unlimited connections; but we already have existing means > > to let either style of NBD server creation alter those defaults). The > > harder part of this patch is setting up an iotest to demonstrate > > behavior of multiple NBD clients to a single server. It might be > > possible with parallel qemu-io processes, but I found it easier to do > > in python with the help of libnbd, and help from Nir and Vladimir in > > writing the test. > > > > Signed-off-by: Eric Blake <eblake@redhat.com> > > Suggested-by: Nir Soffer <nsoffer@redhat.com> > > Suggested-by: Vladimir Sementsov-Ogievskiy <v.sementsov-og@ya.ru> > > > @@ -709,6 +714,17 @@ int main(int argc, char **argv) > > exit(EXIT_FAILURE); > > } > > break; > > + case 'm': > > + { > > + Error *err = NULL; > > + multi_conn = qapi_enum_parse(&OnOffAuto_lookup, optarg, > > + ON_OFF_AUTO_AUTO, &err); > > + if (err) { > > + error_report_err(err); > > + exit(EXIT_FAILURE); > > + } > > I think this is the same as passing &error_fatal. Yes, sounds right. > > > + break; > > + } > > case 'f': > > fmt = optarg; > > break; > > diff --git a/tests/qemu-iotests/tests/nbd-multiconn b/tests/qemu-iotests/tests/nbd-multiconn > > new file mode 100755 > > index 000000000000..7d1179b33b05 > > --- /dev/null > > +++ b/tests/qemu-iotests/tests/nbd-multiconn > > @@ -0,0 +1,157 @@ > > +#!/usr/bin/env python3 > > + > > +import os > > +from contextlib import contextmanager > > +import iotests > > +from iotests import qemu_img_create, qemu_io_silent > > qemu_io_silent() doesn't exist any more, commit 72cfb937 removed it. Sounds like I need to rebase and post v4 anyways, then.
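For that rebase, a minimal sketch of how the test's setUp() might look without qemu_io_silent() — assuming the current iotests qemu_io() helper checks the exit status itself and that the import is switched from qemu_io_silent to qemu_io; both assumptions would need verifying against the tree being rebased onto:

```python
    def setUp(self):
        # qemu_img_create() and qemu_io() are assumed to raise on failure,
        # so the explicit asserts around qemu_io_silent() are no longer needed.
        qemu_img_create('-f', iotests.imgfmt, disk, size)
        qemu_io('-c', 'w -P 1 0 2M', '-c', 'w -P 2 2M 2M', disk)
```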
Am 27.04.2022 um 23:39 hat Eric Blake geschrieben: > On Wed, Apr 27, 2022 at 05:52:09PM +0200, Kevin Wolf wrote: > > Am 14.03.2022 um 21:38 hat Eric Blake geschrieben: > > > According to the NBD spec, a server that advertises > > > NBD_FLAG_CAN_MULTI_CONN promises that multiple client connections will > > > not see any cache inconsistencies: when properly separated by a single > > > flush, actions performed by one client will be visible to another > > > client, regardless of which client did the flush. We satisfy these > > > conditions in qemu when our block layer is backed by the local > > > filesystem (by virtue of the semantics of fdatasync(), and the fact > > > that qemu itself is not buffering writes beyond flushes). It is > > > harder to state whether we satisfy these conditions for network-based > > > protocols, so the safest course of action is to allow users to opt-in > > > to advertising multi-conn. > > > > Do you have an example of how this could be unsafe? > > Nothing direct. I tried to turn this on unconditionally in an earlier > version, and we waffled about whether we could prove that network > block backends (such as gluster) provide us the safety that the NBD > spec demands: > > https://lists.gnu.org/archive/html/qemu-devel/2021-09/msg00038.html > https://lists.gnu.org/archive/html/qemu-devel/2021-10/msg06744.html > > > > > As I understand it, the NBD server has a single BlockBackend and > > therefore is a single client for the backend, be it file-posix or any > > network-based protocol. It doesn't really make a difference for the > > storage from how many different NBD clients the requests are coming. > > > > I would have expected that cache coherency of the protocol level driver > > would only matter if you had two QEMU processes accessing the same file > > concurrently. > > Or a multi-pathed connection to network storage, where one QEMU > process accesses the network device, but those accesses may > round-robin which server they reach, and where any caching at an > individual server may be inconsistent with what is seen on another > server unless flushing is used to force the round-robin access to > synchronize between the multi-path views. I don't think this is a realistic scenario. It would mean that you successfully write data to the storage, and when you then read the same location, you get different data back. This would be inconsistent even with a single client. So I'd call this broken storage that should be replaced as soon as possible. I could imagine problems of this kind with two separate connections to the network storage, but here all the NBD clients share a single BlockBackend, so for the storage they are a single connection. > > In fact, I don't think we even need the flush restriction from the NBD > > spec. All clients see the same state (that of the NBD server > > BlockBackend) even without anyone issuing any flush. The flush is only > > needed to make sure that cached data is written to the backing storage > > when writeback caches are involved. > > > > Please correct me if I'm misunderstanding something here. > > Likewise me, if I'm being overly cautious. > > I can certainly write a simpler v4 that just always advertises > MULTI_CONN if we allow more than one client, without any knob to > override it; it's just that it is harder to write a commit message > justifying why I think it is safe to do so. Having an explicit option doesn't hurt, but it's the reasoning in the commit message that feels wrong to me. 
We could consider changing "auto" to advertise MULTI_CONN even for
writable exports. There might still be a good reason not to do this by
default, though, because of the NBD clients. I'm quite sure that the
backend won't make any trouble, but a client might if someone else is
writing to the same image (this is why we require an explicit
share-rw=on for guest devices in the same case).

Kevin
On Fri, Apr 29, 2022 at 02:49:35PM +0200, Kevin Wolf wrote: ... > > Or a multi-pathed connection to network storage, where one QEMU > > process accesses the network device, but those accesses may > > round-robin which server they reach, and where any caching at an > > individual server may be inconsistent with what is seen on another > > server unless flushing is used to force the round-robin access to > > synchronize between the multi-path views. > > I don't think this is a realistic scenario. It would mean that you > successfully write data to the storage, and when you then read the same > location, you get different data back. This would be inconsistent even > with a single client. So I'd call this broken storage that should be > replaced as soon as possible. > > I could imagine problems of this kind with two separate connections to > the network storage, but here all the NBD clients share a single > BlockBackend, so for the storage they are a single connection. I like that chain of reasoning. > > > > In fact, I don't think we even need the flush restriction from the NBD > > > spec. All clients see the same state (that of the NBD server > > > BlockBackend) even without anyone issuing any flush. The flush is only > > > needed to make sure that cached data is written to the backing storage > > > when writeback caches are involved. > > > > > > Please correct me if I'm misunderstanding something here. > > > > Likewise me, if I'm being overly cautious. > > > > I can certainly write a simpler v4 that just always advertises > > MULTI_CONN if we allow more than one client, without any knob to > > override it; it's just that it is harder to write a commit message > > justifying why I think it is safe to do so. > > Having an explicit option doesn't hurt, but it's the reasoning in the > commit message that feels wrong to me. > > We could consider changing "auto" to advertise MULTI_CONN even for > writable exports. There might still be a good reason not to do this by > default, though, because of the NBD clients. I'm quite sure that the > backend won't make any trouble, but client might if someone else is > writing to the same image (this is why we require an explicit > share-rw=on for guest devices in the same case). If your worry is about a client trying to determine if writing to an NBD server is going to race with some external process writing to the direct image, I don't see how not advertising MULTI_CONN will make things safer - the NBD client to qemu-nbd will still be going through a single backend, and that race is present even if there is only one NBD client. The point of MULTI_CONN is informing the client that it can open multiple sockets and see a consistent view across all of them, and in your scenario of the server competing with some external process over the underlying data file, that competition is not controlled by how many NBD clients connect to the server, but by the external process having access at the same time the server has access through a single BlockBackend (and would be just as risky as if MULTI_CONN were not advertised and the client limits itself to one NBD connection). If we can argue that our single BlockBackend point of access is safe enough to default to advertising MULTI_CONN for writable connections (when we support parallel clients), then exposing an OnOffAuto knob is overkill. I'm not even sure I can envision a case where needing to not advertise the bit would matter to a client (clients are supposed to ignore unknown feature bits).
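To make the MULTI_CONN promise concrete, here is a rough libnbd sketch of what an advertising server lets a client rely on; it mirrors the iotest added by this patch, and the URI is a placeholder for a writable export:

```python
import nbd

uri = 'nbd+unix:///w?socket=/tmp/nbd.sock'   # placeholder writable export

conns = [nbd.NBD() for _ in range(3)]
for h in conns:
    h.connect_uri(uri)
    assert h.can_multi_conn()   # server advertised NBD_FLAG_CAN_MULTI_CONN

# With MULTI_CONN, a write on one connection followed by a flush on any
# connection must be visible to reads on every other connection.
conns[0].pwrite(b'\x05' * 65536, 0)
conns[1].flush()
assert conns[2].pread(65536, 0) == b'\x05' * 65536

for h in conns:
    h.shutdown()
```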
Am 02.05.2022 um 23:12 hat Eric Blake geschrieben: > On Fri, Apr 29, 2022 at 02:49:35PM +0200, Kevin Wolf wrote: > ... > > > Or a multi-pathed connection to network storage, where one QEMU > > > process accesses the network device, but those accesses may > > > round-robin which server they reach, and where any caching at an > > > individual server may be inconsistent with what is seen on another > > > server unless flushing is used to force the round-robin access to > > > synchronize between the multi-path views. > > > > I don't think this is a realistic scenario. It would mean that you > > successfully write data to the storage, and when you then read the same > > location, you get different data back. This would be inconsistent even > > with a single client. So I'd call this broken storage that should be > > replaced as soon as possible. > > > > I could imagine problems of this kind with two separate connections to > > the network storage, but here all the NBD clients share a single > > BlockBackend, so for the storage they are a single connection. > > I like that chain of reasoning. > > > > > > > In fact, I don't think we even need the flush restriction from the NBD > > > > spec. All clients see the same state (that of the NBD server > > > > BlockBackend) even without anyone issuing any flush. The flush is only > > > > needed to make sure that cached data is written to the backing storage > > > > when writeback caches are involved. > > > > > > > > Please correct me if I'm misunderstanding something here. > > > > > > Likewise me, if I'm being overly cautious. > > > > > > I can certainly write a simpler v4 that just always advertises > > > MULTI_CONN if we allow more than one client, without any knob to > > > override it; it's just that it is harder to write a commit message > > > justifying why I think it is safe to do so. > > > > Having an explicit option doesn't hurt, but it's the reasoning in the > > commit message that feels wrong to me. > > > > We could consider changing "auto" to advertise MULTI_CONN even for > > writable exports. There might still be a good reason not to do this by > > default, though, because of the NBD clients. I'm quite sure that the > > backend won't make any trouble, but client might if someone else is > > writing to the same image (this is why we require an explicit > > share-rw=on for guest devices in the same case). > > If your worry is about a client trying to determine if writing to an > NBD server is going to race with some external process writing to the > direct image No, I'm not really thinking of a race, but just the fact that someone else could write to the image at all. Guest OSes don't generally expect the disk magically changing under their feet (unless you use cluster filesystems etc.), so the safe default is not allowing multiple writers to the same image. > I don't see how not advertising MULTI_CONN will make things safer - > the NBD client to qemu-nbd will still be going through a single > backend, and that race is present even if there is only one NBD > client. 
The point of MULTI_CONN is informing the client that it can > open multiple sockets and see a consistent view across all of them, > and in your scenario of the server competing with some external > process over the underlying data file, that competition is not > controlled by how many NBD clients connect to the server, but by the > external process having access at the same time the server has access > through a single BlockBackend (and would be just as risky as if > MULTI_CONN were not advertised and the client limits itself to one NBD > connection). Oh, right, that makes sense. So the other connections should really be considered the same client in this case, not some random other writer. Can we somehow tell whether a new connection is just the same client opening an additional connection or whether someone else is connecting? But maybe it's not worth spending too much time with this for a check that would only try to catch configurations that the user set up in a way they didn't actually intend. > If we can argue that our single BlockBackend point of access is safe > enough to default to advertising MULTI_CONN for writable connections > (when we support parallel clients), then exposing an OnOffAuto knob is > overkill. I'm not even sure I can envision a case where needing to > not advertise the bit would matter to a client (clients are supposed > to ignore unknown feature bits). Yes, my concern seems unrelated to MULTI_CONN after all. Kevin
diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt index bdb0f2a41aca..6c99070b99c8 100644 --- a/docs/interop/nbd.txt +++ b/docs/interop/nbd.txt @@ -68,3 +68,4 @@ NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", NBD_CMD_CACHE * 4.2: NBD_FLAG_CAN_MULTI_CONN for shareable read-only exports, NBD_CMD_FLAG_FAST_ZERO * 5.2: NBD_CMD_BLOCK_STATUS for "qemu:allocation-depth" +* 7.0: NBD_FLAG_CAN_MULTI_CONN for shareable writable exports diff --git a/docs/tools/qemu-nbd.rst b/docs/tools/qemu-nbd.rst index 4c950f61998e..229414e3cdc7 100644 --- a/docs/tools/qemu-nbd.rst +++ b/docs/tools/qemu-nbd.rst @@ -139,8 +139,18 @@ driver options if :option:`--image-opts` is specified. .. option:: -e, --shared=NUM Allow up to *NUM* clients to share the device (default - ``1``), 0 for unlimited. Safe for readers, but for now, - consistency is not guaranteed between multiple writers. + ``1``), 0 for unlimited. + +.. option:: -m, --multi-conn=MODE + + When multiple clients are allowed at once (via :option:`--shared`), + controls whether the NBD protocol MULTI_CONN bit is advertised to + clients. This defaults to ``auto``, which advertises the bit for + readonly connections (:option:`--read-only`) but not for writeable + connections; it can also be set to ``on`` or ``off`` to override the + default. Advertising this bit informs clients whether a flush from + one client has guaranteed consistency to subsequent access from a + parallel client. .. option:: -t, --persistent diff --git a/qapi/block-export.json b/qapi/block-export.json index f183522d0d2c..d9d38620156f 100644 --- a/qapi/block-export.json +++ b/qapi/block-export.json @@ -21,7 +21,9 @@ # recreated on the fly while the NBD server is active. # If missing, it will default to denying access (since 4.0). # @max-connections: The maximum number of connections to allow at the same -# time, 0 for unlimited. (since 5.2; default: 0) +# time, 0 for unlimited. Setting this to 1 also stops +# the server from advertising multiple client support +# (since 5.2; default: 0) # # Since: 4.2 ## @@ -50,7 +52,9 @@ # recreated on the fly while the NBD server is active. # If missing, it will default to denying access (since 4.0). # @max-connections: The maximum number of connections to allow at the same -# time, 0 for unlimited. (since 5.2; default: 0) +# time, 0 for unlimited. Setting this to 1 also stops +# the server from advertising multiple client support +# (since 5.2; default: 0). # # Returns: error if the server is already running. # @@ -95,11 +99,20 @@ # the metadata context name "qemu:allocation-depth" to # inspect allocation details. (since 5.2) # +# @multi-conn: Controls whether multiple client support is advertised, if the +# server's max-connections is not 1. "auto" advertises support +# for read-only clients, but not for writeable (although this +# may be improved in the future for writeable devices that are +# known to be cache-consistent, such as the raw file driver), +# while "on" and "off" explicitly set the bit (default auto, +# since 7.0). +# # Since: 5.2 ## { 'struct': 'BlockExportOptionsNbd', 'base': 'BlockExportOptionsNbdBase', - 'data': { '*bitmaps': ['str'], '*allocation-depth': 'bool' } } + 'data': { '*bitmaps': ['str'], '*allocation-depth': 'bool', + '*multi-conn': 'OnOffAuto' } } ## # @BlockExportOptionsVhostUserBlk: diff --git a/include/block/nbd.h b/include/block/nbd.h index c5a29ce1c61b..c74b7a9d2e6e 100644 --- a/include/block/nbd.h +++ b/include/block/nbd.h @@ -1,5 +1,5 @@ /* - * Copyright (C) 2016-2020 Red Hat, Inc. 
+ * Copyright (C) 2016-2022 Red Hat, Inc. * Copyright (C) 2005 Anthony Liguori <anthony@codemonkey.ws> * * Network Block Device @@ -346,6 +346,7 @@ void nbd_client_put(NBDClient *client); void nbd_server_is_qemu_nbd(int max_connections); bool nbd_server_is_running(void); +int nbd_server_max_connections(void); void nbd_server_start(SocketAddress *addr, const char *tls_creds, const char *tls_authz, uint32_t max_connections, Error **errp); diff --git a/blockdev-nbd.c b/blockdev-nbd.c index add41a23af43..15ad7b8dbfd5 100644 --- a/blockdev-nbd.c +++ b/blockdev-nbd.c @@ -44,6 +44,11 @@ bool nbd_server_is_running(void) return nbd_server || qemu_nbd_connections >= 0; } +int nbd_server_max_connections(void) +{ + return nbd_server ? nbd_server->max_connections : -1; +} + static void nbd_blockdev_client_closed(NBDClient *client, bool ignored) { nbd_client_put(client); diff --git a/nbd/server.c b/nbd/server.c index 5da884c2fc35..2cfdd56cda19 100644 --- a/nbd/server.c +++ b/nbd/server.c @@ -1,5 +1,5 @@ /* - * Copyright (C) 2016-2021 Red Hat, Inc. + * Copyright (C) 2016-2022 Red Hat, Inc. * Copyright (C) 2005 Anthony Liguori <anthony@codemonkey.ws> * * Network Block Device Server Side @@ -1642,7 +1642,7 @@ static int nbd_export_create(BlockExport *blk_exp, BlockExportOptions *exp_args, int64_t size; uint64_t perm, shared_perm; bool readonly = !exp_args->writable; - bool shared = !exp_args->writable; + bool multi_conn; strList *bitmaps; size_t i; int ret; @@ -1680,6 +1680,23 @@ static int nbd_export_create(BlockExport *blk_exp, BlockExportOptions *exp_args, return size; } + /* + * Determine whether to advertise multi-conn. Default is auto, + * which resolves to on for read-only and off for writable. But + * if the server has max-connections 1, that forces the flag off. + */ + if (!arg->has_multi_conn) { + arg->multi_conn = ON_OFF_AUTO_AUTO; + } + if (nbd_server_max_connections() == 1) { + arg->multi_conn = ON_OFF_AUTO_OFF; + } + if (arg->multi_conn == ON_OFF_AUTO_AUTO) { + multi_conn = readonly; + } else { + multi_conn = arg->multi_conn == ON_OFF_AUTO_ON; + } + /* Don't allow resize while the NBD server is running, otherwise we don't * care what happens with the node. 
*/ blk_get_perm(blk, &perm, &shared_perm); @@ -1693,11 +1710,11 @@ static int nbd_export_create(BlockExport *blk_exp, BlockExportOptions *exp_args, exp->description = g_strdup(arg->description); exp->nbdflags = (NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA | NBD_FLAG_SEND_CACHE); + if (multi_conn) { + exp->nbdflags |= NBD_FLAG_CAN_MULTI_CONN; + } if (readonly) { exp->nbdflags |= NBD_FLAG_READ_ONLY; - if (shared) { - exp->nbdflags |= NBD_FLAG_CAN_MULTI_CONN; - } } else { exp->nbdflags |= (NBD_FLAG_SEND_TRIM | NBD_FLAG_SEND_WRITE_ZEROES | NBD_FLAG_SEND_FAST_ZERO); diff --git a/qemu-nbd.c b/qemu-nbd.c index 8c25ae93df83..ddeb2fcf7f40 100644 --- a/qemu-nbd.c +++ b/qemu-nbd.c @@ -70,6 +70,7 @@ #define QEMU_NBD_OPT_PID_FILE 265 #define QEMU_NBD_OPT_SELINUX_LABEL 266 #define QEMU_NBD_OPT_TLSHOSTNAME 267 +#define QEMU_NBD_OPT_MULTI_CONN 268 #define MBR_SIZE 512 @@ -79,6 +80,7 @@ static SocketAddress *saddr; static int persistent = 0; static enum { RUNNING, TERMINATE, TERMINATED } state; static int shared = 1; +static OnOffAuto multi_conn = ON_OFF_AUTO_AUTO; static int nb_fds; static QIONetListener *server; static QCryptoTLSCreds *tlscreds; @@ -100,6 +102,8 @@ static void usage(const char *name) " -k, --socket=PATH path to the unix socket\n" " (default '"SOCKET_PATH"')\n" " -e, --shared=NUM device can be shared by NUM clients (default '1')\n" +" -m, --multi-conn=MODE control multi-conn advertisement with -e, MODE is\n" +" 'on', 'off', or 'auto' (default based on -r)\n" " -t, --persistent don't exit on the last connection\n" " -v, --verbose display extra debugging information\n" " -x, --export-name=NAME expose export by name (default is empty string)\n" @@ -513,7 +517,7 @@ int main(int argc, char **argv) char *device = NULL; QemuOpts *sn_opts = NULL; const char *sn_id_or_name = NULL; - const char *sopt = "hVb:o:p:rsnc:dvk:e:f:tl:x:T:D:AB:L"; + const char *sopt = "hVb:o:p:rsnc:dvk:e:m:f:tl:x:T:D:AB:L"; struct option lopt[] = { { "help", no_argument, NULL, 'h' }, { "version", no_argument, NULL, 'V' }, @@ -536,6 +540,7 @@ int main(int argc, char **argv) { "detect-zeroes", required_argument, NULL, QEMU_NBD_OPT_DETECT_ZEROES }, { "shared", required_argument, NULL, 'e' }, + { "multi-conn", required_argument, NULL, 'm' }, { "format", required_argument, NULL, 'f' }, { "persistent", no_argument, NULL, 't' }, { "verbose", no_argument, NULL, 'v' }, @@ -709,6 +714,17 @@ int main(int argc, char **argv) exit(EXIT_FAILURE); } break; + case 'm': + { + Error *err = NULL; + multi_conn = qapi_enum_parse(&OnOffAuto_lookup, optarg, + ON_OFF_AUTO_AUTO, &err); + if (err) { + error_report_err(err); + exit(EXIT_FAILURE); + } + break; + } case 'f': fmt = optarg; break; @@ -1107,6 +1123,8 @@ int main(int argc, char **argv) .bitmaps = bitmaps, .has_allocation_depth = alloc_depth, .allocation_depth = alloc_depth, + .has_multi_conn = true, + .multi_conn = multi_conn, }, }; blk_exp_add(export_opts, &error_fatal); diff --git a/MAINTAINERS b/MAINTAINERS index f2e9ce1da2ac..678c34e8077f 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -3362,6 +3362,7 @@ F: qemu-nbd.* F: blockdev-nbd.c F: docs/interop/nbd.txt F: docs/tools/qemu-nbd.rst +F: tests/qemu-iotests/tests/*nbd* T: git https://repo.or.cz/qemu/ericb.git nbd T: git https://src.openvz.org/scm/~vsementsov/qemu.git nbd diff --git a/tests/qemu-iotests/tests/nbd-multiconn b/tests/qemu-iotests/tests/nbd-multiconn new file mode 100755 index 000000000000..7d1179b33b05 --- /dev/null +++ b/tests/qemu-iotests/tests/nbd-multiconn @@ -0,0 +1,157 @@ +#!/usr/bin/env python3 +# 
group: rw auto quick +# +# Test cases for NBD multi-conn advertisement +# +# Copyright (C) 2022 Red Hat, Inc. +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see <http://www.gnu.org/licenses/>. + +import os +from contextlib import contextmanager +import iotests +from iotests import qemu_img_create, qemu_io_silent + + +disk = os.path.join(iotests.test_dir, 'disk') +size = '4M' +nbd_sock = os.path.join(iotests.sock_dir, 'nbd_sock') +nbd_uri = 'nbd+unix:///{}?socket=' + nbd_sock + + +@contextmanager +def open_nbd(export_name): + h = nbd.NBD() + try: + h.connect_uri(nbd_uri.format(export_name)) + yield h + finally: + h.shutdown() + +class TestNbdMulticonn(iotests.QMPTestCase): + def setUp(self): + assert qemu_img_create('-f', iotests.imgfmt, disk, size) == 0 + assert qemu_io_silent('-c', 'w -P 1 0 2M', + '-c', 'w -P 2 2M 2M', disk) == 0 + + self.vm = iotests.VM() + self.vm.launch() + result = self.vm.qmp('blockdev-add', { + 'driver': 'qcow2', + 'node-name': 'n', + 'file': {'driver': 'file', 'filename': disk} + }) + self.assert_qmp(result, 'return', {}) + + def tearDown(self): + self.vm.shutdown() + os.remove(disk) + try: + os.remove(nbd_sock) + except OSError: + pass + + @contextmanager + def run_server(self, max_connections=None): + args = { + 'addr': { + 'type': 'unix', + 'data': {'path': nbd_sock} + } + } + if max_connections is not None: + args['max-connections'] = max_connections + + result = self.vm.qmp('nbd-server-start', args) + self.assert_qmp(result, 'return', {}) + yield + + result = self.vm.qmp('nbd-server-stop') + self.assert_qmp(result, 'return', {}) + + def add_export(self, name, writable=None, multi_conn=None): + args = { + 'type': 'nbd', + 'id': name, + 'node-name': 'n', + 'name': name, + } + if writable is not None: + args['writable'] = writable + if multi_conn is not None: + args['multi-conn'] = multi_conn + + result = self.vm.qmp('block-export-add', args) + self.assert_qmp(result, 'return', {}) + + def test_default_settings(self): + with self.run_server(): + self.add_export('r') + self.add_export('w', writable=True) + with open_nbd('r') as h: + self.assertTrue(h.can_multi_conn()) + with open_nbd('w') as h: + self.assertFalse(h.can_multi_conn()) + + def test_multiconn_option(self): + with self.run_server(): + self.add_export('r', multi_conn='off') + self.add_export('w', writable=True, multi_conn='on') + with open_nbd('r') as h: + self.assertFalse(h.can_multi_conn()) + with open_nbd('w') as h: + self.assertTrue(h.can_multi_conn()) + + def test_limited_connections(self): + with self.run_server(max_connections=1): + self.add_export('r', multi_conn='on') + self.add_export('w', writable=True, multi_conn='on') + with open_nbd('r') as h: + self.assertFalse(h.can_multi_conn()) + with open_nbd('w') as h: + self.assertFalse(h.can_multi_conn()) + + def test_parallel_writes(self): + with self.run_server(): + self.add_export('w', writable=True, multi_conn='on') + + clients = [nbd.NBD() for _ in range(3)] + for c in clients: + 
c.connect_uri(nbd_uri.format('w')) + assert c.can_multi_conn() + + initial_data = clients[0].pread(1024 * 1024, 0) + self.assertEqual(initial_data, b'\x01' * 1024 * 1024) + + updated_data = b'\x03' * 1024 * 1024 + clients[1].pwrite(updated_data, 0) + clients[2].flush() + current_data = clients[0].pread(1024 * 1024, 0) + + self.assertEqual(updated_data, current_data) + + for i in range(3): + clients[i].shutdown() + + +if __name__ == '__main__': + try: + # Easier to use libnbd than to try and set up parallel + # 'qemu-nbd --list' or 'qemu-io' processes, but not all systems + # have libnbd installed. + import nbd # type: ignore + + iotests.main(supported_fmts=['qcow2']) + except ImportError: + iotests.notrun('libnbd not installed') diff --git a/tests/qemu-iotests/tests/nbd-multiconn.out b/tests/qemu-iotests/tests/nbd-multiconn.out new file mode 100644 index 000000000000..89968f35d783 --- /dev/null +++ b/tests/qemu-iotests/tests/nbd-multiconn.out @@ -0,0 +1,5 @@ +.... +---------------------------------------------------------------------- +Ran 4 tests + +OK
According to the NBD spec, a server that advertises
NBD_FLAG_CAN_MULTI_CONN promises that multiple client connections will
not see any cache inconsistencies: when properly separated by a single
flush, actions performed by one client will be visible to another
client, regardless of which client did the flush.  We satisfy these
conditions in qemu when our block layer is backed by the local
filesystem (by virtue of the semantics of fdatasync(), and the fact
that qemu itself is not buffering writes beyond flushes).  It is harder
to state whether we satisfy these conditions for network-based
protocols, so the safest course of action is to allow users to opt-in
to advertising multi-conn.  We may later tweak defaults to advertise by
default when the block layer can confirm that the underlying protocol
driver is cache consistent between multiple writers, but for now, this
at least allows savvy users (such as virt-v2v or nbdcopy) to explicitly
start qemu-nbd (new -m command-line option) or qemu-storage-daemon (new
qapi field 'multi-conn') with multi-conn advertisement in a known-safe
setup where the client end can then benefit from parallel clients.

Note, however, that we don't want to advertise MULTI_CONN when we know
that a second client cannot connect (for historical reasons, qemu-nbd
defaults to a single connection while nbd-server-add and QMP commands
default to unlimited connections; but we already have existing means to
let either style of NBD server creation alter those defaults).  The
harder part of this patch is setting up an iotest to demonstrate
behavior of multiple NBD clients to a single server.  It might be
possible with parallel qemu-io processes, but I found it easier to do
in python with the help of libnbd, and help from Nir and Vladimir in
writing the test.

Signed-off-by: Eric Blake <eblake@redhat.com>
Suggested-by: Nir Soffer <nsoffer@redhat.com>
Suggested-by: Vladimir Sementsov-Ogievskiy <v.sementsov-og@ya.ru>
---
 docs/interop/nbd.txt                       |   1 +
 docs/tools/qemu-nbd.rst                    |  14 +-
 qapi/block-export.json                     |  19 ++-
 include/block/nbd.h                        |   3 +-
 blockdev-nbd.c                             |   5 +
 nbd/server.c                               |  27 +++-
 qemu-nbd.c                                 |  20 ++-
 MAINTAINERS                                |   1 +
 tests/qemu-iotests/tests/nbd-multiconn     | 157 +++++++++++++++++++++
 tests/qemu-iotests/tests/nbd-multiconn.out |   5 +
 10 files changed, 240 insertions(+), 12 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/nbd-multiconn
 create mode 100644 tests/qemu-iotests/tests/nbd-multiconn.out
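As a quick way to exercise the new qemu-nbd option outside the iotest, something along these lines should work; the socket path, image, and client count are placeholders, and this is only a sketch, not part of the patch:

```python
import subprocess
import time
import nbd

sock = '/tmp/nbd-multiconn.sock'
# Serve a writable export to up to 4 clients and explicitly opt in to
# multi-conn advertisement with the new -m option.
server = subprocess.Popen(['qemu-nbd', '-e', '4', '-m', 'on',
                           '-k', sock, '/var/tmp/test.qcow2'])
time.sleep(1)   # crude wait for the socket; a retry loop would be nicer

h = nbd.NBD()
h.connect_uri(f'nbd+unix:///?socket={sock}')
print('can_multi_conn:', h.can_multi_conn())   # expected: True
h.shutdown()
server.terminate()
```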