Message ID: 20220215171838.2651387-1-eblake@redhat.com
State: New
Series: [v2] nbd/server: Allow MULTI_CONN for shared writable exports
On Tue, Feb 15, 2022 at 7:22 PM Eric Blake <eblake@redhat.com> wrote: > According to the NBD spec, a server advertising > NBD_FLAG_CAN_MULTI_CONN promises that multiple client connections will > not see any cache inconsistencies: when properly separated by a single > flush, actions performed by one client will be visible to another > client, regardless of which client did the flush. We satisfy these > conditions in qemu when our block layer is backed by the local > filesystem (by virtue of the semantics of fdatasync(), and the fact > that qemu itself is not buffering writes beyond flushes). It is > harder to state whether we satisfy these conditions for network-based > protocols, so the safest course of action is to allow users to opt-in > to advertising multi-conn. We may later tweak defaults to advertise > by default when the block layer can confirm that the underlying > protocol driver is cache consistent between multiple writers, but for > now, this at least allows savvy users (such as virt-v2v or nbdcopy) to > explicitly start qemu-nbd or qemu-storage-daemon with multi-conn > advertisement in a known-safe setup where the client end can then > benefit from parallel clients. It makes sense, and will be used by oVirt. Actually we have already been using multiple connections for writing for about 2 years, based on your promise that if every client writes to district areas this is safe. > Note, however, that we don't want to advertise MULTI_CONN when we know > that a second client cannot connect (for historical reasons, qemu-nbd > defaults to a single connection while nbd-server-add and QMP commands > default to unlimited connections; but we already have existing means > to let either style of NBD server creation alter those defaults). The > harder part of this patch is setting up an iotest to demonstrate > behavior of multiple NBD clients to a single server. It might be > possible with parallel qemu-io processes, but concisely managing that > in shell is painful. 
I found it easier to do by relying on the libnbd > project's nbdsh, which means this test will be skipped on platforms > where that is not available. > > Signed-off-by: Eric Blake <eblake@redhat.com> > Fixes: https://bugzilla.redhat.com/1708300 > --- > > v1 was in Aug 2021 [1], with further replies in Sep [2] and Oct [3]. > > [1] https://lists.gnu.org/archive/html/qemu-devel/2021-08/msg04900.html > [2] https://lists.gnu.org/archive/html/qemu-devel/2021-09/msg00038.html > [3] https://lists.gnu.org/archive/html/qemu-devel/2021-10/msg06744.html > > Since then, I've tweaked the QAPI to mention 7.0 (instead of 6.2), and > reworked the logic so that default behavior is unchanged for now > (advertising multi-conn on a writable export requires opt-in during > the command line or QMP, but remains default for a readonly export). > I've also expanded the amount of testing done in the new iotest. > > docs/interop/nbd.txt | 1 + > docs/tools/qemu-nbd.rst | 3 +- > qapi/block-export.json | 34 +++- > include/block/nbd.h | 3 +- > blockdev-nbd.c | 5 + > nbd/server.c | 27 ++- > MAINTAINERS | 1 + > tests/qemu-iotests/tests/nbd-multiconn | 188 +++++++++++++++++++++ > tests/qemu-iotests/tests/nbd-multiconn.out | 112 ++++++++++++ > 9 files changed, 363 insertions(+), 11 deletions(-) > create mode 100755 tests/qemu-iotests/tests/nbd-multiconn > create mode 100644 tests/qemu-iotests/tests/nbd-multiconn.out > > diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt > index bdb0f2a41aca..6c99070b99c8 100644 > --- a/docs/interop/nbd.txt > +++ b/docs/interop/nbd.txt > @@ -68,3 +68,4 @@ NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", > NBD_CMD_CACHE > * 4.2: NBD_FLAG_CAN_MULTI_CONN for shareable read-only exports, > NBD_CMD_FLAG_FAST_ZERO > * 5.2: NBD_CMD_BLOCK_STATUS for "qemu:allocation-depth" > +* 7.0: NBD_FLAG_CAN_MULTI_CONN for shareable writable exports > diff --git a/docs/tools/qemu-nbd.rst b/docs/tools/qemu-nbd.rst > index 6031f9689312..1de785524c36 100644 > --- 
a/docs/tools/qemu-nbd.rst > +++ b/docs/tools/qemu-nbd.rst > @@ -139,8 +139,7 @@ driver options if ``--image-opts`` is specified. > .. option:: -e, --shared=NUM > > Allow up to *NUM* clients to share the device (default > - ``1``), 0 for unlimited. Safe for readers, but for now, > - consistency is not guaranteed between multiple writers. > + ``1``), 0 for unlimited. > Removing the note means that now consistency is guaranteed between multiple writers, no? Or maybe we want to mention here that consistency depends on the protocol and users can opt in, or refer to the section where this is discussed? .. option:: -t, --persistent > > diff --git a/qapi/block-export.json b/qapi/block-export.json > index f183522d0d2c..0a27e8ee84f9 100644 > --- a/qapi/block-export.json > +++ b/qapi/block-export.json > @@ -21,7 +21,9 @@ > # recreated on the fly while the NBD server is active. > # If missing, it will default to denying access (since 4.0). > # @max-connections: The maximum number of connections to allow at the same > -# time, 0 for unlimited. (since 5.2; default: 0) > +# time, 0 for unlimited. Setting this to 1 also stops > +# the server from advertising multiple client support > +# (since 5.2; default: 0) > # > # Since: 4.2 > ## > @@ -50,7 +52,9 @@ > # recreated on the fly while the NBD server is active. > # If missing, it will default to denying access (since 4.0). > # @max-connections: The maximum number of connections to allow at the same > -# time, 0 for unlimited. (since 5.2; default: 0) > +# time, 0 for unlimited. Setting this to 1 also stops > +# the server from advertising multiple client support > +# (since 5.2; default: 0). > # > # Returns: error if the server is already running. > # > @@ -79,6 +83,26 @@ > { 'struct': 'BlockExportOptionsNbdBase', > 'data': { '*name': 'str', '*description': 'str' } } > > +## > +# @NbdExportMultiConn: > +# > +# Possible settings for advertising NBD multiple client support. > +# > +# @off: Do not advertise multiple clients. 
> +# > +# @on: Allow multiple clients (for writable clients, this is only safe > +# if the underlying BDS is cache-consistent, such as when backed > +# by the raw file driver); ignored if the NBD server was set up > +# with max-connections of 1. > +# > +# @auto: Behaves like @off if the export is writable, and @on if the > +# export is read-only. > +# > +# Since: 7.0 > +## > +{ 'enum': 'NbdExportMultiConn', > + 'data': ['off', 'on', 'auto'] } > Are we going to have --multi-con=(on|off|auto)? > + > ## > # @BlockExportOptionsNbd: > # > @@ -95,11 +119,15 @@ > # the metadata context name "qemu:allocation-depth" to > # inspect allocation details. (since 5.2) > # > +# @multi-conn: Controls whether multiple client support is advertised, if > the > +# server's max-connections is not 1. (default auto, since > 7.0) > +# > # Since: 5.2 > ## > { 'struct': 'BlockExportOptionsNbd', > 'base': 'BlockExportOptionsNbdBase', > - 'data': { '*bitmaps': ['str'], '*allocation-depth': 'bool' } } > + 'data': { '*bitmaps': ['str'], '*allocation-depth': 'bool', > + '*multi-conn': 'NbdExportMultiConn' } } > > ## > # @BlockExportOptionsVhostUserBlk: > diff --git a/include/block/nbd.h b/include/block/nbd.h > index 78d101b77488..138b9857dbcb 100644 > --- a/include/block/nbd.h > +++ b/include/block/nbd.h > @@ -1,5 +1,5 @@ > /* > - * Copyright (C) 2016-2020 Red Hat, Inc. > + * Copyright (C) 2016-2022 Red Hat, Inc. 
> * Copyright (C) 2005 Anthony Liguori <anthony@codemonkey.ws> > * > * Network Block Device > @@ -346,6 +346,7 @@ void nbd_client_put(NBDClient *client); > > void nbd_server_is_qemu_nbd(bool value); > bool nbd_server_is_running(void); > +int nbd_server_max_connections(void); > void nbd_server_start(SocketAddress *addr, const char *tls_creds, > const char *tls_authz, uint32_t max_connections, > Error **errp); > diff --git a/blockdev-nbd.c b/blockdev-nbd.c > index bdfa7ed3a5a9..0df222e868d8 100644 > --- a/blockdev-nbd.c > +++ b/blockdev-nbd.c > @@ -44,6 +44,11 @@ bool nbd_server_is_running(void) > return nbd_server || is_qemu_nbd; > } > > +int nbd_server_max_connections(void) > +{ > + return nbd_server ? nbd_server->max_connections : -1; > +} > -1 is a little bit strange for a limit, maybe 1 is a better default when we nbd_server == NULL? When can this happen? > + > static void nbd_blockdev_client_closed(NBDClient *client, bool ignored) > { > nbd_client_put(client); > diff --git a/nbd/server.c b/nbd/server.c > index 9fb2f264023e..f17aacc8139f 100644 > --- a/nbd/server.c > +++ b/nbd/server.c > @@ -1,5 +1,5 @@ > /* > - * Copyright (C) 2016-2021 Red Hat, Inc. > + * Copyright (C) 2016-2022 Red Hat, Inc. > * Copyright (C) 2005 Anthony Liguori <anthony@codemonkey.ws> > * > * Network Block Device Server Side > @@ -1641,7 +1641,7 @@ static int nbd_export_create(BlockExport *blk_exp, > BlockExportOptions *exp_args, > int64_t size; > uint64_t perm, shared_perm; > bool readonly = !exp_args->writable; > - bool shared = !exp_args->writable; > + bool multi_conn; > strList *bitmaps; > size_t i; > int ret; > @@ -1679,6 +1679,23 @@ static int nbd_export_create(BlockExport *blk_exp, > BlockExportOptions *exp_args, > return size; > } > > + /* > + * Determine whether to advertise multi-conn. Default is auto, > + * which resolves to on for read-only and off for writable. But > + * if the server has max-connections 1, that forces the flag off. 
> + */ Looks good, this can be enabled automatically based on the driver if we want, so users using auto can upgrade to multi-con automatically. > + if (!arg->has_multi_conn) { > + arg->multi_conn = NBD_EXPORT_MULTI_CONN_AUTO; > + } > + if (nbd_server_max_connections() == 1) { > + arg->multi_conn = NBD_EXPORT_MULTI_CONN_OFF; > + } > + if (arg->multi_conn == NBD_EXPORT_MULTI_CONN_AUTO) { > + multi_conn = readonly; > + } else { > + multi_conn = arg->multi_conn == NBD_EXPORT_MULTI_CONN_ON; > + } This part is a little bit confusing - we do: - initialize arg->multi_conn if it has no value - set the temporary multi_conn based on the now-initialized arg->multi_conn I think it will be nicer to separate argument parsing, so there is no need to initialize it here or have arg->has_multi_conn, but I did not check how other arguments are handled. > + > /* Don't allow resize while the NBD server is running, otherwise we > don't > * care what happens with the node. */ > blk_get_perm(blk, &perm, &shared_perm); > @@ -1692,11 +1709,11 @@ static int nbd_export_create(BlockExport *blk_exp, > BlockExportOptions *exp_args, > exp->description = g_strdup(arg->description); > exp->nbdflags = (NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_FLUSH | > NBD_FLAG_SEND_FUA | NBD_FLAG_SEND_CACHE); > + if (multi_conn) { > + exp->nbdflags |= NBD_FLAG_CAN_MULTI_CONN; > + } > if (readonly) { > exp->nbdflags |= NBD_FLAG_READ_ONLY; > - if (shared) { > - exp->nbdflags |= NBD_FLAG_CAN_MULTI_CONN; > - } > } else { > exp->nbdflags |= (NBD_FLAG_SEND_TRIM | NBD_FLAG_SEND_WRITE_ZEROES > | > NBD_FLAG_SEND_FAST_ZERO); > diff --git a/MAINTAINERS > b/MAINTAINERS > index 9814580975c5..24dd800ae3f8 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -3334,6 +3334,7 @@ F: qemu-nbd.* > F: blockdev-nbd.c > F: docs/interop/nbd.txt > F: docs/tools/qemu-nbd.rst > +F: tests/qemu-iotests/tests/*nbd* > T: git https://repo.or.cz/qemu/ericb.git nbd > T: git https://src.openvz.org/scm/~vsementsov/qemu.git nbd > > diff --git 
a/tests/qemu-iotests/tests/nbd-multiconn > b/tests/qemu-iotests/tests/nbd-multiconn > new file mode 100755 > index 000000000000..0261552f32f2 > --- /dev/null > +++ b/tests/qemu-iotests/tests/nbd-multiconn > @@ -0,0 +1,188 @@ > +#!/usr/bin/env bash > +# group: rw auto quick > +# > +# Test that qemu-nbd MULTI_CONN works > +# > +# Copyright (C) 2018-2022 Red Hat, Inc. > +# > +# This program is free software; you can redistribute it and/or modify > +# it under the terms of the GNU General Public License as published by > +# the Free Software Foundation; either version 2 of the License, or > +# (at your option) any later version. > +# > +# This program is distributed in the hope that it will be useful, > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > +# GNU General Public License for more details. > +# > +# You should have received a copy of the GNU General Public License > +# along with this program. If not, see <http://www.gnu.org/licenses/>. > +# > + > +seq="$(basename $0)" > +echo "QA output created by $seq" > + > +status=1 # failure is the default! > + > +_cleanup() > +{ > + _cleanup_test_img > + _cleanup_qemu > + rm -f "$SOCK_DIR/nbd" > +} > +trap "_cleanup; exit \$status" 0 1 2 3 15 > + > +# get standard environment, filters and checks > +cd .. > +. ./common.rc > +. ./common.filter > +. ./common.qemu > +. ./common.nbd > + > +_supported_fmt qcow2 > +_supported_proto file > +_supported_os Linux > +_require_command QEMU_NBD > + > +# Parallel client connections are easier with libnbd than qemu-io: > +if ! 
(nbdsh --version) >/dev/null 2>&1; then > + _notrun "nbdsh utility required, skipped this test" > +fi > + > +echo > +echo "=== Initial image setup ===" > +echo > + > +_make_test_img 4M > +$QEMU_IO -c 'w -P 1 0 2M' -c 'w -P 2 2M 2M' "$TEST_IMG" | _filter_qemu_io > +_launch_qemu 2> >(_filter_nbd) > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"qmp_capabilities"}' "return" > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"blockdev-add", > + "arguments":{"driver":"qcow2", "node-name":"n", > + "file":{"driver":"file", "filename":"'"$TEST_IMG"'"}}}' "return" > +export nbd_unix_socket > + > +echo > +echo "=== Default nbd server settings ===" > +echo > +# Default allows for unlimited connections, readonly images advertise > +# multi-conn, and writable images do not > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-start", > + "arguments":{"addr":{"type":"unix", > + "data":{"path":"'"$nbd_unix_socket"'"}}}}' "return" > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-add", > + "arguments":{"type":"nbd", "id":"r", "node-name":"n", > + "name":"r"}}' "return" > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-add", > + "arguments":{"type":"nbd", "id":"w", "node-name":"n", > + "name":"w", "writable":true}}' "return" > +nbdsh -u "nbd+unix:///r?socket=$nbd_unix_socket" -c ' > +assert h.can_multi_conn() > +h.shutdown() > +print("nbdsh passed")' > +nbdsh -u "nbd+unix:///w?socket=$nbd_unix_socket" -c ' > +assert not h.can_multi_conn() > +h.shutdown() > +print("nbdsh passed")' > Mixing of shell and python is very confusing. Wouldn't it be much cleaner to write the test in python? 
> +_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-del", > + "arguments":{"id":"r"}}' "DELETED" > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-del", > + "arguments":{"id":"w"}}' "DELETED" > + > +echo > +echo "=== Per-export overrides of defaults ===" > +echo > +# Can explicitly disable multi-conn for readonly image, and explicitly > +# enable it for writable image > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-add", > + "arguments":{"type":"nbd", "id":"r", "node-name":"n", > + "name":"r", "multi-conn":"off"}}' "return" > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-add", > + "arguments":{"type":"nbd", "id":"w", "node-name":"n", > + "name":"w", "multi-conn":"on", "writable":true}}' "return" > +nbdsh -u "nbd+unix:///r?socket=$nbd_unix_socket" -c ' > +assert not h.can_multi_conn() > +h.shutdown() > +print("nbdsh passed")' > +nbdsh -u "nbd+unix:///w?socket=$nbd_unix_socket" -c ' > +assert h.can_multi_conn() > +h.shutdown() > +print("nbdsh passed")' > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-del", > + "arguments":{"id":"r"}}' "DELETED" > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-del", > + "arguments":{"id":"w"}}' "DELETED" > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-stop"}' "return" > + > +echo > +echo "=== Limit nbd server max-connections ===" > +echo > +# When max-connections is 1, no images advertise multi-conn, even when > +# explicitly requested per export > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-start", > + "arguments":{"max-connections":1, "addr":{"type":"unix", > + "data":{"path":"'"$nbd_unix_socket"'"}}}}' "return" > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-add", > + "arguments":{"type":"nbd", "id":"r", "node-name":"n", > + "name":"r", "multi-conn":"on"}}' "return" > +nbdsh -u "nbd+unix:///r?socket=$nbd_unix_socket" -c ' > +assert not h.can_multi_conn() > +h.shutdown() > +print("nbdsh passed")' > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-del", > + 
"arguments":{"id":"r"}}' "DELETED" > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-add", > + "arguments":{"type":"nbd", "id":"w", "node-name":"n", > + "name":"w", "multi-conn":"on", "writable":true}}' "return" > +nbdsh -u "nbd+unix:///w?socket=$nbd_unix_socket" -c ' > +assert not h.can_multi_conn() > +h.shutdown() > +print("nbdsh passed")' > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-del", > + "arguments":{"id":"w"}}' "DELETED" > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-stop"}' "return" > + > +echo > +echo "=== Demonstrate parallel writers ===" > +echo > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-start", > + "arguments":{"addr":{"type":"unix", > + "data":{"path":"'"$nbd_unix_socket"'"}}}}' "return" > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-add", > + "arguments":{"type":"nbd", "id":"w", "node-name":"n", > + "name":"", "multi-conn":"on", "writable":true}}' "return" > + > +nbdsh -c ' > +import os > +sock = os.getenv("nbd_unix_socket") > +h = [] > + > +for i in range(3): > + h.append(nbd.NBD()) > + h[i].connect_unix(sock) > + assert h[i].can_multi_conn() > This is somewhat C in python, maybe: handles = [nbd.NBD() for _ in range(3)] for h in handles: h.connect_unix(sock) assert h.can_multi_con() Since assert will abort the test, and we don't handle exceptions, failure will not shutdown the connections but it should not matter for the purpose of a test. > + > +buf1 = h[0].pread(1024 * 1024, 0) > +if buf1 != b"\x01" * 1024 * 1024: > + print("Unexpected initial read") > Not clear when we initialize the buffer to \x01 - is this the qemu-io above? > +buf2 = b"\x03" * 1024 * 1024 > +h[1].pwrite(buf2, 0) > +h[2].flush() > +buf1 = h[0].pread(1024 * 1024, 0) > +if buf1 == buf2: > + print("Flush appears to be consistent across connections") > buf1 was the initial content, buf2 is the new content, but now we override but1 to check that the right was flushed. 
It will be better to use different names like initial_data, updated_data, current_data. This block is the most important part of the test, showing that we can write in one connection, flush in the second, and read the written block in the third. Maybe add a comment about this? I think it will help someone else trying to understand what this part is trying to test. > + > +for i in range(3): > + h[i].shutdown() > for h in handles: h.shutdown() > +' > + > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-del", > + "arguments":{"id":"w"}}' "DELETED" > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-stop"}' "return" > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"quit"}' "return" > +wait=yes _cleanup_qemu > + > +# success, all done > +echo '*** done' > +rm -f $seq.full > +status=0 > diff --git a/tests/qemu-iotests/tests/nbd-multiconn.out > b/tests/qemu-iotests/tests/nbd-multiconn.out > new file mode 100644 > index 000000000000..a06428e1536a > --- /dev/null > +++ b/tests/qemu-iotests/tests/nbd-multiconn.out > @@ -0,0 +1,112 @@ > +QA output created by nbd-multiconn > + > +=== Initial image setup === > + > +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=4194304 > +wrote 2097152/2097152 bytes at offset 0 > +2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) > +wrote 2097152/2097152 bytes at offset 2097152 > +2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) > +{"execute":"qmp_capabilities"} > +{"return": {}} > +{"execute":"blockdev-add", > + "arguments":{"driver":"IMGFMT", "node-name":"n", > + "file":{"driver":"file", "filename":"TEST_DIR/t.IMGFMT"}}} > +{"return": {}} > + > +=== Default nbd server settings === > + > +{"execute":"nbd-server-start", > + "arguments":{"addr":{"type":"unix", > + "data":{"path":"SOCK_DIR/qemu-nbd.sock"}}}} > +{"return": {}} > +{"execute":"block-export-add", > + "arguments":{"type":"nbd", "id":"r", "node-name":"n", > + "name":"r"}} > +{"return": {}} > +{"execute":"block-export-add", > + "arguments":{"type":"nbd", "id":"w", 
"node-name":"n", > + "name":"w", "writable":true}} > +{"return": {}} > +nbdsh passed > +nbdsh passed > +{"execute":"block-export-del", > + "arguments":{"id":"r"}} > +{"return": {}} > +{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, > "event": "BLOCK_EXPORT_DELETED", "data": {"id": "r"}} > +{"execute":"block-export-del", > + "arguments":{"id":"w"}} > +{"return": {}} > +{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, > "event": "BLOCK_EXPORT_DELETED", "data": {"id": "w"}} > + > +=== Per-export overrides of defaults === > + > +{"execute":"block-export-add", > + "arguments":{"type":"nbd", "id":"r", "node-name":"n", > + "name":"r", "multi-conn":"off"}} > +{"return": {}} > +{"execute":"block-export-add", > + "arguments":{"type":"nbd", "id":"w", "node-name":"n", > + "name":"w", "multi-conn":"on", "writable":true}} > +{"return": {}} > +nbdsh passed > +nbdsh passed > +{"execute":"block-export-del", > + "arguments":{"id":"r"}} > +{"return": {}} > +{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, > "event": "BLOCK_EXPORT_DELETED", "data": {"id": "r"}} > +{"execute":"block-export-del", > + "arguments":{"id":"w"}} > +{"return": {}} > +{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, > "event": "BLOCK_EXPORT_DELETED", "data": {"id": "w"}} > +{"execute":"nbd-server-stop"} > +{"return": {}} > + > +=== Limit nbd server max-connections === > + > +{"execute":"nbd-server-start", > + "arguments":{"max-connections":1, "addr":{"type":"unix", > + "data":{"path":"SOCK_DIR/qemu-nbd.sock"}}}} > +{"return": {}} > +{"execute":"block-export-add", > + "arguments":{"type":"nbd", "id":"r", "node-name":"n", > + "name":"r", "multi-conn":"on"}} > +{"return": {}} > +nbdsh passed > +{"execute":"block-export-del", > + "arguments":{"id":"r"}} > +{"return": {}} > +{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, > "event": "BLOCK_EXPORT_DELETED", "data": {"id": "r"}} > +{"execute":"block-export-add", > + 
"arguments":{"type":"nbd", "id":"w", "node-name":"n", > + "name":"w", "multi-conn":"on", "writable":true}} > +{"return": {}} > +nbdsh passed > +{"execute":"block-export-del", > + "arguments":{"id":"w"}} > +{"return": {}} > +{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, > "event": "BLOCK_EXPORT_DELETED", "data": {"id": "w"}} > +{"execute":"nbd-server-stop"} > +{"return": {}} > + > +=== Demonstrate parallel writers === > + > +{"execute":"nbd-server-start", > + "arguments":{"addr":{"type":"unix", > + "data":{"path":"SOCK_DIR/qemu-nbd.sock"}}}} > +{"return": {}} > +{"execute":"block-export-add", > + "arguments":{"type":"nbd", "id":"w", "node-name":"n", > + "name":"", "multi-conn":"on", "writable":true}} > +{"return": {}} > +Flush appears to be consistent across connections > +{"execute":"block-export-del", > + "arguments":{"id":"w"}} > +{"return": {}} > +{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, > "event": "BLOCK_EXPORT_DELETED", "data": {"id": "w"}} > +{"execute":"nbd-server-stop"} > +{"return": {}} > +{"execute":"quit"} > +{"return": {}} > +{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, > "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}} > Nothing about the contents here says anything about the actual test except the "Flush appears..." line. > +*** done > -- > 2.35.1 > Looks good, feel free to ignore the style comments and suggestion to rewrite the test in python. Nir
15.02.2022 20:18, Eric Blake wrote: > According to the NBD spec, a server advertising > NBD_FLAG_CAN_MULTI_CONN promises that multiple client connections will > not see any cache inconsistencies: when properly separated by a single > flush, actions performed by one client will be visible to another > client, regardless of which client did the flush. We satisfy these > conditions in qemu when our block layer is backed by the local > filesystem (by virtue of the semantics of fdatasync(), and the fact > that qemu itself is not buffering writes beyond flushes). It is > harder to state whether we satisfy these conditions for network-based > protocols, so the safest course of action is to allow users to opt-in > to advertising multi-conn. We may later tweak defaults to advertise > by default when the block layer can confirm that the underlying > protocol driver is cache consistent between multiple writers, but for > now, this at least allows savvy users (such as virt-v2v or nbdcopy) to > explicitly start qemu-nbd or qemu-storage-daemon with multi-conn > advertisement in a known-safe setup where the client end can then > benefit from parallel clients. > > Note, however, that we don't want to advertise MULTI_CONN when we know Here we change existing default behavior. Pre-patch MULTI_CONN is still advertised for readonly export even when connections number is limited to 1. Still may be it's a good change? Let's then note it here.. Could it break some existing clients? Hard to imagine. Otherwise patch looks good to me, nonsignificant notes below. > that a second client cannot connect (for historical reasons, qemu-nbd > defaults to a single connection while nbd-server-add and QMP commands > default to unlimited connections; but we already have existing means > to let either style of NBD server creation alter those defaults). The > harder part of this patch is setting up an iotest to demonstrate > behavior of multiple NBD clients to a single server. 
It might be > possible with parallel qemu-io processes, but concisely managing that > in shell is painful. I found it easier to do by relying on the libnbd > project's nbdsh, which means this test will be skipped on platforms > where that is not available. > > Signed-off-by: Eric Blake <eblake@redhat.com> > Fixes: https://bugzilla.redhat.com/1708300 > --- > [..] > +## > +# @NbdExportMultiConn: > +# > +# Possible settings for advertising NBD multiple client support. > +# > +# @off: Do not advertise multiple clients. > +# > +# @on: Allow multiple clients (for writable clients, this is only safe > +# if the underlying BDS is cache-consistent, such as when backed > +# by the raw file driver); ignored if the NBD server was set up > +# with max-connections of 1. > +# > +# @auto: Behaves like @off if the export is writable, and @on if the > +# export is read-only. > +# > +# Since: 7.0 > +## > +{ 'enum': 'NbdExportMultiConn', > + 'data': ['off', 'on', 'auto'] } Probably we should use generic OnOffAuto type that we already have.. But in your way documentation looks better. > + > ## > # @BlockExportOptionsNbd: > # [..] > + > +_make_test_img 4M > +$QEMU_IO -c 'w -P 1 0 2M' -c 'w -P 2 2M 2M' "$TEST_IMG" | _filter_qemu_io > +_launch_qemu 2> >(_filter_nbd) > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"qmp_capabilities"}' "return" > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"blockdev-add", > + "arguments":{"driver":"qcow2", "node-name":"n", > + "file":{"driver":"file", "filename":"'"$TEST_IMG"'"}}}' "return" > +export nbd_unix_socket > + [..] > +nbdsh -c ' > +import os > +sock = os.getenv("nbd_unix_socket") Just interested why you pass it through environment and no simply do '... sock = "'"$nbd_unix_socket"'"... ' > +h = [] > + > +for i in range(3): > + h.append(nbd.NBD()) Looks like Python:) And if something looks like Python, it's Python, we know. Should I say about PEP8 that recommends 4 whitespaces?) 
> + h[i].connect_unix(sock) > + assert h[i].can_multi_conn() > + > +buf1 = h[0].pread(1024 * 1024, 0) > +if buf1 != b"\x01" * 1024 * 1024: > + print("Unexpected initial read") > +buf2 = b"\x03" * 1024 * 1024 > +h[1].pwrite(buf2, 0) > +h[2].flush() > +buf1 = h[0].pread(1024 * 1024, 0) > +if buf1 == buf2: > + print("Flush appears to be consistent across connections") > + > +for i in range(3): > + h[i].shutdown() > +'
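[Editorial note: Vladimir's suggestion to reuse the generic OnOffAuto type would look roughly like this in qapi/block-export.json — a sketch only, assuming the OnOffAuto enum that qapi/common.json already defines, and trading the dedicated NbdExportMultiConn type's per-value documentation for a longer @multi-conn member description.]

```
# Hypothetical alternative: drop the NbdExportMultiConn enum and use
# the existing OnOffAuto type from qapi/common.json instead.
{ 'struct': 'BlockExportOptionsNbd',
  'base': 'BlockExportOptionsNbdBase',
  'data': { '*bitmaps': ['str'], '*allocation-depth': 'bool',
            '*multi-conn': 'OnOffAuto' } }
```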
On Tue, Feb 15, 2022 at 09:23:36PM +0200, Nir Soffer wrote: > On Tue, Feb 15, 2022 at 7:22 PM Eric Blake <eblake@redhat.com> wrote: > > > According to the NBD spec, a server advertising > > NBD_FLAG_CAN_MULTI_CONN promises that multiple client connections will > > not see any cache inconsistencies: when properly separated by a single > > flush, actions performed by one client will be visible to another > > client, regardless of which client did the flush. We satisfy these > > conditions in qemu when our block layer is backed by the local > > filesystem (by virtue of the semantics of fdatasync(), and the fact > > that qemu itself is not buffering writes beyond flushes). It is > > harder to state whether we satisfy these conditions for network-based > > protocols, so the safest course of action is to allow users to opt-in > > to advertising multi-conn. We may later tweak defaults to advertise > > by default when the block layer can confirm that the underlying > > protocol driver is cache consistent between multiple writers, but for > > now, this at least allows savvy users (such as virt-v2v or nbdcopy) to > > explicitly start qemu-nbd or qemu-storage-daemon with multi-conn > > advertisement in a known-safe setup where the client end can then > > benefit from parallel clients. > > > > It makes sense, and will be used by oVirt. Actually we are already using > multiple connections for writing about 2 years, based on your promise > that if every client writes to district areas this is safe. I presume s/district/distinct/, but yes, I'm glad we're finally trying to make the code match existing practice ;) > > +++ b/docs/tools/qemu-nbd.rst > > @@ -139,8 +139,7 @@ driver options if ``--image-opts`` is specified. > > .. option:: -e, --shared=NUM > > > > Allow up to *NUM* clients to share the device (default > > - ``1``), 0 for unlimited. Safe for readers, but for now, > > - consistency is not guaranteed between multiple writers. > > + ``1``), 0 for unlimited. 
> > > > Removing the note means that now consistency is guaranteed between > multiple writers, no? > > Or maybe we want to mention here that consistency depends on the protocol > and users can opt in, or refer to the section where this is discussed? Yeah, a link to the QAPI docs where multi-conn is documented might be nice, except I'm not sure the best way to do that in our sphinx documentation setup. > > +## > > +# @NbdExportMultiConn: > > +# > > +# Possible settings for advertising NBD multiple client support. > > +# > > +# @off: Do not advertise multiple clients. > > +# > > +# @on: Allow multiple clients (for writable clients, this is only safe > > +# if the underlying BDS is cache-consistent, such as when backed > > +# by the raw file driver); ignored if the NBD server was set up > > +# with max-connections of 1. > > +# > > +# @auto: Behaves like @off if the export is writable, and @on if the > > +# export is read-only. > > +# > > +# Since: 7.0 > > +## > > +{ 'enum': 'NbdExportMultiConn', > > + 'data': ['off', 'on', 'auto'] } > > > > Are we going to have --multi-con=(on|off|auto)? Oh. The QMP command (which is immediately visible through nbd-server-add/block-storage-add to qemu and qemu-storage-daemon) gains "multi-conn":"on", but you may be right that qemu-nbd would want a command line option (either that, or we accellerate our plans that qsd should replace qemu-nbd). > > +++ b/blockdev-nbd.c > > @@ -44,6 +44,11 @@ bool nbd_server_is_running(void) > > return nbd_server || is_qemu_nbd; > > } > > > > +int nbd_server_max_connections(void) > > +{ > > + return nbd_server ? nbd_server->max_connections : -1; > > +} > > > > -1 is a little bit strange for a limit, maybe 1 is a better default when > we nbd_server == NULL? When can this happen? In qemu, if you haven't used the QMP command 'nbd-server-start' yet. In qemu-nbd, always (per the nbd_server_is_running function just above). 
My iotest only covered the qemu/qsd side, not the qemu-nbd side, so it looks like I need a v3... > > +++ b/nbd/server.c > > + /* > > + * Determine whether to advertise multi-conn. Default is auto, > > + * which resolves to on for read-only and off for writable. But > > + * if the server has max-connections 1, that forces the flag off. > > > > Looks good, this can be enabled automatically based on the driver > if we want, so users using auto will can upgrade to multi-con automatically. Yes, that's part of why I made it a tri-state with a default of 'auto'. > > > > + */ > > + if (!arg->has_multi_conn) { > > + arg->multi_conn = NBD_EXPORT_MULTI_CONN_AUTO; > > + } > > + if (nbd_server_max_connections() == 1) { > > + arg->multi_conn = NBD_EXPORT_MULTI_CONN_OFF; > > + } > > + if (arg->multi_conn == NBD_EXPORT_MULTI_CONN_AUTO) { > > + multi_conn = readonly; > > + } else { > > + multi_conn = arg->multi_conn == NBD_EXPORT_MULTI_CONN_ON; > > + } > > > > This part is a little bit confusing - we do: > - initialize args->multi_con if it has not value > - set the temporary multi_con based now initialized args->multi_con > > I think it will be nicer to separate arguments parsing, so there is no need > to initialize it here or have arg->has_multi_conn, but I did not check how > other arguments are handled. arg->has_multi_conn is fallout from the fact that our QAPI must remain back-compat. If it is false, the user did not pass "multi-conn":..., so we want the default value of "auto". Once we've established the default, then we force multi-conn off if we know we are limited to one client (although as you pointed out, nbd_server_max_connections() needs to be tested with qemu-nbd). Then, it's easier to resolve the tri-state enum variable into the bool of what we actually advertise to the NBD client. 
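The resolution order discussed here (explicit setting first, then the max-connections clamp, then "auto" falling back to the read-only status) can be modeled in a few lines of Python. This is only a sketch of the logic with illustrative names, not qemu's actual code:

```python
def resolve_multi_conn(multi_conn, readonly, max_connections):
    """Model of the tri-state resolution described above.

    multi_conn: 'on', 'off', 'auto', or None (user passed nothing)
    readonly: whether the export is read-only
    max_connections: 0 means unlimited; -1 models "no QMP server yet"
    Returns the bool actually advertised to NBD clients.
    """
    if multi_conn is None:          # no "multi-conn":... given -> default
        multi_conn = 'auto'
    if max_connections == 1:        # a second client can never connect
        multi_conn = 'off'
    if multi_conn == 'auto':        # auto: on for read-only, off for writable
        return readonly
    return multi_conn == 'on'
```

With this ordering, even an explicit "on" is suppressed when the server is limited to a single connection, matching the C snippet quoted above.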
> > +++ b/tests/qemu-iotests/tests/nbd-multiconn > > @@ -0,0 +1,188 @@ > > +#!/usr/bin/env bash > > +# group: rw auto quick > > +# > > +# Test that qemu-nbd MULTI_CONN works > > +# > > +echo > > +echo "=== Initial image setup ===" > > +echo > > + > > +_make_test_img 4M > > +$QEMU_IO -c 'w -P 1 0 2M' -c 'w -P 2 2M 2M' "$TEST_IMG" | _filter_qemu_io > > +_launch_qemu 2> >(_filter_nbd) > > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"qmp_capabilities"}' "return" > > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"blockdev-add", > > + "arguments":{"driver":"qcow2", "node-name":"n", > > + "file":{"driver":"file", "filename":"'"$TEST_IMG"'"}}}' "return" I'm not the best at writing python iotests; I welcome a language translation of this aspect. But the shell code for _launch_qemu/_send_qemu_cmd was already pretty nice for setting up a long-running background qemu process where I can pass in QMP commands at will, then interact from other processes. > > +export nbd_unix_socket > > + > > +echo > > +echo "=== Default nbd server settings ===" > > +echo > > +# Default allows for unlimited connections, readonly images advertise > > +# multi-conn, and writable images do not > > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-start", > > + "arguments":{"addr":{"type":"unix", > > + "data":{"path":"'"$nbd_unix_socket"'"}}}}' "return" > > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-add", > > + "arguments":{"type":"nbd", "id":"r", "node-name":"n", > > + "name":"r"}}' "return" > > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-add", > > + "arguments":{"type":"nbd", "id":"w", "node-name":"n", > > + "name":"w", "writable":true}}' "return" > > +nbdsh -u "nbd+unix:///r?socket=$nbd_unix_socket" -c ' > > +assert h.can_multi_conn() > > +h.shutdown() > > +print("nbdsh passed")' > > +nbdsh -u "nbd+unix:///w?socket=$nbd_unix_socket" -c ' > > +assert not h.can_multi_conn() > > +h.shutdown() > > +print("nbdsh passed")' > > > > Mixing of shell and python is very confusing. 
Wouldn't it be much cleaner > to write the test in python? Here, nbdsh -c 'python snippet' is used as a shell command line parameter. Writing python code to call out to a system() command where one of the arguments to that command is a python script snippet is going to be just as annoying as writing it in bash. > > +echo > > +echo "=== Demonstrate parallel writers ===" > > +echo > > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-start", > > + "arguments":{"addr":{"type":"unix", > > + "data":{"path":"'"$nbd_unix_socket"'"}}}}' "return" > > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-add", > > + "arguments":{"type":"nbd", "id":"w", "node-name":"n", > > + "name":"", "multi-conn":"on", "writable":true}}' "return" > > + > > +nbdsh -c ' > > +import os > > +sock = os.getenv("nbd_unix_socket") > > +h = [] > > + > > +for i in range(3): > > + h.append(nbd.NBD()) > > + h[i].connect_unix(sock) > > + assert h[i].can_multi_conn() > > > > This is somewhat C in python, maybe: > > handles = [nbd.NBD() for _ in range(3)] > > for h in handles: > h.connect_unix(sock) > assert h.can_multi_conn() > > Since assert will abort the test, and we don't handle > exceptions, failure will not shut down the connections > but it should not matter for the purpose of a test. As I said, I'm open to python suggestions :) I like your approach. > > > > + > > +buf1 = h[0].pread(1024 * 1024, 0) > > +if buf1 != b"\x01" * 1024 * 1024: > > + print("Unexpected initial read") > > > > Not clear when we initialize the buffer to \x01 - is this the qemu-io above? Yes, when the qcow2 file was initially created. > > > > +buf2 = b"\x03" * 1024 * 1024 > > +h[1].pwrite(buf2, 0) > > +h[2].flush() > > +buf1 = h[0].pread(1024 * 1024, 0) > > +if buf1 == buf2: > > + print("Flush appears to be consistent across connections") > > > > buf1 was the initial content, buf2 is the new content, but now we override > buf1 to check that the write was flushed.
It will be better to use > different > names like initial_data, updated_data, current_data. Can do. > > This block is the most important part of the test, showing that we can write > in one connection, flush in the second, and read the written block in the > third. > Maybe add a comment about this? I think it will help someone else trying > to understand what this part is trying to test. Can do. > > +{"execute":"block-export-add", > > + "arguments":{"type":"nbd", "id":"w", "node-name":"n", > > + "name":"", "multi-conn":"on", "writable":true}} > > +{"return": {}} > > +Flush appears to be consistent across connections > > +{"execute":"block-export-del", > > + "arguments":{"id":"w"}} > > +{"return": {}} > > +{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, > > "event": "BLOCK_EXPORT_DELETED", "data": {"id": "w"}} > > +{"execute":"nbd-server-stop"} > > +{"return": {}} > > +{"execute":"quit"} > > +{"return": {}} > > +{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, > > "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}} > > > > Nothing about the contents here says anything about the actual test > except the "Flush appears..." line. Yeah, it's a lot of QMP debugging (to show what commands were run in setting up the server), and less verbose on the nbdsh side. Do I need to add more output during the nbdsh that uses multiple connections? > > > > +*** done > > -- > > 2.35.1 > > > > Looks good, feel free to ignore the style comments and suggestion to rewrite > the test in python. The style comments are nice, the rewriting is going to be harder for me (but I'll accept help). At any rate, getting qemu-nbd to be feature-compatible may require a v3 anyway.
16.02.2022 02:24, Eric Blake wrote: > On Tue, Feb 15, 2022 at 09:23:36PM +0200, Nir Soffer wrote: >> On Tue, Feb 15, 2022 at 7:22 PM Eric Blake <eblake@redhat.com> wrote: >> >>> According to the NBD spec, a server advertising >>> NBD_FLAG_CAN_MULTI_CONN promises that multiple client connections will >>> not see any cache inconsistencies: when properly separated by a single >>> flush, actions performed by one client will be visible to another >>> client, regardless of which client did the flush. We satisfy these >>> conditions in qemu when our block layer is backed by the local >>> filesystem (by virtue of the semantics of fdatasync(), and the fact >>> that qemu itself is not buffering writes beyond flushes). It is >>> harder to state whether we satisfy these conditions for network-based >>> protocols, so the safest course of action is to allow users to opt-in >>> to advertising multi-conn. We may later tweak defaults to advertise >>> by default when the block layer can confirm that the underlying >>> protocol driver is cache consistent between multiple writers, but for >>> now, this at least allows savvy users (such as virt-v2v or nbdcopy) to >>> explicitly start qemu-nbd or qemu-storage-daemon with multi-conn >>> advertisement in a known-safe setup where the client end can then >>> benefit from parallel clients. >>> >> >> It makes sense, and will be used by oVirt. Actually we are already using >> multiple connections for writing about 2 years, based on your promise >> that if every client writes to district areas this is safe. > > I presume s/district/distinct/, but yes, I'm glad we're finally trying > to make the code match existing practice ;) > >>> +++ b/docs/tools/qemu-nbd.rst >>> @@ -139,8 +139,7 @@ driver options if ``--image-opts`` is specified. >>> .. option:: -e, --shared=NUM >>> >>> Allow up to *NUM* clients to share the device (default >>> - ``1``), 0 for unlimited. 
Safe for readers, but for now, >>> - consistency is not guaranteed between multiple writers. >>> + ``1``), 0 for unlimited. >>> >> >> Removing the note means that now consistency is guaranteed between >> multiple writers, no? >> >> Or maybe we want to mention here that consistency depends on the protocol >> and users can opt in, or refer to the section where this is discussed? > > Yeah, a link to the QAPI docs where multi-conn is documented might be > nice, except I'm not sure the best way to do that in our sphinx > documentation setup. > >>> +## >>> +# @NbdExportMultiConn: >>> +# >>> +# Possible settings for advertising NBD multiple client support. >>> +# >>> +# @off: Do not advertise multiple clients. >>> +# >>> +# @on: Allow multiple clients (for writable clients, this is only safe >>> +# if the underlying BDS is cache-consistent, such as when backed >>> +# by the raw file driver); ignored if the NBD server was set up >>> +# with max-connections of 1. >>> +# >>> +# @auto: Behaves like @off if the export is writable, and @on if the >>> +# export is read-only. >>> +# >>> +# Since: 7.0 >>> +## >>> +{ 'enum': 'NbdExportMultiConn', >>> + 'data': ['off', 'on', 'auto'] } >>> >> >> Are we going to have --multi-con=(on|off|auto)? > > Oh. The QMP command (which is immediately visible through > nbd-server-add/block-storage-add to qemu and qemu-storage-daemon) > gains "multi-conn":"on", but you may be right that qemu-nbd would want > a command line option (either that, or we accellerate our plans that > qsd should replace qemu-nbd). > >>> +++ b/blockdev-nbd.c >>> @@ -44,6 +44,11 @@ bool nbd_server_is_running(void) >>> return nbd_server || is_qemu_nbd; >>> } >>> >>> +int nbd_server_max_connections(void) >>> +{ >>> + return nbd_server ? nbd_server->max_connections : -1; >>> +} >>> >> >> -1 is a little bit strange for a limit, maybe 1 is a better default when >> we nbd_server == NULL? When can this happen? 
> > In qemu, if you haven't used the QMP command 'nbd-server-start' yet. > In qemu-nbd, always (per the nbd_server_is_running function just > above). My iotest only covered the qemu/qsd side, not the qemu-nbd > side, so it looks like I need a v3... > >>> +++ b/nbd/server.c > >>> + /* >>> + * Determine whether to advertise multi-conn. Default is auto, >>> + * which resolves to on for read-only and off for writable. But >>> + * if the server has max-connections 1, that forces the flag off. >>> >> >> Looks good, this can be enabled automatically based on the driver >> if we want, so users using auto will can upgrade to multi-con automatically. > > Yes, that's part of why I made it a tri-state with a default of 'auto'. > >> >> >>> + */ >>> + if (!arg->has_multi_conn) { >>> + arg->multi_conn = NBD_EXPORT_MULTI_CONN_AUTO; >>> + } >>> + if (nbd_server_max_connections() == 1) { >> >> + arg->multi_conn = NBD_EXPORT_MULTI_CONN_OFF; >>> + } >> >> + if (arg->multi_conn == NBD_EXPORT_MULTI_CONN_AUTO) { >>> + multi_conn = readonly; >>> + } else { >>> + multi_conn = arg->multi_conn == NBD_EXPORT_MULTI_CONN_ON; >>> + } >>> >> >> This part is a little bit confusing - we do: >> - initialize args->multi_con if it has not value >> - set the temporary multi_con based now initialized args->multi_con >> >> I think it will be nicer to separate arguments parsing, so there is no need >> to initialize it here or have arg->has_multi_conn, but I did not check how >> other arguments are handled. > > arg->has_multi_conn is fallout from the fact that our QAPI must remain > back-compat. If it is false, the user did not pass "multi-conn":..., > so we want the default value of "auto". Once we've established the > default, then we force multi-conn off if we know we are limited to one > client (although as you pointed out, nbd_server_max_connections() > needs to be tested with qemu-nbd). 
Then, it's easier to resolve the > tri-state enum variable into the bool of what we actually advertise to > the NBD client. > >>> +++ b/tests/qemu-iotests/tests/nbd-multiconn >>> @@ -0,0 +1,188 @@ >>> +#!/usr/bin/env bash >>> +# group: rw auto quick >>> +# >>> +# Test that qemu-nbd MULTI_CONN works >>> +# >>> +echo >>> +echo "=== Initial image setup ===" >>> +echo >>> + >>> +_make_test_img 4M >>> +$QEMU_IO -c 'w -P 1 0 2M' -c 'w -P 2 2M 2M' "$TEST_IMG" | _filter_qemu_io >>> +_launch_qemu 2> >(_filter_nbd) >>> +_send_qemu_cmd $QEMU_HANDLE '{"execute":"qmp_capabilities"}' "return" >>> +_send_qemu_cmd $QEMU_HANDLE '{"execute":"blockdev-add", >>> + "arguments":{"driver":"qcow2", "node-name":"n", >>> + "file":{"driver":"file", "filename":"'"$TEST_IMG"'"}}}' "return" > > I'm not the best at writing python iotests; I welcome a language > translation of this aspect. Let me try:)

#!/usr/bin/env python3

import os
import iotests
import nbd
from iotests import qemu_img_create, qemu_io_silent

disk = os.path.join(iotests.test_dir, 'disk')
size = '4M'
nbd_sock = os.path.join(iotests.test_dir, 'nbd_sock')
nbd_uri = 'nbd+unix:///{}?socket=' + nbd_sock


class TestNbdMulticonn(iotests.QMPTestCase):
    def setUp(self):
        assert qemu_img_create('-f', iotests.imgfmt, disk, size) == 0
        assert qemu_io_silent('-c', 'w -P 1 0 2M',
                              '-c', 'w -P 2 2M 2M', disk) == 0
        self.vm = iotests.VM()
        self.vm.launch()
        result = self.vm.qmp('blockdev-add', {
            'driver': 'qcow2',
            'node-name': 'n',
            'file': {'driver': 'file', 'filename': disk}
        })
        self.assert_qmp(result, 'return', {})

    def tearDown(self):
        self.vm.shutdown()
        os.remove(disk)
        os.remove(nbd_sock)

    def server_start(self, max_connections=None):
        args = {
            'addr': {
                'type': 'unix',
                'data': {'path': nbd_sock}
            }
        }
        if max_connections is not None:
            args['max-connections'] = max_connections
        result = self.vm.qmp('nbd-server-start', args)
        self.assert_qmp(result, 'return', {})

    def export_add(self, name, writable=None, multi_conn=None):
        args = {
            'type': 'nbd',
            'id': name,
            'node-name': 'n',
            'name': name,
        }
        if writable is not None:
            args['writable'] = writable
        if multi_conn is not None:
            args['multi-conn'] = multi_conn
        result = self.vm.qmp('block-export-add', args)
        self.assert_qmp(result, 'return', {})

    def check_multi_conn(self, export_name, multi_conn):
        cl = nbd.NBD()
        cl.connect_uri(nbd_uri.format(export_name))
        self.assertEqual(cl.can_multi_conn(), multi_conn)
        cl.shutdown()

    def test_default_settings(self):
        self.server_start()
        self.export_add('r')
        self.export_add('w', writable=True)
        self.check_multi_conn('r', True)
        self.check_multi_conn('w', False)

    def test_multiconn_option(self):
        self.server_start()
        self.export_add('r', multi_conn='off')
        self.export_add('w', writable=True, multi_conn='on')
        self.check_multi_conn('r', False)
        self.check_multi_conn('w', True)

    def test_limited_connections(self):
        self.server_start(max_connections=1)
        self.export_add('r', multi_conn='on')
        self.export_add('w', writable=True, multi_conn='on')
        self.check_multi_conn('r', False)
        self.check_multi_conn('w', False)

    def test_parallel_writes(self):
        self.server_start()
        self.export_add('w', writable=True, multi_conn='on')

        clients = [nbd.NBD() for _ in range(3)]
        for c in clients:
            c.connect_uri(nbd_uri.format('w'))
            assert c.can_multi_conn()

        buf1 = clients[0].pread(1024 * 1024, 0)
        self.assertEqual(buf1, b"\x01" * 1024 * 1024)

        buf2 = b"\x03" * 1024 * 1024
        clients[1].pwrite(buf2, 0)
        clients[2].flush()

        buf1 = clients[0].pread(1024 * 1024, 0)
        self.assertEqual(buf1, buf2)

        for i in range(3):
            clients[i].shutdown()


if __name__ == '__main__':
    iotests.main(supported_fmts=['qcow2'])

> But the shell code for > _launch_qemu/_send_qemu_cmd was already pretty nice for setting up a > long-running background qemu process where I can pass in QMP commands > at will, then interact from other processes.
> >>> +export nbd_unix_socket >>> + >>> +echo >>> +echo "=== Default nbd server settings ===" >>> +echo >>> +# Default allows for unlimited connections, readonly images advertise >>> +# multi-conn, and writable images do not >>> +_send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-start", >>> + "arguments":{"addr":{"type":"unix", >>> + "data":{"path":"'"$nbd_unix_socket"'"}}}}' "return" >>> +_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-add", >>> + "arguments":{"type":"nbd", "id":"r", "node-name":"n", >>> + "name":"r"}}' "return" >>> +_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-add", >>> + "arguments":{"type":"nbd", "id":"w", "node-name":"n", >>> + "name":"w", "writable":true}}' "return" >>> +nbdsh -u "nbd+unix:///r?socket=$nbd_unix_socket" -c ' >>> +assert h.can_multi_conn() >>> +h.shutdown() >>> +print("nbdsh passed")' >>> +nbdsh -u "nbd+unix:///w?socket=$nbd_unix_socket" -c ' >>> +assert not h.can_multi_conn() >>> +h.shutdown() >>> +print("nbdsh passed")' >>> >> >> Mixing of shell and python is very confusing. Wouldn't it be much cleaner >> to write the test in python? > > Here, nbdsh -c 'python snippet' is used as a shell command line > parameter. Writing python code to call out to a system() command > where one of the arguments to that command is a python script snippet > is going to be just as annoying as writing it in bash. 
> >>> +echo >>> +echo "=== Demonstrate parallel writers ===" >>> +echo >>> +_send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-start", >>> + "arguments":{"addr":{"type":"unix", >>> + "data":{"path":"'"$nbd_unix_socket"'"}}}}' "return" >>> +_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-add", >>> + "arguments":{"type":"nbd", "id":"w", "node-name":"n", >>> + "name":"", "multi-conn":"on", "writable":true}}' "return" >>> + >>> +nbdsh -c ' >>> +import os >>> +sock = os.getenv("nbd_unix_socket") >>> +h = [] >>> + >>> +for i in range(3): >>> + h.append(nbd.NBD()) >>> + h[i].connect_unix(sock) >>> + assert h[i].can_multi_conn() >>> >> >> This is somewhat C in python, maybe: >> >> handles = [nbd.NBD() for _ in range(3)] >> >> for h in handles: >> h.connect_unix(sock) >> assert h.can_multi_con() >> >> Since assert will abort the test, and we don't handle >> exceptions, failure will not shutdown the connections >> but it should not matter for the purpose of a test. > > As I said, I'm open to python suggestions :) I like your approach. > >> >> >>> + >>> +buf1 = h[0].pread(1024 * 1024, 0) >>> +if buf1 != b"\x01" * 1024 * 1024: >>> + print("Unexpected initial read") >>> >> >> Not clear when we initialize the buffer to \x01 - is this the qemu-io above? > > Yes, when the qcow2 file was initially created. > >> >> >>> +buf2 = b"\x03" * 1024 * 1024 >>> +h[1].pwrite(buf2, 0) >>> +h[2].flush() >>> +buf1 = h[0].pread(1024 * 1024, 0) >>> +if buf1 == buf2: >>> + print("Flush appears to be consistent across connections") >>> >> >> buf1 was the initial content, buf2 is the new content, but now we override >> but1 to check that the right was flushed. It will be be better to use >> different >> names like inittial_data, updated_data, current_data. > > Can do. > >> >> This block is the most important part of the test, showing that we can write >> in one connection, flush in the second, and read the written block in the >> third. >> Maybe add a comment about this? 
I think it will help someone else trying >> to understand what this part is trying to test. > > Can do. > >>> +{"execute":"block-export-add", >>> + "arguments":{"type":"nbd", "id":"w", "node-name":"n", >>> + "name":"", "multi-conn":"on", "writable":true}} >>> +{"return": {}} >>> +Flush appears to be consistent across connections >>> +{"execute":"block-export-del", >>> + "arguments":{"id":"w"}} >>> +{"return": {}} >>> +{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, >>> "event": "BLOCK_EXPORT_DELETED", "data": {"id": "w"}} >>> +{"execute":"nbd-server-stop"} >>> +{"return": {}} >>> +{"execute":"quit"} >>> +{"return": {}} >>> +{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, >>> "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}} >>> >> >> Nothing about the contents here says anything about the actual test >> except the "Flush appears..." line. > > Yeah, it's a lot of QMP debugging (to show what commands were run in > setting up the server), and less verbose in the nbdsh side. Do I need > to add more output during the nbdsh that uses multiple connections? > >> >> >>> +*** done >>> -- >>> 2.35.1 >>> >> >> Looks good, feel free to ignore the style comments and suggestion to rewrite >> the test in python. > > The style comments are nice, the rewriting is going to be harder for > me (but I'll accept help). At any rate, getting qemu-nbd to be > feature-compatible may require a v3 anyway. >
On Tue, Feb 15, 2022 at 05:24:14PM -0600, Eric Blake wrote: > Oh. The QMP command (which is immediately visible through > nbd-server-add/block-storage-add to qemu and qemu-storage-daemon) > gains "multi-conn":"on", but you may be right that qemu-nbd would want > a command line option (either that, or we accellerate our plans that > qsd should replace qemu-nbd). I really hope there will always be something called "qemu-nbd" that acts like qemu-nbd. Rich.
On Wed, Feb 16, 2022 at 12:13 PM Richard W.M. Jones <rjones@redhat.com> wrote: > On Tue, Feb 15, 2022 at 05:24:14PM -0600, Eric Blake wrote: > > Oh. The QMP command (which is immediately visible through > > nbd-server-add/block-storage-add to qemu and qemu-storage-daemon) > > gains "multi-conn":"on", but you may be right that qemu-nbd would want > > a command line option (either that, or we accelerate our plans that > > qsd should replace qemu-nbd). > > I really hope there will always be something called "qemu-nbd" > that acts like qemu-nbd. > I share this hope. Most projects I work on are based on qemu-nbd. However, in the oVirt use case, we want to provide an NBD socket for clients to allow direct access to disks. One of the issues we need to solve for this is having a way to tell if qemu-nbd is active, so we can terminate idle transfers. The way we do this with the ovirt-imageio server is to query the status of the transfer, and use the idle time (time since last request) and active status (has inflight requests) to detect a stale transfer that should be terminated. An example use case is a process on a remote host that started an image transfer, and was killed or crashed in the middle of the transfer without cleaning up properly. To be more specific, every request to the imageio server (read, write, flush, zero, options) updates a timestamp in the transfer state. When we get the status, we report the time since that timestamp was updated. Additionally, we keep and report the number of inflight requests, so we can tell the case when requests are blocked on inaccessible storage (e.g. non-responsive NFS). We don't have a way to do this with qemu-nbd, but I guess that using qemu-storage-daemon when we have QMP access will make such monitoring possible. Nir
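The bookkeeping Nir describes (a timestamp bumped on every request, plus an in-flight counter) is easy to sketch. This is a minimal model with hypothetical names, not the actual ovirt-imageio code:

```python
import time


class TransferStatus:
    """Track idle time and in-flight requests for one transfer.

    A manager can distinguish a stale transfer (large idle time,
    nothing in flight) from one stuck on bad storage (old timestamp,
    requests still in flight).
    """

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._last = clock()   # time of the most recent request event
        self._inflight = 0     # requests currently being served

    def request_started(self):
        self._inflight += 1
        self._last = self._clock()

    def request_finished(self):
        self._inflight -= 1
        self._last = self._clock()

    def status(self):
        # "idle" is the time since the last request event, mirroring
        # the imageio status query described above.
        return {"idle": self._clock() - self._last,
                "inflight": self._inflight}
```

A manager would poll status() and tear down an export whose idle time exceeds some threshold while inflight is 0.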
Eric Blake <eblake@redhat.com> writes: > According to the NBD spec, a server advertising > NBD_FLAG_CAN_MULTI_CONN promises that multiple client connections will > not see any cache inconsistencies: when properly separated by a single > flush, actions performed by one client will be visible to another > client, regardless of which client did the flush. We satisfy these > conditions in qemu when our block layer is backed by the local > filesystem (by virtue of the semantics of fdatasync(), and the fact > that qemu itself is not buffering writes beyond flushes). It is > harder to state whether we satisfy these conditions for network-based > protocols, so the safest course of action is to allow users to opt-in > to advertising multi-conn. We may later tweak defaults to advertise > by default when the block layer can confirm that the underlying > protocol driver is cache consistent between multiple writers, but for > now, this at least allows savvy users (such as virt-v2v or nbdcopy) to > explicitly start qemu-nbd or qemu-storage-daemon with multi-conn > advertisement in a known-safe setup where the client end can then > benefit from parallel clients. > > Note, however, that we don't want to advertise MULTI_CONN when we know > that a second client cannot connect (for historical reasons, qemu-nbd > defaults to a single connection while nbd-server-add and QMP commands > default to unlimited connections; but we already have existing means > to let either style of NBD server creation alter those defaults). The > harder part of this patch is setting up an iotest to demonstrate > behavior of multiple NBD clients to a single server. It might be > possible with parallel qemu-io processes, but concisely managing that > in shell is painful. I found it easier to do by relying on the libnbd > project's nbdsh, which means this test will be skipped on platforms > where that is not available. 
> > Signed-off-by: Eric Blake <eblake@redhat.com> > Fixes: https://bugzilla.redhat.com/1708300 > --- > > v1 was in Aug 2021 [1], with further replies in Sep [2] and Oct [3]. > > [1] https://lists.gnu.org/archive/html/qemu-devel/2021-08/msg04900.html > [2] https://lists.gnu.org/archive/html/qemu-devel/2021-09/msg00038.html > [3] https://lists.gnu.org/archive/html/qemu-devel/2021-10/msg06744.html > > Since then, I've tweaked the QAPI to mention 7.0 (instead of 6.2), and > reworked the logic so that default behavior is unchanged for now > (advertising multi-conn on a writable export requires opt-in during > the command line or QMP, but remains default for a readonly export). > I've also expanded the amount of testing done in the new iotest. > > docs/interop/nbd.txt | 1 + > docs/tools/qemu-nbd.rst | 3 +- > qapi/block-export.json | 34 +++- > include/block/nbd.h | 3 +- > blockdev-nbd.c | 5 + > nbd/server.c | 27 ++- > MAINTAINERS | 1 + > tests/qemu-iotests/tests/nbd-multiconn | 188 +++++++++++++++++++++ > tests/qemu-iotests/tests/nbd-multiconn.out | 112 ++++++++++++ > 9 files changed, 363 insertions(+), 11 deletions(-) > create mode 100755 tests/qemu-iotests/tests/nbd-multiconn > create mode 100644 tests/qemu-iotests/tests/nbd-multiconn.out > > diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt > index bdb0f2a41aca..6c99070b99c8 100644 > --- a/docs/interop/nbd.txt > +++ b/docs/interop/nbd.txt > @@ -68,3 +68,4 @@ NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", NBD_CMD_CACHE > * 4.2: NBD_FLAG_CAN_MULTI_CONN for shareable read-only exports, > NBD_CMD_FLAG_FAST_ZERO > * 5.2: NBD_CMD_BLOCK_STATUS for "qemu:allocation-depth" > +* 7.0: NBD_FLAG_CAN_MULTI_CONN for shareable writable exports > diff --git a/docs/tools/qemu-nbd.rst b/docs/tools/qemu-nbd.rst > index 6031f9689312..1de785524c36 100644 > --- a/docs/tools/qemu-nbd.rst > +++ b/docs/tools/qemu-nbd.rst > @@ -139,8 +139,7 @@ driver options if ``--image-opts`` is specified. > .. 
option:: -e, --shared=NUM > > Allow up to *NUM* clients to share the device (default > - ``1``), 0 for unlimited. Safe for readers, but for now, > - consistency is not guaranteed between multiple writers. > + ``1``), 0 for unlimited. > > .. option:: -t, --persistent > > diff --git a/qapi/block-export.json b/qapi/block-export.json > index f183522d0d2c..0a27e8ee84f9 100644 > --- a/qapi/block-export.json > +++ b/qapi/block-export.json > @@ -21,7 +21,9 @@ > # recreated on the fly while the NBD server is active. > # If missing, it will default to denying access (since 4.0). > # @max-connections: The maximum number of connections to allow at the same > -# time, 0 for unlimited. (since 5.2; default: 0) > +# time, 0 for unlimited. Setting this to 1 also stops > +# the server from advertising multiple client support > +# (since 5.2; default: 0) > # > # Since: 4.2 > ## > @@ -50,7 +52,9 @@ > # recreated on the fly while the NBD server is active. > # If missing, it will default to denying access (since 4.0). > # @max-connections: The maximum number of connections to allow at the same > -# time, 0 for unlimited. (since 5.2; default: 0) > +# time, 0 for unlimited. Setting this to 1 also stops > +# the server from advertising multiple client support > +# (since 5.2; default: 0). > # > # Returns: error if the server is already running. > # > @@ -79,6 +83,26 @@ > { 'struct': 'BlockExportOptionsNbdBase', > 'data': { '*name': 'str', '*description': 'str' } } > > +## > +# @NbdExportMultiConn: > +# > +# Possible settings for advertising NBD multiple client support. > +# > +# @off: Do not advertise multiple clients. > +# > +# @on: Allow multiple clients (for writable clients, this is only safe > +# if the underlying BDS is cache-consistent, such as when backed > +# by the raw file driver); ignored if the NBD server was set up > +# with max-connections of 1. > +# > +# @auto: Behaves like @off if the export is writable, and @on if the > +# export is read-only. 
> +# > +# Since: 7.0 > +## > +{ 'enum': 'NbdExportMultiConn', > + 'data': ['off', 'on', 'auto'] } Have you considered using OnOffAuto from common.json? Duplicating it as NbdExportMultiConn lets you document the values right in the enum. If you reuse it, you have to document them where the enum is used, i.e. ... > + > ## > # @BlockExportOptionsNbd: > # > @@ -95,11 +119,15 @@ > # the metadata context name "qemu:allocation-depth" to > # inspect allocation details. (since 5.2) > # > +# @multi-conn: Controls whether multiple client support is advertised, if the > +# server's max-connections is not 1. (default auto, since 7.0) > +# ... here. > # Since: 5.2 > ## > { 'struct': 'BlockExportOptionsNbd', > 'base': 'BlockExportOptionsNbdBase', > - 'data': { '*bitmaps': ['str'], '*allocation-depth': 'bool' } } > + 'data': { '*bitmaps': ['str'], '*allocation-depth': 'bool', > + '*multi-conn': 'NbdExportMultiConn' } } > > ## > # @BlockExportOptionsVhostUserBlk: [...]
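Concretely, the opt-in being reviewed here would look like this on the wire once the patch lands (a QMP sketch; the socket path and export name are made up for illustration):

```
{"execute": "nbd-server-start",
 "arguments": {"addr": {"type": "unix",
                        "data": {"path": "/tmp/nbd.sock"}}}}
{"execute": "block-export-add",
 "arguments": {"type": "nbd", "id": "export0", "node-name": "n",
               "name": "export0", "writable": true,
               "multi-conn": "on"}}
```

With max-connections left at its default of 0 (unlimited), a writable export set up this way advertises NBD_FLAG_CAN_MULTI_CONN; with max-connections of 1 the flag is suppressed regardless of the multi-conn setting.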
On Wed, Feb 16, 2022 at 10:08 AM Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> wrote: > > 16.02.2022 02:24, Eric Blake wrote: > > On Tue, Feb 15, 2022 at 09:23:36PM +0200, Nir Soffer wrote: > >> On Tue, Feb 15, 2022 at 7:22 PM Eric Blake <eblake@redhat.com> wrote: > >> > >>> According to the NBD spec, a server advertising > >>> NBD_FLAG_CAN_MULTI_CONN promises that multiple client connections will > >>> not see any cache inconsistencies: when properly separated by a single > >>> flush, actions performed by one client will be visible to another > >>> client, regardless of which client did the flush. We satisfy these > >>> conditions in qemu when our block layer is backed by the local > >>> filesystem (by virtue of the semantics of fdatasync(), and the fact > >>> that qemu itself is not buffering writes beyond flushes). It is > >>> harder to state whether we satisfy these conditions for network-based > >>> protocols, so the safest course of action is to allow users to opt-in > >>> to advertising multi-conn. We may later tweak defaults to advertise > >>> by default when the block layer can confirm that the underlying > >>> protocol driver is cache consistent between multiple writers, but for > >>> now, this at least allows savvy users (such as virt-v2v or nbdcopy) to > >>> explicitly start qemu-nbd or qemu-storage-daemon with multi-conn > >>> advertisement in a known-safe setup where the client end can then > >>> benefit from parallel clients. > >>> > >> > >> It makes sense, and will be used by oVirt. Actually we are already using > >> multiple connections for writing about 2 years, based on your promise > >> that if every client writes to district areas this is safe. > > > > I presume s/district/distinct/, but yes, I'm glad we're finally trying > > to make the code match existing practice ;) > > > >>> +++ b/docs/tools/qemu-nbd.rst > >>> @@ -139,8 +139,7 @@ driver options if ``--image-opts`` is specified. > >>> .. 
option:: -e, --shared=NUM > >>> > >>> Allow up to *NUM* clients to share the device (default > >>> - ``1``), 0 for unlimited. Safe for readers, but for now, > >>> - consistency is not guaranteed between multiple writers. > >>> + ``1``), 0 for unlimited. > >>> > >> > >> Removing the note means that now consistency is guaranteed between > >> multiple writers, no? > >> > >> Or maybe we want to mention here that consistency depends on the protocol > >> and users can opt in, or refer to the section where this is discussed? > > > > Yeah, a link to the QAPI docs where multi-conn is documented might be > > nice, except I'm not sure the best way to do that in our sphinx > > documentation setup. > > > >>> +## > >>> +# @NbdExportMultiConn: > >>> +# > >>> +# Possible settings for advertising NBD multiple client support. > >>> +# > >>> +# @off: Do not advertise multiple clients. > >>> +# > >>> +# @on: Allow multiple clients (for writable clients, this is only safe > >>> +# if the underlying BDS is cache-consistent, such as when backed > >>> +# by the raw file driver); ignored if the NBD server was set up > >>> +# with max-connections of 1. > >>> +# > >>> +# @auto: Behaves like @off if the export is writable, and @on if the > >>> +# export is read-only. > >>> +# > >>> +# Since: 7.0 > >>> +## > >>> +{ 'enum': 'NbdExportMultiConn', > >>> + 'data': ['off', 'on', 'auto'] } > >>> > >> > >> Are we going to have --multi-conn=(on|off|auto)? > > > > Oh. The QMP command (which is immediately visible through > > nbd-server-add/block-storage-add to qemu and qemu-storage-daemon) > > gains "multi-conn":"on", but you may be right that qemu-nbd would want > > a command line option (either that, or we accelerate our plans that > > qsd should replace qemu-nbd). > > > >>> +++ b/blockdev-nbd.c > >>> @@ -44,6 +44,11 @@ bool nbd_server_is_running(void) > >>> return nbd_server || is_qemu_nbd; > >>> } > >>> > >>> +int nbd_server_max_connections(void) > >>> +{ > >>> + return nbd_server ?
nbd_server->max_connections : -1; > >>> +} > >>> > >> > >> -1 is a little bit strange for a limit, maybe 1 is a better default when > >> nbd_server == NULL? When can this happen? > > > > In qemu, if you haven't used the QMP command 'nbd-server-start' yet. > > In qemu-nbd, always (per the nbd_server_is_running function just > > above). My iotest only covered the qemu/qsd side, not the qemu-nbd > > side, so it looks like I need a v3... > > > >>> +++ b/nbd/server.c > > > >>> + /* > >>> + * Determine whether to advertise multi-conn. Default is auto, > >>> + * which resolves to on for read-only and off for writable. But > >>> + * if the server has max-connections 1, that forces the flag off. > >>> > >> > >> Looks good, this can be enabled automatically based on the driver > >> if we want, so users using auto can upgrade to multi-conn automatically. > > > > Yes, that's part of why I made it a tri-state with a default of 'auto'. > > > >> > >> > >>> + */ > >>> + if (!arg->has_multi_conn) { > >>> + arg->multi_conn = NBD_EXPORT_MULTI_CONN_AUTO; > >>> + } > >>> + if (nbd_server_max_connections() == 1) { > >> > >> + arg->multi_conn = NBD_EXPORT_MULTI_CONN_OFF; > >>> + } > >> > >> + if (arg->multi_conn == NBD_EXPORT_MULTI_CONN_AUTO) { > >>> + multi_conn = readonly; > >>> + } else { > >>> + multi_conn = arg->multi_conn == NBD_EXPORT_MULTI_CONN_ON; > >>> + } > >>> > >> > >> This part is a little bit confusing - we do: > >> - initialize arg->multi_conn if it has no value > >> - set the temporary multi_conn based on the now-initialized arg->multi_conn > >> > >> I think it will be nicer to separate arguments parsing, so there is no need > >> to initialize it here or have arg->has_multi_conn, but I did not check how > >> other arguments are handled. > > > > arg->has_multi_conn is fallout from the fact that our QAPI must remain > > back-compat. If it is false, the user did not pass "multi-conn":..., > > so we want the default value of "auto". 
Once we've established the > > default, then we force multi-conn off if we know we are limited to one > > client (although as you pointed out, nbd_server_max_connections() > > needs to be tested with qemu-nbd). Then, it's easier to resolve the > > tri-state enum variable into the bool of what we actually advertise to > > the NBD client. > > > >>> +++ b/tests/qemu-iotests/tests/nbd-multiconn > >>> @@ -0,0 +1,188 @@ > >>> +#!/usr/bin/env bash > >>> +# group: rw auto quick > >>> +# > >>> +# Test that qemu-nbd MULTI_CONN works > >>> +# > >>> +echo > >>> +echo "=== Initial image setup ===" > >>> +echo > >>> + > >>> +_make_test_img 4M > >>> +$QEMU_IO -c 'w -P 1 0 2M' -c 'w -P 2 2M 2M' "$TEST_IMG" | _filter_qemu_io > >>> +_launch_qemu 2> >(_filter_nbd) > >>> +_send_qemu_cmd $QEMU_HANDLE '{"execute":"qmp_capabilities"}' "return" > >>> +_send_qemu_cmd $QEMU_HANDLE '{"execute":"blockdev-add", > >>> + "arguments":{"driver":"qcow2", "node-name":"n", > >>> + "file":{"driver":"file", "filename":"'"$TEST_IMG"'"}}}' "return" > > > > I'm not the best at writing python iotests; I welcome a language > > translation of this aspect. > > > > Let me try:) Thanks! This is much nicer and will be easier to maintain. 
> > > #!/usr/bin/env python3 > > import os > import iotests > import nbd > from iotests import qemu_img_create, qemu_io_silent > > > disk = os.path.join(iotests.test_dir, 'disk') > size = '4M' > nbd_sock = os.path.join(iotests.test_dir, 'nbd_sock') > nbd_uri = 'nbd+unix:///{}?socket=' + nbd_sock > > > class TestNbdMulticonn(iotests.QMPTestCase): > def setUp(self): > assert qemu_img_create('-f', iotests.imgfmt, disk, size) == 0 > assert qemu_io_silent('-c', 'w -P 1 0 2M', > '-c', 'w -P 2 2M 2M', disk) == 0 > > self.vm = iotests.VM() > self.vm.launch() > result = self.vm.qmp('blockdev-add', { > 'driver': 'qcow2', > 'node-name': 'n', > 'file': {'driver': 'file', 'filename': disk} > }) > self.assert_qmp(result, 'return', {}) > > def tearDown(self): > self.vm.shutdown() > os.remove(disk) > os.remove(nbd_sock) > > def server_start(self, max_connections=None): > args = { > 'addr': { > 'type': 'unix', > 'data': {'path': nbd_sock} > } > } > if max_connections is not None: > args['max-connections'] = max_connections > > result = self.vm.qmp('nbd-server-start', args) > self.assert_qmp(result, 'return', {}) > > def export_add(self, name, writable=None, multi_conn=None): > args = { > 'type': 'nbd', > 'id': name, > 'node-name': 'n', > 'name': name, > } > if writable is not None: > args['writable'] = writable > if multi_conn is not None: > args['multi-conn'] = multi_conn > > result = self.vm.qmp('block-export-add', args) > self.assert_qmp(result, 'return', {}) > > def check_multi_conn(self, export_name, multi_conn): > cl = nbd.NBD() > cl.connect_uri(nbd_uri.format(export_name)) > self.assertEqual(cl.can_multi_conn(), multi_conn) > cl.shutdown() The test will be more clear and the code more reusable if the helper was doing only connect/disconnect. 
@contextmanager def open_nbd(nbd_uri, export_name): h = nbd.NBD() h.connect_uri(nbd_uri.format(export_name)) try: yield h finally: h.shutdown() Any test that needs an nbd handle can do: with open_nbd(nbd_uri, export_name) as h: use the handle... A good example of where we can use this is the block status cache test, which uses complicated qemu-img commands instead of opening a client and calling block_status a few times. And this also cleans up at the end of the test so failure will not affect the next test. The helper can live in the iotest module instead of reinventing it for every new test. > > def test_default_settings(self): > self.server_start() > self.export_add('r') > self.export_add('w', writable=True) > self.check_multi_conn('r', True) > self.check_multi_conn('w', False) With the helper suggested, this test will be: with open_nbd(nbd_uri, "r") as h: self.assertTrue(h.can_multi_conn()) with open_nbd(nbd_uri, "w") as h: self.assertFalse(h.can_multi_conn()) Now you don't need to check what check_multi_conn() is doing when reading the tests, and it is very clear what open_nbd() does based on the name and usage as context manager. > > def test_multiconn_option(self): > self.server_start() > self.export_add('r', multi_conn='off') > self.export_add('w', writable=True, multi_conn='on') It will be more natural to use: self.start_server() self.add_export(...) In C the other way is more natural because you don't have namespaces or classes. > self.check_multi_conn('r', False) > self.check_multi_conn('w', True) And I think you agree since you did not use: self.multi_conn_check(...) 
> > def test_limited_connections(self): > self.server_start(max_connections=1) > self.export_add('r', multi_conn='on') > self.export_add('w', writable=True, multi_conn='on') > self.check_multi_conn('r', False) > self.check_multi_conn('w', False) > > def test_parallel_writes(self): > self.server_start() > self.export_add('w', writable=True, multi_conn='on') > > clients = [nbd.NBD() for _ in range(3)] > for c in clients: > c.connect_uri(nbd_uri.format('w')) > assert c.can_multi_conn() > > buf1 = clients[0].pread(1024 * 1024, 0) > self.assertEqual(buf1, b"\x01" * 1024 * 1024) > > buf2 = b"\x03" * 1024 * 1024 > clients[1].pwrite(buf2, 0) > clients[2].flush() > buf1 = clients[0].pread(1024 * 1024, 0) > > self.assertEqual(buf1, buf2) > > for i in range(3): > clients[i].shutdown() > > > if __name__ == '__main__': > iotests.main(supported_fmts=['qcow2']) > > > But the shell code for > > _launch_qemu/_send_qemu_cmd was already pretty nice for setting up a > > long-running background qemu process where I can pass in QMP commands > > at will, then interact from other processes. 
> > > >>> +export nbd_unix_socket > >>> + > >>> +echo > >>> +echo "=== Default nbd server settings ===" > >>> +echo > >>> +# Default allows for unlimited connections, readonly images advertise > >>> +# multi-conn, and writable images do not > >>> +_send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-start", > >>> + "arguments":{"addr":{"type":"unix", > >>> + "data":{"path":"'"$nbd_unix_socket"'"}}}}' "return" > >>> +_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-add", > >>> + "arguments":{"type":"nbd", "id":"r", "node-name":"n", > >>> + "name":"r"}}' "return" > >>> +_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-add", > >>> + "arguments":{"type":"nbd", "id":"w", "node-name":"n", > >>> + "name":"w", "writable":true}}' "return" > >>> +nbdsh -u "nbd+unix:///r?socket=$nbd_unix_socket" -c ' > >>> +assert h.can_multi_conn() > >>> +h.shutdown() > >>> +print("nbdsh passed")' > >>> +nbdsh -u "nbd+unix:///w?socket=$nbd_unix_socket" -c ' > >>> +assert not h.can_multi_conn() > >>> +h.shutdown() > >>> +print("nbdsh passed")' > >>> > >> > >> Mixing of shell and python is very confusing. Wouldn't it be much cleaner > >> to write the test in python? > > > > Here, nbdsh -c 'python snippet' is used as a shell command line > > parameter. Writing python code to call out to a system() command > > where one of the arguments to that command is a python script snippet > > is going to be just as annoying as writing it in bash. 
> > > >>> +echo > >>> +echo "=== Demonstrate parallel writers ===" > >>> +echo > >>> +_send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-start", > >>> + "arguments":{"addr":{"type":"unix", > >>> + "data":{"path":"'"$nbd_unix_socket"'"}}}}' "return" > >>> +_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-add", > >>> + "arguments":{"type":"nbd", "id":"w", "node-name":"n", > >>> + "name":"", "multi-conn":"on", "writable":true}}' "return" > >>> + > >>> +nbdsh -c ' > >>> +import os > >>> +sock = os.getenv("nbd_unix_socket") > >>> +h = [] > >>> + > >>> +for i in range(3): > >>> + h.append(nbd.NBD()) > >>> + h[i].connect_unix(sock) > >>> + assert h[i].can_multi_conn() > >>> > >> > >> This is somewhat C in python, maybe: > >> > >> handles = [nbd.NBD() for _ in range(3)] > >> > >> for h in handles: > >> h.connect_unix(sock) > >> assert h.can_multi_conn() > >> > >> Since assert will abort the test, and we don't handle > >> exceptions, failure will not shut down the connections, > >> but it should not matter for the purpose of a test. > > > > As I said, I'm open to python suggestions :) I like your approach. > > > >> > >> > >>> + > >>> +buf1 = h[0].pread(1024 * 1024, 0) > >>> +if buf1 != b"\x01" * 1024 * 1024: > >>> + print("Unexpected initial read") > >>> > >> > >> Not clear when we initialize the buffer to \x01 - is this the qemu-io above? > > > > Yes, when the qcow2 file was initially created. > > > >> > >> > >>> +buf2 = b"\x03" * 1024 * 1024 > >>> +h[1].pwrite(buf2, 0) > >>> +h[2].flush() > >>> +buf1 = h[0].pread(1024 * 1024, 0) > >>> +if buf1 == buf2: > >>> + print("Flush appears to be consistent across connections") > >>> > >> > >> buf1 was the initial content, buf2 is the new content, but now we override > >> buf1 to check that the write was flushed. It would be better to use > >> different > >> names like initial_data, updated_data, current_data. > > > > Can do. 
> > > >> > >> This block is the most important part of the test, showing that we can write > >> in one connection, flush in the second, and read the written block in the > >> third. > >> Maybe add a comment about this? I think it will help someone else trying > >> to understand what this part is trying to test. > > > > Can do. > > > >>> +{"execute":"block-export-add", > >>> + "arguments":{"type":"nbd", "id":"w", "node-name":"n", > >>> + "name":"", "multi-conn":"on", "writable":true}} > >>> +{"return": {}} > >>> +Flush appears to be consistent across connections > >>> +{"execute":"block-export-del", > >>> + "arguments":{"id":"w"}} > >>> +{"return": {}} > >>> +{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, > >>> "event": "BLOCK_EXPORT_DELETED", "data": {"id": "w"}} > >>> +{"execute":"nbd-server-stop"} > >>> +{"return": {}} > >>> +{"execute":"quit"} > >>> +{"return": {}} > >>> +{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, > >>> "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}} > >>> > >> > >> Nothing about the contents here says anything about the actual test > >> except the "Flush appears..." line. > > > > Yeah, it's a lot of QMP debugging (to show what commands were run in > > setting up the server), and less verbose in the nbdsh side. Do I need > > to add more output during the nbdsh that uses multiple connections? > > > >> > >> > >>> +*** done > >>> -- > >>> 2.35.1 > >>> > >> > >> Looks good, feel free to ignore the style comments and suggestion to rewrite > >> the test in python. > > > > The style comments are nice, the rewriting is going to be harder for > > me (but I'll accept help). At any rate, getting qemu-nbd to be > > feature-compatible may require a v3 anyway. > > > > > -- > Best regards, > Vladimir >
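The tri-state resolution Eric walks through above (default to auto, force the flag off when max-connections is 1, then collapse the enum to the advertised bool) can be sketched as a tiny standalone function. The names below are illustrative stand-ins, not the actual qemu code:

```python
# Illustrative stand-ins for qemu's NbdExportMultiConn enum values.
MULTI_CONN_OFF, MULTI_CONN_ON, MULTI_CONN_AUTO = "off", "on", "auto"

def resolve_multi_conn(opt, readonly, max_connections):
    """Collapse the tri-state option to the bool actually advertised."""
    if max_connections == 1:
        return False          # a single-client server never advertises
    if opt == MULTI_CONN_AUTO:
        return readonly       # auto: on for read-only, off for writable
    return opt == MULTI_CONN_ON

# The behaviors discussed in the thread:
assert resolve_multi_conn(MULTI_CONN_AUTO, True, 0)       # read-only default: on
assert not resolve_multi_conn(MULTI_CONN_AUTO, False, 0)  # writable default: off
assert not resolve_multi_conn(MULTI_CONN_ON, False, 1)    # max-connections=1 forces off
assert resolve_multi_conn(MULTI_CONN_ON, False, 0)        # explicit opt-in
```

This also makes explicit why the order matters: the max-connections check must override an explicit "on" before the auto case is resolved.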
16.02.2022 20:14, Nir Soffer wrote: > On Wed, Feb 16, 2022 at 10:08 AM Vladimir Sementsov-Ogievskiy > <vsementsov@virtuozzo.com> wrote: >> >> 16.02.2022 02:24, Eric Blake wrote: >>> On Tue, Feb 15, 2022 at 09:23:36PM +0200, Nir Soffer wrote: >>>> On Tue, Feb 15, 2022 at 7:22 PM Eric Blake <eblake@redhat.com> wrote: >>>> >>>>> According to the NBD spec, a server advertising >>>>> NBD_FLAG_CAN_MULTI_CONN promises that multiple client connections will >>>>> not see any cache inconsistencies: when properly separated by a single >>>>> flush, actions performed by one client will be visible to another >>>>> client, regardless of which client did the flush. We satisfy these >>>>> conditions in qemu when our block layer is backed by the local >>>>> filesystem (by virtue of the semantics of fdatasync(), and the fact >>>>> that qemu itself is not buffering writes beyond flushes). It is >>>>> harder to state whether we satisfy these conditions for network-based >>>>> protocols, so the safest course of action is to allow users to opt-in >>>>> to advertising multi-conn. We may later tweak defaults to advertise >>>>> by default when the block layer can confirm that the underlying >>>>> protocol driver is cache consistent between multiple writers, but for >>>>> now, this at least allows savvy users (such as virt-v2v or nbdcopy) to >>>>> explicitly start qemu-nbd or qemu-storage-daemon with multi-conn >>>>> advertisement in a known-safe setup where the client end can then >>>>> benefit from parallel clients. >>>>> >>>> >>>> It makes sense, and will be used by oVirt. Actually we are already using >>>> multiple connections for writing about 2 years, based on your promise >>>> that if every client writes to district areas this is safe. >>> >>> I presume s/district/distinct/, but yes, I'm glad we're finally trying >>> to make the code match existing practice ;) >>> >>>>> +++ b/docs/tools/qemu-nbd.rst >>>>> @@ -139,8 +139,7 @@ driver options if ``--image-opts`` is specified. >>>>> .. 
option:: -e, --shared=NUM >>>>> >>>>> Allow up to *NUM* clients to share the device (default >>>>> - ``1``), 0 for unlimited. Safe for readers, but for now, >>>>> - consistency is not guaranteed between multiple writers. >>>>> + ``1``), 0 for unlimited. >>>>> >>>> >>>> Removing the note means that now consistency is guaranteed between >>>> multiple writers, no? >>>> >>>> Or maybe we want to mention here that consistency depends on the protocol >>>> and users can opt in, or refer to the section where this is discussed? >>> >>> Yeah, a link to the QAPI docs where multi-conn is documented might be >>> nice, except I'm not sure the best way to do that in our sphinx >>> documentation setup. >>> >>>>> +## >>>>> +# @NbdExportMultiConn: >>>>> +# >>>>> +# Possible settings for advertising NBD multiple client support. >>>>> +# >>>>> +# @off: Do not advertise multiple clients. >>>>> +# >>>>> +# @on: Allow multiple clients (for writable clients, this is only safe >>>>> +# if the underlying BDS is cache-consistent, such as when backed >>>>> +# by the raw file driver); ignored if the NBD server was set up >>>>> +# with max-connections of 1. >>>>> +# >>>>> +# @auto: Behaves like @off if the export is writable, and @on if the >>>>> +# export is read-only. >>>>> +# >>>>> +# Since: 7.0 >>>>> +## >>>>> +{ 'enum': 'NbdExportMultiConn', >>>>> + 'data': ['off', 'on', 'auto'] } >>>>> >>>> >>>> Are we going to have --multi-conn=(on|off|auto)? >>> >>> Oh. The QMP command (which is immediately visible through >>> nbd-server-add/block-storage-add to qemu and qemu-storage-daemon) >>> gains "multi-conn":"on", but you may be right that qemu-nbd would want >>> a command line option (either that, or we accelerate our plans that >>> qsd should replace qemu-nbd). >>> >>>>> +++ b/blockdev-nbd.c >>>>> @@ -44,6 +44,11 @@ bool nbd_server_is_running(void) >>>>> return nbd_server || is_qemu_nbd; >>>>> } >>>>> >>>>> +int nbd_server_max_connections(void) >>>>> +{ >>>>> + return nbd_server ?
nbd_server->max_connections : -1; >>>>> +} >>>>> >>>> >>>> -1 is a little bit strange for a limit, maybe 1 is a better default when >>>> nbd_server == NULL? When can this happen? >>> >>> In qemu, if you haven't used the QMP command 'nbd-server-start' yet. >>> In qemu-nbd, always (per the nbd_server_is_running function just >>> above). My iotest only covered the qemu/qsd side, not the qemu-nbd >>> side, so it looks like I need a v3... >>> >>>>> +++ b/nbd/server.c >>> >>>>> + /* >>>>> + * Determine whether to advertise multi-conn. Default is auto, >>>>> + * which resolves to on for read-only and off for writable. But >>>>> + * if the server has max-connections 1, that forces the flag off. >>>>> >>>> >>>> Looks good, this can be enabled automatically based on the driver >>>> if we want, so users using auto can upgrade to multi-conn automatically. >>> >>> Yes, that's part of why I made it a tri-state with a default of 'auto'. >>> >>>> >>>> >>>>> + */ >>>>> + if (!arg->has_multi_conn) { >>>>> + arg->multi_conn = NBD_EXPORT_MULTI_CONN_AUTO; >>>>> + } >>>>> + if (nbd_server_max_connections() == 1) { >>>> >>>> + arg->multi_conn = NBD_EXPORT_MULTI_CONN_OFF; >>>>> + } >>>> >>>> + if (arg->multi_conn == NBD_EXPORT_MULTI_CONN_AUTO) { >>>>> + multi_conn = readonly; >>>>> + } else { >>>>> + multi_conn = arg->multi_conn == NBD_EXPORT_MULTI_CONN_ON; >>>>> + } >>>>> >>>> >>>> This part is a little bit confusing - we do: >>>> - initialize arg->multi_conn if it has no value >>>> - set the temporary multi_conn based on the now-initialized arg->multi_conn >>>> >>>> I think it will be nicer to separate arguments parsing, so there is no need >>>> to initialize it here or have arg->has_multi_conn, but I did not check how >>>> other arguments are handled. >>> >>> arg->has_multi_conn is fallout from the fact that our QAPI must remain >>> back-compat. If it is false, the user did not pass "multi-conn":..., >>> so we want the default value of "auto". 
Once we've established the >>> default, then we force multi-conn off if we know we are limited to one >>> client (although as you pointed out, nbd_server_max_connections() >>> needs to be tested with qemu-nbd). Then, it's easier to resolve the >>> tri-state enum variable into the bool of what we actually advertise to >>> the NBD client. >>> >>>>> +++ b/tests/qemu-iotests/tests/nbd-multiconn >>>>> @@ -0,0 +1,188 @@ >>>>> +#!/usr/bin/env bash >>>>> +# group: rw auto quick >>>>> +# >>>>> +# Test that qemu-nbd MULTI_CONN works >>>>> +# >>>>> +echo >>>>> +echo "=== Initial image setup ===" >>>>> +echo >>>>> + >>>>> +_make_test_img 4M >>>>> +$QEMU_IO -c 'w -P 1 0 2M' -c 'w -P 2 2M 2M' "$TEST_IMG" | _filter_qemu_io >>>>> +_launch_qemu 2> >(_filter_nbd) >>>>> +_send_qemu_cmd $QEMU_HANDLE '{"execute":"qmp_capabilities"}' "return" >>>>> +_send_qemu_cmd $QEMU_HANDLE '{"execute":"blockdev-add", >>>>> + "arguments":{"driver":"qcow2", "node-name":"n", >>>>> + "file":{"driver":"file", "filename":"'"$TEST_IMG"'"}}}' "return" >>> >>> I'm not the best at writing python iotests; I welcome a language >>> translation of this aspect. >> >> >> >> Let me try:) > > Thanks! This is much nicer and will be easier to maintain. 
> >> >> >> #!/usr/bin/env python3 >> >> import os >> import iotests >> import nbd >> from iotests import qemu_img_create, qemu_io_silent >> >> >> disk = os.path.join(iotests.test_dir, 'disk') >> size = '4M' >> nbd_sock = os.path.join(iotests.test_dir, 'nbd_sock') >> nbd_uri = 'nbd+unix:///{}?socket=' + nbd_sock >> >> >> class TestNbdMulticonn(iotests.QMPTestCase): >> def setUp(self): >> assert qemu_img_create('-f', iotests.imgfmt, disk, size) == 0 >> assert qemu_io_silent('-c', 'w -P 1 0 2M', >> '-c', 'w -P 2 2M 2M', disk) == 0 >> >> self.vm = iotests.VM() >> self.vm.launch() >> result = self.vm.qmp('blockdev-add', { >> 'driver': 'qcow2', >> 'node-name': 'n', >> 'file': {'driver': 'file', 'filename': disk} >> }) >> self.assert_qmp(result, 'return', {}) >> >> def tearDown(self): >> self.vm.shutdown() >> os.remove(disk) >> os.remove(nbd_sock) >> >> def server_start(self, max_connections=None): >> args = { >> 'addr': { >> 'type': 'unix', >> 'data': {'path': nbd_sock} >> } >> } >> if max_connections is not None: >> args['max-connections'] = max_connections >> >> result = self.vm.qmp('nbd-server-start', args) >> self.assert_qmp(result, 'return', {}) >> >> def export_add(self, name, writable=None, multi_conn=None): >> args = { >> 'type': 'nbd', >> 'id': name, >> 'node-name': 'n', >> 'name': name, >> } >> if writable is not None: >> args['writable'] = writable >> if multi_conn is not None: >> args['multi-conn'] = multi_conn >> >> result = self.vm.qmp('block-export-add', args) >> self.assert_qmp(result, 'return', {}) >> >> def check_multi_conn(self, export_name, multi_conn): >> cl = nbd.NBD() >> cl.connect_uri(nbd_uri.format(export_name)) >> self.assertEqual(cl.can_multi_conn(), multi_conn) >> cl.shutdown() > > The test will be more clear and the code more reusable if the helper > was doing only connect/disconnect. 
> > @contextmanager > def open_nbd(nbd_uri, export_name): > h = nbd.NBD() > h.connect_uri(nbd_uri.format(export_name)) > try: > yield h > finally: > h.shutdown() > > Any test that needs an nbd handle can do: > > with open_nbd(nbd_uri, export_name) as h: > use the handle... > > A good example of where we can use this is the block status cache test, > which uses complicated qemu-img commands instead of opening > a client and calling block_status a few times. > > And this also cleans up at the end of the test so failure > will not affect the next test. > > The helper can live in the iotest module instead of reinventing it for > every new test. Yes that sounds good. > >> >> def test_default_settings(self): >> self.server_start() >> self.export_add('r') >> self.export_add('w', writable=True) >> self.check_multi_conn('r', True) >> self.check_multi_conn('w', False) > > With the helper suggested, this test will be: > > with open_nbd(nbd_uri, "r") as h: > self.assertTrue(h.can_multi_conn()) > > with open_nbd(nbd_uri, "w") as h: > self.assertFalse(h.can_multi_conn()) > > Now you don't need to check what check_multi_conn() is doing when > reading the tests, and it is very clear what open_nbd() does based > on the name and usage as context manager. Agree > >> >> def test_multiconn_option(self): >> self.server_start() >> self.export_add('r', multi_conn='off') >> self.export_add('w', writable=True, multi_conn='on') > > It will be more natural to use: > > self.start_server() > self.add_export(...) > > In C the other way is more natural because you don't have namespaces > or classes. > >> self.check_multi_conn('r', False) >> self.check_multi_conn('w', True) > > And I think you agree since you did not use: > > self.multi_conn_check(...) 
> Yes) >> >> def test_limited_connections(self): >> self.server_start(max_connections=1) >> self.export_add('r', multi_conn='on') >> self.export_add('w', writable=True, multi_conn='on') >> self.check_multi_conn('r', False) >> self.check_multi_conn('w', False) >> >> def test_parallel_writes(self): >> self.server_start() >> self.export_add('w', writable=True, multi_conn='on') >> >> clients = [nbd.NBD() for _ in range(3)] >> for c in clients: >> c.connect_uri(nbd_uri.format('w')) >> assert c.can_multi_conn() >> >> buf1 = clients[0].pread(1024 * 1024, 0) >> self.assertEqual(buf1, b"\x01" * 1024 * 1024) >> >> buf2 = b"\x03" * 1024 * 1024 >> clients[1].pwrite(buf2, 0) >> clients[2].flush() >> buf1 = clients[0].pread(1024 * 1024, 0) >> >> self.assertEqual(buf1, buf2) >> >> for i in range(3): >> clients[i].shutdown() >> >> >> if __name__ == '__main__': >> iotests.main(supported_fmts=['qcow2']) >> >>> But the shell code for >>> _launch_qemu/_send_qemu_cmd was already pretty nice for setting up a >>> long-running background qemu process where I can pass in QMP commands >>> at will, then interact from other processes. 
>>> >>>>> +export nbd_unix_socket >>>>> + >>>>> +echo >>>>> +echo "=== Default nbd server settings ===" >>>>> +echo >>>>> +# Default allows for unlimited connections, readonly images advertise >>>>> +# multi-conn, and writable images do not >>>>> +_send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-start", >>>>> + "arguments":{"addr":{"type":"unix", >>>>> + "data":{"path":"'"$nbd_unix_socket"'"}}}}' "return" >>>>> +_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-add", >>>>> + "arguments":{"type":"nbd", "id":"r", "node-name":"n", >>>>> + "name":"r"}}' "return" >>>>> +_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-add", >>>>> + "arguments":{"type":"nbd", "id":"w", "node-name":"n", >>>>> + "name":"w", "writable":true}}' "return" >>>>> +nbdsh -u "nbd+unix:///r?socket=$nbd_unix_socket" -c ' >>>>> +assert h.can_multi_conn() >>>>> +h.shutdown() >>>>> +print("nbdsh passed")' >>>>> +nbdsh -u "nbd+unix:///w?socket=$nbd_unix_socket" -c ' >>>>> +assert not h.can_multi_conn() >>>>> +h.shutdown() >>>>> +print("nbdsh passed")' >>>>> >>>> >>>> Mixing of shell and python is very confusing. Wouldn't it be much cleaner >>>> to write the test in python? >>> >>> Here, nbdsh -c 'python snippet' is used as a shell command line >>> parameter. Writing python code to call out to a system() command >>> where one of the arguments to that command is a python script snippet >>> is going to be just as annoying as writing it in bash. 
>>> >>>>> +echo >>>>> +echo "=== Demonstrate parallel writers ===" >>>>> +echo >>>>> +_send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-start", >>>>> + "arguments":{"addr":{"type":"unix", >>>>> + "data":{"path":"'"$nbd_unix_socket"'"}}}}' "return" >>>>> +_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-add", >>>>> + "arguments":{"type":"nbd", "id":"w", "node-name":"n", >>>>> + "name":"", "multi-conn":"on", "writable":true}}' "return" >>>>> + >>>>> +nbdsh -c ' >>>>> +import os >>>>> +sock = os.getenv("nbd_unix_socket") >>>>> +h = [] >>>>> + >>>>> +for i in range(3): >>>>> + h.append(nbd.NBD()) >>>>> + h[i].connect_unix(sock) >>>>> + assert h[i].can_multi_conn() >>>>> >>>> >>>> This is somewhat C in python, maybe: >>>> >>>> handles = [nbd.NBD() for _ in range(3)] >>>> >>>> for h in handles: >>>> h.connect_unix(sock) >>>> assert h.can_multi_conn() >>>> >>>> Since assert will abort the test, and we don't handle >>>> exceptions, failure will not shut down the connections, >>>> but it should not matter for the purpose of a test. >>> >>> As I said, I'm open to python suggestions :) I like your approach. >>> >>>> >>>> >>>>> + >>>>> +buf1 = h[0].pread(1024 * 1024, 0) >>>>> +if buf1 != b"\x01" * 1024 * 1024: >>>>> + print("Unexpected initial read") >>>>> >>>> >>>> Not clear when we initialize the buffer to \x01 - is this the qemu-io above? >>> >>> Yes, when the qcow2 file was initially created. >>> >>>> >>>> >>>>> +buf2 = b"\x03" * 1024 * 1024 >>>>> +h[1].pwrite(buf2, 0) >>>>> +h[2].flush() >>>>> +buf1 = h[0].pread(1024 * 1024, 0) >>>>> +if buf1 == buf2: >>>>> + print("Flush appears to be consistent across connections") >>>>> >>>> >>>> buf1 was the initial content, buf2 is the new content, but now we override >>>> buf1 to check that the write was flushed. It would be better to use >>>> different >>>> names like initial_data, updated_data, current_data. >>> >>> Can do. 
>>> >>>> >>>> This block is the most important part of the test, showing that we can write >>>> in one connection, flush in the second, and read the written block in the >>>> third. >>>> Maybe add a comment about this? I think it will help someone else trying >>>> to understand what this part is trying to test. >>> >>> Can do. >>> >>>>> +{"execute":"block-export-add", >>>>> + "arguments":{"type":"nbd", "id":"w", "node-name":"n", >>>>> + "name":"", "multi-conn":"on", "writable":true}} >>>>> +{"return": {}} >>>>> +Flush appears to be consistent across connections >>>>> +{"execute":"block-export-del", >>>>> + "arguments":{"id":"w"}} >>>>> +{"return": {}} >>>>> +{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, >>>>> "event": "BLOCK_EXPORT_DELETED", "data": {"id": "w"}} >>>>> +{"execute":"nbd-server-stop"} >>>>> +{"return": {}} >>>>> +{"execute":"quit"} >>>>> +{"return": {}} >>>>> +{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, >>>>> "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}} >>>>> >>>> >>>> Nothing about the contents here says anything about the actual test >>>> except the "Flush appears..." line. >>> >>> Yeah, it's a lot of QMP debugging (to show what commands were run in >>> setting up the server), and less verbose in the nbdsh side. Do I need >>> to add more output during the nbdsh that uses multiple connections? >>> >>>> >>>> >>>>> +*** done >>>>> -- >>>>> 2.35.1 >>>>> >>>> >>>> Looks good, feel free to ignore the style comments and suggestion to rewrite >>>> the test in python. >>> >>> The style comments are nice, the rewriting is going to be harder for >>> me (but I'll accept help). At any rate, getting qemu-nbd to be >>> feature-compatible may require a v3 anyway. >>> >> >> >> -- >> Best regards, >> Vladimir >> >
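The open_nbd() helper proposed in this thread rests on the standard contextlib try/finally pattern, which guarantees shutdown even when an assertion in the with-body fails. A self-contained sketch of that pattern follows; FakeHandle is a hypothetical stand-in for nbd.NBD(), used only so the sketch runs without libnbd:

```python
from contextlib import contextmanager

class FakeHandle:
    """Hypothetical stand-in for nbd.NBD(): tracks connect/shutdown state."""
    def __init__(self):
        self.connected = False

    def connect_uri(self, uri):
        self.connected = True

    def shutdown(self):
        self.connected = False

@contextmanager
def open_handle(uri):
    h = FakeHandle()
    h.connect_uri(uri)
    try:
        yield h
    finally:
        h.shutdown()  # runs even if the with-body raises

# Usage mirrors the proposed open_nbd() helper:
with open_handle("nbd+unix:///r?socket=/tmp/nbd_sock") as h:
    assert h.connected
assert not h.connected  # connection cleaned up on exit
```

This is why a failing test cannot leak a connection into the next test: the finally clause runs on both normal exit and exception propagation.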
On Wed, Feb 16, 2022 at 02:44:52PM +0100, Markus Armbruster wrote: > > > > +## > > +# @NbdExportMultiConn: > > +# > > +# Possible settings for advertising NBD multiple client support. > > +# > > +# @off: Do not advertise multiple clients. > > +# > > +# @on: Allow multiple clients (for writable clients, this is only safe > > +# if the underlying BDS is cache-consistent, such as when backed > > +# by the raw file driver); ignored if the NBD server was set up > > +# with max-connections of 1. > > +# > > +# @auto: Behaves like @off if the export is writable, and @on if the > > +# export is read-only. > > +# > > +# Since: 7.0 > > +## > > +{ 'enum': 'NbdExportMultiConn', > > + 'data': ['off', 'on', 'auto'] } > > Have you considered using OnOffAuto from common.json? Sounds good to me. > > Duplicating it as NbdExportMultiConn lets you document the values right > in the enum. If you reuse it, you have to document them where the enum > is used, i.e. ... > > > + > > ## > > # @BlockExportOptionsNbd: > > # > > @@ -95,11 +119,15 @@ > > # the metadata context name "qemu:allocation-depth" to > > # inspect allocation details. (since 5.2) > > # > > +# @multi-conn: Controls whether multiple client support is advertised, if the > > +# server's max-connections is not 1. (default auto, since 7.0) > > +# > > ... here. Yep, that's a good change to make for v3. I'll do it.
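If v3 adopts OnOffAuto from common.json as agreed above, the schema change might look roughly like this, with the per-value semantics documented at the point of use as Markus notes (a sketch of the agreed direction, not the actual v3 patch):

```
# @multi-conn: Controls whether multiple client support is advertised,
#     if the server's max-connections is not 1: "on" always advertises,
#     "off" never does, and "auto" advertises only when the export is
#     read-only.  (default auto, since 7.0)
{ 'struct': 'BlockExportOptionsNbd',
  'base': 'BlockExportOptionsNbdBase',
  'data': { '*bitmaps': ['str'], '*allocation-depth': 'bool',
            '*multi-conn': 'OnOffAuto' } }
```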
On Wed, Feb 16, 2022 at 11:08:06AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> 16.02.2022 02:24, Eric Blake wrote:
> > > > +++ b/tests/qemu-iotests/tests/nbd-multiconn
> > > > @@ -0,0 +1,188 @@
> > > > +#!/usr/bin/env bash
> > > > +# group: rw auto quick
> > > > +#
> > > > +# Test that qemu-nbd MULTI_CONN works
> > > > +#
> > > > +echo
> > > > +echo "=== Initial image setup ==="
> > > > +echo
> > > > +
> > > > +_make_test_img 4M
> > > > +$QEMU_IO -c 'w -P 1 0 2M' -c 'w -P 2 2M 2M' "$TEST_IMG" | _filter_qemu_io
> > > > +_launch_qemu 2> >(_filter_nbd)
> > > > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"qmp_capabilities"}' "return"
> > > > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"blockdev-add",
> > > > +  "arguments":{"driver":"qcow2", "node-name":"n",
> > > > +    "file":{"driver":"file", "filename":"'"$TEST_IMG"'"}}}' "return"
> >
> > I'm not the best at writing python iotests; I welcome a language
> > translation of this aspect.
> >
>
> Let me try:)
>
> #!/usr/bin/env python3
>
> import os
> import iotests
> import nbd
> from iotests import qemu_img_create, qemu_io_silent
>
>
> disk = os.path.join(iotests.test_dir, 'disk')
> size = '4M'
> nbd_sock = os.path.join(iotests.test_dir, 'nbd_sock')
> nbd_uri = 'nbd+unix:///{}?socket=' + nbd_sock

...

Thanks; I'm playing with this (and the improvements suggested in
followup messages) in preparation for a v3 posting.

> > > > +nbdsh -u "nbd+unix:///r?socket=$nbd_unix_socket" -c '
> > > > +assert h.can_multi_conn()
> > > > +h.shutdown()
> > > > +print("nbdsh passed")'
> > > > +nbdsh -u "nbd+unix:///w?socket=$nbd_unix_socket" -c '
> > > > +assert not h.can_multi_conn()
> > > > +h.shutdown()
> > > > +print("nbdsh passed")'
> > > >
> > >
> > > Mixing of shell and python is very confusing. Wouldn't it be much cleaner
> > > to write the test in python?
> >
> > Here, nbdsh -c 'python snippet' is used as a shell command line
> > parameter.  Writing python code to call out to a system() command
> > where one of the arguments to that command is a python script snippet
> > is going to be just as annoying as writing it in bash.

Then again, since libnbd already includes python bindings, we wouldn't
have to detour through nbdsh, but just use the python bindings directly
(and I see that your translation did that).
On Wed, Feb 16, 2022 at 07:14:58PM +0200, Nir Soffer wrote:
> > > I'm not the best at writing python iotests; I welcome a language
> > > translation of this aspect.
> > >
> >
> > Let me try:)
>
> Thanks! This is much nicer and will be easier to maintain.
>
> >
> > #!/usr/bin/env python3
> >
> > import os
> > import iotests
> > import nbd

What happens here if libnbd is not installed?  Is there a way to make
the test gracefully skip instead of error out?

> > def check_multi_conn(self, export_name, multi_conn):
> >     cl = nbd.NBD()
> >     cl.connect_uri(nbd_uri.format(export_name))
> >     self.assertEqual(cl.can_multi_conn(), multi_conn)
> >     cl.shutdown()
>
> The test will be more clear and the code more reusable if the helper
> was doing only connect/disconnect.
>
> @contextmanager
> def open_nbd(nbd_uri, export_name):
>     h = nbd.NBD()
>     h.connect_uri(nbd_uri.format(export_name))

This can throw an exception if the connect fails; should it be inside
the try?  I'm comparing to libnbd/python/t/220-opt-list.py, which also
sets up a context manager.

>     try:
>         yield h
>     finally:
>         h.shutdown()
>
> Any test that need nbd handle can do:
>
>     with open_nbd(nbd_uri, export_name) as h:
>         use the handle...
>
> Good example when we can use this is the block status cache test,
> using complicated qemu-img commands instead of opening
> a client and calling block_status a few times.
>
> And this also cleans up at the end of the test so failure
> will not affect the next test.
>
> The helper can live in the iotest module instead of inventing it for
> every new test.

Moving it into iotest makes the question about what to do if libnbd is
not installed even more important; while we could perhaps catch and
deal with a failed 'import' for this test, skipping the iotest module
due to a failed import has knock-on effects to all other iotests even
when they don't care if libnbd is installed.
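On the connect-before-try question: one way to look at it is that if the connect raises, there is no connected handle to shut down yet, so the `finally` only needs to guard the code that runs after the handle is yielded. A runnable sketch using a stub class in place of `nbd.NBD()` (the stub is invented here purely so the pattern can be demonstrated without libnbd installed):

```python
from contextlib import contextmanager

class StubHandle:
    """Stand-in for nbd.NBD(), tracking whether shutdown was called."""
    def __init__(self):
        self.open = False

    def connect_uri(self, uri):
        if "bad" in uri:
            raise RuntimeError("connect failed")
        self.open = True

    def shutdown(self):
        self.open = False

@contextmanager
def open_nbd(uri):
    h = StubHandle()
    h.connect_uri(uri)  # if this raises, there is nothing to clean up yet
    try:
        yield h
    finally:
        h.shutdown()    # runs even if the test body raises

with open_nbd("nbd+unix:///r?socket=sock") as h:
    assert h.open       # connected inside the with-block
assert not h.open       # shutdown ran on exit
```

If the real `connect_uri` were moved inside the try, the `finally` would attempt `shutdown()` on a handle that never connected, so keeping it outside matches the libnbd example cited above.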
>
> >
> > def test_default_settings(self):
> >     self.server_start()
> >     self.export_add('r'))
> >     self.export_add('w', writable=True)
> >     self.check_multi_conn('r', True)
> >     self.check_multi_conn('w', False)
>
> With the helper suggested, this test will be:
>
>     with open_nbd(nbd_uri, "r") as h:
>         self.assertTrue(h.can_multi_conn())
>
>     with open_nbd(nbd_uri, "w") as h:
>         self.assertFalse(h.can_multi_conn())
>
> Now you don't need to check what check_multicon() is doing when
> reading the tests, and it is very clear what open_nbd() does based
> on the name and usage as context manager.

Yes, I like that aspect of a context manager.

>
> >
> > def test_multiconn_option(self):
> >     self.server_start()
> >     self.export_add('r', multi_conn='off')
> >     self.export_add('w', writable=True, multi_conn='on')
>
> It will be more natural to use:
>
>     self.start_server()
>     self.add_export(...)
>
> In C the other way is more natural because you don't have namespaces
> or classes.
>
> >     self.check_multi_conn('r', False)
> >     self.check_multi_conn('w', True)
>
> And I think you agree since you did not use:
>
>     self.multi_con_check(...)
>

Useful naming advice.
diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
index bdb0f2a41aca..6c99070b99c8 100644
--- a/docs/interop/nbd.txt
+++ b/docs/interop/nbd.txt
@@ -68,3 +68,4 @@ NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", NBD_CMD_CACHE
 * 4.2: NBD_FLAG_CAN_MULTI_CONN for shareable read-only exports,
 NBD_CMD_FLAG_FAST_ZERO
 * 5.2: NBD_CMD_BLOCK_STATUS for "qemu:allocation-depth"
+* 7.0: NBD_FLAG_CAN_MULTI_CONN for shareable writable exports
diff --git a/docs/tools/qemu-nbd.rst b/docs/tools/qemu-nbd.rst
index 6031f9689312..1de785524c36 100644
--- a/docs/tools/qemu-nbd.rst
+++ b/docs/tools/qemu-nbd.rst
@@ -139,8 +139,7 @@ driver options if ``--image-opts`` is specified.
 .. option:: -e, --shared=NUM

   Allow up to *NUM* clients to share the device (default
-  ``1``), 0 for unlimited.  Safe for readers, but for now,
-  consistency is not guaranteed between multiple writers.
+  ``1``), 0 for unlimited.

 .. option:: -t, --persistent
diff --git a/qapi/block-export.json b/qapi/block-export.json
index f183522d0d2c..0a27e8ee84f9 100644
--- a/qapi/block-export.json
+++ b/qapi/block-export.json
@@ -21,7 +21,9 @@
 #                     recreated on the fly while the NBD server is active.
 #                     If missing, it will default to denying access (since 4.0).
 # @max-connections: The maximum number of connections to allow at the same
-#                   time, 0 for unlimited. (since 5.2; default: 0)
+#                   time, 0 for unlimited. Setting this to 1 also stops
+#                   the server from advertising multiple client support
+#                   (since 5.2; default: 0)
 #
 # Since: 4.2
 ##
@@ -50,7 +52,9 @@
 #                     recreated on the fly while the NBD server is active.
 #                     If missing, it will default to denying access (since 4.0).
 # @max-connections: The maximum number of connections to allow at the same
-#                   time, 0 for unlimited. (since 5.2; default: 0)
+#                   time, 0 for unlimited. Setting this to 1 also stops
+#                   the server from advertising multiple client support
+#                   (since 5.2; default: 0).
 #
 # Returns: error if the server is already running.
 #
@@ -79,6 +83,26 @@
 { 'struct': 'BlockExportOptionsNbdBase',
   'data': { '*name': 'str', '*description': 'str' } }

+##
+# @NbdExportMultiConn:
+#
+# Possible settings for advertising NBD multiple client support.
+#
+# @off: Do not advertise multiple clients.
+#
+# @on: Allow multiple clients (for writable clients, this is only safe
+#      if the underlying BDS is cache-consistent, such as when backed
+#      by the raw file driver); ignored if the NBD server was set up
+#      with max-connections of 1.
+#
+# @auto: Behaves like @off if the export is writable, and @on if the
+#        export is read-only.
+#
+# Since: 7.0
+##
+{ 'enum': 'NbdExportMultiConn',
+  'data': ['off', 'on', 'auto'] }
+
 ##
 # @BlockExportOptionsNbd:
 #
@@ -95,11 +119,15 @@
 #             the metadata context name "qemu:allocation-depth" to
 #             inspect allocation details. (since 5.2)
 #
+# @multi-conn: Controls whether multiple client support is advertised, if the
+#              server's max-connections is not 1. (default auto, since 7.0)
+#
 # Since: 5.2
 ##
 { 'struct': 'BlockExportOptionsNbd',
   'base': 'BlockExportOptionsNbdBase',
-  'data': { '*bitmaps': ['str'], '*allocation-depth': 'bool' } }
+  'data': { '*bitmaps': ['str'], '*allocation-depth': 'bool',
+            '*multi-conn': 'NbdExportMultiConn' } }

 ##
 # @BlockExportOptionsVhostUserBlk:
diff --git a/include/block/nbd.h b/include/block/nbd.h
index 78d101b77488..138b9857dbcb 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2016-2020 Red Hat, Inc.
+ * Copyright (C) 2016-2022 Red Hat, Inc.
  * Copyright (C) 2005 Anthony Liguori <anthony@codemonkey.ws>
  *
  * Network Block Device
@@ -346,6 +346,7 @@ void nbd_client_put(NBDClient *client);

 void nbd_server_is_qemu_nbd(bool value);
 bool nbd_server_is_running(void);
+int nbd_server_max_connections(void);
 void nbd_server_start(SocketAddress *addr, const char *tls_creds,
                       const char *tls_authz, uint32_t max_connections,
                       Error **errp);
diff --git a/blockdev-nbd.c b/blockdev-nbd.c
index bdfa7ed3a5a9..0df222e868d8 100644
--- a/blockdev-nbd.c
+++ b/blockdev-nbd.c
@@ -44,6 +44,11 @@ bool nbd_server_is_running(void)
     return nbd_server || is_qemu_nbd;
 }

+int nbd_server_max_connections(void)
+{
+    return nbd_server ? nbd_server->max_connections : -1;
+}
+
 static void nbd_blockdev_client_closed(NBDClient *client, bool ignored)
 {
     nbd_client_put(client);
diff --git a/nbd/server.c b/nbd/server.c
index 9fb2f264023e..f17aacc8139f 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2016-2021 Red Hat, Inc.
+ * Copyright (C) 2016-2022 Red Hat, Inc.
  * Copyright (C) 2005 Anthony Liguori <anthony@codemonkey.ws>
  *
  * Network Block Device Server Side
@@ -1641,7 +1641,7 @@ static int nbd_export_create(BlockExport *blk_exp, BlockExportOptions *exp_args,
     int64_t size;
     uint64_t perm, shared_perm;
     bool readonly = !exp_args->writable;
-    bool shared = !exp_args->writable;
+    bool multi_conn;
     strList *bitmaps;
     size_t i;
     int ret;
@@ -1679,6 +1679,23 @@ static int nbd_export_create(BlockExport *blk_exp, BlockExportOptions *exp_args,
         return size;
     }

+    /*
+     * Determine whether to advertise multi-conn.  Default is auto,
+     * which resolves to on for read-only and off for writable.  But
+     * if the server has max-connections 1, that forces the flag off.
+     */
+    if (!arg->has_multi_conn) {
+        arg->multi_conn = NBD_EXPORT_MULTI_CONN_AUTO;
+    }
+    if (nbd_server_max_connections() == 1) {
+        arg->multi_conn = NBD_EXPORT_MULTI_CONN_OFF;
+    }
+    if (arg->multi_conn == NBD_EXPORT_MULTI_CONN_AUTO) {
+        multi_conn = readonly;
+    } else {
+        multi_conn = arg->multi_conn == NBD_EXPORT_MULTI_CONN_ON;
+    }
+
     /* Don't allow resize while the NBD server is running, otherwise we don't
      * care what happens with the node. */
     blk_get_perm(blk, &perm, &shared_perm);
@@ -1692,11 +1709,11 @@ static int nbd_export_create(BlockExport *blk_exp, BlockExportOptions *exp_args,
     exp->description = g_strdup(arg->description);
     exp->nbdflags = (NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_FLUSH |
                      NBD_FLAG_SEND_FUA | NBD_FLAG_SEND_CACHE);
+    if (multi_conn) {
+        exp->nbdflags |= NBD_FLAG_CAN_MULTI_CONN;
+    }
     if (readonly) {
         exp->nbdflags |= NBD_FLAG_READ_ONLY;
-        if (shared) {
-            exp->nbdflags |= NBD_FLAG_CAN_MULTI_CONN;
-        }
     } else {
         exp->nbdflags |= (NBD_FLAG_SEND_TRIM | NBD_FLAG_SEND_WRITE_ZEROES |
                           NBD_FLAG_SEND_FAST_ZERO);
diff --git a/MAINTAINERS b/MAINTAINERS
index 9814580975c5..24dd800ae3f8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3334,6 +3334,7 @@ F: qemu-nbd.*
 F: blockdev-nbd.c
 F: docs/interop/nbd.txt
 F: docs/tools/qemu-nbd.rst
+F: tests/qemu-iotests/tests/*nbd*
 T: git https://repo.or.cz/qemu/ericb.git nbd
 T: git https://src.openvz.org/scm/~vsementsov/qemu.git nbd
diff --git a/tests/qemu-iotests/tests/nbd-multiconn b/tests/qemu-iotests/tests/nbd-multiconn
new file mode 100755
index 000000000000..0261552f32f2
--- /dev/null
+++ b/tests/qemu-iotests/tests/nbd-multiconn
@@ -0,0 +1,188 @@
+#!/usr/bin/env bash
+# group: rw auto quick
+#
+# Test that qemu-nbd MULTI_CONN works
+#
+# Copyright (C) 2018-2022 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+status=1 # failure is the default!
+
+_cleanup()
+{
+    _cleanup_test_img
+    _cleanup_qemu
+    rm -f "$SOCK_DIR/nbd"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+cd ..
+. ./common.rc
+. ./common.filter
+. ./common.qemu
+. ./common.nbd
+
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+_require_command QEMU_NBD
+
+# Parallel client connections are easier with libnbd than qemu-io:
+if ! (nbdsh --version) >/dev/null 2>&1; then
+    _notrun "nbdsh utility required, skipped this test"
+fi
+
+echo
+echo "=== Initial image setup ==="
+echo
+
+_make_test_img 4M
+$QEMU_IO -c 'w -P 1 0 2M' -c 'w -P 2 2M 2M' "$TEST_IMG" | _filter_qemu_io
+_launch_qemu 2> >(_filter_nbd)
+_send_qemu_cmd $QEMU_HANDLE '{"execute":"qmp_capabilities"}' "return"
+_send_qemu_cmd $QEMU_HANDLE '{"execute":"blockdev-add",
+  "arguments":{"driver":"qcow2", "node-name":"n",
+    "file":{"driver":"file", "filename":"'"$TEST_IMG"'"}}}' "return"
+export nbd_unix_socket
+
+echo
+echo "=== Default nbd server settings ==="
+echo
+# Default allows for unlimited connections, readonly images advertise
+# multi-conn, and writable images do not
+_send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-start",
+  "arguments":{"addr":{"type":"unix",
+    "data":{"path":"'"$nbd_unix_socket"'"}}}}' "return"
+_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-add",
+  "arguments":{"type":"nbd", "id":"r", "node-name":"n",
+    "name":"r"}}' "return"
+_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-add",
+  "arguments":{"type":"nbd", "id":"w", "node-name":"n",
+    "name":"w", "writable":true}}' "return"
+nbdsh -u "nbd+unix:///r?socket=$nbd_unix_socket" -c '
+assert h.can_multi_conn()
+h.shutdown()
+print("nbdsh passed")'
+nbdsh -u "nbd+unix:///w?socket=$nbd_unix_socket" -c '
+assert not h.can_multi_conn()
+h.shutdown()
+print("nbdsh passed")'
+_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-del",
+  "arguments":{"id":"r"}}' "DELETED"
+_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-del",
+  "arguments":{"id":"w"}}' "DELETED"
+
+echo
+echo "=== Per-export overrides of defaults ==="
+echo
+# Can explicitly disable multi-conn for readonly image, and explicitly
+# enable it for writable image
+_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-add",
+  "arguments":{"type":"nbd", "id":"r", "node-name":"n",
+    "name":"r", "multi-conn":"off"}}' "return"
+_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-add",
+  "arguments":{"type":"nbd", "id":"w", "node-name":"n",
+    "name":"w", "multi-conn":"on", "writable":true}}' "return"
+nbdsh -u "nbd+unix:///r?socket=$nbd_unix_socket" -c '
+assert not h.can_multi_conn()
+h.shutdown()
+print("nbdsh passed")'
+nbdsh -u "nbd+unix:///w?socket=$nbd_unix_socket" -c '
+assert h.can_multi_conn()
+h.shutdown()
+print("nbdsh passed")'
+_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-del",
+  "arguments":{"id":"r"}}' "DELETED"
+_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-del",
+  "arguments":{"id":"w"}}' "DELETED"
+_send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-stop"}' "return"
+
+echo
+echo "=== Limit nbd server max-connections ==="
+echo
+# When max-connections is 1, no images advertise multi-conn, even when
+# explicitly requested per export
+_send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-start",
+  "arguments":{"max-connections":1, "addr":{"type":"unix",
+    "data":{"path":"'"$nbd_unix_socket"'"}}}}' "return"
+_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-add",
+  "arguments":{"type":"nbd", "id":"r", "node-name":"n",
+    "name":"r", "multi-conn":"on"}}' "return"
+nbdsh -u "nbd+unix:///r?socket=$nbd_unix_socket" -c '
+assert not h.can_multi_conn()
+h.shutdown()
+print("nbdsh passed")'
+_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-del",
+  "arguments":{"id":"r"}}' "DELETED"
+_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-add",
+  "arguments":{"type":"nbd", "id":"w", "node-name":"n",
+    "name":"w", "multi-conn":"on", "writable":true}}' "return"
+nbdsh -u "nbd+unix:///w?socket=$nbd_unix_socket" -c '
+assert not h.can_multi_conn()
+h.shutdown()
+print("nbdsh passed")'
+_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-del",
+  "arguments":{"id":"w"}}' "DELETED"
+_send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-stop"}' "return"
+
+echo
+echo "=== Demonstrate parallel writers ==="
+echo
+_send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-start",
+  "arguments":{"addr":{"type":"unix",
+    "data":{"path":"'"$nbd_unix_socket"'"}}}}' "return"
+_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-add",
+  "arguments":{"type":"nbd", "id":"w", "node-name":"n",
+    "name":"", "multi-conn":"on", "writable":true}}' "return"
+
+nbdsh -c '
+import os
+sock = os.getenv("nbd_unix_socket")
+h = []
+
+for i in range(3):
+    h.append(nbd.NBD())
+    h[i].connect_unix(sock)
+    assert h[i].can_multi_conn()
+
+buf1 = h[0].pread(1024 * 1024, 0)
+if buf1 != b"\x01" * 1024 * 1024:
+    print("Unexpected initial read")
+buf2 = b"\x03" * 1024 * 1024
+h[1].pwrite(buf2, 0)
+h[2].flush()
+buf1 = h[0].pread(1024 * 1024, 0)
+if buf1 == buf2:
+    print("Flush appears to be consistent across connections")
+
+for i in range(3):
+    h[i].shutdown()
+'
+
+_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-del",
+  "arguments":{"id":"w"}}' "DELETED"
+_send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-stop"}' "return"
+_send_qemu_cmd $QEMU_HANDLE '{"execute":"quit"}' "return"
+wait=yes _cleanup_qemu
+
+# success, all done
+echo '*** done'
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/tests/nbd-multiconn.out b/tests/qemu-iotests/tests/nbd-multiconn.out
new file mode 100644
index 000000000000..a06428e1536a
--- /dev/null
+++ b/tests/qemu-iotests/tests/nbd-multiconn.out
@@ -0,0 +1,112 @@
+QA output created by nbd-multiconn
+
+=== Initial image setup ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=4194304
+wrote 2097152/2097152 bytes at offset 0
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097152/2097152 bytes at offset 2097152
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+{"execute":"qmp_capabilities"}
+{"return": {}}
+{"execute":"blockdev-add",
+  "arguments":{"driver":"IMGFMT", "node-name":"n",
+    "file":{"driver":"file", "filename":"TEST_DIR/t.IMGFMT"}}}
+{"return": {}}
+
+=== Default nbd server settings ===
+
+{"execute":"nbd-server-start",
+  "arguments":{"addr":{"type":"unix",
+    "data":{"path":"SOCK_DIR/qemu-nbd.sock"}}}}
+{"return": {}}
+{"execute":"block-export-add",
+  "arguments":{"type":"nbd", "id":"r", "node-name":"n",
+    "name":"r"}}
+{"return": {}}
+{"execute":"block-export-add",
+  "arguments":{"type":"nbd", "id":"w", "node-name":"n",
+    "name":"w", "writable":true}}
+{"return": {}}
+nbdsh passed
+nbdsh passed
+{"execute":"block-export-del",
+  "arguments":{"id":"r"}}
+{"return": {}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "BLOCK_EXPORT_DELETED", "data": {"id": "r"}}
+{"execute":"block-export-del",
+  "arguments":{"id":"w"}}
+{"return": {}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "BLOCK_EXPORT_DELETED", "data": {"id": "w"}}
+
+=== Per-export overrides of defaults ===
+
+{"execute":"block-export-add",
+  "arguments":{"type":"nbd", "id":"r", "node-name":"n",
+    "name":"r", "multi-conn":"off"}}
+{"return": {}}
+{"execute":"block-export-add",
+  "arguments":{"type":"nbd", "id":"w", "node-name":"n",
+    "name":"w", "multi-conn":"on", "writable":true}}
+{"return": {}}
+nbdsh passed
+nbdsh passed
+{"execute":"block-export-del",
+  "arguments":{"id":"r"}}
+{"return": {}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "BLOCK_EXPORT_DELETED", "data": {"id": "r"}}
+{"execute":"block-export-del",
+  "arguments":{"id":"w"}}
+{"return": {}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "BLOCK_EXPORT_DELETED", "data": {"id": "w"}}
+{"execute":"nbd-server-stop"}
+{"return": {}}
+
+=== Limit nbd server max-connections ===
+
+{"execute":"nbd-server-start",
+  "arguments":{"max-connections":1, "addr":{"type":"unix",
+    "data":{"path":"SOCK_DIR/qemu-nbd.sock"}}}}
+{"return": {}}
+{"execute":"block-export-add",
+  "arguments":{"type":"nbd", "id":"r", "node-name":"n",
+    "name":"r", "multi-conn":"on"}}
+{"return": {}}
+nbdsh passed
+{"execute":"block-export-del",
+  "arguments":{"id":"r"}}
+{"return": {}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "BLOCK_EXPORT_DELETED", "data": {"id": "r"}}
+{"execute":"block-export-add",
+  "arguments":{"type":"nbd", "id":"w", "node-name":"n",
+    "name":"w", "multi-conn":"on", "writable":true}}
+{"return": {}}
+nbdsh passed
+{"execute":"block-export-del",
+  "arguments":{"id":"w"}}
+{"return": {}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "BLOCK_EXPORT_DELETED", "data": {"id": "w"}}
+{"execute":"nbd-server-stop"}
+{"return": {}}
+
+=== Demonstrate parallel writers ===
+
+{"execute":"nbd-server-start",
+  "arguments":{"addr":{"type":"unix",
+    "data":{"path":"SOCK_DIR/qemu-nbd.sock"}}}}
+{"return": {}}
+{"execute":"block-export-add",
+  "arguments":{"type":"nbd", "id":"w", "node-name":"n",
+    "name":"", "multi-conn":"on", "writable":true}}
+{"return": {}}
+Flush appears to be consistent across connections
+{"execute":"block-export-del",
+  "arguments":{"id":"w"}}
+{"return": {}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "BLOCK_EXPORT_DELETED", "data": {"id": "w"}}
+{"execute":"nbd-server-stop"}
+{"return": {}}
+{"execute":"quit"}
+{"return": {}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
+*** done
According to the NBD spec, a server advertising
NBD_FLAG_CAN_MULTI_CONN promises that multiple client connections will
not see any cache inconsistencies: when properly separated by a single
flush, actions performed by one client will be visible to another
client, regardless of which client did the flush.  We satisfy these
conditions in qemu when our block layer is backed by the local
filesystem (by virtue of the semantics of fdatasync(), and the fact
that qemu itself is not buffering writes beyond flushes).  It is
harder to state whether we satisfy these conditions for network-based
protocols, so the safest course of action is to allow users to opt-in
to advertising multi-conn.  We may later tweak defaults to advertise
by default when the block layer can confirm that the underlying
protocol driver is cache consistent between multiple writers, but for
now, this at least allows savvy users (such as virt-v2v or nbdcopy) to
explicitly start qemu-nbd or qemu-storage-daemon with multi-conn
advertisement in a known-safe setup where the client end can then
benefit from parallel clients.

Note, however, that we don't want to advertise MULTI_CONN when we know
that a second client cannot connect (for historical reasons, qemu-nbd
defaults to a single connection while nbd-server-add and QMP commands
default to unlimited connections; but we already have existing means
to let either style of NBD server creation alter those defaults).  The
harder part of this patch is setting up an iotest to demonstrate
behavior of multiple NBD clients to a single server.  It might be
possible with parallel qemu-io processes, but concisely managing that
in shell is painful.  I found it easier to do by relying on the libnbd
project's nbdsh, which means this test will be skipped on platforms
where that is not available.

Signed-off-by: Eric Blake <eblake@redhat.com>
Fixes: https://bugzilla.redhat.com/1708300
---
v1 was in Aug 2021 [1], with further replies in Sep [2] and Oct [3].
[1] https://lists.gnu.org/archive/html/qemu-devel/2021-08/msg04900.html
[2] https://lists.gnu.org/archive/html/qemu-devel/2021-09/msg00038.html
[3] https://lists.gnu.org/archive/html/qemu-devel/2021-10/msg06744.html

Since then, I've tweaked the QAPI to mention 7.0 (instead of 6.2), and
reworked the logic so that default behavior is unchanged for now
(advertising multi-conn on a writable export requires opt-in during the
command line or QMP, but remains default for a readonly export).  I've
also expanded the amount of testing done in the new iotest.

 docs/interop/nbd.txt                       |   1 +
 docs/tools/qemu-nbd.rst                    |   3 +-
 qapi/block-export.json                     |  34 +++-
 include/block/nbd.h                        |   3 +-
 blockdev-nbd.c                             |   5 +
 nbd/server.c                               |  27 ++-
 MAINTAINERS                                |   1 +
 tests/qemu-iotests/tests/nbd-multiconn     | 188 +++++++++++++++++++++
 tests/qemu-iotests/tests/nbd-multiconn.out | 112 ++++++++++++
 9 files changed, 363 insertions(+), 11 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/nbd-multiconn
 create mode 100644 tests/qemu-iotests/tests/nbd-multiconn.out