Message ID | 20211025101735.2060852-1-eesposit@redhat.com |
---|---|
Headers | show |
Series | block layer: split block APIs in global state and I/O | expand |
On 10/25/21 12:17, Emanuele Giuseppe Esposito wrote: [...] > Each function in the GS API will have an assertion, checking > that it is always running under BQL. > I/O functions are instead thread safe (or so should be), meaning > that they *can* run under BQL, but also in an iothread in another > AioContext. Therefore they do not provide any assertion, and > need to be audited manually to verify the correctness. > > Adding assetions has helped finding 2 bugs already, as shown in > my series "Migration: fix missing iothread locking". > > Tested this series by running unit tests, qemu-iotests and qtests > (x86_64). > Some functions in the GS API are used everywhere but not > properly tested. Therefore their assertion is never actually run in > the tests, so despite my very careful auditing, it is not impossible > to exclude that some will trigger while actually using QEMU. > > Patch 1 introduces qemu_in_main_thread(), the function used in > all assertions. This had to be introduced otherwise all unit tests > would fail, since they run in the main loop but use the code in > stubs/iothread.c > Patches 2-14 and 19-25 (with the exception of patch 9, that is an additional > assert) are all structured in the same way: first we split the header > and in the next (even) patch we add assertions. > The rest of the patches ontain either both assertions and split, > or have no assertions. This seems a lot of assertions added in hot-path code. Does it makes sense to use a BLOCK_ASSERT() macro instead, only expanded when configure with --enable-debug?
On Mon, Oct 25, 2021 at 04:09:41PM +0200, Philippe Mathieu-Daudé wrote: > On 10/25/21 12:17, Emanuele Giuseppe Esposito wrote: > [...] > > > Each function in the GS API will have an assertion, checking > > that it is always running under BQL. > > I/O functions are instead thread safe (or so should be), meaning > > that they *can* run under BQL, but also in an iothread in another > > AioContext. Therefore they do not provide any assertion, and > > need to be audited manually to verify the correctness. > > > > Adding assetions has helped finding 2 bugs already, as shown in > > my series "Migration: fix missing iothread locking". > > > > Tested this series by running unit tests, qemu-iotests and qtests > > (x86_64). > > Some functions in the GS API are used everywhere but not > > properly tested. Therefore their assertion is never actually run in > > the tests, so despite my very careful auditing, it is not impossible > > to exclude that some will trigger while actually using QEMU. > > > > Patch 1 introduces qemu_in_main_thread(), the function used in > > all assertions. This had to be introduced otherwise all unit tests > > would fail, since they run in the main loop but use the code in > > stubs/iothread.c > > Patches 2-14 and 19-25 (with the exception of patch 9, that is an additional > > assert) are all structured in the same way: first we split the header > > and in the next (even) patch we add assertions. > > The rest of the patches ontain either both assertions and split, > > or have no assertions. > > This seems a lot of assertions added in hot-path code. > > Does it makes sense to use a BLOCK_ASSERT() macro instead, > only expanded when configure with --enable-debug? I think the assertions are only in the slow path (functions that must be run with the BQL held from the main thread). The I/O request code path does not have new assertions. Stefan
On Mon, Oct 25, 2021 at 06:17:10AM -0400, Emanuele Giuseppe Esposito wrote: > Currently, block layer APIs like block-backend.h contain a mix of > functions that are either running in the main loop and under the > BQL, or are thread-safe functions and run in iothreads performing I/O. > The functions running under BQL also take care of modifying the > block graph, by using drain and/or aio_context_acquire/release. > This makes it very confusing to understand where each function > runs, and what assumptions it provided with regards to thread > safety. > > We call the functions running under BQL "global state (GS) API", and > distinguish them from the thread-safe "I/O API". > > The aim of this series is to split the relevant block headers in > global state and I/O sub-headers. The division will be done in Kevin and Hanna, Does one of you want to review and merge this? It affects the entire block layer and your input would be valuable. Thanks, Stefan > this way: > header.h will be split in header-global-state.h, header-io.h and > header-common.h. The latter will just contain the data structures > needed by header-global-state and header-io, and common helpers > that are neither in GS nor in I/O. header.h will remain for > legacy and to avoid changing all includes in all QEMU c files, > but will only include the two new headers. No function shall be > added in header.c . > Once we split all relevant headers, it will be much easier to see what > uses the AioContext lock and remove it, which is the overall main > goal of this and other series that I posted/will post. > > In addition to splitting the relevant headers shown in this series, > it is also very helpful splitting the function pointers in some > block structures, to understand what runs under AioContext lock and > what doesn't. This is what patches 19-25 do. > > Each function in the GS API will have an assertion, checking > that it is always running under BQL. > I/O functions are instead thread safe (or so should be), meaning > that they *can* run under BQL, but also in an iothread in another > AioContext. Therefore they do not provide any assertion, and > need to be audited manually to verify the correctness. > > Adding assetions has helped finding 2 bugs already, as shown in > my series "Migration: fix missing iothread locking". > > Tested this series by running unit tests, qemu-iotests and qtests > (x86_64). > Some functions in the GS API are used everywhere but not > properly tested. Therefore their assertion is never actually run in > the tests, so despite my very careful auditing, it is not impossible > to exclude that some will trigger while actually using QEMU. > > Patch 1 introduces qemu_in_main_thread(), the function used in > all assertions. This had to be introduced otherwise all unit tests > would fail, since they run in the main loop but use the code in > stubs/iothread.c > Patches 2-14 and 19-25 (with the exception of patch 9, that is an additional > assert) are all structured in the same way: first we split the header > and in the next (even) patch we add assertions. > The rest of the patches ontain either both assertions and split, > or have no assertions. > > Next steps once this get reviewed: > 1) audit the GS API and replace the AioContext lock with drains, > or remove them when not necessary (requires further discussion). > 2) [optional as it should be already the case] audit the I/O API > and check that thread safety is guaranteed > > Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> > --- > v3 -> v4: > * blockdev.h (patch 14): blockdev_mark_auto_del, blockdev_auto_del > and blk_legacy_dinfo as GS API. > * add copyright header to block.h, block-io.h and block-global-state.h > * rebase on current master (c5b2f55981) > > v2 -> v3: > * rename "graph API" into "global state API" > * change order of patches, block.h comes before block-backend.h > * change GS and I/O comment headers to avoid redundancy, all other > headers refer to block-global-state.h and block-io.h > * fix typo on GS and I/O headers > * use assert instead of g_assert > * move bdrv_pwrite_sync, bdrv_block_status and bdrv_co_copy_range_{from/to} > to the I/O API > * change assert_bdrv_graph_writable implementation, since we need > to introduce additional drains > * remove transactions API split > * add preparation patch for blockdev.h (patch 13) > * backup-top -> copy-on-write > * change I/O comment in job.h into a better meaningful explanation > * fix all warnings given by checkpatch, mostly due to /* */ to be > split in separate lines > * rebase on current master (c09124dcb8), and split the following new functions: > blk_replace_bs (I/O) > bdrv_bsc_is_data (I/O) > bdrv_bsc_invalidate_range (I/O) > bdrv_bsc_fill (I/O) > bdrv_new_open_driver_opts (GS) > blk_get_max_hw_iov (I/O) > they are all added in patches 4 and 6. > > v1 -> v2: > * remove the iothread locking bug fix, and send it as separate patch > * rename graph API -> global state API > * better documented patch 1 (qemu_in_main_thread) > * add and split all other block layer headers > * fix warnings given by checkpatch on multiline comments > > Emanuele Giuseppe Esposito (25): > main-loop.h: introduce qemu_in_main_thread() > include/block/block: split header into I/O and global state API > assertions for block global state API > include/sysemu/block-backend: split header into I/O and global state > (GS) API > block/block-backend.c: assertions for block-backend > include/block/block_int: split header into I/O and global state API > assertions for block_int global state API > block: introduce assert_bdrv_graph_writable > include/block/blockjob_int.h: split header into I/O and GS API > assertions for blockjob_int.h > include/block/blockjob.h: global state API > assertions for blockob.h global state API > include/sysemu/blockdev.h: move drive_add and inline drive_def > include/systemu/blockdev.h: global state API > assertions for blockdev.h global state API > include/block/snapshot: global state API + assertions > block/copy-before-write.h: global state API + assertions > block/coroutines: I/O API > block_int-common.h: split function pointers in BlockDriver > block_int-common.h: assertion in the callers of BlockDriver function > pointers > block_int-common.h: split function pointers in BdrvChildClass > block_int-common.h: assertions in the callers of BdrvChildClass > function pointers > block-backend-common.h: split function pointers in BlockDevOps > job.h: split function pointers in JobDriver > job.h: assertions in the callers of JobDriver funcion pointers > > block.c | 188 ++- > block/backup.c | 1 + > block/block-backend.c | 105 +- > block/commit.c | 4 + > block/copy-before-write.c | 2 + > block/copy-before-write.h | 7 + > block/coroutines.h | 6 + > block/dirty-bitmap.c | 1 + > block/io.c | 37 + > block/meson.build | 7 +- > block/mirror.c | 4 + > block/monitor/bitmap-qmp-cmds.c | 6 + > block/monitor/block-hmp-cmds.c | 2 +- > block/snapshot.c | 28 + > block/stream.c | 2 + > blockdev.c | 55 +- > blockjob.c | 14 + > include/block/block-common.h | 389 +++++ > include/block/block-global-state.h | 286 ++++ > include/block/block-io.h | 306 ++++ > include/block/block.h | 878 +---------- > include/block/block_int-common.h | 1193 +++++++++++++++ > include/block/block_int-global-state.h | 327 ++++ > include/block/block_int-io.h | 163 ++ > include/block/block_int.h | 1478 +------------------ > include/block/blockjob.h | 9 + > include/block/blockjob_int.h | 28 + > include/block/snapshot.h | 13 +- > include/qemu/job.h | 16 + > include/qemu/main-loop.h | 13 + > include/sysemu/block-backend-common.h | 92 ++ > include/sysemu/block-backend-global-state.h | 122 ++ > include/sysemu/block-backend-io.h | 139 ++ > include/sysemu/block-backend.h | 269 +--- > include/sysemu/blockdev.h | 24 +- > job.c | 9 + > migration/savevm.c | 2 + > softmmu/cpus.c | 5 + > softmmu/qdev-monitor.c | 2 + > softmmu/vl.c | 25 +- > stubs/iothread-lock.c | 5 + > 41 files changed, 3619 insertions(+), 2643 deletions(-) > create mode 100644 include/block/block-common.h > create mode 100644 include/block/block-global-state.h > create mode 100644 include/block/block-io.h > create mode 100644 include/block/block_int-common.h > create mode 100644 include/block/block_int-global-state.h > create mode 100644 include/block/block_int-io.h > create mode 100644 include/sysemu/block-backend-common.h > create mode 100644 include/sysemu/block-backend-global-state.h > create mode 100644 include/sysemu/block-backend-io.h > > -- > 2.27.0 >
On 25.10.21 12:17, Emanuele Giuseppe Esposito wrote: > Currently, block layer APIs like block-backend.h contain a mix of > functions that are either running in the main loop and under the > BQL, or are thread-safe functions and run in iothreads performing I/O. > The functions running under BQL also take care of modifying the > block graph, by using drain and/or aio_context_acquire/release. > This makes it very confusing to understand where each function > runs, and what assumptions it provided with regards to thread > safety. > > We call the functions running under BQL "global state (GS) API", and > distinguish them from the thread-safe "I/O API". > > The aim of this series is to split the relevant block headers in > global state and I/O sub-headers. Despite leaving quite some comments, the series and the split seem reasonable to me overall. (This is a pretty big series, after all, so those “some comments” stack up against a majority of changes that seem OK to me. :)) One thing I noticed while reviewing is that it’s really hard to verify that no I/O function calls a GS function. What would be wonderful is some function marker like coroutine_fn that marks GS functions (or I/O functions) and that we could then verify the call paths. But AFAIU we’ve always wanted precisely that for coroutine_fn and still don’t have it, so this seems like extremely wishful thinking... :( I think most of the issues I found can be fixed (or are even irrelevant), the only thing that really worries me are the two places that are clearly I/O paths that call permission functions: Namely first block_crypto_amend_options_generic_luks() (part of the luks block driver’s .bdrv_co_amend implementation), which calls bdrv_child_refresh_perms(); and second fuse_do_truncate(), which calls blk_set_perm(). In the first case, we need this call so that we don’t permanently hog the WRITE permission for the luks file, which used to be a problem, I believe. We want to unshare the WRITE permission (and apparently also CONSISTENT_READ) during the key update, so we need some way to temporarily update the permissions. I only really see four solutions for this: (1) We somehow make the amend job run in the main context under the BQL and have it prevent all concurrent I/O access (seems bad) (2) We can make the permission functions part of the I/O path (seems wrong and probably impossible?) (3) We can drop the permissions update and permanently require the permissions that we need when updating keys (I think this might break existing use cases) (4) We can acquire the BQL around the permission update call and perhaps that works? I don’t know how (4) would work but it’s basically the only reasonable solution I can come up with. Would this be a way to call a BQL function from an I/O function? As for the second case, the same applies as above, with the differences that we have no jobs, so this code must always run in the block device’s AioContext (I think), which rules out (1); but (3) would become easier (i.e. require the RESIZE permission all the time), although that too might have an impact on existing users (don’t think so, though). In any case, if we could do (4), that would solve the problem here, too. And finally, another notable thing I noticed is that the way how create-related functions are handled is inconsistent. I believe they should all be GS functions; qmp_blockdev_create() seems to agree with me on this, but we currently seem to have some bugs there. It’s possible to invoke blockdev-create on a block device that’s in an I/O thread, and then qemu crashes. Oops. (The comment in qmp_blockdev_create() says that the block drivers’ implementations should prevent this, but apparently they don’t...?) In any case, that’s a pre-existing bug, of course, that doesn’t concern this series (other than that it suggests that “create” functions should be classified as GS). Hanna
On Mon, Nov 15, 2021 at 05:03:28PM +0100, Hanna Reitz wrote: > On 25.10.21 12:17, Emanuele Giuseppe Esposito wrote: > > Currently, block layer APIs like block-backend.h contain a mix of > > functions that are either running in the main loop and under the > > BQL, or are thread-safe functions and run in iothreads performing I/O. > > The functions running under BQL also take care of modifying the > > block graph, by using drain and/or aio_context_acquire/release. > > This makes it very confusing to understand where each function > > runs, and what assumptions it provided with regards to thread > > safety. > > > > We call the functions running under BQL "global state (GS) API", and > > distinguish them from the thread-safe "I/O API". > > > > The aim of this series is to split the relevant block headers in > > global state and I/O sub-headers. > > Despite leaving quite some comments, the series and the split seem > reasonable to me overall. (This is a pretty big series, after all, so those > “some comments” stack up against a majority of changes that seem OK to me. > :)) > > One thing I noticed while reviewing is that it’s really hard to verify that > no I/O function calls a GS function. What would be wonderful is some > function marker like coroutine_fn that marks GS functions (or I/O functions) > and that we could then verify the call paths. But AFAIU we’ve always wanted > precisely that for coroutine_fn and still don’t have it, so this seems like > extremely wishful thinking... :( Even if we don't programmatically verify these rules, it would be a major step forward if we simply adopted a standard naming convention for the APIs that was essentally self-describing. eg block_io_XXX for all I/O stuff and block_state_XXXX for all global state ,and block_common if we have stuff applicable to both use cases. I wouldn't suggest doing it as part of this series, but I think it'd be worthwhile in general for the medium-long term, despite the code churn in updating all usage in the short term. Regards, Daniel
On 11/15/21 17:03, Hanna Reitz wrote: > > I only really see four solutions for this: > (1) We somehow make the amend job run in the main context under the BQL > and have it prevent all concurrent I/O access (seems bad) > (2) We can make the permission functions part of the I/O path (seems > wrong and probably impossible?) > (3) We can drop the permissions update and permanently require the > permissions that we need when updating keys (I think this might break > existing use cases) > (4) We can acquire the BQL around the permission update call and perhaps > that works? > > I don’t know how (4) would work but it’s basically the only reasonable > solution I can come up with. Would this be a way to call a BQL function > from an I/O function? I think that would deadlock: main I/O thread -------- ----- start bdrv_co_amend take BQL bdrv_drain ... hangs ... (2) is definitely wrong. (3) I have no idea. Would it be possible or meaningful to do the bdrv_child_refresh_perms in qmp_x_blockdev_amend? It seems that all users need it, and in general it seems weird to amend a qcow2 or luks header (and thus the meaning of parts of the file) while others can write to the same file. Paolo
On 11/15/21 17:03, Hanna Reitz wrote:
> and second fuse_do_truncate(), which calls blk_set_perm().
Here it seems that a non-growable export is still growable as long as
nobody is watching. :) Is this the desired behavior?
Paolo
On 18.11.21 15:04, Paolo Bonzini wrote: > On 11/15/21 17:03, Hanna Reitz wrote: >> and second fuse_do_truncate(), which calls blk_set_perm(). > > Here it seems that a non-growable export is still growable as long as > nobody is watching. :) Is this the desired behavior? Yes, absolutely. “Growable” is documented to mean that writes after the end of the exported file will grow it to fit. Explicit truncating is something else, and I believe we should allow it on all writable exports. (Of course only when other potential users of the block node in question allow it to be resized, but that’s what the permission is for.) Hanna
On 18.11.21 14:50, Paolo Bonzini wrote: > On 11/15/21 17:03, Hanna Reitz wrote: >> >> I only really see four solutions for this: >> (1) We somehow make the amend job run in the main context under the >> BQL and have it prevent all concurrent I/O access (seems bad) >> (2) We can make the permission functions part of the I/O path (seems >> wrong and probably impossible?) >> (3) We can drop the permissions update and permanently require the >> permissions that we need when updating keys (I think this might break >> existing use cases) >> (4) We can acquire the BQL around the permission update call and >> perhaps that works? >> >> I don’t know how (4) would work but it’s basically the only >> reasonable solution I can come up with. Would this be a way to call >> a BQL function from an I/O function? > > I think that would deadlock: > > main I/O thread > -------- ----- > start bdrv_co_amend > take BQL > bdrv_drain > ... hangs ... :/ Is there really nothing we can do? Forgive me if I’m talking complete nonsense here (because frankly I don’t even really know what a bottom half is exactly), but can’t we schedule some coroutine in the main thread to do the perm notifications and wait for them in the I/O thread? > (2) is definitely wrong. > > (3) I have no idea. > > Would it be possible or meaningful to do the bdrv_child_refresh_perms > in qmp_x_blockdev_amend? It seems that all users need it, and in > general it seems weird to amend a qcow2 or luks header (and thus the > meaning of parts of the file) while others can write to the same file. Hmm... Perhaps. We would need to undo the permission change when the job finishes, though, i.e. in JobDriver.prepare() or JobDriver.clean(). Doing the change in qmp_x_blockdev_amend() would be asymmetric then, so we’d probably want a new JobDriver method that runs in the main thread before .run() is invoked. (Unfortunately, “.prepare()” is now taken already...) Doesn’t solve the FUSE problem, but there we could try to just take the RESIZE permission permanently and if that fails, we just don’t allow truncates for that export. Not nice, but should work for common cases. Hanna
El jue., 18 nov. 2021 16:31, Hanna Reitz <hreitz@redhat.com> escribió: > On 18.11.21 14:50, Paolo Bonzini wrote: > > On 11/15/21 17:03, Hanna Reitz wrote: > >> > >> I only really see four solutions for this: > >> (1) We somehow make the amend job run in the main context under the > >> BQL and have it prevent all concurrent I/O access (seems bad) > >> (2) We can make the permission functions part of the I/O path (seems > >> wrong and probably impossible?) > >> (3) We can drop the permissions update and permanently require the > >> permissions that we need when updating keys (I think this might break > >> existing use cases) > >> (4) We can acquire the BQL around the permission update call and > >> perhaps that works? > >> > >> I don’t know how (4) would work but it’s basically the only > >> reasonable solution I can come up with. Would this be a way to call > >> a BQL function from an I/O function? > > > > I think that would deadlock: > > > > main I/O thread > > -------- ----- > > start bdrv_co_amend > > take BQL > > bdrv_drain > > ... hangs ... > > :/ > > Is there really nothing we can do? Forgive me if I’m talking complete > nonsense here (because frankly I don’t even really know what a bottom > half is exactly), but can’t we schedule some coroutine in the main > thread to do the perm notifications and wait for them in the I/O thread? > I think you still get a deadlock, just one with a longer chain. You still have a cycle of things depending on each other, but one of them is now the I/O thread waiting for the bottom half. Hmm... Perhaps. We would need to undo the permission change when the > job finishes, though, i.e. in JobDriver.prepare() or JobDriver.clean(). > Doing the change in qmp_x_blockdev_amend() would be asymmetric then, so > we’d probably want a new JobDriver method that runs in the main thread > before .run() is invoked. (Unfortunately, “.prepare()” is now taken > already...) > Ok at least it's feasible. Doesn’t solve the FUSE problem, but there we could try to just take the > RESIZE permission permanently and if that fails, we just don’t allow > truncates for that export. Not nice, but should work for common cases. > Yeah definitely not nice. Probably permissions could be protected by their own mutex, even a global one like the one we have for jobs. For now I suggest just ignoring the problem and adding a comment, since it's not really something that didn't exist. Paolo
On 19/11/2021 04:13, Paolo Bonzini wrote: > > > El jue., 18 nov. 2021 16:31, Hanna Reitz <hreitz@redhat.com > <mailto:hreitz@redhat.com>> escribió: > > On 18.11.21 14:50, Paolo Bonzini wrote: > > On 11/15/21 17:03, Hanna Reitz wrote: > >> > >> I only really see four solutions for this: > >> (1) We somehow make the amend job run in the main context under the > >> BQL and have it prevent all concurrent I/O access (seems bad) > >> (2) We can make the permission functions part of the I/O path > (seems > >> wrong and probably impossible?) > >> (3) We can drop the permissions update and permanently require the > >> permissions that we need when updating keys (I think this might > break > >> existing use cases) > >> (4) We can acquire the BQL around the permission update call and > >> perhaps that works? > >> > >> I don’t know how (4) would work but it’s basically the only > >> reasonable solution I can come up with. Would this be a way to > call > >> a BQL function from an I/O function? > > > > I think that would deadlock: > > > > main I/O thread > > -------- ----- > > start bdrv_co_amend > > take BQL > > bdrv_drain > > ... hangs ... > > :/ > > Is there really nothing we can do? Forgive me if I’m talking complete > nonsense here (because frankly I don’t even really know what a bottom > half is exactly), but can’t we schedule some coroutine in the main > thread to do the perm notifications and wait for them in the I/O thread? > > > I think you still get a deadlock, just one with a longer chain. You > still have a cycle of things depending on each other, but one of them is > now the I/O thread waiting for the bottom half. > > Hmm... Perhaps. We would need to undo the permission change when the > job finishes, though, i.e. in JobDriver.prepare() or JobDriver.clean(). > Doing the change in qmp_x_blockdev_amend() would be asymmetric then, so > we’d probably want a new JobDriver method that runs in the main thread > before .run() is invoked. (Unfortunately, “.prepare()” is now taken > already...) > > > Ok at least it's feasible. Ok I think I got it. I will create a new callback, maybe "pre_run" or something like that to perform the first bdrv_child_refresh_perms and implement the .clean callback to perform the "cleanup" bdrv_child_refresh_perms in block_crypto_amend_options_generic_luks. > > Doesn’t solve the FUSE problem, but there we could try to just take the > RESIZE permission permanently and if that fails, we just don’t allow > truncates for that export. Not nice, but should work for common cases. > > > Yeah definitely not nice. Probably permissions could be protected by > their own mutex, even a global one like the one we have for jobs. For > now I suggest just ignoring the problem and adding a comment, since it's > not really something that didn't exist. > Will add a TODO in blk_set/get permissions explaining the issue. Last issue we had with regards to permissions in GS had to do with bdrv_co_invalidate_cache: however, Paolo suggested me a simple fix to simply assert that the function is either under BQL or does not have open_flags & BDRV_O_INACTIVE set. This basically skips the permission code block, entering it only if we have the BQL. Ok, apart from this permissions issue and assert_bdrv_graph_writable, I should have addressed all main comments of this series. Assume that for the others where I did not explicitly answered, I agree and applied your comments. Thank you, Emanuele
Currently, block layer APIs like block-backend.h contain a mix of functions that are either running in the main loop and under the BQL, or are thread-safe functions and run in iothreads performing I/O. The functions running under BQL also take care of modifying the block graph, by using drain and/or aio_context_acquire/release. This makes it very confusing to understand where each function runs, and what assumptions it provided with regards to thread safety. We call the functions running under BQL "global state (GS) API", and distinguish them from the thread-safe "I/O API". The aim of this series is to split the relevant block headers in global state and I/O sub-headers. The division will be done in this way: header.h will be split in header-global-state.h, header-io.h and header-common.h. The latter will just contain the data structures needed by header-global-state and header-io, and common helpers that are neither in GS nor in I/O. header.h will remain for legacy and to avoid changing all includes in all QEMU c files, but will only include the two new headers. No function shall be added in header.c . Once we split all relevant headers, it will be much easier to see what uses the AioContext lock and remove it, which is the overall main goal of this and other series that I posted/will post. In addition to splitting the relevant headers shown in this series, it is also very helpful splitting the function pointers in some block structures, to understand what runs under AioContext lock and what doesn't. This is what patches 19-25 do. Each function in the GS API will have an assertion, checking that it is always running under BQL. I/O functions are instead thread safe (or so should be), meaning that they *can* run under BQL, but also in an iothread in another AioContext. Therefore they do not provide any assertion, and need to be audited manually to verify the correctness. Adding assetions has helped finding 2 bugs already, as shown in my series "Migration: fix missing iothread locking". Tested this series by running unit tests, qemu-iotests and qtests (x86_64). Some functions in the GS API are used everywhere but not properly tested. Therefore their assertion is never actually run in the tests, so despite my very careful auditing, it is not impossible to exclude that some will trigger while actually using QEMU. Patch 1 introduces qemu_in_main_thread(), the function used in all assertions. This had to be introduced otherwise all unit tests would fail, since they run in the main loop but use the code in stubs/iothread.c Patches 2-14 and 19-25 (with the exception of patch 9, that is an additional assert) are all structured in the same way: first we split the header and in the next (even) patch we add assertions. The rest of the patches ontain either both assertions and split, or have no assertions. Next steps once this get reviewed: 1) audit the GS API and replace the AioContext lock with drains, or remove them when not necessary (requires further discussion). 2) [optional as it should be already the case] audit the I/O API and check that thread safety is guaranteed Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> --- v3 -> v4: * blockdev.h (patch 14): blockdev_mark_auto_del, blockdev_auto_del and blk_legacy_dinfo as GS API. * add copyright header to block.h, block-io.h and block-global-state.h * rebase on current master (c5b2f55981) v2 -> v3: * rename "graph API" into "global state API" * change order of patches, block.h comes before block-backend.h * change GS and I/O comment headers to avoid redundancy, all other headers refer to block-global-state.h and block-io.h * fix typo on GS and I/O headers * use assert instead of g_assert * move bdrv_pwrite_sync, bdrv_block_status and bdrv_co_copy_range_{from/to} to the I/O API * change assert_bdrv_graph_writable implementation, since we need to introduce additional drains * remove transactions API split * add preparation patch for blockdev.h (patch 13) * backup-top -> copy-on-write * change I/O comment in job.h into a better meaningful explanation * fix all warnings given by checkpatch, mostly due to /* */ to be split in separate lines * rebase on current master (c09124dcb8), and split the following new functions: blk_replace_bs (I/O) bdrv_bsc_is_data (I/O) bdrv_bsc_invalidate_range (I/O) bdrv_bsc_fill (I/O) bdrv_new_open_driver_opts (GS) blk_get_max_hw_iov (I/O) they are all added in patches 4 and 6. v1 -> v2: * remove the iothread locking bug fix, and send it as separate patch * rename graph API -> global state API * better documented patch 1 (qemu_in_main_thread) * add and split all other block layer headers * fix warnings given by checkpatch on multiline comments Emanuele Giuseppe Esposito (25): main-loop.h: introduce qemu_in_main_thread() include/block/block: split header into I/O and global state API assertions for block global state API include/sysemu/block-backend: split header into I/O and global state (GS) API block/block-backend.c: assertions for block-backend include/block/block_int: split header into I/O and global state API assertions for block_int global state API block: introduce assert_bdrv_graph_writable include/block/blockjob_int.h: split header into I/O and GS API assertions for blockjob_int.h include/block/blockjob.h: global state API assertions for blockob.h global state API include/sysemu/blockdev.h: move drive_add and inline drive_def include/systemu/blockdev.h: global state API assertions for blockdev.h global state API include/block/snapshot: global state API + assertions block/copy-before-write.h: global state API + assertions block/coroutines: I/O API block_int-common.h: split function pointers in BlockDriver block_int-common.h: assertion in the callers of BlockDriver function pointers block_int-common.h: split function pointers in BdrvChildClass block_int-common.h: assertions in the callers of BdrvChildClass function pointers block-backend-common.h: split function pointers in BlockDevOps job.h: split function pointers in JobDriver job.h: assertions in the callers of JobDriver funcion pointers block.c | 188 ++- block/backup.c | 1 + block/block-backend.c | 105 +- block/commit.c | 4 + block/copy-before-write.c | 2 + block/copy-before-write.h | 7 + block/coroutines.h | 6 + block/dirty-bitmap.c | 1 + block/io.c | 37 + block/meson.build | 7 +- block/mirror.c | 4 + block/monitor/bitmap-qmp-cmds.c | 6 + block/monitor/block-hmp-cmds.c | 2 +- block/snapshot.c | 28 + block/stream.c | 2 + blockdev.c | 55 +- blockjob.c | 14 + include/block/block-common.h | 389 +++++ include/block/block-global-state.h | 286 ++++ include/block/block-io.h | 306 ++++ include/block/block.h | 878 +---------- include/block/block_int-common.h | 1193 +++++++++++++++ include/block/block_int-global-state.h | 327 ++++ include/block/block_int-io.h | 163 ++ include/block/block_int.h | 1478 +------------------ include/block/blockjob.h | 9 + include/block/blockjob_int.h | 28 + include/block/snapshot.h | 13 +- include/qemu/job.h | 16 + include/qemu/main-loop.h | 13 + include/sysemu/block-backend-common.h | 92 ++ include/sysemu/block-backend-global-state.h | 122 ++ include/sysemu/block-backend-io.h | 139 ++ include/sysemu/block-backend.h | 269 +--- include/sysemu/blockdev.h | 24 +- job.c | 9 + migration/savevm.c | 2 + softmmu/cpus.c | 5 + softmmu/qdev-monitor.c | 2 + softmmu/vl.c | 25 +- stubs/iothread-lock.c | 5 + 41 files changed, 3619 insertions(+), 2643 deletions(-) create mode 100644 include/block/block-common.h create mode 100644 include/block/block-global-state.h create mode 100644 include/block/block-io.h create mode 100644 include/block/block_int-common.h create mode 100644 include/block/block_int-global-state.h create mode 100644 include/block/block_int-io.h create mode 100644 include/sysemu/block-backend-common.h create mode 100644 include/sysemu/block-backend-global-state.h create mode 100644 include/sysemu/block-backend-io.h