Message ID | 153452570408.13804.6971118770752634964.stgit@gimli.home |
---|---|
State | New |
Headers | show |
Series | [PULL,1/4] balloon: Allow multiple inhibit users | expand |
This breaks qemu-io test for me. #0 0x000003ff98f3e2d4 in raise () at /lib64/libc.so.6 #1 0x000003ff98f239a8 in abort () at /lib64/libc.so.6 #2 0x000003ff98f3632e in __assert_fail_base () at /lib64/libc.so.6 #3 0x000003ff98f363ac in () at /lib64/libc.so.6 #4 0x000000000108301a in qemu_balloon_inhibit (state=state@entry=false) at /home/cborntra/REPOS/qemu/balloon.c:56 #5 0x00000000012026f2 in postcopy_ram_incoming_cleanup (mis=mis@entry=0x163bc5c0) at /home/cborntra/REPOS/qemu/migration/postcopy-ram.c:542 #6 0x00000000011eed5e in process_incoming_migration_co (opaque=<optimized out>) at /home/cborntra/REPOS/qemu/migration/migration.c:407 #7 0x0000000001374000 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at /home/cborntra/REPOS/qemu/util/coroutine-ucontext.c:116 #8 0x000003ff98f53c02 in __makecontext_ret () at /lib64/libc.so.6 Cant see right now whats wrong. On 08/17/2018 07:08 PM, Alex Williamson wrote: > A simple true/false internal state does not allow multiple users. Fix > this within the existing interface by converting to a counter, so long > as the counter is elevated, ballooning is inhibited. > > Reviewed-by: David Hildenbrand <david@redhat.com> > Reviewed-by: Peter Xu <peterx@redhat.com> > Reviewed-by: Cornelia Huck <cohuck@redhat.com> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com> > --- > balloon.c | 13 ++++++++++--- > 1 file changed, 10 insertions(+), 3 deletions(-) > > diff --git a/balloon.c b/balloon.c > index 6bf0a9681377..931987983858 100644 > --- a/balloon.c > +++ b/balloon.c > @@ -26,6 +26,7 @@ > > #include "qemu/osdep.h" > #include "qemu-common.h" > +#include "qemu/atomic.h" > #include "exec/cpu-common.h" > #include "sysemu/kvm.h" > #include "sysemu/balloon.h" > @@ -37,16 +38,22 @@ > static QEMUBalloonEvent *balloon_event_fn; > static QEMUBalloonStatus *balloon_stat_fn; > static void *balloon_opaque; > -static bool balloon_inhibited; > +static int balloon_inhibit_count; > > bool qemu_balloon_is_inhibited(void) > { > - return balloon_inhibited; > + return atomic_read(&balloon_inhibit_count) > 0; > } > > void qemu_balloon_inhibit(bool state) > { > - balloon_inhibited = state; > + if (state) { > + atomic_inc(&balloon_inhibit_count); > + } else { > + atomic_dec(&balloon_inhibit_count); > + } > + > + assert(atomic_read(&balloon_inhibit_count) >= 0); > } > > static bool have_balloon(Error **errp) > >
On Wed, 22 Aug 2018 17:56:12 +0200 Christian Borntraeger <borntraeger@de.ibm.com> wrote: > This breaks qemu-io test for me. > > #0 0x000003ff98f3e2d4 in raise () at /lib64/libc.so.6 > #1 0x000003ff98f239a8 in abort () at /lib64/libc.so.6 > #2 0x000003ff98f3632e in __assert_fail_base () at /lib64/libc.so.6 > #3 0x000003ff98f363ac in () at /lib64/libc.so.6 > #4 0x000000000108301a in qemu_balloon_inhibit (state=state@entry=false) at /home/cborntra/REPOS/qemu/balloon.c:56 > #5 0x00000000012026f2 in postcopy_ram_incoming_cleanup (mis=mis@entry=0x163bc5c0) at /home/cborntra/REPOS/qemu/migration/postcopy-ram.c:542 > #6 0x00000000011eed5e in process_incoming_migration_co (opaque=<optimized out>) at /home/cborntra/REPOS/qemu/migration/migration.c:407 > #7 0x0000000001374000 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at /home/cborntra/REPOS/qemu/util/coroutine-ucontext.c:116 > #8 0x000003ff98f53c02 in __makecontext_ret () at /lib64/libc.so.6 > > Cant see right now whats wrong. I'd have to guess it's the assert added to verify increments and decrements are balanced, which suggests postcopy is de-inhibiting more than it's inhibiting :-\ Thanks, Alex
On Wed, 22 Aug 2018 11:47:09 -0600 Alex Williamson <alex.williamson@redhat.com> wrote: > On Wed, 22 Aug 2018 17:56:12 +0200 > Christian Borntraeger <borntraeger@de.ibm.com> wrote: > > > This breaks qemu-io test for me. > > > > #0 0x000003ff98f3e2d4 in raise () at /lib64/libc.so.6 > > #1 0x000003ff98f239a8 in abort () at /lib64/libc.so.6 > > #2 0x000003ff98f3632e in __assert_fail_base () at /lib64/libc.so.6 > > #3 0x000003ff98f363ac in () at /lib64/libc.so.6 > > #4 0x000000000108301a in qemu_balloon_inhibit (state=state@entry=false) at /home/cborntra/REPOS/qemu/balloon.c:56 > > #5 0x00000000012026f2 in postcopy_ram_incoming_cleanup (mis=mis@entry=0x163bc5c0) at /home/cborntra/REPOS/qemu/migration/postcopy-ram.c:542 > > #6 0x00000000011eed5e in process_incoming_migration_co (opaque=<optimized out>) at /home/cborntra/REPOS/qemu/migration/migration.c:407 > > #7 0x0000000001374000 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at /home/cborntra/REPOS/qemu/util/coroutine-ucontext.c:116 > > #8 0x000003ff98f53c02 in __makecontext_ret () at /lib64/libc.so.6 > > > > Cant see right now whats wrong. > > I'd have to guess it's the assert added to verify increments and > decrements are balanced, which suggests postcopy is de-inhibiting more > than it's inhibiting :-\ Thanks, When postcopy inhibits ballooning, it does so from the migration stream at loaddvm_process_command(). If the command is MIG_CMD_POSTCOPY_LISTEN, this calls loadvm_postcopy_handle_listen(). If postcopy RAM is enabled, this calls postcopy_ram_enable_notify(), which is where the balloon inhibitor is triggered. I think the postcopy_ram_listen_thread, which is created at the end of loadvm_postcopy_handle_listen(), is intended to be where the inhibit would be lifted, but I note a couple error paths before that where the inhibit is not released. postcopy_ram_listen_thread() calls postcopy_ram_incoming_cleanup(), which always releases the balloon inhibit, regardless of the postcopy RAM capability, and of course there are some error cases in advance of this that would again leave the inhibit in place. We also have the coroutine process_incoming_migration_co, launched from migration_incoming_process() which seems to do a gratuitous call to postcopy_ram_incoming_cleanup() and will release the inhibit regardless of whether it was activated previous. So apparently I assumed too much about the existing balloon inhibitor use case, it's clearly not as general as the interface would lead one to believe. The immediate solution is probably to make a postcopy specific wrapper around qemu_balloon_inhibit() that keeps a static internal state to avoid over incrementing or decrementing. That should resolve this and then we can move on to whether we want to change the overall internal interface or fixup some of the other cases noted above. Thanks, Alex
On Wed, 22 Aug 2018 13:01:39 -0600 Alex Williamson <alex.williamson@redhat.com> wrote: > So apparently I assumed too much about the existing balloon inhibitor > use case, it's clearly not as general as the interface would lead one > to believe. The immediate solution is probably to make a postcopy > specific wrapper around qemu_balloon_inhibit() that keeps a static > internal state to avoid over incrementing or decrementing. That should > resolve this and then we can move on to whether we want to change the > overall internal interface or fixup some of the other cases noted > above. Thanks, > > Alex > My gut feeling is that an interface that allows code to add yet-another-inhibitor and to remove it again (IOW, what this patch implemented) is the way to go and that we should push out any checking whether a inhibitor actually can be removed to the callers. So, I think it actually makes a lot of sense for the postcopy code to keep track whether it actually submitted 'its' inhibit and only call into the generic inhibitor code to release it if that's the case.
diff --git a/balloon.c b/balloon.c index 6bf0a9681377..931987983858 100644 --- a/balloon.c +++ b/balloon.c @@ -26,6 +26,7 @@ #include "qemu/osdep.h" #include "qemu-common.h" +#include "qemu/atomic.h" #include "exec/cpu-common.h" #include "sysemu/kvm.h" #include "sysemu/balloon.h" @@ -37,16 +38,22 @@ static QEMUBalloonEvent *balloon_event_fn; static QEMUBalloonStatus *balloon_stat_fn; static void *balloon_opaque; -static bool balloon_inhibited; +static int balloon_inhibit_count; bool qemu_balloon_is_inhibited(void) { - return balloon_inhibited; + return atomic_read(&balloon_inhibit_count) > 0; } void qemu_balloon_inhibit(bool state) { - balloon_inhibited = state; + if (state) { + atomic_inc(&balloon_inhibit_count); + } else { + atomic_dec(&balloon_inhibit_count); + } + + assert(atomic_read(&balloon_inhibit_count) >= 0); } static bool have_balloon(Error **errp)