Message ID | 20170327132242.17104-1-lvivier@redhat.com |
---|---|
State | New |
Headers | show |
Adding cc: Bharata. On 27/03/2017 15:22, Laurent Vivier wrote: > If, once the kernel has booted, we try to remove a memory > hotplugged while the kernel was not started, QEMU crashes on > an assert: > > qemu-system-ppc64: hw/virtio/vhost.c:651: > vhost_commit: Assertion `r >= 0' failed. > ... > #4 in vhost_commit > #5 in memory_region_transaction_commit > #6 in pc_dimm_memory_unplug > #7 in spapr_memory_unplug > #8 spapr_machine_device_unplug > #9 in hotplug_handler_unplug > #10 in spapr_lmb_release > #11 in detach > #12 in set_allocation_state > #13 in rtas_set_indicator > ... > > If we take a closer look to the guest kernel log, we can see when > we try to unplug the memory: > > pseries-hotplug-mem: Attempting to hot-add 4 LMB(s) > > What happens: > > 1- The kernel has ignored the memory hotplug event because > it was not started when it was generated. > > 2- When we hot-unplug the memory, > QEMU starts to remove the memory, > generates an hot-unplug event, > and signals the kernel of the incoming new event > > 3- as the kernel is started, on the QEMU signal, it reads > the event list, decodes the hotplug event and tries to > finish the hotplugging. > > 4- QEMU receives the hotplug notification while it > is trying to hot-unplug the memory. This moves the memory > DRC to an invalid state > > This patch prevents this by not allowing to set the allocation > state to USABLE while the DRC is awaiting release and allocation. > > RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1432382 > > Signed-off-by: Laurent Vivier <lvivier@redhat.com> > --- > hw/ppc/spapr_drc.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c > index 150f6bf..377ea65 100644 > --- a/hw/ppc/spapr_drc.c > +++ b/hw/ppc/spapr_drc.c > @@ -135,6 +135,12 @@ static uint32_t set_allocation_state(sPAPRDRConnector *drc, > if (!drc->dev) { > return RTAS_OUT_NO_SUCH_INDICATOR; > } > + if (drc->awaiting_release && drc->awaiting_allocation) { > + /* kernel is acknowledging a previous hotplug event > + * while we are already removing it. > + */ I'm wondering if I should set drc->awaiting_allocation to false here? > + return RTAS_OUT_NO_SUCH_INDICATOR; > + } > } > > if (drc->type != SPAPR_DR_CONNECTOR_TYPE_PCI) { > Thanks, Laurent
Quoting Laurent Vivier (2017-03-27 08:22:42) > If, once the kernel has booted, we try to remove a memory > hotplugged while the kernel was not started, QEMU crashes on > an assert: > > qemu-system-ppc64: hw/virtio/vhost.c:651: > vhost_commit: Assertion `r >= 0' failed. > ... > #4 in vhost_commit > #5 in memory_region_transaction_commit > #6 in pc_dimm_memory_unplug > #7 in spapr_memory_unplug > #8 spapr_machine_device_unplug > #9 in hotplug_handler_unplug > #10 in spapr_lmb_release > #11 in detach > #12 in set_allocation_state > #13 in rtas_set_indicator > ... > > If we take a closer look to the guest kernel log, we can see when > we try to unplug the memory: > > pseries-hotplug-mem: Attempting to hot-add 4 LMB(s) > > What happens: > > 1- The kernel has ignored the memory hotplug event because > it was not started when it was generated. > > 2- When we hot-unplug the memory, > QEMU starts to remove the memory, > generates an hot-unplug event, > and signals the kernel of the incoming new event > > 3- as the kernel is started, on the QEMU signal, it reads > the event list, decodes the hotplug event and tries to > finish the hotplugging. > > 4- QEMU receives the hotplug notification while it > is trying to hot-unplug the memory. This moves the memory > DRC to an invalid state So, IIUC, it looks like the the memory is already onlined at boot-time, and when it attempts to handle the hotplug event the kernel does the following: static int dlpar_memory_add_by_count(u32 lmbs_to_add, struct property *prop) { ... pr_info("Attempting to hot-add %d LMB(s)\n", lmbs_to_add); ... for (i = 0; i < num_lmbs && lmbs_to_add != lmbs_added; i++) { rc = dlpar_acquire_drc(lmbs[i].drc_index); if (rc) continue; rc = dlpar_add_lmb(&lmbs[i]); if (rc) { dlpar_release_drc(lmbs[i].drc_index); continue; } lmbs_added++; /* Mark this lmb so we can remove it later if all of the * requested LMBs cannot be added. */ lmbs[i].reserved = 1; } dlpar_add_lmb likely fails because the memory was already onlined, and the acquire/release cause allocation_state transitions of USABLE and UNUSABLE, respectively. But since in the meantime we've marked the memory for unplug, this UNUSABLE transition falsely indicates to QEMU that the guest has acknowledged the unplug and released the memory, so we nuke it prematurely. > > This patch prevents this by not allowing to set the allocation > state to USABLE while the DRC is awaiting release and allocation. > > RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1432382 > > Signed-off-by: Laurent Vivier <lvivier@redhat.com> > --- > hw/ppc/spapr_drc.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c > index 150f6bf..377ea65 100644 > --- a/hw/ppc/spapr_drc.c > +++ b/hw/ppc/spapr_drc.c > @@ -135,6 +135,12 @@ static uint32_t set_allocation_state(sPAPRDRConnector *drc, > if (!drc->dev) { > return RTAS_OUT_NO_SUCH_INDICATOR; > } > + if (drc->awaiting_release && drc->awaiting_allocation) { > + /* kernel is acknowledging a previous hotplug event > + * while we are already removing it. > + */ > + return RTAS_OUT_NO_SUCH_INDICATOR; > + } So basically if the unplug happens soon-enough after the hotplug (or in the special-case of hotplugging at boot-time, any time after hotplug), we force the dlpar_acquire_drc to fail so we don't subsequently nuke it on release. This might trip up test cases where it's common to fire off a ton of plug/unplug and assume each operation succeeds in turn, but the new behavior does seem appropriate since there's no reason acquire should be expected to succeed if we've since marked the memory for removal; that choice is wholly up to the platform up until the point the acquire actually succeeds. Technically, though, it's the *release* part that's the real problem, since that's what's getting misconstrued. Unfortunately I don't see any way to address things on that end, so this approach seems reasonable. As for the follow-up question I think clearing awaiting_allocation would be necessary to allow subsequent unplug to succeed and not be stuck with the unused DIMM until reset. My main concern there though is if any code ever decided to retry the hotplug for some reason, they could bypass this check and trigger the same error. If it was highly unlikely to occur I think I would opt to simply clear awaiting_allocation, but I think this scenario could be triggered by simply invoking: drmgr -c lmb -s ... -a after the initial plug and unplug requests fail. Ideally we would keep the awaiting_allocation && awaiting_release guard in place so dlpar_acquire_drc never succeeds in this case, but still allow for unplug to succeed. One *assumption* we can make use of is that acquire/release are done in pairs by the hotplug code, so if acquire/allocation_state:USABLE always fail, then a subsequent allocate_state:UNUSABLE would then likely be the result of the unplug path where release can be reasonably issued in isolation (under the assumption that hotplug handled the acquire). So perhaps something like (untested): if (drc->awaiting_release && drc->awaiting_allocation) { /* kernel is acknowledging a previous hotplug event * while we are already removing it. - */ + * it's safe to ignore awaiting_allocation here since we know the + * situation is predicated on the guest either already having done + * so (boot-time hotplug), or never being able to acquire in the + * first place (hotplug followed by immediate unplug). + */ + drc->awaiting_allocation_skippable = true; return RTAS_OUT_NO_SUCH_INDICATOR; } and then in detach(): if (drc->awaiting_allocation) { - drc->awaiting_release = true; - trace_spapr_drc_awaiting_allocation(get_index(drc)); - return; + if (!drc->awaiting_allocation_skippable) { + drc->awaiting_release = true; + trace_spapr_drc_awaiting_allocation(get_index(drc)); + return; + } } ... drc->awaiting_release = false; + drc->awaiting_release_skippable = false; > } > > if (drc->type != SPAPR_DR_CONNECTOR_TYPE_PCI) { > -- > 2.9.3 > >
diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c index 150f6bf..377ea65 100644 --- a/hw/ppc/spapr_drc.c +++ b/hw/ppc/spapr_drc.c @@ -135,6 +135,12 @@ static uint32_t set_allocation_state(sPAPRDRConnector *drc, if (!drc->dev) { return RTAS_OUT_NO_SUCH_INDICATOR; } + if (drc->awaiting_release && drc->awaiting_allocation) { + /* kernel is acknowledging a previous hotplug event + * while we are already removing it. + */ + return RTAS_OUT_NO_SUCH_INDICATOR; + } } if (drc->type != SPAPR_DR_CONNECTOR_TYPE_PCI) {
If, once the kernel has booted, we try to remove a memory hotplugged while the kernel was not started, QEMU crashes on an assert: qemu-system-ppc64: hw/virtio/vhost.c:651: vhost_commit: Assertion `r >= 0' failed. ... #4 in vhost_commit #5 in memory_region_transaction_commit #6 in pc_dimm_memory_unplug #7 in spapr_memory_unplug #8 spapr_machine_device_unplug #9 in hotplug_handler_unplug #10 in spapr_lmb_release #11 in detach #12 in set_allocation_state #13 in rtas_set_indicator ... If we take a closer look to the guest kernel log, we can see when we try to unplug the memory: pseries-hotplug-mem: Attempting to hot-add 4 LMB(s) What happens: 1- The kernel has ignored the memory hotplug event because it was not started when it was generated. 2- When we hot-unplug the memory, QEMU starts to remove the memory, generates an hot-unplug event, and signals the kernel of the incoming new event 3- as the kernel is started, on the QEMU signal, it reads the event list, decodes the hotplug event and tries to finish the hotplugging. 4- QEMU receives the hotplug notification while it is trying to hot-unplug the memory. This moves the memory DRC to an invalid state This patch prevents this by not allowing to set the allocation state to USABLE while the DRC is awaiting release and allocation. RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1432382 Signed-off-by: Laurent Vivier <lvivier@redhat.com> --- hw/ppc/spapr_drc.c | 6 ++++++ 1 file changed, 6 insertions(+)