Message ID | 20190826090812.19080-1-lvivier@redhat.com |
---|---|
State | New |
Series | pseries: Fix compat_pvr on reset |
On Mon, Aug 26, 2019 at 11:08:12AM +0200, Laurent Vivier wrote:
> If we migrate a P8 machine to a P9 machine, the migration fails on
> destination with:
>
>   error while loading state for instance 0x1 of device 'cpu'
>   load of migration failed: Operation not permitted
>
> This is caused because the compat_pvr field is only present for the first
> CPU.
> Originally, spapr_machine_reset() called ppc_set_compat() to set the value
> max_compat_pvr for the first cpu and this was propagated to all CPUs by
> spapr_cpu_reset(). Now, as spapr_cpu_reset() is called before that, the
> value is not propagated to all CPUs and the migration fails.
>
> To fix that, propagate the new value to all CPUs in spapr_machine_reset().
>
> Fixes: 25c9780d38d4 ("spapr: Reset CAS & IRQ subsystem after devices")
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>

Applied to ppc-for-4.2, thanks.

> ---
>  hw/ppc/spapr.c          | 8 +++++++-
>  hw/ppc/spapr_cpu_core.c | 2 ++
>  2 files changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index baedadf20b8c..d063312a3b2a 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1752,7 +1752,13 @@ static void spapr_machine_reset(MachineState *machine)
>          spapr_ovec_cleanup(spapr->ov5_cas);
>          spapr->ov5_cas = spapr_ovec_new();
>
> -        ppc_set_compat(first_ppc_cpu, spapr->max_compat_pvr, &error_fatal);
> +        /*
> +         * reset compat_pvr for all CPUs
> +         * as qemu_devices_reset() is called before this,
> +         * it can't be propagated by spapr_cpu_reset()
> +         * from the first CPU to all the others
> +         */
> +        ppc_set_compat_all(spapr->max_compat_pvr, &error_fatal);
>      }
>
>      /*
> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> index bf47fbdf6f7f..45e2f2747ffc 100644
> --- a/hw/ppc/spapr_cpu_core.c
> +++ b/hw/ppc/spapr_cpu_core.c
> @@ -43,6 +43,8 @@ static void spapr_cpu_reset(void *opaque)
>
>      /* Set compatibility mode to match the boot CPU, which was either set
>       * by the machine reset code or by CAS. This should never fail.
> +     * At startup the value is already set for all the CPUs
> +     * but we need this when we hotplug a new CPU
>       */
>      ppc_set_compat(cpu, POWERPC_CPU(first_cpu)->compat_pvr, &error_abort);
>
On Mon, 26 Aug 2019 11:08:12 +0200 Laurent Vivier <lvivier@redhat.com> wrote:
> If we migrate a P8 machine to a P9 machine, the migration fails on
> destination with:
>
>   error while loading state for instance 0x1 of device 'cpu'
>   load of migration failed: Operation not permitted
>
> This is caused because the compat_pvr field is only present for the first
> CPU.
> Originally, spapr_machine_reset() called ppc_set_compat() to set the value
> max_compat_pvr for the first cpu and this was propagated to all CPUs by
> spapr_cpu_reset(). Now, as spapr_cpu_reset() is called before that, the
> value is not propagated to all CPUs and the migration fails.
>
> To fix that, propagate the new value to all CPUs in spapr_machine_reset().
>

Yeah, the assumption that compat_pvr would be set for the boot CPU before
device reset was rather fragile. It makes a lot of sense to do this
explicitly from the core machine code.

Reviewed-by: Greg Kurz <groug@kaod.org>

And now, ppc_set_compat() ends up being called twice for every CPU at
machine reset. It isn't a great performance penalty, but I think the case
of hotplugged CPUs could be better handled by calling ppc_set_compat()
from spapr_core_plug(). It would also be cleaner to have the compat_pvr
stuff handled in spapr.c only rather than in two separate files IMHO.
I'll send a patch.

> Fixes: 25c9780d38d4 ("spapr: Reset CAS & IRQ subsystem after devices")
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> ---
>  hw/ppc/spapr.c          | 8 +++++++-
>  hw/ppc/spapr_cpu_core.c | 2 ++
>  2 files changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index baedadf20b8c..d063312a3b2a 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1752,7 +1752,13 @@ static void spapr_machine_reset(MachineState *machine)
>          spapr_ovec_cleanup(spapr->ov5_cas);
>          spapr->ov5_cas = spapr_ovec_new();
>
> -        ppc_set_compat(first_ppc_cpu, spapr->max_compat_pvr, &error_fatal);
> +        /*
> +         * reset compat_pvr for all CPUs
> +         * as qemu_devices_reset() is called before this,
> +         * it can't be propagated by spapr_cpu_reset()
> +         * from the first CPU to all the others
> +         */
> +        ppc_set_compat_all(spapr->max_compat_pvr, &error_fatal);
>      }
>
>      /*
> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> index bf47fbdf6f7f..45e2f2747ffc 100644
> --- a/hw/ppc/spapr_cpu_core.c
> +++ b/hw/ppc/spapr_cpu_core.c
> @@ -43,6 +43,8 @@ static void spapr_cpu_reset(void *opaque)
>
>      /* Set compatibility mode to match the boot CPU, which was either set
>       * by the machine reset code or by CAS. This should never fail.
> +     * At startup the value is already set for all the CPUs
> +     * but we need this when we hotplug a new CPU
>       */
>      ppc_set_compat(cpu, POWERPC_CPU(first_cpu)->compat_pvr, &error_abort);
>
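
The review comment above makes two points: with the patch as posted the compat setter runs twice per CPU at machine reset, and hotplugged CPUs might instead be handled at plug time. The following is only a toy model of that trade-off, not QEMU code. Every name in it (ToyMachine, toy_set_compat, toy_reset_as_patched, toy_reset_alternative, toy_core_plug) is hypothetical, and the PVR value used in the usage note is an arbitrary non-zero placeholder. It simply counts setter calls under each scheme.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: hypothetical stand-ins, not the QEMU data structures. */
#define TOY_MAX_CPUS 8

typedef struct {
    uint32_t compat_pvr[TOY_MAX_CPUS]; /* 0 means "no compat mode set" */
    size_t nr_cpus;
    uint32_t max_compat_pvr;
    unsigned set_compat_calls;         /* how many times the setter ran */
} ToyMachine;

/* Stand-in for a per-CPU compat setter; also counts invocations. */
static void toy_set_compat(ToyMachine *m, size_t i, uint32_t pvr)
{
    assert(i < m->nr_cpus);
    m->compat_pvr[i] = pvr;
    m->set_compat_calls++;
}

/*
 * Scheme of the posted patch: the per-CPU reset still copies the boot
 * CPU's value, and the machine reset then sets every CPU explicitly.
 */
static void toy_reset_as_patched(ToyMachine *m)
{
    for (size_t i = 0; i < m->nr_cpus; i++) {
        toy_set_compat(m, i, m->compat_pvr[0]);  /* per-CPU reset */
    }
    for (size_t i = 0; i < m->nr_cpus; i++) {
        toy_set_compat(m, i, m->max_compat_pvr); /* machine reset */
    }
}

/* Suggested alternative: only the machine reset touches compat_pvr. */
static void toy_reset_alternative(ToyMachine *m)
{
    for (size_t i = 0; i < m->nr_cpus; i++) {
        toy_set_compat(m, i, m->max_compat_pvr);
    }
}

/*
 * Suggested alternative for hotplug: set the new CPU's compat mode at
 * plug time, copying the boot CPU. Returns the index of the new CPU.
 */
static size_t toy_core_plug(ToyMachine *m)
{
    assert(m->nr_cpus < TOY_MAX_CPUS);
    size_t i = m->nr_cpus++;
    toy_set_compat(m, i, m->compat_pvr[0]);
    return i;
}
```

In this model, resetting a four-CPU machine with toy_reset_as_patched() runs the setter eight times, while toy_reset_alternative() runs it four times, and a later toy_core_plug() adds exactly one more call for the new CPU. Whether the real follow-up patch takes this shape is not something this thread shows.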
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index baedadf20b8c..d063312a3b2a 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1752,7 +1752,13 @@ static void spapr_machine_reset(MachineState *machine)
         spapr_ovec_cleanup(spapr->ov5_cas);
         spapr->ov5_cas = spapr_ovec_new();
 
-        ppc_set_compat(first_ppc_cpu, spapr->max_compat_pvr, &error_fatal);
+        /*
+         * reset compat_pvr for all CPUs
+         * as qemu_devices_reset() is called before this,
+         * it can't be propagated by spapr_cpu_reset()
+         * from the first CPU to all the others
+         */
+        ppc_set_compat_all(spapr->max_compat_pvr, &error_fatal);
     }
 
     /*
diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index bf47fbdf6f7f..45e2f2747ffc 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -43,6 +43,8 @@ static void spapr_cpu_reset(void *opaque)
 
     /* Set compatibility mode to match the boot CPU, which was either set
      * by the machine reset code or by CAS. This should never fail.
+     * At startup the value is already set for all the CPUs
+     * but we need this when we hotplug a new CPU
      */
     ppc_set_compat(cpu, POWERPC_CPU(first_cpu)->compat_pvr, &error_abort);
 
If we migrate a P8 machine to a P9 machine, the migration fails on
destination with:

  error while loading state for instance 0x1 of device 'cpu'
  load of migration failed: Operation not permitted

This is caused because the compat_pvr field is only present for the first
CPU.
Originally, spapr_machine_reset() called ppc_set_compat() to set the value
max_compat_pvr for the first cpu and this was propagated to all CPUs by
spapr_cpu_reset(). Now, as spapr_cpu_reset() is called before that, the
value is not propagated to all CPUs and the migration fails.

To fix that, propagate the new value to all CPUs in spapr_machine_reset().

Fixes: 25c9780d38d4 ("spapr: Reset CAS & IRQ subsystem after devices")
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
 hw/ppc/spapr.c          | 8 +++++++-
 hw/ppc/spapr_cpu_core.c | 2 ++
 2 files changed, 9 insertions(+), 1 deletion(-)
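
The ordering problem described in the commit message can be illustrated with a small self-contained toy model. This is a sketch under stated assumptions, not the QEMU implementation: the types and functions below (ToyCpu, ToyMachine, toy_cpu_reset, toy_devices_reset, toy_machine_reset_buggy, toy_machine_reset_fixed, toy_all_cpus_match) are hypothetical stand-ins, and the PVR value in the usage note is an arbitrary non-zero placeholder. The model only shows why setting the boot CPU after the per-CPU resets leaves the other CPUs without the value, and why setting every CPU explicitly avoids that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: hypothetical stand-ins, not the QEMU data structures. */
#define TOY_MAX_CPUS 8

typedef struct {
    uint32_t compat_pvr;   /* 0 means "no compatibility mode set" */
} ToyCpu;

typedef struct {
    ToyCpu cpus[TOY_MAX_CPUS];
    size_t nr_cpus;
    uint32_t max_compat_pvr;
} ToyMachine;

/* Start from a machine where no CPU has a compat mode yet. */
static void toy_machine_init(ToyMachine *m, size_t nr_cpus,
                             uint32_t max_compat_pvr)
{
    assert(nr_cpus >= 1 && nr_cpus <= TOY_MAX_CPUS);
    m->nr_cpus = nr_cpus;
    m->max_compat_pvr = max_compat_pvr;
    for (size_t i = 0; i < TOY_MAX_CPUS; i++) {
        m->cpus[i].compat_pvr = 0;
    }
}

/* Per-CPU reset: copy the compat mode from the boot CPU (index 0). */
static void toy_cpu_reset(ToyMachine *m, size_t i)
{
    m->cpus[i].compat_pvr = m->cpus[0].compat_pvr;
}

/* Device reset: runs the per-CPU reset for every CPU. */
static void toy_devices_reset(ToyMachine *m)
{
    for (size_t i = 0; i < m->nr_cpus; i++) {
        toy_cpu_reset(m, i);
    }
}

/*
 * Problematic ordering: devices are reset first, then only the boot CPU
 * gets max_compat_pvr, so the copies made above saw the old value.
 */
static void toy_machine_reset_buggy(ToyMachine *m)
{
    toy_devices_reset(m);
    m->cpus[0].compat_pvr = m->max_compat_pvr;
}

/* Fixed ordering: after the device reset, set the value on every CPU. */
static void toy_machine_reset_fixed(ToyMachine *m)
{
    toy_devices_reset(m);
    for (size_t i = 0; i < m->nr_cpus; i++) {
        m->cpus[i].compat_pvr = m->max_compat_pvr;
    }
}

/* True if every CPU ended up with the machine's max_compat_pvr. */
static bool toy_all_cpus_match(const ToyMachine *m)
{
    for (size_t i = 0; i < m->nr_cpus; i++) {
        if (m->cpus[i].compat_pvr != m->max_compat_pvr) {
            return false;
        }
    }
    return true;
}
```

Starting from a freshly initialised four-CPU machine, toy_machine_reset_buggy() leaves only CPU 0 with the compat value, so toy_all_cpus_match() returns false. With toy_machine_reset_fixed() it returns true, which mirrors the intent of replacing the single ppc_set_compat() call with ppc_set_compat_all() in the diff.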