Message ID | 1505054255-2990-4-git-send-email-mark.cave-ayland@ilande.co.uk |
---|---|
State | New |
Headers | show |
Series | ppc: migration fixes for TCG | expand |
On Sun, Sep 10, 2017 at 03:37:34PM +0100, Mark Cave-Ayland wrote: > This is referenced in cpu_ppc_handle_mmu_fault() and so should be included > in the migration stream. That is not, on its own, sufficient reason. > Note: the vmstate_ppc version number has already been bumped by the previous > patch in this series. > > Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> As with 2/4 it breaks backwards migration. But more, I really disklike the idea of migrating this. It's internal state for one, and it's also essentially transitory state. Can we avoid putting it in the otherwise persistent structure at all? Can we derive the state from elsewhere? Can we prevent migration from occurring in the small windows where this data is live? > --- > target/ppc/machine.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/target/ppc/machine.c b/target/ppc/machine.c > index 8fec1a4..10b3c41 100644 > --- a/target/ppc/machine.c > +++ b/target/ppc/machine.c > @@ -676,7 +676,7 @@ const VMStateDescription vmstate_ppc_cpu = { > > /* Internal state */ > VMSTATE_UINTTL(env.hflags_nmsr, PowerPCCPU), > - /* FIXME: access_type? */ > + VMSTATE_UINT8_V(env.access_type, PowerPCCPU, 6), > > /* Interrupt state */ > VMSTATE_UINT32_V(env.pending_interrupts, PowerPCCPU, 6),
On 11/09/17 11:57, David Gibson wrote: > On Sun, Sep 10, 2017 at 03:37:34PM +0100, Mark Cave-Ayland wrote: >> This is referenced in cpu_ppc_handle_mmu_fault() and so should be included >> in the migration stream. > > That is not, on its own, sufficient reason. > >> Note: the vmstate_ppc version number has already been bumped by the previous >> patch in this series. >> >> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> > > As with 2/4 it breaks backwards migration. > > But more, I really disklike the idea of migrating this. It's internal > state for one, and it's also essentially transitory state. Can we > avoid putting it in the otherwise persistent structure at all? Can we > derive the state from elsewhere? Can we prevent migration from > occurring in the small windows where this data is live? From what I can see references to access_type are scattered throughout mmu_helper.c although I'm not necessarily familiar enough with PPC to know whether this is something that can be derived elsewhere instead. And once again it was something that was removed by a90db15. When pausing a VM, does execution stop at the end of the current TB rather than immediately? If so, perhaps someone could confirm that guarantee is good enough for access_type? ATB, Mark.
On Mon, Sep 11, 2017 at 05:52:10PM +0100, Mark Cave-Ayland wrote: > On 11/09/17 11:57, David Gibson wrote: > > > On Sun, Sep 10, 2017 at 03:37:34PM +0100, Mark Cave-Ayland wrote: > >> This is referenced in cpu_ppc_handle_mmu_fault() and so should be included > >> in the migration stream. > > > > That is not, on its own, sufficient reason. > > > >> Note: the vmstate_ppc version number has already been bumped by the previous > >> patch in this series. > >> > >> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> > > > > As with 2/4 it breaks backwards migration. > > > > But more, I really disklike the idea of migrating this. It's internal > > state for one, and it's also essentially transitory state. Can we > > avoid putting it in the otherwise persistent structure at all? Can we > > derive the state from elsewhere? Can we prevent migration from > > occurring in the small windows where this data is live? > > >From what I can see references to access_type are scattered throughout > mmu_helper.c although I'm not necessarily familiar enough with PPC to > know whether this is something that can be derived elsewhere instead. > And once again it was something that was removed by a90db15. Right, but the migration code prior to a90db15 was a complete mess. It definitely included a number of things it didn't need to and shouldn't as well as being missing other things that were needed. It's not a good model. And although it might have more-or-less worked for certain machines like the ones you're reviving here, it was never properly tested > When pausing a VM, does execution stop at the end of the current TB > rather than immediately? If so, perhaps someone could confirm that > guarantee is good enough for access_type? I'm pretty sure it has to; we'd have to come up out of an individual TB in order to get to the main loop where we check the "please pause" flag. I'm not sure if that helps us here though - I *think* access type is about carrying information from the point where we trigger an exception to the point where we actually start processing the exception. This code is really ugly and I've never understood it well :(. It's always seemed bogus to me that we have an essentially global variable to carry information over that small gap, though. Unfortunately it's unlikely that I'd be able to dive into this and work out if it's really needed any time soon.
On 13/09/17 08:19, David Gibson wrote: >> When pausing a VM, does execution stop at the end of the current TB >> rather than immediately? If so, perhaps someone could confirm that >> guarantee is good enough for access_type? > > I'm pretty sure it has to; we'd have to come up out of an individual > TB in order to get to the main loop where we check the "please pause" > flag. I'm not sure if that helps us here though - I *think* access > type is about carrying information from the point where we trigger an > exception to the point where we actually start processing the > exception. > > This code is really ugly and I've never understood it well :(. It's > always seemed bogus to me that we have an essentially global variable > to carry information over that small gap, though. Unfortunately it's > unlikely that I'd be able to dive into this and work out if it's > really needed any time soon. From my testing yesterday it does appear to be required for TCG (or unless this is exposing another bug in the Mac migration) so I can check it works here and then maybe someone else confirm it works on KVM? A couple of things I've noticed: the heathrow VMStateDescription looks good, however I can see that the OpenPIC timers won't likely migrate correctly without adding a few more fields - that's easy to fix. Another thing is that if we're changing the migration stream for Mac models Ben has some OpenPIC and other related changes in his wip queue that it might make sense to put those in first before properly fixing the Mac machine migration. ATB, Mark.
On Wed, Sep 13, 2017 at 06:17:21PM +0100, Mark Cave-Ayland wrote: > On 13/09/17 08:19, David Gibson wrote: > > >> When pausing a VM, does execution stop at the end of the current TB > >> rather than immediately? If so, perhaps someone could confirm that > >> guarantee is good enough for access_type? > > > > I'm pretty sure it has to; we'd have to come up out of an individual > > TB in order to get to the main loop where we check the "please pause" > > flag. I'm not sure if that helps us here though - I *think* access > > type is about carrying information from the point where we trigger an > > exception to the point where we actually start processing the > > exception. > > > > This code is really ugly and I've never understood it well :(. It's > > always seemed bogus to me that we have an essentially global variable > > to carry information over that small gap, though. Unfortunately it's > > unlikely that I'd be able to dive into this and work out if it's > > really needed any time soon. > > >From my testing yesterday it does appear to be required for TCG (or > unless this is exposing another bug in the Mac migration) so I can check > it works here and then maybe someone else confirm it works on KVM? > > A couple of things I've noticed: the heathrow VMStateDescription looks > good, however I can see that the OpenPIC timers won't likely migrate > correctly without adding a few more fields - that's easy to fix. Right. And since OpenPIC isn't used on any platforms that have real production use in the wild, it's fine to bump the migration stream version for it. > Another thing is that if we're changing the migration stream for Mac > models Ben has some OpenPIC and other related changes in his wip queue > that it might make sense to put those in first before properly fixing > the Mac machine migration. That would have something to be said for it. It's probably not essential, though, since I don't consider the non-pseries platforms at this stage sufficiently mature that we guarantee a stable migration stream format.
diff --git a/target/ppc/machine.c b/target/ppc/machine.c index 8fec1a4..10b3c41 100644 --- a/target/ppc/machine.c +++ b/target/ppc/machine.c @@ -676,7 +676,7 @@ const VMStateDescription vmstate_ppc_cpu = { /* Internal state */ VMSTATE_UINTTL(env.hflags_nmsr, PowerPCCPU), - /* FIXME: access_type? */ + VMSTATE_UINT8_V(env.access_type, PowerPCCPU, 6), /* Interrupt state */ VMSTATE_UINT32_V(env.pending_interrupts, PowerPCCPU, 6),
This is referenced in cpu_ppc_handle_mmu_fault() and so should be included in the migration stream. Note: the vmstate_ppc version number has already been bumped by the previous patch in this series. Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> --- target/ppc/machine.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)