diff mbox

[PULL,02/15] migration: extend VMStateInfo

Message ID 20170126131402.547f1004.cornelia.huck@de.ibm.com
State New
Headers show

Commit Message

Cornelia Huck Jan. 26, 2017, 12:14 p.m. UTC
On Wed, 25 Jan 2017 14:44:20 +0000
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:

> * Cornelia Huck (cornelia.huck@de.ibm.com) wrote:
> > On Wed, 25 Jan 2017 13:22:55 +0000
> > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > 
> > > * Cornelia Huck (cornelia.huck@de.ibm.com) wrote:
> > > > On Wed, 25 Jan 2017 12:00:53 +0000
> > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:

> > > > > OK, so it looks like that's a failure path, adding a return -ENOMEM would seem to make
> > > > > sense there.
> > > > 
> > > > Just saw this. I don't think we want -ENOMEM, as that would change the
> > > > actual state being saved, no?
> > > 
> > > But isn't that the intention of this function?
> > > 
> > >     buf = g_try_malloc0(len);
> > >     if (!buf) {
> > >         /* Storing FLIC_FAILED into the count field here will cause the
> > >          * target system to fail when attempting to load irqs from the
> > >          * migration state */
> > >         error_report("flic: couldn't allocate memory");
> > >         qemu_put_be64(f, FLIC_FAILED);
> > >         return;
> > >     }
> > > 
> > > What should happen on the destination - should the migration fail?
> > > If we want the migration to fail then we should now return an error
> > > status rather than 0, and then we see a failed migration on the source
> > > as well.
> > 
> > Yes. There's also another error case below where we should return an
> > error instead of putting FLIC_FAILED, then.
> > 
> > The problem is that this is rather hard to test: So I'd prefer to fix
> > the compile for now and introduce error return codes in a separate
> > patch.
> 
> OK, that's fair.

I've coded something up and tried to test it with error injection to
trigger the failed case, but I can't really see an improvement :(

Before: source logs error, target fails to load the flic with 'invalid
argument'

After: source logs error, target fails to load the flic with 'could not
allocate memory'

The migration code does not seem to do anything with the return code of
put methods for now, so that's not too surprising. Is anything in the
works?

For now, I'd prefer to keep the old behaviour as 'invalid argument'
seems like a more obvious error on the target.

Comments

Dr. David Alan Gilbert Jan. 27, 2017, 6:20 p.m. UTC | #1
* Cornelia Huck (cornelia.huck@de.ibm.com) wrote:
> On Wed, 25 Jan 2017 14:44:20 +0000
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> 
> > * Cornelia Huck (cornelia.huck@de.ibm.com) wrote:
> > > On Wed, 25 Jan 2017 13:22:55 +0000
> > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > 
> > > > * Cornelia Huck (cornelia.huck@de.ibm.com) wrote:
> > > > > On Wed, 25 Jan 2017 12:00:53 +0000
> > > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> 
> > > > > > OK, so it looks like that's a failure path, adding a return -ENOMEM would seem to make
> > > > > > sense there.
> > > > > 
> > > > > Just saw this. I don't think we want -ENOMEM, as that would change the
> > > > > actual state being saved, no?
> > > > 
> > > > But isn't that the intention of this function?
> > > > 
> > > >     buf = g_try_malloc0(len);
> > > >     if (!buf) {
> > > >         /* Storing FLIC_FAILED into the count field here will cause the
> > > >          * target system to fail when attempting to load irqs from the
> > > >          * migration state */
> > > >         error_report("flic: couldn't allocate memory");
> > > >         qemu_put_be64(f, FLIC_FAILED);
> > > >         return;
> > > >     }
> > > > 
> > > > What should happen on the destination - should the migration fail?
> > > > If we want the migration to fail then we should now return an error
> > > > status rather than 0, and then we see a failed migration on the source
> > > > as well.
> > > 
> > > Yes. There's also another error case below where we should return an
> > > error instead of putting FLIC_FAILED, then.
> > > 
> > > The problem is that this is rather hard to test: So I'd prefer to fix
> > > the compile for now and introduce error return codes in a separate
> > > patch.
> > 
> > OK, that's fair.
> 
> I've coded something up and tried to test it with error injection to
> trigger the failed case, but I can't really see an improvement :(
> 
> Before: source logs error, target fails to load the flic with 'invalid
> argument'
> 
> After: source logs error, target fails to load the flic with 'could not
> allocate memory'
> 
> The migration code does not seem to do anything with the return code of
> put methods for now, so that's not too surprising. Is anything in the
> works?

OK, I hadn't remembered that wasn't wired up.

> For now, I'd prefer to keep the old behaviour as 'invalid argument'
> seems like a more obvious error on the target.

Yes, agreed.

If you feel like, then I'd just change the return -ENOMEM but still
send the FLIC_FAILED.  That way the destination still fails
as before, and the destination will fail nicely when we wire up the
error checks.

Dave

> diff --git a/hw/intc/s390_flic_kvm.c b/hw/intc/s390_flic_kvm.c
> index e86a84e49a..3c62ef8258 100644
> --- a/hw/intc/s390_flic_kvm.c
> +++ b/hw/intc/s390_flic_kvm.c
> @@ -293,27 +293,21 @@ static int kvm_flic_save(QEMUFile *f, void *opaque, size_t size,
>      int len = FLIC_SAVE_INITIAL_SIZE;
>      void *buf;
>      int count;
> +    int r = 0;
>  
>      flic_disable_wait_pfault((struct KVMS390FLICState *) opaque);
>  
>      buf = g_try_malloc0(len);
>      if (!buf) {
> -        /* Storing FLIC_FAILED into the count field here will cause the
> -         * target system to fail when attempting to load irqs from the
> -         * migration state */
>          error_report("flic: couldn't allocate memory");
> -        qemu_put_be64(f, FLIC_FAILED);
> -        return 0;
> +        return -ENOMEM;
>      }
>  
>      count = __get_all_irqs(flic, &buf, len);
>      if (count < 0) {
>          error_report("flic: couldn't retrieve irqs from kernel, rc %d",
>                       count);
> -        /* Storing FLIC_FAILED into the count field here will cause the
> -         * target system to fail when attempting to load irqs from the
> -         * migration state */
> -        qemu_put_be64(f, FLIC_FAILED);
> +        r = count;
>      } else {
>          qemu_put_be64(f, count);
>          qemu_put_buffer(f, (uint8_t *) buf,
> @@ -321,7 +315,7 @@ static int kvm_flic_save(QEMUFile *f, void *opaque, size_t size,
>      }
>      g_free(buf);
>  
> -    return 0;
> +    return r;
>  }
>  
>  /**
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Cornelia Huck Feb. 1, 2017, 10:18 a.m. UTC | #2
On Fri, 27 Jan 2017 18:20:36 +0000
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:

> * Cornelia Huck (cornelia.huck@de.ibm.com) wrote:
> > On Wed, 25 Jan 2017 14:44:20 +0000
> > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > 
> > > * Cornelia Huck (cornelia.huck@de.ibm.com) wrote:
> > > > On Wed, 25 Jan 2017 13:22:55 +0000
> > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > > 
> > > > > * Cornelia Huck (cornelia.huck@de.ibm.com) wrote:
> > > > > > On Wed, 25 Jan 2017 12:00:53 +0000
> > > > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > 
> > > > > > > OK, so it looks like that's a failure path, adding a return -ENOMEM would seem to make
> > > > > > > sense there.
> > > > > > 
> > > > > > Just saw this. I don't think we want -ENOMEM, as that would change the
> > > > > > actual state being saved, no?
> > > > > 
> > > > > But isn't that the intention of this function?
> > > > > 
> > > > >     buf = g_try_malloc0(len);
> > > > >     if (!buf) {
> > > > >         /* Storing FLIC_FAILED into the count field here will cause the
> > > > >          * target system to fail when attempting to load irqs from the
> > > > >          * migration state */
> > > > >         error_report("flic: couldn't allocate memory");
> > > > >         qemu_put_be64(f, FLIC_FAILED);
> > > > >         return;
> > > > >     }
> > > > > 
> > > > > What should happen on the destination - should the migration fail?
> > > > > If we want the migration to fail then we should now return an error
> > > > > status rather than 0, and then we see a failed migration on the source
> > > > > as well.
> > > > 
> > > > Yes. There's also another error case below where we should return an
> > > > error instead of putting FLIC_FAILED, then.
> > > > 
> > > > The problem is that this is rather hard to test: So I'd prefer to fix
> > > > the compile for now and introduce error return codes in a separate
> > > > patch.
> > > 
> > > OK, that's fair.
> > 
> > I've coded something up and tried to test it with error injection to
> > trigger the failed case, but I can't really see an improvement :(
> > 
> > Before: source logs error, target fails to load the flic with 'invalid
> > argument'
> > 
> > After: source logs error, target fails to load the flic with 'could not
> > allocate memory'
> > 
> > The migration code does not seem to do anything with the return code of
> > put methods for now, so that's not too surprising. Is anything in the
> > works?
> 
> OK, I hadn't remembered that wasn't wired up.
> 
> > For now, I'd prefer to keep the old behaviour as 'invalid argument'
> > seems like a more obvious error on the target.
> 
> Yes, agreed.
> 
> If you feel like, then I'd just change the return -ENOMEM but still
> send the FLIC_FAILED.  That way the destination still fails
> as before, and the destination will fail nicely when we wire up the
> error checks.

Yes, having both FLIC_FAILED and a nonzero return code looks like the
best approach for now.
diff mbox

Patch

diff --git a/hw/intc/s390_flic_kvm.c b/hw/intc/s390_flic_kvm.c
index e86a84e49a..3c62ef8258 100644
--- a/hw/intc/s390_flic_kvm.c
+++ b/hw/intc/s390_flic_kvm.c
@@ -293,27 +293,21 @@  static int kvm_flic_save(QEMUFile *f, void *opaque, size_t size,
     int len = FLIC_SAVE_INITIAL_SIZE;
     void *buf;
     int count;
+    int r = 0;
 
     flic_disable_wait_pfault((struct KVMS390FLICState *) opaque);
 
     buf = g_try_malloc0(len);
     if (!buf) {
-        /* Storing FLIC_FAILED into the count field here will cause the
-         * target system to fail when attempting to load irqs from the
-         * migration state */
         error_report("flic: couldn't allocate memory");
-        qemu_put_be64(f, FLIC_FAILED);
-        return 0;
+        return -ENOMEM;
     }
 
     count = __get_all_irqs(flic, &buf, len);
     if (count < 0) {
         error_report("flic: couldn't retrieve irqs from kernel, rc %d",
                      count);
-        /* Storing FLIC_FAILED into the count field here will cause the
-         * target system to fail when attempting to load irqs from the
-         * migration state */
-        qemu_put_be64(f, FLIC_FAILED);
+        r = count;
     } else {
         qemu_put_be64(f, count);
         qemu_put_buffer(f, (uint8_t *) buf,
@@ -321,7 +315,7 @@  static int kvm_flic_save(QEMUFile *f, void *opaque, size_t size,
     }
     g_free(buf);
 
-    return 0;
+    return r;
 }
 
 /**