[v2,7/7] migration: Provide explicit error message for file shutdowns

Message ID 20230705163502.331007-8-peterx@redhat.com
State New
Series migration: Better error handling in return path thread

Commit Message

Peter Xu July 5, 2023, 4:35 p.m. UTC
Provide an explicit reason for qemu_file_shutdown()s, which can be
displayed in query-migrate when used.

This will make e.g. migrate-pause display an explicit error description,
changing it from:

"error-desc": "Channel error: Input/output error"

To:

"error-desc": "Channel is explicitly shutdown by the user"

in query-migrate.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/qemu-file.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Fabiano Rosas July 5, 2023, 10:05 p.m. UTC | #1
Peter Xu <peterx@redhat.com> writes:

> Provide an explicit reason for qemu_file_shutdown()s, which can be
> displayed in query-migrate when used.
>

Can we consider this to cover the TODO:

 * TODO: convert to propagate Error objects instead of squashing
 * to a fixed errno value

or would that need something fancier?

> This will make e.g. migrate-pause to display explicit error descriptions,
> from:
>
> "error-desc": "Channel error: Input/output error"
>
> To:
>
> "error-desc": "Channel is explicitly shutdown by the user"
>
> in query-migrate.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/qemu-file.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> index 419b4092e7..ff605027de 100644
> --- a/migration/qemu-file.c
> +++ b/migration/qemu-file.c
> @@ -87,7 +87,10 @@ int qemu_file_shutdown(QEMUFile *f)
>       *      --> guest crash!
>       */
>      if (!f->last_error) {
> -        qemu_file_set_error(f, -EIO);
> +        Error *err = NULL;
> +
> +        error_setg(&err, "Channel is explicitly shutdown by the user");

It is good that we can grep this message. However, I'm confused about
who the "user" is meant to be here and how are they implicated in this
error.

> +        qemu_file_set_error_obj(f, -EIO, err);
>      }
>  
>      if (!qio_channel_has_feature(f->ioc,
Peter Xu July 5, 2023, 10:34 p.m. UTC | #2
On Wed, Jul 05, 2023 at 07:05:13PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > Provide an explicit reason for qemu_file_shutdown()s, which can be
> > displayed in query-migrate when used.
> >
> 
> Can we consider this to cover the TODO:
> 
>  * TODO: convert to propagate Error objects instead of squashing
>  * to a fixed errno value
> 
> or would that need something fancier?

The TODO seems to say we want to allow qemu_file_shutdown() to report an
Error* when anything goes wrong (e.g. shutdown() failed)?  This patch, on
the other hand, tries to store a specific error string so that a later
query-migrate shows it to the user.  If so, IMHO they're two things.

> 
> > This will make e.g. migrate-pause to display explicit error descriptions,
> > from:
> >
> > "error-desc": "Channel error: Input/output error"
> >
> > To:
> >
> > "error-desc": "Channel is explicitly shutdown by the user"
> >
> > in query-migrate.
> >
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  migration/qemu-file.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> > index 419b4092e7..ff605027de 100644
> > --- a/migration/qemu-file.c
> > +++ b/migration/qemu-file.c
> > @@ -87,7 +87,10 @@ int qemu_file_shutdown(QEMUFile *f)
> >       *      --> guest crash!
> >       */
> >      if (!f->last_error) {
> > -        qemu_file_set_error(f, -EIO);
> > +        Error *err = NULL;
> > +
> > +        error_setg(&err, "Channel is explicitly shutdown by the user");
> 
> It is good that we can grep this message. However, I'm confused about
> who the "user" is meant to be here and how are they implicated in this
> error.

Ah, here the user is who sends the "migrate-pause" command, according to
the example of the commit message.

What I wanted to do is provide a clear message (besides -EIO) in
query-migrate, so we know more about how the migration was paused/stopped/...
Before this patch it shows the same thing as when e.g. any form of IO error
happened.

Thanks,
Fabiano Rosas July 6, 2023, 1:50 p.m. UTC | #3
Peter Xu <peterx@redhat.com> writes:

> On Wed, Jul 05, 2023 at 07:05:13PM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>> 
>> > Provide an explicit reason for qemu_file_shutdown()s, which can be
>> > displayed in query-migrate when used.
>> >
>> 
>> Can we consider this to cover the TODO:
>> 
>>  * TODO: convert to propagate Error objects instead of squashing
>>  * to a fixed errno value
>> 
>> or would that need something fancier?
>
> The TODO seems to say we want to allow qemu_file_shutdown() to report an
> Error* when anything wrong happened (e.g. shutdown() failed)?  While this
> patch was trying to store a specific error string so when query migration
> later it'll show up to the user.  If so, IMHO they're two things.
>

Ok, just making sure.

>> 
>> > This will make e.g. migrate-pause to display explicit error descriptions,
>> > from:
>> >
>> > "error-desc": "Channel error: Input/output error"
>> >
>> > To:
>> >
>> > "error-desc": "Channel is explicitly shutdown by the user"
>> >
>> > in query-migrate.
>> >
>> > Signed-off-by: Peter Xu <peterx@redhat.com>
>> > ---
>> >  migration/qemu-file.c | 5 ++++-
>> >  1 file changed, 4 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/migration/qemu-file.c b/migration/qemu-file.c
>> > index 419b4092e7..ff605027de 100644
>> > --- a/migration/qemu-file.c
>> > +++ b/migration/qemu-file.c
>> > @@ -87,7 +87,10 @@ int qemu_file_shutdown(QEMUFile *f)
>> >       *      --> guest crash!
>> >       */
>> >      if (!f->last_error) {
>> > -        qemu_file_set_error(f, -EIO);
>> > +        Error *err = NULL;
>> > +
>> > +        error_setg(&err, "Channel is explicitly shutdown by the user");
>> 
>> It is good that we can grep this message. However, I'm confused about
>> who the "user" is meant to be here and how are they implicated in this
>> error.
>
> Ah, here the user is who sends the "migrate-pause" command, according to
> the example of the commit message.
>

That's where I'm confused. There are 15 callsites for
qemu_file_shutdown(). Only 2 of them are from migrate-pause. So I'm
missing the logical step that links migrate-pause with this
error_setg(). Are you assuming that the race described will only happen
with migrate-pause and the other invocations would have set an error
already?
Peter Xu July 6, 2023, 4:27 p.m. UTC | #4
On Thu, Jul 06, 2023 at 10:50:34AM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Wed, Jul 05, 2023 at 07:05:13PM -0300, Fabiano Rosas wrote:
> >> Peter Xu <peterx@redhat.com> writes:
> >> 
> >> > Provide an explicit reason for qemu_file_shutdown()s, which can be
> >> > displayed in query-migrate when used.
> >> >
> >> 
> >> Can we consider this to cover the TODO:
> >> 
> >>  * TODO: convert to propagate Error objects instead of squashing
> >>  * to a fixed errno value
> >> 
> >> or would that need something fancier?
> >
> > The TODO seems to say we want to allow qemu_file_shutdown() to report an
> > Error* when anything wrong happened (e.g. shutdown() failed)?  While this
> > patch was trying to store a specific error string so when query migration
> > later it'll show up to the user.  If so, IMHO they're two things.
> >
> 
> Ok, just making sure.
> 
> >> 
> >> > This will make e.g. migrate-pause to display explicit error descriptions,
> >> > from:
> >> >
> >> > "error-desc": "Channel error: Input/output error"
> >> >
> >> > To:
> >> >
> >> > "error-desc": "Channel is explicitly shutdown by the user"
> >> >
> >> > in query-migrate.
> >> >
> >> > Signed-off-by: Peter Xu <peterx@redhat.com>
> >> > ---
> >> >  migration/qemu-file.c | 5 ++++-
> >> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >> >
> >> > diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> >> > index 419b4092e7..ff605027de 100644
> >> > --- a/migration/qemu-file.c
> >> > +++ b/migration/qemu-file.c
> >> > @@ -87,7 +87,10 @@ int qemu_file_shutdown(QEMUFile *f)
> >> >       *      --> guest crash!
> >> >       */
> >> >      if (!f->last_error) {
> >> > -        qemu_file_set_error(f, -EIO);
> >> > +        Error *err = NULL;
> >> > +
> >> > +        error_setg(&err, "Channel is explicitly shutdown by the user");
> >> 
> >> It is good that we can grep this message. However, I'm confused about
> >> who the "user" is meant to be here and how are they implicated in this
> >> error.
> >
> > Ah, here the user is who sends the "migrate-pause" command, according to
> > the example of the commit message.
> >
> 
> That's where I'm confused. There are 15 callsites for
> qemu_file_shutdown(). Only 2 of them are from migrate-pause. So I'm
> missing the logical step that links migrate-pause with this
> error_setg().
> Are you assuming that the race described will only happen
> with migrate-pause and the other invocations would have set an error
> already?

It's not a race, but I think you're right. I thought it was always the case
to shut but actually not: we do shutdown() also in a few places where we
don't really fail, either for COLO or for completion of migration.  With
the 1st patch, it'll even show in query-migrate.  Thanks for spotting it -
I could have done better.

Let's drop this patch.. sorry for the noise.
Fabiano Rosas July 6, 2023, 5:33 p.m. UTC | #5
Peter Xu <peterx@redhat.com> writes:

> On Thu, Jul 06, 2023 at 10:50:34AM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>> 
>> > On Wed, Jul 05, 2023 at 07:05:13PM -0300, Fabiano Rosas wrote:
>> >> Peter Xu <peterx@redhat.com> writes:
>> >> 
>> >> > Provide an explicit reason for qemu_file_shutdown()s, which can be
>> >> > displayed in query-migrate when used.
>> >> >
>> >> 
>> >> Can we consider this to cover the TODO:
>> >> 
>> >>  * TODO: convert to propagate Error objects instead of squashing
>> >>  * to a fixed errno value
>> >> 
>> >> or would that need something fancier?
>> >
>> > The TODO seems to say we want to allow qemu_file_shutdown() to report an
>> > Error* when anything wrong happened (e.g. shutdown() failed)?  While this
>> > patch was trying to store a specific error string so when query migration
>> > later it'll show up to the user.  If so, IMHO they're two things.
>> >
>> 
>> Ok, just making sure.
>> 
>> >> 
>> >> > This will make e.g. migrate-pause to display explicit error descriptions,
>> >> > from:
>> >> >
>> >> > "error-desc": "Channel error: Input/output error"
>> >> >
>> >> > To:
>> >> >
>> >> > "error-desc": "Channel is explicitly shutdown by the user"
>> >> >
>> >> > in query-migrate.
>> >> >
>> >> > Signed-off-by: Peter Xu <peterx@redhat.com>
>> >> > ---
>> >> >  migration/qemu-file.c | 5 ++++-
>> >> >  1 file changed, 4 insertions(+), 1 deletion(-)
>> >> >
>> >> > diff --git a/migration/qemu-file.c b/migration/qemu-file.c
>> >> > index 419b4092e7..ff605027de 100644
>> >> > --- a/migration/qemu-file.c
>> >> > +++ b/migration/qemu-file.c
>> >> > @@ -87,7 +87,10 @@ int qemu_file_shutdown(QEMUFile *f)
>> >> >       *      --> guest crash!
>> >> >       */
>> >> >      if (!f->last_error) {
>> >> > -        qemu_file_set_error(f, -EIO);
>> >> > +        Error *err = NULL;
>> >> > +
>> >> > +        error_setg(&err, "Channel is explicitly shutdown by the user");
>> >> 
>> >> It is good that we can grep this message. However, I'm confused about
>> >> who the "user" is meant to be here and how are they implicated in this
>> >> error.
>> >
>> > Ah, here the user is who sends the "migrate-pause" command, according to
>> > the example of the commit message.
>> >
>> 
>> That's where I'm confused. There are 15 callsites for
>> qemu_file_shutdown(). Only 2 of them are from migrate-pause. So I'm
>> missing the logical step that links migrate-pause with this
>> error_setg().
>> Are you assuming that the race described will only happen
>> with migrate-pause and the other invocations would have set an error
>> already?
>
> It's not a race, but I think you're right. I thought it was always the case

I'm talking about the race with another thread checking f->last_error
and this thread setting it. Described in commit f5816b5c86ed
("migration: Fix race on qemu_file_shutdown()").

> to shut but actually not: we do shutdown() also in a few places where we
> don't really fail, either for COLO or for completion of migration.  With
> the 1st patch, it'll even show in query-migrate.  Thanks for spotting it -
> I could have done better.
>

The idea is that we avoid doing IO after the file has been shutdown, so
we preload this -EIO error. We could just alter the message to "Channel
has been explicitly shutdown" or "Tried to do IO after channel
shutdown". It would still be better than the generic EIO message.

But up to you.
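
The "preload an error so later IO is refused" idea described above can be
sketched with a small toy model. This is not QEMU code: ToyFile and the
toy_file_*() helpers are hypothetical stand-ins for QEMUFile and its
helpers, and the message string is just one of the wordings suggested
above. It only illustrates that the first recorded error wins and that
writes are rejected once the file has been shut down.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for QEMUFile; not the real QEMU structure. */
typedef struct ToyFile {
    int last_error;          /* 0, or a negative errno; first error wins */
    const char *error_desc;  /* optional human-readable reason */
    size_t bytes_written;    /* counts bytes accepted before any error */
} ToyFile;

/* Record an error and its description only if none is recorded yet. */
static void toy_file_set_error(ToyFile *f, int err, const char *desc)
{
    assert(f != NULL);
    if (f->last_error == 0) {
        f->last_error = err;
        f->error_desc = desc;
    }
}

/* Shut the file down and preload -EIO so that later IO is refused. */
static void toy_file_shutdown(ToyFile *f)
{
    toy_file_set_error(f, -EIO, "Tried to do IO after channel shutdown");
}

/* Accept a write unless an error is recorded; return 0 or that error. */
static int toy_file_put(ToyFile *f, const void *buf, size_t len)
{
    assert(f != NULL);
    (void)buf; /* the toy model only counts bytes */
    if (f->last_error) {
        return f->last_error;
    }
    f->bytes_written += len;
    return 0;
}
```

In this model a write before toy_file_shutdown() succeeds, while any write
after it returns -EIO without touching the byte counter, and a later
toy_file_set_error() call does not overwrite the preloaded error.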
Peter Xu July 6, 2023, 6:08 p.m. UTC | #6
On Thu, Jul 06, 2023 at 02:33:42PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Thu, Jul 06, 2023 at 10:50:34AM -0300, Fabiano Rosas wrote:
> >> Peter Xu <peterx@redhat.com> writes:
> >> 
> >> > On Wed, Jul 05, 2023 at 07:05:13PM -0300, Fabiano Rosas wrote:
> >> >> Peter Xu <peterx@redhat.com> writes:
> >> >> 
> >> >> > Provide an explicit reason for qemu_file_shutdown()s, which can be
> >> >> > displayed in query-migrate when used.
> >> >> >
> >> >> 
> >> >> Can we consider this to cover the TODO:
> >> >> 
> >> >>  * TODO: convert to propagate Error objects instead of squashing
> >> >>  * to a fixed errno value
> >> >> 
> >> >> or would that need something fancier?
> >> >
> >> > The TODO seems to say we want to allow qemu_file_shutdown() to report an
> >> > Error* when anything wrong happened (e.g. shutdown() failed)?  While this
> >> > patch was trying to store a specific error string so when query migration
> >> > later it'll show up to the user.  If so, IMHO they're two things.
> >> >
> >> 
> >> Ok, just making sure.
> >> 
> >> >> 
> >> >> > This will make e.g. migrate-pause to display explicit error descriptions,
> >> >> > from:
> >> >> >
> >> >> > "error-desc": "Channel error: Input/output error"
> >> >> >
> >> >> > To:
> >> >> >
> >> >> > "error-desc": "Channel is explicitly shutdown by the user"
> >> >> >
> >> >> > in query-migrate.
> >> >> >
> >> >> > Signed-off-by: Peter Xu <peterx@redhat.com>
> >> >> > ---
> >> >> >  migration/qemu-file.c | 5 ++++-
> >> >> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >> >> >
> >> >> > diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> >> >> > index 419b4092e7..ff605027de 100644
> >> >> > --- a/migration/qemu-file.c
> >> >> > +++ b/migration/qemu-file.c
> >> >> > @@ -87,7 +87,10 @@ int qemu_file_shutdown(QEMUFile *f)
> >> >> >       *      --> guest crash!
> >> >> >       */
> >> >> >      if (!f->last_error) {
> >> >> > -        qemu_file_set_error(f, -EIO);
> >> >> > +        Error *err = NULL;
> >> >> > +
> >> >> > +        error_setg(&err, "Channel is explicitly shutdown by the user");
> >> >> 
> >> >> It is good that we can grep this message. However, I'm confused about
> >> >> who the "user" is meant to be here and how are they implicated in this
> >> >> error.
> >> >
> >> > Ah, here the user is who sends the "migrate-pause" command, according to
> >> > the example of the commit message.
> >> >
> >> 
> >> That's where I'm confused. There are 15 callsites for
> >> qemu_file_shutdown(). Only 2 of them are from migrate-pause. So I'm
> >> missing the logical step that links migrate-pause with this
> >> error_setg().
> >> Are you assuming that the race described will only happen
> >> with migrate-pause and the other invocations would have set an error
> >> already?
> >
> > It's not a race, but I think you're right. I thought it was always the case
> 
> I'm talking about the race with another thread checking f->last_error
> and this thread setting it. Described in commit f5816b5c86ed
> ("migration: Fix race on qemu_file_shutdown()").

I don't get your point yet, sorry.  I thought f5816b5c86ed closed that
race.  What's still missing?

> 
> > to shut but actually not: we do shutdown() also in a few places where we
> > don't really fail, either for COLO or for completion of migration.  With
> > the 1st patch, it'll even show in query-migrate.  Thanks for spotting it -
> > I could have done better.
> >
> 
> The idea is that we avoid doing IO after the file has been shutdown, so
> we preload this -EIO error. We could just alter the message to "Channel
> has been explicitly shutdown" or "Tried to do IO after channel
> shutdown". It would still be better than the generic EIO message.

My point is I'm afraid (that's what I thought after you pointed it out, but
maybe I just misread what you said..) we'll call qemu_file_shutdown() even
in normal paths, so we can see an error pop up in query-migrate even if
nothing wrong happened. I think that's unwanted.

We can still improve that msg by only setting that specific error in e.g.
qmp_migrate_pause|cancel() or paths where we know we want to set the error,
but I'd rather drop the patch first so the rest of the patches can be
reviewed and merged first; that'll be a cosmetic change.
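
A rough sketch of that alternative, again as a toy model rather than QEMU
code: the generic shutdown helper keeps preloading a bare -EIO, and only
the path that knows the user asked for the shutdown attaches the
descriptive string. ToyChannel, toy_shutdown(), toy_user_pause() and
toy_complete() are hypothetical names, and how a real query-migrate would
treat an error without a description is not modelled here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a migration channel; not a QEMU type. */
typedef struct ToyChannel {
    int last_error;          /* 0, or a negative errno; first error wins */
    const char *error_desc;  /* set only by paths that know the reason */
} ToyChannel;

/* Generic shutdown: block further IO, but do not claim a reason. */
static void toy_shutdown(ToyChannel *c)
{
    assert(c != NULL);
    if (!c->last_error) {
        c->last_error = -EIO;
    }
}

/* User-requested pause: this path knows why the channel is going away. */
static void toy_user_pause(ToyChannel *c)
{
    assert(c != NULL);
    if (!c->last_error) {
        c->last_error = -EIO;
        c->error_desc = "Channel is explicitly shutdown by the user";
    }
    toy_shutdown(c);
}

/* Normal completion: shuts down too, but nothing went wrong. */
static void toy_complete(ToyChannel *c)
{
    toy_shutdown(c);
}
```

With this split, only the channel that went through toy_user_pause()
carries the "by the user" text; a channel closed by toy_complete() has no
description attached, so the message can no longer be misattributed.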
Fabiano Rosas July 6, 2023, 6:47 p.m. UTC | #7
Peter Xu <peterx@redhat.com> writes:

> On Thu, Jul 06, 2023 at 02:33:42PM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>> 
>> > On Thu, Jul 06, 2023 at 10:50:34AM -0300, Fabiano Rosas wrote:
>> >> Peter Xu <peterx@redhat.com> writes:
>> >> 
>> >> > On Wed, Jul 05, 2023 at 07:05:13PM -0300, Fabiano Rosas wrote:
>> >> >> Peter Xu <peterx@redhat.com> writes:
>> >> >> 
>> >> >> > Provide an explicit reason for qemu_file_shutdown()s, which can be
>> >> >> > displayed in query-migrate when used.
>> >> >> >
>> >> >> 
>> >> >> Can we consider this to cover the TODO:
>> >> >> 
>> >> >>  * TODO: convert to propagate Error objects instead of squashing
>> >> >>  * to a fixed errno value
>> >> >> 
>> >> >> or would that need something fancier?
>> >> >
>> >> > The TODO seems to say we want to allow qemu_file_shutdown() to report an
>> >> > Error* when anything wrong happened (e.g. shutdown() failed)?  While this
>> >> > patch was trying to store a specific error string so when query migration
>> >> > later it'll show up to the user.  If so, IMHO they're two things.
>> >> >
>> >> 
>> >> Ok, just making sure.
>> >> 
>> >> >> 
>> >> >> > This will make e.g. migrate-pause to display explicit error descriptions,
>> >> >> > from:
>> >> >> >
>> >> >> > "error-desc": "Channel error: Input/output error"
>> >> >> >
>> >> >> > To:
>> >> >> >
>> >> >> > "error-desc": "Channel is explicitly shutdown by the user"
>> >> >> >
>> >> >> > in query-migrate.
>> >> >> >
>> >> >> > Signed-off-by: Peter Xu <peterx@redhat.com>
>> >> >> > ---
>> >> >> >  migration/qemu-file.c | 5 ++++-
>> >> >> >  1 file changed, 4 insertions(+), 1 deletion(-)
>> >> >> >
>> >> >> > diff --git a/migration/qemu-file.c b/migration/qemu-file.c
>> >> >> > index 419b4092e7..ff605027de 100644
>> >> >> > --- a/migration/qemu-file.c
>> >> >> > +++ b/migration/qemu-file.c
>> >> >> > @@ -87,7 +87,10 @@ int qemu_file_shutdown(QEMUFile *f)
>> >> >> >       *      --> guest crash!
>> >> >> >       */
>> >> >> >      if (!f->last_error) {
>> >> >> > -        qemu_file_set_error(f, -EIO);
>> >> >> > +        Error *err = NULL;
>> >> >> > +
>> >> >> > +        error_setg(&err, "Channel is explicitly shutdown by the user");
>> >> >> 
>> >> >> It is good that we can grep this message. However, I'm confused about
>> >> >> who the "user" is meant to be here and how are they implicated in this
>> >> >> error.
>> >> >
>> >> > Ah, here the user is who sends the "migrate-pause" command, according to
>> >> > the example of the commit message.
>> >> >
>> >> 
>> >> That's where I'm confused. There are 15 callsites for
>> >> qemu_file_shutdown(). Only 2 of them are from migrate-pause. So I'm
>> >> missing the logical step that links migrate-pause with this
>> >> error_setg().
>> >> Are you assuming that the race described will only happen
>> >> with migrate-pause and the other invocations would have set an error
>> >> already?
>> >
>> > It's not a race, but I think you're right. I thought it was always the case
>> 
>> I'm talking about the race with another thread checking f->last_error
>> and this thread setting it. Described in commit f5816b5c86ed
>> ("migration: Fix race on qemu_file_shutdown()").
>
> I don't yet catch your point, sorry.  I thought f5816b5c86ed closed that
> race.  What's still missing?
>

I was initially trying to ask if your previous knowledge about the
situation that caused the race could allow you to infer that the error
message would only be relevant in the migrate-pause scenario. But I now
understand that is not the case.

>> 
>> > to shut but actually not: we do shutdown() also in a few places where we
>> > don't really fail, either for COLO or for completion of migration.  With
>> > the 1st patch, it'll even show in query-migrate.  Thanks for spotting it -
>> > I could have done better.
>> >
>> 
>> The idea is that we avoid doing IO after the file has been shutdown, so
>> we preload this -EIO error. We could just alter the message to "Channel
>> has been explicitly shutdown" or "Tried to do IO after channel
>> shutdown". It would still be better than the generic EIO message.
>
> My point is I'm afraid (I thought after you pointed out, but maybe I just
> misread what you said..) we'll call qemu_file_shutdown() even in normal
> paths, so we can see an error pop up in query-migrate even if nothing
> wrong happened. I think that's unwanted.
>

I see. My point was that the error message wouldn't always match the
situation in which qemu_file_shutdown() was called. The fact that we
might not even want the error message at all had not crossed my mind.

> We can still improve that msg by only setting that specific error in e.g.
> qmp_migrate_pause|cancel() or paths where we know we want to set the error,
> but I'd rather drop the patch first so the rest of the patches can be reviewed and
> merged first; that'll be a cosmetic change.

Ok, I agree. Thanks for the clarification.

Patch

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 419b4092e7..ff605027de 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -87,7 +87,10 @@  int qemu_file_shutdown(QEMUFile *f)
      *      --> guest crash!
      */
     if (!f->last_error) {
-        qemu_file_set_error(f, -EIO);
+        Error *err = NULL;
+
+        error_setg(&err, "Channel is explicitly shutdown by the user");
+        qemu_file_set_error_obj(f, -EIO, err);
     }
 
     if (!qio_channel_has_feature(f->ioc,