Message ID | 20230705163502.331007-8-peterx@redhat.com |
---|---|
State | New |
Series | migration: Better error handling in return path thread |
Peter Xu <peterx@redhat.com> writes:

> Provide an explicit reason for qemu_file_shutdown()s, which can be
> displayed in query-migrate when used.
>

Can we consider this to cover the TODO:

 * TODO: convert to propagate Error objects instead of squashing
 * to a fixed errno value

or would that need something fancier?

> This will make e.g. migrate-pause to display explicit error descriptions,
> from:
>
>   "error-desc": "Channel error: Input/output error"
>
> To:
>
>   "error-desc": "Channel is explicitly shutdown by the user"
>
> in query-migrate.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/qemu-file.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> index 419b4092e7..ff605027de 100644
> --- a/migration/qemu-file.c
> +++ b/migration/qemu-file.c
> @@ -87,7 +87,10 @@ int qemu_file_shutdown(QEMUFile *f)
>       *      --> guest crash!
>       */
>      if (!f->last_error) {
> -        qemu_file_set_error(f, -EIO);
> +        Error *err = NULL;
> +
> +        error_setg(&err, "Channel is explicitly shutdown by the user");

It is good that we can grep this message. However, I'm confused about
who the "user" is meant to be here and how they are implicated in this
error.

> +        qemu_file_set_error_obj(f, -EIO, err);
>      }
>
>      if (!qio_channel_has_feature(f->ioc,

On Wed, Jul 05, 2023 at 07:05:13PM -0300, Fabiano Rosas wrote:
> Can we consider this to cover the TODO:
>
>  * TODO: convert to propagate Error objects instead of squashing
>  * to a fixed errno value
>
> or would that need something fancier?

The TODO seems to say we want to allow qemu_file_shutdown() to report an
Error* when anything goes wrong (e.g. shutdown() failed)? This patch, on
the other hand, tries to store a specific error string so that a later
query-migrate shows it to the user. If so, IMHO they're two different
things.

> > +        error_setg(&err, "Channel is explicitly shutdown by the user");
>
> It is good that we can grep this message. However, I'm confused about
> who the "user" is meant to be here and how they are implicated in this
> error.

Ah, here the user is whoever sends the "migrate-pause" command, as in
the example in the commit message.

What I wanted to do is provide a clear message (besides -EIO) in
query-migrate, so we know more about how the migration was
paused/stopped/... Before this patch, it looks the same as any other
kind of IO error.

Thanks,

Peter Xu <peterx@redhat.com> writes:

> On Wed, Jul 05, 2023 at 07:05:13PM -0300, Fabiano Rosas wrote:
>> Can we consider this to cover the TODO:
>>
>>  * TODO: convert to propagate Error objects instead of squashing
>>  * to a fixed errno value
>>
>> or would that need something fancier?
>
> The TODO seems to say we want to allow qemu_file_shutdown() to report an
> Error* when anything goes wrong (e.g. shutdown() failed)? This patch, on
> the other hand, tries to store a specific error string so that a later
> query-migrate shows it to the user. If so, IMHO they're two different
> things.
>

Ok, just making sure.

>> > +        error_setg(&err, "Channel is explicitly shutdown by the user");
>>
>> It is good that we can grep this message. However, I'm confused about
>> who the "user" is meant to be here and how they are implicated in this
>> error.
>
> Ah, here the user is whoever sends the "migrate-pause" command, as in
> the example in the commit message.
>

That's where I'm confused. There are 15 callsites for
qemu_file_shutdown(). Only 2 of them are from migrate-pause. So I'm
missing the logical step that links migrate-pause with this
error_setg().

Are you assuming that the race described will only happen with
migrate-pause and the other invocations would have set an error
already?

On Thu, Jul 06, 2023 at 10:50:34AM -0300, Fabiano Rosas wrote:
> That's where I'm confused. There are 15 callsites for
> qemu_file_shutdown(). Only 2 of them are from migrate-pause. So I'm
> missing the logical step that links migrate-pause with this
> error_setg().
>
> Are you assuming that the race described will only happen with
> migrate-pause and the other invocations would have set an error
> already?

It's not a race, but I think you're right. I thought a shutdown always
meant a failure, but that's actually not the case: we also do shutdown()
in a few places where nothing really fails, either for COLO or on
completion of migration. With the 1st patch, it would even show up in
query-migrate. Thanks for spotting it - I could have done better.

Let's drop this patch. Sorry for the noise.

Peter Xu <peterx@redhat.com> writes:

> On Thu, Jul 06, 2023 at 10:50:34AM -0300, Fabiano Rosas wrote:
>> Are you assuming that the race described will only happen with
>> migrate-pause and the other invocations would have set an error
>> already?
>
> It's not a race, but I think you're right.

I'm talking about the race with another thread checking f->last_error
and this thread setting it. Described in commit f5816b5c86ed
("migration: Fix race on qemu_file_shutdown()").

> I thought a shutdown always meant a failure, but that's actually not the
> case: we also do shutdown() in a few places where nothing really fails,
> either for COLO or on completion of migration. With the 1st patch, it
> would even show up in query-migrate. Thanks for spotting it - I could
> have done better.
>

The idea is that we avoid doing IO after the file has been shut down, so
we preload this -EIO error. We could just alter the message to "Channel
has been explicitly shutdown" or "Tried to do IO after channel
shutdown". It would still be better than the generic EIO message. But
up to you.

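The "preload -EIO" idea in the message above can be illustrated with a small, self-contained C sketch. This is not QEMU code: `MockFile`, `mock_file_set_error()`, `mock_file_shutdown()` and `mock_file_put_buffer()` are hypothetical stand-ins, and the sketch assumes that the first recorded error wins. It only shows why storing a sticky error at shutdown makes later I/O attempts fail fast instead of touching the channel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for QEMUFile; not the real QEMU structure. */
typedef struct MockFile {
    int last_error;          /* 0 means "no error recorded yet" */
    const char *error_desc;  /* optional human-readable reason */
    int bytes_written;       /* bytes accepted before any error */
} MockFile;

/* Record an error only if none is stored yet, so the first cause wins. */
static void mock_file_set_error(MockFile *f, int ret, const char *desc)
{
    assert(f != NULL);
    if (f->last_error == 0 && ret != 0) {
        f->last_error = ret;
        f->error_desc = desc;
    }
}

/* Shutdown preloads -EIO so that later I/O attempts are rejected. */
static int mock_file_shutdown(MockFile *f)
{
    mock_file_set_error(f, -EIO, "Tried to do IO after channel shutdown");
    return 0;
}

/* Writers check the sticky error before touching the channel. */
static int mock_file_put_buffer(MockFile *f, const char *buf, int len)
{
    assert(f != NULL);
    (void)buf;               /* the mock does not store the payload */
    if (f->last_error) {
        return f->last_error;
    }
    f->bytes_written += len;
    return len;
}
```

In this sketch a write before `mock_file_shutdown()` succeeds, while any write afterwards returns the stored `-EIO` without changing `bytes_written`.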
On Thu, Jul 06, 2023 at 02:33:42PM -0300, Fabiano Rosas wrote:
> I'm talking about the race with another thread checking f->last_error
> and this thread setting it. Described in commit f5816b5c86ed
> ("migration: Fix race on qemu_file_shutdown()").

I don't quite get your point yet, sorry. I thought f5816b5c86ed closed
that race. What's still missing?

> The idea is that we avoid doing IO after the file has been shut down, so
> we preload this -EIO error. We could just alter the message to "Channel
> has been explicitly shutdown" or "Tried to do IO after channel
> shutdown". It would still be better than the generic EIO message.

My point is that I'm afraid (that's what I understood after you pointed
it out, but maybe I just misread what you said) we'll call
qemu_file_shutdown() even in normal paths, so we could see an error pop
up in query-migrate even if nothing went wrong. I think that's unwanted.

We can still improve that message by only setting that specific error in
e.g. qmp_migrate_pause|cancel() or other paths where we know we want to
set the error, but I'd rather drop the patch for now so the rest of the
patches can be reviewed and merged first; that'll be a cosmetic change.

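The alternative suggested above, setting the descriptive error only in the paths that know the user asked for the shutdown, could look roughly like the following sketch. It is purely illustrative and not taken from QEMU: `MockFile`, `mock_migrate_pause()` and `mock_migration_complete()` are hypothetical names, and the sketch assumes the generic shutdown never overwrites an error that a caller has already recorded.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for QEMUFile; not the real QEMU structure. */
typedef struct MockFile {
    int last_error;          /* 0 means "no error recorded yet" */
    const char *error_desc;  /* NULL when only an errno was recorded */
} MockFile;

/* First error wins; later calls leave the stored error untouched. */
static void mock_file_set_error(MockFile *f, int ret, const char *desc)
{
    assert(f != NULL);
    if (f->last_error == 0 && ret != 0) {
        f->last_error = ret;
        f->error_desc = desc;
    }
}

/* Generic shutdown: preloads a bare -EIO with no user-facing message. */
static void mock_file_shutdown(MockFile *f)
{
    mock_file_set_error(f, -EIO, NULL);
}

/* Hypothetical pause handler: it knows the user requested the shutdown,
 * so it records the specific reason before shutting the file down. */
static void mock_migrate_pause(MockFile *f)
{
    mock_file_set_error(f, -EIO,
                        "Channel is explicitly shutdown by the user");
    mock_file_shutdown(f);
}

/* Hypothetical completion path: shuts down although nothing failed,
 * so no descriptive message is attached. */
static void mock_migration_complete(MockFile *f)
{
    mock_file_shutdown(f);
}
```

With this split, only the pause path ends up with a description that a status query could display; the completion path still blocks further I/O but carries no misleading "by the user" text.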
Peter Xu <peterx@redhat.com> writes:

> On Thu, Jul 06, 2023 at 02:33:42PM -0300, Fabiano Rosas wrote:
>> I'm talking about the race with another thread checking f->last_error
>> and this thread setting it. Described in commit f5816b5c86ed
>> ("migration: Fix race on qemu_file_shutdown()").
>
> I don't quite get your point yet, sorry. I thought f5816b5c86ed closed
> that race. What's still missing?
>

I was initially trying to ask if your previous knowledge about the
situation that caused the race could allow you to infer that the error
message would only be relevant in the migrate-pause scenario. But I now
understand that is not the case.

>> The idea is that we avoid doing IO after the file has been shut down, so
>> we preload this -EIO error. We could just alter the message to "Channel
>> has been explicitly shutdown" or "Tried to do IO after channel
>> shutdown". It would still be better than the generic EIO message.
>
> My point is that I'm afraid (that's what I understood after you pointed
> it out, but maybe I just misread what you said) we'll call
> qemu_file_shutdown() even in normal paths, so we could see an error pop
> up in query-migrate even if nothing went wrong. I think that's unwanted.
>

I see. My point was that the error message wouldn't always match the
situation in which qemu_file_shutdown() was called. The fact that we
might not even want the error message at all had not crossed my mind.

> We can still improve that message by only setting that specific error in
> e.g. qmp_migrate_pause|cancel() or other paths where we know we want to
> set the error, but I'd rather drop the patch for now so the rest of the
> patches can be reviewed and merged first; that'll be a cosmetic change.

Ok, I agree. Thanks for the clarification.

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 419b4092e7..ff605027de 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -87,7 +87,10 @@ int qemu_file_shutdown(QEMUFile *f)
      *      --> guest crash!
      */
     if (!f->last_error) {
-        qemu_file_set_error(f, -EIO);
+        Error *err = NULL;
+
+        error_setg(&err, "Channel is explicitly shutdown by the user");
+        qemu_file_set_error_obj(f, -EIO, err);
     }
 
     if (!qio_channel_has_feature(f->ioc,

Provide an explicit reason for qemu_file_shutdown()s, which can be
displayed in query-migrate when used.

This will make e.g. migrate-pause to display explicit error descriptions,
from:

  "error-desc": "Channel error: Input/output error"

To:

  "error-desc": "Channel is explicitly shutdown by the user"

in query-migrate.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/qemu-file.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)