Message ID | 20230918144206.560120-51-armbru@redhat.com |
---|---|
State | New |
Headers | show |
Series | migration/rdma: Error handling fixes | expand |
On 18/09/2023 22:42, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the error), there is no error to > report, i.e. the report is bogus. > > qemu_rdma_source_init(), qemu_rdma_connect(), > rdma_start_incoming_migration(), and rdma_start_outgoing_migration() > violate this principle: they call error_report() via > qemu_rdma_cleanup(). > > Moreover, qemu_rdma_cleanup() can't fail. It is called on error > paths, and QIOChannel close and finalization. Are the conditions it > reports really errors? I doubt it. I'm not very sure, it's fine if it's call from the error path. but when the caller is migration_cancle from HMP/QMP, shall we report something more though we know QEMU can recover. maybe change to warning etc... > > Clean this up: silence qemu_rdma_cleanup(). I believe that's fine for > all these callers. If it isn't, we need to convert to Error instead. > > Signed-off-by: Markus Armbruster <armbru@redhat.com> > --- > migration/rdma.c | 6 +----- > 1 file changed, 1 insertion(+), 5 deletions(-) > > diff --git a/migration/rdma.c b/migration/rdma.c > index d9f80ef390..be2db7946d 100644 > --- a/migration/rdma.c > +++ b/migration/rdma.c > @@ -2330,7 +2330,6 @@ static int qemu_rdma_write(QEMUFile *f, RDMAContext *rdma, > > static void qemu_rdma_cleanup(RDMAContext *rdma) > { > - Error *err = NULL; > int idx; > > if (rdma->cm_id && rdma->connected) { > @@ -2341,10 +2340,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma) > .type = RDMA_CONTROL_ERROR, > .repeat = 1, > }; > - error_report("Early error. Sending error."); > - if (qemu_rdma_post_send_control(rdma, NULL, &head, &err) < 0) { > - error_report_err(err); > - } > + qemu_rdma_post_send_control(rdma, NULL, &head, NULL); > } > > rdma_disconnect(rdma->cm_id);
"Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com> writes: > On 18/09/2023 22:42, Markus Armbruster wrote: >> Functions that use an Error **errp parameter to return errors should >> not also report them to the user, because reporting is the caller's >> job. When the caller does, the error is reported twice. When it >> doesn't (because it recovered from the error), there is no error to >> report, i.e. the report is bogus. >> >> qemu_rdma_source_init(), qemu_rdma_connect(), >> rdma_start_incoming_migration(), and rdma_start_outgoing_migration() >> violate this principle: they call error_report() via >> qemu_rdma_cleanup(). >> >> Moreover, qemu_rdma_cleanup() can't fail. It is called on error >> paths, and QIOChannel close and finalization. Are the conditions it >> reports really errors? I doubt it. > > I'm not very sure, it's fine if it's call from the error path. but when > the caller is migration_cancle from HMP/QMP, shall we report something more > though we know QEMU can recover. > > maybe change to warning etc... The part I'm sure about is that reporting an error to the user is wrong when we actually recover from the error. Which qemu_rdma_cleanup() does. I'm not sure whether the (complicated!) condition that triggers qemu_rdma_cleanup()'s ill-advised error report needs to be reported in some other form. The remainder of the function ignores failure... If you think we should to downgrade the error to a warning, and no maintainer disagrees, then I'll downgrade. Do you? >> Clean this up: silence qemu_rdma_cleanup(). I believe that's fine for >> all these callers. If it isn't, we need to convert to Error instead. >> >> Signed-off-by: Markus Armbruster <armbru@redhat.com> >> --- >> migration/rdma.c | 6 +----- >> 1 file changed, 1 insertion(+), 5 deletions(-) >> >> diff --git a/migration/rdma.c b/migration/rdma.c >> index d9f80ef390..be2db7946d 100644 >> --- a/migration/rdma.c >> +++ b/migration/rdma.c >> @@ -2330,7 +2330,6 @@ static int qemu_rdma_write(QEMUFile *f, RDMAContext *rdma, >> >> static void qemu_rdma_cleanup(RDMAContext *rdma) >> { >> - Error *err = NULL; >> int idx; >> >> if (rdma->cm_id && rdma->connected) { >> @@ -2341,10 +2340,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma) >> .type = RDMA_CONTROL_ERROR, >> .repeat = 1, >> }; >> - error_report("Early error. Sending error."); >> - if (qemu_rdma_post_send_control(rdma, NULL, &head, &err) < 0) { >> - error_report_err(err); >> - } >> + qemu_rdma_post_send_control(rdma, NULL, &head, NULL); >> } >> >> rdma_disconnect(rdma->cm_id);
On 26/09/2023 19:52, Markus Armbruster wrote: > "Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com> writes: > >> On 18/09/2023 22:42, Markus Armbruster wrote: >>> Functions that use an Error **errp parameter to return errors should >>> not also report them to the user, because reporting is the caller's >>> job. When the caller does, the error is reported twice. When it >>> doesn't (because it recovered from the error), there is no error to >>> report, i.e. the report is bogus. >>> >>> qemu_rdma_source_init(), qemu_rdma_connect(), >>> rdma_start_incoming_migration(), and rdma_start_outgoing_migration() >>> violate this principle: they call error_report() via >>> qemu_rdma_cleanup(). >>> >>> Moreover, qemu_rdma_cleanup() can't fail. It is called on error >>> paths, and QIOChannel close and finalization. Are the conditions it >>> reports really errors? I doubt it. >> >> I'm not very sure, it's fine if it's call from the error path. but when >> the caller is migration_cancle from HMP/QMP, shall we report something more >> though we know QEMU can recover. >> >> maybe change to warning etc... > > The part I'm sure about is that reporting an error to the user is wrong > when we actually recover from the error. Which qemu_rdma_cleanup() > does. Yes, i have no doubt about this. > > I'm not sure whether the (complicated!) condition that triggers > qemu_rdma_cleanup()'s ill-advised error report needs to be reported in > some other form. The remainder of the function ignores failure... > > If you think we should to downgrade the error to a warning, and no > maintainer disagrees, then I'll downgrade. Do you? Yes, I'd like downgrade error to a warning. Thanks Zhijian > >>> Clean this up: silence qemu_rdma_cleanup(). I believe that's fine for >>> all these callers. If it isn't, we need to convert to Error instead. >>> >>> Signed-off-by: Markus Armbruster <armbru@redhat.com> >>> --- >>> migration/rdma.c | 6 +----- >>> 1 file changed, 1 insertion(+), 5 deletions(-) >>> >>> diff --git a/migration/rdma.c b/migration/rdma.c >>> index d9f80ef390..be2db7946d 100644 >>> --- a/migration/rdma.c >>> +++ b/migration/rdma.c >>> @@ -2330,7 +2330,6 @@ static int qemu_rdma_write(QEMUFile *f, RDMAContext *rdma, >>> >>> static void qemu_rdma_cleanup(RDMAContext *rdma) >>> { >>> - Error *err = NULL; >>> int idx; >>> >>> if (rdma->cm_id && rdma->connected) { >>> @@ -2341,10 +2340,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma) >>> .type = RDMA_CONTROL_ERROR, >>> .repeat = 1, >>> }; >>> - error_report("Early error. Sending error."); >>> - if (qemu_rdma_post_send_control(rdma, NULL, &head, &err) < 0) { >>> - error_report_err(err); >>> - } >>> + qemu_rdma_post_send_control(rdma, NULL, &head, NULL); >>> } >>> >>> rdma_disconnect(rdma->cm_id); >
"Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com> writes: > On 26/09/2023 19:52, Markus Armbruster wrote: >> "Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com> writes: >> >>> On 18/09/2023 22:42, Markus Armbruster wrote: >>>> Functions that use an Error **errp parameter to return errors should >>>> not also report them to the user, because reporting is the caller's >>>> job. When the caller does, the error is reported twice. When it >>>> doesn't (because it recovered from the error), there is no error to >>>> report, i.e. the report is bogus. >>>> >>>> qemu_rdma_source_init(), qemu_rdma_connect(), >>>> rdma_start_incoming_migration(), and rdma_start_outgoing_migration() >>>> violate this principle: they call error_report() via >>>> qemu_rdma_cleanup(). >>>> >>>> Moreover, qemu_rdma_cleanup() can't fail. It is called on error >>>> paths, and QIOChannel close and finalization. Are the conditions it >>>> reports really errors? I doubt it. >>> >>> I'm not very sure, it's fine if it's call from the error path. but when >>> the caller is migration_cancle from HMP/QMP, shall we report something more >>> though we know QEMU can recover. >>> >>> maybe change to warning etc... >> >> The part I'm sure about is that reporting an error to the user is wrong >> when we actually recover from the error. Which qemu_rdma_cleanup() >> does. > > Yes, i have no doubt about this. > > >> >> I'm not sure whether the (complicated!) condition that triggers >> qemu_rdma_cleanup()'s ill-advised error report needs to be reported in >> some other form. The remainder of the function ignores failure... >> >> If you think we should to downgrade the error to a warning, and no >> maintainer disagrees, then I'll downgrade. Do you? > > Yes, I'd like downgrade error to a warning. Got it, thanks!
diff --git a/migration/rdma.c b/migration/rdma.c index d9f80ef390..be2db7946d 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -2330,7 +2330,6 @@ static int qemu_rdma_write(QEMUFile *f, RDMAContext *rdma, static void qemu_rdma_cleanup(RDMAContext *rdma) { - Error *err = NULL; int idx; if (rdma->cm_id && rdma->connected) { @@ -2341,10 +2340,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma) .type = RDMA_CONTROL_ERROR, .repeat = 1, }; - error_report("Early error. Sending error."); - if (qemu_rdma_post_send_control(rdma, NULL, &head, &err) < 0) { - error_report_err(err); - } + qemu_rdma_post_send_control(rdma, NULL, &head, NULL); } rdma_disconnect(rdma->cm_id);
Functions that use an Error **errp parameter to return errors should not also report them to the user, because reporting is the caller's job. When the caller does, the error is reported twice. When it doesn't (because it recovered from the error), there is no error to report, i.e. the report is bogus. qemu_rdma_source_init(), qemu_rdma_connect(), rdma_start_incoming_migration(), and rdma_start_outgoing_migration() violate this principle: they call error_report() via qemu_rdma_cleanup(). Moreover, qemu_rdma_cleanup() can't fail. It is called on error paths, and QIOChannel close and finalization. Are the conditions it reports really errors? I doubt it. Clean this up: silence qemu_rdma_cleanup(). I believe that's fine for all these callers. If it isn't, we need to convert to Error instead. Signed-off-by: Markus Armbruster <armbru@redhat.com> --- migration/rdma.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-)