diff mbox series

[50/52] migration/rdma: Silence qemu_rdma_cleanup()

Message ID 20230918144206.560120-51-armbru@redhat.com
State New
Headers show
Series migration/rdma: Error handling fixes | expand

Commit Message

Markus Armbruster Sept. 18, 2023, 2:42 p.m. UTC
Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

qemu_rdma_source_init(), qemu_rdma_connect(),
rdma_start_incoming_migration(), and rdma_start_outgoing_migration()
violate this principle: they call error_report() via
qemu_rdma_cleanup().

Moreover, qemu_rdma_cleanup() can't fail.  It is called on error
paths, and QIOChannel close and finalization.  Are the conditions it
reports really errors?  I doubt it.

Clean this up: silence qemu_rdma_cleanup().  I believe that's fine for
all these callers.  If it isn't, we need to convert to Error instead.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 migration/rdma.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

Comments

Li Zhijian Sept. 26, 2023, 10:12 a.m. UTC | #1
On 18/09/2023 22:42, Markus Armbruster wrote:
> Functions that use an Error **errp parameter to return errors should
> not also report them to the user, because reporting is the caller's
> job.  When the caller does, the error is reported twice.  When it
> doesn't (because it recovered from the error), there is no error to
> report, i.e. the report is bogus.
> 
> qemu_rdma_source_init(), qemu_rdma_connect(),
> rdma_start_incoming_migration(), and rdma_start_outgoing_migration()
> violate this principle: they call error_report() via
> qemu_rdma_cleanup().
> 
> Moreover, qemu_rdma_cleanup() can't fail.  It is called on error
> paths, and QIOChannel close and finalization.  Are the conditions it
> reports really errors?  I doubt it.

I'm not very sure, it's fine if it's call from the error path. but when
the caller is migration_cancle from HMP/QMP, shall we report something more
though we know QEMU can recover.

maybe change to warning etc...




> 
> Clean this up: silence qemu_rdma_cleanup().  I believe that's fine for
> all these callers.  If it isn't, we need to convert to Error instead.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   migration/rdma.c | 6 +-----
>   1 file changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index d9f80ef390..be2db7946d 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -2330,7 +2330,6 @@ static int qemu_rdma_write(QEMUFile *f, RDMAContext *rdma,
>   
>   static void qemu_rdma_cleanup(RDMAContext *rdma)
>   {
> -    Error *err = NULL;
>       int idx;
>   
>       if (rdma->cm_id && rdma->connected) {
> @@ -2341,10 +2340,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
>                                          .type = RDMA_CONTROL_ERROR,
>                                          .repeat = 1,
>                                        };
> -            error_report("Early error. Sending error.");
> -            if (qemu_rdma_post_send_control(rdma, NULL, &head, &err) < 0) {
> -                error_report_err(err);
> -            }
> +            qemu_rdma_post_send_control(rdma, NULL, &head, NULL);
>           }
>   
>           rdma_disconnect(rdma->cm_id);
Markus Armbruster Sept. 26, 2023, 11:52 a.m. UTC | #2
"Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com> writes:

> On 18/09/2023 22:42, Markus Armbruster wrote:
>> Functions that use an Error **errp parameter to return errors should
>> not also report them to the user, because reporting is the caller's
>> job.  When the caller does, the error is reported twice.  When it
>> doesn't (because it recovered from the error), there is no error to
>> report, i.e. the report is bogus.
>> 
>> qemu_rdma_source_init(), qemu_rdma_connect(),
>> rdma_start_incoming_migration(), and rdma_start_outgoing_migration()
>> violate this principle: they call error_report() via
>> qemu_rdma_cleanup().
>> 
>> Moreover, qemu_rdma_cleanup() can't fail.  It is called on error
>> paths, and QIOChannel close and finalization.  Are the conditions it
>> reports really errors?  I doubt it.
>
> I'm not very sure, it's fine if it's call from the error path. but when
> the caller is migration_cancle from HMP/QMP, shall we report something more
> though we know QEMU can recover.
>
> maybe change to warning etc...

The part I'm sure about is that reporting an error to the user is wrong
when we actually recover from the error.  Which qemu_rdma_cleanup()
does.

I'm not sure whether the (complicated!) condition that triggers
qemu_rdma_cleanup()'s ill-advised error report needs to be reported in
some other form.  The remainder of the function ignores failure...

If you think we should to downgrade the error to a warning, and no
maintainer disagrees, then I'll downgrade.  Do you?

>> Clean this up: silence qemu_rdma_cleanup().  I believe that's fine for
>> all these callers.  If it isn't, we need to convert to Error instead.
>> 
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>   migration/rdma.c | 6 +-----
>>   1 file changed, 1 insertion(+), 5 deletions(-)
>> 
>> diff --git a/migration/rdma.c b/migration/rdma.c
>> index d9f80ef390..be2db7946d 100644
>> --- a/migration/rdma.c
>> +++ b/migration/rdma.c
>> @@ -2330,7 +2330,6 @@ static int qemu_rdma_write(QEMUFile *f, RDMAContext *rdma,
>>   
>>   static void qemu_rdma_cleanup(RDMAContext *rdma)
>>   {
>> -    Error *err = NULL;
>>       int idx;
>>   
>>       if (rdma->cm_id && rdma->connected) {
>> @@ -2341,10 +2340,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
>>                                          .type = RDMA_CONTROL_ERROR,
>>                                          .repeat = 1,
>>                                        };
>> -            error_report("Early error. Sending error.");
>> -            if (qemu_rdma_post_send_control(rdma, NULL, &head, &err) < 0) {
>> -                error_report_err(err);
>> -            }
>> +            qemu_rdma_post_send_control(rdma, NULL, &head, NULL);
>>           }
>>   
>>           rdma_disconnect(rdma->cm_id);
Li Zhijian Sept. 27, 2023, 1:41 a.m. UTC | #3
On 26/09/2023 19:52, Markus Armbruster wrote:
> "Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com> writes:
> 
>> On 18/09/2023 22:42, Markus Armbruster wrote:
>>> Functions that use an Error **errp parameter to return errors should
>>> not also report them to the user, because reporting is the caller's
>>> job.  When the caller does, the error is reported twice.  When it
>>> doesn't (because it recovered from the error), there is no error to
>>> report, i.e. the report is bogus.
>>>
>>> qemu_rdma_source_init(), qemu_rdma_connect(),
>>> rdma_start_incoming_migration(), and rdma_start_outgoing_migration()
>>> violate this principle: they call error_report() via
>>> qemu_rdma_cleanup().
>>>
>>> Moreover, qemu_rdma_cleanup() can't fail.  It is called on error
>>> paths, and QIOChannel close and finalization.  Are the conditions it
>>> reports really errors?  I doubt it.
>>
>> I'm not very sure, it's fine if it's call from the error path. but when
>> the caller is migration_cancle from HMP/QMP, shall we report something more
>> though we know QEMU can recover.
>>
>> maybe change to warning etc...
> 
> The part I'm sure about is that reporting an error to the user is wrong
> when we actually recover from the error.  Which qemu_rdma_cleanup()
> does.

Yes, i have no doubt about this.


> 
> I'm not sure whether the (complicated!) condition that triggers
> qemu_rdma_cleanup()'s ill-advised error report needs to be reported in
> some other form.  The remainder of the function ignores failure...
> 
> If you think we should to downgrade the error to a warning, and no
> maintainer disagrees, then I'll downgrade.  Do you?

Yes, I'd like downgrade error to a warning.


Thanks
Zhijian

> 
>>> Clean this up: silence qemu_rdma_cleanup().  I believe that's fine for
>>> all these callers.  If it isn't, we need to convert to Error instead.
>>>
>>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>>> ---
>>>    migration/rdma.c | 6 +-----
>>>    1 file changed, 1 insertion(+), 5 deletions(-)
>>>
>>> diff --git a/migration/rdma.c b/migration/rdma.c
>>> index d9f80ef390..be2db7946d 100644
>>> --- a/migration/rdma.c
>>> +++ b/migration/rdma.c
>>> @@ -2330,7 +2330,6 @@ static int qemu_rdma_write(QEMUFile *f, RDMAContext *rdma,
>>>    
>>>    static void qemu_rdma_cleanup(RDMAContext *rdma)
>>>    {
>>> -    Error *err = NULL;
>>>        int idx;
>>>    
>>>        if (rdma->cm_id && rdma->connected) {
>>> @@ -2341,10 +2340,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
>>>                                           .type = RDMA_CONTROL_ERROR,
>>>                                           .repeat = 1,
>>>                                         };
>>> -            error_report("Early error. Sending error.");
>>> -            if (qemu_rdma_post_send_control(rdma, NULL, &head, &err) < 0) {
>>> -                error_report_err(err);
>>> -            }
>>> +            qemu_rdma_post_send_control(rdma, NULL, &head, NULL);
>>>            }
>>>    
>>>            rdma_disconnect(rdma->cm_id);
>
Markus Armbruster Sept. 27, 2023, 5:30 a.m. UTC | #4
"Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com> writes:

> On 26/09/2023 19:52, Markus Armbruster wrote:
>> "Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com> writes:
>> 
>>> On 18/09/2023 22:42, Markus Armbruster wrote:
>>>> Functions that use an Error **errp parameter to return errors should
>>>> not also report them to the user, because reporting is the caller's
>>>> job.  When the caller does, the error is reported twice.  When it
>>>> doesn't (because it recovered from the error), there is no error to
>>>> report, i.e. the report is bogus.
>>>>
>>>> qemu_rdma_source_init(), qemu_rdma_connect(),
>>>> rdma_start_incoming_migration(), and rdma_start_outgoing_migration()
>>>> violate this principle: they call error_report() via
>>>> qemu_rdma_cleanup().
>>>>
>>>> Moreover, qemu_rdma_cleanup() can't fail.  It is called on error
>>>> paths, and QIOChannel close and finalization.  Are the conditions it
>>>> reports really errors?  I doubt it.
>>>
>>> I'm not very sure, it's fine if it's call from the error path. but when
>>> the caller is migration_cancle from HMP/QMP, shall we report something more
>>> though we know QEMU can recover.
>>>
>>> maybe change to warning etc...
>> 
>> The part I'm sure about is that reporting an error to the user is wrong
>> when we actually recover from the error.  Which qemu_rdma_cleanup()
>> does.
>
> Yes, i have no doubt about this.
>
>
>> 
>> I'm not sure whether the (complicated!) condition that triggers
>> qemu_rdma_cleanup()'s ill-advised error report needs to be reported in
>> some other form.  The remainder of the function ignores failure...
>> 
>> If you think we should to downgrade the error to a warning, and no
>> maintainer disagrees, then I'll downgrade.  Do you?
>
> Yes, I'd like downgrade error to a warning.

Got it, thanks!
diff mbox series

Patch

diff --git a/migration/rdma.c b/migration/rdma.c
index d9f80ef390..be2db7946d 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2330,7 +2330,6 @@  static int qemu_rdma_write(QEMUFile *f, RDMAContext *rdma,
 
 static void qemu_rdma_cleanup(RDMAContext *rdma)
 {
-    Error *err = NULL;
     int idx;
 
     if (rdma->cm_id && rdma->connected) {
@@ -2341,10 +2340,7 @@  static void qemu_rdma_cleanup(RDMAContext *rdma)
                                        .type = RDMA_CONTROL_ERROR,
                                        .repeat = 1,
                                      };
-            error_report("Early error. Sending error.");
-            if (qemu_rdma_post_send_control(rdma, NULL, &head, &err) < 0) {
-                error_report_err(err);
-            }
+            qemu_rdma_post_send_control(rdma, NULL, &head, NULL);
         }
 
         rdma_disconnect(rdma->cm_id);