diff mbox

[V7,4/6] qcow2: cancel the modification on fail in qcow2_snapshot_create()

Message ID 1386244972-528-5-git-send-email-xiawenc@linux.vnet.ibm.com
State New
Headers show

Commit Message

Wayne Xia Dec. 5, 2013, 12:02 p.m. UTC
Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2-snapshot.c |   25 +++++++++++++++++++++----
 1 files changed, 21 insertions(+), 4 deletions(-)

Comments

Stefan Hajnoczi Dec. 20, 2013, 2:20 p.m. UTC | #1
On Thu, Dec 05, 2013 at 08:02:50PM +0800, Wenchao Xia wrote:
> +restore_refcount:
> +    if (qcow2_update_snapshot_refcount(bs, s->l1_table_offset, s->l1_size, -1)
> +        < 0 && errp) {
> +        /* Nothing can be done now, need image check later */
> +        error_setg(&err, "%s\nqcow2: Error in restoring refcount in snapshot",
> +                   error_get_pretty(*errp));
> +        error_free(*errp);
> +        *errp = NULL;
> +        error_propagate(errp, err);
> +    }

We get here if writing the new snapshot list failed.  If
qcow2_update_snapshot_refcount(..., -1) also fails I think we should
skip qcow2_free_clusters() below.  We don't know the exact state of the
disk image anymore - better to leak clusters than to have a dangling
reference.

> +dealloc_cluster:
> +    qcow2_free_clusters(bs, sn->l1_table_offset,
> +                        sn->l1_size * sizeof(uint64_t),
> +                        QCOW2_DISCARD_ALWAYS);
> +
>  fail:
>      g_free(sn->id_str);
>      g_free(sn->name);
> -- 
> 1.7.1
> 
>
Wayne Xia Dec. 23, 2013, 2:57 a.m. UTC | #2
于 2013/12/20 22:20, Stefan Hajnoczi 写道:
> On Thu, Dec 05, 2013 at 08:02:50PM +0800, Wenchao Xia wrote:
>> +restore_refcount:
>> +    if (qcow2_update_snapshot_refcount(bs, s->l1_table_offset, s->l1_size, -1)
>> +        < 0 && errp) {
>> +        /* Nothing can be done now, need image check later */
>> +        error_setg(&err, "%s\nqcow2: Error in restoring refcount in snapshot",
>> +                   error_get_pretty(*errp));
>> +        error_free(*errp);
>> +        *errp = NULL;
>> +        error_propagate(errp, err);
>> +    }
>
> We get here if writing the new snapshot list failed.  If
> qcow2_update_snapshot_refcount(..., -1) also fails I think we should
> skip qcow2_free_clusters() below.  We don't know the exact state of the
> disk image anymore - better to leak clusters than to have a dangling
> reference.
>

   Make sense, dangling point should be avoid in any case, will fix,
Thanks for reviewing!

>> +dealloc_cluster:
>> +    qcow2_free_clusters(bs, sn->l1_table_offset,
>> +                        sn->l1_size * sizeof(uint64_t),
>> +                        QCOW2_DISCARD_ALWAYS);
>> +
>>   fail:
>>       g_free(sn->id_str);
>>       g_free(sn->name);
>> --
>> 1.7.1
>>
>>
>
Wayne Xia Dec. 23, 2013, 6:12 a.m. UTC | #3
于 2013/12/23 10:57, Wenchao Xia 写道:
> 于 2013/12/20 22:20, Stefan Hajnoczi 写道:
>> On Thu, Dec 05, 2013 at 08:02:50PM +0800, Wenchao Xia wrote:
>>> +restore_refcount:
>>> +    if (qcow2_update_snapshot_refcount(bs, s->l1_table_offset,
>>> s->l1_size, -1)
>>> +        < 0 && errp) {
>>> +        /* Nothing can be done now, need image check later */
>>> +        error_setg(&err, "%s\nqcow2: Error in restoring refcount in
>>> snapshot",
>>> +                   error_get_pretty(*errp));
>>> +        error_free(*errp);
>>> +        *errp = NULL;
>>> +        error_propagate(errp, err);
>>> +    }
>>
>> We get here if writing the new snapshot list failed.  If
>> qcow2_update_snapshot_refcount(..., -1) also fails I think we should
>> skip qcow2_free_clusters() below.  We don't know the exact state of the
>> disk image anymore - better to leak clusters than to have a dangling
>> reference.
>>
>
>    Make sense, dangling point should be avoid in any case, will fix,
> Thanks for reviewing!
>
>>> +dealloc_cluster:
>>> +    qcow2_free_clusters(bs, sn->l1_table_offset,
>>> +                        sn->l1_size * sizeof(uint64_t),
>>> +                        QCOW2_DISCARD_ALWAYS);
>>> +
>>>   fail:
>>>       g_free(sn->id_str);
>>>       g_free(sn->name);
>>> --
>>> 1.7.1
>>>
>>>
>>
>
>
Hi, Stefan
   I have reconsidered the roll back process, there is many case we
should take care, so it is better to summarize a general rule to do such
cancel operations. I suggest: do a series of roll back operations,
when one fail, skip following roll back operation. For snapshot create,
the create action is:
allocate new L1 -> refcount+1 -> allocate sn_list -> update header
The mirrored rollback action can be:
deallocate L1 <- refcount-1 <- deallocate sn_list <- restore header

When fail happens in rollback action, simply stop following ones.
If you agree, I'd like to reorganize the patch as above.
Stefan Hajnoczi Jan. 2, 2014, 3:42 a.m. UTC | #4
On Mon, Dec 23, 2013 at 02:12:56PM +0800, Wenchao Xia wrote:
>   I have reconsidered the roll back process, there is many case we
> should take care, so it is better to summarize a general rule to do such
> cancel operations. I suggest: do a series of roll back operations,
> when one fail, skip following roll back operation. For snapshot create,
> the create action is:
> allocate new L1 -> refcount+1 -> allocate sn_list -> update header
> The mirrored rollback action can be:
> deallocate L1 <- refcount-1 <- deallocate sn_list <- restore header
> 
> When fail happens in rollback action, simply stop following ones.
> If you agree, I'd like to reorganize the patch as above.

I agree.  When the steps depend on each other we should skip further
operations when an error is returned in the failure path.  If the steps
are independent we can still safely clean up the independent parts.

Stefan
diff mbox

Patch

diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index 55746c4..5f787bc 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -400,6 +400,7 @@  void qcow2_snapshot_create(BlockDriverState *bs,
     int i, ret;
     uint64_t *l1_table = NULL;
     int64_t l1_table_offset;
+    Error *err = NULL;
 
     memset(sn, 0, sizeof(*sn));
 
@@ -448,7 +449,7 @@  void qcow2_snapshot_create(BlockDriverState *bs,
                          PRIu64 " with size %" PRIu64,
                          sn->l1_table_offset,
                          (uint64_t)(s->l1_size * sizeof(uint64_t)));
-        goto fail;
+        goto dealloc_cluster;
     }
 
     ret = bdrv_pwrite(bs->file, sn->l1_table_offset, l1_table,
@@ -459,7 +460,7 @@  void qcow2_snapshot_create(BlockDriverState *bs,
                          PRIu64 " with size %" PRIu64,
                          sn->l1_table_offset,
                          (uint64_t)(s->l1_size * sizeof(uint64_t)));
-        goto fail;
+        goto dealloc_cluster;
     }
 
     g_free(l1_table);
@@ -476,7 +477,7 @@  void qcow2_snapshot_create(BlockDriverState *bs,
                          "Failed in update of refcount for snapshot at %"
                          PRIu64 " with size %d",
                          s->l1_table_offset, s->l1_size);
-        goto fail;
+        goto dealloc_cluster;
     }
 
     /* Append the new snapshot to the snapshot list */
@@ -494,7 +495,7 @@  void qcow2_snapshot_create(BlockDriverState *bs,
         g_free(s->snapshots);
         s->snapshots = old_snapshot_list;
         s->nb_snapshots--;
-        goto fail;
+        goto restore_refcount;
     }
 
     g_free(old_snapshot_list);
@@ -514,6 +515,22 @@  void qcow2_snapshot_create(BlockDriverState *bs,
 #endif
     return;
 
+restore_refcount:
+    if (qcow2_update_snapshot_refcount(bs, s->l1_table_offset, s->l1_size, -1)
+        < 0 && errp) {
+        /* Nothing can be done now, need image check later */
+        error_setg(&err, "%s\nqcow2: Error in restoring refcount in snapshot",
+                   error_get_pretty(*errp));
+        error_free(*errp);
+        *errp = NULL;
+        error_propagate(errp, err);
+    }
+
+dealloc_cluster:
+    qcow2_free_clusters(bs, sn->l1_table_offset,
+                        sn->l1_size * sizeof(uint64_t),
+                        QCOW2_DISCARD_ALWAYS);
+
 fail:
     g_free(sn->id_str);
     g_free(sn->name);