Message ID | 12553646173152-git-send-email-lirans@il.ibm.com |
---|---|
State | New |
Headers | show |
Hi Liran, lirans@il.ibm.com wrote: > To support live migration without shared storage we need to be able to trace > writes to disk while migrating. This Patch expose handler registration for > above components to be notified about block writes. > > diff --git a/block.c b/block.c > index 33f3d65..bf5f7a6 100644 > --- a/block.c > +++ b/block.c > @@ -61,6 +61,8 @@ BlockDriverState *bdrv_first; > > static BlockDriver *first_drv; > > +static BlockDriverDirtyHandler *bdrv_dirty_handler = NULL; > + > Should be a property of a BlockDriverState. IOW, we should register a dirty callback for each block device we're interested in. > int path_is_absolute(const char *path) > { > const char *p; > @@ -626,6 +628,10 @@ int bdrv_write(BlockDriverState *bs, int64_t sector_num, > if (bdrv_check_request(bs, sector_num, nb_sectors)) > return -EIO; > > + if(bdrv_dirty_handler != NULL) { > + bdrv_dirty_handler(bs, sector_num, nb_sectors); > + } > + > return drv->bdrv_write(bs, sector_num, buf, nb_sectors); > } > CodingStyle seems off. We have to be careful in these cases to check for whether we're dealing with BDRV_FILE. In the case of something like qcow2, you would get two dirty callbacks as this code stands. The first would be what the guest actually writes and then the second (and potentially third) would be qcow2 metadata updates along with writing the actual data to disk. In terms of an interface, I think it would be better to register a bitmap and to poll the block driver periodically to see which bits have changed. This is how ram dirty tracking works and I think keeping these interfaces consistent is a good thing. I'd suggest tracking dirtiness in relatively large chunks (at least 2MB). > > @@ -1359,6 +1370,10 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num, > if (bdrv_check_request(bs, sector_num, nb_sectors)) > return NULL; > > + if(bdrv_dirty_handler != NULL) { > + bdrv_dirty_handler(bs, sector_num, nb_sectors); > + } > + > ret = drv->bdrv_aio_writev(bs, sector_num, qiov, nb_sectors, > cb, opaque); > The check should be in the completion callback as AIO requests can be canceled. You're potentially giving a false positive. Regards, Anthony Liguori
qemu-devel-bounces+lirans=il.ibm.com@nongnu.org wrote on 21/10/2009 20:21:09: > Anthony Liguori <anthony@codemonkey.ws> > Sent by: qemu-devel-bounces+lirans=il.ibm.com@nongnu.org > > 21/10/2009 20:21 > > To > > Liran Schour/Haifa/IBM@IBMIL > > cc > > qemu-devel@nongnu.org > > Subject > > Re: [Qemu-devel] [PATCH 1/3 v4] Expose a mechanisem to trace block writes > > Hi Liran, > > lirans@il.ibm.com wrote: > > To support live migration without shared storage we need to be able to trace > > writes to disk while migrating. This Patch expose handler registration for > > above components to be notified about block writes. > > > > diff --git a/block.c b/block.c > > index 33f3d65..bf5f7a6 100644 > > --- a/block.c > > +++ b/block.c > > @@ -61,6 +61,8 @@ BlockDriverState *bdrv_first; > > > > static BlockDriver *first_drv; > > > > +static BlockDriverDirtyHandler *bdrv_dirty_handler = NULL; > > + > > > > Should be a property of a BlockDriverState. IOW, we should register a > dirty callback for each block device we're interested in. Agree, I will fix that. > > int path_is_absolute(const char *path) > > { > > const char *p; > > @@ -626,6 +628,10 @@ int bdrv_write(BlockDriverState *bs, int64_t > sector_num, > > if (bdrv_check_request(bs, sector_num, nb_sectors)) > > return -EIO; > > > > + if(bdrv_dirty_handler != NULL) { > > + bdrv_dirty_handler(bs, sector_num, nb_sectors); > > + } > > + > > return drv->bdrv_write(bs, sector_num, buf, nb_sectors); > > } > > > > CodingStyle seems off. > > We have to be careful in these cases to check for whether we're dealing > with BDRV_FILE. In the case of something like qcow2, you would get two > dirty callbacks as this code stands. The first would be what the guest > actually writes and then the second (and potentially third) would be > qcow2 metadata updates along with writing the actual data to disk. It seems to me that if we will register a callback for each device we are interested in, it will solve the problem. (in the code now bug is avoided by check that I do in block-migration.c about the device type and name. But I agree that right now it is buggy.) Will register each device separately to solve the problem. > In terms of an interface, I think it would be better to register a > bitmap and to poll the block driver periodically to see which bits have > changed. This is how ram dirty tracking works and I think keeping these > interfaces consistent is a good thing. There are advantages for let the higher component to manage it's own dirty tracking mechanism. In this way more then one component can register itself. I think to change the registration mechanism to allow more then one handler to be registered, the same way like register_savevm_live works. > I'd suggest tracking dirtiness in relatively large chunks (at least 2MB). There are disadvantages for tracking dirtiness in such large chunks. One write of 4KB to disk will result in 2MB migrated data. For now I see a way to improve things by allocating the bitmap only when migration started and not from the beginning. I will fix that. > > > > @@ -1359,6 +1370,10 @@ BlockDriverAIOCB *bdrv_aio_writev > (BlockDriverState *bs, int64_t sector_num, > > if (bdrv_check_request(bs, sector_num, nb_sectors)) > > return NULL; > > > > + if(bdrv_dirty_handler != NULL) { > > + bdrv_dirty_handler(bs, sector_num, nb_sectors); > > + } > > + > > ret = drv->bdrv_aio_writev(bs, sector_num, qiov, nb_sectors, > > cb, opaque); > > > > The check should be in the completion callback as AIO requests can be > canceled. You're potentially giving a false positive. The problem is that the completion callback is caller private. The only way is to intercept the code that is calling the cb, but this code is format specific - means many places in the source code. For our reason I think that we can live with treating a write that was cancel as a dirty block. What do you think? I will fix all of the above and will resend the patch. Thanks for the review. - Liran
diff --git a/block.c b/block.c index 33f3d65..bf5f7a6 100644 --- a/block.c +++ b/block.c @@ -61,6 +61,8 @@ BlockDriverState *bdrv_first; static BlockDriver *first_drv; +static BlockDriverDirtyHandler *bdrv_dirty_handler = NULL; + int path_is_absolute(const char *path) { const char *p; @@ -626,6 +628,10 @@ int bdrv_write(BlockDriverState *bs, int64_t sector_num, if (bdrv_check_request(bs, sector_num, nb_sectors)) return -EIO; + if(bdrv_dirty_handler != NULL) { + bdrv_dirty_handler(bs, sector_num, nb_sectors); + } + return drv->bdrv_write(bs, sector_num, buf, nb_sectors); } @@ -1162,6 +1168,11 @@ int bdrv_write_compressed(BlockDriverState *bs, int64_t sector_num, return -ENOTSUP; if (bdrv_check_request(bs, sector_num, nb_sectors)) return -EIO; + + if(bdrv_dirty_handler != NULL) { + bdrv_dirty_handler(bs, sector_num, nb_sectors); + } + return drv->bdrv_write_compressed(bs, sector_num, buf, nb_sectors); } @@ -1359,6 +1370,10 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num, if (bdrv_check_request(bs, sector_num, nb_sectors)) return NULL; + if(bdrv_dirty_handler != NULL) { + bdrv_dirty_handler(bs, sector_num, nb_sectors); + } + ret = drv->bdrv_aio_writev(bs, sector_num, qiov, nb_sectors, cb, opaque); @@ -1869,7 +1884,19 @@ BlockDriverAIOCB *bdrv_aio_ioctl(BlockDriverState *bs, return NULL; } + + void *qemu_blockalign(BlockDriverState *bs, size_t size) { return qemu_memalign((bs && bs->buffer_alignment) ? bs->buffer_alignment : 512, size); } + +void register_bdrv_dirty_tracking(BlockDriverDirtyHandler *dirty_handler) +{ + bdrv_dirty_handler = dirty_handler; +} + +void unregister_bdrv_dirty_tracking(void) +{ + bdrv_dirty_handler = NULL; +} diff --git a/block.h b/block.h index a966afb..9757a42 100644 --- a/block.h +++ b/block.h @@ -78,7 +78,8 @@ void bdrv_register(BlockDriver *bdrv); /* async block I/O */ typedef struct BlockDriverAIOCB BlockDriverAIOCB; typedef void BlockDriverCompletionFunc(void *opaque, int ret); - +typedef void BlockDriverDirtyHandler(BlockDriverState *bs, int64_t sector, + int sector_num); BlockDriverAIOCB *bdrv_aio_readv(BlockDriverState *bs, int64_t sector_num, QEMUIOVector *iov, int nb_sectors, BlockDriverCompletionFunc *cb, void *opaque); @@ -184,4 +185,7 @@ int bdrv_save_vmstate(BlockDriverState *bs, const uint8_t *buf, int bdrv_load_vmstate(BlockDriverState *bs, uint8_t *buf, int64_t pos, int size); +void register_bdrv_dirty_tracking(BlockDriverDirtyHandler *dirty_handler); +void unregister_bdrv_dirty_tracking(void); + #endif diff --git a/block_int.h b/block_int.h index 8e72abe..c524860 100644 --- a/block_int.h +++ b/block_int.h @@ -168,6 +168,7 @@ struct BlockDriverState { int cyls, heads, secs, translation; int type; char device_name[32]; + void *dirty_control; BlockDriverState *next; void *private; };