Message ID | 20100129190440.GA25287@lst.de |
---|---|
State | New |
Headers | show |
On 01/29/2010 01:04 PM, Christoph Hellwig wrote: > Add three new suboptions for the drive option to export block topology > information to the guest. This is needed to get optimal I/O alignment > for RAID arrays or SSDs. > > The options are: > > - physical_block_size to specify the physical block size of the device, > this is going to increase from 512 bytes to 4096 kilobytes for many > modern storage devices > - min_io_size to specify the minimal I/O size without performance impact, > this is typically set to the RAID chunk size for arrays. > - opt_io_size to specify the optimal sustained I/O size, this is > typically the RAID stripe width for arrays. > This makes sense. > I decided to not auto-probe these values from blkid which might easily > be possible as I don't know how to deal with these issues on migration. > > Note that we specificly only set the physical_block_size, and not the > logial one which is the unit all I/O is described in. The reason for > that is that IDE does not support increasing the logical block size and > at last for now I want to stick to one meachnisms in queue and allow > for easy switching of transports for a given backing image which would > not be possible if scsi and virtio use real 4k sectors, while ide only > uses the physical block exponent. > > Signed-off-by: Christoph Hellwig<hch@lst.de> > > Index: qemu/block.c > =================================================================== > --- qemu.orig/block.c 2010-01-29 11:07:50.083004364 +0100 > +++ qemu/block.c 2010-01-29 11:08:32.940004255 +0100 > @@ -1028,6 +1028,51 @@ int bdrv_enable_write_cache(BlockDriverS > return bs->enable_write_cache; > } > > +unsigned int bdrv_get_physical_block_size(BlockDriverState *bs) > +{ > + return bs->physical_block_size; > +} > + > +unsigned int bdrv_get_physical_block_exp(BlockDriverState *bs) > +{ > + unsigned int exp = 0, size; > + > + for (size = bs->physical_block_size; size> 512; size>>= 1) { > + exp++; > + } > + > + return exp; > +} > + > +void bdrv_set_physical_block_size(BlockDriverState *bs, > + unsigned int physical_block_size) > +{ > + bs->physical_block_size = physical_block_size; > +} > + > But I don't think this is the wrong place to do it. The BlockDriverState reflects that backing device, not the emulated device itself. In this case, you're trying to set a property of the emulated device. I think these need to be qdev properties of the respective devices. From a UI perspective, you can still expose -drive options for the end user to consume, but this data should be associated with the devices themselves. Regards, Anthony Liguori > +unsigned int bdrv_get_min_io_size(BlockDriverState *bs) > +{ > + return bs->min_io_size; > +} > + > +void bdrv_set_min_io_size(BlockDriverState *bs, > + unsigned int min_io_size) > +{ > + bs->min_io_size = min_io_size; > +} > + > +unsigned int bdrv_get_opt_io_size(BlockDriverState *bs) > +{ > + return bs->opt_io_size; > +} > + > +void bdrv_set_opt_io_size(BlockDriverState *bs, > + unsigned int opt_io_size) > +{ > + bs->opt_io_size = opt_io_size; > +} > + > /* XXX: no longer used */ > void bdrv_set_change_cb(BlockDriverState *bs, > void (*change_cb)(void *opaque), void *opaque) > Index: qemu/block.h > =================================================================== > --- qemu.orig/block.h 2010-01-29 11:07:50.089004011 +0100 > +++ qemu/block.h 2010-01-29 11:08:32.940004255 +0100 > @@ -152,6 +152,16 @@ int bdrv_is_inserted(BlockDriverState *b > int bdrv_media_changed(BlockDriverState *bs); > int bdrv_is_locked(BlockDriverState *bs); > void bdrv_set_locked(BlockDriverState *bs, int locked); > +unsigned int bdrv_get_physical_block_size(BlockDriverState *bs); > +unsigned int bdrv_get_physical_block_exp(BlockDriverState *bs); > +void bdrv_set_physical_block_size(BlockDriverState *bs, > + unsigned int physical_block_size); > +unsigned int bdrv_get_min_io_size(BlockDriverState *bs); > +void bdrv_set_min_io_size(BlockDriverState *bs, > + unsigned int min_io_size); > +unsigned int bdrv_get_opt_io_size(BlockDriverState *bs); > +void bdrv_set_opt_io_size(BlockDriverState *bs, > + unsigned int opt_io_size); > int bdrv_eject(BlockDriverState *bs, int eject_flag); > void bdrv_set_change_cb(BlockDriverState *bs, > void (*change_cb)(void *opaque), void *opaque); > Index: qemu/block_int.h > =================================================================== > --- qemu.orig/block_int.h 2010-01-29 11:07:50.096004065 +0100 > +++ qemu/block_int.h 2010-01-29 11:08:32.941003474 +0100 > @@ -173,6 +173,14 @@ struct BlockDriverState { > drivers. They are not used by the block driver */ > int cyls, heads, secs, translation; > int type; > + > + /* > + * Topology information, all optional. > + */ > + unsigned int physical_block_size; > + unsigned int min_io_size; > + unsigned int opt_io_size; > + > char device_name[32]; > unsigned long *dirty_bitmap; > BlockDriverState *next; > Index: qemu/qemu-config.c > =================================================================== > --- qemu.orig/qemu-config.c 2010-01-29 11:07:50.133004032 +0100 > +++ qemu/qemu-config.c 2010-01-29 11:08:32.944025367 +0100 > @@ -78,6 +78,15 @@ QemuOptsList qemu_drive_opts = { > },{ > .name = "readonly", > .type = QEMU_OPT_BOOL, > + },{ > + .name = "physical_block_size", > + .type = QEMU_OPT_NUMBER, > + },{ > + .name = "min_io_size", > + .type = QEMU_OPT_NUMBER, > + },{ > + .name = "opt_io_size", > + .type = QEMU_OPT_NUMBER, > }, > { /* end if list */ } > }, > Index: qemu/vl.c > =================================================================== > --- qemu.orig/vl.c 2010-01-29 11:07:50.141004284 +0100 > +++ qemu/vl.c 2010-01-29 11:08:32.947003820 +0100 > @@ -1904,6 +1904,9 @@ DriveInfo *drive_init(QemuOpts *opts, vo > int index; > int cache; > int aio = 0; > + unsigned long physical_block_size = 512; > + unsigned long min_io_size = 0; > + unsigned long opt_io_size = 0; > int ro = 0; > int bdrv_flags; > int on_read_error, on_write_error; > @@ -2053,6 +2056,32 @@ DriveInfo *drive_init(QemuOpts *opts, vo > } > #endif > > + if ((buf = qemu_opt_get(opts, "physical_block_size")) != NULL) { > + physical_block_size = strtoul(buf, NULL, 10); > + if (physical_block_size< 512) { > + fprintf(stderr, "sector size must be larger than 512 bytes\n"); > + return NULL; > + } > + } > + > + if ((buf = qemu_opt_get(opts, "min_io_size")) != NULL) { > + min_io_size = strtoul(buf, NULL, 10); > + if (!min_io_size || (min_io_size % physical_block_size)) { > + fprintf(stderr, > + "min_io_size must be a multiple of the sector size\n"); > + return NULL; > + } > + } > + > + if ((buf = qemu_opt_get(opts, "opt_io_size")) != NULL) { > + opt_io_size = strtoul(buf, NULL, 10); > + if (!opt_io_size || (opt_io_size % min_io_size)) { > + fprintf(stderr, > + "opt_io_size must be a multiple of min_io_size\n"); > + return NULL; > + } > + } > + > if ((buf = qemu_opt_get(opts, "format")) != NULL) { > if (strcmp(buf, "?") == 0) { > fprintf(stderr, "qemu: Supported formats:"); > @@ -2257,6 +2286,12 @@ DriveInfo *drive_init(QemuOpts *opts, vo > return NULL; > } > > + bdrv_set_physical_block_size(dinfo->bdrv, physical_block_size); > + if (min_io_size) > + bdrv_set_min_io_size(dinfo->bdrv, min_io_size); > + if (opt_io_size) > + bdrv_set_opt_io_size(dinfo->bdrv, opt_io_size); > + > if (bdrv_key_required(dinfo->bdrv)) > autostart = 0; > *fatal_error = 0; > Index: qemu/qemu-options.hx > =================================================================== > --- qemu.orig/qemu-options.hx 2010-01-29 11:07:50.149004395 +0100 > +++ qemu/qemu-options.hx 2010-01-29 19:36:06.118256061 +0100 > @@ -104,6 +104,7 @@ DEF("drive", HAS_ARG, QEMU_OPTION_drive, > " [,cyls=c,heads=h,secs=s[,trans=t]][,snapshot=on|off]\n" > " [,cache=writethrough|writeback|none][,format=f][,serial=s]\n" > " [,addr=A][,id=name][,aio=threads|native][,readonly=on|off]\n" > + " [,physical_block=size=size][,min_io_size=size][,opt_io_size=size]\n" > " use 'file' as a drive image\n") > DEF("set", HAS_ARG, QEMU_OPTION_set, > "-set group.id.arg=value\n" > @@ -149,6 +150,15 @@ an untrusted format header. > This option specifies the serial number to assign to the device. > @item addr=@var{addr} > Specify the controller's PCI address (if=virtio only). > +@item physical_sector_size=@var{size} > +Report a physical block size larger than the logical block size of 512 bytes. > +@item min_io_size=@var{size} > +Reported a minimum I/O size or optimum I/O granularity. This is the smallest > +I/O size the device can perform without a performance penalty. For RAID > +devices this should be set to the stripe chunk size. > +@item opt_io_size=@var{size} > +Report an optimal I/O size, which is the device's preferred unit for > +sustained I/O. This should be set to the stripe width for RAID devices. > @end table > > By default, writethrough caching is used for all block device. This means that > > > >
On Wed, Feb 03, 2010 at 01:00:47PM -0600, Anthony Liguori wrote: > But I don't think this is the wrong place to do it. The > BlockDriverState reflects that backing device, not the emulated device > itself. In this case, you're trying to set a property of the emulated > device. I think that's very borderline. While the emulated device exposes these properties, they are in fact a property of the backing storage, the right sector and min/max I/O sizes are determined by the backing storage device. > I think these need to be qdev properties of the respective devices. > From a UI perspective, you can still expose -drive options for the end > user to consume, but this data should be associated with the devices > themselves. In addition to not really beeing more logical this would be a lot more effort. We'd need to add properties to all the device, which means including dealing with the n+1 ide variants, the virtio-pci proxy, etc. If you believe it really needs to be in the qdev properties I'll implement it, but I suspect the current version is a better idea.
Christoph Hellwig wrote: > On Wed, Feb 03, 2010 at 01:00:47PM -0600, Anthony Liguori wrote: > > But I don't think this is the wrong place to do it. The > > BlockDriverState reflects that backing device, not the emulated device > > itself. In this case, you're trying to set a property of the emulated > > device. > > I think that's very borderline. While the emulated device exposes these > properties, they are in fact a property of the backing storage, the > right sector and min/max I/O sizes are determined by the backing storage > device. > > > I think these need to be qdev properties of the respective devices. > > From a UI perspective, you can still expose -drive options for the end > > user to consume, but this data should be associated with the devices > > themselves. > > In addition to not really beeing more logical this would be a lot more > effort. We'd need to add properties to all the device, which means > including dealing with the n+1 ide variants, the virtio-pci proxy, etc. > > If you believe it really needs to be in the qdev properties I'll > implement it, but I suspect the current version is a better idea. If you move your VM to a new system with different backing devices, sometimes you want to be sure there is no guest-visible change. Or even if you just replace a drive - you might prefer confidence that the guest sees no change. Even if you just convert between qcow2 and a raw block device, or the other way, you'll sometimes want to be sure it's not guest-visible. -- Jamie
On Fri, Feb 05, 2010 at 04:16:20PM +0000, Jamie Lokier wrote: > If you move your VM to a new system with different backing devices, > sometimes you want to be sure there is no guest-visible change. Or > even if you just replace a drive - you might prefer confidence that > the guest sees no change. Yes, that's why we do not auto probe it but require it to be set manually. Note that not the physical block size attribute can we a data integrity issue, though. A storage device guarnatees that it can write a sector atomically, so moving from a 4k to a 512 byte physcical sector device could lead to not beeing able to atomically write a 4k piece of data that the guest expects to write atomcially. I'm not sure how failure safe the read-modify-write algorithms on 4k sector disks with logical 512 bye blocks are, but I'd expect issues there, too. > Even if you just convert between qcow2 and a raw block device, or the > other way, you'll sometimes want to be sure it's not guest-visible. The image format has no hooks into these options currently.
Christoph Hellwig wrote: > Note that not the physical block size attribute can > we a data integrity issue, though. I agree. > A storage device guarnatees that it can write a sector atomically, I've looked everywhere for confirmation of *atomicity*, and so far all I've seen are rumours. Some people believe sector writes are atomic during power failure, some people believe they are not. Those who believe it is, don't have reliable references, and I haven't seen it in any standard. SQLite has a flag which can be set for a backing store to say block writes are atomic, to enable it to optimise some things; the flag is not set when writing to a disk block device, because they didn't find confirmation of it. > so moving from a 4k to a 512 byte physcical sector device could lead > to not beeing able to atomically write a 4k piece of data that the > guest expects to write atomcially. If there is no confirmation that sector writes are atomic, then no database or filesystem should be relying on that property anyway. > I'm not sure how failure safe the read-modify-write algorithms on > 4k sector disks with logical 512 bye blocks are, but I'd expect issues > there, too. I think you might be referring to what I'm calling "radius of destruction", because I don't know if there's a well known term for it. By that I mean if you write 512 bytes, and it's implemented as RMW to a 4k sector, then on power failure any part of the 4k sector could be corrupted. On some RAIDs the size is much larger; also on many flash devices. (RAIDs make it clearer that the alignment is relevant too.) Note that if 4k sector writes were _atomic_, then read-modify-write of 512 bytes would be completely reliable. > > Even if you just convert between qcow2 and a raw block device, or the > > other way, you'll sometimes want to be sure it's not guest-visible. > > The image format has no hooks into these options currently. No, but whatever is reported to the guest, you may device you want it to continue being reported to the guest after doing the convert operation. Even if it's a data integrity concern. In fact *precisely* when the guest has algorithms which write differently depending on the sector size, for integrity, that means changing the guest-visible sector size may trigger bugs and other changes that you sometimes don't want. I agree that it should report the size by default, though, because the integrity concern. -- Jamie
On 02/05/2010 07:09 AM, Christoph Hellwig wrote: > On Wed, Feb 03, 2010 at 01:00:47PM -0600, Anthony Liguori wrote: > >> But I don't think this is the wrong place to do it. The >> BlockDriverState reflects that backing device, not the emulated device >> itself. In this case, you're trying to set a property of the emulated >> device. >> > I think that's very borderline. While the emulated device exposes these > properties, they are in fact a property of the backing storage, the > right sector and min/max I/O sizes are determined by the backing storage > device. > It's guest visible state that should be configured based on the properties of the backing store. However, as you said, live migration complicates things. drive is not an optimal interface because it mixes host and guest state. But the subset of -drive that you normally use should be independent of the guest. For instance, with: -drive file=/home/anthony/foo.img,cache=off,aio=native,id=foo -device virtio-blk-pci,drive=foo file, cache, and aio are all host properties. They have no visible impact to the guest and if I change those properties during live migration, they take affect without the guest noticing These topology options still can be specified via -drive, but the proper way of splitting things would have them as part of the -device option. During live migration, things on the -device side inherited as part of the live migration process. >> I think these need to be qdev properties of the respective devices. >> From a UI perspective, you can still expose -drive options for the end >> user to consume, but this data should be associated with the devices >> themselves. >> > In addition to not really beeing more logical this would be a lot more > effort. We'd need to add properties to all the device, which means > including dealing with the n+1 ide variants, the virtio-pci proxy, etc. > The way we deal with this with network devices is we have a common DEFINE_NIC_PROPERTIES that lets us consistently add properties for all network devices. We need exactly this sort of thing for disk drives for the reason you describe. > If you believe it really needs to be in the qdev properties I'll > implement it, but I suspect the current version is a better idea. > It definitely a correctness issue. Once we can describe the full PC via qdev, we want to be able to migrate that description as part of the migration traffic. For that to work, we need to have all of the device characteristics expressed as qdev properties. Regards, Anthony Liguori
Anthony Liguori <anthony@codemonkey.ws> writes: > On 02/05/2010 07:09 AM, Christoph Hellwig wrote: >> On Wed, Feb 03, 2010 at 01:00:47PM -0600, Anthony Liguori wrote: >> >>> But I don't think this is the wrong place to do it. The >>> BlockDriverState reflects that backing device, not the emulated device >>> itself. In this case, you're trying to set a property of the emulated >>> device. >>> >> I think that's very borderline. While the emulated device exposes these >> properties, they are in fact a property of the backing storage, the >> right sector and min/max I/O sizes are determined by the backing storage >> device. >> > > It's guest visible state that should be configured based on the > properties of the backing store. > > However, as you said, live migration complicates things. > > drive is not an optimal interface because it mixes host and guest > state. Qdev separates guest and host stuff. -device does guest, -netdev, -chardev, -drive if=none do host. Unfortunately, -drive can also do host, with other values for parameter if. A clean, host-only -blockdev for use with -device would be nice to have. > But the subset of -drive that you normally use should be > independent of the guest. To be precise: those properties that make sense with if=none. > For instance, with: > > -drive file=/home/anthony/foo.img,cache=off,aio=native,id=foo -device > virtio-blk-pci,drive=foo There's an implied if=ide, which means it's mixed host and guest stuff. > file, cache, and aio are all host properties. They have no visible > impact to the guest and if I change those properties during live > migration, they take affect without the guest noticing > > These topology options still can be specified via -drive, but the > proper way of splitting things would have them as part of the -device > option. During live migration, things on the -device side inherited > as part of the live migration process. If it's guest-visible, it should live in guest state (qdev device state), and be configured with -device. Else, it should live in host state (block driver state), and be configured via -drive if=none (or -blockdev once we have it). [...]
Index: qemu/block.c =================================================================== --- qemu.orig/block.c 2010-01-29 11:07:50.083004364 +0100 +++ qemu/block.c 2010-01-29 11:08:32.940004255 +0100 @@ -1028,6 +1028,51 @@ int bdrv_enable_write_cache(BlockDriverS return bs->enable_write_cache; } +unsigned int bdrv_get_physical_block_size(BlockDriverState *bs) +{ + return bs->physical_block_size; +} + +unsigned int bdrv_get_physical_block_exp(BlockDriverState *bs) +{ + unsigned int exp = 0, size; + + for (size = bs->physical_block_size; size > 512; size >>= 1) { + exp++; + } + + return exp; +} + +void bdrv_set_physical_block_size(BlockDriverState *bs, + unsigned int physical_block_size) +{ + bs->physical_block_size = physical_block_size; +} + + +unsigned int bdrv_get_min_io_size(BlockDriverState *bs) +{ + return bs->min_io_size; +} + +void bdrv_set_min_io_size(BlockDriverState *bs, + unsigned int min_io_size) +{ + bs->min_io_size = min_io_size; +} + +unsigned int bdrv_get_opt_io_size(BlockDriverState *bs) +{ + return bs->opt_io_size; +} + +void bdrv_set_opt_io_size(BlockDriverState *bs, + unsigned int opt_io_size) +{ + bs->opt_io_size = opt_io_size; +} + /* XXX: no longer used */ void bdrv_set_change_cb(BlockDriverState *bs, void (*change_cb)(void *opaque), void *opaque) Index: qemu/block.h =================================================================== --- qemu.orig/block.h 2010-01-29 11:07:50.089004011 +0100 +++ qemu/block.h 2010-01-29 11:08:32.940004255 +0100 @@ -152,6 +152,16 @@ int bdrv_is_inserted(BlockDriverState *b int bdrv_media_changed(BlockDriverState *bs); int bdrv_is_locked(BlockDriverState *bs); void bdrv_set_locked(BlockDriverState *bs, int locked); +unsigned int bdrv_get_physical_block_size(BlockDriverState *bs); +unsigned int bdrv_get_physical_block_exp(BlockDriverState *bs); +void bdrv_set_physical_block_size(BlockDriverState *bs, + unsigned int physical_block_size); +unsigned int bdrv_get_min_io_size(BlockDriverState *bs); +void bdrv_set_min_io_size(BlockDriverState *bs, + unsigned int min_io_size); +unsigned int bdrv_get_opt_io_size(BlockDriverState *bs); +void bdrv_set_opt_io_size(BlockDriverState *bs, + unsigned int opt_io_size); int bdrv_eject(BlockDriverState *bs, int eject_flag); void bdrv_set_change_cb(BlockDriverState *bs, void (*change_cb)(void *opaque), void *opaque); Index: qemu/block_int.h =================================================================== --- qemu.orig/block_int.h 2010-01-29 11:07:50.096004065 +0100 +++ qemu/block_int.h 2010-01-29 11:08:32.941003474 +0100 @@ -173,6 +173,14 @@ struct BlockDriverState { drivers. They are not used by the block driver */ int cyls, heads, secs, translation; int type; + + /* + * Topology information, all optional. + */ + unsigned int physical_block_size; + unsigned int min_io_size; + unsigned int opt_io_size; + char device_name[32]; unsigned long *dirty_bitmap; BlockDriverState *next; Index: qemu/qemu-config.c =================================================================== --- qemu.orig/qemu-config.c 2010-01-29 11:07:50.133004032 +0100 +++ qemu/qemu-config.c 2010-01-29 11:08:32.944025367 +0100 @@ -78,6 +78,15 @@ QemuOptsList qemu_drive_opts = { },{ .name = "readonly", .type = QEMU_OPT_BOOL, + },{ + .name = "physical_block_size", + .type = QEMU_OPT_NUMBER, + },{ + .name = "min_io_size", + .type = QEMU_OPT_NUMBER, + },{ + .name = "opt_io_size", + .type = QEMU_OPT_NUMBER, }, { /* end if list */ } }, Index: qemu/vl.c =================================================================== --- qemu.orig/vl.c 2010-01-29 11:07:50.141004284 +0100 +++ qemu/vl.c 2010-01-29 11:08:32.947003820 +0100 @@ -1904,6 +1904,9 @@ DriveInfo *drive_init(QemuOpts *opts, vo int index; int cache; int aio = 0; + unsigned long physical_block_size = 512; + unsigned long min_io_size = 0; + unsigned long opt_io_size = 0; int ro = 0; int bdrv_flags; int on_read_error, on_write_error; @@ -2053,6 +2056,32 @@ DriveInfo *drive_init(QemuOpts *opts, vo } #endif + if ((buf = qemu_opt_get(opts, "physical_block_size")) != NULL) { + physical_block_size = strtoul(buf, NULL, 10); + if (physical_block_size < 512) { + fprintf(stderr, "sector size must be larger than 512 bytes\n"); + return NULL; + } + } + + if ((buf = qemu_opt_get(opts, "min_io_size")) != NULL) { + min_io_size = strtoul(buf, NULL, 10); + if (!min_io_size || (min_io_size % physical_block_size)) { + fprintf(stderr, + "min_io_size must be a multiple of the sector size\n"); + return NULL; + } + } + + if ((buf = qemu_opt_get(opts, "opt_io_size")) != NULL) { + opt_io_size = strtoul(buf, NULL, 10); + if (!opt_io_size || (opt_io_size % min_io_size)) { + fprintf(stderr, + "opt_io_size must be a multiple of min_io_size\n"); + return NULL; + } + } + if ((buf = qemu_opt_get(opts, "format")) != NULL) { if (strcmp(buf, "?") == 0) { fprintf(stderr, "qemu: Supported formats:"); @@ -2257,6 +2286,12 @@ DriveInfo *drive_init(QemuOpts *opts, vo return NULL; } + bdrv_set_physical_block_size(dinfo->bdrv, physical_block_size); + if (min_io_size) + bdrv_set_min_io_size(dinfo->bdrv, min_io_size); + if (opt_io_size) + bdrv_set_opt_io_size(dinfo->bdrv, opt_io_size); + if (bdrv_key_required(dinfo->bdrv)) autostart = 0; *fatal_error = 0; Index: qemu/qemu-options.hx =================================================================== --- qemu.orig/qemu-options.hx 2010-01-29 11:07:50.149004395 +0100 +++ qemu/qemu-options.hx 2010-01-29 19:36:06.118256061 +0100 @@ -104,6 +104,7 @@ DEF("drive", HAS_ARG, QEMU_OPTION_drive, " [,cyls=c,heads=h,secs=s[,trans=t]][,snapshot=on|off]\n" " [,cache=writethrough|writeback|none][,format=f][,serial=s]\n" " [,addr=A][,id=name][,aio=threads|native][,readonly=on|off]\n" + " [,physical_block=size=size][,min_io_size=size][,opt_io_size=size]\n" " use 'file' as a drive image\n") DEF("set", HAS_ARG, QEMU_OPTION_set, "-set group.id.arg=value\n" @@ -149,6 +150,15 @@ an untrusted format header. This option specifies the serial number to assign to the device. @item addr=@var{addr} Specify the controller's PCI address (if=virtio only). +@item physical_sector_size=@var{size} +Report a physical block size larger than the logical block size of 512 bytes. +@item min_io_size=@var{size} +Reported a minimum I/O size or optimum I/O granularity. This is the smallest +I/O size the device can perform without a performance penalty. For RAID +devices this should be set to the stripe chunk size. +@item opt_io_size=@var{size} +Report an optimal I/O size, which is the device's preferred unit for +sustained I/O. This should be set to the stripe width for RAID devices. @end table By default, writethrough caching is used for all block device. This means that
Add three new suboptions for the drive option to export block topology information to the guest. This is needed to get optimal I/O alignment for RAID arrays or SSDs. The options are: - physical_block_size to specify the physical block size of the device, this is going to increase from 512 bytes to 4096 kilobytes for many modern storage devices - min_io_size to specify the minimal I/O size without performance impact, this is typically set to the RAID chunk size for arrays. - opt_io_size to specify the optimal sustained I/O size, this is typically the RAID stripe width for arrays. I decided to not auto-probe these values from blkid which might easily be possible as I don't know how to deal with these issues on migration. Note that we specificly only set the physical_block_size, and not the logial one which is the unit all I/O is described in. The reason for that is that IDE does not support increasing the logical block size and at last for now I want to stick to one meachnisms in queue and allow for easy switching of transports for a given backing image which would not be possible if scsi and virtio use real 4k sectors, while ide only uses the physical block exponent. Signed-off-by: Christoph Hellwig <hch@lst.de>