Message ID | 1262081278-1858-1-git-send-email-avi@redhat.com |
---|---|
State | New |
Avi Kivity wrote:
> Guests use this number as a hint for alignment and I/O request sizes.

It's not just a hint. It is also the "radius of corruption on failed
write" - important for journalling filesystems and databases.

> Given
> that modern disks have 4K block sizes,

Do they, yet?

> and cached file-backed images also have 4K block sizes, this hint
> can improve guest performance.

Agreed - but see below.

> We probably need to make this configurable depending on machine type. It
> should be the default for -M 0.13 only as it can affect guest code paths.

What about that Windows/Linux 4k sectors incompatibility thing, where
disks with 4k sectors have to sense whether the first partition starts
at 512-byte sector 63 (Linux) or 512-byte sector 1024 (or something;
Windows), and then adjust their 512-byte sector to 4k-sector mapping
so that 4k blocks within the partition are aligned to 4k sectors?

Iirc, Linux (and old but not current Windows) tends to place the first
partition starting at sector 63, which means 4k filesystem blocks will
_not_ align to 4k blocks in the cached file-backed images with Qemu.

It has been discussed for hardware disk design with 4k sectors, and
somehow there were plans to map sectors so that the Linux partition
scheme results in nicely aligned filesystem blocks - so Qemu's IDE
(and SCSI) emulation should do the same.

Or should it? I don't know how the 4k sector thing worked out in the
end, or if it's still in discussion.

-- Jamie
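The alignment concern above comes down to simple arithmetic, so a small sketch may help. It assumes 512-byte logical sectors and a 4096-byte physical block, as discussed in the thread. The helper names `partition_is_aligned` and `misalignment_bytes` are made up for illustration and do not come from QEMU or any partitioning tool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Logical sector size assumed throughout the thread. */
#define LOGICAL_SECTOR_SIZE 512u

/*
 * Hypothetical helper: byte offset of a partition start, given as a
 * 512-byte LBA, from the preceding physical block boundary.
 * 0 means the partition starts exactly on a boundary.
 */
static uint32_t misalignment_bytes(uint64_t start_lba,
                                   uint32_t physical_block_size)
{
    /* The physical block must be a whole number of logical sectors. */
    assert(physical_block_size >= LOGICAL_SECTOR_SIZE);
    assert(physical_block_size % LOGICAL_SECTOR_SIZE == 0);

    return (uint32_t)((start_lba * LOGICAL_SECTOR_SIZE) % physical_block_size);
}

/* Hypothetical helper: true if the partition starts on a block boundary. */
static bool partition_is_aligned(uint64_t start_lba,
                                 uint32_t physical_block_size)
{
    return misalignment_bytes(start_lba, physical_block_size) == 0;
}
```

With these assumptions, a partition starting at sector 63 begins 63 * 512 = 32256 bytes into the disk, which is 3584 bytes past a 4k boundary, so every 4k filesystem block straddles two physical blocks. A start at sector 64 or 2048 is aligned.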
On Tue, Dec 29, 2009 at 2:21 PM, Jamie Lokier <jamie@shareable.org> wrote:
> Avi Kivity wrote:
>> Guests use this number as a hint for alignment and I/O request sizes.
>
> It's not just a hint. It is also the "radius of corruption on failed
> write" - important for journalling filesystems and databases.
>
>> Given
>> that modern disks have 4K block sizes,
>
> Do they, yet?

Yes, there are WD disks in the wild with 4k blocks, although in this
first transition phase the firmware hides the fact and emulates the
old 512b sector.

>> We probably need to make this configurable depending on machine type. It
>> should be the default for -M 0.13 only as it can affect guest code paths.
>
> What about that Windows/Linux 4k sectors incompatibility thing, where
> disks with 4k sectors have to sense whether the first partition starts
> at 512-byte sector 63 (Linux) or 512-byte sector 1024 (or something;
> Windows), and then adjust their 512-byte sector to 4k-sector mapping
> so that 4k blocks within the partition are aligned to 4k sectors?

Linux tools put the first partition at sector 63 (512-byte) to retain
compatibility with Windows; Linux itself does not have any problem
with different layouts. See e.g. [1]

The problem seems to be limited to Win 5.x (XP, 2k3), and WD has a
utility[2] to re-align partitions in this case, so I guess that they
cope fine with a 4k-aligned partition table; they just create it
unaligned by default.

> It has been discussed for hardware disk design with 4k sectors, and
> somehow there were plans to map sectors so that the Linux partition
> scheme results in nicely aligned filesystem blocks

Ugh, I hope you're wrong ;-) AFAICS remapping will lead only to
headaches... Linux does not have any problem with aligned partitions.

Luca

[1] http://thunk.org/tytso/blog/2009/02/20/aligning-filesystems-to-an-ssds-erase-block-size/
[2] http://support.wdc.com/product/download.asp?groupid=805&sid=123&lang=en
On 12/29/2009 03:39 PM, Luca Tettamanti wrote:
>
> Ugh, I hope you're wrong ;-) AFAICS remapping will lead only to
> headaches... Linux does not have any problem with aligned partitions.
>

And in fact, that was the motivation for this patch, as parted will
align based on the physical block size.
On Tue, Dec 29, 2009 at 02:39:38PM +0100, Luca Tettamanti wrote:
> Linux tools put the first partition at sector 63 (512-byte) to retain
> compatibility with Windows;

Well, some of them, and depending on the exact disks. It's all rather
complicated.

> > It has been discussed for hardware disk design with 4k sectors, and
> > somehow there were plans to map sectors so that the Linux partition
> > scheme results in nicely aligned filesystem blocks
>
> Ugh, I hope you're wrong ;-) AFAICS remapping will lead only to
> headaches... Linux does not have any problem with aligned partitions.

Linux doesn't care. Neither does Windows. But performance on
mis-aligned partitions will suck badly on 4k sector drives, on SSDs,
and probably on various copy-on-write layers in virtualization once
you hit the worst case.

Fortunately the block topology information present in recent ATA and
SCSI standards allows the storage hardware to report the required
alignment, and Linux now has a topology API to expose it, which is
used by the most recent versions of the partitioning tools and
filesystem creation tools.
On Tue, Dec 29, 2009 at 12:07:58PM +0200, Avi Kivity wrote:
> Guests use this number as a hint for alignment and I/O request sizes. Given
> that modern disks have 4K block sizes, and cached file-backed images also
> have 4K block sizes, this hint can improve guest performance.
>
> We probably need to make this configurable depending on machine type. It
> should be the default for -M 0.13 only as it can affect guest code paths.

The information is correct per the ATA spec, but:

 (a) as mentioned above it should not be used for old machine types
 (b) we need to sort out passing through the first block alignment bits
     that are also in IDENTIFY word 106 if using a raw block device
     underneath
 (c) we probably need to adjust the physical block size depending on
     the underlying storage topology.

I have had a patch in my queue for a while now dealing with (b) and
parts of (c), but it's been preempted by more urgent work.
diff --git a/hw/ide/core.c b/hw/ide/core.c
index 76c3820..89fd3ce 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -164,6 +164,7 @@ static void ide_identify(IDEState *s)
     put_le16(p + 101, s->nb_sectors >> 16);
     put_le16(p + 102, s->nb_sectors >> 32);
     put_le16(p + 103, s->nb_sectors >> 48);
+    put_le16(p + 106, 0x6000 | 3); /* 8 logical sectors per physical sector */
 
     memcpy(s->identify_data, p, sizeof(s->identify_data));
     s->identify_set = 1;
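To make the constant `0x6000 | 3` in the patch easier to read, here is a rough sketch of how IDENTIFY DEVICE word 106 is laid out, based on my reading of ATA8-ACS: bit 15 is zero and bit 14 is one when the word is valid, bit 13 indicates multiple logical sectors per physical sector, and bits 3:0 hold the base-2 logarithm of that ratio. The macro and function names are hypothetical and are not part of hw/ide/core.c.

```c
#include <assert.h>
#include <stdint.h>

/* Word 106 fields (illustrative names, not QEMU identifiers). */
#define W106_VALID_MASK     0xC000u /* bits 15:14 */
#define W106_VALID          0x4000u /* bit 15 = 0, bit 14 = 1 */
#define W106_MULTI_LOGICAL  0x2000u /* bit 13: >1 logical per physical */
#define W106_LOG2_MASK      0x000Fu /* bits 3:0: log2 of the ratio */

/* Build word 106 for 2^log2_ratio logical sectors per physical sector. */
static uint16_t word106_encode(unsigned log2_ratio)
{
    assert(log2_ratio <= W106_LOG2_MASK);

    if (log2_ratio == 0) {
        /* One logical sector per physical sector: only the valid bit. */
        return (uint16_t)W106_VALID;
    }
    return (uint16_t)(W106_VALID | W106_MULTI_LOGICAL | log2_ratio);
}

/* Decode the ratio; fall back to 1 if the word is absent or invalid. */
static unsigned word106_logical_per_physical(uint16_t word)
{
    if ((word & W106_VALID_MASK) != W106_VALID) {
        return 1;
    }
    if (!(word & W106_MULTI_LOGICAL)) {
        return 1;
    }
    return 1u << (word & W106_LOG2_MASK);
}
```

Under this reading, `word106_encode(3)` yields the same value the patch writes, and decoding it gives 8 logical sectors per physical sector, i.e. 8 * 512 = 4096 bytes.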
Guests use this number as a hint for alignment and I/O request sizes. Given
that modern disks have 4K block sizes, and cached file-backed images also
have 4K block sizes, this hint can improve guest performance.

We probably need to make this configurable depending on machine type. It
should be the default for -M 0.13 only as it can affect guest code paths.

Signed-off-by: Avi Kivity <avi@redhat.com>
---
 hw/ide/core.c | 1 +
 1 files changed, 1 insertions(+), 0 deletions(-)