Message ID | 20140514154230.10746.56297.stgit@bahia.local |
---|---|
State | New |
Headers | show |
Il 14/05/2014 17:42, Greg Kurz ha scritto: > + { .name = "virtio/is_big_endian", > + .version_id = 1, > + .save = virtio_save_device_endian, > + .load = virtio_load_device_endian, > + }, > { .name = NULL } I think this is okay, but you need to add support for a "needed" function like you have in normal vmstate subsections. You only need the subsection if if (target_words_bigendian()) { return vdev->device_endian != VIRTIO_DEVICE_ENDIAN_BIG; } else { return vdev->device_endian != VIRTIO_DEVICE_ENDIAN_LITTLE; } Paolo
On Tue, 27 May 2014 17:01:38 +0200 Paolo Bonzini <pbonzini@redhat.com> wrote: > Il 14/05/2014 17:42, Greg Kurz ha scritto: > > + { .name = "virtio/is_big_endian", > > + .version_id = 1, > > + .save = virtio_save_device_endian, > > + .load = virtio_load_device_endian, > > + }, > > { .name = NULL } > > I think this is okay, but you need to add support for a "needed" > function like you have in normal vmstate subsections. You only need the > subsection if > > if (target_words_bigendian()) { > return vdev->device_endian != VIRTIO_DEVICE_ENDIAN_BIG; > } else { > return vdev->device_endian != VIRTIO_DEVICE_ENDIAN_LITTLE; > } > > Paolo > Hi Paolo, As suggested by Andreas, I have reworked the patch set to use VMState directly instead of mimicking... and of course, I end up with a virtio_device_endian_needed() function that does just that ! :) I'm on leave now, I'll try to send an update next week. BTW, I have spotted two locations where the migration code is affected by the device endianness at load time: int virtio_load(VirtIODevice *vdev, QEMUFile *f) { [...] nheads = vring_avail_idx(&vdev->vq[i]) - vdev->vq[i].last_avail_idx; ^^^^^^^^^^^ /* Check it isn't doing very strange things with descriptor numbers. */ if (nheads > vdev->vq[i].vring.num) { [...] } and static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id) { [...] /* The config space */ qemu_get_be16s(f, &s->config.cols); qemu_get_be16s(f, &s->config.rows); qemu_get_be32s(f, &max_nr_ports); tswap32s(&max_nr_ports); ^^^^^^ if (max_nr_ports > tswap32(s->config.max_nr_ports)) { [...] } If we stream subsections after the device descriptor as it is done in VMState, these two will break because the device endian is stale. The first one can be easily dealt with: just defer the sanity check to a post_load function. The second is a bit more tricky: the virtio serial migrates its config space (target endian) and the active ports bitmap. The load code assumes max_nr_ports from the config space tells the size of the ports bitmap... that means the virtio migration protocol is also contaminated by target endianness. :-\ I have questions about virtio serial: - what is exactly the point at migrating the virtio device config space ? - what is the scenario that calls for the active ports bitmap checking at load time ? - we currently have 0 < max_nr_ports < VIRTIO_PCI_QUEUE_MAX / 2 hardcoded. With the current code, that means the ports bitmap is always a single 32-bit fixed size value... are there plans to support virtio serial with 0 port ? with more than 32 ports ? In the case the answer for above is "legacy virtio really sucks" then, is it acceptable to not honor bug-compatibility with older versions and fix the code ? :) A solution could be to simply remove all that "crap" and bump versions, or at least send zeroes on save and skip them on load. Another solution I haven't tried yet would be to stream subsections before the device descriptor... Any suggestions ? Cheers.
Il 29/05/2014 11:12, Greg Kurz ha scritto: > int virtio_load(VirtIODevice *vdev, QEMUFile *f) > { > [...] > nheads = vring_avail_idx(&vdev->vq[i]) - vdev->vq[i].last_avail_idx; > ^^^^^^^^^^^ > /* Check it isn't doing very strange things with descriptor numbers. */ > if (nheads > vdev->vq[i].vring.num) { > [...] > } > > and > > static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id) > { > [...] > /* The config space */ > qemu_get_be16s(f, &s->config.cols); > qemu_get_be16s(f, &s->config.rows); > > qemu_get_be32s(f, &max_nr_ports); > tswap32s(&max_nr_ports); > ^^^^^^ > if (max_nr_ports > tswap32(s->config.max_nr_ports)) { > [...] > } > > If we stream subsections after the device descriptor as it is done > in VMState, these two will break because the device endian is stale. > > The first one can be easily dealt with: just defer the sanity check > to a post_load function. Good, we're lucky here. > The second is a bit more tricky: the > virtio serial migrates its config space (target endian) and the > active ports bitmap. The load code assumes max_nr_ports from the > config space tells the size of the ports bitmap... that means the > virtio migration protocol is also contaminated by target endianness. :-\ Ouch. I guess we could break migration in the case of host endianness != target endianness, like this: /* These three used to be fetched in target endianness and then * stored as big endian. It ended up as little endian if host and * target endianness doesn't match. * * Starting with qemu 2.1, we always store as big endian. The * version wasn't bumped to avoid breaking backwards compatibility. * We check the validity of max_nr_ports, and the incorrect- * endianness max_nr_ports will be huge, which will abort migration * anyway. */ uint16_t cols = tswap16(s->config.cols); uint16_t rows = tswap16(s->config.rows); uint32_t max_nr_ports = tswap32(s->config.max_nr_ports); qemu_put_be16s(f, &cols); qemu_put_be16s(f, &rows); qemu_put_be32s(f, &max_nr_ports); ... uint16_t cols, rows; qemu_get_be16s(f, &cols); qemu_get_be16s(f, &rows); qemu_get_be32s(f, &max_nr_ports); /* Convert back to target endianness when storing into the config * space. */ s->config.cols = tswap16(cols); s->config.rows = tswap16(rows); if (max_nr_ports > tswap32(s->config.max_nr_ports) { ... } > In the case the answer for above is "legacy virtio really sucks" then, > is it acceptable to not honor bug-compatibility with older versions and > fix the code ? :) As long as the common cases don't break, yes. The question is what are the common cases. Here I think the only non-obscure case that could break is x86-on-PPC, and it's not that common. Paolo
On Thu, 29 May 2014 12:16:26 +0200 Paolo Bonzini <pbonzini@redhat.com> wrote: > Il 29/05/2014 11:12, Greg Kurz ha scritto: > > int virtio_load(VirtIODevice *vdev, QEMUFile *f) > > { > > [...] > > nheads = vring_avail_idx(&vdev->vq[i]) - vdev->vq[i].last_avail_idx; > > ^^^^^^^^^^^ > > /* Check it isn't doing very strange things with descriptor numbers. */ > > if (nheads > vdev->vq[i].vring.num) { > > [...] > > } > > > > and > > > > static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id) > > { > > [...] > > /* The config space */ > > qemu_get_be16s(f, &s->config.cols); > > qemu_get_be16s(f, &s->config.rows); > > > > qemu_get_be32s(f, &max_nr_ports); > > tswap32s(&max_nr_ports); > > ^^^^^^ > > if (max_nr_ports > tswap32(s->config.max_nr_ports)) { > > [...] > > } > > > > If we stream subsections after the device descriptor as it is done > > in VMState, these two will break because the device endian is stale. > > > > The first one can be easily dealt with: just defer the sanity check > > to a post_load function. > > Good, we're lucky here. > > > The second is a bit more tricky: the > > virtio serial migrates its config space (target endian) and the > > active ports bitmap. The load code assumes max_nr_ports from the > > config space tells the size of the ports bitmap... that means the > > virtio migration protocol is also contaminated by target endianness. :-\ > > Ouch. > > I guess we could break migration in the case of host endianness != > target endianness, like this: > > /* These three used to be fetched in target endianness and then > * stored as big endian. It ended up as little endian if host and > * target endianness doesn't match. > * > * Starting with qemu 2.1, we always store as big endian. The > * version wasn't bumped to avoid breaking backwards compatibility. > * We check the validity of max_nr_ports, and the incorrect- > * endianness max_nr_ports will be huge, which will abort migration > * anyway. > */ > uint16_t cols = tswap16(s->config.cols); > uint16_t rows = tswap16(s->config.rows); > uint32_t max_nr_ports = tswap32(s->config.max_nr_ports); > > qemu_put_be16s(f, &cols); > qemu_put_be16s(f, &rows); > qemu_put_be32s(f, &max_nr_ports); > > ... > > uint16_t cols, rows; > > qemu_get_be16s(f, &cols); > qemu_get_be16s(f, &rows); > qemu_get_be32s(f, &max_nr_ports); > > /* Convert back to target endianness when storing into the config > * space. > */ Paolo, The patch set to support endian changing targets adds a device_endian field to the VirtIODevice structure to be used instead of the default target endianness as it happens with tswap() macros. It also introduces virtio_tswap() helpers for this purpose, but they can only be used when the device_endian field has been restored... in a subsection after the device descriptor... :-\ If the scenario is ppc64le-on-ppc64: tswap() macros don't do anything and we cannot convert back to LE... > s->config.cols = tswap16(cols); > s->config.rows = tswap16(rows); Since cols and rows are not involved in the protocol, we can safely defer the conversion to post load. > if (max_nr_ports > tswap32(s->config.max_nr_ports) { > ... > } > Since we know that 0 < max_nr_ports < 32, is it acceptable to guess the correct endianness with a heuristic ? if (max_nr_ports > tswap32(s->config.max_nr_ports)) { max_nr_ports = bswap32(max_nr_ports); } if (max_nr_ports > tswap32(s->config.max_nr_ports)) { return -EINVAL; } > > In the case the answer for above is "legacy virtio really sucks" then, > > is it acceptable to not honor bug-compatibility with older versions and > > fix the code ? :) > > As long as the common cases don't break, yes. The question is what are > the common cases. Here I think the only non-obscure case that could > break is x86-on-PPC, and it's not that common. > > Paolo > Thanks.
On Thu, Jun 12, 2014 at 09:43:51AM +0200, Greg Kurz wrote: > On Thu, 29 May 2014 12:16:26 +0200 > Paolo Bonzini <pbonzini@redhat.com> wrote: > > Il 29/05/2014 11:12, Greg Kurz ha scritto: > > > int virtio_load(VirtIODevice *vdev, QEMUFile *f) > > > { > > > [...] > > > nheads = vring_avail_idx(&vdev->vq[i]) - vdev->vq[i].last_avail_idx; > > > ^^^^^^^^^^^ > > > /* Check it isn't doing very strange things with descriptor numbers. */ > > > if (nheads > vdev->vq[i].vring.num) { > > > [...] > > > } > > > > > > and > > > > > > static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id) > > > { > > > [...] > > > /* The config space */ > > > qemu_get_be16s(f, &s->config.cols); > > > qemu_get_be16s(f, &s->config.rows); > > > > > > qemu_get_be32s(f, &max_nr_ports); > > > tswap32s(&max_nr_ports); > > > ^^^^^^ > > > if (max_nr_ports > tswap32(s->config.max_nr_ports)) { > > > [...] > > > } > > > > > > If we stream subsections after the device descriptor as it is done > > > in VMState, these two will break because the device endian is stale. > > > > > > The first one can be easily dealt with: just defer the sanity check > > > to a post_load function. > > > > Good, we're lucky here. > > > > > The second is a bit more tricky: the > > > virtio serial migrates its config space (target endian) and the > > > active ports bitmap. The load code assumes max_nr_ports from the > > > config space tells the size of the ports bitmap... that means the > > > virtio migration protocol is also contaminated by target endianness. :-\ > > > > Ouch. > > > > I guess we could break migration in the case of host endianness != > > target endianness, like this: > > > > /* These three used to be fetched in target endianness and then > > * stored as big endian. It ended up as little endian if host and > > * target endianness doesn't match. > > * > > * Starting with qemu 2.1, we always store as big endian. The > > * version wasn't bumped to avoid breaking backwards compatibility. > > * We check the validity of max_nr_ports, and the incorrect- > > * endianness max_nr_ports will be huge, which will abort migration > > * anyway. > > */ > > uint16_t cols = tswap16(s->config.cols); > > uint16_t rows = tswap16(s->config.rows); > > uint32_t max_nr_ports = tswap32(s->config.max_nr_ports); > > > > qemu_put_be16s(f, &cols); > > qemu_put_be16s(f, &rows); > > qemu_put_be32s(f, &max_nr_ports); > > > > ... > > > > uint16_t cols, rows; > > > > qemu_get_be16s(f, &cols); > > qemu_get_be16s(f, &rows); > > qemu_get_be32s(f, &max_nr_ports); > > > > /* Convert back to target endianness when storing into the config > > * space. > > */ > > Paolo, > > The patch set to support endian changing targets adds a device_endian > field to the VirtIODevice structure to be used instead of the default > target endianness as it happens with tswap() macros. It also introduces > virtio_tswap() helpers for this purpose, but they can only be used when > the device_endian field has been restored... in a subsection after the > device descriptor... :-\ Store it earlier then, using plain put/get. You can still add a section conditionally to cause a cleaner failure in broken cross-version scenarios. > If the scenario is ppc64le-on-ppc64: tswap() macros don't do anything > and we cannot convert back to LE... > > > s->config.cols = tswap16(cols); > > s->config.rows = tswap16(rows); > > Since cols and rows are not involved in the protocol, we can safely > defer the conversion to post load. > > > if (max_nr_ports > tswap32(s->config.max_nr_ports) { > > ... > > } > > > > Since we know that 0 < max_nr_ports < 32, is it acceptable to guess > the correct endianness with a heuristic ? > > if (max_nr_ports > tswap32(s->config.max_nr_ports)) { > max_nr_ports = bswap32(max_nr_ports); > } > > if (max_nr_ports > tswap32(s->config.max_nr_ports)) { > return -EINVAL; > } > > > > In the case the answer for above is "legacy virtio really sucks" then, > > > is it acceptable to not honor bug-compatibility with older versions and > > > fix the code ? :) > > > > As long as the common cases don't break, yes. The question is what are > > the common cases. Here I think the only non-obscure case that could > > break is x86-on-PPC, and it's not that common. > > > > Paolo > > > > Thanks. One starts doubting whether all this hackery is worth it. virtio 1.0 should be out real soon now, it makes everything LE so the problem goes away. It's not like PPC LE is so popular that we must support old drivers at all costs. Won't time be better spent backporting virtio 1.0 drivers? > -- > Gregory Kurz kurzgreg@fr.ibm.com > gkurz@linux.vnet.ibm.com > Software Engineer @ IBM/Meiosys http://www.ibm.com > Tel +33 (0)562 165 496 > > "Anarchy is about taking complete responsibility for yourself." > Alan Moore.
On Thu, 12 Jun 2014 10:54:48 +0300 "Michael S. Tsirkin" <mst@redhat.com> wrote: > On Thu, Jun 12, 2014 at 09:43:51AM +0200, Greg Kurz wrote: > > On Thu, 29 May 2014 12:16:26 +0200 > > Paolo Bonzini <pbonzini@redhat.com> wrote: > > > Il 29/05/2014 11:12, Greg Kurz ha scritto: > > > > int virtio_load(VirtIODevice *vdev, QEMUFile *f) > > > > { > > > > [...] > > > > nheads = vring_avail_idx(&vdev->vq[i]) - vdev->vq[i].last_avail_idx; > > > > ^^^^^^^^^^^ > > > > /* Check it isn't doing very strange things with descriptor numbers. */ > > > > if (nheads > vdev->vq[i].vring.num) { > > > > [...] > > > > } > > > > > > > > and > > > > > > > > static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id) > > > > { > > > > [...] > > > > /* The config space */ > > > > qemu_get_be16s(f, &s->config.cols); > > > > qemu_get_be16s(f, &s->config.rows); > > > > > > > > qemu_get_be32s(f, &max_nr_ports); > > > > tswap32s(&max_nr_ports); > > > > ^^^^^^ > > > > if (max_nr_ports > tswap32(s->config.max_nr_ports)) { > > > > [...] > > > > } > > > > > > > > If we stream subsections after the device descriptor as it is done > > > > in VMState, these two will break because the device endian is stale. > > > > > > > > The first one can be easily dealt with: just defer the sanity check > > > > to a post_load function. > > > > > > Good, we're lucky here. > > > > > > > The second is a bit more tricky: the > > > > virtio serial migrates its config space (target endian) and the > > > > active ports bitmap. The load code assumes max_nr_ports from the > > > > config space tells the size of the ports bitmap... that means the > > > > virtio migration protocol is also contaminated by target endianness. :-\ > > > > > > Ouch. > > > > > > I guess we could break migration in the case of host endianness != > > > target endianness, like this: > > > > > > /* These three used to be fetched in target endianness and then > > > * stored as big endian. It ended up as little endian if host and > > > * target endianness doesn't match. > > > * > > > * Starting with qemu 2.1, we always store as big endian. The > > > * version wasn't bumped to avoid breaking backwards compatibility. > > > * We check the validity of max_nr_ports, and the incorrect- > > > * endianness max_nr_ports will be huge, which will abort migration > > > * anyway. > > > */ > > > uint16_t cols = tswap16(s->config.cols); > > > uint16_t rows = tswap16(s->config.rows); > > > uint32_t max_nr_ports = tswap32(s->config.max_nr_ports); > > > > > > qemu_put_be16s(f, &cols); > > > qemu_put_be16s(f, &rows); > > > qemu_put_be32s(f, &max_nr_ports); > > > > > > ... > > > > > > uint16_t cols, rows; > > > > > > qemu_get_be16s(f, &cols); > > > qemu_get_be16s(f, &rows); > > > qemu_get_be32s(f, &max_nr_ports); > > > > > > /* Convert back to target endianness when storing into the config > > > * space. > > > */ > > > > Paolo, > > > > The patch set to support endian changing targets adds a device_endian > > field to the VirtIODevice structure to be used instead of the default > > target endianness as it happens with tswap() macros. It also introduces > > virtio_tswap() helpers for this purpose, but they can only be used when > > the device_endian field has been restored... in a subsection after the > > device descriptor... :-\ > > Store it earlier then, using plain put/get. Not sure I follow... this will break compatibility, no ? > You can still add a section conditionally to cause > a cleaner failure in broken cross-version scenarios. > > > If the scenario is ppc64le-on-ppc64: tswap() macros don't do anything > > and we cannot convert back to LE... > > > > > s->config.cols = tswap16(cols); > > > s->config.rows = tswap16(rows); > > > > Since cols and rows are not involved in the protocol, we can safely > > defer the conversion to post load. > > > > > if (max_nr_ports > tswap32(s->config.max_nr_ports) { > > > ... > > > } > > > > > > > Since we know that 0 < max_nr_ports < 32, is it acceptable to guess > > the correct endianness with a heuristic ? > > > > if (max_nr_ports > tswap32(s->config.max_nr_ports)) { > > max_nr_ports = bswap32(max_nr_ports); > > } > > > > if (max_nr_ports > tswap32(s->config.max_nr_ports)) { > > return -EINVAL; > > } > > > > > > In the case the answer for above is "legacy virtio really sucks" then, > > > > is it acceptable to not honor bug-compatibility with older versions and > > > > fix the code ? :) > > > > > > As long as the common cases don't break, yes. The question is what are > > > the common cases. Here I think the only non-obscure case that could > > > break is x86-on-PPC, and it's not that common. > > > > > > Paolo > > > > > > > Thanks. > > One starts doubting whether all this hackery is worth it. virtio 1.0 > should be out real soon now, it makes everything LE so the problem goes > away. It's not like PPC LE is so popular that we must support old drivers > at all costs. Won't time be better spent backporting virtio 1.0 drivers? > Hmmm... AFAIC some QEMU maintainers expressed interest in supporting legacy virtio in the case we have endian-changing targets. Patches to run a ppc64le guests have been accepted in KVM, Linux and QEMU... the only missing block is virtio. I don't especially care in supporting old drivers at all cost: this request was expressed on the list. I just want people to be able to run a ppc64le ubuntu-14.04 (and soon other distros) guest on a ppc64 box and be able to migrate. Would it be acceptable to break compatibility for ppc64 (and maybe ARM) only with a target specific hook being called from the virtio code ? > > > -- > > Gregory Kurz kurzgreg@fr.ibm.com > > gkurz@linux.vnet.ibm.com > > Software Engineer @ IBM/Meiosys http://www.ibm.com > > Tel +33 (0)562 165 496 > > > > "Anarchy is about taking complete responsibility for yourself." > > Alan Moore. > Thanks.
On 12.06.14 09:54, Michael S. Tsirkin wrote: > On Thu, Jun 12, 2014 at 09:43:51AM +0200, Greg Kurz wrote: >> On Thu, 29 May 2014 12:16:26 +0200 >> Paolo Bonzini <pbonzini@redhat.com> wrote: >>> Il 29/05/2014 11:12, Greg Kurz ha scritto: >>>> int virtio_load(VirtIODevice *vdev, QEMUFile *f) >>>> { >>>> [...] >>>> nheads = vring_avail_idx(&vdev->vq[i]) - vdev->vq[i].last_avail_idx; >>>> ^^^^^^^^^^^ >>>> /* Check it isn't doing very strange things with descriptor numbers. */ >>>> if (nheads > vdev->vq[i].vring.num) { >>>> [...] >>>> } >>>> >>>> and >>>> >>>> static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id) >>>> { >>>> [...] >>>> /* The config space */ >>>> qemu_get_be16s(f, &s->config.cols); >>>> qemu_get_be16s(f, &s->config.rows); >>>> >>>> qemu_get_be32s(f, &max_nr_ports); >>>> tswap32s(&max_nr_ports); >>>> ^^^^^^ >>>> if (max_nr_ports > tswap32(s->config.max_nr_ports)) { >>>> [...] >>>> } >>>> >>>> If we stream subsections after the device descriptor as it is done >>>> in VMState, these two will break because the device endian is stale. >>>> >>>> The first one can be easily dealt with: just defer the sanity check >>>> to a post_load function. >>> Good, we're lucky here. >>> >>>> The second is a bit more tricky: the >>>> virtio serial migrates its config space (target endian) and the >>>> active ports bitmap. The load code assumes max_nr_ports from the >>>> config space tells the size of the ports bitmap... that means the >>>> virtio migration protocol is also contaminated by target endianness. :-\ >>> Ouch. >>> >>> I guess we could break migration in the case of host endianness != >>> target endianness, like this: >>> >>> /* These three used to be fetched in target endianness and then >>> * stored as big endian. It ended up as little endian if host and >>> * target endianness doesn't match. >>> * >>> * Starting with qemu 2.1, we always store as big endian. The >>> * version wasn't bumped to avoid breaking backwards compatibility. >>> * We check the validity of max_nr_ports, and the incorrect- >>> * endianness max_nr_ports will be huge, which will abort migration >>> * anyway. >>> */ >>> uint16_t cols = tswap16(s->config.cols); >>> uint16_t rows = tswap16(s->config.rows); >>> uint32_t max_nr_ports = tswap32(s->config.max_nr_ports); >>> >>> qemu_put_be16s(f, &cols); >>> qemu_put_be16s(f, &rows); >>> qemu_put_be32s(f, &max_nr_ports); >>> >>> ... >>> >>> uint16_t cols, rows; >>> >>> qemu_get_be16s(f, &cols); >>> qemu_get_be16s(f, &rows); >>> qemu_get_be32s(f, &max_nr_ports); >>> >>> /* Convert back to target endianness when storing into the config >>> * space. >>> */ >> Paolo, >> >> The patch set to support endian changing targets adds a device_endian >> field to the VirtIODevice structure to be used instead of the default >> target endianness as it happens with tswap() macros. It also introduces >> virtio_tswap() helpers for this purpose, but they can only be used when >> the device_endian field has been restored... in a subsection after the >> device descriptor... :-\ > Store it earlier then, using plain put/get. > You can still add a section conditionally to cause > a cleaner failure in broken cross-version scenarios. > >> If the scenario is ppc64le-on-ppc64: tswap() macros don't do anything >> and we cannot convert back to LE... >> >>> s->config.cols = tswap16(cols); >>> s->config.rows = tswap16(rows); >> Since cols and rows are not involved in the protocol, we can safely >> defer the conversion to post load. >> >>> if (max_nr_ports > tswap32(s->config.max_nr_ports) { >>> ... >>> } >>> >> Since we know that 0 < max_nr_ports < 32, is it acceptable to guess >> the correct endianness with a heuristic ? >> >> if (max_nr_ports > tswap32(s->config.max_nr_ports)) { >> max_nr_ports = bswap32(max_nr_ports); >> } >> >> if (max_nr_ports > tswap32(s->config.max_nr_ports)) { >> return -EINVAL; >> } >> >>>> In the case the answer for above is "legacy virtio really sucks" then, >>>> is it acceptable to not honor bug-compatibility with older versions and >>>> fix the code ? :) >>> As long as the common cases don't break, yes. The question is what are >>> the common cases. Here I think the only non-obscure case that could >>> break is x86-on-PPC, and it's not that common. >>> >>> Paolo >>> >> Thanks. > One starts doubting whether all this hackery is worth it. virtio 1.0 > should be out real soon now, it makes everything LE so the problem goes > away. It's not like PPC LE is so popular that we must support old drivers > at all costs. Won't time be better spent backporting virtio 1.0 drivers? There are already released and working Linux distributions (Ubuntu, openSUSE, maybe others) that don't have virtio 1.0 drivers. Putting our heads into the sand is not an option ;). Alex
Il 12/06/2014 09:43, Greg Kurz ha scritto: > Since we know that 0 < max_nr_ports < 32, is it acceptable to guess > the correct endianness with a heuristic ? > > if (max_nr_ports > tswap32(s->config.max_nr_ports)) { > max_nr_ports = bswap32(max_nr_ports); > } > > if (max_nr_ports > tswap32(s->config.max_nr_ports)) { > return -EINVAL; > } Yes, I guess it is acceptable. So first you should fix the code to always serialize fields as big-endian, and then apply this little hack and virtio_tswap on top of the previous change. Paolo
On 12.06.14 10:55, Paolo Bonzini wrote: > Il 12/06/2014 09:43, Greg Kurz ha scritto: >> Since we know that 0 < max_nr_ports < 32, is it acceptable to guess >> the correct endianness with a heuristic ? >> >> if (max_nr_ports > tswap32(s->config.max_nr_ports)) { >> max_nr_ports = bswap32(max_nr_ports); >> } >> >> if (max_nr_ports > tswap32(s->config.max_nr_ports)) { >> return -EINVAL; >> } > > Yes, I guess it is acceptable. So first you should fix the code to > always serialize fields as big-endian, and then apply this little hack > and virtio_tswap on top of the previous change. Why don't we just ignore those config fields? They won't change during the lifetime of the guest anyway - and configuration needs to be the same on source and dest to work. Alex
On Thu, Jun 12, 2014 at 10:47:17AM +0200, Greg Kurz wrote: > On Thu, 12 Jun 2014 10:54:48 +0300 > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > On Thu, Jun 12, 2014 at 09:43:51AM +0200, Greg Kurz wrote: > > > On Thu, 29 May 2014 12:16:26 +0200 > > > Paolo Bonzini <pbonzini@redhat.com> wrote: > > > > Il 29/05/2014 11:12, Greg Kurz ha scritto: > > > > > int virtio_load(VirtIODevice *vdev, QEMUFile *f) > > > > > { > > > > > [...] > > > > > nheads = vring_avail_idx(&vdev->vq[i]) - vdev->vq[i].last_avail_idx; > > > > > ^^^^^^^^^^^ > > > > > /* Check it isn't doing very strange things with descriptor numbers. */ > > > > > if (nheads > vdev->vq[i].vring.num) { > > > > > [...] > > > > > } > > > > > > > > > > and > > > > > > > > > > static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id) > > > > > { > > > > > [...] > > > > > /* The config space */ > > > > > qemu_get_be16s(f, &s->config.cols); > > > > > qemu_get_be16s(f, &s->config.rows); > > > > > > > > > > qemu_get_be32s(f, &max_nr_ports); > > > > > tswap32s(&max_nr_ports); > > > > > ^^^^^^ > > > > > if (max_nr_ports > tswap32(s->config.max_nr_ports)) { > > > > > [...] > > > > > } > > > > > > > > > > If we stream subsections after the device descriptor as it is done > > > > > in VMState, these two will break because the device endian is stale. > > > > > > > > > > The first one can be easily dealt with: just defer the sanity check > > > > > to a post_load function. > > > > > > > > Good, we're lucky here. > > > > > > > > > The second is a bit more tricky: the > > > > > virtio serial migrates its config space (target endian) and the > > > > > active ports bitmap. The load code assumes max_nr_ports from the > > > > > config space tells the size of the ports bitmap... that means the > > > > > virtio migration protocol is also contaminated by target endianness. :-\ > > > > > > > > Ouch. > > > > > > > > I guess we could break migration in the case of host endianness != > > > > target endianness, like this: > > > > > > > > /* These three used to be fetched in target endianness and then > > > > * stored as big endian. It ended up as little endian if host and > > > > * target endianness doesn't match. > > > > * > > > > * Starting with qemu 2.1, we always store as big endian. The > > > > * version wasn't bumped to avoid breaking backwards compatibility. > > > > * We check the validity of max_nr_ports, and the incorrect- > > > > * endianness max_nr_ports will be huge, which will abort migration > > > > * anyway. > > > > */ > > > > uint16_t cols = tswap16(s->config.cols); > > > > uint16_t rows = tswap16(s->config.rows); > > > > uint32_t max_nr_ports = tswap32(s->config.max_nr_ports); > > > > > > > > qemu_put_be16s(f, &cols); > > > > qemu_put_be16s(f, &rows); > > > > qemu_put_be32s(f, &max_nr_ports); > > > > > > > > ... > > > > > > > > uint16_t cols, rows; > > > > > > > > qemu_get_be16s(f, &cols); > > > > qemu_get_be16s(f, &rows); > > > > qemu_get_be32s(f, &max_nr_ports); > > > > > > > > /* Convert back to target endianness when storing into the config > > > > * space. > > > > */ > > > > > > Paolo, > > > > > > The patch set to support endian changing targets adds a device_endian > > > field to the VirtIODevice structure to be used instead of the default > > > target endianness as it happens with tswap() macros. It also introduces > > > virtio_tswap() helpers for this purpose, but they can only be used when > > > the device_endian field has been restored... in a subsection after the > > > device descriptor... :-\ > > > > Store it earlier then, using plain put/get. > > Not sure I follow... this will break compatibility, no ? Do it only on the ambialent targets :) > > You can still add a section conditionally to cause > > a cleaner failure in broken cross-version scenarios. > > > > > If the scenario is ppc64le-on-ppc64: tswap() macros don't do anything > > > and we cannot convert back to LE... > > > > > > > s->config.cols = tswap16(cols); > > > > s->config.rows = tswap16(rows); > > > > > > Since cols and rows are not involved in the protocol, we can safely > > > defer the conversion to post load. > > > > > > > if (max_nr_ports > tswap32(s->config.max_nr_ports) { > > > > ... > > > > } > > > > > > > > > > Since we know that 0 < max_nr_ports < 32, is it acceptable to guess > > > the correct endianness with a heuristic ? > > > > > > if (max_nr_ports > tswap32(s->config.max_nr_ports)) { > > > max_nr_ports = bswap32(max_nr_ports); > > > } > > > > > > if (max_nr_ports > tswap32(s->config.max_nr_ports)) { > > > return -EINVAL; > > > } > > > > > > > > In the case the answer for above is "legacy virtio really sucks" then, > > > > > is it acceptable to not honor bug-compatibility with older versions and > > > > > fix the code ? :) > > > > > > > > As long as the common cases don't break, yes. The question is what are > > > > the common cases. Here I think the only non-obscure case that could > > > > break is x86-on-PPC, and it's not that common. > > > > > > > > Paolo > > > > > > > > > > Thanks. > > > > One starts doubting whether all this hackery is worth it. virtio 1.0 > > should be out real soon now, it makes everything LE so the problem goes > > away. It's not like PPC LE is so popular that we must support old drivers > > at all costs. Won't time be better spent backporting virtio 1.0 drivers? > > > > Hmmm... AFAIC some QEMU maintainers expressed interest in supporting legacy > virtio in the case we have endian-changing targets. Patches to run a ppc64le > guests have been accepted in KVM, Linux and QEMU... the only missing block > is virtio. I don't especially care in supporting old drivers at all cost: this > request was expressed on the list. I just want people to be able to run a ppc64le > ubuntu-14.04 (and soon other distros) guest on a ppc64 box and be able to migrate. Sigh. So ubuntu ships known-broken virtio drivers in their LTS release and now we are trying to make them work. There's no working qemu configuration at the moment, do we really have to build new hardware around known-broken drivers and then maintain it all for years? Why not build good hardware and backport new drivers? > Would it be acceptable to break compatibility for ppc64 (and maybe ARM) only with > a target specific hook being called from the virtio code ? I think so. Can we just add a function that tells us whether target is ambivalent? Only migrate new stuff if that's the case. > > > > > -- > > > Gregory Kurz kurzgreg@fr.ibm.com > > > gkurz@linux.vnet.ibm.com > > > Software Engineer @ IBM/Meiosys http://www.ibm.com > > > Tel +33 (0)562 165 496 > > > > > > "Anarchy is about taking complete responsibility for yourself." > > > Alan Moore. > > > > Thanks. > > -- > Gregory Kurz kurzgreg@fr.ibm.com > gkurz@linux.vnet.ibm.com > Software Engineer @ IBM/Meiosys http://www.ibm.com > Tel +33 (0)562 165 496 > > "Anarchy is about taking complete responsibility for yourself." > Alan Moore.
On Thu, 12 Jun 2014 10:55:42 +0200 Paolo Bonzini <pbonzini@redhat.com> wrote: > Il 12/06/2014 09:43, Greg Kurz ha scritto: > > Since we know that 0 < max_nr_ports < 32, is it acceptable to guess > > the correct endianness with a heuristic ? > > > > if (max_nr_ports > tswap32(s->config.max_nr_ports)) { > > max_nr_ports = bswap32(max_nr_ports); > > } > > > > if (max_nr_ports > tswap32(s->config.max_nr_ports)) { > > return -EINVAL; > > } > > Yes, I guess it is acceptable. So first you should fix the code to > always serialize fields as big-endian, and then apply this little hack > and virtio_tswap on top of the previous change. > > Paolo > BTW, can someone explain why we stream the device config ?
On 12.06.14 10:47, Greg Kurz wrote: > On Thu, 12 Jun 2014 10:54:48 +0300 > "Michael S. Tsirkin" <mst@redhat.com> wrote: >> On Thu, Jun 12, 2014 at 09:43:51AM +0200, Greg Kurz wrote: >>> On Thu, 29 May 2014 12:16:26 +0200 >>> Paolo Bonzini <pbonzini@redhat.com> wrote: >>>> Il 29/05/2014 11:12, Greg Kurz ha scritto: >>>>> int virtio_load(VirtIODevice *vdev, QEMUFile *f) >>>>> { >>>>> [...] >>>>> nheads = vring_avail_idx(&vdev->vq[i]) - vdev->vq[i].last_avail_idx; >>>>> ^^^^^^^^^^^ >>>>> /* Check it isn't doing very strange things with descriptor numbers. */ >>>>> if (nheads > vdev->vq[i].vring.num) { >>>>> [...] >>>>> } >>>>> >>>>> and >>>>> >>>>> static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id) >>>>> { >>>>> [...] >>>>> /* The config space */ >>>>> qemu_get_be16s(f, &s->config.cols); >>>>> qemu_get_be16s(f, &s->config.rows); >>>>> >>>>> qemu_get_be32s(f, &max_nr_ports); >>>>> tswap32s(&max_nr_ports); >>>>> ^^^^^^ >>>>> if (max_nr_ports > tswap32(s->config.max_nr_ports)) { >>>>> [...] >>>>> } >>>>> >>>>> If we stream subsections after the device descriptor as it is done >>>>> in VMState, these two will break because the device endian is stale. >>>>> >>>>> The first one can be easily dealt with: just defer the sanity check >>>>> to a post_load function. >>>> Good, we're lucky here. >>>> >>>>> The second is a bit more tricky: the >>>>> virtio serial migrates its config space (target endian) and the >>>>> active ports bitmap. The load code assumes max_nr_ports from the >>>>> config space tells the size of the ports bitmap... that means the >>>>> virtio migration protocol is also contaminated by target endianness. :-\ >>>> Ouch. >>>> >>>> I guess we could break migration in the case of host endianness != >>>> target endianness, like this: >>>> >>>> /* These three used to be fetched in target endianness and then >>>> * stored as big endian. It ended up as little endian if host and >>>> * target endianness doesn't match. >>>> * >>>> * Starting with qemu 2.1, we always store as big endian. The >>>> * version wasn't bumped to avoid breaking backwards compatibility. >>>> * We check the validity of max_nr_ports, and the incorrect- >>>> * endianness max_nr_ports will be huge, which will abort migration >>>> * anyway. >>>> */ >>>> uint16_t cols = tswap16(s->config.cols); >>>> uint16_t rows = tswap16(s->config.rows); >>>> uint32_t max_nr_ports = tswap32(s->config.max_nr_ports); >>>> >>>> qemu_put_be16s(f, &cols); >>>> qemu_put_be16s(f, &rows); >>>> qemu_put_be32s(f, &max_nr_ports); >>>> >>>> ... >>>> >>>> uint16_t cols, rows; >>>> >>>> qemu_get_be16s(f, &cols); >>>> qemu_get_be16s(f, &rows); >>>> qemu_get_be32s(f, &max_nr_ports); >>>> >>>> /* Convert back to target endianness when storing into the config >>>> * space. >>>> */ >>> Paolo, >>> >>> The patch set to support endian changing targets adds a device_endian >>> field to the VirtIODevice structure to be used instead of the default >>> target endianness as it happens with tswap() macros. It also introduces >>> virtio_tswap() helpers for this purpose, but they can only be used when >>> the device_endian field has been restored... in a subsection after the >>> device descriptor... :-\ >> Store it earlier then, using plain put/get. > Not sure I follow... this will break compatibility, no ? > >> You can still add a section conditionally to cause >> a cleaner failure in broken cross-version scenarios. >> >>> If the scenario is ppc64le-on-ppc64: tswap() macros don't do anything >>> and we cannot convert back to LE... >>> >>>> s->config.cols = tswap16(cols); >>>> s->config.rows = tswap16(rows); >>> Since cols and rows are not involved in the protocol, we can safely >>> defer the conversion to post load. >>> >>>> if (max_nr_ports > tswap32(s->config.max_nr_ports) { >>>> ... >>>> } >>>> >>> Since we know that 0 < max_nr_ports < 32, is it acceptable to guess >>> the correct endianness with a heuristic ? >>> >>> if (max_nr_ports > tswap32(s->config.max_nr_ports)) { >>> max_nr_ports = bswap32(max_nr_ports); >>> } >>> >>> if (max_nr_ports > tswap32(s->config.max_nr_ports)) { >>> return -EINVAL; >>> } >>> >>>>> In the case the answer for above is "legacy virtio really sucks" then, >>>>> is it acceptable to not honor bug-compatibility with older versions and >>>>> fix the code ? :) >>>> As long as the common cases don't break, yes. The question is what are >>>> the common cases. Here I think the only non-obscure case that could >>>> break is x86-on-PPC, and it's not that common. >>>> >>>> Paolo >>>> >>> Thanks. >> One starts doubting whether all this hackery is worth it. virtio 1.0 >> should be out real soon now, it makes everything LE so the problem goes >> away. It's not like PPC LE is so popular that we must support old drivers >> at all costs. Won't time be better spent backporting virtio 1.0 drivers? >> > Hmmm... AFAIC some QEMU maintainers expressed interest in supporting legacy > virtio in the case we have endian-changing targets. Patches to run a ppc64le > guests have been accepted in KVM, Linux and QEMU... the only missing block > is virtio. I don't especially care in supporting old drivers at all cost: this > request was expressed on the list. I just want people to be able to run a ppc64le > ubuntu-14.04 (and soon other distros) guest on a ppc64 box and be able to migrate. > > Would it be acceptable to break compatibility for ppc64 (and maybe ARM) only with > a target specific hook being called from the virtio code ? I don't see the problem. Let's just make the config fields UNUSED in vmstate and keep the virtio-serial config space in good endian locally. Then you don't care about ordering of sections anymore. Then we only need the subsection code for "endian is not target endian" which makes migration backwards compatible with non-switchable QEMU versions and everyone's happy. End of story :). Alex
On Thu, Jun 12, 2014 at 10:55:40AM +0200, Alexander Graf wrote: > > On 12.06.14 09:54, Michael S. Tsirkin wrote: > >On Thu, Jun 12, 2014 at 09:43:51AM +0200, Greg Kurz wrote: > >>On Thu, 29 May 2014 12:16:26 +0200 > >>Paolo Bonzini <pbonzini@redhat.com> wrote: > >>>Il 29/05/2014 11:12, Greg Kurz ha scritto: > >>>>int virtio_load(VirtIODevice *vdev, QEMUFile *f) > >>>>{ > >>>>[...] > >>>> nheads = vring_avail_idx(&vdev->vq[i]) - vdev->vq[i].last_avail_idx; > >>>> ^^^^^^^^^^^ > >>>> /* Check it isn't doing very strange things with descriptor numbers. */ > >>>> if (nheads > vdev->vq[i].vring.num) { > >>>>[...] > >>>>} > >>>> > >>>>and > >>>> > >>>>static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id) > >>>>{ > >>>>[...] > >>>> /* The config space */ > >>>> qemu_get_be16s(f, &s->config.cols); > >>>> qemu_get_be16s(f, &s->config.rows); > >>>> > >>>> qemu_get_be32s(f, &max_nr_ports); > >>>> tswap32s(&max_nr_ports); > >>>> ^^^^^^ > >>>> if (max_nr_ports > tswap32(s->config.max_nr_ports)) { > >>>>[...] > >>>>} > >>>> > >>>>If we stream subsections after the device descriptor as it is done > >>>>in VMState, these two will break because the device endian is stale. > >>>> > >>>>The first one can be easily dealt with: just defer the sanity check > >>>>to a post_load function. > >>>Good, we're lucky here. > >>> > >>>>The second is a bit more tricky: the > >>>>virtio serial migrates its config space (target endian) and the > >>>>active ports bitmap. The load code assumes max_nr_ports from the > >>>>config space tells the size of the ports bitmap... that means the > >>>>virtio migration protocol is also contaminated by target endianness. :-\ > >>>Ouch. > >>> > >>>I guess we could break migration in the case of host endianness != > >>>target endianness, like this: > >>> > >>> /* These three used to be fetched in target endianness and then > >>> * stored as big endian. It ended up as little endian if host and > >>> * target endianness doesn't match. > >>> * > >>> * Starting with qemu 2.1, we always store as big endian. The > >>> * version wasn't bumped to avoid breaking backwards compatibility. > >>> * We check the validity of max_nr_ports, and the incorrect- > >>> * endianness max_nr_ports will be huge, which will abort migration > >>> * anyway. > >>> */ > >>> uint16_t cols = tswap16(s->config.cols); > >>> uint16_t rows = tswap16(s->config.rows); > >>> uint32_t max_nr_ports = tswap32(s->config.max_nr_ports); > >>> > >>> qemu_put_be16s(f, &cols); > >>> qemu_put_be16s(f, &rows); > >>> qemu_put_be32s(f, &max_nr_ports); > >>> > >>>... > >>> > >>> uint16_t cols, rows; > >>> > >>> qemu_get_be16s(f, &cols); > >>> qemu_get_be16s(f, &rows); > >>> qemu_get_be32s(f, &max_nr_ports); > >>> > >>> /* Convert back to target endianness when storing into the config > >>> * space. > >>> */ > >>Paolo, > >> > >>The patch set to support endian changing targets adds a device_endian > >>field to the VirtIODevice structure to be used instead of the default > >>target endianness as it happens with tswap() macros. It also introduces > >>virtio_tswap() helpers for this purpose, but they can only be used when > >>the device_endian field has been restored... in a subsection after the > >>device descriptor... :-\ > >Store it earlier then, using plain put/get. > >You can still add a section conditionally to cause > >a cleaner failure in broken cross-version scenarios. > > > >>If the scenario is ppc64le-on-ppc64: tswap() macros don't do anything > >>and we cannot convert back to LE... > >> > >>> s->config.cols = tswap16(cols); > >>> s->config.rows = tswap16(rows); > >>Since cols and rows are not involved in the protocol, we can safely > >>defer the conversion to post load. > >> > >>> if (max_nr_ports > tswap32(s->config.max_nr_ports) { > >>> ... > >>> } > >>> > >>Since we know that 0 < max_nr_ports < 32, is it acceptable to guess > >>the correct endianness with a heuristic ? > >> > >>if (max_nr_ports > tswap32(s->config.max_nr_ports)) { > >> max_nr_ports = bswap32(max_nr_ports); > >>} > >> > >>if (max_nr_ports > tswap32(s->config.max_nr_ports)) { > >> return -EINVAL; > >>} > >> > >>>>In the case the answer for above is "legacy virtio really sucks" then, > >>>>is it acceptable to not honor bug-compatibility with older versions and > >>>>fix the code ? :) > >>>As long as the common cases don't break, yes. The question is what are > >>>the common cases. Here I think the only non-obscure case that could > >>>break is x86-on-PPC, and it's not that common. > >>> > >>>Paolo > >>> > >>Thanks. > >One starts doubting whether all this hackery is worth it. virtio 1.0 > >should be out real soon now, it makes everything LE so the problem goes > >away. It's not like PPC LE is so popular that we must support old drivers > >at all costs. Won't time be better spent backporting virtio 1.0 drivers? > > There are already released and working Linux distributions (Ubuntu, > openSUSE, maybe others) that don't have virtio 1.0 drivers. Putting our > heads into the sand is not an option ;). > > > Alex I don't get it. Does virtio work there at the moment?
On 12.06.14 11:07, Michael S. Tsirkin wrote: > On Thu, Jun 12, 2014 at 10:55:40AM +0200, Alexander Graf wrote: >> On 12.06.14 09:54, Michael S. Tsirkin wrote: >>> On Thu, Jun 12, 2014 at 09:43:51AM +0200, Greg Kurz wrote: >>>> On Thu, 29 May 2014 12:16:26 +0200 >>>> Paolo Bonzini <pbonzini@redhat.com> wrote: >>>>> Il 29/05/2014 11:12, Greg Kurz ha scritto: >>>>>> int virtio_load(VirtIODevice *vdev, QEMUFile *f) >>>>>> { >>>>>> [...] >>>>>> nheads = vring_avail_idx(&vdev->vq[i]) - vdev->vq[i].last_avail_idx; >>>>>> ^^^^^^^^^^^ >>>>>> /* Check it isn't doing very strange things with descriptor numbers. */ >>>>>> if (nheads > vdev->vq[i].vring.num) { >>>>>> [...] >>>>>> } >>>>>> >>>>>> and >>>>>> >>>>>> static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id) >>>>>> { >>>>>> [...] >>>>>> /* The config space */ >>>>>> qemu_get_be16s(f, &s->config.cols); >>>>>> qemu_get_be16s(f, &s->config.rows); >>>>>> >>>>>> qemu_get_be32s(f, &max_nr_ports); >>>>>> tswap32s(&max_nr_ports); >>>>>> ^^^^^^ >>>>>> if (max_nr_ports > tswap32(s->config.max_nr_ports)) { >>>>>> [...] >>>>>> } >>>>>> >>>>>> If we stream subsections after the device descriptor as it is done >>>>>> in VMState, these two will break because the device endian is stale. >>>>>> >>>>>> The first one can be easily dealt with: just defer the sanity check >>>>>> to a post_load function. >>>>> Good, we're lucky here. >>>>> >>>>>> The second is a bit more tricky: the >>>>>> virtio serial migrates its config space (target endian) and the >>>>>> active ports bitmap. The load code assumes max_nr_ports from the >>>>>> config space tells the size of the ports bitmap... that means the >>>>>> virtio migration protocol is also contaminated by target endianness. :-\ >>>>> Ouch. >>>>> >>>>> I guess we could break migration in the case of host endianness != >>>>> target endianness, like this: >>>>> >>>>> /* These three used to be fetched in target endianness and then >>>>> * stored as big endian. It ended up as little endian if host and >>>>> * target endianness doesn't match. >>>>> * >>>>> * Starting with qemu 2.1, we always store as big endian. The >>>>> * version wasn't bumped to avoid breaking backwards compatibility. >>>>> * We check the validity of max_nr_ports, and the incorrect- >>>>> * endianness max_nr_ports will be huge, which will abort migration >>>>> * anyway. >>>>> */ >>>>> uint16_t cols = tswap16(s->config.cols); >>>>> uint16_t rows = tswap16(s->config.rows); >>>>> uint32_t max_nr_ports = tswap32(s->config.max_nr_ports); >>>>> >>>>> qemu_put_be16s(f, &cols); >>>>> qemu_put_be16s(f, &rows); >>>>> qemu_put_be32s(f, &max_nr_ports); >>>>> >>>>> ... >>>>> >>>>> uint16_t cols, rows; >>>>> >>>>> qemu_get_be16s(f, &cols); >>>>> qemu_get_be16s(f, &rows); >>>>> qemu_get_be32s(f, &max_nr_ports); >>>>> >>>>> /* Convert back to target endianness when storing into the config >>>>> * space. >>>>> */ >>>> Paolo, >>>> >>>> The patch set to support endian changing targets adds a device_endian >>>> field to the VirtIODevice structure to be used instead of the default >>>> target endianness as it happens with tswap() macros. It also introduces >>>> virtio_tswap() helpers for this purpose, but they can only be used when >>>> the device_endian field has been restored... in a subsection after the >>>> device descriptor... :-\ >>> Store it earlier then, using plain put/get. >>> You can still add a section conditionally to cause >>> a cleaner failure in broken cross-version scenarios. >>> >>>> If the scenario is ppc64le-on-ppc64: tswap() macros don't do anything >>>> and we cannot convert back to LE... >>>> >>>>> s->config.cols = tswap16(cols); >>>>> s->config.rows = tswap16(rows); >>>> Since cols and rows are not involved in the protocol, we can safely >>>> defer the conversion to post load. >>>> >>>>> if (max_nr_ports > tswap32(s->config.max_nr_ports) { >>>>> ... >>>>> } >>>>> >>>> Since we know that 0 < max_nr_ports < 32, is it acceptable to guess >>>> the correct endianness with a heuristic ? >>>> >>>> if (max_nr_ports > tswap32(s->config.max_nr_ports)) { >>>> max_nr_ports = bswap32(max_nr_ports); >>>> } >>>> >>>> if (max_nr_ports > tswap32(s->config.max_nr_ports)) { >>>> return -EINVAL; >>>> } >>>> >>>>>> In the case the answer for above is "legacy virtio really sucks" then, >>>>>> is it acceptable to not honor bug-compatibility with older versions and >>>>>> fix the code ? :) >>>>> As long as the common cases don't break, yes. The question is what are >>>>> the common cases. Here I think the only non-obscure case that could >>>>> break is x86-on-PPC, and it's not that common. >>>>> >>>>> Paolo >>>>> >>>> Thanks. >>> One starts doubting whether all this hackery is worth it. virtio 1.0 >>> should be out real soon now, it makes everything LE so the problem goes >>> away. It's not like PPC LE is so popular that we must support old drivers >>> at all costs. Won't time be better spent backporting virtio 1.0 drivers? >> There are already released and working Linux distributions (Ubuntu, >> openSUSE, maybe others) that don't have virtio 1.0 drivers. Putting our >> heads into the sand is not an option ;). >> >> >> Alex > I don't get it. Does virtio work there at the moment? With the LE enable patch, yes, sure. Imagine the same would happen for Windows and e1000 - would you still argue the same way? Alex
Il 12/06/2014 11:06, Greg Kurz ha scritto: > On Thu, 12 Jun 2014 10:55:42 +0200 > Paolo Bonzini <pbonzini@redhat.com> wrote: > >> Il 12/06/2014 09:43, Greg Kurz ha scritto: >>> Since we know that 0 < max_nr_ports < 32, is it acceptable to guess >>> the correct endianness with a heuristic ? >>> >>> if (max_nr_ports > tswap32(s->config.max_nr_ports)) { >>> max_nr_ports = bswap32(max_nr_ports); >>> } >>> >>> if (max_nr_ports > tswap32(s->config.max_nr_ports)) { >>> return -EINVAL; >>> } >> >> Yes, I guess it is acceptable. So first you should fix the code to >> always serialize fields as big-endian, and then apply this little hack >> and virtio_tswap on top of the previous change. > > BTW, can someone explain why we stream the device config ? For max_nr_ports it is useless indeed. Let's keep storing it for backwards compatibility, but you can remove its load. The cols and rows values should be defined by the host and thus could even change on migration, there is no need to store them. As of now, however, they are unused in QEMU and should always be zero, because VIRTIO_CONSOLE_F_SIZE is not supported. Paolo
On Thu, Jun 12, 2014 at 11:19:47AM +0200, Paolo Bonzini wrote: > Il 12/06/2014 11:06, Greg Kurz ha scritto: > >On Thu, 12 Jun 2014 10:55:42 +0200 > >Paolo Bonzini <pbonzini@redhat.com> wrote: > > > >>Il 12/06/2014 09:43, Greg Kurz ha scritto: > >>>Since we know that 0 < max_nr_ports < 32, is it acceptable to guess > >>>the correct endianness with a heuristic ? > >>> > >>>if (max_nr_ports > tswap32(s->config.max_nr_ports)) { > >>> max_nr_ports = bswap32(max_nr_ports); > >>>} > >>> > >>>if (max_nr_ports > tswap32(s->config.max_nr_ports)) { > >>> return -EINVAL; > >>>} > >> > >>Yes, I guess it is acceptable. So first you should fix the code to > >>always serialize fields as big-endian, and then apply this little hack > >>and virtio_tswap on top of the previous change. > > > >BTW, can someone explain why we stream the device config ? > > For max_nr_ports it is useless indeed. Let's keep storing it for backwards > compatibility, but you can remove its load. > > The cols and rows values should be defined by the host and thus could even > change on migration, there is no need to store them. As of now, however, > they are unused in QEMU and should always be zero, because > VIRTIO_CONSOLE_F_SIZE is not supported. > > Paolo Maybe just drop unnecessary stuff for new machine types? Then we won't need hacks to migrate it.
Il 12/06/2014 11:37, Michael S. Tsirkin ha scritto: > Maybe just drop unnecessary stuff for new machine types? > Then we won't need hacks to migrate it. For any machine type it's trivial to migrate it. All machine types including old ones can disregard the migrated values. Paolo
On Thu, Jun 12, 2014 at 11:39:42AM +0200, Paolo Bonzini wrote: > Il 12/06/2014 11:37, Michael S. Tsirkin ha scritto: > >Maybe just drop unnecessary stuff for new machine types? > >Then we won't need hacks to migrate it. > > For any machine type it's trivial to migrate it. All machine types > including old ones can disregard the migrated values. > > Paolo Ah, good idea, so simply disregard the values in load.
diff --git a/exec.c b/exec.c index 91513c6..4e83588 100644 --- a/exec.c +++ b/exec.c @@ -2740,13 +2740,7 @@ int cpu_memory_rw_debug(CPUState *cpu, target_ulong addr, #endif #if !defined(CONFIG_USER_ONLY) - -/* - * A helper function for the _utterly broken_ virtio device model to find out if - * it's running on a big endian machine. Don't do this at home kids! - */ -bool virtio_is_big_endian(void); -bool virtio_is_big_endian(void) +bool target_words_bigendian(void) { #if defined(TARGET_WORDS_BIGENDIAN) return true; diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c index ce97514..2ffceb8 100644 --- a/hw/virtio/virtio-pci.c +++ b/hw/virtio/virtio-pci.c @@ -89,9 +89,6 @@ /* Flags track per-device state like workarounds for quirks in older guests. */ #define VIRTIO_PCI_FLAG_BUS_MASTER_BUG (1 << 0) -/* HACK for virtio to determine if it's running a big endian guest */ -bool virtio_is_big_endian(void); - static void virtio_pci_bus_new(VirtioBusState *bus, size_t bus_size, VirtIOPCIProxy *dev); @@ -409,13 +406,13 @@ static uint64_t virtio_pci_config_read(void *opaque, hwaddr addr, break; case 2: val = virtio_config_readw(vdev, addr); - if (virtio_is_big_endian()) { + if (virtio_is_big_endian(vdev)) { val = bswap16(val); } break; case 4: val = virtio_config_readl(vdev, addr); - if (virtio_is_big_endian()) { + if (virtio_is_big_endian(vdev)) { val = bswap32(val); } break; @@ -443,13 +440,13 @@ static void virtio_pci_config_write(void *opaque, hwaddr addr, virtio_config_writeb(vdev, addr, val); break; case 2: - if (virtio_is_big_endian()) { + if (virtio_is_big_endian(vdev)) { val = bswap16(val); } virtio_config_writew(vdev, addr, val); break; case 4: - if (virtio_is_big_endian()) { + if (virtio_is_big_endian(vdev)) { val = bswap32(val); } virtio_config_writel(vdev, addr, val); diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index f4eaa3f..9018b6c 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -547,6 +547,12 @@ void virtio_reset(void *opaque) virtio_set_status(vdev, 0); + if (target_words_bigendian()) { + vdev->device_endian = VIRTIO_DEVICE_ENDIAN_BIG; + } else { + vdev->device_endian = VIRTIO_DEVICE_ENDIAN_LITTLE; + } + if (k->reset) { k->reset(vdev); } @@ -834,12 +840,28 @@ void virtio_notify_config(VirtIODevice *vdev) virtio_notify_vector(vdev, vdev->config_vector); } +static void virtio_save_device_endian(VirtIODevice *vdev, QEMUFile *f) +{ + qemu_put_byte(f, vdev->device_endian); +} + +static int virtio_load_device_endian(VirtIODevice *vdev, QEMUFile *f) +{ + vdev->device_endian = qemu_get_byte(f); + return 0; +} + static const struct VirtIOSubsectionDescStruct { const char *name; int version_id; void (*save)(VirtIODevice *vdev, QEMUFile *f); int (*load)(VirtIODevice *vdev, QEMUFile *f); } virtio_subsection[] = { + { .name = "virtio/is_big_endian", + .version_id = 1, + .save = virtio_save_device_endian, + .load = virtio_load_device_endian, + }, { .name = NULL } }; @@ -861,6 +883,8 @@ void virtio_save_subsections(VirtIODevice *vdev, QEMUFile *f) int virtio_load_subsections(VirtIODevice *vdev, QEMUFile *f) { + int i; + while (qemu_peek_byte(f, 0) == QEMU_VM_SUBSECTION) { char idstr[256]; uint8_t len, size; @@ -895,6 +919,26 @@ int virtio_load_subsections(VirtIODevice *vdev, QEMUFile *f) } } + for (i = 0; i < VIRTIO_PCI_QUEUE_MAX; i++) { + if (vdev->vq[i].vring.num == 0) { + break; + } + + if (vdev->vq[i].pa) { + uint16_t nheads; + nheads = vring_avail_idx(&vdev->vq[i]) - vdev->vq[i].last_avail_idx; + /* Check it isn't doing strange things with descriptor numbers. */ + if (nheads > vdev->vq[i].vring.num) { + error_report("VQ %d size 0x%x Guest index 0x%x " + "inconsistent with Host index 0x%x: delta 0x%x", + i, vdev->vq[i].vring.num, + vring_avail_idx(&vdev->vq[i]), + vdev->vq[i].last_avail_idx, nheads); + return -1; + } + } + } + return 0; } @@ -962,6 +1006,11 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f) BusState *qbus = qdev_get_parent_bus(DEVICE(vdev)); VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus); + /* We poison the endianness to ensure it does not get used before + * subsections have been loaded. + */ + vdev->device_endian = VIRTIO_DEVICE_ENDIAN_UNKNOWN; + if (k->load_config) { ret = k->load_config(qbus->parent, f); if (ret) @@ -995,18 +1044,7 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f) vdev->vq[i].notification = true; if (vdev->vq[i].pa) { - uint16_t nheads; virtqueue_init(&vdev->vq[i]); - nheads = vring_avail_idx(&vdev->vq[i]) - vdev->vq[i].last_avail_idx; - /* Check it isn't doing very strange things with descriptor numbers. */ - if (nheads > vdev->vq[i].vring.num) { - error_report("VQ %d size 0x%x Guest index 0x%x " - "inconsistent with Host index 0x%x: delta 0x%x", - i, vdev->vq[i].vring.num, - vring_avail_idx(&vdev->vq[i]), - vdev->vq[i].last_avail_idx, nheads); - return -1; - } } else if (vdev->vq[i].last_avail_idx) { error_report("VQ %d address 0x0 " "inconsistent with Host index 0x%x", @@ -1078,6 +1116,10 @@ void virtio_init(VirtIODevice *vdev, const char *name, } vdev->vmstate = qemu_add_vm_change_state_handler(virtio_vmstate_change, vdev); + /* We poison the endianness to ensure it does not get used before + * the device is reset. + */ + vdev->device_endian = VIRTIO_DEVICE_ENDIAN_UNKNOWN; } hwaddr virtio_queue_get_desc_addr(VirtIODevice *vdev, int n) diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h index a21b65a..eb798c1 100644 --- a/include/exec/cpu-common.h +++ b/include/exec/cpu-common.h @@ -122,4 +122,5 @@ void qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque); #endif +bool target_words_bigendian(void); #endif /* !CPU_COMMON_H */ diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h index 82f852f..0012470 100644 --- a/include/hw/virtio/virtio.h +++ b/include/hw/virtio/virtio.h @@ -104,6 +104,12 @@ typedef struct VirtQueueElement #define VIRTIO_DEVICE(obj) \ OBJECT_CHECK(VirtIODevice, (obj), TYPE_VIRTIO_DEVICE) +enum virtio_device_endian { + VIRTIO_DEVICE_ENDIAN_UNKNOWN, + VIRTIO_DEVICE_ENDIAN_LITTLE, + VIRTIO_DEVICE_ENDIAN_BIG, +}; + struct VirtIODevice { DeviceState parent_obj; @@ -121,6 +127,7 @@ struct VirtIODevice bool vm_running; VMChangeStateEntry *vmstate; char *bus_name; + enum virtio_device_endian device_endian; }; typedef struct VirtioDeviceClass { @@ -257,4 +264,10 @@ void virtio_queue_set_host_notifier_fd_handler(VirtQueue *vq, bool assign, bool set_handler); void virtio_queue_notify_vq(VirtQueue *vq); void virtio_irq(VirtQueue *vq); + +static inline bool virtio_is_big_endian(VirtIODevice *vdev) +{ + assert(vdev->device_endian != VIRTIO_DEVICE_ENDIAN_UNKNOWN); + return vdev->device_endian == VIRTIO_DEVICE_ENDIAN_BIG; +} #endif
Some CPU families can dynamically change their endianness. This means we can have little endian ppc or big endian arm guests for example. This has an impact on legacy virtio data structures since they are target endian. We hence introduce a new property to track the endianness of each virtio device. It is reasonnably assumed that endianness won't change while the device is in use : we hence capture the device endianness when it gets reset. Of course this property must be part of the migration stream as most of the virtio code will depend on it. It is hence migrated in a subsection to preserve compatibility with older migration streams. The tricky part is that the endianness gets known quite late and we must ensure no access is made to virtio data before that. It is for example invalid to call vring_avail_idx() as does the actual migration code: the VQ indexes sanity checks had to be moved from virtio_load() to virtio_load_subsections() because of that. In fact, we have two conditions where the endianness property is stale: - from initialization time (virtio_init) to first reset time (virtio_reset) - from incoming migration time (virtio_load) to the time the property is restored (virtio_load_device_endian) We enforce some paranoia by making the endianness a 3-value enum so that we can poison it when it should not be used. Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> --- exec.c | 8 +----- hw/virtio/virtio-pci.c | 11 +++----- hw/virtio/virtio.c | 64 ++++++++++++++++++++++++++++++++++++-------- include/exec/cpu-common.h | 1 + include/hw/virtio/virtio.h | 13 +++++++++ 5 files changed, 72 insertions(+), 25 deletions(-)