diff mbox

[RFC,8/8] virtio: add endian-ambivalent support to VirtIODevice

Message ID 20140514154230.10746.56297.stgit@bahia.local
State New
Headers show

Commit Message

Greg Kurz May 14, 2014, 3:42 p.m. UTC
Some CPU families can dynamically change their endianness. This means we
can have little endian ppc or big endian arm guests for example. This has
an impact on legacy virtio data structures since they are target endian.
We hence introduce a new property to track the endianness of each virtio
device. It is reasonnably assumed that endianness won't change while the
device is in use : we hence capture the device endianness when it gets
reset.

Of course this property must be part of the migration stream as most of
the virtio code will depend on it. It is hence migrated in a subsection
to preserve compatibility with older migration streams. The tricky part
is that the endianness gets known quite late and we must ensure no access
is made to virtio data before that. It is for example invalid to call
vring_avail_idx() as does the actual migration code: the VQ indexes sanity
checks had to be moved from virtio_load() to virtio_load_subsections()
because of that.

In fact, we have two conditions where the endianness property is stale:
- from initialization time (virtio_init) to first reset time (virtio_reset)
- from incoming migration time (virtio_load) to the time the property
  is restored (virtio_load_device_endian)
We enforce some paranoia by making the endianness a 3-value enum so
that we can poison it when it should not be used.

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
---
 exec.c                     |    8 +-----
 hw/virtio/virtio-pci.c     |   11 +++-----
 hw/virtio/virtio.c         |   64 ++++++++++++++++++++++++++++++++++++--------
 include/exec/cpu-common.h  |    1 +
 include/hw/virtio/virtio.h |   13 +++++++++
 5 files changed, 72 insertions(+), 25 deletions(-)

Comments

Paolo Bonzini May 27, 2014, 3:01 p.m. UTC | #1
Il 14/05/2014 17:42, Greg Kurz ha scritto:
> +    { .name = "virtio/is_big_endian",
> +      .version_id = 1,
> +      .save = virtio_save_device_endian,
> +      .load = virtio_load_device_endian,
> +    },
>      { .name = NULL }

I think this is okay, but you need to add support for a "needed" 
function like you have in normal vmstate subsections.  You only need the 
subsection if

     if (target_words_bigendian()) {
         return vdev->device_endian != VIRTIO_DEVICE_ENDIAN_BIG;
     } else {
         return vdev->device_endian != VIRTIO_DEVICE_ENDIAN_LITTLE;
     }

Paolo
Greg Kurz May 29, 2014, 9:12 a.m. UTC | #2
On Tue, 27 May 2014 17:01:38 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> Il 14/05/2014 17:42, Greg Kurz ha scritto:
> > +    { .name = "virtio/is_big_endian",
> > +      .version_id = 1,
> > +      .save = virtio_save_device_endian,
> > +      .load = virtio_load_device_endian,
> > +    },
> >      { .name = NULL }
> 
> I think this is okay, but you need to add support for a "needed" 
> function like you have in normal vmstate subsections.  You only need the 
> subsection if
> 
>      if (target_words_bigendian()) {
>          return vdev->device_endian != VIRTIO_DEVICE_ENDIAN_BIG;
>      } else {
>          return vdev->device_endian != VIRTIO_DEVICE_ENDIAN_LITTLE;
>      }
> 
> Paolo
> 

Hi Paolo,

As suggested by Andreas, I have reworked the patch set to use
VMState directly instead of mimicking... and of course, I end
up with a virtio_device_endian_needed() function that does just
that ! :)

I'm on leave now, I'll try to send an update next week.

BTW, I have spotted two locations where the migration code is
affected by the device endianness at load time:

int virtio_load(VirtIODevice *vdev, QEMUFile *f)
{
[...]
            nheads = vring_avail_idx(&vdev->vq[i]) - vdev->vq[i].last_avail_idx;
                       ^^^^^^^^^^^
            /* Check it isn't doing very strange things with descriptor numbers. */
            if (nheads > vdev->vq[i].vring.num) {
[...]
}

and

static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id)
{
[...]
    /* The config space */
    qemu_get_be16s(f, &s->config.cols);
    qemu_get_be16s(f, &s->config.rows);

    qemu_get_be32s(f, &max_nr_ports);
    tswap32s(&max_nr_ports);
     ^^^^^^
    if (max_nr_ports > tswap32(s->config.max_nr_ports)) {
[...]
}

If we stream subsections after the device descriptor as it is done
in VMState, these two will break because the device endian is stale.

The first one can be easily dealt with: just defer the sanity check
to a post_load function. The second is a bit more tricky: the
virtio serial migrates its config space (target endian) and the
active ports bitmap. The load code assumes max_nr_ports from the
config space tells the size of the ports bitmap... that means the
virtio migration protocol is also contaminated by target endianness. :-\

I have questions about virtio serial:
- what is exactly the point at migrating the virtio device config space ?
- what is the scenario that calls for the active ports bitmap checking
  at load time ?
- we currently have 0 < max_nr_ports < VIRTIO_PCI_QUEUE_MAX / 2 hardcoded.
  With the current code, that means the ports bitmap is always a single
  32-bit fixed size value... are there plans to support virtio serial
  with 0 port ? with more than 32 ports ?

In the case the answer for above is "legacy virtio really sucks" then,
is it acceptable to not honor bug-compatibility with older versions and
fix the code ? :)

A solution could be to simply remove all that "crap" and bump versions,
or at least send zeroes on save and skip them on load.

Another solution I haven't tried yet would be to stream subsections before
the device descriptor...

Any suggestions ?

Cheers.
Paolo Bonzini May 29, 2014, 10:16 a.m. UTC | #3
Il 29/05/2014 11:12, Greg Kurz ha scritto:
> int virtio_load(VirtIODevice *vdev, QEMUFile *f)
> {
> [...]
>             nheads = vring_avail_idx(&vdev->vq[i]) - vdev->vq[i].last_avail_idx;
>                        ^^^^^^^^^^^
>             /* Check it isn't doing very strange things with descriptor numbers. */
>             if (nheads > vdev->vq[i].vring.num) {
> [...]
> }
>
> and
>
> static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id)
> {
> [...]
>     /* The config space */
>     qemu_get_be16s(f, &s->config.cols);
>     qemu_get_be16s(f, &s->config.rows);
>
>     qemu_get_be32s(f, &max_nr_ports);
>     tswap32s(&max_nr_ports);
>      ^^^^^^
>     if (max_nr_ports > tswap32(s->config.max_nr_ports)) {
> [...]
> }
>
> If we stream subsections after the device descriptor as it is done
> in VMState, these two will break because the device endian is stale.
>
> The first one can be easily dealt with: just defer the sanity check
> to a post_load function.

Good, we're lucky here.

> The second is a bit more tricky: the
> virtio serial migrates its config space (target endian) and the
> active ports bitmap. The load code assumes max_nr_ports from the
> config space tells the size of the ports bitmap... that means the
> virtio migration protocol is also contaminated by target endianness. :-\

Ouch.

I guess we could break migration in the case of host endianness != 
target endianness, like this:

     /* These three used to be fetched in target endianness and then
      * stored as big endian.  It ended up as little endian if host and
      * target endianness doesn't match.
      *
      * Starting with qemu 2.1, we always store as big endian.  The
      * version wasn't bumped to avoid breaking backwards compatibility.
      * We check the validity of max_nr_ports, and the incorrect-
      * endianness max_nr_ports will be huge, which will abort migration
      * anyway.
      */
     uint16_t cols = tswap16(s->config.cols);
     uint16_t rows = tswap16(s->config.rows);
     uint32_t max_nr_ports = tswap32(s->config.max_nr_ports);

     qemu_put_be16s(f, &cols);
     qemu_put_be16s(f, &rows);
     qemu_put_be32s(f, &max_nr_ports);

...

     uint16_t cols, rows;

     qemu_get_be16s(f, &cols);
     qemu_get_be16s(f, &rows);
     qemu_get_be32s(f, &max_nr_ports);

     /* Convert back to target endianness when storing into the config
      * space.
      */
     s->config.cols = tswap16(cols);
     s->config.rows = tswap16(rows);
     if (max_nr_ports > tswap32(s->config.max_nr_ports) {
         ...
     }

> In the case the answer for above is "legacy virtio really sucks" then,
> is it acceptable to not honor bug-compatibility with older versions and
> fix the code ? :)

As long as the common cases don't break, yes.  The question is what are 
the common cases.  Here I think the only non-obscure case that could 
break is x86-on-PPC, and it's not that common.

Paolo
Greg Kurz June 12, 2014, 7:43 a.m. UTC | #4
On Thu, 29 May 2014 12:16:26 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 29/05/2014 11:12, Greg Kurz ha scritto:
> > int virtio_load(VirtIODevice *vdev, QEMUFile *f)
> > {
> > [...]
> >             nheads = vring_avail_idx(&vdev->vq[i]) - vdev->vq[i].last_avail_idx;
> >                        ^^^^^^^^^^^
> >             /* Check it isn't doing very strange things with descriptor numbers. */
> >             if (nheads > vdev->vq[i].vring.num) {
> > [...]
> > }
> >
> > and
> >
> > static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id)
> > {
> > [...]
> >     /* The config space */
> >     qemu_get_be16s(f, &s->config.cols);
> >     qemu_get_be16s(f, &s->config.rows);
> >
> >     qemu_get_be32s(f, &max_nr_ports);
> >     tswap32s(&max_nr_ports);
> >      ^^^^^^
> >     if (max_nr_ports > tswap32(s->config.max_nr_ports)) {
> > [...]
> > }
> >
> > If we stream subsections after the device descriptor as it is done
> > in VMState, these two will break because the device endian is stale.
> >
> > The first one can be easily dealt with: just defer the sanity check
> > to a post_load function.
> 
> Good, we're lucky here.
> 
> > The second is a bit more tricky: the
> > virtio serial migrates its config space (target endian) and the
> > active ports bitmap. The load code assumes max_nr_ports from the
> > config space tells the size of the ports bitmap... that means the
> > virtio migration protocol is also contaminated by target endianness. :-\
> 
> Ouch.
> 
> I guess we could break migration in the case of host endianness != 
> target endianness, like this:
> 
>      /* These three used to be fetched in target endianness and then
>       * stored as big endian.  It ended up as little endian if host and
>       * target endianness doesn't match.
>       *
>       * Starting with qemu 2.1, we always store as big endian.  The
>       * version wasn't bumped to avoid breaking backwards compatibility.
>       * We check the validity of max_nr_ports, and the incorrect-
>       * endianness max_nr_ports will be huge, which will abort migration
>       * anyway.
>       */
>      uint16_t cols = tswap16(s->config.cols);
>      uint16_t rows = tswap16(s->config.rows);
>      uint32_t max_nr_ports = tswap32(s->config.max_nr_ports);
> 
>      qemu_put_be16s(f, &cols);
>      qemu_put_be16s(f, &rows);
>      qemu_put_be32s(f, &max_nr_ports);
> 
> ...
> 
>      uint16_t cols, rows;
> 
>      qemu_get_be16s(f, &cols);
>      qemu_get_be16s(f, &rows);
>      qemu_get_be32s(f, &max_nr_ports);
> 
>      /* Convert back to target endianness when storing into the config
>       * space.
>       */

Paolo,

The patch set to support endian changing targets adds a device_endian
field to the VirtIODevice structure to be used instead of the default
target endianness as it happens with tswap() macros. It also introduces
virtio_tswap() helpers for this purpose, but they can only be used when
the device_endian field has been restored... in a subsection after the
device descriptor... :-\
If the scenario is ppc64le-on-ppc64: tswap() macros don't do anything
and we cannot convert back to LE...

>      s->config.cols = tswap16(cols);
>      s->config.rows = tswap16(rows);

Since cols and rows are not involved in the protocol, we can safely
defer the conversion to post load.

>      if (max_nr_ports > tswap32(s->config.max_nr_ports) {
>          ...
>      }
> 

Since we know that 0 < max_nr_ports < 32,  is it acceptable to guess
the correct endianness with a heuristic ?

if (max_nr_ports > tswap32(s->config.max_nr_ports)) {
	max_nr_ports = bswap32(max_nr_ports);
}

if (max_nr_ports > tswap32(s->config.max_nr_ports)) {
	return -EINVAL;
}

> > In the case the answer for above is "legacy virtio really sucks" then,
> > is it acceptable to not honor bug-compatibility with older versions and
> > fix the code ? :)
> 
> As long as the common cases don't break, yes.  The question is what are 
> the common cases.  Here I think the only non-obscure case that could 
> break is x86-on-PPC, and it's not that common.
> 
> Paolo
> 

Thanks.
Michael S. Tsirkin June 12, 2014, 7:54 a.m. UTC | #5
On Thu, Jun 12, 2014 at 09:43:51AM +0200, Greg Kurz wrote:
> On Thu, 29 May 2014 12:16:26 +0200
> Paolo Bonzini <pbonzini@redhat.com> wrote:
> > Il 29/05/2014 11:12, Greg Kurz ha scritto:
> > > int virtio_load(VirtIODevice *vdev, QEMUFile *f)
> > > {
> > > [...]
> > >             nheads = vring_avail_idx(&vdev->vq[i]) - vdev->vq[i].last_avail_idx;
> > >                        ^^^^^^^^^^^
> > >             /* Check it isn't doing very strange things with descriptor numbers. */
> > >             if (nheads > vdev->vq[i].vring.num) {
> > > [...]
> > > }
> > >
> > > and
> > >
> > > static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id)
> > > {
> > > [...]
> > >     /* The config space */
> > >     qemu_get_be16s(f, &s->config.cols);
> > >     qemu_get_be16s(f, &s->config.rows);
> > >
> > >     qemu_get_be32s(f, &max_nr_ports);
> > >     tswap32s(&max_nr_ports);
> > >      ^^^^^^
> > >     if (max_nr_ports > tswap32(s->config.max_nr_ports)) {
> > > [...]
> > > }
> > >
> > > If we stream subsections after the device descriptor as it is done
> > > in VMState, these two will break because the device endian is stale.
> > >
> > > The first one can be easily dealt with: just defer the sanity check
> > > to a post_load function.
> > 
> > Good, we're lucky here.
> > 
> > > The second is a bit more tricky: the
> > > virtio serial migrates its config space (target endian) and the
> > > active ports bitmap. The load code assumes max_nr_ports from the
> > > config space tells the size of the ports bitmap... that means the
> > > virtio migration protocol is also contaminated by target endianness. :-\
> > 
> > Ouch.
> > 
> > I guess we could break migration in the case of host endianness != 
> > target endianness, like this:
> > 
> >      /* These three used to be fetched in target endianness and then
> >       * stored as big endian.  It ended up as little endian if host and
> >       * target endianness doesn't match.
> >       *
> >       * Starting with qemu 2.1, we always store as big endian.  The
> >       * version wasn't bumped to avoid breaking backwards compatibility.
> >       * We check the validity of max_nr_ports, and the incorrect-
> >       * endianness max_nr_ports will be huge, which will abort migration
> >       * anyway.
> >       */
> >      uint16_t cols = tswap16(s->config.cols);
> >      uint16_t rows = tswap16(s->config.rows);
> >      uint32_t max_nr_ports = tswap32(s->config.max_nr_ports);
> > 
> >      qemu_put_be16s(f, &cols);
> >      qemu_put_be16s(f, &rows);
> >      qemu_put_be32s(f, &max_nr_ports);
> > 
> > ...
> > 
> >      uint16_t cols, rows;
> > 
> >      qemu_get_be16s(f, &cols);
> >      qemu_get_be16s(f, &rows);
> >      qemu_get_be32s(f, &max_nr_ports);
> > 
> >      /* Convert back to target endianness when storing into the config
> >       * space.
> >       */
> 
> Paolo,
> 
> The patch set to support endian changing targets adds a device_endian
> field to the VirtIODevice structure to be used instead of the default
> target endianness as it happens with tswap() macros. It also introduces
> virtio_tswap() helpers for this purpose, but they can only be used when
> the device_endian field has been restored... in a subsection after the
> device descriptor... :-\

Store it earlier then, using plain put/get.
You can still add a section conditionally to cause
a cleaner failure in broken cross-version scenarios.

> If the scenario is ppc64le-on-ppc64: tswap() macros don't do anything
> and we cannot convert back to LE...
> 
> >      s->config.cols = tswap16(cols);
> >      s->config.rows = tswap16(rows);
> 
> Since cols and rows are not involved in the protocol, we can safely
> defer the conversion to post load.
> 
> >      if (max_nr_ports > tswap32(s->config.max_nr_ports) {
> >          ...
> >      }
> > 
> 
> Since we know that 0 < max_nr_ports < 32,  is it acceptable to guess
> the correct endianness with a heuristic ?
> 
> if (max_nr_ports > tswap32(s->config.max_nr_ports)) {
> 	max_nr_ports = bswap32(max_nr_ports);
> }
> 
> if (max_nr_ports > tswap32(s->config.max_nr_ports)) {
> 	return -EINVAL;
> }
> 
> > > In the case the answer for above is "legacy virtio really sucks" then,
> > > is it acceptable to not honor bug-compatibility with older versions and
> > > fix the code ? :)
> > 
> > As long as the common cases don't break, yes.  The question is what are 
> > the common cases.  Here I think the only non-obscure case that could 
> > break is x86-on-PPC, and it's not that common.
> > 
> > Paolo
> > 
> 
> Thanks.

One starts doubting whether all this hackery is worth it.  virtio 1.0
should be out real soon now, it makes everything LE so the problem goes
away. It's not like PPC LE is so popular that we must support old drivers
at all costs.  Won't time be better spent backporting virtio 1.0 drivers?


> -- 
> Gregory Kurz                                     kurzgreg@fr.ibm.com
>                                                  gkurz@linux.vnet.ibm.com
> Software Engineer @ IBM/Meiosys                  http://www.ibm.com
> Tel +33 (0)562 165 496
> 
> "Anarchy is about taking complete responsibility for yourself."
>         Alan Moore.
Greg Kurz June 12, 2014, 8:47 a.m. UTC | #6
On Thu, 12 Jun 2014 10:54:48 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Thu, Jun 12, 2014 at 09:43:51AM +0200, Greg Kurz wrote:
> > On Thu, 29 May 2014 12:16:26 +0200
> > Paolo Bonzini <pbonzini@redhat.com> wrote:
> > > Il 29/05/2014 11:12, Greg Kurz ha scritto:
> > > > int virtio_load(VirtIODevice *vdev, QEMUFile *f)
> > > > {
> > > > [...]
> > > >             nheads = vring_avail_idx(&vdev->vq[i]) - vdev->vq[i].last_avail_idx;
> > > >                        ^^^^^^^^^^^
> > > >             /* Check it isn't doing very strange things with descriptor numbers. */
> > > >             if (nheads > vdev->vq[i].vring.num) {
> > > > [...]
> > > > }
> > > >
> > > > and
> > > >
> > > > static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id)
> > > > {
> > > > [...]
> > > >     /* The config space */
> > > >     qemu_get_be16s(f, &s->config.cols);
> > > >     qemu_get_be16s(f, &s->config.rows);
> > > >
> > > >     qemu_get_be32s(f, &max_nr_ports);
> > > >     tswap32s(&max_nr_ports);
> > > >      ^^^^^^
> > > >     if (max_nr_ports > tswap32(s->config.max_nr_ports)) {
> > > > [...]
> > > > }
> > > >
> > > > If we stream subsections after the device descriptor as it is done
> > > > in VMState, these two will break because the device endian is stale.
> > > >
> > > > The first one can be easily dealt with: just defer the sanity check
> > > > to a post_load function.
> > > 
> > > Good, we're lucky here.
> > > 
> > > > The second is a bit more tricky: the
> > > > virtio serial migrates its config space (target endian) and the
> > > > active ports bitmap. The load code assumes max_nr_ports from the
> > > > config space tells the size of the ports bitmap... that means the
> > > > virtio migration protocol is also contaminated by target endianness. :-\
> > > 
> > > Ouch.
> > > 
> > > I guess we could break migration in the case of host endianness != 
> > > target endianness, like this:
> > > 
> > >      /* These three used to be fetched in target endianness and then
> > >       * stored as big endian.  It ended up as little endian if host and
> > >       * target endianness doesn't match.
> > >       *
> > >       * Starting with qemu 2.1, we always store as big endian.  The
> > >       * version wasn't bumped to avoid breaking backwards compatibility.
> > >       * We check the validity of max_nr_ports, and the incorrect-
> > >       * endianness max_nr_ports will be huge, which will abort migration
> > >       * anyway.
> > >       */
> > >      uint16_t cols = tswap16(s->config.cols);
> > >      uint16_t rows = tswap16(s->config.rows);
> > >      uint32_t max_nr_ports = tswap32(s->config.max_nr_ports);
> > > 
> > >      qemu_put_be16s(f, &cols);
> > >      qemu_put_be16s(f, &rows);
> > >      qemu_put_be32s(f, &max_nr_ports);
> > > 
> > > ...
> > > 
> > >      uint16_t cols, rows;
> > > 
> > >      qemu_get_be16s(f, &cols);
> > >      qemu_get_be16s(f, &rows);
> > >      qemu_get_be32s(f, &max_nr_ports);
> > > 
> > >      /* Convert back to target endianness when storing into the config
> > >       * space.
> > >       */
> > 
> > Paolo,
> > 
> > The patch set to support endian changing targets adds a device_endian
> > field to the VirtIODevice structure to be used instead of the default
> > target endianness as it happens with tswap() macros. It also introduces
> > virtio_tswap() helpers for this purpose, but they can only be used when
> > the device_endian field has been restored... in a subsection after the
> > device descriptor... :-\
> 
> Store it earlier then, using plain put/get.

Not sure I follow... this will break compatibility, no ?

> You can still add a section conditionally to cause
> a cleaner failure in broken cross-version scenarios.
> 
> > If the scenario is ppc64le-on-ppc64: tswap() macros don't do anything
> > and we cannot convert back to LE...
> > 
> > >      s->config.cols = tswap16(cols);
> > >      s->config.rows = tswap16(rows);
> > 
> > Since cols and rows are not involved in the protocol, we can safely
> > defer the conversion to post load.
> > 
> > >      if (max_nr_ports > tswap32(s->config.max_nr_ports) {
> > >          ...
> > >      }
> > > 
> > 
> > Since we know that 0 < max_nr_ports < 32,  is it acceptable to guess
> > the correct endianness with a heuristic ?
> > 
> > if (max_nr_ports > tswap32(s->config.max_nr_ports)) {
> > 	max_nr_ports = bswap32(max_nr_ports);
> > }
> > 
> > if (max_nr_ports > tswap32(s->config.max_nr_ports)) {
> > 	return -EINVAL;
> > }
> > 
> > > > In the case the answer for above is "legacy virtio really sucks" then,
> > > > is it acceptable to not honor bug-compatibility with older versions and
> > > > fix the code ? :)
> > > 
> > > As long as the common cases don't break, yes.  The question is what are 
> > > the common cases.  Here I think the only non-obscure case that could 
> > > break is x86-on-PPC, and it's not that common.
> > > 
> > > Paolo
> > > 
> > 
> > Thanks.
> 
> One starts doubting whether all this hackery is worth it.  virtio 1.0
> should be out real soon now, it makes everything LE so the problem goes
> away. It's not like PPC LE is so popular that we must support old drivers
> at all costs.  Won't time be better spent backporting virtio 1.0 drivers?
> 

Hmmm... AFAIC some QEMU maintainers expressed interest in supporting legacy
virtio in the case we have endian-changing targets. Patches to run a ppc64le
guests have been accepted in KVM, Linux and QEMU... the only missing block
is virtio. I don't especially care in supporting old drivers at all cost: this
request was expressed on the list. I just want people to be able to run a ppc64le
ubuntu-14.04 (and soon other distros) guest on a ppc64 box and be able to migrate.

Would it be acceptable to break compatibility for ppc64 (and maybe ARM) only with
a target specific hook being called from the virtio code ?

> 
> > -- 
> > Gregory Kurz                                     kurzgreg@fr.ibm.com
> >                                                  gkurz@linux.vnet.ibm.com
> > Software Engineer @ IBM/Meiosys                  http://www.ibm.com
> > Tel +33 (0)562 165 496
> > 
> > "Anarchy is about taking complete responsibility for yourself."
> >         Alan Moore.
> 

Thanks.
Alexander Graf June 12, 2014, 8:55 a.m. UTC | #7
On 12.06.14 09:54, Michael S. Tsirkin wrote:
> On Thu, Jun 12, 2014 at 09:43:51AM +0200, Greg Kurz wrote:
>> On Thu, 29 May 2014 12:16:26 +0200
>> Paolo Bonzini <pbonzini@redhat.com> wrote:
>>> Il 29/05/2014 11:12, Greg Kurz ha scritto:
>>>> int virtio_load(VirtIODevice *vdev, QEMUFile *f)
>>>> {
>>>> [...]
>>>>              nheads = vring_avail_idx(&vdev->vq[i]) - vdev->vq[i].last_avail_idx;
>>>>                         ^^^^^^^^^^^
>>>>              /* Check it isn't doing very strange things with descriptor numbers. */
>>>>              if (nheads > vdev->vq[i].vring.num) {
>>>> [...]
>>>> }
>>>>
>>>> and
>>>>
>>>> static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id)
>>>> {
>>>> [...]
>>>>      /* The config space */
>>>>      qemu_get_be16s(f, &s->config.cols);
>>>>      qemu_get_be16s(f, &s->config.rows);
>>>>
>>>>      qemu_get_be32s(f, &max_nr_ports);
>>>>      tswap32s(&max_nr_ports);
>>>>       ^^^^^^
>>>>      if (max_nr_ports > tswap32(s->config.max_nr_ports)) {
>>>> [...]
>>>> }
>>>>
>>>> If we stream subsections after the device descriptor as it is done
>>>> in VMState, these two will break because the device endian is stale.
>>>>
>>>> The first one can be easily dealt with: just defer the sanity check
>>>> to a post_load function.
>>> Good, we're lucky here.
>>>
>>>> The second is a bit more tricky: the
>>>> virtio serial migrates its config space (target endian) and the
>>>> active ports bitmap. The load code assumes max_nr_ports from the
>>>> config space tells the size of the ports bitmap... that means the
>>>> virtio migration protocol is also contaminated by target endianness. :-\
>>> Ouch.
>>>
>>> I guess we could break migration in the case of host endianness !=
>>> target endianness, like this:
>>>
>>>       /* These three used to be fetched in target endianness and then
>>>        * stored as big endian.  It ended up as little endian if host and
>>>        * target endianness doesn't match.
>>>        *
>>>        * Starting with qemu 2.1, we always store as big endian.  The
>>>        * version wasn't bumped to avoid breaking backwards compatibility.
>>>        * We check the validity of max_nr_ports, and the incorrect-
>>>        * endianness max_nr_ports will be huge, which will abort migration
>>>        * anyway.
>>>        */
>>>       uint16_t cols = tswap16(s->config.cols);
>>>       uint16_t rows = tswap16(s->config.rows);
>>>       uint32_t max_nr_ports = tswap32(s->config.max_nr_ports);
>>>
>>>       qemu_put_be16s(f, &cols);
>>>       qemu_put_be16s(f, &rows);
>>>       qemu_put_be32s(f, &max_nr_ports);
>>>
>>> ...
>>>
>>>       uint16_t cols, rows;
>>>
>>>       qemu_get_be16s(f, &cols);
>>>       qemu_get_be16s(f, &rows);
>>>       qemu_get_be32s(f, &max_nr_ports);
>>>
>>>       /* Convert back to target endianness when storing into the config
>>>        * space.
>>>        */
>> Paolo,
>>
>> The patch set to support endian changing targets adds a device_endian
>> field to the VirtIODevice structure to be used instead of the default
>> target endianness as it happens with tswap() macros. It also introduces
>> virtio_tswap() helpers for this purpose, but they can only be used when
>> the device_endian field has been restored... in a subsection after the
>> device descriptor... :-\
> Store it earlier then, using plain put/get.
> You can still add a section conditionally to cause
> a cleaner failure in broken cross-version scenarios.
>
>> If the scenario is ppc64le-on-ppc64: tswap() macros don't do anything
>> and we cannot convert back to LE...
>>
>>>       s->config.cols = tswap16(cols);
>>>       s->config.rows = tswap16(rows);
>> Since cols and rows are not involved in the protocol, we can safely
>> defer the conversion to post load.
>>
>>>       if (max_nr_ports > tswap32(s->config.max_nr_ports) {
>>>           ...
>>>       }
>>>
>> Since we know that 0 < max_nr_ports < 32,  is it acceptable to guess
>> the correct endianness with a heuristic ?
>>
>> if (max_nr_ports > tswap32(s->config.max_nr_ports)) {
>> 	max_nr_ports = bswap32(max_nr_ports);
>> }
>>
>> if (max_nr_ports > tswap32(s->config.max_nr_ports)) {
>> 	return -EINVAL;
>> }
>>
>>>> In the case the answer for above is "legacy virtio really sucks" then,
>>>> is it acceptable to not honor bug-compatibility with older versions and
>>>> fix the code ? :)
>>> As long as the common cases don't break, yes.  The question is what are
>>> the common cases.  Here I think the only non-obscure case that could
>>> break is x86-on-PPC, and it's not that common.
>>>
>>> Paolo
>>>
>> Thanks.
> One starts doubting whether all this hackery is worth it.  virtio 1.0
> should be out real soon now, it makes everything LE so the problem goes
> away. It's not like PPC LE is so popular that we must support old drivers
> at all costs.  Won't time be better spent backporting virtio 1.0 drivers?

There are already released and working Linux distributions (Ubuntu, 
openSUSE, maybe others) that don't have virtio 1.0 drivers. Putting our 
heads into the sand is not an option ;).


Alex
Paolo Bonzini June 12, 2014, 8:55 a.m. UTC | #8
Il 12/06/2014 09:43, Greg Kurz ha scritto:
> Since we know that 0 < max_nr_ports < 32,  is it acceptable to guess
> the correct endianness with a heuristic ?
>
> if (max_nr_ports > tswap32(s->config.max_nr_ports)) {
> 	max_nr_ports = bswap32(max_nr_ports);
> }
>
> if (max_nr_ports > tswap32(s->config.max_nr_ports)) {
> 	return -EINVAL;
> }

Yes, I guess it is acceptable.  So first you should fix the code to 
always serialize fields as big-endian, and then apply this little hack 
and virtio_tswap on top of the previous change.

Paolo
Alexander Graf June 12, 2014, 8:57 a.m. UTC | #9
On 12.06.14 10:55, Paolo Bonzini wrote:
> Il 12/06/2014 09:43, Greg Kurz ha scritto:
>> Since we know that 0 < max_nr_ports < 32,  is it acceptable to guess
>> the correct endianness with a heuristic ?
>>
>> if (max_nr_ports > tswap32(s->config.max_nr_ports)) {
>>     max_nr_ports = bswap32(max_nr_ports);
>> }
>>
>> if (max_nr_ports > tswap32(s->config.max_nr_ports)) {
>>     return -EINVAL;
>> }
>
> Yes, I guess it is acceptable.  So first you should fix the code to 
> always serialize fields as big-endian, and then apply this little hack 
> and virtio_tswap on top of the previous change.

Why don't we just ignore those config fields? They won't change during 
the lifetime of the guest anyway - and configuration needs to be the 
same on source and dest to work.


Alex
Michael S. Tsirkin June 12, 2014, 9:05 a.m. UTC | #10
On Thu, Jun 12, 2014 at 10:47:17AM +0200, Greg Kurz wrote:
> On Thu, 12 Jun 2014 10:54:48 +0300
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Thu, Jun 12, 2014 at 09:43:51AM +0200, Greg Kurz wrote:
> > > On Thu, 29 May 2014 12:16:26 +0200
> > > Paolo Bonzini <pbonzini@redhat.com> wrote:
> > > > Il 29/05/2014 11:12, Greg Kurz ha scritto:
> > > > > int virtio_load(VirtIODevice *vdev, QEMUFile *f)
> > > > > {
> > > > > [...]
> > > > >             nheads = vring_avail_idx(&vdev->vq[i]) - vdev->vq[i].last_avail_idx;
> > > > >                        ^^^^^^^^^^^
> > > > >             /* Check it isn't doing very strange things with descriptor numbers. */
> > > > >             if (nheads > vdev->vq[i].vring.num) {
> > > > > [...]
> > > > > }
> > > > >
> > > > > and
> > > > >
> > > > > static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id)
> > > > > {
> > > > > [...]
> > > > >     /* The config space */
> > > > >     qemu_get_be16s(f, &s->config.cols);
> > > > >     qemu_get_be16s(f, &s->config.rows);
> > > > >
> > > > >     qemu_get_be32s(f, &max_nr_ports);
> > > > >     tswap32s(&max_nr_ports);
> > > > >      ^^^^^^
> > > > >     if (max_nr_ports > tswap32(s->config.max_nr_ports)) {
> > > > > [...]
> > > > > }
> > > > >
> > > > > If we stream subsections after the device descriptor as it is done
> > > > > in VMState, these two will break because the device endian is stale.
> > > > >
> > > > > The first one can be easily dealt with: just defer the sanity check
> > > > > to a post_load function.
> > > > 
> > > > Good, we're lucky here.
> > > > 
> > > > > The second is a bit more tricky: the
> > > > > virtio serial migrates its config space (target endian) and the
> > > > > active ports bitmap. The load code assumes max_nr_ports from the
> > > > > config space tells the size of the ports bitmap... that means the
> > > > > virtio migration protocol is also contaminated by target endianness. :-\
> > > > 
> > > > Ouch.
> > > > 
> > > > I guess we could break migration in the case of host endianness != 
> > > > target endianness, like this:
> > > > 
> > > >      /* These three used to be fetched in target endianness and then
> > > >       * stored as big endian.  It ended up as little endian if host and
> > > >       * target endianness doesn't match.
> > > >       *
> > > >       * Starting with qemu 2.1, we always store as big endian.  The
> > > >       * version wasn't bumped to avoid breaking backwards compatibility.
> > > >       * We check the validity of max_nr_ports, and the incorrect-
> > > >       * endianness max_nr_ports will be huge, which will abort migration
> > > >       * anyway.
> > > >       */
> > > >      uint16_t cols = tswap16(s->config.cols);
> > > >      uint16_t rows = tswap16(s->config.rows);
> > > >      uint32_t max_nr_ports = tswap32(s->config.max_nr_ports);
> > > > 
> > > >      qemu_put_be16s(f, &cols);
> > > >      qemu_put_be16s(f, &rows);
> > > >      qemu_put_be32s(f, &max_nr_ports);
> > > > 
> > > > ...
> > > > 
> > > >      uint16_t cols, rows;
> > > > 
> > > >      qemu_get_be16s(f, &cols);
> > > >      qemu_get_be16s(f, &rows);
> > > >      qemu_get_be32s(f, &max_nr_ports);
> > > > 
> > > >      /* Convert back to target endianness when storing into the config
> > > >       * space.
> > > >       */
> > > 
> > > Paolo,
> > > 
> > > The patch set to support endian changing targets adds a device_endian
> > > field to the VirtIODevice structure to be used instead of the default
> > > target endianness as it happens with tswap() macros. It also introduces
> > > virtio_tswap() helpers for this purpose, but they can only be used when
> > > the device_endian field has been restored... in a subsection after the
> > > device descriptor... :-\
> > 
> > Store it earlier then, using plain put/get.
> 
> Not sure I follow... this will break compatibility, no ?

Do it only on the ambialent targets :)

> > You can still add a section conditionally to cause
> > a cleaner failure in broken cross-version scenarios.
> > 
> > > If the scenario is ppc64le-on-ppc64: tswap() macros don't do anything
> > > and we cannot convert back to LE...
> > > 
> > > >      s->config.cols = tswap16(cols);
> > > >      s->config.rows = tswap16(rows);
> > > 
> > > Since cols and rows are not involved in the protocol, we can safely
> > > defer the conversion to post load.
> > > 
> > > >      if (max_nr_ports > tswap32(s->config.max_nr_ports) {
> > > >          ...
> > > >      }
> > > > 
> > > 
> > > Since we know that 0 < max_nr_ports < 32,  is it acceptable to guess
> > > the correct endianness with a heuristic ?
> > > 
> > > if (max_nr_ports > tswap32(s->config.max_nr_ports)) {
> > > 	max_nr_ports = bswap32(max_nr_ports);
> > > }
> > > 
> > > if (max_nr_ports > tswap32(s->config.max_nr_ports)) {
> > > 	return -EINVAL;
> > > }
> > > 
> > > > > In the case the answer for above is "legacy virtio really sucks" then,
> > > > > is it acceptable to not honor bug-compatibility with older versions and
> > > > > fix the code ? :)
> > > > 
> > > > As long as the common cases don't break, yes.  The question is what are 
> > > > the common cases.  Here I think the only non-obscure case that could 
> > > > break is x86-on-PPC, and it's not that common.
> > > > 
> > > > Paolo
> > > > 
> > > 
> > > Thanks.
> > 
> > One starts doubting whether all this hackery is worth it.  virtio 1.0
> > should be out real soon now, it makes everything LE so the problem goes
> > away. It's not like PPC LE is so popular that we must support old drivers
> > at all costs.  Won't time be better spent backporting virtio 1.0 drivers?
> > 
> 
> Hmmm... AFAIC some QEMU maintainers expressed interest in supporting legacy
> virtio in the case we have endian-changing targets. Patches to run a ppc64le
> guests have been accepted in KVM, Linux and QEMU... the only missing block
> is virtio. I don't especially care in supporting old drivers at all cost: this
> request was expressed on the list. I just want people to be able to run a ppc64le
> ubuntu-14.04 (and soon other distros) guest on a ppc64 box and be able to migrate.

Sigh. So ubuntu ships known-broken virtio drivers in their LTS release
and now we are trying to make them work.
There's no working qemu configuration at the moment, do we really
have to build new hardware around known-broken drivers
and then maintain it all for years?
Why not build good hardware and backport new drivers?


> Would it be acceptable to break compatibility for ppc64 (and maybe ARM) only with
> a target specific hook being called from the virtio code ?

I think so.
Can we just add a function that tells us whether target is ambivalent?
Only migrate new stuff if that's the case.

> > 
> > > -- 
> > > Gregory Kurz                                     kurzgreg@fr.ibm.com
> > >                                                  gkurz@linux.vnet.ibm.com
> > > Software Engineer @ IBM/Meiosys                  http://www.ibm.com
> > > Tel +33 (0)562 165 496
> > > 
> > > "Anarchy is about taking complete responsibility for yourself."
> > >         Alan Moore.
> > 
> 
> Thanks.
> 
> -- 
> Gregory Kurz                                     kurzgreg@fr.ibm.com
>                                                  gkurz@linux.vnet.ibm.com
> Software Engineer @ IBM/Meiosys                  http://www.ibm.com
> Tel +33 (0)562 165 496
> 
> "Anarchy is about taking complete responsibility for yourself."
>         Alan Moore.
Greg Kurz June 12, 2014, 9:06 a.m. UTC | #11
On Thu, 12 Jun 2014 10:55:42 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> Il 12/06/2014 09:43, Greg Kurz ha scritto:
> > Since we know that 0 < max_nr_ports < 32,  is it acceptable to guess
> > the correct endianness with a heuristic ?
> >
> > if (max_nr_ports > tswap32(s->config.max_nr_ports)) {
> > 	max_nr_ports = bswap32(max_nr_ports);
> > }
> >
> > if (max_nr_ports > tswap32(s->config.max_nr_ports)) {
> > 	return -EINVAL;
> > }
> 
> Yes, I guess it is acceptable.  So first you should fix the code to 
> always serialize fields as big-endian, and then apply this little hack 
> and virtio_tswap on top of the previous change.
> 
> Paolo
> 

BTW, can someone explain why we stream the device config ?
Alexander Graf June 12, 2014, 9:06 a.m. UTC | #12
On 12.06.14 10:47, Greg Kurz wrote:
> On Thu, 12 Jun 2014 10:54:48 +0300
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> On Thu, Jun 12, 2014 at 09:43:51AM +0200, Greg Kurz wrote:
>>> On Thu, 29 May 2014 12:16:26 +0200
>>> Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>> Il 29/05/2014 11:12, Greg Kurz ha scritto:
>>>>> int virtio_load(VirtIODevice *vdev, QEMUFile *f)
>>>>> {
>>>>> [...]
>>>>>              nheads = vring_avail_idx(&vdev->vq[i]) - vdev->vq[i].last_avail_idx;
>>>>>                         ^^^^^^^^^^^
>>>>>              /* Check it isn't doing very strange things with descriptor numbers. */
>>>>>              if (nheads > vdev->vq[i].vring.num) {
>>>>> [...]
>>>>> }
>>>>>
>>>>> and
>>>>>
>>>>> static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id)
>>>>> {
>>>>> [...]
>>>>>      /* The config space */
>>>>>      qemu_get_be16s(f, &s->config.cols);
>>>>>      qemu_get_be16s(f, &s->config.rows);
>>>>>
>>>>>      qemu_get_be32s(f, &max_nr_ports);
>>>>>      tswap32s(&max_nr_ports);
>>>>>       ^^^^^^
>>>>>      if (max_nr_ports > tswap32(s->config.max_nr_ports)) {
>>>>> [...]
>>>>> }
>>>>>
>>>>> If we stream subsections after the device descriptor as it is done
>>>>> in VMState, these two will break because the device endian is stale.
>>>>>
>>>>> The first one can be easily dealt with: just defer the sanity check
>>>>> to a post_load function.
>>>> Good, we're lucky here.
>>>>
>>>>> The second is a bit more tricky: the
>>>>> virtio serial migrates its config space (target endian) and the
>>>>> active ports bitmap. The load code assumes max_nr_ports from the
>>>>> config space tells the size of the ports bitmap... that means the
>>>>> virtio migration protocol is also contaminated by target endianness. :-\
>>>> Ouch.
>>>>
>>>> I guess we could break migration in the case of host endianness !=
>>>> target endianness, like this:
>>>>
>>>>       /* These three used to be fetched in target endianness and then
>>>>        * stored as big endian.  It ended up as little endian if host and
>>>>        * target endianness doesn't match.
>>>>        *
>>>>        * Starting with qemu 2.1, we always store as big endian.  The
>>>>        * version wasn't bumped to avoid breaking backwards compatibility.
>>>>        * We check the validity of max_nr_ports, and the incorrect-
>>>>        * endianness max_nr_ports will be huge, which will abort migration
>>>>        * anyway.
>>>>        */
>>>>       uint16_t cols = tswap16(s->config.cols);
>>>>       uint16_t rows = tswap16(s->config.rows);
>>>>       uint32_t max_nr_ports = tswap32(s->config.max_nr_ports);
>>>>
>>>>       qemu_put_be16s(f, &cols);
>>>>       qemu_put_be16s(f, &rows);
>>>>       qemu_put_be32s(f, &max_nr_ports);
>>>>
>>>> ...
>>>>
>>>>       uint16_t cols, rows;
>>>>
>>>>       qemu_get_be16s(f, &cols);
>>>>       qemu_get_be16s(f, &rows);
>>>>       qemu_get_be32s(f, &max_nr_ports);
>>>>
>>>>       /* Convert back to target endianness when storing into the config
>>>>        * space.
>>>>        */
>>> Paolo,
>>>
>>> The patch set to support endian changing targets adds a device_endian
>>> field to the VirtIODevice structure to be used instead of the default
>>> target endianness as it happens with tswap() macros. It also introduces
>>> virtio_tswap() helpers for this purpose, but they can only be used when
>>> the device_endian field has been restored... in a subsection after the
>>> device descriptor... :-\
>> Store it earlier then, using plain put/get.
> Not sure I follow... this will break compatibility, no ?
>
>> You can still add a section conditionally to cause
>> a cleaner failure in broken cross-version scenarios.
>>
>>> If the scenario is ppc64le-on-ppc64: tswap() macros don't do anything
>>> and we cannot convert back to LE...
>>>
>>>>       s->config.cols = tswap16(cols);
>>>>       s->config.rows = tswap16(rows);
>>> Since cols and rows are not involved in the protocol, we can safely
>>> defer the conversion to post load.
>>>
>>>>       if (max_nr_ports > tswap32(s->config.max_nr_ports) {
>>>>           ...
>>>>       }
>>>>
>>> Since we know that 0 < max_nr_ports < 32,  is it acceptable to guess
>>> the correct endianness with a heuristic ?
>>>
>>> if (max_nr_ports > tswap32(s->config.max_nr_ports)) {
>>> 	max_nr_ports = bswap32(max_nr_ports);
>>> }
>>>
>>> if (max_nr_ports > tswap32(s->config.max_nr_ports)) {
>>> 	return -EINVAL;
>>> }
>>>
>>>>> In the case the answer for above is "legacy virtio really sucks" then,
>>>>> is it acceptable to not honor bug-compatibility with older versions and
>>>>> fix the code ? :)
>>>> As long as the common cases don't break, yes.  The question is what are
>>>> the common cases.  Here I think the only non-obscure case that could
>>>> break is x86-on-PPC, and it's not that common.
>>>>
>>>> Paolo
>>>>
>>> Thanks.
>> One starts doubting whether all this hackery is worth it.  virtio 1.0
>> should be out real soon now, it makes everything LE so the problem goes
>> away. It's not like PPC LE is so popular that we must support old drivers
>> at all costs.  Won't time be better spent backporting virtio 1.0 drivers?
>>
> Hmmm... AFAIC some QEMU maintainers expressed interest in supporting legacy
> virtio in the case we have endian-changing targets. Patches to run a ppc64le
> guests have been accepted in KVM, Linux and QEMU... the only missing block
> is virtio. I don't especially care in supporting old drivers at all cost: this
> request was expressed on the list. I just want people to be able to run a ppc64le
> ubuntu-14.04 (and soon other distros) guest on a ppc64 box and be able to migrate.
>
> Would it be acceptable to break compatibility for ppc64 (and maybe ARM) only with
> a target specific hook being called from the virtio code ?

I don't see the problem. Let's just make the config fields UNUSED in 
vmstate and keep the virtio-serial config space in good endian locally. 
Then you don't care about ordering of sections anymore.

Then we only need the subsection code for "endian is not target endian" 
which makes migration backwards compatible with non-switchable QEMU 
versions and everyone's happy. End of story :).


Alex
Michael S. Tsirkin June 12, 2014, 9:07 a.m. UTC | #13
On Thu, Jun 12, 2014 at 10:55:40AM +0200, Alexander Graf wrote:
> 
> On 12.06.14 09:54, Michael S. Tsirkin wrote:
> >On Thu, Jun 12, 2014 at 09:43:51AM +0200, Greg Kurz wrote:
> >>On Thu, 29 May 2014 12:16:26 +0200
> >>Paolo Bonzini <pbonzini@redhat.com> wrote:
> >>>Il 29/05/2014 11:12, Greg Kurz ha scritto:
> >>>>int virtio_load(VirtIODevice *vdev, QEMUFile *f)
> >>>>{
> >>>>[...]
> >>>>             nheads = vring_avail_idx(&vdev->vq[i]) - vdev->vq[i].last_avail_idx;
> >>>>                        ^^^^^^^^^^^
> >>>>             /* Check it isn't doing very strange things with descriptor numbers. */
> >>>>             if (nheads > vdev->vq[i].vring.num) {
> >>>>[...]
> >>>>}
> >>>>
> >>>>and
> >>>>
> >>>>static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id)
> >>>>{
> >>>>[...]
> >>>>     /* The config space */
> >>>>     qemu_get_be16s(f, &s->config.cols);
> >>>>     qemu_get_be16s(f, &s->config.rows);
> >>>>
> >>>>     qemu_get_be32s(f, &max_nr_ports);
> >>>>     tswap32s(&max_nr_ports);
> >>>>      ^^^^^^
> >>>>     if (max_nr_ports > tswap32(s->config.max_nr_ports)) {
> >>>>[...]
> >>>>}
> >>>>
> >>>>If we stream subsections after the device descriptor as it is done
> >>>>in VMState, these two will break because the device endian is stale.
> >>>>
> >>>>The first one can be easily dealt with: just defer the sanity check
> >>>>to a post_load function.
> >>>Good, we're lucky here.
> >>>
> >>>>The second is a bit more tricky: the
> >>>>virtio serial migrates its config space (target endian) and the
> >>>>active ports bitmap. The load code assumes max_nr_ports from the
> >>>>config space tells the size of the ports bitmap... that means the
> >>>>virtio migration protocol is also contaminated by target endianness. :-\
> >>>Ouch.
> >>>
> >>>I guess we could break migration in the case of host endianness !=
> >>>target endianness, like this:
> >>>
> >>>      /* These three used to be fetched in target endianness and then
> >>>       * stored as big endian.  It ended up as little endian if host and
> >>>       * target endianness doesn't match.
> >>>       *
> >>>       * Starting with qemu 2.1, we always store as big endian.  The
> >>>       * version wasn't bumped to avoid breaking backwards compatibility.
> >>>       * We check the validity of max_nr_ports, and the incorrect-
> >>>       * endianness max_nr_ports will be huge, which will abort migration
> >>>       * anyway.
> >>>       */
> >>>      uint16_t cols = tswap16(s->config.cols);
> >>>      uint16_t rows = tswap16(s->config.rows);
> >>>      uint32_t max_nr_ports = tswap32(s->config.max_nr_ports);
> >>>
> >>>      qemu_put_be16s(f, &cols);
> >>>      qemu_put_be16s(f, &rows);
> >>>      qemu_put_be32s(f, &max_nr_ports);
> >>>
> >>>...
> >>>
> >>>      uint16_t cols, rows;
> >>>
> >>>      qemu_get_be16s(f, &cols);
> >>>      qemu_get_be16s(f, &rows);
> >>>      qemu_get_be32s(f, &max_nr_ports);
> >>>
> >>>      /* Convert back to target endianness when storing into the config
> >>>       * space.
> >>>       */
> >>Paolo,
> >>
> >>The patch set to support endian changing targets adds a device_endian
> >>field to the VirtIODevice structure to be used instead of the default
> >>target endianness as it happens with tswap() macros. It also introduces
> >>virtio_tswap() helpers for this purpose, but they can only be used when
> >>the device_endian field has been restored... in a subsection after the
> >>device descriptor... :-\
> >Store it earlier then, using plain put/get.
> >You can still add a section conditionally to cause
> >a cleaner failure in broken cross-version scenarios.
> >
> >>If the scenario is ppc64le-on-ppc64: tswap() macros don't do anything
> >>and we cannot convert back to LE...
> >>
> >>>      s->config.cols = tswap16(cols);
> >>>      s->config.rows = tswap16(rows);
> >>Since cols and rows are not involved in the protocol, we can safely
> >>defer the conversion to post load.
> >>
> >>>      if (max_nr_ports > tswap32(s->config.max_nr_ports) {
> >>>          ...
> >>>      }
> >>>
> >>Since we know that 0 < max_nr_ports < 32,  is it acceptable to guess
> >>the correct endianness with a heuristic ?
> >>
> >>if (max_nr_ports > tswap32(s->config.max_nr_ports)) {
> >>	max_nr_ports = bswap32(max_nr_ports);
> >>}
> >>
> >>if (max_nr_ports > tswap32(s->config.max_nr_ports)) {
> >>	return -EINVAL;
> >>}
> >>
> >>>>In the case the answer for above is "legacy virtio really sucks" then,
> >>>>is it acceptable to not honor bug-compatibility with older versions and
> >>>>fix the code ? :)
> >>>As long as the common cases don't break, yes.  The question is what are
> >>>the common cases.  Here I think the only non-obscure case that could
> >>>break is x86-on-PPC, and it's not that common.
> >>>
> >>>Paolo
> >>>
> >>Thanks.
> >One starts doubting whether all this hackery is worth it.  virtio 1.0
> >should be out real soon now, it makes everything LE so the problem goes
> >away. It's not like PPC LE is so popular that we must support old drivers
> >at all costs.  Won't time be better spent backporting virtio 1.0 drivers?
> 
> There are already released and working Linux distributions (Ubuntu,
> openSUSE, maybe others) that don't have virtio 1.0 drivers. Putting our
> heads into the sand is not an option ;).
> 
> 
> Alex

I don't get it. Does virtio work there at the moment?
Alexander Graf June 12, 2014, 9:08 a.m. UTC | #14
On 12.06.14 11:07, Michael S. Tsirkin wrote:
> On Thu, Jun 12, 2014 at 10:55:40AM +0200, Alexander Graf wrote:
>> On 12.06.14 09:54, Michael S. Tsirkin wrote:
>>> On Thu, Jun 12, 2014 at 09:43:51AM +0200, Greg Kurz wrote:
>>>> On Thu, 29 May 2014 12:16:26 +0200
>>>> Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>>> Il 29/05/2014 11:12, Greg Kurz ha scritto:
>>>>>> int virtio_load(VirtIODevice *vdev, QEMUFile *f)
>>>>>> {
>>>>>> [...]
>>>>>>              nheads = vring_avail_idx(&vdev->vq[i]) - vdev->vq[i].last_avail_idx;
>>>>>>                         ^^^^^^^^^^^
>>>>>>              /* Check it isn't doing very strange things with descriptor numbers. */
>>>>>>              if (nheads > vdev->vq[i].vring.num) {
>>>>>> [...]
>>>>>> }
>>>>>>
>>>>>> and
>>>>>>
>>>>>> static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id)
>>>>>> {
>>>>>> [...]
>>>>>>      /* The config space */
>>>>>>      qemu_get_be16s(f, &s->config.cols);
>>>>>>      qemu_get_be16s(f, &s->config.rows);
>>>>>>
>>>>>>      qemu_get_be32s(f, &max_nr_ports);
>>>>>>      tswap32s(&max_nr_ports);
>>>>>>       ^^^^^^
>>>>>>      if (max_nr_ports > tswap32(s->config.max_nr_ports)) {
>>>>>> [...]
>>>>>> }
>>>>>>
>>>>>> If we stream subsections after the device descriptor as it is done
>>>>>> in VMState, these two will break because the device endian is stale.
>>>>>>
>>>>>> The first one can be easily dealt with: just defer the sanity check
>>>>>> to a post_load function.
>>>>> Good, we're lucky here.
>>>>>
>>>>>> The second is a bit more tricky: the
>>>>>> virtio serial migrates its config space (target endian) and the
>>>>>> active ports bitmap. The load code assumes max_nr_ports from the
>>>>>> config space tells the size of the ports bitmap... that means the
>>>>>> virtio migration protocol is also contaminated by target endianness. :-\
>>>>> Ouch.
>>>>>
>>>>> I guess we could break migration in the case of host endianness !=
>>>>> target endianness, like this:
>>>>>
>>>>>       /* These three used to be fetched in target endianness and then
>>>>>        * stored as big endian.  It ended up as little endian if host and
>>>>>        * target endianness doesn't match.
>>>>>        *
>>>>>        * Starting with qemu 2.1, we always store as big endian.  The
>>>>>        * version wasn't bumped to avoid breaking backwards compatibility.
>>>>>        * We check the validity of max_nr_ports, and the incorrect-
>>>>>        * endianness max_nr_ports will be huge, which will abort migration
>>>>>        * anyway.
>>>>>        */
>>>>>       uint16_t cols = tswap16(s->config.cols);
>>>>>       uint16_t rows = tswap16(s->config.rows);
>>>>>       uint32_t max_nr_ports = tswap32(s->config.max_nr_ports);
>>>>>
>>>>>       qemu_put_be16s(f, &cols);
>>>>>       qemu_put_be16s(f, &rows);
>>>>>       qemu_put_be32s(f, &max_nr_ports);
>>>>>
>>>>> ...
>>>>>
>>>>>       uint16_t cols, rows;
>>>>>
>>>>>       qemu_get_be16s(f, &cols);
>>>>>       qemu_get_be16s(f, &rows);
>>>>>       qemu_get_be32s(f, &max_nr_ports);
>>>>>
>>>>>       /* Convert back to target endianness when storing into the config
>>>>>        * space.
>>>>>        */
>>>> Paolo,
>>>>
>>>> The patch set to support endian changing targets adds a device_endian
>>>> field to the VirtIODevice structure to be used instead of the default
>>>> target endianness as it happens with tswap() macros. It also introduces
>>>> virtio_tswap() helpers for this purpose, but they can only be used when
>>>> the device_endian field has been restored... in a subsection after the
>>>> device descriptor... :-\
>>> Store it earlier then, using plain put/get.
>>> You can still add a section conditionally to cause
>>> a cleaner failure in broken cross-version scenarios.
>>>
>>>> If the scenario is ppc64le-on-ppc64: tswap() macros don't do anything
>>>> and we cannot convert back to LE...
>>>>
>>>>>       s->config.cols = tswap16(cols);
>>>>>       s->config.rows = tswap16(rows);
>>>> Since cols and rows are not involved in the protocol, we can safely
>>>> defer the conversion to post load.
>>>>
>>>>>       if (max_nr_ports > tswap32(s->config.max_nr_ports) {
>>>>>           ...
>>>>>       }
>>>>>
>>>> Since we know that 0 < max_nr_ports < 32,  is it acceptable to guess
>>>> the correct endianness with a heuristic ?
>>>>
>>>> if (max_nr_ports > tswap32(s->config.max_nr_ports)) {
>>>> 	max_nr_ports = bswap32(max_nr_ports);
>>>> }
>>>>
>>>> if (max_nr_ports > tswap32(s->config.max_nr_ports)) {
>>>> 	return -EINVAL;
>>>> }
>>>>
>>>>>> In the case the answer for above is "legacy virtio really sucks" then,
>>>>>> is it acceptable to not honor bug-compatibility with older versions and
>>>>>> fix the code ? :)
>>>>> As long as the common cases don't break, yes.  The question is what are
>>>>> the common cases.  Here I think the only non-obscure case that could
>>>>> break is x86-on-PPC, and it's not that common.
>>>>>
>>>>> Paolo
>>>>>
>>>> Thanks.
>>> One starts doubting whether all this hackery is worth it.  virtio 1.0
>>> should be out real soon now, it makes everything LE so the problem goes
>>> away. It's not like PPC LE is so popular that we must support old drivers
>>> at all costs.  Won't time be better spent backporting virtio 1.0 drivers?
>> There are already released and working Linux distributions (Ubuntu,
>> openSUSE, maybe others) that don't have virtio 1.0 drivers. Putting our
>> heads into the sand is not an option ;).
>>
>>
>> Alex
> I don't get it. Does virtio work there at the moment?

With the LE enable patch, yes, sure. Imagine the same would happen for 
Windows and e1000 - would you still argue the same way?


Alex
Paolo Bonzini June 12, 2014, 9:19 a.m. UTC | #15
Il 12/06/2014 11:06, Greg Kurz ha scritto:
> On Thu, 12 Jun 2014 10:55:42 +0200
> Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>> Il 12/06/2014 09:43, Greg Kurz ha scritto:
>>> Since we know that 0 < max_nr_ports < 32,  is it acceptable to guess
>>> the correct endianness with a heuristic ?
>>>
>>> if (max_nr_ports > tswap32(s->config.max_nr_ports)) {
>>> 	max_nr_ports = bswap32(max_nr_ports);
>>> }
>>>
>>> if (max_nr_ports > tswap32(s->config.max_nr_ports)) {
>>> 	return -EINVAL;
>>> }
>>
>> Yes, I guess it is acceptable.  So first you should fix the code to
>> always serialize fields as big-endian, and then apply this little hack
>> and virtio_tswap on top of the previous change.
>
> BTW, can someone explain why we stream the device config ?

For max_nr_ports it is useless indeed.  Let's keep storing it for 
backwards compatibility, but you can remove its load.

The cols and rows values should be defined by the host and thus could 
even change on migration, there is no need to store them.  As of now, 
however, they are unused in QEMU and should always be zero, because 
VIRTIO_CONSOLE_F_SIZE is not supported.

Paolo
Michael S. Tsirkin June 12, 2014, 9:37 a.m. UTC | #16
On Thu, Jun 12, 2014 at 11:19:47AM +0200, Paolo Bonzini wrote:
> Il 12/06/2014 11:06, Greg Kurz ha scritto:
> >On Thu, 12 Jun 2014 10:55:42 +0200
> >Paolo Bonzini <pbonzini@redhat.com> wrote:
> >
> >>Il 12/06/2014 09:43, Greg Kurz ha scritto:
> >>>Since we know that 0 < max_nr_ports < 32,  is it acceptable to guess
> >>>the correct endianness with a heuristic ?
> >>>
> >>>if (max_nr_ports > tswap32(s->config.max_nr_ports)) {
> >>>	max_nr_ports = bswap32(max_nr_ports);
> >>>}
> >>>
> >>>if (max_nr_ports > tswap32(s->config.max_nr_ports)) {
> >>>	return -EINVAL;
> >>>}
> >>
> >>Yes, I guess it is acceptable.  So first you should fix the code to
> >>always serialize fields as big-endian, and then apply this little hack
> >>and virtio_tswap on top of the previous change.
> >
> >BTW, can someone explain why we stream the device config ?
> 
> For max_nr_ports it is useless indeed.  Let's keep storing it for backwards
> compatibility, but you can remove its load.
> 
> The cols and rows values should be defined by the host and thus could even
> change on migration, there is no need to store them.  As of now, however,
> they are unused in QEMU and should always be zero, because
> VIRTIO_CONSOLE_F_SIZE is not supported.
> 
> Paolo

Maybe just drop unnecessary stuff for new machine types?
Then we won't need hacks to migrate it.
Paolo Bonzini June 12, 2014, 9:39 a.m. UTC | #17
Il 12/06/2014 11:37, Michael S. Tsirkin ha scritto:
> Maybe just drop unnecessary stuff for new machine types?
> Then we won't need hacks to migrate it.

For any machine type it's trivial to migrate it.  All machine types 
including old ones can disregard the migrated values.

Paolo
Michael S. Tsirkin June 12, 2014, 10:55 a.m. UTC | #18
On Thu, Jun 12, 2014 at 11:39:42AM +0200, Paolo Bonzini wrote:
> Il 12/06/2014 11:37, Michael S. Tsirkin ha scritto:
> >Maybe just drop unnecessary stuff for new machine types?
> >Then we won't need hacks to migrate it.
> 
> For any machine type it's trivial to migrate it.  All machine types
> including old ones can disregard the migrated values.
> 
> Paolo

Ah, good idea, so simply disregard the values in load.
diff mbox

Patch

diff --git a/exec.c b/exec.c
index 91513c6..4e83588 100644
--- a/exec.c
+++ b/exec.c
@@ -2740,13 +2740,7 @@  int cpu_memory_rw_debug(CPUState *cpu, target_ulong addr,
 #endif
 
 #if !defined(CONFIG_USER_ONLY)
-
-/*
- * A helper function for the _utterly broken_ virtio device model to find out if
- * it's running on a big endian machine. Don't do this at home kids!
- */
-bool virtio_is_big_endian(void);
-bool virtio_is_big_endian(void)
+bool target_words_bigendian(void)
 {
 #if defined(TARGET_WORDS_BIGENDIAN)
     return true;
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index ce97514..2ffceb8 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -89,9 +89,6 @@ 
 /* Flags track per-device state like workarounds for quirks in older guests. */
 #define VIRTIO_PCI_FLAG_BUS_MASTER_BUG  (1 << 0)
 
-/* HACK for virtio to determine if it's running a big endian guest */
-bool virtio_is_big_endian(void);
-
 static void virtio_pci_bus_new(VirtioBusState *bus, size_t bus_size,
                                VirtIOPCIProxy *dev);
 
@@ -409,13 +406,13 @@  static uint64_t virtio_pci_config_read(void *opaque, hwaddr addr,
         break;
     case 2:
         val = virtio_config_readw(vdev, addr);
-        if (virtio_is_big_endian()) {
+        if (virtio_is_big_endian(vdev)) {
             val = bswap16(val);
         }
         break;
     case 4:
         val = virtio_config_readl(vdev, addr);
-        if (virtio_is_big_endian()) {
+        if (virtio_is_big_endian(vdev)) {
             val = bswap32(val);
         }
         break;
@@ -443,13 +440,13 @@  static void virtio_pci_config_write(void *opaque, hwaddr addr,
         virtio_config_writeb(vdev, addr, val);
         break;
     case 2:
-        if (virtio_is_big_endian()) {
+        if (virtio_is_big_endian(vdev)) {
             val = bswap16(val);
         }
         virtio_config_writew(vdev, addr, val);
         break;
     case 4:
-        if (virtio_is_big_endian()) {
+        if (virtio_is_big_endian(vdev)) {
             val = bswap32(val);
         }
         virtio_config_writel(vdev, addr, val);
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index f4eaa3f..9018b6c 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -547,6 +547,12 @@  void virtio_reset(void *opaque)
 
     virtio_set_status(vdev, 0);
 
+    if (target_words_bigendian()) {
+        vdev->device_endian = VIRTIO_DEVICE_ENDIAN_BIG;
+    } else {
+        vdev->device_endian = VIRTIO_DEVICE_ENDIAN_LITTLE;
+    }
+
     if (k->reset) {
         k->reset(vdev);
     }
@@ -834,12 +840,28 @@  void virtio_notify_config(VirtIODevice *vdev)
     virtio_notify_vector(vdev, vdev->config_vector);
 }
 
+static void virtio_save_device_endian(VirtIODevice *vdev, QEMUFile *f)
+{
+    qemu_put_byte(f, vdev->device_endian);
+}
+
+static int virtio_load_device_endian(VirtIODevice *vdev, QEMUFile *f)
+{
+    vdev->device_endian = qemu_get_byte(f);
+    return 0;
+}
+
 static const struct VirtIOSubsectionDescStruct {
     const char *name;
     int version_id;
     void (*save)(VirtIODevice *vdev, QEMUFile *f);
     int (*load)(VirtIODevice *vdev, QEMUFile *f);
 } virtio_subsection[] = {
+    { .name = "virtio/is_big_endian",
+      .version_id = 1,
+      .save = virtio_save_device_endian,
+      .load = virtio_load_device_endian,
+    },
     { .name = NULL }
 };
 
@@ -861,6 +883,8 @@  void virtio_save_subsections(VirtIODevice *vdev, QEMUFile *f)
 
 int virtio_load_subsections(VirtIODevice *vdev, QEMUFile *f)
 {
+    int i;
+
     while (qemu_peek_byte(f, 0) == QEMU_VM_SUBSECTION) {
         char idstr[256];
         uint8_t len, size;
@@ -895,6 +919,26 @@  int virtio_load_subsections(VirtIODevice *vdev, QEMUFile *f)
         }
     }
 
+    for (i = 0; i < VIRTIO_PCI_QUEUE_MAX; i++) {
+        if (vdev->vq[i].vring.num == 0) {
+            break;
+        }
+
+        if (vdev->vq[i].pa) {
+            uint16_t nheads;
+            nheads = vring_avail_idx(&vdev->vq[i]) - vdev->vq[i].last_avail_idx;
+            /* Check it isn't doing strange things with descriptor numbers. */
+            if (nheads > vdev->vq[i].vring.num) {
+                error_report("VQ %d size 0x%x Guest index 0x%x "
+                             "inconsistent with Host index 0x%x: delta 0x%x",
+                             i, vdev->vq[i].vring.num,
+                             vring_avail_idx(&vdev->vq[i]),
+                             vdev->vq[i].last_avail_idx, nheads);
+                return -1;
+            }
+        }
+    }
+
     return 0;
 }
 
@@ -962,6 +1006,11 @@  int virtio_load(VirtIODevice *vdev, QEMUFile *f)
     BusState *qbus = qdev_get_parent_bus(DEVICE(vdev));
     VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
 
+    /* We poison the endianness to ensure it does not get used before
+     * subsections have been loaded.
+     */
+    vdev->device_endian = VIRTIO_DEVICE_ENDIAN_UNKNOWN;
+
     if (k->load_config) {
         ret = k->load_config(qbus->parent, f);
         if (ret)
@@ -995,18 +1044,7 @@  int virtio_load(VirtIODevice *vdev, QEMUFile *f)
         vdev->vq[i].notification = true;
 
         if (vdev->vq[i].pa) {
-            uint16_t nheads;
             virtqueue_init(&vdev->vq[i]);
-            nheads = vring_avail_idx(&vdev->vq[i]) - vdev->vq[i].last_avail_idx;
-            /* Check it isn't doing very strange things with descriptor numbers. */
-            if (nheads > vdev->vq[i].vring.num) {
-                error_report("VQ %d size 0x%x Guest index 0x%x "
-                             "inconsistent with Host index 0x%x: delta 0x%x",
-                             i, vdev->vq[i].vring.num,
-                             vring_avail_idx(&vdev->vq[i]),
-                             vdev->vq[i].last_avail_idx, nheads);
-                return -1;
-            }
         } else if (vdev->vq[i].last_avail_idx) {
             error_report("VQ %d address 0x0 "
                          "inconsistent with Host index 0x%x",
@@ -1078,6 +1116,10 @@  void virtio_init(VirtIODevice *vdev, const char *name,
     }
     vdev->vmstate = qemu_add_vm_change_state_handler(virtio_vmstate_change,
                                                      vdev);
+    /* We poison the endianness to ensure it does not get used before
+     * the device is reset.
+     */
+    vdev->device_endian = VIRTIO_DEVICE_ENDIAN_UNKNOWN;
 }
 
 hwaddr virtio_queue_get_desc_addr(VirtIODevice *vdev, int n)
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index a21b65a..eb798c1 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -122,4 +122,5 @@  void qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque);
 
 #endif
 
+bool target_words_bigendian(void);
 #endif /* !CPU_COMMON_H */
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 82f852f..0012470 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -104,6 +104,12 @@  typedef struct VirtQueueElement
 #define VIRTIO_DEVICE(obj) \
         OBJECT_CHECK(VirtIODevice, (obj), TYPE_VIRTIO_DEVICE)
 
+enum virtio_device_endian {
+    VIRTIO_DEVICE_ENDIAN_UNKNOWN,
+    VIRTIO_DEVICE_ENDIAN_LITTLE,
+    VIRTIO_DEVICE_ENDIAN_BIG,
+};
+
 struct VirtIODevice
 {
     DeviceState parent_obj;
@@ -121,6 +127,7 @@  struct VirtIODevice
     bool vm_running;
     VMChangeStateEntry *vmstate;
     char *bus_name;
+    enum virtio_device_endian device_endian;
 };
 
 typedef struct VirtioDeviceClass {
@@ -257,4 +264,10 @@  void virtio_queue_set_host_notifier_fd_handler(VirtQueue *vq, bool assign,
                                                bool set_handler);
 void virtio_queue_notify_vq(VirtQueue *vq);
 void virtio_irq(VirtQueue *vq);
+
+static inline bool virtio_is_big_endian(VirtIODevice *vdev)
+{
+    assert(vdev->device_endian != VIRTIO_DEVICE_ENDIAN_UNKNOWN);
+    return vdev->device_endian == VIRTIO_DEVICE_ENDIAN_BIG;
+}
 #endif