diff mbox

[PULL,21/29] xen/pt: Sync up the dev.config and data values.

Message ID 1441905361-31967-21-git-send-email-stefano.stabellini@eu.citrix.com
State New
Headers show

Commit Message

Stefano Stabellini Sept. 10, 2015, 5:15 p.m. UTC
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

For a passthrough device we maintain a state of emulated
registers value contained within d->config. We also consult
the host registers (and apply ro and write masks) whenever
the guest access the registers. This is done in xen_pt_pci_write_config
and xen_pt_pci_read_config.

Also in this picture we call pci_default_write_config which
updates the d->config and if the d->config[PCI_COMMAND] register
has PCI_COMMAND_MEMORY (or PCI_COMMAND_IO) acts on those changes.

On startup the d->config[PCI_COMMAND] are the host values, not
what the guest initial values should be, which is exactly what
we do _not_ want to do for 64-bit BARs when the guest just wants
to read the size of the BAR. Huh you say?

To get the size of 64-bit memory space BARs,  the guest has
to calculate ((BAR[x] & 0xFFFFFFF0) + ((BAR[x+1] & 0xFFFFFFFF) << 32))
which means it has to do two writes of ~0 to BARx and BARx+1.

prior to this patch and with XSA120-addendum patch (Linux kernel)
the PCI_COMMAND register is copied from the host it can have
PCI_COMMAND_MEMORY bit set which means that QEMU will try to
update the hypervisor's P2M with BARx+1 value to ~0 (0xffffffff)
(to sync the guest state to host) instead of just having
xen_pt_pci_write_config and xen_pt_bar_reg_write apply the
proper masks and return the size to the guest.

To thwart this, this patch syncs up the host values with the
guest values taking into account the emu_mask (bit set means
we emulate, PCI_COMMAND_MEMORY and PCI_COMMAND_IO are set).
That is we copy the host values - masking out any bits which
we will emulate. Then merge it with the initial emulation register
values. Lastly this value is then copied both in
dev.config _and_ XenPTReg->data field.

There is also reg->size accounting taken into consideration
that ends up being used in patch.
 xen/pt: Check if reg->init function sets the 'data' past the reg->size

This fixes errors such as these:

(XEN) memory_map:add: dom2 gfn=fffe0 mfn=fbce0 nr=20
(DEBUG) 189 pci dev 04:0 BAR16 wrote ~0.
(DEBUG) 200 pci dev 04:0 BAR16 read 0x0fffe0004.
(XEN) memory_map:remove: dom2 gfn=fffe0 mfn=fbce0 nr=20
(DEBUG) 204 pci dev 04:0 BAR16 wrote 0x0fffe0004.
(DEBUG) 217 pci dev 04:0 BAR16 read upper 0x000000000.
(XEN) memory_map:add: dom2 gfn=ffffffff00000 mfn=fbce0 nr=20
(XEN) p2m.c:883:d0v0 p2m_set_entry failed! mfn=ffffffffffffffff rc:-22
(XEN) memory_map:fail: dom2 gfn=ffffffff00000 mfn=fbce0 nr=20 ret:-22
(XEN) memory_map:remove: dom2 gfn=ffffffff00000 mfn=fbce0 nr=20
(XEN) p2m.c:920:d0v0 gfn_to_mfn failed! gfn=ffffffff00000 type:4
(XEN) p2m.c:920:d0v0 gfn_to_mfn failed! gfn=ffffffff00001 type:4
..
(XEN) memory_map: error -22 removing dom2 access to [fbce0,fbcff]
(DEBUG) 222 pci dev 04:0 BAR16 read upper 0x0ffffffff.
(XEN) memory_map:remove: dom2 gfn=ffffffff00000 mfn=fbce0 nr=20
(XEN) memory_map: error -22 removing dom2 access to [fbce0,fbcff]

[The DEBUG is to illustate what the hvmloader was doing]

Also we swap from xen_host_pci_long to using xen_host_pci_get_[byte,word,long].

Otherwise we get:

xen_pt_config_reg_init: Offset 0x0004 mismatch! Emulated=0x0000, host=0x2300017, syncing to 0x2300014.
xen_pt_config_reg_init: Error: Offset 0x0004:0x2300014 expands past register size(2)!

which is not surprising. We read the value as an 32-bit (from host),
then operate it as a 16-bit - and the remainder is left unchanged.

We end up writing the value as 16-bit (so 0014) to dev.config
(as we use proper xen_set_host_[byte,word,long] so we don't spill
to other registers) but in XenPTReg->data it is as 32-bit (0x2300014)!

It is harmless as the read/write functions end up using an size mask
and never modify the bits past 16-bit (reg->size is 2).

This patch fixes the warnings by reading the value using the
proper size.

Note that the check for size is still left in-case the developer
sets bits past the reg->size in the ->init routines. The author
tried to fiddle with QEMU_BUILD_BUG to make this work but failed.

Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
---
 hw/xen/xen_pt_config_init.c |   59 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 58 insertions(+), 1 deletion(-)

Comments

Paolo Bonzini Sept. 14, 2015, 10:01 a.m. UTC | #1
On 10/09/2015 19:15, Stefano Stabellini wrote:
> +
> +        switch (reg->size) {
> +        case 1: rc = xen_host_pci_get_byte(&s->real_device, offset, (uint8_t *)&val);

A bit ugly, and it relies on the host being little endian.

> +                break;
> +        case 2: rc = xen_host_pci_get_word(&s->real_device, offset, (uint16_t *)&val);

Same here.

> +                break;
> +        case 4: rc = xen_host_pci_get_long(&s->real_device, offset, &val);
> +                break;
> +        default: assert(1);

This should be assert(0) or, better, abort().

Paolo
Stefano Stabellini Sept. 15, 2015, 10:07 a.m. UTC | #2
CC Konrad

On Mon, 14 Sep 2015, Paolo Bonzini wrote:
> On 10/09/2015 19:15, Stefano Stabellini wrote:
> > +
> > +        switch (reg->size) {
> > +        case 1: rc = xen_host_pci_get_byte(&s->real_device, offset, (uint8_t *)&val);
> 
> A bit ugly, and it relies on the host being little endian.
> 
> > +                break;
> > +        case 2: rc = xen_host_pci_get_word(&s->real_device, offset, (uint16_t *)&val);
> 
> Same here.

cpu_to_le32?

But in practice, Xen being little endian only, I doubt that xen_pt_config_init.c
would actually work on be.


> > +                break;
> > +        case 4: rc = xen_host_pci_get_long(&s->real_device, offset, &val);
> > +                break;
> > +        default: assert(1);
> 
> This should be assert(0) or, better, abort().
Konrad Rzeszutek Wilk Sept. 15, 2015, 1:25 p.m. UTC | #3
On Tue, Sep 15, 2015 at 11:07:02AM +0100, Stefano Stabellini wrote:
> CC Konrad
> 
> On Mon, 14 Sep 2015, Paolo Bonzini wrote:
> > On 10/09/2015 19:15, Stefano Stabellini wrote:
> > > +
> > > +        switch (reg->size) {
> > > +        case 1: rc = xen_host_pci_get_byte(&s->real_device, offset, (uint8_t *)&val);
> > 
> > A bit ugly, and it relies on the host being little endian.
> > 
> > > +                break;
> > > +        case 2: rc = xen_host_pci_get_word(&s->real_device, offset, (uint16_t *)&val);
> > 
> > Same here.
> 
> cpu_to_le32?
> 
> But in practice, Xen being little endian only, I doubt that xen_pt_config_init.c
> would actually work on be.
> 
> 
> > > +                break;
> > > +        case 4: rc = xen_host_pci_get_long(&s->real_device, offset, &val);
> > > +                break;
> > > +        default: assert(1);
> > 
> > This should be assert(0) or, better, abort().

OK. Stefano, do you want me to:

 1). Rebase the patches on top of your tag?
 2). Send an follow up patch to change this to abort()? (and wherever else I used
     assert(..)?
 3). Wait till Paolo is done going through the patchset and then revisit 1) or 2)?
Stefano Stabellini Sept. 15, 2015, 1:28 p.m. UTC | #4
On Tue, 15 Sep 2015, Konrad Rzeszutek Wilk wrote:
> On Tue, Sep 15, 2015 at 11:07:02AM +0100, Stefano Stabellini wrote:
> > CC Konrad
> > 
> > On Mon, 14 Sep 2015, Paolo Bonzini wrote:
> > > On 10/09/2015 19:15, Stefano Stabellini wrote:
> > > > +
> > > > +        switch (reg->size) {
> > > > +        case 1: rc = xen_host_pci_get_byte(&s->real_device, offset, (uint8_t *)&val);
> > > 
> > > A bit ugly, and it relies on the host being little endian.
> > > 
> > > > +                break;
> > > > +        case 2: rc = xen_host_pci_get_word(&s->real_device, offset, (uint16_t *)&val);
> > > 
> > > Same here.
> > 
> > cpu_to_le32?
> > 
> > But in practice, Xen being little endian only, I doubt that xen_pt_config_init.c
> > would actually work on be.
> > 
> > 
> > > > +                break;
> > > > +        case 4: rc = xen_host_pci_get_long(&s->real_device, offset, &val);
> > > > +                break;
> > > > +        default: assert(1);
> > > 
> > > This should be assert(0) or, better, abort().
> 
> OK. Stefano, do you want me to:
> 
>  1). Rebase the patches on top of your tag?

Patches are already upstream, so no worries about rebasing.


>  2). Send an follow up patch to change this to abort()? (and wherever else I used
>      assert(..)?

Yes, that would be good.


>  3). Wait till Paolo is done going through the patchset and then revisit 1) or 2)?

I don't know if Paolo has any more comments.

Regarding his previous comment on little-endian, as I wrote I am not
sure if sprinkling around some cpu_to_le32 would actually make
hw/xen/xen_pt_config_init.c much better. I'll leave that to you.
Paolo Bonzini Sept. 15, 2015, 1:32 p.m. UTC | #5
On 15/09/2015 15:28, Stefano Stabellini wrote:
> 
>> >  2). Send an follow up patch to change this to abort()? (and wherever else I used
>> >      assert(..)?
> Yes, that would be good.
> 
> 
>> >  3). Wait till Paolo is done going through the patchset and then revisit 1) or 2)?
> I don't know if Paolo has any more comments.

No, the few comments I had were only prompted by Coverity.

Paolo
Konrad Rzeszutek Wilk Sept. 15, 2015, 1:55 p.m. UTC | #6
On Tue, Sep 15, 2015 at 02:28:51PM +0100, Stefano Stabellini wrote:
> On Tue, 15 Sep 2015, Konrad Rzeszutek Wilk wrote:
> > On Tue, Sep 15, 2015 at 11:07:02AM +0100, Stefano Stabellini wrote:
> > > CC Konrad
> > > 
> > > On Mon, 14 Sep 2015, Paolo Bonzini wrote:
> > > > On 10/09/2015 19:15, Stefano Stabellini wrote:
> > > > > +
> > > > > +        switch (reg->size) {
> > > > > +        case 1: rc = xen_host_pci_get_byte(&s->real_device, offset, (uint8_t *)&val);
> > > > 
> > > > A bit ugly, and it relies on the host being little endian.
> > > > 
> > > > > +                break;
> > > > > +        case 2: rc = xen_host_pci_get_word(&s->real_device, offset, (uint16_t *)&val);
> > > > 
> > > > Same here.
> > > 
> > > cpu_to_le32?
> > > 
> > > But in practice, Xen being little endian only, I doubt that xen_pt_config_init.c
> > > would actually work on be.
> > > 
> > > 
> > > > > +                break;
> > > > > +        case 4: rc = xen_host_pci_get_long(&s->real_device, offset, &val);
> > > > > +                break;
> > > > > +        default: assert(1);
> > > > 
> > > > This should be assert(0) or, better, abort().
> > 
> > OK. Stefano, do you want me to:
> > 
> >  1). Rebase the patches on top of your tag?
> 
> Patches are already upstream, so no worries about rebasing.
> 
> 
> >  2). Send an follow up patch to change this to abort()? (and wherever else I used
> >      assert(..)?
> 
> Yes, that would be good.
> 
> 
> >  3). Wait till Paolo is done going through the patchset and then revisit 1) or 2)?
> 
> I don't know if Paolo has any more comments.
> 
> Regarding his previous comment on little-endian, as I wrote I am not
> sure if sprinkling around some cpu_to_le32 would actually make
> hw/xen/xen_pt_config_init.c much better. I'll leave that to you.

I got some other outstanding things I need to get done so will be
as most lazy as possible in regards to this and just send you a patch :-)
diff mbox

Patch

diff --git a/hw/xen/xen_pt_config_init.c b/hw/xen/xen_pt_config_init.c
index a75baea..b1ad743 100644
--- a/hw/xen/xen_pt_config_init.c
+++ b/hw/xen/xen_pt_config_init.c
@@ -1893,6 +1893,10 @@  static int xen_pt_config_reg_init(XenPCIPassthroughState *s,
     reg_entry->reg = reg;
 
     if (reg->init) {
+        uint32_t host_mask, size_mask;
+        unsigned int offset;
+        uint32_t val;
+
         /* initialize emulate register */
         rc = reg->init(s, reg_entry->reg,
                        reg_grp->base_offset + reg->offset, &data);
@@ -1905,8 +1909,61 @@  static int xen_pt_config_reg_init(XenPCIPassthroughState *s,
             g_free(reg_entry);
             return 0;
         }
+        /* Sync up the data to dev.config */
+        offset = reg_grp->base_offset + reg->offset;
+        size_mask = 0xFFFFFFFF >> ((4 - reg->size) << 3);
+
+        switch (reg->size) {
+        case 1: rc = xen_host_pci_get_byte(&s->real_device, offset, (uint8_t *)&val);
+                break;
+        case 2: rc = xen_host_pci_get_word(&s->real_device, offset, (uint16_t *)&val);
+                break;
+        case 4: rc = xen_host_pci_get_long(&s->real_device, offset, &val);
+                break;
+        default: assert(1);
+        }
+        if (rc) {
+            /* Serious issues when we cannot read the host values! */
+            g_free(reg_entry);
+            return rc;
+        }
+        /* Set bits in emu_mask are the ones we emulate. The dev.config shall
+         * contain the emulated view of the guest - therefore we flip the mask
+         * to mask out the host values (which dev.config initially has) . */
+        host_mask = size_mask & ~reg->emu_mask;
+
+        if ((data & host_mask) != (val & host_mask)) {
+            uint32_t new_val;
+
+            /* Mask out host (including past size). */
+            new_val = val & host_mask;
+            /* Merge emulated ones (excluding the non-emulated ones). */
+            new_val |= data & host_mask;
+            /* Leave intact host and emulated values past the size - even though
+             * we do not care as we write per reg->size granularity, but for the
+             * logging below lets have the proper value. */
+            new_val |= ((val | data)) & ~size_mask;
+            XEN_PT_LOG(&s->dev,"Offset 0x%04x mismatch! Emulated=0x%04x, host=0x%04x, syncing to 0x%04x.\n",
+                       offset, data, val, new_val);
+            val = new_val;
+        } else
+            val = data;
+
+        /* This could be just pci_set_long as we don't modify the bits
+         * past reg->size, but in case this routine is run in parallel
+         * we do not want to over-write other registers. */
+        switch (reg->size) {
+        case 1: pci_set_byte(s->dev.config + offset, (uint8_t)val);
+                break;
+        case 2: pci_set_word(s->dev.config + offset, (uint16_t)val);
+                break;
+        case 4: pci_set_long(s->dev.config + offset, val);
+                break;
+        default: assert(1);
+        }
         /* set register value */
-        reg_entry->data = data;
+        reg_entry->data = val;
+
     }
     /* list add register entry */
     QLIST_INSERT_HEAD(&reg_grp->reg_tbl_list, reg_entry, entries);