diff mbox

intel_iommu: add "eim" property

Message ID 1470922183-22873-1-git-send-email-peterx@redhat.com
State New
Headers show

Commit Message

Peter Xu Aug. 11, 2016, 1:29 p.m. UTC
Adding one extra property for intel-iommu device to decide whether we
should support EIM bit for IR.

Now we are throwing high 24 bits of dest_id away directly. This will
cause interrupt issues with guests that:

- enabled x2apic with cluster mode
- have more than 16 vcpus (so will have more than 1 cluster)

Let's make xapic the default one, and for the brave people who would
like to try EIM and know the side effects, we can do it by explicitly
enabling EIM using:

  -device intel-iommu,intremap=on,eim=on

Even after we have x2apic support, it'll still be good if we can provide
a way to switch xapic/x2apic from QEMU side for e.g. debugging purpose,
which is an alternative for tuning guest kernel boot parameters.

We can switch the default to "on" after x2apic fully supported.

Signed-off-by: Peter Xu <peterx@redhat.com>
---

This will have small conflict with Radim's patches to selectively
enable EIM. I can do a rebase when needed.

 hw/i386/intel_iommu.c         | 37 ++++++++++++++++++++++++++++++++++++-
 include/hw/i386/intel_iommu.h |  1 +
 2 files changed, 37 insertions(+), 1 deletion(-)

Comments

Radim Krčmář Aug. 11, 2016, 3:44 p.m. UTC | #1
2016-08-11 21:29+0800, Peter Xu:
> Adding one extra property for intel-iommu device to decide whether we
> should support EIM bit for IR.
> 
> Now we are throwing high 24 bits of dest_id away directly. This will
> cause interrupt issues with guests that:
> 
> - enabled x2apic with cluster mode
> - have more than 16 vcpus (so will have more than 1 cluster)

Good catch, it is a problem starting at 9 VCPUs, because that one is
already "0x100" and we truncate to 8 bits ...

> Let's make xapic the default one, and for the brave people who would
> like to try EIM and know the side effects, we can do it by explicitly
> enabling EIM using:

This might bite us: users cannot easily tell when EIM is sane, so eim=on
is going to be a hazard even after some KVM/QEMU has it fixed.
I'd make make eim=on fail when logical x2APIC is broken.

Right now, we can allow eim=on if maximal APIC ID < 8, because physical
0xffffffff gets truncated to 0xff and is still interpreted as a
broadcast thanks to the quirk.
Logical 0xff gets misinterpreted as a broadcast, but it would address
all APICs anyway and KVM's lowest-priority works for broadcast too, so
it is fine to do so.  Not very wise, though. :)

>   -device intel-iommu,intremap=on,eim=on
> 
> Even after we have x2apic support, it'll still be good if we can provide
> a way to switch xapic/x2apic from QEMU side for e.g. debugging purpose,
> which is an alternative for tuning guest kernel boot parameters.
> 
> We can switch the default to "on" after x2apic fully supported.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> This will have small conflict with Radim's patches to selectively
> enable EIM. I can do a rebase when needed.

This patch made me realize that broadcast quirk is not enough, so I'll
do v2 on top of this.
Paolo Bonzini Aug. 12, 2016, 8:34 a.m. UTC | #2
On 11/08/2016 15:29, Peter Xu wrote:
> +
> +static void vtd_eim_prop_set(Object *o, bool value, Error **errp)
> +{
> +    IntelIOMMUState *s = INTEL_IOMMU_DEVICE(o);
> +    s->eim_supported = value;
> +}
> +
> +static void vtd_instance_init(Object *o)
> +{
> +    IntelIOMMUState *s = INTEL_IOMMU_DEVICE(o);
> +
> +    /*
> +     * TODO: we can enable this by default after we have full x2apic
> +     * support.
> +     */
> +    s->eim_supported = false;
> +    object_property_add_bool(o, "eim", vtd_eim_prop_get,
> +                             vtd_eim_prop_set, NULL);
> +}
> +
>  static const TypeInfo vtd_info = {
>      .name          = TYPE_INTEL_IOMMU_DEVICE,
>      .parent        = TYPE_X86_IOMMU_DEVICE,
> +    .instance_init = vtd_instance_init,

This can use DEFINE_PROP_BOOL.

Paolo
Peter Xu Aug. 12, 2016, 9:22 a.m. UTC | #3
On Fri, Aug 12, 2016 at 10:34:44AM +0200, Paolo Bonzini wrote:
> 
> 
> On 11/08/2016 15:29, Peter Xu wrote:
> > +
> > +static void vtd_eim_prop_set(Object *o, bool value, Error **errp)
> > +{
> > +    IntelIOMMUState *s = INTEL_IOMMU_DEVICE(o);
> > +    s->eim_supported = value;
> > +}
> > +
> > +static void vtd_instance_init(Object *o)
> > +{
> > +    IntelIOMMUState *s = INTEL_IOMMU_DEVICE(o);
> > +
> > +    /*
> > +     * TODO: we can enable this by default after we have full x2apic
> > +     * support.
> > +     */
> > +    s->eim_supported = false;
> > +    object_property_add_bool(o, "eim", vtd_eim_prop_get,
> > +                             vtd_eim_prop_set, NULL);
> > +}
> > +
> >  static const TypeInfo vtd_info = {
> >      .name          = TYPE_INTEL_IOMMU_DEVICE,
> >      .parent        = TYPE_X86_IOMMU_DEVICE,
> > +    .instance_init = vtd_instance_init,
> 
> This can use DEFINE_PROP_BOOL.

Right. Will post v2. Thanks!

-- peterx
Peter Xu Aug. 12, 2016, 9:29 a.m. UTC | #4
On Thu, Aug 11, 2016 at 05:44:16PM +0200, Radim Krčmář wrote:
> 2016-08-11 21:29+0800, Peter Xu:
> > Adding one extra property for intel-iommu device to decide whether we
> > should support EIM bit for IR.
> > 
> > Now we are throwing high 24 bits of dest_id away directly. This will
> > cause interrupt issues with guests that:
> > 
> > - enabled x2apic with cluster mode
> > - have more than 16 vcpus (so will have more than 1 cluster)
> 
> Good catch, it is a problem starting at 9 VCPUs, because that one is
> already "0x100" and we truncate to 8 bits ...

Ah yes. I muddled on the bits. :)

> 
> > Let's make xapic the default one, and for the brave people who would
> > like to try EIM and know the side effects, we can do it by explicitly
> > enabling EIM using:
> 
> This might bite us: users cannot easily tell when EIM is sane, so eim=on
> is going to be a hazard even after some KVM/QEMU has it fixed.
> I'd make make eim=on fail when logical x2APIC is broken.
> 
> Right now, we can allow eim=on if maximal APIC ID < 8, because physical
> 0xffffffff gets truncated to 0xff and is still interpreted as a
> broadcast thanks to the quirk.
> Logical 0xff gets misinterpreted as a broadcast, but it would address
> all APICs anyway and KVM's lowest-priority works for broadcast too, so
> it is fine to do so.  Not very wise, though. :)
> 
> >   -device intel-iommu,intremap=on,eim=on
> > 
> > Even after we have x2apic support, it'll still be good if we can provide
> > a way to switch xapic/x2apic from QEMU side for e.g. debugging purpose,
> > which is an alternative for tuning guest kernel boot parameters.
> > 
> > We can switch the default to "on" after x2apic fully supported.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> > This will have small conflict with Radim's patches to selectively
> > enable EIM. I can do a rebase when needed.
> 
> This patch made me realize that broadcast quirk is not enough, so I'll
> do v2 on top of this.

I'll post v2 soon with DEFINE_PROP_BOOL. Thanks,

-- peterx
diff mbox

Patch

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 28c31a2..a2f5051 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2364,7 +2364,10 @@  static void vtd_init(IntelIOMMUState *s)
     s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO;
 
     if (x86_iommu->intr_supported) {
-        s->ecap |= VTD_ECAP_IR | VTD_ECAP_EIM | VTD_ECAP_MHMV;
+        s->ecap |= VTD_ECAP_IR | VTD_ECAP_MHMV;
+        if (s->eim_supported) {
+            s->ecap |= VTD_ECAP_EIM;
+        }
     }
 
     vtd_reset_context_cache(s);
@@ -2468,6 +2471,12 @@  static void vtd_realize(DeviceState *dev, Error **errp)
     /* Pseudo address space under root PCI bus. */
     pcms->ioapic_as = vtd_host_dma_iommu(bus, s, Q35_PSEUDO_DEVFN_IOAPIC);
 
+    /* EIM bit requires IR */
+    if (s->eim_supported && !x86_iommu->intr_supported) {
+        error_report("EIM (Extended Interrupt Mode) bit requires intremap=on");
+        exit(1);
+    }
+
     /* Currently Intel IOMMU IR only support "kernel-irqchip={off|split}" */
     if (x86_iommu->intr_supported && kvm_irqchip_in_kernel() &&
         !kvm_irqchip_is_split()) {
@@ -2490,9 +2499,35 @@  static void vtd_class_init(ObjectClass *klass, void *data)
     x86_class->int_remap = vtd_int_remap;
 }
 
+static bool vtd_eim_prop_get(Object *o, Error **errp)
+{
+    IntelIOMMUState *s = INTEL_IOMMU_DEVICE(o);
+    return s->eim_supported;
+}
+
+static void vtd_eim_prop_set(Object *o, bool value, Error **errp)
+{
+    IntelIOMMUState *s = INTEL_IOMMU_DEVICE(o);
+    s->eim_supported = value;
+}
+
+static void vtd_instance_init(Object *o)
+{
+    IntelIOMMUState *s = INTEL_IOMMU_DEVICE(o);
+
+    /*
+     * TODO: we can enable this by default after we have full x2apic
+     * support.
+     */
+    s->eim_supported = false;
+    object_property_add_bool(o, "eim", vtd_eim_prop_get,
+                             vtd_eim_prop_set, NULL);
+}
+
 static const TypeInfo vtd_info = {
     .name          = TYPE_INTEL_IOMMU_DEVICE,
     .parent        = TYPE_X86_IOMMU_DEVICE,
+    .instance_init = vtd_instance_init,
     .instance_size = sizeof(IntelIOMMUState),
     .class_init    = vtd_class_init,
 };
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index a42dbd7..b1bc768 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -289,6 +289,7 @@  struct IntelIOMMUState {
     dma_addr_t intr_root;           /* Interrupt remapping table pointer */
     uint32_t intr_size;             /* Number of IR table entries */
     bool intr_eime;                 /* Extended interrupt mode enabled */
+    bool eim_supported;             /* Whether to allow EIM bit */
 };
 
 /* Find the VTD Address space associated with the given bus pointer,