
[v3,12/13] q35: ioapic: add support for split irqchip and irqfd

Message ID 1460691099-3024-13-git-send-email-peterx@redhat.com
State New

Commit Message

Peter Xu April 15, 2016, 3:31 a.m. UTC
This patch allows Intel IR to work with split irqchip. Two more fields
are added to IOAPICCommonState to support the translation process (for
future AMD IR support, we will need to provide another AMD-specific
callback for int_remap()). In split irqchip mode, the IOAPIC is emulated
in user space and only updates the kernel irq routes when an entry
changes. When IR is enabled, we directly update the kernel with the
translated messages, so the kernel routes act as a cache of the
remapping entries.
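As a rough illustration of the translation input, the sketch below decodes a 64-bit IOAPIC redirection-table entry into the MSI address/data pair that is then handed to `int_remap()`, which is the job `ioapic_entry_parse()` does in the patch. The field positions follow the IOAPIC redirection-entry and x86 MSI compatibility layouts; `MsiMsg`, the macros and `rte_to_msi()` are hypothetical stand-ins, not QEMU's definitions, and the pin-polarity bit is left out for brevity.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for QEMU's MSIMessage. */
typedef struct {
    uint64_t address;
    uint32_t data;
} MsiMsg;

/* Redirection-table entry fields (IOAPIC layout). */
#define RTE_VECTOR_MASK      0xffULL
#define RTE_DELIV_MODE_SHIFT 8      /* bits 8-10 */
#define RTE_DELIV_MODE_MASK  0x7ULL
#define RTE_DEST_MODE_SHIFT  11     /* 0 = physical, 1 = logical */
#define RTE_TRIG_MODE_SHIFT  15     /* 0 = edge, 1 = level */
#define RTE_DEST_SHIFT       56     /* bits 56-63 */

/* x86 MSI compatibility-format fields. */
#define MSI_ADDR_BASE             0xfee00000ULL
#define MSI_ADDR_DEST_MODE_SHIFT  2
#define MSI_ADDR_DEST_SHIFT       12
#define MSI_DATA_DELIV_MODE_SHIFT 8
#define MSI_DATA_TRIG_MODE_SHIFT  15

/* Build the MSI that an unremapped IOAPIC pin would emit. */
static MsiMsg rte_to_msi(uint64_t entry)
{
    uint64_t dest      = entry >> RTE_DEST_SHIFT;
    uint64_t dest_mode = (entry >> RTE_DEST_MODE_SHIFT) & 1;
    uint32_t deliv     = (uint32_t)((entry >> RTE_DELIV_MODE_SHIFT) &
                                    RTE_DELIV_MODE_MASK);
    uint32_t trig      = (uint32_t)((entry >> RTE_TRIG_MODE_SHIFT) & 1);
    MsiMsg msg;

    msg.address = MSI_ADDR_BASE
                | (dest_mode << MSI_ADDR_DEST_MODE_SHIFT)
                | (dest << MSI_ADDR_DEST_SHIFT);
    msg.data = (uint32_t)(entry & RTE_VECTOR_MASK)
             | (deliv << MSI_DATA_DELIV_MODE_SHIFT)
             | (trig << MSI_DATA_TRIG_MODE_SHIFT);
    return msg;
}
```

This only covers the compatibility format; an entry programmed in VT-d remappable format places an interrupt index in its upper bits instead of an APIC ID, which a real parser has to carry through to the remapping step.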

Since KVM irqfd uses kernel GSI routes to deliver interrupts,
supporting split irqchip gives us irqfd support as well. Also, since the
kernel GSI routes cache the translated interrupts, irqfd delivery should
not suffer any performance impact from IR: no translation happens on the
delivery path.
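A minimal sketch of the caching idea, assuming a toy per-GSI table in place of the kernel's GSI routing table: the translation runs only when a redirection entry changes, and the irqfd-style delivery path is a plain lookup. `RouteTable`, `route_update()`, `route_deliver()` and `toy_remap()` are hypothetical; as in the patch's loop, a failed translation skips the update.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_GSI 24

typedef struct { uint64_t address; uint32_t data; } MsiMsg;

/* Counts translations so the caching effect is observable. */
static unsigned remap_calls;

/* Hypothetical stand-in for an IRTE lookup: shifts the vector by 0x10. */
static int toy_remap(const MsiMsg *src, MsiMsg *dst)
{
    remap_calls++;
    dst->address = src->address;
    dst->data = (src->data & ~0xffu) | ((src->data + 0x10) & 0xffu);
    return 0;
}

/* Per-GSI table of already-translated messages. */
typedef struct {
    MsiMsg msg[NUM_GSI];
    int valid[NUM_GSI];
} RouteTable;

/* Slow path: run only when the guest rewrites a redirection entry. */
static int route_update(RouteTable *rt, int gsi, const MsiMsg *src)
{
    MsiMsg dst;
    int ret;

    assert(rt && src && gsi >= 0 && gsi < NUM_GSI);
    ret = toy_remap(src, &dst);
    if (ret) {
        return ret;             /* keep the previous route untouched */
    }
    rt->msg[gsi] = dst;
    rt->valid[gsi] = 1;
    return 0;
}

/* Fast path (irqfd-like): a lookup, no translation. */
static const MsiMsg *route_deliver(const RouteTable *rt, int gsi)
{
    assert(rt && gsi >= 0 && gsi < NUM_GSI);
    return rt->valid[gsi] ? &rt->msg[gsi] : NULL;
}
```

However many interrupts are delivered, the translation cost is paid once per entry update, which is the property the paragraph above relies on.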

And since irqfd is now supported, vhost devices should be able to work
seamlessly with IR. Logically this should cover both the vhost-net and
vhost-user cases.

Here we avoid capturing IOMMU IR invalidations, based on the assumption
that the guest kernel always updates the IR entry first and the IOAPIC
entry afterwards. As long as the guest follows this order when updating
IOAPIC entries, we should be safe.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/i386/intel_iommu.c             |  9 +++++++--
 hw/intc/ioapic.c                  | 39 ++++++++++++++-------------------------
 hw/intc/ioapic_common.c           |  4 ++++
 include/hw/i386/intel_iommu.h     |  2 ++
 include/hw/i386/ioapic_internal.h |  5 +++++
 5 files changed, 32 insertions(+), 27 deletions(-)

Comments

Radim Krčmář April 15, 2016, 3:31 p.m. UTC | #1
2016-04-15 11:31+0800, Peter Xu:
> This patch allows Intel IR to work with split irqchip. Two more fields
> are added to IOAPICCommonState to support the translation process (for
> future AMD IR support, we will need to provide another AMD-specific
> callback for int_remap()). In split irqchip mode, the IOAPIC is emulated
> in user space and only updates the kernel irq routes when an entry
> changes. When IR is enabled, we directly update the kernel with the
> translated messages, so the kernel routes act as a cache of the
> remapping entries.

(The patches are nice, thanks. I'll be looking at how to slap EIM on top.)

> Since KVM irqfd uses kernel GSI routes to deliver interrupts,
> supporting split irqchip gives us irqfd support as well. Also, since the
> kernel GSI routes cache the translated interrupts, irqfd delivery should
> not suffer any performance impact from IR: no translation happens on the
> delivery path.
> 
> And since irqfd is now supported, vhost devices should be able to work
> seamlessly with IR. Logically this should cover both the vhost-net and
> vhost-user cases.

It doesn't look like all callers of kvm_irqchip_update_msi_route() are
IR-aware.  I think wrapping the remapping around it might be easiest;
kvm_arch_fixup_msi_route() is another candidate.
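One way to read the suggestion to wrap the remapping around the route update: funnel every route write through a single function that applies an optional translation hook, so each caller becomes IR-aware without being changed. This is a sketch under that assumption; `update_msi_route()`, `MsiFixupFn` and the toy IOMMU are hypothetical and do not reproduce the signatures of `kvm_irqchip_update_msi_route()` or `kvm_arch_fixup_msi_route()`. With no hook installed, the message passes through unchanged, similar to the patched `vtd_interrupt_remap_msi()` when no IOMMU is present.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint64_t address; uint32_t data; } MsiMsg;

/* Optional translation hook; NULL means no IOMMU or IR disabled. */
typedef int (*MsiFixupFn)(void *opaque, const MsiMsg *src, MsiMsg *dst);

#define NUM_ROUTES 8

static MsiFixupFn fixup_fn;
static void *fixup_opaque;
static MsiMsg routes[NUM_ROUTES];

/* The only function that writes routes, so every caller
 * (IOAPIC, MSI/MSI-X devices, vhost) gets the translation. */
static int update_msi_route(int virq, MsiMsg msg)
{
    MsiMsg dst = msg;               /* pass-through by default */

    if (virq < 0 || virq >= NUM_ROUTES) {
        return -1;
    }
    if (fixup_fn) {
        int ret = fixup_fn(fixup_opaque, &msg, &dst);
        if (ret) {
            return ret;             /* untranslatable: leave route as is */
        }
    }
    routes[virq] = dst;
    return 0;
}

/* Toy IOMMU: a vector-to-vector table standing in for IRTE lookups. */
typedef struct {
    uint8_t vector_map[256];
} ToyIommu;

static int toy_fixup(void *opaque, const MsiMsg *src, MsiMsg *dst)
{
    const ToyIommu *iommu = opaque;

    assert(iommu && src && dst);
    dst->address = src->address;
    dst->data = (src->data & ~0xffu) | iommu->vector_map[src->data & 0xff];
    return 0;
}
```

The design choice is to make translation a property of the route-update path rather than of each caller, so a caller that forgets about IR cannot install an untranslated route.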

> Here we avoid capturing IOMMU IR invalidations, based on the assumption
> that the guest kernel always updates the IR entry first and the IOAPIC
> entry afterwards. As long as the guest follows this order when updating
> IOAPIC entries, we should be safe.

The OS configures the IOAPIC, MSI and IR independently; e.g. changing
the destination LAPIC only updates the IRTE and can happen at any time.
You have to update the kvm_irqchip routes when an IRTE changes.
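The requirement can be sketched as follows, assuming each cached route remembers the IRTE index it was translated from. A write to an IRTE then re-translates every route that depends on it, so a destination change that touches only the IRTE still reaches the routes. The toy IRTE format (destination and vector only), `route_bind()` and `irte_write()` are hypothetical; real VT-d IRTEs carry more fields, and in practice the refresh would likely be triggered by the guest's interrupt entry cache invalidation rather than by the raw memory write.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_IRTE   16
#define NUM_ROUTES 8

typedef struct { uint64_t address; uint32_t data; } MsiMsg;

/* Toy IRTE: destination APIC ID and vector only. */
typedef struct { uint8_t dest; uint8_t vector; } Irte;

/* A cached route remembers which IRTE it was translated from. */
typedef struct {
    int in_use;
    uint16_t irte_index;
    MsiMsg msg;
} Route;

static Irte irte_table[NUM_IRTE];
static Route routes[NUM_ROUTES];

static MsiMsg translate(uint16_t index)
{
    MsiMsg m;

    assert(index < NUM_IRTE);
    m.address = 0xfee00000ULL | ((uint64_t)irte_table[index].dest << 12);
    m.data = irte_table[index].vector;
    return m;
}

static void route_bind(int gsi, uint16_t index)
{
    assert(gsi >= 0 && gsi < NUM_ROUTES);
    routes[gsi].in_use = 1;
    routes[gsi].irte_index = index;
    routes[gsi].msg = translate(index);
}

/* Guest changed an IRTE: re-translate every route derived from it.
 * Returns how many routes were refreshed. */
static int irte_write(uint16_t index, Irte val)
{
    int refreshed = 0;

    assert(index < NUM_IRTE);
    irte_table[index] = val;
    for (int gsi = 0; gsi < NUM_ROUTES; gsi++) {
        if (routes[gsi].in_use && routes[gsi].irte_index == index) {
            routes[gsi].msg = translate(index);
            refreshed++;
        }
    }
    return refreshed;
}
```

Without the loop in `irte_write()`, the route would keep pointing at the old APIC after the guest retargets the interrupt, which is exactly the case the ordering assumption in the commit message does not cover.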
Jan Kiszka April 17, 2016, 2:44 a.m. UTC | #2
On 2016-04-14 20:31, Peter Xu wrote:
> [...]
> diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
> index 84e8948..b993bd0 100644
> --- a/hw/intc/ioapic.c
> +++ b/hw/intc/ioapic.c
> @@ -182,34 +182,23 @@ static void ioapic_update_kvm_routes(IOAPICCommonState *s)
>  {
>  #ifdef CONFIG_KVM
>      int i;
> +    int ret;
>  
>      if (kvm_irqchip_is_split()) {
>          for (i = 0; i < IOAPIC_NUM_PINS; i++) {
> -            uint64_t entry = s->ioredtbl[i];
> -            uint8_t trig_mode;
> -            uint8_t delivery_mode;
> -            uint8_t dest;
> -            uint8_t dest_mode;
> -            uint64_t pin_polarity;
> -            MSIMessage msg;
> -
> -            trig_mode = ((entry >> IOAPIC_LVT_TRIGGER_MODE_SHIFT) & 1);
> -            dest = entry >> IOAPIC_LVT_DEST_SHIFT;
> -            dest_mode = (entry >> IOAPIC_LVT_DEST_MODE_SHIFT) & 1;
> -            pin_polarity = (entry >> IOAPIC_LVT_TRIGGER_MODE_SHIFT) & 1;
> -            delivery_mode =
> -                (entry >> IOAPIC_LVT_DELIV_MODE_SHIFT) & IOAPIC_DM_MASK;
> -
> -            msg.address = APIC_DEFAULT_ADDRESS;
> -            msg.address |= dest_mode << 2;
> -            msg.address |= dest << 12;
> -
> -            msg.data = entry & IOAPIC_VECTOR_MASK;
> -            msg.data |= delivery_mode << APIC_DELIVERY_MODE_SHIFT;
> -            msg.data |= pin_polarity << APIC_POLARITY_SHIFT;
> -            msg.data |= trig_mode << APIC_TRIG_MODE_SHIFT;
> -
> -            kvm_irqchip_update_msi_route(kvm_state, i, msg, NULL);
> +            MSIMessage src, dst;
> +            struct ioapic_entry_info info;
> +            ioapic_entry_parse(s->ioredtbl[i], &info);
> +            src.address = info.addr;
> +            src.data = info.data;
> +            /* We update kernel irqchip routes with translated
> +             * results. */
> +            ret = s->int_remap(s->iommu, &src, &dst);
> +            if (ret) {
> +                DPRINTF("Int remap failed: %d, drop interrupt\n", ret);
> +                continue;
> +            }
> +            kvm_irqchip_update_msi_route(kvm_state, i, dst, NULL);

The need to hook here makes me wonder if we can't inject IOAPIC
interrupts via KVM_SIGNAL_MSI (abstracted by kvm_irqchip_send_msi, but
that will pick the fast path on kernels supporting split irqchip)
instead of open-coding the route changes. If we always translated the
IOAPIC outputs into MSIs, there would be no need to special-case split
irqchip, and no need to hook here for IR either.

Jan
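One possible shape of this approach, as a sketch: the IOAPIC always converts a pending pin into an MSI and hands it to a single delivery callback, so there is no separate split-irqchip branch, and any IR translation could sit inside that callback. Edge pins fire once; level pins set Remote IRR and stay coalesced until an EOI clears it. `ToyIoapic`, `SendMsiFn`, `record_msi()` and the bit macros are hypothetical; they are not QEMU's `ioapic_service()` or `kvm_irqchip_send_msi()`.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PINS            24
#define RTE_VECTOR_MASK     0xffULL
#define RTE_DELIV_MODE_MASK 0x700ULL    /* bits 8-10, same spot in MSI data */
#define RTE_DEST_MODE_SHIFT 11
#define RTE_REMOTE_IRR      (1ULL << 14)
#define RTE_TRIG_LEVEL      (1ULL << 15)
#define RTE_MASKED          (1ULL << 16)
#define RTE_DEST_SHIFT      56

typedef struct { uint64_t address; uint32_t data; } MsiMsg;

/* One delivery backend for all pins, chosen at setup time
 * (e.g. a userspace APIC model or a KVM MSI injection). */
typedef void (*SendMsiFn)(const MsiMsg *msg);

typedef struct {
    uint32_t irr;               /* pending pins, one bit per pin */
    uint64_t rte[NUM_PINS];     /* redirection table */
    SendMsiFn send_msi;
} ToyIoapic;

static MsiMsg rte_to_msi(uint64_t e)
{
    MsiMsg m;

    m.address = 0xfee00000ULL
              | (((e >> RTE_DEST_MODE_SHIFT) & 1) << 2)
              | ((e >> RTE_DEST_SHIFT) << 12);
    m.data = (uint32_t)(e & RTE_VECTOR_MASK)
           | (uint32_t)(e & RTE_DELIV_MODE_MASK)
           | ((e & RTE_TRIG_LEVEL) ? (1u << 15) : 0u);
    return m;
}

static void ioapic_service(ToyIoapic *s)
{
    for (int i = 0; i < NUM_PINS; i++) {
        uint32_t bit = 1u << i;
        uint64_t *e = &s->rte[i];
        MsiMsg msg;

        if (!(s->irr & bit) || (*e & RTE_MASKED)) {
            continue;
        }
        if (*e & RTE_TRIG_LEVEL) {
            if (*e & RTE_REMOTE_IRR) {
                continue;       /* coalesce: wait for the EOI */
            }
            *e |= RTE_REMOTE_IRR;
        } else {
            s->irr &= ~bit;     /* edge: deliver once */
        }
        msg = rte_to_msi(*e);
        s->send_msi(&msg);
    }
}

/* Toy backend that records what would have been injected. */
static int msi_count;
static MsiMsg last_msi;

static void record_msi(const MsiMsg *m)
{
    msi_count++;
    last_msi = *m;
}
```

This only covers interrupts raised by the userspace IOAPIC; sources that signal the kernel directly, such as irqfd, never pass through this loop.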
Michael S. Tsirkin April 17, 2016, 9:45 a.m. UTC | #3
On Sat, Apr 16, 2016 at 07:44:12PM -0700, Jan Kiszka wrote:
> On 2016-04-14 20:31, Peter Xu wrote:
> [...]
> 
> The need to hook here makes me wonder if we can't inject IOAPIC
> interrupts via KVM_SIGNAL_MSI (abstracted by kvm_irqchip_send_msi, but
> that will pick the fast path on kernels supporting split irqchip)
> instead of open-coding the route changes. If we always translated the
> IOAPIC outputs into MSIs, there would be no need to special-case split
> irqchip, and no need to hook here for IR either.
> 
> Jan

That will work for irqfd as well.
Peter Xu April 18, 2016, 3:30 a.m. UTC | #4
On Fri, Apr 15, 2016 at 05:31:58PM +0200, Radim Krčmář wrote:
> 2016-04-15 11:31+0800, Peter Xu:
> > This patch allows Intel IR to work with split irqchip. Two more fields
> > are added to IOAPICCommonState to support the translation process (for
> > future AMD IR support, we will need to provide another AMD-specific
> > callback for int_remap()). In split irqchip mode, the IOAPIC is emulated
> > in user space and only updates the kernel irq routes when an entry
> > changes. When IR is enabled, we directly update the kernel with the
> > translated messages, so the kernel routes act as a cache of the
> > remapping entries.
> 
> (The patches are nice, thanks. I'll be looking at how to slap EIM on top.)
> 
> > Since KVM irqfd uses kernel GSI routes to deliver interrupts,
> > supporting split irqchip gives us irqfd support as well. Also, since the
> > kernel GSI routes cache the translated interrupts, irqfd delivery should
> > not suffer any performance impact from IR: no translation happens on the
> > delivery path.
> > 
> > And since irqfd is now supported, vhost devices should be able to work
> > seamlessly with IR. Logically this should cover both the vhost-net and
> > vhost-user cases.
> 
> It doesn't look like all callers of kvm_irqchip_update_msi_route() are
> IR-aware.  I think wrapping the remapping around it might be easiest;
> kvm_arch_fixup_msi_route() is another candidate.

You are right; I failed to catch this in my smoke test. It seems that
kvm_arch_fixup_msi_route() is a good place. Thanks!

> 
> > Here we avoid capturing IOMMU IR invalidations, based on the assumption
> > that the guest kernel always updates the IR entry first and the IOAPIC
> > entry afterwards. As long as the guest follows this order when updating
> > IOAPIC entries, we should be safe.
> 
> The OS configures the IOAPIC, MSI and IR independently; e.g. changing
> the destination LAPIC only updates the IRTE and can happen at any time.
> You have to update the kvm_irqchip routes when an IRTE changes.

Thanks for pointing this out. I will add one more patch to do that.

-- peterx
Peter Xu April 18, 2016, 8:55 a.m. UTC | #5
On Sun, Apr 17, 2016 at 12:45:03PM +0300, Michael S. Tsirkin wrote:
> On Sat, Apr 16, 2016 at 07:44:12PM -0700, Jan Kiszka wrote:
> > On 2016-04-14 20:31, Peter Xu wrote:
[...]
> > 
> > The need to hook here makes me wonder if we can't inject IOAPIC
> > interrupts via KVM_SIGNAL_MSI (abstracted by kvm_irqchip_send_msi, but
> > that will pick the fast path on kernels supporting split irqchip)
> > instead of open-coding the route changes. If we always translated the
> > IOAPIC outputs into MSIs, there would be no need to special-case split
> > irqchip, and no need to hook here for IR either.

Hi, Jan,

IIUC, this can be achieved by removing these lines from ioapic_service():

-#ifdef CONFIG_KVM
-                if (kvm_irqchip_is_split()) {
-                    if (trig_mode == IOAPIC_TRIGGER_EDGE) {
-                        kvm_set_irq(kvm_state, i, 1);
-                        kvm_set_irq(kvm_state, i, 0);
-                    } else {
-                        if (!coalesce) {
-                            kvm_set_irq(kvm_state, i, 1);
-                        }
-                    }
-                    continue;
-                }
-#else
-                (void)coalesce;
-#endif

QEMU would then automatically select the correct way to deliver the
interrupt, depending on whether "apic" or "kvm-apic" is used.

I suppose this is a good approach if we only need to support split
irqchip. However, what happens when we move on to support irqfd and
vhost? In that case, we may still need to update the kernel entries
with the translated ones, right? Or did I miss anything?

> 
> That will work for irqfd as well.

Michael,

Could you help explain how I could also support irqfd using Jan's
method mentioned above?

Thanks in advance (to both :).

-- peterx
Jan Kiszka April 25, 2016, 5 a.m. UTC | #6
On 2016-04-18 10:55, Peter Xu wrote:
> On Sun, Apr 17, 2016 at 12:45:03PM +0300, Michael S. Tsirkin wrote:
>> On Sat, Apr 16, 2016 at 07:44:12PM -0700, Jan Kiszka wrote:
>>> On 2016-04-14 20:31, Peter Xu wrote:
> [...]
>>>
>>> The need to hook here makes me wonder if we can't inject IOAPIC
>>> interrupts via KVM_SIGNAL_MSI (abstracted by kvm_irqchip_send_msi, but
>>> that will pick the fast path on kernels supporting split irqchip)
>>> instead of open-coding the route changes. If we always translated the
>>> IOAPIC outputs into MSIs, there would be no need to special-case split
>>> irqchip, and no need to hook here for IR either.
> 
> Hi, Jan,
> 
> IIUC, this can be achieved by removing these lines from ioapic_service():
> 
> -#ifdef CONFIG_KVM
> -                if (kvm_irqchip_is_split()) {
> -                    if (trig_mode == IOAPIC_TRIGGER_EDGE) {
> -                        kvm_set_irq(kvm_state, i, 1);
> -                        kvm_set_irq(kvm_state, i, 0);
> -                    } else {
> -                        if (!coalesce) {
> -                            kvm_set_irq(kvm_state, i, 1);
> -                        }
> -                    }
> -                    continue;
> -                }
> -#else
> -                (void)coalesce;
> -#endif
> 
> QEMU would then automatically select the correct way to deliver the
> interrupt, depending on whether "apic" or "kvm-apic" is used.
> 
> I suppose this is a good approach if we only need to support split
> irqchip. However, what happens when we move on to support irqfd and
> vhost? In that case, we may still need to update the kernel entries
> with the translated ones, right? Or did I miss anything?

Right, in-kernel irq sources depend on an already-remapped channel
because they deliver directly to the in-kernel APICs. Thus, you will
have to establish routes for those irqfds that reflect the (cached) IRTEs.

Jan

Patch

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 68ebc1e..104afeb 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2077,9 +2077,9 @@  static int vtd_interrupt_remap_msi(IntelIOMMUState *iommu,
     uint16_t index = 0;
     VTDIrq irq = {0};
 
-    assert(iommu && origin && translated);
+    assert(origin && translated);
 
-    if (!iommu->intr_enabled) {
+    if (!iommu || !iommu->intr_enabled) {
         memcpy(translated, origin, sizeof(*origin));
         return 0;
     }
@@ -2143,6 +2143,11 @@  static int vtd_interrupt_remap_msi(IntelIOMMUState *iommu,
     return 0;
 }
 
+int vtd_int_remap(void *iommu, MSIMessage *src, MSIMessage *dst)
+{
+    return vtd_interrupt_remap_msi(iommu, src, dst);
+}
+
 static uint64_t vtd_mem_ir_read(void *opaque, hwaddr addr, unsigned size)
 {
     uint64_t data = 0;
diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
index 84e8948..b993bd0 100644
--- a/hw/intc/ioapic.c
+++ b/hw/intc/ioapic.c
@@ -182,34 +182,23 @@  static void ioapic_update_kvm_routes(IOAPICCommonState *s)
 {
 #ifdef CONFIG_KVM
     int i;
+    int ret;
 
     if (kvm_irqchip_is_split()) {
         for (i = 0; i < IOAPIC_NUM_PINS; i++) {
-            uint64_t entry = s->ioredtbl[i];
-            uint8_t trig_mode;
-            uint8_t delivery_mode;
-            uint8_t dest;
-            uint8_t dest_mode;
-            uint64_t pin_polarity;
-            MSIMessage msg;
-
-            trig_mode = ((entry >> IOAPIC_LVT_TRIGGER_MODE_SHIFT) & 1);
-            dest = entry >> IOAPIC_LVT_DEST_SHIFT;
-            dest_mode = (entry >> IOAPIC_LVT_DEST_MODE_SHIFT) & 1;
-            pin_polarity = (entry >> IOAPIC_LVT_TRIGGER_MODE_SHIFT) & 1;
-            delivery_mode =
-                (entry >> IOAPIC_LVT_DELIV_MODE_SHIFT) & IOAPIC_DM_MASK;
-
-            msg.address = APIC_DEFAULT_ADDRESS;
-            msg.address |= dest_mode << 2;
-            msg.address |= dest << 12;
-
-            msg.data = entry & IOAPIC_VECTOR_MASK;
-            msg.data |= delivery_mode << APIC_DELIVERY_MODE_SHIFT;
-            msg.data |= pin_polarity << APIC_POLARITY_SHIFT;
-            msg.data |= trig_mode << APIC_TRIG_MODE_SHIFT;
-
-            kvm_irqchip_update_msi_route(kvm_state, i, msg, NULL);
+            MSIMessage src, dst;
+            struct ioapic_entry_info info;
+            ioapic_entry_parse(s->ioredtbl[i], &info);
+            src.address = info.addr;
+            src.data = info.data;
+            /* We update kernel irqchip routes with translated
+             * results. */
+            ret = s->int_remap(s->iommu, &src, &dst);
+            if (ret) {
+                DPRINTF("Int remap failed: %d, drop interrupt\n", ret);
+                continue;
+            }
+            kvm_irqchip_update_msi_route(kvm_state, i, dst, NULL);
         }
         kvm_irqchip_commit_routes(kvm_state);
     }
diff --git a/hw/intc/ioapic_common.c b/hw/intc/ioapic_common.c
index 1b7ec5e..d583398 100644
--- a/hw/intc/ioapic_common.c
+++ b/hw/intc/ioapic_common.c
@@ -25,6 +25,7 @@ 
 #include "hw/i386/ioapic.h"
 #include "hw/i386/ioapic_internal.h"
 #include "hw/sysbus.h"
+#include "hw/i386/intel_iommu.h"
 
 /* ioapic_no count start from 0 to MAX_IOAPICS,
  * remove as static variable from ioapic_common_init.
@@ -135,6 +136,9 @@  static void ioapic_common_realize(DeviceState *dev, Error **errp)
     info = IOAPIC_COMMON_GET_CLASS(s);
     info->realize(dev, errp);
 
+    s->iommu = (void *)vtd_iommu_get();
+    s->int_remap = vtd_int_remap;
+
     sysbus_init_mmio(SYS_BUS_DEVICE(s), &s->io_memory);
     ioapic_no++;
 }
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index f824957..25a9306 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -246,5 +246,7 @@  struct IntelIOMMUState {
 VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn);
 /* Get default IOMMU object */
 IntelIOMMUState *vtd_iommu_get(void);
+/* Interrupt remapping hook function */
+int vtd_int_remap(void *iommu, MSIMessage *src, MSIMessage *dst);
 
 #endif
diff --git a/include/hw/i386/ioapic_internal.h b/include/hw/i386/ioapic_internal.h
index d279f2d..d9070cf 100644
--- a/include/hw/i386/ioapic_internal.h
+++ b/include/hw/i386/ioapic_internal.h
@@ -102,6 +102,11 @@  struct IOAPICCommonState {
     uint8_t ioregsel;
     uint32_t irr;
     uint64_t ioredtbl[IOAPIC_NUM_PINS];
+
+    /* This should be the IOMMU that owns the IOAPIC. */
+    void *iommu;
+    /* Interrupt remapping callback */
+    int (*int_remap)(void *iommu, MSIMessage *src, MSIMessage *dst);
 };
 
 void ioapic_reset_common(DeviceState *dev);