Message ID | 1460691099-3024-13-git-send-email-peterx@redhat.com |
---|---|
State | New |
2016-04-15 11:31+0800, Peter Xu:
> This patch allows Intel IR work with splitted irqchip. Two more fields
> are added to IOAPICCommonState to support the translation process (For
> future AMD IR support, we will need to provide another AMD-specific
> callback for int_remap()). In split irqchip mode, IOAPIC is working in
> user space, only update kernel irq routes when entry changed. When IR is
> enabled, we directly update the kernel with translated messages. It
> works just like a kernel cache for the remapping entries.

(Patches are nice, thanks, I'll be looking how to slap EIM on top.)

> Since KVM irqfd is using kernel gsi routes to deliver interrupts, as
> long as we can support split irqchip, we will support irqfd as
> well. Also, since kernel gsi routes will cache translated interrupts,
> irqfd delivery will not suffer from any performance impact due to IR.
>
> And, since we supported irqfd, vhost devices will be able to work
> seamlessly with IR now. Logically this should contain both vhost-net and
> vhost-user case.

Doesn't look that all callers of kvm_irqchip_update_msi_route() are IR
aware. I think wrapping the remapping around it might be easiest,
kvm_arch_fixup_msi_route() is another candidate.

> Here we avoided capturing IOMMU IR invalidation, based on the assumption
> that, guest kernel will always first update IR entry, then IOAPIC
> entry. As long as guest follows this order to update IOAPIC entries, we
> should be safe.

The OS configures IOAPIC, MSI and IR independently. e.g. changing the
destination LAPIC only updates IRTE and can happen anytime.
You have to update kvm_irqchip routes when IRTE changes.
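To make the last point concrete, here is a minimal sketch in plain C of a route cache that is refreshed whenever a remapping entry changes. It is a toy model, not QEMU code: every `rc_` name, the table sizes, and the `updates` counter (which stands in for pushing a route to the kernel) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define RC_NUM_IRTES  4   /* size of the toy remapping table (assumed) */
#define RC_NUM_ROUTES 4   /* number of cached routes, one per GSI (assumed) */

struct rc_msi {
    uint64_t address;
    uint32_t data;
};

/* Toy remapping table: IRTE n holds the message that handle n maps to. */
struct rc_remap_table {
    struct rc_msi irte[RC_NUM_IRTES];
};

/* One cached route: which IRTE it came from and what was last pushed. */
struct rc_route {
    int in_use;
    unsigned handle;
    struct rc_msi translated;
};

struct rc_route_cache {
    struct rc_route route[RC_NUM_ROUTES];
    unsigned updates;   /* stands in for kernel route updates */
};

/* Install or replace the route for one GSI using the current IRTE. */
int rc_set_route(struct rc_route_cache *c, const struct rc_remap_table *t,
                 unsigned gsi, unsigned handle)
{
    assert(c && t);
    if (gsi >= RC_NUM_ROUTES || handle >= RC_NUM_IRTES) {
        return -1;
    }
    c->route[gsi].in_use = 1;
    c->route[gsi].handle = handle;
    c->route[gsi].translated = t->irte[handle];
    c->updates++;
    return 0;
}

/* Re-translate every route that depends on a changed IRTE.
 * Returns the number of routes that were refreshed. */
unsigned rc_irte_changed(struct rc_route_cache *c,
                         const struct rc_remap_table *t, unsigned handle)
{
    unsigned gsi, refreshed = 0;

    assert(c && t);
    if (handle >= RC_NUM_IRTES) {
        return 0;
    }
    for (gsi = 0; gsi < RC_NUM_ROUTES; gsi++) {
        struct rc_route *r = &c->route[gsi];
        if (r->in_use && r->handle == handle) {
            r->translated = t->irte[handle];
            c->updates++;
            refreshed++;
        }
    }
    return refreshed;
}
```

Without the call to `rc_irte_changed()`, a route installed before the guest retargets an IRTE keeps delivering to the old destination, which is the stale-cache case described above.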
On 2016-04-14 20:31, Peter Xu wrote:
> This patch allows Intel IR work with splitted irqchip. Two more fields
> are added to IOAPICCommonState to support the translation process (For
> future AMD IR support, we will need to provide another AMD-specific
> callback for int_remap()). In split irqchip mode, IOAPIC is working in
> user space, only update kernel irq routes when entry changed. When IR is
> enabled, we directly update the kernel with translated messages. It
> works just like a kernel cache for the remapping entries.
>
> Since KVM irqfd is using kernel gsi routes to deliver interrupts, as
> long as we can support split irqchip, we will support irqfd as
> well. Also, since kernel gsi routes will cache translated interrupts,
> irqfd delivery will not suffer from any performance impact due to IR.
>
> And, since we supported irqfd, vhost devices will be able to work
> seamlessly with IR now. Logically this should contain both vhost-net and
> vhost-user case.
>
> Here we avoided capturing IOMMU IR invalidation, based on the assumption
> that, guest kernel will always first update IR entry, then IOAPIC
> entry. As long as guest follows this order to update IOAPIC entries, we
> should be safe.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  hw/i386/intel_iommu.c             |  9 +++++++--
>  hw/intc/ioapic.c                  | 39 ++++++++++++++-------------------------
>  hw/intc/ioapic_common.c           |  4 ++++
>  include/hw/i386/intel_iommu.h     |  2 ++
>  include/hw/i386/ioapic_internal.h |  5 +++++
>  5 files changed, 32 insertions(+), 27 deletions(-)
>
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 68ebc1e..104afeb 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -2077,9 +2077,9 @@ static int vtd_interrupt_remap_msi(IntelIOMMUState *iommu,
>      uint16_t index = 0;
>      VTDIrq irq = {0};
>
> -    assert(iommu && origin && translated);
> +    assert(origin && translated);
>
> -    if (!iommu->intr_enabled) {
> +    if (!iommu || !iommu->intr_enabled) {
>          memcpy(translated, origin, sizeof(*origin));
>          return 0;
>      }
> @@ -2143,6 +2143,11 @@ static int vtd_interrupt_remap_msi(IntelIOMMUState *iommu,
>      return 0;
>  }
>
> +int vtd_int_remap(void *iommu, MSIMessage *src, MSIMessage *dst)
> +{
> +    return vtd_interrupt_remap_msi(iommu, src, dst);
> +}
> +
>  static uint64_t vtd_mem_ir_read(void *opaque, hwaddr addr, unsigned size)
>  {
>      uint64_t data = 0;
> diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
> index 84e8948..b993bd0 100644
> --- a/hw/intc/ioapic.c
> +++ b/hw/intc/ioapic.c
> @@ -182,34 +182,23 @@ static void ioapic_update_kvm_routes(IOAPICCommonState *s)
>  {
>  #ifdef CONFIG_KVM
>      int i;
> +    int ret;
>
>      if (kvm_irqchip_is_split()) {
>          for (i = 0; i < IOAPIC_NUM_PINS; i++) {
> -            uint64_t entry = s->ioredtbl[i];
> -            uint8_t trig_mode;
> -            uint8_t delivery_mode;
> -            uint8_t dest;
> -            uint8_t dest_mode;
> -            uint64_t pin_polarity;
> -            MSIMessage msg;
> -
> -            trig_mode = ((entry >> IOAPIC_LVT_TRIGGER_MODE_SHIFT) & 1);
> -            dest = entry >> IOAPIC_LVT_DEST_SHIFT;
> -            dest_mode = (entry >> IOAPIC_LVT_DEST_MODE_SHIFT) & 1;
> -            pin_polarity = (entry >> IOAPIC_LVT_TRIGGER_MODE_SHIFT) & 1;
> -            delivery_mode =
> -                (entry >> IOAPIC_LVT_DELIV_MODE_SHIFT) & IOAPIC_DM_MASK;
> -
> -            msg.address = APIC_DEFAULT_ADDRESS;
> -            msg.address |= dest_mode << 2;
> -            msg.address |= dest << 12;
> -
> -            msg.data = entry & IOAPIC_VECTOR_MASK;
> -            msg.data |= delivery_mode << APIC_DELIVERY_MODE_SHIFT;
> -            msg.data |= pin_polarity << APIC_POLARITY_SHIFT;
> -            msg.data |= trig_mode << APIC_TRIG_MODE_SHIFT;
> -
> -            kvm_irqchip_update_msi_route(kvm_state, i, msg, NULL);
> +            MSIMessage src, dst;
> +            struct ioapic_entry_info info;
> +            ioapic_entry_parse(s->ioredtbl[i], &info);
> +            src.address = info.addr;
> +            src.data = info.data;
> +            /* We update kernel irqchip routes with translated
> +             * results. */
> +            ret = s->int_remap(s->iommu, &src, &dst);
> +            if (ret) {
> +                DPRINTF("Int remap failed: %d, drop interrupt\n", ret);
> +                continue;
> +            }
> +            kvm_irqchip_update_msi_route(kvm_state, i, dst, NULL);

The need to hook here makes me wonder if we can't inject IOAPIC
interrupts via KVM_SIGNAL_MSI (abstracted by kvm_irqchip_send_msi, but
that will pick the fast-path on kernels supporting split irqchip)
instead of open-coding the route changes. If we translated the IOAPIC
outputs always into MSIs, the need for special-casing split irqchip
would be gone, and the need for hooking here for IR just as well.

Jan
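As a rough illustration of what "translating the IOAPIC outputs into MSIs" involves, the sketch below packs one 64-bit redirection table entry into an MSI address/data pair. It is not the patch's ioapic_entry_parse(), whose body is not shown in this thread. The `DEMO_` constants and `rte_` names are local to the sketch; the bit positions are assumed from the standard IOAPIC redirection entry and x86 MSI layouts rather than taken from QEMU headers, and pin polarity is left out to keep it short.

```c
#include <assert.h>
#include <stdint.h>

/* Field positions in a 64-bit IOAPIC redirection table entry (assumed). */
#define DEMO_RTE_VECTOR_MASK       0xffu
#define DEMO_RTE_DELIV_MODE_SHIFT  8
#define DEMO_RTE_DELIV_MODE_MASK   0x7u
#define DEMO_RTE_DEST_MODE_SHIFT   11
#define DEMO_RTE_TRIG_MODE_SHIFT   15
#define DEMO_RTE_DEST_SHIFT        56

/* Field positions in an x86 MSI address/data pair (assumed). */
#define DEMO_MSI_ADDR_BASE             0xfee00000u
#define DEMO_MSI_ADDR_DEST_MODE_SHIFT  2
#define DEMO_MSI_ADDR_DEST_SHIFT       12
#define DEMO_MSI_DATA_DELIV_MODE_SHIFT 8
#define DEMO_MSI_DATA_TRIG_MODE_SHIFT  15

struct rte_msi {
    uint64_t address;
    uint32_t data;
};

/* Build the MSI message that corresponds to one redirection entry. */
void rte_to_msi(uint64_t entry, struct rte_msi *msg)
{
    uint32_t vector = (uint32_t)(entry & DEMO_RTE_VECTOR_MASK);
    uint32_t deliv_mode = (uint32_t)(entry >> DEMO_RTE_DELIV_MODE_SHIFT)
                          & DEMO_RTE_DELIV_MODE_MASK;
    uint32_t dest_mode = (uint32_t)(entry >> DEMO_RTE_DEST_MODE_SHIFT) & 1u;
    uint32_t trig_mode = (uint32_t)(entry >> DEMO_RTE_TRIG_MODE_SHIFT) & 1u;
    uint32_t dest = (uint32_t)(entry >> DEMO_RTE_DEST_SHIFT) & 0xffu;

    assert(msg);

    /* Destination ID and destination mode go into the address. */
    msg->address = DEMO_MSI_ADDR_BASE
                   | (dest << DEMO_MSI_ADDR_DEST_SHIFT)
                   | (dest_mode << DEMO_MSI_ADDR_DEST_MODE_SHIFT);

    /* Vector, delivery mode and trigger mode go into the data. */
    msg->data = vector
                | (deliv_mode << DEMO_MSI_DATA_DELIV_MODE_SHIFT)
                | (trig_mode << DEMO_MSI_DATA_TRIG_MODE_SHIFT);
}
```

Once every pin is expressed this way, the same message can in principle be handed to whatever delivery path is in use, which is what would remove the split-irqchip special case.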
On Sat, Apr 16, 2016 at 07:44:12PM -0700, Jan Kiszka wrote:
> On 2016-04-14 20:31, Peter Xu wrote:
[...]
> > +            kvm_irqchip_update_msi_route(kvm_state, i, dst, NULL);
>
> The need to hook here makes me wonder if we can't inject IOAPIC
> interrupts via KVM_SIGNAL_MSI (abstracted by kvm_irqchip_send_msi, but
> that will pick the fast-path on kernels supporting split irqchip)
> instead of open-coding the route changes. If we translated the IOAPIC
> outputs always into MSIs, the need for special-casing split irqchip
> would be gone, and the need for hooking here for IR just as well.
>
> Jan

Will work for irqfd as well.
On Fri, Apr 15, 2016 at 05:31:58PM +0200, Radim Krčmář wrote:
> 2016-04-15 11:31+0800, Peter Xu:
> > This patch allows Intel IR work with splitted irqchip. Two more fields
> > are added to IOAPICCommonState to support the translation process (For
> > future AMD IR support, we will need to provide another AMD-specific
> > callback for int_remap()). In split irqchip mode, IOAPIC is working in
> > user space, only update kernel irq routes when entry changed. When IR is
> > enabled, we directly update the kernel with translated messages. It
> > works just like a kernel cache for the remapping entries.
>
> (Patches are nice, thanks, I'll be looking how to slap EIM on top.)
>
> > Since KVM irqfd is using kernel gsi routes to deliver interrupts, as
> > long as we can support split irqchip, we will support irqfd as
> > well. Also, since kernel gsi routes will cache translated interrupts,
> > irqfd delivery will not suffer from any performance impact due to IR.
> >
> > And, since we supported irqfd, vhost devices will be able to work
> > seamlessly with IR now. Logically this should contain both vhost-net and
> > vhost-user case.
>
> Doesn't look that all callers of kvm_irqchip_update_msi_route() are IR
> aware. I think wrapping the remapping around it might be easiest,
> kvm_arch_fixup_msi_route() is another candidate.

You are right, I failed to catch this during smoke testing. It seems
that kvm_arch_fixup_msi_route() is a good place. Thanks!

> > Here we avoided capturing IOMMU IR invalidation, based on the assumption
> > that, guest kernel will always first update IR entry, then IOAPIC
> > entry. As long as guest follows this order to update IOAPIC entries, we
> > should be safe.
>
> The OS configures IOAPIC, MSI and IR independently. e.g. changing the
> destination LAPIC only updates IRTE and can happen anytime.
> You have to update kvm_irqchip routes when IRTE changes.

Thanks for pointing this out. I will add one more patch to do that.

-- peterx
On Sun, Apr 17, 2016 at 12:45:03PM +0300, Michael S. Tsirkin wrote:
> On Sat, Apr 16, 2016 at 07:44:12PM -0700, Jan Kiszka wrote:
> > On 2016-04-14 20:31, Peter Xu wrote:

[...]

> > > diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
> > > index 84e8948..b993bd0 100644
> > > --- a/hw/intc/ioapic.c
> > > +++ b/hw/intc/ioapic.c
> > > @@ -182,34 +182,23 @@ static void ioapic_update_kvm_routes(IOAPICCommonState *s)
> > >  {
> > >  #ifdef CONFIG_KVM
> > >      int i;
> > > +    int ret;
> > >
> > >      if (kvm_irqchip_is_split()) {
> > >          for (i = 0; i < IOAPIC_NUM_PINS; i++) {
> > > -            uint64_t entry = s->ioredtbl[i];
> > > -            uint8_t trig_mode;
> > > -            uint8_t delivery_mode;
> > > -            uint8_t dest;
> > > -            uint8_t dest_mode;
> > > -            uint64_t pin_polarity;
> > > -            MSIMessage msg;
> > > -
> > > -            trig_mode = ((entry >> IOAPIC_LVT_TRIGGER_MODE_SHIFT) & 1);
> > > -            dest = entry >> IOAPIC_LVT_DEST_SHIFT;
> > > -            dest_mode = (entry >> IOAPIC_LVT_DEST_MODE_SHIFT) & 1;
> > > -            pin_polarity = (entry >> IOAPIC_LVT_TRIGGER_MODE_SHIFT) & 1;
> > > -            delivery_mode =
> > > -                (entry >> IOAPIC_LVT_DELIV_MODE_SHIFT) & IOAPIC_DM_MASK;
> > > -
> > > -            msg.address = APIC_DEFAULT_ADDRESS;
> > > -            msg.address |= dest_mode << 2;
> > > -            msg.address |= dest << 12;
> > > -
> > > -            msg.data = entry & IOAPIC_VECTOR_MASK;
> > > -            msg.data |= delivery_mode << APIC_DELIVERY_MODE_SHIFT;
> > > -            msg.data |= pin_polarity << APIC_POLARITY_SHIFT;
> > > -            msg.data |= trig_mode << APIC_TRIG_MODE_SHIFT;
> > > -
> > > -            kvm_irqchip_update_msi_route(kvm_state, i, msg, NULL);
> > > +            MSIMessage src, dst;
> > > +            struct ioapic_entry_info info;
> > > +            ioapic_entry_parse(s->ioredtbl[i], &info);
> > > +            src.address = info.addr;
> > > +            src.data = info.data;
> > > +            /* We update kernel irqchip routes with translated
> > > +             * results. */
> > > +            ret = s->int_remap(s->iommu, &src, &dst);
> > > +            if (ret) {
> > > +                DPRINTF("Int remap failed: %d, drop interrupt\n", ret);
> > > +                continue;
> > > +            }
> > > +            kvm_irqchip_update_msi_route(kvm_state, i, dst, NULL);
> >
> > The need to hook here makes me wonder if we can't inject IOAPIC
> > interrupts via KVM_SIGNAL_MSI (abstracted by kvm_irqchip_send_msi, but
> > that will pick the fast-path on kernels supporting split irqchip)
> > instead of open-coding the route changes. If we translated the IOAPIC
> > outputs always into MSIs, the need for special-casing split irqchip
> > would be gone, and the need for hooking here for IR just as well.

Hi, Jan,

IIUC, this can be achieved by removing lines in ioapic_service():

-#ifdef CONFIG_KVM
-    if (kvm_irqchip_is_split()) {
-        if (trig_mode == IOAPIC_TRIGGER_EDGE) {
-            kvm_set_irq(kvm_state, i, 1);
-            kvm_set_irq(kvm_state, i, 0);
-        } else {
-            if (!coalesce) {
-                kvm_set_irq(kvm_state, i, 1);
-            }
-        }
-        continue;
-    }
-#else
-    (void)coalesce;
-#endif

So that QEMU will automatically select the correct way to notify the
interrupt depending on whether "apic" or "kvm-apic" is used.

I suppose this is a good way to do if we are to support split
irqchip only. However, what if we move on to support irqfd and
vhost? If so, we may still need to update kernel entries into
translated ones, right? Or... did I miss anything?

>
> Will work for irqfd as well.

Michael,

Could you help explain how could I support irqfd as well using Jan's
method mentioned above?

Thanks in advance (to both :).

-- peterx
On 2016-04-18 10:55, Peter Xu wrote:
> On Sun, Apr 17, 2016 at 12:45:03PM +0300, Michael S. Tsirkin wrote:
>> On Sat, Apr 16, 2016 at 07:44:12PM -0700, Jan Kiszka wrote:
>>> On 2016-04-14 20:31, Peter Xu wrote:
> [...]
>>>> diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
>>>> index 84e8948..b993bd0 100644
>>>> --- a/hw/intc/ioapic.c
>>>> +++ b/hw/intc/ioapic.c
>>>> @@ -182,34 +182,23 @@ static void ioapic_update_kvm_routes(IOAPICCommonState *s)
>>>>  {
>>>>  #ifdef CONFIG_KVM
>>>>      int i;
>>>> +    int ret;
>>>>
>>>>      if (kvm_irqchip_is_split()) {
>>>>          for (i = 0; i < IOAPIC_NUM_PINS; i++) {
>>>> -            uint64_t entry = s->ioredtbl[i];
>>>> -            uint8_t trig_mode;
>>>> -            uint8_t delivery_mode;
>>>> -            uint8_t dest;
>>>> -            uint8_t dest_mode;
>>>> -            uint64_t pin_polarity;
>>>> -            MSIMessage msg;
>>>> -
>>>> -            trig_mode = ((entry >> IOAPIC_LVT_TRIGGER_MODE_SHIFT) & 1);
>>>> -            dest = entry >> IOAPIC_LVT_DEST_SHIFT;
>>>> -            dest_mode = (entry >> IOAPIC_LVT_DEST_MODE_SHIFT) & 1;
>>>> -            pin_polarity = (entry >> IOAPIC_LVT_TRIGGER_MODE_SHIFT) & 1;
>>>> -            delivery_mode =
>>>> -                (entry >> IOAPIC_LVT_DELIV_MODE_SHIFT) & IOAPIC_DM_MASK;
>>>> -
>>>> -            msg.address = APIC_DEFAULT_ADDRESS;
>>>> -            msg.address |= dest_mode << 2;
>>>> -            msg.address |= dest << 12;
>>>> -
>>>> -            msg.data = entry & IOAPIC_VECTOR_MASK;
>>>> -            msg.data |= delivery_mode << APIC_DELIVERY_MODE_SHIFT;
>>>> -            msg.data |= pin_polarity << APIC_POLARITY_SHIFT;
>>>> -            msg.data |= trig_mode << APIC_TRIG_MODE_SHIFT;
>>>> -
>>>> -            kvm_irqchip_update_msi_route(kvm_state, i, msg, NULL);
>>>> +            MSIMessage src, dst;
>>>> +            struct ioapic_entry_info info;
>>>> +            ioapic_entry_parse(s->ioredtbl[i], &info);
>>>> +            src.address = info.addr;
>>>> +            src.data = info.data;
>>>> +            /* We update kernel irqchip routes with translated
>>>> +             * results. */
>>>> +            ret = s->int_remap(s->iommu, &src, &dst);
>>>> +            if (ret) {
>>>> +                DPRINTF("Int remap failed: %d, drop interrupt\n", ret);
>>>> +                continue;
>>>> +            }
>>>> +            kvm_irqchip_update_msi_route(kvm_state, i, dst, NULL);
>>>
>>> The need to hook here makes me wonder if we can't inject IOAPIC
>>> interrupts via KVM_SIGNAL_MSI (abstracted by kvm_irqchip_send_msi, but
>>> that will pick the fast-path on kernels supporting split irqchip)
>>> instead of open-coding the route changes. If we translated the IOAPIC
>>> outputs always into MSIs, the need for special-casing split irqchip
>>> would be gone, and the need for hooking here for IR just as well.
>
> Hi, Jan,
>
> IIUC, this can be achieved by removing lines in ioapic_service():
>
> -#ifdef CONFIG_KVM
> -    if (kvm_irqchip_is_split()) {
> -        if (trig_mode == IOAPIC_TRIGGER_EDGE) {
> -            kvm_set_irq(kvm_state, i, 1);
> -            kvm_set_irq(kvm_state, i, 0);
> -        } else {
> -            if (!coalesce) {
> -                kvm_set_irq(kvm_state, i, 1);
> -            }
> -        }
> -        continue;
> -    }
> -#else
> -    (void)coalesce;
> -#endif
>
> So that QEMU will automatically select the correct way to notify the
> interrupt depending on whether "apic" or "kvm-apic" is used.
>
> I suppose this is a good way to do if we are to support split
> irqchip only. However, what if we move on to support irqfd and
> vhost? If so, we may still need to update kernel entries into
> translated ones, right? Or... did I miss anything?

Right, in-kernel irq sources depend on an already remapped channel
because they deliver directly to the in-kernel APICs. Thus, you will
have to establish routes for those irqfds that reflect the (cached)
IRTEs.

Jan
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 68ebc1e..104afeb 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2077,9 +2077,9 @@ static int vtd_interrupt_remap_msi(IntelIOMMUState *iommu,
     uint16_t index = 0;
     VTDIrq irq = {0};

-    assert(iommu && origin && translated);
+    assert(origin && translated);

-    if (!iommu->intr_enabled) {
+    if (!iommu || !iommu->intr_enabled) {
         memcpy(translated, origin, sizeof(*origin));
         return 0;
     }
@@ -2143,6 +2143,11 @@ static int vtd_interrupt_remap_msi(IntelIOMMUState *iommu,
     return 0;
 }

+int vtd_int_remap(void *iommu, MSIMessage *src, MSIMessage *dst)
+{
+    return vtd_interrupt_remap_msi(iommu, src, dst);
+}
+
 static uint64_t vtd_mem_ir_read(void *opaque, hwaddr addr, unsigned size)
 {
     uint64_t data = 0;
diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
index 84e8948..b993bd0 100644
--- a/hw/intc/ioapic.c
+++ b/hw/intc/ioapic.c
@@ -182,34 +182,23 @@ static void ioapic_update_kvm_routes(IOAPICCommonState *s)
 {
 #ifdef CONFIG_KVM
     int i;
+    int ret;

     if (kvm_irqchip_is_split()) {
         for (i = 0; i < IOAPIC_NUM_PINS; i++) {
-            uint64_t entry = s->ioredtbl[i];
-            uint8_t trig_mode;
-            uint8_t delivery_mode;
-            uint8_t dest;
-            uint8_t dest_mode;
-            uint64_t pin_polarity;
-            MSIMessage msg;
-
-            trig_mode = ((entry >> IOAPIC_LVT_TRIGGER_MODE_SHIFT) & 1);
-            dest = entry >> IOAPIC_LVT_DEST_SHIFT;
-            dest_mode = (entry >> IOAPIC_LVT_DEST_MODE_SHIFT) & 1;
-            pin_polarity = (entry >> IOAPIC_LVT_TRIGGER_MODE_SHIFT) & 1;
-            delivery_mode =
-                (entry >> IOAPIC_LVT_DELIV_MODE_SHIFT) & IOAPIC_DM_MASK;
-
-            msg.address = APIC_DEFAULT_ADDRESS;
-            msg.address |= dest_mode << 2;
-            msg.address |= dest << 12;
-
-            msg.data = entry & IOAPIC_VECTOR_MASK;
-            msg.data |= delivery_mode << APIC_DELIVERY_MODE_SHIFT;
-            msg.data |= pin_polarity << APIC_POLARITY_SHIFT;
-            msg.data |= trig_mode << APIC_TRIG_MODE_SHIFT;
-
-            kvm_irqchip_update_msi_route(kvm_state, i, msg, NULL);
+            MSIMessage src, dst;
+            struct ioapic_entry_info info;
+            ioapic_entry_parse(s->ioredtbl[i], &info);
+            src.address = info.addr;
+            src.data = info.data;
+            /* We update kernel irqchip routes with translated
+             * results. */
+            ret = s->int_remap(s->iommu, &src, &dst);
+            if (ret) {
+                DPRINTF("Int remap failed: %d, drop interrupt\n", ret);
+                continue;
+            }
+            kvm_irqchip_update_msi_route(kvm_state, i, dst, NULL);
         }
         kvm_irqchip_commit_routes(kvm_state);
     }
diff --git a/hw/intc/ioapic_common.c b/hw/intc/ioapic_common.c
index 1b7ec5e..d583398 100644
--- a/hw/intc/ioapic_common.c
+++ b/hw/intc/ioapic_common.c
@@ -25,6 +25,7 @@
 #include "hw/i386/ioapic.h"
 #include "hw/i386/ioapic_internal.h"
 #include "hw/sysbus.h"
+#include "hw/i386/intel_iommu.h"

 /* ioapic_no count start from 0 to MAX_IOAPICS,
  * remove as static variable from ioapic_common_init.
@@ -135,6 +136,9 @@ static void ioapic_common_realize(DeviceState *dev, Error **errp)
     info = IOAPIC_COMMON_GET_CLASS(s);
     info->realize(dev, errp);

+    s->iommu = (void *)vtd_iommu_get();
+    s->int_remap = vtd_int_remap;
+
     sysbus_init_mmio(SYS_BUS_DEVICE(s), &s->io_memory);
     ioapic_no++;
 }
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index f824957..25a9306 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -246,5 +246,7 @@ struct IntelIOMMUState {
 VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn);
 /* Get default IOMMU object */
 IntelIOMMUState *vtd_iommu_get(void);
+/* Interrupt remapping hook function */
+int vtd_int_remap(void *iommu, MSIMessage *src, MSIMessage *dst);

 #endif
diff --git a/include/hw/i386/ioapic_internal.h b/include/hw/i386/ioapic_internal.h
index d279f2d..d9070cf 100644
--- a/include/hw/i386/ioapic_internal.h
+++ b/include/hw/i386/ioapic_internal.h
@@ -102,6 +102,11 @@ struct IOAPICCommonState {
     uint8_t ioregsel;
     uint32_t irr;
     uint64_t ioredtbl[IOAPIC_NUM_PINS];
+
+    /* This should be the IOMMU that owns the IOAPIC. */
+    void *iommu;
+    /* Interrupt remapping callback */
+    int (*int_remap)(void *iommu, MSIMessage *src, MSIMessage *dst);
 };

 void ioapic_reset_common(DeviceState *dev);
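The callback arrangement in the diff above can be pictured with a small standalone model: the interrupt controller holds an opaque IOMMU pointer plus a function pointer, and the remap function passes the message through unchanged when there is no IOMMU or remapping is disabled. This is only a sketch under assumptions; the `hook_` types are hypothetical stand-ins for MSIMessage, IntelIOMMUState and IOAPICCommonState, and the "remapping" here just substitutes one fixed target message instead of walking a real remapping table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct hook_msi {
    uint64_t address;
    uint32_t data;
};

/* Hypothetical stand-in for the IOMMU: an enable flag and one fixed target. */
struct hook_iommu {
    int intr_enabled;
    struct hook_msi target;
};

typedef int (*hook_remap_fn)(void *iommu, const struct hook_msi *src,
                             struct hook_msi *dst);

/* Hypothetical stand-in for the IOAPIC state carrying the two new fields. */
struct hook_ioapic {
    void *iommu;
    hook_remap_fn int_remap;
};

/* Pass through when no IOMMU is present or remapping is off,
 * otherwise replace the message with the (toy) remapped one. */
int hook_remap(void *opaque, const struct hook_msi *src, struct hook_msi *dst)
{
    const struct hook_iommu *iommu = opaque;

    assert(src && dst);
    if (!iommu || !iommu->intr_enabled) {
        *dst = *src;
        return 0;
    }
    *dst = iommu->target;
    return 0;
}

/* What the route-update loop does per pin: translate, then use dst. */
int hook_route_message(const struct hook_ioapic *s,
                       const struct hook_msi *src, struct hook_msi *dst)
{
    assert(s && s->int_remap);
    return s->int_remap(s->iommu, src, dst);
}
```

Keeping the IOMMU behind a `void *` and a function pointer is what would let a different remapping implementation plug in later without the caller changing.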
This patch allows Intel IR work with splitted irqchip. Two more fields
are added to IOAPICCommonState to support the translation process (For
future AMD IR support, we will need to provide another AMD-specific
callback for int_remap()). In split irqchip mode, IOAPIC is working in
user space, only update kernel irq routes when entry changed. When IR is
enabled, we directly update the kernel with translated messages. It
works just like a kernel cache for the remapping entries.

Since KVM irqfd is using kernel gsi routes to deliver interrupts, as
long as we can support split irqchip, we will support irqfd as
well. Also, since kernel gsi routes will cache translated interrupts,
irqfd delivery will not suffer from any performance impact due to IR.

And, since we supported irqfd, vhost devices will be able to work
seamlessly with IR now. Logically this should contain both vhost-net and
vhost-user case.

Here we avoided capturing IOMMU IR invalidation, based on the assumption
that, guest kernel will always first update IR entry, then IOAPIC
entry. As long as guest follows this order to update IOAPIC entries, we
should be safe.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/i386/intel_iommu.c             |  9 +++++++--
 hw/intc/ioapic.c                  | 39 ++++++++++++++-------------------------
 hw/intc/ioapic_common.c           |  4 ++++
 include/hw/i386/intel_iommu.h     |  2 ++
 include/hw/i386/ioapic_internal.h |  5 +++++
 5 files changed, 32 insertions(+), 27 deletions(-)