Message ID: 20130917091840.GA12722@iris.ozlabs.ibm.com
State: New, archived
On Tue, Sep 17, 2013 at 07:18:40PM +1000, Paul Mackerras wrote:
> This implements a simple way to express the case of IRQ routing where
> there is one in-kernel PIC and the system interrupts (GSIs) are routed
> 1-1 to the corresponding pins of the PIC. This is expressed by having
> kvm->irq_routing == NULL with a skeleton irq routing entry in the new
> kvm->default_irq_route field.
>
> This provides a convenient way to provide backwards compatibility when
> adding IRQ routing capability to an existing in-kernel PIC, such as the
> XICS emulation on powerpc.
>
Why not create a simple 1-1 irq routing table? It will take a little bit
more memory, but there will be no need for kvm->irq_routing == NULL
special handling.

> Signed-off-by: Paul Mackerras <paulus@samba.org>
> ---
>  include/linux/kvm_host.h |  1 +
>  virt/kvm/eventfd.c       |  4 ++++
>  virt/kvm/irqchip.c       | 23 ++++++++++++++++++++---
>  3 files changed, 25 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 749bdb1..609f587 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -394,6 +394,7 @@ struct kvm {
>  	 * if configured, irqfds.lock.
>  	 */
>  	struct kvm_irq_routing_table __rcu *irq_routing;
> +	struct kvm_kernel_irq_routing_entry default_irq_route;
>  	struct hlist_head mask_notifier_list;
>  	struct hlist_head irq_ack_notifier_list;
>  #endif
> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index abe4d60..7010cb3 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -272,6 +272,10 @@ static void irqfd_update(struct kvm *kvm, struct _irqfd *irqfd,
>  {
>  	struct kvm_kernel_irq_routing_entry *e;
>
> +	if (!irq_rt) {
> +		rcu_assign_pointer(irqfd->irq_entry, NULL);
> +		return;
> +	}
>  	if (irqfd->gsi >= irq_rt->nr_rt_entries) {
>  		rcu_assign_pointer(irqfd->irq_entry, NULL);
>  		return;
> diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c
> index 20dc9e4..ebc2941 100644
> --- a/virt/kvm/irqchip.c
> +++ b/virt/kvm/irqchip.c
> @@ -30,13 +30,24 @@
>  #include <trace/events/kvm.h>
>  #include "irq.h"
>
> +static inline int pin_to_gsi(struct kvm *kvm, unsigned irqchip, unsigned pin)
> +{
> +	struct kvm_irq_routing_table *rt;
> +	int gsi = pin;
> +
> +	rt = rcu_dereference(kvm->irq_routing);
> +	if (rt)
> +		gsi = rt->chip[irqchip][pin];
> +	return gsi;
> +}
> +
>  bool kvm_irq_has_notifier(struct kvm *kvm, unsigned irqchip, unsigned pin)
>  {
>  	struct kvm_irq_ack_notifier *kian;
>  	int gsi;
>
>  	rcu_read_lock();
> -	gsi = rcu_dereference(kvm->irq_routing)->chip[irqchip][pin];
> +	gsi = pin_to_gsi(kvm, irqchip, pin);
>  	if (gsi != -1)
>  		hlist_for_each_entry_rcu(kian, &kvm->irq_ack_notifier_list,
>  					 link)
> @@ -59,7 +70,7 @@ void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin)
>  	trace_kvm_ack_irq(irqchip, pin);
>
>  	rcu_read_lock();
> -	gsi = rcu_dereference(kvm->irq_routing)->chip[irqchip][pin];
> +	gsi = pin_to_gsi(kvm, irqchip, pin);
>  	if (gsi != -1)
>  		hlist_for_each_entry_rcu(kian, &kvm->irq_ack_notifier_list,
>  					 link)
> @@ -126,7 +137,13 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level,
>  	 */
>  	rcu_read_lock();
>  	irq_rt = rcu_dereference(kvm->irq_routing);
> -	if (irq < irq_rt->nr_rt_entries)
> +	if (!irq_rt) {
> +		if (kvm->default_irq_route.set) {
> +			irq_set[i] = kvm->default_irq_route;
> +			irq_set[i].gsi = irq;
> +			irq_set[i++].irqchip.pin = irq;
> +		}
> +	} else if (irq < irq_rt->nr_rt_entries)
>  		hlist_for_each_entry(e, &irq_rt->map[irq], link)
>  			irq_set[i++] = *e;
>  	rcu_read_unlock();
> --
> 1.8.4.rc3

--
	Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, Sep 22, 2013 at 03:32:53PM +0300, Gleb Natapov wrote:
> On Tue, Sep 17, 2013 at 07:18:40PM +1000, Paul Mackerras wrote:
> > This implements a simple way to express the case of IRQ routing where
> > there is one in-kernel PIC and the system interrupts (GSIs) are routed
> > 1-1 to the corresponding pins of the PIC. This is expressed by having
> > kvm->irq_routing == NULL with a skeleton irq routing entry in the new
> > kvm->default_irq_route field.
> >
> > This provides a convenient way to provide backwards compatibility when
> > adding IRQ routing capability to an existing in-kernel PIC, such as the
> > XICS emulation on powerpc.
> >
> Why not create a simple 1-1 irq routing table? It will take a little bit
> more memory, but there will be no need for kvm->irq_routing == NULL
> special handling.

The short answer is that userspace wants to use interrupt source
numbers (i.e. pin numbers for the inputs to the emulated XICS) that
are scattered throughout a large space, since that mirrors what real
hardware does.  More specifically, hardware divides up the interrupt
source number into two fields, each of typically 12 bits, where the
more significant field identifies an "interrupt source unit" (ISU) and
the less significant field identifies an interrupt within the ISU.
Each PCI host bridge would have an ISU, for example, and there can be
ISUs associated with other things that attach directly to the
interconnect fabric (coprocessors, cluster interconnects, etc.).

Today, QEMU creates a virtual ISU numbered 1 for the emulated PCI host
bridge, which means for example that virtio devices get interrupt pin
numbers starting at 4096.

So, I could have increased KVM_IRQCHIP_NUM_PINS to some multiple of
4096, say 16384, which would allow for 3 ISUs.  But that would bloat
out struct kvm_irq_routing_table to over 64kB, and if I wanted 1-1
mappings between GSI and pins for all of them, the routing table would
be over 960kB.

There is a compatibility concern too -- if I want existing userspace
to run, I would have to create 1-1 default mappings for at least the
first (say) 4200 pins or so, which would use up about 294kB.  That
really doesn't seem worth it compared to just using the null routing
table pointer to indicate an unlimited 1-1 mapping.

Paul.
On Mon, Sep 23, 2013 at 09:24:14PM +1000, Paul Mackerras wrote:
> On Sun, Sep 22, 2013 at 03:32:53PM +0300, Gleb Natapov wrote:
> > On Tue, Sep 17, 2013 at 07:18:40PM +1000, Paul Mackerras wrote:
> > > This implements a simple way to express the case of IRQ routing where
> > > there is one in-kernel PIC and the system interrupts (GSIs) are routed
> > > 1-1 to the corresponding pins of the PIC. This is expressed by having
> > > kvm->irq_routing == NULL with a skeleton irq routing entry in the new
> > > kvm->default_irq_route field.
> > >
> > > This provides a convenient way to provide backwards compatibility when
> > > adding IRQ routing capability to an existing in-kernel PIC, such as the
> > > XICS emulation on powerpc.
> > >
> > Why not create a simple 1-1 irq routing table? It will take a little bit
> > more memory, but there will be no need for kvm->irq_routing == NULL
> > special handling.
>
> The short answer is that userspace wants to use interrupt source
> numbers (i.e. pin numbers for the inputs to the emulated XICS) that
> are scattered throughout a large space, since that mirrors what real
> hardware does.  More specifically, hardware divides up the interrupt
> source number into two fields, each of typically 12 bits, where the
> more significant field identifies an "interrupt source unit" (ISU) and
> the less significant field identifies an interrupt within the ISU.
> Each PCI host bridge would have an ISU, for example, and there can be
> ISUs associated with other things that attach directly to the
> interconnect fabric (coprocessors, cluster interconnects, etc.).
>
> Today, QEMU creates a virtual ISU numbered 1 for the emulated PCI host
> bridge, which means for example that virtio devices get interrupt pin
> numbers starting at 4096.
>
> So, I could have increased KVM_IRQCHIP_NUM_PINS to some multiple of
> 4096, say 16384, which would allow for 3 ISUs.  But that would bloat
> out struct kvm_irq_routing_table to over 64kB, and if I wanted 1-1
> mappings between GSI and pins for all of them, the routing table would
> be over 960kB.
>
Yes, this is not an option. GSI is just a cookie for anything but x86
non-MSI interrupts. So the way to use the irq routing table to deliver
XICS irqs is to register a GSI->XICS irq mapping; by triggering the
"GSI", which is just an arbitrary number, userspace tells the kernel
that the XICS irq registered for that GSI should be injected.

> There is a compatibility concern too -- if I want existing userspace
> to run, I would have to create 1-1 default mappings for at least the
> first (say) 4200 pins or so, which would use up about 294kB.  That
> really doesn't seem worth it compared to just using the null routing
> table pointer to indicate an unlimited 1-1 mapping.
>
Let me check that I understand you correctly. Existing userspace already
uses XICS irqs directly and now you want to switch this interface to use
irq routing. New userspace will register a GSI->XICS irq mapping as
described above; this is what [2/2] does. Is this correct?

--
	Gleb.
On Mon, Sep 23, 2013 at 09:34:01PM +0300, Gleb Natapov wrote:
> On Mon, Sep 23, 2013 at 09:24:14PM +1000, Paul Mackerras wrote:
> > On Sun, Sep 22, 2013 at 03:32:53PM +0300, Gleb Natapov wrote:
> > > On Tue, Sep 17, 2013 at 07:18:40PM +1000, Paul Mackerras wrote:
> > > > This implements a simple way to express the case of IRQ routing where
> > > > there is one in-kernel PIC and the system interrupts (GSIs) are routed
> > > > 1-1 to the corresponding pins of the PIC. This is expressed by having
> > > > kvm->irq_routing == NULL with a skeleton irq routing entry in the new
> > > > kvm->default_irq_route field.
> > > >
> > > > This provides a convenient way to provide backwards compatibility when
> > > > adding IRQ routing capability to an existing in-kernel PIC, such as the
> > > > XICS emulation on powerpc.
> > > >
> > > Why not create a simple 1-1 irq routing table? It will take a little bit
> > > more memory, but there will be no need for kvm->irq_routing == NULL
> > > special handling.
> >
> > The short answer is that userspace wants to use interrupt source
> > numbers (i.e. pin numbers for the inputs to the emulated XICS) that
> > are scattered throughout a large space, since that mirrors what real
> > hardware does.  More specifically, hardware divides up the interrupt
> > source number into two fields, each of typically 12 bits, where the
> > more significant field identifies an "interrupt source unit" (ISU) and
> > the less significant field identifies an interrupt within the ISU.
> > Each PCI host bridge would have an ISU, for example, and there can be
> > ISUs associated with other things that attach directly to the
> > interconnect fabric (coprocessors, cluster interconnects, etc.).
> >
> > Today, QEMU creates a virtual ISU numbered 1 for the emulated PCI host
> > bridge, which means for example that virtio devices get interrupt pin
> > numbers starting at 4096.
> >
> > So, I could have increased KVM_IRQCHIP_NUM_PINS to some multiple of
> > 4096, say 16384, which would allow for 3 ISUs.  But that would bloat
> > out struct kvm_irq_routing_table to over 64kB, and if I wanted 1-1
> > mappings between GSI and pins for all of them, the routing table would
> > be over 960kB.
> >
> Yes, this is not an option. GSI is just a cookie for anything but x86
> non-MSI interrupts. So the way to use the irq routing table to deliver
> XICS irqs is to register a GSI->XICS irq mapping; by triggering the
> "GSI", which is just an arbitrary number, userspace tells the kernel
> that the XICS irq registered for that GSI should be injected.

Yes, that's fine as far as it goes, but the trouble is that the
existing data structures (specifically the chip[][] array in struct
kvm_irq_routing_table) don't handle well the case where the pin
numbers are large and/or sparse.

In other words, using a small compact set of GSI numbers wouldn't
help, because it's not the GSI -> pin mapping that is the problem, it
is the reverse pin -> GSI mapping.

> > There is a compatibility concern too -- if I want existing userspace
> > to run, I would have to create 1-1 default mappings for at least the
> > first (say) 4200 pins or so, which would use up about 294kB.  That
> > really doesn't seem worth it compared to just using the null routing
> > table pointer to indicate an unlimited 1-1 mapping.
> >
> Let me check that I understand you correctly. Existing userspace already
> uses XICS irqs directly and now you want to switch this interface to use
> irq routing. New userspace will register a GSI->XICS irq mapping as
> described above; this is what [2/2] does. Is this correct?

I don't particularly want irq routing, but it appears to be
unavoidable if one wants IRQFD, which I do want.  There is no
particular advantage to userspace in using a GSI -> XICS irq mapping.
If userspace did set one up it would be the identity mapping.

Paul.
On Thu, Sep 26, 2013 at 10:00:59AM +1000, Paul Mackerras wrote:
> On Mon, Sep 23, 2013 at 09:34:01PM +0300, Gleb Natapov wrote:
> > On Mon, Sep 23, 2013 at 09:24:14PM +1000, Paul Mackerras wrote:
> > > On Sun, Sep 22, 2013 at 03:32:53PM +0300, Gleb Natapov wrote:
> > > > On Tue, Sep 17, 2013 at 07:18:40PM +1000, Paul Mackerras wrote:
> > > > > This implements a simple way to express the case of IRQ routing where
> > > > > there is one in-kernel PIC and the system interrupts (GSIs) are routed
> > > > > 1-1 to the corresponding pins of the PIC. This is expressed by having
> > > > > kvm->irq_routing == NULL with a skeleton irq routing entry in the new
> > > > > kvm->default_irq_route field.
> > > > >
> > > > > This provides a convenient way to provide backwards compatibility when
> > > > > adding IRQ routing capability to an existing in-kernel PIC, such as the
> > > > > XICS emulation on powerpc.
> > > > >
> > > > Why not create a simple 1-1 irq routing table? It will take a little bit
> > > > more memory, but there will be no need for kvm->irq_routing == NULL
> > > > special handling.
> > >
> > > The short answer is that userspace wants to use interrupt source
> > > numbers (i.e. pin numbers for the inputs to the emulated XICS) that
> > > are scattered throughout a large space, since that mirrors what real
> > > hardware does.  More specifically, hardware divides up the interrupt
> > > source number into two fields, each of typically 12 bits, where the
> > > more significant field identifies an "interrupt source unit" (ISU) and
> > > the less significant field identifies an interrupt within the ISU.
> > > Each PCI host bridge would have an ISU, for example, and there can be
> > > ISUs associated with other things that attach directly to the
> > > interconnect fabric (coprocessors, cluster interconnects, etc.).
> > >
> > > Today, QEMU creates a virtual ISU numbered 1 for the emulated PCI host
> > > bridge, which means for example that virtio devices get interrupt pin
> > > numbers starting at 4096.
> > >
> > > So, I could have increased KVM_IRQCHIP_NUM_PINS to some multiple of
> > > 4096, say 16384, which would allow for 3 ISUs.  But that would bloat
> > > out struct kvm_irq_routing_table to over 64kB, and if I wanted 1-1
> > > mappings between GSI and pins for all of them, the routing table would
> > > be over 960kB.
> > >
> > Yes, this is not an option. GSI is just a cookie for anything but x86
> > non-MSI interrupts. So the way to use the irq routing table to deliver
> > XICS irqs is to register a GSI->XICS irq mapping; by triggering the
> > "GSI", which is just an arbitrary number, userspace tells the kernel
> > that the XICS irq registered for that GSI should be injected.
>
> Yes, that's fine as far as it goes, but the trouble is that the
> existing data structures (specifically the chip[][] array in struct
> kvm_irq_routing_table) don't handle well the case where the pin
> numbers are large and/or sparse.
>
> In other words, using a small compact set of GSI numbers wouldn't
> help, because it's not the GSI -> pin mapping that is the problem, it
> is the reverse pin -> GSI mapping.
>
That's an internal implementation detail that can be redesigned if
needed; we can let the architecture define how an irq is mapped back to
a GSI (cookie). But since you mentioned chip[][] here, I have a question
about patch 2: in kvm_set_routing_entry() you check irq against
KVM_IRQCHIP_NUM_PINS, which is defined to be 256 for powerpc, but
according to the 12-bit ISU/irq division you described above that is not
enough to inject any interrupt with ISU 1. You would have had at least
256 irqs in each ISU if you had reused irqchip to specify the ISU, but
you didn't do that. Why not extend the union in kvm_irq_routing_entry
with a XICS-specific structure that specifies the ISU and the irq
within the ISU?

How are irq numbers inside an ISU assigned?

--
	Gleb.
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 749bdb1..609f587 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -394,6 +394,7 @@ struct kvm {
 	 * if configured, irqfds.lock.
 	 */
 	struct kvm_irq_routing_table __rcu *irq_routing;
+	struct kvm_kernel_irq_routing_entry default_irq_route;
 	struct hlist_head mask_notifier_list;
 	struct hlist_head irq_ack_notifier_list;
 #endif
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index abe4d60..7010cb3 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -272,6 +272,10 @@ static void irqfd_update(struct kvm *kvm, struct _irqfd *irqfd,
 {
 	struct kvm_kernel_irq_routing_entry *e;
 
+	if (!irq_rt) {
+		rcu_assign_pointer(irqfd->irq_entry, NULL);
+		return;
+	}
 	if (irqfd->gsi >= irq_rt->nr_rt_entries) {
 		rcu_assign_pointer(irqfd->irq_entry, NULL);
 		return;
diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c
index 20dc9e4..ebc2941 100644
--- a/virt/kvm/irqchip.c
+++ b/virt/kvm/irqchip.c
@@ -30,13 +30,24 @@
 #include <trace/events/kvm.h>
 #include "irq.h"
 
+static inline int pin_to_gsi(struct kvm *kvm, unsigned irqchip, unsigned pin)
+{
+	struct kvm_irq_routing_table *rt;
+	int gsi = pin;
+
+	rt = rcu_dereference(kvm->irq_routing);
+	if (rt)
+		gsi = rt->chip[irqchip][pin];
+	return gsi;
+}
+
 bool kvm_irq_has_notifier(struct kvm *kvm, unsigned irqchip, unsigned pin)
 {
 	struct kvm_irq_ack_notifier *kian;
 	int gsi;
 
 	rcu_read_lock();
-	gsi = rcu_dereference(kvm->irq_routing)->chip[irqchip][pin];
+	gsi = pin_to_gsi(kvm, irqchip, pin);
 	if (gsi != -1)
 		hlist_for_each_entry_rcu(kian, &kvm->irq_ack_notifier_list,
 					 link)
@@ -59,7 +70,7 @@ void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin)
 	trace_kvm_ack_irq(irqchip, pin);
 
 	rcu_read_lock();
-	gsi = rcu_dereference(kvm->irq_routing)->chip[irqchip][pin];
+	gsi = pin_to_gsi(kvm, irqchip, pin);
 	if (gsi != -1)
 		hlist_for_each_entry_rcu(kian, &kvm->irq_ack_notifier_list,
 					 link)
@@ -126,7 +137,13 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level,
 	 */
 	rcu_read_lock();
 	irq_rt = rcu_dereference(kvm->irq_routing);
-	if (irq < irq_rt->nr_rt_entries)
+	if (!irq_rt) {
+		if (kvm->default_irq_route.set) {
+			irq_set[i] = kvm->default_irq_route;
+			irq_set[i].gsi = irq;
+			irq_set[i++].irqchip.pin = irq;
+		}
+	} else if (irq < irq_rt->nr_rt_entries)
 		hlist_for_each_entry(e, &irq_rt->map[irq], link)
 			irq_set[i++] = *e;
 	rcu_read_unlock();
This implements a simple way to express the case of IRQ routing where
there is one in-kernel PIC and the system interrupts (GSIs) are routed
1-1 to the corresponding pins of the PIC. This is expressed by having
kvm->irq_routing == NULL with a skeleton irq routing entry in the new
kvm->default_irq_route field.

This provides a convenient way to provide backwards compatibility when
adding IRQ routing capability to an existing in-kernel PIC, such as the
XICS emulation on powerpc.

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 include/linux/kvm_host.h |  1 +
 virt/kvm/eventfd.c       |  4 ++++
 virt/kvm/irqchip.c       | 23 ++++++++++++++++++++---
 3 files changed, 25 insertions(+), 3 deletions(-)