Message ID: 20151030210946.8538.26203.stgit@dwillia2-desk3.amr.corp.intel.com
State: Not Applicable
Delegated to: David Miller
On Fri, Oct 30, 2015 at 05:09:46PM -0400, Dan Williams wrote:
> For high frequency I/O the overhead of threaded interrupts impacts
> performance. Add an option to make it configurable, with the default
> being hardirq.
>
> A quick out-of-the-box test (i.e. no affinity tuning) shows ~10% random
> read performance at ~20% less cpu. The cpu wins appear to be from
> reduced lock contention.

Do we need threaded irq at all? Why not just switch to hardirq?

Thanks.
On Fri, Oct 30, 2015 at 5:56 PM, Tejun Heo <tj@kernel.org> wrote:
> On Fri, Oct 30, 2015 at 05:09:46PM -0400, Dan Williams wrote:
>> For high frequency I/O the overhead of threaded interrupts impacts
>> performance. Add an option to make it configurable, with the default
>> being hardirq.
>>
>> A quick out-of-the-box test (i.e. no affinity tuning) shows ~10% random
>> read performance at ~20% less cpu. The cpu wins appear to be from
>> reduced lock contention.
>
> Do we need threaded irq at all? Why not just switch to hardirq?
>

I can't imagine anyone doing high iops storage to also rely on the
ability to preempt the irq handler. I'm assuming if someone notices
it missing they can scream, but otherwise hardirq seems all around
better.

NVMe also has this optional via module parameter, but talking to Keith
he does not know of anyone using it.

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Oct 30, 2015 at 06:59:14PM -0700, Dan Williams wrote:
> On Fri, Oct 30, 2015 at 5:56 PM, Tejun Heo <tj@kernel.org> wrote:
> > On Fri, Oct 30, 2015 at 05:09:46PM -0400, Dan Williams wrote:
> >> For high frequency I/O the overhead of threaded interrupts impacts
> >> performance. Add an option to make it configurable, with the default
> >> being hardirq.
> >>
> >> A quick out-of-the-box test (i.e. no affinity tuning) shows ~10% random
> >> read performance at ~20% less cpu. The cpu wins appear to be from
> >> reduced lock contention.
> >
> > Do we need threaded irq at all? Why not just switch to hardirq?
> >
>
> I can't imagine anyone doing high iops storage to also rely on the
> ability to preempt the irq handler. I'm assuming if someone notices
> it missing they can scream, but otherwise hardirq seems all around
> better.
>
> NVMe also has this optional via module parameter, but talking to Keith
> he does not know of anyone using it.

Let's remove it for now and do the conditional thing if anybody misses
it. No need to keep around dead code proactively.

Thanks.
On 15-10-30 10:01 PM, Tejun Heo wrote:
> On Fri, Oct 30, 2015 at 06:59:14PM -0700, Dan Williams wrote:
>> On Fri, Oct 30, 2015 at 5:56 PM, Tejun Heo <tj@kernel.org> wrote:
>>> On Fri, Oct 30, 2015 at 05:09:46PM -0400, Dan Williams wrote:
>>>> For high frequency I/O the overhead of threaded interrupts impacts
>>>> performance. Add an option to make it configurable, with the default
>>>> being hardirq.
>>>>
>>>> A quick out-of-the-box test (i.e. no affinity tuning) shows ~10% random
>>>> read performance at ~20% less cpu. The cpu wins appear to be from
>>>> reduced lock contention.
>>>
>>> Do we need threaded irq at all? Why not just switch to hardirq?
>>>
>>
>> I can't imagine anyone doing high iops storage to also rely on the
>> ability to preempt the irq handler. I'm assuming if someone notices
>> it missing they can scream, but otherwise hardirq seems all around
>> better.
>>
>> NVMe also has this optional via module parameter, but talking to Keith
>> he does not know of anyone using it.
>
> Let's remove it for now and do the conditional thing if anybody misses
> it. No need to keep around dead code proactively.

Aren't threaded IRQs there to improve overall system real-time response?
If so, it would seem a step backwards to remove them completely,
especially considering the low cost of maintaining them here.

Cheers
Hello, Mark.

On Mon, Nov 16, 2015 at 08:52:35AM -0500, Mark Lord wrote:
> > Let's remove it for now and do the conditional thing if anybody misses
> > it. No need to keep around dead code proactively.
>
> Aren't threaded IRQs there to improve overall system real-time response?
> If so, it would seem a step backwards to remove them completely,
> especially considering the low cost of maintaining them here.

For ahci, it was added without any justification. It just came with
the MSI changes. If there's any data indicating that this makes any
noticeable difference, it's easy to restore.

Thanks.
On Mon, Nov 16, 2015 at 7:33 AM, Tejun Heo <tj@kernel.org> wrote:
> Hello, Mark.
>
> On Mon, Nov 16, 2015 at 08:52:35AM -0500, Mark Lord wrote:
>> > Let's remove it for now and do the conditional thing if anybody misses
>> > it. No need to keep around dead code proactively.
>>
>> Aren't threaded IRQs there to improve overall system real-time response?
>> If so, it would seem a step backwards to remove them completely,
>> especially considering the low cost of maintaining them here.
>
> For ahci, it was added without any justification. It just came with
> the MSI changes. If there's any data indicating that this makes any
> noticeable difference, it's easy to restore.
>

Also, the non-multi-MSI case has been living without threaded interrupts
since forever. If they were needed I expect the functionality would have
been ported over by now. As is, this provides a noticeable throughput
improvement.
diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
index c6f098a0435c..69bbd679d6aa 100644
--- a/drivers/ata/libahci.c
+++ b/drivers/ata/libahci.c
@@ -57,6 +57,10 @@ MODULE_PARM_DESC(skip_host_reset, "skip global host reset (0=don't skip, 1=skip)
 module_param_named(ignore_sss, ahci_ignore_sss, int, 0444);
 MODULE_PARM_DESC(ignore_sss, "Ignore staggered spinup flag (0=don't ignore, 1=ignore)");
 
+static int use_threaded_interrupts;
+module_param(use_threaded_interrupts, int, 0444);
+MODULE_PARM_DESC(use_threaded_interrupts, "Defer interrupt handling to a thread (0=don't defer, 1=defer)");
+
 static int ahci_set_lpm(struct ata_link *link, enum ata_lpm_policy policy,
 			unsigned hints);
 static ssize_t ahci_led_show(struct ata_port *ap, char *buf);
@@ -1826,6 +1830,26 @@ static irqreturn_t ahci_multi_irqs_intr(int irq, void *dev_instance)
 	return IRQ_WAKE_THREAD;
 }
 
+static irqreturn_t ahci_multi_irqs_intr_hard(int irq, void *dev_instance)
+{
+	struct ata_port *ap = dev_instance;
+	void __iomem *port_mmio = ahci_port_base(ap);
+	u32 status;
+
+	VPRINTK("ENTER\n");
+
+	status = readl(port_mmio + PORT_IRQ_STAT);
+	writel(status, port_mmio + PORT_IRQ_STAT);
+
+	spin_lock(ap->lock);
+	ahci_handle_port_interrupt(ap, port_mmio, status);
+	spin_unlock(ap->lock);
+
+	VPRINTK("EXIT\n");
+
+	return IRQ_HANDLED;
+}
+
 static u32 ahci_handle_port_intr(struct ata_host *host, u32 irq_masked)
 {
 	unsigned int i, handled = 0;
@@ -2492,10 +2516,16 @@ static int ahci_host_activate_multi_irqs(struct ata_host *host,
 			continue;
 		}
 
-		rc = devm_request_threaded_irq(host->dev, irq,
-					       ahci_multi_irqs_intr,
-					       ahci_port_thread_fn, 0,
-					       pp->irq_desc, host->ports[i]);
+		if (use_threaded_interrupts)
+			rc = devm_request_threaded_irq(host->dev, irq,
+					ahci_multi_irqs_intr,
+					ahci_port_thread_fn, 0,
+					pp->irq_desc, host->ports[i]);
+		else
+			rc = devm_request_irq(host->dev, irq,
+					ahci_multi_irqs_intr_hard, 0,
+					pp->irq_desc, host->ports[i]);
+
 		if (rc)
 			return rc;
 		ata_port_desc(host->ports[i], "irq %d", irq);
For high frequency I/O the overhead of threaded interrupts impacts
performance. Add an option to make it configurable, with the default
being hardirq.

A quick out-of-the-box test (i.e. no affinity tuning) shows ~10% better
random read performance at ~20% less cpu. The cpu wins appear to be from
reduced lock contention.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/ata/libahci.c | 38 ++++++++++++++++++++++++++++++++++----
 1 file changed, 34 insertions(+), 4 deletions(-)
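With the patch applied, the new knob would presumably be exercised like any
other libahci module parameter. The lines below are a hedged usage sketch,
not something stated in the thread — they assume libahci is built as a
module and that the parameter keeps the name and 0444 permissions shown in
the diff:

```
# opt back into threaded interrupt handling (default is 0 = hardirq)
modprobe libahci use_threaded_interrupts=1

# or, for a built-in libahci, on the kernel command line:
#   libahci.use_threaded_interrupts=1

# read the current setting (read-only at runtime, perms 0444)
cat /sys/module/libahci/parameters/use_threaded_interrupts
```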