Message ID | 20140410115706.662fb5e7@annuminas.surriel.com |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
On Thu, 2014-04-10 at 11:57 -0400, Rik van Riel wrote: > Jiri noticed that netperf throughput had gotten worse in recent years, > for smaller message sizes. In the past, ksoftirqd would take around 80% > of a CPU, and netserver would take around 100% of another CPU. > > On current kernels, sometimes all the softirq processing is done in the > context of the netperf process, which can result in as much as a 50% > performance drop, due to netserver spending all its CPU time "delivering" > packets to a socket it rarely empties, and dropping the packets on the > floor as a result. > > This seems silly in an age where even cell phones are multi-core, and > we could simply let the ksoftirqd thread handle the softirq load, so > the scheduler can migrate the userspace task to another CPU. > > This patch accomplishes that in a very simplistic way. The code > remembers when __do_softirq last looped, and will punt softirq > handling to ksoftirqd if another softirq happens in the same jiffie. > > Netperf results: > > without patch with patch > UDP_STREAM 1472 957.17 / 954.18 957.15 / 951.73 > UDP_STREAM 978 936.85 / 930.06 936.84 / 927.63 > UDP_STREAM 466 875.98 / 865.62 875.98 / 868.65 > UDP_STREAM 210 760.88 / 748.70 760.88 / 748.61 > UDP_STREAM 82 554.06 / 329.96 554.06 / 505.95 > unstable ^^^^^^ > UDP_STREAM 18 158.99 / 108.95 160.73 / 112.68 > > Signed-off-by: Rik van Riel <riel@redhat.com> > Cc: Frederic Weisbecker <fweisbec@gmail.com> > Cc: David Miller <davem@davemloft.net> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Peter Zijlstra <peterz@infradead.org> > Tested-by: Jiri Benc <jbenc@redhat.com> > Reported-by: Jiri Benc <jbenc@redhat.com> > --- > kernel/softirq.c | 10 +++++++++- > 1 file changed, 9 insertions(+), 1 deletion(-) > > diff --git a/kernel/softirq.c b/kernel/softirq.c > index 787b3a0..020be2f 100644 > --- a/kernel/softirq.c > +++ b/kernel/softirq.c > @@ -56,6 +56,7 @@ EXPORT_SYMBOL(irq_stat); > static struct softirq_action softirq_vec[NR_SOFTIRQS] __cacheline_aligned_in_smp; > > DEFINE_PER_CPU(struct task_struct *, ksoftirqd); > +DEFINE_PER_CPU(unsigned long, softirq_looped); > > char *softirq_to_name[NR_SOFTIRQS] = { > "HI", "TIMER", "NET_TX", "NET_RX", "BLOCK", "BLOCK_IOPOLL", > @@ -271,6 +272,9 @@ asmlinkage void __do_softirq(void) > > pending = local_softirq_pending(); > if (pending) { > + /* Still busy? Remember this for invoke_softirq() below... */ > + this_cpu_write(softirq_looped, jiffies); > + > if (time_before(jiffies, end) && !need_resched() && > --max_restart) > goto restart; > @@ -330,7 +334,11 @@ void irq_enter(void) > > static inline void invoke_softirq(void) > { > - if (!force_irqthreads) { > + /* > + * If force_irqthreads is set, or if we looped in __do_softirq this > + * jiffie, punt to ksoftirqd to prevent userland starvation. > + */ > + if (!force_irqthreads && this_cpu_read(softirq_looped) != jiffies) { > /* > * We can safely execute softirq on the current stack if > * it is the irq stack, because it should be near empty I guess this is the tradeoff between latencies and throughput. Have you tried some TCP_RR / UDP_RR tests with one / multiple instances, and have you tried drivers that use deferred skb freeing (hard irq calling TX completion handler, then dev_kfree_skb_any() scheduling TX softirq) and increase chance of having a not zero local_softirq_pending() Calling skb destructor on a different cpu can have a huge false sharing effect. A TCP socket is really big. Your test only UDP_STREAM stresses the RX part, and UDP sockets dont use the complex callbacks TCP sockets use. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Rik van Riel <riel@redhat.com> Date: Thu, 10 Apr 2014 11:57:06 -0400 > @@ -330,7 +334,11 @@ void irq_enter(void) > > static inline void invoke_softirq(void) > { > - if (!force_irqthreads) { > + /* > + * If force_irqthreads is set, or if we looped in __do_softirq this > + * jiffie, punt to ksoftirqd to prevent userland starvation. > + */ > + if (!force_irqthreads && this_cpu_read(softirq_looped) != jiffies) { If we do this, which I'm not convinced of yet, I think we should use two jiffies as the cutoff. Because if we are at the tail end of one jiffie we'll prematurely go to ksoftirqd when we really have no reason to do so. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04/11/2014 04:33 PM, David Miller wrote: > From: Rik van Riel <riel@redhat.com> > Date: Thu, 10 Apr 2014 11:57:06 -0400 > >> @@ -330,7 +334,11 @@ void irq_enter(void) >> >> static inline void invoke_softirq(void) >> { >> - if (!force_irqthreads) { >> + /* >> + * If force_irqthreads is set, or if we looped in __do_softirq this >> + * jiffie, punt to ksoftirqd to prevent userland starvation. >> + */ >> + if (!force_irqthreads && this_cpu_read(softirq_looped) != jiffies) { > > If we do this, which I'm not convinced of yet, I think we should use two > jiffies as the cutoff. I am not fully convinced, either. This patch mostly just illustrates the problem, and gives something that solves Jiri's immediate problem. It is quite likely that there is a better way to solve the problem of: 1) softirq handling starving out userspace processing, 2) which could be solved by moving the userspace process elsewhere, and 3) shifting softirq processing to ksoftirqd A working patch seems to be one of the better ways to start a discussion, though. If anybody has a nicer idea on how to solve the problem, I'd even be willing to implement your idea, and give Jiri another patch to test :)
diff --git a/kernel/softirq.c b/kernel/softirq.c index 787b3a0..020be2f 100644 --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -56,6 +56,7 @@ EXPORT_SYMBOL(irq_stat); static struct softirq_action softirq_vec[NR_SOFTIRQS] __cacheline_aligned_in_smp; DEFINE_PER_CPU(struct task_struct *, ksoftirqd); +DEFINE_PER_CPU(unsigned long, softirq_looped); char *softirq_to_name[NR_SOFTIRQS] = { "HI", "TIMER", "NET_TX", "NET_RX", "BLOCK", "BLOCK_IOPOLL", @@ -271,6 +272,9 @@ asmlinkage void __do_softirq(void) pending = local_softirq_pending(); if (pending) { + /* Still busy? Remember this for invoke_softirq() below... */ + this_cpu_write(softirq_looped, jiffies); + if (time_before(jiffies, end) && !need_resched() && --max_restart) goto restart; @@ -330,7 +334,11 @@ void irq_enter(void) static inline void invoke_softirq(void) { - if (!force_irqthreads) { + /* + * If force_irqthreads is set, or if we looped in __do_softirq this + * jiffie, punt to ksoftirqd to prevent userland starvation. + */ + if (!force_irqthreads && this_cpu_read(softirq_looped) != jiffies) { /* * We can safely execute softirq on the current stack if * it is the irq stack, because it should be near empty