Message ID | 4A008A72.6030607@cosmosbay.com |
---|---|
State | Not Applicable, archived |
Delegated to: | David Miller |
On Tue, May 05, 2009 at 08:50:26PM +0200, Eric Dumazet wrote:

> > I have tried with IRQs bound to one CPU per NIC. Same result.
>
> Did you check with "grep eth /proc/interrupts" that your affinity settings
> were indeed taken into account?
>
> You should use the same CPU for eth0 and eth2 (bond0),
>
> and another CPU for eth1 and eth3 (bond1)

OK, the best result is when I assign all IRQs to the same CPU. Zero drops.

When I bind the slaves of a bond interface to the same CPU, I start to get
some drops, but far fewer than before. I haven't experimented with other combinations.

My problem is that, after applying your accounting patch below, one of my
HTB servers reports only 30-40% CPU idle on one of the cores. That won't
last me very long; load balancing across cores is needed.

Is there any way at least to balance individual NICs on a per-core basis?
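Binding a NIC's interrupt to one CPU, as Eric suggests, means writing a hexadecimal CPU bitmask to /proc/irq/<N>/smp_affinity, where <N> is the IRQ number listed for that NIC in /proc/interrupts. The following is a minimal C sketch of that step, not code from this thread: the helper names are hypothetical, only CPUs 0-31 are handled so the mask fits in a single 32-bit hex group, and the write requires root.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the smp_affinity mask selecting one CPU
 * into buf as hex (CPU 0 -> "1", CPU 5 -> "20").  Only CPUs 0..31 are
 * handled so the mask fits in one 32-bit group.  Returns 0 or -1. */
int affinity_mask_for_cpu(unsigned cpu, char *buf, size_t len)
{
    if (cpu >= 32 || buf == NULL || len == 0)
        return -1;
    int n = snprintf(buf, len, "%x", (unsigned)(UINT32_C(1) << cpu));
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: build "/proc/irq/<irq>/smp_affinity". */
int affinity_path_for_irq(unsigned irq, char *buf, size_t len)
{
    if (buf == NULL || len == 0)
        return -1;
    int n = snprintf(buf, len, "/proc/irq/%u/smp_affinity", irq);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Bind one IRQ to one CPU.  Requires root; returns 0 on success. */
int bind_irq_to_cpu(unsigned irq, unsigned cpu)
{
    char path[64], mask[16];
    if (affinity_path_for_irq(irq, path, sizeof path) != 0 ||
        affinity_mask_for_cpu(cpu, mask, sizeof mask) != 0)
        return -1;

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%s\n", mask) > 0;
    /* stdio buffers the text until fclose(), so a write error
     * returned by the kernel is usually reported here. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

For the bonding layout Eric describes, one would call bind_irq_to_cpu() with the IRQs of eth0 and eth2 and CPU 0, then with the IRQs of eth1 and eth3 and CPU 1. The IRQ numbers are machine-specific and have to be read from /proc/interrupts first.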
On Wed, 6 May 2009 02:50:08 +0300
Vladimir Ivashchenko <hazard@francoudi.com> wrote:

> On Tue, May 05, 2009 at 08:50:26PM +0200, Eric Dumazet wrote:
>
> > > I have tried with IRQs bound to one CPU per NIC. Same result.
> >
> > Did you check with "grep eth /proc/interrupts" that your affinity settings
> > were indeed taken into account?
> >
> > You should use the same CPU for eth0 and eth2 (bond0),
> >
> > and another CPU for eth1 and eth3 (bond1)
>
> OK, the best result is when I assign all IRQs to the same CPU. Zero drops.
>
> When I bind the slaves of a bond interface to the same CPU, I start to get
> some drops, but far fewer than before. I haven't experimented with other combinations.
>
> My problem is that, after applying your accounting patch below, one of my
> HTB servers reports only 30-40% CPU idle on one of the cores. That won't
> last me very long; load balancing across cores is needed.
>
> Is there any way at least to balance individual NICs on a per-core basis?
>

The user-level irqbalance program is a good place to start:
  http://www.irqbalance.org/
But it doesn't yet know how to handle multi-queue devices, and it
doesn't seem to handle NUMA (such as SMP Nehalem) perfectly.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
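Whether manual affinity settings or irqbalance actually moved a NIC's interrupts can be checked from the per-CPU counters in /proc/interrupts, which is what Eric's grep looks at. The sketch below parses one line of that file. It assumes the common layout (IRQ number, colon, one decimal counter per CPU column in the header line, then chip and device names); the struct and function names are hypothetical, and named rows such as NMI: or LOC: are simply rejected.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CPUS 64

struct irq_line {
    unsigned irq;
    unsigned ncpus;
    unsigned long long count[MAX_CPUS]; /* interrupts serviced per CPU */
    const char *rest;                   /* chip and device names, points into the input */
};

/* Parse e.g. "  48:   12345   0   PCI-MSI-edge   eth0".  ncpus is the
 * number of CPU columns in the header.  Returns 0 on success, -1 otherwise. */
int parse_irq_line(const char *line, unsigned ncpus, struct irq_line *out)
{
    char *end;
    if (ncpus == 0 || ncpus > MAX_CPUS)
        return -1;
    unsigned long irq = strtoul(line, &end, 10);
    if (end == line || *end != ':')
        return -1; /* not a numbered IRQ row */

    const char *p = end + 1;
    for (unsigned i = 0; i < ncpus; i++) {
        unsigned long long v = strtoull(p, &end, 10);
        if (end == p)
            return -1; /* fewer counters than CPU columns */
        out->count[i] = v;
        p = end;
    }
    while (isspace((unsigned char)*p))
        p++;
    out->irq = (unsigned)irq;
    out->ncpus = ncpus;
    out->rest = p;
    return 0;
}

/* Index of the CPU that has serviced the most interrupts for this IRQ. */
int busiest_cpu(const struct irq_line *l)
{
    unsigned best = 0;
    for (unsigned i = 1; i < l->ncpus; i++)
        if (l->count[i] > l->count[best])
            best = i;
    return (int)best;
}

/* True if the last whitespace-separated token is exactly dev ("eth1"
 * does not match "eth10"). */
int irq_device_is(const struct irq_line *l, const char *dev)
{
    const char *s = l->rest;
    size_t n = strlen(s);
    while (n > 0 && isspace((unsigned char)s[n - 1]))
        n--; /* drop trailing newline */
    size_t start = n;
    while (start > 0 && !isspace((unsigned char)s[start - 1]))
        start--;
    return n - start == strlen(dev) && strncmp(s + start, dev, n - start) == 0;
}
```

The counters are cumulative since boot, so to see where an interrupt is being handled now, it is more reliable to read the file twice a few seconds apart and compare the per-CPU differences.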
* Eric Dumazet <dada1@cosmosbay.com> wrote:

> Vladimir Ivashchenko wrote:
> >>> On both kernels, the system is running with at least 70% idle CPU.
> >>> The network interrupts are distributed across the cores.
> >> You should not distribute interrupts, but bind a NIC to one CPU
> >
> > Kernels 2.6.28 and 2.6.29 do this by default, so I thought it was correct.
> > Are the defaults wrong?
>
> Yes they are, at least for forwarding setups.
>
> >
> > I have tried with IRQs bound to one CPU per NIC. Same result.
>
> Did you check with "grep eth /proc/interrupts" that your affinity settings
> were indeed taken into account?
>
> You should use the same CPU for eth0 and eth2 (bond0),
>
> and another CPU for eth1 and eth3 (bond1)
>
> Check how your CPUs are set up:
>
> egrep 'physical id|core id|processor' /proc/cpuinfo
>
> because you might want to experiment to find the best combination.
>
>
> If you use 2.6.29, apply the following patch to get better system accounting,
> so you can check whether your CPUs are saturated by hard/soft IRQs:
>
> --- linux-2.6.29/kernel/sched.c.orig	2009-05-05 20:46:49.000000000 +0200
> +++ linux-2.6.29/kernel/sched.c	2009-05-05 20:47:19.000000000 +0200
> @@ -4290,7 +4290,7 @@
>  
>  	if (user_tick)
>  		account_user_time(p, one_jiffy, one_jiffy_scaled);
> -	else if (p != rq->idle)
> +	else if ((p != rq->idle) || (irq_count() != HARDIRQ_OFFSET))
>  		account_system_time(p, HARDIRQ_OFFSET, one_jiffy,
>  				    one_jiffy_scaled);
>  	else

Note, your scheduler fix is upstream now in Linus's tree, as:

  f5f293a: sched: account system time properly

"git cherry-pick f5f293a" will apply it to a .29 basis.

	Ingo
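Eric's egrep lists, for each logical CPU, its physical package and its core. To see which logical CPUs share a core (hyperthread siblings) or a package, and therefore caches, those three fields can be collected into a table. A minimal sketch, assuming the x86 /proc/cpuinfo field names used in the egrep pattern; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One logical CPU as described by /proc/cpuinfo (x86 field names). */
struct cpu_topo {
    int processor;   /* logical CPU number */
    int physical_id; /* socket / package, -1 if absent */
    int core_id;     /* core within the package, -1 if absent */
};

/* If line starts with "key<spaces/tabs>:", return the number after the
 * colon; otherwise return -1. */
static int field_value(const char *line, const char *key)
{
    size_t k = strlen(key);
    if (strncmp(line, key, k) != 0)
        return -1;
    const char *p = line + k;
    while (*p == ' ' || *p == '\t')
        p++;
    if (*p != ':')
        return -1;
    return atoi(p + 1);
}

/* Fill topo[] from the text of /proc/cpuinfo.  Returns the number of
 * "processor" blocks stored (at most max). */
int parse_cpuinfo(const char *text, struct cpu_topo *topo, int max)
{
    int n = -1; /* index of the block being filled */
    const char *line = text;
    while (line != NULL && *line != '\0') {
        int v;
        if ((v = field_value(line, "processor")) >= 0) {
            if (n + 1 >= max)
                break;
            n++;
            topo[n].processor = v;
            topo[n].physical_id = -1;
            topo[n].core_id = -1;
        } else if (n >= 0 && (v = field_value(line, "physical id")) >= 0) {
            topo[n].physical_id = v;
        } else if (n >= 0 && (v = field_value(line, "core id")) >= 0) {
            topo[n].core_id = v;
        }
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return n + 1;
}

/* Hyperthread siblings: same package and same core. */
int cpus_share_core(const struct cpu_topo *a, const struct cpu_topo *b)
{
    return a->physical_id == b->physical_id && a->core_id == b->core_id;
}

/* Same physical package (socket). */
int cpus_share_package(const struct cpu_topo *a, const struct cpu_topo *b)
{
    return a->physical_id == b->physical_id;
}

/* Read a text file such as /proc/cpuinfo into buf (NUL-terminated). */
size_t read_text_file(const char *path, char *buf, size_t len)
{
    if (buf == NULL || len == 0)
        return 0;
    buf[0] = '\0';
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return 0;
    size_t n = fread(buf, 1, len - 1, f);
    buf[n] = '\0';
    fclose(f);
    return n;
}
```

Feeding read_text_file("/proc/cpuinfo", ...) into parse_cpuinfo() gives a table from which candidate CPU pairs for the bond slaves can be chosen, for example two cores in the same package versus two hyperthreads of one core.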
```diff
--- linux-2.6.29/kernel/sched.c.orig	2009-05-05 20:46:49.000000000 +0200
+++ linux-2.6.29/kernel/sched.c	2009-05-05 20:47:19.000000000 +0200
@@ -4290,7 +4290,7 @@
 
 	if (user_tick)
 		account_user_time(p, one_jiffy, one_jiffy_scaled);
-	else if (p != rq->idle)
+	else if ((p != rq->idle) || (irq_count() != HARDIRQ_OFFSET))
 		account_system_time(p, HARDIRQ_OFFSET, one_jiffy,
 				    one_jiffy_scaled);
 	else
```
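The patch changes when a timer tick that lands on the idle task is still charged as system time: since the tick itself runs in hard-IRQ context, irq_count() equals HARDIRQ_OFFSET only when nothing else (such as softirq processing of received packets) was running underneath it. The model below restates that decision in plain C so the before/after behaviour can be compared. It is an illustrative sketch, not kernel code: the preempt_count bit layout, the enum and the function names are simplified assumptions, and the irq/softirq/system split assumes account_system_time() checks hardirq_count() minus the offset first and softirq_count() second, as kernels of that era did.

```c
#include <assert.h>

/* Simplified preempt_count layout (assumption, illustrative only):
 * bits 0-7 preemption depth, 8-15 softirq depth, 16+ hardirq depth. */
#define SOFTIRQ_SHIFT  8
#define HARDIRQ_SHIFT  16
#define SOFTIRQ_OFFSET (1u << SOFTIRQ_SHIFT)
#define HARDIRQ_OFFSET (1u << HARDIRQ_SHIFT)
#define SOFTIRQ_MASK   (0xffu << SOFTIRQ_SHIFT)
#define HARDIRQ_MASK   (0xfffu << HARDIRQ_SHIFT)

enum tick_bucket { TICK_USER, TICK_IRQ, TICK_SOFTIRQ, TICK_SYSTEM, TICK_IDLE };

/* Build a model preempt_count; hardirq_depth includes the timer tick. */
unsigned make_preempt_count(unsigned hardirq_depth, unsigned softirq_depth)
{
    return (hardirq_depth << HARDIRQ_SHIFT) | (softirq_depth << SOFTIRQ_SHIFT);
}

/* Bucket chosen for a system tick, ignoring the tick's own hardirq level. */
static enum tick_bucket system_bucket(unsigned preempt_count)
{
    if ((preempt_count & HARDIRQ_MASK) - HARDIRQ_OFFSET)
        return TICK_IRQ;     /* another hard IRQ was interrupted */
    if (preempt_count & SOFTIRQ_MASK)
        return TICK_SOFTIRQ; /* softirq work was running */
    return TICK_SYSTEM;
}

/* Classify one tick; patched selects the post-patch condition. */
enum tick_bucket classify_tick(int user_tick, int task_is_idle,
                               unsigned preempt_count, int patched)
{
    unsigned irq_count = preempt_count & (HARDIRQ_MASK | SOFTIRQ_MASK);

    if (user_tick)
        return TICK_USER;
    if (!task_is_idle || (patched && irq_count != HARDIRQ_OFFSET))
        return system_bucket(preempt_count);
    return TICK_IDLE;
}
```

Under this model, a tick that arrives while the idle task is processing softirqs is counted as idle before the patch and as softirq after it, which matches Eric's purpose of showing whether a CPU is saturated by hard/soft IRQs rather than looking idle.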