diff mbox

regression: unregister_netdev() unusably slow

Message ID 20090525000050.GJ24757@kvack.org
State Not Applicable, archived
Delegated to: David Miller
Headers show

Commit Message

Benjamin LaHaise May 25, 2009, midnight UTC
On Mon, May 25, 2009 at 12:47:39AM +0200, Eric Dumazet wrote:
> There is a strong dependancy against HZ
> BTW, I am using TREE_RCU

I'm using CLASSIC_RCU.  The bisect just completed, and it points to RCU.  
It makes some degree of sense since I'm testing on an otherwise idle 
machine.  That said, where is fixing it going to make sense?  I'm not 
opposed to having device unregister take a few timer ticks, but there 
has to be some way of exposing parallelism to the system, and since the 
synchronize_net() calls are done under rntl_lock(), none is possible at 
present.  Hrm.

		-ben

bf51935f3e988e0ed6f34b55593e5912f990750a is first bad commit
commit bf51935f3e988e0ed6f34b55593e5912f990750a
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Tue Feb 17 06:01:30 2009 -0800

    x86, rcu: fix strange load average and ksoftirqd behavior
    
    Damien Wyart reported high ksoftirqd CPU usage (20%) on an
    otherwise idle system.
    
    The function-graph trace Damien provided:
...

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Eric Dumazet May 25, 2009, 5:22 a.m. UTC | #1
Benjamin LaHaise a écrit :
> On Mon, May 25, 2009 at 12:47:39AM +0200, Eric Dumazet wrote:
>> There is a strong dependancy against HZ
>> BTW, I am using TREE_RCU
> 
> I'm using CLASSIC_RCU.  The bisect just completed, and it points to RCU.  
> It makes some degree of sense since I'm testing on an otherwise idle 
> machine.  That said, where is fixing it going to make sense?  I'm not 
> opposed to having device unregister take a few timer ticks, but there 
> has to be some way of exposing parallelism to the system, and since the 
> synchronize_net() calls are done under rntl_lock(), none is possible at 
> present.  Hrm.

Thanks Ben, this bisection indeed confirms how nasty synchronize_rcu() is :)

Time to include Paul and lkml in the discussion, and find a better solution than 
one provided in February.

> 
> 		-ben
> 
> bf51935f3e988e0ed6f34b55593e5912f990750a is first bad commit
> commit bf51935f3e988e0ed6f34b55593e5912f990750a
> Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Date:   Tue Feb 17 06:01:30 2009 -0800
> 
>     x86, rcu: fix strange load average and ksoftirqd behavior
>     
>     Damien Wyart reported high ksoftirqd CPU usage (20%) on an
>     otherwise idle system.
>     
>     The function-graph trace Damien provided:
> ...
> diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
> 
> index a546f55..bd4da2a 100644
> --- a/arch/x86/kernel/process_32.c
> +++ b/arch/x86/kernel/process_32.c
> @@ -104,9 +104,6 @@ void cpu_idle(void)
>  			check_pgt_cache();
>  			rmb();
>  
> -			if (rcu_pending(cpu))
> -				rcu_check_callbacks(cpu, 0);
> -
>  			if (cpu_is_offline(cpu))
>  				play_dead();
>  
> 
> --

Paul, this commit makes net device unregister very slow (more than 100 ms
 if CONFIG_NO_HZ is set), while it used to be pretty fast in previous kernels.

Quoting Ben : 
" I just ran a few L2TP tests against 2.6.30-rc7, and it looks like network 
  device deletion has become unusably slow.  At least in 2.6.27.10, deleting 
  1000 network interfaces takes less than 2 seconds of real time.  The same 
  test run under 2.6.30-rc7 is taking hundreds of seconds to delete 1000 
  interfaces at a rate of about 5 per second.  The interfaces all share the 
  same local ip address, but each have a single route to a unique client 
  ip address."

Device unregister is a synchronize_rcu() abuser (three calls to dismantle
a vlan...) so delaying rcu callbacks can be pretty expensive for it.

I wonder if the real root of the problem was not discovered in the meantime,
by commit 64ca5ab913f1594ef316556e65f5eae63ff50cee
rcu: increment quiescent state counter in ksoftirqd()

Maybe this commit solved Damien Wyart problem as well, and we can revert
commit bf51935f3e988e0ed6f34b55593e5912f990750a ?

Thank you

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Damien Wyart May 25, 2009, 8:04 a.m. UTC | #2
Hello,

* Eric Dumazet <dada1@cosmosbay.com> [2009-05-25 07:22]:
> Time to include Paul and lkml in the discussion, and find a better
> solution than one provided in February.

> > bf51935f3e988e0ed6f34b55593e5912f990750a is first bad commit
> > commit bf51935f3e988e0ed6f34b55593e5912f990750a
> > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Date:   Tue Feb 17 06:01:30 2009 -0800

> >     x86, rcu: fix strange load average and ksoftirqd behavior

> >     Damien Wyart reported high ksoftirqd CPU usage (20%) on an
> >     otherwise idle system.

> >     The function-graph trace Damien provided:
> > ...
> > diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c

> > index a546f55..bd4da2a 100644
> > --- a/arch/x86/kernel/process_32.c
> > +++ b/arch/x86/kernel/process_32.c
> > @@ -104,9 +104,6 @@ void cpu_idle(void)
> >  			check_pgt_cache();
> >  			rmb();

> > -			if (rcu_pending(cpu))
> > -				rcu_check_callbacks(cpu, 0);
> > -
> >  			if (cpu_is_offline(cpu))
> >  				play_dead();

> I wonder if the real root of the problem was not discovered in the meantime,
> by commit 64ca5ab913f1594ef316556e65f5eae63ff50cee
> rcu: increment quiescent state counter in ksoftirqd()

> Maybe this commit solved Damien Wyart problem as well, and we can revert
> commit bf51935f3e988e0ed6f34b55593e5912f990750a ?

Ran some tests on 2.6.30-rc7 with bf51935f reverted, and I am still
seeing the problems I originally reported back in February, so I guess
64ca5ab9 is not enough to fully solve all the issues... Note that I am
using CONFIG_TREE_RCU=y (was already true in my February reports).

Feel free to ask if more testing is needed.
Paul E. McKenney May 25, 2009, 4:21 p.m. UTC | #3
On Mon, May 25, 2009 at 07:22:02AM +0200, Eric Dumazet wrote:
> Benjamin LaHaise a écrit :
> > On Mon, May 25, 2009 at 12:47:39AM +0200, Eric Dumazet wrote:
> >> There is a strong dependancy against HZ
> >> BTW, I am using TREE_RCU
> > 
> > I'm using CLASSIC_RCU.  The bisect just completed, and it points to RCU.  
> > It makes some degree of sense since I'm testing on an otherwise idle 
> > machine.  That said, where is fixing it going to make sense?  I'm not 
> > opposed to having device unregister take a few timer ticks, but there 
> > has to be some way of exposing parallelism to the system, and since the 
> > synchronize_net() calls are done under rntl_lock(), none is possible at 
> > present.  Hrm.
> 
> Thanks Ben, this bisection indeed confirms how nasty synchronize_rcu() is :)

Yet another step in my learning what is required of RCU, it seems!  ;-)

> Time to include Paul and lkml in the discussion, and find a better solution than 
> one provided in February.

One approach would be to convert the offending synchronize_rcu() to
call_rcu(), but if this were straightforward, I would guess that you would
have already done this.  But if the code following the synchronize_rcu()
does nothing but free up old data structures, this is an easy fix.
If there are statistics or other state involved, then call_rcu() might
not be the right tool for the job.

Another approach is to apply the patch at:

	http://lkml.org/lkml/2009/5/22/332

Then replace the offending synchronize_rcu() with synchronize_rcu_expedited().
This code is still a bit on the experimental side, but tests have been
going quite well, so, unlike a week or two ago, it is definitely worth
trying out.

Do either of these approaches work for you?

							Thanx, Paul

> > 		-ben
> > 
> > bf51935f3e988e0ed6f34b55593e5912f990750a is first bad commit
> > commit bf51935f3e988e0ed6f34b55593e5912f990750a
> > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Date:   Tue Feb 17 06:01:30 2009 -0800
> > 
> >     x86, rcu: fix strange load average and ksoftirqd behavior
> >     
> >     Damien Wyart reported high ksoftirqd CPU usage (20%) on an
> >     otherwise idle system.
> >     
> >     The function-graph trace Damien provided:
> > ...
> > diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
> > 
> > index a546f55..bd4da2a 100644
> > --- a/arch/x86/kernel/process_32.c
> > +++ b/arch/x86/kernel/process_32.c
> > @@ -104,9 +104,6 @@ void cpu_idle(void)
> >  			check_pgt_cache();
> >  			rmb();
> >  
> > -			if (rcu_pending(cpu))
> > -				rcu_check_callbacks(cpu, 0);
> > -
> >  			if (cpu_is_offline(cpu))
> >  				play_dead();
> >  
> > 
> > --
> 
> Paul, this commit makes net device unregister very slow (more than 100 ms
>  if CONFIG_NO_HZ is set), while it used to be pretty fast in previous kernels.
> 
> Quoting Ben : 
> " I just ran a few L2TP tests against 2.6.30-rc7, and it looks like network 
>   device deletion has become unusably slow.  At least in 2.6.27.10, deleting 
>   1000 network interfaces takes less than 2 seconds of real time.  The same 
>   test run under 2.6.30-rc7 is taking hundreds of seconds to delete 1000 
>   interfaces at a rate of about 5 per second.  The interfaces all share the 
>   same local ip address, but each have a single route to a unique client 
>   ip address."
> 
> Device unregister is a synchronize_rcu() abuser (three calls to dismantle
> a vlan...) so delaying rcu callbacks can be pretty expensive for it.
> 
> I wonder if the real root of the problem was not discovered in the meantime,
> by commit 64ca5ab913f1594ef316556e65f5eae63ff50cee
> rcu: increment quiescent state counter in ksoftirqd()
> 
> Maybe this commit solved Damien Wyart problem as well, and we can revert
> commit bf51935f3e988e0ed6f34b55593e5912f990750a ?
> 
> Thank you
> 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c

index a546f55..bd4da2a 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -104,9 +104,6 @@  void cpu_idle(void)
 			check_pgt_cache();
 			rmb();
 
-			if (rcu_pending(cpu))
-				rcu_check_callbacks(cpu, 0);
-
 			if (cpu_is_offline(cpu))
 				play_dead();