diff mbox series

[net] ipv6: sr: fix scheduling in RCU when creating seg6 lwtunnel state

Message ID 20180320144456.223556-1-dav.lebrun@gmail.com
State Accepted, archived
Delegated to: David Miller
Headers show
Series [net] ipv6: sr: fix scheduling in RCU when creating seg6 lwtunnel state | expand

Commit Message

David Lebrun March 20, 2018, 2:44 p.m. UTC
From: David Lebrun <dlebrun@google.com>

The seg6_build_state() function is called with RCU read lock held,
so we cannot use GFP_KERNEL. This patch uses GFP_ATOMIC instead.

[   92.770271] =============================
[   92.770628] WARNING: suspicious RCU usage
[   92.770921] 4.16.0-rc4+ #12 Not tainted
[   92.771277] -----------------------------
[   92.771585] ./include/linux/rcupdate.h:302 Illegal context switch in RCU read-side critical section!
[   92.772279]
[   92.772279] other info that might help us debug this:
[   92.772279]
[   92.773067]
[   92.773067] rcu_scheduler_active = 2, debug_locks = 1
[   92.773514] 2 locks held by ip/2413:
[   92.773765]  #0:  (rtnl_mutex){+.+.}, at: [<00000000e5461720>] rtnetlink_rcv_msg+0x441/0x4d0
[   92.774377]  #1:  (rcu_read_lock){....}, at: [<00000000df4f161e>] lwtunnel_build_state+0x59/0x210
[   92.775065]
[   92.775065] stack backtrace:
[   92.775371] CPU: 0 PID: 2413 Comm: ip Not tainted 4.16.0-rc4+ #12
[   92.775791] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc27 04/01/2014
[   92.776608] Call Trace:
[   92.776852]  dump_stack+0x7d/0xbc
[   92.777130]  __schedule+0x133/0xf00
[   92.777393]  ? unwind_get_return_address_ptr+0x50/0x50
[   92.777783]  ? __sched_text_start+0x8/0x8
[   92.778073]  ? rcu_is_watching+0x19/0x30
[   92.778383]  ? kernel_text_address+0x49/0x60
[   92.778800]  ? __kernel_text_address+0x9/0x30
[   92.779241]  ? unwind_get_return_address+0x29/0x40
[   92.779727]  ? pcpu_alloc+0x102/0x8f0
[   92.780101]  _cond_resched+0x23/0x50
[   92.780459]  __mutex_lock+0xbd/0xad0
[   92.780818]  ? pcpu_alloc+0x102/0x8f0
[   92.781194]  ? seg6_build_state+0x11d/0x240
[   92.781611]  ? save_stack+0x9b/0xb0
[   92.781965]  ? __ww_mutex_wakeup_for_backoff+0xf0/0xf0
[   92.782480]  ? seg6_build_state+0x11d/0x240
[   92.782925]  ? lwtunnel_build_state+0x1bd/0x210
[   92.783393]  ? ip6_route_info_create+0x687/0x1640
[   92.783846]  ? ip6_route_add+0x74/0x110
[   92.784236]  ? inet6_rtm_newroute+0x8a/0xd0

Fixes: 6c8702c60b886 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
Signed-off-by: David Lebrun <dlebrun@google.com>
---
 net/ipv6/seg6_iptunnel.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Eric Dumazet March 20, 2018, 3:07 p.m. UTC | #1
On 03/20/2018 07:44 AM, David Lebrun wrote:
> From: David Lebrun <dlebrun@google.com>
> 
> The seg6_build_state() function is called with RCU read lock held,
> so we cannot use GFP_KERNEL. This patch uses GFP_ATOMIC instead.
>
> 
> Fixes: 6c8702c60b886 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
> Signed-off-by: David Lebrun <dlebrun@google.com>
> ---
>  net/ipv6/seg6_iptunnel.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/ipv6/seg6_iptunnel.c b/net/ipv6/seg6_iptunnel.c
> index bd6cc688bd19..8367b859a934 100644
> --- a/net/ipv6/seg6_iptunnel.c
> +++ b/net/ipv6/seg6_iptunnel.c
> @@ -418,7 +418,7 @@ static int seg6_build_state(struct nlattr *nla,
>  
>  	slwt = seg6_lwt_lwtunnel(newts);
>  
> -	err = dst_cache_init(&slwt->cache, GFP_KERNEL);
> +	err = dst_cache_init(&slwt->cache, GFP_ATOMIC);
>

This is not the proper fix.

Control path holds RTNL and can sleeep if needed.

RCU should be avoided in lwtunnel_build_state()
David Lebrun March 20, 2018, 5:11 p.m. UTC | #2
On 20/03/18 15:07, Eric Dumazet wrote:
> This is not the proper fix.
> 
> Control path holds RTNL and can sleeep if needed.
> 
> RCU should be avoided in lwtunnel_build_state()
> 

+Roopa

In lwtunnel_build_state(), the RCU protects the lwtunnel_encap_ops "ops" 
which is rcu-dereferenced. Moreover, the lwtunnel_state_alloc() 
function, which is used in all build_state functions, also uses 
GFP_ATOMIC, so this seemed a proper fix, or at least proper mitigation.

Do you suggest that the lwtunnel_encap_ops can be protected in a 
different way, not requiring RCU ?
Eric Dumazet March 20, 2018, 5:20 p.m. UTC | #3
On 03/20/2018 10:11 AM, David Lebrun wrote:
> On 20/03/18 15:07, Eric Dumazet wrote:
>> This is not the proper fix.
>>
>> Control path holds RTNL and can sleeep if needed.
>>
>> RCU should be avoided in lwtunnel_build_state()
>>
> 
> +Roopa
> 
> In lwtunnel_build_state(), the RCU protects the lwtunnel_encap_ops "ops" which is rcu-dereferenced. Moreover, the lwtunnel_state_alloc() function, which is used in all build_state functions, also uses GFP_ATOMIC, so this seemed a proper fix, or at least proper mitigation.
> 
> Do you suggest that the lwtunnel_encap_ops can be protected in a different way, not requiring RCU ?

Yes, GFP_ATOMIC might be an 'easy fix' for net tree,
but for the future, GFP_KERNEL allocations make more sense in control path.

Many scripts do not handle errors correctly and simply abort.
In case of failure, they might retry (busy poll) in the best scenario,
consuming cycles that might be needed to exit from pressure.

GFP_KERNEL allocations provide proper scheduling, they really should be the norm for control path.
David Miller March 22, 2018, 4:22 p.m. UTC | #4
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 20 Mar 2018 10:20:15 -0700

> 
> 
> On 03/20/2018 10:11 AM, David Lebrun wrote:
>> On 20/03/18 15:07, Eric Dumazet wrote:
>>> This is not the proper fix.
>>>
>>> Control path holds RTNL and can sleeep if needed.
>>>
>>> RCU should be avoided in lwtunnel_build_state()
>>>
>> 
>> +Roopa
>> 
>> In lwtunnel_build_state(), the RCU protects the lwtunnel_encap_ops "ops" which is rcu-dereferenced. Moreover, the lwtunnel_state_alloc() function, which is used in all build_state functions, also uses GFP_ATOMIC, so this seemed a proper fix, or at least proper mitigation.
>> 
>> Do you suggest that the lwtunnel_encap_ops can be protected in a different way, not requiring RCU ?
> 
> Yes, GFP_ATOMIC might be an 'easy fix' for net tree,
> but for the future, GFP_KERNEL allocations make more sense in control path.

Agreed.

I'll apply this to net and queue it up for -stable, but long term making
this able to use GFP_KERNEL properly is the real way to go about it.
diff mbox series

Patch

diff --git a/net/ipv6/seg6_iptunnel.c b/net/ipv6/seg6_iptunnel.c
index bd6cc688bd19..8367b859a934 100644
--- a/net/ipv6/seg6_iptunnel.c
+++ b/net/ipv6/seg6_iptunnel.c
@@ -418,7 +418,7 @@  static int seg6_build_state(struct nlattr *nla,
 
 	slwt = seg6_lwt_lwtunnel(newts);
 
-	err = dst_cache_init(&slwt->cache, GFP_KERNEL);
+	err = dst_cache_init(&slwt->cache, GFP_ATOMIC);
 	if (err) {
 		kfree(newts);
 		return err;