[net,1/2] net: disable netpoll on fresh napis

Message ID 20200826194007.1962762-2-kuba@kernel.org
State Accepted
Delegated to: David Miller
Series net: fix netpoll crash with bnxt

Commit Message

Jakub Kicinski Aug. 26, 2020, 7:40 p.m. UTC
napi_disable() makes sure to set the NAPI_STATE_NPSVC bit to prevent
netpoll from accessing rings before init is complete. However, the
same is not done for fresh napi instances in netif_napi_add(),
even though we expect NAPI instances to be added as disabled.

This causes crashes during driver reconfiguration (enabling XDP,
changing the channel count) - if there is any printk() after
netif_napi_add() but before napi_enable().

To ensure memory ordering is correct we need to use RCU accessors.

Reported-by: Rob Sherwood <rsher@fb.com>
Fixes: 2d8bff12699a ("netpoll: Close race condition between poll_one_napi and napi_disable")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 net/core/dev.c     | 3 ++-
 net/core/netpoll.c | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

Comments

Eric Dumazet Aug. 27, 2020, 7:25 a.m. UTC | #1
On 8/26/20 12:40 PM, Jakub Kicinski wrote:
> napi_disable() makes sure to set the NAPI_STATE_NPSVC bit to prevent
> netpoll from accessing rings before init is complete. However, the
> same is not done for fresh napi instances in netif_napi_add(),
> even though we expect NAPI instances to be added as disabled.
> 
> This causes crashes during driver reconfiguration (enabling XDP,
> changing the channel count) - if there is any printk() after
> netif_napi_add() but before napi_enable().
> 
> To ensure memory ordering is correct we need to use RCU accessors.
> 
> Reported-by: Rob Sherwood <rsher@fb.com>
> Fixes: 2d8bff12699a ("netpoll: Close race condition between poll_one_napi and napi_disable")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
>  net/core/dev.c     | 3 ++-
>  net/core/netpoll.c | 2 +-
>  2 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index d42c9ea0c3c0..95ac7568f693 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -6612,12 +6612,13 @@ void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
>  		netdev_err_once(dev, "%s() called with weight %d\n", __func__,
>  				weight);
>  	napi->weight = weight;
> -	list_add(&napi->dev_list, &dev->napi_list);
>  	napi->dev = dev;
>  #ifdef CONFIG_NETPOLL
>  	napi->poll_owner = -1;
>  #endif
>  	set_bit(NAPI_STATE_SCHED, &napi->state);
> +	set_bit(NAPI_STATE_NPSVC, &napi->state);
> +	list_add_rcu(&napi->dev_list, &dev->napi_list);
>  	napi_hash_add(napi);
>  }
>  EXPORT_SYMBOL(netif_napi_add);
> diff --git a/net/core/netpoll.c b/net/core/netpoll.c
> index 093e90e52bc2..2338753e936b 100644
> --- a/net/core/netpoll.c
> +++ b/net/core/netpoll.c
> @@ -162,7 +162,7 @@ static void poll_napi(struct net_device *dev)
>  	struct napi_struct *napi;
>  	int cpu = smp_processor_id();
>  
> -	list_for_each_entry(napi, &dev->napi_list, dev_list) {
> +	list_for_each_entry_rcu(napi, &dev->napi_list, dev_list) {
>  		if (cmpxchg(&napi->poll_owner, -1, cpu) == -1) {
>  			poll_one_napi(napi);
>  			smp_store_release(&napi->poll_owner, -1);
> 

You added rcu in this patch (without anything in the changelog).

netpoll_poll_dev() uses rcu_dereference_bh(), suggesting you might need list_for_each_entry_rcu_bh()
Jakub Kicinski Aug. 27, 2020, 3:10 p.m. UTC | #2
On Thu, 27 Aug 2020 00:25:31 -0700 Eric Dumazet wrote:
> On 8/26/20 12:40 PM, Jakub Kicinski wrote:
> > To ensure memory ordering is correct we need to use RCU accessors.
>
> > +	set_bit(NAPI_STATE_NPSVC, &napi->state);
> > +	list_add_rcu(&napi->dev_list, &dev->napi_list);
> 
> >  
> > -	list_for_each_entry(napi, &dev->napi_list, dev_list) {
> > +	list_for_each_entry_rcu(napi, &dev->napi_list, dev_list) {
> >  		if (cmpxchg(&napi->poll_owner, -1, cpu) == -1) {
> >  			poll_one_napi(napi);
> >  			smp_store_release(&napi->poll_owner, -1);
> >   
> 
> You added rcu in this patch (without anything in the changelog).

I mentioned I need it for the barriers, in particular I wanted the
store release barrier in list_add. Not extremely clean :(

> netpoll_poll_dev() uses rcu_dereference_bh(), suggesting you might
> need list_for_each_entry_rcu_bh()

I thought the RCU flavors are mostly meaningless at this point,
list_for_each_entry_rcu() checks rcu_read_lock_any_held(). I can add
the definition of list_for_each_entry_rcu_bh() (since it doesn't exist)
or go back to non-RCU iteration (since the use is just documentation,
the code is identical). Or fix it some other way?
Eric Dumazet Aug. 27, 2020, 3:43 p.m. UTC | #3
On 8/27/20 8:10 AM, Jakub Kicinski wrote:
> On Thu, 27 Aug 2020 00:25:31 -0700 Eric Dumazet wrote:
>> On 8/26/20 12:40 PM, Jakub Kicinski wrote:
>>> To ensure memory ordering is correct we need to use RCU accessors.
>>
>>> +	set_bit(NAPI_STATE_NPSVC, &napi->state);
>>> +	list_add_rcu(&napi->dev_list, &dev->napi_list);
>>
>>>  
>>> -	list_for_each_entry(napi, &dev->napi_list, dev_list) {
>>> +	list_for_each_entry_rcu(napi, &dev->napi_list, dev_list) {
>>>  		if (cmpxchg(&napi->poll_owner, -1, cpu) == -1) {
>>>  			poll_one_napi(napi);
>>>  			smp_store_release(&napi->poll_owner, -1);
>>>   
>>
>> You added rcu in this patch (without anything in the changelog).
> 
> I mentioned I need it for the barriers, in particular I wanted the
> store release barrier in list_add. Not extremely clean :(

Hmmm, we also have smp_mb__after_atomic()

> 
>> netpoll_poll_dev() uses rcu_dereference_bh(), suggesting you might
>> need list_for_each_entry_rcu_bh()
> 
> I thought the RCU flavors are mostly meaningless at this point,
> list_for_each_entry_rcu() checks rcu_read_lock_any_held(). I can add
> the definition of list_for_each_entry_rcu_bh() (since it doesn't exist)
> or go back to non-RCU iteration (since the use is just documentation,
> the code is identical). Or fix it some other way?
> 

Oh, I really thought list_for_each_entry_rcu() was only checking standard rcu.

I might have been confused because we do have hlist_for_each_entry_rcu_bh() helper.

Anyway, when looking at the patch I was not at ease because we do not have proper
rcu grace period when a napi is removed from dev->napi_list. A driver might
free the napi struct right after calling netif_napi_del()
Jakub Kicinski Aug. 27, 2020, 5:47 p.m. UTC | #4
On Thu, 27 Aug 2020 08:43:22 -0700 Eric Dumazet wrote:
> On 8/27/20 8:10 AM, Jakub Kicinski wrote:
> > On Thu, 27 Aug 2020 00:25:31 -0700 Eric Dumazet wrote:  
> >> On 8/26/20 12:40 PM, Jakub Kicinski wrote:  
> >>> To ensure memory ordering is correct we need to use RCU accessors.  
> >>  
> >>> +	set_bit(NAPI_STATE_NPSVC, &napi->state);
> >>> +	list_add_rcu(&napi->dev_list, &dev->napi_list);  
> >>  
> >>>  
> >>> -	list_for_each_entry(napi, &dev->napi_list, dev_list) {
> >>> +	list_for_each_entry_rcu(napi, &dev->napi_list, dev_list) {
> >>>  		if (cmpxchg(&napi->poll_owner, -1, cpu) == -1) {
> >>>  			poll_one_napi(napi);
> >>>  			smp_store_release(&napi->poll_owner, -1);
> >>>     
> >>
> >> You added rcu in this patch (without anything in the changelog).  
> > 
> > I mentioned I need it for the barriers, in particular I wanted the
> > store release barrier in list_add. Not extremely clean :(  
> 
> Hmmm, we also have smp_mb__after_atomic()

Pairing with the cmpxchg() on the netpoll side? Can do, I wasn't 
sure if the list operations themselves need some special care 
(like READ_ONCE/WRITE_ONCE)..

> >> netpoll_poll_dev() uses rcu_dereference_bh(), suggesting you might
> >> need list_for_each_entry_rcu_bh()  
> > 
> > I thought the RCU flavors are mostly meaningless at this point,
> > list_for_each_entry_rcu() checks rcu_read_lock_any_held(). I can add
> > the definition of list_for_each_entry_rcu_bh() (since it doesn't exist)
> > or go back to non-RCU iteration (since the use is just documentation,
> > the code is identical). Or fix it some other way?
> >   
> 
> Oh, I really thought list_for_each_entry_rcu() was only checking standard rcu.
> 
> I might have been confused because we do have hlist_for_each_entry_rcu_bh() helper.
> 
> Anyway, when looking at the patch I was not at ease because we do not have proper
> rcu grace period when a napi is removed from dev->napi_list. A driver might
> free the napi struct right after calling netif_napi_del()

Ugh, you're right. I didn't look closely enough at netif_napi_del():

	if (napi_hash_del(napi))
		synchronize_net();
	list_del_init(&napi->dev_list);

Looks like I can reorder these.. and perhaps make all dev->napi_list
accesses RCU, for netpoll?

Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index d42c9ea0c3c0..95ac7568f693 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6612,12 +6612,13 @@  void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
 		netdev_err_once(dev, "%s() called with weight %d\n", __func__,
 				weight);
 	napi->weight = weight;
-	list_add(&napi->dev_list, &dev->napi_list);
 	napi->dev = dev;
 #ifdef CONFIG_NETPOLL
 	napi->poll_owner = -1;
 #endif
 	set_bit(NAPI_STATE_SCHED, &napi->state);
+	set_bit(NAPI_STATE_NPSVC, &napi->state);
+	list_add_rcu(&napi->dev_list, &dev->napi_list);
 	napi_hash_add(napi);
 }
 EXPORT_SYMBOL(netif_napi_add);
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 093e90e52bc2..2338753e936b 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -162,7 +162,7 @@  static void poll_napi(struct net_device *dev)
 	struct napi_struct *napi;
 	int cpu = smp_processor_id();
 
-	list_for_each_entry(napi, &dev->napi_list, dev_list) {
+	list_for_each_entry_rcu(napi, &dev->napi_list, dev_list) {
 		if (cmpxchg(&napi->poll_owner, -1, cpu) == -1) {
 			poll_one_napi(napi);
 			smp_store_release(&napi->poll_owner, -1);