Message ID: 20200826194007.1962762-2-kuba@kernel.org
State: Accepted
Delegated to: David Miller
Series: net: fix netpoll crash with bnxt
On 8/26/20 12:40 PM, Jakub Kicinski wrote:
> napi_disable() makes sure to set the NAPI_STATE_NPSVC bit to prevent
> netpoll from accessing rings before init is complete. However, the
> same is not done for fresh napi instances in netif_napi_add(),
> even though we expect NAPI instances to be added as disabled.
>
> This causes crashes during driver reconfiguration (enabling XDP,
> changing the channel count) - if there is any printk() after
> netif_napi_add() but before napi_enable().
>
> To ensure memory ordering is correct we need to use RCU accessors.
>
> Reported-by: Rob Sherwood <rsher@fb.com>
> Fixes: 2d8bff12699a ("netpoll: Close race condition between poll_one_napi and napi_disable")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
>  net/core/dev.c     | 3 ++-
>  net/core/netpoll.c | 2 +-
>  2 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index d42c9ea0c3c0..95ac7568f693 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -6612,12 +6612,13 @@ void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
>  		netdev_err_once(dev, "%s() called with weight %d\n", __func__,
>  			        weight);
>  	napi->weight = weight;
> -	list_add(&napi->dev_list, &dev->napi_list);
>  	napi->dev = dev;
>  #ifdef CONFIG_NETPOLL
>  	napi->poll_owner = -1;
>  #endif
>  	set_bit(NAPI_STATE_SCHED, &napi->state);
> +	set_bit(NAPI_STATE_NPSVC, &napi->state);
> +	list_add_rcu(&napi->dev_list, &dev->napi_list);
>  	napi_hash_add(napi);
>  }
>  EXPORT_SYMBOL(netif_napi_add);
> diff --git a/net/core/netpoll.c b/net/core/netpoll.c
> index 093e90e52bc2..2338753e936b 100644
> --- a/net/core/netpoll.c
> +++ b/net/core/netpoll.c
> @@ -162,7 +162,7 @@ static void poll_napi(struct net_device *dev)
>  	struct napi_struct *napi;
>  	int cpu = smp_processor_id();
>
> -	list_for_each_entry(napi, &dev->napi_list, dev_list) {
> +	list_for_each_entry_rcu(napi, &dev->napi_list, dev_list) {
>  		if (cmpxchg(&napi->poll_owner, -1, cpu) == -1) {
>  			poll_one_napi(napi);
>  			smp_store_release(&napi->poll_owner, -1);

You added rcu in this patch (without anything in the changelog).

netpoll_poll_dev() uses rcu_dereference_bh(), suggesting you might
need list_for_each_entry_rcu_bh()
On Thu, 27 Aug 2020 00:25:31 -0700 Eric Dumazet wrote:
> On 8/26/20 12:40 PM, Jakub Kicinski wrote:
> > To ensure memory ordering is correct we need to use RCU accessors.
>
> > +	set_bit(NAPI_STATE_NPSVC, &napi->state);
> > +	list_add_rcu(&napi->dev_list, &dev->napi_list);
>
> > -	list_for_each_entry(napi, &dev->napi_list, dev_list) {
> > +	list_for_each_entry_rcu(napi, &dev->napi_list, dev_list) {
> > 		if (cmpxchg(&napi->poll_owner, -1, cpu) == -1) {
> > 			poll_one_napi(napi);
> > 			smp_store_release(&napi->poll_owner, -1);
>
> You added rcu in this patch (without anything in the changelog).

I mentioned I need it for the barriers, in particular I wanted the
store release barrier in list_add. Not extremely clean :(

> netpoll_poll_dev() uses rcu_dereference_bh(), suggesting you might
> need list_for_each_entry_rcu_bh()

I thought the RCU flavors are mostly meaningless at this point,
list_for_each_entry_rcu() checks rcu_read_lock_any_held(). I can add
the definition of list_for_each_entry_rcu_bh() (since it doesn't exist)
or go back to non-RCU iteration (since the use is just documentation,
the code is identical). Or fix it some other way?
On 8/27/20 8:10 AM, Jakub Kicinski wrote:
> On Thu, 27 Aug 2020 00:25:31 -0700 Eric Dumazet wrote:
>> On 8/26/20 12:40 PM, Jakub Kicinski wrote:
>>> To ensure memory ordering is correct we need to use RCU accessors.
>>
>>> +	set_bit(NAPI_STATE_NPSVC, &napi->state);
>>> +	list_add_rcu(&napi->dev_list, &dev->napi_list);
>>
>>> -	list_for_each_entry(napi, &dev->napi_list, dev_list) {
>>> +	list_for_each_entry_rcu(napi, &dev->napi_list, dev_list) {
>>> 		if (cmpxchg(&napi->poll_owner, -1, cpu) == -1) {
>>> 			poll_one_napi(napi);
>>> 			smp_store_release(&napi->poll_owner, -1);
>>
>> You added rcu in this patch (without anything in the changelog).
>
> I mentioned I need it for the barriers, in particular I wanted the
> store release barrier in list_add. Not extremely clean :(

Hmmm, we also have smp_mb__after_atomic()

>> netpoll_poll_dev() uses rcu_dereference_bh(), suggesting you might
>> need list_for_each_entry_rcu_bh()
>
> I thought the RCU flavors are mostly meaningless at this point,
> list_for_each_entry_rcu() checks rcu_read_lock_any_held(). I can add
> the definition of list_for_each_entry_rcu_bh() (since it doesn't exist)
> or go back to non-RCU iteration (since the use is just documentation,
> the code is identical). Or fix it some other way?

Oh, I really thought list_for_each_entry_rcu() was only checking
standard rcu.

I might have been confused because we do have
hlist_for_each_entry_rcu_bh() helper.

Anyway, when looking at the patch I was not at ease because we do not
have proper rcu grace period when a napi is removed from dev->napi_list.
A driver might free the napi struct right after calling netif_napi_del()
On Thu, 27 Aug 2020 08:43:22 -0700 Eric Dumazet wrote:
> On 8/27/20 8:10 AM, Jakub Kicinski wrote:
> > On Thu, 27 Aug 2020 00:25:31 -0700 Eric Dumazet wrote:
> >> On 8/26/20 12:40 PM, Jakub Kicinski wrote:
> >>> To ensure memory ordering is correct we need to use RCU accessors.
> >>
> >>> +	set_bit(NAPI_STATE_NPSVC, &napi->state);
> >>> +	list_add_rcu(&napi->dev_list, &dev->napi_list);
> >>
> >>> -	list_for_each_entry(napi, &dev->napi_list, dev_list) {
> >>> +	list_for_each_entry_rcu(napi, &dev->napi_list, dev_list) {
> >>> 		if (cmpxchg(&napi->poll_owner, -1, cpu) == -1) {
> >>> 			poll_one_napi(napi);
> >>> 			smp_store_release(&napi->poll_owner, -1);
> >>
> >> You added rcu in this patch (without anything in the changelog).
> >
> > I mentioned I need it for the barriers, in particular I wanted the
> > store release barrier in list_add. Not extremely clean :(
>
> Hmmm, we also have smp_mb__after_atomic()

Pairing with the cmpxchg() on the netpoll side? Can do, I wasn't sure
if the list operations themselves need some special care (like
READ_ONCE/WRITE_ONCE)..

> >> netpoll_poll_dev() uses rcu_dereference_bh(), suggesting you might
> >> need list_for_each_entry_rcu_bh()
> >
> > I thought the RCU flavors are mostly meaningless at this point,
> > list_for_each_entry_rcu() checks rcu_read_lock_any_held(). I can add
> > the definition of list_for_each_entry_rcu_bh() (since it doesn't exist)
> > or go back to non-RCU iteration (since the use is just documentation,
> > the code is identical). Or fix it some other way?
>
> Oh, I really thought list_for_each_entry_rcu() was only checking
> standard rcu.
>
> I might have been confused because we do have
> hlist_for_each_entry_rcu_bh() helper.
>
> Anyway, when looking at the patch I was not at ease because we do not
> have proper rcu grace period when a napi is removed from dev->napi_list.
> A driver might free the napi struct right after calling netif_napi_del()

Ugh, you're right.

I didn't look closely enough at netif_napi_del():

	if (napi_hash_del(napi))
		synchronize_net();
	list_del_init(&napi->dev_list);

Looks like I can reorder these.. and perhaps make all dev->napi_list
accesses RCU, for netpoll?
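The reorder being floated could look roughly like the fragment below. This is a sketch of the idea only, not the patch that was eventually merged: unlinking from dev->napi_list before the grace period would let the existing synchronize_net() also cover netpoll's RCU walk of the list. Note the remaining gap Eric's concern implies: if napi_hash_del() returns false there is still no grace period before the driver may free the napi.

```c
/* Sketch of the suggested ordering in netif_napi_del() (not the
 * actual upstream fix): unlink under RCU first, then wait. */
void netif_napi_del(struct napi_struct *napi)
{
	list_del_rcu(&napi->dev_list);
	if (napi_hash_del(napi))
		synchronize_net();
	/* the driver may free 'napi' as soon as this returns */
}
```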
diff --git a/net/core/dev.c b/net/core/dev.c
index d42c9ea0c3c0..95ac7568f693 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6612,12 +6612,13 @@ void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
 		netdev_err_once(dev, "%s() called with weight %d\n", __func__,
 			        weight);
 	napi->weight = weight;
-	list_add(&napi->dev_list, &dev->napi_list);
 	napi->dev = dev;
 #ifdef CONFIG_NETPOLL
 	napi->poll_owner = -1;
 #endif
 	set_bit(NAPI_STATE_SCHED, &napi->state);
+	set_bit(NAPI_STATE_NPSVC, &napi->state);
+	list_add_rcu(&napi->dev_list, &dev->napi_list);
 	napi_hash_add(napi);
 }
 EXPORT_SYMBOL(netif_napi_add);
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 093e90e52bc2..2338753e936b 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -162,7 +162,7 @@ static void poll_napi(struct net_device *dev)
 	struct napi_struct *napi;
 	int cpu = smp_processor_id();
 
-	list_for_each_entry(napi, &dev->napi_list, dev_list) {
+	list_for_each_entry_rcu(napi, &dev->napi_list, dev_list) {
 		if (cmpxchg(&napi->poll_owner, -1, cpu) == -1) {
 			poll_one_napi(napi);
 			smp_store_release(&napi->poll_owner, -1);
napi_disable() makes sure to set the NAPI_STATE_NPSVC bit to prevent
netpoll from accessing rings before init is complete. However, the
same is not done for fresh napi instances in netif_napi_add(),
even though we expect NAPI instances to be added as disabled.

This causes crashes during driver reconfiguration (enabling XDP,
changing the channel count) - if there is any printk() after
netif_napi_add() but before napi_enable().

To ensure memory ordering is correct we need to use RCU accessors.

Reported-by: Rob Sherwood <rsher@fb.com>
Fixes: 2d8bff12699a ("netpoll: Close race condition between poll_one_napi and napi_disable")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 net/core/dev.c     | 3 ++-
 net/core/netpoll.c | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)