| Message ID | 157851930654.21459.7236323146782270917.stgit@john-Precision-5820-Tower |
|---|---|
| State | Changes Requested |
| Delegated to | BPF Maintainers |
| Series | xdp devmap improvements cleanup |
On Wed, Jan 8, 2020 at 1:36 PM John Fastabend <john.fastabend@gmail.com> wrote:
>
> Now that we depend on rcu_call() and synchronize_rcu() to also wait
> for preempt_disabled region to complete the rcu read critical section
> in __dev_map_flush() is no longer relevant.
>
> These originally ensured the map reference was safe while a map was
> also being free'd. But flush by new rules can only be called from
> preempt-disabled NAPI context. The synchronize_rcu from the map free
> path and the rcu_call from the delete path will ensure the reference
> here is safe. So lets remove the rcu_read_lock and rcu_read_unlock
> pair to avoid any confusion around how this is being protected.
>
> If the rcu_read_lock was required it would mean errors in the above
> logic and the original patch would also be wrong.
>
> Fixes: 0536b85239b84 ("xdp: Simplify devmap cleanup")
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: Song Liu <songliubraving@fb.com>
On 2020/01/09 6:35, John Fastabend wrote:
> Now that we depend on rcu_call() and synchronize_rcu() to also wait
> for preempt_disabled region to complete the rcu read critical section
> in __dev_map_flush() is no longer relevant.
>
> These originally ensured the map reference was safe while a map was
> also being free'd. But flush by new rules can only be called from
> preempt-disabled NAPI context. The synchronize_rcu from the map free
> path and the rcu_call from the delete path will ensure the reference
> here is safe. So lets remove the rcu_read_lock and rcu_read_unlock
> pair to avoid any confusion around how this is being protected.
>
> If the rcu_read_lock was required it would mean errors in the above
> logic and the original patch would also be wrong.
>
> Fixes: 0536b85239b84 ("xdp: Simplify devmap cleanup")
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> ---
>  kernel/bpf/devmap.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
> index f0bf525..0129d4a 100644
> --- a/kernel/bpf/devmap.c
> +++ b/kernel/bpf/devmap.c
> @@ -378,10 +378,8 @@ void __dev_map_flush(void)
>  	struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
>  	struct xdp_bulk_queue *bq, *tmp;
>
> -	rcu_read_lock();
>  	list_for_each_entry_safe(bq, tmp, flush_list, flush_node)
>  		bq_xmit_all(bq, XDP_XMIT_FLUSH);
> -	rcu_read_unlock();

I introduced this lock because some drivers have assumption that
.ndo_xdp_xmit() is called under RCU. (commit 86723c864063)

Maybe devmap deletion logic does not need this anymore, but is it
OK to drivers?

Toshiaki Makita
On 2020/01/09 15:01, Toshiaki Makita wrote:
> On 2020/01/09 6:35, John Fastabend wrote:
>> Now that we depend on rcu_call() and synchronize_rcu() to also wait
>> for preempt_disabled region to complete the rcu read critical section
>> in __dev_map_flush() is no longer relevant.
>>
>> These originally ensured the map reference was safe while a map was
>> also being free'd. But flush by new rules can only be called from
>> preempt-disabled NAPI context. The synchronize_rcu from the map free
>> path and the rcu_call from the delete path will ensure the reference
>> here is safe. So lets remove the rcu_read_lock and rcu_read_unlock
>> pair to avoid any confusion around how this is being protected.
>>
>> If the rcu_read_lock was required it would mean errors in the above
>> logic and the original patch would also be wrong.
>>
>> Fixes: 0536b85239b84 ("xdp: Simplify devmap cleanup")
>> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
>> ---
>>  kernel/bpf/devmap.c | 2 --
>>  1 file changed, 2 deletions(-)
>>
>> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
>> index f0bf525..0129d4a 100644
>> --- a/kernel/bpf/devmap.c
>> +++ b/kernel/bpf/devmap.c
>> @@ -378,10 +378,8 @@ void __dev_map_flush(void)
>>  	struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
>>  	struct xdp_bulk_queue *bq, *tmp;
>>
>> -	rcu_read_lock();
>>  	list_for_each_entry_safe(bq, tmp, flush_list, flush_node)
>>  		bq_xmit_all(bq, XDP_XMIT_FLUSH);
>> -	rcu_read_unlock();
>
> I introduced this lock because some drivers have assumption that
> .ndo_xdp_xmit() is called under RCU. (commit 86723c864063)
>
> Maybe devmap deletion logic does not need this anymore, but is it
> OK to drivers?

More references for the driver problem:

- Lockdep splat in virtio_net: https://lists.openwall.net/netdev/2019/04/24/282
- Discussion for fix: https://lists.openwall.net/netdev/2019/04/25/234

Toshiaki Makita
Toshiaki Makita wrote:
> On 2020/01/09 6:35, John Fastabend wrote:
> > Now that we depend on rcu_call() and synchronize_rcu() to also wait
> > for preempt_disabled region to complete the rcu read critical section
> > in __dev_map_flush() is no longer relevant.
> >
> > These originally ensured the map reference was safe while a map was
> > also being free'd. But flush by new rules can only be called from
> > preempt-disabled NAPI context. The synchronize_rcu from the map free
> > path and the rcu_call from the delete path will ensure the reference
> > here is safe. So lets remove the rcu_read_lock and rcu_read_unlock
> > pair to avoid any confusion around how this is being protected.
> >
> > If the rcu_read_lock was required it would mean errors in the above
> > logic and the original patch would also be wrong.
> >
> > Fixes: 0536b85239b84 ("xdp: Simplify devmap cleanup")
> > Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> > ---
> >  kernel/bpf/devmap.c | 2 --
> >  1 file changed, 2 deletions(-)
> >
> > diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
> > index f0bf525..0129d4a 100644
> > --- a/kernel/bpf/devmap.c
> > +++ b/kernel/bpf/devmap.c
> > @@ -378,10 +378,8 @@ void __dev_map_flush(void)
> >  	struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
> >  	struct xdp_bulk_queue *bq, *tmp;
> >
> > -	rcu_read_lock();
> >  	list_for_each_entry_safe(bq, tmp, flush_list, flush_node)
> >  		bq_xmit_all(bq, XDP_XMIT_FLUSH);
> > -	rcu_read_unlock();
>
> I introduced this lock because some drivers have assumption that
> .ndo_xdp_xmit() is called under RCU. (commit 86723c864063)
>
> Maybe devmap deletion logic does not need this anymore, but is it
> OK to drivers?

Ah OK thanks for catching this. So its a strange requirement from
virto_net to need read_lock like this. Quickly scanned the drivers
and seems its the only one.

I think the best path forward is to fix virtio_net so it doesn't
need rcu_read_lock() here then the locking is much cleaner IMO.

I'll send a v2 and either move the xdp enabled check (the piece
using the rcu_read_lock) into a bitmask flag or push the
rcu_read_lock() into virtio_net so its clear that this is a detail
of virtio_net and not a general thing. FWIW I don't think the
rcu_read_lock is actually needed in the virtio_net case anymore
either but pretty sure the rcu_dereference will cause an rcu
splat. Maybe there is another annotation we can use. I'll dig
into it tomorrow. Thanks

>
> Toshiaki Makita
On 2020/01/09 15:35, John Fastabend wrote:
> Toshiaki Makita wrote:
>> On 2020/01/09 6:35, John Fastabend wrote:
>>> Now that we depend on rcu_call() and synchronize_rcu() to also wait
>>> for preempt_disabled region to complete the rcu read critical section
>>> in __dev_map_flush() is no longer relevant.
>>>
>>> These originally ensured the map reference was safe while a map was
>>> also being free'd. But flush by new rules can only be called from
>>> preempt-disabled NAPI context. The synchronize_rcu from the map free
>>> path and the rcu_call from the delete path will ensure the reference
>>> here is safe. So lets remove the rcu_read_lock and rcu_read_unlock
>>> pair to avoid any confusion around how this is being protected.
>>>
>>> If the rcu_read_lock was required it would mean errors in the above
>>> logic and the original patch would also be wrong.
>>>
>>> Fixes: 0536b85239b84 ("xdp: Simplify devmap cleanup")
>>> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
>>> ---
>>>  kernel/bpf/devmap.c | 2 --
>>>  1 file changed, 2 deletions(-)
>>>
>>> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
>>> index f0bf525..0129d4a 100644
>>> --- a/kernel/bpf/devmap.c
>>> +++ b/kernel/bpf/devmap.c
>>> @@ -378,10 +378,8 @@ void __dev_map_flush(void)
>>>  	struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
>>>  	struct xdp_bulk_queue *bq, *tmp;
>>>
>>> -	rcu_read_lock();
>>>  	list_for_each_entry_safe(bq, tmp, flush_list, flush_node)
>>>  		bq_xmit_all(bq, XDP_XMIT_FLUSH);
>>> -	rcu_read_unlock();
>>
>> I introduced this lock because some drivers have assumption that
>> .ndo_xdp_xmit() is called under RCU. (commit 86723c864063)
>>
>> Maybe devmap deletion logic does not need this anymore, but is it
>> OK to drivers?
>
> Ah OK thanks for catching this. So its a strange requirement from
> virto_net to need read_lock like this. Quickly scanned the drivers
> and seems its the only one.

Actually veth is calling rcu_dereference in .ndo_xdp_xmit() so it needs
the same treatment. In the reference I sent in another mail, Jesper
said mlx5 also has some RCU dependency.

> I think the best path forward is to fix virtio_net so it doesn't
> need rcu_read_lock() here then the locking is much cleaner IMO.
>
> I'll send a v2 and either move the xdp enabled check (the piece
> using the rcu_read_lock) into a bitmask flag or push the
> rcu_read_lock() into virtio_net so its clear that this is a detail
> of virtio_net and not a general thing. FWIW I don't think the
> rcu_read_lock is actually needed in the virtio_net case anymore
> either but pretty sure the rcu_dereference will cause an rcu
> splat. Maybe there is another annotation we can use. I'll dig
> into it tomorrow. Thanks

I'm thinking we can just move the rcu_lock to wrap around only
ndo_xdp_xmit. But as you suggest if we can identify all drivers which
depends on RCU and move the rcu_lock into the drivers (or remove the
dependency) it's better.

Toshiaki Makita
Toshiaki Makita wrote:
> On 2020/01/09 15:35, John Fastabend wrote:
> > Toshiaki Makita wrote:
> >> On 2020/01/09 6:35, John Fastabend wrote:
> >>> Now that we depend on rcu_call() and synchronize_rcu() to also wait
> >>> for preempt_disabled region to complete the rcu read critical section
> >>> in __dev_map_flush() is no longer relevant.
> >>>
> >>> These originally ensured the map reference was safe while a map was
> >>> also being free'd. But flush by new rules can only be called from
> >>> preempt-disabled NAPI context. The synchronize_rcu from the map free
> >>> path and the rcu_call from the delete path will ensure the reference
> >>> here is safe. So lets remove the rcu_read_lock and rcu_read_unlock
> >>> pair to avoid any confusion around how this is being protected.
> >>>
> >>> If the rcu_read_lock was required it would mean errors in the above
> >>> logic and the original patch would also be wrong.
> >>>
> >>> Fixes: 0536b85239b84 ("xdp: Simplify devmap cleanup")
> >>> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> >>> ---
> >>>  kernel/bpf/devmap.c | 2 --
> >>>  1 file changed, 2 deletions(-)
> >>>
> >>> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
> >>> index f0bf525..0129d4a 100644
> >>> --- a/kernel/bpf/devmap.c
> >>> +++ b/kernel/bpf/devmap.c
> >>> @@ -378,10 +378,8 @@ void __dev_map_flush(void)
> >>>  	struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
> >>>  	struct xdp_bulk_queue *bq, *tmp;
> >>>
> >>> -	rcu_read_lock();
> >>>  	list_for_each_entry_safe(bq, tmp, flush_list, flush_node)
> >>>  		bq_xmit_all(bq, XDP_XMIT_FLUSH);
> >>> -	rcu_read_unlock();
> >>
> >> I introduced this lock because some drivers have assumption that
> >> .ndo_xdp_xmit() is called under RCU. (commit 86723c864063)
> >>
> >> Maybe devmap deletion logic does not need this anymore, but is it
> >> OK to drivers?
> >
> > Ah OK thanks for catching this. So its a strange requirement from
> > virto_net to need read_lock like this. Quickly scanned the drivers
> > and seems its the only one.
> >
> > I think the best path forward is to fix virtio_net so it doesn't
> > need rcu_read_lock() here then the locking is much cleaner IMO.
>
> Actually veth is calling rcu_dereference in .ndo_xdp_xmit() so it needs
> the same treatment. In the reference I sent in another mail, Jesper
> said mlx5 also has some RCU dependency.

So veth, virtio and tun seem to need rcu_read_lock/unlock because they
use an rcu_dereference() in the xdp_xmit path. I'll audit the rest today.

@Jesper, recall why mlx5 would require rcu_read_lock()/unlock() pair? I
just looked at mlx5_xdp_xmit and I'm not seeing a rcu_dereference in
there so if it is required I would want to understand why.

> > I'll send a v2 and either move the xdp enabled check (the piece
> > using the rcu_read_lock) into a bitmask flag or push the
> > rcu_read_lock() into virtio_net so its clear that this is a detail
> > of virtio_net and not a general thing. FWIW I don't think the
> > rcu_read_lock is actually needed in the virtio_net case anymore
> > either but pretty sure the rcu_dereference will cause an rcu
> > splat. Maybe there is another annotation we can use. I'll dig
> > into it tomorrow. Thanks
>
> I'm thinking we can just move the rcu_lock to wrap around only
> ndo_xdp_xmit. But as you suggest if we can identify all drivers which
> depends on RCU and move the rcu_lock into the drivers (or remove the
> dependency) it's better.

I think we are working in bpf-next tree here so it would be best to
identify the minimal set of drivers that require the read_lock and push
that into the driver. I prefer these things are precise so its easy to
understand when reading the code. Otherwise its really not clear
without grepping through the code or walking the git history why we
wrapped this in a rcu_read_lock/unlock. At minimum we want a comment
but that feels like a big hammer that is not needed.

Most drivers should not care about the rcu_read_lock, it seems to just
be special cases in the software devices where this happens. veth for
example is dereferencing the peer netdev. tun is dereferencing the
tun_file. virtio_net usage seems to be arbitrary to me and is simply
used to decide if xdp is enabled, but we can do that in a cleaner way.

I'll put a v2 together today and send it out for review.
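For reference, pushing the read-side lock down into a driver that needs it would look roughly like the sketch below. This is illustrative kernel-context pseudocode modeled on the veth case discussed above (dereferencing the peer netdev); the function names, signatures, and the `do_xdp_xmit()` helper are assumptions for illustration, not a proposed patch:

```
/* Illustrative sketch only: a software driver whose .ndo_xdp_xmit
 * rcu_dereference()s its own state takes the RCU read lock itself,
 * instead of relying on __dev_map_flush() to hold it for everyone.
 */
static int veth_like_ndo_xdp_xmit(struct net_device *dev, int n,
				  struct xdp_frame **frames, u32 flags)
{
	struct veth_priv *priv = netdev_priv(dev);
	struct net_device *peer;
	int ret;

	rcu_read_lock();		/* protects the peer pointer below */
	peer = rcu_dereference(priv->peer);
	if (unlikely(!peer)) {
		ret = -ENXIO;
		goto out;
	}
	ret = do_xdp_xmit(peer, n, frames, flags);	/* hypothetical helper */
out:
	rcu_read_unlock();
	return ret;
}
```

With the lock taken inside the driver, the flush path no longer carries a driver-specific locking requirement, which matches the "precise" locking argued for above.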
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index f0bf525..0129d4a 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -378,10 +378,8 @@ void __dev_map_flush(void)
 	struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
 	struct xdp_bulk_queue *bq, *tmp;
 
-	rcu_read_lock();
 	list_for_each_entry_safe(bq, tmp, flush_list, flush_node)
 		bq_xmit_all(bq, XDP_XMIT_FLUSH);
-	rcu_read_unlock();
 }
 
 /* rcu_read_lock (from syscall and BPF contexts) ensures that if a delete and/or
Now that we depend on rcu_call() and synchronize_rcu() to also wait
for preempt_disabled region to complete the rcu read critical section
in __dev_map_flush() is no longer relevant.

These originally ensured the map reference was safe while a map was
also being free'd. But flush by new rules can only be called from
preempt-disabled NAPI context. The synchronize_rcu from the map free
path and the rcu_call from the delete path will ensure the reference
here is safe. So lets remove the rcu_read_lock and rcu_read_unlock
pair to avoid any confusion around how this is being protected.

If the rcu_read_lock was required it would mean errors in the above
logic and the original patch would also be wrong.

Fixes: 0536b85239b84 ("xdp: Simplify devmap cleanup")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 kernel/bpf/devmap.c | 2 --
 1 file changed, 2 deletions(-)