Message ID | 20190927165221.2391541-1-kafai@fb.com
---|---
State | Changes Requested
Delegated to: | BPF Maintainers
Series | [bpf] bpf: Fix a race in reuseport_array_free()
On 9/27/19 9:52 AM, Martin KaFai Lau wrote:
> In reuseport_array_free(), the rcu_read_lock() cannot ensure sk is still
> valid. It is because bpf_sk_reuseport_detach() can be called from
> __sk_destruct() which is invoked through call_rcu(..., __sk_destruct).

We could question why reuseport_detach_sock(sk) is called from __sk_destruct()
(after the rcu grace period) instead of sk_destruct() ?

> This patch takes the reuseport_lock in reuseport_array_free() which
> is not the fast path. The lock is taken inside the loop in case
> that the bpf map is big.
>
> Signed-off-by: Martin KaFai Lau <kafai@fb.com>

Fixes: 5dc4c4b7d4e8 ("bpf: Introduce BPF_MAP_TYPE_REUSEPORT_SOCKARRAY")
On Fri, Sep 27, 2019 at 10:24:49AM -0700, Eric Dumazet wrote:
> On 9/27/19 9:52 AM, Martin KaFai Lau wrote:
> > In reuseport_array_free(), the rcu_read_lock() cannot ensure sk is still
> > valid. It is because bpf_sk_reuseport_detach() can be called from
> > __sk_destruct() which is invoked through call_rcu(..., __sk_destruct).
>
> We could question why reuseport_detach_sock(sk) is called from __sk_destruct()
> (after the rcu grace period) instead of sk_destruct() ?
Agree. It is another way to fix it.

In this patch, I chose to avoid the need to single out a special treatment
for reuseport_detach_sock() in sk_destruct().

I am happy either way. What do you think?

> > This patch takes the reuseport_lock in reuseport_array_free() which
> > is not the fast path. The lock is taken inside the loop in case
> > that the bpf map is big.
> >
> > Signed-off-by: Martin KaFai Lau <kafai@fb.com>
>
> Fixes: 5dc4c4b7d4e8 ("bpf: Introduce BPF_MAP_TYPE_REUSEPORT_SOCKARRAY")
Ah...missed that. Thanks!
On 9/27/19 11:17 AM, Martin Lau wrote:
> On Fri, Sep 27, 2019 at 10:24:49AM -0700, Eric Dumazet wrote:
>> On 9/27/19 9:52 AM, Martin KaFai Lau wrote:
>>> In reuseport_array_free(), the rcu_read_lock() cannot ensure sk is still
>>> valid. It is because bpf_sk_reuseport_detach() can be called from
>>> __sk_destruct() which is invoked through call_rcu(..., __sk_destruct).
>>
>> We could question why reuseport_detach_sock(sk) is called from __sk_destruct()
>> (after the rcu grace period) instead of sk_destruct() ?
> Agree. It is another way to fix it.
>
> In this patch, I chose to avoid the need to single out a special treatment
> for reuseport_detach_sock() in sk_destruct().
>
> I am happy either way. What do you think?

It seems that since we call reuseport_detach_sock() after the rcu grace
period, another cpu could catch the sk pointer in the reuse->socks[] array
and use it right before our cpu frees the socket.

RCU rules are not properly applied here, I think.

The rules for deletion are:

1) unpublish the object from various lists/arrays/hashes.
2) wait an rcu grace period.
3) free the object.

If we fix the unpublish (we need to anyway to make the data path safe),
then your patch is not needed?
What about (totally untested, might be horribly wrong)

diff --git a/net/core/sock.c b/net/core/sock.c
index 07863edbe6fc4842e47ebebf00bc21bc406d9264..d31a4b094797f73ef89110c954aa0a164879362d 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1700,8 +1700,6 @@ static void __sk_destruct(struct rcu_head *head)
 		sk_filter_uncharge(sk, filter);
 		RCU_INIT_POINTER(sk->sk_filter, NULL);
 	}
-	if (rcu_access_pointer(sk->sk_reuseport_cb))
-		reuseport_detach_sock(sk);
 
 	sock_disable_timestamp(sk, SK_FLAGS_TIMESTAMP);
 
@@ -1728,7 +1726,13 @@ static void __sk_destruct(struct rcu_head *head)
 
 void sk_destruct(struct sock *sk)
 {
-	if (sock_flag(sk, SOCK_RCU_FREE))
+	bool use_call_rcu = sock_flag(sk, SOCK_RCU_FREE);
+
+	if (rcu_access_pointer(sk->sk_reuseport_cb)) {
+		reuseport_detach_sock(sk);
+		use_call_rcu = true;
+	}
+	if (use_call_rcu)
 		call_rcu(&sk->sk_rcu, __sk_destruct);
 	else
 		__sk_destruct(&sk->sk_rcu);
On Fri, Sep 27, 2019 at 01:47:32PM -0700, Eric Dumazet wrote:
> On 9/27/19 11:17 AM, Martin Lau wrote:
> > On Fri, Sep 27, 2019 at 10:24:49AM -0700, Eric Dumazet wrote:
> >> On 9/27/19 9:52 AM, Martin KaFai Lau wrote:
> >>> In reuseport_array_free(), the rcu_read_lock() cannot ensure sk is still
> >>> valid. It is because bpf_sk_reuseport_detach() can be called from
> >>> __sk_destruct() which is invoked through call_rcu(..., __sk_destruct).
> >>
> >> We could question why reuseport_detach_sock(sk) is called from __sk_destruct()
> >> (after the rcu grace period) instead of sk_destruct() ?
> > Agree. It is another way to fix it.
> >
> > In this patch, I chose to avoid the need to single out a special treatment
> > for reuseport_detach_sock() in sk_destruct().
> >
> > I am happy either way. What do you think?
>
> It seems that since we call reuseport_detach_sock() after the rcu grace period,
> another cpu could catch the sk pointer in reuse->socks[] array and use
> it right before our cpu frees the socket.
>
> RCU rules are not properly applied here I think.
>
> The rules for deletion are :
>
> 1) unpublish object from various lists/arrays/hashes.
Thanks for the analysis. Agreed. Indeed, there is an issue in reuse->socks[],
which is shared with other sockets, and they may pick up the destructed sk
from reuse->socks[].

> 2) rcu_grace_period
> 3) free the object.
>
> If we fix the unpublish (we need to anyway to make the data path safe),
> then your patch is not needed ?
Correct, not needed.

> What about (totally untested, might be horribly wrong)
I had something similar in mind also. I will take a closer look and re-spin v2.
diff --git a/kernel/bpf/reuseport_array.c b/kernel/bpf/reuseport_array.c
index 50c083ba978c..9e593ac31ad7 100644
--- a/kernel/bpf/reuseport_array.c
+++ b/kernel/bpf/reuseport_array.c
@@ -103,29 +103,11 @@ static void reuseport_array_free(struct bpf_map *map)
 	 * array now. Hence, this function only races with
 	 * bpf_sk_reuseport_detach() which was triggerred by
 	 * close() or disconnect().
-	 *
-	 * This function and bpf_sk_reuseport_detach() are
-	 * both removing sk from "array". Who removes it
-	 * first does not matter.
-	 *
-	 * The only concern here is bpf_sk_reuseport_detach()
-	 * may access "array" which is being freed here.
-	 * bpf_sk_reuseport_detach() access this "array"
-	 * through sk->sk_user_data _and_ with sk->sk_callback_lock
-	 * held which is enough because this "array" is not freed
-	 * until all sk->sk_user_data has stopped referencing this "array".
-	 *
-	 * Hence, due to the above, taking "reuseport_lock" is not
-	 * needed here.
 	 */
-
-	/*
-	 * Since reuseport_lock is not taken, sk is accessed under
-	 * rcu_read_lock()
-	 */
-	rcu_read_lock();
 	for (i = 0; i < map->max_entries; i++) {
-		sk = rcu_dereference(array->ptrs[i]);
+		spin_lock_bh(&reuseport_lock);
+		sk = rcu_dereference_protected(array->ptrs[i],
+					       lockdep_is_held(&reuseport_lock));
 		if (sk) {
 			write_lock_bh(&sk->sk_callback_lock);
 			/*
@@ -137,8 +119,9 @@ static void reuseport_array_free(struct bpf_map *map)
 			write_unlock_bh(&sk->sk_callback_lock);
 			RCU_INIT_POINTER(array->ptrs[i], NULL);
 		}
+		spin_unlock_bh(&reuseport_lock);
+		cond_resched();
 	}
-	rcu_read_unlock();
 
 	/*
 	 * Once reaching here, all sk->sk_user_data is not
In reuseport_array_free(), the rcu_read_lock() cannot ensure sk is still
valid. It is because bpf_sk_reuseport_detach() can be called from
__sk_destruct() which is invoked through call_rcu(..., __sk_destruct).

This patch takes the reuseport_lock in reuseport_array_free() which
is not the fast path. The lock is taken inside the loop in case
that the bpf map is big.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 kernel/bpf/reuseport_array.c | 27 +++++----------------------
 1 file changed, 5 insertions(+), 22 deletions(-)