diff mbox

[net] tun, bpf: fix suspicious RCU usage in tun_{attach,detach}_filter

Message ID 56FD795C.9090903@stressinduktion.org
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Hannes Frederic Sowa March 31, 2016, 7:24 p.m. UTC
Hello,

On 31.03.2016 21:21, David Miller wrote:
> From: Daniel Borkmann <daniel@iogearbox.net>
> Date: Thu, 31 Mar 2016 14:16:18 +0200
>
>> On 03/31/2016 01:59 PM, Eric Dumazet wrote:
>>> On Thu, 2016-03-31 at 13:35 +0200, Daniel Borkmann wrote:
>>>
>>>> +static inline bool sock_owned_externally(const struct sock *sk)
>>>> +{
>>>> +	return sk->sk_flags & (1UL << SOCK_EXTERNAL_OWNER);
>>>> +}
>>>> +
>>>
>>> Have you reinvented sock_flag(sk, SOCK_EXTERNAL_OWNER)? ;)
>>>
>>> Anyway, using a flag for this purpose sounds overkill to me.
>>
>> Right.
>>
>>> Setting it is a way to 'fool' lockdep anyway...
>>
>> Yep, correct, we'd be fooling the tun case, so this diff doesn't
>> really make it any better there.
>
> I like the currently proposed patch where TUN says that RTNL is what
> the synchronizing element is.
>
> Maybe we could make a helper of some sort, but since we only have one
> case like this, it is just overkill.
>
> Alexei, do you really mind if I apply Daniel's patch?

I proposed the following patch to Daniel and he seemed to like it. I
was just waiting for his feedback and tags and wanted to send it out
then.

What do you think?

lockdep_sock_is_held also has other applications in other parts of the
source tree.

Bye,
Hannes

  	if (old_fp)
@@ -2259,7 +2260,9 @@ int sk_detach_filter(struct sock *sk)
  		return -EPERM;

  	filter = rcu_dereference_protected(sk->sk_filter,
-					   sock_owned_by_user(sk));
+					   lockdep_rtnl_is_held() ||
+					   lockdep_sock_is_held(sk));
+
  	if (filter) {
  		RCU_INIT_POINTER(sk->sk_filter, NULL);
  		sk_filter_uncharge(sk, filter);


Comments

Alexei Starovoitov March 31, 2016, 7:31 p.m. UTC | #1
On Thu, Mar 31, 2016 at 09:24:12PM +0200, Hannes Frederic Sowa wrote:
> Hello,
> 
> On 31.03.2016 21:21, David Miller wrote:
> >From: Daniel Borkmann <daniel@iogearbox.net>
> >Date: Thu, 31 Mar 2016 14:16:18 +0200
> >
> >>On 03/31/2016 01:59 PM, Eric Dumazet wrote:
> >>>On Thu, 2016-03-31 at 13:35 +0200, Daniel Borkmann wrote:
> >>>
> >>>>+static inline bool sock_owned_externally(const struct sock *sk)
> >>>>+{
> >>>>+	return sk->sk_flags & (1UL << SOCK_EXTERNAL_OWNER);
> >>>>+}
> >>>>+
> >>>
> >>>Have you reinvented sock_flag(sk, SOCK_EXTERNAL_OWNER)? ;)
> >>>
> >>>Anyway, using a flag for this purpose sounds overkill to me.
> >>
> >>Right.
> >>
> >>>Setting it is a way to 'fool' lockdep anyway...
> >>
> >>Yep, correct, we'd be fooling the tun case, so this diff doesn't
> >>really make it any better there.
> >
> >I like the currently proposed patch where TUN says that RTNL is what
> >the synchronizing element is.
> >
> >Maybe we could make a helper of some sort, but since we only have one
> >case like this, it is just overkill.
> >
> >Alexei, do you really mind if I apply Daniel's patch?

I don't have strong opinion either way.
Though Hannes's patch below looks simpler and easier to backport.
Yeah, I do care about backports quite a bit more nowadays :)

> I proposed the following patch to Daniel and he seemed to like it. I
> was just waiting for his feedback and tags and wanted to send it out
> then.
> 
> What do you think?
> 
> lockdep_sock_is_held also has other applications in other parts of the
> source tree.
> 
> Bye,
> Hannes
> 
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 255d3e03727b73..651b84a38cfb9b 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1330,6 +1330,12 @@ static inline void sock_release_ownership(struct sock *sk)
>  	sk->sk_lock.owned = 0;
>  }
> 
> +static inline bool lockdep_sock_is_held(struct sock *sk)
> +{
> +	return lockdep_is_held(&sk->sk_lock) ||
> +	       lockdep_is_held(&sk->sk_lock.slock);
> +}
> +
>  /*
>   * Macro so as to not evaluate some arguments when
>   * lockdep is not enabled.
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 4b81b71171b4ce..8ab270d5ce5507 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -1166,7 +1166,8 @@ static int __sk_attach_prog(struct bpf_prog *prog, struct sock *sk)
>  	}
> 
>  	old_fp = rcu_dereference_protected(sk->sk_filter,
> -					   sock_owned_by_user(sk));
> +					   lockdep_rtnl_is_held() ||
> +					   lockdep_sock_is_held(sk));
>  	rcu_assign_pointer(sk->sk_filter, fp);
> 
>  	if (old_fp)
> @@ -2259,7 +2260,9 @@ int sk_detach_filter(struct sock *sk)
>  		return -EPERM;
> 
>  	filter = rcu_dereference_protected(sk->sk_filter,
> -					   sock_owned_by_user(sk));
> +					   lockdep_rtnl_is_held() ||
> +					   lockdep_sock_is_held(sk));
> +
>  	if (filter) {
>  		RCU_INIT_POINTER(sk->sk_filter, NULL);
>  		sk_filter_uncharge(sk, filter);
>
David Miller March 31, 2016, 7:36 p.m. UTC | #2
From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Thu, 31 Mar 2016 21:24:12 +0200

> diff --git a/net/core/filter.c b/net/core/filter.c
> index 4b81b71171b4ce..8ab270d5ce5507 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -1166,7 +1166,8 @@ static int __sk_attach_prog(struct bpf_prog *prog, struct sock *sk)
>  	}
> 
>  	old_fp = rcu_dereference_protected(sk->sk_filter,
> -					   sock_owned_by_user(sk));
> +					   lockdep_rtnl_is_held() ||
> +					   lockdep_sock_is_held(sk));
>  	rcu_assign_pointer(sk->sk_filter, fp);
> 
>  	if (old_fp)

I have the same objections Daniel did.

Not all socket filter clients use RTNL as the synchronization
mechanism.  The caller, or some descriptive element, should tell us
what that synchronizing element is.

Yes, I understand how these RTNL checks can pass "accidentally" but
the opposite is true too.  A socket locking synchronizing user,
who didn't lock the socket, might now pass because RTNL happens
to be held elsewhere.

Constraining the test properly, based upon the user, makes this less
likely to happen.  And that's desirable.
Hannes Frederic Sowa March 31, 2016, 7:48 p.m. UTC | #3
On 31.03.2016 21:36, David Miller wrote:
> From: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Date: Thu, 31 Mar 2016 21:24:12 +0200
>
>> diff --git a/net/core/filter.c b/net/core/filter.c
>> index 4b81b71171b4ce..8ab270d5ce5507 100644
>> --- a/net/core/filter.c
>> +++ b/net/core/filter.c
>> @@ -1166,7 +1166,8 @@ static int __sk_attach_prog(struct bpf_prog *prog, struct sock *sk)
>>   	}
>>
>>   	old_fp = rcu_dereference_protected(sk->sk_filter,
>> -					   sock_owned_by_user(sk));
>> +					   lockdep_rtnl_is_held() ||
>> +					   lockdep_sock_is_held(sk));
>>   	rcu_assign_pointer(sk->sk_filter, fp);
>>
>>   	if (old_fp)
>
> I have the same objections Daniel did.
>
> Not all socket filter clients use RTNL as the synchronization
> mechanism.  The caller, or some descriptive element, should tell us
> what that synchronizing element is.
>
> Yes, I understand how these RTNL checks can pass "accidentally" but
> the opposite is true too.  A socket locking synchronizing user,
> who didn't lock the socket, might now pass because RTNL happens
> to be held elsewhere.

Actually, lockdep_rtnl_is_held checks whether this specific code
path/thread holds the lock, not whether some other CPU/thread does. So
it will not pass here if another CPU holds the lock.

lockdep stores the currently held locks in current->held_locks; if we
get preempted, the current pointer switches, and while we hold a
spinlock we can't sleep and thus can't be preempted. So we always know
that this specific code path holds the lock.

Using sock_owned_by_user actually has this problem, and thus I am 
replacing it. We don't know who has the socket locked.

Tightest solution would probably be to combine both patches.

bool called_by_tuntap;

old_fp = rcu_dereference_protected(sk->sk_filter, called_by_tuntap ?
				   lockdep_rtnl_is_held() :
				   lockdep_sock_is_held(sk));

Bye,
Hannes
David Miller March 31, 2016, 7:48 p.m. UTC | #4
From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Date: Thu, 31 Mar 2016 12:31:56 -0700

> On Thu, Mar 31, 2016 at 09:24:12PM +0200, Hannes Frederic Sowa wrote:
>> Hello,
>> 
>> On 31.03.2016 21:21, David Miller wrote:
>> >From: Daniel Borkmann <daniel@iogearbox.net>
>> >Date: Thu, 31 Mar 2016 14:16:18 +0200
>> >
>> >Alexei, do you really mind if I apply Daniel's patch?
> 
> I don't have strong opinion either way.
> Though Hannes's patch below looks simpler and easier to backport.
> Yeah, I do care about backports quite a bit more nowadays :)

You know, I care a lot about backports too :)

But Hannes's patch has the same fundamental issue, I think.

If we accept both synchronization styles, false positives are more
likely.

And in the long term, we can fix the false positive possibilities with
the RTNL checks.
David Miller March 31, 2016, 7:50 p.m. UTC | #5
From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Thu, 31 Mar 2016 21:48:27 +0200

> Tightest solution would probably be to combine both patches.
> 
> bool called_by_tuntap;
> 
> old_fp = rcu_dereference_protected(sk->sk_filter, called_by_tuntap ?
> lockdep_rtnl_is_held() : lockdep_sock_is_held(sk));

Ok, I see what you're saying.

I misunderstood how the RTNL lockdep checks work and thought we could
get false positives from other entities taking RTNL.

Can you cook up the combined patch?

Thanks.
Daniel Borkmann March 31, 2016, 9:52 p.m. UTC | #6
On 03/31/2016 09:48 PM, Hannes Frederic Sowa wrote:
[...]
> Tightest solution would probably be to combine both patches.
>
> bool called_by_tuntap;
>
> old_fp = rcu_dereference_protected(sk->sk_filter, called_by_tuntap ? lockdep_rtnl_is_held() : lockdep_sock_is_held(sk));

If I understand you correctly with combining them, you mean you'd still
need the API change to pass the bool 'called_by_tuntap' down, right?

If so, your main difference is, after all, to replace the sock_owned_by_user()
with the lockdep_sock_is_held() construction instead, correct?

But then, isn't it already sufficient when you pass the bool itself down
that 'demuxes' in this case between the sock_owned_by_user() vs
lockdep_rtnl_is_held() check?

Thanks,
Daniel
Hannes Frederic Sowa March 31, 2016, 11:31 p.m. UTC | #7
Hi Daniel,

On 31.03.2016 23:52, Daniel Borkmann wrote:
> On 03/31/2016 09:48 PM, Hannes Frederic Sowa wrote:
> [...]
>> Tightest solution would probably be to combine both patches.
>>
>> bool called_by_tuntap;
>>
>> old_fp = rcu_dereference_protected(sk->sk_filter, called_by_tuntap ?
>> lockdep_rtnl_is_held() : lockdep_sock_is_held(sk));
>
> If I understand you correctly with combining them, you mean you'd still
> need the API change to pass the bool 'called_by_tuntap' down, right?

I actually decided to simply lock the sockets, so that there will
always be a clear lock owner during the updates. I think this is
cleaner. What do you think?

> If so, your main difference is, after all, to replace the
> sock_owned_by_user()
> with the lockdep_sock_is_held() construction instead, correct?

I just didn't do that part because we hold socket lock now.

> But then, isn't it already sufficient when you pass the bool itself down
> that 'demuxes' in this case between the sock_owned_by_user() vs
> lockdep_rtnl_is_held() check?

I just sent out the patches, please have a look.

Thanks,
Hannes

Patch

diff --git a/include/net/sock.h b/include/net/sock.h
index 255d3e03727b73..651b84a38cfb9b 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1330,6 +1330,12 @@ static inline void sock_release_ownership(struct sock *sk)
  	sk->sk_lock.owned = 0;
  }

+static inline bool lockdep_sock_is_held(struct sock *sk)
+{
+	return lockdep_is_held(&sk->sk_lock) ||
+	       lockdep_is_held(&sk->sk_lock.slock);
+}
+
  /*
   * Macro so as to not evaluate some arguments when
   * lockdep is not enabled.
diff --git a/net/core/filter.c b/net/core/filter.c
index 4b81b71171b4ce..8ab270d5ce5507 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1166,7 +1166,8 @@ static int __sk_attach_prog(struct bpf_prog *prog, struct sock *sk)
  	}

  	old_fp = rcu_dereference_protected(sk->sk_filter,
-					   sock_owned_by_user(sk));
+					   lockdep_rtnl_is_held() ||
+					   lockdep_sock_is_held(sk));
  	rcu_assign_pointer(sk->sk_filter, fp);