
net/unix: sk_socket can disappear when state is unlocked

Message ID 1432225541-28498-1-git-send-email-salyzyn@android.com
State Superseded, archived
Delegated to: David Miller
Headers show

Commit Message

Mark Salyzyn May 21, 2015, 4:25 p.m. UTC
got a rare NULL pointer dereference in clear_bit

Signed-off-by: Mark Salyzyn <salyzyn@android.com>
---
 net/unix/af_unix.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Hannes Frederic Sowa May 22, 2015, 9:50 a.m. UTC | #1
On Do, 2015-05-21 at 09:25 -0700, Mark Salyzyn wrote:
> got a rare NULL pointer dereference in clear_bit
> 
> Signed-off-by: Mark Salyzyn <salyzyn@android.com>
> ---
>  net/unix/af_unix.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 5266ea7..37a8925 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -1880,6 +1880,11 @@ static long unix_stream_data_wait(struct sock *sk, long timeo,
>  		unix_state_unlock(sk);
>  		timeo = freezable_schedule_timeout(timeo);
>  		unix_state_lock(sk);
> +
> +		/* sk_socket may have been killed while unlocked */
> +		if (!sk->sk_socket)
> +			break;
> +
>  		clear_bit(SOCK_ASYNC_WAITDATA, &sk->sk_socket->flags);
>  	}
>  

The canonical way is to test for sock_flag(sk, SOCK_DEAD). Also, it does
not seem like we return an error to user space; we are still looping to
try to dequeue skbs from sk_receive_queue, which is concurrently emptied
by unix_release (maybe without holding unix_state_lock).

Bye,
Hannes


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mark Salyzyn May 22, 2015, 2:51 p.m. UTC | #2
On 05/22/2015 02:50 AM, Hannes Frederic Sowa wrote:
> On Do, 2015-05-21 at 09:25 -0700, Mark Salyzyn wrote:
>> got a rare NULL pointer dereference in clear_bit
>>
>> Signed-off-by: Mark Salyzyn <salyzyn@android.com>
>> ---
>>   net/unix/af_unix.c | 5 +++++
>>   1 file changed, 5 insertions(+)
>>
>> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
>> index 5266ea7..37a8925 100644
>> --- a/net/unix/af_unix.c
>> +++ b/net/unix/af_unix.c
>> @@ -1880,6 +1880,11 @@ static long unix_stream_data_wait(struct sock *sk, long timeo,
>>   		unix_state_unlock(sk);
>>   		timeo = freezable_schedule_timeout(timeo);
>>   		unix_state_lock(sk);
>> +
>> +		/* sk_socket may have been killed while unlocked */
>> +		if (!sk->sk_socket)
>> +			break;
>> +
>>   		clear_bit(SOCK_ASYNC_WAITDATA, &sk->sk_socket->flags);
>>   	}
>>   
> Canonical way is to test for sock_flag(sk, SOCK_DEAD). Also it does not
> seem like we are returning an error to user space but are still looping
> to try to dequeue skbs from sk_receive_queue, which is concurrently
> emptied by unix_release (maybe, without holding unix_state_lock).
>
> Bye,
> Hannes
>
I will send an updated patch shortly.

It may be acceptable, given the expectation that sk_set_socket(sk, NULL) 
occurs after the SOCK_DEAD flag is set, since we would not be here during 
the socket initialization/connection phases. As such, across all phases 
(and I re-iterate, we can only be here in the connected state), there is 
no generic guarantee that sk_socket != NULL. But I only saw one apparent 
example (in net/decnet/dn_nsp_in.c) of using sock_flag(sk, SOCK_DEAD) as 
protection against a possible NULL dereference through sk_socket, and 
many KISS examples of checking sk_socket for NULL to protect against this.

Thanks for making me look, though; it appears that I missed the same 
problem in net/caif/caif_socket.c and will add it!

Sincerely -- Mark Salyzyn
Hannes Frederic Sowa May 22, 2015, 3:35 p.m. UTC | #3
On Fr, 2015-05-22 at 07:51 -0700, Mark Salyzyn wrote:
> On 05/22/2015 02:50 AM, Hannes Frederic Sowa wrote:
> > On Do, 2015-05-21 at 09:25 -0700, Mark Salyzyn wrote:
> >> got a rare NULL pointer dereference in clear_bit
> >>
> >> Signed-off-by: Mark Salyzyn <salyzyn@android.com>
> >> ---
> >>   net/unix/af_unix.c | 5 +++++
> >>   1 file changed, 5 insertions(+)
> >>
> >> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> >> index 5266ea7..37a8925 100644
> >> --- a/net/unix/af_unix.c
> >> +++ b/net/unix/af_unix.c
> >> @@ -1880,6 +1880,11 @@ static long unix_stream_data_wait(struct sock *sk, long timeo,
> >>   		unix_state_unlock(sk);
> >>   		timeo = freezable_schedule_timeout(timeo);
> >>   		unix_state_lock(sk);
> >> +
> >> +		/* sk_socket may have been killed while unlocked */
> >> +		if (!sk->sk_socket)
> >> +			break;
> >> +
> >>   		clear_bit(SOCK_ASYNC_WAITDATA, &sk->sk_socket->flags);
> >>   	}
> >>   
> > Canonical way is to test for sock_flag(sk, SOCK_DEAD). Also it does not
> > seem like we are returning an error to user space but are still looping
> > to try to dequeue skbs from sk_receive_queue, which is concurrently
> > emptied by unix_release (maybe, without holding unix_state_lock).
> >
> > Bye,
> > Hannes
> >
> I will send an updated patch shortly.
> 
> It may be acceptable given the expectation that sk_set_socket(sk, NULL) 
> occurs after SOCK_DEAD flag is set since we would not be here during the 
> socket initialization/connection phases. As such, for all phases (and I 
> re-iterate, we can only be here if in connected state), it is not a 
> generic guarantee of sk_socket != NULL. But I only saw one apparent 
> example (in net/decnet/dn_nsp_in.c) of using sock_flag(sk, SOCK_DEAD) as 
> protection against a possible NULL dereference through sk_socket, and 
> many KISS examples of checking sk_socket for NULL to protect against this.
> 
> Thanks for making me look though, it appears that I missed the same 
> problem in net/caif/caif_socket.c and will add it!

Thank you for v2 of the patch.

I still wonder if we need to actually recheck the condition and not
simply break out of unix_stream_data_wait:

We return to the unix_stream_recvmsg loop and recheck the
sk_receive_queue. At this point sk_receive_queue is not really protected
with unix_state_lock against concurrent modification with unix_release,
as such we could end up concurrently dequeueing packets if socket is
DEAD.

Does that make sense?

Thanks,
Hannes


Mark Salyzyn May 22, 2015, 4:24 p.m. UTC | #4
On 05/22/2015 08:35 AM, Hannes Frederic Sowa wrote:
> I still wonder if we need to actually recheck the condition and not
> simply break out of unix_stream_data_wait:
>
> We return to the unix_stream_recvmsg loop and recheck the
> sk_receive_queue. At this point sk_receive_queue is not really protected
> with unix_state_lock against concurrent modification with unix_release,
> as such we could end up concurrently dequeueing packets if socket is
> DEAD.
sock destroy (sic) is called before sock_orphan, which sets SOCK_DEAD, so 
the receive queue has already been drained.

Sincerely -- Mark Salyzyn
Hannes Frederic Sowa May 22, 2015, 6:16 p.m. UTC | #5
On Fri, May 22, 2015, at 18:24, Mark Salyzyn wrote:
> On 05/22/2015 08:35 AM, Hannes Frederic Sowa wrote:
> > I still wonder if we need to actually recheck the condition and not
> > simply break out of unix_stream_data_wait:
> >
> > We return to the unix_stream_recvmsg loop and recheck the
> > sk_receive_queue. At this point sk_receive_queue is not really protected
> > with unix_state_lock against concurrent modification with unix_release,
> > as such we could end up concurrently dequeueing packets if socket is
> > DEAD.
> sock destroy(sic) is called before sock_orphan which sets SOCK_DEAD, so 
> the receive queue has already been drained.

I am still afraid that there is a race:

When we break out in unix_stream_data_wait, we will most of the time hit
the continue statement in unix_stream_recvmsg. Although we have
reacquired the state lock, we can end up in a situation where
sk_receive_queue is not completely drained. We would then miss the
recheck of the sk_shutdown mask, because it is possible to dequeue a
non-NULL skb from the receive queue. This is because unix_release_sock
acquires the state lock and sets the appropriate flags, but the draining
of the receive queue happens without locks; the state lock is unlocked
before that. So, theoretically, both release_sock and recvmsg could
dequeue skbs concurrently, with nondeterministic behavior.

The fix would be to recheck SOCK_DEAD or, even better, sk_shutdown right
after we reacquire the state lock, and break out of the loop altogether,
maybe with -ECONNRESET.

Thanks,
Hannes
Mark Salyzyn May 22, 2015, 7:59 p.m. UTC | #6
On 05/22/2015 11:16 AM, Hannes Frederic Sowa wrote:
> On Fri, May 22, 2015, at 18:24, Mark Salyzyn wrote:
>> On 05/22/2015 08:35 AM, Hannes Frederic Sowa wrote:
>>> I still wonder if we need to actually recheck the condition and not
>>> simply break out of unix_stream_data_wait:
>>>
>>> We return to the unix_stream_recvmsg loop and recheck the
>>> sk_receive_queue. At this point sk_receive_queue is not really protected
>>> with unix_state_lock against concurrent modification with unix_release,
>>> as such we could end up concurrently dequeueing packets if socket is
>>> DEAD.
>> sock destroy(sic) is called before sock_orphan which sets SOCK_DEAD, so
>> the receive queue has already been drained.
> I am still afraid that there is a race:
>
> When we break out in unix_stream_data_wait we most of the time hit the
> continue statement in unix_stream_recvmsg. Albeit we acquired state lock
> again, we could end up in a situation where the sk_receive_queue is not
> completely drained. We would miss the recheck of the sk_shutdown mask,
> because it is possible we dequeue a non-null skb from the receive queue.
> This is because unix_release_sock acquires state lock, sets appropriate
> flags but the draining of the receive queue does happen without locks,
> state lock is unlocked before that. So theoretically both, release_sock
> and recvmsg could dequeue skbs concurrently in nondeterministic
> behavior.
>
> The fix would be to recheck SOCK_DEAD or even better, sk_shutdown right
> after we reacquired state_lock and break out of the loop altogether,
> maybe with -ECONNRESET.
>
> Thanks,
> Hannes
I am trying to figure out _how_ to appease your worries.

Keep in mind that what I hit was already rare, and it resulted in a 
panic. Nondeterministic packet delivery during shutdown is a given; but 
if I buy that one can receive another frame after the packet flush and 
RCV_SHUTDOWN, and that SOCK_DEAD is set under lock before we return to 
the waiting thread, would you be more comfortable with:

         do {
                 int chunk;
                 struct sk_buff *skb, *last;

                 unix_state_lock(sk);
                 last = skb = skb_peek(&sk->sk_receive_queue);
  again:
-               if (skb == NULL) {
+               if (!skb || sock_flag(sk, SOCK_DEAD)) {
                         unix_sk(sk)->recursion_level = 0;
                         if (copied >= target)
                                 goto unlock;

- or -

+               skb = NULL;
+               if (!sock_flag(sk, SOCK_DEAD))  /* check after loop, but not in again loop? */
+                       skb = skb_peek(&sk->sk_receive_queue);
+               last = skb;

I know this does not give you -ECONNRESET (but we will get the 
sock_error(sk) disposition; another check for sock_flag when err == 0 
could fix that).

Too far to go to deal with nondeterministic packet flow? Getting a last 
packet or not does not seem worth the CPU cycles of trouble?

Sincerely -- Mark Salyzyn


Patch

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 5266ea7..37a8925 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1880,6 +1880,11 @@  static long unix_stream_data_wait(struct sock *sk, long timeo,
 		unix_state_unlock(sk);
 		timeo = freezable_schedule_timeout(timeo);
 		unix_state_lock(sk);
+
+		/* sk_socket may have been killed while unlocked */
+		if (!sk->sk_socket)
+			break;
+
 		clear_bit(SOCK_ASYNC_WAITDATA, &sk->sk_socket->flags);
 	}