rds: fix use-after-free read in rds_find_bound

Message ID	1514662599-14491-1-git-send-email-santosh.shilimkar@oracle.com
State	Changes Requested, archived
Delegated to:	David Miller
Headers	show Return-Path: <netdev-owner@vger.kernel.org> From: Santosh Shilimkar <santosh.shilimkar@oracle.com> To: netdev@vger.kernel.org, davem@davemloft.net Cc: linux-kernel@vger.kernel.org, Santosh Shilimkar <santosh.shilimkar@oracle.com> Subject: [PATCH] rds: fix use-after-free read in rds_find_bound Date: Sat, 30 Dec 2017 11:36:39 -0800 Message-Id: <1514662599-14491-1-git-send-email-santosh.shilimkar@oracle.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk
Series	rds: fix use-after-free read in rds_find_bound \| expand rds: fix use-after-free read in rds_find_bound

Santosh Shilimkar Dec. 30, 2017, 7:36 p.m. UTC

socket buffer can get freed as part of sock_close
callback so before adding reference check underneath
socket validity.

Reported-by: syzbot+93a5839deb355537440f@syzkaller.appspotmail.com
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
---
 net/rds/bind.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Sowmini Varadhan Dec. 30, 2017, 8:26 p.m. UTC | #1

On (12/30/17 11:36), Santosh Shilimkar wrote:
> 
> socket buffer can get freed as part of sock_close
> callback so before adding reference check underneath
> socket validity.

I'm not sure I understand this fix-  

struct rds_sock is:
  struct rds_sock {
        struct sock             rs_sk;
         :
  }

How can  rs be non-null but rds_rs_to_sk() is null? (Note that
rds_rs_to_sk just returns &rs->rs_sk) so the changed line is 
identical to the original line.

> -	if (rs && !sock_flag(rds_rs_to_sk(rs), SOCK_DEAD))
> +	if (rs && rds_rs_to_sk(rs) && !sock_flag(rds_rs_to_sk(rs), SOCK_DEAD))

I think the real issue is refcount bug somewhere,

Was the syzbot test run with http://patchwork.ozlabs.org/patch/852492/
this sounds like that type of bug.

--Sowmini

Santosh Shilimkar Dec. 30, 2017, 9:37 p.m. UTC | #2

On 12/30/17 12:26 PM, Sowmini Varadhan wrote:
> On (12/30/17 11:36), Santosh Shilimkar wrote:
>>
>> socket buffer can get freed as part of sock_close
>> callback so before adding reference check underneath
>> socket validity.
> 
> I'm not sure I understand this fix-
> 
> struct rds_sock is:
>    struct rds_sock {
>          struct sock             rs_sk;
>           :
>    }
> 
> How can  rs be non-null but rds_rs_to_sk() is null? (Note that
> rds_rs_to_sk just returns &rs->rs_sk) so the changed line is
> identical to the original line.
>
Well thats what the report says o.w flag test wouldn't have
been attempted.

>> -	if (rs && !sock_flag(rds_rs_to_sk(rs), SOCK_DEAD))
>> +	if (rs && rds_rs_to_sk(rs) && !sock_flag(rds_rs_to_sk(rs), SOCK_DEAD))
> 
> I think the real issue is refcount bug somewhere,
>
Thats what I thought as well initially but since the reported case,
the rs seems to be valid where as sk seems to be freed up as part of
sock_release callback.

> Was the syzbot test run with http://patchwork.ozlabs.org/patch/852492/
> this sounds like that type of bug.
> 
That fix scenario, the rs don't get inserted in hash table and
in this particular bug, the lookup was successful so am not sure
if these two bugs are related.

But since bound address fix was still not part of the build
reproduced use after free bug, $subject fix can wait for next
reproduction. Unfortunately as per the report, there is no
reproducer for it to test if other fix fixes this issue.

Regards,
Santosh

Sowmini Varadhan Dec. 30, 2017, 10:32 p.m. UTC | #3

On (12/30/17 13:37), santosh.shilimkar@oracle.com wrote:
> Well thats what the report says o.w flag test wouldn't have
> been attempted.

the bug report says "use-after-free". 

It doesnt say that rds_rs_to_sk(rs) is null (if rds_rs_to_sk(rs) was null,
rs would also be null, please cscope struct rds_sock)

What the bug report says is 
" The buggy address belongs to the object at ffff8801c09a6080
  which belongs to the cache RDS of size 1472
 The buggy address is located 96 bytes inside of .."

96 is the offset of sk->sk_flags. so yes, there is a socket refcount
issue.

But the patch you sent (see next two lines) will not solve that.

> >>-	if (rs && !sock_flag(rds_rs_to_sk(rs), SOCK_DEAD))
> >>+	if (rs && rds_rs_to_sk(rs) && !sock_flag(rds_rs_to_sk(rs), SOCK_DEAD))

Sowmini>I think the real issue is refcount bug somewhere,

> Thats what I thought as well initially but since the reported case,
> the rs seems to be valid where as sk seems to be freed up as part of
> sock_release callback.

I dont understand the statement above- how can "rs be valid, and sk
be freed"?

rs_sk is embedded in the struct rds_sock, it is not a pointer. 

let's find and fix the refcount bug. See stack trace in commit comment.
The socket release is happening prematurely and existing WARN_ONs
are not catching it.

> >Was the syzbot test run with http://patchwork.ozlabs.org/patch/852492/
> >this sounds like that type of bug.

--Sowmini

Santosh Shilimkar Dec. 31, 2017, 5:09 a.m. UTC | #4

On 12/30/17 2:32 PM, Sowmini Varadhan wrote:
> On (12/30/17 13:37), santosh.shilimkar@oracle.com wrote:

[...]

>> Thats what I thought as well initially but since the reported case,
>> the rs seems to be valid where as sk seems to be freed up as part of
>> sock_release callback.
> 
> I dont understand the statement above- how can "rs be valid, and sk
> be freed"?
>
> rs_sk is embedded in the struct rds_sock, it is not a pointer.
>
I was going with order of evaluation of if () but you made good point.
rs_sk isn't a pointer so sk can't be null.

> let's find and fix the refcount bug. See stack trace in commit comment.
> The socket release is happening prematurely and existing WARN_ONs
> are not catching it.
>
Right. This was loop transport in action so xmit will just flip
the direction with receive. And rds_recv_incoming() can race with 
socket_release. rds_find_bound() is suppose to add ref count on
socket for rds_recv_incoming() but by that time socket is DEAD &
freed by socket release callback.
And rds_release is suppose to be synced with rs_recv_lock. But
release callback is marking the sk orphan before syncing
up with receive path and updating the bind table. Probably it
can pushed down further after the socket clean up buut need
to think bit more.


diff --git a/net/rds/af_rds.c b/net/rds/af_rds.c
index b405f77..11e1426 100644
--- a/net/rds/af_rds.c
+++ b/net/rds/af_rds.c
@@ -65,7 +65,6 @@ static int rds_release(struct socket *sock)

         rs = rds_sk_to_rs(sk);

-       sock_orphan(sk);
         /* Note - rds_clear_recv_queue grabs rs_recv_lock, so
          * that ensures the recv path has completed messing
          * with the socket. */
@@ -85,6 +84,7 @@ static int rds_release(struct socket *sock)

         rds_trans_put(rs->rs_transport);

+       sock_orphan(sk);
         sock->sk = NULL;
         sock_put(sk);
  out:

Sowmini Varadhan Dec. 31, 2017, 12:33 p.m. UTC | #5

On (12/30/17 21:09), santosh.shilimkar@oracle.com wrote:
> Right. This was loop transport in action so xmit will just flip
> the direction with receive. And rds_recv_incoming() can race with
> socket_release. rds_find_bound() is suppose to add ref count on
> socket for rds_recv_incoming() but by that time socket is DEAD &
> freed by socket release callback.

correct, that makes sense.

> And rds_release is suppose to be synced with rs_recv_lock. But
> release callback is marking the sk orphan before syncing
> up with receive path and updating the bind table. Probably it
> can pushed down further after the socket clean up buut need
> to think bit more.

However, I'm not sure this seals the race..  according to the
bug report rds_recv_incoming->rds_find_bound is being called
in rds_send_worker context and the rds_find_bound code is

     63         rs = rhashtable_lookup_fast(&bind_hash_table, &key, ht_parms);
     64         if (rs && !sock_flag(rds_rs_to_sk(rs), SOCK_DEAD))
     65                 rds_sock_addref(rs);
     66         else 
     67                 rs = NULL;
     68 

Since the entire logic of rds_release can interleave between line 63
and 64, (whereas we only addref at line 65), moving the sock_orphan
will not help.

I see that there was an explicic synchornization via the bucket->lock 
before 7b5654349e. I think you need something like that, or some type
or rcu-based hash list.

Patch below may make race-window smaller, but race window is still there.

> 
> 
> diff --git a/net/rds/af_rds.c b/net/rds/af_rds.c
> index b405f77..11e1426 100644
> --- a/net/rds/af_rds.c
> +++ b/net/rds/af_rds.c
> @@ -65,7 +65,6 @@ static int rds_release(struct socket *sock)
> 
>         rs = rds_sk_to_rs(sk);
> 
> -       sock_orphan(sk);
>         /* Note - rds_clear_recv_queue grabs rs_recv_lock, so
>          * that ensures the recv path has completed messing
>          * with the socket. */
> @@ -85,6 +84,7 @@ static int rds_release(struct socket *sock)
> 
>         rds_trans_put(rs->rs_transport);
> 
> +       sock_orphan(sk);
>         sock->sk = NULL;
>         sock_put(sk);
>  out:

Santosh Shilimkar Dec. 31, 2017, 10:30 p.m. UTC | #6

On 12/31/17 4:33 AM, Sowmini Varadhan wrote:
> On (12/30/17 21:09), santosh.shilimkar@oracle.com wrote:
>> Right. This was loop transport in action so xmit will just flip
>> the direction with receive. And rds_recv_incoming() can race with
>> socket_release. rds_find_bound() is suppose to add ref count on
>> socket for rds_recv_incoming() but by that time socket is DEAD &
>> freed by socket release callback.
> 
> correct, that makes sense.
>
Yea. In fact the earlier point of sk being null and rs not is
also possible because socket release explicitly marks its
NULL ("sock->sk = NULL"). But it just side effect of the actual
race.

>> And rds_release is suppose to be synced with rs_recv_lock. But
>> release callback is marking the sk orphan before syncing
>> up with receive path and updating the bind table. Probably it
>> can pushed down further after the socket clean up buut need
>> to think bit more.
> 
> However, I'm not sure this seals the race..  according to the
> bug report rds_recv_incoming->rds_find_bound is being called
> in rds_send_worker context and the rds_find_bound code is
> 
>       63         rs = rhashtable_lookup_fast(&bind_hash_table, &key, ht_parms);
>       64         if (rs && !sock_flag(rds_rs_to_sk(rs), SOCK_DEAD))
>       65                 rds_sock_addref(rs);
>       66         else
>       67                 rs = NULL;
>       68
> 
> Since the entire logic of rds_release can interleave between line 63
> and 64, (whereas we only addref at line 65), moving the sock_orphan
> will not help.
>
> I see that there was an explicic synchornization via the bucket->lock
> before 7b5654349e. I think you need something like that, or some type
> or rcu-based hash list.
>
The rhashtable already has internal bucket lock so those operation like
add/remove are synced up.  But yes reference addition can still race
with receive since receive lock is taken after find bound.

> Patch below may make race-window smaller, but race window is still there.
>If the receive lock is taken ahead then with sock_orphan
change socket release will get synchronized with receive. But
preventing socket release to be getting triggered while in receive
path by means ref count is better option. Moving socket_orphan
down is anyway a good change but its not enough. Will think bit
more about it.

Thanks for the good discussion.

Regards,
Santosh

rds: fix use-after-free read in rds_find_bound

Commit Message

Comments

Patch