[bpf,v2,0/3] sockmap/ktls fixes

Message ID 155620799743.22884.8046772841813554446.stgit@john-XPS-13-9360
Series sockmap/ktls fixes

Message

John Fastabend April 25, 2019, 4:02 p.m. UTC
Series of fixes for sockmap and ktls; see the patches for descriptions.

v2: fix build issue for CONFIG_TLS_DEVICE and fix up a couple of comments from
    Jakub.

---

John Fastabend (3):
      bpf: tls, implement unhash to avoid transition out of ESTABLISHED
      bpf: sockmap remove duplicate queue free
      bpf: sockmap fix msg->sg.size account on ingress skb


 include/net/tls.h  |   14 ++++++++++++-
 net/core/skmsg.c   |    1 +
 net/ipv4/tcp_bpf.c |    2 --
 net/tls/tls_main.c |   55 +++++++++++++++++++++++++++++++++++++++-------------
 net/tls/tls_sw.c   |   13 +++++++++---
 5 files changed, 64 insertions(+), 21 deletions(-)

Comments

Jakub Kicinski April 25, 2019, 6:30 p.m. UTC | #1
On Thu, 25 Apr 2019 09:02:50 -0700, John Fastabend wrote:
> Series of fixes for sockmap and ktls; see the patches for descriptions.
> 
> v2: fix build issue for CONFIG_TLS_DEVICE and fix up a couple of comments from
>     Jakub.

Ah, right, my comment about the rx side sleeping was fairly nonsensical;
the locking issue is that the work queue tries to lock the same socket.

But I'm hitting some nasties: there is a UAF on a non-offload socket,
and offload dies fairly hard.  It _could_ be my offload patches on top,
but "they worked yesterday".  Digging deeper on the offload side;
here's the UAF:

[  258.559962] =================================================================
[  258.568212] BUG: KASAN: use-after-free in tls_sk_proto_close+0x1a9/0x1e0 [tl]
[  258.576398] Read of size 8 at addr ffff88871d1edf18 by task ktls_source/2542
[  258.584369] 
[  258.586121] CPU: 18 PID: 2542 Comm: ktls_source Not tainted 5.1.0-rc5-debug-7
[  258.596445] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.4.3 01/177
[  258.604968] Call Trace:
[  258.607796]  dump_stack+0x7c/0xc0
[  258.611594]  print_address_description.cold.2+0x9/0x239
[  258.617528]  kasan_report.cold.3+0x78/0x92
[  258.622200]  ? tls_sk_proto_close+0x1a9/0x1e0 [tls]
[  258.627745]  ? tcp_check_oom+0x390/0x390
[  258.632221]  tls_sk_proto_close+0x1a9/0x1e0 [tls]
[  258.637573]  inet_release+0xd6/0x1b0
[  258.641661]  __sock_release+0xc0/0x290
[  258.645942]  sock_close+0x11/0x20
[  258.649735]  __fput+0x244/0x730
[  258.653341]  task_work_run+0xfe/0x180
[  258.657530]  exit_to_usermode_loop+0x10d/0x130
[  258.662589]  do_syscall_64+0x2ff/0x400
[  258.666875]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  258.672630] RIP: 0033:0x7fb42bbe2421
[  258.676723] Code: f7 d8 64 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00
[  258.697857] RSP: 002b:00007fffaabd9428 EFLAGS: 00000246 ORIG_RAX: 00000000003
[  258.706526] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007fb42bbe2421
[  258.714595] RDX: 00007fb41ffbf000 RSI: 000000000bebd000 RDI: 0000000000000003
[  258.722664] RBP: 0000000000000003 R08: 00000000ffffffff R09: 0000000000000000
[  258.730735] R10: 0000000000000022 R11: 0000000000000246 R12: 00007fb42b7df210
[  258.738805] R13: 00007fb41f923010 R14: 0000000000004113 R15: 0000000000000000
[  258.746875] 
[  258.748645] Allocated by task 2542:
[  258.752655]  create_ctx+0x46/0x2d0 [tls]
[  258.757129]  tls_init+0xd2/0x470 [tls]
[  258.761410]  tcp_set_ulp+0x235/0x4bf
[  258.765499]  do_tcp_setsockopt.isra.5+0x28b/0x1d90
[  258.770944]  __sys_setsockopt+0x10e/0x1d0
[  258.775514]  __x64_sys_setsockopt+0xba/0x150
[  258.780378]  do_syscall_64+0x96/0x400
[  258.784578]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  258.790308] 
[  258.792057] Freed by task 2542:
[  258.795656]  kfree+0xe5/0x300
[  258.799060]  tls_sk_proto_destroy+0x1c7/0x400 [tls]
[  258.804615]  tls_sk_proto_close+0x8a/0x1e0 [tls]
[  258.809870]  inet_release+0xd6/0x1b0
[  258.813953]  __sock_release+0xc0/0x290
[  258.818231]  sock_close+0x11/0x20
[  258.822023]  __fput+0x244/0x730
[  258.825620]  task_work_run+0xfe/0x180
[  258.829799]  exit_to_usermode_loop+0x10d/0x130
[  258.834855]  do_syscall_64+0x2ff/0x400
[  258.839136]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  258.844880] 
[  258.846649] The buggy address belongs to the object at ffff88871d1ede88
[  258.846649]  which belongs to the cache kmalloc-512 of size 512
[  258.860764] The buggy address is located 144 bytes inside of
[  258.860764]  512-byte region [ffff88871d1ede88, ffff88871d1ee088)
[  258.874002] The buggy address belongs to the page:
[  258.879450] page:ffffea001c747a00 count:1 mapcount:0 mapping:ffff88881e411080
[  258.892014] flags: 0x2ffff0000010200(slab|head)
[  258.897169] raw: 02ffff0000010200 ffffea001c88b208 ffffea00204bb208 ffff88880
[  258.905940] raw: ffff88871d1ed7c8 0000000000250019 00000001ffffffff 000000000
[  258.914711] page dumped because: kasan: bad access detected
[  258.921048] 
[  258.922797] Memory state around the buggy address:
[  258.928245]  ffff88871d1ede00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc c
[  258.936435]  ffff88871d1ede80: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb b
[  258.944635] >ffff88871d1edf00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb b
[  258.952830]                             ^
[  258.957401]  ffff88871d1edf80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb b
[  258.965591]  ffff88871d1ee000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb b
[  258.973778] =================================================================
John Fastabend April 25, 2019, 6:49 p.m. UTC | #2
On 4/25/19 11:30 AM, Jakub Kicinski wrote:
> On Thu, 25 Apr 2019 09:02:50 -0700, John Fastabend wrote:
>> Series of fixes for sockmap and ktls; see the patches for descriptions.
>>
>> v2: fix build issue for CONFIG_TLS_DEVICE and fix up a couple of comments from
>>     Jakub.
> 
> Ah, right, my comment about the rx side sleeping was fairly nonsensical;
> the locking issue is that the work queue tries to lock the same socket.
> 

Right.

> But I'm hitting some nasties: there is a UAF on a non-offload socket,
> and offload dies fairly hard.  It _could_ be my offload patches on top,
> but "they worked yesterday".  Digging deeper on the offload side;
> here's the UAF:

hmm OK, I see what is happening. I could also enable the unhash only for
the SW/SW base proto, so only with,

  prot[TLS_SW][TLS_SW].unhash

There is this on the offload side; did I smash it somehow?

   prot[TLS_HW_RECORD][TLS_HW_RECORD].unhash       = tls_hw_unhash;

Also I have this in my stack,

commit 01628cbabdf2fbf0b710a399f54ae005d0963f3f (HEAD -> ktls-fixes,
refs/patches/ktls-fixes/bpf-sockmap-only-stop-strp-if)
Author: John Fastabend <john.fastabend@gmail.com>
Date:   Wed Apr 24 15:55:55 2019 -0700

    bpf: sockmap, only stop/flush strp if it was enabled at some point

    If we try to call strp_done on a parser that has never been
    initialized, for example because the sockmap user is only using the
    TX side, we get the following error.

      [  883.422081] WARNING: CPU: 1 PID: 208 at kernel/workqueue.c:3030 __flush_work+0x1ca/0x1e0
      ...
      [  883.422095] Workqueue: events sk_psock_destroy_deferred
      [  883.422097] RIP: 0010:__flush_work+0x1ca/0x1e0

    This had been wrapped in 'if (psock->parser.enabled)' logic, which
    was broken because strp_done() was never actually being called: the
    strp_stop() earlier in the teardown logic sets parser.enabled to
    false. This could result in a use after free if work was still in
    the queue, and was resolved by the patch here,
    1d79895aef18f ("sk_msg: Always cancel strp work before freeing the
    psock"). However, calling strp_stop(), done by the patch marked in
    the Fixes tag, is only useful if we never initialized a strp parser
    program and never initialized the strp to start with, because if
    we had initialized a stream parser, strp_stop() would have been called
    by sk_psock_drop() earlier in the teardown process. By forcing the
    strp to stop we get past the WARNING in strp_done that checks
    the stopped flag, but calling cancel_work_sync on work that has never
    been initialized is also wrong and generates the warning above.

    To fix this, check if the parser program exists. If the program exists
    then the strp work has been initialized and must be sync'd and
    cancelled before freeing any structures. If no program exists we
    never initialized the stream parser in the first place, so skip the
    sync/cancel logic implemented by strp_done.

    Finally, remove the strp_done; it's not needed, and in the case where
    we are using the stream parser it has already been called.

    Fixes: e8e3437762ad9 ("bpf: Stop the psock parser before canceling its work")
    Signed-off-by: John Fastabend <john.fastabend@gmail.com>

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 782ae9eb4dce..4b4b9ad4bb86 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -555,8 +555,12 @@ static void sk_psock_destroy_deferred(struct work_struct *gc)
        struct sk_psock *psock = container_of(gc, struct sk_psock, gc);

        /* No sk_callback_lock since already detached. */
-       strp_stop(&psock->parser.strp);
-       strp_done(&psock->parser.strp);
+
+       /* Parser has been stopped */
+       if (psock->progs.skb_parser) {
+               strp_stop(&psock->parser.strp);
+               strp_done(&psock->parser.strp);
+       }

        cancel_work_sync(&psock->work);


Jakub Kicinski April 25, 2019, 7:12 p.m. UTC | #3
On Thu, 25 Apr 2019 11:49:18 -0700, John Fastabend wrote:
> On 4/25/19 11:30 AM, Jakub Kicinski wrote:
> > On Thu, 25 Apr 2019 09:02:50 -0700, John Fastabend wrote:
> >> Series of fixes for sockmap and ktls; see the patches for descriptions.
> >>
> >> v2: fix build issue for CONFIG_TLS_DEVICE and fix up a couple of comments from
> >>     Jakub.
> > 
> > Ah, right, my comment about the rx side sleeping was fairly nonsensical;
> > the locking issue is that the work queue tries to lock the same socket.
> >
> 
> Right.
> 
> > But I'm hitting some nasties: there is a UAF on a non-offload socket,
> > and offload dies fairly hard.  It _could_ be my offload patches on top,
> > but "they worked yesterday".  Digging deeper on the offload side;
> > here's the UAF:
> 
> hmm OK, I see what is happening. I could also enable the unhash only for
> the SW/SW base proto, so only with,
> 
>   prot[TLS_SW][TLS_SW].unhash
> 
> There is this on the offload side; did I smash it somehow?
> 
>    prot[TLS_HW_RECORD][TLS_HW_RECORD].unhash       = tls_hw_unhash;


Um, I think you're good there. Note that the TLS_HW_RECORD thing is not
the nice packet-based offload; it's the TOE stuff from Chelsio.  I'm
using TLS_HW.

> Also I have this in my stack,

Thanks, I will toss this in.
