mbox series

[bpf-next,v2,0/4] Sockmap iterator

Message ID 20200901103210.54607-1-lmb@cloudflare.com
Headers show
Series Sockmap iterator | expand

Message

Lorenz Bauer Sept. 1, 2020, 10:32 a.m. UTC
This addresses Jakub's feedback for the v1 [1]. Outstanding issues are:

* Can we use rcu_dereference instead of rcu_dereference_raw?
* Is it OK to not take the bucket lock?
* Can we teach the verifier that PTR_TO_BTF_ID can be the same as PTR_TO_SOCKET?

Changes in v2:
- Remove unnecessary sk_fullsock checks (Jakub)
- Nits for test output (Jakub)
- Increase number of sockets in tests to 64 (Jakub)
- Handle ENOENT in tests (Jakub)
- Actually test SOCKHASH iteration (myself)
- Fix SOCKHASH iterator initialization (myself)

1: https://lore.kernel.org/bpf/20200828094834.23290-1-lmb@cloudflare.com/

Lorenz Bauer (4):
  net: sockmap: Remove unnecessary sk_fullsock checks
  net: Allow iterating sockmap and sockhash
  selftests: bpf: Add helper to compare socket cookies
  selftests: bpf: Test copying a sockmap via bpf_iter

 net/core/sock_map.c                           | 287 +++++++++++++++++-
 .../selftests/bpf/prog_tests/sockmap_basic.c  | 141 ++++++++-
 tools/testing/selftests/bpf/progs/bpf_iter.h  |   9 +
 .../selftests/bpf/progs/bpf_iter_sockmap.c    |  58 ++++
 .../selftests/bpf/progs/bpf_iter_sockmap.h    |   3 +
 5 files changed, 482 insertions(+), 16 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_sockmap.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_sockmap.h

Comments

Yonghong Song Sept. 2, 2020, 5:30 a.m. UTC | #1
On 9/1/20 3:32 AM, Lorenz Bauer wrote:
> This addresses Jakub's feedback for the v1 [1]. Outstanding issues are:
> 
> * Can we use rcu_dereference instead of rcu_dereference_raw?
> * Is it OK to not take the bucket lock?
> * Can we teach the verifier that PTR_TO_BTF_ID can be the same as PTR_TO_SOCKET?

For the last question, I see current implementation is

+	.ctx_arg_info		= {
+		{ offsetof(struct bpf_iter__sockmap, key),
+		  PTR_TO_RDONLY_BUF_OR_NULL },
+		{ offsetof(struct bpf_iter__sockmap, sk),
+		  PTR_TO_SOCKET_OR_NULL },
+	},

The PTR_TO_BTF_ID might be better as it provides more flexibility and
more fields to access. If in the verifier PTR_TO_SOCKET reg type
is desired, you could add a callback somewhere to check:
    if the type is PTR_TO_BTF_ID and the btf_id is 'struct socket",
    then you can convert reg type to PTR_TO_SOCKET_OR_NULL.
PTR_TO_SOCK_OR_NULL is used here since PTR_TO_BTF_ID might be
a null pointer.

We could add a bit in register state to indicate a PTR_TO_BTF_ID
could be possibly null or not. For example, a conversion from
PTR_TO_BTF_ID_OR_NULL to PTR_TO_BTF_ID will yield non-null btf_id.
A pointer tracing will yield a possible-null btf_id. If we have this,
it is possible to convert a PTR_TO_BTF_ID to PTR_TO_SOCKET.

> 
> Changes in v2:
> - Remove unnecessary sk_fullsock checks (Jakub)
> - Nits for test output (Jakub)
> - Increase number of sockets in tests to 64 (Jakub)
> - Handle ENOENT in tests (Jakub)
> - Actually test SOCKHASH iteration (myself)
> - Fix SOCKHASH iterator initialization (myself)
> 
> 1: https://lore.kernel.org/bpf/20200828094834.23290-1-lmb@cloudflare.com/
> 
> Lorenz Bauer (4):
>    net: sockmap: Remove unnecessary sk_fullsock checks
>    net: Allow iterating sockmap and sockhash
>    selftests: bpf: Add helper to compare socket cookies
>    selftests: bpf: Test copying a sockmap via bpf_iter
> 
>   net/core/sock_map.c                           | 287 +++++++++++++++++-
>   .../selftests/bpf/prog_tests/sockmap_basic.c  | 141 ++++++++-
>   tools/testing/selftests/bpf/progs/bpf_iter.h  |   9 +
>   .../selftests/bpf/progs/bpf_iter_sockmap.c    |  58 ++++
>   .../selftests/bpf/progs/bpf_iter_sockmap.h    |   3 +
>   5 files changed, 482 insertions(+), 16 deletions(-)
>   create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_sockmap.c
>   create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_sockmap.h
>
Martin KaFai Lau Sept. 3, 2020, 8:08 p.m. UTC | #2
On Tue, Sep 01, 2020 at 11:32:06AM +0100, Lorenz Bauer wrote:
> * Can we teach the verifier that PTR_TO_BTF_ID can be the same as PTR_TO_SOCKET?
I am working on a patch to teach the verifier to allow PTR_TO_SOCK* being used
in the bpf_skc_to_*_sock() helper.

The use case is, for example, use bpf_skc_to_tcp_sock() to cast the
tc's __sk_buff->sk to a kernel "struct tcp_sock" (PTR_TO_BTF_ID) such
that the bpf prog won't be limited by the fields in "struct bpf_tcp_sock"
if the user has perfmon cap.   Thus, in general, should be doable.
Hopefully I have something sharable next week.

For the sockmap iter  (which is tracing), I think it is better
to begin with PTR_TO_BTF_ID_OR_NULL such that the bpf iter prog
can access the tcp_sock (or udp_sock) without another casting or
bpf_probe_read_kernel().
Lorenz Bauer Sept. 4, 2020, 7:46 a.m. UTC | #3
On Thu, 3 Sep 2020 at 21:08, Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Tue, Sep 01, 2020 at 11:32:06AM +0100, Lorenz Bauer wrote:
> > * Can we teach the verifier that PTR_TO_BTF_ID can be the same as PTR_TO_SOCKET?
> I am working on a patch to teach the verifier to allow PTR_TO_SOCK* being used
> in the bpf_skc_to_*_sock() helper.
>
> The use case is, for example, use bpf_skc_to_tcp_sock() to cast the
> tc's __sk_buff->sk to a kernel "struct tcp_sock" (PTR_TO_BTF_ID) such
> that the bpf prog won't be limited by the fields in "struct bpf_tcp_sock"
> if the user has perfmon cap.   Thus, in general, should be doable.
> Hopefully I have something sharable next week.

That's great! I got carried away with refactoring check_func_arg with
the same goal, it'll be interesting to see what you've come up with.
I've decided to make a smaller change for the sake of landing this
series, and then I'll follow up with my refactoring. Hopefully we
won't be stepping on each others toes too much.

>
> For the sockmap iter  (which is tracing), I think it is better
> to begin with PTR_TO_BTF_ID_OR_NULL such that the bpf iter prog
> can access the tcp_sock (or udp_sock) without another casting or
> bpf_probe_read_kernel().

Yes, that's what I'm going with as well.