Message ID | 20200706230128.4073544-2-sdf@google.com |
---|---|
State | Accepted |
Delegated to: | BPF Maintainers |
Headers | show |
Series | bpf: add BPF_CGROUP_INET_SOCK_RELEASE hook | expand |
On Mon, Jul 6, 2020 at 4:02 PM Stanislav Fomichev <sdf@google.com> wrote: > > Implement BPF_CGROUP_INET_SOCK_RELEASE hook that triggers > on inet socket release. It triggers only for userspace > sockets, the same semantics as existing BPF_CGROUP_INET_SOCK_CREATE. > > The only questionable part here is the sock->sk check > in the inet_release. Looking at the places where we > do 'sock->sk = NULL', I don't understand how it can race > with inet_release and why the check is there (it's been > there since the initial git import). Otherwise, the > change itself is pretty simple, we add a BPF hook > to the inet_release and avoid calling it for kernel > sockets. > > Signed-off-by: Stanislav Fomichev <sdf@google.com> > --- > include/linux/bpf-cgroup.h | 4 ++++ > include/uapi/linux/bpf.h | 1 + > kernel/bpf/syscall.c | 3 +++ > net/core/filter.c | 1 + > net/ipv4/af_inet.c | 3 +++ > 5 files changed, 12 insertions(+) > Looks good overall, but I have no idea about sock->sk NULL case. Acked-by: Andrii Nakryiko <andriin@fb.com> > diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h > index c66c545e161a..2c6f26670acc 100644 > --- a/include/linux/bpf-cgroup.h > +++ b/include/linux/bpf-cgroup.h [...]
On 7/7/20 1:42 AM, Andrii Nakryiko wrote: > On Mon, Jul 6, 2020 at 4:02 PM Stanislav Fomichev <sdf@google.com> wrote: >> >> Implement BPF_CGROUP_INET_SOCK_RELEASE hook that triggers >> on inet socket release. It triggers only for userspace >> sockets, the same semantics as existing BPF_CGROUP_INET_SOCK_CREATE. >> >> The only questionable part here is the sock->sk check >> in the inet_release. Looking at the places where we >> do 'sock->sk = NULL', I don't understand how it can race >> with inet_release and why the check is there (it's been >> there since the initial git import). Otherwise, the >> change itself is pretty simple, we add a BPF hook >> to the inet_release and avoid calling it for kernel >> sockets. >> >> Signed-off-by: Stanislav Fomichev <sdf@google.com> >> --- >> include/linux/bpf-cgroup.h | 4 ++++ >> include/uapi/linux/bpf.h | 1 + >> kernel/bpf/syscall.c | 3 +++ >> net/core/filter.c | 1 + >> net/ipv4/af_inet.c | 3 +++ >> 5 files changed, 12 insertions(+) >> > > Looks good overall, but I have no idea about sock->sk NULL case. +1, looks good & very useful hook. For the sock->sk NULL case here's a related discussion on why it's needed [0]. [0] https://lore.kernel.org/netdev/20190221221356.173485-1-ebiggers@kernel.org/ > Acked-by: Andrii Nakryiko <andriin@fb.com> > >> diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h >> index c66c545e161a..2c6f26670acc 100644 >> --- a/include/linux/bpf-cgroup.h >> +++ b/include/linux/bpf-cgroup.h > > [...] >
On Tue, Jul 7, 2020 at 2:42 PM Daniel Borkmann <daniel@iogearbox.net> wrote: > > On 7/7/20 1:42 AM, Andrii Nakryiko wrote: > > On Mon, Jul 6, 2020 at 4:02 PM Stanislav Fomichev <sdf@google.com> wrote: > >> > >> Implement BPF_CGROUP_INET_SOCK_RELEASE hook that triggers > >> on inet socket release. It triggers only for userspace > >> sockets, the same semantics as existing BPF_CGROUP_INET_SOCK_CREATE. > >> > >> The only questionable part here is the sock->sk check > >> in the inet_release. Looking at the places where we > >> do 'sock->sk = NULL', I don't understand how it can race > >> with inet_release and why the check is there (it's been > >> there since the initial git import). Otherwise, the > >> change itself is pretty simple, we add a BPF hook > >> to the inet_release and avoid calling it for kernel > >> sockets. > >> > >> Signed-off-by: Stanislav Fomichev <sdf@google.com> > >> --- > >> include/linux/bpf-cgroup.h | 4 ++++ > >> include/uapi/linux/bpf.h | 1 + > >> kernel/bpf/syscall.c | 3 +++ > >> net/core/filter.c | 1 + > >> net/ipv4/af_inet.c | 3 +++ > >> 5 files changed, 12 insertions(+) > >> > > > > Looks good overall, but I have no idea about sock->sk NULL case. > > +1, looks good & very useful hook. For the sock->sk NULL case here's a related > discussion on why it's needed [0]. Thanks for the pointer! I'll resend a v5 with s/sock/sock_create/ you mentioned and will clean up the commit description a bit.
On 7/8/20 1:43 AM, Stanislav Fomichev wrote: > On Tue, Jul 7, 2020 at 2:42 PM Daniel Borkmann <daniel@iogearbox.net> wrote: >> >> On 7/7/20 1:42 AM, Andrii Nakryiko wrote: >>> On Mon, Jul 6, 2020 at 4:02 PM Stanislav Fomichev <sdf@google.com> wrote: >>>> >>>> Implement BPF_CGROUP_INET_SOCK_RELEASE hook that triggers >>>> on inet socket release. It triggers only for userspace >>>> sockets, the same semantics as existing BPF_CGROUP_INET_SOCK_CREATE. >>>> >>>> The only questionable part here is the sock->sk check >>>> in the inet_release. Looking at the places where we >>>> do 'sock->sk = NULL', I don't understand how it can race >>>> with inet_release and why the check is there (it's been >>>> there since the initial git import). Otherwise, the >>>> change itself is pretty simple, we add a BPF hook >>>> to the inet_release and avoid calling it for kernel >>>> sockets. >>>> >>>> Signed-off-by: Stanislav Fomichev <sdf@google.com> >>>> --- >>>> include/linux/bpf-cgroup.h | 4 ++++ >>>> include/uapi/linux/bpf.h | 1 + >>>> kernel/bpf/syscall.c | 3 +++ >>>> net/core/filter.c | 1 + >>>> net/ipv4/af_inet.c | 3 +++ >>>> 5 files changed, 12 insertions(+) >>>> >>> >>> Looks good overall, but I have no idea about sock->sk NULL case. >> >> +1, looks good & very useful hook. For the sock->sk NULL case here's a related >> discussion on why it's needed [0]. > Thanks for the pointer! I'll resend a v5 with s/sock/sock_create/ you > mentioned and will clean up the commit description a bit. Already fixed up the selftest and a typo in the commit desc there & applied it. Let me know if you prefer a respin though and I can toss it taking the respin which would work just as well. :) Thanks, Daniel
On Tue, Jul 7, 2020 at 4:56 PM Daniel Borkmann <daniel@iogearbox.net> wrote: > > On 7/8/20 1:43 AM, Stanislav Fomichev wrote: > > On Tue, Jul 7, 2020 at 2:42 PM Daniel Borkmann <daniel@iogearbox.net> wrote: > >> > >> On 7/7/20 1:42 AM, Andrii Nakryiko wrote: > >>> On Mon, Jul 6, 2020 at 4:02 PM Stanislav Fomichev <sdf@google.com> wrote: > >>>> > >>>> Implement BPF_CGROUP_INET_SOCK_RELEASE hook that triggers > >>>> on inet socket release. It triggers only for userspace > >>>> sockets, the same semantics as existing BPF_CGROUP_INET_SOCK_CREATE. > >>>> > >>>> The only questionable part here is the sock->sk check > >>>> in the inet_release. Looking at the places where we > >>>> do 'sock->sk = NULL', I don't understand how it can race > >>>> with inet_release and why the check is there (it's been > >>>> there since the initial git import). Otherwise, the > >>>> change itself is pretty simple, we add a BPF hook > >>>> to the inet_release and avoid calling it for kernel > >>>> sockets. > >>>> > >>>> Signed-off-by: Stanislav Fomichev <sdf@google.com> > >>>> --- > >>>> include/linux/bpf-cgroup.h | 4 ++++ > >>>> include/uapi/linux/bpf.h | 1 + > >>>> kernel/bpf/syscall.c | 3 +++ > >>>> net/core/filter.c | 1 + > >>>> net/ipv4/af_inet.c | 3 +++ > >>>> 5 files changed, 12 insertions(+) > >>>> > >>> > >>> Looks good overall, but I have no idea about sock->sk NULL case. > >> > >> +1, looks good & very useful hook. For the sock->sk NULL case here's a related > >> discussion on why it's needed [0]. > > Thanks for the pointer! I'll resend a v5 with s/sock/sock_create/ you > > mentioned and will clean up the commit description a bit. > > Already fixed up the selftest and a typo in the commit desc there & applied it. Oh, awesome, thanks!
diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h index c66c545e161a..2c6f26670acc 100644 --- a/include/linux/bpf-cgroup.h +++ b/include/linux/bpf-cgroup.h @@ -210,6 +210,9 @@ int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key, #define BPF_CGROUP_RUN_PROG_INET_SOCK(sk) \ BPF_CGROUP_RUN_SK_PROG(sk, BPF_CGROUP_INET_SOCK_CREATE) +#define BPF_CGROUP_RUN_PROG_INET_SOCK_RELEASE(sk) \ + BPF_CGROUP_RUN_SK_PROG(sk, BPF_CGROUP_INET_SOCK_RELEASE) + #define BPF_CGROUP_RUN_PROG_INET4_POST_BIND(sk) \ BPF_CGROUP_RUN_SK_PROG(sk, BPF_CGROUP_INET4_POST_BIND) @@ -401,6 +404,7 @@ static inline int bpf_percpu_cgroup_storage_update(struct bpf_map *map, #define BPF_CGROUP_RUN_PROG_INET_INGRESS(sk,skb) ({ 0; }) #define BPF_CGROUP_RUN_PROG_INET_EGRESS(sk,skb) ({ 0; }) #define BPF_CGROUP_RUN_PROG_INET_SOCK(sk) ({ 0; }) +#define BPF_CGROUP_RUN_PROG_INET_SOCK_RELEASE(sk) ({ 0; }) #define BPF_CGROUP_RUN_PROG_INET4_BIND(sk, uaddr) ({ 0; }) #define BPF_CGROUP_RUN_PROG_INET6_BIND(sk, uaddr) ({ 0; }) #define BPF_CGROUP_RUN_PROG_INET4_POST_BIND(sk) ({ 0; }) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index da9bf35a26f8..548a749aebb3 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -226,6 +226,7 @@ enum bpf_attach_type { BPF_CGROUP_INET4_GETSOCKNAME, BPF_CGROUP_INET6_GETSOCKNAME, BPF_XDP_DEVMAP, + BPF_CGROUP_INET_SOCK_RELEASE, __MAX_BPF_ATTACH_TYPE }; diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 8da159936bab..156f51ffada2 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -1981,6 +1981,7 @@ bpf_prog_load_check_attach(enum bpf_prog_type prog_type, case BPF_PROG_TYPE_CGROUP_SOCK: switch (expected_attach_type) { case BPF_CGROUP_INET_SOCK_CREATE: + case BPF_CGROUP_INET_SOCK_RELEASE: case BPF_CGROUP_INET4_POST_BIND: case BPF_CGROUP_INET6_POST_BIND: return 0; @@ -2779,6 +2780,7 @@ attach_type_to_prog_type(enum bpf_attach_type attach_type) return BPF_PROG_TYPE_CGROUP_SKB; break; case BPF_CGROUP_INET_SOCK_CREATE: + case BPF_CGROUP_INET_SOCK_RELEASE: case BPF_CGROUP_INET4_POST_BIND: case BPF_CGROUP_INET6_POST_BIND: return BPF_PROG_TYPE_CGROUP_SOCK; @@ -2929,6 +2931,7 @@ static int bpf_prog_query(const union bpf_attr *attr, case BPF_CGROUP_INET_INGRESS: case BPF_CGROUP_INET_EGRESS: case BPF_CGROUP_INET_SOCK_CREATE: + case BPF_CGROUP_INET_SOCK_RELEASE: case BPF_CGROUP_INET4_BIND: case BPF_CGROUP_INET6_BIND: case BPF_CGROUP_INET4_POST_BIND: diff --git a/net/core/filter.c b/net/core/filter.c index c5e696e6c315..ddcc0d6209e1 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -6890,6 +6890,7 @@ static bool __sock_filter_check_attach_type(int off, case offsetof(struct bpf_sock, priority): switch (attach_type) { case BPF_CGROUP_INET_SOCK_CREATE: + case BPF_CGROUP_INET_SOCK_RELEASE: goto full_access; default: return false; diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index ea6ed6d487ed..ff141d630bdf 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -411,6 +411,9 @@ int inet_release(struct socket *sock) if (sk) { long timeout; + if (!sk->sk_kern_sock) + BPF_CGROUP_RUN_PROG_INET_SOCK_RELEASE(sk); + /* Applications forget to leave groups before exiting */ ip_mc_drop_socket(sk);
Implement BPF_CGROUP_INET_SOCK_RELEASE hook that triggers on inet socket release. It triggers only for userspace sockets, the same semantics as existing BPF_CGROUP_INET_SOCK_CREATE. The only questionable part here is the sock->sk check in the inet_release. Looking at the places where we do 'sock->sk = NULL', I don't understand how it can race with inet_release and why the check is there (it's been there since the initial git import). Otherwise, the change itself is pretty simple, we add a BPF hook to the inet_release and avoid calling it for kernel sockets. Signed-off-by: Stanislav Fomichev <sdf@google.com> --- include/linux/bpf-cgroup.h | 4 ++++ include/uapi/linux/bpf.h | 1 + kernel/bpf/syscall.c | 3 +++ net/core/filter.c | 1 + net/ipv4/af_inet.c | 3 +++ 5 files changed, 12 insertions(+)