[RFC,bpf-next,00/11] Add socket lookup support


Message ID

20180509210709.7201-1-joe@wand.net.nz

Headers

From: Joe Stringer <joe@wand.net.nz>
To: daniel@iogearbox.net
Cc: netdev@vger.kernel.org, ast@kernel.org, john.fastabend@gmail.com,
	tgraf@suug.ch, kafai@fb.com
Subject: [RFC bpf-next 00/11] Add socket lookup support
Date: Wed,  9 May 2018 14:06:58 -0700
Message-Id: <20180509210709.7201-1-joe@wand.net.nz>
Sender: netdev-owner@vger.kernel.org
Precedence: bulk

Series

Add socket lookup support

Message

Joe Stringer May 9, 2018, 9:06 p.m. UTC

This series proposes a new helper for the BPF API which allows BPF programs to
perform lookups for sockets in a network namespace. This would allow programs
to determine early on in processing whether the stack is expecting to receive
the packet, and perform some action (eg drop, forward somewhere) based on this
information.

The series is structured roughly into:
* Misc refactor
* Add the socket pointer type
* Add reference tracking to ensure that socket references are freed
* Extend the BPF API to add sk_lookup() / sk_release() functions
* Add tests/documentation
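
As an illustration of the acquire/release pairing that reference tracking is
meant to enforce, here is a minimal user-space sketch in C. Everything in it
(`example_sock`, `example_sk_lookup()`, `example_sk_release()`, the one-entry
table) is a hypothetical stand-in invented for this example; it is not the
helper API or the verifier logic from the series.

```c
#include <stddef.h>

/* Hypothetical stand-in for a socket with a reference count. */
struct example_sock {
	int refcnt;
	int state;
};

/* One-entry "socket table", for illustration only. */
static struct example_sock example_table[1] = { { 0, 1 } };

/* Mock lookup: returns a referenced socket for key 0, NULL otherwise. */
static struct example_sock *example_sk_lookup(int key)
{
	if (key != 0)
		return NULL;
	example_table[0].refcnt++;
	return &example_table[0];
}

/* Mock release: drops the reference taken by the lookup. */
static void example_sk_release(struct example_sock *sk)
{
	sk->refcnt--;
}

/* Caller pattern: check for NULL, use the socket, release on every path. */
static int example_classify(int key)
{
	struct example_sock *sk = example_sk_lookup(key);
	int verdict;

	if (!sk)
		return 0; /* nothing was acquired, so nothing to release */

	verdict = sk->state;
	example_sk_release(sk);
	return verdict;
}
```

In this sketch, a path that returned without calling `example_sk_release()`
after a successful lookup would leave `refcnt` above zero; that is the kind of
leak the reference tracking item above is intended to rule out at load time.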

The helper proposed in this series includes a parameter for a tuple which must
be filled in by the caller to determine the socket to look up. The simplest
case would be filling with the contents of the packet, ie mapping the packet's
5-tuple into the parameter. In common cases, it may alternatively be useful to
reverse the direction of the tuple and perform a lookup, to find the socket
that initiates this connection; and if the BPF program ever performs a form of
IP address translation, it may further be useful to be able to look up
arbitrary tuples that are not based upon the packet, but instead based on state
held in BPF maps or hardcoded in the BPF program.
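
To make the tuple handling above more concrete, here is a minimal sketch in C
of a tuple filled from packet fields and then reversed. The `example_tuple`
layout and `example_tuple_reverse()` are assumptions made for illustration
only; they are not the structure or helper signature proposed in this series.

```c
#include <stdint.h>

/* Hypothetical IPv4 5-tuple layout, for illustration only. */
struct example_tuple {
	uint32_t saddr; /* source address */
	uint32_t daddr; /* destination address */
	uint16_t sport; /* source port */
	uint16_t dport; /* destination port */
	uint8_t  proto; /* L4 protocol number, e.g. 6 for TCP */
};

/*
 * Return a copy of the tuple with source and destination swapped,
 * i.e. the tuple as seen from the other end of the connection.
 */
static struct example_tuple example_tuple_reverse(struct example_tuple t)
{
	struct example_tuple r = t;

	r.saddr = t.daddr;
	r.daddr = t.saddr;
	r.sport = t.dport;
	r.dport = t.sport;
	return r;
}
```

A caller would fill `example_tuple` from the packet headers for the simple
case, pass the result of `example_tuple_reverse()` for the reverse-direction
lookup, or build the tuple from map state when address translation is
involved.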

Currently, access to the socket's fields is limited to those which are
otherwise already accessible, and is restricted to read-only access.

A few open points:
* Currently, the lookup interface only returns either a valid socket or a NULL
  pointer. This means that if there is any issue with the tuple, such as an
  unsupported protocol number, or if the socket can't be found, we are unable
  to differentiate these cases from one another. One natural
  approach to improve this could be to return an ERR_PTR from the
  bpf_sk_lookup() helper. This would be more complicated but maybe it's
  worthwhile.
* No ordering is defined between sockets. If the tuple could find multiple
  sockets, then it will arbitrarily return one. It is up to the caller to
  handle this. If we wish to handle this more reliably in future, we could
  encode an ordering preference in the flags field.
* Currently this helper is only defined for the TC hook point, but it should also
  be valid at XDP and perhaps some other hooks.

Joe Stringer (11):
  bpf: Add iterator for spilled registers
  bpf: Simplify ptr_min_max_vals adjustment
  bpf: Generalize ptr_or_null regs check
  bpf: Add PTR_TO_SOCKET verifier type
  bpf: Macrofy stack state copy
  bpf: Add reference tracking to verifier
  bpf: Add helper to retrieve socket in BPF
  selftests/bpf: Add tests for reference tracking
  libbpf: Support loading individual progs
  selftests/bpf: Add C tests for reference tracking
  Documentation: Describe bpf reference tracking

 Documentation/networking/filter.txt               |  64 +++
 include/linux/bpf.h                               |  19 +-
 include/linux/bpf_verifier.h                      |  31 +-
 include/uapi/linux/bpf.h                          |  39 +-
 kernel/bpf/verifier.c                             | 548 ++++++++++++++++++----
 net/core/filter.c                                 | 132 +++++-
 tools/include/uapi/linux/bpf.h                    |  40 +-
 tools/lib/bpf/libbpf.c                            |   4 +-
 tools/lib/bpf/libbpf.h                            |   3 +
 tools/testing/selftests/bpf/Makefile              |   2 +-
 tools/testing/selftests/bpf/bpf_helpers.h         |   7 +
 tools/testing/selftests/bpf/test_progs.c          |  38 ++
 tools/testing/selftests/bpf/test_sk_lookup_kern.c | 127 +++++
 tools/testing/selftests/bpf/test_verifier.c       | 373 ++++++++++++++-
 14 files changed, 1299 insertions(+), 128 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/test_sk_lookup_kern.c

Comments

Joe Stringer May 16, 2018, 7:05 p.m. UTC | #1

On 9 May 2018 at 14:06, Joe Stringer <joe@wand.net.nz> wrote:
> This series proposes a new helper for the BPF API which allows BPF programs to
> perform lookups for sockets in a network namespace. This would allow programs
> to determine early on in processing whether the stack is expecting to receive
> the packet, and perform some action (eg drop, forward somewhere) based on this
> information.
>
> The series is structured roughly into:
> * Misc refactor
> * Add the socket pointer type
> * Add reference tracking to ensure that socket references are freed
> * Extend the BPF API to add sk_lookup() / sk_release() functions
> * Add tests/documentation
>
> The helper proposed in this series includes a parameter for a tuple which must
> be filled in by the caller to determine the socket to look up. The simplest
> case would be filling with the contents of the packet, ie mapping the packet's
> 5-tuple into the parameter. In common cases, it may alternatively be useful to
> reverse the direction of the tuple and perform a lookup, to find the socket
> that initiates this connection; and if the BPF program ever performs a form of
> IP address translation, it may further be useful to be able to look up
> arbitrary tuples that are not based upon the packet, but instead based on state
> held in BPF maps or hardcoded in the BPF program.
>
> Currently, access to the socket's fields is limited to those which are
> otherwise already accessible, and is restricted to read-only access.
>
> A few open points:
> * Currently, the lookup interface only returns either a valid socket or a NULL
>   pointer. This means that if there is any issue with the tuple, such as an
>   unsupported protocol number, or if the socket can't be found, we are unable
>   to differentiate these cases from one another. One natural
>   approach to improve this could be to return an ERR_PTR from the
>   bpf_sk_lookup() helper. This would be more complicated but maybe it's
>   worthwhile.

This suggestion would add a lot of complexity, and there aren't many
legitimately different error cases. They are:
* Unsupported socket type
* Cannot find netns
* Tuple argument is the wrong size
* Can't find socket

If we split the helpers into protocol-specific types, the first one
would be addressed. The last one is addressed by returning NULL. It
seems like a reasonable compromise to me to also return NULL in the
middle two cases, and rely on the BPF writer to provide valid
arguments.
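
A rough sketch of what this compromise looks like from the caller's side,
written as plain user-space C: all four failure reasons collapse into the same
NULL return. The function name, its parameters, and the constants are
hypothetical and exist only for this example; they do not reflect the actual
helper signature.

```c
#include <stddef.h>

struct example_sock {
	int id;
};

/* Single known socket, for illustration only. */
static struct example_sock example_listener = { 42 };

enum {
	EXAMPLE_PROTO_TCP = 6,  /* only protocol this mock supports */
	EXAMPLE_TUPLE_LEN = 12, /* assumed size of the tuple argument */
	EXAMPLE_PORT      = 80  /* port the mock socket is bound to */
};

/* Mock lookup: every failure case yields NULL, with no error code. */
static struct example_sock *example_lookup(int proto, int netns_found,
					   size_t tuple_len, int port)
{
	if (proto != EXAMPLE_PROTO_TCP)
		return NULL; /* unsupported socket type */
	if (!netns_found)
		return NULL; /* cannot find netns */
	if (tuple_len != EXAMPLE_TUPLE_LEN)
		return NULL; /* tuple argument is the wrong size */
	if (port != EXAMPLE_PORT)
		return NULL; /* can't find socket */
	return &example_listener;
}
```

With this shape the caller needs only a single NULL check, at the cost of not
being able to tell an invalid argument apart from a missing socket.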

> * No ordering is defined between sockets. If the tuple could find multiple
>   sockets, then it will arbitrarily return one. It is up to the caller to
>   handle this. If we wish to handle this more reliably in future, we could
>   encode an ordering preference in the flags field.

This doesn't need to be addressed in this series; there is scope for
addressing these cases when the use case arises.

> * Currently this helper is only defined for the TC hook point, but it should also
>   be valid at XDP and perhaps some other hooks.

It is easy to add support for XDP on demand; the initial implementation doesn't need it.

Alexei Starovoitov May 16, 2018, 8:04 p.m. UTC | #2

On Wed, May 16, 2018 at 12:05:06PM -0700, Joe Stringer wrote:
> >
> > A few open points:
> > * Currently, the lookup interface only returns either a valid socket or a NULL
> >   pointer. This means that if there is any issue with the tuple, such as an
> >   unsupported protocol number, or if the socket can't be found, we are unable
> >   to differentiate these cases from one another. One natural
> >   approach to improve this could be to return an ERR_PTR from the
> >   bpf_sk_lookup() helper. This would be more complicated but maybe it's
> >   worthwhile.
> 
> This suggestion would add a lot of complexity, and there aren't many
> legitimately different error cases. They are:
> * Unsupported socket type
> * Cannot find netns
> * Tuple argument is the wrong size
> * Can't find socket
> 
> If we split the helpers into protocol-specific types, the first one
> would be addressed. The last one is addressed by returning NULL. It
> seems like a reasonable compromise to me to also return NULL in the
> middle two cases, and rely on the BPF writer to provide valid
> arguments.
> 
> > * No ordering is defined between sockets. If the tuple could find multiple
> >   sockets, then it will arbitrarily return one. It is up to the caller to
> >   handle this. If we wish to handle this more reliably in future, we could
> >   encode an ordering preference in the flags field.
> 
> This doesn't need to be addressed in this series; there is scope for
> addressing these cases when the use case arises.

Thanks for summarizing the conf call discussion.
Looking forward to non-rfc patches :)