[bpf-next,v2,1/4] bpf: add get{peer,sock}name attach types for sock_addr

As stated in 983695fa6765 ("bpf: fix unconnected udp hooks"), the objective
for the existing cgroup connect/sendmsg/recvmsg/bind BPF hooks is to be
transparent to applications. In Cilium we make use of these hooks [0] in
order to enable E-W load balancing for existing Kubernetes service types
for all Cilium managed nodes in the cluster. The service backends can be
local or remote. The main advantage of this approach is that it operates as
close as possible to the socket, and therefore avoids packet-based NAT, given
that in the connect/sendmsg/recvmsg hooks we only need to xlate sock addresses.

This also allows exposing NodePort services on loopback addresses in the
host namespace, for example. As another advantage, this also efficiently
blocks bind requests for applications in the host namespace for exposed
ports. However, one missing item is that we also need to perform reverse
xlation for inet{,6}_getname() hooks such that we can return the service
IP/port tuple back to the application instead of the remote peer address.

The vast majority of applications do not care about getpeername(), but
on a few occasions we've seen breakage when validating the peer's address,
since it unexpectedly returns the backend tuple instead of the service one.
Therefore, this trivial patch adds a getpeername() as well as a getsockname()
BPF cgroup hook for both IPv4 and IPv6, so that the returned address can be
customised in order to address this situation.
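
The reverse xlation described above can be modelled in plain userspace C.
This is only a sketch and not the BPF program from this series or from
Cilium: struct addr_ctx is a stand-in for the IPv4 address fields of the
sock_addr context, and the table, its layout and the function name are
hypothetical. The table is keyed on the backend tuple for brevity; a real
implementation would need a per-socket key, since several services may share
one backend.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the IPv4 address fields a sock_addr program sees.
 * Values are kept in host byte order here purely for readability. */
struct addr_ctx {
	uint32_t user_ip4;
	uint16_t user_port;
};

/* Hypothetical reverse-NAT entry: backend tuple -> service (frontend) tuple. */
struct rev_nat_entry {
	uint32_t backend_ip4;
	uint16_t backend_port;
	uint32_t service_ip4;
	uint16_t service_port;
};

#define IP4(a, b, c, d) \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
	 ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Mirrors the example service in this message: 1.2.3.4:80 => 10.0.0.10:80 */
static const struct rev_nat_entry rev_nat_table[] = {
	{ IP4(10, 0, 0, 10), 80, IP4(1, 2, 3, 4), 80 },
};

/* Model of a getpeername hook: if the peer is a known backend, rewrite the
 * address handed back to the application to the service tuple. Unknown
 * peers are left untouched. Always returns 1. */
static int getpeername_rev_xlate(struct addr_ctx *ctx)
{
	size_t i;

	assert(ctx != NULL);
	for (i = 0; i < sizeof(rev_nat_table) / sizeof(rev_nat_table[0]); i++) {
		const struct rev_nat_entry *e = &rev_nat_table[i];

		if (ctx->user_ip4 == e->backend_ip4 &&
		    ctx->user_port == e->backend_port) {
			ctx->user_ip4 = e->service_ip4;
			ctx->user_port = e->service_port;
			break;
		}
	}
	return 1;
}
```

Calling getpeername_rev_xlate() on a context holding 10.0.0.10:80 leaves it
holding 1.2.3.4:80, which is the tuple the application originally connected
to; any other peer address passes through unchanged.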

Simple example:

  # ./cilium/cilium service list
  ID   Frontend     Service Type   Backend
  1    1.2.3.4:80   ClusterIP      1 => 10.0.0.10:80

Before; curl's verbose output example, no getpeername() reverse xlation:

  # curl --verbose 1.2.3.4
  * Rebuilt URL to: 1.2.3.4/
  *   Trying 1.2.3.4...
  * TCP_NODELAY set
  * Connected to 1.2.3.4 (10.0.0.10) port 80 (#0)
  > GET / HTTP/1.1
  > Host: 1.2.3.4
  > User-Agent: curl/7.58.0
  > Accept: */*
  [...]

After; with getpeername() reverse xlation:

  # curl --verbose 1.2.3.4
  * Rebuilt URL to: 1.2.3.4/
  *   Trying 1.2.3.4...
  * TCP_NODELAY set
  * Connected to 1.2.3.4 (1.2.3.4) port 80 (#0)
  > GET / HTTP/1.1
  > Host: 1.2.3.4
  > User-Agent: curl/7.58.0
  > Accept: */*
  [...]

Originally, I had both under a BPF_CGROUP_INET{4,6}_GETNAME type and exposed
peer to the context, similar to how inet{,6}_getname() does it, but API-wise
this is suboptimal as it always forces programs to test for ctx->peer, which
can easily be missed, hence the BPF_CGROUP_INET{4,6}_GET{PEER,SOCK}NAME split.
Similarly, the checked return code is restricted to tnum_range(1, 1), but if
a use case comes up in future, it can easily be changed to return an error
code instead. Helper and ctx member access is the same as with
connect/sendmsg/etc hooks.
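
The effect of the tnum_range(1, 1) restriction can be illustrated with a small
userspace model. This is a sketch only, not the verifier code: the enum values
and function names are hypothetical local stand-ins for the attach types, and
the assumption that the connect hooks accept the wider range 0..1 comes from
the verifier's default for sock_addr programs rather than from this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical local stand-ins for a few sock_addr attach types. */
enum hook {
	HOOK_CONNECT4,
	HOOK_GETPEERNAME4,
	HOOK_GETPEERNAME6,
	HOOK_GETSOCKNAME4,
	HOOK_GETSOCKNAME6,
};

/* Inclusive range of return values a program may produce. */
struct ret_range {
	int min;
	int max;
};

/* The get{peer,sock}name hooks may only return 1, so a program cannot
 * fail the syscall; connect is modelled with the wider range 0..1. */
static struct ret_range allowed_ret_range(enum hook h)
{
	switch (h) {
	case HOOK_GETPEERNAME4:
	case HOOK_GETPEERNAME6:
	case HOOK_GETSOCKNAME4:
	case HOOK_GETSOCKNAME6:
		return (struct ret_range){ 1, 1 };
	default:
		return (struct ret_range){ 0, 1 };
	}
}

/* True if a program attached at hook h would be allowed to return ret. */
static bool ret_code_ok(enum hook h, int ret)
{
	struct ret_range r = allowed_ret_range(h);

	assert(r.min <= r.max);
	return ret >= r.min && ret <= r.max;
}
```

In this model a getpeername or getsockname program that returns 0 is rejected,
while a connect program returning 0 is accepted. Relaxing the range later, as
the message suggests, would amount to widening the first case.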

  [0] https://github.com/cilium/cilium/blob/master/bpf/bpf_sock.c

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Andrey Ignatov <rdna@fb.com>
---
 include/linux/bpf-cgroup.h     |  1 +
 include/uapi/linux/bpf.h       |  4 ++++
 kernel/bpf/syscall.c           | 12 ++++++++++++
 kernel/bpf/verifier.c          |  6 +++++-
 net/core/filter.c              |  4 ++++
 net/ipv4/af_inet.c             |  8 ++++++--
 net/ipv6/af_inet6.c            |  9 ++++++---
 tools/include/uapi/linux/bpf.h |  4 ++++
 8 files changed, 42 insertions(+), 6 deletions(-)

Message ID	61a479d759b2482ae3efb45546490bacd796a220.1589841594.git.daniel@iogearbox.net
State	Accepted
Delegated to:	BPF Maintainers
Headers	From: Daniel Borkmann <daniel@iogearbox.net>
	To: ast@kernel.org
	Cc: bpf@vger.kernel.org, netdev@vger.kernel.org, rdna@fb.com, sdf@google.com, andrii.nakryiko@gmail.com, Daniel Borkmann <daniel@iogearbox.net>, Andrii Nakryiko <andriin@fb.com>
	Subject: [PATCH bpf-next v2 1/4] bpf: add get{peer,sock}name attach types for sock_addr
	Date: Tue, 19 May 2020 00:45:45 +0200
	Message-Id: <61a479d759b2482ae3efb45546490bacd796a220.1589841594.git.daniel@iogearbox.net>
	In-Reply-To: <cover.1589841594.git.daniel@iogearbox.net>
Series	Add get{peer,sock}name cgroup attach types
	[bpf-next,v2,0/4] Add get{peer,sock}name cgroup attach types
	[bpf-next,v2,1/4] bpf: add get{peer,sock}name attach types for sock_addr
	[bpf-next,v2,2/4] bpf, libbpf: enable get{peer,sock}name attach types
	[bpf-next,v2,3/4] bpf, bpftool: enable get{peer,sock}name attach types
	[bpf-next,v2,4/4] bpf, testing: add get{peer,sock}name selftests to test_progs
