diff mbox series

[bpf-next,1/4] bpf: add BPF_CGROUP_INET_SOCK_RELEASE hook

Message ID 20200626000929.217930-1-sdf@google.com
State Changes Requested
Delegated to: BPF Maintainers
Headers show
Series [bpf-next,1/4] bpf: add BPF_CGROUP_INET_SOCK_RELEASE hook | expand

Commit Message

Stanislav Fomichev June 26, 2020, 12:09 a.m. UTC
Sometimes it's handy to know when the socket gets freed.
In particular, we'd like to try to use a smarter allocation
of ports for bpf_bind and explore the possibility of
limiting the number of SOCK_DGRAM sockets the process can have.
Adding a release pair to existing BPF_CGROUP_INET_SOCK_CREATE
can unlock both of the mentioned features.

The only questionable part here is the sock->sk check
in the inet_release. Looking at the places where we
do 'sock->sk = NULL', I don't understand how it can race
with inet_release and why the check is there (it's been
there since the initial git import). Otherwise, the
change itself is pretty simple, we add a BPF hook
to the inet_release and avoid calling it for kernel
sockets.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 include/linux/bpf-cgroup.h | 3 +++
 include/uapi/linux/bpf.h   | 1 +
 kernel/bpf/syscall.c       | 3 +++
 net/core/filter.c          | 1 +
 net/ipv4/af_inet.c         | 3 +++
 5 files changed, 11 insertions(+)

Comments

kernel test robot June 26, 2020, 2:30 a.m. UTC | #1
Hi Stanislav,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on bpf/master]
[also build test ERROR on net/master net-next/master v5.8-rc2 next-20200625]
[cannot apply to bpf-next/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use  as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Stanislav-Fomichev/bpf-add-BPF_CGROUP_INET_SOCK_RELEASE-hook/20200626-081210
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git master
config: nds32-defconfig (attached as .config)
compiler: nds32le-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=nds32 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from ./arch/nds32/include/generated/asm/bug.h:1,
                    from include/linux/bug.h:5,
                    from include/linux/thread_info.h:12,
                    from include/linux/uio.h:9,
                    from include/linux/socket.h:8,
                    from net/ipv4/af_inet.c:69:
   include/linux/dma-mapping.h: In function 'dma_map_resource':
   arch/nds32/include/asm/memory.h:82:32: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
      82 | #define pfn_valid(pfn)  ((pfn) >= PHYS_PFN_OFFSET && (pfn) < (PHYS_PFN_OFFSET + max_mapnr))
         |                                ^~
   include/asm-generic/bug.h:144:27: note: in definition of macro 'WARN_ON_ONCE'
     144 |  int __ret_warn_once = !!(condition);   \
         |                           ^~~~~~~~~
   include/linux/dma-mapping.h:352:19: note: in expansion of macro 'pfn_valid'
     352 |  if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
         |                   ^~~~~~~~~
   net/ipv4/af_inet.c: In function 'inet_release':
>> net/ipv4/af_inet.c:415:4: error: implicit declaration of function 'BPF_CGROUP_RUN_PROG_INET_SOCK_RELEASE'; did you mean 'BPF_CGROUP_RUN_PROG_INET_SOCK'? [-Werror=implicit-function-declaration]
     415 |    BPF_CGROUP_RUN_PROG_INET_SOCK_RELEASE(sk);
         |    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         |    BPF_CGROUP_RUN_PROG_INET_SOCK
   cc1: some warnings being treated as errors

vim +415 net/ipv4/af_inet.c

   400	
   401	
   402	/*
   403	 *	The peer socket should always be NULL (or else). When we call this
   404	 *	function we are destroying the object and from then on nobody
   405	 *	should refer to it.
   406	 */
   407	int inet_release(struct socket *sock)
   408	{
   409		struct sock *sk = sock->sk;
   410	
   411		if (sk) {
   412			long timeout;
   413	
   414			if (!sk->sk_kern_sock)
 > 415				BPF_CGROUP_RUN_PROG_INET_SOCK_RELEASE(sk);
   416	
   417			/* Applications forget to leave groups before exiting */
   418			ip_mc_drop_socket(sk);
   419	
   420			/* If linger is set, we don't return until the close
   421			 * is complete.  Otherwise we return immediately. The
   422			 * actually closing is done the same either way.
   423			 *
   424			 * If the close is due to the process exiting, we never
   425			 * linger..
   426			 */
   427			timeout = 0;
   428			if (sock_flag(sk, SOCK_LINGER) &&
   429			    !(current->flags & PF_EXITING))
   430				timeout = sk->sk_lingertime;
   431			sk->sk_prot->close(sk, timeout);
   432			sock->sk = NULL;
   433		}
   434		return 0;
   435	}
   436	EXPORT_SYMBOL(inet_release);
   437	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
kernel test robot June 26, 2020, 7 a.m. UTC | #2
Hi Stanislav,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on bpf/master]
[also build test ERROR on net/master net-next/master v5.8-rc2 next-20200625]
[cannot apply to bpf-next/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use  as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Stanislav-Fomichev/bpf-add-BPF_CGROUP_INET_SOCK_RELEASE-hook/20200626-081210
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git master
config: x86_64-defconfig (attached as .config)
compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project 6e11ed52057ffc39941cb2de6d93cae522db4782)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> net/ipv4/af_inet.c:415:4: error: implicit declaration of function 'BPF_CGROUP_RUN_PROG_INET_SOCK_RELEASE' [-Werror,-Wimplicit-function-declaration]
                           BPF_CGROUP_RUN_PROG_INET_SOCK_RELEASE(sk);
                           ^
   1 error generated.

vim +/BPF_CGROUP_RUN_PROG_INET_SOCK_RELEASE +415 net/ipv4/af_inet.c

   400	
   401	
   402	/*
   403	 *	The peer socket should always be NULL (or else). When we call this
   404	 *	function we are destroying the object and from then on nobody
   405	 *	should refer to it.
   406	 */
   407	int inet_release(struct socket *sock)
   408	{
   409		struct sock *sk = sock->sk;
   410	
   411		if (sk) {
   412			long timeout;
   413	
   414			if (!sk->sk_kern_sock)
 > 415				BPF_CGROUP_RUN_PROG_INET_SOCK_RELEASE(sk);
   416	
   417			/* Applications forget to leave groups before exiting */
   418			ip_mc_drop_socket(sk);
   419	
   420			/* If linger is set, we don't return until the close
   421			 * is complete.  Otherwise we return immediately. The
   422			 * actually closing is done the same either way.
   423			 *
   424			 * If the close is due to the process exiting, we never
   425			 * linger..
   426			 */
   427			timeout = 0;
   428			if (sock_flag(sk, SOCK_LINGER) &&
   429			    !(current->flags & PF_EXITING))
   430				timeout = sk->sk_lingertime;
   431			sk->sk_prot->close(sk, timeout);
   432			sock->sk = NULL;
   433		}
   434		return 0;
   435	}
   436	EXPORT_SYMBOL(inet_release);
   437	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
diff mbox series

Patch

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index c66c545e161a..b4fd09fe67bd 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -210,6 +210,9 @@  int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key,
 #define BPF_CGROUP_RUN_PROG_INET_SOCK(sk)				       \
 	BPF_CGROUP_RUN_SK_PROG(sk, BPF_CGROUP_INET_SOCK_CREATE)
 
+#define BPF_CGROUP_RUN_PROG_INET_SOCK_RELEASE(sk)			       \
+	BPF_CGROUP_RUN_SK_PROG(sk, BPF_CGROUP_INET_SOCK_RELEASE)
+
 #define BPF_CGROUP_RUN_PROG_INET4_POST_BIND(sk)				       \
 	BPF_CGROUP_RUN_SK_PROG(sk, BPF_CGROUP_INET4_POST_BIND)
 
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c65b374a5090..d7aea1d0167a 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -226,6 +226,7 @@  enum bpf_attach_type {
 	BPF_CGROUP_INET4_GETSOCKNAME,
 	BPF_CGROUP_INET6_GETSOCKNAME,
 	BPF_XDP_DEVMAP,
+	BPF_CGROUP_INET_SOCK_RELEASE,
 	__MAX_BPF_ATTACH_TYPE
 };
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 4d530b1d5683..2a3d4b8f95c7 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1994,6 +1994,7 @@  bpf_prog_load_check_attach(enum bpf_prog_type prog_type,
 	case BPF_PROG_TYPE_CGROUP_SOCK:
 		switch (expected_attach_type) {
 		case BPF_CGROUP_INET_SOCK_CREATE:
+		case BPF_CGROUP_INET_SOCK_RELEASE:
 		case BPF_CGROUP_INET4_POST_BIND:
 		case BPF_CGROUP_INET6_POST_BIND:
 			return 0;
@@ -2792,6 +2793,7 @@  attach_type_to_prog_type(enum bpf_attach_type attach_type)
 		return BPF_PROG_TYPE_CGROUP_SKB;
 		break;
 	case BPF_CGROUP_INET_SOCK_CREATE:
+	case BPF_CGROUP_INET_SOCK_RELEASE:
 	case BPF_CGROUP_INET4_POST_BIND:
 	case BPF_CGROUP_INET6_POST_BIND:
 		return BPF_PROG_TYPE_CGROUP_SOCK;
@@ -2942,6 +2944,7 @@  static int bpf_prog_query(const union bpf_attr *attr,
 	case BPF_CGROUP_INET_INGRESS:
 	case BPF_CGROUP_INET_EGRESS:
 	case BPF_CGROUP_INET_SOCK_CREATE:
+	case BPF_CGROUP_INET_SOCK_RELEASE:
 	case BPF_CGROUP_INET4_BIND:
 	case BPF_CGROUP_INET6_BIND:
 	case BPF_CGROUP_INET4_POST_BIND:
diff --git a/net/core/filter.c b/net/core/filter.c
index 209482a4eaa2..7bcac182383c 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -6855,6 +6855,7 @@  static bool __sock_filter_check_attach_type(int off,
 	case offsetof(struct bpf_sock, priority):
 		switch (attach_type) {
 		case BPF_CGROUP_INET_SOCK_CREATE:
+		case BPF_CGROUP_INET_SOCK_RELEASE:
 			goto full_access;
 		default:
 			return false;
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 02aa5cb3a4fd..965a96ea1168 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -411,6 +411,9 @@  int inet_release(struct socket *sock)
 	if (sk) {
 		long timeout;
 
+		if (!sk->sk_kern_sock)
+			BPF_CGROUP_RUN_PROG_INET_SOCK_RELEASE(sk);
+
 		/* Applications forget to leave groups before exiting */
 		ip_mc_drop_socket(sk);