diff mbox

[0/6,v3] kvmalloc

Message ID 20170126141349.GN6590@dhcp22.suse.cz
State Not Applicable, archived
Delegated to: David Miller
Headers show

Commit Message

Michal Hocko Jan. 26, 2017, 2:13 p.m. UTC
On Thu 26-01-17 14:40:04, Michal Hocko wrote:
> On Thu 26-01-17 14:10:06, Daniel Borkmann wrote:
> > On 01/26/2017 12:58 PM, Michal Hocko wrote:
> > > On Thu 26-01-17 12:33:55, Daniel Borkmann wrote:
> > > > On 01/26/2017 11:08 AM, Michal Hocko wrote:
> > > [...]
> > > > > If you disagree I can drop the bpf part of course...
> > > > 
> > > > If we could consolidate these spots with kvmalloc() eventually, I'm
> > > > all for it. But even if __GFP_NORETRY is not covered down to all
> > > > possible paths, it kind of does have an effect already of saying
> > > > 'don't try too hard', so would it be harmful to still keep that for
> > > > now? If it's not, I'd personally prefer to just leave it as is until
> > > > there's some form of support by kvmalloc() and friends.
> > > 
> > > Well, you can use kvmalloc(size, GFP_KERNEL|__GFP_NORETRY). It is not
> > > disallowed. It is not _supported_ which means that if it doesn't work as
> > > you expect you are on your own. Which is actually the situation right
> > > now as well. But I still think that this is just not right thing to do.
> > > Even though it might happen to work in some cases it gives a false
> > > impression of a solution. So I would rather go with
> > 
> > Hmm. 'On my own' means, we could potentially BUG somewhere down the
> > vmalloc implementation, etc, presumably? So it might in-fact be
> > harmful to pass that, right?
> 
> No it would mean that it might eventually hit the behavior which you are
> trying to avoid - in other words it may invoke OOM killer even though
> __GFP_NORETRY means giving up before any system wide disruptive actions
> a re taken.

I will separate both bpf and netfilter hunks into its own patch with the
clarification. Does the following look better?
---
From ab6b2d724228e4abcc69c44f5ab1ce91009aa91d Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Thu, 26 Jan 2017 14:59:21 +0100
Subject: [PATCH] net, bpf: use kvzalloc helper

both bpf_map_area_alloc and xt_alloc_table_info try really hard to
play nicely with large memory requests which can be triggered from
the userspace (by an admin). See 5bad87348c70 ("netfilter: x_tables:
avoid warn and OOM killer on vmalloc call") resp. d407bd25a204 ("bpf:
don't trigger OOM killer under pressure with map alloc").

The current allocation pattern strongly resembles kvmalloc helper except
for one thing __GFP_NORETRY is not used for the vmalloc fallback. The
main reason why kvmalloc doesn't really support __GFP_NORETRY is
because vmalloc doesn't support this flag properly and it is far from
straightforward to make it understand it because there are some hard
coded GFP_KERNEL allocation deep in the call chains. This patch simply
replaces the open coded variants with kvmalloc and puts a note to
push on MM people to support __GFP_NORETRY in kvmalloc it this turns out
to be really needed along with OOM report pointing at vmalloc.

If there is an immediate need and no full support yet then
	kvmalloc(size, gfp | __GFP_NORETRY)
will work as good as __vmalloc(gfp | __GFP_NORETRY) - in other words it
might trigger the OOM in some cases.

Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 kernel/bpf/syscall.c     | 19 +++++--------------
 net/netfilter/x_tables.c | 16 ++++++----------
 2 files changed, 11 insertions(+), 24 deletions(-)

Comments

kernel test robot Jan. 26, 2017, 2:37 p.m. UTC | #1
Hi Michal,

[auto build test ERROR on next-20170125]
[cannot apply to linus/master linux/master nf-next/master v4.9-rc8 v4.9-rc7 v4.9-rc6 v4.10-rc5]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Michal-Hocko/net-bpf-use-kvzalloc-helper/20170126-221904
config: x86_64-randconfig-x017-201704 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All error/warnings (new ones prefixed by >>):

   net/netfilter/x_tables.c: In function 'xt_alloc_table_info':
>> net/netfilter/x_tables.c:1012:9: error: implicit declaration of function 'kvzalloc' [-Werror=implicit-function-declaration]
     info = kvzalloc(sz, GFP_KERNEL);
            ^~~~~~~~
>> net/netfilter/x_tables.c:1012:7: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
     info = kvzalloc(sz, GFP_KERNEL);
          ^
   cc1: some warnings being treated as errors

vim +/kvzalloc +1012 net/netfilter/x_tables.c

  1006	
  1007		/*
  1008		 * FIXME: we would really like to not trigger the OOM killer and rather
  1009		 * fail instead. This is not supported right now. Please nag MM people
  1010		 * if these OOM start bothering people.
  1011		 */
> 1012		info = kvzalloc(sz, GFP_KERNEL);
  1013		info->size = size;
  1014		return info;
  1015	}

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
kernel test robot Jan. 26, 2017, 2:58 p.m. UTC | #2
Hi Michal,

[auto build test ERROR on next-20170125]
[cannot apply to linus/master linux/master nf-next/master v4.9-rc8 v4.9-rc7 v4.9-rc6 v4.10-rc5]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Michal-Hocko/net-bpf-use-kvzalloc-helper/20170126-221904
config: x86_64-randconfig-x014-201704 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All error/warnings (new ones prefixed by >>):

   kernel/bpf/syscall.c: In function 'bpf_map_area_alloc':
>> kernel/bpf/syscall.c:61:9: error: implicit declaration of function 'kvzalloc' [-Werror=implicit-function-declaration]
     return kvzalloc(size, GFP_USER);
            ^~~~~~~~
>> kernel/bpf/syscall.c:61:9: warning: return makes pointer from integer without a cast [-Wint-conversion]
     return kvzalloc(size, GFP_USER);
            ^~~~~~~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors

vim +/kvzalloc +61 kernel/bpf/syscall.c

    55	{
    56		/*
    57		 * FIXME: we would really like to not trigger the OOM killer and rather
    58		 * fail instead. This is not supported right now. Please nag MM people
    59		 * if these OOM start bothering people.
    60		 */
  > 61		return kvzalloc(size, GFP_USER);
    62	}
    63	
    64	void bpf_map_area_free(void *area)

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
diff mbox

Patch

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 19b6129eab23..a6dc4d596f14 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -53,21 +53,12 @@  void bpf_register_map_type(struct bpf_map_type_list *tl)
 
 void *bpf_map_area_alloc(size_t size)
 {
-	/* We definitely need __GFP_NORETRY, so OOM killer doesn't
-	 * trigger under memory pressure as we really just want to
-	 * fail instead.
+	/*
+	 * FIXME: we would really like to not trigger the OOM killer and rather
+	 * fail instead. This is not supported right now. Please nag MM people
+	 * if these OOM start bothering people.
 	 */
-	const gfp_t flags = __GFP_NOWARN | __GFP_NORETRY | __GFP_ZERO;
-	void *area;
-
-	if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
-		area = kmalloc(size, GFP_USER | flags);
-		if (area != NULL)
-			return area;
-	}
-
-	return __vmalloc(size, GFP_KERNEL | __GFP_HIGHMEM | flags,
-			 PAGE_KERNEL);
+	return kvzalloc(size, GFP_USER);
 }
 
 void bpf_map_area_free(void *area)
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index d529989f5791..ba8ba633da72 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -995,16 +995,12 @@  struct xt_table_info *xt_alloc_table_info(unsigned int size)
 	if ((SMP_ALIGN(size) >> PAGE_SHIFT) + 2 > totalram_pages)
 		return NULL;
 
-	if (sz <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
-		info = kmalloc(sz, GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY);
-	if (!info) {
-		info = __vmalloc(sz, GFP_KERNEL | __GFP_NOWARN |
-				     __GFP_NORETRY | __GFP_HIGHMEM,
-				 PAGE_KERNEL);
-		if (!info)
-			return NULL;
-	}
-	memset(info, 0, sizeof(*info));
+	/*
+	 * FIXME: we would really like to not trigger the OOM killer and rather
+	 * fail instead. This is not supported right now. Please nag MM people
+	 * if these OOM start bothering people.
+	 */
+	info = kvzalloc(sz, GFP_KERNEL);
 	info->size = size;
 	return info;
 }