From patchwork Thu Feb 14 16:03:47 2019
X-Patchwork-Submitter: Jann Horn
X-Patchwork-Id: 1042264
X-Patchwork-Delegate: davem@davemloft.net
Date: Thu, 14 Feb 2019 17:03:47 +0100
Message-Id: <20190214160347.13647-1-jannh@google.com>
Subject: [PATCH net v2] mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs
From: Jann Horn <jannh@google.com>
To: "David S. Miller" <davem@davemloft.net>, netdev@vger.kernel.org, jannh@google.com
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Hocko,
 Vlastimil Babka, Pavel Tatashin, Oscar Salvador, Mel Gorman,
 alexander.duyck@gmail.com

The basic idea behind ->pagecnt_bias is: If we pre-allocate the maximum
number of references that we might need to create in the fastpath later,
the bump-allocation fastpath only has to modify the non-atomic bias value
that tracks the number of extra references we hold instead of the atomic
refcount. The maximum number of allocations we can serve (under the
assumption that no allocation is made with size 0) is nc->size, so that's
the bias used. However, even when all memory in the allocation has been
given away, a reference to the page is still held; and in the
`offset < 0` slowpath, the page may be reused if everyone else has
dropped their references. This means that the necessary number of
references is actually `nc->size+1`.

Per Alexander Duyck's request, use PAGE_FRAG_CACHE_MAX_SIZE instead of
nc->size for the bias in the hope of making the generated code slightly
faster.

Luckily, from a quick grep, it looks like the only path that can call
page_frag_alloc(fragsz=1) is TAP with the IFF_NAPI_FRAGS flag, which
requires CAP_NET_ADMIN in the init namespace and is only intended to be
used for kernel testing and fuzzing.

To test for this issue, put a `WARN_ON(page_ref_count(page) == 0)` in the
`offset < 0` path, below the virt_to_page() call, and then repeatedly
call writev() on a TAP device with IFF_TAP|IFF_NO_PI|IFF_NAPI_FRAGS|IFF_NAPI,
with a vector consisting of 15 elements containing 1 byte each.

Signed-off-by: Jann Horn <jannh@google.com>
---
sending to davem as specified by akpm

changed in v2:
- use PAGE_FRAG_CACHE_MAX_SIZE instead of nc->size for refcount bias
  (Alexander Duyck)

a rough userspace sketch of the reproducer described above follows after
the diff

 mm/page_alloc.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 35fdde041f5c..7f79b78bc829 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4675,11 +4675,11 @@ void *page_frag_alloc(struct page_frag_cache *nc,
 		/* Even if we own the page, we do not use atomic_set().
 		 * This would break get_page_unless_zero() users.
 		 */
-		page_ref_add(page, size - 1);
+		page_ref_add(page, PAGE_FRAG_CACHE_MAX_SIZE);
 
 		/* reset page count bias and offset to start of new frag */
 		nc->pfmemalloc = page_is_pfmemalloc(page);
-		nc->pagecnt_bias = size;
+		nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
 		nc->offset = size;
 	}
 
@@ -4695,10 +4695,10 @@ void *page_frag_alloc(struct page_frag_cache *nc,
 		size = nc->size;
 #endif
 		/* OK, page count is 0, we can safely set it */
-		set_page_count(page, size);
+		set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1);
 
 		/* reset page count bias and offset to start of new frag */
-		nc->pagecnt_bias = size;
+		nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
 		offset = size - fragsz;
 	}
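
Not part of the patch: below is a rough, untested userspace sketch of the
reproducer described in the commit message, for reference only. It assumes
a kernel instrumented with the WARN_ON(page_ref_count(page) == 0) check
mentioned above, CAP_NET_ADMIN in the init namespace, and a <linux/if_tun.h>
that exposes IFF_NAPI and IFF_NAPI_FRAGS; the interface name "tap-repro"
is an arbitrary placeholder.

/* repro.c - hammer page_frag_alloc(fragsz=1) through the TAP
 * IFF_NAPI_FRAGS path with 15 one-byte iovec elements per writev(),
 * as described in the commit message. Minimal error handling; sketch only.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/uio.h>
#include <unistd.h>
#include <linux/if.h>
#include <linux/if_tun.h>

int main(void)
{
	struct ifreq ifr;
	struct iovec iov[15];
	char byte = 'A';
	int fd, i;

	fd = open("/dev/net/tun", O_RDWR);
	if (fd < 0) {
		perror("open /dev/net/tun");
		return 1;
	}

	/* TAP mode, no packet-info header, NAPI + NAPI_FRAGS, as in the
	 * commit message; requires CAP_NET_ADMIN in the init namespace. */
	memset(&ifr, 0, sizeof(ifr));
	ifr.ifr_flags = IFF_TAP | IFF_NO_PI | IFF_NAPI | IFF_NAPI_FRAGS;
	strncpy(ifr.ifr_name, "tap-repro", IFNAMSIZ - 1);
	if (ioctl(fd, TUNSETIFF, &ifr) < 0) {
		perror("TUNSETIFF");
		return 1;
	}

	/* A vector of 15 elements containing 1 byte each. */
	for (i = 0; i < 15; i++) {
		iov[i].iov_base = &byte;
		iov[i].iov_len = 1;
	}

	/* Repeatedly write: each 1-byte fragment consumes one unit of the
	 * pre-charged pagecnt_bias, so on an unfixed kernel the page
	 * reference count eventually underflows and the instrumented
	 * slowpath should fire the WARN_ON. */
	for (;;) {
		if (writev(fd, iov, 15) < 0)
			perror("writev");
	}
}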