4.7.0, cp -al causes OOM

Message ID: 20160814125111.GE9248@dhcp22.suse.cz
State: Not Applicable, archived

Commit Message

Michal Hocko Aug. 14, 2016, 12:51 p.m. UTC
On Fri 12-08-16 09:43:40, Michal Hocko wrote:
> Hi,
> 
> On Fri 12-08-16 09:01:41, Arkadiusz Miskiewicz wrote:
[...]
> > [87259.568395] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15360kB
> > [87259.568403] Node 0 DMA32: 11467*4kB (UME) 1525*8kB (UME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 58068kB
> > [87259.568411] Node 0 Normal: 9927*4kB (UMEH) 1119*8kB (UMH) 19*16kB (H) 8*32kB (H) 2*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 49348kB
> 
> As you can see, there are barely any high order pages available. There
> are a few in the atomic reserves, which is a bit surprising because I
> would expect them to be released under heavy memory pressure. I will
> double check that part.

OK, so the reason is that we are trying to preserve at least one
pageblock per zone. This is not all that much memory overall, but I
guess we should just release those pageblocks, because an OOM is
certainly much worse than a high order GFP_ATOMIC request failing. The
diff below does that. I am a bit skeptical this will make much
difference, but let's give it a try. I will also send another patch
which should show compaction/migration counters during high order OOMs.
This might tell us a bit more about the compaction behavior.
---

Patch

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9d46b65061be..b8600943184e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2053,8 +2053,7 @@ static void unreserve_highatomic_pageblock(const struct alloc_context *ac)
 
 	for_each_zone_zonelist_nodemask(zone, z, zonelist, ac->high_zoneidx,
 								ac->nodemask) {
-		/* Preserve at least one pageblock */
-		if (zone->nr_reserved_highatomic <= pageblock_nr_pages)
+		if (!zone->nr_reserved_highatomic)
 			continue;
 
 		spin_lock_irqsave(&zone->lock, flags);
@@ -3276,11 +3275,10 @@ __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
 
 	/*
 	 * If an allocation failed after direct reclaim, it could be because
-	 * pages are pinned on the per-cpu lists or in high alloc reserves.
+	 * pages are pinned on the per-cpu lists.
 	 * Shrink them them and try again
 	 */
 	if (!page && !drained) {
-		unreserve_highatomic_pageblock(ac);
 		drain_all_pages(NULL);
 		drained = true;
 		goto retry;
@@ -3636,6 +3634,12 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		goto retry;
 
 	/*
+	 * Make sure we are not pinning atomic higher order reserves when we
+	 * are really fighting to get !costly order and running out of memory
+	 */
+	unreserve_highatomic_pageblock(ac);
+
+	/*
 	 * It doesn't make any sense to retry for the compaction if the order-0
 	 * reclaim is not able to make any progress because the current
 	 * implementation of the compaction depends on the sufficient amount