
[1/5] page allocator: Always wake kswapd when restarting an allocation attempt after direct reclaim failed

Message ID 20091026100019.2F4A.A69D9226@jp.fujitsu.com
State Not Applicable, archived
Delegated to: David Miller

Commit Message

KOSAKI Motohiro Oct. 26, 2009, 1:11 a.m. UTC
> If a direct reclaim makes no forward progress, it considers whether it
> should go OOM or not. Whether OOM is triggered or not, it may retry the
> allocation afterwards. In times past, this would always wake kswapd as well
> but currently, kswapd is not woken up after direct reclaim fails. For order-0
> allocations, this makes little difference but if there is a heavy mix of
> higher-order allocations that direct reclaim is failing for, it might mean
> that kswapd is not rewoken for higher orders as much as it did previously.
> 
> This patch wakes up kswapd when an allocation is being retried after a direct
> reclaim failure. It would be expected that kswapd is already awake, but
> this has the effect of telling kswapd to reclaim at the higher order as well.
> 
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> ---
>  mm/page_alloc.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index bf72055..dfa4362 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1817,9 +1817,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  	if (NUMA_BUILD && (gfp_mask & GFP_THISNODE) == GFP_THISNODE)
>  		goto nopage;
>  
> +restart:
>  	wake_all_kswapd(order, zonelist, high_zoneidx);
>  
> -restart:
>  	/*
>  	 * OK, we're below the kswapd watermark and have kicked background
>  	 * reclaim. Now things get more complex, so set up alloc_flags according

I think this patch is correct. Personally, I would like to see a comment
added at the restart label, but it's not a big deal.
	Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

However, I have a question. The retry logic in __alloc_pages_slowpath() is:

1. try_to_free_pages() reclaimed some pages:
	-> wait a while and goto rebalance
2. try_to_free_pages() didn't reclaim any pages:
	-> call out_of_memory() and goto restart

Then shouldn't case 1 be fixed too? I mean something like the change
shown under "Patch" at the end of this thread?
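
To make the two retry paths above concrete, here is a toy userspace model
of the control flow. It is only a sketch: model_slowpath(), struct
slowpath_stats and the wake_on_rebalance flag are hypothetical names made
up for illustration, and the real __alloc_pages_slowpath() does far more
(allocation attempts, should_alloc_retry(), and so on). It just counts how
often kswapd would be woken with and without a wake in case 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy counters only; none of this is kernel API. */
struct slowpath_stats {
	int kswapd_wakeups;	/* times wake_all_kswapd() would have run */
	int oom_calls;		/* times out_of_memory() would have run */
	int congestion_waits;	/* times congestion_wait() would have run */
};

/*
 * reclaimed[i] is the number of pages the i-th direct reclaim attempt
 * frees. wake_on_rebalance selects whether case 1 also wakes kswapd.
 */
static struct slowpath_stats
model_slowpath(const int *reclaimed, int attempts, bool wake_on_rebalance)
{
	struct slowpath_stats st = { 0, 0, 0 };
	int i = 0;

	assert(reclaimed != NULL && attempts > 0);

restart:
	st.kswapd_wakeups++;		/* wake placed at the restart label */

rebalance:
	if (i >= attempts)
		return st;		/* the model stops retrying here */

	if (reclaimed[i++] > 0) {
		/* case 1: some progress, wait a while and goto rebalance */
		st.congestion_waits++;
		if (wake_on_rebalance)
			st.kswapd_wakeups++;	/* the additional wake */
		goto rebalance;
	}

	/* case 2: no progress, call out_of_memory() and goto restart */
	st.oom_calls++;
	goto restart;
}
```

For the attempt sequence {4, 0, 2}, the model counts 2 kswapd wakeups
when only the restart label wakes it, and 4 when case 1 wakes it as well.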



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Rientjes Oct. 26, 2009, 7:10 a.m. UTC | #1
On Mon, 26 Oct 2009, KOSAKI Motohiro wrote:

> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index bf72055..5a27896 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1899,6 +1899,12 @@ rebalance:
>  	if (should_alloc_retry(gfp_mask, order, pages_reclaimed)) {
>  		/* Wait for some write requests to complete then retry */
>  		congestion_wait(BLK_RW_ASYNC, HZ/50);
> +
> +		/*
> +		 * While we wait congestion wait, Amount of free memory can
> +		 * be changed dramatically. Thus, we kick kswapd again.
> +		 */
> +		wake_all_kswapd(order, zonelist, high_zoneidx);
>  		goto rebalance;
>  	}
>  

We're blocking to finish writeback of the directly reclaimed memory, why 
do we need to wake kswapd afterwards?
KOSAKI Motohiro Oct. 27, 2009, 2:42 a.m. UTC | #2
> On Mon, 26 Oct 2009, KOSAKI Motohiro wrote:
> 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index bf72055..5a27896 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1899,6 +1899,12 @@ rebalance:
> >  	if (should_alloc_retry(gfp_mask, order, pages_reclaimed)) {
> >  		/* Wait for some write requests to complete then retry */
> >  		congestion_wait(BLK_RW_ASYNC, HZ/50);
> > +
> > +		/*
> > +		 * While we wait congestion wait, Amount of free memory can
> > +		 * be changed dramatically. Thus, we kick kswapd again.
> > +		 */
> > +		wake_all_kswapd(order, zonelist, high_zoneidx);
> >  		goto rebalance;
> >  	}
> >  
> 
> We're blocking to finish writeback of the directly reclaimed memory, why 
> do we need to wake kswapd afterwards?

For the same reason as the "goto restart" case; that's my intention.
If the following scenario occurs, it is equivalent to never having called
wake_all_kswapd():

  1. call congestion_wait()
  2. kswapd reclaims lots of memory and goes to sleep
  3. another task consumes lots of memory
  4. wake up from congestion_wait()

IOW, once we have fallen into __alloc_pages_slowpath(), we naturally expect
the next page_alloc() not to fall into the slowpath. However, if kswapd
finishes its work too early, this assumption doesn't hold.

Is this assumption too pessimistic?
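
The four steps above can be replayed in a toy model. This is a sketch
under stated assumptions, not kernel code: struct toy_zone, the toy_*
helpers and all the page counts are hypothetical, and the consuming task
in step 3 is assumed not to wake kswapd itself, which is exactly the
pessimistic assumption being asked about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy state; field names and numbers are illustrative. */
struct toy_zone {
	long free_pages;
	long low_wmark;
	long high_wmark;
	bool kswapd_awake;
};

static void toy_wake_kswapd(struct toy_zone *z)
{
	assert(z != NULL);
	z->kswapd_awake = true;
}

/* If awake, kswapd reclaims up to the high watermark and then sleeps. */
static void toy_kswapd_run(struct toy_zone *z)
{
	assert(z != NULL);
	if (!z->kswapd_awake)
		return;
	if (z->free_pages < z->high_wmark)
		z->free_pages = z->high_wmark;
	z->kswapd_awake = false;
}

/* Returns true if kswapd is awake when the allocator retries. */
static bool toy_replay(bool rewake_after_wait)
{
	struct toy_zone z = { 10, 32, 64, false };

	toy_wake_kswapd(&z);	/* slowpath entry kicks kswapd */
				/* 1. caller sits in congestion_wait() */
	toy_kswapd_run(&z);	/* 2. kswapd reclaims a lot and sleeps */
	z.free_pages -= 60;	/* 3. another task consumes a lot */
				/* 4. caller wakes from congestion_wait() */
	if (rewake_after_wait)
		toy_wake_kswapd(&z);

	/* Memory is short again at the moment of the retry. */
	assert(z.free_pages < z.low_wmark);
	return z.kswapd_awake;
}
```

In this model toy_replay(false) returns false (kswapd is asleep even
though free memory is below the low watermark), while toy_replay(true)
returns true.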





Mel Gorman Oct. 27, 2009, 12:27 p.m. UTC | #3
On Tue, Oct 27, 2009 at 11:42:55AM +0900, KOSAKI Motohiro wrote:
> > On Mon, 26 Oct 2009, KOSAKI Motohiro wrote:
> > 
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index bf72055..5a27896 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -1899,6 +1899,12 @@ rebalance:
> > >  	if (should_alloc_retry(gfp_mask, order, pages_reclaimed)) {
> > >  		/* Wait for some write requests to complete then retry */
> > >  		congestion_wait(BLK_RW_ASYNC, HZ/50);
> > > +
> > > +		/*
> > > +		 * While we wait congestion wait, Amount of free memory can
> > > +		 * be changed dramatically. Thus, we kick kswapd again.
> > > +		 */
> > > +		wake_all_kswapd(order, zonelist, high_zoneidx);
> > >  		goto rebalance;
> > >  	}
> > >  
> > 
> > We're blocking to finish writeback of the directly reclaimed memory, why 
> > do we need to wake kswapd afterwards?
> 
> the same reason of "goto restart" case. that's my intention.
> if following scenario occur, it is equivalent that we didn't call wake_all_kswapd().
> 
>   1. call congestion_wait()
>   2. kswapd reclaimed lots memory and sleep
>   3. another task consume lots memory
>   4. wakeup from congestion_wait()
> 
> IOW, if we falled into __alloc_pages_slowpath(), we naturally expect
> next page_alloc() don't fall into slowpath. however if kswapd end to
> its work too early, this assumption isn't true.
> 
> Is this too pessimistic assumption?
> 

Hmm.

The reason it's not woken a second time in either case was to match the
behaviour of 2.6.30. If the direct reclaimer goes to sleep and another task
consumes the memory the direct reclaimer freed, then the greedy process should
kick kswapd awake again as free memory drops below the low watermark.

However, if the greedy process was allocating order-0 pages, it's possible
that the order-0 watermarks are being met, leaving kswapd alone, whereas the
high-order ones are not, leaving kswapd to go back to sleep or to reclaim at
the wrong order.

It's a functional change but I can add it to the list of things to
consider. Thanks
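
To illustrate how an order-0 watermark can be met while a high-order one
is not, here is a simplified per-order check. It is a sketch loosely
modelled on the idea of an order-aware watermark test; toy_watermark_ok(),
TOY_MAX_ORDER and the free-list numbers are hypothetical and not taken
from mm/page_alloc.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_ORDER 4

/*
 * nr_free[o] is the number of free blocks of 2^o pages.
 * Returns true if an allocation of the given order passes the watermark.
 */
static bool toy_watermark_ok(const long nr_free[TOY_MAX_ORDER],
			     int order, long min)
{
	long free_pages = 0;
	int o;

	assert(nr_free != NULL);
	assert(order >= 0 && order < TOY_MAX_ORDER);

	/* Total free pages across all orders. */
	for (o = 0; o < TOY_MAX_ORDER; o++)
		free_pages += nr_free[o] << o;

	if (free_pages <= min)
		return false;

	for (o = 0; o < order; o++) {
		/* Blocks smaller than the request cannot satisfy it. */
		free_pages -= nr_free[o] << o;

		/* Require fewer free pages at each higher order. */
		min >>= 1;

		if (free_pages <= min)
			return false;
	}
	return true;
}
```

With nr_free = {200, 10, 1, 0} and min = 64, the order-0 check passes
(224 free pages in total) but the order-2 check fails, because only 24
pages remain once the order-0 blocks are discounted. An order-0 allocator
would therefore see no reason to wake kswapd in this state.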

Patch

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bf72055..5a27896 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1899,6 +1899,12 @@  rebalance:
 	if (should_alloc_retry(gfp_mask, order, pages_reclaimed)) {
 		/* Wait for some write requests to complete then retry */
 		congestion_wait(BLK_RW_ASYNC, HZ/50);
+
+		/*
+		 * While we were in congestion_wait(), the amount of free
+		 * memory may have changed dramatically, so kick kswapd again.
+		 */
+		wake_all_kswapd(order, zonelist, high_zoneidx);
 		goto rebalance;
 	}