diff mbox

[v2] Don't fall back to mmap if the original arena is not corrupt

Message ID 20150819172549.GN2415@spoyarek.pnq.redhat.com
State New
Headers show

Commit Message

Siddhesh Poyarekar Aug. 19, 2015, 5:25 p.m. UTC
On Wed, Aug 19, 2015 at 09:50:29AM -0400, Mike Frysinger wrote:
> verified how ?

Sorry, I meant to write "verified that there were no regressions on
x86_64".  Testing this behaviour itself is tricky since you'll need to
generate the right amount of load to cause contention, with a single
arena or in a retry path.

> doesn't this comment explicitly say you don't want to use the avoid arena ?
> doesn't it need updating now with this change in logic ?

Right, updated patch with comment:

Siddhesh

Comments

Mike Frysinger Aug. 19, 2015, 6:53 p.m. UTC | #1
On 19 Aug 2015 22:55, Siddhesh Poyarekar wrote:
> --- a/malloc/arena.c
> +++ b/malloc/arena.c
> @@ -823,16 +823,20 @@ reused_arena (mstate avoid_arena)
>  
>    /* Make sure that the arena we get is not corrupted.  */
>    mstate begin = result;
> +  bool looped = false;
> +
>    while (arena_is_corrupt (result) || result == avoid_arena)
>      {
>        result = result->next;
>        if (result == begin)
> -	break;
> +	{
> +	  looped = true;
> +	  break;
> +	}
>      }
>  
> -  /* We could not find any arena that was either not corrupted or not the one
> -     we wanted to avoid.  */
> -  if (result == begin || result == avoid_arena)
> +  /* We could not find any arena that was not corrupted.  */
> +  if (looped)
>      return NULL;

i'm not sure this fixes the stated bug.  say you have two arenas:
	a -> b -> a
b is corrupt and a is avoided.
	avoid_arena = a;
	result = a;
	begin = a;

so the first loop runs because result == avoid_arena, so we move result=b,
and then the second loop runs because arena_is_corrupt(b), so we move
result=a, and then we hit the if statement which sets looped=true, and then
we break out.  then we return NULL even though there is a non-corrupt region.

looks like this applies whenever the first region is avoided and the others
are corrupt.

maybe instead of setting a looped variable, perhaps you want:
	while (arena_is_corrupt (result) || result == avoid_arena) {
		result = result->next;
		if (result == begin)
			break;
	}
	if (__glibc_unlikely (arena_is_corrupt (result))) {
		if (arena_is_corrupt (avoid_arena))
			return NULL;
		result = avoid_arena;
	}
-mike
Siddhesh Poyarekar Aug. 24, 2015, 7:15 a.m. UTC | #2
On Wed, Aug 19, 2015 at 02:53:06PM -0400, Mike Frysinger wrote:
> i'm not sure this fixes the stated bug.  say you have two arenas:
> 	a -> b -> a
> b is corrupt and a is avoided.
> 	avoid_arena = a;
> 	result = a;
> 	begin = a;
> 
> so the first loop runs because result == avoid_arena, so we move result=b,
> and then the second loop runs because arena_is_corrupt(b), so we move
> result=a, and then we hit the if statement which sets looped=true, and then
> we break out.  then we return NULL even though there is a non-corrupt region.
> 
> looks like this applies whenever the first region is avoided and the others
> are corrupt.

Thanks for pointing that out, but it appears to me that this problem
is only valid when none of the arenas are corrupt and no arenas are
avoided.  One would not really care about performance loss when any of
the arenas are corrupt because it will just be a matter of time before
the program dies.  There is of course the case of:

a -> a where one could fall back to mmap, but that's OK for two reasons:

1. It is unlikely because the code will always attempt to create
   another arena before going down the retry path, so you'd at least
   end up with a -> b -> a.  In the unlikely case that creating the
   arena fails, we don't fall back to reusing anyway, and we just use
   mmap

2. Even if we ever end up with the a -> a case, it's OK to fall back
   because we don't really care about retrying in the same map that we
   failed, so I'd argue that the original code was wrong in returning
   the same arena.

This also means that my original assertion about the "degenerate case
being only one arena" is wrong; there is no such degenerate case.
Also, the code around this is a bit murky because of some useless
bits.  I'll try to clean them up in follow-up patches.

Is the explanation sound enough for me to commit this?  I can add
comments in the code to explain this in detail if you think it'll
help.

Siddhesh
Siddhesh Poyarekar Sept. 8, 2015, 8:25 a.m. UTC | #3
Ping!

On Mon, Aug 24, 2015 at 12:45:07PM +0530, Siddhesh Poyarekar wrote:
> On Wed, Aug 19, 2015 at 02:53:06PM -0400, Mike Frysinger wrote:
> > i'm not sure this fixes the stated bug.  say you have two arenas:
> > 	a -> b -> a
> > b is corrupt and a is avoided.
> > 	avoid_arena = a;
> > 	result = a;
> > 	begin = a;
> > 
> > so the first loop runs because result == avoid_arena, so we move result=b,
> > and then the second loop runs because arena_is_corrupt(b), so we move
> > result=a, and then we hit the if statement which sets looped=true, and then
> > we break out.  then we return NULL even though there is a non-corrupt region.
> > 
> > looks like this applies whenever the first region is avoided and the others
> > are corrupt.
> 
> Thanks for pointing that out, but it appears to me that this problem
> is only valid when none of the arenas are corrupt and no arenas are
> avoided.  One would not really care about performance loss when any of
> the arenas are corrupt because it will just be a matter of time before
> the program dies.  There is of course the case of:
> 
> a -> a where one could fall back to mmap, but that's OK for two reasons:
> 
> 1. It is unlikely because the code will always attempt to create
>    another arena before going down the retry path, so you'd at least
>    end up with a -> b -> a.  In the unlikely case that creating the
>    arena fails, we don't fall back to reusing anyway, and we just use
>    mmap
> 
> 2. Even if we ever end up with the a -> a case, it's OK to fall back
>    because we don't really care about retrying in the same map that we
>    failed, so I'd argue that the original code was wrong in returning
>    the same arena.
> 
> This also means that my original assertion about the "degenerate case
> being only one arena" is wrong; there is no such degenerate case.
> Also, the code around this is a bit murky because of some useless
> bits.  I'll try to clean them up in follow-up patches.
> 
> Is the explanation sound enough for me to commit this?  I can add
> comments in the code to explain this in detail if you think it'll
> help.
> 
> Siddhesh
diff mbox

Patch

diff --git a/malloc/arena.c b/malloc/arena.c
index 21ecc5a1..2430d77 100644
--- a/malloc/arena.c
+++ b/malloc/arena.c
@@ -823,16 +823,20 @@  reused_arena (mstate avoid_arena)
 
   /* Make sure that the arena we get is not corrupted.  */
   mstate begin = result;
+  bool looped = false;
+
   while (arena_is_corrupt (result) || result == avoid_arena)
     {
       result = result->next;
       if (result == begin)
-	break;
+	{
+	  looped = true;
+	  break;
+	}
     }
 
-  /* We could not find any arena that was either not corrupted or not the one
-     we wanted to avoid.  */
-  if (result == begin || result == avoid_arena)
+  /* We could not find any arena that was not corrupted.  */
+  if (looped)
     return NULL;
 
   /* No arena available without contention.  Wait for the next in line.  */