Message ID | 20150819172549.GN2415@spoyarek.pnq.redhat.com |
---|---|
State | New |
Headers | show |
On 19 Aug 2015 22:55, Siddhesh Poyarekar wrote: > --- a/malloc/arena.c > +++ b/malloc/arena.c > @@ -823,16 +823,20 @@ reused_arena (mstate avoid_arena) > > /* Make sure that the arena we get is not corrupted. */ > mstate begin = result; > + bool looped = false; > + > while (arena_is_corrupt (result) || result == avoid_arena) > { > result = result->next; > if (result == begin) > - break; > + { > + looped = true; > + break; > + } > } > > - /* We could not find any arena that was either not corrupted or not the one > - we wanted to avoid. */ > - if (result == begin || result == avoid_arena) > + /* We could not find any arena that was not corrupted. */ > + if (looped) > return NULL; i'm not sure this fixes the stated bug. say you have two arenas: a -> b -> a b is corrupt and a is avoided. avoid_arena = a; result = a; begin = a; so the first loop runs because result == avoid_arena, so we move result=b, and then the second loop runs because arena_is_corrupt(b), so we move result=a, and then we hit the if statement which sets looped=true, and then we break out. then we return NULL even though there is a non-corrupt region. looks like this applies whenever the first region is avoided and the others are corrupt. maybe instead of setting a looped variable, perhaps you want: while (arena_is_corrupt (result) || result == avoid_arena) { result = result->next; if (result == begin) break; } if (__glibc_unlikely (arena_is_corrupt (result))) { if (arena_is_corrupt (avoid_arena)) return NULL; result = avoid_arena; } -mike
On Wed, Aug 19, 2015 at 02:53:06PM -0400, Mike Frysinger wrote: > i'm not sure this fixes the stated bug. say you have two arenas: > a -> b -> a > b is corrupt and a is avoided. > avoid_arena = a; > result = a; > begin = a; > > so the first loop runs because result == avoid_arena, so we move result=b, > and then the second loop runs because arena_is_corrupt(b), so we move > result=a, and then we hit the if statement which sets looped=true, and then > we break out. then we return NULL even though there is a non-corrupt region. > > looks like this applies whenever the first region is avoided and the others > are corrupt. Thanks for pointing that out, but it appears to me that this problem is only valid when none of the arenas are corrupt and no arenas are avoided. One would not really care about performance loss when any of the arenas are corrupt because it will just be a matter of time before the program dies. There is of course the case of: a -> a where one could fall back to mmap, but that's OK for two reasons: 1. It is unlikely because the code will always attempt to create another arena before going down the retry path, so you'd at least end up with a -> b -> a. In the unlikely case that creating the arena fails, we don't fall back to reusing anyway, and we just use mmap 2. Even if we ever end up with the a -> a case, it's OK to fall back because we don't really care about retrying in the same map that we failed, so I'd argue that the original code was wrong in returning the same arena. This also means that my original assertion about the "degenerate case being only one arena" is wrong; there is no such degenerate case. Also, the code around this is a bit murky because of some useless bits. I'll try to clean them up in follow-up patches. Is the explanation sound enough for me to commit this? I can add comments in the code to explain this in detail if you think it'll help. Siddhesh
Ping! On Mon, Aug 24, 2015 at 12:45:07PM +0530, Siddhesh Poyarekar wrote: > On Wed, Aug 19, 2015 at 02:53:06PM -0400, Mike Frysinger wrote: > > i'm not sure this fixes the stated bug. say you have two arenas: > > a -> b -> a > > b is corrupt and a is avoided. > > avoid_arena = a; > > result = a; > > begin = a; > > > > so the first loop runs because result == avoid_arena, so we move result=b, > > and then the second loop runs because arena_is_corrupt(b), so we move > > result=a, and then we hit the if statement which sets looped=true, and then > > we break out. then we return NULL even though there is a non-corrupt region. > > > > looks like this applies whenever the first region is avoided and the others > > are corrupt. > > Thanks for pointing that out, but it appears to me that this problem > is only valid when none of the arenas are corrupt and no arenas are > avoided. One would not really care about performance loss when any of > the arenas are corrupt because it will just be a matter of time before > the program dies. There is of course the case of: > > a -> a where one could fall back to mmap, but that's OK for two reasons: > > 1. It is unlikely because the code will always attempt to create > another arena before going down the retry path, so you'd at least > end up with a -> b -> a. In the unlikely case that creating the > arena fails, we don't fall back to reusing anyway, and we just use > mmap > > 2. Even if we ever end up with the a -> a case, it's OK to fall back > because we don't really care about retrying in the same map that we > failed, so I'd argue that the original code was wrong in returning > the same arena. > > This also means that my original assertion about the "degenerate case > being only one arena" is wrong; there is no such degenerate case. > Also, the code around this is a bit murky because of some useless > bits. I'll try to clean them up in follow-up patches. > > Is the explanation sound enough for me to commit this? I can add > comments in the code to explain this in detail if you think it'll > help. > > Siddhesh
diff --git a/malloc/arena.c b/malloc/arena.c index 21ecc5a1..2430d77 100644 --- a/malloc/arena.c +++ b/malloc/arena.c @@ -823,16 +823,20 @@ reused_arena (mstate avoid_arena) /* Make sure that the arena we get is not corrupted. */ mstate begin = result; + bool looped = false; + while (arena_is_corrupt (result) || result == avoid_arena) { result = result->next; if (result == begin) - break; + { + looped = true; + break; + } } - /* We could not find any arena that was either not corrupted or not the one - we wanted to avoid. */ - if (result == begin || result == avoid_arena) + /* We could not find any arena that was not corrupted. */ + if (looped) return NULL; /* No arena available without contention. Wait for the next in line. */