Message ID | DB6PR0801MB20537B594B5629491605CD74834B0@DB6PR0801MB2053.eurprd08.prod.outlook.com |
---|---|
State | New |
Headers | show |
Series | Add single-threaded fast path to malloc | expand |
On Okt 12 2017, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote: > diff --git a/malloc/malloc.c b/malloc/malloc.c > index c00df205c6004ee5b5d0aee9ffd5130b3c8f9e9f..f4f44400d120188c4d0bece996380e04b35c8fac 100644 > --- a/malloc/malloc.c > +++ b/malloc/malloc.c > @@ -4168,15 +4168,14 @@ _int_free (mstate av, mchunkptr p, int have_lock) > >= av->system_mem, 0)) > { > /* We might not have a lock at this point and concurrent modifications > - of system_mem might have let to a false positive. Redo the test > - after getting the lock. */ > - if (!have_lock > - || ({ __libc_lock_lock (av->mutex); > - chunksize_nomask (chunk_at_offset (p, size)) <= 2 * SIZE_SZ > - || chunksize (chunk_at_offset (p, size)) >= av->system_mem; > - })) > + of system_mem might result in a false positive. Redo the test after > + getting the lock. */ > + if (!have_lock) > + __libc_lock_lock (av->mutex); > + if (chunksize_nomask (chunk_at_offset (p, size)) <= 2 * SIZE_SZ > + || chunksize (chunk_at_offset (p, size)) >= av->system_mem) There is no need to redo the tests if we had the lock. Andreas.
On 10/12/2017 11:35 AM, Wilco Dijkstra wrote: > This patch fixes a deadlock in the fastbin consistency check. > If we fail the fast check due to concurrent modifications to > the next chunk or system_mem, we should not lock if we already > have the arena lock. Simplify the check to make it obviously > correct. I don't think the subject line is correct. What is the deadlock? I don't see it. Thanks, Florian
On Okt 12 2017, Florian Weimer <fweimer@redhat.com> wrote: > On 10/12/2017 11:35 AM, Wilco Dijkstra wrote: >> This patch fixes a deadlock in the fastbin consistency check. >> If we fail the fast check due to concurrent modifications to >> the next chunk or system_mem, we should not lock if we already >> have the arena lock. Simplify the check to make it obviously >> correct. > > I don't think the subject line is correct. What is the deadlock? I don't > see it. The problem is that commit 24cffce736 inverted the condition on have_lock, which is wrong. Andreas.
Florian Weimer wrote: > I don't think the subject line is correct. What is the deadlock? I > don't see it. > - if (!have_lock > - || ({ __libc_lock_lock (av->mutex); It's right there. Have_lock means you've just done __libc_lock_lock (av->mutex), so doing it again (same thread) implies deadlock. Wilco
Andreas Schwab wrote: > diff --git a/malloc/malloc.c b/malloc/malloc.c > index c00df205c6004ee5b5d0aee9ffd5130b3c8f9e9f..f4f44400d120188c4d0bece996380e04b35c8fac 100644 > --- a/malloc/malloc.c > +++ b/malloc/malloc.c > @@ -4168,15 +4168,14 @@ _int_free (mstate av, mchunkptr p, int have_lock) > >= av->system_mem, 0)) > { > /* We might not have a lock at this point and concurrent modifications > - of system_mem might have let to a false positive. Redo the test > - after getting the lock. */ > - if (!have_lock > - || ({ __libc_lock_lock (av->mutex); > - chunksize_nomask (chunk_at_offset (p, size)) <= 2 * SIZE_SZ > - || chunksize (chunk_at_offset (p, size)) >= av->system_mem; > - })) > + of system_mem might result in a false positive. Redo the test after > + getting the lock. */ > + if (!have_lock) > + __libc_lock_lock (av->mutex); > + if (chunksize_nomask (chunk_at_offset (p, size)) <= 2 * SIZE_SZ > + || chunksize (chunk_at_offset (p, size)) >= av->system_mem) > There is no need to redo the tests if we had the lock. Well I guess an alternative is to do: if (have_lock) print error else { lock repeat test and print error unlock } I also wonder whether we should actually unlock again before printing the error - or do we assume/hope/know no memory allocation is ever required in the error case? Wilco
On Okt 12 2017, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote: > Andreas Schwab wrote: > >> diff --git a/malloc/malloc.c b/malloc/malloc.c >> index c00df205c6004ee5b5d0aee9ffd5130b3c8f9e9f..f4f44400d120188c4d0bece996380e04b35c8fac 100644 >> --- a/malloc/malloc.c >> +++ b/malloc/malloc.c >> @@ -4168,15 +4168,14 @@ _int_free (mstate av, mchunkptr p, int have_lock) >> >= av->system_mem, 0)) >> { >> /* We might not have a lock at this point and concurrent modifications >> - of system_mem might have let to a false positive. Redo the test >> - after getting the lock. */ >> - if (!have_lock >> - || ({ __libc_lock_lock (av->mutex); >> - chunksize_nomask (chunk_at_offset (p, size)) <= 2 * SIZE_SZ >> - || chunksize (chunk_at_offset (p, size)) >= av->system_mem; >> - })) >> + of system_mem might result in a false positive. Redo the test after >> + getting the lock. */ >> + if (!have_lock) >> + __libc_lock_lock (av->mutex); >> + if (chunksize_nomask (chunk_at_offset (p, size)) <= 2 * SIZE_SZ >> + || chunksize (chunk_at_offset (p, size)) >= av->system_mem) > >> There is no need to redo the tests if we had the lock. > > Well I guess an alternative is to do: > > if (have_lock) > print error > else > { > lock > repeat test and print error > unlock > } No, you can just test have_lock again, and skip the redo if set. Still much clearer than the original layout. Andreas.
On 10/12/2017 12:16 PM, Andreas Schwab wrote: > On Okt 12 2017, Florian Weimer <fweimer@redhat.com> wrote: > >> On 10/12/2017 11:35 AM, Wilco Dijkstra wrote: >>> This patch fixes a deadlock in the fastbin consistency check. >>> If we fail the fast check due to concurrent modifications to >>> the next chunk or system_mem, we should not lock if we already >>> have the arena lock. Simplify the check to make it obviously >>> correct. >> >> I don't think the subject line is correct. What is the deadlock? I don't >> see it. > > The problem is that commit 24cffce736 inverted the condition on > have_lock, which is wrong. Oh, right, sorry about that. Thanks, Florian
On 10/12/2017 12:18 PM, Wilco Dijkstra wrote: > Florian Weimer wrote: > >> I don't think the subject line is correct. What is the deadlock? I >> don't see it. > >> - if (!have_lock >> - || ({ __libc_lock_lock (av->mutex); > > It's right there. Have_lock means you've just done __libc_lock_lock (av->mutex), > so doing it again (same thread) implies deadlock. Hmm. So if we enter this code path with have_lock, we don't have to re-do the check, but malloc_printerr will be called in the end anyway, so this is not the interesting case. In practice, without heap corruption, the lock will be acquired here and re-checking is needed, so I think your cleanup is okay after all. The logic is indeed much clearer. Thanks, Florian
DJ Delorie wrote: > I think the bug can be fixed by only changing the sense of the have_lock > condition: > - if (!have_lock > + if (have_lock Yes it could but then it's still impossible to follow the logic... The only reason I spotted the bug was because I refactored the code. Then the bug suddenly became very obvious. I think locking as a side effect in conditions is something we should avoid. So my variant or this one are reasonable ways to do it: if (error detected) { if (have_lock) /* we had the lock during the test above, so the test is valid, and the error we detect is valid. */ print error message else /* we didn't have the lock, so aquire it and repeat the test. If the error is still present, fail. */ get lock, repeat test, maybe print error message } I suppose we could also print the error once at the end: if (error detected) { bool fail = true; if (!have_lock) { get_lock fail = repeat test unlock } if (fail) print error } Wilco
On 10/12/2017 11:35 PM, Wilco Dijkstra wrote: > I suppose we could also print the error once at the end: > > if (error detected) > { > bool fail = true; > if (!have_lock) > { > get_lock > fail = repeat test > unlock > } > if (fail) > print error > } Yes, that's also a good option. Anyway, should I make the one-character change removing the ! in the meantime? This is a real bug, and we should fix that even if the code still isn't as pretty as it could be. Thanks, Florian
Florian Weimer wrote: > > Anyway, should I make the one-character change removing the ! in the > meantime? This is a real bug, and we should fix that even if the code > still isn't as pretty as it could be. I could easily commit the current patch as is if you're thinking of backporting it. Once we agree on what the best way of writing this sequence, I can provide an updated patch. Wilco
OK, I've committed this:
Author: Wilco Dijkstra <wdijkstr@arm.com>
Date: Thu Oct 19 18:19:55 2017 +0100
Fix deadlock in _int_free consistency check
This patch fixes a deadlock in the fastbin consistency check.
If we fail the fast check due to concurrent modifications to
the next chunk or system_mem, we should not lock if we already
have the arena lock. Simplify the check to make it obviously
correct.
* malloc/malloc.c (_int_free): Fix deadlock bug in consistency check.
--
diff --git a/malloc/malloc.c b/malloc/malloc.c
index 784f401b02f7d812936c013632478445ce0773b1..f9054dcea039fc1ecb2456c5c63057ede7a57bfa 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -4171,17 +4171,20 @@ _int_free (mstate av, mchunkptr p, int have_lock)
|| __builtin_expect (chunksize (chunk_at_offset (p, size))
>= av->system_mem, 0))
{
+ bool fail = true;
/* We might not have a lock at this point and concurrent modifications
- of system_mem might have let to a false positive. Redo the test
- after getting the lock. */
- if (!have_lock
- || ({ __libc_lock_lock (av->mutex);
- chunksize_nomask (chunk_at_offset (p, size)) <= 2 * SIZE_SZ
- || chunksize (chunk_at_offset (p, size)) >= av->system_mem;
- }))
+ of system_mem might result in a false positive. Redo the test after
+ getting the lock. */
+ if (!have_lock)
+ {
+ __libc_lock_lock (av->mutex);
+ fail = (chunksize_nomask (chunk_at_offset (p, size)) <= 2 * SIZE_SZ
+ || chunksize (chunk_at_offset (p, size)) >= av->system_mem);
+ __libc_lock_unlock (av->mutex);
+ }
+
+ if (fail)
malloc_printerr ("free(): invalid next size (fast)");
- if (! have_lock)
- __libc_lock_unlock (av->mutex);
}
free_perturb (chunk2mem(p), size - 2 * SIZE_SZ);
diff --git a/malloc/malloc.c b/malloc/malloc.c index c00df205c6004ee5b5d0aee9ffd5130b3c8f9e9f..f4f44400d120188c4d0bece996380e04b35c8fac 100644 --- a/malloc/malloc.c +++ b/malloc/malloc.c @@ -4168,15 +4168,14 @@ _int_free (mstate av, mchunkptr p, int have_lock) >= av->system_mem, 0)) { /* We might not have a lock at this point and concurrent modifications - of system_mem might have let to a false positive. Redo the test - after getting the lock. */ - if (!have_lock - || ({ __libc_lock_lock (av->mutex); - chunksize_nomask (chunk_at_offset (p, size)) <= 2 * SIZE_SZ - || chunksize (chunk_at_offset (p, size)) >= av->system_mem; - })) + of system_mem might result in a false positive. Redo the test after + getting the lock. */ + if (!have_lock) + __libc_lock_lock (av->mutex); + if (chunksize_nomask (chunk_at_offset (p, size)) <= 2 * SIZE_SZ + || chunksize (chunk_at_offset (p, size)) >= av->system_mem) malloc_printerr ("free(): invalid next size (fast)"); - if (! have_lock) + if (!have_lock) __libc_lock_unlock (av->mutex); }