Message ID | DB6PR0801MB20534AC3908A760CBE1EA173834B0@DB6PR0801MB2053.eurprd08.prod.outlook.com |
---|---|
State | New |
Series | Add single-threaded fast path to malloc |
DJ Delorie wrote:
> +      /* Check that the top of the bin is not the record we are going to
> +         add (i.e., double free).  */
> +      if (__builtin_expect (old == p, 0))
> +        malloc_printerr ("double free or corruption (fasttop)");
>
> By moving this out of the multi-thread loop, you're removing a check
> against some of the top chunks in the case where the loop happens more
> than once.

Sure, my goal was to share the checks between the single- and multi-threaded
cases and also keep the loop as simple as possible to reduce the chance of
contention. How often do we iterate more than once? And what are the chances
that when there is contention, the block at the top of the bin is the same
as the one we're trying to insert, but the older one we checked isn't?

If we want to detect double free in almost all cases, wouldn't it be much
easier to check the in-use bit? I can't make much sense of the check on the
next chunk either - that's only done if have_lock == true, so rarely
executed.

> +  if (SINGLE_THREAD_P)
>     {
>
> If you set have_lock to zero here, you can omit the last two chunks of
> this patch.

To true, you mean - yes, I can do that.

Wilco
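For readers following the discussion: the contended path being debated is essentially a lock-free stack push with a double-free check repeated on every retry. A minimal sketch of that shape (hypothetical: it uses GCC's `__atomic` builtins in place of glibc's internal `catomic_compare_and_exchange_val_rel`, and `assert` in place of `malloc_printerr`):

```c
#include <assert.h>
#include <stddef.h>

/* A chunk only needs its forward pointer for this sketch.  */
struct chunk { struct chunk *fd; };

/* Lock-free push of P onto the fastbin *FB, with the top-of-bin
   double-free check re-done on every CAS retry (the pre-patch shape
   DJ is defending).  */
static void fastbin_push (struct chunk **fb, struct chunk *p)
{
  struct chunk *old = __atomic_load_n (fb, __ATOMIC_RELAXED);
  do
    {
      /* Freeing the chunk that is already on top is a double free.  */
      assert (old != p);
      p->fd = old;
    }
  /* On failure, OLD is reloaded with the current top and we retry,
     which is why the check above runs once per iteration.  */
  while (!__atomic_compare_exchange_n (fb, &old, p, 1,
                                       __ATOMIC_RELEASE, __ATOMIC_RELAXED));
}
```

Hoisting the check out of the loop, as the patch does, means a retry iteration no longer re-checks the freshly reloaded top - which is exactly the window DJ is pointing at and whose practical exploitability Wilco is questioning.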
DJ Delorie wrote:
> Wilco Dijkstra <Wilco.Dijkstra@arm.com> writes:
> > How often do we iterate more than once? And what are the chances that
> > when there is contention, the block at the top of the bin is the same
> > as the one we're trying to insert but the older one we checked isn't?
>
> Sadly, these are the cases that Bad Guys can create and take advantage
> of to infect systems. Removing a test should never be done without
> extreme forethought and consideration.

If that's the case, then why isn't a double free checked everywhere? I can
free a block many times without getting any errors - it just gets added to
the tcache without any consistency checks! Freeing a block into the fastbin
when it is already in the fastbin is not detected either. Even repeatedly
freeing the same block after the undetected double free goes completely
undetected, despite the supposed check...

So I think any security feature needs to be well designed and supported.
Doing checks that are completely ineffective doesn't make sense - that just
adds unnecessary overhead while providing no actual security benefit (in
fact it gives a false sense of security, which is even worse...).

Wilco
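Wilco's point about the check's blind spot can be demonstrated directly: the "fasttop" test only compares against the current top of the bin, so a free sequence of A, B, A slips through and corrupts the list into a cycle. A small stand-alone simulation (hypothetical names, not glibc code - `push_with_fasttop_check` returns false where glibc would call `malloc_printerr`):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct chunk { struct chunk *fd; };

static struct chunk *bin;   /* top of a single simulated fastbin */

/* Push P, applying only the "fasttop" check: a double free is flagged
   only when P is already the very top of the bin.  */
static bool push_with_fasttop_check (struct chunk *p)
{
  if (bin == p)
    return false;            /* immediate double free: detected */
  p->fd = bin;
  bin = p;
  return true;               /* anything else: accepted unchecked */
}
```

Freeing A, then B, then A again passes every check, after which the bin is the cycle A -> B -> A and two later allocations return the same chunk - the classic fastbin-dup primitive.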
DJ Delorie wrote:
> > +  if (SINGLE_THREAD_P)
> >     {
>
> If you set have_lock to zero here, you can omit the last two chunks of
> this patch.

Here is the updated version with that change and the original check:

This patch adds single-threaded fast paths to _int_free. Bypass the
explicit locking for larger allocations.

Passes GLIBC tests, OK for commit?

ChangeLog:
2017-10-19  Wilco Dijkstra  <wdijkstr@arm.com>

	* malloc/malloc.c (_int_free): Add SINGLE_THREAD_P paths.

diff --git a/malloc/malloc.c b/malloc/malloc.c
index e220fba83b0f9dc515aef562bdfca6a3ad13d3ea..ca5cfff3a1b1882ae608219fdec973b7f13cbb21 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -4195,24 +4195,34 @@ _int_free (mstate av, mchunkptr p, int have_lock)
     /* Atomically link P to its fastbin: P->FD = *FB; *FB = P;  */
     mchunkptr old = *fb, old2;
-    unsigned int old_idx = ~0u;
-    do
+
+    if (SINGLE_THREAD_P)
       {
-	/* Check that the top of the bin is not the record we are going to add
-	   (i.e., double free).  */
+	/* Check that the top of the bin is not the record we are going to
+	   add (i.e., double free).  */
 	if (__builtin_expect (old == p, 0))
 	  malloc_printerr ("double free or corruption (fasttop)");
-	/* Check that size of fastbin chunk at the top is the same as
-	   size of the chunk that we are adding.  We can dereference OLD
-	   only if we have the lock, otherwise it might have already been
-	   deallocated.  See use of OLD_IDX below for the actual check.  */
-	if (have_lock && old != NULL)
-	  old_idx = fastbin_index(chunksize(old));
-	p->fd = old2 = old;
+	p->fd = old;
+	*fb = p;
       }
-    while ((old = catomic_compare_and_exchange_val_rel (fb, p, old2)) != old2);
-
-    if (have_lock && old != NULL && __builtin_expect (old_idx != idx, 0))
+    else
+      do
+	{
+	  /* Check that the top of the bin is not the record we are going to
+	     add (i.e., double free).  */
+	  if (__builtin_expect (old == p, 0))
+	    malloc_printerr ("double free or corruption (fasttop)");
+	  p->fd = old2 = old;
+	}
+      while ((old = catomic_compare_and_exchange_val_rel (fb, p, old2))
+	     != old2);
+
+    /* Check that size of fastbin chunk at the top is the same as
+       size of the chunk that we are adding.  We can dereference OLD
+       only if we have the lock, otherwise it might have already been
+       allocated again.  */
+    if (have_lock && old != NULL
+	&& __builtin_expect (fastbin_index (chunksize (old)) != idx, 0))
       malloc_printerr ("invalid fastbin entry (free)");
   }
@@ -4221,6 +4231,11 @@ _int_free (mstate av, mchunkptr p, int have_lock)
   */
   else if (!chunk_is_mmapped(p)) {
+
+    /* If we're single-threaded, don't lock the arena.  */
+    if (SINGLE_THREAD_P)
+      have_lock = true;
+
     if (!have_lock)
       __libc_lock_lock (av->mutex);
diff --git a/malloc/malloc.c b/malloc/malloc.c
index 13247c64e2be0779050dfbd0fd25f205ba7184f7..c00df205c6004ee5b5d0aee9ffd5130b3c8f9e9f 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -4188,24 +4188,29 @@ _int_free (mstate av, mchunkptr p, int have_lock)
     /* Atomically link P to its fastbin: P->FD = *FB; *FB = P;  */
     mchunkptr old = *fb, old2;
-    unsigned int old_idx = ~0u;
-    do
+
+    /* Check that the top of the bin is not the record we are going to
+       add (i.e., double free).  */
+    if (__builtin_expect (old == p, 0))
+      malloc_printerr ("double free or corruption (fasttop)");
+
+    if (SINGLE_THREAD_P)
       {
-	/* Check that the top of the bin is not the record we are going to add
-	   (i.e., double free).  */
-	if (__builtin_expect (old == p, 0))
-	  malloc_printerr ("double free or corruption (fasttop)");
-	/* Check that size of fastbin chunk at the top is the same as
-	   size of the chunk that we are adding.  We can dereference OLD
-	   only if we have the lock, otherwise it might have already been
-	   deallocated.  See use of OLD_IDX below for the actual check.  */
-	if (have_lock && old != NULL)
-	  old_idx = fastbin_index(chunksize(old));
-	p->fd = old2 = old;
+	p->fd = old;
+	*fb = p;
       }
-    while ((old = catomic_compare_and_exchange_val_rel (fb, p, old2)) != old2);
-
-    if (have_lock && old != NULL && __builtin_expect (old_idx != idx, 0))
+    else
+      do
+	p->fd = old2 = old;
+      while ((old = catomic_compare_and_exchange_val_rel (fb, p, old2))
+	     != old2);
+
+    /* Check that size of fastbin chunk at the top is the same as
+       size of the chunk that we are adding.  We can dereference OLD
+       only if we have the lock, otherwise it might have already been
+       allocated again.  */
+    if (have_lock && old != NULL
+	&& __builtin_expect (fastbin_index (chunksize (old)) != idx, 0))
       malloc_printerr ("invalid fastbin entry (free)");
   }
@@ -4214,7 +4219,7 @@ _int_free (mstate av, mchunkptr p, int have_lock)
   */
   else if (!chunk_is_mmapped(p)) {
-    if (!have_lock)
+    if (!SINGLE_THREAD_P && !have_lock)
       __libc_lock_lock (av->mutex);

     nextchunk = chunk_at_offset(p, size);
@@ -4329,7 +4334,7 @@ _int_free (mstate av, mchunkptr p, int have_lock)
     }
   }
-  if (!have_lock)
+  if (!SINGLE_THREAD_P && !have_lock)
     __libc_lock_unlock (av->mutex);
 }
 /*
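The `!SINGLE_THREAD_P && !have_lock` pattern in the final version elides the arena lock entirely while the process is known to be single-threaded. A stand-alone sketch of the same idea (hypothetical: a plain flag and a pthread mutex stand in for glibc's thread-descriptor-based `SINGLE_THREAD_P` and internal `__libc_lock_*` macros, with a counter added purely as instrumentation):

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for glibc's SINGLE_THREAD_P; glibc reads a field of the
   thread descriptor, so no extra load from a shared global occurs.  */
static bool single_thread_p = true;

static pthread_mutex_t arena_lock = PTHREAD_MUTEX_INITIALIZER;
static int locks_taken;   /* instrumentation for this sketch only */

/* The shape of the committed change: take the lock only when the
   process is multi-threaded and the caller doesn't already hold it.  */
static void arena_op (bool have_lock)
{
  if (!single_thread_p && !have_lock)
    {
      pthread_mutex_lock (&arena_lock);
      locks_taken++;
    }

  /* ... consolidate the freed chunk into the arena ... */

  if (!single_thread_p && !have_lock)
    pthread_mutex_unlock (&arena_lock);
}
```

Note the flag is only checked, never written, on this path: once a second thread is created it flips permanently to false, so the lock/unlock decisions at entry and exit always agree.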