Message ID | 20180731084936.g4yw6wnvt677miti@dcvr
State | New
Series | [RFC/PoC] malloc: use wfcqueue to speed up remote frees
On 07/31/2018 04:49 AM, Eric Wong wrote:
> The goal is to reduce contention and improve locality of cross-thread
> malloc/free traffic common to IPC systems (including Userspace-RCU) and
> some garbage-collected runtimes.

Eric,

This looks like a really interesting contribution!

For anyone reviewing this patch I just want to point out that Eric *does*
have FSF copyright assignment for glibc, so review can proceed normally
for this patch. Thank you!

I would like to see urcu used within glibc to provide better data
structures for key thread, dynamic loader, and malloc algorithms. So if
anything I think this is a move in the right direction.

It would be interesting to roll your RFC into Fedora Rawhide for 6 months
and see if we hit any problems.

I have a few high-level questions:

- Can you explain the RSS reduction given this patch? You might think
  that just adding the frees to a queue wouldn't result in any RSS
  gains. However, you are calling _int_free a lot in a row, and that
  deinterleaving may help (you really want a vector free API here so you
  don't walk all the lists so many times; tcache had the same problem,
  but in reverse, for finding chunks).

- Adding urcu as a build-time dependency is not acceptable for
  bootstrap; instead we would bundle a copy of urcu and keep it in sync
  with upstream. Would that make your work easier?

- What problems are you having with `make -j4 check'? Try master and
  report back. We are about to release 2.28, so it should build and
  pass.

Thank you again for testing this out.
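For readers unfamiliar with liburcu, the queue discipline the patch builds
on can be shown in a small standalone program: producers enqueue wait-free,
and the consumer splices the whole queue into a private list before
draining it, so producers are never blocked. This is only a sketch; it
assumes cds_wfcq_enqueue accepts a bare struct __cds_wfcq_head (as the
patch itself relies on), and error handling is omitted.

#define _LGPL_SOURCE /* use the inline wfcqueue implementation */
#include <urcu/wfcqueue.h>
#include <urcu/compiler.h> /* caa_container_of */
#include <stdio.h>
#include <stdlib.h>

struct item {
  struct cds_wfcq_node node;
  int value;
};

static struct __cds_wfcq_head head; /* consumer-owned, as in the patch */
static struct cds_wfcq_tail tail;   /* written by producers */

int main (void)
{
  struct cds_wfcq_node *node, *n;
  struct __cds_wfcq_head tmp_head;
  struct cds_wfcq_tail tmp_tail;

  __cds_wfcq_init (&head, &tail);

  /* producer side: wait-free enqueue (cf. remote_free_begin) */
  for (int i = 0; i < 3; i++)
    {
      struct item *it = malloc (sizeof (*it));
      it->value = i;
      cds_wfcq_node_init (&it->node);
      cds_wfcq_enqueue (&head, &tail, &it->node);
    }

  /* consumer side: splice everything into a private queue, then drain
     without holding up producers (cf. remote_free_step) */
  __cds_wfcq_init (&tmp_head, &tmp_tail);
  if (__cds_wfcq_splice_nonblocking (&tmp_head, &tmp_tail, &head, &tail)
      == CDS_WFCQ_RET_DEST_EMPTY)
    __cds_wfcq_for_each_blocking_safe (&tmp_head, &tmp_tail, node, n)
      {
        struct item *it = caa_container_of (node, struct item, node);
        printf ("%d\n", it->value);
        free (it);
      }
  return 0;
}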
Carlos O'Donell <carlos@redhat.com> wrote:
> On 07/31/2018 04:49 AM, Eric Wong wrote:
> > The goal is to reduce contention and improve locality of cross-thread
> > malloc/free traffic common to IPC systems (including Userspace-RCU) and
> > some garbage-collected runtimes.
>
> Eric,
>
> This looks like a really interesting contribution!
>
> For anyone reviewing this patch I just want to point out that Eric *does*
> have FSF copyright assignment for glibc, so review can proceed normally
> for this patch. Thank you!

Yep, it's been a while.

> I would like to see urcu used within glibc to provide better data
> structures for key thread, dynamic loader, and malloc algorithms. So if
> anything I think this is a move in the right direction.

That's great to hear :>

> It would be interesting to roll your RFC into Fedora Rawhide for 6 months
> and see if we hit any problems.

Sure, sounds good to me.

> I have a few high-level questions:
>
> - Can you explain the RSS reduction given this patch? You
>   might think that just adding the frees to a queue wouldn't
>   result in any RSS gains.

At least two reasons I can see:

1) With lock contention, the freeing thread can lose to the
   allocating thread. This makes the allocating thread hit
   sysmalloc, since it prevented the freeing thread from doing
   its job. sysmalloc is the slow path, so the lock gets held
   even longer and the problem compounds from there.

2) Thread caching - memory ends up in the wrong thread and
   could never get used in some cases. Fortunately this is
   bounded, but still a waste.

I'm still new to the code, but it looks like threads are pinned
to the arena and the memory used for arenas never gets released.
Is that correct?

I was wondering if there was another possibility: the allocating
thread gives up the arena and creates a new one because the
freeing thread locked it, but I don't think that's the case.

Also, if I spawn a bunch of threads and get a bunch of arenas
early in the program lifetime, and then only have a few threads
later, there can be a lot of idle arenas.

> However, you are calling _int_free a lot in a row, and that
> deinterleaving may help (you really want a vector free API here
> so you don't walk all the lists so many times; tcache had the
> same problem, but in reverse, for finding chunks).

Maybe... I think in the ideal case, the number of allocations
and frees is close to 1:1, so the loop is kept short.

What may be worth trying is to bypass _int_free for cases where
a chunk can fulfill the allocation which triggers it. Delaying
or avoiding consolidation could worsen fragmentation, though.

> - Adding urcu as a build-time dependency is not acceptable for
>   bootstrap; instead we would bundle a copy of urcu and keep it
>   in sync with upstream. Would that make your work easier?

Yes, bundling that sounds great. I assume it's something for
you or one of the regular contributors to work on (build systems
scare me :x)

> - What problems are you having with `make -j4 check'? Try
>   master and report back. We are about to release 2.28, so it
>   should build and pass.

My fault. It seems like tests aren't automatically rerun when I
change the code, so some of my broken work-in-progress changes
ended up being false positives :x. When working on this, I made
the mistake of doing remote_free_step inside malloc_consolidate,
which could recurse into _int_free or _int_malloc.

I guess I should remove the *.test-result files before rerunning
tests?

I still get:

  FAIL: nptl/tst-sched1

"Create failed"

I guess my system was overloaded. pthread_create failures seemed
to happen a lot for me when working on Ruby, too, and POSIX
forcing EAGAIN makes it hard to diagnose :< (ulimit -u 47999 and
12GB RAM)

Removing the test-result and retrying seems OK.

  FAIL: resolv/tst-resolv-ai_idn
  FAIL: resolv/tst-resolv-ai_idn-latin1

  Not root, so no CLONE_NEWUTS

So I think that's expected...
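The kind of workload being discussed can be reproduced with a small
producer/consumer test: one thread allocates and hands blocks off, the
other only frees them, so every free is remote to the allocating arena.
This sketch is mine, not from the patch; the buffer size and iteration
counts are arbitrary.

#include <pthread.h>
#include <sched.h>
#include <stdlib.h>

#define SLOTS 1024
#define ITERS 10000000

static void *ring[SLOTS];
static size_t head, tail; /* monotonic; index via % SLOTS */
static pthread_mutex_t lk = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;

/* consumer: frees chunks owned by the producer's arena */
static void *freer (void *unused)
{
  for (size_t i = 0; i < ITERS; i++)
    {
      pthread_mutex_lock (&lk);
      while (head == tail)
        pthread_cond_wait (&cv, &lk);
      void *p = ring[head++ % SLOTS];
      pthread_mutex_unlock (&lk);
      free (p); /* cross-thread free: contends on the producer's arena */
    }
  return NULL;
}

int main (void)
{
  pthread_t t;
  pthread_create (&t, NULL, freer, NULL);
  for (size_t i = 0; i < ITERS; i++)
    {
      void *p = malloc (128 + (i % 512)); /* varied sizes to defeat reuse */
      pthread_mutex_lock (&lk);
      while (tail - head == SLOTS) /* ring full: wait for the freer */
        {
          pthread_mutex_unlock (&lk);
          sched_yield ();
          pthread_mutex_lock (&lk);
        }
      ring[tail++ % SLOTS] = p;
      pthread_cond_signal (&cv);
      pthread_mutex_unlock (&lk);
    }
  pthread_join (t, NULL);
  return 0;
}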
On 07/31/2018 07:18 PM, Eric Wong wrote:
>> - Can you explain the RSS reduction given this patch? You
>>   might think that just adding the frees to a queue wouldn't
>>   result in any RSS gains.
>
> At least two reasons I can see:
>
> 1) With lock contention, the freeing thread can lose to the
>    allocating thread. This makes the allocating thread hit
>    sysmalloc since it prevented the freeing thread from doing
>    its job. sysmalloc is the slow path, so the lock gets held
>    even longer and the problem compounds from there.

How does this impact RSS? It would only block the remote thread
from freeing in a timely fashion, but it would eventually make
progress.

> 2) thread caching - memory ends up in the wrong thread and
>    could never get used in some cases. Fortunately this is
>    bounded, but still a waste.

We can't have memory end up in the wrong thread. The remote thread
computes the arena from the chunk it has, and then frees back to
the appropriate arena, even if it's not the arena that the thread
is attached to.

> I'm still new to the code, but it looks like threads are pinned
> to the arena and the memory used for arenas never gets released.
> Is that correct?

Threads are pinned to their arenas, but they can move in the event
of allocation failures, particularly to the main arena to attempt
sbrk to get more memory.

> I was wondering if there was another possibility: the allocating
> thread gives up the arena and creates a new one because the
> freeing thread locked it, but I don't think that's the case.

No.

> Also, if I spawn a bunch of threads and get a bunch of
> arenas early in the program lifetime, and then only have a few
> threads later, there can be a lot of idle arenas.

Yes. That is true. We don't coalesce arenas to match the thread
demand.

>> However, you are calling _int_free a lot in a row, and that
>> deinterleaving may help (you really want a vector free API here
>> so you don't walk all the lists so many times; tcache had the
>> same problem, but in reverse, for finding chunks).
>
> Maybe... I think in the ideal case, the number of allocations
> and frees is close to 1:1, so the loop is kept short.
>
> What may be worth trying is to bypass _int_free for cases where
> a chunk can fulfill the allocation which triggers it. Delaying
> or avoiding consolidation could worsen fragmentation, though.

Right.

>> - Adding urcu as a build-time dependency is not acceptable for
>>   bootstrap; instead we would bundle a copy of urcu and keep it
>>   in sync with upstream. Would that make your work easier?
>
> Yes, bundling that sounds great. I assume it's something for
> you or one of the regular contributors to work on (build systems
> scare me :x)

Yes, that is something we'd have to do.

>> - What problems are you having with `make -j4 check'? Try
>>   master and report back. We are about to release 2.28, so it
>>   should build and pass.
>
> My fault. It seems like tests aren't automatically rerun when I
> change the code, so some of my broken work-in-progress changes
> ended up being false positives :x. When working on this, I made
> the mistake of doing remote_free_step inside malloc_consolidate,
> which could recurse into _int_free or _int_malloc.

This depends a bit on what you touch.

> I guess I should remove the *.test-result files before rerunning
> tests?

Yes, that will definitely force the test to be re-run.

> I still get:
>
>   FAIL: nptl/tst-sched1
>
> "Create failed"
>
> I guess my system was overloaded. pthread_create failures
> seemed to happen a lot for me when working on Ruby, too, and
> POSIX forcing EAGAIN makes it hard to diagnose :<
> (ulimit -u 47999 and 12GB RAM)
>
> Removing the test-result and retrying seems OK.

OK. This one is new. There are a few tests where pthread_create
fails with EAGAIN because the kernel can't reap the children fast
enough.

>   FAIL: resolv/tst-resolv-ai_idn
>   FAIL: resolv/tst-resolv-ai_idn-latin1
>
>   Not root, so no CLONE_NEWUTS
>
> So I think that's expected...

Agreed.
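As background, the chunk-to-arena computation Carlos mentions is roughly
the following (paraphrased from malloc/arena.c; details vary across glibc
versions, so treat this as a sketch rather than the literal source):

/* Non-main arenas carve memory out of HEAP_MAX_SIZE-aligned mmap'd
   heaps, so masking the chunk address recovers the owning heap_info,
   whose ar_ptr names the arena.  Chunks from the main arena (sbrk'd
   memory) are flagged in the chunk's size word instead. */
#define heap_for_ptr(ptr) \
  ((heap_info *) ((unsigned long) (ptr) & ~(HEAP_MAX_SIZE - 1)))
#define arena_for_chunk(ptr) \
  (chunk_main_arena (ptr) ? &main_arena : heap_for_ptr (ptr)->ar_ptr)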
Carlos O'Donell <carlos@redhat.com> wrote:
> On 07/31/2018 07:18 PM, Eric Wong wrote:
> >> - Can you explain the RSS reduction given this patch? You
> >>   might think that just adding the frees to a queue wouldn't
> >>   result in any RSS gains.
> >
> > At least two reasons I can see:
> >
> > 1) With lock contention, the freeing thread can lose to the
> >    allocating thread. This makes the allocating thread hit
> >    sysmalloc since it prevented the freeing thread from doing
> >    its job. sysmalloc is the slow path, so the lock gets held
> >    even longer and the problem compounds from there.
>
> How does this impact RSS? It would only block the remote thread
> from freeing in a timely fashion, but it would eventually make
> progress.

Blocking the freeing thread causes the allocating thread to
sysmalloc more. If the freeing thread could always beat the
allocating thread, then the freed memory would be available in
the arena by the time the allocating thread takes the lock.

> > 2) thread caching - memory ends up in the wrong thread and
> >    could never get used in some cases. Fortunately this is
> >    bounded, but still a waste.
>
> We can't have memory end up in the wrong thread. The remote thread
> computes the arena from the chunk it has, and then frees back to
> the appropriate arena, even if it's not the arena that the thread
> is attached to.

Really? I see:

  __libc_free -> MAYBE_INIT_TCACHE && _int_free -> tcache_put

I am not seeing anything in _int_free which makes the tcache_put
arena-aware. If we drop MAYBE_INIT_TCACHE from __libc_free,
then the tcache_put could be avoided.

> > I'm still new to the code, but it looks like threads are pinned
> > to the arena and the memory used for arenas never gets released.
> > Is that correct?
>
> Threads are pinned to their arenas, but they can move in the event
> of allocation failures, particularly to the main arena to attempt
> sbrk to get more memory.

OK.

> > I was wondering if there was another possibility: the allocating
> > thread gives up the arena and creates a new one because the
> > freeing thread locked it, but I don't think that's the case.
>
> No.
>
> > Also, if I spawn a bunch of threads and get a bunch of
> > arenas early in the program lifetime, and then only have a few
> > threads later, there can be a lot of idle arenas.
>
> Yes. That is true. We don't coalesce arenas to match the thread
> demand.

Eep :< If contention can be avoided (which tcache seems to
work well for), limiting arenas to CPU count seems desirable and
worth trying.

<snip>

> >> - Adding urcu as a build-time dependency is not acceptable for
> >>   bootstrap; instead we would bundle a copy of urcu and keep it
> >>   in sync with upstream. Would that make your work easier?
> >
> > Yes, bundling that sounds great. I assume it's something for
> > you or one of the regular contributors to work on (build systems
> > scare me :x)
>
> Yes, that is something we'd have to do.

OK, I noticed my patch fails conformance tests because (despite
my use of __cds_wfcq_splice_nonblocking) it references poll(),
even though poll() is on an impossible code path:

  __cds_wfcq_splice_nonblocking -> ___cds_wfcq_splice
    -> ___cds_wfcq_busy_wait -> poll

The poll call is impossible because the `blocking' parameter is 0;
but I guess the linker doesn't know that?

> >> - What problems are you having with `make -j4 check'? Try
> >>   master and report back. We are about to release 2.28, so it
> >>   should build and pass.
> >
> > My fault. It seems like tests aren't automatically rerun when I
> > change the code, so some of my broken work-in-progress changes
> > ended up being false positives :x. When working on this, I made
> > the mistake of doing remote_free_step inside malloc_consolidate,
> > which could recurse into _int_free or _int_malloc.
>
> This depends a bit on what you touch.

Alright, I'll keep that in mind. Thanks!
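For reference, the tcache fast path Eric is pointing at looks roughly
like the following (condensed from the 2.27-era _int_free; not the
literal source). Nothing in it consults the chunk's owning arena, so a
remote free parks the chunk in the freeing thread's cache:

#if USE_TCACHE
  /* condensed sketch of _int_free's tcache fast path (glibc ~2.27) */
  {
    size_t tc_idx = csize2tidx (size);

    if (tcache != NULL
        && tc_idx < mp_.tcache_bins
        && tcache->counts[tc_idx] < mp_.tcache_count)
      {
        /* cached in *this* thread's tcache, regardless of which
           arena the chunk p belongs to */
        tcache_put (p, tc_idx);
        return;
      }
  }
#endif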
On 08/01/2018 02:23 AM, Eric Wong wrote:
> Carlos O'Donell <carlos@redhat.com> wrote:
>> On 07/31/2018 07:18 PM, Eric Wong wrote:
>>>> - Can you explain the RSS reduction given this patch? You
>>>>   might think that just adding the frees to a queue wouldn't
>>>>   result in any RSS gains.
>>>
>>> At least two reasons I can see:
>>>
>>> 1) With lock contention, the freeing thread can lose to the
>>>    allocating thread. This makes the allocating thread hit
>>>    sysmalloc since it prevented the freeing thread from doing
>>>    its job. sysmalloc is the slow path, so the lock gets held
>>>    even longer and the problem compounds from there.
>>
>> How does this impact RSS? It would only block the remote thread
>> from freeing in a timely fashion, but it would eventually make
>> progress.
>
> Blocking the freeing thread causes the allocating thread to
> sysmalloc more. If the freeing thread could always beat the
> allocating thread, then the freed memory would be available in
> the arena by the time the allocating thread takes the lock.

I see what you mean now. Yes, that could reduce RSS by reducing the
time between when the remote thread frees memory and when the
producer thread (let's call it that) can reuse the returned chunks.

>>> 2) thread caching - memory ends up in the wrong thread and
>>>    could never get used in some cases. Fortunately this is
>>>    bounded, but still a waste.
>>
>> We can't have memory end up in the wrong thread. The remote thread
>> computes the arena from the chunk it has, and then frees back to
>> the appropriate arena, even if it's not the arena that the thread
>> is attached to.
>
> Really? I see:
>
>   __libc_free -> MAYBE_INIT_TCACHE && _int_free -> tcache_put
>
> I am not seeing anything in _int_free which makes the tcache_put
> arena-aware. If we drop MAYBE_INIT_TCACHE from __libc_free,
> then the tcache_put could be avoided.

Thank you, that clarifies it for me; I was glossing over tcache.
Yes, the tcache layer doesn't care where the block came from and
will happily cache it.

In a producer-consumer model, though, as this seems to be the
example from which we are drawing parallels, the consumer rarely
needs to allocate anything, so yes, the tcache effectively slows
the initial rate of frees to the producer thread, but only to a
limit (as you note).

>>> I'm still new to the code, but it looks like threads are pinned
>>> to the arena and the memory used for arenas never gets released.
>>> Is that correct?
>>
>> Threads are pinned to their arenas, but they can move in the event
>> of allocation failures, particularly to the main arena to attempt
>> sbrk to get more memory.
>
> OK.
>
>>> I was wondering if there was another possibility: the allocating
>>> thread gives up the arena and creates a new one because the
>>> freeing thread locked it, but I don't think that's the case.
>>
>> No.
>
>>> Also, if I spawn a bunch of threads and get a bunch of
>>> arenas early in the program lifetime, and then only have a few
>>> threads later, there can be a lot of idle arenas.
>>
>> Yes. That is true. We don't coalesce arenas to match the thread
>> demand.
>
> Eep :< If contention can be avoided (which tcache seems to
> work well for), limiting arenas to CPU count seems desirable and
> worth trying.

Agreed.

In general it is not as bad as you think.

An arena is made up of a chain of heaps, each an mmap'd block, and
if we can manage to free an entire heap then we unmap the heap, and
if we're lucky we can manage to free down the entire arena
(_int_free -> large chunk / consolidate -> heap_trim -> shrink_heap).

So we might just end up with a large number of arenas that don't
have very much allocated at all, but are all on the arena free list
waiting for a thread to attach to them to reduce overall contention.

I agree that it would be *better* if we had one arena per CPU and
each thread could easily determine the CPU it was on (via a
restartable sequence) and then allocate CPU-local memory to work
with (the best you can do, ignoring NUMA effects).

> <snip>
>
>>>> - Adding urcu as a build-time dependency is not acceptable for
>>>>   bootstrap; instead we would bundle a copy of urcu and keep it
>>>>   in sync with upstream. Would that make your work easier?
>>>
>>> Yes, bundling that sounds great. I assume it's something for
>>> you or one of the regular contributors to work on (build systems
>>> scare me :x)
>>
>> Yes, that is something we'd have to do.
>
> OK, I noticed my patch fails conformance tests because (despite
> my use of __cds_wfcq_splice_nonblocking) it references poll(),
> even though poll() is on an impossible code path:
>
>   __cds_wfcq_splice_nonblocking -> ___cds_wfcq_splice
>     -> ___cds_wfcq_busy_wait -> poll
>
> The poll call is impossible because the `blocking' parameter is 0;
> but I guess the linker doesn't know that?

Correct. We can fix that easily at a later date. Don't worry about it.
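A rough sketch of that per-CPU arena idea, using sched_getcpu() as a
portable stand-in (a real implementation would use rseq so the CPU id
cannot go stale between lookup and use; cpu_arenas and n_cpu_arenas are
hypothetical names, not glibc internals):

#define _GNU_SOURCE
#include <sched.h>

struct malloc_state; /* opaque here */

/* hypothetical: a fixed arena table sized to the CPU count at startup */
extern struct malloc_state *cpu_arenas[];
extern int n_cpu_arenas;

static struct malloc_state *
arena_for_this_cpu (void)
{
  int cpu = sched_getcpu (); /* may be stale by the time it's used;
                                rseq would close that race */
  if (cpu < 0)
    cpu = 0; /* sched_getcpu can fail; fall back to arena 0 */
  return cpu_arenas[cpu % n_cpu_arenas];
}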
Carlos O'Donell <carlos@redhat.com> wrote:
> On 08/01/2018 02:23 AM, Eric Wong wrote:
> > Carlos O'Donell <carlos@redhat.com> wrote:
> >> On 07/31/2018 07:18 PM, Eric Wong wrote:
> >>> Also, if I spawn a bunch of threads and get a bunch of
> >>> arenas early in the program lifetime, and then only have a few
> >>> threads later, there can be a lot of idle arenas.
> >>
> >> Yes. That is true. We don't coalesce arenas to match the thread
> >> demand.
> >
> > Eep :< If contention can be avoided (which tcache seems to
> > work well for), limiting arenas to CPU count seems desirable and
> > worth trying.
>
> Agreed.
>
> In general it is not as bad as you think.
>
> An arena is made up of a chain of heaps, each an mmap'd block, and
> if we can manage to free an entire heap then we unmap the heap,
> and if we're lucky we can manage to free down the entire arena
> (_int_free -> large chunk / consolidate -> heap_trim -> shrink_heap).
>
> So we might just end up with a large number of arenas that don't
> have very much allocated at all, but are all on the arena free list
> waiting for a thread to attach to them to reduce overall contention.
>
> I agree that it would be *better* if we had one arena per CPU and
> each thread could easily determine the CPU it was on (via a
> restartable sequence) and then allocate CPU-local memory to work
> with (the best you can do, ignoring NUMA effects).

Thanks for the info on arenas. One problem for Ruby is we get
many threads[1], and they create allocations of varying
lifetimes. All this while malloc contention is rarely a
problem in Ruby because of the global VM lock (GVL).

Even without restartable sequences, I was wondering if lfstack
(also in urcu) could be used for sharing/distributing arenas
between threads. This would require tcache to avoid retries on
lfstack pop/push. Much less straightforward than using wfcqueue
for frees with this patch, though :)

[1] we only had green threads back in Ruby 1.8, and I guess many
    Rubyists got used to the idea that they could have many threads
    cheaply. Ruby 1.9+ moved to 100% native threads, so I'm also
    trying to reintroduce green threads as an option back into Ruby
    (but still keeping native threads)

> > OK, I noticed my patch fails conformance tests because (despite
> > my use of __cds_wfcq_splice_nonblocking) it references poll(),
> > even though poll() is on an impossible code path:
> >
> >   __cds_wfcq_splice_nonblocking -> ___cds_wfcq_splice
> >     -> ___cds_wfcq_busy_wait -> poll
> >
> > The poll call is impossible because the `blocking' parameter is 0;
> > but I guess the linker doesn't know that?
>
> Correct. We can fix that easily at a later date. Don't worry about it.

Heh, a bit dirty, but #define-ing poll away seems to work :)

diff --git a/malloc/malloc.c b/malloc/malloc.c
index 40d61e45db..89e675c7a0 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -247,6 +247,11 @@
 /* For SINGLE_THREAD_P.  */
 #include <sysdep-cancel.h>
 
+/* prevent wfcqueue.h from including poll.h and linking to it */
+#include <poll.h>
+#undef poll
+#define poll(a,b,c) assert(0 && "should not be called")
+
 #define _LGPL_SOURCE /* allows inlines */
 #include <urcu/wfcqueue.h>
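Sketching Eric's lfstack idea with liburcu's <urcu/lfstack.h> API: idle
arenas are pushed onto a lock-free stack and popped by threads that need
one. The idle_arena bookkeeping here is hypothetical, and
cds_lfs_pop_blocking may spin briefly when racing a concurrent push,
which is exactly the retry concern mentioned above.

#include <urcu/lfstack.h>
#include <urcu/compiler.h> /* caa_container_of */

struct malloc_state; /* opaque here */

/* hypothetical bookkeeping for an arena sitting on the free list */
struct idle_arena {
  struct cds_lfs_node node;
  struct malloc_state *arena;
};

static struct cds_lfs_stack idle_arenas; /* cds_lfs_init () at startup */

/* a thread detaching from its arena publishes it (lock-free push) */
static void
arena_release (struct idle_arena *ia)
{
  cds_lfs_node_init (&ia->node);
  cds_lfs_push (&idle_arenas, &ia->node);
}

/* a thread needing an arena tries to grab an idle one first */
static struct malloc_state *
arena_acquire (void)
{
  struct cds_lfs_node *n = cds_lfs_pop_blocking (&idle_arenas);

  return n ? caa_container_of (n, struct idle_arena, node)->arena : NULL;
}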
On 08/01/2018 05:26 AM, Eric Wong wrote:
> Carlos O'Donell <carlos@redhat.com> wrote:
>> On 08/01/2018 02:23 AM, Eric Wong wrote:
>>> Carlos O'Donell <carlos@redhat.com> wrote:
>>>> On 07/31/2018 07:18 PM, Eric Wong wrote:
>>>>> Also, if I spawn a bunch of threads and get a bunch of
>>>>> arenas early in the program lifetime, and then only have a few
>>>>> threads later, there can be a lot of idle arenas.
>>>>
>>>> Yes. That is true. We don't coalesce arenas to match the thread
>>>> demand.
>>>
>>> Eep :< If contention can be avoided (which tcache seems to
>>> work well for), limiting arenas to CPU count seems desirable and
>>> worth trying.
>>
>> Agreed.
>>
>> In general it is not as bad as you think.
>>
>> An arena is made up of a chain of heaps, each an mmap'd block, and
>> if we can manage to free an entire heap then we unmap the heap,
>> and if we're lucky we can manage to free down the entire arena
>> (_int_free -> large chunk / consolidate -> heap_trim -> shrink_heap).
>>
>> So we might just end up with a large number of arenas that don't
>> have very much allocated at all, but are all on the arena free list
>> waiting for a thread to attach to them to reduce overall contention.
>>
>> I agree that it would be *better* if we had one arena per CPU and
>> each thread could easily determine the CPU it was on (via a
>> restartable sequence) and then allocate CPU-local memory to work
>> with (the best you can do, ignoring NUMA effects).
>
> Thanks for the info on arenas. One problem for Ruby is we get
> many threads[1], and they create allocations of varying
> lifetimes. All this while malloc contention is rarely a
> problem in Ruby because of the global VM lock (GVL).

The allocations of varying lifetimes will make it impossible to free
down a heap from a heap-based allocator. This is a serious issue with
heap-based allocators, and it will impact the max RSS that you'll
need to reach steady state. I don't think it's really a tractable
problem; I don't know how to deinterlace the chunks which have
differing lifetimes.

Your only chance is to take the existing large/small/fast bin
machinery and, instead of mixing them in one heap, split them into
one smaller heap each, and see how that goes, i.e. adopt size
classes, but keep it heap-based.

> Even without restartable sequences, I was wondering if lfstack
> (also in urcu) could be used for sharing/distributing arenas
> between threads. This would require tcache to avoid retries on
> lfstack pop/push.

We absolutely need better balancing across arenas; even now we don't
do any rebalancing based on load or attach count. We should. That
problem would go away if you just used restartable sequences to find
your cpu, map that to the local arena for the cpu, and allocate
there.

> Much less straightforward than using wfcqueue for frees with
> this patch, though :)

Correct.

> Heh, a bit dirty, but #define-ing poll away seems to work :)
>
> diff --git a/malloc/malloc.c b/malloc/malloc.c
> index 40d61e45db..89e675c7a0 100644
> --- a/malloc/malloc.c
> +++ b/malloc/malloc.c
> @@ -247,6 +247,11 @@
>  /* For SINGLE_THREAD_P.  */
>  #include <sysdep-cancel.h>
>  
> +/* prevent wfcqueue.h from including poll.h and linking to it */
> +#include <poll.h>
> +#undef poll
> +#define poll(a,b,c) assert(0 && "should not be called")
> +

Call __poll instead. That should fix the issue.

>  #define _LGPL_SOURCE /* allows inlines */
>  #include <urcu/wfcqueue.h>
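A sketch of what that substitution might look like in malloc.c, assuming
__poll (the internal glibc alias that avoids referencing the public
symbol) is visible at this point; the macro parameter names are mine:

/* route wfcqueue's (unreachable) poll call to the internal alias so no
   public symbol reference leaks into the conformance namespace */
#include <poll.h>
#undef poll
#define poll(fds, nfds, timeout) __poll (fds, nfds, timeout)

#define _LGPL_SOURCE /* allows inlines */
#include <urcu/wfcqueue.h>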
Carlos O'Donell <carlos@redhat.com> wrote:
> - Adding urcu as a build-time dependency is not acceptable for
>   bootstrap; instead we would bundle a copy of urcu and keep it in
>   sync with upstream. Would that make your work easier?

Any ETA or update on urcu bundling? It's not urgent and I've been
busy with other projects, too; but it could be helpful for planning
around other work.

Anyways, I'm pretty satisfied with my original patch and will use the
__poll call in my next iteration. My main concern right now is not
hurting the existing fast cases, so I will need to look into more
benchmarks in the list archives and wiki...

Thanks.
Carlos O'Donell <carlos@redhat.com> wrote:
> >> - Adding urcu as a build-time dependency is not acceptable for
> >>   bootstrap; instead we would bundle a copy of urcu and keep it
> >>   in sync with upstream. Would that make your work easier?
>
> Eric Wong wrote:
> > Yes, bundling that sounds great. I assume it's something for
> > you or one of the regular contributors to work on (build systems
> > scare me :x)
>
> Yes, that is something we'd have to do.

Hi, bringing this topic from 2018 up again (+Cc Mathieu):
https://inbox.sourceware.org/libc-alpha/c061de55-cc2a-88fe-564b-2ea9c4a7e632@redhat.com/T/

I'm wondering if URCU-in-glibc is still on the table. I'm also
considering learning C11 atomics and deriving a standalone
wfcqueue w/o URCU atomics.

Thanks.
On 2023-01-17 01:42, Eric Wong wrote:
> Carlos O'Donell <carlos@redhat.com> wrote:
>>>> - Adding urcu as a build-time dependency is not acceptable for
>>>>   bootstrap; instead we would bundle a copy of urcu and keep it
>>>>   in sync with upstream. Would that make your work easier?
>>
>> Eric Wong wrote:
>>> Yes, bundling that sounds great. I assume it's something for
>>> you or one of the regular contributors to work on (build systems
>>> scare me :x)
>>
>> Yes, that is something we'd have to do.
>
> Hi, bringing this topic from 2018 up again (+Cc Mathieu):
> https://inbox.sourceware.org/libc-alpha/c061de55-cc2a-88fe-564b-2ea9c4a7e632@redhat.com/T/
>
> I'm wondering if URCU-in-glibc is still on the table. I'm also
> considering learning C11 atomics and deriving a standalone
> wfcqueue w/o URCU atomics.

Hi Eric,

I'm very much interested to contribute portions of liburcu to glibc.
I think what we would need at this point is to document where we see
that liburcu infrastructure can improve glibc, and create a gradual
integration roadmap based on priorities. We should identify the
stakeholders interested in seeing this done, and then we can discuss
the time-frame and resources available for this project.

That being said, I suspect we'd want to cover a few pieces of
technology in this list, namely:

- Restartable Sequences (rseq system call),
- membarrier system call,
- liburcu and libside [1,2] RCU implementations,
- liburcu data structures (e.g. wfcqueue).

I would be tempted to go for an approach closer to the RCU
implementation I have done in the libside project for glibc, because
it supports having multiple RCU grace period domains per process,
which allows to nicely split the locking dependencies into
sub-domains, and therefore lessens the chances of deadlocks due to
nesting of a global-domain RCU read-side/synchronize_rcu and locking
(e.g. mutexes).

Thanks,

Mathieu

[1] https://github.com/efficios/libside/blob/master/src/rcu.c
[2] https://github.com/efficios/libside/blob/master/src/rcu.h
On 2023-01-17 01:42, Eric Wong wrote:
> Carlos O'Donell <carlos@redhat.com> wrote:
>>>> - Adding urcu as a build-time dependency is not acceptable for
>>>>   bootstrap; instead we would bundle a copy of urcu and keep it
>>>>   in sync with upstream. Would that make your work easier?
>>
>> Eric Wong wrote:
>>> Yes, bundling that sounds great. I assume it's something for
>>> you or one of the regular contributors to work on (build systems
>>> scare me :x)
>>
>> Yes, that is something we'd have to do.
>
> Hi, bringing this topic from 2018 up again (+Cc Mathieu):
> https://inbox.sourceware.org/libc-alpha/c061de55-cc2a-88fe-564b-2ea9c4a7e632@redhat.com/T/
>
> I'm wondering if URCU-in-glibc is still on the table. I'm also
> considering learning C11 atomics and deriving a standalone
> wfcqueue w/o URCU atomics.

I've done a quick review of your proposed patch, and there is one
thing that I'm concerned about: forward progress of
remote_free_finish(). AFAIU, if we have a steady flow of
remote_free_begin() calls, it can prevent forward progress of the
arena owner.

When remote_free_step() captures the queue (splice) and processes it,
it returns whether it has processed any elements, and the caller
attempts to splice again if there was anything present. What I wonder
is why it is required that the caller splice queue elements that were
queued _after_ the beginning of remote_free_finish(). Can't we simply
leave those to the next (eventual) remote_free_finish calls?

If we do that change, remote_free_finish would only be needed when
tearing down an arena, and upon allocation only a single call to
remote_free_step() would be needed.

Thoughts?

Thanks,

Mathieu
On 2023-01-18 09:53, Mathieu Desnoyers wrote:
> On 2023-01-17 01:42, Eric Wong wrote:
>> Carlos O'Donell <carlos@redhat.com> wrote:
>>>>> - Adding urcu as a build-time dependency is not acceptable for
>>>>>   bootstrap; instead we would bundle a copy of urcu and keep it
>>>>>   in sync with upstream. Would that make your work easier?
>>>
>>> Eric Wong wrote:
>>>> Yes, bundling that sounds great. I assume it's something for
>>>> you or one of the regular contributors to work on (build systems
>>>> scare me :x)
>>>
>>> Yes, that is something we'd have to do.
>>
>> Hi, bringing this topic from 2018 up again (+Cc Mathieu):
>> https://inbox.sourceware.org/libc-alpha/c061de55-cc2a-88fe-564b-2ea9c4a7e632@redhat.com/T/
>>
>> I'm wondering if URCU-in-glibc is still on the table. I'm also
>> considering learning C11 atomics and deriving a standalone
>> wfcqueue w/o URCU atomics.
>
> I've done a quick review of your proposed patch, and there is one
> thing that I'm concerned about: forward progress of
> remote_free_finish(). AFAIU, if we have a steady flow of
> remote_free_begin() calls, it can prevent forward progress of the
> arena owner.
>
> When remote_free_step() captures the queue (splice) and processes
> it, it returns whether it has processed any elements, and the caller
> attempts to splice again if there was anything present. What I
> wonder is why it is required that the caller splice queue elements
> that were queued _after_ the beginning of remote_free_finish().
> Can't we simply leave those to the next (eventual)
> remote_free_finish calls?
>
> If we do that change, remote_free_finish would only be needed when
> tearing down an arena, and upon allocation only a single call to
> remote_free_step() would be needed.
>
> Thoughts?

Well, never mind: it appears that what I describe here (step in
malloc, finish only when consolidation is needed) is exactly what
your implementation does. It took me a more careful reading and a few
more sips of morning coffee to get it. :)

Thanks,

Mathieu
On 2023-01-17 14:05, Mathieu Desnoyers wrote:
> On 2023-01-17 01:42, Eric Wong wrote:
>> Carlos O'Donell <carlos@redhat.com> wrote:
>>>>> - Adding urcu as a build-time dependency is not acceptable for
>>>>>   bootstrap; instead we would bundle a copy of urcu and keep it
>>>>>   in sync with upstream. Would that make your work easier?
>>>
>>> Eric Wong wrote:
>>>> Yes, bundling that sounds great. I assume it's something for
>>>> you or one of the regular contributors to work on (build systems
>>>> scare me :x)
>>>
>>> Yes, that is something we'd have to do.
>>
>> Hi, bringing this topic from 2018 up again (+Cc Mathieu):
>> https://inbox.sourceware.org/libc-alpha/c061de55-cc2a-88fe-564b-2ea9c4a7e632@redhat.com/T/
>>
>> I'm wondering if URCU-in-glibc is still on the table. I'm also
>> considering learning C11 atomics and deriving a standalone
>> wfcqueue w/o URCU atomics.
>
> Hi Eric,
>
> I'm very much interested to contribute portions of liburcu to glibc.
> I think what we would need at this point is to document where we see
> that liburcu infrastructure can improve glibc, and create a gradual
> integration roadmap based on priorities. We should identify the
> stakeholders interested in seeing this done, and then we can discuss
> the time-frame and resources available for this project.
>
> That being said, I suspect we'd want to cover a few pieces of
> technology in this list, namely:
>
> - Restartable Sequences (rseq system call),
> - membarrier system call,
> - liburcu and libside [1,2] RCU implementations,
> - liburcu data structures (e.g. wfcqueue).

Here is a PoC implementing liburcu wfcqueue with C11 atomics:

https://review.lttng.org/c/userspace-rcu/+/9271
PoC: wfcqueue: remove dependencies on liburcu headers

Feedback is welcome!

Thanks,

Mathieu

> I would be tempted to go for an approach closer to the RCU
> implementation I have done in the libside project for glibc, because
> it supports having multiple RCU grace period domains per process,
> which allows to nicely split the locking dependencies into
> sub-domains, and therefore lessens the chances of deadlocks due to
> nesting of a global-domain RCU read-side/synchronize_rcu and locking
> (e.g. mutexes).
>
> Thanks,
>
> Mathieu
>
> [1] https://github.com/efficios/libside/blob/master/src/rcu.c
> [2] https://github.com/efficios/libside/blob/master/src/rcu.h
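As a taste of what such a standalone wfcqueue looks like, here is a
minimal C11 rendering of the wait-free enqueue at the heart of the
algorithm. The structure and names are mine, not the PoC's, and the
drain helper is only valid once producers are quiescent, sidestepping
wfcqueue's busy-wait on a not-yet-published next pointer:

#include <stdatomic.h>
#include <stddef.h>

struct node { _Atomic(struct node *) next; };

struct queue {
  struct node stub;                 /* dummy node: queue is never empty */
  _Atomic(struct node *) tail;
};

static void queue_init (struct queue *q)
{
  atomic_store_explicit (&q->stub.next, NULL, memory_order_relaxed);
  atomic_store_explicit (&q->tail, &q->stub, memory_order_relaxed);
}

/* wait-free: one atomic exchange claims our slot in the chain; the
   release store on prev->next publishes the node to the consumer */
static void queue_enqueue (struct queue *q, struct node *n)
{
  atomic_store_explicit (&n->next, NULL, memory_order_relaxed);
  struct node *prev =
    atomic_exchange_explicit (&q->tail, n, memory_order_acq_rel);
  atomic_store_explicit (&prev->next, n, memory_order_release);
}

/* first queued node, valid once all producers are quiescent; the
   caller walks ->next from here */
static struct node *queue_drain (struct queue *q)
{
  return atomic_load_explicit (&q->stub.next, memory_order_acquire);
}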
Mathieu Desnoyers via Libc-alpha <libc-alpha@sourceware.org> wrote:
> Here is a PoC implementing liburcu wfcqueue with C11 atomics:
>
> https://review.lttng.org/c/userspace-rcu/+/9271
> PoC: wfcqueue: remove dependencies on liburcu headers
>
> Feedback is welcome!

Is there a non-JavaScript version or an address to git clone?
I can't view it (I use w3m and do all my work from an ancient
machine or via ssh||mosh)

Thanks.
On 2023-01-18 14:12, Eric Wong wrote:
> Mathieu Desnoyers via Libc-alpha <libc-alpha@sourceware.org> wrote:
>> Here is a PoC implementing liburcu wfcqueue with C11 atomics:
>>
>> https://review.lttng.org/c/userspace-rcu/+/9271
>> PoC: wfcqueue: remove dependencies on liburcu headers
>>
>> Feedback is welcome!
>
> Is there a non-JavaScript version or an address to git clone?
> I can't view it (I use w3m and do all my work from an ancient
> machine or via ssh||mosh)

git clone https://review.lttng.org/userspace-rcu
cd userspace-rcu
git fetch https://review.lttng.org/userspace-rcu refs/changes/71/9271/3 && \
  git checkout FETCH_HEAD

should do the trick.

Thanks!

Mathieu
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> On 2023-01-18 14:12, Eric Wong wrote:
> > Mathieu Desnoyers via Libc-alpha <libc-alpha@sourceware.org> wrote:
> > > Here is a PoC implementing liburcu wfcqueue with C11 atomics:
> > >
> > > https://review.lttng.org/c/userspace-rcu/+/9271
> > > PoC: wfcqueue: remove dependencies on liburcu headers
> > >
> > > Feedback is welcome!
> >
> > Is there a non-JavaScript version or an address to git clone?
> > I can't view it (I use w3m and do all my work from an ancient
> > machine or via ssh||mosh)
>
> git clone https://review.lttng.org/userspace-rcu
> cd userspace-rcu
> git fetch https://review.lttng.org/userspace-rcu refs/changes/71/9271/3 && \
>   git checkout FETCH_HEAD

Thanks. (Fwiw, I prefer `git show --color-words -W refs/changes/71/9271/3')

It looks fine to me, but I'm no expert on this stuff and just
getting my feet wet with C11...

The busy wait / cpu_relax path isn't needed for malloc, at least.
diff --git a/malloc/malloc.c b/malloc/malloc.c
index e247c77b7d..40d61e45db 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -247,6 +247,9 @@
 /* For SINGLE_THREAD_P.  */
 #include <sysdep-cancel.h>
 
+#define _LGPL_SOURCE /* allows inlines */
+#include <urcu/wfcqueue.h>
+
 /*
   Debugging:
 
@@ -1660,6 +1663,9 @@ struct malloc_state
   /* Serialize access.  */
   __libc_lock_define (, mutex);
 
+  /* Only the owner of this arena writes to the head */
+  struct __cds_wfcq_head remote_free_head;
+
   /* Flags (formerly in max_fast).  */
   int flags;
 
@@ -1697,6 +1703,11 @@ struct malloc_state
   /* Memory allocated from the system in this arena.  */
   INTERNAL_SIZE_T system_mem;
   INTERNAL_SIZE_T max_system_mem;
+
+  /* remote_free_tail is written to by a thread other than the owner of
+     this arena, so we want this on a different cacheline than
+     remote_free_head */
+  struct cds_wfcq_tail remote_free_tail;
 };
 
 struct malloc_par
@@ -1794,6 +1805,7 @@ malloc_init_state (mstate av)
   int i;
   mbinptr bin;
 
+  __cds_wfcq_init (&av->remote_free_head, &av->remote_free_tail);
   /* Establish circular links for normal bins */
   for (i = 1; i < NBINS; ++i)
     {
@@ -3007,6 +3019,67 @@ tcache_thread_shutdown (void)
 
 #endif /* !USE_TCACHE */
 
+_Static_assert (MINSIZE >= sizeof(struct cds_wfcq_node),
+                "struct cds_wfcq_node too big");
+
+/* wait-free enqueue to the remote arena */
+static void
+remote_free_begin (mstate av, void *mem)
+{
+  struct cds_wfcq_node *node = mem;
+
+  cds_wfcq_node_init (node);
+  cds_wfcq_enqueue (&av->remote_free_head, &av->remote_free_tail, node);
+  /* other thread calls remote_free_step */
+}
+
+/*
+ * process remote free queue, must have locked av
+ * returns true if it did anything
+ */
+static bool
+remote_free_step (mstate av)
+{
+  struct cds_wfcq_node *node, *n;
+  struct __cds_wfcq_head tmp_head;
+  struct cds_wfcq_tail tmp_tail;
+  enum cds_wfcq_ret ret;
+
+  if (__glibc_unlikely (av == NULL))
+    {
+      return false;
+    }
+
+  __cds_wfcq_init (&tmp_head, &tmp_tail);
+  ret = __cds_wfcq_splice_nonblocking (&tmp_head, &tmp_tail,
+                                       &av->remote_free_head,
+                                       &av->remote_free_tail);
+
+  if (__glibc_unlikely (ret == CDS_WFCQ_RET_DEST_EMPTY))
+    {
+      MAYBE_INIT_TCACHE ();
+      __cds_wfcq_for_each_blocking_safe (&tmp_head, &tmp_tail, node, n)
+        {
+          _int_free (av, mem2chunk(node), 1);
+        }
+
+      /*
+       * tell caller we did some work, and it's possible other threads
+       * enqueued more work for us while we were busy
+       */
+      return true;
+    }
+
+  assert (ret != CDS_WFCQ_RET_DEST_NON_EMPTY);
+
+  return false; /* did nothing */
+}
+
+static void
+remote_free_finish (mstate av)
+{
+  while (remote_free_step (av)) ;
+}
+
 void *
 __libc_malloc (size_t bytes)
 {
@@ -3045,6 +3118,7 @@ __libc_malloc (size_t bytes)
     }
 
   arena_get (ar_ptr, bytes);
+  remote_free_step (ar_ptr);
 
   victim = _int_malloc (ar_ptr, bytes);
   /* Retry with another arena only if we were able to find a usable arena
@@ -3053,6 +3127,7 @@ __libc_malloc (size_t bytes)
     {
       LIBC_PROBE (memory_malloc_retry, 1, bytes);
       ar_ptr = arena_get_retry (ar_ptr, bytes);
+      remote_free_step (ar_ptr);
       victim = _int_malloc (ar_ptr, bytes);
     }
 
@@ -3102,10 +3177,16 @@ __libc_free (void *mem)
       return;
     }
 
-  MAYBE_INIT_TCACHE ();
-
   ar_ptr = arena_for_chunk (p);
-  _int_free (ar_ptr, p, 0);
+  if (thread_arena == ar_ptr) /* thread_arena may be NULL */
+    {
+      MAYBE_INIT_TCACHE (); /* XXX is this needed if thread_arena == ar_ptr? */
+      _int_free (ar_ptr, p, 0);
+    }
+  else
+    {
+      remote_free_begin (ar_ptr, mem);
+    }
 }
 libc_hidden_def (__libc_free)
 
@@ -3211,6 +3292,8 @@ __libc_realloc (void *oldmem, size_t bytes)
 
   __libc_lock_lock (ar_ptr->mutex);
 
+  remote_free_step (ar_ptr);
+
   newp = _int_realloc (ar_ptr, oldp, oldsize, nb);
 
   __libc_lock_unlock (ar_ptr->mutex);
@@ -3225,7 +3308,14 @@ __libc_realloc (void *oldmem, size_t bytes)
       if (newp != NULL)
         {
           memcpy (newp, oldmem, oldsize - SIZE_SZ);
-          _int_free (ar_ptr, oldp, 0);
+          if (thread_arena == ar_ptr)
+            {
+              _int_free (ar_ptr, oldp, 0);
+            }
+          else /* don't lock again */
+            {
+              remote_free_begin (ar_ptr, oldmem);
+            }
         }
     }
 
@@ -3294,12 +3384,14 @@ _mid_memalign (size_t alignment, size_t bytes, void *address)
     }
 
   arena_get (ar_ptr, bytes + alignment + MINSIZE);
+  remote_free_step(ar_ptr);
 
   p = _int_memalign (ar_ptr, alignment, bytes);
   if (!p && ar_ptr != NULL)
     {
       LIBC_PROBE (memory_memalign_retry, 2, bytes, alignment);
       ar_ptr = arena_get_retry (ar_ptr, bytes);
+      remote_free_step(ar_ptr);
       p = _int_memalign (ar_ptr, alignment, bytes);
     }
 
@@ -3388,7 +3480,10 @@ __libc_calloc (size_t n, size_t elem_size)
   if (SINGLE_THREAD_P)
     av = &main_arena;
   else
-    arena_get (av, sz);
+    {
+      arena_get (av, sz);
+      remote_free_step (av);
+    }
 
   if (av)
     {
@@ -3428,6 +3523,7 @@ __libc_calloc (size_t n, size_t elem_size)
     {
       LIBC_PROBE (memory_calloc_retry, 1, sz);
       av = arena_get_retry (av, sz);
+      remote_free_step(av);
       mem = _int_malloc (av, sz);
     }
 
@@ -4750,6 +4846,7 @@ static int
 mtrim (mstate av, size_t pad)
 {
   /* Ensure all blocks are consolidated.  */
+  remote_free_finish (av);
   malloc_consolidate (av);
 
   const size_t ps = GLRO (dl_pagesize);
@@ -5133,6 +5230,7 @@ __libc_mallopt (int param_number, int value)
 
   /* We must consolidate main arena before changing max_fast
      (see definition of set_max_fast).  */
+  remote_free_finish (av);
   malloc_consolidate (av);
 
   switch (param_number)