From: Wangyang Guo
To: libc-alpha@sourceware.org
Cc: fweimer@redhat.com, goldstein.w.n@gmail.com, tianyou.li@intel.com, Wangyang Guo
Subject: [PATCH v3 1/5] malloc: Split _int_free() into 3 sub functions
Date: Thu, 29 Aug 2024 14:27:28 +0800
Message-ID: <20240829062732.1663342-2-wangyang.guo@intel.com>
In-Reply-To: <20240829062732.1663342-1-wangyang.guo@intel.com>

Split _int_free() into three smaller functions that can be combined
flexibly:
* _int_free_check -- sanity checks for free
* tcache_free -- free the chunk into the tcache (fast path)
* _int_free_chunk -- free the chunk to its arena (slow path)
---
Changes in v3:
- Add comments to the split functions.
- Move the seldom-executed double-free verification into a separate
  noinline function, tcache_double_free_verify().
- Link to v2: https://sourceware.org/pipermail/libc-alpha/2024-August/159426.html

No changes in v2.
---
 malloc/malloc.c | 133 ++++++++++++++++++++++++++++++------------------
 1 file changed, 84 insertions(+), 49 deletions(-)

diff --git a/malloc/malloc.c b/malloc/malloc.c
index bcb6e5b83c..ef49a13ea7 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -1086,7 +1086,9 @@ typedef struct malloc_chunk* mchunkptr;
 /* Internal routines.  */
 
 static void* _int_malloc(mstate, size_t);
-static void _int_free(mstate, mchunkptr, int);
+static void _int_free (mstate, mchunkptr, int);
+static void _int_free_check (mstate, mchunkptr, INTERNAL_SIZE_T);
+static void _int_free_chunk (mstate, mchunkptr, INTERNAL_SIZE_T, int);
 static void _int_free_merge_chunk (mstate, mchunkptr, INTERNAL_SIZE_T);
 static INTERNAL_SIZE_T _int_free_create_chunk (mstate,
                                                mchunkptr, INTERNAL_SIZE_T,
@@ -3206,6 +3208,57 @@ tcache_next (tcache_entry *e)
   return (tcache_entry *) REVEAL_PTR (e->next);
 }
 
+/* Verify whether the suspicious tcache_entry is a double free.
+   It is not expected to execute very often, so mark it noinline.  */
+static __attribute__ ((noinline)) void
+tcache_double_free_verify (tcache_entry *e, size_t tc_idx)
+{
+  tcache_entry *tmp;
+  size_t cnt = 0;
+  LIBC_PROBE (memory_tcache_double_free, 2, e, tc_idx);
+  for (tmp = tcache->entries[tc_idx];
+       tmp;
+       tmp = REVEAL_PTR (tmp->next), ++cnt)
+    {
+      if (cnt >= mp_.tcache_count)
+        malloc_printerr ("free(): too many chunks detected in tcache");
+      if (__glibc_unlikely (!aligned_OK (tmp)))
+        malloc_printerr ("free(): unaligned chunk detected in tcache 2");
+      if (tmp == e)
+        malloc_printerr ("free(): double free detected in tcache 2");
+      /* If we get here, it was a coincidence.  We've wasted a
+         few cycles, but don't abort.  */
+    }
+}
+
+/* Try to free chunk P to the tcache; return true on success.
+   Caller must ensure that chunk and size are valid.  */
+static inline bool
+tcache_free (mchunkptr p, INTERNAL_SIZE_T size)
+{
+  bool done = false;
+  size_t tc_idx = csize2tidx (size);
+  if (tcache != NULL && tc_idx < mp_.tcache_bins)
+    {
+      /* Check to see if it's already in the tcache.  */
+      tcache_entry *e = (tcache_entry *) chunk2mem (p);
+
+      /* This test succeeds on double free.  However, we don't 100%
+         trust it (it also matches random payload data at a 1 in
+         2^ chance), so verify it's not an unlikely
+         coincidence before aborting.  */
+      if (__glibc_unlikely (e->key == tcache_key))
+        tcache_double_free_verify (e, tc_idx);
+
+      if (tcache->counts[tc_idx] < mp_.tcache_count)
+        {
+          tcache_put (p, tc_idx);
+          done = true;
+        }
+    }
+  return done;
+}
+
 static void
 tcache_thread_shutdown (void)
 {
@@ -4490,14 +4543,9 @@ _int_malloc (mstate av, size_t bytes)
    ------------------------------ free ------------------------------
  */
 
-static void
-_int_free (mstate av, mchunkptr p, int have_lock)
+static inline void
+_int_free_check (mstate av, mchunkptr p, INTERNAL_SIZE_T size)
 {
-  INTERNAL_SIZE_T size;        /* its size */
-  mfastbinptr *fb;             /* associated fastbin */
-
-  size = chunksize (p);
-
   /* Little security check which won't hurt performance: the
      allocator never wraps around at the end of the address space.
      Therefore we can exclude some size values which might appear
@@ -4510,48 +4558,15 @@ _int_free (mstate av, mchunkptr p, int have_lock)
   if (__glibc_unlikely (size < MINSIZE || !aligned_OK (size)))
     malloc_printerr ("free(): invalid size");
 
-  check_inuse_chunk(av, p);
-
-#if USE_TCACHE
-  {
-    size_t tc_idx = csize2tidx (size);
-    if (tcache != NULL && tc_idx < mp_.tcache_bins)
-      {
-        /* Check to see if it's already in the tcache.  */
-        tcache_entry *e = (tcache_entry *) chunk2mem (p);
-
-        /* This test succeeds on double free.  However, we don't 100%
-           trust it (it also matches random payload data at a 1 in
-           2^ chance), so verify it's not an unlikely
-           coincidence before aborting.  */
-        if (__glibc_unlikely (e->key == tcache_key))
-          {
-            tcache_entry *tmp;
-            size_t cnt = 0;
-            LIBC_PROBE (memory_tcache_double_free, 2, e, tc_idx);
-            for (tmp = tcache->entries[tc_idx];
-                 tmp;
-                 tmp = REVEAL_PTR (tmp->next), ++cnt)
-              {
-                if (cnt >= mp_.tcache_count)
-                  malloc_printerr ("free(): too many chunks detected in tcache");
-                if (__glibc_unlikely (!aligned_OK (tmp)))
-                  malloc_printerr ("free(): unaligned chunk detected in tcache 2");
-                if (tmp == e)
-                  malloc_printerr ("free(): double free detected in tcache 2");
-                /* If we get here, it was a coincidence.  We've wasted a
-                   few cycles, but don't abort.  */
-              }
-          }
+  check_inuse_chunk (av, p);
+}
 
-        if (tcache->counts[tc_idx] < mp_.tcache_count)
-          {
-            tcache_put (p, tc_idx);
-            return;
-          }
-      }
-  }
-#endif
+/* Free chunk P of SIZE bytes to arena AV; set HAVE_LOCK to 1 if the
+   arena lock is held.  Caller must ensure chunk and size are valid.  */
+static void
+_int_free_chunk (mstate av, mchunkptr p, INTERNAL_SIZE_T size, int have_lock)
+{
+  mfastbinptr *fb;             /* associated fastbin */
 
   /*
     If eligible, place chunk on a fastbin so it can be found
@@ -4657,6 +4672,26 @@ _int_free (mstate av, mchunkptr p, int have_lock)
   }
 }
 
+/* Free chunk P to its arena AV; set HAVE_LOCK to 1 if the arena lock
+   is held.  Perform sanity checks, then try the fast path of freeing
+   into the tcache; if that fails, free the chunk to the arena.  */
+static void
+_int_free (mstate av, mchunkptr p, int have_lock)
+{
+  INTERNAL_SIZE_T size;        /* its size */
+
+  size = chunksize (p);
+
+  _int_free_check (av, p, size);
+
+#if USE_TCACHE
+  if (tcache_free (p, size))
+    return;
+#endif
+
+  _int_free_chunk (av, p, size, have_lock);
+}
+
 /* Try to merge chunk P of SIZE bytes with its neighbors.  Put the
    resulting chunk on the appropriate bin list.  P must not be on a
    bin list yet, and it can be in use.  */
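
The three-way split can be illustrated outside glibc with a toy model. This is a minimal sketch, not glibc code. The names toy_free_check, toy_cache_free, toy_double_free_verify, toy_free_slow and toy_free are hypothetical stand-ins for _int_free_check, tcache_free, tcache_double_free_verify, _int_free_chunk and _int_free. The bin geometry (16-byte size classes, four bins of at most two entries each) and the key value are invented, and the slow path simply returns memory to the system free() instead of to an arena. As in the patch, the cheap key comparison stays inline, while the rarely needed list walk that confirms a double free sits in a separate function marked with the GCC noinline attribute.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy model of the split; NOT glibc code.  Geometry and key are invented.  */
#define TOY_BINS 4            /* size classes 16, 32, 48, 64 */
#define TOY_BIN_CAP 2         /* cf. mp_.tcache_count */
#define TOY_KEY 0x5eedUL      /* cf. tcache_key */

typedef struct toy_entry
{
  struct toy_entry *next;
  unsigned long key;          /* Set to TOY_KEY while the entry is cached.  */
} toy_entry;

static toy_entry *toy_bins[TOY_BINS];
static int toy_counts[TOY_BINS];
static int toy_slow_frees;    /* Chunks that fell through to the slow path.  */

/* Sanity check, cf. _int_free_check: reject impossible sizes.  */
static inline bool
toy_free_check (size_t size)
{
  return size >= sizeof (toy_entry) && size % 16 == 0;
}

/* Rarely executed, so kept out of line, cf. tcache_double_free_verify.
   Returns true only if E really is already in bin IDX.  */
static __attribute__ ((noinline)) bool
toy_double_free_verify (const toy_entry *e, size_t idx)
{
  for (const toy_entry *t = toy_bins[idx]; t != NULL; t = t->next)
    if (t == e)
      return true;
  return false;               /* Key matched user data by coincidence.  */
}

/* Fast path, cf. tcache_free: returns true if MEM was cached.  */
static inline bool
toy_cache_free (void *mem, size_t size)
{
  size_t idx = size / 16 - 1;
  if (idx >= TOY_BINS)
    return false;
  toy_entry *e = mem;
  if (e->key == TOY_KEY && toy_double_free_verify (e, idx))
    {
      fputs ("toy_free(): double free detected\n", stderr);
      abort ();
    }
  if (toy_counts[idx] >= TOY_BIN_CAP)
    return false;             /* Bin full: let the slow path handle it.  */
  e->key = TOY_KEY;
  e->next = toy_bins[idx];
  toy_bins[idx] = e;
  toy_counts[idx]++;
  return true;
}

/* Slow path, cf. _int_free_chunk; here it just hands MEM back to libc.  */
static void
toy_free_slow (void *mem)
{
  toy_slow_frees++;
  free (mem);
}

/* Composition, cf. the new _int_free: check, try the cache, else slow.  */
static void
toy_free (void *mem, size_t size)
{
  assert (mem != NULL);
  if (!toy_free_check (size))
    abort ();
  if (toy_cache_free (mem, size))
    return;
  toy_free_slow (mem);
}
```

Consider three 32-byte blocks that are zeroed first (for example via calloc), so the key field is never read uninitialized. Freeing them with toy_free (p, 32) caches the first two in bin 1 and sends the third down the slow path, because the bin is then full.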

From: Wangyang Guo
To: libc-alpha@sourceware.org
Cc: fweimer@redhat.com, goldstein.w.n@gmail.com, tianyou.li@intel.com, Wangyang Guo
Subject: [PATCH v3 2/5] malloc: Avoid func call for tcache quick path in free()
Date: Thu, 29 Aug 2024 14:27:29 +0800
Message-ID: <20240829062732.1663342-3-wangyang.guo@intel.com>
In-Reply-To: <20240829062732.1663342-1-wangyang.guo@intel.com>

Tcache is an important optimization for accelerating free(), so the
code on this path should be kept as simple as possible.  This commit
avoids the call to _int_free() when free() takes the tcache path:
__libc_free() now calls _int_free_check() and tcache_free() directly
and falls back to _int_free_chunk() only when the chunk cannot be
cached.

Result of bench-malloc-thread benchmark

Test Platform: Xeon-8380
Ratio: New / Original time_per_iteration (Lower is Better)

Threads#   | Ratio
-----------|------
1 thread   | 0.904
4 threads  | 0.919

The data shows that time per iteration drops by about 10% with 1 thread
and by about 8% with 4 threads.
---
Changes in v2:
- Move _int_free_check() outside of the USE_TCACHE block.
- Link to v1: https://sourceware.org/pipermail/libc-alpha/2024-August/159359.html
---
 malloc/malloc.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/malloc/malloc.c b/malloc/malloc.c
index ef49a13ea7..264f35e1a3 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -3448,7 +3448,17 @@ __libc_free (void *mem)
       (void)tag_region (chunk2mem (p), memsize (p));
 
       ar_ptr = arena_for_chunk (p);
-      _int_free (ar_ptr, p, 0);
+      INTERNAL_SIZE_T size = chunksize (p);
+      _int_free_check (ar_ptr, p, size);
+
+#if USE_TCACHE
+      if (tcache_free (p, size))
+        {
+          __set_errno (err);
+          return;
+        }
+#endif
+      _int_free_chunk (ar_ptr, p, size, 0);
     }
 
   __set_errno (err);
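
Because this patch inlines the tcache path into __libc_free(), the new early return has to repeat __set_errno (err). __libc_free() restores the errno value saved in err before it returns. The following minimal sketch shows that shape. It uses hypothetical names (toy_libc_free, toy_fast_path, toy_slow_path) and an invented 64-byte fast-path limit. Its slow path deliberately clobbers errno, standing in for internal work that might change it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

static int toy_fast_hits;
static int toy_slow_hits;

/* Hypothetical fast path: requests of at most 64 bytes are "cached".  */
static inline bool
toy_fast_path (size_t size)
{
  if (size > 64)
    return false;
  toy_fast_hits++;
  return true;
}

/* Hypothetical slow path that, like arbitrary internal code, may
   modify errno as a side effect.  */
static void
toy_slow_path (size_t size)
{
  (void) size;
  toy_slow_hits++;
  errno = ENOMEM;
}

/* Same control-flow shape as __libc_free after this patch: every exit,
   including the early fast-path return, restores the saved errno.  */
static void
toy_libc_free (size_t size)
{
  int err = errno;
  if (toy_fast_path (size))
    {
      errno = err;
      return;
    }
  toy_slow_path (size);
  errno = err;
}
```

In this toy, restoring errno on the fast path is redundant, because nothing on that path modifies it. Doing it on every exit anyway means the errno behavior does not depend on what the fast path calls.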

From: Wangyang Guo
To: libc-alpha@sourceware.org
Cc: fweimer@redhat.com, goldstein.w.n@gmail.com, tianyou.li@intel.com, Wangyang Guo
Subject: [PATCH v3 3/5] malloc: Arena is not needed for tcache path in free()
Date: Thu, 29 Aug 2024 14:27:30 +0800
Message-ID: <20240829062732.1663342-4-wangyang.guo@intel.com>
In-Reply-To: <20240829062732.1663342-1-wangyang.guo@intel.com>

The arena is not needed by _int_free_check() in non-DEBUG mode.  This
commit defers the arena lookup (arena_for_chunk) until just before
_int_free_chunk(), which speeds up the tcache path.  When DEBUG is
enabled, do_check_inuse_chunk() obtains the arena from p itself.

Result of bench-malloc-thread benchmark

Test Platform: Xeon-8380
Ratio: New / Original time_per_iteration (Lower is Better)

Threads#   | Ratio
-----------|------
1 thread   | 0.994
4 threads  | 0.968

The data shows a gain of about 3% with 4 threads; the single-thread
result is essentially unchanged.
---
Changes in v2:
- Move _int_free_check() outside of the USE_TCACHE block.
- Link to v1: https://sourceware.org/pipermail/libc-alpha/2024-August/159360.html
---
 malloc/malloc.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/malloc/malloc.c b/malloc/malloc.c
index 264f35e1a3..efb5292e9f 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -2143,6 +2143,9 @@ do_check_inuse_chunk (mstate av, mchunkptr p)
 {
   mchunkptr next;
 
+  if (av == NULL)
+    av = arena_for_chunk (p);
+
   do_check_chunk (av, p);
 
   if (chunk_is_mmapped (p))
@@ -3447,9 +3450,10 @@ __libc_free (void *mem)
       /* Mark the chunk as belonging to the library again.  */
       (void)tag_region (chunk2mem (p), memsize (p));
 
-      ar_ptr = arena_for_chunk (p);
       INTERNAL_SIZE_T size = chunksize (p);
-      _int_free_check (ar_ptr, p, size);
+      /* av is not needed for _int_free_check in non-DEBUG mode;
+         in DEBUG mode, do_check_inuse_chunk fetches av from p.  */
+      _int_free_check (NULL, p, size);
 
 #if USE_TCACHE
       if (tcache_free (p, size))
@@ -3458,6 +3462,8 @@ __libc_free (void *mem)
           return;
         }
 #endif
+
+      ar_ptr = arena_for_chunk (p);
       _int_free_chunk (ar_ptr, p, size, 0);
     }
 
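
The lazy lookup in this patch follows a common pattern. Callers that may not need some piece of context pass NULL, and the rare consumer (here the DEBUG-only check) recovers it on demand. The following minimal sketch rests on several assumptions:
- toy_arena and toy_chunk are hypothetical types in which a chunk records its owning arena directly; glibc instead derives the arena with arena_for_chunk().
- The 64-byte fast-path limit is invented.
- TOY_DEBUG plays the role of the DEBUG build mentioned in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct toy_arena { int id; } toy_arena;
typedef struct toy_chunk { toy_arena *owner; size_t size; } toy_chunk;

static int toy_arena_lookups;   /* How often the lookup actually ran.  */
static int toy_slow_frees;

/* Stand-in for arena_for_chunk: the work the fast path should skip.  */
static toy_arena *
toy_arena_for_chunk (const toy_chunk *p)
{
  toy_arena_lookups++;
  return p->owner;
}

/* Cf. do_check_inuse_chunk: only debug builds need the arena, and they
   recover it from P when the caller passes NULL.  */
static void
toy_check_inuse_chunk (toy_arena *av, const toy_chunk *p)
{
#ifdef TOY_DEBUG
  if (av == NULL)
    av = toy_arena_for_chunk (p);
  assert (p->owner == av);
#else
  (void) av;
  (void) p;
#endif
}

/* Hypothetical fast path: chunks of at most 64 bytes are cached.  */
static bool
toy_cache_free (const toy_chunk *p)
{
  return p->size <= 64;
}

/* Hypothetical slow path, cf. _int_free_chunk.  */
static void
toy_free_chunk (toy_arena *av, toy_chunk *p)
{
  assert (av == p->owner);
  toy_slow_frees++;
}

/* Cf. __libc_free after this patch: the arena is fetched only when the
   fast path declines the chunk.  */
static void
toy_free (toy_chunk *p)
{
  toy_check_inuse_chunk (NULL, p);
  if (toy_cache_free (p))
    return;
  toy_free_chunk (toy_arena_for_chunk (p), p);
}
```

In a non-debug build the small chunk is freed without any arena lookup, and the large one triggers exactly one. With TOY_DEBUG defined, the check itself would perform a lookup on every call, which matches the trade-off described in the commit message.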

From: Wangyang Guo
To: libc-alpha@sourceware.org
Cc: fweimer@redhat.com, goldstein.w.n@gmail.com, tianyou.li@intel.com, Wangyang Guo
Subject: [PATCH v3 4/5] benchtests: Add calloc function test to bench-malloc-thread
Date: Thu, 29 Aug 2024 14:27:31 +0800
Message-ID: <20240829062732.1663342-5-wangyang.guo@intel.com>
In-Reply-To: <20240829062732.1663342-1-wangyang.guo@intel.com>

---
 benchtests/bench-malloc-thread.c | 114 ++++++++++++++++++++-----------
 1 file changed, 76 insertions(+), 38 deletions(-)

diff --git a/benchtests/bench-malloc-thread.c b/benchtests/bench-malloc-thread.c
index 46fdabd30c..e8429cec10 100644
--- a/benchtests/bench-malloc-thread.c
+++ b/benchtests/bench-malloc-thread.c
@@ -123,6 +123,8 @@ alarm_handler (int signum)
   timeout = true;
 }
 
+typedef size_t (*loop_func_t)(void **);
+
 /* Allocate and free blocks in a random order.  */
 static size_t
 malloc_benchmark_loop (void **ptr_arr)
@@ -145,10 +147,32 @@ malloc_benchmark_loop (void **ptr_arr)
   return iters;
 }
 
+static size_t
+calloc_benchmark_loop (void **ptr_arr)
+{
+  unsigned int offset_state = 0, block_state = 0;
+  size_t iters = 0;
+
+  while (!timeout)
+    {
+      unsigned int next_idx = get_random_offset (&offset_state);
+      unsigned int next_block = get_random_block_size (&block_state);
+
+      free (ptr_arr[next_idx]);
+
+      ptr_arr[next_idx] = calloc (1, next_block);
+
+      iters++;
+    }
+
+  return iters;
+}
+
 struct thread_args
 {
   size_t iters;
   void **working_set;
+  loop_func_t benchmark_loop;
   timing_t elapsed;
 };
 
@@ -161,7 +185,7 @@ benchmark_thread (void *arg)
   timing_t start, stop;
 
   TIMING_NOW (start);
-  iters = malloc_benchmark_loop (thread_set);
+  iters = args->benchmark_loop (thread_set);
   TIMING_NOW (stop);
 
   TIMING_DIFF (args->elapsed, start, stop);
@@ -171,7 +195,7 @@ benchmark_thread (void *arg)
 }
 
 static timing_t
-do_benchmark (size_t num_threads, size_t *iters)
+do_benchmark (loop_func_t benchmark_loop, size_t num_threads, size_t *iters)
 {
   timing_t elapsed = 0;
 
@@ -183,7 +207,7 @@ do_benchmark (size_t num_threads, size_t *iters)
       memset (working_set, 0, sizeof (working_set));
 
       TIMING_NOW (start);
-      *iters = malloc_benchmark_loop (working_set);
+      *iters = benchmark_loop (working_set);
       TIMING_NOW (stop);
 
       TIMING_DIFF (elapsed, start, stop);
@@ -201,6 +225,7 @@ do_benchmark (size_t num_threads, size_t *iters)
       for (size_t i = 0; i < num_threads; i++)
         {
           args[i].working_set = working_set[i];
+          args[i].benchmark_loop = benchmark_loop;
           pthread_create(&threads[i], NULL, benchmark_thread, &args[i]);
         }
 
@@ -214,6 +239,47 @@ do_benchmark (size_t num_threads, size_t *iters)
   return elapsed;
 }
 
+static void
+bench_function (json_ctx_t *json_ctx, size_t num_threads,
+                const char *func_name, loop_func_t benchmark_loop)
+{
+  timing_t cur;
+  size_t iters = 0;
+  double d_total_s, d_total_i;
+
+  init_random_values ();
+
+  json_attr_object_begin (json_ctx, func_name);
+
+  json_attr_object_begin (json_ctx, "");
+
+  timeout = false;
+  alarm (BENCHMARK_DURATION);
+
+  cur = do_benchmark (benchmark_loop, num_threads, &iters);
+
+  struct rusage usage;
+  getrusage(RUSAGE_SELF, &usage);
+
+  d_total_s = cur;
+  d_total_i = iters;
+
+  json_attr_double (json_ctx, "duration", d_total_s);
+  json_attr_double (json_ctx, "iterations", d_total_i);
+  json_attr_double (json_ctx, "time_per_iteration", d_total_s / d_total_i);
+  json_attr_double (json_ctx, "max_rss", usage.ru_maxrss);
+
+  json_attr_double (json_ctx, "threads", num_threads);
+  json_attr_double (json_ctx, "min_size", MIN_ALLOCATION_SIZE);
+  json_attr_double (json_ctx, "max_size", MAX_ALLOCATION_SIZE);
+  json_attr_double (json_ctx, "random_seed", RAND_SEED);
+
+  json_attr_object_end (json_ctx);
+
+  json_attr_object_end (json_ctx);
+
+}
+
 static void usage(const char *name)
 {
   fprintf (stderr, "%s: \n", name);
@@ -223,10 +289,8 @@ static void usage(const char *name)
 int
 main (int argc, char **argv)
 {
-  timing_t cur;
-  size_t iters = 0, num_threads = 1;
+  size_t num_threads = 1;
   json_ctx_t json_ctx;
-  double d_total_s, d_total_i;
   struct sigaction act;
 
   if (argc == 1)
@@ -246,48 +310,22 @@ main (int argc, char **argv)
   else
     usage(argv[0]);
 
-  init_random_values ();
-
-  json_init (&json_ctx, 0, stdout);
-
-  json_document_begin (&json_ctx);
-
-  json_attr_string (&json_ctx, "timing_type", TIMING_TYPE);
-
-  json_attr_object_begin (&json_ctx, "functions");
-
-  json_attr_object_begin (&json_ctx, "malloc");
-
-  json_attr_object_begin (&json_ctx, "");
-
   memset (&act, 0, sizeof (act));
   act.sa_handler = &alarm_handler;
 
   sigaction (SIGALRM, &act, NULL);
 
-  alarm (BENCHMARK_DURATION);
-
-  cur = do_benchmark (num_threads, &iters);
-
-  struct rusage usage;
-  getrusage(RUSAGE_SELF, &usage);
+  json_init (&json_ctx, 0, stdout);
 
-  d_total_s = cur;
-  d_total_i = iters;
+  json_document_begin (&json_ctx);
 
-  json_attr_double (&json_ctx, "duration", d_total_s);
-  json_attr_double (&json_ctx, "iterations", d_total_i);
-  json_attr_double (&json_ctx, "time_per_iteration", d_total_s / d_total_i);
-  json_attr_double (&json_ctx, "max_rss", usage.ru_maxrss);
+  json_attr_string (&json_ctx, "timing_type", TIMING_TYPE);
 
-  json_attr_double (&json_ctx, "threads", num_threads);
-  json_attr_double (&json_ctx, "min_size", MIN_ALLOCATION_SIZE);
-  json_attr_double (&json_ctx, "max_size", MAX_ALLOCATION_SIZE);
-  json_attr_double (&json_ctx, "random_seed", RAND_SEED);
+  json_attr_object_begin (&json_ctx, "functions");
 
-  json_attr_object_end (&json_ctx);
+  bench_function (&json_ctx, num_threads, "malloc", malloc_benchmark_loop);
 
-  json_attr_object_end (&json_ctx);
+  bench_function (&json_ctx, num_threads, "calloc", calloc_benchmark_loop);
 
   json_attr_object_end (&json_ctx);
 
From: Wangyang Guo
To: libc-alpha@sourceware.org
Cc: fweimer@redhat.com, goldstein.w.n@gmail.com, tianyou.li@intel.com, Wangyang Guo
Subject: [PATCH v3 5/5] malloc: Add tcache path for calloc
Date: Thu, 29 Aug 2024 14:27:32 +0800
Message-ID: <20240829062732.1663342-6-wangyang.guo@intel.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240829062732.1663342-1-wangyang.guo@intel.com>
References: <20240829062732.1663342-1-wangyang.guo@intel.com>

This commit adds tcache support to calloc(), which can significantly
improve the performance of small allocations, especially in
multi-threaded scenarios. clear_mem() and tcache_available() are split
out as helper functions so that the code can be reused.
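To illustrate why a tcache hit in calloc() still has to go through a clearing step, here is a minimal, self-contained toy model. It is not glibc code: toy_calloc(), toy_free(), toy_available(), the eight bins and the 16-byte size classes are all hypothetical, chosen only for illustration. It keeps one LIFO free list per size class, checks availability the way the split-out helper does (valid index and a non-empty bin), and zeroes a recycled block before returning it, because that block still holds stale user data and the cache's own link pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy model, not glibc code: one LIFO free list per
   16-byte size class, standing in for the per-thread tcache.  */
#define TOY_BINS 8
#define TOY_GRAIN 16

typedef struct toy_entry
{
  struct toy_entry *next;
} toy_entry;

static toy_entry *toy_bins[TOY_BINS];
static unsigned int toy_counts[TOY_BINS];

/* Map a request size to a size-class index (0 for 0..16 bytes).  */
static size_t
toy_idx (size_t bytes)
{
  return bytes == 0 ? 0 : (bytes - 1) / TOY_GRAIN;
}

/* Analogue of the availability helper: a cache hit needs a valid
   index and a non-empty bin.  */
static bool
toy_available (size_t idx)
{
  return idx < TOY_BINS && toy_counts[idx] > 0;
}

/* Push a block onto its bin, or hand it back to the system allocator
   if it is too large to cache.  */
static void
toy_free (void *p, size_t bytes)
{
  size_t idx = toy_idx (bytes);
  if (p == NULL)
    return;
  if (idx < TOY_BINS)
    {
      toy_entry *e = p;
      e->next = toy_bins[idx];
      toy_bins[idx] = e;
      toy_counts[idx]++;
    }
  else
    free (p);
}

static void *
toy_calloc (size_t n, size_t elem_size)
{
  size_t bytes;
  if (__builtin_mul_overflow (n, elem_size, &bytes))
    return NULL;

  size_t idx = toy_idx (bytes);
  if (idx >= TOY_BINS)
    return calloc (1, bytes);   /* Too large to cache.  */

  size_t block = (idx + 1) * TOY_GRAIN;
  if (toy_available (idx))
    {
      /* Fast path: pop the most recently freed block.  */
      toy_entry *e = toy_bins[idx];
      assert (e != NULL);
      toy_bins[idx] = e->next;
      toy_counts[idx]--;
      /* The block still holds stale data and the list link, so it
         must be zeroed, just as a fresh calloc result would be.  */
      return memset (e, 0, block);
    }

  /* Slow path: allocate a whole size class so it can be recycled.  */
  return calloc (1, block);
}
```

Freeing a dirty 24-byte block and then calling toy_calloc (3, 8) returns the same address with all 32 bytes of its size class zeroed. In the actual patch, the equivalent fast path also handles memory tagging (tag_new_zero_region) and otherwise uses the unrolled clear_mem() for small chunks instead of a plain memset.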
Also fix the tst-safe-linking failure that appears once tcache is
enabled for calloc(). Previously, calloc() was used as a way to bypass
tcache in memory allocation and trigger the safe-linking check in the
fastbin path. With tcache enabled, the test needs extra workarounds to
bypass tcache.

Result of bench-malloc-thread benchmark

Test Platform: Xeon-8380
Bench Function: calloc
Ratio: New / Original time_per_iteration (Lower is Better)

Threads#   | Ratio
-----------|------
1 thread   | 0.724
4 threads  | 0.534

---
Changes in v3:
- Split out tcache_available() as a helper function.
- Link to v2: https://sourceware.org/pipermail/libc-alpha/2024-August/159430.html

Changes in v2:
- Merge the tst-safe-linking fix to make sure the CI checks pass.
- Link to v1: https://sourceware.org/pipermail/libc-alpha/2024-August/159362.html
---
 malloc/malloc.c           | 129 ++++++++++++++++++++++++--------------
 malloc/tst-safe-linking.c |  81 ++++++++++++++++++++----
 2 files changed, 150 insertions(+), 60 deletions(-)

diff --git a/malloc/malloc.c b/malloc/malloc.c
index efb5292e9f..f9dd29cf64 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -3211,6 +3211,18 @@ tcache_next (tcache_entry *e)
   return (tcache_entry *) REVEAL_PTR (e->next);
 }
 
+/* Check if tcache is available for alloc by corresponding tc_idx.  */
+static __always_inline bool
+tcache_availabe (size_t tc_idx)
+{
+  if (tc_idx < mp_.tcache_bins
+      && tcache != NULL
+      && tcache->counts[tc_idx] > 0)
+    return true;
+  else
+    return false;
+}
+
 /* Verify if the suspicious tcache_entry is double free.
    It's not expected to execute very often, mark it as noinline.  */
 static __attribute__ ((noinline)) void
@@ -3369,9 +3381,7 @@ __libc_malloc (size_t bytes)
   MAYBE_INIT_TCACHE ();
 
   DIAG_PUSH_NEEDS_COMMENT;
-  if (tc_idx < mp_.tcache_bins
-      && tcache != NULL
-      && tcache->counts[tc_idx] > 0)
+  if (tcache_availabe (tc_idx))
     {
       victim = tcache_get (tc_idx);
       return tag_new_usable (victim);
@@ -3683,9 +3693,7 @@ _mid_memalign (size_t alignment, size_t bytes, void *address)
       }
     size_t tc_idx = csize2tidx (tbytes);
 
-    if (tc_idx < mp_.tcache_bins
-        && tcache != NULL
-        && tcache->counts[tc_idx] > 0)
+    if (tcache_availabe (tc_idx))
       {
         /* The tcache itself isn't encoded, but the chain is.  */
         tcache_entry **tep = & tcache->entries[tc_idx];
@@ -3763,16 +3771,55 @@ __libc_pvalloc (size_t bytes)
   return _mid_memalign (pagesize, rounded_bytes, address);
 }
 
+static __always_inline void *
+clear_mem (void *mem, INTERNAL_SIZE_T csz)
+{
+  INTERNAL_SIZE_T *d;
+  unsigned long clearsize, nclears;
+
+  /* Unroll clear of <= 36 bytes (72 if 8byte sizes).  We know that
+     contents have an odd number of INTERNAL_SIZE_T-sized words;
+     minimally 3.  */
+  d = (INTERNAL_SIZE_T *) mem;
+  clearsize = csz - SIZE_SZ;
+  nclears = clearsize / sizeof (INTERNAL_SIZE_T);
+  assert (nclears >= 3);
+
+  if (nclears > 9)
+    return memset (d, 0, clearsize);
+
+  else
+    {
+      *(d + 0) = 0;
+      *(d + 1) = 0;
+      *(d + 2) = 0;
+      if (nclears > 4)
+        {
+          *(d + 3) = 0;
+          *(d + 4) = 0;
+          if (nclears > 6)
+            {
+              *(d + 5) = 0;
+              *(d + 6) = 0;
+              if (nclears > 8)
+                {
+                  *(d + 7) = 0;
+                  *(d + 8) = 0;
+                }
+            }
+        }
+    }
+
+  return mem;
+}
+
 void *
 __libc_calloc (size_t n, size_t elem_size)
 {
   mstate av;
-  mchunkptr oldtop;
-  INTERNAL_SIZE_T sz, oldtopsize;
+  mchunkptr oldtop, p;
+  INTERNAL_SIZE_T sz, oldtopsize, csz;
   void *mem;
-  unsigned long clearsize;
-  unsigned long nclears;
-  INTERNAL_SIZE_T *d;
   ptrdiff_t bytes;
 
   if (__glibc_unlikely (__builtin_mul_overflow (n, elem_size, &bytes)))
@@ -3788,6 +3835,27 @@ __libc_calloc (size_t n, size_t elem_size)
 
   MAYBE_INIT_TCACHE ();
 
+#if USE_TCACHE
+  /* int_free also calls request2size, be careful to not pad twice.  */
+  size_t tbytes = checked_request2size (bytes);
+  if (tbytes == 0)
+    {
+      __set_errno (ENOMEM);
+      return NULL;
+    }
+  size_t tc_idx = csize2tidx (tbytes);
+
+  if (tcache_availabe (tc_idx))
+    {
+      mem = tcache_get (tc_idx);
+      p = mem2chunk (mem);
+      if (__glibc_unlikely (mtag_enabled))
+        return tag_new_zero_region (mem, memsize (p));
+      csz = chunksize (p);
+      return clear_mem (mem, csz);
+    }
+#endif
+
   if (SINGLE_THREAD_P)
     av = &main_arena;
   else
@@ -3842,7 +3910,7 @@ __libc_calloc (size_t n, size_t elem_size)
   if (mem == 0)
     return 0;
 
-  mchunkptr p = mem2chunk (mem);
+  p = mem2chunk (mem);
 
   /* If we are using memory tagging, then we need to set the tags
      regardless of MORECORE_CLEARS, so we zero the whole block while
@@ -3850,7 +3918,7 @@ __libc_calloc (size_t n, size_t elem_size)
   if (__glibc_unlikely (mtag_enabled))
     return tag_new_zero_region (mem, memsize (p));
 
-  INTERNAL_SIZE_T csz = chunksize (p);
+  csz = chunksize (p);
 
   /* Two optional cases in which clearing not necessary */
   if (chunk_is_mmapped (p))
@@ -3869,40 +3937,7 @@ __libc_calloc (size_t n, size_t elem_size)
     }
 #endif
 
-  /* Unroll clear of <= 36 bytes (72 if 8byte sizes).  We know that
-     contents have an odd number of INTERNAL_SIZE_T-sized words;
-     minimally 3.  */
-  d = (INTERNAL_SIZE_T *) mem;
-  clearsize = csz - SIZE_SZ;
-  nclears = clearsize / sizeof (INTERNAL_SIZE_T);
-  assert (nclears >= 3);
-
-  if (nclears > 9)
-    return memset (d, 0, clearsize);
-
-  else
-    {
-      *(d + 0) = 0;
-      *(d + 1) = 0;
-      *(d + 2) = 0;
-      if (nclears > 4)
-        {
-          *(d + 3) = 0;
-          *(d + 4) = 0;
-          if (nclears > 6)
-            {
-              *(d + 5) = 0;
-              *(d + 6) = 0;
-              if (nclears > 8)
-                {
-                  *(d + 7) = 0;
-                  *(d + 8) = 0;
-                }
-            }
-        }
-    }
-
-  return mem;
+  return clear_mem (mem, csz);
 }
 #endif /* IS_IN (libc) */
 
diff --git a/malloc/tst-safe-linking.c b/malloc/tst-safe-linking.c
index 01dd07004d..5302575ad1 100644
--- a/malloc/tst-safe-linking.c
+++ b/malloc/tst-safe-linking.c
@@ -111,22 +111,37 @@ test_fastbin (void *closure)
   int i;
   int mask = ((int *)closure)[0];
   size_t size = TCACHE_ALLOC_SIZE;
+  void * ps[TCACHE_FILL_COUNT];
+  void * pps[TCACHE_FILL_COUNT];
 
   printf ("++ fastbin ++\n");
 
+  /* Populate the fastbin list.  */
+  void * volatile a = calloc (1, size);
+  void * volatile b = calloc (1, size);
+  void * volatile c = calloc (1, size);
+  printf ("a=%p, b=%p, c=%p\n", a, b, c);
+
+  /* Chunks for later tcache filling from fastbins.  */
+  for (i = 0; i < TCACHE_FILL_COUNT; ++i)
+    {
+      void * volatile p = calloc (1, size);
+      pps[i] = p;
+    }
+
   /* Take the tcache out of the game.  */
   for (i = 0; i < TCACHE_FILL_COUNT; ++i)
     {
       void * volatile p = calloc (1, size);
-      printf ("p=%p\n", p);
-      free (p);
+      ps[i] = p;
     }
 
-  /* Populate the fastbin list.  */
-  void * volatile a = calloc (1, size);
-  void * volatile b = calloc (1, size);
-  void * volatile c = calloc (1, size);
-  printf ("a=%p, b=%p, c=%p\n", a, b, c);
+  for (i = 0; i < TCACHE_FILL_COUNT; ++i)
+    {
+      free (ps[i]);
+    }
+
+  /* Free abc will return to fastbin in FIFO order.  */
   free (a);
   free (b);
   free (c);
@@ -136,11 +151,43 @@ test_fastbin (void *closure)
   memset (c, mask & 0xFF, size);
   printf ("After: c=%p, c[0]=%p\n", c, ((void **)c)[0]);
 
+  /* Filling fastbins, will be copied to tcache later.  */
+  for (i = 0; i < TCACHE_FILL_COUNT; ++i)
+    {
+      free (pps[i]);
+    }
+
+  /* Drain out tcache to make sure later alloc from fastbins.  */
+  for (i = 0; i < TCACHE_FILL_COUNT; ++i)
+    {
+      void * volatile p = calloc (1, size);
+      ps[i] = p;
+    }
+
+  /* This line will also filling tcache with remain pps and c.  */
+  pps[TCACHE_FILL_COUNT - 1] = calloc (1, size);
+
+  /* Tcache is FILO, now the first one is c, take it out.  */
   c = calloc (1, size);
   printf ("Allocated: c=%p\n", c);
+
+  /* Drain out remain pps from tcache.  */
+  for (i = 0; i < TCACHE_FILL_COUNT - 1; ++i)
+    {
+      void * volatile p = calloc (1, size);
+      pps[i] = p;
+    }
+
   /* This line will trigger the Safe-Linking check.  */
   b = calloc (1, size);
   printf ("b=%p\n", b);
+
+  /* Free previous pointers.  */
+  for (i = 0; i < TCACHE_FILL_COUNT; ++i)
+    {
+      free (ps[i]);
+      free (pps[i]);
+    }
 }
 
 /* Try corrupting the fastbin list and trigger a consolidate.  */
@@ -150,21 +197,29 @@ test_fastbin_consolidate (void *closure)
   int i;
   int mask = ((int*)closure)[0];
   size_t size = TCACHE_ALLOC_SIZE;
+  void * ps[TCACHE_FILL_COUNT];
 
   printf ("++ fastbin consolidate ++\n");
 
+  /* Populate the fastbin list.  */
+  void * volatile a = calloc (1, size);
+  void * volatile b = calloc (1, size);
+  void * volatile c = calloc (1, size);
+  printf ("a=%p, b=%p, c=%p\n", a, b, c);
+
   /* Take the tcache out of the game.  */
   for (i = 0; i < TCACHE_FILL_COUNT; ++i)
     {
       void * volatile p = calloc (1, size);
-      free (p);
+      ps[i] = p;
     }
 
-  /* Populate the fastbin list.  */
-  void * volatile a = calloc (1, size);
-  void * volatile b = calloc (1, size);
-  void * volatile c = calloc (1, size);
-  printf ("a=%p, b=%p, c=%p\n", a, b, c);
+  for (i = 0; i < TCACHE_FILL_COUNT; ++i)
+    {
+      free (ps[i]);
+    }
+
+  /* Free abc will return to fastbin.  */
   free (a);
   free (b);
   free (c);