From patchwork Mon Aug 26 02:55:34 2024
X-Patchwork-Submitter: "Guo, Wangyang"
X-Patchwork-Id: 1976553
From: Wangyang Guo
To: libc-alpha@sourceware.org
Cc: goldstein.w.n@gmail.com, tianyou.li@intel.com, Wangyang Guo
Subject: [PATCH v2 5/5] malloc: Add tcache path for calloc
Date: Mon, 26 Aug 2024 10:55:34 +0800
Message-ID: <20240826025534.472882-6-wangyang.guo@intel.com>
In-Reply-To: <20240826025534.472882-1-wangyang.guo@intel.com>
References: <20240826025534.472882-1-wangyang.guo@intel.com>

This commit adds tcache support to calloc(), which substantially
improves the performance of small allocations, especially in
multi-threaded scenarios.  clear_mem() is split out as a helper
function so that the clearing code can be reused.

Also fix the tst-safe-linking failure that appears once calloc() uses
tcache.  Previously, the test used calloc() as a way to bypass tcache
and trigger the safe-linking check in the fastbin path.  With tcache
enabled in calloc(), the test needs extra steps to bypass tcache.

Result of the bench-malloc-thread benchmark

Test Platform: Xeon-8380
Bench Function: calloc
Ratio: New / Original time_per_iteration (Lower is Better)

Threads#   | Ratio
-----------|------
1 thread   | 0.724
4 threads  | 0.534

---
Changes in v2:
- Merge the tst-safe-linking fix to make sure the CI check passes.
- Link to v1: https://sourceware.org/pipermail/libc-alpha/2024-August/159362.html

Signed-off-by: Wangyang Guo
---
 malloc/malloc.c           | 111 ++++++++++++++++++++++++--------------
 malloc/tst-safe-linking.c |  81 +++++++++++++++++++++++-----
 2 files changed, 138 insertions(+), 54 deletions(-)

diff --git a/malloc/malloc.c b/malloc/malloc.c
index e364e8e6c4..78f1e93577 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -3755,16 +3755,55 @@ __libc_pvalloc (size_t bytes)
   return _mid_memalign (pagesize, rounded_bytes, address);
 }
 
+static __always_inline void *
+clear_mem (void *mem, INTERNAL_SIZE_T csz)
+{
+  INTERNAL_SIZE_T *d;
+  unsigned long clearsize, nclears;
+
+  /* Unroll clear of <= 36 bytes (72 if 8byte sizes).  We know that
+     contents have an odd number of INTERNAL_SIZE_T-sized words;
+     minimally 3.  */
+  d = (INTERNAL_SIZE_T *) mem;
+  clearsize = csz - SIZE_SZ;
+  nclears = clearsize / sizeof (INTERNAL_SIZE_T);
+  assert (nclears >= 3);
+
+  if (nclears > 9)
+    return memset (d, 0, clearsize);
+
+  else
+    {
+      *(d + 0) = 0;
+      *(d + 1) = 0;
+      *(d + 2) = 0;
+      if (nclears > 4)
+        {
+          *(d + 3) = 0;
+          *(d + 4) = 0;
+          if (nclears > 6)
+            {
+              *(d + 5) = 0;
+              *(d + 6) = 0;
+              if (nclears > 8)
+                {
+                  *(d + 7) = 0;
+                  *(d + 8) = 0;
+                }
+            }
+        }
+    }
+
+  return mem;
+}
+
 void *
 __libc_calloc (size_t n, size_t elem_size)
 {
   mstate av;
-  mchunkptr oldtop;
-  INTERNAL_SIZE_T sz, oldtopsize;
+  mchunkptr oldtop, p;
+  INTERNAL_SIZE_T sz, oldtopsize, csz;
   void *mem;
-  unsigned long clearsize;
-  unsigned long nclears;
-  INTERNAL_SIZE_T *d;
   ptrdiff_t bytes;
 
   if (__glibc_unlikely (__builtin_mul_overflow (n, elem_size, &bytes)))
@@ -3780,6 +3819,29 @@ __libc_calloc (size_t n, size_t elem_size)
 
   MAYBE_INIT_TCACHE ();
 
+#if USE_TCACHE
+  /* int_free also calls request2size, be careful to not pad twice.  */
+  size_t tbytes = checked_request2size (bytes);
+  if (tbytes == 0)
+    {
+      __set_errno (ENOMEM);
+      return NULL;
+    }
+  size_t tc_idx = csize2tidx (tbytes);
+
+  if (tc_idx < mp_.tcache_bins
+      && tcache != NULL
+      && tcache->counts[tc_idx] > 0)
+    {
+      mem = tcache_get (tc_idx);
+      p = mem2chunk (mem);
+      if (__glibc_unlikely (mtag_enabled))
+        return tag_new_zero_region (mem, memsize (p));
+      csz = chunksize (p);
+      return clear_mem (mem, csz);
+    }
+#endif
+
   if (SINGLE_THREAD_P)
     av = &main_arena;
   else
@@ -3834,7 +3896,7 @@ __libc_calloc (size_t n, size_t elem_size)
   if (mem == 0)
     return 0;
 
-  mchunkptr p = mem2chunk (mem);
+  p = mem2chunk (mem);
 
   /* If we are using memory tagging, then we need to set the tags
      regardless of MORECORE_CLEARS, so we zero the whole block while
@@ -3842,7 +3904,7 @@ __libc_calloc (size_t n, size_t elem_size)
   if (__glibc_unlikely (mtag_enabled))
     return tag_new_zero_region (mem, memsize (p));
 
-  INTERNAL_SIZE_T csz = chunksize (p);
+  csz = chunksize (p);
 
   /* Two optional cases in which clearing not necessary */
   if (chunk_is_mmapped (p))
@@ -3861,40 +3923,7 @@ __libc_calloc (size_t n, size_t elem_size)
     }
 #endif
 
-  /* Unroll clear of <= 36 bytes (72 if 8byte sizes).  We know that
-     contents have an odd number of INTERNAL_SIZE_T-sized words;
-     minimally 3.  */
-  d = (INTERNAL_SIZE_T *) mem;
-  clearsize = csz - SIZE_SZ;
-  nclears = clearsize / sizeof (INTERNAL_SIZE_T);
-  assert (nclears >= 3);
-
-  if (nclears > 9)
-    return memset (d, 0, clearsize);
-
-  else
-    {
-      *(d + 0) = 0;
-      *(d + 1) = 0;
-      *(d + 2) = 0;
-      if (nclears > 4)
-        {
-          *(d + 3) = 0;
-          *(d + 4) = 0;
-          if (nclears > 6)
-            {
-              *(d + 5) = 0;
-              *(d + 6) = 0;
-              if (nclears > 8)
-                {
-                  *(d + 7) = 0;
-                  *(d + 8) = 0;
-                }
-            }
-        }
-    }
-
-  return mem;
+  return clear_mem (mem, csz);
 }
 #endif /* IS_IN (libc) */
 
diff --git a/malloc/tst-safe-linking.c b/malloc/tst-safe-linking.c
index 01dd07004d..5302575ad1 100644
--- a/malloc/tst-safe-linking.c
+++ b/malloc/tst-safe-linking.c
@@ -111,22 +111,37 @@ test_fastbin (void *closure)
   int i;
   int mask = ((int *)closure)[0];
   size_t size = TCACHE_ALLOC_SIZE;
+  void * ps[TCACHE_FILL_COUNT];
+  void * pps[TCACHE_FILL_COUNT];
 
   printf ("++ fastbin ++\n");
 
+  /* Populate the fastbin list.  */
+  void * volatile a = calloc (1, size);
+  void * volatile b = calloc (1, size);
+  void * volatile c = calloc (1, size);
+  printf ("a=%p, b=%p, c=%p\n", a, b, c);
+
+  /* Chunks for later tcache filling from fastbins.  */
+  for (i = 0; i < TCACHE_FILL_COUNT; ++i)
+    {
+      void * volatile p = calloc (1, size);
+      pps[i] = p;
+    }
+
   /* Take the tcache out of the game.  */
   for (i = 0; i < TCACHE_FILL_COUNT; ++i)
     {
       void * volatile p = calloc (1, size);
-      printf ("p=%p\n", p);
-      free (p);
+      ps[i] = p;
     }
 
-  /* Populate the fastbin list.  */
-  void * volatile a = calloc (1, size);
-  void * volatile b = calloc (1, size);
-  void * volatile c = calloc (1, size);
-  printf ("a=%p, b=%p, c=%p\n", a, b, c);
+  for (i = 0; i < TCACHE_FILL_COUNT; ++i)
+    {
+      free (ps[i]);
+    }
+
+  /* Free abc will return to fastbin in FIFO order.  */
   free (a);
   free (b);
   free (c);
@@ -136,11 +151,43 @@ test_fastbin (void *closure)
   memset (c, mask & 0xFF, size);
   printf ("After: c=%p, c[0]=%p\n", c, ((void **)c)[0]);
 
+  /* Filling fastbins, will be copied to tcache later.  */
+  for (i = 0; i < TCACHE_FILL_COUNT; ++i)
+    {
+      free (pps[i]);
+    }
+
+  /* Drain out tcache to make sure later alloc from fastbins.  */
+  for (i = 0; i < TCACHE_FILL_COUNT; ++i)
+    {
+      void * volatile p = calloc (1, size);
+      ps[i] = p;
+    }
+
+  /* This line will also filling tcache with remain pps and c.  */
+  pps[TCACHE_FILL_COUNT - 1] = calloc (1, size);
+
+  /* Tcache is FILO, now the first one is c, take it out.  */
   c = calloc (1, size);
   printf ("Allocated: c=%p\n", c);
+
+  /* Drain out remain pps from tcache.  */
+  for (i = 0; i < TCACHE_FILL_COUNT - 1; ++i)
+    {
+      void * volatile p = calloc (1, size);
+      pps[i] = p;
+    }
+
   /* This line will trigger the Safe-Linking check.  */
   b = calloc (1, size);
   printf ("b=%p\n", b);
+
+  /* Free previous pointers.  */
+  for (i = 0; i < TCACHE_FILL_COUNT; ++i)
+    {
+      free (ps[i]);
+      free (pps[i]);
+    }
 }
 
 /* Try corrupting the fastbin list and trigger a consolidate.  */
@@ -150,21 +197,29 @@ test_fastbin_consolidate (void *closure)
   int i;
   int mask = ((int*)closure)[0];
   size_t size = TCACHE_ALLOC_SIZE;
+  void * ps[TCACHE_FILL_COUNT];
 
   printf ("++ fastbin consolidate ++\n");
 
+  /* Populate the fastbin list.  */
+  void * volatile a = calloc (1, size);
+  void * volatile b = calloc (1, size);
+  void * volatile c = calloc (1, size);
+  printf ("a=%p, b=%p, c=%p\n", a, b, c);
+
   /* Take the tcache out of the game.  */
   for (i = 0; i < TCACHE_FILL_COUNT; ++i)
     {
       void * volatile p = calloc (1, size);
-      free (p);
+      ps[i] = p;
     }
 
-  /* Populate the fastbin list.  */
-  void * volatile a = calloc (1, size);
-  void * volatile b = calloc (1, size);
-  void * volatile c = calloc (1, size);
-  printf ("a=%p, b=%p, c=%p\n", a, b, c);
+  for (i = 0; i < TCACHE_FILL_COUNT; ++i)
+    {
+      free (ps[i]);
+    }
+
+  /* Free abc will return to fastbin.  */
   free (a);
   free (b);
   free (c);
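
Note for reviewers: the unrolled clear that this patch moves into clear_mem() can be exercised outside of glibc with a small stand-alone sketch. The code below is an illustration only, not the glibc implementation. The names clear_words_sketch and all_zero_sketch are hypothetical, size_t stands in for INTERNAL_SIZE_T, and the function takes a word count directly instead of deriving it from a chunk size. It assumes, as the patch comment states, that small counts are odd and at least 3.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-alone sketch, not glibc code.  Zero NWORDS words
   starting at D.  Counts above 9 go to memset; smaller counts are
   assumed to be odd (3, 5, 7 or 9) and are cleared with unrolled
   stores, mirroring the shape of clear_mem in the patch.  */
static void *
clear_words_sketch (size_t *d, size_t nwords)
{
  assert (nwords >= 3);

  if (nwords > 9)
    return memset (d, 0, nwords * sizeof (size_t));

  d[0] = 0;
  d[1] = 0;
  d[2] = 0;
  if (nwords > 4)
    {
      d[3] = 0;
      d[4] = 0;
      if (nwords > 6)
        {
          d[5] = 0;
          d[6] = 0;
          if (nwords > 8)
            {
              d[7] = 0;
              d[8] = 0;
            }
        }
    }
  return d;
}

/* Hypothetical helper: return 1 if the first N words of D are zero.  */
static int
all_zero_sketch (const size_t *d, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    if (d[i] != 0)
      return 0;
  return 1;
}
```

Calling clear_words_sketch on a buffer pre-filled with a non-zero pattern, for each odd count from 3 to 11, and then checking the cleared prefix with all_zero_sketch is one way to confirm that the unrolled branches and the memset branch clear exactly the requested words and leave the following word untouched.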