From patchwork Thu Aug 22 02:59:20 2024
X-Patchwork-Submitter: "Guo, Wangyang"
X-Patchwork-Id: 1975221
From: Wangyang Guo
To: libc-alpha@sourceware.org
Cc: Noah Goldstein, Tianyou Li, Wangyang Guo
Subject: [PATCH 5/6] malloc: Add tcache path for calloc
Date: Thu, 22 Aug 2024 10:59:20 +0800
Message-ID: <20240822025921.3120998-6-wangyang.guo@intel.com>
In-Reply-To: <20240822025921.3120998-1-wangyang.guo@intel.com>
References: <20240822025921.3120998-1-wangyang.guo@intel.com>

This commit adds tcache support to calloc(), which can significantly
improve the performance of small allocations, especially in
multi-threaded scenarios.

clear_mem() is also split out as a helper function so that the clearing
code can be reused.

Result of bench-malloc-thread benchmark

Test Platform: Xeon-8380
Bench Function: calloc
Ratio: New / Original time_per_iteration (Lower is Better)

Threads#   | Ratio
-----------|------
1 thread   | 0.724
4 threads  | 0.534

Signed-off-by: Wangyang Guo
---
 malloc/malloc.c | 111 ++++++++++++++++++++++++++++++------------------
 1 file changed, 70 insertions(+), 41 deletions(-)

diff --git a/malloc/malloc.c b/malloc/malloc.c
index 030aff093b..19fdd72444 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -3755,16 +3755,55 @@ __libc_pvalloc (size_t bytes)
   return _mid_memalign (pagesize, rounded_bytes, address);
 }
 
+static __always_inline void *
+clear_mem (void *mem, INTERNAL_SIZE_T csz)
+{
+  INTERNAL_SIZE_T *d;
+  unsigned long clearsize, nclears;
+
+  /* Unroll clear of <= 36 bytes (72 if 8byte sizes).  We know that
+     contents have an odd number of INTERNAL_SIZE_T-sized words;
+     minimally 3.  */
+  d = (INTERNAL_SIZE_T *) mem;
+  clearsize = csz - SIZE_SZ;
+  nclears = clearsize / sizeof (INTERNAL_SIZE_T);
+  assert (nclears >= 3);
+
+  if (nclears > 9)
+    return memset (d, 0, clearsize);
+
+  else
+    {
+      *(d + 0) = 0;
+      *(d + 1) = 0;
+      *(d + 2) = 0;
+      if (nclears > 4)
+        {
+          *(d + 3) = 0;
+          *(d + 4) = 0;
+          if (nclears > 6)
+            {
+              *(d + 5) = 0;
+              *(d + 6) = 0;
+              if (nclears > 8)
+                {
+                  *(d + 7) = 0;
+                  *(d + 8) = 0;
+                }
+            }
+        }
+    }
+
+  return mem;
+}
+
 void *
 __libc_calloc (size_t n, size_t elem_size)
 {
   mstate av;
-  mchunkptr oldtop;
-  INTERNAL_SIZE_T sz, oldtopsize;
+  mchunkptr oldtop, p;
+  INTERNAL_SIZE_T sz, oldtopsize, csz;
   void *mem;
-  unsigned long clearsize;
-  unsigned long nclears;
-  INTERNAL_SIZE_T *d;
   ptrdiff_t bytes;
 
   if (__glibc_unlikely (__builtin_mul_overflow (n, elem_size, &bytes)))
@@ -3780,6 +3819,29 @@ __libc_calloc (size_t n, size_t elem_size)
 
   MAYBE_INIT_TCACHE ();
 
+#if USE_TCACHE
+  /* int_free also calls request2size, be careful to not pad twice.  */
+  size_t tbytes = checked_request2size (bytes);
+  if (tbytes == 0)
+    {
+      __set_errno (ENOMEM);
+      return NULL;
+    }
+  size_t tc_idx = csize2tidx (tbytes);
+
+  if (tc_idx < mp_.tcache_bins
+      && tcache != NULL
+      && tcache->counts[tc_idx] > 0)
+    {
+      mem = tcache_get (tc_idx);
+      p = mem2chunk (mem);
+      if (__glibc_unlikely (mtag_enabled))
+        return tag_new_zero_region (mem, memsize (p));
+      csz = chunksize (p);
+      return clear_mem (mem, csz);
+    }
+#endif
+
   if (SINGLE_THREAD_P)
     av = &main_arena;
   else
@@ -3834,7 +3896,7 @@ __libc_calloc (size_t n, size_t elem_size)
   if (mem == 0)
     return 0;
 
-  mchunkptr p = mem2chunk (mem);
+  p = mem2chunk (mem);
 
   /* If we are using memory tagging, then we need to set the tags
      regardless of MORECORE_CLEARS, so we zero the whole block while
@@ -3842,7 +3904,7 @@ __libc_calloc (size_t n, size_t elem_size)
   if (__glibc_unlikely (mtag_enabled))
     return tag_new_zero_region (mem, memsize (p));
 
-  INTERNAL_SIZE_T csz = chunksize (p);
+  csz = chunksize (p);
 
   /* Two optional cases in which clearing not necessary */
   if (chunk_is_mmapped (p))
@@ -3861,40 +3923,7 @@ __libc_calloc (size_t n, size_t elem_size)
     }
 #endif
 
-  /* Unroll clear of <= 36 bytes (72 if 8byte sizes).  We know that
-     contents have an odd number of INTERNAL_SIZE_T-sized words;
-     minimally 3.  */
-  d = (INTERNAL_SIZE_T *) mem;
-  clearsize = csz - SIZE_SZ;
-  nclears = clearsize / sizeof (INTERNAL_SIZE_T);
-  assert (nclears >= 3);
-
-  if (nclears > 9)
-    return memset (d, 0, clearsize);
-
-  else
-    {
-      *(d + 0) = 0;
-      *(d + 1) = 0;
-      *(d + 2) = 0;
-      if (nclears > 4)
-        {
-          *(d + 3) = 0;
-          *(d + 4) = 0;
-          if (nclears > 6)
-            {
-              *(d + 5) = 0;
-              *(d + 6) = 0;
-              if (nclears > 8)
-                {
-                  *(d + 7) = 0;
-                  *(d + 8) = 0;
-                }
-            }
-        }
-    }
-
-  return mem;
+  return clear_mem (mem, csz);
 }
 #endif /* IS_IN (libc) */
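
The new fast path hands back a chunk straight from the tcache, so the property it has to preserve is that calloc() still returns zeroed memory when the chunk was previously used and freed by the same thread. A minimal sketch of a check for that follows. It is not part of the patch or of the glibc test suite. The helper names calloc_reuse_is_zeroed and check_small_sizes are hypothetical, and the sketch only assumes that a freed small block is likely to be recycled by the next same-size request. The check holds either way, since the C standard requires calloc() to return zeroed storage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper, not part of the patch: dirty a block, free it so
   that it may be cached per thread, then request the same size through
   calloc and verify every byte.  Returns 1 if the block is all zero,
   0 if a nonzero byte is found, -1 if an allocation fails.  */
int
calloc_reuse_is_zeroed (size_t size)
{
  assert (size > 0);

  unsigned char *dirty = malloc (size);
  if (dirty == NULL)
    return -1;

  /* Write through a volatile pointer so the compiler cannot drop the
     stores (and with them the malloc/free pair) as dead code.  */
  volatile unsigned char *v = dirty;
  for (size_t i = 0; i < size; i++)
    v[i] = 0xff;
  free (dirty);

  unsigned char *clean = calloc (1, size);
  if (clean == NULL)
    return -1;

  int ok = 1;
  for (size_t i = 0; i < size; i++)
    if (clean[i] != 0)
      ok = 0;

  free (clean);
  return ok;
}

/* Hypothetical helper: run the check for every request size from 1 up
   to max_size bytes.  Returns 1 only if all sizes pass.  */
int
check_small_sizes (size_t max_size)
{
  for (size_t size = 1; size <= max_size; size++)
    if (calloc_reuse_is_zeroed (size) != 1)
      return 0;
  return 1;
}
```

Calling check_small_sizes (1032) sweeps the sizes that the tcache is expected to serve with default tunables on a 64-bit build; that upper bound is an assumption and depends on the configured tcache limits. The sweep also crosses the 9-word boundary between the unrolled stores and the memset() call in clear_mem().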