@@ -2143,6 +2143,9 @@ do_check_inuse_chunk (mstate av, mchunkptr p)
{
mchunkptr next;
+ if (av == NULL)
+ av = arena_for_chunk (p);
+
do_check_chunk (av, p);
if (chunk_is_mmapped (p))
@@ -3439,17 +3442,20 @@ __libc_free (void *mem)
/* Mark the chunk as belonging to the library again. */
(void)tag_region (chunk2mem (p), memsize (p));
- ar_ptr = arena_for_chunk (p);
INTERNAL_SIZE_T size = chunksize (p);
#if USE_TCACHE
- _int_free_check (ar_ptr, p, size);
+  /* av is not needed for _int_free_check in non-DEBUG mode;
+     in DEBUG mode, av is fetched from p in do_check_inuse_chunk.  */
+ _int_free_check (NULL, p, size);
if (tcache_free (p, size))
{
__set_errno (err);
return;
}
#endif
+
+ ar_ptr = arena_for_chunk (p);
_int_free_chunk (ar_ptr, p, size, 0);
}
Arena is not needed for _int_free_check () in non-DEBUG mode.  This
commit defers the arena dereference to _int_free_chunk (), which
accelerates the tcache path.  When DEBUG is enabled, the arena can be
obtained from p in do_check_inuse_chunk ().

Result of the bench-malloc-thread benchmark:

Test Platform: Xeon-8380
Ratio: New / Original time_per_iteration (Lower is Better)

Threads#   | Ratio
-----------|------
1 thread   | 0.994
4 threads  | 0.968

The data shows this brings a 3% performance gain in the multi-threaded
scenario.

Signed-off-by: Wangyang Guo <wangyang.guo@intel.com>
---
 malloc/malloc.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)