Message ID | PAWPR08MB8982CFB797B86F9DBDA20E6F83A42@PAWPR08MB8982.eurprd08.prod.outlook.com |
---|---|
State | New |
Series | malloc: Move mmap code out of __libc_free hotpath |
* Wilco Dijkstra:

> Currently __libc_free checks for a freed mmap chunk in the fast path.
> Also errno is always saved and restored to preserve it.  Since mmap chunks
> are larger than the largest tcache chunk, it is safe to delay this and
> handle tcache, smallbin and medium bin blocks first.  Move saving of errno
> to cases that actually need it.

I think the “Since mmap chunks are larger than the largest tcache chunk”
part is not correct.  We can fall back to performing mmap for a smallish
allocation.  But I think it's still fine to put such chunks into tcache.
Maybe add a comment to this effect?

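
A standalone toy model of the ordering concern may help here: if the free
fast path tests the size range before the IS_MMAPPED bit, a sufficiently
small mmapped chunk would be routed towards tcache.  This is illustrative
code, not glibc source; the tcache limit and flag value below are assumed
stand-ins for the real internal constants.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Toy model of the ordering question, not glibc code.  */
#define ASSUMED_MAX_TCACHE_SIZE 1032   /* assumed 64-bit tcache limit */
#define ASSUMED_IS_MMAPPED      0x2    /* assumed "chunk came from mmap" bit */

static bool
routed_to_tcache_first (size_t chunk_size, unsigned flags)
{
  /* If the size check runs before the mmap check, a small mmapped chunk
     would take the tcache path.  The question in the thread is whether
     such a chunk can exist at all.  */
  return chunk_size <= ASSUMED_MAX_TCACHE_SIZE
	 && (flags & ASSUMED_IS_MMAPPED) != 0;
}

int
main (void)
{
  printf ("512-byte mmapped chunk hits the tcache path: %d\n",
	  routed_to_tcache_first (512, ASSUMED_IS_MMAPPED));
  printf ("4096-byte mmapped chunk hits the tcache path: %d\n",
	  routed_to_tcache_first (4096, ASSUMED_IS_MMAPPED));
  return 0;
}
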
Hi Florian,

>> Currently __libc_free checks for a freed mmap chunk in the fast path.
>> Also errno is always saved and restored to preserve it.  Since mmap chunks
>> are larger than the largest tcache chunk, it is safe to delay this and
>> handle tcache, smallbin and medium bin blocks first.  Move saving of errno
>> to cases that actually need it.
>
> I think the “Since mmap chunks are larger than the largest tcache
> chunk” part is not correct.  We can fall back to performing mmap for a
> smallish allocation.  But I think it's still fine to put such chunks
> into tcache.  Maybe add a comment to this effect?

Well assuming minimum page size is 4KB it should not currently be possible
for the smallest mmap chunk to get into tcache.

However the available size of an mmap chunk is different, so it would go
wrong unless we make the chunk layouts compatible (that would be a
simplification that removes the special cases in memsize and musable).

Cheers,
Wilco

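
The "available size" point can be illustrated with a self-contained model
(a simplified stand-in for glibc's memsize/musable, not the actual source):
a regular in-use chunk can hand the next chunk's prev_size field to the
application, while an mmapped chunk cannot, so the same chunk size yields
different usable sizes.

#include <stddef.h>
#include <stdio.h>

/* Simplified stand-in, not glibc source: the header is two size_t words,
   and only non-mmapped chunks may reuse the next chunk's prev_size field
   as user data.  */
#define SIZE_SZ       sizeof (size_t)
#define CHUNK_HDR_SZ  (2 * SIZE_SZ)

static size_t
usable_size (size_t chunksize, int is_mmapped)
{
  return chunksize - CHUNK_HDR_SZ + (is_mmapped ? 0 : SIZE_SZ);
}

int
main (void)
{
  /* Same chunk size, different usable size depending on how it was obtained.  */
  printf ("heap chunk of 64 bytes: %zu usable\n", usable_size (64, 0));
  printf ("mmap chunk of 64 bytes: %zu usable\n", usable_size (64, 1));
  return 0;
}
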
Hi DJ,

> ... the rest of this looks ok, but it leaves __libc_free and _int_free as
> being very small functions.  Do we need them?  Or are we relying on the
> inlining to make them efficient?

No, we don't need them - the next step is to move _int_free_* into
__libc_free, simplify the code and use a similar tailcall approach as
__libc_malloc.  This gives similar speedups to malloc!

Cheers,
Wilco

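
One possible shape of that tailcall structure, sketched as a toy outside
glibc (none of these helper names are from the patch series): the hot
wrapper handles the cached case inline and makes the cold path its final
call, so the compiler can emit a tail call and keep the fast path small.

#include <stdlib.h>

/* Toy outline of a "fast path inline, cold path as tail call" free wrapper.
   try_cached_free and cold_free_path are hypothetical stand-ins.  */

static int
try_cached_free (void *mem)
{
  (void) mem;
  return 0;               /* pretend the per-thread cache was full */
}

static void
cold_free_path (void *mem)
{
  free (mem);             /* stand-in for arena/mmap handling */
}

static void
toy_free (void *mem)
{
  if (mem == NULL || try_cached_free (mem))
    return;
  cold_free_path (mem);   /* final call: eligible for a tail call */
}

int
main (void)
{
  toy_free (malloc (32));
  return 0;
}
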
* Wilco Dijkstra:

> Hi Florian,
>
>>> Currently __libc_free checks for a freed mmap chunk in the fast path.
>>> Also errno is always saved and restored to preserve it.  Since mmap chunks
>>> are larger than the largest tcache chunk, it is safe to delay this and
>>> handle tcache, smallbin and medium bin blocks first.  Move saving of errno
>>> to cases that actually need it.
>>
>> I think the “Since mmap chunks are larger than the largest tcache
>> chunk” part is not correct.  We can fall back to performing mmap for a
>> smallish allocation.  But I think it's still fine to put such chunks
>> into tcache.  Maybe add a comment to this effect?
>
> Well assuming minimum page size is 4KB it should not currently be possible
> for the smallest mmap chunk to get into tcache.

I think 2K alignment will produce a 2K chunk.  Still outside the range of
tcache, though.  I thought it reached further.

> However the available size of an mmap chunk is different, so it would
> go wrong unless we make the chunk layouts compatible (that would be a
> simplification that removes the special cases in memsize and musable).

Oh, I had forgotten about that.  In this case, we probably want some sort
of assert that this doesn't go wrong if someone changes TCACHE_MAX_BINS.
And the comment needs updating that the number is no longer arbitrary.

Thanks,
Florian

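
The suggested assert could take roughly the shape below.  In a standalone
form the two bounds are assumptions (the ~1 KiB 64-bit tcache limit and the
2 KiB minimum mmapped chunk from the alignment case mentioned above); in
glibc they would be expressions derived from TCACHE_MAX_BINS and the page
size rather than literal numbers.

#include <assert.h>

/* Standalone sketch of the suggested guard, with assumed 64-bit defaults.  */
#define ASSUMED_LARGEST_TCACHE_SIZE    1032
#define ASSUMED_SMALLEST_MMAPPED_CHUNK 2048

_Static_assert (ASSUMED_LARGEST_TCACHE_SIZE < ASSUMED_SMALLEST_MMAPPED_CHUNK,
		"freed mmapped chunks must stay out of tcache range");

int
main (void)
{
  return 0;
}
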
Hi DJ,

> Wilco Dijkstra <Wilco.Dijkstra@arm.com> writes:
>>> ... the rest of this looks ok, but it leaves __libc_free and _int_free as
>>> being very small functions.  Do we need them?  Or are we relying on the
>>> inlining to make them efficient?
>>
>> No, we don't need them - the next step is to move _int_free_* into
>> __libc_free, simplify the code and use a similar tailcall approach as
>> __libc_malloc.  This gives similar speedups to malloc!
>
> Does this imply that the tcache stuff will only live in __libc_free and
> nothing else will populate tcache from internally?

Yes, for free there doesn't seem to be a need for another interface that
frees into tcache besides __libc_free.  On current trunk the _int_free
functions are used once.  So basically there is no need to use _int_free:
either you want tcache and thus use __libc_free, or if you don't, use
_int_free_chunk.

Note tcache_thread_shutdown calls __libc_free even though it should be
calling _int_free_chunk, and then has to jump through hoops to avoid
infinite recursion...

Cheers,
Wilco

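
The "hoops" can be modelled outside glibc like this (a paraphrase, not the
tcache_thread_shutdown source): the per-thread cache pointer is cleared
before any entry is pushed back through the public free path, so a
re-entrant call cannot touch or reinitialize the cache; freeing through an
internal chunk routine would make that dance unnecessary.  All names here
are toy stand-ins.

#include <stdlib.h>

static void **thread_cache;   /* stand-in for the per-thread tcache pointer */

static void
public_free (void *mem)
{
  /* Cache the block if a one-slot cache is live and empty, else free it.  */
  if (thread_cache != NULL && thread_cache[0] == NULL)
    {
      thread_cache[0] = mem;
      return;
    }
  free (mem);
}

static void
cache_shutdown (void)
{
  void **tmp = thread_cache;
  thread_cache = NULL;        /* disable the cache before freeing anything */
  if (tmp == NULL)
    return;
  public_free (tmp[0]);       /* re-entry now sees a NULL cache */
  free (tmp);
}

int
main (void)
{
  thread_cache = calloc (1, sizeof *thread_cache);
  public_free (malloc (16));  /* gets cached while the cache is live */
  cache_shutdown ();          /* flushes the entry and the cache itself */
  return 0;
}
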
diff --git a/malloc/malloc.c b/malloc/malloc.c
index b73ddbf554461da34d99258fae87c6ece6d175ba..f362227e68b78e25583aa971cb6ee131f44a92e5 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -3313,6 +3313,9 @@ tcache_init(void)
   if (tcache_shutting_down)
     return;
 
+  /* Preserve errno when called from free() - _int_malloc may corrupt it.  */
+  int err = errno;
+
   arena_get (ar_ptr, bytes);
   victim = _int_malloc (ar_ptr, bytes);
   if (!victim && ar_ptr != NULL)
@@ -3321,10 +3324,11 @@ tcache_init(void)
       victim = _int_malloc (ar_ptr, bytes);
     }
 
-
   if (ar_ptr != NULL)
     __libc_lock_unlock (ar_ptr->mutex);
 
+  __set_errno (err);
+
   /* In a low memory situation, we may not be able to allocate memory
      - in which case, we just keep trying later.  However, we
      typically do this very early, so either there is sufficient
@@ -3446,37 +3450,15 @@ __libc_free (void *mem)
   if (__glibc_unlikely (mtag_enabled))
     *(volatile char *)mem;
 
-  int err = errno;
-
   p = mem2chunk (mem);
 
-  if (chunk_is_mmapped (p))            /* release mmapped memory. */
-    {
-      /* See if the dynamic brk/mmap threshold needs adjusting.
-         Dumped fake mmapped chunks do not affect the threshold.  */
-      if (!mp_.no_dyn_threshold
-          && chunksize_nomask (p) > mp_.mmap_threshold
-          && chunksize_nomask (p) <= DEFAULT_MMAP_THRESHOLD_MAX)
-        {
-          mp_.mmap_threshold = chunksize (p);
-          mp_.trim_threshold = 2 * mp_.mmap_threshold;
-          LIBC_PROBE (memory_mallopt_free_dyn_thresholds, 2,
-                      mp_.mmap_threshold, mp_.trim_threshold);
-        }
-      munmap_chunk (p);
-    }
-  else
-    {
-      MAYBE_INIT_TCACHE ();
-
-      /* Mark the chunk as belonging to the library again.  */
-      (void)tag_region (chunk2mem (p), memsize (p));
+  MAYBE_INIT_TCACHE ();
 
-      ar_ptr = arena_for_chunk (p);
-      _int_free (ar_ptr, p, 0);
-    }
+  /* Mark the chunk as belonging to the library again.  */
+  (void)tag_region (chunk2mem (p), memsize (p));
 
-  __set_errno (err);
+  ar_ptr = arena_for_chunk (p);
+  _int_free (ar_ptr, p, 0);
 }
 libc_hidden_def (__libc_free)
 
@@ -4665,6 +4647,9 @@ _int_free_chunk (mstate av, mchunkptr p, INTERNAL_SIZE_T size, int have_lock)
 
   else if (!chunk_is_mmapped(p)) {
 
+    /* Preserve errno in case block merging results in munmap.  */
+    int err = errno;
+
     /* If we're single-threaded, don't lock the arena.  */
     if (SINGLE_THREAD_P)
       have_lock = true;
@@ -4676,13 +4661,33 @@ _int_free_chunk (mstate av, mchunkptr p, INTERNAL_SIZE_T size, int have_lock)
 
     if (!have_lock)
       __libc_lock_unlock (av->mutex);
+
+    __set_errno (err);
   }
   /*
     If the chunk was allocated via mmap, release via munmap().
   */
 
   else {
+
+    /* Preserve errno in case munmap sets it.  */
+    int err = errno;
+
+    /* See if the dynamic brk/mmap threshold needs adjusting.
+       Dumped fake mmapped chunks do not affect the threshold.  */
+    if (!mp_.no_dyn_threshold
+        && chunksize_nomask (p) > mp_.mmap_threshold
+        && chunksize_nomask (p) <= DEFAULT_MMAP_THRESHOLD_MAX)
+      {
+        mp_.mmap_threshold = chunksize (p);
+        mp_.trim_threshold = 2 * mp_.mmap_threshold;
+        LIBC_PROBE (memory_mallopt_free_dyn_thresholds, 2,
+                    mp_.mmap_threshold, mp_.trim_threshold);
+      }
+
     munmap_chunk (p);
+
+    __set_errno (err);
   }
 }