Message ID: 002301cff37e$61c12330$25436990$@com
State: New
> On Oct 29, 2014, at 6:43 AM, Wilco Dijkstra <wdijkstr@arm.com> wrote:
>
> This patch aligns allocations of large blocks to a cacheline on ARM and
> AArch64. The main goal is to reduce performance variations due to random
> alignment choices, however it improves performance on several benchmarks
> as well. SPECFP2000 improves by ~1.5%.
>
> Any comments?
>
> [...]
>
> +#define MALLOC_LARGE_BLOCK_ALIGN (64)

Yes that is not the only cache line size on aarch64. Why don't you make it
a global variable and read it from the system register like we do for
memset? ThunderX has a cache line size of 128.

Thanks,
Andrew

> +#define MALLOC_LARGE_BLOCK_SIZE (16384)
>
> [...]
On 29 October 2014 13:43, Wilco Dijkstra <wdijkstr@arm.com> wrote:
> This patch aligns allocations of large blocks to a cacheline on ARM and
> AArch64. The main goal is to reduce performance variations due to random
> alignment choices, however it improves performance on several benchmarks
> as well. SPECFP2000 improves by ~1.5%.
>
> Any comments?

This bug may be relevant to changing malloc alignments:

https://sourceware.org/bugzilla/show_bug.cgi?id=6527

I don't know precisely what the issue is with malloc_get_state so I am
not sure if it applies here.

> ChangeLog:
> 2014-10-29  Wilco Dijkstra  <wdijkstr@arm.com>
>
>     * malloc/malloc.c (__libc_malloc): Add support for aligning large blocks.

I think this should be:

    malloc/malloc.c (__libc_malloc) [MALLOC_LARGE_BLOCK_ALIGN]: Add
    support for aligning large blocks.

Although I am not a GNU ChangeLog expert...

>     * sysdeps/unix/sysv/linux/aarch64/malloc-sysdep.h: New file.
>     New defines (MALLOC_LARGE_BLOCK_ALIGN), (MALLOC_LARGE_BLOCK_SIZE).

"New file." should be sufficient.

>     * sysdeps/unix/sysv/linux/arm/malloc-sysdep.h: Likewise.
>
> [...]
>
> +#ifdef MALLOC_LARGE_BLOCK_ALIGN
> +  if (bytes > MALLOC_LARGE_BLOCK_SIZE)
> +    {
> +      void *address = RETURN_ADDRESS (0);
> +      return _mid_memalign (MALLOC_LARGE_BLOCK_ALIGN, bytes, address);
> +    }
> +#endif
> +

This will change the behaviour of malloc hooks, i.e. the malloc hook
will not get triggered but memalign will.

> [...]
>
> +/* System-specific malloc support functions.  AArch64 version.
> +   Copyright (C) 2012-2014 Free Software Foundation, Inc.

These copyright lines are wrong, but the whole copy of the LGPL could
probably be dropped with the file being so simple.

> [...]
On Wed, Oct 29, 2014 at 2:05 PM, Will Newton <will.newton@linaro.org> wrote:
> On 29 October 2014 13:43, Wilco Dijkstra <wdijkstr@arm.com> wrote:
>> This patch aligns allocations of large blocks to a cacheline on ARM and
>> AArch64. [...]
>
> [...]
>
>> +#ifdef MALLOC_LARGE_BLOCK_ALIGN
>> +  if (bytes > MALLOC_LARGE_BLOCK_SIZE)
>> +    {
>> +      void *address = RETURN_ADDRESS (0);
>> +      return _mid_memalign (MALLOC_LARGE_BLOCK_ALIGN, bytes, address);
>> +    }
>> +#endif
>> +
>
> This will change the behaviour of malloc hooks, i.e. the malloc hook
> will not get triggered but memalign will.

Yeah that was my first reaction too - but read _mid_memalign - that
appears to call the malloc hooks too?

Ramana
> Andrew Pinski wrote:
>> +#define MALLOC_LARGE_BLOCK_ALIGN (64)
>
> Yes that is not the only cache line size on aarch64. Why don't you make
> it a global variable and read it from the system register like we do for
> memset? ThunderX has a cache line size of 128.

We could easily set it to 128 for AArch64 - the amount of wasted memory is
minimal for large block sizes so overaligning doesn't hurt much.

However you raise an interesting point - several functions may want to know
the cacheline size and this would be useful across all targets. So we could
add generic infrastructure that sets the cacheline size at GLIBC startup
rather than having to add complex target specific code to compute it and
cache it on demand in every function that might query the line size.

Wilco
On 29 October 2014 14:16, Ramana Radhakrishnan
<ramana.gcc@googlemail.com> wrote:
> On Wed, Oct 29, 2014 at 2:05 PM, Will Newton <will.newton@linaro.org> wrote:
>> [...]
>>
>> This will change the behaviour of malloc hooks, i.e. the malloc hook
>> will not get triggered but memalign will.
>
> Yeah that was my first reaction too - but read _mid_memalign - that
> appears to call the malloc hooks too?

But _mid_memalign calls the memalign hook rather than the malloc hook,
which is rather surprising given we called malloc. It should simply be
a case of moving the #ifdef block to below the call of the malloc hook
in __libc_malloc to retain the existing behaviour.
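[For illustration, the reordering Will suggests might look roughly like
this inside __libc_malloc, reusing only identifiers from the quoted hunk;
a sketch, not the committed change:]

  void *(*hook) (size_t, const void *)
    = atomic_forced_read (__malloc_hook);
  if (__builtin_expect (hook != NULL, 0))
    return (*hook) (bytes, RETURN_ADDRESS (0));

  /* The malloc hook has had first refusal; only now divert large
     requests to the aligned path.  */
#ifdef MALLOC_LARGE_BLOCK_ALIGN
  if (bytes > MALLOC_LARGE_BLOCK_SIZE)
    {
      void *address = RETURN_ADDRESS (0);
      return _mid_memalign (MALLOC_LARGE_BLOCK_ALIGN, bytes, address);
    }
#endif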
> Will Newton wrote:
>>> This will change the behaviour of malloc hooks, i.e. the malloc hook
>>> will not get triggered but memalign will.
>>
>> Yeah that was my first reaction too - but read _mid_memalign - that
>> appears to call the malloc hooks too?
>
> But _mid_memalign calls the memalign hook rather than the malloc hook,
> which is rather surprising given we called malloc. It should simply be
> a case of moving the #ifdef block to below the call of the malloc hook
> in __libc_malloc to retain the existing behaviour.

Right, it would be trivial to move the code so that the malloc hook is
tried first. However if it isn't set then it will try the memalign hook
as well. I'm unclear what the idea of the hooks is: to hook the original
call only, to hook whatever low-level function you eventually fall into,
or any combination... There is also an overhead in trying several hooks
rather than just one.

One possibility would be to split _mid_memalign, skip all the unnecessary
tests on alignment (which we know is correct) and go directly to the
arena_get part, which is simpler and faster.

Wilco
On Wed, Oct 29, 2014 at 01:43:54PM -0000, Wilco Dijkstra wrote:
> This patch aligns allocations of large blocks to a cacheline on ARM and
> AArch64. The main goal is to reduce performance variations due to random
> alignment choices, however it improves performance on several benchmarks
> as well. SPECFP2000 improves by ~1.5%.
>
> Any comments?

It seems like this would pathologically increase fragmentation: when a
tiny block is split off the beginning to align a large allocation,
isn't it likely that this tiny chunk (rather than some other tiny
chunk already in the appropriate-sized free bin prior to the large
allocation) will be used to satisfy a tiny request, which may end up
being long-lived?

memalign-type operations are known to be a major problem for
fragmentation. See this bug report:

https://sourceware.org/bugzilla/show_bug.cgi?id=14581

A smarter approach might be to round up the _sizes_ of large
allocations so that the next available address after the allocated
block is nicely aligned. This will tend to make _future_ allocations
aligned, even if the first one isn't, and won't leave any free gaps
that could contribute to fragmentation.

Rich
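[A rough, self-contained sketch of the size-rounding idea; LINE_SIZE and
the helper name are invented for the example and do not reflect glibc's
actual chunk layout:]

#include <stdint.h>
#include <stddef.h>

#define LINE_SIZE 64   /* assumed cache-line size */

/* Given where a large block will start, pad its size so that the first
   address past the block falls on a cache-line boundary; the next
   allocation carved from that region then starts aligned, and no free
   gap is split off in front of this one.  */
static size_t
pad_large_request (uintptr_t start, size_t bytes)
{
  uintptr_t end = start + bytes;
  uintptr_t aligned_end = (end + LINE_SIZE - 1) & ~(uintptr_t) (LINE_SIZE - 1);
  return (size_t) (aligned_end - start);
}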
On Wed, Oct 29, 2014 at 9:25 AM, Wilco Dijkstra <wdijkstr@arm.com> wrote:
>> Andrew Pinski wrote:
>>> +#define MALLOC_LARGE_BLOCK_ALIGN (64)
>>
>> Yes that is not the only cache line size on aarch64. Why don't you make
>> it a global variable and read it from the system register like we do for
>> memset? ThunderX has a cache line size of 128.
>
> We could easily set it to 128 for AArch64 - the amount of wasted memory is
> minimal for large block sizes so overaligning doesn't hurt much.
>
> However you raise an interesting point - several functions may want to know
> the cacheline size and this would be useful across all targets. So we could
> add generic infrastructure that sets the cacheline size at GLIBC startup
> rather than having to add complex target specific code to compute it and
> cache it on demand in every function that might query the line size.

This has already been done for powerpc, since the A2 processor can use
different cache-line sizes, so look there for precedent: grep for
__cache_line_size.
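[For illustration: on AArch64 the line size can be read from CTR_EL0 at
startup, much as Wilco proposes (assuming userspace access to the
register, which Linux permits); a minimal sketch, with the variable name
borrowed from the powerpc precedent and the surrounding code hypothetical:]

#include <stdint.h>

static uint64_t __cache_line_size;

/* CTR_EL0.DminLine (bits [19:16]) holds log2 of the smallest data cache
   line size in 4-byte words, so the size in bytes is 4 << DminLine:
   64 on most cores, 128 on ThunderX.  */
static void
init_cache_line_size (void)
{
  uint64_t ctr;
  __asm__ ("mrs %0, ctr_el0" : "=r" (ctr));
  __cache_line_size = 4UL << ((ctr >> 16) & 0xf);
}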
On Wed, Oct 29, 2014 at 2:27 PM, Will Newton <will.newton@linaro.org> wrote:
> On 29 October 2014 14:16, Ramana Radhakrishnan
> <ramana.gcc@googlemail.com> wrote:
>> [...]
>>
>> Yeah that was my first reaction too - but read _mid_memalign - that
>> appears to call the malloc hooks too?
>
> But _mid_memalign calls the memalign hook rather than the malloc hook,
> which is rather surprising given we called malloc. It should simply be
> a case of moving the #ifdef block to below the call of the malloc hook
> in __libc_malloc to retain the existing behaviour.

Indeed - sorry for the noise.

Ramana

> --
> Will Newton
> Toolchain Working Group, Linaro
> Rich Felker wrote:
>
> It seems like this would pathologically increase fragmentation: when a
> tiny block is split off the beginning to align a large allocation,
> isn't it likely that this tiny chunk (rather than some other tiny
> chunk already in the appropriate-sized free bin prior to the large
> allocation) will be used to satisfy a tiny request, which may end up
> being long-lived?

A good implementation of memalign would try to avoid that, for example
by increasing the size of the previous allocated block, or by not adding
tiny split-off chunks to the free list.

> memalign-type operations are known to be a major problem for
> fragmentation. See this bug report:
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=14581

It seems that report is claiming that freeing a block doesn't immediately
merge it with adjacent freed blocks. If that is the case then that is a
very serious bug indeed! The fact that memalign is not fully integrated
into the allocation code doesn't help either - it never tries to reuse a
correctly aligned free block.

> A smarter approach might be to round up the _sizes_ of large
> allocations so that the next available address after the allocated
> block is nicely aligned. This will tend to make _future_ allocations
> aligned, even if the first one isn't, and won't leave any free gaps
> that could contribute to fragmentation.

That might be possible too, I'd have to give it a go. I was trying to
avoid reworking the guts of the malloc code - I think it would be better
to create a new modern allocator from scratch.

Wilco
On Wed, Oct 29, 2014 at 03:31:37PM -0000, Wilco Dijkstra wrote:
> A good implementation of memalign would try to avoid that, for example
> by increasing the size of the previous allocated block, or by not adding
> tiny split-off chunks to the free list.

I don't think modifying the previous allocated block is practical; this
almost certainly has data races unless you impose costly synchronization.
And if you do the split but don't add the split-off block to the free
list, I don't see how you'd track it to free it later.

You could refrain from splitting off the tiny block at all and instead
keep it as part of the allocated block, with a special flag bit on the
header to indicate that the allocator needs to look back to find the
actual beginning of the block when freeing or reallocing it, etc. This
is essentially a more complex, but slightly more optimal, version of my
idea of rounding up the size.

> It seems that report is claiming that freeing a block doesn't immediately
> merge it with adjacent freed blocks. If that is the case then that is a
> very serious bug indeed! The fact that memalign is not fully integrated
> into the allocation code doesn't help either - it never tries to reuse a
> correctly aligned free block.

Yes, there are probably improvements that could be made to memalign to
reduce the problem, but they have tradeoffs. I don't think it's practical
to make memalign reuse correctly aligned free blocks unless you segregate
the free lists not just by size but also by alignment. Otherwise you'd
have to do a non-constant-time search to determine whether a free block
of the desired alignment exists and select it.

> That might be possible too, I'd have to give it a go. I was trying to
> avoid reworking the guts of the malloc code - I think it would be better
> to create a new modern allocator from scratch.

That's quite possible. All the basic algorithms involved are understood
and well documented, so it's certainly practical to redo it without all
the cruft. However, there is quite a big testing burden: even if you know
what you're doing, it's very easy to make stupid mistakes.

Rich
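[The look-back scheme is essentially the classic trick of over-allocating
and stashing an offset the allocator can follow back to the real start;
a freestanding sketch on top of plain malloc, not glibc's chunk layout:]

#include <stdint.h>
#include <stdlib.h>

/* Over-allocate, place the payload at the next aligned address, and
   record how far we moved just before the payload so the free path can
   walk back to the real start of the block.  ALIGN must be a power of
   two and at least sizeof (uint32_t).  */
static void *
memalign_sketch (size_t align, size_t size)
{
  unsigned char *raw = malloc (size + align + sizeof (uint32_t));
  if (raw == NULL)
    return NULL;
  uintptr_t payload = ((uintptr_t) raw + sizeof (uint32_t) + align - 1)
                      & ~(uintptr_t) (align - 1);
  ((uint32_t *) payload)[-1] = (uint32_t) (payload - (uintptr_t) raw);
  return (void *) payload;
}

static void
free_sketch (void *p)
{
  free ((unsigned char *) p - ((uint32_t *) p)[-1]);
}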
On 10/29/2014 09:43 AM, Wilco Dijkstra wrote:
> +#define MALLOC_LARGE_BLOCK_ALIGN (64)
> +#define MALLOC_LARGE_BLOCK_SIZE (16384)

Typo-prone interface. It will generate -Wundef errors. See:

https://sourceware.org/glibc/wiki/Wundef

It will soon become an error to leave these undefined for other arches.

Cheers,
Carlos.
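[A toy illustration of a -Wundef-clean shape: the generic header would
supply a zero default so the macro is always defined and can be tested
with #if; the structure here is hypothetical, not glibc's:]

#include <stdio.h>

/* In glibc this default would live in the generic malloc-sysdep.h, with
   ports such as aarch64 overriding it with a non-zero value.  */
#ifndef MALLOC_LARGE_BLOCK_ALIGN
# define MALLOC_LARGE_BLOCK_ALIGN 0
#endif

int
main (void)
{
#if MALLOC_LARGE_BLOCK_ALIGN > 0
  printf ("aligning large blocks to %d bytes\n", MALLOC_LARGE_BLOCK_ALIGN);
#else
  puts ("no large-block alignment on this port");
#endif
  return 0;
}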
diff --git a/malloc/malloc.c b/malloc/malloc.c
index 6cbe9f3..0b0466e 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -2878,6 +2878,14 @@ __libc_malloc (size_t bytes)
   mstate ar_ptr;
   void *victim;
 
+#ifdef MALLOC_LARGE_BLOCK_ALIGN
+  if (bytes > MALLOC_LARGE_BLOCK_SIZE)
+    {
+      void *address = RETURN_ADDRESS (0);
+      return _mid_memalign (MALLOC_LARGE_BLOCK_ALIGN, bytes, address);
+    }
+#endif
+
   void *(*hook) (size_t, const void *)
     = atomic_forced_read (__malloc_hook);
   if (__builtin_expect (hook != NULL, 0))
diff --git a/sysdeps/unix/sysv/linux/aarch64/malloc-sysdep.h b/sysdeps/unix/sysv/linux/aarch64/malloc-sysdep.h
new file mode 100644
index 0000000..3fe9f72
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/aarch64/malloc-sysdep.h
@@ -0,0 +1,23 @@
+/* System-specific malloc support functions.  AArch64 version.
+   Copyright (C) 2012-2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#define MALLOC_LARGE_BLOCK_ALIGN (64)
+#define MALLOC_LARGE_BLOCK_SIZE (16384)
+
+#include_next <malloc-sysdep.h>
+
diff --git a/sysdeps/unix/sysv/linux/arm/malloc-sysdep.h b/sysdeps/unix/sysv/linux/arm/malloc-sysdep.h
new file mode 100644
index 0000000..3fe9f72
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/arm/malloc-sysdep.h
@@ -0,0 +1,23 @@
+/* System-specific malloc support functions.  AArch64 version.
+   Copyright (C) 2012-2014 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#define MALLOC_LARGE_BLOCK_ALIGN (64)
+#define MALLOC_LARGE_BLOCK_SIZE (16384)
+
+#include_next <malloc-sysdep.h>
+