Message ID | ce479249-6362-6a58-ec9b-d6227ef99db9@redhat.com |
---|---|
State | New |
Headers | show |
Carlos O'Donell wrote: > The dynamic array growth factor appears to be a value of 2, and that doesn't > match best practice nor academic literature. I would be happier with a growth > factor closer to 1.5. For what it's worth, Gnulib (which has supported dynamically growabable arrays for some time) generates the new size via "n += n / 2 + 1;" by default. This guarantees growth even if N is zero, and approximates the growth factor of 1.5 that you're suggesting. (The code also checks for integer overflow of course.) The caller can ask for a different amount of growth, if necessary. > Can we make the choice to have dynarray always heap allocated? In Gnulib we've found it better to support initial buffers on the stack, as an option. Commonly these initial buffers are already big enough, which is a performance win in the typical case. > I dont like defaults for this interface. I think the user should > explicitly request a size or this should fail. Why impose this requirement? In Gnulib the caller can specify an initial size, but this is optional. The default initial size is chosen to be efficient for the underlying malloc implementation. It would seem odd to require callers to know this implementation detail. As a general point, I suggest looking at Gnulib's lib/xalloc.h and lib/xmalloc.c for ideas. It has many of the ideas of the proposal (including some type safety). It's not as fancy as the proposal, but it does have the advantage of widespread use and implementation experience. http://git.savannah.gnu.org/cgit/gnulib.git/tree/lib/xalloc.h http://git.savannah.gnu.org/cgit/gnulib.git/tree/lib/xmalloc.c
On 05/20/2017 08:43 PM, Paul Eggert wrote: > Carlos O'Donell wrote: >> The dynamic array growth factor appears to be a value of 2, and >> that doesn't match best practice nor academic literature. I would >> be happier with a growth factor closer to 1.5. > > For what it's worth, Gnulib (which has supported dynamically > growabable arrays for some time) generates the new size via "n += n / > 2 + 1;" by default. This guarantees growth even if N is zero, and > approximates the growth factor of 1.5 that you're suggesting. (The > code also checks for integer overflow of course.) The caller can ask > for a different amount of growth, if necessary. The growth factor is related to the underlying storage being used by the dynamic array. If the storage had such properties that the array could be grown in-place and to any size with zero cost, then the growth could be incredibly small and size kept optimal. Since no underlying storage has those properties at all times we choose a growth factor that maps to the properties that are present. Here, those properties are a function of the underlying system allocator, how it calls mmap, when, how often, and the costs associated with it. In general I'm surprised to see that 1.5 is a universally good constant, then again, most OS have similar subsystems when it comes to allocators and virtual memory. I don't know that having the caller ask for different growth is going to be initially useful in glibc so I'm not going to ask for this in the API. >> Can we make the choice to have dynarray always heap allocated? > > In Gnulib we've found it better to support initial buffers on the > stack, as an option. Commonly these initial buffers are already big > enough, which is a performance win in the typical case. OK. Thanks for the data point here. >> I dont like defaults for this interface. I think the user should >> explicitly request a size or this should fail. > > Why impose this requirement? In Gnulib the caller can specify an > initial size, but this is optional. The default initial size is > chosen to be efficient for the underlying malloc implementation. It > would seem odd to require callers to know this implementation > detail. I see your point. Having the caller convey that no information is relevant in the context allows the implementation to choose something optimal given the backing store. And we have some use cases like this in the loader. At most we know we have say one link map, but how many more there are after is completely unknown (unless we had statistical data to guide us). > As a general point, I suggest looking at Gnulib's lib/xalloc.h and > lib/xmalloc.c for ideas. It has many of the ideas of the proposal > (including some type safety). It's not as fancy as the proposal, but > it does have the advantage of widespread use and implementation > experience. > > http://git.savannah.gnu.org/cgit/gnulib.git/tree/lib/xalloc.h > http://git.savannah.gnu.org/cgit/gnulib.git/tree/lib/xmalloc.c Before I look... What license do these files have?
On 06/01/2017 10:13 AM, Carlos O'Donell wrote: > On 05/20/2017 08:43 PM, Paul Eggert wrote: >> Carlos O'Donell wrote: >>> The dynamic array growth factor appears to be a value of 2, and >>> that doesn't match best practice nor academic literature. I would >>> be happier with a growth factor closer to 1.5. >> >> For what it's worth, Gnulib (which has supported dynamically >> growabable arrays for some time) generates the new size via "n += n / >> 2 + 1;" by default. This guarantees growth even if N is zero, and >> approximates the growth factor of 1.5 that you're suggesting. (The >> code also checks for integer overflow of course.) The caller can ask >> for a different amount of growth, if necessary. > > The growth factor is related to the underlying storage being used by > the dynamic array. If the storage had such properties that the array > could be grown in-place and to any size with zero cost, then the growth > could be incredibly small and size kept optimal. Since no underlying storage > has those properties at all times we choose a growth factor that maps to the > properties that are present. Here, those properties are a function > of the underlying system allocator, how it calls mmap, when, how often, > and the costs associated with it. In general I'm surprised to see that 1.5 > is a universally good constant, then again, most OS have similar subsystems > when it comes to allocators and virtual memory. FWIW, we use a 1.5 growth factor for similar situations within GCC. > > I don't know that having the caller ask for different growth is going to be > initially useful in glibc so I'm not going to ask for this in the API. > >>> Can we make the choice to have dynarray always heap allocated? >> >> In Gnulib we've found it better to support initial buffers on the >> stack, as an option. Commonly these initial buffers are already big >> enough, which is a performance win in the typical case. > > OK. Thanks for the data point here. And I'll chime in here that based on experiences within glibc that alloca and vlas are ultimately a bad idea. Yes, they can be orders of magnitude more efficient than hitting the heap, but time and time again they've been mis-managed leading to security vulnerabilities. Walk through the CVEs for glibc over the last 10 years and count how many are related to dynamic stack allocations. IMHO dynamic stack allocations should not be exposed to developers. We simply get them wrong too often. I'll climb down off the soapbox now... >> Why impose this requirement? In Gnulib the caller can specify an >> initial size, but this is optional. The default initial size is >> chosen to be efficient for the underlying malloc implementation. It >> would seem odd to require callers to know this implementation >> detail. > > I see your point. Having the caller convey that no information is relevant > in the context allows the implementation to choose something optimal given > the backing store. And we have some use cases like this in the loader. > At most we know we have say one link map, but how many more there are after > is completely unknown (unless we had statistical data to guide us). We've found in GCC that an initial size is useful for growable arrays. Often we know enough to guess the common size for particular objects. Jeff
On 06/01/2017 09:13 AM, Carlos O'Donell wrote: >> http://git.savannah.gnu.org/cgit/gnulib.git/tree/lib/xalloc.h >> http://git.savannah.gnu.org/cgit/gnulib.git/tree/lib/xmalloc.c > Before I look... > > What license do these files have? They're GPLed, with copyright assigned to the FSF. Although we haven't run into a need for an LGPLed version, that would not be hard to arrange, so I wouldn't worry about the LGPL-vs-GPL licensing issues. One other thing: I'm implementing a slightly different version for Gnulib that uses ptrdiff_t rather than size_t for sizes (and is LGPL rather than GPL since it will clearly have uses in library situations). Using ptrdiff_t improves reliability, since one can compile with -fsanitize=undefined to catch integer overflow with size calculations. Over the years we have found that using size_t for byte and object counts leads to problems for which there is no effective automated checking. This is why I suggest that new memory-allocation APIs use ptrdiff_t rather than size_t.
On 06/01/2017 07:09 PM, Jeff Law wrote:
> FWIW, we use a 1.5 growth factor for similar situations within GCC.
Once we have a set of internal users, I plan to add an stap probe which
logs the final sizes (and the element type etc.), and then we can
revisit the growth policy. My hope is that very few dynarrays will ever
hit the heap, but we'll see.
Thanks,
Florian
On 01/06/2017 17:20, Florian Weimer wrote: > On 06/01/2017 07:09 PM, Jeff Law wrote: >> FWIW, we use a 1.5 growth factor for similar situations within GCC. > > Once we have a set of internal users, I plan to add an stap probe which > logs the final sizes (and the element type etc.), and then we can > revisit the growth policy. My hope is that very few dynarrays will ever > hit the heap, but we'll see. Btw I am planning to use on my glob patches I am working on.
diff --git a/malloc/Makefile b/malloc/Makefile index ad074c6..50f8298 100644 --- a/malloc/Makefile +++ b/malloc/Makefile @@ -33,16 +33,16 @@ tests := mallocbug tst-malloc tst-valloc tst-calloc tst-obstack \ tst-mallocfork2 \ tst-interpose-nothread \ tst-interpose-thread \ - tst-dynarray \ - tst-dynarray-fail \ - tst-dynarray-at-fail \ tests-static := \ tst-interpose-static-nothread \ tst-interpose-static-thread \ tst-malloc-usable-static \ -tests-internal := tst-mallocstate tst-scratch_buffer +tests-internal := tst-mallocstate tst-scratch_buffer \ + tst-dynarray \ + tst-dynarray-fail \ + tst-dynarray-at-fail \ ifneq (no,$(have-tunables)) tests += tst-malloc-usable-tunables diff --git a/malloc/dynarray_at_failure.c b/malloc/dynarray_at_failure.c index 6123eae..7c8b4f0 100644 --- a/malloc/dynarray_at_failure.c +++ b/malloc/dynarray_at_failure.c @@ -23,9 +23,9 @@ void __libc_dynarray_at_failure (size_t size, size_t index) { char buf[200]; - snprintf (buf, sizeof (buf), "Fatal glibc error: " - "array index %zu not less than array length %zu\n", - index, size); - __libc_fatal (buf); + __snprintf (buf, sizeof (buf), "Fatal glibc error: " + "array index %zu not less than array length %zu\n", + index, size); + __libc_fatal (buf); } libc_hidden_def (__libc_dynarray_at_failure) diff --git a/malloc/tst-dynarray-fail.c b/malloc/tst-dynarray-fail.c index 69b84f0..3541587 100644 --- a/malloc/tst-dynarray-fail.c +++ b/malloc/tst-dynarray-fail.c @@ -30,6 +30,11 @@ #include <sys/resource.h> #include <unistd.h> +/* The test is run with malloc trace hooks which use _dl_addr on free + and that is very expensive. The test can take as long as 40s on + some systems so we increase to 50 for some buffer. */ +#define TIMEOUT 50 + /* Data structure to fill up the heap. */ struct heap_filler {