Message ID | ZI9MmdQ+OMehcdeg@kam.mff.cuni.cz
State      | New
Series     | [libstdc++] Improve _M_check_len
On Sun, 18 Jun 2023 at 19:37, Jan Hubicka <hubicka@ucw.cz> wrote:
> Hi,
> _M_check_len is used in vector reallocations. It computes __n + __s but
> checks for the case that (__n + __s) * sizeof(_Tp) would overflow
> ptrdiff_t. Since we know that __s is the size of an already allocated
> memory block, if __n is not too large this will never happen on 64bit
> systems, since memory is not that large. This patch adds
> __builtin_constant_p checks for this case. This shrinks the size of the
> fully inlined push_back function, which is critical for loops that are
> controlled by a std::vector-based stack.
>
> With the patch to optimize std::max and to handle SRA candidates, we now
> fully inline push_back with -O3 (not with -O2); however there are still
> quite a few silly things, for example:
>
>   // _78 is the original size of the allocated vector.
>
>   _76 = stack$_M_end_of_storage_177 - _142;
>   _77 = _76 /[ex] 8;
>   _78 = (long unsigned int) _77;
>   _79 = MAX_EXPR <_78, 1>;
>   _80 = _78 + _79; // result of _M_check_len doubling the allocated vector size.
>   if (_80 != 0) // result will always be non-zero.
>     goto <bb 7>; [54.67%]
>   else
>     goto <bb 13>; [45.33%]
>
>   <bb 7> [local count: 30795011]:
>   if (_80 > 1152921504606846975) // doubling successfully allocated memory will never get so large.
>     goto <bb 8>; [10.00%]
>   else
>     goto <bb 11>; [90.00%]
>
>   <bb 8> [local count: 3079501]:
>   if (_80 > 2305843009213693951) // I wonder if we really want to have two different throws
>     goto <bb 9>; [50.00%]
>   else
>     goto <bb 10>; [50.00%]
>
>   <bb 9> [local count: 1539750]:
>   std::__throw_bad_array_new_length ();
>
>   <bb 10> [local count: 1539750]:
>   std::__throw_bad_alloc ();
>
>   <bb 11> [local count: 27715510]:
>   _108 = _80 * 8;
>   _109 = operator new (_108);
>
> Maybe we want to add an assumption that the result of the function is
> never greater than max_size to get rid of the two checks above. However
> this will still be recognized only after inlining and will continue
> confusing the inliner heuristics.
>
> Bootstrapped/regtested x86_64-linux. I am not too familiar with libstdc++
> internals, so would welcome comments and ideas.
>
> libstdc++-v3/ChangeLog:
>
>         PR tree-optimization/110287
>         * include/bits/stl_vector.h: Optimize _M_check_len for constantly
>         sized types and allocations.
>
> diff --git a/libstdc++-v3/include/bits/stl_vector.h b/libstdc++-v3/include/bits/stl_vector.h
> index 70ced3d101f..3ad59fe3e2b 100644
> --- a/libstdc++-v3/include/bits/stl_vector.h
> +++ b/libstdc++-v3/include/bits/stl_vector.h
> @@ -1895,11 +1895,22 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>        size_type
>        _M_check_len(size_type __n, const char* __s) const
>        {
> -        if (max_size() - size() < __n)
> -          __throw_length_error(__N(__s));
> +        // On 64bit systems vectors of small sizes can not
> +        // reach overflow by growing by small sizes; before
> +        // this happens, we will run out of memory.
> +        if (__builtin_constant_p (sizeof (_Tp))

This shouldn't be here, of course sizeof is a constant.

No space before the opening parens, libstdc++ doesn't follow GNU style.

> +            && __builtin_constant_p (__n)
> +            && sizeof (ptrdiff_t) >= 8
> +            && __n < max_size () / 2)

This check is not OK. As I said in Bugzilla just now, max_size() depends on
the allocator, which could return something much smaller than PTRDIFF_MAX.
You can't make this assumption for all specializations of std::vector.

If Alloc::max_size() == 100 and this->size() == 100 then this function
needs to throw length_error for *any* n. In the general case you cannot
remove size() from this condition.

For std::allocator<T> it's safe to assume that max_size() is related to
PTRDIFF_MAX/sizeof(T), but this patch would apply to all allocators.

> +          return size() + (std::max)(size(), __n);
> +        else
> +          {
> +            if (max_size() - size() < __n)
> +              __throw_length_error(__N(__s));
>
> -        const size_type __len = size() + (std::max)(size(), __n);
> -        return (__len < size() || __len > max_size()) ? max_size() : __len;
> +            const size_type __len = size() + (std::max)(size(), __n);
> +            return (__len < size() || __len > max_size()) ? max_size() : __len;
> +          }
>        }
>
>        // Called by constructors to check initial size.
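To make that failure mode concrete, here is a minimal, self-contained
sketch of the "silly allocator" scenario; the tiny_alloc class is
hypothetical, written for illustration only and not part of the patch:

  #include <cstddef>
  #include <memory>
  #include <stdexcept>
  #include <vector>

  // Hypothetical allocator whose max_size() is only 100 elements.
  // std::vector must respect it: once size() == max_size(), _M_check_len
  // has to throw length_error for *any* requested growth.
  template <typename T>
  struct tiny_alloc : std::allocator<T>
  {
    std::size_t max_size() const noexcept { return 100; }
  };

  int main()
  {
    std::vector<int, tiny_alloc<int>> v(100); // size() == max_size() == 100
    try {
      v.push_back(0); // must throw std::length_error, not grow past the limit
    } catch (const std::length_error&) {
      // expected: no fast path may skip this check
    }
  }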
> > -        if (max_size() - size() < __n)
> > -          __throw_length_error(__N(__s));
> > +        // On 64bit systems vectors of small sizes can not
> > +        // reach overflow by growing by small sizes; before
> > +        // this happens, we will run out of memory.
> > +        if (__builtin_constant_p (sizeof (_Tp))
> >
> This shouldn't be here, of course sizeof is a constant.

OK :)

> No space before the opening parens, libstdc++ doesn't follow GNU style.

Fixed.

> > +            && __builtin_constant_p (__n)
> > +            && sizeof (ptrdiff_t) >= 8
> > +            && __n < max_size () / 2)
> >
> This check is not OK. As I said in Bugzilla just now, max_size() depends on
> the allocator, which could return something much smaller than PTRDIFF_MAX.
> You can't make this assumption for all specializations of std::vector.
>
> If Alloc::max_size() == 100 and this->size() == 100 then this function
> needs to throw length_error for *any* n. In the general case you cannot
> remove size() from this condition.
>
> For std::allocator<T> it's safe to assume that max_size() is related to
> PTRDIFF_MAX/sizeof(T), but this patch would apply to all allocators.

Here is an updated version. I simply __builtin_constant_p max_size and test
that it is large enough. For that we need to copy it into a temporary
variable, since we fold-const __builtin_constant_p (function (x)) early,
before the function gets inlined.

I also added __builtin_unreachable to determine the return value range, as
discussed in the PR.

Honza

diff --git a/libstdc++-v3/include/bits/stl_vector.h b/libstdc++-v3/include/bits/stl_vector.h
index 70ced3d101f..7a1966405ca 100644
--- a/libstdc++-v3/include/bits/stl_vector.h
+++ b/libstdc++-v3/include/bits/stl_vector.h
@@ -1895,11 +1895,29 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
       size_type
       _M_check_len(size_type __n, const char* __s) const
       {
-        if (max_size() - size() < __n)
-          __throw_length_error(__N(__s));
+        const size_type __max_size = max_size();
+        // On 64bit systems vectors can not reach overflow by growing
+        // by small sizes; before this happens, we will run out of memory.
+        if (__builtin_constant_p(__n)
+            && __builtin_constant_p(__max_size)
+            && sizeof(ptrdiff_t) >= 8
+            && __max_size * sizeof(_Tp) >= ((ptrdiff_t)1 << 60)
+            && __n < __max_size / 2)
+          {
+            const size_type __len = size() + (std::max)(size(), __n);
+            // let compiler know that __len has sane value range.
+            if (__len < __n || __len >= __max_size)
+              __builtin_unreachable();
+            return __len;
+          }
+        else
+          {
+            if (__max_size - size() < __n)
+              __throw_length_error(__N(__s));

-        const size_type __len = size() + (std::max)(size(), __n);
-        return (__len < size() || __len > max_size()) ? max_size() : __len;
+            const size_type __len = size() + (std::max)(size(), __n);
+            return (__len < size() || __len > __max_size) ? __max_size : __len;
+          }
       }

       // Called by constructors to check initial size.
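As a side note on the copy-into-a-temporary trick described above, a tiny
standalone sketch (the helper name is hypothetical; this is not the patch
code):

  #include <cstddef>

  std::size_t max_size_like(); // stands in for this->max_size()

  // __builtin_constant_p applied directly to a call is folded to 0 early,
  // before the callee can be inlined; storing the result in a local first
  // lets later optimizations still see it as a constant when it is one.
  bool fast_path_allowed()
  {
    const std::size_t __m = max_size_like();
    return __builtin_constant_p(__m);
  }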
On Mon, Jun 19, 2023 at 01:05:36PM +0200, Jan Hubicka via Gcc-patches wrote:
> -        if (max_size() - size() < __n)
> -          __throw_length_error(__N(__s));
> +        const size_type __max_size = max_size();
> +        // On 64bit systems vectors can not reach overflow by growing
> +        // by small sizes; before this happens, we will run out of memory.
> +        if (__builtin_constant_p(__n)
> +            && __builtin_constant_p(__max_size)
> +            && sizeof(ptrdiff_t) >= 8
> +            && __max_size * sizeof(_Tp) >= ((ptrdiff_t)1 << 60)

Isn't there a risk of overflow in the __max_size * sizeof(_Tp) computation?

        Jakub
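For illustration, the wraparound Jakub is pointing at, with hypothetical
numbers (assuming a 64-bit size_t and a user allocator reporting an
oversized max_size()):

  #include <cstddef>
  #include <cstdio>

  int main()
  {
    // Suppose a user-defined allocator reports max_size() == 2^61 for an
    // element type with sizeof(_Tp) == 8.
    std::size_t max_size = std::size_t(1) << 61;
    std::size_t tp_size = 8;

    // The true product is 2^64, but size_t arithmetic wraps modulo 2^64,
    // so the computed product is 0 and the ">= 1 << 60" guard misfires.
    std::size_t product = max_size * tp_size;
    std::printf("product = %zu\n", product);                        // prints 0
    std::printf("guard = %d\n", product >= (std::size_t(1) << 60)); // prints 0
  }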
On Mon, 19 Jun 2023 at 12:20, Jakub Jelinek wrote:
> On Mon, Jun 19, 2023 at 01:05:36PM +0200, Jan Hubicka via Gcc-patches wrote:
> > -        if (max_size() - size() < __n)
> > -          __throw_length_error(__N(__s));
> > +        const size_type __max_size = max_size();
> > +        // On 64bit systems vectors can not reach overflow by growing
> > +        // by small sizes; before this happens, we will run out of memory.
> > +        if (__builtin_constant_p(__n)
> > +            && __builtin_constant_p(__max_size)
> > +            && sizeof(ptrdiff_t) >= 8
> > +            && __max_size * sizeof(_Tp) >= ((ptrdiff_t)1 << 60)
>
> Isn't there a risk of overflow in the __max_size * sizeof(_Tp) computation?

For std::allocator, no, because max_size() is size_t(-1) / sizeof(_Tp). But
for a user-defined allocator that has a silly max_size(), yes, that's
possible.

I still don't really understand why any change is needed here. The PR says
that the current _M_check_len brings in the EH code, but how/why does that
happen? The __throw_length_error function is not inline, it's defined in
libstdc++.so, so why isn't it just an extern call? Is the problem that it
makes _M_check_len potentially-throwing? Because that's basically the
entire point of _M_check_len: to throw the exception that is required by
the C++ standard. We need to be very careful about removing that required
throw! And after we call _M_check_len we call allocate unconditionally, so
_M_realloc_insert can always throw (we only call _M_realloc_insert in the
case where we've already decided a reallocation is definitely needed).

Would this version of _M_check_len help?

      size_type
      _M_check_len(size_type __n, const char* __s) const
      {
        const size_type __size = size();
        const size_type __max_size = max_size();

        if (__is_same(allocator_type, allocator<_Tp>)
            && __size > __max_size / 2)
          __builtin_unreachable(); // Assume std::allocator can't fill memory.
        else if (__size > __max_size)
          __builtin_unreachable();

        if (__max_size - __size < __n)
          __throw_length_error(__N(__s));

        const size_type __len = __size + (std::max)(__size, __n);
        return (__len < __size || __len > __max_size) ? __max_size : __len;
      }

This only applies to std::allocator, not user-defined allocators (because
we don't know their semantics). It also seems like less of a big hack!
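As background on the __builtin_unreachable() hint used in the proposal
above, a minimal illustration of how it feeds a value range to the
optimizer (example code, not from libstdc++):

  // Declaring a range unreachable lets GCC assume its complement:
  unsigned clamped(unsigned x)
  {
    if (x > 100)
      __builtin_unreachable(); // promise: callers never pass x > 100
    // GCC can now assume x <= 100 and fold later checks against that range.
    return x + 1; // known to never wrap around
  }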
P.S. please CC libstdc++@gcc.gnu.org for all libstdc++ patches.

On Mon, 19 Jun 2023 at 16:13, Jonathan Wakely <jwakely@redhat.com> wrote:
> [...]
On Mon, 19 Jun 2023 at 16:13, Jonathan Wakely <jwakely@redhat.com> wrote:
> Would this version of _M_check_len help?
>
>       size_type
>       _M_check_len(size_type __n, const char* __s) const
>       {
>         const size_type __size = size();
>         const size_type __max_size = max_size();
>
>         if (__is_same(allocator_type, allocator<_Tp>)
>             && __size > __max_size / 2)

This check is wrong for C++17 and older standards, because max_size()
changed value in C++20.

In C++17 it was PTRDIFF_MAX / sizeof(T) but in C++20 it's SIZE_MAX /
sizeof(T). So on 32-bit targets using C++17, it's possible a std::vector
could use PTRDIFF_MAX/2 bytes, and then the size <= max_size/2 assumption
would not hold.
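Putting rough numbers on that objection, an illustrative sketch with ILP32
values hard-coded (not actual library code):

  #include <cstdint>
  #include <cstdio>

  int main()
  {
    // std::allocator<char>::max_size() on a 32-bit target, per the rule
    // above: PTRDIFF_MAX/sizeof(T) in C++17, SIZE_MAX/sizeof(T) in C++20.
    std::uint64_t cxx17_max = INT32_MAX;  // 0x7fffffff
    std::uint64_t cxx20_max = UINT32_MAX; // 0xffffffff

    // A vector using just over PTRDIFF_MAX/2 bytes is possible in C++17:
    std::uint64_t size = cxx17_max / 2 + 1;

    std::printf("size > cxx17_max/2: %d\n", (int)(size > cxx17_max / 2)); // 1
    std::printf("size > cxx20_max/2: %d\n", (int)(size > cxx20_max / 2)); // 0
  }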
> On Mon, 19 Jun 2023 at 12:20, Jakub Jelinek wrote:
> > On Mon, Jun 19, 2023 at 01:05:36PM +0200, Jan Hubicka via Gcc-patches wrote:
> > > -        if (max_size() - size() < __n)
> > > -          __throw_length_error(__N(__s));
> > > +        const size_type __max_size = max_size();
> > > +        // On 64bit systems vectors can not reach overflow by growing
> > > +        // by small sizes; before this happens, we will run out of memory.
> > > +        if (__builtin_constant_p(__n)
> > > +            && __builtin_constant_p(__max_size)
> > > +            && sizeof(ptrdiff_t) >= 8
> > > +            && __max_size * sizeof(_Tp) >= ((ptrdiff_t)1 << 60)
> >
> > Isn't there a risk of overflow in the __max_size * sizeof(_Tp) computation?
>
> For std::allocator, no, because max_size() is size_t(-1) / sizeof(_Tp). But
> for a user-defined allocator that has a silly max_size(), yes, that's
> possible.
>
> I still don't really understand why any change is needed here. The PR says
> that the current _M_check_len brings in the EH code, but how/why does that
> happen? The __throw_length_error function is not inline, it's defined in
> libstdc++.so, so why isn't it just an extern call? Is the problem that it

It is a really quite interesting performance problem which does affect real
code. An extra extern call counts (especially since it is seen as 3 calls
by the inliner). This is _M_check_len after early optimizations (so as seen
by the inliner heuristics):

  <bb 2> [local count: 1073741824]:
  _15 = this_7(D)->D.26656._M_impl.D.25963._M_finish;
  _14 = this_7(D)->D.26656._M_impl.D.25963._M_start;
  _13 = _15 - _14;
  _10 = _13 /[ex] 8;
  _8 = (long unsigned int) _10;
  _1 = 1152921504606846975 - _8;
  __n.3_2 = __n;
  if (_1 < __n.3_2)
    goto <bb 3>; [0.04%]
  else
    goto <bb 4>; [99.96%]

  <bb 3> [local count: 429496]:
  std::__throw_length_error (__s_16(D));

  <bb 4> [local count: 1073312329]:
  D.27696 = _8;
  if (__n.3_2 > _8)
    goto <bb 5>; [34.00%]
  else
    goto <bb 6>; [66.00%]

  <bb 5> [local count: 364926196]:

  <bb 6> [local count: 1073312330]:
  # _18 = PHI <&D.27696(4), &__n(5)>
  _3 = *_18;
  __len_11 = _3 + _8;
  D.27696 ={v} {CLOBBER(eol)};
  if (_8 > __len_11)
    goto <bb 8>; [35.00%]
  else
    goto <bb 7>; [65.00%]

  <bb 7> [local count: 697653013]:
  _5 = MIN_EXPR <__len_11, 1152921504606846975>;

  <bb 8> [local count: 1073312330]:
  # iftmp.4_4 = PHI <1152921504606846975(6), _5(7)>
  return iftmp.4_4;

So a lot of code that is essentially semantically equivalent to

  return __size + MAX_EXPR (__n, __size);

at least with the default allocator.

The early inliner decides that it is not a good idea to early inline. At
this stage we inline mostly calls where we expect code to get smaller after
inlining, and since the function contains another uninlinable call, this
does not seem likely. With -O3 we will inline it later at the IPA stage,
but only when the code is considered hot. With -O2 we decide to keep it
offline if the unit contains multiple calls to the function; otherwise we
inline it, since it wins in the code size estimation model.

The problem is that _M_check_len is used by _M_realloc_insert, which later
feeds the result to the allocator. There is extra redundancy, since the
allocator can call std::__throw_bad_array_new_length and
std::__throw_bad_alloc for bad sizes, but _M_check_len will not produce
them, which is something we won't work out before inlining it. As a result,
_M_realloc_insert is seen as a very large function by the inliner
heuristics (71 instructions). Functions that are not declared inline are
inlined if smaller than 15 instructions with -O2 and 30 instructions with
-O3. So we don't inline.
This hurts common loops that use a vector as a stack and call push_back in
an inner loop. Not inlining prevents SRA, and we end up saving and loading
the end-of-vector pointer on every iteration of the loop. The following
testcase:

  #include <vector>
  #include <utility>

  typedef unsigned int uint32_t;
  std::pair<uint32_t, uint32_t> pair;

  void test()
  {
    std::vector<std::pair<uint32_t, uint32_t>> stack;
    stack.push_back (pair);
    while (!stack.empty())
      {
        std::pair<uint32_t, uint32_t> cur = stack.back();
        stack.pop_back();
        if (!cur.first)
          {
            cur.second++;
            stack.push_back (cur);
          }
        if (cur.second > 10000)
          break;
      }
  }

  int main()
  {
    for (int i = 0; i < 10000; i++)
      test();
  }

runs for me in 0.5s with _M_realloc_insert not inlined and in 0.048s with
_M_realloc_insert inlined. Clang inlines it even at -O2 and does 0.063s. I
believe this is the reason why the jpegxl library is slower when built with
GCC, and since such loops are quite common in, say, a DFS walk, I think it
is a frequent problem.

> makes _M_check_len potentially-throwing? Because that's basically the
> entire point of _M_check_len: to throw the exception that is required by
> the C++ standard. We need to be very careful about removing that required
> throw! And after we call _M_check_len we call allocate unconditionally, so
> _M_realloc_insert can always throw (we only call _M_realloc_insert in the
> case where we've already decided a reallocation is definitely needed).
>
> Would this version of _M_check_len help?
>
>       size_type
>       _M_check_len(size_type __n, const char* __s) const
>       {
>         const size_type __size = size();
>         const size_type __max_size = max_size();
>
>         if (__is_same(allocator_type, allocator<_Tp>)
>             && __size > __max_size / 2)
>           __builtin_unreachable(); // Assume std::allocator can't fill memory.
>         else if (__size > __max_size)
>           __builtin_unreachable();
>
>         if (__max_size - __size < __n)
>           __throw_length_error(__N(__s));
>
>         const size_type __len = __size + (std::max)(__size, __n);
>         return (__len < __size || __len > __max_size) ? __max_size : __len;
>       }
>
> This only applies to std::allocator, not user-defined allocators (because
> we don't know their semantics). It also seems like less of a big hack!

This helps my testcase :) _M_check_len is now:

  size_type std::vector<std::pair<unsigned int, unsigned int> >::_M_check_len (const struct vector * const this, size_type __n, const char * __s)
  {
    const size_type __len;
    long unsigned int _1;
    long unsigned int __n.5_2;
    long int _5;
    long unsigned int _6;
    struct pair * _7;
    struct pair * _8;
    long int _12;
    long unsigned int _13;
    long unsigned int _14;
    long unsigned int _15;

    <bb 2> [local count: 1073741824]:
    _8 = this_4(D)->D.26651._M_impl.D.25958._M_finish;
    _7 = this_4(D)->D.26651._M_impl.D.25958._M_start;
    _5 = _8 - _7;
    _12 = _5 /[ex] 8;
    _13 = (long unsigned int) _12;
    _15 = (long unsigned int) _5;
    if (_15 > 4611686018427387896)
      goto <bb 3>; [0.00%]
    else
      goto <bb 4>; [100.00%]

    <bb 3> [count: 0]:
    __builtin_unreachable ();

    <bb 4> [local count: 1073741824]:
    _1 = 1152921504606846975 - _13;
    __n.5_2 = __n;
    if (_1 < __n.5_2)
      goto <bb 5>; [0.04%]
    else
      goto <bb 6>; [99.96%]

    <bb 5> [local count: 429496]:
    std::__throw_length_error (__s_10(D));

    <bb 6> [local count: 1073312329]:
    _14 = MAX_EXPR <__n.5_2, _13>;
    __len_9 = _13 + _14;
    _6 = MIN_EXPR <__len_9, 1152921504606846975>;
    return _6;
  }

Originally it counted as 32 instructions in the inliner metrics (loads and
calls count as 2), while now it is 28, and 13 of them are expected to
survive optimizations after inlining. With the patch we inline
_M_realloc_insert at -O3.
With -O2 we still do not inline it (and I am not convinced we should), but
we reduce its code size from 347 bytes to 227 bytes in the final binary,
which also counts. The estimated size of the offlined _M_realloc_insert is
still 67 instructions instead of the former 71 (still a long way from 15).

How common are nonstandard allocators?

Honza
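For reference, the "semantically equivalent" observation earlier in the
thread boils down to this hot path; a minimal illustrative rendering, not
library code:

  #include <algorithm>
  #include <cstddef>

  // What _M_check_len's hot path reduces to for the default allocator once
  // the throw and clamp branches are proven dead:
  inline std::size_t grow_len(std::size_t size, std::size_t n)
  {
    return size + std::max(size, n); // __size + MAX_EXPR (__n, __size)
  }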
> > size_type
> > _M_check_len(size_type __n, const char* __s) const
> > {
> >   const size_type __size = size();
> >   const size_type __max_size = max_size();
> >
> >   if (__is_same(allocator_type, allocator<_Tp>)
> >       && __size > __max_size / 2)
>
> This check is wrong for C++17 and older standards, because max_size()
> changed value in C++20.
>
> In C++17 it was PTRDIFF_MAX / sizeof(T) but in C++20 it's SIZE_MAX /
> sizeof(T). So on 32-bit targets using C++17, it's possible a std::vector
> could use PTRDIFF_MAX/2 bytes, and then the size <= max_size/2 assumption
> would not hold.

Can we go with this perhaps only for 64bit targets? I am not sure how
completely safe this idea is in the 32bit world: I guess one can have an OS
that lets you allocate half of the address space as one allocation.

Thanks!
Honza
> > This check is wrong for C++17 and older standards, because max_size()
> > changed value in C++20.
> >
> > In C++17 it was PTRDIFF_MAX / sizeof(T) but in C++20 it's SIZE_MAX /
> > sizeof(T). So on 32-bit targets using C++17, it's possible a std::vector
> > could use PTRDIFF_MAX/2 bytes, and then the size <= max_size/2 assumption
> > would not hold.
>
> Can we go with this perhaps only for 64bit targets?
> I am not sure how completely safe this idea is in the 32bit world: I guess
> one can have an OS that lets you allocate half of the address space as one
> allocation.

Perhaps something like:

  size > std::min ((uint64_t)__max_size, ((uint64_t)1 << 62) / sizeof (_Tp))

is safe for all allocators and 32bit, so we won't need the __is_same test
and the test for 64bit?

Honza
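A standalone sketch of that suggestion (the helper is hypothetical; the
2^62 cap is the one proposed above):

  #include <algorithm>
  #include <cstdint>

  // Cap the assumed limit at 2^62 bytes' worth of elements, so even a
  // "silly" allocator max_size() cannot stretch the assumption; sizes
  // above the cap would be treated as unreachable.
  template <typename Tp>
  bool growth_assumption_holds(std::uint64_t size, std::uint64_t max_size)
  {
    const std::uint64_t cap = (std::uint64_t(1) << 62) / sizeof(Tp);
    return size <= std::min(max_size, cap);
  }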
On Tue, Jun 20, 2023 at 09:50:25AM +0200, Jan Hubicka wrote:
> > This check is wrong for C++17 and older standards, because max_size()
> > changed value in C++20.
> >
> > In C++17 it was PTRDIFF_MAX / sizeof(T) but in C++20 it's SIZE_MAX /
> > sizeof(T). So on 32-bit targets using C++17, it's possible a std::vector
> > could use PTRDIFF_MAX/2 bytes, and then the size <= max_size/2 assumption
> > would not hold.
>
> Can we go with this perhaps only for 64bit targets?
> I am not sure how completely safe this idea is in the 32bit world: I guess
> one can have an OS that lets you allocate half of the address space as one
> allocation.

Is it safe even on 64bit targets? I mean, doesn't say PowerPC already allow
the full 64-bit virtual address space? The assumption that one can't have
more than half of the virtual address space in allocations is true right
now at least on x86-64, aarch64 and others, but isn't that something that
can change with newer versions of CPUs without the need to recompile
applications (add another level or two of page tables)? By being hardcoded
in libstdc++ headers, those assumptions will be hardcoded in lots of
applications.

        Jakub
On Jun 20 2023, Jakub Jelinek via Gcc-patches wrote:

> Is it safe even on 64bit targets? I mean, doesn't say PowerPC already allow
> the full 64-bit virtual address space? The assumption that one can't have
> more than half of the virtual address space in allocations is true right
> now at least on x86-64, aarch64 and others, but isn't that something that
> can change with newer versions of CPUs without the need to recompile
> applications (add another level or two of page tables)?

At least s390 can allocate more than half the address space. That
triggered a failure in gawk.
On Tue, 20 Jun 2023 at 09:21, Andreas Schwab wrote:
> At least s390 can allocate more than half the address space. That
> triggered a failure in gawk.

Is PTRDIFF_MAX large enough to represent the difference between any two
pointers?

What we're considering for libstdc++ is treating PTRDIFF_MAX as an upper
limit on allocation size. If there are targets that can really allocate a
2^63 byte array, they won't be able to represent the difference between the
first element and the last element unless ptrdiff_t is wider than 64 bits.
On Tue, 20 Jun 2023 at 11:45, Jonathan Wakely <jwakely@redhat.com> wrote:
> Is PTRDIFF_MAX large enough to represent the difference between any two
> pointers?
>
> What we're considering for libstdc++ is treating PTRDIFF_MAX as an upper
> limit on allocation size. If there are targets that can really allocate a
> 2^63 byte array, they won't be able to represent the difference between
> the first element and the last element unless ptrdiff_t is wider than 64
> bits.

Of course, if we're talking about 32-bit targets then you only need a
64-bit ptrdiff_t to allow arrays larger than 2^31 bytes.

In any case, PTRDIFF_MAX is still an upper limit (although for a 64-bit
ptrdiff_t and a 32-bit address space, it's not a useful limit, because it's
much, much larger than the real limit).
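To spell out why PTRDIFF_MAX is the natural cap here, a short illustrative
example (not library code):

  #include <cstddef>

  // Pointer subtraction yields ptrdiff_t, so an object larger than
  // PTRDIFF_MAX bytes could not even represent the distance between its
  // first and last bytes; the subtraction would be undefined behaviour.
  std::ptrdiff_t distance(char* first, std::size_t n)
  {
    char* last = first + n; // one past the end
    return last - first;    // well-defined only if n <= PTRDIFF_MAX
  }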
diff --git a/libstdc++-v3/include/bits/stl_vector.h b/libstdc++-v3/include/bits/stl_vector.h
index 70ced3d101f..3ad59fe3e2b 100644
--- a/libstdc++-v3/include/bits/stl_vector.h
+++ b/libstdc++-v3/include/bits/stl_vector.h
@@ -1895,11 +1895,22 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
       size_type
       _M_check_len(size_type __n, const char* __s) const
       {
-        if (max_size() - size() < __n)
-          __throw_length_error(__N(__s));
+        // On 64bit systems vectors of small sizes can not
+        // reach overflow by growing by small sizes; before
+        // this happens, we will run out of memory.
+        if (__builtin_constant_p (sizeof (_Tp))
+            && __builtin_constant_p (__n)
+            && sizeof (ptrdiff_t) >= 8
+            && __n < max_size () / 2)
+          return size() + (std::max)(size(), __n);
+        else
+          {
+            if (max_size() - size() < __n)
+              __throw_length_error(__N(__s));

-        const size_type __len = size() + (std::max)(size(), __n);
-        return (__len < size() || __len > max_size()) ? max_size() : __len;
+            const size_type __len = size() + (std::max)(size(), __n);
+            return (__len < size() || __len > max_size()) ? max_size() : __len;
+          }
       }

       // Called by constructors to check initial size.