Message ID | 1444787059-3050-1-git-send-email-froydnj@mozilla.com |
---|---|
State | New |
On 13/10/15 21:44 -0400, Nathan Froyd wrote:
>Including <algorithm> in C++11 mode (typically done for
>std::{min,max,swap}) includes <random>, for
>std::uniform_int_distribution. On x86 platforms, <random> manages to
>drag in <x86intrin.h> through x86's opt_random.h header, and
><x86intrin.h> has gotten rather large recently with the addition of AVX
>intrinsics. The comparison between C++03 mode and C++11 mode is not
>quite exact, but it gives an idea of the penalty we're talking about
>here:
>
>froydnj@thor:~/src$ echo '#include <algorithm>' | g++ -x c++ - -o - -E -std=c++11 | wc
>  53460  127553 1401268
>froydnj@thor:~/src$ echo '#include <algorithm>' | g++ -x c++ - -o - -E -std=c++03 | wc
>   9202   18933  218189
>
>That's approximately a 7x penalty in C++11 mode (granted, C++11 includes
>more than just <x86intrin.h>) with GCC 4.9.2 on a Debian system; current
>mainline is somewhat worse:
>
>froydnj@thor: gcc-build$ echo '#include <algorithm>' | xgcc [...] -std=c++11 | wc
>  84851  210475 2369616
>froydnj@thor: gcc-build$ echo '#include <algorithm>' | xgcc [...] -std=c++03 | wc
>   9383   19402  239676
>
><x86intrin.h> itself clocks in at 1.3MB+ of preprocessed text.

Yep, that's been bothering me for a while.

>This patch aims to reduce that size penalty by recognizing that both of
>the places that #include <x86intrin.h> do not need the full set of x86
>intrinsics, but can get by with a smaller, more focused header in each
>case. <ext/random> needs only <emmintrin.h> to declare __m128i, while
>x86's opt_random.h must include <pmmintrin.h> for declarations of
>various intrinsic functions.
>
>The net result is that the size of mainline's <algorithm> is significantly reduced:
>
>froydnj@thor: gcc-build$ echo '#include <algorithm>' | xgcc [...] -std=c++11 | wc
>  39174   88538 1015281
>
>which seems like a win.

Indeed!

>Bootstrapped on x86_64-pc-linux-gnu with --enable-languages=c,c++,
>tested with check-target-libstdc++-v3, no regressions. Also verified
>that <algorithm> and <ext/random> pass -fsyntax-check with
>-march=native (on a recent Haswell chip); if an -march=native bootstrap
>is necessary, I am happy to do that if somebody instructs me in getting
>everything properly set up.
>
>OK?

OK, thanks.

diff --git a/libstdc++-v3/ChangeLog b/libstdc++-v3/ChangeLog
index e3061ef..ff0b048 100644
--- a/libstdc++-v3/ChangeLog
+++ b/libstdc++-v3/ChangeLog
@@ -1,3 +1,9 @@
+2015-10-13  Nathan Froyd  <froydnj@gcc.gnu.org>
+
+	* config/cpu/i486/opt/bits/opt_random.h: Include pmmintrin.h instead
+	of x86intrin.h, and only do so when __SSE3__
+	* include/ext/random: Include emmintrin.h instead of x86intrin.h
+
 2015-10-11  Joseph Myers  <joseph@codesourcery.com>
 
 	* crossconfig.m4 (GLIBCXX_CROSSCONFIG) <*-linux* | *-uclinux* |
diff --git a/libstdc++-v3/config/cpu/i486/opt/bits/opt_random.h b/libstdc++-v3/config/cpu/i486/opt/bits/opt_random.h
index 4495569..a9f6c13 100644
--- a/libstdc++-v3/config/cpu/i486/opt/bits/opt_random.h
+++ b/libstdc++-v3/config/cpu/i486/opt/bits/opt_random.h
@@ -30,7 +30,9 @@
 #ifndef _BITS_OPT_RANDOM_H
 #define _BITS_OPT_RANDOM_H 1
 
-#include <x86intrin.h>
+#ifdef __SSE3__
+#include <pmmintrin.h>
+#endif
 
 #pragma GCC system_header
 
diff --git a/libstdc++-v3/include/ext/random b/libstdc++-v3/include/ext/random
index 0bcfa4a..ba363ce 100644
--- a/libstdc++-v3/include/ext/random
+++ b/libstdc++-v3/include/ext/random
@@ -40,7 +40,7 @@
 #include <array>
 #include <ext/cmath>
 #ifdef __SSE2__
-# include <x86intrin.h>
+# include <emmintrin.h>
 #endif
 
 #if defined(_GLIBCXX_USE_C99_STDINT_TR1) && defined(UINT32_C)

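
For readers unfamiliar with the pattern the diff relies on, here is a minimal sketch (not part of the patch, and not code from libstdc++) of including only the narrow SSE2 header under a feature-test guard. The helper name `xor_block` is hypothetical and exists only for illustration; the scalar branch is an assumption added so the sketch also builds on targets where `__SSE2__` is not defined.

```cpp
#include <cstdint>

#ifdef __SSE2__
# include <emmintrin.h>   // SSE2 only: declares __m128i and the integer _mm_* operations
#endif

// Hypothetical helper, for illustration only: XOR two 128-bit blocks,
// each stored as four 32-bit words, and write the result to `out`.
inline void xor_block(const std::uint32_t* a,
                      const std::uint32_t* b,
                      std::uint32_t* out)
{
#ifdef __SSE2__
  // Unaligned loads and stores, so callers need no special alignment.
  __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
  __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
  _mm_storeu_si128(reinterpret_cast<__m128i*>(out), _mm_xor_si128(va, vb));
#else
  // Scalar fallback (an assumption of this sketch) for targets without SSE2.
  for (int i = 0; i < 4; ++i)
    out[i] = a[i] ^ b[i];
#endif
}
```

Everything used here (`__m128i`, `_mm_loadu_si128`, `_mm_xor_si128`, `_mm_storeu_si128`) is declared by `<emmintrin.h>` alone, so the translation unit never pulls in the AVX headers that `<x86intrin.h>` brings with it.
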
From: Nathan Froyd <froydnj@gmail.com>

Including <algorithm> in C++11 mode (typically done for
std::{min,max,swap}) includes <random>, for
std::uniform_int_distribution. On x86 platforms, <random> manages to
drag in <x86intrin.h> through x86's opt_random.h header, and
<x86intrin.h> has gotten rather large recently with the addition of AVX
intrinsics. The comparison between C++03 mode and C++11 mode is not
quite exact, but it gives an idea of the penalty we're talking about
here:

froydnj@thor:~/src$ echo '#include <algorithm>' | g++ -x c++ - -o - -E -std=c++11 | wc
  53460  127553 1401268
froydnj@thor:~/src$ echo '#include <algorithm>' | g++ -x c++ - -o - -E -std=c++03 | wc
   9202   18933  218189

That's approximately a 7x penalty in C++11 mode (granted, C++11 includes
more than just <x86intrin.h>) with GCC 4.9.2 on a Debian system; current
mainline is somewhat worse:

froydnj@thor: gcc-build$ echo '#include <algorithm>' | xgcc [...] -std=c++11 | wc
  84851  210475 2369616
froydnj@thor: gcc-build$ echo '#include <algorithm>' | xgcc [...] -std=c++03 | wc
   9383   19402  239676

<x86intrin.h> itself clocks in at 1.3MB+ of preprocessed text.

This patch aims to reduce that size penalty by recognizing that both of
the places that #include <x86intrin.h> do not need the full set of x86
intrinsics, but can get by with a smaller, more focused header in each
case. <ext/random> needs only <emmintrin.h> to declare __m128i, while
x86's opt_random.h must include <pmmintrin.h> for declarations of
various intrinsic functions.

The net result is that the size of mainline's <algorithm> is significantly reduced:

froydnj@thor: gcc-build$ echo '#include <algorithm>' | xgcc [...] -std=c++11 | wc
  39174   88538 1015281

which seems like a win.

Bootstrapped on x86_64-pc-linux-gnu with --enable-languages=c,c++,
tested with check-target-libstdc++-v3, no regressions. Also verified
that <algorithm> and <ext/random> pass -fsyntax-check with
-march=native (on a recent Haswell chip); if an -march=native bootstrap
is necessary, I am happy to do that if somebody instructs me in getting
everything properly set up.

OK?

-Nathan

	* config/cpu/i486/opt/bits/opt_random.h: Include pmmintrin.h instead
	of x86intrin.h, and only do so when __SSE3__
	* include/ext/random: Include emmintrin.h instead of x86intrin.h
---
 libstdc++-v3/ChangeLog                             | 6 ++++++
 libstdc++-v3/config/cpu/i486/opt/bits/opt_random.h | 4 +++-
 libstdc++-v3/include/ext/random                    | 2 +-
 3 files changed, 10 insertions(+), 2 deletions(-)
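
As a rough cross-check of the figures quoted in the message, the byte counts (third column of each `wc` line) can be compared directly. The sketch below is illustrative only: the numbers are copied from the message, while every identifier is a hypothetical name chosen for this example.

```cpp
// Byte counts copied from the `wc` output in the message.
// All identifiers are hypothetical names used only for this illustration.
constexpr double kGcc492Cxx11   = 1401268;  // GCC 4.9.2, -std=c++11
constexpr double kGcc492Cxx03   =  218189;  // GCC 4.9.2, -std=c++03
constexpr double kMainlineCxx11 = 2369616;  // mainline before the patch, -std=c++11
constexpr double kMainlineCxx03 =  239676;  // mainline, -std=c++03
constexpr double kPatchedCxx11  = 1015281;  // mainline with the patch, -std=c++11

// Size of one preprocessed output relative to another.
constexpr double ratio(double num, double den) { return num / den; }

// Fraction of the preprocessed bytes removed going from `before` to `after`.
constexpr double saved_fraction(double before, double after)
{ return 1.0 - after / before; }

// Compile-time checks of the approximate values discussed below.
static_assert(ratio(kGcc492Cxx11, kGcc492Cxx03) > 6.4 &&
              ratio(kGcc492Cxx11, kGcc492Cxx03) < 6.5, "about 6.4x");
static_assert(ratio(kMainlineCxx11, kMainlineCxx03) > 9.8 &&
              ratio(kMainlineCxx11, kMainlineCxx03) < 9.9, "about 9.9x");
static_assert(ratio(kPatchedCxx11, kMainlineCxx03) > 4.2 &&
              ratio(kPatchedCxx11, kMainlineCxx03) < 4.3, "about 4.2x");
static_assert(saved_fraction(kMainlineCxx11, kPatchedCxx11) > 0.57 &&
              saved_fraction(kMainlineCxx11, kPatchedCxx11) < 0.58, "about 57%");
```

Measured in bytes, the GCC 4.9.2 penalty works out to about 6.4x, mainline before the patch to about 9.9x, and mainline with the patch to about 4.2x; the patch removes roughly 57% of the preprocessed size of mainline's <algorithm> in C++11 mode.
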