From patchwork Tue May 7 13:41:40 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Matthias Kretz X-Patchwork-Id: 1932481 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4VYfbn31lfz1ydW for ; Tue, 7 May 2024 23:42:45 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 4B9F53858410 for ; Tue, 7 May 2024 13:42:43 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from lxmtout2.gsi.de (lxmtout2.gsi.de [140.181.3.112]) by sourceware.org (Postfix) with ESMTPS id D22AB3858D1E; Tue, 7 May 2024 13:41:43 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org D22AB3858D1E Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gsi.de Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gsi.de ARC-Filter: OpenARC Filter v1.0.0 sourceware.org D22AB3858D1E Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=140.181.3.112 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1715089307; cv=none; b=og7e2qLagYgJHrdctaW5fYknjapA9ZHMJyuW63ko1xsh4VSf7VsZ2yN6fHnAE3cEpwmGdD+2/73KqwUJaR7t7Wk41WoGqhL28MaQE0mbcEjRWtZ8B85mgxMvnD6tReYnyQf+cLD65ZIBIr2Cyql9AYFXSceWFoELAw9bWbBMCLA= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1715089307; c=relaxed/simple; bh=3KwHfsxamcp8f+dBJxPowD0Q9qfXfKLBf+2uEPFfH30=; h=From:To:Subject:Date:Message-ID:MIME-Version; b=ksb/MlHqD83Dlb7IL6D6kH/9zhYrCT5y116+4cmmiqr8Gam3tRFFdJbp0Vr8cE4oU9lXRjQ1jWm8Ao7WiqC4fSBjHPxa9NHfQ7t0hdYd7yhzs/5UKMQc5dAblWVMJ+Fa8aiUI6+5ysWA6amEzjki8e9Tn+E06JlLopDRx5OPudY= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from localhost (localhost [127.0.0.1]) by lxmtout2.gsi.de (Postfix) with ESMTP id A394120350E6; Tue, 7 May 2024 15:41:42 +0200 (CEST) X-Virus-Scanned: Debian amavisd-new at lxmtout2.gsi.de Received: from lxmtout2.gsi.de ([127.0.0.1]) by localhost (lxmtout2.gsi.de [127.0.0.1]) (amavisd-new, port 10024) with LMTP id XfRv4ePQSiAv; Tue, 7 May 2024 15:41:42 +0200 (CEST) Received: from srvEX6.campus.gsi.de (unknown [10.10.4.96]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by lxmtout2.gsi.de (Postfix) with ESMTPS id 9237020350E1; Tue, 7 May 2024 15:41:41 +0200 (CEST) Received: from excalibur.localnet (140.181.3.12) by srvEX6.campus.gsi.de (10.10.4.96) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1544.11; Tue, 7 May 2024 15:41:41 +0200 From: Matthias Kretz To: , Subject: [PATCH] libstdc++: Use __builtin_shufflevector for simd split and concat Date: Tue, 7 May 2024 15:41:40 +0200 Message-ID: <7312653.5fSG56mABF@excalibur> Organization: GSI Helmholtz Center for Heavy Ion Research MIME-Version: 1.0 X-Originating-IP: [140.181.3.12] X-ClientProxiedBy: srvex5.Campus.gsi.de (10.10.4.95) To srvEX6.campus.gsi.de (10.10.4.96) X-Spam-Status: No, score=-10.1 required=5.0 tests=BAYES_00, BODY_8BITS, GIT_PATCH_0, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Tested on x86_64-linux-gnu and aarch64-linux-gnu and with Clang 18 on x86_64- linux-gnu. OK for trunk and backport(s)? ---------------------- 8< ---------------------------- Signed-off-by: Matthias Kretz libstdc++-v3/ChangeLog: PR libstdc++/114958 * include/experimental/bits/simd.h (__as_vector): Return scalar simd as one-element vector. Return vector from single-vector fixed_size simd. (__vec_shuffle): New. (__extract_part): Adjust return type signature. (split): Use __extract_part for any split into non-fixed_size simds. (concat): If the return type stores a single vector, use __vec_shuffle (which calls __builtin_shufflevector) to produce the return value. * include/experimental/bits/simd_builtin.h (__shift_elements_right): Removed. (__extract_part): Return single elements directly. Use __vec_shuffle (which calls __builtin_shufflevector) to for all non-trivial cases. * include/experimental/bits/simd_fixed_size.h (__extract_part): Return single elements directly. * testsuite/experimental/simd/pr114958.cc: New test. --- libstdc++-v3/include/experimental/bits/simd.h | 161 +++++++++++++----- .../include/experimental/bits/simd_builtin.h | 152 +---------------- .../experimental/bits/simd_fixed_size.h | 4 +- .../testsuite/experimental/simd/pr114958.cc | 20 +++ 4 files changed, 145 insertions(+), 192 deletions(-) create mode 100644 libstdc++-v3/testsuite/experimental/simd/pr114958.cc -- ────────────────────────────────────────────────────────────────────────── Dr. Matthias Kretz https://mattkretz.github.io GSI Helmholtz Centre for Heavy Ion Research https://gsi.de stdₓ::simd ────────────────────────────────────────────────────────────────────────── diff --git a/libstdc++-v3/include/experimental/bits/simd.h b/libstdc++-v3/include/experimental/bits/simd.h index 6ef9c955cfa..6a6fd4f109d 100644 --- a/libstdc++-v3/include/experimental/bits/simd.h +++ b/libstdc++-v3/include/experimental/bits/simd.h @@ -1651,7 +1651,24 @@ __as_vector(_V __x) if constexpr (__is_vector_type_v<_V>) return __x; else if constexpr (is_simd<_V>::value || is_simd_mask<_V>::value) - return __data(__x)._M_data; + { + if constexpr (__is_fixed_size_abi_v) + { + static_assert(is_simd<_V>::value); + static_assert(_V::abi_type::template __traits< + typename _V::value_type>::_SimdMember::_S_tuple_size == 1); + return __as_vector(__data(__x).first); + } + else if constexpr (_V::size() > 1) + return __data(__x)._M_data; + else + { + static_assert(is_simd<_V>::value); + using _Tp = typename _V::value_type; + using _RV [[__gnu__::__vector_size__(sizeof(_Tp))]] = _Tp; + return _RV{__data(__x)}; + } + } else if constexpr (__is_vectorizable_v<_V>) return __vector_type_t<_V, 2>{__x}; else @@ -2061,6 +2078,60 @@ __not(_Tp __a) noexcept return ~__a; } +// }}} +// __vec_shuffle{{{ +template + _GLIBCXX_SIMD_INTRINSIC constexpr auto + __vec_shuffle(_T0 __x, _T1 __y, index_sequence<_Is...> __seq, _Fun __idx_perm) + { + constexpr int _N0 = sizeof(__x) / sizeof(__x[0]); + constexpr int _N1 = sizeof(__y) / sizeof(__y[0]); +#if __has_builtin(__builtin_shufflevector) +#ifdef __clang__ + // Clang requires _T0 == _T1 + if constexpr (sizeof(__x) > sizeof(__y) and _N1 == 1) + return __vec_shuffle(__x, _T0{__y[0]}, __seq, __idx_perm); + else if constexpr (sizeof(__x) > sizeof(__y)) + return __vec_shuffle(__x, __intrin_bitcast<_T0>(__y), __seq, __idx_perm); + else if constexpr (sizeof(__x) < sizeof(__y) and _N0 == 1) + return __vec_shuffle(_T1{__x[0]}, __y, __seq, [=](int __i) { + __i = __idx_perm(__i); + return __i < _N0 ? __i : __i - _N0 + _N1; + }); + else if constexpr (sizeof(__x) < sizeof(__y)) + return __vec_shuffle(__intrin_bitcast<_T1>(__x), __y, __seq, [=](int __i) { + __i = __idx_perm(__i); + return __i < _N0 ? __i : __i - _N0 + _N1; + }); + else +#endif + return __builtin_shufflevector(__x, __y, [=] { + constexpr int __j = __idx_perm(_Is); + static_assert(__j < _N0 + _N1); + return __j; + }()...); +#else + using _Tp = __remove_cvref_t; + return __vector_type_t<_Tp, sizeof...(_Is)> { + [=]() -> _Tp { + constexpr int __j = __idx_perm(_Is); + static_assert(__j < _N0 + _N1); + if constexpr (__j < 0) + return 0; + else if constexpr (__j < _N0) + return __x[__j]; + else + return __y[__j - _N0]; + }()... + }; +#endif + } + +template + _GLIBCXX_SIMD_INTRINSIC constexpr auto + __vec_shuffle(_T0 __x, _Seq __seq, _Fun __idx_perm) + { return __vec_shuffle(__x, _T0(), __seq, __idx_perm); } + // }}} // __concat{{{ template , @@ -3947,7 +4018,7 @@ clamp(const simd<_Tp, _Ap>& __v, const simd<_Tp, _Ap>& __lo, const simd<_Tp, _Ap // __extract_part {{{ template _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_CONST constexpr - _SimdWrapper<_Tp, _Np / _Total * _Combine> + conditional_t<_Np == _Total and _Combine == 1, _Tp, _SimdWrapper<_Tp, _Np / _Total * _Combine>> __extract_part(const _SimdWrapper<_Tp, _Np> __x); template @@ -4231,48 +4302,21 @@ static_assert( __split_wrapper(_SL::template _S_pop_front<1>(), __data(__x).second)); } - else if constexpr ((!is_same_v> && ...) - && (!__is_fixed_size_abi_v< - simd_abi::deduce_t<_Tp, _Sizes>> && ...)) + else if constexpr ((!__is_fixed_size_abi_v> && ...)) { - if constexpr (((_Sizes * 2 == _Np) && ...)) - return {{__private_init, __extract_part<0, 2>(__data(__x))}, - {__private_init, __extract_part<1, 2>(__data(__x))}}; - else if constexpr (is_same_v<_SizeList<_Sizes...>, - _SizeList<_Np / 3, _Np / 3, _Np / 3>>) - return {{__private_init, __extract_part<0, 3>(__data(__x))}, - {__private_init, __extract_part<1, 3>(__data(__x))}, - {__private_init, __extract_part<2, 3>(__data(__x))}}; - else if constexpr (is_same_v<_SizeList<_Sizes...>, - _SizeList<2 * _Np / 3, _Np / 3>>) - return {{__private_init, __extract_part<0, 3, 2>(__data(__x))}, - {__private_init, __extract_part<2, 3>(__data(__x))}}; - else if constexpr (is_same_v<_SizeList<_Sizes...>, - _SizeList<_Np / 3, 2 * _Np / 3>>) - return {{__private_init, __extract_part<0, 3>(__data(__x))}, - {__private_init, __extract_part<1, 3, 2>(__data(__x))}}; - else if constexpr (is_same_v<_SizeList<_Sizes...>, - _SizeList<_Np / 2, _Np / 4, _Np / 4>>) - return {{__private_init, __extract_part<0, 2>(__data(__x))}, - {__private_init, __extract_part<2, 4>(__data(__x))}, - {__private_init, __extract_part<3, 4>(__data(__x))}}; - else if constexpr (is_same_v<_SizeList<_Sizes...>, - _SizeList<_Np / 4, _Np / 4, _Np / 2>>) - return {{__private_init, __extract_part<0, 4>(__data(__x))}, - {__private_init, __extract_part<1, 4>(__data(__x))}, - {__private_init, __extract_part<1, 2>(__data(__x))}}; - else if constexpr (is_same_v<_SizeList<_Sizes...>, - _SizeList<_Np / 4, _Np / 2, _Np / 4>>) - return {{__private_init, __extract_part<0, 4>(__data(__x))}, - {__private_init, __extract_center(__data(__x))}, - {__private_init, __extract_part<3, 4>(__data(__x))}}; - else if constexpr (((_Sizes * 4 == _Np) && ...)) - return {{__private_init, __extract_part<0, 4>(__data(__x))}, - {__private_init, __extract_part<1, 4>(__data(__x))}, - {__private_init, __extract_part<2, 4>(__data(__x))}, - {__private_init, __extract_part<3, 4>(__data(__x))}}; - // else fall through + constexpr array __size = {_Sizes...}; + return __generate_from_n_evaluations( + [&](auto __i) constexpr { + constexpr size_t __offset = [&]() { + size_t __r = 0; + for (unsigned __j = 0; __j < __i; ++__j) + __r += __size[__j]; + return __r; + }(); + return __deduced_simd<_Tp, __size[__i]>( + __private_init, + __extract_part<__offset, _Np, __size[__i]>(__data(__x))); + }); } #ifdef _GLIBCXX_SIMD_USE_ALIASING_LOADS const __may_alias<_Tp>* const __element_ptr @@ -4334,14 +4378,37 @@ static_assert( simd<_Tp, simd_abi::deduce_t<_Tp, (simd_size_v<_Tp, _As> + ...)>> concat(const simd<_Tp, _As>&... __xs) { - using _Rp = __deduced_simd<_Tp, (simd_size_v<_Tp, _As> + ...)>; + constexpr int _Np = (simd_size_v<_Tp, _As> + ...); + using _Abi = simd_abi::deduce_t<_Tp, _Np>; + using _Rp = simd<_Tp, _Abi>; + using _RW = typename _SimdTraits<_Tp, _Abi>::_SimdMember; if constexpr (sizeof...(__xs) == 1) return simd_cast<_Rp>(__xs...); else if ((... && __xs._M_is_constprop())) - return simd<_Tp, - simd_abi::deduce_t<_Tp, (simd_size_v<_Tp, _As> + ...)>>( - [&](auto __i) constexpr _GLIBCXX_SIMD_ALWAYS_INLINE_LAMBDA + return _Rp([&](auto __i) constexpr _GLIBCXX_SIMD_ALWAYS_INLINE_LAMBDA { return __subscript_in_pack<__i>(__xs...); }); + else if constexpr (__is_simd_wrapper_v<_RW> and sizeof...(__xs) == 2) + { + return {__private_init, + __vec_shuffle(__as_vector(__xs)..., std::make_index_sequence<_RW::_S_full_size>(), + [](int __i) { + constexpr int __sizes[2] = {int(simd_size_v<_Tp, _As>)...}; + constexpr int __padding0 + = sizeof(__vector_type_t<_Tp, __sizes[0]>) / sizeof(_Tp) + - __sizes[0]; + return __i >= _Np ? -1 : __i < __sizes[0] ? __i : __i + __padding0; + })}; + } + else if constexpr (__is_simd_wrapper_v<_RW> and sizeof...(__xs) == 3) + return [](const auto& __x0, const auto& __x1, const auto& __x2) + _GLIBCXX_SIMD_ALWAYS_INLINE_LAMBDA { + return concat(concat(__x0, __x1), __x2); + }(__xs...); + else if constexpr (__is_simd_wrapper_v<_RW> and sizeof...(__xs) > 3) + return [](const auto& __x0, const auto& __x1, const auto&... __rest) + _GLIBCXX_SIMD_ALWAYS_INLINE_LAMBDA { + return concat(concat(__x0, __x1), concat(__rest...)); + }(__xs...); else { _Rp __r{}; diff --git a/libstdc++-v3/include/experimental/bits/simd_builtin.h b/libstdc++-v3/include/experimental/bits/simd_builtin.h index 4ceeb423894..505f8083794 100644 --- a/libstdc++-v3/include/experimental/bits/simd_builtin.h +++ b/libstdc++-v3/include/experimental/bits/simd_builtin.h @@ -92,124 +92,16 @@ __wrapper_bitcast(_SimdWrapper<_Up, _M> __x) return __intrin_bitcast<__vector_type_t<_Tp, _Np>>(__x._M_data); } -// }}} -// __shift_elements_right{{{ -// if (__shift % 2ⁿ == 0) => the low n Bytes are correct -template > - _GLIBCXX_SIMD_INTRINSIC _Tp - __shift_elements_right(_Tp __v) - { - [[maybe_unused]] const auto __iv = __to_intrin(__v); - static_assert(__shift <= sizeof(_Tp)); - if constexpr (__shift == 0) - return __v; - else if constexpr (__shift == sizeof(_Tp)) - return _Tp(); -#if _GLIBCXX_SIMD_X86INTRIN // {{{ - else if constexpr (__have_sse && __shift == 8 - && _TVT::template _S_is) - return _mm_movehl_ps(__iv, __iv); - else if constexpr (__have_sse2 && __shift == 8 - && _TVT::template _S_is) - return _mm_unpackhi_pd(__iv, __iv); - else if constexpr (__have_sse2 && sizeof(_Tp) == 16) - return reinterpret_cast( - _mm_srli_si128(reinterpret_cast<__m128i>(__iv), __shift)); - else if constexpr (__shift == 16 && sizeof(_Tp) == 32) - { - /*if constexpr (__have_avx && _TVT::template _S_is) - return _mm256_permute2f128_pd(__iv, __iv, 0x81); - else if constexpr (__have_avx && _TVT::template _S_is) - return _mm256_permute2f128_ps(__iv, __iv, 0x81); - else if constexpr (__have_avx) - return reinterpret_cast( - _mm256_permute2f128_si256(__iv, __iv, 0x81)); - else*/ - return __zero_extend(__hi128(__v)); - } - else if constexpr (__have_avx2 && sizeof(_Tp) == 32 && __shift < 16) - { - const auto __vll = __vector_bitcast<_LLong>(__v); - return reinterpret_cast( - _mm256_alignr_epi8(_mm256_permute2x128_si256(__vll, __vll, 0x81), - __vll, __shift)); - } - else if constexpr (__have_avx && sizeof(_Tp) == 32 && __shift < 16) - { - const auto __vll = __vector_bitcast<_LLong>(__v); - return reinterpret_cast( - __concat(_mm_alignr_epi8(__hi128(__vll), __lo128(__vll), __shift), - _mm_srli_si128(__hi128(__vll), __shift))); - } - else if constexpr (sizeof(_Tp) == 32 && __shift > 16) - return __zero_extend(__shift_elements_right<__shift - 16>(__hi128(__v))); - else if constexpr (sizeof(_Tp) == 64 && __shift == 32) - return __zero_extend(__hi256(__v)); - else if constexpr (__have_avx512f && sizeof(_Tp) == 64) - { - if constexpr (__shift >= 48) - return __zero_extend( - __shift_elements_right<__shift - 48>(__extract<3, 4>(__v))); - else if constexpr (__shift >= 32) - return __zero_extend( - __shift_elements_right<__shift - 32>(__hi256(__v))); - else if constexpr (__shift % 8 == 0) - return reinterpret_cast( - _mm512_alignr_epi64(__m512i(), __intrin_bitcast<__m512i>(__v), - __shift / 8)); - else if constexpr (__shift % 4 == 0) - return reinterpret_cast( - _mm512_alignr_epi32(__m512i(), __intrin_bitcast<__m512i>(__v), - __shift / 4)); - else if constexpr (__have_avx512bw && __shift < 16) - { - const auto __vll = __vector_bitcast<_LLong>(__v); - return reinterpret_cast( - _mm512_alignr_epi8(_mm512_shuffle_i32x4(__vll, __vll, 0xf9), - __vll, __shift)); - } - else if constexpr (__have_avx512bw && __shift < 32) - { - const auto __vll = __vector_bitcast<_LLong>(__v); - return reinterpret_cast( - _mm512_alignr_epi8(_mm512_shuffle_i32x4(__vll, __m512i(), 0xee), - _mm512_shuffle_i32x4(__vll, __vll, 0xf9), - __shift - 16)); - } - else - __assert_unreachable<_Tp>(); - } - /* - } else if constexpr (__shift % 16 == 0 && sizeof(_Tp) == 64) - return __auto_bitcast(__extract<__shift / 16, 4>(__v)); - */ -#endif // _GLIBCXX_SIMD_X86INTRIN }}} - else - { - constexpr int __chunksize = __shift % 8 == 0 ? 8 - : __shift % 4 == 0 ? 4 - : __shift % 2 == 0 ? 2 - : 1; - auto __w = __vector_bitcast<__int_with_sizeof_t<__chunksize>>(__v); - using _Up = decltype(__w); - return __intrin_bitcast<_Tp>( - __call_with_n_evaluations<(sizeof(_Tp) - __shift) / __chunksize>( - [](auto... __chunks) _GLIBCXX_SIMD_ALWAYS_INLINE_LAMBDA { - return _Up{__chunks...}; - }, [&](auto __i) _GLIBCXX_SIMD_ALWAYS_INLINE_LAMBDA { - return __w[__shift / __chunksize + __i]; - })); - } - } - // }}} // __extract_part(_SimdWrapper<_Tp, _Np>) {{{ template _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_CONST constexpr - _SimdWrapper<_Tp, _Np / _Total * _Combine> + conditional_t<_Np == _Total and _Combine == 1, _Tp, _SimdWrapper<_Tp, _Np / _Total * _Combine>> __extract_part(const _SimdWrapper<_Tp, _Np> __x) { - if constexpr (_Index % 2 == 0 && _Total % 2 == 0 && _Combine % 2 == 0) + if constexpr (_Np == _Total and _Combine == 1) + return __x[_Index]; + else if constexpr (_Index % 2 == 0 && _Total % 2 == 0 && _Combine % 2 == 0) return __extract_part<_Index / 2, _Total / 2, _Combine / 2>(__x); else { @@ -235,39 +127,11 @@ __extract_part(const _SimdWrapper<_Tp, _Np> __x) return __x; else if constexpr (_Index == 0) return __intrin_bitcast<_R>(__as_vector(__x)); -#if _GLIBCXX_SIMD_X86INTRIN // {{{ - else if constexpr (sizeof(__x) == 32 - && __return_size * sizeof(_Tp) <= 16) - { - constexpr size_t __bytes_to_skip = __values_to_skip * sizeof(_Tp); - if constexpr (__bytes_to_skip == 16) - return __vector_bitcast<_Tp, __return_size>( - __hi128(__as_vector(__x))); - else - return __vector_bitcast<_Tp, __return_size>( - _mm_alignr_epi8(__hi128(__vector_bitcast<_LLong>(__x)), - __lo128(__vector_bitcast<_LLong>(__x)), - __bytes_to_skip)); - } -#endif // _GLIBCXX_SIMD_X86INTRIN }}} - else if constexpr (_Index > 0 - && (__values_to_skip % __return_size != 0 - || sizeof(_R) >= 8) - && (__values_to_skip + __return_size) * sizeof(_Tp) - <= 64 - && sizeof(__x) >= 16) - return __intrin_bitcast<_R>( - __shift_elements_right<__values_to_skip * sizeof(_Tp)>( - __as_vector(__x))); else - { - _R __r = {}; - __builtin_memcpy(&__r, - reinterpret_cast(&__x) - + sizeof(_Tp) * __values_to_skip, - __return_size * sizeof(_Tp)); - return __r; - } + return __vec_shuffle(__as_vector(__x), make_index_sequence<__bit_ceil(__return_size)>(), + [](size_t __i) { + return __i + __values_to_skip; + }); } } diff --git a/libstdc++-v3/include/experimental/bits/simd_fixed_size.h b/libstdc++-v3/include/experimental/bits/simd_fixed_size.h index 40885521297..bdfeefd0632 100644 --- a/libstdc++-v3/include/experimental/bits/simd_fixed_size.h +++ b/libstdc++-v3/include/experimental/bits/simd_fixed_size.h @@ -927,7 +927,9 @@ __extract_part(const _SimdTuple<_Tp, _A0, _As...>& __x) using _RetAbi = simd_abi::deduce_t<_Tp, __return_size>; // handle (optimize) the simple cases - if constexpr (_Index == 0 && _Tuple::_S_first_size == __return_size) + if constexpr (__return_size == 1) + return __x[integral_constant()]; + else if constexpr (_Index == 0 && _Tuple::_S_first_size == __return_size) return __x.first._M_data; else if constexpr (_Index == 0 && _Total == _Combine) return __x; diff --git a/libstdc++-v3/testsuite/experimental/simd/pr114958.cc b/libstdc++-v3/testsuite/experimental/simd/pr114958.cc new file mode 100644 index 00000000000..94c9e0a2d18 --- /dev/null +++ b/libstdc++-v3/testsuite/experimental/simd/pr114958.cc @@ -0,0 +1,20 @@ +// { dg-options "-std=c++17" } +// { dg-do compile { target x86_64-*-* } } +// { dg-require-effective-target c++17 } +// { dg-additional-options "-march=x86-64-v3" { target x86_64-*-* } } +// { dg-require-cmath "" } +// { dg-final { scan-assembler-times "vperm(q|pd)\[\\t \]+\\\$144" 1 } } + +#include + +namespace stdx = std::experimental; + +using T = std::uint64_t; +using V = stdx::simd>; +using V1 = stdx::simd; + +V perm(V data) +{ + auto [carry, _] = stdx::split<3, 1>(data); + return concat(V1(), carry); +}