From patchwork Fri Oct 11 14:57:23 2024
From: Jonathan Wakely <jwakely@redhat.com>
To: libstdc++@gcc.gnu.org, gcc-patches@gcc.gnu.org
Subject: [PATCH v2 1/2] libstdc++: Enable memcpy optimizations for distinct integral types [PR93059]
Date: Fri, 11 Oct 2024 15:57:23 +0100
Message-ID: <20241011145924.1323101-1-jwakely@redhat.com>

This removes the __memcpyable_integer<std::byte> specialization (for the
reasons given earlier, and in the commit message below).

Tested x86_64-linux.

-- >8 --

Currently we only optimize std::copy, std::copy_n etc. to memmove when
the source and destination types are the same. This means that we fail
to optimize copying between distinct 1-byte types, e.g. copying from a
buffer of unsigned char to a buffer of char8_t or vice versa.

This patch adds more partial specializations of the __memcpyable trait
so that we allow memcpy between integers of equal widths. This will
enable memmove for copies between narrow character types and also
between same-width types like int and unsigned.

Enabling the optimization needs to be based on the width of the integer
type, not just the size in bytes. This is because some targets define
non-standard integral types such as __int20 on msp430, which has
padding bits. It would not be safe to memcpy between e.g. __int20 and
int32_t, even though sizeof(__int20) == sizeof(int32_t).

A new trait, __memcpyable_integer, is introduced to define the width,
and the __memcpyable trait then compares the widths. It's safe to copy
between signed and unsigned integers of the same width, because GCC
only supports two's complement integers.

I initially thought it would be useful to define a
__memcpyable_integer<std::byte> specialization to enable copying between
narrow character types and std::byte. But that isn't possible with
std::copy, because is_assignable is false between std::byte and the
narrow character types. Optimized copies using memmove already happen
when copying std::byte to std::byte, because __memcpyable is true for
that case.

libstdc++-v3/ChangeLog:

	PR libstdc++/93059
	* include/bits/cpp_type_traits.h (__memcpyable): Add partial
	specialization for pointers to distinct types.
	(__memcpyable_integer): New trait to control which types can
	use cross-type memcpy optimizations.
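As a purely illustrative sketch (not part of the patch; the function
name and buffer types are made-up examples), this is the kind of copy
the new trait is intended to let the library lower to a single memmove:

    #include <algorithm>
    #include <cstddef>

    // unsigned char and char8_t are distinct 1-byte integral types of
    // equal width (char8_t needs -std=c++20), so with this patch
    // __memcpyable<char8_t*, const unsigned char*> should be true and
    // std::copy can dispatch to __builtin_memmove instead of copying
    // element by element.
    void
    copy_to_char8(const unsigned char* src, std::size_t n, char8_t* dst)
    {
      std::copy(src, src + n, dst);
    }

The width-based check keeps types with padding bits, such as __int20 on
msp430, out of this path even when their sizeof matches another integer
type.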
---
 libstdc++-v3/include/bits/cpp_type_traits.h | 79 ++++++++++++++++++++-
 1 file changed, 77 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/bits/cpp_type_traits.h b/libstdc++-v3/include/bits/cpp_type_traits.h
index 060652afb18..25ad8bcbf6b 100644
--- a/libstdc++-v3/include/bits/cpp_type_traits.h
+++ b/libstdc++-v3/include/bits/cpp_type_traits.h
@@ -434,8 +434,6 @@ __INT_N(__GLIBCXX_TYPE_INT_N_3)
   };
 #endif
 
-  template<typename _Iterator> struct iterator_traits;
-
   // A type that is safe for use with memcpy, memmove, memcmp etc.
   template<typename _Tp>
     struct __is_nonvolatile_trivially_copyable
@@ -459,16 +457,93 @@ __INT_N(__GLIBCXX_TYPE_INT_N_3)
       enum { __value = 0 };
     };
 
+  // Allow memcpy when source and destination are pointers to the same type.
   template<typename _Tp>
     struct __memcpyable<_Tp*, _Tp*>
     : __is_nonvolatile_trivially_copyable<_Tp>
     { };
 
+  // Source pointer can be const.
   template<typename _Tp>
     struct __memcpyable<_Tp*, const _Tp*>
     : __is_nonvolatile_trivially_copyable<_Tp>
     { };
 
+  template<typename _Tp> struct __memcpyable_integer;
+
+  // For heterogeneous types, allow memcpy between equal-sized integers.
+  template<typename _Tp, typename _Up>
+    struct __memcpyable<_Tp*, _Up*>
+    {
+      enum {
+        __value = __memcpyable_integer<_Tp>::__width != 0
+                    && ((int)__memcpyable_integer<_Tp>::__width
+                          == (int)__memcpyable_integer<_Up>::__width)
+      };
+    };
+
+  // Specialization for const U* because __is_integer<const U> is never true.
+  template<typename _Tp, typename _Up>
+    struct __memcpyable<_Tp*, const _Up*>
+    : __memcpyable<_Tp*, _Up*>
+    { };
+
+  template<typename _Tp>
+    struct __memcpyable_integer
+    {
+      enum {
+        __width = __is_integer<_Tp>::__value ? (sizeof(_Tp) * __CHAR_BIT__) : 0
+      };
+    };
+
+  // Cannot memcpy volatile memory.
+  template<typename _Tp>
+    struct __memcpyable_integer<volatile _Tp>
+    { enum { __width = 0 }; };
+
+  // Specializations for __intNN types with padding bits.
+#if defined __GLIBCXX_TYPE_INT_N_0 && __GLIBCXX_BITSIZE_INT_N_0 % __CHAR_BIT__
+  template<>
+    struct __memcpyable_integer<__GLIBCXX_TYPE_INT_N_0>
+    { enum { __width = __GLIBCXX_BITSIZE_INT_N_0 }; };
+  template<>
+    struct __memcpyable_integer<unsigned __GLIBCXX_TYPE_INT_N_0>
+    { enum { __width = __GLIBCXX_BITSIZE_INT_N_0 }; };
+#endif
+#if defined __GLIBCXX_TYPE_INT_N_1 && __GLIBCXX_BITSIZE_INT_N_1 % __CHAR_BIT__
+  template<>
+    struct __memcpyable_integer<__GLIBCXX_TYPE_INT_N_1>
+    { enum { __width = __GLIBCXX_BITSIZE_INT_N_1 }; };
+  template<>
+    struct __memcpyable_integer<unsigned __GLIBCXX_TYPE_INT_N_1>
+    { enum { __width = __GLIBCXX_BITSIZE_INT_N_1 }; };
+#endif
+#if defined __GLIBCXX_TYPE_INT_N_2 && __GLIBCXX_BITSIZE_INT_N_2 % __CHAR_BIT__
+  template<>
+    struct __memcpyable_integer<__GLIBCXX_TYPE_INT_N_2>
+    { enum { __width = __GLIBCXX_BITSIZE_INT_N_2 }; };
+  template<>
+    struct __memcpyable_integer<unsigned __GLIBCXX_TYPE_INT_N_2>
+    { enum { __width = __GLIBCXX_BITSIZE_INT_N_2 }; };
+#endif
+#if defined __GLIBCXX_TYPE_INT_N_3 && __GLIBCXX_BITSIZE_INT_N_3 % __CHAR_BIT__
+  template<>
+    struct __memcpyable_integer<__GLIBCXX_TYPE_INT_N_3>
+    { enum { __width = __GLIBCXX_BITSIZE_INT_N_3 }; };
+  template<>
+    struct __memcpyable_integer<unsigned __GLIBCXX_TYPE_INT_N_3>
+    { enum { __width = __GLIBCXX_BITSIZE_INT_N_3 }; };
+#endif
+
+#if defined __STRICT_ANSI__ && defined __SIZEOF_INT128__
+  // In strict modes __is_integer<__int128> is false,
+  // but we want to allow memcpy between signed/unsigned __int128.
+  template<>
+    struct __memcpyable_integer<__int128> { enum { __width = 128 }; };
+  template<>
+    struct __memcpyable_integer<unsigned __int128> { enum { __width = 128 }; };
+#endif
+
   // Whether two iterator types can be used with memcmp.
   // This trait only says it's well-formed to use memcmp, not that it
   // gives the right answer for a given algorithm. So for example, std::equal

From patchwork Fri Oct 11 14:57:24 2024
From: Jonathan Wakely <jwakely@redhat.com>
To: libstdc++@gcc.gnu.org, gcc-patches@gcc.gnu.org
Subject: [PATCH v2 2/2] libstdc++: Enable memset optimizations for distinct character types [PR93059]
Date: Fri, 11 Oct 2024 15:57:24 +0100
Message-ID: <20241011145924.1323101-2-jwakely@redhat.com>
In-Reply-To: <20241011145924.1323101-1-jwakely@redhat.com>
References: <20241011145924.1323101-1-jwakely@redhat.com>

This adds __are_same to the enable_if condition, to ensure that the
optimized overload can still be selected for std::byte (because that's
no longer done via a __memcpyable_integer<std::byte> specialization).

Tested x86_64-linux.

-- >8 --

Currently we only optimize std::fill to memset when the source and
destination types are the same byte-sized type. This means that we fail
to optimize cases like std::fill(buf, buf+n, 0), because the literal 0
is not the same type as the character buffer.

Such cases can safely be optimized to use memset, because assigning an
int (or other integer) to a narrow character type has the same effect
as converting the integer to unsigned char and then copying it with
memset.

This patch enables the optimized code path when the fill value is a
memcpy-able integer (using the new __memcpyable_integer trait). We
still need to check __are_same to enable the memset optimization for
filling a range of std::byte with a std::byte value, because std::byte
is not a memcpyable integer.

libstdc++-v3/ChangeLog:

	PR libstdc++/93059
	* include/bits/stl_algobase.h (__fill_a1(T*, T*, const T&)):
	Change template parameters and enable_if condition to allow the
	fill value to be an integer.
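As a purely illustrative sketch (not part of the patch; the function
name is a made-up example), this is the kind of call the change is
intended to turn into a single memset:

    #include <algorithm>
    #include <cstddef>

    // The literal 0 has type int, not char, so before this change the
    // generic element-wise __fill_a1 loop was used. The intent here is
    // that the fill value now counts as a memcpy-able integer, the
    // byte-optimized overload is selected, and the fill becomes one
    // memset call.
    void
    zero_buffer(char* buf, std::size_t n)
    {
      std::fill(buf, buf + n, 0);
    }

Filling a range of std::byte with a std::byte value continues to use
memset via the __are_same branch of the condition.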
---
 libstdc++-v3/include/bits/stl_algobase.h | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/libstdc++-v3/include/bits/stl_algobase.h b/libstdc++-v3/include/bits/stl_algobase.h
index 9e92211c124..384e5fdcdc9 100644
--- a/libstdc++-v3/include/bits/stl_algobase.h
+++ b/libstdc++-v3/include/bits/stl_algobase.h
@@ -967,23 +967,27 @@ _GLIBCXX_END_NAMESPACE_CONTAINER
 #pragma GCC diagnostic pop
 
   // Specialization: for char types we can use memset.
-  template<typename _Tp>
+  template<typename _Up, typename _Tp>
     _GLIBCXX20_CONSTEXPR
     inline typename
-    __gnu_cxx::__enable_if<__is_byte<_Tp>::__value, void>::__type
-    __fill_a1(_Tp* __first, _Tp* __last, const _Tp& __c)
+    __gnu_cxx::__enable_if<__is_byte<_Up>::__value
+                             && (__are_same<_Up, _Tp>::__value // for std::byte
+                                   || __memcpyable_integer<_Tp>::__value),
+                           void>::__type
+    __fill_a1(_Up* __first, _Up* __last, const _Tp& __x)
     {
-      const _Tp __tmp = __c;
+      // This hoists the load out of the loop and also ensures that we don't
+      // use memset for cases where the assignment would be ill-formed.
+      const _Up __val = __x;
 #if __cpp_lib_is_constant_evaluated
       if (std::is_constant_evaluated())
 	{
 	  for (; __first != __last; ++__first)
-	    *__first = __tmp;
-	  return;
+	    *__first = __val;
 	}
 #endif
       if (const size_t __len = __last - __first)
-	__builtin_memset(__first, static_cast<unsigned char>(__tmp), __len);
+	__builtin_memset(__first, static_cast<unsigned char>(__val), __len);
     }
 
   template