From patchwork Mon Oct 14 13:28:02 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ricardo Jesus
X-Patchwork-Id: 1996976
From: Ricardo Jesus
To: gcc-patches@gcc.gnu.org, libstdc++@gcc.gnu.org
Cc: jwakely@redhat.com
Subject: [PATCH] aarch64: libstdc++: Use shufflevector instead of shuffle in opt_random.h
Date: Mon, 14 Oct 2024 14:28:02 +0100
Message-Id: <20241014132802.506972-1-rjj@nvidia.com>
X-Mailer: git-send-email 2.34.1
List-Id: Gcc-patches mailing list

This patch modifies the implementation of the vectorized Mersenne
Twister random number generator to use __builtin_shufflevector instead
of __builtin_shuffle. This makes it (almost) compatible with Clang.

To make the implementation fully compatible with Clang, Clang will need
to support internal Neon types like __Uint8x16_t and __Uint32x4_t, which
currently it does not. This looks like an oversight in Clang and so will
be addressed separately.

I see no codegen change with this patch.

Bootstrapped and tested on aarch64-none-linux-gnu.

Signed-off-by: Ricardo Jesus

2024-09-05  Ricardo Jesus

	* config/cpu/aarch64/opt/ext/opt_random.h (__VEXT): Replace uses of
	__builtin_shuffle with __builtin_shufflevector.
	(__aarch64_lsl_128): Move shift amount to a template parameter.
	(__aarch64_lsr_128): Move shift amount to a template parameter.
	(__aarch64_recursion): Update call sites of __aarch64_lsl_128 and
	__aarch64_lsr_128.
---
 .../config/cpu/aarch64/opt/ext/opt_random.h   | 28 +++++++++++--------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/libstdc++-v3/config/cpu/aarch64/opt/ext/opt_random.h b/libstdc++-v3/config/cpu/aarch64/opt/ext/opt_random.h
index 7f756d1572f..7eb816abcd0 100644
--- a/libstdc++-v3/config/cpu/aarch64/opt/ext/opt_random.h
+++ b/libstdc++-v3/config/cpu/aarch64/opt/ext/opt_random.h
@@ -35,13 +35,13 @@
 #ifdef __ARM_NEON
 
 #ifdef __ARM_BIG_ENDIAN
-# define __VEXT(_A,_B,_C) __builtin_shuffle (_A, _B, (__Uint8x16_t) \
-    {16-_C, 17-_C, 18-_C, 19-_C, 20-_C, 21-_C, 22-_C, 23-_C, \
-     24-_C, 25-_C, 26-_C, 27-_C, 28-_C, 29-_C, 30-_C, 31-_C})
+# define __VEXT(_A,_B,_C) __builtin_shufflevector (_A, _B, \
+    16-_C, 17-_C, 18-_C, 19-_C, 20-_C, 21-_C, 22-_C, 23-_C, \
+    24-_C, 25-_C, 26-_C, 27-_C, 28-_C, 29-_C, 30-_C, 31-_C)
 #else
-# define __VEXT(_A,_B,_C) __builtin_shuffle (_B, _A, (__Uint8x16_t) \
-    {_C, _C+1, _C+2, _C+3, _C+4, _C+5, _C+6, _C+7, \
-     _C+8, _C+9, _C+10, _C+11, _C+12, _C+13, _C+14, _C+15})
+# define __VEXT(_A,_B,_C) __builtin_shufflevector (_B, _A, \
+    _C, _C+1, _C+2, _C+3, _C+4, _C+5, _C+6, _C+7, \
+    _C+8, _C+9, _C+10, _C+11, _C+12, _C+13, _C+14, _C+15)
 #endif
 
 #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
@@ -52,9 +52,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   namespace {
     // Logical Shift right 128-bits by c * 8 bits
 
-    __extension__ extern __inline __Uint32x4_t
+    __extension__
+    template<int __c>
+    extern __inline __Uint32x4_t
     __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-    __aarch64_lsr_128 (__Uint8x16_t __a, __const int __c)
+    __aarch64_lsr_128 (__Uint8x16_t __a)
     {
       const __Uint8x16_t __zero = {0, 0, 0, 0, 0, 0, 0, 0,
                                    0, 0, 0, 0, 0, 0, 0, 0};
@@ -64,9 +66,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
     // Logical Shift left 128-bits by c * 8 bits
 
-    __extension__ extern __inline __Uint32x4_t
+    __extension__
+    template<int __c>
+    extern __inline __Uint32x4_t
     __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-    __aarch64_lsl_128 (__Uint8x16_t __a, __const int __c)
+    __aarch64_lsl_128 (__Uint8x16_t __a)
     {
       const __Uint8x16_t __zero = {0, 0, 0, 0, 0, 0, 0, 0,
                                    0, 0, 0, 0, 0, 0, 0, 0};
@@ -82,14 +86,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
                          __Uint32x4_t __e)
     {
       __Uint32x4_t __y = (__b >> __sr1);
-      __Uint32x4_t __z = __aarch64_lsr_128 ((__Uint8x16_t) __c, __sr2);
+      __Uint32x4_t __z = __aarch64_lsr_128<__sr2> ((__Uint8x16_t) __c);
 
       __Uint32x4_t __v = __d << __sl1;
 
       __z = __z ^ __a;
       __z = __z ^ __v;
 
-      __Uint32x4_t __x = __aarch64_lsl_128 ((__Uint8x16_t) __a, __sl2);
+      __Uint32x4_t __x = __aarch64_lsl_128<__sl2> ((__Uint8x16_t) __a);
 
       __y = __y & __e;
       __z = __z ^ __x;