From patchwork Mon Jun 12 07:56:58 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tamar Christina
X-Patchwork-Id: 774454
Delivered-To: mailing list gcc-patches@gcc.gnu.org
From: Tamar Christina
To: GCC Patches
CC: nd, James Greenhalgh, "Richard Earnshaw", Marcus Shawcroft
Subject: [GCC][PATCH][AArch64] Optimize x * copysign (1.0, y) [Patch (2/2)]
Date: Mon, 12 Jun 2017 07:56:58 +0000

Hi All,

This patch implements an optimization that rewrites

   x * copysign (1.0, y) and
   x * copysign (-1.0, y)

to:

   x ^ (y & (1 << sign_bit_position))

The patch provides AArch64 optabs for XORSIGN, both vectorized and scalar.

This patch is a revival of a previous patch:
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg00069.html

Bootstrapped on both aarch64-none-linux-gnu and x86_64 with no issues.
Regression tested on aarch64-none-linux-gnu with no regressions.

AArch64 now generates in GCC:

        movi    v2.2s, 0x80, lsl 24
        and     v1.8b, v1.8b, v2.8b
        eor     v0.8b, v0.8b, v1.8b

as opposed to before:

        fmov    s2, 1.0e+0
        mov     x0, 2147483648
        fmov    d3, x0
        bsl     v3.8b, v1.8b, v2.8b
        fmul    s0, s0, s3

Ok for trunk?

gcc/
2017-06-07  Tamar Christina

        * config/aarch64/aarch64.md (xorsign3): New optabs.
        * config/aarch64/aarch64-builtins.c
        (aarch64_builtin_vectorized_function): Added CASE_CFN_XORSIGN.
        * config/aarch64/aarch64-simd-builtins.def: Added xorsign BINOP.
        * config/aarch64/aarch64-simd.md: Added xorsign3.

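For reference, the rewrite described above can be sketched in plain C for the single-precision case. This is only an illustration and not part of the patch: it assumes float is IEEE 754 binary32, and the helper names (float_bits, bits_float, xorsign_f32, xorsign_matches_multiply) are made up for the example.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its 32-bit pattern.  memcpy avoids aliasing
   problems.  Assumes float is IEEE 754 binary32.  */
static uint32_t
float_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return u;
}

/* Inverse of float_bits.  */
static float
bits_float (uint32_t u)
{
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}

/* Hypothetical helper: x ^ (y & (1 << sign_bit_position)), computed on
   the raw bit patterns.  The sign bit of binary32 is bit 31.  */
static float
xorsign_f32 (float x, float y)
{
  const uint32_t sign_mask = UINT32_C (1) << 31;
  return bits_float (float_bits (x) ^ (float_bits (y) & sign_mask));
}

/* Returns 1 when the bitwise form and the multiplication give the same
   bit pattern.  Only meaningful for non-NaN inputs.  */
static int
xorsign_matches_multiply (float x, float y)
{
  float by_mul = x * copysignf (1.0f, y);
  return float_bits (xorsign_f32 (x, y)) == float_bits (by_mul);
}
```

Since copysign discards the sign of its first argument, copysignf (-1.0f, y) and copysignf (1.0f, y) are the same value, which is why both source forms map to the same bitwise sequence.
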
diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index f09399f4c158112c90c270856bffb4cafd03e7d4..8a2e214db2bd590fc809cf8c58bfe4aca2af9bef 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -1432,6 +1432,15 @@ aarch64_builtin_vectorized_function (unsigned int fn, tree type_out,
 	  return AARCH64_FIND_FRINT_VARIANT (nearbyint);
 	CASE_CFN_SQRT:
 	  return AARCH64_FIND_FRINT_VARIANT (sqrt);
+	CASE_CFN_XORSIGN:
+	  if (AARCH64_CHECK_BUILTIN_MODE (2, S))
+	    return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_BINOP_xorsignv2sf];
+	  else if (AARCH64_CHECK_BUILTIN_MODE (4, S))
+	    return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_BINOP_xorsignv4sf];
+	  else if (AARCH64_CHECK_BUILTIN_MODE (2, D))
+	    return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_BINOP_xorsignv2df];
+	  else
+	    return NULL_TREE;
 #undef AARCH64_CHECK_BUILTIN_MODE
 #define AARCH64_CHECK_BUILTIN_MODE(C, N) \
   (out_mode == SImode && out_n == C \
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index d713d5d8b88837ec6f2dc51188fb252f8d5bc8bd..b7f50b849dba8d788be142cd839c4a5560e9204e 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -151,6 +151,9 @@
   BUILTIN_VQN (TERNOP, raddhn2, 0)
   BUILTIN_VQN (TERNOP, rsubhn2, 0)
 
+  /* Implemented by xorsign3.  */
+  BUILTIN_VHSDF (BINOP, xorsign, 3)
+
   BUILTIN_VSQN_HSDI (UNOP, sqmovun, 0)
   /* Implemented by aarch64_qmovn.  */
   BUILTIN_VSQN_HSDI (UNOP, sqmovn, 0)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index c5a86ff6f7196eb634be426ecea97cdfbfc7a7a4..1e92fa1b54a592db5dde9048e51988c03ece141c 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -351,6 +351,35 @@
   }
 )
 
+(define_expand "xorsign3"
+  [(match_operand:VHSDF 0 "register_operand")
+   (match_operand:VHSDF 1 "register_operand")
+   (match_operand:VHSDF 2 "register_operand")]
+  "TARGET_SIMD"
+{
+
+  machine_mode imode = mode;
+  rtx v_bitmask = gen_reg_rtx (imode);
+  rtx op1x = gen_reg_rtx (imode);
+  rtx op2x = gen_reg_rtx (imode);
+
+  rtx arg1 = lowpart_subreg (imode, operands[1], mode);
+  rtx arg2 = lowpart_subreg (imode, operands[2], mode);
+
+  int bits = GET_MODE_UNIT_BITSIZE (mode) - 1;
+
+  emit_move_insn (v_bitmask,
+		  aarch64_simd_gen_const_vector_dup (mode,
+						     HOST_WIDE_INT_M1U << bits));
+
+  emit_insn (gen_and3 (op2x, v_bitmask, arg2));
+  emit_insn (gen_xor3 (op1x, arg1, op2x));
+  emit_move_insn (operands[0],
+		  lowpart_subreg (mode, op1x, imode));
+  DONE;
+}
+)
+
 (define_expand "copysign3"
   [(match_operand:VHSDF 0 "register_operand")
    (match_operand:VHSDF 1 "register_operand")
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 2e9331fd72b3f36270b8741d97fb3275b4bf2657..8ecdae41a2f4ec42cf28dc6309f3e69fe74ba39d 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4988,6 +4988,42 @@
   }
 )
 
+;; For xorsign (x, y), we want to generate:
+;;
+;; LDR   d2, #1<<63
+;; AND   v3.8B, v1.8B, v2.8B
+;; EOR   v0.8B, v0.8B, v3.8B
+;;
+
+(define_expand "xorsign3"
+  [(match_operand:GPF 0 "register_operand")
+   (match_operand:GPF 1 "register_operand")
+   (match_operand:GPF 2 "register_operand")]
+  "TARGET_FLOAT && TARGET_SIMD"
+{
+
+  machine_mode imode = mode;
+  rtx mask = gen_reg_rtx (imode);
+  rtx op1x = gen_reg_rtx (imode);
+  rtx op2x = gen_reg_rtx (imode);
+
+  int bits = GET_MODE_BITSIZE (mode) - 1;
+  emit_move_insn (mask, GEN_INT (trunc_int_for_mode (HOST_WIDE_INT_M1U << bits,
+						     imode)));
+
+  emit_insn (gen_and3 (op2x, mask,
+		       lowpart_subreg (imode, operands[2],
+				       mode)));
+  emit_insn (gen_xor3 (op1x,
+		       lowpart_subreg (imode, operands[1],
+				       mode),
+		       op2x));
+  emit_move_insn (operands[0],
+		  lowpart_subreg (mode, op1x, imode));
+  DONE;
+}
+)
+
 ;; -------------------------------------------------------------------
 ;; Reload support
 ;; -------------------------------------------------------------------
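
As a rough model of what the expanders emit, the mask construction and the AND/EOR sequence can be emulated lane by lane in C. This is a sketch under assumptions and not GCC code: lane_sign_mask_32 and xorsign_lanes_f32 are hypothetical names, a uint64_t stands in for HOST_WIDE_INT, lanes are taken to be IEEE 754 binary32, and the shifted all-ones constant is assumed to be truncated to the lane width as trunc_int_for_mode does in the scalar expander.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for HOST_WIDE_INT_M1U << bits with bits = 31, truncated to a
   32-bit lane.  The wide value is 0xFFFFFFFF80000000; only the low 32
   bits survive, leaving just the sign bit set.  */
static uint32_t
lane_sign_mask_32 (void)
{
  int bits = 32 - 1;
  uint64_t wide = ~UINT64_C (0) << bits;
  return (uint32_t) wide;
}

/* Hypothetical lane-wise emulation of the emitted sequence:
   op2x = mask & y;  op1x = x ^ op2x;  dst = op1x.  */
static void
xorsign_lanes_f32 (float *dst, const float *x, const float *y, size_t n)
{
  const uint32_t mask = lane_sign_mask_32 ();
  for (size_t i = 0; i < n; i++)
    {
      uint32_t xb, yb, rb;
      memcpy (&xb, &x[i], sizeof xb);	/* lowpart_subreg to the integer view.  */
      memcpy (&yb, &y[i], sizeof yb);
      rb = xb ^ (yb & mask);		/* AND with the mask, then EOR.  */
      memcpy (&dst[i], &rb, sizeof rb);	/* Back to the float view.  */
    }
}
```

Calling xorsign_lanes_f32 with n = 2 or n = 4 corresponds to the V2SF and V4SF cases handled under CASE_CFN_XORSIGN; the V2DF case would follow the same shape with 64-bit lanes and bits = 63.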