From patchwork Wed Jun 7 11:38:37 2017
X-Patchwork-Submitter: Tamar Christina
X-Patchwork-Id: 772374
From: Tamar Christina
To: GCC Patches
CC: nd, James Greenhalgh, Marcus Shawcroft, Richard Earnshaw
Subject: [PATCH][GCC][AArch64] optimize float immediate moves (2 /4) - HF/DF/SF mode.
Date: Wed, 7 Jun 2017 11:38:37 +0000

Hi All,

This patch adds support for creating floating point constants
using mov immediate instructions.  The movi SIMD instruction can
be used for HFmode and SFmode constants, e.g. for -0.0f we generate:

        movi     v0.2s, 0x80, lsl 24

More complex constants can be generated using an integer MOV or
MOV+MOVK:

        mov   w0, 48128
        movk  w0, 0x47f0, lsl 16
        fmov  s0, w0

We allow up to 3 instructions as this allows all HF, SF and most DF
constants to be generated without a literal load, and is overall best
for codesize.

Regression tested on aarch64-none-linux-gnu with no regressions.

OK for trunk?

Thanks,
Tamar

gcc/
2017-06-07  Tamar Christina

        * config/aarch64/aarch64.md (mov): Generalize.
        (*movhf_aarch64, *movsf_aarch64, *movdf_aarch64):
        Add integer and movi cases.

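As an illustration of the two constants quoted in the description, here is a minimal C sketch of how a float's bit pattern maps onto the MOV/MOVK immediates. It is not code from the patch: `float_bits`, `split_halfwords` and `self_check` are hypothetical names, the sketch assumes `float` is IEEE 754 binary32, and it does not model what `aarch64_expand_mov_immediate` actually does when choosing instructions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not GCC's aarch64_reinterpret_float_as_int):
   return the raw bit pattern of a float, assuming IEEE 754 binary32.  */
static uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* well-defined way to reinterpret bits */
    return bits;
}

/* Split a 32-bit pattern into the two 16-bit immediates that a MOV
   (low half) followed by a MOVK ..., lsl 16 (high half) would carry.  */
static void split_halfwords(uint32_t bits, uint16_t *lo, uint16_t *hi)
{
    *lo = (uint16_t)(bits & 0xffffu);
    *hi = (uint16_t)(bits >> 16);
}

/* Check the two constants used as examples in the description.  */
void self_check(void)
{
    uint16_t lo, hi;

    /* -0.0f is only the sign bit, i.e. 0x80 shifted left by 24.  */
    assert(float_bits(-0.0f) == (UINT32_C(0x80) << 24));

    /* 123256.0f has the bit pattern 0x47f0bc00.  */
    split_halfwords(float_bits(123256.0f), &lo, &hi);
    assert(lo == 48128);   /* mov  w0, 48128 */
    assert(hi == 0x47f0);  /* movk w0, 0x47f0, lsl 16 */
}
```

Calling `self_check()` passes on a target with binary32 floats. The MOV/MOVK example in the description therefore appears to build the constant 123256.0f, and the movi example sets only the sign bit.
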
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 5adc5edb8dde9c30450b04932a37c41f84cc5ed1..7f107672882b13809be01355ffafbc2807cc5adb 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1167,66 +1167,120 @@
   }
 )
 
-(define_insn "*movhf_aarch64"
-  [(set (match_operand:HF 0 "nonimmediate_operand" "=w,w ,?r,w,w,m,r,m ,r")
-	(match_operand:HF 1 "general_operand" "Y ,?rY, w,w,m,w,m,rY,r"))]
+(define_insn_and_split "*movhf_aarch64"
+  [(set (match_operand:HF 0 "nonimmediate_operand" "=w,w ,?r,w,w ,w ,w,m,r,m ,r")
+	(match_operand:HF 1 "general_operand" "Y ,?rY, w,w,Ufc,Uvi,m,w,m,rY,r"))]
   "TARGET_FLOAT && (register_operand (operands[0], HFmode)
-    || aarch64_reg_or_fp_zero (operands[1], HFmode))"
+    || aarch64_reg_or_fp_float (operands[1], HFmode))"
   "@
    movi\\t%0.4h, #0
-   mov\\t%0.h[0], %w1
+   fmov\\t%s0, %w1
    umov\\t%w0, %1.h[0]
    mov\\t%0.h[0], %1.h[0]
+   fmov\\t%s0, %1
+   * return aarch64_output_scalar_simd_mov_immediate (operands[1], SImode);
    ldr\\t%h0, %1
    str\\t%h1, %0
    ldrh\\t%w0, %1
    strh\\t%w1, %0
    mov\\t%w0, %w1"
-  [(set_attr "type" "neon_move,neon_from_gp,neon_to_gp,neon_move,\
-                     f_loads,f_stores,load1,store1,mov_reg")
-   (set_attr "simd" "yes,yes,yes,yes,*,*,*,*,*")]
+  "&& can_create_pseudo_p ()
+   && !aarch64_can_const_movi_rtx_p (operands[1], HFmode)
+   && !aarch64_float_const_representable_p (operands[1])
+   && aarch64_float_const_rtx_p (operands[1])"
+  [(const_int 0)]
+  "{
+    unsigned HOST_WIDE_INT ival;
+    if (!aarch64_reinterpret_float_as_int (operands[1], &ival))
+      FAIL;
+
+    rtx tmp = gen_reg_rtx (SImode);
+    aarch64_expand_mov_immediate (tmp, GEN_INT (ival));
+    tmp = simplify_gen_subreg (HImode, tmp, SImode, 0);
+    emit_move_insn (operands[0], gen_lowpart (HFmode, tmp));
+    DONE;
+  }"
+  [(set_attr "type" "neon_move,f_mcr,neon_to_gp,neon_move,fconsts, \
+		     neon_move,f_loads,f_stores,load1,store1,mov_reg")
+   (set_attr "simd" "yes,*,yes,yes,*,yes,*,*,*,*,*")]
 )
 
-(define_insn "*movsf_aarch64"
-  [(set (match_operand:SF 0 "nonimmediate_operand" "=w,w ,?r,w,w ,w,m,r,m ,r")
-	(match_operand:SF 1 "general_operand" "Y ,?rY, w,w,Ufc,m,w,m,rY,r"))]
+(define_insn_and_split "*movsf_aarch64"
+  [(set (match_operand:SF 0 "nonimmediate_operand" "=w,w ,?r,w,w ,w ,w,m,r,m ,r,r")
+	(match_operand:SF 1 "general_operand" "Y ,?rY, w,w,Ufc,Uvi,m,w,m,rY,r,M"))]
   "TARGET_FLOAT && (register_operand (operands[0], SFmode)
-    || aarch64_reg_or_fp_zero (operands[1], SFmode))"
+    || aarch64_reg_or_fp_float (operands[1], SFmode))"
   "@
    movi\\t%0.2s, #0
    fmov\\t%s0, %w1
    fmov\\t%w0, %s1
    fmov\\t%s0, %s1
    fmov\\t%s0, %1
+   * return aarch64_output_scalar_simd_mov_immediate (operands[1], SImode);
    ldr\\t%s0, %1
    str\\t%s1, %0
    ldr\\t%w0, %1
    str\\t%w1, %0
-   mov\\t%w0, %w1"
-  [(set_attr "type" "neon_move,f_mcr,f_mrc,fmov,fconsts,\
-                     f_loads,f_stores,load1,store1,mov_reg")
-   (set_attr "simd" "yes,*,*,*,*,*,*,*,*,*")]
+   mov\\t%w0, %w1
+   mov\\t%w0, %1"
+  "&& can_create_pseudo_p ()
+   && !aarch64_can_const_movi_rtx_p (operands[1], SFmode)
+   && !aarch64_float_const_representable_p (operands[1])
+   && aarch64_float_const_rtx_p (operands[1])"
+  [(const_int 0)]
+  "{
+    unsigned HOST_WIDE_INT ival;
+    if (!aarch64_reinterpret_float_as_int (operands[1], &ival))
+      FAIL;
+
+    rtx tmp = gen_reg_rtx (SImode);
+    aarch64_expand_mov_immediate (tmp, GEN_INT (ival));
+    emit_move_insn (operands[0], gen_lowpart (SFmode, tmp));
+    DONE;
+  }"
+  [(set_attr "type" "neon_move,f_mcr,f_mrc,fmov,fconsts,neon_move,\
+		     f_loads,f_stores,load1,store1,mov_reg,\
+		     fconsts")
+   (set_attr "simd" "yes,*,*,*,*,yes,*,*,*,*,*,*")]
 )
 
-(define_insn "*movdf_aarch64"
-  [(set (match_operand:DF 0 "nonimmediate_operand" "=w,w ,?r,w,w ,w,m,r,m ,r")
-	(match_operand:DF 1 "general_operand" "Y ,?rY, w,w,Ufc,m,w,m,rY,r"))]
+(define_insn_and_split "*movdf_aarch64"
+  [(set (match_operand:DF 0 "nonimmediate_operand" "=w, w ,?r,w,w ,w ,w,m,r,m ,r,r")
+	(match_operand:DF 1 "general_operand" "Y , ?rY, w,w,Ufc,Uvi,m,w,m,rY,r,N"))]
   "TARGET_FLOAT && (register_operand (operands[0], DFmode)
-    || aarch64_reg_or_fp_zero (operands[1], DFmode))"
+    || aarch64_reg_or_fp_float (operands[1], DFmode))"
   "@
    movi\\t%d0, #0
    fmov\\t%d0, %x1
    fmov\\t%x0, %d1
    fmov\\t%d0, %d1
    fmov\\t%d0, %1
+   * return aarch64_output_scalar_simd_mov_immediate (operands[1], DImode);
    ldr\\t%d0, %1
    str\\t%d1, %0
    ldr\\t%x0, %1
    str\\t%x1, %0
-   mov\\t%x0, %x1"
-  [(set_attr "type" "neon_move,f_mcr,f_mrc,fmov,fconstd,\
-                     f_loadd,f_stored,load1,store1,mov_reg")
-   (set_attr "simd" "yes,*,*,*,*,*,*,*,*,*")]
+   mov\\t%x0, %x1
+   mov\\t%x0, %1"
+  "&& can_create_pseudo_p ()
+   && !aarch64_can_const_movi_rtx_p (operands[1], DFmode)
+   && !aarch64_float_const_representable_p (operands[1])
+   && aarch64_float_const_rtx_p (operands[1])"
+  [(const_int 0)]
+  "{
+    unsigned HOST_WIDE_INT ival;
+    if (!aarch64_reinterpret_float_as_int (operands[1], &ival))
+      FAIL;
+
+    rtx tmp = gen_reg_rtx (DImode);
+    aarch64_expand_mov_immediate (tmp, GEN_INT (ival));
+    emit_move_insn (operands[0], gen_lowpart (DFmode, tmp));
+    DONE;
+  }"
+  [(set_attr "type" "neon_move,f_mcr,f_mrc,fmov,fconstd,neon_move,\
+		     f_loadd,f_stored,load1,store1,mov_reg,\
+		     fconstd")
+   (set_attr "simd" "yes,*,*,*,*,yes,*,*,*,*,*,*")]
 )
 
 (define_insn "*movtf_aarch64"