From patchwork Tue Nov 29 21:37:17 2016
X-Patchwork-Submitter: Bernd Edlinger
X-Patchwork-Id: 700722
From: Bernd Edlinger
To: Wilco Dijkstra, Ramana Radhakrishnan
CC: GCC Patches, Kyrill Tkachov, Richard Earnshaw
Subject: Re: [PATCH, ARM] Further improve stack usage on sha512 (PR 77308)
Date: Tue, 29 Nov 2016 21:37:17 +0000

On 11/29/16 16:06, Wilco Dijkstra wrote:
> Bernd Edlinger wrote:
>
> -  "TARGET_32BIT && reload_completed
> +  "TARGET_32BIT && ((!TARGET_NEON && !TARGET_IWMMXT) || reload_completed)
>     && ! (TARGET_NEON && IS_VFP_REGNUM (REGNO (operands[0])))"
>
> This is equivalent to "&& (!TARGET_IWMMXT || reload_completed)" since we're
> already excluding NEON.
>

Aehm, no.  This would split the addi_neon insn before it is clear
if the reload pass will assign a VFP register.
With this change the stack usage with -mfpu=neon increases from 2300
to around 2600 bytes.

> This patch expands ADD and SUB earlier, so shouldn't we do the same obvious
> change for the similar instructions CMP and NEG?
>

Good question.  I think the cmp and neg patterns are more complicated
and typically have a more complicated data flow than the other patterns.

I tried to create a test case which expands the cmpdi and negdi patterns
(see the pr77308-2.c diff below).  With *arm_cmpdi_insn also split early
(see the arm.md diff below), on top of the latest patch, I got:

gcc -S -Os pr77308-2.c -fdump-rtl-all-verbose
pr77308-2.c: In function 'sha512_block_data_order':
pr77308-2.c:169:1: error: unrecognizable insn:
 }
 ^
(insn 4870 4869 1636 87 (set (scratch:SI)
        (minus:SI (minus:SI (subreg:SI (reg:DI 2261) 4)
                (subreg:SI (reg:DI 473 [ X$14 ]) 4))
            (ltu:SI (reg:CC_C 100 cc)
                (const_int 0 [0])))) "pr77308-2.c":140 -1
     (nil))
pr77308-2.c:169:1: internal compiler error: in extract_insn, at recog.c:2311
0xaf4cd8 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
        ../../gcc-trunk/gcc/rtl-error.c:108
0xaf4d09 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
        ../../gcc-trunk/gcc/rtl-error.c:116
0xac74ef extract_insn(rtx_insn*)
        ../../gcc-trunk/gcc/recog.c:2311
0x122427a decompose_multiword_subregs
        ../../gcc-trunk/gcc/lower-subreg.c:1467
0x122550d execute
        ../../gcc-trunk/gcc/lower-subreg.c:1734

So it is certainly possible, but not really simple, to improve the
stack size even further.  But I would prefer to do that in a
separate patch.

BTW: there are also negdi2_compare, *negdi_extendsidi,
*negdi_zero_extendsidi, *thumb2_negdi2.

I think it would be a precondition to have test cases that exercise
each of these patterns before we try to split these instructions.


Bernd.
--- pr77308-1.c	2016-11-25 17:53:20.379141465 +0100
+++ pr77308-2.c	2016-11-29 20:46:51.266948631 +0100
@@ -68,10 +68,10 @@
 #define B(x,j) (((SHA_LONG64)(*(((const unsigned char *)(&x))+j)))<<((7-j)*8))
 #define PULL64(x) (B(x,0)|B(x,1)|B(x,2)|B(x,3)|B(x,4)|B(x,5)|B(x,6)|B(x,7))
 #define ROTR(x,s) (((x)>>s) | (x)<<(64-s))
-#define Sigma0(x) ~(ROTR((x),28) ^ ROTR((x),34) ^ ROTR((x),39))
-#define Sigma1(x) ~(ROTR((x),14) ^ ROTR((x),18) ^ ROTR((x),41))
-#define sigma0(x) ~(ROTR((x),1) ^ ROTR((x),8) ^ ((x)>>7))
-#define sigma1(x) ~(ROTR((x),19) ^ ROTR((x),61) ^ ((x)>>6))
+#define Sigma0(x) (ROTR((x),28) ^ ROTR((x),34) ^ ROTR((x),39) == (x) ? -(x) : (x))
+#define Sigma1(x) (ROTR((x),14) ^ ROTR(-(x),18) ^ ROTR((x),41) < (x) ? -(x) : (x))
+#define sigma0(x) (ROTR((x),1) ^ ROTR((x),8) ^ ((x)>>7) <= (x) ? ~(x) : (x))
+#define sigma1(x) ((long long)(ROTR((x),19) ^ ROTR((x),61) ^ ((x)>>6)) < (long long)(x) ? -(x) : (x))
 #define Ch(x,y,z) (((x) & (y)) ^ ((~(x)) & (z)))
 #define Maj(x,y,z) (((x) & (y)) ^ ((x) & (z)) ^ ((y) & (z)))

This expands *arm_negdi2, *arm_cmpdi_unsigned, *arm_cmpdi_insn.
The stack usage is around 1900 bytes with the previous patch,
and 2300 bytes without.

I tried to split *arm_negdi2 and *arm_cmpdi_unsigned early, and it
does indeed give smaller stack sizes in the test case above (~400 bytes).
But when I make *arm_cmpdi_insn split early, it ICEs:

--- arm.md.orig	2016-11-27 09:22:41.794790123 +0100
+++ arm.md	2016-11-29 21:51:51.438163078 +0100
@@ -7432,7 +7432,7 @@
    (clobber (match_scratch:SI 2 "=r"))]
   "TARGET_32BIT"
   "#"   ; "cmp\\t%Q0, %Q1\;sbcs\\t%2, %R0, %R1"
-  "&& reload_completed"
+  "&& ((!TARGET_NEON && !TARGET_IWMMXT) || reload_completed)"
   [(set (reg:CC CC_REGNUM)
         (compare:CC (match_dup 0) (match_dup 1)))
    (parallel [(set (reg:CC CC_REGNUM)