From patchwork Thu Sep 3 11:11:59 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Andre Vieira (lists)" X-Patchwork-Id: 513989 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id B01A21401DA for ; Thu, 3 Sep 2015 21:12:14 +1000 (AEST) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b=gqpjK0+q; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :message-id:date:from:mime-version:to:cc:subject:references :in-reply-to:content-type; q=dns; s=default; b=kFRwd+QzjGfO9JGmi OvfVETSyyyDR144jn9+NGouXhhzTiX+l5gxKhufouQWBCwnRVgpcd8x3sf9D51F+ E0SqifpaMGW60YbGbxl2qSuEcztkmJJyxytVYz2jWtrlD6cMB6xkL+7roOrsi21n pxaIIQUAehBBmJpyhrVyeLJETU= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :message-id:date:from:mime-version:to:cc:subject:references :in-reply-to:content-type; s=default; bh=shGST6g0FIgwjTFCi0AKYsU 79zI=; b=gqpjK0+qI+ZX2nFEf2K5gT/dMPfiE7ujmDTo91gPvvWjwQDMBc71aq3 jf/puGv/9JDSP8hpxHsGRrSsfng1LcsVReR3WfY0zCyqA0gCssGjz8SiKt158f6G CAWyVx7XtPfOXUHeBjEP3k47T9vRMC2gv5h73kB/EIBTIqU1QaIU= Received: (qmail 84904 invoked by alias); 3 Sep 2015 11:12:07 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 84884 invoked by uid 89); 3 Sep 2015 11:12:06 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-0.0 required=5.0 tests=AWL, BAYES_50, KAM_LOTSOFHASH, SPF_PASS autolearn=no version=3.3.2 X-HELO: eu-smtp-delivery-143.mimecast.com Received: from eu-smtp-delivery-143.mimecast.com (HELO eu-smtp-delivery-143.mimecast.com) (146.101.78.143) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 03 Sep 2015 11:12:04 +0000 Received: from cam-owa1.Emea.Arm.com (fw-tnat.cambridge.arm.com [217.140.96.140]) by eu-smtp-1.mimecast.com with ESMTP id uk-mta-35-dC2-rdo2Tn-jNWtEvTT52A-1; Thu, 03 Sep 2015 12:12:00 +0100 Received: from [10.2.207.8] ([10.1.2.79]) by cam-owa1.Emea.Arm.com with Microsoft SMTPSVC(6.0.3790.3959); Thu, 3 Sep 2015 12:11:59 +0100 Message-ID: <55E82AFF.4020401@arm.com> Date: Thu, 03 Sep 2015 12:11:59 +0100 From: Andre Vieira User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: Richard Biener CC: GCC Patches Subject: Re: [PATCH v2][GCC] Algorithmic optimization in match and simplify References: <55E092D9.8070700@arm.com> <55E5AACB.3050205@arm.com> In-Reply-To: X-MC-Unique: dC2-rdo2Tn-jNWtEvTT52A-1 X-IsSubscribed: yes On 01/09/15 15:01, Richard Biener wrote: > On Tue, Sep 1, 2015 at 3:40 PM, Andre Vieira > wrote: >> Hi Marc, >> >> On 28/08/15 19:07, Marc Glisse wrote: >>> >>> (not a review, I haven't even read the whole patch) >>> >>> On Fri, 28 Aug 2015, Andre Vieira wrote: >>> >>>> 2015-08-03 Andre Vieira >>>> >>>> * match.pd: Added new patterns: >>>> ((X {&,<<,>>} C0) {|,^} C1) {^,|} C2) >>>> (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1) >>> >>> >>> +(for op0 (rshift rshift lshift lshift bit_and bit_and) >>> + op1 (bit_ior bit_xor bit_ior bit_xor bit_ior bit_xor) >>> + op2 (bit_xor bit_ior bit_xor bit_ior bit_xor bit_ior) >>> >>> You can nest for-loops, it seems clearer as: >>> (for op0 (rshift lshift bit_and) >>> (for op1 (bit_ior bit_xor) >>> op2 (bit_xor bit_ior) >> >> >> Will do, thank you for pointing it out. >>> >>> >>> +(simplify >>> + (op2:c >>> + (op1:c >>> + (op0 @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3) >>> >>> I suspect you will want more :s (single_use) and less :c (canonicalization >>> should put constants in second position). >>> >> I can't find the definition of :s (single_use). > > Sorry for that - didn't get along updating it yet :/ It restricts matching to > sub-expressions that have a single-use. So > > + a &= 0xd123; > + unsigned short tem = a ^ 0x6040; > + a = tem | 0xc031; /* Simplify _not_ to ((a & 0xd123) | 0xe071). */ > ... use of tem ... > > we shouldn't do the simplifcation here because the expression > (a & 0x123) ^ 0x6040 is kept live by 'tem'. > >> GCC internals do point out >> that canonicalization does put constants in the second position, didnt see >> that first. Thank you for pointing it out. >> >>> + C1 = wi::bit_and_not (C1,C2); >>> >>> Space after ','. >>> >> Will do. >> >>> Having wide_int_storage in many places is surprising, I can't find similar >>> code anywhere else in gcc. >>> >>> >> >> I tried looking for examples of something similar, I think I ended up using >> wide_int because I was able to convert easily to and from it and it has the >> "mask" and "wide_int_to_tree" functions. I welcome any suggestions on what I >> should be using here for integer constant transformations and comparisons. > > Using wide-ints is fine, but you shouldn't need 'wide_int_storage' > constructors - those > are indeed odd. Is it just for the initializers of wide-ints? > > + wide_int zero_mask = wi::zero (prec); > + wide_int C0 = wide_int_storage (@1); > + wide_int C1 = wide_int_storage (@2); > + wide_int C2 = wide_int_storage (@3); > ... > + zero_mask = wide_int_storage (wi::mask (C0.to_uhwi (), false, prec)); > > tree_to_uhwi (@1) should do the trick as well > > + C1 = wi::bit_and_not (C1,C2); > + cst_emit = wi::bit_or (C1, C2); > > the ops should be replacable with @2 and @3, the store to C1 obviously not > (but you can use a tree temporary and use wide_int_to_tree here to avoid > the back-and-forth for the case where C1 is not assigned to). > > Note that transforms only doing association are prone to endless recursion > in case some other pattern does the reverse op... > > Richard. > > >> BR, >> Andre >> > Thank you for all the comments, see reworked version: Two new algorithmic optimisations: 1.((X op0 C0) op1 C1) op2 C2) with op0 = {&, >>, <<}, op1 = {|,^}, op2 = {|,^} and op1 != op2 zero_mask has 1's for all bits that are sure to be 0 in (X op0 C0) and 0's otherwise. if (op1 == '^') C1 &= ~C2 (Only changed if actually emitted) if ((C1 & ~zero_mask) == 0) then emit (X op0 C0) op2 (C1 op2 C2) if ((C2 & ~zero_mask) == 0) then emit (X op0 C0) op1 (C1 op2 C2) 2. (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1) This patch does two algorithmic optimisations that target patterns like: (((x << 24) | 0x00FFFFFF) ^ 0xFF000000) and ((x ^ 0x40000002) >> 1) | 0x80000000. The transformation uses the knowledge of which bits are zero after operations like (X {&,<<,(unsigned)>>}) to combine constants, reducing run-time operations. The two examples above would be transformed into (X << 24) ^ 0xFFFFFFFF and (X >> 1) ^ 0xa0000001 respectively. The second transformation enables more applications of the first. Also some targets may benefit from delaying shift operations. I am aware that such an optimization, in combination with one or more optimizations that cause the reverse transformation, may lead to an infinite loop. Though such behavior has not been detected during regression testing and bootstrapping on aarch64. gcc/ChangeLog: 2015-08-03 Andre Vieira * match.pd: Added new patterns: ((X {&,<<,>>} C0) {|,^} C1) {^,|} C2) (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1) gcc/testsuite/ChangeLog: 2015-08-03 Andre Vieira Hale Wang * gcc.dg/tree-ssa/forwprop-33.c: New test. From 3d1d4d838fed9af45aea9fa99f8954585fee7c23 Mon Sep 17 00:00:00 2001 From: Andre Simoes Dias Vieira Date: Wed, 2 Sep 2015 16:47:38 +0100 Subject: [PATCH] algorithmic optimization v2 --- gcc/match.pd | 70 +++++++++++++++++++++++++++++ gcc/testsuite/gcc.dg/tree-ssa/forwprop-33.c | 42 +++++++++++++++++ 2 files changed, 112 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/forwprop-33.c diff --git a/gcc/match.pd b/gcc/match.pd index fb4b342d31d26a03bc756c538f6635f2acf6ddb2..6138591c0cef1814dcbd6313dedaa95a91700dc2 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -710,6 +710,76 @@ along with GCC; see the file COPYING3. If not see && tree_nop_conversion_p (type, TREE_TYPE (@1))) (convert (bit_and (bit_not @1) @0)))) +/* (X bit_op C0) rshift C1 -> (X rshift C0) bit_op (C0 rshift C1) */ +(for bit_op (bit_ior bit_xor bit_and) +(simplify + (rshift (bit_op:s @0 INTEGER_CST@1) INTEGER_CST@2) + (bit_op + (rshift @0 @2) + { wide_int_to_tree (type, wi::rshift (@1, @2, TYPE_SIGN (type))); }))) + +/* (X bit_op C0) lshift C1 -> (X lshift C0) bit_op (C0 lshift C1) */ +(for bit_op (bit_ior bit_xor bit_and) +(simplify + (lshift (bit_op:s @0 INTEGER_CST@1) INTEGER_CST@2) + (bit_op + (lshift @0 @2) + { wide_int_to_tree (type, wi::lshift (@1, @2)); }))) + + +/* ((X op0 C0) op1 C1) op2 C2) + with op0 = {&, >>, <<}, op1 = {|,^}, op2 = {|,^} and op1 != op2 + zero_mask has 1's for all bits that are sure to be 0 in (X op0 C0) + and 0's otherwise. + if (op1 == '^') C1 &= ~C2; + if ((C1 & ~zero_mask) == 0) then emit (X op0 C0) op2 (C1 op2 C2) + if ((C2 & ~zero_mask) == 0) then emit (X op0 C0) op1 (C1 op2 C2) +*/ +(for op0 (rshift lshift bit_and) + (for op1 (bit_ior bit_xor) + op2 (bit_xor bit_ior) +(simplify + (op2 + (op1:s + (op0:s @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3) + (if (!(op0 == RSHIFT_EXPR && !TYPE_UNSIGNED (type)) && wi::fits_uhwi_p (@1)) + (with + { + unsigned int prec = TYPE_PRECISION (type); + wide_int zero_mask_not; + wide_int C1; + wide_int cst_emit; + if (op0 == BIT_AND_EXPR) + { + zero_mask_not = @1; + } + else if (op0 == LSHIFT_EXPR) + { + zero_mask_not = wi::bit_not (wi::mask (tree_to_uhwi (@1), false, + prec)); + } + else if (op0 == RSHIFT_EXPR) + { + unsigned HOST_WIDE_INT m = prec - tree_to_uhwi (@1); + zero_mask_not = wi::bit_not (wi::mask (m, true, prec)); + } + + if (op1 == BIT_XOR_EXPR) + { + C1 = wi::bit_and_not (@2, @3); + cst_emit = wi::bit_or (C1, @3); + } + else + { + C1 = @2; + cst_emit = wi::bit_xor (@2, @3); + } + } + (if (wi::eq_p (wi::bit_and (C1, zero_mask_not), wi::zero (prec))) + (op2 (op0 @0 @1) { wide_int_to_tree (type, cst_emit); }) + (if (wi::eq_p (wi::bit_and (@3, zero_mask_not), wi::zero (prec))) + (op1 (op0 @0 @1) { wide_int_to_tree (type, cst_emit); })))))))) + /* Associate (p +p off1) +p off2 as (p +p (off1 + off2)). */ (simplify (pointer_plus (pointer_plus:s @0 @1) @3) diff --git a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-33.c b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-33.c new file mode 100644 index 0000000000000000000000000000000000000000..c8940d62a7a9370e9d2b911badfc6d085f988304 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-33.c @@ -0,0 +1,42 @@ +/* { dg-do compile } */ +/* { dg-options "-O -fdump-tree-forwprop1" } */ + +unsigned short +foo (unsigned short a) +{ + a ^= 0x4002; + a >>= 1; + a |= 0x8000; /* Simplify to ((a >> 1) ^ 0xa001). */ + return a; +} +/* { dg-final { scan-tree-dump "\\^ 40961" "forwprop1" } } */ + +unsigned short +bar (unsigned short a) +{ + a |= 0x4002; + a <<= 1; + a ^= 0x0001; /* Simplify to ((a << 1) | 0x8005). */ + return a; +} +/* { dg-final { scan-tree-dump "\\| 32773" "forwprop1" } } */ + +unsigned short +baz (unsigned short a) +{ + a &= 0xd123; + a ^= 0x6040; + a |= 0xc031; /* Simplify to ((a & 0xd123) | 0xe071). */ + return a; +} +/* { dg-final { scan-tree-dump "\\| 57457" "forwprop1" } } */ + +short +qux (short a) +{ + a ^= 0x8002; + a >>= 1; + a |= 0x8000; /* Only move shift inward: (((a >> 1) ^ 0x4001 |) 0x8000). */ + return a; +} +/* { dg-final { scan-tree-dump "\\^ -16383" "forwprop1" } } */ -- 1.9.1