From patchwork Thu Jul 30 13:42:11 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kyrylo Tkachov X-Patchwork-Id: 502160 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id DDF8F1402D6 for ; Thu, 30 Jul 2015 23:42:34 +1000 (AEST) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b=mrbgPU3O; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :message-id:date:from:mime-version:to:subject:references :in-reply-to:content-type; q=dns; s=default; b=cr2/brn1D7n6LSNyW jTZW58p2JnL0ngcaAvqDtc/IXoW6MX6IvGNL2NwMedaiGQFekM4D/0JChLJxBnia RD8NYHBnTIAjIINivEx6FbAEDc73bbY+LU72LU9VENWHHG9rXOaBeRzrX7zKNIWt 2ZeLq79fc8AjBrV8avHwrrOhuc= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :message-id:date:from:mime-version:to:subject:references :in-reply-to:content-type; s=default; bh=FnjY/nUxT+j8tKBwKUjm6ZW ac4M=; b=mrbgPU3OWWscctCDB4sDHB0Qsq5fC2N4W/kyTNw+ayaHY3gzaSwgXFU eevRjVMfdkRbtPOTZiPdOI16ek5WikeXrCRygx1LDJ7kslhnSqOsFxiO31hPAe9L g3Wk9XnjbTkSfC9rfa6z58gZvDLLHR9YJXJVDaFx7QL/KDxWeEWI= Received: (qmail 111152 invoked by alias); 30 Jul 2015 13:42:26 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 111141 invoked by uid 89); 30 Jul 2015 13:42:25 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.7 required=5.0 tests=AWL, BAYES_00, SPF_PASS autolearn=ham version=3.3.2 X-HELO: eu-smtp-delivery-143.mimecast.com Received: from eu-smtp-delivery-143.mimecast.com (HELO eu-smtp-delivery-143.mimecast.com) (207.82.80.143) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 30 Jul 2015 13:42:17 +0000 Received: from cam-owa2.Emea.Arm.com (fw-tnat.cambridge.arm.com [217.140.96.140]) by eu-smtp-1.mimecast.com with ESMTP id uk-mta-10-PAGlK_BjSi-5_zw_7O1PRw-1; Thu, 30 Jul 2015 14:42:12 +0100 Received: from [10.2.207.50] ([10.1.2.79]) by cam-owa2.Emea.Arm.com with Microsoft SMTPSVC(6.0.3790.3959); Thu, 30 Jul 2015 14:42:11 +0100 Message-ID: <55BA29B3.30102@arm.com> Date: Thu, 30 Jul 2015 14:42:11 +0100 From: Kyrill Tkachov User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: Jeff Law , GCC Patches Subject: Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates References: <55B8D9EB.3060806@arm.com> <55B955D9.9070103@redhat.com> In-Reply-To: <55B955D9.9070103@redhat.com> X-MC-Unique: PAGlK_BjSi-5_zw_7O1PRw-1 X-IsSubscribed: yes Hi Jeff, On 29/07/15 23:38, Jeff Law wrote: > On 07/29/2015 07:49 AM, Kyrill Tkachov wrote: >> Hi all, >> >> This patch improves RTL if-conversion on sequences that perform a >> conditional select on integer constants. >> Most of the smart logic to do that already exists in the >> noce_try_store_flag_constants function. >> However, currently that function is tried after noce_try_cmove. >> noce_try_cmove is a simple catch-all function that just loads the two >> immediates and performs a conditional >> select between them. It returns true and then the caller >> noce_process_if_block doesn't try any other transformations, >> completely skipping the more aggressive transformations that >> noce_try_store_flag_constants allows! >> >> Calling noce_try_store_flag_constants before noce_try_cmove allows for >> the smarter if-conversion transformations >> to be used. An example that occurs a lot in the gcc code itself is for >> the C code: >> int >> foo (int a, int b) >> { >> return ((a & (1 << 25)) ? 5 : 4); >> } >> >> i.e. test a bit in a and return 5 or 4. Currently on aarch64 this >> generates the naive: >> and w2, w0, 33554432 // mask away all bits except bit 25 >> mov w1, 4 >> cmp w2, wzr >> mov w0, 5 >> csel w0, w0, w1, ne >> >> >> whereas with this patch this can be transformed into the much better: >> ubfx x0, x0, 25, 1 // extract bit 25 >> add w0, w0, 4 > I suspect the PA would benefit from this as well, probably several > other architectures with reasonable bitfield extraction capabilities. This helps on arm as well but isn't enabled there at the moment because this function is gated on !have_conditional_execution (). We can look at enabling this for conditional execution as well as a separate piece of work. > >> Another issue I encountered is that the code that claims to perform the >> transformation: >> /* if (test) x = 3; else x = 4; >> => x = 3 + (test == 0); */ >> >> doesn't seem to do exactly that in all cases. In fact for that case it >> will try something like: >> x = 4 - (test == 0) >> which is suboptimal for targets like aarch64 which have a conditional >> increment operation. > I vaguely recall targets that don't have const - X insns, but do have X > + const style insns. And more generally I think we're better off > generating the PLUS version. Now that I think of it, aarch64 does have the CSETM instruction that conditionally sets a register to 0 or -1, but I didn't see it being utilised as frequently as I'd like to. Anyway, I do prefer generating the PLUS version when possible too, which is what this patch tries to do. > > >> This patch tweaks that code to always try to generate an addition of the >> condition rather than >> a subtraction. >> >> Anyway, for code: >> int >> fooinc (int x) >> { >> return x ? 1025 : 1026; >> } >> >> we currently generate: >> mov w2, 1025 >> mov w1, 1026 >> cmp w0, wzr >> csel w0, w2, w1, ne >> >> whereas with this patch we will generate: >> cmp w0, wzr >> cset w0, eq >> add w0, w0, 1025 >> >> Bootstrapped and tested on arm, aarch64, x86_64. >> Ok for trunk? >> >> Thanks, >> Kyrill >> >> P.S. noce_try_store_flag_constants is currently gated on >> !targetm.have_conditional_execution () but I don't see >> any reason to restrict it on targets with conditional execution. For >> example, I think the first example above >> would show a benefit on arm if it was enabled there. But that can be a >> separate investigation. >> >> 2015-07-29 Kyrylo Tkachov >> >> * ifcvt.c (noce_try_store_flag_constants): Reverse when diff is >> -STORE_FLAG and condition is reversable. Prefer to add to the >> flag value. >> (noce_process_if_block): Try noce_try_store_flag_constants before >> noce_try_cmove. >> >> 2015-07-29 Kyrylo Tkachov >> >> * gcc.target/aarch64/csel_bfx_1.c: New test. >> * gcc.target/aarch64/csel_imms_inc_1.c: Likewise. >> >> ifcvt-csel-imms.patch >> >> >> commit 0164ef164483bdf0b2f73e267e2ff1df7800dd6d >> Author: Kyrylo Tkachov >> Date: Tue Jul 28 14:59:46 2015 +0100 >> >> [RTL-ifcvt] Improve conditional increment ops on immediates >> >> diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c >> index a57d78c..80d0285 100644 >> --- a/gcc/ifcvt.c >> +++ b/gcc/ifcvt.c >> @@ -1222,7 +1222,7 @@ noce_try_store_flag_constants (struct noce_if_info *if_info) >> >> reversep = 0; >> if (diff == STORE_FLAG_VALUE || diff == -STORE_FLAG_VALUE) >> - normalize = 0; >> + normalize = 0, reversep = (diff == -STORE_FLAG_VALUE) && can_reverse; > We generally avoid using a ',' operator like this. However, I can see > that you're just following existing convention in that code. So I won't > object. Ok, I've respun the patch to convert this part to use a more conventional style and also took the liberty of converting reversep and can_reverse into bool variables. > > > >> else if (ifalse == 0 && exact_log2 (itrue) >= 0 >> && (STORE_FLAG_VALUE == 1 >> || if_info->branch_cost >= 2)) >> @@ -1261,10 +1261,13 @@ noce_try_store_flag_constants (struct noce_if_info *if_info) >> => x = 3 + (test == 0); */ >> if (diff == STORE_FLAG_VALUE || diff == -STORE_FLAG_VALUE) >> { >> - target = expand_simple_binop (mode, >> - (diff == STORE_FLAG_VALUE >> - ? PLUS : MINUS), >> - gen_int_mode (ifalse, mode), target, >> + rtx_code code = reversep ? PLUS : >> + (diff == STORE_FLAG_VALUE ? PLUS >> + : MINUS); >> + HOST_WIDE_INT to_add = reversep ? MIN (ifalse, itrue) : ifalse; >> + >> + target = expand_simple_binop (mode, code, >> + gen_int_mode (to_add, mode), target, >> if_info->x, 0, OPTAB_WIDEN); >> } > Note that STORE_FLAG_VALUE can potentially be any value. Most targets > use "1", but others use -1 (m68k for example, there are others). I'm > concerned that the reversep ? MIN (ifalse, itrue) won't do what we want > on targets such as the m68k. > > The logic here is also somewhat convoluted. When reversep is true, we > always use PLUS, but is that actually correct? Normally we'd compute > the code, then invert the code. ie, if we normally want PLUS, then > we've flip to MINUS and vice-versa. I agree that it's hard to follow. In this respin I have explicitly laid out the different combinations of diff and STORE_FLAG_VALUE. It is more verbose now, but hopefully it's easier to follow. > > >> @@ -3120,13 +3123,14 @@ noce_process_if_block (struct noce_if_info *if_info) >> goto success; >> if (noce_try_abs (if_info)) >> goto success; >> + if (!targetm.have_conditional_execution () >> + && noce_try_store_flag_constants (if_info)) >> + goto success; >> if (HAVE_conditional_move >> && noce_try_cmove (if_info)) >> goto success; >> if (! targetm.have_conditional_execution ()) >> { >> - if (noce_try_store_flag_constants (if_info)) >> - goto success; >> if (noce_try_addcc (if_info)) >> goto success; >> if (noce_try_store_flag_mask (if_info)) > This part seems fine and ought to be able to go forward independently. > Consider this hunk and its associated testcase approved if you want to > test it independently and get it installed while we iterate on the other > hunks, This respin includes that hunk, however if this patch goes through any more respins then I'll separate it out into a separate patch. This version has been bootstrapped and tested on x86_64 and aarch64. Thanks for the feedback! Kyrill 2015-07-30 Kyrylo Tkachov * ifcvt.c (noce_try_store_flag_constants): Make logic of the case when diff == STORE_FLAG_VALUE or diff == -STORE_FLAG_VALUE more explicit. Prefer to add the flag whenever possible. (noce_process_if_block): Try noce_try_store_flag_constants before noce_try_cmove. 2015-07-30 Kyrylo Tkachov * gcc.target/aarch64/csel_bfx_1.c: New test. * gcc.target/aarch64/csel_imms_inc_1.c: Likewise. > > jeff > > commit bd720b6a5a298449ad850517e0fc29e825c702d9 Author: Kyrylo Tkachov Date: Tue Jul 28 14:59:46 2015 +0100 [RTL-ifcvt] Improve conditional increment ops on immediates diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c index a57d78c..909c674 100644 --- a/gcc/ifcvt.c +++ b/gcc/ifcvt.c @@ -1194,9 +1194,10 @@ noce_try_store_flag_constants (struct noce_if_info *if_info) { rtx target; rtx_insn *seq; - int reversep; + bool reversep; HOST_WIDE_INT itrue, ifalse, diff, tmp; - int normalize, can_reverse; + int normalize; + bool can_reverse; machine_mode mode; if (!noce_simple_bbs (if_info)) @@ -1208,6 +1209,7 @@ noce_try_store_flag_constants (struct noce_if_info *if_info) mode = GET_MODE (if_info->x); ifalse = INTVAL (if_info->a); itrue = INTVAL (if_info->b); + bool subtract_flag_p = false; diff = (unsigned HOST_WIDE_INT) itrue - ifalse; /* Make sure we can represent the difference between the two values. */ @@ -1220,23 +1222,61 @@ noce_try_store_flag_constants (struct noce_if_info *if_info) can_reverse = (reversed_comparison_code (if_info->cond, if_info->jump) != UNKNOWN); - reversep = 0; + reversep = false; if (diff == STORE_FLAG_VALUE || diff == -STORE_FLAG_VALUE) - normalize = 0; + { + normalize = 0; + /* We could collapse these cases but it is easier to follow the + diff/STORE_FLAG_VALUE combinations when they are listed + explicitly. */ + + /* test ? 3 : 4 + => 4 + (test != 0). */ + if (diff < 0 && STORE_FLAG_VALUE < 0) + reversep = false; + /* test ? 4 : 3 + => can_reverse | 4 + (test == 0) + !can_reverse | 3 - (test != 0). */ + else if (diff > 0 && STORE_FLAG_VALUE < 0) + { + reversep = can_reverse; + subtract_flag_p = !can_reverse; + } + /* test ? 3 : 4 + => can_reverse | 3 + (test == 0) + !can_reverse | 4 - (test != 0). */ + else if (diff < 0 && STORE_FLAG_VALUE > 0) + { + reversep = can_reverse; + subtract_flag_p = !can_reverse; + } + /* test ? 4 : 3 + => 4 + (test != 0). */ + else if (diff > 0 && STORE_FLAG_VALUE > 0) + reversep = false; + else + gcc_unreachable (); + } else if (ifalse == 0 && exact_log2 (itrue) >= 0 && (STORE_FLAG_VALUE == 1 || if_info->branch_cost >= 2)) normalize = 1; else if (itrue == 0 && exact_log2 (ifalse) >= 0 && can_reverse && (STORE_FLAG_VALUE == 1 || if_info->branch_cost >= 2)) - normalize = 1, reversep = 1; + { + normalize = 1; + reversep = true; + } else if (itrue == -1 && (STORE_FLAG_VALUE == -1 || if_info->branch_cost >= 2)) normalize = -1; else if (ifalse == -1 && can_reverse && (STORE_FLAG_VALUE == -1 || if_info->branch_cost >= 2)) - normalize = -1, reversep = 1; + { + normalize = -1; + reversep = true; + } else if ((if_info->branch_cost >= 2 && STORE_FLAG_VALUE == -1) || if_info->branch_cost >= 3) normalize = -1; @@ -1261,9 +1301,9 @@ noce_try_store_flag_constants (struct noce_if_info *if_info) => x = 3 + (test == 0); */ if (diff == STORE_FLAG_VALUE || diff == -STORE_FLAG_VALUE) { - target = expand_simple_binop (mode, - (diff == STORE_FLAG_VALUE - ? PLUS : MINUS), + /* Always use ifalse here. It should have been swapped with itrue + when appropriate when reversep is true. */ + target = expand_simple_binop (mode, subtract_flag_p ? MINUS : PLUS, gen_int_mode (ifalse, mode), target, if_info->x, 0, OPTAB_WIDEN); } @@ -3120,13 +3160,14 @@ noce_process_if_block (struct noce_if_info *if_info) goto success; if (noce_try_abs (if_info)) goto success; + if (!targetm.have_conditional_execution () + && noce_try_store_flag_constants (if_info)) + goto success; if (HAVE_conditional_move && noce_try_cmove (if_info)) goto success; if (! targetm.have_conditional_execution ()) { - if (noce_try_store_flag_constants (if_info)) - goto success; if (noce_try_addcc (if_info)) goto success; if (noce_try_store_flag_mask (if_info)) diff --git a/gcc/testsuite/gcc.target/aarch64/csel_bfx_1.c b/gcc/testsuite/gcc.target/aarch64/csel_bfx_1.c new file mode 100644 index 0000000..c20597f --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/csel_bfx_1.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-save-temps -O2" } */ + +int +foo (int a, int b) +{ + return ((a & (1 << 25)) ? 5 : 4); +} + +/* { dg-final { scan-assembler "ubfx\t\[xw\]\[0-9\]*.*" } } */ +/* { dg-final { scan-assembler-not "csel\tw\[0-9\]*.*" } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/csel_imms_inc_1.c b/gcc/testsuite/gcc.target/aarch64/csel_imms_inc_1.c new file mode 100644 index 0000000..2ae434d --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/csel_imms_inc_1.c @@ -0,0 +1,42 @@ +/* { dg-do run } */ +/* { dg-options "-save-temps -O2 -fno-inline" } */ + +extern void abort (void); + +int +fooinc (int x) +{ + if (x) + return 1025; + else + return 1026; +} + +int +fooinc2 (int x) +{ + if (x) + return 1026; + else + return 1025; +} + +int +main (void) +{ + if (fooinc (0) != 1026) + abort (); + + if (fooinc (1) != 1025) + abort (); + + if (fooinc2 (0) != 1025) + abort (); + + if (fooinc2 (1) != 1026) + abort (); + + return 0; +} + +/* { dg-final { scan-assembler-not "csel\tw\[0-9\]*.*" } } */