Message ID | 55CB71E4.6040507@arm.com |
---|---|
State | New |
Headers | show |
On 08/12/2015 10:18 AM, Kyrill Tkachov wrote: > Hi all, > > This patch is a sequel to: > https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02567.html > > It allows if-conversion of expressions of the form: > if (test) x := y + c1; else x := y + c2 > where c1 and c2 are a pair constants that can be optimised using the > existing rules in noce_try_store_flag_constants. > > The resulting sequence looks something like: > x := (test [!=,==] 0) + y; > x := x + [c1,c2]; > > This patch reuses the logic for the combinations of diff and STORE_FLAG > signs that > is hammered out in > https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02567.html. > > That means we just extract the constants c1 and c2 from the PLUS rtx > and use their diff as the driver for the logic in that function. > > Currently, this patch allows only the case where > diff == STORE_FLAG_VALUE || diff == -STORE_FLAG_VALUE > as it is the most beneficial case (from looking at SPEC2006 on aarch64) > and also the least tricky to get right. > In the future I suppose that restriction can be relaxed if needed, but > in any case > this should be an improvement over the existing situation. > > On aarch64 this allows us for code: > int > barsi (int x) > { > return x > 100 ? x + 4 : x + 3; > } > > to generate: > barsi: > cmp w0, 101 > cinc w0, w0, ge > add w0, w0, 3 > ret > > instead of the current: > barsi: > add w1, w0, 4 > add w2, w0, 3 > cmp w0, 101 > csel w0, w2, w1, lt > ret > > thus saving one instruction and using two fewer registers. > > Bootstrapped and tested on arm, aarch64, x86_64. > > Ok for trunk? > > Thanks, > Kyrill > > 2015-08-12 Kyrylo Tkachov <kyrylo.tkachov@arm.com> > > * ifcvt.c (noce_try_store_flag_constants): Handle PLUS-immediate > expressions in A and B. > > 2015-08-12 Kyrylo Tkachov <kyrylo.tkachov@arm.com> > > * gcc.target/aarch64/cinc_common_1.c: New test. The function comment for noce_try_store_flag_constants needs to be updated. OK with that fix. Presumably there's some obvious extensions here for operators other than PLUS, the question in my mind is whether or not they matter in practice. Curious to hear your thoughts about handling other operators and whether or not you think it's worth it as a follow-up. Jeff
On 12/08/15 18:32, Jeff Law wrote: > On 08/12/2015 10:18 AM, Kyrill Tkachov wrote: >> Hi all, >> >> This patch is a sequel to: >> https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02567.html >> >> It allows if-conversion of expressions of the form: >> if (test) x := y + c1; else x := y + c2 >> where c1 and c2 are a pair constants that can be optimised using the >> existing rules in noce_try_store_flag_constants. >> >> The resulting sequence looks something like: >> x := (test [!=,==] 0) + y; >> x := x + [c1,c2]; >> >> This patch reuses the logic for the combinations of diff and STORE_FLAG >> signs that >> is hammered out in >> https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02567.html. >> >> That means we just extract the constants c1 and c2 from the PLUS rtx >> and use their diff as the driver for the logic in that function. >> >> Currently, this patch allows only the case where >> diff == STORE_FLAG_VALUE || diff == -STORE_FLAG_VALUE >> as it is the most beneficial case (from looking at SPEC2006 on aarch64) >> and also the least tricky to get right. >> In the future I suppose that restriction can be relaxed if needed, but >> in any case >> this should be an improvement over the existing situation. >> >> On aarch64 this allows us for code: >> int >> barsi (int x) >> { >> return x > 100 ? x + 4 : x + 3; >> } >> >> to generate: >> barsi: >> cmp w0, 101 >> cinc w0, w0, ge >> add w0, w0, 3 >> ret >> >> instead of the current: >> barsi: >> add w1, w0, 4 >> add w2, w0, 3 >> cmp w0, 101 >> csel w0, w2, w1, lt >> ret >> >> thus saving one instruction and using two fewer registers. >> >> Bootstrapped and tested on arm, aarch64, x86_64. >> >> Ok for trunk? >> >> Thanks, >> Kyrill >> >> 2015-08-12 Kyrylo Tkachov <kyrylo.tkachov@arm.com> >> >> * ifcvt.c (noce_try_store_flag_constants): Handle PLUS-immediate >> expressions in A and B. >> >> 2015-08-12 Kyrylo Tkachov <kyrylo.tkachov@arm.com> >> >> * gcc.target/aarch64/cinc_common_1.c: New test. > The function comment for noce_try_store_flag_constants needs to be updated. > > OK with that fix. Thanks, I'll update the comment and commit. > > Presumably there's some obvious extensions here for operators other than > PLUS, the question in my mind is whether or not they matter in practice. > Curious to hear your thoughts about handling other operators and > whether or not you think it's worth it as a follow-up. I did consider it. I am most interested in the case that I enabled in this patch because this can enable targets that have conditional increment instructions, like aarch64. In that case, if-conversion opens the way for combine to utilise that naturally. The other two transformations in that function are: /* if (test) x = 8; else x = 0; => x = (test != 0) << 3; */ and /* if (test) x = -1; else x = b; => x = -(test != 0) | b; */ So they either shift the stored flag, or do other bitwise ops on it. By adding an extra addition (for the common part) I don't think there's much to gain against the alternative, which will be to do two parallel additions and a conditional select. Thanks, Kyrill > > Jeff >
commit ce0de8d30962bf00339dc3a030401a4be2a7a248 Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com> Date: Thu Jul 30 15:14:27 2015 +0100 [ifcvt][unfinished] Allow PLUS+immediate expression in noce_try_store_flag_constants diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c index f815166..7a44e29 100644 --- a/gcc/ifcvt.c +++ b/gcc/ifcvt.c @@ -1198,17 +1198,35 @@ noce_try_store_flag_constants (struct noce_if_info *if_info) HOST_WIDE_INT itrue, ifalse, diff, tmp; int normalize; bool can_reverse; - machine_mode mode; + machine_mode mode = GET_MODE (if_info->x);; + rtx common = NULL_RTX; if (!noce_simple_bbs (if_info)) return FALSE; - if (CONST_INT_P (if_info->a) - && CONST_INT_P (if_info->b)) + rtx a = if_info->a; + rtx b = if_info->b; + + /* Handle cases like x := test ? y + 3 : y + 4. */ + if (GET_CODE (a) == PLUS + && GET_CODE (b) == PLUS + && CONST_INT_P (XEXP (a, 1)) + && CONST_INT_P (XEXP (b, 1)) + && rtx_equal_p (XEXP (a, 0), XEXP (b, 0)) + && noce_operand_ok (XEXP (a, 0)) + && if_info->branch_cost >= 2) { - mode = GET_MODE (if_info->x); - ifalse = INTVAL (if_info->a); - itrue = INTVAL (if_info->b); + common = XEXP (a, 0); + + a = XEXP (a, 1); + b = XEXP (b, 1); + } + + if (CONST_INT_P (a) + && CONST_INT_P (b)) + { + ifalse = INTVAL (a); + itrue = INTVAL (b); bool subtract_flag_p = false; diff = (unsigned HOST_WIDE_INT) itrue - ifalse; @@ -1241,6 +1259,11 @@ noce_try_store_flag_constants (struct noce_if_info *if_info) { reversep = can_reverse; subtract_flag_p = !can_reverse; + /* If we need to subtract the flag and we have PLUS-immediate + A and B then it is unlikely to be beneficial to play tricks + here. */ + if (subtract_flag_p && common) + return FALSE; } /* test ? 3 : 4 => can_reverse | 3 + (test == 0) @@ -1249,6 +1272,11 @@ noce_try_store_flag_constants (struct noce_if_info *if_info) { reversep = can_reverse; subtract_flag_p = !can_reverse; + /* If we need to subtract the flag and we have PLUS-immediate + A and B then it is unlikely to be beneficial to play tricks + here. */ + if (subtract_flag_p && common) + return FALSE; } /* test ? 4 : 3 => 4 + (test != 0). */ @@ -1290,6 +1318,15 @@ noce_try_store_flag_constants (struct noce_if_info *if_info) } start_sequence (); + + /* If we have x := test ? x + 3 : x + 4 then move the original + x out of the way while we store flags. */ + if (common && rtx_equal_p (common, if_info->x)) + { + common = gen_reg_rtx (mode); + noce_emit_move_insn (common, if_info->x); + } + target = noce_emit_store_flag (if_info, if_info->x, reversep, normalize); if (! target) { @@ -1301,13 +1338,27 @@ noce_try_store_flag_constants (struct noce_if_info *if_info) => x = 3 + (test == 0); */ if (diff == STORE_FLAG_VALUE || diff == -STORE_FLAG_VALUE) { + /* Add the common part now. This may allow combine to merge this + with the store flag operation earlier into some sort of conditional + increment/decrement if the target allows it. */ + if (common) + target = expand_simple_binop (mode, PLUS, + target, common, + target, 0, OPTAB_WIDEN); + /* Always use ifalse here. It should have been swapped with itrue when appropriate when reversep is true. */ target = expand_simple_binop (mode, subtract_flag_p ? MINUS : PLUS, gen_int_mode (ifalse, mode), target, if_info->x, 0, OPTAB_WIDEN); } - + /* Other cases are not beneficial when the original A and B are PLUS + expressions. */ + else if (common) + { + end_sequence (); + return FALSE; + } /* if (test) x = 8; else x = 0; => x = (test != 0) << 3; */ else if (ifalse == 0 && (tmp = exact_log2 (itrue)) >= 0) diff --git a/gcc/testsuite/gcc.target/aarch64/cinc_common_1.c b/gcc/testsuite/gcc.target/aarch64/cinc_common_1.c new file mode 100644 index 0000000..d041263 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/cinc_common_1.c @@ -0,0 +1,64 @@ +/* { dg-do run } */ +/* { dg-options "-save-temps -O2 -fno-inline" } */ + +extern void abort (void); + +int +foosi (int x) +{ + return x > 100 ? x - 2 : x - 1; +} + +int +barsi (int x) +{ + return x > 100 ? x + 4 : x + 3; +} + +long +foodi (long x) +{ + return x > 100 ? x - 2 : x - 1; +} + +long +bardi (long x) +{ + return x > 100 ? x + 4 : x + 3; +} + +/* { dg-final { scan-assembler-times "cs?inc\tw\[0-9\]*" 2 } } */ +/* { dg-final { scan-assembler-times "cs?inc\tx\[0-9\]*" 2 } } */ + +int +main (void) +{ + if (foosi (105) != 103) + abort (); + + if (foosi (95) != 94) + abort (); + + if (barsi (105) != 109) + abort (); + + if (barsi (95) != 98) + abort (); + + if (foodi (105) != 103) + abort (); + + if (foodi (95) != 94) + abort (); + + if (bardi (105) != 109) + abort (); + + if (bardi (95) != 98) + abort (); + + return 0; +} + +/* { dg-final { scan-assembler-not "csel\tx\[0-9\]*.*" } } */ +/* { dg-final { scan-assembler-not "csel\tw\[0-9\]*.*" } } */