Message ID: 56376FFF.3070008@arm.com
State: New
On 11/02/2015 07:15 AM, Kyrill Tkachov wrote:
> Hi all,
>
> This patch attempts to restrict combine from transforming ZERO_EXTEND
> and SIGN_EXTEND operations into and-bitmask
> and weird SUBREG expressions when they appear inside MULT expressions.
> This is because a MULT rtx containing these
> extend operations is usually a well understood widening multiply operation.
> However, if the costs for simple zero or sign-extend moves happen to
> line up in a particular way,
> expand_compound_operation will end up mangling a perfectly innocent
> extend+mult+add rtx like:
> (set (reg/f:DI 393)
>      (plus:DI (mult:DI (sign_extend:DI (reg/v:SI 425 [ selected ]))
>                        (sign_extend:DI (reg:SI 606)))
>               (reg:DI 600)))
>
> into:
> (set (reg/f:DI 393)
>      (plus:DI (mult:DI (and:DI (subreg:DI (reg:SI 606) 0)
>                                (const_int 202 [0xca]))
>                        (sign_extend:DI (reg/v:SI 425 [ selected ])))
>               (reg:DI 600)))

Going to leave the review side of this for Segher.

If you decide to go forward, there's a section in md.texi WRT
canonicalization of these RTL codes that probably would need updating.
Just search for "canonicalization" section and read down a ways.

Jeff
Hi Jeff,

On 02/11/15 22:31, Jeff Law wrote:
> On 11/02/2015 07:15 AM, Kyrill Tkachov wrote:
>> Hi all,
>>
>> This patch attempts to restrict combine from transforming ZERO_EXTEND
>> and SIGN_EXTEND operations into and-bitmask
>> and weird SUBREG expressions when they appear inside MULT expressions.
>> This is because a MULT rtx containing these
>> extend operations is usually a well understood widening multiply operation.
>> However, if the costs for simple zero or sign-extend moves happen to
>> line up in a particular way,
>> expand_compound_operation will end up mangling a perfectly innocent
>> extend+mult+add rtx like:
>> (set (reg/f:DI 393)
>>      (plus:DI (mult:DI (sign_extend:DI (reg/v:SI 425 [ selected ]))
>>                        (sign_extend:DI (reg:SI 606)))
>>               (reg:DI 600)))
>>
>> into:
>> (set (reg/f:DI 393)
>>      (plus:DI (mult:DI (and:DI (subreg:DI (reg:SI 606) 0)
>>                                (const_int 202 [0xca]))
>>                        (sign_extend:DI (reg/v:SI 425 [ selected ])))
>>               (reg:DI 600)))
> Going to leave the review side of this for Segher.
>
> If you decide to go forward, there's a section in md.texi WRT
> canonicalization of these RTL codes that probably would need updating.
> Just search for "canonicalization" section and read down a ways.
>

You mean document a canonical form for these operations as
(mult (extend op1) (extend op2)) ?

> Jeff
>
>
On 04/11/15 11:37, Kyrill Tkachov wrote:
>
> On 02/11/15 22:31, Jeff Law wrote:
>> On 11/02/2015 07:15 AM, Kyrill Tkachov wrote:
>>> Hi all,
>>>
>>> This patch attempts to restrict combine from transforming ZERO_EXTEND
>>> and SIGN_EXTEND operations into and-bitmask
>>> and weird SUBREG expressions when they appear inside MULT expressions.
>>> This is because a MULT rtx containing these
>>> extend operations is usually a well understood widening multiply operation.
>>> However, if the costs for simple zero or sign-extend moves happen to
>>> line up in a particular way,
>>> expand_compound_operation will end up mangling a perfectly innocent
>>> extend+mult+add rtx like:
>>> (set (reg/f:DI 393)
>>>      (plus:DI (mult:DI (sign_extend:DI (reg/v:SI 425 [ selected ]))
>>>                        (sign_extend:DI (reg:SI 606)))
>>>               (reg:DI 600)))
>>>
>>> into:
>>> (set (reg/f:DI 393)
>>>      (plus:DI (mult:DI (and:DI (subreg:DI (reg:SI 606) 0)
>>>                                (const_int 202 [0xca]))
>>>                        (sign_extend:DI (reg/v:SI 425 [ selected ])))
>>>               (reg:DI 600)))
>> Going to leave the review side of this for Segher.
>>
>> If you decide to go forward, there's a section in md.texi WRT
>> canonicalization of these RTL codes that probably would need updating.
>> Just search for "canonicalization" section and read down a ways.
>>
>
> You mean document a canonical form for these operations as
> (mult (extend op1) (extend op2)) ?
>

Looking at the issue more deeply, I think the concrete problem is this
code in expand_compound_operation:

<code>
  /* If we reach here, we want to return a pair of shifts.  The inner
     shift is a left shift of BITSIZE - POS - LEN bits.  The outer
     shift is a right shift of BITSIZE - LEN bits.  It is arithmetic or
     logical depending on the value of UNSIGNEDP.

     If this was a ZERO_EXTEND or ZERO_EXTRACT, this pair of shifts will be
     converted into an AND of a shift.

     We must check for the case where the left shift would have a negative
     count.  This can happen in a case like (x >> 31) & 255 on machines
     that can't shift by a constant.  On those machines, we would first
     combine the shift with the AND to produce a variable-position
     extraction.  Then the constant of 31 would be substituted in
     to produce such a position.  */

  modewidth = GET_MODE_PRECISION (GET_MODE (x));
  if (modewidth >= pos + len)
    {
      machine_mode mode = GET_MODE (x);
      tem = gen_lowpart (mode, XEXP (x, 0));
      if (!tem || GET_CODE (tem) == CLOBBER)
	return x;
      tem = simplify_shift_const (NULL_RTX, ASHIFT, mode,
				  tem, modewidth - pos - len);
      tem = simplify_shift_const (NULL_RTX, unsignedp ? LSHIFTRT : ASHIFTRT,
				  mode, tem, modewidth - len);
    }
  else if (unsignedp && len < HOST_BITS_PER_WIDE_INT)
    tem = simplify_and_const_int (NULL_RTX, GET_MODE (x),
				  simplify_shift_const (NULL_RTX, LSHIFTRT,
							GET_MODE (x),
							XEXP (x, 0), pos),
				  ((unsigned HOST_WIDE_INT) 1 << len) - 1);
  else
    /* Any other cases we can't handle.  */
    return x;
</code>

This code ends up unconditionally transforming
(zero_extend:DI (reg:SI 125))
into:
(and:DI (subreg:DI (reg:SI 125) 0)
        (const_int 1 [0x1]))

which then gets substituted as an operand of the mult, and nothing
matches after that.

Archaeology shows this code has been there since at least 1992.
I guess my question is: why do we do this unconditionally?
Should we be checking whether the zero_extend form is cheaper than
whatever simplify_shift_const returns?
I don't think what expand_compound_operation is trying to do here is
canonicalisation...

Thanks,
Kyrill

>>
>> Jeff
>>
>>
>
commit 2005243d896cbeb3d22421a63f080699ab357610
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Fri Oct 30 13:56:10 2015 +0000

    [combine] Don't transform sign and zero extends inside mults

diff --git a/gcc/combine.c b/gcc/combine.c
index fa1a93f..be0f5ae 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -5369,6 +5369,7 @@ subst (rtx x, rtx from, rtx to, int in_dest, int in_cond, int unique_copy)
 	      n_occurrences++;
 	    }
 	  else
+	    {
 	    /* If we are in a SET_DEST, suppress most cases unless we
 	       have gone inside a MEM, in which case we want to
 	       simplify the address.  We assume here that things that
@@ -5376,15 +5377,33 @@ subst (rtx x, rtx from, rtx to, int in_dest, int in_cond, int unique_copy)
 	       parts in the first expression.  This is true for SUBREG,
 	       STRICT_LOW_PART, and ZERO_EXTRACT, which are the only
 	       things aside from REG and MEM that should appear in a
-	       SET_DEST.  */
-	    new_rtx = subst (XEXP (x, i), from, to,
+	       SET_DEST.  Also, restrict transformations on SIGN_EXTENDs
+	       and ZERO_EXTENDs if they appear inside multiplications if
+	       the target supports widening multiplies.  This is to avoid
+	       mangling such expressions beyond recognition.  */
+	    bool restrict_extend_p = false;
+	    rtx_code inner_code = GET_CODE (XEXP (x, i));
+
+	    if (code == MULT
+		&& (inner_code == SIGN_EXTEND
+		    || inner_code == ZERO_EXTEND)
+		&& widening_optab_handler (inner_code == ZERO_EXTEND
+					   ? umul_widen_optab
+					   : smul_widen_optab,
+					   GET_MODE (x),
+					   GET_MODE (XEXP (XEXP (x, i), 0)))
+		   != CODE_FOR_nothing)
+	      restrict_extend_p = true;
+
+	    new_rtx = subst (XEXP (x, i), from, to,
 			     (((in_dest
 				&& (code == SUBREG || code == STRICT_LOW_PART
 				    || code == ZERO_EXTRACT))
 			       || code == SET)
-			      && i == 0),
+			      && i == 0) || restrict_extend_p,
 			     code == IF_THEN_ELSE && i == 0,
 			     unique_copy);
+	    }
 
 	  /* If we found that we will have to reject this combination,
 	     indicate that by returning the CLOBBER ourselves, rather than
@@ -8509,8 +8528,30 @@ force_to_mode (rtx x, machine_mode mode, unsigned HOST_WIDE_INT mask,
       /* For most binary operations, just propagate into the operation and
 	 change the mode if we have an operation of that mode.  */
 
-      op0 = force_to_mode (XEXP (x, 0), mode, mask, next_select);
-      op1 = force_to_mode (XEXP (x, 1), mode, mask, next_select);
+      op0 = XEXP (x, 0);
+      op1 = XEXP (x, 1);
+
+      /* If we have a widening multiply avoid messing with the
+	 ZERO/SIGN_EXTEND operands so that we can still match the
+	 appropriate smul/umul patterns.  */
+      if (GET_CODE (x) == MULT
+	  && GET_CODE (op0) == GET_CODE (op1)
+	  && (GET_CODE (op0) == SIGN_EXTEND
+	      || GET_CODE (op0) == ZERO_EXTEND)
+	  && GET_MODE (op0) == GET_MODE (op1)
+	  && widening_optab_handler (GET_CODE (op0) == ZERO_EXTEND
+				     ? umul_widen_optab
+				     : smul_widen_optab,
+				     GET_MODE (x),
+				     GET_MODE (XEXP (op0, 0)))
+	     != CODE_FOR_nothing)
+	;
+      else
+	{
+	  op0 = force_to_mode (op0, mode, mask, next_select);
+	  op1 = force_to_mode (op1, mode, mask, next_select);
+	}
+
       /* If we ended up truncating both operands, truncate the result of
 	 the operation instead.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/umaddl_combine_1.c b/gcc/testsuite/gcc.target/aarch64/umaddl_combine_1.c
new file mode 100644
index 0000000..703c94d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/umaddl_combine_1.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mcpu=cortex-a53" } */
+
+enum reg_class
+{
+  NO_REGS,
+  AD_REGS,
+  ALL_REGS, LIM_REG_CLASSES
+};
+
+extern enum reg_class
+  reg_class_subclasses[((int) LIM_REG_CLASSES)][((int) LIM_REG_CLASSES)];
+
+void
+init_reg_sets_1 ()
+{
+  unsigned int i, j;
+  {
+    for (j = i + 1; j < ((int) LIM_REG_CLASSES); j++)
+      {
+	enum reg_class *p;
+	p = &reg_class_subclasses[j][0];
+	while (*p != LIM_REG_CLASSES)
+	  p++;
+      }
+  }
+}
+
+/* { dg-final { scan-assembler-not "umull\tx\[0-9\]+, w\[0-9\]+, w\[0-9\]+" } } */