Message ID | 4C6F0FD1.7040207@codesourcery.com |
---|---|
State | New |
Headers | show |
On Sat, 2010-08-21 at 01:29 +0200, Bernd Schmidt wrote: > A week or two ago I posted a patch which added some new > canonicalizations in the combiner, to change > (plus (mult X -Y) Z) > to > (minus Z (mult X Y)) > > This helped on ARM, and to a lesser degree on x86. I mentioned at the > time that I was aware of one counterexample, in the 20040709-2 testcase. > Here, we have two successive pairs of multiply and add insns, which can > be combined to a single multiply and add - however, that becomes > (minus (const_int) (mult X (const_int)) > which is not recognizable on i686 even when split. > > Since then I've also found a couple additional examples on x86_64 with > vectorization enabled. > > The patch below corrects the problem by tweaking find_split_point. If > we have > (minus Z (mult X Y)) > at the top level, and Y is not a constant power of 2, assume it is > easier for the target to split the (plus (mult X -Y) Z) form. > > - imull $1103515245, %ecx, %r8d > - addl $12345, %r8d > - imull $1103515245, %r8d, %r8d > - addl $12345, %r8d > + imull $-1029531031, %ecx, %r8d > + subl $740551042, %r8d > > Bootstrapped and tested on x86_64-linux. Looking at code generation on > ARMv7 and x86_64, a few sequences were improved, and I did not discover > additional regressions. > > Ok? > > This is OK. R.
Index: testsuite/gcc.target/i386/combine-mul.c =================================================================== --- testsuite/gcc.target/i386/combine-mul.c (revision 0) +++ testsuite/gcc.target/i386/combine-mul.c (revision 0) @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ +/* { dg-final { scan-assembler-not "12345" } } */ + +static inline unsigned int myrnd (void) +{ + static unsigned int s = 1388815473; + s *= 1103515245; + s += 12345; +} + +struct __attribute__ ((packed)) A { + unsigned short i:1, l:1, j:3, k:11; +}; + +struct A sA; +void testA (void) +{ + char *p = (char *) &sA; + *p++ = myrnd (); + *p++ = myrnd (); + sA.k = -1; +} Index: combine.c =================================================================== --- combine.c (revision 163389) +++ combine.c (working copy) @@ -4771,6 +4771,23 @@ find_split_point (rtx *loc, rtx insn, bo case PLUS: case MINUS: + /* Canonicalization can produce (minus A (mult B C)), where C is a + constant. It may be better to try splitting (plus (mult B -C) A) + instead if this isn't a multiply by a power of two. */ + if (set_src && code == MINUS && GET_CODE (XEXP (x, 1)) == MULT + && GET_CODE (XEXP (XEXP (x, 1), 1)) == CONST_INT + && exact_log2 (INTVAL (XEXP (XEXP (x, 1), 1))) < 0) + { + enum machine_mode mode = GET_MODE (x); + unsigned HOST_WIDE_INT this_int = INTVAL (XEXP (XEXP (x, 1), 1)); + HOST_WIDE_INT other_int = trunc_int_for_mode (-this_int, mode); + SUBST (*loc, gen_rtx_PLUS (mode, gen_rtx_MULT (mode, + XEXP (XEXP (x, 1), 0), + GEN_INT (other_int)), + XEXP (x, 0))); + return find_split_point (loc, insn, set_src); + } + /* Split at a multiply-accumulate instruction. However if this is the SET_SRC, we likely do not have such an instruction and it's worthless to try this split. */