Message ID | 8F47DDC3-F9FE-4E94-90F7-3A16A3FD47CE@comcast.net
---|---
State | New
On 04/08/14 14:07, Mike Stump wrote:
> Something broke in the compiler to cause combine to incorrectly optimize:
>
> (insn 12 11 13 3 (set (reg:SI 604 [ D.6102 ])
>         (lshiftrt:SI (subreg/s/u:SI (reg/v:DI 601 [ x ]) 0)
>             (reg:SI 602 [ D.6103 ]))) t.c:47 4436 {lshrsi3}
>      (expr_list:REG_DEAD (reg:SI 602 [ D.6103 ])
>         (nil)))
> (insn 13 12 14 3 (set (reg:SI 605)
>         (and:SI (reg:SI 604 [ D.6102 ])
>             (const_int 1 [0x1]))) t.c:47 3658 {andsi3}
>      (expr_list:REG_DEAD (reg:SI 604 [ D.6102 ])
>         (nil)))
> (insn 14 13 15 3 (set (reg:DI 599 [ D.6102 ])
>         (zero_extend:DI (reg:SI 605))) t.c:47 4616 {zero_extendsidi2}
>      (expr_list:REG_DEAD (reg:SI 605)
>         (nil)))
>
> into:
>
> (insn 11 10 12 3 (set (reg:SI 602 [ D.6103 ])
>         (not:SI (subreg:SI (reg:DI 595 [ D.6102 ]) 0))) t.c:47 3732 {one_cmplsi2}
>      (expr_list:REG_DEAD (reg:DI 595 [ D.6102 ])
>         (nil)))
> (note 12 11 13 3 NOTE_INSN_DELETED)
> (note 13 12 14 3 NOTE_INSN_DELETED)
> (insn 14 13 15 3 (set (reg:DI 599 [ D.6102 ])
>         (zero_extract:DI (reg/v:DI 601 [ x ])
>             (const_int 1 [0x1])
>             (reg:SI 602 [ D.6103 ]))) t.c:47 4668 {c2_extzvdi}
>      (expr_list:REG_DEAD (reg:SI 602 [ D.6103 ])
>         (nil)))
>
> This shows up in:
>
> FAIL: gcc.c-torture/execute/builtin-bitops-1.c execution, -Og -g
>
> for me.
>
> diff --git a/gcc/combine.c b/gcc/combine.c
> index 708691f..c1f50ff 100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -7245,6 +7245,18 @@ make_extraction (enum machine_mode mode, rtx inner, HOST_WIDE_INT pos,
>        extraction_mode = insn.field_mode;
>      }
>
> +  /* On a SHIFT_COUNT_TRUNCATED machine, we can't promote the mode of
> +     the extract to a larger size on a variable extract, as previously
> +     the position might have been optimized to change a bit of the
> +     index of the starting bit that would have been ignored before,
> +     but, with a larger mode, will then not be.  If we wanted to do
> +     this, we'd have to mask out those bits or prove that those bits
> +     are 0.  */
> +  if (SHIFT_COUNT_TRUNCATED
> +      && pos_rtx
> +      && GET_MODE_BITSIZE (extraction_mode) > GET_MODE_BITSIZE (mode))
> +    extraction_mode = mode;
> +
>    /* Never narrow an object, since that might not be safe.  */
>
>    if (mode != VOIDmode
>
> is sufficient to never widen variable extracts on SHIFT_COUNT_TRUNCATED
> machines.  So, the question is, how did people expect this to work?  I
> didn’t spot what changed recently to cause the bad code-gen.  The
> optimization of sub into not is ok, despite how funny it looks, because
> it feeds into the extract, which we know by SHIFT_COUNT_TRUNCATED is safe.
>
> Is the patch a reasonable way to fix this?

On a SHIFT_COUNT_TRUNCATED target, I don't think it's ever OK to widen a
shift, variable or constant.

In the case of a variable shift, we could easily have eliminated the
masking code before or during combine.  For a constant shift amount we
could have adjusted the constant (see SHIFT_COUNT_TRUNCATED in cse.c).

I think it's just an oversight and it has simply never bitten us before.

jeff
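To make the failure concrete: the position register 602 is rewritten from a subtract into a NOT, which is only equivalent modulo 32. Below is a minimal, self-contained C sketch of the resulting mod-32 vs. mod-64 mismatch. It assumes the position was computed as `31 - i` (a stand-in consistent with the sub-into-not remark above, not the actual `builtin-bitops-1.c` testcase); `shr_si` and `extz_di` are invented names simulating the two hardware units.

```c
#include <stdio.h>

/* Simulate a SHIFT_COUNT_TRUNCATED machine: the SImode shifter uses
   the count mod 32, the DImode unit uses it mod 64.  */
static unsigned int shr_si (unsigned int x, unsigned int n)
{
  return x >> (n & 31);               /* SImode shift: count mod 32 */
}

static unsigned long long extz_di (unsigned long long x, unsigned int pos)
{
  return (x >> (pos & 63)) & 1;       /* DImode 1-bit extract: pos mod 64 */
}

int main (void)
{
  /* Suppose the position was computed as 31 - i.  Since
     31 - i == ~i (mod 32), combine may rewrite the subtract as a NOT,
     which is fine for the SImode shift.  Widening the extract to DImode
     then reads the position mod 64, where ~i != 31 - i.  */
  unsigned int i = 4;                 /* intended position: 31 - 4 = 27 */
  unsigned int pos = ~i;              /* 0xfffffffb */
  unsigned long long x = 1ULL << 27;

  printf ("SImode shift+and: %u\n",   /* prints 1: 0xfb & 31 == 27 */
          shr_si ((unsigned int) x, pos) & 1);
  printf ("DImode extract:   %llu\n", /* prints 0: 0xfb & 63 == 59 */
          extz_di (x, pos));
  return 0;
}
```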
On Mon, Jan 12, 2015 at 11:12 PM, Jeff Law <law@redhat.com> wrote:
> On 04/08/14 14:07, Mike Stump wrote:
>>
>> Something broke in the compiler to cause combine to incorrectly optimize:
>>
>> [... RTL before/after and the proposed make_extraction patch snipped;
>> see the message quoted above ...]
>>
>> is sufficient to never widen variable extracts on SHIFT_COUNT_TRUNCATED
>> machines.  So, the question is, how did people expect this to work?  I
>> didn’t spot what changed recently to cause the bad code-gen.  The
>> optimization of sub into not is ok, despite how funny it looks, because
>> it feeds into the extract, which we know by SHIFT_COUNT_TRUNCATED is safe.
>>
>> Is the patch a reasonable way to fix this?
>
> On a SHIFT_COUNT_TRUNCATED target, I don't think it's ever OK to widen a
> shift, variable or constant.
>
> In the case of a variable shift, we could easily have eliminated the
> masking code before or during combine.  For a constant shift amount we
> could have adjusted the constant (see SHIFT_COUNT_TRUNCATED in cse.c).
>
> I think it's just an oversight and it has simply never bitten us before.

IMHO SHIFT_COUNT_TRUNCATED should be removed and instead backends
should provide shift patterns with a (and:QI ...) for the shift amount
which simply will omit that operation if suitable.

Richard.

> jeff
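Richard's suggestion, sketched as a machine-description pattern for a hypothetical port: the masking AND is part of the matched RTL, so combine can fold the user's mask into the shift, but the output template emits only the shift because the hardware performs the truncation itself. The pattern name, constraints, and mnemonic are all invented for illustration.

```
;; Hypothetical SImode shift-left pattern for an imaginary port whose
;; shifter uses only the low 5 bits of the count.  The (and:QI ... 31)
;; is matched -- and thereby absorbed -- but never emitted: the output
;; template prints just the shift instruction.
(define_insn "*ashlsi3_mask"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (ashift:SI
          (match_operand:SI 1 "register_operand" "r")
          (and:QI (match_operand:QI 2 "register_operand" "r")
                  (const_int 31))))]
  ""
  "shl\t%0,%1,%2")
```

The trade-off with this approach is exactly what Segher raises next: combiner canonicalization can change the mask constant, so the literal `(and ... 31)` form is not always what arrives at the pattern.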
On 01/13/15 02:51, Richard Biener wrote:
>> On a SHIFT_COUNT_TRUNCATED target, I don't think it's ever OK to widen a
>> shift, variable or constant.
>>
>> In the case of a variable shift, we could easily have eliminated the
>> masking code before or during combine.  For a constant shift amount we
>> could have adjusted the constant (see SHIFT_COUNT_TRUNCATED in cse.c).
>>
>> I think it's just an oversight and it has simply never bitten us before.
>
> IMHO SHIFT_COUNT_TRUNCATED should be removed and instead
> backends should provide shift patterns with a (and:QI ...) for the
> shift amount which simply will omit that operation if suitable.

Perhaps.  I'm certainly not wed to the concept of SHIFT_COUNT_TRUNCATED.
I don't see that getting addressed in the gcc-5 timeframe.

aarch64, alpha, epiphany, iq2000, lm32, m32r, mep, microblaze, mips,
mn103, nds32, pa, sparc, stormy16, tilepro, v850 and xtensa are the
current SHIFT_COUNT_TRUNCATED targets.

Jeff
On Tue, Jan 13, 2015 at 10:51:27AM +0100, Richard Biener wrote:
> IMHO SHIFT_COUNT_TRUNCATED should be removed and instead
> backends should provide shift patterns with a (and:QI ...) for the
> shift amount which simply will omit that operation if suitable.

Note that that catches less though, e.g. in

  int f(int x, int n) { return x << ((2*n) & 31); }

without SHIFT_COUNT_TRUNCATED it will try to match an AND with 30,
not with 31.


Segher
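Why 30 rather than 31: bit 0 of `2*n` is known to be zero, so combine's nonzero-bits-based AND simplification shrinks the mask before pattern matching, and a pattern written as `(and ... (const_int 31))` no longer matches. A trivial, runnable check of the underlying identity:

```c
#include <assert.h>

int main (void)
{
  /* Bit 0 of 2*n is always clear, so masking with 31 and with 30
     give the same shift count for every n.  */
  for (unsigned int n = 0; n < 100000; n++)
    assert (((2 * n) & 31) == ((2 * n) & 30));
  return 0;
}
```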
On Tue, Jan 13, 2015 at 6:38 PM, Segher Boessenkool
<segher@kernel.crashing.org> wrote:
> On Tue, Jan 13, 2015 at 10:51:27AM +0100, Richard Biener wrote:
>> IMHO SHIFT_COUNT_TRUNCATED should be removed and instead
>> backends should provide shift patterns with a (and:QI ...) for the
>> shift amount which simply will omit that operation if suitable.
>
> Note that that catches less though, e.g. in
>
> int f(int x, int n) { return x << ((2*n) & 31); }
>
> without SHIFT_COUNT_TRUNCATED it will try to match an AND with 30,
> not with 31.

But even with SHIFT_COUNT_TRUNCATED you cannot omit the and, as it
clears the LSB.  Only at a higher level might we be tempted to drop the
& 31 while it still persists in its original form (not sure if fold does
that - I don't see SHIFT_COUNT_TRUNCATED mentioned there).

Richard.

> Segher
On Wed, Jan 14, 2015 at 10:10:24AM +0100, Richard Biener wrote:
> On Tue, Jan 13, 2015 at 6:38 PM, Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
> > On Tue, Jan 13, 2015 at 10:51:27AM +0100, Richard Biener wrote:
> > > IMHO SHIFT_COUNT_TRUNCATED should be removed and instead
> > > backends should provide shift patterns with a (and:QI ...) for the
> > > shift amount which simply will omit that operation if suitable.
> >
> > Note that that catches less though, e.g. in
> >
> > int f(int x, int n) { return x << ((2*n) & 31); }
> >
> > without SHIFT_COUNT_TRUNCATED it will try to match an AND with 30,
> > not with 31.
>
> But even with SHIFT_COUNT_TRUNCATED you cannot omit the
> and as it clears the LSB.

The 2*n already does that.  Before combine, we have something like

  t1 = n << 1
  t2 = t1 & 30
  ret = x << t2

(it actually has some register copies to more temporaries), and on
SHIFT_COUNT_TRUNCATED targets where the first two insns don't combine,
e.g. m32r, currently combine ends up with

  t1 = n << 1
  ret = x << t1

while it doesn't without SHIFT_COUNT_TRUNCATED if you only have an
x << (n & 31) pattern.

I'm all for eradicating SHIFT_COUNT_TRUNCATED; just pointing out that
it is not trivial to fully replace (just the important, obvious cases
are easy).


Segher
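A small C sketch of why deleting the AND is valid only when the target declares the truncation. The function names are invented, and the mod-32 shifter stands in for an m32r-like target:

```c
#include <assert.h>

/* Stand-in for a hardware shifter that truncates the count mod 32,
   which is what SHIFT_COUNT_TRUNCATED promises.  */
static unsigned int hw_shl (unsigned int x, unsigned int n)
{
  return x << (n & 31);
}

/* Before combine: the source's mask (canonicalized to 30) survives.  */
static unsigned int shl_masked (unsigned int x, unsigned int n)
{
  return hw_shl (x, (2 * n) & 30);
}

/* After combine on a SHIFT_COUNT_TRUNCATED target: the AND is deleted
   and the raw 2*n feeds the shifter.  */
static unsigned int shl_combined (unsigned int x, unsigned int n)
{
  return hw_shl (x, 2 * n);
}

int main (void)
{
  /* The two agree for all n: bit 0 of 2*n is clear, and the shifter
     itself discards bits 5 and up, so the & 30 is redundant -- but only
     a target that declares the truncation lets combine prove that.  */
  for (unsigned int n = 0; n < 100000; n++)
    assert (shl_masked (0x12345678u, n) == shl_combined (0x12345678u, n));
  return 0;
}
```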
diff --git a/gcc/combine.c b/gcc/combine.c
index 708691f..c1f50ff 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -7245,6 +7245,18 @@ make_extraction (enum machine_mode mode, rtx inner, HOST_WIDE_INT pos,
       extraction_mode = insn.field_mode;
     }
 
+  /* On a SHIFT_COUNT_TRUNCATED machine, we can't promote the mode of
+     the extract to a larger size on a variable extract, as previously
+     the position might have been optimized to change a bit of the
+     index of the starting bit that would have been ignored before,
+     but, with a larger mode, will then not be.  If we wanted to do
+     this, we'd have to mask out those bits or prove that those bits
+     are 0.  */
+  if (SHIFT_COUNT_TRUNCATED
+      && pos_rtx
+      && GET_MODE_BITSIZE (extraction_mode) > GET_MODE_BITSIZE (mode))
+    extraction_mode = mode;
+
   /* Never narrow an object, since that might not be safe.  */
 
   if (mode != VOIDmode