Message ID | 20240813115701.3169159-1-manolis.tsamis@vrull.eu |
---|---|
State | New |
Headers | show |
Series | ifcvt: Fix force_operand ICE due to noce_convert_multiple_sets [PR116353] | expand |
Manolis Tsamis <manolis.tsamis@vrull.eu> writes: > Now that more operations are allowed for noce_convert_multiple_sets, we need to > check noce_can_force_operand on the sequence before calling try_emit_cmove_seq. > Otherwise an inappropriate argument may be given to copy_to_mode_reg and result > in an ICE. > > Fix-up for the recent ifcvt commit 72c9b5f438f22cca493b4e2a8a2a31ff61bf1477 Thanks! Bootstrapped & tested with all FEs with no regressions on amd64. > > PR tree-optimization/116353 > > gcc/ChangeLog: > > * ifcvt.cc (bb_ok_for_noce_convert_multiple_sets): Check > noce_can_force_operand. > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/pr116353.c: New test. > > Tested-by: Christoph Müllner <christoph.muellner@vrull.eu> > Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu> > --- sam
On Tue, Aug 13, 2024 at 4:57 AM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote: > > Now that more operations are allowed for noce_convert_multiple_sets, we need to > check noce_can_force_operand on the sequence before calling try_emit_cmove_seq. > Otherwise an inappropriate argument may be given to copy_to_mode_reg and result > in an ICE. > > Fix-up for the recent ifcvt commit 72c9b5f438f22cca493b4e2a8a2a31ff61bf1477 > > PR tree-optimization/116353 > > gcc/ChangeLog: > > * ifcvt.cc (bb_ok_for_noce_convert_multiple_sets): Check > noce_can_force_operand. > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/pr116353.c: New test. > > Tested-by: Christoph Müllner <christoph.muellner@vrull.eu> > Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu> > --- > > gcc/ifcvt.cc | 6 ++- > gcc/testsuite/gcc.target/i386/pr116353.c | 55 ++++++++++++++++++++++++ > 2 files changed, 59 insertions(+), 2 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/i386/pr116353.c > > diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc > index 3e25f30b67e..da59c907891 100644 > --- a/gcc/ifcvt.cc > +++ b/gcc/ifcvt.cc > @@ -3938,8 +3938,10 @@ bb_ok_for_noce_convert_multiple_sets (basic_block test_bb, unsigned *cost) > rtx src = SET_SRC (set); > > /* Do not handle anything involving memory loads/stores since it might > - violate data-race-freedom guarantees. */ > - if (!REG_P (dest) || contains_mem_rtx_p (src)) > + violate data-race-freedom guarantees. Make sure we can force SRC > + to a register as that may be needed in try_emit_cmove_seq. */ > + if (!REG_P (dest) || contains_mem_rtx_p (src) > + || !noce_can_force_operand (src)) > return false; > > /* Destination and source must be appropriate. */ > diff --git a/gcc/testsuite/gcc.target/i386/pr116353.c b/gcc/testsuite/gcc.target/i386/pr116353.c > new file mode 100644 > index 00000000000..8e254653d5d > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr116353.c Does this test contain x86 specific code? > @@ -0,0 +1,55 @@ > +/* PR tree-optimization/116353 */ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > + > +enum desmode { C }; > +struct { > + unsigned char des_ivec[]; > +} _des_crypt_desp; > +int des_SPtrans_6_0, des_SPtrans_4_0, des_encrypt_encrypt, des_encrypt_i; > +long des_encrypt_s_0, _des_crypt_tin1, _des_crypt_tout0, _des_crypt_tout1, > + _des_crypt_tin0; > +enum desmode _des_crypt_desp_0; > +unsigned long _des_crypt_tbuf[2]; > +char _des_crypt_out; > +void des_encrypt(unsigned long *buf) { > + long l, r, t; > + l = buf[0]; > + r = buf[1]; > + t = r; > + r ^= l ^= t < 6; > + if (des_encrypt_encrypt) > + for (;; des_encrypt_i += 4) > + des_encrypt_s_0 ^= des_SPtrans_4_0 | des_SPtrans_6_0; > + buf[1] = r; > +} > +void _des_crypt() { > + long xor0, xor1; > + unsigned char *in; > + int cbc_mode = _des_crypt_desp_0; > + in = _des_crypt_desp.des_ivec; > + xor0 = xor1 = 0; > + for (;;) { > + _des_crypt_tin0 = *in++; > + _des_crypt_tin0 |= *in++ << 8; > + _des_crypt_tin0 |= *in++ << 16; > + _des_crypt_tin0 |= (long)*in << 24; > + _des_crypt_tin1 = *in++; > + _des_crypt_tin1 |= *in++ << 8; > + _des_crypt_tin1 |= *in++ << 16; > + _des_crypt_tin1 |= (long)*in << 24; > + _des_crypt_tbuf[0] = _des_crypt_tin0; > + _des_crypt_tbuf[1] = _des_crypt_tin1; > + des_encrypt(_des_crypt_tbuf); > + if (cbc_mode) { > + _des_crypt_tout0 = xor0; > + _des_crypt_tout1 = _des_crypt_tbuf[1] ^ xor1; > + xor0 = _des_crypt_tin0; > + xor1 = _des_crypt_tin1; > + } else { > + _des_crypt_tout0 = _des_crypt_tbuf[0]; > + _des_crypt_tout1 = _des_crypt_tbuf[1]; > + } > + _des_crypt_out = _des_crypt_tout0 * _des_crypt_tout1; > + } > +} > -- > 2.34.1 >
On 8/13/24 5:57 AM, Manolis Tsamis wrote: > Now that more operations are allowed for noce_convert_multiple_sets, we need to > check noce_can_force_operand on the sequence before calling try_emit_cmove_seq. > Otherwise an inappropriate argument may be given to copy_to_mode_reg and result > in an ICE. > > Fix-up for the recent ifcvt commit 72c9b5f438f22cca493b4e2a8a2a31ff61bf1477 > > PR tree-optimization/116353 > > gcc/ChangeLog: > > * ifcvt.cc (bb_ok_for_noce_convert_multiple_sets): Check > noce_can_force_operand. > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/pr116353.c: New test. OK. Note I'm not entirely sure that noce_can_force_operand is sufficient based on what I've seen on the v8 port. What I'm seeing on the v8 port is that we're trying to create a conditional move where one arm is a rotate expression. That in and of itself isn't an issue. The port does (of course) expect a register operand, so we use force_operand to get the result into a GPR. So far, so good. The ifcvt code does check noce_can_force_operand which returns true. I don't remember the precise details other than it looked reasonable. So still, so far, so good. The problem in the v850 doesn't have a generalized rotation pattern. The expander will FAIL for most rotate counts and there's no alternate synthesis currently defined in the optabs interface for a word mode rotate. So that in turn causes force_operand to return NULL_RTX to its caller and boom! The rotation patterns are allowed to FAIL if I'm reading the docs correctly. So it seems like what the v8 port is doing is valid, but it is causing a segfault and this testsuite failure: > Tests that now fail, but worked before (4 tests): > > v850-sim/-mgcc-abi/-msoft-float/-mv850e3v5: gcc: gcc.c-torture/execute/20100805-1.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors) > v850-sim/-mgcc-abi/-msoft-float/-mv850e3v5: gcc: gcc.c-torture/execute/20100805-1.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors) > v850-sim/-msoft-float/-mv850e3v5: gcc: gcc.c-torture/execute/20100805-1.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors) > v850-sim/-mv850e3v5: gcc: gcc.c-torture/execute/20100805-1.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors) But perhaps it isn't that bad in practice. I can fix this by removing a bit of what I expect is unnecessary code in the v850 port. Jeff
Applied to master, thanks. --Philipp. On Tue, 13 Aug 2024 at 21:48, Jeff Law <jeffreyalaw@gmail.com> wrote: > > > On 8/13/24 5:57 AM, Manolis Tsamis wrote: > > Now that more operations are allowed for noce_convert_multiple_sets, we > need to > > check noce_can_force_operand on the sequence before calling > try_emit_cmove_seq. > > Otherwise an inappropriate argument may be given to copy_to_mode_reg and > result > > in an ICE. > > > > Fix-up for the recent ifcvt commit > 72c9b5f438f22cca493b4e2a8a2a31ff61bf1477 > > > > PR tree-optimization/116353 > > > > gcc/ChangeLog: > > > > * ifcvt.cc (bb_ok_for_noce_convert_multiple_sets): Check > > noce_can_force_operand. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/i386/pr116353.c: New test. > OK. > > Note I'm not entirely sure that noce_can_force_operand is sufficient > based on what I've seen on the v8 port. > > What I'm seeing on the v8 port is that we're trying to create a > conditional move where one arm is a rotate expression. That in and of > itself isn't an issue. The port does (of course) expect a register > operand, so we use force_operand to get the result into a GPR. So far, > so good. > > The ifcvt code does check noce_can_force_operand which returns true. I > don't remember the precise details other than it looked reasonable. So > still, so far, so good. > > The problem in the v850 doesn't have a generalized rotation pattern. > The expander will FAIL for most rotate counts and there's no alternate > synthesis currently defined in the optabs interface for a word mode > rotate. So that in turn causes force_operand to return NULL_RTX to its > caller and boom! > > The rotation patterns are allowed to FAIL if I'm reading the docs > correctly. So it seems like what the v8 port is doing is valid, but it > is causing a segfault and this testsuite failure: > > > Tests that now fail, but worked before (4 tests): > > > > v850-sim/-mgcc-abi/-msoft-float/-mv850e3v5: gcc: > gcc.c-torture/execute/20100805-1.c -O3 -fomit-frame-pointer > -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess > errors) > > v850-sim/-mgcc-abi/-msoft-float/-mv850e3v5: gcc: > gcc.c-torture/execute/20100805-1.c -O3 -fomit-frame-pointer > -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess > errors) > > v850-sim/-msoft-float/-mv850e3v5: gcc: > gcc.c-torture/execute/20100805-1.c -O3 -fomit-frame-pointer > -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess > errors) > > v850-sim/-mv850e3v5: gcc: gcc.c-torture/execute/20100805-1.c -O3 > -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer > -finline-functions (test for excess errors) > > > But perhaps it isn't that bad in practice. I can fix this by removing a > bit of what I expect is unnecessary code in the v850 port. > > Jeff >
"H.J. Lu" <hjl.tools@gmail.com> writes: > On Tue, Aug 13, 2024 at 4:57 AM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote: >> >> Now that more operations are allowed for noce_convert_multiple_sets, we need to >> check noce_can_force_operand on the sequence before calling try_emit_cmove_seq. >> Otherwise an inappropriate argument may be given to copy_to_mode_reg and result >> in an ICE. >> >> Fix-up for the recent ifcvt commit 72c9b5f438f22cca493b4e2a8a2a31ff61bf1477 >> >> PR tree-optimization/116353 >> >> gcc/ChangeLog: >> >> * ifcvt.cc (bb_ok_for_noce_convert_multiple_sets): Check >> noce_can_force_operand. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/i386/pr116353.c: New test. >> >> Tested-by: Christoph Müllner <christoph.muellner@vrull.eu> >> Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu> >> --- >> >> gcc/ifcvt.cc | 6 ++- >> gcc/testsuite/gcc.target/i386/pr116353.c | 55 ++++++++++++++++++++++++ >> 2 files changed, 59 insertions(+), 2 deletions(-) >> create mode 100644 gcc/testsuite/gcc.target/i386/pr116353.c >> >> diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc >> index 3e25f30b67e..da59c907891 100644 >> --- a/gcc/ifcvt.cc >> +++ b/gcc/ifcvt.cc >> @@ -3938,8 +3938,10 @@ bb_ok_for_noce_convert_multiple_sets (basic_block test_bb, unsigned *cost) >> rtx src = SET_SRC (set); >> >> /* Do not handle anything involving memory loads/stores since it might >> - violate data-race-freedom guarantees. */ >> - if (!REG_P (dest) || contains_mem_rtx_p (src)) >> + violate data-race-freedom guarantees. Make sure we can force SRC >> + to a register as that may be needed in try_emit_cmove_seq. */ >> + if (!REG_P (dest) || contains_mem_rtx_p (src) >> + || !noce_can_force_operand (src)) >> return false; >> >> /* Destination and source must be appropriate. */ >> diff --git a/gcc/testsuite/gcc.target/i386/pr116353.c b/gcc/testsuite/gcc.target/i386/pr116353.c >> new file mode 100644 >> index 00000000000..8e254653d5d >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/i386/pr116353.c > > Does this test contain x86 specific code? > Yes, please move it to the generic suite (maybe even torture, as it has a few nice properties). >> @@ -0,0 +1,55 @@ >> +/* PR tree-optimization/116353 */ >> +/* { dg-do compile } */ >> +/* { dg-options "-O2" } */ >> + >> +enum desmode { C }; >> +struct { >> + unsigned char des_ivec[]; >> +} _des_crypt_desp; >> +int des_SPtrans_6_0, des_SPtrans_4_0, des_encrypt_encrypt, des_encrypt_i; >> +long des_encrypt_s_0, _des_crypt_tin1, _des_crypt_tout0, _des_crypt_tout1, >> + _des_crypt_tin0; >> +enum desmode _des_crypt_desp_0; >> +unsigned long _des_crypt_tbuf[2]; >> +char _des_crypt_out; >> +void des_encrypt(unsigned long *buf) { >> + long l, r, t; >> + l = buf[0]; >> + r = buf[1]; >> + t = r; >> + r ^= l ^= t < 6; >> + if (des_encrypt_encrypt) >> + for (;; des_encrypt_i += 4) >> + des_encrypt_s_0 ^= des_SPtrans_4_0 | des_SPtrans_6_0; >> + buf[1] = r; >> +} >> +void _des_crypt() { >> + long xor0, xor1; >> + unsigned char *in; >> + int cbc_mode = _des_crypt_desp_0; >> + in = _des_crypt_desp.des_ivec; >> + xor0 = xor1 = 0; >> + for (;;) { >> + _des_crypt_tin0 = *in++; >> + _des_crypt_tin0 |= *in++ << 8; >> + _des_crypt_tin0 |= *in++ << 16; >> + _des_crypt_tin0 |= (long)*in << 24; >> + _des_crypt_tin1 = *in++; >> + _des_crypt_tin1 |= *in++ << 8; >> + _des_crypt_tin1 |= *in++ << 16; >> + _des_crypt_tin1 |= (long)*in << 24; >> + _des_crypt_tbuf[0] = _des_crypt_tin0; >> + _des_crypt_tbuf[1] = _des_crypt_tin1; >> + des_encrypt(_des_crypt_tbuf); >> + if (cbc_mode) { >> + _des_crypt_tout0 = xor0; >> + _des_crypt_tout1 = _des_crypt_tbuf[1] ^ xor1; >> + xor0 = _des_crypt_tin0; >> + xor1 = _des_crypt_tin1; >> + } else { >> + _des_crypt_tout0 = _des_crypt_tbuf[0]; >> + _des_crypt_tout1 = _des_crypt_tbuf[1]; >> + } >> + _des_crypt_out = _des_crypt_tout0 * _des_crypt_tout1; >> + } >> +} >> -- >> 2.34.1 >>
On Tue, Aug 13, 2024 at 12:40:35PM -0700, H.J. Lu wrote: > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/i386/pr116353.c > > Does this test contain x86 specific code? Guess int32plus dependent at least (on 16-bit int *in++ << 16 would be out of bounds shift). Jakub
On Tue, Aug 13, 2024 at 10:48 PM Jeff Law <jeffreyalaw@gmail.com> wrote: > > > > On 8/13/24 5:57 AM, Manolis Tsamis wrote: > > Now that more operations are allowed for noce_convert_multiple_sets, we need to > > check noce_can_force_operand on the sequence before calling try_emit_cmove_seq. > > Otherwise an inappropriate argument may be given to copy_to_mode_reg and result > > in an ICE. > > > > Fix-up for the recent ifcvt commit 72c9b5f438f22cca493b4e2a8a2a31ff61bf1477 > > > > PR tree-optimization/116353 > > > > gcc/ChangeLog: > > > > * ifcvt.cc (bb_ok_for_noce_convert_multiple_sets): Check > > noce_can_force_operand. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/i386/pr116353.c: New test. > OK. > > Note I'm not entirely sure that noce_can_force_operand is sufficient > based on what I've seen on the v8 port. > > What I'm seeing on the v8 port is that we're trying to create a > conditional move where one arm is a rotate expression. That in and of > itself isn't an issue. The port does (of course) expect a register > operand, so we use force_operand to get the result into a GPR. So far, > so good. > > The ifcvt code does check noce_can_force_operand which returns true. I > don't remember the precise details other than it looked reasonable. So > still, so far, so good. > > The problem in the v850 doesn't have a generalized rotation pattern. > The expander will FAIL for most rotate counts and there's no alternate > synthesis currently defined in the optabs interface for a word mode > rotate. So that in turn causes force_operand to return NULL_RTX to its > caller and boom! > > The rotation patterns are allowed to FAIL if I'm reading the docs > correctly. So it seems like what the v8 port is doing is valid, but it > is causing a segfault and this testsuite failure: > > > Tests that now fail, but worked before (4 tests): > > > > v850-sim/-mgcc-abi/-msoft-float/-mv850e3v5: gcc: gcc.c-torture/execute/20100805-1.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors) > > v850-sim/-mgcc-abi/-msoft-float/-mv850e3v5: gcc: gcc.c-torture/execute/20100805-1.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors) > > v850-sim/-msoft-float/-mv850e3v5: gcc: gcc.c-torture/execute/20100805-1.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors) > > v850-sim/-mv850e3v5: gcc: gcc.c-torture/execute/20100805-1.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors) > > > But perhaps it isn't that bad in practice. I can fix this by removing a > bit of what I expect is unnecessary code in the v850 port. > Hi Jeff, Yes, what you're saying is right, testing noce_can_force_operand for SET_SRC is not sufficient. I don't think you should change the v850 backend, this should be fixed in ifcvt. As far as my analysis goes this looks to be the same issue with PR116358. We can force_operand SET_SRC just fine, but that doesn't imply we can force_operand its individual arguments. I'm testing a solution that involves checking noce_can_force_operand in try_emit_cmove_seq, so that we know if it's safe to call emit_conditional_move (essentially an extension of this initial fix). Thanks, Manolis > Jeff
diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc index 3e25f30b67e..da59c907891 100644 --- a/gcc/ifcvt.cc +++ b/gcc/ifcvt.cc @@ -3938,8 +3938,10 @@ bb_ok_for_noce_convert_multiple_sets (basic_block test_bb, unsigned *cost) rtx src = SET_SRC (set); /* Do not handle anything involving memory loads/stores since it might - violate data-race-freedom guarantees. */ - if (!REG_P (dest) || contains_mem_rtx_p (src)) + violate data-race-freedom guarantees. Make sure we can force SRC + to a register as that may be needed in try_emit_cmove_seq. */ + if (!REG_P (dest) || contains_mem_rtx_p (src) + || !noce_can_force_operand (src)) return false; /* Destination and source must be appropriate. */ diff --git a/gcc/testsuite/gcc.target/i386/pr116353.c b/gcc/testsuite/gcc.target/i386/pr116353.c new file mode 100644 index 00000000000..8e254653d5d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr116353.c @@ -0,0 +1,55 @@ +/* PR tree-optimization/116353 */ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +enum desmode { C }; +struct { + unsigned char des_ivec[]; +} _des_crypt_desp; +int des_SPtrans_6_0, des_SPtrans_4_0, des_encrypt_encrypt, des_encrypt_i; +long des_encrypt_s_0, _des_crypt_tin1, _des_crypt_tout0, _des_crypt_tout1, + _des_crypt_tin0; +enum desmode _des_crypt_desp_0; +unsigned long _des_crypt_tbuf[2]; +char _des_crypt_out; +void des_encrypt(unsigned long *buf) { + long l, r, t; + l = buf[0]; + r = buf[1]; + t = r; + r ^= l ^= t < 6; + if (des_encrypt_encrypt) + for (;; des_encrypt_i += 4) + des_encrypt_s_0 ^= des_SPtrans_4_0 | des_SPtrans_6_0; + buf[1] = r; +} +void _des_crypt() { + long xor0, xor1; + unsigned char *in; + int cbc_mode = _des_crypt_desp_0; + in = _des_crypt_desp.des_ivec; + xor0 = xor1 = 0; + for (;;) { + _des_crypt_tin0 = *in++; + _des_crypt_tin0 |= *in++ << 8; + _des_crypt_tin0 |= *in++ << 16; + _des_crypt_tin0 |= (long)*in << 24; + _des_crypt_tin1 = *in++; + _des_crypt_tin1 |= *in++ << 8; + _des_crypt_tin1 |= *in++ << 16; + _des_crypt_tin1 |= (long)*in << 24; + _des_crypt_tbuf[0] = _des_crypt_tin0; + _des_crypt_tbuf[1] = _des_crypt_tin1; + des_encrypt(_des_crypt_tbuf); + if (cbc_mode) { + _des_crypt_tout0 = xor0; + _des_crypt_tout1 = _des_crypt_tbuf[1] ^ xor1; + xor0 = _des_crypt_tin0; + xor1 = _des_crypt_tin1; + } else { + _des_crypt_tout0 = _des_crypt_tbuf[0]; + _des_crypt_tout1 = _des_crypt_tbuf[1]; + } + _des_crypt_out = _des_crypt_tout0 * _des_crypt_tout1; + } +}