| Message ID | mpt7cd1ms4l.fsf@arm.com |
| --- | --- |
| State | New |
| Series | Make may_trap_p_1 return false for constant pool references [PR116145] |
On Wed, Jul 31, 2024 at 6:41 PM Richard Sandiford <richard.sandiford@arm.com> wrote:
>
> The testcase contains the constant:
>
>   arr2 = svreinterpret_u8(svdup_u32(0x0a0d5c3f));
>
> which was initially hoisted by hand, but which gimple optimisers later
> propagated to each use (as expected).  The constant was then expanded
> as a load-and-duplicate from the constant pool.  Normally that load
> should then be hoisted back out of the loop, but may_trap_or_fault_p
> stopped that from happening in this case.
>
> The code responsible was:
>
>   if (/* MEM_NOTRAP_P only relates to the actual position of the memory
>          reference; moving it out of context such as when moving code
>          when optimizing, might cause its address to become invalid.  */
>       code_changed
>       || !MEM_NOTRAP_P (x))
>     {
>       poly_int64 size = MEM_SIZE_KNOWN_P (x) ? MEM_SIZE (x) : -1;
>       return rtx_addr_can_trap_p_1 (XEXP (x, 0), 0, size,
>                                     GET_MODE (x), code_changed);
>     }
>
> where code_changed is true.  (Arguably it doesn't need to be true in
> this case, if we inserted invariants on the preheader edge, but it
> would still need to be true for conditionally executed loads.)
>
> Normally this wouldn't be a problem, since rtx_addr_can_trap_p_1
> would recognise that the address refers to the constant pool.
> However, the SVE load-and-replicate instructions have a limited
> offset range, so it isn't possible for them to have a LO_SUM address.
> All we have is a plain pseudo base register.
>
> MEM_READONLY_P is defined as:
>
>   /* 1 if RTX is a mem that is statically allocated in read-only memory. */
>   (RTL_FLAG_CHECK1 ("MEM_READONLY_P", (RTX), MEM)->unchanging)
>
> and so I think it should be safe to move memory references if both
> MEM_READONLY_P and MEM_NOTRAP_P are true.
>
> The testcase isn't a minimal reproducer, but I think it's good
> to have a realistic full routine in the testsuite.
>
> Bootstrapped & regression-tested on aarch64-linux-gnu.  OK to install?

OK.

Thanks,
Richard.

> Richard
>
>
> gcc/
>         PR rtl-optimization/116145
>         * rtlanal.cc (may_trap_p_1): Trust MEM_NOTRAP_P even for code
>         movement if MEM_READONLY_P is also true.
>
> gcc/testsuite/
>         PR rtl-optimization/116145
>         * gcc.target/aarch64/sve/acle/general/pr116145.c: New test.
> ---
>  gcc/rtlanal.cc                                | 14 ++++--
>  .../aarch64/sve/acle/general/pr116145.c       | 46 +++++++++++++++++++
>  2 files changed, 56 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr116145.c
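[Editorial note: the sketch below is not part of the patch or the thread; the function name is illustrative. It shows the kind of loop-invariant vector constant under discussion. At -O2 GCC expands it as a load-and-duplicate from the constant pool, and the point of the patch is that this read-only, non-trapping load should be hoistable out of any enclosing loop even though the SVE addressing mode leaves only a plain base register.]

  #include <arm_sve.h>

  #pragma GCC target "+sve"

  /* Illustrative only: a vector with the bytes 0x0a, 0x0d, 0x5c, 0x3f in
     every 32-bit lane.  GCC typically materialises this as a
     load-and-duplicate (e.g. ld1rw) from the constant pool, which is the
     MEM_READONLY_P / MEM_NOTRAP_P reference the patch treats as safe to
     move.  */
  svuint8_t
  needle_bytes (void)
  {
    return svreinterpret_u8 (svdup_u32 (0x0a0d5c3f));
  }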
On Wed, Jul 31, 2024 at 9:41 AM Richard Sandiford <richard.sandiford@arm.com> wrote:
>
> The testcase contains the constant:
>
>   arr2 = svreinterpret_u8(svdup_u32(0x0a0d5c3f));
>
> which was initially hoisted by hand, but which gimple optimisers later
> propagated to each use (as expected).  The constant was then expanded
> as a load-and-duplicate from the constant pool.  Normally that load
> should then be hoisted back out of the loop, but may_trap_or_fault_p
> stopped that from happening in this case.
>
> The code responsible was:
>
>   if (/* MEM_NOTRAP_P only relates to the actual position of the memory
>          reference; moving it out of context such as when moving code
>          when optimizing, might cause its address to become invalid.  */
>       code_changed
>       || !MEM_NOTRAP_P (x))
>     {
>       poly_int64 size = MEM_SIZE_KNOWN_P (x) ? MEM_SIZE (x) : -1;
>       return rtx_addr_can_trap_p_1 (XEXP (x, 0), 0, size,
>                                     GET_MODE (x), code_changed);
>     }
>
> where code_changed is true.  (Arguably it doesn't need to be true in
> this case, if we inserted invariants on the preheader edge, but it
> would still need to be true for conditionally executed loads.)
>
> Normally this wouldn't be a problem, since rtx_addr_can_trap_p_1
> would recognise that the address refers to the constant pool.
> However, the SVE load-and-replicate instructions have a limited
> offset range, so it isn't possible for them to have a LO_SUM address.
> All we have is a plain pseudo base register.
>
> MEM_READONLY_P is defined as:
>
>   /* 1 if RTX is a mem that is statically allocated in read-only memory. */
>   (RTL_FLAG_CHECK1 ("MEM_READONLY_P", (RTX), MEM)->unchanging)
>
> and so I think it should be safe to move memory references if both
> MEM_READONLY_P and MEM_NOTRAP_P are true.
>
> The testcase isn't a minimal reproducer, but I think it's good
> to have a realistic full routine in the testsuite.
>
> Bootstrapped & regression-tested on aarch64-linux-gnu.  OK to install?

This is breaking the build on a few targets (x86_64 and powerpc64le so
far reported, see PR 116200).

From what I can tell, it is treating `(plus:DI (ashift:DI (reg:DI 0 ax
[690]) (const_int 3 [0x3])) (label_ref:DI 1620))` as not trapping and
allowing it to be moved before the check of ax being in the range
[0..2], and we have eax being (unsigned long)(unsigned int)-9 in value.
So we get a bogus address which will trap.  I put my findings in
PR 116200 too.

Thanks,
Andrew Pinski
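[Editorial note: the standalone C sketch below is an analogy, not the exact jump-table RTL Andrew quotes; the names tab and lookup are illustrative. It shows the failure mode he describes: the table is statically allocated and read-only, so the MEM is MEM_READONLY_P and MEM_NOTRAP_P at its original position, yet the load is only safe while the range check dominates it.]

  /* Read-only, statically allocated table.  */
  static const long tab[3] = { 10, 20, 30 };

  long
  lookup (unsigned int i, long fallback)
  {
    /* The range check must stay before the load.  If the load of tab[i]
       is speculated above it, i can be wildly out of range (Andrew's case
       had the equivalent of (unsigned long)(unsigned int)-9), and the
       computed address tab + i can fault even though tab itself lives in
       read-only memory.  */
    if (i <= 2)
      return tab[i];
    return fallback;
  }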
On 8/2/24 2:23 PM, Andrew Pinski wrote:
> This is breaking the build on a few targets (x86_64 and powerpc64le so
> far reported, see PR 116200).
>
> From what I can tell, it is treating `(plus:DI (ashift:DI (reg:DI 0 ax
> [690]) (const_int 3 [0x3])) (label_ref:DI 1620))` as not trapping and
> allowing it to be moved before the check of ax being in the range
> [0..2], and we have eax being (unsigned long)(unsigned int)-9 in value.
> So we get a bogus address which will trap.  I put my findings in
> PR 116200 too.

I think it's the root cause of the x86_64 libgomp failures on the trunk
as well.  I haven't done any debugging beyond that.

jeff
Jeff Law <jeffreyalaw@gmail.com> writes:
> On 8/2/24 2:23 PM, Andrew Pinski wrote:
>> This is breaking the build on a few targets (x86_64 and powerpc64le so
>> far reported, see PR 116200).
>>
>> From what I can tell, it is treating `(plus:DI (ashift:DI (reg:DI 0 ax
>> [690]) (const_int 3 [0x3])) (label_ref:DI 1620))` as not trapping and
>> allowing it to be moved before the check of ax being in the range
>> [0..2], and we have eax being (unsigned long)(unsigned int)-9 in value.
>> So we get a bogus address which will trap.  I put my findings in
>> PR 116200 too.
> I think it's the root cause of the x86_64 libgomp failures on the trunk
> as well.  I haven't done any debugging beyond that.

Sorry for the breakage.  I've reverted the patch.

Richard
diff --git a/gcc/rtlanal.cc b/gcc/rtlanal.cc
index 4158a531bdd..893a6afbbc5 100644
--- a/gcc/rtlanal.cc
+++ b/gcc/rtlanal.cc
@@ -3152,10 +3152,16 @@ may_trap_p_1 (const_rtx x, unsigned flags)
           && MEM_VOLATILE_P (x)
           && XEXP (x, 0) == stack_pointer_rtx)
         return true;
-      if (/* MEM_NOTRAP_P only relates to the actual position of the memory
-             reference; moving it out of context such as when moving code
-             when optimizing, might cause its address to become invalid.  */
-          code_changed
+      if (/* MEM_READONLY_P means that the memory is both statically
+             allocated and readonly, so MEM_NOTRAP_P should remain true
+             even if the memory reference is moved.  This is certainly
+             true for the important case of force_const_mem.
+
+             Otherwise, MEM_NOTRAP_P only relates to the actual position
+             of the memory reference; moving it out of context such as
+             when moving code when optimizing, might cause its address
+             to become invalid.  */
+          (code_changed && !MEM_READONLY_P (x))
           || !MEM_NOTRAP_P (x))
         {
           poly_int64 size = MEM_SIZE_KNOWN_P (x) ? MEM_SIZE (x) : -1;
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr116145.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr116145.c
new file mode 100644
index 00000000000..a3d93d3e1c8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr116145.c
@@ -0,0 +1,46 @@
+// { dg-options "-O2" }
+
+#include <stdlib.h>
+#include <arm_sve.h>
+
+#pragma GCC target "+sve2"
+
+typedef unsigned char uchar;
+
+const uchar *
+search_line_fast (const uchar *s, const uchar *end)
+{
+  size_t VL = svcntb();
+  svuint8_t arr1, arr2;
+  svbool_t pc, pg = svptrue_b8();
+
+  // This should not be loaded inside the loop every time.
+  arr2 = svreinterpret_u8(svdup_u32(0x0a0d5c3f));
+
+  for (; s+VL <= end; s += VL) {
+    arr1 = svld1_u8(pg, s);
+    pc = svmatch_u8(pg, arr1, arr2);
+
+    if (svptest_any(pg, pc)) {
+      pc = svbrkb_z(pg, pc);
+      return s+svcntp_b8(pg, pc);
+    }
+  }
+
+  // Handle remainder.
+  if (s < end) {
+    pg = svwhilelt_b8((size_t)s, (size_t)end);
+
+    arr1 = svld1_u8(pg, s);
+    pc = svmatch_u8(pg, arr1, arr2);
+
+    if (svptest_any(pg, pc)) {
+      pc = svbrkb_z(pg, pc);
+      return s+svcntp_b8(pg, pc);
+    }
+  }
+
+  return end;
+}
+
+// { dg-final { scan-assembler {:\n\tld1b\t[^\n]*\n\tmatch\t[^\n]*\n\tb\.} } }
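[Editorial note: the scalar routine below is a sketch, not part of the patch or the testsuite, and the name search_line_scalar is illustrative. It computes the same result as the SVE search_line_fast above: the address of the first byte in [s, end) equal to 0x0a ('\n'), 0x0d ('\r'), 0x5c ('\\') or 0x3f ('?'), i.e. the four bytes packed into the 0x0a0d5c3f constant, or end if none is present. The SVE version keeps those four bytes in arr2, which is exactly the loop-invariant constant-pool load that the scan-assembler pattern expects to stay outside the loop body.]

  const unsigned char *
  search_line_scalar (const unsigned char *s, const unsigned char *end)
  {
    /* Scan byte by byte for any of the four "interesting" characters.  */
    for (; s < end; ++s)
      if (*s == 0x0a || *s == 0x0d || *s == 0x5c || *s == 0x3f)
        return s;
    return end;
  }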