Message ID | 1492f7fed6cea211a174ad171f43cb0561eb1a37.camel@physik.fu-berlin.de |
---|---|
State | New |
Series | RFH: Debugging GCC segfault with LRA-enabled SH backend |
On Thu, Aug 22, 2024 at 12:32 PM John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> wrote:
>
> (Please CC me in the replies, I am not subscribed to the list)
>
> Hi,
>
> I am currently trying to switch the SH backend to use the LRA register allocator
> by default with the help of patches by Oleg and Kaz (CC'ed) to address various
> issues when using LRA by default. The patches can all be found in the corresponding
> Bugzilla report [1].
>
> Currently, I have applied the following patches from the bug report:
>
> - 58832
> - 58833
> - 58883
> - 58905
>
> plus the following change to enable LRA by default:
>
> diff --git a/gcc/config/sh/sh.opt b/gcc/config/sh/sh.opt
> index c44cfe70cb1..718dfb744ff 100644
> --- a/gcc/config/sh/sh.opt
> +++ b/gcc/config/sh/sh.opt
> @@ -299,5 +299,5 @@ Target Var(TARGET_FSRRA)
>  Enable the use of the fsrra instruction.
>
>  mlra
> -Target Var(sh_lra_flag) Init(0) Save
> +Target Var(sh_lra_flag) Init(1) Save
>  Use LRA instead of reload (transitional).
>
> With these changes applied, I have configured and built GCC from Git as follows:
>
> # ../configure --disable-multilib --enable-multiarch --enable-bootstrap --enable-languages=c,c++
> # make -j32
>
> which fails on QEMU with a segmentation fault:
>
> /srv/glaubitz/gcc/build/./gcc/xgcc -B/srv/glaubitz/gcc/build/./gcc/ -B/usr/local/sh4-unknown-linux-gnu/bin/ \
> -B/usr/local/sh4-unknown-linux-gnu/lib/ -isystem /usr/local/sh4-unknown-linux-gnu/include -isystem \
> /usr/local/sh4-unknown-linux-gnu/sys-include -fno-checking -g -O2 -O2 -g -O2 -DIN_GCC \
> -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wstrict-prototypes -Wmissing-prototypes \
> -Wold-style-definition -isystem ./include -fpic -DNO_FPSCR_VALUES -w -Wno-sync-nand -g -DIN_LIBGCC2 \
> -fbuilding-libgcc -fno-stack-protector -fpic -DNO_FPSCR_VALUES -w -Wno-sync-nand -I. -I. \
> -I../.././gcc -I../../../libgcc -I../../../libgcc/. -I../../../libgcc/../gcc -I../../../libgcc/../include \
> -DHAVE_CC_TLS -o _paritysi2.o -MT _paritysi2.o -MD -MP -MF _paritysi2.dep -DL_paritysi2 \
> -c ../../../libgcc/libgcc2.c -fvisibility=hidden -DHIDE_EXPORTS
> during GIMPLE pass: waccess
> ../../../libgcc/libgcc2.c: In function '__muldi3':
> ../../../libgcc/libgcc2.c:538:1: internal compiler error: Segmentation fault
>   538 | }
>       | ^
> during GIMPLE pass: waccess

If this is stage2 or stage3 it hints at a miscompile of the stage2/3
compiler. I'd concentrate on other issues first and suggest using
--disable-bootstrap to see if that gets you to running the testsuite.

Otherwise you need to bisect which stage2/3 object was miscompiled and
then investigate the nature of the miscompilation. A much more tedious
process than addressing remaining testsuite execution FAILs.

Richard.

> This is reproducible on real hardware (Renesas SH-7785LCR, Linux 6.5.0), so it's not an emulation issue.
>
> I have tried to debug this issue with my Linux-SH-enabled GDB fork [2] and got the following backtrace:
>
> (gdb) bt
> #0 0x0109fee4 in wi::add_large(long long*, long long const*, unsigned int, long long const*, unsigned int, unsigned int, signop, wi::overflow_type*) ()
> #1 0x00bdbc10 in access_ref::add_offset(generic_wide_int<fixed_wide_int_storage<128> > const&, generic_wide_int<fixed_wide_int_storage<128> > const&) ()
> #2 0x00bdd0e8 in compute_objsize_r(tree_node*, gimple*, bool, int, access_ref*, ssa_name_limit_t&, pointer_query*) ()
> #3 0x00000000 in ?? ()
> (gdb) display/i $pc
> 1: x/i $pc
> => 0x109fee4 <_ZN2wi9add_largeEPxPKxjS2_jj6signopPNS_13overflow_typeE+84>: mov.l @r2,r3
> (gdb) x/wx $r2
> 0x7c07eaa0: Cannot access memory at address 0x7c07eaa0
> (gdb)
>
> I have also tried disabling late combine for SH by default, but that didn't help:
>
> diff --git a/gcc/config/sh/sh.cc b/gcc/config/sh/sh.cc
> index 280588268ae..dca27893536 100644
> --- a/gcc/config/sh/sh.cc
> +++ b/gcc/config/sh/sh.cc
> @@ -1047,6 +1047,9 @@ sh_override_options_after_change (void)
>         str_align_functions = r;
>       }
>      }
> +
> +  if (!OPTION_SET_P (flag_late_combine_instructions))
> +    flag_late_combine_instructions = 0;
>  }
>  ^L
>  /* Print the operand address in x to the stream. */
>
> Thus, I have now run out of ideas (as I'm not really a compiler expert).
>
> Does anyone else have any other suggestions?
>
> Thanks,
> Adrian
>
> PS: Would be great if we could upstream Linux SH native GDB support from [2]
> as well, in case any binutils-gdb maintainer is reading here.
>
> > [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55212
> > [2] https://github.com/glaubitz/binutils-gdb/tree/linux-sh
>
> --
>  .''`.  John Paul Adrian Glaubitz
> : :' :  Debian Developer
> `. `'   Physicist
>   `-    GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
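
The object-level bisection suggested in the reply above amounts to a binary search over the compiler's object files. What follows is only a rough Python sketch of that idea, not a procedure taken from the thread. It assumes that exactly one object is miscompiled, and it relies on a hypothetical `build_ok` callback that relinks the compiler with suspect-stage copies of the given objects (known-good copies for everything else) and reports whether the result still works. The callback and the file names in the usage note are illustrative.

```python
from typing import Callable, List, Sequence


def bisect_bad_object(objects: Sequence[str],
                      build_ok: Callable[[Sequence[str]], bool]) -> str:
    """Return the one object whose suspect-stage copy breaks the compiler.

    build_ok(suspect) is a hypothetical callback: it should relink the
    compiler with the suspect-stage copies of `suspect` and known-good
    copies of every other object, then return True if that compiler works.
    Assumes exactly one object is miscompiled.
    """
    candidates: List[str] = list(objects)
    if not candidates:
        raise ValueError("no objects to bisect")
    while len(candidates) > 1:
        half = candidates[:len(candidates) // 2]
        if build_ok(half):
            # The first half is clean, so the culprit is in the remainder.
            candidates = candidates[len(half):]
        else:
            # The failure reproduces with the first half alone.
            candidates = half
    return candidates[0]
```

With a stand-in predicate such as `lambda suspect: "wide-int.o" not in suspect`, the function narrows a list of object names down to `wide-int.o` in a logarithmic number of relinks. If several objects are miscompiled, the single-culprit assumption no longer holds and the search would have to be repeated after each fix.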
Hi Richard,

On Thu, 2024-08-22 at 12:49 +0200, Richard Biener wrote:
> If this is stage2 or stage3 it hints at a miscompile of the stage2/3
> compiler. I'd concentrate on other issues first and suggest using
> --disable-bootstrap to see if that gets you to running the testsuite.

I have actually done that, just forgot to mention it here. The problem
is that every test is failing so far. I'm testing on real hardware which
is why it's been running for a few days already.

I'm afraid I might have done something wrong running the tests.

> Otherwise you need to bisect which stage2/3 object was miscompiled and
> then investigate the nature of the miscompilation. A much more tedious
> process than addressing remaining testsuite execution FAILs.

I'm not sure that bisecting works here as I suspect the issue is a result
of the LRA switch.

Adrian
On Thu, Aug 22, 2024 at 12:54 PM John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> wrote:
>
> Hi Richard,
>
> On Thu, 2024-08-22 at 12:49 +0200, Richard Biener wrote:
> > If this is stage2 or stage3 it hints at a miscompile of the stage2/3
> > compiler. I'd concentrate on other issues first and suggest using
> > --disable-bootstrap to see if that gets you to running the testsuite.
>
> I have actually done that, just forgot to mention it here. The problem
> is that every test is failing so far. I'm testing on real hardware which
> is why it's been running for a few days already.
>
> I'm afraid I might have done something wrong running the tests.
>
> > Otherwise you need to bisect which stage2/3 object was miscompiled and
> > then investigate the nature of the miscompilation. A much more tedious
> > process than addressing remaining testsuite execution FAILs.
>
> I'm not sure that bisecting works here as I suspect the issue is a result
> of the LRA switch.

For sure. Still, debugging/fixing the testsuite issue will be much easier.

Does an int main(){} also segfault?

Richard.

> Adrian
>
> --
>  .''`.  John Paul Adrian Glaubitz
> : :' :  Debian Developer
> `. `'   Physicist
>   `-    GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
On Thu, 2024-08-22 at 13:05 +0200, Richard Biener wrote:
> > I'm not sure that bisecting works here as I suspect the issue is a result
> > of the LRA switch.
>
> For sure. Still, debugging/fixing the testsuite issue will be much easier.
>
> Does an int main(){} also segfault?

I can run the LRA-enabled GCC normally, if you mean that:

(unstable-sh4-sbuild)glaubitz@acrux:/srv/glaubitz/gcc/build$ ./prev-gcc/xgcc --version
xgcc (GCC) 15.0.0 20240818 (experimental)
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

(unstable-sh4-sbuild)glaubitz@acrux:/srv/glaubitz/gcc/build$ ./gcc/xgcc --version
xgcc (GCC) 15.0.0 20240818 (experimental)
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

(unstable-sh4-sbuild)glaubitz@acrux:/srv/glaubitz/gcc/build$

Adrian
On Thu, Aug 22, 2024 at 1:35 PM John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> wrote:
>
> On Thu, 2024-08-22 at 13:05 +0200, Richard Biener wrote:
> > > I'm not sure that bisecting works here as I suspect the issue is a result
> > > of the LRA switch.
> >
> > For sure. Still, debugging/fixing the testsuite issue will be much easier.
> >
> > Does an int main(){} also segfault?
>
> I can run the LRA-enabled GCC normally, if you mean that:
>
> (unstable-sh4-sbuild)glaubitz@acrux:/srv/glaubitz/gcc/build$ ./prev-gcc/xgcc --version
> xgcc (GCC) 15.0.0 20240818 (experimental)
> Copyright (C) 2024 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions. There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

OK, then that compiler also successfully built the stage1 target libraries.

But no, I meant that if you build an int main(){} program and run _that_,
does it segfault? That is, I suspect there is something broken with
compiling memory accesses. Check the testsuite. Best see what tests pass
with LRA disabled and then enable LRA and see which tests then fail.

Richard.

> (unstable-sh4-sbuild)glaubitz@acrux:/srv/glaubitz/gcc/build$ ./gcc/xgcc --version
> xgcc (GCC) 15.0.0 20240818 (experimental)
> Copyright (C) 2024 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions. There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>
> (unstable-sh4-sbuild)glaubitz@acrux:/srv/glaubitz/gcc/build$
>
> Adrian
>
> --
>  .''`.  John Paul Adrian Glaubitz
> : :' :  Debian Developer
> `. `'   Physicist
>   `-    GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
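
The comparison suggested above (see which tests pass with LRA disabled, then see which of them fail with LRA enabled) can be done by diffing the DejaGnu `.sum` files from the two runs. The Python sketch below is one possible way to do that, not something posted in the thread; GCC's own `contrib/compare_tests` script covers the same ground more thoroughly. The file names in the usage note are placeholders.

```python
import re
import sys
from typing import Dict, Iterable, List

# DejaGnu summary lines look like "PASS: gcc.dg/foo.c (test for excess errors)".
RESULT_RE = re.compile(
    r"^(PASS|FAIL|XPASS|XFAIL|UNRESOLVED|UNSUPPORTED|UNTESTED): (.*)$")


def parse_sum(lines: Iterable[str]) -> Dict[str, str]:
    """Map each test name in a .sum file to its reported status."""
    results: Dict[str, str] = {}
    for line in lines:
        match = RESULT_RE.match(line.rstrip("\n"))
        if match:
            results[match.group(2)] = match.group(1)
    return results


def regressions(baseline: Dict[str, str],
                candidate: Dict[str, str]) -> List[str]:
    """Tests that PASS in the baseline but FAIL or are UNRESOLVED in the candidate."""
    return sorted(name for name, status in baseline.items()
                  if status == "PASS"
                  and candidate.get(name) in ("FAIL", "UNRESOLVED"))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: compare_sum.py <baseline.sum> <candidate.sum>
    with open(sys.argv[1]) as base_file, open(sys.argv[2]) as cand_file:
        for test_name in regressions(parse_sum(base_file), parse_sum(cand_file)):
            print(test_name)
```

Run as, say, `python3 compare_sum.py lra-off/gcc.sum lra-on/gcc.sum` (both paths hypothetical), it prints the tests that regressed once LRA was enabled. Tests that are missing from the second run are not reported, so a run that aborted early needs a separate check.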