RFH: Debugging GCC segfault with LRA-enabled SH backend

Message ID 1492f7fed6cea211a174ad171f43cb0561eb1a37.camel@physik.fu-berlin.de
State New

Commit Message

John Paul Adrian Glaubitz Aug. 22, 2024, 10:31 a.m. UTC
(Please CC me in the replies, I am not subscribed to the list)

Hi,

I am currently trying to switch the SH backend to use the LRA register allocator
by default with the help of patches by Oleg and Kaz (CC'ed) to address various
issues when using LRA by default. The patches can all be found in the corresponding
Bugzilla report [1].

Currently, I have applied the following patches from the bug report:

- 58832
- 58833
- 58883
- 58905

plus the following change to enable LRA by default:


Thus, I have now run out of ideas (as I'm not really a compiler expert).

Does anyone else have any other suggestions?

Thanks,
Adrian

PS: Would be great if we could upstream Linux SH native GDB support from [2]
    as well, in case any binutils-gdb maintainer is reading here.

> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55212
> [2] https://github.com/glaubitz/binutils-gdb/tree/linux-sh

Comments

Richard Biener Aug. 22, 2024, 10:49 a.m. UTC | #1
On Thu, Aug 22, 2024 at 12:32 PM John Paul Adrian Glaubitz
<glaubitz@physik.fu-berlin.de> wrote:
>
> (Please CC me in the replies, I am not subscribed to the list)
>
> Hi,
>
> I am currently trying to switch the SH backend to use the LRA register allocator
> by default with the help of patches by Oleg and Kaz (CC'ed) to address various
> issues when using LRA by default. The patches can all be found in the corresponding
> Bugzilla report [1].
>
> Currently, I have applied the following patches from the bug report:
>
> - 58832
> - 58833
> - 58883
> - 58905
>
> plus the following change to enable LRA by default:
>
> diff --git a/gcc/config/sh/sh.opt b/gcc/config/sh/sh.opt
> index c44cfe70cb1..718dfb744ff 100644
> --- a/gcc/config/sh/sh.opt
> +++ b/gcc/config/sh/sh.opt
> @@ -299,5 +299,5 @@ Target Var(TARGET_FSRRA)
>  Enable the use of the fsrra instruction.
>
>  mlra
> -Target Var(sh_lra_flag) Init(0) Save
> +Target Var(sh_lra_flag) Init(1) Save
>  Use LRA instead of reload (transitional).
>
> With these changes applied, I have configured and built GCC from Git as follows:
>
> # ../configure --disable-multilib --enable-multiarch --enable-bootstrap --enable-languages=c,c++
> # make -j32
>
> which fails on QEMU with a segmentation fault:
>
> /srv/glaubitz/gcc/build/./gcc/xgcc -B/srv/glaubitz/gcc/build/./gcc/ -B/usr/local/sh4-unknown-linux-gnu/bin/ \
> -B/usr/local/sh4-unknown-linux-gnu/lib/ -isystem /usr/local/sh4-unknown-linux-gnu/include -isystem \
> /usr/local/sh4-unknown-linux-gnu/sys-include   -fno-checking -g -O2 -O2  -g -O2 -DIN_GCC   \
> -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wstrict-prototypes -Wmissing-prototypes \
> -Wold-style-definition  -isystem ./include  -fpic -DNO_FPSCR_VALUES -w -Wno-sync-nand -g -DIN_LIBGCC2 \
> -fbuilding-libgcc -fno-stack-protector  -fpic -DNO_FPSCR_VALUES -w -Wno-sync-nand -I. -I. \
> -I../.././gcc -I../../../libgcc -I../../../libgcc/. -I../../../libgcc/../gcc -I../../../libgcc/../include  \
> -DHAVE_CC_TLS   -o _paritysi2.o -MT _paritysi2.o -MD -MP -MF _paritysi2.dep -DL_paritysi2 \
> -c ../../../libgcc/libgcc2.c -fvisibility=hidden -DHIDE_EXPORTS
> during GIMPLE pass: waccess
> ../../../libgcc/libgcc2.c: In function '__muldi3':
> ../../../libgcc/libgcc2.c:538:1: internal compiler error: Segmentation fault
>   538 | }
>       | ^
> during GIMPLE pass: waccess

If this is stage2 or stage3, it hints at a miscompile of the stage2/3
compiler.  I'd concentrate on other issues first and suggest using
--disable-bootstrap to see if that gets you to running the testsuite.

Otherwise you need to bisect which stage2/3 object was miscompiled and
then investigate the nature of the miscompilation.  That is a much more
tedious process than addressing the remaining testsuite execution FAILs.
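
A minimal sketch of what such an object-level bisection could look like,
under two assumptions that are not established in this thread: exactly one
object is miscompiled, and a caller-supplied check command exists.  The
names bisect_culprit and the check command are hypothetical.  The check
command is expected to relink the compiler with only the listed objects
taken from the suspect (LRA-built) tree and to exit non-zero when the crash
reproduces; how it does that is left open here.

```shell
# bisect_culprit LIST CHECK
#   LIST   file with one object name per line
#   CHECK  command invoked as `CHECK subset-file`; must exit non-zero when
#          the failure reproduces with only the objects in subset-file
#          taken from the suspect build (hypothetical, caller-supplied)
# Assumes exactly one culprit. Variables are global (plain POSIX sh).
bisect_culprit() {
    list=$1
    check=$2
    lo=1
    hi=$(( $(wc -l < "$list") ))
    subset=$(mktemp) || return 1
    while [ "$lo" -lt "$hi" ]; do
        mid=$(( (lo + hi) / 2 ))
        # Try the lower half [lo, mid] of the remaining candidates.
        sed -n "${lo},${mid}p" "$list" > "$subset"
        if "$check" "$subset"; then
            lo=$(( mid + 1 ))   # lower half is clean, culprit is above mid
        else
            hi=$mid             # failure reproduced, culprit is in [lo, mid]
        fi
    done
    rm -f "$subset"
    # Print the single remaining candidate.
    sed -n "${lo}p" "$list"
}
```

Each round halves the candidate list, so a few hundred objects need fewer
than ten relink-and-test cycles, which matters on slow native hardware.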

Richard.

> This is reproducible on real hardware (Renesas SH-7785LCR, Linux 6.5.0), so it's not an emulation issue.
>
> I have tried to debug this issue with my Linux-SH-enabled GDB fork [2] and got the following backtrace:
>
> (gdb) bt
> #0  0x0109fee4 in wi::add_large(long long*, long long const*, unsigned int, long long const*, unsigned int, unsigned int, signop, wi::overflow_type*) ()
> #1  0x00bdbc10 in access_ref::add_offset(generic_wide_int<fixed_wide_int_storage<128> > const&, generic_wide_int<fixed_wide_int_storage<128> > const&) ()
> #2  0x00bdd0e8 in compute_objsize_r(tree_node*, gimple*, bool, int, access_ref*, ssa_name_limit_t&, pointer_query*) ()
> #3  0x00000000 in ?? ()
> (gdb) display/i $pc
> 1: x/i $pc
> => 0x109fee4 <_ZN2wi9add_largeEPxPKxjS2_jj6signopPNS_13overflow_typeE+84>:      mov.l   @r2,r3
> (gdb) x/wx $r2
> 0x7c07eaa0:     Cannot access memory at address 0x7c07eaa0
> (gdb)
>
> I have also tried disabling late combine on SH by default, but that didn't help:
>
> diff --git a/gcc/config/sh/sh.cc b/gcc/config/sh/sh.cc
> index 280588268ae..dca27893536 100644
> --- a/gcc/config/sh/sh.cc
> +++ b/gcc/config/sh/sh.cc
> @@ -1047,6 +1047,9 @@ sh_override_options_after_change (void)
>           str_align_functions = r;
>         }
>      }
> +
> +    if (!OPTION_SET_P (flag_late_combine_instructions))
> +      flag_late_combine_instructions = 0;
>  }
>  ^L
>  /* Print the operand address in x to the stream.  */
>
> Thus, I have now run out of ideas (as I'm not really a compiler expert).
>
> Does anyone else have any other suggestions?
>
> Thanks,
> Adrian
>
> PS: Would be great if we could upstream Linux SH native GDB support from [2]
>     as well, in case any binutils-gdb maintainer is reading here.
>
> > [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55212
> > [2] https://github.com/glaubitz/binutils-gdb/tree/linux-sh
>
> --
>  .''`.  John Paul Adrian Glaubitz
> : :' :  Debian Developer
> `. `'   Physicist
>   `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913
John Paul Adrian Glaubitz Aug. 22, 2024, 10:54 a.m. UTC | #2
Hi Richard,

On Thu, 2024-08-22 at 12:49 +0200, Richard Biener wrote:
> If this is stage2 or stage3 it hints at a miscompile of the stage2/3
> compiler.  I'd concentrate on other
> issues first and suggest to use --disable-bootstrap to see if that
> gets you to running the testsuite.

I have actually done that, just forgot to mention it here. The problem
is that every test is failing so far. I'm testing on real hardware which
is why it's been running for a few days already.

I'm afraid I might have done something wrong running the tests.

> Otherwise you need to bisect which stage2/3 object was miscompiled and
> then investigate the nature
> of the miscompilation.  A much more tedious process than addressing
> remaining testsuite execution
> FAILs.

I'm not sure that bisecting works here as I suspect the issue is a result
of the LRA switch.

Adrian
Richard Biener Aug. 22, 2024, 11:05 a.m. UTC | #3
On Thu, Aug 22, 2024 at 12:54 PM John Paul Adrian Glaubitz
<glaubitz@physik.fu-berlin.de> wrote:
>
> Hi Richard,
>
> On Thu, 2024-08-22 at 12:49 +0200, Richard Biener wrote:
> > If this is stage2 or stage3 it hints at a miscompile of the stage2/3
> > compiler.  I'd concentrate on other
> > issues first and suggest to use --disable-bootstrap to see if that
> > gets you to running the testsuite.
>
> I have actually done that, just forgot to mention it here. The problem
> is that every test is failing so far. I'm testing on real hardware which
> is why it's been running for a few days already.
>
> I'm afraid I might have done something wrong running the tests.
>
> > Otherwise you need to bisect which stage2/3 object was miscompiled and
> > then investigate the nature
> > of the miscompilation.  A much more tedious process than addressing
> > remaining testsuite execution
> > FAILs.
>
> I'm not sure that bisecting works here as I suspect the issue is a result
> of the LRA switch.

For sure.  Still debugging/fixing the testsuite issue will be much easier.

Does an int main(){} also segfault?

Richard.

> Adrian
>
> --
>  .''`.  John Paul Adrian Glaubitz
> : :' :  Debian Developer
> `. `'   Physicist
>   `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913
John Paul Adrian Glaubitz Aug. 22, 2024, 11:35 a.m. UTC | #4
On Thu, 2024-08-22 at 13:05 +0200, Richard Biener wrote:
> > I'm not sure that bisecting works here as I suspect the issue is a result
> > of the LRA switch.
> 
> For sure.  Still debugging/fixing the testsuite issue will be much easier.
> 
> Does an int main(){} also segfault?

I can run the LRA-enabled GCC normally, if you mean that:

(unstable-sh4-sbuild)glaubitz@acrux:/srv/glaubitz/gcc/build$ ./prev-gcc/xgcc --version
xgcc (GCC) 15.0.0 20240818 (experimental)
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

(unstable-sh4-sbuild)glaubitz@acrux:/srv/glaubitz/gcc/build$ ./gcc/xgcc --version
xgcc (GCC) 15.0.0 20240818 (experimental)
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

(unstable-sh4-sbuild)glaubitz@acrux:/srv/glaubitz/gcc/build$

Adrian
Richard Biener Aug. 22, 2024, 11:49 a.m. UTC | #5
On Thu, Aug 22, 2024 at 1:35 PM John Paul Adrian Glaubitz
<glaubitz@physik.fu-berlin.de> wrote:
>
> On Thu, 2024-08-22 at 13:05 +0200, Richard Biener wrote:
> > > I'm not sure that bisecting works here as I suspect the issue is a result
> > > of the LRA switch.
> >
> > For sure.  Still debugging/fixing the testsuite issue will be much easier.
> >
> > Does an int main(){} also segfault?
>
> I can run the LRA-enabled GCC normally, if you mean that:
>
> (unstable-sh4-sbuild)glaubitz@acrux:/srv/glaubitz/gcc/build$ ./prev-gcc/xgcc --version
> xgcc (GCC) 15.0.0 20240818 (experimental)
> Copyright (C) 2024 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions.  There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

OK, then that compiler also successfully built the stage1 target libraries.

But no, I meant: if you build an int main(){} program and run _that_,
does it segfault?  That is, I suspect there is something broken with
compiling memory accesses.

Check the testsuite.  Best see which tests pass with LRA disabled, then
enable LRA and see which tests then fail.
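
One way to do that comparison, sketched as a small shell helper over the
two DejaGnu .sum files produced by the two testsuite runs.  The function
name new_fails and the file names in the usage line are hypothetical; GCC's
own contrib/compare_tests script covers similar ground more thoroughly.

```shell
# new_fails BASELINE.sum CANDIDATE.sum
# Print tests that FAIL in CANDIDATE.sum but did not FAIL in BASELINE.sum.
new_fails() {
    base_tmp=$(mktemp) || return 1
    new_tmp=$(mktemp) || { rm -f "$base_tmp"; return 1; }
    # Keep only the FAIL lines, sorted so that comm(1) can compare them.
    grep '^FAIL: ' "$1" | sort -u > "$base_tmp"
    grep '^FAIL: ' "$2" | sort -u > "$new_tmp"
    # comm -13 prints only the lines unique to the second file.
    comm -13 "$base_tmp" "$new_tmp"
    rm -f "$base_tmp" "$new_tmp"
}
```

Usage, assuming the LRA-disabled run was saved as gcc-nolra.sum and the
LRA-enabled run as gcc-lra.sum:

    new_fails gcc-nolra.sum gcc-lra.sum

The output is the list of regressions attributable to the switch, which is
a smaller and more tractable starting point than a bootstrap failure.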

Richard.

> (unstable-sh4-sbuild)glaubitz@acrux:/srv/glaubitz/gcc/build$ ./gcc/xgcc --version
> xgcc (GCC) 15.0.0 20240818 (experimental)
> Copyright (C) 2024 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions.  There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>
> (unstable-sh4-sbuild)glaubitz@acrux:/srv/glaubitz/gcc/build$
>
> Adrian
>
> --
>  .''`.  John Paul Adrian Glaubitz
> : :' :  Debian Developer
> `. `'   Physicist
>   `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913
Patch

diff --git a/gcc/config/sh/sh.opt b/gcc/config/sh/sh.opt
index c44cfe70cb1..718dfb744ff 100644
--- a/gcc/config/sh/sh.opt
+++ b/gcc/config/sh/sh.opt
@@ -299,5 +299,5 @@  Target Var(TARGET_FSRRA)
 Enable the use of the fsrra instruction.
 
 mlra
-Target Var(sh_lra_flag) Init(0) Save
+Target Var(sh_lra_flag) Init(1) Save
 Use LRA instead of reload (transitional).

With these changes applied, I have configured and built GCC from Git as follows:

# ../configure --disable-multilib --enable-multiarch --enable-bootstrap --enable-languages=c,c++
# make -j32

which fails on QEMU with a segmentation fault:

/srv/glaubitz/gcc/build/./gcc/xgcc -B/srv/glaubitz/gcc/build/./gcc/ -B/usr/local/sh4-unknown-linux-gnu/bin/ \
-B/usr/local/sh4-unknown-linux-gnu/lib/ -isystem /usr/local/sh4-unknown-linux-gnu/include -isystem \
/usr/local/sh4-unknown-linux-gnu/sys-include   -fno-checking -g -O2 -O2  -g -O2 -DIN_GCC   \
-W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wstrict-prototypes -Wmissing-prototypes \
-Wold-style-definition  -isystem ./include  -fpic -DNO_FPSCR_VALUES -w -Wno-sync-nand -g -DIN_LIBGCC2 \
-fbuilding-libgcc -fno-stack-protector  -fpic -DNO_FPSCR_VALUES -w -Wno-sync-nand -I. -I. \
-I../.././gcc -I../../../libgcc -I../../../libgcc/. -I../../../libgcc/../gcc -I../../../libgcc/../include  \
-DHAVE_CC_TLS   -o _paritysi2.o -MT _paritysi2.o -MD -MP -MF _paritysi2.dep -DL_paritysi2 \
-c ../../../libgcc/libgcc2.c -fvisibility=hidden -DHIDE_EXPORTS
during GIMPLE pass: waccess
../../../libgcc/libgcc2.c: In function '__muldi3':
../../../libgcc/libgcc2.c:538:1: internal compiler error: Segmentation fault
  538 | }
      | ^
during GIMPLE pass: waccess

This is reproducible on real hardware (Renesas SH-7785LCR, Linux 6.5.0), so it's not an emulation issue.

I have tried to debug this issue with my Linux-SH-enabled GDB fork [2] and got the following backtrace:

(gdb) bt
#0  0x0109fee4 in wi::add_large(long long*, long long const*, unsigned int, long long const*, unsigned int, unsigned int, signop, wi::overflow_type*) ()
#1  0x00bdbc10 in access_ref::add_offset(generic_wide_int<fixed_wide_int_storage<128> > const&, generic_wide_int<fixed_wide_int_storage<128> > const&) ()
#2  0x00bdd0e8 in compute_objsize_r(tree_node*, gimple*, bool, int, access_ref*, ssa_name_limit_t&, pointer_query*) ()
#3  0x00000000 in ?? ()
(gdb) display/i $pc
1: x/i $pc
=> 0x109fee4 <_ZN2wi9add_largeEPxPKxjS2_jj6signopPNS_13overflow_typeE+84>:      mov.l   @r2,r3
(gdb) x/wx $r2
0x7c07eaa0:     Cannot access memory at address 0x7c07eaa0
(gdb)

I have also tried disabling late combine on SH by default, but that didn't help:

diff --git a/gcc/config/sh/sh.cc b/gcc/config/sh/sh.cc
index 280588268ae..dca27893536 100644
--- a/gcc/config/sh/sh.cc
+++ b/gcc/config/sh/sh.cc
@@ -1047,6 +1047,9 @@  sh_override_options_after_change (void)
          str_align_functions = r;
        }
     }
+
+    if (!OPTION_SET_P (flag_late_combine_instructions))
+      flag_late_combine_instructions = 0;
 }
 ^L
 /* Print the operand address in x to the stream.  */