Message ID | 20240703100855.3855337-1-sebastien.michelland@lcis.grenoble-inp.fr |
---|---|
State | New |
Series | [RFC/PATCH] libgcc: sh: Use soft-fp for non-hosted SH3/SH4 |

On 7/3/24 3:59 AM, Sébastien Michelland wrote:
> libgcc's fp-bit.c is quite slow and most modern/developed architectures
> have switched to using the soft-fp library. This patch does so for
> free-standing/unknown-OS SH3/SH4 builds, using soft-fp's default parameters
> for the most part, most notably no exceptions.
>
> A quick run of Whetstone (built with OpenLibm) on an SH4 machine shows
> about x3 speedup (~320 -> 1050 Kwhets/s).
>
> I'm sending this as RFC because I'm quite unsure about testing. I built
> the compiler and ran the benchmark, but I don't know if GCC has a test
> for soft-fp correctness and whether I can run that in my non-hosted
> environment. Any advice?
>
> Cheers,
> Sébastien
>
> libgcc/ChangeLog:
>
>         * config.host: Use soft-fp library for non-hosted SH3/SH4
>         instead of fdpbit.
>         * config/sh/sfp-machine.h: New.

I'd really like to hear from Oleg on this, though given we're using the
soft-fp library on other targets it seems reasonable at a high level.

As far as testing, the GCC testsuite has some FP components which would
implicitly test soft fp on any target that doesn't have hardware
floating point.

Jeff

On 2024-07-03 17:59, Jeff Law wrote:
> On 7/3/24 3:59 AM, Sébastien Michelland wrote:
>> libgcc's fp-bit.c is quite slow and most modern/developed architectures
>> have switched to using the soft-fp library. This patch does so for
>> free-standing/unknown-OS SH3/SH4 builds, using soft-fp's default
>> parameters
>> for the most part, most notably no exceptions.
>>
>> A quick run of Whetstone (built with OpenLibm) on an SH4 machine shows
>> about x3 speedup (~320 -> 1050 Kwhets/s).
>>
>> I'm sending this as RFC because I'm quite unsure about testing. I built
>> the compiler and ran the benchmark, but I don't know if GCC has a test
>> for soft-fp correctness and whether I can run that in my non-hosted
>> environment. Any advice?
>>
>> Cheers,
>> Sébastien
>>
>> libgcc/ChangeLog:
>>
>>         * config.host: Use soft-fp library for non-hosted SH3/SH4
>>         instead of fdpbit.
>>         * config/sh/sfp-machine.h: New.
> I'd really like to hear from Oleg on this, though given we're using the
> soft-fp library on other targets it seems reasonable at a high level.
>
> As far as testing, the GCC testsuite has some FP components which would
> implicitly test soft fp on any target that doesn't have hardware
> floating point.

Thank you. I went this route, following the guide [1] and the
instructions for cross-compiling [2] before hitting "Newlib does not
support CPU sh3eb" which I should have seen coming.

There are plenty of random ports lying around but just grabbing one
doesn't feel right (and I don't have a canonical one to go to as I
usually run a custom libc for... mostly bad reasons).

Deferring maybe again to the few SH users... how do you usually do it?

Sébastien

[1] https://gcc.gnu.org/install/test.html
[2] https://gcc.gnu.org/simtest-howto.html

Hi!

On Wed, 2024-07-03 at 19:28 +0200, Sébastien Michelland wrote:
> On 2024-07-03 17:59, Jeff Law wrote:
> > On 7/3/24 3:59 AM, Sébastien Michelland wrote:
> > > libgcc's fp-bit.c is quite slow and most modern/developed architectures
> > > have switched to using the soft-fp library. This patch does so for
> > > free-standing/unknown-OS SH3/SH4 builds, using soft-fp's default
> > > parameters
> > > for the most part, most notably no exceptions.
> > >
> > > A quick run of Whetstone (built with OpenLibm) on an SH4 machine shows
> > > about x3 speedup (~320 -> 1050 Kwhets/s).
> > >
> > > I'm sending this as RFC because I'm quite unsure about testing. I built
> > > the compiler and ran the benchmark, but I don't know if GCC has a test
> > > for soft-fp correctness and whether I can run that in my non-hosted
> > > environment. Any advice?
> > >
> > > Cheers,
> > > Sébastien
> > >
> > > libgcc/ChangeLog:
> > >
> > >         * config.host: Use soft-fp library for non-hosted SH3/SH4
> > >         instead of fdpbit.
> > >         * config/sh/sfp-machine.h: New.
> > I'd really like to hear from Oleg on this, though given we're using the
> > soft-fp library on other targets it seems reasonable at a high level.

I don't understand why this is being limited to SH3 and SH4 only?
Almost all SH4 systems out there have an FPU (unless special configurations
are used). So I'd say if switching to soft-fp, then for SH-anything, not
just SH3/SH4.

If it yields some improvements for some users, I'm all for it.

> > As far as testing, the GCC testsuite has some FP components which would
> > implicitly test soft fp on any target that doesn't have hardware
> > floating point.
>
> Thank you. I went this route, following the guide [1] and the
> instructions for cross-compiling [2] before hitting "Newlib does not
> support CPU sh3eb" which I should have seen coming.
>
> There are plenty of random ports lying around but just grabbing one
> doesn't feel right (and I don't have a canonical one to go to as I
> usually run a custom libc for... mostly bad reasons).
>
> Deferring maybe again to the few SH users... how do you usually do it?
>

I think it would make sense to test it using sh-sim on SH2 big-endian and
little endian at least, as that doesn't have an FPU and hence would run
tests utilizing soft-fp.

After building the toolchain for --target=sh-elf, you can use this to run
the testsuite in the simulator:

make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb}"

(add make -j parameter according to your needs -- it will be slow)

Let me know if you have any further questions.

Best regards,
Oleg Endo

Hi Oleg!

> I don't understand why this is being limited to SH3 and SH4 only?
> Almost all SH4 systems out there have an FPU (unless special configurations
> are used). So I'd say if switching to soft-fp, then for SH-anything, not
> just SH3/SH4.
>
> If it yields some improvements for some users, I'm all for it.

Yeah I just defaulted to SH3/SH4 conservatively because that's the only
hardware I have. (My main platform also happens to be one of these SH4
without an FPU, the SH4AL-DSP.)

Once this is tested/validated on simulator, I'll happily simplify the
patch to apply to all SH.

> I think it would make sense to test it using sh-sim on SH2 big-endian and
> little endian at least, as that doesn't have an FPU and hence would run
> tests utilizing soft-fp.
>
> After building the toolchain for --target=sh-elf, you can use this to run
> the testsuite in the simulator:
>
> make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb}"
>
> (add make -j parameter according to your needs -- it will be slow)

Alright, it might take a little bit.

Building the combined tree of gcc/binutils/newlib masters (again
following [1]) gives me an ICE in libstdc++-v3/src/libbacktrace,
irrespective of my libgcc change:

---
during RTL pass: final
elf.c: In function ‘elf_zstd_decompress’:
elf.c:4999:1: internal compiler error: in output_296, at config/sh/sh.md:8408
 4999 | }
      | ^
0x1c8765e internal_error(char const*, ...)
        ../../combined/gcc/diagnostic-global-context.cc:491
0x881269 fancy_abort(char const*, int, char const*)
        ../../combined/gcc/diagnostic.cc:1725
0x83b73b output_296
        ../../combined/gcc/config/sh/sh.md:8408
0x83b73b output_296
        ../../combined/gcc/config/sh/sh.md:8063
0xb783c2 final_scan_insn_1
        ../../combined/gcc/final.cc:2773
0xb78938 final_scan_insn(rtx_insn*, _IO_FILE*, int, int, int*)
        ../../combined/gcc/final.cc:2886
0xb78b5f final_1
        ../../combined/gcc/final.cc:1977
0xb796a8 rest_of_handle_final
        ../../combined/gcc/final.cc:4239
0xb796a8 execute
        ../../combined/gcc/final.cc:4317
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
make[9]: *** [Makefile:628: std_stacktrace-elf.lo] Error 1
make[9]: *** Waiting for unfinished jobs....
make[9]: Leaving directory '/home/el/Programs/sh-elf-gcc/build-combined2/sh-elf/m2a/libstdc++-v3/src/libbacktrace'
---

My configure, for reference (--disable-source-highlight came up from a
configure error earlier):

../combined/configure \
  --prefix="$PREFIX" \
  --target="sh-elf" \
  --enable-languages="c,c++" \
  --disable-gdb \
  --disable-source-highlight

The libbacktrace build in gcc (make all-libbacktrace) works without an
issue.

I'll have to prepare a bug report (I couldn't find anything related),
but bisecting on a triplet of repos doesn't sound very fun and I believe
I do need the newlib to build libstdc++ in a reproducible way. Any
advice before I go on that tangent?

Sébastien

[1] https://gcc.gnu.org/simtest-howto.html

On 7/5/24 1:28 AM, Sébastien Michelland wrote:
> Hi Oleg!
>
>> I don't understand why this is being limited to SH3 and SH4 only?
>> Almost all SH4 systems out there have an FPU (unless special
>> configurations
>> are used). So I'd say if switching to soft-fp, then for SH-anything, not
>> just SH3/SH4.
>>
>> If it yields some improvements for some users, I'm all for it.
>
> Yeah I just defaulted to SH3/SH4 conservatively because that's the only
> hardware I have. (My main platform also happens to be one of these SH4
> without an FPU, the SH4AL-DSP.)
>
> Once this is tested/validated on simulator, I'll happily simplify the
> patch to apply to all SH.
>
>> I think it would make sense to test it using sh-sim on SH2 big-endian and
>> little endian at least, as that doesn't have an FPU and hence would run
>> tests utilizing soft-fp.
>>
>> After building the toolchain for --target=sh-elf, you can use this to run
>> the testsuite in the simulator:
>>
>> make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb}"
>>
>> (add make -j parameter according to your needs -- it will be slow)
>
> Alright, it might take a little bit.
>
> Building the combined tree of gcc/binutils/newlib masters (again
> following [1]) gives me an ICE in libstdc++-v3/src/libbacktrace,
> irrespective of my libgcc change:

This is almost certainly a poorly written pattern. I just fixed a bunch
of these, but not this one. Essentially a recent change in the generic
parts of the compiler is exposing some bugs in the SH backend.

Specifically:

> ;; Store (negated) T bit as all zeros or ones in a reg.
> ;;      subc    Rn,Rn   ! Rn = Rn - Rn - T; T = T
> ;;      not     Rn,Rn   ! Rn = 0 - Rn
> ;;
> ;; Note the call to sh_split_treg_set_expr may clobber
> ;; the T reg. We must express this, even though it's
> ;; not immediately obvious this pattern changes the
> ;; T register.
> (define_insn_and_split "mov_neg_si_t"
>   [(set (match_operand:SI 0 "arith_reg_dest" "=r")
>         (neg:SI (match_operand 1 "treg_set_expr")))
>    (clobber (reg:SI T_REG))]
>   "TARGET_SH1"
> {
>   gcc_assert (t_reg_operand (operands[1], VOIDmode));
>   return "subc %0,%0";
> }
>   "&& can_create_pseudo_p () && !t_reg_operand (operands[1], VOIDmode)"
>   [(const_int 0)]
> {
>   sh_treg_insns ti = sh_split_treg_set_expr (operands[1], curr_insn);
>   emit_insn (gen_mov_neg_si_t (operands[0], get_t_reg_rtx ()));
>
>   if (ti.remove_trailing_nott ())
>     emit_insn (gen_one_cmplsi2 (operands[0], operands[0]));
>
>   DONE;
> }
>   [(set_attr "type" "arith")])

As written this pattern could match after register allocation is
complete and thus we can't create new pseudos (the condition TARGET_SH1
controls that behavior). operands[1] won't necessarily be the T register
in that case. The split condition fails because we can't create new
pseudos, so it's left as-is. At final assembly time the assertion
triggers.

The "&& can_create_pseudo_p ()" part of the split condition should be
moved into the main condition. I think that's all that's necessary to
fix this problem.

It'd probably be best if Oleg went through the various
define_insn_and_split patterns that utilize can_create_pseudo_p in their
split condition and evaluated them. I only fixed the most obvious cases
in my change from this morning. I don't typically work on the SH port
and for changes which aren't obviously correct, Oleg is in a better
position to evaluate the proper fix.

jeff

Hi,

( For some weird reason I keep losing Sebastien's messages ... )

On Sat, 2024-07-06 at 07:35 -0600, Jeff Law wrote:
>
> On 7/5/24 1:28 AM, Sébastien Michelland wrote:
> > Hi Oleg!
> >
> > > I don't understand why this is being limited to SH3 and SH4 only?
> > > Almost all SH4 systems out there have an FPU (unless special
> > > configurations
> > > are used). So I'd say if switching to soft-fp, then for SH-anything, not
> > > just SH3/SH4.
> > >
> > > If it yields some improvements for some users, I'm all for it.
> >
> > Yeah I just defaulted to SH3/SH4 conservatively because that's the only
> > hardware I have. (My main platform also happens to be one of these SH4
> > without an FPU, the SH4AL-DSP.)

Oh, wow, especially rare type!

> >
> > Once this is tested/validated on simulator, I'll happily simplify the
> > patch to apply to all SH.

The default sh-elf configuration has no multi-libs for SH3 and SH4 variants
without FPU (from what I can see). So it won't use soft-fp so much during
sim testing. So please change to soft-fp for sh*, not just SH3/SH4.

> >
> > > I think it would make sense to test it using sh-sim on SH2 big-endian and
> > > little endian at least, as that doesn't have an FPU and hence would run
> > > tests utilizing soft-fp.
> > >
> > > After building the toolchain for --target=sh-elf, you can use this to run
> > > the testsuite in the simulator:
> > >
> > > make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb}"
> > >
> > > (add make -j parameter according to your needs -- it will be slow)
> >
> > Alright, it might take a little bit.
> >
> > Building the combined tree of gcc/binutils/newlib masters (again
> > following [1])
>

I have never built the toolchain using a combined tree. Like you said, it's
difficult to debug and so on. I've only built it separately and never had
any issues with this approach on multiple platforms/targets.

Here's an old proposed change to the simtest instructions to not use
combined trees:

https://gcc.gnu.org/pipermail/gcc-patches/attachments/20140815/fb38918e/attachment.bin

>
> This is almost certainly a poorly written pattern. I just fixed a bunch
> of these, but not this one. Essentially a recent change in the generic
> parts of the compiler is exposing some bugs in the SH backend.

The patterns were written and tested to the best of our knowledge at that
time many years ago. Nobody thought that we'll get a 2nd combine pass after
RA. Anyway, I'll have a look at the remaining patterns.

Sebastien, in the meantime you could also try out and test your changes on
the latest GCC 14 branch, which shouldn't have those issues.

Best regards,
Oleg Endo

Hi!

> The default sh-elf configuration has no multi-libs for SH3 and SH4 variants
> without FPU (from what I can see). So it won't use soft-fp so much during
> sim testing. So please change to soft-fp for sh*, not just SH3/SH4.

Got it, done that locally, and will update patch once tested.

> Here's an old proposed change to the simtest instructions to not use
> combined trees:
>
> https://gcc.gnu.org/pipermail/gcc-patches/attachments/20140815/fb38918e/attachment.bin

Thanks for the instructions. Apologies for the back-and-forth as I'm
pretty new with this infrastructure (I usually do research stuff on LLVM).

The split-tree build goes better, still fails with GCC 15 (as expected,
though somehow my custom toolchain did build originally) and sort of
works with GCC 14.

The binutils/gdb repos have been merged since that attachment, and
while I can build binutils only with --disable-gdb, building gdb (in
another build folder, reconfiguring from scratch) seems iffy. The global
CFLAGS/CXXFLAGS to switch to 32-bit affects at least parts of binutils,
resulting in a broken toolchain due to architecture mixup:

---
% sh-elf-g++ /tmp/test.cc -o /tmp/test
$PREFIX/lib/gcc/sh-elf/14.1.1/../../../../sh-elf/bin/ld: $PREFIX/libexec/gcc/sh-elf/14.1.1/liblto_plugin.so: error loading plugin: $PREFIX/libexec/gcc/sh-elf/14.1.1/liblto_plugin.so: wrong ELF class: ELFCLASS64
---

My first build kept GDB as 64-bit, but running the test binary in
sh-elf-run gave a Bus error. Even with the 32-bit GDB build, ignoring
the broken toolchain and running that old binary still gives a Bus error.

For reference, here's my latest attempt.

---
% cd ${TOP}
% git clone git://sourceware.org/git/binutils-gdb.git
% git clone https://sourceware.org/git/newlib-cygwin.git newlib
% ln -sf PATH_TO_MY_GCC_14.1 gcc

% cd ${TOP}/build-binutils
% ../binutils-gdb/configure --target=sh-elf --prefix="$PREFIX" --disable-nls --disable-werror --disable-gdb
% make all && make install

% cd ${TOP}/build-gcc
% ../gcc/configure --target=sh-elf --prefix="$PREFIX" --enable-languages=c,c++ --disable-nls --disable-werror --with-newlib --enable-lto --enable-multilib
% make all-gcc && make install-gcc

% cd ${TOP}/build-newlib
% ../newlib/configure --target=sh-elf --prefix="$PREFIX" --enable-lto --enable-multilib
% make all && make install

% cd ${TOP}/build-gcc
% rm -rf *
% ../gcc/configure --target=sh-elf --prefix="$PREFIX" --enable-languages=c,c++ --disable-nls --disable-werror --with-newlib --enable-lto --enable-multilib
% make all && make install

% cd ${TOP}/build-gdb
% CFLAGS="-O2 -m32 -msse -mfpmath=sse" CXXFLAGS="-O2 -m32 -msse -mfpmath=sse" ../binutils-gdb/configure --target=sh-elf --prefix="$PREFIX" --enable-interwork --enable-multilib --disable-nls --disable-werror
% make all && make install
---

>>> Yeah I just defaulted to SH3/SH4 conservatively because that's the only
>>> hardware I have. (My main platform also happens to be one of these SH4
>>> without an FPU, the SH4AL-DSP.)
>
> Oh, wow, especially rare type!

How active are the main types? Like are there still new products
designed with these (maybe the J2)?

> The patterns were written and tested to the best of our knowledge at that
> time many years ago. Nobody thought that we'll get a 2nd combine pass after
> RA. Anyway, I'll have a look at the remaining patterns.

I'd be interested to learn more about the history of the SH backend, if
anyone wrote that up somewhere...

Thanks again,
Sébastien

On 7/6/24 6:12 PM, Oleg Endo wrote:
>
>> This is almost certainly a poorly written pattern. I just fixed a bunch
>> of these, but not this one. Essentially a recent change in the generic
>> parts of the compiler is exposing some bugs in the SH backend.
>
> The patterns were written and tested to the best of our knowledge at that
> time many years ago. Nobody thought that we'll get a 2nd combine pass after
> RA. Anyway, I'll have a look at the remaining patterns.

My comment wasn't meant to disparage you or anyone. Just to note that
the patterns need fixing as they are incorrect.

We have certainly had cases where the model used by those patterns would
cause problems in the past. Hard register cprop, compare elimination
and others could trigger the same kind of problem. late-combine is just
more likely to expose these kinds of latent bugs.

Jeff

Hi,

> > > > The default sh-elf configuration has no multi-libs for SH3 and SH4 variants
> > > > without FPU (from what I can see). So it won't use soft-fp so much during
> > > > sim testing. So please change to soft-fp for sh*, not just SH3/SH4.
> >
> > Got it, done that locally, and will update patch once tested.
> >
> > > > Here's an old proposed change to the simtest instructions to not use
> > > > combined trees:
> > > >
> > > > https://gcc.gnu.org/pipermail/gcc-patches/attachments/20140815/fb38918e/attachment.bin
> >
> > Thanks for the instructions. Apologies for the back-and-forth as I'm
> > pretty new with this infrastructure (I usually do research stuff on LLVM).

No need to apologize. I know this is a tedious and annoying thing to go
through and there is only very little useful information out there.

> > The split-tree build goes better, still fails with GCC 15 (as expected,
> > though somehow my custom toolchain did build originally) and sort of
> > works with GCC 14.
> > The binutils/gdb repos have been merged since that attachment, and
> > while I can build binutils only with --disable-gdb, building gdb (in
> > another build folder, reconfiguring from scratch) seems iffy. The global
> > CFLAGS/CXXFLAGS to switch to 32-bit affects at least parts of binutils,
> > resulting in a broken toolchain due to architecture mixup:

It shouldn't be needed to build GDB separately or to specify the -m32 flags.
Not sure why you have to do that.

I've just tried the following configure lines:

binutils-gdb (binutils-2_41-release)
<..>/configure --target=sh-elf --prefix=/usr/local --disable-nls --disable-werror --enable-initfini-array

gcc (any version)
<..>/configure --target=sh-elf --prefix=/usr/local --enable-languages=c,c++,lto --disable-nls --disable-werror --with-newlib --enable-lto --enable-multilib --with-system-zlib --disable-libstdcxx-verbose --disable-symvers

newlib (latest)
CFLAGS_FOR_TARGET="-Wno-error=implicit-function-declaration -Wno-implicit-int -ffunction-sections -fdata-sections -flto" <..>/newlib/configure --host=sh-elf --target=sh-elf --prefix=/usr/local --enable-multilib --enable-newlib-io-c99-formats

Note that the latest newlib version will try to create multilib directories
one directory above its current build directory for some reason. So just
create another sub-directory in the build directory and do the config and
build from there. Other than that, the build steps are the same as before.

I could reproduce the issue with the latest GCC when building libstdc++.
I'm working on a fix for it.

Unfortunately I'm also getting the SIGBUS error when running a C++ program
that uses std::cout / std::cerr. To be honest, I don't remember what the
issue was/is, whether this has ever worked at all or not. I've tried
rewinding everything back ~10 years ago but was still getting the same
error. Using printf from the simulator seems to work fine though.

So I guess a bunch of C++ tests of the GCC testsuite will fail on the
simulator, but that could be tolerable -- it never passed all the tests on
the simulator anyway. It's still a good way to test for regressions that
could be introduced by a patch.

> > How active are the main types? Like are there still new products
> > designed with these (maybe the J2)?

There is some activity on the software side which mainly stems from folks
using old parts and systems. I'd say the biggest activity is now people
hammering on Sega 32X (SH2), Saturn (SH2) and Dreamcast (SH4), but I might
be biased here.

As for new hardware, I'm not sure. Apparently it's still possible to
license SH4A(+FPU) and SH4AL-DSP IP cores from Renesas, but I doubt anybody
is really doing that. Some parts are still being manufactured, like SH2A
for some niche applications. Don't know what j-core people are up to these
days. Some of the SH MCUs have been re-implemented as open source gateware
for the MiSTer FPGA project.

> >
> > I'd be interested to learn more about the history of the SH backend, if
> > anyone wrote that up somewhere...
> >

From what I know it started during the earlier cygwin days in the 90s,
originally contracted by Hitachi to complement their own in-house C compiler
and also to allow sh-linux to happen at some point. It was entertained by
Renesas for a while through further contracted support work but eventually
they have abandoned it. STmicro was also a licensee of the SH4 CPU for
their TV set top boxes and had a few guys submitting patches now and then
for a while. But the whole thing basically went on life support about 10
years ago.

Perhaps Jeff or others can give more insight on the historical parts.

Best regards,
Oleg Endo

On 7/8/24 5:52 AM, Oleg Endo wrote:
>
> From what I know it started during the earlier cygwin days in the 90s,
> originally contracted by Hitachi to complement their own in-house C compiler
> and also to allow sh-linux to happen at some point. It was entertained by
> Renesas for a while through further contracted support work but eventually
> they have abandoned it. STmicro was also a licensee of the SH4 CPU for
> their TV set top boxes and had a few guys submitting patches now and then
> for a while. But the whole thing basically went on life support about 10
> years ago.
>
> Perhaps Jeff or others can give more insight on the historical parts.

IIRC Joern Rennecke was doing most of the GCC work for SH back in the
90s, with perhaps Steve Chamberlain pitching in (though I think he did
more on the BFD side). I never really did anything with it as I was
focused more on other systems.

Jeff

Hi again!

> It shouldn't be needed to build GDB separately or to specify the -m32 flags.
> Not sure why you have to do that.

It was in the document you sent, especially some warning about
sh-elf-run not working on 64-bit hosts. Guess that's solved by now.

> I've just tried the following configure lines: (...)

Thanks, this flow worked for me as well. I'm now running the tests;
quite a few fail, all C++. I'll have to control for without/with my
change, try a few architectures, and report back. (Wow it's slow.)

> There is some activity on the software side which mainly stems from folks
> using old parts and systems. I'd say the biggest activity is now people
> hammering on Sega 32X (SH2), Saturn (SH2) and Dreamcast (SH4), but I might
> be biased here.

I come from a community that works on CASIO graphing calculators. CASIO
is still producing new models with SH4AL-DSP cores (SH7305), including
some this year and next year. Seeing the latest updates they don't seem
to have any intention to change it anytime soon either. Not a large
community (I'd say a few dozen rotating people a year) but still there.

> From what I know it started during the earlier cygwin days in the 90s,
> originally contracted by Hitachi to complement their own in-house C compiler
> and also to allow sh-linux to happen at some point. It was entertained by
> Renesas for a while through further contracted support work but eventually
> they have abandoned it. STmicro was also a licensee of the SH4 CPU for
> their TV set top boxes and had a few guys submitting patches now and then
> for a while. But the whole thing basically went on life support about 10
> years ago.

Thanks for the insight. It's quite telling that things are still working
after 10 years of "life support". :-)

Sébastien

On Wed, 2024-07-03 at 11:59 +0200, Sébastien Michelland wrote:
> libgcc's fp-bit.c is quite slow and most modern/developed architectures
> have switched to using the soft-fp library. This patch does so for
> free-standing/unknown-OS SH3/SH4 builds, using soft-fp's default parameters
> for the most part, most notably no exceptions.
>
> A quick run of Whetstone (built with OpenLibm) on an SH4 machine shows
> about x3 speedup (~320 -> 1050 Kwhets/s).
>
> I'm sending this as RFC because I'm quite unsure about testing. I built
> the compiler and ran the benchmark, but I don't know if GCC has a test
> for soft-fp correctness and whether I can run that in my non-hosted
> environment. Any advice?
>

As discussed, the patch was changed to use soft-fp not only for SH3/SH4 but
generally for all sh-elf targets. sh-*-linux* and sh-*-rtems* are not
affected by that and continue using the fdpbit library for floating-point
emulation on no-fpu variants. If that should be changed as well, please let
me know.

Tested with

make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb}"

committed & pushed the attached version to master.

Best regards,
Oleg Endo

diff --git a/libgcc/config.host b/libgcc/config.host
index 9fae51d4c..fee3bf0c0 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -1399,7 +1399,15 @@ s390x-ibm-tpf*)
 	md_unwind_header=s390/tpf-unwind.h
 	;;
 sh-*-elf* | sh[12346l]*-*-elf*)
-	tmake_file="$tmake_file sh/t-sh t-crtstuff-pic t-fdpbit"
+	tmake_file="$tmake_file sh/t-sh t-crtstuff-pic"
+	case ${host} in
+	sh[34]*-*-elf*)
+		tmake_file="${tmake_file} t-softfp-sfdf t-softfp"
+		;;
+	*)
+		tmake_file="${tmake_file} t-fdpbit"
+		;;
+	esac
 	extra_parts="$extra_parts crt1.o crti.o crtn.o crtbeginS.o crtendS.o \
 		libic_invalidate_array_4-100.a \
 		libic_invalidate_array_4-200.a \
diff --git a/libgcc/config/sh/sfp-machine.h b/libgcc/config/sh/sfp-machine.h
new file mode 100644
index 000000000..c1aa428c0
--- /dev/null
+++ b/libgcc/config/sh/sfp-machine.h
@@ -0,0 +1,83 @@
+/* Software floating-point machine description for SuperH.
+
+   Copyright (C) 2016-2024 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+#define _FP_W_TYPE_SIZE		32
+#define _FP_W_TYPE		unsigned long
+#define _FP_WS_TYPE		signed long
+#define _FP_I_TYPE		long
+
+#define _FP_MUL_MEAT_S(R,X,Y)				\
+  _FP_MUL_MEAT_1_wide(_FP_WFRACBITS_S,R,X,Y,umul_ppmm)
+#define _FP_MUL_MEAT_D(R,X,Y)				\
+  _FP_MUL_MEAT_2_wide(_FP_WFRACBITS_D,R,X,Y,umul_ppmm)
+#define _FP_MUL_MEAT_Q(R,X,Y)				\
+  _FP_MUL_MEAT_4_wide(_FP_WFRACBITS_Q,R,X,Y,umul_ppmm)
+
+#define _FP_DIV_MEAT_S(R,X,Y)	_FP_DIV_MEAT_1_udiv_norm(S,R,X,Y)
+#define _FP_DIV_MEAT_D(R,X,Y)	_FP_DIV_MEAT_2_udiv(D,R,X,Y)
+#define _FP_DIV_MEAT_Q(R,X,Y)	_FP_DIV_MEAT_4_udiv(Q,R,X,Y)
+
+#define _FP_NANFRAC_B		_FP_QNANBIT_B
+#define _FP_NANFRAC_H		_FP_QNANBIT_H
+#define _FP_NANFRAC_S		_FP_QNANBIT_S
+#define _FP_NANFRAC_D		_FP_QNANBIT_D, 0
+#define _FP_NANFRAC_Q		_FP_QNANBIT_Q, 0, 0, 0
+
+/* The type of the result of a floating point comparison.  This must
+   match __libgcc_cmp_return__ in GCC for the target.  */
+typedef int __gcc_CMPtype __attribute__ ((mode (__libgcc_cmp_return__)));
+#define CMPtype __gcc_CMPtype
+
+#define _FP_NANSIGN_B		0
+#define _FP_NANSIGN_H		0
+#define _FP_NANSIGN_S		0
+#define _FP_NANSIGN_D		0
+#define _FP_NANSIGN_Q		0
+
+#define _FP_KEEPNANFRACP 0
+#define _FP_QNANNEGATEDP 0
+
+#define _FP_CHOOSENAN(fs, wc, R, X, Y, OP)	\
+  do {						\
+    R##_s = _FP_NANSIGN_##fs;			\
+    _FP_FRAC_SET_##wc(R,_FP_NANFRAC_##fs);	\
+    R##_c = FP_CLS_NAN;				\
+  } while (0)
+
+#define _FP_TININESS_AFTER_ROUNDING 1
+
+#define	__LITTLE_ENDIAN	1234
+#define	__BIG_ENDIAN	4321
+
+#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+#define __BYTE_ORDER __BIG_ENDIAN
+#else
+#define __BYTE_ORDER __LITTLE_ENDIAN
+#endif
+
+/* Define ALIASNAME as a strong alias for NAME.  */
+# define strong_alias(name, aliasname) _strong_alias(name, aliasname)
+# define _strong_alias(name, aliasname) \
+  extern __typeof (name) aliasname __attribute__ ((alias (#name)));

libgcc's fp-bit.c is quite slow and most modern/developed architectures
have switched to using the soft-fp library. This patch does so for
free-standing/unknown-OS SH3/SH4 builds, using soft-fp's default parameters
for the most part, most notably no exceptions.

A quick run of Whetstone (built with OpenLibm) on an SH4 machine shows
about x3 speedup (~320 -> 1050 Kwhets/s).

I'm sending this as RFC because I'm quite unsure about testing. I built
the compiler and ran the benchmark, but I don't know if GCC has a test
for soft-fp correctness and whether I can run that in my non-hosted
environment. Any advice?

Cheers,
Sébastien

libgcc/ChangeLog:

        * config.host: Use soft-fp library for non-hosted SH3/SH4
        instead of fdpbit.
        * config/sh/sfp-machine.h: New.

Signed-off-by: Sébastien Michelland <sebastien.michelland@lcis.grenoble-inp.fr>
---
 libgcc/config.host             | 10 +++-
 libgcc/config/sh/sfp-machine.h | 83 ++++++++++++++++++++++++++++++++++
 2 files changed, 92 insertions(+), 1 deletion(-)
 create mode 100644 libgcc/config/sh/sfp-machine.h