Message ID | 000901da6d6a$2d642630$882c7290$@nextmovesoftware.com |
---|---|
State | New |
Headers | show |
Series | PR target/114187: Fix ?Fmode SUBREG simplification in simplify_subreg. | expand |
> Am 03.03.2024 um 13:56 schrieb Roger Sayle <roger@nextmovesoftware.com>: > > > This patch fixes PR target/114187 a typo/missed-optimization in simplify-rtx > that's exposed by (my) changes to x86_64's parameter passing. The context > is that construction of double word (TImode) values now uses the idiom: > > (ior:TI (ashift:TI (zero_extend:TI (reg:DI x)) (const_int 64 [0x40])) > (zero_extend:TI (reg:DI y))) > > Extracting the DImode highpart and lowpart halves of this complex expression > is supported by simplications in simplify_subreg. The problem is when the > doubleword TImode value represents two DFmode fields, there isn't a direct > simplification to extract the highpart or lowpart SUBREGs, but instead GCC > uses two steps, extract the DImode {high,low} part and then cast the result > back to a floating point mode, DFmode. > > The (buggy) code to do this is: > > /* If the outer mode is not integral, try taking a subreg with the > equivalent > integer outer mode and then bitcasting the result. > Other simplifications rely on integer to integer subregs and we'd > potentially miss out on optimizations otherwise. */ > if (known_gt (GET_MODE_SIZE (innermode), > GET_MODE_SIZE (outermode)) > && SCALAR_INT_MODE_P (innermode) > && !SCALAR_INT_MODE_P (outermode) > && int_mode_for_size (GET_MODE_BITSIZE (outermode), > 0).exists (&int_outermode)) > { > rtx tem = simplify_subreg (int_outermode, op, innermode, byte); > if (tem) > return simplify_gen_subreg (outermode, tem, int_outermode, byte); > } > > The issue/mistake is that the second call, to simplify_subreg, shouldn't > use "byte" as the final argument; the offset has already been handled by > the first call, to simplify_subreg, and this second call is just a type > conversion from an integer mode to floating point (from DImode to DFmode). > > Interestingly, this mistake was already spotted by Richard Sandiford when > the optimization was originally contributed in January 2023. > https://gcc.gnu.org/pipermail/gcc-patches/2023-January/610920.html >>> Richard Sandiford writes: >>> Also, the final line should pass 0 rather than byte. > > Unfortunately a miscommunication/misunderstanding in a later thread > https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612898.html > resulted in this correction being undone. Alas the lack of any test > cases when the optimization was added/modified potentially contributed > to this lapse. Using lowpart_subreg should avoid/reduce confusion in > future. > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > and make -k check, both with and without --target_board=unix{-m32} > with no new failures. Ok for mainline? Ok Richard > > 2024-03-03 Roger Sayle <roger@nextmovesoftware.com> > > gcc/ChangeLog > PR target/114187 > * simplify-rtx.cc (simplify_context::simplify_subreg): Call > lowpart_subreg to perform type conversion, to avoid confusion > over the offset to use in the call to simplify_reg_subreg. > > gcc/testsuite/ChangeLog > PR target/114187 > * g++.target/i386/pr114187.C: New test case. > > > Thanks in advance, > Roger > -- > > <patchss.txt>
diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc index 36dd522..dceaa13 100644 --- a/gcc/simplify-rtx.cc +++ b/gcc/simplify-rtx.cc @@ -7846,7 +7846,7 @@ simplify_context::simplify_subreg (machine_mode outermode, rtx op, { rtx tem = simplify_subreg (int_outermode, op, innermode, byte); if (tem) - return simplify_gen_subreg (outermode, tem, int_outermode, byte); + return lowpart_subreg (outermode, tem, int_outermode); } /* If OP is a vector comparison and the subreg is not changing the diff --git a/gcc/testsuite/g++.target/i386/pr114187.C b/gcc/testsuite/g++.target/i386/pr114187.C new file mode 100644 index 0000000..69912a9 --- /dev/null +++ b/gcc/testsuite/g++.target/i386/pr114187.C @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +struct P2d { + double x, y; +}; + +double sumxy_p(P2d p) { + return p.x + p.y; +} + +/* { dg-final { scan-assembler-not "movq" } } */ +/* { dg-final { scan-assembler-not "xchg" } } */