Message ID | 001501d9210f$694bed70$3be3c850$@nextmovesoftware.com |
---|---|
State | New |
Headers | show |
Series | [x86_64] Introduce insvti_highpart define_insn_and_split. | expand |
On Thu, Jan 5, 2023 at 3:10 PM Roger Sayle <roger@nextmovesoftware.com> wrote: > > > This patch adds a convenient post-reload splitter for setting/updating > the highpart of a TImode variable, using the recently added > split_double_concat infrastructure. > > For the new test case below: > > __int128 foo(__int128 x, unsigned long long y) > { > __int128 t = (__int128)y << 64; > __int128 r = (x & ~0ull) | t; > return r; > } > > mainline GCC with -O2 currently generates: > > foo: movq %rdi, %rcx > xorl %eax, %eax > xorl %edi, %edi > orq %rcx, %rax > orq %rdi, %rdx > ret > > with this patch, GCC instead now generates the much better: > > foo: movq %rdi, %rcx > movq %rcx, %rax > ret > > [which interestingly shows the deeper (ABI) issue that I'm trying > to fix, this new define_insn_and_split being the next piece]. > > It turns out that the -m32 equivalent of this testcase, already > avoids using explict orl/xor instructions, as it gets optimized > (in combine) by a completely different path. Given that this idiom > isn't seen in 32-bit code (and therefore this pattern wouldn't match), > and also that the shorter 32-bit AND bitmask is represented as a > CONST_INT rather than a CONST_WIDE_INT, this new define_insn_and_split > is implemented for just TARGET_64BIT rather than contort a "generic" > implementation using DWI mode iterators. > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > and make -k check, both with and without --target_board=unix{-m32}, > with no new failures. Ok for mainline? Please let me know if you'd > prefer a different pattern name [insv seemed better than mov]. > > > 2023-01-05 Roger Sayle <roger@nextmovesoftware.com> > > gcc/ChangeLog > * config/i386/i386.md (any_or_plus): Move definition earlier. > (*insvti_highpart_1): New define_insn_and_split to overwrite > (insv) the highpart of a TImode register/memory. > > gcc/testsuite/ChangeLog > * gcc.target/i386/insvti_highpart-1.c: New test case. LGTM, but the new pattern looks complex enough to make me a bit nervous at this development stage. If the patch just extended the existing SI/DI mode pattern to also handle DI/TI case for 64-bit targets, I would approve it without hesitation, but since 128 bit types are not that common ATM, I'd rather postpone introduction of complex new pattern to the next stage 1. Target specific part of the compiler does have some more freedom to introduce new functionality during "general bugfixing" development stage, so simple patches that extend existing patterns to handle new modes or constraints should be OK, but let's not stretch this rule too much. To follow with the example, your recent extendditi2 splitter patch was OK even at this development stage, because it just extended existing patterns and existing approach to also handle 64bit instructions (that are actually same as 32bit ones). Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 890c4c8..9f3c62b 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -3401,6 +3401,31 @@ "mov{b}\t{%h1, %h0|%h0, %h1}" [(set_attr "type" "imov") (set_attr "mode" "QI")]) + +(define_code_iterator any_or_plus [plus ior xor]) +(define_insn_and_split "*insvti_highpart_1" + [(set (match_operand:TI 0 "nonimmediate_operand" "=ro,r,r,&r") + (any_or_plus:TI + (and:TI + (match_operand:TI 1 "nonimmediate_operand" "r,m,r,m") + (match_operand:TI 3 "const_scalar_int_operand" "n,n,n,n")) + (ashift:TI + (zero_extend:TI + (match_operand:DI 2 "nonimmediate_operand" "r,r,m,m")) + (const_int 64))))] + "TARGET_64BIT + && CONST_WIDE_INT_P (operands[3]) + && CONST_WIDE_INT_NUNITS (operands[3]) == 2 + && CONST_WIDE_INT_ELT (operands[3], 0) == -1 + && CONST_WIDE_INT_ELT (operands[3], 1) == 0" + "#" + "&& reload_completed" + [(clobber (const_int 0))] +{ + operands[4] = gen_lowpart (DImode, operands[1]); + split_double_concat (TImode, operands[0], operands[4], operands[2]); + DONE; +}) ;; Floating point push instructions. @@ -11418,7 +11443,6 @@ (set_attr "mode" "QI")]) ;; Split DST = (HI<<32)|LO early to minimize register usage. -(define_code_iterator any_or_plus [plus ior xor]) (define_insn_and_split "*concat<mode><dwi>3_1" [(set (match_operand:<DWI> 0 "nonimmediate_operand" "=ro,r") (any_or_plus:<DWI> diff --git a/gcc/testsuite/gcc.target/i386/insvti_highpart-1.c b/gcc/testsuite/gcc.target/i386/insvti_highpart-1.c new file mode 100644 index 0000000..4ae9ccf --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/insvti_highpart-1.c @@ -0,0 +1,12 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O2" } */ + +__int128 foo(__int128 x, unsigned long long y) +{ + __int128 t = (__int128)y << 64; + __int128 r = (x & ~0ull) | t; + return r; +} + +/* { dg-final { scan-assembler-not "xorl" } } */ +/* { dg-final { scan-assembler-not "orq" } } */