Message ID | 004501d82fb2$d9e77b20$8db67160$@nextmovesoftware.com |
---|---|
State | New |
Headers | show |
Series | [i386] PR 104732: Simplify/fix DI mode logic expansion/splitting on -m32. | expand |
On Fri, Mar 4, 2022 at 11:30 AM Roger Sayle <roger@nextmovesoftware.com> wrote: > > > This clean-up patch resolves PR testsuite/104732, the failure of the recent > test gcc.target/i386/pr100711-1.c on 32-bit Solaris/x86. Rather than just > tweak the testcase, the proposed approach is to fix the underlying problem > by removing the "TARGET_STV && TARGET_SSE2" conditionals from the DI mode > logical operation expanders and pre-reload splitters in i386.md, which as > I'll show generate inferior code (even a GCC 12 regression) on !TARGET_64BIT > whenever -mno-stv (such as Solaris) or -msse (but not -msse2). > > First a little bit of history. In the beginning, DImode operations on > i386 weren't defined by the machine description, and lowered during RTL > expansion to SI mode operations. The with PR 65105 in 2015, -mstv was > added, together with a SWIM1248x mode iterator (later renamed to SWIM1248x) > together with several *<code>di3_doubleword post-reload splitters that > made use of register allocation to perform some double word operations > in 64-but XMM registers. A short while later in 2016, PR 70322 added > similar support for one_cmpldi2. All of this logic was dependent upon > "!TARGET_64BIT && TARGET_STV && TARGET_SSE2". With the passing of time, > these conditions became irrelevant when in 2019, it was decided to split > these double-word patterns before reload. > https://gcc.gnu.org/pipermail/gcc-patches/2019-June/523877.html > https://gcc.gnu.org/pipermail/gcc-patches/2019-October/532236.html > Hence the current situation, where on most modern CPU architectures > (where "TARGET_STV && TARGET_SSE2" is true), RTL is expanded with DI > mode operations, that are then split into two SI mode instructions > before reload, except on Solaris and other odd cases, where the splitting > is to two SI mode instructions is done during RTL expansion. By the > time compilation reaches register allocation both paths in theory > produce identical or similar code, so the vestigial legacy/logic would > appear to be harmless. > > Unfortunately, there is one place where this arbitrary choice of how > to lower DI mode doubleword operations is visible to the middle-end, > it controls whether the backend appears to have a suitable optab, and > the presence (or not) of DImode optabs can influence vectorization > cost models and veclower decisions. > > The issue (and code quality regression) can be seen in this test case: > > typedef long long v2di __attribute__((vector_size (16))); > v2di x; > void foo (long long a) > { > v2di t = {a, a}; > x = ~t; > } > > which when compiled with "-O2 -m32 -msse -march=pentiumpro" produces: > > foo: subl $28, %esp > movl %ebx, 16(%esp) > movl 32(%esp), %eax > movl %esi, 20(%esp) > movl 36(%esp), %edx > movl %edi, 24(%esp) > movl %eax, %esi > movl %eax, %edi > movl %edx, %ebx > movl %edx, %ecx > notl %esi > notl %ebx > movl %esi, (%esp) > notl %edi > notl %ecx > movl %ebx, 4(%esp) > movl 20(%esp), %esi > movl %edi, 8(%esp) > movl 16(%esp), %ebx > movl %ecx, 12(%esp) > movl 24(%esp), %edi > movss 8(%esp), %xmm1 > movss 12(%esp), %xmm2 > movss (%esp), %xmm0 > movss 4(%esp), %xmm3 > unpcklps %xmm2, %xmm1 > unpcklps %xmm3, %xmm0 > movlhps %xmm1, %xmm0 > movaps %xmm0, x > addl $28, %esp > ret > > > Importantly notice the four "notl" instructions. With this patch: > > foo: subl $28, %esp > movl 32(%esp), %edx > movl 36(%esp), %eax > notl %edx > movl %edx, (%esp) > notl %eax > movl %eax, 4(%esp) > movl %edx, 8(%esp) > movl %eax, 12(%esp) > movaps (%esp), %xmm1 > movaps %xmm1, x > addl $28, %esp > ret > > Notice only two "notl" instructions. Checking with Godbolt.org, GCC > generated 4 NOTs in GCC 4.x and 5.x, 2 NOTs between GCC 6.x and 9.x, > and regressed to 4 NOTs since GCC 10.x [which hopefully qualifies > this clean-up as suitable for stage 4]. > > Most significantly, this patch allows pr100711-1.c to pass with > -mno-stv, allowing pandn to be used with V2DImode on Solaris/x86. > Fingers-crossed this should reduce the number of discrepancies > that Rainer Orth encounters supporting Solaris/x86. > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > and make -k check, with/without --target_board='unix{-m32\ -march= > cascadelake}' with no new failures. Ok for mainline? The idea was to leave decomposition of double-word operations to the generic middle-end, where simplification and propagation of constants will be handled in a generic way. However, several releases later, these simplifications were also introduced to STV-enabeld patterns. So, if the latter approach is demonstrably better, then I see no problem to enable it for all targets, not only STV-enabled ones. > > 2022-03-04 Roger Sayle <roger@nextmovesoftware.com> > > gcc/ChangeLog > PR testsuite/104732 > * config/i386/i386.md (SWIM1248s): Include DI mode unconditionally. > (*anddi3_doubleword): Remove && TARGET_STV && TARGET_SSE2 condition, > i.e. always split on !TARGET_64BIT. > (*<any_or>di3_doubleword): Likewise. > (*one_cmpldi2_doubleword): Likewise. > > gcc/testsuite/ChangeLog > PR testsuite/104732 > * gcc.target/i386/pr104732.c: New test case. OK with a nit below. Thanks, Uros. -;; Math-dependant integer modes with DImode (enabled for 32bit with STV). +;; Math-dependant integer modes with DImode. (define_mode_iterator SWIM1248s [(QI "TARGET_QIMODE_MATH") (HI "TARGET_HIMODE_MATH") - SI (DI "TARGET_64BIT || (TARGET_STV && TARGET_SSE2)")]) + SI DI]) Please rename this iterator to SWIM1248x, as is the case with other iterators where DImode is used unconditionally.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 5e0a980..8eae4b0 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -1079,11 +1079,11 @@ (HI "TARGET_HIMODE_MATH") SI]) -;; Math-dependant integer modes with DImode (enabled for 32bit with STV). +;; Math-dependant integer modes with DImode. (define_mode_iterator SWIM1248s [(QI "TARGET_QIMODE_MATH") (HI "TARGET_HIMODE_MATH") - SI (DI "TARGET_64BIT || (TARGET_STV && TARGET_SSE2)")]) + SI DI]) ;; Math-dependant single word integer modes without QImode. (define_mode_iterator SWIM248 [(HI "TARGET_HIMODE_MATH") @@ -9733,7 +9733,7 @@ (match_operand:DI 1 "nonimmediate_operand") (match_operand:DI 2 "x86_64_szext_general_operand"))) (clobber (reg:CC FLAGS_REG))] - "!TARGET_64BIT && TARGET_STV && TARGET_SSE2 + "!TARGET_64BIT && ix86_binary_operator_ok (AND, DImode, operands) && ix86_pre_reload_split ()" "#" @@ -10349,7 +10349,7 @@ (match_operand:DI 1 "nonimmediate_operand") (match_operand:DI 2 "x86_64_szext_general_operand"))) (clobber (reg:CC FLAGS_REG))] - "!TARGET_64BIT && TARGET_STV && TARGET_SSE2 + "!TARGET_64BIT && ix86_binary_operator_ok (<CODE>, DImode, operands) && ix86_pre_reload_split ()" "#" @@ -11435,7 +11435,7 @@ (define_insn_and_split "*one_cmpldi2_doubleword" [(set (match_operand:DI 0 "nonimmediate_operand") (not:DI (match_operand:DI 1 "nonimmediate_operand")))] - "!TARGET_64BIT && TARGET_STV && TARGET_SSE2 + "!TARGET_64BIT && ix86_unary_operator_ok (NOT, DImode, operands) && ix86_pre_reload_split ()" "#" diff --git a/gcc/testsuite/gcc.target/i386/pr104732.c b/gcc/testsuite/gcc.target/i386/pr104732.c new file mode 100644 index 0000000..c8954366c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr104732.c @@ -0,0 +1,14 @@ +/* { dg-do compile { target ia32 } } */ +/* { dg-options "-O2 -m32 -msse -march=pentiumpro" } */ + +typedef long long v2di __attribute__((vector_size (16))); + +v2di x; + +void foo (long long a) +{ + v2di t = {a, a}; + x = ~t; +} + +/* { dg-final { scan-assembler-times "notl" 2 } } */