[0/7,x86] Remove vcond{,u,eq}<mode> expanders.

Message ID 20240627082307.1166985-1-hongtao.liu@intel.com

Message

liuhongt June 27, 2024, 8:23 a.m. UTC
There are several regressions after obsoleting vcond{,u,eq}<mode>.
Some are due to direct optimizations in ix86_expand_{fp,int}_vcond,
e.g. ix86_expand_sse_fp_minmax. Others are due to optimizations that
rely on the canonicalization done in ix86_expand_{fp,int}_vcond.
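
To illustrate the kind of pattern ix86_expand_sse_fp_minmax deals with, here is a minimal sketch (not code from the series; the function names are made up for illustration). The SSE min/max instructions return the second operand when either input is NaN, which matches the select form "a < b ? a : b" rather than a commutative minimum. That is presumably why the pattern has to be matched as an IEEE min/max and not as a plain smin/smax.

```c
#include <math.h>

/* Select-style minimum, the scalar form of a vectorized "a < b ? a : b".
   If either input is NaN the comparison is false, so b is returned. */
static float select_min(float a, float b)
{
    return a < b ? a : b;
}

/* Hypothetical helper: returns 1 when swapping the operands changes
   the result for a NaN input, i.e. the operation is not commutative. */
static int nan_order_matters(void)
{
    float r1 = select_min(NAN, 1.0f);  /* comparison false -> b = 1.0 */
    float r2 = select_min(1.0f, NAN);  /* comparison false -> b = NaN */
    return r1 == 1.0f && isnan(r2);
}
```

Because the operand order is observable, any splitter that recovers this pattern has to keep the operands in the same order as the original comparison and select.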

This series adds define_split or define_insn_and_split patterns to
restore those optimizations at pass_combine. It fixes most regressions
in the GCC testsuite except for tests compiled without SSE4.1. Without
SSE4.1 a vector conditional move takes 3 instructions, and pass_combine
combines at most 4 instructions. One possible solution is to add fake
"ssemovcc" instructions to help combine and split them back into real
instructions. This series doesn't handle that; it just adjusts the
affected testcases to XFAIL.
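
As a rough sketch of why the pre-SSE4.1 case is hard for combine (an illustration using GNU C vector extensions, not code from the series; select_gt is a made-up name): without a blend instruction, a lane-wise select is typically built from an AND, an AND-NOT and an OR on top of the comparison mask, so the comparison plus the select already fill the 4-instruction window.

```c
#include <stdint.h>

/* Four 32-bit lanes; GNU C vector extension (GCC and Clang). */
typedef int32_t v4si __attribute__((vector_size(16)));

/* Lane-wise "a > b ? x : y" without a blend instruction.
   The comparison yields -1 (all ones) or 0 per lane, and the
   select is then composed from AND, AND-NOT and OR. */
static v4si select_gt(v4si a, v4si b, v4si x, v4si y)
{
    v4si mask = a > b;                /* per-lane -1 / 0 mask */
    return (mask & x) | (~mask & y);  /* three logical operations */
}
```

With SSE4.1 the three logical operations can collapse into a single blend, which leaves combine room to fold in the surrounding instructions.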

I also tested performance on SPEC2017 with the following option sets:
-march=sapphirerapids -O2
-march=x86-64-v3 -O2
-march=x86-64 -O2
-march=sapphirerapids -O2
I didn't observe any obvious performance change; the binaries are
mostly the same.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Any comments?

liuhongt (7):
  [x86] Add more splitters to match (unspec [op1 op2 (gt op3
    constm1_operand)] UNSPEC_BLENDV)
  Lower AVX512 kmask comparison back to AVX2 comparison when
    op_{true,false} is vector -1/0.
  [x86] Match IEEE min/max with UNSPEC_IEEE_{MIN,MAX}.
  Add more splitter for mskmov with avx512 comparison.
  Adjust testcase for the regressed testcases after obsolete of
    vcond{,u,eq}.
  [x86] Optimize a < 0 ? -1 : 0 to (signed)a >> 31.
  Remove vcond{,u,eq}<mode> expanders since they will be obsolete.

 gcc/config/i386/mmx.md                        | 149 ++--
 gcc/config/i386/sse.md                        | 772 +++++++++++++-----
 gcc/testsuite/g++.target/i386/avx2-pr115517.C |  60 ++
 .../g++.target/i386/avx512-pr115517.C         |  70 ++
 gcc/testsuite/g++.target/i386/pr100637-1b.C   |   4 +-
 gcc/testsuite/g++.target/i386/pr100637-1w.C   |   4 +-
 gcc/testsuite/g++.target/i386/pr103861-1.C    |   4 +-
 .../g++.target/i386/sse4_1-pr100637-1b.C      |  17 +
 .../g++.target/i386/sse4_1-pr100637-1w.C      |  17 +
 .../g++.target/i386/sse4_1-pr103861-1.C       |  17 +
 gcc/testsuite/gcc.target/i386/avx2-pr115517.c |  33 +
 .../gcc.target/i386/avx512-pr115517.c         |  70 ++
 gcc/testsuite/gcc.target/i386/pr103941-2.c    |   2 +-
 gcc/testsuite/gcc.target/i386/pr111023-2.c    |   4 +-
 gcc/testsuite/gcc.target/i386/pr88540.c       |   4 +-
 .../gcc.target/i386/sse4_1-pr88540.c          |  10 +
 gcc/testsuite/gcc.target/i386/vect-div-1.c    |   3 +-
 17 files changed, 918 insertions(+), 322 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/i386/avx2-pr115517.C
 create mode 100644 gcc/testsuite/g++.target/i386/avx512-pr115517.C
 create mode 100644 gcc/testsuite/g++.target/i386/sse4_1-pr100637-1b.C
 create mode 100644 gcc/testsuite/g++.target/i386/sse4_1-pr100637-1w.C
 create mode 100644 gcc/testsuite/g++.target/i386/sse4_1-pr103861-1.C
 create mode 100644 gcc/testsuite/gcc.target/i386/avx2-pr115517.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512-pr115517.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse4_1-pr88540.c
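
For reference, the identity behind "[x86] Optimize a < 0 ? -1 : 0 to (signed)a >> 31" can be sketched in scalar C as follows. This is an illustration with made-up function names, not code from the patch, and it assumes 32-bit elements and an arithmetic right shift for signed values (which GCC provides).

```c
#include <stdint.h>

/* Select form: all ones when a is negative, zero otherwise. */
static int32_t sign_mask_select(int32_t a)
{
    return a < 0 ? -1 : 0;
}

/* Shift form: an arithmetic right shift by 31 replicates the sign bit
   into every bit position. Right-shifting a negative signed value is
   implementation-defined in ISO C; GCC defines it as sign extension. */
static int32_t sign_mask_shift(int32_t a)
{
    return a >> 31;
}
```

The vector version replaces a compare-and-select with a single arithmetic shift per vector.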

Comments

Richard Biener June 27, 2024, 9:59 a.m. UTC | #1
On Thu, Jun 27, 2024 at 10:27 AM liuhongt <hongtao.liu@intel.com> wrote:
>
> There are several regressions after obsoleting vcond{,u,eq}<mode>.
> Some are due to direct optimizations in ix86_expand_{fp,int}_vcond,
> e.g. ix86_expand_sse_fp_minmax. Others are due to optimizations that
> rely on the canonicalization done in ix86_expand_{fp,int}_vcond.
>
> This series adds define_split or define_insn_and_split patterns to
> restore those optimizations at pass_combine. It fixes most regressions
> in the GCC testsuite except for tests compiled without SSE4.1. Without
> SSE4.1 a vector conditional move takes 3 instructions, and pass_combine
> combines at most 4 instructions. One possible solution is to add fake
> "ssemovcc" instructions to help combine and split them back into real
> instructions. This series doesn't handle that; it just adjusts the
> affected testcases to XFAIL.
>
> I also tested performance on SPEC2017 with the following option sets:
> -march=sapphirerapids -O2
> -march=x86-64-v3 -O2
> -march=x86-64 -O2
> -march=sapphirerapids -O2
> I didn't observe any obvious performance change; the binaries are
> mostly the same.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Any comments?

Thanks for working on this.  Can you open a bug report for the cases
you XFAILed so we can see if the middle-end can be of help here?

Thanks,
Richard.
