[i386] Fix PR 88998, bad codegen with mmx instructions

Message ID	CAFULd4ZN1jGnbRdjhKTjNVDqEAMYdU3nEKxuAfB=AhyWRuPkOg@mail.gmail.com
State	New
Headers	show Return-Path: <gcc-patches-return-494625-incoming=patchwork.ozlabs.org@gcc.gnu.org> DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :mime-version:from:date:message-id:subject:to:content-type; q= dns; s=default; b=D2bINoaCd2EycdAErQN8uljf67IHxmXbtHpLB9RnxQNJ81 D8CBlel3a/3TuO9NTWoilY5FskvGAeUHZYYAbYFHeC0uGu3TeXcvKwqC7e3DxKSz dgWegThlBoB7aOQpvXRer+6NZ+5vAVgos/VCgm5j8nEJjYO7f9zAb8LCj7frA= Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk Sender: gcc-patches-owner@gcc.gnu.org MIME-Version: 1.0 From: Uros Bizjak <ubizjak@gmail.com> Date: Wed, 23 Jan 2019 20:22:19 +0100 Message-ID: <CAFULd4ZN1jGnbRdjhKTjNVDqEAMYdU3nEKxuAfB=AhyWRuPkOg@mail.gmail.com> Subject: [PATCH, i386] Fix PR 88998, bad codegen with mmx instructions To: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org> Content-Type: multipart/mixed; boundary="000000000000d81f8105802503f2"
Series	[i386] Fix PR 88998, bad codegen with mmx instructions \| expand [i386] Fix PR 88998, bad codegen with mmx instructions

Message ID

CAFULd4ZN1jGnbRdjhKTjNVDqEAMYdU3nEKxuAfB=AhyWRuPkOg@mail.gmail.com

State

New

Headers

DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender
	:mime-version:from:date:message-id:subject:to:content-type; q=
	dns; s=default; b=D2bINoaCd2EycdAErQN8uljf67IHxmXbtHpLB9RnxQNJ81
	D8CBlel3a/3TuO9NTWoilY5FskvGAeUHZYYAbYFHeC0uGu3TeXcvKwqC7e3DxKSz
	dgWegThlBoB7aOQpvXRer+6NZ+5vAVgos/VCgm5j8nEJjYO7f9zAb8LCj7frA=
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
Sender: gcc-patches-owner@gcc.gnu.org
MIME-Version: 1.0
From: Uros Bizjak <ubizjak@gmail.com>
Date: Wed, 23 Jan 2019 20:22:19 +0100
Message-ID: <CAFULd4ZN1jGnbRdjhKTjNVDqEAMYdU3nEKxuAfB=AhyWRuPkOg@mail.gmail.com>
Subject: [PATCH, i386] Fix PR 88998, bad codegen with mmx instructions
To: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Content-Type: multipart/mixed; boundary="000000000000d81f8105802503f2"

Series

[i386] Fix PR 88998, bad codegen with mmx instructions | expand

Commit Message

Uros Bizjak Jan. 23, 2019, 7:22 p.m. UTC

Attached patch adds SSE alternatives to sse2_cvtpi2pd, sse2_cvtpd2pi
and sse2_cvttpd2pi to avoid MMX registers when e.g. _mm_cvtepi32_pd
intrinsics is used. Without the patch, the testcase compiles to (-O2
-mavx):

_Z7prepareii:
        vmovd   %edi, %xmm1
        vpinsrd $1, %esi, %xmm1, %xmm0
        movdq2q %xmm0, %mm0
        cvtpi2pd        %mm0, %xmm0
        vhaddpd %xmm0, %xmm0, %xmm0
        ret

while patched gcc generates:

        vmovd   %edi, %xmm1
        vpinsrd $1, %esi, %xmm1, %xmm0
        vcvtdq2pd       %xmm0, %xmm0
        vhaddpd %xmm0, %xmm0, %xmm0
        ret

The later avoids transition of FPU to MMX mode.

2019-01-23  Uroš Bizjak  <ubizjak@gmail.com>

    PR target/88998
    * config/i386/sse.md (sse2_cvtpi2pd): Add SSE alternatives.
    Disparage MMX alternative.
    (sse2_cvtpd2pi): Ditto.
    (sse2_cvttpd2pi): Ditto.

testsuite/ChangeLog:

2019-01-23  Uroš Bizjak  <ubizjak@gmail.com>

    PR target/88998
    * g++.target/i386/pr88998.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline, will be backported to release branches.

Uros.

Comments

H.J. Lu Jan. 23, 2019, 7:52 p.m. UTC | #1

On Wed, Jan 23, 2019 at 11:22 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> Attached patch adds SSE alternatives to sse2_cvtpi2pd, sse2_cvtpd2pi
> and sse2_cvttpd2pi to avoid MMX registers when e.g. _mm_cvtepi32_pd
> intrinsics is used. Without the patch, the testcase compiles to (-O2
> -mavx):
>
> _Z7prepareii:
>         vmovd   %edi, %xmm1
>         vpinsrd $1, %esi, %xmm1, %xmm0
>         movdq2q %xmm0, %mm0
>         cvtpi2pd        %mm0, %xmm0
>         vhaddpd %xmm0, %xmm0, %xmm0
>         ret
>
> while patched gcc generates:
>
>         vmovd   %edi, %xmm1
>         vpinsrd $1, %esi, %xmm1, %xmm0
>         vcvtdq2pd       %xmm0, %xmm0
>         vhaddpd %xmm0, %xmm0, %xmm0
>         ret
>
> The later avoids transition of FPU to MMX mode.
>

Is that possible to support 64-bit vectors, like V2SI, with SSE
instead of MMX for x86-64 under a command-line switch?

Uros Bizjak Jan. 23, 2019, 8:26 p.m. UTC | #2

On Wed, Jan 23, 2019 at 8:52 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Wed, Jan 23, 2019 at 11:22 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > Attached patch adds SSE alternatives to sse2_cvtpi2pd, sse2_cvtpd2pi
> > and sse2_cvttpd2pi to avoid MMX registers when e.g. _mm_cvtepi32_pd
> > intrinsics is used. Without the patch, the testcase compiles to (-O2
> > -mavx):
> >
> > _Z7prepareii:
> >         vmovd   %edi, %xmm1
> >         vpinsrd $1, %esi, %xmm1, %xmm0
> >         movdq2q %xmm0, %mm0
> >         cvtpi2pd        %mm0, %xmm0
> >         vhaddpd %xmm0, %xmm0, %xmm0
> >         ret
> >
> > while patched gcc generates:
> >
> >         vmovd   %edi, %xmm1
> >         vpinsrd $1, %esi, %xmm1, %xmm0
> >         vcvtdq2pd       %xmm0, %xmm0
> >         vhaddpd %xmm0, %xmm0, %xmm0
> >         ret
> >
> > The later avoids transition of FPU to MMX mode.
> >
>
> Is that possible to support 64-bit vectors, like V2SI, with SSE
> instead of MMX for x86-64 under a command-line switch?

SSE registers are preferred for 64bit vectors (see number of
exclamation marks in *mov<mode>_internal in mmx.md), so the value will
be passed in SSE regs unless there is pure MMX instruction, where due
to missing SSE alternatives, RA will need to allocate MMX register.

Uros.

H.J. Lu Jan. 23, 2019, 8:56 p.m. UTC | #3

On Wed, Jan 23, 2019 at 12:27 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Wed, Jan 23, 2019 at 8:52 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Wed, Jan 23, 2019 at 11:22 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > Attached patch adds SSE alternatives to sse2_cvtpi2pd, sse2_cvtpd2pi
> > > and sse2_cvttpd2pi to avoid MMX registers when e.g. _mm_cvtepi32_pd
> > > intrinsics is used. Without the patch, the testcase compiles to (-O2
> > > -mavx):
> > >
> > > _Z7prepareii:
> > >         vmovd   %edi, %xmm1
> > >         vpinsrd $1, %esi, %xmm1, %xmm0
> > >         movdq2q %xmm0, %mm0
> > >         cvtpi2pd        %mm0, %xmm0
> > >         vhaddpd %xmm0, %xmm0, %xmm0
> > >         ret
> > >
> > > while patched gcc generates:
> > >
> > >         vmovd   %edi, %xmm1
> > >         vpinsrd $1, %esi, %xmm1, %xmm0
> > >         vcvtdq2pd       %xmm0, %xmm0
> > >         vhaddpd %xmm0, %xmm0, %xmm0
> > >         ret
> > >
> > > The later avoids transition of FPU to MMX mode.
> > >
> >
> > Is that possible to support 64-bit vectors, like V2SI, with SSE
> > instead of MMX for x86-64 under a command-line switch?
>
> SSE registers are preferred for 64bit vectors (see number of
> exclamation marks in *mov<mode>_internal in mmx.md), so the value will
> be passed in SSE regs unless there is pure MMX instruction, where due
> to missing SSE alternatives, RA will need to allocate MMX register.
>

[hjl@gnu-cfl-1 v64-1]$ cat x.i
typedef float __v2sf __attribute__ ((__vector_size__ (8)));

extern __v2sf z;

void
add (__v2sf x, __v2sf y)
{
  z = x + y;
}
[hjl@gnu-cfl-1 v64-1]$ make x.s
/export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/gcc/  -O2
-S x.i
[hjl@gnu-cfl-1 v64-1]$ cat x.s
.file "x.i"
.text
.p2align 4
.globl add
.type add, @function
add:
.LFB0:
.cfi_startproc
movlps %xmm0, -32(%rsp)
movss -32(%rsp), %xmm0
movlps %xmm1, -40(%rsp)
addss -40(%rsp), %xmm0
movss %xmm0, -56(%rsp)
movss -28(%rsp), %xmm0
addss -36(%rsp), %xmm0
movss %xmm0, -52(%rsp)
movq -56(%rsp), %rax
movq %rax, z(%rip)
ret
.cfi_endproc
.LFE0:
.size add, .-add
.ident "GCC: (GNU) 9.0.0 20190122 (experimental)"
.section .note.GNU-stack,"",@progbits
[hjl@gnu-cfl-1 v64-1]$

I am expecting:

addps %xmm1, %xmm0
movlps %xmm0, z(%rip)
retq

Index: config/i386/sse.md
===================================================================
--- config/i386/sse.md	(revision 268188)
+++ config/i386/sse.md	(working copy)
@@ -4997,37 +4997,49 @@ 
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 
 (define_insn "sse2_cvtpi2pd"
-  [(set (match_operand:V2DF 0 "register_operand" "=x,x")
-	(float:V2DF (match_operand:V2SI 1 "nonimmediate_operand" "y,m")))]
+  [(set (match_operand:V2DF 0 "register_operand" "=v,x")
+	(float:V2DF (match_operand:V2SI 1 "nonimmediate_operand" "vBm,?!y")))]
   "TARGET_SSE2"
-  "cvtpi2pd\t{%1, %0|%0, %1}"
+  "@
+   %vcvtdq2pd\t{%1, %0|%0, %1}
+   cvtpi2pd\t{%1, %0|%0, %1}"
   [(set_attr "type" "ssecvt")
-   (set_attr "unit" "mmx,*")
-   (set_attr "prefix_data16" "1,*")
+   (set_attr "unit" "*,mmx")
+   (set_attr "prefix_data16" "*,1")
+   (set_attr "prefix" "maybe_vex,*")
    (set_attr "mode" "V2DF")])
 
 (define_insn "sse2_cvtpd2pi"
-  [(set (match_operand:V2SI 0 "register_operand" "=y")
-	(unspec:V2SI [(match_operand:V2DF 1 "nonimmediate_operand" "xm")]
+  [(set (match_operand:V2SI 0 "register_operand" "=v,?!y")
+	(unspec:V2SI [(match_operand:V2DF 1 "nonimmediate_operand" "vBm,xm")]
 		     UNSPEC_FIX_NOTRUNC))]
   "TARGET_SSE2"
-  "cvtpd2pi\t{%1, %0|%0, %1}"
+  "@
+   * return TARGET_AVX ? \"vcvtpd2dq{x}\t{%1, %0|%0, %1}\" : \"cvtpd2dq\t{%1, %0|%0, %1}\";
+   cvtpd2pi\t{%1, %0|%0, %1}"
   [(set_attr "type" "ssecvt")
-   (set_attr "unit" "mmx")
+   (set_attr "unit" "*,mmx")
+   (set_attr "amdfam10_decode" "double")
+   (set_attr "athlon_decode" "vector")
    (set_attr "bdver1_decode" "double")
-   (set_attr "btver2_decode" "direct")
-   (set_attr "prefix_data16" "1")
-   (set_attr "mode" "DI")])
+   (set_attr "prefix_data16" "*,1")
+   (set_attr "prefix" "maybe_vex,*")
+   (set_attr "mode" "TI")])
 
 (define_insn "sse2_cvttpd2pi"
-  [(set (match_operand:V2SI 0 "register_operand" "=y")
-	(fix:V2SI (match_operand:V2DF 1 "nonimmediate_operand" "xm")))]
+  [(set (match_operand:V2SI 0 "register_operand" "=v,?!y")
+	(fix:V2SI (match_operand:V2DF 1 "nonimmediate_operand" "vBm,xm")))]
   "TARGET_SSE2"
-  "cvttpd2pi\t{%1, %0|%0, %1}"
+  "@
+   * return TARGET_AVX ? \"vcvttpd2dq{x}\t{%1, %0|%0, %1}\" : \"cvttpd2dq\t{%1, %0|%0, %1}\";
+   cvttpd2pi\t{%1, %0|%0, %1}"
   [(set_attr "type" "ssecvt")
-   (set_attr "unit" "mmx")
+   (set_attr "unit" "*,mmx")
+   (set_attr "amdfam10_decode" "double")
+   (set_attr "athlon_decode" "vector")
    (set_attr "bdver1_decode" "double")
-   (set_attr "prefix_data16" "1")
+   (set_attr "prefix_data16" "*,1")
+   (set_attr "prefix" "maybe_vex,*")
    (set_attr "mode" "TI")])
 
 (define_insn "sse2_cvtsi2sd"
Index: testsuite/g++.target/i386/pr88998.C
===================================================================
--- testsuite/g++.target/i386/pr88998.C	(nonexistent)
+++ testsuite/g++.target/i386/pr88998.C	(working copy)
@@ -0,0 +1,31 @@ 
+// PR target/88998
+// { dg-do run { target sse2_runtime } }
+// { dg-options "-O2 -msse2 -mfpmath=387" }
+// { dg-require-effective-target c++11 }
+
+#include <cassert>
+#include <unordered_map>
+#include <x86intrin.h>
+
+double
+__attribute__((noinline))
+prepare (int a, int b)
+{
+  __m128i is = _mm_setr_epi32 (a, b, 0, 0);
+  __m128d ds = _mm_cvtepi32_pd (is);
+  return ds[0] + ds[1];
+}
+
+int
+main (int, char **)
+{
+  double d = prepare (1, 2);
+
+  std::unordered_map < int, int >m;
+  m.insert ({0, 0});
+  m.insert ({1, 1});
+  assert (m.load_factor () <= m.max_load_factor ());
+
+  assert (d == 3);
+  return 0;
+}

[i386] Fix PR 88998, bad codegen with mmx instructions

Commit Message

Comments

Patch