Message ID | 20220207193906.2111349-1-goldstein.w.n@gmail.com |
---|---|
State | New |
Series | [v2] x86: Remove SSSE3 instruction for broadcast in memset.S (SSE2 Only) |
On Mon, Feb 7, 2022 at 11:39 AM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> commit b62ace2740a106222e124cc86956448fa07abf4d
> Author: Noah Goldstein <goldstein.w.n@gmail.com>
> Date:   Sun Feb 6 00:54:18 2022 -0600
>
>     x86: Improve vec generation in memset-vec-unaligned-erms.S
>
> Revert usage of 'pshufb' in broadcast logic as it is an SSSE3
> instruction and memset.S is restricted to only SSE2 instructions.
> ---
>  sysdeps/x86_64/memset.S | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/sysdeps/x86_64/memset.S b/sysdeps/x86_64/memset.S
> index ccf036be53..3f0517bbfc 100644
> --- a/sysdeps/x86_64/memset.S
> +++ b/sysdeps/x86_64/memset.S
> @@ -30,9 +30,10 @@
>
>  # define MEMSET_SET_VEC0_AND_SET_RETURN(d, r) \
>    movd d, %xmm0; \
> -  pxor %xmm1, %xmm1; \
> -  pshufb %xmm1, %xmm0; \
> -  movq r, %rax
> +  movq r, %rax; \
> +  punpcklbw %xmm0, %xmm0; \
> +  punpcklwd %xmm0, %xmm0; \
> +  pshufd $0, %xmm0, %xmm0
>
>  # define WMEMSET_SET_VEC0_AND_SET_RETURN(d, r) \
>    movd d, %xmm0; \
> --
> 2.25.1
>

LGTM.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

Thanks.
On Mon, Feb 7, 2022 at 1:49 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Mon, Feb 7, 2022 at 11:39 AM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> >
> > commit b62ace2740a106222e124cc86956448fa07abf4d
> > Author: Noah Goldstein <goldstein.w.n@gmail.com>
> > Date:   Sun Feb 6 00:54:18 2022 -0600
> >
> >     x86: Improve vec generation in memset-vec-unaligned-erms.S
> >
> > Revert usage of 'pshufb' in broadcast logic as it is an SSSE3
> > instruction and memset.S is restricted to only SSE2 instructions.
> > ---
> >  sysdeps/x86_64/memset.S | 7 ++++---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> >
> > diff --git a/sysdeps/x86_64/memset.S b/sysdeps/x86_64/memset.S
> > index ccf036be53..3f0517bbfc 100644
> > --- a/sysdeps/x86_64/memset.S
> > +++ b/sysdeps/x86_64/memset.S
> > @@ -30,9 +30,10 @@
> >
> >  # define MEMSET_SET_VEC0_AND_SET_RETURN(d, r) \
> >    movd d, %xmm0; \
> > -  pxor %xmm1, %xmm1; \
> > -  pshufb %xmm1, %xmm0; \
> > -  movq r, %rax
> > +  movq r, %rax; \
> > +  punpcklbw %xmm0, %xmm0; \
> > +  punpcklwd %xmm0, %xmm0; \
> > +  pshufd $0, %xmm0, %xmm0
> >
> >  # define WMEMSET_SET_VEC0_AND_SET_RETURN(d, r) \
> >    movd d, %xmm0; \
> > --
> > 2.25.1
> >
>
> LGTM.
>
> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
>
> Thanks.

Thanks, pushed.

>
> --
> H.J.
On Mon, Feb 7, 2022 at 12:56 PM Noah Goldstein via Libc-alpha <libc-alpha@sourceware.org> wrote:
>
> On Mon, Feb 7, 2022 at 1:49 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Mon, Feb 7, 2022 at 11:39 AM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> > >
> > > commit b62ace2740a106222e124cc86956448fa07abf4d
> > > Author: Noah Goldstein <goldstein.w.n@gmail.com>
> > > Date:   Sun Feb 6 00:54:18 2022 -0600
> > >
> > >     x86: Improve vec generation in memset-vec-unaligned-erms.S
> > >
> > > Revert usage of 'pshufb' in broadcast logic as it is an SSSE3
> > > instruction and memset.S is restricted to only SSE2 instructions.
> > > ---
> > >  sysdeps/x86_64/memset.S | 7 ++++---
> > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/sysdeps/x86_64/memset.S b/sysdeps/x86_64/memset.S
> > > index ccf036be53..3f0517bbfc 100644
> > > --- a/sysdeps/x86_64/memset.S
> > > +++ b/sysdeps/x86_64/memset.S
> > > @@ -30,9 +30,10 @@
> > >
> > >  # define MEMSET_SET_VEC0_AND_SET_RETURN(d, r) \
> > >    movd d, %xmm0; \
> > > -  pxor %xmm1, %xmm1; \
> > > -  pshufb %xmm1, %xmm0; \
> > > -  movq r, %rax
> > > +  movq r, %rax; \
> > > +  punpcklbw %xmm0, %xmm0; \
> > > +  punpcklwd %xmm0, %xmm0; \
> > > +  pshufd $0, %xmm0, %xmm0
> > >
> > >  # define WMEMSET_SET_VEC0_AND_SET_RETURN(d, r) \
> > >    movd d, %xmm0; \
> > > --
> > > 2.25.1
> > >
> >
> > LGTM.
> >
> > Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
> >
> > Thanks.
>
> Thanks, pushed.
> >
> > --
> > H.J.

I would like to backport this patch to release branches.
Any comments or objections?

--Sunil
diff --git a/sysdeps/x86_64/memset.S b/sysdeps/x86_64/memset.S
index ccf036be53..3f0517bbfc 100644
--- a/sysdeps/x86_64/memset.S
+++ b/sysdeps/x86_64/memset.S
@@ -30,9 +30,10 @@
 
 # define MEMSET_SET_VEC0_AND_SET_RETURN(d, r) \
   movd d, %xmm0; \
-  pxor %xmm1, %xmm1; \
-  pshufb %xmm1, %xmm0; \
-  movq r, %rax
+  movq r, %rax; \
+  punpcklbw %xmm0, %xmm0; \
+  punpcklwd %xmm0, %xmm0; \
+  pshufd $0, %xmm0, %xmm0
 
 # define WMEMSET_SET_VEC0_AND_SET_RETURN(d, r) \
   movd d, %xmm0; \
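For readers who want to convince themselves that the SSE2-only sequence in the patch produces the same broadcast as the removed pxor/pshufb pair, here is a rough scalar model in portable C. It is only a sketch and not part of the patch: the xmm_t type and the model_* helpers are hypothetical names made up for illustration, and it assumes d is a 32-bit register whose low byte holds the fill value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit XMM register; b[0] is the lowest byte. */
typedef struct { uint8_t b[16]; } xmm_t;

/* movd r32, xmm: value goes to the low dword, upper 96 bits are zeroed. */
static xmm_t model_movd(uint32_t v)
{
    xmm_t x;
    memset(x.b, 0, sizeof x.b);
    for (int i = 0; i < 4; i++)
        x.b[i] = (uint8_t)(v >> (8 * i));
    return x;
}

/* punpcklbw src, dst: interleave the low 8 bytes of dst and src. */
static xmm_t model_punpcklbw(xmm_t dst, xmm_t src)
{
    xmm_t r;
    for (int i = 0; i < 8; i++) {
        r.b[2 * i] = dst.b[i];
        r.b[2 * i + 1] = src.b[i];
    }
    return r;
}

/* punpcklwd src, dst: interleave the low 4 words of dst and src. */
static xmm_t model_punpcklwd(xmm_t dst, xmm_t src)
{
    xmm_t r;
    for (int i = 0; i < 4; i++) {
        memcpy(&r.b[4 * i], &dst.b[2 * i], 2);
        memcpy(&r.b[4 * i + 2], &src.b[2 * i], 2);
    }
    return r;
}

/* pshufd $imm, src, dst: result dword i is src dword ((imm >> 2i) & 3). */
static xmm_t model_pshufd(xmm_t src, uint8_t imm)
{
    xmm_t r;
    for (int i = 0; i < 4; i++) {
        int sel = (imm >> (2 * i)) & 3;
        memcpy(&r.b[4 * i], &src.b[4 * sel], 4);
    }
    return r;
}

/* Mirrors the vector half of the new macro body, step by step. */
xmm_t model_broadcast_byte(uint32_t d)
{
    xmm_t x = model_movd(d);     /* low byte c, then whatever else d held */
    x = model_punpcklbw(x, x);   /* bytes 0-1 are now c c                 */
    x = model_punpcklwd(x, x);   /* bytes 0-3 are now c c c c             */
    return model_pshufd(x, 0);   /* dword 0 copied to all four dwords     */
}
```

Calling model_broadcast_byte with any 32-bit value should give 16 copies of its low byte, regardless of the upper three bytes, which is the same result pshufb gives with an all-zero shuffle mask. The movq r, %rax in the macro only sets the return value and is independent of the vector work, so it is left out of the model.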