diff mbox series

x86: Only align destination to 1x VEC_SIZE in memset 4x loop

Message ID 20231101203026.2608879-1-goldstein.w.n@gmail.com
State New
Headers show
Series x86: Only align destination to 1x VEC_SIZE in memset 4x loop | expand

Commit Message

Noah Goldstein Nov. 1, 2023, 8:30 p.m. UTC
Current code aligns to 2x VEC_SIZE. Aligning to 2x has no effect on
performance other than potentially resulting in an additional
iteration of the loop.
1x maintains aligned stores (the only reason to align in this case)
and doesn't incur any unnecessary loop iterations.
---
 sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Noah Goldstein Nov. 23, 2023, 6:10 a.m. UTC | #1
On Wed, Nov 1, 2023 at 3:30 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> Current code aligns to 2x VEC_SIZE. Aligning to 2x has no effect on
> performance other than potentially resulting in an additional
> iteration of the loop.
> 1x maintains aligned stores (the only reason to align in this case)
> and doesn't incur any unnecessary loop iterations.
> ---
>  sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S b/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
> index 3d9ad49cb9..0f0636b90f 100644
> --- a/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
> +++ b/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
> @@ -293,7 +293,7 @@ L(more_2x_vec):
>         leaq    (VEC_SIZE * 4)(%rax), %LOOP_REG
>  #endif
>         /* Align dst for loop.  */
> -       andq    $(VEC_SIZE * -2), %LOOP_REG
> +       andq    $(VEC_SIZE * -1), %LOOP_REG
>         .p2align 4
>  L(loop):
>         VMOVA   %VMM(0), LOOP_4X_OFFSET(%LOOP_REG)
> --
> 2.34.1
>

ping.
Sunil Pandey Nov. 27, 2023, 11:11 p.m. UTC | #2
On Wed, Nov 22, 2023 at 10:11 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:

> On Wed, Nov 1, 2023 at 3:30 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> >
> > Current code aligns to 2x VEC_SIZE. Aligning to 2x has no effect on
> > performance other than potentially resulting in an additional
> > iteration of the loop.
> > 1x maintains aligned stores (the only reason to align in this case)
> > and doesn't incur any unnecessary loop iterations.
> > ---
> >  sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S b/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
> > index 3d9ad49cb9..0f0636b90f 100644
> > --- a/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
> > +++ b/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
> > @@ -293,7 +293,7 @@ L(more_2x_vec):
> >         leaq    (VEC_SIZE * 4)(%rax), %LOOP_REG
> >  #endif
> >         /* Align dst for loop.  */
> > -       andq    $(VEC_SIZE * -2), %LOOP_REG
> > +       andq    $(VEC_SIZE * -1), %LOOP_REG
> >         .p2align 4
> >  L(loop):
> >         VMOVA   %VMM(0), LOOP_4X_OFFSET(%LOOP_REG)
> > --
> > 2.34.1
> >
>
> ping.
>


LGTM
Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
Patch

diff --git a/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S b/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
index 3d9ad49cb9..0f0636b90f 100644
--- a/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
+++ b/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
@@ -293,7 +293,7 @@  L(more_2x_vec):
 	leaq	(VEC_SIZE * 4)(%rax), %LOOP_REG
 #endif
 	/* Align dst for loop.  */
-	andq	$(VEC_SIZE * -2), %LOOP_REG
+	andq	$(VEC_SIZE * -1), %LOOP_REG
 	.p2align 4
 L(loop):
 	VMOVA	%VMM(0), LOOP_4X_OFFSET(%LOOP_REG)