Message ID | 1470293602-11121-1-git-send-email-anton@ozlabs.org (mailing list archive)
---|---
State | Accepted |
On 04/08/2016 at 08:53, Anton Blanchard wrote:
> From: Anton Blanchard <anton@samba.org>
>
> Align the hot loops in our assembly implementation of memset()
> and backwards_memcpy().
>
> backwards_memcpy() is called from tcp_v4_rcv(), so we might
> want to optimise this a little more.
>
> Signed-off-by: Anton Blanchard <anton@samba.org>

Shouldn't this patch be titled powerpc/64, as powerpc32 has a different
memset() ?

Christophe

> ---
>  arch/powerpc/lib/mem_64.S | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/powerpc/lib/mem_64.S b/arch/powerpc/lib/mem_64.S
> index 43435c6..eda7a96 100644
> --- a/arch/powerpc/lib/mem_64.S
> +++ b/arch/powerpc/lib/mem_64.S
> @@ -37,6 +37,7 @@ _GLOBAL(memset)
>  	clrldi	r5,r5,58
>  	mtctr	r0
>  	beq	5f
> +	.balign 16
> 4:	std	r4,0(r6)
>  	std	r4,8(r6)
>  	std	r4,16(r6)
> @@ -90,6 +91,7 @@ _GLOBAL(backwards_memcpy)
>  	andi.	r0,r6,3
>  	mtctr	r7
>  	bne	5f
> +	.balign 16
> 1:	lwz	r7,-4(r4)
>  	lwzu	r8,-8(r4)
>  	stw	r7,-4(r6)
Hi Christophe,

> > Align the hot loops in our assembly implementation of memset()
> > and backwards_memcpy().
> >
> > backwards_memcpy() is called from tcp_v4_rcv(), so we might
> > want to optimise this a little more.
> >
> > Signed-off-by: Anton Blanchard <anton@samba.org>
>
> Shouldn't this patch be titled powerpc/64, as powerpc32 has a
> different memset() ?

Yeah, good point. Michael can you make this change if you choose to
merge it?

Anton
On Thu, 4 Aug 2016 16:53:22 +1000
Anton Blanchard <anton@ozlabs.org> wrote:

> From: Anton Blanchard <anton@samba.org>
>
> Align the hot loops in our assembly implementation of memset()
> and backwards_memcpy().
>
> backwards_memcpy() is called from tcp_v4_rcv(), so we might
> want to optimise this a little more.
>
> Signed-off-by: Anton Blanchard <anton@samba.org>
> ---
>  arch/powerpc/lib/mem_64.S | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/powerpc/lib/mem_64.S b/arch/powerpc/lib/mem_64.S
> index 43435c6..eda7a96 100644
> --- a/arch/powerpc/lib/mem_64.S
> +++ b/arch/powerpc/lib/mem_64.S
> @@ -37,6 +37,7 @@ _GLOBAL(memset)
>  	clrldi	r5,r5,58
>  	mtctr	r0
>  	beq	5f
> +	.balign 16
> 4:	std	r4,0(r6)
>  	std	r4,8(r6)
>  	std	r4,16(r6)

Hmm. If we execute this loop once, we'll only fetch additional nops.
Twice, and we make up for them by not fetching unused instructions.
More than twice and we may start winning.

For large sizes it probably helps, but I'd like to see what sizes
memset sees.
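Nick's fetch-cost argument can be put into a toy model. All parameters below are illustrative assumptions, not figures from the thread: 16-byte aligned fetch blocks, a 32-byte (8-instruction) loop body, and one block of alignment nops fetched once on entry.

```python
# Toy model of the alignment trade-off, under assumed parameters
# (16-byte fetch blocks, 32-byte loop body); not figures from the thread.

FETCH_BLOCK = 16   # assumed fetch-block size in bytes
LOOP_BYTES = 32    # assumed loop body size (8 x 4-byte instructions)

def blocks_fetched(start, size, block=FETCH_BLOCK):
    """Aligned fetch blocks needed to cover [start, start + size)."""
    return (start + size - 1) // block - start // block + 1

def fetch_cost(loop_start, iterations, entry_nop_blocks=0):
    """Total blocks fetched: alignment nops once, loop body per iteration."""
    return entry_nop_blocks + iterations * blocks_fetched(loop_start, LOOP_BYTES)

for iters in (1, 2, 4):
    misaligned = fetch_cost(4, iters)                    # loop starts mid-block
    aligned = fetch_cost(16, iters, entry_nop_blocks=1)  # .balign'd loop
    print(iters, misaligned, aligned)
```

Under these assumptions the model reproduces the shape of the argument: one iteration is a wash (the nops cost what misalignment would have), two iterations break even or better, and from there the aligned loop pulls ahead.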
Hi Nick,

> Hmm. If we execute this loop once, we'll only fetch additional nops.
> Twice, and we make up for them by not fetching unused instructions.
> More than twice and we may start winning.
>
> For large sizes it probably helps, but I'd like to see what sizes
> memset sees.

I found this in a trace of nginx web serving. Looking back at it,
get_empty_filp() zeros a struct file, and we go through the loop 4
times.

We might want to look more generally at what lengths memset() is
called with though.

Anton
Hi Nick,

> Hmm. If we execute this loop once, we'll only fetch additional nops.
> Twice, and we make up for them by not fetching unused instructions.
> More than twice and we may start winning.
>
> For large sizes it probably helps, but I'd like to see what sizes
> memset sees.

I noticed this in an nginx web serving test. There are some 1 and 2
iteration calls, but quite a few larger ones - get_empty_filp() goes
for 4 iterations and sk_prot_alloc() for 26 iterations.

Anton
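For reference, the iteration counts quoted above map back to memset() lengths via the loop's 64 bytes per iteration (eight 8-byte `std` stores in the patched loop). A minimal sketch; the 256 and 1664 byte lengths are back-calculated from the iteration counts, not measured:

```python
# Map memset() lengths to iterations of the 64-byte store loop in
# arch/powerpc/lib/mem_64.S (eight std instructions x 8 bytes each).
# Example lengths are back-calculated from the thread's iteration
# counts, not measured.

BYTES_PER_ITER = 64

def loop_iterations(length):
    """Full iterations of the bulk store loop for a memset of `length`."""
    return length // BYTES_PER_ITER

print(loop_iterations(256))   # get_empty_filp(): 4 iterations
print(loop_iterations(1664))  # sk_prot_alloc(): 26 iterations
```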
On Sun, 25 Sep 2016 21:36:59 +1000
Anton Blanchard <anton@samba.org> wrote:

> Hi Nick,
>
> > Hmm. If we execute this loop once, we'll only fetch additional
> > nops. Twice, and we make up for them by not fetching unused
> > instructions. More than twice and we may start winning.
> >
> > For large sizes it probably helps, but I'd like to see what sizes
> > memset sees.
>
> I noticed this in an nginx web serving test. There are some 1 and 2
> iteration calls, but quite a few larger ones - get_empty_filp() goes
> for 4 iterations and sk_prot_alloc() for 26 iterations.

Hi Anton,

I didn't have anything against the patch as such, I just wondered if
it's likely to be an overall win.

Thanks,
Nick
On Thu, 2016-04-08 at 06:53:22 UTC, Anton Blanchard wrote:
> From: Anton Blanchard <anton@samba.org>
>
> Align the hot loops in our assembly implementation of memset()
> and backwards_memcpy().
>
> backwards_memcpy() is called from tcp_v4_rcv(), so we might
> want to optimise this a little more.
>
> Signed-off-by: Anton Blanchard <anton@samba.org>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/12ab11a2c09b30c1938c8e82e53908

cheers
diff --git a/arch/powerpc/lib/mem_64.S b/arch/powerpc/lib/mem_64.S
index 43435c6..eda7a96 100644
--- a/arch/powerpc/lib/mem_64.S
+++ b/arch/powerpc/lib/mem_64.S
@@ -37,6 +37,7 @@ _GLOBAL(memset)
 	clrldi	r5,r5,58
 	mtctr	r0
 	beq	5f
+	.balign 16
4:	std	r4,0(r6)
 	std	r4,8(r6)
 	std	r4,16(r6)
@@ -90,6 +91,7 @@ _GLOBAL(backwards_memcpy)
 	andi.	r0,r6,3
 	mtctr	r7
 	bne	5f
+	.balign 16
1:	lwz	r7,-4(r4)
 	lwzu	r8,-8(r4)
 	stw	r7,-4(r6)
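The effect of the `.balign 16` directive added in the diff can be sketched numerically: the assembler pads until the location counter is a multiple of 16, so the loop label starts on a 16-byte boundary (on powerpc, GAS pads code sections with 4-byte nop instructions). A small model of that padding:

```python
def balign_padding(addr, align=16):
    """Bytes of padding .balign inserts when the location counter
    is at `addr` (GAS pads code sections with nops on powerpc)."""
    return (align - addr % align) % align

print(balign_padding(0x40))  # already 16-byte aligned: 0 bytes
print(balign_padding(0x44))  # 12 bytes, i.e. three 4-byte nops
```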