From patchwork Tue Sep 22 14:34:32 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Christophe Leroy <christophe.leroy@c-s.fr>
X-Patchwork-Id: 521082
X-Patchwork-Delegate: davem@davemloft.net
Return-Path: <netdev-owner@vger.kernel.org>
X-Original-To: patchwork-incoming@ozlabs.org
Delivered-To: patchwork-incoming@ozlabs.org
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by ozlabs.org (Postfix) with ESMTP id 221901401AD
	for <patchwork-incoming@ozlabs.org>;
	Wed, 23 Sep 2015 00:36:08 +1000 (AEST)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S933351AbbIVOft (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);
	Tue, 22 Sep 2015 10:35:49 -0400
Received: from 2.236.17.93.rev.sfr.net ([93.17.236.2]:58440 "EHLO
	mailhub1.si.c-s.fr" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org
	with ESMTP id S933116AbbIVOef (ORCPT
	<rfc822;netdev@vger.kernel.org>); Tue, 22 Sep 2015 10:34:35 -0400
Received: from localhost (mailhub1-int [192.168.12.234])
	by localhost (Postfix) with ESMTP id DC1551C926A;
	Tue, 22 Sep 2015 16:34:33 +0200 (CEST)
X-Virus-Scanned: amavisd-new at c-s.fr
Received: from mailhub1.si.c-s.fr ([192.168.12.234])
	by localhost (mailhub1.c-s.fr [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id NiPBtQH7kLCv; Tue, 22 Sep 2015 16:34:33 +0200 (CEST)
Received: from messagerie.si.c-s.fr (messagerie [192.168.25.192])
	by pegase1.c-s.fr (Postfix) with ESMTP id C32471C9236;
	Tue, 22 Sep 2015 16:34:33 +0200 (CEST)
Received: from localhost (localhost [127.0.0.1])
	by messagerie.si.c-s.fr (Postfix) with ESMTP id AA6AEC73C9;
	Tue, 22 Sep 2015 16:34:33 +0200 (CEST)
X-Virus-Scanned: amavisd-new at c-s.fr
Received: from messagerie.si.c-s.fr ([127.0.0.1])
	by localhost (messagerie.si.c-s.fr [127.0.0.1]) (amavisd-new,
	port 10023)
	with ESMTP id DxjZd5ZIja0Q; Tue, 22 Sep 2015 16:34:33 +0200 (CEST)
Received: from PO10863.localdomain (unknown [192.168.232.142])
	by messagerie.si.c-s.fr (Postfix) with ESMTP id 4C51EC73C4;
	Tue, 22 Sep 2015 16:34:33 +0200 (CEST)
Received: by localhost.localdomain (Postfix, from userid 0)
	id 164FE1A2467; Tue, 22 Sep 2015 16:34:32 +0200 (CEST)
Message-Id: 
 <321c562c2320371ac39abd64959c160f8e2f5db7.1442876807.git.christophe.leroy@c-s.fr>
In-Reply-To: <cover.1442876807.git.christophe.leroy@c-s.fr>
References: <cover.1442876807.git.christophe.leroy@c-s.fr>
From: Christophe Leroy <christophe.leroy@c-s.fr>
Subject: [PATCH 7/9] powerpc32: optimise csum_partial() loop
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Paul Mackerras <paulus@samba.org>,
	Michael Ellerman <mpe@ellerman.id.au>, scottwood@freescale.com
Cc: linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	netdev@vger.kernel.org
Date: Tue, 22 Sep 2015 16:34:32 +0200 (CEST)
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID: <netdev.vger.kernel.org>
X-Mailing-List: netdev@vger.kernel.org

On the 8xx, a load has a latency of 2 cycles and a taken branch also
costs 2 cycles. So unroll the loop to handle four words per iteration,
which hides the load latency and amortises the branch cost.

This patch improves csum_partial() speed by around 10% on both:
* 8xx (single-issue processor with parallel execution)
* 83xx (superscalar 6xx processor with dual instruction fetch
  and parallel execution)

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/lib/checksum_32.S | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)
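
Note for reviewers (not part of the commit): the new loop structure can be
sketched in C roughly as below. This is only an illustrative model, not the
kernel code. The names add32_carry() and csum_words_unrolled() are
hypothetical, and the carry is modelled as an end-around carry per addition,
whereas the assembly chains the carry through consecutive adde instructions.
The sketch assumes the buffer is already word aligned and covers only the
whole-word part of the sum.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: 32-bit add with end-around carry, a simplified
 * stand-in for the adde carry chain used in the assembly. */
static uint32_t add32_carry(uint32_t sum, uint32_t w)
{
	uint64_t t = (uint64_t)sum + w;

	return (uint32_t)t + (uint32_t)(t >> 32);
}

/* Hypothetical model of the patched loop: first the (nwords & 3)
 * leftover words one at a time, then the rest in blocks of four. */
static uint32_t csum_words_unrolled(const uint32_t *p, size_t nwords,
				    uint32_t sum)
{
	size_t rem = nwords & 3;	/* like: andi. r6,r6,3 */
	size_t blocks = nwords >> 2;	/* like: srwi. r6,r4,4 (r4 in bytes) */

	while (rem--)
		sum = add32_carry(sum, *p++);

	while (blocks--) {
		/* Issue the four loads first, then the four additions,
		 * so each load has time to complete before its use. */
		uint32_t a = p[0], b = p[1], c = p[2], d = p[3];

		p += 4;
		sum = add32_carry(sum, a);
		sum = add32_carry(sum, b);
		sum = add32_carry(sum, c);
		sum = add32_carry(sum, d);
	}
	return sum;
}
```

Handling the leftover words before the blocks of four mirrors the order of
labels 2: and 22: in the patch, so the unrolled loop always runs a whole
number of iterations.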

diff --git a/arch/powerpc/lib/checksum_32.S b/arch/powerpc/lib/checksum_32.S
index 9c12602..0d34f47 100644
--- a/arch/powerpc/lib/checksum_32.S
+++ b/arch/powerpc/lib/checksum_32.S
@@ -38,10 +38,24 @@ _GLOBAL(csum_partial)
 	srwi.	r6,r4,2		/* # words to do */
 	adde	r5,r5,r0
 	beq	3f
-1:	mtctr	r6
+1:	andi.	r6,r6,3		/* Prepare to handle words 4 by 4 */
+	beq	21f
+	mtctr	r6
 2:	lwzu	r0,4(r3)
 	adde	r5,r5,r0
 	bdnz	2b
+21:	srwi.	r6,r4,4		/* # blocks of 4 words to do */
+	beq	3f
+	mtctr	r6
+22:	lwz	r0,4(r3)
+	lwz	r6,8(r3)
+	lwz	r7,12(r3)
+	lwzu	r8,16(r3)
+	adde	r5,r5,r0
+	adde	r5,r5,r6
+	adde	r5,r5,r7
+	adde	r5,r5,r8
+	bdnz	22b
 3:	andi.	r0,r4,2
 	beq+	4f
 	lhz	r0,4(r3)