From patchwork Tue Sep 22 14:34:32 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christophe Leroy X-Patchwork-Id: 521082 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 221901401AD for ; Wed, 23 Sep 2015 00:36:08 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933351AbbIVOft (ORCPT ); Tue, 22 Sep 2015 10:35:49 -0400 Received: from 2.236.17.93.rev.sfr.net ([93.17.236.2]:58440 "EHLO mailhub1.si.c-s.fr" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S933116AbbIVOef (ORCPT ); Tue, 22 Sep 2015 10:34:35 -0400 Received: from localhost (mailhub1-int [192.168.12.234]) by localhost (Postfix) with ESMTP id DC1551C926A; Tue, 22 Sep 2015 16:34:33 +0200 (CEST) X-Virus-Scanned: amavisd-new at c-s.fr Received: from mailhub1.si.c-s.fr ([192.168.12.234]) by localhost (mailhub1.c-s.fr [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id NiPBtQH7kLCv; Tue, 22 Sep 2015 16:34:33 +0200 (CEST) Received: from messagerie.si.c-s.fr (messagerie [192.168.25.192]) by pegase1.c-s.fr (Postfix) with ESMTP id C32471C9236; Tue, 22 Sep 2015 16:34:33 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by messagerie.si.c-s.fr (Postfix) with ESMTP id AA6AEC73C9; Tue, 22 Sep 2015 16:34:33 +0200 (CEST) X-Virus-Scanned: amavisd-new at c-s.fr Received: from messagerie.si.c-s.fr ([127.0.0.1]) by localhost (messagerie.si.c-s.fr [127.0.0.1]) (amavisd-new, port 10023) with ESMTP id DxjZd5ZIja0Q; Tue, 22 Sep 2015 16:34:33 +0200 (CEST) Received: from PO10863.localdomain (unknown [192.168.232.142]) by messagerie.si.c-s.fr (Postfix) with ESMTP id 4C51EC73C4; Tue, 22 Sep 2015 16:34:33 +0200 (CEST) Received: by localhost.localdomain (Postfix, from userid 0) id 164FE1A2467; Tue, 22 Sep 2015 16:34:32 +0200 (CEST) Message-Id: <321c562c2320371ac39abd64959c160f8e2f5db7.1442876807.git.christophe.leroy@c-s.fr> In-Reply-To: References: From: Christophe Leroy Subject: [PATCH 7/9] powerpc32: optimise csum_partial() loop To: Benjamin Herrenschmidt , Paul Mackerras , Michael Ellerman , scottwood@freescale.com Cc: linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, netdev@vger.kernel.org Date: Tue, 22 Sep 2015 16:34:32 +0200 (CEST) Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org On the 8xx, load latency is 2 cycles and taking branches also takes 2 cycles. So let's unroll the loop. This patch improves csum_partial() speed by around 10% on both: * 8xx (single issue processor with parallele execution) * 83xx (superscalar 6xx processor with dual instruction fetch and parallele execution) Signed-off-by: Christophe Leroy --- arch/powerpc/lib/checksum_32.S | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/lib/checksum_32.S b/arch/powerpc/lib/checksum_32.S index 9c12602..0d34f47 100644 --- a/arch/powerpc/lib/checksum_32.S +++ b/arch/powerpc/lib/checksum_32.S @@ -38,10 +38,24 @@ _GLOBAL(csum_partial) srwi. r6,r4,2 /* # words to do */ adde r5,r5,r0 beq 3f -1: mtctr r6 +1: andi. r6,r6,3 /* Prepare to handle words 4 by 4 */ + beq 21f + mtctr r6 2: lwzu r0,4(r3) adde r5,r5,r0 bdnz 2b +21: srwi. r6,r4,4 /* # blocks of 4 words to do */ + beq 3f + mtctr r6 +22: lwz r0,4(r3) + lwz r6,8(r3) + lwz r7,12(r3) + lwzu r8,16(r3) + adde r5,r5,r0 + adde r5,r5,r6 + adde r5,r5,r7 + adde r5,r5,r8 + bdnz 22b 3: andi. r0,r4,2 beq+ 4f lhz r0,4(r3)