From patchwork Thu Jun 7 01:57:51 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Simon Guo X-Patchwork-Id: 926083 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) (using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 411TMr0K9Lz9s1B for ; Thu, 7 Jun 2018 12:03:04 +1000 (AEST) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="pwmQW4+G"; dkim-atps=neutral Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) by lists.ozlabs.org (Postfix) with ESMTP id 411TMq60JxzF327 for ; Thu, 7 Jun 2018 12:03:03 +1000 (AEST) Authentication-Results: lists.ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="pwmQW4+G"; dkim-atps=neutral X-Original-To: linuxppc-dev@lists.ozlabs.org Delivered-To: linuxppc-dev@lists.ozlabs.org Authentication-Results: lists.ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gmail.com (client-ip=2607:f8b0:400e:c00::241; helo=mail-pf0-x241.google.com; envelope-from=wei.guo.simon@gmail.com; receiver=) Authentication-Results: lists.ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="pwmQW4+G"; dkim-atps=neutral Received: from mail-pf0-x241.google.com (mail-pf0-x241.google.com [IPv6:2607:f8b0:400e:c00::241]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 411TGC6LttzDrnp for ; Thu, 7 Jun 2018 11:58:11 +1000 (AEST) Received: by mail-pf0-x241.google.com with SMTP id j20-v6so4037378pff.10 for ; Wed, 06 Jun 2018 18:58:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=wQV38E0BO+T8bucIXcPVIf+iv7/KjcxNmGRJnu1IGhY=; b=pwmQW4+Gf1GkVHzW+36UCEvO15y0hkefuZUif9Qr/JgbNMaUoYpS8IxxDbO9U9CbJK OC/WCNdbiwaANyTe9U1pU4Ez/u+qZXxdAiF9ytau8h9gbBhBw9OoqrQvLRbBIbSAuvaS eP3nD4Z2AgBzAKJDc9ZqPxwi8Y3StOsSqxNPkAOve+JDHZ85LxmHLD5VTZxTsR0AwTdn e9lI93/vskN5/Gn6Pj3xPcJXp5I3DCa/j2gvXmN9TkTIncmh9DheqtbpK9hA6jW/iSug X5s6pWOJkLL/xoKIQR9+LWrkaRswhM43qoquDMBhT1ilYQvcglKU5BaJvJDprzYFHNwS Pb4Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=wQV38E0BO+T8bucIXcPVIf+iv7/KjcxNmGRJnu1IGhY=; b=FKiBUILEA3OhJHduiXaKxpOqCgWjT1oSn45xh8Hua5FPlyofAdPk8EzJLTqHQbFGsN F2GicKftKgncFSqSipjC4njg3lCj7Wd7SSmT6iLNzhlJoXuxcHP0vwUXYZ8KN3kGFN5O aQj0BdsbiopOBJaJ/RqSiduecOKAj9hVWil4Vrf4cwdqXJD7i+OJ8uT2oE6bym9x0/Ef UnFB9gQrNyfiHwZ9WZUBs0SL+98rtPDdHkwe6owSG5R11gqzc8P6S8n0HFmr/6Nf76SZ f6ikOJL/d+qwu3AefW4xoN/Me6xFZ+Dj+rL4BrDqrwin82AeXoHTNxuzuvUTxk0v0UsK bg/A== X-Gm-Message-State: APt69E1MUeBcF7CPnE/07FCJuEYtfVjgXHovFRE6QDjh23PA3aZksxSB 1OsQoH65ktTrUUdM3hc0/Zi42w== X-Google-Smtp-Source: ADUXVKLAf2L3hKXZUjY1D11IwgsJvW4tYgfEgfKvnUD5IwvYm93vpOWfB6KQpLqoX93HAPQ0XdLFpQ== X-Received: by 2002:a62:3745:: with SMTP id e66-v6mr4678087pfa.43.1528336689404; Wed, 06 Jun 2018 18:58:09 -0700 (PDT) Received: from simonLocalRHEL7.cn.ibm.com ([112.73.0.89]) by smtp.gmail.com with ESMTPSA id g4-v6sm51946444pfg.38.2018.06.06.18.58.06 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 06 Jun 2018 18:58:08 -0700 (PDT) From: wei.guo.simon@gmail.com To: linuxppc-dev@lists.ozlabs.org Subject: [PATCH v8 1/5] powerpc/64: Align bytes before fall back to .Lshort in powerpc64 memcmp() Date: Thu, 7 Jun 2018 09:57:51 +0800 Message-Id: <1528336675-10879-2-git-send-email-wei.guo.simon@gmail.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1528336675-10879-1-git-send-email-wei.guo.simon@gmail.com> References: <1528336675-10879-1-git-send-email-wei.guo.simon@gmail.com> MIME-Version: 1.0 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.26 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: "Naveen N. Rao" , Simon Guo , Cyril Bur Errors-To: linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev" From: Simon Guo Currently memcmp() 64bytes version in powerpc will fall back to .Lshort (compare per byte mode) if either src or dst address is not 8 bytes aligned. It can be opmitized in 2 situations: 1) if both addresses are with the same offset with 8 bytes boundary: memcmp() can compare the unaligned bytes within 8 bytes boundary firstly and then compare the rest 8-bytes-aligned content with .Llong mode. 2) If src/dst addrs are not with the same offset of 8 bytes boundary: memcmp() can align src addr with 8 bytes, increment dst addr accordingly, then load src with aligned mode and load dst with unaligned mode. This patch optmizes memcmp() behavior in the above 2 situations. Tested with both little/big endian. Performance result below is based on little endian. Following is the test result with src/dst having the same offset case: (a similar result was observed when src/dst having different offset): (1) 256 bytes Test with the existing tools/testing/selftests/powerpc/stringloops/memcmp: - without patch 29.773018302 seconds time elapsed ( +- 0.09% ) - with patch 16.485568173 seconds time elapsed ( +- 0.02% ) -> There is ~+80% percent improvement (2) 32 bytes To observe performance impact on < 32 bytes, modify tools/testing/selftests/powerpc/stringloops/memcmp.c with following: ------- #include #include "utils.h" -#define SIZE 256 +#define SIZE 32 #define ITERATIONS 10000 int test_memcmp(const void *s1, const void *s2, size_t n); -------- - Without patch 0.244746482 seconds time elapsed ( +- 0.36%) - with patch 0.215069477 seconds time elapsed ( +- 0.51%) -> There is ~+13% improvement (3) 0~8 bytes To observe <8 bytes performance impact, modify tools/testing/selftests/powerpc/stringloops/memcmp.c with following: ------- #include #include "utils.h" -#define SIZE 256 -#define ITERATIONS 10000 +#define SIZE 8 +#define ITERATIONS 1000000 int test_memcmp(const void *s1, const void *s2, size_t n); ------- - Without patch 1.845642503 seconds time elapsed ( +- 0.12% ) - With patch 1.849767135 seconds time elapsed ( +- 0.26% ) -> They are nearly the same. (-0.2%) Signed-off-by: Simon Guo --- arch/powerpc/lib/memcmp_64.S | 140 ++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 133 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/lib/memcmp_64.S b/arch/powerpc/lib/memcmp_64.S index d75d18b..5776f91 100644 --- a/arch/powerpc/lib/memcmp_64.S +++ b/arch/powerpc/lib/memcmp_64.S @@ -24,28 +24,41 @@ #define rH r31 #ifdef __LITTLE_ENDIAN__ +#define LH lhbrx +#define LW lwbrx #define LD ldbrx #else +#define LH lhzx +#define LW lwzx #define LD ldx #endif +/* + * There are 2 categories for memcmp: + * 1) src/dst has the same offset to the 8 bytes boundary. The handlers + * are named like .Lsameoffset_xxxx + * 2) src/dst has different offset to the 8 bytes boundary. The handlers + * are named like .Ldiffoffset_xxxx + */ _GLOBAL(memcmp) cmpdi cr1,r5,0 - /* Use the short loop if both strings are not 8B aligned */ - or r6,r3,r4 + /* Use the short loop if the src/dst addresses are not + * with the same offset of 8 bytes align boundary. + */ + xor r6,r3,r4 andi. r6,r6,7 - /* Use the short loop if length is less than 32B */ - cmpdi cr6,r5,31 + /* Fall back to short loop if compare at aligned addrs + * with less than 8 bytes. + */ + cmpdi cr6,r5,7 beq cr1,.Lzero - bne .Lshort - bgt cr6,.Llong + bgt cr6,.Lno_short .Lshort: mtctr r5 - 1: lbz rA,0(r3) lbz rB,0(r4) subf. rC,rB,rA @@ -78,11 +91,89 @@ _GLOBAL(memcmp) li r3,0 blr +.Lno_short: + dcbt 0,r3 + dcbt 0,r4 + bne .Ldiffoffset_8bytes_make_align_start + + +.Lsameoffset_8bytes_make_align_start: + /* attempt to compare bytes not aligned with 8 bytes so that + * rest comparison can run based on 8 bytes alignment. + */ + andi. r6,r3,7 + + /* Try to compare the first double word which is not 8 bytes aligned: + * load the first double word at (src & ~7UL) and shift left appropriate + * bits before comparision. + */ + rlwinm r6,r3,3,26,28 + beq .Lsameoffset_8bytes_aligned + clrrdi r3,r3,3 + clrrdi r4,r4,3 + LD rA,0,r3 + LD rB,0,r4 + sld rA,rA,r6 + sld rB,rB,r6 + cmpld cr0,rA,rB + srwi r6,r6,3 + bne cr0,.LcmpAB_lightweight + subfic r6,r6,8 + subf. r5,r6,r5 + addi r3,r3,8 + addi r4,r4,8 + beq .Lzero + +.Lsameoffset_8bytes_aligned: + /* now we are aligned with 8 bytes. + * Use .Llong loop if left cmp bytes are equal or greater than 32B. + */ + cmpdi cr6,r5,31 + bgt cr6,.Llong + +.Lcmp_lt32bytes: + /* compare 1 ~ 32 bytes, at least r3 addr is 8 bytes aligned now */ + cmpdi cr5,r5,7 + srdi r0,r5,3 + ble cr5,.Lcmp_rest_lt8bytes + + /* handle 8 ~ 31 bytes */ + clrldi r5,r5,61 + mtctr r0 +2: + LD rA,0,r3 + LD rB,0,r4 + cmpld cr0,rA,rB + addi r3,r3,8 + addi r4,r4,8 + bne cr0,.LcmpAB_lightweight + bdnz 2b + + cmpwi r5,0 + beq .Lzero + +.Lcmp_rest_lt8bytes: + /* Here we have only less than 8 bytes to compare with. at least s1 + * Address is aligned with 8 bytes. + * The next double words are load and shift right with appropriate + * bits. + */ + subfic r6,r5,8 + slwi r6,r6,3 + LD rA,0,r3 + LD rB,0,r4 + srd rA,rA,r6 + srd rB,rB,r6 + cmpld cr0,rA,rB + bne cr0,.LcmpAB_lightweight + b .Lzero + .Lnon_zero: mr r3,rC blr .Llong: + /* At least s1 addr is aligned with 8 bytes */ li off8,8 li off16,16 li off24,24 @@ -232,4 +323,39 @@ _GLOBAL(memcmp) ld r28,-32(r1) ld r27,-40(r1) blr + +.LcmpAB_lightweight: /* skip NV GPRS restore */ + li r3,1 + bgtlr + li r3,-1 + blr + +.Ldiffoffset_8bytes_make_align_start: + /* now try to align s1 with 8 bytes */ + rlwinm r6,r3,3,26,28 + beq .Ldiffoffset_align_s1_8bytes + + clrrdi r3,r3,3 + LD rA,0,r3 + LD rB,0,r4 /* unaligned load */ + sld rA,rA,r6 + srd rA,rA,r6 + srd rB,rB,r6 + cmpld cr0,rA,rB + srwi r6,r6,3 + bne cr0,.LcmpAB_lightweight + + subfic r6,r6,8 + subf. r5,r6,r5 + addi r3,r3,8 + add r4,r4,r6 + + beq .Lzero + +.Ldiffoffset_align_s1_8bytes: + /* now s1 is aligned with 8 bytes. */ + cmpdi cr5,r5,31 + ble cr5,.Lcmp_lt32bytes + b .Llong + EXPORT_SYMBOL(memcmp) From patchwork Thu Jun 7 01:57:52 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Simon Guo X-Patchwork-Id: 926084 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) (using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 411TQt56slz9s1B for ; Thu, 7 Jun 2018 12:05:42 +1000 (AEST) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="EaHDMLC7"; dkim-atps=neutral Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) by lists.ozlabs.org (Postfix) with ESMTP id 411TQt3XrZzF32R for ; Thu, 7 Jun 2018 12:05:42 +1000 (AEST) Authentication-Results: lists.ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="EaHDMLC7"; dkim-atps=neutral X-Original-To: linuxppc-dev@lists.ozlabs.org Delivered-To: linuxppc-dev@lists.ozlabs.org Authentication-Results: lists.ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gmail.com (client-ip=2607:f8b0:400e:c01::242; helo=mail-pl0-x242.google.com; envelope-from=wei.guo.simon@gmail.com; receiver=) Authentication-Results: lists.ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="EaHDMLC7"; dkim-atps=neutral Received: from mail-pl0-x242.google.com (mail-pl0-x242.google.com [IPv6:2607:f8b0:400e:c01::242]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 411TGG08zZzDrpC for ; Thu, 7 Jun 2018 11:58:14 +1000 (AEST) Received: by mail-pl0-x242.google.com with SMTP id 31-v6so4992714plc.4 for ; Wed, 06 Jun 2018 18:58:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=hcnTkq1gsyJoFF+zDT/m3yGLd9Hnj1f8hUWnucSk0io=; b=EaHDMLC7VPeTgrRL/cXeKO46em/DfoQDlguTm7rC6dK3ws06FWJLFORIRL8OGGPI39 gjdzj0UH7phvVf0aVpNZrfsbW2dfZfFraMKO5ISmTW93gWLyJJ4DsUdn3rl1u+Y7xGMd 9GKAdI3QF/HBt+C9yhJpWs4rLrHN62WoFPcXDdDPta454i+D6+X6HUg8le8gu262K5US 3ujeaKeJHVjqkf+ArPYzCeiEeG+dOYWxjTJvvKNanTd3LE94PmEt2Z1vBDnOTmUEyFct K9YEXvpsfUISYI1+v6Mf1aLaIYAo1rj33GZJZgqzjYRSgmRG22y3zwCeKTAtiCLUulvB Hi8Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=hcnTkq1gsyJoFF+zDT/m3yGLd9Hnj1f8hUWnucSk0io=; b=fvgqSkR6PjDLQ7ieEdghIyrvI9ODlS0pEJJYDJMCT7HCbKQ8tdReTLBbYDkHhiUWJX fBuXJfU7i/Uo77UX+RwUY77+K55yPEkBUQjgQDQ8iA6W8ZcjHEND869dmX5XmvM2AGVw pD6xCopklzRgd58oPRBmMU3dbe/6oG2wDZWbvswHv82xQ1+iU4G99USHIrc6jEONyoQC Mb4TAJQkagMg9xCRVbquk5qKUWjbfQ27Xzt5apE3lvZPaVkgPLe7G2q9Dg1QRjGpju1s rmqrbYUMAMFMobfMF6WHjj2/0im2q9Ctb3uAk6Ey10Re+306qN6c+hjJKCqgzqgWulcC bUuA== X-Gm-Message-State: APt69E0yDMA6200GmruRaf5FSJowTr3XNFdb6Kf9rjQ8xAJBQGHgQPAK kjBgtalnV7aApyTQZsLSN0Vp+g== X-Google-Smtp-Source: ADUXVKILx6hOV2BqKnXeTVP2B3oM6ISHTU28LkhZVK91UeY0nk8ZV7sxGN6KIORnlRcjLl/Uetu6Sw== X-Received: by 2002:a17:902:b60c:: with SMTP id b12-v6mr5778683pls.44.1528336692139; Wed, 06 Jun 2018 18:58:12 -0700 (PDT) Received: from simonLocalRHEL7.cn.ibm.com ([112.73.0.89]) by smtp.gmail.com with ESMTPSA id g4-v6sm51946444pfg.38.2018.06.06.18.58.09 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 06 Jun 2018 18:58:11 -0700 (PDT) From: wei.guo.simon@gmail.com To: linuxppc-dev@lists.ozlabs.org Subject: [PATCH v8 2/5] powerpc: add vcmpequd/vcmpequb ppc instruction macro Date: Thu, 7 Jun 2018 09:57:52 +0800 Message-Id: <1528336675-10879-3-git-send-email-wei.guo.simon@gmail.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1528336675-10879-1-git-send-email-wei.guo.simon@gmail.com> References: <1528336675-10879-1-git-send-email-wei.guo.simon@gmail.com> X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.26 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: "Naveen N. Rao" , Simon Guo , Cyril Bur Errors-To: linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev" From: Simon Guo Some old tool chains don't know about instructions like vcmpequd. This patch adds .long macro for vcmpequd and vcmpequb, which is a preparation to optimize ppc64 memcmp with VMX instructions. Signed-off-by: Simon Guo --- arch/powerpc/include/asm/ppc-opcode.h | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h index 18883b8..1866a97 100644 --- a/arch/powerpc/include/asm/ppc-opcode.h +++ b/arch/powerpc/include/asm/ppc-opcode.h @@ -366,6 +366,8 @@ #define PPC_INST_STFDX 0x7c0005ae #define PPC_INST_LVX 0x7c0000ce #define PPC_INST_STVX 0x7c0001ce +#define PPC_INST_VCMPEQUD 0x100000c7 +#define PPC_INST_VCMPEQUB 0x10000006 /* macros to insert fields into opcodes */ #define ___PPC_RA(a) (((a) & 0x1f) << 16) @@ -396,6 +398,7 @@ #define __PPC_BI(s) (((s) & 0x1f) << 16) #define __PPC_CT(t) (((t) & 0x0f) << 21) #define __PPC_SPR(r) ((((r) & 0x1f) << 16) | ((((r) >> 5) & 0x1f) << 11)) +#define __PPC_RC21 (0x1 << 10) /* * Only use the larx hint bit on 64bit CPUs. e500v1/v2 based CPUs will treat a @@ -567,4 +570,12 @@ ((IH & 0x7) << 21)) #define PPC_INVALIDATE_ERAT PPC_SLBIA(7) +#define VCMPEQUD_RC(vrt, vra, vrb) stringify_in_c(.long PPC_INST_VCMPEQUD | \ + ___PPC_RT(vrt) | ___PPC_RA(vra) | \ + ___PPC_RB(vrb) | __PPC_RC21) + +#define VCMPEQUB_RC(vrt, vra, vrb) stringify_in_c(.long PPC_INST_VCMPEQUB | \ + ___PPC_RT(vrt) | ___PPC_RA(vra) | \ + ___PPC_RB(vrb) | __PPC_RC21) + #endif /* _ASM_POWERPC_PPC_OPCODE_H */ From patchwork Thu Jun 7 01:57:53 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Simon Guo X-Patchwork-Id: 926085 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from lists.ozlabs.org (lists.ozlabs.org [203.11.71.2]) (using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 411TTW09kcz9s1B for ; Thu, 7 Jun 2018 12:07:59 +1000 (AEST) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="W8f0txbj"; dkim-atps=neutral Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) by lists.ozlabs.org (Postfix) with ESMTP id 411TTV5k2mzF320 for ; Thu, 7 Jun 2018 12:07:58 +1000 (AEST) Authentication-Results: lists.ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="W8f0txbj"; dkim-atps=neutral X-Original-To: linuxppc-dev@lists.ozlabs.org Delivered-To: linuxppc-dev@lists.ozlabs.org Authentication-Results: lists.ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gmail.com (client-ip=2607:f8b0:400e:c01::242; helo=mail-pl0-x242.google.com; envelope-from=wei.guo.simon@gmail.com; receiver=) Authentication-Results: lists.ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="W8f0txbj"; dkim-atps=neutral Received: from mail-pl0-x242.google.com (mail-pl0-x242.google.com [IPv6:2607:f8b0:400e:c01::242]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 411TGK0m78zF324 for ; Thu, 7 Jun 2018 11:58:17 +1000 (AEST) Received: by mail-pl0-x242.google.com with SMTP id a7-v6so2488740plp.3 for ; Wed, 06 Jun 2018 18:58:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=TON2aWMDTVBToXIJsj6S4bQ40lBV3FhY41yBTVhpSAA=; b=W8f0txbjM4GmQvgd1liit48tBpiG75sNXpoEjmc/jb7HO94P/tnMCP9scCyx2qG8dr KNrJc3EewnFDaT8SO8/5mZyC4y4FO67hjZYepiMRqQFk9+hET+6elckS8lxk16rHr6Xa L8DDvKkYkbfIXPVtZjSIhfBvppHjZRMAR9I9THT12v++zds0PZBLS3Eo2WtUnHi9YpQr 3F3lnY2VzgCF3BvwBrzR4cFCYhNwz2qYg7ka1jN1i/ZR5n0fMhQEfcJpmTYDFTiALSiI DgnQrR/+pL61YcP8/w+Lnvh+hMAJEW9YeR4PFxhcocMhOpHi4zHlwfgcdohWzJiHH9aI kH4w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=TON2aWMDTVBToXIJsj6S4bQ40lBV3FhY41yBTVhpSAA=; b=Q/uike9obOorzt6SNgmzfKYIQDSnAikPvCt7adNHtAc5OQgobdpT15vREMmA+wXbM+ bOBHwGDpYSRy/Vu7xMfSU4Qn9nkxLLkB+ogmw4XTS77hnfCVsAQCZZwQZ0PCXnHRhsz1 9uElaToLuzA7ilejc+PTRbYzw7Bj9qWL67HRIxb1xKOH8ElCwDKYo4indlczDwuzfRIn liPMcyXkywyFXDM6KyxP28OzvlA2REWyRdbeXbyg1cd/QE/XgmMxRWxvFwqmX8zOpI6V vNRch+tjicv/i2p7v+La6C8+p0eMGhfEOhDgz0gudihRKCE8IB0n3DYEuUR+ZdcKZkTV bG/Q== X-Gm-Message-State: APt69E349swC482HXSTZv6fxi33oitXfg84UIMWG0GtZUuf1+TraJYsF iAYxAFdVruEWhfI8W9CbqrLpoA== X-Google-Smtp-Source: ADUXVKJv9p+RFcIPKO0/YBG/HbzSDouICi31VI4w4WcF2T6Z9CAHFjv/hNZTZzdAJHRw7F6SlR/Yzw== X-Received: by 2002:a17:902:8648:: with SMTP id y8-v6mr5540562plt.86.1528336694929; Wed, 06 Jun 2018 18:58:14 -0700 (PDT) Received: from simonLocalRHEL7.cn.ibm.com ([112.73.0.89]) by smtp.gmail.com with ESMTPSA id g4-v6sm51946444pfg.38.2018.06.06.18.58.12 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 06 Jun 2018 18:58:14 -0700 (PDT) From: wei.guo.simon@gmail.com To: linuxppc-dev@lists.ozlabs.org Subject: [PATCH v8 3/5] powerpc/64: enhance memcmp() with VMX instruction for long bytes comparision Date: Thu, 7 Jun 2018 09:57:53 +0800 Message-Id: <1528336675-10879-4-git-send-email-wei.guo.simon@gmail.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1528336675-10879-1-git-send-email-wei.guo.simon@gmail.com> References: <1528336675-10879-1-git-send-email-wei.guo.simon@gmail.com> X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.26 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: "Naveen N. Rao" , Simon Guo , Cyril Bur Errors-To: linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev" From: Simon Guo This patch add VMX primitives to do memcmp() in case the compare size is equal or greater than 4K bytes. KSM feature can benefit from this. Test result with following test program(replace the "^>" with ""): ------ ># cat tools/testing/selftests/powerpc/stringloops/memcmp.c >#include >#include >#include >#include >#include "utils.h" >#define SIZE (1024 * 1024 * 900) >#define ITERATIONS 40 int test_memcmp(const void *s1, const void *s2, size_t n); static int testcase(void) { char *s1; char *s2; unsigned long i; s1 = memalign(128, SIZE); if (!s1) { perror("memalign"); exit(1); } s2 = memalign(128, SIZE); if (!s2) { perror("memalign"); exit(1); } for (i = 0; i < SIZE; i++) { s1[i] = i & 0xff; s2[i] = i & 0xff; } for (i = 0; i < ITERATIONS; i++) { int ret = test_memcmp(s1, s2, SIZE); if (ret) { printf("return %d at[%ld]! should have returned zero\n", ret, i); abort(); } } return 0; } int main(void) { return test_harness(testcase, "memcmp"); } ------ Without this patch (but with the first patch "powerpc/64: Align bytes before fall back to .Lshort in powerpc64 memcmp()." in the series): 4.726728762 seconds time elapsed ( +- 3.54%) With VMX patch: 4.234335473 seconds time elapsed ( +- 2.63%) There is ~+10% improvement. Testing with unaligned and different offset version (make s1 and s2 shift random offset within 16 bytes) can archieve higher improvement than 10%.. Signed-off-by: Simon Guo --- arch/powerpc/include/asm/asm-prototypes.h | 4 +- arch/powerpc/lib/copypage_power7.S | 4 +- arch/powerpc/lib/memcmp_64.S | 241 +++++++++++++++++++++++++++++- arch/powerpc/lib/memcpy_power7.S | 6 +- arch/powerpc/lib/vmx-helper.c | 4 +- 5 files changed, 248 insertions(+), 11 deletions(-) diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h index d9713ad..31fdcee 100644 --- a/arch/powerpc/include/asm/asm-prototypes.h +++ b/arch/powerpc/include/asm/asm-prototypes.h @@ -49,8 +49,8 @@ void __trace_hcall_exit(long opcode, unsigned long retval, /* VMX copying */ int enter_vmx_usercopy(void); int exit_vmx_usercopy(void); -int enter_vmx_copy(void); -void * exit_vmx_copy(void *dest); +int enter_vmx_ops(void); +void *exit_vmx_ops(void *dest); /* Traps */ long machine_check_early(struct pt_regs *regs); diff --git a/arch/powerpc/lib/copypage_power7.S b/arch/powerpc/lib/copypage_power7.S index 8fa73b7..e38f956 100644 --- a/arch/powerpc/lib/copypage_power7.S +++ b/arch/powerpc/lib/copypage_power7.S @@ -57,7 +57,7 @@ _GLOBAL(copypage_power7) std r4,-STACKFRAMESIZE+STK_REG(R30)(r1) std r0,16(r1) stdu r1,-STACKFRAMESIZE(r1) - bl enter_vmx_copy + bl enter_vmx_ops cmpwi r3,0 ld r0,STACKFRAMESIZE+16(r1) ld r3,STK_REG(R31)(r1) @@ -100,7 +100,7 @@ _GLOBAL(copypage_power7) addi r3,r3,128 bdnz 1b - b exit_vmx_copy /* tail call optimise */ + b exit_vmx_ops /* tail call optimise */ #else li r0,(PAGE_SIZE/128) diff --git a/arch/powerpc/lib/memcmp_64.S b/arch/powerpc/lib/memcmp_64.S index 5776f91..be2f792 100644 --- a/arch/powerpc/lib/memcmp_64.S +++ b/arch/powerpc/lib/memcmp_64.S @@ -9,6 +9,7 @@ */ #include #include +#include #define off8 r6 #define off16 r7 @@ -27,12 +28,73 @@ #define LH lhbrx #define LW lwbrx #define LD ldbrx +#define LVS lvsr +#define VPERM(_VRT,_VRA,_VRB,_VRC) \ + vperm _VRT,_VRB,_VRA,_VRC #else #define LH lhzx #define LW lwzx #define LD ldx +#define LVS lvsl +#define VPERM(_VRT,_VRA,_VRB,_VRC) \ + vperm _VRT,_VRA,_VRB,_VRC #endif +#define VMX_THRESH 4096 +#define ENTER_VMX_OPS \ + mflr r0; \ + std r3,-STACKFRAMESIZE+STK_REG(R31)(r1); \ + std r4,-STACKFRAMESIZE+STK_REG(R30)(r1); \ + std r5,-STACKFRAMESIZE+STK_REG(R29)(r1); \ + std r0,16(r1); \ + stdu r1,-STACKFRAMESIZE(r1); \ + bl enter_vmx_ops; \ + cmpwi cr1,r3,0; \ + ld r0,STACKFRAMESIZE+16(r1); \ + ld r3,STK_REG(R31)(r1); \ + ld r4,STK_REG(R30)(r1); \ + ld r5,STK_REG(R29)(r1); \ + addi r1,r1,STACKFRAMESIZE; \ + mtlr r0 + +#define EXIT_VMX_OPS \ + mflr r0; \ + std r3,-STACKFRAMESIZE+STK_REG(R31)(r1); \ + std r4,-STACKFRAMESIZE+STK_REG(R30)(r1); \ + std r5,-STACKFRAMESIZE+STK_REG(R29)(r1); \ + std r0,16(r1); \ + stdu r1,-STACKFRAMESIZE(r1); \ + bl exit_vmx_ops; \ + ld r0,STACKFRAMESIZE+16(r1); \ + ld r3,STK_REG(R31)(r1); \ + ld r4,STK_REG(R30)(r1); \ + ld r5,STK_REG(R29)(r1); \ + addi r1,r1,STACKFRAMESIZE; \ + mtlr r0 + +/* + * LD_VSR_CROSS16B load the 2nd 16 bytes for _vaddr which is unaligned with + * 16 bytes boundary and permute the result with the 1st 16 bytes. + + * | y y y y y y y y y y y y y 0 1 2 | 3 4 5 6 7 8 9 a b c d e f z z z | + * ^ ^ ^ + * 0xbbbb10 0xbbbb20 0xbbb30 + * ^ + * _vaddr + * + * + * _vmask is the mask generated by LVS + * _v1st_qw is the 1st aligned QW of current addr which is already loaded. + * for example: 0xyyyyyyyyyyyyy012 for big endian + * _v2nd_qw is the 2nd aligned QW of cur _vaddr to be loaded. + * for example: 0x3456789abcdefzzz for big endian + * The permute result is saved in _v_res. + * for example: 0x0123456789abcdef for big endian. + */ +#define LD_VSR_CROSS16B(_vaddr,_vmask,_v1st_qw,_v2nd_qw,_v_res) \ + lvx _v2nd_qw,_vaddr,off16; \ + VPERM(_v_res,_v1st_qw,_v2nd_qw,_vmask) + /* * There are 2 categories for memcmp: * 1) src/dst has the same offset to the 8 bytes boundary. The handlers @@ -40,7 +102,7 @@ * 2) src/dst has different offset to the 8 bytes boundary. The handlers * are named like .Ldiffoffset_xxxx */ -_GLOBAL(memcmp) +_GLOBAL_TOC(memcmp) cmpdi cr1,r5,0 /* Use the short loop if the src/dst addresses are not @@ -132,7 +194,7 @@ _GLOBAL(memcmp) bgt cr6,.Llong .Lcmp_lt32bytes: - /* compare 1 ~ 32 bytes, at least r3 addr is 8 bytes aligned now */ + /* compare 1 ~ 31 bytes, at least r3 addr is 8 bytes aligned now */ cmpdi cr5,r5,7 srdi r0,r5,3 ble cr5,.Lcmp_rest_lt8bytes @@ -173,6 +235,15 @@ _GLOBAL(memcmp) blr .Llong: +#ifdef CONFIG_ALTIVEC +BEGIN_FTR_SECTION + /* Try to use vmx loop if length is equal or greater than 4K */ + cmpldi cr6,r5,VMX_THRESH + bge cr6,.Lsameoffset_vmx_cmp +END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) + +.Llong_novmx_cmp: +#endif /* At least s1 addr is aligned with 8 bytes */ li off8,8 li off16,16 @@ -330,7 +401,97 @@ _GLOBAL(memcmp) li r3,-1 blr +#ifdef CONFIG_ALTIVEC +.Lsameoffset_vmx_cmp: + /* Enter with src/dst addrs has the same offset with 8 bytes + * align boundary + */ + ENTER_VMX_OPS + beq cr1,.Llong_novmx_cmp + +3: + /* need to check whether r4 has the same offset with r3 + * for 16 bytes boundary. + */ + xor r0,r3,r4 + andi. r0,r0,0xf + bne .Ldiffoffset_vmx_cmp_start + + /* len is no less than 4KB. Need to align with 16 bytes further. + */ + andi. rA,r3,8 + LD rA,0,r3 + beq 4f + LD rB,0,r4 + cmpld cr0,rA,rB + addi r3,r3,8 + addi r4,r4,8 + addi r5,r5,-8 + + beq cr0,4f + /* save and restore cr0 */ + mfocrf r5,128 + EXIT_VMX_OPS + mtocrf 128,r5 + b .LcmpAB_lightweight + +4: + /* compare 32 bytes for each loop */ + srdi r0,r5,5 + mtctr r0 + clrldi r5,r5,59 + li off16,16 + +.balign 16 +5: + lvx v0,0,r3 + lvx v1,0,r4 + VCMPEQUD_RC(v0,v0,v1) + bnl cr6,7f + lvx v0,off16,r3 + lvx v1,off16,r4 + VCMPEQUD_RC(v0,v0,v1) + bnl cr6,6f + addi r3,r3,32 + addi r4,r4,32 + bdnz 5b + + EXIT_VMX_OPS + cmpdi r5,0 + beq .Lzero + b .Lcmp_lt32bytes + +6: + addi r3,r3,16 + addi r4,r4,16 + +7: + /* diff the last 16 bytes */ + EXIT_VMX_OPS + LD rA,0,r3 + LD rB,0,r4 + cmpld cr0,rA,rB + li off8,8 + bne cr0,.LcmpAB_lightweight + + LD rA,off8,r3 + LD rB,off8,r4 + cmpld cr0,rA,rB + bne cr0,.LcmpAB_lightweight + b .Lzero +#endif + .Ldiffoffset_8bytes_make_align_start: +#ifdef CONFIG_ALTIVEC +BEGIN_FTR_SECTION + /* only do vmx ops when the size equal or greater than 4K bytes */ + cmpdi cr5,r5,VMX_THRESH + bge cr5,.Ldiffoffset_vmx_cmp +END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) + +.Ldiffoffset_novmx_cmp: +#endif + /* now try to align s1 with 8 bytes */ rlwinm r6,r3,3,26,28 beq .Ldiffoffset_align_s1_8bytes @@ -356,6 +517,82 @@ _GLOBAL(memcmp) /* now s1 is aligned with 8 bytes. */ cmpdi cr5,r5,31 ble cr5,.Lcmp_lt32bytes + +#ifdef CONFIG_ALTIVEC + b .Llong_novmx_cmp +#else b .Llong +#endif + +#ifdef CONFIG_ALTIVEC +.Ldiffoffset_vmx_cmp: + ENTER_VMX_OPS + beq cr1,.Ldiffoffset_novmx_cmp + +.Ldiffoffset_vmx_cmp_start: + /* Firstly try to align r3 with 16 bytes */ + andi. r6,r3,0xf + li off16,16 + beq .Ldiffoffset_vmx_s1_16bytes_align + LVS v3,0,r3 + LVS v4,0,r4 + + lvx v5,0,r3 + lvx v6,0,r4 + LD_VSR_CROSS16B(r3,v3,v5,v7,v9) + LD_VSR_CROSS16B(r4,v4,v6,v8,v10) + + VCMPEQUB_RC(v7,v9,v10) + bnl cr6,.Ldiffoffset_vmx_diff_found + + subfic r6,r6,16 + subf r5,r6,r5 + add r3,r3,r6 + add r4,r4,r6 + +.Ldiffoffset_vmx_s1_16bytes_align: + /* now s1 is aligned with 16 bytes */ + lvx v6,0,r4 + LVS v4,0,r4 + srdi r6,r5,5 /* loop for 32 bytes each */ + clrldi r5,r5,59 + mtctr r6 + +.balign 16 +.Ldiffoffset_vmx_32bytesloop: + /* the first qw of r4 was saved in v6 */ + lvx v9,0,r3 + LD_VSR_CROSS16B(r4,v4,v6,v8,v10) + VCMPEQUB_RC(v7,v9,v10) + vor v6,v8,v8 + bnl cr6,.Ldiffoffset_vmx_diff_found + + addi r3,r3,16 + addi r4,r4,16 + + lvx v9,0,r3 + LD_VSR_CROSS16B(r4,v4,v6,v8,v10) + VCMPEQUB_RC(v7,v9,v10) + vor v6,v8,v8 + bnl cr6,.Ldiffoffset_vmx_diff_found + + addi r3,r3,16 + addi r4,r4,16 + + bdnz .Ldiffoffset_vmx_32bytesloop + + EXIT_VMX_OPS + + cmpdi r5,0 + beq .Lzero + b .Lcmp_lt32bytes + +.Ldiffoffset_vmx_diff_found: + EXIT_VMX_OPS + /* anyway, the diff will appear in next 16 bytes */ + li r5,16 + b .Lcmp_lt32bytes + +#endif EXPORT_SYMBOL(memcmp) diff --git a/arch/powerpc/lib/memcpy_power7.S b/arch/powerpc/lib/memcpy_power7.S index df7de9d..070cdf6 100644 --- a/arch/powerpc/lib/memcpy_power7.S +++ b/arch/powerpc/lib/memcpy_power7.S @@ -230,7 +230,7 @@ _GLOBAL(memcpy_power7) std r5,-STACKFRAMESIZE+STK_REG(R29)(r1) std r0,16(r1) stdu r1,-STACKFRAMESIZE(r1) - bl enter_vmx_copy + bl enter_vmx_ops cmpwi cr1,r3,0 ld r0,STACKFRAMESIZE+16(r1) ld r3,STK_REG(R31)(r1) @@ -445,7 +445,7 @@ _GLOBAL(memcpy_power7) 15: addi r1,r1,STACKFRAMESIZE ld r3,-STACKFRAMESIZE+STK_REG(R31)(r1) - b exit_vmx_copy /* tail call optimise */ + b exit_vmx_ops /* tail call optimise */ .Lvmx_unaligned_copy: /* Get the destination 16B aligned */ @@ -649,5 +649,5 @@ _GLOBAL(memcpy_power7) 15: addi r1,r1,STACKFRAMESIZE ld r3,-STACKFRAMESIZE+STK_REG(R31)(r1) - b exit_vmx_copy /* tail call optimise */ + b exit_vmx_ops /* tail call optimise */ #endif /* CONFIG_ALTIVEC */ diff --git a/arch/powerpc/lib/vmx-helper.c b/arch/powerpc/lib/vmx-helper.c index bf925cd..9f34049 100644 --- a/arch/powerpc/lib/vmx-helper.c +++ b/arch/powerpc/lib/vmx-helper.c @@ -53,7 +53,7 @@ int exit_vmx_usercopy(void) return 0; } -int enter_vmx_copy(void) +int enter_vmx_ops(void) { if (in_interrupt()) return 0; @@ -70,7 +70,7 @@ int enter_vmx_copy(void) * passed a pointer to the destination which we return as required by a * memcpy implementation. */ -void *exit_vmx_copy(void *dest) +void *exit_vmx_ops(void *dest) { disable_kernel_altivec(); preempt_enable(); From patchwork Thu Jun 7 01:57:54 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Simon Guo X-Patchwork-Id: 926086 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) (using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 411TXT09W3z9s1B for ; Thu, 7 Jun 2018 12:10:33 +1000 (AEST) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="KdnCEdBk"; dkim-atps=neutral Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) by lists.ozlabs.org (Postfix) with ESMTP id 411TXS5XxXzF323 for ; Thu, 7 Jun 2018 12:10:32 +1000 (AEST) Authentication-Results: lists.ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="KdnCEdBk"; dkim-atps=neutral X-Original-To: linuxppc-dev@lists.ozlabs.org Delivered-To: linuxppc-dev@lists.ozlabs.org Authentication-Results: lists.ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gmail.com (client-ip=2607:f8b0:400e:c05::244; helo=mail-pg0-x244.google.com; envelope-from=wei.guo.simon@gmail.com; receiver=) Authentication-Results: lists.ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="KdnCEdBk"; dkim-atps=neutral Received: from mail-pg0-x244.google.com (mail-pg0-x244.google.com [IPv6:2607:f8b0:400e:c05::244]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 411TGM5LfCzF1Nq for ; Thu, 7 Jun 2018 11:58:19 +1000 (AEST) Received: by mail-pg0-x244.google.com with SMTP id p21-v6so3907503pgd.11 for ; Wed, 06 Jun 2018 18:58:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=I8LloTzAYJTbCjwYDFdHSAiYJ7H2lIMC40hQNVhOIcc=; b=KdnCEdBkuisI3Vn2IIeUdNE68Eon0LCxwY5/wmABnYz5jMUB0Mhmhg38oU6clf3Anl 5F/EauBJHduAvnAy5LZniHbB7H4tT5kJJzTp/2xCzH3+5x6m70wamjnox9qk/hp1lK9y 40Git7qg/8rZBrGx+26IUCyZ82qwDNRHllruayfNzJAqV7QkvffEkNcH0BUMIbRBMJpb ovBbRxYZRO/QUJ24kfFXp7oWEeB+j5rQbnFaHaDNKkVAdNY7/AiRgRl8RRYsJynpKfOD NFfZCcoc9AYSJhj/wYI57VouuceaVsuUzLAZ8rAnkYB0oE2k3zmJaFiM8vK8a43L8rAo pUFA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=I8LloTzAYJTbCjwYDFdHSAiYJ7H2lIMC40hQNVhOIcc=; b=ja9SxtJ60j9uXsjWPqzCeRcQ64c17UurYFV+qejHQgBbeKMnZbBUrY2iNS/TgtFrog PZuGJW+vUjHf/pTme0/sdiYI1JxOJpYJfZ6l7ZYUcdfwY5pnmNSpVc43Wjj+5XUAiNem qxdZ4v5coZqOVObg8exH3LhQ92B2CdFDD3NvqQbaYp/MHQ0kqpZLQPs24QSYpzgNMPC5 cVPoXA7awaMH5sqq8CFZ7Q/rHKatPSaXR7JPqtt+rkbqeCXDvrfbIzAyGOWVqIOSl0n3 /AIUsPCwZ9R+UfcYaKwkpn895nc2iqj1T1GJGKywsfX0jgoHWu716PbiKR4ppxH2A39D nzHw== X-Gm-Message-State: APt69E0Pt2R+NrpuL/Xg5CNdTmJJufh54ciHtxwEfJcCjUylguQKLy0A 1kXQXy+KHhFOH7aONN3Snbsmvg== X-Google-Smtp-Source: ADUXVKISYbVxLeX52aaRqJcIdJFF5iScER/XZIfc/45zd4CnkWq+FjsOlXkCLolLpJPen/D2e1i+Rw== X-Received: by 2002:a65:62ce:: with SMTP id m14-v6mr4442399pgv.407.1528336697804; Wed, 06 Jun 2018 18:58:17 -0700 (PDT) Received: from simonLocalRHEL7.cn.ibm.com ([112.73.0.89]) by smtp.gmail.com with ESMTPSA id g4-v6sm51946444pfg.38.2018.06.06.18.58.15 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 06 Jun 2018 18:58:17 -0700 (PDT) From: wei.guo.simon@gmail.com To: linuxppc-dev@lists.ozlabs.org Subject: [PATCH v8 4/5] powerpc/64: add 32 bytes prechecking before using VMX optimization on memcmp() Date: Thu, 7 Jun 2018 09:57:54 +0800 Message-Id: <1528336675-10879-5-git-send-email-wei.guo.simon@gmail.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1528336675-10879-1-git-send-email-wei.guo.simon@gmail.com> References: <1528336675-10879-1-git-send-email-wei.guo.simon@gmail.com> X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.26 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: "Naveen N. Rao" , Simon Guo , Cyril Bur Errors-To: linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev" From: Simon Guo This patch is based on the previous VMX patch on memcmp(). To optimize ppc64 memcmp() with VMX instruction, we need to think about the VMX penalty brought with: If kernel uses VMX instruction, it needs to save/restore current thread's VMX registers. There are 32 x 128 bits VMX registers in PPC, which means 32 x 16 = 512 bytes for load and store. The major concern regarding the memcmp() performance in kernel is KSM, who will use memcmp() frequently to merge identical pages. So it will make sense to take some measures/enhancement on KSM to see whether any improvement can be done here. Cyril Bur indicates that the memcmp() for KSM has a higher possibility to fail (unmatch) early in previous bytes in following mail. https://patchwork.ozlabs.org/patch/817322/#1773629 And I am taking a follow-up on this with this patch. Per some testing, it shows KSM memcmp() will fail early at previous 32 bytes. More specifically: - 76% cases will fail/unmatch before 16 bytes; - 83% cases will fail/unmatch before 32 bytes; - 84% cases will fail/unmatch before 64 bytes; So 32 bytes looks a better choice than other bytes for pre-checking. The early failure is also true for memcmp() for non-KSM case. With a non-typical call load, it shows ~73% cases fail before first 32 bytes. This patch adds a 32 bytes pre-checking firstly before jumping into VMX operations, to avoid the unnecessary VMX penalty. It is not limited to KSM case. And the testing shows ~20% improvement on memcmp() average execution time with this patch. And note the 32B pre-checking is only performed when the compare size is long enough (>=4K currently) to allow VMX operation. The detail data and analysis is at: https://github.com/justdoitqd/publicFiles/blob/master/memcmp/README.md Signed-off-by: Simon Guo --- arch/powerpc/lib/memcmp_64.S | 57 +++++++++++++++++++++++++++++++++++--------- 1 file changed, 46 insertions(+), 11 deletions(-) diff --git a/arch/powerpc/lib/memcmp_64.S b/arch/powerpc/lib/memcmp_64.S index be2f792..844d8e7 100644 --- a/arch/powerpc/lib/memcmp_64.S +++ b/arch/powerpc/lib/memcmp_64.S @@ -404,8 +404,27 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) #ifdef CONFIG_ALTIVEC .Lsameoffset_vmx_cmp: /* Enter with src/dst addrs has the same offset with 8 bytes - * align boundary + * align boundary. + * + * There is an optimization based on following fact: memcmp() + * prones to fail early at the first 32 bytes. + * Before applying VMX instructions which will lead to 32x128bits + * VMX regs load/restore penalty, we compare the first 32 bytes + * so that we can catch the ~80% fail cases. */ + + li r0,4 + mtctr r0 +.Lsameoffset_prechk_32B_loop: + LD rA,0,r3 + LD rB,0,r4 + cmpld cr0,rA,rB + addi r3,r3,8 + addi r4,r4,8 + bne cr0,.LcmpAB_lightweight + addi r5,r5,-8 + bdnz .Lsameoffset_prechk_32B_loop + ENTER_VMX_OPS beq cr1,.Llong_novmx_cmp @@ -482,16 +501,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) #endif .Ldiffoffset_8bytes_make_align_start: -#ifdef CONFIG_ALTIVEC -BEGIN_FTR_SECTION - /* only do vmx ops when the size equal or greater than 4K bytes */ - cmpdi cr5,r5,VMX_THRESH - bge cr5,.Ldiffoffset_vmx_cmp -END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) - -.Ldiffoffset_novmx_cmp: -#endif - /* now try to align s1 with 8 bytes */ rlwinm r6,r3,3,26,28 beq .Ldiffoffset_align_s1_8bytes @@ -515,6 +524,17 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) .Ldiffoffset_align_s1_8bytes: /* now s1 is aligned with 8 bytes. */ +#ifdef CONFIG_ALTIVEC +BEGIN_FTR_SECTION + /* only do vmx ops when the size equal or greater than 4K bytes */ + cmpdi cr5,r5,VMX_THRESH + bge cr5,.Ldiffoffset_vmx_cmp +END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) + +.Ldiffoffset_novmx_cmp: +#endif + + cmpdi cr5,r5,31 ble cr5,.Lcmp_lt32bytes @@ -526,6 +546,21 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) #ifdef CONFIG_ALTIVEC .Ldiffoffset_vmx_cmp: + /* perform a 32 bytes pre-checking before + * enable VMX operations. + */ + li r0,4 + mtctr r0 +.Ldiffoffset_prechk_32B_loop: + LD rA,0,r3 + LD rB,0,r4 + cmpld cr0,rA,rB + addi r3,r3,8 + addi r4,r4,8 + bne cr0,.LcmpAB_lightweight + addi r5,r5,-8 + bdnz .Ldiffoffset_prechk_32B_loop + ENTER_VMX_OPS beq cr1,.Ldiffoffset_novmx_cmp From patchwork Thu Jun 7 01:57:55 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Simon Guo X-Patchwork-Id: 926087 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) (using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 411TbR4rHLz9s1B for ; Thu, 7 Jun 2018 12:13:07 +1000 (AEST) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="LKgZbyfB"; dkim-atps=neutral Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) by lists.ozlabs.org (Postfix) with ESMTP id 411TbR3H11zF32Y for ; Thu, 7 Jun 2018 12:13:07 +1000 (AEST) Authentication-Results: lists.ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="LKgZbyfB"; dkim-atps=neutral X-Original-To: linuxppc-dev@lists.ozlabs.org Delivered-To: linuxppc-dev@lists.ozlabs.org Authentication-Results: lists.ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gmail.com (client-ip=2607:f8b0:400e:c05::241; helo=mail-pg0-x241.google.com; envelope-from=wei.guo.simon@gmail.com; receiver=) Authentication-Results: lists.ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="LKgZbyfB"; dkim-atps=neutral Received: from mail-pg0-x241.google.com (mail-pg0-x241.google.com [IPv6:2607:f8b0:400e:c05::241]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 411TGQ4f7CzDrnp for ; Thu, 7 Jun 2018 11:58:22 +1000 (AEST) Received: by mail-pg0-x241.google.com with SMTP id 15-v6so3916440pge.2 for ; Wed, 06 Jun 2018 18:58:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=nx4HSBjrSPpcRHZpn2f7wQUJJ2j+rIUz8cUxx08VhAQ=; b=LKgZbyfBIi8VRkh7XFudPX7DQWHh63ItVT7jwlsTLsx7C1f2lP+AdM3zVvOD7o+Hk7 OXEM1zXFxSkRGTHq7kbg0ehMBDwqqBMLQlaxEMFRtwMvj1B+7Ipy1TSKWEBZWvUUis6V uS+fhKHmQ92bipPrmr5wlTJxlhOeosHMpoa7u/5+vL321XkIOyJqIrd3TXnjAFQfWouB wQiUoiZgd+60QCAcaqT0eJXPyBqBgFqF4SluHSLpCzznn1gRlAYyvBuGzfXdEEW1PLee W3TEzvvANGD8mBjzBu3SEHVE5MGgnXxgg2CPIm3YIjZBl0DNwEialqm3Ud4/MoaOvyYw IGGg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=nx4HSBjrSPpcRHZpn2f7wQUJJ2j+rIUz8cUxx08VhAQ=; b=b7WUWmLHOo7N6VbAaLN0jYxOPavvqC/U17iEZhkC32+MHRWo07w8G8kRBIQENWDMj+ yC2IJ9GM9FRoW5Stdb+1QtT2dZoC1CrMlebwrvu8m/cdZXR1Z54O4yxS69QM9K6hq4H0 bD6ntU4avdBKE/yoyBEt8RCHoZsKh4wDYvwpWwGurBWhF2hDojCzBjILtyUotP8vXyoh mzZUliyOsBka08+tG38FMgvECcpWyMB/9zT9n9y5s18s/VDvRhw2IiCRDfTelVVRA4JC oMM4JtVS/IlMfS+J4ZVLgnOdGbgbMloPL2uhjzd+E8TP8oGP+Vv2EVXne+gkEQcsWfhb x9YA== X-Gm-Message-State: APt69E2K/F1UYFTo+HyrpSmBuYRXMMp3iyTnJTgD9lN0dNvY3gcYHmyc ItjQrWbQQB3N36py8bzu5Pn8PA== X-Google-Smtp-Source: ADUXVKIXNdaKFgR9lm9yayI1nTzxUtkMN0E7DvACGhObfFN0JJOPHV87DKKd2E1Y7jCfF1Q556rRuw== X-Received: by 2002:a63:6485:: with SMTP id y127-v6mr2720754pgb.126.1528336700645; Wed, 06 Jun 2018 18:58:20 -0700 (PDT) Received: from simonLocalRHEL7.cn.ibm.com ([112.73.0.89]) by smtp.gmail.com with ESMTPSA id g4-v6sm51946444pfg.38.2018.06.06.18.58.18 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 06 Jun 2018 18:58:20 -0700 (PDT) From: wei.guo.simon@gmail.com To: linuxppc-dev@lists.ozlabs.org Subject: [PATCH v8 5/5] powerpc:selftest update memcmp_64 selftest for VMX implementation Date: Thu, 7 Jun 2018 09:57:55 +0800 Message-Id: <1528336675-10879-6-git-send-email-wei.guo.simon@gmail.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1528336675-10879-1-git-send-email-wei.guo.simon@gmail.com> References: <1528336675-10879-1-git-send-email-wei.guo.simon@gmail.com> X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.26 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: "Naveen N. Rao" , Simon Guo , Cyril Bur Errors-To: linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev" From: Simon Guo This patch reworked selftest memcmp_64 so that memcmp selftest can cover more test cases. It adds testcases for: - memcmp over 4K bytes size. - s1/s2 with different/random offset on 16 bytes boundary. - enter/exit_vmx_ops pairness. Signed-off-by: Simon Guo --- .../selftests/powerpc/copyloops/asm/ppc_asm.h | 4 +- .../selftests/powerpc/stringloops/asm/ppc-opcode.h | 39 +++++++++ .../selftests/powerpc/stringloops/asm/ppc_asm.h | 25 ++++++ .../testing/selftests/powerpc/stringloops/memcmp.c | 98 +++++++++++++++++----- 4 files changed, 142 insertions(+), 24 deletions(-) create mode 100644 tools/testing/selftests/powerpc/stringloops/asm/ppc-opcode.h diff --git a/tools/testing/selftests/powerpc/copyloops/asm/ppc_asm.h b/tools/testing/selftests/powerpc/copyloops/asm/ppc_asm.h index 5ffe04d..dfce161 100644 --- a/tools/testing/selftests/powerpc/copyloops/asm/ppc_asm.h +++ b/tools/testing/selftests/powerpc/copyloops/asm/ppc_asm.h @@ -36,11 +36,11 @@ li r3,0 blr -FUNC_START(enter_vmx_copy) +FUNC_START(enter_vmx_ops) li r3,1 blr -FUNC_START(exit_vmx_copy) +FUNC_START(exit_vmx_ops) blr FUNC_START(memcpy_power7) diff --git a/tools/testing/selftests/powerpc/stringloops/asm/ppc-opcode.h b/tools/testing/selftests/powerpc/stringloops/asm/ppc-opcode.h new file mode 100644 index 0000000..9de413c --- /dev/null +++ b/tools/testing/selftests/powerpc/stringloops/asm/ppc-opcode.h @@ -0,0 +1,39 @@ +/* + * Copyright 2009 Freescale Semiconductor, Inc. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + * + * provides masks and opcode images for use by code generation, emulation + * and for instructions that older assemblers might not know about + */ +#ifndef _ASM_POWERPC_PPC_OPCODE_H +#define _ASM_POWERPC_PPC_OPCODE_H + + +# define stringify_in_c(...) __VA_ARGS__ +# define ASM_CONST(x) x + + +#define PPC_INST_VCMPEQUD_RC 0x100000c7 +#define PPC_INST_VCMPEQUB_RC 0x10000006 + +#define __PPC_RC21 (0x1 << 10) + +/* macros to insert fields into opcodes */ +#define ___PPC_RA(a) (((a) & 0x1f) << 16) +#define ___PPC_RB(b) (((b) & 0x1f) << 11) +#define ___PPC_RS(s) (((s) & 0x1f) << 21) +#define ___PPC_RT(t) ___PPC_RS(t) + +#define VCMPEQUD_RC(vrt, vra, vrb) stringify_in_c(.long PPC_INST_VCMPEQUD_RC | \ + ___PPC_RT(vrt) | ___PPC_RA(vra) | \ + ___PPC_RB(vrb) | __PPC_RC21) + +#define VCMPEQUB_RC(vrt, vra, vrb) stringify_in_c(.long PPC_INST_VCMPEQUB_RC | \ + ___PPC_RT(vrt) | ___PPC_RA(vra) | \ + ___PPC_RB(vrb) | __PPC_RC21) + +#endif /* _ASM_POWERPC_PPC_OPCODE_H */ diff --git a/tools/testing/selftests/powerpc/stringloops/asm/ppc_asm.h b/tools/testing/selftests/powerpc/stringloops/asm/ppc_asm.h index 136242e..d2c0a91 100644 --- a/tools/testing/selftests/powerpc/stringloops/asm/ppc_asm.h +++ b/tools/testing/selftests/powerpc/stringloops/asm/ppc_asm.h @@ -1,4 +1,6 @@ /* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _PPC_ASM_H +#define __PPC_ASM_H #include #ifndef r1 @@ -6,3 +8,26 @@ #endif #define _GLOBAL(A) FUNC_START(test_ ## A) +#define _GLOBAL_TOC(A) FUNC_START(test_ ## A) + +#define CONFIG_ALTIVEC + +#define R14 r14 +#define R15 r15 +#define R16 r16 +#define R17 r17 +#define R18 r18 +#define R19 r19 +#define R20 r20 +#define R21 r21 +#define R22 r22 +#define R29 r29 +#define R30 r30 +#define R31 r31 + +#define STACKFRAMESIZE 256 +#define STK_REG(i) (112 + ((i)-14)*8) + +#define BEGIN_FTR_SECTION +#define END_FTR_SECTION_IFSET(val) +#endif diff --git a/tools/testing/selftests/powerpc/stringloops/memcmp.c b/tools/testing/selftests/powerpc/stringloops/memcmp.c index 8250db2..b5cf717 100644 --- a/tools/testing/selftests/powerpc/stringloops/memcmp.c +++ b/tools/testing/selftests/powerpc/stringloops/memcmp.c @@ -2,20 +2,40 @@ #include #include #include +#include #include "utils.h" #define SIZE 256 #define ITERATIONS 10000 +#define LARGE_SIZE (5 * 1024) +#define LARGE_ITERATIONS 1000 +#define LARGE_MAX_OFFSET 32 +#define LARGE_SIZE_START 4096 + +#define MAX_OFFSET_DIFF_S1_S2 48 + +int vmx_count; +int enter_vmx_ops(void) +{ + vmx_count++; + return 1; +} + +void exit_vmx_ops(void) +{ + vmx_count--; +} int test_memcmp(const void *s1, const void *s2, size_t n); /* test all offsets and lengths */ -static void test_one(char *s1, char *s2) +static void test_one(char *s1, char *s2, unsigned long max_offset, + unsigned long size_start, unsigned long max_size) { unsigned long offset, size; - for (offset = 0; offset < SIZE; offset++) { - for (size = 0; size < (SIZE-offset); size++) { + for (offset = 0; offset < max_offset; offset++) { + for (size = size_start; size < (max_size - offset); size++) { int x, y; unsigned long i; @@ -35,70 +55,104 @@ static void test_one(char *s1, char *s2) printf("\n"); abort(); } + + if (vmx_count != 0) { + printf("vmx enter/exit not paired.(offset:%ld size:%ld s1:%p s2:%p vc:%d\n", + offset, size, s1, s2, vmx_count); + printf("\n"); + abort(); + } } } } -static int testcase(void) +static int testcase(bool islarge) { char *s1; char *s2; unsigned long i; - s1 = memalign(128, SIZE); + unsigned long comp_size = (islarge ? LARGE_SIZE : SIZE); + unsigned long alloc_size = comp_size + MAX_OFFSET_DIFF_S1_S2; + int iterations = islarge ? LARGE_ITERATIONS : ITERATIONS; + + s1 = memalign(128, alloc_size); if (!s1) { perror("memalign"); exit(1); } - s2 = memalign(128, SIZE); + s2 = memalign(128, alloc_size); if (!s2) { perror("memalign"); exit(1); } - srandom(1); + srandom(time(0)); - for (i = 0; i < ITERATIONS; i++) { + for (i = 0; i < iterations; i++) { unsigned long j; unsigned long change; + char *rand_s1 = s1; + char *rand_s2 = s2; - for (j = 0; j < SIZE; j++) + for (j = 0; j < alloc_size; j++) s1[j] = random(); - memcpy(s2, s1, SIZE); + rand_s1 += random() % MAX_OFFSET_DIFF_S1_S2; + rand_s2 += random() % MAX_OFFSET_DIFF_S1_S2; + memcpy(rand_s2, rand_s1, comp_size); /* change one byte */ - change = random() % SIZE; - s2[change] = random() & 0xff; - - test_one(s1, s2); + change = random() % comp_size; + rand_s2[change] = random() & 0xff; + + if (islarge) + test_one(rand_s1, rand_s2, LARGE_MAX_OFFSET, + LARGE_SIZE_START, comp_size); + else + test_one(rand_s1, rand_s2, SIZE, 0, comp_size); } - srandom(1); + srandom(time(0)); - for (i = 0; i < ITERATIONS; i++) { + for (i = 0; i < iterations; i++) { unsigned long j; unsigned long change; + char *rand_s1 = s1; + char *rand_s2 = s2; - for (j = 0; j < SIZE; j++) + for (j = 0; j < alloc_size; j++) s1[j] = random(); - memcpy(s2, s1, SIZE); + rand_s1 += random() % MAX_OFFSET_DIFF_S1_S2; + rand_s2 += random() % MAX_OFFSET_DIFF_S1_S2; + memcpy(rand_s2, rand_s1, comp_size); /* change multiple bytes, 1/8 of total */ - for (j = 0; j < SIZE / 8; j++) { - change = random() % SIZE; + for (j = 0; j < comp_size / 8; j++) { + change = random() % comp_size; s2[change] = random() & 0xff; } - test_one(s1, s2); + if (islarge) + test_one(rand_s1, rand_s2, LARGE_MAX_OFFSET, + LARGE_SIZE_START, comp_size); + else + test_one(rand_s1, rand_s2, SIZE, 0, comp_size); } return 0; } +static int testcases(void) +{ + testcase(0); + testcase(1); + return 0; +} + int main(void) { - return test_harness(testcase, "memcmp"); + return test_harness(testcases, "memcmp"); }