From patchwork Thu Jun 7 01:57:50 2018
X-Patchwork-Submitter: Simon Guo
X-Patchwork-Id: 926082
From:
wei.guo.simon@gmail.com
To: linuxppc-dev@lists.ozlabs.org
Subject: [PATCH v8 0/5] powerpc/64: memcmp() optimization
Date: Thu, 7 Jun 2018 09:57:50 +0800
Message-Id: <1528336675-10879-1-git-send-email-wei.guo.simon@gmail.com>
X-Mailer: git-send-email 1.8.3.1
Cc: "Naveen N. Rao", Simon Guo, Cyril Bur

From: Simon Guo

There is some room to optimize memcmp() in the 64-bit powerpc version
for the following 2 cases:
(1) Even when the src/dst addresses are not 8-byte aligned at the
    beginning, memcmp() can align them and proceed in .Llong comparison
    mode, without falling back to .Lshort comparison mode, which compares
    the buffers byte by byte.
(2) VMX instructions can be used to speed up large comparisons. The
    threshold is currently set to 4K bytes.

Note that using VMX instructions incurs a VMX register save/load penalty.
This patch set includes a patch that adds a 32-byte pre-check to minimize
that penalty.

This is similar to glibc commit dec4a7105e (powerpc: Improve memcmp
performance for POWER8). Thanks to Cyril Bur for the information.

This patch set also updates the memcmp selftest so that it compiles and
covers the large-size comparison case.

v7 -> v8:
- define memcmp with _GLOBAL_TOC() instead of _GLOBAL() to fix a TOC
  issue. Add a _GLOBAL_TOC() definition to the selftest so that it can
  be compiled.
- use mfocrf/mtocrf instead of mcrf to save/restore CR0.

v6 -> v7:
- add vcmpequd/vcmpequb .long macro.
- add CPU_FTR pair so that Power7 won't invoke Altivec instrs.
- rework some instructions for higher performance or better readability.

v5 -> v6:
- correct some comments/commit message.
- rename VMX_OPS_THRES to VMX_THRESH.

v4 -> v5:
- expand the 32-byte pre-check to the case where src/dst have different
  offsets, and remove the KSM-specific label/comment.

v3 -> v4:
- add a 32-byte pre-check before using VMX instructions.

v2 -> v3:
- add optimization for src/dst with different offsets against the 8-byte
  boundary.
- rename some labels.
- rework per comments from Cyril Bur, such as filling the pipeline and
  using VMX when size == 4K.
- fix an enter/exit_vmx_ops pairing bug, and revise the test case to
  check that enter/exit_vmx_ops calls are paired.

v1 -> v2:
- update the comparison method for bytes not aligned to 8 bytes.
- fix a VMX comparison bug.
- enhance the original memcmp() selftest.
- add powerpc/64 to subject/commit message.

Simon Guo (5):
  powerpc/64: Align bytes before fall back to .Lshort in powerpc64 memcmp()
  powerpc: add vcmpequd/vcmpequb ppc instruction macro
  powerpc/64: enhance memcmp() with VMX instruction for long bytes comparision
  powerpc/64: add 32 bytes prechecking before using VMX optimization on memcmp()
  powerpc:selftest update memcmp_64 selftest for VMX implementation

 arch/powerpc/include/asm/asm-prototypes.h          |   4 +-
 arch/powerpc/include/asm/ppc-opcode.h              |  11 +
 arch/powerpc/lib/copypage_power7.S                 |   4 +-
 arch/powerpc/lib/memcmp_64.S                       | 414 ++++++++++++++++++++-
 arch/powerpc/lib/memcpy_power7.S                   |   6 +-
 arch/powerpc/lib/vmx-helper.c                      |   4 +-
 .../selftests/powerpc/copyloops/asm/ppc_asm.h      |   4 +-
 .../selftests/powerpc/stringloops/asm/ppc-opcode.h |  39 ++
 .../selftests/powerpc/stringloops/asm/ppc_asm.h    |  25 ++
 .../testing/selftests/powerpc/stringloops/memcmp.c |  98 +++--
 10 files changed, 568 insertions(+), 41 deletions(-)
 create mode 100644 tools/testing/selftests/powerpc/stringloops/asm/ppc-opcode.h
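
For readers who want the control flow of cases (1) and (2) from the cover
letter in one place, here is a rough portable C sketch. It is not the
code in memcmp_64.S, which is PPC64 assembly. All names (sketch_memcmp,
SKETCH_VMX_THRESH, cmp_wide and so on) are hypothetical, the "wide" path
is only a stand-in for the VMX loop, and aligning on the first pointer
is an assumption made for brevity. The small self-check at the end is
in the spirit of the selftest, comparing against libc memcmp() with
src/dst at different offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical constants; the values come from the cover letter text. */
#define SKETCH_VMX_THRESH 4096  /* "threshold is currently set to 4K bytes" */
#define SKETCH_PRECHECK   32    /* 32-byte pre-check before the wide path */
#define SKETCH_LEN        8192  /* buffer length used by the self-check */

/* Byte-by-byte comparison, playing the role of the .Lshort path. */
static int cmp_bytes(const unsigned char *a, const unsigned char *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    return 0;
}

/* 8 bytes per step, playing the role of the .Llong path.
 * memcpy() keeps possibly unaligned loads well-defined in C. */
static int cmp_words(const unsigned char *a, const unsigned char *b, size_t n)
{
    while (n >= 8) {
        uint64_t x, y;
        memcpy(&x, a, 8);
        memcpy(&y, b, 8);
        if (x != y)
            return cmp_bytes(a, b, 8); /* locate the differing byte */
        a += 8; b += 8; n -= 8;
    }
    return cmp_bytes(a, b, n);
}

/* Stand-in for the VMX loop; a real one would need the register
 * save/load mentioned in the cover letter around it. */
static int cmp_wide(const unsigned char *a, const unsigned char *b, size_t n)
{
    return cmp_words(a, b, n);
}

int sketch_memcmp(const void *s1, const void *s2, size_t n)
{
    const unsigned char *a = s1, *b = s2;

    /* Case (1): consume leading bytes until 'a' is 8-byte aligned,
     * then continue with the word path instead of going bytewise. */
    size_t head = (size_t)(-(uintptr_t)a & 7);
    if (head > n)
        head = n;
    int r = cmp_bytes(a, b, head);
    if (r)
        return r;
    a += head; b += head; n -= head;

    /* Case (2): large sizes take the wide path, but only after a cheap
     * 32-byte pre-check, so an early mismatch never pays the penalty. */
    if (n >= SKETCH_VMX_THRESH) {
        r = cmp_words(a, b, SKETCH_PRECHECK);
        if (r)
            return r;
        return cmp_wide(a + SKETCH_PRECHECK, b + SKETCH_PRECHECK,
                        n - SKETCH_PRECHECK);
    }
    return cmp_words(a, b, n);
}

static int sign(int v) { return (v > 0) - (v < 0); }

/* Compare the sketch against libc memcmp() at mismatching offsets. */
int sketch_selfcheck(void)
{
    static unsigned char x[SKETCH_LEN + 16], y[SKETCH_LEN + 16];

    for (size_t off = 0; off < 8; off++) {
        unsigned char *p = x + off, *q = y + (7 - off);
        for (size_t i = 0; i < SKETCH_LEN; i++)
            p[i] = q[i] = (unsigned char)(i * 7);
        for (size_t pos = 0; pos < SKETCH_LEN; pos += 509) {
            q[pos] ^= 0x5a;  /* inject one differing byte */
            assert(sign(sketch_memcmp(p, q, SKETCH_LEN)) ==
                   sign(memcmp(p, q, SKETCH_LEN)));
            q[pos] ^= 0x5a;  /* restore */
        }
        assert(sketch_memcmp(p, q, SKETCH_LEN) == 0);
        assert(sketch_memcmp(p, q, 100) == 0);
    }
    return 0;
}
```

Calling sketch_selfcheck() from a test harness exercises both the
sub-threshold and the 4K-and-above paths. The pre-check is placed before
the wide path on purpose: buffers that differ within the first 32 bytes
return without reaching the part that would carry the VMX register
save/load cost.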