From: Andrew Jones
To: linux-riscv@lists.infradead.org, kvm-riscv@lists.infradead.org
Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Anup Patel, Heiko Stuebner, Conor Dooley, Atish Patra, Jisheng Zhang
Subject: [PATCH 9/9] RISC-V: Use Zicboz in memset when available
Date: Thu, 27 Oct 2022 15:02:47 +0200
Message-Id: <20221027130247.31634-10-ajones@ventanamicro.com>
In-Reply-To: <20221027130247.31634-1-ajones@ventanamicro.com>
RISC-V has an optimized memset() which does byte-by-byte writes up to the first sizeof(long)-aligned address, then uses Duff's device until the last sizeof(long)-aligned address, and finally writes byte by byte to the end. When memset() is used to zero memory and the Zicboz extension is available, we can extend that scheme: do the optimized memset() up to the first Zicboz-block-size-aligned address, then use the Zicboz zero instruction (cbo.zero) on each block up to the last block-size-aligned address, and finally do the optimized memset() to the end. The Zicboz path is taken only when the fill value is zero and the range contains at least one whole block; otherwise memset() behaves as before.
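To make the three phases concrete, here is a minimal user-space C sketch of the same strategy. It is not the kernel code: it assumes a fixed 64-byte block (the patch instead reads the size from riscv_cboz_block_size), emulates cbo.zero with a hypothetical cbo_zero_emulated() helper built on memset(), and simplifies the Duff's-device phase to a plain loop of long-sized stores. The small-size (< 16 bytes) shortcut is also omitted.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed block size; real hardware reports its own (a power of two). */
#define CBOZ_BLOCK_SIZE 64u

/*
 * Hypothetical stand-in for cbo.zero: zeroes one whole block.
 * The caller always passes a CBOZ_BLOCK_SIZE-aligned address.
 */
static void cbo_zero_emulated(unsigned char *block)
{
	memset(block, 0, CBOZ_BLOCK_SIZE);
}

/* Simplified "optimized memset": bytes to long alignment, longs, bytes. */
static void word_memset(unsigned char *p, unsigned char c, size_t n)
{
	unsigned long pattern = (unsigned long)-1 / 0xff * c; /* c in every byte */

	while (n && ((uintptr_t)p & (sizeof(long) - 1))) {
		*p++ = c;
		n--;
	}
	for (; n >= sizeof(long); n -= sizeof(long), p += sizeof(long))
		memcpy(p, &pattern, sizeof(long)); /* one aligned long store */
	while (n--)
		*p++ = c;
}

void *model_memset(void *dst, int c, size_t n)
{
	unsigned char *p = dst;
	uintptr_t start = (uintptr_t)p;
	uintptr_t mask = ~(uintptr_t)(CBOZ_BLOCK_SIZE - 1);
	uintptr_t first = (start + CBOZ_BLOCK_SIZE - 1) & mask; /* round up */
	uintptr_t last = (start + n) & mask;                    /* round down */

	/* Zicboz can only write zeros and needs at least one whole block. */
	if (c != 0 || last <= first) {
		word_memset(p, (unsigned char)c, n);
		return dst;
	}
	word_memset(p, 0, first - start);                      /* head */
	for (uintptr_t b = first; b < last; b += CBOZ_BLOCK_SIZE)
		cbo_zero_emulated(p + (b - start));            /* whole blocks */
	word_memset(p + (last - start), 0, start + n - last);  /* tail */
	return dst;
}
```

As in the patch, a nonzero fill value or a range too short to contain a whole block falls back to the ordinary path, so block zeroing never touches bytes outside the requested range.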
Signed-off-by: Andrew Jones
---
 arch/riscv/lib/memset.S | 81 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)

diff --git a/arch/riscv/lib/memset.S b/arch/riscv/lib/memset.S
index 74e4c7feec00..786b85b5e9cc 100644
--- a/arch/riscv/lib/memset.S
+++ b/arch/riscv/lib/memset.S
@@ -5,6 +5,12 @@
 
 #include
 #include
+#include
+#include
+#include
+
+#define ALT_ZICBOZ(old, new) ALTERNATIVE(old, new, 0, RISCV_ISA_EXT_ZICBOZ, \
+					 CONFIG_RISCV_ISA_ZICBOZ)
 
 /* void *memset(void *, int, size_t) */
 ENTRY(__memset)
@@ -15,6 +21,58 @@ WEAK(memset)
 	sltiu a3, a2, 16
 	bnez a3, .Lfinish
 
+#ifdef CONFIG_RISCV_ISA_ZICBOZ
+	ALT_ZICBOZ("j .Ldo_memset", "nop")
+	/*
+	 * t1 will be the Zicboz block size.
+	 * Zero means we're not using Zicboz, and we don't when a1 != 0
+	 */
+	li t1, 0
+	bnez a1, .Ldo_memset
+	la a3, riscv_cboz_block_size
+	lw t1, 0(a3)
+
+	/*
+	 * Round to nearest Zicboz block-aligned address
+	 * greater than or equal to the start address.
+	 */
+	addi a3, t1, -1
+	not t2, a3 /* t2 is Zicboz block size mask */
+	add a3, t0, a3
+	and t3, a3, t2 /* t3 is Zicboz block aligned start */
+
+	/* Did we go too far or not have at least one block? */
+	add a3, a0, a2
+	and a3, a3, t2
+	bgtu a3, t3, .Ldo_zero
+	li t1, 0
+	j .Ldo_memset
+
+.Ldo_zero:
+	/* Use Duff for initial bytes if there are any */
+	bne t3, t0, .Ldo_memset
+
+.Ldo_zero2:
+	/* Calculate end address */
+	and a3, a2, t2
+	add a3, t0, a3
+	sub a4, a3, t0
+
+.Lzero_loop:
+	CBO_ZERO(t0)
+	add t0, t0, t1
+	bltu t0, a3, .Lzero_loop
+	li t1, 0 /* We're done with Zicboz */
+
+	sub a2, a2, a4 /* Update count */
+	sltiu a3, a2, 16
+	bnez a3, .Lfinish
+
+	/* t0 is Zicboz block size aligned, so it must be SZREG aligned */
+	j .Ldo_duff3
+#endif
+
+.Ldo_memset:
 	/*
 	 * Round to nearest XLEN-aligned address
 	 * greater than or equal to the start address.
@@ -33,6 +91,18 @@ WEAK(memset)
 
 .Ldo_duff:
 	/* Duff's device with 32 XLEN stores per iteration */
+
+#ifdef CONFIG_RISCV_ISA_ZICBOZ
+	ALT_ZICBOZ("j .Ldo_duff2", "nop")
+	beqz t1, .Ldo_duff2
+	/* a3, "end", is start of block aligned start. a1 is 0 */
+	move a3, t3
+	sub a4, a3, t0 /* a4 is SZREG aligned count */
+	move t4, a4 /* Save count for later, see below. */
+	j .Ldo_duff4
+#endif
+
+.Ldo_duff2:
 	/* Broadcast value into all bytes */
 	andi a1, a1, 0xff
 	slli a3, a1, 8
@@ -44,10 +114,12 @@ WEAK(memset)
 	or a1, a3, a1
 #endif
 
+.Ldo_duff3:
 	/* Calculate end address */
 	andi a4, a2, ~(SZREG-1)
 	add a3, t0, a4
 
+.Ldo_duff4:
 	andi a4, a4, 31*SZREG /* Calculate remainder */
 	beqz a4, .Lduff_loop /* Shortcut if no remainder */
 	neg a4, a4
@@ -100,6 +172,15 @@ WEAK(memset)
 
 	addi t0, t0, 32*SZREG
 	bltu t0, a3, .Lduff_loop
+
+#ifdef CONFIG_RISCV_ISA_ZICBOZ
+	ALT_ZICBOZ("j .Lcount_update", "nop")
+	beqz t1, .Lcount_update
+	sub a2, a2, t4 /* Difference was saved above */
+	j .Ldo_zero2
+#endif
+
+.Lcount_update:
 	andi a2, a2, SZREG-1 /* Update count */
 
 .Lfinish:
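The block-alignment arithmetic in the .Ldo_zero/.Ldo_zero2 path can be checked with a small C model. The struct and function names below are hypothetical, the register names in the comments only indicate which assembly value each line corresponds to, and the block size is passed in and assumed to be a power of two.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* How a zero fill of [start, start + len) is split around Zicboz blocks. */
struct zicboz_split {
	bool use_zicboz;     /* false: take the plain memset path */
	uintptr_t blk_start; /* t3: first block-aligned address >= start */
	uintptr_t blk_end;   /* a3: last block-aligned address <= start + len */
	size_t head;         /* bytes before blk_start (byte stores + Duff) */
	size_t tail;         /* bytes after blk_end (Duff + byte stores) */
};

static struct zicboz_split split_for_zicboz(uintptr_t start, size_t len,
					    size_t block_size)
{
	struct zicboz_split s = { 0 };
	uintptr_t mask = ~(uintptr_t)(block_size - 1);   /* t2 */

	s.blk_start = (start + block_size - 1) & mask;   /* t3 */
	s.blk_end = (start + len) & mask;                /* a3 */
	s.use_zicboz = s.blk_end > s.blk_start;          /* bgtu a3, t3 */
	if (!s.use_zicboz)
		return s;

	s.head = s.blk_start - start;
	/*
	 * As at .Ldo_zero2: with the pointer now block aligned, masking the
	 * remaining count yields the same end as masking the end address.
	 */
	s.blk_end = s.blk_start + ((len - s.head) & mask);
	s.tail = start + len - s.blk_end;
	return s;
}
```

For example, with a 64-byte block, zeroing 300 bytes at 0x1003 splits into a 61-byte head, three blocks from 0x1040 to 0x1100, and a 47-byte tail, while 100 bytes at the same address contain no whole block and take the plain memset path.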