From patchwork Mon Dec 2 01:08:36 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Eric Biggers
X-Patchwork-Id: 2016963
From: Eric Biggers
To: linux-kernel@vger.kernel.org
Cc: linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
 linux-crypto@vger.kernel.org, linux-ext4@vger.kernel.org,
 linux-f2fs-devel@lists.sourceforge.net, linux-mips@vger.kernel.org,
 linux-riscv@lists.infradead.org, linux-s390@vger.kernel.org,
 linux-scsi@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
 loongarch@lists.linux.dev, sparclinux@vger.kernel.org, x86@kernel.org,
 Ard Biesheuvel
Subject: [PATCH v4 11/19] x86/crc32: update prototype for crc_pcl()
Date: Sun, 1 Dec 2024 17:08:36 -0800
Message-ID: <20241202010844.144356-12-ebiggers@kernel.org>
X-Mailer: git-send-email 2.47.1
In-Reply-To: <20241202010844.144356-1-ebiggers@kernel.org>
References: <20241202010844.144356-1-ebiggers@kernel.org>
Precedence: bulk
X-Mailing-List: linux-ext4@vger.kernel.org

From: Eric Biggers

- Change the len parameter from unsigned int to size_t, so that the
  library function which takes a size_t can safely use this code.

- Rename to crc32c_x86_3way() which is much clearer.

- Move the crc parameter to the front, as this is the usual convention.

Reviewed-by: Ard Biesheuvel
Signed-off-by: Eric Biggers
---
 arch/x86/crypto/crc32c-intel_glue.c       |  7 ++-
 arch/x86/crypto/crc32c-pcl-intel-asm_64.S | 63 ++++++++++++-----------
 2 files changed, 35 insertions(+), 35 deletions(-)

diff --git a/arch/x86/crypto/crc32c-intel_glue.c b/arch/x86/crypto/crc32c-intel_glue.c
index 52c5d47ef5a1..603d159de400 100644
--- a/arch/x86/crypto/crc32c-intel_glue.c
+++ b/arch/x86/crypto/crc32c-intel_glue.c
@@ -39,12 +39,11 @@
  * size is >= 512 to account
  * for fpu state save/restore overhead.
  */
 #define CRC32C_PCL_BREAKEVEN	512

-asmlinkage unsigned int crc_pcl(const u8 *buffer, unsigned int len,
-				unsigned int crc_init);
+asmlinkage u32 crc32c_x86_3way(u32 crc, const u8 *buffer, size_t len);
 #endif /* CONFIG_X86_64 */

 static u32 crc32c_intel_le_hw_byte(u32 crc, unsigned char const *data, size_t length)
 {
 	while (length--) {
@@ -157,11 +156,11 @@ static int crc32c_pcl_intel_update(struct shash_desc *desc, const u8 *data,
 	 * use faster PCL version if datasize is large enough to
 	 * overcome kernel fpu state save/restore overhead
 	 */
 	if (len >= CRC32C_PCL_BREAKEVEN && crypto_simd_usable()) {
 		kernel_fpu_begin();
-		*crcp = crc_pcl(data, len, *crcp);
+		*crcp = crc32c_x86_3way(*crcp, data, len);
 		kernel_fpu_end();
 	} else
 		*crcp = crc32c_intel_le_hw(*crcp, data, len);
 	return 0;
 }
@@ -169,11 +168,11 @@ static int crc32c_pcl_intel_update(struct shash_desc *desc, const u8 *data,
 static int __crc32c_pcl_intel_finup(u32 *crcp, const u8 *data, unsigned int len,
 				u8 *out)
 {
 	if (len >= CRC32C_PCL_BREAKEVEN && crypto_simd_usable()) {
 		kernel_fpu_begin();
-		*(__le32 *)out = ~cpu_to_le32(crc_pcl(data, len, *crcp));
+		*(__le32 *)out = ~cpu_to_le32(crc32c_x86_3way(*crcp, data, len));
 		kernel_fpu_end();
 	} else
 		*(__le32 *)out =
 			~cpu_to_le32(crc32c_intel_le_hw(*crcp, data, len));
 	return 0;
diff --git a/arch/x86/crypto/crc32c-pcl-intel-asm_64.S b/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
index 752812bc4991..9b8770503bbc 100644
--- a/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
+++ b/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
@@ -50,19 +50,20 @@

 # Define threshold below which buffers are considered "small" and routed to
 # regular CRC code that does not interleave the CRC instructions.
 #define SMALL_SIZE 200

-# unsigned int crc_pcl(const u8 *buffer, unsigned int len, unsigned int crc_init);
+# u32 crc32c_x86_3way(u32 crc, const u8 *buffer, size_t len);

 .text
-SYM_FUNC_START(crc_pcl)
-#define bufp		%rdi
-#define bufp_d		%edi
-#define len		%esi
-#define crc_init	%edx
-#define crc_init_q	%rdx
+SYM_FUNC_START(crc32c_x86_3way)
+#define crc0		%edi
+#define crc0_q		%rdi
+#define bufp		%rsi
+#define bufp_d		%esi
+#define len		%rdx
+#define len_dw		%edx
 #define n_misaligned	%ecx /* overlaps chunk_bytes! */
 #define n_misaligned_q	%rcx
 #define chunk_bytes	%ecx /* overlaps n_misaligned! */
 #define chunk_bytes_q	%rcx
 #define crc1		%r8
@@ -83,13 +84,13 @@ SYM_FUNC_START(crc_pcl)
 	# Process 1 <= n_misaligned <= 7 bytes individually in order to align
 	# the remaining data to an 8-byte boundary.
 .Ldo_align:
 	movq	(bufp), %rax
 	add	n_misaligned_q, bufp
-	sub	n_misaligned, len
+	sub	n_misaligned_q, len
 .Lalign_loop:
-	crc32b	%al, crc_init		# compute crc32 of 1-byte
+	crc32b	%al, crc0		# compute crc32 of 1-byte
 	shr	$8, %rax		# get next byte
 	dec	n_misaligned
 	jne	.Lalign_loop
 .Laligned:

@@ -100,11 +101,11 @@ SYM_FUNC_START(crc_pcl)
 	cmp	$128*24, len
 	jae	.Lfull_block

 .Lpartial_block:
 	# Compute floor(len / 24) to get num qwords to process from each lane.
-	imul	$2731, len, %eax	# 2731 = ceil(2^16 / 24)
+	imul	$2731, len_dw, %eax	# 2731 = ceil(2^16 / 24)
 	shr	$16, %eax
 	jmp	.Lcrc_3lanes

 .Lfull_block:
 	# Processing 128 qwords from each lane.
@@ -123,20 +124,20 @@ SYM_FUNC_START(crc_pcl)
 	jl	.Lcrc_3lanes_4x_done

 	# Unroll the loop by a factor of 4 to reduce the overhead of the loop
 	# bookkeeping instructions, which can compete with crc32q for the ALUs.
 .Lcrc_3lanes_4x_loop:
-	crc32q	(bufp), crc_init_q
+	crc32q	(bufp), crc0_q
 	crc32q	(bufp,chunk_bytes_q), crc1
 	crc32q	(bufp,chunk_bytes_q,2), crc2
-	crc32q	8(bufp), crc_init_q
+	crc32q	8(bufp), crc0_q
 	crc32q	8(bufp,chunk_bytes_q), crc1
 	crc32q	8(bufp,chunk_bytes_q,2), crc2
-	crc32q	16(bufp), crc_init_q
+	crc32q	16(bufp), crc0_q
 	crc32q	16(bufp,chunk_bytes_q), crc1
 	crc32q	16(bufp,chunk_bytes_q,2), crc2
-	crc32q	24(bufp), crc_init_q
+	crc32q	24(bufp), crc0_q
 	crc32q	24(bufp,chunk_bytes_q), crc1
 	crc32q	24(bufp,chunk_bytes_q,2), crc2
 	add	$32, bufp
 	sub	$4, %eax
 	jge	.Lcrc_3lanes_4x_loop
@@ -144,42 +145,42 @@ SYM_FUNC_START(crc_pcl)
 .Lcrc_3lanes_4x_done:
 	add	$4, %eax
 	jz	.Lcrc_3lanes_last_qword

 .Lcrc_3lanes_1x_loop:
-	crc32q	(bufp), crc_init_q
+	crc32q	(bufp), crc0_q
 	crc32q	(bufp,chunk_bytes_q), crc1
 	crc32q	(bufp,chunk_bytes_q,2), crc2
 	add	$8, bufp
 	dec	%eax
 	jnz	.Lcrc_3lanes_1x_loop

 .Lcrc_3lanes_last_qword:
-	crc32q	(bufp), crc_init_q
+	crc32q	(bufp), crc0_q
 	crc32q	(bufp,chunk_bytes_q), crc1
 # SKIP  crc32q	(bufp,chunk_bytes_q,2), crc2	; Don't do this one yet

 	################################################################
 	## 4) Combine three results:
 	################################################################

 	lea	(K_table-8)(%rip), %rax		# first entry is for idx 1
 	pmovzxdq (%rax,chunk_bytes_q), %xmm0	# 2 consts: K1:K2
 	lea	(chunk_bytes,chunk_bytes,2), %eax # chunk_bytes * 3
-	sub	%eax, len			# len -= chunk_bytes * 3
+	sub	%rax, len			# len -= chunk_bytes * 3

-	movq	crc_init_q, %xmm1		# CRC for block 1
+	movq	crc0_q, %xmm1			# CRC for block 1
 	pclmulqdq $0x00, %xmm0, %xmm1		# Multiply by K2

 	movq	crc1, %xmm2			# CRC for block 2
 	pclmulqdq $0x10, %xmm0, %xmm2		# Multiply by K1

 	pxor	%xmm2,%xmm1
 	movq	%xmm1, %rax
 	xor	(bufp,chunk_bytes_q,2), %rax
-	mov	crc2, crc_init_q
-	crc32	%rax, crc_init_q
+	mov	crc2, crc0_q
+	crc32	%rax, crc0_q
 	lea	8(bufp,chunk_bytes_q,2), bufp

 	################################################################
 	## 5) If more blocks remain, goto (2):
 	################################################################
@@ -191,38 +192,38 @@ SYM_FUNC_START(crc_pcl)

 	#######################################################################
 	## 6) Process any remainder without interleaving:
 	#######################################################################
 .Lsmall:
-	test	len, len
+	test	len_dw, len_dw
 	jz	.Ldone
-	mov	len, %eax
+	mov	len_dw, %eax
 	shr	$3, %eax
 	jz	.Ldo_dword
 .Ldo_qwords:
-	crc32q	(bufp), crc_init_q
+	crc32q	(bufp), crc0_q
 	add	$8, bufp
 	dec	%eax
 	jnz	.Ldo_qwords
 .Ldo_dword:
-	test	$4, len
+	test	$4, len_dw
 	jz	.Ldo_word
-	crc32l	(bufp), crc_init
+	crc32l	(bufp), crc0
 	add	$4, bufp
 .Ldo_word:
-	test	$2, len
+	test	$2, len_dw
 	jz	.Ldo_byte
-	crc32w	(bufp), crc_init
+	crc32w	(bufp), crc0
 	add	$2, bufp
 .Ldo_byte:
-	test	$1, len
+	test	$1, len_dw
 	jz	.Ldone
-	crc32b	(bufp), crc_init
+	crc32b	(bufp), crc0
 .Ldone:
-	mov	crc_init, %eax
+	mov	crc0, %eax
 	RET
-SYM_FUNC_END(crc_pcl)
+SYM_FUNC_END(crc32c_x86_3way)

 .section	.rodata, "a", @progbits
 	################################################################
 	## PCLMULQDQ tables
 	## Table is 128 entries x 2 words (8 bytes) each