From patchwork Thu Oct 20 17:26:46 2022
X-Patchwork-Submitter: Noah Goldstein
X-Patchwork-Id: 1692662
From: Noah Goldstein
To: libc-alpha@sourceware.org
Subject: [PATCH v1] x86: Remove AVX512-VBMI2 instruction from strrchr-evex.S
Date: Thu, 20 Oct 2022 10:26:46 -0700
Message-Id: <20221020172646.3453468-1-goldstein.w.n@gmail.com>

commit b412213eee0afa3b51dfe92b736dfc7c981309f5
Author: Noah Goldstein
Date:   Tue Oct 18 17:44:07 2022 -0700

    x86: Optimize strrchr-evex.S and implement with VMM headers

Added `vpcompress{b|d}` to the page-cross logic; `vpcompressb` is an
AVX512-VBMI2 instruction, which is not supported on SKX. Since the
page-cross logic is relatively cold and the benefit is minimal,
revert the page-cross case back to the old logic, which is supported
on SKX.

Tested on x86-64.
---
 sysdeps/x86_64/multiarch/strrchr-evex.S | 69 +++++++++++--------------
 1 file changed, 29 insertions(+), 40 deletions(-)

diff --git a/sysdeps/x86_64/multiarch/strrchr-evex.S b/sysdeps/x86_64/multiarch/strrchr-evex.S
index 45487dc87a..26b3457875 100644
--- a/sysdeps/x86_64/multiarch/strrchr-evex.S
+++ b/sysdeps/x86_64/multiarch/strrchr-evex.S
@@ -29,9 +29,7 @@
 # include "x86-evex256-vecs.h"
 
 # ifdef USE_AS_WCSRCHR
-# define RCX_M	cl
-# define SHIFT_REG	rcx
-# define VPCOMPRESS	vpcompressd
+# define SHIFT_REG	rsi
 # define kunpck_2x	kunpckbw
 # define kmov_2x	kmovd
 # define maskz_2x	ecx
@@ -46,9 +44,7 @@
 
 # define USE_WIDE_CHAR
 # else
-# define RCX_M	ecx
 # define SHIFT_REG	rdi
-# define VPCOMPRESS	vpcompressb
 # define kunpck_2x	kunpckdq
 # define kmov_2x	kmovq
 # define maskz_2x	rcx
@@ -78,7 +74,7 @@ ENTRY_P2ALIGN(STRRCHR, 6)
 	andl	$(PAGE_SIZE - 1), %eax
 	cmpl	$(PAGE_SIZE - VEC_SIZE), %eax
 	jg	L(cross_page_boundary)
-
+L(page_cross_continue):
 	VMOVU	(%rdi), %VMM(1)
 	/* k0 has a 1 for each zero CHAR in VEC(1).  */
 	VPTESTN	%VMM(1), %VMM(1), %k0
@@ -86,7 +82,6 @@ ENTRY_P2ALIGN(STRRCHR, 6)
 	test	%VRSI, %VRSI
 	jz	L(aligned_more)
 	/* fallthrough: zero CHAR in first VEC.  */
-L(page_cross_return):
 	/* K1 has a 1 for each search CHAR match in VEC(1).  */
 	VPCMPEQ	%VMATCH, %VMM(1), %k1
 	KMOV	%k1, %VRAX
@@ -197,7 +192,6 @@ L(first_vec_x2):
 
 	.p2align 4,, 12
 L(aligned_more):
-L(page_cross_continue):
 	/* Need to keep original pointer incase VEC(1) has last match.  */
 	movq	%rdi, %rsi
 
@@ -353,53 +347,48 @@ L(return_new_match):
 	leaq	(VEC_SIZE * 2)(%rdi, %rax, CHAR_SIZE), %rax
 	ret
 
-	.p2align 4,, 4
 L(cross_page_boundary):
+	/* eax contains all the page offset bits of src (rdi). `xor rdi,
+	   rax` sets pointer with all page offset bits cleared so
+	   offset of (PAGE_SIZE - VEC_SIZE) will get last aligned VEC
+	   before page cross (guaranteed to be safe to read). Doing this
+	   as opposed to `movq %rdi, %rax; andq $-VEC_SIZE, %rax` saves
+	   a bit of code size.  */
 	xorq	%rdi, %rax
-	mov	$-1, %VRDX
-	VMOVU	(PAGE_SIZE - VEC_SIZE)(%rax), %VMM(6)
-	VPTESTN	%VMM(6), %VMM(6), %k0
-	KMOV	%k0, %VRSI
-
-# ifdef USE_AS_WCSRCHR
-	movl	%edi, %ecx
-	and	$(VEC_SIZE - 1), %ecx
-	shrl	$2, %ecx
-# endif
-	shlx	%VGPR(SHIFT_REG), %VRDX, %VRDX
+	VMOVU	(PAGE_SIZE - VEC_SIZE)(%rax), %VMM(1)
+	VPTESTN	%VMM(1), %VMM(1), %k0
+	KMOV	%k0, %VRCX
+	/* Shift out zero CHAR matches that are before the beginning of
+	   src (rdi).  */
 # ifdef USE_AS_WCSRCHR
-	kmovb	%edx, %k1
-# else
-	KMOV	%VRDX, %k1
+	movl	%edi, %esi
+	andl	$(VEC_SIZE - 1), %esi
+	shrl	$2, %esi
 # endif
+	shrx	%VGPR(SHIFT_REG), %VRCX, %VRCX
 
-	/* Need to adjust result to VEC(1) so it can be re-used by
-	   L(return_vec_x0_test). The alternative is to collect VEC(1)
-	   will a page cross load which is far more expensive.  */
-	VPCOMPRESS	%VMM(6), %VMM(1){%k1}{z}
-
-	/* We could technically just jmp back after the vpcompress but
-	   it doesn't save any 16-byte blocks.  */
-	shrx	%VGPR(SHIFT_REG), %VRSI, %VRSI
-	test	%VRSI, %VRSI
+	test	%VRCX, %VRCX
 	jz	L(page_cross_continue)
 
-	/* Duplicate of return logic from ENTRY. Doesn't cause spill to
-	   next cache line so might as well copy it here.  */
-	VPCMPEQ	%VMATCH, %VMM(1), %k1
+	/* Found zero CHAR so need to test for search CHAR.  */
+	VPCMP	$0, %VMATCH, %VMM(1), %k1
 	KMOV	%k1, %VRAX
-	blsmsk	%VRSI, %VRSI
-	and	%VRSI, %VRAX
-	jz	L(ret_page_cross)
+	/* Shift out search CHAR matches that are before the beginning of
+	   src (rdi).  */
+	shrx	%VGPR(SHIFT_REG), %VRAX, %VRAX
+
+	/* Check if any search CHAR match in range.  */
+	blsmsk	%VRCX, %VRCX
+	and	%VRCX, %VRAX
+	jz	L(ret3)
 	bsr	%VRAX, %VRAX
 # ifdef USE_AS_WCSRCHR
 	leaq	(%rdi, %rax, CHAR_SIZE), %rax
 # else
 	addq	%rdi, %rax
 # endif
-L(ret_page_cross):
+L(ret3):
 	ret
-	/* 1 byte till next cache line.  */
 END(STRRCHR)
 #endif
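
Notes (editorial, illustrative only; not part of the patch):

On the SKX rationale: AVX512-VBMI2 support is enumerated by
CPUID.(EAX=07H, ECX=0):ECX[bit 6]. SKX reports AVX512F/CD/BW/DQ/VL but
not VBMI2, so executing `vpcompressb` there faults with #UD (SIGILL in
user space); `vpcompressd` itself only needs AVX512F. A minimal probe,
assuming GCC/Clang's <cpuid.h> (the fallback bit define is ours):

#include <cpuid.h>
#include <stdio.h>

/* AVX512-VBMI2 is CPUID.(EAX=07H, ECX=0):ECX[bit 6]; newer <cpuid.h>
   headers define bit_AVX512VBMI2 already.  */
#ifndef bit_AVX512VBMI2
# define bit_AVX512VBMI2 (1 << 6)
#endif

int
main (void)
{
  unsigned int eax, ebx, ecx, edx;

  if (!__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx))
    return 1;
  printf ("AVX512-VBMI2: %s\n",
	  (ecx & bit_AVX512VBMI2) ? "supported" : "not supported (e.g. SKX)");
  return 0;
}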
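
On the `xorq %rdi, %rax` trick in L(cross_page_boundary): the branch is
only taken when the page offset of src exceeds PAGE_SIZE - VEC_SIZE,
and in that case clearing the page-offset bits (already computed in
eax) and adding the constant displacement PAGE_SIZE - VEC_SIZE lands on
the same address as aligning src down to VEC_SIZE, while avoiding an
extra mov. A sketch checking that equivalence (sizes and names are
illustrative, not taken from the sources):

#include <assert.h>
#include <stdint.h>

#define VEC_SIZE  32
#define PAGE_SIZE 4096

static void
check_xor_trick (uintptr_t src)
{
  uintptr_t off = src & (PAGE_SIZE - 1);   /* andl $(PAGE_SIZE - 1), %eax */
  if (off > PAGE_SIZE - VEC_SIZE)          /* jg L(cross_page_boundary) */
    {
      /* xorq %rdi, %rax; load at (PAGE_SIZE - VEC_SIZE)(%rax).  */
      uintptr_t via_xor = (src ^ off) + (PAGE_SIZE - VEC_SIZE);
      /* movq %rdi, %rax; andq $-VEC_SIZE, %rax.  */
      uintptr_t via_and = src & -(uintptr_t) VEC_SIZE;
      assert (via_xor == via_and);
    }
}

int
main (void)
{
  for (uintptr_t src = 0; src < 2 * PAGE_SIZE; src++)
    check_xor_trick (src);
  return 0;
}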
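
The reverted page-cross logic itself is plain mask arithmetic: take the
last aligned VEC of the page, build null/match masks, shift out bits
that fall before src, and use blsmsk to keep only matches at or before
the first null. A rough C model of the byte (strrchr) variant follows;
the function name, the scalar loops standing in for VPTESTN/VPCMP, and
the 32-byte VEC are illustrative assumptions, not the real
implementation:

#include <stddef.h>
#include <stdint.h>

#define VEC_SIZE  32
#define PAGE_SIZE 4096

/* Sets *done when the VEC straddling the page end already contains the
   string's first null; otherwise the real code continues the scan at
   L(page_cross_continue).  */
static const char *
page_cross_model (const char *src, char c, int *done)
{
  uintptr_t off = (uintptr_t) src & (PAGE_SIZE - 1);
  /* Last aligned VEC before the page end; bytes before src are still
     on src's page, so reading them is safe, and their mask bits get
     shifted out below.  */
  const char *aligned
    = (const char *) (((uintptr_t) src ^ off) + (PAGE_SIZE - VEC_SIZE));

  uint32_t zero_mask = 0, match_mask = 0;
  for (int i = 0; i < VEC_SIZE; i++)	/* VPTESTN / VPCMP stand-ins.  */
    {
      zero_mask |= (uint32_t) (aligned[i] == '\0') << i;
      match_mask |= (uint32_t) (aligned[i] == c) << i;
    }

  unsigned int shift = (uintptr_t) src & (VEC_SIZE - 1);
  zero_mask >>= shift;			/* shrx: drop bits before src.  */
  if (zero_mask == 0)
    {
      *done = 0;			/* jz L(page_cross_continue) */
      return NULL;
    }
  *done = 1;

  match_mask >>= shift;
  /* blsmsk: mask of bits up to and including the first null.  */
  match_mask &= zero_mask ^ (zero_mask - 1);
  if (match_mask == 0)
    return NULL;			/* jz L(ret3): no match before null */

  /* bsr: highest surviving bit is the last occurrence.  */
  return src + (31 - __builtin_clz (match_mask));
}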