From patchwork Thu Jun  9 04:16:52 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Noah Goldstein
X-Patchwork-Id: 1641014
From: Noah Goldstein
To: libc-alpha@sourceware.org
Subject: [PATCH v1 2/3] x86: Add avx compiled version for strspn, strcspn, and strpbrk
Date: Wed, 8 Jun 2022 21:16:52 -0700
Message-Id: <20220609041653.2515397-2-goldstein.w.n@gmail.com>
In-Reply-To: <20220609041653.2515397-1-goldstein.w.n@gmail.com>
References: <20220609041653.2515397-1-goldstein.w.n@gmail.com>

No change to the actual logic of the functions. The goal is for
AVX/AVX2 machines to rely less on SSE instructions.

Full xcheck passes on x86_64.
---
 sysdeps/x86_64/multiarch/Makefile             | 21 ++++++++++-----
 .../multiarch/{ifunc-sse4_2.h => ifunc-avx.h} |  4 +++
 sysdeps/x86_64/multiarch/ifunc-impl-list.c    |  6 +++++
 sysdeps/x86_64/multiarch/strcspn-c-avx.c      | 21 +++++++++++++++
 .../{strcspn-c.c => strcspn-c-sse4.c}         | 26 ++++++++++++-------
 sysdeps/x86_64/multiarch/strcspn.c            |  2 +-
 sysdeps/x86_64/multiarch/strpbrk-c-avx.c      | 23 ++++++++++++++++
 .../{strpbrk-c.c => strpbrk-c-sse4.c}         |  6 ++---
 sysdeps/x86_64/multiarch/strpbrk.c            |  2 +-
 sysdeps/x86_64/multiarch/strspn-c-avx.c       | 21 +++++++++++++++
 .../multiarch/{strspn-c.c => strspn-c-sse4.c} | 15 ++++++++---
 sysdeps/x86_64/multiarch/strspn.c             |  2 +-
 12 files changed, 122 insertions(+), 27 deletions(-)
 rename sysdeps/x86_64/multiarch/{ifunc-sse4_2.h => ifunc-avx.h} (89%)
 create mode 100644 sysdeps/x86_64/multiarch/strcspn-c-avx.c
 rename sysdeps/x86_64/multiarch/{strcspn-c.c => strcspn-c-sse4.c} (90%)
 create mode 100644 sysdeps/x86_64/multiarch/strpbrk-c-avx.c
 rename sysdeps/x86_64/multiarch/{strpbrk-c.c => strpbrk-c-sse4.c} (89%)
 create mode 100644 sysdeps/x86_64/multiarch/strspn-c-avx.c
 rename sysdeps/x86_64/multiarch/{strspn-c.c => strspn-c-sse4.c} (92%)

diff --git a/sysdeps/x86_64/multiarch/Makefile b/sysdeps/x86_64/multiarch/Makefile
index 3d153cac35..27f306c7c8 100644
--- a/sysdeps/x86_64/multiarch/Makefile
+++ b/sysdeps/x86_64/multiarch/Makefile
@@ -76,7 +76,8 @@ sysdep_routines += \
   strcpy-evex \
   strcpy-sse2 \
   strcpy-sse2-unaligned \
-  strcspn-c \
+  strcspn-c-avx \
+  strcspn-c-sse4 \
   strcspn-sse2 \
   strlen-avx2 \
   strlen-avx2-rtm \
@@ -108,22 +109,28 @@ sysdep_routines += \
   strnlen-evex \
   strnlen-evex512 \
   strnlen-sse2 \
-  strpbrk-c \
+  strpbrk-c-avx \
+  strpbrk-c-sse4 \
   strpbrk-sse2 \
   strrchr-avx2 \
   strrchr-avx2-rtm \
   strrchr-evex \
   strrchr-sse2 \
-  strspn-c \
+  strspn-c-avx \
+  strspn-c-sse4 \
   strspn-sse2 \
   strstr-avx512 \
   strstr-sse2-unaligned \
   varshift \
 # sysdep_routines
-CFLAGS-varshift.c += -msse4
-CFLAGS-strcspn-c.c += -msse4
-CFLAGS-strpbrk-c.c += -msse4
-CFLAGS-strspn-c.c += -msse4
+
+CFLAGS-strcspn-c-avx.c += -mavx
+CFLAGS-strcspn-c-sse4.c += -msse4
+CFLAGS-strpbrk-c-avx.c += -mavx
+CFLAGS-strpbrk-c-sse4.c += -msse4
+CFLAGS-strspn-c-avx.c += -mavx
+CFLAGS-strspn-c-sse4.c += -msse4
+
 CFLAGS-strstr-avx512.c += -mavx512f -mavx512vl -mavx512dq -mavx512bw -mbmi -mbmi2 -O3
 endif
 
diff --git a/sysdeps/x86_64/multiarch/ifunc-sse4_2.h b/sysdeps/x86_64/multiarch/ifunc-avx.h
similarity index 89%
rename from sysdeps/x86_64/multiarch/ifunc-sse4_2.h
rename to sysdeps/x86_64/multiarch/ifunc-avx.h
index b555ff2fac..891f3ddcac 100644
--- a/sysdeps/x86_64/multiarch/ifunc-sse4_2.h
+++ b/sysdeps/x86_64/multiarch/ifunc-avx.h
@@ -21,12 +21,16 @@
 
 extern __typeof (REDIRECT_NAME) OPTIMIZE (sse2) attribute_hidden;
 extern __typeof (REDIRECT_NAME) OPTIMIZE (sse42) attribute_hidden;
+extern __typeof (REDIRECT_NAME) OPTIMIZE (avx) attribute_hidden;
 
 static inline void *
 IFUNC_SELECTOR (void)
 {
   const struct cpu_features* cpu_features = __get_cpu_features ();
 
+  if (CPU_FEATURES_ARCH_P (cpu_features, AVX_Fast_Unaligned_Load))
+    return OPTIMIZE (avx);
+
   if (CPU_FEATURE_USABLE_P (cpu_features, SSE4_2))
     return OPTIMIZE (sse42);
 
diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
index 58f3ec8306..507c563669 100644
--- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
@@ -529,6 +529,8 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 
   /* Support sysdeps/x86_64/multiarch/strcspn.c.  */
   IFUNC_IMPL (i, name, strcspn,
+              IFUNC_IMPL_ADD (array, i, strcspn, CPU_FEATURE_USABLE (AVX),
+                              __strcspn_avx)
               IFUNC_IMPL_ADD (array, i, strcspn, CPU_FEATURE_USABLE (SSE4_2),
                               __strcspn_sse42)
               IFUNC_IMPL_ADD (array, i, strcspn, 1, __strcspn_sse2))
@@ -605,6 +607,8 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 
   /* Support sysdeps/x86_64/multiarch/strpbrk.c.  */
   IFUNC_IMPL (i, name, strpbrk,
+              IFUNC_IMPL_ADD (array, i, strpbrk, CPU_FEATURE_USABLE (AVX),
+                              __strpbrk_avx)
               IFUNC_IMPL_ADD (array, i, strpbrk, CPU_FEATURE_USABLE (SSE4_2),
                               __strpbrk_sse42)
               IFUNC_IMPL_ADD (array, i, strpbrk, 1, __strpbrk_sse2))
@@ -612,6 +616,8 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 
   /* Support sysdeps/x86_64/multiarch/strspn.c.  */
   IFUNC_IMPL (i, name, strspn,
+              IFUNC_IMPL_ADD (array, i, strspn, CPU_FEATURE_USABLE (AVX),
+                              __strspn_avx)
               IFUNC_IMPL_ADD (array, i, strspn, CPU_FEATURE_USABLE (SSE4_2),
                               __strspn_sse42)
               IFUNC_IMPL_ADD (array, i, strspn, 1, __strspn_sse2))
diff --git a/sysdeps/x86_64/multiarch/strcspn-c-avx.c b/sysdeps/x86_64/multiarch/strcspn-c-avx.c
new file mode 100644
index 0000000000..b8d983f79f
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/strcspn-c-avx.c
@@ -0,0 +1,21 @@
+/* strcspn with AVX intrinsics
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#define STRCSPN __strcspn_avx
+#define SECTION "avx"
+#include "strcspn-c-sse4.c"
diff --git a/sysdeps/x86_64/multiarch/strcspn-c.c b/sysdeps/x86_64/multiarch/strcspn-c-sse4.c
similarity index 90%
rename from sysdeps/x86_64/multiarch/strcspn-c.c
rename to sysdeps/x86_64/multiarch/strcspn-c-sse4.c
index c312fab8b1..848c3cfb14 100644
--- a/sysdeps/x86_64/multiarch/strcspn-c.c
+++ b/sysdeps/x86_64/multiarch/strcspn-c-sse4.c
@@ -52,9 +52,16 @@
    when either CFlag or ZFlag is 1.  If CFlag == 1, ECX has the offset
    X for case 1.  */
 
-#ifndef STRCSPN_SSE2
-# define STRCSPN_SSE2 __strcspn_sse2
-# define STRCSPN_SSE42 __strcspn_sse42
+#ifndef STRCSPN_FALLBACK
+# define STRCSPN_FALLBACK __strcspn_sse2
+#endif
+
+#ifndef STRCSPN
+# define STRCSPN __strcspn_sse42
+#endif
+
+#ifndef SECTION
+# define SECTION "sse4.2"
 #endif
 
 #ifdef USE_AS_STRPBRK
@@ -69,16 +76,15 @@ char *
 #else
 size_t
 #endif
-STRCSPN_SSE2 (const char *, const char *) attribute_hidden;
-
+STRCSPN_FALLBACK (const char *, const char *) attribute_hidden;
 
 #ifdef USE_AS_STRPBRK
 char *
 #else
 size_t
 #endif
-__attribute__ ((section (".text.sse4.2")))
-STRCSPN_SSE42 (const char *s, const char *a)
+__attribute__ ((section (".text." SECTION)))
+STRCSPN (const char *s, const char *a)
 {
   if (*a == 0)
     RETURN (NULL, strlen (s));
@@ -116,10 +122,10 @@ STRCSPN_SSE42 (const char *s, const char *a)
       maskz_bits = _mm_movemask_epi8 (maskz);
       if (maskz_bits == 0)
         {
-          /* There is no NULL terminator.  Don't use SSE4.2 if the length
-             of A > 16.  */
+          /* There is no NULL terminator.  Don't use pcmpstri based approach if the
+             length of A > 16.  */
           if (a[16] != 0)
-            return STRCSPN_SSE2 (s, a);
+            return STRCSPN_FALLBACK (s, a);
         }
 
       aligned = s;
diff --git a/sysdeps/x86_64/multiarch/strcspn.c b/sysdeps/x86_64/multiarch/strcspn.c
index 4848fa8677..63e1cf052e 100644
--- a/sysdeps/x86_64/multiarch/strcspn.c
+++ b/sysdeps/x86_64/multiarch/strcspn.c
@@ -24,7 +24,7 @@
 # undef strcspn
 
 # define SYMBOL_NAME strcspn
-# include "ifunc-sse4_2.h"
+# include "ifunc-avx.h"
 
 libc_ifunc_redirected (__redirect_strcspn, strcspn, IFUNC_SELECTOR ());
 
diff --git a/sysdeps/x86_64/multiarch/strpbrk-c-avx.c b/sysdeps/x86_64/multiarch/strpbrk-c-avx.c
new file mode 100644
index 0000000000..2918013994
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/strpbrk-c-avx.c
@@ -0,0 +1,23 @@
+/* strpbrk with AVX intrinsics
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#define USE_AS_STRPBRK
+#define STRCSPN_FALLBACK __strpbrk_sse2
+#define STRCSPN __strpbrk_avx
+#define SECTION "avx"
+#include "strcspn-c-sse4.c"
diff --git a/sysdeps/x86_64/multiarch/strpbrk-c.c b/sysdeps/x86_64/multiarch/strpbrk-c-sse4.c
similarity index 89%
rename from sysdeps/x86_64/multiarch/strpbrk-c.c
rename to sysdeps/x86_64/multiarch/strpbrk-c-sse4.c
index abf4ff7f1a..2efd38d809 100644
--- a/sysdeps/x86_64/multiarch/strpbrk-c.c
+++ b/sysdeps/x86_64/multiarch/strpbrk-c-sse4.c
@@ -17,6 +17,6 @@
    .  */
 
 #define USE_AS_STRPBRK
-#define STRCSPN_SSE2 __strpbrk_sse2
-#define STRCSPN_SSE42 __strpbrk_sse42
-#include "strcspn-c.c"
+#define STRCSPN_FALLBACK __strpbrk_sse2
+#define STRCSPN __strpbrk_sse42
+#include "strcspn-c-sse4.c"
diff --git a/sysdeps/x86_64/multiarch/strpbrk.c b/sysdeps/x86_64/multiarch/strpbrk.c
index 04e300ea71..ab5b04a482 100644
--- a/sysdeps/x86_64/multiarch/strpbrk.c
+++ b/sysdeps/x86_64/multiarch/strpbrk.c
@@ -24,7 +24,7 @@
 # undef strpbrk
 
 # define SYMBOL_NAME strpbrk
-# include "ifunc-sse4_2.h"
+# include "ifunc-avx.h"
 
 libc_ifunc_redirected (__redirect_strpbrk, strpbrk, IFUNC_SELECTOR ());
 
diff --git a/sysdeps/x86_64/multiarch/strspn-c-avx.c b/sysdeps/x86_64/multiarch/strspn-c-avx.c
new file mode 100644
index 0000000000..9d5fdb9550
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/strspn-c-avx.c
@@ -0,0 +1,21 @@
+/* strspn with AVX intrinsics
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#define STRSPN __strspn_avx
+#define SECTION "avx"
+#include "strspn-c-sse4.c"
diff --git a/sysdeps/x86_64/multiarch/strspn-c.c b/sysdeps/x86_64/multiarch/strspn-c-sse4.c
similarity index 92%
rename from sysdeps/x86_64/multiarch/strspn-c.c
rename to sysdeps/x86_64/multiarch/strspn-c-sse4.c
index 6124033ceb..6a91def2e0 100644
--- a/sysdeps/x86_64/multiarch/strspn-c.c
+++ b/sysdeps/x86_64/multiarch/strspn-c-sse4.c
@@ -53,10 +53,17 @@
 
 extern size_t __strspn_sse2 (const char *, const char *) attribute_hidden;
 
+#ifndef STRSPN
+# define STRSPN __strspn_sse42
+#endif
+
+#ifndef SECTION
+# define SECTION "sse4.2"
+#endif
 
 size_t
-__attribute__ ((section (".text.sse4.2")))
-__strspn_sse42 (const char *s, const char *a)
+__attribute__ ((section (".text." SECTION)))
+STRSPN (const char *s, const char *a)
 {
   if (*a == 0)
     return 0;
@@ -95,8 +102,8 @@ __strspn_sse42 (const char *s, const char *a)
       maskz_bits = _mm_movemask_epi8 (maskz);
       if (maskz_bits == 0)
         {
-          /* There is no NULL terminator.  Don't use SSE4.2 if the length
-             of A > 16.  */
+          /* There is no NULL terminator.  Don't use pcmpstri based approach if the
+             length of A > 16.  */
           if (a[16] != 0)
             return __strspn_sse2 (s, a);
         }
diff --git a/sysdeps/x86_64/multiarch/strspn.c b/sysdeps/x86_64/multiarch/strspn.c
index 07f5def155..c3c5e7a3cc 100644
--- a/sysdeps/x86_64/multiarch/strspn.c
+++ b/sysdeps/x86_64/multiarch/strspn.c
@@ -24,7 +24,7 @@
 # undef strspn
 
 # define SYMBOL_NAME strspn
-# include "ifunc-sse4_2.h"
+# include "ifunc-avx.h"
 
 libc_ifunc_redirected (__redirect_strspn, strspn, IFUNC_SELECTOR ());
 
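
Note for readers: the ifunc-avx.h hunk adds one check ahead of the existing SSE4.2 test, so the selector now tries AVX, then SSE4.2, then falls back to SSE2. The snippet below is only a minimal standalone sketch of that priority order, not glibc code. `struct cpu_flags`, `enum impl`, `pick_impl` and `impl_name` are hypothetical names invented for illustration. The real selector reads `struct cpu_features` through `CPU_FEATURES_ARCH_P` and `CPU_FEATURE_USABLE_P` and returns a function pointer via `OPTIMIZE (...)`.

```c
#include <stdbool.h>

/* Hypothetical stand-in for glibc's struct cpu_features.  Only the two
   bits consulted by the selector in this patch are modelled.  */
struct cpu_flags
{
  bool avx_fast_unaligned_load;	/* models AVX_Fast_Unaligned_Load */
  bool sse4_2_usable;		/* models a usable SSE4_2 */
};

/* Hypothetical tags for the three variants the selector can return.  */
enum impl
{
  IMPL_SSE2,
  IMPL_SSE42,
  IMPL_AVX
};

/* Same priority order as IFUNC_SELECTOR in ifunc-avx.h: the AVX check
   comes first, then SSE4.2, and SSE2 is the unconditional fallback.  */
static enum impl
pick_impl (const struct cpu_flags *f)
{
  if (f->avx_fast_unaligned_load)
    return IMPL_AVX;

  if (f->sse4_2_usable)
    return IMPL_SSE42;

  return IMPL_SSE2;
}

/* Map a tag to the suffix used in the symbol names of this patch
   (for example __strspn_avx, __strspn_sse42, __strspn_sse2).  */
static const char *
impl_name (enum impl i)
{
  switch (i)
    {
    case IMPL_AVX:
      return "avx";
    case IMPL_SSE42:
      return "sse42";
    default:
      return "sse2";
    }
}
```

With both flags set, `pick_impl` yields `IMPL_AVX`. With only `sse4_2_usable` set it yields `IMPL_SSE42`, which matches the behaviour before this patch.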