From patchwork Mon Aug 28 07:26:46 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: dengjianbo X-Patchwork-Id: 1826628 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=sourceware.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4RZ2FD3pP0z1yfX for ; Mon, 28 Aug 2023 17:27:12 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 94D293858002 for ; Mon, 28 Aug 2023 07:27:10 +0000 (GMT) X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail.loongson.cn (mail.loongson.cn [114.242.206.163]) by sourceware.org (Postfix) with ESMTP id CCF573858D35 for ; Mon, 28 Aug 2023 07:26:56 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org CCF573858D35 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=loongson.cn Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=loongson.cn Received: from loongson.cn (unknown [10.2.5.5]) by gateway (Coremail) with SMTP id _____8BxIvA+TOxkYnUcAA--.57276S3; Mon, 28 Aug 2023 15:26:54 +0800 (CST) Received: from 5.5.5 (unknown [10.2.5.5]) by localhost.localdomain (Coremail) with SMTP id AQAAf8Dx4eQ9TOxkBYBlAA--.49174S3; Mon, 28 Aug 2023 15:26:54 +0800 (CST) From: dengjianbo To: libc-alpha@sourceware.org Subject: [PATCH 1/6] 
LoongArch: Add ifunc support for rawmemchr{aligned, lsx, lasx} Date: Mon, 28 Aug 2023 15:26:46 +0800 Message-Id: <20230828072651.3085034-2-dengjianbo@loongson.cn> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20230828072651.3085034-1-dengjianbo@loongson.cn> References: <20230828072651.3085034-1-dengjianbo@loongson.cn> MIME-Version: 1.0 X-CM-TRANSID: AQAAf8Dx4eQ9TOxkBYBlAA--.49174S3 X-CM-SenderInfo: pghqwyxldqu0o6or00hjvr0hdfq/ X-Coremail-Antispam: 1Uk129KBj9fXoW3ZrW7KFy7Xr4kKr1rWFy8JFc_yoW8JFyUZo WftF4DXrs2krs8KrZ8CrsrX39ruF1fKr1jv3yYva1rJry8trW7CFWfCwnIkFsrZrn5WrWr XasrX3sxJrWxGFn3l-sFpf9Il3svdjkaLaAFLSUrUUUUbb8apTn2vfkv8UJUUUU8wcxFpf 9Il3svdxBIdaVrn0xqx4xG64xvF2IEw4CE5I8CrVC2j2Jv73VFW2AGmfu7bjvjm3AaLaJ3 UjIYCTnIWjp_UUUYI7kC6x804xWl14x267AKxVWUJVW8JwAFc2x0x2IEx4CE42xK8VAvwI 8IcIk0rVWrJVCq3wAFIxvE14AKwVWUGVWUXwA2ocxC64kIII0Yj41l84x0c7CEw4AK67xG Y2AK021l84ACjcxK6xIIjxv20xvE14v26r4j6ryUM28EF7xvwVC0I7IYx2IY6xkF7I0E14 v26r4j6F4UM28EF7xvwVC2z280aVAFwI0_Gr1j6F4UJwA2z4x0Y4vEx4A2jsIEc7CjxVAF wI0_Gr1j6F4UJwAS0I0E0xvYzxvE52x082IY62kv0487Mc804VCY07AIYIkI8VC2zVCFFI 0UMc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7IYx2IY67AKxVWUXVWUAwAv7VC2z280 aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4UM4x0Y48IcxkI7VAKI48JMxAIw28Icx kI7VAKI48JMxC20s026xCaFVCjc4AY6r1j6r4UMxCIbckI1I0E14v26r126r1DMI8I3I0E 5I8CrVAFwI0_Jr0_Jr4lx2IqxVCjr7xvwVAFwI0_JrI_JrWlx4CE17CEb7AF67AKxVWUAV WUtwCIc40Y0x0EwIxGrwCI42IY6xIIjxv20xvE14v26r1j6r1xMIIF0xvE2Ix0cI8IcVCY 1x0267AKxVWUJVW8JwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF4lIxAIcVC2z280aVAFwI 0_Jr0_Gr1lIxAIcVC2z280aVCY1x0267AKxVWUJVW8JbIYCTnIWIevJa73UjIFyTuYvjxU 7XTmDUUUU X-Spam-Status: No, score=-10.8 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, KAM_SHORT, KAM_STOCKGEN, SCC_5_SHORT_WORD_LINES, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Libc-alpha mailing list 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: caiyinyu@loongson.cn, xuchenghua@loongson.cn, huangpei@loongson.cn, dengjianbo Errors-To: libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org Sender: "Libc-alpha"

According to the glibc rawmemchr microbenchmark, a few cases tested with
char '\0' show a performance degradation because the lasx and lsx
versions do not handle '\0' separately. Overall, the rawmemchr-lasx
implementation reduces the runtime by about 40%-80%, the rawmemchr-lsx
implementation by about 40%-66%, and the rawmemchr-aligned
implementation by about 20%-40%.
---
 sysdeps/loongarch/lp64/multiarch/Makefile     |   3 +
 .../lp64/multiarch/ifunc-impl-list.c          |   8 ++
 .../lp64/multiarch/ifunc-rawmemchr.h          |  40 ++++++
 .../lp64/multiarch/rawmemchr-aligned.S        | 124 ++++++++++++++++++
 .../loongarch/lp64/multiarch/rawmemchr-lasx.S |  82 ++++++++++++
 .../loongarch/lp64/multiarch/rawmemchr-lsx.S  |  71 ++++++++++
 sysdeps/loongarch/lp64/multiarch/rawmemchr.c  |  37 ++++++
 7 files changed, 365 insertions(+)
 create mode 100644 sysdeps/loongarch/lp64/multiarch/ifunc-rawmemchr.h
 create mode 100644 sysdeps/loongarch/lp64/multiarch/rawmemchr-aligned.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/rawmemchr-lasx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/rawmemchr-lsx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/rawmemchr.c

diff --git a/sysdeps/loongarch/lp64/multiarch/Makefile b/sysdeps/loongarch/lp64/multiarch/Makefile
index 5d7ae7ae73..64416b025a 100644
--- a/sysdeps/loongarch/lp64/multiarch/Makefile
+++ b/sysdeps/loongarch/lp64/multiarch/Makefile
@@ -21,5 +21,8 @@ sysdep_routines += \
   memmove-unaligned \
   memmove-lsx \
   memmove-lasx \
+  rawmemchr-aligned \
+  rawmemchr-lsx \
+  rawmemchr-lasx \
   # sysdep_routines
 endif
diff --git a/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c b/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c
index c8ba87bd81..3db9af1460 100644
---
a/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c +++ b/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c @@ -94,5 +94,13 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, IFUNC_IMPL_ADD (array, i, memmove, 1, __memmove_aligned) ) + IFUNC_IMPL (i, name, rawmemchr, +#if !defined __loongarch_soft_float + IFUNC_IMPL_ADD (array, i, rawmemchr, SUPPORT_LASX, __rawmemchr_lasx) + IFUNC_IMPL_ADD (array, i, rawmemchr, SUPPORT_LSX, __rawmemchr_lsx) +#endif + IFUNC_IMPL_ADD (array, i, rawmemchr, 1, __rawmemchr_aligned) + ) + return i; } diff --git a/sysdeps/loongarch/lp64/multiarch/ifunc-rawmemchr.h b/sysdeps/loongarch/lp64/multiarch/ifunc-rawmemchr.h new file mode 100644 index 0000000000..a7bb4cf9ea --- /dev/null +++ b/sysdeps/loongarch/lp64/multiarch/ifunc-rawmemchr.h @@ -0,0 +1,40 @@ +/* Common definition for rawmemchr ifunc selections. + All versions must be listed in ifunc-impl-list.c. + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . 
*/ + +#include +#include + +#if !defined __loongarch_soft_float +extern __typeof (REDIRECT_NAME) OPTIMIZE (lasx) attribute_hidden; +extern __typeof (REDIRECT_NAME) OPTIMIZE (lsx) attribute_hidden; +#endif +extern __typeof (REDIRECT_NAME) OPTIMIZE (aligned) attribute_hidden; + +static inline void * +IFUNC_SELECTOR (void) +{ +#if !defined __loongarch_soft_float + if (SUPPORT_LASX) + return OPTIMIZE (lasx); + else if (SUPPORT_LSX) + return OPTIMIZE (lsx); + else +#endif + return OPTIMIZE (aligned); +} diff --git a/sysdeps/loongarch/lp64/multiarch/rawmemchr-aligned.S b/sysdeps/loongarch/lp64/multiarch/rawmemchr-aligned.S new file mode 100644 index 0000000000..9c7155ae82 --- /dev/null +++ b/sysdeps/loongarch/lp64/multiarch/rawmemchr-aligned.S @@ -0,0 +1,124 @@ +/* Optimized rawmemchr implementation using basic LoongArch instructions. + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . 
*/ + +#include +#include +#include + +#if IS_IN (libc) +# define RAWMEMCHR_NAME __rawmemchr_aligned +#else +# define RAWMEMCHR_NAME __rawmemchr +#endif + +LEAF(RAWMEMCHR_NAME, 6) + andi t1, a0, 0x7 + bstrins.d a0, zero, 2, 0 + lu12i.w a2, 0x01010 + bstrins.d a1, a1, 15, 8 + + ld.d t0, a0, 0 + slli.d t1, t1, 3 + ori a2, a2, 0x101 + bstrins.d a1, a1, 31, 16 + + li.w t8, -1 + bstrins.d a1, a1, 63, 32 + bstrins.d a2, a2, 63, 32 + sll.d t2, t8, t1 + + sll.d t3, a1, t1 + orn t0, t0, t2 + slli.d a3, a2, 7 + beqz a1, L(find_zero) + + xor t0, t0, t3 + sub.d t1, t0, a2 + andn t2, a3, t0 + and t3, t1, t2 + + bnez t3, L(count_pos) + addi.d a0, a0, 8 + +L(loop): + ld.d t0, a0, 0 + xor t0, t0, a1 + + sub.d t1, t0, a2 + andn t2, a3, t0 + and t3, t1, t2 + bnez t3, L(count_pos) + + ld.d t0, a0, 8 + addi.d a0, a0, 16 + xor t0, t0, a1 + sub.d t1, t0, a2 + + andn t2, a3, t0 + and t3, t1, t2 + beqz t3, L(loop) + addi.d a0, a0, -8 +L(count_pos): + ctz.d t0, t3 + srli.d t0, t0, 3 + add.d a0, a0, t0 + jr ra + +L(loop_7bit): + ld.d t0, a0, 0 +L(find_zero): + sub.d t1, t0, a2 + and t2, t1, a3 + bnez t2, L(more_check) + + ld.d t0, a0, 8 + addi.d a0, a0, 16 + sub.d t1, t0, a2 + and t2, t1, a3 + + beqz t2, L(loop_7bit) + addi.d a0, a0, -8 + +L(more_check): + andn t2, a3, t0 + and t3, t1, t2 + bnez t3, L(count_pos) + addi.d a0, a0, 8 + +L(loop_8bit): + ld.d t0, a0, 0 + + sub.d t1, t0, a2 + andn t2, a3, t0 + and t3, t1, t2 + bnez t3, L(count_pos) + + ld.d t0, a0, 8 + addi.d a0, a0, 16 + sub.d t1, t0, a2 + + andn t2, a3, t0 + and t3, t1, t2 + beqz t3, L(loop_8bit) + + addi.d a0, a0, -8 + b L(count_pos) + +END(RAWMEMCHR_NAME) + +libc_hidden_builtin_def (__rawmemchr) diff --git a/sysdeps/loongarch/lp64/multiarch/rawmemchr-lasx.S b/sysdeps/loongarch/lp64/multiarch/rawmemchr-lasx.S new file mode 100644 index 0000000000..be2eb59dbe --- /dev/null +++ b/sysdeps/loongarch/lp64/multiarch/rawmemchr-lasx.S @@ -0,0 +1,82 @@ +/* Optimized rawmemchr implementation using LoongArch LASX instructions. 
+ Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +#include +#include +#include + +#if IS_IN (libc) && !defined __loongarch_soft_float + +# define RAWMEMCHR __rawmemchr_lasx + +LEAF(RAWMEMCHR, 6) + move a2, a0 + bstrins.d a0, zero, 5, 0 + xvld xr0, a0, 0 + xvld xr1, a0, 32 + + xvreplgr2vr.b xr2, a1 + xvseq.b xr0, xr0, xr2 + xvseq.b xr1, xr1, xr2 + xvmsknz.b xr0, xr0 + + xvmsknz.b xr1, xr1 + xvpickve.w xr3, xr0, 4 + xvpickve.w xr4, xr1, 4 + vilvl.h vr0, vr3, vr0 + + vilvl.h vr1, vr4, vr1 + vilvl.w vr0, vr1, vr0 + movfr2gr.d t0, fa0 + sra.d t0, t0, a2 + + + beqz t0, L(loop) + ctz.d t0, t0 + add.d a0, a2, t0 + jr ra + +L(loop): + xvld xr0, a0, 64 + xvld xr1, a0, 96 + addi.d a0, a0, 64 + xvseq.b xr0, xr0, xr2 + + xvseq.b xr1, xr1, xr2 + xvmax.bu xr3, xr0, xr1 + xvseteqz.v fcc0, xr3 + bcnez fcc0, L(loop) + + xvmsknz.b xr0, xr0 + xvmsknz.b xr1, xr1 + xvpickve.w xr3, xr0, 4 + xvpickve.w xr4, xr1, 4 + + + vilvl.h vr0, vr3, vr0 + vilvl.h vr1, vr4, vr1 + vilvl.w vr0, vr1, vr0 + movfr2gr.d t0, fa0 + + ctz.d t0, t0 + add.d a0, a0, t0 + jr ra +END(RAWMEMCHR) + +libc_hidden_builtin_def (RAWMEMCHR) +#endif diff --git a/sysdeps/loongarch/lp64/multiarch/rawmemchr-lsx.S b/sysdeps/loongarch/lp64/multiarch/rawmemchr-lsx.S new file mode 100644 index 0000000000..2f6fe024dc --- /dev/null +++ 
b/sysdeps/loongarch/lp64/multiarch/rawmemchr-lsx.S @@ -0,0 +1,71 @@ +/* Optimized rawmemchr implementation using LoongArch LSX instructions. + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +#include +#include +#include + +#if IS_IN (libc) && !defined __loongarch_soft_float + +# define RAWMEMCHR __rawmemchr_lsx + +LEAF(RAWMEMCHR, 6) + move a2, a0 + bstrins.d a0, zero, 4, 0 + vld vr0, a0, 0 + vld vr1, a0, 16 + + vreplgr2vr.b vr2, a1 + vseq.b vr0, vr0, vr2 + vseq.b vr1, vr1, vr2 + vmsknz.b vr0, vr0 + + vmsknz.b vr1, vr1 + vilvl.h vr0, vr1, vr0 + movfr2gr.s t0, fa0 + sra.w t0, t0, a2 + + beqz t0, L(loop) + ctz.w t0, t0 + add.d a0, a2, t0 + jr ra + + +L(loop): + vld vr0, a0, 32 + vld vr1, a0, 48 + addi.d a0, a0, 32 + vseq.b vr0, vr0, vr2 + + vseq.b vr1, vr1, vr2 + vmax.bu vr3, vr0, vr1 + vseteqz.v fcc0, vr3 + bcnez fcc0, L(loop) + + vmsknz.b vr0, vr0 + vmsknz.b vr1, vr1 + vilvl.h vr0, vr1, vr0 + movfr2gr.s t0, fa0 + + ctz.w t0, t0 + add.d a0, a0, t0 + jr ra +END(RAWMEMCHR) + +libc_hidden_builtin_def (RAWMEMCHR) +#endif diff --git a/sysdeps/loongarch/lp64/multiarch/rawmemchr.c b/sysdeps/loongarch/lp64/multiarch/rawmemchr.c new file mode 100644 index 0000000000..89c7ffff8f --- /dev/null +++ b/sysdeps/loongarch/lp64/multiarch/rawmemchr.c @@ -0,0 +1,37 @@ +/* Multiple versions of rawmemchr. 
+ All versions must be listed in ifunc-impl-list.c. + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#if IS_IN (libc) +# define rawmemchr __redirect_rawmemchr +# define __rawmemchr __redirect___rawmemchr +# include +# undef rawmemchr +# undef __rawmemchr + +# define SYMBOL_NAME rawmemchr +# include "ifunc-rawmemchr.h" + +libc_ifunc_redirected (__redirect_rawmemchr, __rawmemchr, + IFUNC_SELECTOR ()); +weak_alias (__rawmemchr, rawmemchr) +# ifdef SHARED +__hidden_ver1 (__rawmemchr, __GI___rawmemchr, __redirect___rawmemchr) + __attribute__((visibility ("hidden"))); +# endif +#endif From patchwork Mon Aug 28 07:26:47 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: dengjianbo X-Patchwork-Id: 1826630 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=sourceware.org (client-ip=8.43.85.97; helo=server2.sourceware.org; envelope-from=libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (ip-8-43-85-97.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) 
key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4RZ2Fm1sncz1yfX for ; Mon, 28 Aug 2023 17:27:40 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 31465385DC30 for ; Mon, 28 Aug 2023 07:27:38 +0000 (GMT) X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail.loongson.cn (mail.loongson.cn [114.242.206.163]) by sourceware.org (Postfix) with ESMTP id 41BCD3858D28 for ; Mon, 28 Aug 2023 07:26:58 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 41BCD3858D28 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=loongson.cn Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=loongson.cn Received: from loongson.cn (unknown [10.2.5.5]) by gateway (Coremail) with SMTP id _____8BxnutATOxkZHUcAA--.55624S3; Mon, 28 Aug 2023 15:26:56 +0800 (CST) Received: from 5.5.5 (unknown [10.2.5.5]) by localhost.localdomain (Coremail) with SMTP id AQAAf8Dx4eQ9TOxkBYBlAA--.49174S4; Mon, 28 Aug 2023 15:26:55 +0800 (CST) From: dengjianbo To: libc-alpha@sourceware.org Subject: [PATCH 2/6] LoongArch: Add ifunc support for memchr{aligned, lsx, lasx} Date: Mon, 28 Aug 2023 15:26:47 +0800 Message-Id: <20230828072651.3085034-3-dengjianbo@loongson.cn> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20230828072651.3085034-1-dengjianbo@loongson.cn> References: <20230828072651.3085034-1-dengjianbo@loongson.cn> MIME-Version: 1.0 X-CM-TRANSID: AQAAf8Dx4eQ9TOxkBYBlAA--.49174S4 X-CM-SenderInfo: pghqwyxldqu0o6or00hjvr0hdfq/ X-Coremail-Antispam: 1Uk129KBj9fXoW3KF15CryxXrWDtw45JFy8Xrc_yoW8JFy8Zo WftFWDJrs2krs0yrZ3CrsrX3srWFySgr4jv3y5ZayrJr18KryUKF93Ca4akrsFgrn5uan5 Xa4xZ3sxJ3yxGFn3l-sFpf9Il3svdjkaLaAFLSUrUUUUjb8apTn2vfkv8UJUUUU8wcxFpf 9Il3svdxBIdaVrn0xqx4xG64xvF2IEw4CE5I8CrVC2j2Jv73VFW2AGmfu7bjvjm3AaLaJ3 
UjIYCTnIWjp_UUUYI7kC6x804xWl14x267AKxVWUJVW8JwAFc2x0x2IEx4CE42xK8VAvwI 8IcIk0rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2ocxC64kIII0Yj41l84x0c7CEw4AK67xG Y2AK021l84ACjcxK6xIIjxv20xvE14v26r4j6ryUM28EF7xvwVC0I7IYx2IY6xkF7I0E14 v26r4j6F4UM28EF7xvwVC2z280aVAFwI0_Gr1j6F4UJwA2z4x0Y4vEx4A2jsIEc7CjxVAF wI0_Gr1j6F4UJwAS0I0E0xvYzxvE52x082IY62kv0487Mc804VCY07AIYIkI8VC2zVCFFI 0UMc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7IYx2IY67AKxVWUAVWUtwAv7VC2z280 aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4UM4x0Y48IcxkI7VAKI48JMxAIw28Icx kI7VAKI48JMxC20s026xCaFVCjc4AY6r1j6r4UMxCIbckI1I0E14v26r126r1DMI8I3I0E 5I8CrVAFwI0_Jr0_Jr4lx2IqxVCjr7xvwVAFwI0_JrI_JrWlx4CE17CEb7AF67AKxVWUAV WUtwCIc40Y0x0EwIxGrwCI42IY6xIIjxv20xvE14v26r1I6r4UMIIF0xvE2Ix0cI8IcVCY 1x0267AKxVWUJVW8JwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF4lIxAIcVC2z280aVAFwI 0_Jr0_Gr1lIxAIcVC2z280aVCY1x0267AKxVWUJVW8JbIYCTnIWIevJa73UjIFyTuYvjxU wMKuUUUUU X-Spam-Status: No, score=-11.3 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, KAM_SHORT, KAM_STOCKGEN, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: caiyinyu@loongson.cn, xuchenghua@loongson.cn, huangpei@loongson.cn, dengjianbo Errors-To: libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org Sender: "Libc-alpha"

According to the glibc memchr microbenchmark, these implementations
reduce the runtime as follows:

Name            Percent of runtime reduced
memchr-lasx     37%-83%
memchr-lsx      30%-66%
memchr-aligned  0%-15%
---
 sysdeps/loongarch/lp64/multiarch/Makefile     |   3 +
 .../lp64/multiarch/ifunc-impl-list.c          |   7 ++
 .../loongarch/lp64/multiarch/ifunc-memchr.h   |  40 ++++++
 .../loongarch/lp64/multiarch/memchr-aligned.S |  95 ++++++++++++++
 .../loongarch/lp64/multiarch/memchr-lasx.S    | 117 ++++++++++++++++++
sysdeps/loongarch/lp64/multiarch/memchr-lsx.S | 102 +++++++++++++++ sysdeps/loongarch/lp64/multiarch/memchr.c | 37 ++++++ 7 files changed, 401 insertions(+) create mode 100644 sysdeps/loongarch/lp64/multiarch/ifunc-memchr.h create mode 100644 sysdeps/loongarch/lp64/multiarch/memchr-aligned.S create mode 100644 sysdeps/loongarch/lp64/multiarch/memchr-lasx.S create mode 100644 sysdeps/loongarch/lp64/multiarch/memchr-lsx.S create mode 100644 sysdeps/loongarch/lp64/multiarch/memchr.c diff --git a/sysdeps/loongarch/lp64/multiarch/Makefile b/sysdeps/loongarch/lp64/multiarch/Makefile index 64416b025a..2f4802cfa4 100644 --- a/sysdeps/loongarch/lp64/multiarch/Makefile +++ b/sysdeps/loongarch/lp64/multiarch/Makefile @@ -24,5 +24,8 @@ sysdep_routines += \ rawmemchr-aligned \ rawmemchr-lsx \ rawmemchr-lasx \ + memchr-aligned \ + memchr-lsx \ + memchr-lasx \ # sysdep_routines endif diff --git a/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c b/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c index 3db9af1460..a567b9cf4d 100644 --- a/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c +++ b/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c @@ -102,5 +102,12 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, IFUNC_IMPL_ADD (array, i, rawmemchr, 1, __rawmemchr_aligned) ) + IFUNC_IMPL (i, name, memchr, +#if !defined __loongarch_soft_float + IFUNC_IMPL_ADD (array, i, memchr, SUPPORT_LASX, __memchr_lasx) + IFUNC_IMPL_ADD (array, i, memchr, SUPPORT_LSX, __memchr_lsx) +#endif + IFUNC_IMPL_ADD (array, i, memchr, 1, __memchr_aligned) + ) return i; } diff --git a/sysdeps/loongarch/lp64/multiarch/ifunc-memchr.h b/sysdeps/loongarch/lp64/multiarch/ifunc-memchr.h new file mode 100644 index 0000000000..9060ccd54d --- /dev/null +++ b/sysdeps/loongarch/lp64/multiarch/ifunc-memchr.h @@ -0,0 +1,40 @@ +/* Common definition for memchr ifunc selections. + All versions must be listed in ifunc-impl-list.c. + Copyright (C) 2023 Free Software Foundation, Inc. 
+ This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include +#include + +#if !defined __loongarch_soft_float +extern __typeof (REDIRECT_NAME) OPTIMIZE (lasx) attribute_hidden; +extern __typeof (REDIRECT_NAME) OPTIMIZE (lsx) attribute_hidden; +#endif +extern __typeof (REDIRECT_NAME) OPTIMIZE (aligned) attribute_hidden; + +static inline void * +IFUNC_SELECTOR (void) +{ +#if !defined __loongarch_soft_float + if (SUPPORT_LASX) + return OPTIMIZE (lasx); + else if (SUPPORT_LSX) + return OPTIMIZE (lsx); + else +#endif + return OPTIMIZE (aligned); +} diff --git a/sysdeps/loongarch/lp64/multiarch/memchr-aligned.S b/sysdeps/loongarch/lp64/multiarch/memchr-aligned.S new file mode 100644 index 0000000000..81d0d00461 --- /dev/null +++ b/sysdeps/loongarch/lp64/multiarch/memchr-aligned.S @@ -0,0 +1,95 @@ +/* Optimized memchr implementation using basic LoongArch instructions. + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. 
+ + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +#include +#include +#include + +#if IS_IN (libc) +# define MEMCHR_NAME __memchr_aligned +#else +# define MEMCHR_NAME memchr +#endif + +LEAF(MEMCHR_NAME, 6) + beqz a2, L(out) + andi t1, a0, 0x7 + add.d a5, a0, a2 + bstrins.d a0, zero, 2, 0 + + ld.d t0, a0, 0 + bstrins.d a1, a1, 15, 8 + lu12i.w a3, 0x01010 + slli.d t2, t1, 03 + + bstrins.d a1, a1, 31, 16 + ori a3, a3, 0x101 + li.d t7, -1 + li.d t8, 8 + + bstrins.d a1, a1, 63, 32 + bstrins.d a3, a3, 63, 32 + sll.d t2, t7, t2 + xor t0, t0, a1 + + + addi.d a6, a5, -1 + slli.d a4, a3, 7 + sub.d t1, t8, t1 + orn t0, t0, t2 + + sub.d t2, t0, a3 + andn t3, a4, t0 + bstrins.d a6, zero, 2, 0 + and t0, t2, t3 + + bgeu t1, a2, L(end) +L(loop): + bnez t0, L(found) + ld.d t1, a0, 8 + xor t0, t1, a1 + + addi.d a0, a0, 8 + sub.d t2, t0, a3 + andn t3, a4, t0 + and t0, t2, t3 + + + bne a0, a6, L(loop) +L(end): + sub.d t1, a5, a6 + ctz.d t0, t0 + srli.d t0, t0, 3 + + sltu t1, t0, t1 + add.d a0, a0, t0 + maskeqz a0, a0, t1 + jr ra + +L(found): + ctz.d t0, t0 + srli.d t0, t0, 3 + add.d a0, a0, t0 + jr ra + +L(out): + move a0, zero + jr ra +END(MEMCHR_NAME) + +libc_hidden_builtin_def (MEMCHR_NAME) diff --git a/sysdeps/loongarch/lp64/multiarch/memchr-lasx.S b/sysdeps/loongarch/lp64/multiarch/memchr-lasx.S new file mode 100644 index 0000000000..a26cdf48b5 --- /dev/null +++ b/sysdeps/loongarch/lp64/multiarch/memchr-lasx.S @@ -0,0 +1,117 @@ +/* Optimized memchr implementation using LoongArch LASX instructions. + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. 
+ + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +#include +#include +#include + +#if IS_IN (libc) && !defined __loongarch_soft_float + +# define MEMCHR __memchr_lasx + +LEAF(MEMCHR, 6) + beqz a2, L(ret0) + add.d a3, a0, a2 + andi t0, a0, 0x3f + bstrins.d a0, zero, 5, 0 + + xvld xr0, a0, 0 + xvld xr1, a0, 32 + li.d t1, -1 + li.d t2, 64 + + xvreplgr2vr.b xr2, a1 + sll.d t3, t1, t0 + sub.d t2, t2, t0 + xvseq.b xr0, xr0, xr2 + + xvseq.b xr1, xr1, xr2 + xvmsknz.b xr0, xr0 + xvmsknz.b xr1, xr1 + xvpickve.w xr3, xr0, 4 + + + xvpickve.w xr4, xr1, 4 + vilvl.h vr0, vr3, vr0 + vilvl.h vr1, vr4, vr1 + vilvl.w vr0, vr1, vr0 + + movfr2gr.d t0, fa0 + and t0, t0, t3 + bgeu t2, a2, L(end) + bnez t0, L(found) + + addi.d a4, a3, -1 + bstrins.d a4, zero, 5, 0 +L(loop): + xvld xr0, a0, 64 + xvld xr1, a0, 96 + + addi.d a0, a0, 64 + xvseq.b xr0, xr0, xr2 + xvseq.b xr1, xr1, xr2 + beq a0, a4, L(out) + + + xvmax.bu xr3, xr0, xr1 + xvseteqz.v fcc0, xr3 + bcnez fcc0, L(loop) + xvmsknz.b xr0, xr0 + + xvmsknz.b xr1, xr1 + xvpickve.w xr3, xr0, 4 + xvpickve.w xr4, xr1, 4 + vilvl.h vr0, vr3, vr0 + + vilvl.h vr1, vr4, vr1 + vilvl.w vr0, vr1, vr0 + movfr2gr.d t0, fa0 +L(found): + ctz.d t1, t0 + + add.d a0, a0, t1 + jr ra +L(ret0): + move a0, zero + jr ra + + +L(out): + xvmsknz.b xr0, xr0 + xvmsknz.b xr1, xr1 + xvpickve.w xr3, xr0, 4 + xvpickve.w xr4, xr1, 4 + + vilvl.h vr0, vr3, vr0 + vilvl.h vr1, vr4, vr1 + 
vilvl.w vr0, vr1, vr0 + movfr2gr.d t0, fa0 + +L(end): + sub.d t2, zero, a3 + srl.d t1, t1, t2 + and t0, t0, t1 + ctz.d t1, t0 + + add.d a0, a0, t1 + maskeqz a0, a0, t0 + jr ra +END(MEMCHR) + +libc_hidden_builtin_def (MEMCHR) +#endif diff --git a/sysdeps/loongarch/lp64/multiarch/memchr-lsx.S b/sysdeps/loongarch/lp64/multiarch/memchr-lsx.S new file mode 100644 index 0000000000..a73ecd2599 --- /dev/null +++ b/sysdeps/loongarch/lp64/multiarch/memchr-lsx.S @@ -0,0 +1,102 @@ +/* Optimized memchr implementation using LoongArch LSX instructions. + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . 
*/ + +#include +#include +#include + +#if IS_IN (libc) && !defined __loongarch_soft_float + +# define MEMCHR __memchr_lsx + +LEAF(MEMCHR, 6) + beqz a2, L(ret0) + add.d a3, a0, a2 + andi t0, a0, 0x1f + bstrins.d a0, zero, 4, 0 + + vld vr0, a0, 0 + vld vr1, a0, 16 + li.d t1, -1 + li.d t2, 32 + + vreplgr2vr.b vr2, a1 + sll.d t3, t1, t0 + sub.d t2, t2, t0 + vseq.b vr0, vr0, vr2 + + vseq.b vr1, vr1, vr2 + vmsknz.b vr0, vr0 + vmsknz.b vr1, vr1 + vilvl.h vr0, vr1, vr0 + + + movfr2gr.s t0, fa0 + and t0, t0, t3 + bgeu t2, a2, L(end) + bnez t0, L(found) + + addi.d a4, a3, -1 + bstrins.d a4, zero, 4, 0 +L(loop): + vld vr0, a0, 32 + vld vr1, a0, 48 + + addi.d a0, a0, 32 + vseq.b vr0, vr0, vr2 + vseq.b vr1, vr1, vr2 + beq a0, a4, L(out) + + vmax.bu vr3, vr0, vr1 + vseteqz.v fcc0, vr3 + bcnez fcc0, L(loop) + vmsknz.b vr0, vr0 + + + vmsknz.b vr1, vr1 + vilvl.h vr0, vr1, vr0 + movfr2gr.s t0, fa0 +L(found): + ctz.w t0, t0 + + add.d a0, a0, t0 + jr ra +L(ret0): + move a0, zero + jr ra + +L(out): + vmsknz.b vr0, vr0 + vmsknz.b vr1, vr1 + vilvl.h vr0, vr1, vr0 + movfr2gr.s t0, fa0 + +L(end): + sub.d t2, zero, a3 + srl.w t1, t1, t2 + and t0, t0, t1 + ctz.w t1, t0 + + + add.d a0, a0, t1 + maskeqz a0, a0, t0 + jr ra +END(MEMCHR) + +libc_hidden_builtin_def (MEMCHR) +#endif diff --git a/sysdeps/loongarch/lp64/multiarch/memchr.c b/sysdeps/loongarch/lp64/multiarch/memchr.c new file mode 100644 index 0000000000..059479c0ce --- /dev/null +++ b/sysdeps/loongarch/lp64/multiarch/memchr.c @@ -0,0 +1,37 @@ +/* Multiple versions of memchr. + All versions must be listed in ifunc-impl-list.c. + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. 
+ + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +/* Define multiple versions only for the definition in libc. */ +#if IS_IN (libc) +# define memchr __redirect_memchr +# include +# undef memchr + +# define SYMBOL_NAME memchr +# include "ifunc-memchr.h" + +libc_ifunc_redirected (__redirect_memchr, memchr, + IFUNC_SELECTOR ()); + +# ifdef SHARED +__hidden_ver1 (memchr, __GI_memchr, __redirect_memchr) + __attribute__ ((visibility ("hidden"))) __attribute_copy__ (memchr); +# endif + +#endif From patchwork Mon Aug 28 07:26:48 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: dengjianbo X-Patchwork-Id: 1826632 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=sourceware.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4RZ2GC0yhkz1yfX for ; Mon, 28 Aug 2023 17:28:03 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 35C66385B81D for ; Mon, 28 Aug 2023 07:28:01 +0000 (GMT) X-Original-To: libc-alpha@sourceware.org 
Delivered-To: libc-alpha@sourceware.org Received: from mail.loongson.cn (mail.loongson.cn [114.242.206.163]) by sourceware.org (Postfix) with ESMTP id D6AD23858D38 for ; Mon, 28 Aug 2023 07:26:58 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org D6AD23858D38 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=loongson.cn Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=loongson.cn Received: from loongson.cn (unknown [10.2.5.5]) by gateway (Coremail) with SMTP id _____8CxRuhBTOxkaHUcAA--.4728S3; Mon, 28 Aug 2023 15:26:57 +0800 (CST) Received: from 5.5.5 (unknown [10.2.5.5]) by localhost.localdomain (Coremail) with SMTP id AQAAf8Dx4eQ9TOxkBYBlAA--.49174S5; Mon, 28 Aug 2023 15:26:56 +0800 (CST) From: dengjianbo To: libc-alpha@sourceware.org Subject: [PATCH 3/6] LoongArch: Add ifunc support for memrchr{lsx, lasx} Date: Mon, 28 Aug 2023 15:26:48 +0800 Message-Id: <20230828072651.3085034-4-dengjianbo@loongson.cn> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20230828072651.3085034-1-dengjianbo@loongson.cn> References: <20230828072651.3085034-1-dengjianbo@loongson.cn> MIME-Version: 1.0 X-CM-TRANSID: AQAAf8Dx4eQ9TOxkBYBlAA--.49174S5 X-CM-SenderInfo: pghqwyxldqu0o6or00hjvr0hdfq/ X-Coremail-Antispam: 1Uk129KBj93XoW3ZFWDuw18Cr4kXF4DZr17CFX_yoWkuFWfpF Wkur15Gan7CrW7WFWxG3Wav3WrCFs5Jrn0g3WY9rWUXrWkXr1kuF42yFWkW3WkJ3yrGrWY vanIvFyj9F48AagCm3ZEXasCq-sJn29KB7ZKAUJUUUU5529EdanIXcx71UUUUU7KY7ZEXa sCq-sGcSsGvfJ3Ic02F40EFcxC0VAKzVAqx4xG6I80ebIjqfuFe4nvWSU5nxnvy29KBjDU 0xBIdaVrnRJUUUvIb4IE77IF4wAFF20E14v26r1j6r4UM7CY07I20VC2zVCF04k26cxKx2 IYs7xG6rWj6s0DM7CIcVAFz4kK6r1Y6r17M28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48v e4kI8wA2z4x0Y4vE2Ix0cI8IcVAFwI0_Xr0_Ar1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI 0_Gr0_Cr1l84ACjcxK6I8E87Iv67AKxVW8Jr0_Cr1UM28EF7xvwVC2z280aVCY1x0267AK xVW8Jr0_Cr1UM2AIxVAIcxkEcVAq07x20xvEncxIr21l57IF6xkI12xvs2x26I8E6xACxx 1l5I8CrVACY4xI64kE6c02F40Ex7xfMcIj6xIIjxv20xvE14v26r1q6rW5McIj6I8E87Iv 
67AKxVW8JVWxJwAm72CE4IkC6x0Yz7v_Jr0_Gr1lF7xvr2IYc2Ij64vIr41l42xK82IYc2 Ij64vIr41l4I8I3I0E4IkC6x0Yz7v_Jr0_Gr1l4IxYO2xFxVAFwI0_Jw0_GFylx2IqxVAq x4xG67AKxVWUJVWUGwC20s026x8GjcxK67AKxVWUGVWUWwC2zVAF1VAY17CE14v26r126r 1DMIIYrxkI7VAKI48JMIIF0xvE2Ix0cI8IcVAFwI0_JFI_Gr1lIxAIcVC0I7IYx2IY6xkF 7I0E14v26r1j6r4UMIIF0xvE42xK8VAvwI8IcIk0rVWUJVWUCwCI42IY6I8E87Iv67AKxV WUJVW8JwCI42IY6I8E87Iv6xkF7I0E14v26r1j6r4UYxBIdaVFxhVjvjDU0xZFpf9x07jo sjUUUUUU= X-Spam-Status: No, score=-11.4 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, KAM_SHORT, KAM_STOCKGEN, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: caiyinyu@loongson.cn, xuchenghua@loongson.cn, huangpei@loongson.cn, dengjianbo Errors-To: libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org Sender: "Libc-alpha"

According to the glibc memrchr microbenchmark, this implementation
reduces the runtime as follows:

Name            Percent of runtime reduced
memrchr-lasx    20%-83%
memrchr-lsx     20%-64%
---
 sysdeps/loongarch/lp64/multiarch/Makefile | 3 +
 .../lp64/multiarch/ifunc-impl-list.c | 8 ++
 .../loongarch/lp64/multiarch/ifunc-memrchr.h | 40 ++++++
 .../lp64/multiarch/memrchr-generic.c | 23 ++++
 .../loongarch/lp64/multiarch/memrchr-lasx.S | 123 ++++++++++++++++++
 .../loongarch/lp64/multiarch/memrchr-lsx.S | 105 +++++++++++++++
 sysdeps/loongarch/lp64/multiarch/memrchr.c | 33 +++++
 7 files changed, 335 insertions(+)
 create mode 100644 sysdeps/loongarch/lp64/multiarch/ifunc-memrchr.h
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memrchr-generic.c
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memrchr-lasx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memrchr-lsx.S
 create mode 100644
sysdeps/loongarch/lp64/multiarch/memrchr.c diff --git a/sysdeps/loongarch/lp64/multiarch/Makefile b/sysdeps/loongarch/lp64/multiarch/Makefile index 2f4802cfa4..7b87bc9055 100644 --- a/sysdeps/loongarch/lp64/multiarch/Makefile +++ b/sysdeps/loongarch/lp64/multiarch/Makefile @@ -27,5 +27,8 @@ sysdep_routines += \ memchr-aligned \ memchr-lsx \ memchr-lasx \ + memrchr-generic \ + memrchr-lsx \ + memrchr-lasx \ # sysdep_routines endif diff --git a/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c b/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c index a567b9cf4d..8bd5489ee2 100644 --- a/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c +++ b/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c @@ -109,5 +109,13 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, #endif IFUNC_IMPL_ADD (array, i, memchr, 1, __memchr_aligned) ) + + IFUNC_IMPL (i, name, memrchr, +#if !defined __loongarch_soft_float + IFUNC_IMPL_ADD (array, i, memrchr, SUPPORT_LASX, __memrchr_lasx) + IFUNC_IMPL_ADD (array, i, memrchr, SUPPORT_LSX, __memrchr_lsx) +#endif + IFUNC_IMPL_ADD (array, i, memrchr, 1, __memrchr_generic) + ) return i; } diff --git a/sysdeps/loongarch/lp64/multiarch/ifunc-memrchr.h b/sysdeps/loongarch/lp64/multiarch/ifunc-memrchr.h new file mode 100644 index 0000000000..8215f9ad94 --- /dev/null +++ b/sysdeps/loongarch/lp64/multiarch/ifunc-memrchr.h @@ -0,0 +1,40 @@ +/* Common definition for memrchr implementation. + All versions must be listed in ifunc-impl-list.c. + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. 
+ + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include +#include + +#if !defined __loongarch_soft_float +extern __typeof (REDIRECT_NAME) OPTIMIZE (lasx) attribute_hidden; +extern __typeof (REDIRECT_NAME) OPTIMIZE (lsx) attribute_hidden; +#endif +extern __typeof (REDIRECT_NAME) OPTIMIZE (generic) attribute_hidden; + +static inline void * +IFUNC_SELECTOR (void) +{ +#if !defined __loongarch_soft_float + if (SUPPORT_LASX) + return OPTIMIZE (lasx); + else if (SUPPORT_LSX) + return OPTIMIZE (lsx); + else +#endif + return OPTIMIZE (generic); +} diff --git a/sysdeps/loongarch/lp64/multiarch/memrchr-generic.c b/sysdeps/loongarch/lp64/multiarch/memrchr-generic.c new file mode 100644 index 0000000000..ced61ebce5 --- /dev/null +++ b/sysdeps/loongarch/lp64/multiarch/memrchr-generic.c @@ -0,0 +1,23 @@ +/* Generic implementation of memrchr. + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . 
*/ + +#if IS_IN (libc) +# define MEMRCHR __memrchr_generic +#endif + +#include diff --git a/sysdeps/loongarch/lp64/multiarch/memrchr-lasx.S b/sysdeps/loongarch/lp64/multiarch/memrchr-lasx.S new file mode 100644 index 0000000000..5f3e0d06d7 --- /dev/null +++ b/sysdeps/loongarch/lp64/multiarch/memrchr-lasx.S @@ -0,0 +1,123 @@ +/* Optimized memrchr implementation using LoongArch LASX instructions. + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . 
*/ + +#include +#include +#include + +#if IS_IN (libc) && !defined __loongarch_soft_float + +#ifndef MEMRCHR +# define MEMRCHR __memrchr_lasx +#endif + +LEAF(MEMRCHR, 6) + beqz a2, L(ret0) + addi.d a2, a2, -1 + add.d a3, a0, a2 + andi t1, a3, 0x3f + + bstrins.d a3, zero, 5, 0 + addi.d t1, t1, 1 + xvld xr0, a3, 0 + xvld xr1, a3, 32 + + sub.d t2, zero, t1 + li.d t3, -1 + xvreplgr2vr.b xr2, a1 + andi t4, a0, 0x3f + + srl.d t2, t3, t2 + xvseq.b xr0, xr0, xr2 + xvseq.b xr1, xr1, xr2 + xvmsknz.b xr0, xr0 + + + xvmsknz.b xr1, xr1 + xvpickve.w xr3, xr0, 4 + xvpickve.w xr4, xr1, 4 + vilvl.h vr0, vr3, vr0 + + vilvl.h vr1, vr4, vr1 + vilvl.w vr0, vr1, vr0 + movfr2gr.d t0, fa0 + and t0, t0, t2 + + bltu a2, t1, L(end) + bnez t0, L(found) + bstrins.d a0, zero, 5, 0 +L(loop): + xvld xr0, a3, -64 + + xvld xr1, a3, -32 + addi.d a3, a3, -64 + xvseq.b xr0, xr0, xr2 + xvseq.b xr1, xr1, xr2 + + + beq a0, a3, L(out) + xvmax.bu xr3, xr0, xr1 + xvseteqz.v fcc0, xr3 + bcnez fcc0, L(loop) + + xvmsknz.b xr0, xr0 + xvmsknz.b xr1, xr1 + xvpickve.w xr3, xr0, 4 + xvpickve.w xr4, xr1, 4 + + vilvl.h vr0, vr3, vr0 + vilvl.h vr1, vr4, vr1 + vilvl.w vr0, vr1, vr0 + movfr2gr.d t0, fa0 + +L(found): + addi.d a0, a3, 63 + clz.d t1, t0 + sub.d a0, a0, t1 + jr ra + + +L(out): + xvmsknz.b xr0, xr0 + xvmsknz.b xr1, xr1 + xvpickve.w xr3, xr0, 4 + xvpickve.w xr4, xr1, 4 + + vilvl.h vr0, vr3, vr0 + vilvl.h vr1, vr4, vr1 + vilvl.w vr0, vr1, vr0 + movfr2gr.d t0, fa0 + +L(end): + sll.d t2, t3, t4 + and t0, t0, t2 + addi.d a0, a3, 63 + clz.d t1, t0 + + sub.d a0, a0, t1 + maskeqz a0, a0, t0 + jr ra +L(ret0): + move a0, zero + + + jr ra +END(MEMRCHR) + +libc_hidden_builtin_def (MEMRCHR) +#endif diff --git a/sysdeps/loongarch/lp64/multiarch/memrchr-lsx.S b/sysdeps/loongarch/lp64/multiarch/memrchr-lsx.S new file mode 100644 index 0000000000..39a7c8b076 --- /dev/null +++ b/sysdeps/loongarch/lp64/multiarch/memrchr-lsx.S @@ -0,0 +1,105 @@ +/* Optimized memrchr implementation using LoongArch LSX instructions. 
+ Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +#include +#include +#include + +#if IS_IN (libc) && !defined __loongarch_soft_float + +# define MEMRCHR __memrchr_lsx + +LEAF(MEMRCHR, 6) + beqz a2, L(ret0) + addi.d a2, a2, -1 + add.d a3, a0, a2 + andi t1, a3, 0x1f + + bstrins.d a3, zero, 4, 0 + addi.d t1, t1, 1 + vld vr0, a3, 0 + vld vr1, a3, 16 + + sub.d t2, zero, t1 + li.d t3, -1 + vreplgr2vr.b vr2, a1 + andi t4, a0, 0x1f + + srl.d t2, t3, t2 + vseq.b vr0, vr0, vr2 + vseq.b vr1, vr1, vr2 + vmsknz.b vr0, vr0 + + + vmsknz.b vr1, vr1 + vilvl.h vr0, vr1, vr0 + movfr2gr.s t0, fa0 + and t0, t0, t2 + + bltu a2, t1, L(end) + bnez t0, L(found) + bstrins.d a0, zero, 4, 0 +L(loop): + vld vr0, a3, -32 + + vld vr1, a3, -16 + addi.d a3, a3, -32 + vseq.b vr0, vr0, vr2 + vseq.b vr1, vr1, vr2 + + beq a0, a3, L(out) + vmax.bu vr3, vr0, vr1 + vseteqz.v fcc0, vr3 + bcnez fcc0, L(loop) + + + vmsknz.b vr0, vr0 + vmsknz.b vr1, vr1 + vilvl.h vr0, vr1, vr0 + movfr2gr.s t0, fa0 + +L(found): + addi.d a0, a3, 31 + clz.w t1, t0 + sub.d a0, a0, t1 + jr ra + +L(out): + vmsknz.b vr0, vr0 + vmsknz.b vr1, vr1 + vilvl.h vr0, vr1, vr0 + movfr2gr.s t0, fa0 + +L(end): + sll.d t2, t3, t4 + and t0, t0, t2 + addi.d a0, a3, 31 + clz.w t1, t0 + + + sub.d a0, a0, t1 + maskeqz a0, a0, t0 + jr ra +L(ret0): + move a0, zero + + 
jr ra +END(MEMRCHR) + +libc_hidden_builtin_def (MEMRCHR) +#endif diff --git a/sysdeps/loongarch/lp64/multiarch/memrchr.c b/sysdeps/loongarch/lp64/multiarch/memrchr.c new file mode 100644 index 0000000000..8baba9ab7e --- /dev/null +++ b/sysdeps/loongarch/lp64/multiarch/memrchr.c @@ -0,0 +1,33 @@ +/* Multiple versions of memrchr. + All versions must be listed in ifunc-impl-list.c. + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +/* Define multiple versions only for the definition in libc. 
*/ +#if IS_IN (libc) +# define memrchr __redirect_memrchr +# include +# undef memrchr + +# define SYMBOL_NAME memrchr +# include "ifunc-memrchr.h" + +libc_ifunc_redirected (__redirect_memrchr, __memrchr, IFUNC_SELECTOR ()); +libc_hidden_def (__memrchr) +weak_alias (__memrchr, memrchr) + +#endif From patchwork Mon Aug 28 07:26:49 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: dengjianbo X-Patchwork-Id: 1826633 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=sourceware.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4RZ2Gd4k0tz1yfX for ; Mon, 28 Aug 2023 17:28:25 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 8005138582A3 for ; Mon, 28 Aug 2023 07:28:23 +0000 (GMT) X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail.loongson.cn (mail.loongson.cn [114.242.206.163]) by sourceware.org (Postfix) with ESMTP id 72B2B3858D3C for ; Mon, 28 Aug 2023 07:27:01 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 72B2B3858D3C Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=loongson.cn Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=loongson.cn Received: from loongson.cn (unknown [10.2.5.5]) by gateway (Coremail) with SMTP id 
_____8AxTetDTOxka3UcAA--.52681S3; Mon, 28 Aug 2023 15:26:59 +0800 (CST) Received: from 5.5.5 (unknown [10.2.5.5]) by localhost.localdomain (Coremail) with SMTP id AQAAf8Dx4eQ9TOxkBYBlAA--.49174S6; Mon, 28 Aug 2023 15:26:57 +0800 (CST) From: dengjianbo To: libc-alpha@sourceware.org Subject: [PATCH 4/6] LoongArch: Add ifunc support for memset{aligned, unaligned, lsx, lasx} Date: Mon, 28 Aug 2023 15:26:49 +0800 Message-Id: <20230828072651.3085034-5-dengjianbo@loongson.cn> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20230828072651.3085034-1-dengjianbo@loongson.cn> References: <20230828072651.3085034-1-dengjianbo@loongson.cn> MIME-Version: 1.0 X-CM-TRANSID: AQAAf8Dx4eQ9TOxkBYBlAA--.49174S6 X-CM-SenderInfo: pghqwyxldqu0o6or00hjvr0hdfq/ X-Coremail-Antispam: 1Uk129KBj9fXoWfJrWxtF1UAw4UJFyfCr4rWFX_yoW8CrWfXo WSyFZFqr4Ik3yUAFW2krnxJ39rW34fur12q3yrAw4kJry8Kr43CF9Yk3Z8tw47Krn5CFs5 X3s2qw43AFZ7Grn5l-sFpf9Il3svdjkaLaAFLSUrUUUUUb8apTn2vfkv8UJUUUU8wcxFpf 9Il3svdxBIdaVrn0xqx4xG64xvF2IEw4CE5I8CrVC2j2Jv73VFW2AGmfu7bjvjm3AaLaJ3 UjIYCTnIWjp_UUUYs7kC6x804xWl14x267AKxVWUJVW8JwAFc2x0x2IEx4CE42xK8VAvwI 8IcIk0rVWrJVCq3wAFIxvE14AKwVWUXVWUAwA2ocxC64kIII0Yj41l84x0c7CEw4AK67xG Y2AK021l84ACjcxK6xIIjxv20xvE14v26ryj6F1UM28EF7xvwVC0I7IYx2IY6xkF7I0E14 v26r4j6F4UM28EF7xvwVC2z280aVAFwI0_Gr1j6F4UJwA2z4x0Y4vEx4A2jsIEc7CjxVAF wI0_Gr1j6F4UJwAS0I0E0xvYzxvE52x082IY62kv0487Mc804VCY07AIYIkI8VC2zVCFFI 0UMc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7IYx2IY67AKxVWUtVWrXwAv7VC2z280 aVAFwI0_Gr0_Cr1lOx8S6xCaFVCjc4AY6r1j6r4UM4x0Y48IcxkI7VAKI48JMxAIw28Icx kI7VAKI48JMxC20s026xCaFVCjc4AY6r1j6r4UMxCIbckI1I0E14v26r1q6r43MI8I3I0E 5I8CrVAFwI0_Jr0_Jr4lx2IqxVCjr7xvwVAFwI0_JrI_JrWlx4CE17CEb7AF67AKxVWUAV WUtwCIc40Y0x0EwIxGrwCI42IY6xIIjxv20xvE14v26r4j6ryUMIIF0xvE2Ix0cI8IcVCY 1x0267AKxVW8JVWxJwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF4lIxAIcVC2z280aVAFwI 0_Gr0_Cr1lIxAIcVC2z280aVCY1x0267AKxVW8JVW8JrUvcSsGvfC2KfnxnUUI43ZEXa7I U8l38UUUUUU== X-Spam-Status: No, score=-10.9 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, KAM_SHORT, 
KAM_STOCKGEN, SCC_5_SHORT_WORD_LINES, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: caiyinyu@loongson.cn, xuchenghua@loongson.cn, huangpei@loongson.cn, dengjianbo Errors-To: libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org Sender: "Libc-alpha"

According to the glibc memset microbenchmark results, a few cases with
length less than 8 show a performance degradation for the LSX and LASX
versions. Overall, the LASX version reduces the runtime by about
15%-75%, and the LSX version reduces the runtime by about 15%-50%.

The unaligned version uses unaligned memory accesses to set data whose
length is less than 64 and to align the address to 8 bytes. For this
part, its performance is better than that of the aligned version.
Compared with the generic version, the performance is close when the
length is larger than 128. When the length is 8-128, the unaligned
version reduces the runtime by about 30%-70%, and the aligned version
reduces the runtime by about 20%-50%.
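
For illustration only (this is not code from the patch): the short-length
paths of the vector versions appear to rely on pairs of possibly
overlapping stores, one anchored at the start of the buffer and one at
its end, similar to the L(less_16bytes) path of the LSX version. Below is
a minimal C sketch of that idea for lengths up to 16. The helper name
small_memset_sketch is hypothetical, and memcpy stands in for the
unaligned store instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of the patch: set n <= 16 bytes using a
   head store and a tail store that may overlap in the middle.  */
static void *
small_memset_sketch (void *dst, int c, size_t n)
{
  unsigned char *p = dst;
  unsigned char *end = p + n;
  /* Replicate the byte into all 8 lanes; every byte is identical, so
     the result does not depend on endianness.  */
  uint64_t pat = 0x0101010101010101ULL * (unsigned char) c;

  assert (n <= 16);

  if (n >= 8)
    {
      memcpy (p, &pat, 8);        /* first 8 bytes */
      memcpy (end - 8, &pat, 8);  /* last 8 bytes, may overlap */
    }
  else if (n >= 4)
    {
      memcpy (p, &pat, 4);
      memcpy (end - 4, &pat, 4);
    }
  else if (n >= 2)
    {
      memcpy (p, &pat, 2);
      memcpy (end - 2, &pat, 2);
    }
  else if (n == 1)
    *p = (unsigned char) c;

  return dst;
}
```

The two stores cover every length in a size class without a byte loop,
at the cost of writing some bytes twice.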
--- sysdeps/loongarch/lp64/multiarch/Makefile | 4 + .../lp64/multiarch/dl-symbol-redir-ifunc.h | 24 +++ .../lp64/multiarch/ifunc-impl-list.c | 10 + .../loongarch/lp64/multiarch/memset-aligned.S | 174 ++++++++++++++++++ .../loongarch/lp64/multiarch/memset-lasx.S | 142 ++++++++++++++ sysdeps/loongarch/lp64/multiarch/memset-lsx.S | 135 ++++++++++++++ .../lp64/multiarch/memset-unaligned.S | 162 ++++++++++++++++ sysdeps/loongarch/lp64/multiarch/memset.c | 37 ++++ 8 files changed, 688 insertions(+) create mode 100644 sysdeps/loongarch/lp64/multiarch/dl-symbol-redir-ifunc.h create mode 100644 sysdeps/loongarch/lp64/multiarch/memset-aligned.S create mode 100644 sysdeps/loongarch/lp64/multiarch/memset-lasx.S create mode 100644 sysdeps/loongarch/lp64/multiarch/memset-lsx.S create mode 100644 sysdeps/loongarch/lp64/multiarch/memset-unaligned.S create mode 100644 sysdeps/loongarch/lp64/multiarch/memset.c diff --git a/sysdeps/loongarch/lp64/multiarch/Makefile b/sysdeps/loongarch/lp64/multiarch/Makefile index 7b87bc9055..216886c551 100644 --- a/sysdeps/loongarch/lp64/multiarch/Makefile +++ b/sysdeps/loongarch/lp64/multiarch/Makefile @@ -30,5 +30,9 @@ sysdep_routines += \ memrchr-generic \ memrchr-lsx \ memrchr-lasx \ + memset-aligned \ + memset-unaligned \ + memset-lsx \ + memset-lasx \ # sysdep_routines endif diff --git a/sysdeps/loongarch/lp64/multiarch/dl-symbol-redir-ifunc.h b/sysdeps/loongarch/lp64/multiarch/dl-symbol-redir-ifunc.h new file mode 100644 index 0000000000..e2723873bc --- /dev/null +++ b/sysdeps/loongarch/lp64/multiarch/dl-symbol-redir-ifunc.h @@ -0,0 +1,24 @@ +/* Symbol rediretion for loader/static initialization code. + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. 
+ + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#ifndef _DL_IFUNC_GENERIC_H +#define _DL_IFUNC_GENERIC_H + +asm ("memset = __memset_aligned"); + +#endif diff --git a/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c b/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c index 8bd5489ee2..37f60dde91 100644 --- a/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c +++ b/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c @@ -117,5 +117,15 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, #endif IFUNC_IMPL_ADD (array, i, memrchr, 1, __memrchr_generic) ) + + IFUNC_IMPL (i, name, memset, +#if !defined __loongarch_soft_float + IFUNC_IMPL_ADD (array, i, memset, SUPPORT_LASX, __memset_lasx) + IFUNC_IMPL_ADD (array, i, memset, SUPPORT_LSX, __memset_lsx) +#endif + IFUNC_IMPL_ADD (array, i, memset, SUPPORT_UAL, __memset_unaligned) + IFUNC_IMPL_ADD (array, i, memset, 1, __memset_aligned) + ) + return i; } diff --git a/sysdeps/loongarch/lp64/multiarch/memset-aligned.S b/sysdeps/loongarch/lp64/multiarch/memset-aligned.S new file mode 100644 index 0000000000..1fce95b714 --- /dev/null +++ b/sysdeps/loongarch/lp64/multiarch/memset-aligned.S @@ -0,0 +1,174 @@ +/* Optimized memset aligned implementation using basic LoongArch instructions. + Copyright (C) 2023 Free Software Foundation, Inc. + + This file is part of the GNU C Library. 
+ + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +#include +#include +#include + +#if IS_IN (libc) +# define MEMSET_NAME __memset_aligned +#else +# define MEMSET_NAME memset +#endif + +LEAF(MEMSET_NAME, 6) + move t0, a0 + andi a3, a0, 0x7 + li.w t6, 16 + beqz a3, L(align) + bltu a2, t6, L(short_data) + +L(make_align): + li.w t8, 8 + sub.d t2, t8, a3 + pcaddi t1, 11 + slli.d t3, t2, 2 + sub.d t1, t1, t3 + jr t1 + +L(al7): + st.b a1, t0, 6 +L(al6): + st.b a1, t0, 5 +L(al5): + st.b a1, t0, 4 +L(al4): + st.b a1, t0, 3 +L(al3): + st.b a1, t0, 2 +L(al2): + st.b a1, t0, 1 +L(al1): + st.b a1, t0, 0 +L(al0): + add.d t0, t0, t2 + sub.d a2, a2, t2 + +L(align): + bstrins.d a1, a1, 15, 8 + bstrins.d a1, a1, 31, 16 + bstrins.d a1, a1, 63, 32 + bltu a2, t6, L(less_16bytes) + + andi a4, a2, 0x3f + beq a4, a2, L(less_64bytes) + + sub.d t1, a2, a4 + move a2, a4 + add.d a5, t0, t1 + +L(loop_64bytes): + addi.d t0, t0, 64 + st.d a1, t0, -64 + st.d a1, t0, -56 + st.d a1, t0, -48 + st.d a1, t0, -40 + + st.d a1, t0, -32 + st.d a1, t0, -24 + st.d a1, t0, -16 + st.d a1, t0, -8 + bne t0, a5, L(loop_64bytes) + +L(less_64bytes): + srai.d a4, a2, 5 + beqz a4, L(less_32bytes) + addi.d a2, a2, -32 + st.d a1, t0, 0 + + st.d a1, t0, 8 + st.d a1, t0, 16 + st.d a1, t0, 24 + addi.d t0, t0, 32 + +L(less_32bytes): + bltu a2, t6, L(less_16bytes) + addi.d a2, a2, -16 + st.d a1, t0, 0 + st.d a1, t0, 8 + addi.d t0, t0, 
16 + +L(less_16bytes): + srai.d a4, a2, 3 + beqz a4, L(less_8bytes) + addi.d a2, a2, -8 + st.d a1, t0, 0 + addi.d t0, t0, 8 + +L(less_8bytes): + beqz a2, L(less_1byte) + srai.d a4, a2, 2 + beqz a4, L(less_4bytes) + addi.d a2, a2, -4 + st.w a1, t0, 0 + addi.d t0, t0, 4 + +L(less_4bytes): + srai.d a3, a2, 1 + beqz a3, L(less_2bytes) + addi.d a2, a2, -2 + st.h a1, t0, 0 + addi.d t0, t0, 2 + +L(less_2bytes): + beqz a2, L(less_1byte) + st.b a1, t0, 0 +L(less_1byte): + jr ra + +L(short_data): + pcaddi t1, 19 + slli.d t3, a2, 2 + sub.d t1, t1, t3 + jr t1 +L(short_15): + st.b a1, a0, 14 +L(short_14): + st.b a1, a0, 13 +L(short_13): + st.b a1, a0, 12 +L(short_12): + st.b a1, a0, 11 +L(short_11): + st.b a1, a0, 10 +L(short_10): + st.b a1, a0, 9 +L(short_9): + st.b a1, a0, 8 +L(short_8): + st.b a1, a0, 7 +L(short_7): + st.b a1, a0, 6 +L(short_6): + st.b a1, a0, 5 +L(short_5): + st.b a1, a0, 4 +L(short_4): + st.b a1, a0, 3 +L(short_3): + st.b a1, a0, 2 +L(short_2): + st.b a1, a0, 1 +L(short_1): + st.b a1, a0, 0 +L(short_0): + jr ra +END(MEMSET_NAME) + +libc_hidden_builtin_def (MEMSET_NAME) diff --git a/sysdeps/loongarch/lp64/multiarch/memset-lasx.S b/sysdeps/loongarch/lp64/multiarch/memset-lasx.S new file mode 100644 index 0000000000..041abbac87 --- /dev/null +++ b/sysdeps/loongarch/lp64/multiarch/memset-lasx.S @@ -0,0 +1,142 @@ +/* Optimized memset implementation using LoongArch LASX instructions. + Copyright (C) 2023 Free Software Foundation, Inc. + + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +#include +#include +#include + +#if IS_IN (libc) && !defined __loongarch_soft_float + +# define MEMSET __memset_lasx + +LEAF(MEMSET, 6) + li.d t1, 32 + move a3, a0 + xvreplgr2vr.b xr0, a1 + add.d a4, a0, a2 + + bgeu t1, a2, L(less_32bytes) + li.d t3, 128 + li.d t2, 64 + blt t3, a2, L(long_bytes) + +L(less_128bytes): + bgeu t2, a2, L(less_64bytes) + xvst xr0, a3, 0 + xvst xr0, a3, 32 + xvst xr0, a4, -32 + + xvst xr0, a4, -64 + jr ra +L(less_64bytes): + xvst xr0, a3, 0 + xvst xr0, a4, -32 + + + jr ra +L(less_32bytes): + srli.d t0, a2, 4 + beqz t0, L(less_16bytes) + vst vr0, a3, 0 + + vst vr0, a4, -16 + jr ra +L(less_16bytes): + srli.d t0, a2, 3 + beqz t0, L(less_8bytes) + + vstelm.d vr0, a3, 0, 0 + vstelm.d vr0, a4, -8, 0 + jr ra +L(less_8bytes): + srli.d t0, a2, 2 + + beqz t0, L(less_4bytes) + vstelm.w vr0, a3, 0, 0 + vstelm.w vr0, a4, -4, 0 + jr ra + + +L(less_4bytes): + srli.d t0, a2, 1 + beqz t0, L(less_2bytes) + vstelm.h vr0, a3, 0, 0 + vstelm.h vr0, a4, -2, 0 + + jr ra +L(less_2bytes): + beqz a2, L(less_1bytes) + st.b a1, a3, 0 +L(less_1bytes): + jr ra + +L(long_bytes): + xvst xr0, a3, 0 + bstrins.d a3, zero, 4, 0 + addi.d a3, a3, 32 + sub.d a2, a4, a3 + + andi t0, a2, 0xff + beq t0, a2, L(long_end) + move a2, t0 + sub.d t0, a4, t0 + + +L(loop_256): + xvst xr0, a3, 0 + xvst xr0, a3, 32 + xvst xr0, a3, 64 + xvst xr0, a3, 96 + + xvst xr0, a3, 128 + xvst xr0, a3, 160 + xvst xr0, a3, 192 + xvst xr0, a3, 224 + + addi.d a3, a3, 256 + bne a3, t0, L(loop_256) +L(long_end): + bltu a2, t3, L(end_less_128) + addi.d a2, a2, -128 + + xvst xr0, a3, 0 + xvst xr0, a3, 32 + xvst xr0, a3, 64 + xvst xr0, a3, 96 + + + addi.d a3, a3, 128 +L(end_less_128): + bltu a2, t2, L(end_less_64) + addi.d a2, a2, -64 + xvst xr0, a3, 0 + + xvst xr0, a3, 32 + addi.d a3, a3, 64 +L(end_less_64): + bltu a2, 
t1, L(end_less_32) + xvst xr0, a3, 0 + +L(end_less_32): + xvst xr0, a4, -32 + jr ra +END(MEMSET) + +libc_hidden_builtin_def (MEMSET) +#endif diff --git a/sysdeps/loongarch/lp64/multiarch/memset-lsx.S b/sysdeps/loongarch/lp64/multiarch/memset-lsx.S new file mode 100644 index 0000000000..3d3982aa5a --- /dev/null +++ b/sysdeps/loongarch/lp64/multiarch/memset-lsx.S @@ -0,0 +1,135 @@ +/* Optimized memset implementation using LoongArch LSX instructions. + Copyright (C) 2023 Free Software Foundation, Inc. + + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . 
*/ + +#include +#include +#include + +#if IS_IN (libc) && !defined __loongarch_soft_float + +# define MEMSET __memset_lsx + +LEAF(MEMSET, 6) + li.d t1, 16 + move a3, a0 + vreplgr2vr.b vr0, a1 + add.d a4, a0, a2 + + bgeu t1, a2, L(less_16bytes) + li.d t3, 64 + li.d t2, 32 + bgeu a2, t3, L(long_bytes) + +L(less_64bytes): + bgeu t2, a2, L(less_32bytes) + vst vr0, a3, 0 + vst vr0, a3, 16 + vst vr0, a4, -32 + + vst vr0, a4, -16 + jr ra +L(less_32bytes): + vst vr0, a3, 0 + vst vr0, a4, -16 + + + jr ra +L(less_16bytes): + srli.d t0, a2, 3 + beqz t0, L(less_8bytes) + vstelm.d vr0, a3, 0, 0 + + vstelm.d vr0, a4, -8, 0 + jr ra +L(less_8bytes): + srli.d t0, a2, 2 + beqz t0, L(less_4bytes) + + vstelm.w vr0, a3, 0, 0 + vstelm.w vr0, a4, -4, 0 + jr ra +L(less_4bytes): + srli.d t0, a2, 1 + + beqz t0, L(less_2bytes) + vstelm.h vr0, a3, 0, 0 + vstelm.h vr0, a4, -2, 0 + jr ra + + +L(less_2bytes): + beqz a2, L(less_1bytes) + vstelm.b vr0, a3, 0, 0 +L(less_1bytes): + jr ra +L(long_bytes): + vst vr0, a3, 0 + + bstrins.d a3, zero, 3, 0 + addi.d a3, a3, 16 + sub.d a2, a4, a3 + andi t0, a2, 0x7f + + beq t0, a2, L(long_end) + move a2, t0 + sub.d t0, a4, t0 + +L(loop_128): + vst vr0, a3, 0 + + vst vr0, a3, 16 + vst vr0, a3, 32 + vst vr0, a3, 48 + vst vr0, a3, 64 + + + vst vr0, a3, 80 + vst vr0, a3, 96 + vst vr0, a3, 112 + addi.d a3, a3, 128 + + bne a3, t0, L(loop_128) +L(long_end): + bltu a2, t3, L(end_less_64) + addi.d a2, a2, -64 + vst vr0, a3, 0 + + vst vr0, a3, 16 + vst vr0, a3, 32 + vst vr0, a3, 48 + addi.d a3, a3, 64 + +L(end_less_64): + bltu a2, t2, L(end_less_32) + addi.d a2, a2, -32 + vst vr0, a3, 0 + vst vr0, a3, 16 + + addi.d a3, a3, 32 +L(end_less_32): + bltu a2, t1, L(end_less_16) + vst vr0, a3, 0 + +L(end_less_16): + vst vr0, a4, -16 + jr ra +END(MEMSET) + +libc_hidden_builtin_def (MEMSET) +#endif diff --git a/sysdeps/loongarch/lp64/multiarch/memset-unaligned.S b/sysdeps/loongarch/lp64/multiarch/memset-unaligned.S new file mode 100644 index 0000000000..f7d32039df --- /dev/null 
+++ b/sysdeps/loongarch/lp64/multiarch/memset-unaligned.S @@ -0,0 +1,162 @@ +/* Optimized memset unaligned implementation using basic LoongArch instructions. + Copyright (C) 2023 Free Software Foundation, Inc. + + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +#include +#include +#include + +#if IS_IN (libc) + +# define MEMSET_NAME __memset_unaligned + +#define ST_128(n) \ + st.d a1, a0, n; \ + st.d a1, a0, n+8 ; \ + st.d a1, a0, n+16 ; \ + st.d a1, a0, n+24 ; \ + st.d a1, a0, n+32 ; \ + st.d a1, a0, n+40 ; \ + st.d a1, a0, n+48 ; \ + st.d a1, a0, n+56 ; \ + st.d a1, a0, n+64 ; \ + st.d a1, a0, n+72 ; \ + st.d a1, a0, n+80 ; \ + st.d a1, a0, n+88 ; \ + st.d a1, a0, n+96 ; \ + st.d a1, a0, n+104; \ + st.d a1, a0, n+112; \ + st.d a1, a0, n+120; + +LEAF(MEMSET_NAME, 6) + bstrins.d a1, a1, 15, 8 + add.d t7, a0, a2 + bstrins.d a1, a1, 31, 16 + move t0, a0 + + bstrins.d a1, a1, 63, 32 + srai.d t8, a2, 4 + beqz t8, L(less_16bytes) + srai.d t8, a2, 6 + + bnez t8, L(more_64bytes) + srai.d t8, a2, 5 + beqz t8, L(less_32bytes) + + st.d a1, a0, 0 + st.d a1, a0, 8 + st.d a1, a0, 16 + st.d a1, a0, 24 + + st.d a1, t7, -32 + st.d a1, t7, -24 + st.d a1, t7, -16 + st.d a1, t7, -8 + + jr ra + +L(less_32bytes): + st.d a1, a0, 0 + st.d a1, a0, 8 + st.d a1, t7, -16 + st.d a1, t7, -8 + + jr ra + +L(less_16bytes): + srai.d t8, a2, 3 + beqz t8, L(less_8bytes) 
+ st.d a1, a0, 0 + st.d a1, t7, -8 + + jr ra + +L(less_8bytes): + srai.d t8, a2, 2 + beqz t8, L(less_4bytes) + st.w a1, a0, 0 + st.w a1, t7, -4 + + jr ra + +L(less_4bytes): + srai.d t8, a2, 1 + beqz t8, L(less_2bytes) + st.h a1, a0, 0 + st.h a1, t7, -2 + + jr ra + +L(less_2bytes): + beqz a2, L(less_1bytes) + st.b a1, a0, 0 + + jr ra + +L(less_1bytes): + jr ra + +L(more_64bytes): + srli.d a0, a0, 3 + slli.d a0, a0, 3 + addi.d a0, a0, 0x8 + st.d a1, t0, 0 + + sub.d t2, t0, a0 + add.d a2, t2, a2 + addi.d a2, a2, -0x80 + blt a2, zero, L(end_unalign_proc) + +L(loop_less): + ST_128(0) + addi.d a0, a0, 0x80 + addi.d a2, a2, -0x80 + bge a2, zero, L(loop_less) + +L(end_unalign_proc): + addi.d a2, a2, 0x80 + pcaddi t1, 20 + andi t5, a2, 0x78 + srli.d t5, t5, 1 + + sub.d t1, t1, t5 + jr t1 + + st.d a1, a0, 112 + st.d a1, a0, 104 + st.d a1, a0, 96 + st.d a1, a0, 88 + st.d a1, a0, 80 + st.d a1, a0, 72 + st.d a1, a0, 64 + st.d a1, a0, 56 + st.d a1, a0, 48 + st.d a1, a0, 40 + st.d a1, a0, 32 + st.d a1, a0, 24 + st.d a1, a0, 16 + st.d a1, a0, 8 + st.d a1, a0, 0 + st.d a1, t7, -8 + + move a0, t0 + jr ra +END(MEMSET_NAME) + +libc_hidden_builtin_def (MEMSET_NAME) +#endif diff --git a/sysdeps/loongarch/lp64/multiarch/memset.c b/sysdeps/loongarch/lp64/multiarch/memset.c new file mode 100644 index 0000000000..3ff60d8ac7 --- /dev/null +++ b/sysdeps/loongarch/lp64/multiarch/memset.c @@ -0,0 +1,37 @@ +/* Multiple versions of memset. + All versions must be listed in ifunc-impl-list.c. + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. 
+ + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +/* Define multiple versions only for the definition in libc. */ +#if IS_IN (libc) +# define memset __redirect_memset +# include +# undef memset + +# define SYMBOL_NAME memset +# include "ifunc-lasx.h" + +libc_ifunc_redirected (__redirect_memset, memset, + IFUNC_SELECTOR ()); + +# ifdef SHARED +__hidden_ver1 (memset, __GI_memset, __redirect_memset) + __attribute__ ((visibility ("hidden"))) __attribute_copy__ (memset); +# endif + +#endif From patchwork Mon Aug 28 07:26:50 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: dengjianbo X-Patchwork-Id: 1826631 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=sourceware.org (client-ip=8.43.85.97; helo=server2.sourceware.org; envelope-from=libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (ip-8-43-85-97.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4RZ2Fn0j2Fz1yhW for ; Mon, 28 Aug 2023 17:27:41 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 0C46E3853D16 for ; Mon, 28 Aug 2023 07:27:39 +0000 (GMT) X-Original-To: libc-alpha@sourceware.org Delivered-To: 
libc-alpha@sourceware.org Received: from mail.loongson.cn (mail.loongson.cn [114.242.206.163]) by sourceware.org (Postfix) with ESMTP id 46FC8385840A for ; Mon, 28 Aug 2023 07:27:02 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 46FC8385840A Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=loongson.cn Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=loongson.cn Received: from loongson.cn (unknown [10.2.5.5]) by gateway (Coremail) with SMTP id _____8Cxh+hETOxkbnUcAA--.22518S3; Mon, 28 Aug 2023 15:27:00 +0800 (CST) Received: from 5.5.5 (unknown [10.2.5.5]) by localhost.localdomain (Coremail) with SMTP id AQAAf8Dx4eQ9TOxkBYBlAA--.49174S7; Mon, 28 Aug 2023 15:26:59 +0800 (CST) From: dengjianbo To: libc-alpha@sourceware.org Subject: [PATCH 5/6] LoongArch: Add ifunc support for memcmp{aligned, lsx, lasx} Date: Mon, 28 Aug 2023 15:26:50 +0800 Message-Id: <20230828072651.3085034-6-dengjianbo@loongson.cn> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20230828072651.3085034-1-dengjianbo@loongson.cn> References: <20230828072651.3085034-1-dengjianbo@loongson.cn> MIME-Version: 1.0 X-CM-TRANSID: AQAAf8Dx4eQ9TOxkBYBlAA--.49174S7 X-CM-SenderInfo: pghqwyxldqu0o6or00hjvr0hdfq/ X-Coremail-Antispam: 1Uk129KBj9fXoWfAF43ArWkGry8JFyfGw1UCFX_yoW8ZF1DJo WayF4qqws2kws0qFZrCwsxX3srWFWfKr1jq3yUZa1rJryrGr17trZYywnI9rsrtrn5uan8 X3s2vFs8C397GFnrl-sFpf9Il3svdjkaLaAFLSUrUUUUUb8apTn2vfkv8UJUUUU8wcxFpf 9Il3svdxBIdaVrn0xqx4xG64xvF2IEw4CE5I8CrVC2j2Jv73VFW2AGmfu7bjvjm3AaLaJ3 UjIYCTnIWjp_UUUYs7kC6x804xWl14x267AKxVWUJVW8JwAFc2x0x2IEx4CE42xK8VAvwI 8IcIk0rVWrJVCq3wAFIxvE14AKwVWUGVWUXwA2ocxC64kIII0Yj41l84x0c7CEw4AK67xG Y2AK021l84ACjcxK6xIIjxv20xvE14v26ryj6F1UM28EF7xvwVC0I7IYx2IY6xkF7I0E14 v26r4j6F4UM28EF7xvwVC2z280aVAFwI0_Gr1j6F4UJwA2z4x0Y4vEx4A2jsIEc7CjxVAF wI0_Gr1j6F4UJwAS0I0E0xvYzxvE52x082IY62kv0487Mc804VCY07AIYIkI8VC2zVCFFI 0UMc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7IYx2IY67AKxVWUtVWrXwAv7VC2z280 
aVAFwI0_Gr0_Cr1lOx8S6xCaFVCjc4AY6r1j6r4UM4x0Y48IcxkI7VAKI48JMxAIw28Icx kI7VAKI48JMxC20s026xCaFVCjc4AY6r1j6r4UMxCIbckI1I0E14v26r1q6r43MI8I3I0E 5I8CrVAFwI0_Jr0_Jr4lx2IqxVCjr7xvwVAFwI0_JrI_JrWlx4CE17CEb7AF67AKxVWUAV WUtwCIc40Y0x0EwIxGrwCI42IY6xIIjxv20xvE14v26r4j6ryUMIIF0xvE2Ix0cI8IcVCY 1x0267AKxVW8JVWxJwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF4lIxAIcVC2z280aVAFwI 0_Gr0_Cr1lIxAIcVC2z280aVCY1x0267AKxVW8JVW8JrUvcSsGvfC2KfnxnUUI43ZEXa7I U8QJ57UUUUU== X-Spam-Status: No, score=-10.4 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, KAM_SHORT, KAM_STOCKGEN, SCC_10_SHORT_WORD_LINES, SCC_5_SHORT_WORD_LINES, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: caiyinyu@loongson.cn, xuchenghua@loongson.cn, huangpei@loongson.cn, dengjianbo Errors-To: libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org Sender: "Libc-alpha"

According to the glibc memcmp microbenchmark test results (with generic
memcmp added), this implementation improves performance except when the
length is less than 3. Details are as follows:

Name             Percent of time reduced
memcmp-lasx      16%-74%
memcmp-lsx       20%-50%
memcmp-aligned   5%-20%
---
 sysdeps/loongarch/lp64/multiarch/Makefile     |   3 +
 .../lp64/multiarch/ifunc-impl-list.c          |   7 +
 .../loongarch/lp64/multiarch/ifunc-memcmp.h   |  40 +++
 .../loongarch/lp64/multiarch/memcmp-aligned.S | 292 ++++++++++++++++++
 .../loongarch/lp64/multiarch/memcmp-lasx.S    | 207 +++++++++++++
 sysdeps/loongarch/lp64/multiarch/memcmp-lsx.S | 269 ++++++++++++++++
 sysdeps/loongarch/lp64/multiarch/memcmp.c     |  43 +++
 7 files changed, 861 insertions(+)
 create mode 100644 sysdeps/loongarch/lp64/multiarch/ifunc-memcmp.h
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memcmp-aligned.S
 create mode 100644
sysdeps/loongarch/lp64/multiarch/memcmp-lasx.S create mode 100644 sysdeps/loongarch/lp64/multiarch/memcmp-lsx.S create mode 100644 sysdeps/loongarch/lp64/multiarch/memcmp.c diff --git a/sysdeps/loongarch/lp64/multiarch/Makefile b/sysdeps/loongarch/lp64/multiarch/Makefile index 216886c551..360a6718c0 100644 --- a/sysdeps/loongarch/lp64/multiarch/Makefile +++ b/sysdeps/loongarch/lp64/multiarch/Makefile @@ -34,5 +34,8 @@ sysdep_routines += \ memset-unaligned \ memset-lsx \ memset-lasx \ + memcmp-aligned \ + memcmp-lsx \ + memcmp-lasx \ # sysdep_routines endif diff --git a/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c b/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c index 37f60dde91..e397d58c9d 100644 --- a/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c +++ b/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c @@ -127,5 +127,12 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, IFUNC_IMPL_ADD (array, i, memset, 1, __memset_aligned) ) + IFUNC_IMPL (i, name, memcmp, +#if !defined __loongarch_soft_float + IFUNC_IMPL_ADD (array, i, memcmp, SUPPORT_LASX, __memcmp_lasx) + IFUNC_IMPL_ADD (array, i, memcmp, SUPPORT_LSX, __memcmp_lsx) +#endif + IFUNC_IMPL_ADD (array, i, memcmp, 1, __memcmp_aligned) + ) return i; } diff --git a/sysdeps/loongarch/lp64/multiarch/ifunc-memcmp.h b/sysdeps/loongarch/lp64/multiarch/ifunc-memcmp.h new file mode 100644 index 0000000000..04adc2e561 --- /dev/null +++ b/sysdeps/loongarch/lp64/multiarch/ifunc-memcmp.h @@ -0,0 +1,40 @@ +/* Common definition for memcmp ifunc selections. + All versions must be listed in ifunc-impl-list.c. + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. 
+ + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include +#include + +#if !defined __loongarch_soft_float +extern __typeof (REDIRECT_NAME) OPTIMIZE (lasx) attribute_hidden; +extern __typeof (REDIRECT_NAME) OPTIMIZE (lsx) attribute_hidden; +#endif +extern __typeof (REDIRECT_NAME) OPTIMIZE (aligned) attribute_hidden; + +static inline void * +IFUNC_SELECTOR (void) +{ +#if !defined __loongarch_soft_float + if (SUPPORT_LASX) + return OPTIMIZE (lasx); + else if (SUPPORT_LSX) + return OPTIMIZE (lsx); + else +#endif + return OPTIMIZE (aligned); +} diff --git a/sysdeps/loongarch/lp64/multiarch/memcmp-aligned.S b/sysdeps/loongarch/lp64/multiarch/memcmp-aligned.S new file mode 100644 index 0000000000..14a7caa9a8 --- /dev/null +++ b/sysdeps/loongarch/lp64/multiarch/memcmp-aligned.S @@ -0,0 +1,292 @@ +/* Optimized memcmp implementation using basic LoongArch instructions. + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . 
*/ + +#include +#include +#include + +#if IS_IN (libc) +# define MEMCMP_NAME __memcmp_aligned +#else +# define MEMCMP_NAME memcmp +#endif + +LEAF(MEMCMP_NAME, 6) + beqz a2, L(ret) + andi a4, a1, 0x7 + andi a3, a0, 0x7 + sltu a5, a4, a3 + + xor t0, a0, a1 + li.w t8, 8 + maskeqz t0, t0, a5 + li.w t7, -1 + + xor a0, a0, t0 + xor a1, a1, t0 + andi a3, a0, 0x7 + andi a4, a1, 0x7 + + xor a0, a0, a3 + xor a1, a1, a4 + ld.d t2, a0, 0 + ld.d t1, a1, 0 + + slli.d t3, a3, 3 + slli.d t4, a4, 3 + sub.d a6, t3, t4 + srl.d t1, t1, t4 + + srl.d t0, t2, t3 + srl.d t5, t7, t4 + sub.d t6, t0, t1 + and t6, t6, t5 + + sub.d t5, t8, a4 + bnez t6, L(first_out) + bgeu t5, a2, L(ret) + sub.d a2, a2, t5 + + bnez a6, L(unaligned) + blt a2, t8, L(al_less_8bytes) + andi t1, a2, 31 + beq t1, a2, L(al_less_32bytes) + + sub.d t2, a2, t1 + add.d a4, a0, t2 + move a2, t1 + +L(al_loop): + ld.d t0, a0, 8 + + ld.d t1, a1, 8 + ld.d t2, a0, 16 + ld.d t3, a1, 16 + ld.d t4, a0, 24 + + ld.d t5, a1, 24 + ld.d t6, a0, 32 + ld.d t7, a1, 32 + addi.d a0, a0, 32 + + addi.d a1, a1, 32 + bne t0, t1, L(out1) + bne t2, t3, L(out2) + bne t4, t5, L(out3) + + bne t6, t7, L(out4) + bne a0, a4, L(al_loop) + +L(al_less_32bytes): + srai.d a4, a2, 4 + beqz a4, L(al_less_16bytes) + + ld.d t0, a0, 8 + ld.d t1, a1, 8 + ld.d t2, a0, 16 + ld.d t3, a1, 16 + + addi.d a0, a0, 16 + addi.d a1, a1, 16 + addi.d a2, a2, -16 + bne t0, t1, L(out1) + + bne t2, t3, L(out2) + +L(al_less_16bytes): + srai.d a4, a2, 3 + beqz a4, L(al_less_8bytes) + ld.d t0, a0, 8 + + ld.d t1, a1, 8 + addi.d a0, a0, 8 + addi.d a1, a1, 8 + addi.d a2, a2, -8 + + bne t0, t1, L(out1) + +L(al_less_8bytes): + beqz a2, L(ret) + ld.d t0, a0, 8 + ld.d t1, a1, 8 + + li.d t7, -1 + slli.d t2, a2, 3 + sll.d t2, t7, t2 + sub.d t3, t0, t1 + + andn t6, t3, t2 + bnez t6, L(count_diff) + +L(ret): + move a0, zero + jr ra + +L(out4): + move t0, t6 + move t1, t7 + sub.d t6, t6, t7 + b L(count_diff) + +L(out3): + move t0, t4 + move t1, t5 + sub.d t6, t4, t5 + b L(count_diff) + 
+L(out2): + move t0, t2 + move t1, t3 +L(out1): + sub.d t6, t0, t1 + b L(count_diff) + +L(first_out): + slli.d t4, a2, 3 + slt t3, a2, t5 + sll.d t4, t7, t4 + maskeqz t4, t4, t3 + + andn t6, t6, t4 + +L(count_diff): + ctz.d t2, t6 + bstrins.d t2, zero, 2, 0 + srl.d t0, t0, t2 + + srl.d t1, t1, t2 + andi t0, t0, 0xff + andi t1, t1, 0xff + sub.d t2, t0, t1 + + sub.d t3, t1, t0 + masknez t2, t2, a5 + maskeqz t3, t3, a5 + or a0, t2, t3 + + jr ra + +L(unaligned): + sub.d a7, zero, a6 + srl.d t0, t2, a6 + blt a2, t8, L(un_less_8bytes) + + andi t1, a2, 31 + beq t1, a2, L(un_less_32bytes) + sub.d t2, a2, t1 + add.d a4, a0, t2 + + move a2, t1 + +L(un_loop): + ld.d t2, a0, 8 + ld.d t1, a1, 8 + ld.d t4, a0, 16 + + ld.d t3, a1, 16 + ld.d t6, a0, 24 + ld.d t5, a1, 24 + ld.d t8, a0, 32 + + ld.d t7, a1, 32 + addi.d a0, a0, 32 + addi.d a1, a1, 32 + sll.d a3, t2, a7 + + or t0, a3, t0 + bne t0, t1, L(out1) + srl.d t0, t2, a6 + sll.d a3, t4, a7 + + or t2, a3, t0 + bne t2, t3, L(out2) + srl.d t0, t4, a6 + sll.d a3, t6, a7 + + or t4, a3, t0 + bne t4, t5, L(out3) + srl.d t0, t6, a6 + sll.d a3, t8, a7 + + or t6, t0, a3 + bne t6, t7, L(out4) + srl.d t0, t8, a6 + bne a0, a4, L(un_loop) + +L(un_less_32bytes): + srai.d a4, a2, 4 + beqz a4, L(un_less_16bytes) + ld.d t2, a0, 8 + ld.d t1, a1, 8 + + ld.d t4, a0, 16 + ld.d t3, a1, 16 + addi.d a0, a0, 16 + addi.d a1, a1, 16 + + addi.d a2, a2, -16 + sll.d a3, t2, a7 + or t0, a3, t0 + bne t0, t1, L(out1) + + srl.d t0, t2, a6 + sll.d a3, t4, a7 + or t2, a3, t0 + bne t2, t3, L(out2) + + srl.d t0, t4, a6 + +L(un_less_16bytes): + srai.d a4, a2, 3 + beqz a4, L(un_less_8bytes) + ld.d t2, a0, 8 + + ld.d t1, a1, 8 + addi.d a0, a0, 8 + addi.d a1, a1, 8 + addi.d a2, a2, -8 + + sll.d a3, t2, a7 + or t0, a3, t0 + bne t0, t1, L(out1) + srl.d t0, t2, a6 + +L(un_less_8bytes): + beqz a2, L(ret) + andi a7, a7, 63 + slli.d a4, a2, 3 + bgeu a7, a4, L(last_cmp) + + ld.d t2, a0, 8 + sll.d a3, t2, a7 + or t0, a3, t0 + +L(last_cmp): + ld.d t1, a1, 8 + + li.d t7, -1 + 
sll.d t2, t7, a4 + sub.d t3, t0, t1 + andn t6, t3, t2 + + bnez t6, L(count_diff) + move a0, zero + jr ra +END(MEMCMP_NAME) + +libc_hidden_builtin_def (MEMCMP_NAME) diff --git a/sysdeps/loongarch/lp64/multiarch/memcmp-lasx.S b/sysdeps/loongarch/lp64/multiarch/memcmp-lasx.S new file mode 100644 index 0000000000..3151a17927 --- /dev/null +++ b/sysdeps/loongarch/lp64/multiarch/memcmp-lasx.S @@ -0,0 +1,207 @@ +/* Optimized memcmp implementation using LoongArch LASX instructions. + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . 
*/ + +#include +#include +#include + +#if IS_IN (libc) && !defined __loongarch_soft_float + +# define MEMCMP __memcmp_lasx + +LEAF(MEMCMP, 6) + li.d t2, 32 + add.d a3, a0, a2 + add.d a4, a1, a2 + bgeu t2, a2, L(less32) + + li.d t1, 160 + bgeu a2, t1, L(make_aligned) +L(loop32): + xvld xr0, a0, 0 + xvld xr1, a1, 0 + + addi.d a0, a0, 32 + addi.d a1, a1, 32 + addi.d a2, a2, -32 + xvseq.b xr2, xr0, xr1 + + xvsetanyeqz.b fcc0, xr2 + bcnez fcc0, L(end) +L(last_bytes): + bltu t2, a2, L(loop32) + xvld xr0, a3, -32 + + + xvld xr1, a4, -32 + xvseq.b xr2, xr0, xr1 +L(end): + xvmsknz.b xr2, xr2 + xvpermi.q xr4, xr0, 1 + + xvpickve.w xr3, xr2, 4 + xvpermi.q xr5, xr1, 1 + vilvl.h vr2, vr3, vr2 + movfr2gr.s t0, fa2 + + cto.w t0, t0 + vreplgr2vr.b vr2, t0 + vshuf.b vr0, vr4, vr0, vr2 + vshuf.b vr1, vr5, vr1, vr2 + + vpickve2gr.bu t0, vr0, 0 + vpickve2gr.bu t1, vr1, 0 + sub.d a0, t0, t1 + jr ra + + +L(less32): + srli.d t0, a2, 4 + beqz t0, L(less16) + vld vr0, a0, 0 + vld vr1, a1, 0 + + vld vr2, a3, -16 + vld vr3, a4, -16 +L(short_ret): + vseq.b vr4, vr0, vr1 + vseq.b vr5, vr2, vr3 + + vmsknz.b vr4, vr4 + vmsknz.b vr5, vr5 + vilvl.h vr4, vr5, vr4 + movfr2gr.s t0, fa4 + + cto.w t0, t0 + vreplgr2vr.b vr4, t0 + vshuf.b vr0, vr2, vr0, vr4 + vshuf.b vr1, vr3, vr1, vr4 + + + vpickve2gr.bu t0, vr0, 0 + vpickve2gr.bu t1, vr1, 0 + sub.d a0, t0, t1 + jr ra + +L(less16): + srli.d t0, a2, 3 + beqz t0, L(less8) + vldrepl.d vr0, a0, 0 + vldrepl.d vr1, a1, 0 + + vldrepl.d vr2, a3, -8 + vldrepl.d vr3, a4, -8 + b L(short_ret) + nop + +L(less8): + srli.d t0, a2, 2 + beqz t0, L(less4) + vldrepl.w vr0, a0, 0 + vldrepl.w vr1, a1, 0 + + + vldrepl.w vr2, a3, -4 + vldrepl.w vr3, a4, -4 + b L(short_ret) + nop + +L(less4): + srli.d t0, a2, 1 + beqz t0, L(less2) + vldrepl.h vr0, a0, 0 + vldrepl.h vr1, a1, 0 + + vldrepl.h vr2, a3, -2 + vldrepl.h vr3, a4, -2 + b L(short_ret) + nop + +L(less2): + beqz a2, L(ret0) + ld.bu t0, a0, 0 + ld.bu t1, a1, 0 + sub.d a0, t0, t1 + + jr ra +L(ret0): + move a0, zero + jr ra 
+ +L(make_aligned): + xvld xr0, a0, 0 + + xvld xr1, a1, 0 + xvseq.b xr2, xr0, xr1 + xvsetanyeqz.b fcc0, xr2 + bcnez fcc0, L(end) + + andi t0, a0, 0x1f + sub.d t0, t2, t0 + sub.d t1, a2, t0 + add.d a0, a0, t0 + + add.d a1, a1, t0 + andi a2, t1, 0x3f + sub.d t0, t1, a2 + add.d a5, a0, t0 + + +L(loop_align): + xvld xr0, a0, 0 + xvld xr1, a1, 0 + xvld xr2, a0, 32 + xvld xr3, a1, 32 + + xvseq.b xr0, xr0, xr1 + xvseq.b xr1, xr2, xr3 + xvmin.bu xr2, xr1, xr0 + xvsetanyeqz.b fcc0, xr2 + + bcnez fcc0, L(pair_end) + addi.d a0, a0, 64 + addi.d a1, a1, 64 + bne a0, a5, L(loop_align) + + bnez a2, L(last_bytes) + move a0, zero + jr ra + nop + + +L(pair_end): + xvmsknz.b xr0, xr0 + xvmsknz.b xr1, xr1 + xvpickve.w xr2, xr0, 4 + xvpickve.w xr3, xr1, 4 + + vilvl.h vr0, vr2, vr0 + vilvl.h vr1, vr3, vr1 + vilvl.w vr0, vr1, vr0 + movfr2gr.d t0, fa0 + + cto.d t0, t0 + ldx.bu t1, a0, t0 + ldx.bu t2, a1, t0 + sub.d a0, t1, t2 + + jr ra +END(MEMCMP) + +libc_hidden_builtin_def (MEMCMP) +#endif diff --git a/sysdeps/loongarch/lp64/multiarch/memcmp-lsx.S b/sysdeps/loongarch/lp64/multiarch/memcmp-lsx.S new file mode 100644 index 0000000000..38a50a4c16 --- /dev/null +++ b/sysdeps/loongarch/lp64/multiarch/memcmp-lsx.S @@ -0,0 +1,269 @@ +/* Optimized memcmp implementation using LoongArch LSX instructions. + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. 
+ + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +#include +#include +#include + +#if IS_IN (libc) && !defined __loongarch_soft_float + +#define MEMCMP __memcmp_lsx + +LEAF(MEMCMP, 6) + beqz a2, L(out) + pcalau12i t0, %pc_hi20(L(INDEX)) + andi a3, a0, 0xf + vld vr5, t0, %pc_lo12(L(INDEX)) + + andi a4, a1, 0xf + bne a3, a4, L(unaligned) + bstrins.d a0, zero, 3, 0 + xor a1, a1, a4 + + vld vr0, a0, 0 + vld vr1, a1, 0 + li.d t0, 16 + vreplgr2vr.b vr3, a3 + + sub.d t1, t0, a3 + vadd.b vr3, vr3, vr5 + vshuf.b vr0, vr3, vr0, vr3 + vshuf.b vr1, vr3, vr1, vr3 + + + vseq.b vr4, vr0, vr1 + bgeu t1, a2, L(al_end) + vsetanyeqz.b fcc0, vr4 + bcnez fcc0, L(al_found) + + sub.d t1, a2, t1 + andi a2, t1, 31 + beq a2, t1, L(al_less_32bytes) + sub.d t2, t1, a2 + + add.d a4, a0, t2 +L(al_loop): + vld vr0, a0, 16 + vld vr1, a1, 16 + vld vr2, a0, 32 + + vld vr3, a1, 32 + addi.d a0, a0, 32 + addi.d a1, a1, 32 + vseq.b vr4, vr0, vr1 + + + vseq.b vr6, vr2, vr3 + vand.v vr6, vr4, vr6 + vsetanyeqz.b fcc0, vr6 + bcnez fcc0, L(al_pair_end) + + bne a0, a4, L(al_loop) +L(al_less_32bytes): + bgeu t0, a2, L(al_less_16bytes) + vld vr0, a0, 16 + vld vr1, a1, 16 + + vld vr2, a0, 32 + vld vr3, a1, 32 + addi.d a2, a2, -16 + vreplgr2vr.b vr6, a2 + + vslt.b vr5, vr5, vr6 + vseq.b vr4, vr0, vr1 + vseq.b vr6, vr2, vr3 + vorn.v vr6, vr6, vr5 + + +L(al_pair_end): + vsetanyeqz.b fcc0, vr4 + bcnez fcc0, L(al_found) + vnori.b vr4, vr6, 0 + vfrstpi.b vr4, vr4, 0 + + vshuf.b vr0, vr2, vr2, vr4 + vshuf.b vr1, vr3, vr3, vr4 + vpickve2gr.bu t0, vr0, 0 + vpickve2gr.bu t1, vr1, 0 + + sub.d a0, t0, t1 + jr ra + nop + nop + +L(al_less_16bytes): + beqz a2, L(out) + vld vr0, a0, 16 + vld vr1, a1, 16 + vseq.b vr4, vr0, vr1 + + +L(al_end): + vreplgr2vr.b vr6, a2 + vslt.b vr5, vr5, vr6 + vorn.v vr4, vr4, vr5 + nop + +L(al_found): + vnori.b vr4, vr4, 0 + vfrstpi.b vr4, vr4, 0 + vshuf.b vr0, vr0, vr0, vr4 + vshuf.b vr1, vr1, vr1, vr4 + + 
vpickve2gr.bu t0, vr0, 0 + vpickve2gr.bu t1, vr1, 0 + sub.d a0, t0, t1 + jr ra + +L(out): + move a0, zero + jr ra + nop + nop + + +L(unaligned): + xor t2, a0, a1 + sltu a5, a3, a4 + masknez t2, t2, a5 + xor a0, a0, t2 + + xor a1, a1, t2 + andi a3, a0, 0xf + andi a4, a1, 0xf + bstrins.d a0, zero, 3, 0 + + xor a1, a1, a4 + vld vr4, a0, 0 + vld vr1, a1, 0 + li.d t0, 16 + + vreplgr2vr.b vr2, a4 + sub.d a6, a4, a3 + sub.d t1, t0, a4 + sub.d t2, t0, a6 + + + vadd.b vr2, vr2, vr5 + vreplgr2vr.b vr6, t2 + vadd.b vr6, vr6, vr5 + vshuf.b vr0, vr4, vr4, vr6 + + vshuf.b vr1, vr2, vr1, vr2 + vshuf.b vr0, vr2, vr0, vr2 + vseq.b vr7, vr0, vr1 + bgeu t1, a2, L(un_end) + + vsetanyeqz.b fcc0, vr7 + bcnez fcc0, L(un_found) + sub.d a2, a2, t1 + andi t1, a2, 31 + + beq a2, t1, L(un_less_32bytes) + sub.d t2, a2, t1 + move a2, t1 + add.d a4, a1, t2 + + +L(un_loop): + vld vr2, a0, 16 + vld vr1, a1, 16 + vld vr3, a1, 32 + addi.d a1, a1, 32 + + addi.d a0, a0, 32 + vshuf.b vr0, vr2, vr4, vr6 + vld vr4, a0, 0 + vseq.b vr7, vr0, vr1 + + vshuf.b vr2, vr4, vr2, vr6 + vseq.b vr8, vr2, vr3 + vand.v vr8, vr7, vr8 + vsetanyeqz.b fcc0, vr8 + + bcnez fcc0, L(un_pair_end) + bne a1, a4, L(un_loop) + +L(un_less_32bytes): + bltu a2, t0, L(un_less_16bytes) + vld vr2, a0, 16 + vld vr1, a1, 16 + addi.d a0, a0, 16 + + addi.d a1, a1, 16 + addi.d a2, a2, -16 + vshuf.b vr0, vr2, vr4, vr6 + vor.v vr4, vr2, vr2 + + vseq.b vr7, vr0, vr1 + vsetanyeqz.b fcc0, vr7 + bcnez fcc0, L(un_found) +L(un_less_16bytes): + beqz a2, L(out) + vld vr1, a1, 16 + bgeu a6, a2, 1f + + vld vr2, a0, 16 +1: + vshuf.b vr0, vr2, vr4, vr6 + vseq.b vr7, vr0, vr1 +L(un_end): + vreplgr2vr.b vr3, a2 + + + vslt.b vr3, vr5, vr3 + vorn.v vr7, vr7, vr3 + +L(un_found): + vnori.b vr7, vr7, 0 + vfrstpi.b vr7, vr7, 0 + + vshuf.b vr0, vr0, vr0, vr7 + vshuf.b vr1, vr1, vr1, vr7 +L(calc_result): + vpickve2gr.bu t0, vr0, 0 + vpickve2gr.bu t1, vr1, 0 + + sub.d t2, t0, t1 + sub.d t3, t1, t0 + masknez t0, t3, a5 + maskeqz t1, t2, a5 + + or a0, t0, t1 + jr ra 
+L(un_pair_end):
+    vsetanyeqz.b    fcc0, vr7
+    bcnez           fcc0, L(un_found)
+
+
+    vnori.b         vr7, vr8, 0
+    vfrstpi.b       vr7, vr7, 0
+    vshuf.b         vr0, vr2, vr2, vr7
+    vshuf.b         vr1, vr3, vr3, vr7
+
+    b               L(calc_result)
+END(MEMCMP)
+
+    .section        .rodata.cst16,"M",@progbits,16
+    .align          4
+L(INDEX):
+    .dword          0x0706050403020100
+    .dword          0x0f0e0d0c0b0a0908
+
+libc_hidden_builtin_def (MEMCMP)
+#endif
diff --git a/sysdeps/loongarch/lp64/multiarch/memcmp.c b/sysdeps/loongarch/lp64/multiarch/memcmp.c
new file mode 100644
index 0000000000..32eccac2a3
--- /dev/null
+++ b/sysdeps/loongarch/lp64/multiarch/memcmp.c
@@ -0,0 +1,43 @@
+/* Multiple versions of memcmp.
+   All versions must be listed in ifunc-impl-list.c.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* Define multiple versions only for the definition in libc.  */
+#if IS_IN (libc)
+# define memcmp __redirect_memcmp
+# include <string.h>
+# undef memcmp
+
+# define SYMBOL_NAME memcmp
+# include "ifunc-memcmp.h"
+
+libc_ifunc_redirected (__redirect_memcmp, memcmp,
+                       IFUNC_SELECTOR ());
+# undef bcmp
+weak_alias (memcmp, bcmp)
+
+# undef __memcmpeq
+strong_alias (memcmp, __memcmpeq)
+libc_hidden_def (__memcmpeq)
+
+# ifdef SHARED
+__hidden_ver1 (memcmp, __GI_memcmp, __redirect_memcmp)
+  __attribute__ ((visibility ("hidden"))) __attribute_copy__ (memcmp);
+# endif
+
+#endif

From patchwork Mon Aug 28 07:26:51 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: dengjianbo
X-Patchwork-Id: 1826629
From: dengjianbo
To: libc-alpha@sourceware.org
Subject: [PATCH 6/6] LoongArch: Change loongarch to LoongArch in comments
Date: Mon, 28 Aug 2023 15:26:51 +0800
Message-Id: <20230828072651.3085034-7-dengjianbo@loongson.cn>
In-Reply-To: <20230828072651.3085034-1-dengjianbo@loongson.cn>
References: <20230828072651.3085034-1-dengjianbo@loongson.cn>
Cc: caiyinyu@loongson.cn, xuchenghua@loongson.cn, huangpei@loongson.cn, dengjianbo

---
 sysdeps/loongarch/lp64/multiarch/memcpy-aligned.S | 2 +-
 sysdeps/loongarch/lp64/multiarch/memcpy-lasx.S | 2 +-
 sysdeps/loongarch/lp64/multiarch/memcpy-lsx.S | 2 +-
 sysdeps/loongarch/lp64/multiarch/memcpy-unaligned.S | 2 +-
 sysdeps/loongarch/lp64/multiarch/memmove-aligned.S | 2 +-
 sysdeps/loongarch/lp64/multiarch/memmove-lasx.S | 2 +-
 sysdeps/loongarch/lp64/multiarch/memmove-lsx.S | 2 +-
 sysdeps/loongarch/lp64/multiarch/memmove-unaligned.S | 2 +-
 sysdeps/loongarch/lp64/multiarch/strchr-aligned.S | 2 +-
 sysdeps/loongarch/lp64/multiarch/strchr-lasx.S | 2 +-
 sysdeps/loongarch/lp64/multiarch/strchr-lsx.S | 2 +-
 sysdeps/loongarch/lp64/multiarch/strchrnul-aligned.S | 2 +-
 sysdeps/loongarch/lp64/multiarch/strchrnul-lasx.S | 2 +-
 sysdeps/loongarch/lp64/multiarch/strchrnul-lsx.S | 2 +-
 sysdeps/loongarch/lp64/multiarch/strcmp-aligned.S | 2 +-
 sysdeps/loongarch/lp64/multiarch/strcmp-lsx.S | 2 +-
 sysdeps/loongarch/lp64/multiarch/strlen-aligned.S | 2 +-
 sysdeps/loongarch/lp64/multiarch/strlen-lasx.S | 2 +-
 sysdeps/loongarch/lp64/multiarch/strlen-lsx.S | 2 +-
 sysdeps/loongarch/lp64/multiarch/strncmp-aligned.S | 2 +-
 sysdeps/loongarch/lp64/multiarch/strncmp-lsx.S | 2 +-
 sysdeps/loongarch/lp64/multiarch/strnlen-aligned.S | 2 +-
 sysdeps/loongarch/lp64/multiarch/strnlen-lasx.S | 2 +-
 sysdeps/loongarch/lp64/multiarch/strnlen-lsx.S | 2 +-
 24 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/sysdeps/loongarch/lp64/multiarch/memcpy-aligned.S b/sysdeps/loongarch/lp64/multiarch/memcpy-aligned.S
index 299dd49ce1..7eb34395cb 100644
--- a/sysdeps/loongarch/lp64/multiarch/memcpy-aligned.S
+++ b/sysdeps/loongarch/lp64/multiarch/memcpy-aligned.S
@@ -1,4 +1,4 @@
-/* Optimized memcpy_aligned implementation using basic Loongarch instructions.
+/* Optimized memcpy_aligned implementation using basic LoongArch instructions.
    Copyright (C) 2023 Free Software Foundation, Inc.
    This file is part of the GNU C Library.

diff --git a/sysdeps/loongarch/lp64/multiarch/memcpy-lasx.S b/sysdeps/loongarch/lp64/multiarch/memcpy-lasx.S
index 4aae5bf831..ae148df5d7 100644
--- a/sysdeps/loongarch/lp64/multiarch/memcpy-lasx.S
+++ b/sysdeps/loongarch/lp64/multiarch/memcpy-lasx.S
@@ -1,4 +1,4 @@
-/* Optimized memcpy implementation using Loongarch LASX instructions.
+/* Optimized memcpy implementation using LoongArch LASX instructions.
    Copyright (C) 2023 Free Software Foundation, Inc.
    This file is part of the GNU C Library.

diff --git a/sysdeps/loongarch/lp64/multiarch/memcpy-lsx.S b/sysdeps/loongarch/lp64/multiarch/memcpy-lsx.S
index 6ebbe7a2c7..feb2bb0e0a 100644
--- a/sysdeps/loongarch/lp64/multiarch/memcpy-lsx.S
+++ b/sysdeps/loongarch/lp64/multiarch/memcpy-lsx.S
@@ -1,4 +1,4 @@
-/* Optimized memcpy implementation using Loongarch LSX instructions.
+/* Optimized memcpy implementation using LoongArch LSX instructions.
    Copyright (C) 2023 Free Software Foundation, Inc.
    This file is part of the GNU C Library.

diff --git a/sysdeps/loongarch/lp64/multiarch/memcpy-unaligned.S b/sysdeps/loongarch/lp64/multiarch/memcpy-unaligned.S
index 8e60a22dfb..31019b138f 100644
--- a/sysdeps/loongarch/lp64/multiarch/memcpy-unaligned.S
+++ b/sysdeps/loongarch/lp64/multiarch/memcpy-unaligned.S
@@ -1,4 +1,4 @@
-/* Optimized unaligned memcpy implementation using basic Loongarch instructions.
+/* Optimized unaligned memcpy implementation using basic LoongArch instructions.
    Copyright (C) 2023 Free Software Foundation, Inc.
    This file is part of the GNU C Library.

diff --git a/sysdeps/loongarch/lp64/multiarch/memmove-aligned.S b/sysdeps/loongarch/lp64/multiarch/memmove-aligned.S
index 5354f38379..a02114c057 100644
--- a/sysdeps/loongarch/lp64/multiarch/memmove-aligned.S
+++ b/sysdeps/loongarch/lp64/multiarch/memmove-aligned.S
@@ -1,4 +1,4 @@
-/* Optimized memmove_aligned implementation using basic Loongarch instructions.
+/* Optimized memmove_aligned implementation using basic LoongArch instructions.
    Copyright (C) 2023 Free Software Foundation, Inc.
    This file is part of the GNU C Library.

diff --git a/sysdeps/loongarch/lp64/multiarch/memmove-lasx.S b/sysdeps/loongarch/lp64/multiarch/memmove-lasx.S
index ff68e7a22b..95d8ee7b93 100644
--- a/sysdeps/loongarch/lp64/multiarch/memmove-lasx.S
+++ b/sysdeps/loongarch/lp64/multiarch/memmove-lasx.S
@@ -1,4 +1,4 @@
-/* Optimized memmove implementation using Loongarch LASX instructions.
+/* Optimized memmove implementation using LoongArch LASX instructions.
    Copyright (C) 2023 Free Software Foundation, Inc.
    This file is part of the GNU C Library.

diff --git a/sysdeps/loongarch/lp64/multiarch/memmove-lsx.S b/sysdeps/loongarch/lp64/multiarch/memmove-lsx.S
index 9e1502a79b..8a9367708d 100644
--- a/sysdeps/loongarch/lp64/multiarch/memmove-lsx.S
+++ b/sysdeps/loongarch/lp64/multiarch/memmove-lsx.S
@@ -1,4 +1,4 @@
-/* Optimized memmove implementation using Loongarch LSX instructions.
+/* Optimized memmove implementation using LoongArch LSX instructions.
    Copyright (C) 2023 Free Software Foundation, Inc.
    This file is part of the GNU C Library.

diff --git a/sysdeps/loongarch/lp64/multiarch/memmove-unaligned.S b/sysdeps/loongarch/lp64/multiarch/memmove-unaligned.S
index 90a64b6bb9..3284ce25fe 100644
--- a/sysdeps/loongarch/lp64/multiarch/memmove-unaligned.S
+++ b/sysdeps/loongarch/lp64/multiarch/memmove-unaligned.S
@@ -1,4 +1,4 @@
-/* Optimized memmove_unaligned implementation using basic Loongarch instructions.
+/* Optimized memmove_unaligned implementation using basic LoongArch instructions.
    Copyright (C) 2023 Free Software Foundation, Inc.
    This file is part of the GNU C Library.

diff --git a/sysdeps/loongarch/lp64/multiarch/strchr-aligned.S b/sysdeps/loongarch/lp64/multiarch/strchr-aligned.S
index 5fb01806e4..620200545b 100644
--- a/sysdeps/loongarch/lp64/multiarch/strchr-aligned.S
+++ b/sysdeps/loongarch/lp64/multiarch/strchr-aligned.S
@@ -1,4 +1,4 @@
-/* Optimized strchr implementation using basic Loongarch instructions.
+/* Optimized strchr implementation using basic LoongArch instructions.
    Copyright (C) 2023 Free Software Foundation, Inc.
    This file is part of the GNU C Library.

diff --git a/sysdeps/loongarch/lp64/multiarch/strchr-lasx.S b/sysdeps/loongarch/lp64/multiarch/strchr-lasx.S
index 254402daa5..4d3cc58845 100644
--- a/sysdeps/loongarch/lp64/multiarch/strchr-lasx.S
+++ b/sysdeps/loongarch/lp64/multiarch/strchr-lasx.S
@@ -1,4 +1,4 @@
-/* Optimized strchr implementation using loongarch LASX SIMD instructions.
+/* Optimized strchr implementation using LoongArch LASX instructions.
    Copyright (C) 2023 Free Software Foundation, Inc.
    This file is part of the GNU C Library.

diff --git a/sysdeps/loongarch/lp64/multiarch/strchr-lsx.S b/sysdeps/loongarch/lp64/multiarch/strchr-lsx.S
index dae98b0a55..8b78c35c20 100644
--- a/sysdeps/loongarch/lp64/multiarch/strchr-lsx.S
+++ b/sysdeps/loongarch/lp64/multiarch/strchr-lsx.S
@@ -1,4 +1,4 @@
-/* Optimized strlen implementation using loongarch LSX SIMD instructions.
+/* Optimized strlen implementation using LoongArch LSX instructions.
    Copyright (C) 2023 Free Software Foundation, Inc.
    This file is part of the GNU C Library.

diff --git a/sysdeps/loongarch/lp64/multiarch/strchrnul-aligned.S b/sysdeps/loongarch/lp64/multiarch/strchrnul-aligned.S
index 1c01a0232d..20856a06a0 100644
--- a/sysdeps/loongarch/lp64/multiarch/strchrnul-aligned.S
+++ b/sysdeps/loongarch/lp64/multiarch/strchrnul-aligned.S
@@ -1,4 +1,4 @@
-/* Optimized strchrnul implementation using basic Loongarch instructions.
+/* Optimized strchrnul implementation using basic LoongArch instructions.
    Copyright (C) 2023 Free Software Foundation, Inc.
    This file is part of the GNU C Library.

diff --git a/sysdeps/loongarch/lp64/multiarch/strchrnul-lasx.S b/sysdeps/loongarch/lp64/multiarch/strchrnul-lasx.S
index d45495e48f..4753d4ced5 100644
--- a/sysdeps/loongarch/lp64/multiarch/strchrnul-lasx.S
+++ b/sysdeps/loongarch/lp64/multiarch/strchrnul-lasx.S
@@ -1,4 +1,4 @@
-/* Optimized strchrnul implementation using loongarch LASX SIMD instructions.
+/* Optimized strchrnul implementation using LoongArch LASX instructions.
    Copyright (C) 2023 Free Software Foundation, Inc.
    This file is part of the GNU C Library.

diff --git a/sysdeps/loongarch/lp64/multiarch/strchrnul-lsx.S b/sysdeps/loongarch/lp64/multiarch/strchrnul-lsx.S
index 07d793ae5f..671e740c03 100644
--- a/sysdeps/loongarch/lp64/multiarch/strchrnul-lsx.S
+++ b/sysdeps/loongarch/lp64/multiarch/strchrnul-lsx.S
@@ -1,4 +1,4 @@
-/* Optimized strchrnul implementation using loongarch LSX SIMD instructions.
+/* Optimized strchrnul implementation using LoongArch LSX instructions.
    Copyright (C) 2023 Free Software Foundation, Inc.
    This file is part of the GNU C Library.

diff --git a/sysdeps/loongarch/lp64/multiarch/strcmp-aligned.S b/sysdeps/loongarch/lp64/multiarch/strcmp-aligned.S
index f5f4f3364e..ba1f9667e0 100644
--- a/sysdeps/loongarch/lp64/multiarch/strcmp-aligned.S
+++ b/sysdeps/loongarch/lp64/multiarch/strcmp-aligned.S
@@ -1,4 +1,4 @@
-/* Optimized strcmp implementation using basic Loongarch instructions.
+/* Optimized strcmp implementation using basic LoongArch instructions.
    Copyright (C) 2023 Free Software Foundation, Inc.
    This file is part of the GNU C Library.

diff --git a/sysdeps/loongarch/lp64/multiarch/strcmp-lsx.S b/sysdeps/loongarch/lp64/multiarch/strcmp-lsx.S
index 2e177a3872..091c8c9ebd 100644
--- a/sysdeps/loongarch/lp64/multiarch/strcmp-lsx.S
+++ b/sysdeps/loongarch/lp64/multiarch/strcmp-lsx.S
@@ -1,4 +1,4 @@
-/* Optimized strcmp implementation using Loongarch LSX instructions.
+/* Optimized strcmp implementation using LoongArch LSX instructions.
    Copyright (C) 2023 Free Software Foundation, Inc.
    This file is part of the GNU C Library.

diff --git a/sysdeps/loongarch/lp64/multiarch/strlen-aligned.S b/sysdeps/loongarch/lp64/multiarch/strlen-aligned.S
index e9e1d2fc04..ed0548e46b 100644
--- a/sysdeps/loongarch/lp64/multiarch/strlen-aligned.S
+++ b/sysdeps/loongarch/lp64/multiarch/strlen-aligned.S
@@ -1,4 +1,4 @@
-/* Optimized strlen implementation using basic Loongarch instructions.
+/* Optimized strlen implementation using basic LoongArch instructions.
    Copyright (C) 2023 Free Software Foundation, Inc.
    This file is part of the GNU C Library.

diff --git a/sysdeps/loongarch/lp64/multiarch/strlen-lasx.S b/sysdeps/loongarch/lp64/multiarch/strlen-lasx.S
index 258c47cea0..91342f3415 100644
--- a/sysdeps/loongarch/lp64/multiarch/strlen-lasx.S
+++ b/sysdeps/loongarch/lp64/multiarch/strlen-lasx.S
@@ -1,4 +1,4 @@
-/* Optimized strlen implementation using loongarch LASX SIMD instructions.
+/* Optimized strlen implementation using LoongArch LASX instructions.
    Copyright (C) 2023 Free Software Foundation, Inc.
    This file is part of the GNU C Library.

diff --git a/sysdeps/loongarch/lp64/multiarch/strlen-lsx.S b/sysdeps/loongarch/lp64/multiarch/strlen-lsx.S
index b194355e7b..b09c12e00b 100644
--- a/sysdeps/loongarch/lp64/multiarch/strlen-lsx.S
+++ b/sysdeps/loongarch/lp64/multiarch/strlen-lsx.S
@@ -1,4 +1,4 @@
-/* Optimized strlen implementation using Loongarch LSX SIMD instructions.
+/* Optimized strlen implementation using LoongArch LSX instructions.
    Copyright (C) 2023 Free Software Foundation, Inc.
    This file is part of the GNU C Library.

diff --git a/sysdeps/loongarch/lp64/multiarch/strncmp-aligned.S b/sysdeps/loongarch/lp64/multiarch/strncmp-aligned.S
index e2687fa770..f63de872a7 100644
--- a/sysdeps/loongarch/lp64/multiarch/strncmp-aligned.S
+++ b/sysdeps/loongarch/lp64/multiarch/strncmp-aligned.S
@@ -1,4 +1,4 @@
-/* Optimized strncmp implementation using basic Loongarch instructions.
+/* Optimized strncmp implementation using basic LoongArch instructions.
    Copyright (C) 2023 Free Software Foundation, Inc.
    This file is part of the GNU C Library.

diff --git a/sysdeps/loongarch/lp64/multiarch/strncmp-lsx.S b/sysdeps/loongarch/lp64/multiarch/strncmp-lsx.S
index 0b4eee2a98..83cb801d5d 100644
--- a/sysdeps/loongarch/lp64/multiarch/strncmp-lsx.S
+++ b/sysdeps/loongarch/lp64/multiarch/strncmp-lsx.S
@@ -1,4 +1,4 @@
-/* Optimized strncmp implementation using Loongarch LSX instructions.
+/* Optimized strncmp implementation using LoongArch LSX instructions.
    Copyright (C) 2023 Free Software Foundation, Inc.
    This file is part of the GNU C Library.

diff --git a/sysdeps/loongarch/lp64/multiarch/strnlen-aligned.S b/sysdeps/loongarch/lp64/multiarch/strnlen-aligned.S
index b900430a5d..a8296a1b21 100644
--- a/sysdeps/loongarch/lp64/multiarch/strnlen-aligned.S
+++ b/sysdeps/loongarch/lp64/multiarch/strnlen-aligned.S
@@ -1,4 +1,4 @@
-/* Optimized strnlen implementation using basic Loongarch instructions.
+/* Optimized strnlen implementation using basic LoongArch instructions.
    Copyright (C) 2023 Free Software Foundation, Inc.
    This file is part of the GNU C Library.

diff --git a/sysdeps/loongarch/lp64/multiarch/strnlen-lasx.S b/sysdeps/loongarch/lp64/multiarch/strnlen-lasx.S
index 2c03d3d9b4..aa6c812d30 100644
--- a/sysdeps/loongarch/lp64/multiarch/strnlen-lasx.S
+++ b/sysdeps/loongarch/lp64/multiarch/strnlen-lasx.S
@@ -1,4 +1,4 @@
-/* Optimized strnlen implementation using loongarch LASX instructions
+/* Optimized strnlen implementation using LoongArch LASX instructions
    Copyright (C) 2023 Free Software Foundation, Inc.
    This file is part of the GNU C Library.

diff --git a/sysdeps/loongarch/lp64/multiarch/strnlen-lsx.S b/sysdeps/loongarch/lp64/multiarch/strnlen-lsx.S
index b769a89584..d0febe3eb0 100644
--- a/sysdeps/loongarch/lp64/multiarch/strnlen-lsx.S
+++ b/sysdeps/loongarch/lp64/multiarch/strnlen-lsx.S
@@ -1,4 +1,4 @@
-/* Optimized strnlen implementation using loongarch LSX instructions
+/* Optimized strnlen implementation using LoongArch LSX instructions
    Copyright (C) 2023 Free Software Foundation, Inc.
    This file is part of the GNU C Library.