From patchwork Wed Dec 20 07:20:12 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Palmer Dabbelt
X-Patchwork-Id: 851279
Subject: [PATCH v2 05/15] RISC-V: Generic Routines
Date: Tue, 19 Dec 2017 23:20:12 -0800
Message-Id: <20171220072022.26909-6-palmer@dabbelt.com>
In-Reply-To: <20171220072022.26909-1-palmer@dabbelt.com>
References: <20171220072022.26909-1-palmer@dabbelt.com>
Cc: Andrew Waterman, Darius Rad, dj@redhat.com, Palmer Dabbelt
From: Palmer Dabbelt
To: libc-alpha@sourceware.org

This patch contains fast versions of the various routines from string.h
that have been implemented for RISC-V.
Since RISC-V doesn't define any specific performance characteristics,
they're not optimized for any particular microarchitecture, but are
designed to be generally good.
---
 sysdeps/riscv/detect_null_byte.h |  31 ++++++++
 sysdeps/riscv/memcpy.c           |  92 ++++++++++++++++++++++
 sysdeps/riscv/memset.S           | 107 ++++++++++++++++++++++++++
 sysdeps/riscv/strcmp.S           | 159 +++++++++++++++++++++++++++++++++++++++
 sysdeps/riscv/strcpy.c           |  73 ++++++++++++++++++
 sysdeps/riscv/strlen.c           |  58 ++++++++++++++
 6 files changed, 520 insertions(+)
 create mode 100644 sysdeps/riscv/detect_null_byte.h
 create mode 100644 sysdeps/riscv/memcpy.c
 create mode 100644 sysdeps/riscv/memset.S
 create mode 100644 sysdeps/riscv/strcmp.S
 create mode 100644 sysdeps/riscv/strcpy.c
 create mode 100644 sysdeps/riscv/strlen.c

diff --git a/sysdeps/riscv/detect_null_byte.h b/sysdeps/riscv/detect_null_byte.h
new file mode 100644
index 000000000000..a888b5d25bf3
--- /dev/null
+++ b/sysdeps/riscv/detect_null_byte.h
@@ -0,0 +1,31 @@
+/* RISC-V null byte detection
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _RISCV_DETECT_NULL_BYTE_H
+#define _RISCV_DETECT_NULL_BYTE_H 1
+
+static __inline__ unsigned long detect_null_byte (unsigned long w)
+{
+  unsigned long mask = 0x7f7f7f7f;
+  if (sizeof (long) == 8)
+    mask = ((mask << 16) << 16) | mask;
+  return ~(((w & mask) + mask) | w | mask);
+}
+
+#endif /* detect_null_byte.h */
diff --git a/sysdeps/riscv/memcpy.c b/sysdeps/riscv/memcpy.c
new file mode 100644
index 000000000000..8be924ac1b56
--- /dev/null
+++ b/sysdeps/riscv/memcpy.c
@@ -0,0 +1,92 @@
+/* Optimized memory copy implementation for RISC-V.
+   Copyright (C) 2011-2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <string.h>
+#include <stdint.h>
+
+#define MEMCPY_LOOP_BODY(a, b, t) { \
+    t tt = *b; \
+    a++, b++; \
+    *(a - 1) = tt; \
+  }
+
+void *__memcpy(void *aa, const void *bb, size_t n)
+{
+  uintptr_t msk = sizeof(long) - 1;
+  char *a = (char *)aa, *end = a + n;
+  const char *b = (const char *)bb;
+  long *la, *lend;
+  const long *lb;
+  int same_alignment = ((uintptr_t)a & msk) == ((uintptr_t)b & msk);
+
+  /* Handle small cases, and those without mutual alignment.  */
+  if (__glibc_unlikely(!same_alignment || n < sizeof(long)))
+    {
+small:
+      while (a < end)
+        MEMCPY_LOOP_BODY(a, b, char);
+      return aa;
+    }
+
+  /* Obtain alignment.  */
+  if (__glibc_unlikely(((uintptr_t)a & msk) != 0))
+    while ((uintptr_t)a & msk)
+      MEMCPY_LOOP_BODY(a, b, char);
+
+  la = (long *)a;
+  lb = (const long *)b;
+  lend = (long *)((uintptr_t)end & ~msk);
+
+  /* Handle large, aligned cases.  */
+  if (__glibc_unlikely(la < lend - 8))
+    while (la < lend - 8)
+      {
+        long b0 = *lb++;
+        long b1 = *lb++;
+        long b2 = *lb++;
+        long b3 = *lb++;
+        long b4 = *lb++;
+        long b5 = *lb++;
+        long b6 = *lb++;
+        long b7 = *lb++;
+        long b8 = *lb++;
+        *la++ = b0;
+        *la++ = b1;
+        *la++ = b2;
+        *la++ = b3;
+        *la++ = b4;
+        *la++ = b5;
+        *la++ = b6;
+        *la++ = b7;
+        *la++ = b8;
+      }
+
+  /* Handle aligned, small case.  */
+  while (la < lend)
+    MEMCPY_LOOP_BODY(la, lb, long);
+
+  /* Handle misaligned remainder.  */
+  a = (char *)la;
+  b = (const char *)lb;
+  if (__glibc_unlikely(a < end))
+    goto small;
+
+  return aa;
+}
+weak_alias (__memcpy, memcpy)
+libc_hidden_builtin_def (memcpy)
diff --git a/sysdeps/riscv/memset.S b/sysdeps/riscv/memset.S
new file mode 100644
index 000000000000..b06eb8312ed1
--- /dev/null
+++ b/sysdeps/riscv/memset.S
@@ -0,0 +1,107 @@
+/* Optimized memset implementation for RISC-V.
+   Copyright (C) 2011-2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <sys/asm.h>
+
+ENTRY(memset)
+  li a6, 15
+  mv a4, a0
+  bleu a2, a6, .Ltiny
+  and a5, a4, 15
+  bnez a5, .Lmisaligned
+
+.Laligned:
+  bnez a1, .Lwordify
+
+.Lwordified:
+  and a3, a2, ~15
+  and a2, a2, 15
+  add a3, a3, a4
+
+#if __riscv_xlen == 64
+1:sd a1, 0(a4)
+  sd a1, 8(a4)
+#else
+1:sw a1, 0(a4)
+  sw a1, 4(a4)
+  sw a1, 8(a4)
+  sw a1, 12(a4)
+#endif
+  add a4, a4, 16
+  bltu a4, a3, 1b
+
+  bnez a2, .Ltiny
+  ret
+
+.Ltiny:
+  sub a3, a6, a2
+  sll a3, a3, 2
+1:auipc t0, %pcrel_hi(.Ltable)
+  add a3, a3, t0
+.option push
+.option norvc
+.Ltable_misaligned:
+  jr a3, %pcrel_lo(1b)
+.Ltable:
+  sb a1,14(a4)
+  sb a1,13(a4)
+  sb a1,12(a4)
+  sb a1,11(a4)
+  sb a1,10(a4)
+  sb a1, 9(a4)
+  sb a1, 8(a4)
+  sb a1, 7(a4)
+  sb a1, 6(a4)
+  sb a1, 5(a4)
+  sb a1, 4(a4)
+  sb a1, 3(a4)
+  sb a1, 2(a4)
+  sb a1, 1(a4)
+  sb a1, 0(a4)
+.option pop
+  ret
+
+.Lwordify:
+  and a1, a1, 0xFF
+  sll a3, a1, 8
+  or a1, a1, a3
+  sll a3, a1, 16
+  or a1, a1, a3
+#if __riscv_xlen == 64
+  sll a3, a1, 32
+  or a1, a1, a3
+#endif
+  j .Lwordified
+
+.Lmisaligned:
+  sll a3, a5, 2
+1:auipc t0, %pcrel_hi(.Ltable_misaligned)
+  add a3, a3, t0
+  mv t0, ra
+  jalr a3, %pcrel_lo(1b)
+  mv ra, t0
+
+  add a5, a5, -16
+  sub a4, a4, a5
+  add a2, a2, a5
+  bleu a2, a6, .Ltiny
+  j .Laligned
+END(memset)
+
+weak_alias(memset, __GI_memset)
diff --git a/sysdeps/riscv/strcmp.S b/sysdeps/riscv/strcmp.S
new file mode 100644
index 000000000000..edc38513cc21
--- /dev/null
+++ b/sysdeps/riscv/strcmp.S
@@ -0,0 +1,159 @@
+/* Optimized string compare implementation for RISC-V.
+   Copyright (C) 2011-2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <sys/asm.h>
+
+#if __BYTE_ORDER__ != __ORDER_LITTLE_ENDIAN__
+# error
+#endif
+
+ENTRY(strcmp)
+  or a4, a0, a1
+  li t2, -1
+  and a4, a4, SZREG-1
+  bnez a4, .Lmisaligned
+
+#if SZREG == 4
+  li t3, 0x7f7f7f7f
+#else
+  ld t3, mask
+#endif
+
+  .macro check_one_word i n
+    REG_L a2, \i*SZREG(a0)
+    REG_L a3, \i*SZREG(a1)
+
+    and t0, a2, t3
+    or t1, a2, t3
+    add t0, t0, t3
+    or t0, t0, t1
+
+    bne t0, t2, .Lnull\i
+    .if \i+1-\n
+      bne a2, a3, .Lmismatch
+    .else
+      add a0, a0, \n*SZREG
+      add a1, a1, \n*SZREG
+      beq a2, a3, .Lloop
+      # fall through to .Lmismatch
+    .endif
+  .endm
+
+  .macro foundnull i n
+    .ifne \i
+      .Lnull\i:
+      add a0, a0, \i*SZREG
+      add a1, a1, \i*SZREG
+      .ifeq \i-1
+        .Lnull0:
+      .endif
+      bne a2, a3, .Lmisaligned
+      li a0, 0
+      ret
+    .endif
+  .endm
+
+.Lloop:
+  # examine full words at a time, favoring strings of a couple dozen chars
+#if __riscv_xlen == 32
+  check_one_word 0 5
+  check_one_word 1 5
+  check_one_word 2 5
+  check_one_word 3 5
+  check_one_word 4 5
+#else
+  check_one_word 0 3
+  check_one_word 1 3
+  check_one_word 2 3
+#endif
+  # backwards branch to .Lloop contained above
+
+.Lmismatch:
+  # words don't match, but a2 has no null byte.
+#if __riscv_xlen == 64
+  sll a4, a2, 48
+  sll a5, a3, 48
+  bne a4, a5, .Lmismatch_upper
+  sll a4, a2, 32
+  sll a5, a3, 32
+  bne a4, a5, .Lmismatch_upper
+#endif
+  sll a4, a2, 16
+  sll a5, a3, 16
+  bne a4, a5, .Lmismatch_upper
+
+  srl a4, a2, 8*SZREG-16
+  srl a5, a3, 8*SZREG-16
+  sub a0, a4, a5
+  and a1, a0, 0xff
+  bnez a1, 1f
+  ret
+
+.Lmismatch_upper:
+  srl a4, a4, 8*SZREG-16
+  srl a5, a5, 8*SZREG-16
+  sub a0, a4, a5
+  and a1, a0, 0xff
+  bnez a1, 1f
+  ret
+
+1:and a4, a4, 0xff
+  and a5, a5, 0xff
+  sub a0, a4, a5
+  ret
+
+.Lmisaligned:
+  # misaligned
+  lbu a2, 0(a0)
+  lbu a3, 0(a1)
+  add a0, a0, 1
+  add a1, a1, 1
+  bne a2, a3, 1f
+  bnez a2, .Lmisaligned
+
+1:
+  sub a0, a2, a3
+  ret
+
+  # cases in which a null byte was detected
+#if __riscv_xlen == 32
+  foundnull 0 5
+  foundnull 1 5
+  foundnull 2 5
+  foundnull 3 5
+  foundnull 4 5
+#else
+  foundnull 0 3
+  foundnull 1 3
+  foundnull 2 3
+#endif
+
+END(strcmp)
+
+weak_alias(strcmp, __GI_strcmp)
+
+#if SZREG == 8
+#ifdef __PIC__
+.section .rodata.cst8,"aM",@progbits,8
+#else
+.section .srodata.cst8,"aM",@progbits,8
+#endif
+.align 3
+mask: .8byte 0x7f7f7f7f7f7f7f7f
+#endif
diff --git a/sysdeps/riscv/strcpy.c b/sysdeps/riscv/strcpy.c
new file mode 100644
index 000000000000..9382fb145db9
--- /dev/null
+++ b/sysdeps/riscv/strcpy.c
@@ -0,0 +1,73 @@
+/* Optimized string copy implementation for RISC-V.
+   Copyright (C) 2011-2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <string.h>
+#include <stdint.h>
+#include "detect_null_byte.h"
+
+#undef strcpy
+
+char* strcpy(char* dst, const char* src)
+{
+  char* dst0 = dst;
+
+  int misaligned = ((uintptr_t)dst | (uintptr_t)src) & (sizeof(long)-1);
+  if (__builtin_expect(!misaligned, 1))
+  {
+    long* ldst = (long*)dst;
+    const long* lsrc = (const long*)src;
+
+    while (!detect_null_byte(*lsrc))
+      *ldst++ = *lsrc++;
+
+    dst = (char*)ldst;
+    src = (const char*)lsrc;
+
+    char c0 = src[0];
+    char c1 = src[1];
+    char c2 = src[2];
+    if (!(*dst++ = c0)) return dst0;
+    if (!(*dst++ = c1)) return dst0;
+    char c3 = src[3];
+    if (!(*dst++ = c2)) return dst0;
+    if (sizeof(long) == 4) goto out;
+    char c4 = src[4];
+    if (!(*dst++ = c3)) return dst0;
+    char c5 = src[5];
+    if (!(*dst++ = c4)) return dst0;
+    char c6 = src[6];
+    if (!(*dst++ = c5)) return dst0;
+    if (!(*dst++ = c6)) return dst0;
+
+out:
+    *dst++ = 0;
+    return dst0;
+  }
+
+  char ch;
+  do
+  {
+    ch = *src;
+    src++;
+    dst++;
+    *(dst-1) = ch;
+  } while(ch);
+
+  return dst0;
+}
+libc_hidden_def(strcpy)
diff --git a/sysdeps/riscv/strlen.c b/sysdeps/riscv/strlen.c
new file mode 100644
index 000000000000..86a76947c3f3
--- /dev/null
+++ b/sysdeps/riscv/strlen.c
@@ -0,0 +1,58 @@
+/* Determine the length of a string.  RISC-V version.
+   Copyright (C) 2011-2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <string.h>
+#include <stdint.h>
+#include "detect_null_byte.h"
+
+#undef strlen
+
+size_t strlen(const char* str)
+{
+  const char* start = str;
+
+  if (__builtin_expect((uintptr_t)str & (sizeof(long)-1), 0)) do
+  {
+    char ch = *str;
+    str++;
+    if (!ch)
+      return str - start - 1;
+  } while ((uintptr_t)str & (sizeof(long)-1));
+
+  unsigned long* ls = (unsigned long*)str;
+  while (!detect_null_byte(*ls++))
+    ;
+  asm volatile ("" : "+r"(ls)); /* prevent "optimization" */
+
+  str = (const char*)ls;
+  size_t ret = str - start, sl = sizeof(long);
+
+  char c0 = str[0-sl], c1 = str[1-sl], c2 = str[2-sl], c3 = str[3-sl];
+  if (c0 == 0) return ret + 0 - sl;
+  if (c1 == 0) return ret + 1 - sl;
+  if (c2 == 0) return ret + 2 - sl;
+  if (sl == 4 || c3 == 0) return ret + 3 - sl;
+
+  c0 = str[4-sl], c1 = str[5-sl], c2 = str[6-sl], c3 = str[7-sl];
+  if (c0 == 0) return ret + 4 - sl;
+  if (c1 == 0) return ret + 5 - sl;
+  if (c2 == 0) return ret + 6 - sl;
+
+  return ret + 7 - sl;
+}
+libc_hidden_def(strlen)
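
Note on detect_null_byte.h: the patch does not explain the expression,
so here is a reading of it.  Within each byte, (w & mask) + mask sets
bit 7 exactly when one of the low seven bits is set, and the sum never
carries into the next byte.  OR-ing in w and mask then leaves every
non-zero byte as 0xff and every zero byte as 0x7f, so the final
complement is 0x80 in each zero byte and 0 everywhere else.  The sketch
below restates this for a fixed 64-bit word and compares it against a
byte-at-a-time loop.  The names detect_null_byte64, has_null_byte_ref
and check_detect_null_byte are made up for this illustration and are
not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Same expression as detect_null_byte() in the patch, restated for a
   fixed 64-bit word so the sketch does not depend on sizeof (long).  */
static uint64_t
detect_null_byte64 (uint64_t w)
{
  const uint64_t mask = UINT64_C (0x7f7f7f7f7f7f7f7f);
  return ~(((w & mask) + mask) | w | mask);
}

/* Hypothetical reference: inspect the eight bytes one at a time.  */
static int
has_null_byte_ref (uint64_t w)
{
  for (int i = 0; i < 8; i++)
    if (((w >> (8 * i)) & 0xff) == 0)
      return 1;
  return 0;
}

/* Compare the bit trick with the byte-wise reference.  */
void
check_detect_null_byte (void)
{
  const uint64_t all_a = UINT64_C (0x6161616161616161);

  /* No zero byte: the result is zero.  */
  assert (detect_null_byte64 (all_a) == 0);

  /* One zero byte at position k: only bit 7 of that byte is set.  */
  for (int k = 0; k < 8; k++)
    {
      uint64_t w = all_a & ~(UINT64_C (0xff) << (8 * k));
      assert (detect_null_byte64 (w) == UINT64_C (0x80) << (8 * k));
    }

  /* Assorted bit patterns: agreement with the reference loop.  */
  for (uint64_t i = 0; i < 100000; i++)
    {
      uint64_t w = i * UINT64_C (0x9e3779b97f4a7c15);
      assert ((detect_null_byte64 (w) != 0) == has_null_byte_ref (w));
    }
}
```

Calling check_detect_null_byte () from a small test program exercises
all three cases.  Because the marker bit lands in the zero byte itself,
the strcpy and strlen code in the patch only needs a zero/non-zero test
on the word and can then locate the terminator with byte loads.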
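
Note on memset.S: the .Lwordify and .Lwordified sequences may be easier
to follow in C.  The sketch below is an assumed C rendering of the RV64
path only: it broadcasts the fill byte across a 64-bit register with
shift/or pairs, then splits the length into whole 16-byte chunks for the
store loop and a 0..15 byte tail for the .Ltiny jump table.  The
function names wordify64, split_length and check_wordify are invented
for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Mirror of .Lwordify: keep the low byte, then double the number of
   copies with each shift/or pair (1 -> 2 -> 4 -> 8 bytes).  */
static uint64_t
wordify64 (int c)
{
  uint64_t v = (uint64_t) c & 0xff;   /* and a1, a1, 0xFF */
  v |= v << 8;                        /* sll a3, a1, 8;  or a1, a1, a3 */
  v |= v << 16;                       /* sll a3, a1, 16; or a1, a1, a3 */
  v |= v << 32;                       /* RV64 only: sll a3, a1, 32; or */
  return v;
}

/* Mirror of the split at .Lwordified: whole 16-byte chunks go to the
   store loop, the remaining 0..15 bytes go to the .Ltiny table.  */
static void
split_length (uint64_t n, uint64_t *chunk_bytes, uint64_t *tail_bytes)
{
  *chunk_bytes = n & ~UINT64_C (15);  /* and a3, a2, ~15 */
  *tail_bytes = n & 15;               /* and a2, a2, 15 */
}

/* A few spot checks of both helpers.  */
void
check_wordify (void)
{
  uint64_t chunks, tail;

  assert (wordify64 (0xab) == UINT64_C (0xabababababababab));
  /* Only the low byte of the int argument is used, as in memset.  */
  assert (wordify64 (0x1ab) == UINT64_C (0xabababababababab));
  assert (wordify64 (0) == 0);

  split_length (37, &chunks, &tail);
  assert (chunks == 32 && tail == 5);
}
```

In the assembly the broadcast is skipped when the fill byte is zero
(bnez a1, .Lwordify); the sketch omits that shortcut because it gives
the same value either way.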
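
Note on memcpy.c: the control flow divides a copy into a byte-wise
head, a run of whole words, and a byte-wise tail, and falls back to a
pure byte loop when source and destination are not mutually aligned or
the copy is shorter than one word.  The sketch below only computes that
division, without copying anything, under the assumption that the word
size wsz is a power of two.  struct copy_plan, plan_copy and
check_plan_copy are hypothetical names used for illustration; they do
not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of how a copy of n bytes would be divided.  */
struct copy_plan
{
  int bytewise;   /* 1 if the whole copy takes the byte loop */
  size_t head;    /* bytes copied singly to reach alignment */
  size_t words;   /* whole words copied by the aligned loops */
  size_t tail;    /* bytes copied singly after the last whole word */
};

static struct copy_plan
plan_copy (uintptr_t dst, uintptr_t src, size_t n, size_t wsz)
{
  uintptr_t msk = wsz - 1;
  struct copy_plan p = { 0, 0, 0, 0 };

  /* Small copies and copies without mutual alignment: bytes only.  */
  if ((dst & msk) != (src & msk) || n < wsz)
    {
      p.bytewise = 1;
      p.tail = n;
      return p;
    }

  uintptr_t end = dst + n;
  uintptr_t lend = end & ~msk;                 /* last word boundary */
  p.head = (size_t) ((0 - dst) & msk);         /* "Obtain alignment" */
  p.words = (size_t) ((lend - (dst + p.head)) / wsz);
  p.tail = (size_t) (end - lend);              /* misaligned remainder */
  return p;
}

/* Spot checks with an 8-byte word.  */
void
check_plan_copy (void)
{
  struct copy_plan p = plan_copy (0x1003, 0x2003, 40, 8);
  assert (!p.bytewise && p.head == 5 && p.words == 4 && p.tail == 3);

  p = plan_copy (0x1000, 0x2001, 40, 8);
  assert (p.bytewise && p.tail == 40);
}
```

In the patch, the whole words are further divided between the
nine-word unrolled loop and the single-word loop; the sketch counts
them together.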