From patchwork Tue Jul 12 19:29:05 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Noah Goldstein <goldstein.w.n@gmail.com>
X-Patchwork-Id: 1655616
Return-Path: <libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: bilbo.ozlabs.org;
	dkim=pass (1024-bit key;
 secure) header.d=sourceware.org header.i=@sourceware.org header.a=rsa-sha256
 header.s=default header.b=k+SvUQj9;
	dkim-atps=neutral
Authentication-Results: ozlabs.org;
 spf=pass (sender SPF authorized) smtp.mailfrom=sourceware.org
 (client-ip=2620:52:3:1:0:246e:9693:128c; helo=sourceware.org;
 envelope-from=libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org;
 receiver=<UNKNOWN>)
Received: from sourceware.org (server2.sourceware.org
 [IPv6:2620:52:3:1:0:246e:9693:128c])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest
 SHA256)
	(No client certificate requested)
	by bilbo.ozlabs.org (Postfix) with ESMTPS id 4Lj9np2HjPz9s09
	for <incoming@patchwork.ozlabs.org>; Wed, 13 Jul 2022 05:30:22 +1000 (AEST)
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 412BC388CE8E
	for <incoming@patchwork.ozlabs.org>; Tue, 12 Jul 2022 19:30:20 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 412BC388CE8E
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org;
	s=default; t=1657654220;
	bh=a4sd8sfvJf22s1ZMHNaFAZ08hQ/FA+7Ezh8I+3QuoyQ=;
	h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe:
	 List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:
	 From;
	b=k+SvUQj9xWlklqog9QcnVSLlIlaesgMEzXyRyVTZR1k2Q5W8HB323RLYqK31gAUEQ
	 0dC8JjzNyJe0RE4QapBgzkLXWmT+Ej+CPcTZzdn7JJWnHsejPZZU77MXzdVgPAhmW4
	 Xk8qaocaqxphYlW+JD3lcT2Ml2uEC9q82O3ClrwY=
X-Original-To: libc-alpha@sourceware.org
Delivered-To: libc-alpha@sourceware.org
Received: from mail-pj1-x1031.google.com (mail-pj1-x1031.google.com
 [IPv6:2607:f8b0:4864:20::1031])
 by sourceware.org (Postfix) with ESMTPS id E7C9938768A3
 for <libc-alpha@sourceware.org>; Tue, 12 Jul 2022 19:29:18 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org E7C9938768A3
Received: by mail-pj1-x1031.google.com with SMTP id
 o5-20020a17090a3d4500b001ef76490983so9537916pjf.2
 for <libc-alpha@sourceware.org>; Tue, 12 Jul 2022 12:29:18 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to
 :references:mime-version:content-transfer-encoding;
 bh=a4sd8sfvJf22s1ZMHNaFAZ08hQ/FA+7Ezh8I+3QuoyQ=;
 b=2io/3yx2nMzLu889lzApE+DMkaKCLz4rsVbuBejyCWb7lGh/R9ZsCN2Kxf0vG0xNgg
 nNVd6ObGo7VrYN496TlCd9FH8eRMHUvvnQEBri/8arGPQLmXpyWhQhL54F0tZGaIdhih
 ET3kRult3eWvlMQ0/iu73SIB37kM+qgSVPgLdk9OHh3V8RHMzyS76SxAQ7XfdQQHgKPT
 yuv41GidjZZBmKMwux+bW/J7j3zL+Y3SF3I+gFF7F7SUDgio17Moy3hXr7XaVGETaJjI
 xxr0CFUfMc7uhqCu/EclTyjKK8qzBmrUTXjj46eYcGVLpa7rcgosrZo4wKe5HtlQTQRe
 OSBA==
X-Gm-Message-State: AJIora9+lccCz5dbcMAld2v647mpuFcD87puhFBu4EfHxDpCHG1dr7wA
 4XBNjwi4+Pkru/bULTTXeo/aMxMXHVA=
X-Google-Smtp-Source: 
 AGRyM1uf34COTf7OavIWARGsHxmZOGdNmZy5J+Lx4tdITKsOKLIJega0B7cbwXoIHs6FRODmMdOemw==
X-Received: by 2002:a17:90b:4a44:b0:1f0:3680:2a72 with SMTP id
 lb4-20020a17090b4a4400b001f036802a72mr5849675pjb.97.1657654157627;
 Tue, 12 Jul 2022 12:29:17 -0700 (PDT)
Received: from noah-tgl.. ([192.55.60.37]) by smtp.gmail.com with ESMTPSA id
 w7-20020a170902e88700b0016c28fbd7e5sm7274704plg.268.2022.07.12.12.29.16
 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
 Tue, 12 Jul 2022 12:29:17 -0700 (PDT)
To: libc-alpha@sourceware.org
Subject: [PATCH v1] x86: Move strchr SSE2 implementation to
 multiarch/strchr-sse2.S
Date: Tue, 12 Jul 2022 12:29:05 -0700
Message-Id: <20220712192910.351121-5-goldstein.w.n@gmail.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20220712192910.351121-1-goldstein.w.n@gmail.com>
References: <20220712192910.351121-1-goldstein.w.n@gmail.com>
MIME-Version: 1.0
X-Spam-Status: No, score=-11.1 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0,
 KAM_SHORT, RCVD_IN_DNSWL_NONE, SCC_10_SHORT_WORD_LINES,
 SCC_5_SHORT_WORD_LINES,
 SPF_HELO_NONE, SPF_PASS, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: libc-alpha@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Libc-alpha mailing list <libc-alpha.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=subscribe>
X-Patchwork-Original-From: Noah Goldstein via Libc-alpha
 <libc-alpha@sourceware.org>
From: Noah Goldstein <goldstein.w.n@gmail.com>
Reply-To: Noah Goldstein <goldstein.w.n@gmail.com>
Errors-To: libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org
Sender: "Libc-alpha"
 <libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org>

This commit doesn't affect libc.so.6, its just housekeeping to prepare
for adding explicit ISA level support.

Tested build on x86_64 and x86_32 with/without multiarch.
---
 sysdeps/x86_64/multiarch/rtld-strchr.S    |  18 +++
 sysdeps/x86_64/multiarch/rtld-strchrnul.S |  18 +++
 sysdeps/x86_64/multiarch/strchr-sse2.S    | 175 +++++++++++++++++++++-
 sysdeps/x86_64/multiarch/strchrnul-sse2.S |  11 +-
 sysdeps/x86_64/strchr.S                   | 167 +--------------------
 sysdeps/x86_64/strchrnul.S                |   7 +-
 6 files changed, 213 insertions(+), 183 deletions(-)
 create mode 100644 sysdeps/x86_64/multiarch/rtld-strchr.S
 create mode 100644 sysdeps/x86_64/multiarch/rtld-strchrnul.S
diff --git a/sysdeps/x86_64/multiarch/rtld-strchr.S b/sysdeps/x86_64/multiarch/rtld-strchr.S
new file mode 100644
index 0000000000..2b7b879e37
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/rtld-strchr.S
@@ -0,0 +1,18 @@
+/* Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include "../strchr.S"
diff --git a/sysdeps/x86_64/multiarch/rtld-strchrnul.S b/sysdeps/x86_64/multiarch/rtld-strchrnul.S
new file mode 100644
index 0000000000..0cc5becc88
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/rtld-strchrnul.S
@@ -0,0 +1,18 @@
+/* Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include "../strchrnul.S"
diff --git a/sysdeps/x86_64/multiarch/strchr-sse2.S b/sysdeps/x86_64/multiarch/strchr-sse2.S
index 992f700077..f7767ca543 100644
--- a/sysdeps/x86_64/multiarch/strchr-sse2.S
+++ b/sysdeps/x86_64/multiarch/strchr-sse2.S
@@ -16,13 +16,172 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-#if IS_IN (libc)
-# define strchr __strchr_sse2
+#if IS_IN (libc) || defined STRCHR
+# ifndef STRCHR
+#  define STRCHR __strchr_sse2
+# endif
 
-# undef weak_alias
-# define weak_alias(strchr, index)
-# undef libc_hidden_builtin_def
-# define libc_hidden_builtin_def(strchr)
-#endif
+# include <sysdep.h>
+
+	.text
+ENTRY (STRCHR)
+	movd	%esi, %xmm1
+	movl	%edi, %eax
+	andl	$4095, %eax
+	punpcklbw %xmm1, %xmm1
+	cmpl	$4032, %eax
+	punpcklwd %xmm1, %xmm1
+	pshufd	$0, %xmm1, %xmm1
+	jg	L(cross_page)
+	movdqu	(%rdi), %xmm0
+	pxor	%xmm3, %xmm3
+	movdqa	%xmm0, %xmm4
+	pcmpeqb	%xmm1, %xmm0
+	pcmpeqb	%xmm3, %xmm4
+	por	%xmm4, %xmm0
+	pmovmskb %xmm0, %eax
+	test	%eax, %eax
+	je	L(next_48_bytes)
+	bsf	%eax, %eax
+# ifdef AS_STRCHRNUL
+	leaq	(%rdi,%rax), %rax
+# else
+	movl	$0, %edx
+	leaq	(%rdi,%rax), %rax
+	cmpb	%sil, (%rax)
+	cmovne	%rdx, %rax
+# endif
+	ret
+
+	.p2align 3
+L(next_48_bytes):
+	movdqu	16(%rdi), %xmm0
+	movdqa	%xmm0, %xmm4
+	pcmpeqb	%xmm1, %xmm0
+	pcmpeqb	%xmm3, %xmm4
+	por	%xmm4, %xmm0
+	pmovmskb %xmm0, %ecx
+	movdqu	32(%rdi), %xmm0
+	movdqa	%xmm0, %xmm4
+	pcmpeqb	%xmm1, %xmm0
+	salq	$16, %rcx
+	pcmpeqb	%xmm3, %xmm4
+	por	%xmm4, %xmm0
+	pmovmskb %xmm0, %eax
+	movdqu	48(%rdi), %xmm0
+	pcmpeqb	%xmm0, %xmm3
+	salq	$32, %rax
+	pcmpeqb	%xmm1, %xmm0
+	orq	%rcx, %rax
+	por	%xmm3, %xmm0
+	pmovmskb %xmm0, %ecx
+	salq	$48, %rcx
+	orq	%rcx, %rax
+	testq	%rax, %rax
+	jne	L(return)
+L(loop_start):
+	/* We use this alignment to force loop be aligned to 8 but not
+	   16 bytes.  This gives better sheduling on AMD processors.  */
+	.p2align 4
+	pxor	%xmm6, %xmm6
+	andq	$-64, %rdi
+	.p2align 3
+L(loop64):
+	addq	$64, %rdi
+	movdqa	(%rdi), %xmm5
+	movdqa	16(%rdi), %xmm2
+	movdqa	32(%rdi), %xmm3
+	pxor	%xmm1, %xmm5
+	movdqa	48(%rdi), %xmm4
+	pxor	%xmm1, %xmm2
+	pxor	%xmm1, %xmm3
+	pminub	(%rdi), %xmm5
+	pxor	%xmm1, %xmm4
+	pminub	16(%rdi), %xmm2
+	pminub	32(%rdi), %xmm3
+	pminub	%xmm2, %xmm5
+	pminub	48(%rdi), %xmm4
+	pminub	%xmm3, %xmm5
+	pminub	%xmm4, %xmm5
+	pcmpeqb %xmm6, %xmm5
+	pmovmskb %xmm5, %eax
+
+	testl	%eax, %eax
+	je	L(loop64)
 
-#include "../strchr.S"
+	movdqa	(%rdi), %xmm5
+	movdqa	%xmm5, %xmm0
+	pcmpeqb	%xmm1, %xmm5
+	pcmpeqb	%xmm6, %xmm0
+	por	%xmm0, %xmm5
+	pcmpeqb %xmm6, %xmm2
+	pcmpeqb %xmm6, %xmm3
+	pcmpeqb %xmm6, %xmm4
+
+	pmovmskb %xmm5, %ecx
+	pmovmskb %xmm2, %eax
+	salq	$16, %rax
+	pmovmskb %xmm3, %r8d
+	pmovmskb %xmm4, %edx
+	salq	$32, %r8
+	orq	%r8, %rax
+	orq	%rcx, %rax
+	salq	$48, %rdx
+	orq	%rdx, %rax
+	.p2align 3
+L(return):
+	bsfq	%rax, %rax
+# ifdef AS_STRCHRNUL
+	leaq	(%rdi,%rax), %rax
+# else
+	movl	$0, %edx
+	leaq	(%rdi,%rax), %rax
+	cmpb	%sil, (%rax)
+	cmovne	%rdx, %rax
+# endif
+	ret
+	.p2align 4
+
+L(cross_page):
+	movq	%rdi, %rdx
+	pxor	%xmm2, %xmm2
+	andq	$-64, %rdx
+	movdqa	%xmm1, %xmm0
+	movdqa	(%rdx), %xmm3
+	movdqa	%xmm3, %xmm4
+	pcmpeqb	%xmm1, %xmm3
+	pcmpeqb	%xmm2, %xmm4
+	por	%xmm4, %xmm3
+	pmovmskb %xmm3, %r8d
+	movdqa	16(%rdx), %xmm3
+	movdqa	%xmm3, %xmm4
+	pcmpeqb	%xmm1, %xmm3
+	pcmpeqb	%xmm2, %xmm4
+	por	%xmm4, %xmm3
+	pmovmskb %xmm3, %eax
+	movdqa	32(%rdx), %xmm3
+	movdqa	%xmm3, %xmm4
+	pcmpeqb	%xmm1, %xmm3
+	salq	$16, %rax
+	pcmpeqb	%xmm2, %xmm4
+	por	%xmm4, %xmm3
+	pmovmskb %xmm3, %r9d
+	movdqa	48(%rdx), %xmm3
+	pcmpeqb	%xmm3, %xmm2
+	salq	$32, %r9
+	pcmpeqb	%xmm3, %xmm0
+	orq	%r9, %rax
+	orq	%r8, %rax
+	por	%xmm2, %xmm0
+	pmovmskb %xmm0, %ecx
+	salq	$48, %rcx
+	orq	%rcx, %rax
+	movl	%edi, %ecx
+	subb	%dl, %cl
+	shrq	%cl, %rax
+	testq	%rax, %rax
+	jne	L(return)
+	jmp	L(loop_start)
+
+END (STRCHR)
+#endif
diff --git a/sysdeps/x86_64/multiarch/strchrnul-sse2.S b/sysdeps/x86_64/multiarch/strchrnul-sse2.S
index f91c670369..7238977a21 100644
--- a/sysdeps/x86_64/multiarch/strchrnul-sse2.S
+++ b/sysdeps/x86_64/multiarch/strchrnul-sse2.S
@@ -17,10 +17,11 @@
    <https://www.gnu.org/licenses/>.  */
 
 #if IS_IN (libc)
-# define __strchrnul __strchrnul_sse2
-
-# undef weak_alias
-# define weak_alias(__strchrnul, strchrnul)
+# ifndef STRCHR
+#  define STRCHR	__strchrnul_sse2
+# endif
 #endif
 
-#include "../strchrnul.S"
+#define AS_STRCHRNUL
+
+#include "strchr-sse2.S"
diff --git a/sysdeps/x86_64/strchr.S b/sysdeps/x86_64/strchr.S
index dda7c0431d..77c956c92c 100644
--- a/sysdeps/x86_64/strchr.S
+++ b/sysdeps/x86_64/strchr.S
@@ -17,171 +17,8 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-#include <sysdep.h>
 
-	.text
-ENTRY (strchr)
-	movd	%esi, %xmm1
-	movl	%edi, %eax
-	andl	$4095, %eax
-	punpcklbw %xmm1, %xmm1
-	cmpl	$4032, %eax
-	punpcklwd %xmm1, %xmm1
-	pshufd	$0, %xmm1, %xmm1
-	jg	L(cross_page)
-	movdqu	(%rdi), %xmm0
-	pxor	%xmm3, %xmm3
-	movdqa	%xmm0, %xmm4
-	pcmpeqb	%xmm1, %xmm0
-	pcmpeqb	%xmm3, %xmm4
-	por	%xmm4, %xmm0
-	pmovmskb %xmm0, %eax
-	test	%eax, %eax
-	je	L(next_48_bytes)
-	bsf	%eax, %eax
-#ifdef AS_STRCHRNUL
-	leaq	(%rdi,%rax), %rax
-#else
-	movl	$0, %edx
-	leaq	(%rdi,%rax), %rax
-	cmpb	%sil, (%rax)
-	cmovne	%rdx, %rax
-#endif
-	ret
-
-	.p2align 3
-	L(next_48_bytes):
-	movdqu	16(%rdi), %xmm0
-	movdqa	%xmm0, %xmm4
-	pcmpeqb	%xmm1, %xmm0
-	pcmpeqb	%xmm3, %xmm4
-	por	%xmm4, %xmm0
-	pmovmskb %xmm0, %ecx
-	movdqu	32(%rdi), %xmm0
-	movdqa	%xmm0, %xmm4
-	pcmpeqb	%xmm1, %xmm0
-	salq	$16, %rcx
-	pcmpeqb	%xmm3, %xmm4
-	por	%xmm4, %xmm0
-	pmovmskb %xmm0, %eax
-	movdqu	48(%rdi), %xmm0
-	pcmpeqb	%xmm0, %xmm3
-	salq	$32, %rax
-	pcmpeqb	%xmm1, %xmm0
-	orq	%rcx, %rax
-	por	%xmm3, %xmm0
-	pmovmskb %xmm0, %ecx
-	salq	$48, %rcx
-	orq	%rcx, %rax
-	testq	%rax, %rax
-	jne	L(return)
-L(loop_start):
-	/* We use this alignment to force loop be aligned to 8 but not
-	   16 bytes.  This gives better sheduling on AMD processors.  */
-	.p2align 4
-	pxor	%xmm6, %xmm6
-	andq	$-64, %rdi
-	.p2align 3
-L(loop64):
-	addq	$64, %rdi
-	movdqa	(%rdi), %xmm5
-	movdqa	16(%rdi), %xmm2
-	movdqa	32(%rdi), %xmm3
-	pxor	%xmm1, %xmm5
-	movdqa	48(%rdi), %xmm4
-	pxor	%xmm1, %xmm2
-	pxor	%xmm1, %xmm3
-	pminub	(%rdi), %xmm5
-	pxor	%xmm1, %xmm4
-	pminub	16(%rdi), %xmm2
-	pminub	32(%rdi), %xmm3
-	pminub	%xmm2, %xmm5
-	pminub	48(%rdi), %xmm4
-	pminub	%xmm3, %xmm5
-	pminub	%xmm4, %xmm5
-	pcmpeqb %xmm6, %xmm5
-	pmovmskb %xmm5, %eax
-
-	testl	%eax, %eax
-	je	L(loop64)
-
-	movdqa	(%rdi), %xmm5
-	movdqa	%xmm5, %xmm0
-	pcmpeqb	%xmm1, %xmm5
-	pcmpeqb	%xmm6, %xmm0
-	por	%xmm0, %xmm5
-	pcmpeqb %xmm6, %xmm2
-	pcmpeqb %xmm6, %xmm3
-	pcmpeqb %xmm6, %xmm4
-
-	pmovmskb %xmm5, %ecx
-	pmovmskb %xmm2, %eax
-	salq	$16, %rax
-	pmovmskb %xmm3, %r8d
-	pmovmskb %xmm4, %edx
-	salq	$32, %r8
-	orq	%r8, %rax
-	orq	%rcx, %rax
-	salq	$48, %rdx
-	orq	%rdx, %rax
-	.p2align 3
-L(return):
-	bsfq	%rax, %rax
-#ifdef AS_STRCHRNUL
-	leaq	(%rdi,%rax), %rax
-#else
-	movl	$0, %edx
-	leaq	(%rdi,%rax), %rax
-	cmpb	%sil, (%rax)
-	cmovne	%rdx, %rax
-#endif
-	ret
-	.p2align 4
-
-L(cross_page):
-	movq	%rdi, %rdx
-	pxor	%xmm2, %xmm2
-	andq	$-64, %rdx
-	movdqa	%xmm1, %xmm0
-	movdqa	(%rdx), %xmm3
-	movdqa	%xmm3, %xmm4
-	pcmpeqb	%xmm1, %xmm3
-	pcmpeqb	%xmm2, %xmm4
-	por	%xmm4, %xmm3
-	pmovmskb %xmm3, %r8d
-	movdqa	16(%rdx), %xmm3
-	movdqa	%xmm3, %xmm4
-	pcmpeqb	%xmm1, %xmm3
-	pcmpeqb	%xmm2, %xmm4
-	por	%xmm4, %xmm3
-	pmovmskb %xmm3, %eax
-	movdqa	32(%rdx), %xmm3
-	movdqa	%xmm3, %xmm4
-	pcmpeqb	%xmm1, %xmm3
-	salq	$16, %rax
-	pcmpeqb	%xmm2, %xmm4
-	por	%xmm4, %xmm3
-	pmovmskb %xmm3, %r9d
-	movdqa	48(%rdx), %xmm3
-	pcmpeqb	%xmm3, %xmm2
-	salq	$32, %r9
-	pcmpeqb	%xmm3, %xmm0
-	orq	%r9, %rax
-	orq	%r8, %rax
-	por	%xmm2, %xmm0
-	pmovmskb %xmm0, %ecx
-	salq	$48, %rcx
-	orq	%rcx, %rax
-	movl	%edi, %ecx
-	subb	%dl, %cl
-	shrq	%cl, %rax
-	testq	%rax, %rax
-	jne	L(return)
-	jmp	L(loop_start)
-
-END (strchr)
-
-#ifndef AS_STRCHRNUL
+#define STRCHR strchr
+#include "multiarch/strchr-sse2.S"
 weak_alias (strchr, index)
 libc_hidden_builtin_def (strchr)
-#endif
diff --git a/sysdeps/x86_64/strchrnul.S b/sysdeps/x86_64/strchrnul.S
index ec2e652e25..508e42db26 100644
--- a/sysdeps/x86_64/strchrnul.S
+++ b/sysdeps/x86_64/strchrnul.S
@@ -18,10 +18,7 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-#include <sysdep.h>
-
-#define strchr __strchrnul
-#define AS_STRCHRNUL
-#include "strchr.S"
+#define STRCHR __strchrnul
+#include "multiarch/strchrnul-sse2.S"
 
 weak_alias (__strchrnul, strchrnul)