From patchwork Tue Jul 30 15:41:58 2024
X-Patchwork-Submitter: Andi Kleen
X-Patchwork-Id: 1966613
From: Andi Kleen
To: gcc-patches@gcc.gnu.org
Cc: Andi Kleen
Subject: [PATCH 1/2] Remove MMX code path in lexer
Date: Tue, 30 Jul 2024 08:41:58 -0700
Message-ID: <20240730154159.3799008-1-ak@linux.intel.com>
X-Mailer: git-send-email 2.45.2

From: Andi Kleen

Host systems with only MMX and no SSE2 should be really rare now.
Let's remove the MMX code path so that the number of custom
implementations stays the same once the AVX2 path is added in the
next patch.

The SSE2 code path is also somewhat dubious now (nearly everything
should have SSE 4.2, which is more than 15 years old), but the SSE2
code path is used as a fallback for the others, and apparently
Solaris also uses it due to toolchain deficiencies.

libcpp/ChangeLog:

	* lex.cc (search_line_mmx): Remove function.
	(init_vectorized_lexer): Remove search_line_mmx.
---
 libcpp/lex.cc | 75 ---------------------------------------------------
 1 file changed, 75 deletions(-)

diff --git a/libcpp/lex.cc b/libcpp/lex.cc
index 16f2c23af1e1..1591dcdf151a 100644
--- a/libcpp/lex.cc
+++ b/libcpp/lex.cc
@@ -290,71 +290,6 @@ static const char repl_chars[4][16] __attribute__((aligned(16))) = {
     '?', '?', '?', '?', '?', '?', '?', '?' },
 };
 
-/* A version of the fast scanner using MMX vectorized byte compare insns.
-
-   This uses the PMOVMSKB instruction which was introduced with "MMX2",
-   which was packaged into SSE1; it is also present in the AMD MMX
-   extension.  Mark the function as using "sse" so that we emit a real
-   "emms" instruction, rather than the 3dNOW "femms" instruction.  */
-
-static const uchar *
-#ifndef __SSE__
-__attribute__((__target__("sse")))
-#endif
-search_line_mmx (const uchar *s, const uchar *end ATTRIBUTE_UNUSED)
-{
-  typedef char v8qi __attribute__ ((__vector_size__ (8)));
-  typedef int __m64 __attribute__ ((__vector_size__ (8), __may_alias__));
-
-  const v8qi repl_nl = *(const v8qi *)repl_chars[0];
-  const v8qi repl_cr = *(const v8qi *)repl_chars[1];
-  const v8qi repl_bs = *(const v8qi *)repl_chars[2];
-  const v8qi repl_qm = *(const v8qi *)repl_chars[3];
-
-  unsigned int misalign, found, mask;
-  const v8qi *p;
-  v8qi data, t, c;
-
-  /* Align the source pointer.  While MMX doesn't generate unaligned data
-     faults, this allows us to safely scan to the end of the buffer without
-     reading beyond the end of the last page.  */
-  misalign = (uintptr_t)s & 7;
-  p = (const v8qi *)((uintptr_t)s & -8);
-  data = *p;
-
-  /* Create a mask for the bytes that are valid within the first
-     16-byte block.  The Idea here is that the AND with the mask
-     within the loop is "free", since we need some AND or TEST
-     insn in order to set the flags for the branch anyway.  */
-  mask = -1u << misalign;
-
-  /* Main loop processing 8 bytes at a time.  */
-  goto start;
-  do
-    {
-      data = *++p;
-      mask = -1;
-
-    start:
-      t = __builtin_ia32_pcmpeqb(data, repl_nl);
-      c = __builtin_ia32_pcmpeqb(data, repl_cr);
-      t = (v8qi) __builtin_ia32_por ((__m64)t, (__m64)c);
-      c = __builtin_ia32_pcmpeqb(data, repl_bs);
-      t = (v8qi) __builtin_ia32_por ((__m64)t, (__m64)c);
-      c = __builtin_ia32_pcmpeqb(data, repl_qm);
-      t = (v8qi) __builtin_ia32_por ((__m64)t, (__m64)c);
-      found = __builtin_ia32_pmovmskb (t);
-      found &= mask;
-    }
-  while (!found);
-
-  __builtin_ia32_emms ();
-
-  /* FOUND contains 1 in bits for which we matched a relevant
-     character.  Conversion to the byte index is trivial.  */
-  found = __builtin_ctz(found);
-  return (const uchar *)p + found;
-}
 
 /* A version of the fast scanner using SSE2 vectorized byte compare insns.  */
 
@@ -509,8 +444,6 @@ init_vectorized_lexer (void)
   minimum = 3;
 #elif defined(__SSE2__)
   minimum = 2;
-#elif defined(__SSE__)
-  minimum = 1;
 #endif
 
   if (minimum == 3)
@@ -521,14 +454,6 @@ init_vectorized_lexer (void)
         impl = search_line_sse42;
       else if (minimum == 2 || (edx & bit_SSE2))
         impl = search_line_sse2;
-      else if (minimum == 1 || (edx & bit_SSE))
-        impl = search_line_mmx;
-    }
-  else if (__get_cpuid (0x80000001, &dummy, &dummy, &dummy, &edx))
-    {
-      if (minimum == 1
-          || (edx & (bit_MMXEXT | bit_CMOV)) == (bit_MMXEXT | bit_CMOV))
-        impl = search_line_mmx;
     }
 
   search_line_fast = impl;

From patchwork Tue Jul 30 15:41:59 2024
X-Patchwork-Submitter: Andi Kleen
X-Patchwork-Id: 1966612
From: Andi Kleen
To: gcc-patches@gcc.gnu.org
Cc: Andi Kleen
Subject: [PATCH 2/2] Add AVX2 code path to lexer
Date: Tue, 30 Jul 2024 08:41:59 -0700
Message-ID: <20240730154159.3799008-2-ak@linux.intel.com>
X-Mailer: git-send-email 2.45.2
In-Reply-To: <20240730154159.3799008-1-ak@linux.intel.com>
References: <20240730154159.3799008-1-ak@linux.intel.com>

From: Andi Kleen

AVX2 is widely available on x86 and allows the scanner's line check
to process 32 bytes at a time. The code is similar to the SSE2 code
path, but uses AVX2 and 32 bytes at a time instead of SSE2's 16 bytes.

Also adjust the code to allow inlining when the compiler is built
for an AVX2 host, following what other architectures do.
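To illustrate the block-at-a-time scan described above, here is a minimal portable sketch. It is not the code from this patch: the names `match_mask`, `lowest_bit` and `search_line_sketch` are hypothetical, and a scalar loop stands in for the vector compare and the move-mask instruction. What it tries to show is the shape of the technique: round the pointer down to a 32-byte boundary, mask off the match bits that lie before the real start, and turn the lowest set bit into a byte index.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK 32  /* bytes per step, the width of one AVX2 register */

/* Scalar stand-in for the vector compare plus move-mask step: bit i of
   the result is set when block[i] is one of the four bytes the lexer
   cares about.  */
static uint32_t
match_mask (const unsigned char *block)
{
  uint32_t m = 0;
  for (unsigned i = 0; i < BLOCK; i++)
    {
      unsigned char c = block[i];
      if (c == '\n' || c == '\r' || c == '\\' || c == '?')
        m |= (uint32_t) 1 << i;
    }
  return m;
}

/* Index of the lowest set bit (what a count-trailing-zeros insn does).  */
static unsigned
lowest_bit (uint32_t x)
{
  unsigned n = 0;
  assert (x != 0);
  while (!(x & 1))
    {
      x >>= 1;
      n++;
    }
  return n;
}

/* Return the first interesting byte at or after S.  Assumptions of this
   sketch: such a byte exists, and the whole aligned block containing S
   belongs to the same buffer, so reading from its start is valid.  */
static const unsigned char *
search_line_sketch (const unsigned char *s)
{
  unsigned misalign = (uintptr_t) s & (BLOCK - 1);
  const unsigned char *p = s - misalign;      /* round down to a block start */
  uint32_t mask = ~(uint32_t) 0 << misalign;  /* ignore bytes before S */
  uint32_t found;

  while ((found = match_mask (p) & mask) == 0)
    {
      p += BLOCK;
      mask = ~(uint32_t) 0;                   /* later blocks are fully valid */
    }
  return p + lowest_bit (found);
}
```

Only the first block needs a partial mask; every later block is aligned and fully valid, which is why the loop resets the mask to all ones.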

I see about a 0.6% compile-time improvement when compiling i386
insn-recog.i with -O0.

libcpp/ChangeLog:

	* config.in (HAVE_AVX2): Add.
	* configure: Regenerate.
	* configure.ac: Add HAVE_AVX2 check.
	* lex.cc (repl_chars): Extend to 32 bytes.
	(search_line_avx2): New function to scan line using AVX2.
	(init_vectorized_lexer): Check for AVX2 in CPUID.
---
 libcpp/config.in    |  3 ++
 libcpp/configure    | 17 +++++++++
 libcpp/configure.ac |  3 ++
 libcpp/lex.cc       | 91 +++++++++++++++++++++++++++++++++++++++++++--
 4 files changed, 110 insertions(+), 4 deletions(-)

diff --git a/libcpp/config.in b/libcpp/config.in
index 253ef03a3dea..8fad6bd4b4f5 100644
--- a/libcpp/config.in
+++ b/libcpp/config.in
@@ -213,6 +213,9 @@
 /* Define to 1 if you can assemble SSE4 insns. */
 #undef HAVE_SSE4
 
+/* Define to 1 if you can assemble AVX2 insns. */
+#undef HAVE_AVX2
+
 /* Define to 1 if you have the <stddef.h> header file. */
 #undef HAVE_STDDEF_H
 
diff --git a/libcpp/configure b/libcpp/configure
index 32d6aaa30699..6d9286ac9601 100755
--- a/libcpp/configure
+++ b/libcpp/configure
@@ -9149,6 +9149,23 @@ if ac_fn_c_try_compile "$LINENO"; then :
 
 $as_echo "#define HAVE_SSE4 1" >>confdefs.h
 
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+    cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+int
+main ()
+{
+asm ("vpcmpeqb %%ymm0, %%ymm4, %%ymm5" : : "i"(0))
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+
+$as_echo "#define HAVE_AVX2 1" >>confdefs.h
+
 fi
 rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
 esac
diff --git a/libcpp/configure.ac b/libcpp/configure.ac
index b883fec776fe..c06609827924 100644
--- a/libcpp/configure.ac
+++ b/libcpp/configure.ac
@@ -200,6 +200,9 @@ case $target in
     AC_TRY_COMPILE([], [asm ("pcmpestri %0, %%xmm0, %%xmm1" : : "i"(0))],
       [AC_DEFINE([HAVE_SSE4], [1],
                  [Define to 1 if you can assemble SSE4 insns.])])
+    AC_TRY_COMPILE([], [asm ("vpcmpeqb %%ymm0, %%ymm4, %%ymm5" : : "i"(0))],
+      [AC_DEFINE([HAVE_AVX2], [1],
+                 [Define to 1 if you can assemble AVX2 insns.])])
 esac
 
 # Enable --enable-host-shared.
diff --git a/libcpp/lex.cc b/libcpp/lex.cc
index 1591dcdf151a..72f3402aac99 100644
--- a/libcpp/lex.cc
+++ b/libcpp/lex.cc
@@ -278,19 +278,31 @@ search_line_acc_char (const uchar *s, const uchar *end ATTRIBUTE_UNUSED)
 /* Replicated character data to be shared between implementations.
    Recall that outside of a context with vector support we can't
    define compatible vector types, therefore these are all defined
-   in terms of raw characters.  */
-static const char repl_chars[4][16] __attribute__((aligned(16))) = {
+   in terms of raw characters.
+   GCC constant-propagates this and usually turns it into a
+   vector broadcast, so it actually disappears.  */
+
+static const char repl_chars[4][32] __attribute__((aligned(32))) = {
   { '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n',
+    '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n',
+    '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n',
     '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n' },
   { '\r', '\r', '\r', '\r', '\r', '\r', '\r', '\r',
+    '\r', '\r', '\r', '\r', '\r', '\r', '\r', '\r',
+    '\r', '\r', '\r', '\r', '\r', '\r', '\r', '\r',
     '\r', '\r', '\r', '\r', '\r', '\r', '\r', '\r' },
   { '\\', '\\', '\\', '\\', '\\', '\\', '\\', '\\',
+    '\\', '\\', '\\', '\\', '\\', '\\', '\\', '\\',
+    '\\', '\\', '\\', '\\', '\\', '\\', '\\', '\\',
     '\\', '\\', '\\', '\\', '\\', '\\', '\\', '\\' },
   { '?', '?', '?', '?', '?', '?', '?', '?',
+    '?', '?', '?', '?', '?', '?', '?', '?',
+    '?', '?', '?', '?', '?', '?', '?', '?',
     '?', '?', '?', '?', '?', '?', '?', '?' },
 };
 
 
+#ifndef __AVX2__
 /* A version of the fast scanner using SSE2 vectorized byte compare insns.  */
 
 static const uchar *
@@ -343,8 +355,9 @@ search_line_sse2 (const uchar *s, const uchar *end ATTRIBUTE_UNUSED)
   found = __builtin_ctz(found);
   return (const uchar *)p + found;
 }
+#endif
 
-#ifdef HAVE_SSE4
+#if defined(HAVE_SSE4) && !defined(__AVX2__)
 /* A version of the fast scanner using SSE 4.2 vectorized string insns.  */
 
 static const uchar *
@@ -425,6 +438,71 @@ search_line_sse42 (const uchar *s, const uchar *end)
 #define search_line_sse42 search_line_sse2
 #endif
 
+#ifdef HAVE_AVX2
+
+/* A version of the fast scanner using AVX2 vectorized byte compare insns.  */
+
+static const uchar *
+#ifndef __AVX2__
+__attribute__((__target__("avx2")))
+#endif
+search_line_avx2 (const uchar *s, const uchar *end ATTRIBUTE_UNUSED)
+{
+  typedef char v32qi __attribute__ ((__vector_size__ (32)));
+
+  const v32qi repl_nl = *(const v32qi *)repl_chars[0];
+  const v32qi repl_cr = *(const v32qi *)repl_chars[1];
+  const v32qi repl_bs = *(const v32qi *)repl_chars[2];
+  const v32qi repl_qm = *(const v32qi *)repl_chars[3];
+
+  unsigned int misalign, found, mask;
+  const v32qi *p;
+  v32qi data, t;
+
+  /* Align the source pointer.  */
+  misalign = (uintptr_t)s & 31;
+  p = (const v32qi *)((uintptr_t)s & -32);
+  data = *p;
+
+  /* Create a mask for the bytes that are valid within the first
+     32-byte block.  The idea here is that the AND with the mask
+     within the loop is "free", since we need some AND or TEST
+     insn in order to set the flags for the branch anyway.  */
+  mask = -1u << misalign;
+
+  /* Main loop processing 32 bytes at a time.  */
+  goto start;
+  do
+    {
+      data = *++p;
+      mask = -1;
+
+    start:
+      t = data == repl_nl;
+      t |= data == repl_cr;
+      t |= data == repl_bs;
+      t |= data == repl_qm;
+      found = __builtin_ia32_pmovmskb256 (t);
+      found &= mask;
+    }
+  while (!found);
+
+  /* FOUND contains 1 in bits for which we matched a relevant
+     character.  Conversion to the byte index is trivial.  */
+  found = __builtin_ctz (found);
+  return (const uchar *)p + found;
+}
+
+#else
+#define search_line_avx2 search_line_sse2
+#endif
+
+#ifdef __AVX2__
+/* Avoid indirect calls to encourage inlining if the compiler is built
+   with AVX2 enabled.  */
+#define search_line_fast search_line_avx2
+#else
+
 /* Check the CPU capabilities.  */
 
 #include "../gcc/config/i386/cpuid.h"
@@ -436,7 +514,7 @@ static search_line_fast_type search_line_fast;
 static inline void
 init_vectorized_lexer (void)
 {
-  unsigned dummy, ecx = 0, edx = 0;
+  unsigned dummy, ecx = 0, edx = 0, ebx = 0;
   search_line_fast_type impl = search_line_acc_char;
   int minimum = 0;
 
@@ -448,6 +526,10 @@ init_vectorized_lexer (void)
 
   if (minimum == 3)
     impl = search_line_sse42;
+  else if (__get_cpuid_max (0, &dummy) >= 7
+           && __get_cpuid_count (7, 0, &dummy, &ebx, &dummy, &dummy)
+           && (ebx & bit_AVX2))
+    impl = search_line_avx2;
   else if (__get_cpuid (1, &dummy, &dummy, &ecx, &edx) || minimum == 2)
     {
       if (minimum == 3 || (ecx & bit_SSE4_2))
@@ -458,6 +540,7 @@ init_vectorized_lexer (void)
 
   search_line_fast = impl;
 }
+#endif /* !__AVX2__ */
 
 #elif (GCC_VERSION >= 4005) && defined(_ARCH_PWR8) && defined(__ALTIVEC__)
 
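
For readers who want to experiment with the run-time selection outside libcpp, the following is a rough standalone sketch of a CPUID-based dispatch. It is an illustration under assumptions, not the logic of init_vectorized_lexer: `enum scanner`, `pick_scanner` and `detect_scanner` are hypothetical names, the compile-time `minimum` handling is left out, and the priority order (AVX2, then SSE 4.2, then SSE2) is a simplification. It uses the compiler's own <cpuid.h> rather than the in-tree header, and falls back to the generic choice on non-x86 hosts.

```c
#include <assert.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical labels for the available line scanners.  */
enum scanner { SCAN_GENERIC, SCAN_SSE2, SCAN_SSE42, SCAN_AVX2 };

/* Pure selection logic, kept apart from CPUID so that it can be
   exercised on any host.  */
static enum scanner
pick_scanner (int has_sse2, int has_sse42, int has_avx2)
{
  if (has_avx2)
    return SCAN_AVX2;
  if (has_sse42)
    return SCAN_SSE42;
  if (has_sse2)
    return SCAN_SSE2;
  return SCAN_GENERIC;
}

/* Query CPUID where it exists and feed the feature bits to
   pick_scanner.  */
static enum scanner
detect_scanner (void)
{
  int sse2 = 0, sse42 = 0, avx2 = 0;
#if defined(__x86_64__) || defined(__i386__)
  unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;

  /* Leaf 1: SSE2 is reported in EDX, SSE 4.2 in ECX.  */
  if (__get_cpuid (1, &eax, &ebx, &ecx, &edx))
    {
      sse2 = (edx & bit_SSE2) != 0;
      sse42 = (ecx & bit_SSE4_2) != 0;
    }

  /* Leaf 7, subleaf 0: AVX2 is reported in EBX.  Check the maximum
     supported leaf first, as the patch does.  */
  if (__get_cpuid_max (0, 0) >= 7
      && __get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx))
    avx2 = (ebx & bit_AVX2) != 0;
#endif
  return pick_scanner (sse2, sse42, avx2);
}
```

Note that this sketch looks only at the CPUID feature bits. Whether the operating system has enabled the YMM register state is a separate question that it does not check.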