From patchwork Tue Jul 30 15:41:58 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andi Kleen <ak@linux.intel.com>
X-Patchwork-Id: 1966613
Return-Path: <gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@legolas.ozlabs.org
Authentication-Results: legolas.ozlabs.org;
	dkim=pass (2048-bit key;
 unprotected) header.d=intel.com header.i=@intel.com header.a=rsa-sha256
 header.s=Intel header.b=lv0J2ZC/;
	dkim-atps=neutral
Authentication-Results: legolas.ozlabs.org;
 spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org
 (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org;
 envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org;
 receiver=patchwork.ozlabs.org)
Received: from server2.sourceware.org (server2.sourceware.org
 [IPv6:2620:52:3:1:0:246e:9693:128c])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384)
	(No client certificate requested)
	by legolas.ozlabs.org (Postfix) with ESMTPS id 4WYKJK11P8z1ybX
	for <incoming@patchwork.ozlabs.org>; Wed, 31 Jul 2024 01:43:29 +1000 (AEST)
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id B5C08385DC1B
	for <incoming@patchwork.ozlabs.org>; Tue, 30 Jul 2024 15:43:26 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.16])
 by sourceware.org (Postfix) with ESMTPS id 972B13858C56;
 Tue, 30 Jul 2024 15:42:18 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 972B13858C56
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=linux.intel.com
Authentication-Results: sourceware.org; spf=none smtp.mailfrom=linux.intel.com
ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 972B13858C56
Authentication-Results: server2.sourceware.org;
 arc=none smtp.remote-ip=198.175.65.16
ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1722354141; cv=none;
 b=CW36r3QETVGj9uCCMOlmbpVTEcXWkVJu+LDx2b9uj0qMxw2YWNkHTsydzYlK3btxjFfJ+dYr3n8j8m1fXep0ed9IGbzi1HILB9dBpN8PTqa8f4Bqbe5pj2HZZ90TNVVx0fGKkJGl8udv+h1OR46EIuGnS3p3+Wuqg+6jk2aK8J8=
ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key;
 t=1722354141; c=relaxed/simple;
 bh=KAPrrx5U5wbm2ASj2x4s4cCZPOeNjfO5dKehVcatkqc=;
 h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version;
 b=ooKQyMWijYl6nPmlnk0xu//3nlyQch+Bhyp6RTslnu1qJSOWJQONitX1UM9YxC3YyLK8jGJT7qTI6B49zTm4jTZAtmJ9z7e4K1wRdBASWYjOebHTKvgNJEANRZll0wSejl0lYQ45tOrMmUZnJGhqjlZuw32dAX6QPQoeBNLoEF4=
ARC-Authentication-Results: i=1; server2.sourceware.org
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
 d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
 t=1722354139; x=1753890139;
 h=from:to:cc:subject:date:message-id:mime-version:
 content-transfer-encoding;
 bh=KAPrrx5U5wbm2ASj2x4s4cCZPOeNjfO5dKehVcatkqc=;
 b=lv0J2ZC/Io/3O1uWVPRN3umsIvjzuizu7eVXLLYwrtZMjY5in5y9UQnI
 iUl8Vym1aPLk0SlEyBAbaq+kVpGFuQYtcnfOdA/UsYTDwref9Rm2x1Mkh
 c5GGmiGoIgSGn9/zDypTqc+wIOFciC2Y2SYACDwi3K3VKoUY8fA8PQqyp
 lY72Gco4P+tYG3XAwDlljKoJatqGQBZBOJkiLfTlkgcg8TKLLaI6PTuaV
 /GS245uPJmKS3i90tPZ5ceOnl3OIN43XJng15SPttjzNTcW+ZtGVqIOdl
 wUt2qOYKztylc6IcRzbQ0bZh+AmeQlvdW3Y9ZBchZwC3B3Ko0z8S41/ii g==;
X-CSE-ConnectionGUID: CFk6vrA0TLK3cOw3MV8dlw==
X-CSE-MsgGUID: f6Vowo84QKmK415wXC3Fzw==
X-IronPort-AV: E=McAfee;i="6700,10204,11149"; a="20336645"
X-IronPort-AV: E=Sophos;i="6.09,248,1716274800"; d="scan'208";a="20336645"
Received: from fmviesa003.fm.intel.com ([10.60.135.143])
 by orvoesa108.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 30 Jul 2024 08:42:13 -0700
X-CSE-ConnectionGUID: SIvDqGmeTJ2YVGnBC3USmg==
X-CSE-MsgGUID: rkTqLxJNQiWAcjdLt+eDhQ==
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="6.09,248,1716274800"; d="scan'208";a="58517822"
Received: from tassilo.jf.intel.com ([10.54.38.190])
 by fmviesa003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 30 Jul 2024 08:42:12 -0700
From: Andi Kleen <ak@linux.intel.com>
To: gcc-patches@gcc.gnu.org
Cc: Andi Kleen <ak@gcc.gnu.org>
Subject: [PATCH 1/2] Remove MMX code path in lexer
Date: Tue, 30 Jul 2024 08:41:58 -0700
Message-ID: <20240730154159.3799008-1-ak@linux.intel.com>
X-Mailer: git-send-email 2.45.2
MIME-Version: 1.0
X-Spam-Status: No, score=-11.0 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH,
 DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, SPF_HELO_NONE, SPF_NONE,
 TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.30
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org

From: Andi Kleen <ak@gcc.gnu.org>

Host systems with only MMX and no SSE2 should be very rare now.
Remove the MMX code path so that the number of custom implementations
stays the same once the AVX2 path is added in the next patch.

The SSE2 code path is also somewhat dubious now (nearly everything
should have SSE 4.2, which is more than 15 years old), but it is
kept because it serves as the fallback for the other paths and
because Solaris apparently uses it due to toolchain deficiencies.

libcpp/ChangeLog:

	* lex.cc (search_line_mmx): Remove function.
	(init_vectorized_lexer): Remove search_line_mmx.
---
 libcpp/lex.cc | 75 ---------------------------------------------------
 1 file changed, 75 deletions(-)

diff --git a/libcpp/lex.cc b/libcpp/lex.cc
index 16f2c23af1e1..1591dcdf151a 100644
--- a/libcpp/lex.cc
+++ b/libcpp/lex.cc
@@ -290,71 +290,6 @@ static const char repl_chars[4][16] __attribute__((aligned(16))) = {
     '?', '?', '?', '?', '?', '?', '?', '?' },
 };
 
-/* A version of the fast scanner using MMX vectorized byte compare insns.
-
-   This uses the PMOVMSKB instruction which was introduced with "MMX2",
-   which was packaged into SSE1; it is also present in the AMD MMX
-   extension.  Mark the function as using "sse" so that we emit a real
-   "emms" instruction, rather than the 3dNOW "femms" instruction.  */
-
-static const uchar *
-#ifndef __SSE__
-__attribute__((__target__("sse")))
-#endif
-search_line_mmx (const uchar *s, const uchar *end ATTRIBUTE_UNUSED)
-{
-  typedef char v8qi __attribute__ ((__vector_size__ (8)));
-  typedef int __m64 __attribute__ ((__vector_size__ (8), __may_alias__));
-
-  const v8qi repl_nl = *(const v8qi *)repl_chars[0];
-  const v8qi repl_cr = *(const v8qi *)repl_chars[1];
-  const v8qi repl_bs = *(const v8qi *)repl_chars[2];
-  const v8qi repl_qm = *(const v8qi *)repl_chars[3];
-
-  unsigned int misalign, found, mask;
-  const v8qi *p;
-  v8qi data, t, c;
-
-  /* Align the source pointer.  While MMX doesn't generate unaligned data
-     faults, this allows us to safely scan to the end of the buffer without
-     reading beyond the end of the last page.  */
-  misalign = (uintptr_t)s & 7;
-  p = (const v8qi *)((uintptr_t)s & -8);
-  data = *p;
-
-  /* Create a mask for the bytes that are valid within the first
-     16-byte block.  The Idea here is that the AND with the mask
-     within the loop is "free", since we need some AND or TEST
-     insn in order to set the flags for the branch anyway.  */
-  mask = -1u << misalign;
-
-  /* Main loop processing 8 bytes at a time.  */
-  goto start;
-  do
-    {
-      data = *++p;
-      mask = -1;
-
-    start:
-      t = __builtin_ia32_pcmpeqb(data, repl_nl);
-      c = __builtin_ia32_pcmpeqb(data, repl_cr);
-      t = (v8qi) __builtin_ia32_por ((__m64)t, (__m64)c);
-      c = __builtin_ia32_pcmpeqb(data, repl_bs);
-      t = (v8qi) __builtin_ia32_por ((__m64)t, (__m64)c);
-      c = __builtin_ia32_pcmpeqb(data, repl_qm);
-      t = (v8qi) __builtin_ia32_por ((__m64)t, (__m64)c);
-      found = __builtin_ia32_pmovmskb (t);
-      found &= mask;
-    }
-  while (!found);
-
-  __builtin_ia32_emms ();
-
-  /* FOUND contains 1 in bits for which we matched a relevant
-     character.  Conversion to the byte index is trivial.  */
-  found = __builtin_ctz(found);
-  return (const uchar *)p + found;
-}
 
 /* A version of the fast scanner using SSE2 vectorized byte compare insns.  */
 
@@ -509,8 +444,6 @@ init_vectorized_lexer (void)
   minimum = 3;
 #elif defined(__SSE2__)
   minimum = 2;
-#elif defined(__SSE__)
-  minimum = 1;
 #endif
 
   if (minimum == 3)
@@ -521,14 +454,6 @@ init_vectorized_lexer (void)
         impl = search_line_sse42;
       else if (minimum == 2 || (edx & bit_SSE2))
 	impl = search_line_sse2;
-      else if (minimum == 1 || (edx & bit_SSE))
-	impl = search_line_mmx;
-    }
-  else if (__get_cpuid (0x80000001, &dummy, &dummy, &dummy, &edx))
-    {
-      if (minimum == 1
-	  || (edx & (bit_MMXEXT | bit_CMOV)) == (bit_MMXEXT | bit_CMOV))
-	impl = search_line_mmx;
     }
 
   search_line_fast = impl;

From patchwork Tue Jul 30 15:41:59 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andi Kleen <ak@linux.intel.com>
X-Patchwork-Id: 1966612
Return-Path: <gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@legolas.ozlabs.org
Authentication-Results: legolas.ozlabs.org;
	dkim=pass (2048-bit key;
 unprotected) header.d=intel.com header.i=@intel.com header.a=rsa-sha256
 header.s=Intel header.b=HarMEpD4;
	dkim-atps=neutral
Authentication-Results: legolas.ozlabs.org;
 spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org
 (client-ip=8.43.85.97; helo=server2.sourceware.org;
 envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org;
 receiver=patchwork.ozlabs.org)
Received: from server2.sourceware.org (server2.sourceware.org [8.43.85.97])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384)
	(No client certificate requested)
	by legolas.ozlabs.org (Postfix) with ESMTPS id 4WYKHg4z28z1ybX
	for <incoming@patchwork.ozlabs.org>; Wed, 31 Jul 2024 01:42:55 +1000 (AEST)
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id CABA7385B50C
	for <incoming@patchwork.ozlabs.org>; Tue, 30 Jul 2024 15:42:53 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.16])
 by sourceware.org (Postfix) with ESMTPS id C0B83385C6E1;
 Tue, 30 Jul 2024 15:42:21 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org C0B83385C6E1
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=linux.intel.com
Authentication-Results: sourceware.org; spf=none smtp.mailfrom=linux.intel.com
ARC-Filter: OpenARC Filter v1.0.0 sourceware.org C0B83385C6E1
Authentication-Results: server2.sourceware.org;
 arc=none smtp.remote-ip=198.175.65.16
ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1722354146; cv=none;
 b=oAq7SigVNCcEWLorPQYkJ+eH16zUO2EjID/B6E4WglURW7RhjhCjG/fs9t3iviInWT8GMr9o0EZZK1SpSdQ0zktF/89jlov8+ddzzw0wFXU1bvT3jG2kwwNvqZOg9gS7PnCBMfCVuv31n+yRCa/Urw2angBZaszJmZ6gAkzxMQw=
ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key;
 t=1722354146; c=relaxed/simple;
 bh=PnKgdEOUIIBSPk/uVefv3QXVSGVnQyE5KssZqy6P8Bo=;
 h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version;
 b=w3OvpN3RQhsfqdcioU95+xZe/4OyijBvddQBtJQEE+dlvjg9GvMqvJASBeeCsiF55K4M0c4R/eAYf5EP5HsP0PU3+eyjSlHuLSyDMaz5mD4qUVhAxB+Bg7D56QWPcEBfunrnyXf0tZiBnfreoISafB8hDvc7XiHXLX9sqf4+UVc=
ARC-Authentication-Results: i=1; server2.sourceware.org
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
 d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
 t=1722354142; x=1753890142;
 h=from:to:cc:subject:date:message-id:in-reply-to:
 references:mime-version:content-transfer-encoding;
 bh=PnKgdEOUIIBSPk/uVefv3QXVSGVnQyE5KssZqy6P8Bo=;
 b=HarMEpD4swjTx2vluGGzLk2FKr7oim6YfxpS9t0J9yjlnnzUXqXSDFIO
 pc7hNGNxTSfP2aPUy3m91MDUbIp3Mn4laXOyItyvWHTypvWoqkRLwbl7G
 Nzfh+t2/Vje1t2FBsn48hnUcHFk7fckPCIgjzjNPFd+A/9HG4LeSdBGDS
 WjSn+D6kbdwtMHbnLC/VR99h2c6X2vUR0mSjFUfPP539oivdzk51DAJTy
 x6RLnSnDKrEguK0o7icVx9LJuhMpXioo86bayZ74MK3U+jHGfj6tzK9as
 eEngxu1KmU7FAG9cJhBhMlWICYIN7x4qTPIxbPp71yszXQ8aLHTQZz27B A==;
X-CSE-ConnectionGUID: q2C9T74AR2Wxo3puaCldSg==
X-CSE-MsgGUID: E7cQMijPRnepYqztD8+TcQ==
X-IronPort-AV: E=McAfee;i="6700,10204,11149"; a="20336646"
X-IronPort-AV: E=Sophos;i="6.09,248,1716274800"; d="scan'208";a="20336646"
Received: from fmviesa003.fm.intel.com ([10.60.135.143])
 by orvoesa108.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 30 Jul 2024 08:42:13 -0700
X-CSE-ConnectionGUID: cA6VVIZsSyCOgCpEAxGG/w==
X-CSE-MsgGUID: QNe8T8F4Q8WGvZy50TItKw==
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="6.09,248,1716274800"; d="scan'208";a="58517829"
Received: from tassilo.jf.intel.com ([10.54.38.190])
 by fmviesa003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 30 Jul 2024 08:42:13 -0700
From: Andi Kleen <ak@linux.intel.com>
To: gcc-patches@gcc.gnu.org
Cc: Andi Kleen <ak@gcc.gnu.org>
Subject: [PATCH 2/2] Add AVX2 code path to lexer
Date: Tue, 30 Jul 2024 08:41:59 -0700
Message-ID: <20240730154159.3799008-2-ak@linux.intel.com>
X-Mailer: git-send-email 2.45.2
In-Reply-To: <20240730154159.3799008-1-ak@linux.intel.com>
References: <20240730154159.3799008-1-ak@linux.intel.com>
MIME-Version: 1.0
X-Spam-Status: No, score=-11.0 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH,
 DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, SPF_HELO_NONE, SPF_NONE,
 TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.30
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org

From: Andi Kleen <ak@gcc.gnu.org>

AVX2 is widely available on x86 and allows the scanner's line check
to process 32 bytes at a time. The code is similar to the SSE2 code
path, just using AVX2 and 32 bytes at a time instead of SSE2 and
16 bytes.
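
As a rough illustration of the block-scan idea described above, here is
a portable scalar sketch. It is not the code from this patch: the names
match_mask and search_line_blocks are hypothetical, the scalar loop
stands in for the vector compares and __builtin_ia32_pmovmskb256, and
the caller is assumed to pass a pointer into a 32-byte-aligned buffer
whose size is a multiple of 32 and which contains a match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { BLOCK = 32 };  /* bytes per step, matching one AVX2 register */

/* Return a bitmask with bit i set when block[i] is one of the four
   characters the lexer stops at: '\n', '\r', '\\' or '?'.  This scalar
   loop is a stand-in for the vector compares plus the byte-mask move.  */
static uint32_t
match_mask (const unsigned char *block)
{
  uint32_t bits = 0;
  for (unsigned i = 0; i < BLOCK; i++)
    {
      unsigned char c = block[i];
      if (c == '\n' || c == '\r' || c == '\\' || c == '?')
        bits |= (uint32_t) 1 << i;
    }
  return bits;
}

/* Hypothetical helper: find the first interesting character at or
   after S.  Assumes S points into a 32-byte-aligned buffer whose size
   is a multiple of 32 and which contains a match at or after S, so
   every aligned block read stays inside the buffer and the loop ends.  */
static const unsigned char *
search_line_blocks (const unsigned char *s)
{
  unsigned misalign = (uintptr_t) s & (BLOCK - 1);
  const unsigned char *p = s - misalign;   /* round down to block start */
  uint32_t mask = UINT32_MAX << misalign;  /* ignore bytes before S */
  uint32_t found;

  assert (((uintptr_t) p & (BLOCK - 1)) == 0);
  while ((found = match_mask (p) & mask) == 0)
    {
      p += BLOCK;
      mask = UINT32_MAX;  /* later blocks are fully valid */
    }
  /* The lowest set bit is the byte index within the block.  */
  return p + __builtin_ctz (found);
}
```

Rounding the pointer down and masking off the leading bytes keeps every
load aligned, so only the first block needs special handling.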

Also adjust the code to allow inlining when the compiler
is built for an AVX2 host, following what other architectures
do.
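
The difference between the two dispatch styles can be sketched as
follows. This is a simplified illustration under stated assumptions,
not the code from this patch: search_generic, search_wide, search_fast
as used here and init_search are hypothetical names, and the runtime
check uses the GCC/Clang builtin __builtin_cpu_supports in place of
the raw CPUID query.

```c
#include <assert.h>
#include <stddef.h>

typedef const unsigned char *(*search_fn) (const unsigned char *);

/* Hypothetical stand-ins for the real scanners; both stop at '\n'.  */
static const unsigned char *
search_generic (const unsigned char *s)
{
  while (*s != '\n')
    s++;
  return s;
}

static const unsigned char *
search_wide (const unsigned char *s)
{
  /* A real version would use 32-byte vector compares here.  */
  return search_generic (s);
}

#ifdef __AVX2__
/* The whole program is built for AVX2: bind the name directly so the
   call can be inlined and no function pointer is needed.  */
#define search_fast search_wide
static void init_search (void) {}
#else
/* Otherwise choose once at startup and call through a pointer.  */
static search_fn search_fast = search_generic;

static void
init_search (void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  if (__builtin_cpu_supports ("avx2"))
    search_fast = search_wide;
#endif
}
#endif
```

Callers write search_fast (s) in both configurations; only the AVX2
build lets the compiler see the callee at the call site.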

I see roughly a 0.6% compile-time improvement when compiling the i386
insn-recog.i with -O0.

libcpp/ChangeLog:

	* config.in (HAVE_AVX2): Add.
	* configure: Regenerate.
	* configure.ac: Add HAVE_AVX2 check.
	* lex.cc (repl_chars): Extend to 32 bytes.
	(search_line_avx2): New function to scan line using AVX2.
	(init_vectorized_lexer): Check for AVX2 in CPUID.
---
 libcpp/config.in    |  3 ++
 libcpp/configure    | 17 +++++++++
 libcpp/configure.ac |  3 ++
 libcpp/lex.cc       | 91 +++++++++++++++++++++++++++++++++++++++++++--
 4 files changed, 110 insertions(+), 4 deletions(-)

diff --git a/libcpp/config.in b/libcpp/config.in
index 253ef03a3dea..8fad6bd4b4f5 100644
--- a/libcpp/config.in
+++ b/libcpp/config.in
@@ -213,6 +213,9 @@
 /* Define to 1 if you can assemble SSE4 insns. */
 #undef HAVE_SSE4
 
+/* Define to 1 if you can assemble AVX2 insns. */
+#undef HAVE_AVX2
+
 /* Define to 1 if you have the <stddef.h> header file. */
 #undef HAVE_STDDEF_H
 
diff --git a/libcpp/configure b/libcpp/configure
index 32d6aaa30699..6d9286ac9601 100755
--- a/libcpp/configure
+++ b/libcpp/configure
@@ -9149,6 +9149,23 @@ if ac_fn_c_try_compile "$LINENO"; then :
 
 $as_echo "#define HAVE_SSE4 1" >>confdefs.h
 
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+    cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+int
+main ()
+{
+asm ("vpcmpeqb %%ymm0, %%ymm4, %%ymm5" : : "i"(0))
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+
+$as_echo "#define HAVE_AVX2 1" >>confdefs.h
+
 fi
 rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
 esac
diff --git a/libcpp/configure.ac b/libcpp/configure.ac
index b883fec776fe..c06609827924 100644
--- a/libcpp/configure.ac
+++ b/libcpp/configure.ac
@@ -200,6 +200,9 @@ case $target in
     AC_TRY_COMPILE([], [asm ("pcmpestri %0, %%xmm0, %%xmm1" : : "i"(0))],
       [AC_DEFINE([HAVE_SSE4], [1],
 		 [Define to 1 if you can assemble SSE4 insns.])])
+    AC_TRY_COMPILE([], [asm ("vpcmpeqb %%ymm0, %%ymm4, %%ymm5" : : "i"(0))],
+      [AC_DEFINE([HAVE_AVX2], [1],
+		 [Define to 1 if you can assemble AVX2 insns.])])
 esac
 
 # Enable --enable-host-shared.
diff --git a/libcpp/lex.cc b/libcpp/lex.cc
index 1591dcdf151a..72f3402aac99 100644
--- a/libcpp/lex.cc
+++ b/libcpp/lex.cc
@@ -278,19 +278,31 @@ search_line_acc_char (const uchar *s, const uchar *end ATTRIBUTE_UNUSED)
 /* Replicated character data to be shared between implementations.
    Recall that outside of a context with vector support we can't
    define compatible vector types, therefore these are all defined
-   in terms of raw characters.  */
-static const char repl_chars[4][16] __attribute__((aligned(16))) = {
+   in terms of raw characters.
+   GCC constant-propagates this and usually turns it into a
+   vector broadcast, so the table load actually disappears.  */
+
+static const char repl_chars[4][32] __attribute__((aligned(32))) = {
   { '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n',
+    '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n',
+    '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n',
     '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n' },
   { '\r', '\r', '\r', '\r', '\r', '\r', '\r', '\r',
+    '\r', '\r', '\r', '\r', '\r', '\r', '\r', '\r',
+    '\r', '\r', '\r', '\r', '\r', '\r', '\r', '\r',
     '\r', '\r', '\r', '\r', '\r', '\r', '\r', '\r' },
   { '\\', '\\', '\\', '\\', '\\', '\\', '\\', '\\',
+    '\\', '\\', '\\', '\\', '\\', '\\', '\\', '\\',
+    '\\', '\\', '\\', '\\', '\\', '\\', '\\', '\\',
     '\\', '\\', '\\', '\\', '\\', '\\', '\\', '\\' },
   { '?', '?', '?', '?', '?', '?', '?', '?',
+    '?', '?', '?', '?', '?', '?', '?', '?',
+    '?', '?', '?', '?', '?', '?', '?', '?',
     '?', '?', '?', '?', '?', '?', '?', '?' },
 };
 
 
+#ifndef __AVX2__
 /* A version of the fast scanner using SSE2 vectorized byte compare insns.  */
 
 static const uchar *
@@ -343,8 +355,9 @@ search_line_sse2 (const uchar *s, const uchar *end ATTRIBUTE_UNUSED)
   found = __builtin_ctz(found);
   return (const uchar *)p + found;
 }
+#endif
 
-#ifdef HAVE_SSE4
+#if defined(HAVE_SSE4) && !defined(__AVX2__)
 /* A version of the fast scanner using SSE 4.2 vectorized string insns.  */
 
 static const uchar *
@@ -425,6 +438,71 @@ search_line_sse42 (const uchar *s, const uchar *end)
 #define search_line_sse42 search_line_sse2
 #endif
 
+#ifdef HAVE_AVX2
+
+/* A version of the fast scanner using AVX2 vectorized byte compare insns.  */
+
+static const uchar *
+#ifndef __AVX2__
+__attribute__((__target__("avx2")))
+#endif
+search_line_avx2 (const uchar *s, const uchar *end ATTRIBUTE_UNUSED)
+{
+  typedef char v32qi __attribute__ ((__vector_size__ (32)));
+
+  const v32qi repl_nl = *(const v32qi *)repl_chars[0];
+  const v32qi repl_cr = *(const v32qi *)repl_chars[1];
+  const v32qi repl_bs = *(const v32qi *)repl_chars[2];
+  const v32qi repl_qm = *(const v32qi *)repl_chars[3];
+
+  unsigned int misalign, found, mask;
+  const v32qi *p;
+  v32qi data, t;
+
+  /* Align the source pointer.  */
+  misalign = (uintptr_t)s & 31;
+  p = (const v32qi *)((uintptr_t)s & -32);
+  data = *p;
+
+  /* Create a mask for the bytes that are valid within the first
+     32-byte block.  The Idea here is that the AND with the mask
+     within the loop is "free", since we need some AND or TEST
+     insn in order to set the flags for the branch anyway.  */
+  mask = -1u << misalign;
+
+  /* Main loop processing 32 bytes at a time.  */
+  goto start;
+  do
+    {
+      data = *++p;
+      mask = -1;
+
+    start:
+      t  = data == repl_nl;
+      t |= data == repl_cr;
+      t |= data == repl_bs;
+      t |= data == repl_qm;
+      found = __builtin_ia32_pmovmskb256 (t);
+      found &= mask;
+    }
+  while (!found);
+
+  /* FOUND contains 1 in bits for which we matched a relevant
+     character.  Conversion to the byte index is trivial.  */
+  found = __builtin_ctz (found);
+  return (const uchar *)p + found;
+}
+
+#else
+#define search_line_avx2 search_line_sse2
+#endif
+
+#ifdef __AVX2__
+/* Avoid indirect calls to encourage inlining if the compiler is built
+   using AVX2.  */
+#define search_line_fast search_line_avx2
+#else
+
 /* Check the CPU capabilities.  */
 
 #include "../gcc/config/i386/cpuid.h"
@@ -436,7 +514,7 @@ static search_line_fast_type search_line_fast;
 static inline void
 init_vectorized_lexer (void)
 {
-  unsigned dummy, ecx = 0, edx = 0;
+  unsigned dummy, ecx = 0, edx = 0, ebx = 0;
   search_line_fast_type impl = search_line_acc_char;
   int minimum = 0;
 
@@ -448,6 +526,10 @@ init_vectorized_lexer (void)
 
   if (minimum == 3)
     impl = search_line_sse42;
+  else if (__get_cpuid_max (0, &dummy) >= 7
+	       && __get_cpuid_count (7, 0, &dummy, &ebx, &dummy, &dummy)
+	       && (ebx & bit_AVX2))
+    impl = search_line_avx2;
   else if (__get_cpuid (1, &dummy, &dummy, &ecx, &edx) || minimum == 2)
     {
       if (minimum == 3 || (ecx & bit_SSE4_2))
@@ -458,6 +540,7 @@ init_vectorized_lexer (void)
 
   search_line_fast = impl;
 }
+#endif /* !__AVX2__ */
 
 #elif (GCC_VERSION >= 4005) && defined(_ARCH_PWR8) && defined(__ALTIVEC__)