From patchwork Tue Nov  3 21:13:51 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nathan Sidwell <nathan@acm.org>
X-Patchwork-Id: 1393416
Return-Path: <gcc-patches-bounces@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org;
 spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org
 (client-ip=8.43.85.97; helo=sourceware.org;
 envelope-from=gcc-patches-bounces@gcc.gnu.org; receiver=<UNKNOWN>)
Authentication-Results: ozlabs.org;
 dmarc=none (p=none dis=none) header.from=acm.org
Authentication-Results: ozlabs.org;
	dkim=fail reason="signature verification failed" (2048-bit key;
 unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256
 header.s=20161025 header.b=K6kPYiZf;
	dkim-atps=neutral
Received: from sourceware.org (server2.sourceware.org [8.43.85.97])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest
 SHA256)
	(No client certificate requested)
	by ozlabs.org (Postfix) with ESMTPS id 4CQjFg61rnz9sVM
	for <incoming@patchwork.ozlabs.org>; Wed,  4 Nov 2020 08:13:59 +1100 (AEDT)
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id C2FF6398743E;
	Tue,  3 Nov 2020 21:13:57 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from mail-qk1-x72d.google.com (mail-qk1-x72d.google.com
 [IPv6:2607:f8b0:4864:20::72d])
 by sourceware.org (Postfix) with ESMTPS id 4EFE7398743E
 for <gcc-patches@gcc.gnu.org>; Tue,  3 Nov 2020 21:13:54 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 4EFE7398743E
Authentication-Results: sourceware.org;
 dmarc=none (p=none dis=none) header.from=acm.org
Authentication-Results: sourceware.org;
 spf=pass smtp.mailfrom=nathanmsidwell@gmail.com
Received: by mail-qk1-x72d.google.com with SMTP id s14so16578873qkg.11
 for <gcc-patches@gcc.gnu.org>; Tue, 03 Nov 2020 13:13:54 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;
 h=sender:from:subject:to:references:message-id:date:user-agent
 :mime-version:in-reply-to:content-language;
 bh=grKX5psdx78bm9rKO9VTtb72RJX9dMdUx0AV+NIc8Lw=;
 b=K6kPYiZfEiEE8MWdCcbFE3+ipSwaG9i2OvbybkOWsanYrh0zbxy+W+KScAyrIDahVc
 IoeU7p2n0HtyVyQVtJbplg/wC4YLzAcUdxRK/rbCDwt9N6hc71fPyjnxMEXzn7rVH+KM
 fNI1rkqhbTmSlux1Xxlk1I4H4nVXd+6y5scrtg0qyC1+C9Q6AKzSMrlv+dS1d5RznCkU
 IO7laAE4h+kP9USX7sysWOz3+edVXGKxOoLzWRpNt+Hs3sJyf3a44zWpHkYUIML626Ai
 7YWcXU7Lil7MujcbDBr0ZXBFBnfPjzLMPwD1jzDjkRUBVTdbEc+56SipOWlRafvtLEV4
 qxVA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:sender:from:subject:to:references:message-id
 :date:user-agent:mime-version:in-reply-to:content-language;
 bh=grKX5psdx78bm9rKO9VTtb72RJX9dMdUx0AV+NIc8Lw=;
 b=hJFAw9Mw351sc7gKvp1/t/AgsPFdV3J+BV/xsJfA8zMfyCGAvjr0D5vYZ6UzEA2gIH
 TWZ1l4yWwQfobbOVawiR7wmL4QcSZwuVekmnoVH0/YxJSeiHWtJ8FA0iedck9lms3HqL
 gpve/mXyq9JZFMDqeTqVHW1ClGale9vS/Vy9snREtmuEFBd36galUkZ1BGknjPp/Gq9N
 yMSMImXWYnZDAz2PXYI7Nt/Sz61Dcv7v2V0X0fU4SiF1KAgv+BhXrody2XY2/+tmPP7T
 e2TpRgAWcPEjI6opyoEStHAgJdRmWlRy6y0s4mmuPK6PCtg1tnJbnSNZPqEkM2I+qSiK
 XAkQ==
X-Gm-Message-State: AOAM531UBpojftfiA6f8TCz4M/GxgxX7NKkBjeTcIuGLShzsDTtxXs6n
 Xb6SpOl96rfWnj5dbJoFGWQ=
X-Google-Smtp-Source: 
 ABdhPJyLbjkzHxiQvDUx512HpSqxlgY6lynST9vD+vp9ol9ymgKZo++W2nilAg0XLMxuP4I+yZ3DiA==
X-Received: by 2002:a37:4f0b:: with SMTP id d11mr22552046qkb.74.1604438033753;
 Tue, 03 Nov 2020 13:13:53 -0800 (PST)
Received: from ?IPv6:2620:10d:c0a8:1102:e5ef:7eb4:1609:98bb?
 ([2620:10d:c091:480::1:bc2d]) by smtp.googlemail.com with ESMTPSA id
 t184sm11898649qka.19.2020.11.03.13.13.52
 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
 Tue, 03 Nov 2020 13:13:52 -0800 (PST)
From: Nathan Sidwell <nathan@acm.org>
Subject: [04/32] cpp lexer
To: GCC Patches <gcc-patches@gcc.gnu.org>, Jason Merrill <jason@redhat.com>,
 Richard Biener <richard.guenther@gmail.com>
References: <7fc9e868-3db9-4972-ed67-6ff249f549c3@acm.org>
 <c14417b3-4286-d877-f6e2-925bfc754684@acm.org>
 <0bdf00a8-d8ad-9e97-134d-6668c0e8c86b@acm.org>
 <c55d12d9-e966-5b36-1538-98c9c4b418c8@acm.org>
Message-ID: <d10060ca-ec44-4b34-e10a-ef37ee90efe9@acm.org>
Date: Tue, 3 Nov 2020 16:13:51 -0500
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101
 Thunderbird/68.5.0
MIME-Version: 1.0
In-Reply-To: <c55d12d9-e966-5b36-1538-98c9c4b418c8@acm.org>
Content-Language: en-US
X-Spam-Status: No, score=-11.2 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_EF, FREEMAIL_FORGED_FROMDOMAIN, FREEMAIL_FROM,
 GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, RCVD_IN_DNSWL_NONE,
 SPF_HELO_NONE,
 SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces@gcc.gnu.org
Sender: "Gcc-patches" <gcc-patches-bounces@gcc.gnu.org>

c++ modules creates 2 new kinds of preprocessor lines
[export] module ...
[export] import ...

To all intents and purposes these are cppdirectives spelt without a 
leading '#'.  module and import are context-sensitive keywords.  Thus 
preprocessor tokenizing needs a bit of token peeking.  This is that peeking.

We have a new node flag 'NODE_MODULE', which marks whether an identifier 
is significant to this peeking.  When we see such an identifier at the 
beginning of a logical line, we need to peek further and figure out 
whether these are those keywords.

When successfully peeked, we replace the identifiers with internal-only 
tokens that the c++ parser recognizes.
diff --git c/libcpp/include/cpplib.h w/libcpp/include/cpplib.h
index 8e398863cf6..81be6457951 100644
--- c/libcpp/include/cpplib.h
+++ w/libcpp/include/cpplib.h
@@ -487,6 +494,9 @@ struct cpp_options
   /* Nonzero for the '::' token.  */
   unsigned char scope;
 
+  /* Nonzero means tokenize C++20 module directives.  */
+  unsigned char module_directives;
+
   /* Holds the name of the target (execution) character set.  */
   const char *narrow_charset;
 
@@ -831,6 +857,7 @@ struct GTY(()) cpp_macro {
 #define NODE_USED	(1 << 5)	/* Dumped with -dU.  */
 #define NODE_CONDITIONAL (1 << 6)	/* Conditional macro */
 #define NODE_WARN_OPERATOR (1 << 7)	/* Warn about C++ named operator.  */
+#define NODE_MODULE (1 << 8)		/* C++-20 module-related name.  */
 
 /* Different flavors of hash node.  */
 enum node_type
@@ -888,9 +915,9 @@ struct GTY(()) cpp_hashnode {
   unsigned int directive_index : 7;	/* If is_directive,
 					   then index into directive table.
 					   Otherwise, a NODE_OPERATOR.  */
-  unsigned char rid_code;		/* Rid code - for front ends.  */
+  unsigned int rid_code : 8;		/* Rid code - for front ends.  */
+  unsigned int flags : 9;		/* CPP flags.  */
   ENUM_BITFIELD(node_type) type : 2;	/* CPP node type.  */
-  unsigned int flags : 8;		/* CPP flags.  */
 
   /* 6 bits spare (plus another 32 on 64-bit hosts).  */
 
diff --git c/libcpp/lex.c w/libcpp/lex.c
index fb222924c8c..b3498f195bf 100644
--- c/libcpp/lex.c
+++ w/libcpp/lex.c
@@ -2606,6 +2622,131 @@ _cpp_temp_token (cpp_reader *pfile)
   return result;
 }
 
+/* RESULT is a CPP_NAME with NODE_MODULE set.  See if we should enter
+   deferred_pragma mode to tokenize the rest of the line.  */
+
+static void
+cpp_maybe_module_directive (cpp_reader *pfile, cpp_token *result)
+{
+  unsigned backup = 0; /* Tokens we peeked.  */
+  cpp_hashnode *node = result->val.node.node;
+  cpp_token *peek = result;
+  cpp_token *keyword = peek;
+  cpp_hashnode *(&n_modules)[spec_nodes::M_HWM][2] = pfile->spec_nodes.n_modules;
+  int header_count = 0;
+
+  /* Enter directives mode for the peeking.  */
+  pfile->state.in_deferred_pragma = true;
+  pfile->state.pragma_allow_expansion = true;
+  pfile->state.save_comments = 0;
+  pfile->directive_line = result->src_loc;
+
+  if (node == n_modules[spec_nodes::M_EXPORT][0])
+    {
+      peek = _cpp_lex_direct (pfile);
+      keyword = peek;
+      backup++;
+      if (keyword->type != CPP_NAME)
+	goto not_module;
+      node = keyword->val.node.node;
+      if (!(node->flags & NODE_MODULE))
+	goto not_module;
+    }
+
+  if (__builtin_expect (node == n_modules[spec_nodes::M__IMPORT][0], false))
+    /* __import  */
+    header_count = backup + 2 + 16;
+  else if (__builtin_expect (node == n_modules[spec_nodes::M_IMPORT][0], false))
+    /* import  */
+    header_count = backup + 2 + (CPP_OPTION (pfile, preprocessed) ? 16 : 0);
+  else if (__builtin_expect (node == n_modules[spec_nodes::M_MODULE][0], false))
+    ; /* module  */
+  else
+    goto not_module;
+
+  /* We've seen [export] {module|import|__import}.  Check the next token.  */
+  if (header_count)
+    /* After '{,__}import' a header name may appear.  */
+    pfile->state.angled_headers = true;
+  peek = _cpp_lex_direct (pfile);
+  backup++;
+
+  /* ... import followed by identifier, ':', '<' or
+     header-name preprocessing tokens, or module
+     followed by cpp-identifier, ':' or ';' preprocessing
+     tokens.  C++ keywords are not yet relevant.  */
+  if (peek->type == CPP_NAME
+      || peek->type == CPP_COLON
+      ||  (header_count
+	   ? (peek->type == CPP_LESS
+	      || (peek->type == CPP_STRING && peek->val.str.text[0] != 'R')
+	      || peek->type == CPP_HEADER_NAME)
+	   : peek->type == CPP_SEMICOLON))
+    {
+      pfile->state.pragma_allow_expansion = !CPP_OPTION (pfile, preprocessed);
+      if (!pfile->state.pragma_allow_expansion)
+	pfile->state.prevent_expansion++;
+
+      if (!header_count && linemap_included_from
+	  (LINEMAPS_LAST_ORDINARY_MAP (pfile->line_table)))
+	cpp_error_with_line (pfile, CPP_DL_ERROR, keyword->src_loc, 0,
+			     "module control-line cannot be in included file");
+
+      /* The first one or two tokens cannot be macro names.  */
+      for (int ix = backup; ix--;)
+	{
+	  cpp_token *tok = ix ? keyword : result;
+	  cpp_hashnode *node = tok->val.node.node;
+
+	  /* Don't attempt to expand the token.  */
+	  tok->flags |= NO_EXPAND;
+	  if (_cpp_defined_macro_p (node)
+	      && _cpp_maybe_notify_macro_use (pfile, node, tok->src_loc)
+	      && !cpp_fun_like_macro_p (node))
+	    cpp_error_with_line (pfile, CPP_DL_ERROR, tok->src_loc, 0, 
+				 "module control-line \"%s\" cannot be"
+				 " an object-like macro",
+				 NODE_NAME (node));
+	}
+
+      /* Map to underbar variants.  */
+      keyword->val.node.node = n_modules[header_count
+					 ? spec_nodes::M_IMPORT
+					 : spec_nodes::M_MODULE][1];
+      if (backup != 1)
+	result->val.node.node = n_modules[spec_nodes::M_EXPORT][1];
+
+      /* Maybe tell the tokenizer we expect a header-name down the
+	 road.  */
+      pfile->state.directive_file_token = header_count;
+    }
+  else
+    {
+    not_module:
+      /* Drop out of directive mode.  */
+      pfile->state.save_comments
+	= !CPP_OPTION (pfile, discard_comments);
+      pfile->state.in_deferred_pragma = false;
+      pfile->state.angled_headers = false;
+    }
+
+  /* In either case we want to backup the peeked tokens.  */
+  if (backup)
+    {
+      /* If we saw EOL, we should drop it, because this isn't a module
+	 control-line after all.  */
+      bool eol = peek->type == CPP_PRAGMA_EOL;
+      if (!eol || backup > 1)
+	{
+	  /* Put put the peeked tokens back  */
+	  _cpp_backup_tokens_direct (pfile, backup);
+	  /* But if the last one was an EOL, forget it.  */
+	  if (eol)
+	    pfile->lookaheads--;
+	}
+    }
+}
+
 /* Lex a token into RESULT (external interface).  Takes care of issues
    like directive handling, token lookahead, multiple include
    optimization and skipping.  */
@@ -2654,6 +2795,22 @@ _cpp_lex_token (cpp_reader *pfile)
 	    }
 	  else if (pfile->state.in_deferred_pragma)
 	    result = &pfile->directive_result;
+	  else if (result->type == CPP_NAME
+		   && __builtin_expect
+		   (result->val.node.node->flags & NODE_MODULE, 0)
+		   && !pfile->state.skipping
+		   /* Unlike regular directives, we do not deal with
+		      tokenizing module directives as macro arguments.
+		      That's not permitted.  */
+		   && !pfile->state.parsing_args)
+	    {
+	      /* P1857.  Before macro expansion, At start of logical
+		 line ... */
+	      /* We don't have to consider lookaheads at this point.  */
+	      gcc_checking_assert (!pfile->lookaheads);
+
+	      cpp_maybe_module_directive (pfile, result);
+	    }
 
 	  if (pfile->cb.line_change && !pfile->state.skipping)
 	    pfile->cb.line_change (pfile, result, pfile->state.parsing_args);
@@ -3446,7 +3609,11 @@ cpp_output_token (const cpp_token *token, FILE *fp)
       break;
 
     case SPELL_LITERAL:
+      if (token->type == CPP_HEADER_NAME)
+	fputc ('"', fp);
       fwrite (token->val.str.text, 1, token->val.str.len, fp);
+      if (token->type == CPP_HEADER_NAME)
+	fputc ('"', fp);
       break;
 
     case SPELL_NONE:
@@ -3932,6 +4099,188 @@ do_peek_prev (const unsigned char *peek, const unsigned char *bound)
     return peek;
 }
 
+/* If PEEK[-1] is identifier MATCH, scan past it and trailing white
+   space.  Otherwise return NULL.  */
+
+static const unsigned char *
+do_peek_ident (const char *match, const unsigned char *peek,
+	       const unsigned char *limit)
+{
+  for (; *++match; peek++)
+    if (*peek != *match)
+      {
+	peek = do_peek_next (peek, limit);
+	if (*peek != *match)
+	  return NULL;
+      }
+
+  /* Must now not be looking at an identifier char.  */
+  peek = do_peek_next (peek, limit);
+  if (ISIDNUM (*peek))
+    return NULL;
+
+  /* Skip control-line whitespace.  */
+ ws:
+  while (*peek == ' ' || *peek == '\t')
+    peek++;
+  if (__builtin_expect (*peek == '\\', false))
+    {
+      peek = do_peek_backslash (peek, limit);
+      if (*peek != '\\')
+	goto ws;
+    }
+
+  return peek;
+}
+
+/* Are we looking at a module control line starting as PEEK - 1?  */
+
+static bool
+do_peek_module (cpp_reader *pfile, unsigned char c,
+		const unsigned char *peek, const unsigned char *limit)
+{
+  bool import = false;
+
+  if (__builtin_expect (c == 'e', false))
+    {
+      if (!((peek[0] == 'x' || peek[0] == '\\')
+	    && (peek = do_peek_ident ("export", peek, limit))))
+	return false;
+
+      /* export, peek for import or module.  No need to peek __import
+	 here.  */
+      if (peek[0] == 'i')
+	{
+	  if (!((peek[1] == 'm' || peek[1] == '\\')
+		&& (peek = do_peek_ident ("import", peek + 1, limit))))
+	    return false;
+	  import = true;
+	}
+      else if (peek[0] == 'm')
+	{
+	  if (!((peek[1] == 'o' || peek[1] == '\\')
+		&& (peek = do_peek_ident ("module", peek + 1, limit))))
+	    return false;
+	}
+      else
+	return false;
+    }
+  else if (__builtin_expect (c == 'i', false))
+    {
+      if (!((peek[0] == 'm' || peek[0] == '\\')
+	    && (peek = do_peek_ident ("import", peek, limit))))
+	return false;
+      import = true;
+    }
+  else if (__builtin_expect (c == '_', false))
+    {
+      /* Needed for translated includes.   */
+      if (!((peek[0] == '_' || peek[0] == '\\')
+	    && (peek = do_peek_ident ("__import", peek, limit))))
+	return false;
+      import = true;
+    }
+  else if (__builtin_expect (c == 'm', false))
+    {
+      if (!((peek[0] == 'o' || peek[0] == '\\')
+	    && (peek = do_peek_ident ("module", peek, limit))))
+	return false;
+    }
+  else
+    return false;
+
+  /* Peek the next character to see if it's good enough.  We'll be at
+     the first non-whitespace char, including skipping an escaped
+     newline.  */
+  /* ... import followed by identifier, ':', '<' or header-name
+     preprocessing tokens, or module followed by identifier, ':' or
+     ';' preprocessing tokens.  */
+  unsigned char p = *peek++;
+      
+  /* A character literal is ... single quotes, ... optionally preceded
+     by u8, u, U, or L */
+  /* A string-literal is a ... double quotes, optionally prefixed by
+     R, u8, u8R, u, uR, U, UR, L, or LR */
+  if (p == 'u')
+    {
+      peek = do_peek_next (peek, limit);
+      if (*peek == '8')
+	{
+	  peek++;
+	  goto peek_u8;
+	}
+      goto peek_u;
+    }
+  else if (p == 'U' || p == 'L')
+    {
+    peek_u8:
+      peek = do_peek_next (peek, limit);
+    peek_u:
+      if (*peek == '\"' || *peek == '\'')
+	return false;
+
+      if (*peek == 'R')
+	goto peek_R;
+      /* Identifier. Ok.  */
+    }
+  else if (p == 'R')
+    {
+    peek_R:
+      if (CPP_OPTION (pfile, rliterals))
+	{
+	  peek = do_peek_next (peek, limit);
+	  if (*peek == '\"')
+	    return false;
+	}
+      /* Identifier. Ok.  */
+    }
+  else if ('Z' - 'A' == 25
+	   ? ((p >= 'A' && p <= 'Z') || (p >= 'a' && p <= 'z') || p == '_')
+	   : ISIDST (p))
+    {
+      /* Identifier.  Ok. */
+    }
+  else if (p == '<')
+    {
+      /* Maybe angle header, ok for import.  Reject
+	 '<=', '<<' digraph:'<:'.  */
+      if (!import)
+	return false;
+      peek = do_peek_next (peek, limit);
+      if (*peek == '=' || *peek == '<'
+	  || (*peek == ':' && CPP_OPTION (pfile, digraphs)))
+	return false;
+    }
+  else if (p == ';')
+    {
+      /* SEMICOLON, ok for module.  */
+      if (import)
+	return false;
+    }
+  else if (p == '"')
+    {
+      /* STRING, ok for import.  */
+      if (!import)
+	return false;
+    }
+  else if (p == ':')
+    {
+      /* Maybe COLON, ok.  Reject '::', digraph:':>'.  */
+      peek = do_peek_next (peek, limit);
+      if (*peek == ':' || (*peek == '>' && CPP_OPTION (pfile, digraphs)))
+	return false;
+    }
+  else
+    /* FIXME: Detect a unicode character, excluding those not
+       permitted as the initial character. [lex.name]/1.  I presume
+       we need to check the \[uU] spellings, and directly using
+       Unicode in say UTF8 form?  Or perhaps we do the phase-1
+       conversion of UTF8 to universal-character-names?  */
+    return false;
+
+  return true;
+}
+
 /* Directives-only scanning.  Somewhat more relaxed than correct
    parsing -- some ill-formed programs will not be rejected.  */
 
@@ -3940,6 +4289,8 @@ cpp_directive_only_process (cpp_reader *pfile,
 			    void *data,
 			    void (*cb) (cpp_reader *, CPP_DO_task, void *, ...))
 {
+  bool module_p = CPP_OPTION (pfile, module_directives);
+
   do
     {
     restart:
@@ -4332,6 +4683,51 @@ cpp_directive_only_process (cpp_reader *pfile,
 	      }
 	      goto dflt;
 
+	    case '_':
+	    case 'e':
+	    case 'i':
+	    case 'm':
+	      if (bol && module_p && !pfile->state.skipping
+		  && do_peek_module (pfile, c, pos, limit))
+		{
+		  /* We've seen the start of a module control line.
+		     Start up the tokenizer.  */
+		  pos--; /* Backup over the first character.  */
+
+		  /* Backup over whitespace to start of line.  */
+		  while (pos > line_start
+			 && (pos[-1] == ' ' || pos[-1] == '\t'))
+		    pos--;
+
+		  if (pos > base)
+		    cb (pfile, CPP_DO_print, data, line_count, base, pos - base);
+
+		  /* Prep things for directive handling. */
+		  buffer->next_line = pos;
+		  buffer->need_line = true;
+
+		  /* Now get tokens until the PRAGMA_EOL.  */
+		  do
+		    {
+		      location_t spelling;
+		      const cpp_token *tok
+			= cpp_get_token_with_location (pfile, &spelling);
+
+		      gcc_assert (pfile->state.in_deferred_pragma
+				  || tok->type == CPP_PRAGMA_EOL);
+		      cb (pfile, CPP_DO_token, data, tok, spelling);
+		    }
+		  while (pfile->state.in_deferred_pragma);
+
+		  if (pfile->buffer->next_line < pfile->buffer->rlimit)
+		    cb (pfile, CPP_DO_location, data,
+			pfile->line_table->highest_line);
+
+		  pfile->mi_valid = false;
+		  goto restart;
+		}
+	      goto dflt;
+
 	    default:
 	    dflt:
 	      bol = false;
diff --git c/libcpp/macro.c w/libcpp/macro.c
index e304f67c2e0..f5f280dfdc7 100644
--- c/libcpp/macro.c
+++ w/libcpp/macro.c
@@ -2930,6 +2932,85 @@ cpp_get_token_1 (cpp_reader *pfile, location_t *location)
     }
 
   pfile->about_to_expand_macro_p = saved_about_to_expand_macro;
+
+  if (pfile->state.directive_file_token
+      && !pfile->state.parsing_args
+      && !(result->type == CPP_PADDING || result->type == CPP_COMMENT)
+      && !(15 & --pfile->state.directive_file_token))
+    {
+      /* Do header-name frobbery.  Concatenate < ... > as approprate.
+	 Do header search if needed, and finally drop the outer <> or
+	 "".  */
+      pfile->state.angled_headers = false;
+
+      /* Do angle-header reconstitution.  Then do include searching.
+	 We'll always end up with a ""-quoted header-name in that
+	 case.  If searching finds nothing, we emit a diagnostic and
+	 an empty string.  */
+      size_t len = 0;
+      char *fname = NULL;
+
+      cpp_token *tmp = _cpp_temp_token (pfile);
+      *tmp = *result;
+
+      tmp->type = CPP_HEADER_NAME;
+      bool need_search = !pfile->state.directive_file_token;
+      pfile->state.directive_file_token = 0;
+
+      bool angle = result->type != CPP_STRING;
+      if (result->type == CPP_HEADER_NAME
+	  || (result->type == CPP_STRING && result->val.str.text[0] != 'R'))
+	{
+	  len = result->val.str.len - 2;
+	  fname = XNEWVEC (char, len + 1);
+	  memcpy (fname, result->val.str.text + 1, len);
+	  fname[len] = 0;
+	}
+      else if (result->type == CPP_LESS)
+	fname = _cpp_bracket_include (pfile);
+
+      if (fname)
+	{
+	  /* We have a header-name.  Look it up.  This will emit an
+	     unfound diagnostic.  Canonicalize the found name.  */
+	  const char *found = fname;
+
+	  if (need_search)
+	    {
+	      found = cpp_find_header_unit (pfile, fname, angle, tmp->src_loc);
+	      if (!found)
+		found = "";
+	      len = strlen (found);
+	    }
+	  /* Force a leading './' if it's not absolute.  */
+	  bool dotme = (found[0] == '.' ? !IS_DIR_SEPARATOR (found[1])
+			: found[0] && !IS_ABSOLUTE_PATH (found));
+
+	  if (BUFF_ROOM (pfile->u_buff) < len + 1 + dotme * 2)
+	    _cpp_extend_buff (pfile, &pfile->u_buff, len + 1 + dotme * 2);
+	  unsigned char *buf = BUFF_FRONT (pfile->u_buff);
+	  size_t pos = 0;
+	      
+	  if (dotme)
+	    {
+	      buf[pos++] = '.';
+	      /* Apparently '/' is unconditional.  */
+	      buf[pos++] = '/';
+	    }
+	  memcpy (&buf[pos], found, len);
+	  pos += len;
+	  buf[pos] = 0;
+
+	  tmp->val.str.len = pos;
+	  tmp->val.str.text = buf;
+
+	  tmp->type = CPP_HEADER_NAME;
+	  XDELETEVEC (fname);
+	  
+	  result = tmp;
+	}
+    }
+
   return result;
 }
 
diff --git c/gcc/c-family/c-lex.c w/gcc/c-family/c-lex.c
index e81e16ddc26..44575473719 100644
--- c/gcc/c-family/c-lex.c
+++ w/gcc/c-family/c-lex.c
@@ -654,8 +656,11 @@ c_lex_with_flags (tree *value, location_t *loc, unsigned char *cpp_flags,
       *value = build_int_cst (integer_type_node, tok->val.pragma);
       break;
 
-      /* These tokens should not be visible outside cpplib.  */
     case CPP_HEADER_NAME:
+      *value = build_string (tok->val.str.len, (const char *)tok->val.str.text);
+      break;
+
+      /* These tokens should not be visible outside cpplib.  */
     case CPP_MACRO_ARG:
       gcc_unreachable ();
 
diff --git c/libcpp/init.c w/libcpp/init.c
index 6c52f50de39..96ade569457 100644
--- c/libcpp/init.c
+++ w/libcpp/init.c
@@ -840,4 +855,27 @@ post_options (cpp_reader *pfile)
       CPP_OPTION (pfile, trigraphs) = 0;
       CPP_OPTION (pfile, warn_trigraphs) = 0;
     }
+
+  if (CPP_OPTION (pfile, module_directives))
+    {
+      /* These unspellable tokens have a leading space.  */
+      const char *const inits[spec_nodes::M_HWM]
+	= {"export ", "module ", "import ", "__import"};
+
+      for (int ix = 0; ix != spec_nodes::M_HWM; ix++)
+	{
+	  cpp_hashnode *node = cpp_lookup (pfile, UC (inits[ix]),
+					   strlen (inits[ix]));
+
+	  /* Token we pass to the compiler.  */
+	  pfile->spec_nodes.n_modules[ix][1] = node;
+
+	  if (ix != spec_nodes::M__IMPORT)
+	    /* Token we recognize when lexing, drop the trailing ' '.  */
+	    node = cpp_lookup (pfile, NODE_NAME (node), NODE_LEN (node) - 1);
+
+	  node->flags |= NODE_MODULE;
+	  pfile->spec_nodes.n_modules[ix][0] = node;
+	}
+    }
 }
diff --git c/libcpp/internal.h w/libcpp/internal.h
index 4759961a33a..17b65601b66 100644
--- c/libcpp/internal.h
+++ w/libcpp/internal.h
@@ -280,6 +280,9 @@ struct lexer_state
   /* Nonzero when tokenizing a deferred pragma.  */
   unsigned char in_deferred_pragma;
 
+  /* Count to token that is a header-name.  */
+  unsigned char directive_file_token;
+
   /* Nonzero if the deferred pragma being handled allows macro expansion.  */
   unsigned char pragma_allow_expansion;
 };
@@ -292,6 +295,12 @@ struct spec_nodes
   cpp_hashnode *n_false;		/* C++ keyword false */
   cpp_hashnode *n__VA_ARGS__;		/* C99 vararg macros */
   cpp_hashnode *n__VA_OPT__;		/* C++ vararg macros */
+
+  enum {M_EXPORT, M_MODULE, M_IMPORT, M__IMPORT, M_HWM};
+  
+  /* C++20 modules, only set when module_directives is in effect.
+     incoming variants [0], outgoing ones [1] */
+  cpp_hashnode *n_modules[M_HWM][2];
 };
 
 typedef struct _cpp_line_note _cpp_line_note;