From patchwork Tue Dec 8 13:07:16 2020
X-Patchwork-Submitter: Nathan Sidwell
X-Patchwork-Id: 1412684
To: GCC Patches
From: Nathan Sidwell
Subject: c++: module directive FSM
Date: Tue, 8 Dec 2020 08:07:16 -0500

As mentioned in the preprocessor patches, there's a new kind of
preprocessor directive for modules.  It interacts with the compiler
proper, which has to stream in header-unit macro information when the
directive is an import that names a header unit.

This patch is that machinery.  It's an FSM that inspects the token
stream and does the minimal parsing needed to detect such imports.  It
ends up being called from the C++ parser's tokenizer and from the -E
tokenizer (via a lang hook).  The actual module streaming is a stub
here.

	gcc/cp/
	* cp-tree.h (module_token_pre, module_token_cdtor)
	(module_token_lang): Declare.
	* lex.c: Include langhooks.h.
	(struct module_token_filter): New.
	(module_token_pre, module_token_cdtor, module_token_lang): Define.
	* module.cc (get_module, preprocess_module, preprocessed_module):
	Nop stubs.

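For readers who want the shape of the FSM before reading the diff, here is a
minimal standalone sketch. It is not the patch's code: `Tok`, `Token`,
`ToyFilter` and `scan` are hypothetical names invented for illustration, the
token kinds stand in for cpplib's `cpp_ttype` and `RID_*` values, and the
sketch simply records directive text where the patch calls `get_module` and
`preprocess_module`. It also drops malformed names and leading-colon
partitions, which the real filter treats differently.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical token kinds for this sketch; the patch itself works on
// cpp_ttype values (CPP_NAME, CPP_HEADER_NAME, ...) and RID_* keyword codes.
enum class Tok { Export, Import, Module, Name, HeaderName, Dot, Colon,
                 Padding, Eol, Other };

struct Token { Tok kind; std::string text; };

// Toy counterpart of module_token_filter: one resume() call per token.
struct ToyFilter {
  enum State { Idle, First, Cont, End };
  State state = Idle;
  bool is_import = false, got_export = false, want_dot = false;
  std::string name;               // name accumulated so far
  std::vector<std::string> seen;  // directives recognized so far

  void resume(const Token &t) {
    switch (state) {
    case Idle:
      if (t.kind == Tok::Export)
        got_export = true;
      else if (t.kind == Tok::Import || t.kind == Tok::Module) {
        is_import = (t.kind == Tok::Import);
        state = First;
        want_dot = false;
        name.clear();
      } else if (t.kind != Tok::Padding)
        got_export = false;       // simplification: forget a stray 'export'
      break;
    case First:
      if (t.kind == Tok::Padding)
        break;
      if (is_import && t.kind == Tok::HeaderName) {
        name = t.text;            // header unit: the name is a single token
        want_dot = true;          // marks the name as complete
        state = End;
        break;
      }
      state = Cont;
      cont(t);
      break;
    case Cont:
      cont(t);
      break;
    case End:
      if (t.kind == Tok::Eol)
        finish();
      break;
    }
  }

  // Accumulate a name of the form NAME ([.:] NAME)*.
  void cont(const Token &t) {
    switch (t.kind) {
    case Tok::Padding:
      break;
    case Tok::Name:
      if (want_dot) { name.clear(); state = End; break; }   // malformed
      name += t.text;
      want_dot = true;
      break;
    case Tok::Dot:
    case Tok::Colon:
      if (!want_dot) { name.clear(); state = End; break; }  // malformed
      name += t.text;
      want_dot = false;
      break;
    case Tok::Eol:
      finish();
      break;
    default:
      state = End;                // e.g. ';': stop accumulating
      break;
    }
  }

  // End of directive: record a complete name, then go back to Idle.
  void finish() {
    assert(state != Idle);
    if (!name.empty() && want_dot)
      seen.push_back(std::string(got_export ? "export " : "")
                     + (is_import ? "import " : "module ") + name);
    is_import = got_export = false;
    state = Idle;
  }
};

// Run a whole token sequence through a fresh filter.
std::vector<std::string> scan(const std::vector<Token> &toks) {
  ToyFilter f;
  for (const Token &t : toks)
    f.resume(t);
  return f.seen;
}
```

Feeding `scan` the tokens of an import naming a header, terminated by an
end-of-directive token, yields one recorded entry, while ordinary declarations
leave the filter in its idle state. Keeping all state in a small struct and
advancing it one token at a time is what lets the same filter sit behind both
the parser's tokenizer and the -E path.
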
diff --git i/gcc/cp/cp-tree.h w/gcc/cp/cp-tree.h
index b72069eecda..aa2b0f782fa 100644
--- i/gcc/cp/cp-tree.h
+++ w/gcc/cp/cp-tree.h
@@ -6849,6 +6849,10 @@ extern void set_identifier_kind (tree, cp_identifier_kind);
 extern bool cxx_init (void);
 extern void cxx_finish (void);
 extern bool in_main_input_context (void);
+extern uintptr_t module_token_pre (cpp_reader *, const cpp_token *, uintptr_t);
+extern uintptr_t module_token_cdtor (cpp_reader *, uintptr_t);
+extern uintptr_t module_token_lang (int type, int keyword, tree value,
+                                    location_t, uintptr_t);
 
 /* in method.c */
 extern void init_method (void);
diff --git i/gcc/cp/lex.c w/gcc/cp/lex.c
index 795f5718198..6053848535e 100644
--- i/gcc/cp/lex.c
+++ w/gcc/cp/lex.c
@@ -32,6 +32,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "c-family/c-objc.h"
 #include "gcc-rich-location.h"
 #include "cp-name-hint.h"
+#include "langhooks.h"
 
 static int interface_strcmp (const char *);
 static void init_cp_pragma (void);
@@ -380,7 +381,206 @@ interface_strcmp (const char* s)
   return 1;
 }
 
-
+/* We've just read a cpp-token, figure out our next state.  Hey, this
+   is a hand-coded co-routine!  */
+
+struct module_token_filter
+{
+  enum state
+  {
+    idle,
+    module_first,
+    module_cont,
+    module_end,
+  };
+
+  enum state state : 8;
+  bool is_import : 1;
+  bool got_export : 1;
+  bool got_colon : 1;
+  bool want_dot : 1;
+
+  location_t token_loc;
+  cpp_reader *reader;
+  module_state *module;
+  module_state *import;
+
+  module_token_filter (cpp_reader *reader)
+    : state (idle), is_import (false),
+    got_export (false), got_colon (false), want_dot (false),
+    token_loc (UNKNOWN_LOCATION),
+    reader (reader), module (NULL), import (NULL)
+  {
+  };
+
+  /* Process the next token.  Note we cannot see CPP_EOF inside a
+     pragma -- a CPP_PRAGMA_EOL always happens.  */
+  uintptr_t resume (int type, int keyword, tree value, location_t loc)
+  {
+    unsigned res = 0;
+
+    switch (state)
+      {
+      case idle:
+        if (type == CPP_KEYWORD)
+          switch (keyword)
+            {
+            default:
+              break;
+
+            case RID__EXPORT:
+              got_export = true;
+              res = lang_hooks::PT_begin_pragma;
+              break;
+
+            case RID__IMPORT:
+              is_import = true;
+              /* FALLTHRU */
+            case RID__MODULE:
+              state = module_first;
+              want_dot = false;
+              got_colon = false;
+              token_loc = loc;
+              import = NULL;
+              if (!got_export)
+                res = lang_hooks::PT_begin_pragma;
+              break;
+            }
+        break;
+
+      case module_first:
+        if (is_import && type == CPP_HEADER_NAME)
+          {
+            /* A header name.  The preprocessor will have already
+               done include searching and canonicalization.  */
+            state = module_end;
+            goto header_unit;
+          }
+
+        if (type == CPP_PADDING || type == CPP_COMMENT)
+          break;
+
+        state = module_cont;
+        if (type == CPP_COLON && module)
+          {
+            got_colon = true;
+            import = module;
+            break;
+          }
+        /* FALLTHROUGH */
+
+      case module_cont:
+        switch (type)
+          {
+          case CPP_PADDING:
+          case CPP_COMMENT:
+            break;
+
+          default:
+            /* If we ever need to pay attention to attributes for
+               header modules, more logic will be needed.  */
+            state = module_end;
+            break;
+
+          case CPP_COLON:
+            if (got_colon)
+              state = module_end;
+            got_colon = true;
+            /* FALLTHROUGH */
+          case CPP_DOT:
+            if (!want_dot)
+              state = module_end;
+            want_dot = false;
+            break;
+
+          case CPP_PRAGMA_EOL:
+            goto module_end;
+
+          case CPP_NAME:
+            if (want_dot)
+              {
+                /* Got name instead of [.:].  */
+                state = module_end;
+                break;
+              }
+          header_unit:
+            import = get_module (value, import, got_colon);
+            want_dot = true;
+            break;
+          }
+        break;
+
+      case module_end:
+        if (type == CPP_PRAGMA_EOL)
+          {
+          module_end:;
+            /* End of the directive, handle the name.  */
+            if (import)
+              if (module_state *m
+                  = preprocess_module (import, token_loc, module != NULL,
+                                       is_import, got_export, reader))
+                if (!module)
+                  module = m;
+
+            is_import = got_export = false;
+            state = idle;
+          }
+        break;
+      }
+
+    return res;
+  }
+};
+
+/* Initialize or teardown.  */
+
+uintptr_t
+module_token_cdtor (cpp_reader *pfile, uintptr_t data_)
+{
+  if (module_token_filter *filter = reinterpret_cast<module_token_filter *> (data_))
+    {
+      preprocessed_module (pfile);
+      delete filter;
+      data_ = 0;
+    }
+  else if (modules_p ())
+    data_ = reinterpret_cast<uintptr_t> (new module_token_filter (pfile));
+
+  return data_;
+}
+
+uintptr_t
+module_token_lang (int type, int keyword, tree value, location_t loc,
+                   uintptr_t data_)
+{
+  module_token_filter *filter = reinterpret_cast<module_token_filter *> (data_);
+  return filter->resume (type, keyword, value, loc);
+}
+
+uintptr_t
+module_token_pre (cpp_reader *pfile, const cpp_token *tok, uintptr_t data_)
+{
+  if (!tok)
+    return module_token_cdtor (pfile, data_);
+
+  int type = tok->type;
+  int keyword = RID_MAX;
+  tree value = NULL_TREE;
+
+  if (tok->type == CPP_NAME)
+    {
+      value = HT_IDENT_TO_GCC_IDENT (HT_NODE (tok->val.node.node));
+      if (IDENTIFIER_KEYWORD_P (value))
+        {
+          keyword = C_RID_CODE (value);
+          type = CPP_KEYWORD;
+        }
+    }
+  else if (tok->type == CPP_HEADER_NAME)
+    value = build_string (tok->val.str.len, (const char *)tok->val.str.text);
+
+  return module_token_lang (type, keyword, value, tok->src_loc, data_);
+}
 
 /* Parse a #pragma whose sole argument is a string constant.
    If OPT is true, the argument is optional.  */
diff --git i/gcc/cp/module.cc w/gcc/cp/module.cc
index f250d6c1819..91a16815811 100644
--- i/gcc/cp/module.cc
+++ w/gcc/cp/module.cc
@@ -64,3 +64,20 @@ along with GCC; see the file COPYING3.  If not see
 #include "intl.h"
 #include "langhooks.h"
 
+module_state *
+get_module (tree, module_state *, bool)
+{
+  return nullptr;
+}
+
+module_state *
+preprocess_module (module_state *, unsigned, bool, bool, bool, cpp_reader *)
+{
+  return nullptr;
+}
+
+void
+preprocessed_module (cpp_reader *)
+{
+}
+
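
The init/teardown convention used by module_token_cdtor and module_token_pre
in the lex.c hunk (an opaque uintptr_t cookie, with a null token meaning
"create or destroy the filter") may be easier to follow in isolation. The
sketch below is only an illustration under assumptions: `ToyState`,
`toy_modules_enabled`, `toy_cdtor` and `toy_token` are hypothetical names, the
token is a plain C string rather than a `cpp_token`, and unlike
module_token_lang the sketch checks for a null cookie before using it.

```cpp
#include <cstdint>

// Hypothetical stand-ins: ToyState for module_token_filter, and
// toy_modules_enabled for the modules_p () check.
struct ToyState { unsigned tokens_seen = 0; };
static bool toy_modules_enabled = true;

// Null cookie: create the state (if enabled).  Non-null cookie: destroy it.
// Either way the caller stores the returned cookie.
std::uintptr_t toy_cdtor(std::uintptr_t data) {
  if (ToyState *state = reinterpret_cast<ToyState *>(data)) {
    delete state;
    data = 0;
  } else if (toy_modules_enabled)
    data = reinterpret_cast<std::uintptr_t>(new ToyState());
  return data;
}

// A null token means "initialize or tear down", and the result is the new
// cookie.  Otherwise the token is fed to the state and the result carries
// flags for the caller (always 0 in this sketch).
std::uintptr_t toy_token(const char *tok, std::uintptr_t data) {
  if (!tok)
    return toy_cdtor(data);
  if (ToyState *state = reinterpret_cast<ToyState *>(data))
    ++state->tokens_seen;
  return 0;
}
```

A caller would invoke `toy_token (nullptr, 0)` once to obtain a cookie, pass
that cookie back with every token, and finish with `toy_token (nullptr,
cookie)` to release it. Routing both setup and teardown through the token
entry point keeps the hook interface down to a single function and an integer.
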