From patchwork Mon Oct 28 12:33:40 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nathan Sidwell
X-Patchwork-Id: 1185337
To: GCC Patches
From: Nathan Sidwell
Subject: [C++ PATCH] simplify deferred parsing lexer
Message-ID: <8529c108-d5de-1eac-fce5-696839ee1fd9@acm.org>
Date: Mon, 28 Oct 2019 08:33:40 -0400

We use an eof_token global variable as a sentinel on a deferred parse
(such as in-class function definitions, or default args). This
complicates retrieving the next token in certain places.

As such deferred parses always nest properly and completely before
resuming the outer lexer, we can simply morph the token after the
deferred buffer into a CPP_EOF token and restore it afterwards. I
finally got around to implementing it with this patch.

One complication is that we have to change the discriminator for when
the token's value is a tree. We can't look at the token's type because
it might have been overwritten. I add a bool flag to the token (there
are several spare bits), and use that. This does simplify the
discriminator because we just check a single bit, rather than a set of
token types.

I think this will allow us to report the location of EOF tokens --
currently we think they have no location, because they're (nearly)
always the shared eof_token. That's a later change, as it could change
some diagnostic locations in the testsuite [to be better].

Applying to trunk.

nathan

2019-10-28  Nathan Sidwell

	* parser.h (struct cp_token): Drop {ENUM,BOOL}_BITFIELD C-ism.
	Add tree_check_p flag, use as nested union discriminator.
	(struct cp_lexer): Add saved_type & saved_keyword fields.
	* parser.c (eof_token): Delete.
	(cp_lexer_new_main): Always init last_token to last token of buffer.
	(cp_lexer_new_from_tokens): Overlay EOF token at end of range.
	(cp_lexer_destroy): Restore token under the EOF.
	(cp_lexer_previous_token_position): No check for eof_token here.
	(cp_lexer_get_preprocessor_token): Clear tree_check_p.
	(cp_lexer_peek_nth_token): Check CPP_EOF not eof_token.
	(cp_lexer_consume_token): Assert not CPP_EOF, no check for eof_token.
	(cp_lexer_purge_token): Likewise.
	(cp_lexer_purge_tokens_after): No check for EOF token.
	(cp_parser_nested_name_specifier, cp_parser_decltype)
	(cp_parser_template_id): Set tree_check_p.

Index: gcc/cp/parser.c
===================================================================
--- gcc/cp/parser.c	(revision 277460)
+++ gcc/cp/parser.c	(working copy)
@@ -53,9 +53,4 @@ along with GCC; see the file COPYING3.
    and c-lex.c) and the C++ parser. */
 
-static cp_token eof_token =
-{
-  CPP_EOF, RID_MAX, 0, false, false, false, 0, { NULL }
-};
-
 /* The various kinds of non integral constant we encounter. */
 enum non_integral_constant {
@@ -661,10 +656,8 @@ cp_lexer_new_main (void)
     }
 
-  lexer->last_token = lexer->buffer->address ()
+  lexer->next_token = lexer->buffer->address ();
+  lexer->last_token = lexer->next_token
                       + lexer->buffer->length ()
                       - 1;
-  lexer->next_token = lexer->buffer->length ()
-                      ? lexer->buffer->address ()
-                      : &eof_token;
 
   /* Subsequent preprocessor diagnostics should use compiler
@@ -688,5 +681,12 @@ cp_lexer_new_from_tokens (cp_token_cache
   /* We do not own the buffer. */
   lexer->buffer = NULL;
-  lexer->next_token = first == last ? &eof_token : first;
+
+  /* Insert an EOF token. */
+  lexer->saved_type = last->type;
+  lexer->saved_keyword = last->keyword;
+  last->type = CPP_EOF;
+  last->keyword = RID_MAX;
+
+  lexer->next_token = first;
   lexer->last_token = last;
 
@@ -705,5 +705,12 @@ static void
 cp_lexer_destroy (cp_lexer *lexer)
 {
-  vec_free (lexer->buffer);
+  if (lexer->buffer)
+    vec_free (lexer->buffer);
+  else
+    {
+      /* Restore the token we overwrite with EOF. */
+      lexer->last_token->type = lexer->saved_type;
+      lexer->last_token->keyword = lexer->saved_keyword;
+    }
   lexer->saved_tokens.release ();
   ggc_free (lexer);
@@ -732,6 +739,4 @@ static inline cp_token_position
 cp_lexer_token_position (cp_lexer *lexer, bool previous_p)
 {
-  gcc_assert (!previous_p || lexer->next_token != &eof_token);
-
   return lexer->next_token - previous_p;
 }
@@ -752,8 +757,5 @@ static inline cp_token_position
 cp_lexer_previous_token_position (cp_lexer *lexer)
 {
-  if (lexer->next_token == &eof_token)
-    return lexer->last_token - 1;
-  else
-    return cp_lexer_token_position (lexer, true);
+  return cp_lexer_token_position (lexer, true);
 }
 
@@ -808,4 +810,5 @@ cp_lexer_get_preprocessor_token (cp_lexe
   token->purged_p = false;
   token->error_reported = false;
+  token->tree_check_p = false;
 
   /* On some systems, some header files are surrounded by an
@@ -1083,14 +1086,7 @@ cp_lexer_peek_nth_token (cp_lexer* lexer
   --n;
   token = lexer->next_token;
-  gcc_assert (!n || token != &eof_token);
-  while (n != 0)
+  while (n && token->type != CPP_EOF)
     {
       ++token;
-      if (token == lexer->last_token)
-        {
-          token = &eof_token;
-          break;
-        }
-
       if (!token->purged_p)
         --n;
@@ -1114,16 +1110,10 @@ cp_lexer_consume_token (cp_lexer* lexer)
   cp_token *token = lexer->next_token;
 
-  gcc_assert (token != &eof_token);
   gcc_assert (!lexer->in_pragma || token->type != CPP_PRAGMA_EOL);
 
   do
     {
+      gcc_assert (token->type != CPP_EOF);
       lexer->next_token++;
-      if (lexer->next_token == lexer->last_token)
-        {
-          lexer->next_token = &eof_token;
-          break;
-        }
-
     }
   while (lexer->next_token->purged_p);
@@ -1151,5 +1141,5 @@ cp_lexer_purge_token (cp_lexer *lexer)
   cp_token *tok = lexer->next_token;
 
-  gcc_assert (tok != &eof_token);
+  gcc_assert (tok->type != CPP_EOF);
   tok->purged_p = true;
   tok->location = UNKNOWN_LOCATION;
@@ -1158,12 +1148,5 @@ cp_lexer_purge_token (cp_lexer *lexer)
 
   do
-    {
-      tok++;
-      if (tok == lexer->last_token)
-        {
-          tok = &eof_token;
-          break;
-        }
-    }
+    tok++;
   while (tok->purged_p);
   lexer->next_token = tok;
@@ -1179,10 +1162,7 @@ cp_lexer_purge_tokens_after (cp_lexer *l
   cp_token *peek = lexer->next_token;
 
-  if (peek == &eof_token)
-    peek = lexer->last_token;
-
   gcc_assert (tok < peek);
 
-  for ( tok += 1; tok != peek; tok += 1)
+  for (tok++; tok != peek; tok++)
     {
       tok->purged_p = true;
@@ -6617,4 +6597,5 @@ cp_parser_nested_name_specifier_opt (cp_
              so the memory will not be reclaimed during token replacing below. */
           token->u.tree_check_value = ggc_cleared_alloc<struct tree_check> ();
+          token->tree_check_p = true;
           token->u.tree_check_value->value = parser->scope;
           token->u.tree_check_value->checks = get_deferred_access_checks ();
@@ -14802,4 +14783,5 @@ cp_parser_decltype (cp_parser *parser)
   start_token->type = CPP_DECLTYPE;
   start_token->u.tree_check_value = ggc_cleared_alloc<struct tree_check> ();
+  start_token->tree_check_p = true;
   start_token->u.tree_check_value->value = expr;
   start_token->u.tree_check_value->checks = get_deferred_access_checks ();
@@ -16589,4 +16571,5 @@ cp_parser_template_id (cp_parser *parser
          so the memory will not be reclaimed during token replacing below. */
       token->u.tree_check_value = ggc_cleared_alloc<struct tree_check> ();
+      token->tree_check_p = true;
       token->u.tree_check_value->value = template_id;
       token->u.tree_check_value->checks = get_deferred_access_checks ();
Index: gcc/cp/parser.h
===================================================================
--- gcc/cp/parser.h	(revision 277460)
+++ gcc/cp/parser.h	(working copy)
@@ -42,21 +42,23 @@ struct GTY(()) tree_check {
 struct GTY (()) cp_token {
   /* The kind of token. */
-  ENUM_BITFIELD (cpp_ttype) type : 8;
+  enum cpp_ttype type : 8;
   /* If this token is a keyword, this value indicates which keyword.
      Otherwise, this value is RID_MAX. */
-  ENUM_BITFIELD (rid) keyword : 8;
+  enum rid keyword : 8;
   /* Token flags. */
   unsigned char flags;
   /* True if this token is from a context where it is implicitly extern "C" */
-  BOOL_BITFIELD implicit_extern_c : 1;
+  bool implicit_extern_c : 1;
   /* True if an error has already been reported for this token, such as a
      CPP_NAME token that is not a keyword (i.e., for which KEYWORD is
      RID_MAX) iff this name was looked up and found to be ambiguous. */
-  BOOL_BITFIELD error_reported : 1;
+  bool error_reported : 1;
   /* True for a token that has been purged. If a token is purged,
      it is no longer a valid token and it should be considered
      deleted. */
-  BOOL_BITFIELD purged_p : 1;
-  /* 5 unused bits. */
+  bool purged_p : 1;
+  bool tree_check_p : 1;
+  /* 4 unused bits. */
+
   /* The location at which this token was found. */
   location_t location;
@@ -64,10 +66,8 @@ struct GTY (()) cp_token {
   union cp_token_value {
     /* Used for compound tokens such as CPP_NESTED_NAME_SPECIFIER. */
-    struct tree_check* GTY((tag ("1"))) tree_check_value;
+    struct tree_check* GTY((tag ("true"))) tree_check_value;
     /* Use for all other tokens. */
-    tree GTY((tag ("0"))) value;
-  } GTY((desc ("(%1.type == CPP_TEMPLATE_ID)"
-               "|| (%1.type == CPP_NESTED_NAME_SPECIFIER)"
-               "|| (%1.type == CPP_DECLTYPE)"))) u;
+    tree GTY((tag ("false"))) value;
+  } GTY((desc ("%1.tree_check_p"))) u;
 };
 
@@ -100,4 +100,8 @@ struct GTY (()) cp_lexer {
   vec<cp_token_position> GTY ((skip)) saved_tokens;
 
+  /* Saved pieces of end token we replaced with the eof token. */
+  enum cpp_ttype saved_type : 8;
+  enum rid saved_keyword : 8;
+
   /* The next lexer in a linked list of lexers. */
   struct cp_lexer *next;
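
For readers who want the idea outside of the parser, the sentinel overlay
described in the cover text can be sketched in isolation. This is only an
illustration under stated assumptions, not the GCC code: Token, TokenType,
RangeLexer and count_tokens are hypothetical names invented here, and the
real cp_lexer also saves and restores the keyword field and skips purged
tokens.

```cpp
#include <cassert>

// Hypothetical stand-ins for cpp_ttype and cp_token; not GCC's definitions.
enum TokenType : unsigned char { TOK_NAME, TOK_SEMI, TOK_EOF };

struct Token {
  TokenType type;
  int payload;
};

// A lexer over the range [first, last) of a buffer owned by someone else.
// The token at *last is temporarily overwritten with TOK_EOF, so scanning
// loops only test the token type and never compare against an end pointer
// or a shared global sentinel.
class RangeLexer {
public:
  RangeLexer(Token *first, Token *last)
    : next_(first), last_(last), saved_type_(last->type) {
    last_->type = TOK_EOF;  // overlay the sentinel
  }

  // Restore the overwritten token before the outer lexer resumes.
  ~RangeLexer() { last_->type = saved_type_; }

  RangeLexer(const RangeLexer &) = delete;
  RangeLexer &operator=(const RangeLexer &) = delete;

  const Token &peek() const { return *next_; }

  void consume() {
    assert(next_->type != TOK_EOF);  // never step past the sentinel
    ++next_;
  }

private:
  Token *next_;
  Token *last_;
  TokenType saved_type_;
};

// Counts the tokens a nested lexer sees before it reaches the sentinel.
int count_tokens(Token *first, Token *last) {
  RangeLexer lexer(first, last);
  int n = 0;
  while (lexer.peek().type != TOK_EOF) {
    lexer.consume();
    ++n;
  }
  return n;
}
```

Because nested lexers are destroyed in strict LIFO order, each destructor puts
back exactly the type it overwrote. An empty range (first == last) needs no
special case either: the first token is itself the sentinel.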