From patchwork Wed Oct 16 21:09:32 2024
X-Patchwork-Submitter: Jakub Jelinek
X-Patchwork-Id: 1998271
Date: Wed, 16 Oct 2024 23:09:32 +0200
From: Jakub Jelinek
To: "Joseph S. Myers"
Cc: gcc-patches@gcc.gnu.org
Subject: [PATCH] c: Fix up speed up compilation of large char array initializers when not using #embed [PR117177]
Reply-To: Jakub Jelinek

Hi!

Apparently my
c: Speed up compilation of large char array initializers when not using #embed
patch broke building glibc.

The issue is that when using CPP_EMBED, we are guaranteed by the
preprocessor that there is CPP_NUMBER CPP_COMMA before it and
CPP_COMMA CPP_NUMBER after it (or CPP_COMMA CPP_EMBED), so RAW_DATA_CST
never ends up at the end of arrays of unknown length.

Now, the c_parser_initval optimization attempted to preserve that property
rather than changing everything that e.g. infers the number of array
elements from the initializer to deal with RAW_DATA_CST at the end, but it
didn't take into account the possibility that there could be CPP_COMMA
followed by CPP_CLOSE_BRACE (where the CPP_COMMA is redundant).

As we are already peeking at 4 tokens in that code, peeking at more would
require using raw tokens, and doing that for every pair of tokens seems
expensive because of the vec_free done when we run out of raw tokens.

So, the following patch instead detects the case where we want another
INTEGER_CST element after the RAW_DATA_CST only after consuming the tokens,
and just arranges for another process_init_element call.

Ok for trunk if this passes bootstrap/regtest?

2024-10-16  Jakub Jelinek

	PR c/117177
gcc/c/
	* c-parser.cc (c_parser_initval): Instead of doing
	orig_len == INT_MAX checks before consuming tokens to set
	last = 1, check it after consuming them and if not followed
	by CPP_COMMA CPP_NUMBER, call process_init_element once
	more with the last CPP_NUMBER.
gcc/testsuite/
	* c-c++-common/init-4.c: New test.

	Jakub

--- gcc/c/c-parser.cc.jj	2024-10-16 17:45:16.482325343 +0200
+++ gcc/c/c-parser.cc	2024-10-16 22:57:33.083698527 +0200
@@ -6529,6 +6529,7 @@ c_parser_initval (c_parser *parser, stru
 	      unsigned int i;
 	      gcc_checking_assert (len >= 64);
 	      location_t last_loc = UNKNOWN_LOCATION;
+	      location_t prev_loc = UNKNOWN_LOCATION;
 	      for (i = 0; i < 64; ++i)
 		{
 		  c_token *tok = c_parser_peek_nth_token_raw (parser, 1 + 2 * i);
@@ -6544,6 +6545,7 @@ c_parser_initval (c_parser *parser, stru
 		  buf1[i] = (char) tree_to_uhwi (tok->value);
 		  if (i == 0)
 		    loc = tok->location;
+		  prev_loc = last_loc;
 		  last_loc = tok->location;
 		}
 	      if (i < 64)
@@ -6567,6 +6569,7 @@ c_parser_initval (c_parser *parser, stru
 	      unsigned int max_len = 131072 - offsetof (struct tree_string, str) - 1;
 	      unsigned int orig_len = len;
 	      unsigned int off = 0, last = 0;
+	      unsigned char lastc = 0;
 	      if (!wi::neg_p (wi::to_wide (val)) && wi::to_widest (val) <= UCHAR_MAX)
 		off = 1;
 	      len = MIN (len, max_len - off);
@@ -6596,20 +6599,25 @@ c_parser_initval (c_parser *parser, stru
 		  if (tok2->type != CPP_COMMA && tok2->type != CPP_CLOSE_BRACE)
 		    break;
 		  buf2[i + off] = (char) tree_to_uhwi (tok->value);
-		  /* If orig_len is INT_MAX, this can be flexible array member and
-		     in that case we need to ensure another element which
-		     for CPP_EMBED is normally guaranteed after it.  Include
-		     that byte in the RAW_DATA_OWNER though, so it can be optimized
-		     later.  */
-		  if (tok2->type == CPP_CLOSE_BRACE && orig_len == INT_MAX)
-		    {
-		      last = 1;
-		      break;
-		    }
+		  prev_loc = last_loc;
 		  last_loc = tok->location;
 		  c_parser_consume_token (parser);
 		  c_parser_consume_token (parser);
 		}
+	      /* If orig_len is INT_MAX, this can be flexible array member and
+		 in that case we need to ensure another element which
+		 for CPP_EMBED is normally guaranteed after it.  Include
+		 that byte in the RAW_DATA_OWNER though, so it can be optimized
+		 later.  */
+	      if (orig_len == INT_MAX
+		  && (!c_parser_next_token_is (parser, CPP_COMMA)
+		      || c_parser_peek_2nd_token (parser)->type != CPP_NUMBER))
+		{
+		  --i;
+		  last = 1;
+		  std::swap (prev_loc, last_loc);
+		  lastc = (unsigned char) buf2[i + off];
+		}
 	      val = make_node (RAW_DATA_CST);
 	      TREE_TYPE (val) = integer_type_node;
 	      RAW_DATA_LENGTH (val) = i;
@@ -6625,6 +6633,13 @@ c_parser_initval (c_parser *parser, stru
 	      init.original_type = integer_type_node;
 	      init.m_decimal = 0;
 	      process_init_element (loc, init, false, braced_init_obstack);
+	      if (last)
+		{
+		  init.value = build_int_cst (integer_type_node, lastc);
+		  init.original_code = INTEGER_CST;
+		  set_c_expr_source_range (&init, prev_loc, prev_loc);
+		  process_init_element (prev_loc, init, false, braced_init_obstack);
+		}
 	    }
 	}
 
--- gcc/testsuite/c-c++-common/init-4.c.jj	2024-10-16 22:56:05.535934184 +0200
+++ gcc/testsuite/c-c++-common/init-4.c	2024-10-16 22:55:59.649017274 +0200
@@ -0,0 +1,97 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+unsigned char a1[] = {
+  0xc8, 0xc9, 0xca, 0xcb, 0x00, 0xcd, 0xce, 0xcf,
+  0x00, 0xd1, 0x00, 0xd3, 0xd4, 0x00, 0xd6, 0xd7,
+  0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0x00, 0x00, 0xdf,
+  0xe0, 0xe1, 0xe2, 0x00, 0xe4, 0xe5, 0xe6, 0xe7,
+  0xe8, 0xe9, 0xea, 0xeb, 0x00, 0xed, 0xee, 0xef,
+  0x00, 0xf1, 0x00, 0xf3, 0xf4, 0x00, 0xf6, 0xf7,
+  0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0x00, 0x00, 0xff,
+  0x00, 0x00, 0xc3, 0xe3, 0x00, 0x00, 0x00, 0x00,
+  0xfa, 0xfb, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+};
+unsigned char a2[] = {
+  0xc8, 0xc9, 0xca, 0xcb, 0x00, 0xcd, 0xce, 0xcf,
+  0x00, 0xd1, 0x00, 0xd3, 0xd4, 0x00, 0xd6, 0xd7,
+  0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0x00, 0x00, 0xdf,
+  0xe0, 0xe1, 0xe2, 0x00, 0xe4, 0xe5, 0xe6, 0xe7,
+  0xe8, 0xe9, 0xea, 0xeb, 0x00, 0xed, 0xee, 0xef,
+  0x00, 0xf1, 0x00, 0xf3, 0xf4, 0x00, 0xf6, 0xf7,
+  0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0x00, 0x00, 0xff,
+  0x00, 0x00, 0xc3, 0xe3, 0x00, 0x00, 0x00, 0x00
+};
+unsigned char a3[] = {
+  0xc8, 0xc9, 0xca, 0xcb, 0x00, 0xcd, 0xce, 0xcf,
+  0x00, 0xd1, 0x00, 0xd3, 0xd4, 0x00, 0xd6, 0xd7,
+  0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0x00, 0x00, 0xdf,
+  0xe0, 0xe1, 0xe2, 0x00, 0xe4, 0xe5, 0xe6, 0xe7,
+  0xe8, 0xe9, 0xea, 0xeb, 0x00, 0xed, 0xee, 0xef,
+  0x00, 0xf1, 0x00, 0xf3, 0xf4, 0x00, 0xf6, 0xf7,
+  0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0x00, 0x00, 0xff,
+  0x00, 0x00, 0xc3, 0xe3, 0x00, 0x00, 0x00, 0x00,
+};
+unsigned char a4[] = {
+  0xc8, 0xc9, 0xca, 0xcb, 0x00, 0xcd, 0xce, 0xcf,
+  0x00, 0xd1, 0x00, 0xd3, 0xd4, 0x00, 0xd6, 0xd7,
+  0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0x00, 0x00, 0xdf,
+  0xe0, 0xe1, 0xe2, 0x00, 0xe4, 0xe5, 0xe6, 0xe7,
+  0xe8, 0xe9, 0xea, 0xeb, 0x00, 0xed, 0xee, 0xef,
+  0x00, 0xf1, 0x00, 0xf3, 0xf4, 0x00, 0xf6, 0xf7,
+  0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0x00, 0x00, 0xff,
+  0x00, 0x00, 0xc3, 0xe3, 0x00, 0x00, 0x00, 0x00,
+  0xfa
+};
+unsigned char a5[] = {
+  0xc8, 0xc9, 0xca, 0xcb, 0x00, 0xcd, 0xce, 0xcf,
+  0x00, 0xd1, 0x00, 0xd3, 0xd4, 0x00, 0xd6, 0xd7,
+  0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0x00, 0x00, 0xdf,
+  0xe0, 0xe1, 0xe2, 0x00, 0xe4, 0xe5, 0xe6, 0xe7,
+  0xe8, 0xe9, 0xea, 0xeb, 0x00, 0xed, 0xee, 0xef,
+  0x00, 0xf1, 0x00, 0xf3, 0xf4, 0x00, 0xf6, 0xf7,
+  0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0x00, 0x00, 0xff,
+  0x00, 0x00, 0xc3, 0xe3, 0x00, 0x00, 0x00, 0x00,
+  0xfa,
+};
+unsigned char a6[] = {
+  0xc8, 0xc9, 0xca, 0xcb, 0x00, 0xcd, 0xce, 0xcf,
+  0x00, 0xd1, 0x00, 0xd3, 0xd4, 0x00, 0xd6, 0xd7,
+  0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0x00, 0x00, 0xdf,
+  0xe0, 0xe1, 0xe2, 0x00, 0xe4, 0xe5, 0xe6, 0xe7,
+  0xe8, 0xe9, 0xea, 0xeb, 0x00, 0xed, 0xee, 0xef,
+  0x00, 0xf1, 0x00, 0xf3, 0xf4, 0x00, 0xf6, 0xf7,
+  0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0x00, 0x00, 0xff,
+  0x00, 0x00, 0xc3, 0xe3, 0x00, 0x00, 0x00, 0x00,
+  0xfa, 0xfb
+};
+unsigned char a7[] = {
+  0xc8, 0xc9, 0xca, 0xcb, 0x00, 0xcd, 0xce, 0xcf,
+  0x00, 0xd1, 0x00, 0xd3, 0xd4, 0x00, 0xd6, 0xd7,
+  0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0x00, 0x00, 0xdf,
+  0xe0, 0xe1, 0xe2, 0x00, 0xe4, 0xe5, 0xe6, 0xe7,
+  0xe8, 0xe9, 0xea, 0xeb, 0x00, 0xed, 0xee, 0xef,
+  0x00, 0xf1, 0x00, 0xf3, 0xf4, 0x00, 0xf6, 0xf7,
+  0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0x00, 0x00, 0xff,
+  0x00, 0x00, 0xc3, 0xe3, 0x00, 0x00, 0x00, 0x00,
+  0xfa, 0xfb
+};
+
+int
+main ()
+{
+  if (sizeof (a1) != 72
+      || sizeof (a2) != 64
+      || __builtin_memcmp (a1, a2, 64) != 0
+      || sizeof (a3) != 64
+      || __builtin_memcmp (a1, a3, 64) != 0
+      || sizeof (a4) != 65
+      || __builtin_memcmp (a1, a4, 65) != 0
+      || sizeof (a5) != 65
+      || __builtin_memcmp (a1, a5, 65) != 0
+      || sizeof (a6) != 66
+      || __builtin_memcmp (a1, a6, 66) != 0
+      || sizeof (a7) != 66
+      || __builtin_memcmp (a1, a7, 66) != 0)
+    __builtin_abort ();
+}