From patchwork Wed Jul 5 07:56:23 2023
X-Patchwork-Submitter: Thomas Schwinge
X-Patchwork-Id: 1803460
From: Thomas Schwinge
To: Lewis Hyatt
CC: Richard Sandiford, Jakub Jelinek, David Malcolm
Subject: GTY: Enhance 'string_length' option documentation (was: 'unsigned
 int len' field in 'libcpp/include/symtab.h:struct ht_identifier' (was:
 [PATCH] pch: Fix streaming of strings with embedded null bytes))
References: <87h6qjvfp1.fsf@euler.schwinge.homeip.net>
Date: Wed, 5 Jul 2023 09:56:23 +0200
Message-ID: <878rbuvljs.fsf@euler.schwinge.homeip.net>

Hi!

On 2023-07-04T15:56:23-0400, Lewis Hyatt via Gcc-patches wrote:
> On Tue, Jul 4, 2023 at 11:50 AM Thomas Schwinge wrote:
>> I came across this one here on my way working through another (somewhat
>> related) GTY issue.  I generally do understand the issue here, but do
>> have a question about 'unsigned int len' field in
>> 'libcpp/include/symtab.h:struct ht_identifier':

[...]

> I don't think there is currently any possibility for a null byte to
> end up in an ht_identifier's string.
> I assumed that ht_identifier
> stores the length as an optimization (especially since it doesn't take
> up any extra space on 64-bit platforms, given the 32-bit hash code is
> stored as well there.)  I created the string_length GTY markup mainly
> to support another patch that I have still pending review, which I
> thought would increase the likelihood of PCH needing to handle null
> bytes in general.  When I did that, I added the markup to ht_identifier
> simply because the length was already there, so there was no reason
> not to add it.  It does save a few cycles when streaming out the PCH,
> but I doubt it is meaningful.

Thanks for confirming.

So, OK to push the attached
"GTY: Enhance 'string_length' option documentation"?


Regards
 Thomas


From a31b6657c26ac70c6e03b8ad81cdcb873f905716 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge
Date: Wed, 5 Jul 2023 08:38:49 +0200
Subject: [PATCH] GTY: Enhance 'string_length' option documentation

We're (currently) not aware of any actual use of 'ht_identifier's with NUL
characters embedded; its 'len' field appears to exist for optimization
purposes, since "forever".  Before 'struct ht_identifier' was added in
commit 2a967f3d3a45294640e155381ef549e0b8090ad4 (Subversion r42334), we had in
'gcc/cpplib.h:struct cpp_hashnode': 'unsigned short len', or earlier 'length',
earlier in 'gcc/cpphash.h:struct hashnode': 'unsigned short length', earlier
'size_t length' with comment: "length of token, for quick comparison", earlier
'int length', ever since the 'gcc/cpp*' files were added in
commit 7f2935c734c36f84ab62b20a04de465e19061333 (Subversion r9191).

This amends commit f3b957ea8b9dadfb1ed30f24f463529684b7a36a
"pch: Fix streaming of strings with embedded null bytes".

	gcc/
	* doc/gty.texi (GTY Options) : Enhance.
	libcpp/
	* include/symtab.h (struct ht_identifier): Document different
	rationale.
---
 gcc/doc/gty.texi        | 11 +++++++++++
 libcpp/include/symtab.h |  4 +---
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/gcc/doc/gty.texi b/gcc/doc/gty.texi
index 7bd064b5781..15f9fa07405 100644
--- a/gcc/doc/gty.texi
+++ b/gcc/doc/gty.texi
@@ -217,6 +217,17 @@ struct GTY(()) non_terminated_string @{
 @};
 @end smallexample
 
+Similarly, this is useful for (regular NUL-terminated) strings with
+NUL characters embedded (that the default @code{strlen} use would run
+afoul of):
+
+@smallexample
+struct GTY(()) multi_string @{
+  const char * GTY((string_length ("%h.len + 1"))) str;
+  size_t len;
+@};
+@end smallexample
+
 The @code{string_length} option currently is not supported for (fields
 in) global variables.
 @c
diff --git a/libcpp/include/symtab.h b/libcpp/include/symtab.h
index c7ccc6db9f0..0c713f2ad30 100644
--- a/libcpp/include/symtab.h
+++ b/libcpp/include/symtab.h
@@ -29,9 +29,7 @@ along with this program; see the file COPYING3.  If not see
 typedef struct ht_identifier ht_identifier;
 typedef struct ht_identifier *ht_identifier_ptr;
 struct GTY(()) ht_identifier {
-  /* This GTY markup arranges that the null-terminated identifier would still
-     stream to PCH correctly, if a null byte were to make its way into an
-     identifier somehow.  */
+  /* We know the 'len'gth of the 'str'ing; use it in the GTY markup.  */
   const unsigned char * GTY((string_length ("1 + %h.len"))) str;
   unsigned int len;
   unsigned int hash_value;
-- 
2.34.1
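
To illustrate the 'multi_string' case from the 'gty.texi' hunk, here is a minimal standalone sketch of why a 'strlen'-based size falls short once NUL characters are embedded, and what '%h.len + 1' evaluates to instead.  The GTY markup is left out so that this compiles outside of the GCC tree; the helper names 'bytes_via_strlen' and 'bytes_via_len' and the 'example' object are hypothetical, and the sketch assumes that the default sizes a string as 'strlen (str) + 1'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the documentation's 'multi_string'; GTY markup omitted.  */
struct multi_string
{
  const char *str; /* 'len' bytes, followed by a terminating NUL.  */
  size_t len;      /* Does not count the terminating NUL.  */
};

/* Bytes covered if the size is derived via 'strlen' (assumed default):
   this stops at the first NUL, embedded or not.  */
static size_t
bytes_via_strlen (const struct multi_string *s)
{
  assert (s != NULL && s->str != NULL);
  return strlen (s->str) + 1;
}

/* Bytes covered by the expression 'string_length ("%h.len + 1")':
   the stored length plus the terminating NUL.  */
static size_t
bytes_via_len (const struct multi_string *s)
{
  assert (s != NULL && s->str != NULL);
  return s->len + 1;
}

/* Hypothetical data: 'a', 'b', NUL, 'c', 'd', then the terminating NUL.  */
static const struct multi_string example = { "ab\0cd", 5 };
```

With 'example', the 'strlen'-based size covers only the part up to and including the first NUL, whereas the 'len'-based size covers the whole string, including its terminating NUL.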