From patchwork Sun Nov 3 22:22:05 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lewis Hyatt X-Patchwork-Id: 2005817 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256 header.s=20230601 header.b=bOWSo6Wq; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4XhTdC39MCz1xxN for ; Mon, 4 Nov 2024 09:23:06 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id D93F13857B8C for ; Sun, 3 Nov 2024 22:23:04 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-qv1-xf30.google.com (mail-qv1-xf30.google.com [IPv6:2607:f8b0:4864:20::f30]) by sourceware.org (Postfix) with ESMTPS id 662953858D33 for ; Sun, 3 Nov 2024 22:22:35 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 662953858D33 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 662953858D33 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2607:f8b0:4864:20::f30 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1730672561; cv=none; b=Wkk3GEWo3NQZ0Rq0nO0SjE29BGZd+aqdYV5VYofJUGPIGVBu5mwusuqdtHbDGV09icf4DWXEzh8+VEXfFImPMbXz/wI+sysZz9C0kwaBL0XoBveffeDA6Jjoj90ZtWO6y083WLM5HxWJ0GdLlMqXXDYK6ksZ+eXXD820BKc4Qbo= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1730672561; c=relaxed/simple; bh=PSOHlP1gzjVyz47mywHfHw2V9usR/uoEl1da0uGXbG4=; h=DKIM-Signature:From:To:Subject:Date:Message-Id:MIME-Version; b=BKSjX0+u9R/NXmLAnw2pv8kDuEFkUYGJq5s8lovfWjPrPDE/bGS9cX6eARNeNB0+7l7u8paDgPOft26LGW2D593evOxxo+HPHot1wEiiFMDg6hrYu2ISPri4QWx6qxva52bGqRhIdo2lHrFRxA8k7jEuPd+fXQTHvk+8dXcY1Qg= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-qv1-xf30.google.com with SMTP id 6a1803df08f44-6cd7a2ed34bso27286376d6.2 for ; Sun, 03 Nov 2024 14:22:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1730672555; x=1731277355; darn=gcc.gnu.org; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:from:to:cc:subject:date:message-id:reply-to; bh=Yg5UXTLP++U/njIvPSSTGNXTmN/w8/JVYzUerIet06A=; b=bOWSo6WqmrgxD737o87P402WbAFj82GMdlUPjS2nfmbCDddiqh1EyYMvTaXzYyUbtA 71fihQb8Y3yysxRWOKLyzqj4v4mVTJvL7M8573L//O+JdI0ejiazRIZeVMAKNOhuEOIr pZN3IVXSsoxC0alc/od6TAjQbDu4Z54RyJM9Nvje5xU1E1Fz8UhoQx73rlZj5JCSJ65b Zfko62Y2Pz6sNcK6MDX290v6+6KG1ghnUhliOWYfnh39lj/JWqkZHt56cypc3vSAPtNP BiiV46Q98eMygw1cdqOzycATmXqDLzGfv9xCQxWQ/fFWlmsqsLv+/TonEH7Tpwm0WcQv yzng== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1730672555; x=1731277355; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=Yg5UXTLP++U/njIvPSSTGNXTmN/w8/JVYzUerIet06A=; b=oa+sf1uK6Nv9pagltaKz4mwLf5NekGC0Y56zQ3XtIT9wI3oP7nXHw+B7+j1A8hWeBj 1Dyi10+tAdP8WnIYnjIpewrYrv6dHkB8zZmku5R5BhljBV/+rsNS+UEf3wZBJuYLfEgi uCH2/Z8rbTAY/qwkA7n90WNESMo4pjb5KNxL8yR6Qpf5KrUKtVLestVwaB+w2KOKXc9P AUdZXtI+xjm0TcA90zRJLj/wzn49o3CQmxAaOm5RS7v48ulZUXk8BEmQNyJt4KPa5Rmy Ski0pLqyUpUt4hn1m66f1/kkBffEUjEN4iuqQnm3vd1P5noSnZuaothj6pB2YR3B+yOT e8Jg== X-Gm-Message-State: AOJu0Yyre+KB9QUuzoGHF5C0JcU2OwZQtiCQz8d3zQ3AIAqGQwmhQIIO yrDNkZxpM7lqOxLsZYT5SWWc/PReIuR7kXGJfjJncxVSD2QRiFn9+ISL1g== X-Google-Smtp-Source: AGHT+IH5N2aSJTPKlxI0bdBllOcwU7imihHpehGC5MgJpA3wZ5EdS8OXc++TyENDt8nV6eILgpDFog== X-Received: by 2002:a05:6214:2f84:b0:6cb:4b70:8ead with SMTP id 6a1803df08f44-6d35c19bc65mr173945226d6.37.1730672554604; Sun, 03 Nov 2024 14:22:34 -0800 (PST) Received: from localhost.localdomain ([173.54.240.58]) by smtp.gmail.com with ESMTPSA id 6a1803df08f44-6d35415b2fesm42025866d6.92.2024.11.03.14.22.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 03 Nov 2024 14:22:33 -0800 (PST) From: Lewis Hyatt To: gcc-patches@gcc.gnu.org Cc: Lewis Hyatt Subject: [PATCH 00/15] Support for 64-bit location_t Date: Sun, 3 Nov 2024 17:22:05 -0500 Message-Id: <20241103222220.933471-1-lhyatt@gmail.com> X-Mailer: git-send-email 2.34.1 MIME-Version: 1.0 X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org Hello- There is no shortage of PRs complaining about things that go wrong when the line_maps data structure in libcpp starts to run into its limits. Being restricted to a 32-bit location_t to cover all source locations means it has the following limitations, among others: -Column numbers larger than 2^12 are not tracked at all, so diagnostics can only refer to the whole line, and things like -Wmisleading-indentation stop working with large column numbers. [PR89549] -Tokens longer than 32 characters cannot be stored compactly in the line map and have to be stored as ad-hoc locations with the associated memory and locality overhead. -Sufficiently large sources can exhaust the available location_t space. The current limitations are not encountered too much, but it does happen often enough, especially in testsuites or large PCH compiles that include, e.g., every header in a large project. Currently, the main workaround is to use the option -flarge-source-locations. This sets the range bits in the line_maps to 0, effectively meaning that every source code token is stored as an ad-hoc location. This does significantly increase the number of locations that can be stored, but it loses all the benefits of the line_maps infrastructure, and it does not help with the 4096 limit for column numbers. It seemed to me like there's no way to improve on that other than letting location_t be 64 bits instead of 32 bits. I thought I would take a look at what that would entail... the attached 15 (mostly small) patches are the result. It was unfortunately necessary to touch a lot of different parts of the compiler, since location_t is used pretty pervasively. This patch set adds a configure option --enable-large-source-locations that makes location_t be 64 bits. The option is off by default. There are decisions to be made about what to do with all those new bits. I made some choices in this patch, but it's easy to change if different tradeoffs are preferable. Here is what I did: - Only 63 bits are used, the high bit is not used so that two location_t can be safely subtracted and stored in an int64_t. - As in the 32-bit case, half of the available locations are reserved for ad-hoc locations, and the other half for macros and ordinary locations. - The limit on the maximum column number is removed, so it's now set to 2^31-1. (GCC right now pervasively assumes line and column numbers within a file can be represented by a 32-bit signed int. That could be changed too, but it is not part of this patch and would be another similarly large change.) - The default range bits is changed from 5 to 7. This means that tokens up to 128 chars in length get ordinary locations, while longer ones get ad-hoc locations. I think that's a big improvement from the current max of 32 and will result in a lot fewer ad-hoc locations being generated. There are probably enough bits to go around, though, that this could be made even larger too. All that stuff is configured in line-map.h and is easy to change. The bulk of the patch is dealing with the fallout from making location_t's type configurable; in particular, C++ modules needed a lot of tweaks, since they deal with line_map internals, and I had to add a new field to RTL which necessitated a lot of small changes. I was not previously familiar with RTL... I think I generally grokked the idea there and I think the changes I made are safe, but I am pretty sure some of them are unnecessary so would appreciate feedback on that too. I tried to test on a variety of platforms. For each of them, I did three bootstrap + regtest (without the patch, with the patch and without --enable-large-source-locations, and with the patch and with --enable-large-source-locations) with --enable-checking=yes,extra,rtl and verified no change in the testsuite other than the expected new PASSes. Here is what I have tested so far: all languages: x86_64, aarch64 (cfarm185), ppc64le (cfarm135) c/c++ only: sparc 32-bit (cfarm216) The testing on sparc 32-bit was very productive and in particular identified a few issues with padding and alignment that have not been apparent until now. That platform differs from x86 32-bit in that uint64_t has a larger alignment than a pointer does. Also it's worth noting, even if there is no interest in supporting this feature, patches 2, 3, 4, 5, and 6 I think are worth accepting regardless; these are small bug fixes that came up in testing and are independent of whether location_t is changed or not... they could be triggered by other changes in the future too. Regarding memory usage implications of this change. It does make a lot of data structures larger. Here is what it does on x86-64. The base sizes of some types change as follows: location_t: From 4 to 8 (+100%) line_map_ordinary: From 24 to 32 (+33%) line_map_macro: From 32 to 40 (+25%) cpp_token: From 24 to 32 (+33%) cpp_macro: From 48 to 56 (+17%) tree_exp: From 32 to 40 (+25%) gimple: From 40 to 48 (+20%) The compiled size of stdc++.h.gch increases from 210MB to 219MB (+4%). Compiling testsuite/g++.dg/modules/xtreme-header.h into a module, I see: size of the output module: From 21.9MB to 21.6MB (-1%) peak memory usage: From 211MB to 235MB (+11%) The reason the module gets smaller is that it streams ordinary locations more efficiently than adhoc locations, and with the 64-bit location_t, the number of adhoc locations was reduced from 189,891 to 101,571. I'm curious to hear your thoughts on this one please. I think it could be nice potentially to get it into GCC 15, and consider changing the default in GCC 16 or later, if you think it's worth the trouble. If it seems too invasive a change and not worth it, I'll certainly understand, and I have enjoyed the tour through the compiler internals in any case. FWIW I feel like this issue will need to be dealt with some time or other. I think users generally expect not to run into arbitrary limits, and compiling a lot of headers together into one module is going to become possibly more common as modules become available by default. I think any attempt to deal with it will have to share at least a lot of the features of this one. -Lewis 01/15: Support for 64-bit location_t: libcpp parts 02/15: libcpp: Fix potential unaligned access in cpp_buffer 03/15: tree-cfg: Fix call to next_discriminator_for_locus() 04/15: tree-phinodes: Use 4 instead of 2 as the minimum number of phi args 05/15: c++: Fix tree_contains_struct for TRAIT_EXPR 06/15: gimple: Handle tail padding when computing gimple_ops_offset 07/15: Support for 64-bit location_t: toplev parts 08/15: Support for 64-bit location_t: Analyzer parts 09/15: Support for 64-bit location_t: Frontend parts 10/15: Support for 64-bit location_t: C++ modules parts 11/15: Support for 64-bit location_t: RTL parts 12/15: Support for 64-bit location_t: Backend parts 13/15: Support for 64-bit location_t: Internal parts 14/15: Support for 64-bit location_t: Testsuite parts 15/15: Support for 64-bit location_t: Configury parts 64 files changed, 841 insertions(+), 386 deletions(-)