From patchwork Mon Sep 2 11:44:18 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Tobias Burnus
X-Patchwork-Id: 1979615
Message-ID: <8a61e4a0-87c0-4df5-af8b-6178581b0cab@baylibre.com>
Date: Mon, 2 Sep 2024 13:44:18 +0200
To: gcc-patches, Richard Biener
From: Tobias Burnus
Subject: [patch] LTO/WPA: Ensure that output_offload_tables only writes table
 once [PR116535]
List-Id: Gcc-patches mailing list

The attached patch tries to fix the issue exposed by the PR:

The main ingredient is partitioning of the LTO work, e.g. by using
-flto-partition=max.

With -flto=2 (or higher, or when a jobserver has been detected), not only
is the LTO part run in parallel but also the creation of the ltrans files
themselves, i.e. stream_out_partitions in gcc/lto/lto.cc forks multiple
processes to write those files concurrently (here: -flto=2 means two
processes, each writing about half of the partitions).

For each partition, output_offload_tables is called, which in principle
would add the offload tables to each file. To prevent this, the tables
were freed after the first call in flag_wpa mode. That solves the WPA
problem, but only if all partitions are written by a single process
(e.g. -flto=1). Otherwise, each forked process has its own copy of the
data, and freeing the tables only modifies the copy belonging to that
fork.

This patch moves the logic to gcc/lto/lto.cc and sets a global variable to
ensure that the tables are only output for the first partition,
independently of whether one or several processes write the ltrans
files, trying to follow what Richard proposed in the PR?

The patch has been tested on x86-64-gnu-linux with nvptx offloading, but I
should do a full bootstrap+regtest next.

Comments, suggestions, remarks, approval?

Tobias

LTO/WPA: Ensure that output_offload_tables only writes table once [PR116535]

When ltrans was written concurrently, e.g. via -flto=N (N > 1, assuming
sufficient partitions, e.g., via -flto-partition=max),
output_offload_tables wrote the output tables once per fork.

	PR lto/116535

gcc/ChangeLog:

	* omp-offload.h (offload_output_tables_p): New extern bool var.
	* omp-offload.cc (offload_output_tables_p): Define it with value true.
	* lto-cgraph.cc (output_offload_tables): Only output tables when
	offload_output_tables_p is true.

gcc/lto/ChangeLog:

	* lto.cc (stream_out_partitions_1): Set offload_output_tables_p to false
	except for the first partition.
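
For illustration only, the fork problem described above can be reproduced
outside of GCC. The following is a minimal sketch, assuming a POSIX system;
"table", write_table_once_per_process and count_table_writes are
hypothetical stand-ins and not GCC code. Each child inherits its own copy
of the table, so "free after the first write" only suppresses further
writes within that one child.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the offload tables: non-NULL means the
   table has not yet been written by this process.  */
static int *table;

/* Mimics the old scheme: "write" the table if it is still present, then
   free it so that later calls in the same process skip it.
   Returns 1 if a write happened, 0 otherwise.  */
static int
write_table_once_per_process (void)
{
  if (table == NULL)
    return 0;
  free (table);
  table = NULL;
  return 1;
}

/* Fork NPROCS children that each handle PARTS_PER_PROC partitions and
   return the total number of table writes over all children, or -1 on
   error.  */
int
count_table_writes (int nprocs, int parts_per_proc)
{
  int started = 0, total = 0;

  table = malloc (sizeof *table);
  if (table == NULL)
    return -1;

  for (int i = 0; i < nprocs; i++)
    {
      pid_t pid = fork ();
      if (pid < 0)
	break;
      if (pid == 0)
	{
	  /* Child: works on a private copy of TABLE.  */
	  int writes = 0;
	  for (int p = 0; p < parts_per_proc; p++)
	    writes += write_table_once_per_process ();
	  _exit (writes);
	}
      started++;
    }

  /* Collect the per-child write counts via the exit status.  */
  for (int i = 0; i < started; i++)
    {
      int status;
      if (wait (&status) > 0 && WIFEXITED (status))
	total += WEXITSTATUS (status);
    }

  /* The parent's copy was never touched by the children.  */
  free (table);
  table = NULL;
  return started == nprocs ? total : -1;
}
```

In this sketch, count_table_writes (1, 4) yields one write, while
count_table_writes (2, 2) yields two, i.e. one per forked process, which
mirrors the duplication reported in the PR.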

 gcc/lto-cgraph.cc  | 16 ++++------------
 gcc/lto/lto.cc     |  3 +++
 gcc/omp-offload.cc |  2 ++
 gcc/omp-offload.h  |  1 +
 4 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/gcc/lto-cgraph.cc b/gcc/lto-cgraph.cc
index 6395033ab9d..19ac252e1b4 100644
--- a/gcc/lto-cgraph.cc
+++ b/gcc/lto-cgraph.cc
@@ -1081,8 +1081,10 @@ output_offload_tables (void)
 {
   bool output_requires = (flag_openmp
                           && (omp_requires_mask & OMP_REQUIRES_TARGET_USED) != 0);
-  if (vec_safe_is_empty (offload_funcs) && vec_safe_is_empty (offload_vars)
-      && !output_requires)
+  if (!offload_output_tables_p
+      || (vec_safe_is_empty (offload_funcs)
+          && vec_safe_is_empty (offload_vars)
+          && !output_requires))
     return;
 
   struct lto_simple_output_block *ob
@@ -1139,16 +1141,6 @@ output_offload_tables (void)
 
   streamer_write_uhwi_stream (ob->main_stream, 0);
   lto_destroy_simple_output_block (ob);
-
-  /* In WHOPR mode during the WPA stage the joint offload tables need to be
-     streamed to one partition only.  That's why we free offload_funcs and
-     offload_vars after the first call of output_offload_tables.  */
-  if (flag_wpa)
-    {
-      vec_free (offload_funcs);
-      vec_free (offload_vars);
-      vec_free (offload_ind_funcs);
-    }
 }
 
 /* Verify the partitioning of NODE.  */
diff --git a/gcc/lto/lto.cc b/gcc/lto/lto.cc
index 52dd436fd9a..69c7527d399 100644
--- a/gcc/lto/lto.cc
+++ b/gcc/lto/lto.cc
@@ -58,6 +58,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "builtins.h"
 #include "lto-common.h"
 #include "opts-jobserver.h"
+#include "omp-offload.h"
 
 /* Number of parallel tasks to run.  */
 static int lto_parallelism;
@@ -226,12 +227,14 @@ wait_for_child ()
 static void
 stream_out_partitions_1 (char *temp_filename, int blen, int min, int max)
 {
+  offload_output_tables_p = (min == 0);
   /* Write all the nodes in SET.  */
   for (int p = min; p < max; p ++)
     {
       sprintf (temp_filename + blen, "%u.o", p);
       stream_out (temp_filename, ltrans_partitions[p]->encoder, p);
       ltrans_partitions[p]->encoder = NULL;
+      offload_output_tables_p = false;
     }
 }
 
diff --git a/gcc/omp-offload.cc b/gcc/omp-offload.cc
index 934fbd80bdd..76bfda94217 100644
--- a/gcc/omp-offload.cc
+++ b/gcc/omp-offload.cc
@@ -88,6 +88,8 @@ struct oacc_loop
 /* Holds offload tables with decls.  */
 vec<tree, va_gc> *offload_funcs, *offload_vars, *offload_ind_funcs;
 
+bool offload_output_tables_p = true;
+
 /* Return level at which oacc routine may spawn a partitioned loop, or
    -1 if it is not a routine (i.e. is an offload fn).  */
 
diff --git a/gcc/omp-offload.h b/gcc/omp-offload.h
index d972bb7eafd..2d1d173016c 100644
--- a/gcc/omp-offload.h
+++ b/gcc/omp-offload.h
@@ -29,6 +29,7 @@ extern int oacc_fn_attrib_level (tree attr);
 extern GTY(()) vec<tree, va_gc> *offload_funcs;
 extern GTY(()) vec<tree, va_gc> *offload_vars;
 extern GTY(()) vec<tree, va_gc> *offload_ind_funcs;
+extern bool offload_output_tables_p;
 
 extern void omp_finish_file (void);
 extern void omp_discover_implicit_declare_target (void);
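
As a rough model of the flag logic in stream_out_partitions_1, the sketch
below counts how often the tables would be written when the partitions are
split into several ranges. It is not GCC code: output_tables_p stands in
for offload_output_tables_p, while stream_out_stub, stream_range,
count_writes_for_split and the even split of the partitions are
assumptions made for illustration. The ranges are processed sequentially
here; since every range recomputes the flag from its own "min", the
outcome should not depend on which process handles a range.

```c
#include <stdbool.h>

/* Stand-in for offload_output_tables_p.  */
static bool output_tables_p = true;

/* Number of times the (hypothetical) tables were written.  */
static int tables_written;

/* Stub for stream_out: only "writes" the tables while the flag is set.  */
static void
stream_out_stub (int partition)
{
  (void) partition;
  if (output_tables_p)
    tables_written++;
}

/* Mirrors the patched loop: the flag is true only for partition 0 and
   is cleared after the first partition of every range.  */
static void
stream_range (int min, int max)
{
  output_tables_p = (min == 0);
  for (int p = min; p < max; p++)
    {
      stream_out_stub (p);
      output_tables_p = false;
    }
}

/* Split NPARTS partitions into NCHUNKS contiguous ranges (assumed even
   split) and return how often the tables were written in total.  */
int
count_writes_for_split (int nparts, int nchunks)
{
  tables_written = 0;
  for (int c = 0; c < nchunks; c++)
    {
      int min = c * nparts / nchunks;
      int max = (c + 1) * nparts / nchunks;
      stream_range (min, max);
    }
  return tables_written;
}
```

With eight partitions, count_writes_for_split (8, 1) and
count_writes_for_split (8, 2) both report a single write, because only the
range starting at partition 0 ever sees the flag set.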