From patchwork Fri Jun 28 10:24:46 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Stubbs
X-Patchwork-Id: 1953875
From: Andrew Stubbs
To: gcc-patches@gcc.gnu.org
Cc: tburnus@baylibre.com, jakub@redhat.com
Subject: [PATCH v2 5/8] amdgcn, openmp: Auto-detect USM mode and set HSA_XNACK
Date: Fri, 28 Jun 2024 10:24:46 +0000
Message-ID: <20240628102449.562467-6-ams@baylibre.com>
X-Mailer: git-send-email 2.41.0
In-Reply-To: <20240628102449.562467-1-ams@baylibre.com>
References: <20240628102449.562467-1-ams@baylibre.com>
List-Id: Gcc-patches mailing list

From: Andrew Stubbs

The AMD GCN runtime must be set to
the correct mode for Unified Shared Memory to work, but this is not always
clear at compile and link time due to the split nature of the offload
compilation pipeline.

This patch sets a new attribute on OpenMP offload functions to ensure that
the information is passed all the way to the backend.  The backend then
places a marker in the assembler code for mkoffload to find.  Finally,
mkoffload places a constructor function into the final program to ensure
that the HSA_XNACK environment variable passes the correct mode to the GPU.

The HSA_XNACK variable must be set before the HSA runtime is even loaded, so
it makes more sense to have this set within the constructor than at some
point later within libgomp or the GCN plugin.

Other toolchains require the end-user to set HSA_XNACK manually (or else
wonder why it's not working), so the constructor also checks that any
existing manual setting is compatible with the binary's requirements.

gcc/ChangeLog:

	* config/gcn/gcn.cc (unified_shared_memory_enabled): New variable.
	(gcn_init_cumulative_args): Handle attribute "omp unified memory".
	(gcn_hsa_declare_function_name): Emit "MKOFFLOAD OPTIONS: USM+".
	* config/gcn/mkoffload.cc (TEST_XNACK_OFF): New macro.
	(process_asm): Detect "MKOFFLOAD OPTIONS: USM+".
	Emit configure_xnack constructor, as required.
	* omp-low.cc (create_omp_child_function): Add attribute "omp unified
	memory".
---
 gcc/config/gcn/gcn.cc       | 32 +++++++++++++++++++++++++++++++-
 gcc/config/gcn/mkoffload.cc | 35 ++++++++++++++++++++++++++++++++++-
 gcc/omp-low.cc              |  4 ++++
 3 files changed, 69 insertions(+), 2 deletions(-)

diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index d6531f55190..6a83ff2a1b4 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -70,6 +70,11 @@ static bool ext_gcn_constants_init = 0;
 
 enum gcn_isa gcn_isa = ISA_GCN3;	/* Default to GCN3.  */
 
+/* Record whether the host compiler added "omp unified memory" attributes to
+   any functions.  We can then pass this on to mkoffload to ensure xnack is
+   compatible there too.  */
+static bool unified_shared_memory_enabled = false;
+
 /* Reserve this much space for LDS (for propagating variables from
    worker-single mode to worker-partitioned mode), per workgroup.  Global
    analysis could calculate an exact bound, but we don't do that yet.
@@ -2942,6 +2947,29 @@ gcn_init_cumulative_args (CUMULATIVE_ARGS *cum /* Argument info to init */ ,
   if (!caller && cfun->machine->normal_function)
     gcn_detect_incoming_pointer_arg (fndecl);
 
+  if (fndecl && lookup_attribute ("omp unified memory",
+                                  DECL_ATTRIBUTES (fndecl)))
+    {
+      unified_shared_memory_enabled = true;
+
+      switch (gcn_arch)
+        {
+        case PROCESSOR_FIJI:
+        case PROCESSOR_VEGA10:
+        case PROCESSOR_VEGA20:
+        case PROCESSOR_GFX908:
+        case PROCESSOR_GFX1030:
+        case PROCESSOR_GFX1036:
+        case PROCESSOR_GFX1100:
+        case PROCESSOR_GFX1103:
+          error ("GPU architecture does not support Unified Shared Memory");
+          break;
+        default:
+          if (flag_xnack == HSACO_ATTR_OFF)
+            error ("Unified Shared Memory is enabled, but XNACK is disabled");
+        }
+    }
+
   reinit_regs ();
 }
 
@@ -6820,12 +6848,14 @@ gcn_hsa_declare_function_name (FILE *file, const char *name,
   fputs (",@function\n", file);
   ASM_OUTPUT_FUNCTION_LABEL (file, name, decl);
 
-  /* This comment is read by mkoffload.  */
+  /* These comments are read by mkoffload.  */
   if (flag_openacc)
     fprintf (file, "\t;; OPENACC-DIMS: %d, %d, %d : %s\n",
              oacc_get_fn_dim_size (cfun->decl, GOMP_DIM_GANG),
              oacc_get_fn_dim_size (cfun->decl, GOMP_DIM_WORKER),
              oacc_get_fn_dim_size (cfun->decl, GOMP_DIM_VECTOR), name);
+  if (unified_shared_memory_enabled)
+    fprintf (asm_out_file, "\t;; MKOFFLOAD OPTIONS: USM+\n");
 }
 
 /* Implement TARGET_ASM_SELECT_SECTION.
diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index 810298a799b..3dcb6943c45 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -487,6 +487,7 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
 {
   int fn_count = 0, var_count = 0, ind_fn_count = 0;
   int dims_count = 0, regcount_count = 0;
+  bool unified_shared_memory_enabled = false;
   struct obstack fns_os, dims_os, regcounts_os;
   obstack_init (&fns_os);
   obstack_init (&dims_os);
@@ -511,6 +512,7 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
   fn_count += 2;
 
   char buf[1000];
+  char dummy;
   enum
     { IN_CODE,
       IN_METADATA,
@@ -531,6 +533,9 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
                 dims_count++;
               }
 
+            if (sscanf (buf, " ;; MKOFFLOAD OPTIONS: USM+%c", &dummy) > 0)
+              unified_shared_memory_enabled = true;
+
             break;
           }
         case IN_METADATA:
@@ -591,7 +596,6 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
           }
         }
 
-      char dummy;
       if (sscanf (buf, " .section .gnu.offload_vars%c", &dummy) > 0)
         {
           state = IN_VARS;
@@ -660,6 +664,7 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
   fprintf (cfile, "#include \n");
   fprintf (cfile, "#include \n");
   fprintf (cfile, "#include \n\n");
+  fprintf (cfile, "#include \n\n");
 
   fprintf (cfile, "static const int gcn_num_vars = %d;\n\n", var_count);
   fprintf (cfile, "static const int gcn_num_ind_funcs = %d;\n\n", ind_fn_count);
@@ -713,6 +718,34 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
            "}\n\n",
            gcn_stack_size);
 
+  /* Emit a constructor function to set the HSA_XNACK environment variable.
+     This must be done before the ROCr runtime library is loaded.
+     We never override a user value (except empty string), but we do emit a
+     useful diagnostic in the wrong mode (the ROCr message is not good).  */
+  if (TEST_XNACK_OFF (elf_flags) && unified_shared_memory_enabled)
+    fatal_error (input_location,
+                 "conflicting settings; XNACK is forced off but Unified "
+                 "Shared Memory is on");
+  if (!TEST_XNACK_ANY (elf_flags) || unified_shared_memory_enabled)
+    fprintf (cfile,
+             "static __attribute__((constructor))\n"
+             "void configure_xnack (void)\n"
+             "{\n"
+             "  const char *val = getenv (\"HSA_XNACK\");\n"
+             "  if (!val || val[0] == '\\0')\n"
+             "    setenv (\"HSA_XNACK\", \"%d\", true);\n"
+             "  else if (%s)\n"
+             "    {\n"
+             "      fprintf (stderr, \"error: HSA_XNACK=%%s is incompatible; "
+             "please unset\\n\", val);\n"
+             "      exit (1);\n"
+             "    }\n"
+             "}\n\n",
+             unified_shared_memory_enabled || TEST_XNACK_ON (elf_flags),
+             (unified_shared_memory_enabled || TEST_XNACK_ON (elf_flags)
+              ? "val[0] != '1' || val[1] != '\\0'"
+              : "val[0] == '1' && val[1] == '\\0'"));
+
   obstack_free (&fns_os, NULL);
   for (i = 0; i < dims_count; i++)
     free (dims[i].name);
diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index d3f9ccc4567..b21b48280f2 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -2124,6 +2124,10 @@ create_omp_child_function (omp_context *ctx, bool task_copy)
         DECL_ATTRIBUTES (decl)
           = tree_cons (get_identifier (target_attr),
                        NULL_TREE, DECL_ATTRIBUTES (decl));
+      if (omp_requires_mask & OMP_REQUIRES_UNIFIED_SHARED_MEMORY)
+        DECL_ATTRIBUTES (decl)
+          = tree_cons (get_identifier ("omp unified memory"),
+                       NULL_TREE, DECL_ATTRIBUTES (decl));
     }
 
   t = build_decl (DECL_SOURCE_LOCATION (decl),
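
Illustration only, not part of the patch: a minimal C sketch of the two
checks this change relies on, written so they can be tried in isolation.
It shows the marker match that process_asm applies to each assembler line,
and the decision the emitted configure_xnack constructor makes about an
existing HSA_XNACK value.  The names line_has_usm_marker, classify_xnack and
enum xnack_action are hypothetical and exist only for this sketch; the patch
itself inlines both checks.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result type for this sketch; the patch has no such enum.  */
enum xnack_action
{
  XNACK_SET_DEFAULT,	/* Unset or empty: the constructor would setenv.  */
  XNACK_KEEP_USER,	/* User value is compatible: leave it alone.  */
  XNACK_CONFLICT	/* User value is incompatible: report and exit.  */
};

/* Mirror the sscanf test used in process_asm.  The trailing %c only
   converts if the whole literal prefix matched and at least one more
   character (normally the newline) follows it.  */
int
line_has_usm_marker (const char *line)
{
  char dummy;
  return sscanf (line, " ;; MKOFFLOAD OPTIONS: USM+%c", &dummy) > 0;
}

/* Mirror the condition baked into the generated constructor.  VAL is the
   current HSA_XNACK value (may be NULL); WANT_ON is nonzero when the
   binary needs XNACK enabled.  Only the exact string "1" counts as "on".  */
enum xnack_action
classify_xnack (const char *val, int want_on)
{
  if (!val || val[0] == '\0')
    return XNACK_SET_DEFAULT;

  int is_one = strcmp (val, "1") == 0;
  if (want_on ? !is_one : is_one)
    return XNACK_CONFLICT;

  return XNACK_KEEP_USER;
}
```

Under these assumptions, classify_xnack (getenv ("HSA_XNACK"), 1) yields
XNACK_SET_DEFAULT when the variable is unset or empty, which is the only
case in which the real constructor calls setenv; any other value is either
kept or rejected, never overridden.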