From patchwork Wed Nov  6 15:27:20 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Stubbs <ams@baylibre.com>
X-Patchwork-Id: 2007599
Return-Path: <gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@legolas.ozlabs.org
Authentication-Results: legolas.ozlabs.org;
	dkim=pass (2048-bit key;
 unprotected) header.d=baylibre-com.20230601.gappssmtp.com
 header.i=@baylibre-com.20230601.gappssmtp.com header.a=rsa-sha256
 header.s=20230601 header.b=JJwu8Trz;
	dkim-atps=neutral
Authentication-Results: legolas.ozlabs.org;
 spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org
 (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org;
 envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org;
 receiver=patchwork.ozlabs.org)
Received: from server2.sourceware.org (server2.sourceware.org
 [IPv6:2620:52:3:1:0:246e:9693:128c])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384)
	(No client certificate requested)
	by legolas.ozlabs.org (Postfix) with ESMTPS id 4Xk8HC5d3Dz1xxf
	for <incoming@patchwork.ozlabs.org>; Thu,  7 Nov 2024 02:28:23 +1100 (AEDT)
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id F0D963858C53
	for <incoming@patchwork.ozlabs.org>; Wed,  6 Nov 2024 15:28:21 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from mail-pl1-x632.google.com (mail-pl1-x632.google.com
 [IPv6:2607:f8b0:4864:20::632])
 by sourceware.org (Postfix) with ESMTPS id D7A023858D28
 for <gcc-patches@gcc.gnu.org>; Wed,  6 Nov 2024 15:27:39 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org D7A023858D28
Authentication-Results: sourceware.org;
 dmarc=none (p=none dis=none) header.from=baylibre.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=baylibre.com
ARC-Filter: OpenARC Filter v1.0.0 sourceware.org D7A023858D28
Authentication-Results: server2.sourceware.org;
 arc=none smtp.remote-ip=2607:f8b0:4864:20::632
ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1730906863; cv=none;
 b=OeFNtbidLaXG2tzKmavxuuRYQ5f7UKrIBxtouz+O/F1BhtB1be3Lw2FNY357lIscbkRgZJNtZmxJ8JphklvJbQp24hEWB8EQH3EK0eZdoETiSBm1MUwJ2/TlUvC+dqyzMdM77bn738RFk9Pv9z2+QtH+yCrdQJHlaFeDaXIoy5k=
ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key;
 t=1730906863; c=relaxed/simple;
 bh=lHwiq8HaVeTLmTXHAz4O2N/rkRzcrb0N3FtuVjiQprY=;
 h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version;
 b=EESDEmGNzclABrdalLTnWP7R8dclKnfuAq+FNSY9hU6yw+fMGVlcQ5UiDTffBTgsEZ79fNFGvEIQGB+7vs3KmDPk5MqDnmMuZ+nnu6hLHCUVvWLzsZgICrbEMMVGBSqQhgALq0H/AI/iF3hqXZtF+od93eNdmifjMTn/WHaGtfc=
ARC-Authentication-Results: i=1; server2.sourceware.org
Received: by mail-pl1-x632.google.com with SMTP id
 d9443c01a7336-20cbcd71012so78898755ad.3
 for <gcc-patches@gcc.gnu.org>; Wed, 06 Nov 2024 07:27:39 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=baylibre-com.20230601.gappssmtp.com; s=20230601; t=1730906859;
 x=1731511659;
 darn=gcc.gnu.org;
 h=content-transfer-encoding:mime-version:message-id:date:subject:cc
 :to:from:from:to:cc:subject:date:message-id:reply-to;
 bh=wskvDqJJs4wR+M6qWW0jTNSB0NkZ9/8k+jkIYdlnK4s=;
 b=JJwu8TrzushZV7yhcM1Vl8IOebwnlo5uoGmyQeEhi7sibC6nkErf5beC8fmtcY/Ppq
 G1hfuEoLitiqVpTH4HTLre+6IWepUlHRBClXPkoaD8QH8SD/YxEkbAy0ZxJ+6XzxWZWt
 xGMbF+S/wircRTSijbQ8Q7C9zAyeuT8VDRX9mboW56ir5uhmM1WIXvhYDLw1biDdnLq5
 r0wiOitsbs1Rpj6NmEbYuFGq/U97g/DerIX65P3zgsWVCYb9jymXQCDFcfxI+B078MsK
 Yq4lIyCnQxXyWIK4YnykdA+n9rINpFPMV3g9yxbkPVjmir3H3/qOgMiawRzpbft9Kxs4
 b57A==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20230601; t=1730906859; x=1731511659;
 h=content-transfer-encoding:mime-version:message-id:date:subject:cc
 :to:from:x-gm-message-state:from:to:cc:subject:date:message-id
 :reply-to;
 bh=wskvDqJJs4wR+M6qWW0jTNSB0NkZ9/8k+jkIYdlnK4s=;
 b=wRUGPnU+kLOE4n6wONTNjA/S91KAXoT4oLoeiRDM8DrA1Ca1la8r5IvxQTwY4ikL4v
 XJS3hVMTmml7NFrHtCVY57fLYNnV3qnC/a2YawWnTWdMRwRESP7h57KYTKndPhjfnuvp
 mrkq260S92nho2z1W4i1ib1bdSxCuRrm4gSg1ai3BOXz+q2GojRiIQodvyMODZMdgxum
 NkZKHa2JzZ8+RgPfheEARWn5lWipEJRgHer0DkAblPzUttKKIUfiaB5AmtvoTJ6IBAS1
 zwFubGSqhulsV01fhIrdxOGtL8wiyKpdkA34x/aBQSJct97LsTqwh/5tY6KyjB58eOwg
 9qSA==
X-Gm-Message-State: AOJu0YwJK084+1J4EJoubEak7A6GoJwETGJZTVfdDkVU4nRLvwivlVYV
 RyUiaFx9oFiLvBqoEqyPcuJYr/TE5c01hfJhtVkLGijXHs54VWBK14KsP/yr/owdayW0ZkmBH80
 TI30=
X-Google-Smtp-Source: 
 AGHT+IFF2gGf5xkF3KdejVc9DZR4D9RyAUrr9jsMVIXqpOYXbzRUZjxkjjOgRrwS5UmBgkvIjLzz4w==
X-Received: by 2002:a17:902:d4c2:b0:20b:bd8d:427c with SMTP id
 d9443c01a7336-2111aef288amr278387065ad.23.1730906858514;
 Wed, 06 Nov 2024 07:27:38 -0800 (PST)
Received: from carlos.baylibre ([217.13.61.132])
 by smtp.googlemail.com with ESMTPSA id
 d9443c01a7336-211057d44cesm97981115ad.255.2024.11.06.07.27.37
 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
 Wed, 06 Nov 2024 07:27:38 -0800 (PST)
From: Andrew Stubbs <ams@baylibre.com>
To: gcc-patches@gcc.gnu.org
Cc: prathameshk@nvidia.com,
	jakub@redhat.com
Subject: [PATCH 2/4] openmp: use offload max_vf for chunk_size
Date: Wed,  6 Nov 2024 15:27:20 +0000
Message-ID: <20241106152722.2821586-3-ams@baylibre.com>
X-Mailer: git-send-email 2.46.0
MIME-Version: 1.0
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.30
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org

The chunk size for SIMD loops should suit the device that will run them: too
large a value allocates more memory than needed, and too small a value is
inefficient.  Getting it wrong does not actually break anything, though.

This patch attempts to choose the best setting based on the context.  The
host fallback and the device still get the same chunk size, but device
performance matters most in this case.

gcc/ChangeLog:

	* omp-expand.cc (is_in_offload_region): New function.
	(omp_adjust_chunk_size): Add pass-through "offload" parameter.
	(get_ws_args_for): Likewise.
	(determine_parallel_type): Use is_in_offload_region to adjust call to
	get_ws_args_for.
	(expand_omp_for_generic): Likewise.
	(expand_omp_for_static_chunk): Likewise.
---
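For reviewers who want a concrete picture of the effect, here is a minimal
standalone sketch.  It is not the GCC code: it assumes the adjustment rounds
the chunk size up to a multiple of a power-of-two vectorization factor, and
the names adjust_chunk_size, host_vf and offload_vf, along with their values,
are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for omp_adjust_chunk_size, working on plain integers
// instead of trees.  VF is assumed to be a power of two.
constexpr std::int64_t
adjust_chunk_size (std::int64_t chunk_size, bool simd_schedule,
                   std::int64_t vf)
{
  // Same early exits as in the patch context: nothing to do without a
  // simd schedule, for a zero chunk size, or when VF is 1.
  if (!simd_schedule || chunk_size == 0 || vf == 1)
    return chunk_size;

  // Assumed adjustment: round up to the next multiple of VF.
  return (chunk_size + vf - 1) & -vf;
}

// Illustrative values only; the real ones come from omp_max_vf.
constexpr std::int64_t host_vf = 16;
constexpr std::int64_t offload_vf = 64;

static_assert (adjust_chunk_size (10, true, host_vf) == 16, "host rounding");
static_assert (adjust_chunk_size (10, true, offload_vf) == 64,
               "offload rounding");
static_assert (adjust_chunk_size (10, false, offload_vf) == 10,
               "no simd schedule");
```

With these made-up factors, a requested chunk size of 10 becomes 16 when the
host factor is used and 64 when the offload factor is used.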
 gcc/omp-expand.cc | 36 ++++++++++++++++++++++++++++--------
 1 file changed, 28 insertions(+), 8 deletions(-)
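
The lookup done by is_in_offload_region can be pictured with the simplified
sketch below.  The region struct and the fn_is_declare_target flag are
hypothetical stand-ins for struct omp_region and the "omp declare target"
attribute check; the loop is an iterative form of the recursion in the patch.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for struct omp_region.
struct region
{
  bool offloaded;       // entry statement is an offloaded OMP construct
  const region *outer;  // enclosing region, or nullptr at the outermost level
};

// Walk outwards through the enclosing regions.  If none of them is
// offloaded, fall back to whether the containing function is itself
// marked declare target.
inline bool
in_offload_region (const region *r, bool fn_is_declare_target)
{
  for (; r; r = r->outer)
    if (r->offloaded)
      return true;
  return fn_is_declare_target;
}
```

A loop nested inside a parallel region inside a target region is reported as
offloaded, as is any region in a declare target function.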

diff --git a/gcc/omp-expand.cc b/gcc/omp-expand.cc
index 907fd46a5b2..b0f9d375b6c 100644
--- a/gcc/omp-expand.cc
+++ b/gcc/omp-expand.cc
@@ -127,6 +127,23 @@ is_combined_parallel (struct omp_region *region)
   return region->is_combined_parallel;
 }
 
+/* Return true if REGION is, or is contained within, an offload region.  */
+
+static bool
+is_in_offload_region (struct omp_region *region)
+{
+  gimple *entry_stmt = last_nondebug_stmt (region->entry);
+  if (is_gimple_omp (entry_stmt)
+      && is_gimple_omp_offloaded (entry_stmt))
+    return true;
+  else if (region->outer)
+    return is_in_offload_region (region->outer);
+  else
+    return (lookup_attribute ("omp declare target",
+			      DECL_ATTRIBUTES (current_function_decl))
+	    != NULL);
+}
+
 /* Given two blocks PAR_ENTRY_BB and WS_ENTRY_BB such that WS_ENTRY_BB
    is the immediate dominator of PAR_ENTRY_BB, return true if there
    are no data dependencies that would prevent expanding the parallel
@@ -207,12 +224,12 @@ workshare_safe_to_combine_p (basic_block ws_entry_bb)
    presence (SIMD_SCHEDULE).  */
 
 static tree
-omp_adjust_chunk_size (tree chunk_size, bool simd_schedule)
+omp_adjust_chunk_size (tree chunk_size, bool simd_schedule, bool offload)
 {
   if (!simd_schedule || integer_zerop (chunk_size))
     return chunk_size;
 
-  poly_uint64 vf = omp_max_vf (false);
+  poly_uint64 vf = omp_max_vf (offload);
   if (known_eq (vf, 1U))
     return chunk_size;
 
@@ -228,7 +245,7 @@ omp_adjust_chunk_size (tree chunk_size, bool simd_schedule)
    expanded.  */
 
 static vec<tree, va_gc> *
-get_ws_args_for (gimple *par_stmt, gimple *ws_stmt)
+get_ws_args_for (gimple *par_stmt, gimple *ws_stmt, bool offload)
 {
   tree t;
   location_t loc = gimple_location (ws_stmt);
@@ -270,7 +287,7 @@ get_ws_args_for (gimple *par_stmt, gimple *ws_stmt)
       if (fd.chunk_size)
 	{
 	  t = fold_convert_loc (loc, long_integer_type_node, fd.chunk_size);
-	  t = omp_adjust_chunk_size (t, fd.simd_schedule);
+	  t = omp_adjust_chunk_size (t, fd.simd_schedule, offload);
 	  ws_args->quick_push (t);
 	}
 
@@ -366,7 +383,8 @@ determine_parallel_type (struct omp_region *region)
 
       region->is_combined_parallel = true;
       region->inner->is_combined_parallel = true;
-      region->ws_args = get_ws_args_for (par_stmt, ws_stmt);
+      region->ws_args = get_ws_args_for (par_stmt, ws_stmt,
+					 is_in_offload_region (region));
     }
 }
 
@@ -3929,6 +3947,7 @@ expand_omp_for_generic (struct omp_region *region,
   tree *counts = NULL;
   int i;
   bool ordered_lastprivate = false;
+  bool offload = is_in_offload_region (region);
 
   gcc_assert (!broken_loop || !in_combined_parallel);
   gcc_assert (fd->iter_type == long_integer_type_node
@@ -4196,7 +4215,7 @@ expand_omp_for_generic (struct omp_region *region,
 	  if (fd->chunk_size)
 	    {
 	      t = fold_convert (fd->iter_type, fd->chunk_size);
-	      t = omp_adjust_chunk_size (t, fd->simd_schedule);
+	      t = omp_adjust_chunk_size (t, fd->simd_schedule, offload);
 	      if (sched_arg)
 		{
 		  if (fd->ordered)
@@ -4240,7 +4259,7 @@ expand_omp_for_generic (struct omp_region *region,
 	    {
 	      tree bfn_decl = builtin_decl_explicit (start_fn);
 	      t = fold_convert (fd->iter_type, fd->chunk_size);
-	      t = omp_adjust_chunk_size (t, fd->simd_schedule);
+	      t = omp_adjust_chunk_size (t, fd->simd_schedule, offload);
 	      if (sched_arg)
 		t = build_call_expr (bfn_decl, 10, t5, t0, t1, t2, sched_arg,
 				     t, t3, t4, reductions, mem);
@@ -5937,7 +5956,8 @@ expand_omp_for_static_chunk (struct omp_region *region,
   step = force_gimple_operand_gsi (&gsi, fold_convert (itype, step),
 				   true, NULL_TREE, true, GSI_SAME_STMT);
   tree chunk_size = fold_convert (itype, fd->chunk_size);
-  chunk_size = omp_adjust_chunk_size (chunk_size, fd->simd_schedule);
+  chunk_size = omp_adjust_chunk_size (chunk_size, fd->simd_schedule,
+				      is_in_offload_region (region));
   chunk_size
     = force_gimple_operand_gsi (&gsi, chunk_size, true, NULL_TREE, true,
 				GSI_SAME_STMT);