From: Julian Brown
Subject: [PATCH 2/3] [og13] OpenMP, NVPTX: memcpy[23]D bias correction
Date: Wed, 20 Sep 2023 11:14:00 +0000
Message-ID: <33eb021ad9d9e2957814cbddfa213f4e529ce097.1695207771.git.julian@codesourcery.com>
X-Patchwork-Submitter: Julian Brown
X-Patchwork-Id: 1837183

This patch works around behaviour of the 2D and 3D memcpy operations in
the CUDA driver runtime.  Particularly in Fortran, the "base pointer" of
an array (used for either source or destination of a host/device copy)
may lie outside of data that is actually stored on the device.  The fix
is to make sure that we use the first element of data to be transferred
instead, and adjust parameters accordingly.

This is a merge of the patch previously posted for mainline to the og13
branch.

2023-09-19  Julian Brown

	libgomp/
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_memcpy2d): Adjust parameters to
	avoid out-of-bounds array checks in CUDA runtime.
	(GOMP_OFFLOAD_memcpy3d): Likewise.
---
 libgomp/plugin/plugin-nvptx.c | 67 +++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index bc232f9f81f..dd8c56b8f58 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -2460,6 +2460,35 @@ GOMP_OFFLOAD_memcpy2d (int dst_ord, int src_ord, size_t dim1_size,
   data.srcXInBytes = src_offset1_size;
   data.srcY = src_offset0_len;
 
+  if (data.srcXInBytes != 0 || data.srcY != 0)
+    {
+      /* Adjust origin to the actual array data, else the CUDA 2D memory
+         copy API calls below may fail to validate source/dest pointers
+         correctly (especially for Fortran where the "virtual origin" of an
+         array is often outside the stored data).  */
+      if (src_ord == -1)
+        data.srcHost = (const void *) ((const char *) data.srcHost
+                                       + data.srcY * data.srcPitch
+                                       + data.srcXInBytes);
+      else
+        data.srcDevice += data.srcY * data.srcPitch + data.srcXInBytes;
+      data.srcXInBytes = 0;
+      data.srcY = 0;
+    }
+
+  if (data.dstXInBytes != 0 || data.dstY != 0)
+    {
+      /* As above.  */
+      if (dst_ord == -1)
+        data.dstHost = (void *) ((char *) data.dstHost
+                                 + data.dstY * data.dstPitch
+                                 + data.dstXInBytes);
+      else
+        data.dstDevice += data.dstY * data.dstPitch + data.dstXInBytes;
+      data.dstXInBytes = 0;
+      data.dstY = 0;
+    }
+
   CUresult res = CUDA_CALL_NOCHECK (cuMemcpy2D, &data);
   if (res == CUDA_ERROR_INVALID_VALUE)
     /* If pitch > CU_DEVICE_ATTRIBUTE_MAX_PITCH or for device-to-device
@@ -2528,6 +2557,44 @@ GOMP_OFFLOAD_memcpy3d (int dst_ord, int src_ord, size_t dim2_size,
   data.srcY = src_offset1_len;
   data.srcZ = src_offset0_len;
 
+  if (data.srcXInBytes != 0 || data.srcY != 0 || data.srcZ != 0)
+    {
+      /* Adjust origin to the actual array data, else the CUDA 3D memory
+         copy API call below may fail to validate source/dest pointers
+         correctly (especially for Fortran where the "virtual origin" of an
+         array is often outside the stored data).  */
+      if (src_ord == -1)
+        data.srcHost
+          = (const void *) ((const char *) data.srcHost
+                            + (data.srcZ * data.srcHeight + data.srcY)
+                              * data.srcPitch
+                            + data.srcXInBytes);
+      else
+        data.srcDevice
+          += (data.srcZ * data.srcHeight + data.srcY) * data.srcPitch
+             + data.srcXInBytes;
+      data.srcXInBytes = 0;
+      data.srcY = 0;
+      data.srcZ = 0;
+    }
+
+  if (data.dstXInBytes != 0 || data.dstY != 0 || data.dstZ != 0)
+    {
+      /* As above.  */
+      if (dst_ord == -1)
+        data.dstHost = (void *) ((char *) data.dstHost
+                                 + (data.dstZ * data.dstHeight + data.dstY)
+                                   * data.dstPitch
+                                 + data.dstXInBytes);
+      else
+        data.dstDevice
+          += (data.dstZ * data.dstHeight + data.dstY) * data.dstPitch
+             + data.dstXInBytes;
+      data.dstXInBytes = 0;
+      data.dstY = 0;
+      data.dstZ = 0;
+    }
+
   CUDA_CALL (cuMemcpy3D, &data);
   return true;
 }
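
For reviewers who want to check the address arithmetic without a CUDA
device, the following is a minimal host-only sketch of the same offset
folding.  It is not part of the patch: struct copy_origin, origin_bias,
fold_origin and fold_origin_demo are hypothetical names that only mirror
the src* fields the patch touches, and the layout is assumed to be a dense
row-major array with pitch bytes per row and height rows per slice.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the source half of the copy descriptor.
   This is NOT the real CUDA structure, only the fields used here.  */
struct copy_origin
{
  const char *base;   /* Plays the role of srcHost/srcDevice.  */
  size_t x_in_bytes;  /* srcXInBytes.  */
  size_t y;           /* srcY, in rows.  */
  size_t z;           /* srcZ, in slices.  */
  size_t pitch;       /* srcPitch, bytes per row.  */
  size_t height;      /* srcHeight, rows per slice.  */
};

/* Byte distance from the base pointer to the first element transferred.
   With z == 0 this reduces to the 2D case, y * pitch + x_in_bytes.  */
size_t
origin_bias (const struct copy_origin *o)
{
  return (o->z * o->height + o->y) * o->pitch + o->x_in_bytes;
}

/* Fold the offsets into the base pointer and zero them, so that the base
   pointer itself refers to data that is actually stored.  */
void
fold_origin (struct copy_origin *o)
{
  if (o->x_in_bytes != 0 || o->y != 0 || o->z != 0)
    {
      o->base += origin_bias (o);
      o->x_in_bytes = 0;
      o->y = 0;
      o->z = 0;
    }
}

/* Check the fold against ordinary C indexing on a 3 x 4 x 8 byte array:
   slice 1, row 2, byte 5 must be where the folded base points.  */
size_t
fold_origin_demo (void)
{
  static char data[3][4][8];
  struct copy_origin o = { (const char *) data, 5, 2, 1, 8, 4 };
  size_t bias = origin_bias (&o);

  fold_origin (&o);
  assert (o.base == &data[1][2][5]);
  assert (o.x_in_bytes == 0 && o.y == 0 && o.z == 0);
  return bias;
}
```

Calling fold_origin_demo () exercises the 3D formula; a descriptor with
z == 0 and height == 0 exercises the 2D one.  The destination side of the
patch follows the same arithmetic with the dst* fields.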