From patchwork Thu Feb 10 09:14:54 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tom de Vries X-Patchwork-Id: 1590929 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=Jf/XyE8j; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Received: from sourceware.org (ip-8-43-85-97.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4JvWLw6mYCz9s5B for ; Thu, 10 Feb 2022 20:15:52 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id AD6943858421 for ; Thu, 10 Feb 2022 09:15:50 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org AD6943858421 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1644484550; bh=Ua1vWiZ5QWlA/V4i//wsCK/CYs8VyWNduA6N+UtyvJc=; h=Date:To:Subject:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:Cc:From; b=Jf/XyE8j8PlW58yGCcG8YQeYbF1zdzblYQXpRFhP6eFgACt0Ly6gAraT7W2DjSQ9N M5vRMwOe5Yb/0yvBPIje9t4Lz76lerkHzxXrRWurEhHol9mrdXaYGwNwvmL488qkwx T2kIEt7zFpSSnGSUY3+p/nLYchiGMiiv13nW8YX4= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by sourceware.org (Postfix) with ESMTPS id 333483858429 for ; Thu, 10 Feb 2022 09:14:57 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 333483858429 Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 678CB21117; Thu, 10 Feb 2022 09:14:56 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 4C13A13B31; Thu, 10 Feb 2022 09:14:56 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id Ubx0EZDXBGLxRQAAMHmgww (envelope-from ); Thu, 10 Feb 2022 09:14:56 +0000 Date: Thu, 10 Feb 2022 10:14:54 +0100 To: gcc-patches@gcc.gnu.org Subject: [PATCH][libgomp, openacc] Add terminating spinlock test-cases Message-ID: <20220210091452.GA20962@delia.home> MIME-Version: 1.0 Content-Disposition: inline User-Agent: Mutt/1.10.1 (2018-07-13) X-Spam-Status: No, score=-12.9 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_LOW, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Tom de Vries via Gcc-patches From: Tom de Vries Reply-To: Tom de Vries Cc: Thomas Schwinge Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" Hi, The OpenACC execution model states that implementing a critical section across workers using atomic operations and a busy-wait loop may never succeed, since the scheduler may suspend the worker that owns the lock, in which case the worker waiting on the lock can never complete. Add a test-case that implements the next best thing: a spinlock using a busy-wait loop that gives up after a certain number of tries. This ensures termination, and makes the test-case a valid one, while still excercising atomic exchange and atomic store. OK for trunk? Thanks, - Tom [libgomp, openacc] Add terminating spinlock test-cases libgomp/ChangeLog: 2022-02-02 Tom de Vries * testsuite/libgomp.oacc-c/spin-lock-global.c: New test. * testsuite/libgomp.oacc-c/spin-lock-global.h: New test. * testsuite/libgomp.oacc-c/spin-lock-shared.c: New test. * testsuite/libgomp.oacc-c/spin-lock-shared.h: New test. --- .../testsuite/libgomp.oacc-c/spin-lock-global.c | 43 ++++++ .../testsuite/libgomp.oacc-c/spin-lock-global.h | 169 +++++++++++++++++++++ .../testsuite/libgomp.oacc-c/spin-lock-shared.c | 35 +++++ .../testsuite/libgomp.oacc-c/spin-lock-shared.h | 135 ++++++++++++++++ 4 files changed, 382 insertions(+) diff --git a/libgomp/testsuite/libgomp.oacc-c/spin-lock-global.c b/libgomp/testsuite/libgomp.oacc-c/spin-lock-global.c new file mode 100644 index 00000000000..0c1da9e842f --- /dev/null +++ b/libgomp/testsuite/libgomp.oacc-c/spin-lock-global.c @@ -0,0 +1,43 @@ +#include +#include +#include +#include + +enum memmodel + { + MEMMODEL_RELAXED = 0, + MEMMODEL_ACQUIRE = 2, + MEMMODEL_RELEASE = 3, + MEMMODEL_SEQ_CST = 5, + }; + +#define TYPE unsigned int +#define LOCKVAR1 lock_32_1 +#define LOCKVAR2 lock_32_2 +#define TESTS tests_32 +#include "spin-lock-global.h" +#undef TYPE +#undef LOCKVAR1 +#undef LOCKVAR2 +#undef TESTS + +#define TYPE unsigned long long int +#define LOCKVAR1 lock_64_1 +#define LOCKVAR2 lock_64_2 +#define TESTS tests_64 +#include "spin-lock-global.h" +#undef TYPE +#undef LOCKVAR1 +#undef LOCKVAR2 +#undef TESTS + +#define N (7 * 1000) + +int +main (void) +{ + tests_32 (N); + tests_64 (N); + + return 0; +} diff --git a/libgomp/testsuite/libgomp.oacc-c/spin-lock-global.h b/libgomp/testsuite/libgomp.oacc-c/spin-lock-global.h new file mode 100644 index 00000000000..ea63fafccb9 --- /dev/null +++ b/libgomp/testsuite/libgomp.oacc-c/spin-lock-global.h @@ -0,0 +1,169 @@ +#define XSTR(S) STR (S) +#define STR(S) #S + +#define PRINTF(...) \ + { \ + printf (__VA_ARGS__); \ + fflush (NULL); \ + } + +#define DO_PRAGMA(x) _Pragma (#x) + +#ifndef SPIN_CNT_MAX +/* Define to have limited-spin spinlock. + Ensures that the program will terminate. */ +#define SPIN_CNT_MAX 0x8000U +#endif + +#define TEST_1(N, LOCKVAR, VERIFY, N_GANGS, N_WORKERS) \ + assert (N % N_GANGS == 0); \ + \ + DO_PRAGMA (acc parallel \ + num_gangs(N_GANGS) \ + num_workers(N_WORKERS) \ + copy (lock_cnt) \ + copy (spin_cnt_max_hit) \ + present (LOCKVAR)) \ + { \ + TYPE unlocked = (TYPE)0; \ + TYPE locked = ~unlocked; \ + \ + LOCKVAR = unlocked; \ + \ + unsigned int n_gangs \ + = __builtin_goacc_parlevel_size (GOMP_DIM_GANG); \ + \ + DO_PRAGMA (acc loop worker) \ + for (unsigned int i = 0; i < N / n_gangs; i++) \ + { \ + TYPE res; \ + \ + unsigned int spin_cnt = 0; \ + while (1) \ + { \ + res = __atomic_exchange_n (&LOCKVAR, locked, \ + MEMMODEL_ACQUIRE); \ + if (res == locked) \ + { \ + if (SPIN_CNT_MAX > 0) \ + { \ + spin_cnt++; \ + if (spin_cnt == SPIN_CNT_MAX) \ + { \ + if (VERIFY) \ + __atomic_fetch_add (&spin_cnt_max_hit, 1, \ + MEMMODEL_RELAXED); \ + break; \ + } \ + } \ + continue; \ + \ + } \ + else \ + { \ + if (res != unlocked) \ + __builtin_abort (); \ + \ + if (VERIFY) \ + __atomic_fetch_add (&lock_cnt, 1, \ + MEMMODEL_RELAXED); \ + \ + __atomic_store_n (&LOCKVAR, unlocked, \ + MEMMODEL_RELEASE); \ + break; \ + } \ + } \ + } \ + } + +#define TEST(N, LOCKVAR, VERIFY, N_GANGS, N_WORKERS) \ + { \ + spin_cnt_max_hit = 0; \ + \ + if (VERIFY) \ + lock_cnt = 0; \ + \ + PRINTF ("%s - verify=%u - lock=%s - gangs=%u - workers=%u ... ", \ + XSTR (TYPE), VERIFY, STR(LOCKVAR), N_GANGS, N_WORKERS); \ + TEST_1 (N, LOCKVAR, VERIFY, N_GANGS, N_WORKERS); \ + PRINTF ("done\n"); \ + \ + if (VERIFY && SPIN_CNT_MAX) \ + PRINTF ("spin_cnt_max_hit: %llu\n", spin_cnt_max_hit); \ + \ + if (VERIFY && (lock_cnt + spin_cnt_max_hit != N)) \ + { \ + PRINTF ("lock_cnt: %llu\n", lock_cnt); \ + PRINTF ("lock_cnt + spin_cnt_max_hit: %llu\n", \ + lock_cnt + spin_cnt_max_hit); \ + PRINTF ("N: %u\n", N); \ + __builtin_abort (); \ + } \ + } + +/* Uses .global addressing on nvptx. */ +TYPE LOCKVAR1; +#pragma acc declare create (LOCKVAR1) + +void +TESTS (unsigned int n) +{ + unsigned long long int lock_cnt; + unsigned long long int spin_cnt_max_hit; + + /* Uses generic addressing on nvptx. */ + TYPE LOCKVAR2; +#pragma acc declare create (LOCKVAR2) + +#define N_GANGS 1 +#define N_WORKERS 8 +#define VERIFY 0 + TEST (n, LOCKVAR1, VERIFY, N_GANGS, N_WORKERS); + TEST (n, LOCKVAR2, VERIFY, N_GANGS, N_WORKERS); +#undef VERIFY +#define VERIFY 1 + TEST (n, LOCKVAR1, VERIFY, N_GANGS, N_WORKERS); + TEST (n, LOCKVAR2, VERIFY, N_GANGS, N_WORKERS); +#undef VERIFY +#undef N_WORKERS +#undef N_GANGS + +#define N_GANGS 2 +#define N_WORKERS 4 +#define VERIFY 0 + TEST (n, LOCKVAR1, VERIFY, N_GANGS, N_WORKERS); + TEST (n, LOCKVAR2, VERIFY, N_GANGS, N_WORKERS); +#undef VERIFY +#define VERIFY 1 + TEST (n, LOCKVAR1, VERIFY, N_GANGS, N_WORKERS); + TEST (n, LOCKVAR2, VERIFY, N_GANGS, N_WORKERS); +#undef VERIFY +#undef N_WORKERS +#undef N_GANGS + +#define N_GANGS 4 +#define N_WORKERS 2 +#define VERIFY 0 + TEST (n, LOCKVAR1, VERIFY, N_GANGS, N_WORKERS); + TEST (n, LOCKVAR2, VERIFY, N_GANGS, N_WORKERS); +#undef VERIFY +#define VERIFY 1 + TEST (n, LOCKVAR1, VERIFY, N_GANGS, N_WORKERS); + TEST (n, LOCKVAR2, VERIFY, N_GANGS, N_WORKERS); +#undef VERIFY +#undef N_WORKERS +#undef N_GANGS + +#define N_GANGS 8 +#define N_WORKERS 1 +#define VERIFY 0 + TEST (n, LOCKVAR1, VERIFY, N_GANGS, N_WORKERS); + TEST (n, LOCKVAR2, VERIFY, N_GANGS, N_WORKERS); +#undef VERIFY +#define VERIFY 1 + TEST (n, LOCKVAR1, VERIFY, N_GANGS, N_WORKERS); + TEST (n, LOCKVAR2, VERIFY, N_GANGS, N_WORKERS); +#undef VERIFY +#undef N_WORKERS +#undef N_GANGS +} diff --git a/libgomp/testsuite/libgomp.oacc-c/spin-lock-shared.c b/libgomp/testsuite/libgomp.oacc-c/spin-lock-shared.c new file mode 100644 index 00000000000..81d18fcc798 --- /dev/null +++ b/libgomp/testsuite/libgomp.oacc-c/spin-lock-shared.c @@ -0,0 +1,35 @@ +#include +#include +#include +#include + +enum memmodel + { + MEMMODEL_RELAXED = 0, + MEMMODEL_ACQUIRE = 2, + MEMMODEL_RELEASE = 3, + MEMMODEL_SEQ_CST = 5, + }; + +#define TYPE unsigned int +#define TESTS tests_32 +#include "spin-lock-shared.h" +#undef TYPE +#undef TESTS + +#define TYPE unsigned long long int +#define TESTS tests_64 +#include "spin-lock-shared.h" +#undef TYPE +#undef TESTS + +#define N (50 * 1000) + +int +main (void) +{ + tests_32 (N); + tests_64 (N); + + return 0; +} diff --git a/libgomp/testsuite/libgomp.oacc-c/spin-lock-shared.h b/libgomp/testsuite/libgomp.oacc-c/spin-lock-shared.h new file mode 100644 index 00000000000..923f38c60fe --- /dev/null +++ b/libgomp/testsuite/libgomp.oacc-c/spin-lock-shared.h @@ -0,0 +1,135 @@ +#define XSTR(S) STR (S) +#define STR(S) #S + +#define PRINTF(...) \ + { \ + printf (__VA_ARGS__); \ + fflush (NULL); \ + } + +#define DO_PRAGMA(x) _Pragma (#x) + +#ifndef SPIN_CNT_MAX +/* Define to have limited-spin spinlock. + Ensures that the program will terminate. */ +#define SPIN_CNT_MAX 0x20000U +#endif + +#define TEST_1(N, LOCKREF) \ + DO_PRAGMA (acc parallel \ + num_gangs(1) \ + num_workers(N_WORKERS) \ + copy (lock_cnt) \ + copy (spin_cnt_max_hit)) \ + { \ + TYPE unlocked = (TYPE)0; \ + TYPE locked = ~unlocked; \ + TYPE lock; \ + TYPE *volatile lock_ptr = &lock; \ + unsigned long long int lock_cnt_1; \ + unsigned long long int spin_cnt_max_hit_1; \ + \ + if (VERIFY) \ + { \ + lock_cnt_1 = 0; \ + \ + if (SPIN_CNT_MAX) \ + spin_cnt_max_hit_1 = 0; \ + } \ + \ + *(LOCKREF) = unlocked; \ + \ + DO_PRAGMA (acc loop worker) \ + for (unsigned int i = 0; i < N; i++) \ + { \ + TYPE res; \ + \ + unsigned int spin_cnt = 0; \ + while (1) \ + { \ + res = __atomic_exchange_n (LOCKREF, locked, \ + MEMMODEL_ACQUIRE); \ + if (res == locked) \ + { \ + if (SPIN_CNT_MAX > 0) \ + { \ + spin_cnt++; \ + if (spin_cnt == SPIN_CNT_MAX) \ + { \ + if (VERIFY) \ + __atomic_fetch_add (&spin_cnt_max_hit_1, 1, \ + MEMMODEL_RELAXED); \ + break; \ + } \ + } \ + continue; \ + } \ + else \ + { \ + if (res != unlocked) \ + __builtin_abort (); \ + \ + if (VERIFY) \ + __atomic_fetch_add (&lock_cnt_1, 1, \ + MEMMODEL_RELAXED); \ + \ + __atomic_store_n (LOCKREF, unlocked, \ + MEMMODEL_RELEASE); \ + \ + break; \ + } \ + } \ + } \ + \ + if (VERIFY) \ + { \ + lock_cnt += lock_cnt_1; \ + \ + if (SPIN_CNT_MAX) \ + spin_cnt_max_hit += spin_cnt_max_hit_1; \ + } \ + } + +#define TEST(N, LOCKREF) \ + { \ + spin_cnt_max_hit = 0; \ + \ + if (VERIFY) \ + lock_cnt = 0; \ + \ + PRINTF ("%s - verify=%u - LOCKREF=%s ... ", \ + XSTR (TYPE), VERIFY, #LOCKREF); \ + TEST_1 (N, LOCKREF); \ + PRINTF ("done\n"); \ + \ + if (VERIFY && SPIN_CNT_MAX) \ + PRINTF ("spin_cnt_max_hit: %llu\n", spin_cnt_max_hit); \ + \ + if (VERIFY && (lock_cnt + spin_cnt_max_hit != N)) \ + { \ + PRINTF ("lock_cnt: %llu\n", lock_cnt); \ + PRINTF ("lock_cnt + spin_cnt_max_hit: %llu\n", \ + lock_cnt + spin_cnt_max_hit); \ + PRINTF ("N: %u\n", N); \ + __builtin_abort (); \ + } \ + } + +void +TESTS (unsigned int n) +{ + unsigned long long int lock_cnt; + unsigned long long int spin_cnt_max_hit; + +#define N_WORKERS 8 + +#define VERIFY 0 + TEST (n, &lock); + TEST (n, lock_ptr); +#undef VERIFY + +#define VERIFY 1 + TEST (n, &lock); + TEST (n, lock_ptr); +#undef VERIFY +}