From patchwork Wed Apr 28 19:34:52 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Roman Zhuykov X-Patchwork-Id: 1471409 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=sourceware.org; envelope-from=gcc-patches-bounces@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=MA6UZTa5; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4FVpkL1KJrz9sjD for ; Thu, 29 Apr 2021 05:35:04 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id DEB7E397C816; Wed, 28 Apr 2021 19:35:00 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org DEB7E397C816 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1619638500; bh=HFm1eVlQ8OIhyu3gJ7O2eLHXpA76MowhsS+YSbS9oGI=; h=To:Subject:Date:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:Cc:From; b=MA6UZTa53W76fwm6eIcPG+eXnH3RQw7U+lKooD9padNK5TxtzkgcCczKAA4qxyu3T Cm20TgqgaGy/mA1c6z8WUbK01F5fzHqMSlRtJmXWQ7mwZXRUDPpxwxZ0HVuy8lAyk0 Lk9zPX/DTv462LASV0w1czPirMLKxxiazVAJBnu0= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail.ispras.ru (mail.ispras.ru [83.149.199.84]) by sourceware.org (Postfix) with ESMTPS id 4ED7B394480C; Wed, 28 Apr 2021 19:34:55 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 4ED7B394480C Received: from [10.10.3.54] (unknown [10.10.3.54]) by mail.ispras.ru (Postfix) with ESMTP id 8457A40D403D; Wed, 28 Apr 2021 19:34:52 +0000 (UTC) To: "gcc-patches@gcc.gnu.org" Subject: [PATCH] modulo-sched: skip loops with strange register defs [PR100225] Message-ID: <5918ecac-942d-e4bf-aab7-1c8eaeaeea92@ispras.ru> Date: Wed, 28 Apr 2021 22:34:52 +0300 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0 MIME-Version: 1.0 Content-Language: en-US X-Spam-Status: No, score=-11.6 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Roman Zhuykov via Gcc-patches From: Roman Zhuykov Reply-To: Roman Zhuykov Cc: Jakub Jelinek , Alexander Monakov , Richard Biener , Alex Coplan Errors-To: gcc-patches-bounces@gcc.gnu.org Sender: "Gcc-patches" Hi all! Situation from PR was already caught earlier locally. So, I've just extracted appropriate part, it also slightly modifies loop checks related to non-single-set instructions. Patch (attached) was successfully bootstrapped/regtested on aarch64-linux on all active branches (8-12) with modulo-sched enabled by default, usual amd64-linux reg-strap is also ok. Pushing to trunk after 24h if no objections, backporting to 8-11 branches next week. Roman --- modulo-sched: skip loops with strange register defs [PR100225] PR84878 fix adds an assertion which can fail, e.g. when stack pointer is adjusted inside the loop. We have to prevent it and search earlier for any 'strange' instruction. The solution is to skip the whole loop if using 'note_stores' we found that one of hard registers is in 'df->regular_block_artificial_uses' set. Also patch properly prohibit not single-set instruction in loop body. gcc/ChangeLog: PR rtl-optimization/100225 PR rtl-optimization/84878 * modulo-sched.c (sms_schedule): Use note_stores to skip loops where we have an instruction which touches (writes) any hard register from df->regular_block_artificial_uses set. Allow not-single-set instruction only right before basic block tail. gcc/testsuite/ChangeLog: PR rtl-optimization/100225 PR rtl-optimization/84878 * gcc.dg/pr100225.c: New test. libgomp/ChangeLog: * testsuite/libgomp.oacc-c-c++-common/atomic_capture-3.c: New test. -- modulo-sched: skip loops with strange register defs [PR100225] PR84878 fix adds an assertion which can fail, e.g. when stack pointer is adjusted inside the loop. We have to prevent it and search earlier for any 'strange' instruction. The solution is to skip the whole loop if using 'note_stores' we found that one of hard registers is in 'df->regular_block_artificial_uses' set. Also patch properly prohibit not single-set instruction in loop body. gcc/ChangeLog: PR rtl-optimization/100225 PR rtl-optimization/84878 * modulo-sched.c (sms_schedule): Use note_stores to skip loops where we have an instruction which touches (writes) any hard register from df->regular_block_artificial_uses set. Allow not-single-set instruction only right before basic block tail. gcc/testsuite/ChangeLog: PR rtl-optimization/100225 PR rtl-optimization/84878 * gcc.dg/pr100225.c: New test. libgomp/ChangeLog: * testsuite/libgomp.oacc-c-c++-common/atomic_capture-3.c: New test. diff --git a/gcc/modulo-sched.c b/gcc/modulo-sched.c index 6ad960ec1f..e72e46db38 100644 --- a/gcc/modulo-sched.c +++ b/gcc/modulo-sched.c @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3. If not see #include "tree-pass.h" #include "dbgcnt.h" #include "loop-unroll.h" +#include "hard-reg-set.h" #ifdef INSN_SCHEDULING @@ -1356,6 +1357,7 @@ sms_schedule (void) basic_block condition_bb = NULL; edge latch_edge; HOST_WIDE_INT trip_count, max_trip_count; + HARD_REG_SET prohibited_regs; loop_optimizer_init (LOOPS_HAVE_PREHEADERS | LOOPS_HAVE_RECORDED_EXITS); @@ -1385,6 +1387,8 @@ sms_schedule (void) We use loop->num as index into this array. */ g_arr = XCNEWVEC (ddg_ptr, number_of_loops (cfun)); + REG_SET_TO_HARD_REG_SET (prohibited_regs, &df->regular_block_artificial_uses); + if (dump_file) { fprintf (dump_file, "\n\nSMS analysis phase\n"); @@ -1469,23 +1473,31 @@ sms_schedule (void) } /* Don't handle BBs with calls or barriers - or !single_set with the exception of instructions that include - count_reg---these instructions are part of the control part - that do-loop recognizes. + or !single_set with the exception of do-loop control part insns. ??? Should handle insns defining subregs. */ - for (insn = head; insn != NEXT_INSN (tail); insn = NEXT_INSN (insn)) - { - rtx set; - - if (CALL_P (insn) - || BARRIER_P (insn) - || (NONDEBUG_INSN_P (insn) && !JUMP_P (insn) - && !single_set (insn) && GET_CODE (PATTERN (insn)) != USE - && !reg_mentioned_p (count_reg, insn)) - || (INSN_P (insn) && (set = single_set (insn)) - && GET_CODE (SET_DEST (set)) == SUBREG)) - break; - } + for (insn = head; insn != NEXT_INSN (tail); insn = NEXT_INSN (insn)) + { + if (INSN_P (insn)) + { + HARD_REG_SET regs; + CLEAR_HARD_REG_SET (regs); + note_stores (insn, record_hard_reg_sets, ®s); + if (hard_reg_set_intersect_p (regs, prohibited_regs)) + break; + } + + if (CALL_P (insn) + || BARRIER_P (insn) + || (INSN_P (insn) && single_set (insn) + && GET_CODE (SET_DEST (single_set (insn))) == SUBREG) + /* Not a single set. */ + || (NONDEBUG_INSN_P (insn) && !JUMP_P (insn) + && !single_set (insn) && GET_CODE (PATTERN (insn)) != USE + /* But non-single-set allowed in one special case. */ + && (insn != prev_nondebug_insn (tail) + || !reg_mentioned_p (count_reg, insn)))) + break; + } if (insn != NEXT_INSN (tail)) { @@ -1495,11 +1507,13 @@ sms_schedule (void) fprintf (dump_file, "SMS loop-with-call\n"); else if (BARRIER_P (insn)) fprintf (dump_file, "SMS loop-with-barrier\n"); - else if ((NONDEBUG_INSN_P (insn) && !JUMP_P (insn) - && !single_set (insn) && GET_CODE (PATTERN (insn)) != USE)) - fprintf (dump_file, "SMS loop-with-not-single-set\n"); - else - fprintf (dump_file, "SMS loop with subreg in lhs\n"); + else if (INSN_P (insn) && single_set (insn) + && GET_CODE (SET_DEST (single_set (insn))) == SUBREG) + fprintf (dump_file, "SMS loop with subreg in lhs\n"); + else + fprintf (dump_file, + "SMS loop-with-not-single-set-or-prohibited-reg\n"); + print_rtl_single (dump_file, insn); } diff --git a/gcc/testsuite/gcc.dg/pr100225.c b/gcc/testsuite/gcc.dg/pr100225.c new file mode 100644 index 0000000000..b32163441a --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr100225.c @@ -0,0 +1,15 @@ +/* PR rtl-optimization/100225 */ +/* { dg-do compile } */ +/* { dg-options "-O1 -fmodulo-sched" } */ + +void +vorbis_synthesis_lapout (void); + +void +ov_info (int **lappcm, int ov_info_i) +{ + while (ov_info_i < 1) + lappcm[ov_info_i++] = __builtin_alloca (1); + + vorbis_synthesis_lapout (); +} diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/atomic_capture-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/atomic_capture-3.c new file mode 100644 index 0000000000..b976094998 --- /dev/null +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/atomic_capture-3.c @@ -0,0 +1,1627 @@ +/* { dg-do run } */ +/* { dg-additional-options "-fmodulo-sched -fmodulo-sched-allow-regmoves" } */ + +#include + +int +main(int argc, char **argv) +{ + int iexp, igot, imax, imin; + long long lexp, lgot; + int N = 32; + int i; + int idata[N]; + long long ldata[N]; + float fexp, fgot; + float fdata[N]; + + igot = 1234; + iexp = 31; + + for (i = 0; i < N; i++) + idata[i] = i; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) +#pragma acc atomic capture + { idata[i] = igot; igot = i; } + } + + imax = 0; + imin = N; + + for (i = 0; i < N; i++) + { + imax = idata[i] > imax ? idata[i] : imax; + imin = idata[i] < imin ? idata[i] : imin; + } + + if (imax != 1234 || imin != 0) + abort (); + + return 0; + + igot = 0; + iexp = 32; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) +#pragma acc atomic capture + { idata[i] = igot; igot++; } + } + + if (iexp != igot) + abort (); + + igot = 0; + iexp = 32; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) +#pragma acc atomic capture + { idata[i] = igot; ++igot; } + } + + if (iexp != igot) + abort (); + + igot = 0; + iexp = 32; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) +#pragma acc atomic capture + { ++igot; idata[i] = igot; } + } + + if (iexp != igot) + abort (); + + igot = 0; + iexp = 32; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) +#pragma acc atomic capture + { igot++; idata[i] = igot; } + } + + if (iexp != igot) + abort (); + + igot = 32; + iexp = 0; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) +#pragma acc atomic capture + { idata[i] = igot; igot--; } + } + + if (iexp != igot) + abort (); + + igot = 32; + iexp = 0; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) +#pragma acc atomic capture + { idata[i] = igot; --igot; } + } + + if (iexp != igot) + abort (); + + igot = 32; + iexp = 0; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) +#pragma acc atomic capture + { --igot; idata[i] = igot; } + } + + if (iexp != igot) + abort (); + + igot = 32; + iexp = 0; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) +#pragma acc atomic capture + { igot--; idata[i] = igot; } + } + + if (iexp != igot) + abort (); + + /* BINOP = + */ + igot = 0; + iexp = 32; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + int expr = 1; + +#pragma acc atomic capture + { idata[i] = igot; igot += expr; } + } + } + + if (iexp != igot) + abort (); + + igot = 0; + iexp = 32; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + int expr = 1; + +#pragma acc atomic capture + { igot += expr; idata[i] = igot; } + } + } + + if (iexp != igot) + abort (); + + igot = 0; + iexp = 32; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + int expr = 1; + +#pragma acc atomic capture + { idata[i] = igot; igot = igot + expr; } + } + } + + if (iexp != igot) + abort (); + + igot = 0; + iexp = 32; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + int expr = 1; + +#pragma acc atomic capture + { idata[i] = igot; igot = expr + igot; } + } + } + + if (iexp != igot) + abort (); + + igot = 0; + iexp = 32; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + int expr = 1; + +#pragma acc atomic capture + { igot = igot + expr; idata[i] = igot; } + } + } + + if (iexp != igot) + abort (); + + + igot = 0; + iexp = 32; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + int expr = 1; + +#pragma acc atomic capture + { igot = expr + igot; idata[i] = igot; } + } + } + + if (iexp != igot) + abort (); + + /* BINOP = * */ + lgot = 1LL; + lexp = 1LL << 32; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + long long expr = 2LL; + +#pragma acc atomic capture + { ldata[i] = lgot; lgot *= expr; } + } + } + + if (lexp != lgot) + abort (); + + lgot = 1LL; + lexp = 1LL << 32; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + long long expr = 2LL; + +#pragma acc atomic capture + { lgot *= expr; ldata[i] = lgot; } + } + } + + if (lexp != lgot) + abort (); + + lgot = 1LL; + lexp = 1LL << 32; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + long long expr = 2LL; + +#pragma acc atomic capture + { ldata[i] = lgot; lgot = lgot * expr; } + } + } + + if (lexp != lgot) + abort (); + + lgot = 1LL; + lexp = 1LL << 32; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + long long expr = 2LL; + +#pragma acc atomic capture + { ldata[i] = lgot; lgot = expr * lgot; } + } + } + + if (lexp != lgot) + abort (); + + lgot = 1LL; + lexp = 1LL << 32; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + long long expr = 2LL; + +#pragma acc atomic capture + { lgot = lgot * expr; ldata[i] = lgot; } + } + } + + if (lexp != lgot) + abort (); + + lgot = 1LL; + lexp = 1LL << 32; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + long long expr = 2; + +#pragma acc atomic capture + { lgot = expr * lgot; ldata[i] = lgot; } + } + } + + if (lexp != lgot) + abort (); + + /* BINOP = - */ + igot = 32; + iexp = 0; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + int expr = 1; + +#pragma acc atomic capture + { idata[i] = igot; igot -= expr; } + } + } + + if (iexp != igot) + abort (); + + igot = 32; + iexp = 0; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + int expr = 1; + +#pragma acc atomic capture + { igot -= expr; idata[i] = igot; } + } + } + + if (iexp != igot) + abort (); + + igot = 32; + iexp = 0; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + int expr = 1; + +#pragma acc atomic capture + { idata[i] = igot; igot = igot - expr; } + } + } + + if (iexp != igot) + abort (); + + igot = 1; + iexp = 1; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + int expr = 1; + +#pragma acc atomic capture + { idata[i] = igot; igot = expr - igot; } + } + } + + for (i = 0; i < N; i++) + if (i % 2 == 0) + { + if (idata[i] != 1) + abort (); + } + else + { + if (idata[i] != 0) + abort (); + } + + if (iexp != igot) + abort (); + + igot = 1; + iexp = -31; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + int expr = 1; + +#pragma acc atomic capture + { igot = igot - expr; idata[i] = igot; } + } + } + + if (iexp != igot) + abort (); + + igot = 1; + iexp = 1; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + int expr = 1; + +#pragma acc atomic capture + { igot = expr - igot; idata[i] = igot; } + } + } + + for (i = 0; i < N; i++) + if (i % 2 == 0) + { + if (idata[i] != 0) + abort (); + } + else + { + if (idata[i] != 1) + abort (); + } + + if (iexp != igot) + abort (); + + /* BINOP = / */ + lgot = 1LL << 32; + lexp = 1LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + long long expr = 2LL; + +#pragma acc atomic capture + { ldata[i] = lgot; lgot /= expr; } + } + } + + if (lexp != lgot) + abort (); + + lgot = 1LL << 32; + lexp = 1LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + long long expr = 2LL; + +#pragma acc atomic capture + { lgot /= expr; ldata[i] = lgot; } + } + } + + if (lexp != lgot) + abort (); + + lgot = 1LL << 32; + lexp = 1LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + long long expr = 2LL; + +#pragma acc atomic capture + { ldata[i] = lgot; lgot = lgot / expr; } + } + } + + if (lexp != lgot) + abort (); + + lgot = 2LL; + lexp = 2LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + long long expr = 1LL << N; + +#pragma acc atomic capture + { ldata[i] = lgot; lgot = expr / lgot; } + } + } + + if (lexp != lgot) + abort (); + + lgot = 2LL; + lexp = 2LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + long long expr = 1LL << N; + +#pragma acc atomic capture + { lgot = lgot / expr; ldata[i] = lgot; } + } + } + + if (lexp != lgot) + abort (); + + lgot = 2LL; + lexp = 2LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + long long expr = 1LL << N; + +#pragma acc atomic capture + { lgot = expr / lgot; ldata[i] = lgot; } + } + } + + if (lexp != lgot) + abort (); + + /* BINOP = & */ + lgot = ~0LL; + lexp = 0LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + long long expr = ~(1 << i); + +#pragma acc atomic capture + { ldata[i] = lgot; lgot &= expr; } + } + } + + if (lexp != lgot) + abort (); + + lgot = ~0LL; + iexp = 0LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + long long expr = ~(1 << i); + +#pragma acc atomic capture + { lgot &= expr; ldata[i] = lgot; } + } + } + + if (lexp != lgot) + abort (); + + lgot = ~0LL; + lexp = 0LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + long long expr = ~(1 << i); + +#pragma acc atomic capture + { ldata[i] = lgot; lgot = lgot & expr; } + } + } + + if (lexp != lgot) + abort (); + + lgot = ~0LL; + lexp = 0LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + long long expr = ~(1 << i); + +#pragma acc atomic capture + { ldata[i] = lgot; lgot = expr & lgot; } + } + } + + if (lexp != lgot) + abort (); + + lgot = ~0LL; + iexp = 0LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + long long expr = ~(1 << i); + +#pragma acc atomic capture + { lgot = lgot & expr; ldata[i] = lgot; } + } + } + + if (lexp != lgot) + abort (); + + lgot = ~0LL; + lexp = 0LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + long long expr = ~(1 << i); + +#pragma acc atomic capture + { lgot = expr & lgot; ldata[i] = lgot; } + } + } + + if (lexp != lgot) + abort (); + + /* BINOP = ^ */ + lgot = ~0LL; + lexp = 0LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + long long expr = 1 << i; + +#pragma acc atomic capture + { ldata[i] = lgot; lgot ^= expr; } + } + } + + if (lexp != lgot) + abort (); + + lgot = ~0LL; + iexp = 0LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + long long expr = ~(1 << i); + +#pragma acc atomic capture + { lgot ^= expr; ldata[i] = lgot; } + } + } + + if (lexp != lgot) + abort (); + + lgot = ~0LL; + lexp = 0LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + long long expr = ~(1 << i); + +#pragma acc atomic capture + { ldata[i] = lgot; lgot = lgot ^ expr; } + } + } + + if (lexp != lgot) + abort (); + + lgot = ~0LL; + lexp = 0LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + long long expr = ~(1 << i); + +#pragma acc atomic capture + { ldata[i] = lgot; lgot = expr ^ lgot; } + } + } + + if (lexp != lgot) + abort (); + + lgot = ~0LL; + iexp = 0LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + long long expr = ~(1 << i); + +#pragma acc atomic capture + { lgot = lgot ^ expr; ldata[i] = lgot; } + } + } + + if (lexp != lgot) + abort (); + + lgot = ~0LL; + lexp = 0LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + long long expr = ~(1 << i); + +#pragma acc atomic capture + { lgot = expr ^ lgot; ldata[i] = lgot; } + } + } + + if (lexp != lgot) + abort (); + + /* BINOP = | */ + lgot = 0LL; + lexp = ~0LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + long long expr = 1 << i; + +#pragma acc atomic capture + { ldata[i] = lgot; lgot |= expr; } + } + } + + if (lexp != lgot) + abort (); + + lgot = 0LL; + iexp = ~0LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + long long expr = ~(1 << i); + +#pragma acc atomic capture + { lgot |= expr; ldata[i] = lgot; } + } + } + + if (lexp != lgot) + abort (); + + lgot = 0LL; + lexp = ~0LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + long long expr = ~(1 << i); + +#pragma acc atomic capture + { ldata[i] = lgot; lgot = lgot | expr; } + } + } + + if (lexp != lgot) + abort (); + + lgot = 0LL; + lexp = ~0LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + long long expr = ~(1 << i); + +#pragma acc atomic capture + { ldata[i] = lgot; lgot = expr | lgot; } + } + } + + if (lexp != lgot) + abort (); + + lgot = 0LL; + iexp = ~0LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + long long expr = ~(1 << i); + +#pragma acc atomic capture + { lgot = lgot | expr; ldata[i] = lgot; } + } + } + + if (lexp != lgot) + abort (); + + lgot = 0LL; + lexp = ~0LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + long long expr = ~(1 << i); + +#pragma acc atomic capture + { lgot = expr | lgot; ldata[i] = lgot; } + } + } + + if (lexp != lgot) + abort (); + + /* BINOP = << */ + lgot = 1LL; + lexp = 1LL << N; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + long long expr = 1LL; + +#pragma acc atomic capture + { ldata[i] = lgot; lgot <<= expr; } + } + } + + if (lexp != lgot) + abort (); + + lgot = 1LL; + iexp = 1LL << N; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + long long expr = 1LL; + +#pragma acc atomic capture + { lgot <<= expr; ldata[i] = lgot; } + } + } + + if (lexp != lgot) + abort (); + + lgot = 1LL; + lexp = 1LL << N; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + long long expr = 1LL; + +#pragma acc atomic capture + { ldata[i] = lgot; lgot = lgot << expr; } + } + } + + if (lexp != lgot) + abort (); + + lgot = 1LL; + lexp = 2LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < 1; i++) + { + long long expr = 1LL; + +#pragma acc atomic capture + { ldata[i] = lgot; lgot = expr << lgot; } + } + } + + if (lexp != lgot) + abort (); + + lgot = 1LL; + lexp = 2LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < 1; i++) + { + long long expr = 1LL; + +#pragma acc atomic capture + { lgot = lgot << expr; ldata[i] = lgot; } + } + } + + if (lexp != lgot) + abort (); + + lgot = 1LL; + lexp = 2LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < 1; i++) + { + long long expr = 1LL; + +#pragma acc atomic capture + { lgot = expr << lgot; ldata[i] = lgot; } + } + } + + if (lexp != lgot) + abort (); + + /* BINOP = >> */ + lgot = 1LL << N; + lexp = 1LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + long long expr = 1LL; + +#pragma acc atomic capture + { ldata[i] = lgot; lgot >>= expr; } + } + } + + if (lexp != lgot) + abort (); + + lgot = 1LL << N; + iexp = 1LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + long long expr = 1LL; + +#pragma acc atomic capture + { lgot >>= expr; ldata[i] = lgot; } + } + } + + if (lexp != lgot) + abort (); + + lgot = 1LL << N; + lexp = 1LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + long long expr = 1LL; + +#pragma acc atomic capture + { ldata[i] = lgot; lgot = lgot >> expr; } + } + } + + if (lexp != lgot) + abort (); + + lgot = 1LL; + lexp = 1LL << (N - 1); + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < 1; i++) + { + long long expr = 1LL << N; + +#pragma acc atomic capture + { ldata[i] = lgot; lgot = expr >> lgot; } + } + } + + if (lexp != lgot) + abort (); + + lgot = 1LL << N; + lexp = 1LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + long long expr = 1LL; + +#pragma acc atomic capture + { lgot = lgot >> expr; ldata[i] = lgot; } + } + } + + if (lexp != lgot) + abort (); + + lgot = 1LL; + lexp = 1LL << (N - 1); + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < 1; i++) + { + long long expr = 1LL << N; + +#pragma acc atomic capture + { lgot = expr >> lgot; ldata[i] = lgot; } + } + } + + if (lexp != lgot) + abort (); + + // FLOAT FLOAT FLOAT + + /* BINOP = + */ + fgot = 0.0; + fexp = 32.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + float expr = 1.0; + +#pragma acc atomic capture + { fdata[i] = fgot; fgot += expr; } + } + } + + if (fexp != fgot) + abort (); + + fgot = 0.0; + fexp = 32.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + float expr = 1.0; + +#pragma acc atomic capture + { fgot += expr; fdata[i] = fgot; } + } + } + + if (fexp != fgot) + abort (); + + fgot = 0.0; + fexp = 32.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + float expr = 1.0; + +#pragma acc atomic capture + { idata[i] = fgot; fgot = fgot + expr; } + } + } + + if (fexp != fgot) + abort (); + + fgot = 0.0; + fexp = 32.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + float expr = 1.0; + +#pragma acc atomic capture + { fdata[i] = fgot; fgot = expr + fgot; } + } + } + + if (fexp != fgot) + abort (); + + fgot = 0.0; + fexp = 32.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + float expr = 1.0; + +#pragma acc atomic capture + { fgot = fgot + expr; fdata[i] = fgot; } + } + } + + if (fexp != fgot) + abort (); + + fgot = 0.0; + fexp = 32.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + float expr = 1.0; + +#pragma acc atomic capture + { fgot = expr + fgot; fdata[i] = fgot; } + } + } + + if (fexp != fgot) + abort (); + + /* BINOP = * */ + fgot = 1.0; + fexp = 8192.0*8192.0*64.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + float expr = 2.0; + +#pragma acc atomic capture + { fdata[i] = fgot; fgot *= expr; } + } + } + + if (fexp != fgot) + abort (); + + fgot = 1.0; + fexp = 8192.0*8192.0*64.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + float expr = 2.0; + +#pragma acc atomic capture + { fgot *= expr; fdata[i] = fgot; } + } + } + + if (fexp != fgot) + abort (); + + fgot = 1.0; + fexp = 8192.0*8192.0*64.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + float expr = 2.0; + +#pragma acc atomic capture + { fdata[i] = fgot; fgot = fgot * expr; } + } + } + + if (fexp != fgot) + abort (); + + fgot = 1.0; + fexp = 8192.0*8192.0*64.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + float expr = 2.0; + +#pragma acc atomic capture + { fdata[i] = fgot; fgot = expr * fgot; } + } + } + + if (fexp != fgot) + abort (); + + lgot = 1LL; + lexp = 1LL << 32; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + long long expr = 2LL; + +#pragma acc atomic capture + { lgot = lgot * expr; ldata[i] = lgot; } + } + } + + if (lexp != lgot) + abort (); + + fgot = 1.0; + fexp = 8192.0*8192.0*64.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + long long expr = 2; + +#pragma acc atomic capture + { fgot = expr * fgot; fdata[i] = fgot; } + } + } + + if (fexp != fgot) + abort (); + + /* BINOP = - */ + fgot = 32.0; + fexp = 0.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + float expr = 1.0; + +#pragma acc atomic capture + { fdata[i] = fgot; fgot -= expr; } + } + } + + if (fexp != fgot) + abort (); + + fgot = 32.0; + fexp = 0.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + float expr = 1.0; + +#pragma acc atomic capture + { fgot -= expr; fdata[i] = fgot; } + } + } + + if (fexp != fgot) + abort (); + + fgot = 32.0; + fexp = 0.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + float expr = 1.0; + +#pragma acc atomic capture + { fdata[i] = fgot; fgot = fgot - expr; } + } + } + + if (fexp != fgot) + abort (); + + fgot = 1.0; + fexp = 1.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + float expr = 1.0; + +#pragma acc atomic capture + { fdata[i] = fgot; fgot = expr - fgot; } + } + } + + for (i = 0; i < N; i++) + if (i % 2 == 0) + { + if (fdata[i] != 1.0) + abort (); + } + else + { + if (fdata[i] != 0.0) + abort (); + } + + if (fexp != fgot) + abort (); + + fgot = 1.0; + fexp = -31.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + float expr = 1.0; + +#pragma acc atomic capture + { fgot = fgot - expr; fdata[i] = fgot; } + } + } + + if (fexp != fgot) + abort (); + + fgot = 1.0; + fexp = 1.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + float expr = 1.0; + +#pragma acc atomic capture + { fgot = expr - fgot; fdata[i] = fgot; } + } + } + + for (i = 0; i < N; i++) + if (i % 2 == 0) + { + if (fdata[i] != 0.0) + abort (); + } + else + { + if (fdata[i] != 1.0) + abort (); + } + + if (fexp != fgot) + abort (); + + /* BINOP = / */ + fgot = 8192.0*8192.0*64.0; + fexp = 1.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + float expr = 2.0; + +#pragma acc atomic capture + { fdata[i] = fgot; fgot /= expr; } + } + } + + if (fexp != fgot) + abort (); + + fgot = 8192.0*8192.0*64.0; + fexp = 1.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + float expr = 2.0; + +#pragma acc atomic capture + { fgot /= expr; fdata[i] = fgot; } + } + } + + if (fexp != fgot) + abort (); + + fgot = 8192.0*8192.0*64.0; + fexp = 1.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + float expr = 2.0; + +#pragma acc atomic capture + { fdata[i] = fgot; fgot = fgot / expr; } + } + } + + if (fexp != fgot) + abort (); + + fgot = 8192.0*8192.0*64.0; + fexp = 1.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + float expr = 1.0; + +#pragma acc atomic capture + { fdata[i] = fgot; fgot = expr / fgot; } + } + } + + if (fexp != fgot) + abort (); + + fgot = 4.0; + fexp = 4.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + long long expr = 1LL << N; + +#pragma acc atomic capture + { fgot = fgot / expr; fdata[i] = fgot; } + } + } + + if (fexp != fgot) + abort (); + + fgot = 4.0; + fexp = 4.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i = 0; i < N; i++) + { + float expr = 2.0; + +#pragma acc atomic capture + { fgot = expr / fgot; fdata[i] = fgot; } + } + } + + if (fexp != fgot) + abort (); + + return 0; +}