Subject: Re: [OpenACC 6/11] Reduction initialization
From: Nathan Sidwell
Date: Wed, 21 Oct 2015 15:24:13 -0400
To: GCC Patches
Cc: Jakub Jelinek, Bernd Schmidt, Jason Merrill, "Joseph S. Myers"

This patch is a temporary measure to avoid breaking reductions until I post the reductions patch set, which builds on this.

Currently OpenACC reductions are handled by:
(a) spawning all threads throughout the offload region;
(b) having each thread write to its own allocated slot in a 'reductions array', indexed by its thread number;
(c) having the host collate the reduction values after the region.

This is clearly a rather restricted implementation of reductions.
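For readers new to this code path, steps (a) to (c) can be modelled on the host as a minimal C sketch. It assumes a '+' reduction over an int array and simulates the threads with a serial loop. All function names are hypothetical and are not GCC or libgomp APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Work done by one simulated thread TID: sum a strided chunk of DATA and
   store the partial result in slot TID of the reductions array.  */
static void
thread_partial_sum (const int *data, size_t n, size_t tid, size_t nthreads,
                    int *reductions)
{
  int partial = 0;
  for (size_t i = tid; i < n; i += nthreads)
    partial += data[i];
  reductions[tid] = partial;
}

/* Step (c): the host folds every slot into the original variable VAR.  */
static int
host_collate_sum (int var, const int *reductions, size_t nthreads)
{
  for (size_t i = 0; i < nthreads; i++)
    var += reductions[i];
  return var;
}

/* Steps (a) to (c).  The serial loop over TID stands in for NTHREADS
   threads running concurrently.  Correctness relies on every thread
   writing its slot before the host collates.  */
static int
reduce_sum_all_threads (const int *data, size_t n, int var,
                        int *reductions, size_t nthreads)
{
  assert (nthreads > 0);
  for (size_t tid = 0; tid < nthreads; tid++)
    thread_partial_sum (data, n, tid, nthreads, reductions);
  return host_collate_sum (var, reductions, nthreads);
}
```

The host reads every one of the NTHREADS slots. As a result, any slot that no thread wrote would feed an arbitrary value into the result.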
With loop partitioning implemented, however, not all threads execute the loop. In fact, a loop lacking any gang, worker or vector specifier won't be partitioned at all (until I commit the 'auto' implementation). This leaves some entries in the reduction array uninitialized.

This patch takes the brute-force approach of initializing the reductions array on the host before offloading and then copying it to the device. This is why the array's map kind changes from GOMP_MAP_FORCE_FROM to GOMP_MAP_FORCE_TOFROM. At the end of the region, any slots that weren't written therefore still hold the initial value for the reduction operator (as returned by omp_reduction_init_op). That value does not disturb the reduction result.

This code should be short-lived...

nathan

2015-10-20  Nathan Sidwell

	* omp-low.c (oacc_init_reduction_array): New.
	(oacc_initialize_reduction_data): Initialize array.

Index: gcc/omp-low.c
===================================================================
--- gcc/omp-low.c	(revision 229101)
+++ gcc/omp-low.c	(working copy)
@@ -12202,6 +13008,71 @@ oacc_gimple_assign (tree dest, tree_code
   gimplify_assign (dest, result, seq);
 }
 
+/* Initialize the reduction array with default values.  */
+
+static void
+oacc_init_reduction_array (tree array, tree init, tree nthreads,
+                           gimple_seq *stmt_seqp)
+{
+  tree type = TREE_TYPE (TREE_TYPE (array));
+  tree x, loop_header, loop_body, loop_exit;
+  gimple *stmt;
+
+  /* Create for loop.
+
+     let init = the initial value for the reduction operator
+     let array = reduction variable array
+
+     for (i = 0; i < nthreads; i++)
+       array[i] = init
+  */
+
+  loop_header = create_artificial_label (UNKNOWN_LOCATION);
+  loop_body = create_artificial_label (UNKNOWN_LOCATION);
+  loop_exit = create_artificial_label (UNKNOWN_LOCATION);
+
+  /* Create and initialize an index variable.  */
+  tree ix = create_tmp_var (sizetype);
+  gimplify_assign (ix, fold_build1 (NOP_EXPR, sizetype, integer_zero_node),
+                   stmt_seqp);
+
+  /* Insert the loop header label here.  */
+  gimple_seq_add_stmt (stmt_seqp, gimple_build_label (loop_header));
+
+  /* Exit loop if ix >= nthreads.  */
+  x = create_tmp_var (sizetype);
+  gimplify_assign (x, fold_build1 (NOP_EXPR, sizetype, nthreads), stmt_seqp);
+  stmt = gimple_build_cond (GE_EXPR, ix, x, loop_exit, loop_body);
+  gimple_seq_add_stmt (stmt_seqp, stmt);
+
+  /* Insert the loop body label here.  */
+  gimple_seq_add_stmt (stmt_seqp, gimple_build_label (loop_body));
+
+  /* Calculate the array offset.  */
+  tree offset = create_tmp_var (sizetype);
+  gimplify_assign (offset, TYPE_SIZE_UNIT (type), stmt_seqp);
+  stmt = gimple_build_assign (offset, MULT_EXPR, offset, ix);
+  gimple_seq_add_stmt (stmt_seqp, stmt);
+
+  tree ptr = create_tmp_var (TREE_TYPE (array));
+  stmt = gimple_build_assign (ptr, POINTER_PLUS_EXPR, array, offset);
+  gimple_seq_add_stmt (stmt_seqp, stmt);
+
+  /* Assign init.  */
+  gimplify_assign (build_simple_mem_ref (ptr), init, stmt_seqp);
+
+  /* Increment the induction variable.  */
+  tree one = fold_build1 (NOP_EXPR, sizetype, integer_one_node);
+  stmt = gimple_build_assign (ix, PLUS_EXPR, ix, one);
+  gimple_seq_add_stmt (stmt_seqp, stmt);
+
+  /* Go back to the top of the loop.  */
+  gimple_seq_add_stmt (stmt_seqp, gimple_build_goto (loop_header));
+
+  /* Place the loop exit label here.  */
+  gimple_seq_add_stmt (stmt_seqp, gimple_build_label (loop_exit));
+}
+
 /* Helper function to initialize local data for the reduction arrays.
    The reduction arrays need to be placed inside the calling function
    for accelerators, or else the host won't be able to preform the final
@@ -12261,12 +13132,18 @@ oacc_initialize_reduction_data (tree cla
   gimple_call_set_lhs (stmt, array);
   gimple_seq_add_stmt (stmt_seqp, stmt);
 
+  /* Initialize array.  */
+  tree init = omp_reduction_init_op (OMP_CLAUSE_LOCATION (c),
+                                     OMP_CLAUSE_REDUCTION_CODE (c),
+                                     type);
+  oacc_init_reduction_array (array, init, nthreads, stmt_seqp);
+
   /* Map this array into the accelerator.  */
 
   /* Add the reduction array to the list of clauses.  */
   tree x = array;
   t = build_omp_clause (gimple_location (ctx->stmt), OMP_CLAUSE_MAP);
-  OMP_CLAUSE_SET_MAP_KIND (t, GOMP_MAP_FORCE_FROM);
+  OMP_CLAUSE_SET_MAP_KIND (t, GOMP_MAP_FORCE_TOFROM);
   OMP_CLAUSE_DECL (t) = x;
   OMP_CLAUSE_CHAIN (t) = NULL;
   if (oc)
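The effect of the patch can be modelled with a host-only C sketch. It assumes only '+' and '*' reductions and lets an operator enum stand in for the tree codes GCC uses. The names red_op, red_init_value, init_reduction_array and run_region are hypothetical. In the patch itself, the fill value comes from omp_reduction_init_op and the fill loop is emitted as GIMPLE by oacc_init_reduction_array.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative operator set (hypothetical; GCC uses tree codes).  */
enum red_op { RED_PLUS, RED_MULT };

/* Initial (identity) value for OP, playing the role of
   omp_reduction_init_op in this sketch.  */
static long
red_init_value (enum red_op op)
{
  return op == RED_PLUS ? 0 : 1;
}

static long
red_combine (enum red_op op, long a, long b)
{
  return op == RED_PLUS ? a + b : a * b;
}

/* Same effect as the emitted GIMPLE loop:
   for (i = 0; i < nthreads; i++) array[i] = init;  */
static void
init_reduction_array (long *array, long init, size_t nthreads)
{
  for (size_t i = 0; i < nthreads; i++)
    array[i] = init;
}

/* Only the first NACTIVE of NTHREADS threads execute the loop, each
   writing one entry of PARTIALS to its slot; the host then collates
   every slot.  */
static long
run_region (enum red_op op, long var, const long *partials, size_t nactive,
            long *array, size_t nthreads)
{
  assert (nactive <= nthreads);
  /* Host, before offloading; a TOFROM mapping copies this to the device.  */
  init_reduction_array (array, red_init_value (op), nthreads);
  /* Device: threads that do not execute the loop leave their slot alone.  */
  for (size_t tid = 0; tid < nactive; tid++)
    array[tid] = partials[tid];
  /* Host, after the region: fold all NTHREADS slots, used or not.  */
  for (size_t i = 0; i < nthreads; i++)
    var = red_combine (op, var, array[i]);
  return var;
}
```

Filling the slots with 0 would happen to be harmless for '+'. A '*' reduction, however, would then collapse to 0, which is why the fill value has to depend on the reduction operator.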