From patchwork Mon Oct 19 23:06:25 2015
X-Patchwork-Submitter: Nathan Sidwell
X-Patchwork-Id: 532730
To: GCC Patches
From: Nathan Sidwell
Subject: [gomp4] loop cleanup
Message-ID: <56257771.90907@acm.org>
Date: Mon, 19 Oct 2015 19:06:25 -0400

I've committed this to gomp4.

1) A small cleanup combining the bodies of two identical conditionals.

2) Replace the two OpenACC thread numbering expanders with a single function, and move it nearer its now sole user.

nathan

2015-10-19  Nathan Sidwell

        * omp-low.c (scan_omp_for): Combine OpenACC conditional.
        (expand_oacc_get_num_threads, expand_oacc_get_thread_num): Delete.
        (oacc_thread_numbers): New.
        (oacc_xform_loop): Correct comment, use oacc_thread_numbers.

Index: gcc/omp-low.c
===================================================================
--- gcc/omp-low.c (revision 229002)
+++ gcc/omp-low.c (working copy)
@@ -2911,11 +2911,7 @@ scan_omp_for (gomp_for *stmt, omp_contex
                         "argument not permitted on %<%s%> clause in"
                         " OpenACC %", check);
           }
-    }
 
-  if (is_gimple_omp_oacc (stmt))
-    {
-      omp_context *tgt = enclosing_target_ctx (ctx);
       if (tgt && is_oacc_kernels (tgt))
         {
           /* Strip out reductions, as they are not handled yet.  */
@@ -5131,80 +5127,6 @@ is_atomic_compatible_reduction (tree var
 }
 
 
-/* Find the total number of threads used by a region partitioned by
-   GWV_BITS.  Setup code required for the calculation is added to SEQ.  Note
-   that this is currently used from both OMP-lowering and OMP-expansion phases,
-   and uses builtins specific to NVidia PTX: this will need refactoring into a
-   generic interface when support for other targets is added.  */
-
-static tree
-expand_oacc_get_num_threads (gimple_seq *seq, int gwv_bits)
-{
-  tree res = build_int_cst (unsigned_type_node, 1);
-  unsigned ix;
-
-  for (ix = GOMP_DIM_GANG; ix != GOMP_DIM_MAX; ix++)
-    if (GOMP_DIM_MASK(ix) & gwv_bits)
-      {
-        tree arg = build_int_cst (integer_type_node, ix);
-        tree count = create_tmp_var (integer_type_node);
-        gimple *call = gimple_build_call_internal (IFN_GOACC_DIM_SIZE, 1, arg);
-
-        gimple_call_set_lhs (call, count);
-        gimple_seq_add_stmt (seq, call);
-        res = fold_build2 (MULT_EXPR, integer_type_node, res, count);
-      }
-
-  return res;
-}
-
-/* Find the current thread number to use within a region partitioned by
-   GWV_BITS.  Setup code required for the calculation is added to SEQ.  See
-   note for expand_oacc_get_num_threads above re: builtin usage.  */
-
-static tree
-expand_oacc_get_thread_num (gimple_seq *seq, int gwv_bits)
-{
-  tree res = NULL_TREE;
-  unsigned ix;
-
-  /* Start at gang level, and examine relevant dimension indices.  */
-  for (ix = GOMP_DIM_GANG; ix != GOMP_DIM_MAX; ix++)
-    if (GOMP_DIM_MASK (ix) & gwv_bits)
-      {
-        tree arg = build_int_cst (unsigned_type_node, ix);
-
-        if (res)
-          {
-            /* We had an outer index, so scale that by the size of
-               this dimension.  */
-            tree n = create_tmp_var (integer_type_node);
-            gimple *call
-              = gimple_build_call_internal (IFN_GOACC_DIM_SIZE, 1, arg);
-
-            gimple_call_set_lhs (call, n);
-            gimple_seq_add_stmt (seq, call);
-            res = fold_build2 (MULT_EXPR, integer_type_node, res, n);
-          }
-
-        /* Determine index in this dimension.  */
-        tree id = create_tmp_var (integer_type_node);
-        gimple *call = gimple_build_call_internal (IFN_GOACC_DIM_POS, 1, arg);
-
-        gimple_call_set_lhs (call, id);
-        gimple_seq_add_stmt (seq, call);
-        if (res)
-          res = fold_build2 (PLUS_EXPR, integer_type_node, res, id);
-        else
-          res = id;
-      }
-
-  if (res == NULL_TREE)
-    res = build_int_cst (integer_type_node, 0);
-
-  return res;
-}
-
 /* Lower the OpenACC reductions of CLAUSES for compute axis LEVEL
    (which might be a placeholder).  INNER is true if this is an inner
    axis of a multi-axis loop.  FORK and JOIN are (optional) fork and
@@ -16904,16 +16826,63 @@ make_pass_late_lower_omp (gcc::context *
   return new pass_late_lower_omp (ctxt);
 }
 
+/* Find the number of threads (POS = false), or thread number (POS =
+   true) for an OpenACC region partitioned as MASK.  Setup code
+   required for the calculation is added to SEQ.  */
+
+static tree
+oacc_thread_numbers (bool pos, int mask, gimple_seq *seq)
+{
+  tree res = pos ? NULL_TREE : build_int_cst (unsigned_type_node, 1);
+  unsigned ix;
+
+  /* Start at gang level, and examine relevant dimension indices.  */
+  for (ix = GOMP_DIM_GANG; ix != GOMP_DIM_MAX; ix++)
+    if (GOMP_DIM_MASK (ix) & mask)
+      {
+        tree arg = build_int_cst (unsigned_type_node, ix);
+
+        if (res)
+          {
+            /* We had an outer index, so scale that by the size of
+               this dimension.  */
+            tree n = create_tmp_var (integer_type_node);
+            gimple *call
+              = gimple_build_call_internal (IFN_GOACC_DIM_SIZE, 1, arg);
+
+            gimple_call_set_lhs (call, n);
+            gimple_seq_add_stmt (seq, call);
+            res = fold_build2 (MULT_EXPR, integer_type_node, res, n);
+          }
+        if (pos)
+          {
+            /* Determine index in this dimension.  */
+            tree id = create_tmp_var (integer_type_node);
+            gimple *call = gimple_build_call_internal
+              (IFN_GOACC_DIM_POS, 1, arg);
+
+            gimple_call_set_lhs (call, id);
+            gimple_seq_add_stmt (seq, call);
+            if (res)
+              res = fold_build2 (PLUS_EXPR, integer_type_node, res, id);
+            else
+              res = id;
+          }
+      }
+
+  if (res == NULL_TREE)
+    res = build_int_cst (integer_type_node, 0);
+
+  return res;
+}
+
 /* Transform IFN_GOACC_LOOP calls to actual code.  See
    expand_oacc_for for where these are generated.  At the vector
    level, we stride loops, such that each member of a warp will
    operate on adjacent iterations.  At the worker and gang level,
    each gang/warp executes a set of contiguous iterations.  Chunking
    can override this such that each iteration engine executes a
-   contiguous chunk, and then moves on to stride to the next chunk.
-
-   TODO: As with expand_oacc_for, the presence of GWV and CHUNK_SIZE
-   parameters here is an intermediate step.  */
+   contiguous chunk, and then moves on to stride to the next chunk.  */
 
 static void
 oacc_xform_loop (gcall *call)
@@ -16964,7 +16933,7 @@ oacc_xform_loop (gcall *call)
         {
           /* chunk_max
              = (range - dir) / (chunks * step * num_threads) + dir  */
-          tree per = expand_oacc_get_num_threads (&seq, mask);
+          tree per = oacc_thread_numbers (false, mask, &seq);
           per = fold_convert (type, per);
           chunk_size = fold_convert (type, chunk_size);
           per = fold_build2 (MULT_EXPR, type, per, chunk_size);
@@ -16981,7 +16950,7 @@ oacc_xform_loop (gcall *call)
            step by the inner volume.  */
         unsigned volume = striding ? mask : inner_mask;
 
-        r = expand_oacc_get_num_threads (&seq, volume);
+        r = oacc_thread_numbers (false, volume, &seq);
         r = build2 (MULT_EXPR, type, fold_convert (type, r), step);
       }
       break;
@@ -16989,14 +16958,14 @@ oacc_xform_loop (gcall *call)
     case IFN_GOACC_LOOP_OFFSET:
       if (striding)
         {
-          r = expand_oacc_get_thread_num (&seq, mask);
+          r = oacc_thread_numbers (true, mask, &seq);
           r = fold_convert (diff_type, r);
         }
       else
         {
           tree span;
-          tree inner_size = expand_oacc_get_num_threads (&seq, inner_mask);
-          tree outer_size = expand_oacc_get_num_threads (&seq, outer_mask);
+          tree inner_size = oacc_thread_numbers (false, inner_mask, &seq);
+          tree outer_size = oacc_thread_numbers (false, outer_mask, &seq);
           tree volume = fold_build2 (MULT_EXPR, TREE_TYPE (inner_size),
                                      inner_size, outer_size);
 
@@ -17019,11 +16988,11 @@ oacc_xform_loop (gcall *call)
               span = build2 (MULT_EXPR, diff_type, span, inner_size);
             }
 
-          r = expand_oacc_get_thread_num (&seq, outer_mask);
+          r = oacc_thread_numbers (true, outer_mask, &seq);
           r = fold_convert (diff_type, r);
           r = build2 (MULT_EXPR, diff_type, r, span);
 
-          tree inner = expand_oacc_get_thread_num (&seq, inner_mask);
+          tree inner = oacc_thread_numbers (true, inner_mask, &seq);
           inner = fold_convert (diff_type, inner);
           r = fold_build2 (PLUS_EXPR, diff_type, r, inner);
 
@@ -17054,13 +17023,13 @@ oacc_xform_loop (gcall *call)
         {
           chunk_size = fold_convert (diff_type, chunk_size);
 
-          span = expand_oacc_get_num_threads (&seq, inner_mask);
+          span = oacc_thread_numbers (false, inner_mask, &seq);
           span = fold_convert (diff_type, span);
           span = fold_build2 (MULT_EXPR, diff_type, span, chunk_size);
         }
       else
         {
-          tree per = expand_oacc_get_num_threads (&seq, mask);
+          tree per = oacc_thread_numbers (false, mask, &seq);
           per = fold_convert (diff_type, per);
           per = build2 (MULT_EXPR, diff_type, per, step);
           span = build2 (MINUS_EXPR, diff_type, range, dir);
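
For anyone following the arithmetic rather than the GIMPLE, here is a minimal sketch in plain C of what oacc_thread_numbers appears to compute. It is not the GCC code and builds no trees or statements. The names DIM_GANG, DIM_WORKER, DIM_VECTOR, DIM_MAX, DIM_MASK, thread_numbers, size and posn are hypothetical stand-ins; size[] and posn[] play the role of the values that the IFN_GOACC_DIM_SIZE and IFN_GOACC_DIM_POS calls would yield at run time.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for GOMP_DIM_GANG .. GOMP_DIM_MAX.  */
enum { DIM_GANG, DIM_WORKER, DIM_VECTOR, DIM_MAX };
#define DIM_MASK(ix) (1u << (ix))

/* Integer model of oacc_thread_numbers (assumption: SIZE[ix] and
   POSN[ix] are the extent and position in dimension IX).
   POS = false: product of the sizes of the dimensions in MASK.
   POS = true: linearized position across the dimensions in MASK,
   outermost (gang) first.  */
int
thread_numbers (bool pos, unsigned mask, const int size[DIM_MAX],
                const int posn[DIM_MAX])
{
  /* HAVE_RES models "res != NULL_TREE" in the patch: the count form
     starts from 1, the position form starts with no value.  */
  bool have_res = !pos;
  int res = 1;

  for (unsigned ix = DIM_GANG; ix != DIM_MAX; ix++)
    if (DIM_MASK (ix) & mask)
      {
        /* An outer value exists, so scale it by this dimension.  */
        if (have_res)
          res *= size[ix];

        if (pos)
          {
            /* Add (or start from) the index in this dimension.  */
            res = have_res ? res + posn[ix] : posn[ix];
            have_res = true;
          }
      }

  /* An empty mask in the position form yields zero.  */
  return have_res ? res : 0;
}
```

With sizes {4, 2, 32} and positions {3, 1, 5} for gang, worker and vector, the count form gives 4 * 2 * 32 = 256 and the position form gives (3 * 2 + 1) * 32 + 5 = 229. An empty mask gives 1 and 0 respectively.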