[OG11,committed,09/22] openacc: Use Graphite for dependence analysis in "kernels" regions

This commit changes the handling of OpenACC "kernels" to use Graphite
for dependence analysis. To this end, pass_omp_oacc_kernels_decompose
now introduces a new internal representation for "kernels" regions
that are to be analyzed by Graphite.  This is now the
default for all "kernels" regions, but the old handling is still
available through the command line parameter
"--param=openacc-kernels=decompose-parloops".  The handling of this
new region type in the omp lowering and omp offloading passes follows
the existing handling for "parallel" regions.  This replaces the
specialized handling for "kernels" regions that was previously used
and which was limited in many ways.
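
For illustration, here is a minimal sketch of the kind of source this
change affects.  The function name "saxpy_kernels" and its arguments are
hypothetical and not taken from the patch.  The loop carries no
"independent" or "gang" clause, so the compiler has to find out by
itself whether its iterations can run in parallel:

```c
#include <stddef.h>

/* Hypothetical example, not part of the patch: a "kernels" region with
   one loop and no explicit parallelism clauses.  Each iteration reads
   and writes only y[i], so there is no loop-carried dependence.  */
void
saxpy_kernels (size_t n, float a, const float *restrict x, float *restrict y)
{
#pragma acc kernels copyin(x[0:n]) copy(y[0:n])
  {
    for (size_t i = 0; i < n; ++i)
      y[i] = a * x[i] + y[i];
  }
}
```

Built with -fopenacc, such a region would go through the new
Graphite-based handling by default; adding
--param=openacc-kernels=decompose-parloops should select the previous
handling instead.  Without -fopenacc the pragma is ignored and the loop
simply runs sequentially.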

Graphite is adjusted to be able to analyze the OpenACC functions that
get outlined from the "kernels" regions. It is enabled to handle the
internal function calls that contain information about OpenACC
constructs. In some places where function calls would be rejected by
Graphite, those calls need to be ignored. In other places, information
about the loop step, bounds etc. needs to be extracted from the
calls. The goal is to enable an analysis of the original loop
parameters although the omp lowering and expansion steps have already
modified the loop structure.  Some parallelization-enabling constructs
such as OpenACC "reduction" and "private"/"firstprivate" clauses must
be recognized and the data-dependences must be adjusted to reflect the
semantics of those constructs.  The data-dependence analysis step in
Graphite has so far been tied to the code generation step.  This
commit introduces a separate data-dependence analysis step that avoids
the code generation.  This is necessary because adjusting the code
generation to create a correct OpenACC loop structure would require
considerable effort, and the goal of this commit is to implement
the dependence analysis only.  The ability to use Graphite for
dependence analysis without its code generation might be of
independent interest, but it is so far used for OpenACC purposes
only. In general, all changes to Graphite try to avoid affecting other
uses of Graphite as much as possible.
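
As a sketch of why the "reduction" and "private" clauses matter for the
dependence analysis, consider the following hypothetical loop (the
function "sum_of_squares" is an assumption made for illustration only).
Both scalars are touched in every iteration, so an analysis that ignores
the clauses would see dependences between iterations:

```c
#include <stddef.h>

/* Hypothetical example, not part of the patch.  "tmp" is written before
   it is read in every iteration, and "sum" is only accumulated.  Viewed
   naively, both scalars are shared by all iterations and carry
   dependences from one iteration to the next.  The "private" and
   "reduction" clauses state that these dependences need not prevent
   parallelization.  */
float
sum_of_squares (size_t n, const float *x)
{
  float sum = 0.0f;
  float tmp;

#pragma acc kernels copyin(x[0:n])
  {
#pragma acc loop private(tmp) reduction(+:sum)
    for (size_t i = 0; i < n; ++i)
      {
        tmp = x[i] * x[i];
        sum += tmp;
      }
  }
  return sum;
}
```

Whether a given compiler build actually parallelizes this loop is not
claimed here; the example only shows the source-level constructs whose
semantics the adjusted data-dependences are meant to reflect.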

gcc/ChangeLog:

        * Makefile.in: Add graphite-oacc.o.
        * cfgloop.c (alloc_loop): Set can_be_parallel_valid_p to false.
        * cfgloop.h: Add can_be_parallel_valid_p field.
        * cfgloopmanip.c (copy_loop_info): Add assert.
        * config/nvptx/nvptx.c (nvptx_goacc_reduction_setup):
        * doc/invoke.texi: Adjust param openacc-kernels description.
        * doc/passes.texi: Adjust pass_ipa_oacc_kernels description.
        * flag-types.h (enum openacc_kernels): Add
        OPENACC_KERNELS_DECOMPOSE_PARLOOPS.
        * gimple-pretty-print.c (dump_gimple_omp_target): Handle
        GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE.
        * gimple.h (enum gf_mask): Add
        GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE and
        widen GF_OMP_TARGET_KIND_MASK.
        (is_gimple_omp_oacc): Handle
        GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE.
        (is_gimple_omp_offloaded): Likewise.
        * gimplify.c (gimplify_omp_for): Enable reduction localization
        for "kernels" regions.
        (gimplify_omp_workshare): Likewise.
        * graphite-dependences.c (scop_get_reads_and_writes): Handle
        "kills" and "reduction" PDRs.
        (apply_schedule_on_deps): Add dump output for intermediate
        steps of the dependence computation to enable understanding
        of unexpected dependences.
        (carries_deps): Likewise.
        (scop_get_dependences): Handle "kill" operations and add dump
        output.
        * graphite-isl-ast-to-gimple.c (visit_schedule_loop_node): New function.
        (graphite_oacc_analyze_scop): New function.
        * graphite-optimize-isl.c (optimize_isl): Remove "static" and
        add argument to identify OpenACC use; don't fail on unchanged
        schedule in this case.
        * graphite-poly.c (new_poly_dr): Handle "kills".
        (print_pdr): Likewise.
        (new_gimple_poly_bb): Likewise.
        (free_gimple_poly_bb): Likewise.
        (new_scop): Handle "reduction", "private", and "firstprivate"
        hash sets.
        (free_scop): Likewise.
        (print_isl_space): New function.
        (debug_isl_space): New function.
        * graphite-scop-detection.c (scop_detection::can_represent_loop):
        Don't fail if niter is 0 in OpenACC functions.
        (scop_detection::add_scop): Don't reject regions with only one
        loop in OpenACC functions.
        (ignored_oacc_internal_call_p): New function.
        (scan_tree_for_params): Handle VIEW_CONVERT_EXPR.
        (stmt_has_side_effects): Ignore internal OpenACC function calls.
        (add_write): Likewise.
        (add_read): Likewise.
        (add_kill): New function.
        (add_kills): New function.
        (add_oacc_kills): New function.
        (try_generate_gimple_bb): Kill false dependences for OpenACC
        "private"/"firstprivate" vars.
        (gather_bbs::gather_bbs): Determine OpenACC
        "private"/"firstprivate" vars in region.
        (gather_bbs::before_dom_children): Add assert.
        (determine_openacc_reductions): New function.
        (build_scops): Determine OpenACC "reduction" vars in SCoP.
        * graphite-sese-to-poly.c (oacc_ifn_call_extract): New declaration.
        (oacc_internal_call_p): New function.
        (build_poly_dr): Ignore internal OpenACC function calls,
        handle "reduction" refs.
        (build_poly_sr): Likewise; handle "kill" operations.
        * graphite.c (graphite_transform_loops): Accept functions with
        only a single loop.
        (oacc_enable_graphite_p): New function.
        (gate_graphite_transforms): Enable pass on OpenACC functions.
        * graphite.h (enum poly_dr_type): Add PDR_KILL.
        (struct poly_dr): Add "is_reduction" field.
        (new_poly_dr): Add argument to declaration.
        (pdr_kill_p): New function.
        (print_isl_space): New declaration.
        (debug_isl_space): New declaration.
        (struct scop): Add fields "reductions_vars",
        "oacc_firstprivate_vars", and "oacc_private_scalars".
        (optimize_isl): New declaration.
        (graphite_oacc_analyze_scop): New declaration.
        * internal-fn.c (expand_UNIQUE): Handle
        IFN_UNIQUE_OACC_PRIVATE_SCALAR and IFN_UNIQUE_OACC_FIRSTPRIVATE.
        * internal-fn.h: Add OACC_PRIVATE_SCALAR and OACC_FIRSTPRIVATE.
        * omp-expand.c (struct omp_region): Adjust comment.
        (expand_omp_taskloop_for_inner):
        (expand_omp_for): Add asserts about expected "kernels" region types.
        (mark_loops_in_oacc_kernels_region): Likewise.
        (expand_omp_target): Likewise; handle
        GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE.
        (build_omp_regions_1): Handle
        GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE.
        Likewise.
        (omp_make_gimple_edges): Likewise.
        * omp-general.c (oacc_get_kernels_attrib): New function.
        (oacc_get_fn_dim_size): Allow argument to be NULL.
        * omp-general.h (oacc_get_kernels_attrib): New declaration.
        * omp-low.c (struct omp_context): Add fields
        "oacc_firstprivate_vars" and "oacc_private_scalars".
        (was_originally_oacc_kernels): New function.
        (is_oacc_kernels):
        (is_oacc_kernels_decomposed_graphite_part): New function.
        (new_omp_context): Allocate "oacc_firstprivate_vars" and
        "oacc_private_scalars" ...
        (delete_omp_context): ... and free from here.
        (oacc_record_firstprivate_var_clauses): New function.
        (oacc_record_private_scalars): New function.
        (scan_sharing_clauses): Call functions to record "private"
        scalars and "firstprivate" variables.
        (check_oacc_kernel_gwv): Add assert.
        (ctx_in_oacc_kernels_region): Handle
        GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE.
        (scan_omp_for): Likewise.
        (check_omp_nesting_restrictions): Likewise.
        (lower_oacc_head_mark): Likewise.
        (lower_omp_for): Likewise.
        (lower_omp_target): Create "private" and "firstprivate" marker
        call statements.
        (lower_oacc_head_tail): Adjust "private" and "firstprivate"
        marker calls.
        (lower_oacc_reductions): Emit "private" and "firstprivate"
        marker call statements.
        (make_oacc_firstprivate_vars_marker): New function.
        (make_oacc_private_scalars_marker): New function.
        * omp-oacc-kernels-decompose.cc (adjust_region_code_walk_stmt_fn):
        Assign GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE to
        region using the new "kernels" handling.
        (make_region_seq): Adjust default region type for new
        "kernels" handling; no more exceptions, let Graphite handle everything.
        (make_region_loop_nest): Likewise; add dump output and assert.
        (adjust_nested_loop_clauses): Stop creating "auto" clauses if
        loop has "independent", "gang" etc.
        (transform_kernels_loop_clauses): Likewise.
        * omp-offload.c (oacc_extract_loop_call): New function.
        (oacc_loop_get_cfg_loop): New function.
        (can_be_parallel_str): New function.
        (oacc_loop_can_be_parallel_p): New function.
        (oacc_parallel_kernels_graphite_fun_p): New function.
        (oacc_parallel_fun_p): New function.
        (oacc_loop_transform_auto_into_independent): New function, ...
        (oacc_loop_fixed_partitions): ... called from here to transfer
        the result of Graphite's analysis to the loop.
        (execute_oacc_loop_designation): Handle oacc
        functions with "parallel_kernels_graphite" attribute.
        (execute_oacc_device_lower): Handle
        IFN_UNIQUE_OACC_PRIVATE_SCALAR and IFN_UNIQUE_OACC_FIRSTPRIVATE.
        * omp-offload.h (oacc_extract_loop_call): Add declaration.
        * params.opt: Add "param=openacc-kernels" value "decompose-parloops".
        * sese.c (scalar_evolution_in_region): "Redirect" SCEV
        analysis to outer loop for IFN_GOACC_LOOP calls.
        * sese.h: Add field "kill_scalar_refs".
        * tree-chrec.c (chrec_fold_plus_1): Handle VIEW_CONVERT_EXPR
        like CASE_CONVERT.
        * tree-data-ref.c (dump_data_reference): Include
        DR_BASE_ADDRESS and DR_OFFSET in dump output.
        (get_references_in_stmt): Don't reject OpenACC internal function
        calls.
        (graphite_find_data_references_in_stmt): Remove unused variable.
        * tree-parloops.c (pass_parallelize_loops::execute): Disable
        pass with the new kernels handling, enable if requested explicitly.
        * tree-scalar-evolution.c (set_scev_analyze_openacc_calls):
        Set flag to enable the analysis of internal OpenACC function
        calls (use for Graphite only).
        (oacc_call_analyzable_p): New function.
        (oacc_ifn_call_extract): New function.
        (oacc_simplify): New function.
        (add_to_evolution): Simplify OpenACC internal function calls
        if applicable.
        (follow_ssa_edge_binary): Likewise.
        (follow_ssa_edge_expr): Likewise.
        (follow_copies_to_constant): Likewise.
        (analyze_initial_condition): Likewise.
        (interpret_loop_phi): Likewise.
        (interpret_gimple_call): New function.
        (interpret_rhs_expr): Likewise.
        (instantiate_scev_name): Likewise.
        (analyze_scalar_evolution_1): Handle GIMPLE_CALL, handle default definitions.
        (expression_expensive_p): Consider internal OpenACC calls to
        be cheap.
        * tree-scalar-evolution.h (set_scev_analyze_openacc_calls):
        New declaration.
        (oacc_call_analyzable_p): New declaration.
        * tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Mark
        lhs of internal OpenACC function calls necessary.
        * tree-ssa-ifcombine.c (recognize_if_then_else):
        * tree-ssa-loop-niter.c (oacc_call_analyzable_p):
        (oacc_ifn_call_extract): New declaration.
        (interpret_gimple_call): New declaration.
        (expand_simple_operations): Handle internal OpenACC function calls.
        * tree-ssa-loop.c (gate_oacc_kernels): Disable for new
        "kernels" handling.
        * graphite-oacc.c: New file.
        * graphite-oacc.h: New file.

libgomp/ChangeLog:

        * testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Adjust.
        * testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90: Adjust.
        * testsuite/libgomp.oacc-fortran/kernels-independent.f90: Adjust.
        * testsuite/libgomp.oacc-fortran/kernels-loop-1.f90: Adjust.
        * testsuite/libgomp.oacc-fortran/pr94358-1.f90: Adjust.

gcc/testsuite/ChangeLog:

        * c-c++-common/goacc/classify-kernels.c: Adjust.
        * c-c++-common/goacc/note-parallelism-1-kernels-conditional-loop-independent_seq.c: Adjust.
        * c-c++-common/goacc/note-parallelism-1-kernels-loops.c: Adjust.
        * c-c++-common/goacc/note-parallelism-kernels-loops.c: Adjust.
        * c-c++-common/goacc/classify-kernels-unparallelized.c: Removed.
        * c-c++-common/goacc/kernels-reduction.c: Removed.
        * gfortran.dg/goacc/loop-auto-transfer-2.f90: New test.
        * gfortran.dg/goacc/loop-auto-transfer-3.f90: New test.
        * gfortran.dg/goacc/loop-auto-transfer-4.f90: New test.

Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
---
 gcc/Makefile.in                               |   1 +
 gcc/cfgloop.c                                 |   1 +
 gcc/cfgloop.h                                 |   6 +
 gcc/cfgloopmanip.c                            |   1 +
 gcc/config/nvptx/nvptx.c                      |   7 +
 gcc/doc/invoke.texi                           |  20 +-
 gcc/doc/passes.texi                           |   6 +-
 gcc/flag-types.h                              |   1 +
 gcc/gimple-pretty-print.c                     |   3 +
 gcc/gimple.h                                  |   7 +-
 gcc/gimplify.c                                |  13 +-
 gcc/graphite-dependences.c                    | 220 ++++--
 gcc/graphite-isl-ast-to-gimple.c              |  93 ++-
 gcc/graphite-oacc.c                           | 689 ++++++++++++++++++
 gcc/graphite-oacc.h                           |  55 ++
 gcc/graphite-optimize-isl.c                   |   7 +-
 gcc/graphite-poly.c                           |  39 +-
 gcc/graphite-scop-detection.c                 | 171 ++++-
 gcc/graphite-sese-to-poly.c                   |  65 +-
 gcc/graphite.c                                | 120 ++-
 gcc/graphite.h                                |  35 +-
 gcc/internal-fn.c                             |   2 +
 gcc/internal-fn.h                             |   4 +-
 gcc/omp-expand.c                              |  73 +-
 gcc/omp-general.c                             |  21 +-
 gcc/omp-general.h                             |   1 +
 gcc/omp-low.c                                 | 321 ++++++--
 gcc/omp-oacc-kernels-decompose.cc             | 145 ++--
 gcc/omp-offload.c                             | 512 +++++++++++--
 gcc/omp-offload.h                             |   2 +
 gcc/params.opt                                |   5 +-
 gcc/sese.c                                    |  25 +-
 gcc/sese.h                                    |   1 +
 .../goacc/classify-kernels-unparallelized.c   |  45 --
 .../c-c++-common/goacc/classify-kernels.c     |   2 +-
 .../c-c++-common/goacc/kernels-reduction.c    |  36 -
 ...kernels-conditional-loop-independent_seq.c |   2 +-
 .../goacc/note-parallelism-1-kernels-loops.c  |   4 +-
 .../goacc/note-parallelism-kernels-loops.c    |  14 +-
 .../goacc/loop-auto-transfer-2.f90            |  47 ++
 .../goacc/loop-auto-transfer-3.f90            | 103 +++
 .../goacc/loop-auto-transfer-4.f90            | 323 ++++++++
 gcc/tree-chrec.c                              |   3 +
 gcc/tree-data-ref.c                           |  20 +-
 gcc/tree-parloops.c                           |  18 +-
 gcc/tree-scalar-evolution.c                   | 179 ++++-
 gcc/tree-scalar-evolution.h                   |   3 +
 gcc/tree-ssa-dce.c                            |  14 +
 gcc/tree-ssa-loop-niter.c                     |   6 +
 gcc/tree-ssa-loop.c                           |  11 +
 .../libgomp.oacc-c-c++-common/parallel-dims.c |   2 +
 .../gangprivate-attrib-1.f90                  |   2 +-
 .../kernels-independent.f90                   |   1 +
 .../libgomp.oacc-fortran/kernels-loop-1.f90   |   1 +
 .../libgomp.oacc-fortran/pr94358-1.f90        |   1 +
 55 files changed, 3089 insertions(+), 420 deletions(-)
 create mode 100644 gcc/graphite-oacc.c
 create mode 100644 gcc/graphite-oacc.h
 delete mode 100644 gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c
 delete mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-reduction.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-3.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-4.f90

--
2.33.0


Message ID	20211117160330.20029-9-frederik@codesourcery.com
State	New
From	Frederik Harwath <frederik@codesourcery.com>
To	<gcc-patches@gcc.gnu.org>
Date	Wed, 17 Nov 2021 17:03:17 +0100
In-Reply-To	<20211117160330.20029-1-frederik@codesourcery.com>
Series	OpenACC "kernels" Improvements
  [OG11,committed,00/22] OpenACC "kernels" Improvements
  [OG11,committed,01/22] Fortran: delinearize multi-dimensional array accesses
  [OG11,committed,02/22] openacc: Move pass_oacc_device_lower after pass_graphite
  [OG11,committed,03/22] graphite: Extend SCoP detection dump output
  [OG11,committed,04/22] graphite: Rename isl_id_for_ssa_name
  [OG11,committed,05/22] graphite: Fix minor mistakes in comments
  [OG11,committed,07/22] Move compute_alias_check_pairs to tree-data-ref.c
  [OG11,committed,08/22] graphite: Add runtime alias checking
  [OG11,committed,09/22] openacc: Use Graphite for dependence analysis in "kernels" regions
  [OG11,committed,10/22] openacc: Add "can_be_parallel" flag info to "graph" dumps
  [OG11,committed,11/22] openacc: Add further kernels tests
  [OG11,committed,12/22] openacc: Remove unused partitioning in "kernels" regions
  [OG11,committed,13/22] Add function for printing a single OMP_CLAUSE
  [OG11,committed,14/22] openacc: Add data optimization pass
  [OG11,committed,15/22] openacc: Add runtime alias checking for OpenACC kernels
  [OG11,committed,16/22] openacc: Warn about "independent" "kernels" loops with data-dependences
  [OG11,committed,17/22] openacc: Handle internal function calls in pass_lim
  [OG11,committed,18/22] openacc: Disable pass_pre on outlined functions analyzed by Graphite
  [OG11,committed,19/22] graphite: Tune parameters for OpenACC use
  [OG11,committed,20/22] graphite: Adjust scop loop-nest choice
  [OG11,committed,21/22] graphite: Accept loops without data references
  [OG11,committed,22/22] openacc: Adjust test expectations to new "kernels" handling
