From patchwork Sun Nov 5 11:03:29 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thomas Koenig X-Patchwork-Id: 834331 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-465963-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="ZKAYHA+A"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3yVCVm738zz9t2f for ; Sun, 5 Nov 2017 22:03:49 +1100 (AEDT) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:to :from:subject:message-id:date:mime-version:content-type; q=dns; s=default; b=xdrA0L24CbYf/agWxHoPp5P97bpubeKEHH/oKu8S4wvnxOpdE4 a07BrXIEswcnm4VcYR6V5EGX/8ll+T2DDNUDZ63eW/sMow+9n5XWsvMKgCCRbWfz mDYIHx9R1CXheunkHJtgS2n6SejOGpPu56Ox/FQjkbS7lqecYD+gidIBw= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:to :from:subject:message-id:date:mime-version:content-type; s= default; bh=OYntqHSWWrBhgcJ9+8nM3j84jSg=; b=ZKAYHA+ASMw1WeP/e12x S0v1b8WvVtNyZczaf1fQqn0A1Uy7JJl9Y4otb2JYq6M7EjICnwXlB55F2YSlGLjG TWVVXeSKyHk+Ra5bnMR48FUCMWHCjITv5cdg4tzrTBB88RoH2fF1wNl1qmtw0Pmr IPBgV7BMfbHCyEpItvjHOaQ= Received: (qmail 101182 invoked by alias); 5 Nov 2017 11:03:39 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 101159 invoked by uid 89); 5 Nov 2017 11:03:38 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-15.7 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_ASCII_DIVIDERS, RCVD_IN_DNSWL_LOW, RP_MATCHES_RCVD, SPF_PASS autolearn=ham version=3.3.2 spammy=1559, joined, gol, Attempt X-Spam-User: qpsmtpd, 2 recipients X-HELO: cc-smtpout2.netcologne.de Received: from cc-smtpout2.netcologne.de (HELO cc-smtpout2.netcologne.de) (89.1.8.212) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Sun, 05 Nov 2017 11:03:36 +0000 Received: from cc-smtpin1.netcologne.de (cc-smtpin1.netcologne.de [89.1.8.201]) by cc-smtpout2.netcologne.de (Postfix) with ESMTP id 9FB11128ED; Sun, 5 Nov 2017 12:03:31 +0100 (CET) Received: from localhost (localhost [127.0.0.1]) by cc-smtpin1.netcologne.de (Postfix) with ESMTP id 931CF11E12; Sun, 5 Nov 2017 12:03:31 +0100 (CET) Received: from [78.35.138.12] (helo=cc-smtpin1.netcologne.de) by localhost with ESMTP (eXpurgate 4.1.9) (envelope-from ) id 59fef003-029d-7f0000012729-7f0000019dc1-1 for ; Sun, 05 Nov 2017 12:03:31 +0100 Received: from [192.168.178.20] (xdsl-78-35-138-12.netcologne.de [78.35.138.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by cc-smtpin1.netcologne.de (Postfix) with ESMTPSA; Sun, 5 Nov 2017 12:03:29 +0100 (CET) To: "fortran@gcc.gnu.org" , gcc-patches From: Thomas Koenig Subject: [patch, fortran] Do loop (or index) interchange for FORALL and DO CONCURRENT, take 2 Message-ID: <21703eb6-d9e7-c974-b1e9-25821dbe31f8@netcologne.de> Date: Sun, 5 Nov 2017 12:03:29 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.4.0 MIME-Version: 1.0 Hello world, the attached patch now includes a new option to warn about a loop interchange, plus a test case using that option. Regression-tested. OK for trunk? Regards Thomas 2017-11-05 Thomas Koenig PR fortran/82471 * lang.opt (ffrontend-loop-interchange): New option. (Wfrontend-loop-interchange): New option. * options.c (gfc_post_options): Handle ffrontend-loop-interchange. * frontend-passes.c (gfc_run_passes): Run optimize_namespace if flag_frontend_optimize or flag_frontend_loop_interchange are set. (optimize_namespace): Run functions according to flags set; also call index_interchange. (ind_type): New function. (has_var): New function. (index_cost): New function. (loop_comp): New function. 2017-11-05 Thomas Koenig PR fortran/82471 * gfortran.dg/loop_interchange_1.f90: New test. Index: frontend-passes.c =================================================================== --- frontend-passes.c (Revision 254408) +++ frontend-passes.c (Arbeitskopie) @@ -55,6 +55,7 @@ static gfc_expr* check_conjg_transpose_variable (g bool *); static bool has_dimen_vector_ref (gfc_expr *); static int matmul_temp_args (gfc_code **, int *,void *data); +static int index_interchange (gfc_code **, int*, void *); #ifdef CHECKING_P static void check_locus (gfc_namespace *); @@ -155,9 +156,11 @@ gfc_run_passes (gfc_namespace *ns) check_locus (ns); #endif + if (flag_frontend_optimize || flag_frontend_loop_interchange) + optimize_namespace (ns); + if (flag_frontend_optimize) { - optimize_namespace (ns); optimize_reduction (ns); if (flag_dump_fortran_optimized) gfc_dump_parse_tree (ns, stdout); @@ -1350,7 +1353,9 @@ simplify_io_impl_do (gfc_code **code, int *walk_su return 0; } -/* Optimize a namespace, including all contained namespaces. */ +/* Optimize a namespace, including all contained namespaces. + flag_frontend_optimize and flag_fronend_loop_interchange are + handled separately. */ static void optimize_namespace (gfc_namespace *ns) @@ -1363,28 +1368,35 @@ optimize_namespace (gfc_namespace *ns) in_assoc_list = false; in_omp_workshare = false; - gfc_code_walker (&ns->code, simplify_io_impl_do, dummy_expr_callback, NULL); - gfc_code_walker (&ns->code, convert_do_while, dummy_expr_callback, NULL); - gfc_code_walker (&ns->code, convert_elseif, dummy_expr_callback, NULL); - gfc_code_walker (&ns->code, cfe_code, cfe_expr_0, NULL); - gfc_code_walker (&ns->code, optimize_code, optimize_expr, NULL); - if (flag_inline_matmul_limit != 0) + if (flag_frontend_optimize) { - bool found; - do + gfc_code_walker (&ns->code, simplify_io_impl_do, dummy_expr_callback, NULL); + gfc_code_walker (&ns->code, convert_do_while, dummy_expr_callback, NULL); + gfc_code_walker (&ns->code, convert_elseif, dummy_expr_callback, NULL); + gfc_code_walker (&ns->code, cfe_code, cfe_expr_0, NULL); + gfc_code_walker (&ns->code, optimize_code, optimize_expr, NULL); + if (flag_inline_matmul_limit != 0) { - found = false; - gfc_code_walker (&ns->code, matmul_to_var_code, matmul_to_var_expr, - (void *) &found); + bool found; + do + { + found = false; + gfc_code_walker (&ns->code, matmul_to_var_code, matmul_to_var_expr, + (void *) &found); + } + while (found); + + gfc_code_walker (&ns->code, matmul_temp_args, dummy_expr_callback, + NULL); + gfc_code_walker (&ns->code, inline_matmul_assign, dummy_expr_callback, + NULL); } - while (found); - - gfc_code_walker (&ns->code, matmul_temp_args, dummy_expr_callback, - NULL); - gfc_code_walker (&ns->code, inline_matmul_assign, dummy_expr_callback, - NULL); } + if (flag_frontend_loop_interchange) + gfc_code_walker (&ns->code, index_interchange, dummy_expr_callback, + NULL); + /* BLOCKs are handled in the expression walker below. */ for (ns = ns->contained; ns; ns = ns->sibling) { @@ -4225,6 +4237,170 @@ inline_matmul_assign (gfc_code **c, int *walk_subt return 0; } + +/* Code for index interchange for loops which are grouped together in DO + CONCURRENT or FORALL statements. This is currently only applied if the + iterations are grouped together in a single statement. + + For this transformation, tt is assumed that memory access in strides is + expensive, and that loops which access later indices (which access memory + in bigger strides) should be moved to the first loops. + + For this, a loop over all the statements is executed, counting the times + that the loop iteration values are accessed in each index. The loop + indices are then sorted to minimize access to later indices from inner + loops. */ + +/* Type for holding index information. */ + +typedef struct { + gfc_symbol *sym; + gfc_forall_iterator *fa; + int num; + int n[GFC_MAX_DIMENSIONS]; +} ind_type; + +/* Callback function to determine if an expression is the + corresponding variable. */ + +static int +has_var (gfc_expr **e, int *walk_subtrees ATTRIBUTE_UNUSED, void *data) +{ + gfc_expr *expr = *e; + gfc_symbol *sym; + + if (expr->expr_type != EXPR_VARIABLE) + return 0; + + sym = (gfc_symbol *) data; + return sym == expr->symtree->n.sym; +} + +/* Callback function to calculate the cost of a certain index. */ + +static int +index_cost (gfc_expr **e, int *walk_subtrees ATTRIBUTE_UNUSED, + void *data) +{ + ind_type *ind; + gfc_expr *expr; + gfc_array_ref *ar; + gfc_ref *ref; + int i,j; + + expr = *e; + if (expr->expr_type != EXPR_VARIABLE) + return 0; + + ar = NULL; + for (ref = expr->ref; ref; ref = ref->next) + { + if (ref->type == REF_ARRAY) + { + ar = &ref->u.ar; + break; + } + } + if (ar == NULL || ar->type != AR_ELEMENT) + return 0; + + ind = (ind_type *) data; + for (i = 0; i < ar->dimen; i++) + { + for (j=0; ind[j].sym != NULL; j++) + { + if (gfc_expr_walker (&ar->start[i], has_var, (void *) (ind[j].sym))) + ind[j].n[i]++; + } + } + return 0; +} + +/* Callback function for qsort, to sort the loop indices. */ + +static int +loop_comp (const void *e1, const void *e2) +{ + const ind_type *i1 = (const ind_type *) e1; + const ind_type *i2 = (const ind_type *) e2; + int i; + + for (i=GFC_MAX_DIMENSIONS-1; i >= 0; i--) + { + if (i1->n[i] != i2->n[i]) + return i1->n[i] - i2->n[i]; + } + /* All other things being equal, let's not change the ordering. */ + return i2->num - i1->num; +} + +/* Main function to do the index interchange. */ + +static int +index_interchange (gfc_code **c, int *walk_subtrees ATTRIBUTE_UNUSED, + void *data ATTRIBUTE_UNUSED) +{ + gfc_code *co; + co = *c; + int n_iter; + gfc_forall_iterator *fa; + ind_type *ind; + int i, j; + + if (co->op != EXEC_FORALL && co->op != EXEC_DO_CONCURRENT) + return 0; + + n_iter = 0; + for (fa = co->ext.forall_iterator; fa; fa = fa->next) + n_iter ++; + + /* Nothing to reorder. */ + if (n_iter < 2) + return 0; + + ind = XALLOCAVEC (ind_type, n_iter + 1); + + i = 0; + for (fa = co->ext.forall_iterator; fa; fa = fa->next) + { + ind[i].sym = fa->var->symtree->n.sym; + ind[i].fa = fa; + for (j=0; jext.forall_iterator = fa = ind[0].fa; + for (i=1; inext = ind[i].fa; + fa = fa->next; + } + fa->next = NULL; + + if (flag_warn_frontend_loop_interchange) + { + for (i=1; i ind[i].num) + { + gfc_warning (OPT_Wfrontend_loop_interchange, + "Interchanging loops at %L", &co->loc); + break; + } + } + } + + return 0; +} + #define WALK_SUBEXPR(NODE) \ do \ { \ Index: invoke.texi =================================================================== --- invoke.texi (Revision 254408) +++ invoke.texi (Arbeitskopie) @@ -149,8 +149,9 @@ and warnings}. -Wdo-subscript -Wfunction-elimination -Wimplicit-interface @gol -Wimplicit-procedure -Wintrinsic-shadow -Wuse-without-only -Wintrinsics-std @gol -Wline-truncation -Wno-align-commons -Wno-tabs -Wreal-q-constant @gol --Wsurprising -Wunderflow -Wunused-parameter -Wrealloc-lhs -Wrealloc-lhs-all @gol --Wtarget-lifetime -fmax-errors=@var{n} -fsyntax-only -pedantic -pedantic-errors +-Wsurprising -Wunderflow -Wunused-parameter -Wrealloc-lhs @gol +-Wrealloc-lhs-all -Wfrontend-loop-interchange -Wtarget-lifetime @gol +-fmax-errors=@var{n} -fsyntax-only -pedantic -pedantic-errors @gol } @item Debugging Options @@ -183,6 +184,7 @@ and warnings}. -fbounds-check -fcheck-array-temporaries @gol -fcheck=@var{} @gol -fcoarray=@var{} -fexternal-blas -ff2c +-ffrontend-loop-interchange @gol -ffrontend-optimize @gol -finit-character=@var{n} -finit-integer=@var{n} -finit-local-zero @gol -finit-derived @gol @@ -910,6 +912,13 @@ Enables some warning options for usages of languag may be problematic. This currently includes @option{-Wcompare-reals}, @option{-Wunused-parameter} and @option{-Wdo-subscript}. +@item -Wfrontend-loop-interchange +@opindex @code{Wfrontend-loop-interchange} +@cindex warnings, loop interchange +@cindex loop interchange, warning +Enable warning for loop interchanges performed by the +@option{-ffrontend-loop-interchange} option. + @item -Wimplicit-interface @opindex @code{Wimplicit-interface} @cindex warnings, implicit interface @@ -1782,6 +1791,14 @@ expressions, removing unnecessary calls to @code{T and assignments and replacing @code{TRIM(a)} with @code{a(1:LEN_TRIM(a))}. It can be deselected by specifying @option{-fno-frontend-optimize}. + +@item -ffrontend-loop-interchange +@opindex @code{frontend-loop-interchange} +@cindex loop interchange, Fortran +Attempt to interchange loops in the Fortran front end where +profitable. Enabled by default by any @option{-O} option. +At the moment, this option only affects @code{FORALL} and +@code{DO CONCURRENT} statements with several forall triplets. @end table @xref{Code Gen Options,,Options for Code Generation Conventions, Index: lang.opt =================================================================== --- lang.opt (Revision 254408) +++ lang.opt (Arbeitskopie) @@ -245,6 +245,10 @@ Wextra Fortran Warning ; Documented in common +Wfrontend-loop-interchange +Fortran Var(flag_warn_frontend_loop_interchange) +Warn if loops have been interchanged. + Wfunction-elimination Fortran Warning Var(warn_function_elimination) Warn about function call elimination. @@ -548,6 +552,10 @@ ffree-line-length- Fortran RejectNegative Joined UInteger Var(flag_free_line_length) Init(132) -ffree-line-length- Use n as character line width in free mode. +ffrontend-loop-interchange +Fortran Var(flag_frontend_loop_interchange) Init(-1) +Try to interchange loops if profitable. + ffrontend-optimize Fortran Var(flag_frontend_optimize) Init(-1) Enable front end optimization. Index: options.c =================================================================== --- options.c (Revision 254408) +++ options.c (Arbeitskopie) @@ -417,6 +417,11 @@ gfc_post_options (const char **pfilename) if (flag_frontend_optimize == -1) flag_frontend_optimize = optimize; + /* Same for front end loop interchange. */ + + if (flag_frontend_loop_interchange == -1) + flag_frontend_loop_interchange = optimize; + if (flag_max_array_constructor < 65535) flag_max_array_constructor = 65535;