Message ID | 20160624074059.GA22083@msticlxl57.ims.intel.com |
---|---|
State | New |
Headers | show |
Ping 2016-06-24 10:40 GMT+03:00 Ilya Enkovich <enkovich.gnu@gmail.com>: > On 17 Jun 10:46, Jeff Law wrote: >> On 06/17/2016 08:33 AM, Ilya Enkovich wrote: >> >> >> >>Hmm, there seems to be a level of indirection I'm missing here. We're >> >>smuggling LOOP_VINFO_ORIG_LOOP_INFO around in loop->aux. Ewww. I thought >> >>the whole point of LOOP_VINFO_ORIG_LOOP_INFO was to smuggle the VINFO from >> >>the original loop to the vectorized epilogue. What am I missing? Rather >> >>than smuggling around in the aux field, is there some inherent reason why we >> >>can't just copy the info from the original loop directly into >> >>LOOP_VINFO_ORIG_LOOP_INFO for the vectorized epilogue? >> > >> >LOOP_VINFO_ORIG_LOOP_INFO is used for several things: >> > - mark this loop as epilogue >> > - get VF of original loop (required for both mask and nomask modes) >> > - get decision about epilogue masking >> > >> >That's all. When epilogue is created it has no LOOP_VINFO. Also when we >> >vectorize loop we create and destroy its LOOP_VINFO multiple times. When >> >loop has LOOP_VINFO loop->aux points to it and original LOOP_VINFO is in >> >LOOP_VINFO_ORIG_LOOP_INFO. When Loop has no LOOP_VINFO associated I have no >> >place to bind it with the original loop and therefore I use vacant loop->aux >> >for that. Any other way to bind epilogue with its original loop would work >> >as well. I just chose loop->aux to avoid new fields and data structures. >> I was starting to draw the conclusion that the smuggling in the aux field >> was for cases when there was no LOOP_VINFO. But was rather late at night >> and I didn't follow that idea through the code. THanks for clarifying. >> >> >> >> >> >>And something just occurred to me -- is there some inherent reason why SLP >> >>doesn't vectorize the epilogue, particularly for the cases where we can >> >>vectorize the epilogue using smaller vectors? Sorry if you've already >> >>answered this somewhere or it's a dumb question. >> > >> >IIUC this may happen only if we unroll epilogue into a single BB which happens >> >only when epilogue iterations count is known. Right? >> Probably. The need to make sure the epilogue is unrolled probably makes >> this a non-starter. >> >> I have a soft spot for SLP as I stumbled on the idea while rewriting a >> presentation in the wee hours of the morning for the next day. Essentially >> it was a "poor man's" vectorizer that could be done for dramatically less >> engineering cost than a traditional vectorizer. The MIT paper outlining the >> same ideas came out a couple years later... >> >> >> >>+ /* Add new loop to a processing queue. To make it easier >> >>>+ to match loop and its epilogue vectorization in dumps >> >>>+ put new loop as the next loop to process. */ >> >>>+ if (new_loop) >> >>>+ { >> >>>+ loops.safe_insert (i + 1, new_loop->num); >> >>>+ vect_loops_num = number_of_loops (cfun); >> >>>+ } >> >>>+ >> >> >> >>So just to be clear, the only reason to do this is for dumps -- other than >> >>processing the loop before it's epilogue, there's no other inherently >> >>necessary ordering of the loops, right? >> > >> >Right, I don't see other reasons to do it. >> Perfect. Thanks for confirming. >> >> jeff >> > > Hi, > > Here is an updated version with disabled alias checks for loop epilogues. > Instead of calling vect_analyze_data_ref_dependence I just use VF of the > original loop as MAX_VF for epilogue. > > Thanks, > Ilya > -- > gcc/ > > 2016-05-24 Ilya Enkovich <ilya.enkovich@intel.com> > > * tree-if-conv.c (tree_if_conversion): Make public. > * tree-if-conv.h: New file. > * tree-vect-data-refs.c (vect_analyze_data_ref_dependences) Avoid > dynamic alias checks for epilogues. > (vect_enhance_data_refs_alignment): Don't try to enhance alignment > for epilogues. > * tree-vect-loop-manip.c (vect_do_peeling_for_loop_bound): Return > created loop. > * tree-vect-loop.c: include tree-if-conv.h. > (destroy_loop_vec_info): Preserve LOOP_VINFO_ORIG_LOOP_INFO in > loop->aux. > (vect_analyze_loop_form): Init LOOP_VINFO_ORIG_LOOP_INFO and reset > loop->aux. > (vect_analyze_loop): Reset loop->aux. > (vect_transform_loop): Check if created epilogue should be returned > for further vectorization. If-convert epilogue if required. > * tree-vectorizer.c (vectorize_loops): Add a queue of loops to > process and insert vectorized loop epilogues into this queue. > * tree-vectorizer.h (vect_do_peeling_for_loop_bound): Return created > loop. > (vect_transform_loop): Return created loop. > > > diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c > index 5914a78..b790ca9 100644 > --- a/gcc/tree-if-conv.c > +++ b/gcc/tree-if-conv.c > @@ -2808,7 +2808,7 @@ ifcvt_local_dce (basic_block bb) > profitability analysis. Returns non-zero todo flags when something > changed. */ > > -static unsigned int > +unsigned int > tree_if_conversion (struct loop *loop) > { > unsigned int todo = 0; > diff --git a/gcc/tree-if-conv.h b/gcc/tree-if-conv.h > new file mode 100644 > index 0000000..3a732c2 > --- /dev/null > +++ b/gcc/tree-if-conv.h > @@ -0,0 +1,24 @@ > +/* Copyright (C) 2016 Free Software Foundation, Inc. > + > +This file is part of GCC. > + > +GCC is free software; you can redistribute it and/or modify it under > +the terms of the GNU General Public License as published by the Free > +Software Foundation; either version 3, or (at your option) any later > +version. > + > +GCC is distributed in the hope that it will be useful, but WITHOUT ANY > +WARRANTY; without even the implied warranty of MERCHANTABILITY or > +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License > +for more details. > + > +You should have received a copy of the GNU General Public License > +along with GCC; see the file COPYING3. If not see > +<http://www.gnu.org/licenses/>. */ > + > +#ifndef GCC_TREE_IF_CONV_H > +#define GCC_TREE_IF_CONV_H > + > +unsigned int tree_if_conversion (struct loop *); > + > +#endif /* GCC_TREE_IF_CONV_H */ > diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c > index 36d302a..a902a50 100644 > --- a/gcc/tree-vect-data-refs.c > +++ b/gcc/tree-vect-data-refs.c > @@ -472,9 +472,15 @@ vect_analyze_data_ref_dependences (loop_vec_info loop_vinfo, int *max_vf) > LOOP_VINFO_LOOP_NEST (loop_vinfo), true)) > return false; > > - FOR_EACH_VEC_ELT (LOOP_VINFO_DDRS (loop_vinfo), i, ddr) > - if (vect_analyze_data_ref_dependence (ddr, loop_vinfo, max_vf)) > - return false; > + /* For epilogues we either have no aliases or alias versioning > + was applied to original loop. Therefore we may just get max_vf > + using VF of original loop. */ > + if (LOOP_VINFO_EPILOGUE_P (loop_vinfo)) > + *max_vf = LOOP_VINFO_ORIG_VECT_FACTOR (loop_vinfo); > + else > + FOR_EACH_VEC_ELT (LOOP_VINFO_DDRS (loop_vinfo), i, ddr) > + if (vect_analyze_data_ref_dependence (ddr, loop_vinfo, max_vf)) > + return false; > > return true; > } > @@ -1595,7 +1601,10 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo) > /* Check if we can possibly peel the loop. */ > if (!vect_can_advance_ivs_p (loop_vinfo) > || !slpeel_can_duplicate_loop_p (loop, single_exit (loop)) > - || loop->inner) > + || loop->inner > + /* Required peeling was performed in prologue and > + is not required for epilogue. */ > + || LOOP_VINFO_EPILOGUE_P (loop_vinfo)) > do_peeling = false; > > if (do_peeling > @@ -1875,7 +1884,10 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo) > > do_versioning = > optimize_loop_nest_for_speed_p (loop) > - && (!loop->inner); /* FORNOW */ > + && (!loop->inner) /* FORNOW */ > + /* Required versioning was performed for the > + original loop and is not required for epilogue. */ > + && !LOOP_VINFO_EPILOGUE_P (loop_vinfo); > > if (do_versioning) > { > diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c > index 7ec6dae..fab5879 100644 > --- a/gcc/tree-vect-loop-manip.c > +++ b/gcc/tree-vect-loop-manip.c > @@ -1742,9 +1742,11 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo, tree niters, > NITERS / VECTORIZATION_FACTOR times (this value is placed into RATIO). > > COND_EXPR and COND_EXPR_STMT_LIST are combined with a new generated > - test. */ > + test. > > -void > + Return created loop. */ > + > +struct loop * > vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo, > tree ni_name, tree ratio_mult_vf_name, > unsigned int th, bool check_profitability) > @@ -1812,6 +1814,8 @@ vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo, > scev_reset (); > > free_original_copy_tables (); > + > + return new_loop; > } > > > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c > index 90d42b5..d48f565 100644 > --- a/gcc/tree-vect-loop.c > +++ b/gcc/tree-vect-loop.c > @@ -47,6 +47,7 @@ along with GCC; see the file COPYING3. If not see > #include "tree-vectorizer.h" > #include "gimple-fold.h" > #include "cgraph.h" > +#include "tree-if-conv.h" > > /* Loop Vectorization Pass. > > @@ -1214,8 +1215,8 @@ destroy_loop_vec_info (loop_vec_info loop_vinfo, bool clean_stmts) > destroy_cost_data (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo)); > loop_vinfo->scalar_cost_vec.release (); > > + loop->aux = LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo); > free (loop_vinfo); > - loop->aux = NULL; > } > > > @@ -1501,13 +1502,24 @@ vect_analyze_loop_form (struct loop *loop) > > if (! vect_analyze_loop_form_1 (loop, &loop_cond, &number_of_iterationsm1, > &number_of_iterations, &inner_loop_cond)) > - return NULL; > + { > + loop->aux = NULL; > + return NULL; > + } > > loop_vec_info loop_vinfo = new_loop_vec_info (loop); > LOOP_VINFO_NITERSM1 (loop_vinfo) = number_of_iterationsm1; > LOOP_VINFO_NITERS (loop_vinfo) = number_of_iterations; > LOOP_VINFO_NITERS_UNCHANGED (loop_vinfo) = number_of_iterations; > > + /* For epilogues we want to vectorize aux holds > + loop_vec_info of the original loop. */ > + if (loop->aux) > + { > + gcc_assert (LOOP_VINFO_VECTORIZABLE_P ((loop_vec_info)loop->aux)); > + LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo) = (loop_vec_info)loop->aux; > + } > + > if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)) > { > if (dump_enabled_p ()) > @@ -1524,7 +1536,6 @@ vect_analyze_loop_form (struct loop *loop) > STMT_VINFO_TYPE (vinfo_for_stmt (inner_loop_cond)) > = loop_exit_ctrl_vec_info_type; > > - gcc_assert (!loop->aux); > loop->aux = loop_vinfo; > return loop_vinfo; > } > @@ -2284,7 +2295,10 @@ vect_analyze_loop (struct loop *loop) > if (fatal > || vector_sizes == 0 > || current_vector_size == 0) > - return NULL; > + { > + loop->aux = NULL; > + return NULL; > + } > > /* Try the next biggest vector size. */ > current_vector_size = 1 << floor_log2 (vector_sizes); > @@ -6573,10 +6587,11 @@ vect_generate_tmps_on_preheader (loop_vec_info loop_vinfo, > Vectorize the loop - created vectorized stmts to replace the scalar > stmts in the loop, and update the loop exit condition. */ > > -void > +struct loop * > vect_transform_loop (loop_vec_info loop_vinfo) > { > struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo); > + struct loop *epilogue = NULL; > basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo); > int nbbs = loop->num_nodes; > int i; > @@ -6658,8 +6673,9 @@ vect_transform_loop (loop_vec_info loop_vinfo) > ni_name = vect_build_loop_niters (loop_vinfo); > vect_generate_tmps_on_preheader (loop_vinfo, ni_name, &ratio_mult_vf, > &ratio); > - vect_do_peeling_for_loop_bound (loop_vinfo, ni_name, ratio_mult_vf, > - th, check_profitability); > + epilogue = vect_do_peeling_for_loop_bound (loop_vinfo, ni_name, > + ratio_mult_vf, th, > + check_profitability); > } > else if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)) > ratio = build_int_cst (TREE_TYPE (LOOP_VINFO_NITERS (loop_vinfo)), > @@ -6964,6 +6980,59 @@ vect_transform_loop (loop_vec_info loop_vinfo) > FOR_EACH_VEC_ELT (LOOP_VINFO_SLP_INSTANCES (loop_vinfo), i, instance) > vect_free_slp_instance (instance); > LOOP_VINFO_SLP_INSTANCES (loop_vinfo).release (); > + > + /* Don't vectorize epilogue for epilogue. */ > + if (LOOP_VINFO_EPILOGUE_P (loop_vinfo)) > + epilogue = NULL; > + /* Scalar epilogue is not vectorized in case > + we use combined vector epilogue. */ > + else if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo)) > + epilogue = NULL; > + > + if (epilogue) > + { > + if (!LOOP_VINFO_MASK_EPILOGUE (loop_vinfo)) > + { > + unsigned int vector_sizes > + = targetm.vectorize.autovectorize_vector_sizes (); > + vector_sizes &= current_vector_size - 1; > + > + if (!(flag_tree_vectorize_epilogues & VECT_EPILOGUE_NOMASK)) > + epilogue = NULL; > + else if (!vector_sizes) > + epilogue = NULL; > + else if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) > + && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) >= 0) > + { > + int smallest_vec_size = 1 << ctz_hwi (vector_sizes); > + int ratio = current_vector_size / smallest_vec_size; > + int eiters = LOOP_VINFO_INT_NITERS (loop_vinfo) > + - LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo); > + eiters = eiters % vectorization_factor; > + > + epilogue->nb_iterations_upper_bound = eiters - 1; > + > + if (eiters < vectorization_factor / ratio) > + epilogue = NULL; > + } > + } > + } > + > + if (epilogue) > + { > + epilogue->force_vectorize = loop->force_vectorize; > + epilogue->safelen = loop->safelen; > + epilogue->dont_vectorize = false; > + > + /* We may need to if-convert epilogue to vectorize it. */ > + if (LOOP_VINFO_SCALAR_LOOP (loop_vinfo)) > + tree_if_conversion (epilogue); > + > + gcc_assert (!epilogue->aux); > + epilogue->aux = loop_vinfo; > + } > + > + return epilogue; > } > > /* The code below is trying to perform simple optimization - revert > diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c > index 2669813..1fc8b65 100644 > --- a/gcc/tree-vectorizer.c > +++ b/gcc/tree-vectorizer.c > @@ -491,14 +491,16 @@ vectorize_loops (void) > { > unsigned int i; > unsigned int num_vectorized_loops = 0; > - unsigned int vect_loops_num; > + unsigned int vect_loops_num = number_of_loops (cfun); > struct loop *loop; > hash_table<simduid_to_vf> *simduid_to_vf_htab = NULL; > hash_table<simd_array_to_simduid> *simd_array_to_simduid_htab = NULL; > bool any_ifcvt_loops = false; > unsigned ret = 0; > + auto_vec<unsigned int> loops (vect_loops_num); > > - vect_loops_num = number_of_loops (cfun); > + FOR_EACH_LOOP (loop, 0) > + loops.quick_push (loop->num); > > /* Bail out if there are no loops. */ > if (vect_loops_num <= 1) > @@ -514,14 +516,18 @@ vectorize_loops (void) > /* If some loop was duplicated, it gets bigger number > than all previously defined loops. This fact allows us to run > only over initial loops skipping newly generated ones. */ > - FOR_EACH_LOOP (loop, 0) > - if (loop->dont_vectorize) > + for (i = 0; i < loops.length (); i++) > + if (!(loop = get_loop (cfun, loops[i]))) > + continue; > + else if (loop->dont_vectorize) > any_ifcvt_loops = true; > else if ((flag_tree_loop_vectorize > - && optimize_loop_nest_for_speed_p (loop)) > + && (optimize_loop_nest_for_speed_p (loop) > + || loop->aux)) > || loop->force_vectorize) > { > loop_vec_info loop_vinfo; > + struct loop *new_loop; > vect_location = find_loop_location (loop); > if (LOCATION_LOCUS (vect_location) != UNKNOWN_LOCATION > && dump_enabled_p ()) > @@ -551,12 +557,21 @@ vectorize_loops (void) > && dump_enabled_p ()) > dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location, > "loop vectorized\n"); > - vect_transform_loop (loop_vinfo); > + new_loop = vect_transform_loop (loop_vinfo); > num_vectorized_loops++; > /* Now that the loop has been vectorized, allow it to be unrolled > etc. */ > loop->force_vectorize = false; > > + /* Add new loop to a processing queue. To make it easier > + to match loop and its epilogue vectorization in dumps > + put new loop as the next loop to process. */ > + if (new_loop) > + { > + loops.safe_insert (i + 1, new_loop->num); > + vect_loops_num = number_of_loops (cfun); > + } > + > if (loop->simduid) > { > simduid_to_vf *simduid_to_vf_data = XNEW (simduid_to_vf); > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h > index 2c6cdbf..26d84b4 100644 > --- a/gcc/tree-vectorizer.h > +++ b/gcc/tree-vectorizer.h > @@ -984,8 +984,8 @@ extern bool slpeel_can_duplicate_loop_p (const struct loop *, const_edge); > struct loop *slpeel_tree_duplicate_loop_to_edge_cfg (struct loop *, > struct loop *, edge); > extern void vect_loop_versioning (loop_vec_info, unsigned int, bool); > -extern void vect_do_peeling_for_loop_bound (loop_vec_info, tree, tree, > - unsigned int, bool); > +extern struct loop *vect_do_peeling_for_loop_bound (loop_vec_info, tree, tree, > + unsigned int, bool); > extern void vect_do_peeling_for_alignment (loop_vec_info, tree, > unsigned int, bool); > extern source_location find_loop_location (struct loop *); > @@ -1099,7 +1099,7 @@ extern gimple *vect_force_simple_reduction (loop_vec_info, gimple *, bool, > /* Drive for loop analysis stage. */ > extern loop_vec_info vect_analyze_loop (struct loop *); > /* Drive for loop transformation stage. */ > -extern void vect_transform_loop (loop_vec_info); > +extern struct loop *vect_transform_loop (loop_vec_info); > extern loop_vec_info vect_analyze_loop_form (struct loop *); > extern bool vectorizable_live_operation (gimple *, gimple_stmt_iterator *, > gimple **);
On 06/24/2016 01:40 AM, Ilya Enkovich wrote: > > Here is an updated version with disabled alias checks for loop epilogues. > Instead of calling vect_analyze_data_ref_dependence I just use VF of the > original loop as MAX_VF for epilogue. > > Thanks, > Ilya > -- > gcc/ > > 2016-05-24 Ilya Enkovich <ilya.enkovich@intel.com> > > * tree-if-conv.c (tree_if_conversion): Make public. > * tree-if-conv.h: New file. > * tree-vect-data-refs.c (vect_analyze_data_ref_dependences) Avoid > dynamic alias checks for epilogues. > (vect_enhance_data_refs_alignment): Don't try to enhance alignment > for epilogues. > * tree-vect-loop-manip.c (vect_do_peeling_for_loop_bound): Return > created loop. > * tree-vect-loop.c: include tree-if-conv.h. > (destroy_loop_vec_info): Preserve LOOP_VINFO_ORIG_LOOP_INFO in > loop->aux. > (vect_analyze_loop_form): Init LOOP_VINFO_ORIG_LOOP_INFO and reset > loop->aux. > (vect_analyze_loop): Reset loop->aux. > (vect_transform_loop): Check if created epilogue should be returned > for further vectorization. If-convert epilogue if required. > * tree-vectorizer.c (vectorize_loops): Add a queue of loops to > process and insert vectorized loop epilogues into this queue. > * tree-vectorizer.h (vect_do_peeling_for_loop_bound): Return created > loop. > (vect_transform_loop): Return created loop. I know Richi wasn't all that happy with the call into if-conversion and I don't particularly like the use of aux for smuggling data around. But I think those are warts, not fundamental design flaws. OK once the rest of the bits are approved. jeff
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c index 5914a78..b790ca9 100644 --- a/gcc/tree-if-conv.c +++ b/gcc/tree-if-conv.c @@ -2808,7 +2808,7 @@ ifcvt_local_dce (basic_block bb) profitability analysis. Returns non-zero todo flags when something changed. */ -static unsigned int +unsigned int tree_if_conversion (struct loop *loop) { unsigned int todo = 0; diff --git a/gcc/tree-if-conv.h b/gcc/tree-if-conv.h new file mode 100644 index 0000000..3a732c2 --- /dev/null +++ b/gcc/tree-if-conv.h @@ -0,0 +1,24 @@ +/* Copyright (C) 2016 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +<http://www.gnu.org/licenses/>. */ + +#ifndef GCC_TREE_IF_CONV_H +#define GCC_TREE_IF_CONV_H + +unsigned int tree_if_conversion (struct loop *); + +#endif /* GCC_TREE_IF_CONV_H */ diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c index 36d302a..a902a50 100644 --- a/gcc/tree-vect-data-refs.c +++ b/gcc/tree-vect-data-refs.c @@ -472,9 +472,15 @@ vect_analyze_data_ref_dependences (loop_vec_info loop_vinfo, int *max_vf) LOOP_VINFO_LOOP_NEST (loop_vinfo), true)) return false; - FOR_EACH_VEC_ELT (LOOP_VINFO_DDRS (loop_vinfo), i, ddr) - if (vect_analyze_data_ref_dependence (ddr, loop_vinfo, max_vf)) - return false; + /* For epilogues we either have no aliases or alias versioning + was applied to original loop. Therefore we may just get max_vf + using VF of original loop. */ + if (LOOP_VINFO_EPILOGUE_P (loop_vinfo)) + *max_vf = LOOP_VINFO_ORIG_VECT_FACTOR (loop_vinfo); + else + FOR_EACH_VEC_ELT (LOOP_VINFO_DDRS (loop_vinfo), i, ddr) + if (vect_analyze_data_ref_dependence (ddr, loop_vinfo, max_vf)) + return false; return true; } @@ -1595,7 +1601,10 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo) /* Check if we can possibly peel the loop. */ if (!vect_can_advance_ivs_p (loop_vinfo) || !slpeel_can_duplicate_loop_p (loop, single_exit (loop)) - || loop->inner) + || loop->inner + /* Required peeling was performed in prologue and + is not required for epilogue. */ + || LOOP_VINFO_EPILOGUE_P (loop_vinfo)) do_peeling = false; if (do_peeling @@ -1875,7 +1884,10 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo) do_versioning = optimize_loop_nest_for_speed_p (loop) - && (!loop->inner); /* FORNOW */ + && (!loop->inner) /* FORNOW */ + /* Required versioning was performed for the + original loop and is not required for epilogue. */ + && !LOOP_VINFO_EPILOGUE_P (loop_vinfo); if (do_versioning) { diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c index 7ec6dae..fab5879 100644 --- a/gcc/tree-vect-loop-manip.c +++ b/gcc/tree-vect-loop-manip.c @@ -1742,9 +1742,11 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo, tree niters, NITERS / VECTORIZATION_FACTOR times (this value is placed into RATIO). COND_EXPR and COND_EXPR_STMT_LIST are combined with a new generated - test. */ + test. -void + Return created loop. */ + +struct loop * vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo, tree ni_name, tree ratio_mult_vf_name, unsigned int th, bool check_profitability) @@ -1812,6 +1814,8 @@ vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo, scev_reset (); free_original_copy_tables (); + + return new_loop; } diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index 90d42b5..d48f565 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -47,6 +47,7 @@ along with GCC; see the file COPYING3. If not see #include "tree-vectorizer.h" #include "gimple-fold.h" #include "cgraph.h" +#include "tree-if-conv.h" /* Loop Vectorization Pass. @@ -1214,8 +1215,8 @@ destroy_loop_vec_info (loop_vec_info loop_vinfo, bool clean_stmts) destroy_cost_data (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo)); loop_vinfo->scalar_cost_vec.release (); + loop->aux = LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo); free (loop_vinfo); - loop->aux = NULL; } @@ -1501,13 +1502,24 @@ vect_analyze_loop_form (struct loop *loop) if (! vect_analyze_loop_form_1 (loop, &loop_cond, &number_of_iterationsm1, &number_of_iterations, &inner_loop_cond)) - return NULL; + { + loop->aux = NULL; + return NULL; + } loop_vec_info loop_vinfo = new_loop_vec_info (loop); LOOP_VINFO_NITERSM1 (loop_vinfo) = number_of_iterationsm1; LOOP_VINFO_NITERS (loop_vinfo) = number_of_iterations; LOOP_VINFO_NITERS_UNCHANGED (loop_vinfo) = number_of_iterations; + /* For epilogues we want to vectorize aux holds + loop_vec_info of the original loop. */ + if (loop->aux) + { + gcc_assert (LOOP_VINFO_VECTORIZABLE_P ((loop_vec_info)loop->aux)); + LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo) = (loop_vec_info)loop->aux; + } + if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)) { if (dump_enabled_p ()) @@ -1524,7 +1536,6 @@ vect_analyze_loop_form (struct loop *loop) STMT_VINFO_TYPE (vinfo_for_stmt (inner_loop_cond)) = loop_exit_ctrl_vec_info_type; - gcc_assert (!loop->aux); loop->aux = loop_vinfo; return loop_vinfo; } @@ -2284,7 +2295,10 @@ vect_analyze_loop (struct loop *loop) if (fatal || vector_sizes == 0 || current_vector_size == 0) - return NULL; + { + loop->aux = NULL; + return NULL; + } /* Try the next biggest vector size. */ current_vector_size = 1 << floor_log2 (vector_sizes); @@ -6573,10 +6587,11 @@ vect_generate_tmps_on_preheader (loop_vec_info loop_vinfo, Vectorize the loop - created vectorized stmts to replace the scalar stmts in the loop, and update the loop exit condition. */ -void +struct loop * vect_transform_loop (loop_vec_info loop_vinfo) { struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo); + struct loop *epilogue = NULL; basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo); int nbbs = loop->num_nodes; int i; @@ -6658,8 +6673,9 @@ vect_transform_loop (loop_vec_info loop_vinfo) ni_name = vect_build_loop_niters (loop_vinfo); vect_generate_tmps_on_preheader (loop_vinfo, ni_name, &ratio_mult_vf, &ratio); - vect_do_peeling_for_loop_bound (loop_vinfo, ni_name, ratio_mult_vf, - th, check_profitability); + epilogue = vect_do_peeling_for_loop_bound (loop_vinfo, ni_name, + ratio_mult_vf, th, + check_profitability); } else if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)) ratio = build_int_cst (TREE_TYPE (LOOP_VINFO_NITERS (loop_vinfo)), @@ -6964,6 +6980,59 @@ vect_transform_loop (loop_vec_info loop_vinfo) FOR_EACH_VEC_ELT (LOOP_VINFO_SLP_INSTANCES (loop_vinfo), i, instance) vect_free_slp_instance (instance); LOOP_VINFO_SLP_INSTANCES (loop_vinfo).release (); + + /* Don't vectorize epilogue for epilogue. */ + if (LOOP_VINFO_EPILOGUE_P (loop_vinfo)) + epilogue = NULL; + /* Scalar epilogue is not vectorized in case + we use combined vector epilogue. */ + else if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo)) + epilogue = NULL; + + if (epilogue) + { + if (!LOOP_VINFO_MASK_EPILOGUE (loop_vinfo)) + { + unsigned int vector_sizes + = targetm.vectorize.autovectorize_vector_sizes (); + vector_sizes &= current_vector_size - 1; + + if (!(flag_tree_vectorize_epilogues & VECT_EPILOGUE_NOMASK)) + epilogue = NULL; + else if (!vector_sizes) + epilogue = NULL; + else if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) + && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) >= 0) + { + int smallest_vec_size = 1 << ctz_hwi (vector_sizes); + int ratio = current_vector_size / smallest_vec_size; + int eiters = LOOP_VINFO_INT_NITERS (loop_vinfo) + - LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo); + eiters = eiters % vectorization_factor; + + epilogue->nb_iterations_upper_bound = eiters - 1; + + if (eiters < vectorization_factor / ratio) + epilogue = NULL; + } + } + } + + if (epilogue) + { + epilogue->force_vectorize = loop->force_vectorize; + epilogue->safelen = loop->safelen; + epilogue->dont_vectorize = false; + + /* We may need to if-convert epilogue to vectorize it. */ + if (LOOP_VINFO_SCALAR_LOOP (loop_vinfo)) + tree_if_conversion (epilogue); + + gcc_assert (!epilogue->aux); + epilogue->aux = loop_vinfo; + } + + return epilogue; } /* The code below is trying to perform simple optimization - revert diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c index 2669813..1fc8b65 100644 --- a/gcc/tree-vectorizer.c +++ b/gcc/tree-vectorizer.c @@ -491,14 +491,16 @@ vectorize_loops (void) { unsigned int i; unsigned int num_vectorized_loops = 0; - unsigned int vect_loops_num; + unsigned int vect_loops_num = number_of_loops (cfun); struct loop *loop; hash_table<simduid_to_vf> *simduid_to_vf_htab = NULL; hash_table<simd_array_to_simduid> *simd_array_to_simduid_htab = NULL; bool any_ifcvt_loops = false; unsigned ret = 0; + auto_vec<unsigned int> loops (vect_loops_num); - vect_loops_num = number_of_loops (cfun); + FOR_EACH_LOOP (loop, 0) + loops.quick_push (loop->num); /* Bail out if there are no loops. */ if (vect_loops_num <= 1) @@ -514,14 +516,18 @@ vectorize_loops (void) /* If some loop was duplicated, it gets bigger number than all previously defined loops. This fact allows us to run only over initial loops skipping newly generated ones. */ - FOR_EACH_LOOP (loop, 0) - if (loop->dont_vectorize) + for (i = 0; i < loops.length (); i++) + if (!(loop = get_loop (cfun, loops[i]))) + continue; + else if (loop->dont_vectorize) any_ifcvt_loops = true; else if ((flag_tree_loop_vectorize - && optimize_loop_nest_for_speed_p (loop)) + && (optimize_loop_nest_for_speed_p (loop) + || loop->aux)) || loop->force_vectorize) { loop_vec_info loop_vinfo; + struct loop *new_loop; vect_location = find_loop_location (loop); if (LOCATION_LOCUS (vect_location) != UNKNOWN_LOCATION && dump_enabled_p ()) @@ -551,12 +557,21 @@ vectorize_loops (void) && dump_enabled_p ()) dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location, "loop vectorized\n"); - vect_transform_loop (loop_vinfo); + new_loop = vect_transform_loop (loop_vinfo); num_vectorized_loops++; /* Now that the loop has been vectorized, allow it to be unrolled etc. */ loop->force_vectorize = false; + /* Add new loop to a processing queue. To make it easier + to match loop and its epilogue vectorization in dumps + put new loop as the next loop to process. */ + if (new_loop) + { + loops.safe_insert (i + 1, new_loop->num); + vect_loops_num = number_of_loops (cfun); + } + if (loop->simduid) { simduid_to_vf *simduid_to_vf_data = XNEW (simduid_to_vf); diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index 2c6cdbf..26d84b4 100644 --- a/gcc/tree-vectorizer.h +++ b/gcc/tree-vectorizer.h @@ -984,8 +984,8 @@ extern bool slpeel_can_duplicate_loop_p (const struct loop *, const_edge); struct loop *slpeel_tree_duplicate_loop_to_edge_cfg (struct loop *, struct loop *, edge); extern void vect_loop_versioning (loop_vec_info, unsigned int, bool); -extern void vect_do_peeling_for_loop_bound (loop_vec_info, tree, tree, - unsigned int, bool); +extern struct loop *vect_do_peeling_for_loop_bound (loop_vec_info, tree, tree, + unsigned int, bool); extern void vect_do_peeling_for_alignment (loop_vec_info, tree, unsigned int, bool); extern source_location find_loop_location (struct loop *); @@ -1099,7 +1099,7 @@ extern gimple *vect_force_simple_reduction (loop_vec_info, gimple *, bool, /* Drive for loop analysis stage. */ extern loop_vec_info vect_analyze_loop (struct loop *); /* Drive for loop transformation stage. */ -extern void vect_transform_loop (loop_vec_info); +extern struct loop *vect_transform_loop (loop_vec_info); extern loop_vec_info vect_analyze_loop_form (struct loop *); extern bool vectorizable_live_operation (gimple *, gimple_stmt_iterator *, gimple **);