Message ID | CAAkRFZLa+ENGNhBhpmwTWKTFbxAkqhs2KnNBdzZ0qmn39Ocihw@mail.gmail.com |
---|---|
State | New |
Headers | show |
On Mon, Sep 23, 2013 at 10:37 PM, Xinliang David Li <davidxl@google.com> wrote: > Thanks. I modified the patch so that the max allowed peel iterations > can be specified via a parameter. Testing on going. Ok for trunk ? +DEFPARAM(PARAM_VECT_MAX_PEELING_FOR_ALIGNMENT, + "vect-max-peeling-for-alignment", + "Max number of loop peels to enhancement alignment of data references in a loop", + -1, 0, 0) I believe this doesn't allow --param vect-max-peeling-for-alignment=-1 as you specify 0 as the minimum. So, ok with changing minimum and maxmum to -1. (double check that this works) Thanks, Richard. > David > > On Mon, Sep 23, 2013 at 4:31 AM, Richard Biener > <richard.guenther@gmail.com> wrote: >> On Wed, Sep 18, 2013 at 10:21 PM, Xinliang David Li <davidxl@google.com> wrote: >>> On Tue, Sep 17, 2013 at 1:20 AM, Richard Biener >>> <richard.guenther@gmail.com> wrote: >>>> On Mon, Sep 16, 2013 at 10:24 PM, Xinliang David Li <davidxl@google.com> wrote: >>>>> On Mon, Sep 16, 2013 at 3:13 AM, Richard Biener >>>>> <richard.guenther@gmail.com> wrote: >>>>>> On Fri, Sep 13, 2013 at 5:16 PM, Xinliang David Li <davidxl@google.com> wrote: >>>>>>> On Fri, Sep 13, 2013 at 1:30 AM, Richard Biener >>>>>>> <richard.guenther@gmail.com> wrote: >>>>>>>> On Thu, Sep 12, 2013 at 10:31 PM, Xinliang David Li <davidxl@google.com> wrote: >>>>>>>>> Currently -ftree-vectorize turns on both loop and slp vectorizations, >>>>>>>>> but there is no simple way to turn on loop vectorization alone. The >>>>>>>>> logic for default O3 setting is also complicated. >>>>>>>>> >>>>>>>>> In this patch, two new options are introduced: >>>>>>>>> >>>>>>>>> 1) -ftree-loop-vectorize >>>>>>>>> >>>>>>>>> This option is used to turn on loop vectorization only. option >>>>>>>>> -ftree-slp-vectorize also becomes a first class citizen, and no funny >>>>>>>>> business of Init(2) is needed. With this change, -ftree-vectorize >>>>>>>>> becomes a simple alias to -ftree-loop-vectorize + >>>>>>>>> -ftree-slp-vectorize. >>>>>>>>> >>>>>>>>> For instance, to turn on only slp vectorize at O3, the old way is: >>>>>>>>> >>>>>>>>> -O3 -fno-tree-vectorize -ftree-slp-vectorize >>>>>>>>> >>>>>>>>> With the new change it becomes: >>>>>>>>> >>>>>>>>> -O3 -fno-loop-vectorize >>>>>>>>> >>>>>>>>> >>>>>>>>> To turn on only loop vectorize at O2, the old way is >>>>>>>>> >>>>>>>>> -O2 -ftree-vectorize -fno-slp-vectorize >>>>>>>>> >>>>>>>>> The new way is >>>>>>>>> >>>>>>>>> -O2 -ftree-loop-vectorize >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> 2) -ftree-vect-loop-peeling >>>>>>>>> >>>>>>>>> This option is used to turn on/off loop peeling for alignment. In the >>>>>>>>> long run, this should be folded into the cheap cost model proposed by >>>>>>>>> Richard. This option is also useful in scenarios where peeling can >>>>>>>>> introduce runtime problems: >>>>>>>>> http://gcc.gnu.org/ml/gcc/2005-12/msg00390.html which happens to be >>>>>>>>> common in practice. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Patch attached. Compiler boostrapped. Ok after testing? >>>>>>>> >>>>>>>> I'd like you to split 1) and 2), mainly because I agree on 1) but not on 2). >>>>>>> >>>>>>> Ok. Can you also comment on 2) ? >>>>>> >>>>>> I think we want to decide how granular we want to control the vectorizer >>>>>> and using which mechanism. My cost-model re-org makes >>>>>> ftree-vect-loop-version a no-op (basically removes it), so 2) looks like >>>>>> a step backwards in this context. >>>>> >>>>> Using cost model to do a coarse grain control/configuration is >>>>> certainly something we want, but having a fine grain control is still >>>>> useful. >>>>> >>>>>> >>>>>> So, can you summarize what pieces (including versioning) of the vectorizer >>>>>> you'd want to be able to disable separately? >>>>> >>>>> Loop peeling seems to be the main one. There is also a correctness >>>>> issue related. For instance, the following code is common in practice, >>>>> but loop peeling wrongly assumes initial base-alignment and generates >>>>> aligned mov instruction after peeling, leading to SEGV. Peeling is >>>>> not something we can blindly turned on -- even when it is on, there >>>>> should be a way to turn it off explicitly: >>>>> >>>>> char a[10000]; >>>>> >>>>> void foo(int n) >>>>> { >>>>> int* b = (int*)(a+n); >>>>> int i = 0; >>>>> for (; i < 1000; ++i) >>>>> b[i] = 1; >>>>> } >>>>> >>>>> int main(int argn, char** argv) >>>>> { >>>>> foo(argn); >>>>> } >>>> >>>> But that's just a bug that should be fixed (looking into it). >>>> >>>>>> Just disabling peeling for >>>>>> alignment may get you into the versioning for alignment path (and thus >>>>>> an unvectorized loop at runtime). >>>>> >>>>> This is not true for target supporting mis-aligned access. I have not >>>>> seen a case where alignment driver loop version happens on x86. >>>>> >>>>>>Also it's know that the alignment peeling >>>>>> code needs some serious TLC (it's outcome depends on the order of DRs, >>>>>> the cost model it uses leaves to be desired as we cannot distinguish >>>>>> between unaligned load and store costs). >>>>> >>>>> Yet another reason to turn it off as it is not effective anyways? >>>> >>>> As said I'll disable all remains of -ftree-vect-loop-version with the cost model >>>> patch because it wasn't guarding versioning for aliasing but only versioning >>>> for alignment. >>>> >>>> We have to be consistent here - if we add a way to disable peeling for >>>> alignment then we certainly don't want to remove the ability to disable >>>> versioning for alignment, no? >>> >>> We already have the ability to turn off versioning -- via --param. It >>> is a more natural way to fine tune a pass instead of introducing a -f >>> option. For this reason, your planned deprecation of the option is a >>> good thing to do. >>> >>> For consistency, I think we should introduce a new parameter to turn >>> on/off peeling. This can also be tied closely with the cost model >>> change. >>> >>> The proposed patch attached. Does this one look ok? >> >> I'd principally support adding a param but rather would for example make >> it limit the number of peeled iterations (zero turning off the feature). As >> you implemented it it will give false messages for >> >> supportable_dr_alignment = vect_supportable_dr_alignment (dr, true); >> do_peeling = vector_alignment_reachable_p (dr); >> if (do_peeling) >> { >> ... >> } >> else >> { >> if (!aligned_access_p (dr)) >> { >> if (dump_enabled_p ()) >> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, >> "vector alignment may not be reachable\n"); >> break; >> } >> } >> >> the else path. I'd restructure this as >> >> if (!vector_alignment_reachable_p (dr)) >> { >> if (!aligned_access_p (dr)) >> .... >> continue; >> } >> >> if (known_alignment_for_access_p (dr)) >> { >> ... check param against the number of iterations to peel for this DR .... >> } >> else >> { >> ... check param against max iterations ... >> } >> >> an alternative is a tri-state param that would either allow no peeling at all, >> just peeling with known number of iterations or all kind of peelings. >> >> Thanks, >> Richard. >> >> >>> thanks, >>> >>> David >>> >>> >>> >>>> >>>> Richard. >>>> >>>>> >>>>> thanks, >>>>> >>>>> David >>>>> >>>>>> >>>>>> Richard. >>>>>> >>>>>>>> >>>>>>>> I've stopped a quick try doing 1) myself because >>>>>>>> >>>>>>>> @@ -1691,6 +1695,12 @@ common_handle_option (struct gcc_options >>>>>>>> opts->x_flag_ipa_reference = false; >>>>>>>> break; >>>>>>>> >>>>>>>> + case OPT_ftree_vectorize: >>>>>>>> + if (!opts_set->x_flag_tree_loop_vectorize) >>>>>>>> + opts->x_flag_tree_loop_vectorize = value; >>>>>>>> + if (!opts_set->x_flag_tree_slp_vectorize) >>>>>>>> + opts->x_flag_tree_slp_vectorize = value; >>>>>>>> + break; >>>>>>>> >>>>>>>> doesn't look obviously correct. Does that handle >>>>>>>> >>>>>>>> -ftree-vectorize -fno-tree-loop-vectorize -ftree-vectorize >>>>>>>> >>>>>>>> or >>>>>>>> >>>>>>>> -ftree-loop-vectorize -fno-tree-vectorize >>>>>>>> >>>>>>>> properly? Currently at least >>>>>>>> >>>>>>>> -ftree-slp-vectorize -fno-tree-vectorize >>>>>>>> >>>>>>>> doesn't "work". >>>>>>> >>>>>>> >>>>>>> Right -- same is true for -fprofile-use option. FDO enables some >>>>>>> passes, but can not re-enable them if they are flipped off before. >>>>>>> >>>>>>>> >>>>>>>> That said, the option machinery doesn't handle an option being an alias >>>>>>>> for two other options, so it's mechanism to contract positives/negatives >>>>>>>> doesn't work here and the override hooks do not work reliably for >>>>>>>> repeated options. >>>>>>>> >>>>>>>> Or am I wrong here? Should we care at all? Joseph? >>>>>>> >>>>>>> We should probably just document the behavior. Even better, we should >>>>>>> deprecate the old option. >>>>>>> >>>>>>> thanks, >>>>>>> >>>>>>> David >>>>>>> >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Richard. >>>>>>>> >>>>>>>>> >>>>>>>>> thanks, >>>>>>>>> >>>>>>>>> David
Index: params.def =================================================================== --- params.def (revision 202834) +++ params.def (working copy) @@ -544,6 +544,11 @@ DEFPARAM(PARAM_VECT_MAX_VERSION_FOR_ALIA "Bound on number of runtime checks inserted by the vectorizer's loop versioning for alias check", 10, 0, 0) +DEFPARAM(PARAM_VECT_MAX_PEELING_FOR_ALIGNMENT, + "vect-max-peeling-for-alignment", + "Max number of loop peels to enhancement alignment of data references in a loop", + -1, 0, 0) + DEFPARAM(PARAM_MAX_CSELIB_MEMORY_LOCATIONS, "max-cselib-memory-locations", "The maximum memory locations recorded by cselib", Index: tree-vect-data-refs.c =================================================================== --- tree-vect-data-refs.c (revision 202834) +++ tree-vect-data-refs.c (working copy) @@ -1718,6 +1718,30 @@ vect_enhance_data_refs_alignment (loop_v if (do_peeling) { + unsigned max_allowed_peel + = PARAM_VALUE (PARAM_VECT_MAX_PEELING_FOR_ALIGNMENT); + if (max_allowed_peel != (unsigned)-1) + { + unsigned max_peel = npeel; + if (max_peel == 0) + { + gimple dr_stmt = DR_STMT (dr0); + stmt_vec_info vinfo = vinfo_for_stmt (dr_stmt); + tree vtype = STMT_VINFO_VECTYPE (vinfo); + max_peel = TYPE_VECTOR_SUBPARTS (vtype) - 1; + } + if (max_peel > max_allowed_peel) + { + do_peeling = false; + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "Disable peeling, max peels reached: %d\n", max_peel); + } + } + } + + if (do_peeling) + { stmt_info_for_cost *si; void *data = LOOP_VINFO_TARGET_COST_DATA (loop_vinfo); Index: doc/invoke.texi =================================================================== --- doc/invoke.texi (revision 202834) +++ doc/invoke.texi (working copy) @@ -9451,6 +9451,10 @@ The maximum number of run-time checks th doing loop versioning for alias in the vectorizer. See option @option{-ftree-vect-loop-version} for more information. +@item vect-max-peeling-for-alignment +The maximum number of loop peels to enhance access alignment +for vectorizer. Value -1 means 'no limit'. + @item max-iterations-to-track The maximum number of iterations of a loop the brute-force algorithm for analysis of the number of iterations of the loop tries to evaluate.