Message ID | CAAkRFZ+_ky0w9jmw64+-+iAMiZNSzJkcXKjw8rkWwmrv0vGTFA@mail.gmail.com |
---|---|
State | New |
Headers | show |
On Wed, Sep 18, 2013 at 10:21 PM, Xinliang David Li <davidxl@google.com> wrote: > On Tue, Sep 17, 2013 at 1:20 AM, Richard Biener > <richard.guenther@gmail.com> wrote: >> On Mon, Sep 16, 2013 at 10:24 PM, Xinliang David Li <davidxl@google.com> wrote: >>> On Mon, Sep 16, 2013 at 3:13 AM, Richard Biener >>> <richard.guenther@gmail.com> wrote: >>>> On Fri, Sep 13, 2013 at 5:16 PM, Xinliang David Li <davidxl@google.com> wrote: >>>>> On Fri, Sep 13, 2013 at 1:30 AM, Richard Biener >>>>> <richard.guenther@gmail.com> wrote: >>>>>> On Thu, Sep 12, 2013 at 10:31 PM, Xinliang David Li <davidxl@google.com> wrote: >>>>>>> Currently -ftree-vectorize turns on both loop and slp vectorizations, >>>>>>> but there is no simple way to turn on loop vectorization alone. The >>>>>>> logic for default O3 setting is also complicated. >>>>>>> >>>>>>> In this patch, two new options are introduced: >>>>>>> >>>>>>> 1) -ftree-loop-vectorize >>>>>>> >>>>>>> This option is used to turn on loop vectorization only. option >>>>>>> -ftree-slp-vectorize also becomes a first class citizen, and no funny >>>>>>> business of Init(2) is needed. With this change, -ftree-vectorize >>>>>>> becomes a simple alias to -ftree-loop-vectorize + >>>>>>> -ftree-slp-vectorize. >>>>>>> >>>>>>> For instance, to turn on only slp vectorize at O3, the old way is: >>>>>>> >>>>>>> -O3 -fno-tree-vectorize -ftree-slp-vectorize >>>>>>> >>>>>>> With the new change it becomes: >>>>>>> >>>>>>> -O3 -fno-loop-vectorize >>>>>>> >>>>>>> >>>>>>> To turn on only loop vectorize at O2, the old way is >>>>>>> >>>>>>> -O2 -ftree-vectorize -fno-slp-vectorize >>>>>>> >>>>>>> The new way is >>>>>>> >>>>>>> -O2 -ftree-loop-vectorize >>>>>>> >>>>>>> >>>>>>> >>>>>>> 2) -ftree-vect-loop-peeling >>>>>>> >>>>>>> This option is used to turn on/off loop peeling for alignment. In the >>>>>>> long run, this should be folded into the cheap cost model proposed by >>>>>>> Richard. This option is also useful in scenarios where peeling can >>>>>>> introduce runtime problems: >>>>>>> http://gcc.gnu.org/ml/gcc/2005-12/msg00390.html which happens to be >>>>>>> common in practice. >>>>>>> >>>>>>> >>>>>>> >>>>>>> Patch attached. Compiler boostrapped. Ok after testing? >>>>>> >>>>>> I'd like you to split 1) and 2), mainly because I agree on 1) but not on 2). >>>>> >>>>> Ok. Can you also comment on 2) ? >>>> >>>> I think we want to decide how granular we want to control the vectorizer >>>> and using which mechanism. My cost-model re-org makes >>>> ftree-vect-loop-version a no-op (basically removes it), so 2) looks like >>>> a step backwards in this context. >>> >>> Using cost model to do a coarse grain control/configuration is >>> certainly something we want, but having a fine grain control is still >>> useful. >>> >>>> >>>> So, can you summarize what pieces (including versioning) of the vectorizer >>>> you'd want to be able to disable separately? >>> >>> Loop peeling seems to be the main one. There is also a correctness >>> issue related. For instance, the following code is common in practice, >>> but loop peeling wrongly assumes initial base-alignment and generates >>> aligned mov instruction after peeling, leading to SEGV. Peeling is >>> not something we can blindly turned on -- even when it is on, there >>> should be a way to turn it off explicitly: >>> >>> char a[10000]; >>> >>> void foo(int n) >>> { >>> int* b = (int*)(a+n); >>> int i = 0; >>> for (; i < 1000; ++i) >>> b[i] = 1; >>> } >>> >>> int main(int argn, char** argv) >>> { >>> foo(argn); >>> } >> >> But that's just a bug that should be fixed (looking into it). >> >>>> Just disabling peeling for >>>> alignment may get you into the versioning for alignment path (and thus >>>> an unvectorized loop at runtime). >>> >>> This is not true for target supporting mis-aligned access. I have not >>> seen a case where alignment driver loop version happens on x86. >>> >>>>Also it's know that the alignment peeling >>>> code needs some serious TLC (it's outcome depends on the order of DRs, >>>> the cost model it uses leaves to be desired as we cannot distinguish >>>> between unaligned load and store costs). >>> >>> Yet another reason to turn it off as it is not effective anyways? >> >> As said I'll disable all remains of -ftree-vect-loop-version with the cost model >> patch because it wasn't guarding versioning for aliasing but only versioning >> for alignment. >> >> We have to be consistent here - if we add a way to disable peeling for >> alignment then we certainly don't want to remove the ability to disable >> versioning for alignment, no? > > We already have the ability to turn off versioning -- via --param. It > is a more natural way to fine tune a pass instead of introducing a -f > option. For this reason, your planned deprecation of the option is a > good thing to do. > > For consistency, I think we should introduce a new parameter to turn > on/off peeling. This can also be tied closely with the cost model > change. > > The proposed patch attached. Does this one look ok? I'd principally support adding a param but rather would for example make it limit the number of peeled iterations (zero turning off the feature). As you implemented it it will give false messages for supportable_dr_alignment = vect_supportable_dr_alignment (dr, true); do_peeling = vector_alignment_reachable_p (dr); if (do_peeling) { ... } else { if (!aligned_access_p (dr)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, "vector alignment may not be reachable\n"); break; } } the else path. I'd restructure this as if (!vector_alignment_reachable_p (dr)) { if (!aligned_access_p (dr)) .... continue; } if (known_alignment_for_access_p (dr)) { ... check param against the number of iterations to peel for this DR .... } else { ... check param against max iterations ... } an alternative is a tri-state param that would either allow no peeling at all, just peeling with known number of iterations or all kind of peelings. Thanks, Richard. > thanks, > > David > > > >> >> Richard. >> >>> >>> thanks, >>> >>> David >>> >>>> >>>> Richard. >>>> >>>>>> >>>>>> I've stopped a quick try doing 1) myself because >>>>>> >>>>>> @@ -1691,6 +1695,12 @@ common_handle_option (struct gcc_options >>>>>> opts->x_flag_ipa_reference = false; >>>>>> break; >>>>>> >>>>>> + case OPT_ftree_vectorize: >>>>>> + if (!opts_set->x_flag_tree_loop_vectorize) >>>>>> + opts->x_flag_tree_loop_vectorize = value; >>>>>> + if (!opts_set->x_flag_tree_slp_vectorize) >>>>>> + opts->x_flag_tree_slp_vectorize = value; >>>>>> + break; >>>>>> >>>>>> doesn't look obviously correct. Does that handle >>>>>> >>>>>> -ftree-vectorize -fno-tree-loop-vectorize -ftree-vectorize >>>>>> >>>>>> or >>>>>> >>>>>> -ftree-loop-vectorize -fno-tree-vectorize >>>>>> >>>>>> properly? Currently at least >>>>>> >>>>>> -ftree-slp-vectorize -fno-tree-vectorize >>>>>> >>>>>> doesn't "work". >>>>> >>>>> >>>>> Right -- same is true for -fprofile-use option. FDO enables some >>>>> passes, but can not re-enable them if they are flipped off before. >>>>> >>>>>> >>>>>> That said, the option machinery doesn't handle an option being an alias >>>>>> for two other options, so it's mechanism to contract positives/negatives >>>>>> doesn't work here and the override hooks do not work reliably for >>>>>> repeated options. >>>>>> >>>>>> Or am I wrong here? Should we care at all? Joseph? >>>>> >>>>> We should probably just document the behavior. Even better, we should >>>>> deprecate the old option. >>>>> >>>>> thanks, >>>>> >>>>> David >>>>> >>>>>> >>>>>> Thanks, >>>>>> Richard. >>>>>> >>>>>>> >>>>>>> thanks, >>>>>>> >>>>>>> David
Index: doc/invoke.texi =================================================================== --- doc/invoke.texi (revision 202660) +++ doc/invoke.texi (working copy) @@ -9451,6 +9451,9 @@ The maximum number of run-time checks th doing loop versioning for alias in the vectorizer. See option @option{-ftree-vect-loop-version} for more information. +@item vect-do-peeling-for-alignment +Turn on/off alignment enhancing loop peeling for vectorizer. + @item max-iterations-to-track The maximum number of iterations of a loop the brute-force algorithm for analysis of the number of iterations of the loop tries to evaluate. Index: tree-vect-data-refs.c =================================================================== --- tree-vect-data-refs.c (revision 202660) +++ tree-vect-data-refs.c (working copy) @@ -1349,6 +1349,7 @@ vect_enhance_data_refs_alignment (loop_v unsigned int i, j; bool do_peeling = false; bool do_versioning = false; + int peeling_enabled; bool stat; gimple stmt; stmt_vec_info stmt_info; @@ -1396,6 +1397,7 @@ vect_enhance_data_refs_alignment (loop_v - The cost of peeling (the extra runtime checks, the increase in code size). */ + peeling_enabled = PARAM_VALUE (PARAM_VECT_DO_PEELING_FOR_ALIGNMENT); FOR_EACH_VEC_ELT (datarefs, i, dr) { stmt = DR_STMT (dr); @@ -1420,7 +1422,7 @@ vect_enhance_data_refs_alignment (loop_v continue; supportable_dr_alignment = vect_supportable_dr_alignment (dr, true); - do_peeling = vector_alignment_reachable_p (dr); + do_peeling = (peeling_enabled && vector_alignment_reachable_p (dr)); if (do_peeling) { if (known_alignment_for_access_p (dr)) Index: params.def =================================================================== --- params.def (revision 202660) +++ params.def (working copy) @@ -544,6 +544,11 @@ DEFPARAM(PARAM_VECT_MAX_VERSION_FOR_ALIA "Bound on number of runtime checks inserted by the vectorizer's loop versioning for alias check", 10, 0, 0) +DEFPARAM(PARAM_VECT_DO_PEELING_FOR_ALIGNMENT, + "vect-do-peeling-for-alignment", + "Do loop peeling to enhancement alignment of data references in a loop", + 1, 0, 0) + DEFPARAM(PARAM_MAX_CSELIB_MEMORY_LOCATIONS, "max-cselib-memory-locations", "The maximum memory locations recorded by cselib",