Message ID | AANLkTikk4dX2Os740MRxRaaAzaOnyh6LmVvCij_aQwU8@mail.gmail.com |
---|---|
State | New |
Headers | show |
On Tue, 22 Jun 2010, Sebastian Pop wrote: > Hi, > > The attached patches clean the code generated by the if-conversion: > > 0001 adds a debug counter, useful to reduce code generation problems > due to the tree-if-conversion pass. Ok. > 0002 calls the cleanup_tree_cfg after if-conversion to ensure that the > CFG representation passed to the vectorizer is in good shape. Instead of + if (changed) + cleanup_tree_cfg (); return 0; } do return changed ? TODO_cleanup_cfg : 0; ok with that change. > 0003 adds some more sanity checks after the if-conversion pass: > verify_loop_structure and verify_loop_closed_ssa. This is already done. See passes.c. Not ok. > 0004 moves the code needed for the computation of the predicate of a > BB at the beginning of the predicated BB. The computation of the > insertion place is simplified, and makes it possible to use this > predicate in the code of the BB. Why should it ever be needed to use it in the code of the BB? As far as I see this only enlarges life-time of them. > 0005 implements and calls an unfold SSA function that first expands > the SSA_NAMEs and then calls fold on the expanded expression tree. > unfold_ssa_names exposes to fold a more complete expression than > otherwise available in the predicates of a BB. This patch is now > needed to improve the quality of the code generated by the > if-conversion as if-convert gimplifies the predicates in order to > avoid recomputations of some predicates. With this patch it is > possible to fold (a || b) into the true predicate when a and b are the > predicates of the two branches of a condition, as the SSA_NAMEs a and > b are first unfolded into their respective expressions and then fold > finishes by computing the true predicate. Note that this patch needs > some adjustment, i.e., replacing the magic constant 5 that is the > depth level for the unfold traversal to be some parameter, or so. + op2 = unfold_ssa_names (gimple_assign_rhs2 (stmt), level); this possibly accesses uninitialized and unallocated memory. I do not think this patch is a good idea at all and you allow exponential growth as you do not decrement level for each operand you visit. Thus, for all-ternary operands and your use of a limit of depth 5 you'd get 243 substitutions and foldings. At most a substitution depth of 1 would be which in turn will simplify that patch a lot. Please re-work it. > 0006 uses a single function reset_bb_predicate to reset the predicate > of a BB to true. This function has to release all the SSA_NAMEs used > in the gimplification of a predicate. Ok. > 0007 avoids the generation of code computing the true predicate, that > occurs for all the BBs merging disjunct predicates leading to the true > predicate. Ok. > 0008 forces the predicate of a BB to be either is_gimple_condexpr or > otherwise the predicate is gimplified, avoiding the insertion of the > same computation on other predicates. I suppose this is only necessary because of 0005, so defered. Please post your patches separately and inline, not as attachment. It's a major nuisance to deal with this kind of mails with its loads of attachements. Thanks, Richard.
On Fri, Jun 25, 2010 at 09:48, Richard Guenther <rguenther@suse.de> wrote: >> 0004 moves the code needed for the computation of the predicate of a >> BB at the beginning of the predicated BB. The computation of the >> insertion place is simplified, and makes it possible to use this >> predicate in the code of the BB. > > Why should it ever be needed to use it in the code of the BB? > As far as I see this only enlarges life-time of them. This is needed for predicating memory accesses in the BB: see 0009. I will include this patch with the other patch-set. > Please post your patches separately and inline, not as attachment. > It's a major nuisance to deal with this kind of mails with its loads > of attachements. Ok, sorry for the inconvenient. Sebastian
On Fri, Jun 25, 2010 at 09:48, Richard Guenther <rguenther@suse.de> wrote: >> 0005 implements and calls an unfold SSA function that first expands >> the SSA_NAMEs and then calls fold on the expanded expression tree. >> unfold_ssa_names exposes to fold a more complete expression than >> otherwise available in the predicates of a BB. This patch is now >> needed to improve the quality of the code generated by the >> if-conversion as if-convert gimplifies the predicates in order to >> avoid recomputations of some predicates. With this patch it is >> possible to fold (a || b) into the true predicate when a and b are the >> predicates of the two branches of a condition, as the SSA_NAMEs a and >> b are first unfolded into their respective expressions and then fold >> finishes by computing the true predicate. Note that this patch needs >> some adjustment, i.e., replacing the magic constant 5 that is the >> depth level for the unfold traversal to be some parameter, or so. > > + op2 = unfold_ssa_names (gimple_assign_rhs2 (stmt), level); > > this possibly accesses uninitialized and unallocated memory. How could this happen, could you explain? > I do not think this patch is a good idea at all and you allow > exponential growth as you do not decrement level for each operand > you visit. Thus, for all-ternary operands and your use of a limit > of depth 5 you'd get 243 substitutions and foldings. > > At most a substitution depth of 1 would be which in turn will > simplify that patch a lot. A substitution of depth 1 doesn't seem to be enough: if I am limiting the level to 1, then with my current patch-set, I get all these failures: FAIL: gcc.dg/vect/vect-ifcvt-16.c scan-tree-dump-times vect "vectorized 1 loops" 1 FAIL: gcc.dg/vect/vect-ifcvt-17.c scan-tree-dump-times vect "vectorized 1 loops" 1 FAIL: gcc.dg/vect/vect-ifcvt-2.c scan-tree-dump-times vect "vectorized 1 loops" 1 FAIL: gcc.dg/vect/vect-ifcvt-3.c scan-tree-dump-times vect "vectorized 1 loops" 1 FAIL: gcc.dg/vect/vect-ifcvt-4.c scan-tree-dump-times vect "vectorized 1 loops" 1 FAIL: gcc.dg/vect/vect-ifcvt-5.c scan-tree-dump-times vect "vectorized 1 loops" 1 FAIL: gcc.dg/vect/vect-ifcvt-6.c scan-tree-dump-times vect "vectorized 1 loops" 1 FAIL: gcc.dg/vect/vect-ifcvt-7.c scan-tree-dump-times vect "vectorized 1 loops" 1 FAIL: gcc.dg/vect/vect-ifcvt-9.c scan-tree-dump-times vect "vectorized 1 loops" 2 FAIL: gcc.dg/vect/no-trapping-math-2.c scan-tree-dump-times vect "vectorized 1 loops" 1 FAIL: gcc.dg/vect/no-trapping-math-vect-111.c scan-tree-dump-times vect "vectorized 1 loops" 1 FAIL: gcc.dg/vect/no-trapping-math-vect-ifcvt-11.c scan-tree-dump-times vect "vectorized 1 loops" 1 FAIL: gcc.dg/vect/no-trapping-math-vect-ifcvt-12.c scan-tree-dump-times vect "vectorized 1 loops" 1 FAIL: gcc.dg/vect/no-trapping-math-vect-ifcvt-13.c scan-tree-dump-times vect "vectorized 1 loops" 1 FAIL: gcc.dg/vect/no-trapping-math-vect-ifcvt-14.c scan-tree-dump-times vect "vectorized 1 loops" 1 FAIL: gcc.dg/vect/no-trapping-math-vect-ifcvt-15.c scan-tree-dump-times vect "vectorized 1 loops" 1 for level = 2, I still get these fails: FAIL: gcc.dg/vect/vect-ifcvt-16.c scan-tree-dump-times vect "vectorized 1 loops" 1 FAIL: gcc.dg/vect/vect-ifcvt-17.c scan-tree-dump-times vect "vectorized 1 loops" 1 for level = 3 these fails disappear. Sebastian
On Tue, 29 Jun 2010, Sebastian Pop wrote: > On Fri, Jun 25, 2010 at 09:48, Richard Guenther <rguenther@suse.de> wrote: > >> 0005 implements and calls an unfold SSA function that first expands > >> the SSA_NAMEs and then calls fold on the expanded expression tree. > >> unfold_ssa_names exposes to fold a more complete expression than > >> otherwise available in the predicates of a BB. This patch is now > >> needed to improve the quality of the code generated by the > >> if-conversion as if-convert gimplifies the predicates in order to > >> avoid recomputations of some predicates. With this patch it is > >> possible to fold (a || b) into the true predicate when a and b are the > >> predicates of the two branches of a condition, as the SSA_NAMEs a and > >> b are first unfolded into their respective expressions and then fold > >> finishes by computing the true predicate. Note that this patch needs > >> some adjustment, i.e., replacing the magic constant 5 that is the > >> depth level for the unfold traversal to be some parameter, or so. > > > > + op2 = unfold_ssa_names (gimple_assign_rhs2 (stmt), level); > > > > this possibly accesses uninitialized and unallocated memory. > > How could this happen, could you explain? You unconditionally access op2 w/o checking the rhs is binary. > > I do not think this patch is a good idea at all and you allow > > exponential growth as you do not decrement level for each operand > > you visit. Thus, for all-ternary operands and your use of a limit > > of depth 5 you'd get 243 substitutions and foldings. > > > > At most a substitution depth of 1 would be which in turn will > > simplify that patch a lot. > > A substitution of depth 1 doesn't seem to be enough: if I am limiting > the level to 1, then with my current patch-set, I get all these failures: How is it that an improvement patch creates new FAILs? W/o your patch we also don't need to "unfold". What kind of expressions do you need to fold? I suppose they are all comparisons and logical/bitwise and/or or nots? Please have a look at the infrastructure Sandra added for the patch for PRs 39874 and 28685 then. This is how it should be done - not using fold and building large trees for any expression kinds you are running into. I bet you can simply re-use what Sandra added. Richard. > FAIL: gcc.dg/vect/vect-ifcvt-16.c scan-tree-dump-times vect > "vectorized 1 loops" 1 > FAIL: gcc.dg/vect/vect-ifcvt-17.c scan-tree-dump-times vect > "vectorized 1 loops" 1 > FAIL: gcc.dg/vect/vect-ifcvt-2.c scan-tree-dump-times vect "vectorized > 1 loops" 1 > FAIL: gcc.dg/vect/vect-ifcvt-3.c scan-tree-dump-times vect "vectorized > 1 loops" 1 > FAIL: gcc.dg/vect/vect-ifcvt-4.c scan-tree-dump-times vect "vectorized > 1 loops" 1 > FAIL: gcc.dg/vect/vect-ifcvt-5.c scan-tree-dump-times vect "vectorized > 1 loops" 1 > FAIL: gcc.dg/vect/vect-ifcvt-6.c scan-tree-dump-times vect "vectorized > 1 loops" 1 > FAIL: gcc.dg/vect/vect-ifcvt-7.c scan-tree-dump-times vect "vectorized > 1 loops" 1 > FAIL: gcc.dg/vect/vect-ifcvt-9.c scan-tree-dump-times vect "vectorized > 1 loops" 2 > FAIL: gcc.dg/vect/no-trapping-math-2.c scan-tree-dump-times vect > "vectorized 1 loops" 1 > FAIL: gcc.dg/vect/no-trapping-math-vect-111.c scan-tree-dump-times > vect "vectorized 1 loops" 1 > FAIL: gcc.dg/vect/no-trapping-math-vect-ifcvt-11.c > scan-tree-dump-times vect "vectorized 1 loops" 1 > FAIL: gcc.dg/vect/no-trapping-math-vect-ifcvt-12.c > scan-tree-dump-times vect "vectorized 1 loops" 1 > FAIL: gcc.dg/vect/no-trapping-math-vect-ifcvt-13.c > scan-tree-dump-times vect "vectorized 1 loops" 1 > FAIL: gcc.dg/vect/no-trapping-math-vect-ifcvt-14.c > scan-tree-dump-times vect "vectorized 1 loops" 1 > FAIL: gcc.dg/vect/no-trapping-math-vect-ifcvt-15.c > scan-tree-dump-times vect "vectorized 1 loops" 1 > > for level = 2, I still get these fails: > > FAIL: gcc.dg/vect/vect-ifcvt-16.c scan-tree-dump-times vect > "vectorized 1 loops" 1 > FAIL: gcc.dg/vect/vect-ifcvt-17.c scan-tree-dump-times vect > "vectorized 1 loops" 1 > > for level = 3 these fails disappear. > > Sebastian > >
On Wed, Jun 30, 2010 at 03:22, Richard Guenther <rguenther@suse.de> wrote: >> A substitution of depth 1 doesn't seem to be enough: if I am limiting >> the level to 1, then with my current patch-set, I get all these failures: > > How is it that an improvement patch creates new FAILs? W/o your > patch we also don't need to "unfold". > The patch-set contains the if-conversion of memory writes, so these fails correspond to vect not able to handle the codes generated by the if-conversion, see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44710 > What kind of expressions do you need to fold? I suppose they > are all comparisons and logical/bitwise and/or or nots? > Yes, see the gimple code in the PR, although I guess that this case would be handled by level=1. > Please have a look at the infrastructure Sandra added > for the patch for PRs 39874 and 28685 then. This is how it should > be done - not using fold and building large trees for any > expression kinds you are running into. I bet you can simply > re-use what Sandra added. > Thanks for the pointer, I will try to reuse this. Sebastian
From 35f74eaa5d4a79f3fcdfeae8bdf3b81751c6f1ba Mon Sep 17 00:00:00 2001 From: Sebastian Pop <sebpop@gmail.com> Date: Wed, 16 Jun 2010 15:57:53 -0500 Subject: [PATCH 08/10] The predicate of a BB should either be a gimple_condexpr or an SSA_NAME. * tree-if-conv.c (add_to_predicate_list): Call force_gimple_operand when the predicate is not a gimple_condexpr. Call reset_bb_predicate when the predicate is true. * gcc.dg/tree-ssa/ifc-6.c: New. --- gcc/testsuite/gcc.dg/tree-ssa/ifc-6.c | 15 +++++++++++++++ gcc/tree-if-conv.c | 12 +++++++++++- 2 files changed, 26 insertions(+), 1 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-6.c diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-6.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-6.c new file mode 100644 index 0000000..a9c5db3 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-6.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-c -O2 -ftree-vectorize" { target *-*-* } } */ + +static int x; +foo (int n, int *A) +{ + int i; + for (i = 0; i < n; i++) + { + if (A[i]) + x = 2; + if (A[i + 1]) + x = 1; + } +} diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c index ac3ba93..759920c 100644 --- a/gcc/tree-if-conv.c +++ b/gcc/tree-if-conv.c @@ -278,7 +278,17 @@ add_to_predicate_list (basic_block bb, tree new_cond) cond = unfold_ssa_names (cond, 5); } - set_bb_predicate (bb, cond); + if (!is_gimple_condexpr (cond)) + { + gimple_seq stmts; + cond = force_gimple_operand (cond, &stmts, true, NULL_TREE); + add_bb_predicate_gimplified_stmts (bb, stmts); + } + + if (is_true_predicate (cond)) + reset_bb_predicate (bb); + else + set_bb_predicate (bb, cond); } /* Add the condition COND to the previous condition PREV_COND, and add -- 1.7.0.4