Message ID: alpine.DEB.2.02.1411021029500.15070@stedding.saclay.inria.fr
State: New
On Sun, 2 Nov 2014, Marc Glisse wrote:

> On Thu, 30 Oct 2014, Richard Biener wrote:
>
>> On Thu, 30 Oct 2014, Jakub Jelinek wrote:
>>
>>> On Thu, Oct 30, 2014 at 09:56:32AM +0100, Richard Biener wrote:
>>>>
>>>> The following patch makes fold_ternary no longer make
>>>> VEC_PERMs valid for the target invalid.  As pointed out
>>>> in the PR we only need to make sure this doesn't happen
>>>> after vector lowering.
>>>
>>> Well, even if you do that before vector lowering, if the original
>>> mask is fine and the new one is not, you'd seriously pessimize code.
>>
>> Note that what fold does here isn't very elaborate - it just
>> tries hard to make a two-input VEC_PERM a one-input one, which
>> is good for canonicalization and CSE.  I'd say that the
>> optabs.c code should ideally be able to recover the working
>> variant (or the target expanders should be more clever...).
>
> It may be hard for optabs.c.  The target expanders should really be fixed.
> This isn't the same as when we were talking of combining any permutations;
> here it is something that should be fairly easy to do.  In that sense,
> simply avoiding the ICE (at least in release mode) and leaving the
> optimization to the targets should be enough.  Still, I am proposing
> something closer to Jakub's suggestion below.
>
>>> How about moving the VEC_PERM_EXPR arg2 == VECTOR_CST folding
>>> into a separate function with a single_arg argument, call it with
>>> operand_equal_p (op0, op1, 0) initially and call that function again
>>> if single_arg and !can_vec_perm_p (...), that time with the
>>> single_arg parameter false?
>
> If the original permutation is !can_vec_perm_p, I believe we should
> transform.  That can indeed be done with a function.  I did it without a
> function, but I can try to rewrite it if you want.
>
>> Or how about removing the code instead and doing it during
>> vector lowering if the original permute is not !can_vec_perm_p?
>
> I'd rather not delay that much.
>
> Here is a proposed patch that passed bootstrap+testsuite on x86_64-linux-gnu.
> I did not test on arm (or whichever platform it was that failed).  If
> can_vec_perm_p is costly, we can test single_arg first (otherwise the 2
> can_vec_perm_p must be the same) and test PROP_gimple_lvec before the second
> can_vec_perm_p (which must answer true then).
>
> I don't remember what the arg2 == op2 test is about, so I kept it.  I also
> didn't try to fix TREE_SIDE_EFFECTS handling; a quick test didn't trigger
> that issue.
>
> 2014-11-03  Marc Glisse  <marc.glisse@inria.fr>
>
> 	PR tree-optimization/63666
> 	* fold-const.c: Include "optabs.h".
> 	(fold_ternary_loc) <VEC_PERM_EXPR>: Avoid canonicalizing a
> 	can_vec_perm_p permutation to one that is not.

(sorry for the incomplete ChangeLog in the previous email)
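[Editor's note: the canonicalization under discussion can be sketched outside GCC as follows. This is illustrative C, not GCC code; the function name is hypothetical. A VEC_PERM_EXPR index in [0, nelts) selects from op0 and an index in [nelts, 2*nelts) from op1, so when op0 == op1 the "which input" bit of every index is redundant and fold masks it off, turning a two-input permute into a one-input one that CSE can match.]

```c
#include <stdbool.h>

/* Sketch of fold_ternary's mask canonicalization: reduce each
   selector index modulo the valid range.  With single_arg (op0 ==
   op1) the range is [0, nelts), otherwise [0, 2*nelts).  Returns
   whether any index actually changed (need_mask_canon).  */
static bool
canonicalize_perm_mask (unsigned int nelts, unsigned char *sel,
                        bool single_arg)
{
  unsigned int mask = single_arg ? nelts - 1 : 2 * nelts - 1;
  bool need_mask_canon = false;
  for (unsigned int i = 0; i < nelts; i++)
    {
      unsigned char c = sel[i] & mask;  /* drop out-of-range bits */
      if (c != sel[i])
        need_mask_canon = true;
      sel[i] = c;
    }
  return need_mask_canon;
}
```

For example, with nelts == 4 and op0 == op1, the two-input mask {4, 1, 6, 3} canonicalizes to {0, 1, 2, 3}, an identity permute — which is exactly the kind of rewrite that can become unexpandable on a target whose expander only handles the two-input form.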
On Sun, Nov 02, 2014 at 11:08:57AM +0100, Marc Glisse wrote:
> @@ -14189,47 +14190,47 @@ fold_ternary_loc (location_t loc, enum t
>  	return fold_build2_loc (loc, PLUS_EXPR, type,
>  				const_binop (MULT_EXPR, arg0, arg1), arg2);
>        if (integer_zerop (arg2))
>  	return fold_build2_loc (loc, MULT_EXPR, type, arg0, arg1);
>
>        return fold_fma (loc, type, arg0, arg1, arg2);
>
>      case VEC_PERM_EXPR:
>        if (TREE_CODE (arg2) == VECTOR_CST)
>  	{
> -	  unsigned int nelts = TYPE_VECTOR_SUBPARTS (type), i, mask;
> +	  unsigned int nelts = TYPE_VECTOR_SUBPARTS (type), i, mask, mask2;
>  	  unsigned char *sel = XALLOCAVEC (unsigned char, nelts);
> +	  unsigned char *sel2 = XALLOCAVEC (unsigned char, nelts);

Can't you just XALLOCAVEC 2 * nelts and set sel2 = sel + nelts; ?

>  	  bool need_mask_canon = false;
> +	  bool need_mask_canon2 = false;
>  	  bool all_in_vec0 = true;
>  	  bool all_in_vec1 = true;
>  	  bool maybe_identity = true;
>  	  bool single_arg = (op0 == op1);
>  	  bool changed = false;
>
>  	  mask = single_arg ? (nelts - 1) : (2 * nelts - 1);
> +	  mask2 = 2 * nelts - 1;

Perhaps mask2 = 2 * nelts - 1; first and use mask2 in mask = ?

Otherwise LGTM.

	Jakub
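[Editor's note: Jakub's allocation tip can be sketched like this, with malloc standing in for GCC's alloca-based XALLOCAVEC and a hypothetical helper name: one buffer of 2 * nelts bytes, with sel2 aliasing its second half, replaces the two separate allocations in the patch.]

```c
#include <stdlib.h>

/* One allocation serves both selector arrays; SEL2 points into the
   second half of the returned buffer.  The caller frees only the
   returned pointer (with XALLOCAVEC, nothing would be freed at all).  */
static unsigned char *
alloc_perm_masks (unsigned int nelts, unsigned char **sel2)
{
  unsigned char *sel = malloc (2 * nelts * sizeof (unsigned char));
  if (sel != NULL)
    *sel2 = sel + nelts;
  return sel;
}
```

Besides saving an allocation, this also mirrors Jakub's second remark: computing mask2 = 2 * nelts - 1 first lets mask be derived from it as single_arg ? (nelts - 1) : mask2.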
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	(revision 217007)
+++ gcc/fold-const.c	(working copy)
@@ -75,20 +75,21 @@ along with GCC; see the file COPYING3.
 #include "gimple.h"
 #include "gimplify.h"
 #include "tree-dfa.h"
 #include "hash-table.h"  /* Required for ENABLE_FOLD_CHECKING.  */
 #include "builtins.h"
 #include "hash-map.h"
 #include "plugin-api.h"
 #include "ipa-ref.h"
 #include "cgraph.h"
 #include "generic-match.h"
+#include "optabs.h"
 
 /* Nonzero if we are folding constants inside an initializer; zero
    otherwise.  */
 int folding_initializer = 0;
 
 /* The following constants represent a bit based encoding of GCC's
    comparison operators.  This encoding simplifies transformations
    on relational comparison operators, such as AND and OR.  */
 enum comparison_code {
   COMPCODE_FALSE = 0,
@@ -14189,47 +14190,47 @@ fold_ternary_loc (location_t loc, enum t
 	return fold_build2_loc (loc, PLUS_EXPR, type,
 				const_binop (MULT_EXPR, arg0, arg1), arg2);
       if (integer_zerop (arg2))
 	return fold_build2_loc (loc, MULT_EXPR, type, arg0, arg1);
 
       return fold_fma (loc, type, arg0, arg1, arg2);
 
     case VEC_PERM_EXPR:
       if (TREE_CODE (arg2) == VECTOR_CST)
 	{
-	  unsigned int nelts = TYPE_VECTOR_SUBPARTS (type), i, mask;
+	  unsigned int nelts = TYPE_VECTOR_SUBPARTS (type), i, mask, mask2;
 	  unsigned char *sel = XALLOCAVEC (unsigned char, nelts);
+	  unsigned char *sel2 = XALLOCAVEC (unsigned char, nelts);
 	  bool need_mask_canon = false;
+	  bool need_mask_canon2 = false;
 	  bool all_in_vec0 = true;
 	  bool all_in_vec1 = true;
 	  bool maybe_identity = true;
 	  bool single_arg = (op0 == op1);
 	  bool changed = false;
 
 	  mask = single_arg ? (nelts - 1) : (2 * nelts - 1);
+	  mask2 = 2 * nelts - 1;
 	  gcc_assert (nelts == VECTOR_CST_NELTS (arg2));
 	  for (i = 0; i < nelts; i++)
 	    {
 	      tree val = VECTOR_CST_ELT (arg2, i);
 	      if (TREE_CODE (val) != INTEGER_CST)
 		return NULL_TREE;
 
 	      /* Make sure that the perm value is in an acceptable
 		 range.  */
 	      wide_int t = val;
-	      if (wi::gtu_p (t, mask))
-		{
-		  need_mask_canon = true;
-		  sel[i] = t.to_uhwi () & mask;
-		}
-	      else
-		sel[i] = t.to_uhwi ();
+	      need_mask_canon |= wi::gtu_p (t, mask);
+	      need_mask_canon2 |= wi::gtu_p (t, mask2);
+	      sel[i] = t.to_uhwi () & mask;
+	      sel2[i] = t.to_uhwi () & mask2;
 
 	      if (sel[i] < nelts)
 		all_in_vec1 = false;
 	      else
 		all_in_vec0 = false;
 
 	      if ((sel[i] & (nelts-1)) != i)
 		maybe_identity = false;
 	    }
@@ -14257,20 +14258,31 @@ fold_ternary_loc (location_t loc, enum t
 		  || TREE_CODE (op1) == CONSTRUCTOR))
 	    {
 	      tree t = fold_vec_perm (type, op0, op1, sel);
 	      if (t != NULL_TREE)
 		return t;
 	    }
 
 	  if (op0 == op1 && !single_arg)
 	    changed = true;
 
+	  /* Some targets are deficient and fail to expand a single
+	     argument permutation while still allowing an equivalent
+	     2-argument version.  */
+	  if (need_mask_canon && arg2 == op2
+	      && !can_vec_perm_p (TYPE_MODE (type), false, sel)
+	      && can_vec_perm_p (TYPE_MODE (type), false, sel2))
+	    {
+	      need_mask_canon = need_mask_canon2;
+	      sel = sel2;
+	    }
+
 	  if (need_mask_canon && arg2 == op2)
 	    {
 	      tree *tsel = XALLOCAVEC (tree, nelts);
 	      tree eltype = TREE_TYPE (TREE_TYPE (arg2));
 	      for (i = 0; i < nelts; i++)
 		tsel[i] = build_int_cst (eltype, sel[i]);
 	      op2 = build_vector (TREE_TYPE (arg2), tsel);
 	      changed = true;
 	    }
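[Editor's note: the core decision the patch adds can be sketched as follows. This is illustrative C, not GCC code; the function names are hypothetical, and a function-pointer predicate stands in for GCC's can_vec_perm_p target query.]

```c
#include <stdbool.h>

/* Stand-in for GCC's can_vec_perm_p target query.  */
typedef bool (*perm_supported_fn) (const unsigned char *sel,
                                   unsigned int nelts);

/* Prefer the canonicalized mask SEL, but fall back to the original
   two-input mask SEL2 when the target rejects SEL yet accepts SEL2,
   so folding never turns an expandable permutation into one that is
   not.  */
static const unsigned char *
choose_perm_mask (const unsigned char *sel, const unsigned char *sel2,
                  unsigned int nelts, perm_supported_fn supported)
{
  if (!supported (sel, nelts) && supported (sel2, nelts))
    return sel2;
  return sel;
}

/* Contrived "deficient target" from the patch comment: it can only
   expand masks that reference the second input somewhere.  */
static bool
two_input_only (const unsigned char *sel, unsigned int nelts)
{
  for (unsigned int i = 0; i < nelts; i++)
    if (sel[i] >= nelts)
      return true;
  return false;
}
```

Against such a target, the canonical single-input mask {0, 1, 2, 3} is rejected but the equivalent two-input mask {4, 1, 6, 3} is kept, which is the behavior the ICE in PR 63666 was missing.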