| Message ID | 003701d826f6$fa3e5930$eebb0b90$@nextmovesoftware.com |
|---|---|
| State | New |
| Series | Implement constant-folding simplifications of reductions. |
On Mon, 21 Feb 2022, Roger Sayle wrote:

> +/* Fold REDUC (@0 op VECTOR_CST) as REDUC (@0) op REDUC (VECTOR_CST).  */
> +(for reduc (IFN_REDUC_PLUS IFN_REDUC_MAX IFN_REDUC_MIN IFN_REDUC_FMAX
> +	    IFN_REDUC_FMIN IFN_REDUC_AND IFN_REDUC_IOR IFN_REDUC_XOR)
> +     op (plus max min IFN_FMAX IFN_FMIN bit_and bit_ior bit_xor)
> +  (simplify (reduc (op @0 VECTOR_CST@1))
> +    (op (reduc:type @0) (reduc:type @1))))

I wonder if we need to test flag_associative_math for the 'plus' case, or if the presence of IFN_REDUC_PLUS is enough to justify the possible loss of precision.

--
Marc Glisse
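[Editor's note: to make the precision concern concrete, here is a small standalone C illustration, not from the thread; the values are invented, and neither evaluation order matches the other or the exact sum.]

```c
/* Standalone illustration (not part of the patch): reassociating a
   float sum changes the rounded result.  The exact sum is 2.0f, but
   neither evaluation order below computes it.  */
#include <stdio.h>

int main (void)
{
  float v[4] = { 1.0f, 1.0e8f, -1.0e8f, 1.0f };

  /* Strict left-to-right, as an in-order reduction would do:
     (1 + 1e8) rounds to 1e8, then - 1e8 gives 0, then + 1 gives 1.  */
  float ltr = ((v[0] + v[1]) + v[2]) + v[3];

  /* Pairwise, one order a vector reduction is free to use:
     (1 + 1e8) rounds to 1e8, (-1e8 + 1) rounds to -1e8, sum is 0.  */
  float pair = (v[0] + v[1]) + (v[2] + v[3]);

  printf ("%g %g\n", ltr, pair);   /* prints 1 0 */
  return 0;
}
```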
Hi Marc,

I'm assuming that the use (semantics) of a REDUC_PLUS expr allows the
reduction to be done in any order; for example, the testcase requires
-ffast-math to allow the REDUC_PLUS to be introduced in the first place.
This also explains why the patch doesn't need to distinguish negative
zeros from positive zeros in ctor_single_nonzero_element (but it's
perhaps something to beware of in uses of VECTOR_CST's
single_nonzero_element).

Cheers,
Roger
--

> -----Original Message-----
> From: Marc Glisse <marc.glisse@inria.fr>
> Sent: 21 February 2022 08:21
> To: Roger Sayle <roger@nextmovesoftware.com>
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] Implement constant-folding simplifications of reductions.
>
> I wonder if we need to test flag_associative_math for the 'plus' case, or
> if the presence of IFN_REDUC_PLUS is enough to justify the possible loss
> of precision.
>
> --
> Marc Glisse
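[Editor's note: the negative-zero caveat Roger mentions can be seen at runtime with a contrived example, not from the thread; it assumes the single "non-zero" element is a variable whose runtime value happens to be -0.0.]

```c
/* Contrived example (not from the patch) of the signed-zero caveat:
   if x happens to be -0.0f, any actual summation of { x, 0.f, 0.f, 0.f }
   yields +0.0f (because -0.0 + +0.0 is +0.0 under round-to-nearest),
   whereas folding the reduction to just x would yield -0.0f.  The
   transform is therefore only safe when signed zeros need not be
   honored, e.g. under -ffast-math.  */
#include <math.h>
#include <stdio.h>

int main (void)
{
  volatile float x = -0.0f;
  float sum = x + 0.0f + 0.0f + 0.0f;   /* any order gives +0.0f */
  printf ("sum signbit: %d, folded signbit: %d\n",
          signbit (sum) != 0, signbit (x) != 0);   /* prints 0, 1 */
  return 0;
}
```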
On Mon, Feb 21, 2022 at 9:31 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
> Hi Marc,
> I'm assuming that the use (semantics) of a REDUC_PLUS expr allows the
> reduction to be done in any order, for example the testcase requires
> -ffast-math to allow the REDUC_PLUS to be introduced in the first place.
> This also explains why the patch doesn't need to distinguish
> negative zeros from positive zeros in ctor_single_nonzero_element
> (but it's perhaps something to beware of in uses of VECTOR_CST's
> single_nonzero_element).

.IFN_REDUC_PLUS directly maps to the reduc_plus_scal optab, which does
not specify an operation order.  There's also fold_left_plus, which does
(left-to-right).

An argument could be made that constant folding should do what the CPU
will end up doing, but we're already not doing that for double arithmetic
with flush-to-zero enabled by -ffast-math and denormals, I think.

+/* Return the single non-zero element of a CONSTRUCTOR or NULL_TREE.  */
+tree
+ctor_single_nonzero_element (const_tree t)
+{
+  unsigned HOST_WIDE_INT idx;
+  constructor_elt *ce;
+  tree elt = NULL_TREE;
+
+  if (TREE_CODE (t) == SSA_NAME)
+    {
+      gassign *def = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (t));

I think this function should expect a CONSTRUCTOR 'T', and the use in
match.pd be adjusted similar to other CONSTRUCTOR users like

 (simplify (BIT_FIELD_REF CONSTRUCTOR@0 @1 @2)
 ...
  (with { tree ctor = (TREE_CODE (@0) == SSA_NAME
                       ? gimple_assign_rhs1 (SSA_NAME_DEF_STMT (@0)) : @0);

+/* Fold reduction of a single nonzero element constructor.  */
+(for reduc (IFN_REDUC_PLUS IFN_REDUC_IOR IFN_REDUC_XOR)
+  (simplify (reduc (CONSTRUCTOR@0))
+    (with { tree elt = ctor_single_nonzero_element (@0); }
+      (if (elt)
+       (non_lvalue { elt; })))))

is (non_lvalue ...) really necessary?  I highly doubt so.

I wonder if there's a case like { _1, 0., 0., -0. } where, independent of
the order of the operations, this transform relies on !HONOR_SIGNED_ZEROS?
It likely also should demote _1 from sNaN to qNaN and thus relies on
-fno-signalling-nans?

Otherwise OK.

Thanks,
Richard.
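[Editor's note: Richard's signalling-NaN point can be demonstrated directly; the sketch below is not part of the patch and uses GCC's __builtin_nansf to construct an sNaN. The point is that an actual addition quiets the sNaN and raises FE_INVALID, while folding the reduction to the element hands the sNaN back untouched.]

```c
/* Sketch (not part of the patch) of the sNaN hazard: a real addition
   quiets a signalling NaN and raises FE_INVALID, whereas folding
   REDUC_PLUS ({ _1, 0., 0., 0. }) to _1 returns the sNaN unquieted.
   volatile keeps the addition at run time so the flag is observable.  */
#include <fenv.h>
#include <stdio.h>

int main (void)
{
  volatile float snan = __builtin_nansf ("");
  feclearexcept (FE_INVALID);
  volatile float sum = snan + 0.0f + 0.0f + 0.0f;   /* quiets the NaN */
  printf ("FE_INVALID raised: %s\n",
          fetestexcept (FE_INVALID) ? "yes" : "no");   /* expect "yes" */
  (void) sum;
  return 0;
}
```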
On 2/21/2022 3:55 AM, Richard Biener via Gcc-patches wrote:
> .IFN_REDUC_PLUS directly maps to the reduc_plus_scal optab, which does
> not specify an operation order.  There's also fold_left_plus, which
> does (left-to-right).

fold_left_plus is meant to be a strictly in-order reduction, which
implies (at least to me) that REDUC_PLUS allows reassociation.

> An argument could be made that constant folding should do what the CPU
> will end up doing, but we're already not doing that for double
> arithmetic with flush-to-zero enabled by -ffast-math and denormals, I
> think.

Right.  We can also lose inexact state when folding, but I think that's
largely considered OK as well.

jeff
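[Editor's note: "losing inexact state" can be shown concretely; this sketch is not from the thread. A runtime addition whose result rounds raises FE_INEXACT, while a sum folded to a constant at compile time never sets the flag.]

```c
/* Sketch (not from the thread) of "losing inexact state": adding 1.0f
   to 1.0e8f rounds (the result is again 1.0e8f), so the runtime
   addition raises FE_INEXACT; a compiler that constant-folds the sum
   never sets the flag.  volatile forces the runtime computation.  */
#include <fenv.h>
#include <stdio.h>

int main (void)
{
  volatile float big = 1.0e8f;
  feclearexcept (FE_INEXACT);
  volatile float sum = big + 1.0f;   /* rounds back to 1.0e8f: inexact */
  printf ("FE_INEXACT raised: %s\n",
          fetestexcept (FE_INEXACT) ? "yes" : "no");   /* expect "yes" */
  (void) sum;
  return 0;
}
```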
diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 386d573..4283308 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -16792,6 +16792,33 @@ address_compare (tree_code code, tree type, tree op0, tree op1,
   return equal;
 }
 
+/* Return the single non-zero element of a CONSTRUCTOR or NULL_TREE.  */
+tree
+ctor_single_nonzero_element (const_tree t)
+{
+  unsigned HOST_WIDE_INT idx;
+  constructor_elt *ce;
+  tree elt = NULL_TREE;
+
+  if (TREE_CODE (t) == SSA_NAME)
+    {
+      gassign *def = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (t));
+      if (gimple_assign_rhs_code (def) == CONSTRUCTOR)
+	t = gimple_assign_rhs1 (def);
+    }
+
+  if (TREE_CODE (t) != CONSTRUCTOR)
+    return NULL_TREE;
+  for (idx = 0; vec_safe_iterate (CONSTRUCTOR_ELTS (t), idx, &ce); idx++)
+    if (!integer_zerop (ce->value) && !real_zerop (ce->value))
+      {
+	if (elt)
+	  return NULL_TREE;
+	elt = ce->value;
+      }
+  return elt;
+}
+
 #if CHECKING_P
 
 namespace selftest {
diff --git a/gcc/fold-const.h b/gcc/fold-const.h
index f217598..b2f0a2f 100644
--- a/gcc/fold-const.h
+++ b/gcc/fold-const.h
@@ -224,6 +224,7 @@ extern const char *c_getstr (tree);
 extern wide_int tree_nonzero_bits (const_tree);
 extern int address_compare (tree_code, tree, tree, tree, tree &, tree &,
 			    poly_int64 &, poly_int64 &, bool);
+extern tree ctor_single_nonzero_element (const_tree);
 
 /* Return OFF converted to a pointer offset type suitable as offset for
    POINTER_PLUS_EXPR.  Use location LOC for this conversion.  */
diff --git a/gcc/match.pd b/gcc/match.pd
index d9d8359..047fb50 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -7528,6 +7528,20 @@ and,
       (BIT_FIELD_REF:elt_type @0 { size; } { pos; })
       { elt; })))))))
 
+/* Fold reduction of a single nonzero element constructor.  */
+(for reduc (IFN_REDUC_PLUS IFN_REDUC_IOR IFN_REDUC_XOR)
+  (simplify (reduc (CONSTRUCTOR@0))
+    (with { tree elt = ctor_single_nonzero_element (@0); }
+      (if (elt)
+       (non_lvalue { elt; })))))
+
+/* Fold REDUC (@0 op VECTOR_CST) as REDUC (@0) op REDUC (VECTOR_CST).  */
+(for reduc (IFN_REDUC_PLUS IFN_REDUC_MAX IFN_REDUC_MIN IFN_REDUC_FMAX
+	    IFN_REDUC_FMIN IFN_REDUC_AND IFN_REDUC_IOR IFN_REDUC_XOR)
+     op (plus max min IFN_FMAX IFN_FMIN bit_and bit_ior bit_xor)
+  (simplify (reduc (op @0 VECTOR_CST@1))
+    (op (reduc:type @0) (reduc:type @1))))
+
 (simplify
  (vec_perm @0 @1 VECTOR_CST@2)
  (with
diff --git a/gcc/testsuite/gcc.dg/fold-reduc-1.c b/gcc/testsuite/gcc.dg/fold-reduc-1.c
new file mode 100644
index 0000000..c8360b0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-reduc-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -fdump-tree-optimized" } */
+float foo (float x)
+{
+  int i;
+  float j;
+  float a = 0;
+  for (i = 0; i < 4; ++i)
+    {
+      for (j = 0; j < 4; ++j)
+	{
+	  a += 1;
+	  x += a;
+	}
+    }
+  return x;
+}
+
+/* { dg-final { scan-tree-dump-not "REDUC_PLUS" "optimized" } } */
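[Editor's note: as a usage sketch, the hypothetical function below (not part of the patch) shows the kind of source the new REDUC (@0 op VECTOR_CST) rule targets, using GCC vector extensions. Whether the lane sum is actually recognized as IFN_REDUC_PLUS depends on target and flags, e.g. -O2 -ffast-math on a target with a reduc_plus_scal pattern.]

```c
/* Hypothetical illustration (not part of the patch) of the
   REDUC (@0 op VECTOR_CST) -> REDUC (@0) op REDUC (VECTOR_CST) rule,
   using GCC vector extensions.  If the sum of t's lanes is formed as
   IFN_REDUC_PLUS (target- and flag-dependent), the rule rewrites it
   to REDUC_PLUS (x) + 10.0f, since the constant vector's reduction
   1+2+3+4 folds at compile time.  */
typedef float v4sf __attribute__ ((vector_size (16)));

float
reduc_plus_cst (v4sf x)
{
  v4sf c = { 1.0f, 2.0f, 3.0f, 4.0f };
  v4sf t = x + c;
  return t[0] + t[1] + t[2] + t[3];
}
```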