diff mbox series

Implement constant-folding simplifications of reductions.

Message ID 003701d826f6$fa3e5930$eebb0b90$@nextmovesoftware.com
State New
Series Implement constant-folding simplifications of reductions.

Commit Message

Roger Sayle Feb. 21, 2022, 7:45 a.m. UTC
This patch addresses a code quality regression in GCC 12 by implementing
some constant folding/simplification transformations for REDUC_PLUS_EXPR
in match.pd.  The motivating example is gcc.dg/vect/pr89440.c which with
-O2 -ffast-math (with vectorization now enabled) gets optimized to:

float f (float x)
{
  vector(4) float vect_x_14.11;
  vector(4) float _2;
  float _32;

  _2 = {x_9(D), 0.0, 0.0, 0.0};
  vect_x_14.11_29 = _2 + { 1.0e+1, 2.6e+1, 4.2e+1, 5.8e+1 };
  _32 = .REDUC_PLUS (vect_x_14.11_29); [tail call]
  return _32;
}

With these proposed new transformations, we can simplify the
above code even further.

float f (float x)
{
  float _32;
  _32 = x_9(D) + 1.36e+2;
  return _32;
}

[which happens to match what we'd produce with -fno-tree-vectorize,
and with GCC 11].

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check with no new failures.  Ok for mainline?


2022-02-21  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* fold-const.cc (ctor_single_nonzero_element): New function to
	return the single non-zero element of a (vector) constructor.
	* fold-const.h (ctor_single_nonzero_element): Prototype here.
	* match.pd (reduc (constructor@0)): Simplify reductions of a
	constructor containing a single non-zero element.
	(reduc (@0 op VECTOR_CST) ->  (reduc @0) op CONST): Simplify
	reductions of vector operations of the same operator with
	constant vector operands.

gcc/testsuite/ChangeLog
	* gcc.dg/fold-reduc-1.c: New test case.


Thanks in advance,
Roger
--

Comments

Marc Glisse Feb. 21, 2022, 8:20 a.m. UTC | #1
On Mon, 21 Feb 2022, Roger Sayle wrote:

> +/* Fold REDUC (@0 op VECTOR_CST) as REDUC (@0) op REDUC (VECTOR_CST).  */
> +(for reduc (IFN_REDUC_PLUS IFN_REDUC_MAX IFN_REDUC_MIN IFN_REDUC_FMAX
> +            IFN_REDUC_FMIN IFN_REDUC_AND IFN_REDUC_IOR IFN_REDUC_XOR)
> +     op (plus max min IFN_FMAX IFN_FMIN bit_and bit_ior bit_xor)
> +  (simplify (reduc (op @0 VECTOR_CST@1))
> +    (op (reduc:type @0) (reduc:type @1))))

I wonder if we need to test flag_associative_math for the 'plus' case,
or if the presence of IFN_REDUC_PLUS is enough to justify the possible
loss of precision.
Roger Sayle Feb. 21, 2022, 8:31 a.m. UTC | #2
Hi Marc,
I'm assuming that the use (semantics) of a REDUC_PLUS expr allow the
reduction to be done in any order, for example the testcase requires 
-ffast-math to allow the REDUC_PLUS to be introduced in the first place.
This also explains why the patch doesn't need to distinguish
negative zeros from positive zeros in ctor_single_nonzero_element
(but it's perhaps something to beware of in uses of VECTOR_CST's 
single_nonzero_element).

Cheers,
Roger
--

> -----Original Message-----
> From: Marc Glisse <marc.glisse@inria.fr>
> Sent: 21 February 2022 08:21
> To: Roger Sayle <roger@nextmovesoftware.com>
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] Implement constant-folding simplifications of
> reductions.
> 
> On Mon, 21 Feb 2022, Roger Sayle wrote:
> 
> > +/* Fold REDUC (@0 op VECTOR_CST) as REDUC (@0) op REDUC (VECTOR_CST).  */
> > +(for reduc (IFN_REDUC_PLUS IFN_REDUC_MAX IFN_REDUC_MIN IFN_REDUC_FMAX
> > +            IFN_REDUC_FMIN IFN_REDUC_AND IFN_REDUC_IOR IFN_REDUC_XOR)
> > +     op (plus max min IFN_FMAX IFN_FMIN bit_and bit_ior bit_xor)
> > +  (simplify (reduc (op @0 VECTOR_CST@1))
> > +    (op (reduc:type @0) (reduc:type @1))))
> 
> I wonder if we need to test flag_associative_math for the 'plus' case,
> or if the presence of IFN_REDUC_PLUS is enough to justify the possible
> loss of precision.
> 
> --
> Marc Glisse
Richard Biener Feb. 21, 2022, 10:55 a.m. UTC | #3
On Mon, Feb 21, 2022 at 9:31 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> Hi Marc,
> I'm assuming that the use (semantics) of a REDUC_PLUS expr allow the
> reduction to be done in any order, for example the testcase requires
> -ffast-math to allow the REDUC_PLUS to be introduced in the first place.
> This also explains why the patch doesn't need to distinguish
> negative zeros from positive zeros in ctor_single_nonzero_element
> (but it's perhaps something to beware of in uses of VECTOR_CST's
> single_nonzero_element).

.IFN_REDUC_PLUS directly maps to the reduc_plus_scal optab
which does not specify an operation order.  There's also
fold_left_plus which does (left-to-right).

An argument could be made that constant folding should do what
the CPU will end up doing but we're already not doing that for
double arithmetic and flush-to-zero enabled with -ffast-math and
denormals I think.

+/* Return the single non-zero element of a CONSTRUCTOR or NULL_TREE.  */
+tree
+ctor_single_nonzero_element (const_tree t)
+{
+  unsigned HOST_WIDE_INT idx;
+  constructor_elt *ce;
+  tree elt = NULL_TREE;
+
+  if (TREE_CODE (t) == SSA_NAME)
+    {
+      gassign *def = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (t));

I think this function should expect a CONSTRUCTOR 'T' and the use in
match.pd be adjusted similar to other CONSTRUCTOR users like

(simplify
 (BIT_FIELD_REF CONSTRUCTOR@0 @1 @2)
...
  (with
   {
     tree ctor = (TREE_CODE (@0) == SSA_NAME
                  ? gimple_assign_rhs1 (SSA_NAME_DEF_STMT (@0)) : @0);


+/* Fold reduction of a single nonzero element constructor.  */
+(for reduc (IFN_REDUC_PLUS IFN_REDUC_IOR IFN_REDUC_XOR)
+  (simplify (reduc (CONSTRUCTOR@0))
+    (with { tree elt = ctor_single_nonzero_element (@0); }
+      (if (elt)
+        (non_lvalue { elt; })))))

is (non_lvalue ...) really necessary?  I highly doubt so.

I wonder if there's a case like { _1, 0., 0., -0. } where independent on
the order of the operations this transform relies on !HONOR_SIGNED_ZEROS?
It likely also should demote _1 from sNaN to qNaN and thus relies on
-fno-signalling-nans?

Otherwise OK.

Thanks,
Richard.

> Cheers,
> Roger
> --
>
> > -----Original Message-----
> > From: Marc Glisse <marc.glisse@inria.fr>
> > Sent: 21 February 2022 08:21
> > To: Roger Sayle <roger@nextmovesoftware.com>
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: Re: [PATCH] Implement constant-folding simplifications of
> > reductions.
> >
> > On Mon, 21 Feb 2022, Roger Sayle wrote:
> >
> > > +/* Fold REDUC (@0 op VECTOR_CST) as REDUC (@0) op REDUC (VECTOR_CST).  */
> > > +(for reduc (IFN_REDUC_PLUS IFN_REDUC_MAX IFN_REDUC_MIN IFN_REDUC_FMAX
> > > +            IFN_REDUC_FMIN IFN_REDUC_AND IFN_REDUC_IOR IFN_REDUC_XOR)
> > > +     op (plus max min IFN_FMAX IFN_FMIN bit_and bit_ior bit_xor)
> > > +  (simplify (reduc (op @0 VECTOR_CST@1))
> > > +    (op (reduc:type @0) (reduc:type @1))))
> >
> > I wonder if we need to test flag_associative_math for the 'plus' case,
> > or if the presence of IFN_REDUC_PLUS is enough to justify the possible
> > loss of precision.
> >
> > --
> > Marc Glisse
>
Jeff Law Feb. 21, 2022, 6:22 p.m. UTC | #4
On 2/21/2022 3:55 AM, Richard Biener via Gcc-patches wrote:
> On Mon, Feb 21, 2022 at 9:31 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
>>
>> Hi Marc,
>> I'm assuming that the use (semantics) of a REDUC_PLUS expr allow the
>> reduction to be done in any order, for example the testcase requires
>> -ffast-math to allow the REDUC_PLUS to be introduced in the first place.
>> This also explains why the patch doesn't need to distinguish
>> negative zeros from positive zeros in ctor_single_nonzero_element
>> (but it's perhaps something to beware of in uses of VECTOR_CST's
>> single_nonzero_element).
> .IFN_REDUC_PLUS directly maps to the reduc_plus_scal optab
> which does not specify an operation order.  There's also
> fold_left_plus which does (left-to-right).
fold-left-plus is meant to be a strictly in-order reduction, which 
implies (at least to me) that REDUC_PLUS allows reassociation.
> An argument could be made that constant folding should do what
> the CPU will end up doing but we're already not doing that for
> double arithmetic and flush-to-zero enabled with -ffast-math and
> denormals I think.
Right.  We can also lose inexact state when folding, but I think that's 
largely considered OK as well.

jeff

Patch

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 386d573..4283308 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -16792,6 +16792,33 @@  address_compare (tree_code code, tree type, tree op0, tree op1,
   return equal;
 }
 
+/* Return the single non-zero element of a CONSTRUCTOR or NULL_TREE.  */
+tree
+ctor_single_nonzero_element (const_tree t)
+{
+  unsigned HOST_WIDE_INT idx;
+  constructor_elt *ce;
+  tree elt = NULL_TREE;
+
+  if (TREE_CODE (t) == SSA_NAME)
+    {
+      gassign *def = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (t));
+      if (gimple_assign_rhs_code (def) == CONSTRUCTOR)
+        t = gimple_assign_rhs1 (def);
+    }
+
+  if (TREE_CODE (t) != CONSTRUCTOR)
+    return NULL_TREE;
+  for (idx = 0; vec_safe_iterate (CONSTRUCTOR_ELTS (t), idx, &ce); idx++)
+    if (!integer_zerop (ce->value) && !real_zerop (ce->value))
+      {
+	if (elt)
+	  return NULL_TREE;
+	elt = ce->value;
+      }
+  return elt;
+}
+
 #if CHECKING_P
 
 namespace selftest {
diff --git a/gcc/fold-const.h b/gcc/fold-const.h
index f217598..b2f0a2f 100644
--- a/gcc/fold-const.h
+++ b/gcc/fold-const.h
@@ -224,6 +224,7 @@  extern const char *c_getstr (tree);
 extern wide_int tree_nonzero_bits (const_tree);
 extern int address_compare (tree_code, tree, tree, tree, tree &, tree &,
 			    poly_int64 &, poly_int64 &, bool);
+extern tree ctor_single_nonzero_element (const_tree);
 
 /* Return OFF converted to a pointer offset type suitable as offset for
    POINTER_PLUS_EXPR.  Use location LOC for this conversion.  */
diff --git a/gcc/match.pd b/gcc/match.pd
index d9d8359..047fb50 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -7528,6 +7528,20 @@  and,
        (BIT_FIELD_REF:elt_type @0 { size; } { pos; })
        { elt; })))))))
 
+/* Fold reduction of a single nonzero element constructor.  */
+(for reduc (IFN_REDUC_PLUS IFN_REDUC_IOR IFN_REDUC_XOR)
+  (simplify (reduc (CONSTRUCTOR@0))
+    (with { tree elt = ctor_single_nonzero_element (@0); }
+      (if (elt)
+        (non_lvalue { elt; })))))
+
+/* Fold REDUC (@0 op VECTOR_CST) as REDUC (@0) op REDUC (VECTOR_CST).  */
+(for reduc (IFN_REDUC_PLUS IFN_REDUC_MAX IFN_REDUC_MIN IFN_REDUC_FMAX
+            IFN_REDUC_FMIN IFN_REDUC_AND IFN_REDUC_IOR IFN_REDUC_XOR)
+     op (plus max min IFN_FMAX IFN_FMIN bit_and bit_ior bit_xor)
+  (simplify (reduc (op @0 VECTOR_CST@1))
+    (op (reduc:type @0) (reduc:type @1))))
+
 (simplify
  (vec_perm @0 @1 VECTOR_CST@2)
  (with
diff --git a/gcc/testsuite/gcc.dg/fold-reduc-1.c b/gcc/testsuite/gcc.dg/fold-reduc-1.c
new file mode 100644
index 0000000..c8360b0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-reduc-1.c
@@ -0,0 +1,19 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -fdump-tree-optimized" } */
+float foo (float x)
+{
+ int i;
+ float j;
+ float a = 0;
+ for (i = 0; i < 4; ++i)
+   {
+     for (j = 0; j < 4; ++j)
+       {
+         a += 1;
+         x += a;
+       }
+   }
+ return x;
+}
+
+/* { dg-final { scan-tree-dump-not "REDUC_PLUS" "optimized"} } */