diff mbox series

[2/2] middle-end: use two's complement equality when comparing IVs during candidate selection [PR114932]

Message ID ZsSUzWt43/xcbN/8@arm.com
State New
Headers show
Series [1/2] middle-end: refactor type to be explicit in operand_equal_p [PR114932] | expand

Commit Message

Tamar Christina Aug. 20, 2024, 1:06 p.m. UTC
Hi All,

IVOPTS normally uses affine trees to perform comparisons between different IVs,
but these seem to have been missing in two key spots and instead normal tree
equivalencies used.

In some cases where we have a two-complements equivalence but not a strict
signedness equivalencies we end up generating both a signed and unsigned IV for
the same candidate.

This patch implements a new OEP flag called OEP_STRUCTURAL_EQ.  This flag will
check if the operands would produce the same bit values after the computations
even if the final sign is different.

This happens quite a lot with fortran but can also happen in C because this came
code is unable to figure out when one expression is a multiple of another.

As an example in the attached testcase we get:

Initial set of candidates:
  cost: 24 (complexity 3)
  reg_cost: 9
  cand_cost: 15
  cand_group_cost: 0 (complexity 3)
  candidates: 1, 6, 8
   group:0 --> iv_cand:6, cost=(0,1)
   group:1 --> iv_cand:1, cost=(0,0)
   group:2 --> iv_cand:8, cost=(0,1)
   group:3 --> iv_cand:8, cost=(0,1)
  invariant variables: 6
  invariant expressions: 1, 2

<Invariant Expressions>:
inv_expr 1:     stride.3_27 * 4
inv_expr 2:     (unsigned long) stride.3_27 * 4

These end up being used in the same group:

Group 1:
cand  cost    compl.  inv.expr.       inv.vars
1     0       0       NIL;    6
2     0       0       NIL;    6
3     0       0       NIL;    6

which ends up with IV opts picking the signed and unsigned IVs:

Improved to:
  cost: 24 (complexity 3)
  reg_cost: 9
  cand_cost: 15
  cand_group_cost: 0 (complexity 3)
  candidates: 1, 6, 8
   group:0 --> iv_cand:6, cost=(0,1)
   group:1 --> iv_cand:1, cost=(0,0)
   group:2 --> iv_cand:8, cost=(0,1)
   group:3 --> iv_cand:8, cost=(0,1)
  invariant variables: 6
  invariant expressions: 1, 2

and so generates the same IV as both signed and unsigned:

;;   basic block 21, loop depth 3, count 214748368 (estimated locally, freq 58.2545), maybe hot
;;    prev block 28, next block 31, flags: (NEW, REACHABLE, VISITED)
;;    pred:       28 [always]  count:23622320 (estimated locally, freq 6.4080) (FALLTHRU,EXECUTABLE)
;;                25 [always]  count:191126046 (estimated locally, freq 51.8465) (FALLTHRU,DFS_BACK,EXECUTABLE)
  # .MEM_66 = PHI <.MEM_34(28), .MEM_22(25)>
  # ivtmp.22_41 = PHI <0(28), ivtmp.22_82(25)>
  # ivtmp.26_51 = PHI <ivtmp.26_55(28), ivtmp.26_72(25)>
  # ivtmp.28_90 = PHI <ivtmp.28_99(28), ivtmp.28_98(25)>

...

;;   basic block 24, loop depth 3, count 214748366 (estimated locally, freq 58.2545), maybe hot
;;    prev block 22, next block 25, flags: (NEW, REACHABLE, VISITED)'
;;    pred:       22 [always]  count:95443719 (estimated locally, freq 25.8909) (FALLTHRU)
;;                21 [33.3% (guessed)]  count:71582790 (estimated locally, freq 19.4182) (TRUE_VALUE,EXECUTABLE)
;;                31 [33.3% (guessed)]  count:47721860 (estimated locally, freq 12.9455) (TRUE_VALUE,EXECUTABLE)
# .MEM_22 = PHI <.MEM_44(22), .MEM_31(21), .MEM_79(31)>
ivtmp.22_82 = ivtmp.22_41 + 1;
ivtmp.26_72 = ivtmp.26_51 + _80;
ivtmp.28_98 = ivtmp.28_90 + _39;  

These two IVs are always used as unsigned, so IV ops generates:

  _73 = stride.3_27 * 4;
  _80 = (unsigned long) _73;
  _54 = (unsigned long) stride.3_27;
  _39 = _54 * 4;

Which means that in e.g. exchange2 we generate a lot of duplicate code.

This is because candidate 6 and 8 are equivalent under two's complement but have
different signs.

This patch changes it so that if you have two IVs that are affine equivalent to
just pick one over the other.  IV already has code for this, so the patch just
uses affine trees instead of tree for the check.

With it we get:

<Invariant Expressions>:
inv_expr 1:     stride.3_27 * 4

<Group-candidate Costs>:
Group 0:
  cand  cost    compl.  inv.expr.       inv.vars
  5     0       2       NIL;    NIL;
  6     0       3       NIL;    NIL;

Group 1:
  cand  cost    compl.  inv.expr.       inv.vars
  1     0       0       NIL;    6
  2     0       0       NIL;    6
  3     0       0       NIL;    6
  4     0       0       NIL;    6

Initial set of candidates:
  cost: 16 (complexity 3)
  reg_cost: 6
  cand_cost: 10
  cand_group_cost: 0 (complexity 3)
  candidates: 1, 6
   group:0 --> iv_cand:6, cost=(0,3)
   group:1 --> iv_cand:1, cost=(0,0)
  invariant variables: 6
  invariant expressions: 1

The two patches together results in a 10% performance increase in exchange2 in
SPECCPU 2017 and a 4% reduction in binary size and a 5% improvement in compile
time. There's also a 5% performance improvement in fotonik3d and similar
reduction in binary size.

Bootstrapped Regtested on aarch64-none-linux-gnu,
arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
-m32, -m64 and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	PR tree-optimization/114932
	* fold-const.cc (operand_compare::operand_equal_p): Use it.
	(operand_compare::verify_hash_value): Likewise.
	(test_operand_equality::test): New.
	(fold_const_cc_tests): Use it.
	* tree-core.h (enum operand_equal_flag): Add OEP_STRUCTURAL_EQ.
	* tree-ssa-loop-ivopts.cc (record_group_use): Check for structural eq.

gcc/testsuite/ChangeLog:

	PR tree-optimization/114932
	* gfortran.dg/addressing-modes_2.f90: New test.

---




--

Comments

Tamar Christina Sept. 10, 2024, 7:56 p.m. UTC | #1
ping

> -----Original Message-----
> From: Tamar Christina <tamar.christina@arm.com>
> Sent: Tuesday, August 20, 2024 2:06 PM
> To: gcc-patches@gcc.gnu.org
> Cc: nd <nd@arm.com>; rguenther@suse.de; jlaw@ventanamicro.com
> Subject: [PATCH 2/2]middle-end: use two's complement equality when comparing
> IVs during candidate selection [PR114932]
> 
> Hi All,
> 
> IVOPTS normally uses affine trees to perform comparisons between different IVs,
> but these seem to have been missing in two key spots and instead normal tree
> equivalencies used.
> 
> In some cases where we have a two-complements equivalence but not a strict
> signedness equivalencies we end up generating both a signed and unsigned IV for
> the same candidate.
> 
> This patch implements a new OEP flag called OEP_STRUCTURAL_EQ.  This flag will
> check if the operands would produce the same bit values after the computations
> even if the final sign is different.
> 
> This happens quite a lot with fortran but can also happen in C because this came
> code is unable to figure out when one expression is a multiple of another.
> 
> As an example in the attached testcase we get:
> 
> Initial set of candidates:
>   cost: 24 (complexity 3)
>   reg_cost: 9
>   cand_cost: 15
>   cand_group_cost: 0 (complexity 3)
>   candidates: 1, 6, 8
>    group:0 --> iv_cand:6, cost=(0,1)
>    group:1 --> iv_cand:1, cost=(0,0)
>    group:2 --> iv_cand:8, cost=(0,1)
>    group:3 --> iv_cand:8, cost=(0,1)
>   invariant variables: 6
>   invariant expressions: 1, 2
> 
> <Invariant Expressions>:
> inv_expr 1:     stride.3_27 * 4
> inv_expr 2:     (unsigned long) stride.3_27 * 4
> 
> These end up being used in the same group:
> 
> Group 1:
> cand  cost    compl.  inv.expr.       inv.vars
> 1     0       0       NIL;    6
> 2     0       0       NIL;    6
> 3     0       0       NIL;    6
> 
> which ends up with IV opts picking the signed and unsigned IVs:
> 
> Improved to:
>   cost: 24 (complexity 3)
>   reg_cost: 9
>   cand_cost: 15
>   cand_group_cost: 0 (complexity 3)
>   candidates: 1, 6, 8
>    group:0 --> iv_cand:6, cost=(0,1)
>    group:1 --> iv_cand:1, cost=(0,0)
>    group:2 --> iv_cand:8, cost=(0,1)
>    group:3 --> iv_cand:8, cost=(0,1)
>   invariant variables: 6
>   invariant expressions: 1, 2
> 
> and so generates the same IV as both signed and unsigned:
> 
> ;;   basic block 21, loop depth 3, count 214748368 (estimated locally, freq
> 58.2545), maybe hot
> ;;    prev block 28, next block 31, flags: (NEW, REACHABLE, VISITED)
> ;;    pred:       28 [always]  count:23622320 (estimated locally, freq 6.4080)
> (FALLTHRU,EXECUTABLE)
> ;;                25 [always]  count:191126046 (estimated locally, freq 51.8465)
> (FALLTHRU,DFS_BACK,EXECUTABLE)
>   # .MEM_66 = PHI <.MEM_34(28), .MEM_22(25)>
>   # ivtmp.22_41 = PHI <0(28), ivtmp.22_82(25)>
>   # ivtmp.26_51 = PHI <ivtmp.26_55(28), ivtmp.26_72(25)>
>   # ivtmp.28_90 = PHI <ivtmp.28_99(28), ivtmp.28_98(25)>
> 
> ...
> 
> ;;   basic block 24, loop depth 3, count 214748366 (estimated locally, freq
> 58.2545), maybe hot
> ;;    prev block 22, next block 25, flags: (NEW, REACHABLE, VISITED)'
> ;;    pred:       22 [always]  count:95443719 (estimated locally, freq 25.8909)
> (FALLTHRU)
> ;;                21 [33.3% (guessed)]  count:71582790 (estimated locally, freq 19.4182)
> (TRUE_VALUE,EXECUTABLE)
> ;;                31 [33.3% (guessed)]  count:47721860 (estimated locally, freq 12.9455)
> (TRUE_VALUE,EXECUTABLE)
> # .MEM_22 = PHI <.MEM_44(22), .MEM_31(21), .MEM_79(31)>
> ivtmp.22_82 = ivtmp.22_41 + 1;
> ivtmp.26_72 = ivtmp.26_51 + _80;
> ivtmp.28_98 = ivtmp.28_90 + _39;
> 
> These two IVs are always used as unsigned, so IV ops generates:
> 
>   _73 = stride.3_27 * 4;
>   _80 = (unsigned long) _73;
>   _54 = (unsigned long) stride.3_27;
>   _39 = _54 * 4;
> 
> Which means that in e.g. exchange2 we generate a lot of duplicate code.
> 
> This is because candidate 6 and 8 are equivalent under two's complement but have
> different signs.
> 
> This patch changes it so that if you have two IVs that are affine equivalent to
> just pick one over the other.  IV already has code for this, so the patch just
> uses affine trees instead of tree for the check.
> 
> With it we get:
> 
> <Invariant Expressions>:
> inv_expr 1:     stride.3_27 * 4
> 
> <Group-candidate Costs>:
> Group 0:
>   cand  cost    compl.  inv.expr.       inv.vars
>   5     0       2       NIL;    NIL;
>   6     0       3       NIL;    NIL;
> 
> Group 1:
>   cand  cost    compl.  inv.expr.       inv.vars
>   1     0       0       NIL;    6
>   2     0       0       NIL;    6
>   3     0       0       NIL;    6
>   4     0       0       NIL;    6
> 
> Initial set of candidates:
>   cost: 16 (complexity 3)
>   reg_cost: 6
>   cand_cost: 10
>   cand_group_cost: 0 (complexity 3)
>   candidates: 1, 6
>    group:0 --> iv_cand:6, cost=(0,3)
>    group:1 --> iv_cand:1, cost=(0,0)
>   invariant variables: 6
>   invariant expressions: 1
> 
> The two patches together results in a 10% performance increase in exchange2 in
> SPECCPU 2017 and a 4% reduction in binary size and a 5% improvement in
> compile
> time. There's also a 5% performance improvement in fotonik3d and similar
> reduction in binary size.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu,
> arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> -m32, -m64 and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	PR tree-optimization/114932
> 	* fold-const.cc (operand_compare::operand_equal_p): Use it.
> 	(operand_compare::verify_hash_value): Likewise.
> 	(test_operand_equality::test): New.
> 	(fold_const_cc_tests): Use it.
> 	* tree-core.h (enum operand_equal_flag): Add OEP_STRUCTURAL_EQ.
> 	* tree-ssa-loop-ivopts.cc (record_group_use): Check for structural eq.
> 
> gcc/testsuite/ChangeLog:
> 
> 	PR tree-optimization/114932
> 	* gfortran.dg/addressing-modes_2.f90: New test.
> 
> ---
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index
> 71e82b1d76d4106c7c23c54af8b35905a1af9f1c..52a4465de0355220befa6d6b
> 3301e21e038e3e41 100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -3208,7 +3208,19 @@ operand_compare::operand_equal_p (tree type0,
> const_tree arg0,
>        return tree_int_cst_equal (arg0, arg1);
>      }
> 
> -  if (!(flags & OEP_ADDRESS_OF))
> +  if ((flags & OEP_STRUCTURAL_EQ)
> +      && (CONVERT_EXPR_P (arg0) || CONVERT_EXPR_P (arg1)))
> +    {
> +      const_tree t_arg0 = arg0;
> +      const_tree t_arg1 = arg1;
> +      STRIP_NOPS (arg0);
> +      STRIP_NOPS (arg1);
> +      /* Only recurse if the conversion was one that was valid to strip.  */
> +      if (t_arg0 != arg0 || t_arg1 != arg1)
> +	return operand_equal_p (type0, arg0, type1, arg1, flags);
> +    }
> +
> +  if (!(flags & (OEP_ADDRESS_OF | OEP_STRUCTURAL_EQ)))
>      {
>        /* If both types don't have the same signedness, then we can't consider
>  	 them equal.  We must check this before the STRIP_NOPS calls
> @@ -3439,6 +3451,22 @@ operand_compare::operand_equal_p (tree type0,
> const_tree arg0,
>    if (flags & (OEP_ONLY_CONST | OEP_BITWISE))
>      return false;
> 
> +  /* Check if we are checking an operation where the two's compliment bitwise
> +     representation of the result is not the same between signed and
> +     unsigned arithmetic.  */
> +  switch (TREE_CODE (arg0))
> +  {
> +  case PLUS_EXPR:
> +  case MINUS_EXPR:
> +  case MULT_EXPR:
> +    break;
> +
> +  default:
> +    /* Clear the structural comparison bit.  */
> +    flags &= ~OEP_STRUCTURAL_EQ;
> +    break;
> +  }
> +
>  /* Define macros to test an operand from arg0 and arg1 for equality and a
>     variant that allows null and views null as being different from any
>     non-null value.  In the latter case, if either is null, the both
> @@ -4218,7 +4246,7 @@ operand_compare::verify_hash_value (const_tree
> arg0, const_tree arg1,
>      {
>        if (operand_equal_p (arg0, arg1, flags | OEP_NO_HASH_CHECK))
>  	{
> -	  if (arg0 != arg1 && !(flags & OEP_DECL_NAME))
> +	  if (arg0 != arg1 && !(flags & (OEP_DECL_NAME |
> OEP_STRUCTURAL_EQ)))
>  	    {
>  	      inchash::hash hstate0 (0), hstate1 (0);
>  	      hash_operand (arg0, hstate0, flags | OEP_HASH_CHECK);
> @@ -17357,6 +17385,95 @@ test_arithmetic_folding ()
>  				   x);
>  }
> 
> +namespace test_operand_equality {
> +
> +/* Verify structural equality.  */
> +
> +/* Execute fold_vec_perm_cst unit tests.  */
> +
> +static void
> +test ()
> +{
> +  tree stype = integer_type_node;
> +  tree utype = unsigned_type_node;
> +  tree x = create_tmp_var_raw (stype, "x");
> +  tree y = create_tmp_var_raw (stype, "y");
> +  tree z = create_tmp_var_raw (stype, "z");
> +  tree four = build_int_cst (stype, 4);
> +  tree lhs1 = fold_build2 (PLUS_EXPR, stype, x, y);
> +  tree rhs1 = fold_convert (stype,
> +		fold_build2 (PLUS_EXPR, utype,
> +			     fold_convert (utype, x),
> +			     fold_convert (utype, y)));
> +
> +  /* (int)((unsigned x) + (unsigned y)) == x + y.  */
> +  ASSERT_TRUE (operand_equal_p (lhs1, rhs1, OEP_STRUCTURAL_EQ));
> +  ASSERT_FALSE (operand_equal_p (lhs1, rhs1, 0));
> +
> +  /* (int)(unsigned) x == x.  */
> +  tree lhs2 = build1 (NOP_EXPR, stype,
> +		build1 (NOP_EXPR, utype, x));
> +  tree rhs2 = x;
> +  ASSERT_TRUE (operand_equal_p (lhs2, rhs2, OEP_STRUCTURAL_EQ));
> +  ASSERT_TRUE (operand_equal_p (lhs2, rhs2, 0));
> +
> +  /* (unsigned x) + (unsigned y) == x + y.  */
> +  tree lhs3 = lhs1;
> +  tree rhs3 = fold_build2 (PLUS_EXPR, utype,
> +			   fold_convert (utype, x),
> +			   fold_convert (utype, y));
> +  ASSERT_TRUE (operand_equal_p (lhs3, rhs3, OEP_STRUCTURAL_EQ));
> +  ASSERT_FALSE (operand_equal_p (lhs3, rhs3, 0));
> +
> +  /* (unsigned x) / (unsigned y) == x / y.  */
> +  tree lhs4 = fold_build2 (TRUNC_DIV_EXPR, stype, x, y);;
> +  tree rhs4 = fold_build2 (TRUNC_DIV_EXPR, utype,
> +			   fold_convert (utype, x),
> +			   fold_convert (utype, y));
> +  ASSERT_FALSE (operand_equal_p (lhs4, rhs4, OEP_STRUCTURAL_EQ));
> +  ASSERT_FALSE (operand_equal_p (lhs4, rhs4, 0));
> +
> +  /* (unsigned x) / 4 == x / 4.  */
> +  tree lhs5 = fold_build2 (TRUNC_DIV_EXPR, stype, x, four);;
> +  tree rhs5 = fold_build2 (TRUNC_DIV_EXPR, utype,
> +			   fold_convert (utype, x),
> +			   fold_convert (utype, four));
> +  ASSERT_FALSE (operand_equal_p (lhs5, rhs5, OEP_STRUCTURAL_EQ));
> +  ASSERT_FALSE (operand_equal_p (lhs5, rhs5, 0));
> +
> +  /* (unsigned x) + 4 == x + 4.  */
> +  tree lhs6 = fold_build2 (PLUS_EXPR, stype, x, four);
> +  tree rhs6 = fold_build2 (PLUS_EXPR, utype,
> +			   fold_convert (utype, x),
> +			   fold_convert (utype, four));
> +  ASSERT_TRUE (operand_equal_p (lhs6, rhs6, OEP_STRUCTURAL_EQ));
> +  ASSERT_FALSE (operand_equal_p (lhs6, rhs6, 0));
> +
> +  /* (unsigned x) + 4 == 4 + x.  */
> +  tree lhs7 = fold_build2 (PLUS_EXPR, stype, four, x);
> +  tree rhs7 = fold_build2 (PLUS_EXPR, utype,
> +			   fold_convert (utype, x),
> +			   fold_convert (utype, four));
> +  ASSERT_TRUE (operand_equal_p (lhs7, rhs7, OEP_STRUCTURAL_EQ));
> +  ASSERT_FALSE (operand_equal_p (lhs7, rhs7, 0));
> +
> +  /* ((unsigned x) + 4) * (unsigned y)) + z == ((4 + x) * y) + z.  */
> +  tree lhs8 = fold_build2 (PLUS_EXPR, stype,
> +			   fold_build2 (MULT_EXPR, stype,
> +					fold_build2 (PLUS_EXPR, stype, four, x),
> +					y),
> +			   z);
> +  tree rhs8 = fold_build2 (MULT_EXPR, utype,
> +			   fold_build2 (PLUS_EXPR, utype,
> +					fold_convert (utype, x),
> +					fold_convert (utype, four)),
> +			   fold_convert (utype, y));
> +  rhs8 = fold_build2 (PLUS_EXPR, stype, fold_convert (stype, rhs8), z);
> +  ASSERT_TRUE (operand_equal_p (lhs8, rhs8, OEP_STRUCTURAL_EQ));
> +  ASSERT_FALSE (operand_equal_p (lhs8, rhs8, 0));
> +}
> +}
> +
>  namespace test_fold_vec_perm_cst {
> 
>  /* Build a VECTOR_CST corresponding to VMODE, and has
> @@ -18150,6 +18267,7 @@ fold_const_cc_tests ()
>    test_vector_folding ();
>    test_vec_duplicate_folding ();
>    test_fold_vec_perm_cst::test ();
> +  test_operand_equality::test ();
>  }
> 
>  } // namespace selftest
> diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> new file mode 100644
> index
> 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c
> 2e24a4973c8539fae
> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> @@ -0,0 +1,20 @@
> +! { dg-do compile { target aarch64-*-* } }
> +! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" }
> +
> +module a
> +integer, parameter :: b=3, c=b
> +contains
> +subroutine d(block)
> +integer f, col   , block(:, :, :), e
> +do f = 1, c
> +   do col = 1, c
> +             block(:f,                          :, e()) = do
> +     end do
> +  end do
> +  end
> +  end
> +
> +! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 3 IVs:} ivopts }
> }
> +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 2 IVs:} 1
> ivopts } }
> +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 1 IVs:} 1
> ivopts } }
> +
> diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> index
> 27c569c77022227643151fa4a909a95f3d45bf55..fe398d99bcaa9e8fcec45e5e0
> 7d133bc393aa4a7 100644
> --- a/gcc/tree-core.h
> +++ b/gcc/tree-core.h
> @@ -962,7 +962,9 @@ enum operand_equal_flag {
>    /* In conjunction with OEP_LEXICOGRAPHIC considers names of declarations
>       of the same kind.  Used to compare VLA bounds involving parameters
>       across redeclarations of the same function.  */
> -  OEP_DECL_NAME = 512
> +  OEP_DECL_NAME = 512,
> +  /* Check for structural equivalence and ignore NOP_CONVERT casts.  */
> +  OEP_STRUCTURAL_EQ = 1024
>  };
> 
>  /* Enum and arrays used for tree allocation stats.
> diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> index
> c3218a3e8eedbb8d0a7f14c01eeb069cb6024c29..b05e4ac1e63b5c110707a8a3
> ed5e614b7ccc702f 100644
> --- a/gcc/tree-ssa-loop-ivopts.cc
> +++ b/gcc/tree-ssa-loop-ivopts.cc
> @@ -1623,8 +1623,8 @@ record_group_use (struct ivopts_data *data, tree
> *use_p,
> 
>  	  /* Check if it has the same stripped base and step.  */
>  	  if (operand_equal_p (iv->base_object, use->iv->base_object, 0)
> -	      && operand_equal_p (iv->step, use->iv->step, 0)
> -	      && operand_equal_p (addr_base, use->addr_base, 0))
> +	      && operand_equal_p (iv->step, use->iv->step, OEP_STRUCTURAL_EQ)
> +	      && operand_equal_p (addr_base, use->addr_base,
> OEP_STRUCTURAL_EQ))
>  	    break;
>  	}
>        if (i == data->vgroups.length ())
> 
> 
> 
> 
> --
Tamar Christina Sept. 23, 2024, 7:41 a.m. UTC | #2
ping

> -----Original Message-----
> From: Tamar Christina
> Sent: Tuesday, September 10, 2024 8:57 PM
> To: Tamar Christina <tamar.christina@arm.com>; gcc-patches@gcc.gnu.org
> Cc: nd <nd@arm.com>; rguenther@suse.de; jlaw@ventanamicro.com
> Subject: RE: [PATCH 2/2]middle-end: use two's complement equality when
> comparing IVs during candidate selection [PR114932]
> 
> ping
> 
> > -----Original Message-----
> > From: Tamar Christina <tamar.christina@arm.com>
> > Sent: Tuesday, August 20, 2024 2:06 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: nd <nd@arm.com>; rguenther@suse.de; jlaw@ventanamicro.com
> > Subject: [PATCH 2/2]middle-end: use two's complement equality when
> comparing
> > IVs during candidate selection [PR114932]
> >
> > Hi All,
> >
> > IVOPTS normally uses affine trees to perform comparisons between different IVs,
> > but these seem to have been missing in two key spots and instead normal tree
> > equivalencies used.
> >
> > In some cases where we have a two-complements equivalence but not a strict
> > signedness equivalencies we end up generating both a signed and unsigned IV for
> > the same candidate.
> >
> > This patch implements a new OEP flag called OEP_STRUCTURAL_EQ.  This flag will
> > check if the operands would produce the same bit values after the computations
> > even if the final sign is different.
> >
> > This happens quite a lot with fortran but can also happen in C because this came
> > code is unable to figure out when one expression is a multiple of another.
> >
> > As an example in the attached testcase we get:
> >
> > Initial set of candidates:
> >   cost: 24 (complexity 3)
> >   reg_cost: 9
> >   cand_cost: 15
> >   cand_group_cost: 0 (complexity 3)
> >   candidates: 1, 6, 8
> >    group:0 --> iv_cand:6, cost=(0,1)
> >    group:1 --> iv_cand:1, cost=(0,0)
> >    group:2 --> iv_cand:8, cost=(0,1)
> >    group:3 --> iv_cand:8, cost=(0,1)
> >   invariant variables: 6
> >   invariant expressions: 1, 2
> >
> > <Invariant Expressions>:
> > inv_expr 1:     stride.3_27 * 4
> > inv_expr 2:     (unsigned long) stride.3_27 * 4
> >
> > These end up being used in the same group:
> >
> > Group 1:
> > cand  cost    compl.  inv.expr.       inv.vars
> > 1     0       0       NIL;    6
> > 2     0       0       NIL;    6
> > 3     0       0       NIL;    6
> >
> > which ends up with IV opts picking the signed and unsigned IVs:
> >
> > Improved to:
> >   cost: 24 (complexity 3)
> >   reg_cost: 9
> >   cand_cost: 15
> >   cand_group_cost: 0 (complexity 3)
> >   candidates: 1, 6, 8
> >    group:0 --> iv_cand:6, cost=(0,1)
> >    group:1 --> iv_cand:1, cost=(0,0)
> >    group:2 --> iv_cand:8, cost=(0,1)
> >    group:3 --> iv_cand:8, cost=(0,1)
> >   invariant variables: 6
> >   invariant expressions: 1, 2
> >
> > and so generates the same IV as both signed and unsigned:
> >
> > ;;   basic block 21, loop depth 3, count 214748368 (estimated locally, freq
> > 58.2545), maybe hot
> > ;;    prev block 28, next block 31, flags: (NEW, REACHABLE, VISITED)
> > ;;    pred:       28 [always]  count:23622320 (estimated locally, freq 6.4080)
> > (FALLTHRU,EXECUTABLE)
> > ;;                25 [always]  count:191126046 (estimated locally, freq 51.8465)
> > (FALLTHRU,DFS_BACK,EXECUTABLE)
> >   # .MEM_66 = PHI <.MEM_34(28), .MEM_22(25)>
> >   # ivtmp.22_41 = PHI <0(28), ivtmp.22_82(25)>
> >   # ivtmp.26_51 = PHI <ivtmp.26_55(28), ivtmp.26_72(25)>
> >   # ivtmp.28_90 = PHI <ivtmp.28_99(28), ivtmp.28_98(25)>
> >
> > ...
> >
> > ;;   basic block 24, loop depth 3, count 214748366 (estimated locally, freq
> > 58.2545), maybe hot
> > ;;    prev block 22, next block 25, flags: (NEW, REACHABLE, VISITED)'
> > ;;    pred:       22 [always]  count:95443719 (estimated locally, freq 25.8909)
> > (FALLTHRU)
> > ;;                21 [33.3% (guessed)]  count:71582790 (estimated locally, freq
> 19.4182)
> > (TRUE_VALUE,EXECUTABLE)
> > ;;                31 [33.3% (guessed)]  count:47721860 (estimated locally, freq
> 12.9455)
> > (TRUE_VALUE,EXECUTABLE)
> > # .MEM_22 = PHI <.MEM_44(22), .MEM_31(21), .MEM_79(31)>
> > ivtmp.22_82 = ivtmp.22_41 + 1;
> > ivtmp.26_72 = ivtmp.26_51 + _80;
> > ivtmp.28_98 = ivtmp.28_90 + _39;
> >
> > These two IVs are always used as unsigned, so IV ops generates:
> >
> >   _73 = stride.3_27 * 4;
> >   _80 = (unsigned long) _73;
> >   _54 = (unsigned long) stride.3_27;
> >   _39 = _54 * 4;
> >
> > Which means that in e.g. exchange2 we generate a lot of duplicate code.
> >
> > This is because candidate 6 and 8 are equivalent under two's complement but
> have
> > different signs.
> >
> > This patch changes it so that if you have two IVs that are affine equivalent to
> > just pick one over the other.  IV already has code for this, so the patch just
> > uses affine trees instead of tree for the check.
> >
> > With it we get:
> >
> > <Invariant Expressions>:
> > inv_expr 1:     stride.3_27 * 4
> >
> > <Group-candidate Costs>:
> > Group 0:
> >   cand  cost    compl.  inv.expr.       inv.vars
> >   5     0       2       NIL;    NIL;
> >   6     0       3       NIL;    NIL;
> >
> > Group 1:
> >   cand  cost    compl.  inv.expr.       inv.vars
> >   1     0       0       NIL;    6
> >   2     0       0       NIL;    6
> >   3     0       0       NIL;    6
> >   4     0       0       NIL;    6
> >
> > Initial set of candidates:
> >   cost: 16 (complexity 3)
> >   reg_cost: 6
> >   cand_cost: 10
> >   cand_group_cost: 0 (complexity 3)
> >   candidates: 1, 6
> >    group:0 --> iv_cand:6, cost=(0,3)
> >    group:1 --> iv_cand:1, cost=(0,0)
> >   invariant variables: 6
> >   invariant expressions: 1
> >
> > The two patches together results in a 10% performance increase in exchange2 in
> > SPECCPU 2017 and a 4% reduction in binary size and a 5% improvement in
> > compile
> > time. There's also a 5% performance improvement in fotonik3d and similar
> > reduction in binary size.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> > -m32, -m64 and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	PR tree-optimization/114932
> > 	* fold-const.cc (operand_compare::operand_equal_p): Use it.
> > 	(operand_compare::verify_hash_value): Likewise.
> > 	(test_operand_equality::test): New.
> > 	(fold_const_cc_tests): Use it.
> > 	* tree-core.h (enum operand_equal_flag): Add OEP_STRUCTURAL_EQ.
> > 	* tree-ssa-loop-ivopts.cc (record_group_use): Check for structural eq.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 	PR tree-optimization/114932
> > 	* gfortran.dg/addressing-modes_2.f90: New test.
> >
> > ---
> > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> > index
> >
> 71e82b1d76d4106c7c23c54af8b35905a1af9f1c..52a4465de0355220befa6d6b
> > 3301e21e038e3e41 100644
> > --- a/gcc/fold-const.cc
> > +++ b/gcc/fold-const.cc
> > @@ -3208,7 +3208,19 @@ operand_compare::operand_equal_p (tree type0,
> > const_tree arg0,
> >        return tree_int_cst_equal (arg0, arg1);
> >      }
> >
> > -  if (!(flags & OEP_ADDRESS_OF))
> > +  if ((flags & OEP_STRUCTURAL_EQ)
> > +      && (CONVERT_EXPR_P (arg0) || CONVERT_EXPR_P (arg1)))
> > +    {
> > +      const_tree t_arg0 = arg0;
> > +      const_tree t_arg1 = arg1;
> > +      STRIP_NOPS (arg0);
> > +      STRIP_NOPS (arg1);
> > +      /* Only recurse if the conversion was one that was valid to strip.  */
> > +      if (t_arg0 != arg0 || t_arg1 != arg1)
> > +	return operand_equal_p (type0, arg0, type1, arg1, flags);
> > +    }
> > +
> > +  if (!(flags & (OEP_ADDRESS_OF | OEP_STRUCTURAL_EQ)))
> >      {
> >        /* If both types don't have the same signedness, then we can't consider
> >  	 them equal.  We must check this before the STRIP_NOPS calls
> > @@ -3439,6 +3451,22 @@ operand_compare::operand_equal_p (tree type0,
> > const_tree arg0,
> >    if (flags & (OEP_ONLY_CONST | OEP_BITWISE))
> >      return false;
> >
> > +  /* Check if we are checking an operation where the two's compliment bitwise
> > +     representation of the result is not the same between signed and
> > +     unsigned arithmetic.  */
> > +  switch (TREE_CODE (arg0))
> > +  {
> > +  case PLUS_EXPR:
> > +  case MINUS_EXPR:
> > +  case MULT_EXPR:
> > +    break;
> > +
> > +  default:
> > +    /* Clear the structural comparison bit.  */
> > +    flags &= ~OEP_STRUCTURAL_EQ;
> > +    break;
> > +  }
> > +
> >  /* Define macros to test an operand from arg0 and arg1 for equality and a
> >     variant that allows null and views null as being different from any
> >     non-null value.  In the latter case, if either is null, the both
> > @@ -4218,7 +4246,7 @@ operand_compare::verify_hash_value (const_tree
> > arg0, const_tree arg1,
> >      {
> >        if (operand_equal_p (arg0, arg1, flags | OEP_NO_HASH_CHECK))
> >  	{
> > -	  if (arg0 != arg1 && !(flags & OEP_DECL_NAME))
> > +	  if (arg0 != arg1 && !(flags & (OEP_DECL_NAME |
> > OEP_STRUCTURAL_EQ)))
> >  	    {
> >  	      inchash::hash hstate0 (0), hstate1 (0);
> >  	      hash_operand (arg0, hstate0, flags | OEP_HASH_CHECK);
> > @@ -17357,6 +17385,95 @@ test_arithmetic_folding ()
> >  				   x);
> >  }
> >
> > +namespace test_operand_equality {
> > +
> > +/* Verify structural equality.  */
> > +
> > +/* Execute fold_vec_perm_cst unit tests.  */
> > +
> > +static void
> > +test ()
> > +{
> > +  tree stype = integer_type_node;
> > +  tree utype = unsigned_type_node;
> > +  tree x = create_tmp_var_raw (stype, "x");
> > +  tree y = create_tmp_var_raw (stype, "y");
> > +  tree z = create_tmp_var_raw (stype, "z");
> > +  tree four = build_int_cst (stype, 4);
> > +  tree lhs1 = fold_build2 (PLUS_EXPR, stype, x, y);
> > +  tree rhs1 = fold_convert (stype,
> > +		fold_build2 (PLUS_EXPR, utype,
> > +			     fold_convert (utype, x),
> > +			     fold_convert (utype, y)));
> > +
> > +  /* (int)((unsigned x) + (unsigned y)) == x + y.  */
> > +  ASSERT_TRUE (operand_equal_p (lhs1, rhs1, OEP_STRUCTURAL_EQ));
> > +  ASSERT_FALSE (operand_equal_p (lhs1, rhs1, 0));
> > +
> > +  /* (int)(unsigned) x == x.  */
> > +  tree lhs2 = build1 (NOP_EXPR, stype,
> > +		build1 (NOP_EXPR, utype, x));
> > +  tree rhs2 = x;
> > +  ASSERT_TRUE (operand_equal_p (lhs2, rhs2, OEP_STRUCTURAL_EQ));
> > +  ASSERT_TRUE (operand_equal_p (lhs2, rhs2, 0));
> > +
> > +  /* (unsigned x) + (unsigned y) == x + y.  */
> > +  tree lhs3 = lhs1;
> > +  tree rhs3 = fold_build2 (PLUS_EXPR, utype,
> > +			   fold_convert (utype, x),
> > +			   fold_convert (utype, y));
> > +  ASSERT_TRUE (operand_equal_p (lhs3, rhs3, OEP_STRUCTURAL_EQ));
> > +  ASSERT_FALSE (operand_equal_p (lhs3, rhs3, 0));
> > +
> > +  /* (unsigned x) / (unsigned y) == x / y.  */
> > +  tree lhs4 = fold_build2 (TRUNC_DIV_EXPR, stype, x, y);;
> > +  tree rhs4 = fold_build2 (TRUNC_DIV_EXPR, utype,
> > +			   fold_convert (utype, x),
> > +			   fold_convert (utype, y));
> > +  ASSERT_FALSE (operand_equal_p (lhs4, rhs4, OEP_STRUCTURAL_EQ));
> > +  ASSERT_FALSE (operand_equal_p (lhs4, rhs4, 0));
> > +
> > +  /* (unsigned x) / 4 == x / 4.  */
> > +  tree lhs5 = fold_build2 (TRUNC_DIV_EXPR, stype, x, four);;
> > +  tree rhs5 = fold_build2 (TRUNC_DIV_EXPR, utype,
> > +			   fold_convert (utype, x),
> > +			   fold_convert (utype, four));
> > +  ASSERT_FALSE (operand_equal_p (lhs5, rhs5, OEP_STRUCTURAL_EQ));
> > +  ASSERT_FALSE (operand_equal_p (lhs5, rhs5, 0));
> > +
> > +  /* (unsigned x) + 4 == x + 4.  */
> > +  tree lhs6 = fold_build2 (PLUS_EXPR, stype, x, four);
> > +  tree rhs6 = fold_build2 (PLUS_EXPR, utype,
> > +			   fold_convert (utype, x),
> > +			   fold_convert (utype, four));
> > +  ASSERT_TRUE (operand_equal_p (lhs6, rhs6, OEP_STRUCTURAL_EQ));
> > +  ASSERT_FALSE (operand_equal_p (lhs6, rhs6, 0));
> > +
> > +  /* (unsigned x) + 4 == 4 + x.  */
> > +  tree lhs7 = fold_build2 (PLUS_EXPR, stype, four, x);
> > +  tree rhs7 = fold_build2 (PLUS_EXPR, utype,
> > +			   fold_convert (utype, x),
> > +			   fold_convert (utype, four));
> > +  ASSERT_TRUE (operand_equal_p (lhs7, rhs7, OEP_STRUCTURAL_EQ));
> > +  ASSERT_FALSE (operand_equal_p (lhs7, rhs7, 0));
> > +
> > +  /* ((unsigned x) + 4) * (unsigned y)) + z == ((4 + x) * y) + z.  */
> > +  tree lhs8 = fold_build2 (PLUS_EXPR, stype,
> > +			   fold_build2 (MULT_EXPR, stype,
> > +					fold_build2 (PLUS_EXPR, stype, four, x),
> > +					y),
> > +			   z);
> > +  tree rhs8 = fold_build2 (MULT_EXPR, utype,
> > +			   fold_build2 (PLUS_EXPR, utype,
> > +					fold_convert (utype, x),
> > +					fold_convert (utype, four)),
> > +			   fold_convert (utype, y));
> > +  rhs8 = fold_build2 (PLUS_EXPR, stype, fold_convert (stype, rhs8), z);
> > +  ASSERT_TRUE (operand_equal_p (lhs8, rhs8, OEP_STRUCTURAL_EQ));
> > +  ASSERT_FALSE (operand_equal_p (lhs8, rhs8, 0));
> > +}
> > +}
> > +
> >  namespace test_fold_vec_perm_cst {
> >
> >  /* Build a VECTOR_CST corresponding to VMODE, and has
> > @@ -18150,6 +18267,7 @@ fold_const_cc_tests ()
> >    test_vector_folding ();
> >    test_vec_duplicate_folding ();
> >    test_fold_vec_perm_cst::test ();
> > +  test_operand_equality::test ();
> >  }
> >
> >  } // namespace selftest
> > diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> > b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> > new file mode 100644
> > index
> >
> 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c
> > 2e24a4973c8539fae
> > --- /dev/null
> > +++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> > @@ -0,0 +1,20 @@
> > +! { dg-do compile { target aarch64-*-* } }
> > +! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" }
> > +
> > +module a
> > +integer, parameter :: b=3, c=b
> > +contains
> > +subroutine d(block)
> > +integer f, col   , block(:, :, :), e
> > +do f = 1, c
> > +   do col = 1, c
> > +             block(:f,                          :, e()) = do
> > +     end do
> > +  end do
> > +  end
> > +  end
> > +
> > +! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 3 IVs:} ivopts
> }
> > }
> > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 2 IVs:} 1
> > ivopts } }
> > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 1 IVs:} 1
> > ivopts } }
> > +
> > diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> > index
> >
> 27c569c77022227643151fa4a909a95f3d45bf55..fe398d99bcaa9e8fcec45e5e0
> > 7d133bc393aa4a7 100644
> > --- a/gcc/tree-core.h
> > +++ b/gcc/tree-core.h
> > @@ -962,7 +962,9 @@ enum operand_equal_flag {
> >    /* In conjunction with OEP_LEXICOGRAPHIC considers names of declarations
> >       of the same kind.  Used to compare VLA bounds involving parameters
> >       across redeclarations of the same function.  */
> > -  OEP_DECL_NAME = 512
> > +  OEP_DECL_NAME = 512,
> > +  /* Check for structural equivalence and ignore NOP_CONVERT casts.  */
> > +  OEP_STRUCTURAL_EQ = 1024
> >  };
> >
> >  /* Enum and arrays used for tree allocation stats.
> > diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> > index
> >
> c3218a3e8eedbb8d0a7f14c01eeb069cb6024c29..b05e4ac1e63b5c110707a8a3
> > ed5e614b7ccc702f 100644
> > --- a/gcc/tree-ssa-loop-ivopts.cc
> > +++ b/gcc/tree-ssa-loop-ivopts.cc
> > @@ -1623,8 +1623,8 @@ record_group_use (struct ivopts_data *data, tree
> > *use_p,
> >
> >  	  /* Check if it has the same stripped base and step.  */
> >  	  if (operand_equal_p (iv->base_object, use->iv->base_object, 0)
> > -	      && operand_equal_p (iv->step, use->iv->step, 0)
> > -	      && operand_equal_p (addr_base, use->addr_base, 0))
> > +	      && operand_equal_p (iv->step, use->iv->step, OEP_STRUCTURAL_EQ)
> > +	      && operand_equal_p (addr_base, use->addr_base,
> > OEP_STRUCTURAL_EQ))
> >  	    break;
> >  	}
> >        if (i == data->vgroups.length ())
> >
> >
> >
> >
> > --
Tamar Christina Oct. 14, 2024, 3:08 p.m. UTC | #3
ping

> -----Original Message-----
> From: Tamar Christina <Tamar.Christina@arm.com>
> Sent: Monday, September 23, 2024 8:41 AM
> To: gcc-patches@gcc.gnu.org
> Cc: nd <nd@arm.com>; rguenther@suse.de; jlaw@ventanamicro.com
> Subject: RE: [PATCH 2/2]middle-end: use two's complement equality when
> comparing IVs during candidate selection [PR114932]
> 
> ping
> 
> > -----Original Message-----
> > From: Tamar Christina
> > Sent: Tuesday, September 10, 2024 8:57 PM
> > To: Tamar Christina <tamar.christina@arm.com>; gcc-patches@gcc.gnu.org
> > Cc: nd <nd@arm.com>; rguenther@suse.de; jlaw@ventanamicro.com
> > Subject: RE: [PATCH 2/2]middle-end: use two's complement equality when
> > comparing IVs during candidate selection [PR114932]
> >
> > ping
> >
> > > -----Original Message-----
> > > From: Tamar Christina <tamar.christina@arm.com>
> > > Sent: Tuesday, August 20, 2024 2:06 PM
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: nd <nd@arm.com>; rguenther@suse.de; jlaw@ventanamicro.com
> > > Subject: [PATCH 2/2]middle-end: use two's complement equality when
> > comparing
> > > IVs during candidate selection [PR114932]
> > >
> > > Hi All,
> > >
> > > IVOPTS normally uses affine trees to perform comparisons between different
> IVs,
> > > but these seem to have been missing in two key spots and instead normal tree
> > > equivalencies used.
> > >
> > > In some cases where we have a two-complements equivalence but not a strict
> > > signedness equivalencies we end up generating both a signed and unsigned IV
> for
> > > the same candidate.
> > >
> > > This patch implements a new OEP flag called OEP_STRUCTURAL_EQ.  This flag
> will
> > > check if the operands would produce the same bit values after the
> computations
> > > even if the final sign is different.
> > >
> > > This happens quite a lot with fortran but can also happen in C because this
> came
> > > code is unable to figure out when one expression is a multiple of another.
> > >
> > > As an example in the attached testcase we get:
> > >
> > > Initial set of candidates:
> > >   cost: 24 (complexity 3)
> > >   reg_cost: 9
> > >   cand_cost: 15
> > >   cand_group_cost: 0 (complexity 3)
> > >   candidates: 1, 6, 8
> > >    group:0 --> iv_cand:6, cost=(0,1)
> > >    group:1 --> iv_cand:1, cost=(0,0)
> > >    group:2 --> iv_cand:8, cost=(0,1)
> > >    group:3 --> iv_cand:8, cost=(0,1)
> > >   invariant variables: 6
> > >   invariant expressions: 1, 2
> > >
> > > <Invariant Expressions>:
> > > inv_expr 1:     stride.3_27 * 4
> > > inv_expr 2:     (unsigned long) stride.3_27 * 4
> > >
> > > These end up being used in the same group:
> > >
> > > Group 1:
> > > cand  cost    compl.  inv.expr.       inv.vars
> > > 1     0       0       NIL;    6
> > > 2     0       0       NIL;    6
> > > 3     0       0       NIL;    6
> > >
> > > which ends up with IV opts picking the signed and unsigned IVs:
> > >
> > > Improved to:
> > >   cost: 24 (complexity 3)
> > >   reg_cost: 9
> > >   cand_cost: 15
> > >   cand_group_cost: 0 (complexity 3)
> > >   candidates: 1, 6, 8
> > >    group:0 --> iv_cand:6, cost=(0,1)
> > >    group:1 --> iv_cand:1, cost=(0,0)
> > >    group:2 --> iv_cand:8, cost=(0,1)
> > >    group:3 --> iv_cand:8, cost=(0,1)
> > >   invariant variables: 6
> > >   invariant expressions: 1, 2
> > >
> > > and so generates the same IV as both signed and unsigned:
> > >
> > > ;;   basic block 21, loop depth 3, count 214748368 (estimated locally, freq
> > > 58.2545), maybe hot
> > > ;;    prev block 28, next block 31, flags: (NEW, REACHABLE, VISITED)
> > > ;;    pred:       28 [always]  count:23622320 (estimated locally, freq 6.4080)
> > > (FALLTHRU,EXECUTABLE)
> > > ;;                25 [always]  count:191126046 (estimated locally, freq 51.8465)
> > > (FALLTHRU,DFS_BACK,EXECUTABLE)
> > >   # .MEM_66 = PHI <.MEM_34(28), .MEM_22(25)>
> > >   # ivtmp.22_41 = PHI <0(28), ivtmp.22_82(25)>
> > >   # ivtmp.26_51 = PHI <ivtmp.26_55(28), ivtmp.26_72(25)>
> > >   # ivtmp.28_90 = PHI <ivtmp.28_99(28), ivtmp.28_98(25)>
> > >
> > > ...
> > >
> > > ;;   basic block 24, loop depth 3, count 214748366 (estimated locally, freq
> > > 58.2545), maybe hot
> > > ;;    prev block 22, next block 25, flags: (NEW, REACHABLE, VISITED)'
> > > ;;    pred:       22 [always]  count:95443719 (estimated locally, freq 25.8909)
> > > (FALLTHRU)
> > > ;;                21 [33.3% (guessed)]  count:71582790 (estimated locally, freq
> > 19.4182)
> > > (TRUE_VALUE,EXECUTABLE)
> > > ;;                31 [33.3% (guessed)]  count:47721860 (estimated locally, freq
> > 12.9455)
> > > (TRUE_VALUE,EXECUTABLE)
> > > # .MEM_22 = PHI <.MEM_44(22), .MEM_31(21), .MEM_79(31)>
> > > ivtmp.22_82 = ivtmp.22_41 + 1;
> > > ivtmp.26_72 = ivtmp.26_51 + _80;
> > > ivtmp.28_98 = ivtmp.28_90 + _39;
> > >
> > > These two IVs are always used as unsigned, so IV ops generates:
> > >
> > >   _73 = stride.3_27 * 4;
> > >   _80 = (unsigned long) _73;
> > >   _54 = (unsigned long) stride.3_27;
> > >   _39 = _54 * 4;
> > >
> > > Which means that in e.g. exchange2 we generate a lot of duplicate code.
> > >
> > > This is because candidate 6 and 8 are equivalent under two's complement but
> > have
> > > different signs.
> > >
> > > This patch changes it so that if you have two IVs that are affine equivalent to
> > > just pick one over the other.  IV already has code for this, so the patch just
> > > uses affine trees instead of tree for the check.
> > >
> > > With it we get:
> > >
> > > <Invariant Expressions>:
> > > inv_expr 1:     stride.3_27 * 4
> > >
> > > <Group-candidate Costs>:
> > > Group 0:
> > >   cand  cost    compl.  inv.expr.       inv.vars
> > >   5     0       2       NIL;    NIL;
> > >   6     0       3       NIL;    NIL;
> > >
> > > Group 1:
> > >   cand  cost    compl.  inv.expr.       inv.vars
> > >   1     0       0       NIL;    6
> > >   2     0       0       NIL;    6
> > >   3     0       0       NIL;    6
> > >   4     0       0       NIL;    6
> > >
> > > Initial set of candidates:
> > >   cost: 16 (complexity 3)
> > >   reg_cost: 6
> > >   cand_cost: 10
> > >   cand_group_cost: 0 (complexity 3)
> > >   candidates: 1, 6
> > >    group:0 --> iv_cand:6, cost=(0,3)
> > >    group:1 --> iv_cand:1, cost=(0,0)
> > >   invariant variables: 6
> > >   invariant expressions: 1
> > >
> > > The two patches together results in a 10% performance increase in exchange2
> in
> > > SPECCPU 2017 and a 4% reduction in binary size and a 5% improvement in
> > > compile
> > > time. There's also a 5% performance improvement in fotonik3d and similar
> > > reduction in binary size.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> > > -m32, -m64 and no issues.
> > >
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > > 	PR tree-optimization/114932
> > > 	* fold-const.cc (operand_compare::operand_equal_p): Use it.
> > > 	(operand_compare::verify_hash_value): Likewise.
> > > 	(test_operand_equality::test): New.
> > > 	(fold_const_cc_tests): Use it.
> > > 	* tree-core.h (enum operand_equal_flag): Add OEP_STRUCTURAL_EQ.
> > > 	* tree-ssa-loop-ivopts.cc (record_group_use): Check for structural eq.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > 	PR tree-optimization/114932
> > > 	* gfortran.dg/addressing-modes_2.f90: New test.
> > >
> > > ---
> > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> > > index
> > >
> >
> 71e82b1d76d4106c7c23c54af8b35905a1af9f1c..52a4465de0355220befa6d6b
> > > 3301e21e038e3e41 100644
> > > --- a/gcc/fold-const.cc
> > > +++ b/gcc/fold-const.cc
> > > @@ -3208,7 +3208,19 @@ operand_compare::operand_equal_p (tree type0,
> > > const_tree arg0,
> > >        return tree_int_cst_equal (arg0, arg1);
> > >      }
> > >
> > > -  if (!(flags & OEP_ADDRESS_OF))
> > > +  if ((flags & OEP_STRUCTURAL_EQ)
> > > +      && (CONVERT_EXPR_P (arg0) || CONVERT_EXPR_P (arg1)))
> > > +    {
> > > +      const_tree t_arg0 = arg0;
> > > +      const_tree t_arg1 = arg1;
> > > +      STRIP_NOPS (arg0);
> > > +      STRIP_NOPS (arg1);
> > > +      /* Only recurse if the conversion was one that was valid to strip.  */
> > > +      if (t_arg0 != arg0 || t_arg1 != arg1)
> > > +	return operand_equal_p (type0, arg0, type1, arg1, flags);
> > > +    }
> > > +
> > > +  if (!(flags & (OEP_ADDRESS_OF | OEP_STRUCTURAL_EQ)))
> > >      {
> > >        /* If both types don't have the same signedness, then we can't consider
> > >  	 them equal.  We must check this before the STRIP_NOPS calls
> > > @@ -3439,6 +3451,22 @@ operand_compare::operand_equal_p (tree type0,
> > > const_tree arg0,
> > >    if (flags & (OEP_ONLY_CONST | OEP_BITWISE))
> > >      return false;
> > >
> > > +  /* Check if we are checking an operation where the two's compliment bitwise
> > > +     representation of the result is not the same between signed and
> > > +     unsigned arithmetic.  */
> > > +  switch (TREE_CODE (arg0))
> > > +  {
> > > +  case PLUS_EXPR:
> > > +  case MINUS_EXPR:
> > > +  case MULT_EXPR:
> > > +    break;
> > > +
> > > +  default:
> > > +    /* Clear the structural comparison bit.  */
> > > +    flags &= ~OEP_STRUCTURAL_EQ;
> > > +    break;
> > > +  }
> > > +
> > >  /* Define macros to test an operand from arg0 and arg1 for equality and a
> > >     variant that allows null and views null as being different from any
> > >     non-null value.  In the latter case, if either is null, the both
> > > @@ -4218,7 +4246,7 @@ operand_compare::verify_hash_value (const_tree
> > > arg0, const_tree arg1,
> > >      {
> > >        if (operand_equal_p (arg0, arg1, flags | OEP_NO_HASH_CHECK))
> > >  	{
> > > -	  if (arg0 != arg1 && !(flags & OEP_DECL_NAME))
> > > +	  if (arg0 != arg1 && !(flags & (OEP_DECL_NAME |
> > > OEP_STRUCTURAL_EQ)))
> > >  	    {
> > >  	      inchash::hash hstate0 (0), hstate1 (0);
> > >  	      hash_operand (arg0, hstate0, flags | OEP_HASH_CHECK);
> > > @@ -17357,6 +17385,95 @@ test_arithmetic_folding ()
> > >  				   x);
> > >  }
> > >
> > > +namespace test_operand_equality {
> > > +
> > > +/* Verify structural equality.  */
> > > +
> > > +/* Execute fold_vec_perm_cst unit tests.  */
> > > +
> > > +static void
> > > +test ()
> > > +{
> > > +  tree stype = integer_type_node;
> > > +  tree utype = unsigned_type_node;
> > > +  tree x = create_tmp_var_raw (stype, "x");
> > > +  tree y = create_tmp_var_raw (stype, "y");
> > > +  tree z = create_tmp_var_raw (stype, "z");
> > > +  tree four = build_int_cst (stype, 4);
> > > +  tree lhs1 = fold_build2 (PLUS_EXPR, stype, x, y);
> > > +  tree rhs1 = fold_convert (stype,
> > > +		fold_build2 (PLUS_EXPR, utype,
> > > +			     fold_convert (utype, x),
> > > +			     fold_convert (utype, y)));
> > > +
> > > +  /* (int)((unsigned x) + (unsigned y)) == x + y.  */
> > > +  ASSERT_TRUE (operand_equal_p (lhs1, rhs1, OEP_STRUCTURAL_EQ));
> > > +  ASSERT_FALSE (operand_equal_p (lhs1, rhs1, 0));
> > > +
> > > +  /* (int)(unsigned) x == x.  */
> > > +  tree lhs2 = build1 (NOP_EXPR, stype,
> > > +		build1 (NOP_EXPR, utype, x));
> > > +  tree rhs2 = x;
> > > +  ASSERT_TRUE (operand_equal_p (lhs2, rhs2, OEP_STRUCTURAL_EQ));
> > > +  ASSERT_TRUE (operand_equal_p (lhs2, rhs2, 0));
> > > +
> > > +  /* (unsigned x) + (unsigned y) == x + y.  */
> > > +  tree lhs3 = lhs1;
> > > +  tree rhs3 = fold_build2 (PLUS_EXPR, utype,
> > > +			   fold_convert (utype, x),
> > > +			   fold_convert (utype, y));
> > > +  ASSERT_TRUE (operand_equal_p (lhs3, rhs3, OEP_STRUCTURAL_EQ));
> > > +  ASSERT_FALSE (operand_equal_p (lhs3, rhs3, 0));
> > > +
> > > +  /* (unsigned x) / (unsigned y) == x / y.  */
> > > +  tree lhs4 = fold_build2 (TRUNC_DIV_EXPR, stype, x, y);;
> > > +  tree rhs4 = fold_build2 (TRUNC_DIV_EXPR, utype,
> > > +			   fold_convert (utype, x),
> > > +			   fold_convert (utype, y));
> > > +  ASSERT_FALSE (operand_equal_p (lhs4, rhs4, OEP_STRUCTURAL_EQ));
> > > +  ASSERT_FALSE (operand_equal_p (lhs4, rhs4, 0));
> > > +
> > > +  /* (unsigned x) / 4 == x / 4.  */
> > > +  tree lhs5 = fold_build2 (TRUNC_DIV_EXPR, stype, x, four);;
> > > +  tree rhs5 = fold_build2 (TRUNC_DIV_EXPR, utype,
> > > +			   fold_convert (utype, x),
> > > +			   fold_convert (utype, four));
> > > +  ASSERT_FALSE (operand_equal_p (lhs5, rhs5, OEP_STRUCTURAL_EQ));
> > > +  ASSERT_FALSE (operand_equal_p (lhs5, rhs5, 0));
> > > +
> > > +  /* (unsigned x) + 4 == x + 4.  */
> > > +  tree lhs6 = fold_build2 (PLUS_EXPR, stype, x, four);
> > > +  tree rhs6 = fold_build2 (PLUS_EXPR, utype,
> > > +			   fold_convert (utype, x),
> > > +			   fold_convert (utype, four));
> > > +  ASSERT_TRUE (operand_equal_p (lhs6, rhs6, OEP_STRUCTURAL_EQ));
> > > +  ASSERT_FALSE (operand_equal_p (lhs6, rhs6, 0));
> > > +
> > > +  /* (unsigned x) + 4 == 4 + x.  */
> > > +  tree lhs7 = fold_build2 (PLUS_EXPR, stype, four, x);
> > > +  tree rhs7 = fold_build2 (PLUS_EXPR, utype,
> > > +			   fold_convert (utype, x),
> > > +			   fold_convert (utype, four));
> > > +  ASSERT_TRUE (operand_equal_p (lhs7, rhs7, OEP_STRUCTURAL_EQ));
> > > +  ASSERT_FALSE (operand_equal_p (lhs7, rhs7, 0));
> > > +
> > > +  /* ((unsigned x) + 4) * (unsigned y)) + z == ((4 + x) * y) + z.  */
> > > +  tree lhs8 = fold_build2 (PLUS_EXPR, stype,
> > > +			   fold_build2 (MULT_EXPR, stype,
> > > +					fold_build2 (PLUS_EXPR, stype, four, x),
> > > +					y),
> > > +			   z);
> > > +  tree rhs8 = fold_build2 (MULT_EXPR, utype,
> > > +			   fold_build2 (PLUS_EXPR, utype,
> > > +					fold_convert (utype, x),
> > > +					fold_convert (utype, four)),
> > > +			   fold_convert (utype, y));
> > > +  rhs8 = fold_build2 (PLUS_EXPR, stype, fold_convert (stype, rhs8), z);
> > > +  ASSERT_TRUE (operand_equal_p (lhs8, rhs8, OEP_STRUCTURAL_EQ));
> > > +  ASSERT_FALSE (operand_equal_p (lhs8, rhs8, 0));
> > > +}
> > > +}
> > > +
> > >  namespace test_fold_vec_perm_cst {
> > >
> > >  /* Build a VECTOR_CST corresponding to VMODE, and has
> > > @@ -18150,6 +18267,7 @@ fold_const_cc_tests ()
> > >    test_vector_folding ();
> > >    test_vec_duplicate_folding ();
> > >    test_fold_vec_perm_cst::test ();
> > > +  test_operand_equality::test ();
> > >  }
> > >
> > >  } // namespace selftest
> > > diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> > > b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> > > new file mode 100644
> > > index
> > >
> >
> 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c
> > > 2e24a4973c8539fae
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> > > @@ -0,0 +1,20 @@
> > > +! { dg-do compile { target aarch64-*-* } }
> > > +! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" }
> > > +
> > > +module a
> > > +integer, parameter :: b=3, c=b
> > > +contains
> > > +subroutine d(block)
> > > +integer f, col   , block(:, :, :), e
> > > +do f = 1, c
> > > +   do col = 1, c
> > > +             block(:f,                          :, e()) = do
> > > +     end do
> > > +  end do
> > > +  end
> > > +  end
> > > +
> > > +! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 3 IVs:}
> ivopts
> > }
> > > }
> > > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 2 IVs:} 1
> > > ivopts } }
> > > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 1 IVs:} 1
> > > ivopts } }
> > > +
> > > diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> > > index
> > >
> >
> 27c569c77022227643151fa4a909a95f3d45bf55..fe398d99bcaa9e8fcec45e5e0
> > > 7d133bc393aa4a7 100644
> > > --- a/gcc/tree-core.h
> > > +++ b/gcc/tree-core.h
> > > @@ -962,7 +962,9 @@ enum operand_equal_flag {
> > >    /* In conjunction with OEP_LEXICOGRAPHIC considers names of declarations
> > >       of the same kind.  Used to compare VLA bounds involving parameters
> > >       across redeclarations of the same function.  */
> > > -  OEP_DECL_NAME = 512
> > > +  OEP_DECL_NAME = 512,
> > > +  /* Check for structural equivalence and ignore NOP_CONVERT casts.  */
> > > +  OEP_STRUCTURAL_EQ = 1024
> > >  };
> > >
> > >  /* Enum and arrays used for tree allocation stats.
> > > diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> > > index
> > >
> >
> c3218a3e8eedbb8d0a7f14c01eeb069cb6024c29..b05e4ac1e63b5c110707a8a3
> > > ed5e614b7ccc702f 100644
> > > --- a/gcc/tree-ssa-loop-ivopts.cc
> > > +++ b/gcc/tree-ssa-loop-ivopts.cc
> > > @@ -1623,8 +1623,8 @@ record_group_use (struct ivopts_data *data, tree
> > > *use_p,
> > >
> > >  	  /* Check if it has the same stripped base and step.  */
> > >  	  if (operand_equal_p (iv->base_object, use->iv->base_object, 0)
> > > -	      && operand_equal_p (iv->step, use->iv->step, 0)
> > > -	      && operand_equal_p (addr_base, use->addr_base, 0))
> > > +	      && operand_equal_p (iv->step, use->iv->step, OEP_STRUCTURAL_EQ)
> > > +	      && operand_equal_p (addr_base, use->addr_base,
> > > OEP_STRUCTURAL_EQ))
> > >  	    break;
> > >  	}
> > >        if (i == data->vgroups.length ())
> > >
> > >
> > >
> > >
> > > --
Tamar Christina Nov. 5, 2024, 9:03 a.m. UTC | #4
ping

> -----Original Message-----
> From: Tamar Christina <Tamar.Christina@arm.com>
> Sent: Monday, October 14, 2024 4:08 PM
> To: Tamar Christina <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org
> Cc: nd <nd@arm.com>; rguenther@suse.de; jlaw@ventanamicro.com
> Subject: RE: [PATCH 2/2]middle-end: use two's complement equality when
> comparing IVs during candidate selection [PR114932]
> 
> ping
> 
> > -----Original Message-----
> > From: Tamar Christina <Tamar.Christina@arm.com>
> > Sent: Monday, September 23, 2024 8:41 AM
> > To: gcc-patches@gcc.gnu.org
> > Cc: nd <nd@arm.com>; rguenther@suse.de; jlaw@ventanamicro.com
> > Subject: RE: [PATCH 2/2]middle-end: use two's complement equality when
> > comparing IVs during candidate selection [PR114932]
> >
> > ping
> >
> > > -----Original Message-----
> > > From: Tamar Christina
> > > Sent: Tuesday, September 10, 2024 8:57 PM
> > > To: Tamar Christina <tamar.christina@arm.com>; gcc-patches@gcc.gnu.org
> > > Cc: nd <nd@arm.com>; rguenther@suse.de; jlaw@ventanamicro.com
> > > Subject: RE: [PATCH 2/2]middle-end: use two's complement equality when
> > > comparing IVs during candidate selection [PR114932]
> > >
> > > ping
> > >
> > > > -----Original Message-----
> > > > From: Tamar Christina <tamar.christina@arm.com>
> > > > Sent: Tuesday, August 20, 2024 2:06 PM
> > > > To: gcc-patches@gcc.gnu.org
> > > > Cc: nd <nd@arm.com>; rguenther@suse.de; jlaw@ventanamicro.com
> > > > Subject: [PATCH 2/2]middle-end: use two's complement equality when
> > > comparing
> > > > IVs during candidate selection [PR114932]
> > > >
> > > > Hi All,
> > > >
> > > > IVOPTS normally uses affine trees to perform comparisons between different
> > IVs,
> > > > but these seem to have been missing in two key spots and instead normal
> tree
> > > > equivalencies used.
> > > >
> > > > In some cases where we have a two-complements equivalence but not a strict
> > > > signedness equivalencies we end up generating both a signed and unsigned IV
> > for
> > > > the same candidate.
> > > >
> > > > This patch implements a new OEP flag called OEP_STRUCTURAL_EQ.  This flag
> > will
> > > > check if the operands would produce the same bit values after the
> > computations
> > > > even if the final sign is different.
> > > >
> > > > This happens quite a lot with fortran but can also happen in C because this
> > came
> > > > code is unable to figure out when one expression is a multiple of another.
> > > >
> > > > As an example in the attached testcase we get:
> > > >
> > > > Initial set of candidates:
> > > >   cost: 24 (complexity 3)
> > > >   reg_cost: 9
> > > >   cand_cost: 15
> > > >   cand_group_cost: 0 (complexity 3)
> > > >   candidates: 1, 6, 8
> > > >    group:0 --> iv_cand:6, cost=(0,1)
> > > >    group:1 --> iv_cand:1, cost=(0,0)
> > > >    group:2 --> iv_cand:8, cost=(0,1)
> > > >    group:3 --> iv_cand:8, cost=(0,1)
> > > >   invariant variables: 6
> > > >   invariant expressions: 1, 2
> > > >
> > > > <Invariant Expressions>:
> > > > inv_expr 1:     stride.3_27 * 4
> > > > inv_expr 2:     (unsigned long) stride.3_27 * 4
> > > >
> > > > These end up being used in the same group:
> > > >
> > > > Group 1:
> > > > cand  cost    compl.  inv.expr.       inv.vars
> > > > 1     0       0       NIL;    6
> > > > 2     0       0       NIL;    6
> > > > 3     0       0       NIL;    6
> > > >
> > > > which ends up with IV opts picking the signed and unsigned IVs:
> > > >
> > > > Improved to:
> > > >   cost: 24 (complexity 3)
> > > >   reg_cost: 9
> > > >   cand_cost: 15
> > > >   cand_group_cost: 0 (complexity 3)
> > > >   candidates: 1, 6, 8
> > > >    group:0 --> iv_cand:6, cost=(0,1)
> > > >    group:1 --> iv_cand:1, cost=(0,0)
> > > >    group:2 --> iv_cand:8, cost=(0,1)
> > > >    group:3 --> iv_cand:8, cost=(0,1)
> > > >   invariant variables: 6
> > > >   invariant expressions: 1, 2
> > > >
> > > > and so generates the same IV as both signed and unsigned:
> > > >
> > > > ;;   basic block 21, loop depth 3, count 214748368 (estimated locally, freq
> > > > 58.2545), maybe hot
> > > > ;;    prev block 28, next block 31, flags: (NEW, REACHABLE, VISITED)
> > > > ;;    pred:       28 [always]  count:23622320 (estimated locally, freq 6.4080)
> > > > (FALLTHRU,EXECUTABLE)
> > > > ;;                25 [always]  count:191126046 (estimated locally, freq 51.8465)
> > > > (FALLTHRU,DFS_BACK,EXECUTABLE)
> > > >   # .MEM_66 = PHI <.MEM_34(28), .MEM_22(25)>
> > > >   # ivtmp.22_41 = PHI <0(28), ivtmp.22_82(25)>
> > > >   # ivtmp.26_51 = PHI <ivtmp.26_55(28), ivtmp.26_72(25)>
> > > >   # ivtmp.28_90 = PHI <ivtmp.28_99(28), ivtmp.28_98(25)>
> > > >
> > > > ...
> > > >
> > > > ;;   basic block 24, loop depth 3, count 214748366 (estimated locally, freq
> > > > 58.2545), maybe hot
> > > > ;;    prev block 22, next block 25, flags: (NEW, REACHABLE, VISITED)'
> > > > ;;    pred:       22 [always]  count:95443719 (estimated locally, freq 25.8909)
> > > > (FALLTHRU)
> > > > ;;                21 [33.3% (guessed)]  count:71582790 (estimated locally, freq
> > > 19.4182)
> > > > (TRUE_VALUE,EXECUTABLE)
> > > > ;;                31 [33.3% (guessed)]  count:47721860 (estimated locally, freq
> > > 12.9455)
> > > > (TRUE_VALUE,EXECUTABLE)
> > > > # .MEM_22 = PHI <.MEM_44(22), .MEM_31(21), .MEM_79(31)>
> > > > ivtmp.22_82 = ivtmp.22_41 + 1;
> > > > ivtmp.26_72 = ivtmp.26_51 + _80;
> > > > ivtmp.28_98 = ivtmp.28_90 + _39;
> > > >
> > > > These two IVs are always used as unsigned, so IV ops generates:
> > > >
> > > >   _73 = stride.3_27 * 4;
> > > >   _80 = (unsigned long) _73;
> > > >   _54 = (unsigned long) stride.3_27;
> > > >   _39 = _54 * 4;
> > > >
> > > > Which means that in e.g. exchange2 we generate a lot of duplicate code.
> > > >
> > > > This is because candidate 6 and 8 are equivalent under two's complement but
> > > have
> > > > different signs.
> > > >
> > > > This patch changes it so that if you have two IVs that are affine equivalent to
> > > > just pick one over the other.  IV already has code for this, so the patch just
> > > > uses affine trees instead of tree for the check.
> > > >
> > > > With it we get:
> > > >
> > > > <Invariant Expressions>:
> > > > inv_expr 1:     stride.3_27 * 4
> > > >
> > > > <Group-candidate Costs>:
> > > > Group 0:
> > > >   cand  cost    compl.  inv.expr.       inv.vars
> > > >   5     0       2       NIL;    NIL;
> > > >   6     0       3       NIL;    NIL;
> > > >
> > > > Group 1:
> > > >   cand  cost    compl.  inv.expr.       inv.vars
> > > >   1     0       0       NIL;    6
> > > >   2     0       0       NIL;    6
> > > >   3     0       0       NIL;    6
> > > >   4     0       0       NIL;    6
> > > >
> > > > Initial set of candidates:
> > > >   cost: 16 (complexity 3)
> > > >   reg_cost: 6
> > > >   cand_cost: 10
> > > >   cand_group_cost: 0 (complexity 3)
> > > >   candidates: 1, 6
> > > >    group:0 --> iv_cand:6, cost=(0,3)
> > > >    group:1 --> iv_cand:1, cost=(0,0)
> > > >   invariant variables: 6
> > > >   invariant expressions: 1
> > > >
> > > > The two patches together results in a 10% performance increase in exchange2
> > in
> > > > SPECCPU 2017 and a 4% reduction in binary size and a 5% improvement in
> > > > compile
> > > > time. There's also a 5% performance improvement in fotonik3d and similar
> > > > reduction in binary size.
> > > >
> > > > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > > > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> > > > -m32, -m64 and no issues.
> > > >
> > > > Ok for master?
> > > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > 	PR tree-optimization/114932
> > > > 	* fold-const.cc (operand_compare::operand_equal_p): Use it.
> > > > 	(operand_compare::verify_hash_value): Likewise.
> > > > 	(test_operand_equality::test): New.
> > > > 	(fold_const_cc_tests): Use it.
> > > > 	* tree-core.h (enum operand_equal_flag): Add OEP_STRUCTURAL_EQ.
> > > > 	* tree-ssa-loop-ivopts.cc (record_group_use): Check for structural eq.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > 	PR tree-optimization/114932
> > > > 	* gfortran.dg/addressing-modes_2.f90: New test.
> > > >
> > > > ---
> > > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> > > > index
> > > >
> > >
> >
> 71e82b1d76d4106c7c23c54af8b35905a1af9f1c..52a4465de0355220befa6d6b
> > > > 3301e21e038e3e41 100644
> > > > --- a/gcc/fold-const.cc
> > > > +++ b/gcc/fold-const.cc
> > > > @@ -3208,7 +3208,19 @@ operand_compare::operand_equal_p (tree
> type0,
> > > > const_tree arg0,
> > > >        return tree_int_cst_equal (arg0, arg1);
> > > >      }
> > > >
> > > > -  if (!(flags & OEP_ADDRESS_OF))
> > > > +  if ((flags & OEP_STRUCTURAL_EQ)
> > > > +      && (CONVERT_EXPR_P (arg0) || CONVERT_EXPR_P (arg1)))
> > > > +    {
> > > > +      const_tree t_arg0 = arg0;
> > > > +      const_tree t_arg1 = arg1;
> > > > +      STRIP_NOPS (arg0);
> > > > +      STRIP_NOPS (arg1);
> > > > +      /* Only recurse if the conversion was one that was valid to strip.  */
> > > > +      if (t_arg0 != arg0 || t_arg1 != arg1)
> > > > +	return operand_equal_p (type0, arg0, type1, arg1, flags);
> > > > +    }
> > > > +
> > > > +  if (!(flags & (OEP_ADDRESS_OF | OEP_STRUCTURAL_EQ)))
> > > >      {
> > > >        /* If both types don't have the same signedness, then we can't consider
> > > >  	 them equal.  We must check this before the STRIP_NOPS calls
> > > > @@ -3439,6 +3451,22 @@ operand_compare::operand_equal_p (tree
> type0,
> > > > const_tree arg0,
> > > >    if (flags & (OEP_ONLY_CONST | OEP_BITWISE))
> > > >      return false;
> > > >
> > > > +  /* Check if we are checking an operation where the two's compliment
> bitwise
> > > > +     representation of the result is not the same between signed and
> > > > +     unsigned arithmetic.  */
> > > > +  switch (TREE_CODE (arg0))
> > > > +  {
> > > > +  case PLUS_EXPR:
> > > > +  case MINUS_EXPR:
> > > > +  case MULT_EXPR:
> > > > +    break;
> > > > +
> > > > +  default:
> > > > +    /* Clear the structural comparison bit.  */
> > > > +    flags &= ~OEP_STRUCTURAL_EQ;
> > > > +    break;
> > > > +  }
> > > > +
> > > >  /* Define macros to test an operand from arg0 and arg1 for equality and a
> > > >     variant that allows null and views null as being different from any
> > > >     non-null value.  In the latter case, if either is null, the both
> > > > @@ -4218,7 +4246,7 @@ operand_compare::verify_hash_value (const_tree
> > > > arg0, const_tree arg1,
> > > >      {
> > > >        if (operand_equal_p (arg0, arg1, flags | OEP_NO_HASH_CHECK))
> > > >  	{
> > > > -	  if (arg0 != arg1 && !(flags & OEP_DECL_NAME))
> > > > +	  if (arg0 != arg1 && !(flags & (OEP_DECL_NAME |
> > > > OEP_STRUCTURAL_EQ)))
> > > >  	    {
> > > >  	      inchash::hash hstate0 (0), hstate1 (0);
> > > >  	      hash_operand (arg0, hstate0, flags | OEP_HASH_CHECK);
> > > > @@ -17357,6 +17385,95 @@ test_arithmetic_folding ()
> > > >  				   x);
> > > >  }
> > > >
> > > > +namespace test_operand_equality {
> > > > +
> > > > +/* Verify structural equality.  */
> > > > +
> > > > +/* Execute fold_vec_perm_cst unit tests.  */
> > > > +
> > > > +static void
> > > > +test ()
> > > > +{
> > > > +  tree stype = integer_type_node;
> > > > +  tree utype = unsigned_type_node;
> > > > +  tree x = create_tmp_var_raw (stype, "x");
> > > > +  tree y = create_tmp_var_raw (stype, "y");
> > > > +  tree z = create_tmp_var_raw (stype, "z");
> > > > +  tree four = build_int_cst (stype, 4);
> > > > +  tree lhs1 = fold_build2 (PLUS_EXPR, stype, x, y);
> > > > +  tree rhs1 = fold_convert (stype,
> > > > +		fold_build2 (PLUS_EXPR, utype,
> > > > +			     fold_convert (utype, x),
> > > > +			     fold_convert (utype, y)));
> > > > +
> > > > +  /* (int)((unsigned x) + (unsigned y)) == x + y.  */
> > > > +  ASSERT_TRUE (operand_equal_p (lhs1, rhs1, OEP_STRUCTURAL_EQ));
> > > > +  ASSERT_FALSE (operand_equal_p (lhs1, rhs1, 0));
> > > > +
> > > > +  /* (int)(unsigned) x == x.  */
> > > > +  tree lhs2 = build1 (NOP_EXPR, stype,
> > > > +		build1 (NOP_EXPR, utype, x));
> > > > +  tree rhs2 = x;
> > > > +  ASSERT_TRUE (operand_equal_p (lhs2, rhs2, OEP_STRUCTURAL_EQ));
> > > > +  ASSERT_TRUE (operand_equal_p (lhs2, rhs2, 0));
> > > > +
> > > > +  /* (unsigned x) + (unsigned y) == x + y.  */
> > > > +  tree lhs3 = lhs1;
> > > > +  tree rhs3 = fold_build2 (PLUS_EXPR, utype,
> > > > +			   fold_convert (utype, x),
> > > > +			   fold_convert (utype, y));
> > > > +  ASSERT_TRUE (operand_equal_p (lhs3, rhs3, OEP_STRUCTURAL_EQ));
> > > > +  ASSERT_FALSE (operand_equal_p (lhs3, rhs3, 0));
> > > > +
> > > > +  /* (unsigned x) / (unsigned y) == x / y.  */
> > > > +  tree lhs4 = fold_build2 (TRUNC_DIV_EXPR, stype, x, y);;
> > > > +  tree rhs4 = fold_build2 (TRUNC_DIV_EXPR, utype,
> > > > +			   fold_convert (utype, x),
> > > > +			   fold_convert (utype, y));
> > > > +  ASSERT_FALSE (operand_equal_p (lhs4, rhs4, OEP_STRUCTURAL_EQ));
> > > > +  ASSERT_FALSE (operand_equal_p (lhs4, rhs4, 0));
> > > > +
> > > > +  /* (unsigned x) / 4 == x / 4.  */
> > > > +  tree lhs5 = fold_build2 (TRUNC_DIV_EXPR, stype, x, four);;
> > > > +  tree rhs5 = fold_build2 (TRUNC_DIV_EXPR, utype,
> > > > +			   fold_convert (utype, x),
> > > > +			   fold_convert (utype, four));
> > > > +  ASSERT_FALSE (operand_equal_p (lhs5, rhs5, OEP_STRUCTURAL_EQ));
> > > > +  ASSERT_FALSE (operand_equal_p (lhs5, rhs5, 0));
> > > > +
> > > > +  /* (unsigned x) + 4 == x + 4.  */
> > > > +  tree lhs6 = fold_build2 (PLUS_EXPR, stype, x, four);
> > > > +  tree rhs6 = fold_build2 (PLUS_EXPR, utype,
> > > > +			   fold_convert (utype, x),
> > > > +			   fold_convert (utype, four));
> > > > +  ASSERT_TRUE (operand_equal_p (lhs6, rhs6, OEP_STRUCTURAL_EQ));
> > > > +  ASSERT_FALSE (operand_equal_p (lhs6, rhs6, 0));
> > > > +
> > > > +  /* (unsigned x) + 4 == 4 + x.  */
> > > > +  tree lhs7 = fold_build2 (PLUS_EXPR, stype, four, x);
> > > > +  tree rhs7 = fold_build2 (PLUS_EXPR, utype,
> > > > +			   fold_convert (utype, x),
> > > > +			   fold_convert (utype, four));
> > > > +  ASSERT_TRUE (operand_equal_p (lhs7, rhs7, OEP_STRUCTURAL_EQ));
> > > > +  ASSERT_FALSE (operand_equal_p (lhs7, rhs7, 0));
> > > > +
> > > > +  /* ((unsigned x) + 4) * (unsigned y)) + z == ((4 + x) * y) + z.  */
> > > > +  tree lhs8 = fold_build2 (PLUS_EXPR, stype,
> > > > +			   fold_build2 (MULT_EXPR, stype,
> > > > +					fold_build2 (PLUS_EXPR, stype, four, x),
> > > > +					y),
> > > > +			   z);
> > > > +  tree rhs8 = fold_build2 (MULT_EXPR, utype,
> > > > +			   fold_build2 (PLUS_EXPR, utype,
> > > > +					fold_convert (utype, x),
> > > > +					fold_convert (utype, four)),
> > > > +			   fold_convert (utype, y));
> > > > +  rhs8 = fold_build2 (PLUS_EXPR, stype, fold_convert (stype, rhs8), z);
> > > > +  ASSERT_TRUE (operand_equal_p (lhs8, rhs8, OEP_STRUCTURAL_EQ));
> > > > +  ASSERT_FALSE (operand_equal_p (lhs8, rhs8, 0));
> > > > +}
> > > > +}
> > > > +
> > > >  namespace test_fold_vec_perm_cst {
> > > >
> > > >  /* Build a VECTOR_CST corresponding to VMODE, and has
> > > > @@ -18150,6 +18267,7 @@ fold_const_cc_tests ()
> > > >    test_vector_folding ();
> > > >    test_vec_duplicate_folding ();
> > > >    test_fold_vec_perm_cst::test ();
> > > > +  test_operand_equality::test ();
> > > >  }
> > > >
> > > >  } // namespace selftest
> > > > diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> > > > b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> > > > new file mode 100644
> > > > index
> > > >
> > >
> >
> 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c
> > > > 2e24a4973c8539fae
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> > > > @@ -0,0 +1,20 @@
> > > > +! { dg-do compile { target aarch64-*-* } }
> > > > +! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" }
> > > > +
> > > > +module a
> > > > +integer, parameter :: b=3, c=b
> > > > +contains
> > > > +subroutine d(block)
> > > > +integer f, col   , block(:, :, :), e
> > > > +do f = 1, c
> > > > +   do col = 1, c
> > > > +             block(:f,                          :, e()) = do
> > > > +     end do
> > > > +  end do
> > > > +  end
> > > > +  end
> > > > +
> > > > +! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 3 IVs:}
> > ivopts
> > > }
> > > > }
> > > > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 2 IVs:}
> 1
> > > > ivopts } }
> > > > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 1 IVs:}
> 1
> > > > ivopts } }
> > > > +
> > > > diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> > > > index
> > > >
> > >
> >
> 27c569c77022227643151fa4a909a95f3d45bf55..fe398d99bcaa9e8fcec45e5e0
> > > > 7d133bc393aa4a7 100644
> > > > --- a/gcc/tree-core.h
> > > > +++ b/gcc/tree-core.h
> > > > @@ -962,7 +962,9 @@ enum operand_equal_flag {
> > > >    /* In conjunction with OEP_LEXICOGRAPHIC considers names of
> declarations
> > > >       of the same kind.  Used to compare VLA bounds involving parameters
> > > >       across redeclarations of the same function.  */
> > > > -  OEP_DECL_NAME = 512
> > > > +  OEP_DECL_NAME = 512,
> > > > +  /* Check for structural equivalence and ignore NOP_CONVERT casts.  */
> > > > +  OEP_STRUCTURAL_EQ = 1024
> > > >  };
> > > >
> > > >  /* Enum and arrays used for tree allocation stats.
> > > > diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> > > > index
> > > >
> > >
> >
> c3218a3e8eedbb8d0a7f14c01eeb069cb6024c29..b05e4ac1e63b5c110707a8a3
> > > > ed5e614b7ccc702f 100644
> > > > --- a/gcc/tree-ssa-loop-ivopts.cc
> > > > +++ b/gcc/tree-ssa-loop-ivopts.cc
> > > > @@ -1623,8 +1623,8 @@ record_group_use (struct ivopts_data *data, tree
> > > > *use_p,
> > > >
> > > >  	  /* Check if it has the same stripped base and step.  */
> > > >  	  if (operand_equal_p (iv->base_object, use->iv->base_object, 0)
> > > > -	      && operand_equal_p (iv->step, use->iv->step, 0)
> > > > -	      && operand_equal_p (addr_base, use->addr_base, 0))
> > > > +	      && operand_equal_p (iv->step, use->iv->step, OEP_STRUCTURAL_EQ)
> > > > +	      && operand_equal_p (addr_base, use->addr_base,
> > > > OEP_STRUCTURAL_EQ))
> > > >  	    break;
> > > >  	}
> > > >        if (i == data->vgroups.length ())
> > > >
> > > >
> > > >
> > > >
> > > > --
diff mbox series

Patch

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 71e82b1d76d4106c7c23c54af8b35905a1af9f1c..52a4465de0355220befa6d6b3301e21e038e3e41 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -3208,7 +3208,19 @@  operand_compare::operand_equal_p (tree type0, const_tree arg0,
       return tree_int_cst_equal (arg0, arg1);
     }
 
-  if (!(flags & OEP_ADDRESS_OF))
+  if ((flags & OEP_STRUCTURAL_EQ)
+      && (CONVERT_EXPR_P (arg0) || CONVERT_EXPR_P (arg1)))
+    {
+      const_tree t_arg0 = arg0;
+      const_tree t_arg1 = arg1;
+      STRIP_NOPS (arg0);
+      STRIP_NOPS (arg1);
+      /* Only recurse if the conversion was one that was valid to strip.  */
+      if (t_arg0 != arg0 || t_arg1 != arg1)
+	return operand_equal_p (type0, arg0, type1, arg1, flags);
+    }
+
+  if (!(flags & (OEP_ADDRESS_OF | OEP_STRUCTURAL_EQ)))
     {
       /* If both types don't have the same signedness, then we can't consider
 	 them equal.  We must check this before the STRIP_NOPS calls
@@ -3439,6 +3451,22 @@  operand_compare::operand_equal_p (tree type0, const_tree arg0,
   if (flags & (OEP_ONLY_CONST | OEP_BITWISE))
     return false;
 
+  /* Check if we are checking an operation where the two's compliment bitwise
+     representation of the result is not the same between signed and
+     unsigned arithmetic.  */
+  switch (TREE_CODE (arg0))
+  {
+  case PLUS_EXPR:
+  case MINUS_EXPR:
+  case MULT_EXPR:
+    break;
+
+  default:
+    /* Clear the structural comparison bit.  */
+    flags &= ~OEP_STRUCTURAL_EQ;
+    break;
+  }
+
 /* Define macros to test an operand from arg0 and arg1 for equality and a
    variant that allows null and views null as being different from any
    non-null value.  In the latter case, if either is null, the both
@@ -4218,7 +4246,7 @@  operand_compare::verify_hash_value (const_tree arg0, const_tree arg1,
     {
       if (operand_equal_p (arg0, arg1, flags | OEP_NO_HASH_CHECK))
 	{
-	  if (arg0 != arg1 && !(flags & OEP_DECL_NAME))
+	  if (arg0 != arg1 && !(flags & (OEP_DECL_NAME | OEP_STRUCTURAL_EQ)))
 	    {
 	      inchash::hash hstate0 (0), hstate1 (0);
 	      hash_operand (arg0, hstate0, flags | OEP_HASH_CHECK);
@@ -17357,6 +17385,95 @@  test_arithmetic_folding ()
 				   x);
 }
 
+namespace test_operand_equality {
+
+/* Verify structural equality.  */
+
+/* Execute fold_vec_perm_cst unit tests.  */
+
+static void
+test ()
+{
+  tree stype = integer_type_node;
+  tree utype = unsigned_type_node;
+  tree x = create_tmp_var_raw (stype, "x");
+  tree y = create_tmp_var_raw (stype, "y");
+  tree z = create_tmp_var_raw (stype, "z");
+  tree four = build_int_cst (stype, 4);
+  tree lhs1 = fold_build2 (PLUS_EXPR, stype, x, y);
+  tree rhs1 = fold_convert (stype,
+		fold_build2 (PLUS_EXPR, utype,
+			     fold_convert (utype, x),
+			     fold_convert (utype, y)));
+
+  /* (int)((unsigned x) + (unsigned y)) == x + y.  */
+  ASSERT_TRUE (operand_equal_p (lhs1, rhs1, OEP_STRUCTURAL_EQ));
+  ASSERT_FALSE (operand_equal_p (lhs1, rhs1, 0));
+
+  /* (int)(unsigned) x == x.  */
+  tree lhs2 = build1 (NOP_EXPR, stype,
+		build1 (NOP_EXPR, utype, x));
+  tree rhs2 = x;
+  ASSERT_TRUE (operand_equal_p (lhs2, rhs2, OEP_STRUCTURAL_EQ));
+  ASSERT_TRUE (operand_equal_p (lhs2, rhs2, 0));
+
+  /* (unsigned x) + (unsigned y) == x + y.  */
+  tree lhs3 = lhs1;
+  tree rhs3 = fold_build2 (PLUS_EXPR, utype,
+			   fold_convert (utype, x),
+			   fold_convert (utype, y));
+  ASSERT_TRUE (operand_equal_p (lhs3, rhs3, OEP_STRUCTURAL_EQ));
+  ASSERT_FALSE (operand_equal_p (lhs3, rhs3, 0));
+
+  /* (unsigned x) / (unsigned y) == x / y.  */
+  tree lhs4 = fold_build2 (TRUNC_DIV_EXPR, stype, x, y);;
+  tree rhs4 = fold_build2 (TRUNC_DIV_EXPR, utype,
+			   fold_convert (utype, x),
+			   fold_convert (utype, y));
+  ASSERT_FALSE (operand_equal_p (lhs4, rhs4, OEP_STRUCTURAL_EQ));
+  ASSERT_FALSE (operand_equal_p (lhs4, rhs4, 0));
+
+  /* (unsigned x) / 4 == x / 4.  */
+  tree lhs5 = fold_build2 (TRUNC_DIV_EXPR, stype, x, four);;
+  tree rhs5 = fold_build2 (TRUNC_DIV_EXPR, utype,
+			   fold_convert (utype, x),
+			   fold_convert (utype, four));
+  ASSERT_FALSE (operand_equal_p (lhs5, rhs5, OEP_STRUCTURAL_EQ));
+  ASSERT_FALSE (operand_equal_p (lhs5, rhs5, 0));
+
+  /* (unsigned x) + 4 == x + 4.  */
+  tree lhs6 = fold_build2 (PLUS_EXPR, stype, x, four);
+  tree rhs6 = fold_build2 (PLUS_EXPR, utype,
+			   fold_convert (utype, x),
+			   fold_convert (utype, four));
+  ASSERT_TRUE (operand_equal_p (lhs6, rhs6, OEP_STRUCTURAL_EQ));
+  ASSERT_FALSE (operand_equal_p (lhs6, rhs6, 0));
+
+  /* (unsigned x) + 4 == 4 + x.  */
+  tree lhs7 = fold_build2 (PLUS_EXPR, stype, four, x);
+  tree rhs7 = fold_build2 (PLUS_EXPR, utype,
+			   fold_convert (utype, x),
+			   fold_convert (utype, four));
+  ASSERT_TRUE (operand_equal_p (lhs7, rhs7, OEP_STRUCTURAL_EQ));
+  ASSERT_FALSE (operand_equal_p (lhs7, rhs7, 0));
+
+  /* ((unsigned x) + 4) * (unsigned y)) + z == ((4 + x) * y) + z.  */
+  tree lhs8 = fold_build2 (PLUS_EXPR, stype,
+			   fold_build2 (MULT_EXPR, stype,
+					fold_build2 (PLUS_EXPR, stype, four, x),
+					y),
+			   z);
+  tree rhs8 = fold_build2 (MULT_EXPR, utype,
+			   fold_build2 (PLUS_EXPR, utype,
+					fold_convert (utype, x),
+					fold_convert (utype, four)),
+			   fold_convert (utype, y));
+  rhs8 = fold_build2 (PLUS_EXPR, stype, fold_convert (stype, rhs8), z);
+  ASSERT_TRUE (operand_equal_p (lhs8, rhs8, OEP_STRUCTURAL_EQ));
+  ASSERT_FALSE (operand_equal_p (lhs8, rhs8, 0));
+}
+}
+
 namespace test_fold_vec_perm_cst {
 
 /* Build a VECTOR_CST corresponding to VMODE, and has
@@ -18150,6 +18267,7 @@  fold_const_cc_tests ()
   test_vector_folding ();
   test_vec_duplicate_folding ();
   test_fold_vec_perm_cst::test ();
+  test_operand_equality::test ();
 }
 
 } // namespace selftest
diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
new file mode 100644
index 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c2e24a4973c8539fae
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
@@ -0,0 +1,20 @@ 
+! { dg-do compile { target aarch64-*-* } }
+! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" }
+
+module a
+integer, parameter :: b=3, c=b
+contains
+subroutine d(block)
+integer f, col   , block(:, :, :), e
+do f = 1, c
+   do col = 1, c
+             block(:f,                          :, e()) = do
+     end do
+  end do
+  end
+  end
+
+! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 3 IVs:} ivopts } }
+! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 2 IVs:} 1 ivopts } }
+! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 1 IVs:} 1 ivopts } }
+
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index 27c569c77022227643151fa4a909a95f3d45bf55..fe398d99bcaa9e8fcec45e5e07d133bc393aa4a7 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -962,7 +962,9 @@  enum operand_equal_flag {
   /* In conjunction with OEP_LEXICOGRAPHIC considers names of declarations
      of the same kind.  Used to compare VLA bounds involving parameters
      across redeclarations of the same function.  */
-  OEP_DECL_NAME = 512
+  OEP_DECL_NAME = 512,
+  /* Check for structural equivalence and ignore NOP_CONVERT casts.  */
+  OEP_STRUCTURAL_EQ = 1024
 };
 
 /* Enum and arrays used for tree allocation stats.
diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
index c3218a3e8eedbb8d0a7f14c01eeb069cb6024c29..b05e4ac1e63b5c110707a8a3ed5e614b7ccc702f 100644
--- a/gcc/tree-ssa-loop-ivopts.cc
+++ b/gcc/tree-ssa-loop-ivopts.cc
@@ -1623,8 +1623,8 @@  record_group_use (struct ivopts_data *data, tree *use_p,
 
 	  /* Check if it has the same stripped base and step.  */
 	  if (operand_equal_p (iv->base_object, use->iv->base_object, 0)
-	      && operand_equal_p (iv->step, use->iv->step, 0)
-	      && operand_equal_p (addr_base, use->addr_base, 0))
+	      && operand_equal_p (iv->step, use->iv->step, OEP_STRUCTURAL_EQ)
+	      && operand_equal_p (addr_base, use->addr_base, OEP_STRUCTURAL_EQ))
 	    break;
 	}
       if (i == data->vgroups.length ())