diff mbox series

[ivopts] : use affine_tree when comparing IVs during candidate selection [PR114932]

Message ID patch-18566-tamar@arm.com
State New
Headers show
Series [ivopts] : use affine_tree when comparing IVs during candidate selection [PR114932] | expand

Commit Message

Tamar Christina June 14, 2024, 11:49 a.m. UTC
Hi All,

IVOPTS normally uses affine trees to perform comparisons between different IVs,
but these seem to have been missing in two key spots and instead normal tree
equivalencies used.

In some cases where we have a structural equivalence but not a signedness
equivalencies we end up generating both a signed and unsigned IV for the same
candidate.

This happens quite a lot with fortran but can also happen in C because this came
code is unable to figure out when one expression is a multiple of another.

As an example in the attached testcase we get:

Initial set of candidates:
  cost: 24 (complexity 3)
  reg_cost: 9
  cand_cost: 15
  cand_group_cost: 0 (complexity 3)
  candidates: 1, 6, 8
   group:0 --> iv_cand:6, cost=(0,1)
   group:1 --> iv_cand:1, cost=(0,0)
   group:2 --> iv_cand:8, cost=(0,1)
   group:3 --> iv_cand:8, cost=(0,1)
  invariant variables: 6
  invariant expressions: 1, 2

<Invariant Expressions>:
inv_expr 1:     stride.3_27 * 4
inv_expr 2:     (unsigned long) stride.3_27 * 4

These end up being used in the same group:

Group 1:
cand  cost    compl.  inv.expr.       inv.vars
1     0       0       NIL;    6
2     0       0       NIL;    6
3     0       0       NIL;    6

which ends up with IV opts picking the signed and unsigned IVs:

Improved to:
  cost: 24 (complexity 3)
  reg_cost: 9
  cand_cost: 15
  cand_group_cost: 0 (complexity 3)
  candidates: 1, 6, 8
   group:0 --> iv_cand:6, cost=(0,1)
   group:1 --> iv_cand:1, cost=(0,0)
   group:2 --> iv_cand:8, cost=(0,1)
   group:3 --> iv_cand:8, cost=(0,1)
  invariant variables: 6
  invariant expressions: 1, 2

and so generates the same IV as both signed and unsigned:

;;   basic block 21, loop depth 3, count 214748368 (estimated locally, freq 58.2545), maybe hot
;;    prev block 28, next block 31, flags: (NEW, REACHABLE, VISITED)
;;    pred:       28 [always]  count:23622320 (estimated locally, freq 6.4080) (FALLTHRU,EXECUTABLE)
;;                25 [always]  count:191126046 (estimated locally, freq 51.8465) (FALLTHRU,DFS_BACK,EXECUTABLE)
  # .MEM_66 = PHI <.MEM_34(28), .MEM_22(25)>
  # ivtmp.22_41 = PHI <0(28), ivtmp.22_82(25)>
  # ivtmp.26_51 = PHI <ivtmp.26_55(28), ivtmp.26_72(25)>
  # ivtmp.28_90 = PHI <ivtmp.28_99(28), ivtmp.28_98(25)>

...

;;   basic block 24, loop depth 3, count 214748366 (estimated locally, freq 58.2545), maybe hot
;;    prev block 22, next block 25, flags: (NEW, REACHABLE, VISITED)'
;;    pred:       22 [always]  count:95443719 (estimated locally, freq 25.8909) (FALLTHRU)                                                                                                                                                                                   ;;                21 [33.3% (guessed)]  count:71582790 (estimated locally, freq 19.4182) (TRUE_VALUE,EXECUTABLE)                                                                                                                                                             ;;                31 [33.3% (guessed)]  count:47721860 (estimated locally, freq 12.9455) (TRUE_VALUE,EXECUTABLE)                                                                                                                                                               # .MEM_22 = PHI <.MEM_44(22), .MEM_31(21), .MEM_79(31)>                                                                                                                                                                                                                      ivtmp.22_82 = ivtmp.22_41 + 1;                                                                                                                                                                                                                                               ivtmp.26_72 = ivtmp.26_51 + _80;                                                                                                                                                                                                                                             ivtmp.28_98 = ivtmp.28_90 + _39;  

These two IVs are always used as unsigned, so IV ops generates:

  _73 = stride.3_27 * 4;
  _80 = (unsigned long) _73;
  _54 = (unsigned long) stride.3_27;
  _39 = _54 * 4;

Which means that in e.g. exchange2 we generate a lot of duplicate code.

This is because candidate 6 and 8 are structurally equivalent but have different
signs.

This patch changes it so that if you have two IVs that are affine equivalent to
just pick one over the other.  IV already has code for this, so the patch just
uses affine trees instead of tree for the check.

With it we get:

<Invariant Expressions>:
inv_expr 1:     stride.3_27 * 4

<Group-candidate Costs>:
Group 0:
  cand  cost    compl.  inv.expr.       inv.vars
  5     0       2       NIL;    NIL;
  6     0       3       NIL;    NIL;

Group 1:
  cand  cost    compl.  inv.expr.       inv.vars
  1     0       0       NIL;    6
  2     0       0       NIL;    6
  3     0       0       NIL;    6
  4     0       0       NIL;    6

Initial set of candidates:
  cost: 16 (complexity 3)
  reg_cost: 6
  cand_cost: 10
  cand_group_cost: 0 (complexity 3)
  candidates: 1, 6
   group:0 --> iv_cand:6, cost=(0,3)
   group:1 --> iv_cand:1, cost=(0,0)
  invariant variables: 6
  invariant expressions: 1

The two patches together results in a 10% performance increase in exchange2 in
SPECCPU 2017 and a 4% reduction in binary size and a 5% improvement in compile
time. There's also a 5% performance improvement in fotonik3d and similar
reduction in binary size.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	PR tree-optimization/114932
	* tree-affine.cc (aff_combination_constant_multiple_p): Take zero
	offsets into account.
	* tree-ssa-loop-ivopts.cc (affine_compare_eq): New.
	(record_group_use): Use it.
	(constant_multiple_of): Also check equality under
	aff_combination_constant_multiple_p.

gcc/testsuite/ChangeLog:

	PR tree-optimization/114932
	* gfortran.dg/addressing-modes_2.f90: New test.

---




--

Comments

Richard Biener June 19, 2024, 11:54 a.m. UTC | #1
On Fri, 14 Jun 2024, Tamar Christina wrote:

> Hi All,
> 
> IVOPTS normally uses affine trees to perform comparisons between different IVs,
> but these seem to have been missing in two key spots and instead normal tree
> equivalencies used.
> 
> In some cases where we have a structural equivalence but not a signedness
> equivalencies we end up generating both a signed and unsigned IV for the same
> candidate.
> 
> This happens quite a lot with fortran but can also happen in C because this came
> code is unable to figure out when one expression is a multiple of another.
> 
> As an example in the attached testcase we get:
> 
> Initial set of candidates:
>   cost: 24 (complexity 3)
>   reg_cost: 9
>   cand_cost: 15
>   cand_group_cost: 0 (complexity 3)
>   candidates: 1, 6, 8
>    group:0 --> iv_cand:6, cost=(0,1)
>    group:1 --> iv_cand:1, cost=(0,0)
>    group:2 --> iv_cand:8, cost=(0,1)
>    group:3 --> iv_cand:8, cost=(0,1)
>   invariant variables: 6
>   invariant expressions: 1, 2
> 
> <Invariant Expressions>:
> inv_expr 1:     stride.3_27 * 4
> inv_expr 2:     (unsigned long) stride.3_27 * 4
> 
> These end up being used in the same group:
> 
> Group 1:
> cand  cost    compl.  inv.expr.       inv.vars
> 1     0       0       NIL;    6
> 2     0       0       NIL;    6
> 3     0       0       NIL;    6
> 
> which ends up with IV opts picking the signed and unsigned IVs:
> 
> Improved to:
>   cost: 24 (complexity 3)
>   reg_cost: 9
>   cand_cost: 15
>   cand_group_cost: 0 (complexity 3)
>   candidates: 1, 6, 8
>    group:0 --> iv_cand:6, cost=(0,1)
>    group:1 --> iv_cand:1, cost=(0,0)
>    group:2 --> iv_cand:8, cost=(0,1)
>    group:3 --> iv_cand:8, cost=(0,1)
>   invariant variables: 6
>   invariant expressions: 1, 2
> 
> and so generates the same IV as both signed and unsigned:
> 
> ;;   basic block 21, loop depth 3, count 214748368 (estimated locally, freq 58.2545), maybe hot
> ;;    prev block 28, next block 31, flags: (NEW, REACHABLE, VISITED)
> ;;    pred:       28 [always]  count:23622320 (estimated locally, freq 6.4080) (FALLTHRU,EXECUTABLE)
> ;;                25 [always]  count:191126046 (estimated locally, freq 51.8465) (FALLTHRU,DFS_BACK,EXECUTABLE)
>   # .MEM_66 = PHI <.MEM_34(28), .MEM_22(25)>
>   # ivtmp.22_41 = PHI <0(28), ivtmp.22_82(25)>
>   # ivtmp.26_51 = PHI <ivtmp.26_55(28), ivtmp.26_72(25)>
>   # ivtmp.28_90 = PHI <ivtmp.28_99(28), ivtmp.28_98(25)>
> 
> ...
> 
> ;;   basic block 24, loop depth 3, count 214748366 (estimated locally, freq 58.2545), maybe hot
> ;;    prev block 22, next block 25, flags: (NEW, REACHABLE, VISITED)'
> ;;    pred:       22 [always]  count:95443719 (estimated locally, freq 25.8909) (FALLTHRU)                                                                                                                                                                                   ;;                21 [33.3% (guessed)]  count:71582790 (estimated locally, freq 19.4182) (TRUE_VALUE,EXECUTABLE)                                                                                                                                                             ;;                31 [33.3% (guessed)]  count:47721860 (estimated locally, freq 12.9455) (TRUE_VALUE,EXECUTABLE)                                                                                                                                                               # .MEM_22 = PHI <.MEM_44(22), .MEM_31(21), .MEM_79(31)>                                                                                                                                    
                                                                                   ivtmp.22_82 = ivtmp.22_41 + 1;                                                                                                                                                                                                                                               ivtmp.26_72 = ivtmp.26_51 + _80;                                                                                                                                                                                                                                             ivtmp.28_98 = ivtmp.28_90 + _39;  
> 
> These two IVs are always used as unsigned, so IV ops generates:
> 
>   _73 = stride.3_27 * 4;
>   _80 = (unsigned long) _73;
>   _54 = (unsigned long) stride.3_27;
>   _39 = _54 * 4;
> 
> Which means that in e.g. exchange2 we generate a lot of duplicate code.
> 
> This is because candidate 6 and 8 are structurally equivalent but have different
> signs.
> 
> This patch changes it so that if you have two IVs that are affine equivalent to
> just pick one over the other.  IV already has code for this, so the patch just
> uses affine trees instead of tree for the check.
> 
> With it we get:
> 
> <Invariant Expressions>:
> inv_expr 1:     stride.3_27 * 4
> 
> <Group-candidate Costs>:
> Group 0:
>   cand  cost    compl.  inv.expr.       inv.vars
>   5     0       2       NIL;    NIL;
>   6     0       3       NIL;    NIL;
> 
> Group 1:
>   cand  cost    compl.  inv.expr.       inv.vars
>   1     0       0       NIL;    6
>   2     0       0       NIL;    6
>   3     0       0       NIL;    6
>   4     0       0       NIL;    6
> 
> Initial set of candidates:
>   cost: 16 (complexity 3)
>   reg_cost: 6
>   cand_cost: 10
>   cand_group_cost: 0 (complexity 3)
>   candidates: 1, 6
>    group:0 --> iv_cand:6, cost=(0,3)
>    group:1 --> iv_cand:1, cost=(0,0)
>   invariant variables: 6
>   invariant expressions: 1
> 
> The two patches together results in a 10% performance increase in exchange2 in
> SPECCPU 2017 and a 4% reduction in binary size and a 5% improvement in compile
> time. There's also a 5% performance improvement in fotonik3d and similar
> reduction in binary size.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	PR tree-optimization/114932
> 	* tree-affine.cc (aff_combination_constant_multiple_p): Take zero
> 	offsets into account.
> 	* tree-ssa-loop-ivopts.cc (affine_compare_eq): New.
> 	(record_group_use): Use it.
> 	(constant_multiple_of): Also check equality under
> 	aff_combination_constant_multiple_p.
> 
> gcc/testsuite/ChangeLog:
> 
> 	PR tree-optimization/114932
> 	* gfortran.dg/addressing-modes_2.f90: New test.
> 
> ---
> diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> new file mode 100644
> index 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c2e24a4973c8539fae
> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> @@ -0,0 +1,20 @@
> +! { dg-do compile { target aarch64-*-* } }
> +! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" }
> +
> +module a
> +integer, parameter :: b=3, c=b
> +contains
> +subroutine d(block)
> +integer f, col   , block(:, :, :), e
> +do f = 1, c
> +   do col = 1, c
> +             block(:f,                          :, e()) = do
> +     end do
> +  end do
> +  end
> +  end
> +
> +! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 3 IVs:} ivopts } }
> +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 2 IVs:} 1 ivopts } }
> +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 1 IVs:} 1 ivopts } }
> +
> diff --git a/gcc/tree-affine.cc b/gcc/tree-affine.cc
> index d6309c4390362b680f0aa97a41fac3281ade66fd..b141bf23c1bbea1001b1bb286346890ddeab4096 100644
> --- a/gcc/tree-affine.cc
> +++ b/gcc/tree-affine.cc
> @@ -941,6 +941,13 @@ aff_combination_constant_multiple_p (aff_tree *val, aff_tree *div,
>  				     &mult_set, mult))
>      return false;
>  
> +  /* Everything is a multiple of 0, which means we shoudn't enforce that
> +     mult_set is checked, since that forced the only valid multiple of
> +     val and div to be 0 whereas 1 is also possible.  */
> +  if (known_eq (val->offset, 0)
> +      && known_eq (div->offset, 0))
> +    mult_set = false;
> +

In fact all numbers are possible?  Shouldn't this be better handled
in wide_int_constant_multiple_p by special-casing
known_eq (div, 0) in the known_eq (val, 0) condition by simply
returning 'true' without checking or setting *mult_set?

The docs of wide_int_constant_multiple_p is odd:

/* If VAL != CST * DIV for any constant CST, returns false.

should that be 'If VAL == CST * DIV for no constant CST, returns false.'?
Or s/any/all/?

>    for (i = 0; i < div->n; i++)
>      {
>        class aff_comb_elt *elt
> diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> index 7a277aaf18a9e0a32b8ac0d23332b7cd9945ef98..b11bd62a86092ba972a648764cd2facd9ddb4914 100644
> --- a/gcc/tree-ssa-loop-ivopts.cc
> +++ b/gcc/tree-ssa-loop-ivopts.cc
> @@ -757,6 +757,19 @@ single_dom_exit (class loop *loop)
>    return exit;
>  }
>  
> +/* Compares the given affine tree LEFT with the tree expression RIGHT and return
> +   whether they are the same under affine equality.  */
> +
> +static bool
> +affine_compare_eq (aff_tree &left, tree right)
> +{
> +  aff_tree aff_right;
> +  tree_to_aff_combination (right, TREE_TYPE (right), &aff_right);
> +  aff_combination_scale (&aff_right, -1);
> +  aff_combination_add (&aff_right, &left);
> +  return aff_combination_zero_p (&aff_right);
> +}
> +
>  
>  /* Given a nested expression in ARG consisting of PLUS or MULT try to see if one
>     of the arguments of each expression is a constant and that the type of the
> @@ -1673,6 +1686,9 @@ record_group_use (struct ivopts_data *data, tree *use_p,
>    tree addr_base = NULL;
>    struct iv_group *group = NULL;
>    poly_uint64 addr_offset = 0;
> +  aff_tree iv_step, iv_addr_base;
> +
> +  tree_to_aff_combination (iv->step, TREE_TYPE (iv->step), &iv_step);
>  
>    /* Record non address type use in a new group.  */
>    if (address_p (type))
> @@ -1683,6 +1699,7 @@ record_group_use (struct ivopts_data *data, tree *use_p,
>        tree addr_toffset;
>        split_constant_offset (iv->base, &addr_base, &addr_toffset);
>        addr_offset = int_cst_value (addr_toffset);
> +      tree_to_aff_combination (addr_base, TREE_TYPE (addr_base), &iv_addr_base);
>        for (i = 0; i < data->vgroups.length (); i++)
>  	{
>  	  struct iv_use *use;
> @@ -1694,8 +1711,8 @@ record_group_use (struct ivopts_data *data, tree *use_p,
>  
>  	  /* Check if it has the same stripped base and step.  */
>  	  if (operand_equal_p (iv->base_object, use->iv->base_object, 0)
> -	      && operand_equal_p (iv->step, use->iv->step, 0)
> -	      && operand_equal_p (addr_base, use->addr_base, 0))
> +	      && affine_compare_eq (iv_step, use->iv->step)
> +	      && affine_compare_eq (iv_addr_base, use->addr_base))

There's only this use of addr_base so I think the opportunity is to
turn iv_use->addr_base into aff_tree (even though that's a quite big
representation).

For the testcase, what are the two IVs we are comparing?  I wonder
why you need the affine compare for iv->step?

>  	    break;
>  	}
>        if (i == data->vgroups.length ())
> @@ -2231,6 +2248,14 @@ constant_multiple_of (tree top, tree bot, widest_int *mul)
>        return true;
>      }
>  
> +  aff_tree aff_top, aff_bot;
> +  tree_to_aff_combination (top, TREE_TYPE (top), &aff_top);
> +  tree_to_aff_combination (bot, TREE_TYPE (bot), &aff_bot);
> +  poly_widest_int poly_mul;
> +  if (aff_combination_constant_multiple_p (&aff_top, &aff_bot, &poly_mul)
> +      && poly_mul.is_constant (mul))
> +    return true;
> +

So why does stripping nops not work here?

>    code = TREE_CODE (top);
>    switch (code)
>      {
> 
> 
> 
> 
>
Tamar Christina June 19, 2024, 2:26 p.m. UTC | #2
> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Wednesday, June 19, 2024 12:55 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; bin.cheng@linux.alibaba.com
> Subject: Re: [PATCH][ivopts]: use affine_tree when comparing IVs during candidate
> selection [PR114932]
> 
> On Fri, 14 Jun 2024, Tamar Christina wrote:
> 
> > Hi All,
> >
> > IVOPTS normally uses affine trees to perform comparisons between different IVs,
> > but these seem to have been missing in two key spots and instead normal tree
> > equivalencies used.
> >
> > In some cases where we have a structural equivalence but not a signedness
> > equivalencies we end up generating both a signed and unsigned IV for the same
> > candidate.
> >
> > This happens quite a lot with fortran but can also happen in C because this came
> > code is unable to figure out when one expression is a multiple of another.
> >
> > As an example in the attached testcase we get:
> >
> > Initial set of candidates:
> >   cost: 24 (complexity 3)
> >   reg_cost: 9
> >   cand_cost: 15
> >   cand_group_cost: 0 (complexity 3)
> >   candidates: 1, 6, 8
> >    group:0 --> iv_cand:6, cost=(0,1)
> >    group:1 --> iv_cand:1, cost=(0,0)
> >    group:2 --> iv_cand:8, cost=(0,1)
> >    group:3 --> iv_cand:8, cost=(0,1)
> >   invariant variables: 6
> >   invariant expressions: 1, 2
> >
> > <Invariant Expressions>:
> > inv_expr 1:     stride.3_27 * 4
> > inv_expr 2:     (unsigned long) stride.3_27 * 4
> >
> > These end up being used in the same group:
> >
> > Group 1:
> > cand  cost    compl.  inv.expr.       inv.vars
> > 1     0       0       NIL;    6
> > 2     0       0       NIL;    6
> > 3     0       0       NIL;    6
> >
> > which ends up with IV opts picking the signed and unsigned IVs:
> >
> > Improved to:
> >   cost: 24 (complexity 3)
> >   reg_cost: 9
> >   cand_cost: 15
> >   cand_group_cost: 0 (complexity 3)
> >   candidates: 1, 6, 8
> >    group:0 --> iv_cand:6, cost=(0,1)
> >    group:1 --> iv_cand:1, cost=(0,0)
> >    group:2 --> iv_cand:8, cost=(0,1)
> >    group:3 --> iv_cand:8, cost=(0,1)
> >   invariant variables: 6
> >   invariant expressions: 1, 2
> >
> > and so generates the same IV as both signed and unsigned:
> >
> > ;;   basic block 21, loop depth 3, count 214748368 (estimated locally, freq
> 58.2545), maybe hot
> > ;;    prev block 28, next block 31, flags: (NEW, REACHABLE, VISITED)
> > ;;    pred:       28 [always]  count:23622320 (estimated locally, freq 6.4080)
> (FALLTHRU,EXECUTABLE)
> > ;;                25 [always]  count:191126046 (estimated locally, freq 51.8465)
> (FALLTHRU,DFS_BACK,EXECUTABLE)
> >   # .MEM_66 = PHI <.MEM_34(28), .MEM_22(25)>
> >   # ivtmp.22_41 = PHI <0(28), ivtmp.22_82(25)>
> >   # ivtmp.26_51 = PHI <ivtmp.26_55(28), ivtmp.26_72(25)>
> >   # ivtmp.28_90 = PHI <ivtmp.28_99(28), ivtmp.28_98(25)>
> >
> > ...
> >
> > ;;   basic block 24, loop depth 3, count 214748366 (estimated locally, freq
> 58.2545), maybe hot
> > ;;    prev block 22, next block 25, flags: (NEW, REACHABLE, VISITED)'
> > ;;    pred:       22 [always]  count:95443719 (estimated locally, freq 25.8909)
> (FALLTHRU)
> ;;                21 [33.3% (guessed)]  count:71582790 (estimated locally, freq 19.4182)
> (TRUE_VALUE,EXECUTABLE)
> ;;                31 [33.3% (guessed)]  count:47721860 (estimated locally, freq 12.9455)
> (TRUE_VALUE,EXECUTABLE)
> # .MEM_22 = PHI <.MEM_44(22), .MEM_31(21), .MEM_79(31)>
> 
>                                                                                    ivtmp.22_82 = ivtmp.22_41 + 1;
> ivtmp.26_72 = ivtmp.26_51 + _80;
> ivtmp.28_98 = ivtmp.28_90 + _39;
> >
> > These two IVs are always used as unsigned, so IV ops generates:
> >
> >   _73 = stride.3_27 * 4;
> >   _80 = (unsigned long) _73;
> >   _54 = (unsigned long) stride.3_27;
> >   _39 = _54 * 4;
> >
> > Which means that in e.g. exchange2 we generate a lot of duplicate code.
> >
> > This is because candidate 6 and 8 are structurally equivalent but have different
> > signs.
> >
> > This patch changes it so that if you have two IVs that are affine equivalent to
> > just pick one over the other.  IV already has code for this, so the patch just
> > uses affine trees instead of tree for the check.
> >
> > With it we get:
> >
> > <Invariant Expressions>:
> > inv_expr 1:     stride.3_27 * 4
> >
> > <Group-candidate Costs>:
> > Group 0:
> >   cand  cost    compl.  inv.expr.       inv.vars
> >   5     0       2       NIL;    NIL;
> >   6     0       3       NIL;    NIL;
> >
> > Group 1:
> >   cand  cost    compl.  inv.expr.       inv.vars
> >   1     0       0       NIL;    6
> >   2     0       0       NIL;    6
> >   3     0       0       NIL;    6
> >   4     0       0       NIL;    6
> >
> > Initial set of candidates:
> >   cost: 16 (complexity 3)
> >   reg_cost: 6
> >   cand_cost: 10
> >   cand_group_cost: 0 (complexity 3)
> >   candidates: 1, 6
> >    group:0 --> iv_cand:6, cost=(0,3)
> >    group:1 --> iv_cand:1, cost=(0,0)
> >   invariant variables: 6
> >   invariant expressions: 1
> >
> > The two patches together results in a 10% performance increase in exchange2 in
> > SPECCPU 2017 and a 4% reduction in binary size and a 5% improvement in
> compile
> > time. There's also a 5% performance improvement in fotonik3d and similar
> > reduction in binary size.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	PR tree-optimization/114932
> > 	* tree-affine.cc (aff_combination_constant_multiple_p): Take zero
> > 	offsets into account.
> > 	* tree-ssa-loop-ivopts.cc (affine_compare_eq): New.
> > 	(record_group_use): Use it.
> > 	(constant_multiple_of): Also check equality under
> > 	aff_combination_constant_multiple_p.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 	PR tree-optimization/114932
> > 	* gfortran.dg/addressing-modes_2.f90: New test.
> >
> > ---
> > diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> > new file mode 100644
> > index
> 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c
> 2e24a4973c8539fae
> > --- /dev/null
> > +++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> > @@ -0,0 +1,20 @@
> > +! { dg-do compile { target aarch64-*-* } }
> > +! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" }
> > +
> > +module a
> > +integer, parameter :: b=3, c=b
> > +contains
> > +subroutine d(block)
> > +integer f, col   , block(:, :, :), e
> > +do f = 1, c
> > +   do col = 1, c
> > +             block(:f,                          :, e()) = do
> > +     end do
> > +  end do
> > +  end
> > +  end
> > +
> > +! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 3 IVs:} ivopts
> } }
> > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 2 IVs:} 1
> ivopts } }
> > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 1 IVs:} 1
> ivopts } }
> > +
> > diff --git a/gcc/tree-affine.cc b/gcc/tree-affine.cc
> > index
> d6309c4390362b680f0aa97a41fac3281ade66fd..b141bf23c1bbea1001b1bb28
> 6346890ddeab4096 100644
> > --- a/gcc/tree-affine.cc
> > +++ b/gcc/tree-affine.cc
> > @@ -941,6 +941,13 @@ aff_combination_constant_multiple_p (aff_tree *val,
> aff_tree *div,
> >  				     &mult_set, mult))
> >      return false;
> >
> > +  /* Everything is a multiple of 0, which means we shoudn't enforce that
> > +     mult_set is checked, since that forced the only valid multiple of
> > +     val and div to be 0 whereas 1 is also possible.  */
> > +  if (known_eq (val->offset, 0)
> > +      && known_eq (div->offset, 0))
> > +    mult_set = false;
> > +
> 
> In fact all numbers are possible?  Shouldn't this be better handled
> in wide_int_constant_multiple_p by special-casing
> known_eq (div, 0) in the known_eq (val, 0) condition by simply
> returning 'true' without checking or setting *mult_set?
> 
> The docs of wide_int_constant_multiple_p is odd:
> 
> /* If VAL != CST * DIV for any constant CST, returns false.
> 
> should that be 'If VAL == CST * DIV for no constant CST, returns false.'?
> Or s/any/all/?
> 

Yeah, that condition would always be false.

> >    for (i = 0; i < div->n; i++)
> >      {
> >        class aff_comb_elt *elt
> > diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> > index
> 7a277aaf18a9e0a32b8ac0d23332b7cd9945ef98..b11bd62a86092ba972a6487
> 64cd2facd9ddb4914 100644
> > --- a/gcc/tree-ssa-loop-ivopts.cc
> > +++ b/gcc/tree-ssa-loop-ivopts.cc
> > @@ -757,6 +757,19 @@ single_dom_exit (class loop *loop)
> >    return exit;
> >  }
> >
> > +/* Compares the given affine tree LEFT with the tree expression RIGHT and
> return
> > +   whether they are the same under affine equality.  */
> > +
> > +static bool
> > +affine_compare_eq (aff_tree &left, tree right)
> > +{
> > +  aff_tree aff_right;
> > +  tree_to_aff_combination (right, TREE_TYPE (right), &aff_right);
> > +  aff_combination_scale (&aff_right, -1);
> > +  aff_combination_add (&aff_right, &left);
> > +  return aff_combination_zero_p (&aff_right);
> > +}
> > +
> >
> >  /* Given a nested expression in ARG consisting of PLUS or MULT try to see if one
> >     of the arguments of each expression is a constant and that the type of the
> > @@ -1673,6 +1686,9 @@ record_group_use (struct ivopts_data *data, tree
> *use_p,
> >    tree addr_base = NULL;
> >    struct iv_group *group = NULL;
> >    poly_uint64 addr_offset = 0;
> > +  aff_tree iv_step, iv_addr_base;
> > +
> > +  tree_to_aff_combination (iv->step, TREE_TYPE (iv->step), &iv_step);
> >
> >    /* Record non address type use in a new group.  */
> >    if (address_p (type))
> > @@ -1683,6 +1699,7 @@ record_group_use (struct ivopts_data *data, tree
> *use_p,
> >        tree addr_toffset;
> >        split_constant_offset (iv->base, &addr_base, &addr_toffset);
> >        addr_offset = int_cst_value (addr_toffset);
> > +      tree_to_aff_combination (addr_base, TREE_TYPE (addr_base),
> &iv_addr_base);
> >        for (i = 0; i < data->vgroups.length (); i++)
> >  	{
> >  	  struct iv_use *use;
> > @@ -1694,8 +1711,8 @@ record_group_use (struct ivopts_data *data, tree
> *use_p,
> >
> >  	  /* Check if it has the same stripped base and step.  */
> >  	  if (operand_equal_p (iv->base_object, use->iv->base_object, 0)
> > -	      && operand_equal_p (iv->step, use->iv->step, 0)
> > -	      && operand_equal_p (addr_base, use->addr_base, 0))
> > +	      && affine_compare_eq (iv_step, use->iv->step)
> > +	      && affine_compare_eq (iv_addr_base, use->addr_base))
> 
> There's only this use of addr_base so I think the opportunity is to
> turn iv_use->addr_base into aff_tree (even though that's a quite big
> representation).

Ah, true.  Will do.

> 
> For the testcase, what are the two IVs we are comparing?  I wonder
> why you need the affine compare for iv->step?
> 

Because step builds up the IV expressions and can also be an expression.

In this case:

>>> p debug (iv->step)
((unsigned long) stride.3_27) * 4
>>> p debug (use->iv->step)
(sizetype) (stride.3_27 * 4)

Note that the original expressions were both unsigned, but the STRIP_NOPS
made one signed.   They still wouldn't have compared equal though due to
the different cast locations.

For completeness the base here is

>>> p debug (use->addr_base)
(integer(kind=4) *) ((((((sizetype) prephitmp_64 - (sizetype) stride.3_27) - (sizetype) stride.5_29) + (sizetype) _7) + (sizetype) stride.3_27) * 4 + (sizetype) block.0_26)
$9 = void

>>> p debug (addr_base)
(integer(kind=4) *) ((((((sizetype) prephitmp_64 - (sizetype) stride.3_27) - (sizetype) stride.5_29) + (sizetype) _7) + (sizetype) stride.3_27) * 4 + (sizetype) block.0_26)
$10 = void
>

> >  	    break;
> >  	}
> >        if (i == data->vgroups.length ())
> > @@ -2231,6 +2248,14 @@ constant_multiple_of (tree top, tree bot, widest_int
> *mul)
> >        return true;
> >      }
> >
> > +  aff_tree aff_top, aff_bot;
> > +  tree_to_aff_combination (top, TREE_TYPE (top), &aff_top);
> > +  tree_to_aff_combination (bot, TREE_TYPE (bot), &aff_bot);
> > +  poly_widest_int poly_mul;
> > +  if (aff_combination_constant_multiple_p (&aff_top, &aff_bot, &poly_mul)
> > +      && poly_mul.is_constant (mul))
> > +    return true;
> > +
> 
> So why does stripping nops not work here?

So this is where we compare different IV expressions to determine which
IVs compute the same thing and thus can be in the same group.

The STRIP_NOPS don't work because while the incoming types are the same
the casts are different.  So:

>>> p debug (ustep)
(unsigned long) stride.3_27 * 4
$3 = void
>>> p debug (cstep)
(unsigned long) (stride.3_27 * 4)
$4 = void

Which is of course stripped to:

>>> p debug (top)
(unsigned long) stride.3_27 * 4
$1 = void
>>> p debug (bot)
stride.3_27 * 4

Both of these compute the same thing so by doing the affine compare they
end up in the same IV groups and can be costed.  Later during candidate selection
these are the steps we're comparing to see if the candidate is the same invariant.

The full list is:

<IV Groups>:
Group 0:
  Type: REFERENCE ADDRESS
  Use 0.0:
    At stmt:    *_4 = _33;
    At pos:     *_4
    IV struct:
      Type:     integer(kind=4) *
      Base:     (integer(kind=4) *) ((((unsigned long) stride.3_27 + (unsigned long) _36) * 4 + (unsigned long) block.0_26) + 4)
      Step:     (sizetype) (stride.3_27 * 4)
      Object:   (void *) block.0_26
      Biv:      N
      Overflowness wrto loop niter:     Overflow
  Use 0.1:
    At stmt:    *_78 = _33;
    At pos:     *_78
    IV struct:
      Type:     integer(kind=4) *
      Base:     (integer(kind=4) *) ((((unsigned long) stride.3_27 + (unsigned long) _36) * 4 + (unsigned long) block.0_26) + 8)
      Step:     (unsigned long) stride.3_27 * 4
      Object:   (void *) block.0_26
      Biv:      N
      Overflowness wrto loop niter:     Overflow
  Use 0.2:
    At stmt:    *_45 = _33;
    At pos:     *_45
    IV struct:
      Type:     integer(kind=4) *
      Base:     (integer(kind=4) *) ((((unsigned long) stride.3_27 + (unsigned long) _36) * 4 + (unsigned long) block.0_26) + 12)
      Step:     (unsigned long) stride.3_27 * 4
      Object:   (void *) block.0_26
      Biv:      N
      Overflowness wrto loop niter:     Overflow

And all these IVs are the same but with a slightly different base offset.  By doing the affine compare here IV ops sees that
The best candidate is:

Candidate 6:
  Depend on inv.exprs: 1
  Var befor: ivtmp.26_51
  Var after: ivtmp.26_72
  Incr POS: before exit test
  IV struct:
    Type:       unsigned long
    Base:       ((unsigned long) stride.3_27 + (unsigned long) _36) * 4 + (unsigned long) block.0_26
    Step:       (unsigned long) (stride.3_27 * 4)
    Object:     (void *) block.0_26
    Biv:        N
    Overflowness wrto loop niter:       Overflow

And just builds the three IVs from the same candidate.

Does this cover what you wanted?

Cheers,
Tamar

> 
> >    code = TREE_CODE (top);
> >    switch (code)
> >      {
> >
> >
> >
> >
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
Michael Matz June 19, 2024, 2:46 p.m. UTC | #3
Hello,

On Wed, 19 Jun 2024, Tamar Christina wrote:

> So this is where we compare different IV expressions to determine which
> IVs compute the same thing and thus can be in the same group.
> 
> The STRIP_NOPS don't work because while the incoming types are the same
> the casts are different.  So:
> 
> >>> p debug (ustep)
> (unsigned long) stride.3_27 * 4
> $3 = void
> >>> p debug (cstep)
> (unsigned long) (stride.3_27 * 4)
> $4 = void
> 
> Which is of course stripped to:
> 
> >>> p debug (top)
> (unsigned long) stride.3_27 * 4
> $1 = void
> >>> p debug (bot)
> stride.3_27 * 4
> 
> Both of these compute the same thing

In isolation these are _not_ computing the same when strides type is 
smaller than ulong, namely when stride is either negative or larger than 
its max-value/4.  I.e. when comparing IVs not only the overflow behaviour 
for the whole {base,+,step} revolution matters, but also the behaviour on 
the constituent expressions.  (It's possible that stride is known to be 
non-problematic here, I haven't checked.  I was just triggered by the 
claim of same-ness :) )


Ciao,
Michael.
Tamar Christina June 19, 2024, 2:58 p.m. UTC | #4
> -----Original Message-----
> From: Michael Matz <matz@suse.de>
> Sent: Wednesday, June 19, 2024 3:46 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: Richard Biener <rguenther@suse.de>; gcc-patches@gcc.gnu.org; nd
> <nd@arm.com>; bin.cheng@linux.alibaba.com
> Subject: RE: [PATCH][ivopts]: use affine_tree when comparing IVs during candidate
> selection [PR114932]
> 
> Hello,
> 
> On Wed, 19 Jun 2024, Tamar Christina wrote:
> 
> > So this is where we compare different IV expressions to determine which
> > IVs compute the same thing and thus can be in the same group.
> >
> > The STRIP_NOPS don't work because while the incoming types are the same
> > the casts are different.  So:
> >
> > >>> p debug (ustep)
> > (unsigned long) stride.3_27 * 4
> > $3 = void
> > >>> p debug (cstep)
> > (unsigned long) (stride.3_27 * 4)
> > $4 = void
> >
> > Which is of course stripped to:
> >
> > >>> p debug (top)
> > (unsigned long) stride.3_27 * 4
> > $1 = void
> > >>> p debug (bot)
> > stride.3_27 * 4
> >
> > Both of these compute the same thing
> 
> In isolation these are _not_ computing the same when strides type is
> smaller than ulong, namely when stride is either negative or larger than
> its max-value/4.  I.e. when comparing IVs not only the overflow behaviour
> for the whole {base,+,step} revolution matters, but also the behaviour on
> the constituent expressions.  (It's possible that stride is known to be
> non-problematic here, I haven't checked.  I was just triggered by the
> claim of same-ness :) )

The only use of this method is to determine whether the two expressions
can possibly be the same.  After this IVops forcibly converts them to an
unsigned type though an affine fold in get_computation_aff_1.

So in the end it doesn't care about the sign and uses them all as unsigned.

Tamar
> 
> 
> Ciao,
> Michael.
Richard Biener June 20, 2024, 7:48 a.m. UTC | #5
On Wed, 19 Jun 2024, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Wednesday, June 19, 2024 12:55 PM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; bin.cheng@linux.alibaba.com
> > Subject: Re: [PATCH][ivopts]: use affine_tree when comparing IVs during candidate
> > selection [PR114932]
> > 
> > On Fri, 14 Jun 2024, Tamar Christina wrote:
> > 
> > > Hi All,
> > >
> > > IVOPTS normally uses affine trees to perform comparisons between different IVs,
> > > but these seem to have been missing in two key spots and instead normal tree
> > > equivalencies used.
> > >
> > > In some cases where we have a structural equivalence but not a signedness
> > > equivalencies we end up generating both a signed and unsigned IV for the same
> > > candidate.
> > >
> > > This happens quite a lot with fortran but can also happen in C because this came
> > > code is unable to figure out when one expression is a multiple of another.
> > >
> > > As an example in the attached testcase we get:
> > >
> > > Initial set of candidates:
> > >   cost: 24 (complexity 3)
> > >   reg_cost: 9
> > >   cand_cost: 15
> > >   cand_group_cost: 0 (complexity 3)
> > >   candidates: 1, 6, 8
> > >    group:0 --> iv_cand:6, cost=(0,1)
> > >    group:1 --> iv_cand:1, cost=(0,0)
> > >    group:2 --> iv_cand:8, cost=(0,1)
> > >    group:3 --> iv_cand:8, cost=(0,1)
> > >   invariant variables: 6
> > >   invariant expressions: 1, 2
> > >
> > > <Invariant Expressions>:
> > > inv_expr 1:     stride.3_27 * 4
> > > inv_expr 2:     (unsigned long) stride.3_27 * 4
> > >
> > > These end up being used in the same group:
> > >
> > > Group 1:
> > > cand  cost    compl.  inv.expr.       inv.vars
> > > 1     0       0       NIL;    6
> > > 2     0       0       NIL;    6
> > > 3     0       0       NIL;    6
> > >
> > > which ends up with IV opts picking the signed and unsigned IVs:
> > >
> > > Improved to:
> > >   cost: 24 (complexity 3)
> > >   reg_cost: 9
> > >   cand_cost: 15
> > >   cand_group_cost: 0 (complexity 3)
> > >   candidates: 1, 6, 8
> > >    group:0 --> iv_cand:6, cost=(0,1)
> > >    group:1 --> iv_cand:1, cost=(0,0)
> > >    group:2 --> iv_cand:8, cost=(0,1)
> > >    group:3 --> iv_cand:8, cost=(0,1)
> > >   invariant variables: 6
> > >   invariant expressions: 1, 2
> > >
> > > and so generates the same IV as both signed and unsigned:
> > >
> > > ;;   basic block 21, loop depth 3, count 214748368 (estimated locally, freq
> > 58.2545), maybe hot
> > > ;;    prev block 28, next block 31, flags: (NEW, REACHABLE, VISITED)
> > > ;;    pred:       28 [always]  count:23622320 (estimated locally, freq 6.4080)
> > (FALLTHRU,EXECUTABLE)
> > > ;;                25 [always]  count:191126046 (estimated locally, freq 51.8465)
> > (FALLTHRU,DFS_BACK,EXECUTABLE)
> > >   # .MEM_66 = PHI <.MEM_34(28), .MEM_22(25)>
> > >   # ivtmp.22_41 = PHI <0(28), ivtmp.22_82(25)>
> > >   # ivtmp.26_51 = PHI <ivtmp.26_55(28), ivtmp.26_72(25)>
> > >   # ivtmp.28_90 = PHI <ivtmp.28_99(28), ivtmp.28_98(25)>
> > >
> > > ...
> > >
> > > ;;   basic block 24, loop depth 3, count 214748366 (estimated locally, freq
> > 58.2545), maybe hot
> > > ;;    prev block 22, next block 25, flags: (NEW, REACHABLE, VISITED)'
> > > ;;    pred:       22 [always]  count:95443719 (estimated locally, freq 25.8909)
> > (FALLTHRU)
> > ;;                21 [33.3% (guessed)]  count:71582790 (estimated locally, freq 19.4182)
> > (TRUE_VALUE,EXECUTABLE)
> > ;;                31 [33.3% (guessed)]  count:47721860 (estimated locally, freq 12.9455)
> > (TRUE_VALUE,EXECUTABLE)
> > # .MEM_22 = PHI <.MEM_44(22), .MEM_31(21), .MEM_79(31)>
> > 
> >                                                                                    ivtmp.22_82 = ivtmp.22_41 + 1;
> > ivtmp.26_72 = ivtmp.26_51 + _80;
> > ivtmp.28_98 = ivtmp.28_90 + _39;
> > >
> > > These two IVs are always used as unsigned, so IV ops generates:
> > >
> > >   _73 = stride.3_27 * 4;
> > >   _80 = (unsigned long) _73;
> > >   _54 = (unsigned long) stride.3_27;
> > >   _39 = _54 * 4;
> > >
> > > Which means that in e.g. exchange2 we generate a lot of duplicate code.
> > >
> > > This is because candidate 6 and 8 are structurally equivalent but have different
> > > signs.
> > >
> > > This patch changes it so that if you have two IVs that are affine equivalent to
> > > just pick one over the other.  IV already has code for this, so the patch just
> > > uses affine trees instead of tree for the check.
> > >
> > > With it we get:
> > >
> > > <Invariant Expressions>:
> > > inv_expr 1:     stride.3_27 * 4
> > >
> > > <Group-candidate Costs>:
> > > Group 0:
> > >   cand  cost    compl.  inv.expr.       inv.vars
> > >   5     0       2       NIL;    NIL;
> > >   6     0       3       NIL;    NIL;
> > >
> > > Group 1:
> > >   cand  cost    compl.  inv.expr.       inv.vars
> > >   1     0       0       NIL;    6
> > >   2     0       0       NIL;    6
> > >   3     0       0       NIL;    6
> > >   4     0       0       NIL;    6
> > >
> > > Initial set of candidates:
> > >   cost: 16 (complexity 3)
> > >   reg_cost: 6
> > >   cand_cost: 10
> > >   cand_group_cost: 0 (complexity 3)
> > >   candidates: 1, 6
> > >    group:0 --> iv_cand:6, cost=(0,3)
> > >    group:1 --> iv_cand:1, cost=(0,0)
> > >   invariant variables: 6
> > >   invariant expressions: 1
> > >
> > > The two patches together results in a 10% performance increase in exchange2 in
> > > SPECCPU 2017 and a 4% reduction in binary size and a 5% improvement in
> > compile
> > > time. There's also a 5% performance improvement in fotonik3d and similar
> > > reduction in binary size.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > > 	PR tree-optimization/114932
> > > 	* tree-affine.cc (aff_combination_constant_multiple_p): Take zero
> > > 	offsets into account.
> > > 	* tree-ssa-loop-ivopts.cc (affine_compare_eq): New.
> > > 	(record_group_use): Use it.
> > > 	(constant_multiple_of): Also check equality under
> > > 	aff_combination_constant_multiple_p.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > 	PR tree-optimization/114932
> > > 	* gfortran.dg/addressing-modes_2.f90: New test.
> > >
> > > ---
> > > diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> > b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> > > new file mode 100644
> > > index
> > 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c
> > 2e24a4973c8539fae
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> > > @@ -0,0 +1,20 @@
> > > +! { dg-do compile { target aarch64-*-* } }
> > > +! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" }
> > > +
> > > +module a
> > > +integer, parameter :: b=3, c=b
> > > +contains
> > > +subroutine d(block)
> > > +integer f, col   , block(:, :, :), e
> > > +do f = 1, c
> > > +   do col = 1, c
> > > +             block(:f,                          :, e()) = do
> > > +     end do
> > > +  end do
> > > +  end
> > > +  end
> > > +
> > > +! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 3 IVs:} ivopts
> > } }
> > > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 2 IVs:} 1
> > ivopts } }
> > > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 1 IVs:} 1
> > ivopts } }
> > > +
> > > diff --git a/gcc/tree-affine.cc b/gcc/tree-affine.cc
> > > index
> > d6309c4390362b680f0aa97a41fac3281ade66fd..b141bf23c1bbea1001b1bb28
> > 6346890ddeab4096 100644
> > > --- a/gcc/tree-affine.cc
> > > +++ b/gcc/tree-affine.cc
> > > @@ -941,6 +941,13 @@ aff_combination_constant_multiple_p (aff_tree *val,
> > aff_tree *div,
> > >  				     &mult_set, mult))
> > >      return false;
> > >
> > > +  /* Everything is a multiple of 0, which means we shoudn't enforce that
> > > +     mult_set is checked, since that forced the only valid multiple of
> > > +     val and div to be 0 whereas 1 is also possible.  */
> > > +  if (known_eq (val->offset, 0)
> > > +      && known_eq (div->offset, 0))
> > > +    mult_set = false;
> > > +
> > 
> > In fact all numbers are possible?  Shouldn't this be better handled
> > in wide_int_constant_multiple_p by special-casing
> > known_eq (div, 0) in the known_eq (val, 0) condition by simply
> > returning 'true' without checking or setting *mult_set?
> > 
> > The docs of wide_int_constant_multiple_p is odd:
> > 
> > /* If VAL != CST * DIV for any constant CST, returns false.
> > 
> > should that be 'If VAL == CST * DIV for no constant CST, returns false.'?
> > Or s/any/all/?
> > 
> 
> Yeah, that condition would always be false.

Can you split out a patch to fix this?

> > >    for (i = 0; i < div->n; i++)
> > >      {
> > >        class aff_comb_elt *elt
> > > diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> > > index
> > 7a277aaf18a9e0a32b8ac0d23332b7cd9945ef98..b11bd62a86092ba972a6487
> > 64cd2facd9ddb4914 100644
> > > --- a/gcc/tree-ssa-loop-ivopts.cc
> > > +++ b/gcc/tree-ssa-loop-ivopts.cc
> > > @@ -757,6 +757,19 @@ single_dom_exit (class loop *loop)
> > >    return exit;
> > >  }
> > >
> > > +/* Compares the given affine tree LEFT with the tree expression RIGHT and
> > return
> > > +   whether they are the same under affine equality.  */
> > > +
> > > +static bool
> > > +affine_compare_eq (aff_tree &left, tree right)
> > > +{
> > > +  aff_tree aff_right;
> > > +  tree_to_aff_combination (right, TREE_TYPE (right), &aff_right);
> > > +  aff_combination_scale (&aff_right, -1);
> > > +  aff_combination_add (&aff_right, &left);
> > > +  return aff_combination_zero_p (&aff_right);
> > > +}
> > > +
> > >
> > >  /* Given a nested expression in ARG consisting of PLUS or MULT try to see if one
> > >     of the arguments of each expression is a constant and that the type of the
> > > @@ -1673,6 +1686,9 @@ record_group_use (struct ivopts_data *data, tree
> > *use_p,
> > >    tree addr_base = NULL;
> > >    struct iv_group *group = NULL;
> > >    poly_uint64 addr_offset = 0;
> > > +  aff_tree iv_step, iv_addr_base;
> > > +
> > > +  tree_to_aff_combination (iv->step, TREE_TYPE (iv->step), &iv_step);
> > >
> > >    /* Record non address type use in a new group.  */
> > >    if (address_p (type))
> > > @@ -1683,6 +1699,7 @@ record_group_use (struct ivopts_data *data, tree
> > *use_p,
> > >        tree addr_toffset;
> > >        split_constant_offset (iv->base, &addr_base, &addr_toffset);
> > >        addr_offset = int_cst_value (addr_toffset);
> > > +      tree_to_aff_combination (addr_base, TREE_TYPE (addr_base),
> > &iv_addr_base);
> > >        for (i = 0; i < data->vgroups.length (); i++)
> > >  	{
> > >  	  struct iv_use *use;
> > > @@ -1694,8 +1711,8 @@ record_group_use (struct ivopts_data *data, tree
> > *use_p,
> > >
> > >  	  /* Check if it has the same stripped base and step.  */
> > >  	  if (operand_equal_p (iv->base_object, use->iv->base_object, 0)
> > > -	      && operand_equal_p (iv->step, use->iv->step, 0)
> > > -	      && operand_equal_p (addr_base, use->addr_base, 0))
> > > +	      && affine_compare_eq (iv_step, use->iv->step)
> > > +	      && affine_compare_eq (iv_addr_base, use->addr_base))
> > 
> > There's only this use of addr_base so I think the opportunity is to
> > turn iv_use->addr_base into aff_tree (even though that's a quite big
> > representation).
> 
> Ah, true.  Will do.
> 
> > 
> > For the testcase, what are the two IVs we are comparing?  I wonder
> > why you need the affine compare for iv->step?
> > 
> 
> Because step builds up the IV expressions and can also be an expression.
> 
> In this case:
> 
> >>> p debug (iv->step)
> ((unsigned long) stride.3_27) * 4
> >>> p debug (use->iv->step)
> (sizetype) (stride.3_27 * 4)
> 
> Note that the original expressions were both unsigned, but the STRIP_NOPS
> made one signed.   They still wouldn't have compared equal though due to
> the different cast locations.
> 
> For completeness the base here is
> 
> >>> p debug (use->addr_base)
> (integer(kind=4) *) ((((((sizetype) prephitmp_64 - (sizetype) stride.3_27) - (sizetype) stride.5_29) + (sizetype) _7) + (sizetype) stride.3_27) * 4 + (sizetype) block.0_26)
> $9 = void
> 
> >>> p debug (addr_base)
> (integer(kind=4) *) ((((((sizetype) prephitmp_64 - (sizetype) stride.3_27) - (sizetype) stride.5_29) + (sizetype) _7) + (sizetype) stride.3_27) * 4 + (sizetype) block.0_26)
> $10 = void
> >
> 
> > >  	    break;
> > >  	}
> > >        if (i == data->vgroups.length ())
> > > @@ -2231,6 +2248,14 @@ constant_multiple_of (tree top, tree bot, widest_int
> > *mul)
> > >        return true;
> > >      }
> > >
> > > +  aff_tree aff_top, aff_bot;
> > > +  tree_to_aff_combination (top, TREE_TYPE (top), &aff_top);
> > > +  tree_to_aff_combination (bot, TREE_TYPE (bot), &aff_bot);
> > > +  poly_widest_int poly_mul;
> > > +  if (aff_combination_constant_multiple_p (&aff_top, &aff_bot, &poly_mul)
> > > +      && poly_mul.is_constant (mul))
> > > +    return true;
> > > +
> > 
> > So why does stripping nops not work here?
> 
> So this is where we compare different IV expressions to determine which
> IVs compute the same thing and thus can be in the same group.
> 
> The STRIP_NOPS don't work because while the incoming types are the same
> the casts are different.  So:
> 
> >>> p debug (ustep)
> (unsigned long) stride.3_27 * 4
> $3 = void
> >>> p debug (cstep)
> (unsigned long) (stride.3_27 * 4)
> $4 = void
> 
> Which is of course stripped to:
> 
> >>> p debug (top)
> (unsigned long) stride.3_27 * 4
> $1 = void
> >>> p debug (bot)
> stride.3_27 * 4

So we're expecting constant_multiple_of to compute *mul == 1, proving
equality?

The function seems to try stripping ops from TOP until it reaches
an expression equal to BOT and that's what fails to trigger here.

What I originally wondered was iff we compute the affine combinations
why not use only aff_combination_constant_multiple_p?

I might also point back to the idea I threw in somewhere, adding
OEP_VALUE (or a better name) to the set of flags accepted by
operand_equal_p.  You mentioned hashing IIRC but I don't see the patches
touching hashing?

> Both of these compute the same thing so by doing the affine compare they
> end up in the same IV groups and can be costed.  Later during candidate selection
> these are the steps we're comparing to see if the candidate is the same invariant.
> 
> The full list is:
> 
> <IV Groups>:
> Group 0:
>   Type: REFERENCE ADDRESS
>   Use 0.0:
>     At stmt:    *_4 = _33;
>     At pos:     *_4
>     IV struct:
>       Type:     integer(kind=4) *
>       Base:     (integer(kind=4) *) ((((unsigned long) stride.3_27 + (unsigned long) _36) * 4 + (unsigned long) block.0_26) + 4)
>       Step:     (sizetype) (stride.3_27 * 4)
>       Object:   (void *) block.0_26
>       Biv:      N
>       Overflowness wrto loop niter:     Overflow
>   Use 0.1:
>     At stmt:    *_78 = _33;
>     At pos:     *_78
>     IV struct:
>       Type:     integer(kind=4) *
>       Base:     (integer(kind=4) *) ((((unsigned long) stride.3_27 + (unsigned long) _36) * 4 + (unsigned long) block.0_26) + 8)
>       Step:     (unsigned long) stride.3_27 * 4
>       Object:   (void *) block.0_26
>       Biv:      N
>       Overflowness wrto loop niter:     Overflow
>   Use 0.2:
>     At stmt:    *_45 = _33;
>     At pos:     *_45
>     IV struct:
>       Type:     integer(kind=4) *
>       Base:     (integer(kind=4) *) ((((unsigned long) stride.3_27 + (unsigned long) _36) * 4 + (unsigned long) block.0_26) + 12)
>       Step:     (unsigned long) stride.3_27 * 4
>       Object:   (void *) block.0_26
>       Biv:      N
>       Overflowness wrto loop niter:     Overflow
> 
> And all these IVs are the same but with a slightly different base offset.  By doing the affine compare here IV ops sees that
> The best candidate is:
> 
> Candidate 6:
>   Depend on inv.exprs: 1
>   Var befor: ivtmp.26_51
>   Var after: ivtmp.26_72
>   Incr POS: before exit test
>   IV struct:
>     Type:       unsigned long
>     Base:       ((unsigned long) stride.3_27 + (unsigned long) _36) * 4 + (unsigned long) block.0_26
>     Step:       (unsigned long) (stride.3_27 * 4)
>     Object:     (void *) block.0_26
>     Biv:        N
>     Overflowness wrto loop niter:       Overflow
> 
> And just builds the three IVs from the same candidate.
> 
> Does this cover what you wanted?
> 
> Cheers,
> Tamar
> 
> > 
> > >    code = TREE_CODE (top);
> > >    switch (code)
> > >      {
> > >
> > >
> > >
> > >
> > >
> > 
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
>
Tamar Christina June 24, 2024, 1:30 p.m. UTC | #6
> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Thursday, June 20, 2024 8:49 AM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; bin.cheng@linux.alibaba.com
> Subject: RE: [PATCH][ivopts]: use affine_tree when comparing IVs during candidate
> selection [PR114932]
> 
> On Wed, 19 Jun 2024, Tamar Christina wrote:
> 
> > > -----Original Message-----
> > > From: Richard Biener <rguenther@suse.de>
> > > Sent: Wednesday, June 19, 2024 12:55 PM
> > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>;
> bin.cheng@linux.alibaba.com
> > > Subject: Re: [PATCH][ivopts]: use affine_tree when comparing IVs during
> candidate
> > > selection [PR114932]
> > >
> > > On Fri, 14 Jun 2024, Tamar Christina wrote:
> > >
> > > > Hi All,
> > > >
> > > > IVOPTS normally uses affine trees to perform comparisons between different
> IVs,
> > > > but these seem to have been missing in two key spots and instead normal
> tree
> > > > equivalencies used.
> > > >
> > > > In some cases where we have a structural equivalence but not a signedness
> > > > equivalencies we end up generating both a signed and unsigned IV for the
> same
> > > > candidate.
> > > >
> > > > This happens quite a lot with fortran but can also happen in C because this
> came
> > > > code is unable to figure out when one expression is a multiple of another.
> > > >
> > > > As an example in the attached testcase we get:
> > > >
> > > > Initial set of candidates:
> > > >   cost: 24 (complexity 3)
> > > >   reg_cost: 9
> > > >   cand_cost: 15
> > > >   cand_group_cost: 0 (complexity 3)
> > > >   candidates: 1, 6, 8
> > > >    group:0 --> iv_cand:6, cost=(0,1)
> > > >    group:1 --> iv_cand:1, cost=(0,0)
> > > >    group:2 --> iv_cand:8, cost=(0,1)
> > > >    group:3 --> iv_cand:8, cost=(0,1)
> > > >   invariant variables: 6
> > > >   invariant expressions: 1, 2
> > > >
> > > > <Invariant Expressions>:
> > > > inv_expr 1:     stride.3_27 * 4
> > > > inv_expr 2:     (unsigned long) stride.3_27 * 4
> > > >
> > > > These end up being used in the same group:
> > > >
> > > > Group 1:
> > > > cand  cost    compl.  inv.expr.       inv.vars
> > > > 1     0       0       NIL;    6
> > > > 2     0       0       NIL;    6
> > > > 3     0       0       NIL;    6
> > > >
> > > > which ends up with IV opts picking the signed and unsigned IVs:
> > > >
> > > > Improved to:
> > > >   cost: 24 (complexity 3)
> > > >   reg_cost: 9
> > > >   cand_cost: 15
> > > >   cand_group_cost: 0 (complexity 3)
> > > >   candidates: 1, 6, 8
> > > >    group:0 --> iv_cand:6, cost=(0,1)
> > > >    group:1 --> iv_cand:1, cost=(0,0)
> > > >    group:2 --> iv_cand:8, cost=(0,1)
> > > >    group:3 --> iv_cand:8, cost=(0,1)
> > > >   invariant variables: 6
> > > >   invariant expressions: 1, 2
> > > >
> > > > and so generates the same IV as both signed and unsigned:
> > > >
> > > > ;;   basic block 21, loop depth 3, count 214748368 (estimated locally, freq
> > > 58.2545), maybe hot
> > > > ;;    prev block 28, next block 31, flags: (NEW, REACHABLE, VISITED)
> > > > ;;    pred:       28 [always]  count:23622320 (estimated locally, freq 6.4080)
> > > (FALLTHRU,EXECUTABLE)
> > > > ;;                25 [always]  count:191126046 (estimated locally, freq 51.8465)
> > > (FALLTHRU,DFS_BACK,EXECUTABLE)
> > > >   # .MEM_66 = PHI <.MEM_34(28), .MEM_22(25)>
> > > >   # ivtmp.22_41 = PHI <0(28), ivtmp.22_82(25)>
> > > >   # ivtmp.26_51 = PHI <ivtmp.26_55(28), ivtmp.26_72(25)>
> > > >   # ivtmp.28_90 = PHI <ivtmp.28_99(28), ivtmp.28_98(25)>
> > > >
> > > > ...
> > > >
> > > > ;;   basic block 24, loop depth 3, count 214748366 (estimated locally, freq
> > > 58.2545), maybe hot
> > > > ;;    prev block 22, next block 25, flags: (NEW, REACHABLE, VISITED)'
> > > > ;;    pred:       22 [always]  count:95443719 (estimated locally, freq 25.8909)
> > > (FALLTHRU)
> > > ;;                21 [33.3% (guessed)]  count:71582790 (estimated locally, freq
> 19.4182)
> > > (TRUE_VALUE,EXECUTABLE)
> > > ;;                31 [33.3% (guessed)]  count:47721860 (estimated locally, freq
> 12.9455)
> > > (TRUE_VALUE,EXECUTABLE)
> > > # .MEM_22 = PHI <.MEM_44(22), .MEM_31(21), .MEM_79(31)>
> > >
> > >                                                                                    ivtmp.22_82 = ivtmp.22_41 + 1;
> > > ivtmp.26_72 = ivtmp.26_51 + _80;
> > > ivtmp.28_98 = ivtmp.28_90 + _39;
> > > >
> > > > These two IVs are always used as unsigned, so IV ops generates:
> > > >
> > > >   _73 = stride.3_27 * 4;
> > > >   _80 = (unsigned long) _73;
> > > >   _54 = (unsigned long) stride.3_27;
> > > >   _39 = _54 * 4;
> > > >
> > > > Which means that in e.g. exchange2 we generate a lot of duplicate code.
> > > >
> > > > This is because candidate 6 and 8 are structurally equivalent but have
> different
> > > > signs.
> > > >
> > > > This patch changes it so that if you have two IVs that are affine equivalent to
> > > > just pick one over the other.  IV already has code for this, so the patch just
> > > > uses affine trees instead of tree for the check.
> > > >
> > > > With it we get:
> > > >
> > > > <Invariant Expressions>:
> > > > inv_expr 1:     stride.3_27 * 4
> > > >
> > > > <Group-candidate Costs>:
> > > > Group 0:
> > > >   cand  cost    compl.  inv.expr.       inv.vars
> > > >   5     0       2       NIL;    NIL;
> > > >   6     0       3       NIL;    NIL;
> > > >
> > > > Group 1:
> > > >   cand  cost    compl.  inv.expr.       inv.vars
> > > >   1     0       0       NIL;    6
> > > >   2     0       0       NIL;    6
> > > >   3     0       0       NIL;    6
> > > >   4     0       0       NIL;    6
> > > >
> > > > Initial set of candidates:
> > > >   cost: 16 (complexity 3)
> > > >   reg_cost: 6
> > > >   cand_cost: 10
> > > >   cand_group_cost: 0 (complexity 3)
> > > >   candidates: 1, 6
> > > >    group:0 --> iv_cand:6, cost=(0,3)
> > > >    group:1 --> iv_cand:1, cost=(0,0)
> > > >   invariant variables: 6
> > > >   invariant expressions: 1
> > > >
> > > > The two patches together results in a 10% performance increase in exchange2
> in
> > > > SPECCPU 2017 and a 4% reduction in binary size and a 5% improvement in
> > > compile
> > > > time. There's also a 5% performance improvement in fotonik3d and similar
> > > > reduction in binary size.
> > > >
> > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > >
> > > > Ok for master?
> > > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > 	PR tree-optimization/114932
> > > > 	* tree-affine.cc (aff_combination_constant_multiple_p): Take zero
> > > > 	offsets into account.
> > > > 	* tree-ssa-loop-ivopts.cc (affine_compare_eq): New.
> > > > 	(record_group_use): Use it.
> > > > 	(constant_multiple_of): Also check equality under
> > > > 	aff_combination_constant_multiple_p.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > 	PR tree-optimization/114932
> > > > 	* gfortran.dg/addressing-modes_2.f90: New test.
> > > >
> > > > ---
> > > > diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> > > b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> > > > new file mode 100644
> > > > index
> > >
> 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c
> > > 2e24a4973c8539fae
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> > > > @@ -0,0 +1,20 @@
> > > > +! { dg-do compile { target aarch64-*-* } }
> > > > +! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" }
> > > > +
> > > > +module a
> > > > +integer, parameter :: b=3, c=b
> > > > +contains
> > > > +subroutine d(block)
> > > > +integer f, col   , block(:, :, :), e
> > > > +do f = 1, c
> > > > +   do col = 1, c
> > > > +             block(:f,                          :, e()) = do
> > > > +     end do
> > > > +  end do
> > > > +  end
> > > > +  end
> > > > +
> > > > +! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 3 IVs:}
> ivopts
> > > } }
> > > > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 2 IVs:}
> 1
> > > ivopts } }
> > > > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 1 IVs:}
> 1
> > > ivopts } }
> > > > +
> > > > diff --git a/gcc/tree-affine.cc b/gcc/tree-affine.cc
> > > > index
> > >
> d6309c4390362b680f0aa97a41fac3281ade66fd..b141bf23c1bbea1001b1bb28
> > > 6346890ddeab4096 100644
> > > > --- a/gcc/tree-affine.cc
> > > > +++ b/gcc/tree-affine.cc
> > > > @@ -941,6 +941,13 @@ aff_combination_constant_multiple_p (aff_tree
> *val,
> > > aff_tree *div,
> > > >  				     &mult_set, mult))
> > > >      return false;
> > > >
> > > > +  /* Everything is a multiple of 0, which means we shoudn't enforce that
> > > > +     mult_set is checked, since that forced the only valid multiple of
> > > > +     val and div to be 0 whereas 1 is also possible.  */
> > > > +  if (known_eq (val->offset, 0)
> > > > +      && known_eq (div->offset, 0))
> > > > +    mult_set = false;
> > > > +
> > >
> > > In fact all numbers are possible?  Shouldn't this be better handled
> > > in wide_int_constant_multiple_p by special-casing
> > > known_eq (div, 0) in the known_eq (val, 0) condition by simply
> > > returning 'true' without checking or setting *mult_set?
> > >
> > > The docs of wide_int_constant_multiple_p is odd:
> > >
> > > /* If VAL != CST * DIV for any constant CST, returns false.
> > >
> > > should that be 'If VAL == CST * DIV for no constant CST, returns false.'?
> > > Or s/any/all/?
> > >
> >
> > Yeah, that condition would always be false.
> 
> Can you split out a patch to fix this?

Doing now.

> 
> > > >    for (i = 0; i < div->n; i++)
> > > >      {
> > > >        class aff_comb_elt *elt
> > > > diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> > > > index
> > >
> 7a277aaf18a9e0a32b8ac0d23332b7cd9945ef98..b11bd62a86092ba972a6487
> > > 64cd2facd9ddb4914 100644
> > > > --- a/gcc/tree-ssa-loop-ivopts.cc
> > > > +++ b/gcc/tree-ssa-loop-ivopts.cc
> > > > @@ -757,6 +757,19 @@ single_dom_exit (class loop *loop)
> > > >    return exit;
> > > >  }
> > > >
> > > > +/* Compares the given affine tree LEFT with the tree expression RIGHT and
> > > return
> > > > +   whether they are the same under affine equality.  */
> > > > +
> > > > +static bool
> > > > +affine_compare_eq (aff_tree &left, tree right)
> > > > +{
> > > > +  aff_tree aff_right;
> > > > +  tree_to_aff_combination (right, TREE_TYPE (right), &aff_right);
> > > > +  aff_combination_scale (&aff_right, -1);
> > > > +  aff_combination_add (&aff_right, &left);
> > > > +  return aff_combination_zero_p (&aff_right);
> > > > +}
> > > > +
> > > >
> > > >  /* Given a nested expression in ARG consisting of PLUS or MULT try to see if
> one
> > > >     of the arguments of each expression is a constant and that the type of the
> > > > @@ -1673,6 +1686,9 @@ record_group_use (struct ivopts_data *data, tree
> > > *use_p,
> > > >    tree addr_base = NULL;
> > > >    struct iv_group *group = NULL;
> > > >    poly_uint64 addr_offset = 0;
> > > > +  aff_tree iv_step, iv_addr_base;
> > > > +
> > > > +  tree_to_aff_combination (iv->step, TREE_TYPE (iv->step), &iv_step);
> > > >
> > > >    /* Record non address type use in a new group.  */
> > > >    if (address_p (type))
> > > > @@ -1683,6 +1699,7 @@ record_group_use (struct ivopts_data *data, tree
> > > *use_p,
> > > >        tree addr_toffset;
> > > >        split_constant_offset (iv->base, &addr_base, &addr_toffset);
> > > >        addr_offset = int_cst_value (addr_toffset);
> > > > +      tree_to_aff_combination (addr_base, TREE_TYPE (addr_base),
> > > &iv_addr_base);
> > > >        for (i = 0; i < data->vgroups.length (); i++)
> > > >  	{
> > > >  	  struct iv_use *use;
> > > > @@ -1694,8 +1711,8 @@ record_group_use (struct ivopts_data *data, tree
> > > *use_p,
> > > >
> > > >  	  /* Check if it has the same stripped base and step.  */
> > > >  	  if (operand_equal_p (iv->base_object, use->iv->base_object, 0)
> > > > -	      && operand_equal_p (iv->step, use->iv->step, 0)
> > > > -	      && operand_equal_p (addr_base, use->addr_base, 0))
> > > > +	      && affine_compare_eq (iv_step, use->iv->step)
> > > > +	      && affine_compare_eq (iv_addr_base, use->addr_base))
> > >
> > > There's only this use of addr_base so I think the opportunity is to
> > > turn iv_use->addr_base into aff_tree (even though that's a quite big
> > > representation).
> >
> > Ah, true.  Will do.
> >
> > >
> > > For the testcase, what are the two IVs we are comparing?  I wonder
> > > why you need the affine compare for iv->step?
> > >
> >
> > Because step builds up the IV expressions and can also be an expression.
> >
> > In this case:
> >
> > >>> p debug (iv->step)
> > ((unsigned long) stride.3_27) * 4
> > >>> p debug (use->iv->step)
> > (sizetype) (stride.3_27 * 4)
> >
> > Note that the original expressions were both unsigned, but the STRIP_NOPS
> > made one signed.   They still wouldn't have compared equal though due to
> > the different cast locations.
> >
> > For completeness the base here is
> >
> > >>> p debug (use->addr_base)
> > (integer(kind=4) *) ((((((sizetype) prephitmp_64 - (sizetype) stride.3_27) -
> (sizetype) stride.5_29) + (sizetype) _7) + (sizetype) stride.3_27) * 4 + (sizetype)
> block.0_26)
> > $9 = void
> >
> > >>> p debug (addr_base)
> > (integer(kind=4) *) ((((((sizetype) prephitmp_64 - (sizetype) stride.3_27) -
> (sizetype) stride.5_29) + (sizetype) _7) + (sizetype) stride.3_27) * 4 + (sizetype)
> block.0_26)
> > $10 = void
> > >
> >
> > > >  	    break;
> > > >  	}
> > > >        if (i == data->vgroups.length ())
> > > > @@ -2231,6 +2248,14 @@ constant_multiple_of (tree top, tree bot,
> widest_int
> > > *mul)
> > > >        return true;
> > > >      }
> > > >
> > > > +  aff_tree aff_top, aff_bot;
> > > > +  tree_to_aff_combination (top, TREE_TYPE (top), &aff_top);
> > > > +  tree_to_aff_combination (bot, TREE_TYPE (bot), &aff_bot);
> > > > +  poly_widest_int poly_mul;
> > > > +  if (aff_combination_constant_multiple_p (&aff_top, &aff_bot, &poly_mul)
> > > > +      && poly_mul.is_constant (mul))
> > > > +    return true;
> > > > +
> > >
> > > So why does stripping nops not work here?
> >
> > So this is where we compare different IV expressions to determine which
> > IVs compute the same thing and thus can be in the same group.
> >
> > The STRIP_NOPS don't work because while the incoming types are the same
> > the casts are different.  So:
> >
> > >>> p debug (ustep)
> > (unsigned long) stride.3_27 * 4
> > $3 = void
> > >>> p debug (cstep)
> > (unsigned long) (stride.3_27 * 4)
> > $4 = void
> >
> > Which is of course stripped to:
> >
> > >>> p debug (top)
> > (unsigned long) stride.3_27 * 4
> > $1 = void
> > >>> p debug (bot)
> > stride.3_27 * 4
> 
> So we're expecting constant_multiple_of to compute *mul == 1, proving
> equality?
> 

Indeed

> The function seems to try stripping ops from TOP until it reaches
> an expression equal to BOT and that's what fails to trigger here.
> 
> What I originally wondered was iff we compute the affine combinations
> why not use only aff_combination_constant_multiple_p?
> 

I think that's probably easier, The rest of the code seems to indeed be
repeating the work of aff_combination_constant_multiple_p.

I can try replacing the whole thing with aff_combination_constant_multiple_p
and see?

> I might also point back to the idea I threw in somewhere, adding
> OEP_VALUE (or a better name) to the set of flags accepted by
> operand_equal_p.  You mentioned hashing IIRC but I don't see the patches
> touching hashing?
> 

Yes, That can indeed be done with this approach.  The hashing was that before I
was trying to prevent the "duplicate" IV expressions from being created in the
first place by modifying get_loop_invariant_expr.

This function looks up if we have already seen a particular IV expression and if
we have it just returns that expression.  However after reading more of the code
I realized this wasn't the right approach, as without also dealing with the candidates
we'd end up creating IV expression that can't be handled by any candidate.

IVops would just give up then.   Reading the code it seems that get_loop_invariant_expr
is just there to prevent blatant duplicates.  i.e. it treats `(signed) a` and `a` as the same.

This is also why I think that everywhere else *has* to continue stripping the expression.

On a note from Richard S that he thought IVopts already had some code to deal with
expressions that differ only in signs led me to take a different approach.

The goal wasn't to remove the different sign/unsigned IV expressions, but instead get
Then to be servable by the same candidates. i.e. we want them in the same candidate
groups and then candidate optimization will just do its thing.

That seemed a more natural fit to how it worked.

Do you want me to try the operand_equal_p approach? Though in this case the issue
is we not only need to know they're equal, but also need to know the scale factor.

get_computation_aff_1 scales the common type IV by the scale we determined,
so I think operand_equal_p would not be very useful here.  But it does look like
constant_multiple_of can just be implemented with aff_combination_constant_multiple_p.

Should I try?

Thanks,
Tamar


> > Both of these compute the same thing so by doing the affine compare they
> > end up in the same IV groups and can be costed.  Later during candidate selection
> > these are the steps we're comparing to see if the candidate is the same invariant.
> >
> > The full list is:
> >
> > <IV Groups>:
> > Group 0:
> >   Type: REFERENCE ADDRESS
> >   Use 0.0:
> >     At stmt:    *_4 = _33;
> >     At pos:     *_4
> >     IV struct:
> >       Type:     integer(kind=4) *
> >       Base:     (integer(kind=4) *) ((((unsigned long) stride.3_27 + (unsigned long)
> _36) * 4 + (unsigned long) block.0_26) + 4)
> >       Step:     (sizetype) (stride.3_27 * 4)
> >       Object:   (void *) block.0_26
> >       Biv:      N
> >       Overflowness wrto loop niter:     Overflow
> >   Use 0.1:
> >     At stmt:    *_78 = _33;
> >     At pos:     *_78
> >     IV struct:
> >       Type:     integer(kind=4) *
> >       Base:     (integer(kind=4) *) ((((unsigned long) stride.3_27 + (unsigned long)
> _36) * 4 + (unsigned long) block.0_26) + 8)
> >       Step:     (unsigned long) stride.3_27 * 4
> >       Object:   (void *) block.0_26
> >       Biv:      N
> >       Overflowness wrto loop niter:     Overflow
> >   Use 0.2:
> >     At stmt:    *_45 = _33;
> >     At pos:     *_45
> >     IV struct:
> >       Type:     integer(kind=4) *
> >       Base:     (integer(kind=4) *) ((((unsigned long) stride.3_27 + (unsigned long)
> _36) * 4 + (unsigned long) block.0_26) + 12)
> >       Step:     (unsigned long) stride.3_27 * 4
> >       Object:   (void *) block.0_26
> >       Biv:      N
> >       Overflowness wrto loop niter:     Overflow
> >
> > And all these IVs are the same but with a slightly different base offset.  By doing
> the affine compare here IV ops sees that
> > The best candidate is:
> >
> > Candidate 6:
> >   Depend on inv.exprs: 1
> >   Var befor: ivtmp.26_51
> >   Var after: ivtmp.26_72
> >   Incr POS: before exit test
> >   IV struct:
> >     Type:       unsigned long
> >     Base:       ((unsigned long) stride.3_27 + (unsigned long) _36) * 4 + (unsigned
> long) block.0_26
> >     Step:       (unsigned long) (stride.3_27 * 4)
> >     Object:     (void *) block.0_26
> >     Biv:        N
> >     Overflowness wrto loop niter:       Overflow
> >
> > And just builds the three IVs from the same candidate.
> >
> > Does this cover what you wanted?
> >
> > Cheers,
> > Tamar
> >
> > >
> > > >    code = TREE_CODE (top);
> > > >    switch (code)
> > > >      {
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> > > --
> > > Richard Biener <rguenther@suse.de>
> > > SUSE Software Solutions Germany GmbH,
> > > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> Nuernberg)
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
Richard Biener June 25, 2024, 1:21 p.m. UTC | #7
On Mon, 24 Jun 2024, Tamar Christina wrote:

> 
> 
> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Thursday, June 20, 2024 8:49 AM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; bin.cheng@linux.alibaba.com
> > Subject: RE: [PATCH][ivopts]: use affine_tree when comparing IVs during candidate
> > selection [PR114932]
> > 
> > On Wed, 19 Jun 2024, Tamar Christina wrote:
> > 
> > > > -----Original Message-----
> > > > From: Richard Biener <rguenther@suse.de>
> > > > Sent: Wednesday, June 19, 2024 12:55 PM
> > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>;
> > bin.cheng@linux.alibaba.com
> > > > Subject: Re: [PATCH][ivopts]: use affine_tree when comparing IVs during
> > candidate
> > > > selection [PR114932]
> > > >
> > > > On Fri, 14 Jun 2024, Tamar Christina wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > IVOPTS normally uses affine trees to perform comparisons between different
> > IVs,
> > > > > but these seem to have been missing in two key spots and instead normal
> > tree
> > > > > equivalencies used.
> > > > >
> > > > > In some cases where we have a structural equivalence but not a signedness
> > > > > equivalencies we end up generating both a signed and unsigned IV for the
> > same
> > > > > candidate.
> > > > >
> > > > > This happens quite a lot with fortran but can also happen in C because this
> > came
> > > > > code is unable to figure out when one expression is a multiple of another.
> > > > >
> > > > > As an example in the attached testcase we get:
> > > > >
> > > > > Initial set of candidates:
> > > > >   cost: 24 (complexity 3)
> > > > >   reg_cost: 9
> > > > >   cand_cost: 15
> > > > >   cand_group_cost: 0 (complexity 3)
> > > > >   candidates: 1, 6, 8
> > > > >    group:0 --> iv_cand:6, cost=(0,1)
> > > > >    group:1 --> iv_cand:1, cost=(0,0)
> > > > >    group:2 --> iv_cand:8, cost=(0,1)
> > > > >    group:3 --> iv_cand:8, cost=(0,1)
> > > > >   invariant variables: 6
> > > > >   invariant expressions: 1, 2
> > > > >
> > > > > <Invariant Expressions>:
> > > > > inv_expr 1:     stride.3_27 * 4
> > > > > inv_expr 2:     (unsigned long) stride.3_27 * 4
> > > > >
> > > > > These end up being used in the same group:
> > > > >
> > > > > Group 1:
> > > > > cand  cost    compl.  inv.expr.       inv.vars
> > > > > 1     0       0       NIL;    6
> > > > > 2     0       0       NIL;    6
> > > > > 3     0       0       NIL;    6
> > > > >
> > > > > which ends up with IV opts picking the signed and unsigned IVs:
> > > > >
> > > > > Improved to:
> > > > >   cost: 24 (complexity 3)
> > > > >   reg_cost: 9
> > > > >   cand_cost: 15
> > > > >   cand_group_cost: 0 (complexity 3)
> > > > >   candidates: 1, 6, 8
> > > > >    group:0 --> iv_cand:6, cost=(0,1)
> > > > >    group:1 --> iv_cand:1, cost=(0,0)
> > > > >    group:2 --> iv_cand:8, cost=(0,1)
> > > > >    group:3 --> iv_cand:8, cost=(0,1)
> > > > >   invariant variables: 6
> > > > >   invariant expressions: 1, 2
> > > > >
> > > > > and so generates the same IV as both signed and unsigned:
> > > > >
> > > > > ;;   basic block 21, loop depth 3, count 214748368 (estimated locally, freq
> > > > 58.2545), maybe hot
> > > > > ;;    prev block 28, next block 31, flags: (NEW, REACHABLE, VISITED)
> > > > > ;;    pred:       28 [always]  count:23622320 (estimated locally, freq 6.4080)
> > > > (FALLTHRU,EXECUTABLE)
> > > > > ;;                25 [always]  count:191126046 (estimated locally, freq 51.8465)
> > > > (FALLTHRU,DFS_BACK,EXECUTABLE)
> > > > >   # .MEM_66 = PHI <.MEM_34(28), .MEM_22(25)>
> > > > >   # ivtmp.22_41 = PHI <0(28), ivtmp.22_82(25)>
> > > > >   # ivtmp.26_51 = PHI <ivtmp.26_55(28), ivtmp.26_72(25)>
> > > > >   # ivtmp.28_90 = PHI <ivtmp.28_99(28), ivtmp.28_98(25)>
> > > > >
> > > > > ...
> > > > >
> > > > > ;;   basic block 24, loop depth 3, count 214748366 (estimated locally, freq
> > > > 58.2545), maybe hot
> > > > > ;;    prev block 22, next block 25, flags: (NEW, REACHABLE, VISITED)'
> > > > > ;;    pred:       22 [always]  count:95443719 (estimated locally, freq 25.8909)
> > > > (FALLTHRU)
> > > > ;;                21 [33.3% (guessed)]  count:71582790 (estimated locally, freq
> > 19.4182)
> > > > (TRUE_VALUE,EXECUTABLE)
> > > > ;;                31 [33.3% (guessed)]  count:47721860 (estimated locally, freq
> > 12.9455)
> > > > (TRUE_VALUE,EXECUTABLE)
> > > > # .MEM_22 = PHI <.MEM_44(22), .MEM_31(21), .MEM_79(31)>
> > > >
> > > >                                                                                    ivtmp.22_82 = ivtmp.22_41 + 1;
> > > > ivtmp.26_72 = ivtmp.26_51 + _80;
> > > > ivtmp.28_98 = ivtmp.28_90 + _39;
> > > > >
> > > > > These two IVs are always used as unsigned, so IV ops generates:
> > > > >
> > > > >   _73 = stride.3_27 * 4;
> > > > >   _80 = (unsigned long) _73;
> > > > >   _54 = (unsigned long) stride.3_27;
> > > > >   _39 = _54 * 4;
> > > > >
> > > > > Which means that in e.g. exchange2 we generate a lot of duplicate code.
> > > > >
> > > > > This is because candidate 6 and 8 are structurally equivalent but have
> > different
> > > > > signs.
> > > > >
> > > > > This patch changes it so that if you have two IVs that are affine equivalent to
> > > > > just pick one over the other.  IV already has code for this, so the patch just
> > > > > uses affine trees instead of tree for the check.
> > > > >
> > > > > With it we get:
> > > > >
> > > > > <Invariant Expressions>:
> > > > > inv_expr 1:     stride.3_27 * 4
> > > > >
> > > > > <Group-candidate Costs>:
> > > > > Group 0:
> > > > >   cand  cost    compl.  inv.expr.       inv.vars
> > > > >   5     0       2       NIL;    NIL;
> > > > >   6     0       3       NIL;    NIL;
> > > > >
> > > > > Group 1:
> > > > >   cand  cost    compl.  inv.expr.       inv.vars
> > > > >   1     0       0       NIL;    6
> > > > >   2     0       0       NIL;    6
> > > > >   3     0       0       NIL;    6
> > > > >   4     0       0       NIL;    6
> > > > >
> > > > > Initial set of candidates:
> > > > >   cost: 16 (complexity 3)
> > > > >   reg_cost: 6
> > > > >   cand_cost: 10
> > > > >   cand_group_cost: 0 (complexity 3)
> > > > >   candidates: 1, 6
> > > > >    group:0 --> iv_cand:6, cost=(0,3)
> > > > >    group:1 --> iv_cand:1, cost=(0,0)
> > > > >   invariant variables: 6
> > > > >   invariant expressions: 1
> > > > >
> > > > > The two patches together results in a 10% performance increase in exchange2
> > in
> > > > > SPECCPU 2017 and a 4% reduction in binary size and a 5% improvement in
> > > > compile
> > > > > time. There's also a 5% performance improvement in fotonik3d and similar
> > > > > reduction in binary size.
> > > > >
> > > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > > >
> > > > > Ok for master?
> > > > >
> > > > > Thanks,
> > > > > Tamar
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > > 	PR tree-optimization/114932
> > > > > 	* tree-affine.cc (aff_combination_constant_multiple_p): Take zero
> > > > > 	offsets into account.
> > > > > 	* tree-ssa-loop-ivopts.cc (affine_compare_eq): New.
> > > > > 	(record_group_use): Use it.
> > > > > 	(constant_multiple_of): Also check equality under
> > > > > 	aff_combination_constant_multiple_p.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > > 	PR tree-optimization/114932
> > > > > 	* gfortran.dg/addressing-modes_2.f90: New test.
> > > > >
> > > > > ---
> > > > > diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> > > > b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c
> > > > 2e24a4973c8539fae
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> > > > > @@ -0,0 +1,20 @@
> > > > > +! { dg-do compile { target aarch64-*-* } }
> > > > > +! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" }
> > > > > +
> > > > > +module a
> > > > > +integer, parameter :: b=3, c=b
> > > > > +contains
> > > > > +subroutine d(block)
> > > > > +integer f, col   , block(:, :, :), e
> > > > > +do f = 1, c
> > > > > +   do col = 1, c
> > > > > +             block(:f,                          :, e()) = do
> > > > > +     end do
> > > > > +  end do
> > > > > +  end
> > > > > +  end
> > > > > +
> > > > > +! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 3 IVs:}
> > ivopts
> > > > } }
> > > > > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 2 IVs:}
> > 1
> > > > ivopts } }
> > > > > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 1 IVs:}
> > 1
> > > > ivopts } }
> > > > > +
> > > > > diff --git a/gcc/tree-affine.cc b/gcc/tree-affine.cc
> > > > > index
> > > >
> > d6309c4390362b680f0aa97a41fac3281ade66fd..b141bf23c1bbea1001b1bb28
> > > > 6346890ddeab4096 100644
> > > > > --- a/gcc/tree-affine.cc
> > > > > +++ b/gcc/tree-affine.cc
> > > > > @@ -941,6 +941,13 @@ aff_combination_constant_multiple_p (aff_tree
> > *val,
> > > > aff_tree *div,
> > > > >  				     &mult_set, mult))
> > > > >      return false;
> > > > >
> > > > > +  /* Everything is a multiple of 0, which means we shoudn't enforce that
> > > > > +     mult_set is checked, since that forced the only valid multiple of
> > > > > +     val and div to be 0 whereas 1 is also possible.  */
> > > > > +  if (known_eq (val->offset, 0)
> > > > > +      && known_eq (div->offset, 0))
> > > > > +    mult_set = false;
> > > > > +
> > > >
> > > > In fact all numbers are possible?  Shouldn't this be better handled
> > > > in wide_int_constant_multiple_p by special-casing
> > > > known_eq (div, 0) in the known_eq (val, 0) condition by simply
> > > > returning 'true' without checking or setting *mult_set?
> > > >
> > > > The docs of wide_int_constant_multiple_p is odd:
> > > >
> > > > /* If VAL != CST * DIV for any constant CST, returns false.
> > > >
> > > > should that be 'If VAL == CST * DIV for no constant CST, returns false.'?
> > > > Or s/any/all/?
> > > >
> > >
> > > Yeah, that condition would always be false.
> > 
> > Can you split out a patch to fix this?
> 
> Doing now.
> 
> > 
> > > > >    for (i = 0; i < div->n; i++)
> > > > >      {
> > > > >        class aff_comb_elt *elt
> > > > > diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> > > > > index
> > > >
> > 7a277aaf18a9e0a32b8ac0d23332b7cd9945ef98..b11bd62a86092ba972a6487
> > > > 64cd2facd9ddb4914 100644
> > > > > --- a/gcc/tree-ssa-loop-ivopts.cc
> > > > > +++ b/gcc/tree-ssa-loop-ivopts.cc
> > > > > @@ -757,6 +757,19 @@ single_dom_exit (class loop *loop)
> > > > >    return exit;
> > > > >  }
> > > > >
> > > > > +/* Compares the given affine tree LEFT with the tree expression RIGHT and
> > > > return
> > > > > +   whether they are the same under affine equality.  */
> > > > > +
> > > > > +static bool
> > > > > +affine_compare_eq (aff_tree &left, tree right)
> > > > > +{
> > > > > +  aff_tree aff_right;
> > > > > +  tree_to_aff_combination (right, TREE_TYPE (right), &aff_right);
> > > > > +  aff_combination_scale (&aff_right, -1);
> > > > > +  aff_combination_add (&aff_right, &left);
> > > > > +  return aff_combination_zero_p (&aff_right);
> > > > > +}
> > > > > +
> > > > >
> > > > >  /* Given a nested expression in ARG consisting of PLUS or MULT try to see if
> > one
> > > > >     of the arguments of each expression is a constant and that the type of the
> > > > > @@ -1673,6 +1686,9 @@ record_group_use (struct ivopts_data *data, tree
> > > > *use_p,
> > > > >    tree addr_base = NULL;
> > > > >    struct iv_group *group = NULL;
> > > > >    poly_uint64 addr_offset = 0;
> > > > > +  aff_tree iv_step, iv_addr_base;
> > > > > +
> > > > > +  tree_to_aff_combination (iv->step, TREE_TYPE (iv->step), &iv_step);
> > > > >
> > > > >    /* Record non address type use in a new group.  */
> > > > >    if (address_p (type))
> > > > > @@ -1683,6 +1699,7 @@ record_group_use (struct ivopts_data *data, tree
> > > > *use_p,
> > > > >        tree addr_toffset;
> > > > >        split_constant_offset (iv->base, &addr_base, &addr_toffset);
> > > > >        addr_offset = int_cst_value (addr_toffset);
> > > > > +      tree_to_aff_combination (addr_base, TREE_TYPE (addr_base),
> > > > &iv_addr_base);
> > > > >        for (i = 0; i < data->vgroups.length (); i++)
> > > > >  	{
> > > > >  	  struct iv_use *use;
> > > > > @@ -1694,8 +1711,8 @@ record_group_use (struct ivopts_data *data, tree
> > > > *use_p,
> > > > >
> > > > >  	  /* Check if it has the same stripped base and step.  */
> > > > >  	  if (operand_equal_p (iv->base_object, use->iv->base_object, 0)
> > > > > -	      && operand_equal_p (iv->step, use->iv->step, 0)
> > > > > -	      && operand_equal_p (addr_base, use->addr_base, 0))
> > > > > +	      && affine_compare_eq (iv_step, use->iv->step)
> > > > > +	      && affine_compare_eq (iv_addr_base, use->addr_base))
> > > >
> > > > There's only this use of addr_base so I think the opportunity is to
> > > > turn iv_use->addr_base into aff_tree (even though that's a quite big
> > > > representation).
> > >
> > > Ah, true.  Will do.
> > >
> > > >
> > > > For the testcase, what are the two IVs we are comparing?  I wonder
> > > > why you need the affine compare for iv->step?
> > > >
> > >
> > > Because step builds up the IV expressions and can also be an expression.
> > >
> > > In this case:
> > >
> > > >>> p debug (iv->step)
> > > ((unsigned long) stride.3_27) * 4
> > > >>> p debug (use->iv->step)
> > > (sizetype) (stride.3_27 * 4)
> > >
> > > Note that the original expressions were both unsigned, but the STRIP_NOPS
> > > made one signed.   They still wouldn't have compared equal though due to
> > > the different cast locations.
> > >
> > > For completeness the base here is
> > >
> > > >>> p debug (use->addr_base)
> > > (integer(kind=4) *) ((((((sizetype) prephitmp_64 - (sizetype) stride.3_27) -
> > (sizetype) stride.5_29) + (sizetype) _7) + (sizetype) stride.3_27) * 4 + (sizetype)
> > block.0_26)
> > > $9 = void
> > >
> > > >>> p debug (addr_base)
> > > (integer(kind=4) *) ((((((sizetype) prephitmp_64 - (sizetype) stride.3_27) -
> > (sizetype) stride.5_29) + (sizetype) _7) + (sizetype) stride.3_27) * 4 + (sizetype)
> > block.0_26)
> > > $10 = void
> > > >
> > >
> > > > >  	    break;
> > > > >  	}
> > > > >        if (i == data->vgroups.length ())
> > > > > @@ -2231,6 +2248,14 @@ constant_multiple_of (tree top, tree bot,
> > widest_int
> > > > *mul)
> > > > >        return true;
> > > > >      }
> > > > >
> > > > > +  aff_tree aff_top, aff_bot;
> > > > > +  tree_to_aff_combination (top, TREE_TYPE (top), &aff_top);
> > > > > +  tree_to_aff_combination (bot, TREE_TYPE (bot), &aff_bot);
> > > > > +  poly_widest_int poly_mul;
> > > > > +  if (aff_combination_constant_multiple_p (&aff_top, &aff_bot, &poly_mul)
> > > > > +      && poly_mul.is_constant (mul))
> > > > > +    return true;
> > > > > +
> > > >
> > > > So why does stripping nops not work here?
> > >
> > > So this is where we compare different IV expressions to determine which
> > > IVs compute the same thing and thus can be in the same group.
> > >
> > > The STRIP_NOPS don't work because while the incoming types are the same
> > > the casts are different.  So:
> > >
> > > >>> p debug (ustep)
> > > (unsigned long) stride.3_27 * 4
> > > $3 = void
> > > >>> p debug (cstep)
> > > (unsigned long) (stride.3_27 * 4)
> > > $4 = void
> > >
> > > Which is of course stripped to:
> > >
> > > >>> p debug (top)
> > > (unsigned long) stride.3_27 * 4
> > > $1 = void
> > > >>> p debug (bot)
> > > stride.3_27 * 4
> > 
> > So we're expecting constant_multiple_of to compute *mul == 1, proving
> > equality?
> > 
> 
> Indeed
> 
> > The function seems to try stripping ops from TOP until it reaches
> > an expression equal to BOT and that's what fails to trigger here.
> > 
> > What I originally wondered was iff we compute the affine combinations
> > why not use only aff_combination_constant_multiple_p?
> > 
> 
> I think that's probably easier, The rest of the code seems to indeed be
> repeating the work of aff_combination_constant_multiple_p.
> 
> I can try replacing the whole thing with aff_combination_constant_multiple_p
> and see?

Yes.

> > I might also point back to the idea I threw in somewhere, adding
> > OEP_VALUE (or a better name) to the set of flags accepted by
> > operand_equal_p.  You mentioned hashing IIRC but I don't see the patches
> > touching hashing?
> > 
> 
> Yes, That can indeed be done with this approach.  The hashing was that before I
> was trying to prevent the "duplicate" IV expressions from being created in the
> first place by modifying get_loop_invariant_expr.
> 
> This function looks up if we have already seen a particular IV expression and if
> we have it just returns that expression.  However after reading more of the code
> I realized this wasn't the right approach, as without also dealing with the candidates
> we'd end up creating IV expression that can't be handled by any candidate.
> 
> IVops would just give up then.   Reading the code it seems that get_loop_invariant_expr
> is just there to prevent blatant duplicates.  i.e. it treats `(signed) a` and `a` as the same.
> 
> This is also why I think that everywhere else *has* to continue stripping the expression.
> 
> On a note from Richard S that he thought IVopts already had some code to deal with
> expressions that differ only in signs led me to take a different approach.
> 
> The goal wasn't to remove the different sign/unsigned IV expressions, but instead get
> Then to be servable by the same candidates. i.e. we want them in the same candidate
> groups and then candidate optimization will just do its thing.
> 
> That seemed a more natural fit to how it worked.

Yeah, I agree that sounds like the better strathegy.

> Do you want me to try the operand_equal_p approach? Though in this case the issue
> is we not only need to know they're equal, but also need to know the scale factor.

For this case yes, but if you'd keep the code as-is, the equal with scale
factor one case would be fixed.  Not a case with different scale factors
though - but conversions "elsewhere" should be handled via the stripping.
So it would work to simply adjust the operand_equal_p check here?

> get_computation_aff_1 scales the common type IV by the scale we determined,
> so I think operand_equal_p would not be very useful here.  But it does look like
> constant_multiple_of can just be implemented with aff_combination_constant_multiple_p.
> 
> Should I try?

You've had the other places where you replace operand_equal_p with
affine-compute and compare.  As said that has some associated cost
as well as a limit on the number of elements after which it resorts
back to operand_equal_p.  So for strict equality tests implementing
a weaker operand_equal_p might be a better solution.

Richard.

> Thanks,
> Tamar
> 
> 
> > > Both of these compute the same thing so by doing the affine compare they
> > > end up in the same IV groups and can be costed.  Later during candidate selection
> > > these are the steps we're comparing to see if the candidate is the same invariant.
> > >
> > > The full list is:
> > >
> > > <IV Groups>:
> > > Group 0:
> > >   Type: REFERENCE ADDRESS
> > >   Use 0.0:
> > >     At stmt:    *_4 = _33;
> > >     At pos:     *_4
> > >     IV struct:
> > >       Type:     integer(kind=4) *
> > >       Base:     (integer(kind=4) *) ((((unsigned long) stride.3_27 + (unsigned long)
> > _36) * 4 + (unsigned long) block.0_26) + 4)
> > >       Step:     (sizetype) (stride.3_27 * 4)
> > >       Object:   (void *) block.0_26
> > >       Biv:      N
> > >       Overflowness wrto loop niter:     Overflow
> > >   Use 0.1:
> > >     At stmt:    *_78 = _33;
> > >     At pos:     *_78
> > >     IV struct:
> > >       Type:     integer(kind=4) *
> > >       Base:     (integer(kind=4) *) ((((unsigned long) stride.3_27 + (unsigned long)
> > _36) * 4 + (unsigned long) block.0_26) + 8)
> > >       Step:     (unsigned long) stride.3_27 * 4
> > >       Object:   (void *) block.0_26
> > >       Biv:      N
> > >       Overflowness wrto loop niter:     Overflow
> > >   Use 0.2:
> > >     At stmt:    *_45 = _33;
> > >     At pos:     *_45
> > >     IV struct:
> > >       Type:     integer(kind=4) *
> > >       Base:     (integer(kind=4) *) ((((unsigned long) stride.3_27 + (unsigned long)
> > _36) * 4 + (unsigned long) block.0_26) + 12)
> > >       Step:     (unsigned long) stride.3_27 * 4
> > >       Object:   (void *) block.0_26
> > >       Biv:      N
> > >       Overflowness wrto loop niter:     Overflow
> > >
> > > And all these IVs are the same but with a slightly different base offset.  By doing
> > the affine compare here IV ops sees that
> > > The best candidate is:
> > >
> > > Candidate 6:
> > >   Depend on inv.exprs: 1
> > >   Var befor: ivtmp.26_51
> > >   Var after: ivtmp.26_72
> > >   Incr POS: before exit test
> > >   IV struct:
> > >     Type:       unsigned long
> > >     Base:       ((unsigned long) stride.3_27 + (unsigned long) _36) * 4 + (unsigned
> > long) block.0_26
> > >     Step:       (unsigned long) (stride.3_27 * 4)
> > >     Object:     (void *) block.0_26
> > >     Biv:        N
> > >     Overflowness wrto loop niter:       Overflow
> > >
> > > And just builds the three IVs from the same candidate.
> > >
> > > Does this cover what you wanted?
> > >
> > > Cheers,
> > > Tamar
> > >
> > > >
> > > > >    code = TREE_CODE (top);
> > > > >    switch (code)
> > > > >      {
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > > --
> > > > Richard Biener <rguenther@suse.de>
> > > > SUSE Software Solutions Germany GmbH,
> > > > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > Nuernberg)
> > >
> > 
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
>
Tamar Christina July 10, 2024, 10:19 a.m. UTC | #8
> > > I might also point back to the idea I threw in somewhere, adding
> > > OEP_VALUE (or a better name) to the set of flags accepted by
> > > operand_equal_p.  You mentioned hashing IIRC but I don't see the patches
> > > touching hashing?
> > >
> >
> > Yes, That can indeed be done with this approach.  The hashing was that before I
> > was trying to prevent the "duplicate" IV expressions from being created in the
> > first place by modifying get_loop_invariant_expr.
> >
> > This function looks up if we have already seen a particular IV expression and if
> > we have it just returns that expression.  However after reading more of the code
> > I realized this wasn't the right approach, as without also dealing with the
> candidates
> > we'd end up creating IV expression that can't be handled by any candidate.
> >
> > IVops would just give up then.   Reading the code it seems that
> get_loop_invariant_expr
> > is just there to prevent blatant duplicates.  i.e. it treats `(signed) a` and `a` as the
> same.
> >
> > This is also why I think that everywhere else *has* to continue stripping the
> expression.
> >
> > On a note from Richard S that he thought IVopts already had some code to deal
> with
> > expressions that differ only in signs led me to take a different approach.
> >
> > The goal wasn't to remove the different sign/unsigned IV expressions, but
> instead get
> > Then to be servable by the same candidates. i.e. we want them in the same
> candidate
> > groups and then candidate optimization will just do its thing.
> >
> > That seemed a more natural fit to how it worked.
> 
> Yeah, I agree that sounds like the better strathegy.
> 
> > Do you want me to try the operand_equal_p approach? Though in this case the
> issue
> > is we not only need to know they're equal, but also need to know the scale
> factor.
> 
> For this case yes, but if you'd keep the code as-is, the equal with scale
> factor one case would be fixed.  Not a case with different scale factors
> though - but conversions "elsewhere" should be handled via the stripping.
> So it would work to simply adjust the operand_equal_p check here?
> 
> > get_computation_aff_1 scales the common type IV by the scale we determined,
> > so I think operand_equal_p would not be very useful here.  But it does look like
> > constant_multiple_of can just be implemented with
> aff_combination_constant_multiple_p.
> >
> > Should I try?
> 
> You've had the other places where you replace operand_equal_p with
> affine-compute and compare.  As said that has some associated cost
> as well as a limit on the number of elements after which it resorts
> back to operand_equal_p.  So for strict equality tests implementing
> a weaker operand_equal_p might be a better solution.
> 

The structural comparison is implemented as a new mode for operand_equal_p which
compares two expressions ignoring NOP converts (unless their bitsizes differ)
and ignoring constant values, but requiring both operands be a constant.

There is one downside compared to affine comparison, in that this approach does
not deal well with commutative operations. i.e. it does not see a + (b + c) as
equivalent to c + (b + a).

This means we lose out on some of the more complicated addressing modes, but
with so many operations the address will likely be split anyway and we'll deal
with it then.

Bootstrapped Regtested on aarch64-none-linux-gnu,
x86_64-pc-linux-gnu -m32, -m64 and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	PR tree-optimization/114932
	* fold-const.cc (operand_compare::operand_equal_p): Use it.
	(operand_compare::verify_hash_value): Likewise.
	* tree-core.h (enum operand_equal_flag): Add OEP_STRUCTURAL_EQ.
	* tree-ssa-loop-ivopts.cc (record_group_use): Check for structural eq.

gcc/testsuite/ChangeLog:

	PR tree-optimization/114932
	* gfortran.dg/addressing-modes_2.f90: New test.

-- inline copy of --

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 710d697c0217c784b34f9f9f7b00b1945369076a..3d43020541c082c094164724da9d17fbb5793237 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -3191,6 +3191,9 @@ operand_compare::operand_equal_p (const_tree arg0, const_tree arg1,
      precision differences.  */
   if (TREE_CODE (arg0) == INTEGER_CST && TREE_CODE (arg1) == INTEGER_CST)
     {
+      if (flags & OEP_STRUCTURAL_EQ)
+	return true;
+
       /* Address of INTEGER_CST is not defined; check that we did not forget
 	 to drop the OEP_ADDRESS_OF flags.  */
       gcc_checking_assert (!(flags & OEP_ADDRESS_OF));
@@ -3204,7 +3207,8 @@ operand_compare::operand_equal_p (const_tree arg0, const_tree arg1,
 	 because they may change the signedness of the arguments.  As pointers
 	 strictly don't have a signedness, require either two pointers or
 	 two non-pointers as well.  */
-      if (TYPE_UNSIGNED (TREE_TYPE (arg0)) != TYPE_UNSIGNED (TREE_TYPE (arg1))
+      if ((TYPE_UNSIGNED (TREE_TYPE (arg0)) != TYPE_UNSIGNED (TREE_TYPE (arg1))
+	   && !(flags & OEP_STRUCTURAL_EQ))
 	  || POINTER_TYPE_P (TREE_TYPE (arg0))
 			     != POINTER_TYPE_P (TREE_TYPE (arg1)))
 	return false;
@@ -4204,7 +4208,7 @@ operand_compare::verify_hash_value (const_tree arg0, const_tree arg1,
     {
       if (operand_equal_p (arg0, arg1, flags | OEP_NO_HASH_CHECK))
 	{
-	  if (arg0 != arg1 && !(flags & OEP_DECL_NAME))
+	  if (arg0 != arg1 && !(flags & (OEP_DECL_NAME | OEP_STRUCTURAL_EQ)))
 	    {
 	      inchash::hash hstate0 (0), hstate1 (0);
 	      hash_operand (arg0, hstate0, flags | OEP_HASH_CHECK);
diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
new file mode 100644
index 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c2e24a4973c8539fae
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
@@ -0,0 +1,20 @@
+! { dg-do compile { target aarch64-*-* } }
+! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" }
+
+module a
+integer, parameter :: b=3, c=b
+contains
+subroutine d(block)
+integer f, col   , block(:, :, :), e
+do f = 1, c
+   do col = 1, c
+             block(:f,                          :, e()) = do
+     end do
+  end do
+  end
+  end
+
+! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 3 IVs:} ivopts } }
+! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 2 IVs:} 1 ivopts } }
+! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 1 IVs:} 1 ivopts } }
+
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index 27c569c77022227643151fa4a909a95f3d45bf55..fe398d99bcaa9e8fcec45e5e07d133bc393aa4a7 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -962,7 +962,9 @@ enum operand_equal_flag {
   /* In conjunction with OEP_LEXICOGRAPHIC considers names of declarations
      of the same kind.  Used to compare VLA bounds involving parameters
      across redeclarations of the same function.  */
-  OEP_DECL_NAME = 512
+  OEP_DECL_NAME = 512,
+  /* Check for structural equivalence and ignore NOP_CONVERT casts.  */
+  OEP_STRUCTURAL_EQ = 1024
 };
 
 /* Enum and arrays used for tree allocation stats.
diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
index c3218a3e8eedbb8d0a7f14c01eeb069cb6024c29..b05e4ac1e63b5c110707a8a3ed5e614b7ccc702f 100644
--- a/gcc/tree-ssa-loop-ivopts.cc
+++ b/gcc/tree-ssa-loop-ivopts.cc
@@ -1623,8 +1623,8 @@ record_group_use (struct ivopts_data *data, tree *use_p,
 
 	  /* Check if it has the same stripped base and step.  */
 	  if (operand_equal_p (iv->base_object, use->iv->base_object, 0)
-	      && operand_equal_p (iv->step, use->iv->step, 0)
-	      && operand_equal_p (addr_base, use->addr_base, 0))
+	      && operand_equal_p (iv->step, use->iv->step, OEP_STRUCTURAL_EQ)
+	      && operand_equal_p (addr_base, use->addr_base, OEP_STRUCTURAL_EQ))
 	    break;
 	}
       if (i == data->vgroups.length ())
Richard Biener July 11, 2024, 12:09 p.m. UTC | #9
On Wed, 10 Jul 2024, Tamar Christina wrote:

> > > > I might also point back to the idea I threw in somewhere, adding
> > > > OEP_VALUE (or a better name) to the set of flags accepted by
> > > > operand_equal_p.  You mentioned hashing IIRC but I don't see the patches
> > > > touching hashing?
> > > >
> > >
> > > Yes, That can indeed be done with this approach.  The hashing was that before I
> > > was trying to prevent the "duplicate" IV expressions from being created in the
> > > first place by modifying get_loop_invariant_expr.
> > >
> > > This function looks up if we have already seen a particular IV expression and if
> > > we have it just returns that expression.  However after reading more of the code
> > > I realized this wasn't the right approach, as without also dealing with the
> > candidates
> > > we'd end up creating IV expression that can't be handled by any candidate.
> > >
> > > IVops would just give up then.   Reading the code it seems that
> > get_loop_invariant_expr
> > > is just there to prevent blatant duplicates.  i.e. it treats `(signed) a` and `a` as the
> > same.
> > >
> > > This is also why I think that everywhere else *has* to continue stripping the
> > expression.
> > >
> > > On a note from Richard S that he thought IVopts already had some code to deal
> > with
> > > expressions that differ only in signs led me to take a different approach.
> > >
> > > The goal wasn't to remove the different sign/unsigned IV expressions, but
> > instead get
> > > Then to be servable by the same candidates. i.e. we want them in the same
> > candidate
> > > groups and then candidate optimization will just do its thing.
> > >
> > > That seemed a more natural fit to how it worked.
> > 
> > Yeah, I agree that sounds like the better strathegy.
> > 
> > > Do you want me to try the operand_equal_p approach? Though in this case the
> > issue
> > > is we not only need to know they're equal, but also need to know the scale
> > factor.
> > 
> > For this case yes, but if you'd keep the code as-is, the equal with scale
> > factor one case would be fixed.  Not a case with different scale factors
> > though - but conversions "elsewhere" should be handled via the stripping.
> > So it would work to simply adjust the operand_equal_p check here?
> > 
> > > get_computation_aff_1 scales the common type IV by the scale we determined,
> > > so I think operand_equal_p would not be very useful here.  But it does look like
> > > constant_multiple_of can just be implemented with
> > aff_combination_constant_multiple_p.
> > >
> > > Should I try?
> > 
> > You've had the other places where you replace operand_equal_p with
> > affine-compute and compare.  As said that has some associated cost
> > as well as a limit on the number of elements after which it resorts
> > back to operand_equal_p.  So for strict equality tests implementing
> > a weaker operand_equal_p might be a better solution.
> > 
> 
> The structural comparison is implemented as a new mode for operand_equal_p which
> compares two expressions ignoring NOP converts (unless their bitsizes differ)
> and ignoring constant values, but requiring both operands be a constant.
> 
> There is one downside compared to affine comparison, in that this approach does
> not deal well with commutative operations. i.e. it does not see a + (b + c) as
> equivalent to c + (b + a).
> 
> This means we lose out on some of the more complicated addressing modes, but
> with so many operations the address will likely be split anyway and we'll deal
> with it then.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu,
> x86_64-pc-linux-gnu -m32, -m64 and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	PR tree-optimization/114932
> 	* fold-const.cc (operand_compare::operand_equal_p): Use it.
> 	(operand_compare::verify_hash_value): Likewise.
> 	* tree-core.h (enum operand_equal_flag): Add OEP_STRUCTURAL_EQ.
> 	* tree-ssa-loop-ivopts.cc (record_group_use): Check for structural eq.
> 
> gcc/testsuite/ChangeLog:
> 
> 	PR tree-optimization/114932
> 	* gfortran.dg/addressing-modes_2.f90: New test.
> 
> -- inline copy of --
> 
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index 710d697c0217c784b34f9f9f7b00b1945369076a..3d43020541c082c094164724da9d17fbb5793237 100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -3191,6 +3191,9 @@ operand_compare::operand_equal_p (const_tree arg0, const_tree arg1,
>       precision differences.  */
>    if (TREE_CODE (arg0) == INTEGER_CST && TREE_CODE (arg1) == INTEGER_CST)
>      {
> +      if (flags & OEP_STRUCTURAL_EQ)
> +	return true;
> +

Hmm, so you ignore all constants?

>        /* Address of INTEGER_CST is not defined; check that we did not forget
>  	 to drop the OEP_ADDRESS_OF flags.  */
>        gcc_checking_assert (!(flags & OEP_ADDRESS_OF));
> @@ -3204,7 +3207,8 @@ operand_compare::operand_equal_p (const_tree arg0, const_tree arg1,
>  	 because they may change the signedness of the arguments.  As pointers
>  	 strictly don't have a signedness, require either two pointers or
>  	 two non-pointers as well.  */
> -      if (TYPE_UNSIGNED (TREE_TYPE (arg0)) != TYPE_UNSIGNED (TREE_TYPE (arg1))
> +      if ((TYPE_UNSIGNED (TREE_TYPE (arg0)) != TYPE_UNSIGNED (TREE_TYPE (arg1))
> +	   && !(flags & OEP_STRUCTURAL_EQ))
>  	  || POINTER_TYPE_P (TREE_TYPE (arg0))
>  			     != POINTER_TYPE_P (TREE_TYPE (arg1)))

and all sign-conversions?

I was thinking to implement a "value equivalence" check, so
(int)((unsigned)i + (unsigned)j) would be equal to i + j for int i and j.

Doesn't your patch still treat (int)(unsigned)i and i as different?
And won't it treat (unsigned)i / 4 and (int)i / 4 as the same?  I'm not
sure this is the desired semantic for IVOPTs (or a useful one in general).

That said, when OEP_STRUCTURAL_EQ ignores a sign-change it can only
do that for operations where that maintains value-equivalence
(in terms of twos-complement bit representation).

OTOH if you ignore all differences in constants that's another thing.

>  	return false;
> @@ -4204,7 +4208,7 @@ operand_compare::verify_hash_value (const_tree arg0, const_tree arg1,
>      {
>        if (operand_equal_p (arg0, arg1, flags | OEP_NO_HASH_CHECK))
>  	{
> -	  if (arg0 != arg1 && !(flags & OEP_DECL_NAME))
> +	  if (arg0 != arg1 && !(flags & (OEP_DECL_NAME | OEP_STRUCTURAL_EQ)))
>  	    {
>  	      inchash::hash hstate0 (0), hstate1 (0);
>  	      hash_operand (arg0, hstate0, flags | OEP_HASH_CHECK);
> diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> new file mode 100644
> index 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c2e24a4973c8539fae
> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> @@ -0,0 +1,20 @@
> +! { dg-do compile { target aarch64-*-* } }
> +! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" }
> +
> +module a
> +integer, parameter :: b=3, c=b
> +contains
> +subroutine d(block)
> +integer f, col   , block(:, :, :), e
> +do f = 1, c
> +   do col = 1, c
> +             block(:f,                          :, e()) = do
> +     end do
> +  end do
> +  end
> +  end
> +
> +! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 3 IVs:} ivopts } }
> +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 2 IVs:} 1 ivopts } }
> +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 1 IVs:} 1 ivopts } }
> +
> diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> index 27c569c77022227643151fa4a909a95f3d45bf55..fe398d99bcaa9e8fcec45e5e07d133bc393aa4a7 100644
> --- a/gcc/tree-core.h
> +++ b/gcc/tree-core.h
> @@ -962,7 +962,9 @@ enum operand_equal_flag {
>    /* In conjunction with OEP_LEXICOGRAPHIC considers names of declarations
>       of the same kind.  Used to compare VLA bounds involving parameters
>       across redeclarations of the same function.  */
> -  OEP_DECL_NAME = 512
> +  OEP_DECL_NAME = 512,
> +  /* Check for structural equivalence and ignore NOP_CONVERT casts.  */
> +  OEP_STRUCTURAL_EQ = 1024
>  };
>  
>  /* Enum and arrays used for tree allocation stats.
> diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> index c3218a3e8eedbb8d0a7f14c01eeb069cb6024c29..b05e4ac1e63b5c110707a8a3ed5e614b7ccc702f 100644
> --- a/gcc/tree-ssa-loop-ivopts.cc
> +++ b/gcc/tree-ssa-loop-ivopts.cc
> @@ -1623,8 +1623,8 @@ record_group_use (struct ivopts_data *data, tree *use_p,
>  
>  	  /* Check if it has the same stripped base and step.  */
>  	  if (operand_equal_p (iv->base_object, use->iv->base_object, 0)
> -	      && operand_equal_p (iv->step, use->iv->step, 0)
> -	      && operand_equal_p (addr_base, use->addr_base, 0))
> +	      && operand_equal_p (iv->step, use->iv->step, OEP_STRUCTURAL_EQ)
> +	      && operand_equal_p (addr_base, use->addr_base, OEP_STRUCTURAL_EQ))
>  	    break;
>  	}
>        if (i == data->vgroups.length ())
>
Tamar Christina July 11, 2024, 1:32 p.m. UTC | #10
> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Thursday, July 11, 2024 1:10 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; bin.cheng@linux.alibaba.com
> Subject: RE: [PATCH][ivopts]: use affine_tree when comparing IVs during candidate
> selection [PR114932]
> 
> On Wed, 10 Jul 2024, Tamar Christina wrote:
> 
> > > > > I might also point back to the idea I threw in somewhere, adding
> > > > > OEP_VALUE (or a better name) to the set of flags accepted by
> > > > > operand_equal_p.  You mentioned hashing IIRC but I don't see the patches
> > > > > touching hashing?
> > > > >
> > > >
> > > > Yes, That can indeed be done with this approach.  The hashing was that
> before I
> > > > was trying to prevent the "duplicate" IV expressions from being created in the
> > > > first place by modifying get_loop_invariant_expr.
> > > >
> > > > This function looks up if we have already seen a particular IV expression and if
> > > > we have it just returns that expression.  However after reading more of the
> code
> > > > I realized this wasn't the right approach, as without also dealing with the
> > > candidates
> > > > we'd end up creating IV expression that can't be handled by any candidate.
> > > >
> > > > IVops would just give up then.   Reading the code it seems that
> > > get_loop_invariant_expr
> > > > is just there to prevent blatant duplicates.  i.e. it treats `(signed) a` and `a` as
> the
> > > same.
> > > >
> > > > This is also why I think that everywhere else *has* to continue stripping the
> > > expression.
> > > >
> > > > On a note from Richard S that he thought IVopts already had some code to
> deal
> > > with
> > > > expressions that differ only in signs led me to take a different approach.
> > > >
> > > > The goal wasn't to remove the different sign/unsigned IV expressions, but
> > > instead get
> > > > Then to be servable by the same candidates. i.e. we want them in the same
> > > candidate
> > > > groups and then candidate optimization will just do its thing.
> > > >
> > > > That seemed a more natural fit to how it worked.
> > >
> > > Yeah, I agree that sounds like the better strathegy.
> > >
> > > > Do you want me to try the operand_equal_p approach? Though in this case
> the
> > > issue
> > > > is we not only need to know they're equal, but also need to know the scale
> > > factor.
> > >
> > > For this case yes, but if you'd keep the code as-is, the equal with scale
> > > factor one case would be fixed.  Not a case with different scale factors
> > > though - but conversions "elsewhere" should be handled via the stripping.
> > > So it would work to simply adjust the operand_equal_p check here?
> > >
> > > > get_computation_aff_1 scales the common type IV by the scale we
> determined,
> > > > so I think operand_equal_p would not be very useful here.  But it does look
> like
> > > > constant_multiple_of can just be implemented with
> > > aff_combination_constant_multiple_p.
> > > >
> > > > Should I try?
> > >
> > > You've had the other places where you replace operand_equal_p with
> > > affine-compute and compare.  As said that has some associated cost
> > > as well as a limit on the number of elements after which it resorts
> > > back to operand_equal_p.  So for strict equality tests implementing
> > > a weaker operand_equal_p might be a better solution.
> > >
> >
> > The structural comparison is implemented as a new mode for operand_equal_p
> which
> > compares two expressions ignoring NOP converts (unless their bitsizes differ)
> > and ignoring constant values, but requiring both operands be a constant.
> >
> > There is one downside compared to affine comparison, in that this approach
> does
> > not deal well with commutative operations. i.e. it does not see a + (b + c) as
> > equivalent to c + (b + a).
> >
> > This means we lose out on some of the more complicated addressing modes, but
> > with so many operations the address will likely be split anyway and we'll deal
> > with it then.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > x86_64-pc-linux-gnu -m32, -m64 and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	PR tree-optimization/114932
> > 	* fold-const.cc (operand_compare::operand_equal_p): Use it.
> > 	(operand_compare::verify_hash_value): Likewise.
> > 	* tree-core.h (enum operand_equal_flag): Add OEP_STRUCTURAL_EQ.
> > 	* tree-ssa-loop-ivopts.cc (record_group_use): Check for structural eq.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 	PR tree-optimization/114932
> > 	* gfortran.dg/addressing-modes_2.f90: New test.
> >
> > -- inline copy of --
> >
> > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> > index
> 710d697c0217c784b34f9f9f7b00b1945369076a..3d43020541c082c09416472
> 4da9d17fbb5793237 100644
> > --- a/gcc/fold-const.cc
> > +++ b/gcc/fold-const.cc
> > @@ -3191,6 +3191,9 @@ operand_compare::operand_equal_p (const_tree
> arg0, const_tree arg1,
> >       precision differences.  */
> >    if (TREE_CODE (arg0) == INTEGER_CST && TREE_CODE (arg1) == INTEGER_CST)
> >      {
> > +      if (flags & OEP_STRUCTURAL_EQ)
> > +	return true;
> > +
> 
> Hmm, so you ignore all constants?

Yes, because later on it still needs to check that one expression is a multiple of the other.
If not while they're in the same group, they'll get different IVs.

Ideally I'd only check if the two expressions differ only in a constant offset or multiply.
But I have no idea how to do this from operand_equal_p as I'd need to know what the
parent expression is.  I guess I could do extra checks when comparing + and *.

> 
> >        /* Address of INTEGER_CST is not defined; check that we did not forget
> >  	 to drop the OEP_ADDRESS_OF flags.  */
> >        gcc_checking_assert (!(flags & OEP_ADDRESS_OF));
> > @@ -3204,7 +3207,8 @@ operand_compare::operand_equal_p (const_tree
> arg0, const_tree arg1,
> >  	 because they may change the signedness of the arguments.  As pointers
> >  	 strictly don't have a signedness, require either two pointers or
> >  	 two non-pointers as well.  */
> > -      if (TYPE_UNSIGNED (TREE_TYPE (arg0)) != TYPE_UNSIGNED (TREE_TYPE
> (arg1))
> > +      if ((TYPE_UNSIGNED (TREE_TYPE (arg0)) != TYPE_UNSIGNED (TREE_TYPE
> (arg1))
> > +	   && !(flags & OEP_STRUCTURAL_EQ))
> >  	  || POINTER_TYPE_P (TREE_TYPE (arg0))
> >  			     != POINTER_TYPE_P (TREE_TYPE (arg1)))
> 
> and all sign-conversions?

Yes, as long as they don't change bitsize.  That's checked in the lines below.

> 
> I was thinking to implement a "value equivalence" check, so
> (int)((unsigned)i + (unsigned)j) would be equal to i + j for int i and j.
> 

That's not enough for IV ops, what it's after is knowing whether two expressions
are the same but for some constant offset of multiply.  i.e. can one expression
replace the other with come constant known adjustment.

> Doesn't your patch still treat (int)(unsigned)i and i as different?

Sure, but we don't' get those because we've folded everything to
unsigned before in patch 1.

> And won't it treat (unsigned)i / 4 and (int)i / 4 as the same?  I'm not
> sure this is the desired semantic for IVOPTs (or a useful one in general).

Agree, though I didn't expect to have division in addressing mode calculations.
I only expected +,-,*.  Do we actually do /?

> 
> That said, when OEP_STRUCTURAL_EQ ignores a sign-change it can only
> do that for operations where that maintains value-equivalence
> (in terms of twos-complement bit representation).
> 
> OTOH if you ignore all differences in constants that's another thing.

It's a bit of a weird one.  As I mentioned above I only need +,- and *.
But whatever semantics I encode the use of the flag is going to be
specialized.

I can start by figuring out how to ignore the constants only in those
three cases.

Thanks,
Tamar

> 
> >  	return false;
> > @@ -4204,7 +4208,7 @@ operand_compare::verify_hash_value (const_tree
> arg0, const_tree arg1,
> >      {
> >        if (operand_equal_p (arg0, arg1, flags | OEP_NO_HASH_CHECK))
> >  	{
> > -	  if (arg0 != arg1 && !(flags & OEP_DECL_NAME))
> > +	  if (arg0 != arg1 && !(flags & (OEP_DECL_NAME |
> OEP_STRUCTURAL_EQ)))
> >  	    {
> >  	      inchash::hash hstate0 (0), hstate1 (0);
> >  	      hash_operand (arg0, hstate0, flags | OEP_HASH_CHECK);
> > diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> > new file mode 100644
> > index
> 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c
> 2e24a4973c8539fae
> > --- /dev/null
> > +++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> > @@ -0,0 +1,20 @@
> > +! { dg-do compile { target aarch64-*-* } }
> > +! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" }
> > +
> > +module a
> > +integer, parameter :: b=3, c=b
> > +contains
> > +subroutine d(block)
> > +integer f, col   , block(:, :, :), e
> > +do f = 1, c
> > +   do col = 1, c
> > +             block(:f,                          :, e()) = do
> > +     end do
> > +  end do
> > +  end
> > +  end
> > +
> > +! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 3 IVs:} ivopts
> } }
> > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 2 IVs:} 1
> ivopts } }
> > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 1 IVs:} 1
> ivopts } }
> > +
> > diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> > index
> 27c569c77022227643151fa4a909a95f3d45bf55..fe398d99bcaa9e8fcec45e5e0
> 7d133bc393aa4a7 100644
> > --- a/gcc/tree-core.h
> > +++ b/gcc/tree-core.h
> > @@ -962,7 +962,9 @@ enum operand_equal_flag {
> >    /* In conjunction with OEP_LEXICOGRAPHIC considers names of declarations
> >       of the same kind.  Used to compare VLA bounds involving parameters
> >       across redeclarations of the same function.  */
> > -  OEP_DECL_NAME = 512
> > +  OEP_DECL_NAME = 512,
> > +  /* Check for structural equivalence and ignore NOP_CONVERT casts.  */
> > +  OEP_STRUCTURAL_EQ = 1024
> >  };
> >
> >  /* Enum and arrays used for tree allocation stats.
> > diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> > index
> c3218a3e8eedbb8d0a7f14c01eeb069cb6024c29..b05e4ac1e63b5c110707a8a3
> ed5e614b7ccc702f 100644
> > --- a/gcc/tree-ssa-loop-ivopts.cc
> > +++ b/gcc/tree-ssa-loop-ivopts.cc
> > @@ -1623,8 +1623,8 @@ record_group_use (struct ivopts_data *data, tree
> *use_p,
> >
> >  	  /* Check if it has the same stripped base and step.  */
> >  	  if (operand_equal_p (iv->base_object, use->iv->base_object, 0)
> > -	      && operand_equal_p (iv->step, use->iv->step, 0)
> > -	      && operand_equal_p (addr_base, use->addr_base, 0))
> > +	      && operand_equal_p (iv->step, use->iv->step, OEP_STRUCTURAL_EQ)
> > +	      && operand_equal_p (addr_base, use->addr_base,
> OEP_STRUCTURAL_EQ))
> >  	    break;
> >  	}
> >        if (i == data->vgroups.length ())
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
diff mbox series

Patch

diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
new file mode 100644
index 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c2e24a4973c8539fae
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
@@ -0,0 +1,20 @@ 
+! { dg-do compile { target aarch64-*-* } }
+! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" }
+
+module a
+integer, parameter :: b=3, c=b
+contains
+subroutine d(block)
+integer f, col   , block(:, :, :), e
+do f = 1, c
+   do col = 1, c
+             block(:f,                          :, e()) = do
+     end do
+  end do
+  end
+  end
+
+! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 3 IVs:} ivopts } }
+! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 2 IVs:} 1 ivopts } }
+! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 1 IVs:} 1 ivopts } }
+
diff --git a/gcc/tree-affine.cc b/gcc/tree-affine.cc
index d6309c4390362b680f0aa97a41fac3281ade66fd..b141bf23c1bbea1001b1bb286346890ddeab4096 100644
--- a/gcc/tree-affine.cc
+++ b/gcc/tree-affine.cc
@@ -941,6 +941,13 @@  aff_combination_constant_multiple_p (aff_tree *val, aff_tree *div,
 				     &mult_set, mult))
     return false;
 
+  /* Everything is a multiple of 0, which means we shoudn't enforce that
+     mult_set is checked, since that forced the only valid multiple of
+     val and div to be 0 whereas 1 is also possible.  */
+  if (known_eq (val->offset, 0)
+      && known_eq (div->offset, 0))
+    mult_set = false;
+
   for (i = 0; i < div->n; i++)
     {
       class aff_comb_elt *elt
diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
index 7a277aaf18a9e0a32b8ac0d23332b7cd9945ef98..b11bd62a86092ba972a648764cd2facd9ddb4914 100644
--- a/gcc/tree-ssa-loop-ivopts.cc
+++ b/gcc/tree-ssa-loop-ivopts.cc
@@ -757,6 +757,19 @@  single_dom_exit (class loop *loop)
   return exit;
 }
 
+/* Compares the given affine tree LEFT with the tree expression RIGHT and return
+   whether they are the same under affine equality.  */
+
+static bool
+affine_compare_eq (aff_tree &left, tree right)
+{
+  aff_tree aff_right;
+  tree_to_aff_combination (right, TREE_TYPE (right), &aff_right);
+  aff_combination_scale (&aff_right, -1);
+  aff_combination_add (&aff_right, &left);
+  return aff_combination_zero_p (&aff_right);
+}
+
 
 /* Given a nested expression in ARG consisting of PLUS or MULT try to see if one
    of the arguments of each expression is a constant and that the type of the
@@ -1673,6 +1686,9 @@  record_group_use (struct ivopts_data *data, tree *use_p,
   tree addr_base = NULL;
   struct iv_group *group = NULL;
   poly_uint64 addr_offset = 0;
+  aff_tree iv_step, iv_addr_base;
+
+  tree_to_aff_combination (iv->step, TREE_TYPE (iv->step), &iv_step);
 
   /* Record non address type use in a new group.  */
   if (address_p (type))
@@ -1683,6 +1699,7 @@  record_group_use (struct ivopts_data *data, tree *use_p,
       tree addr_toffset;
       split_constant_offset (iv->base, &addr_base, &addr_toffset);
       addr_offset = int_cst_value (addr_toffset);
+      tree_to_aff_combination (addr_base, TREE_TYPE (addr_base), &iv_addr_base);
       for (i = 0; i < data->vgroups.length (); i++)
 	{
 	  struct iv_use *use;
@@ -1694,8 +1711,8 @@  record_group_use (struct ivopts_data *data, tree *use_p,
 
 	  /* Check if it has the same stripped base and step.  */
 	  if (operand_equal_p (iv->base_object, use->iv->base_object, 0)
-	      && operand_equal_p (iv->step, use->iv->step, 0)
-	      && operand_equal_p (addr_base, use->addr_base, 0))
+	      && affine_compare_eq (iv_step, use->iv->step)
+	      && affine_compare_eq (iv_addr_base, use->addr_base))
 	    break;
 	}
       if (i == data->vgroups.length ())
@@ -2231,6 +2248,14 @@  constant_multiple_of (tree top, tree bot, widest_int *mul)
       return true;
     }
 
+  aff_tree aff_top, aff_bot;
+  tree_to_aff_combination (top, TREE_TYPE (top), &aff_top);
+  tree_to_aff_combination (bot, TREE_TYPE (bot), &aff_bot);
+  poly_widest_int poly_mul;
+  if (aff_combination_constant_multiple_p (&aff_top, &aff_bot, &poly_mul)
+      && poly_mul.is_constant (mul))
+    return true;
+
   code = TREE_CODE (top);
   switch (code)
     {