Message ID | patch-18566-tamar@arm.com |
---|---|
State | New |
Headers | show |
Series | [ivopts] : use affine_tree when comparing IVs during candidate selection [PR114932] | expand |
On Fri, 14 Jun 2024, Tamar Christina wrote: > Hi All, > > IVOPTS normally uses affine trees to perform comparisons between different IVs, > but these seem to have been missing in two key spots and instead normal tree > equivalencies used. > > In some cases where we have a structural equivalence but not a signedness > equivalencies we end up generating both a signed and unsigned IV for the same > candidate. > > This happens quite a lot with fortran but can also happen in C because this came > code is unable to figure out when one expression is a multiple of another. > > As an example in the attached testcase we get: > > Initial set of candidates: > cost: 24 (complexity 3) > reg_cost: 9 > cand_cost: 15 > cand_group_cost: 0 (complexity 3) > candidates: 1, 6, 8 > group:0 --> iv_cand:6, cost=(0,1) > group:1 --> iv_cand:1, cost=(0,0) > group:2 --> iv_cand:8, cost=(0,1) > group:3 --> iv_cand:8, cost=(0,1) > invariant variables: 6 > invariant expressions: 1, 2 > > <Invariant Expressions>: > inv_expr 1: stride.3_27 * 4 > inv_expr 2: (unsigned long) stride.3_27 * 4 > > These end up being used in the same group: > > Group 1: > cand cost compl. inv.expr. inv.vars > 1 0 0 NIL; 6 > 2 0 0 NIL; 6 > 3 0 0 NIL; 6 > > which ends up with IV opts picking the signed and unsigned IVs: > > Improved to: > cost: 24 (complexity 3) > reg_cost: 9 > cand_cost: 15 > cand_group_cost: 0 (complexity 3) > candidates: 1, 6, 8 > group:0 --> iv_cand:6, cost=(0,1) > group:1 --> iv_cand:1, cost=(0,0) > group:2 --> iv_cand:8, cost=(0,1) > group:3 --> iv_cand:8, cost=(0,1) > invariant variables: 6 > invariant expressions: 1, 2 > > and so generates the same IV as both signed and unsigned: > > ;; basic block 21, loop depth 3, count 214748368 (estimated locally, freq 58.2545), maybe hot > ;; prev block 28, next block 31, flags: (NEW, REACHABLE, VISITED) > ;; pred: 28 [always] count:23622320 (estimated locally, freq 6.4080) (FALLTHRU,EXECUTABLE) > ;; 25 [always] count:191126046 (estimated locally, freq 51.8465) (FALLTHRU,DFS_BACK,EXECUTABLE) > # .MEM_66 = PHI <.MEM_34(28), .MEM_22(25)> > # ivtmp.22_41 = PHI <0(28), ivtmp.22_82(25)> > # ivtmp.26_51 = PHI <ivtmp.26_55(28), ivtmp.26_72(25)> > # ivtmp.28_90 = PHI <ivtmp.28_99(28), ivtmp.28_98(25)> > > ... > > ;; basic block 24, loop depth 3, count 214748366 (estimated locally, freq 58.2545), maybe hot > ;; prev block 22, next block 25, flags: (NEW, REACHABLE, VISITED)' > ;; pred: 22 [always] count:95443719 (estimated locally, freq 25.8909) (FALLTHRU) ;; 21 [33.3% (guessed)] count:71582790 (estimated locally, freq 19.4182) (TRUE_VALUE,EXECUTABLE) ;; 31 [33.3% (guessed)] count:47721860 (estimated locally, freq 12.9455) (TRUE_VALUE,EXECUTABLE) # .MEM_22 = PHI <.MEM_44(22), .MEM_31(21), .MEM_79(31)> ivtmp.22_82 = ivtmp.22_41 + 1; ivtmp.26_72 = ivtmp.26_51 + _80; ivtmp.28_98 = ivtmp.28_90 + _39; > > These two IVs are always used as unsigned, so IV ops generates: > > _73 = stride.3_27 * 4; > _80 = (unsigned long) _73; > _54 = (unsigned long) stride.3_27; > _39 = _54 * 4; > > Which means that in e.g. exchange2 we generate a lot of duplicate code. > > This is because candidate 6 and 8 are structurally equivalent but have different > signs. > > This patch changes it so that if you have two IVs that are affine equivalent to > just pick one over the other. IV already has code for this, so the patch just > uses affine trees instead of tree for the check. > > With it we get: > > <Invariant Expressions>: > inv_expr 1: stride.3_27 * 4 > > <Group-candidate Costs>: > Group 0: > cand cost compl. inv.expr. inv.vars > 5 0 2 NIL; NIL; > 6 0 3 NIL; NIL; > > Group 1: > cand cost compl. inv.expr. inv.vars > 1 0 0 NIL; 6 > 2 0 0 NIL; 6 > 3 0 0 NIL; 6 > 4 0 0 NIL; 6 > > Initial set of candidates: > cost: 16 (complexity 3) > reg_cost: 6 > cand_cost: 10 > cand_group_cost: 0 (complexity 3) > candidates: 1, 6 > group:0 --> iv_cand:6, cost=(0,3) > group:1 --> iv_cand:1, cost=(0,0) > invariant variables: 6 > invariant expressions: 1 > > The two patches together results in a 10% performance increase in exchange2 in > SPECCPU 2017 and a 4% reduction in binary size and a 5% improvement in compile > time. There's also a 5% performance improvement in fotonik3d and similar > reduction in binary size. > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > Ok for master? > > Thanks, > Tamar > > gcc/ChangeLog: > > PR tree-optimization/114932 > * tree-affine.cc (aff_combination_constant_multiple_p): Take zero > offsets into account. > * tree-ssa-loop-ivopts.cc (affine_compare_eq): New. > (record_group_use): Use it. > (constant_multiple_of): Also check equality under > aff_combination_constant_multiple_p. > > gcc/testsuite/ChangeLog: > > PR tree-optimization/114932 > * gfortran.dg/addressing-modes_2.f90: New test. > > --- > diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 > new file mode 100644 > index 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c2e24a4973c8539fae > --- /dev/null > +++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 > @@ -0,0 +1,20 @@ > +! { dg-do compile { target aarch64-*-* } } > +! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" } > + > +module a > +integer, parameter :: b=3, c=b > +contains > +subroutine d(block) > +integer f, col , block(:, :, :), e > +do f = 1, c > + do col = 1, c > + block(:f, :, e()) = do > + end do > + end do > + end > + end > + > +! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 3 IVs:} ivopts } } > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 2 IVs:} 1 ivopts } } > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 1 IVs:} 1 ivopts } } > + > diff --git a/gcc/tree-affine.cc b/gcc/tree-affine.cc > index d6309c4390362b680f0aa97a41fac3281ade66fd..b141bf23c1bbea1001b1bb286346890ddeab4096 100644 > --- a/gcc/tree-affine.cc > +++ b/gcc/tree-affine.cc > @@ -941,6 +941,13 @@ aff_combination_constant_multiple_p (aff_tree *val, aff_tree *div, > &mult_set, mult)) > return false; > > + /* Everything is a multiple of 0, which means we shoudn't enforce that > + mult_set is checked, since that forced the only valid multiple of > + val and div to be 0 whereas 1 is also possible. */ > + if (known_eq (val->offset, 0) > + && known_eq (div->offset, 0)) > + mult_set = false; > + In fact all numbers are possible? Shouldn't this be better handled in wide_int_constant_multiple_p by special-casing known_eq (div, 0) in the known_eq (val, 0) condition by simply returning 'true' without checking or setting *mult_set? The docs of wide_int_constant_multiple_p is odd: /* If VAL != CST * DIV for any constant CST, returns false. should that be 'If VAL == CST * DIV for no constant CST, returns false.'? Or s/any/all/? > for (i = 0; i < div->n; i++) > { > class aff_comb_elt *elt > diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc > index 7a277aaf18a9e0a32b8ac0d23332b7cd9945ef98..b11bd62a86092ba972a648764cd2facd9ddb4914 100644 > --- a/gcc/tree-ssa-loop-ivopts.cc > +++ b/gcc/tree-ssa-loop-ivopts.cc > @@ -757,6 +757,19 @@ single_dom_exit (class loop *loop) > return exit; > } > > +/* Compares the given affine tree LEFT with the tree expression RIGHT and return > + whether they are the same under affine equality. */ > + > +static bool > +affine_compare_eq (aff_tree &left, tree right) > +{ > + aff_tree aff_right; > + tree_to_aff_combination (right, TREE_TYPE (right), &aff_right); > + aff_combination_scale (&aff_right, -1); > + aff_combination_add (&aff_right, &left); > + return aff_combination_zero_p (&aff_right); > +} > + > > /* Given a nested expression in ARG consisting of PLUS or MULT try to see if one > of the arguments of each expression is a constant and that the type of the > @@ -1673,6 +1686,9 @@ record_group_use (struct ivopts_data *data, tree *use_p, > tree addr_base = NULL; > struct iv_group *group = NULL; > poly_uint64 addr_offset = 0; > + aff_tree iv_step, iv_addr_base; > + > + tree_to_aff_combination (iv->step, TREE_TYPE (iv->step), &iv_step); > > /* Record non address type use in a new group. */ > if (address_p (type)) > @@ -1683,6 +1699,7 @@ record_group_use (struct ivopts_data *data, tree *use_p, > tree addr_toffset; > split_constant_offset (iv->base, &addr_base, &addr_toffset); > addr_offset = int_cst_value (addr_toffset); > + tree_to_aff_combination (addr_base, TREE_TYPE (addr_base), &iv_addr_base); > for (i = 0; i < data->vgroups.length (); i++) > { > struct iv_use *use; > @@ -1694,8 +1711,8 @@ record_group_use (struct ivopts_data *data, tree *use_p, > > /* Check if it has the same stripped base and step. */ > if (operand_equal_p (iv->base_object, use->iv->base_object, 0) > - && operand_equal_p (iv->step, use->iv->step, 0) > - && operand_equal_p (addr_base, use->addr_base, 0)) > + && affine_compare_eq (iv_step, use->iv->step) > + && affine_compare_eq (iv_addr_base, use->addr_base)) There's only this use of addr_base so I think the opportunity is to turn iv_use->addr_base into aff_tree (even though that's a quite big representation). For the testcase, what are the two IVs we are comparing? I wonder why you need the affine compare for iv->step? > break; > } > if (i == data->vgroups.length ()) > @@ -2231,6 +2248,14 @@ constant_multiple_of (tree top, tree bot, widest_int *mul) > return true; > } > > + aff_tree aff_top, aff_bot; > + tree_to_aff_combination (top, TREE_TYPE (top), &aff_top); > + tree_to_aff_combination (bot, TREE_TYPE (bot), &aff_bot); > + poly_widest_int poly_mul; > + if (aff_combination_constant_multiple_p (&aff_top, &aff_bot, &poly_mul) > + && poly_mul.is_constant (mul)) > + return true; > + So why does stripping nops not work here? > code = TREE_CODE (top); > switch (code) > { > > > > >
> -----Original Message----- > From: Richard Biener <rguenther@suse.de> > Sent: Wednesday, June 19, 2024 12:55 PM > To: Tamar Christina <Tamar.Christina@arm.com> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; bin.cheng@linux.alibaba.com > Subject: Re: [PATCH][ivopts]: use affine_tree when comparing IVs during candidate > selection [PR114932] > > On Fri, 14 Jun 2024, Tamar Christina wrote: > > > Hi All, > > > > IVOPTS normally uses affine trees to perform comparisons between different IVs, > > but these seem to have been missing in two key spots and instead normal tree > > equivalencies used. > > > > In some cases where we have a structural equivalence but not a signedness > > equivalencies we end up generating both a signed and unsigned IV for the same > > candidate. > > > > This happens quite a lot with fortran but can also happen in C because this came > > code is unable to figure out when one expression is a multiple of another. > > > > As an example in the attached testcase we get: > > > > Initial set of candidates: > > cost: 24 (complexity 3) > > reg_cost: 9 > > cand_cost: 15 > > cand_group_cost: 0 (complexity 3) > > candidates: 1, 6, 8 > > group:0 --> iv_cand:6, cost=(0,1) > > group:1 --> iv_cand:1, cost=(0,0) > > group:2 --> iv_cand:8, cost=(0,1) > > group:3 --> iv_cand:8, cost=(0,1) > > invariant variables: 6 > > invariant expressions: 1, 2 > > > > <Invariant Expressions>: > > inv_expr 1: stride.3_27 * 4 > > inv_expr 2: (unsigned long) stride.3_27 * 4 > > > > These end up being used in the same group: > > > > Group 1: > > cand cost compl. inv.expr. inv.vars > > 1 0 0 NIL; 6 > > 2 0 0 NIL; 6 > > 3 0 0 NIL; 6 > > > > which ends up with IV opts picking the signed and unsigned IVs: > > > > Improved to: > > cost: 24 (complexity 3) > > reg_cost: 9 > > cand_cost: 15 > > cand_group_cost: 0 (complexity 3) > > candidates: 1, 6, 8 > > group:0 --> iv_cand:6, cost=(0,1) > > group:1 --> iv_cand:1, cost=(0,0) > > group:2 --> iv_cand:8, cost=(0,1) > > group:3 --> iv_cand:8, cost=(0,1) > > invariant variables: 6 > > invariant expressions: 1, 2 > > > > and so generates the same IV as both signed and unsigned: > > > > ;; basic block 21, loop depth 3, count 214748368 (estimated locally, freq > 58.2545), maybe hot > > ;; prev block 28, next block 31, flags: (NEW, REACHABLE, VISITED) > > ;; pred: 28 [always] count:23622320 (estimated locally, freq 6.4080) > (FALLTHRU,EXECUTABLE) > > ;; 25 [always] count:191126046 (estimated locally, freq 51.8465) > (FALLTHRU,DFS_BACK,EXECUTABLE) > > # .MEM_66 = PHI <.MEM_34(28), .MEM_22(25)> > > # ivtmp.22_41 = PHI <0(28), ivtmp.22_82(25)> > > # ivtmp.26_51 = PHI <ivtmp.26_55(28), ivtmp.26_72(25)> > > # ivtmp.28_90 = PHI <ivtmp.28_99(28), ivtmp.28_98(25)> > > > > ... > > > > ;; basic block 24, loop depth 3, count 214748366 (estimated locally, freq > 58.2545), maybe hot > > ;; prev block 22, next block 25, flags: (NEW, REACHABLE, VISITED)' > > ;; pred: 22 [always] count:95443719 (estimated locally, freq 25.8909) > (FALLTHRU) > ;; 21 [33.3% (guessed)] count:71582790 (estimated locally, freq 19.4182) > (TRUE_VALUE,EXECUTABLE) > ;; 31 [33.3% (guessed)] count:47721860 (estimated locally, freq 12.9455) > (TRUE_VALUE,EXECUTABLE) > # .MEM_22 = PHI <.MEM_44(22), .MEM_31(21), .MEM_79(31)> > > ivtmp.22_82 = ivtmp.22_41 + 1; > ivtmp.26_72 = ivtmp.26_51 + _80; > ivtmp.28_98 = ivtmp.28_90 + _39; > > > > These two IVs are always used as unsigned, so IV ops generates: > > > > _73 = stride.3_27 * 4; > > _80 = (unsigned long) _73; > > _54 = (unsigned long) stride.3_27; > > _39 = _54 * 4; > > > > Which means that in e.g. exchange2 we generate a lot of duplicate code. > > > > This is because candidate 6 and 8 are structurally equivalent but have different > > signs. > > > > This patch changes it so that if you have two IVs that are affine equivalent to > > just pick one over the other. IV already has code for this, so the patch just > > uses affine trees instead of tree for the check. > > > > With it we get: > > > > <Invariant Expressions>: > > inv_expr 1: stride.3_27 * 4 > > > > <Group-candidate Costs>: > > Group 0: > > cand cost compl. inv.expr. inv.vars > > 5 0 2 NIL; NIL; > > 6 0 3 NIL; NIL; > > > > Group 1: > > cand cost compl. inv.expr. inv.vars > > 1 0 0 NIL; 6 > > 2 0 0 NIL; 6 > > 3 0 0 NIL; 6 > > 4 0 0 NIL; 6 > > > > Initial set of candidates: > > cost: 16 (complexity 3) > > reg_cost: 6 > > cand_cost: 10 > > cand_group_cost: 0 (complexity 3) > > candidates: 1, 6 > > group:0 --> iv_cand:6, cost=(0,3) > > group:1 --> iv_cand:1, cost=(0,0) > > invariant variables: 6 > > invariant expressions: 1 > > > > The two patches together results in a 10% performance increase in exchange2 in > > SPECCPU 2017 and a 4% reduction in binary size and a 5% improvement in > compile > > time. There's also a 5% performance improvement in fotonik3d and similar > > reduction in binary size. > > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > > > Ok for master? > > > > Thanks, > > Tamar > > > > gcc/ChangeLog: > > > > PR tree-optimization/114932 > > * tree-affine.cc (aff_combination_constant_multiple_p): Take zero > > offsets into account. > > * tree-ssa-loop-ivopts.cc (affine_compare_eq): New. > > (record_group_use): Use it. > > (constant_multiple_of): Also check equality under > > aff_combination_constant_multiple_p. > > > > gcc/testsuite/ChangeLog: > > > > PR tree-optimization/114932 > > * gfortran.dg/addressing-modes_2.f90: New test. > > > > --- > > diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 > b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 > > new file mode 100644 > > index > 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c > 2e24a4973c8539fae > > --- /dev/null > > +++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 > > @@ -0,0 +1,20 @@ > > +! { dg-do compile { target aarch64-*-* } } > > +! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" } > > + > > +module a > > +integer, parameter :: b=3, c=b > > +contains > > +subroutine d(block) > > +integer f, col , block(:, :, :), e > > +do f = 1, c > > + do col = 1, c > > + block(:f, :, e()) = do > > + end do > > + end do > > + end > > + end > > + > > +! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 3 IVs:} ivopts > } } > > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 2 IVs:} 1 > ivopts } } > > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 1 IVs:} 1 > ivopts } } > > + > > diff --git a/gcc/tree-affine.cc b/gcc/tree-affine.cc > > index > d6309c4390362b680f0aa97a41fac3281ade66fd..b141bf23c1bbea1001b1bb28 > 6346890ddeab4096 100644 > > --- a/gcc/tree-affine.cc > > +++ b/gcc/tree-affine.cc > > @@ -941,6 +941,13 @@ aff_combination_constant_multiple_p (aff_tree *val, > aff_tree *div, > > &mult_set, mult)) > > return false; > > > > + /* Everything is a multiple of 0, which means we shoudn't enforce that > > + mult_set is checked, since that forced the only valid multiple of > > + val and div to be 0 whereas 1 is also possible. */ > > + if (known_eq (val->offset, 0) > > + && known_eq (div->offset, 0)) > > + mult_set = false; > > + > > In fact all numbers are possible? Shouldn't this be better handled > in wide_int_constant_multiple_p by special-casing > known_eq (div, 0) in the known_eq (val, 0) condition by simply > returning 'true' without checking or setting *mult_set? > > The docs of wide_int_constant_multiple_p is odd: > > /* If VAL != CST * DIV for any constant CST, returns false. > > should that be 'If VAL == CST * DIV for no constant CST, returns false.'? > Or s/any/all/? > Yeah, that condition would always be false. > > for (i = 0; i < div->n; i++) > > { > > class aff_comb_elt *elt > > diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc > > index > 7a277aaf18a9e0a32b8ac0d23332b7cd9945ef98..b11bd62a86092ba972a6487 > 64cd2facd9ddb4914 100644 > > --- a/gcc/tree-ssa-loop-ivopts.cc > > +++ b/gcc/tree-ssa-loop-ivopts.cc > > @@ -757,6 +757,19 @@ single_dom_exit (class loop *loop) > > return exit; > > } > > > > +/* Compares the given affine tree LEFT with the tree expression RIGHT and > return > > + whether they are the same under affine equality. */ > > + > > +static bool > > +affine_compare_eq (aff_tree &left, tree right) > > +{ > > + aff_tree aff_right; > > + tree_to_aff_combination (right, TREE_TYPE (right), &aff_right); > > + aff_combination_scale (&aff_right, -1); > > + aff_combination_add (&aff_right, &left); > > + return aff_combination_zero_p (&aff_right); > > +} > > + > > > > /* Given a nested expression in ARG consisting of PLUS or MULT try to see if one > > of the arguments of each expression is a constant and that the type of the > > @@ -1673,6 +1686,9 @@ record_group_use (struct ivopts_data *data, tree > *use_p, > > tree addr_base = NULL; > > struct iv_group *group = NULL; > > poly_uint64 addr_offset = 0; > > + aff_tree iv_step, iv_addr_base; > > + > > + tree_to_aff_combination (iv->step, TREE_TYPE (iv->step), &iv_step); > > > > /* Record non address type use in a new group. */ > > if (address_p (type)) > > @@ -1683,6 +1699,7 @@ record_group_use (struct ivopts_data *data, tree > *use_p, > > tree addr_toffset; > > split_constant_offset (iv->base, &addr_base, &addr_toffset); > > addr_offset = int_cst_value (addr_toffset); > > + tree_to_aff_combination (addr_base, TREE_TYPE (addr_base), > &iv_addr_base); > > for (i = 0; i < data->vgroups.length (); i++) > > { > > struct iv_use *use; > > @@ -1694,8 +1711,8 @@ record_group_use (struct ivopts_data *data, tree > *use_p, > > > > /* Check if it has the same stripped base and step. */ > > if (operand_equal_p (iv->base_object, use->iv->base_object, 0) > > - && operand_equal_p (iv->step, use->iv->step, 0) > > - && operand_equal_p (addr_base, use->addr_base, 0)) > > + && affine_compare_eq (iv_step, use->iv->step) > > + && affine_compare_eq (iv_addr_base, use->addr_base)) > > There's only this use of addr_base so I think the opportunity is to > turn iv_use->addr_base into aff_tree (even though that's a quite big > representation). Ah, true. Will do. > > For the testcase, what are the two IVs we are comparing? I wonder > why you need the affine compare for iv->step? > Because step builds up the IV expressions and can also be an expression. In this case: >>> p debug (iv->step) ((unsigned long) stride.3_27) * 4 >>> p debug (use->iv->step) (sizetype) (stride.3_27 * 4) Note that the original expressions were both unsigned, but the STRIP_NOPS made one signed. They still wouldn't have compared equal though due to the different cast locations. For completeness the base here is >>> p debug (use->addr_base) (integer(kind=4) *) ((((((sizetype) prephitmp_64 - (sizetype) stride.3_27) - (sizetype) stride.5_29) + (sizetype) _7) + (sizetype) stride.3_27) * 4 + (sizetype) block.0_26) $9 = void >>> p debug (addr_base) (integer(kind=4) *) ((((((sizetype) prephitmp_64 - (sizetype) stride.3_27) - (sizetype) stride.5_29) + (sizetype) _7) + (sizetype) stride.3_27) * 4 + (sizetype) block.0_26) $10 = void > > > break; > > } > > if (i == data->vgroups.length ()) > > @@ -2231,6 +2248,14 @@ constant_multiple_of (tree top, tree bot, widest_int > *mul) > > return true; > > } > > > > + aff_tree aff_top, aff_bot; > > + tree_to_aff_combination (top, TREE_TYPE (top), &aff_top); > > + tree_to_aff_combination (bot, TREE_TYPE (bot), &aff_bot); > > + poly_widest_int poly_mul; > > + if (aff_combination_constant_multiple_p (&aff_top, &aff_bot, &poly_mul) > > + && poly_mul.is_constant (mul)) > > + return true; > > + > > So why does stripping nops not work here? So this is where we compare different IV expressions to determine which IVs compute the same thing and thus can be in the same group. The STRIP_NOPS don't work because while the incoming types are the same the casts are different. So: >>> p debug (ustep) (unsigned long) stride.3_27 * 4 $3 = void >>> p debug (cstep) (unsigned long) (stride.3_27 * 4) $4 = void Which is of course stripped to: >>> p debug (top) (unsigned long) stride.3_27 * 4 $1 = void >>> p debug (bot) stride.3_27 * 4 Both of these compute the same thing so by doing the affine compare they end up in the same IV groups and can be costed. Later during candidate selection these are the steps we're comparing to see if the candidate is the same invariant. The full list is: <IV Groups>: Group 0: Type: REFERENCE ADDRESS Use 0.0: At stmt: *_4 = _33; At pos: *_4 IV struct: Type: integer(kind=4) * Base: (integer(kind=4) *) ((((unsigned long) stride.3_27 + (unsigned long) _36) * 4 + (unsigned long) block.0_26) + 4) Step: (sizetype) (stride.3_27 * 4) Object: (void *) block.0_26 Biv: N Overflowness wrto loop niter: Overflow Use 0.1: At stmt: *_78 = _33; At pos: *_78 IV struct: Type: integer(kind=4) * Base: (integer(kind=4) *) ((((unsigned long) stride.3_27 + (unsigned long) _36) * 4 + (unsigned long) block.0_26) + 8) Step: (unsigned long) stride.3_27 * 4 Object: (void *) block.0_26 Biv: N Overflowness wrto loop niter: Overflow Use 0.2: At stmt: *_45 = _33; At pos: *_45 IV struct: Type: integer(kind=4) * Base: (integer(kind=4) *) ((((unsigned long) stride.3_27 + (unsigned long) _36) * 4 + (unsigned long) block.0_26) + 12) Step: (unsigned long) stride.3_27 * 4 Object: (void *) block.0_26 Biv: N Overflowness wrto loop niter: Overflow And all these IVs are the same but with a slightly different base offset. By doing the affine compare here IV ops sees that The best candidate is: Candidate 6: Depend on inv.exprs: 1 Var befor: ivtmp.26_51 Var after: ivtmp.26_72 Incr POS: before exit test IV struct: Type: unsigned long Base: ((unsigned long) stride.3_27 + (unsigned long) _36) * 4 + (unsigned long) block.0_26 Step: (unsigned long) (stride.3_27 * 4) Object: (void *) block.0_26 Biv: N Overflowness wrto loop niter: Overflow And just builds the three IVs from the same candidate. Does this cover what you wanted? Cheers, Tamar > > > code = TREE_CODE (top); > > switch (code) > > { > > > > > > > > > > > > -- > Richard Biener <rguenther@suse.de> > SUSE Software Solutions Germany GmbH, > Frankenstrasse 146, 90461 Nuernberg, Germany; > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
Hello, On Wed, 19 Jun 2024, Tamar Christina wrote: > So this is where we compare different IV expressions to determine which > IVs compute the same thing and thus can be in the same group. > > The STRIP_NOPS don't work because while the incoming types are the same > the casts are different. So: > > >>> p debug (ustep) > (unsigned long) stride.3_27 * 4 > $3 = void > >>> p debug (cstep) > (unsigned long) (stride.3_27 * 4) > $4 = void > > Which is of course stripped to: > > >>> p debug (top) > (unsigned long) stride.3_27 * 4 > $1 = void > >>> p debug (bot) > stride.3_27 * 4 > > Both of these compute the same thing In isolation these are _not_ computing the same when strides type is smaller than ulong, namely when stride is either negative or larger than its max-value/4. I.e. when comparing IVs not only the overflow behaviour for the whole {base,+,step} revolution matters, but also the behaviour on the constituent expressions. (It's possible that stride is known to be non-problematic here, I haven't checked. I was just triggered by the claim of same-ness :) ) Ciao, Michael.
> -----Original Message----- > From: Michael Matz <matz@suse.de> > Sent: Wednesday, June 19, 2024 3:46 PM > To: Tamar Christina <Tamar.Christina@arm.com> > Cc: Richard Biener <rguenther@suse.de>; gcc-patches@gcc.gnu.org; nd > <nd@arm.com>; bin.cheng@linux.alibaba.com > Subject: RE: [PATCH][ivopts]: use affine_tree when comparing IVs during candidate > selection [PR114932] > > Hello, > > On Wed, 19 Jun 2024, Tamar Christina wrote: > > > So this is where we compare different IV expressions to determine which > > IVs compute the same thing and thus can be in the same group. > > > > The STRIP_NOPS don't work because while the incoming types are the same > > the casts are different. So: > > > > >>> p debug (ustep) > > (unsigned long) stride.3_27 * 4 > > $3 = void > > >>> p debug (cstep) > > (unsigned long) (stride.3_27 * 4) > > $4 = void > > > > Which is of course stripped to: > > > > >>> p debug (top) > > (unsigned long) stride.3_27 * 4 > > $1 = void > > >>> p debug (bot) > > stride.3_27 * 4 > > > > Both of these compute the same thing > > In isolation these are _not_ computing the same when strides type is > smaller than ulong, namely when stride is either negative or larger than > its max-value/4. I.e. when comparing IVs not only the overflow behaviour > for the whole {base,+,step} revolution matters, but also the behaviour on > the constituent expressions. (It's possible that stride is known to be > non-problematic here, I haven't checked. I was just triggered by the > claim of same-ness :) ) The only use of this method is to determine whether the two expressions can possibly be the same. After this IVops forcibly converts them to an unsigned type though an affine fold in get_computation_aff_1. So in the end it doesn't care about the sign and uses them all as unsigned. Tamar > > > Ciao, > Michael.
On Wed, 19 Jun 2024, Tamar Christina wrote: > > -----Original Message----- > > From: Richard Biener <rguenther@suse.de> > > Sent: Wednesday, June 19, 2024 12:55 PM > > To: Tamar Christina <Tamar.Christina@arm.com> > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; bin.cheng@linux.alibaba.com > > Subject: Re: [PATCH][ivopts]: use affine_tree when comparing IVs during candidate > > selection [PR114932] > > > > On Fri, 14 Jun 2024, Tamar Christina wrote: > > > > > Hi All, > > > > > > IVOPTS normally uses affine trees to perform comparisons between different IVs, > > > but these seem to have been missing in two key spots and instead normal tree > > > equivalencies used. > > > > > > In some cases where we have a structural equivalence but not a signedness > > > equivalencies we end up generating both a signed and unsigned IV for the same > > > candidate. > > > > > > This happens quite a lot with fortran but can also happen in C because this came > > > code is unable to figure out when one expression is a multiple of another. > > > > > > As an example in the attached testcase we get: > > > > > > Initial set of candidates: > > > cost: 24 (complexity 3) > > > reg_cost: 9 > > > cand_cost: 15 > > > cand_group_cost: 0 (complexity 3) > > > candidates: 1, 6, 8 > > > group:0 --> iv_cand:6, cost=(0,1) > > > group:1 --> iv_cand:1, cost=(0,0) > > > group:2 --> iv_cand:8, cost=(0,1) > > > group:3 --> iv_cand:8, cost=(0,1) > > > invariant variables: 6 > > > invariant expressions: 1, 2 > > > > > > <Invariant Expressions>: > > > inv_expr 1: stride.3_27 * 4 > > > inv_expr 2: (unsigned long) stride.3_27 * 4 > > > > > > These end up being used in the same group: > > > > > > Group 1: > > > cand cost compl. inv.expr. inv.vars > > > 1 0 0 NIL; 6 > > > 2 0 0 NIL; 6 > > > 3 0 0 NIL; 6 > > > > > > which ends up with IV opts picking the signed and unsigned IVs: > > > > > > Improved to: > > > cost: 24 (complexity 3) > > > reg_cost: 9 > > > cand_cost: 15 > > > cand_group_cost: 0 (complexity 3) > > > candidates: 1, 6, 8 > > > group:0 --> iv_cand:6, cost=(0,1) > > > group:1 --> iv_cand:1, cost=(0,0) > > > group:2 --> iv_cand:8, cost=(0,1) > > > group:3 --> iv_cand:8, cost=(0,1) > > > invariant variables: 6 > > > invariant expressions: 1, 2 > > > > > > and so generates the same IV as both signed and unsigned: > > > > > > ;; basic block 21, loop depth 3, count 214748368 (estimated locally, freq > > 58.2545), maybe hot > > > ;; prev block 28, next block 31, flags: (NEW, REACHABLE, VISITED) > > > ;; pred: 28 [always] count:23622320 (estimated locally, freq 6.4080) > > (FALLTHRU,EXECUTABLE) > > > ;; 25 [always] count:191126046 (estimated locally, freq 51.8465) > > (FALLTHRU,DFS_BACK,EXECUTABLE) > > > # .MEM_66 = PHI <.MEM_34(28), .MEM_22(25)> > > > # ivtmp.22_41 = PHI <0(28), ivtmp.22_82(25)> > > > # ivtmp.26_51 = PHI <ivtmp.26_55(28), ivtmp.26_72(25)> > > > # ivtmp.28_90 = PHI <ivtmp.28_99(28), ivtmp.28_98(25)> > > > > > > ... > > > > > > ;; basic block 24, loop depth 3, count 214748366 (estimated locally, freq > > 58.2545), maybe hot > > > ;; prev block 22, next block 25, flags: (NEW, REACHABLE, VISITED)' > > > ;; pred: 22 [always] count:95443719 (estimated locally, freq 25.8909) > > (FALLTHRU) > > ;; 21 [33.3% (guessed)] count:71582790 (estimated locally, freq 19.4182) > > (TRUE_VALUE,EXECUTABLE) > > ;; 31 [33.3% (guessed)] count:47721860 (estimated locally, freq 12.9455) > > (TRUE_VALUE,EXECUTABLE) > > # .MEM_22 = PHI <.MEM_44(22), .MEM_31(21), .MEM_79(31)> > > > > ivtmp.22_82 = ivtmp.22_41 + 1; > > ivtmp.26_72 = ivtmp.26_51 + _80; > > ivtmp.28_98 = ivtmp.28_90 + _39; > > > > > > These two IVs are always used as unsigned, so IV ops generates: > > > > > > _73 = stride.3_27 * 4; > > > _80 = (unsigned long) _73; > > > _54 = (unsigned long) stride.3_27; > > > _39 = _54 * 4; > > > > > > Which means that in e.g. exchange2 we generate a lot of duplicate code. > > > > > > This is because candidate 6 and 8 are structurally equivalent but have different > > > signs. > > > > > > This patch changes it so that if you have two IVs that are affine equivalent to > > > just pick one over the other. IV already has code for this, so the patch just > > > uses affine trees instead of tree for the check. > > > > > > With it we get: > > > > > > <Invariant Expressions>: > > > inv_expr 1: stride.3_27 * 4 > > > > > > <Group-candidate Costs>: > > > Group 0: > > > cand cost compl. inv.expr. inv.vars > > > 5 0 2 NIL; NIL; > > > 6 0 3 NIL; NIL; > > > > > > Group 1: > > > cand cost compl. inv.expr. inv.vars > > > 1 0 0 NIL; 6 > > > 2 0 0 NIL; 6 > > > 3 0 0 NIL; 6 > > > 4 0 0 NIL; 6 > > > > > > Initial set of candidates: > > > cost: 16 (complexity 3) > > > reg_cost: 6 > > > cand_cost: 10 > > > cand_group_cost: 0 (complexity 3) > > > candidates: 1, 6 > > > group:0 --> iv_cand:6, cost=(0,3) > > > group:1 --> iv_cand:1, cost=(0,0) > > > invariant variables: 6 > > > invariant expressions: 1 > > > > > > The two patches together results in a 10% performance increase in exchange2 in > > > SPECCPU 2017 and a 4% reduction in binary size and a 5% improvement in > > compile > > > time. There's also a 5% performance improvement in fotonik3d and similar > > > reduction in binary size. > > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > > > > > Ok for master? > > > > > > Thanks, > > > Tamar > > > > > > gcc/ChangeLog: > > > > > > PR tree-optimization/114932 > > > * tree-affine.cc (aff_combination_constant_multiple_p): Take zero > > > offsets into account. > > > * tree-ssa-loop-ivopts.cc (affine_compare_eq): New. > > > (record_group_use): Use it. > > > (constant_multiple_of): Also check equality under > > > aff_combination_constant_multiple_p. > > > > > > gcc/testsuite/ChangeLog: > > > > > > PR tree-optimization/114932 > > > * gfortran.dg/addressing-modes_2.f90: New test. > > > > > > --- > > > diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 > > b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 > > > new file mode 100644 > > > index > > 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c > > 2e24a4973c8539fae > > > --- /dev/null > > > +++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 > > > @@ -0,0 +1,20 @@ > > > +! { dg-do compile { target aarch64-*-* } } > > > +! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" } > > > + > > > +module a > > > +integer, parameter :: b=3, c=b > > > +contains > > > +subroutine d(block) > > > +integer f, col , block(:, :, :), e > > > +do f = 1, c > > > + do col = 1, c > > > + block(:f, :, e()) = do > > > + end do > > > + end do > > > + end > > > + end > > > + > > > +! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 3 IVs:} ivopts > > } } > > > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 2 IVs:} 1 > > ivopts } } > > > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 1 IVs:} 1 > > ivopts } } > > > + > > > diff --git a/gcc/tree-affine.cc b/gcc/tree-affine.cc > > > index > > d6309c4390362b680f0aa97a41fac3281ade66fd..b141bf23c1bbea1001b1bb28 > > 6346890ddeab4096 100644 > > > --- a/gcc/tree-affine.cc > > > +++ b/gcc/tree-affine.cc > > > @@ -941,6 +941,13 @@ aff_combination_constant_multiple_p (aff_tree *val, > > aff_tree *div, > > > &mult_set, mult)) > > > return false; > > > > > > + /* Everything is a multiple of 0, which means we shoudn't enforce that > > > + mult_set is checked, since that forced the only valid multiple of > > > + val and div to be 0 whereas 1 is also possible. */ > > > + if (known_eq (val->offset, 0) > > > + && known_eq (div->offset, 0)) > > > + mult_set = false; > > > + > > > > In fact all numbers are possible? Shouldn't this be better handled > > in wide_int_constant_multiple_p by special-casing > > known_eq (div, 0) in the known_eq (val, 0) condition by simply > > returning 'true' without checking or setting *mult_set? > > > > The docs of wide_int_constant_multiple_p is odd: > > > > /* If VAL != CST * DIV for any constant CST, returns false. > > > > should that be 'If VAL == CST * DIV for no constant CST, returns false.'? > > Or s/any/all/? > > > > Yeah, that condition would always be false. Can you split out a patch to fix this? > > > for (i = 0; i < div->n; i++) > > > { > > > class aff_comb_elt *elt > > > diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc > > > index > > 7a277aaf18a9e0a32b8ac0d23332b7cd9945ef98..b11bd62a86092ba972a6487 > > 64cd2facd9ddb4914 100644 > > > --- a/gcc/tree-ssa-loop-ivopts.cc > > > +++ b/gcc/tree-ssa-loop-ivopts.cc > > > @@ -757,6 +757,19 @@ single_dom_exit (class loop *loop) > > > return exit; > > > } > > > > > > +/* Compares the given affine tree LEFT with the tree expression RIGHT and > > return > > > + whether they are the same under affine equality. */ > > > + > > > +static bool > > > +affine_compare_eq (aff_tree &left, tree right) > > > +{ > > > + aff_tree aff_right; > > > + tree_to_aff_combination (right, TREE_TYPE (right), &aff_right); > > > + aff_combination_scale (&aff_right, -1); > > > + aff_combination_add (&aff_right, &left); > > > + return aff_combination_zero_p (&aff_right); > > > +} > > > + > > > > > > /* Given a nested expression in ARG consisting of PLUS or MULT try to see if one > > > of the arguments of each expression is a constant and that the type of the > > > @@ -1673,6 +1686,9 @@ record_group_use (struct ivopts_data *data, tree > > *use_p, > > > tree addr_base = NULL; > > > struct iv_group *group = NULL; > > > poly_uint64 addr_offset = 0; > > > + aff_tree iv_step, iv_addr_base; > > > + > > > + tree_to_aff_combination (iv->step, TREE_TYPE (iv->step), &iv_step); > > > > > > /* Record non address type use in a new group. */ > > > if (address_p (type)) > > > @@ -1683,6 +1699,7 @@ record_group_use (struct ivopts_data *data, tree > > *use_p, > > > tree addr_toffset; > > > split_constant_offset (iv->base, &addr_base, &addr_toffset); > > > addr_offset = int_cst_value (addr_toffset); > > > + tree_to_aff_combination (addr_base, TREE_TYPE (addr_base), > > &iv_addr_base); > > > for (i = 0; i < data->vgroups.length (); i++) > > > { > > > struct iv_use *use; > > > @@ -1694,8 +1711,8 @@ record_group_use (struct ivopts_data *data, tree > > *use_p, > > > > > > /* Check if it has the same stripped base and step. */ > > > if (operand_equal_p (iv->base_object, use->iv->base_object, 0) > > > - && operand_equal_p (iv->step, use->iv->step, 0) > > > - && operand_equal_p (addr_base, use->addr_base, 0)) > > > + && affine_compare_eq (iv_step, use->iv->step) > > > + && affine_compare_eq (iv_addr_base, use->addr_base)) > > > > There's only this use of addr_base so I think the opportunity is to > > turn iv_use->addr_base into aff_tree (even though that's a quite big > > representation). > > Ah, true. Will do. > > > > > For the testcase, what are the two IVs we are comparing? I wonder > > why you need the affine compare for iv->step? > > > > Because step builds up the IV expressions and can also be an expression. > > In this case: > > >>> p debug (iv->step) > ((unsigned long) stride.3_27) * 4 > >>> p debug (use->iv->step) > (sizetype) (stride.3_27 * 4) > > Note that the original expressions were both unsigned, but the STRIP_NOPS > made one signed. They still wouldn't have compared equal though due to > the different cast locations. > > For completeness the base here is > > >>> p debug (use->addr_base) > (integer(kind=4) *) ((((((sizetype) prephitmp_64 - (sizetype) stride.3_27) - (sizetype) stride.5_29) + (sizetype) _7) + (sizetype) stride.3_27) * 4 + (sizetype) block.0_26) > $9 = void > > >>> p debug (addr_base) > (integer(kind=4) *) ((((((sizetype) prephitmp_64 - (sizetype) stride.3_27) - (sizetype) stride.5_29) + (sizetype) _7) + (sizetype) stride.3_27) * 4 + (sizetype) block.0_26) > $10 = void > > > > > > break; > > > } > > > if (i == data->vgroups.length ()) > > > @@ -2231,6 +2248,14 @@ constant_multiple_of (tree top, tree bot, widest_int > > *mul) > > > return true; > > > } > > > > > > + aff_tree aff_top, aff_bot; > > > + tree_to_aff_combination (top, TREE_TYPE (top), &aff_top); > > > + tree_to_aff_combination (bot, TREE_TYPE (bot), &aff_bot); > > > + poly_widest_int poly_mul; > > > + if (aff_combination_constant_multiple_p (&aff_top, &aff_bot, &poly_mul) > > > + && poly_mul.is_constant (mul)) > > > + return true; > > > + > > > > So why does stripping nops not work here? > > So this is where we compare different IV expressions to determine which > IVs compute the same thing and thus can be in the same group. > > The STRIP_NOPS don't work because while the incoming types are the same > the casts are different. So: > > >>> p debug (ustep) > (unsigned long) stride.3_27 * 4 > $3 = void > >>> p debug (cstep) > (unsigned long) (stride.3_27 * 4) > $4 = void > > Which is of course stripped to: > > >>> p debug (top) > (unsigned long) stride.3_27 * 4 > $1 = void > >>> p debug (bot) > stride.3_27 * 4 So we're expecting constant_multiple_of to compute *mul == 1, proving equality? The function seems to try stripping ops from TOP until it reaches an expression equal to BOT and that's what fails to trigger here. What I originally wondered was iff we compute the affine combinations why not use only aff_combination_constant_multiple_p? I might also point back to the idea I threw in somewhere, adding OEP_VALUE (or a better name) to the set of flags accepted by operand_equal_p. You mentioned hashing IIRC but I don't see the patches touching hashing? > Both of these compute the same thing so by doing the affine compare they > end up in the same IV groups and can be costed. Later during candidate selection > these are the steps we're comparing to see if the candidate is the same invariant. > > The full list is: > > <IV Groups>: > Group 0: > Type: REFERENCE ADDRESS > Use 0.0: > At stmt: *_4 = _33; > At pos: *_4 > IV struct: > Type: integer(kind=4) * > Base: (integer(kind=4) *) ((((unsigned long) stride.3_27 + (unsigned long) _36) * 4 + (unsigned long) block.0_26) + 4) > Step: (sizetype) (stride.3_27 * 4) > Object: (void *) block.0_26 > Biv: N > Overflowness wrto loop niter: Overflow > Use 0.1: > At stmt: *_78 = _33; > At pos: *_78 > IV struct: > Type: integer(kind=4) * > Base: (integer(kind=4) *) ((((unsigned long) stride.3_27 + (unsigned long) _36) * 4 + (unsigned long) block.0_26) + 8) > Step: (unsigned long) stride.3_27 * 4 > Object: (void *) block.0_26 > Biv: N > Overflowness wrto loop niter: Overflow > Use 0.2: > At stmt: *_45 = _33; > At pos: *_45 > IV struct: > Type: integer(kind=4) * > Base: (integer(kind=4) *) ((((unsigned long) stride.3_27 + (unsigned long) _36) * 4 + (unsigned long) block.0_26) + 12) > Step: (unsigned long) stride.3_27 * 4 > Object: (void *) block.0_26 > Biv: N > Overflowness wrto loop niter: Overflow > > And all these IVs are the same but with a slightly different base offset. By doing the affine compare here IV ops sees that > The best candidate is: > > Candidate 6: > Depend on inv.exprs: 1 > Var befor: ivtmp.26_51 > Var after: ivtmp.26_72 > Incr POS: before exit test > IV struct: > Type: unsigned long > Base: ((unsigned long) stride.3_27 + (unsigned long) _36) * 4 + (unsigned long) block.0_26 > Step: (unsigned long) (stride.3_27 * 4) > Object: (void *) block.0_26 > Biv: N > Overflowness wrto loop niter: Overflow > > And just builds the three IVs from the same candidate. > > Does this cover what you wanted? > > Cheers, > Tamar > > > > > > code = TREE_CODE (top); > > > switch (code) > > > { > > > > > > > > > > > > > > > > > > > -- > > Richard Biener <rguenther@suse.de> > > SUSE Software Solutions Germany GmbH, > > Frankenstrasse 146, 90461 Nuernberg, Germany; > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg) >
> -----Original Message----- > From: Richard Biener <rguenther@suse.de> > Sent: Thursday, June 20, 2024 8:49 AM > To: Tamar Christina <Tamar.Christina@arm.com> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; bin.cheng@linux.alibaba.com > Subject: RE: [PATCH][ivopts]: use affine_tree when comparing IVs during candidate > selection [PR114932] > > On Wed, 19 Jun 2024, Tamar Christina wrote: > > > > -----Original Message----- > > > From: Richard Biener <rguenther@suse.de> > > > Sent: Wednesday, June 19, 2024 12:55 PM > > > To: Tamar Christina <Tamar.Christina@arm.com> > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; > bin.cheng@linux.alibaba.com > > > Subject: Re: [PATCH][ivopts]: use affine_tree when comparing IVs during > candidate > > > selection [PR114932] > > > > > > On Fri, 14 Jun 2024, Tamar Christina wrote: > > > > > > > Hi All, > > > > > > > > IVOPTS normally uses affine trees to perform comparisons between different > IVs, > > > > but these seem to have been missing in two key spots and instead normal > tree > > > > equivalencies used. > > > > > > > > In some cases where we have a structural equivalence but not a signedness > > > > equivalencies we end up generating both a signed and unsigned IV for the > same > > > > candidate. > > > > > > > > This happens quite a lot with fortran but can also happen in C because this > came > > > > code is unable to figure out when one expression is a multiple of another. > > > > > > > > As an example in the attached testcase we get: > > > > > > > > Initial set of candidates: > > > > cost: 24 (complexity 3) > > > > reg_cost: 9 > > > > cand_cost: 15 > > > > cand_group_cost: 0 (complexity 3) > > > > candidates: 1, 6, 8 > > > > group:0 --> iv_cand:6, cost=(0,1) > > > > group:1 --> iv_cand:1, cost=(0,0) > > > > group:2 --> iv_cand:8, cost=(0,1) > > > > group:3 --> iv_cand:8, cost=(0,1) > > > > invariant variables: 6 > > > > invariant expressions: 1, 2 > > > > > > > > <Invariant Expressions>: > > > > inv_expr 1: stride.3_27 * 4 > > > > inv_expr 2: (unsigned long) stride.3_27 * 4 > > > > > > > > These end up being used in the same group: > > > > > > > > Group 1: > > > > cand cost compl. inv.expr. inv.vars > > > > 1 0 0 NIL; 6 > > > > 2 0 0 NIL; 6 > > > > 3 0 0 NIL; 6 > > > > > > > > which ends up with IV opts picking the signed and unsigned IVs: > > > > > > > > Improved to: > > > > cost: 24 (complexity 3) > > > > reg_cost: 9 > > > > cand_cost: 15 > > > > cand_group_cost: 0 (complexity 3) > > > > candidates: 1, 6, 8 > > > > group:0 --> iv_cand:6, cost=(0,1) > > > > group:1 --> iv_cand:1, cost=(0,0) > > > > group:2 --> iv_cand:8, cost=(0,1) > > > > group:3 --> iv_cand:8, cost=(0,1) > > > > invariant variables: 6 > > > > invariant expressions: 1, 2 > > > > > > > > and so generates the same IV as both signed and unsigned: > > > > > > > > ;; basic block 21, loop depth 3, count 214748368 (estimated locally, freq > > > 58.2545), maybe hot > > > > ;; prev block 28, next block 31, flags: (NEW, REACHABLE, VISITED) > > > > ;; pred: 28 [always] count:23622320 (estimated locally, freq 6.4080) > > > (FALLTHRU,EXECUTABLE) > > > > ;; 25 [always] count:191126046 (estimated locally, freq 51.8465) > > > (FALLTHRU,DFS_BACK,EXECUTABLE) > > > > # .MEM_66 = PHI <.MEM_34(28), .MEM_22(25)> > > > > # ivtmp.22_41 = PHI <0(28), ivtmp.22_82(25)> > > > > # ivtmp.26_51 = PHI <ivtmp.26_55(28), ivtmp.26_72(25)> > > > > # ivtmp.28_90 = PHI <ivtmp.28_99(28), ivtmp.28_98(25)> > > > > > > > > ... > > > > > > > > ;; basic block 24, loop depth 3, count 214748366 (estimated locally, freq > > > 58.2545), maybe hot > > > > ;; prev block 22, next block 25, flags: (NEW, REACHABLE, VISITED)' > > > > ;; pred: 22 [always] count:95443719 (estimated locally, freq 25.8909) > > > (FALLTHRU) > > > ;; 21 [33.3% (guessed)] count:71582790 (estimated locally, freq > 19.4182) > > > (TRUE_VALUE,EXECUTABLE) > > > ;; 31 [33.3% (guessed)] count:47721860 (estimated locally, freq > 12.9455) > > > (TRUE_VALUE,EXECUTABLE) > > > # .MEM_22 = PHI <.MEM_44(22), .MEM_31(21), .MEM_79(31)> > > > > > > ivtmp.22_82 = ivtmp.22_41 + 1; > > > ivtmp.26_72 = ivtmp.26_51 + _80; > > > ivtmp.28_98 = ivtmp.28_90 + _39; > > > > > > > > These two IVs are always used as unsigned, so IV ops generates: > > > > > > > > _73 = stride.3_27 * 4; > > > > _80 = (unsigned long) _73; > > > > _54 = (unsigned long) stride.3_27; > > > > _39 = _54 * 4; > > > > > > > > Which means that in e.g. exchange2 we generate a lot of duplicate code. > > > > > > > > This is because candidate 6 and 8 are structurally equivalent but have > different > > > > signs. > > > > > > > > This patch changes it so that if you have two IVs that are affine equivalent to > > > > just pick one over the other. IV already has code for this, so the patch just > > > > uses affine trees instead of tree for the check. > > > > > > > > With it we get: > > > > > > > > <Invariant Expressions>: > > > > inv_expr 1: stride.3_27 * 4 > > > > > > > > <Group-candidate Costs>: > > > > Group 0: > > > > cand cost compl. inv.expr. inv.vars > > > > 5 0 2 NIL; NIL; > > > > 6 0 3 NIL; NIL; > > > > > > > > Group 1: > > > > cand cost compl. inv.expr. inv.vars > > > > 1 0 0 NIL; 6 > > > > 2 0 0 NIL; 6 > > > > 3 0 0 NIL; 6 > > > > 4 0 0 NIL; 6 > > > > > > > > Initial set of candidates: > > > > cost: 16 (complexity 3) > > > > reg_cost: 6 > > > > cand_cost: 10 > > > > cand_group_cost: 0 (complexity 3) > > > > candidates: 1, 6 > > > > group:0 --> iv_cand:6, cost=(0,3) > > > > group:1 --> iv_cand:1, cost=(0,0) > > > > invariant variables: 6 > > > > invariant expressions: 1 > > > > > > > > The two patches together results in a 10% performance increase in exchange2 > in > > > > SPECCPU 2017 and a 4% reduction in binary size and a 5% improvement in > > > compile > > > > time. There's also a 5% performance improvement in fotonik3d and similar > > > > reduction in binary size. > > > > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > > > > > > > Ok for master? > > > > > > > > Thanks, > > > > Tamar > > > > > > > > gcc/ChangeLog: > > > > > > > > PR tree-optimization/114932 > > > > * tree-affine.cc (aff_combination_constant_multiple_p): Take zero > > > > offsets into account. > > > > * tree-ssa-loop-ivopts.cc (affine_compare_eq): New. > > > > (record_group_use): Use it. > > > > (constant_multiple_of): Also check equality under > > > > aff_combination_constant_multiple_p. > > > > > > > > gcc/testsuite/ChangeLog: > > > > > > > > PR tree-optimization/114932 > > > > * gfortran.dg/addressing-modes_2.f90: New test. > > > > > > > > --- > > > > diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 > > > b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 > > > > new file mode 100644 > > > > index > > > > 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c > > > 2e24a4973c8539fae > > > > --- /dev/null > > > > +++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 > > > > @@ -0,0 +1,20 @@ > > > > +! { dg-do compile { target aarch64-*-* } } > > > > +! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" } > > > > + > > > > +module a > > > > +integer, parameter :: b=3, c=b > > > > +contains > > > > +subroutine d(block) > > > > +integer f, col , block(:, :, :), e > > > > +do f = 1, c > > > > + do col = 1, c > > > > + block(:f, :, e()) = do > > > > + end do > > > > + end do > > > > + end > > > > + end > > > > + > > > > +! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 3 IVs:} > ivopts > > > } } > > > > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 2 IVs:} > 1 > > > ivopts } } > > > > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 1 IVs:} > 1 > > > ivopts } } > > > > + > > > > diff --git a/gcc/tree-affine.cc b/gcc/tree-affine.cc > > > > index > > > > d6309c4390362b680f0aa97a41fac3281ade66fd..b141bf23c1bbea1001b1bb28 > > > 6346890ddeab4096 100644 > > > > --- a/gcc/tree-affine.cc > > > > +++ b/gcc/tree-affine.cc > > > > @@ -941,6 +941,13 @@ aff_combination_constant_multiple_p (aff_tree > *val, > > > aff_tree *div, > > > > &mult_set, mult)) > > > > return false; > > > > > > > > + /* Everything is a multiple of 0, which means we shoudn't enforce that > > > > + mult_set is checked, since that forced the only valid multiple of > > > > + val and div to be 0 whereas 1 is also possible. */ > > > > + if (known_eq (val->offset, 0) > > > > + && known_eq (div->offset, 0)) > > > > + mult_set = false; > > > > + > > > > > > In fact all numbers are possible? Shouldn't this be better handled > > > in wide_int_constant_multiple_p by special-casing > > > known_eq (div, 0) in the known_eq (val, 0) condition by simply > > > returning 'true' without checking or setting *mult_set? > > > > > > The docs of wide_int_constant_multiple_p is odd: > > > > > > /* If VAL != CST * DIV for any constant CST, returns false. > > > > > > should that be 'If VAL == CST * DIV for no constant CST, returns false.'? > > > Or s/any/all/? > > > > > > > Yeah, that condition would always be false. > > Can you split out a patch to fix this? Doing now. > > > > > for (i = 0; i < div->n; i++) > > > > { > > > > class aff_comb_elt *elt > > > > diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc > > > > index > > > > 7a277aaf18a9e0a32b8ac0d23332b7cd9945ef98..b11bd62a86092ba972a6487 > > > 64cd2facd9ddb4914 100644 > > > > --- a/gcc/tree-ssa-loop-ivopts.cc > > > > +++ b/gcc/tree-ssa-loop-ivopts.cc > > > > @@ -757,6 +757,19 @@ single_dom_exit (class loop *loop) > > > > return exit; > > > > } > > > > > > > > +/* Compares the given affine tree LEFT with the tree expression RIGHT and > > > return > > > > + whether they are the same under affine equality. */ > > > > + > > > > +static bool > > > > +affine_compare_eq (aff_tree &left, tree right) > > > > +{ > > > > + aff_tree aff_right; > > > > + tree_to_aff_combination (right, TREE_TYPE (right), &aff_right); > > > > + aff_combination_scale (&aff_right, -1); > > > > + aff_combination_add (&aff_right, &left); > > > > + return aff_combination_zero_p (&aff_right); > > > > +} > > > > + > > > > > > > > /* Given a nested expression in ARG consisting of PLUS or MULT try to see if > one > > > > of the arguments of each expression is a constant and that the type of the > > > > @@ -1673,6 +1686,9 @@ record_group_use (struct ivopts_data *data, tree > > > *use_p, > > > > tree addr_base = NULL; > > > > struct iv_group *group = NULL; > > > > poly_uint64 addr_offset = 0; > > > > + aff_tree iv_step, iv_addr_base; > > > > + > > > > + tree_to_aff_combination (iv->step, TREE_TYPE (iv->step), &iv_step); > > > > > > > > /* Record non address type use in a new group. */ > > > > if (address_p (type)) > > > > @@ -1683,6 +1699,7 @@ record_group_use (struct ivopts_data *data, tree > > > *use_p, > > > > tree addr_toffset; > > > > split_constant_offset (iv->base, &addr_base, &addr_toffset); > > > > addr_offset = int_cst_value (addr_toffset); > > > > + tree_to_aff_combination (addr_base, TREE_TYPE (addr_base), > > > &iv_addr_base); > > > > for (i = 0; i < data->vgroups.length (); i++) > > > > { > > > > struct iv_use *use; > > > > @@ -1694,8 +1711,8 @@ record_group_use (struct ivopts_data *data, tree > > > *use_p, > > > > > > > > /* Check if it has the same stripped base and step. */ > > > > if (operand_equal_p (iv->base_object, use->iv->base_object, 0) > > > > - && operand_equal_p (iv->step, use->iv->step, 0) > > > > - && operand_equal_p (addr_base, use->addr_base, 0)) > > > > + && affine_compare_eq (iv_step, use->iv->step) > > > > + && affine_compare_eq (iv_addr_base, use->addr_base)) > > > > > > There's only this use of addr_base so I think the opportunity is to > > > turn iv_use->addr_base into aff_tree (even though that's a quite big > > > representation). > > > > Ah, true. Will do. > > > > > > > > For the testcase, what are the two IVs we are comparing? I wonder > > > why you need the affine compare for iv->step? > > > > > > > Because step builds up the IV expressions and can also be an expression. > > > > In this case: > > > > >>> p debug (iv->step) > > ((unsigned long) stride.3_27) * 4 > > >>> p debug (use->iv->step) > > (sizetype) (stride.3_27 * 4) > > > > Note that the original expressions were both unsigned, but the STRIP_NOPS > > made one signed. They still wouldn't have compared equal though due to > > the different cast locations. > > > > For completeness the base here is > > > > >>> p debug (use->addr_base) > > (integer(kind=4) *) ((((((sizetype) prephitmp_64 - (sizetype) stride.3_27) - > (sizetype) stride.5_29) + (sizetype) _7) + (sizetype) stride.3_27) * 4 + (sizetype) > block.0_26) > > $9 = void > > > > >>> p debug (addr_base) > > (integer(kind=4) *) ((((((sizetype) prephitmp_64 - (sizetype) stride.3_27) - > (sizetype) stride.5_29) + (sizetype) _7) + (sizetype) stride.3_27) * 4 + (sizetype) > block.0_26) > > $10 = void > > > > > > > > > break; > > > > } > > > > if (i == data->vgroups.length ()) > > > > @@ -2231,6 +2248,14 @@ constant_multiple_of (tree top, tree bot, > widest_int > > > *mul) > > > > return true; > > > > } > > > > > > > > + aff_tree aff_top, aff_bot; > > > > + tree_to_aff_combination (top, TREE_TYPE (top), &aff_top); > > > > + tree_to_aff_combination (bot, TREE_TYPE (bot), &aff_bot); > > > > + poly_widest_int poly_mul; > > > > + if (aff_combination_constant_multiple_p (&aff_top, &aff_bot, &poly_mul) > > > > + && poly_mul.is_constant (mul)) > > > > + return true; > > > > + > > > > > > So why does stripping nops not work here? > > > > So this is where we compare different IV expressions to determine which > > IVs compute the same thing and thus can be in the same group. > > > > The STRIP_NOPS don't work because while the incoming types are the same > > the casts are different. So: > > > > >>> p debug (ustep) > > (unsigned long) stride.3_27 * 4 > > $3 = void > > >>> p debug (cstep) > > (unsigned long) (stride.3_27 * 4) > > $4 = void > > > > Which is of course stripped to: > > > > >>> p debug (top) > > (unsigned long) stride.3_27 * 4 > > $1 = void > > >>> p debug (bot) > > stride.3_27 * 4 > > So we're expecting constant_multiple_of to compute *mul == 1, proving > equality? > Indeed > The function seems to try stripping ops from TOP until it reaches > an expression equal to BOT and that's what fails to trigger here. > > What I originally wondered was iff we compute the affine combinations > why not use only aff_combination_constant_multiple_p? > I think that's probably easier, The rest of the code seems to indeed be repeating the work of aff_combination_constant_multiple_p. I can try replacing the whole thing with aff_combination_constant_multiple_p and see? > I might also point back to the idea I threw in somewhere, adding > OEP_VALUE (or a better name) to the set of flags accepted by > operand_equal_p. You mentioned hashing IIRC but I don't see the patches > touching hashing? > Yes, That can indeed be done with this approach. The hashing was that before I was trying to prevent the "duplicate" IV expressions from being created in the first place by modifying get_loop_invariant_expr. This function looks up if we have already seen a particular IV expression and if we have it just returns that expression. However after reading more of the code I realized this wasn't the right approach, as without also dealing with the candidates we'd end up creating IV expression that can't be handled by any candidate. IVops would just give up then. Reading the code it seems that get_loop_invariant_expr is just there to prevent blatant duplicates. i.e. it treats `(signed) a` and `a` as the same. This is also why I think that everywhere else *has* to continue stripping the expression. On a note from Richard S that he thought IVopts already had some code to deal with expressions that differ only in signs led me to take a different approach. The goal wasn't to remove the different sign/unsigned IV expressions, but instead get Then to be servable by the same candidates. i.e. we want them in the same candidate groups and then candidate optimization will just do its thing. That seemed a more natural fit to how it worked. Do you want me to try the operand_equal_p approach? Though in this case the issue is we not only need to know they're equal, but also need to know the scale factor. get_computation_aff_1 scales the common type IV by the scale we determined, so I think operand_equal_p would not be very useful here. But it does look like constant_multiple_of can just be implemented with aff_combination_constant_multiple_p. Should I try? Thanks, Tamar > > Both of these compute the same thing so by doing the affine compare they > > end up in the same IV groups and can be costed. Later during candidate selection > > these are the steps we're comparing to see if the candidate is the same invariant. > > > > The full list is: > > > > <IV Groups>: > > Group 0: > > Type: REFERENCE ADDRESS > > Use 0.0: > > At stmt: *_4 = _33; > > At pos: *_4 > > IV struct: > > Type: integer(kind=4) * > > Base: (integer(kind=4) *) ((((unsigned long) stride.3_27 + (unsigned long) > _36) * 4 + (unsigned long) block.0_26) + 4) > > Step: (sizetype) (stride.3_27 * 4) > > Object: (void *) block.0_26 > > Biv: N > > Overflowness wrto loop niter: Overflow > > Use 0.1: > > At stmt: *_78 = _33; > > At pos: *_78 > > IV struct: > > Type: integer(kind=4) * > > Base: (integer(kind=4) *) ((((unsigned long) stride.3_27 + (unsigned long) > _36) * 4 + (unsigned long) block.0_26) + 8) > > Step: (unsigned long) stride.3_27 * 4 > > Object: (void *) block.0_26 > > Biv: N > > Overflowness wrto loop niter: Overflow > > Use 0.2: > > At stmt: *_45 = _33; > > At pos: *_45 > > IV struct: > > Type: integer(kind=4) * > > Base: (integer(kind=4) *) ((((unsigned long) stride.3_27 + (unsigned long) > _36) * 4 + (unsigned long) block.0_26) + 12) > > Step: (unsigned long) stride.3_27 * 4 > > Object: (void *) block.0_26 > > Biv: N > > Overflowness wrto loop niter: Overflow > > > > And all these IVs are the same but with a slightly different base offset. By doing > the affine compare here IV ops sees that > > The best candidate is: > > > > Candidate 6: > > Depend on inv.exprs: 1 > > Var befor: ivtmp.26_51 > > Var after: ivtmp.26_72 > > Incr POS: before exit test > > IV struct: > > Type: unsigned long > > Base: ((unsigned long) stride.3_27 + (unsigned long) _36) * 4 + (unsigned > long) block.0_26 > > Step: (unsigned long) (stride.3_27 * 4) > > Object: (void *) block.0_26 > > Biv: N > > Overflowness wrto loop niter: Overflow > > > > And just builds the three IVs from the same candidate. > > > > Does this cover what you wanted? > > > > Cheers, > > Tamar > > > > > > > > > code = TREE_CODE (top); > > > > switch (code) > > > > { > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > Richard Biener <rguenther@suse.de> > > > SUSE Software Solutions Germany GmbH, > > > Frankenstrasse 146, 90461 Nuernberg, Germany; > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG > Nuernberg) > > > > -- > Richard Biener <rguenther@suse.de> > SUSE Software Solutions Germany GmbH, > Frankenstrasse 146, 90461 Nuernberg, Germany; > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
On Mon, 24 Jun 2024, Tamar Christina wrote: > > > > -----Original Message----- > > From: Richard Biener <rguenther@suse.de> > > Sent: Thursday, June 20, 2024 8:49 AM > > To: Tamar Christina <Tamar.Christina@arm.com> > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; bin.cheng@linux.alibaba.com > > Subject: RE: [PATCH][ivopts]: use affine_tree when comparing IVs during candidate > > selection [PR114932] > > > > On Wed, 19 Jun 2024, Tamar Christina wrote: > > > > > > -----Original Message----- > > > > From: Richard Biener <rguenther@suse.de> > > > > Sent: Wednesday, June 19, 2024 12:55 PM > > > > To: Tamar Christina <Tamar.Christina@arm.com> > > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; > > bin.cheng@linux.alibaba.com > > > > Subject: Re: [PATCH][ivopts]: use affine_tree when comparing IVs during > > candidate > > > > selection [PR114932] > > > > > > > > On Fri, 14 Jun 2024, Tamar Christina wrote: > > > > > > > > > Hi All, > > > > > > > > > > IVOPTS normally uses affine trees to perform comparisons between different > > IVs, > > > > > but these seem to have been missing in two key spots and instead normal > > tree > > > > > equivalencies used. > > > > > > > > > > In some cases where we have a structural equivalence but not a signedness > > > > > equivalencies we end up generating both a signed and unsigned IV for the > > same > > > > > candidate. > > > > > > > > > > This happens quite a lot with fortran but can also happen in C because this > > came > > > > > code is unable to figure out when one expression is a multiple of another. > > > > > > > > > > As an example in the attached testcase we get: > > > > > > > > > > Initial set of candidates: > > > > > cost: 24 (complexity 3) > > > > > reg_cost: 9 > > > > > cand_cost: 15 > > > > > cand_group_cost: 0 (complexity 3) > > > > > candidates: 1, 6, 8 > > > > > group:0 --> iv_cand:6, cost=(0,1) > > > > > group:1 --> iv_cand:1, cost=(0,0) > > > > > group:2 --> iv_cand:8, cost=(0,1) > > > > > group:3 --> iv_cand:8, cost=(0,1) > > > > > invariant variables: 6 > > > > > invariant expressions: 1, 2 > > > > > > > > > > <Invariant Expressions>: > > > > > inv_expr 1: stride.3_27 * 4 > > > > > inv_expr 2: (unsigned long) stride.3_27 * 4 > > > > > > > > > > These end up being used in the same group: > > > > > > > > > > Group 1: > > > > > cand cost compl. inv.expr. inv.vars > > > > > 1 0 0 NIL; 6 > > > > > 2 0 0 NIL; 6 > > > > > 3 0 0 NIL; 6 > > > > > > > > > > which ends up with IV opts picking the signed and unsigned IVs: > > > > > > > > > > Improved to: > > > > > cost: 24 (complexity 3) > > > > > reg_cost: 9 > > > > > cand_cost: 15 > > > > > cand_group_cost: 0 (complexity 3) > > > > > candidates: 1, 6, 8 > > > > > group:0 --> iv_cand:6, cost=(0,1) > > > > > group:1 --> iv_cand:1, cost=(0,0) > > > > > group:2 --> iv_cand:8, cost=(0,1) > > > > > group:3 --> iv_cand:8, cost=(0,1) > > > > > invariant variables: 6 > > > > > invariant expressions: 1, 2 > > > > > > > > > > and so generates the same IV as both signed and unsigned: > > > > > > > > > > ;; basic block 21, loop depth 3, count 214748368 (estimated locally, freq > > > > 58.2545), maybe hot > > > > > ;; prev block 28, next block 31, flags: (NEW, REACHABLE, VISITED) > > > > > ;; pred: 28 [always] count:23622320 (estimated locally, freq 6.4080) > > > > (FALLTHRU,EXECUTABLE) > > > > > ;; 25 [always] count:191126046 (estimated locally, freq 51.8465) > > > > (FALLTHRU,DFS_BACK,EXECUTABLE) > > > > > # .MEM_66 = PHI <.MEM_34(28), .MEM_22(25)> > > > > > # ivtmp.22_41 = PHI <0(28), ivtmp.22_82(25)> > > > > > # ivtmp.26_51 = PHI <ivtmp.26_55(28), ivtmp.26_72(25)> > > > > > # ivtmp.28_90 = PHI <ivtmp.28_99(28), ivtmp.28_98(25)> > > > > > > > > > > ... > > > > > > > > > > ;; basic block 24, loop depth 3, count 214748366 (estimated locally, freq > > > > 58.2545), maybe hot > > > > > ;; prev block 22, next block 25, flags: (NEW, REACHABLE, VISITED)' > > > > > ;; pred: 22 [always] count:95443719 (estimated locally, freq 25.8909) > > > > (FALLTHRU) > > > > ;; 21 [33.3% (guessed)] count:71582790 (estimated locally, freq > > 19.4182) > > > > (TRUE_VALUE,EXECUTABLE) > > > > ;; 31 [33.3% (guessed)] count:47721860 (estimated locally, freq > > 12.9455) > > > > (TRUE_VALUE,EXECUTABLE) > > > > # .MEM_22 = PHI <.MEM_44(22), .MEM_31(21), .MEM_79(31)> > > > > > > > > ivtmp.22_82 = ivtmp.22_41 + 1; > > > > ivtmp.26_72 = ivtmp.26_51 + _80; > > > > ivtmp.28_98 = ivtmp.28_90 + _39; > > > > > > > > > > These two IVs are always used as unsigned, so IV ops generates: > > > > > > > > > > _73 = stride.3_27 * 4; > > > > > _80 = (unsigned long) _73; > > > > > _54 = (unsigned long) stride.3_27; > > > > > _39 = _54 * 4; > > > > > > > > > > Which means that in e.g. exchange2 we generate a lot of duplicate code. > > > > > > > > > > This is because candidate 6 and 8 are structurally equivalent but have > > different > > > > > signs. > > > > > > > > > > This patch changes it so that if you have two IVs that are affine equivalent to > > > > > just pick one over the other. IV already has code for this, so the patch just > > > > > uses affine trees instead of tree for the check. > > > > > > > > > > With it we get: > > > > > > > > > > <Invariant Expressions>: > > > > > inv_expr 1: stride.3_27 * 4 > > > > > > > > > > <Group-candidate Costs>: > > > > > Group 0: > > > > > cand cost compl. inv.expr. inv.vars > > > > > 5 0 2 NIL; NIL; > > > > > 6 0 3 NIL; NIL; > > > > > > > > > > Group 1: > > > > > cand cost compl. inv.expr. inv.vars > > > > > 1 0 0 NIL; 6 > > > > > 2 0 0 NIL; 6 > > > > > 3 0 0 NIL; 6 > > > > > 4 0 0 NIL; 6 > > > > > > > > > > Initial set of candidates: > > > > > cost: 16 (complexity 3) > > > > > reg_cost: 6 > > > > > cand_cost: 10 > > > > > cand_group_cost: 0 (complexity 3) > > > > > candidates: 1, 6 > > > > > group:0 --> iv_cand:6, cost=(0,3) > > > > > group:1 --> iv_cand:1, cost=(0,0) > > > > > invariant variables: 6 > > > > > invariant expressions: 1 > > > > > > > > > > The two patches together results in a 10% performance increase in exchange2 > > in > > > > > SPECCPU 2017 and a 4% reduction in binary size and a 5% improvement in > > > > compile > > > > > time. There's also a 5% performance improvement in fotonik3d and similar > > > > > reduction in binary size. > > > > > > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > > > > > > > > > Ok for master? > > > > > > > > > > Thanks, > > > > > Tamar > > > > > > > > > > gcc/ChangeLog: > > > > > > > > > > PR tree-optimization/114932 > > > > > * tree-affine.cc (aff_combination_constant_multiple_p): Take zero > > > > > offsets into account. > > > > > * tree-ssa-loop-ivopts.cc (affine_compare_eq): New. > > > > > (record_group_use): Use it. > > > > > (constant_multiple_of): Also check equality under > > > > > aff_combination_constant_multiple_p. > > > > > > > > > > gcc/testsuite/ChangeLog: > > > > > > > > > > PR tree-optimization/114932 > > > > > * gfortran.dg/addressing-modes_2.f90: New test. > > > > > > > > > > --- > > > > > diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 > > > > b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 > > > > > new file mode 100644 > > > > > index > > > > > > 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c > > > > 2e24a4973c8539fae > > > > > --- /dev/null > > > > > +++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 > > > > > @@ -0,0 +1,20 @@ > > > > > +! { dg-do compile { target aarch64-*-* } } > > > > > +! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" } > > > > > + > > > > > +module a > > > > > +integer, parameter :: b=3, c=b > > > > > +contains > > > > > +subroutine d(block) > > > > > +integer f, col , block(:, :, :), e > > > > > +do f = 1, c > > > > > + do col = 1, c > > > > > + block(:f, :, e()) = do > > > > > + end do > > > > > + end do > > > > > + end > > > > > + end > > > > > + > > > > > +! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 3 IVs:} > > ivopts > > > > } } > > > > > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 2 IVs:} > > 1 > > > > ivopts } } > > > > > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 1 IVs:} > > 1 > > > > ivopts } } > > > > > + > > > > > diff --git a/gcc/tree-affine.cc b/gcc/tree-affine.cc > > > > > index > > > > > > d6309c4390362b680f0aa97a41fac3281ade66fd..b141bf23c1bbea1001b1bb28 > > > > 6346890ddeab4096 100644 > > > > > --- a/gcc/tree-affine.cc > > > > > +++ b/gcc/tree-affine.cc > > > > > @@ -941,6 +941,13 @@ aff_combination_constant_multiple_p (aff_tree > > *val, > > > > aff_tree *div, > > > > > &mult_set, mult)) > > > > > return false; > > > > > > > > > > + /* Everything is a multiple of 0, which means we shoudn't enforce that > > > > > + mult_set is checked, since that forced the only valid multiple of > > > > > + val and div to be 0 whereas 1 is also possible. */ > > > > > + if (known_eq (val->offset, 0) > > > > > + && known_eq (div->offset, 0)) > > > > > + mult_set = false; > > > > > + > > > > > > > > In fact all numbers are possible? Shouldn't this be better handled > > > > in wide_int_constant_multiple_p by special-casing > > > > known_eq (div, 0) in the known_eq (val, 0) condition by simply > > > > returning 'true' without checking or setting *mult_set? > > > > > > > > The docs of wide_int_constant_multiple_p is odd: > > > > > > > > /* If VAL != CST * DIV for any constant CST, returns false. > > > > > > > > should that be 'If VAL == CST * DIV for no constant CST, returns false.'? > > > > Or s/any/all/? > > > > > > > > > > Yeah, that condition would always be false. > > > > Can you split out a patch to fix this? > > Doing now. > > > > > > > > for (i = 0; i < div->n; i++) > > > > > { > > > > > class aff_comb_elt *elt > > > > > diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc > > > > > index > > > > > > 7a277aaf18a9e0a32b8ac0d23332b7cd9945ef98..b11bd62a86092ba972a6487 > > > > 64cd2facd9ddb4914 100644 > > > > > --- a/gcc/tree-ssa-loop-ivopts.cc > > > > > +++ b/gcc/tree-ssa-loop-ivopts.cc > > > > > @@ -757,6 +757,19 @@ single_dom_exit (class loop *loop) > > > > > return exit; > > > > > } > > > > > > > > > > +/* Compares the given affine tree LEFT with the tree expression RIGHT and > > > > return > > > > > + whether they are the same under affine equality. */ > > > > > + > > > > > +static bool > > > > > +affine_compare_eq (aff_tree &left, tree right) > > > > > +{ > > > > > + aff_tree aff_right; > > > > > + tree_to_aff_combination (right, TREE_TYPE (right), &aff_right); > > > > > + aff_combination_scale (&aff_right, -1); > > > > > + aff_combination_add (&aff_right, &left); > > > > > + return aff_combination_zero_p (&aff_right); > > > > > +} > > > > > + > > > > > > > > > > /* Given a nested expression in ARG consisting of PLUS or MULT try to see if > > one > > > > > of the arguments of each expression is a constant and that the type of the > > > > > @@ -1673,6 +1686,9 @@ record_group_use (struct ivopts_data *data, tree > > > > *use_p, > > > > > tree addr_base = NULL; > > > > > struct iv_group *group = NULL; > > > > > poly_uint64 addr_offset = 0; > > > > > + aff_tree iv_step, iv_addr_base; > > > > > + > > > > > + tree_to_aff_combination (iv->step, TREE_TYPE (iv->step), &iv_step); > > > > > > > > > > /* Record non address type use in a new group. */ > > > > > if (address_p (type)) > > > > > @@ -1683,6 +1699,7 @@ record_group_use (struct ivopts_data *data, tree > > > > *use_p, > > > > > tree addr_toffset; > > > > > split_constant_offset (iv->base, &addr_base, &addr_toffset); > > > > > addr_offset = int_cst_value (addr_toffset); > > > > > + tree_to_aff_combination (addr_base, TREE_TYPE (addr_base), > > > > &iv_addr_base); > > > > > for (i = 0; i < data->vgroups.length (); i++) > > > > > { > > > > > struct iv_use *use; > > > > > @@ -1694,8 +1711,8 @@ record_group_use (struct ivopts_data *data, tree > > > > *use_p, > > > > > > > > > > /* Check if it has the same stripped base and step. */ > > > > > if (operand_equal_p (iv->base_object, use->iv->base_object, 0) > > > > > - && operand_equal_p (iv->step, use->iv->step, 0) > > > > > - && operand_equal_p (addr_base, use->addr_base, 0)) > > > > > + && affine_compare_eq (iv_step, use->iv->step) > > > > > + && affine_compare_eq (iv_addr_base, use->addr_base)) > > > > > > > > There's only this use of addr_base so I think the opportunity is to > > > > turn iv_use->addr_base into aff_tree (even though that's a quite big > > > > representation). > > > > > > Ah, true. Will do. > > > > > > > > > > > For the testcase, what are the two IVs we are comparing? I wonder > > > > why you need the affine compare for iv->step? > > > > > > > > > > Because step builds up the IV expressions and can also be an expression. > > > > > > In this case: > > > > > > >>> p debug (iv->step) > > > ((unsigned long) stride.3_27) * 4 > > > >>> p debug (use->iv->step) > > > (sizetype) (stride.3_27 * 4) > > > > > > Note that the original expressions were both unsigned, but the STRIP_NOPS > > > made one signed. They still wouldn't have compared equal though due to > > > the different cast locations. > > > > > > For completeness the base here is > > > > > > >>> p debug (use->addr_base) > > > (integer(kind=4) *) ((((((sizetype) prephitmp_64 - (sizetype) stride.3_27) - > > (sizetype) stride.5_29) + (sizetype) _7) + (sizetype) stride.3_27) * 4 + (sizetype) > > block.0_26) > > > $9 = void > > > > > > >>> p debug (addr_base) > > > (integer(kind=4) *) ((((((sizetype) prephitmp_64 - (sizetype) stride.3_27) - > > (sizetype) stride.5_29) + (sizetype) _7) + (sizetype) stride.3_27) * 4 + (sizetype) > > block.0_26) > > > $10 = void > > > > > > > > > > > > break; > > > > > } > > > > > if (i == data->vgroups.length ()) > > > > > @@ -2231,6 +2248,14 @@ constant_multiple_of (tree top, tree bot, > > widest_int > > > > *mul) > > > > > return true; > > > > > } > > > > > > > > > > + aff_tree aff_top, aff_bot; > > > > > + tree_to_aff_combination (top, TREE_TYPE (top), &aff_top); > > > > > + tree_to_aff_combination (bot, TREE_TYPE (bot), &aff_bot); > > > > > + poly_widest_int poly_mul; > > > > > + if (aff_combination_constant_multiple_p (&aff_top, &aff_bot, &poly_mul) > > > > > + && poly_mul.is_constant (mul)) > > > > > + return true; > > > > > + > > > > > > > > So why does stripping nops not work here? > > > > > > So this is where we compare different IV expressions to determine which > > > IVs compute the same thing and thus can be in the same group. > > > > > > The STRIP_NOPS don't work because while the incoming types are the same > > > the casts are different. So: > > > > > > >>> p debug (ustep) > > > (unsigned long) stride.3_27 * 4 > > > $3 = void > > > >>> p debug (cstep) > > > (unsigned long) (stride.3_27 * 4) > > > $4 = void > > > > > > Which is of course stripped to: > > > > > > >>> p debug (top) > > > (unsigned long) stride.3_27 * 4 > > > $1 = void > > > >>> p debug (bot) > > > stride.3_27 * 4 > > > > So we're expecting constant_multiple_of to compute *mul == 1, proving > > equality? > > > > Indeed > > > The function seems to try stripping ops from TOP until it reaches > > an expression equal to BOT and that's what fails to trigger here. > > > > What I originally wondered was iff we compute the affine combinations > > why not use only aff_combination_constant_multiple_p? > > > > I think that's probably easier, The rest of the code seems to indeed be > repeating the work of aff_combination_constant_multiple_p. > > I can try replacing the whole thing with aff_combination_constant_multiple_p > and see? Yes. > > I might also point back to the idea I threw in somewhere, adding > > OEP_VALUE (or a better name) to the set of flags accepted by > > operand_equal_p. You mentioned hashing IIRC but I don't see the patches > > touching hashing? > > > > Yes, That can indeed be done with this approach. The hashing was that before I > was trying to prevent the "duplicate" IV expressions from being created in the > first place by modifying get_loop_invariant_expr. > > This function looks up if we have already seen a particular IV expression and if > we have it just returns that expression. However after reading more of the code > I realized this wasn't the right approach, as without also dealing with the candidates > we'd end up creating IV expression that can't be handled by any candidate. > > IVops would just give up then. Reading the code it seems that get_loop_invariant_expr > is just there to prevent blatant duplicates. i.e. it treats `(signed) a` and `a` as the same. > > This is also why I think that everywhere else *has* to continue stripping the expression. > > On a note from Richard S that he thought IVopts already had some code to deal with > expressions that differ only in signs led me to take a different approach. > > The goal wasn't to remove the different sign/unsigned IV expressions, but instead get > Then to be servable by the same candidates. i.e. we want them in the same candidate > groups and then candidate optimization will just do its thing. > > That seemed a more natural fit to how it worked. Yeah, I agree that sounds like the better strathegy. > Do you want me to try the operand_equal_p approach? Though in this case the issue > is we not only need to know they're equal, but also need to know the scale factor. For this case yes, but if you'd keep the code as-is, the equal with scale factor one case would be fixed. Not a case with different scale factors though - but conversions "elsewhere" should be handled via the stripping. So it would work to simply adjust the operand_equal_p check here? > get_computation_aff_1 scales the common type IV by the scale we determined, > so I think operand_equal_p would not be very useful here. But it does look like > constant_multiple_of can just be implemented with aff_combination_constant_multiple_p. > > Should I try? You've had the other places where you replace operand_equal_p with affine-compute and compare. As said that has some associated cost as well as a limit on the number of elements after which it resorts back to operand_equal_p. So for strict equality tests implementing a weaker operand_equal_p might be a better solution. Richard. > Thanks, > Tamar > > > > > Both of these compute the same thing so by doing the affine compare they > > > end up in the same IV groups and can be costed. Later during candidate selection > > > these are the steps we're comparing to see if the candidate is the same invariant. > > > > > > The full list is: > > > > > > <IV Groups>: > > > Group 0: > > > Type: REFERENCE ADDRESS > > > Use 0.0: > > > At stmt: *_4 = _33; > > > At pos: *_4 > > > IV struct: > > > Type: integer(kind=4) * > > > Base: (integer(kind=4) *) ((((unsigned long) stride.3_27 + (unsigned long) > > _36) * 4 + (unsigned long) block.0_26) + 4) > > > Step: (sizetype) (stride.3_27 * 4) > > > Object: (void *) block.0_26 > > > Biv: N > > > Overflowness wrto loop niter: Overflow > > > Use 0.1: > > > At stmt: *_78 = _33; > > > At pos: *_78 > > > IV struct: > > > Type: integer(kind=4) * > > > Base: (integer(kind=4) *) ((((unsigned long) stride.3_27 + (unsigned long) > > _36) * 4 + (unsigned long) block.0_26) + 8) > > > Step: (unsigned long) stride.3_27 * 4 > > > Object: (void *) block.0_26 > > > Biv: N > > > Overflowness wrto loop niter: Overflow > > > Use 0.2: > > > At stmt: *_45 = _33; > > > At pos: *_45 > > > IV struct: > > > Type: integer(kind=4) * > > > Base: (integer(kind=4) *) ((((unsigned long) stride.3_27 + (unsigned long) > > _36) * 4 + (unsigned long) block.0_26) + 12) > > > Step: (unsigned long) stride.3_27 * 4 > > > Object: (void *) block.0_26 > > > Biv: N > > > Overflowness wrto loop niter: Overflow > > > > > > And all these IVs are the same but with a slightly different base offset. By doing > > the affine compare here IV ops sees that > > > The best candidate is: > > > > > > Candidate 6: > > > Depend on inv.exprs: 1 > > > Var befor: ivtmp.26_51 > > > Var after: ivtmp.26_72 > > > Incr POS: before exit test > > > IV struct: > > > Type: unsigned long > > > Base: ((unsigned long) stride.3_27 + (unsigned long) _36) * 4 + (unsigned > > long) block.0_26 > > > Step: (unsigned long) (stride.3_27 * 4) > > > Object: (void *) block.0_26 > > > Biv: N > > > Overflowness wrto loop niter: Overflow > > > > > > And just builds the three IVs from the same candidate. > > > > > > Does this cover what you wanted? > > > > > > Cheers, > > > Tamar > > > > > > > > > > > > code = TREE_CODE (top); > > > > > switch (code) > > > > > { > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > Richard Biener <rguenther@suse.de> > > > > SUSE Software Solutions Germany GmbH, > > > > Frankenstrasse 146, 90461 Nuernberg, Germany; > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG > > Nuernberg) > > > > > > > -- > > Richard Biener <rguenther@suse.de> > > SUSE Software Solutions Germany GmbH, > > Frankenstrasse 146, 90461 Nuernberg, Germany; > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg) >
> > > I might also point back to the idea I threw in somewhere, adding > > > OEP_VALUE (or a better name) to the set of flags accepted by > > > operand_equal_p. You mentioned hashing IIRC but I don't see the patches > > > touching hashing? > > > > > > > Yes, That can indeed be done with this approach. The hashing was that before I > > was trying to prevent the "duplicate" IV expressions from being created in the > > first place by modifying get_loop_invariant_expr. > > > > This function looks up if we have already seen a particular IV expression and if > > we have it just returns that expression. However after reading more of the code > > I realized this wasn't the right approach, as without also dealing with the > candidates > > we'd end up creating IV expression that can't be handled by any candidate. > > > > IVops would just give up then. Reading the code it seems that > get_loop_invariant_expr > > is just there to prevent blatant duplicates. i.e. it treats `(signed) a` and `a` as the > same. > > > > This is also why I think that everywhere else *has* to continue stripping the > expression. > > > > On a note from Richard S that he thought IVopts already had some code to deal > with > > expressions that differ only in signs led me to take a different approach. > > > > The goal wasn't to remove the different sign/unsigned IV expressions, but > instead get > > Then to be servable by the same candidates. i.e. we want them in the same > candidate > > groups and then candidate optimization will just do its thing. > > > > That seemed a more natural fit to how it worked. > > Yeah, I agree that sounds like the better strathegy. > > > Do you want me to try the operand_equal_p approach? Though in this case the > issue > > is we not only need to know they're equal, but also need to know the scale > factor. > > For this case yes, but if you'd keep the code as-is, the equal with scale > factor one case would be fixed. Not a case with different scale factors > though - but conversions "elsewhere" should be handled via the stripping. > So it would work to simply adjust the operand_equal_p check here? > > > get_computation_aff_1 scales the common type IV by the scale we determined, > > so I think operand_equal_p would not be very useful here. But it does look like > > constant_multiple_of can just be implemented with > aff_combination_constant_multiple_p. > > > > Should I try? > > You've had the other places where you replace operand_equal_p with > affine-compute and compare. As said that has some associated cost > as well as a limit on the number of elements after which it resorts > back to operand_equal_p. So for strict equality tests implementing > a weaker operand_equal_p might be a better solution. > The structural comparison is implemented as a new mode for operand_equal_p which compares two expressions ignoring NOP converts (unless their bitsizes differ) and ignoring constant values, but requiring both operands be a constant. There is one downside compared to affine comparison, in that this approach does not deal well with commutative operations. i.e. it does not see a + (b + c) as equivalent to c + (b + a). This means we lose out on some of the more complicated addressing modes, but with so many operations the address will likely be split anyway and we'll deal with it then. Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu -m32, -m64 and no issues. Ok for master? Thanks, Tamar gcc/ChangeLog: PR tree-optimization/114932 * fold-const.cc (operand_compare::operand_equal_p): Use it. (operand_compare::verify_hash_value): Likewise. * tree-core.h (enum operand_equal_flag): Add OEP_STRUCTURAL_EQ. * tree-ssa-loop-ivopts.cc (record_group_use): Check for structural eq. gcc/testsuite/ChangeLog: PR tree-optimization/114932 * gfortran.dg/addressing-modes_2.f90: New test. -- inline copy of -- diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc index 710d697c0217c784b34f9f9f7b00b1945369076a..3d43020541c082c094164724da9d17fbb5793237 100644 --- a/gcc/fold-const.cc +++ b/gcc/fold-const.cc @@ -3191,6 +3191,9 @@ operand_compare::operand_equal_p (const_tree arg0, const_tree arg1, precision differences. */ if (TREE_CODE (arg0) == INTEGER_CST && TREE_CODE (arg1) == INTEGER_CST) { + if (flags & OEP_STRUCTURAL_EQ) + return true; + /* Address of INTEGER_CST is not defined; check that we did not forget to drop the OEP_ADDRESS_OF flags. */ gcc_checking_assert (!(flags & OEP_ADDRESS_OF)); @@ -3204,7 +3207,8 @@ operand_compare::operand_equal_p (const_tree arg0, const_tree arg1, because they may change the signedness of the arguments. As pointers strictly don't have a signedness, require either two pointers or two non-pointers as well. */ - if (TYPE_UNSIGNED (TREE_TYPE (arg0)) != TYPE_UNSIGNED (TREE_TYPE (arg1)) + if ((TYPE_UNSIGNED (TREE_TYPE (arg0)) != TYPE_UNSIGNED (TREE_TYPE (arg1)) + && !(flags & OEP_STRUCTURAL_EQ)) || POINTER_TYPE_P (TREE_TYPE (arg0)) != POINTER_TYPE_P (TREE_TYPE (arg1))) return false; @@ -4204,7 +4208,7 @@ operand_compare::verify_hash_value (const_tree arg0, const_tree arg1, { if (operand_equal_p (arg0, arg1, flags | OEP_NO_HASH_CHECK)) { - if (arg0 != arg1 && !(flags & OEP_DECL_NAME)) + if (arg0 != arg1 && !(flags & (OEP_DECL_NAME | OEP_STRUCTURAL_EQ))) { inchash::hash hstate0 (0), hstate1 (0); hash_operand (arg0, hstate0, flags | OEP_HASH_CHECK); diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 new file mode 100644 index 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c2e24a4973c8539fae --- /dev/null +++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 @@ -0,0 +1,20 @@ +! { dg-do compile { target aarch64-*-* } } +! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" } + +module a +integer, parameter :: b=3, c=b +contains +subroutine d(block) +integer f, col , block(:, :, :), e +do f = 1, c + do col = 1, c + block(:f, :, e()) = do + end do + end do + end + end + +! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 3 IVs:} ivopts } } +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 2 IVs:} 1 ivopts } } +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 1 IVs:} 1 ivopts } } + diff --git a/gcc/tree-core.h b/gcc/tree-core.h index 27c569c77022227643151fa4a909a95f3d45bf55..fe398d99bcaa9e8fcec45e5e07d133bc393aa4a7 100644 --- a/gcc/tree-core.h +++ b/gcc/tree-core.h @@ -962,7 +962,9 @@ enum operand_equal_flag { /* In conjunction with OEP_LEXICOGRAPHIC considers names of declarations of the same kind. Used to compare VLA bounds involving parameters across redeclarations of the same function. */ - OEP_DECL_NAME = 512 + OEP_DECL_NAME = 512, + /* Check for structural equivalence and ignore NOP_CONVERT casts. */ + OEP_STRUCTURAL_EQ = 1024 }; /* Enum and arrays used for tree allocation stats. diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc index c3218a3e8eedbb8d0a7f14c01eeb069cb6024c29..b05e4ac1e63b5c110707a8a3ed5e614b7ccc702f 100644 --- a/gcc/tree-ssa-loop-ivopts.cc +++ b/gcc/tree-ssa-loop-ivopts.cc @@ -1623,8 +1623,8 @@ record_group_use (struct ivopts_data *data, tree *use_p, /* Check if it has the same stripped base and step. */ if (operand_equal_p (iv->base_object, use->iv->base_object, 0) - && operand_equal_p (iv->step, use->iv->step, 0) - && operand_equal_p (addr_base, use->addr_base, 0)) + && operand_equal_p (iv->step, use->iv->step, OEP_STRUCTURAL_EQ) + && operand_equal_p (addr_base, use->addr_base, OEP_STRUCTURAL_EQ)) break; } if (i == data->vgroups.length ())
On Wed, 10 Jul 2024, Tamar Christina wrote: > > > > I might also point back to the idea I threw in somewhere, adding > > > > OEP_VALUE (or a better name) to the set of flags accepted by > > > > operand_equal_p. You mentioned hashing IIRC but I don't see the patches > > > > touching hashing? > > > > > > > > > > Yes, That can indeed be done with this approach. The hashing was that before I > > > was trying to prevent the "duplicate" IV expressions from being created in the > > > first place by modifying get_loop_invariant_expr. > > > > > > This function looks up if we have already seen a particular IV expression and if > > > we have it just returns that expression. However after reading more of the code > > > I realized this wasn't the right approach, as without also dealing with the > > candidates > > > we'd end up creating IV expression that can't be handled by any candidate. > > > > > > IVops would just give up then. Reading the code it seems that > > get_loop_invariant_expr > > > is just there to prevent blatant duplicates. i.e. it treats `(signed) a` and `a` as the > > same. > > > > > > This is also why I think that everywhere else *has* to continue stripping the > > expression. > > > > > > On a note from Richard S that he thought IVopts already had some code to deal > > with > > > expressions that differ only in signs led me to take a different approach. > > > > > > The goal wasn't to remove the different sign/unsigned IV expressions, but > > instead get > > > Then to be servable by the same candidates. i.e. we want them in the same > > candidate > > > groups and then candidate optimization will just do its thing. > > > > > > That seemed a more natural fit to how it worked. > > > > Yeah, I agree that sounds like the better strathegy. > > > > > Do you want me to try the operand_equal_p approach? Though in this case the > > issue > > > is we not only need to know they're equal, but also need to know the scale > > factor. > > > > For this case yes, but if you'd keep the code as-is, the equal with scale > > factor one case would be fixed. Not a case with different scale factors > > though - but conversions "elsewhere" should be handled via the stripping. > > So it would work to simply adjust the operand_equal_p check here? > > > > > get_computation_aff_1 scales the common type IV by the scale we determined, > > > so I think operand_equal_p would not be very useful here. But it does look like > > > constant_multiple_of can just be implemented with > > aff_combination_constant_multiple_p. > > > > > > Should I try? > > > > You've had the other places where you replace operand_equal_p with > > affine-compute and compare. As said that has some associated cost > > as well as a limit on the number of elements after which it resorts > > back to operand_equal_p. So for strict equality tests implementing > > a weaker operand_equal_p might be a better solution. > > > > The structural comparison is implemented as a new mode for operand_equal_p which > compares two expressions ignoring NOP converts (unless their bitsizes differ) > and ignoring constant values, but requiring both operands be a constant. > > There is one downside compared to affine comparison, in that this approach does > not deal well with commutative operations. i.e. it does not see a + (b + c) as > equivalent to c + (b + a). > > This means we lose out on some of the more complicated addressing modes, but > with so many operations the address will likely be split anyway and we'll deal > with it then. > > Bootstrapped Regtested on aarch64-none-linux-gnu, > x86_64-pc-linux-gnu -m32, -m64 and no issues. > > Ok for master? > > Thanks, > Tamar > > gcc/ChangeLog: > > PR tree-optimization/114932 > * fold-const.cc (operand_compare::operand_equal_p): Use it. > (operand_compare::verify_hash_value): Likewise. > * tree-core.h (enum operand_equal_flag): Add OEP_STRUCTURAL_EQ. > * tree-ssa-loop-ivopts.cc (record_group_use): Check for structural eq. > > gcc/testsuite/ChangeLog: > > PR tree-optimization/114932 > * gfortran.dg/addressing-modes_2.f90: New test. > > -- inline copy of -- > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc > index 710d697c0217c784b34f9f9f7b00b1945369076a..3d43020541c082c094164724da9d17fbb5793237 100644 > --- a/gcc/fold-const.cc > +++ b/gcc/fold-const.cc > @@ -3191,6 +3191,9 @@ operand_compare::operand_equal_p (const_tree arg0, const_tree arg1, > precision differences. */ > if (TREE_CODE (arg0) == INTEGER_CST && TREE_CODE (arg1) == INTEGER_CST) > { > + if (flags & OEP_STRUCTURAL_EQ) > + return true; > + Hmm, so you ignore all constants? > /* Address of INTEGER_CST is not defined; check that we did not forget > to drop the OEP_ADDRESS_OF flags. */ > gcc_checking_assert (!(flags & OEP_ADDRESS_OF)); > @@ -3204,7 +3207,8 @@ operand_compare::operand_equal_p (const_tree arg0, const_tree arg1, > because they may change the signedness of the arguments. As pointers > strictly don't have a signedness, require either two pointers or > two non-pointers as well. */ > - if (TYPE_UNSIGNED (TREE_TYPE (arg0)) != TYPE_UNSIGNED (TREE_TYPE (arg1)) > + if ((TYPE_UNSIGNED (TREE_TYPE (arg0)) != TYPE_UNSIGNED (TREE_TYPE (arg1)) > + && !(flags & OEP_STRUCTURAL_EQ)) > || POINTER_TYPE_P (TREE_TYPE (arg0)) > != POINTER_TYPE_P (TREE_TYPE (arg1))) and all sign-conversions? I was thinking to implement a "value equivalence" check, so (int)((unsigned)i + (unsigned)j) would be equal to i + j for int i and j. Doesn't your patch still treat (int)(unsigned)i and i as different? And won't it treat (unsigned)i / 4 and (int)i / 4 as the same? I'm not sure this is the desired semantic for IVOPTs (or a useful one in general). That said, when OEP_STRUCTURAL_EQ ignores a sign-change it can only do that for operations where that maintains value-equivalence (in terms of twos-complement bit representation). OTOH if you ignore all differences in constants that's another thing. > return false; > @@ -4204,7 +4208,7 @@ operand_compare::verify_hash_value (const_tree arg0, const_tree arg1, > { > if (operand_equal_p (arg0, arg1, flags | OEP_NO_HASH_CHECK)) > { > - if (arg0 != arg1 && !(flags & OEP_DECL_NAME)) > + if (arg0 != arg1 && !(flags & (OEP_DECL_NAME | OEP_STRUCTURAL_EQ))) > { > inchash::hash hstate0 (0), hstate1 (0); > hash_operand (arg0, hstate0, flags | OEP_HASH_CHECK); > diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 > new file mode 100644 > index 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c2e24a4973c8539fae > --- /dev/null > +++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 > @@ -0,0 +1,20 @@ > +! { dg-do compile { target aarch64-*-* } } > +! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" } > + > +module a > +integer, parameter :: b=3, c=b > +contains > +subroutine d(block) > +integer f, col , block(:, :, :), e > +do f = 1, c > + do col = 1, c > + block(:f, :, e()) = do > + end do > + end do > + end > + end > + > +! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 3 IVs:} ivopts } } > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 2 IVs:} 1 ivopts } } > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 1 IVs:} 1 ivopts } } > + > diff --git a/gcc/tree-core.h b/gcc/tree-core.h > index 27c569c77022227643151fa4a909a95f3d45bf55..fe398d99bcaa9e8fcec45e5e07d133bc393aa4a7 100644 > --- a/gcc/tree-core.h > +++ b/gcc/tree-core.h > @@ -962,7 +962,9 @@ enum operand_equal_flag { > /* In conjunction with OEP_LEXICOGRAPHIC considers names of declarations > of the same kind. Used to compare VLA bounds involving parameters > across redeclarations of the same function. */ > - OEP_DECL_NAME = 512 > + OEP_DECL_NAME = 512, > + /* Check for structural equivalence and ignore NOP_CONVERT casts. */ > + OEP_STRUCTURAL_EQ = 1024 > }; > > /* Enum and arrays used for tree allocation stats. > diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc > index c3218a3e8eedbb8d0a7f14c01eeb069cb6024c29..b05e4ac1e63b5c110707a8a3ed5e614b7ccc702f 100644 > --- a/gcc/tree-ssa-loop-ivopts.cc > +++ b/gcc/tree-ssa-loop-ivopts.cc > @@ -1623,8 +1623,8 @@ record_group_use (struct ivopts_data *data, tree *use_p, > > /* Check if it has the same stripped base and step. */ > if (operand_equal_p (iv->base_object, use->iv->base_object, 0) > - && operand_equal_p (iv->step, use->iv->step, 0) > - && operand_equal_p (addr_base, use->addr_base, 0)) > + && operand_equal_p (iv->step, use->iv->step, OEP_STRUCTURAL_EQ) > + && operand_equal_p (addr_base, use->addr_base, OEP_STRUCTURAL_EQ)) > break; > } > if (i == data->vgroups.length ()) >
> -----Original Message----- > From: Richard Biener <rguenther@suse.de> > Sent: Thursday, July 11, 2024 1:10 PM > To: Tamar Christina <Tamar.Christina@arm.com> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; bin.cheng@linux.alibaba.com > Subject: RE: [PATCH][ivopts]: use affine_tree when comparing IVs during candidate > selection [PR114932] > > On Wed, 10 Jul 2024, Tamar Christina wrote: > > > > > > I might also point back to the idea I threw in somewhere, adding > > > > > OEP_VALUE (or a better name) to the set of flags accepted by > > > > > operand_equal_p. You mentioned hashing IIRC but I don't see the patches > > > > > touching hashing? > > > > > > > > > > > > > Yes, That can indeed be done with this approach. The hashing was that > before I > > > > was trying to prevent the "duplicate" IV expressions from being created in the > > > > first place by modifying get_loop_invariant_expr. > > > > > > > > This function looks up if we have already seen a particular IV expression and if > > > > we have it just returns that expression. However after reading more of the > code > > > > I realized this wasn't the right approach, as without also dealing with the > > > candidates > > > > we'd end up creating IV expression that can't be handled by any candidate. > > > > > > > > IVops would just give up then. Reading the code it seems that > > > get_loop_invariant_expr > > > > is just there to prevent blatant duplicates. i.e. it treats `(signed) a` and `a` as > the > > > same. > > > > > > > > This is also why I think that everywhere else *has* to continue stripping the > > > expression. > > > > > > > > On a note from Richard S that he thought IVopts already had some code to > deal > > > with > > > > expressions that differ only in signs led me to take a different approach. > > > > > > > > The goal wasn't to remove the different sign/unsigned IV expressions, but > > > instead get > > > > Then to be servable by the same candidates. i.e. we want them in the same > > > candidate > > > > groups and then candidate optimization will just do its thing. > > > > > > > > That seemed a more natural fit to how it worked. > > > > > > Yeah, I agree that sounds like the better strathegy. > > > > > > > Do you want me to try the operand_equal_p approach? Though in this case > the > > > issue > > > > is we not only need to know they're equal, but also need to know the scale > > > factor. > > > > > > For this case yes, but if you'd keep the code as-is, the equal with scale > > > factor one case would be fixed. Not a case with different scale factors > > > though - but conversions "elsewhere" should be handled via the stripping. > > > So it would work to simply adjust the operand_equal_p check here? > > > > > > > get_computation_aff_1 scales the common type IV by the scale we > determined, > > > > so I think operand_equal_p would not be very useful here. But it does look > like > > > > constant_multiple_of can just be implemented with > > > aff_combination_constant_multiple_p. > > > > > > > > Should I try? > > > > > > You've had the other places where you replace operand_equal_p with > > > affine-compute and compare. As said that has some associated cost > > > as well as a limit on the number of elements after which it resorts > > > back to operand_equal_p. So for strict equality tests implementing > > > a weaker operand_equal_p might be a better solution. > > > > > > > The structural comparison is implemented as a new mode for operand_equal_p > which > > compares two expressions ignoring NOP converts (unless their bitsizes differ) > > and ignoring constant values, but requiring both operands be a constant. > > > > There is one downside compared to affine comparison, in that this approach > does > > not deal well with commutative operations. i.e. it does not see a + (b + c) as > > equivalent to c + (b + a). > > > > This means we lose out on some of the more complicated addressing modes, but > > with so many operations the address will likely be split anyway and we'll deal > > with it then. > > > > Bootstrapped Regtested on aarch64-none-linux-gnu, > > x86_64-pc-linux-gnu -m32, -m64 and no issues. > > > > Ok for master? > > > > Thanks, > > Tamar > > > > gcc/ChangeLog: > > > > PR tree-optimization/114932 > > * fold-const.cc (operand_compare::operand_equal_p): Use it. > > (operand_compare::verify_hash_value): Likewise. > > * tree-core.h (enum operand_equal_flag): Add OEP_STRUCTURAL_EQ. > > * tree-ssa-loop-ivopts.cc (record_group_use): Check for structural eq. > > > > gcc/testsuite/ChangeLog: > > > > PR tree-optimization/114932 > > * gfortran.dg/addressing-modes_2.f90: New test. > > > > -- inline copy of -- > > > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc > > index > 710d697c0217c784b34f9f9f7b00b1945369076a..3d43020541c082c09416472 > 4da9d17fbb5793237 100644 > > --- a/gcc/fold-const.cc > > +++ b/gcc/fold-const.cc > > @@ -3191,6 +3191,9 @@ operand_compare::operand_equal_p (const_tree > arg0, const_tree arg1, > > precision differences. */ > > if (TREE_CODE (arg0) == INTEGER_CST && TREE_CODE (arg1) == INTEGER_CST) > > { > > + if (flags & OEP_STRUCTURAL_EQ) > > + return true; > > + > > Hmm, so you ignore all constants? Yes, because later on it still needs to check that one expression is a multiple of the other. If not while they're in the same group, they'll get different IVs. Ideally I'd only check if the two expressions differ only in a constant offset or multiply. But I have no idea how to do this from operand_equal_p as I'd need to know what the parent expression is. I guess I could do extra checks when comparing + and *. > > > /* Address of INTEGER_CST is not defined; check that we did not forget > > to drop the OEP_ADDRESS_OF flags. */ > > gcc_checking_assert (!(flags & OEP_ADDRESS_OF)); > > @@ -3204,7 +3207,8 @@ operand_compare::operand_equal_p (const_tree > arg0, const_tree arg1, > > because they may change the signedness of the arguments. As pointers > > strictly don't have a signedness, require either two pointers or > > two non-pointers as well. */ > > - if (TYPE_UNSIGNED (TREE_TYPE (arg0)) != TYPE_UNSIGNED (TREE_TYPE > (arg1)) > > + if ((TYPE_UNSIGNED (TREE_TYPE (arg0)) != TYPE_UNSIGNED (TREE_TYPE > (arg1)) > > + && !(flags & OEP_STRUCTURAL_EQ)) > > || POINTER_TYPE_P (TREE_TYPE (arg0)) > > != POINTER_TYPE_P (TREE_TYPE (arg1))) > > and all sign-conversions? Yes, as long as they don't change bitsize. That's checked in the lines below. > > I was thinking to implement a "value equivalence" check, so > (int)((unsigned)i + (unsigned)j) would be equal to i + j for int i and j. > That's not enough for IV ops, what it's after is knowing whether two expressions are the same but for some constant offset of multiply. i.e. can one expression replace the other with come constant known adjustment. > Doesn't your patch still treat (int)(unsigned)i and i as different? Sure, but we don't' get those because we've folded everything to unsigned before in patch 1. > And won't it treat (unsigned)i / 4 and (int)i / 4 as the same? I'm not > sure this is the desired semantic for IVOPTs (or a useful one in general). Agree, though I didn't expect to have division in addressing mode calculations. I only expected +,-,*. Do we actually do /? > > That said, when OEP_STRUCTURAL_EQ ignores a sign-change it can only > do that for operations where that maintains value-equivalence > (in terms of twos-complement bit representation). > > OTOH if you ignore all differences in constants that's another thing. It's a bit of a weird one. As I mentioned above I only need +,- and *. But whatever semantics I encode the use of the flag is going to be specialized. I can start by figuring out how to ignore the constants only in those three cases. Thanks, Tamar > > > return false; > > @@ -4204,7 +4208,7 @@ operand_compare::verify_hash_value (const_tree > arg0, const_tree arg1, > > { > > if (operand_equal_p (arg0, arg1, flags | OEP_NO_HASH_CHECK)) > > { > > - if (arg0 != arg1 && !(flags & OEP_DECL_NAME)) > > + if (arg0 != arg1 && !(flags & (OEP_DECL_NAME | > OEP_STRUCTURAL_EQ))) > > { > > inchash::hash hstate0 (0), hstate1 (0); > > hash_operand (arg0, hstate0, flags | OEP_HASH_CHECK); > > diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 > b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 > > new file mode 100644 > > index > 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c > 2e24a4973c8539fae > > --- /dev/null > > +++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 > > @@ -0,0 +1,20 @@ > > +! { dg-do compile { target aarch64-*-* } } > > +! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" } > > + > > +module a > > +integer, parameter :: b=3, c=b > > +contains > > +subroutine d(block) > > +integer f, col , block(:, :, :), e > > +do f = 1, c > > + do col = 1, c > > + block(:f, :, e()) = do > > + end do > > + end do > > + end > > + end > > + > > +! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 3 IVs:} ivopts > } } > > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 2 IVs:} 1 > ivopts } } > > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 1 IVs:} 1 > ivopts } } > > + > > diff --git a/gcc/tree-core.h b/gcc/tree-core.h > > index > 27c569c77022227643151fa4a909a95f3d45bf55..fe398d99bcaa9e8fcec45e5e0 > 7d133bc393aa4a7 100644 > > --- a/gcc/tree-core.h > > +++ b/gcc/tree-core.h > > @@ -962,7 +962,9 @@ enum operand_equal_flag { > > /* In conjunction with OEP_LEXICOGRAPHIC considers names of declarations > > of the same kind. Used to compare VLA bounds involving parameters > > across redeclarations of the same function. */ > > - OEP_DECL_NAME = 512 > > + OEP_DECL_NAME = 512, > > + /* Check for structural equivalence and ignore NOP_CONVERT casts. */ > > + OEP_STRUCTURAL_EQ = 1024 > > }; > > > > /* Enum and arrays used for tree allocation stats. > > diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc > > index > c3218a3e8eedbb8d0a7f14c01eeb069cb6024c29..b05e4ac1e63b5c110707a8a3 > ed5e614b7ccc702f 100644 > > --- a/gcc/tree-ssa-loop-ivopts.cc > > +++ b/gcc/tree-ssa-loop-ivopts.cc > > @@ -1623,8 +1623,8 @@ record_group_use (struct ivopts_data *data, tree > *use_p, > > > > /* Check if it has the same stripped base and step. */ > > if (operand_equal_p (iv->base_object, use->iv->base_object, 0) > > - && operand_equal_p (iv->step, use->iv->step, 0) > > - && operand_equal_p (addr_base, use->addr_base, 0)) > > + && operand_equal_p (iv->step, use->iv->step, OEP_STRUCTURAL_EQ) > > + && operand_equal_p (addr_base, use->addr_base, > OEP_STRUCTURAL_EQ)) > > break; > > } > > if (i == data->vgroups.length ()) > > > > -- > Richard Biener <rguenther@suse.de> > SUSE Software Solutions Germany GmbH, > Frankenstrasse 146, 90461 Nuernberg, Germany; > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 new file mode 100644 index 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c2e24a4973c8539fae --- /dev/null +++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 @@ -0,0 +1,20 @@ +! { dg-do compile { target aarch64-*-* } } +! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" } + +module a +integer, parameter :: b=3, c=b +contains +subroutine d(block) +integer f, col , block(:, :, :), e +do f = 1, c + do col = 1, c + block(:f, :, e()) = do + end do + end do + end + end + +! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 3 IVs:} ivopts } } +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 2 IVs:} 1 ivopts } } +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 1 IVs:} 1 ivopts } } + diff --git a/gcc/tree-affine.cc b/gcc/tree-affine.cc index d6309c4390362b680f0aa97a41fac3281ade66fd..b141bf23c1bbea1001b1bb286346890ddeab4096 100644 --- a/gcc/tree-affine.cc +++ b/gcc/tree-affine.cc @@ -941,6 +941,13 @@ aff_combination_constant_multiple_p (aff_tree *val, aff_tree *div, &mult_set, mult)) return false; + /* Everything is a multiple of 0, which means we shoudn't enforce that + mult_set is checked, since that forced the only valid multiple of + val and div to be 0 whereas 1 is also possible. */ + if (known_eq (val->offset, 0) + && known_eq (div->offset, 0)) + mult_set = false; + for (i = 0; i < div->n; i++) { class aff_comb_elt *elt diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc index 7a277aaf18a9e0a32b8ac0d23332b7cd9945ef98..b11bd62a86092ba972a648764cd2facd9ddb4914 100644 --- a/gcc/tree-ssa-loop-ivopts.cc +++ b/gcc/tree-ssa-loop-ivopts.cc @@ -757,6 +757,19 @@ single_dom_exit (class loop *loop) return exit; } +/* Compares the given affine tree LEFT with the tree expression RIGHT and return + whether they are the same under affine equality. */ + +static bool +affine_compare_eq (aff_tree &left, tree right) +{ + aff_tree aff_right; + tree_to_aff_combination (right, TREE_TYPE (right), &aff_right); + aff_combination_scale (&aff_right, -1); + aff_combination_add (&aff_right, &left); + return aff_combination_zero_p (&aff_right); +} + /* Given a nested expression in ARG consisting of PLUS or MULT try to see if one of the arguments of each expression is a constant and that the type of the @@ -1673,6 +1686,9 @@ record_group_use (struct ivopts_data *data, tree *use_p, tree addr_base = NULL; struct iv_group *group = NULL; poly_uint64 addr_offset = 0; + aff_tree iv_step, iv_addr_base; + + tree_to_aff_combination (iv->step, TREE_TYPE (iv->step), &iv_step); /* Record non address type use in a new group. */ if (address_p (type)) @@ -1683,6 +1699,7 @@ record_group_use (struct ivopts_data *data, tree *use_p, tree addr_toffset; split_constant_offset (iv->base, &addr_base, &addr_toffset); addr_offset = int_cst_value (addr_toffset); + tree_to_aff_combination (addr_base, TREE_TYPE (addr_base), &iv_addr_base); for (i = 0; i < data->vgroups.length (); i++) { struct iv_use *use; @@ -1694,8 +1711,8 @@ record_group_use (struct ivopts_data *data, tree *use_p, /* Check if it has the same stripped base and step. */ if (operand_equal_p (iv->base_object, use->iv->base_object, 0) - && operand_equal_p (iv->step, use->iv->step, 0) - && operand_equal_p (addr_base, use->addr_base, 0)) + && affine_compare_eq (iv_step, use->iv->step) + && affine_compare_eq (iv_addr_base, use->addr_base)) break; } if (i == data->vgroups.length ()) @@ -2231,6 +2248,14 @@ constant_multiple_of (tree top, tree bot, widest_int *mul) return true; } + aff_tree aff_top, aff_bot; + tree_to_aff_combination (top, TREE_TYPE (top), &aff_top); + tree_to_aff_combination (bot, TREE_TYPE (bot), &aff_bot); + poly_widest_int poly_mul; + if (aff_combination_constant_multiple_p (&aff_top, &aff_bot, &poly_mul) + && poly_mul.is_constant (mul)) + return true; + code = TREE_CODE (top); switch (code) {