Message ID | ZsSUzWt43/xcbN/8@arm.com |
---|---|
State | New |
Headers | show |
Series | [1/2] middle-end: refactor type to be explicit in operand_equal_p [PR114932] | expand |
ping > -----Original Message----- > From: Tamar Christina <tamar.christina@arm.com> > Sent: Tuesday, August 20, 2024 2:06 PM > To: gcc-patches@gcc.gnu.org > Cc: nd <nd@arm.com>; rguenther@suse.de; jlaw@ventanamicro.com > Subject: [PATCH 2/2]middle-end: use two's complement equality when comparing > IVs during candidate selection [PR114932] > > Hi All, > > IVOPTS normally uses affine trees to perform comparisons between different IVs, > but these seem to have been missing in two key spots and instead normal tree > equivalencies used. > > In some cases where we have a two-complements equivalence but not a strict > signedness equivalencies we end up generating both a signed and unsigned IV for > the same candidate. > > This patch implements a new OEP flag called OEP_STRUCTURAL_EQ. This flag will > check if the operands would produce the same bit values after the computations > even if the final sign is different. > > This happens quite a lot with fortran but can also happen in C because this came > code is unable to figure out when one expression is a multiple of another. > > As an example in the attached testcase we get: > > Initial set of candidates: > cost: 24 (complexity 3) > reg_cost: 9 > cand_cost: 15 > cand_group_cost: 0 (complexity 3) > candidates: 1, 6, 8 > group:0 --> iv_cand:6, cost=(0,1) > group:1 --> iv_cand:1, cost=(0,0) > group:2 --> iv_cand:8, cost=(0,1) > group:3 --> iv_cand:8, cost=(0,1) > invariant variables: 6 > invariant expressions: 1, 2 > > <Invariant Expressions>: > inv_expr 1: stride.3_27 * 4 > inv_expr 2: (unsigned long) stride.3_27 * 4 > > These end up being used in the same group: > > Group 1: > cand cost compl. inv.expr. inv.vars > 1 0 0 NIL; 6 > 2 0 0 NIL; 6 > 3 0 0 NIL; 6 > > which ends up with IV opts picking the signed and unsigned IVs: > > Improved to: > cost: 24 (complexity 3) > reg_cost: 9 > cand_cost: 15 > cand_group_cost: 0 (complexity 3) > candidates: 1, 6, 8 > group:0 --> iv_cand:6, cost=(0,1) > group:1 --> iv_cand:1, cost=(0,0) > group:2 --> iv_cand:8, cost=(0,1) > group:3 --> iv_cand:8, cost=(0,1) > invariant variables: 6 > invariant expressions: 1, 2 > > and so generates the same IV as both signed and unsigned: > > ;; basic block 21, loop depth 3, count 214748368 (estimated locally, freq > 58.2545), maybe hot > ;; prev block 28, next block 31, flags: (NEW, REACHABLE, VISITED) > ;; pred: 28 [always] count:23622320 (estimated locally, freq 6.4080) > (FALLTHRU,EXECUTABLE) > ;; 25 [always] count:191126046 (estimated locally, freq 51.8465) > (FALLTHRU,DFS_BACK,EXECUTABLE) > # .MEM_66 = PHI <.MEM_34(28), .MEM_22(25)> > # ivtmp.22_41 = PHI <0(28), ivtmp.22_82(25)> > # ivtmp.26_51 = PHI <ivtmp.26_55(28), ivtmp.26_72(25)> > # ivtmp.28_90 = PHI <ivtmp.28_99(28), ivtmp.28_98(25)> > > ... > > ;; basic block 24, loop depth 3, count 214748366 (estimated locally, freq > 58.2545), maybe hot > ;; prev block 22, next block 25, flags: (NEW, REACHABLE, VISITED)' > ;; pred: 22 [always] count:95443719 (estimated locally, freq 25.8909) > (FALLTHRU) > ;; 21 [33.3% (guessed)] count:71582790 (estimated locally, freq 19.4182) > (TRUE_VALUE,EXECUTABLE) > ;; 31 [33.3% (guessed)] count:47721860 (estimated locally, freq 12.9455) > (TRUE_VALUE,EXECUTABLE) > # .MEM_22 = PHI <.MEM_44(22), .MEM_31(21), .MEM_79(31)> > ivtmp.22_82 = ivtmp.22_41 + 1; > ivtmp.26_72 = ivtmp.26_51 + _80; > ivtmp.28_98 = ivtmp.28_90 + _39; > > These two IVs are always used as unsigned, so IV ops generates: > > _73 = stride.3_27 * 4; > _80 = (unsigned long) _73; > _54 = (unsigned long) stride.3_27; > _39 = _54 * 4; > > Which means that in e.g. exchange2 we generate a lot of duplicate code. > > This is because candidate 6 and 8 are equivalent under two's complement but have > different signs. > > This patch changes it so that if you have two IVs that are affine equivalent to > just pick one over the other. IV already has code for this, so the patch just > uses affine trees instead of tree for the check. > > With it we get: > > <Invariant Expressions>: > inv_expr 1: stride.3_27 * 4 > > <Group-candidate Costs>: > Group 0: > cand cost compl. inv.expr. inv.vars > 5 0 2 NIL; NIL; > 6 0 3 NIL; NIL; > > Group 1: > cand cost compl. inv.expr. inv.vars > 1 0 0 NIL; 6 > 2 0 0 NIL; 6 > 3 0 0 NIL; 6 > 4 0 0 NIL; 6 > > Initial set of candidates: > cost: 16 (complexity 3) > reg_cost: 6 > cand_cost: 10 > cand_group_cost: 0 (complexity 3) > candidates: 1, 6 > group:0 --> iv_cand:6, cost=(0,3) > group:1 --> iv_cand:1, cost=(0,0) > invariant variables: 6 > invariant expressions: 1 > > The two patches together results in a 10% performance increase in exchange2 in > SPECCPU 2017 and a 4% reduction in binary size and a 5% improvement in > compile > time. There's also a 5% performance improvement in fotonik3d and similar > reduction in binary size. > > Bootstrapped Regtested on aarch64-none-linux-gnu, > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu > -m32, -m64 and no issues. > > Ok for master? > > Thanks, > Tamar > > gcc/ChangeLog: > > PR tree-optimization/114932 > * fold-const.cc (operand_compare::operand_equal_p): Use it. > (operand_compare::verify_hash_value): Likewise. > (test_operand_equality::test): New. > (fold_const_cc_tests): Use it. > * tree-core.h (enum operand_equal_flag): Add OEP_STRUCTURAL_EQ. > * tree-ssa-loop-ivopts.cc (record_group_use): Check for structural eq. > > gcc/testsuite/ChangeLog: > > PR tree-optimization/114932 > * gfortran.dg/addressing-modes_2.f90: New test. > > --- > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc > index > 71e82b1d76d4106c7c23c54af8b35905a1af9f1c..52a4465de0355220befa6d6b > 3301e21e038e3e41 100644 > --- a/gcc/fold-const.cc > +++ b/gcc/fold-const.cc > @@ -3208,7 +3208,19 @@ operand_compare::operand_equal_p (tree type0, > const_tree arg0, > return tree_int_cst_equal (arg0, arg1); > } > > - if (!(flags & OEP_ADDRESS_OF)) > + if ((flags & OEP_STRUCTURAL_EQ) > + && (CONVERT_EXPR_P (arg0) || CONVERT_EXPR_P (arg1))) > + { > + const_tree t_arg0 = arg0; > + const_tree t_arg1 = arg1; > + STRIP_NOPS (arg0); > + STRIP_NOPS (arg1); > + /* Only recurse if the conversion was one that was valid to strip. */ > + if (t_arg0 != arg0 || t_arg1 != arg1) > + return operand_equal_p (type0, arg0, type1, arg1, flags); > + } > + > + if (!(flags & (OEP_ADDRESS_OF | OEP_STRUCTURAL_EQ))) > { > /* If both types don't have the same signedness, then we can't consider > them equal. We must check this before the STRIP_NOPS calls > @@ -3439,6 +3451,22 @@ operand_compare::operand_equal_p (tree type0, > const_tree arg0, > if (flags & (OEP_ONLY_CONST | OEP_BITWISE)) > return false; > > + /* Check if we are checking an operation where the two's compliment bitwise > + representation of the result is not the same between signed and > + unsigned arithmetic. */ > + switch (TREE_CODE (arg0)) > + { > + case PLUS_EXPR: > + case MINUS_EXPR: > + case MULT_EXPR: > + break; > + > + default: > + /* Clear the structural comparison bit. */ > + flags &= ~OEP_STRUCTURAL_EQ; > + break; > + } > + > /* Define macros to test an operand from arg0 and arg1 for equality and a > variant that allows null and views null as being different from any > non-null value. In the latter case, if either is null, the both > @@ -4218,7 +4246,7 @@ operand_compare::verify_hash_value (const_tree > arg0, const_tree arg1, > { > if (operand_equal_p (arg0, arg1, flags | OEP_NO_HASH_CHECK)) > { > - if (arg0 != arg1 && !(flags & OEP_DECL_NAME)) > + if (arg0 != arg1 && !(flags & (OEP_DECL_NAME | > OEP_STRUCTURAL_EQ))) > { > inchash::hash hstate0 (0), hstate1 (0); > hash_operand (arg0, hstate0, flags | OEP_HASH_CHECK); > @@ -17357,6 +17385,95 @@ test_arithmetic_folding () > x); > } > > +namespace test_operand_equality { > + > +/* Verify structural equality. */ > + > +/* Execute fold_vec_perm_cst unit tests. */ > + > +static void > +test () > +{ > + tree stype = integer_type_node; > + tree utype = unsigned_type_node; > + tree x = create_tmp_var_raw (stype, "x"); > + tree y = create_tmp_var_raw (stype, "y"); > + tree z = create_tmp_var_raw (stype, "z"); > + tree four = build_int_cst (stype, 4); > + tree lhs1 = fold_build2 (PLUS_EXPR, stype, x, y); > + tree rhs1 = fold_convert (stype, > + fold_build2 (PLUS_EXPR, utype, > + fold_convert (utype, x), > + fold_convert (utype, y))); > + > + /* (int)((unsigned x) + (unsigned y)) == x + y. */ > + ASSERT_TRUE (operand_equal_p (lhs1, rhs1, OEP_STRUCTURAL_EQ)); > + ASSERT_FALSE (operand_equal_p (lhs1, rhs1, 0)); > + > + /* (int)(unsigned) x == x. */ > + tree lhs2 = build1 (NOP_EXPR, stype, > + build1 (NOP_EXPR, utype, x)); > + tree rhs2 = x; > + ASSERT_TRUE (operand_equal_p (lhs2, rhs2, OEP_STRUCTURAL_EQ)); > + ASSERT_TRUE (operand_equal_p (lhs2, rhs2, 0)); > + > + /* (unsigned x) + (unsigned y) == x + y. */ > + tree lhs3 = lhs1; > + tree rhs3 = fold_build2 (PLUS_EXPR, utype, > + fold_convert (utype, x), > + fold_convert (utype, y)); > + ASSERT_TRUE (operand_equal_p (lhs3, rhs3, OEP_STRUCTURAL_EQ)); > + ASSERT_FALSE (operand_equal_p (lhs3, rhs3, 0)); > + > + /* (unsigned x) / (unsigned y) == x / y. */ > + tree lhs4 = fold_build2 (TRUNC_DIV_EXPR, stype, x, y);; > + tree rhs4 = fold_build2 (TRUNC_DIV_EXPR, utype, > + fold_convert (utype, x), > + fold_convert (utype, y)); > + ASSERT_FALSE (operand_equal_p (lhs4, rhs4, OEP_STRUCTURAL_EQ)); > + ASSERT_FALSE (operand_equal_p (lhs4, rhs4, 0)); > + > + /* (unsigned x) / 4 == x / 4. */ > + tree lhs5 = fold_build2 (TRUNC_DIV_EXPR, stype, x, four);; > + tree rhs5 = fold_build2 (TRUNC_DIV_EXPR, utype, > + fold_convert (utype, x), > + fold_convert (utype, four)); > + ASSERT_FALSE (operand_equal_p (lhs5, rhs5, OEP_STRUCTURAL_EQ)); > + ASSERT_FALSE (operand_equal_p (lhs5, rhs5, 0)); > + > + /* (unsigned x) + 4 == x + 4. */ > + tree lhs6 = fold_build2 (PLUS_EXPR, stype, x, four); > + tree rhs6 = fold_build2 (PLUS_EXPR, utype, > + fold_convert (utype, x), > + fold_convert (utype, four)); > + ASSERT_TRUE (operand_equal_p (lhs6, rhs6, OEP_STRUCTURAL_EQ)); > + ASSERT_FALSE (operand_equal_p (lhs6, rhs6, 0)); > + > + /* (unsigned x) + 4 == 4 + x. */ > + tree lhs7 = fold_build2 (PLUS_EXPR, stype, four, x); > + tree rhs7 = fold_build2 (PLUS_EXPR, utype, > + fold_convert (utype, x), > + fold_convert (utype, four)); > + ASSERT_TRUE (operand_equal_p (lhs7, rhs7, OEP_STRUCTURAL_EQ)); > + ASSERT_FALSE (operand_equal_p (lhs7, rhs7, 0)); > + > + /* ((unsigned x) + 4) * (unsigned y)) + z == ((4 + x) * y) + z. */ > + tree lhs8 = fold_build2 (PLUS_EXPR, stype, > + fold_build2 (MULT_EXPR, stype, > + fold_build2 (PLUS_EXPR, stype, four, x), > + y), > + z); > + tree rhs8 = fold_build2 (MULT_EXPR, utype, > + fold_build2 (PLUS_EXPR, utype, > + fold_convert (utype, x), > + fold_convert (utype, four)), > + fold_convert (utype, y)); > + rhs8 = fold_build2 (PLUS_EXPR, stype, fold_convert (stype, rhs8), z); > + ASSERT_TRUE (operand_equal_p (lhs8, rhs8, OEP_STRUCTURAL_EQ)); > + ASSERT_FALSE (operand_equal_p (lhs8, rhs8, 0)); > +} > +} > + > namespace test_fold_vec_perm_cst { > > /* Build a VECTOR_CST corresponding to VMODE, and has > @@ -18150,6 +18267,7 @@ fold_const_cc_tests () > test_vector_folding (); > test_vec_duplicate_folding (); > test_fold_vec_perm_cst::test (); > + test_operand_equality::test (); > } > > } // namespace selftest > diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 > b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 > new file mode 100644 > index > 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c > 2e24a4973c8539fae > --- /dev/null > +++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 > @@ -0,0 +1,20 @@ > +! { dg-do compile { target aarch64-*-* } } > +! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" } > + > +module a > +integer, parameter :: b=3, c=b > +contains > +subroutine d(block) > +integer f, col , block(:, :, :), e > +do f = 1, c > + do col = 1, c > + block(:f, :, e()) = do > + end do > + end do > + end > + end > + > +! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 3 IVs:} ivopts } > } > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 2 IVs:} 1 > ivopts } } > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 1 IVs:} 1 > ivopts } } > + > diff --git a/gcc/tree-core.h b/gcc/tree-core.h > index > 27c569c77022227643151fa4a909a95f3d45bf55..fe398d99bcaa9e8fcec45e5e0 > 7d133bc393aa4a7 100644 > --- a/gcc/tree-core.h > +++ b/gcc/tree-core.h > @@ -962,7 +962,9 @@ enum operand_equal_flag { > /* In conjunction with OEP_LEXICOGRAPHIC considers names of declarations > of the same kind. Used to compare VLA bounds involving parameters > across redeclarations of the same function. */ > - OEP_DECL_NAME = 512 > + OEP_DECL_NAME = 512, > + /* Check for structural equivalence and ignore NOP_CONVERT casts. */ > + OEP_STRUCTURAL_EQ = 1024 > }; > > /* Enum and arrays used for tree allocation stats. > diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc > index > c3218a3e8eedbb8d0a7f14c01eeb069cb6024c29..b05e4ac1e63b5c110707a8a3 > ed5e614b7ccc702f 100644 > --- a/gcc/tree-ssa-loop-ivopts.cc > +++ b/gcc/tree-ssa-loop-ivopts.cc > @@ -1623,8 +1623,8 @@ record_group_use (struct ivopts_data *data, tree > *use_p, > > /* Check if it has the same stripped base and step. */ > if (operand_equal_p (iv->base_object, use->iv->base_object, 0) > - && operand_equal_p (iv->step, use->iv->step, 0) > - && operand_equal_p (addr_base, use->addr_base, 0)) > + && operand_equal_p (iv->step, use->iv->step, OEP_STRUCTURAL_EQ) > + && operand_equal_p (addr_base, use->addr_base, > OEP_STRUCTURAL_EQ)) > break; > } > if (i == data->vgroups.length ()) > > > > > --
ping > -----Original Message----- > From: Tamar Christina > Sent: Tuesday, September 10, 2024 8:57 PM > To: Tamar Christina <tamar.christina@arm.com>; gcc-patches@gcc.gnu.org > Cc: nd <nd@arm.com>; rguenther@suse.de; jlaw@ventanamicro.com > Subject: RE: [PATCH 2/2]middle-end: use two's complement equality when > comparing IVs during candidate selection [PR114932] > > ping > > > -----Original Message----- > > From: Tamar Christina <tamar.christina@arm.com> > > Sent: Tuesday, August 20, 2024 2:06 PM > > To: gcc-patches@gcc.gnu.org > > Cc: nd <nd@arm.com>; rguenther@suse.de; jlaw@ventanamicro.com > > Subject: [PATCH 2/2]middle-end: use two's complement equality when > comparing > > IVs during candidate selection [PR114932] > > > > Hi All, > > > > IVOPTS normally uses affine trees to perform comparisons between different IVs, > > but these seem to have been missing in two key spots and instead normal tree > > equivalencies used. > > > > In some cases where we have a two-complements equivalence but not a strict > > signedness equivalencies we end up generating both a signed and unsigned IV for > > the same candidate. > > > > This patch implements a new OEP flag called OEP_STRUCTURAL_EQ. This flag will > > check if the operands would produce the same bit values after the computations > > even if the final sign is different. > > > > This happens quite a lot with fortran but can also happen in C because this came > > code is unable to figure out when one expression is a multiple of another. > > > > As an example in the attached testcase we get: > > > > Initial set of candidates: > > cost: 24 (complexity 3) > > reg_cost: 9 > > cand_cost: 15 > > cand_group_cost: 0 (complexity 3) > > candidates: 1, 6, 8 > > group:0 --> iv_cand:6, cost=(0,1) > > group:1 --> iv_cand:1, cost=(0,0) > > group:2 --> iv_cand:8, cost=(0,1) > > group:3 --> iv_cand:8, cost=(0,1) > > invariant variables: 6 > > invariant expressions: 1, 2 > > > > <Invariant Expressions>: > > inv_expr 1: stride.3_27 * 4 > > inv_expr 2: (unsigned long) stride.3_27 * 4 > > > > These end up being used in the same group: > > > > Group 1: > > cand cost compl. inv.expr. inv.vars > > 1 0 0 NIL; 6 > > 2 0 0 NIL; 6 > > 3 0 0 NIL; 6 > > > > which ends up with IV opts picking the signed and unsigned IVs: > > > > Improved to: > > cost: 24 (complexity 3) > > reg_cost: 9 > > cand_cost: 15 > > cand_group_cost: 0 (complexity 3) > > candidates: 1, 6, 8 > > group:0 --> iv_cand:6, cost=(0,1) > > group:1 --> iv_cand:1, cost=(0,0) > > group:2 --> iv_cand:8, cost=(0,1) > > group:3 --> iv_cand:8, cost=(0,1) > > invariant variables: 6 > > invariant expressions: 1, 2 > > > > and so generates the same IV as both signed and unsigned: > > > > ;; basic block 21, loop depth 3, count 214748368 (estimated locally, freq > > 58.2545), maybe hot > > ;; prev block 28, next block 31, flags: (NEW, REACHABLE, VISITED) > > ;; pred: 28 [always] count:23622320 (estimated locally, freq 6.4080) > > (FALLTHRU,EXECUTABLE) > > ;; 25 [always] count:191126046 (estimated locally, freq 51.8465) > > (FALLTHRU,DFS_BACK,EXECUTABLE) > > # .MEM_66 = PHI <.MEM_34(28), .MEM_22(25)> > > # ivtmp.22_41 = PHI <0(28), ivtmp.22_82(25)> > > # ivtmp.26_51 = PHI <ivtmp.26_55(28), ivtmp.26_72(25)> > > # ivtmp.28_90 = PHI <ivtmp.28_99(28), ivtmp.28_98(25)> > > > > ... > > > > ;; basic block 24, loop depth 3, count 214748366 (estimated locally, freq > > 58.2545), maybe hot > > ;; prev block 22, next block 25, flags: (NEW, REACHABLE, VISITED)' > > ;; pred: 22 [always] count:95443719 (estimated locally, freq 25.8909) > > (FALLTHRU) > > ;; 21 [33.3% (guessed)] count:71582790 (estimated locally, freq > 19.4182) > > (TRUE_VALUE,EXECUTABLE) > > ;; 31 [33.3% (guessed)] count:47721860 (estimated locally, freq > 12.9455) > > (TRUE_VALUE,EXECUTABLE) > > # .MEM_22 = PHI <.MEM_44(22), .MEM_31(21), .MEM_79(31)> > > ivtmp.22_82 = ivtmp.22_41 + 1; > > ivtmp.26_72 = ivtmp.26_51 + _80; > > ivtmp.28_98 = ivtmp.28_90 + _39; > > > > These two IVs are always used as unsigned, so IV ops generates: > > > > _73 = stride.3_27 * 4; > > _80 = (unsigned long) _73; > > _54 = (unsigned long) stride.3_27; > > _39 = _54 * 4; > > > > Which means that in e.g. exchange2 we generate a lot of duplicate code. > > > > This is because candidate 6 and 8 are equivalent under two's complement but > have > > different signs. > > > > This patch changes it so that if you have two IVs that are affine equivalent to > > just pick one over the other. IV already has code for this, so the patch just > > uses affine trees instead of tree for the check. > > > > With it we get: > > > > <Invariant Expressions>: > > inv_expr 1: stride.3_27 * 4 > > > > <Group-candidate Costs>: > > Group 0: > > cand cost compl. inv.expr. inv.vars > > 5 0 2 NIL; NIL; > > 6 0 3 NIL; NIL; > > > > Group 1: > > cand cost compl. inv.expr. inv.vars > > 1 0 0 NIL; 6 > > 2 0 0 NIL; 6 > > 3 0 0 NIL; 6 > > 4 0 0 NIL; 6 > > > > Initial set of candidates: > > cost: 16 (complexity 3) > > reg_cost: 6 > > cand_cost: 10 > > cand_group_cost: 0 (complexity 3) > > candidates: 1, 6 > > group:0 --> iv_cand:6, cost=(0,3) > > group:1 --> iv_cand:1, cost=(0,0) > > invariant variables: 6 > > invariant expressions: 1 > > > > The two patches together results in a 10% performance increase in exchange2 in > > SPECCPU 2017 and a 4% reduction in binary size and a 5% improvement in > > compile > > time. There's also a 5% performance improvement in fotonik3d and similar > > reduction in binary size. > > > > Bootstrapped Regtested on aarch64-none-linux-gnu, > > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu > > -m32, -m64 and no issues. > > > > Ok for master? > > > > Thanks, > > Tamar > > > > gcc/ChangeLog: > > > > PR tree-optimization/114932 > > * fold-const.cc (operand_compare::operand_equal_p): Use it. > > (operand_compare::verify_hash_value): Likewise. > > (test_operand_equality::test): New. > > (fold_const_cc_tests): Use it. > > * tree-core.h (enum operand_equal_flag): Add OEP_STRUCTURAL_EQ. > > * tree-ssa-loop-ivopts.cc (record_group_use): Check for structural eq. > > > > gcc/testsuite/ChangeLog: > > > > PR tree-optimization/114932 > > * gfortran.dg/addressing-modes_2.f90: New test. > > > > --- > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc > > index > > > 71e82b1d76d4106c7c23c54af8b35905a1af9f1c..52a4465de0355220befa6d6b > > 3301e21e038e3e41 100644 > > --- a/gcc/fold-const.cc > > +++ b/gcc/fold-const.cc > > @@ -3208,7 +3208,19 @@ operand_compare::operand_equal_p (tree type0, > > const_tree arg0, > > return tree_int_cst_equal (arg0, arg1); > > } > > > > - if (!(flags & OEP_ADDRESS_OF)) > > + if ((flags & OEP_STRUCTURAL_EQ) > > + && (CONVERT_EXPR_P (arg0) || CONVERT_EXPR_P (arg1))) > > + { > > + const_tree t_arg0 = arg0; > > + const_tree t_arg1 = arg1; > > + STRIP_NOPS (arg0); > > + STRIP_NOPS (arg1); > > + /* Only recurse if the conversion was one that was valid to strip. */ > > + if (t_arg0 != arg0 || t_arg1 != arg1) > > + return operand_equal_p (type0, arg0, type1, arg1, flags); > > + } > > + > > + if (!(flags & (OEP_ADDRESS_OF | OEP_STRUCTURAL_EQ))) > > { > > /* If both types don't have the same signedness, then we can't consider > > them equal. We must check this before the STRIP_NOPS calls > > @@ -3439,6 +3451,22 @@ operand_compare::operand_equal_p (tree type0, > > const_tree arg0, > > if (flags & (OEP_ONLY_CONST | OEP_BITWISE)) > > return false; > > > > + /* Check if we are checking an operation where the two's compliment bitwise > > + representation of the result is not the same between signed and > > + unsigned arithmetic. */ > > + switch (TREE_CODE (arg0)) > > + { > > + case PLUS_EXPR: > > + case MINUS_EXPR: > > + case MULT_EXPR: > > + break; > > + > > + default: > > + /* Clear the structural comparison bit. */ > > + flags &= ~OEP_STRUCTURAL_EQ; > > + break; > > + } > > + > > /* Define macros to test an operand from arg0 and arg1 for equality and a > > variant that allows null and views null as being different from any > > non-null value. In the latter case, if either is null, the both > > @@ -4218,7 +4246,7 @@ operand_compare::verify_hash_value (const_tree > > arg0, const_tree arg1, > > { > > if (operand_equal_p (arg0, arg1, flags | OEP_NO_HASH_CHECK)) > > { > > - if (arg0 != arg1 && !(flags & OEP_DECL_NAME)) > > + if (arg0 != arg1 && !(flags & (OEP_DECL_NAME | > > OEP_STRUCTURAL_EQ))) > > { > > inchash::hash hstate0 (0), hstate1 (0); > > hash_operand (arg0, hstate0, flags | OEP_HASH_CHECK); > > @@ -17357,6 +17385,95 @@ test_arithmetic_folding () > > x); > > } > > > > +namespace test_operand_equality { > > + > > +/* Verify structural equality. */ > > + > > +/* Execute fold_vec_perm_cst unit tests. */ > > + > > +static void > > +test () > > +{ > > + tree stype = integer_type_node; > > + tree utype = unsigned_type_node; > > + tree x = create_tmp_var_raw (stype, "x"); > > + tree y = create_tmp_var_raw (stype, "y"); > > + tree z = create_tmp_var_raw (stype, "z"); > > + tree four = build_int_cst (stype, 4); > > + tree lhs1 = fold_build2 (PLUS_EXPR, stype, x, y); > > + tree rhs1 = fold_convert (stype, > > + fold_build2 (PLUS_EXPR, utype, > > + fold_convert (utype, x), > > + fold_convert (utype, y))); > > + > > + /* (int)((unsigned x) + (unsigned y)) == x + y. */ > > + ASSERT_TRUE (operand_equal_p (lhs1, rhs1, OEP_STRUCTURAL_EQ)); > > + ASSERT_FALSE (operand_equal_p (lhs1, rhs1, 0)); > > + > > + /* (int)(unsigned) x == x. */ > > + tree lhs2 = build1 (NOP_EXPR, stype, > > + build1 (NOP_EXPR, utype, x)); > > + tree rhs2 = x; > > + ASSERT_TRUE (operand_equal_p (lhs2, rhs2, OEP_STRUCTURAL_EQ)); > > + ASSERT_TRUE (operand_equal_p (lhs2, rhs2, 0)); > > + > > + /* (unsigned x) + (unsigned y) == x + y. */ > > + tree lhs3 = lhs1; > > + tree rhs3 = fold_build2 (PLUS_EXPR, utype, > > + fold_convert (utype, x), > > + fold_convert (utype, y)); > > + ASSERT_TRUE (operand_equal_p (lhs3, rhs3, OEP_STRUCTURAL_EQ)); > > + ASSERT_FALSE (operand_equal_p (lhs3, rhs3, 0)); > > + > > + /* (unsigned x) / (unsigned y) == x / y. */ > > + tree lhs4 = fold_build2 (TRUNC_DIV_EXPR, stype, x, y);; > > + tree rhs4 = fold_build2 (TRUNC_DIV_EXPR, utype, > > + fold_convert (utype, x), > > + fold_convert (utype, y)); > > + ASSERT_FALSE (operand_equal_p (lhs4, rhs4, OEP_STRUCTURAL_EQ)); > > + ASSERT_FALSE (operand_equal_p (lhs4, rhs4, 0)); > > + > > + /* (unsigned x) / 4 == x / 4. */ > > + tree lhs5 = fold_build2 (TRUNC_DIV_EXPR, stype, x, four);; > > + tree rhs5 = fold_build2 (TRUNC_DIV_EXPR, utype, > > + fold_convert (utype, x), > > + fold_convert (utype, four)); > > + ASSERT_FALSE (operand_equal_p (lhs5, rhs5, OEP_STRUCTURAL_EQ)); > > + ASSERT_FALSE (operand_equal_p (lhs5, rhs5, 0)); > > + > > + /* (unsigned x) + 4 == x + 4. */ > > + tree lhs6 = fold_build2 (PLUS_EXPR, stype, x, four); > > + tree rhs6 = fold_build2 (PLUS_EXPR, utype, > > + fold_convert (utype, x), > > + fold_convert (utype, four)); > > + ASSERT_TRUE (operand_equal_p (lhs6, rhs6, OEP_STRUCTURAL_EQ)); > > + ASSERT_FALSE (operand_equal_p (lhs6, rhs6, 0)); > > + > > + /* (unsigned x) + 4 == 4 + x. */ > > + tree lhs7 = fold_build2 (PLUS_EXPR, stype, four, x); > > + tree rhs7 = fold_build2 (PLUS_EXPR, utype, > > + fold_convert (utype, x), > > + fold_convert (utype, four)); > > + ASSERT_TRUE (operand_equal_p (lhs7, rhs7, OEP_STRUCTURAL_EQ)); > > + ASSERT_FALSE (operand_equal_p (lhs7, rhs7, 0)); > > + > > + /* ((unsigned x) + 4) * (unsigned y)) + z == ((4 + x) * y) + z. */ > > + tree lhs8 = fold_build2 (PLUS_EXPR, stype, > > + fold_build2 (MULT_EXPR, stype, > > + fold_build2 (PLUS_EXPR, stype, four, x), > > + y), > > + z); > > + tree rhs8 = fold_build2 (MULT_EXPR, utype, > > + fold_build2 (PLUS_EXPR, utype, > > + fold_convert (utype, x), > > + fold_convert (utype, four)), > > + fold_convert (utype, y)); > > + rhs8 = fold_build2 (PLUS_EXPR, stype, fold_convert (stype, rhs8), z); > > + ASSERT_TRUE (operand_equal_p (lhs8, rhs8, OEP_STRUCTURAL_EQ)); > > + ASSERT_FALSE (operand_equal_p (lhs8, rhs8, 0)); > > +} > > +} > > + > > namespace test_fold_vec_perm_cst { > > > > /* Build a VECTOR_CST corresponding to VMODE, and has > > @@ -18150,6 +18267,7 @@ fold_const_cc_tests () > > test_vector_folding (); > > test_vec_duplicate_folding (); > > test_fold_vec_perm_cst::test (); > > + test_operand_equality::test (); > > } > > > > } // namespace selftest > > diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 > > b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 > > new file mode 100644 > > index > > > 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c > > 2e24a4973c8539fae > > --- /dev/null > > +++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 > > @@ -0,0 +1,20 @@ > > +! { dg-do compile { target aarch64-*-* } } > > +! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" } > > + > > +module a > > +integer, parameter :: b=3, c=b > > +contains > > +subroutine d(block) > > +integer f, col , block(:, :, :), e > > +do f = 1, c > > + do col = 1, c > > + block(:f, :, e()) = do > > + end do > > + end do > > + end > > + end > > + > > +! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 3 IVs:} ivopts > } > > } > > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 2 IVs:} 1 > > ivopts } } > > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 1 IVs:} 1 > > ivopts } } > > + > > diff --git a/gcc/tree-core.h b/gcc/tree-core.h > > index > > > 27c569c77022227643151fa4a909a95f3d45bf55..fe398d99bcaa9e8fcec45e5e0 > > 7d133bc393aa4a7 100644 > > --- a/gcc/tree-core.h > > +++ b/gcc/tree-core.h > > @@ -962,7 +962,9 @@ enum operand_equal_flag { > > /* In conjunction with OEP_LEXICOGRAPHIC considers names of declarations > > of the same kind. Used to compare VLA bounds involving parameters > > across redeclarations of the same function. */ > > - OEP_DECL_NAME = 512 > > + OEP_DECL_NAME = 512, > > + /* Check for structural equivalence and ignore NOP_CONVERT casts. */ > > + OEP_STRUCTURAL_EQ = 1024 > > }; > > > > /* Enum and arrays used for tree allocation stats. > > diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc > > index > > > c3218a3e8eedbb8d0a7f14c01eeb069cb6024c29..b05e4ac1e63b5c110707a8a3 > > ed5e614b7ccc702f 100644 > > --- a/gcc/tree-ssa-loop-ivopts.cc > > +++ b/gcc/tree-ssa-loop-ivopts.cc > > @@ -1623,8 +1623,8 @@ record_group_use (struct ivopts_data *data, tree > > *use_p, > > > > /* Check if it has the same stripped base and step. */ > > if (operand_equal_p (iv->base_object, use->iv->base_object, 0) > > - && operand_equal_p (iv->step, use->iv->step, 0) > > - && operand_equal_p (addr_base, use->addr_base, 0)) > > + && operand_equal_p (iv->step, use->iv->step, OEP_STRUCTURAL_EQ) > > + && operand_equal_p (addr_base, use->addr_base, > > OEP_STRUCTURAL_EQ)) > > break; > > } > > if (i == data->vgroups.length ()) > > > > > > > > > > --
ping > -----Original Message----- > From: Tamar Christina <Tamar.Christina@arm.com> > Sent: Monday, September 23, 2024 8:41 AM > To: gcc-patches@gcc.gnu.org > Cc: nd <nd@arm.com>; rguenther@suse.de; jlaw@ventanamicro.com > Subject: RE: [PATCH 2/2]middle-end: use two's complement equality when > comparing IVs during candidate selection [PR114932] > > ping > > > -----Original Message----- > > From: Tamar Christina > > Sent: Tuesday, September 10, 2024 8:57 PM > > To: Tamar Christina <tamar.christina@arm.com>; gcc-patches@gcc.gnu.org > > Cc: nd <nd@arm.com>; rguenther@suse.de; jlaw@ventanamicro.com > > Subject: RE: [PATCH 2/2]middle-end: use two's complement equality when > > comparing IVs during candidate selection [PR114932] > > > > ping > > > > > -----Original Message----- > > > From: Tamar Christina <tamar.christina@arm.com> > > > Sent: Tuesday, August 20, 2024 2:06 PM > > > To: gcc-patches@gcc.gnu.org > > > Cc: nd <nd@arm.com>; rguenther@suse.de; jlaw@ventanamicro.com > > > Subject: [PATCH 2/2]middle-end: use two's complement equality when > > comparing > > > IVs during candidate selection [PR114932] > > > > > > Hi All, > > > > > > IVOPTS normally uses affine trees to perform comparisons between different > IVs, > > > but these seem to have been missing in two key spots and instead normal tree > > > equivalencies used. > > > > > > In some cases where we have a two-complements equivalence but not a strict > > > signedness equivalencies we end up generating both a signed and unsigned IV > for > > > the same candidate. > > > > > > This patch implements a new OEP flag called OEP_STRUCTURAL_EQ. This flag > will > > > check if the operands would produce the same bit values after the > computations > > > even if the final sign is different. > > > > > > This happens quite a lot with fortran but can also happen in C because this > came > > > code is unable to figure out when one expression is a multiple of another. > > > > > > As an example in the attached testcase we get: > > > > > > Initial set of candidates: > > > cost: 24 (complexity 3) > > > reg_cost: 9 > > > cand_cost: 15 > > > cand_group_cost: 0 (complexity 3) > > > candidates: 1, 6, 8 > > > group:0 --> iv_cand:6, cost=(0,1) > > > group:1 --> iv_cand:1, cost=(0,0) > > > group:2 --> iv_cand:8, cost=(0,1) > > > group:3 --> iv_cand:8, cost=(0,1) > > > invariant variables: 6 > > > invariant expressions: 1, 2 > > > > > > <Invariant Expressions>: > > > inv_expr 1: stride.3_27 * 4 > > > inv_expr 2: (unsigned long) stride.3_27 * 4 > > > > > > These end up being used in the same group: > > > > > > Group 1: > > > cand cost compl. inv.expr. inv.vars > > > 1 0 0 NIL; 6 > > > 2 0 0 NIL; 6 > > > 3 0 0 NIL; 6 > > > > > > which ends up with IV opts picking the signed and unsigned IVs: > > > > > > Improved to: > > > cost: 24 (complexity 3) > > > reg_cost: 9 > > > cand_cost: 15 > > > cand_group_cost: 0 (complexity 3) > > > candidates: 1, 6, 8 > > > group:0 --> iv_cand:6, cost=(0,1) > > > group:1 --> iv_cand:1, cost=(0,0) > > > group:2 --> iv_cand:8, cost=(0,1) > > > group:3 --> iv_cand:8, cost=(0,1) > > > invariant variables: 6 > > > invariant expressions: 1, 2 > > > > > > and so generates the same IV as both signed and unsigned: > > > > > > ;; basic block 21, loop depth 3, count 214748368 (estimated locally, freq > > > 58.2545), maybe hot > > > ;; prev block 28, next block 31, flags: (NEW, REACHABLE, VISITED) > > > ;; pred: 28 [always] count:23622320 (estimated locally, freq 6.4080) > > > (FALLTHRU,EXECUTABLE) > > > ;; 25 [always] count:191126046 (estimated locally, freq 51.8465) > > > (FALLTHRU,DFS_BACK,EXECUTABLE) > > > # .MEM_66 = PHI <.MEM_34(28), .MEM_22(25)> > > > # ivtmp.22_41 = PHI <0(28), ivtmp.22_82(25)> > > > # ivtmp.26_51 = PHI <ivtmp.26_55(28), ivtmp.26_72(25)> > > > # ivtmp.28_90 = PHI <ivtmp.28_99(28), ivtmp.28_98(25)> > > > > > > ... > > > > > > ;; basic block 24, loop depth 3, count 214748366 (estimated locally, freq > > > 58.2545), maybe hot > > > ;; prev block 22, next block 25, flags: (NEW, REACHABLE, VISITED)' > > > ;; pred: 22 [always] count:95443719 (estimated locally, freq 25.8909) > > > (FALLTHRU) > > > ;; 21 [33.3% (guessed)] count:71582790 (estimated locally, freq > > 19.4182) > > > (TRUE_VALUE,EXECUTABLE) > > > ;; 31 [33.3% (guessed)] count:47721860 (estimated locally, freq > > 12.9455) > > > (TRUE_VALUE,EXECUTABLE) > > > # .MEM_22 = PHI <.MEM_44(22), .MEM_31(21), .MEM_79(31)> > > > ivtmp.22_82 = ivtmp.22_41 + 1; > > > ivtmp.26_72 = ivtmp.26_51 + _80; > > > ivtmp.28_98 = ivtmp.28_90 + _39; > > > > > > These two IVs are always used as unsigned, so IV ops generates: > > > > > > _73 = stride.3_27 * 4; > > > _80 = (unsigned long) _73; > > > _54 = (unsigned long) stride.3_27; > > > _39 = _54 * 4; > > > > > > Which means that in e.g. exchange2 we generate a lot of duplicate code. > > > > > > This is because candidate 6 and 8 are equivalent under two's complement but > > have > > > different signs. > > > > > > This patch changes it so that if you have two IVs that are affine equivalent to > > > just pick one over the other. IV already has code for this, so the patch just > > > uses affine trees instead of tree for the check. > > > > > > With it we get: > > > > > > <Invariant Expressions>: > > > inv_expr 1: stride.3_27 * 4 > > > > > > <Group-candidate Costs>: > > > Group 0: > > > cand cost compl. inv.expr. inv.vars > > > 5 0 2 NIL; NIL; > > > 6 0 3 NIL; NIL; > > > > > > Group 1: > > > cand cost compl. inv.expr. inv.vars > > > 1 0 0 NIL; 6 > > > 2 0 0 NIL; 6 > > > 3 0 0 NIL; 6 > > > 4 0 0 NIL; 6 > > > > > > Initial set of candidates: > > > cost: 16 (complexity 3) > > > reg_cost: 6 > > > cand_cost: 10 > > > cand_group_cost: 0 (complexity 3) > > > candidates: 1, 6 > > > group:0 --> iv_cand:6, cost=(0,3) > > > group:1 --> iv_cand:1, cost=(0,0) > > > invariant variables: 6 > > > invariant expressions: 1 > > > > > > The two patches together results in a 10% performance increase in exchange2 > in > > > SPECCPU 2017 and a 4% reduction in binary size and a 5% improvement in > > > compile > > > time. There's also a 5% performance improvement in fotonik3d and similar > > > reduction in binary size. > > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu, > > > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu > > > -m32, -m64 and no issues. > > > > > > Ok for master? > > > > > > Thanks, > > > Tamar > > > > > > gcc/ChangeLog: > > > > > > PR tree-optimization/114932 > > > * fold-const.cc (operand_compare::operand_equal_p): Use it. > > > (operand_compare::verify_hash_value): Likewise. > > > (test_operand_equality::test): New. > > > (fold_const_cc_tests): Use it. > > > * tree-core.h (enum operand_equal_flag): Add OEP_STRUCTURAL_EQ. > > > * tree-ssa-loop-ivopts.cc (record_group_use): Check for structural eq. > > > > > > gcc/testsuite/ChangeLog: > > > > > > PR tree-optimization/114932 > > > * gfortran.dg/addressing-modes_2.f90: New test. > > > > > > --- > > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc > > > index > > > > > > 71e82b1d76d4106c7c23c54af8b35905a1af9f1c..52a4465de0355220befa6d6b > > > 3301e21e038e3e41 100644 > > > --- a/gcc/fold-const.cc > > > +++ b/gcc/fold-const.cc > > > @@ -3208,7 +3208,19 @@ operand_compare::operand_equal_p (tree type0, > > > const_tree arg0, > > > return tree_int_cst_equal (arg0, arg1); > > > } > > > > > > - if (!(flags & OEP_ADDRESS_OF)) > > > + if ((flags & OEP_STRUCTURAL_EQ) > > > + && (CONVERT_EXPR_P (arg0) || CONVERT_EXPR_P (arg1))) > > > + { > > > + const_tree t_arg0 = arg0; > > > + const_tree t_arg1 = arg1; > > > + STRIP_NOPS (arg0); > > > + STRIP_NOPS (arg1); > > > + /* Only recurse if the conversion was one that was valid to strip. */ > > > + if (t_arg0 != arg0 || t_arg1 != arg1) > > > + return operand_equal_p (type0, arg0, type1, arg1, flags); > > > + } > > > + > > > + if (!(flags & (OEP_ADDRESS_OF | OEP_STRUCTURAL_EQ))) > > > { > > > /* If both types don't have the same signedness, then we can't consider > > > them equal. We must check this before the STRIP_NOPS calls > > > @@ -3439,6 +3451,22 @@ operand_compare::operand_equal_p (tree type0, > > > const_tree arg0, > > > if (flags & (OEP_ONLY_CONST | OEP_BITWISE)) > > > return false; > > > > > > + /* Check if we are checking an operation where the two's compliment bitwise > > > + representation of the result is not the same between signed and > > > + unsigned arithmetic. */ > > > + switch (TREE_CODE (arg0)) > > > + { > > > + case PLUS_EXPR: > > > + case MINUS_EXPR: > > > + case MULT_EXPR: > > > + break; > > > + > > > + default: > > > + /* Clear the structural comparison bit. */ > > > + flags &= ~OEP_STRUCTURAL_EQ; > > > + break; > > > + } > > > + > > > /* Define macros to test an operand from arg0 and arg1 for equality and a > > > variant that allows null and views null as being different from any > > > non-null value. In the latter case, if either is null, the both > > > @@ -4218,7 +4246,7 @@ operand_compare::verify_hash_value (const_tree > > > arg0, const_tree arg1, > > > { > > > if (operand_equal_p (arg0, arg1, flags | OEP_NO_HASH_CHECK)) > > > { > > > - if (arg0 != arg1 && !(flags & OEP_DECL_NAME)) > > > + if (arg0 != arg1 && !(flags & (OEP_DECL_NAME | > > > OEP_STRUCTURAL_EQ))) > > > { > > > inchash::hash hstate0 (0), hstate1 (0); > > > hash_operand (arg0, hstate0, flags | OEP_HASH_CHECK); > > > @@ -17357,6 +17385,95 @@ test_arithmetic_folding () > > > x); > > > } > > > > > > +namespace test_operand_equality { > > > + > > > +/* Verify structural equality. */ > > > + > > > +/* Execute fold_vec_perm_cst unit tests. */ > > > + > > > +static void > > > +test () > > > +{ > > > + tree stype = integer_type_node; > > > + tree utype = unsigned_type_node; > > > + tree x = create_tmp_var_raw (stype, "x"); > > > + tree y = create_tmp_var_raw (stype, "y"); > > > + tree z = create_tmp_var_raw (stype, "z"); > > > + tree four = build_int_cst (stype, 4); > > > + tree lhs1 = fold_build2 (PLUS_EXPR, stype, x, y); > > > + tree rhs1 = fold_convert (stype, > > > + fold_build2 (PLUS_EXPR, utype, > > > + fold_convert (utype, x), > > > + fold_convert (utype, y))); > > > + > > > + /* (int)((unsigned x) + (unsigned y)) == x + y. */ > > > + ASSERT_TRUE (operand_equal_p (lhs1, rhs1, OEP_STRUCTURAL_EQ)); > > > + ASSERT_FALSE (operand_equal_p (lhs1, rhs1, 0)); > > > + > > > + /* (int)(unsigned) x == x. */ > > > + tree lhs2 = build1 (NOP_EXPR, stype, > > > + build1 (NOP_EXPR, utype, x)); > > > + tree rhs2 = x; > > > + ASSERT_TRUE (operand_equal_p (lhs2, rhs2, OEP_STRUCTURAL_EQ)); > > > + ASSERT_TRUE (operand_equal_p (lhs2, rhs2, 0)); > > > + > > > + /* (unsigned x) + (unsigned y) == x + y. */ > > > + tree lhs3 = lhs1; > > > + tree rhs3 = fold_build2 (PLUS_EXPR, utype, > > > + fold_convert (utype, x), > > > + fold_convert (utype, y)); > > > + ASSERT_TRUE (operand_equal_p (lhs3, rhs3, OEP_STRUCTURAL_EQ)); > > > + ASSERT_FALSE (operand_equal_p (lhs3, rhs3, 0)); > > > + > > > + /* (unsigned x) / (unsigned y) == x / y. */ > > > + tree lhs4 = fold_build2 (TRUNC_DIV_EXPR, stype, x, y);; > > > + tree rhs4 = fold_build2 (TRUNC_DIV_EXPR, utype, > > > + fold_convert (utype, x), > > > + fold_convert (utype, y)); > > > + ASSERT_FALSE (operand_equal_p (lhs4, rhs4, OEP_STRUCTURAL_EQ)); > > > + ASSERT_FALSE (operand_equal_p (lhs4, rhs4, 0)); > > > + > > > + /* (unsigned x) / 4 == x / 4. */ > > > + tree lhs5 = fold_build2 (TRUNC_DIV_EXPR, stype, x, four);; > > > + tree rhs5 = fold_build2 (TRUNC_DIV_EXPR, utype, > > > + fold_convert (utype, x), > > > + fold_convert (utype, four)); > > > + ASSERT_FALSE (operand_equal_p (lhs5, rhs5, OEP_STRUCTURAL_EQ)); > > > + ASSERT_FALSE (operand_equal_p (lhs5, rhs5, 0)); > > > + > > > + /* (unsigned x) + 4 == x + 4. */ > > > + tree lhs6 = fold_build2 (PLUS_EXPR, stype, x, four); > > > + tree rhs6 = fold_build2 (PLUS_EXPR, utype, > > > + fold_convert (utype, x), > > > + fold_convert (utype, four)); > > > + ASSERT_TRUE (operand_equal_p (lhs6, rhs6, OEP_STRUCTURAL_EQ)); > > > + ASSERT_FALSE (operand_equal_p (lhs6, rhs6, 0)); > > > + > > > + /* (unsigned x) + 4 == 4 + x. */ > > > + tree lhs7 = fold_build2 (PLUS_EXPR, stype, four, x); > > > + tree rhs7 = fold_build2 (PLUS_EXPR, utype, > > > + fold_convert (utype, x), > > > + fold_convert (utype, four)); > > > + ASSERT_TRUE (operand_equal_p (lhs7, rhs7, OEP_STRUCTURAL_EQ)); > > > + ASSERT_FALSE (operand_equal_p (lhs7, rhs7, 0)); > > > + > > > + /* ((unsigned x) + 4) * (unsigned y)) + z == ((4 + x) * y) + z. */ > > > + tree lhs8 = fold_build2 (PLUS_EXPR, stype, > > > + fold_build2 (MULT_EXPR, stype, > > > + fold_build2 (PLUS_EXPR, stype, four, x), > > > + y), > > > + z); > > > + tree rhs8 = fold_build2 (MULT_EXPR, utype, > > > + fold_build2 (PLUS_EXPR, utype, > > > + fold_convert (utype, x), > > > + fold_convert (utype, four)), > > > + fold_convert (utype, y)); > > > + rhs8 = fold_build2 (PLUS_EXPR, stype, fold_convert (stype, rhs8), z); > > > + ASSERT_TRUE (operand_equal_p (lhs8, rhs8, OEP_STRUCTURAL_EQ)); > > > + ASSERT_FALSE (operand_equal_p (lhs8, rhs8, 0)); > > > +} > > > +} > > > + > > > namespace test_fold_vec_perm_cst { > > > > > > /* Build a VECTOR_CST corresponding to VMODE, and has > > > @@ -18150,6 +18267,7 @@ fold_const_cc_tests () > > > test_vector_folding (); > > > test_vec_duplicate_folding (); > > > test_fold_vec_perm_cst::test (); > > > + test_operand_equality::test (); > > > } > > > > > > } // namespace selftest > > > diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 > > > b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 > > > new file mode 100644 > > > index > > > > > > 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c > > > 2e24a4973c8539fae > > > --- /dev/null > > > +++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 > > > @@ -0,0 +1,20 @@ > > > +! { dg-do compile { target aarch64-*-* } } > > > +! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" } > > > + > > > +module a > > > +integer, parameter :: b=3, c=b > > > +contains > > > +subroutine d(block) > > > +integer f, col , block(:, :, :), e > > > +do f = 1, c > > > + do col = 1, c > > > + block(:f, :, e()) = do > > > + end do > > > + end do > > > + end > > > + end > > > + > > > +! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 3 IVs:} > ivopts > > } > > > } > > > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 2 IVs:} 1 > > > ivopts } } > > > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 1 IVs:} 1 > > > ivopts } } > > > + > > > diff --git a/gcc/tree-core.h b/gcc/tree-core.h > > > index > > > > > > 27c569c77022227643151fa4a909a95f3d45bf55..fe398d99bcaa9e8fcec45e5e0 > > > 7d133bc393aa4a7 100644 > > > --- a/gcc/tree-core.h > > > +++ b/gcc/tree-core.h > > > @@ -962,7 +962,9 @@ enum operand_equal_flag { > > > /* In conjunction with OEP_LEXICOGRAPHIC considers names of declarations > > > of the same kind. Used to compare VLA bounds involving parameters > > > across redeclarations of the same function. */ > > > - OEP_DECL_NAME = 512 > > > + OEP_DECL_NAME = 512, > > > + /* Check for structural equivalence and ignore NOP_CONVERT casts. */ > > > + OEP_STRUCTURAL_EQ = 1024 > > > }; > > > > > > /* Enum and arrays used for tree allocation stats. > > > diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc > > > index > > > > > > c3218a3e8eedbb8d0a7f14c01eeb069cb6024c29..b05e4ac1e63b5c110707a8a3 > > > ed5e614b7ccc702f 100644 > > > --- a/gcc/tree-ssa-loop-ivopts.cc > > > +++ b/gcc/tree-ssa-loop-ivopts.cc > > > @@ -1623,8 +1623,8 @@ record_group_use (struct ivopts_data *data, tree > > > *use_p, > > > > > > /* Check if it has the same stripped base and step. */ > > > if (operand_equal_p (iv->base_object, use->iv->base_object, 0) > > > - && operand_equal_p (iv->step, use->iv->step, 0) > > > - && operand_equal_p (addr_base, use->addr_base, 0)) > > > + && operand_equal_p (iv->step, use->iv->step, OEP_STRUCTURAL_EQ) > > > + && operand_equal_p (addr_base, use->addr_base, > > > OEP_STRUCTURAL_EQ)) > > > break; > > > } > > > if (i == data->vgroups.length ()) > > > > > > > > > > > > > > > --
ping > -----Original Message----- > From: Tamar Christina <Tamar.Christina@arm.com> > Sent: Monday, October 14, 2024 4:08 PM > To: Tamar Christina <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org > Cc: nd <nd@arm.com>; rguenther@suse.de; jlaw@ventanamicro.com > Subject: RE: [PATCH 2/2]middle-end: use two's complement equality when > comparing IVs during candidate selection [PR114932] > > ping > > > -----Original Message----- > > From: Tamar Christina <Tamar.Christina@arm.com> > > Sent: Monday, September 23, 2024 8:41 AM > > To: gcc-patches@gcc.gnu.org > > Cc: nd <nd@arm.com>; rguenther@suse.de; jlaw@ventanamicro.com > > Subject: RE: [PATCH 2/2]middle-end: use two's complement equality when > > comparing IVs during candidate selection [PR114932] > > > > ping > > > > > -----Original Message----- > > > From: Tamar Christina > > > Sent: Tuesday, September 10, 2024 8:57 PM > > > To: Tamar Christina <tamar.christina@arm.com>; gcc-patches@gcc.gnu.org > > > Cc: nd <nd@arm.com>; rguenther@suse.de; jlaw@ventanamicro.com > > > Subject: RE: [PATCH 2/2]middle-end: use two's complement equality when > > > comparing IVs during candidate selection [PR114932] > > > > > > ping > > > > > > > -----Original Message----- > > > > From: Tamar Christina <tamar.christina@arm.com> > > > > Sent: Tuesday, August 20, 2024 2:06 PM > > > > To: gcc-patches@gcc.gnu.org > > > > Cc: nd <nd@arm.com>; rguenther@suse.de; jlaw@ventanamicro.com > > > > Subject: [PATCH 2/2]middle-end: use two's complement equality when > > > comparing > > > > IVs during candidate selection [PR114932] > > > > > > > > Hi All, > > > > > > > > IVOPTS normally uses affine trees to perform comparisons between different > > IVs, > > > > but these seem to have been missing in two key spots and instead normal > tree > > > > equivalencies used. > > > > > > > > In some cases where we have a two-complements equivalence but not a strict > > > > signedness equivalencies we end up generating both a signed and unsigned IV > > for > > > > the same candidate. > > > > > > > > This patch implements a new OEP flag called OEP_STRUCTURAL_EQ. This flag > > will > > > > check if the operands would produce the same bit values after the > > computations > > > > even if the final sign is different. > > > > > > > > This happens quite a lot with fortran but can also happen in C because this > > came > > > > code is unable to figure out when one expression is a multiple of another. > > > > > > > > As an example in the attached testcase we get: > > > > > > > > Initial set of candidates: > > > > cost: 24 (complexity 3) > > > > reg_cost: 9 > > > > cand_cost: 15 > > > > cand_group_cost: 0 (complexity 3) > > > > candidates: 1, 6, 8 > > > > group:0 --> iv_cand:6, cost=(0,1) > > > > group:1 --> iv_cand:1, cost=(0,0) > > > > group:2 --> iv_cand:8, cost=(0,1) > > > > group:3 --> iv_cand:8, cost=(0,1) > > > > invariant variables: 6 > > > > invariant expressions: 1, 2 > > > > > > > > <Invariant Expressions>: > > > > inv_expr 1: stride.3_27 * 4 > > > > inv_expr 2: (unsigned long) stride.3_27 * 4 > > > > > > > > These end up being used in the same group: > > > > > > > > Group 1: > > > > cand cost compl. inv.expr. inv.vars > > > > 1 0 0 NIL; 6 > > > > 2 0 0 NIL; 6 > > > > 3 0 0 NIL; 6 > > > > > > > > which ends up with IV opts picking the signed and unsigned IVs: > > > > > > > > Improved to: > > > > cost: 24 (complexity 3) > > > > reg_cost: 9 > > > > cand_cost: 15 > > > > cand_group_cost: 0 (complexity 3) > > > > candidates: 1, 6, 8 > > > > group:0 --> iv_cand:6, cost=(0,1) > > > > group:1 --> iv_cand:1, cost=(0,0) > > > > group:2 --> iv_cand:8, cost=(0,1) > > > > group:3 --> iv_cand:8, cost=(0,1) > > > > invariant variables: 6 > > > > invariant expressions: 1, 2 > > > > > > > > and so generates the same IV as both signed and unsigned: > > > > > > > > ;; basic block 21, loop depth 3, count 214748368 (estimated locally, freq > > > > 58.2545), maybe hot > > > > ;; prev block 28, next block 31, flags: (NEW, REACHABLE, VISITED) > > > > ;; pred: 28 [always] count:23622320 (estimated locally, freq 6.4080) > > > > (FALLTHRU,EXECUTABLE) > > > > ;; 25 [always] count:191126046 (estimated locally, freq 51.8465) > > > > (FALLTHRU,DFS_BACK,EXECUTABLE) > > > > # .MEM_66 = PHI <.MEM_34(28), .MEM_22(25)> > > > > # ivtmp.22_41 = PHI <0(28), ivtmp.22_82(25)> > > > > # ivtmp.26_51 = PHI <ivtmp.26_55(28), ivtmp.26_72(25)> > > > > # ivtmp.28_90 = PHI <ivtmp.28_99(28), ivtmp.28_98(25)> > > > > > > > > ... > > > > > > > > ;; basic block 24, loop depth 3, count 214748366 (estimated locally, freq > > > > 58.2545), maybe hot > > > > ;; prev block 22, next block 25, flags: (NEW, REACHABLE, VISITED)' > > > > ;; pred: 22 [always] count:95443719 (estimated locally, freq 25.8909) > > > > (FALLTHRU) > > > > ;; 21 [33.3% (guessed)] count:71582790 (estimated locally, freq > > > 19.4182) > > > > (TRUE_VALUE,EXECUTABLE) > > > > ;; 31 [33.3% (guessed)] count:47721860 (estimated locally, freq > > > 12.9455) > > > > (TRUE_VALUE,EXECUTABLE) > > > > # .MEM_22 = PHI <.MEM_44(22), .MEM_31(21), .MEM_79(31)> > > > > ivtmp.22_82 = ivtmp.22_41 + 1; > > > > ivtmp.26_72 = ivtmp.26_51 + _80; > > > > ivtmp.28_98 = ivtmp.28_90 + _39; > > > > > > > > These two IVs are always used as unsigned, so IV ops generates: > > > > > > > > _73 = stride.3_27 * 4; > > > > _80 = (unsigned long) _73; > > > > _54 = (unsigned long) stride.3_27; > > > > _39 = _54 * 4; > > > > > > > > Which means that in e.g. exchange2 we generate a lot of duplicate code. > > > > > > > > This is because candidate 6 and 8 are equivalent under two's complement but > > > have > > > > different signs. > > > > > > > > This patch changes it so that if you have two IVs that are affine equivalent to > > > > just pick one over the other. IV already has code for this, so the patch just > > > > uses affine trees instead of tree for the check. > > > > > > > > With it we get: > > > > > > > > <Invariant Expressions>: > > > > inv_expr 1: stride.3_27 * 4 > > > > > > > > <Group-candidate Costs>: > > > > Group 0: > > > > cand cost compl. inv.expr. inv.vars > > > > 5 0 2 NIL; NIL; > > > > 6 0 3 NIL; NIL; > > > > > > > > Group 1: > > > > cand cost compl. inv.expr. inv.vars > > > > 1 0 0 NIL; 6 > > > > 2 0 0 NIL; 6 > > > > 3 0 0 NIL; 6 > > > > 4 0 0 NIL; 6 > > > > > > > > Initial set of candidates: > > > > cost: 16 (complexity 3) > > > > reg_cost: 6 > > > > cand_cost: 10 > > > > cand_group_cost: 0 (complexity 3) > > > > candidates: 1, 6 > > > > group:0 --> iv_cand:6, cost=(0,3) > > > > group:1 --> iv_cand:1, cost=(0,0) > > > > invariant variables: 6 > > > > invariant expressions: 1 > > > > > > > > The two patches together results in a 10% performance increase in exchange2 > > in > > > > SPECCPU 2017 and a 4% reduction in binary size and a 5% improvement in > > > > compile > > > > time. There's also a 5% performance improvement in fotonik3d and similar > > > > reduction in binary size. > > > > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu, > > > > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu > > > > -m32, -m64 and no issues. > > > > > > > > Ok for master? > > > > > > > > Thanks, > > > > Tamar > > > > > > > > gcc/ChangeLog: > > > > > > > > PR tree-optimization/114932 > > > > * fold-const.cc (operand_compare::operand_equal_p): Use it. > > > > (operand_compare::verify_hash_value): Likewise. > > > > (test_operand_equality::test): New. > > > > (fold_const_cc_tests): Use it. > > > > * tree-core.h (enum operand_equal_flag): Add OEP_STRUCTURAL_EQ. > > > > * tree-ssa-loop-ivopts.cc (record_group_use): Check for structural eq. > > > > > > > > gcc/testsuite/ChangeLog: > > > > > > > > PR tree-optimization/114932 > > > > * gfortran.dg/addressing-modes_2.f90: New test. > > > > > > > > --- > > > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc > > > > index > > > > > > > > > > 71e82b1d76d4106c7c23c54af8b35905a1af9f1c..52a4465de0355220befa6d6b > > > > 3301e21e038e3e41 100644 > > > > --- a/gcc/fold-const.cc > > > > +++ b/gcc/fold-const.cc > > > > @@ -3208,7 +3208,19 @@ operand_compare::operand_equal_p (tree > type0, > > > > const_tree arg0, > > > > return tree_int_cst_equal (arg0, arg1); > > > > } > > > > > > > > - if (!(flags & OEP_ADDRESS_OF)) > > > > + if ((flags & OEP_STRUCTURAL_EQ) > > > > + && (CONVERT_EXPR_P (arg0) || CONVERT_EXPR_P (arg1))) > > > > + { > > > > + const_tree t_arg0 = arg0; > > > > + const_tree t_arg1 = arg1; > > > > + STRIP_NOPS (arg0); > > > > + STRIP_NOPS (arg1); > > > > + /* Only recurse if the conversion was one that was valid to strip. */ > > > > + if (t_arg0 != arg0 || t_arg1 != arg1) > > > > + return operand_equal_p (type0, arg0, type1, arg1, flags); > > > > + } > > > > + > > > > + if (!(flags & (OEP_ADDRESS_OF | OEP_STRUCTURAL_EQ))) > > > > { > > > > /* If both types don't have the same signedness, then we can't consider > > > > them equal. We must check this before the STRIP_NOPS calls > > > > @@ -3439,6 +3451,22 @@ operand_compare::operand_equal_p (tree > type0, > > > > const_tree arg0, > > > > if (flags & (OEP_ONLY_CONST | OEP_BITWISE)) > > > > return false; > > > > > > > > + /* Check if we are checking an operation where the two's compliment > bitwise > > > > + representation of the result is not the same between signed and > > > > + unsigned arithmetic. */ > > > > + switch (TREE_CODE (arg0)) > > > > + { > > > > + case PLUS_EXPR: > > > > + case MINUS_EXPR: > > > > + case MULT_EXPR: > > > > + break; > > > > + > > > > + default: > > > > + /* Clear the structural comparison bit. */ > > > > + flags &= ~OEP_STRUCTURAL_EQ; > > > > + break; > > > > + } > > > > + > > > > /* Define macros to test an operand from arg0 and arg1 for equality and a > > > > variant that allows null and views null as being different from any > > > > non-null value. In the latter case, if either is null, the both > > > > @@ -4218,7 +4246,7 @@ operand_compare::verify_hash_value (const_tree > > > > arg0, const_tree arg1, > > > > { > > > > if (operand_equal_p (arg0, arg1, flags | OEP_NO_HASH_CHECK)) > > > > { > > > > - if (arg0 != arg1 && !(flags & OEP_DECL_NAME)) > > > > + if (arg0 != arg1 && !(flags & (OEP_DECL_NAME | > > > > OEP_STRUCTURAL_EQ))) > > > > { > > > > inchash::hash hstate0 (0), hstate1 (0); > > > > hash_operand (arg0, hstate0, flags | OEP_HASH_CHECK); > > > > @@ -17357,6 +17385,95 @@ test_arithmetic_folding () > > > > x); > > > > } > > > > > > > > +namespace test_operand_equality { > > > > + > > > > +/* Verify structural equality. */ > > > > + > > > > +/* Execute fold_vec_perm_cst unit tests. */ > > > > + > > > > +static void > > > > +test () > > > > +{ > > > > + tree stype = integer_type_node; > > > > + tree utype = unsigned_type_node; > > > > + tree x = create_tmp_var_raw (stype, "x"); > > > > + tree y = create_tmp_var_raw (stype, "y"); > > > > + tree z = create_tmp_var_raw (stype, "z"); > > > > + tree four = build_int_cst (stype, 4); > > > > + tree lhs1 = fold_build2 (PLUS_EXPR, stype, x, y); > > > > + tree rhs1 = fold_convert (stype, > > > > + fold_build2 (PLUS_EXPR, utype, > > > > + fold_convert (utype, x), > > > > + fold_convert (utype, y))); > > > > + > > > > + /* (int)((unsigned x) + (unsigned y)) == x + y. */ > > > > + ASSERT_TRUE (operand_equal_p (lhs1, rhs1, OEP_STRUCTURAL_EQ)); > > > > + ASSERT_FALSE (operand_equal_p (lhs1, rhs1, 0)); > > > > + > > > > + /* (int)(unsigned) x == x. */ > > > > + tree lhs2 = build1 (NOP_EXPR, stype, > > > > + build1 (NOP_EXPR, utype, x)); > > > > + tree rhs2 = x; > > > > + ASSERT_TRUE (operand_equal_p (lhs2, rhs2, OEP_STRUCTURAL_EQ)); > > > > + ASSERT_TRUE (operand_equal_p (lhs2, rhs2, 0)); > > > > + > > > > + /* (unsigned x) + (unsigned y) == x + y. */ > > > > + tree lhs3 = lhs1; > > > > + tree rhs3 = fold_build2 (PLUS_EXPR, utype, > > > > + fold_convert (utype, x), > > > > + fold_convert (utype, y)); > > > > + ASSERT_TRUE (operand_equal_p (lhs3, rhs3, OEP_STRUCTURAL_EQ)); > > > > + ASSERT_FALSE (operand_equal_p (lhs3, rhs3, 0)); > > > > + > > > > + /* (unsigned x) / (unsigned y) == x / y. */ > > > > + tree lhs4 = fold_build2 (TRUNC_DIV_EXPR, stype, x, y);; > > > > + tree rhs4 = fold_build2 (TRUNC_DIV_EXPR, utype, > > > > + fold_convert (utype, x), > > > > + fold_convert (utype, y)); > > > > + ASSERT_FALSE (operand_equal_p (lhs4, rhs4, OEP_STRUCTURAL_EQ)); > > > > + ASSERT_FALSE (operand_equal_p (lhs4, rhs4, 0)); > > > > + > > > > + /* (unsigned x) / 4 == x / 4. */ > > > > + tree lhs5 = fold_build2 (TRUNC_DIV_EXPR, stype, x, four);; > > > > + tree rhs5 = fold_build2 (TRUNC_DIV_EXPR, utype, > > > > + fold_convert (utype, x), > > > > + fold_convert (utype, four)); > > > > + ASSERT_FALSE (operand_equal_p (lhs5, rhs5, OEP_STRUCTURAL_EQ)); > > > > + ASSERT_FALSE (operand_equal_p (lhs5, rhs5, 0)); > > > > + > > > > + /* (unsigned x) + 4 == x + 4. */ > > > > + tree lhs6 = fold_build2 (PLUS_EXPR, stype, x, four); > > > > + tree rhs6 = fold_build2 (PLUS_EXPR, utype, > > > > + fold_convert (utype, x), > > > > + fold_convert (utype, four)); > > > > + ASSERT_TRUE (operand_equal_p (lhs6, rhs6, OEP_STRUCTURAL_EQ)); > > > > + ASSERT_FALSE (operand_equal_p (lhs6, rhs6, 0)); > > > > + > > > > + /* (unsigned x) + 4 == 4 + x. */ > > > > + tree lhs7 = fold_build2 (PLUS_EXPR, stype, four, x); > > > > + tree rhs7 = fold_build2 (PLUS_EXPR, utype, > > > > + fold_convert (utype, x), > > > > + fold_convert (utype, four)); > > > > + ASSERT_TRUE (operand_equal_p (lhs7, rhs7, OEP_STRUCTURAL_EQ)); > > > > + ASSERT_FALSE (operand_equal_p (lhs7, rhs7, 0)); > > > > + > > > > + /* ((unsigned x) + 4) * (unsigned y)) + z == ((4 + x) * y) + z. */ > > > > + tree lhs8 = fold_build2 (PLUS_EXPR, stype, > > > > + fold_build2 (MULT_EXPR, stype, > > > > + fold_build2 (PLUS_EXPR, stype, four, x), > > > > + y), > > > > + z); > > > > + tree rhs8 = fold_build2 (MULT_EXPR, utype, > > > > + fold_build2 (PLUS_EXPR, utype, > > > > + fold_convert (utype, x), > > > > + fold_convert (utype, four)), > > > > + fold_convert (utype, y)); > > > > + rhs8 = fold_build2 (PLUS_EXPR, stype, fold_convert (stype, rhs8), z); > > > > + ASSERT_TRUE (operand_equal_p (lhs8, rhs8, OEP_STRUCTURAL_EQ)); > > > > + ASSERT_FALSE (operand_equal_p (lhs8, rhs8, 0)); > > > > +} > > > > +} > > > > + > > > > namespace test_fold_vec_perm_cst { > > > > > > > > /* Build a VECTOR_CST corresponding to VMODE, and has > > > > @@ -18150,6 +18267,7 @@ fold_const_cc_tests () > > > > test_vector_folding (); > > > > test_vec_duplicate_folding (); > > > > test_fold_vec_perm_cst::test (); > > > > + test_operand_equality::test (); > > > > } > > > > > > > > } // namespace selftest > > > > diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 > > > > b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 > > > > new file mode 100644 > > > > index > > > > > > > > > > 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c > > > > 2e24a4973c8539fae > > > > --- /dev/null > > > > +++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 > > > > @@ -0,0 +1,20 @@ > > > > +! { dg-do compile { target aarch64-*-* } } > > > > +! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" } > > > > + > > > > +module a > > > > +integer, parameter :: b=3, c=b > > > > +contains > > > > +subroutine d(block) > > > > +integer f, col , block(:, :, :), e > > > > +do f = 1, c > > > > + do col = 1, c > > > > + block(:f, :, e()) = do > > > > + end do > > > > + end do > > > > + end > > > > + end > > > > + > > > > +! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 3 IVs:} > > ivopts > > > } > > > > } > > > > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 2 IVs:} > 1 > > > > ivopts } } > > > > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 1 IVs:} > 1 > > > > ivopts } } > > > > + > > > > diff --git a/gcc/tree-core.h b/gcc/tree-core.h > > > > index > > > > > > > > > > 27c569c77022227643151fa4a909a95f3d45bf55..fe398d99bcaa9e8fcec45e5e0 > > > > 7d133bc393aa4a7 100644 > > > > --- a/gcc/tree-core.h > > > > +++ b/gcc/tree-core.h > > > > @@ -962,7 +962,9 @@ enum operand_equal_flag { > > > > /* In conjunction with OEP_LEXICOGRAPHIC considers names of > declarations > > > > of the same kind. Used to compare VLA bounds involving parameters > > > > across redeclarations of the same function. */ > > > > - OEP_DECL_NAME = 512 > > > > + OEP_DECL_NAME = 512, > > > > + /* Check for structural equivalence and ignore NOP_CONVERT casts. */ > > > > + OEP_STRUCTURAL_EQ = 1024 > > > > }; > > > > > > > > /* Enum and arrays used for tree allocation stats. > > > > diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc > > > > index > > > > > > > > > > c3218a3e8eedbb8d0a7f14c01eeb069cb6024c29..b05e4ac1e63b5c110707a8a3 > > > > ed5e614b7ccc702f 100644 > > > > --- a/gcc/tree-ssa-loop-ivopts.cc > > > > +++ b/gcc/tree-ssa-loop-ivopts.cc > > > > @@ -1623,8 +1623,8 @@ record_group_use (struct ivopts_data *data, tree > > > > *use_p, > > > > > > > > /* Check if it has the same stripped base and step. */ > > > > if (operand_equal_p (iv->base_object, use->iv->base_object, 0) > > > > - && operand_equal_p (iv->step, use->iv->step, 0) > > > > - && operand_equal_p (addr_base, use->addr_base, 0)) > > > > + && operand_equal_p (iv->step, use->iv->step, OEP_STRUCTURAL_EQ) > > > > + && operand_equal_p (addr_base, use->addr_base, > > > > OEP_STRUCTURAL_EQ)) > > > > break; > > > > } > > > > if (i == data->vgroups.length ()) > > > > > > > > > > > > > > > > > > > > --
diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc index 71e82b1d76d4106c7c23c54af8b35905a1af9f1c..52a4465de0355220befa6d6b3301e21e038e3e41 100644 --- a/gcc/fold-const.cc +++ b/gcc/fold-const.cc @@ -3208,7 +3208,19 @@ operand_compare::operand_equal_p (tree type0, const_tree arg0, return tree_int_cst_equal (arg0, arg1); } - if (!(flags & OEP_ADDRESS_OF)) + if ((flags & OEP_STRUCTURAL_EQ) + && (CONVERT_EXPR_P (arg0) || CONVERT_EXPR_P (arg1))) + { + const_tree t_arg0 = arg0; + const_tree t_arg1 = arg1; + STRIP_NOPS (arg0); + STRIP_NOPS (arg1); + /* Only recurse if the conversion was one that was valid to strip. */ + if (t_arg0 != arg0 || t_arg1 != arg1) + return operand_equal_p (type0, arg0, type1, arg1, flags); + } + + if (!(flags & (OEP_ADDRESS_OF | OEP_STRUCTURAL_EQ))) { /* If both types don't have the same signedness, then we can't consider them equal. We must check this before the STRIP_NOPS calls @@ -3439,6 +3451,22 @@ operand_compare::operand_equal_p (tree type0, const_tree arg0, if (flags & (OEP_ONLY_CONST | OEP_BITWISE)) return false; + /* Check if we are checking an operation where the two's compliment bitwise + representation of the result is not the same between signed and + unsigned arithmetic. */ + switch (TREE_CODE (arg0)) + { + case PLUS_EXPR: + case MINUS_EXPR: + case MULT_EXPR: + break; + + default: + /* Clear the structural comparison bit. */ + flags &= ~OEP_STRUCTURAL_EQ; + break; + } + /* Define macros to test an operand from arg0 and arg1 for equality and a variant that allows null and views null as being different from any non-null value. In the latter case, if either is null, the both @@ -4218,7 +4246,7 @@ operand_compare::verify_hash_value (const_tree arg0, const_tree arg1, { if (operand_equal_p (arg0, arg1, flags | OEP_NO_HASH_CHECK)) { - if (arg0 != arg1 && !(flags & OEP_DECL_NAME)) + if (arg0 != arg1 && !(flags & (OEP_DECL_NAME | OEP_STRUCTURAL_EQ))) { inchash::hash hstate0 (0), hstate1 (0); hash_operand (arg0, hstate0, flags | OEP_HASH_CHECK); @@ -17357,6 +17385,95 @@ test_arithmetic_folding () x); } +namespace test_operand_equality { + +/* Verify structural equality. */ + +/* Execute fold_vec_perm_cst unit tests. */ + +static void +test () +{ + tree stype = integer_type_node; + tree utype = unsigned_type_node; + tree x = create_tmp_var_raw (stype, "x"); + tree y = create_tmp_var_raw (stype, "y"); + tree z = create_tmp_var_raw (stype, "z"); + tree four = build_int_cst (stype, 4); + tree lhs1 = fold_build2 (PLUS_EXPR, stype, x, y); + tree rhs1 = fold_convert (stype, + fold_build2 (PLUS_EXPR, utype, + fold_convert (utype, x), + fold_convert (utype, y))); + + /* (int)((unsigned x) + (unsigned y)) == x + y. */ + ASSERT_TRUE (operand_equal_p (lhs1, rhs1, OEP_STRUCTURAL_EQ)); + ASSERT_FALSE (operand_equal_p (lhs1, rhs1, 0)); + + /* (int)(unsigned) x == x. */ + tree lhs2 = build1 (NOP_EXPR, stype, + build1 (NOP_EXPR, utype, x)); + tree rhs2 = x; + ASSERT_TRUE (operand_equal_p (lhs2, rhs2, OEP_STRUCTURAL_EQ)); + ASSERT_TRUE (operand_equal_p (lhs2, rhs2, 0)); + + /* (unsigned x) + (unsigned y) == x + y. */ + tree lhs3 = lhs1; + tree rhs3 = fold_build2 (PLUS_EXPR, utype, + fold_convert (utype, x), + fold_convert (utype, y)); + ASSERT_TRUE (operand_equal_p (lhs3, rhs3, OEP_STRUCTURAL_EQ)); + ASSERT_FALSE (operand_equal_p (lhs3, rhs3, 0)); + + /* (unsigned x) / (unsigned y) == x / y. */ + tree lhs4 = fold_build2 (TRUNC_DIV_EXPR, stype, x, y);; + tree rhs4 = fold_build2 (TRUNC_DIV_EXPR, utype, + fold_convert (utype, x), + fold_convert (utype, y)); + ASSERT_FALSE (operand_equal_p (lhs4, rhs4, OEP_STRUCTURAL_EQ)); + ASSERT_FALSE (operand_equal_p (lhs4, rhs4, 0)); + + /* (unsigned x) / 4 == x / 4. */ + tree lhs5 = fold_build2 (TRUNC_DIV_EXPR, stype, x, four);; + tree rhs5 = fold_build2 (TRUNC_DIV_EXPR, utype, + fold_convert (utype, x), + fold_convert (utype, four)); + ASSERT_FALSE (operand_equal_p (lhs5, rhs5, OEP_STRUCTURAL_EQ)); + ASSERT_FALSE (operand_equal_p (lhs5, rhs5, 0)); + + /* (unsigned x) + 4 == x + 4. */ + tree lhs6 = fold_build2 (PLUS_EXPR, stype, x, four); + tree rhs6 = fold_build2 (PLUS_EXPR, utype, + fold_convert (utype, x), + fold_convert (utype, four)); + ASSERT_TRUE (operand_equal_p (lhs6, rhs6, OEP_STRUCTURAL_EQ)); + ASSERT_FALSE (operand_equal_p (lhs6, rhs6, 0)); + + /* (unsigned x) + 4 == 4 + x. */ + tree lhs7 = fold_build2 (PLUS_EXPR, stype, four, x); + tree rhs7 = fold_build2 (PLUS_EXPR, utype, + fold_convert (utype, x), + fold_convert (utype, four)); + ASSERT_TRUE (operand_equal_p (lhs7, rhs7, OEP_STRUCTURAL_EQ)); + ASSERT_FALSE (operand_equal_p (lhs7, rhs7, 0)); + + /* ((unsigned x) + 4) * (unsigned y)) + z == ((4 + x) * y) + z. */ + tree lhs8 = fold_build2 (PLUS_EXPR, stype, + fold_build2 (MULT_EXPR, stype, + fold_build2 (PLUS_EXPR, stype, four, x), + y), + z); + tree rhs8 = fold_build2 (MULT_EXPR, utype, + fold_build2 (PLUS_EXPR, utype, + fold_convert (utype, x), + fold_convert (utype, four)), + fold_convert (utype, y)); + rhs8 = fold_build2 (PLUS_EXPR, stype, fold_convert (stype, rhs8), z); + ASSERT_TRUE (operand_equal_p (lhs8, rhs8, OEP_STRUCTURAL_EQ)); + ASSERT_FALSE (operand_equal_p (lhs8, rhs8, 0)); +} +} + namespace test_fold_vec_perm_cst { /* Build a VECTOR_CST corresponding to VMODE, and has @@ -18150,6 +18267,7 @@ fold_const_cc_tests () test_vector_folding (); test_vec_duplicate_folding (); test_fold_vec_perm_cst::test (); + test_operand_equality::test (); } } // namespace selftest diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 new file mode 100644 index 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c2e24a4973c8539fae --- /dev/null +++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 @@ -0,0 +1,20 @@ +! { dg-do compile { target aarch64-*-* } } +! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" } + +module a +integer, parameter :: b=3, c=b +contains +subroutine d(block) +integer f, col , block(:, :, :), e +do f = 1, c + do col = 1, c + block(:f, :, e()) = do + end do + end do + end + end + +! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 3 IVs:} ivopts } } +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 2 IVs:} 1 ivopts } } +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 1 IVs:} 1 ivopts } } + diff --git a/gcc/tree-core.h b/gcc/tree-core.h index 27c569c77022227643151fa4a909a95f3d45bf55..fe398d99bcaa9e8fcec45e5e07d133bc393aa4a7 100644 --- a/gcc/tree-core.h +++ b/gcc/tree-core.h @@ -962,7 +962,9 @@ enum operand_equal_flag { /* In conjunction with OEP_LEXICOGRAPHIC considers names of declarations of the same kind. Used to compare VLA bounds involving parameters across redeclarations of the same function. */ - OEP_DECL_NAME = 512 + OEP_DECL_NAME = 512, + /* Check for structural equivalence and ignore NOP_CONVERT casts. */ + OEP_STRUCTURAL_EQ = 1024 }; /* Enum and arrays used for tree allocation stats. diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc index c3218a3e8eedbb8d0a7f14c01eeb069cb6024c29..b05e4ac1e63b5c110707a8a3ed5e614b7ccc702f 100644 --- a/gcc/tree-ssa-loop-ivopts.cc +++ b/gcc/tree-ssa-loop-ivopts.cc @@ -1623,8 +1623,8 @@ record_group_use (struct ivopts_data *data, tree *use_p, /* Check if it has the same stripped base and step. */ if (operand_equal_p (iv->base_object, use->iv->base_object, 0) - && operand_equal_p (iv->step, use->iv->step, 0) - && operand_equal_p (addr_base, use->addr_base, 0)) + && operand_equal_p (iv->step, use->iv->step, OEP_STRUCTURAL_EQ) + && operand_equal_p (addr_base, use->addr_base, OEP_STRUCTURAL_EQ)) break; } if (i == data->vgroups.length ())