Message ID | patch-15145-tamar@arm.com |
---|---|
State | New |
Series | [1/3] middle-end vect: Simplify and extend the complex numbers validation routines. |
Just a comment on the documentation:

Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 9ec051e94e10cca9eec2773e1b8c01b74b6ea4db..60dc5b3ea6087c2824ad1467bc66e9cfebe9dcfc 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -6325,12 +6325,12 @@ Perform a vector multiply and accumulate that is semantically the same as
>  a multiply and accumulate of complex numbers.
>
>  @smallexample
> -  complex TYPE c[N];
> -  complex TYPE a[N];
> -  complex TYPE b[N];
> +  complex TYPE op0[N];
> +  complex TYPE op1[N];
> +  complex TYPE op2[N];
>    for (int i = 0; i < N; i += 1)
>      @{
> -      c[i] += a[i] * b[i];
> +      op2[i] += op1[i] * op2[i];
>      @}

I think this should be:

  op0[i] = op1[i] * op2[i] + op3[i];

since operand 0 is the output and operand 3 is the accumulator input.

Same idea for the others.  For:

> @@ -6415,12 +6415,12 @@ Perform a vector multiply that is semantically the same as multiply of
>  complex numbers.
>
>  @smallexample
> -  complex TYPE c[N];
> -  complex TYPE a[N];
> -  complex TYPE b[N];
> +  complex TYPE op0[N];
> +  complex TYPE op1[N];
> +  complex TYPE op2[N];
>    for (int i = 0; i < N; i += 1)
>      @{
> -      c[i] = a[i] * b[i];
> +      op2[i] = op0[i] * op1[i];

…this I think it should be:

  op0[i] = op1[i] * op2[i];

Thanks,
Richard
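The operand numbering suggested above can be illustrated with a scalar reference loop. This is only a sketch of the documented semantics in plain C99, not the vectorizer's implementation; the function names `cfma_ref` and `cmul_ref` are hypothetical and do not exist in GCC.

```c
#include <complex.h>
#include <stddef.h>

/* Scalar reference for the multiply-accumulate form: operand 0 is the
   output and operand 3 is the accumulator input.  Hypothetical helper,
   written only to illustrate the operand numbering.  */
static void
cfma_ref (size_t n, float complex *op0, const float complex *op1,
          const float complex *op2, const float complex *op3)
{
  for (size_t i = 0; i < n; i += 1)
    op0[i] = op1[i] * op2[i] + op3[i];
}

/* Scalar reference for the plain multiply form: operand 0 is the output,
   operands 1 and 2 are the inputs.  Also a hypothetical helper.  */
static void
cmul_ref (size_t n, float complex *op0, const float complex *op1,
          const float complex *op2)
{
  for (size_t i = 0; i < n; i += 1)
    op0[i] = op1[i] * op2[i];
}
```

With op1 = 1 + 2i, op2 = 3 + 4i and op3 = 5 + 6i, the product is -5 + 10i, so the multiply-accumulate form yields 0 + 16i. Writing the accumulator as a separate input keeps the example valid even when the output does not alias any input.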
> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Friday, December 17, 2021 4:19 PM
> To: Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>
> Cc: Tamar Christina <Tamar.Christina@arm.com>; nd <nd@arm.com>;
> rguenther@suse.de
> Subject: Re: [1/3 PATCH]middle-end vect: Simplify and extend the complex
> numbers validation routines.
>
> Just a comment on the documentation:
>
> Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > index 9ec051e94e10cca9eec2773e1b8c01b74b6ea4db..60dc5b3ea6087c2824ad1467bc66e9cfebe9dcfc 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -6325,12 +6325,12 @@ Perform a vector multiply and accumulate that
> > is semantically the same as a multiply and accumulate of complex numbers.
> >
> >  @smallexample
> > -  complex TYPE c[N];
> > -  complex TYPE a[N];
> > -  complex TYPE b[N];
> > +  complex TYPE op0[N];
> > +  complex TYPE op1[N];
> > +  complex TYPE op2[N];
> >    for (int i = 0; i < N; i += 1)
> >      @{
> > -      c[i] += a[i] * b[i];
> > +      op2[i] += op1[i] * op2[i];
> >      @}
>
> I think this should be:
>
>   op0[i] = op1[i] * op2[i] + op3[i];
>
> since operand 0 is the output and operand 3 is the accumulator input.
>
> Same idea for the others.  For:
>
> > @@ -6415,12 +6415,12 @@ Perform a vector multiply that is semantically
> > the same as multiply of complex numbers.
> >
> >  @smallexample
> > -  complex TYPE c[N];
> > -  complex TYPE a[N];
> > -  complex TYPE b[N];
> > +  complex TYPE op0[N];
> > +  complex TYPE op1[N];
> > +  complex TYPE op2[N];
> >    for (int i = 0; i < N; i += 1)
> >      @{
> > -      c[i] = a[i] * b[i];
> > +      op2[i] = op0[i] * op1[i];
>
> …this I think it should be:
>
>   op0[i] = op1[i] * op2[i];

Updated patch attached.

Bootstrapped and regtested on aarch64-none-linux-gnu and x86_64-pc-linux-gnu with no regressions.

OK for master, and for backport to GCC 11 after some stew?

Thanks,
Tamar

gcc/ChangeLog:

	PR tree-optimization/102819
	PR tree-optimization/103169
	* doc/md.texi: Update docs for cfms, cfma.
	* tree-data-ref.h (same_data_refs): Accept optional offset.
	* tree-vect-slp-patterns.c (is_linear_load_p): Fix issue with repeating
	patterns.
	(vect_normalize_conj_loc): Remove.
	(is_eq_or_top): Change to take two nodes.
	(enum _conj_status, compatible_complex_nodes_p,
	vect_validate_multiplication): New.
	(class complex_add_pattern, complex_add_pattern::matches,
	complex_add_pattern::recognize, class complex_mul_pattern,
	complex_mul_pattern::recognize, class complex_fms_pattern,
	complex_fms_pattern::recognize, class complex_operations_pattern,
	complex_operations_pattern::recognize, addsub_pattern::recognize): Pass
	new cache.
	(complex_fms_pattern::matches, complex_mul_pattern::matches): Pass new
	cache and use new validation code.
	* tree-vect-slp.c (vect_match_slp_patterns_2, vect_match_slp_patterns,
	vect_analyze_slp): Pass along cache.
	(compatible_calls_p): Expose.
	* tree-vectorizer.h (compatible_calls_p, slp_node_hash,
	slp_compat_nodes_map_t): New.
	(class vect_pattern): Update signatures include new cache.

gcc/testsuite/ChangeLog:

	PR tree-optimization/102819
	PR tree-optimization/103169
	* g++.dg/vect/pr99149.cc: xfail for now.
	* gcc.dg/vect/complex/pr102819-1.c: New test.
	* gcc.dg/vect/complex/pr102819-2.c: New test.
	* gcc.dg/vect/complex/pr102819-3.c: New test.
	* gcc.dg/vect/complex/pr102819-4.c: New test.
	* gcc.dg/vect/complex/pr102819-5.c: New test.
	* gcc.dg/vect/complex/pr102819-6.c: New test.
	* gcc.dg/vect/complex/pr102819-7.c: New test.
	* gcc.dg/vect/complex/pr102819-8.c: New test.
	* gcc.dg/vect/complex/pr102819-9.c: New test.
	* gcc.dg/vect/complex/pr103169.c: New test.

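To illustrate the "repeating patterns" entry for is_linear_load_p, here is a minimal standalone sketch of the idea visible in the hunk: each lane index is reduced modulo 2 before it is compared against the all-even and all-odd candidates. The enum and the function `classify_dup_lanes` are hypothetical stand-ins, not GCC's types or API, and the example permutation { 0, 0, 2, 2 } is an assumed instance of a repeating pattern rather than one taken from a vectorizer dump.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the permute kinds; not GCC's definitions.  */
enum perm_kind { KIND_UNKNOWN, KIND_EVENEVEN, KIND_ODDODD };

/* Classify a load permutation as "all even lanes" or "all odd lanes".
   Reducing each lane modulo 2 lets a repeated pattern such as
   { 0, 0, 2, 2 } classify the same way as { 0, 0 }.  */
static enum perm_kind
classify_dup_lanes (const unsigned *loads, size_t n)
{
  if (n == 0)
    return KIND_UNKNOWN;

  bool all_even = true;
  bool all_odd = true;
  for (size_t i = 0; i < n; i++)
    {
      unsigned adj_load = loads[i] % 2;
      if (adj_load != 0)
        all_even = false;
      if (adj_load != 1)
        all_odd = false;
    }

  if (all_even)
    return KIND_EVENEVEN;
  if (all_odd)
    return KIND_ODDODD;
  return KIND_UNKNOWN;
}
```

Without the modulo step, a comparison of the raw lane value against 0 or 1 would reject lanes 2 and 3 even though they repeat the same even/odd structure.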
--- inline copy of patch --- diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index 9ec051e94e10cca9eec2773e1b8c01b74b6ea4db..ad06b02d36876082afe4c3f3fb51887f7a522b23 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -6325,12 +6325,13 @@ Perform a vector multiply and accumulate that is semantically the same as a multiply and accumulate of complex numbers. @smallexample - complex TYPE c[N]; - complex TYPE a[N]; - complex TYPE b[N]; + complex TYPE op0[N]; + complex TYPE op1[N]; + complex TYPE op2[N]; + complex TYPE op3[N]; for (int i = 0; i < N; i += 1) @{ - c[i] += a[i] * b[i]; + op0[i] = op1[i] * op2[i] + op3[i]; @} @end smallexample @@ -6348,12 +6349,13 @@ the same as a multiply and accumulate of complex numbers where the second multiply arguments is conjugated. @smallexample - complex TYPE c[N]; - complex TYPE a[N]; - complex TYPE b[N]; + complex TYPE op0[N]; + complex TYPE op1[N]; + complex TYPE op2[N]; + complex TYPE op3[N]; for (int i = 0; i < N; i += 1) @{ - c[i] += a[i] * conj (b[i]); + op0[i] = op1[i] * conj (op2[i]) + op3[i]; @} @end smallexample @@ -6370,12 +6372,13 @@ Perform a vector multiply and subtract that is semantically the same as a multiply and subtract of complex numbers. @smallexample - complex TYPE c[N]; - complex TYPE a[N]; - complex TYPE b[N]; + complex TYPE op0[N]; + complex TYPE op1[N]; + complex TYPE op2[N]; + complex TYPE op3[N]; for (int i = 0; i < N; i += 1) @{ - c[i] -= a[i] * b[i]; + op0[i] = op1[i] * op2[i] - op3[i]; @} @end smallexample @@ -6393,12 +6396,13 @@ the same as a multiply and subtract of complex numbers where the second multiply arguments is conjugated. 
@smallexample - complex TYPE c[N]; - complex TYPE a[N]; - complex TYPE b[N]; + complex TYPE op0[N]; + complex TYPE op1[N]; + complex TYPE op2[N]; + complex TYPE op3[N]; for (int i = 0; i < N; i += 1) @{ - c[i] -= a[i] * conj (b[i]); + op0[i] = op1[i] * conj (op2[i]) - op3[i]; @} @end smallexample @@ -6415,12 +6419,12 @@ Perform a vector multiply that is semantically the same as multiply of complex numbers. @smallexample - complex TYPE c[N]; - complex TYPE a[N]; - complex TYPE b[N]; + complex TYPE op0[N]; + complex TYPE op1[N]; + complex TYPE op2[N]; for (int i = 0; i < N; i += 1) @{ - c[i] = a[i] * b[i]; + op0[i] = op1[i] * op2[i]; @} @end smallexample @@ -6437,12 +6441,12 @@ Perform a vector multiply by conjugate that is semantically the same as a multiply of complex numbers where the second multiply arguments is conjugated. @smallexample - complex TYPE c[N]; - complex TYPE a[N]; - complex TYPE b[N]; + complex TYPE op0[N]; + complex TYPE op1[N]; + complex TYPE op2[N]; for (int i = 0; i < N; i += 1) @{ - c[i] = a[i] * conj (b[i]); + op0[i] = op1[i] * conj (op2[i]); @} @end smallexample diff --git a/gcc/testsuite/g++.dg/vect/pr99149.cc b/gcc/testsuite/g++.dg/vect/pr99149.cc index e6e0594a336fa053ffba64a12e2de43a4e373f49..bb9f5fa89f12b184368bf5488d6e9432c2166463 100755 --- a/gcc/testsuite/g++.dg/vect/pr99149.cc +++ b/gcc/testsuite/g++.dg/vect/pr99149.cc @@ -24,4 +24,4 @@ public: } n; main() { n.j(); } -/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_MUL" 1 "slp2" } } */ +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_MUL" 1 "slp2" { xfail { vect_float } } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-1.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-1.c new file mode 100644 index 0000000000000000000000000000000000000000..46b9a55f05279d732fa1418e02f779cf693ede07 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-1.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ + +float f[12][100]; + 
+void bad1(float v1, float v2) +{ + for (int r = 0; r < 100; r += 4) + { + int i = r + 1; + f[0][r] = f[1][r] * (f[2][r] + v2) - f[1][i] * (f[2][i] + v1); + f[0][i] = f[1][r] * (f[2][i] + v1) + f[1][i] * (f[2][r] + v2); + f[0][r+2] = f[1][r+2] * (f[2][r+2] + v2) - f[1][i+2] * (f[2][i+2] + v1); + f[0][i+2] = f[1][r+2] * (f[2][i+2] + v1) + f[1][i+2] * (f[2][r+2] + v2); + // ^^^^^^^ ^^^^^^^ + } +} + +/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-2.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-2.c new file mode 100644 index 0000000000000000000000000000000000000000..ffe646efe57f7ad07541b0fb96601596f46dc5f8 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-2.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ + +float f[12][100]; + +void bad1(float v1, float v2) +{ + for (int r = 0; r < 100; r += 2) + { + int i = r + 1; + f[0][r] = f[1][r] * (f[2][r] + v1) - f[1][i] * (f[2][i] + v2); + f[0][i] = f[1][r] * (f[2][i] + v1) + f[1][i] * (f[2][r] + v2); + } +} + +/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-3.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-3.c new file mode 100644 index 0000000000000000000000000000000000000000..5f98aa204d8b11b0cb433f8965dbb72cf8940de1 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-3.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ + +float f[12][100]; + +void good1(float v1, float v2) +{ + for (int r = 0; r < 100; r += 2) + { + int i = r + 1; + f[0][r] = f[1][r] * (f[2][r] + v2) - f[1][i] * (f[2][i] + v1); + f[0][i] = f[1][r] * (f[2][i] + v1) + f[1][i] * (f[2][r] + v2); + } +} + +/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */ + diff --git 
a/gcc/testsuite/gcc.dg/vect/complex/pr102819-4.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-4.c new file mode 100644 index 0000000000000000000000000000000000000000..882851789c5085e734000609114be480d3b08bd0 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-4.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ + +float f[12][100]; + +void good1() +{ + for (int r = 0; r < 100; r += 2) + { + int i = r + 1; + f[0][r] = f[1][r] * f[2][r] - f[1][i] * f[2][i]; + f[0][i] = f[1][r] * f[2][i] + f[1][i] * f[2][r]; + } +} + +/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-5.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-5.c new file mode 100644 index 0000000000000000000000000000000000000000..6a2d549d65f3f27d407fb0bd469473e6a5c333ae --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-5.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ + +float f[12][100]; + +void good2() +{ + for (int r = 0; r < 100; r += 2) + { + int i = r + 1; + f[0][r] = f[1][r] * (f[2][r] + 1) - f[1][i] * (f[2][i] + 1); + f[0][i] = f[1][r] * (f[2][i] + 1) + f[1][i] * (f[2][r] + 1); + } +} + +/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-6.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-6.c new file mode 100644 index 0000000000000000000000000000000000000000..71e66dbe3b29eec1fffb8df9b216022fdc0af54e --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-6.c @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ + +float f[12][100]; + +void bad1() +{ + for (int r = 0; r < 100; r += 2) + { + int i = r + 1; + f[0][r] = f[1][r] * f[2][r] - f[1][i] * f[3][i]; + f[0][i] = f[1][r] * f[2][i] + f[1][i] * f[3][r]; + // ^^^^^^^ ^^^^^^^ + } +} + +/* { dg-final { 
scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-7.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-7.c new file mode 100644 index 0000000000000000000000000000000000000000..536672f3c8bb474ad5fa4bb61b3a36b555acf3cf --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-7.c @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ + +float f[12][100]; + +void bad2() +{ + for (int r = 0; r < 100; r += 2) + { + int i = r + 1; + f[0][r] = f[1][r] * (f[2][r] + 1) - f[1][i] * f[2][i]; + f[0][i] = f[1][r] * (f[2][i] + 1) + f[1][i] * f[2][r]; + // ^^^^ + } +} + +/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-8.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-8.c new file mode 100644 index 0000000000000000000000000000000000000000..07b48148688b7d530e5891d023d558b58a485c23 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-8.c @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ + +float f[12][100]; + +void bad3() +{ + for (int r = 0; r < 100; r += 2) + { + int i = r + 1; + f[0][r] = f[1][r] * f[2][r] - f[1][r] * f[2][i]; + f[0][i] = f[1][r] * f[2][i] + f[1][i] * f[2][r]; + // ^^^^^^^ + } +} + +/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-9.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-9.c new file mode 100644 index 0000000000000000000000000000000000000000..7655852434b21b381fe7ee316e8caf3d485b8ee1 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-9.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ + +#include <stdio.h> +#include <complex.h> + +#define N 200 +#define TYPE float +#define TYPE2 float + +void g (TYPE2 complex a[restrict N], 
TYPE complex b[restrict N], TYPE complex c[restrict N]) +{ + for (int i=0; i < N; i++) + { + c[i] -= a[i] * b[0]; + } +} + +/* The pattern overlaps with COMPLEX_ADD so we need to support consuming ADDs in COMPLEX_FMS. */ + +/* { dg-final { scan-tree-dump "Found COMPLEX_FMS" "vect" { xfail { vect_float } } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr103169.c b/gcc/testsuite/gcc.dg/vect/complex/pr103169.c new file mode 100644 index 0000000000000000000000000000000000000000..1bfabbd85a0eedfb4156a82574324126e9083fc5 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/pr103169.c @@ -0,0 +1,12 @@ +/* { dg-do compile { target { vect_double } } } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ +/* { dg-additional-options "-O2 -fvect-cost-model=unlimited" } */ + +_Complex double b_0, c_0; + +void +mul270snd (void) +{ + c_0 = b_0 * 1.0iF * 1.0iF; +} + diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h index 74f579c9f3f23bac25d21546068c2ab43209aa2b..8ad5fa521279b20fa5e63eecf442d5dc5c16e7ee 100644 --- a/gcc/tree-data-ref.h +++ b/gcc/tree-data-ref.h @@ -600,10 +600,11 @@ same_data_refs_base_objects (data_reference_p a, data_reference_p b) } /* Return true when the data references A and B are accessing the same - memory object with the same access functions. */ + memory object with the same access functions. Optionally skip the + last OFFSET dimensions in the data reference. 
*/ static inline bool -same_data_refs (data_reference_p a, data_reference_p b) +same_data_refs (data_reference_p a, data_reference_p b, int offset = 0) { unsigned int i; @@ -614,7 +615,7 @@ same_data_refs (data_reference_p a, data_reference_p b) if (!same_data_refs_base_objects (a, b)) return false; - for (i = 0; i < DR_NUM_DIMENSIONS (a); i++) + for (i = offset; i < DR_NUM_DIMENSIONS (a); i++) if (!eq_evolutions_p (DR_ACCESS_FN (a, i), DR_ACCESS_FN (b, i))) return false; diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c index 0350441fad9690cd5d04337171ca3470a064a571..020c29bba08c5bd80503a2dbc04292f8fd310b3c 100644 --- a/gcc/tree-vect-slp-patterns.c +++ b/gcc/tree-vect-slp-patterns.c @@ -149,12 +149,13 @@ is_linear_load_p (load_permutation_t loads) int valid_patterns = 4; FOR_EACH_VEC_ELT (loads, i, load) { - if (candidates[0] != PERM_UNKNOWN && load != 1) + unsigned adj_load = load % 2; + if (candidates[0] != PERM_UNKNOWN && adj_load != 1) { candidates[0] = PERM_UNKNOWN; valid_patterns--; } - if (candidates[1] != PERM_UNKNOWN && load != 0) + if (candidates[1] != PERM_UNKNOWN && adj_load != 0) { candidates[1] = PERM_UNKNOWN; valid_patterns--; @@ -596,11 +597,12 @@ class complex_add_pattern : public complex_pattern public: void build (vec_info *); static internal_fn - matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *, - vec<slp_tree> *); + matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, + slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *); static vect_pattern* - recognize (slp_tree_to_load_perm_map_t *, slp_tree *); + recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *, + slp_tree *); static vect_pattern* mkInstance (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn) @@ -647,6 +649,7 @@ complex_add_pattern::build (vec_info *vinfo) internal_fn complex_add_pattern::matches (complex_operation_t op, slp_tree_to_load_perm_map_t *perm_cache, + slp_compat_nodes_map_t * /* compat_cache 
*/, slp_tree *node, vec<slp_tree> *ops) { internal_fn ifn = IFN_LAST; @@ -692,13 +695,14 @@ complex_add_pattern::matches (complex_operation_t op, vect_pattern* complex_add_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache, + slp_compat_nodes_map_t *compat_cache, slp_tree *node) { auto_vec<slp_tree> ops; complex_operation_t op = vect_detect_pair_op (*node, true, &ops); internal_fn ifn - = complex_add_pattern::matches (op, perm_cache, node, &ops); + = complex_add_pattern::matches (op, perm_cache, compat_cache, node, &ops); if (ifn == IFN_LAST) return NULL; @@ -709,147 +713,214 @@ complex_add_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache, * complex_mul_pattern ******************************************************************************/ -/* Check to see if either of the trees in ARGS are a NEGATE_EXPR. If the first - child (args[0]) is a NEGATE_EXPR then NEG_FIRST_P is set to TRUE. - - If a negate is found then the values in ARGS are reordered such that the - negate node is always the second one and the entry is replaced by the child - of the negate node. */ +/* Helper function to check if PERM is KIND or PERM_TOP. 
*/ static inline bool -vect_normalize_conj_loc (vec<slp_tree> &args, bool *neg_first_p = NULL) +is_eq_or_top (slp_tree_to_load_perm_map_t *perm_cache, + slp_tree op1, complex_perm_kinds_t kind1, + slp_tree op2, complex_perm_kinds_t kind2) { - gcc_assert (args.length () == 2); - bool neg_found = false; - - if (vect_match_expression_p (args[0], NEGATE_EXPR)) - { - std::swap (args[0], args[1]); - neg_found = true; - if (neg_first_p) - *neg_first_p = true; - } - else if (vect_match_expression_p (args[1], NEGATE_EXPR)) - { - neg_found = true; - if (neg_first_p) - *neg_first_p = false; - } + complex_perm_kinds_t perm1 = linear_loads_p (perm_cache, op1); + if (perm1 != kind1 && perm1 != PERM_TOP) + return false; - if (neg_found) - args[1] = SLP_TREE_CHILDREN (args[1])[0]; + complex_perm_kinds_t perm2 = linear_loads_p (perm_cache, op2); + if (perm2 != kind2 && perm2 != PERM_TOP) + return false; - return neg_found; + return true; } -/* Helper function to check if PERM is KIND or PERM_TOP. */ +enum _conj_status { CONJ_NONE, CONJ_FST, CONJ_SND }; static inline bool -is_eq_or_top (complex_perm_kinds_t perm, complex_perm_kinds_t kind) +compatible_complex_nodes_p (slp_compat_nodes_map_t *compat_cache, + slp_tree a, int *pa, slp_tree b, int *pb) { - return perm == kind || perm == PERM_TOP; -} + bool *tmp; + std::pair<slp_tree, slp_tree> key = std::make_pair(a, b); + if ((tmp = compat_cache->get (key)) != NULL) + return *tmp; -/* Helper function that checks to see if LEFT_OP and RIGHT_OP are both MULT_EXPR - nodes but also that they represent an operation that is either a complex - multiplication or a complex multiplication by conjugated value. + compat_cache->put (key, false); - Of the negation is expected to be in the first half of the tree (As required - by an FMS pattern) then NEG_FIRST is true. If the operation is a conjugate - operation then CONJ_FIRST_OPERAND is set to indicate whether the first or - second operand contains the conjugate operation. 
*/ + if (SLP_TREE_CHILDREN (a).length () != SLP_TREE_CHILDREN (b).length ()) + return false; -static inline bool -vect_validate_multiplication (slp_tree_to_load_perm_map_t *perm_cache, - const vec<slp_tree> &left_op, - const vec<slp_tree> &right_op, - bool neg_first, bool *conj_first_operand, - bool fms) -{ - /* The presence of a negation indicates that we have either a conjugate or a - rotation. We need to distinguish which one. */ - *conj_first_operand = false; - complex_perm_kinds_t kind; - - /* Complex conjugates have the negation on the imaginary part of the - number where rotations affect the real component. So check if the - negation is on a dup of lane 1. */ - if (fms) + if (SLP_TREE_DEF_TYPE (a) != SLP_TREE_DEF_TYPE (b)) + return false; + + /* Only internal nodes can be loads, as such we can't check further if they + are externals. */ + if (SLP_TREE_DEF_TYPE (a) != vect_internal_def) { - /* Canonicalization for fms is not consistent. So have to test both - variants to be sure. This needs to be fixed in the mid-end so - this part can be simpler. 
*/ - kind = linear_loads_p (perm_cache, right_op[0]); - if (!((is_eq_or_top (linear_loads_p (perm_cache, right_op[0]), PERM_ODDODD) - && is_eq_or_top (linear_loads_p (perm_cache, right_op[1]), - PERM_ODDEVEN)) - || (kind == PERM_ODDEVEN - && is_eq_or_top (linear_loads_p (perm_cache, right_op[1]), - PERM_ODDODD)))) - return false; + for (unsigned i = 0; i < SLP_TREE_SCALAR_OPS (a).length (); i++) + { + tree op1 = SLP_TREE_SCALAR_OPS (a)[pa[i % 2]]; + tree op2 = SLP_TREE_SCALAR_OPS (b)[pb[i % 2]]; + if (!operand_equal_p (op1, op2, 0)) + return false; + } + + compat_cache->put (key, true); + return true; } + + auto a_stmt = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (a)); + auto b_stmt = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (b)); + + if (gimple_code (a_stmt) != gimple_code (b_stmt)) + return false; + + /* code, children, type, externals, loads, constants */ + if (gimple_num_args (a_stmt) != gimple_num_args (b_stmt)) + return false; + + /* At this point, a and b are known to be the same gimple operations. */ + if (is_gimple_call (a_stmt)) + { + if (!compatible_calls_p (dyn_cast <gcall *> (a_stmt), + dyn_cast <gcall *> (b_stmt))) + return false; + } + else if (!is_gimple_assign (a_stmt)) + return false; else { - if (linear_loads_p (perm_cache, right_op[1]) != PERM_ODDODD - && !is_eq_or_top (linear_loads_p (perm_cache, right_op[0]), - PERM_ODDEVEN)) + tree_code acode = gimple_assign_rhs_code (a_stmt); + tree_code bcode = gimple_assign_rhs_code (b_stmt); + if ((acode == REALPART_EXPR || acode == IMAGPART_EXPR) + && (bcode == REALPART_EXPR || bcode == IMAGPART_EXPR)) + return true; + + if (acode != bcode) return false; } - /* Deal with differences in indexes. */ - int index1 = fms ? 1 : 0; - int index2 = fms ? 0 : 1; - - /* Check if the conjugate is on the second first or second operand. The - order of the node with the conjugate value determines this, and the dup - node must be one of lane 0 of the same DR as the neg node. 
*/ - kind = linear_loads_p (perm_cache, left_op[index1]); - if (kind == PERM_TOP) + if (!SLP_TREE_LOAD_PERMUTATION (a).exists () + || !SLP_TREE_LOAD_PERMUTATION (b).exists ()) { - if (linear_loads_p (perm_cache, left_op[index2]) == PERM_EVENODD) - return true; + for (unsigned i = 0; i < gimple_num_args (a_stmt); i++) + { + tree t1 = gimple_arg (a_stmt, i); + tree t2 = gimple_arg (b_stmt, i); + if (TREE_CODE (t1) != TREE_CODE (t2)) + return false; + + /* If SSA name then we will need to inspect the children + so we can punt here. */ + if (TREE_CODE (t1) == SSA_NAME) + continue; + + if (!operand_equal_p (t1, t2, 0)) + return false; + } } - else if (kind == PERM_EVENODD && !neg_first) + else { - if ((kind = linear_loads_p (perm_cache, left_op[index2])) != PERM_EVENEVEN) + auto dr1 = STMT_VINFO_DATA_REF (SLP_TREE_REPRESENTATIVE (a)); + auto dr2 = STMT_VINFO_DATA_REF (SLP_TREE_REPRESENTATIVE (b)); + /* Don't check the last dimension as that's checked by the lineary + checks. This check is also much stricter than what we need + because it doesn't consider loading from adjacent elements + in the same struct as loading from the same base object. + But for now, I'll play it safe. */ + if (!same_data_refs (dr1, dr2, 1)) return false; - return true; } - else if (kind == PERM_EVENEVEN && neg_first) + + for (unsigned i = 0; i < SLP_TREE_CHILDREN (a).length (); i++) { - if ((kind = linear_loads_p (perm_cache, left_op[index2])) != PERM_EVENODD) + if (!compatible_complex_nodes_p (compat_cache, + SLP_TREE_CHILDREN (a)[i], pa, + SLP_TREE_CHILDREN (b)[i], pb)) return false; - - *conj_first_operand = true; - return true; } - else - return false; - - if (kind != PERM_EVENEVEN) - return false; + compat_cache->put (key, true); return true; } -/* Helper function to help distinguish between a conjugate and a rotation in a - complex multiplication. The operations have similar shapes but the order of - the load permutes are different. 
This function returns TRUE when the order - is consistent with a multiplication or multiplication by conjugated - operand but returns FALSE if it's a multiplication by rotated operand. */ - static inline bool vect_validate_multiplication (slp_tree_to_load_perm_map_t *perm_cache, - const vec<slp_tree> &op, - complex_perm_kinds_t permKind) + slp_compat_nodes_map_t *compat_cache, + vec<slp_tree> &left_op, + vec<slp_tree> &right_op, + bool subtract, + enum _conj_status *_status) { - /* The left node is the more common case, test it first. */ - if (!is_eq_or_top (linear_loads_p (perm_cache, op[0]), permKind)) + auto_vec<slp_tree> ops; + enum _conj_status stats = CONJ_NONE; + + /* The complex operations can occur in two layouts and two permute sequences + so declare them and re-use them. */ + int styles[][4] = { { 0, 2, 1, 3} /* {L1, R1} + {L2, R2}. */ + , { 0, 3, 1, 2} /* {L1, R2} + {L2, R1}. */ + }; + + /* Now for the corresponding permutes that go with these values. */ + complex_perm_kinds_t perms[][4] + = { { PERM_EVENEVEN, PERM_ODDODD, PERM_EVENODD, PERM_ODDEVEN } + , { PERM_EVENODD, PERM_ODDEVEN, PERM_EVENEVEN, PERM_ODDODD } + }; + + /* These permutes are used during comparisons of externals on which + we require strict equality. */ + int cq[][4][2] + = { { { 0, 0 }, { 1, 1 }, { 0, 1 }, { 1, 0 } } + , { { 0, 1 }, { 1, 0 }, { 0, 0 }, { 1, 1 } } + }; + + /* Default to style and perm 0, most operations use this one. */ + int style = 0; + int perm = subtract ? 1 : 0; + + /* Check if we have a negate operation, if so absorb the node and continue + looking. */ + bool neg0 = vect_match_expression_p (right_op[0], NEGATE_EXPR); + bool neg1 = vect_match_expression_p (right_op[1], NEGATE_EXPR); + + /* Determine which style we're looking at. We only have different ones + whenever a conjugate is involved. 
*/ + if (neg0 && neg1) + ; + else if (neg0) { - if (!is_eq_or_top (linear_loads_p (perm_cache, op[1]), permKind)) - return false; + right_op[0] = SLP_TREE_CHILDREN (right_op[0])[0]; + stats = CONJ_FST; + if (subtract) + perm = 0; } - return true; + else if (neg1) + { + right_op[1] = SLP_TREE_CHILDREN (right_op[1])[0]; + stats = CONJ_SND; + perm = 1; + } + + *_status = stats; + + /* Flatten the inputs after we've remapped them. */ + ops.create (4); + ops.safe_splice (left_op); + ops.safe_splice (right_op); + + /* Extract out the elements to check. */ + slp_tree op0 = ops[styles[style][0]]; + slp_tree op1 = ops[styles[style][1]]; + slp_tree op2 = ops[styles[style][2]]; + slp_tree op3 = ops[styles[style][3]]; + + /* Do cheapest test first. If failed no need to analyze further. */ + if (linear_loads_p (perm_cache, op0) != perms[perm][0] + || linear_loads_p (perm_cache, op1) != perms[perm][1] + || !is_eq_or_top (perm_cache, op2, perms[perm][2], op3, perms[perm][3])) + return false; + + return compatible_complex_nodes_p (compat_cache, op0, cq[perm][0], op1, + cq[perm][1]) + && compatible_complex_nodes_p (compat_cache, op2, cq[perm][2], op3, + cq[perm][3]); } /* This function combines two nodes containing only even and only odd lanes @@ -908,11 +979,12 @@ class complex_mul_pattern : public complex_pattern public: void build (vec_info *); static internal_fn - matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *, - vec<slp_tree> *); + matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, + slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *); static vect_pattern* - recognize (slp_tree_to_load_perm_map_t *, slp_tree *); + recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *, + slp_tree *); static vect_pattern* mkInstance (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn) @@ -943,6 +1015,7 @@ class complex_mul_pattern : public complex_pattern internal_fn complex_mul_pattern::matches (complex_operation_t op, 
slp_tree_to_load_perm_map_t *perm_cache, + slp_compat_nodes_map_t *compat_cache, slp_tree *node, vec<slp_tree> *ops) { internal_fn ifn = IFN_LAST; @@ -990,17 +1063,13 @@ complex_mul_pattern::matches (complex_operation_t op, || linear_loads_p (perm_cache, left_op[1]) == PERM_ODDEVEN) return IFN_LAST; - bool neg_first = false; - bool conj_first_operand = false; - bool is_neg = vect_normalize_conj_loc (right_op, &neg_first); + enum _conj_status status; + if (!vect_validate_multiplication (perm_cache, compat_cache, left_op, + right_op, false, &status)) + return IFN_LAST; - if (!is_neg) + if (status == CONJ_NONE) { - /* A multiplication needs to multiply agains the real pair, otherwise - the pattern matches that of FMS. */ - if (!vect_validate_multiplication (perm_cache, left_op, PERM_EVENEVEN) - || vect_normalize_conj_loc (left_op)) - return IFN_LAST; if (add0) ifn = IFN_COMPLEX_FMA; else @@ -1008,11 +1077,6 @@ complex_mul_pattern::matches (complex_operation_t op, } else { - if (!vect_validate_multiplication (perm_cache, left_op, right_op, - neg_first, &conj_first_operand, - false)) - return IFN_LAST; - if(add0) ifn = IFN_COMPLEX_FMA_CONJ; else @@ -1029,19 +1093,13 @@ complex_mul_pattern::matches (complex_operation_t op, ops->quick_push (add0); complex_perm_kinds_t kind = linear_loads_p (perm_cache, left_op[0]); - if (kind == PERM_EVENODD) + if (kind == PERM_EVENODD || kind == PERM_TOP) { ops->quick_push (left_op[1]); ops->quick_push (right_op[1]); ops->quick_push (left_op[0]); } - else if (kind == PERM_TOP) - { - ops->quick_push (left_op[1]); - ops->quick_push (right_op[1]); - ops->quick_push (left_op[0]); - } - else if (kind == PERM_EVENEVEN && !conj_first_operand) + else if (kind == PERM_EVENEVEN && status != CONJ_SND) { ops->quick_push (left_op[0]); ops->quick_push (right_op[0]); @@ -1061,13 +1119,14 @@ complex_mul_pattern::matches (complex_operation_t op, vect_pattern* complex_mul_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache, + 
slp_compat_nodes_map_t *compat_cache, slp_tree *node) { auto_vec<slp_tree> ops; complex_operation_t op = vect_detect_pair_op (*node, true, &ops); internal_fn ifn - = complex_mul_pattern::matches (op, perm_cache, node, &ops); + = complex_mul_pattern::matches (op, perm_cache, compat_cache, node, &ops); if (ifn == IFN_LAST) return NULL; @@ -1115,9 +1174,9 @@ complex_mul_pattern::build (vec_info *vinfo) /* First re-arrange the children. */ SLP_TREE_CHILDREN (*this->m_node).safe_grow (3); - SLP_TREE_CHILDREN (*this->m_node)[0] = this->m_ops[0]; - SLP_TREE_CHILDREN (*this->m_node)[1] = this->m_ops[3]; - SLP_TREE_CHILDREN (*this->m_node)[2] = newnode; + SLP_TREE_CHILDREN (*this->m_node)[0] = this->m_ops[3]; + SLP_TREE_CHILDREN (*this->m_node)[1] = newnode; + SLP_TREE_CHILDREN (*this->m_node)[2] = this->m_ops[0]; /* Tell the builder to expect an extra argument. */ this->m_num_args++; @@ -1147,11 +1206,12 @@ class complex_fms_pattern : public complex_pattern public: void build (vec_info *); static internal_fn - matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *, - vec<slp_tree> *); + matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, + slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *); static vect_pattern* - recognize (slp_tree_to_load_perm_map_t *, slp_tree *); + recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *, + slp_tree *); static vect_pattern* mkInstance (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn) @@ -1182,6 +1242,7 @@ class complex_fms_pattern : public complex_pattern internal_fn complex_fms_pattern::matches (complex_operation_t op, slp_tree_to_load_perm_map_t *perm_cache, + slp_compat_nodes_map_t *compat_cache, slp_tree * ref_node, vec<slp_tree> *ops) { internal_fn ifn = IFN_LAST; @@ -1197,6 +1258,8 @@ complex_fms_pattern::matches (complex_operation_t op, if (!vect_match_expression_p (root, MINUS_EXPR)) return IFN_LAST; + /* TODO: Support invariants here, with the new layout CADD now + can match 
before we get a chance to try CFMS. */ auto nodes = SLP_TREE_CHILDREN (root); if (!vect_match_expression_p (nodes[1], MULT_EXPR) || vect_detect_pair_op (nodes[0]) != PLUS_MINUS) @@ -1217,16 +1280,14 @@ complex_fms_pattern::matches (complex_operation_t op, || !vect_match_expression_p (l0node[1], MULT_EXPR)) return IFN_LAST; - bool is_neg = vect_normalize_conj_loc (left_op); - - bool conj_first_operand = false; - if (!vect_validate_multiplication (perm_cache, right_op, left_op, false, - &conj_first_operand, true)) + enum _conj_status status; + if (!vect_validate_multiplication (perm_cache, compat_cache, right_op, + left_op, true, &status)) return IFN_LAST; - if (!is_neg) + if (status == CONJ_NONE) ifn = IFN_COMPLEX_FMS; - else if (is_neg) + else ifn = IFN_COMPLEX_FMS_CONJ; if (!vect_pattern_validate_optab (ifn, *ref_node)) @@ -1243,26 +1304,12 @@ complex_fms_pattern::matches (complex_operation_t op, ops->quick_push (right_op[1]); ops->quick_push (left_op[1]); } - else if (kind == PERM_TOP) - { - ops->quick_push (l0node[0]); - ops->quick_push (right_op[1]); - ops->quick_push (right_op[0]); - ops->quick_push (left_op[0]); - } - else if (kind == PERM_EVENEVEN && !is_neg) - { - ops->quick_push (l0node[0]); - ops->quick_push (right_op[1]); - ops->quick_push (right_op[0]); - ops->quick_push (left_op[0]); - } else { ops->quick_push (l0node[0]); ops->quick_push (right_op[1]); ops->quick_push (right_op[0]); - ops->quick_push (left_op[1]); + ops->quick_push (left_op[0]); } return ifn; @@ -1272,13 +1319,14 @@ complex_fms_pattern::matches (complex_operation_t op, vect_pattern* complex_fms_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache, + slp_compat_nodes_map_t *compat_cache, slp_tree *node) { auto_vec<slp_tree> ops; complex_operation_t op = vect_detect_pair_op (*node, true, &ops); internal_fn ifn - = complex_fms_pattern::matches (op, perm_cache, node, &ops); + = complex_fms_pattern::matches (op, perm_cache, compat_cache, node, &ops); if (ifn == IFN_LAST) return 
NULL; @@ -1305,9 +1353,9 @@ complex_fms_pattern::build (vec_info *vinfo) SLP_TREE_CHILDREN (*this->m_node).create (3); /* First re-arrange the children. */ - SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[0]); SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[1]); SLP_TREE_CHILDREN (*this->m_node).quick_push (newnode); + SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[0]); /* And then rewrite the node itself. */ complex_pattern::build (vinfo); @@ -1334,11 +1382,12 @@ class complex_operations_pattern : public complex_pattern public: void build (vec_info *); static internal_fn - matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *, - vec<slp_tree> *); + matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, + slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *); static vect_pattern* - recognize (slp_tree_to_load_perm_map_t *, slp_tree *); + recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *, + slp_tree *); }; /* Dummy matches implementation for proxy object. 
*/ @@ -1347,6 +1396,7 @@ internal_fn complex_operations_pattern:: matches (complex_operation_t /* op */, slp_tree_to_load_perm_map_t * /* perm_cache */, + slp_compat_nodes_map_t * /* compat_cache */, slp_tree * /* ref_node */, vec<slp_tree> * /* ops */) { return IFN_LAST; @@ -1356,6 +1406,7 @@ matches (complex_operation_t /* op */, vect_pattern* complex_operations_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache, + slp_compat_nodes_map_t *ccache, slp_tree *node) { auto_vec<slp_tree> ops; @@ -1363,15 +1414,15 @@ complex_operations_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache, = vect_detect_pair_op (*node, true, &ops); internal_fn ifn = IFN_LAST; - ifn = complex_fms_pattern::matches (op, perm_cache, node, &ops); + ifn = complex_fms_pattern::matches (op, perm_cache, ccache, node, &ops); if (ifn != IFN_LAST) return complex_fms_pattern::mkInstance (node, &ops, ifn); - ifn = complex_mul_pattern::matches (op, perm_cache, node, &ops); + ifn = complex_mul_pattern::matches (op, perm_cache, ccache, node, &ops); if (ifn != IFN_LAST) return complex_mul_pattern::mkInstance (node, &ops, ifn); - ifn = complex_add_pattern::matches (op, perm_cache, node, &ops); + ifn = complex_add_pattern::matches (op, perm_cache, ccache, node, &ops); if (ifn != IFN_LAST) return complex_add_pattern::mkInstance (node, &ops, ifn); @@ -1398,11 +1449,13 @@ class addsub_pattern : public vect_pattern void build (vec_info *); static vect_pattern* - recognize (slp_tree_to_load_perm_map_t *, slp_tree *); + recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *, + slp_tree *); }; vect_pattern * -addsub_pattern::recognize (slp_tree_to_load_perm_map_t *, slp_tree *node_) +addsub_pattern::recognize (slp_tree_to_load_perm_map_t *, + slp_compat_nodes_map_t *, slp_tree *node_) { slp_tree node = *node_; if (SLP_TREE_CODE (node) != VEC_PERM_EXPR diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c index 
b912c3577df61a694d5bb9e22c5303fe6a48ab6e..cb577f8a612d583254e42bb06a6d7a0875de5e75 100644 --- a/gcc/tree-vect-slp.c +++ b/gcc/tree-vect-slp.c @@ -804,7 +804,7 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned char swap, /* Return true if call statements CALL1 and CALL2 are similar enough to be combined into the same SLP group. */ -static bool +bool compatible_calls_p (gcall *call1, gcall *call2) { unsigned int nargs = gimple_call_num_args (call1); @@ -2907,6 +2907,7 @@ optimize_load_redistribution (scalar_stmts_to_slp_tree_map_t *bst_map, static bool vect_match_slp_patterns_2 (slp_tree *ref_node, vec_info *vinfo, slp_tree_to_load_perm_map_t *perm_cache, + slp_compat_nodes_map_t *compat_cache, hash_set<slp_tree> *visited) { unsigned i; @@ -2918,11 +2919,13 @@ vect_match_slp_patterns_2 (slp_tree *ref_node, vec_info *vinfo, slp_tree child; FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child) found_p |= vect_match_slp_patterns_2 (&SLP_TREE_CHILDREN (node)[i], - vinfo, perm_cache, visited); + vinfo, perm_cache, compat_cache, + visited); for (unsigned x = 0; x < num__slp_patterns; x++) { - vect_pattern *pattern = slp_patterns[x] (perm_cache, ref_node); + vect_pattern *pattern + = slp_patterns[x] (perm_cache, compat_cache, ref_node); if (pattern) { pattern->build (vinfo); @@ -2943,7 +2946,8 @@ vect_match_slp_patterns_2 (slp_tree *ref_node, vec_info *vinfo, static bool vect_match_slp_patterns (slp_instance instance, vec_info *vinfo, hash_set<slp_tree> *visited, - slp_tree_to_load_perm_map_t *perm_cache) + slp_tree_to_load_perm_map_t *perm_cache, + slp_compat_nodes_map_t *compat_cache) { DUMP_VECT_SCOPE ("vect_match_slp_patterns"); slp_tree *ref_node = &SLP_INSTANCE_TREE (instance); @@ -2953,7 +2957,8 @@ vect_match_slp_patterns (slp_instance instance, vec_info *vinfo, "Analyzing SLP tree %p for patterns\n", SLP_INSTANCE_TREE (instance)); - return vect_match_slp_patterns_2 (ref_node, vinfo, perm_cache, visited); + return vect_match_slp_patterns_2 (ref_node, vinfo, 
perm_cache, compat_cache, + visited); } /* STMT_INFO is a store group of size GROUP_SIZE that we are considering @@ -3437,12 +3442,14 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size) hash_set<slp_tree> visited_patterns; slp_tree_to_load_perm_map_t perm_cache; + slp_compat_nodes_map_t compat_cache; /* See if any patterns can be found in the SLP tree. */ bool pattern_found = false; FOR_EACH_VEC_ELT (LOOP_VINFO_SLP_INSTANCES (vinfo), i, instance) pattern_found |= vect_match_slp_patterns (instance, vinfo, - &visited_patterns, &perm_cache); + &visited_patterns, &perm_cache, + &compat_cache); /* If any were found optimize permutations of loads. */ if (pattern_found) diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index 2f6e1e268fb07e9de065ff9c45af87546e565d66..83cd0919c7838c65576e1debd881e0ec636a605a 100644 --- a/gcc/tree-vectorizer.h +++ b/gcc/tree-vectorizer.h @@ -2268,6 +2268,7 @@ extern void duplicate_and_interleave (vec_info *, gimple_seq *, tree, extern int vect_get_place_in_interleaving_chain (stmt_vec_info, stmt_vec_info); extern slp_tree vect_create_new_slp_node (unsigned, tree_code); extern void vect_free_slp_tree (slp_tree); +extern bool compatible_calls_p (gcall *, gcall *); /* In tree-vect-patterns.c. */ extern void @@ -2306,6 +2307,12 @@ typedef enum _complex_perm_kinds { typedef hash_map <slp_tree, complex_perm_kinds_t> slp_tree_to_load_perm_map_t; +/* Cache from nodes pair to being compatible or not. */ +typedef pair_hash <nofree_ptr_hash <_slp_tree>, + nofree_ptr_hash <_slp_tree>> slp_node_hash; +typedef hash_map <slp_node_hash, bool> slp_compat_nodes_map_t; + + /* Vector pattern matcher base class. All SLP pattern matchers must inherit from this type. */ @@ -2338,7 +2345,8 @@ class vect_pattern public: /* Create a new instance of the pattern matcher class of the given type. 
*/ - static vect_pattern* recognize (slp_tree_to_load_perm_map_t *, slp_tree *); + static vect_pattern* recognize (slp_tree_to_load_perm_map_t *, + slp_compat_nodes_map_t *, slp_tree *); /* Build the pattern from the data collected so far. */ virtual void build (vec_info *) = 0; @@ -2352,6 +2360,7 @@ class vect_pattern /* Function pointer to create a new pattern matcher from a generic type. */ typedef vect_pattern* (*vect_pattern_decl_t) (slp_tree_to_load_perm_map_t *, + slp_compat_nodes_map_t *, slp_tree *); /* List of supported pattern matchers. */
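For readers following the thread: the new slp_compat_nodes_map_t threaded through the matchers above is a cache keyed on a pair of nodes. Below is a minimal, self-contained sketch of that memoization idea, not the patch's implementation. Node, CompatCache and compatible_p are hypothetical names, std::map stands in for GCC's hash_map, and the recursion over children is an assumption made for illustration. The one detail taken from the patch is that a pessimistic "false" is recorded before any checking and only upgraded to "true" on success.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for an SLP node; this is not GCC's _slp_tree.
struct Node {
  int kind;                      // stands in for the operation code
  std::vector<Node *> children;  // stands in for SLP_TREE_CHILDREN
};

// Pair-keyed cache, similar in spirit to slp_compat_nodes_map_t.
using CompatCache = std::map<std::pair<const Node *, const Node *>, bool>;

// Returns true if A and B have the same shape and kinds, memoizing per pair.
bool compatible_p(CompatCache &cache, const Node *a, const Node *b) {
  auto key = std::make_pair(a, b);
  auto it = cache.find(key);
  if (it != cache.end())
    return it->second;

  // Record the pessimistic answer first, so every early exit below
  // (and any re-entry on the same pair) is already cached as false.
  cache[key] = false;

  if (a->kind != b->kind || a->children.size() != b->children.size())
    return false;

  // Assumed for illustration: compare children pairwise.
  for (std::size_t i = 0; i < a->children.size(); ++i)
    if (!compatible_p(cache, a->children[i], b->children[i]))
      return false;

  cache[key] = true;
  return true;
}
```

Writing the negative entry up front means each early return needs no cache bookkeeping of its own; only the success path has to touch the cache again.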
ping

> -----Original Message-----
> From: Tamar Christina
> Sent: Monday, December 20, 2021 4:19 PM
> To: Richard Sandiford <richard.sandiford@arm.com>; Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>
> Cc: nd <nd@arm.com>; rguenther@suse.de
> Subject: RE: [1/3 PATCH]middle-end vect: Simplify and extend the complex numbers validation routines.
>
>
> > -----Original Message-----
> > From: Richard Sandiford <richard.sandiford@arm.com>
> > Sent: Friday, December 17, 2021 4:19 PM
> > To: Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>
> > Cc: Tamar Christina <Tamar.Christina@arm.com>; nd <nd@arm.com>; rguenther@suse.de
> > Subject: Re: [1/3 PATCH]middle-end vect: Simplify and extend the complex numbers validation routines.
> >
> > Just a comment on the documentation:
> >
> > Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > > index 9ec051e94e10cca9eec2773e1b8c01b74b6ea4db..60dc5b3ea6087c2824ad1467bc66e9cfebe9dcfc 100644
> > > --- a/gcc/doc/md.texi
> > > +++ b/gcc/doc/md.texi
> > > @@ -6325,12 +6325,12 @@ Perform a vector multiply and accumulate that is semantically the same as
> > > a multiply and accumulate of complex numbers.
> > >
> > > @smallexample
> > > - complex TYPE c[N];
> > > - complex TYPE a[N];
> > > - complex TYPE b[N];
> > > + complex TYPE op0[N];
> > > + complex TYPE op1[N];
> > > + complex TYPE op2[N];
> > > for (int i = 0; i < N; i += 1)
> > > @{
> > > - c[i] += a[i] * b[i];
> > > + op2[i] += op1[i] * op2[i];
> > > @}
> >
> > I think this should be:
> >
> > op0[i] = op1[i] * op2[i] + op3[i];
> >
> > since operand 0 is the output and operand 3 is the accumulator input.
> >
> > Same idea for the others. For:
> >
> > > @@ -6415,12 +6415,12 @@ Perform a vector multiply that is semantically the same as multiply of
> > > complex numbers.
> > >
> > > @smallexample
> > > - complex TYPE c[N];
> > > - complex TYPE a[N];
> > > - complex TYPE b[N];
> > > + complex TYPE op0[N];
> > > + complex TYPE op1[N];
> > > + complex TYPE op2[N];
> > > for (int i = 0; i < N; i += 1)
> > > @{
> > > - c[i] = a[i] * b[i];
> > > + op2[i] = op0[i] * op1[i];
> >
> > …this I think it should be:
> >
> > op0[i] = op1[i] * op2[i];
>
> Updated patch attached.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and no regressions.
>
> Ok for master? and backport to GCC 11 after some stew?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> PR tree-optimization/102819
> PR tree-optimization/103169
> * doc/md.texi: Update docs for cfms, cfma.
> * tree-data-ref.h (same_data_refs): Accept optional offset.
> * tree-vect-slp-patterns.c (is_linear_load_p): Fix issue with repeating
> patterns.
> (vect_normalize_conj_loc): Remove.
> (is_eq_or_top): Change to take two nodes.
> (enum _conj_status, compatible_complex_nodes_p,
> vect_validate_multiplication): New.
> (class complex_add_pattern, complex_add_pattern::matches,
> complex_add_pattern::recognize, class complex_mul_pattern,
> complex_mul_pattern::recognize, class complex_fms_pattern,
> complex_fms_pattern::recognize, class complex_operations_pattern,
> complex_operations_pattern::recognize, addsub_pattern::recognize): Pass
> new cache.
> (complex_fms_pattern::matches, complex_mul_pattern::matches): Pass new
> cache and use new validation code.
> * tree-vect-slp.c (vect_match_slp_patterns_2, vect_match_slp_patterns,
> vect_analyze_slp): Pass along cache.
> (compatible_calls_p): Expose.
> * tree-vectorizer.h (compatible_calls_p, slp_node_hash,
> slp_compat_nodes_map_t): New.
> (class vect_pattern): Update signatures include new cache.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/102819
> PR tree-optimization/103169
> * g++.dg/vect/pr99149.cc: xfail for now.
> * gcc.dg/vect/complex/pr102819-1.c: New test.
> * gcc.dg/vect/complex/pr102819-2.c: New test.
> * gcc.dg/vect/complex/pr102819-3.c: New test. > * gcc.dg/vect/complex/pr102819-4.c: New test. > * gcc.dg/vect/complex/pr102819-5.c: New test. > * gcc.dg/vect/complex/pr102819-6.c: New test. > * gcc.dg/vect/complex/pr102819-7.c: New test. > * gcc.dg/vect/complex/pr102819-8.c: New test. > * gcc.dg/vect/complex/pr102819-9.c: New test. > * gcc.dg/vect/complex/pr103169.c: New test. > > --- inline copy of patch --- > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index > 9ec051e94e10cca9eec2773e1b8c01b74b6ea4db..ad06b02d36876082afe4c3f3f > b51887f7a522b23 100644 > --- a/gcc/doc/md.texi > +++ b/gcc/doc/md.texi > @@ -6325,12 +6325,13 @@ Perform a vector multiply and accumulate that > is semantically the same as a multiply and accumulate of complex numbers. > > @smallexample > - complex TYPE c[N]; > - complex TYPE a[N]; > - complex TYPE b[N]; > + complex TYPE op0[N]; > + complex TYPE op1[N]; > + complex TYPE op2[N]; > + complex TYPE op3[N]; > for (int i = 0; i < N; i += 1) > @{ > - c[i] += a[i] * b[i]; > + op0[i] = op1[i] * op2[i] + op3[i]; > @} > @end smallexample > > @@ -6348,12 +6349,13 @@ the same as a multiply and accumulate of > complex numbers where the second multiply arguments is conjugated. > > @smallexample > - complex TYPE c[N]; > - complex TYPE a[N]; > - complex TYPE b[N]; > + complex TYPE op0[N]; > + complex TYPE op1[N]; > + complex TYPE op2[N]; > + complex TYPE op3[N]; > for (int i = 0; i < N; i += 1) > @{ > - c[i] += a[i] * conj (b[i]); > + op0[i] = op1[i] * conj (op2[i]) + op3[i]; > @} > @end smallexample > > @@ -6370,12 +6372,13 @@ Perform a vector multiply and subtract that is > semantically the same as a multiply and subtract of complex numbers. 
> > @smallexample > - complex TYPE c[N]; > - complex TYPE a[N]; > - complex TYPE b[N]; > + complex TYPE op0[N]; > + complex TYPE op1[N]; > + complex TYPE op2[N]; > + complex TYPE op3[N]; > for (int i = 0; i < N; i += 1) > @{ > - c[i] -= a[i] * b[i]; > + op0[i] = op1[i] * op2[i] - op3[i]; > @} > @end smallexample > > @@ -6393,12 +6396,13 @@ the same as a multiply and subtract of complex > numbers where the second multiply arguments is conjugated. > > @smallexample > - complex TYPE c[N]; > - complex TYPE a[N]; > - complex TYPE b[N]; > + complex TYPE op0[N]; > + complex TYPE op1[N]; > + complex TYPE op2[N]; > + complex TYPE op3[N]; > for (int i = 0; i < N; i += 1) > @{ > - c[i] -= a[i] * conj (b[i]); > + op0[i] = op1[i] * conj (op2[i]) - op3[i]; > @} > @end smallexample > > @@ -6415,12 +6419,12 @@ Perform a vector multiply that is semantically > the same as multiply of complex numbers. > > @smallexample > - complex TYPE c[N]; > - complex TYPE a[N]; > - complex TYPE b[N]; > + complex TYPE op0[N]; > + complex TYPE op1[N]; > + complex TYPE op2[N]; > for (int i = 0; i < N; i += 1) > @{ > - c[i] = a[i] * b[i]; > + op0[i] = op1[i] * op2[i]; > @} > @end smallexample > > @@ -6437,12 +6441,12 @@ Perform a vector multiply by conjugate that is > semantically the same as a multiply of complex numbers where the > second multiply arguments is conjugated. 
> > @smallexample > - complex TYPE c[N]; > - complex TYPE a[N]; > - complex TYPE b[N]; > + complex TYPE op0[N]; > + complex TYPE op1[N]; > + complex TYPE op2[N]; > for (int i = 0; i < N; i += 1) > @{ > - c[i] = a[i] * conj (b[i]); > + op0[i] = op1[i] * conj (op2[i]); > @} > @end smallexample > > diff --git a/gcc/testsuite/g++.dg/vect/pr99149.cc > b/gcc/testsuite/g++.dg/vect/pr99149.cc > index > e6e0594a336fa053ffba64a12e2de43a4e373f49..bb9f5fa89f12b184368bf5488d > 6e9432c2166463 100755 > --- a/gcc/testsuite/g++.dg/vect/pr99149.cc > +++ b/gcc/testsuite/g++.dg/vect/pr99149.cc > @@ -24,4 +24,4 @@ public: > } n; > main() { n.j(); } > > -/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_MUL" 1 "slp2" } } > */ > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_MUL" 1 "slp2" { > +xfail { vect_float } } } } */ > diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-1.c > b/gcc/testsuite/gcc.dg/vect/complex/pr102819-1.c > new file mode 100644 > index > 0000000000000000000000000000000000000000..46b9a55f05279d732fa1418e02 > f779cf693ede07 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-1.c > @@ -0,0 +1,20 @@ > +/* { dg-do compile } */ > +/* { dg-add-options arm_v8_3a_complex_neon } */ > + > +float f[12][100]; > + > +void bad1(float v1, float v2) > +{ > + for (int r = 0; r < 100; r += 4) > + { > + int i = r + 1; > + f[0][r] = f[1][r] * (f[2][r] + v2) - f[1][i] * (f[2][i] + v1); > + f[0][i] = f[1][r] * (f[2][i] + v1) + f[1][i] * (f[2][r] + v2); > + f[0][r+2] = f[1][r+2] * (f[2][r+2] + v2) - f[1][i+2] * (f[2][i+2] + v1); > + f[0][i+2] = f[1][r+2] * (f[2][i+2] + v1) + f[1][i+2] * (f[2][r+2] + v2); > + // ^^^^^^^ ^^^^^^^ > + } > +} > + > +/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { > +vect_float } } } } */ > + > diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-2.c > b/gcc/testsuite/gcc.dg/vect/complex/pr102819-2.c > new file mode 100644 > index > 0000000000000000000000000000000000000000..ffe646efe57f7ad07541b0fb96 > 
601596f46dc5f8 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-2.c > @@ -0,0 +1,17 @@ > +/* { dg-do compile } */ > +/* { dg-add-options arm_v8_3a_complex_neon } */ > + > +float f[12][100]; > + > +void bad1(float v1, float v2) > +{ > + for (int r = 0; r < 100; r += 2) > + { > + int i = r + 1; > + f[0][r] = f[1][r] * (f[2][r] + v1) - f[1][i] * (f[2][i] + v2); > + f[0][i] = f[1][r] * (f[2][i] + v1) + f[1][i] * (f[2][r] + v2); > + } > +} > + > +/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { > +target { vect_float } } } } */ > + > diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-3.c > b/gcc/testsuite/gcc.dg/vect/complex/pr102819-3.c > new file mode 100644 > index > 0000000000000000000000000000000000000000..5f98aa204d8b11b0cb433f8965 > dbb72cf8940de1 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-3.c > @@ -0,0 +1,17 @@ > +/* { dg-do compile } */ > +/* { dg-add-options arm_v8_3a_complex_neon } */ > + > +float f[12][100]; > + > +void good1(float v1, float v2) > +{ > + for (int r = 0; r < 100; r += 2) > + { > + int i = r + 1; > + f[0][r] = f[1][r] * (f[2][r] + v2) - f[1][i] * (f[2][i] + v1); > + f[0][i] = f[1][r] * (f[2][i] + v1) + f[1][i] * (f[2][r] + v2); > + } > +} > + > +/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { > +vect_float } } } } */ > + > diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-4.c > b/gcc/testsuite/gcc.dg/vect/complex/pr102819-4.c > new file mode 100644 > index > 0000000000000000000000000000000000000000..882851789c5085e73400060911 > 4be480d3b08bd0 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-4.c > @@ -0,0 +1,17 @@ > +/* { dg-do compile } */ > +/* { dg-add-options arm_v8_3a_complex_neon } */ > + > +float f[12][100]; > + > +void good1() > +{ > + for (int r = 0; r < 100; r += 2) > + { > + int i = r + 1; > + f[0][r] = f[1][r] * f[2][r] - f[1][i] * f[2][i]; > + f[0][i] = f[1][r] * f[2][i] + f[1][i] * f[2][r]; > + } > +} > + > +/* { dg-final { 
scan-tree-dump "Found COMPLEX_MUL" "vect" { target { > +vect_float } } } } */ > + > diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-5.c > b/gcc/testsuite/gcc.dg/vect/complex/pr102819-5.c > new file mode 100644 > index > 0000000000000000000000000000000000000000..6a2d549d65f3f27d407fb0bd46 > 9473e6a5c333ae > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-5.c > @@ -0,0 +1,17 @@ > +/* { dg-do compile } */ > +/* { dg-add-options arm_v8_3a_complex_neon } */ > + > +float f[12][100]; > + > +void good2() > +{ > + for (int r = 0; r < 100; r += 2) > + { > + int i = r + 1; > + f[0][r] = f[1][r] * (f[2][r] + 1) - f[1][i] * (f[2][i] + 1); > + f[0][i] = f[1][r] * (f[2][i] + 1) + f[1][i] * (f[2][r] + 1); > + } > +} > + > +/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { > +vect_float } } } } */ > + > diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-6.c > b/gcc/testsuite/gcc.dg/vect/complex/pr102819-6.c > new file mode 100644 > index > 0000000000000000000000000000000000000000..71e66dbe3b29eec1fffb8df9b > 216022fdc0af54e > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-6.c > @@ -0,0 +1,18 @@ > +/* { dg-do compile } */ > +/* { dg-add-options arm_v8_3a_complex_neon } */ > + > +float f[12][100]; > + > +void bad1() > +{ > + for (int r = 0; r < 100; r += 2) > + { > + int i = r + 1; > + f[0][r] = f[1][r] * f[2][r] - f[1][i] * f[3][i]; > + f[0][i] = f[1][r] * f[2][i] + f[1][i] * f[3][r]; > + // ^^^^^^^ ^^^^^^^ > + } > +} > + > +/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { > +target { vect_float } } } } */ > + > diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-7.c > b/gcc/testsuite/gcc.dg/vect/complex/pr102819-7.c > new file mode 100644 > index > 0000000000000000000000000000000000000000..536672f3c8bb474ad5fa4bb61 > b3a36b555acf3cf > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-7.c > @@ -0,0 +1,18 @@ > +/* { dg-do compile } */ > +/* { dg-add-options arm_v8_3a_complex_neon } */ 
> + > +float f[12][100]; > + > +void bad2() > +{ > + for (int r = 0; r < 100; r += 2) > + { > + int i = r + 1; > + f[0][r] = f[1][r] * (f[2][r] + 1) - f[1][i] * f[2][i]; > + f[0][i] = f[1][r] * (f[2][i] + 1) + f[1][i] * f[2][r]; > + // ^^^^ > + } > +} > + > +/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { > +target { vect_float } } } } */ > + > diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-8.c > b/gcc/testsuite/gcc.dg/vect/complex/pr102819-8.c > new file mode 100644 > index > 0000000000000000000000000000000000000000..07b48148688b7d530e5891d02 > 3d558b58a485c23 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-8.c > @@ -0,0 +1,18 @@ > +/* { dg-do compile } */ > +/* { dg-add-options arm_v8_3a_complex_neon } */ > + > +float f[12][100]; > + > +void bad3() > +{ > + for (int r = 0; r < 100; r += 2) > + { > + int i = r + 1; > + f[0][r] = f[1][r] * f[2][r] - f[1][r] * f[2][i]; > + f[0][i] = f[1][r] * f[2][i] + f[1][i] * f[2][r]; > + // ^^^^^^^ > + } > +} > + > +/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { > +target { vect_float } } } } */ > + > diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-9.c > b/gcc/testsuite/gcc.dg/vect/complex/pr102819-9.c > new file mode 100644 > index > 0000000000000000000000000000000000000000..7655852434b21b381fe7ee316 > e8caf3d485b8ee1 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-9.c > @@ -0,0 +1,21 @@ > +/* { dg-do compile } */ > +/* { dg-add-options arm_v8_3a_complex_neon } */ > + > +#include <stdio.h> > +#include <complex.h> > + > +#define N 200 > +#define TYPE float > +#define TYPE2 float > + > +void g (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE > +complex c[restrict N]) { > + for (int i=0; i < N; i++) > + { > + c[i] -= a[i] * b[0]; > + } > +} > + > +/* The pattern overlaps with COMPLEX_ADD so we need to support > +consuming ADDs in COMPLEX_FMS. 
*/ > + > +/* { dg-final { scan-tree-dump "Found COMPLEX_FMS" "vect" { xfail { > +vect_float } } } } */ > diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr103169.c > b/gcc/testsuite/gcc.dg/vect/complex/pr103169.c > new file mode 100644 > index > 0000000000000000000000000000000000000000..1bfabbd85a0eedfb4156a8257 > 4324126e9083fc5 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/vect/complex/pr103169.c > @@ -0,0 +1,12 @@ > +/* { dg-do compile { target { vect_double } } } */ > +/* { dg-add-options arm_v8_3a_complex_neon } */ > +/* { dg-additional-options "-O2 -fvect-cost-model=unlimited" } */ > + > +_Complex double b_0, c_0; > + > +void > +mul270snd (void) > +{ > + c_0 = b_0 * 1.0iF * 1.0iF; > +} > + > diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h index > 74f579c9f3f23bac25d21546068c2ab43209aa2b..8ad5fa521279b20fa5e63eecf44 > 2d5dc5c16e7ee 100644 > --- a/gcc/tree-data-ref.h > +++ b/gcc/tree-data-ref.h > @@ -600,10 +600,11 @@ same_data_refs_base_objects (data_reference_p a, > data_reference_p b) } > > /* Return true when the data references A and B are accessing the same > - memory object with the same access functions. */ > + memory object with the same access functions. Optionally skip the > + last OFFSET dimensions in the data reference. 
*/ > > static inline bool > -same_data_refs (data_reference_p a, data_reference_p b) > +same_data_refs (data_reference_p a, data_reference_p b, int offset = > +0) > { > unsigned int i; > > @@ -614,7 +615,7 @@ same_data_refs (data_reference_p a, > data_reference_p b) > if (!same_data_refs_base_objects (a, b)) > return false; > > - for (i = 0; i < DR_NUM_DIMENSIONS (a); i++) > + for (i = offset; i < DR_NUM_DIMENSIONS (a); i++) > if (!eq_evolutions_p (DR_ACCESS_FN (a, i), DR_ACCESS_FN (b, i))) > return false; > > diff --git a/gcc/tree-vect-slp-patterns.c > b/gcc/tree-vect-slp-patterns.c index > 0350441fad9690cd5d04337171ca3470a064a571..020c29bba08c5bd80503a2dbc > 04292f8fd310b3c 100644 > --- a/gcc/tree-vect-slp-patterns.c > +++ b/gcc/tree-vect-slp-patterns.c > @@ -149,12 +149,13 @@ is_linear_load_p (load_permutation_t loads) > int valid_patterns = 4; > FOR_EACH_VEC_ELT (loads, i, load) > { > - if (candidates[0] != PERM_UNKNOWN && load != 1) > + unsigned adj_load = load % 2; > + if (candidates[0] != PERM_UNKNOWN && adj_load != 1) > { > candidates[0] = PERM_UNKNOWN; > valid_patterns--; > } > - if (candidates[1] != PERM_UNKNOWN && load != 0) > + if (candidates[1] != PERM_UNKNOWN && adj_load != 0) > { > candidates[1] = PERM_UNKNOWN; > valid_patterns--; > @@ -596,11 +597,12 @@ class complex_add_pattern : public > complex_pattern > public: > void build (vec_info *); > static internal_fn > - matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, > slp_tree *, > - vec<slp_tree> *); > + matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, > + slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *); > > static vect_pattern* > - recognize (slp_tree_to_load_perm_map_t *, slp_tree *); > + recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t > *, > + slp_tree *); > > static vect_pattern* > mkInstance (slp_tree *node, vec<slp_tree> *m_ops, internal_fn > ifn) @@ > -647,6 +649,7 @@ complex_add_pattern::build (vec_info *vinfo) > internal_fn 
complex_add_pattern::matches (complex_operation_t op, > slp_tree_to_load_perm_map_t *perm_cache, > + slp_compat_nodes_map_t * /* compat_cache */, > slp_tree *node, vec<slp_tree> *ops) { > internal_fn ifn = IFN_LAST; > @@ -692,13 +695,14 @@ complex_add_pattern::matches > (complex_operation_t op, > > vect_pattern* > complex_add_pattern::recognize (slp_tree_to_load_perm_map_t > *perm_cache, > + slp_compat_nodes_map_t *compat_cache, > slp_tree *node) > { > auto_vec<slp_tree> ops; > complex_operation_t op > = vect_detect_pair_op (*node, true, &ops); > internal_fn ifn > - = complex_add_pattern::matches (op, perm_cache, node, &ops); > + = complex_add_pattern::matches (op, perm_cache, compat_cache, > + node, &ops); > if (ifn == IFN_LAST) > return NULL; > > @@ -709,147 +713,214 @@ complex_add_pattern::recognize > (slp_tree_to_load_perm_map_t *perm_cache, > * complex_mul_pattern > > ********************************************************** > ********************/ > > -/* Check to see if either of the trees in ARGS are a NEGATE_EXPR. If > the first > - child (args[0]) is a NEGATE_EXPR then NEG_FIRST_P is set to TRUE. > - > - If a negate is found then the values in ARGS are reordered such that the > - negate node is always the second one and the entry is replaced by the > child > - of the negate node. */ > +/* Helper function to check if PERM is KIND or PERM_TOP. 
*/ > > static inline bool > -vect_normalize_conj_loc (vec<slp_tree> &args, bool *neg_first_p = > NULL) > +is_eq_or_top (slp_tree_to_load_perm_map_t *perm_cache, > + slp_tree op1, complex_perm_kinds_t kind1, > + slp_tree op2, complex_perm_kinds_t kind2) > { > - gcc_assert (args.length () == 2); > - bool neg_found = false; > - > - if (vect_match_expression_p (args[0], NEGATE_EXPR)) > - { > - std::swap (args[0], args[1]); > - neg_found = true; > - if (neg_first_p) > - *neg_first_p = true; > - } > - else if (vect_match_expression_p (args[1], NEGATE_EXPR)) > - { > - neg_found = true; > - if (neg_first_p) > - *neg_first_p = false; > - } > + complex_perm_kinds_t perm1 = linear_loads_p (perm_cache, op1); if > + (perm1 != kind1 && perm1 != PERM_TOP) > + return false; > > - if (neg_found) > - args[1] = SLP_TREE_CHILDREN (args[1])[0]; > + complex_perm_kinds_t perm2 = linear_loads_p (perm_cache, op2); if > + (perm2 != kind2 && perm2 != PERM_TOP) > + return false; > > - return neg_found; > + return true; > } > > -/* Helper function to check if PERM is KIND or PERM_TOP. */ > +enum _conj_status { CONJ_NONE, CONJ_FST, CONJ_SND }; > > static inline bool > -is_eq_or_top (complex_perm_kinds_t perm, complex_perm_kinds_t kind) > +compatible_complex_nodes_p (slp_compat_nodes_map_t > *compat_cache, > + slp_tree a, int *pa, slp_tree b, int *pb) > { > - return perm == kind || perm == PERM_TOP; -} > + bool *tmp; > + std::pair<slp_tree, slp_tree> key = std::make_pair(a, b); if ((tmp > + = compat_cache->get (key)) != NULL) > + return *tmp; > > -/* Helper function that checks to see if LEFT_OP and RIGHT_OP are > both MULT_EXPR > - nodes but also that they represent an operation that is either a complex > - multiplication or a complex multiplication by conjugated value. > + compat_cache->put (key, false); > > - Of the negation is expected to be in the first half of the tree (As required > - by an FMS pattern) then NEG_FIRST is true. 
If the operation is a conjugate > - operation then CONJ_FIRST_OPERAND is set to indicate whether the first > or > - second operand contains the conjugate operation. */ > + if (SLP_TREE_CHILDREN (a).length () != SLP_TREE_CHILDREN (b).length ()) > + return false; > > -static inline bool > -vect_validate_multiplication (slp_tree_to_load_perm_map_t *perm_cache, > - const vec<slp_tree> &left_op, > - const vec<slp_tree> &right_op, > - bool neg_first, bool *conj_first_operand, > - bool fms) > -{ > - /* The presence of a negation indicates that we have either a > conjugate or a > - rotation. We need to distinguish which one. */ > - *conj_first_operand = false; > - complex_perm_kinds_t kind; > - > - /* Complex conjugates have the negation on the imaginary part of the > - number where rotations affect the real component. So check if the > - negation is on a dup of lane 1. */ > - if (fms) > + if (SLP_TREE_DEF_TYPE (a) != SLP_TREE_DEF_TYPE (b)) > + return false; > + > + /* Only internal nodes can be loads, as such we can't check further if they > + are externals. */ > + if (SLP_TREE_DEF_TYPE (a) != vect_internal_def) > { > - /* Canonicalization for fms is not consistent. So have to test both > - variants to be sure. This needs to be fixed in the mid-end so > - this part can be simpler. 
*/ > - kind = linear_loads_p (perm_cache, right_op[0]); > - if (!((is_eq_or_top (linear_loads_p (perm_cache, right_op[0]), > PERM_ODDODD) > - && is_eq_or_top (linear_loads_p (perm_cache, right_op[1]), > - PERM_ODDEVEN)) > - || (kind == PERM_ODDEVEN > - && is_eq_or_top (linear_loads_p (perm_cache, right_op[1]), > - PERM_ODDODD)))) > - return false; > + for (unsigned i = 0; i < SLP_TREE_SCALAR_OPS (a).length (); i++) > + { > + tree op1 = SLP_TREE_SCALAR_OPS (a)[pa[i % 2]]; > + tree op2 = SLP_TREE_SCALAR_OPS (b)[pb[i % 2]]; > + if (!operand_equal_p (op1, op2, 0)) > + return false; > + } > + > + compat_cache->put (key, true); > + return true; > } > + > + auto a_stmt = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (a)); auto > + b_stmt = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (b)); > + > + if (gimple_code (a_stmt) != gimple_code (b_stmt)) > + return false; > + > + /* code, children, type, externals, loads, constants */ if > + (gimple_num_args (a_stmt) != gimple_num_args (b_stmt)) > + return false; > + > + /* At this point, a and b are known to be the same gimple operations. > +*/ > + if (is_gimple_call (a_stmt)) > + { > + if (!compatible_calls_p (dyn_cast <gcall *> (a_stmt), > + dyn_cast <gcall *> (b_stmt))) > + return false; > + } > + else if (!is_gimple_assign (a_stmt)) > + return false; > else > { > - if (linear_loads_p (perm_cache, right_op[1]) != PERM_ODDODD > - && !is_eq_or_top (linear_loads_p (perm_cache, right_op[0]), > - PERM_ODDEVEN)) > + tree_code acode = gimple_assign_rhs_code (a_stmt); > + tree_code bcode = gimple_assign_rhs_code (b_stmt); > + if ((acode == REALPART_EXPR || acode == IMAGPART_EXPR) > + && (bcode == REALPART_EXPR || bcode == IMAGPART_EXPR)) > + return true; > + > + if (acode != bcode) > return false; > } > > - /* Deal with differences in indexes. */ > - int index1 = fms ? 1 : 0; > - int index2 = fms ? 0 : 1; > - > - /* Check if the conjugate is on the second first or second operand. 
The > - order of the node with the conjugate value determines this, and the dup > - node must be one of lane 0 of the same DR as the neg node. */ > - kind = linear_loads_p (perm_cache, left_op[index1]); > - if (kind == PERM_TOP) > + if (!SLP_TREE_LOAD_PERMUTATION (a).exists () > + || !SLP_TREE_LOAD_PERMUTATION (b).exists ()) > { > - if (linear_loads_p (perm_cache, left_op[index2]) == PERM_EVENODD) > - return true; > + for (unsigned i = 0; i < gimple_num_args (a_stmt); i++) > + { > + tree t1 = gimple_arg (a_stmt, i); > + tree t2 = gimple_arg (b_stmt, i); > + if (TREE_CODE (t1) != TREE_CODE (t2)) > + return false; > + > + /* If SSA name then we will need to inspect the children > + so we can punt here. */ > + if (TREE_CODE (t1) == SSA_NAME) > + continue; > + > + if (!operand_equal_p (t1, t2, 0)) > + return false; > + } > } > - else if (kind == PERM_EVENODD && !neg_first) > + else > { > - if ((kind = linear_loads_p (perm_cache, left_op[index2])) != > PERM_EVENEVEN) > + auto dr1 = STMT_VINFO_DATA_REF (SLP_TREE_REPRESENTATIVE (a)); > + auto dr2 = STMT_VINFO_DATA_REF (SLP_TREE_REPRESENTATIVE (b)); > + /* Don't check the last dimension as that's checked by the lineary > + checks. This check is also much stricter than what we need > + because it doesn't consider loading from adjacent elements > + in the same struct as loading from the same base object. > + But for now, I'll play it safe. 
*/ > + if (!same_data_refs (dr1, dr2, 1)) > return false; > - return true; > } > - else if (kind == PERM_EVENEVEN && neg_first) > + > + for (unsigned i = 0; i < SLP_TREE_CHILDREN (a).length (); i++) > { > - if ((kind = linear_loads_p (perm_cache, left_op[index2])) != > PERM_EVENODD) > + if (!compatible_complex_nodes_p (compat_cache, > + SLP_TREE_CHILDREN (a)[i], pa, > + SLP_TREE_CHILDREN (b)[i], pb)) > return false; > - > - *conj_first_operand = true; > - return true; > } > - else > - return false; > - > - if (kind != PERM_EVENEVEN) > - return false; > > + compat_cache->put (key, true); > return true; > } > > -/* Helper function to help distinguish between a conjugate and a > rotation in a > - complex multiplication. The operations have similar shapes but the order > of > - the load permutes are different. This function returns TRUE when the > order > - is consistent with a multiplication or multiplication by conjugated > - operand but returns FALSE if it's a multiplication by rotated operand. */ > - > static inline bool > vect_validate_multiplication (slp_tree_to_load_perm_map_t *perm_cache, > - const vec<slp_tree> &op, > - complex_perm_kinds_t permKind) > + slp_compat_nodes_map_t *compat_cache, > + vec<slp_tree> &left_op, > + vec<slp_tree> &right_op, > + bool subtract, > + enum _conj_status *_status) > { > - /* The left node is the more common case, test it first. */ > - if (!is_eq_or_top (linear_loads_p (perm_cache, op[0]), permKind)) > + auto_vec<slp_tree> ops; > + enum _conj_status stats = CONJ_NONE; > + > + /* The complex operations can occur in two layouts and two permute > sequences > + so declare them and re-use them. */ > + int styles[][4] = { { 0, 2, 1, 3} /* {L1, R1} + {L2, R2}. */ > + , { 0, 3, 1, 2} /* {L1, R2} + {L2, R1}. */ > + }; > + > + /* Now for the corresponding permutes that go with these values. 
> + */ complex_perm_kinds_t perms[][4] > + = { { PERM_EVENEVEN, PERM_ODDODD, PERM_EVENODD, > PERM_ODDEVEN } > + , { PERM_EVENODD, PERM_ODDEVEN, PERM_EVENEVEN, > PERM_ODDODD } > + }; > + > + /* These permutes are used during comparisons of externals on which > + we require strict equality. */ > + int cq[][4][2] > + = { { { 0, 0 }, { 1, 1 }, { 0, 1 }, { 1, 0 } } > + , { { 0, 1 }, { 1, 0 }, { 0, 0 }, { 1, 1 } } > + }; > + > + /* Default to style and perm 0, most operations use this one. */ > + int style = 0; int perm = subtract ? 1 : 0; > + > + /* Check if we have a negate operation, if so absorb the node and > continue > + looking. */ > + bool neg0 = vect_match_expression_p (right_op[0], NEGATE_EXPR); > + bool > + neg1 = vect_match_expression_p (right_op[1], NEGATE_EXPR); > + > + /* Determine which style we're looking at. We only have different ones > + whenever a conjugate is involved. */ if (neg0 && neg1) > + ; > + else if (neg0) > { > - if (!is_eq_or_top (linear_loads_p (perm_cache, op[1]), permKind)) > - return false; > + right_op[0] = SLP_TREE_CHILDREN (right_op[0])[0]; > + stats = CONJ_FST; > + if (subtract) > + perm = 0; > } > - return true; > + else if (neg1) > + { > + right_op[1] = SLP_TREE_CHILDREN (right_op[1])[0]; > + stats = CONJ_SND; > + perm = 1; > + } > + > + *_status = stats; > + > + /* Flatten the inputs after we've remapped them. */ ops.create > + (4); ops.safe_splice (left_op); ops.safe_splice (right_op); > + > + /* Extract out the elements to check. */ slp_tree op0 = > + ops[styles[style][0]]; slp_tree op1 = ops[styles[style][1]]; > + slp_tree op2 = ops[styles[style][2]]; slp_tree op3 = > + ops[styles[style][3]]; > + > + /* Do cheapest test first. If failed no need to analyze further. 
> + */ if (linear_loads_p (perm_cache, op0) != perms[perm][0] > + || linear_loads_p (perm_cache, op1) != perms[perm][1] > + || !is_eq_or_top (perm_cache, op2, perms[perm][2], op3, > perms[perm][3])) > + return false; > + > + return compatible_complex_nodes_p (compat_cache, op0, cq[perm][0], > op1, > + cq[perm][1]) > + && compatible_complex_nodes_p (compat_cache, op2, > cq[perm][2], op3, > + cq[perm][3]); > } > > /* This function combines two nodes containing only even and only odd > lanes @@ -908,11 +979,12 @@ class complex_mul_pattern : public > complex_pattern > public: > void build (vec_info *); > static internal_fn > - matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, > slp_tree *, > - vec<slp_tree> *); > + matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, > + slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *); > > static vect_pattern* > - recognize (slp_tree_to_load_perm_map_t *, slp_tree *); > + recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t > *, > + slp_tree *); > > static vect_pattern* > mkInstance (slp_tree *node, vec<slp_tree> *m_ops, internal_fn > ifn) @@ > -943,6 +1015,7 @@ class complex_mul_pattern : public complex_pattern > internal_fn complex_mul_pattern::matches (complex_operation_t op, > slp_tree_to_load_perm_map_t *perm_cache, > + slp_compat_nodes_map_t *compat_cache, > slp_tree *node, vec<slp_tree> *ops) { > internal_fn ifn = IFN_LAST; > @@ -990,17 +1063,13 @@ complex_mul_pattern::matches > (complex_operation_t op, > || linear_loads_p (perm_cache, left_op[1]) == PERM_ODDEVEN) > return IFN_LAST; > > - bool neg_first = false; > - bool conj_first_operand = false; > - bool is_neg = vect_normalize_conj_loc (right_op, &neg_first); > + enum _conj_status status; > + if (!vect_validate_multiplication (perm_cache, compat_cache, left_op, > + right_op, false, &status)) > + return IFN_LAST; > > - if (!is_neg) > + if (status == CONJ_NONE) > { > - /* A multiplication needs to multiply agains the real pair, 
otherwise > - the pattern matches that of FMS. */ > - if (!vect_validate_multiplication (perm_cache, left_op, PERM_EVENEVEN) > - || vect_normalize_conj_loc (left_op)) > - return IFN_LAST; > if (add0) > ifn = IFN_COMPLEX_FMA; > else > @@ -1008,11 +1077,6 @@ complex_mul_pattern::matches > (complex_operation_t op, > } > else > { > - if (!vect_validate_multiplication (perm_cache, left_op, right_op, > - neg_first, &conj_first_operand, > - false)) > - return IFN_LAST; > - > if(add0) > ifn = IFN_COMPLEX_FMA_CONJ; > else > @@ -1029,19 +1093,13 @@ complex_mul_pattern::matches > (complex_operation_t op, > ops->quick_push (add0); > > complex_perm_kinds_t kind = linear_loads_p (perm_cache, > left_op[0]); > - if (kind == PERM_EVENODD) > + if (kind == PERM_EVENODD || kind == PERM_TOP) > { > ops->quick_push (left_op[1]); > ops->quick_push (right_op[1]); > ops->quick_push (left_op[0]); > } > - else if (kind == PERM_TOP) > - { > - ops->quick_push (left_op[1]); > - ops->quick_push (right_op[1]); > - ops->quick_push (left_op[0]); > - } > - else if (kind == PERM_EVENEVEN && !conj_first_operand) > + else if (kind == PERM_EVENEVEN && status != CONJ_SND) > { > ops->quick_push (left_op[0]); > ops->quick_push (right_op[0]); > @@ -1061,13 +1119,14 @@ complex_mul_pattern::matches > (complex_operation_t op, > > vect_pattern* > complex_mul_pattern::recognize (slp_tree_to_load_perm_map_t > *perm_cache, > + slp_compat_nodes_map_t *compat_cache, > slp_tree *node) > { > auto_vec<slp_tree> ops; > complex_operation_t op > = vect_detect_pair_op (*node, true, &ops); > internal_fn ifn > - = complex_mul_pattern::matches (op, perm_cache, node, &ops); > + = complex_mul_pattern::matches (op, perm_cache, compat_cache, > + node, &ops); > if (ifn == IFN_LAST) > return NULL; > > @@ -1115,9 +1174,9 @@ complex_mul_pattern::build (vec_info *vinfo) > > /* First re-arrange the children. 
*/ > SLP_TREE_CHILDREN (*this->m_node).safe_grow (3); > - SLP_TREE_CHILDREN (*this->m_node)[0] = this->m_ops[0]; > - SLP_TREE_CHILDREN (*this->m_node)[1] = this->m_ops[3]; > - SLP_TREE_CHILDREN (*this->m_node)[2] = newnode; > + SLP_TREE_CHILDREN (*this->m_node)[0] = this->m_ops[3]; > + SLP_TREE_CHILDREN (*this->m_node)[1] = newnode; > + SLP_TREE_CHILDREN (*this->m_node)[2] = this->m_ops[0]; > > /* Tell the builder to expect an extra argument. */ > this->m_num_args++; > @@ -1147,11 +1206,12 @@ class complex_fms_pattern : public > complex_pattern > public: > void build (vec_info *); > static internal_fn > - matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, > slp_tree *, > - vec<slp_tree> *); > + matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, > + slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *); > > static vect_pattern* > - recognize (slp_tree_to_load_perm_map_t *, slp_tree *); > + recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t > *, > + slp_tree *); > > static vect_pattern* > mkInstance (slp_tree *node, vec<slp_tree> *m_ops, internal_fn > ifn) @@ > -1182,6 +1242,7 @@ class complex_fms_pattern : public complex_pattern > internal_fn complex_fms_pattern::matches (complex_operation_t op, > slp_tree_to_load_perm_map_t *perm_cache, > + slp_compat_nodes_map_t *compat_cache, > slp_tree * ref_node, vec<slp_tree> *ops) { > internal_fn ifn = IFN_LAST; > @@ -1197,6 +1258,8 @@ complex_fms_pattern::matches > (complex_operation_t op, > if (!vect_match_expression_p (root, MINUS_EXPR)) > return IFN_LAST; > > + /* TODO: Support invariants here, with the new layout CADD now > + can match before we get a chance to try CFMS. 
*/ > auto nodes = SLP_TREE_CHILDREN (root); > if (!vect_match_expression_p (nodes[1], MULT_EXPR) > || vect_detect_pair_op (nodes[0]) != PLUS_MINUS) @@ -1217,16 > +1280,14 @@ complex_fms_pattern::matches (complex_operation_t op, > || !vect_match_expression_p (l0node[1], MULT_EXPR)) > return IFN_LAST; > > - bool is_neg = vect_normalize_conj_loc (left_op); > - > - bool conj_first_operand = false; > - if (!vect_validate_multiplication (perm_cache, right_op, left_op, false, > - &conj_first_operand, true)) > + enum _conj_status status; > + if (!vect_validate_multiplication (perm_cache, compat_cache, right_op, > + left_op, true, &status)) > return IFN_LAST; > > - if (!is_neg) > + if (status == CONJ_NONE) > ifn = IFN_COMPLEX_FMS; > - else if (is_neg) > + else > ifn = IFN_COMPLEX_FMS_CONJ; > > if (!vect_pattern_validate_optab (ifn, *ref_node)) @@ -1243,26 > +1304,12 @@ complex_fms_pattern::matches (complex_operation_t op, > ops->quick_push (right_op[1]); > ops->quick_push (left_op[1]); > } > - else if (kind == PERM_TOP) > - { > - ops->quick_push (l0node[0]); > - ops->quick_push (right_op[1]); > - ops->quick_push (right_op[0]); > - ops->quick_push (left_op[0]); > - } > - else if (kind == PERM_EVENEVEN && !is_neg) > - { > - ops->quick_push (l0node[0]); > - ops->quick_push (right_op[1]); > - ops->quick_push (right_op[0]); > - ops->quick_push (left_op[0]); > - } > else > { > ops->quick_push (l0node[0]); > ops->quick_push (right_op[1]); > ops->quick_push (right_op[0]); > - ops->quick_push (left_op[1]); > + ops->quick_push (left_op[0]); > } > > return ifn; > @@ -1272,13 +1319,14 @@ complex_fms_pattern::matches > (complex_operation_t op, > > vect_pattern* > complex_fms_pattern::recognize (slp_tree_to_load_perm_map_t > *perm_cache, > + slp_compat_nodes_map_t *compat_cache, > slp_tree *node) > { > auto_vec<slp_tree> ops; > complex_operation_t op > = vect_detect_pair_op (*node, true, &ops); > internal_fn ifn > - = complex_fms_pattern::matches (op, perm_cache, node, &ops); > + = 
complex_fms_pattern::matches (op, perm_cache, compat_cache, > + node, &ops); > if (ifn == IFN_LAST) > return NULL; > > @@ -1305,9 +1353,9 @@ complex_fms_pattern::build (vec_info *vinfo) > SLP_TREE_CHILDREN (*this->m_node).create (3); > > /* First re-arrange the children. */ > - SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[0]); > SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[1]); > SLP_TREE_CHILDREN (*this->m_node).quick_push (newnode); > + SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[0]); > > /* And then rewrite the node itself. */ > complex_pattern::build (vinfo); > @@ -1334,11 +1382,12 @@ class complex_operations_pattern : public > complex_pattern > public: > void build (vec_info *); > static internal_fn > - matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, > slp_tree *, > - vec<slp_tree> *); > + matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, > + slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *); > > static vect_pattern* > - recognize (slp_tree_to_load_perm_map_t *, slp_tree *); > + recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t > *, > + slp_tree *); > }; > > /* Dummy matches implementation for proxy object. 
*/ @@ -1347,6 > +1396,7 @@ internal_fn > complex_operations_pattern:: > matches (complex_operation_t /* op */, > slp_tree_to_load_perm_map_t * /* perm_cache */, > + slp_compat_nodes_map_t * /* compat_cache */, > slp_tree * /* ref_node */, vec<slp_tree> * /* ops */) { > return IFN_LAST; > @@ -1356,6 +1406,7 @@ matches (complex_operation_t /* op */, > > vect_pattern* > complex_operations_pattern::recognize (slp_tree_to_load_perm_map_t > *perm_cache, > + slp_compat_nodes_map_t *ccache, > slp_tree *node) > { > auto_vec<slp_tree> ops; > @@ -1363,15 +1414,15 @@ complex_operations_pattern::recognize > (slp_tree_to_load_perm_map_t *perm_cache, > = vect_detect_pair_op (*node, true, &ops); > internal_fn ifn = IFN_LAST; > > - ifn = complex_fms_pattern::matches (op, perm_cache, node, &ops); > + ifn = complex_fms_pattern::matches (op, perm_cache, ccache, node, > + &ops); > if (ifn != IFN_LAST) > return complex_fms_pattern::mkInstance (node, &ops, ifn); > > - ifn = complex_mul_pattern::matches (op, perm_cache, node, &ops); > + ifn = complex_mul_pattern::matches (op, perm_cache, ccache, node, > + &ops); > if (ifn != IFN_LAST) > return complex_mul_pattern::mkInstance (node, &ops, ifn); > > - ifn = complex_add_pattern::matches (op, perm_cache, node, &ops); > + ifn = complex_add_pattern::matches (op, perm_cache, ccache, node, > + &ops); > if (ifn != IFN_LAST) > return complex_add_pattern::mkInstance (node, &ops, ifn); > > @@ -1398,11 +1449,13 @@ class addsub_pattern : public vect_pattern > void build (vec_info *); > > static vect_pattern* > - recognize (slp_tree_to_load_perm_map_t *, slp_tree *); > + recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t > *, > + slp_tree *); > }; > > vect_pattern * > -addsub_pattern::recognize (slp_tree_to_load_perm_map_t *, slp_tree > *node_) > +addsub_pattern::recognize (slp_tree_to_load_perm_map_t *, > + slp_compat_nodes_map_t *, slp_tree *node_) > { > slp_tree node = *node_; > if (SLP_TREE_CODE (node) != VEC_PERM_EXPR diff --git > 
a/gcc/tree-vect- slp.c b/gcc/tree-vect-slp.c index > b912c3577df61a694d5bb9e22c5303fe6a48ab6e..cb577f8a612d583254e42bb06 > a6d7a0875de5e75 100644 > --- a/gcc/tree-vect-slp.c > +++ b/gcc/tree-vect-slp.c > @@ -804,7 +804,7 @@ vect_get_and_check_slp_defs (vec_info *vinfo, > unsigned char swap, > /* Return true if call statements CALL1 and CALL2 are similar enough > to be combined into the same SLP group. */ > > -static bool > +bool > compatible_calls_p (gcall *call1, gcall *call2) { > unsigned int nargs = gimple_call_num_args (call1); @@ -2907,6 > +2907,7 @@ optimize_load_redistribution > (scalar_stmts_to_slp_tree_map_t *bst_map, static bool > vect_match_slp_patterns_2 (slp_tree *ref_node, vec_info *vinfo, > slp_tree_to_load_perm_map_t *perm_cache, > + slp_compat_nodes_map_t *compat_cache, > hash_set<slp_tree> *visited) > { > unsigned i; > @@ -2918,11 +2919,13 @@ vect_match_slp_patterns_2 (slp_tree *ref_node, > vec_info *vinfo, > slp_tree child; > FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child) > found_p |= vect_match_slp_patterns_2 (&SLP_TREE_CHILDREN (node)[i], > - vinfo, perm_cache, visited); > + vinfo, perm_cache, compat_cache, > + visited); > > for (unsigned x = 0; x < num__slp_patterns; x++) > { > - vect_pattern *pattern = slp_patterns[x] (perm_cache, ref_node); > + vect_pattern *pattern > + = slp_patterns[x] (perm_cache, compat_cache, ref_node); > if (pattern) > { > pattern->build (vinfo); > @@ -2943,7 +2946,8 @@ vect_match_slp_patterns_2 (slp_tree *ref_node, > vec_info *vinfo, static bool vect_match_slp_patterns (slp_instance > instance, vec_info *vinfo, > hash_set<slp_tree> *visited, > - slp_tree_to_load_perm_map_t *perm_cache) > + slp_tree_to_load_perm_map_t *perm_cache, > + slp_compat_nodes_map_t *compat_cache) > { > DUMP_VECT_SCOPE ("vect_match_slp_patterns"); > slp_tree *ref_node = &SLP_INSTANCE_TREE (instance); @@ -2953,7 > +2957,8 @@ vect_match_slp_patterns (slp_instance instance, vec_info > *vinfo, > "Analyzing SLP tree %p for patterns\n", > 
SLP_INSTANCE_TREE (instance)); > > - return vect_match_slp_patterns_2 (ref_node, vinfo, perm_cache, > visited); > + return vect_match_slp_patterns_2 (ref_node, vinfo, perm_cache, > compat_cache, > + visited); > } > > /* STMT_INFO is a store group of size GROUP_SIZE that we are > considering @@ -3437,12 +3442,14 @@ vect_analyze_slp (vec_info *vinfo, > unsigned > max_tree_size) > > hash_set<slp_tree> visited_patterns; > slp_tree_to_load_perm_map_t perm_cache; > + slp_compat_nodes_map_t compat_cache; > > /* See if any patterns can be found in the SLP tree. */ > bool pattern_found = false; > FOR_EACH_VEC_ELT (LOOP_VINFO_SLP_INSTANCES (vinfo), i, instance) > pattern_found |= vect_match_slp_patterns (instance, vinfo, > - &visited_patterns, &perm_cache); > + &visited_patterns, &perm_cache, > + &compat_cache); > > /* If any were found optimize permutations of loads. */ > if (pattern_found) > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index > 2f6e1e268fb07e9de065ff9c45af87546e565d66..83cd0919c7838c65576e1debd8 > 81e0ec636a605a 100644 > --- a/gcc/tree-vectorizer.h > +++ b/gcc/tree-vectorizer.h > @@ -2268,6 +2268,7 @@ extern void duplicate_and_interleave (vec_info > *, gimple_seq *, tree, extern int > vect_get_place_in_interleaving_chain > (stmt_vec_info, stmt_vec_info); extern slp_tree > vect_create_new_slp_node (unsigned, tree_code); extern void > vect_free_slp_tree (slp_tree); > +extern bool compatible_calls_p (gcall *, gcall *); > > /* In tree-vect-patterns.c. */ > extern void > @@ -2306,6 +2307,12 @@ typedef enum _complex_perm_kinds { typedef > hash_map <slp_tree, complex_perm_kinds_t> > slp_tree_to_load_perm_map_t; > > +/* Cache from nodes pair to being compatible or not. */ typedef > +pair_hash <nofree_ptr_hash <_slp_tree>, > + nofree_ptr_hash <_slp_tree>> slp_node_hash; typedef > hash_map > +<slp_node_hash, bool> slp_compat_nodes_map_t; > + > + > /* Vector pattern matcher base class. All SLP pattern matchers must inherit > from this type. 
*/ > > @@ -2338,7 +2345,8 @@ class vect_pattern > public: > > /* Create a new instance of the pattern matcher class of the given type. > */ > - static vect_pattern* recognize (slp_tree_to_load_perm_map_t *, > slp_tree *); > + static vect_pattern* recognize (slp_tree_to_load_perm_map_t *, > + slp_compat_nodes_map_t *, slp_tree *); > > /* Build the pattern from the data collected so far. */ > virtual void build (vec_info *) = 0; @@ -2352,6 +2360,7 @@ class > vect_pattern > > /* Function pointer to create a new pattern matcher from a generic > type. */ typedef vect_pattern* (*vect_pattern_decl_t) > (slp_tree_to_load_perm_map_t *, > + slp_compat_nodes_map_t *, > slp_tree *); > > /* List of supported pattern matchers. */
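For readers following the operand-order discussion from the review (operand 0 is the output, operand 3 is the accumulator input), here is a minimal scalar sketch in C. The function names `ref_cmul` and `ref_cmla` are hypothetical and exist only for this illustration; they are not GCC optab names, and the sketch assumes the semantics Richard suggested, `op0[i] = op1[i] * op2[i]` and `op0[i] = op1[i] * op2[i] + op3[i]`.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Hypothetical scalar reference for a complex multiply:
   operand 0 is the output, operands 1 and 2 are the inputs.  */
static void
ref_cmul (float complex *op0, const float complex *op1,
          const float complex *op2, size_t n)
{
  assert (op0 != NULL && op1 != NULL && op2 != NULL);
  for (size_t i = 0; i < n; i++)
    op0[i] = op1[i] * op2[i];
}

/* Hypothetical scalar reference for a complex multiply and accumulate
   with the accumulator as the last input, i.e. the same order as a
   plain FMA: op0 = op1 * op2 + op3.  */
static void
ref_cmla (float complex *op0, const float complex *op1,
          const float complex *op2, const float complex *op3, size_t n)
{
  assert (op0 != NULL && op1 != NULL && op2 != NULL && op3 != NULL);
  for (size_t i = 0; i < n; i++)
    op0[i] = op1[i] * op2[i] + op3[i];
}
```

Keeping the output separate from the accumulator input means the reference does not depend on which array is updated in place, which is the ambiguity the `+=` form in the documentation example had.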
On Fri, Dec 17, 2021 at 4:44 PM Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>
> Hi All,
>
> This patch boosts the analysis for complex mul, fma and fms in order to ensure
> that it doesn't create an incorrect output.
>
> Essentially it adds an extra verification to check that the two nodes it's going
> to combine do the same operations on compatible values.  The reason it needs to
> do this is that if one computation differs from the other then with the current
> implementation we have no way to deal with it since we have to remove the
> permute.
>
> When we can keep the permute around we can probably handle these by unrolling.
>
> While implementing this, since I have to do the traversal anyway, I took advantage
> of it by simplifying the code a bit.  Previously we would determine whether
> something is a conjugate and then try to figure out which conjugate it is and
> then try to see if the permutes match what we expect.
>
> Now the code that does the traversal will detect this in one go and return to us
> whether the operation is something that can be combined and whether a conjugate
> is present.
>
> Secondly, because it does this I can now simplify the checking code itself to
> essentially just try to apply fixed patterns to each operation.
>
> The patterns represent the order operations should appear in.  For instance a
> complex MUL operation combines:
>
>   Left 1 + Right 1
>   Left 2 + Right 2
>
> with a permute on the nodes consisting of:
>
>   { Even, Even } + { Odd, Odd }
>   { Even, Odd } + { Odd, Even }
>
> By abstracting over these patterns the checking code becomes quite simple.
>
> As part of this I was checking the order of the operands, which was left in
> "slp" order, that is, the same order they showed up in during SLP, which means
> that the accumulator is first.  However, it looks like I didn't document this
> and the x86 optab was implemented assuming the same order as FMA, i.e. that
> the accumulator is last.
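The lane pairing described in the quoted text above for a complex MUL can be sketched in plain C on interleaved data, where even indices hold real parts and odd indices hold imaginary parts. This is only an illustration of the arithmetic, assuming the same layout as the pr102819 tests in this patch; `cmul_interleaved` is a hypothetical name, and the comments about which operand is read with which lane pattern are my reading of the description, not vectorizer code.

```c
#include <assert.h>
#include <stddef.h>

/* Multiply n complex numbers stored as interleaved (real, imag) floats.
   Hypothetical helper for illustration only.  */
static void
cmul_interleaved (float *out, const float *a, const float *b, size_t n)
{
  /* The output must not alias the inputs in this simple sketch.  */
  assert (out != a && out != b);

  for (size_t k = 0; k < n; k++)
    {
      size_t r = 2 * k;   /* Even lane: real part.  */
      size_t i = r + 1;   /* Odd lane: imaginary part.  */

      /* Left products: a is read as { even, even }, b as { even, odd }.  */
      float left_re = a[r] * b[r];
      float left_im = a[r] * b[i];

      /* Right products: a is read as { odd, odd }, b as { odd, even }.  */
      float right_re = a[i] * b[i];
      float right_im = a[i] * b[r];

      /* Subtract on the even lane, add on the odd lane.  */
      out[r] = left_re - right_re;
      out[i] = left_im + right_im;
    }
}
```

If either product reads a different array or a different expression in one of the two lanes, as in the "bad" test cases of the patch, the two halves no longer describe one complex multiplication and must not be combined.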
>
> I have thus changed the order to match that of FMA and FMS, which corrects the
> x86 codegen, and will update the Arm targets.  This has now also been
> documented.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu,
> x86_64-pc-linux-gnu and no regressions.
>
> Ok for master? and backport to GCC 11 after some stew?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>         PR tree-optimization/102819
>         PR tree-optimization/103169
>         * doc/md.texi: Update docs for cfms, cfma.
>         * tree-data-ref.h (same_data_refs): Accept optional offset.
>         * tree-vect-slp-patterns.c (is_linear_load_p): Fix issue with repeating
>         patterns.
>         (vect_normalize_conj_loc): Remove.
>         (is_eq_or_top): Change to take two nodes.
>         (enum _conj_status, compatible_complex_nodes_p,
>         vect_validate_multiplication): New.
>         (class complex_add_pattern, complex_add_pattern::matches,
>         complex_add_pattern::recognize, class complex_mul_pattern,
>         complex_mul_pattern::recognize, class complex_fms_pattern,
>         complex_fms_pattern::recognize, class complex_operations_pattern,
>         complex_operations_pattern::recognize, addsub_pattern::recognize): Pass
>         new cache.
>         (complex_fms_pattern::matches, complex_mul_pattern::matches): Pass new
>         cache and use new validation code.
>         * tree-vect-slp.c (vect_match_slp_patterns_2, vect_match_slp_patterns,
>         vect_analyze_slp): Pass along cache.
>         (compatible_calls_p): Expose.
>         * tree-vectorizer.h (compatible_calls_p, slp_node_hash,
>         slp_compat_nodes_map_t): New.
>         (class vect_pattern): Update signatures to include new cache.
>
> gcc/testsuite/ChangeLog:
>
>         PR tree-optimization/102819
>         PR tree-optimization/103169
>         * g++.dg/vect/pr99149.cc: xfail for now.
>         * gcc.dg/vect/complex/pr102819-1.c: New test.
>         * gcc.dg/vect/complex/pr102819-2.c: New test.
>         * gcc.dg/vect/complex/pr102819-3.c: New test.
>         * gcc.dg/vect/complex/pr102819-4.c: New test.
>         * gcc.dg/vect/complex/pr102819-5.c: New test.
>         * gcc.dg/vect/complex/pr102819-6.c: New test.
>         * gcc.dg/vect/complex/pr102819-7.c: New test.
>         * gcc.dg/vect/complex/pr102819-8.c: New test.
>         * gcc.dg/vect/complex/pr102819-9.c: New test.
>         * gcc.dg/vect/complex/pr103169.c: New test.
>
> --- inline copy of patch --
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 9ec051e94e10cca9eec2773e1b8c01b74b6ea4db..60dc5b3ea6087c2824ad1467bc66e9cfebe9dcfc 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -6325,12 +6325,12 @@ Perform a vector multiply and accumulate that is semantically the same as
>  a multiply and accumulate of complex numbers.
>
>  @smallexample
> -  complex TYPE c[N];
> -  complex TYPE a[N];
> -  complex TYPE b[N];
> +  complex TYPE op0[N];
> +  complex TYPE op1[N];
> +  complex TYPE op2[N];
>    for (int i = 0; i < N; i += 1)
>    @{
> -    c[i] += a[i] * b[i];
> +    op2[i] += op1[i] * op2[i];
>    @}
>  @end smallexample
>
> @@ -6348,12 +6348,12 @@ the same as a multiply and accumulate of complex numbers where the second
>  multiply arguments is conjugated.
>
>  @smallexample
> -  complex TYPE c[N];
> -  complex TYPE a[N];
> -  complex TYPE b[N];
> +  complex TYPE op0[N];
> +  complex TYPE op1[N];
> +  complex TYPE op2[N];
>    for (int i = 0; i < N; i += 1)
>    @{
> -    c[i] += a[i] * conj (b[i]);
> +    op2[i] += op0[i] * conj (op1[i]);
>    @}
>  @end smallexample
>
> @@ -6370,12 +6370,12 @@ Perform a vector multiply and subtract that is semantically the same as
>  a multiply and subtract of complex numbers.
>
>  @smallexample
> -  complex TYPE c[N];
> -  complex TYPE a[N];
> -  complex TYPE b[N];
> +  complex TYPE op0[N];
> +  complex TYPE op1[N];
> +  complex TYPE op2[N];
>    for (int i = 0; i < N; i += 1)
>    @{
> -    c[i] -= a[i] * b[i];
> +    op2[i] -= op0[i] * op1[i];
>    @}
>  @end smallexample
>
> @@ -6393,12 +6393,12 @@ the same as a multiply and subtract of complex numbers where the second
>  multiply arguments is conjugated.
>
>  @smallexample
> -  complex TYPE c[N];
> -  complex TYPE a[N];
> -  complex TYPE b[N];
> +  complex TYPE op0[N];
> +  complex TYPE op1[N];
> +  complex TYPE op2[N];
>    for (int i = 0; i < N; i += 1)
>    @{
> -    c[i] -= a[i] * conj (b[i]);
> +    op2[i] -= op0[i] * conj (op1[i]);
>    @}
>  @end smallexample
>
> @@ -6415,12 +6415,12 @@ Perform a vector multiply that is semantically the same as multiply of
>  complex numbers.
>
>  @smallexample
> -  complex TYPE c[N];
> -  complex TYPE a[N];
> -  complex TYPE b[N];
> +  complex TYPE op0[N];
> +  complex TYPE op1[N];
> +  complex TYPE op2[N];
>    for (int i = 0; i < N; i += 1)
>    @{
> -    c[i] = a[i] * b[i];
> +    op2[i] = op0[i] * op1[i];
>    @}
>  @end smallexample
>
> @@ -6437,12 +6437,12 @@ Perform a vector multiply by conjugate that is semantically the same as a
>  multiply of complex numbers where the second multiply arguments is conjugated.
>
>  @smallexample
> -  complex TYPE c[N];
> -  complex TYPE a[N];
> -  complex TYPE b[N];
> +  complex TYPE op0[N];
> +  complex TYPE op1[N];
> +  complex TYPE op2[N];
>    for (int i = 0; i < N; i += 1)
>    @{
> -    c[i] = a[i] * conj (b[i]);
> +    op2[i] = op0[i] * conj (op1[i]);
>    @}
>  @end smallexample
>
> diff --git a/gcc/testsuite/g++.dg/vect/pr99149.cc b/gcc/testsuite/g++.dg/vect/pr99149.cc
> index e6e0594a336fa053ffba64a12e2de43a4e373f49..bb9f5fa89f12b184368bf5488d6e9432c2166463 100755
> --- a/gcc/testsuite/g++.dg/vect/pr99149.cc
> +++ b/gcc/testsuite/g++.dg/vect/pr99149.cc
> @@ -24,4 +24,4 @@ public:
>  } n;
>  main() { n.j(); }
>
> -/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_MUL" 1 "slp2" } } */
> +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_MUL" 1 "slp2" { xfail { vect_float } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-1.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-1.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..46b9a55f05279d732fa1418e02f779cf693ede07
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-1.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-add-options arm_v8_3a_complex_neon } */
> +
> +float f[12][100];
> +
> +void bad1(float v1, float v2)
> +{
> +  for (int r = 0; r < 100; r += 4)
> +    {
> +      int i = r + 1;
> +      f[0][r] = f[1][r] * (f[2][r] + v2) - f[1][i] * (f[2][i] + v1);
> +      f[0][i] = f[1][r] * (f[2][i] + v1) + f[1][i] * (f[2][r] + v2);
> +      f[0][r+2] = f[1][r+2] * (f[2][r+2] + v2) - f[1][i+2] * (f[2][i+2] + v1);
> +      f[0][i+2] = f[1][r+2] * (f[2][i+2] + v1) + f[1][i+2] * (f[2][r+2] + v2);
> +      // ^^^^^^^ ^^^^^^^
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-2.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-2.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..ffe646efe57f7ad07541b0fb96601596f46dc5f8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-2.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-add-options arm_v8_3a_complex_neon } */
> +
> +float f[12][100];
> +
> +void bad1(float v1, float v2)
> +{
> +  for (int r = 0; r < 100; r += 2)
> +    {
> +      int i = r + 1;
> +      f[0][r] = f[1][r] * (f[2][r] + v1) - f[1][i] * (f[2][i] + v2);
> +      f[0][i] = f[1][r] * (f[2][i] + v1) + f[1][i] * (f[2][r] + v2);
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-3.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-3.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..5f98aa204d8b11b0cb433f8965dbb72cf8940de1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-3.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-add-options arm_v8_3a_complex_neon } */
> +
> +float f[12][100];
> +
> +void good1(float v1, float v2)
> +{
> +  for (int r = 0; r < 100; r += 2)
> +    {
> +      int i = r + 1;
> +      f[0][r] = f[1][r] * (f[2][r] + v2) - f[1][i] * (f[2][i] + v1);
> +      f[0][i] = f[1][r] * (f[2][i] + v1) + f[1][i] * (f[2][r] + v2);
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-4.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-4.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..882851789c5085e734000609114be480d3b08bd0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-4.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-add-options arm_v8_3a_complex_neon } */
> +
> +float f[12][100];
> +
> +void good1()
> +{
> +  for (int r = 0; r < 100; r += 2)
> +    {
> +      int i = r + 1;
> +      f[0][r] = f[1][r] * f[2][r] - f[1][i] * f[2][i];
> +      f[0][i] = f[1][r] * f[2][i] + f[1][i] * f[2][r];
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-5.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-5.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..6a2d549d65f3f27d407fb0bd469473e6a5c333ae
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-5.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-add-options arm_v8_3a_complex_neon } */
> +
> +float f[12][100];
> +
> +void good2()
> +{
> +  for (int r = 0; r < 100; r += 2)
> +    {
> +      int i = r + 1;
> +      f[0][r] = f[1][r] * (f[2][r] + 1) - f[1][i] * (f[2][i] + 1);
> +      f[0][i] = f[1][r] * (f[2][i] + 1) + f[1][i] * (f[2][r] + 1);
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-6.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-6.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..71e66dbe3b29eec1fffb8df9b216022fdc0af54e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-6.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-add-options arm_v8_3a_complex_neon } */
> +
> +float f[12][100];
> +
> +void bad1()
> +{
> +  for (int r = 0; r < 100; r += 2)
> +    {
> +      int i = r + 1;
> +      f[0][r] = f[1][r] * f[2][r] - f[1][i] * f[3][i];
> +      f[0][i] = f[1][r] * f[2][i] + f[1][i] * f[3][r];
> +      // ^^^^^^^ ^^^^^^^
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-7.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-7.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..536672f3c8bb474ad5fa4bb61b3a36b555acf3cf
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-7.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-add-options arm_v8_3a_complex_neon } */
> +
> +float f[12][100];
> +
> +void bad2()
> +{
> +  for (int r = 0; r < 100; r += 2)
> +    {
> +      int i = r + 1;
> +      f[0][r] = f[1][r] * (f[2][r] + 1) - f[1][i] * f[2][i];
> +      f[0][i] = f[1][r] * (f[2][i] + 1) + f[1][i] * f[2][r];
> +      // ^^^^
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-8.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-8.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..07b48148688b7d530e5891d023d558b58a485c23
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-8.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-add-options arm_v8_3a_complex_neon } */
> +
> +float f[12][100];
> +
> +void bad3()
> +{
> +  for (int r = 0; r < 100; r += 2)
> +    {
> +      int i = r + 1;
> +      f[0][r] = f[1][r] * f[2][r] - f[1][r] * f[2][i];
> +      f[0][i] = f[1][r] * f[2][i] + f[1][i] * f[2][r];
> +      // ^^^^^^^
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-9.c
b/gcc/testsuite/gcc.dg/vect/complex/pr102819-9.c > new file mode 100644 > index 0000000000000000000000000000000000000000..7655852434b21b381fe7ee316e8caf3d485b8ee1 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-9.c > @@ -0,0 +1,21 @@ > +/* { dg-do compile } */ > +/* { dg-add-options arm_v8_3a_complex_neon } */ > + > +#include <stdio.h> > +#include <complex.h> > + > +#define N 200 > +#define TYPE float > +#define TYPE2 float > + > +void g (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) > +{ > + for (int i=0; i < N; i++) > + { > + c[i] -= a[i] * b[0]; > + } > +} > + > +/* The pattern overlaps with COMPLEX_ADD so we need to support consuming ADDs in COMPLEX_FMS. */ > + > +/* { dg-final { scan-tree-dump "Found COMPLEX_FMS" "vect" { xfail { vect_float } } } } */ > diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr103169.c b/gcc/testsuite/gcc.dg/vect/complex/pr103169.c > new file mode 100644 > index 0000000000000000000000000000000000000000..1bfabbd85a0eedfb4156a82574324126e9083fc5 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/vect/complex/pr103169.c > @@ -0,0 +1,12 @@ > +/* { dg-do compile { target { vect_double } } } */ > +/* { dg-add-options arm_v8_3a_complex_neon } */ > +/* { dg-additional-options "-O2 -fvect-cost-model=unlimited" } */ > + > +_Complex double b_0, c_0; > + > +void > +mul270snd (void) > +{ > + c_0 = b_0 * 1.0iF * 1.0iF; > +} > + > diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h > index 74f579c9f3f23bac25d21546068c2ab43209aa2b..8ad5fa521279b20fa5e63eecf442d5dc5c16e7ee 100644 > --- a/gcc/tree-data-ref.h > +++ b/gcc/tree-data-ref.h > @@ -600,10 +600,11 @@ same_data_refs_base_objects (data_reference_p a, data_reference_p b) > } > > /* Return true when the data references A and B are accessing the same > - memory object with the same access functions. */ > + memory object with the same access functions. Optionally skip the > + last OFFSET dimensions in the data reference. 
*/

But you skip the _first_ dimensions?

Otherwise looks OK to me.

Thanks,
Richard.

> static inline bool > -same_data_refs (data_reference_p a, data_reference_p b) > +same_data_refs (data_reference_p a, data_reference_p b, int offset = 0) > { > unsigned int i; > > @@ -614,7 +615,7 @@ same_data_refs (data_reference_p a, data_reference_p b) > if (!same_data_refs_base_objects (a, b)) > return false; > > - for (i = 0; i < DR_NUM_DIMENSIONS (a); i++) > + for (i = offset; i < DR_NUM_DIMENSIONS (a); i++) > if (!eq_evolutions_p (DR_ACCESS_FN (a, i), DR_ACCESS_FN (b, i))) > return false; > > diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c > index 0350441fad9690cd5d04337171ca3470a064a571..f8da4153632a700680091f37305a5d3078fbb0c5 100644 > --- a/gcc/tree-vect-slp-patterns.c > +++ b/gcc/tree-vect-slp-patterns.c > @@ -149,12 +149,13 @@ is_linear_load_p (load_permutation_t loads) > int valid_patterns = 4; > FOR_EACH_VEC_ELT (loads, i, load) > { > - if (candidates[0] != PERM_UNKNOWN && load != 1) > + unsigned adj_load = load % 2; > + if (candidates[0] != PERM_UNKNOWN && adj_load != 1) > { > candidates[0] = PERM_UNKNOWN; > valid_patterns--; > } > - if (candidates[1] != PERM_UNKNOWN && load != 0) > + if (candidates[1] != PERM_UNKNOWN && adj_load != 0) > { > candidates[1] = PERM_UNKNOWN; > valid_patterns--; > @@ -596,11 +597,12 @@ class complex_add_pattern : public complex_pattern > public: > void build (vec_info *); > static internal_fn > - matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *, > - vec<slp_tree> *); > + matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, > + slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *); > > static vect_pattern* > - recognize (slp_tree_to_load_perm_map_t *, slp_tree *); > + recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *, > + slp_tree *); > > static vect_pattern* > mkInstance (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn) > @@ -647,6 +649,7 @@
complex_add_pattern::build (vec_info *vinfo) > internal_fn > complex_add_pattern::matches (complex_operation_t op, > slp_tree_to_load_perm_map_t *perm_cache, > + slp_compat_nodes_map_t * /* compat_cache */, > slp_tree *node, vec<slp_tree> *ops) > { > internal_fn ifn = IFN_LAST; > @@ -692,13 +695,14 @@ complex_add_pattern::matches (complex_operation_t op, > > vect_pattern* > complex_add_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache, > + slp_compat_nodes_map_t *compat_cache, > slp_tree *node) > { > auto_vec<slp_tree> ops; > complex_operation_t op > = vect_detect_pair_op (*node, true, &ops); > internal_fn ifn > - = complex_add_pattern::matches (op, perm_cache, node, &ops); > + = complex_add_pattern::matches (op, perm_cache, compat_cache, node, &ops); > if (ifn == IFN_LAST) > return NULL; > > @@ -709,147 +713,214 @@ complex_add_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache, > * complex_mul_pattern > ******************************************************************************/ > > -/* Check to see if either of the trees in ARGS are a NEGATE_EXPR. If the first > - child (args[0]) is a NEGATE_EXPR then NEG_FIRST_P is set to TRUE. > - > - If a negate is found then the values in ARGS are reordered such that the > - negate node is always the second one and the entry is replaced by the child > - of the negate node. */ > +/* Helper function to check if PERM is KIND or PERM_TOP. 
*/ > > static inline bool > -vect_normalize_conj_loc (vec<slp_tree> &args, bool *neg_first_p = NULL) > +is_eq_or_top (slp_tree_to_load_perm_map_t *perm_cache, > + slp_tree op1, complex_perm_kinds_t kind1, > + slp_tree op2, complex_perm_kinds_t kind2) > { > - gcc_assert (args.length () == 2); > - bool neg_found = false; > - > - if (vect_match_expression_p (args[0], NEGATE_EXPR)) > - { > - std::swap (args[0], args[1]); > - neg_found = true; > - if (neg_first_p) > - *neg_first_p = true; > - } > - else if (vect_match_expression_p (args[1], NEGATE_EXPR)) > - { > - neg_found = true; > - if (neg_first_p) > - *neg_first_p = false; > - } > + complex_perm_kinds_t perm1 = linear_loads_p (perm_cache, op1); > + if (perm1 != kind1 && perm1 != PERM_TOP) > + return false; > > - if (neg_found) > - args[1] = SLP_TREE_CHILDREN (args[1])[0]; > + complex_perm_kinds_t perm2 = linear_loads_p (perm_cache, op2); > + if (perm2 != kind2 && perm2 != PERM_TOP) > + return false; > > - return neg_found; > + return true; > } > > -/* Helper function to check if PERM is KIND or PERM_TOP. */ > +enum _conj_status { CONJ_NONE, CONJ_FST, CONJ_SND }; > > static inline bool > -is_eq_or_top (complex_perm_kinds_t perm, complex_perm_kinds_t kind) > +compatible_complex_nodes_p (slp_compat_nodes_map_t *compat_cache, > + slp_tree a, int *pa, slp_tree b, int *pb) > { > - return perm == kind || perm == PERM_TOP; > -} > + bool *tmp; > + std::pair<slp_tree, slp_tree> key = std::make_pair(a, b); > + if ((tmp = compat_cache->get (key)) != NULL) > + return *tmp; > > -/* Helper function that checks to see if LEFT_OP and RIGHT_OP are both MULT_EXPR > - nodes but also that they represent an operation that is either a complex > - multiplication or a complex multiplication by conjugated value. > + compat_cache->put (key, false); > > - Of the negation is expected to be in the first half of the tree (As required > - by an FMS pattern) then NEG_FIRST is true. 
If the operation is a conjugate > - operation then CONJ_FIRST_OPERAND is set to indicate whether the first or > - second operand contains the conjugate operation. */ > + if (SLP_TREE_CHILDREN (a).length () != SLP_TREE_CHILDREN (b).length ()) > + return false; > > -static inline bool > -vect_validate_multiplication (slp_tree_to_load_perm_map_t *perm_cache, > - const vec<slp_tree> &left_op, > - const vec<slp_tree> &right_op, > - bool neg_first, bool *conj_first_operand, > - bool fms) > -{ > - /* The presence of a negation indicates that we have either a conjugate or a > - rotation. We need to distinguish which one. */ > - *conj_first_operand = false; > - complex_perm_kinds_t kind; > - > - /* Complex conjugates have the negation on the imaginary part of the > - number where rotations affect the real component. So check if the > - negation is on a dup of lane 1. */ > - if (fms) > + if (SLP_TREE_DEF_TYPE (a) != SLP_TREE_DEF_TYPE (b)) > + return false; > + > + /* Only internal nodes can be loads, as such we can't check further if they > + are externals. */ > + if (SLP_TREE_DEF_TYPE (a) != vect_internal_def) > { > - /* Canonicalization for fms is not consistent. So have to test both > - variants to be sure. This needs to be fixed in the mid-end so > - this part can be simpler. 
*/ > - kind = linear_loads_p (perm_cache, right_op[0]); > - if (!((is_eq_or_top (linear_loads_p (perm_cache, right_op[0]), PERM_ODDODD) > - && is_eq_or_top (linear_loads_p (perm_cache, right_op[1]), > - PERM_ODDEVEN)) > - || (kind == PERM_ODDEVEN > - && is_eq_or_top (linear_loads_p (perm_cache, right_op[1]), > - PERM_ODDODD)))) > - return false; > + for (unsigned i = 0; i < SLP_TREE_SCALAR_OPS (a).length (); i++) > + { > + tree op1 = SLP_TREE_SCALAR_OPS (a)[pa[i % 2]]; > + tree op2 = SLP_TREE_SCALAR_OPS (b)[pb[i % 2]]; > + if (!operand_equal_p (op1, op2, 0)) > + return false; > + } > + > + compat_cache->put (key, true); > + return true; > + } > + > + auto a_stmt = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (a)); > + auto b_stmt = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (b)); > + > + if (gimple_code (a_stmt) != gimple_code (b_stmt)) > + return false; > + > + /* code, children, type, externals, loads, constants */ > + if (gimple_num_args (a_stmt) != gimple_num_args (b_stmt)) > + return false; > + > + /* At this point, a and b are known to be the same gimple operations. */ > + if (is_gimple_call (a_stmt)) > + { > + if (!compatible_calls_p (dyn_cast <gcall *> (a_stmt), > + dyn_cast <gcall *> (b_stmt))) > + return false; > } > + else if (!is_gimple_assign (a_stmt)) > + return false; > else > { > - if (linear_loads_p (perm_cache, right_op[1]) != PERM_ODDODD > - && !is_eq_or_top (linear_loads_p (perm_cache, right_op[0]), > - PERM_ODDEVEN)) > + tree_code acode = gimple_assign_rhs_code (a_stmt); > + tree_code bcode = gimple_assign_rhs_code (b_stmt); > + if ((acode == REALPART_EXPR || acode == IMAGPART_EXPR) > + && (bcode == REALPART_EXPR || bcode == IMAGPART_EXPR)) > + return true; > + > + if (acode != bcode) > return false; > } > > - /* Deal with differences in indexes. */ > - int index1 = fms ? 1 : 0; > - int index2 = fms ? 0 : 1; > - > - /* Check if the conjugate is on the second first or second operand. 
The > - order of the node with the conjugate value determines this, and the dup > - node must be one of lane 0 of the same DR as the neg node. */ > - kind = linear_loads_p (perm_cache, left_op[index1]); > - if (kind == PERM_TOP) > + if (!SLP_TREE_LOAD_PERMUTATION (a).exists () > + || !SLP_TREE_LOAD_PERMUTATION (b).exists ()) > { > - if (linear_loads_p (perm_cache, left_op[index2]) == PERM_EVENODD) > - return true; > + for (unsigned i = 0; i < gimple_num_args (a_stmt); i++) > + { > + tree t1 = gimple_arg (a_stmt, i); > + tree t2 = gimple_arg (b_stmt, i); > + if (TREE_CODE (t1) != TREE_CODE (t2)) > + return false; > + > + /* If SSA name then we will need to inspect the children > + so we can punt here. */ > + if (TREE_CODE (t1) == SSA_NAME) > + continue; > + > + if (!operand_equal_p (t1, t2, 0)) > + return false; > + } > } > - else if (kind == PERM_EVENODD && !neg_first) > + else > { > - if ((kind = linear_loads_p (perm_cache, left_op[index2])) != PERM_EVENEVEN) > + auto dr1 = STMT_VINFO_DATA_REF (SLP_TREE_REPRESENTATIVE (a)); > + auto dr2 = STMT_VINFO_DATA_REF (SLP_TREE_REPRESENTATIVE (b)); > + /* Don't check the last dimension as that's checked by the lineary > + checks. This check is also much stricter than what we need > + because it doesn't consider loading from adjacent elements > + in the same struct as loading from the same base object. > + But for now, I'll play it safe. 
*/ > + if (!same_data_refs (dr1, dr2, 1)) > return false; > - return true; > } > - else if (kind == PERM_EVENEVEN && neg_first) > + > + for (unsigned i = 0; i < SLP_TREE_CHILDREN (a).length (); i++) > { > - if ((kind = linear_loads_p (perm_cache, left_op[index2])) != PERM_EVENODD) > + if (!compatible_complex_nodes_p (compat_cache, > + SLP_TREE_CHILDREN (a)[i], pa, > + SLP_TREE_CHILDREN (b)[i], pb)) > return false; > - > - *conj_first_operand = true; > - return true; > } > - else > - return false; > - > - if (kind != PERM_EVENEVEN) > - return false; > > + compat_cache->put (key, true); > return true; > } > > -/* Helper function to help distinguish between a conjugate and a rotation in a > - complex multiplication. The operations have similar shapes but the order of > - the load permutes are different. This function returns TRUE when the order > - is consistent with a multiplication or multiplication by conjugated > - operand but returns FALSE if it's a multiplication by rotated operand. */ > - > static inline bool > vect_validate_multiplication (slp_tree_to_load_perm_map_t *perm_cache, > - const vec<slp_tree> &op, > - complex_perm_kinds_t permKind) > + slp_compat_nodes_map_t *compat_cache, > + vec<slp_tree> &left_op, > + vec<slp_tree> &right_op, > + bool subtract, > + enum _conj_status *_status) > { > - /* The left node is the more common case, test it first. */ > - if (!is_eq_or_top (linear_loads_p (perm_cache, op[0]), permKind)) > + auto_vec<slp_tree> ops; > + enum _conj_status stats = CONJ_NONE; > + > + /* The complex operations can occur in two layouts and two permute sequences > + so declare them and re-use them. */ > + int styles[][4] = { { 0, 2, 1, 3} /* {L1, R1} + {L2, R2}. */ > + , { 0, 3, 1, 2} /* {L1, R2} + {L2, R1}. */ > + }; > + > + /* Now for the corresponding permutes that go with these values. 
*/ > + complex_perm_kinds_t perms[][4] > + = { { PERM_EVENEVEN, PERM_ODDODD, PERM_EVENODD, PERM_ODDEVEN } > + , { PERM_EVENODD, PERM_ODDEVEN, PERM_EVENEVEN, PERM_ODDODD } > + }; > + > + /* These permutes are used during comparisons of externals on which > + we require strict equality. */ > + int cq[][4][2] > + = { { { 0, 0 }, { 1, 1 }, { 0, 1 }, { 1, 0 } } > + , { { 0, 1 }, { 1, 0 }, { 0, 0 }, { 1, 1 } } > + }; > + > + /* Default to style and perm 0, most operations use this one. */ > + int style = 0; > + int perm = subtract ? 1 : 0; > + > + /* Check if we have a negate operation, if so absorb the node and continue > + looking. */ > + bool neg0 = vect_match_expression_p (right_op[0], NEGATE_EXPR); > + bool neg1 = vect_match_expression_p (right_op[1], NEGATE_EXPR); > + > + /* Determine which style we're looking at. We only have different ones > + whenever a conjugate is involved. */ > + if (neg0 && neg1) > + ; > + else if (neg0) > { > - if (!is_eq_or_top (linear_loads_p (perm_cache, op[1]), permKind)) > - return false; > + right_op[0] = SLP_TREE_CHILDREN (right_op[0])[0]; > + stats = CONJ_FST; > + if (subtract) > + perm = 0; > } > - return true; > + else if (neg1) > + { > + right_op[1] = SLP_TREE_CHILDREN (right_op[1])[0]; > + stats = CONJ_SND; > + perm = 1; > + } > + > + *_status = stats; > + > + /* Flatten the inputs after we've remapped them. */ > + ops.create (4); > + ops.safe_splice (left_op); > + ops.safe_splice (right_op); > + > + /* Extract out the elements to check. */ > + slp_tree op0 = ops[styles[style][0]]; > + slp_tree op1 = ops[styles[style][1]]; > + slp_tree op2 = ops[styles[style][2]]; > + slp_tree op3 = ops[styles[style][3]]; > + > + /* Do cheapest test first. If failed no need to analyze further. 
*/ > + if (linear_loads_p (perm_cache, op0) != perms[perm][0] > + || linear_loads_p (perm_cache, op1) != perms[perm][1] > + || !is_eq_or_top (perm_cache, op2, perms[perm][2], op3, perms[perm][3])) > + return false; > + > + return compatible_complex_nodes_p (compat_cache, op0, cq[perm][0], op1, > + cq[perm][1]) > + && compatible_complex_nodes_p (compat_cache, op2, cq[perm][2], op3, > + cq[perm][3]); > } > > /* This function combines two nodes containing only even and only odd lanes > @@ -908,11 +979,12 @@ class complex_mul_pattern : public complex_pattern > public: > void build (vec_info *); > static internal_fn > - matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *, > - vec<slp_tree> *); > + matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, > + slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *); > > static vect_pattern* > - recognize (slp_tree_to_load_perm_map_t *, slp_tree *); > + recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *, > + slp_tree *); > > static vect_pattern* > mkInstance (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn) > @@ -943,6 +1015,7 @@ class complex_mul_pattern : public complex_pattern > internal_fn > complex_mul_pattern::matches (complex_operation_t op, > slp_tree_to_load_perm_map_t *perm_cache, > + slp_compat_nodes_map_t *compat_cache, > slp_tree *node, vec<slp_tree> *ops) > { > internal_fn ifn = IFN_LAST; > @@ -990,17 +1063,13 @@ complex_mul_pattern::matches (complex_operation_t op, > || linear_loads_p (perm_cache, left_op[1]) == PERM_ODDEVEN) > return IFN_LAST; > > - bool neg_first = false; > - bool conj_first_operand = false; > - bool is_neg = vect_normalize_conj_loc (right_op, &neg_first); > + enum _conj_status status; > + if (!vect_validate_multiplication (perm_cache, compat_cache, left_op, > + right_op, false, &status)) > + return IFN_LAST; > > - if (!is_neg) > + if (status == CONJ_NONE) > { > - /* A multiplication needs to multiply agains the real pair, otherwise > - the 
pattern matches that of FMS. */ > - if (!vect_validate_multiplication (perm_cache, left_op, PERM_EVENEVEN) > - || vect_normalize_conj_loc (left_op)) > - return IFN_LAST; > if (add0) > ifn = IFN_COMPLEX_FMA; > else > @@ -1008,11 +1077,6 @@ complex_mul_pattern::matches (complex_operation_t op, > } > else > { > - if (!vect_validate_multiplication (perm_cache, left_op, right_op, > - neg_first, &conj_first_operand, > - false)) > - return IFN_LAST; > - > if(add0) > ifn = IFN_COMPLEX_FMA_CONJ; > else > @@ -1029,19 +1093,13 @@ complex_mul_pattern::matches (complex_operation_t op, > ops->quick_push (add0); > > complex_perm_kinds_t kind = linear_loads_p (perm_cache, left_op[0]); > - if (kind == PERM_EVENODD) > - { > - ops->quick_push (left_op[1]); > - ops->quick_push (right_op[1]); > - ops->quick_push (left_op[0]); > - } > - else if (kind == PERM_TOP) > + if (kind == PERM_EVENODD || kind == PERM_TOP) > { > ops->quick_push (left_op[1]); > ops->quick_push (right_op[1]); > ops->quick_push (left_op[0]); > } > - else if (kind == PERM_EVENEVEN && !conj_first_operand) > + else if (kind == PERM_EVENEVEN && status != CONJ_SND) > { > ops->quick_push (left_op[0]); > ops->quick_push (right_op[0]); > @@ -1061,13 +1119,14 @@ complex_mul_pattern::matches (complex_operation_t op, > > vect_pattern* > complex_mul_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache, > + slp_compat_nodes_map_t *compat_cache, > slp_tree *node) > { > auto_vec<slp_tree> ops; > complex_operation_t op > = vect_detect_pair_op (*node, true, &ops); > internal_fn ifn > - = complex_mul_pattern::matches (op, perm_cache, node, &ops); > + = complex_mul_pattern::matches (op, perm_cache, compat_cache, node, &ops); > if (ifn == IFN_LAST) > return NULL; > > @@ -1097,8 +1156,8 @@ complex_mul_pattern::build (vec_info *vinfo) > > /* First re-arrange the children. 
*/ > SLP_TREE_CHILDREN (*this->m_node).reserve_exact (2); > - SLP_TREE_CHILDREN (*this->m_node)[0] = this->m_ops[2]; > - SLP_TREE_CHILDREN (*this->m_node)[1] = newnode; > + SLP_TREE_CHILDREN (*this->m_node)[0] = newnode; > + SLP_TREE_CHILDREN (*this->m_node)[1] = this->m_ops[2]; > break; > } > case IFN_COMPLEX_FMA: > @@ -1115,9 +1174,9 @@ complex_mul_pattern::build (vec_info *vinfo) > > /* First re-arrange the children. */ > SLP_TREE_CHILDREN (*this->m_node).safe_grow (3); > - SLP_TREE_CHILDREN (*this->m_node)[0] = this->m_ops[0]; > + SLP_TREE_CHILDREN (*this->m_node)[0] = newnode; > SLP_TREE_CHILDREN (*this->m_node)[1] = this->m_ops[3]; > - SLP_TREE_CHILDREN (*this->m_node)[2] = newnode; > + SLP_TREE_CHILDREN (*this->m_node)[2] = this->m_ops[0]; > > /* Tell the builder to expect an extra argument. */ > this->m_num_args++; > @@ -1147,11 +1206,12 @@ class complex_fms_pattern : public complex_pattern > public: > void build (vec_info *); > static internal_fn > - matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *, > - vec<slp_tree> *); > + matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, > + slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *); > > static vect_pattern* > - recognize (slp_tree_to_load_perm_map_t *, slp_tree *); > + recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *, > + slp_tree *); > > static vect_pattern* > mkInstance (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn) > @@ -1182,6 +1242,7 @@ class complex_fms_pattern : public complex_pattern > internal_fn > complex_fms_pattern::matches (complex_operation_t op, > slp_tree_to_load_perm_map_t *perm_cache, > + slp_compat_nodes_map_t *compat_cache, > slp_tree * ref_node, vec<slp_tree> *ops) > { > internal_fn ifn = IFN_LAST; > @@ -1197,6 +1258,8 @@ complex_fms_pattern::matches (complex_operation_t op, > if (!vect_match_expression_p (root, MINUS_EXPR)) > return IFN_LAST; > > + /* TODO: Support invariants here, with the new layout CADD now > + can 
match before we get a chance to try CFMS. */ > auto nodes = SLP_TREE_CHILDREN (root); > if (!vect_match_expression_p (nodes[1], MULT_EXPR) > || vect_detect_pair_op (nodes[0]) != PLUS_MINUS) > @@ -1217,16 +1280,14 @@ complex_fms_pattern::matches (complex_operation_t op, > || !vect_match_expression_p (l0node[1], MULT_EXPR)) > return IFN_LAST; > > - bool is_neg = vect_normalize_conj_loc (left_op); > - > - bool conj_first_operand = false; > - if (!vect_validate_multiplication (perm_cache, right_op, left_op, false, > - &conj_first_operand, true)) > + enum _conj_status status; > + if (!vect_validate_multiplication (perm_cache, compat_cache, right_op, > + left_op, true, &status)) > return IFN_LAST; > > - if (!is_neg) > + if (status == CONJ_NONE) > ifn = IFN_COMPLEX_FMS; > - else if (is_neg) > + else > ifn = IFN_COMPLEX_FMS_CONJ; > > if (!vect_pattern_validate_optab (ifn, *ref_node)) > @@ -1243,26 +1304,12 @@ complex_fms_pattern::matches (complex_operation_t op, > ops->quick_push (right_op[1]); > ops->quick_push (left_op[1]); > } > - else if (kind == PERM_TOP) > - { > - ops->quick_push (l0node[0]); > - ops->quick_push (right_op[1]); > - ops->quick_push (right_op[0]); > - ops->quick_push (left_op[0]); > - } > - else if (kind == PERM_EVENEVEN && !is_neg) > - { > - ops->quick_push (l0node[0]); > - ops->quick_push (right_op[1]); > - ops->quick_push (right_op[0]); > - ops->quick_push (left_op[0]); > - } > else > { > ops->quick_push (l0node[0]); > ops->quick_push (right_op[1]); > ops->quick_push (right_op[0]); > - ops->quick_push (left_op[1]); > + ops->quick_push (left_op[0]); > } > > return ifn; > @@ -1272,13 +1319,14 @@ complex_fms_pattern::matches (complex_operation_t op, > > vect_pattern* > complex_fms_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache, > + slp_compat_nodes_map_t *compat_cache, > slp_tree *node) > { > auto_vec<slp_tree> ops; > complex_operation_t op > = vect_detect_pair_op (*node, true, &ops); > internal_fn ifn > - = complex_fms_pattern::matches 
(op, perm_cache, node, &ops); > + = complex_fms_pattern::matches (op, perm_cache, compat_cache, node, &ops); > if (ifn == IFN_LAST) > return NULL; > > @@ -1305,9 +1353,24 @@ complex_fms_pattern::build (vec_info *vinfo) > SLP_TREE_CHILDREN (*this->m_node).create (3); > > /* First re-arrange the children. */ > + switch (this->m_ifn) > + { > + case IFN_COMPLEX_FMS: > + { > + SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[1]); > + SLP_TREE_CHILDREN (*this->m_node).quick_push (newnode); > + break; > + } > + case IFN_COMPLEX_FMS_CONJ: > + { > + SLP_TREE_CHILDREN (*this->m_node).quick_push (newnode); > + SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[1]); > + break; > + } > + default: > + gcc_unreachable (); > + } > SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[0]); > - SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[1]); > - SLP_TREE_CHILDREN (*this->m_node).quick_push (newnode); > > /* And then rewrite the node itself. */ > complex_pattern::build (vinfo); > @@ -1334,11 +1397,12 @@ class complex_operations_pattern : public complex_pattern > public: > void build (vec_info *); > static internal_fn > - matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *, > - vec<slp_tree> *); > + matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, > + slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *); > > static vect_pattern* > - recognize (slp_tree_to_load_perm_map_t *, slp_tree *); > + recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *, > + slp_tree *); > }; > > /* Dummy matches implementation for proxy object. 
*/ > @@ -1347,6 +1411,7 @@ internal_fn > complex_operations_pattern:: > matches (complex_operation_t /* op */, > slp_tree_to_load_perm_map_t * /* perm_cache */, > + slp_compat_nodes_map_t * /* compat_cache */, > slp_tree * /* ref_node */, vec<slp_tree> * /* ops */) > { > return IFN_LAST; > @@ -1356,6 +1421,7 @@ matches (complex_operation_t /* op */, > > vect_pattern* > complex_operations_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache, > + slp_compat_nodes_map_t *ccache, > slp_tree *node) > { > auto_vec<slp_tree> ops; > @@ -1363,15 +1429,15 @@ complex_operations_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache, > = vect_detect_pair_op (*node, true, &ops); > internal_fn ifn = IFN_LAST; > > - ifn = complex_fms_pattern::matches (op, perm_cache, node, &ops); > + ifn = complex_fms_pattern::matches (op, perm_cache, ccache, node, &ops); > if (ifn != IFN_LAST) > return complex_fms_pattern::mkInstance (node, &ops, ifn); > > - ifn = complex_mul_pattern::matches (op, perm_cache, node, &ops); > + ifn = complex_mul_pattern::matches (op, perm_cache, ccache, node, &ops); > if (ifn != IFN_LAST) > return complex_mul_pattern::mkInstance (node, &ops, ifn); > > - ifn = complex_add_pattern::matches (op, perm_cache, node, &ops); > + ifn = complex_add_pattern::matches (op, perm_cache, ccache, node, &ops); > if (ifn != IFN_LAST) > return complex_add_pattern::mkInstance (node, &ops, ifn); > > @@ -1398,11 +1464,13 @@ class addsub_pattern : public vect_pattern > void build (vec_info *); > > static vect_pattern* > - recognize (slp_tree_to_load_perm_map_t *, slp_tree *); > + recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *, > + slp_tree *); > }; > > vect_pattern * > -addsub_pattern::recognize (slp_tree_to_load_perm_map_t *, slp_tree *node_) > +addsub_pattern::recognize (slp_tree_to_load_perm_map_t *, > + slp_compat_nodes_map_t *, slp_tree *node_) > { > slp_tree node = *node_; > if (SLP_TREE_CODE (node) != VEC_PERM_EXPR > diff --git a/gcc/tree-vect-slp.c 
b/gcc/tree-vect-slp.c > index b912c3577df61a694d5bb9e22c5303fe6a48ab6e..cb577f8a612d583254e42bb06a6d7a0875de5e75 100644 > --- a/gcc/tree-vect-slp.c > +++ b/gcc/tree-vect-slp.c > @@ -804,7 +804,7 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned char swap, > /* Return true if call statements CALL1 and CALL2 are similar enough > to be combined into the same SLP group. */ > > -static bool > +bool > compatible_calls_p (gcall *call1, gcall *call2) > { > unsigned int nargs = gimple_call_num_args (call1); > @@ -2907,6 +2907,7 @@ optimize_load_redistribution (scalar_stmts_to_slp_tree_map_t *bst_map, > static bool > vect_match_slp_patterns_2 (slp_tree *ref_node, vec_info *vinfo, > slp_tree_to_load_perm_map_t *perm_cache, > + slp_compat_nodes_map_t *compat_cache, > hash_set<slp_tree> *visited) > { > unsigned i; > @@ -2918,11 +2919,13 @@ vect_match_slp_patterns_2 (slp_tree *ref_node, vec_info *vinfo, > slp_tree child; > FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child) > found_p |= vect_match_slp_patterns_2 (&SLP_TREE_CHILDREN (node)[i], > - vinfo, perm_cache, visited); > + vinfo, perm_cache, compat_cache, > + visited); > > for (unsigned x = 0; x < num__slp_patterns; x++) > { > - vect_pattern *pattern = slp_patterns[x] (perm_cache, ref_node); > + vect_pattern *pattern > + = slp_patterns[x] (perm_cache, compat_cache, ref_node); > if (pattern) > { > pattern->build (vinfo); > @@ -2943,7 +2946,8 @@ vect_match_slp_patterns_2 (slp_tree *ref_node, vec_info *vinfo, > static bool > vect_match_slp_patterns (slp_instance instance, vec_info *vinfo, > hash_set<slp_tree> *visited, > - slp_tree_to_load_perm_map_t *perm_cache) > + slp_tree_to_load_perm_map_t *perm_cache, > + slp_compat_nodes_map_t *compat_cache) > { > DUMP_VECT_SCOPE ("vect_match_slp_patterns"); > slp_tree *ref_node = &SLP_INSTANCE_TREE (instance); > @@ -2953,7 +2957,8 @@ vect_match_slp_patterns (slp_instance instance, vec_info *vinfo, > "Analyzing SLP tree %p for patterns\n", > SLP_INSTANCE_TREE (instance)); > > 
- return vect_match_slp_patterns_2 (ref_node, vinfo, perm_cache, visited); > + return vect_match_slp_patterns_2 (ref_node, vinfo, perm_cache, compat_cache, > + visited); > } > > /* STMT_INFO is a store group of size GROUP_SIZE that we are considering > @@ -3437,12 +3442,14 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size) > > hash_set<slp_tree> visited_patterns; > slp_tree_to_load_perm_map_t perm_cache; > + slp_compat_nodes_map_t compat_cache; > > /* See if any patterns can be found in the SLP tree. */ > bool pattern_found = false; > FOR_EACH_VEC_ELT (LOOP_VINFO_SLP_INSTANCES (vinfo), i, instance) > pattern_found |= vect_match_slp_patterns (instance, vinfo, > - &visited_patterns, &perm_cache); > + &visited_patterns, &perm_cache, > + &compat_cache); > > /* If any were found optimize permutations of loads. */ > if (pattern_found) > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h > index 2f6e1e268fb07e9de065ff9c45af87546e565d66..83cd0919c7838c65576e1debd881e0ec636a605a 100644 > --- a/gcc/tree-vectorizer.h > +++ b/gcc/tree-vectorizer.h > @@ -2268,6 +2268,7 @@ extern void duplicate_and_interleave (vec_info *, gimple_seq *, tree, > extern int vect_get_place_in_interleaving_chain (stmt_vec_info, stmt_vec_info); > extern slp_tree vect_create_new_slp_node (unsigned, tree_code); > extern void vect_free_slp_tree (slp_tree); > +extern bool compatible_calls_p (gcall *, gcall *); > > /* In tree-vect-patterns.c. */ > extern void > @@ -2306,6 +2307,12 @@ typedef enum _complex_perm_kinds { > typedef hash_map <slp_tree, complex_perm_kinds_t> > slp_tree_to_load_perm_map_t; > > +/* Cache from nodes pair to being compatible or not. */ > +typedef pair_hash <nofree_ptr_hash <_slp_tree>, > + nofree_ptr_hash <_slp_tree>> slp_node_hash; > +typedef hash_map <slp_node_hash, bool> slp_compat_nodes_map_t; > + > + > /* Vector pattern matcher base class. All SLP pattern matchers must inherit > from this type. 
*/ > > @@ -2338,7 +2345,8 @@ class vect_pattern > public: > > /* Create a new instance of the pattern matcher class of the given type. */ > - static vect_pattern* recognize (slp_tree_to_load_perm_map_t *, slp_tree *); > + static vect_pattern* recognize (slp_tree_to_load_perm_map_t *, > + slp_compat_nodes_map_t *, slp_tree *); > > /* Build the pattern from the data collected so far. */ > virtual void build (vec_info *) = 0; > @@ -2352,6 +2360,7 @@ class vect_pattern > > /* Function pointer to create a new pattern matcher from a generic type. */ > typedef vect_pattern* (*vect_pattern_decl_t) (slp_tree_to_load_perm_map_t *, > + slp_compat_nodes_map_t *, > slp_tree *); > > /* List of supported pattern matchers. */ > > > --
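A note on the new slp_compat_nodes_map_t in the patch above: it maps a pair of SLP nodes to a bool, which suggests a pair only needs to be compared once. Below is a minimal standalone sketch of that kind of pair-keyed cache. It assumes a hypothetical Node type with integer ids and std::map in place of GCC's slp_tree and hash_map, and compatible_p is an illustrative name rather than the patch's function, so treat it as an approximation and not as the vectorizer's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for an SLP node; this is not GCC's _slp_tree.
struct Node
{
  int id;                              // assumed unique per node
  int code;                            // operation performed by the node
  std::vector<const Node *> children;  // operand nodes
};

// Cache from a pair of node ids to "compatible or not".
using CompatCache = std::map<std::pair<int, int>, bool>;

// Return true if A and B perform the same operation on pairwise
// compatible children, consulting and updating CACHE along the way.
static bool
compatible_p (CompatCache &cache, const Node *a, const Node *b)
{
  assert (a && b);

  std::pair<int, int> key (a->id, b->id);
  auto it = cache.find (key);
  if (it != cache.end ())
    return it->second;

  // Seed with false so that every early exit below leaves a negative
  // entry behind without having to store it explicitly.
  cache[key] = false;

  if (a->code != b->code || a->children.size () != b->children.size ())
    return false;

  for (std::size_t i = 0; i < a->children.size (); i++)
    if (!compatible_p (cache, a->children[i], b->children[i]))
      return false;

  // Only a complete match overwrites the seeded entry.
  cache[key] = true;
  return true;
}
```

Seeding the entry with false before recursing means a failed comparison is cached on every early return, and only a complete match replaces it with true.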
> -----Original Message-----
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: Monday, January 10, 2022 1:00 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: GCC Patches <gcc-patches@gcc.gnu.org>; nd <nd@arm.com>; Richard Guenther <rguenther@suse.de>
> Subject: Re: [1/3 PATCH]middle-end vect: Simplify and extend the complex numbers validation routines.
>
> On Fri, Dec 17, 2021 at 4:44 PM Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> >
> > Hi All,
> >
> > This patch boosts the analysis for complex mul, fma and fms in order to ensure
> > that it doesn't create an incorrect output.
> >
> > Essentially it adds an extra verification to check that the two nodes it's going
> > to combine do the same operations on compatible values.  The reason it needs to
> > do this is that if one computation differs from the other then with the current
> > implementation we have no way to deal with it since we have to remove the
> > permute.
> >
> > When we can keep the permute around we can probably handle these by unrolling.
> >
> > While implementing this, since I have to do the traversal anyway, I took advantage
> > of it by simplifying the code a bit.  Previously we would determine whether
> > something is a conjugate and then try to figure out which conjugate it is and
> > then try to see if the permutes match what we expect.
> >
> > Now the code that does the traversal will detect this in one go and return to us
> > whether the operation is something that can be combined and whether a conjugate
> > is present.
> >
> > Secondly because it does this I can now simplify the checking code itself to
> > essentially just try to apply fixed patterns to each operation.
> >
> > The patterns represent the order operations should appear in.  For instance a
> > complex MUL operation combines:
> >
> >   Left 1 + Right 1
> >   Left 2 + Right 2
> >
> > with a permute on the nodes consisting of:
> >
> >   { Even, Even } + { Odd, Odd }
> >   { Even, Odd } + { Odd, Even }
> >
> > By abstracting over these patterns the checking code becomes quite simple.
> >
> > As part of this I was checking the order of the operands which was left in
> > "slp" order, as in, the same order they showed up in during SLP, which means
> > that the accumulator is first.  However it looks like I didn't document this
> > and the x86 optab was implemented assuming the same order as FMA, i.e. that
> > the accumulator is last.
> >
> > I have thus changed the order to match that of FMA and FMS which corrects the
> > x86 codegen and will update the Arm targets.  This has now also been
> > documented.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > x86_64-pc-linux-gnu and no regressions.
> >
> > Ok for master? and backport to GCC 11 after some stew?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> >         PR tree-optimization/102819
> >         PR tree-optimization/103169
> >         * doc/md.texi: Update docs for cfms, cfma.
> >         * tree-data-ref.h (same_data_refs): Accept optional offset.
> >         * tree-vect-slp-patterns.c (is_linear_load_p): Fix issue with repeating
> >         patterns.
> >         (vect_normalize_conj_loc): Remove.
> >         (is_eq_or_top): Change to take two nodes.
> >         (enum _conj_status, compatible_complex_nodes_p,
> >         vect_validate_multiplication): New.
> >         (class complex_add_pattern, complex_add_pattern::matches,
> >         complex_add_pattern::recognize, class complex_mul_pattern,
> >         complex_mul_pattern::recognize, class complex_fms_pattern,
> >         complex_fms_pattern::recognize, class complex_operations_pattern,
> >         complex_operations_pattern::recognize, addsub_pattern::recognize): Pass
> >         new cache.
> >         (complex_fms_pattern::matches, complex_mul_pattern::matches): Pass new
> >         cache and use new validation code.
> >         * tree-vect-slp.c (vect_match_slp_patterns_2, vect_match_slp_patterns,
> >         vect_analyze_slp): Pass along cache.
> >         (compatible_calls_p): Expose.
> >         * tree-vectorizer.h (compatible_calls_p, slp_node_hash,
> >         slp_compat_nodes_map_t): New.
> >         (class vect_pattern): Update signatures to include new cache.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         PR tree-optimization/102819
> >         PR tree-optimization/103169
> >         * g++.dg/vect/pr99149.cc: xfail for now.
> >         * gcc.dg/vect/complex/pr102819-1.c: New test.
> >         * gcc.dg/vect/complex/pr102819-2.c: New test.
> >         * gcc.dg/vect/complex/pr102819-3.c: New test.
> >         * gcc.dg/vect/complex/pr102819-4.c: New test.
> >         * gcc.dg/vect/complex/pr102819-5.c: New test.
> >         * gcc.dg/vect/complex/pr102819-6.c: New test.
> >         * gcc.dg/vect/complex/pr102819-7.c: New test.
> >         * gcc.dg/vect/complex/pr102819-8.c: New test.
> >         * gcc.dg/vect/complex/pr102819-9.c: New test.
> >         * gcc.dg/vect/complex/pr103169.c: New test.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > index 9ec051e94e10cca9eec2773e1b8c01b74b6ea4db..60dc5b3ea6087c2824ad1467bc66e9cfebe9dcfc 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -6325,12 +6325,12 @@ Perform a vector multiply and accumulate that is semantically the same as
> >  a multiply and accumulate of complex numbers.
> >
> >  @smallexample
> > -  complex TYPE c[N];
> > -  complex TYPE a[N];
> > -  complex TYPE b[N];
> > +  complex TYPE op0[N];
> > +  complex TYPE op1[N];
> > +  complex TYPE op2[N];
> >    for (int i = 0; i < N; i += 1)
> >      @{
> > -      c[i] += a[i] * b[i];
> > +      op2[i] += op1[i] * op2[i];
> >      @}
> >  @end smallexample
> >
> > @@ -6348,12 +6348,12 @@ the same as a multiply and accumulate of complex numbers where the second
> >  multiply arguments is conjugated.
> > > > @smallexample > > - complex TYPE c[N]; > > - complex TYPE a[N]; > > - complex TYPE b[N]; > > + complex TYPE op0[N]; > > + complex TYPE op1[N]; > > + complex TYPE op2[N]; > > for (int i = 0; i < N; i += 1) > > @{ > > - c[i] += a[i] * conj (b[i]); > > + op2[i] += op0[i] * conj (op1[i]); > > @} > > @end smallexample > > > > @@ -6370,12 +6370,12 @@ Perform a vector multiply and subtract that is > semantically the same as > > a multiply and subtract of complex numbers. > > > > @smallexample > > - complex TYPE c[N]; > > - complex TYPE a[N]; > > - complex TYPE b[N]; > > + complex TYPE op0[N]; > > + complex TYPE op1[N]; > > + complex TYPE op2[N]; > > for (int i = 0; i < N; i += 1) > > @{ > > - c[i] -= a[i] * b[i]; > > + op2[i] -= op0[i] * op1[i]; > > @} > > @end smallexample > > > > @@ -6393,12 +6393,12 @@ the same as a multiply and subtract of complex > numbers where the second > > multiply arguments is conjugated. > > > > @smallexample > > - complex TYPE c[N]; > > - complex TYPE a[N]; > > - complex TYPE b[N]; > > + complex TYPE op0[N]; > > + complex TYPE op1[N]; > > + complex TYPE op2[N]; > > for (int i = 0; i < N; i += 1) > > @{ > > - c[i] -= a[i] * conj (b[i]); > > + op2[i] -= op0[i] * conj (op1[i]); > > @} > > @end smallexample > > > > @@ -6415,12 +6415,12 @@ Perform a vector multiply that is semantically > the same as multiply of > > complex numbers. > > > > @smallexample > > - complex TYPE c[N]; > > - complex TYPE a[N]; > > - complex TYPE b[N]; > > + complex TYPE op0[N]; > > + complex TYPE op1[N]; > > + complex TYPE op2[N]; > > for (int i = 0; i < N; i += 1) > > @{ > > - c[i] = a[i] * b[i]; > > + op2[i] = op0[i] * op1[i]; > > @} > > @end smallexample > > > > @@ -6437,12 +6437,12 @@ Perform a vector multiply by conjugate that is > semantically the same as a > > multiply of complex numbers where the second multiply arguments is > conjugated. 
> > > > @smallexample > > - complex TYPE c[N]; > > - complex TYPE a[N]; > > - complex TYPE b[N]; > > + complex TYPE op0[N]; > > + complex TYPE op1[N]; > > + complex TYPE op2[N]; > > for (int i = 0; i < N; i += 1) > > @{ > > - c[i] = a[i] * conj (b[i]); > > + op2[i] = op0[i] * conj (op1[i]); > > @} > > @end smallexample > > > > diff --git a/gcc/testsuite/g++.dg/vect/pr99149.cc > b/gcc/testsuite/g++.dg/vect/pr99149.cc > > index > e6e0594a336fa053ffba64a12e2de43a4e373f49..bb9f5fa89f12b184368bf5488d > 6e9432c2166463 100755 > > --- a/gcc/testsuite/g++.dg/vect/pr99149.cc > > +++ b/gcc/testsuite/g++.dg/vect/pr99149.cc > > @@ -24,4 +24,4 @@ public: > > } n; > > main() { n.j(); } > > > > -/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_MUL" 1 "slp2" } } */ > > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_MUL" 1 "slp2" > { xfail { vect_float } } } } */ > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-1.c > b/gcc/testsuite/gcc.dg/vect/complex/pr102819-1.c > > new file mode 100644 > > index > 0000000000000000000000000000000000000000..46b9a55f05279d732fa1418e02 > f779cf693ede07 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-1.c > > @@ -0,0 +1,20 @@ > > +/* { dg-do compile } */ > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > + > > +float f[12][100]; > > + > > +void bad1(float v1, float v2) > > +{ > > + for (int r = 0; r < 100; r += 4) > > + { > > + int i = r + 1; > > + f[0][r] = f[1][r] * (f[2][r] + v2) - f[1][i] * (f[2][i] + v1); > > + f[0][i] = f[1][r] * (f[2][i] + v1) + f[1][i] * (f[2][r] + v2); > > + f[0][r+2] = f[1][r+2] * (f[2][r+2] + v2) - f[1][i+2] * (f[2][i+2] + v1); > > + f[0][i+2] = f[1][r+2] * (f[2][i+2] + v1) + f[1][i+2] * (f[2][r+2] + v2); > > + // ^^^^^^^ ^^^^^^^ > > + } > > +} > > + > > +/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target > { vect_float } } } } */ > > + > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-2.c > b/gcc/testsuite/gcc.dg/vect/complex/pr102819-2.c > > new 
file mode 100644 > > index > 0000000000000000000000000000000000000000..ffe646efe57f7ad07541b0fb96 > 601596f46dc5f8 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-2.c > > @@ -0,0 +1,17 @@ > > +/* { dg-do compile } */ > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > + > > +float f[12][100]; > > + > > +void bad1(float v1, float v2) > > +{ > > + for (int r = 0; r < 100; r += 2) > > + { > > + int i = r + 1; > > + f[0][r] = f[1][r] * (f[2][r] + v1) - f[1][i] * (f[2][i] + v2); > > + f[0][i] = f[1][r] * (f[2][i] + v1) + f[1][i] * (f[2][r] + v2); > > + } > > +} > > + > > +/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target > { vect_float } } } } */ > > + > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-3.c > b/gcc/testsuite/gcc.dg/vect/complex/pr102819-3.c > > new file mode 100644 > > index > 0000000000000000000000000000000000000000..5f98aa204d8b11b0cb433f8965 > dbb72cf8940de1 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-3.c > > @@ -0,0 +1,17 @@ > > +/* { dg-do compile } */ > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > + > > +float f[12][100]; > > + > > +void good1(float v1, float v2) > > +{ > > + for (int r = 0; r < 100; r += 2) > > + { > > + int i = r + 1; > > + f[0][r] = f[1][r] * (f[2][r] + v2) - f[1][i] * (f[2][i] + v1); > > + f[0][i] = f[1][r] * (f[2][i] + v1) + f[1][i] * (f[2][r] + v2); > > + } > > +} > > + > > +/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target > { vect_float } } } } */ > > + > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-4.c > b/gcc/testsuite/gcc.dg/vect/complex/pr102819-4.c > > new file mode 100644 > > index > 0000000000000000000000000000000000000000..882851789c5085e73400060911 > 4be480d3b08bd0 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-4.c > > @@ -0,0 +1,17 @@ > > +/* { dg-do compile } */ > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > + > > +float f[12][100]; > > + > > +void good1() 
> > +{ > > + for (int r = 0; r < 100; r += 2) > > + { > > + int i = r + 1; > > + f[0][r] = f[1][r] * f[2][r] - f[1][i] * f[2][i]; > > + f[0][i] = f[1][r] * f[2][i] + f[1][i] * f[2][r]; > > + } > > +} > > + > > +/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target > { vect_float } } } } */ > > + > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-5.c > b/gcc/testsuite/gcc.dg/vect/complex/pr102819-5.c > > new file mode 100644 > > index > 0000000000000000000000000000000000000000..6a2d549d65f3f27d407fb0bd46 > 9473e6a5c333ae > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-5.c > > @@ -0,0 +1,17 @@ > > +/* { dg-do compile } */ > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > + > > +float f[12][100]; > > + > > +void good2() > > +{ > > + for (int r = 0; r < 100; r += 2) > > + { > > + int i = r + 1; > > + f[0][r] = f[1][r] * (f[2][r] + 1) - f[1][i] * (f[2][i] + 1); > > + f[0][i] = f[1][r] * (f[2][i] + 1) + f[1][i] * (f[2][r] + 1); > > + } > > +} > > + > > +/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target > { vect_float } } } } */ > > + > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-6.c > b/gcc/testsuite/gcc.dg/vect/complex/pr102819-6.c > > new file mode 100644 > > index > 0000000000000000000000000000000000000000..71e66dbe3b29eec1fffb8df9b > 216022fdc0af54e > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-6.c > > @@ -0,0 +1,18 @@ > > +/* { dg-do compile } */ > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > + > > +float f[12][100]; > > + > > +void bad1() > > +{ > > + for (int r = 0; r < 100; r += 2) > > + { > > + int i = r + 1; > > + f[0][r] = f[1][r] * f[2][r] - f[1][i] * f[3][i]; > > + f[0][i] = f[1][r] * f[2][i] + f[1][i] * f[3][r]; > > + // ^^^^^^^ ^^^^^^^ > > + } > > +} > > + > > +/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target > { vect_float } } } } */ > > + > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-7.c > 
b/gcc/testsuite/gcc.dg/vect/complex/pr102819-7.c > > new file mode 100644 > > index > 0000000000000000000000000000000000000000..536672f3c8bb474ad5fa4bb61 > b3a36b555acf3cf > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-7.c > > @@ -0,0 +1,18 @@ > > +/* { dg-do compile } */ > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > + > > +float f[12][100]; > > + > > +void bad2() > > +{ > > + for (int r = 0; r < 100; r += 2) > > + { > > + int i = r + 1; > > + f[0][r] = f[1][r] * (f[2][r] + 1) - f[1][i] * f[2][i]; > > + f[0][i] = f[1][r] * (f[2][i] + 1) + f[1][i] * f[2][r]; > > + // ^^^^ > > + } > > +} > > + > > +/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target > { vect_float } } } } */ > > + > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-8.c > b/gcc/testsuite/gcc.dg/vect/complex/pr102819-8.c > > new file mode 100644 > > index > 0000000000000000000000000000000000000000..07b48148688b7d530e5891d02 > 3d558b58a485c23 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-8.c > > @@ -0,0 +1,18 @@ > > +/* { dg-do compile } */ > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > + > > +float f[12][100]; > > + > > +void bad3() > > +{ > > + for (int r = 0; r < 100; r += 2) > > + { > > + int i = r + 1; > > + f[0][r] = f[1][r] * f[2][r] - f[1][r] * f[2][i]; > > + f[0][i] = f[1][r] * f[2][i] + f[1][i] * f[2][r]; > > + // ^^^^^^^ > > + } > > +} > > + > > +/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target > { vect_float } } } } */ > > + > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-9.c > b/gcc/testsuite/gcc.dg/vect/complex/pr102819-9.c > > new file mode 100644 > > index > 0000000000000000000000000000000000000000..7655852434b21b381fe7ee316 > e8caf3d485b8ee1 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-9.c > > @@ -0,0 +1,21 @@ > > +/* { dg-do compile } */ > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > + > > +#include <stdio.h> > > 
+#include <complex.h> > > + > > +#define N 200 > > +#define TYPE float > > +#define TYPE2 float > > + > > +void g (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE > complex c[restrict N]) > > +{ > > + for (int i=0; i < N; i++) > > + { > > + c[i] -= a[i] * b[0]; > > + } > > +} > > + > > +/* The pattern overlaps with COMPLEX_ADD so we need to support > consuming ADDs in COMPLEX_FMS. */ > > + > > +/* { dg-final { scan-tree-dump "Found COMPLEX_FMS" "vect" { xfail > { vect_float } } } } */ > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr103169.c > b/gcc/testsuite/gcc.dg/vect/complex/pr103169.c > > new file mode 100644 > > index > 0000000000000000000000000000000000000000..1bfabbd85a0eedfb4156a8257 > 4324126e9083fc5 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/vect/complex/pr103169.c > > @@ -0,0 +1,12 @@ > > +/* { dg-do compile { target { vect_double } } } */ > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > +/* { dg-additional-options "-O2 -fvect-cost-model=unlimited" } */ > > + > > +_Complex double b_0, c_0; > > + > > +void > > +mul270snd (void) > > +{ > > + c_0 = b_0 * 1.0iF * 1.0iF; > > +} > > + > > diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h > > index > 74f579c9f3f23bac25d21546068c2ab43209aa2b..8ad5fa521279b20fa5e63eecf44 > 2d5dc5c16e7ee 100644 > > --- a/gcc/tree-data-ref.h > > +++ b/gcc/tree-data-ref.h > > @@ -600,10 +600,11 @@ same_data_refs_base_objects > (data_reference_p a, data_reference_p b) > > } > > > > /* Return true when the data references A and B are accessing the same > > - memory object with the same access functions. */ > > + memory object with the same access functions. Optionally skip the > > + last OFFSET dimensions in the data reference. */ > > But you skip the _first_ dimensions? That's because the dimensions seem to be laid out in reverse order, i.e. 
float f[12][200] with an access as f[1][r] gets a DR as: >>> p debug (dr1) #(Data Ref: # bb: 3 # stmt: _1 = f[1][r_20]; # ref: f[1][r_20]; # base_object: f; # Access function 0: {0, +, 2}_1 # Access function 1: 1 #) So index 0 has the outer most dimension. Cheers, Tamar > > Otherwise looks OK to me. > > Thanks, > Richard. > > > static inline bool > > -same_data_refs (data_reference_p a, data_reference_p b) > > +same_data_refs (data_reference_p a, data_reference_p b, int offset = 0) > > { > > unsigned int i; > > > > @@ -614,7 +615,7 @@ same_data_refs (data_reference_p a, > data_reference_p b) > > if (!same_data_refs_base_objects (a, b)) > > return false; > > > > - for (i = 0; i < DR_NUM_DIMENSIONS (a); i++) > > + for (i = offset; i < DR_NUM_DIMENSIONS (a); i++) > > if (!eq_evolutions_p (DR_ACCESS_FN (a, i), DR_ACCESS_FN (b, i))) > > return false; > > > > diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c > > index > 0350441fad9690cd5d04337171ca3470a064a571..f8da4153632a700680091f3730 > 5a5d3078fbb0c5 100644 > > --- a/gcc/tree-vect-slp-patterns.c > > +++ b/gcc/tree-vect-slp-patterns.c > > @@ -149,12 +149,13 @@ is_linear_load_p (load_permutation_t loads) > > int valid_patterns = 4; > > FOR_EACH_VEC_ELT (loads, i, load) > > { > > - if (candidates[0] != PERM_UNKNOWN && load != 1) > > + unsigned adj_load = load % 2; > > + if (candidates[0] != PERM_UNKNOWN && adj_load != 1) > > { > > candidates[0] = PERM_UNKNOWN; > > valid_patterns--; > > } > > - if (candidates[1] != PERM_UNKNOWN && load != 0) > > + if (candidates[1] != PERM_UNKNOWN && adj_load != 0) > > { > > candidates[1] = PERM_UNKNOWN; > > valid_patterns--; > > @@ -596,11 +597,12 @@ class complex_add_pattern : public > complex_pattern > > public: > > void build (vec_info *); > > static internal_fn > > - matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, > slp_tree *, > > - vec<slp_tree> *); > > + matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, > > + 
slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *); > > > > static vect_pattern* > > - recognize (slp_tree_to_load_perm_map_t *, slp_tree *); > > + recognize (slp_tree_to_load_perm_map_t *, > slp_compat_nodes_map_t *, > > + slp_tree *); > > > > static vect_pattern* > > mkInstance (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn) > > @@ -647,6 +649,7 @@ complex_add_pattern::build (vec_info *vinfo) > > internal_fn > > complex_add_pattern::matches (complex_operation_t op, > > slp_tree_to_load_perm_map_t *perm_cache, > > + slp_compat_nodes_map_t * /* compat_cache */, > > slp_tree *node, vec<slp_tree> *ops) > > { > > internal_fn ifn = IFN_LAST; > > @@ -692,13 +695,14 @@ complex_add_pattern::matches > (complex_operation_t op, > > > > vect_pattern* > > complex_add_pattern::recognize (slp_tree_to_load_perm_map_t > *perm_cache, > > + slp_compat_nodes_map_t *compat_cache, > > slp_tree *node) > > { > > auto_vec<slp_tree> ops; > > complex_operation_t op > > = vect_detect_pair_op (*node, true, &ops); > > internal_fn ifn > > - = complex_add_pattern::matches (op, perm_cache, node, &ops); > > + = complex_add_pattern::matches (op, perm_cache, compat_cache, > node, &ops); > > if (ifn == IFN_LAST) > > return NULL; > > > > @@ -709,147 +713,214 @@ complex_add_pattern::recognize > (slp_tree_to_load_perm_map_t *perm_cache, > > * complex_mul_pattern > > > ********************************************************** > ********************/ > > > > -/* Check to see if either of the trees in ARGS are a NEGATE_EXPR. If the > first > > - child (args[0]) is a NEGATE_EXPR then NEG_FIRST_P is set to TRUE. > > - > > - If a negate is found then the values in ARGS are reordered such that the > > - negate node is always the second one and the entry is replaced by the > child > > - of the negate node. */ > > +/* Helper function to check if PERM is KIND or PERM_TOP. 
*/ > > > > static inline bool > > -vect_normalize_conj_loc (vec<slp_tree> &args, bool *neg_first_p = NULL) > > +is_eq_or_top (slp_tree_to_load_perm_map_t *perm_cache, > > + slp_tree op1, complex_perm_kinds_t kind1, > > + slp_tree op2, complex_perm_kinds_t kind2) > > { > > - gcc_assert (args.length () == 2); > > - bool neg_found = false; > > - > > - if (vect_match_expression_p (args[0], NEGATE_EXPR)) > > - { > > - std::swap (args[0], args[1]); > > - neg_found = true; > > - if (neg_first_p) > > - *neg_first_p = true; > > - } > > - else if (vect_match_expression_p (args[1], NEGATE_EXPR)) > > - { > > - neg_found = true; > > - if (neg_first_p) > > - *neg_first_p = false; > > - } > > + complex_perm_kinds_t perm1 = linear_loads_p (perm_cache, op1); > > + if (perm1 != kind1 && perm1 != PERM_TOP) > > + return false; > > > > - if (neg_found) > > - args[1] = SLP_TREE_CHILDREN (args[1])[0]; > > + complex_perm_kinds_t perm2 = linear_loads_p (perm_cache, op2); > > + if (perm2 != kind2 && perm2 != PERM_TOP) > > + return false; > > > > - return neg_found; > > + return true; > > } > > > > -/* Helper function to check if PERM is KIND or PERM_TOP. */ > > +enum _conj_status { CONJ_NONE, CONJ_FST, CONJ_SND }; > > > > static inline bool > > -is_eq_or_top (complex_perm_kinds_t perm, complex_perm_kinds_t > kind) > > +compatible_complex_nodes_p (slp_compat_nodes_map_t > *compat_cache, > > + slp_tree a, int *pa, slp_tree b, int *pb) > > { > > - return perm == kind || perm == PERM_TOP; > > -} > > + bool *tmp; > > + std::pair<slp_tree, slp_tree> key = std::make_pair(a, b); > > + if ((tmp = compat_cache->get (key)) != NULL) > > + return *tmp; > > > > -/* Helper function that checks to see if LEFT_OP and RIGHT_OP are both > MULT_EXPR > > - nodes but also that they represent an operation that is either a complex > > - multiplication or a complex multiplication by conjugated value. 
> > + compat_cache->put (key, false); > > > > - Of the negation is expected to be in the first half of the tree (As required > > - by an FMS pattern) then NEG_FIRST is true. If the operation is a > conjugate > > - operation then CONJ_FIRST_OPERAND is set to indicate whether the > first or > > - second operand contains the conjugate operation. */ > > + if (SLP_TREE_CHILDREN (a).length () != SLP_TREE_CHILDREN (b).length ()) > > + return false; > > > > -static inline bool > > -vect_validate_multiplication (slp_tree_to_load_perm_map_t > *perm_cache, > > - const vec<slp_tree> &left_op, > > - const vec<slp_tree> &right_op, > > - bool neg_first, bool *conj_first_operand, > > - bool fms) > > -{ > > - /* The presence of a negation indicates that we have either a conjugate > or a > > - rotation. We need to distinguish which one. */ > > - *conj_first_operand = false; > > - complex_perm_kinds_t kind; > > - > > - /* Complex conjugates have the negation on the imaginary part of the > > - number where rotations affect the real component. So check if the > > - negation is on a dup of lane 1. */ > > - if (fms) > > + if (SLP_TREE_DEF_TYPE (a) != SLP_TREE_DEF_TYPE (b)) > > + return false; > > + > > + /* Only internal nodes can be loads, as such we can't check further if > they > > + are externals. */ > > + if (SLP_TREE_DEF_TYPE (a) != vect_internal_def) > > { > > - /* Canonicalization for fms is not consistent. So have to test both > > - variants to be sure. This needs to be fixed in the mid-end so > > - this part can be simpler. 
*/ > > - kind = linear_loads_p (perm_cache, right_op[0]); > > - if (!((is_eq_or_top (linear_loads_p (perm_cache, right_op[0]), > PERM_ODDODD) > > - && is_eq_or_top (linear_loads_p (perm_cache, right_op[1]), > > - PERM_ODDEVEN)) > > - || (kind == PERM_ODDEVEN > > - && is_eq_or_top (linear_loads_p (perm_cache, right_op[1]), > > - PERM_ODDODD)))) > > - return false; > > + for (unsigned i = 0; i < SLP_TREE_SCALAR_OPS (a).length (); i++) > > + { > > + tree op1 = SLP_TREE_SCALAR_OPS (a)[pa[i % 2]]; > > + tree op2 = SLP_TREE_SCALAR_OPS (b)[pb[i % 2]]; > > + if (!operand_equal_p (op1, op2, 0)) > > + return false; > > + } > > + > > + compat_cache->put (key, true); > > + return true; > > + } > > + > > + auto a_stmt = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (a)); > > + auto b_stmt = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (b)); > > + > > + if (gimple_code (a_stmt) != gimple_code (b_stmt)) > > + return false; > > + > > + /* code, children, type, externals, loads, constants */ > > + if (gimple_num_args (a_stmt) != gimple_num_args (b_stmt)) > > + return false; > > + > > + /* At this point, a and b are known to be the same gimple operations. */ > > + if (is_gimple_call (a_stmt)) > > + { > > + if (!compatible_calls_p (dyn_cast <gcall *> (a_stmt), > > + dyn_cast <gcall *> (b_stmt))) > > + return false; > > } > > + else if (!is_gimple_assign (a_stmt)) > > + return false; > > else > > { > > - if (linear_loads_p (perm_cache, right_op[1]) != PERM_ODDODD > > - && !is_eq_or_top (linear_loads_p (perm_cache, right_op[0]), > > - PERM_ODDEVEN)) > > + tree_code acode = gimple_assign_rhs_code (a_stmt); > > + tree_code bcode = gimple_assign_rhs_code (b_stmt); > > + if ((acode == REALPART_EXPR || acode == IMAGPART_EXPR) > > + && (bcode == REALPART_EXPR || bcode == IMAGPART_EXPR)) > > + return true; > > + > > + if (acode != bcode) > > return false; > > } > > > > - /* Deal with differences in indexes. */ > > - int index1 = fms ? 1 : 0; > > - int index2 = fms ? 
0 : 1; > > - > > - /* Check if the conjugate is on the second first or second operand. The > > - order of the node with the conjugate value determines this, and the > dup > > - node must be one of lane 0 of the same DR as the neg node. */ > > - kind = linear_loads_p (perm_cache, left_op[index1]); > > - if (kind == PERM_TOP) > > + if (!SLP_TREE_LOAD_PERMUTATION (a).exists () > > + || !SLP_TREE_LOAD_PERMUTATION (b).exists ()) > > { > > - if (linear_loads_p (perm_cache, left_op[index2]) == PERM_EVENODD) > > - return true; > > + for (unsigned i = 0; i < gimple_num_args (a_stmt); i++) > > + { > > + tree t1 = gimple_arg (a_stmt, i); > > + tree t2 = gimple_arg (b_stmt, i); > > + if (TREE_CODE (t1) != TREE_CODE (t2)) > > + return false; > > + > > + /* If SSA name then we will need to inspect the children > > + so we can punt here. */ > > + if (TREE_CODE (t1) == SSA_NAME) > > + continue; > > + > > + if (!operand_equal_p (t1, t2, 0)) > > + return false; > > + } > > } > > - else if (kind == PERM_EVENODD && !neg_first) > > + else > > { > > - if ((kind = linear_loads_p (perm_cache, left_op[index2])) != > PERM_EVENEVEN) > > + auto dr1 = STMT_VINFO_DATA_REF (SLP_TREE_REPRESENTATIVE (a)); > > + auto dr2 = STMT_VINFO_DATA_REF (SLP_TREE_REPRESENTATIVE (b)); > > + /* Don't check the last dimension as that's checked by the lineary > > + checks. This check is also much stricter than what we need > > + because it doesn't consider loading from adjacent elements > > + in the same struct as loading from the same base object. > > + But for now, I'll play it safe. 
*/ > > + if (!same_data_refs (dr1, dr2, 1)) > > return false; > > - return true; > > } > > - else if (kind == PERM_EVENEVEN && neg_first) > > + > > + for (unsigned i = 0; i < SLP_TREE_CHILDREN (a).length (); i++) > > { > > - if ((kind = linear_loads_p (perm_cache, left_op[index2])) != > PERM_EVENODD) > > + if (!compatible_complex_nodes_p (compat_cache, > > + SLP_TREE_CHILDREN (a)[i], pa, > > + SLP_TREE_CHILDREN (b)[i], pb)) > > return false; > > - > > - *conj_first_operand = true; > > - return true; > > } > > - else > > - return false; > > - > > - if (kind != PERM_EVENEVEN) > > - return false; > > > > + compat_cache->put (key, true); > > return true; > > } > > > > -/* Helper function to help distinguish between a conjugate and a rotation > in a > > - complex multiplication. The operations have similar shapes but the order > of > > - the load permutes are different. This function returns TRUE when the > order > > - is consistent with a multiplication or multiplication by conjugated > > - operand but returns FALSE if it's a multiplication by rotated operand. */ > > - > > static inline bool > > vect_validate_multiplication (slp_tree_to_load_perm_map_t > *perm_cache, > > - const vec<slp_tree> &op, > > - complex_perm_kinds_t permKind) > > + slp_compat_nodes_map_t *compat_cache, > > + vec<slp_tree> &left_op, > > + vec<slp_tree> &right_op, > > + bool subtract, > > + enum _conj_status *_status) > > { > > - /* The left node is the more common case, test it first. */ > > - if (!is_eq_or_top (linear_loads_p (perm_cache, op[0]), permKind)) > > + auto_vec<slp_tree> ops; > > + enum _conj_status stats = CONJ_NONE; > > + > > + /* The complex operations can occur in two layouts and two permute > sequences > > + so declare them and re-use them. */ > > + int styles[][4] = { { 0, 2, 1, 3} /* {L1, R1} + {L2, R2}. */ > > + , { 0, 3, 1, 2} /* {L1, R2} + {L2, R1}. */ > > + }; > > + > > + /* Now for the corresponding permutes that go with these values. 
*/ > > + complex_perm_kinds_t perms[][4] > > + = { { PERM_EVENEVEN, PERM_ODDODD, PERM_EVENODD, > PERM_ODDEVEN } > > + , { PERM_EVENODD, PERM_ODDEVEN, PERM_EVENEVEN, > PERM_ODDODD } > > + }; > > + > > + /* These permutes are used during comparisons of externals on which > > + we require strict equality. */ > > + int cq[][4][2] > > + = { { { 0, 0 }, { 1, 1 }, { 0, 1 }, { 1, 0 } } > > + , { { 0, 1 }, { 1, 0 }, { 0, 0 }, { 1, 1 } } > > + }; > > + > > + /* Default to style and perm 0, most operations use this one. */ > > + int style = 0; > > + int perm = subtract ? 1 : 0; > > + > > + /* Check if we have a negate operation, if so absorb the node and > continue > > + looking. */ > > + bool neg0 = vect_match_expression_p (right_op[0], NEGATE_EXPR); > > + bool neg1 = vect_match_expression_p (right_op[1], NEGATE_EXPR); > > + > > + /* Determine which style we're looking at. We only have different ones > > + whenever a conjugate is involved. */ > > + if (neg0 && neg1) > > + ; > > + else if (neg0) > > { > > - if (!is_eq_or_top (linear_loads_p (perm_cache, op[1]), permKind)) > > - return false; > > + right_op[0] = SLP_TREE_CHILDREN (right_op[0])[0]; > > + stats = CONJ_FST; > > + if (subtract) > > + perm = 0; > > } > > - return true; > > + else if (neg1) > > + { > > + right_op[1] = SLP_TREE_CHILDREN (right_op[1])[0]; > > + stats = CONJ_SND; > > + perm = 1; > > + } > > + > > + *_status = stats; > > + > > + /* Flatten the inputs after we've remapped them. */ > > + ops.create (4); > > + ops.safe_splice (left_op); > > + ops.safe_splice (right_op); > > + > > + /* Extract out the elements to check. */ > > + slp_tree op0 = ops[styles[style][0]]; > > + slp_tree op1 = ops[styles[style][1]]; > > + slp_tree op2 = ops[styles[style][2]]; > > + slp_tree op3 = ops[styles[style][3]]; > > + > > + /* Do cheapest test first. If failed no need to analyze further. 
*/ > > + if (linear_loads_p (perm_cache, op0) != perms[perm][0] > > + || linear_loads_p (perm_cache, op1) != perms[perm][1] > > + || !is_eq_or_top (perm_cache, op2, perms[perm][2], op3, > perms[perm][3])) > > + return false; > > + > > + return compatible_complex_nodes_p (compat_cache, op0, cq[perm][0], > op1, > > + cq[perm][1]) > > + && compatible_complex_nodes_p (compat_cache, op2, cq[perm][2], > op3, > > + cq[perm][3]); > > } > > > > /* This function combines two nodes containing only even and only odd > lanes > > @@ -908,11 +979,12 @@ class complex_mul_pattern : public > complex_pattern > > public: > > void build (vec_info *); > > static internal_fn > > - matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, > slp_tree *, > > - vec<slp_tree> *); > > + matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, > > + slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *); > > > > static vect_pattern* > > - recognize (slp_tree_to_load_perm_map_t *, slp_tree *); > > + recognize (slp_tree_to_load_perm_map_t *, > slp_compat_nodes_map_t *, > > + slp_tree *); > > > > static vect_pattern* > > mkInstance (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn) > > @@ -943,6 +1015,7 @@ class complex_mul_pattern : public > complex_pattern > > internal_fn > > complex_mul_pattern::matches (complex_operation_t op, > > slp_tree_to_load_perm_map_t *perm_cache, > > + slp_compat_nodes_map_t *compat_cache, > > slp_tree *node, vec<slp_tree> *ops) > > { > > internal_fn ifn = IFN_LAST; > > @@ -990,17 +1063,13 @@ complex_mul_pattern::matches > (complex_operation_t op, > > || linear_loads_p (perm_cache, left_op[1]) == PERM_ODDEVEN) > > return IFN_LAST; > > > > - bool neg_first = false; > > - bool conj_first_operand = false; > > - bool is_neg = vect_normalize_conj_loc (right_op, &neg_first); > > + enum _conj_status status; > > + if (!vect_validate_multiplication (perm_cache, compat_cache, left_op, > > + right_op, false, &status)) > > + return IFN_LAST; > > > > - if 
(!is_neg) > > + if (status == CONJ_NONE) > > { > > - /* A multiplication needs to multiply agains the real pair, otherwise > > - the pattern matches that of FMS. */ > > - if (!vect_validate_multiplication (perm_cache, left_op, > PERM_EVENEVEN) > > - || vect_normalize_conj_loc (left_op)) > > - return IFN_LAST; > > if (add0) > > ifn = IFN_COMPLEX_FMA; > > else > > @@ -1008,11 +1077,6 @@ complex_mul_pattern::matches > (complex_operation_t op, > > } > > else > > { > > - if (!vect_validate_multiplication (perm_cache, left_op, right_op, > > - neg_first, &conj_first_operand, > > - false)) > > - return IFN_LAST; > > - > > if(add0) > > ifn = IFN_COMPLEX_FMA_CONJ; > > else > > @@ -1029,19 +1093,13 @@ complex_mul_pattern::matches > (complex_operation_t op, > > ops->quick_push (add0); > > > > complex_perm_kinds_t kind = linear_loads_p (perm_cache, left_op[0]); > > - if (kind == PERM_EVENODD) > > - { > > - ops->quick_push (left_op[1]); > > - ops->quick_push (right_op[1]); > > - ops->quick_push (left_op[0]); > > - } > > - else if (kind == PERM_TOP) > > + if (kind == PERM_EVENODD || kind == PERM_TOP) > > { > > ops->quick_push (left_op[1]); > > ops->quick_push (right_op[1]); > > ops->quick_push (left_op[0]); > > } > > - else if (kind == PERM_EVENEVEN && !conj_first_operand) > > + else if (kind == PERM_EVENEVEN && status != CONJ_SND) > > { > > ops->quick_push (left_op[0]); > > ops->quick_push (right_op[0]); > > @@ -1061,13 +1119,14 @@ complex_mul_pattern::matches > (complex_operation_t op, > > > > vect_pattern* > > complex_mul_pattern::recognize (slp_tree_to_load_perm_map_t > *perm_cache, > > + slp_compat_nodes_map_t *compat_cache, > > slp_tree *node) > > { > > auto_vec<slp_tree> ops; > > complex_operation_t op > > = vect_detect_pair_op (*node, true, &ops); > > internal_fn ifn > > - = complex_mul_pattern::matches (op, perm_cache, node, &ops); > > + = complex_mul_pattern::matches (op, perm_cache, compat_cache, > node, &ops); > > if (ifn == IFN_LAST) > > return NULL; > > > > @@ 
-1097,8 +1156,8 @@ complex_mul_pattern::build (vec_info *vinfo) > > > > /* First re-arrange the children. */ > > SLP_TREE_CHILDREN (*this->m_node).reserve_exact (2); > > - SLP_TREE_CHILDREN (*this->m_node)[0] = this->m_ops[2]; > > - SLP_TREE_CHILDREN (*this->m_node)[1] = newnode; > > + SLP_TREE_CHILDREN (*this->m_node)[0] = newnode; > > + SLP_TREE_CHILDREN (*this->m_node)[1] = this->m_ops[2]; > > break; > > } > > case IFN_COMPLEX_FMA: > > @@ -1115,9 +1174,9 @@ complex_mul_pattern::build (vec_info *vinfo) > > > > /* First re-arrange the children. */ > > SLP_TREE_CHILDREN (*this->m_node).safe_grow (3); > > - SLP_TREE_CHILDREN (*this->m_node)[0] = this->m_ops[0]; > > + SLP_TREE_CHILDREN (*this->m_node)[0] = newnode; > > SLP_TREE_CHILDREN (*this->m_node)[1] = this->m_ops[3]; > > - SLP_TREE_CHILDREN (*this->m_node)[2] = newnode; > > + SLP_TREE_CHILDREN (*this->m_node)[2] = this->m_ops[0]; > > > > /* Tell the builder to expect an extra argument. */ > > this->m_num_args++; > > @@ -1147,11 +1206,12 @@ class complex_fms_pattern : public > complex_pattern > > public: > > void build (vec_info *); > > static internal_fn > > - matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, > slp_tree *, > > - vec<slp_tree> *); > > + matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, > > + slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *); > > > > static vect_pattern* > > - recognize (slp_tree_to_load_perm_map_t *, slp_tree *); > > + recognize (slp_tree_to_load_perm_map_t *, > slp_compat_nodes_map_t *, > > + slp_tree *); > > > > static vect_pattern* > > mkInstance (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn) > > @@ -1182,6 +1242,7 @@ class complex_fms_pattern : public > complex_pattern > > internal_fn > > complex_fms_pattern::matches (complex_operation_t op, > > slp_tree_to_load_perm_map_t *perm_cache, > > + slp_compat_nodes_map_t *compat_cache, > > slp_tree * ref_node, vec<slp_tree> *ops) > > { > > internal_fn ifn = IFN_LAST; > > @@ -1197,6 +1258,8 
@@ complex_fms_pattern::matches > (complex_operation_t op, > > if (!vect_match_expression_p (root, MINUS_EXPR)) > > return IFN_LAST; > > > > + /* TODO: Support invariants here, with the new layout CADD now > > + can match before we get a chance to try CFMS. */ > > auto nodes = SLP_TREE_CHILDREN (root); > > if (!vect_match_expression_p (nodes[1], MULT_EXPR) > > || vect_detect_pair_op (nodes[0]) != PLUS_MINUS) > > @@ -1217,16 +1280,14 @@ complex_fms_pattern::matches > (complex_operation_t op, > > || !vect_match_expression_p (l0node[1], MULT_EXPR)) > > return IFN_LAST; > > > > - bool is_neg = vect_normalize_conj_loc (left_op); > > - > > - bool conj_first_operand = false; > > - if (!vect_validate_multiplication (perm_cache, right_op, left_op, false, > > - &conj_first_operand, true)) > > + enum _conj_status status; > > + if (!vect_validate_multiplication (perm_cache, compat_cache, right_op, > > + left_op, true, &status)) > > return IFN_LAST; > > > > - if (!is_neg) > > + if (status == CONJ_NONE) > > ifn = IFN_COMPLEX_FMS; > > - else if (is_neg) > > + else > > ifn = IFN_COMPLEX_FMS_CONJ; > > > > if (!vect_pattern_validate_optab (ifn, *ref_node)) > > @@ -1243,26 +1304,12 @@ complex_fms_pattern::matches > (complex_operation_t op, > > ops->quick_push (right_op[1]); > > ops->quick_push (left_op[1]); > > } > > - else if (kind == PERM_TOP) > > - { > > - ops->quick_push (l0node[0]); > > - ops->quick_push (right_op[1]); > > - ops->quick_push (right_op[0]); > > - ops->quick_push (left_op[0]); > > - } > > - else if (kind == PERM_EVENEVEN && !is_neg) > > - { > > - ops->quick_push (l0node[0]); > > - ops->quick_push (right_op[1]); > > - ops->quick_push (right_op[0]); > > - ops->quick_push (left_op[0]); > > - } > > else > > { > > ops->quick_push (l0node[0]); > > ops->quick_push (right_op[1]); > > ops->quick_push (right_op[0]); > > - ops->quick_push (left_op[1]); > > + ops->quick_push (left_op[0]); > > } > > > > return ifn; > > @@ -1272,13 +1319,14 @@ complex_fms_pattern::matches > 
(complex_operation_t op, > > > > vect_pattern* > > complex_fms_pattern::recognize (slp_tree_to_load_perm_map_t > *perm_cache, > > + slp_compat_nodes_map_t *compat_cache, > > slp_tree *node) > > { > > auto_vec<slp_tree> ops; > > complex_operation_t op > > = vect_detect_pair_op (*node, true, &ops); > > internal_fn ifn > > - = complex_fms_pattern::matches (op, perm_cache, node, &ops); > > + = complex_fms_pattern::matches (op, perm_cache, compat_cache, > node, &ops); > > if (ifn == IFN_LAST) > > return NULL; > > > > @@ -1305,9 +1353,24 @@ complex_fms_pattern::build (vec_info *vinfo) > > SLP_TREE_CHILDREN (*this->m_node).create (3); > > > > /* First re-arrange the children. */ > > + switch (this->m_ifn) > > + { > > + case IFN_COMPLEX_FMS: > > + { > > + SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[1]); > > + SLP_TREE_CHILDREN (*this->m_node).quick_push (newnode); > > + break; > > + } > > + case IFN_COMPLEX_FMS_CONJ: > > + { > > + SLP_TREE_CHILDREN (*this->m_node).quick_push (newnode); > > + SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[1]); > > + break; > > + } > > + default: > > + gcc_unreachable (); > > + } > > SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[0]); > > - SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[1]); > > - SLP_TREE_CHILDREN (*this->m_node).quick_push (newnode); > > > > /* And then rewrite the node itself. 
*/ > > complex_pattern::build (vinfo); > > @@ -1334,11 +1397,12 @@ class complex_operations_pattern : public > complex_pattern > > public: > > void build (vec_info *); > > static internal_fn > > - matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, > slp_tree *, > > - vec<slp_tree> *); > > + matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, > > + slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *); > > > > static vect_pattern* > > - recognize (slp_tree_to_load_perm_map_t *, slp_tree *); > > + recognize (slp_tree_to_load_perm_map_t *, > slp_compat_nodes_map_t *, > > + slp_tree *); > > }; > > > > /* Dummy matches implementation for proxy object. */ > > @@ -1347,6 +1411,7 @@ internal_fn > > complex_operations_pattern:: > > matches (complex_operation_t /* op */, > > slp_tree_to_load_perm_map_t * /* perm_cache */, > > + slp_compat_nodes_map_t * /* compat_cache */, > > slp_tree * /* ref_node */, vec<slp_tree> * /* ops */) > > { > > return IFN_LAST; > > @@ -1356,6 +1421,7 @@ matches (complex_operation_t /* op */, > > > > vect_pattern* > > complex_operations_pattern::recognize (slp_tree_to_load_perm_map_t > *perm_cache, > > + slp_compat_nodes_map_t *ccache, > > slp_tree *node) > > { > > auto_vec<slp_tree> ops; > > @@ -1363,15 +1429,15 @@ complex_operations_pattern::recognize > (slp_tree_to_load_perm_map_t *perm_cache, > > = vect_detect_pair_op (*node, true, &ops); > > internal_fn ifn = IFN_LAST; > > > > - ifn = complex_fms_pattern::matches (op, perm_cache, node, &ops); > > + ifn = complex_fms_pattern::matches (op, perm_cache, ccache, node, > &ops); > > if (ifn != IFN_LAST) > > return complex_fms_pattern::mkInstance (node, &ops, ifn); > > > > - ifn = complex_mul_pattern::matches (op, perm_cache, node, &ops); > > + ifn = complex_mul_pattern::matches (op, perm_cache, ccache, node, > &ops); > > if (ifn != IFN_LAST) > > return complex_mul_pattern::mkInstance (node, &ops, ifn); > > > > - ifn = complex_add_pattern::matches (op, perm_cache, node, 
&ops); > > + ifn = complex_add_pattern::matches (op, perm_cache, ccache, node, > &ops); > > if (ifn != IFN_LAST) > > return complex_add_pattern::mkInstance (node, &ops, ifn); > > > > @@ -1398,11 +1464,13 @@ class addsub_pattern : public vect_pattern > > void build (vec_info *); > > > > static vect_pattern* > > - recognize (slp_tree_to_load_perm_map_t *, slp_tree *); > > + recognize (slp_tree_to_load_perm_map_t *, > slp_compat_nodes_map_t *, > > + slp_tree *); > > }; > > > > vect_pattern * > > -addsub_pattern::recognize (slp_tree_to_load_perm_map_t *, slp_tree > *node_) > > +addsub_pattern::recognize (slp_tree_to_load_perm_map_t *, > > + slp_compat_nodes_map_t *, slp_tree *node_) > > { > > slp_tree node = *node_; > > if (SLP_TREE_CODE (node) != VEC_PERM_EXPR > > diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c > > index > b912c3577df61a694d5bb9e22c5303fe6a48ab6e..cb577f8a612d583254e42bb06 > a6d7a0875de5e75 100644 > > --- a/gcc/tree-vect-slp.c > > +++ b/gcc/tree-vect-slp.c > > @@ -804,7 +804,7 @@ vect_get_and_check_slp_defs (vec_info *vinfo, > unsigned char swap, > > /* Return true if call statements CALL1 and CALL2 are similar enough > > to be combined into the same SLP group. 
*/ > > > > -static bool > > +bool > > compatible_calls_p (gcall *call1, gcall *call2) > > { > > unsigned int nargs = gimple_call_num_args (call1); > > @@ -2907,6 +2907,7 @@ optimize_load_redistribution > (scalar_stmts_to_slp_tree_map_t *bst_map, > > static bool > > vect_match_slp_patterns_2 (slp_tree *ref_node, vec_info *vinfo, > > slp_tree_to_load_perm_map_t *perm_cache, > > + slp_compat_nodes_map_t *compat_cache, > > hash_set<slp_tree> *visited) > > { > > unsigned i; > > @@ -2918,11 +2919,13 @@ vect_match_slp_patterns_2 (slp_tree > *ref_node, vec_info *vinfo, > > slp_tree child; > > FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child) > > found_p |= vect_match_slp_patterns_2 (&SLP_TREE_CHILDREN > (node)[i], > > - vinfo, perm_cache, visited); > > + vinfo, perm_cache, compat_cache, > > + visited); > > > > for (unsigned x = 0; x < num__slp_patterns; x++) > > { > > - vect_pattern *pattern = slp_patterns[x] (perm_cache, ref_node); > > + vect_pattern *pattern > > + = slp_patterns[x] (perm_cache, compat_cache, ref_node); > > if (pattern) > > { > > pattern->build (vinfo); > > @@ -2943,7 +2946,8 @@ vect_match_slp_patterns_2 (slp_tree *ref_node, > vec_info *vinfo, > > static bool > > vect_match_slp_patterns (slp_instance instance, vec_info *vinfo, > > hash_set<slp_tree> *visited, > > - slp_tree_to_load_perm_map_t *perm_cache) > > + slp_tree_to_load_perm_map_t *perm_cache, > > + slp_compat_nodes_map_t *compat_cache) > > { > > DUMP_VECT_SCOPE ("vect_match_slp_patterns"); > > slp_tree *ref_node = &SLP_INSTANCE_TREE (instance); > > @@ -2953,7 +2957,8 @@ vect_match_slp_patterns (slp_instance instance, > vec_info *vinfo, > > "Analyzing SLP tree %p for patterns\n", > > SLP_INSTANCE_TREE (instance)); > > > > - return vect_match_slp_patterns_2 (ref_node, vinfo, perm_cache, > visited); > > + return vect_match_slp_patterns_2 (ref_node, vinfo, perm_cache, > compat_cache, > > + visited); > > } > > > > /* STMT_INFO is a store group of size GROUP_SIZE that we are considering > > @@ 
-3437,12 +3442,14 @@ vect_analyze_slp (vec_info *vinfo, unsigned > max_tree_size) > > > > hash_set<slp_tree> visited_patterns; > > slp_tree_to_load_perm_map_t perm_cache; > > + slp_compat_nodes_map_t compat_cache; > > > > /* See if any patterns can be found in the SLP tree. */ > > bool pattern_found = false; > > FOR_EACH_VEC_ELT (LOOP_VINFO_SLP_INSTANCES (vinfo), i, instance) > > pattern_found |= vect_match_slp_patterns (instance, vinfo, > > - &visited_patterns, &perm_cache); > > + &visited_patterns, &perm_cache, > > + &compat_cache); > > > > /* If any were found optimize permutations of loads. */ > > if (pattern_found) > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h > > index > 2f6e1e268fb07e9de065ff9c45af87546e565d66..83cd0919c7838c65576e1debd8 > 81e0ec636a605a 100644 > > --- a/gcc/tree-vectorizer.h > > +++ b/gcc/tree-vectorizer.h > > @@ -2268,6 +2268,7 @@ extern void duplicate_and_interleave (vec_info *, > gimple_seq *, tree, > > extern int vect_get_place_in_interleaving_chain (stmt_vec_info, > stmt_vec_info); > > extern slp_tree vect_create_new_slp_node (unsigned, tree_code); > > extern void vect_free_slp_tree (slp_tree); > > +extern bool compatible_calls_p (gcall *, gcall *); > > > > /* In tree-vect-patterns.c. */ > > extern void > > @@ -2306,6 +2307,12 @@ typedef enum _complex_perm_kinds { > > typedef hash_map <slp_tree, complex_perm_kinds_t> > > slp_tree_to_load_perm_map_t; > > > > +/* Cache from nodes pair to being compatible or not. */ > > +typedef pair_hash <nofree_ptr_hash <_slp_tree>, > > + nofree_ptr_hash <_slp_tree>> slp_node_hash; > > +typedef hash_map <slp_node_hash, bool> slp_compat_nodes_map_t; > > + > > + > > /* Vector pattern matcher base class. All SLP pattern matchers must > inherit > > from this type. */ > > > > @@ -2338,7 +2345,8 @@ class vect_pattern > > public: > > > > /* Create a new instance of the pattern matcher class of the given type. 
> */ > > - static vect_pattern* recognize (slp_tree_to_load_perm_map_t *, > slp_tree *); > > + static vect_pattern* recognize (slp_tree_to_load_perm_map_t *, > > + slp_compat_nodes_map_t *, slp_tree *); > > > > /* Build the pattern from the data collected so far. */ > > virtual void build (vec_info *) = 0; > > @@ -2352,6 +2360,7 @@ class vect_pattern > > > > /* Function pointer to create a new pattern matcher from a generic type. > */ > > typedef vect_pattern* (*vect_pattern_decl_t) > (slp_tree_to_load_perm_map_t *, > > + slp_compat_nodes_map_t *, > > slp_tree *); > > > > /* List of supported pattern matchers. */ > > > > > > --
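For anyone following the compat_cache parameter threaded through the matchers quoted above: compatible_complex_nodes_p records a failure for the node pair before descending into the children, and only overwrites it with success at the end. The snippet below is a minimal standalone sketch of that caching scheme, not GCC code. Node, CompatCache and compatible_p are hypothetical stand-ins for slp_tree, slp_compat_nodes_map_t and compatible_complex_nodes_p, and the only property compared is a made-up "kind" tag.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for an SLP node: a kind tag plus children.
struct Node
{
  int kind;
  std::vector<Node *> children;
};

// Cache from a pair of nodes to "compatible or not" (assumed layout).
using CompatCache = std::map<std::pair<const Node *, const Node *>, bool>;

// Return true when A and B have the same kinds and shape, memoizing the
// answer per (A, B) pair.
static bool
compatible_p (CompatCache &cache, const Node *a, const Node *b)
{
  assert (a && b);

  auto key = std::make_pair (a, b);
  auto it = cache.find (key);
  if (it != cache.end ())
    return it->second;

  // Record failure first, so every early return below leaves a cached
  // answer behind without an explicit store on each path.
  cache[key] = false;

  if (a->kind != b->kind || a->children.size () != b->children.size ())
    return false;

  for (std::size_t i = 0; i < a->children.size (); i++)
    if (!compatible_p (cache, a->children[i], b->children[i]))
      return false;

  // All checks passed, overwrite the pessimistic entry.
  cache[key] = true;
  return true;
}
```

The cache is keyed on the ordered pair, so a query for (b, a) is a separate entry from (a, b) in this sketch.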
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 9ec051e94e10cca9eec2773e1b8c01b74b6ea4db..60dc5b3ea6087c2824ad1467bc66e9cfebe9dcfc 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -6325,12 +6325,12 @@ Perform a vector multiply and accumulate that is semantically the same as
 a multiply and accumulate of complex numbers.
 
 @smallexample
-  complex TYPE c[N];
-  complex TYPE a[N];
-  complex TYPE b[N];
+  complex TYPE op0[N];
+  complex TYPE op1[N];
+  complex TYPE op2[N];
   for (int i = 0; i < N; i += 1)
     @{
-      c[i] += a[i] * b[i];
+      op2[i] += op1[i] * op2[i];
     @}
 @end smallexample
 
@@ -6348,12 +6348,12 @@ the same as a multiply and accumulate of complex numbers where the second
 multiply arguments is conjugated.
 
 @smallexample
-  complex TYPE c[N];
-  complex TYPE a[N];
-  complex TYPE b[N];
+  complex TYPE op0[N];
+  complex TYPE op1[N];
+  complex TYPE op2[N];
   for (int i = 0; i < N; i += 1)
     @{
-      c[i] += a[i] * conj (b[i]);
+      op2[i] += op0[i] * conj (op1[i]);
     @}
 @end smallexample
 
@@ -6370,12 +6370,12 @@ Perform a vector multiply and subtract that is semantically the same as
 a multiply and subtract of complex numbers.
 
 @smallexample
-  complex TYPE c[N];
-  complex TYPE a[N];
-  complex TYPE b[N];
+  complex TYPE op0[N];
+  complex TYPE op1[N];
+  complex TYPE op2[N];
   for (int i = 0; i < N; i += 1)
     @{
-      c[i] -= a[i] * b[i];
+      op2[i] -= op0[i] * op1[i];
     @}
 @end smallexample
 
@@ -6393,12 +6393,12 @@ the same as a multiply and subtract of complex numbers where the second
 multiply arguments is conjugated.
 
 @smallexample
-  complex TYPE c[N];
-  complex TYPE a[N];
-  complex TYPE b[N];
+  complex TYPE op0[N];
+  complex TYPE op1[N];
+  complex TYPE op2[N];
   for (int i = 0; i < N; i += 1)
     @{
-      c[i] -= a[i] * conj (b[i]);
+      op2[i] -= op0[i] * conj (op1[i]);
     @}
 @end smallexample
 
@@ -6415,12 +6415,12 @@ Perform a vector multiply that is semantically the same as multiply of
 complex numbers.
 
 @smallexample
-  complex TYPE c[N];
-  complex TYPE a[N];
-  complex TYPE b[N];
+  complex TYPE op0[N];
+  complex TYPE op1[N];
+  complex TYPE op2[N];
   for (int i = 0; i < N; i += 1)
     @{
-      c[i] = a[i] * b[i];
+      op2[i] = op0[i] * op1[i];
     @}
 @end smallexample
 
@@ -6437,12 +6437,12 @@ Perform a vector multiply by conjugate that is semantically the same as a
 multiply of complex numbers where the second multiply arguments is conjugated.
 
 @smallexample
-  complex TYPE c[N];
-  complex TYPE a[N];
-  complex TYPE b[N];
+  complex TYPE op0[N];
+  complex TYPE op1[N];
+  complex TYPE op2[N];
   for (int i = 0; i < N; i += 1)
     @{
-      c[i] = a[i] * conj (b[i]);
+      op2[i] = op0[i] * conj (op1[i]);
     @}
 @end smallexample
 
diff --git a/gcc/testsuite/g++.dg/vect/pr99149.cc b/gcc/testsuite/g++.dg/vect/pr99149.cc
index e6e0594a336fa053ffba64a12e2de43a4e373f49..bb9f5fa89f12b184368bf5488d6e9432c2166463 100755
--- a/gcc/testsuite/g++.dg/vect/pr99149.cc
+++ b/gcc/testsuite/g++.dg/vect/pr99149.cc
@@ -24,4 +24,4 @@ public:
 } n;
 main() { n.j(); }
 
-/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_MUL" 1 "slp2" } } */
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_MUL" 1 "slp2" { xfail { vect_float } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-1.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..46b9a55f05279d732fa1418e02f779cf693ede07
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-1.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+
+float f[12][100];
+
+void bad1(float v1, float v2)
+{
+  for (int r = 0; r < 100; r += 4)
+    {
+      int i = r + 1;
+      f[0][r] = f[1][r] * (f[2][r] + v2) - f[1][i] * (f[2][i] + v1);
+      f[0][i] = f[1][r] * (f[2][i] + v1) + f[1][i] * (f[2][r] + v2);
+      f[0][r+2] = f[1][r+2] * (f[2][r+2] + v2) - f[1][i+2] * (f[2][i+2] + v1);
+      f[0][i+2] = f[1][r+2] * (f[2][i+2] + v1) + f[1][i+2] * (f[2][r+2] + v2);
+      // ^^^^^^^ ^^^^^^^
+    }
+}
+
+/* {
dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-2.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..ffe646efe57f7ad07541b0fb96601596f46dc5f8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-2.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+
+float f[12][100];
+
+void bad1(float v1, float v2)
+{
+  for (int r = 0; r < 100; r += 2)
+    {
+      int i = r + 1;
+      f[0][r] = f[1][r] * (f[2][r] + v1) - f[1][i] * (f[2][i] + v2);
+      f[0][i] = f[1][r] * (f[2][i] + v1) + f[1][i] * (f[2][r] + v2);
+    }
+}
+
+/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-3.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-3.c
new file mode 100644
index 0000000000000000000000000000000000000000..5f98aa204d8b11b0cb433f8965dbb72cf8940de1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-3.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+
+float f[12][100];
+
+void good1(float v1, float v2)
+{
+  for (int r = 0; r < 100; r += 2)
+    {
+      int i = r + 1;
+      f[0][r] = f[1][r] * (f[2][r] + v2) - f[1][i] * (f[2][i] + v1);
+      f[0][i] = f[1][r] * (f[2][i] + v1) + f[1][i] * (f[2][r] + v2);
+    }
+}
+
+/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-4.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-4.c
new file mode 100644
index 0000000000000000000000000000000000000000..882851789c5085e734000609114be480d3b08bd0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-4.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+
+float f[12][100];
+
+void good1()
+{
+  for (int r = 0; r < 100; r += 2)
+    {
+
      int i = r + 1;
+      f[0][r] = f[1][r] * f[2][r] - f[1][i] * f[2][i];
+      f[0][i] = f[1][r] * f[2][i] + f[1][i] * f[2][r];
+    }
+}
+
+/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-5.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-5.c
new file mode 100644
index 0000000000000000000000000000000000000000..6a2d549d65f3f27d407fb0bd469473e6a5c333ae
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-5.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+
+float f[12][100];
+
+void good2()
+{
+  for (int r = 0; r < 100; r += 2)
+    {
+      int i = r + 1;
+      f[0][r] = f[1][r] * (f[2][r] + 1) - f[1][i] * (f[2][i] + 1);
+      f[0][i] = f[1][r] * (f[2][i] + 1) + f[1][i] * (f[2][r] + 1);
+    }
+}
+
+/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-6.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-6.c
new file mode 100644
index 0000000000000000000000000000000000000000..71e66dbe3b29eec1fffb8df9b216022fdc0af54e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-6.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+
+float f[12][100];
+
+void bad1()
+{
+  for (int r = 0; r < 100; r += 2)
+    {
+      int i = r + 1;
+      f[0][r] = f[1][r] * f[2][r] - f[1][i] * f[3][i];
+      f[0][i] = f[1][r] * f[2][i] + f[1][i] * f[3][r];
+      // ^^^^^^^ ^^^^^^^
+    }
+}
+
+/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-7.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-7.c
new file mode 100644
index 0000000000000000000000000000000000000000..536672f3c8bb474ad5fa4bb61b3a36b555acf3cf
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-7.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-add-options arm_v8_3a_complex_neon }
*/
+
+float f[12][100];
+
+void bad2()
+{
+  for (int r = 0; r < 100; r += 2)
+    {
+      int i = r + 1;
+      f[0][r] = f[1][r] * (f[2][r] + 1) - f[1][i] * f[2][i];
+      f[0][i] = f[1][r] * (f[2][i] + 1) + f[1][i] * f[2][r];
+      // ^^^^
+    }
+}
+
+/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-8.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-8.c
new file mode 100644
index 0000000000000000000000000000000000000000..07b48148688b7d530e5891d023d558b58a485c23
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-8.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+
+float f[12][100];
+
+void bad3()
+{
+  for (int r = 0; r < 100; r += 2)
+    {
+      int i = r + 1;
+      f[0][r] = f[1][r] * f[2][r] - f[1][r] * f[2][i];
+      f[0][i] = f[1][r] * f[2][i] + f[1][i] * f[2][r];
+      // ^^^^^^^
+    }
+}
+
+/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-9.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-9.c
new file mode 100644
index 0000000000000000000000000000000000000000..7655852434b21b381fe7ee316e8caf3d485b8ee1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-9.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+
+#include <stdio.h>
+#include <complex.h>
+
+#define N 200
+#define TYPE float
+#define TYPE2 float
+
+void g (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+  for (int i=0; i < N; i++)
+    {
+      c[i] -= a[i] * b[0];
+    }
+}
+
+/* The pattern overlaps with COMPLEX_ADD so we need to support consuming ADDs in COMPLEX_FMS.
*/
+
+/* { dg-final { scan-tree-dump "Found COMPLEX_FMS" "vect" { xfail { vect_float } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr103169.c b/gcc/testsuite/gcc.dg/vect/complex/pr103169.c
new file mode 100644
index 0000000000000000000000000000000000000000..1bfabbd85a0eedfb4156a82574324126e9083fc5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr103169.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target { vect_double } } } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+/* { dg-additional-options "-O2 -fvect-cost-model=unlimited" } */
+
+_Complex double b_0, c_0;
+
+void
+mul270snd (void)
+{
+  c_0 = b_0 * 1.0iF * 1.0iF;
+}
+
diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
index 74f579c9f3f23bac25d21546068c2ab43209aa2b..8ad5fa521279b20fa5e63eecf442d5dc5c16e7ee 100644
--- a/gcc/tree-data-ref.h
+++ b/gcc/tree-data-ref.h
@@ -600,10 +600,11 @@ same_data_refs_base_objects (data_reference_p a, data_reference_p b)
 }
 
 /* Return true when the data references A and B are accessing the same
-   memory object with the same access functions.  */
+   memory object with the same access functions.  Optionally skip the
+   last OFFSET dimensions in the data reference.
*/
 
 static inline bool
-same_data_refs (data_reference_p a, data_reference_p b)
+same_data_refs (data_reference_p a, data_reference_p b, int offset = 0)
 {
   unsigned int i;
 
@@ -614,7 +615,7 @@ same_data_refs (data_reference_p a, data_reference_p b)
   if (!same_data_refs_base_objects (a, b))
     return false;
 
-  for (i = 0; i < DR_NUM_DIMENSIONS (a); i++)
+  for (i = offset; i < DR_NUM_DIMENSIONS (a); i++)
     if (!eq_evolutions_p (DR_ACCESS_FN (a, i), DR_ACCESS_FN (b, i)))
       return false;
 
diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
index 0350441fad9690cd5d04337171ca3470a064a571..f8da4153632a700680091f37305a5d3078fbb0c5 100644
--- a/gcc/tree-vect-slp-patterns.c
+++ b/gcc/tree-vect-slp-patterns.c
@@ -149,12 +149,13 @@ is_linear_load_p (load_permutation_t loads)
   int valid_patterns = 4;
   FOR_EACH_VEC_ELT (loads, i, load)
     {
-      if (candidates[0] != PERM_UNKNOWN && load != 1)
+      unsigned adj_load = load % 2;
+      if (candidates[0] != PERM_UNKNOWN && adj_load != 1)
         {
           candidates[0] = PERM_UNKNOWN;
           valid_patterns--;
         }
-      if (candidates[1] != PERM_UNKNOWN && load != 0)
+      if (candidates[1] != PERM_UNKNOWN && adj_load != 0)
         {
           candidates[1] = PERM_UNKNOWN;
           valid_patterns--;
@@ -596,11 +597,12 @@ class complex_add_pattern : public complex_pattern
   public:
     void build (vec_info *);
     static internal_fn
-    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *,
-             vec<slp_tree> *);
+    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *,
+             slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *);
 
     static vect_pattern*
-    recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
+    recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *,
+               slp_tree *);
 
     static vect_pattern*
     mkInstance (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn)
@@ -647,6 +649,7 @@ complex_add_pattern::build (vec_info *vinfo)
 internal_fn
 complex_add_pattern::matches (complex_operation_t op,
                               slp_tree_to_load_perm_map_t *perm_cache,
+                              slp_compat_nodes_map_t * /* compat_cache
*/,
                               slp_tree *node, vec<slp_tree> *ops)
 {
   internal_fn ifn = IFN_LAST;
@@ -692,13 +695,14 @@ complex_add_pattern::matches (complex_operation_t op,
 
 vect_pattern*
 complex_add_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
+                                slp_compat_nodes_map_t *compat_cache,
                                 slp_tree *node)
 {
   auto_vec<slp_tree> ops;
   complex_operation_t op
     = vect_detect_pair_op (*node, true, &ops);
   internal_fn ifn
-    = complex_add_pattern::matches (op, perm_cache, node, &ops);
+    = complex_add_pattern::matches (op, perm_cache, compat_cache, node, &ops);
   if (ifn == IFN_LAST)
     return NULL;
 
@@ -709,147 +713,214 @@ complex_add_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
  * complex_mul_pattern
  ******************************************************************************/
 
-/* Check to see if either of the trees in ARGS are a NEGATE_EXPR.  If the first
-   child (args[0]) is a NEGATE_EXPR then NEG_FIRST_P is set to TRUE.
-
-   If a negate is found then the values in ARGS are reordered such that the
-   negate node is always the second one and the entry is replaced by the child
-   of the negate node.  */
+/* Helper function to check if PERM is KIND or PERM_TOP.
*/ static inline bool -vect_normalize_conj_loc (vec<slp_tree> &args, bool *neg_first_p = NULL) +is_eq_or_top (slp_tree_to_load_perm_map_t *perm_cache, + slp_tree op1, complex_perm_kinds_t kind1, + slp_tree op2, complex_perm_kinds_t kind2) { - gcc_assert (args.length () == 2); - bool neg_found = false; - - if (vect_match_expression_p (args[0], NEGATE_EXPR)) - { - std::swap (args[0], args[1]); - neg_found = true; - if (neg_first_p) - *neg_first_p = true; - } - else if (vect_match_expression_p (args[1], NEGATE_EXPR)) - { - neg_found = true; - if (neg_first_p) - *neg_first_p = false; - } + complex_perm_kinds_t perm1 = linear_loads_p (perm_cache, op1); + if (perm1 != kind1 && perm1 != PERM_TOP) + return false; - if (neg_found) - args[1] = SLP_TREE_CHILDREN (args[1])[0]; + complex_perm_kinds_t perm2 = linear_loads_p (perm_cache, op2); + if (perm2 != kind2 && perm2 != PERM_TOP) + return false; - return neg_found; + return true; } -/* Helper function to check if PERM is KIND or PERM_TOP. */ +enum _conj_status { CONJ_NONE, CONJ_FST, CONJ_SND }; static inline bool -is_eq_or_top (complex_perm_kinds_t perm, complex_perm_kinds_t kind) +compatible_complex_nodes_p (slp_compat_nodes_map_t *compat_cache, + slp_tree a, int *pa, slp_tree b, int *pb) { - return perm == kind || perm == PERM_TOP; -} + bool *tmp; + std::pair<slp_tree, slp_tree> key = std::make_pair(a, b); + if ((tmp = compat_cache->get (key)) != NULL) + return *tmp; -/* Helper function that checks to see if LEFT_OP and RIGHT_OP are both MULT_EXPR - nodes but also that they represent an operation that is either a complex - multiplication or a complex multiplication by conjugated value. + compat_cache->put (key, false); - Of the negation is expected to be in the first half of the tree (As required - by an FMS pattern) then NEG_FIRST is true. If the operation is a conjugate - operation then CONJ_FIRST_OPERAND is set to indicate whether the first or - second operand contains the conjugate operation. 
*/ + if (SLP_TREE_CHILDREN (a).length () != SLP_TREE_CHILDREN (b).length ()) + return false; -static inline bool -vect_validate_multiplication (slp_tree_to_load_perm_map_t *perm_cache, - const vec<slp_tree> &left_op, - const vec<slp_tree> &right_op, - bool neg_first, bool *conj_first_operand, - bool fms) -{ - /* The presence of a negation indicates that we have either a conjugate or a - rotation. We need to distinguish which one. */ - *conj_first_operand = false; - complex_perm_kinds_t kind; - - /* Complex conjugates have the negation on the imaginary part of the - number where rotations affect the real component. So check if the - negation is on a dup of lane 1. */ - if (fms) + if (SLP_TREE_DEF_TYPE (a) != SLP_TREE_DEF_TYPE (b)) + return false; + + /* Only internal nodes can be loads, as such we can't check further if they + are externals. */ + if (SLP_TREE_DEF_TYPE (a) != vect_internal_def) { - /* Canonicalization for fms is not consistent. So have to test both - variants to be sure. This needs to be fixed in the mid-end so - this part can be simpler. 
*/ - kind = linear_loads_p (perm_cache, right_op[0]); - if (!((is_eq_or_top (linear_loads_p (perm_cache, right_op[0]), PERM_ODDODD) - && is_eq_or_top (linear_loads_p (perm_cache, right_op[1]), - PERM_ODDEVEN)) - || (kind == PERM_ODDEVEN - && is_eq_or_top (linear_loads_p (perm_cache, right_op[1]), - PERM_ODDODD)))) - return false; + for (unsigned i = 0; i < SLP_TREE_SCALAR_OPS (a).length (); i++) + { + tree op1 = SLP_TREE_SCALAR_OPS (a)[pa[i % 2]]; + tree op2 = SLP_TREE_SCALAR_OPS (b)[pb[i % 2]]; + if (!operand_equal_p (op1, op2, 0)) + return false; + } + + compat_cache->put (key, true); + return true; + } + + auto a_stmt = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (a)); + auto b_stmt = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (b)); + + if (gimple_code (a_stmt) != gimple_code (b_stmt)) + return false; + + /* code, children, type, externals, loads, constants */ + if (gimple_num_args (a_stmt) != gimple_num_args (b_stmt)) + return false; + + /* At this point, a and b are known to be the same gimple operations. */ + if (is_gimple_call (a_stmt)) + { + if (!compatible_calls_p (dyn_cast <gcall *> (a_stmt), + dyn_cast <gcall *> (b_stmt))) + return false; } + else if (!is_gimple_assign (a_stmt)) + return false; else { - if (linear_loads_p (perm_cache, right_op[1]) != PERM_ODDODD - && !is_eq_or_top (linear_loads_p (perm_cache, right_op[0]), - PERM_ODDEVEN)) + tree_code acode = gimple_assign_rhs_code (a_stmt); + tree_code bcode = gimple_assign_rhs_code (b_stmt); + if ((acode == REALPART_EXPR || acode == IMAGPART_EXPR) + && (bcode == REALPART_EXPR || bcode == IMAGPART_EXPR)) + return true; + + if (acode != bcode) return false; } - /* Deal with differences in indexes. */ - int index1 = fms ? 1 : 0; - int index2 = fms ? 0 : 1; - - /* Check if the conjugate is on the second first or second operand. The - order of the node with the conjugate value determines this, and the dup - node must be one of lane 0 of the same DR as the neg node. 
*/ - kind = linear_loads_p (perm_cache, left_op[index1]); - if (kind == PERM_TOP) + if (!SLP_TREE_LOAD_PERMUTATION (a).exists () + || !SLP_TREE_LOAD_PERMUTATION (b).exists ()) { - if (linear_loads_p (perm_cache, left_op[index2]) == PERM_EVENODD) - return true; + for (unsigned i = 0; i < gimple_num_args (a_stmt); i++) + { + tree t1 = gimple_arg (a_stmt, i); + tree t2 = gimple_arg (b_stmt, i); + if (TREE_CODE (t1) != TREE_CODE (t2)) + return false; + + /* If SSA name then we will need to inspect the children + so we can punt here. */ + if (TREE_CODE (t1) == SSA_NAME) + continue; + + if (!operand_equal_p (t1, t2, 0)) + return false; + } } - else if (kind == PERM_EVENODD && !neg_first) + else { - if ((kind = linear_loads_p (perm_cache, left_op[index2])) != PERM_EVENEVEN) + auto dr1 = STMT_VINFO_DATA_REF (SLP_TREE_REPRESENTATIVE (a)); + auto dr2 = STMT_VINFO_DATA_REF (SLP_TREE_REPRESENTATIVE (b)); + /* Don't check the last dimension as that's checked by the linearity + checks. This check is also much stricter than what we need + because it doesn't consider loading from adjacent elements + in the same struct as loading from the same base object. + But for now, I'll play it safe. */ + if (!same_data_refs (dr1, dr2, 1)) return false; - return true; } - else if (kind == PERM_EVENEVEN && neg_first) + + for (unsigned i = 0; i < SLP_TREE_CHILDREN (a).length (); i++) { - if ((kind = linear_loads_p (perm_cache, left_op[index2])) != PERM_EVENODD) + if (!compatible_complex_nodes_p (compat_cache, + SLP_TREE_CHILDREN (a)[i], pa, + SLP_TREE_CHILDREN (b)[i], pb)) return false; - - *conj_first_operand = true; - return true; } - else - return false; - - if (kind != PERM_EVENEVEN) - return false; + compat_cache->put (key, true); return true; } -/* Helper function to help distinguish between a conjugate and a rotation in a - complex multiplication. The operations have similar shapes but the order of - the load permutes are different.
This function returns TRUE when the order - is consistent with a multiplication or multiplication by conjugated - operand but returns FALSE if it's a multiplication by rotated operand. */ - static inline bool vect_validate_multiplication (slp_tree_to_load_perm_map_t *perm_cache, - const vec<slp_tree> &op, - complex_perm_kinds_t permKind) + slp_compat_nodes_map_t *compat_cache, + vec<slp_tree> &left_op, + vec<slp_tree> &right_op, + bool subtract, + enum _conj_status *_status) { - /* The left node is the more common case, test it first. */ - if (!is_eq_or_top (linear_loads_p (perm_cache, op[0]), permKind)) + auto_vec<slp_tree> ops; + enum _conj_status stats = CONJ_NONE; + + /* The complex operations can occur in two layouts and two permute sequences + so declare them and re-use them. */ + int styles[][4] = { { 0, 2, 1, 3} /* {L1, R1} + {L2, R2}. */ + , { 0, 3, 1, 2} /* {L1, R2} + {L2, R1}. */ + }; + + /* Now for the corresponding permutes that go with these values. */ + complex_perm_kinds_t perms[][4] + = { { PERM_EVENEVEN, PERM_ODDODD, PERM_EVENODD, PERM_ODDEVEN } + , { PERM_EVENODD, PERM_ODDEVEN, PERM_EVENEVEN, PERM_ODDODD } + }; + + /* These permutes are used during comparisons of externals on which + we require strict equality. */ + int cq[][4][2] + = { { { 0, 0 }, { 1, 1 }, { 0, 1 }, { 1, 0 } } + , { { 0, 1 }, { 1, 0 }, { 0, 0 }, { 1, 1 } } + }; + + /* Default to style and perm 0, most operations use this one. */ + int style = 0; + int perm = subtract ? 1 : 0; + + /* Check if we have a negate operation, if so absorb the node and continue + looking. */ + bool neg0 = vect_match_expression_p (right_op[0], NEGATE_EXPR); + bool neg1 = vect_match_expression_p (right_op[1], NEGATE_EXPR); + + /* Determine which style we're looking at. We only have different ones + whenever a conjugate is involved. 
*/ + if (neg0 && neg1) + ; + else if (neg0) { - if (!is_eq_or_top (linear_loads_p (perm_cache, op[1]), permKind)) - return false; + right_op[0] = SLP_TREE_CHILDREN (right_op[0])[0]; + stats = CONJ_FST; + if (subtract) + perm = 0; } - return true; + else if (neg1) + { + right_op[1] = SLP_TREE_CHILDREN (right_op[1])[0]; + stats = CONJ_SND; + perm = 1; + } + + *_status = stats; + + /* Flatten the inputs after we've remapped them. */ + ops.create (4); + ops.safe_splice (left_op); + ops.safe_splice (right_op); + + /* Extract out the elements to check. */ + slp_tree op0 = ops[styles[style][0]]; + slp_tree op1 = ops[styles[style][1]]; + slp_tree op2 = ops[styles[style][2]]; + slp_tree op3 = ops[styles[style][3]]; + + /* Do cheapest test first. If failed no need to analyze further. */ + if (linear_loads_p (perm_cache, op0) != perms[perm][0] + || linear_loads_p (perm_cache, op1) != perms[perm][1] + || !is_eq_or_top (perm_cache, op2, perms[perm][2], op3, perms[perm][3])) + return false; + + return compatible_complex_nodes_p (compat_cache, op0, cq[perm][0], op1, + cq[perm][1]) + && compatible_complex_nodes_p (compat_cache, op2, cq[perm][2], op3, + cq[perm][3]); } /* This function combines two nodes containing only even and only odd lanes @@ -908,11 +979,12 @@ class complex_mul_pattern : public complex_pattern public: void build (vec_info *); static internal_fn - matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *, - vec<slp_tree> *); + matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, + slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *); static vect_pattern* - recognize (slp_tree_to_load_perm_map_t *, slp_tree *); + recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *, + slp_tree *); static vect_pattern* mkInstance (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn) @@ -943,6 +1015,7 @@ class complex_mul_pattern : public complex_pattern internal_fn complex_mul_pattern::matches (complex_operation_t op, 
slp_tree_to_load_perm_map_t *perm_cache, + slp_compat_nodes_map_t *compat_cache, slp_tree *node, vec<slp_tree> *ops) { internal_fn ifn = IFN_LAST; @@ -990,17 +1063,13 @@ complex_mul_pattern::matches (complex_operation_t op, || linear_loads_p (perm_cache, left_op[1]) == PERM_ODDEVEN) return IFN_LAST; - bool neg_first = false; - bool conj_first_operand = false; - bool is_neg = vect_normalize_conj_loc (right_op, &neg_first); + enum _conj_status status; + if (!vect_validate_multiplication (perm_cache, compat_cache, left_op, + right_op, false, &status)) + return IFN_LAST; - if (!is_neg) + if (status == CONJ_NONE) { - /* A multiplication needs to multiply agains the real pair, otherwise - the pattern matches that of FMS. */ - if (!vect_validate_multiplication (perm_cache, left_op, PERM_EVENEVEN) - || vect_normalize_conj_loc (left_op)) - return IFN_LAST; if (add0) ifn = IFN_COMPLEX_FMA; else @@ -1008,11 +1077,6 @@ complex_mul_pattern::matches (complex_operation_t op, } else { - if (!vect_validate_multiplication (perm_cache, left_op, right_op, - neg_first, &conj_first_operand, - false)) - return IFN_LAST; - if(add0) ifn = IFN_COMPLEX_FMA_CONJ; else @@ -1029,19 +1093,13 @@ complex_mul_pattern::matches (complex_operation_t op, ops->quick_push (add0); complex_perm_kinds_t kind = linear_loads_p (perm_cache, left_op[0]); - if (kind == PERM_EVENODD) - { - ops->quick_push (left_op[1]); - ops->quick_push (right_op[1]); - ops->quick_push (left_op[0]); - } - else if (kind == PERM_TOP) + if (kind == PERM_EVENODD || kind == PERM_TOP) { ops->quick_push (left_op[1]); ops->quick_push (right_op[1]); ops->quick_push (left_op[0]); } - else if (kind == PERM_EVENEVEN && !conj_first_operand) + else if (kind == PERM_EVENEVEN && status != CONJ_SND) { ops->quick_push (left_op[0]); ops->quick_push (right_op[0]); @@ -1061,13 +1119,14 @@ complex_mul_pattern::matches (complex_operation_t op, vect_pattern* complex_mul_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache, + 
slp_compat_nodes_map_t *compat_cache, slp_tree *node) { auto_vec<slp_tree> ops; complex_operation_t op = vect_detect_pair_op (*node, true, &ops); internal_fn ifn - = complex_mul_pattern::matches (op, perm_cache, node, &ops); + = complex_mul_pattern::matches (op, perm_cache, compat_cache, node, &ops); if (ifn == IFN_LAST) return NULL; @@ -1097,8 +1156,8 @@ complex_mul_pattern::build (vec_info *vinfo) /* First re-arrange the children. */ SLP_TREE_CHILDREN (*this->m_node).reserve_exact (2); - SLP_TREE_CHILDREN (*this->m_node)[0] = this->m_ops[2]; - SLP_TREE_CHILDREN (*this->m_node)[1] = newnode; + SLP_TREE_CHILDREN (*this->m_node)[0] = newnode; + SLP_TREE_CHILDREN (*this->m_node)[1] = this->m_ops[2]; break; } case IFN_COMPLEX_FMA: @@ -1115,9 +1174,9 @@ complex_mul_pattern::build (vec_info *vinfo) /* First re-arrange the children. */ SLP_TREE_CHILDREN (*this->m_node).safe_grow (3); - SLP_TREE_CHILDREN (*this->m_node)[0] = this->m_ops[0]; + SLP_TREE_CHILDREN (*this->m_node)[0] = newnode; SLP_TREE_CHILDREN (*this->m_node)[1] = this->m_ops[3]; - SLP_TREE_CHILDREN (*this->m_node)[2] = newnode; + SLP_TREE_CHILDREN (*this->m_node)[2] = this->m_ops[0]; /* Tell the builder to expect an extra argument. 
*/ this->m_num_args++; @@ -1147,11 +1206,12 @@ class complex_fms_pattern : public complex_pattern public: void build (vec_info *); static internal_fn - matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *, - vec<slp_tree> *); + matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, + slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *); static vect_pattern* - recognize (slp_tree_to_load_perm_map_t *, slp_tree *); + recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *, + slp_tree *); static vect_pattern* mkInstance (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn) @@ -1182,6 +1242,7 @@ class complex_fms_pattern : public complex_pattern internal_fn complex_fms_pattern::matches (complex_operation_t op, slp_tree_to_load_perm_map_t *perm_cache, + slp_compat_nodes_map_t *compat_cache, slp_tree * ref_node, vec<slp_tree> *ops) { internal_fn ifn = IFN_LAST; @@ -1197,6 +1258,8 @@ complex_fms_pattern::matches (complex_operation_t op, if (!vect_match_expression_p (root, MINUS_EXPR)) return IFN_LAST; + /* TODO: Support invariants here, with the new layout CADD now + can match before we get a chance to try CFMS. 
*/ auto nodes = SLP_TREE_CHILDREN (root); if (!vect_match_expression_p (nodes[1], MULT_EXPR) || vect_detect_pair_op (nodes[0]) != PLUS_MINUS) @@ -1217,16 +1280,14 @@ complex_fms_pattern::matches (complex_operation_t op, || !vect_match_expression_p (l0node[1], MULT_EXPR)) return IFN_LAST; - bool is_neg = vect_normalize_conj_loc (left_op); - - bool conj_first_operand = false; - if (!vect_validate_multiplication (perm_cache, right_op, left_op, false, - &conj_first_operand, true)) + enum _conj_status status; + if (!vect_validate_multiplication (perm_cache, compat_cache, right_op, + left_op, true, &status)) return IFN_LAST; - if (!is_neg) + if (status == CONJ_NONE) ifn = IFN_COMPLEX_FMS; - else if (is_neg) + else ifn = IFN_COMPLEX_FMS_CONJ; if (!vect_pattern_validate_optab (ifn, *ref_node)) @@ -1243,26 +1304,12 @@ complex_fms_pattern::matches (complex_operation_t op, ops->quick_push (right_op[1]); ops->quick_push (left_op[1]); } - else if (kind == PERM_TOP) - { - ops->quick_push (l0node[0]); - ops->quick_push (right_op[1]); - ops->quick_push (right_op[0]); - ops->quick_push (left_op[0]); - } - else if (kind == PERM_EVENEVEN && !is_neg) - { - ops->quick_push (l0node[0]); - ops->quick_push (right_op[1]); - ops->quick_push (right_op[0]); - ops->quick_push (left_op[0]); - } else { ops->quick_push (l0node[0]); ops->quick_push (right_op[1]); ops->quick_push (right_op[0]); - ops->quick_push (left_op[1]); + ops->quick_push (left_op[0]); } return ifn; @@ -1272,13 +1319,14 @@ complex_fms_pattern::matches (complex_operation_t op, vect_pattern* complex_fms_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache, + slp_compat_nodes_map_t *compat_cache, slp_tree *node) { auto_vec<slp_tree> ops; complex_operation_t op = vect_detect_pair_op (*node, true, &ops); internal_fn ifn - = complex_fms_pattern::matches (op, perm_cache, node, &ops); + = complex_fms_pattern::matches (op, perm_cache, compat_cache, node, &ops); if (ifn == IFN_LAST) return NULL; @@ -1305,9 +1353,24 @@ 
complex_fms_pattern::build (vec_info *vinfo) SLP_TREE_CHILDREN (*this->m_node).create (3); /* First re-arrange the children. */ + switch (this->m_ifn) + { + case IFN_COMPLEX_FMS: + { + SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[1]); + SLP_TREE_CHILDREN (*this->m_node).quick_push (newnode); + break; + } + case IFN_COMPLEX_FMS_CONJ: + { + SLP_TREE_CHILDREN (*this->m_node).quick_push (newnode); + SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[1]); + break; + } + default: + gcc_unreachable (); + } SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[0]); - SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[1]); - SLP_TREE_CHILDREN (*this->m_node).quick_push (newnode); /* And then rewrite the node itself. */ complex_pattern::build (vinfo); @@ -1334,11 +1397,12 @@ class complex_operations_pattern : public complex_pattern public: void build (vec_info *); static internal_fn - matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *, - vec<slp_tree> *); + matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, + slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *); static vect_pattern* - recognize (slp_tree_to_load_perm_map_t *, slp_tree *); + recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *, + slp_tree *); }; /* Dummy matches implementation for proxy object. 
*/ @@ -1347,6 +1411,7 @@ internal_fn complex_operations_pattern:: matches (complex_operation_t /* op */, slp_tree_to_load_perm_map_t * /* perm_cache */, + slp_compat_nodes_map_t * /* compat_cache */, slp_tree * /* ref_node */, vec<slp_tree> * /* ops */) { return IFN_LAST; @@ -1356,6 +1421,7 @@ matches (complex_operation_t /* op */, vect_pattern* complex_operations_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache, + slp_compat_nodes_map_t *ccache, slp_tree *node) { auto_vec<slp_tree> ops; @@ -1363,15 +1429,15 @@ complex_operations_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache, = vect_detect_pair_op (*node, true, &ops); internal_fn ifn = IFN_LAST; - ifn = complex_fms_pattern::matches (op, perm_cache, node, &ops); + ifn = complex_fms_pattern::matches (op, perm_cache, ccache, node, &ops); if (ifn != IFN_LAST) return complex_fms_pattern::mkInstance (node, &ops, ifn); - ifn = complex_mul_pattern::matches (op, perm_cache, node, &ops); + ifn = complex_mul_pattern::matches (op, perm_cache, ccache, node, &ops); if (ifn != IFN_LAST) return complex_mul_pattern::mkInstance (node, &ops, ifn); - ifn = complex_add_pattern::matches (op, perm_cache, node, &ops); + ifn = complex_add_pattern::matches (op, perm_cache, ccache, node, &ops); if (ifn != IFN_LAST) return complex_add_pattern::mkInstance (node, &ops, ifn); @@ -1398,11 +1464,13 @@ class addsub_pattern : public vect_pattern void build (vec_info *); static vect_pattern* - recognize (slp_tree_to_load_perm_map_t *, slp_tree *); + recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *, + slp_tree *); }; vect_pattern * -addsub_pattern::recognize (slp_tree_to_load_perm_map_t *, slp_tree *node_) +addsub_pattern::recognize (slp_tree_to_load_perm_map_t *, + slp_compat_nodes_map_t *, slp_tree *node_) { slp_tree node = *node_; if (SLP_TREE_CODE (node) != VEC_PERM_EXPR diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c index 
b912c3577df61a694d5bb9e22c5303fe6a48ab6e..cb577f8a612d583254e42bb06a6d7a0875de5e75 100644 --- a/gcc/tree-vect-slp.c +++ b/gcc/tree-vect-slp.c @@ -804,7 +804,7 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned char swap, /* Return true if call statements CALL1 and CALL2 are similar enough to be combined into the same SLP group. */ -static bool +bool compatible_calls_p (gcall *call1, gcall *call2) { unsigned int nargs = gimple_call_num_args (call1); @@ -2907,6 +2907,7 @@ optimize_load_redistribution (scalar_stmts_to_slp_tree_map_t *bst_map, static bool vect_match_slp_patterns_2 (slp_tree *ref_node, vec_info *vinfo, slp_tree_to_load_perm_map_t *perm_cache, + slp_compat_nodes_map_t *compat_cache, hash_set<slp_tree> *visited) { unsigned i; @@ -2918,11 +2919,13 @@ vect_match_slp_patterns_2 (slp_tree *ref_node, vec_info *vinfo, slp_tree child; FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child) found_p |= vect_match_slp_patterns_2 (&SLP_TREE_CHILDREN (node)[i], - vinfo, perm_cache, visited); + vinfo, perm_cache, compat_cache, + visited); for (unsigned x = 0; x < num__slp_patterns; x++) { - vect_pattern *pattern = slp_patterns[x] (perm_cache, ref_node); + vect_pattern *pattern + = slp_patterns[x] (perm_cache, compat_cache, ref_node); if (pattern) { pattern->build (vinfo); @@ -2943,7 +2946,8 @@ vect_match_slp_patterns_2 (slp_tree *ref_node, vec_info *vinfo, static bool vect_match_slp_patterns (slp_instance instance, vec_info *vinfo, hash_set<slp_tree> *visited, - slp_tree_to_load_perm_map_t *perm_cache) + slp_tree_to_load_perm_map_t *perm_cache, + slp_compat_nodes_map_t *compat_cache) { DUMP_VECT_SCOPE ("vect_match_slp_patterns"); slp_tree *ref_node = &SLP_INSTANCE_TREE (instance); @@ -2953,7 +2957,8 @@ vect_match_slp_patterns (slp_instance instance, vec_info *vinfo, "Analyzing SLP tree %p for patterns\n", SLP_INSTANCE_TREE (instance)); - return vect_match_slp_patterns_2 (ref_node, vinfo, perm_cache, visited); + return vect_match_slp_patterns_2 (ref_node, vinfo, 
perm_cache, compat_cache, + visited); } /* STMT_INFO is a store group of size GROUP_SIZE that we are considering @@ -3437,12 +3442,14 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size) hash_set<slp_tree> visited_patterns; slp_tree_to_load_perm_map_t perm_cache; + slp_compat_nodes_map_t compat_cache; /* See if any patterns can be found in the SLP tree. */ bool pattern_found = false; FOR_EACH_VEC_ELT (LOOP_VINFO_SLP_INSTANCES (vinfo), i, instance) pattern_found |= vect_match_slp_patterns (instance, vinfo, - &visited_patterns, &perm_cache); + &visited_patterns, &perm_cache, + &compat_cache); /* If any were found optimize permutations of loads. */ if (pattern_found) diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index 2f6e1e268fb07e9de065ff9c45af87546e565d66..83cd0919c7838c65576e1debd881e0ec636a605a 100644 --- a/gcc/tree-vectorizer.h +++ b/gcc/tree-vectorizer.h @@ -2268,6 +2268,7 @@ extern void duplicate_and_interleave (vec_info *, gimple_seq *, tree, extern int vect_get_place_in_interleaving_chain (stmt_vec_info, stmt_vec_info); extern slp_tree vect_create_new_slp_node (unsigned, tree_code); extern void vect_free_slp_tree (slp_tree); +extern bool compatible_calls_p (gcall *, gcall *); /* In tree-vect-patterns.c. */ extern void @@ -2306,6 +2307,12 @@ typedef enum _complex_perm_kinds { typedef hash_map <slp_tree, complex_perm_kinds_t> slp_tree_to_load_perm_map_t; +/* Cache from nodes pair to being compatible or not. */ +typedef pair_hash <nofree_ptr_hash <_slp_tree>, + nofree_ptr_hash <_slp_tree>> slp_node_hash; +typedef hash_map <slp_node_hash, bool> slp_compat_nodes_map_t; + + /* Vector pattern matcher base class. All SLP pattern matchers must inherit from this type. */ @@ -2338,7 +2345,8 @@ class vect_pattern public: /* Create a new instance of the pattern matcher class of the given type. 
*/ - static vect_pattern* recognize (slp_tree_to_load_perm_map_t *, slp_tree *); + static vect_pattern* recognize (slp_tree_to_load_perm_map_t *, + slp_compat_nodes_map_t *, slp_tree *); /* Build the pattern from the data collected so far. */ virtual void build (vec_info *) = 0; @@ -2352,6 +2360,7 @@ class vect_pattern /* Function pointer to create a new pattern matcher from a generic type. */ typedef vect_pattern* (*vect_pattern_decl_t) (slp_tree_to_load_perm_map_t *, + slp_compat_nodes_map_t *, slp_tree *); /* List of supported pattern matchers. */
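For readers following the new compat_cache plumbing: compatible_complex_nodes_p in the patch memoizes its answer per pair of nodes. It stores false for the pair before doing any checks, so that every early exit leaves a cached negative result, and it overwrites that with true only once all children have compared equal. Below is a minimal standalone sketch of that scheme. It is not GCC code: Node, CompatCache and compatible_p are hypothetical stand-ins for slp_tree, slp_compat_nodes_map_t and the real helper, and the only properties compared are an operation code and the child count.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for an SLP node; this is not GCC's slp_tree.
struct Node {
  int code;                            // operation kind (illustrative only)
  std::vector<const Node *> children;  // operands of this node
};

// Hypothetical stand-in for slp_compat_nodes_map_t: (a, b) -> compatible?
using CompatCache = std::map<std::pair<const Node *, const Node *>, bool>;

// Return true when A and B have the same shape and operation codes,
// caching the result for each pair of nodes visited.
bool compatible_p(CompatCache &cache, const Node *a, const Node *b)
{
  assert(a && b);
  const auto key = std::make_pair(a, b);

  // Reuse an earlier answer for this exact pair if there is one.
  auto it = cache.find(key);
  if (it != cache.end())
    return it->second;

  // Record a pessimistic answer first, mirroring the patch, so that every
  // early return below leaves a cached "false" behind.
  cache[key] = false;

  if (a->code != b->code || a->children.size() != b->children.size())
    return false;

  // Both nodes look alike locally; they are compatible only if each pair
  // of corresponding children is compatible too.
  for (std::size_t i = 0; i < a->children.size(); ++i)
    if (!compatible_p(cache, a->children[i], b->children[i]))
      return false;

  cache[key] = true;
  return true;
}
```

The cache key is the ordered pair, so (a, b) and (b, a) are cached separately, which matches the std::make_pair (a, b) key used in the patch. The real function additionally compares gimple codes, call compatibility, scalar operands of externals and data references, none of which are modelled here.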