Message ID | mptfshacd3v.fsf@arm.com |
---|---|
State | New |
Series | vect: Ensure SLP nodes don't end up in multiple BB partitions [PR106787] |
On Fri, 2 Sep 2022, Richard Sandiford wrote:

> In the PR we have two REDUC_PLUS SLP instances that share a common
> load of stride 4.  Each instance also has a unique contiguous load.
>
> Initially all three loads are out of order, so have a nontrivial
> load permutation.  The layout pass puts them in order instead.
> For the two contiguous loads it is possible to do this by adjusting the
> SLP_LOAD_PERMUTATION to be { 0, 1, 2, 3 }.  But a SLP_LOAD_PERMUTATION
> of { 0, 4, 8, 12 } is rejected as unsupported, so the pass creates a
> separate VEC_PERM_EXPR instead.
>
> Later the stride-4 load's initial SLP_LOAD_PERMUTATION is rejected too,
> so that the load gets replaced by an external node built from scalars.
> We then have an external node feeding a VEC_PERM_EXPR.
>
> VEC_PERM_EXPRs created in this way do not have any associated
> SLP_TREE_SCALAR_STMTS.  This means that they do not affect the
> decision about which nodes should be in which subgraph for costing
> purposes.  If the VEC_PERM_EXPR is fed by a vect_external_def,
> then the VEC_PERM_EXPR's input doesn't affect that decision either.
>
> The net effect is that a shared VEC_PERM_EXPR fed by an external def
> can appear in more than one subgraph.  This triggered an ICE in
> vect_schedule_node, which (rightly) expects to be called no more
> than once for the same internal def.
>
> There seemed to be many possible fixes, including:
>
> (1) Replace unsupported loads with external defs *before* doing
>     the layout optimisation.  This would avoid the need for the
>     VEC_PERM_EXPR altogether.
>
> (2) If the target doesn't support a load in its original layout,
>     stop the layout optimisation from checking whether the target
>     supports loads in any new candidate layout.  In other words,
>     treat all layouts as if they were supported whenever the
>     original layout is not in fact supported.
>
>     I'd rather not do this.  In principle, the layout optimisation
>     could convert an unsupported layout to a supported one.
>     Selectively ignoring target support would work against that.
>
>     We could try to look specifically for loads that will need
>     to be decomposed, but that just seems like admitting that
>     things are happening in the wrong order.
>
> (3) Add SLP_TREE_SCALAR_STMTS to VEC_PERM_EXPRs.
>
>     That would be OK for this case, but wouldn't be possible
>     for external defs that represent existing vectors.

In general it's good to provide SLP_TREE_SCALAR_STMTS when we can,
but yes, that's not a fix for the actual problem.

> (4) Make vect_schedule_slp share SCC info between subgraphs.
>
>     It feels like that's working around the partitioning problem
>     rather than a real fix though.
>
> (5) Directly ensure that internal def nodes belong to a single
>     subgraph.
>
> (1) is probably the best long-term fix, but (5) is much simpler.
> The subgraph partitioning code already has a hash set to record
> which nodes have been visited; we just need to convert that to a
> map from nodes to instances instead.

Agreed.

> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

OK.

Thanks,
Richard.

> Richard
>
>
> gcc/
>         PR tree-optimization/106787
>         * tree-vect-slp.cc (vect_map_to_instance): New function, split out
>         from...
>         (vect_bb_partition_graph_r): ...here.  Replace the visited set
>         with a map from nodes to instances.  Ensure that a node only
>         appears in one partition.
>         (vect_bb_partition_graph): Update accordingly.
>
> gcc/testsuite/
>         * gcc.dg/vect/bb-slp-layout-19.c: New test.
> ---
>  gcc/testsuite/gcc.dg/vect/bb-slp-layout-19.c | 34 ++++++++++
>  gcc/tree-vect-slp.cc                         | 69 ++++++++++++--------
>  2 files changed, 77 insertions(+), 26 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-layout-19.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-layout-19.c b/gcc/testsuite/gcc.dg/vect/bb-slp-layout-19.c
> new file mode 100644
> index 00000000000..f075a83a25b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-layout-19.c
> @@ -0,0 +1,34 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-fno-tree-loop-vectorize" } */
> +
> +extern int a[][4], b[][4], c[][4], d[4], e[4];
> +void f()
> +{
> +  int t0 = a[0][3];
> +  int t1 = a[1][3];
> +  int t2 = a[2][3];
> +  int t3 = a[3][3];
> +  int a0 = 0, a1 = 0, a2 = 0, a3 = 0, b0 = 0, b1 = 0, b2 = 0, b3 = 0;
> +  for (int j = 0; j < 100; ++j)
> +    for (int i = 0; i < 400; i += 4)
> +      {
> +        a0 += b[i][3] * t0;
> +        a1 += b[i][2] * t1;
> +        a2 += b[i][1] * t2;
> +        a3 += b[i][0] * t3;
> +        b0 += c[i][3] * t0;
> +        b1 += c[i][2] * t1;
> +        b2 += c[i][1] * t2;
> +        b3 += c[i][0] * t3;
> +      }
> +  d[0] = a0;
> +  d[1] = a1;
> +  d[2] = a2;
> +  d[3] = a3;
> +  e[0] = b0;
> +  e[1] = b1;
> +  e[2] = b2;
> +  e[3] = b3;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "add new stmt: \[^\\n\\r\]* = VEC_PERM_EXPR" 3 "slp1" { target { vect_int_mult && vect_perm } } } } */
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 1cf79eee4a6..59ec66a6f96 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -6435,47 +6435,64 @@ get_ultimate_leader (slp_instance instance,
>    return instance;
>  }
>
> +namespace {
> +/* Subroutine of vect_bb_partition_graph_r.  Map KEY to INSTANCE in
> +   KEY_TO_INSTANCE, making INSTANCE the leader of any previous mapping
> +   for KEY.  Return true if KEY was already in KEY_TO_INSTANCE.
> +
> +   INSTANCE_LEADER is as for get_ultimate_leader.  */
> +
> +template<typename T>
> +bool
> +vect_map_to_instance (slp_instance instance, T key,
> +                      hash_map<T, slp_instance> &key_to_instance,
> +                      hash_map<slp_instance, slp_instance> &instance_leader)
> +{
> +  bool existed_p;
> +  slp_instance &key_instance = key_to_instance.get_or_insert (key, &existed_p);
> +  if (!existed_p)
> +    ;
> +  else if (key_instance != instance)
> +    {
> +      /* If we're running into a previously marked key make us the
> +         leader of the current ultimate leader.  This keeps the
> +         leader chain acyclic and works even when the current instance
> +         connects two previously independent graph parts.  */
> +      slp_instance key_leader
> +        = get_ultimate_leader (key_instance, instance_leader);
> +      if (key_leader != instance)
> +        instance_leader.put (key_leader, instance);
> +    }
> +  key_instance = instance;
> +  return existed_p;
> +}
> +}
> +
>  /* Worker of vect_bb_partition_graph, recurse on NODE.  */
>
>  static void
>  vect_bb_partition_graph_r (bb_vec_info bb_vinfo,
>                             slp_instance instance, slp_tree node,
>                             hash_map<stmt_vec_info, slp_instance> &stmt_to_instance,
> -                           hash_map<slp_instance, slp_instance> &instance_leader,
> -                           hash_set<slp_tree> &visited)
> +                           hash_map<slp_tree, slp_instance> &node_to_instance,
> +                           hash_map<slp_instance, slp_instance> &instance_leader)
>  {
>    stmt_vec_info stmt_info;
>    unsigned i;
>
>    FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info)
> -    {
> -      bool existed_p;
> -      slp_instance &stmt_instance
> -        = stmt_to_instance.get_or_insert (stmt_info, &existed_p);
> -      if (!existed_p)
> -        ;
> -      else if (stmt_instance != instance)
> -        {
> -          /* If we're running into a previously marked stmt make us the
> -             leader of the current ultimate leader.  This keeps the
> -             leader chain acyclic and works even when the current instance
> -             connects two previously independent graph parts.  */
> -          slp_instance stmt_leader
> -            = get_ultimate_leader (stmt_instance, instance_leader);
> -          if (stmt_leader != instance)
> -            instance_leader.put (stmt_leader, instance);
> -        }
> -      stmt_instance = instance;
> -    }
> +    vect_map_to_instance (instance, stmt_info, stmt_to_instance,
> +                          instance_leader);
>
> -  if (!SLP_TREE_SCALAR_STMTS (node).is_empty () && visited.add (node))
> +  if (vect_map_to_instance (instance, node, node_to_instance,
> +                            instance_leader))
>      return;
>
>    slp_tree child;
>    FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
>      if (child && SLP_TREE_DEF_TYPE (child) == vect_internal_def)
>        vect_bb_partition_graph_r (bb_vinfo, instance, child, stmt_to_instance,
> -                                 instance_leader, visited);
> +                                 node_to_instance, instance_leader);
>  }
>
>  /* Partition the SLP graph into pieces that can be costed independently.  */
> @@ -6489,16 +6506,16 @@ vect_bb_partition_graph (bb_vec_info bb_vinfo)
>       corresponding SLP graph entry and upon visiting a previously
>       marked stmt, make the stmts leader the current SLP graph entry.  */
>    hash_map<stmt_vec_info, slp_instance> stmt_to_instance;
> +  hash_map<slp_tree, slp_instance> node_to_instance;
>    hash_map<slp_instance, slp_instance> instance_leader;
> -  hash_set<slp_tree> visited;
>    slp_instance instance;
>    for (unsigned i = 0; bb_vinfo->slp_instances.iterate (i, &instance); ++i)
>      {
>        instance_leader.put (instance, instance);
>        vect_bb_partition_graph_r (bb_vinfo,
>                                   instance, SLP_INSTANCE_TREE (instance),
> -                                 stmt_to_instance, instance_leader,
> -                                 visited);
> +                                 stmt_to_instance, node_to_instance,
> +                                 instance_leader);
>      }
>
>    /* Then collect entries to each independent subgraph.  */
>
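For readers who want to see the shape of the fix outside the vectorizer, below is a minimal standalone sketch of the node-to-instance mapping with a leader chain. It is an illustration only, not GCC code: std::unordered_map stands in for GCC's hash_map, plain ints stand in for slp_instance and slp_tree, and the helper names (ultimate_leader, map_to_instance) are hypothetical simplifications of get_ultimate_leader and the new vect_map_to_instance.

```c++
// Editor's sketch of the leader-chain partitioning idea (not GCC code).
#include <cassert>
#include <unordered_map>

using Instance = int;   // placeholder for slp_instance
using Node = int;       // placeholder for slp_tree or stmt_vec_info

// Follow the leader chain to its end.  The real get_ultimate_leader in
// tree-vect-slp.cc also compresses the path; that is omitted here.
static Instance
ultimate_leader (Instance instance,
                 std::unordered_map<Instance, Instance> &leader)
{
  while (leader.at (instance) != instance)
    instance = leader.at (instance);
  return instance;
}

// Claim KEY for INSTANCE.  If another instance already claimed KEY,
// make that instance's ultimate leader point at INSTANCE, merging the
// two subgraphs.  Return true if KEY had been seen before, mirroring
// the role of vect_map_to_instance in the patch.
static bool
map_to_instance (Instance instance, Node key,
                 std::unordered_map<Node, Instance> &key_to_instance,
                 std::unordered_map<Instance, Instance> &leader)
{
  auto [it, inserted] = key_to_instance.emplace (key, instance);
  if (!inserted && it->second != instance)
    {
      Instance key_leader = ultimate_leader (it->second, leader);
      if (key_leader != instance)
        leader[key_leader] = instance;
    }
  it->second = instance;
  return !inserted;
}

int
main ()
{
  // Two SLP instances, 1 and 2, and a node 42 reachable from both
  // (e.g. the shared VEC_PERM_EXPR from the PR).
  std::unordered_map<Instance, Instance> leader = { {1, 1}, {2, 2} };
  std::unordered_map<Node, Instance> node_to_instance;

  // First visit: node 42 is new, so instance 1 claims it and would
  // recurse into its children.
  assert (!map_to_instance (1, 42, node_to_instance, leader));

  // Second visit from instance 2: the node is already claimed, so the
  // walk stops and the two instances end up in the same partition.
  assert (map_to_instance (2, 42, node_to_instance, leader));
  assert (ultimate_leader (1, leader) == ultimate_leader (2, leader));
  return 0;
}
```

The point of the change is visible here: the node itself, not just its SLP_TREE_SCALAR_STMTS, is used as a map key, so a VEC_PERM_EXPR with no scalar stmts (fed by an external def) still forces every instance that reaches it into a single costing partition and is scheduled only once.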