Message ID | 20240830104611.131000-1-manolis.tsamis@vrull.eu |
---|---|
State | New |
Headers | show |
Series | [RFC] Do not cache SLP trees across SLP instances [PR116352] | expand |
For more context please see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116352#c11. I've sent this as an RFC because it's hard to be sure that this was just uncovered by r15-2820-gab18785840d7b8 rather than being introduced by it. Still, to the best of my knowledge after reading through the code, this looked like a general issue to me and hence I'm proposing this fix. Thanks, Manolis On Fri, Aug 30, 2024 at 1:46 PM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote: > > Currently we do SLP CSE after permute optimization using a single map across > all SLP instances. These SLP instances can affect many different basic blocks > and the cache may replace a statement in one block with one from another block. > Because there are no further limitations and the blocks may be visited in an > arbitrary order, this may lead to a statement being used in paths that may not > be defined. This patch creates one map per SLP instance to address that. > > PR tree-optimization/116352 > > gcc/ChangeLog: > > * tree-vect-slp.cc (vect_optimize_slp): Use one scalar stmts to tree > map per SLP instance. > > gcc/testsuite/ChangeLog: > > * gcc.dg/pr116352.c: New test. > > Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu> > --- > > gcc/testsuite/gcc.dg/pr116352.c | 18 ++++++++++++++++++ > gcc/tree-vect-slp.cc | 12 +++++++----- > 2 files changed, 25 insertions(+), 5 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/pr116352.c > > diff --git a/gcc/testsuite/gcc.dg/pr116352.c b/gcc/testsuite/gcc.dg/pr116352.c > new file mode 100644 > index 00000000000..c427eff3c08 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/pr116352.c > @@ -0,0 +1,18 @@ > +/* PR tree-optimization/116352 */ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -fchecking" } */ > + > +int a; > +float b, c; > +void l(int h, int f, int g, float *e) > +{ > + for (int m = 0; m < h; m++) > + { > + float i = 2 * b, j = 2 * c; > + if (a) { > + e[m*4 + 0] = e[m*4 + 1] = (j - g * 0.5f); > + e[m*4 + 2] = e[m*4 + 3] = (i + f * 0.5f); > + } else > + e[m*4 + 0] = f * 0.5f + g * 0.5f; > + } > +} > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc > index cfdf59ad386..a8836c2fb04 100644 > --- a/gcc/tree-vect-slp.cc > +++ b/gcc/tree-vect-slp.cc > @@ -6356,13 +6356,15 @@ vect_optimize_slp (vec_info *vinfo) > vect_optimize_slp_pass (vinfo).run (); > > /* Apply CSE again to nodes after permute optimization. */ > - scalar_stmts_to_slp_tree_map_t *bst_map > - = new scalar_stmts_to_slp_tree_map_t (); > - > for (auto inst : vinfo->slp_instances) > - vect_cse_slp_nodes (bst_map, SLP_INSTANCE_TREE (inst)); > + { > + scalar_stmts_to_slp_tree_map_t *bst_map > + = new scalar_stmts_to_slp_tree_map_t (); > > - release_scalar_stmts_to_slp_tree_map (bst_map); > + vect_cse_slp_nodes (bst_map, SLP_INSTANCE_TREE (inst)); > + > + release_scalar_stmts_to_slp_tree_map (bst_map); > + } > } > > /* Gather loads reachable from the individual SLP graph entries. */ > -- > 2.34.1 >
On Fri, 30 Aug 2024, Manolis Tsamis wrote: > For more context please see > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116352#c11. > > I've sent this as an RFC because it's hard to be sure that this was > just uncovered by r15-2820-gab18785840d7b8 rather than being > introduced by it. > Still, to the best of my knowledge after reading through the code, > this looked like a general issue to me and hence I'm proposing this > fix. It's a general issue indeed. Note we do want to (re-)perform CSE even between SLP instances - we're CSEing only internal defs, not vect_{constant,external}_def at the moment. The reason is that we want to put two instances accessing the same stmts into the same partition (I think that would happen also without the CSE, they'd just not share vectors). Now - in this case we're indirectly CSEing vect_{constant,external}_def given we have _119 = {j_160, _75, i_159, _73}; _80 = VEC_PERM_EXPR <_119, _119, { 1, 1, 3, 3 }>; and _21 = {j_132, _75, i_131, _73}; = VEC_PERM_EXPR <_21, _21, { 1, 1, 3, 3 }>; CSE doesn't honor scheduling constraints - it relies on SSA defs being available (for the scalar lanes) but here, while the scalar lanes are available the def they are picked from are not. So in general I don't agree with the patch - it's a workaround only. Note we can't reject this at the point we want to schedule, and we do currently have no separate check of whether there is a valid schedule - instead we rely on this by construction which means you have a point in CSE violating this. One way to avoid this in CSE would be to not allow CSEing of non-bijective permutes like { 1, 1, 3, 3 } that do not select all of the input lanes (not selecting a lane that has a NULL scalar def would be OK I think). The other obvious thing in this particular case is that the permutes could all have been vector constructors, so instead of _119 = {j_160, _75, i_159, _73}; _80 = VEC_PERM_EXPR <_119, _119, { 1, 1, 3, 3 }>; _150 = VEC_PERM_EXPR <_119, _119, { 0, 0, 2, 2 }>; we'd have _80 = { _75, _75, _73, _73 }; _150 = { j_160, j_160, i_159, i_159 }; I think that would be a profitable simplification but of course it doesn't address the general issue. I'd have expected that the SLP permute optimization would eventually do this but I think it only ever considers bijective permutes. Richard. > Thanks, > Manolis > > On Fri, Aug 30, 2024 at 1:46 PM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote: > > > > Currently we do SLP CSE after permute optimization using a single map across > > all SLP instances. These SLP instances can affect many different basic blocks > > and the cache may replace a statement in one block with one from another block. > > Because there are no further limitations and the blocks may be visited in an > > arbitrary order, this may lead to a statement being used in paths that may not > > be defined. This patch creates one map per SLP instance to address that. > > > > PR tree-optimization/116352 > > > > gcc/ChangeLog: > > > > * tree-vect-slp.cc (vect_optimize_slp): Use one scalar stmts to tree > > map per SLP instance. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.dg/pr116352.c: New test. > > > > Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu> > > --- > > > > gcc/testsuite/gcc.dg/pr116352.c | 18 ++++++++++++++++++ > > gcc/tree-vect-slp.cc | 12 +++++++----- > > 2 files changed, 25 insertions(+), 5 deletions(-) > > create mode 100644 gcc/testsuite/gcc.dg/pr116352.c > > > > diff --git a/gcc/testsuite/gcc.dg/pr116352.c b/gcc/testsuite/gcc.dg/pr116352.c > > new file mode 100644 > > index 00000000000..c427eff3c08 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/pr116352.c > > @@ -0,0 +1,18 @@ > > +/* PR tree-optimization/116352 */ > > +/* { dg-do compile } */ > > +/* { dg-options "-O3 -fchecking" } */ > > + > > +int a; > > +float b, c; > > +void l(int h, int f, int g, float *e) > > +{ > > + for (int m = 0; m < h; m++) > > + { > > + float i = 2 * b, j = 2 * c; > > + if (a) { > > + e[m*4 + 0] = e[m*4 + 1] = (j - g * 0.5f); > > + e[m*4 + 2] = e[m*4 + 3] = (i + f * 0.5f); > > + } else > > + e[m*4 + 0] = f * 0.5f + g * 0.5f; > > + } > > +} > > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc > > index cfdf59ad386..a8836c2fb04 100644 > > --- a/gcc/tree-vect-slp.cc > > +++ b/gcc/tree-vect-slp.cc > > @@ -6356,13 +6356,15 @@ vect_optimize_slp (vec_info *vinfo) > > vect_optimize_slp_pass (vinfo).run (); > > > > /* Apply CSE again to nodes after permute optimization. */ > > - scalar_stmts_to_slp_tree_map_t *bst_map > > - = new scalar_stmts_to_slp_tree_map_t (); > > - > > for (auto inst : vinfo->slp_instances) > > - vect_cse_slp_nodes (bst_map, SLP_INSTANCE_TREE (inst)); > > + { > > + scalar_stmts_to_slp_tree_map_t *bst_map > > + = new scalar_stmts_to_slp_tree_map_t (); > > > > - release_scalar_stmts_to_slp_tree_map (bst_map); > > + vect_cse_slp_nodes (bst_map, SLP_INSTANCE_TREE (inst)); > > + > > + release_scalar_stmts_to_slp_tree_map (bst_map); > > + } > > } > > > > /* Gather loads reachable from the individual SLP graph entries. */ > > -- > > 2.34.1 > > >
On Fri, Aug 30, 2024 at 2:24 PM Richard Biener <rguenther@suse.de> wrote: > > On Fri, 30 Aug 2024, Manolis Tsamis wrote: > > > For more context please see > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116352#c11. > > > > I've sent this as an RFC because it's hard to be sure that this was > > just uncovered by r15-2820-gab18785840d7b8 rather than being > > introduced by it. > > Still, to the best of my knowledge after reading through the code, > > this looked like a general issue to me and hence I'm proposing this > > fix. > > It's a general issue indeed. Note we do want to (re-)perform CSE > even between SLP instances - we're CSEing only internal defs, > not vect_{constant,external}_def at the moment. > Ok. > The reason is that we want to put two instances accessing the > same stmts into the same partition (I think that would happen > also without the CSE, they'd just not share vectors). > > Now - in this case we're indirectly CSEing vect_{constant,external}_def > given we have > > _119 = {j_160, _75, i_159, _73}; > _80 = VEC_PERM_EXPR <_119, _119, { 1, 1, 3, 3 }>; > > and > > _21 = {j_132, _75, i_131, _73}; > = VEC_PERM_EXPR <_21, _21, { 1, 1, 3, 3 }>; > > CSE doesn't honor scheduling constraints - it relies on SSA defs > being available (for the scalar lanes) but here, while the scalar > lanes are available the def they are picked from are not. > > So in general I don't agree with the patch - it's a workaround only. > Agreed with that, based on the context you provided it is indeed a workaround at best. > Note we can't reject this at the point we want to schedule, and we > do currently have no separate check of whether there is a valid > schedule - instead we rely on this by construction which means you > have a point in CSE violating this. > > One way to avoid this in CSE would be to not allow CSEing of > non-bijective permutes like { 1, 1, 3, 3 } that do not select all > of the input lanes (not selecting a lane that has a NULL > scalar def would be OK I think). > > The other obvious thing in this particular case is that the > permutes could all have been vector constructors, so instead of > > _119 = {j_160, _75, i_159, _73}; > _80 = VEC_PERM_EXPR <_119, _119, { 1, 1, 3, 3 }>; > _150 = VEC_PERM_EXPR <_119, _119, { 0, 0, 2, 2 }>; > > we'd have > > _80 = { _75, _75, _73, _73 }; > _150 = { j_160, j_160, i_159, i_159 }; > > I think that would be a profitable simplification but of course > it doesn't address the general issue. I'd have expected that > the SLP permute optimization would eventually do this but I > think it only ever considers bijective permutes. > Ok thanks for all the details, We'll have to look into a different solution based on these. Manolis > Richard. > > > > Thanks, > > Manolis > > > > On Fri, Aug 30, 2024 at 1:46 PM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote: > > > > > > Currently we do SLP CSE after permute optimization using a single map across > > > all SLP instances. These SLP instances can affect many different basic blocks > > > and the cache may replace a statement in one block with one from another block. > > > Because there are no further limitations and the blocks may be visited in an > > > arbitrary order, this may lead to a statement being used in paths that may not > > > be defined. This patch creates one map per SLP instance to address that. > > > > > > PR tree-optimization/116352 > > > > > > gcc/ChangeLog: > > > > > > * tree-vect-slp.cc (vect_optimize_slp): Use one scalar stmts to tree > > > map per SLP instance. > > > > > > gcc/testsuite/ChangeLog: > > > > > > * gcc.dg/pr116352.c: New test. > > > > > > Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu> > > > --- > > > > > > gcc/testsuite/gcc.dg/pr116352.c | 18 ++++++++++++++++++ > > > gcc/tree-vect-slp.cc | 12 +++++++----- > > > 2 files changed, 25 insertions(+), 5 deletions(-) > > > create mode 100644 gcc/testsuite/gcc.dg/pr116352.c > > > > > > diff --git a/gcc/testsuite/gcc.dg/pr116352.c b/gcc/testsuite/gcc.dg/pr116352.c > > > new file mode 100644 > > > index 00000000000..c427eff3c08 > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.dg/pr116352.c > > > @@ -0,0 +1,18 @@ > > > +/* PR tree-optimization/116352 */ > > > +/* { dg-do compile } */ > > > +/* { dg-options "-O3 -fchecking" } */ > > > + > > > +int a; > > > +float b, c; > > > +void l(int h, int f, int g, float *e) > > > +{ > > > + for (int m = 0; m < h; m++) > > > + { > > > + float i = 2 * b, j = 2 * c; > > > + if (a) { > > > + e[m*4 + 0] = e[m*4 + 1] = (j - g * 0.5f); > > > + e[m*4 + 2] = e[m*4 + 3] = (i + f * 0.5f); > > > + } else > > > + e[m*4 + 0] = f * 0.5f + g * 0.5f; > > > + } > > > +} > > > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc > > > index cfdf59ad386..a8836c2fb04 100644 > > > --- a/gcc/tree-vect-slp.cc > > > +++ b/gcc/tree-vect-slp.cc > > > @@ -6356,13 +6356,15 @@ vect_optimize_slp (vec_info *vinfo) > > > vect_optimize_slp_pass (vinfo).run (); > > > > > > /* Apply CSE again to nodes after permute optimization. */ > > > - scalar_stmts_to_slp_tree_map_t *bst_map > > > - = new scalar_stmts_to_slp_tree_map_t (); > > > - > > > for (auto inst : vinfo->slp_instances) > > > - vect_cse_slp_nodes (bst_map, SLP_INSTANCE_TREE (inst)); > > > + { > > > + scalar_stmts_to_slp_tree_map_t *bst_map > > > + = new scalar_stmts_to_slp_tree_map_t (); > > > > > > - release_scalar_stmts_to_slp_tree_map (bst_map); > > > + vect_cse_slp_nodes (bst_map, SLP_INSTANCE_TREE (inst)); > > > + > > > + release_scalar_stmts_to_slp_tree_map (bst_map); > > > + } > > > } > > > > > > /* Gather loads reachable from the individual SLP graph entries. */ > > > -- > > > 2.34.1 > > > > > > > -- > Richard Biener <rguenther@suse.de> > SUSE Software Solutions Germany GmbH, > Frankenstrasse 146, 90461 Nuernberg, Germany; > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
On Fri, 30 Aug 2024, Manolis Tsamis wrote: > On Fri, Aug 30, 2024 at 2:24 PM Richard Biener <rguenther@suse.de> wrote: > > > > On Fri, 30 Aug 2024, Manolis Tsamis wrote: > > > > > For more context please see > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116352#c11. > > > > > > I've sent this as an RFC because it's hard to be sure that this was > > > just uncovered by r15-2820-gab18785840d7b8 rather than being > > > introduced by it. > > > Still, to the best of my knowledge after reading through the code, > > > this looked like a general issue to me and hence I'm proposing this > > > fix. > > > > It's a general issue indeed. Note we do want to (re-)perform CSE > > even between SLP instances - we're CSEing only internal defs, > > not vect_{constant,external}_def at the moment. > > > Ok. > > > The reason is that we want to put two instances accessing the > > same stmts into the same partition (I think that would happen > > also without the CSE, they'd just not share vectors). > > > > Now - in this case we're indirectly CSEing vect_{constant,external}_def > > given we have > > > > _119 = {j_160, _75, i_159, _73}; > > _80 = VEC_PERM_EXPR <_119, _119, { 1, 1, 3, 3 }>; > > > > and > > > > _21 = {j_132, _75, i_131, _73}; > > = VEC_PERM_EXPR <_21, _21, { 1, 1, 3, 3 }>; > > > > CSE doesn't honor scheduling constraints - it relies on SSA defs > > being available (for the scalar lanes) but here, while the scalar > > lanes are available the def they are picked from are not. > > > > So in general I don't agree with the patch - it's a workaround only. > > > Agreed with that, based on the context you provided it is indeed a > workaround at best. > > > Note we can't reject this at the point we want to schedule, and we > > do currently have no separate check of whether there is a valid > > schedule - instead we rely on this by construction which means you > > have a point in CSE violating this. > > > > One way to avoid this in CSE would be to not allow CSEing of > > non-bijective permutes like { 1, 1, 3, 3 } that do not select all > > of the input lanes (not selecting a lane that has a NULL > > scalar def would be OK I think). > > > > The other obvious thing in this particular case is that the > > permutes could all have been vector constructors, so instead of > > > > _119 = {j_160, _75, i_159, _73}; > > _80 = VEC_PERM_EXPR <_119, _119, { 1, 1, 3, 3 }>; > > _150 = VEC_PERM_EXPR <_119, _119, { 0, 0, 2, 2 }>; > > > > we'd have > > > > _80 = { _75, _75, _73, _73 }; > > _150 = { j_160, j_160, i_159, i_159 }; > > > > I think that would be a profitable simplification but of course > > it doesn't address the general issue. I'd have expected that > > the SLP permute optimization would eventually do this but I > > think it only ever considers bijective permutes. > > > Ok thanks for all the details, We'll have to look into a different > solution based on these. Btw, see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115764 which is I think the very same (or similar) issue. Richard. > Manolis > > > Richard. > > > > > > > Thanks, > > > Manolis > > > > > > On Fri, Aug 30, 2024 at 1:46 PM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote: > > > > > > > > Currently we do SLP CSE after permute optimization using a single map across > > > > all SLP instances. These SLP instances can affect many different basic blocks > > > > and the cache may replace a statement in one block with one from another block. > > > > Because there are no further limitations and the blocks may be visited in an > > > > arbitrary order, this may lead to a statement being used in paths that may not > > > > be defined. This patch creates one map per SLP instance to address that. > > > > > > > > PR tree-optimization/116352 > > > > > > > > gcc/ChangeLog: > > > > > > > > * tree-vect-slp.cc (vect_optimize_slp): Use one scalar stmts to tree > > > > map per SLP instance. > > > > > > > > gcc/testsuite/ChangeLog: > > > > > > > > * gcc.dg/pr116352.c: New test. > > > > > > > > Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu> > > > > --- > > > > > > > > gcc/testsuite/gcc.dg/pr116352.c | 18 ++++++++++++++++++ > > > > gcc/tree-vect-slp.cc | 12 +++++++----- > > > > 2 files changed, 25 insertions(+), 5 deletions(-) > > > > create mode 100644 gcc/testsuite/gcc.dg/pr116352.c > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/pr116352.c b/gcc/testsuite/gcc.dg/pr116352.c > > > > new file mode 100644 > > > > index 00000000000..c427eff3c08 > > > > --- /dev/null > > > > +++ b/gcc/testsuite/gcc.dg/pr116352.c > > > > @@ -0,0 +1,18 @@ > > > > +/* PR tree-optimization/116352 */ > > > > +/* { dg-do compile } */ > > > > +/* { dg-options "-O3 -fchecking" } */ > > > > + > > > > +int a; > > > > +float b, c; > > > > +void l(int h, int f, int g, float *e) > > > > +{ > > > > + for (int m = 0; m < h; m++) > > > > + { > > > > + float i = 2 * b, j = 2 * c; > > > > + if (a) { > > > > + e[m*4 + 0] = e[m*4 + 1] = (j - g * 0.5f); > > > > + e[m*4 + 2] = e[m*4 + 3] = (i + f * 0.5f); > > > > + } else > > > > + e[m*4 + 0] = f * 0.5f + g * 0.5f; > > > > + } > > > > +} > > > > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc > > > > index cfdf59ad386..a8836c2fb04 100644 > > > > --- a/gcc/tree-vect-slp.cc > > > > +++ b/gcc/tree-vect-slp.cc > > > > @@ -6356,13 +6356,15 @@ vect_optimize_slp (vec_info *vinfo) > > > > vect_optimize_slp_pass (vinfo).run (); > > > > > > > > /* Apply CSE again to nodes after permute optimization. */ > > > > - scalar_stmts_to_slp_tree_map_t *bst_map > > > > - = new scalar_stmts_to_slp_tree_map_t (); > > > > - > > > > for (auto inst : vinfo->slp_instances) > > > > - vect_cse_slp_nodes (bst_map, SLP_INSTANCE_TREE (inst)); > > > > + { > > > > + scalar_stmts_to_slp_tree_map_t *bst_map > > > > + = new scalar_stmts_to_slp_tree_map_t (); > > > > > > > > - release_scalar_stmts_to_slp_tree_map (bst_map); > > > > + vect_cse_slp_nodes (bst_map, SLP_INSTANCE_TREE (inst)); > > > > + > > > > + release_scalar_stmts_to_slp_tree_map (bst_map); > > > > + } > > > > } > > > > > > > > /* Gather loads reachable from the individual SLP graph entries. */ > > > > -- > > > > 2.34.1 > > > > > > > > > > > -- > > Richard Biener <rguenther@suse.de> > > SUSE Software Solutions Germany GmbH, > > Frankenstrasse 146, 90461 Nuernberg, Germany; > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg) >
diff --git a/gcc/testsuite/gcc.dg/pr116352.c b/gcc/testsuite/gcc.dg/pr116352.c new file mode 100644 index 00000000000..c427eff3c08 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr116352.c @@ -0,0 +1,18 @@ +/* PR tree-optimization/116352 */ +/* { dg-do compile } */ +/* { dg-options "-O3 -fchecking" } */ + +int a; +float b, c; +void l(int h, int f, int g, float *e) +{ + for (int m = 0; m < h; m++) + { + float i = 2 * b, j = 2 * c; + if (a) { + e[m*4 + 0] = e[m*4 + 1] = (j - g * 0.5f); + e[m*4 + 2] = e[m*4 + 3] = (i + f * 0.5f); + } else + e[m*4 + 0] = f * 0.5f + g * 0.5f; + } +} diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc index cfdf59ad386..a8836c2fb04 100644 --- a/gcc/tree-vect-slp.cc +++ b/gcc/tree-vect-slp.cc @@ -6356,13 +6356,15 @@ vect_optimize_slp (vec_info *vinfo) vect_optimize_slp_pass (vinfo).run (); /* Apply CSE again to nodes after permute optimization. */ - scalar_stmts_to_slp_tree_map_t *bst_map - = new scalar_stmts_to_slp_tree_map_t (); - for (auto inst : vinfo->slp_instances) - vect_cse_slp_nodes (bst_map, SLP_INSTANCE_TREE (inst)); + { + scalar_stmts_to_slp_tree_map_t *bst_map + = new scalar_stmts_to_slp_tree_map_t (); - release_scalar_stmts_to_slp_tree_map (bst_map); + vect_cse_slp_nodes (bst_map, SLP_INSTANCE_TREE (inst)); + + release_scalar_stmts_to_slp_tree_map (bst_map); + } } /* Gather loads reachable from the individual SLP graph entries. */
Currently we do SLP CSE after permute optimization using a single map across all SLP instances. These SLP instances can affect many different basic blocks and the cache may replace a statement in one block with one from another block. Because there are no further limitations and the blocks may be visited in an arbitrary order, this may lead to a statement being used in paths that may not be defined. This patch creates one map per SLP instance to address that. PR tree-optimization/116352 gcc/ChangeLog: * tree-vect-slp.cc (vect_optimize_slp): Use one scalar stmts to tree map per SLP instance. gcc/testsuite/ChangeLog: * gcc.dg/pr116352.c: New test. Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu> --- gcc/testsuite/gcc.dg/pr116352.c | 18 ++++++++++++++++++ gcc/tree-vect-slp.cc | 12 +++++++----- 2 files changed, 25 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/pr116352.c