Message ID | patch-14228-tamar@arm.com |
---|---|
State | New |
Headers | show |
Series | middle-end slp: fix accidental resource re-use of slp_tree (PR99220) | expand |
On Tue, 23 Feb 2021, Tamar Christina wrote: > Hi Richi, > > The attached testcase shows a bug where two nodes end up with the same pointer. > During the loop that analyzes all the instances > in optimize_load_redistribution_1 we do > > if (value) > { > SLP_TREE_REF_COUNT (value)++; > SLP_TREE_CHILDREN (root)[i] = value; > vect_free_slp_tree (node); > } > > when doing a replacement. When this is done and the refcount for the node > reaches 0, the node is removed, which allows the libc to return the pointer > again in the next call to new, which it does.. > > First instance > > note: node 0x5325f48 (max_nunits=1, refcnt=2) > note: op: VEC_PERM_EXPR > note: { } > note: lane permutation { 0[0] 1[1] 0[2] 1[3] } > note: children 0x5325db0 0x5325200 > > Second instance > > note: node 0x5325f48 (max_nunits=1, refcnt=1) > note: op: VEC_PERM_EXPR > note: { } > note: lane permutation { 0[0] 1[1] } > note: children 0x53255b8 0x5325530 > > This will end up with the illegal construction of > > note: node 0x53258e8 (max_nunits=2, refcnt=2) > note: op template: slp_patt_57 = .COMPLEX_MUL (_16, _16); > note: stmt 0 _16 = _14 - _15; > note: stmt 1 _23 = _17 + _22; > note: children 0x53257d8 0x5325d28 > note: node 0x53257d8 (max_nunits=2, refcnt=3) > note: op template: l$b_4 = MEM[(const struct a &)_3].b; > note: stmt 0 l$b_4 = MEM[(const struct a &)_3].b; > note: stmt 1 l$c_5 = MEM[(const struct a &)_3].c; > note: load permutation { 0 1 } > note: node 0x5325d28 (max_nunits=2, refcnt=8) > note: op template: l$b_4 = MEM[(const struct a &)_3].b; > note: stmt 0 l$b_4 = MEM[(const struct a &)_3].b; > note: stmt 1 l$c_5 = MEM[(const struct a &)_3].c; > note: stmt 2 l$b_4 = MEM[(const struct a &)_3].b; > note: stmt 3 l$c_5 = MEM[(const struct a &)_3].c; > note: load permutation { 0 1 0 1 } > > To prevent this my initial thought was to add the temporary VEC_PERM_EXPR nodes > to the bst_map cache and increase their refcnt one more. However since bst_map > is gated on scalar statements and these nodes have none we can't do that. > > Instead I realized that load_map is really only a visited list at the top level. > So instead of returning the reference, we should return NULL. > > What this means is that it will no replacement was found at that level. This is > fine since these VEC_PERM_EXPR are single use. So while the any other node is > an indication to use the cache, VEC_PERM_EXPR are an indication to avoid it. I don't understand really. Waiting for the other patch to be pushed so I can eventually have a look, but see below. > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > Ok for master? > > Thanks, > Tamar > > gcc/ChangeLog: > > PR tree-optimization/99220 > * tree-vect-slp.c (optimize_load_redistribution_1): Don't use > VEC_PERM_EXPR in cache. > > gcc/testsuite/ChangeLog: > > PR tree-optimization/99220 > * g++.dg/vect/pr99220.cc: New test. > > --- inline copy of patch -- > diff --git a/gcc/testsuite/g++.dg/vect/pr99220.cc b/gcc/testsuite/g++.dg/vect/pr99220.cc > new file mode 100755 > index 0000000000000000000000000000000000000000..ff3058832b742414202a8ada0a9dafc72c9a54aa > --- /dev/null > +++ b/gcc/testsuite/g++.dg/vect/pr99220.cc > @@ -0,0 +1,29 @@ > +/* { dg-do compile } */ > +/* { dg-additional-options "-w -O3 -march=armv8.3-a" { target { aarch64*-*-* } } } */ > + > +class a { > + float b; > + float c; > + > +public: > + a(float d, float e) : b(d), c(e) {} > + a operator+(a d) { return a(b + d.b, c + d.c); } > + a operator-(a d) { return a(b - d.b, c - d.c); } > + a operator*(a d) { return a(b * b - c * c, b * c + c * d.b); } > +}; > +long f; > +a *g; > +class { > + a *h; > + long i; > + a *j; > + > +public: > + void k() { > + a l = h[0], m = g[i], n = l * g[1], o = l * j[8]; > + g[i] = m + n; > + g[i + 1] = m - n; > + j[f] = o; > + } > +} p; > +main() { p.k(); } > diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c > index 605873714a5cafaaf822f61f1f769f96b3876694..e631463be8fc5b2de355e674a9c96665beb9516c 100644 > --- a/gcc/tree-vect-slp.c > +++ b/gcc/tree-vect-slp.c > @@ -2292,7 +2292,12 @@ optimize_load_redistribution_1 (scalar_stmts_to_slp_tree_map_t *bst_map, > slp_tree root) > { > if (slp_tree *leader = load_map->get (root)) > - return *leader; > + { > + if (SLP_TREE_CODE (root) == VEC_PERM_EXPR) > + return NULL; But this will then only optimize the first occurance. Wouldn't it be better to increase the refcount at load_map->put (root, node); and walk load_map at the end, releasing refs owned by it like we do for bst_map? > + else > + return *leader; > + } > > load_map->put (root, NULL); > > > >
Hi Richi, This is an updated patch with your suggestion. Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. Ok for master? Thanks, Tamar gcc/ChangeLog: PR tree-optimization/99220 * tree-vect-slp.c (optimize_load_redistribution_1): Remove node from cache when it's about to be deleted. gcc/testsuite/ChangeLog: PR tree-optimization/99220 * g++.dg/vect/pr99220.cc: New test. The 02/24/2021 08:52, Richard Biener wrote: > On Tue, 23 Feb 2021, Tamar Christina wrote: > > > Hi Richi, > > > > The attached testcase shows a bug where two nodes end up with the same pointer. > > During the loop that analyzes all the instances > > in optimize_load_redistribution_1 we do > > > > if (value) > > { > > SLP_TREE_REF_COUNT (value)++; > > SLP_TREE_CHILDREN (root)[i] = value; > > vect_free_slp_tree (node); > > } > > > > when doing a replacement. When this is done and the refcount for the node > > reaches 0, the node is removed, which allows the libc to return the pointer > > again in the next call to new, which it does.. > > > > First instance > > > > note: node 0x5325f48 (max_nunits=1, refcnt=2) > > note: op: VEC_PERM_EXPR > > note: { } > > note: lane permutation { 0[0] 1[1] 0[2] 1[3] } > > note: children 0x5325db0 0x5325200 > > > > Second instance > > > > note: node 0x5325f48 (max_nunits=1, refcnt=1) > > note: op: VEC_PERM_EXPR > > note: { } > > note: lane permutation { 0[0] 1[1] } > > note: children 0x53255b8 0x5325530 > > > > This will end up with the illegal construction of > > > > note: node 0x53258e8 (max_nunits=2, refcnt=2) > > note: op template: slp_patt_57 = .COMPLEX_MUL (_16, _16); > > note: stmt 0 _16 = _14 - _15; > > note: stmt 1 _23 = _17 + _22; > > note: children 0x53257d8 0x5325d28 > > note: node 0x53257d8 (max_nunits=2, refcnt=3) > > note: op template: l$b_4 = MEM[(const struct a &)_3].b; > > note: stmt 0 l$b_4 = MEM[(const struct a &)_3].b; > > note: stmt 1 l$c_5 = MEM[(const struct a &)_3].c; > > note: load permutation { 0 1 } > > note: node 0x5325d28 (max_nunits=2, refcnt=8) > > note: op template: l$b_4 = MEM[(const struct a &)_3].b; > > note: stmt 0 l$b_4 = MEM[(const struct a &)_3].b; > > note: stmt 1 l$c_5 = MEM[(const struct a &)_3].c; > > note: stmt 2 l$b_4 = MEM[(const struct a &)_3].b; > > note: stmt 3 l$c_5 = MEM[(const struct a &)_3].c; > > note: load permutation { 0 1 0 1 } > > > > To prevent this my initial thought was to add the temporary VEC_PERM_EXPR nodes > > to the bst_map cache and increase their refcnt one more. However since bst_map > > is gated on scalar statements and these nodes have none we can't do that. > > > > Instead I realized that load_map is really only a visited list at the top level. > > So instead of returning the reference, we should return NULL. > > > > What this means is that it will no replacement was found at that level. This is > > fine since these VEC_PERM_EXPR are single use. So while the any other node is > > an indication to use the cache, VEC_PERM_EXPR are an indication to avoid it. > > I don't understand really. Waiting for the other patch to be pushed so > I can eventually have a look, but see below. > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > > > Ok for master? > > > > Thanks, > > Tamar > > > > gcc/ChangeLog: > > > > PR tree-optimization/99220 > > * tree-vect-slp.c (optimize_load_redistribution_1): Don't use > > VEC_PERM_EXPR in cache. > > > > gcc/testsuite/ChangeLog: > > > > PR tree-optimization/99220 > > * g++.dg/vect/pr99220.cc: New test. > > > > --- inline copy of patch -- > > diff --git a/gcc/testsuite/g++.dg/vect/pr99220.cc b/gcc/testsuite/g++.dg/vect/pr99220.cc > > new file mode 100755 > > index 0000000000000000000000000000000000000000..ff3058832b742414202a8ada0a9dafc72c9a54aa > > --- /dev/null > > +++ b/gcc/testsuite/g++.dg/vect/pr99220.cc > > @@ -0,0 +1,29 @@ > > +/* { dg-do compile } */ > > +/* { dg-additional-options "-w -O3 -march=armv8.3-a" { target { aarch64*-*-* } } } */ > > + > > +class a { > > + float b; > > + float c; > > + > > +public: > > + a(float d, float e) : b(d), c(e) {} > > + a operator+(a d) { return a(b + d.b, c + d.c); } > > + a operator-(a d) { return a(b - d.b, c - d.c); } > > + a operator*(a d) { return a(b * b - c * c, b * c + c * d.b); } > > +}; > > +long f; > > +a *g; > > +class { > > + a *h; > > + long i; > > + a *j; > > + > > +public: > > + void k() { > > + a l = h[0], m = g[i], n = l * g[1], o = l * j[8]; > > + g[i] = m + n; > > + g[i + 1] = m - n; > > + j[f] = o; > > + } > > +} p; > > +main() { p.k(); } > > diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c > > index 605873714a5cafaaf822f61f1f769f96b3876694..e631463be8fc5b2de355e674a9c96665beb9516c 100644 > > --- a/gcc/tree-vect-slp.c > > +++ b/gcc/tree-vect-slp.c > > @@ -2292,7 +2292,12 @@ optimize_load_redistribution_1 (scalar_stmts_to_slp_tree_map_t *bst_map, > > slp_tree root) > > { > > if (slp_tree *leader = load_map->get (root)) > > - return *leader; > > + { > > + if (SLP_TREE_CODE (root) == VEC_PERM_EXPR) > > + return NULL; > > But this will then only optimize the first occurance. Wouldn't it be > better to increase the refcount at > > load_map->put (root, node); > > and walk load_map at the end, releasing refs owned by it like we do > for bst_map? > > > + else > > + return *leader; > > + } > > > > load_map->put (root, NULL); > > > > > > > > > > -- > Richard Biener <rguenther@suse.de> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg) --
On Wed, 24 Feb 2021, Tamar Christina wrote: > Hi Richi, > > This is an updated patch with your suggestion. > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > Ok for master? OK. Thanks, Richard. > Thanks, > Tamar > > gcc/ChangeLog: > > PR tree-optimization/99220 > * tree-vect-slp.c (optimize_load_redistribution_1): Remove > node from cache when it's about to be deleted. > > gcc/testsuite/ChangeLog: > > PR tree-optimization/99220 > * g++.dg/vect/pr99220.cc: New test. > > The 02/24/2021 08:52, Richard Biener wrote: > > On Tue, 23 Feb 2021, Tamar Christina wrote: > > > > > Hi Richi, > > > > > > The attached testcase shows a bug where two nodes end up with the same pointer. > > > During the loop that analyzes all the instances > > > in optimize_load_redistribution_1 we do > > > > > > if (value) > > > { > > > SLP_TREE_REF_COUNT (value)++; > > > SLP_TREE_CHILDREN (root)[i] = value; > > > vect_free_slp_tree (node); > > > } > > > > > > when doing a replacement. When this is done and the refcount for the node > > > reaches 0, the node is removed, which allows the libc to return the pointer > > > again in the next call to new, which it does.. > > > > > > First instance > > > > > > note: node 0x5325f48 (max_nunits=1, refcnt=2) > > > note: op: VEC_PERM_EXPR > > > note: { } > > > note: lane permutation { 0[0] 1[1] 0[2] 1[3] } > > > note: children 0x5325db0 0x5325200 > > > > > > Second instance > > > > > > note: node 0x5325f48 (max_nunits=1, refcnt=1) > > > note: op: VEC_PERM_EXPR > > > note: { } > > > note: lane permutation { 0[0] 1[1] } > > > note: children 0x53255b8 0x5325530 > > > > > > This will end up with the illegal construction of > > > > > > note: node 0x53258e8 (max_nunits=2, refcnt=2) > > > note: op template: slp_patt_57 = .COMPLEX_MUL (_16, _16); > > > note: stmt 0 _16 = _14 - _15; > > > note: stmt 1 _23 = _17 + _22; > > > note: children 0x53257d8 0x5325d28 > > > note: node 0x53257d8 (max_nunits=2, refcnt=3) > > > note: op template: l$b_4 = MEM[(const struct a &)_3].b; > > > note: stmt 0 l$b_4 = MEM[(const struct a &)_3].b; > > > note: stmt 1 l$c_5 = MEM[(const struct a &)_3].c; > > > note: load permutation { 0 1 } > > > note: node 0x5325d28 (max_nunits=2, refcnt=8) > > > note: op template: l$b_4 = MEM[(const struct a &)_3].b; > > > note: stmt 0 l$b_4 = MEM[(const struct a &)_3].b; > > > note: stmt 1 l$c_5 = MEM[(const struct a &)_3].c; > > > note: stmt 2 l$b_4 = MEM[(const struct a &)_3].b; > > > note: stmt 3 l$c_5 = MEM[(const struct a &)_3].c; > > > note: load permutation { 0 1 0 1 } > > > > > > To prevent this my initial thought was to add the temporary VEC_PERM_EXPR nodes > > > to the bst_map cache and increase their refcnt one more. However since bst_map > > > is gated on scalar statements and these nodes have none we can't do that. > > > > > > Instead I realized that load_map is really only a visited list at the top level. > > > So instead of returning the reference, we should return NULL. > > > > > > What this means is that it will no replacement was found at that level. This is > > > fine since these VEC_PERM_EXPR are single use. So while the any other node is > > > an indication to use the cache, VEC_PERM_EXPR are an indication to avoid it. > > > > I don't understand really. Waiting for the other patch to be pushed so > > I can eventually have a look, but see below. > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > > > > > Ok for master? > > > > > > Thanks, > > > Tamar > > > > > > gcc/ChangeLog: > > > > > > PR tree-optimization/99220 > > > * tree-vect-slp.c (optimize_load_redistribution_1): Don't use > > > VEC_PERM_EXPR in cache. > > > > > > gcc/testsuite/ChangeLog: > > > > > > PR tree-optimization/99220 > > > * g++.dg/vect/pr99220.cc: New test. > > > > > > --- inline copy of patch -- > > > diff --git a/gcc/testsuite/g++.dg/vect/pr99220.cc b/gcc/testsuite/g++.dg/vect/pr99220.cc > > > new file mode 100755 > > > index 0000000000000000000000000000000000000000..ff3058832b742414202a8ada0a9dafc72c9a54aa > > > --- /dev/null > > > +++ b/gcc/testsuite/g++.dg/vect/pr99220.cc > > > @@ -0,0 +1,29 @@ > > > +/* { dg-do compile } */ > > > +/* { dg-additional-options "-w -O3 -march=armv8.3-a" { target { aarch64*-*-* } } } */ > > > + > > > +class a { > > > + float b; > > > + float c; > > > + > > > +public: > > > + a(float d, float e) : b(d), c(e) {} > > > + a operator+(a d) { return a(b + d.b, c + d.c); } > > > + a operator-(a d) { return a(b - d.b, c - d.c); } > > > + a operator*(a d) { return a(b * b - c * c, b * c + c * d.b); } > > > +}; > > > +long f; > > > +a *g; > > > +class { > > > + a *h; > > > + long i; > > > + a *j; > > > + > > > +public: > > > + void k() { > > > + a l = h[0], m = g[i], n = l * g[1], o = l * j[8]; > > > + g[i] = m + n; > > > + g[i + 1] = m - n; > > > + j[f] = o; > > > + } > > > +} p; > > > +main() { p.k(); } > > > diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c > > > index 605873714a5cafaaf822f61f1f769f96b3876694..e631463be8fc5b2de355e674a9c96665beb9516c 100644 > > > --- a/gcc/tree-vect-slp.c > > > +++ b/gcc/tree-vect-slp.c > > > @@ -2292,7 +2292,12 @@ optimize_load_redistribution_1 (scalar_stmts_to_slp_tree_map_t *bst_map, > > > slp_tree root) > > > { > > > if (slp_tree *leader = load_map->get (root)) > > > - return *leader; > > > + { > > > + if (SLP_TREE_CODE (root) == VEC_PERM_EXPR) > > > + return NULL; > > > > But this will then only optimize the first occurance. Wouldn't it be > > better to increase the refcount at > > > > load_map->put (root, node); > > > > and walk load_map at the end, releasing refs owned by it like we do > > for bst_map? > > > > > + else > > > + return *leader; > > > + } > > > > > > load_map->put (root, NULL); > > > > > > > > > > > > > > > > -- > > Richard Biener <rguenther@suse.de> > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg) > >
diff --git a/gcc/testsuite/g++.dg/vect/pr99220.cc b/gcc/testsuite/g++.dg/vect/pr99220.cc new file mode 100755 index 0000000000000000000000000000000000000000..ff3058832b742414202a8ada0a9dafc72c9a54aa --- /dev/null +++ b/gcc/testsuite/g++.dg/vect/pr99220.cc @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-w -O3 -march=armv8.3-a" { target { aarch64*-*-* } } } */ + +class a { + float b; + float c; + +public: + a(float d, float e) : b(d), c(e) {} + a operator+(a d) { return a(b + d.b, c + d.c); } + a operator-(a d) { return a(b - d.b, c - d.c); } + a operator*(a d) { return a(b * b - c * c, b * c + c * d.b); } +}; +long f; +a *g; +class { + a *h; + long i; + a *j; + +public: + void k() { + a l = h[0], m = g[i], n = l * g[1], o = l * j[8]; + g[i] = m + n; + g[i + 1] = m - n; + j[f] = o; + } +} p; +main() { p.k(); } diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c index 605873714a5cafaaf822f61f1f769f96b3876694..e631463be8fc5b2de355e674a9c96665beb9516c 100644 --- a/gcc/tree-vect-slp.c +++ b/gcc/tree-vect-slp.c @@ -2292,7 +2292,12 @@ optimize_load_redistribution_1 (scalar_stmts_to_slp_tree_map_t *bst_map, slp_tree root) { if (slp_tree *leader = load_map->get (root)) - return *leader; + { + if (SLP_TREE_CODE (root) == VEC_PERM_EXPR) + return NULL; + else + return *leader; + } load_map->put (root, NULL);