diff mbox series

middle-end slp: fix accidental resource re-use of slp_tree (PR99220)

Message ID patch-14228-tamar@arm.com
State New
Headers show
Series middle-end slp: fix accidental resource re-use of slp_tree (PR99220) | expand

Commit Message

Tamar Christina Feb. 23, 2021, 7:17 p.m. UTC
Hi Richi,

The attached testcase shows a bug where two nodes end up with the same pointer.
During the loop that analyzes all the instances
in optimize_load_redistribution_1 we do

      if (value)
        {
          SLP_TREE_REF_COUNT (value)++;
          SLP_TREE_CHILDREN (root)[i] = value;
          vect_free_slp_tree (node);
        }

when doing a replacement.  When this is done and the refcount for the node
reaches 0, the node is removed, which allows the libc to return the pointer
again in the next call to new, which it does..

First instance

note:   node 0x5325f48 (max_nunits=1, refcnt=2)
note:   op: VEC_PERM_EXPR
note:           { }
note:           lane permutation { 0[0] 1[1] 0[2] 1[3] }
note:           children 0x5325db0 0x5325200

Second instance

note:   node 0x5325f48 (max_nunits=1, refcnt=1)
note:   op: VEC_PERM_EXPR
note:           { }
note:           lane permutation { 0[0] 1[1] }
note:           children 0x53255b8 0x5325530

This will end up with the illegal construction of

note:   node 0x53258e8 (max_nunits=2, refcnt=2)
note:   op template: slp_patt_57 = .COMPLEX_MUL (_16, _16);
note:           stmt 0 _16 = _14 - _15;
note:           stmt 1 _23 = _17 + _22;
note:           children 0x53257d8 0x5325d28
note:   node 0x53257d8 (max_nunits=2, refcnt=3)
note:   op template: l$b_4 = MEM[(const struct a &)_3].b;
note:           stmt 0 l$b_4 = MEM[(const struct a &)_3].b;
note:           stmt 1 l$c_5 = MEM[(const struct a &)_3].c;
note:           load permutation { 0 1 }
note:   node 0x5325d28 (max_nunits=2, refcnt=8)
note:   op template: l$b_4 = MEM[(const struct a &)_3].b;
note:           stmt 0 l$b_4 = MEM[(const struct a &)_3].b;
note:           stmt 1 l$c_5 = MEM[(const struct a &)_3].c;
note:           stmt 2 l$b_4 = MEM[(const struct a &)_3].b;
note:           stmt 3 l$c_5 = MEM[(const struct a &)_3].c;
note:           load permutation { 0 1 0 1 }

To prevent this my initial thought was to add the temporary VEC_PERM_EXPR nodes
to the bst_map cache and increase their refcnt one more.  However since bst_map
is gated on scalar statements and these nodes have none we can't do that.

Instead I realized that load_map is really only a visited list at the top level.
So instead of returning the reference, we should return NULL.

What this means is that it will no replacement was found at that level.  This is
fine since these VEC_PERM_EXPR are single use.  So while the any other node is
an indication to use the cache, VEC_PERM_EXPR are an indication to avoid it.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	PR tree-optimization/99220
	* tree-vect-slp.c (optimize_load_redistribution_1): Don't use
	VEC_PERM_EXPR in cache.

gcc/testsuite/ChangeLog:

	PR tree-optimization/99220
	* g++.dg/vect/pr99220.cc: New test.

--- inline copy of patch -- 
diff --git a/gcc/testsuite/g++.dg/vect/pr99220.cc b/gcc/testsuite/g++.dg/vect/pr99220.cc
new file mode 100755
index 0000000000000000000000000000000000000000..ff3058832b742414202a8ada0a9dafc72c9a54aa


--

Comments

Richard Biener Feb. 24, 2021, 8:52 a.m. UTC | #1
On Tue, 23 Feb 2021, Tamar Christina wrote:

> Hi Richi,
> 
> The attached testcase shows a bug where two nodes end up with the same pointer.
> During the loop that analyzes all the instances
> in optimize_load_redistribution_1 we do
> 
>       if (value)
>         {
>           SLP_TREE_REF_COUNT (value)++;
>           SLP_TREE_CHILDREN (root)[i] = value;
>           vect_free_slp_tree (node);
>         }
> 
> when doing a replacement.  When this is done and the refcount for the node
> reaches 0, the node is removed, which allows the libc to return the pointer
> again in the next call to new, which it does..
> 
> First instance
> 
> note:   node 0x5325f48 (max_nunits=1, refcnt=2)
> note:   op: VEC_PERM_EXPR
> note:           { }
> note:           lane permutation { 0[0] 1[1] 0[2] 1[3] }
> note:           children 0x5325db0 0x5325200
> 
> Second instance
> 
> note:   node 0x5325f48 (max_nunits=1, refcnt=1)
> note:   op: VEC_PERM_EXPR
> note:           { }
> note:           lane permutation { 0[0] 1[1] }
> note:           children 0x53255b8 0x5325530
> 
> This will end up with the illegal construction of
> 
> note:   node 0x53258e8 (max_nunits=2, refcnt=2)
> note:   op template: slp_patt_57 = .COMPLEX_MUL (_16, _16);
> note:           stmt 0 _16 = _14 - _15;
> note:           stmt 1 _23 = _17 + _22;
> note:           children 0x53257d8 0x5325d28
> note:   node 0x53257d8 (max_nunits=2, refcnt=3)
> note:   op template: l$b_4 = MEM[(const struct a &)_3].b;
> note:           stmt 0 l$b_4 = MEM[(const struct a &)_3].b;
> note:           stmt 1 l$c_5 = MEM[(const struct a &)_3].c;
> note:           load permutation { 0 1 }
> note:   node 0x5325d28 (max_nunits=2, refcnt=8)
> note:   op template: l$b_4 = MEM[(const struct a &)_3].b;
> note:           stmt 0 l$b_4 = MEM[(const struct a &)_3].b;
> note:           stmt 1 l$c_5 = MEM[(const struct a &)_3].c;
> note:           stmt 2 l$b_4 = MEM[(const struct a &)_3].b;
> note:           stmt 3 l$c_5 = MEM[(const struct a &)_3].c;
> note:           load permutation { 0 1 0 1 }
> 
> To prevent this my initial thought was to add the temporary VEC_PERM_EXPR nodes
> to the bst_map cache and increase their refcnt one more.  However since bst_map
> is gated on scalar statements and these nodes have none we can't do that.
> 
> Instead I realized that load_map is really only a visited list at the top level.
> So instead of returning the reference, we should return NULL.
> 
> What this means is that it will no replacement was found at that level.  This is
> fine since these VEC_PERM_EXPR are single use.  So while the any other node is
> an indication to use the cache, VEC_PERM_EXPR are an indication to avoid it.

I don't understand really.  Waiting for the other patch to be pushed so
I can eventually have a look, but see below.

> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	PR tree-optimization/99220
> 	* tree-vect-slp.c (optimize_load_redistribution_1): Don't use
> 	VEC_PERM_EXPR in cache.
> 
> gcc/testsuite/ChangeLog:
> 
> 	PR tree-optimization/99220
> 	* g++.dg/vect/pr99220.cc: New test.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/testsuite/g++.dg/vect/pr99220.cc b/gcc/testsuite/g++.dg/vect/pr99220.cc
> new file mode 100755
> index 0000000000000000000000000000000000000000..ff3058832b742414202a8ada0a9dafc72c9a54aa
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/vect/pr99220.cc
> @@ -0,0 +1,29 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-w -O3 -march=armv8.3-a" { target { aarch64*-*-* } } } */
> +
> +class a {
> +  float b;
> +  float c;
> +
> +public:
> +  a(float d, float e) : b(d), c(e) {}
> +  a operator+(a d) { return a(b + d.b, c + d.c); }
> +  a operator-(a d) { return a(b - d.b, c - d.c); }
> +  a operator*(a d) { return a(b * b - c * c, b * c + c * d.b); }
> +};
> +long f;
> +a *g;
> +class {
> +  a *h;
> +  long i;
> +  a *j;
> +
> +public:
> +  void k() {
> +    a l = h[0], m = g[i], n = l * g[1], o = l * j[8];
> +    g[i] = m + n;
> +    g[i + 1] = m - n;
> +    j[f] = o;
> +  }
> +} p;
> +main() { p.k(); }
> diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
> index 605873714a5cafaaf822f61f1f769f96b3876694..e631463be8fc5b2de355e674a9c96665beb9516c 100644
> --- a/gcc/tree-vect-slp.c
> +++ b/gcc/tree-vect-slp.c
> @@ -2292,7 +2292,12 @@ optimize_load_redistribution_1 (scalar_stmts_to_slp_tree_map_t *bst_map,
>  				slp_tree root)
>  {
>    if (slp_tree *leader = load_map->get (root))
> -    return *leader;
> +    {
> +      if (SLP_TREE_CODE (root) == VEC_PERM_EXPR)
> +	return NULL;

But this will then only optimize the first occurance.  Wouldn't it be
better to increase the refcount at

      load_map->put (root, node);

and walk load_map at the end, releasing refs owned by it like we do
for bst_map?

> +      else
> +	return *leader;
> +    }
>  
>    load_map->put (root, NULL);
>  
> 
> 
>
Tamar Christina Feb. 24, 2021, 2:08 p.m. UTC | #2
Hi Richi,

This is an updated patch with your suggestion.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	PR tree-optimization/99220
	* tree-vect-slp.c (optimize_load_redistribution_1): Remove
	node from cache when it's about to be deleted.

gcc/testsuite/ChangeLog:

	PR tree-optimization/99220
	* g++.dg/vect/pr99220.cc: New test.

The 02/24/2021 08:52, Richard Biener wrote:
> On Tue, 23 Feb 2021, Tamar Christina wrote:
> 
> > Hi Richi,
> > 
> > The attached testcase shows a bug where two nodes end up with the same pointer.
> > During the loop that analyzes all the instances
> > in optimize_load_redistribution_1 we do
> > 
> >       if (value)
> >         {
> >           SLP_TREE_REF_COUNT (value)++;
> >           SLP_TREE_CHILDREN (root)[i] = value;
> >           vect_free_slp_tree (node);
> >         }
> > 
> > when doing a replacement.  When this is done and the refcount for the node
> > reaches 0, the node is removed, which allows the libc to return the pointer
> > again in the next call to new, which it does..
> > 
> > First instance
> > 
> > note:   node 0x5325f48 (max_nunits=1, refcnt=2)
> > note:   op: VEC_PERM_EXPR
> > note:           { }
> > note:           lane permutation { 0[0] 1[1] 0[2] 1[3] }
> > note:           children 0x5325db0 0x5325200
> > 
> > Second instance
> > 
> > note:   node 0x5325f48 (max_nunits=1, refcnt=1)
> > note:   op: VEC_PERM_EXPR
> > note:           { }
> > note:           lane permutation { 0[0] 1[1] }
> > note:           children 0x53255b8 0x5325530
> > 
> > This will end up with the illegal construction of
> > 
> > note:   node 0x53258e8 (max_nunits=2, refcnt=2)
> > note:   op template: slp_patt_57 = .COMPLEX_MUL (_16, _16);
> > note:           stmt 0 _16 = _14 - _15;
> > note:           stmt 1 _23 = _17 + _22;
> > note:           children 0x53257d8 0x5325d28
> > note:   node 0x53257d8 (max_nunits=2, refcnt=3)
> > note:   op template: l$b_4 = MEM[(const struct a &)_3].b;
> > note:           stmt 0 l$b_4 = MEM[(const struct a &)_3].b;
> > note:           stmt 1 l$c_5 = MEM[(const struct a &)_3].c;
> > note:           load permutation { 0 1 }
> > note:   node 0x5325d28 (max_nunits=2, refcnt=8)
> > note:   op template: l$b_4 = MEM[(const struct a &)_3].b;
> > note:           stmt 0 l$b_4 = MEM[(const struct a &)_3].b;
> > note:           stmt 1 l$c_5 = MEM[(const struct a &)_3].c;
> > note:           stmt 2 l$b_4 = MEM[(const struct a &)_3].b;
> > note:           stmt 3 l$c_5 = MEM[(const struct a &)_3].c;
> > note:           load permutation { 0 1 0 1 }
> > 
> > To prevent this my initial thought was to add the temporary VEC_PERM_EXPR nodes
> > to the bst_map cache and increase their refcnt one more.  However since bst_map
> > is gated on scalar statements and these nodes have none we can't do that.
> > 
> > Instead I realized that load_map is really only a visited list at the top level.
> > So instead of returning the reference, we should return NULL.
> > 
> > What this means is that it will no replacement was found at that level.  This is
> > fine since these VEC_PERM_EXPR are single use.  So while the any other node is
> > an indication to use the cache, VEC_PERM_EXPR are an indication to avoid it.
> 
> I don't understand really.  Waiting for the other patch to be pushed so
> I can eventually have a look, but see below.
> 
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > 
> > Ok for master?
> > 
> > Thanks,
> > Tamar
> > 
> > gcc/ChangeLog:
> > 
> > 	PR tree-optimization/99220
> > 	* tree-vect-slp.c (optimize_load_redistribution_1): Don't use
> > 	VEC_PERM_EXPR in cache.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > 	PR tree-optimization/99220
> > 	* g++.dg/vect/pr99220.cc: New test.
> > 
> > --- inline copy of patch -- 
> > diff --git a/gcc/testsuite/g++.dg/vect/pr99220.cc b/gcc/testsuite/g++.dg/vect/pr99220.cc
> > new file mode 100755
> > index 0000000000000000000000000000000000000000..ff3058832b742414202a8ada0a9dafc72c9a54aa
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/vect/pr99220.cc
> > @@ -0,0 +1,29 @@
> > +/* { dg-do compile } */
> > +/* { dg-additional-options "-w -O3 -march=armv8.3-a" { target { aarch64*-*-* } } } */
> > +
> > +class a {
> > +  float b;
> > +  float c;
> > +
> > +public:
> > +  a(float d, float e) : b(d), c(e) {}
> > +  a operator+(a d) { return a(b + d.b, c + d.c); }
> > +  a operator-(a d) { return a(b - d.b, c - d.c); }
> > +  a operator*(a d) { return a(b * b - c * c, b * c + c * d.b); }
> > +};
> > +long f;
> > +a *g;
> > +class {
> > +  a *h;
> > +  long i;
> > +  a *j;
> > +
> > +public:
> > +  void k() {
> > +    a l = h[0], m = g[i], n = l * g[1], o = l * j[8];
> > +    g[i] = m + n;
> > +    g[i + 1] = m - n;
> > +    j[f] = o;
> > +  }
> > +} p;
> > +main() { p.k(); }
> > diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
> > index 605873714a5cafaaf822f61f1f769f96b3876694..e631463be8fc5b2de355e674a9c96665beb9516c 100644
> > --- a/gcc/tree-vect-slp.c
> > +++ b/gcc/tree-vect-slp.c
> > @@ -2292,7 +2292,12 @@ optimize_load_redistribution_1 (scalar_stmts_to_slp_tree_map_t *bst_map,
> >  				slp_tree root)
> >  {
> >    if (slp_tree *leader = load_map->get (root))
> > -    return *leader;
> > +    {
> > +      if (SLP_TREE_CODE (root) == VEC_PERM_EXPR)
> > +	return NULL;
> 
> But this will then only optimize the first occurance.  Wouldn't it be
> better to increase the refcount at
> 
>       load_map->put (root, node);
> 
> and walk load_map at the end, releasing refs owned by it like we do
> for bst_map?
> 
> > +      else
> > +	return *leader;
> > +    }
> >  
> >    load_map->put (root, NULL);
> >  
> > 
> > 
> > 
> 
> -- 
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

--
Richard Biener Feb. 24, 2021, 3:12 p.m. UTC | #3
On Wed, 24 Feb 2021, Tamar Christina wrote:

> Hi Richi,
> 
> This is an updated patch with your suggestion.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?

OK.

Thanks,
Richard.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	PR tree-optimization/99220
> 	* tree-vect-slp.c (optimize_load_redistribution_1): Remove
> 	node from cache when it's about to be deleted.
> 
> gcc/testsuite/ChangeLog:
> 
> 	PR tree-optimization/99220
> 	* g++.dg/vect/pr99220.cc: New test.
> 
> The 02/24/2021 08:52, Richard Biener wrote:
> > On Tue, 23 Feb 2021, Tamar Christina wrote:
> > 
> > > Hi Richi,
> > > 
> > > The attached testcase shows a bug where two nodes end up with the same pointer.
> > > During the loop that analyzes all the instances
> > > in optimize_load_redistribution_1 we do
> > > 
> > >       if (value)
> > >         {
> > >           SLP_TREE_REF_COUNT (value)++;
> > >           SLP_TREE_CHILDREN (root)[i] = value;
> > >           vect_free_slp_tree (node);
> > >         }
> > > 
> > > when doing a replacement.  When this is done and the refcount for the node
> > > reaches 0, the node is removed, which allows the libc to return the pointer
> > > again in the next call to new, which it does..
> > > 
> > > First instance
> > > 
> > > note:   node 0x5325f48 (max_nunits=1, refcnt=2)
> > > note:   op: VEC_PERM_EXPR
> > > note:           { }
> > > note:           lane permutation { 0[0] 1[1] 0[2] 1[3] }
> > > note:           children 0x5325db0 0x5325200
> > > 
> > > Second instance
> > > 
> > > note:   node 0x5325f48 (max_nunits=1, refcnt=1)
> > > note:   op: VEC_PERM_EXPR
> > > note:           { }
> > > note:           lane permutation { 0[0] 1[1] }
> > > note:           children 0x53255b8 0x5325530
> > > 
> > > This will end up with the illegal construction of
> > > 
> > > note:   node 0x53258e8 (max_nunits=2, refcnt=2)
> > > note:   op template: slp_patt_57 = .COMPLEX_MUL (_16, _16);
> > > note:           stmt 0 _16 = _14 - _15;
> > > note:           stmt 1 _23 = _17 + _22;
> > > note:           children 0x53257d8 0x5325d28
> > > note:   node 0x53257d8 (max_nunits=2, refcnt=3)
> > > note:   op template: l$b_4 = MEM[(const struct a &)_3].b;
> > > note:           stmt 0 l$b_4 = MEM[(const struct a &)_3].b;
> > > note:           stmt 1 l$c_5 = MEM[(const struct a &)_3].c;
> > > note:           load permutation { 0 1 }
> > > note:   node 0x5325d28 (max_nunits=2, refcnt=8)
> > > note:   op template: l$b_4 = MEM[(const struct a &)_3].b;
> > > note:           stmt 0 l$b_4 = MEM[(const struct a &)_3].b;
> > > note:           stmt 1 l$c_5 = MEM[(const struct a &)_3].c;
> > > note:           stmt 2 l$b_4 = MEM[(const struct a &)_3].b;
> > > note:           stmt 3 l$c_5 = MEM[(const struct a &)_3].c;
> > > note:           load permutation { 0 1 0 1 }
> > > 
> > > To prevent this my initial thought was to add the temporary VEC_PERM_EXPR nodes
> > > to the bst_map cache and increase their refcnt one more.  However since bst_map
> > > is gated on scalar statements and these nodes have none we can't do that.
> > > 
> > > Instead I realized that load_map is really only a visited list at the top level.
> > > So instead of returning the reference, we should return NULL.
> > > 
> > > What this means is that it will no replacement was found at that level.  This is
> > > fine since these VEC_PERM_EXPR are single use.  So while the any other node is
> > > an indication to use the cache, VEC_PERM_EXPR are an indication to avoid it.
> > 
> > I don't understand really.  Waiting for the other patch to be pushed so
> > I can eventually have a look, but see below.
> > 
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > 
> > > Ok for master?
> > > 
> > > Thanks,
> > > Tamar
> > > 
> > > gcc/ChangeLog:
> > > 
> > > 	PR tree-optimization/99220
> > > 	* tree-vect-slp.c (optimize_load_redistribution_1): Don't use
> > > 	VEC_PERM_EXPR in cache.
> > > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > > 	PR tree-optimization/99220
> > > 	* g++.dg/vect/pr99220.cc: New test.
> > > 
> > > --- inline copy of patch -- 
> > > diff --git a/gcc/testsuite/g++.dg/vect/pr99220.cc b/gcc/testsuite/g++.dg/vect/pr99220.cc
> > > new file mode 100755
> > > index 0000000000000000000000000000000000000000..ff3058832b742414202a8ada0a9dafc72c9a54aa
> > > --- /dev/null
> > > +++ b/gcc/testsuite/g++.dg/vect/pr99220.cc
> > > @@ -0,0 +1,29 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-additional-options "-w -O3 -march=armv8.3-a" { target { aarch64*-*-* } } } */
> > > +
> > > +class a {
> > > +  float b;
> > > +  float c;
> > > +
> > > +public:
> > > +  a(float d, float e) : b(d), c(e) {}
> > > +  a operator+(a d) { return a(b + d.b, c + d.c); }
> > > +  a operator-(a d) { return a(b - d.b, c - d.c); }
> > > +  a operator*(a d) { return a(b * b - c * c, b * c + c * d.b); }
> > > +};
> > > +long f;
> > > +a *g;
> > > +class {
> > > +  a *h;
> > > +  long i;
> > > +  a *j;
> > > +
> > > +public:
> > > +  void k() {
> > > +    a l = h[0], m = g[i], n = l * g[1], o = l * j[8];
> > > +    g[i] = m + n;
> > > +    g[i + 1] = m - n;
> > > +    j[f] = o;
> > > +  }
> > > +} p;
> > > +main() { p.k(); }
> > > diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
> > > index 605873714a5cafaaf822f61f1f769f96b3876694..e631463be8fc5b2de355e674a9c96665beb9516c 100644
> > > --- a/gcc/tree-vect-slp.c
> > > +++ b/gcc/tree-vect-slp.c
> > > @@ -2292,7 +2292,12 @@ optimize_load_redistribution_1 (scalar_stmts_to_slp_tree_map_t *bst_map,
> > >  				slp_tree root)
> > >  {
> > >    if (slp_tree *leader = load_map->get (root))
> > > -    return *leader;
> > > +    {
> > > +      if (SLP_TREE_CODE (root) == VEC_PERM_EXPR)
> > > +	return NULL;
> > 
> > But this will then only optimize the first occurance.  Wouldn't it be
> > better to increase the refcount at
> > 
> >       load_map->put (root, node);
> > 
> > and walk load_map at the end, releasing refs owned by it like we do
> > for bst_map?
> > 
> > > +      else
> > > +	return *leader;
> > > +    }
> > >  
> > >    load_map->put (root, NULL);
> > >  
> > > 
> > > 
> > > 
> > 
> > -- 
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> 
>
diff mbox series

Patch

diff --git a/gcc/testsuite/g++.dg/vect/pr99220.cc b/gcc/testsuite/g++.dg/vect/pr99220.cc
new file mode 100755
index 0000000000000000000000000000000000000000..ff3058832b742414202a8ada0a9dafc72c9a54aa
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/pr99220.cc
@@ -0,0 +1,29 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-w -O3 -march=armv8.3-a" { target { aarch64*-*-* } } } */
+
+class a {
+  float b;
+  float c;
+
+public:
+  a(float d, float e) : b(d), c(e) {}
+  a operator+(a d) { return a(b + d.b, c + d.c); }
+  a operator-(a d) { return a(b - d.b, c - d.c); }
+  a operator*(a d) { return a(b * b - c * c, b * c + c * d.b); }
+};
+long f;
+a *g;
+class {
+  a *h;
+  long i;
+  a *j;
+
+public:
+  void k() {
+    a l = h[0], m = g[i], n = l * g[1], o = l * j[8];
+    g[i] = m + n;
+    g[i + 1] = m - n;
+    j[f] = o;
+  }
+} p;
+main() { p.k(); }
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 605873714a5cafaaf822f61f1f769f96b3876694..e631463be8fc5b2de355e674a9c96665beb9516c 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -2292,7 +2292,12 @@  optimize_load_redistribution_1 (scalar_stmts_to_slp_tree_map_t *bst_map,
 				slp_tree root)
 {
   if (slp_tree *leader = load_map->get (root))
-    return *leader;
+    {
+      if (SLP_TREE_CODE (root) == VEC_PERM_EXPR)
+	return NULL;
+      else
+	return *leader;
+    }
 
   load_map->put (root, NULL);