cse: Fix handling of fake vec_select sets [PR111702]

Message ID	mptr0jgloae.fsf@arm.com
State	New
Headers	show Return-Path: <gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org> DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 6625C3858CD1 From: Richard Sandiford <richard.sandiford@arm.com> To: gcc-patches@gcc.gnu.org Mail-Followup-To: gcc-patches@gcc.gnu.org, prathamesh.kulkarni@linaro.org, richard.sandiford@arm.com Cc: prathamesh.kulkarni@linaro.org Subject: [PATCH] cse: Fix handling of fake vec_select sets [PR111702] Date: Wed, 20 Dec 2023 18:30:33 +0000 Message-ID: <mptr0jgloae.fsf@arm.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain Precedence: list Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org
Series	cse: Fix handling of fake vec_select sets [PR111702] \| expand cse: Fix handling of fake vec_select sets [PR111702]

Message ID

mptr0jgloae.fsf@arm.com

State

New

Headers

DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 6625C3858CD1
From: Richard Sandiford <richard.sandiford@arm.com>
To: gcc-patches@gcc.gnu.org
Mail-Followup-To: gcc-patches@gcc.gnu.org, prathamesh.kulkarni@linaro.org,
 richard.sandiford@arm.com
Cc: prathamesh.kulkarni@linaro.org
Subject: [PATCH] cse: Fix handling of fake vec_select sets [PR111702]
Date: Wed, 20 Dec 2023 18:30:33 +0000
Message-ID: <mptr0jgloae.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
Precedence: list
Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org

Series

cse: Fix handling of fake vec_select sets [PR111702] | expand

Commit Message

Richard Sandiford Dec. 20, 2023, 6:30 p.m. UTC

If cse sees:

  (set (reg R) (const_vector [A B ...]))

it creates fake sets of the form:

  (set R[0] A)
  (set R[1] B)
  ...

(with R[n] replaced by appropriate rtl) and then adds them to the tables
in the same way as for normal sets.  This allows a sequence like:

  (set (reg R2) A)
  ...(reg R2)...

to try to use R[0] instead of (reg R2).

But the pass was taking the analogy too far, and was trying to simplify
these fake sets based on costs.  That is, if there was an earlier:

  (set (reg T) A)

the pass would go to considerable effort trying to work out whether:

  (set R[0] A)

or:

  (set R[0] (reg T))

was more profitable.  This included running validate*_change on the sets,
which has no meaning given that the sets are not part of the insn.

In this example, the equivalence A == T is already known, and the
purpose of the fake sets is to add A == T == R[0].  We can do that
just as easily (or, as the PR shows, more easily) if we keep the
original form of the fake set, with A instead of T.

The problem in the PR occurred if we had:

(1) something that establishes an equivalence between a vector V1 of
    M-bit scalar integers and a hard register H

(2) something that establishes an equivalence between a vector V2 of
    N-bit scalar integers, where N<M and where V2 contains at least 2
    instances of V1[0]

(1) established an equivalence between V1[0] and H in M bits.
(2) then triggered a search for an equivalence of V1[0] in N bits.
This included:

      /* See if we have a CONST_INT that is already in a register in a
	 wider mode.  */

which (correctly) found that the low N bits of H contain the right value.
But because it came from a wider mode, this equivalence between N-bit H
and N-bit V1[0] was not yet in the hash table.  It therefore survived
the purge in:

      /* At this point, ELT, if nonzero, points to a class of expressions
         equivalent to the source of this SET and SRC, SRC_EQV, SRC_FOLDED,
	 and SRC_RELATED, if nonzero, each contain additional equivalent
	 expressions.  Prune these latter expressions by deleting expressions
	 already in the equivalence class.

And since more than 1 set found the same N-bit equivalence between
H and V1[0], the pass tried to add it more than once.

Things were already wrong at this stage, but an ICE was only triggered
later when trying to merge this N-bit equivalence with another one.

We could avoid the double registration by adding:

  for (elt = classp; elt; elt = elt->next_same_value)
    if (rtx_equal_p (elt->exp, x))
      return elt;

to insert_with_costs, or by making cse_insn check whether previous
sets have recorded the same equivalence.  The latter seems more
appealing from a compile-time perspective.  But in this case,
doing that would be adding yet more spurious work to the handling
of fake sets.

The handling of fake sets therefore seems like the more fundamental bug.

While there, the patch also makes sure that we don't apply REG_EQUAL
notes to these fake sets.  They only describe the "real" (first) set.

gcc/
	PR rtl-optimization/111702
	* cse.cc (set::mode): Move earlier.
	(set::src_in_memory, set::src_volatile): Convert to bitfields.
	(set::is_fake_set): New member variable.
	(add_to_set): Add an is_fake_set parameter.
	(find_sets_in_insn): Update calls accordingly.
	(cse_insn): Do not apply REG_EQUAL notes to fake sets.  Do not
	try to optimize them either, or validate changes to them.

gcc/
	PR rtl-optimization/111702
	* gcc.dg/rtl/aarch64/pr111702.c: New test.
---
 gcc/cse.cc                                  | 38 +++++++++++-------
 gcc/testsuite/gcc.dg/rtl/aarch64/pr111702.c | 43 +++++++++++++++++++++
 2 files changed, 67 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/rtl/aarch64/pr111702.c

Comments

Richard Biener Dec. 20, 2023, 7 p.m. UTC | #1

> Am 20.12.2023 um 19:30 schrieb Richard Sandiford <richard.sandiford@arm.com>:
> 
> If cse sees:
> 
>  (set (reg R) (const_vector [A B ...]))
> 
> it creates fake sets of the form:
> 
>  (set R[0] A)
>  (set R[1] B)
>  ...
> 
> (with R[n] replaced by appropriate rtl) and then adds them to the tables
> in the same way as for normal sets.  This allows a sequence like:
> 
>  (set (reg R2) A)
>  ...(reg R2)...
> 
> to try to use R[0] instead of (reg R2).
> 
> But the pass was taking the analogy too far, and was trying to simplify
> these fake sets based on costs.  That is, if there was an earlier:
> 
>  (set (reg T) A)
> 
> the pass would go to considerable effort trying to work out whether:
> 
>  (set R[0] A)
> 
> or:
> 
>  (set R[0] (reg T))
> 
> was more profitable.  This included running validate*_change on the sets,
> which has no meaning given that the sets are not part of the insn.
> 
> In this example, the equivalence A == T is already known, and the
> purpose of the fake sets is to add A == T == R[0].  We can do that
> just as easily (or, as the PR shows, more easily) if we keep the
> original form of the fake set, with A instead of T.
> 
> The problem in the PR occurred if we had:
> 
> (1) something that establishes an equivalence between a vector V1 of
>    M-bit scalar integers and a hard register H
> 
> (2) something that establishes an equivalence between a vector V2 of
>    N-bit scalar integers, where N<M and where V2 contains at least 2
>    instances of V1[0]
> 
> (1) established an equivalence between V1[0] and H in M bits.
> (2) then triggered a search for an equivalence of V1[0] in N bits.
> This included:
> 
>      /* See if we have a CONST_INT that is already in a register in a
>     wider mode.  */
> 
> which (correctly) found that the low N bits of H contain the right value.
> But because it came from a wider mode, this equivalence between N-bit H
> and N-bit V1[0] was not yet in the hash table.  It therefore survived
> the purge in:
> 
>      /* At this point, ELT, if nonzero, points to a class of expressions
>         equivalent to the source of this SET and SRC, SRC_EQV, SRC_FOLDED,
>     and SRC_RELATED, if nonzero, each contain additional equivalent
>     expressions.  Prune these latter expressions by deleting expressions
>     already in the equivalence class.
> 
> And since more than 1 set found the same N-bit equivalence between
> H and V1[0], the pass tried to add it more than once.
> 
> Things were already wrong at this stage, but an ICE was only triggered
> later when trying to merge this N-bit equivalence with another one.
> 
> We could avoid the double registration by adding:
> 
>  for (elt = classp; elt; elt = elt->next_same_value)
>    if (rtx_equal_p (elt->exp, x))
>      return elt;
> 
> to insert_with_costs, or by making cse_insn check whether previous
> sets have recorded the same equivalence.  The latter seems more
> appealing from a compile-time perspective.  But in this case,
> doing that would be adding yet more spurious work to the handling
> of fake sets.
> 
> The handling of fake sets therefore seems like the more fundamental bug.
> 
> While there, the patch also makes sure that we don't apply REG_EQUAL
> notes to these fake sets.  They only describe the "real" (first) set.

Agreed and OK

> gcc/
>    PR rtl-optimization/111702
>    * cse.cc (set::mode): Move earlier.
>    (set::src_in_memory, set::src_volatile): Convert to bitfields.
>    (set::is_fake_set): New member variable.
>    (add_to_set): Add an is_fake_set parameter.
>    (find_sets_in_insn): Update calls accordingly.
>    (cse_insn): Do not apply REG_EQUAL notes to fake sets.  Do not
>    try to optimize them either, or validate changes to them.
> 
> gcc/
>    PR rtl-optimization/111702
>    * gcc.dg/rtl/aarch64/pr111702.c: New test.
> ---
> gcc/cse.cc                                  | 38 +++++++++++-------
> gcc/testsuite/gcc.dg/rtl/aarch64/pr111702.c | 43 +++++++++++++++++++++
> 2 files changed, 67 insertions(+), 14 deletions(-)
> create mode 100644 gcc/testsuite/gcc.dg/rtl/aarch64/pr111702.c
> 
> diff --git a/gcc/cse.cc b/gcc/cse.cc
> index f9603fdfd43..9fd51ca2832 100644
> --- a/gcc/cse.cc
> +++ b/gcc/cse.cc
> @@ -4128,13 +4128,17 @@ struct set
>   unsigned dest_hash;
>   /* The SET_DEST, with SUBREG, etc., stripped.  */
>   rtx inner_dest;
> +  /* Original machine mode, in case it becomes a CONST_INT.  */
> +  ENUM_BITFIELD(machine_mode) mode : MACHINE_MODE_BITSIZE;
>   /* Nonzero if the SET_SRC is in memory.  */
> -  char src_in_memory;
> +  unsigned int src_in_memory : 1;
>   /* Nonzero if the SET_SRC contains something
>      whose value cannot be predicted and understood.  */
> -  char src_volatile;
> -  /* Original machine mode, in case it becomes a CONST_INT.  */
> -  ENUM_BITFIELD(machine_mode) mode : MACHINE_MODE_BITSIZE;
> +  unsigned int src_volatile : 1;
> +  /* Nonzero if RTL is an artifical set that has been created to describe
> +     part of an insn's effect.  Zero means that RTL appears directly in
> +     the insn pattern.  */
> +  unsigned int is_fake_set : 1;
>   /* Hash value of constant equivalent for SET_SRC.  */
>   unsigned src_const_hash;
>   /* A constant equivalent for SET_SRC, if any.  */
> @@ -4229,12 +4233,15 @@ try_back_substitute_reg (rtx set, rtx_insn *insn)
>     }
> }
> 
> -/* Add an entry containing RTL X into SETS.  */
> +/* Add an entry containing RTL X into SETS.  IS_FAKE_SET is true if X is
> +   an artifical set that has been created to describe part of an insn's
> +   effect.  */
> static inline void
> -add_to_set (vec<struct set> *sets, rtx x)
> +add_to_set (vec<struct set> *sets, rtx x, bool is_fake_set)
> {
>   struct set entry = {};
>   entry.rtl = x;
> +  entry.is_fake_set = is_fake_set;
>   sets->safe_push (entry);
> }
> 
> @@ -4271,7 +4278,7 @@ find_sets_in_insn (rtx_insn *insn, vec<struct set> *psets)
>            && known_eq (GET_MODE_NUNITS (GET_MODE (SET_SRC (x))), 1)))
>    {
>      /* First register the vector itself.  */
> -      add_to_set (psets, x);
> +      add_to_set (psets, x, false);
>      rtx src = SET_SRC (x);
>      /* Go over the constants of the CONST_VECTOR in forward order, to
>         put them in the same order in the SETS array.  */
> @@ -4281,11 +4288,12 @@ find_sets_in_insn (rtx_insn *insn, vec<struct set> *psets)
>         used to tell CSE how to get to a particular constant.  */
>          rtx y = simplify_gen_vec_select (SET_DEST (x), i);
>          gcc_assert (y);
> -          add_to_set (psets, gen_rtx_SET (y, CONST_VECTOR_ELT (src, i)));
> +          rtx set = gen_rtx_SET (y, CONST_VECTOR_ELT (src, i));
> +          add_to_set (psets, set, true);
>        }
>    }
>       else
> -    add_to_set (psets, x);
> +    add_to_set (psets, x, false);
>     }
>   else if (GET_CODE (x) == PARALLEL)
>     {
> @@ -4306,7 +4314,7 @@ find_sets_in_insn (rtx_insn *insn, vec<struct set> *psets)
>          else if (GET_CODE (SET_SRC (y)) == CALL)
>        ;
>          else
> -        add_to_set (psets, y);
> +        add_to_set (psets, y, false);
>        }
>    }
>     }
> @@ -4616,6 +4624,7 @@ cse_insn (rtx_insn *insn)
>       int src_related_regcost = MAX_COST;
>       int src_elt_regcost = MAX_COST;
>       scalar_int_mode int_mode;
> +      bool is_fake_set = sets[i].is_fake_set;
> 
>       dest = SET_DEST (sets[i].rtl);
>       src = SET_SRC (sets[i].rtl);
> @@ -4627,7 +4636,7 @@ cse_insn (rtx_insn *insn)
>       mode = GET_MODE (src) == VOIDmode ? GET_MODE (dest) : GET_MODE (src);
>       sets[i].mode = mode;
> 
> -      if (src_eqv)
> +      if (!is_fake_set && src_eqv)
>    {
>      machine_mode eqvmode = mode;
>      if (GET_CODE (dest) == STRICT_LOW_PART)
> @@ -4648,7 +4657,7 @@ cse_insn (rtx_insn *insn)
>       /* If this is a STRICT_LOW_PART assignment, src_eqv corresponds to the
>     value of the INNER register, not the destination.  So it is not
>     a valid substitution for the source.  But save it for later.  */
> -      if (GET_CODE (dest) == STRICT_LOW_PART)
> +      if (is_fake_set || GET_CODE (dest) == STRICT_LOW_PART)
>    src_eqv_here = 0;
>       else
>    src_eqv_here = src_eqv;
> @@ -5158,7 +5167,7 @@ cse_insn (rtx_insn *insn)
> 
>       /* Terminate loop when replacement made.  This must terminate since
>          the current contents will be tested and will always be valid.  */
> -      while (1)
> +      while (!is_fake_set)
>    {
>      rtx trial;
> 
> @@ -5425,7 +5434,8 @@ cse_insn (rtx_insn *insn)
>     with the head of the class.  If we do not do this, we will have
>     both registers live over a portion of the basic block.  This way,
>     their lifetimes will likely abut instead of overlapping.  */
> -      if (REG_P (dest)
> +      if (!is_fake_set
> +      && REG_P (dest)
>      && REGNO_QTY_VALID_P (REGNO (dest)))
>    {
>      int dest_q = REG_QTY (REGNO (dest));
> diff --git a/gcc/testsuite/gcc.dg/rtl/aarch64/pr111702.c b/gcc/testsuite/gcc.dg/rtl/aarch64/pr111702.c
> new file mode 100644
> index 00000000000..8af2c54de3c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/rtl/aarch64/pr111702.c
> @@ -0,0 +1,43 @@
> +/* { dg-do compile { target aarch64*-*-* } } */
> +/* { dg-options "-O2" } */
> +
> +extern int data[];
> +
> +void __RTL (startwith ("vregs")) foo ()
> +{
> +  (function "foo"
> +    (insn-chain
> +      (block 2
> +    (edge-from entry (flags "FALLTHRU"))
> +    (cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK)
> +    (insn 4 (set (reg:V16QI <0>)
> +             (const_vector:V16QI [(const_int 1) (const_int 1)
> +                      (const_int 0) (const_int 0)
> +                      (const_int 1) (const_int 1)
> +                      (const_int 0) (const_int 0)
> +                      (const_int 1) (const_int 1)
> +                      (const_int 0) (const_int 0)
> +                      (const_int 1) (const_int 1)
> +                      (const_int 0) (const_int 0)])))
> +    (insn 5 (set (reg:V2SI v0)
> +             (const_vector:V2SI [(const_int 1) (const_int 0)])))
> +    (insn 6 (set (reg:V16QI v1)
> +             (const_vector:V16QI [(const_int 0) (const_int 0)
> +                      (const_int 1) (const_int 1)
> +                      (const_int 0) (const_int 0)
> +                      (const_int 1) (const_int 1)
> +                      (const_int 0) (const_int 0)
> +                      (const_int 1) (const_int 1)
> +                      (const_int 0) (const_int 0)
> +                      (const_int 1) (const_int 1)])))
> +        (insn 7 (set (reg:QI x0) (subreg:QI (reg:V16QI <0>) 0))
> +        (expr_list:REG_EQUAL (const_int 1) (nil)))
> +        (insn 8 (use (reg:V16QI <0>)))
> +        (insn 9 (use (reg:V2SI v0)))
> +        (insn 10 (use (reg:V16QI v1)))
> +        (insn 11 (use (reg:QI x0)))
> +    (edge-to exit (flags "FALLTHRU"))
> +      ) ;; block 2
> +    ) ;; insn-chain
> +  ) ;; function
> +}
> --
> 2.25.1
>

Prathamesh Kulkarni Dec. 26, 2023, 9:19 p.m. UTC | #2

On Thu, 21 Dec 2023 at 00:00, Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> If cse sees:
>
>   (set (reg R) (const_vector [A B ...]))
>
> it creates fake sets of the form:
>
>   (set R[0] A)
>   (set R[1] B)
>   ...
>
> (with R[n] replaced by appropriate rtl) and then adds them to the tables
> in the same way as for normal sets.  This allows a sequence like:
>
>   (set (reg R2) A)
>   ...(reg R2)...
>
> to try to use R[0] instead of (reg R2).
>
> But the pass was taking the analogy too far, and was trying to simplify
> these fake sets based on costs.  That is, if there was an earlier:
>
>   (set (reg T) A)
>
> the pass would go to considerable effort trying to work out whether:
>
>   (set R[0] A)
>
> or:
>
>   (set R[0] (reg T))
>
> was more profitable.  This included running validate*_change on the sets,
> which has no meaning given that the sets are not part of the insn.
>
> In this example, the equivalence A == T is already known, and the
> purpose of the fake sets is to add A == T == R[0].  We can do that
> just as easily (or, as the PR shows, more easily) if we keep the
> original form of the fake set, with A instead of T.
>
> The problem in the PR occurred if we had:
>
> (1) something that establishes an equivalence between a vector V1 of
>     M-bit scalar integers and a hard register H
>
> (2) something that establishes an equivalence between a vector V2 of
>     N-bit scalar integers, where N<M and where V2 contains at least 2
>     instances of V1[0]
>
> (1) established an equivalence between V1[0] and H in M bits.
> (2) then triggered a search for an equivalence of V1[0] in N bits.
> This included:
>
>       /* See if we have a CONST_INT that is already in a register in a
>          wider mode.  */
>
> which (correctly) found that the low N bits of H contain the right value.
> But because it came from a wider mode, this equivalence between N-bit H
> and N-bit V1[0] was not yet in the hash table.  It therefore survived
> the purge in:
>
>       /* At this point, ELT, if nonzero, points to a class of expressions
>          equivalent to the source of this SET and SRC, SRC_EQV, SRC_FOLDED,
>          and SRC_RELATED, if nonzero, each contain additional equivalent
>          expressions.  Prune these latter expressions by deleting expressions
>          already in the equivalence class.
>
> And since more than 1 set found the same N-bit equivalence between
> H and V1[0], the pass tried to add it more than once.
>
> Things were already wrong at this stage, but an ICE was only triggered
> later when trying to merge this N-bit equivalence with another one.
>
> We could avoid the double registration by adding:
>
>   for (elt = classp; elt; elt = elt->next_same_value)
>     if (rtx_equal_p (elt->exp, x))
>       return elt;
>
> to insert_with_costs, or by making cse_insn check whether previous
> sets have recorded the same equivalence.  The latter seems more
> appealing from a compile-time perspective.  But in this case,
> doing that would be adding yet more spurious work to the handling
> of fake sets.
>
> The handling of fake sets therefore seems like the more fundamental bug.
>
> While there, the patch also makes sure that we don't apply REG_EQUAL
> notes to these fake sets.  They only describe the "real" (first) set.
Hi Richard,
Thanks for the detailed explanation and fix!

Thanks,
Prathamesh
>
> gcc/
>         PR rtl-optimization/111702
>         * cse.cc (set::mode): Move earlier.
>         (set::src_in_memory, set::src_volatile): Convert to bitfields.
>         (set::is_fake_set): New member variable.
>         (add_to_set): Add an is_fake_set parameter.
>         (find_sets_in_insn): Update calls accordingly.
>         (cse_insn): Do not apply REG_EQUAL notes to fake sets.  Do not
>         try to optimize them either, or validate changes to them.
>
> gcc/
>         PR rtl-optimization/111702
>         * gcc.dg/rtl/aarch64/pr111702.c: New test.
> ---
>  gcc/cse.cc                                  | 38 +++++++++++-------
>  gcc/testsuite/gcc.dg/rtl/aarch64/pr111702.c | 43 +++++++++++++++++++++
>  2 files changed, 67 insertions(+), 14 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/rtl/aarch64/pr111702.c
>
> diff --git a/gcc/cse.cc b/gcc/cse.cc
> index f9603fdfd43..9fd51ca2832 100644
> --- a/gcc/cse.cc
> +++ b/gcc/cse.cc
> @@ -4128,13 +4128,17 @@ struct set
>    unsigned dest_hash;
>    /* The SET_DEST, with SUBREG, etc., stripped.  */
>    rtx inner_dest;
> +  /* Original machine mode, in case it becomes a CONST_INT.  */
> +  ENUM_BITFIELD(machine_mode) mode : MACHINE_MODE_BITSIZE;
>    /* Nonzero if the SET_SRC is in memory.  */
> -  char src_in_memory;
> +  unsigned int src_in_memory : 1;
>    /* Nonzero if the SET_SRC contains something
>       whose value cannot be predicted and understood.  */
> -  char src_volatile;
> -  /* Original machine mode, in case it becomes a CONST_INT.  */
> -  ENUM_BITFIELD(machine_mode) mode : MACHINE_MODE_BITSIZE;
> +  unsigned int src_volatile : 1;
> +  /* Nonzero if RTL is an artifical set that has been created to describe
> +     part of an insn's effect.  Zero means that RTL appears directly in
> +     the insn pattern.  */
> +  unsigned int is_fake_set : 1;
>    /* Hash value of constant equivalent for SET_SRC.  */
>    unsigned src_const_hash;
>    /* A constant equivalent for SET_SRC, if any.  */
> @@ -4229,12 +4233,15 @@ try_back_substitute_reg (rtx set, rtx_insn *insn)
>      }
>  }
>
> -/* Add an entry containing RTL X into SETS.  */
> +/* Add an entry containing RTL X into SETS.  IS_FAKE_SET is true if X is
> +   an artifical set that has been created to describe part of an insn's
> +   effect.  */
>  static inline void
> -add_to_set (vec<struct set> *sets, rtx x)
> +add_to_set (vec<struct set> *sets, rtx x, bool is_fake_set)
>  {
>    struct set entry = {};
>    entry.rtl = x;
> +  entry.is_fake_set = is_fake_set;
>    sets->safe_push (entry);
>  }
>
> @@ -4271,7 +4278,7 @@ find_sets_in_insn (rtx_insn *insn, vec<struct set> *psets)
>                     && known_eq (GET_MODE_NUNITS (GET_MODE (SET_SRC (x))), 1)))
>         {
>           /* First register the vector itself.  */
> -         add_to_set (psets, x);
> +         add_to_set (psets, x, false);
>           rtx src = SET_SRC (x);
>           /* Go over the constants of the CONST_VECTOR in forward order, to
>              put them in the same order in the SETS array.  */
> @@ -4281,11 +4288,12 @@ find_sets_in_insn (rtx_insn *insn, vec<struct set> *psets)
>                  used to tell CSE how to get to a particular constant.  */
>               rtx y = simplify_gen_vec_select (SET_DEST (x), i);
>               gcc_assert (y);
> -             add_to_set (psets, gen_rtx_SET (y, CONST_VECTOR_ELT (src, i)));
> +             rtx set = gen_rtx_SET (y, CONST_VECTOR_ELT (src, i));
> +             add_to_set (psets, set, true);
>             }
>         }
>        else
> -       add_to_set (psets, x);
> +       add_to_set (psets, x, false);
>      }
>    else if (GET_CODE (x) == PARALLEL)
>      {
> @@ -4306,7 +4314,7 @@ find_sets_in_insn (rtx_insn *insn, vec<struct set> *psets)
>               else if (GET_CODE (SET_SRC (y)) == CALL)
>                 ;
>               else
> -               add_to_set (psets, y);
> +               add_to_set (psets, y, false);
>             }
>         }
>      }
> @@ -4616,6 +4624,7 @@ cse_insn (rtx_insn *insn)
>        int src_related_regcost = MAX_COST;
>        int src_elt_regcost = MAX_COST;
>        scalar_int_mode int_mode;
> +      bool is_fake_set = sets[i].is_fake_set;
>
>        dest = SET_DEST (sets[i].rtl);
>        src = SET_SRC (sets[i].rtl);
> @@ -4627,7 +4636,7 @@ cse_insn (rtx_insn *insn)
>        mode = GET_MODE (src) == VOIDmode ? GET_MODE (dest) : GET_MODE (src);
>        sets[i].mode = mode;
>
> -      if (src_eqv)
> +      if (!is_fake_set && src_eqv)
>         {
>           machine_mode eqvmode = mode;
>           if (GET_CODE (dest) == STRICT_LOW_PART)
> @@ -4648,7 +4657,7 @@ cse_insn (rtx_insn *insn)
>        /* If this is a STRICT_LOW_PART assignment, src_eqv corresponds to the
>          value of the INNER register, not the destination.  So it is not
>          a valid substitution for the source.  But save it for later.  */
> -      if (GET_CODE (dest) == STRICT_LOW_PART)
> +      if (is_fake_set || GET_CODE (dest) == STRICT_LOW_PART)
>         src_eqv_here = 0;
>        else
>         src_eqv_here = src_eqv;
> @@ -5158,7 +5167,7 @@ cse_insn (rtx_insn *insn)
>
>        /* Terminate loop when replacement made.  This must terminate since
>           the current contents will be tested and will always be valid.  */
> -      while (1)
> +      while (!is_fake_set)
>         {
>           rtx trial;
>
> @@ -5425,7 +5434,8 @@ cse_insn (rtx_insn *insn)
>          with the head of the class.  If we do not do this, we will have
>          both registers live over a portion of the basic block.  This way,
>          their lifetimes will likely abut instead of overlapping.  */
> -      if (REG_P (dest)
> +      if (!is_fake_set
> +         && REG_P (dest)
>           && REGNO_QTY_VALID_P (REGNO (dest)))
>         {
>           int dest_q = REG_QTY (REGNO (dest));
> diff --git a/gcc/testsuite/gcc.dg/rtl/aarch64/pr111702.c b/gcc/testsuite/gcc.dg/rtl/aarch64/pr111702.c
> new file mode 100644
> index 00000000000..8af2c54de3c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/rtl/aarch64/pr111702.c
> @@ -0,0 +1,43 @@
> +/* { dg-do compile { target aarch64*-*-* } } */
> +/* { dg-options "-O2" } */
> +
> +extern int data[];
> +
> +void __RTL (startwith ("vregs")) foo ()
> +{
> +  (function "foo"
> +    (insn-chain
> +      (block 2
> +       (edge-from entry (flags "FALLTHRU"))
> +       (cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK)
> +       (insn 4 (set (reg:V16QI <0>)
> +                    (const_vector:V16QI [(const_int 1) (const_int 1)
> +                                         (const_int 0) (const_int 0)
> +                                         (const_int 1) (const_int 1)
> +                                         (const_int 0) (const_int 0)
> +                                         (const_int 1) (const_int 1)
> +                                         (const_int 0) (const_int 0)
> +                                         (const_int 1) (const_int 1)
> +                                         (const_int 0) (const_int 0)])))
> +       (insn 5 (set (reg:V2SI v0)
> +                    (const_vector:V2SI [(const_int 1) (const_int 0)])))
> +       (insn 6 (set (reg:V16QI v1)
> +                    (const_vector:V16QI [(const_int 0) (const_int 0)
> +                                         (const_int 1) (const_int 1)
> +                                         (const_int 0) (const_int 0)
> +                                         (const_int 1) (const_int 1)
> +                                         (const_int 0) (const_int 0)
> +                                         (const_int 1) (const_int 1)
> +                                         (const_int 0) (const_int 0)
> +                                         (const_int 1) (const_int 1)])))
> +        (insn 7 (set (reg:QI x0) (subreg:QI (reg:V16QI <0>) 0))
> +               (expr_list:REG_EQUAL (const_int 1) (nil)))
> +        (insn 8 (use (reg:V16QI <0>)))
> +        (insn 9 (use (reg:V2SI v0)))
> +        (insn 10 (use (reg:V16QI v1)))
> +        (insn 11 (use (reg:QI x0)))
> +       (edge-to exit (flags "FALLTHRU"))
> +      ) ;; block 2
> +    ) ;; insn-chain
> +  ) ;; function
> +}
> --
> 2.25.1
>

diff --git a/gcc/cse.cc b/gcc/cse.cc
index f9603fdfd43..9fd51ca2832 100644
--- a/gcc/cse.cc
+++ b/gcc/cse.cc
@@ -4128,13 +4128,17 @@  struct set
   unsigned dest_hash;
   /* The SET_DEST, with SUBREG, etc., stripped.  */
   rtx inner_dest;
+  /* Original machine mode, in case it becomes a CONST_INT.  */
+  ENUM_BITFIELD(machine_mode) mode : MACHINE_MODE_BITSIZE;
   /* Nonzero if the SET_SRC is in memory.  */
-  char src_in_memory;
+  unsigned int src_in_memory : 1;
   /* Nonzero if the SET_SRC contains something
      whose value cannot be predicted and understood.  */
-  char src_volatile;
-  /* Original machine mode, in case it becomes a CONST_INT.  */
-  ENUM_BITFIELD(machine_mode) mode : MACHINE_MODE_BITSIZE;
+  unsigned int src_volatile : 1;
+  /* Nonzero if RTL is an artifical set that has been created to describe
+     part of an insn's effect.  Zero means that RTL appears directly in
+     the insn pattern.  */
+  unsigned int is_fake_set : 1;
   /* Hash value of constant equivalent for SET_SRC.  */
   unsigned src_const_hash;
   /* A constant equivalent for SET_SRC, if any.  */
@@ -4229,12 +4233,15 @@  try_back_substitute_reg (rtx set, rtx_insn *insn)
     }
 }
 
-/* Add an entry containing RTL X into SETS.  */
+/* Add an entry containing RTL X into SETS.  IS_FAKE_SET is true if X is
+   an artifical set that has been created to describe part of an insn's
+   effect.  */
 static inline void
-add_to_set (vec<struct set> *sets, rtx x)
+add_to_set (vec<struct set> *sets, rtx x, bool is_fake_set)
 {
   struct set entry = {};
   entry.rtl = x;
+  entry.is_fake_set = is_fake_set;
   sets->safe_push (entry);
 }
 
@@ -4271,7 +4278,7 @@  find_sets_in_insn (rtx_insn *insn, vec<struct set> *psets)
 		    && known_eq (GET_MODE_NUNITS (GET_MODE (SET_SRC (x))), 1)))
 	{
 	  /* First register the vector itself.  */
-	  add_to_set (psets, x);
+	  add_to_set (psets, x, false);
 	  rtx src = SET_SRC (x);
 	  /* Go over the constants of the CONST_VECTOR in forward order, to
 	     put them in the same order in the SETS array.  */
@@ -4281,11 +4288,12 @@  find_sets_in_insn (rtx_insn *insn, vec<struct set> *psets)
 		 used to tell CSE how to get to a particular constant.  */
 	      rtx y = simplify_gen_vec_select (SET_DEST (x), i);
 	      gcc_assert (y);
-	      add_to_set (psets, gen_rtx_SET (y, CONST_VECTOR_ELT (src, i)));
+	      rtx set = gen_rtx_SET (y, CONST_VECTOR_ELT (src, i));
+	      add_to_set (psets, set, true);
 	    }
 	}
       else
-	add_to_set (psets, x);
+	add_to_set (psets, x, false);
     }
   else if (GET_CODE (x) == PARALLEL)
     {
@@ -4306,7 +4314,7 @@  find_sets_in_insn (rtx_insn *insn, vec<struct set> *psets)
 	      else if (GET_CODE (SET_SRC (y)) == CALL)
 		;
 	      else
-		add_to_set (psets, y);
+		add_to_set (psets, y, false);
 	    }
 	}
     }
@@ -4616,6 +4624,7 @@  cse_insn (rtx_insn *insn)
       int src_related_regcost = MAX_COST;
       int src_elt_regcost = MAX_COST;
       scalar_int_mode int_mode;
+      bool is_fake_set = sets[i].is_fake_set;
 
       dest = SET_DEST (sets[i].rtl);
       src = SET_SRC (sets[i].rtl);
@@ -4627,7 +4636,7 @@  cse_insn (rtx_insn *insn)
       mode = GET_MODE (src) == VOIDmode ? GET_MODE (dest) : GET_MODE (src);
       sets[i].mode = mode;
 
-      if (src_eqv)
+      if (!is_fake_set && src_eqv)
 	{
 	  machine_mode eqvmode = mode;
 	  if (GET_CODE (dest) == STRICT_LOW_PART)
@@ -4648,7 +4657,7 @@  cse_insn (rtx_insn *insn)
       /* If this is a STRICT_LOW_PART assignment, src_eqv corresponds to the
 	 value of the INNER register, not the destination.  So it is not
 	 a valid substitution for the source.  But save it for later.  */
-      if (GET_CODE (dest) == STRICT_LOW_PART)
+      if (is_fake_set || GET_CODE (dest) == STRICT_LOW_PART)
 	src_eqv_here = 0;
       else
 	src_eqv_here = src_eqv;
@@ -5158,7 +5167,7 @@  cse_insn (rtx_insn *insn)
 
       /* Terminate loop when replacement made.  This must terminate since
          the current contents will be tested and will always be valid.  */
-      while (1)
+      while (!is_fake_set)
 	{
 	  rtx trial;
 
@@ -5425,7 +5434,8 @@  cse_insn (rtx_insn *insn)
 	 with the head of the class.  If we do not do this, we will have
 	 both registers live over a portion of the basic block.  This way,
 	 their lifetimes will likely abut instead of overlapping.  */
-      if (REG_P (dest)
+      if (!is_fake_set
+	  && REG_P (dest)
 	  && REGNO_QTY_VALID_P (REGNO (dest)))
 	{
 	  int dest_q = REG_QTY (REGNO (dest));
diff --git a/gcc/testsuite/gcc.dg/rtl/aarch64/pr111702.c b/gcc/testsuite/gcc.dg/rtl/aarch64/pr111702.c
new file mode 100644
index 00000000000..8af2c54de3c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/rtl/aarch64/pr111702.c
@@ -0,0 +1,43 @@ 
+/* { dg-do compile { target aarch64*-*-* } } */
+/* { dg-options "-O2" } */
+
+extern int data[];
+
+void __RTL (startwith ("vregs")) foo ()
+{
+  (function "foo"
+    (insn-chain
+      (block 2
+	(edge-from entry (flags "FALLTHRU"))
+	(cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK)
+	(insn 4 (set (reg:V16QI <0>)
+		     (const_vector:V16QI [(const_int 1) (const_int 1)
+					  (const_int 0) (const_int 0)
+					  (const_int 1) (const_int 1)
+					  (const_int 0) (const_int 0)
+					  (const_int 1) (const_int 1)
+					  (const_int 0) (const_int 0)
+					  (const_int 1) (const_int 1)
+					  (const_int 0) (const_int 0)])))
+	(insn 5 (set (reg:V2SI v0)
+		     (const_vector:V2SI [(const_int 1) (const_int 0)])))
+	(insn 6 (set (reg:V16QI v1)
+		     (const_vector:V16QI [(const_int 0) (const_int 0)
+					  (const_int 1) (const_int 1)
+					  (const_int 0) (const_int 0)
+					  (const_int 1) (const_int 1)
+					  (const_int 0) (const_int 0)
+					  (const_int 1) (const_int 1)
+					  (const_int 0) (const_int 0)
+					  (const_int 1) (const_int 1)])))
+        (insn 7 (set (reg:QI x0) (subreg:QI (reg:V16QI <0>) 0))
+		(expr_list:REG_EQUAL (const_int 1) (nil)))
+        (insn 8 (use (reg:V16QI <0>)))
+        (insn 9 (use (reg:V2SI v0)))
+        (insn 10 (use (reg:V16QI v1)))
+        (insn 11 (use (reg:QI x0)))
+	(edge-to exit (flags "FALLTHRU"))
+      ) ;; block 2
+    ) ;; insn-chain
+  ) ;; function
+}

cse: Fix handling of fake vec_select sets [PR111702]

Commit Message

Comments

Patch