diff mbox series

ifcvt: Clarify if_info.original_cost.

Message ID 57bb6ce5-79c3-4b08-b524-e807b9ac431b@gmail.com
State New
Headers show
Series ifcvt: Clarify if_info.original_cost. | expand

Commit Message

Robin Dapp May 31, 2024, 3:03 p.m. UTC
Hi,

before noce_find_if_block processes a block it sets up an if_info
structure that holds the original costs.  At that point the costs of
the then/else blocks have not been added so we only care about the
"if" cost.

The code originally used BRANCH_COST for that but was then changed
to COST_N_INSNS (2) - a compare and a jump.
This patch computes the jump costs via
  insn_cost (if_info.jump, ...)
which is supposed to incorporate the branch costs and, in case of a CC
comparison,
  pattern_cost (if_info.cond, ...)
which is supposed to account for the CC creation.

For compare_and_jump patterns insn_cost should have already computed
the right cost.

Does this "split" make sense, generally?

Bootstrapped and regtested on x86, aarch64 and power10.  Regtested
on riscv.

Regards
 Robin

gcc/ChangeLog:

	* ifcvt.cc (noce_process_if_block): Subtract condition pattern
	cost if applicable.
	(noce_find_if_block): Use insn_cost and pattern_cost for
	original cost.
---
 gcc/ifcvt.cc | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

Comments

Jeff Law June 1, 2024, 4:05 a.m. UTC | #1
On 5/31/24 9:03 AM, Robin Dapp wrote:
> Hi,
> 
> before noce_find_if_block processes a block it sets up an if_info
> structure that holds the original costs.  At that point the costs of
> the then/else blocks have not been added so we only care about the
> "if" cost.
> 
> The code originally used BRANCH_COST for that but was then changed
> to COST_N_INSNS (2) - a compare and a jump.
> This patch computes the jump costs via
>    insn_cost (if_info.jump, ...)
> which is supposed to incorporate the branch costs and, in case of a CC
> comparison,
>    pattern_cost (if_info.cond, ...)
> which is supposed to account for the CC creation.
> 
> For compare_and_jump patterns insn_cost should have already computed
> the right cost.
> 
> Does this "split" make sense, generally?
> 
> Bootstrapped and regtested on x86, aarch64 and power10.  Regtested
> on riscv.
> 
> Regards
>   Robin
> 
> gcc/ChangeLog:
> 
> 	* ifcvt.cc (noce_process_if_block): Subtract condition pattern
> 	cost if applicable.
> 	(noce_find_if_block): Use insn_cost and pattern_cost for
> 	original cost.
OK.  Obviously we'll need to be on the lookout for regressions.  My bet 
is on s390 since you already tested the x86, aarch64 & p10 targets :-)


jeff
Stefan Schulze Frielinghaus June 2, 2024, 6:35 a.m. UTC | #2
On Fri, May 31, 2024 at 10:05:55PM -0600, Jeff Law wrote:
> 
> 
> On 5/31/24 9:03 AM, Robin Dapp wrote:
> > Hi,
> > 
> > before noce_find_if_block processes a block it sets up an if_info
> > structure that holds the original costs.  At that point the costs of
> > the then/else blocks have not been added so we only care about the
> > "if" cost.
> > 
> > The code originally used BRANCH_COST for that but was then changed
> > to COST_N_INSNS (2) - a compare and a jump.
> > This patch computes the jump costs via
> >    insn_cost (if_info.jump, ...)
> > which is supposed to incorporate the branch costs and, in case of a CC
> > comparison,
> >    pattern_cost (if_info.cond, ...)
> > which is supposed to account for the CC creation.
> > 
> > For compare_and_jump patterns insn_cost should have already computed
> > the right cost.
> > 
> > Does this "split" make sense, generally?
> > 
> > Bootstrapped and regtested on x86, aarch64 and power10.  Regtested
> > on riscv.
> > 
> > Regards
> >   Robin
> > 
> > gcc/ChangeLog:
> > 
> > 	* ifcvt.cc (noce_process_if_block): Subtract condition pattern
> > 	cost if applicable.
> > 	(noce_find_if_block): Use insn_cost and pattern_cost for
> > 	original cost.
> OK.  Obviously we'll need to be on the lookout for regressions.  My bet is
> on s390 since you already tested the x86, aarch64 & p10 targets :-)

I just gave it a try on s390 where bootstrap and regtest were successful.

Cheers,
Stefan

> 
> 
> jeff
>
Richard Sandiford June 3, 2024, 11:48 a.m. UTC | #3
Robin Dapp <rdapp.gcc@gmail.com> writes:
> Hi,
>
> before noce_find_if_block processes a block it sets up an if_info
> structure that holds the original costs.  At that point the costs of
> the then/else blocks have not been added so we only care about the
> "if" cost.
>
> The code originally used BRANCH_COST for that but was then changed
> to COST_N_INSNS (2) - a compare and a jump.
> This patch computes the jump costs via
>   insn_cost (if_info.jump, ...)
> which is supposed to incorporate the branch costs and, in case of a CC
> comparison,
>   pattern_cost (if_info.cond, ...)
> which is supposed to account for the CC creation.
>
> For compare_and_jump patterns insn_cost should have already computed
> the right cost.
>
> Does this "split" make sense, generally?
>
> Bootstrapped and regtested on x86, aarch64 and power10.  Regtested
> on riscv.
>
> Regards
>  Robin
>
> gcc/ChangeLog:
>
> 	* ifcvt.cc (noce_process_if_block): Subtract condition pattern
> 	cost if applicable.
> 	(noce_find_if_block): Use insn_cost and pattern_cost for
> 	original cost.
> ---
>  gcc/ifcvt.cc | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
> index 58ed42673e5..305b9faed38 100644
> --- a/gcc/ifcvt.cc
> +++ b/gcc/ifcvt.cc
> @@ -3940,7 +3940,9 @@ noce_process_if_block (struct noce_if_info *if_info)
>       ??? Actually, instead of the branch instruction costs we might want
>       to use COSTS_N_INSNS (BRANCH_COST ()) as in other places.  */
>  
> -  unsigned potential_cost = if_info->original_cost - COSTS_N_INSNS (1);
> +  unsigned potential_cost = if_info->original_cost;
> +  if (cc_in_cond (if_info->cond))
> +    potential_cost -= pattern_cost (if_info->cond, if_info->speed_p);
>    unsigned old_cost = if_info->original_cost;
>    if (!else_bb
>        && HAVE_conditional_move
> @@ -4703,11 +4705,13 @@ noce_find_if_block (basic_block test_bb, edge then_edge, edge else_edge,
>      = targetm.max_noce_ifcvt_seq_cost (then_edge);
>    /* We'll add in the cost of THEN_BB and ELSE_BB later, when we check
>       that they are valid to transform.  We can't easily get back to the insn
> -     for COND (and it may not exist if we had to canonicalize to get COND),
> -     and jump_insns are always given a cost of 1 by seq_cost, so treat
> -     both instructions as having cost COSTS_N_INSNS (1).  */
> -  if_info.original_cost = COSTS_N_INSNS (2);
> -
> +     for COND (and it may not exist if we had to canonicalize to get COND).
> +     Here we assume one CC compare insn (if the target uses CC) and one
> +     jump insn that is costed via insn_cost.  It is assumed that the
> +     costs of a jump insn are dependent on the branch costs.  */
> +  if (cc_in_cond (if_info.cond))
> +    if_info.original_cost = pattern_cost (if_info.cond, if_info.speed_p);
> +  if_info.original_cost += insn_cost (if_info.jump, if_info.speed_p);
>  
>    /* Do the real work.  */

Is there any way we can avoid using pattern_cost here?  Using it means
that we can make use of targetm.insn_cost for the jump but circumvent
it for the condition, giving a bit of a mixed metric.

(I realise there are existing calls to pattern_cost in ifcvt.cc,
but if possible I think we should try to avoid adding more.)

Thanks,
Richard
Robin Dapp June 7, 2024, 9:23 a.m. UTC | #4
> Is there any way we can avoid using pattern_cost here?  Using it means
> that we can make use of targetm.insn_cost for the jump but circumvent
> it for the condition, giving a bit of a mixed metric.
> 
> (I realise there are existing calls to pattern_cost in ifcvt.cc,
> but if possible I think we should try to avoid adding more.)

Yes, I believe there is.  In addition, what I did with
if_info->cond wasn't what I intended to do.

The whole point of the exercise is that noce_convert_multiple_sets
can re-use the CC comparison that is already present (because it
is used in the jump pattern).  Therefore I want to split costs
into a jump part and a CC-setting part so the final costing
decision for multiple sets can be:

 insn_cost (jump) + n * insn_cost (set)
vs
 n * insn_cost ("cmov")

Still, the original costs should be:
 insn_cost (set_cc) + insn_cost (jump)
and with the split we can just remove insn_cost (set_cc) before
the multiple-set cost comparison and re-add it afterwards.

For non-CC targets this is not necessary.

So what I'd hope is better is to use
insn_cost (if_info.earliest_cond)
which is indeed the CC-set/comparison if it exists.

The attached v2 was bootstrapped and regtested on x86, aarch64 and
power10 and regtested on riscv64.

Regards
 Robin

gcc/ChangeLog:

	* ifcvt.cc (noce_process_if_block): Subtract condition pattern
	cost if applicable.
	(noce_find_if_block): Use insn_cost and pattern_cost for
	original cost.
---
 gcc/ifcvt.cc | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index 58ed42673e5..ebb838fd82c 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -3931,16 +3931,16 @@ noce_process_if_block (struct noce_if_info *if_info)
      to calculate a value for x.
      ??? For future expansion, further expand the "multiple X" rules.  */
 
-  /* First look for multiple SETS.  The original costs already include
-     a base cost of COSTS_N_INSNS (2): one instruction for the compare
-     (which we will be needing either way) and one instruction for the
-     branch.  When comparing costs we want to use the branch instruction
-     cost and the sets vs. the cmovs generated here.  Therefore subtract
-     the costs of the compare before checking.
-     ??? Actually, instead of the branch instruction costs we might want
-     to use COSTS_N_INSNS (BRANCH_COST ()) as in other places.  */
-
-  unsigned potential_cost = if_info->original_cost - COSTS_N_INSNS (1);
+  /* First look for multiple SETS.
+     The original costs already include costs for the jump insn as well
+     as for a CC comparison if there is any.
+     We want to allow the backend to re-use the existing CC comparison
+     and therefore don't consider it for the cost comparison (as it is
+     then needed for both the jump as well as the cmov sequence).  */
+
+  unsigned potential_cost = if_info->original_cost;
+  if (if_info->cond_earliest && if_info->jump != if_info->cond_earliest)
+    potential_cost -= insn_cost (if_info->cond_earliest, if_info->speed_p);
   unsigned old_cost = if_info->original_cost;
   if (!else_bb
       && HAVE_conditional_move
@@ -4703,11 +4703,12 @@ noce_find_if_block (basic_block test_bb, edge then_edge, edge else_edge,
     = targetm.max_noce_ifcvt_seq_cost (then_edge);
   /* We'll add in the cost of THEN_BB and ELSE_BB later, when we check
      that they are valid to transform.  We can't easily get back to the insn
-     for COND (and it may not exist if we had to canonicalize to get COND),
-     and jump_insns are always given a cost of 1 by seq_cost, so treat
-     both instructions as having cost COSTS_N_INSNS (1).  */
-  if_info.original_cost = COSTS_N_INSNS (2);
-
+     for COND (and it may not exist if we had to canonicalize to get COND).
+     jump insn that is costed via insn_cost.  It is assumed that the
+     costs of a jump insn are dependent on the branch costs.  */
+  if (if_info.cond_earliest && if_info.jump != if_info.cond_earliest)
+    if_info.original_cost = insn_cost (if_info.cond_earliest, if_info.speed_p);
+  if_info.original_cost += insn_cost (if_info.jump, if_info.speed_p);
 
   /* Do the real work.  */
Richard Sandiford June 10, 2024, 5:56 p.m. UTC | #5
Robin Dapp <rdapp.gcc@gmail.com> writes:
>> Is there any way we can avoid using pattern_cost here?  Using it means
>> that we can make use of targetm.insn_cost for the jump but circumvent
>> it for the condition, giving a bit of a mixed metric.
>> 
>> (I realise there are existing calls to pattern_cost in ifcvt.cc,
>> but if possible I think we should try to avoid adding more.)
>
> Yes, I believe there is.  In addition, what I did with
> if_info->cond wasn't what I intended to do.
>
> The whole point of the exercise is that noce_convert_multiple_sets
> can re-use the CC comparison that is already present (because it
> is used in the jump pattern).  Therefore I want to split costs
> into a jump part and a CC-setting part so the final costing
> decision for multiple sets can be:
>
>  insn_cost (jump) + n * insn_cost (set)
> vs
>  n * insn_cost ("cmov")
>
> Still, the original costs should be:
>  insn_cost (set_cc) + insn_cost (jump)
> and with the split we can just remove insn_cost (set_cc) before
> the multiple-set cost comparison and re-add it afterwards.
>
> For non-CC targets this is not necessary.
>
> So what I'd hope is better is to use
> insn_cost (if_info.earliest_cond)
> which is indeed the CC-set/comparison if it exists.

I agree that's probably good enough in practice.  It doesn't cope
with things like:

	    /* Handle sequences like:

	       (set op0 (xor X Y))
	       ...(eq|ne op0 (const_int 0))...

	       in which case:

	       (eq op0 (const_int 0)) reduces to (eq X Y)
	       (ne op0 (const_int 0)) reduces to (ne X Y)

	       This is the form used by MIPS16, for example.  */

but then neither does the current code.  But...

> The attached v2 was bootstrapped and regtested on x86, aarch64 and
> power10 and regtested on riscv64.
>
> Regards
>  Robin
>
> gcc/ChangeLog:
>
> 	* ifcvt.cc (noce_process_if_block): Subtract condition pattern
> 	cost if applicable.
> 	(noce_find_if_block): Use insn_cost and pattern_cost for
> 	original cost.
> ---
>  gcc/ifcvt.cc | 31 ++++++++++++++++---------------
>  1 file changed, 16 insertions(+), 15 deletions(-)
>
> diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
> index 58ed42673e5..ebb838fd82c 100644
> --- a/gcc/ifcvt.cc
> +++ b/gcc/ifcvt.cc
> @@ -3931,16 +3931,16 @@ noce_process_if_block (struct noce_if_info *if_info)
>       to calculate a value for x.
>       ??? For future expansion, further expand the "multiple X" rules.  */
>  
> -  /* First look for multiple SETS.  The original costs already include
> -     a base cost of COSTS_N_INSNS (2): one instruction for the compare
> -     (which we will be needing either way) and one instruction for the
> -     branch.  When comparing costs we want to use the branch instruction
> -     cost and the sets vs. the cmovs generated here.  Therefore subtract
> -     the costs of the compare before checking.
> -     ??? Actually, instead of the branch instruction costs we might want
> -     to use COSTS_N_INSNS (BRANCH_COST ()) as in other places.  */
> -
> -  unsigned potential_cost = if_info->original_cost - COSTS_N_INSNS (1);
> +  /* First look for multiple SETS.
> +     The original costs already include costs for the jump insn as well
> +     as for a CC comparison if there is any.
> +     We want to allow the backend to re-use the existing CC comparison
> +     and therefore don't consider it for the cost comparison (as it is
> +     then needed for both the jump as well as the cmov sequence).  */
> +
> +  unsigned potential_cost = if_info->original_cost;
> +  if (if_info->cond_earliest && if_info->jump != if_info->cond_earliest)
> +    potential_cost -= insn_cost (if_info->cond_earliest, if_info->speed_p);
>    unsigned old_cost = if_info->original_cost;
>    if (!else_bb
>        && HAVE_conditional_move

...why do we do the adjustment here?  Doesn't noce_convert_multiple_sets_1
know for certain (or at least with more certainty) whether any of the
new instructions use the old CC result?  It seems like we could record
that and do the adjustment around the call to
targetm.noce_conversion_profitable_p.

> @@ -4703,11 +4703,12 @@ noce_find_if_block (basic_block test_bb, edge then_edge, edge else_edge,
>      = targetm.max_noce_ifcvt_seq_cost (then_edge);
>    /* We'll add in the cost of THEN_BB and ELSE_BB later, when we check
>       that they are valid to transform.  We can't easily get back to the insn
> -     for COND (and it may not exist if we had to canonicalize to get COND),
> -     and jump_insns are always given a cost of 1 by seq_cost, so treat
> -     both instructions as having cost COSTS_N_INSNS (1).  */
> -  if_info.original_cost = COSTS_N_INSNS (2);
> -
> +     for COND (and it may not exist if we had to canonicalize to get COND).
> +     jump insn that is costed via insn_cost.  It is assumed that the
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Looks like this part of the comment got a bit garbled.

Thanks,
Richard

> +     costs of a jump insn are dependent on the branch costs.  */
> +  if (if_info.cond_earliest && if_info.jump != if_info.cond_earliest)
> +    if_info.original_cost = insn_cost (if_info.cond_earliest, if_info.speed_p);
> +  if_info.original_cost += insn_cost (if_info.jump, if_info.speed_p);
>  
>    /* Do the real work.  */
Robin Dapp June 11, 2024, 3:28 p.m. UTC | #6
The attached v3 tracks the use of cond_earliest as you suggested
and adds its cost in default_noce_conversion_profitable_p.

Bootstrapped and regtested on x86 and p10, aarch64 still
running.  Regtested on riscv64.

Regards
 Robin

Before noce_find_if_block processes a block it sets up an if_info
structure that holds the original costs.  At that point the costs of
the then/else blocks have not been added so we only care about the
"if" cost.

The code originally used BRANCH_COST for that but was then changed
to COST_N_INSNS (2) - a compare and a jump.

This patch computes the jump costs via
  insn_cost (if_info.jump, ...)
under the assumption that the target takes BRANCH_COST into account
when costing a jump instruction.

In noce_convert_multiple_sets, we keep track of the need for the initial
CC comparison.  If we needed it for the generated sequence we add its
cost in default_noce_conversion_profitable_p.

gcc/ChangeLog:

	* ifcvt.cc (default_noce_conversion_profitable_p):  Add cost of
	CC comparison.
	(noce_convert_multiple_sets_1): Set use_cond_earliest.
	(noce_process_if_block): Just use original cost.
	(noce_find_if_block): Use insn_cost (jump_insn).
	* ifcvt.h (struct noce_if_info): Add use_cond_earliest.
---
 gcc/ifcvt.cc | 37 ++++++++++++++++++++++---------------
 gcc/ifcvt.h  |  3 +++
 2 files changed, 25 insertions(+), 15 deletions(-)

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index 58ed42673e5..9b408eeb313 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -814,7 +814,16 @@ default_noce_conversion_profitable_p (rtx_insn *seq,
   /* Cost up the new sequence.  */
   unsigned int cost = seq_cost (seq, speed_p);
 
-  if (cost <= if_info->original_cost)
+  /* If the created sequence does not use cond_earliest (but the jump
+     does) add its cost to the original_cost here.  */
+  unsigned int cost_adjust = 0;
+
+  if (if_info->jump != if_info->cond_earliest
+      && !if_info->use_cond_earliest)
+    cost_adjust = insn_cost (if_info->cond_earliest,
+			     if_info->speed_p);
+
+  if (cost <= if_info->original_cost + cost_adjust)
     return true;
 
   /* When compiling for size, we can make a reasonably accurately guess
@@ -3780,6 +3789,7 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
 	  temp_dest = temp_dest2;
 	  if (!second_try && read_comparison)
 	    *last_needs_comparison = count;
+	  if_info->use_cond_earliest = true;
 	}
       else
 	{
@@ -3931,16 +3941,13 @@ noce_process_if_block (struct noce_if_info *if_info)
      to calculate a value for x.
      ??? For future expansion, further expand the "multiple X" rules.  */
 
-  /* First look for multiple SETS.  The original costs already include
-     a base cost of COSTS_N_INSNS (2): one instruction for the compare
-     (which we will be needing either way) and one instruction for the
-     branch.  When comparing costs we want to use the branch instruction
-     cost and the sets vs. the cmovs generated here.  Therefore subtract
-     the costs of the compare before checking.
-     ??? Actually, instead of the branch instruction costs we might want
-     to use COSTS_N_INSNS (BRANCH_COST ()) as in other places.  */
+  /* First look for multiple SETS.
+     The original costs already include costs for the jump insn as well
+     as for a CC comparison if there is any.
+     If a target re-uses the existing CC comparison we keep track of that
+     and add the costs in default default_noce_conversion_profitable_p.  */
 
-  unsigned potential_cost = if_info->original_cost - COSTS_N_INSNS (1);
+  unsigned potential_cost = if_info->original_cost;
   unsigned old_cost = if_info->original_cost;
   if (!else_bb
       && HAVE_conditional_move
@@ -4696,6 +4703,7 @@ noce_find_if_block (basic_block test_bb, edge then_edge, edge else_edge,
   gcc_assert (if_info.rev_cond == NULL_RTX
 	      || rev_cond_earliest == cond_earliest);
   if_info.cond_earliest = cond_earliest;
+  if_info.use_cond_earliest = false;
   if_info.jump = jump;
   if_info.then_else_reversed = then_else_reversed;
   if_info.speed_p = speed_p;
@@ -4703,11 +4711,10 @@ noce_find_if_block (basic_block test_bb, edge then_edge, edge else_edge,
     = targetm.max_noce_ifcvt_seq_cost (then_edge);
   /* We'll add in the cost of THEN_BB and ELSE_BB later, when we check
      that they are valid to transform.  We can't easily get back to the insn
-     for COND (and it may not exist if we had to canonicalize to get COND),
-     and jump_insns are always given a cost of 1 by seq_cost, so treat
-     both instructions as having cost COSTS_N_INSNS (1).  */
-  if_info.original_cost = COSTS_N_INSNS (2);
-
+     for COND (and it may not exist if we had to canonicalize to get COND).
+     It is assumed that the costs of a jump insn are dependent on the
+     branch costs.  */
+  if_info.original_cost += insn_cost (if_info.jump, if_info.speed_p);
 
   /* Do the real work.  */
 
diff --git a/gcc/ifcvt.h b/gcc/ifcvt.h
index 37147f99129..67d4f6a502e 100644
--- a/gcc/ifcvt.h
+++ b/gcc/ifcvt.h
@@ -63,6 +63,9 @@ struct noce_if_info
   /* New insns should be inserted before this one.  */
   rtx_insn *cond_earliest;
 
+  /* Whether or not COND_EARLIEST is used in the resulting sequence.  */
+  bool use_cond_earliest;
+
   /* Insns in the THEN and ELSE block.  There is always just this
      one insns in those blocks.  The insns are single_set insns.
      If there was no ELSE block, INSN_B is the last insn before
Richard Sandiford June 11, 2024, 3:46 p.m. UTC | #7
Robin Dapp <rdapp.gcc@gmail.com> writes:
> The attached v3 tracks the use of cond_earliest as you suggested
> and adds its cost in default_noce_conversion_profitable_p.
>
> Bootstrapped and regtested on x86 and p10, aarch64 still
> running.  Regtested on riscv64.
>
> Regards
>  Robin
>
> Before noce_find_if_block processes a block it sets up an if_info
> structure that holds the original costs.  At that point the costs of
> the then/else blocks have not been added so we only care about the
> "if" cost.
>
> The code originally used BRANCH_COST for that but was then changed
> to COST_N_INSNS (2) - a compare and a jump.
>
> This patch computes the jump costs via
>   insn_cost (if_info.jump, ...)
> under the assumption that the target takes BRANCH_COST into account
> when costing a jump instruction.
>
> In noce_convert_multiple_sets, we keep track of the need for the initial
> CC comparison.  If we needed it for the generated sequence we add its
> cost in default_noce_conversion_profitable_p.

I was looking at the code in more detail and just wanted to check.
We have:

  int last_needs_comparison = -1;

  bool ok = noce_convert_multiple_sets_1
    (if_info, &need_no_cmov, &rewired_src, &targets, &temporaries,
     &unmodified_insns, &last_needs_comparison);
  if (!ok)
      return false;

  /* If there are insns that overwrite part of the initial
     comparison, we can still omit creating temporaries for
     the last of them.
     As the second try will always create a less expensive,
     valid sequence, we do not need to compare and can discard
     the first one.  */
  if (last_needs_comparison != -1)
    {
      end_sequence ();
      start_sequence ();
      ok = noce_convert_multiple_sets_1
	(if_info, &need_no_cmov, &rewired_src, &targets, &temporaries,
	 &unmodified_insns, &last_needs_comparison);
      /* Actually we should not fail anymore if we reached here,
	 but better still check.  */
      if (!ok)
	  return false;
    }

But noce_convert_multiple_sets_1 ends with:

  /* Even if we did not actually need the comparison, we want to make sure
     to try a second time in order to get rid of the temporaries.  */
  if (*last_needs_comparison == -1)
    *last_needs_comparison = 0;


  return true;

AFAICT that means that the first attempt is always redundant.

Have I missed something?

I don't know if this was something that Manolis's patches addressed.

Thanks,
Richard

>
> gcc/ChangeLog:
>
> 	* ifcvt.cc (default_noce_conversion_profitable_p):  Add cost of
> 	CC comparison.
> 	(noce_convert_multiple_sets_1): Set use_cond_earliest.
> 	(noce_process_if_block): Just use original cost.
> 	(noce_find_if_block): Use insn_cost (jump_insn).
> 	* ifcvt.h (struct noce_if_info): Add use_cond_earliest.
> ---
>  gcc/ifcvt.cc | 37 ++++++++++++++++++++++---------------
>  gcc/ifcvt.h  |  3 +++
>  2 files changed, 25 insertions(+), 15 deletions(-)
>
> diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
> index 58ed42673e5..9b408eeb313 100644
> --- a/gcc/ifcvt.cc
> +++ b/gcc/ifcvt.cc
> @@ -814,7 +814,16 @@ default_noce_conversion_profitable_p (rtx_insn *seq,
>    /* Cost up the new sequence.  */
>    unsigned int cost = seq_cost (seq, speed_p);
>  
> -  if (cost <= if_info->original_cost)
> +  /* If the created sequence does not use cond_earliest (but the jump
> +     does) add its cost to the original_cost here.  */
> +  unsigned int cost_adjust = 0;
> +
> +  if (if_info->jump != if_info->cond_earliest
> +      && !if_info->use_cond_earliest)
> +    cost_adjust = insn_cost (if_info->cond_earliest,
> +			     if_info->speed_p);
> +
> +  if (cost <= if_info->original_cost + cost_adjust)
>      return true;
>  
>    /* When compiling for size, we can make a reasonably accurately guess
> @@ -3780,6 +3789,7 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
>  	  temp_dest = temp_dest2;
>  	  if (!second_try && read_comparison)
>  	    *last_needs_comparison = count;
> +	  if_info->use_cond_earliest = true;
>  	}
>        else
>  	{
> @@ -3931,16 +3941,13 @@ noce_process_if_block (struct noce_if_info *if_info)
>       to calculate a value for x.
>       ??? For future expansion, further expand the "multiple X" rules.  */
>  
> -  /* First look for multiple SETS.  The original costs already include
> -     a base cost of COSTS_N_INSNS (2): one instruction for the compare
> -     (which we will be needing either way) and one instruction for the
> -     branch.  When comparing costs we want to use the branch instruction
> -     cost and the sets vs. the cmovs generated here.  Therefore subtract
> -     the costs of the compare before checking.
> -     ??? Actually, instead of the branch instruction costs we might want
> -     to use COSTS_N_INSNS (BRANCH_COST ()) as in other places.  */
> +  /* First look for multiple SETS.
> +     The original costs already include costs for the jump insn as well
> +     as for a CC comparison if there is any.
> +     If a target re-uses the existing CC comparison we keep track of that
> +     and add the costs in default default_noce_conversion_profitable_p.  */
>  
> -  unsigned potential_cost = if_info->original_cost - COSTS_N_INSNS (1);
> +  unsigned potential_cost = if_info->original_cost;
>    unsigned old_cost = if_info->original_cost;
>    if (!else_bb
>        && HAVE_conditional_move
> @@ -4696,6 +4703,7 @@ noce_find_if_block (basic_block test_bb, edge then_edge, edge else_edge,
>    gcc_assert (if_info.rev_cond == NULL_RTX
>  	      || rev_cond_earliest == cond_earliest);
>    if_info.cond_earliest = cond_earliest;
> +  if_info.use_cond_earliest = false;
>    if_info.jump = jump;
>    if_info.then_else_reversed = then_else_reversed;
>    if_info.speed_p = speed_p;
> @@ -4703,11 +4711,10 @@ noce_find_if_block (basic_block test_bb, edge then_edge, edge else_edge,
>      = targetm.max_noce_ifcvt_seq_cost (then_edge);
>    /* We'll add in the cost of THEN_BB and ELSE_BB later, when we check
>       that they are valid to transform.  We can't easily get back to the insn
> -     for COND (and it may not exist if we had to canonicalize to get COND),
> -     and jump_insns are always given a cost of 1 by seq_cost, so treat
> -     both instructions as having cost COSTS_N_INSNS (1).  */
> -  if_info.original_cost = COSTS_N_INSNS (2);
> -
> +     for COND (and it may not exist if we had to canonicalize to get COND).
> +     It is assumed that the costs of a jump insn are dependent on the
> +     branch costs.  */
> +  if_info.original_cost += insn_cost (if_info.jump, if_info.speed_p);
>  
>    /* Do the real work.  */
>  
> diff --git a/gcc/ifcvt.h b/gcc/ifcvt.h
> index 37147f99129..67d4f6a502e 100644
> --- a/gcc/ifcvt.h
> +++ b/gcc/ifcvt.h
> @@ -63,6 +63,9 @@ struct noce_if_info
>    /* New insns should be inserted before this one.  */
>    rtx_insn *cond_earliest;
>  
> +  /* Whether or not COND_EARLIEST is used in the resulting sequence.  */
> +  bool use_cond_earliest;
> +
>    /* Insns in the THEN and ELSE block.  There is always just this
>       one insns in those blocks.  The insns are single_set insns.
>       If there was no ELSE block, INSN_B is the last insn before
Robin Dapp June 11, 2024, 7:32 p.m. UTC | #8
> I was looking at the code in more detail and just wanted to check.
> We have:
> 
>   int last_needs_comparison = -1;
> 
>   bool ok = noce_convert_multiple_sets_1
>     (if_info, &need_no_cmov, &rewired_src, &targets, &temporaries,
>      &unmodified_insns, &last_needs_comparison);
>   if (!ok)
>       return false;
> 
>   /* If there are insns that overwrite part of the initial
>      comparison, we can still omit creating temporaries for
>      the last of them.
>      As the second try will always create a less expensive,
>      valid sequence, we do not need to compare and can discard
>      the first one.  */
>   if (last_needs_comparison != -1)
>     {
>       end_sequence ();
>       start_sequence ();
>       ok = noce_convert_multiple_sets_1
> 	(if_info, &need_no_cmov, &rewired_src, &targets, &temporaries,
> 	 &unmodified_insns, &last_needs_comparison);
>       /* Actually we should not fail anymore if we reached here,
> 	 but better still check.  */
>       if (!ok)
> 	  return false;
>     }
> 
> But noce_convert_multiple_sets_1 ends with:
> 
>   /* Even if we did not actually need the comparison, we want to make sure
>      to try a second time in order to get rid of the temporaries.  */
>   if (*last_needs_comparison == -1)
>     *last_needs_comparison = 0;
> 
> 
>   return true;
> 
> AFAICT that means that the first attempt is always redundant.
> 
> Have I missed something?

(I might not have fully gotten the question)

The idea is that the first attempt goes through all insns and sets
*last_need_comparison to the insn number that either
- used the condition/comparison by preferring seq1 or
- used the condition as a side-effect insn when creating a CC-using
  insn in seq2.
(And we only know that after actually creating the sequences). 

The second attempt then improves on the first one by skipping
any temporary destination registers after the last insn that required
the condition (even though its target overlaps with the condition
registers).  This is true for all cmovs that only use the CC
(instead of the condition).  Essentially, we know that all following
cmovs can be created via the CC which is not overwritten.

So, even when we never used the condition because of all CC-using
cmovs we would skip the temporary targets in the second attempt.
But we can't know that all we ever needed is the CC comparison
before actually creating the sequences in the first attempt.

Regards
 Robin
Richard Sandiford June 11, 2024, 9:19 p.m. UTC | #9
Robin Dapp <rdapp.gcc@gmail.com> writes:
>> I was looking at the code in more detail and just wanted to check.
>> We have:
>> 
>>   int last_needs_comparison = -1;
>> 
>>   bool ok = noce_convert_multiple_sets_1
>>     (if_info, &need_no_cmov, &rewired_src, &targets, &temporaries,
>>      &unmodified_insns, &last_needs_comparison);
>>   if (!ok)
>>       return false;
>> 
>>   /* If there are insns that overwrite part of the initial
>>      comparison, we can still omit creating temporaries for
>>      the last of them.
>>      As the second try will always create a less expensive,
>>      valid sequence, we do not need to compare and can discard
>>      the first one.  */
>>   if (last_needs_comparison != -1)
>>     {
>>       end_sequence ();
>>       start_sequence ();
>>       ok = noce_convert_multiple_sets_1
>> 	(if_info, &need_no_cmov, &rewired_src, &targets, &temporaries,
>> 	 &unmodified_insns, &last_needs_comparison);
>>       /* Actually we should not fail anymore if we reached here,
>> 	 but better still check.  */
>>       if (!ok)
>> 	  return false;
>>     }
>> 
>> But noce_convert_multiple_sets_1 ends with:
>> 
>>   /* Even if we did not actually need the comparison, we want to make sure
>>      to try a second time in order to get rid of the temporaries.  */
>>   if (*last_needs_comparison == -1)
>>     *last_needs_comparison = 0;
>> 
>> 
>>   return true;
>> 
>> AFAICT that means that the first attempt is always redundant.
>> 
>> Have I missed something?
>
> (I might not have fully gotten the question)
>
> The idea is that the first attempt goes through all insns and sets
> *last_need_comparison to the insn number that either
> - used the condition/comparison by preferring seq1 or
> - used the condition as a side-effect insn when creating a CC-using
>   insn in seq2.
> (And we only know that after actually creating the sequences). 
>
> The second attempt then improves on the first one by skipping
> any temporary destination registers after the last insn that required
> the condition (even though its target overlaps with the condition
> registers).  This is true for all cmovs that only use the CC
> (instead of the condition).  Essentially, we know that all following
> cmovs can be created via the CC which is not overwritten.
>
> So, even when we never used the condition because of all CC-using
> cmovs we would skip the temporary targets in the second attempt.
> But we can't know that all we ever needed is the CC comparison
> before actually creating the sequences in the first attempt.

Hmm, ok.  The bit that confused me most was:

  if (last_needs_comparison != -1)
    {
      end_sequence ();
      start_sequence ();
      ...
    }

which implied that the second attempt was made conditionally.
It seems like it's always used and is an inherent part of the
algorithm.

If the problem is tracking liveness, wouldn't it be better to
iterate over the "then" block in reverse order?  We would start
with the liveness set for the join block and update as we move
backwards through the "then" block.  This liveness set would
tell us whether the current instruction needs to preserve a
particular register.  That should make it possible to do the
transformation in one step, and so avoid the risk that the
second attempt does something that is unexpectedly different
from the first attempt.

FWIW, the reason for asking was that it seemed safer to pass
use_cond_earliest back from noce_convert_multiple_sets_1
to noce_convert_multiple_sets, as another parameter,
and then do the adjustment around noce_convert_multiple_sets's
call to targetm.noce_conversion_profitable_p.  That would avoid
the new for a new if_info field, which in turn would make it
less likely that stale information is carried over from one attempt
to the next (e.g. if other ifcvt techniques end up using the same
field in future).

Thanks,
Richard
Robin Dapp June 12, 2024, 7:54 a.m. UTC | #10
> Hmm, ok.  The bit that confused me most was:
> 
>   if (last_needs_comparison != -1)
>     {
>       end_sequence ();
>       start_sequence ();
>       ...
>     }
> 
> which implied that the second attempt was made conditionally.
> It seems like it's always used and is an inherent part of the
> algorithm.
> 
> If the problem is tracking liveness, wouldn't it be better to
> iterate over the "then" block in reverse order?  We would start
> with the liveness set for the join block and update as we move
> backwards through the "then" block.  This liveness set would
> tell us whether the current instruction needs to preserve a
> particular register.  That should make it possible to do the
> transformation in one step, and so avoid the risk that the
> second attempt does something that is unexpectedly different
> from the first attempt.

I agree that the current approach is rather cumbersome.  Indeed
the second attempt was conditional at first and I changed it to
be unconditional after some patch iterations.
Your reverse-order idea sounds like it should work.  To further
clean up the algorithm we could also make it more explicit
that a "cmov" depends on either the condition or the CC and
basically track two separate paths through the block, one CC
path and one "condition" path.

I can surely do that as a follow up.  It might conflict with
Manolis's changes, though, so his work should probably be in
first.

> FWIW, the reason for asking was that it seemed safer to pass
> use_cond_earliest back from noce_convert_multiple_sets_1
> to noce_convert_multiple_sets, as another parameter,
> and then do the adjustment around noce_convert_multiple_sets's
> call to targetm.noce_conversion_profitable_p.  That would avoid
> the new for a new if_info field, which in turn would make it
> less likely that stale information is carried over from one attempt
> to the next (e.g. if other ifcvt techniques end up using the same
> field in future).

Would something like the attached v4 be OK that uses a parameter
instead (I mean without having refactored the full algorithm)?
At least I changed the comment before the second attempt to
hopefully cause a tiny bit less confusion :)
I haven't fully bootstrapped it yet.

Regards
 Robin

Before noce_find_if_block processes a block it sets up an if_info
structure that holds the original costs.  At that point the costs of
the then/else blocks have not been added so we only care about the
"if" cost.

The code originally used BRANCH_COST for that but was then changed
to COST_N_INSNS (2) - a compare and a jump.

This patch computes the jump costs via
  insn_cost (if_info.jump, ...)
under the assumption that the target takes BRANCH_COST into account
when costing a jump instruction.

In noce_convert_multiple_sets, we keep track of the need for the initial
CC comparison.  If we needed it for the generated sequence we add its
cost before default_noce_conversion_profitable_p.

gcc/ChangeLog:

	* ifcvt.cc (noce_convert_multiple_sets):  Define
	use_cond_earliest and adjust original cost if needed.
	(noce_convert_multiple_sets_1): Add param use_cond_earliest.
	(noce_process_if_block): Do not subtract CC cost anymore.
	(noce_find_if_block): Use insn_cost for costing jump insn.
---
 gcc/ifcvt.cc | 79 +++++++++++++++++++++++++++++-----------------------
 1 file changed, 44 insertions(+), 35 deletions(-)

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index 58ed42673e5..2854eea7702 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -105,7 +105,8 @@ static bool noce_convert_multiple_sets_1 (struct noce_if_info *,
 					  hash_map<rtx_insn *, int> *,
 					  auto_vec<rtx> *,
 					  auto_vec<rtx> *,
-					  auto_vec<rtx_insn *> *, int *);
+					  auto_vec<rtx_insn *> *,
+					  int *, bool *);
 
 /* Count the number of non-jump active insns in BB.  */
 
@@ -3502,30 +3503,28 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
 
   int last_needs_comparison = -1;
 
+  bool use_cond_earliest = false;
+
   bool ok = noce_convert_multiple_sets_1
     (if_info, &need_no_cmov, &rewired_src, &targets, &temporaries,
-     &unmodified_insns, &last_needs_comparison);
+     &unmodified_insns, &last_needs_comparison, &use_cond_earliest);
   if (!ok)
       return false;
 
-  /* If there are insns that overwrite part of the initial
-     comparison, we can still omit creating temporaries for
-     the last of them.
-     As the second try will always create a less expensive,
-     valid sequence, we do not need to compare and can discard
-     the first one.  */
-  if (last_needs_comparison != -1)
-    {
-      end_sequence ();
-      start_sequence ();
-      ok = noce_convert_multiple_sets_1
-	(if_info, &need_no_cmov, &rewired_src, &targets, &temporaries,
-	 &unmodified_insns, &last_needs_comparison);
-      /* Actually we should not fail anymore if we reached here,
-	 but better still check.  */
-      if (!ok)
-	  return false;
-    }
+  /* Always perform a second attempt that uses information gathered in the
+     first.  At least we can omit creating temporaries until we definitely
+     need them.  The sequence created in the second attempt is never worse
+     than the first.  */
+
+  end_sequence ();
+  start_sequence ();
+  ok = noce_convert_multiple_sets_1
+    (if_info, &need_no_cmov, &rewired_src, &targets, &temporaries,
+     &unmodified_insns, &last_needs_comparison, &use_cond_earliest);
+  /* Actually we should not fail anymore if we reached here,
+     but better still check.  */
+  if (!ok)
+    return false;
 
   /* We must have seen some sort of insn to insert, otherwise we were
      given an empty BB to convert, and we can't handle that.  */
@@ -3539,12 +3538,23 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
   /* Actually emit the sequence if it isn't too expensive.  */
   rtx_insn *seq = get_insns ();
 
+  /* If the created sequence does not use cond_earliest (but the jump
+     does) add its cost to the original_cost before comparing costs.  */
+  unsigned int original_cost = if_info->original_cost;
+  if (if_info->jump != if_info->cond_earliest && !use_cond_earliest)
+    if_info->original_cost += insn_cost (if_info->cond_earliest,
+					 if_info->speed_p);
+
   if (!targetm.noce_conversion_profitable_p (seq, if_info))
     {
+      if_info->original_cost = original_cost;
       end_sequence ();
       return false;
     }
 
+  /* Restore the original cost in case we do not succeed below.  */
+  if_info->original_cost = original_cost;
+
   for (insn = seq; insn; insn = NEXT_INSN (insn))
     set_used_flags (insn);
 
@@ -3620,7 +3630,8 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
 			      auto_vec<rtx> *targets,
 			      auto_vec<rtx> *temporaries,
 			      auto_vec<rtx_insn *> *unmodified_insns,
-			      int *last_needs_comparison)
+			      int *last_needs_comparison,
+			      bool *use_cond_earliest)
 {
   basic_block then_bb = if_info->then_bb;
   rtx_insn *jump = if_info->jump;
@@ -3644,6 +3655,7 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
   unmodified_insns->truncate (0);
 
   bool second_try = *last_needs_comparison != -1;
+  *use_cond_earliest = false;
 
   FOR_BB_INSNS (then_bb, insn)
     {
@@ -3780,6 +3792,7 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
 	  temp_dest = temp_dest2;
 	  if (!second_try && read_comparison)
 	    *last_needs_comparison = count;
+	  *use_cond_earliest = true;
 	}
       else
 	{
@@ -3931,16 +3944,13 @@ noce_process_if_block (struct noce_if_info *if_info)
      to calculate a value for x.
      ??? For future expansion, further expand the "multiple X" rules.  */
 
-  /* First look for multiple SETS.  The original costs already include
-     a base cost of COSTS_N_INSNS (2): one instruction for the compare
-     (which we will be needing either way) and one instruction for the
-     branch.  When comparing costs we want to use the branch instruction
-     cost and the sets vs. the cmovs generated here.  Therefore subtract
-     the costs of the compare before checking.
-     ??? Actually, instead of the branch instruction costs we might want
-     to use COSTS_N_INSNS (BRANCH_COST ()) as in other places.  */
+  /* First look for multiple SETS.
+     The original costs already include costs for the jump insn as well
+     as for a CC comparison if there is any.
+     If a target re-uses the existing CC comparison we keep track of that
+     and add the costs before default noce_conversion_profitable_p.  */
 
-  unsigned potential_cost = if_info->original_cost - COSTS_N_INSNS (1);
+  unsigned potential_cost = if_info->original_cost;
   unsigned old_cost = if_info->original_cost;
   if (!else_bb
       && HAVE_conditional_move
@@ -4703,11 +4713,10 @@ noce_find_if_block (basic_block test_bb, edge then_edge, edge else_edge,
     = targetm.max_noce_ifcvt_seq_cost (then_edge);
   /* We'll add in the cost of THEN_BB and ELSE_BB later, when we check
      that they are valid to transform.  We can't easily get back to the insn
-     for COND (and it may not exist if we had to canonicalize to get COND),
-     and jump_insns are always given a cost of 1 by seq_cost, so treat
-     both instructions as having cost COSTS_N_INSNS (1).  */
-  if_info.original_cost = COSTS_N_INSNS (2);
-
+     for COND (and it may not exist if we had to canonicalize to get COND).
+     It is assumed that the costs of a jump insn are dependent on the
+     branch costs.  */
+  if_info.original_cost += insn_cost (if_info.jump, if_info.speed_p);
 
   /* Do the real work.  */
Jeff Law Aug. 25, 2024, 10:15 p.m. UTC | #11
I think Manolis's patches are all in, time to revisit this one?



On 6/12/24 1:54 AM, Robin Dapp wrote:
>> Hmm, ok.  The bit that confused me most was:
>>
>>    if (last_needs_comparison != -1)
>>      {
>>        end_sequence ();
>>        start_sequence ();
>>        ...
>>      }
>>
>> which implied that the second attempt was made conditionally.
>> It seems like it's always used and is an inherent part of the
>> algorithm.
>>
>> If the problem is tracking liveness, wouldn't it be better to
>> iterate over the "then" block in reverse order?  We would start
>> with the liveness set for the join block and update as we move
>> backwards through the "then" block.  This liveness set would
>> tell us whether the current instruction needs to preserve a
>> particular register.  That should make it possible to do the
>> transformation in one step, and so avoid the risk that the
>> second attempt does something that is unexpectedly different
>> from the first attempt.
> 
> I agree that the current approach is rather cumbersome.  Indeed
> the second attempt was conditional at first and I changed it to
> be unconditional after some patch iterations.
> Your reverse-order idea sounds like it should work.  To further
> clean up the algorithm we could also make it more explicit
> that a "cmov" depends on either the condition or the CC and
> basically track two separate paths through the block, one CC
> path and one "condition" path.
> 
> I can surely do that as a follow up.  It might conflict with
> Manolis's changes, though, so his work should probably be in
> first.
> 
>> FWIW, the reason for asking was that it seemed safer to pass
>> use_cond_earliest back from noce_convert_multiple_sets_1
>> to noce_convert_multiple_sets, as another parameter,
>> and then do the adjustment around noce_convert_multiple_sets's
>> call to targetm.noce_conversion_profitable_p.  That would avoid
>> the new for a new if_info field, which in turn would make it
>> less likely that stale information is carried over from one attempt
>> to the next (e.g. if other ifcvt techniques end up using the same
>> field in future).
> 
> Would something like the attached v4 be OK that uses a parameter
> instead (I mean without having refactored the full algorithm)?
> At least I changed the comment before the second attempt to
> hopefully cause a tiny bit less confusion :)
> I haven't fully bootstrapped it yet.
> 
> Regards
>   Robin
> 
> Before noce_find_if_block processes a block it sets up an if_info
> structure that holds the original costs.  At that point the costs of
> the then/else blocks have not been added so we only care about the
> "if" cost.
> 
> The code originally used BRANCH_COST for that but was then changed
> to COST_N_INSNS (2) - a compare and a jump.
> 
> This patch computes the jump costs via
>    insn_cost (if_info.jump, ...)
> under the assumption that the target takes BRANCH_COST into account
> when costing a jump instruction.
> 
> In noce_convert_multiple_sets, we keep track of the need for the initial
> CC comparison.  If we needed it for the generated sequence we add its
> cost before default_noce_conversion_profitable_p.
> 
> gcc/ChangeLog:
> 
> 	* ifcvt.cc (noce_convert_multiple_sets):  Define
> 	use_cond_earliest and adjust original cost if needed.
> 	(noce_convert_multiple_sets_1): Add param use_cond_earliest.
> 	(noce_process_if_block): Do not subtract CC cost anymore.
> 	(noce_find_if_block): Use insn_cost for costing jump insn.
> ---
>   gcc/ifcvt.cc | 79 +++++++++++++++++++++++++++++-----------------------
>   1 file changed, 44 insertions(+), 35 deletions(-)
> 
> diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
> index 58ed42673e5..2854eea7702 100644
> --- a/gcc/ifcvt.cc
> +++ b/gcc/ifcvt.cc
> @@ -105,7 +105,8 @@ static bool noce_convert_multiple_sets_1 (struct noce_if_info *,
>   					  hash_map<rtx_insn *, int> *,
>   					  auto_vec<rtx> *,
>   					  auto_vec<rtx> *,
> -					  auto_vec<rtx_insn *> *, int *);
> +					  auto_vec<rtx_insn *> *,
> +					  int *, bool *);
>   
>   /* Count the number of non-jump active insns in BB.  */
>   
> @@ -3502,30 +3503,28 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
>   
>     int last_needs_comparison = -1;
>   
> +  bool use_cond_earliest = false;
> +
>     bool ok = noce_convert_multiple_sets_1
>       (if_info, &need_no_cmov, &rewired_src, &targets, &temporaries,
> -     &unmodified_insns, &last_needs_comparison);
> +     &unmodified_insns, &last_needs_comparison, &use_cond_earliest);
>     if (!ok)
>         return false;
>   
> -  /* If there are insns that overwrite part of the initial
> -     comparison, we can still omit creating temporaries for
> -     the last of them.
> -     As the second try will always create a less expensive,
> -     valid sequence, we do not need to compare and can discard
> -     the first one.  */
> -  if (last_needs_comparison != -1)
> -    {
> -      end_sequence ();
> -      start_sequence ();
> -      ok = noce_convert_multiple_sets_1
> -	(if_info, &need_no_cmov, &rewired_src, &targets, &temporaries,
> -	 &unmodified_insns, &last_needs_comparison);
> -      /* Actually we should not fail anymore if we reached here,
> -	 but better still check.  */
> -      if (!ok)
> -	  return false;
> -    }
> +  /* Always perform a second attempt that uses information gathered in the
> +     first.  At least we can omit creating temporaries until we definitely
> +     need them.  The sequence created in the second attempt is never worse
> +     than the first.  */
> +
> +  end_sequence ();
> +  start_sequence ();
> +  ok = noce_convert_multiple_sets_1
> +    (if_info, &need_no_cmov, &rewired_src, &targets, &temporaries,
> +     &unmodified_insns, &last_needs_comparison, &use_cond_earliest);
> +  /* Actually we should not fail anymore if we reached here,
> +     but better still check.  */
> +  if (!ok)
> +    return false;
>   
>     /* We must have seen some sort of insn to insert, otherwise we were
>        given an empty BB to convert, and we can't handle that.  */
> @@ -3539,12 +3538,23 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
>     /* Actually emit the sequence if it isn't too expensive.  */
>     rtx_insn *seq = get_insns ();
>   
> +  /* If the created sequence does not use cond_earliest (but the jump
> +     does) add its cost to the original_cost before comparing costs.  */
> +  unsigned int original_cost = if_info->original_cost;
> +  if (if_info->jump != if_info->cond_earliest && !use_cond_earliest)
> +    if_info->original_cost += insn_cost (if_info->cond_earliest,
> +					 if_info->speed_p);
> +
>     if (!targetm.noce_conversion_profitable_p (seq, if_info))
>       {
> +      if_info->original_cost = original_cost;
>         end_sequence ();
>         return false;
>       }
>   
> +  /* Restore the original cost in case we do not succeed below.  */
> +  if_info->original_cost = original_cost;
> +
>     for (insn = seq; insn; insn = NEXT_INSN (insn))
>       set_used_flags (insn);
>   
> @@ -3620,7 +3630,8 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
>   			      auto_vec<rtx> *targets,
>   			      auto_vec<rtx> *temporaries,
>   			      auto_vec<rtx_insn *> *unmodified_insns,
> -			      int *last_needs_comparison)
> +			      int *last_needs_comparison,
> +			      bool *use_cond_earliest)
>   {
>     basic_block then_bb = if_info->then_bb;
>     rtx_insn *jump = if_info->jump;
> @@ -3644,6 +3655,7 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
>     unmodified_insns->truncate (0);
>   
>     bool second_try = *last_needs_comparison != -1;
> +  *use_cond_earliest = false;
>   
>     FOR_BB_INSNS (then_bb, insn)
>       {
> @@ -3780,6 +3792,7 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
>   	  temp_dest = temp_dest2;
>   	  if (!second_try && read_comparison)
>   	    *last_needs_comparison = count;
> +	  *use_cond_earliest = true;
>   	}
>         else
>   	{
> @@ -3931,16 +3944,13 @@ noce_process_if_block (struct noce_if_info *if_info)
>        to calculate a value for x.
>        ??? For future expansion, further expand the "multiple X" rules.  */
>   
> -  /* First look for multiple SETS.  The original costs already include
> -     a base cost of COSTS_N_INSNS (2): one instruction for the compare
> -     (which we will be needing either way) and one instruction for the
> -     branch.  When comparing costs we want to use the branch instruction
> -     cost and the sets vs. the cmovs generated here.  Therefore subtract
> -     the costs of the compare before checking.
> -     ??? Actually, instead of the branch instruction costs we might want
> -     to use COSTS_N_INSNS (BRANCH_COST ()) as in other places.  */
> +  /* First look for multiple SETS.
> +     The original costs already include costs for the jump insn as well
> +     as for a CC comparison if there is any.
> +     If a target re-uses the existing CC comparison we keep track of that
> +     and add the costs before default noce_conversion_profitable_p.  */
>   
> -  unsigned potential_cost = if_info->original_cost - COSTS_N_INSNS (1);
> +  unsigned potential_cost = if_info->original_cost;
>     unsigned old_cost = if_info->original_cost;
>     if (!else_bb
>         && HAVE_conditional_move
> @@ -4703,11 +4713,10 @@ noce_find_if_block (basic_block test_bb, edge then_edge, edge else_edge,
>       = targetm.max_noce_ifcvt_seq_cost (then_edge);
>     /* We'll add in the cost of THEN_BB and ELSE_BB later, when we check
>        that they are valid to transform.  We can't easily get back to the insn
> -     for COND (and it may not exist if we had to canonicalize to get COND),
> -     and jump_insns are always given a cost of 1 by seq_cost, so treat
> -     both instructions as having cost COSTS_N_INSNS (1).  */
> -  if_info.original_cost = COSTS_N_INSNS (2);
> -
> +     for COND (and it may not exist if we had to canonicalize to get COND).
> +     It is assumed that the costs of a jump insn are dependent on the
> +     branch costs.  */
> +  if_info.original_cost += insn_cost (if_info.jump, if_info.speed_p);
>   
>     /* Do the real work.  */
>
Robin Dapp Nov. 7, 2024, 2:56 p.m. UTC | #12
>> If the problem is tracking liveness, wouldn't it be better to
>> iterate over the "then" block in reverse order?  We would start
>> with the liveness set for the join block and update as we move
>> backwards through the "then" block.  This liveness set would
>> tell us whether the current instruction needs to preserve a
>> particular register.  That should make it possible to do the
>> transformation in one step, and so avoid the risk that the
>> second attempt does something that is unexpectedly different
>> from the first attempt.
>
> I agree that the current approach is rather cumbersome.  Indeed
> the second attempt was conditional at first and I changed it to
> be unconditional after some patch iterations.
> Your reverse-order idea sounds like it should work.  To further
> clean up the algorithm we could also make it more explicit
> that a "cmov" depends on either the condition or the CC and
> basically track two separate paths through the block, one CC
> path and one "condition" path.

I gave this another thought.  Right now we keep track of the
generated targets and temporaries in forward order, using those
for the source rewiring.  I don't see how we could do that in
reverse order other than have another fixup iteration afterwards.

>> FWIW, the reason for asking was that it seemed safer to pass
>> use_cond_earliest back from noce_convert_multiple_sets_1
>> to noce_convert_multiple_sets, as another parameter,
>> and then do the adjustment around noce_convert_multiple_sets's
>> call to targetm.noce_conversion_profitable_p.  That would avoid
>> the new for a new if_info field, which in turn would make it
>> less likely that stale information is carried over from one attempt
>> to the next (e.g. if other ifcvt techniques end up using the same
>> field in future).
>
> Would something like the attached v4 be OK that uses a parameter
> instead (I mean without having refactored the full algorithm)?
> At least I changed the comment before the second attempt to
> hopefully cause a tiny bit less confusion :)
> I haven't fully bootstrapped it yet.

That v4 was bootstrapped and regtested on x86 and aarch64 in the meanwhile and
it has been in our internal tree for a while without problems.

Would it be OK for trunk without further refactoring?
Richard Sandiford Nov. 7, 2024, 5:19 p.m. UTC | #13
"Robin Dapp" <rdapp.gcc@gmail.com> writes:
>>> If the problem is tracking liveness, wouldn't it be better to
>>> iterate over the "then" block in reverse order?  We would start
>>> with the liveness set for the join block and update as we move
>>> backwards through the "then" block.  This liveness set would
>>> tell us whether the current instruction needs to preserve a
>>> particular register.  That should make it possible to do the
>>> transformation in one step, and so avoid the risk that the
>>> second attempt does something that is unexpectedly different
>>> from the first attempt.
>>
>> I agree that the current approach is rather cumbersome.  Indeed
>> the second attempt was conditional at first and I changed it to
>> be unconditional after some patch iterations.
>> Your reverse-order idea sounds like it should work.  To further
>> clean up the algorithm we could also make it more explicit
>> that a "cmov" depends on either the condition or the CC and
>> basically track two separate paths through the block, one CC
>> path and one "condition" path.
>
> I gave this another thought.  Right now we keep track of the
> generated targets and temporaries in forward order, using those
> for the source rewiring.  I don't see how we could do that in
> reverse order other than have another fixup iteration afterwards.
>
>>> FWIW, the reason for asking was that it seemed safer to pass
>>> use_cond_earliest back from noce_convert_multiple_sets_1
>>> to noce_convert_multiple_sets, as another parameter,
>>> and then do the adjustment around noce_convert_multiple_sets's
>>> call to targetm.noce_conversion_profitable_p.  That would avoid
>>> the new for a new if_info field, which in turn would make it
>>> less likely that stale information is carried over from one attempt
>>> to the next (e.g. if other ifcvt techniques end up using the same
>>> field in future).
>>
>> Would something like the attached v4 be OK that uses a parameter
>> instead (I mean without having refactored the full algorithm)?
>> At least I changed the comment before the second attempt to
>> hopefully cause a tiny bit less confusion :)
>> I haven't fully bootstrapped it yet.
>
> That v4 was bootstrapped and regtested on x86 and aarch64 in the meanwhile and
> it has been in our internal tree for a while without problems.
>
> Would it be OK for trunk without further refactoring?

I think it'd be better if I abstain from this.  I probably disagree too
much with the current structure and the way that the code is developing.
I won't object if anyone else approves it though.

Thanks,
Richard
Robin Dapp Nov. 7, 2024, 5:36 p.m. UTC | #14
> I think it'd be better if I abstain from this.  I probably disagree too
> much with the current structure and the way that the code is developing.
> I won't object if anyone else approves it though.

It's not that I'm happy with the current state either and I thought about
how to rewrite it more than once.  So if you have an idea for a good
rework of it (or larger parts of ifcvt) I'd be all ears.

My argument right now would be that this patch doesn't make things worse in
terms of complexity and improves SPEC 2017's deepsjeng considerably on riscv.
IMHO it doesn't inhibit a future rewrite and even simplifies things
ever so slightly.
diff mbox series

Patch

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index 58ed42673e5..305b9faed38 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -3940,7 +3940,9 @@  noce_process_if_block (struct noce_if_info *if_info)
      ??? Actually, instead of the branch instruction costs we might want
      to use COSTS_N_INSNS (BRANCH_COST ()) as in other places.  */
 
-  unsigned potential_cost = if_info->original_cost - COSTS_N_INSNS (1);
+  unsigned potential_cost = if_info->original_cost;
+  if (cc_in_cond (if_info->cond))
+    potential_cost -= pattern_cost (if_info->cond, if_info->speed_p);
   unsigned old_cost = if_info->original_cost;
   if (!else_bb
       && HAVE_conditional_move
@@ -4703,11 +4705,13 @@  noce_find_if_block (basic_block test_bb, edge then_edge, edge else_edge,
     = targetm.max_noce_ifcvt_seq_cost (then_edge);
   /* We'll add in the cost of THEN_BB and ELSE_BB later, when we check
      that they are valid to transform.  We can't easily get back to the insn
-     for COND (and it may not exist if we had to canonicalize to get COND),
-     and jump_insns are always given a cost of 1 by seq_cost, so treat
-     both instructions as having cost COSTS_N_INSNS (1).  */
-  if_info.original_cost = COSTS_N_INSNS (2);
-
+     for COND (and it may not exist if we had to canonicalize to get COND).
+     Here we assume one CC compare insn (if the target uses CC) and one
+     jump insn that is costed via insn_cost.  It is assumed that the
+     costs of a jump insn are dependent on the branch costs.  */
+  if (cc_in_cond (if_info.cond))
+    if_info.original_cost = pattern_cost (if_info.cond, if_info.speed_p);
+  if_info.original_cost += insn_cost (if_info.jump, if_info.speed_p);
 
   /* Do the real work.  */