Message ID | 57bb6ce5-79c3-4b08-b524-e807b9ac431b@gmail.com |
---|---|
State | New |
Headers | show |
Series | ifcvt: Clarify if_info.original_cost. | expand |
On 5/31/24 9:03 AM, Robin Dapp wrote: > Hi, > > before noce_find_if_block processes a block it sets up an if_info > structure that holds the original costs. At that point the costs of > the then/else blocks have not been added so we only care about the > "if" cost. > > The code originally used BRANCH_COST for that but was then changed > to COST_N_INSNS (2) - a compare and a jump. > This patch computes the jump costs via > insn_cost (if_info.jump, ...) > which is supposed to incorporate the branch costs and, in case of a CC > comparison, > pattern_cost (if_info.cond, ...) > which is supposed to account for the CC creation. > > For compare_and_jump patterns insn_cost should have already computed > the right cost. > > Does this "split" make sense, generally? > > Bootstrapped and regtested on x86, aarch64 and power10. Regtested > on riscv. > > Regards > Robin > > gcc/ChangeLog: > > * ifcvt.cc (noce_process_if_block): Subtract condition pattern > cost if applicable. > (noce_find_if_block): Use insn_cost and pattern_cost for > original cost. OK. Obviously we'll need to be on the lookout for regressions. My bet is on s390 since you already tested the x86, aarch64 & p10 targets :-) jeff
On Fri, May 31, 2024 at 10:05:55PM -0600, Jeff Law wrote: > > > On 5/31/24 9:03 AM, Robin Dapp wrote: > > Hi, > > > > before noce_find_if_block processes a block it sets up an if_info > > structure that holds the original costs. At that point the costs of > > the then/else blocks have not been added so we only care about the > > "if" cost. > > > > The code originally used BRANCH_COST for that but was then changed > > to COST_N_INSNS (2) - a compare and a jump. > > This patch computes the jump costs via > > insn_cost (if_info.jump, ...) > > which is supposed to incorporate the branch costs and, in case of a CC > > comparison, > > pattern_cost (if_info.cond, ...) > > which is supposed to account for the CC creation. > > > > For compare_and_jump patterns insn_cost should have already computed > > the right cost. > > > > Does this "split" make sense, generally? > > > > Bootstrapped and regtested on x86, aarch64 and power10. Regtested > > on riscv. > > > > Regards > > Robin > > > > gcc/ChangeLog: > > > > * ifcvt.cc (noce_process_if_block): Subtract condition pattern > > cost if applicable. > > (noce_find_if_block): Use insn_cost and pattern_cost for > > original cost. > OK. Obviously we'll need to be on the lookout for regressions. My bet is > on s390 since you already tested the x86, aarch64 & p10 targets :-) I just gave it a try on s390 where bootstrap and regtest were successful. Cheers, Stefan > > > jeff >
Robin Dapp <rdapp.gcc@gmail.com> writes: > Hi, > > before noce_find_if_block processes a block it sets up an if_info > structure that holds the original costs. At that point the costs of > the then/else blocks have not been added so we only care about the > "if" cost. > > The code originally used BRANCH_COST for that but was then changed > to COST_N_INSNS (2) - a compare and a jump. > This patch computes the jump costs via > insn_cost (if_info.jump, ...) > which is supposed to incorporate the branch costs and, in case of a CC > comparison, > pattern_cost (if_info.cond, ...) > which is supposed to account for the CC creation. > > For compare_and_jump patterns insn_cost should have already computed > the right cost. > > Does this "split" make sense, generally? > > Bootstrapped and regtested on x86, aarch64 and power10. Regtested > on riscv. > > Regards > Robin > > gcc/ChangeLog: > > * ifcvt.cc (noce_process_if_block): Subtract condition pattern > cost if applicable. > (noce_find_if_block): Use insn_cost and pattern_cost for > original cost. > --- > gcc/ifcvt.cc | 16 ++++++++++------ > 1 file changed, 10 insertions(+), 6 deletions(-) > > diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc > index 58ed42673e5..305b9faed38 100644 > --- a/gcc/ifcvt.cc > +++ b/gcc/ifcvt.cc > @@ -3940,7 +3940,9 @@ noce_process_if_block (struct noce_if_info *if_info) > ??? Actually, instead of the branch instruction costs we might want > to use COSTS_N_INSNS (BRANCH_COST ()) as in other places. */ > > - unsigned potential_cost = if_info->original_cost - COSTS_N_INSNS (1); > + unsigned potential_cost = if_info->original_cost; > + if (cc_in_cond (if_info->cond)) > + potential_cost -= pattern_cost (if_info->cond, if_info->speed_p); > unsigned old_cost = if_info->original_cost; > if (!else_bb > && HAVE_conditional_move > @@ -4703,11 +4705,13 @@ noce_find_if_block (basic_block test_bb, edge then_edge, edge else_edge, > = targetm.max_noce_ifcvt_seq_cost (then_edge); > /* We'll add in the cost of THEN_BB and ELSE_BB later, when we check > that they are valid to transform. We can't easily get back to the insn > - for COND (and it may not exist if we had to canonicalize to get COND), > - and jump_insns are always given a cost of 1 by seq_cost, so treat > - both instructions as having cost COSTS_N_INSNS (1). */ > - if_info.original_cost = COSTS_N_INSNS (2); > - > + for COND (and it may not exist if we had to canonicalize to get COND). > + Here we assume one CC compare insn (if the target uses CC) and one > + jump insn that is costed via insn_cost. It is assumed that the > + costs of a jump insn are dependent on the branch costs. */ > + if (cc_in_cond (if_info.cond)) > + if_info.original_cost = pattern_cost (if_info.cond, if_info.speed_p); > + if_info.original_cost += insn_cost (if_info.jump, if_info.speed_p); > > /* Do the real work. */ Is there any way we can avoid using pattern_cost here? Using it means that we can make use of targetm.insn_cost for the jump but circumvent it for the condition, giving a bit of a mixed metric. (I realise there are existing calls to pattern_cost in ifcvt.cc, but if possible I think we should try to avoid adding more.) Thanks, Richard
> Is there any way we can avoid using pattern_cost here? Using it means > that we can make use of targetm.insn_cost for the jump but circumvent > it for the condition, giving a bit of a mixed metric. > > (I realise there are existing calls to pattern_cost in ifcvt.cc, > but if possible I think we should try to avoid adding more.) Yes, I believe there is. In addition, what I did with if_info->cond wasn't what I intended to do. The whole point of the exercise is that noce_convert_multiple_sets can re-use the CC comparison that is already present (because it is used in the jump pattern). Therefore I want to split costs into a jump part and a CC-setting part so the final costing decision for multiple sets can be: insn_cost (jump) + n * insn_cost (set) vs n * insn_cost ("cmov") Still, the original costs should be: insn_cost (set_cc) + insn_cost (jump) and with the split we can just remove insn_cost (set_cc) before the multiple-set cost comparison and re-add it afterwards. For non-CC targets this is not necessary. So what I'd hope is better is to use insn_cost (if_info.earliest_cond) which is indeed the CC-set/comparison if it exists. The attached v2 was bootstrapped and regtested on x86, aarch64 and power10 and regtested on riscv64. Regards Robin gcc/ChangeLog: * ifcvt.cc (noce_process_if_block): Subtract condition pattern cost if applicable. (noce_find_if_block): Use insn_cost and pattern_cost for original cost. --- gcc/ifcvt.cc | 31 ++++++++++++++++--------------- 1 file changed, 16 insertions(+), 15 deletions(-) diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc index 58ed42673e5..ebb838fd82c 100644 --- a/gcc/ifcvt.cc +++ b/gcc/ifcvt.cc @@ -3931,16 +3931,16 @@ noce_process_if_block (struct noce_if_info *if_info) to calculate a value for x. ??? For future expansion, further expand the "multiple X" rules. */ - /* First look for multiple SETS. The original costs already include - a base cost of COSTS_N_INSNS (2): one instruction for the compare - (which we will be needing either way) and one instruction for the - branch. When comparing costs we want to use the branch instruction - cost and the sets vs. the cmovs generated here. Therefore subtract - the costs of the compare before checking. - ??? Actually, instead of the branch instruction costs we might want - to use COSTS_N_INSNS (BRANCH_COST ()) as in other places. */ - - unsigned potential_cost = if_info->original_cost - COSTS_N_INSNS (1); + /* First look for multiple SETS. + The original costs already include costs for the jump insn as well + as for a CC comparison if there is any. + We want to allow the backend to re-use the existing CC comparison + and therefore don't consider it for the cost comparison (as it is + then needed for both the jump as well as the cmov sequence). */ + + unsigned potential_cost = if_info->original_cost; + if (if_info->cond_earliest && if_info->jump != if_info->cond_earliest) + potential_cost -= insn_cost (if_info->cond_earliest, if_info->speed_p); unsigned old_cost = if_info->original_cost; if (!else_bb && HAVE_conditional_move @@ -4703,11 +4703,12 @@ noce_find_if_block (basic_block test_bb, edge then_edge, edge else_edge, = targetm.max_noce_ifcvt_seq_cost (then_edge); /* We'll add in the cost of THEN_BB and ELSE_BB later, when we check that they are valid to transform. We can't easily get back to the insn - for COND (and it may not exist if we had to canonicalize to get COND), - and jump_insns are always given a cost of 1 by seq_cost, so treat - both instructions as having cost COSTS_N_INSNS (1). */ - if_info.original_cost = COSTS_N_INSNS (2); - + for COND (and it may not exist if we had to canonicalize to get COND). + jump insn that is costed via insn_cost. It is assumed that the + costs of a jump insn are dependent on the branch costs. */ + if (if_info.cond_earliest && if_info.jump != if_info.cond_earliest) + if_info.original_cost = insn_cost (if_info.cond_earliest, if_info.speed_p); + if_info.original_cost += insn_cost (if_info.jump, if_info.speed_p); /* Do the real work. */
Robin Dapp <rdapp.gcc@gmail.com> writes: >> Is there any way we can avoid using pattern_cost here? Using it means >> that we can make use of targetm.insn_cost for the jump but circumvent >> it for the condition, giving a bit of a mixed metric. >> >> (I realise there are existing calls to pattern_cost in ifcvt.cc, >> but if possible I think we should try to avoid adding more.) > > Yes, I believe there is. In addition, what I did with > if_info->cond wasn't what I intended to do. > > The whole point of the exercise is that noce_convert_multiple_sets > can re-use the CC comparison that is already present (because it > is used in the jump pattern). Therefore I want to split costs > into a jump part and a CC-setting part so the final costing > decision for multiple sets can be: > > insn_cost (jump) + n * insn_cost (set) > vs > n * insn_cost ("cmov") > > Still, the original costs should be: > insn_cost (set_cc) + insn_cost (jump) > and with the split we can just remove insn_cost (set_cc) before > the multiple-set cost comparison and re-add it afterwards. > > For non-CC targets this is not necessary. > > So what I'd hope is better is to use > insn_cost (if_info.earliest_cond) > which is indeed the CC-set/comparison if it exists. I agree that's probably good enough in practice. It doesn't cope with things like: /* Handle sequences like: (set op0 (xor X Y)) ...(eq|ne op0 (const_int 0))... in which case: (eq op0 (const_int 0)) reduces to (eq X Y) (ne op0 (const_int 0)) reduces to (ne X Y) This is the form used by MIPS16, for example. */ but then neither does the current code. But... > The attached v2 was bootstrapped and regtested on x86, aarch64 and > power10 and regtested on riscv64. > > Regards > Robin > > gcc/ChangeLog: > > * ifcvt.cc (noce_process_if_block): Subtract condition pattern > cost if applicable. > (noce_find_if_block): Use insn_cost and pattern_cost for > original cost. > --- > gcc/ifcvt.cc | 31 ++++++++++++++++--------------- > 1 file changed, 16 insertions(+), 15 deletions(-) > > diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc > index 58ed42673e5..ebb838fd82c 100644 > --- a/gcc/ifcvt.cc > +++ b/gcc/ifcvt.cc > @@ -3931,16 +3931,16 @@ noce_process_if_block (struct noce_if_info *if_info) > to calculate a value for x. > ??? For future expansion, further expand the "multiple X" rules. */ > > - /* First look for multiple SETS. The original costs already include > - a base cost of COSTS_N_INSNS (2): one instruction for the compare > - (which we will be needing either way) and one instruction for the > - branch. When comparing costs we want to use the branch instruction > - cost and the sets vs. the cmovs generated here. Therefore subtract > - the costs of the compare before checking. > - ??? Actually, instead of the branch instruction costs we might want > - to use COSTS_N_INSNS (BRANCH_COST ()) as in other places. */ > - > - unsigned potential_cost = if_info->original_cost - COSTS_N_INSNS (1); > + /* First look for multiple SETS. > + The original costs already include costs for the jump insn as well > + as for a CC comparison if there is any. > + We want to allow the backend to re-use the existing CC comparison > + and therefore don't consider it for the cost comparison (as it is > + then needed for both the jump as well as the cmov sequence). */ > + > + unsigned potential_cost = if_info->original_cost; > + if (if_info->cond_earliest && if_info->jump != if_info->cond_earliest) > + potential_cost -= insn_cost (if_info->cond_earliest, if_info->speed_p); > unsigned old_cost = if_info->original_cost; > if (!else_bb > && HAVE_conditional_move ...why do we do the adjustment here? Doesn't noce_convert_multiple_sets_1 know for certain (or at least with more certainty) whether any of the new instructions use the old CC result? It seems like we could record that and do the adjustment around the call to targetm.noce_conversion_profitable_p. > @@ -4703,11 +4703,12 @@ noce_find_if_block (basic_block test_bb, edge then_edge, edge else_edge, > = targetm.max_noce_ifcvt_seq_cost (then_edge); > /* We'll add in the cost of THEN_BB and ELSE_BB later, when we check > that they are valid to transform. We can't easily get back to the insn > - for COND (and it may not exist if we had to canonicalize to get COND), > - and jump_insns are always given a cost of 1 by seq_cost, so treat > - both instructions as having cost COSTS_N_INSNS (1). */ > - if_info.original_cost = COSTS_N_INSNS (2); > - > + for COND (and it may not exist if we had to canonicalize to get COND). > + jump insn that is costed via insn_cost. It is assumed that the ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Looks like this part of the comment got a bit garbled. Thanks, Richard > + costs of a jump insn are dependent on the branch costs. */ > + if (if_info.cond_earliest && if_info.jump != if_info.cond_earliest) > + if_info.original_cost = insn_cost (if_info.cond_earliest, if_info.speed_p); > + if_info.original_cost += insn_cost (if_info.jump, if_info.speed_p); > > /* Do the real work. */
The attached v3 tracks the use of cond_earliest as you suggested and adds its cost in default_noce_conversion_profitable_p. Bootstrapped and regtested on x86 and p10, aarch64 still running. Regtested on riscv64. Regards Robin Before noce_find_if_block processes a block it sets up an if_info structure that holds the original costs. At that point the costs of the then/else blocks have not been added so we only care about the "if" cost. The code originally used BRANCH_COST for that but was then changed to COST_N_INSNS (2) - a compare and a jump. This patch computes the jump costs via insn_cost (if_info.jump, ...) under the assumption that the target takes BRANCH_COST into account when costing a jump instruction. In noce_convert_multiple_sets, we keep track of the need for the initial CC comparison. If we needed it for the generated sequence we add its cost in default_noce_conversion_profitable_p. gcc/ChangeLog: * ifcvt.cc (default_noce_conversion_profitable_p): Add cost of CC comparison. (noce_convert_multiple_sets_1): Set use_cond_earliest. (noce_process_if_block): Just use original cost. (noce_find_if_block): Use insn_cost (jump_insn). * ifcvt.h (struct noce_if_info): Add use_cond_earliest. --- gcc/ifcvt.cc | 37 ++++++++++++++++++++++--------------- gcc/ifcvt.h | 3 +++ 2 files changed, 25 insertions(+), 15 deletions(-) diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc index 58ed42673e5..9b408eeb313 100644 --- a/gcc/ifcvt.cc +++ b/gcc/ifcvt.cc @@ -814,7 +814,16 @@ default_noce_conversion_profitable_p (rtx_insn *seq, /* Cost up the new sequence. */ unsigned int cost = seq_cost (seq, speed_p); - if (cost <= if_info->original_cost) + /* If the created sequence does not use cond_earliest (but the jump + does) add its cost to the original_cost here. */ + unsigned int cost_adjust = 0; + + if (if_info->jump != if_info->cond_earliest + && !if_info->use_cond_earliest) + cost_adjust = insn_cost (if_info->cond_earliest, + if_info->speed_p); + + if (cost <= if_info->original_cost + cost_adjust) return true; /* When compiling for size, we can make a reasonably accurately guess @@ -3780,6 +3789,7 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info, temp_dest = temp_dest2; if (!second_try && read_comparison) *last_needs_comparison = count; + if_info->use_cond_earliest = true; } else { @@ -3931,16 +3941,13 @@ noce_process_if_block (struct noce_if_info *if_info) to calculate a value for x. ??? For future expansion, further expand the "multiple X" rules. */ - /* First look for multiple SETS. The original costs already include - a base cost of COSTS_N_INSNS (2): one instruction for the compare - (which we will be needing either way) and one instruction for the - branch. When comparing costs we want to use the branch instruction - cost and the sets vs. the cmovs generated here. Therefore subtract - the costs of the compare before checking. - ??? Actually, instead of the branch instruction costs we might want - to use COSTS_N_INSNS (BRANCH_COST ()) as in other places. */ + /* First look for multiple SETS. + The original costs already include costs for the jump insn as well + as for a CC comparison if there is any. + If a target re-uses the existing CC comparison we keep track of that + and add the costs in default default_noce_conversion_profitable_p. */ - unsigned potential_cost = if_info->original_cost - COSTS_N_INSNS (1); + unsigned potential_cost = if_info->original_cost; unsigned old_cost = if_info->original_cost; if (!else_bb && HAVE_conditional_move @@ -4696,6 +4703,7 @@ noce_find_if_block (basic_block test_bb, edge then_edge, edge else_edge, gcc_assert (if_info.rev_cond == NULL_RTX || rev_cond_earliest == cond_earliest); if_info.cond_earliest = cond_earliest; + if_info.use_cond_earliest = false; if_info.jump = jump; if_info.then_else_reversed = then_else_reversed; if_info.speed_p = speed_p; @@ -4703,11 +4711,10 @@ noce_find_if_block (basic_block test_bb, edge then_edge, edge else_edge, = targetm.max_noce_ifcvt_seq_cost (then_edge); /* We'll add in the cost of THEN_BB and ELSE_BB later, when we check that they are valid to transform. We can't easily get back to the insn - for COND (and it may not exist if we had to canonicalize to get COND), - and jump_insns are always given a cost of 1 by seq_cost, so treat - both instructions as having cost COSTS_N_INSNS (1). */ - if_info.original_cost = COSTS_N_INSNS (2); - + for COND (and it may not exist if we had to canonicalize to get COND). + It is assumed that the costs of a jump insn are dependent on the + branch costs. */ + if_info.original_cost += insn_cost (if_info.jump, if_info.speed_p); /* Do the real work. */ diff --git a/gcc/ifcvt.h b/gcc/ifcvt.h index 37147f99129..67d4f6a502e 100644 --- a/gcc/ifcvt.h +++ b/gcc/ifcvt.h @@ -63,6 +63,9 @@ struct noce_if_info /* New insns should be inserted before this one. */ rtx_insn *cond_earliest; + /* Whether or not COND_EARLIEST is used in the resulting sequence. */ + bool use_cond_earliest; + /* Insns in the THEN and ELSE block. There is always just this one insns in those blocks. The insns are single_set insns. If there was no ELSE block, INSN_B is the last insn before
Robin Dapp <rdapp.gcc@gmail.com> writes: > The attached v3 tracks the use of cond_earliest as you suggested > and adds its cost in default_noce_conversion_profitable_p. > > Bootstrapped and regtested on x86 and p10, aarch64 still > running. Regtested on riscv64. > > Regards > Robin > > Before noce_find_if_block processes a block it sets up an if_info > structure that holds the original costs. At that point the costs of > the then/else blocks have not been added so we only care about the > "if" cost. > > The code originally used BRANCH_COST for that but was then changed > to COST_N_INSNS (2) - a compare and a jump. > > This patch computes the jump costs via > insn_cost (if_info.jump, ...) > under the assumption that the target takes BRANCH_COST into account > when costing a jump instruction. > > In noce_convert_multiple_sets, we keep track of the need for the initial > CC comparison. If we needed it for the generated sequence we add its > cost in default_noce_conversion_profitable_p. I was looking at the code in more detail and just wanted to check. We have: int last_needs_comparison = -1; bool ok = noce_convert_multiple_sets_1 (if_info, &need_no_cmov, &rewired_src, &targets, &temporaries, &unmodified_insns, &last_needs_comparison); if (!ok) return false; /* If there are insns that overwrite part of the initial comparison, we can still omit creating temporaries for the last of them. As the second try will always create a less expensive, valid sequence, we do not need to compare and can discard the first one. */ if (last_needs_comparison != -1) { end_sequence (); start_sequence (); ok = noce_convert_multiple_sets_1 (if_info, &need_no_cmov, &rewired_src, &targets, &temporaries, &unmodified_insns, &last_needs_comparison); /* Actually we should not fail anymore if we reached here, but better still check. */ if (!ok) return false; } But noce_convert_multiple_sets_1 ends with: /* Even if we did not actually need the comparison, we want to make sure to try a second time in order to get rid of the temporaries. */ if (*last_needs_comparison == -1) *last_needs_comparison = 0; return true; AFAICT that means that the first attempt is always redundant. Have I missed something? I don't know if this was something that Manolis's patches addressed. Thanks, Richard > > gcc/ChangeLog: > > * ifcvt.cc (default_noce_conversion_profitable_p): Add cost of > CC comparison. > (noce_convert_multiple_sets_1): Set use_cond_earliest. > (noce_process_if_block): Just use original cost. > (noce_find_if_block): Use insn_cost (jump_insn). > * ifcvt.h (struct noce_if_info): Add use_cond_earliest. > --- > gcc/ifcvt.cc | 37 ++++++++++++++++++++++--------------- > gcc/ifcvt.h | 3 +++ > 2 files changed, 25 insertions(+), 15 deletions(-) > > diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc > index 58ed42673e5..9b408eeb313 100644 > --- a/gcc/ifcvt.cc > +++ b/gcc/ifcvt.cc > @@ -814,7 +814,16 @@ default_noce_conversion_profitable_p (rtx_insn *seq, > /* Cost up the new sequence. */ > unsigned int cost = seq_cost (seq, speed_p); > > - if (cost <= if_info->original_cost) > + /* If the created sequence does not use cond_earliest (but the jump > + does) add its cost to the original_cost here. */ > + unsigned int cost_adjust = 0; > + > + if (if_info->jump != if_info->cond_earliest > + && !if_info->use_cond_earliest) > + cost_adjust = insn_cost (if_info->cond_earliest, > + if_info->speed_p); > + > + if (cost <= if_info->original_cost + cost_adjust) > return true; > > /* When compiling for size, we can make a reasonably accurately guess > @@ -3780,6 +3789,7 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info, > temp_dest = temp_dest2; > if (!second_try && read_comparison) > *last_needs_comparison = count; > + if_info->use_cond_earliest = true; > } > else > { > @@ -3931,16 +3941,13 @@ noce_process_if_block (struct noce_if_info *if_info) > to calculate a value for x. > ??? For future expansion, further expand the "multiple X" rules. */ > > - /* First look for multiple SETS. The original costs already include > - a base cost of COSTS_N_INSNS (2): one instruction for the compare > - (which we will be needing either way) and one instruction for the > - branch. When comparing costs we want to use the branch instruction > - cost and the sets vs. the cmovs generated here. Therefore subtract > - the costs of the compare before checking. > - ??? Actually, instead of the branch instruction costs we might want > - to use COSTS_N_INSNS (BRANCH_COST ()) as in other places. */ > + /* First look for multiple SETS. > + The original costs already include costs for the jump insn as well > + as for a CC comparison if there is any. > + If a target re-uses the existing CC comparison we keep track of that > + and add the costs in default default_noce_conversion_profitable_p. */ > > - unsigned potential_cost = if_info->original_cost - COSTS_N_INSNS (1); > + unsigned potential_cost = if_info->original_cost; > unsigned old_cost = if_info->original_cost; > if (!else_bb > && HAVE_conditional_move > @@ -4696,6 +4703,7 @@ noce_find_if_block (basic_block test_bb, edge then_edge, edge else_edge, > gcc_assert (if_info.rev_cond == NULL_RTX > || rev_cond_earliest == cond_earliest); > if_info.cond_earliest = cond_earliest; > + if_info.use_cond_earliest = false; > if_info.jump = jump; > if_info.then_else_reversed = then_else_reversed; > if_info.speed_p = speed_p; > @@ -4703,11 +4711,10 @@ noce_find_if_block (basic_block test_bb, edge then_edge, edge else_edge, > = targetm.max_noce_ifcvt_seq_cost (then_edge); > /* We'll add in the cost of THEN_BB and ELSE_BB later, when we check > that they are valid to transform. We can't easily get back to the insn > - for COND (and it may not exist if we had to canonicalize to get COND), > - and jump_insns are always given a cost of 1 by seq_cost, so treat > - both instructions as having cost COSTS_N_INSNS (1). */ > - if_info.original_cost = COSTS_N_INSNS (2); > - > + for COND (and it may not exist if we had to canonicalize to get COND). > + It is assumed that the costs of a jump insn are dependent on the > + branch costs. */ > + if_info.original_cost += insn_cost (if_info.jump, if_info.speed_p); > > /* Do the real work. */ > > diff --git a/gcc/ifcvt.h b/gcc/ifcvt.h > index 37147f99129..67d4f6a502e 100644 > --- a/gcc/ifcvt.h > +++ b/gcc/ifcvt.h > @@ -63,6 +63,9 @@ struct noce_if_info > /* New insns should be inserted before this one. */ > rtx_insn *cond_earliest; > > + /* Whether or not COND_EARLIEST is used in the resulting sequence. */ > + bool use_cond_earliest; > + > /* Insns in the THEN and ELSE block. There is always just this > one insns in those blocks. The insns are single_set insns. > If there was no ELSE block, INSN_B is the last insn before
> I was looking at the code in more detail and just wanted to check. > We have: > > int last_needs_comparison = -1; > > bool ok = noce_convert_multiple_sets_1 > (if_info, &need_no_cmov, &rewired_src, &targets, &temporaries, > &unmodified_insns, &last_needs_comparison); > if (!ok) > return false; > > /* If there are insns that overwrite part of the initial > comparison, we can still omit creating temporaries for > the last of them. > As the second try will always create a less expensive, > valid sequence, we do not need to compare and can discard > the first one. */ > if (last_needs_comparison != -1) > { > end_sequence (); > start_sequence (); > ok = noce_convert_multiple_sets_1 > (if_info, &need_no_cmov, &rewired_src, &targets, &temporaries, > &unmodified_insns, &last_needs_comparison); > /* Actually we should not fail anymore if we reached here, > but better still check. */ > if (!ok) > return false; > } > > But noce_convert_multiple_sets_1 ends with: > > /* Even if we did not actually need the comparison, we want to make sure > to try a second time in order to get rid of the temporaries. */ > if (*last_needs_comparison == -1) > *last_needs_comparison = 0; > > > return true; > > AFAICT that means that the first attempt is always redundant. > > Have I missed something? (I might not have fully gotten the question) The idea is that the first attempt goes through all insns and sets *last_need_comparison to the insn number that either - used the condition/comparison by preferring seq1 or - used the condition as a side-effect insn when creating a CC-using insn in seq2. (And we only know that after actually creating the sequences). The second attempt then improves on the first one by skipping any temporary destination registers after the last insn that required the condition (even though its target overlaps with the condition registers). This is true for all cmovs that only use the CC (instead of the condition). Essentially, we know that all following cmovs can be created via the CC which is not overwritten. So, even when we never used the condition because of all CC-using cmovs we would skip the temporary targets in the second attempt. But we can't know that all we ever needed is the CC comparison before actually creating the sequences in the first attempt. Regards Robin
Robin Dapp <rdapp.gcc@gmail.com> writes: >> I was looking at the code in more detail and just wanted to check. >> We have: >> >> int last_needs_comparison = -1; >> >> bool ok = noce_convert_multiple_sets_1 >> (if_info, &need_no_cmov, &rewired_src, &targets, &temporaries, >> &unmodified_insns, &last_needs_comparison); >> if (!ok) >> return false; >> >> /* If there are insns that overwrite part of the initial >> comparison, we can still omit creating temporaries for >> the last of them. >> As the second try will always create a less expensive, >> valid sequence, we do not need to compare and can discard >> the first one. */ >> if (last_needs_comparison != -1) >> { >> end_sequence (); >> start_sequence (); >> ok = noce_convert_multiple_sets_1 >> (if_info, &need_no_cmov, &rewired_src, &targets, &temporaries, >> &unmodified_insns, &last_needs_comparison); >> /* Actually we should not fail anymore if we reached here, >> but better still check. */ >> if (!ok) >> return false; >> } >> >> But noce_convert_multiple_sets_1 ends with: >> >> /* Even if we did not actually need the comparison, we want to make sure >> to try a second time in order to get rid of the temporaries. */ >> if (*last_needs_comparison == -1) >> *last_needs_comparison = 0; >> >> >> return true; >> >> AFAICT that means that the first attempt is always redundant. >> >> Have I missed something? > > (I might not have fully gotten the question) > > The idea is that the first attempt goes through all insns and sets > *last_need_comparison to the insn number that either > - used the condition/comparison by preferring seq1 or > - used the condition as a side-effect insn when creating a CC-using > insn in seq2. > (And we only know that after actually creating the sequences). > > The second attempt then improves on the first one by skipping > any temporary destination registers after the last insn that required > the condition (even though its target overlaps with the condition > registers). This is true for all cmovs that only use the CC > (instead of the condition). Essentially, we know that all following > cmovs can be created via the CC which is not overwritten. > > So, even when we never used the condition because of all CC-using > cmovs we would skip the temporary targets in the second attempt. > But we can't know that all we ever needed is the CC comparison > before actually creating the sequences in the first attempt. Hmm, ok. The bit that confused me most was: if (last_needs_comparison != -1) { end_sequence (); start_sequence (); ... } which implied that the second attempt was made conditionally. It seems like it's always used and is an inherent part of the algorithm. If the problem is tracking liveness, wouldn't it be better to iterate over the "then" block in reverse order? We would start with the liveness set for the join block and update as we move backwards through the "then" block. This liveness set would tell us whether the current instruction needs to preserve a particular register. That should make it possible to do the transformation in one step, and so avoid the risk that the second attempt does something that is unexpectedly different from the first attempt. FWIW, the reason for asking was that it seemed safer to pass use_cond_earliest back from noce_convert_multiple_sets_1 to noce_convert_multiple_sets, as another parameter, and then do the adjustment around noce_convert_multiple_sets's call to targetm.noce_conversion_profitable_p. That would avoid the new for a new if_info field, which in turn would make it less likely that stale information is carried over from one attempt to the next (e.g. if other ifcvt techniques end up using the same field in future). Thanks, Richard
> Hmm, ok. The bit that confused me most was: > > if (last_needs_comparison != -1) > { > end_sequence (); > start_sequence (); > ... > } > > which implied that the second attempt was made conditionally. > It seems like it's always used and is an inherent part of the > algorithm. > > If the problem is tracking liveness, wouldn't it be better to > iterate over the "then" block in reverse order? We would start > with the liveness set for the join block and update as we move > backwards through the "then" block. This liveness set would > tell us whether the current instruction needs to preserve a > particular register. That should make it possible to do the > transformation in one step, and so avoid the risk that the > second attempt does something that is unexpectedly different > from the first attempt. I agree that the current approach is rather cumbersome. Indeed the second attempt was conditional at first and I changed it to be unconditional after some patch iterations. Your reverse-order idea sounds like it should work. To further clean up the algorithm we could also make it more explicit that a "cmov" depends on either the condition or the CC and basically track two separate paths through the block, one CC path and one "condition" path. I can surely do that as a follow up. It might conflict with Manolis's changes, though, so his work should probably be in first. > FWIW, the reason for asking was that it seemed safer to pass > use_cond_earliest back from noce_convert_multiple_sets_1 > to noce_convert_multiple_sets, as another parameter, > and then do the adjustment around noce_convert_multiple_sets's > call to targetm.noce_conversion_profitable_p. That would avoid > the new for a new if_info field, which in turn would make it > less likely that stale information is carried over from one attempt > to the next (e.g. if other ifcvt techniques end up using the same > field in future). Would something like the attached v4 be OK that uses a parameter instead (I mean without having refactored the full algorithm)? At least I changed the comment before the second attempt to hopefully cause a tiny bit less confusion :) I haven't fully bootstrapped it yet. Regards Robin Before noce_find_if_block processes a block it sets up an if_info structure that holds the original costs. At that point the costs of the then/else blocks have not been added so we only care about the "if" cost. The code originally used BRANCH_COST for that but was then changed to COST_N_INSNS (2) - a compare and a jump. This patch computes the jump costs via insn_cost (if_info.jump, ...) under the assumption that the target takes BRANCH_COST into account when costing a jump instruction. In noce_convert_multiple_sets, we keep track of the need for the initial CC comparison. If we needed it for the generated sequence we add its cost before default_noce_conversion_profitable_p. gcc/ChangeLog: * ifcvt.cc (noce_convert_multiple_sets): Define use_cond_earliest and adjust original cost if needed. (noce_convert_multiple_sets_1): Add param use_cond_earliest. (noce_process_if_block): Do not subtract CC cost anymore. (noce_find_if_block): Use insn_cost for costing jump insn. --- gcc/ifcvt.cc | 79 +++++++++++++++++++++++++++++----------------------- 1 file changed, 44 insertions(+), 35 deletions(-) diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc index 58ed42673e5..2854eea7702 100644 --- a/gcc/ifcvt.cc +++ b/gcc/ifcvt.cc @@ -105,7 +105,8 @@ static bool noce_convert_multiple_sets_1 (struct noce_if_info *, hash_map<rtx_insn *, int> *, auto_vec<rtx> *, auto_vec<rtx> *, - auto_vec<rtx_insn *> *, int *); + auto_vec<rtx_insn *> *, + int *, bool *); /* Count the number of non-jump active insns in BB. */ @@ -3502,30 +3503,28 @@ noce_convert_multiple_sets (struct noce_if_info *if_info) int last_needs_comparison = -1; + bool use_cond_earliest = false; + bool ok = noce_convert_multiple_sets_1 (if_info, &need_no_cmov, &rewired_src, &targets, &temporaries, - &unmodified_insns, &last_needs_comparison); + &unmodified_insns, &last_needs_comparison, &use_cond_earliest); if (!ok) return false; - /* If there are insns that overwrite part of the initial - comparison, we can still omit creating temporaries for - the last of them. - As the second try will always create a less expensive, - valid sequence, we do not need to compare and can discard - the first one. */ - if (last_needs_comparison != -1) - { - end_sequence (); - start_sequence (); - ok = noce_convert_multiple_sets_1 - (if_info, &need_no_cmov, &rewired_src, &targets, &temporaries, - &unmodified_insns, &last_needs_comparison); - /* Actually we should not fail anymore if we reached here, - but better still check. */ - if (!ok) - return false; - } + /* Always perform a second attempt that uses information gathered in the + first. At least we can omit creating temporaries until we definitely + need them. The sequence created in the second attempt is never worse + than the first. */ + + end_sequence (); + start_sequence (); + ok = noce_convert_multiple_sets_1 + (if_info, &need_no_cmov, &rewired_src, &targets, &temporaries, + &unmodified_insns, &last_needs_comparison, &use_cond_earliest); + /* Actually we should not fail anymore if we reached here, + but better still check. */ + if (!ok) + return false; /* We must have seen some sort of insn to insert, otherwise we were given an empty BB to convert, and we can't handle that. */ @@ -3539,12 +3538,23 @@ noce_convert_multiple_sets (struct noce_if_info *if_info) /* Actually emit the sequence if it isn't too expensive. */ rtx_insn *seq = get_insns (); + /* If the created sequence does not use cond_earliest (but the jump + does) add its cost to the original_cost before comparing costs. */ + unsigned int original_cost = if_info->original_cost; + if (if_info->jump != if_info->cond_earliest && !use_cond_earliest) + if_info->original_cost += insn_cost (if_info->cond_earliest, + if_info->speed_p); + if (!targetm.noce_conversion_profitable_p (seq, if_info)) { + if_info->original_cost = original_cost; end_sequence (); return false; } + /* Restore the original cost in case we do not succeed below. */ + if_info->original_cost = original_cost; + for (insn = seq; insn; insn = NEXT_INSN (insn)) set_used_flags (insn); @@ -3620,7 +3630,8 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info, auto_vec<rtx> *targets, auto_vec<rtx> *temporaries, auto_vec<rtx_insn *> *unmodified_insns, - int *last_needs_comparison) + int *last_needs_comparison, + bool *use_cond_earliest) { basic_block then_bb = if_info->then_bb; rtx_insn *jump = if_info->jump; @@ -3644,6 +3655,7 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info, unmodified_insns->truncate (0); bool second_try = *last_needs_comparison != -1; + *use_cond_earliest = false; FOR_BB_INSNS (then_bb, insn) { @@ -3780,6 +3792,7 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info, temp_dest = temp_dest2; if (!second_try && read_comparison) *last_needs_comparison = count; + *use_cond_earliest = true; } else { @@ -3931,16 +3944,13 @@ noce_process_if_block (struct noce_if_info *if_info) to calculate a value for x. ??? For future expansion, further expand the "multiple X" rules. */ - /* First look for multiple SETS. The original costs already include - a base cost of COSTS_N_INSNS (2): one instruction for the compare - (which we will be needing either way) and one instruction for the - branch. When comparing costs we want to use the branch instruction - cost and the sets vs. the cmovs generated here. Therefore subtract - the costs of the compare before checking. - ??? Actually, instead of the branch instruction costs we might want - to use COSTS_N_INSNS (BRANCH_COST ()) as in other places. */ + /* First look for multiple SETS. + The original costs already include costs for the jump insn as well + as for a CC comparison if there is any. + If a target re-uses the existing CC comparison we keep track of that + and add the costs before default noce_conversion_profitable_p. */ - unsigned potential_cost = if_info->original_cost - COSTS_N_INSNS (1); + unsigned potential_cost = if_info->original_cost; unsigned old_cost = if_info->original_cost; if (!else_bb && HAVE_conditional_move @@ -4703,11 +4713,10 @@ noce_find_if_block (basic_block test_bb, edge then_edge, edge else_edge, = targetm.max_noce_ifcvt_seq_cost (then_edge); /* We'll add in the cost of THEN_BB and ELSE_BB later, when we check that they are valid to transform. We can't easily get back to the insn - for COND (and it may not exist if we had to canonicalize to get COND), - and jump_insns are always given a cost of 1 by seq_cost, so treat - both instructions as having cost COSTS_N_INSNS (1). */ - if_info.original_cost = COSTS_N_INSNS (2); - + for COND (and it may not exist if we had to canonicalize to get COND). + It is assumed that the costs of a jump insn are dependent on the + branch costs. */ + if_info.original_cost += insn_cost (if_info.jump, if_info.speed_p); /* Do the real work. */
I think Manolis's patches are all in, time to revisit this one? On 6/12/24 1:54 AM, Robin Dapp wrote: >> Hmm, ok. The bit that confused me most was: >> >> if (last_needs_comparison != -1) >> { >> end_sequence (); >> start_sequence (); >> ... >> } >> >> which implied that the second attempt was made conditionally. >> It seems like it's always used and is an inherent part of the >> algorithm. >> >> If the problem is tracking liveness, wouldn't it be better to >> iterate over the "then" block in reverse order? We would start >> with the liveness set for the join block and update as we move >> backwards through the "then" block. This liveness set would >> tell us whether the current instruction needs to preserve a >> particular register. That should make it possible to do the >> transformation in one step, and so avoid the risk that the >> second attempt does something that is unexpectedly different >> from the first attempt. > > I agree that the current approach is rather cumbersome. Indeed > the second attempt was conditional at first and I changed it to > be unconditional after some patch iterations. > Your reverse-order idea sounds like it should work. To further > clean up the algorithm we could also make it more explicit > that a "cmov" depends on either the condition or the CC and > basically track two separate paths through the block, one CC > path and one "condition" path. > > I can surely do that as a follow up. It might conflict with > Manolis's changes, though, so his work should probably be in > first. > >> FWIW, the reason for asking was that it seemed safer to pass >> use_cond_earliest back from noce_convert_multiple_sets_1 >> to noce_convert_multiple_sets, as another parameter, >> and then do the adjustment around noce_convert_multiple_sets's >> call to targetm.noce_conversion_profitable_p. That would avoid >> the new for a new if_info field, which in turn would make it >> less likely that stale information is carried over from one attempt >> to the next (e.g. if other ifcvt techniques end up using the same >> field in future). > > Would something like the attached v4 be OK that uses a parameter > instead (I mean without having refactored the full algorithm)? > At least I changed the comment before the second attempt to > hopefully cause a tiny bit less confusion :) > I haven't fully bootstrapped it yet. > > Regards > Robin > > Before noce_find_if_block processes a block it sets up an if_info > structure that holds the original costs. At that point the costs of > the then/else blocks have not been added so we only care about the > "if" cost. > > The code originally used BRANCH_COST for that but was then changed > to COST_N_INSNS (2) - a compare and a jump. > > This patch computes the jump costs via > insn_cost (if_info.jump, ...) > under the assumption that the target takes BRANCH_COST into account > when costing a jump instruction. > > In noce_convert_multiple_sets, we keep track of the need for the initial > CC comparison. If we needed it for the generated sequence we add its > cost before default_noce_conversion_profitable_p. > > gcc/ChangeLog: > > * ifcvt.cc (noce_convert_multiple_sets): Define > use_cond_earliest and adjust original cost if needed. > (noce_convert_multiple_sets_1): Add param use_cond_earliest. > (noce_process_if_block): Do not subtract CC cost anymore. > (noce_find_if_block): Use insn_cost for costing jump insn. > --- > gcc/ifcvt.cc | 79 +++++++++++++++++++++++++++++----------------------- > 1 file changed, 44 insertions(+), 35 deletions(-) > > diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc > index 58ed42673e5..2854eea7702 100644 > --- a/gcc/ifcvt.cc > +++ b/gcc/ifcvt.cc > @@ -105,7 +105,8 @@ static bool noce_convert_multiple_sets_1 (struct noce_if_info *, > hash_map<rtx_insn *, int> *, > auto_vec<rtx> *, > auto_vec<rtx> *, > - auto_vec<rtx_insn *> *, int *); > + auto_vec<rtx_insn *> *, > + int *, bool *); > > /* Count the number of non-jump active insns in BB. */ > > @@ -3502,30 +3503,28 @@ noce_convert_multiple_sets (struct noce_if_info *if_info) > > int last_needs_comparison = -1; > > + bool use_cond_earliest = false; > + > bool ok = noce_convert_multiple_sets_1 > (if_info, &need_no_cmov, &rewired_src, &targets, &temporaries, > - &unmodified_insns, &last_needs_comparison); > + &unmodified_insns, &last_needs_comparison, &use_cond_earliest); > if (!ok) > return false; > > - /* If there are insns that overwrite part of the initial > - comparison, we can still omit creating temporaries for > - the last of them. > - As the second try will always create a less expensive, > - valid sequence, we do not need to compare and can discard > - the first one. */ > - if (last_needs_comparison != -1) > - { > - end_sequence (); > - start_sequence (); > - ok = noce_convert_multiple_sets_1 > - (if_info, &need_no_cmov, &rewired_src, &targets, &temporaries, > - &unmodified_insns, &last_needs_comparison); > - /* Actually we should not fail anymore if we reached here, > - but better still check. */ > - if (!ok) > - return false; > - } > + /* Always perform a second attempt that uses information gathered in the > + first. At least we can omit creating temporaries until we definitely > + need them. The sequence created in the second attempt is never worse > + than the first. */ > + > + end_sequence (); > + start_sequence (); > + ok = noce_convert_multiple_sets_1 > + (if_info, &need_no_cmov, &rewired_src, &targets, &temporaries, > + &unmodified_insns, &last_needs_comparison, &use_cond_earliest); > + /* Actually we should not fail anymore if we reached here, > + but better still check. */ > + if (!ok) > + return false; > > /* We must have seen some sort of insn to insert, otherwise we were > given an empty BB to convert, and we can't handle that. */ > @@ -3539,12 +3538,23 @@ noce_convert_multiple_sets (struct noce_if_info *if_info) > /* Actually emit the sequence if it isn't too expensive. */ > rtx_insn *seq = get_insns (); > > + /* If the created sequence does not use cond_earliest (but the jump > + does) add its cost to the original_cost before comparing costs. */ > + unsigned int original_cost = if_info->original_cost; > + if (if_info->jump != if_info->cond_earliest && !use_cond_earliest) > + if_info->original_cost += insn_cost (if_info->cond_earliest, > + if_info->speed_p); > + > if (!targetm.noce_conversion_profitable_p (seq, if_info)) > { > + if_info->original_cost = original_cost; > end_sequence (); > return false; > } > > + /* Restore the original cost in case we do not succeed below. */ > + if_info->original_cost = original_cost; > + > for (insn = seq; insn; insn = NEXT_INSN (insn)) > set_used_flags (insn); > > @@ -3620,7 +3630,8 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info, > auto_vec<rtx> *targets, > auto_vec<rtx> *temporaries, > auto_vec<rtx_insn *> *unmodified_insns, > - int *last_needs_comparison) > + int *last_needs_comparison, > + bool *use_cond_earliest) > { > basic_block then_bb = if_info->then_bb; > rtx_insn *jump = if_info->jump; > @@ -3644,6 +3655,7 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info, > unmodified_insns->truncate (0); > > bool second_try = *last_needs_comparison != -1; > + *use_cond_earliest = false; > > FOR_BB_INSNS (then_bb, insn) > { > @@ -3780,6 +3792,7 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info, > temp_dest = temp_dest2; > if (!second_try && read_comparison) > *last_needs_comparison = count; > + *use_cond_earliest = true; > } > else > { > @@ -3931,16 +3944,13 @@ noce_process_if_block (struct noce_if_info *if_info) > to calculate a value for x. > ??? For future expansion, further expand the "multiple X" rules. */ > > - /* First look for multiple SETS. The original costs already include > - a base cost of COSTS_N_INSNS (2): one instruction for the compare > - (which we will be needing either way) and one instruction for the > - branch. When comparing costs we want to use the branch instruction > - cost and the sets vs. the cmovs generated here. Therefore subtract > - the costs of the compare before checking. > - ??? Actually, instead of the branch instruction costs we might want > - to use COSTS_N_INSNS (BRANCH_COST ()) as in other places. */ > + /* First look for multiple SETS. > + The original costs already include costs for the jump insn as well > + as for a CC comparison if there is any. > + If a target re-uses the existing CC comparison we keep track of that > + and add the costs before default noce_conversion_profitable_p. */ > > - unsigned potential_cost = if_info->original_cost - COSTS_N_INSNS (1); > + unsigned potential_cost = if_info->original_cost; > unsigned old_cost = if_info->original_cost; > if (!else_bb > && HAVE_conditional_move > @@ -4703,11 +4713,10 @@ noce_find_if_block (basic_block test_bb, edge then_edge, edge else_edge, > = targetm.max_noce_ifcvt_seq_cost (then_edge); > /* We'll add in the cost of THEN_BB and ELSE_BB later, when we check > that they are valid to transform. We can't easily get back to the insn > - for COND (and it may not exist if we had to canonicalize to get COND), > - and jump_insns are always given a cost of 1 by seq_cost, so treat > - both instructions as having cost COSTS_N_INSNS (1). */ > - if_info.original_cost = COSTS_N_INSNS (2); > - > + for COND (and it may not exist if we had to canonicalize to get COND). > + It is assumed that the costs of a jump insn are dependent on the > + branch costs. */ > + if_info.original_cost += insn_cost (if_info.jump, if_info.speed_p); > > /* Do the real work. */ >
>> If the problem is tracking liveness, wouldn't it be better to >> iterate over the "then" block in reverse order? We would start >> with the liveness set for the join block and update as we move >> backwards through the "then" block. This liveness set would >> tell us whether the current instruction needs to preserve a >> particular register. That should make it possible to do the >> transformation in one step, and so avoid the risk that the >> second attempt does something that is unexpectedly different >> from the first attempt. > > I agree that the current approach is rather cumbersome. Indeed > the second attempt was conditional at first and I changed it to > be unconditional after some patch iterations. > Your reverse-order idea sounds like it should work. To further > clean up the algorithm we could also make it more explicit > that a "cmov" depends on either the condition or the CC and > basically track two separate paths through the block, one CC > path and one "condition" path. I gave this another thought. Right now we keep track of the generated targets and temporaries in forward order, using those for the source rewiring. I don't see how we could do that in reverse order other than have another fixup iteration afterwards. >> FWIW, the reason for asking was that it seemed safer to pass >> use_cond_earliest back from noce_convert_multiple_sets_1 >> to noce_convert_multiple_sets, as another parameter, >> and then do the adjustment around noce_convert_multiple_sets's >> call to targetm.noce_conversion_profitable_p. That would avoid >> the new for a new if_info field, which in turn would make it >> less likely that stale information is carried over from one attempt >> to the next (e.g. if other ifcvt techniques end up using the same >> field in future). > > Would something like the attached v4 be OK that uses a parameter > instead (I mean without having refactored the full algorithm)? > At least I changed the comment before the second attempt to > hopefully cause a tiny bit less confusion :) > I haven't fully bootstrapped it yet. That v4 was bootstrapped and regtested on x86 and aarch64 in the meanwhile and it has been in our internal tree for a while without problems. Would it be OK for trunk without further refactoring?
"Robin Dapp" <rdapp.gcc@gmail.com> writes: >>> If the problem is tracking liveness, wouldn't it be better to >>> iterate over the "then" block in reverse order? We would start >>> with the liveness set for the join block and update as we move >>> backwards through the "then" block. This liveness set would >>> tell us whether the current instruction needs to preserve a >>> particular register. That should make it possible to do the >>> transformation in one step, and so avoid the risk that the >>> second attempt does something that is unexpectedly different >>> from the first attempt. >> >> I agree that the current approach is rather cumbersome. Indeed >> the second attempt was conditional at first and I changed it to >> be unconditional after some patch iterations. >> Your reverse-order idea sounds like it should work. To further >> clean up the algorithm we could also make it more explicit >> that a "cmov" depends on either the condition or the CC and >> basically track two separate paths through the block, one CC >> path and one "condition" path. > > I gave this another thought. Right now we keep track of the > generated targets and temporaries in forward order, using those > for the source rewiring. I don't see how we could do that in > reverse order other than have another fixup iteration afterwards. > >>> FWIW, the reason for asking was that it seemed safer to pass >>> use_cond_earliest back from noce_convert_multiple_sets_1 >>> to noce_convert_multiple_sets, as another parameter, >>> and then do the adjustment around noce_convert_multiple_sets's >>> call to targetm.noce_conversion_profitable_p. That would avoid >>> the new for a new if_info field, which in turn would make it >>> less likely that stale information is carried over from one attempt >>> to the next (e.g. if other ifcvt techniques end up using the same >>> field in future). >> >> Would something like the attached v4 be OK that uses a parameter >> instead (I mean without having refactored the full algorithm)? >> At least I changed the comment before the second attempt to >> hopefully cause a tiny bit less confusion :) >> I haven't fully bootstrapped it yet. > > That v4 was bootstrapped and regtested on x86 and aarch64 in the meanwhile and > it has been in our internal tree for a while without problems. > > Would it be OK for trunk without further refactoring? I think it'd be better if I abstain from this. I probably disagree too much with the current structure and the way that the code is developing. I won't object if anyone else approves it though. Thanks, Richard
> I think it'd be better if I abstain from this. I probably disagree too > much with the current structure and the way that the code is developing. > I won't object if anyone else approves it though. It's not that I'm happy with the current state either and I thought about how to rewrite it more than once. So if you have an idea for a good rework of it (or larger parts of ifcvt) I'd be all ears. My argument right now would be that this patch doesn't make things worse in terms of complexity and improves SPEC 2017's deepsjeng considerably on riscv. IMHO it doesn't inhibit a future rewrite and even simplifies things ever so slightly.
diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc index 58ed42673e5..305b9faed38 100644 --- a/gcc/ifcvt.cc +++ b/gcc/ifcvt.cc @@ -3940,7 +3940,9 @@ noce_process_if_block (struct noce_if_info *if_info) ??? Actually, instead of the branch instruction costs we might want to use COSTS_N_INSNS (BRANCH_COST ()) as in other places. */ - unsigned potential_cost = if_info->original_cost - COSTS_N_INSNS (1); + unsigned potential_cost = if_info->original_cost; + if (cc_in_cond (if_info->cond)) + potential_cost -= pattern_cost (if_info->cond, if_info->speed_p); unsigned old_cost = if_info->original_cost; if (!else_bb && HAVE_conditional_move @@ -4703,11 +4705,13 @@ noce_find_if_block (basic_block test_bb, edge then_edge, edge else_edge, = targetm.max_noce_ifcvt_seq_cost (then_edge); /* We'll add in the cost of THEN_BB and ELSE_BB later, when we check that they are valid to transform. We can't easily get back to the insn - for COND (and it may not exist if we had to canonicalize to get COND), - and jump_insns are always given a cost of 1 by seq_cost, so treat - both instructions as having cost COSTS_N_INSNS (1). */ - if_info.original_cost = COSTS_N_INSNS (2); - + for COND (and it may not exist if we had to canonicalize to get COND). + Here we assume one CC compare insn (if the target uses CC) and one + jump insn that is costed via insn_cost. It is assumed that the + costs of a jump insn are dependent on the branch costs. */ + if (cc_in_cond (if_info.cond)) + if_info.original_cost = pattern_cost (if_info.cond, if_info.speed_p); + if_info.original_cost += insn_cost (if_info.jump, if_info.speed_p); /* Do the real work. */