[v2,00/21] target/sparc: Cleanup condition codes etc

Message ID 20231101041132.174501-1-richard.henderson@linaro.org

Message

Richard Henderson Nov. 1, 2023, 4:11 a.m. UTC
This was part of my guess for some of the performance problems.

I saw compute_all_sub quite high in the profile at some point, and I
believe that the test case has a partially rotated loop such that "cmp"
is in a delay slot, and so the gen_compare fast path for CC_OP_SUB is
not visible to the conditional branch that uses the output of the compare.
Which means that helper_compute_psr gets called more often than we'd like.

Since almost all Sparc instructions that set cc also have a version of
the instruction that does not set cc, we can trust that the compiler
has only used the cc-setting version when it is actually required.
Thus, unlike CISC processors, there is very little scope for optimization
of the flags -- we might as well compute them immediately.

Move away from CC_OP to explicit computation of conditions.  This is
modeled on target/arm for the (mostly) separate representation of the bits.
We can pack icc.[NV] and xcc.[NV] into the same target_ulong, but Z and C
cannot share.  (For "normal" setting of Z, we could share, but it is
possible to set xcc.Z and !icc.Z via explicit write to %ccr, and for
that we have to have two variables.)
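
For illustration only, here is a minimal standalone sketch of the eager flag
computation described above, for a 64-bit SUBcc.  The struct, its field names
and the sketch_* functions are hypothetical and are not taken from the series;
the bit layout (N/V shared in one word, Z and C kept separately for icc and
xcc) simply follows the description in this letter, and the %ccr layout
(xcc in bits 7:4, icc in bits 3:0, each ordered N Z V C) is the SPARC V9 one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag state; names are illustrative, not QEMU's. */
typedef struct {
    uint64_t n;      /* bit 63 = xcc.N, bit 31 = icc.N */
    uint64_t v;      /* bit 63 = xcc.V, bit 31 = icc.V */
    uint64_t xcc_z;  /* xcc.Z is set iff this value is zero */
    uint32_t icc_z;  /* icc.Z is set iff this value is zero */
    bool xcc_c;
    bool icc_c;
} SketchFlags;

/* Compute all flags immediately for r = a - b. */
uint64_t sketch_subcc(SketchFlags *f, uint64_t a, uint64_t b)
{
    uint64_t r = a - b;

    f->n = r;                          /* sign bits live at 63 and 31 */
    f->v = (a ^ b) & (a ^ r);          /* signed overflow of a subtract */
    f->xcc_z = r;
    f->icc_z = (uint32_t)r;
    f->xcc_c = a < b;                  /* borrow out of bit 63 */
    f->icc_c = (uint32_t)a < (uint32_t)b;  /* borrow out of bit 31 */
    return r;
}

/* Explicit write to %ccr: the two Z bits can be set independently. */
void sketch_put_ccr(SketchFlags *f, unsigned ccr)
{
    f->n = ((uint64_t)((ccr >> 7) & 1) << 63)
         | ((uint64_t)((ccr >> 3) & 1) << 31);
    f->v = ((uint64_t)((ccr >> 5) & 1) << 63)
         | ((uint64_t)((ccr >> 1) & 1) << 31);
    f->xcc_z = !((ccr >> 6) & 1);      /* store 0 when Z is set */
    f->icc_z = !((ccr >> 2) & 1);
    f->xcc_c = (ccr >> 4) & 1;
    f->icc_c = ccr & 1;
}

bool sketch_xcc_n(const SketchFlags *f) { return f->n >> 63; }
bool sketch_icc_n(const SketchFlags *f) { return (f->n >> 31) & 1; }
bool sketch_xcc_v(const SketchFlags *f) { return f->v >> 63; }
bool sketch_icc_v(const SketchFlags *f) { return (f->v >> 31) & 1; }
bool sketch_xcc_z(const SketchFlags *f) { return f->xcc_z == 0; }
bool sketch_icc_z(const SketchFlags *f) { return f->icc_z == 0; }
```

After an arithmetic op, icc_z is always the low half of xcc_z, which is the
"could share" case.  After sketch_put_ccr(&f, 0x40), xcc.Z is set while icc.Z
is clear, and no single 64-bit value can encode that, hence the two variables.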

After removing CC_OP, clean up the handling of conditions so that we can
minimize additional setcond required for env->cond.

Finally, inline some division.  With the new out-of-line exception path,
we can expand UDIVX and SDIVX with very few host insns; the 64/32 UDIV
insn needs only a few more.  Leave UDIVcc and SDIV* out of line, as the
overflow and saturation computation in these cases is really too large
to inline.
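
As a rough model of why these three are cheap, the sketch below writes out
the semantics in plain C, as I understand them from the SPARC V9 manual.  It
is not the TCG expansion from the series: the sketch_* names and the status
enum are hypothetical, and the early return on a zero divisor merely stands
in for the branch to the out-of-line exception path.

```c
#include <stdint.h>

/* Hypothetical status standing in for "raise division-by-zero trap". */
typedef enum { SKETCH_OK, SKETCH_TRAP_DIV_ZERO } SketchStatus;

/* UDIVX: 64-bit unsigned divide; only the zero check precedes it. */
SketchStatus sketch_udivx(uint64_t a, uint64_t b, uint64_t *q)
{
    if (b == 0) {
        return SKETCH_TRAP_DIV_ZERO;   /* cold, out-of-line path */
    }
    *q = a / b;
    return SKETCH_OK;
}

/* SDIVX: 64-bit signed divide; INT64_MIN / -1 must not fault on the host. */
SketchStatus sketch_sdivx(int64_t a, int64_t b, int64_t *q)
{
    if (b == 0) {
        return SKETCH_TRAP_DIV_ZERO;
    }
    if (b == -1) {
        /*
         * Negate in unsigned arithmetic so that INT64_MIN wraps to itself.
         * The cast back relies on two's-complement conversion, which is
         * what the usual compilers provide.
         */
        *q = (int64_t)(0 - (uint64_t)a);
    } else {
        *q = a / b;
    }
    return SKETCH_OK;
}

/* UDIV: (Y:rs1[31:0]) / rs2[31:0], quotient saturated to 32 bits. */
SketchStatus sketch_udiv(uint32_t y, uint64_t rs1, uint64_t rs2, uint64_t *q)
{
    uint32_t divisor = (uint32_t)rs2;
    uint64_t dividend, r;

    if (divisor == 0) {
        return SKETCH_TRAP_DIV_ZERO;
    }
    dividend = ((uint64_t)y << 32) | (uint32_t)rs1;
    r = dividend / divisor;
    *q = r > UINT32_MAX ? UINT32_MAX : r;   /* the "few more" insns */
    return SKETCH_OK;
}
```

UDIVcc and the SDIV variants would additionally need signed saturation and
the V flag, which is the part the letter argues is too large to inline.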


r~


Richard Henderson (21):
  target/sparc: Introduce cpu_put_psr_icc
  target/sparc: Split psr and xcc into components
  target/sparc: Remove CC_OP_LOGIC
  target/sparc: Remove CC_OP_DIV
  target/sparc: Remove CC_OP_ADD, CC_OP_ADDX, CC_OP_TADD
  target/sparc: Remove CC_OP_SUB, CC_OP_SUBX, CC_OP_TSUB
  target/sparc: Remove CC_OP_TADDTV, CC_OP_TSUBTV
  target/sparc: Remove CC_OP leftovers
  target/sparc: Remove DisasCompare.is_bool
  target/sparc: Change DisasCompare.c2 to int
  target/sparc: Always copy conditions into a new temporary
  target/sparc: Do flush_cond in advance_jump_cond
  target/sparc: Merge gen_branch2 into advance_pc
  target/sparc: Merge advance_jump_uncond_{never,always} into
    advance_jump_cond
  target/sparc: Pass displacement to advance_jump_cond
  target/sparc: Merge gen_op_next_insn into only caller
  target/sparc: Record entire jump condition in DisasContext
  target/sparc: Discard cpu_cond at the end of each insn
  target/sparc: Implement UDIVX and SDIVX inline
  target/sparc: Implement UDIV inline
  target/sparc: Check for invalid cond in gen_compare_reg

 linux-user/sparc/target_cpu.h |   17 +-
 target/sparc/cpu.h            |   58 +-
 target/sparc/helper.h         |   12 +-
 target/sparc/insns.decode     |    7 +-
 linux-user/sparc/cpu_loop.c   |   11 +-
 linux-user/sparc/signal.c     |    2 +-
 target/sparc/cc_helper.c      |  471 ------------
 target/sparc/cpu.c            |    1 -
 target/sparc/helper.c         |  171 ++---
 target/sparc/int32_helper.c   |    5 -
 target/sparc/int64_helper.c   |    5 -
 target/sparc/machine.c        |   45 +-
 target/sparc/translate.c      | 1333 ++++++++++++++-------------------
 target/sparc/win_helper.c     |   56 +-
 target/sparc/meson.build      |    1 -
 15 files changed, 789 insertions(+), 1406 deletions(-)
 delete mode 100644 target/sparc/cc_helper.c

Comments

Mark Cave-Ayland Nov. 5, 2023, 1:22 p.m. UTC | #1
On 01/11/2023 04:11, Richard Henderson wrote:
> This was part of my guess for some of the performance problems.
> 
> I saw compute_all_sub quite high in the profile at some point, and I
> believe that the test case has a partially rotated loop such that "cmp"
> is in a delay slot, and so the gen_compare fast path for CC_OP_SUB is
> not visible to the conditional branch that uses the output of the compare.
> Which means that helper_compute_psr gets called more often than we'd like.
> 
> Since almost all Sparc instructions that set cc also have a version of
> the instruction that does not set cc, we can trust that the compiler
> has only used the cc-setting version when it is actually required.
> Thus, unlike CISC processors, there is very little scope for optimization
> of the flags -- we might as well compute them immediately.
> 
> Move away from CC_OP to explicit computation of conditions.  This is
> modeled on target/arm for the (mostly) separate representation of the bits.
> We can pack icc.[NV] and xcc.[NV] into the same target_ulong, but Z and C
> cannot share.  (For "normal" setting of Z, we could share, but it is
> possible to set xcc.Z and !icc.Z via explicit write to %ccr, and for
> that we have to have two variables.)
> 
> After removing CC_OP, clean up the handling of conditions so that we can
> minimize additional setcond required for env->cond.
> 
> Finally, inline some division, which can make use of the new out-of-line
> exception path, which means we can expand UDIVX and SDIVX with very few
> host insns.  The 64/32 UDIV insn needs only a few more.  Leave UDIVcc and
> SDIV* out of line, as the overflow and saturation computation in these
> cases is really too large to inline.
> 
> r~

I've tested this series by running through my OpenBIOS boot tests for SPARC32 and 
SPARC64 and didn't spot any obvious regressions, so:

Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Acked-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>



ATB,

Mark.