[0/4] sched1 improvements

Message ID 20241020194018.3051160-1-vineetg@rivosinc.com

Message

Vineet Gupta Oct. 20, 2024, 7:40 p.m. UTC
Hi,

Please find attached a patch series which reduces spilling caused by
sched1. This all started with SPEC2017 507.Cactu dynamic icounts on
RISC-V being roughly double those of aarch64 (~2.6 trillion vs.
~1.4 trillion). Robin/Jeff hinted that the issue could be sched1, which
it turned out to be.

Essentially there are 2 fixes, plus a debug aid:

  - Patch 1/4 improves the main list scheduler outcomes by not
    watering down a negative pressure change to zero. It implements
    a target hook, which is separately enabled in patch 2/4 for RISC-V.

  - Patch 3/4 improves the model schedule so that it does not increase
    register pressure in certain cases.

  - Patch 4/4 is just a debug hack which I would like any testers to
    apply, as it helped me a lot during development of patch 3/4.
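
To illustrate the idea behind patch 1/4, here is a toy model (not the
actual haifa-sched.cc code; the function names, the candidate tuples and
the tie-break rule are all assumptions made up for this sketch). It shows
how clamping a negative pressure change to zero can hide a candidate that
would reduce register pressure:

```python
# Toy model only: real sched1 ranks candidates with many more heuristics.
# Each candidate is (name, pressure_delta); a negative delta means that
# scheduling the insn frees registers.

def pressure_cost(delta, keep_negative):
    """Cost used to rank a candidate; lower is better."""
    # Clamping turns every pressure-reducing insn into a "neutral" one.
    return delta if keep_negative else max(delta, 0)

def pick(candidates, keep_negative):
    """Return the name of the cheapest candidate.

    min() returns the first minimal element, so ties fall back to list
    order (a stand-in for whatever other tie-breakers would apply).
    """
    best = min(candidates, key=lambda c: pressure_cost(c[1], keep_negative))
    return best[0]

candidates = [("insn_a", 0), ("insn_b", -2)]

print(pick(candidates, keep_negative=False))  # insn_a: both costs clamp to 0
print(pick(candidates, keep_negative=True))   # insn_b: -2 beats 0
```

In this sketch the keep_negative flag plays the role of the new target
hook: when it is off, the pressure-reducing candidate is indistinguishable
from a neutral one.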

More details can be found in individual patches.

Results on RISC-V hardware BPI-F3 (perf stat instructions/cycles) and
on aarch64 (I could only get QEMU dynamic icounts):

RISC-V BPI-F3 (-Ofast -march=rv64gcv_zba_zbb_zbs)

  baseline  | 7,631,707,552,979      cycles:u                         #    1.600 GHz
            | 2,630,225,489,010      instructions:u                   #    0.34  insn per cycle
            |
  all       | 6,736,337,207,427      cycles:u           (12% faster)  #    1.600 GHz
  patches   | 2,078,712,047,604      instructions:u     (21% fewer)   #    0.31  insn per cycle

aarch64 (-Ofast -march=armv9-a+sve2), with TARGET_SCHED_PRESSURE_PREFER_NARROW implemented to return true

  baseline  | 1,382,403,783,566      instructions (QEMU icount)
            |
  all       | 1,113,896,471,282      instructions (QEMU icount)  (19.4% fewer)
  patches   |
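
For anyone who wants to re-derive the percentages above, here is a small
sketch that recomputes them from the raw counts (the helper name
pct_reduction is just something made up for this example):

```python
def pct_reduction(before, after):
    """Percentage by which `after` is lower than `before`."""
    return 100.0 * (before - after) / before

# (baseline, all patches) pairs copied from the tables above.
riscv_cycles = (7_631_707_552_979, 6_736_337_207_427)
riscv_insns = (2_630_225_489_010, 2_078_712_047_604)
aarch64_insns = (1_382_403_783_566, 1_113_896_471_282)

print(round(pct_reduction(*riscv_cycles)))      # 12
print(round(pct_reduction(*riscv_insns)))       # 21
print(round(pct_reduction(*aarch64_insns), 1))  # 19.4

# Instructions per cycle on the BPI-F3, before and after.
print(round(riscv_insns[0] / riscv_cycles[0], 2))  # 0.34
print(round(riscv_insns[1] / riscv_cycles[1], 2))  # 0.31
```

Note that the instruction count drops more than the cycle count (21% vs.
12%), which is why the insn-per-cycle figure goes down slightly even
though the run is faster overall.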

As a follow-up to discussions at Cauldron last month, I'm CC'ing some of
the aarch64 and power folks to test this on real hardware and get the
results. Please don't forget to add the equivalent of patch 2/4 for your
respective backends, i.e.:

+#undef  TARGET_SCHED_PRESSURE_PREFER_NARROW
+#define TARGET_SCHED_PRESSURE_PREFER_NARROW hook_bool_void_true

Thx,
-Vineet

Vineet Gupta (4):
  sched1: hookize pressure scheduling spilling agressiveness
  RISC-V: Implement TARGET_SCHED_PRESSURE_PREFER_NARROW [PR/114729]
  sched1: model: only promote true dependecies in predecessor promotion
  sched1: model: ICE on infinite loops in predecessor promotion (Not for
    Merge)

 gcc/config/riscv/riscv.cc                     |   3 +
 gcc/doc/tm.texi                               |  11 ++
 gcc/doc/tm.texi.in                            |   2 +
 gcc/haifa-sched.cc                            | 109 ++++++++++++++----
 gcc/sched-rgn.cc                              |  14 ++-
 gcc/target.def                                |  13 +++
 gcc/testsuite/gcc.target/riscv/riscv.exp      |   2 +
 .../gcc.target/riscv/sched1-spills/hang1.c    |  32 +++++
 .../gcc.target/riscv/sched1-spills/hang5.c    |  60 ++++++++++
 .../gcc.target/riscv/sched1-spills/spill1.cpp |  31 +++++
 .../gcc.target/riscv/sched1-spills/spill2.cpp |  37 ++++++
 11 files changed, 289 insertions(+), 25 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/sched1-spills/hang1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sched1-spills/hang5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sched1-spills/spill1.cpp
 create mode 100644 gcc/testsuite/gcc.target/riscv/sched1-spills/spill2.cpp

--
2.43.0

Comments

Vineet Gupta Oct. 28, 2024, 10:24 p.m. UTC | #1
Ping !

Jeff Law Oct. 28, 2024, 10:53 p.m. UTC | #2
On 10/28/24 4:24 PM, Vineet Gupta wrote:
> Ping !
Pong.  I've got a response to the first patch partially written :-)
Executive summary: I don't have a problem with the functionality in that
patch, just naming/comment issues.  Still trying to figure out how to
express it clearly.

jeff