[3/4] sched1: model: only promote true dependencies in predecessor promotion

Background
----------
sched1 runs a preliminary "model scheduler" ahead of the main list scheduler.
Its sole purpose is to keep register pressure to a minimum [1] and it uses DFA
register dependency tracking to arrange insns.

   [1] https://gcc.gnu.org/legacy-ml/gcc-patches/2011-12/msg01684.html

                           `The idea was to construct a preliminary
   "model" schedule in which the only objective is to keep register
   pressure to a minimum.  This schedule ignores pipeline characteristics,
   latencies, and the number of available registers.  The maximum pressure
   seen in this initial model schedule (MP) is then the benchmark for ECC(X).`

It starts off with an initial "worklist" of insns without any prior
dependencies, schedules them, adds the successors of each scheduled insn to
the worklist, and so on until all insns in the basic block are done.
It can run into situations where an otherwise to-be-scheduled candidate can't
be scheduled because its predecessors haven't been scheduled yet, requiring
"predecessor promotion", implemented in model_promote_predecessors ().
Promotion essentially bumps the INSN's model_priority so that it moves
towards the head of the eligible-to-schedule list.
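
The worklist scheme described above can be sketched as a toy in C. This is
only an illustration under stated assumptions, not the haifa-sched.cc code:
`struct toy_node`, `toy_schedule` and the fixed-size arrays are hypothetical
names, and the only selection rule modelled is "higher priority value wins".

```c
#include <assert.h>

#define TOY_MAX_INSNS 16

/* Hypothetical toy insn; not GCC's struct model_insn_info.  */
struct toy_node
{
  int priority;              /* Higher value is scheduled first.  */
  int unscheduled_preds;     /* Predecessors not yet scheduled.  */
  int num_succs;
  int succs[TOY_MAX_INSNS];  /* Indices of successor insns.  */
  int scheduled;
};

/* Schedule N insns into ORDER using a worklist of ready insns.
   Returns the number of insns scheduled (N for an acyclic graph).  */
int
toy_schedule (struct toy_node *insns, int n, int *order)
{
  int count = 0;

  assert (n <= TOY_MAX_INSNS);
  for (;;)
    {
      int best = -1;

      /* The worklist is every unscheduled insn with no pending preds;
         pick the one with the highest priority.  */
      for (int i = 0; i < n; i++)
        if (!insns[i].scheduled && insns[i].unscheduled_preds == 0
            && (best < 0 || insns[i].priority > insns[best].priority))
          best = i;
      if (best < 0)
        break;

      insns[best].scheduled = 1;
      order[count++] = best;

      /* Scheduling BEST may make its successors ready.  */
      for (int s = 0; s < insns[best].num_succs; s++)
        insns[insns[best].succs[s]].unscheduled_preds--;
    }
  return count;
}
```

In this toy an insn with pending predecessors can never be picked, no matter
how desirable it is; raising the priority of those predecessors is the only
way to pull it forward, which is the job of promotion.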

An INSN can have multiple dependencies/predecessor nodes, some of them
being true dependencies (REG_DEP_TRUE), meaning the predecessor's register
output is a must-have for the INSN to be scheduled.
e.g.
In the sched1 dump below, insn 70 has multiple deps, but 68 and 69 are
true reg deps:

;;	| insn | prio |
;;	|   68 |    3 | r217=r144+r146
;;	|   69 |    5 | r218=flt(r143)
;;	|   70 |    2 | [r217]=r218

;;      insn  code    bb   dep  prio  cost   reservation
;;      ----  ----    --   ---  ----  ----   -----------
;;       70   286     6     6     2     1   alu	: FW: 97tnm 91m 83tn 78m 76tn 72tn
;;						: BK: 69t 68t 57n 60n 61n 64n
                                                      ^^^ ^^^

Issue
-----
Currently predecessor promotion bumps the priority of all predecessors
to the same value, treating the true deps and the rest alike. This simple
strategy can sometimes cause a subtle inadvertent effect: given the right
"other" conditions (depth, height etc.) a non-true dependency can get
scheduled ahead of the true dep, increasing the live range between the
true dep and the dependent. This increases the peak register pressure
for the BB. Subsequently this inflated peak register pressure steers
the main list scheduler, giving it the lower bound to work with.
The main scheduler can make pressure worse (due to instruction latencies,
pipeline models etc.) but otherwise it works with the given
peak pressure. Thus a higher model pressure will ultimately lead to a
less than ideal final schedule.

This subtle effect gets severely exacerbated on the RISC-V SPEC2017 Cactu
benchmark.

For the spill2.cpp testcase (reduced from Cactu itself) on RISC-V,
here's what was seen:

  - After idx #6, insn 70 predecessors are promoted (see list above).
    Of the true deps, insn 68 is already scheduled, insn 69 needs to be.
    insn 69 does get promoted (higher priority 4) but so does insn 60
    (prio 4) with its predecessor insn 58 getting even higher
    promotion (prio 5).

  - The insn 58 and its deps chain end up being scheduled ahead of insn 70
    such that now there are 3 insns separating it from its direct dep
    insn 69. This blows reg pressure past the spill threshold of
    sched_class_regs_num[GR_REGS] as evident from the pressure
    summary at the end.

;;	Model schedule:
;;
;;	| idx insn | mpri hght dpth prio |
;;	|   0   56 |    0    8    0   15 | r210=flt(r185#0)     GR_REGS:[25,+0] FP_REGS:[0,+1]
;;	|   1   57 |    0    8    1   12 | [r242+low(`e')]=r210 GR_REGS:[25,+0] FP_REGS:[1,-1]
;;	|   2   63 |    2    7    0   12 | r215=r141+r183       GR_REGS:[25,+1] FP_REGS:[0,+0]
;;	|   3   64 |    1    8    2   11 | r216=[r215]          GR_REGS:[26,-1] FP_REGS:[0,+1]
;;	|   4   65 |    0    8    3    8 | r143=fix(r216)       GR_REGS:[25,+1] FP_REGS:[1,-1]
;;	|   5   67 |    0    8    4    4 | r146=r143<<0x3       GR_REGS:[26,+1] FP_REGS:[0,+0]
;;	|   6   68 |    0    8    5    3 | r217=r144+r146       GR_REGS:[27,+1] FP_REGS:[0,+0]

;;	+--- priority of 70 = 3, priority of 69 60 61 = 4

;;	|   7   69 |    4    7    4    5 | r218=flt(r143)        GR_REGS:[28,+0] FP_REGS:[0,+1]
;;	|   8   58 |    6    4    0    5 | r211=r138+r183        GR_REGS:[28,+1] FP_REGS:[1,+0]
                                                                         ^^^^^^^
;;	|   9   60 |    5    5    2    4 | r213=[r211]           GR_REGS:[29,-1] FP_REGS:[1,+1]
;;	|  10   61 |    4    3    0    4 | r214=[r243+low(`o')]  GR_REGS:[28,+0] FP_REGS:[2,+1]
;;	|  11   70 |    3    8    6    2 | [r217]=r218           GR_REGS:[28,-1] FP_REGS:[3,-1]
...
...
;; Pressure summary: GR_REGS:29 FP_REGS:3
                            ^^^
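
The way ordering alone changes peak pressure can be checked on a toy example.
The sketch below is an assumption-laden illustration, not GCC code:
`struct toy_val_insn` and `toy_peak_pressure` are hypothetical, every insn
defines exactly one value, there is a single register class, and a value is
considered live from just after its definition until its last use. It is
loosely modelled on the shape of the dump above and does not reproduce the
actual GR_REGS numbers.

```c
#include <assert.h>

#define TOY_MAX_INSNS 32
#define TOY_MAX_USES 4

/* Hypothetical toy insn: defines one value (named by the insn's own index)
   and uses the values defined by the insns listed in USES.  */
struct toy_val_insn
{
  int num_uses;
  int uses[TOY_MAX_USES];
};

/* Return the peak number of simultaneously live values when the N insns
   are executed in ORDER.  */
int
toy_peak_pressure (const struct toy_val_insn *insns, int n, const int *order)
{
  int pos[TOY_MAX_INSNS], last_use[TOY_MAX_INSNS];
  int peak = 0;

  assert (n <= TOY_MAX_INSNS);

  /* Record each insn's position; an unused value dies where it is born.  */
  for (int p = 0; p < n; p++)
    pos[order[p]] = last_use[order[p]] = p;

  /* A value's live range ends at the latest insn that uses it.  */
  for (int i = 0; i < n; i++)
    for (int u = 0; u < insns[i].num_uses; u++)
      if (pos[i] > last_use[insns[i].uses[u]])
        last_use[insns[i].uses[u]] = pos[i];

  /* Count the values live after each position and keep the maximum.  */
  for (int p = 0; p < n; p++)
    {
      int live = 0;

      for (int v = 0; v < n; v++)
        if (pos[v] <= p && p < last_use[v])
          live++;
      if (live > peak)
        peak = live;
    }
  return peak;
}
```

For example, take six insns where insn 4 consumes insn 0 (a true dep pair
like 69 and 70), insn 2 consumes insn 1, and insn 5 consumes insns 2 and 3.
Scheduling them as 0 1 2 3 4 5 keeps the value of insn 0 live across three
unrelated insns and gives a peak of 3, whereas 0 4 1 2 3 5 gives a peak of 2.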

Solution
--------
When promoting predecessors, only assign the true deps higher priority.
The rest of the predecessors get the same priority as the dependent insn.

Implementation
--------------

Predecessor promotion logic can be described in pseudo "C" as follows:

  insn->model_priority = model_next_priority++;
  for (;;)                                                   #0
    {
     FOR_EACH_DEP (insn->insn, SD_LIST_HARD_BACK, sd_it, dep)
       {
	if (pro->insn                                        #1
	      && pro->model_priority != model_next_priority
	      && QUEUE_INDEX (pro->insn) != QUEUE_SCHEDULED)
	 {

            pro->model_priority = model_next_priority;       #2

	    if (QUEUE_INDEX (pro->insn) == QUEUE_READY)
	      {
...
              }
            else // recurse to dependent                     #3
              {
                  pro->next = first;
                  first = pro;
              }
	 }                                  // if not scheduled
       }                                    // FOR_EACH_DEP

       if (!first)
        break;
       insn = first;
       first = insn->next;
    }                                       // for
   model_next_priority++ ;

 - The core change of this patch is essentially bifurcating #2 to bump the
   priority for true dependencies more than for the rest of the deps.

 - An additional guard is added in #3 to only recurse for true deps;
   otherwise it can end up clobbering the recurse list with the same entry
   showing up multiple times (e.g. an insn could be a predecessor of multiple
   dependent insns: as a true dep for the first and a normal dep for the
   others).

 - The condition #1 also needs to be tightened for the two levels of
   predecessor priorities.

 - The good (and bad) thing is that there is an overarching infinite loop #0
   and any coding snafus tend to hit it pretty fast, just by trying to
   bootstrap the toolchain (especially libgcc) or building glibc off of it.

 - There is also a need to track true dependencies in addition to the
   existing all deps of an insn. This is needed at the call-site of the
   promotion logic to only invoke promotion when there are unscheduled
   true dependencies.

 - These changes are NOT gated behind the new target hook as it seems
   like the right thing to do anywhere/everywhere.
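
The two-level scheme from the bullets above can be sketched as a toy in C.
This is one possible reading of the description, not the code in the patch:
`struct toy_dep`, `struct toy_pinsn`, `toy_promote_predecessors` and
`toy_has_pending_true_pred` are hypothetical names, the recurse list is a
plain array used as a stack, and the "already promoted" guard is written as
a `>=` comparison, which is an assumption about how #1 could be tightened.

```c
#include <assert.h>

#define TOY_MAX_INSNS 16
#define TOY_MAX_DEPS 4

/* Hypothetical toy types; not GCC's dep_t or struct model_insn_info.  */
struct toy_dep
{
  int pro;      /* Index of the predecessor (producer) insn.  */
  int is_true;  /* Nonzero for a true (REG_DEP_TRUE-like) dependency.  */
};

struct toy_pinsn
{
  int priority; /* Higher value is scheduled earlier.  */
  int scheduled;
  int num_deps;
  struct toy_dep deps[TOY_MAX_DEPS];
};

/* Return nonzero if INSN still has an unscheduled true predecessor.
   Computed on the fly here; the patch keeps a counter instead.  */
int
toy_has_pending_true_pred (const struct toy_pinsn *insns, int insn)
{
  for (int d = 0; d < insns[insn].num_deps; d++)
    if (insns[insn].deps[d].is_true
        && !insns[insns[insn].deps[d].pro].scheduled)
      return 1;
  return 0;
}

/* Promote the predecessors of INSN using two priority levels.  */
void
toy_promote_predecessors (struct toy_pinsn *insns, int insn,
                          int *next_priority)
{
  int stack[TOY_MAX_INSNS];
  int top = 0;
  int base = (*next_priority)++;

  insns[insn].priority = base;
  stack[top++] = insn;
  while (top > 0)
    {
      int cur = stack[--top];

      for (int d = 0; d < insns[cur].num_deps; d++)
        {
          const struct toy_dep *dep = &insns[cur].deps[d];
          /* True deps go one level above the dependent; others match it.  */
          int want = dep->is_true ? *next_priority : base;

          if (insns[dep->pro].scheduled || insns[dep->pro].priority >= want)
            continue;
          insns[dep->pro].priority = want;

          /* Only walk further up through true dependencies.  */
          if (dep->is_true)
            {
              assert (top < TOY_MAX_INSNS);
              stack[top++] = dep->pro;
            }
        }
    }
  (*next_priority)++;
}
```

A caller in this toy would invoke `toy_promote_predecessors` only when
`toy_has_pending_true_pred` returns nonzero. Because an insn is pushed only
when its priority is raised to the true-dep level, and is skipped once it is
already there, each insn is pushed at most once and the loop terminates.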

Improvement measurement
------------------------
Results are convincing on a RISC-V (BPI-F3) run of Cactu.
(Build: -Ofast -march=rv64gcv_zba_zbb_zbs)

 Before:
 ------
 7,631,707,552,979      cycles:u                         #    1.600 GHz
 2,630,225,489,010      instructions:u                   #    0.34  insn per cycle

 After (just this fix)
 -----
 7,100,864,968,062      cycles:u          ( 7% faster)   #    1.600 GHz
 2,180,957,013,712      instructions:u    (17% fewer)    #    0.31  insn per cycle

 Aggregate (with ECC fix)
 ----
 6,736,337,207,427      cycles:u          (12% faster)   #    1.600 GHz
 2,078,712,047,604      instructions:u    (21% fewer)    #    0.31  insn per cycle

 ECC fix alone (prev patch)
 ----
 7,153,778,156,281      cycles:u         ( 6% faster)    #    1.600 GHz
 2,143,115,846,207      instructions:u   (18.5% fewer)   #    0.30  insn per cycle

Significant gains are also seen on aarch64 (QEMU dynamic icounts only)
(Build: -Ofast -march=armv9-a+sve2)

 Before                   : 1,382,403,783,566
 After (just this fix)    : 1,237,532,639,657 (10.4% fewer)
 Aggregate (with ECC fix) : 1,113,896,471,282 (19.4% fewer)
 Just ECC fix (prev patch): 1,264,869,192,921 (8.5% fewer)

TBD
---

On RISC-V the individual gains from the model pressure fix (7, 17) and
the ECC fix (6, 18.5) are not adding up with both fixes (12, 21),
indicating that the main list scheduler could be undoing some of the
model schedule arrangements (which it does anyway for the right
reasons). On aarch64 they seem to be accumulating on top nicely,
at least in the QEMU icounts reduction.
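
The percentages quoted above can be recomputed from the raw counts with a
couple of helpers. This is a small arithmetic sketch: `toy_pct_reduction`
and `toy_pct_composed` are hypothetical names, and treating two independent
fixes as composing multiplicatively is an assumption made here for
comparison, not a claim from the measurements.

```c
#include <assert.h>

/* Percentage reduction going from BEFORE to AFTER.  */
double
toy_pct_reduction (double before, double after)
{
  assert (before > 0.0);
  return 100.0 * (before - after) / before;
}

/* Reduction (in percent) expected if two fixes with individual reductions
   P1 and P2 (in percent) composed multiplicatively.  That the fixes are
   independent in this sense is an assumption.  */
double
toy_pct_composed (double p1, double p2)
{
  return 100.0 * (1.0 - (1.0 - p1 / 100.0) * (1.0 - p2 / 100.0));
}
```

Feeding in the RISC-V cycle counts gives a composed reduction of roughly
12.8% against roughly 11.7% measured for the aggregate, while for the RISC-V
instruction counts the composed figure is roughly 32% against roughly 21%
measured. For the aarch64 icounts the composed figure is roughly 18% against
19.4% measured.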

gcc/ChangeLog:

	PR target/114729
	* haifa-sched.cc (struct model_insn_info): Add field
	unscheduled_true_preds.
	(model_analyze_insns): Initialize unscheduled_true_preds.
	(model_add_successors_to_worklist): Decrement
	unscheduled_true_preds.
	(model_promote_predecessors): Handle true dependencies
	differently than the rest.
	(model_choose_insn): Promote only on pending true dependencies.
	(model_dump_pressure_summary): Print BB index.
	(model_start_schedule): Call dump summary with BB reference.
	* sched-rgn.cc (debug_dependencies): Print predecessors for
	debugging aid.

gcc/testsuite/ChangeLog:

	PR target/114729
	* gcc.target/riscv/sched1-spills/hang1.c: New test.
	* gcc.target/riscv/sched1-spills/hang5.c: New test.
	* gcc.target/riscv/sched1-spills/spill2.cpp: New test.

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
---
 gcc/haifa-sched.cc                            | 66 ++++++++++++++-----
 gcc/sched-rgn.cc                              | 14 +++-
 .../gcc.target/riscv/sched1-spills/hang1.c    | 32 +++++++++
 .../gcc.target/riscv/sched1-spills/hang5.c    | 60 +++++++++++++++++
 .../gcc.target/riscv/sched1-spills/spill2.cpp | 37 +++++++++++
 5 files changed, 192 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/sched1-spills/hang1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sched1-spills/hang5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sched1-spills/spill2.cpp

Message ID	20241020194018.3051160-4-vineetg@rivosinc.com
State	New
From	Vineet Gupta <vineetg@rivosinc.com>
To	gcc-patches@gcc.gnu.org, Richard Sandiford <rdsandiford@googlemail.com>, Vladimir Makarov <vmakarov@redhat.com>, Michael Meissner <gnu@the-meissners.org>, Peter Bergner <bergner@linux.ibm.com>, Wilco Dijkstra <wdijkstr@arm.com>
Cc	Jeff Law <jeffreyalaw@gmail.com>, gnu-toolchain@rivosinc.com, Vineet Gupta <vineetg@rivosinc.com>
Date	Sun, 20 Oct 2024 12:40:17 -0700
Series	sched1 improvements
	[0/4] sched1 improvements
	[1/4] sched1: hookize pressure scheduling spilling aggressiveness
	[2/4] RISC-V: Implement TARGET_SCHED_PRESSURE_PREFER_NARROW [PR/114729]
	[3/4] sched1: model: only promote true dependencies in predecessor promotion
	[4/4] sched1: model: ICE on infinite loops in predecessor promotion (Not for Merge)
