From patchwork Tue Nov 5 20:11:33 2024
X-Patchwork-Submitter: Vineet Gupta
X-Patchwork-Id: 2007130
From: Vineet Gupta
To: gcc-patches@gcc.gnu.org, Jeff Law, Richard Sandiford, Vladimir Makarov
Cc: Michael Meissner, Peter Bergner, Wilco Dijkstra, gnu-toolchain@rivosinc.com, Vineet Gupta
Subject: [PATCH v2] sched1: parameterize pressure scheduling spilling aggressiveness [PR/114729]
Date: Tue, 5 Nov 2024 12:11:33 -0800
Message-ID: <20241105201133.1363604-1-vineetg@rivosinc.com>

changes since v1
  * Changed the target hook to a --param
  * Squashed the add-on patch for RISC-V opting in; the testcase is now here
  * Updated the changelog with the latest perf numbers
---

sched1 computes ECC (Excess Change Cost) for each insn, which represents
the register pressure attributed to the insn.
Currently the pressure-sensitive scheduling algorithm deliberately
ignores negative values (pressure reduction), clamping them to 0
(neutral), which leads to more spills. This follows from the assumption
that the compiler has a reasonably accurate processor pipeline scheduling
model, and thus tries to aggressively fill pipeline bubbles, accepting
spills as the price. That assumption might not hold: the model might not
be available for certain uarches, or might not even be applicable,
especially for modern out-of-order cores.

The existing heuristic induces a spill frenzy on RISC-V, noticeably so on
SPEC2017 507.cactuBSSN_r. If insn scheduling is disabled completely
(-fno-schedule-insns), the total dynamic icount for this workload is
roughly halved, from ~2.5 trillion insns to ~1.3 trillion.

This patch adds --param=cycle-accurate-model={0,1} to gate the spill
behavior.

 - The default (1) preserves the existing spill behavior.

 - Targets/uarches sensitive to spilling can set the param to 0 to get
   the reverse effect. The RISC-V backend does so too.

The actual perf numbers are very promising.
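To make the gating described above concrete, here is a toy C sketch (not GCC code) of how a per-insn baseECC is treated under each value of the param, as this message describes it. The function name `effective_base_ecc` and its `model` argument are hypothetical; the real logic lives in model_set_excess_costs in haifa-sched.cc and is more involved (it also folds in priorities).

```c
#include <assert.h>

/* Toy sketch, not GCC code: treatment of one insn's baseECC under
   --param=cycle-accurate-model.  MODEL is a hypothetical stand-in for
   the param, which params.opt restricts to 0 or 1.  */
static int
effective_base_ecc (int base_ecc, int model)
{
  assert (model == 0 || model == 1);

  /* Default (1): a negative baseECC (pressure reduction) is clamped
     to 0, so the insn is treated as pressure-neutral.  */
  if (model == 1 && base_ecc < 0)
    return 0;

  /* Opt-out (0): the negative value is kept, so the pressure
     reduction stays visible to the scheduler.  Positive costs are
     unchanged in both modes.  */
  return base_ecc;
}
```

For example, effective_base_ecc (-2, 1) yields 0 while effective_base_ecc (-2, 0) yields -2; a positive cost such as 3 passes through unchanged either way.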
(1) On a RISC-V BPI-F3 (in-order CPU), -Ofast -march=rv64gcv_zba_zbb_zbs:

Before:
------
 Performance counter stats for './cactusBSSN_r_base.rivos spec_ref.par':

     4,917,712.97 msec task-clock:u          #    1.000 CPUs utilized
            5,314      context-switches:u    #    1.081 /sec
                3      cpu-migrations:u      #    0.001 /sec
          204,784      page-faults:u         #   41.642 /sec
7,868,291,222,513      cycles:u              #    1.600 GHz
2,615,069,866,153      instructions:u        #    0.33  insn per cycle
   10,799,381,890      branches:u            #    2.196 M/sec
       15,714,572      branch-misses:u       #    0.15% of all branches

After:
-----
 Performance counter stats for './cactusBSSN_r_base.rivos spec_ref.par':

     4,552,979.58 msec task-clock:u          #    0.998 CPUs utilized
          205,020      context-switches:u    #   45.030 /sec
                2      cpu-migrations:u      #    0.000 /sec
          204,221      page-faults:u         #   44.854 /sec
7,285,176,204,764      cycles:u              #    1.600 GHz  (7.4% faster)
2,145,284,345,397      instructions:u        #    0.29  insn per cycle  (17.96% fewer)
   10,799,382,011      branches:u            #    2.372 M/sec
       16,235,628      branch-misses:u       #    0.15% of all branches

(2) Wilco reported 20% perf gains on AArch64 Neoverse V2 runs.

gcc/ChangeLog:

	PR target/11472
	* params.opt (--param=cycle-accurate-model=): New opt.
	* doc/invoke.texi (cycle-accurate-model): Document.
	* haifa-sched.cc (model_excess_group_cost): Return negative
	delta if param_cycle_accurate_model is 0.
	(model_excess_cost): Dump the actual ECC value.
	(model_set_excess_costs): Clamp negative baseECC to 0 only if
	param_cycle_accurate_model is 1.
	* config/riscv/riscv.cc (riscv_option_override): Set param to 0.

gcc/testsuite/ChangeLog:

	PR target/114729
	* gcc.target/riscv/riscv.exp: Enable new tests to build.
	* gcc.target/riscv/sched1-spills/spill1.cpp: Add new test.
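The percentages quoted in (1) can be rechecked from the raw perf counters. This small C helper, with the hypothetical name `percent_reduction`, is only an arithmetic aid and is not part of the patch.

```c
#include <assert.h>

/* Percentage by which AFTER is lower than BEFORE,
   e.g. for two perf counter readings of the same workload.  */
static double
percent_reduction (double before, double after)
{
  assert (before > 0.0);
  return (before - after) / before * 100.0;
}
```

With the cycles:u counts (7,868,291,222,513 before, 7,285,176,204,764 after) this gives about 7.41%, and with the instructions:u counts (2,615,069,866,153 before, 2,145,284,345,397 after) about 17.96%, matching the annotations above.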
Signed-off-by: Vineet Gupta
---
 gcc/config/riscv/riscv.cc                     |  4 +++
 gcc/doc/invoke.texi                           |  7 ++++
 gcc/haifa-sched.cc                            | 32 ++++++++++++++-----
 gcc/params.opt                                |  4 +++
 gcc/testsuite/gcc.target/riscv/riscv.exp      |  2 ++
 .../gcc.target/riscv/sched1-spills/spill1.cpp | 32 +++++++++++++++++++
 6 files changed, 73 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/sched1-spills/spill1.cpp

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 2e9ac280c8f2..75fcadfc3b58 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -10549,6 +10549,10 @@ riscv_option_override (void)
                        param_sched_pressure_algorithm,
                        SCHED_PRESSURE_MODEL);
 
+  SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+                       param_cycle_accurate_model,
+                       0);
+
   /* Function to allocate machine-dependent function status.  */
   init_machine_status = &riscv_init_machine_status;
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 7146163d66d0..c1e07e258b25 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -17084,6 +17084,13 @@ With @option{--param=openacc-privatization=quiet}, don't diagnose.
 This is the current default.
 With @option{--param=openacc-privatization=noisy}, do diagnose.
 
+@item cycle-accurate-model
+Specifies whether GCC should assume that the scheduling description is mostly
+a cycle-accurate model of the target processor on which the code is intended
+to run, in the absence of cache misses.  Nonzero means that the selected
+scheduling model is accurate and likely describes an in-order processor, and
+that scheduling will aggressively spill to try and fill any pipeline bubbles.
+
 @end table
 
 The following choices of @var{name} are available on AArch64 targets:
diff --git a/gcc/haifa-sched.cc b/gcc/haifa-sched.cc
index 02c893ec5cd3..cd4b6baddcd2 100644
--- a/gcc/haifa-sched.cc
+++ b/gcc/haifa-sched.cc
@@ -2398,11 +2398,18 @@ model_excess_group_cost (struct model_pressure_group *group,
   int pressure, cl;
 
   cl = ira_pressure_classes[pci];
-  if (delta < 0 && point >= group->limits[pci].point)
+  if (delta < 0)
     {
-      pressure = MAX (group->limits[pci].orig_pressure,
-                      curr_reg_pressure[cl] + delta);
-      return -model_spill_cost (cl, pressure, curr_reg_pressure[cl]);
+      if (point >= group->limits[pci].point)
+        {
+          pressure = MAX (group->limits[pci].orig_pressure,
+                          curr_reg_pressure[cl] + delta);
+          return -model_spill_cost (cl, pressure, curr_reg_pressure[cl]);
+        }
+      /* If the target prefers fewer spills, return the negative delta,
+         indicating pressure reduction.  */
+      else if (!param_cycle_accurate_model)
+        return delta;
     }
 
   if (delta > 0)
@@ -2453,7 +2460,7 @@ model_excess_cost (rtx_insn *insn, bool print_p)
     }
 
   if (print_p)
-    fprintf (sched_dump, "\n");
+    fprintf (sched_dump, " ECC %d\n", cost);
 
   return cost;
 }
@@ -2489,8 +2496,9 @@ model_set_excess_costs (rtx_insn **insns, int count)
   bool print_p;
 
   /* Record the baseECC value for each instruction in the model schedule,
-     except that negative costs are converted to zero ones now rather than
-     later.  Do not assign a cost to debug instructions, since they must
+     except that for targets which prefer wider schedules (more spills)
+     negative costs are converted to zero ones now rather than later.
+     Do not assign a cost to debug instructions, since they must
      not change code-generation decisions.  Experiments suggest we also get
      better results by not assigning a cost to instructions from
      a different block.
@@ -2512,7 +2520,7 @@ model_set_excess_costs (rtx_insn **insns, int count)
           print_p = true;
         }
       cost = model_excess_cost (insns[i], print_p);
-      if (cost <= 0)
+      if (param_cycle_accurate_model && cost <= 0)
         {
           priority = INSN_PRIORITY (insns[i]) - insn_delay (insns[i]) - cost;
           priority_base = MAX (priority_base, priority);
@@ -2523,6 +2531,14 @@ model_set_excess_costs (rtx_insn **insns, int count)
   if (print_p)
     fprintf (sched_dump, MODEL_BAR);
 
+  /* Typically in-order cores have a good pipeline scheduling model and the
+     algorithm would try to use that to minimize bubbles, favoring spills.
+     MAX (baseECC, 0) below changes negative baseECC (pressure reduction)
+     to 0 (pressure neutral), thus tending to more spills.  Skip that step
+     for targets that opted out via --param=cycle-accurate-model=0.  */
+  if (!param_cycle_accurate_model)
+    return;
+
   /* Use MAX (baseECC, 0) and baseP to calculcate ECC for each
      instruction.  */
   for (i = 0; i < count; i++)
diff --git a/gcc/params.opt b/gcc/params.opt
index 7c572774df24..6d73993cd089 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -66,6 +66,10 @@ Enable asan stack protection.
 Common Joined UInteger Var(param_asan_use_after_return) Init(1) IntegerRange(0, 1) Param Optimization
 Enable asan detection of use-after-return bugs.
 
+-param=cycle-accurate-model=
+Common Joined UInteger Var(param_cycle_accurate_model) Init(1) IntegerRange(0, 1) Param Optimization
+Whether the scheduling description is mostly a cycle-accurate model of the target processor, in which case scheduling may spill aggressively to fill any pipeline bubbles.
+
 -param=hwasan-instrument-stack=
 Common Joined UInteger Var(param_hwasan_instrument_stack) Init(1) IntegerRange(0, 1) Param Optimization
 Enable hwasan instrumentation of statically sized stack-allocated variables.
diff --git a/gcc/testsuite/gcc.target/riscv/riscv.exp b/gcc/testsuite/gcc.target/riscv/riscv.exp
index 187eb6640470..3cbbf63b9d0a 100644
--- a/gcc/testsuite/gcc.target/riscv/riscv.exp
+++ b/gcc/testsuite/gcc.target/riscv/riscv.exp
@@ -38,6 +38,8 @@ dg-init
 # Main loop.
 gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cS\]]] \
 	"" $DEFAULT_CFLAGS
+gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/sched1-spills/*.{\[cS\],cpp}]] \
+	"" $DEFAULT_CFLAGS
 
 # All done.
 dg-finish
diff --git a/gcc/testsuite/gcc.target/riscv/sched1-spills/spill1.cpp b/gcc/testsuite/gcc.target/riscv/sched1-spills/spill1.cpp
new file mode 100644
index 000000000000..8060ec245281
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sched1-spills/spill1.cpp
@@ -0,0 +1,32 @@
+/* { dg-options "-O2 -march=rv64gc -mabi=lp64d -save-temps -fverbose-asm" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-Og" "-Os" "-Oz" } } */
+
+/* Reduced from SPEC2017 cactuBSSN ML_BSSN_Advect.cpp
+   by comparing -fschedule-insns and -fno-schedule-insns builds.
+   Shows one extra spill (pair of spill markers "sfp") in the verbose
+   asm output, which the patch fixes.  */
+
+void s();
+double b, c, d, e, f, g, h, k, l, m, n, o, p, q, t, u, v;
+int *j;
+double *r, *w;
+long x;
+void y() {
+  double *a((double *)s);
+  for (;;)
+    for (; j[1];)
+      for (int i = 1; i < j[0]; i++) {
+        k = l;
+        m = n;
+        o = p = q;
+        r[0] = t;
+        a[0] = u;
+        x = g;
+        e = f;
+        v = w[x];
+        b = c;
+        d = h;
+      }
+}
+
+/* { dg-final { scan-assembler-not "%sfp" } } */
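The reworked `delta < 0` branch of model_excess_group_cost in the haifa-sched.cc hunk can be modeled in isolation. The following is a simplified C sketch under stated assumptions, not the GCC implementation. `struct toy_class_state` is a hypothetical flattening of the per-class data GCC keeps in struct model_pressure_group, curr_reg_pressure[] and sched_class_regs_num[]. `toy_spill_cost` reflects an assumed behavior of model_spill_cost. The `delta > 0` path is deliberately left out.

```c
#include <assert.h>

/* Hypothetical flattened view of one pressure class.  */
struct toy_class_state
{
  int limit_point;    /* model-schedule point tied to the pressure limit */
  int orig_pressure;  /* pressure originally recorded at that limit */
  int curr_pressure;  /* current pressure of the class */
  int nregs;          /* registers available in the class */
};

/* Assumed behavior of model_spill_cost: how far TO exceeds the larger
   of FROM and the number of available registers.  */
static int
toy_spill_cost (const struct toy_class_state *s, int from, int to)
{
  if (from < s->nregs)
    from = s->nregs;
  return to > from ? to - from : 0;
}

/* Sketch of the patched "delta < 0" path.  MODEL mirrors
   --param=cycle-accurate-model.  */
static int
toy_group_cost (const struct toy_class_state *s, int point, int delta,
                int model)
{
  assert (model == 0 || model == 1);
  if (delta >= 0)
    return 0;  /* the "delta > 0" path is not modeled in this sketch */

  if (point >= s->limit_point)
    {
      /* At or past the limit point: unchanged by the patch.  */
      int pressure = s->curr_pressure + delta;
      if (pressure < s->orig_pressure)
        pressure = s->orig_pressure;
      return -toy_spill_cost (s, pressure, s->curr_pressure);
    }

  /* Before the limit point the old code fell through to 0 (neutral);
     with the param set to 0 the raw pressure reduction is reported.  */
  return model ? 0 : delta;
}
```

With a state of { 10, 5, 8, 6 }, a reduction of -2 at point 3 costs 0 under the default but -2 with the param set to 0. At point 12 it costs -2 under both settings, because that path is untouched by the patch.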