From patchwork Wed Aug 23 13:48:27 2023
X-Patchwork-Submitter: Robin Dapp
X-Patchwork-Id: 1824709
Message-ID: <80530cd8-b0b6-43af-48b1-6e6cccfe5d6d@gmail.com>
Date: Wed, 23 Aug 2023 15:48:27 +0200
Cc: rdapp.gcc@gmail.com
Subject: [PATCH] RISC-V: Add initial pipeline description for an out-of-order core.
To: gcc-patches , palmer , Kito Cheng , jeffreyalaw , "juzhe.zhong@rivai.ai"
List-Id: Gcc-patches mailing list
From: Robin Dapp
Reply-To: Robin Dapp

Hi,

this adds a pipeline description for a generic out-of-order core.
Latencies and units are not based on any real processor; they are more
or less educated guesses at what such a processor could look like.
For lack of a better name, I called the -mtune parameter "generic-ooo".

To account for latency scaling with LMUL != 1, sched_adjust_cost is
implemented. It scales an instruction's latency by its LMUL, so an
LMUL == 8 instruction takes 8 times as many cycles as the same
instruction with LMUL == 1. As this can cause very high latencies
which, in turn, might lead to scheduling anomalies and a higher number
of emitted vsetvls, the feature is only enabled when -madjust-lmul-cost
is specified.

Additionally, in order to easily recognize pre-RA vsetvls, this patch
introduces an insn type vsetvl_pre, which is used in sched_adjust_cost.

As mentioned, the latency numbers are guesswork at best. I assumed
6-wide issue, as most public announcements point in that direction, and
everything else is similarly coarse. Feel free to correct me in case I
unnecessarily pessimized or underestimated something.
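For illustration only, the LMUL scaling described above can be sketched
as a standalone function. This is not the code from the patch: the Lmul
enum and the names lmul_factor and scale_latency are hypothetical
stand-ins for riscv_vector::vlmul_type and the body of
riscv_sched_adjust_cost, and the tune and -madjust-lmul-cost checks are
left out.

```cpp
#include <algorithm>

// Hypothetical stand-in for riscv_vector::vlmul_type (not the GCC enum).
enum class Lmul { M1, M2, M4, M8, F2, F4, F8 };

// Factor by which an LMUL == 1 latency is scaled.
double lmul_factor (Lmul lmul)
{
  switch (lmul)
    {
    case Lmul::M2: return 2;
    case Lmul::M4: return 4;
    case Lmul::M8: return 8;
    case Lmul::F2: return 0.5;
    case Lmul::F4: return 0.25;
    case Lmul::F8: return 0.125;
    default: return 1;
    }
}

// Scale COST by the LMUL factor.  The result is truncated to an integer,
// but a latency that was nonzero never drops to zero.
int scale_latency (int cost, Lmul lmul)
{
  int scaled = static_cast<int> (cost * lmul_factor (lmul));
  return std::max (cost > 0 ? 1 : 0, scaled);
}
```

With this sketch a 3-cycle instruction at LMUL == 8 comes out at 24
cycles, the same instruction at LMUL == 1/8 is clamped to 1 cycle, and
a zero latency stays zero.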
Regards
 Robin

gcc/ChangeLog:

	* config/riscv/riscv-cores.def (RISCV_TUNE): Add parameter.
	* config/riscv/riscv-opts.h (enum riscv_microarchitecture_type):
	Add generic_ooo.
	* config/riscv/riscv.cc (riscv_sched_adjust_cost): Implement
	scheduler hook.
	(TARGET_SCHED_ADJUST_COST): Define.
	* config/riscv/riscv.md (no,yes"): Include generic-ooo.md
	* config/riscv/riscv.opt: Add -madjust-lmul-cost.
	* config/riscv/generic-ooo.md: New file.
	* config/riscv/vector.md: Add vsetvl_pre.
---
 gcc/config/riscv/generic-ooo.md  | 284 +++++++++++++++++++++++++++++++
 gcc/config/riscv/riscv-cores.def |   1 +
 gcc/config/riscv/riscv-opts.h    |   3 +-
 gcc/config/riscv/riscv.cc        |  87 ++++++++++
 gcc/config/riscv/riscv.md        |   5 +-
 gcc/config/riscv/riscv.opt       |   3 +
 gcc/config/riscv/vector.md       |   4 +-
 7 files changed, 383 insertions(+), 4 deletions(-)
 create mode 100644 gcc/config/riscv/generic-ooo.md

diff --git a/gcc/config/riscv/generic-ooo.md b/gcc/config/riscv/generic-ooo.md
new file mode 100644
index 00000000000..78b9e48f935
--- /dev/null
+++ b/gcc/config/riscv/generic-ooo.md
@@ -0,0 +1,284 @@
+;; RISC-V generic out-of-order core scheduling model.
+;; Copyright (C) 2017-2023 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+;; General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3. If not see
+;; .
+
+(define_automaton "generic_ooo")
+
+;; Regarding functional units we assume a three-way split:
+;; - Integer ALU (IXU) - 4 symmetric units.
+;; - Floating-point (FXU) - 2 symmetric units.
+;; - Vector Unit (VXU) - 1 unit.
+
+;; We assume 6-wide issue:
+;; - 5-wide generic/integer issue.
+;; - 1-wide vector issue.
+
+;; For now, the only subunits are for non-pipelined integer division and
+;; vector div/mult/sqrt.
+;; No extra units for e.g. vector permutes, masking, everything is assumed to
+;; be on the same pipelined execution unit.
+
+;; Latency:
+;; - Regular integer operations take 1 cycle.
+;; - Multiplication/Division take multiple cycles.
+;; - Float operations take 4-6 cycles.
+;; - Regular vector operations take 2-6 cycles.
+;;   (This assumes LMUL = 1, latency for LMUL = 2, 4, 8 is scaled accordingly
+;;    by riscv_sched_adjust_cost when -madjust-lmul-cost is given)
+;; - Load/Store:
+;;   - To/From IXU: 4 cycles.
+;;   - To/From FXU: 6 cycles.
+;;   - To/From VXU: 6 cycles.
+
+;; Integer/float issue queues.
+(define_cpu_unit "issue0,issue1,issue2,issue3,issue4" "generic_ooo")
+
+;; Separate issue queue for vector instructions.
+(define_cpu_unit "generic_ooo_vxu_issue" "generic_ooo")
+
+;; Integer/float execution units.
+(define_cpu_unit "ixu0,ixu1,ixu2,ixu3" "generic_ooo")
+(define_cpu_unit "fxu0,fxu1" "generic_ooo")
+
+;; Integer subunit for division.
+(define_cpu_unit "generic_ooo_div" "generic_ooo")
+
+;; Vector execution unit.
+(define_cpu_unit "generic_ooo_vxu_alu" "generic_ooo")
+
+;; Vector subunit that does mult/div/sqrt.
+(define_cpu_unit "generic_ooo_vxu_multicycle" "generic_ooo")
+
+;; Shortcuts
+(define_reservation "generic_ooo_issue" "issue0|issue1|issue2|issue3|issue4")
+(define_reservation "generic_ooo_ixu_alu" "ixu0|ixu1|ixu2|ixu3")
+(define_reservation "generic_ooo_fxu" "fxu0|fxu1")
+
+
+;; Integer load/store
+(define_insn_reservation "generic_ooo_int_load" 4
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "load"))
+  "generic_ooo_issue,generic_ooo_ixu_alu")
+
+(define_insn_reservation "generic_ooo_int_store" 4
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "store"))
+  "generic_ooo_issue,generic_ooo_ixu_alu")
+
+;; Float load/store
+(define_insn_reservation "generic_ooo_float_load" 6
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "fpload"))
+  "generic_ooo_issue,generic_ooo_ixu_alu")
+
+(define_insn_reservation "generic_ooo_float_store" 6
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "fpstore"))
+  "generic_ooo_issue,generic_ooo_fxu")
+
+;; Vector load/store
+(define_insn_reservation "generic_ooo_vec_load" 6
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "vlde,vldm,vlds,vldux,vldox,vldff,vldr"))
+  "generic_ooo_vxu_issue,generic_ooo_vxu_alu")
+
+(define_insn_reservation "generic_ooo_vec_store" 6
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "vste,vstm,vsts,vstux,vstox,vstr"))
+  "generic_ooo_vxu_issue,generic_ooo_vxu_alu")
+
+;; Vector segment loads/stores.
+(define_insn_reservation "generic_ooo_vec_loadstore_seg" 10
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "vlsegde,vlsegds,vlsegdux,vlsegdox,vlsegdff,\
+                        vssegte,vssegts,vssegtux,vssegtox"))
+  "generic_ooo_vxu_issue,generic_ooo_vxu_alu")
+
+
+;; Generic integer instructions.
+(define_insn_reservation "generic_ooo_alu" 1
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "unknown,const,arith,shift,slt,multi,auipc,nop,logical,\
+                        move,bitmanip,min,max,minu,maxu,clz,ctz"))
+  "generic_ooo_issue,generic_ooo_ixu_alu")
+
+
+;; Float move, convert and compare.
+(define_insn_reservation "generic_ooo_float_move" 3
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "fmove"))
+  "generic_ooo_issue,generic_ooo_fxu")
+
+(define_insn_reservation "generic_ooo_fcvt" 3
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "fcvt"))
+  "generic_ooo_issue,generic_ooo_fxu")
+
+(define_insn_reservation "generic_ooo_fcmp" 2
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "fcmp"))
+  "generic_ooo_issue,generic_ooo_fxu")
+
+;; Integer multiplication.
+(define_insn_reservation "generic_ooo_imul" 4
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "imul"))
+  "generic_ooo_issue,generic_ooo_ixu_alu,generic_ooo_ixu_alu")
+
+;; Assume integer division is not pipelined. Do not block the unit for more
+;; than three cycles so the DFA does not get too large. Similar for other
+;; non-pipelined instructions.
+(define_insn_reservation "generic_ooo_idiv" 16
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "idiv"))
+  "generic_ooo_issue,generic_ooo_ixu_alu,generic_ooo_div,generic_ooo_div*3")
+
+;; Float addition and multiplication.
+(define_insn_reservation "generic_ooo_faddmul" 4
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "fadd,fmul"))
+  "generic_ooo_issue,generic_ooo_fxu")
+
+;; Float FMA.
+(define_insn_reservation "generic_ooo_float_fma" 6
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "fmadd"))
+  "generic_ooo_issue,generic_ooo_fxu")
+
+;; Assume float division and sqrt are not pipelined.
+(define_insn_reservation "generic_ooo_float_div_single" 12
+  (and (eq_attr "tune" "generic_ooo")
+       (and (eq_attr "type" "fdiv,fsqrt")
+            (eq_attr "mode" "SF")))
+  "generic_ooo_issue,generic_ooo_fxu,generic_ooo_div,generic_ooo_div*3")
+
+(define_insn_reservation "generic_ooo_float_div_double" 16
+  (and (eq_attr "tune" "generic_ooo")
+       (and (eq_attr "type" "fdiv,fsqrt")
+            (eq_attr "mode" "DF")))
+  "generic_ooo_issue,generic_ooo_fxu,generic_ooo_div,generic_ooo_div*3")
+
+;; Popcount and clmul.
+(define_insn_reservation "generic_ooo_popcount" 2
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "cpop,clmul"))
+  "generic_ooo_issue,generic_ooo_ixu_alu")
+
+;; Regular vector operations and integer comparisons.
+(define_insn_reservation "generic_ooo_vec_alu" 3
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "vialu,viwalu,vext,vicalu,vshift,vnshift,viminmax,vicmp,\
+                        vimov,vsalu,vaalu,vsshift,vnclip,vmov,vfmov"))
+  "generic_ooo_vxu_issue,generic_ooo_vxu_alu")
+
+;; Vector float comparison, conversion etc.
+(define_insn_reservation "generic_ooo_vec_fcmp" 3
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "vfrecp,vfminmax,vfcmp,vfsgnj,vfclass,vfcvtitof,\
+                        vfcvtftoi,vfwcvtitof,vfwcvtftoi,vfwcvtftof,vfncvtitof,\
+                        vfncvtftoi,vfncvtftof"))
+  "generic_ooo_vxu_issue,generic_ooo_vxu_alu")
+
+;; Vector integer multiplication.
+(define_insn_reservation "generic_ooo_vec_imul" 4
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "vimul,viwmul,vimuladd,viwmuladd,vsmul"))
+  "generic_ooo_vxu_issue,generic_ooo_vxu_alu")
+
+;; Vector float addition.
+(define_insn_reservation "generic_ooo_vec_fadd" 4
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "vfalu,vfwalu"))
+  "generic_ooo_vxu_issue,generic_ooo_vxu_alu")
+
+;; Vector float multiplication and FMA.
+(define_insn_reservation "generic_ooo_vec_fmul" 6
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "vfmul,vfwmul,vfmuladd,vfwmuladd"))
+  "generic_ooo_vxu_issue,generic_ooo_vxu_alu")
+
+;; Vector crypto, assumed to be a generic operation for now.
+(define_insn_reservation "generic_ooo_crypto" 4
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "crypto"))
+  "generic_ooo_vxu_issue,generic_ooo_vxu_alu")
+
+;; Vector permute.
+(define_insn_reservation "generic_ooo_perm" 3
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "vimerge,vfmerge,vslideup,vslidedown,vislide1up,\
+                        vislide1down,vfslide1up,vfslide1down,vgather,vcompress"))
+  "generic_ooo_vxu_issue,generic_ooo_vxu_alu")
+
+;; Vector reduction.
+(define_insn_reservation "generic_ooo_vec_reduction" 8
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "vired,viwred,vfredu,vfwredu"))
+  "generic_ooo_vxu_issue,generic_ooo_vxu_multicycle")
+
+;; Vector ordered reduction, assume the latency number is for
+;; a 128-bit vector. It is scaled in riscv_sched_adjust_cost
+;; for larger vectors.
+(define_insn_reservation "generic_ooo_vec_ordered_reduction" 10
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "vfredo,vfwredo"))
+  "generic_ooo_vxu_issue,generic_ooo_vxu_multicycle*3")
+
+;; Vector integer division, assume not pipelined.
+(define_insn_reservation "generic_ooo_vec_idiv" 16
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "vidiv"))
+  "generic_ooo_vxu_issue,generic_ooo_vxu_multicycle*3")
+
+;; Vector float divisions and sqrt, assume not pipelined.
+(define_insn_reservation "generic_ooo_vec_float_divsqrt" 16
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "vfdiv,vfsqrt"))
+  "generic_ooo_vxu_issue,generic_ooo_vxu_multicycle*3")
+
+;; Vector mask operations.
+(define_insn_reservation "generic_ooo_vec_mask" 2
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "vmalu,vmpop,vmffs,vmsfs,vmiota,vmidx,vimovvx,vimovxv,\
+                        vfmovvf,vfmovfv"))
+  "generic_ooo_vxu_issue,generic_ooo_vxu_alu")
+
+;; Vector vsetvl.
+(define_insn_reservation "generic_ooo_vec_vesetvl" 1
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "vsetvl,vsetvl_pre"))
+  "generic_ooo_vxu_issue")
+
+;; Vector rounding mode setters, assume pipeline barrier.
+(define_insn_reservation "generic_ooo_vec_setrm" 20
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "wrvxrm,wrfrm"))
+  "generic_ooo_vxu_issue,generic_ooo_vxu_issue*3")
+
+;; Vector read vlen/vlenb.
+(define_insn_reservation "generic_ooo_vec_readlen" 4
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "rdvlenb,rdvl"))
+  "generic_ooo_vxu_issue,generic_ooo_vxu_issue")
+
+;; Transfer from/to coprocessor. Assume not pipelined.
+(define_insn_reservation "generic_ooo_xfer" 4
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "mfc,mtc"))
+  "generic_ooo_issue,generic_ooo_ixu_alu,generic_ooo_ixu_alu*3")
diff --git a/gcc/config/riscv/riscv-cores.def b/gcc/config/riscv/riscv-cores.def
index 7d87ab7ce28..91deabb6079 100644
--- a/gcc/config/riscv/riscv-cores.def
+++ b/gcc/config/riscv/riscv-cores.def
@@ -38,6 +38,7 @@ RISCV_TUNE("sifive-3-series", generic, rocket_tune_info)
 RISCV_TUNE("sifive-5-series", generic, rocket_tune_info)
 RISCV_TUNE("sifive-7-series", sifive_7, sifive_7_tune_info)
 RISCV_TUNE("thead-c906", generic, thead_c906_tune_info)
+RISCV_TUNE("generic-ooo", generic_ooo, generic_ooo_tune_info)
 RISCV_TUNE("size", generic, optimize_size_tune_info)
 
 #undef RISCV_TUNE
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 378a17699cd..2f22fcaf9bd 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -52,7 +52,8 @@ extern enum riscv_isa_spec_class riscv_isa_spec;
 /* Keep this list in sync with define_attr "tune" in riscv.md. */
 enum riscv_microarchitecture_type {
   generic,
-  sifive_7
+  sifive_7,
+  generic_ooo
 };
 extern enum riscv_microarchitecture_type riscv_microarchitecture;
 
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 480f3124496..0d7bc2ae9b6 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -349,6 +349,21 @@ static const struct riscv_tune_param thead_c906_tune_info = {
   false /* use_divmod_expansion */
 };
 
+/* Costs to use when optimizing for a generic ooo profile. */
+static const struct riscv_tune_param generic_ooo_tune_info = {
+  {COSTS_N_INSNS (2), COSTS_N_INSNS (2)}, /* fp_add */
+  {COSTS_N_INSNS (5), COSTS_N_INSNS (6)}, /* fp_mul */
+  {COSTS_N_INSNS (7), COSTS_N_INSNS (8)}, /* fp_div */
+  {COSTS_N_INSNS (2), COSTS_N_INSNS (2)}, /* int_mul */
+  {COSTS_N_INSNS (6), COSTS_N_INSNS (6)}, /* int_div */
+  1, /* issue_rate */
+  3, /* branch_cost */
+  4, /* memory_cost */
+  4, /* fmv_cost */
+  false, /* slow_unaligned_access */
+  false, /* use_divmod_expansion */
+};
+
 /* Costs to use when optimizing for size. */
 static const struct riscv_tune_param optimize_size_tune_info = {
   {COSTS_N_INSNS (1), COSTS_N_INSNS (1)}, /* fp_add */
@@ -6744,6 +6759,75 @@ riscv_sched_variable_issue (FILE *, int, rtx_insn *insn, int more)
   return more - 1;
 }
 
+/* Adjust the cost/latency of instructions for scheduling.
+   For now this is just used to change the latency of vector instructions
+   according to their LMUL. We assume that an insn with LMUL == 8 requires
+   eight times more execution cycles than the same insn with LMUL == 1.
+   As this may cause very high latencies which lead to scheduling artifacts
+   we currently only perform the adjustment when -madjust-lmul-cost is given.
+   */
+static int
+riscv_sched_adjust_cost (rtx_insn *, int, rtx_insn *insn, int cost,
+                         unsigned int)
+{
+  /* Only do adjustments for the generic out-of-order scheduling model. */
+  if (!TARGET_VECTOR || riscv_microarchitecture != generic_ooo)
+    return cost;
+
+  if (recog_memoized (insn) < 0)
+    return cost;
+
+  enum attr_type type = get_attr_type (insn);
+
+  if (type == TYPE_VFREDO || type == TYPE_VFWREDO)
+    {
+      /* TODO: For ordered reductions scale the base cost relative to the
+         number of units. */
+      ;
+    }
+
+  /* Don't do any LMUL-based latency adjustment unless explicitly asked to. */
+  if (!TARGET_ADJUST_LMUL_COST)
+    return cost;
+
+  /* vsetvl has a vlmul attribute but its latency does not depend on it. */
+  if (type == TYPE_VSETVL || type == TYPE_VSETVL_PRE)
+    return cost;
+
+  enum riscv_vector::vlmul_type lmul =
+    (riscv_vector::vlmul_type)get_attr_vlmul (insn);
+
+  double factor = 1;
+  switch (lmul)
+    {
+    case riscv_vector::LMUL_2:
+      factor = 2;
+      break;
+    case riscv_vector::LMUL_4:
+      factor = 4;
+      break;
+    case riscv_vector::LMUL_8:
+      factor = 8;
+      break;
+    case riscv_vector::LMUL_F2:
+      factor = 0.5;
+      break;
+    case riscv_vector::LMUL_F4:
+      factor = 0.25;
+      break;
+    case riscv_vector::LMUL_F8:
+      factor = 0.125;
+      break;
+    default:
+      factor = 1;
+    }
+
+  /* If the latency was nonzero, keep it that way. */
+  int new_cost = MAX (cost > 0 ? 1 : 0, cost * factor);
+
+  return new_cost;
+}
+
 /* Auxiliary function to emit RISC-V ELF attribute. */
 static void
 riscv_emit_attribute ()
@@ -8469,6 +8553,9 @@ riscv_frame_pointer_required (void)
 #undef TARGET_SCHED_VARIABLE_ISSUE
 #define TARGET_SCHED_VARIABLE_ISSUE riscv_sched_variable_issue
 
+#undef TARGET_SCHED_ADJUST_COST
+#define TARGET_SCHED_ADJUST_COST riscv_sched_adjust_cost
+
 #undef TARGET_FUNCTION_OK_FOR_SIBCALL
 #define TARGET_FUNCTION_OK_FOR_SIBCALL riscv_function_ok_for_sibcall
 
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index b456fa6abb3..2a9bad0d8ee 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -406,7 +406,7 @@ (define_attr "type"
    mtc,mfc,const,arith,logical,shift,slt,imul,idiv,move,fmove,fadd,fmul,
    fmadd,fdiv,fcmp,fcvt,fsqrt,multi,auipc,sfb_alu,nop,ghost,bitmanip,rotate,
    clmul,min,max,minu,maxu,clz,ctz,cpop,
-   atomic,condmove,crypto,rdvlenb,rdvl,wrvxrm,wrfrm,rdfrm,vsetvl,
+   atomic,condmove,crypto,rdvlenb,rdvl,wrvxrm,wrfrm,rdfrm,vsetvl,vsetvl_pre,
    vlde,vste,vldm,vstm,vlds,vsts,
    vldux,vldox,vstux,vstox,vldff,vldr,vstr,
    vlsegde,vssegte,vlsegds,vssegts,vlsegdux,vlsegdox,vssegtux,vssegtox,vlsegdff,
@@ -534,7 +534,7 @@ (define_attr "cannot_copy" "no,yes" (const_string "no"))
 ;; Microarchitectures we know how to tune for.
 ;; Keep this in sync with enum riscv_microarchitecture.
 (define_attr "tune"
-  "generic,sifive_7"
+  "generic,sifive_7,generic_ooo"
   (const (symbol_ref "((enum attr_tune) riscv_microarchitecture)")))
 
 ;; Describe a user's asm statement.
@@ -3328,5 +3328,6 @@ (define_expand "msubhisi4"
 (include "generic.md")
 (include "sifive-7.md")
 (include "thead.md")
+(include "generic-ooo.md")
 (include "vector.md")
 (include "zicond.md")
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index a962ea8f9d4..724a5429e5d 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -311,3 +311,6 @@ Enum(riscv_autovec_lmul) String(m8) Value(RVV_M8)
 -param=riscv-autovec-lmul=
 Target RejectNegative Joined Enum(riscv_autovec_lmul) Var(riscv_autovec_lmul) Init(RVV_M1)
 -param=riscv-autovec-lmul= Set the RVV LMUL of auto-vectorization in the RISC-V port.
+
+madjust-lmul-cost
+Target Var(TARGET_ADJUST_LMUL_COST) Init(0)
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 6ceae25dbed..9c61d0ead8a 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -935,7 +935,9 @@ (define_insn "@vlmax_avl"
   [(set (match_operand:P 0 "register_operand" "=r")
         (unspec:P [(match_operand:P 1 "const_int_operand" "i")] UNSPEC_VLMAX))]
   "TARGET_VECTOR"
-  "")
+  ""
+  [(set_attr "type" "vsetvl_pre")]
+  )
 
 ;; Set VXRM
 (define_insn "vxrmsi"