From patchwork Wed Aug 23 13:48:27 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Robin Dapp <rdapp.gcc@gmail.com>
X-Patchwork-Id: 1824709
Return-Path: <gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@legolas.ozlabs.org
Authentication-Results: legolas.ozlabs.org;
	dkim=pass (1024-bit key;
 unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256
 header.s=default header.b=WH0Pf1A1;
	dkim-atps=neutral
Authentication-Results: legolas.ozlabs.org;
 spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org
 (client-ip=8.43.85.97; helo=server2.sourceware.org;
 envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org;
 receiver=patchwork.ozlabs.org)
Received: from server2.sourceware.org (server2.sourceware.org [8.43.85.97])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384)
	(No client certificate requested)
	by legolas.ozlabs.org (Postfix) with ESMTPS id 4RW6yB1Jxvz1ybW
	for <incoming@patchwork.ozlabs.org>; Wed, 23 Aug 2023 23:49:04 +1000 (AEST)
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 5D8BB3854160
	for <incoming@patchwork.ozlabs.org>; Wed, 23 Aug 2023 13:49:02 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 5D8BB3854160
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1692798542;
	bh=R+nzoALqy7jB69MeMzYCcbGLCVwMcC1RTjLR8qpi+Bg=;
	h=Date:Cc:Subject:To:List-Id:List-Unsubscribe:List-Archive:
	 List-Post:List-Help:List-Subscribe:From:Reply-To:From;
	b=WH0Pf1A1BYJS8GaArdrHJh8fKRHGUjMb31ZQPee4uFOrEPdCulYYb8zccoqw9bKhf
	 aCw92bUsKbHzHWAVyjQ5XSN+8YC/DVnS1AFu92zih4/GuZx1/xPtQXNZYlpwk+Z0cE
	 Zb6yubmiQtYV/Zc9j+1BnCzRM7/umxkpgvV4Egrw=
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from mail-lj1-x22c.google.com (mail-lj1-x22c.google.com
 [IPv6:2a00:1450:4864:20::22c])
 by sourceware.org (Postfix) with ESMTPS id 253D7385770F
 for <gcc-patches@gcc.gnu.org>; Wed, 23 Aug 2023 13:48:32 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 253D7385770F
Received: by mail-lj1-x22c.google.com with SMTP id
 38308e7fff4ca-2bcde83ce9fso7572091fa.1
 for <gcc-patches@gcc.gnu.org>; Wed, 23 Aug 2023 06:48:32 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20221208; t=1692798510; x=1693403310;
 h=content-transfer-encoding:to:subject:from:content-language:cc
 :user-agent:mime-version:date:message-id:x-gm-message-state:from:to
 :cc:subject:date:message-id:reply-to;
 bh=R+nzoALqy7jB69MeMzYCcbGLCVwMcC1RTjLR8qpi+Bg=;
 b=Aa1Ir3SrlcJ+2Qa1RIwqZOBHlDxXL7c6fGDYIqxX4GwdsHPCwtD6LIa5PCqc/dTyMI
 /bY+HNwSkPcOacvq2O1Xi/FDrytgTJ3N5XuZtRZ0JFruOf9SHO0Eg6OYqYhXLvYLRmrL
 vJeFYAMTGqzh5RuL9HCgbL1KM3YIHU8R+fLDWjWtdnV0KEDNImal0Ot8T7sK4BEYIOoX
 UxySk+qgFrUsi3SDaNTnbEIHcWz/eMMx9dauyBPgaKlJCShWJ3nAAeyVl+tFTla0AbCJ
 WpK9lLjc94i3KThKk1ADuyUuXsxxPKNH0/AcV6OAaRMF2F7fZQJ8Ws8eRxJTyvpnd1aF
 Wxiw==
X-Gm-Message-State: AOJu0Yw6rMMfkcOXykE2NiowbVUN5k56mmXxUv2BV7XKa7atmyXvRBfN
 fcUf7byvXkrMl08HWhtYN84gKTLWOpU=
X-Google-Smtp-Source: 
 AGHT+IG9SRNjA0PzU+XiLTcxHeme8xKC4N/HeeqAMuc4TDYdWwPh5m87iWsPur0PLlHWvx6sXv5sgA==
X-Received: by 2002:a2e:9b4c:0:b0:2b9:e053:7a07 with SMTP id
 o12-20020a2e9b4c000000b002b9e0537a07mr9191139ljj.45.1692798508829;
 Wed, 23 Aug 2023 06:48:28 -0700 (PDT)
Received: from [192.168.1.23] (ip-046-005-130-086.um12.pools.vodafone-ip.de.
 [46.5.130.86]) by smtp.gmail.com with ESMTPSA id
 h3-20020a170906260300b00992afee724bsm9706193ejc.76.2023.08.23.06.48.28
 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
 Wed, 23 Aug 2023 06:48:28 -0700 (PDT)
Message-ID: <80530cd8-b0b6-43af-48b1-6e6cccfe5d6d@gmail.com>
Date: Wed, 23 Aug 2023 15:48:27 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.13.0
Cc: rdapp.gcc@gmail.com
Content-Language: en-US
Subject: [PATCH] RISC-V: Add initial pipeline description for an out-of-order
 core.
To: gcc-patches <gcc-patches@gcc.gnu.org>, palmer <palmer@dabbelt.com>,
 Kito Cheng <kito.cheng@gmail.com>, jeffreyalaw <jeffreyalaw@gmail.com>,
 "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai>
X-Spam-Status: No, score=-9.2 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0,
 KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-Patchwork-Original-From: Robin Dapp via Gcc-patches
 <gcc-patches@gcc.gnu.org>
From: Robin Dapp <rdapp.gcc@gmail.com>
Reply-To: Robin Dapp <rdapp.gcc@gmail.com>
Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org>

Hi,

this adds a pipeline description for a generic out-of-order core.
Latency and units are not based on any real processor but are more or
less educated guesses at what such a processor could look like.
For lack of a better name, I called the -mtune parameter "generic-ooo".

To account for latency scaling when LMUL != 1, the patch implements the
sched_adjust_cost hook.  It scales an instruction's latency by its LMUL,
so an LMUL == 8 instruction takes 8 times as many cycles as the same
instruction with LMUL == 1.
As this can cause very high latencies which, in turn, might lead to
scheduling anomalies and a higher number of emitted vsetvls, the
adjustment is only enabled when -madjust-lmul-cost is specified.
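
For illustration, the LMUL scaling can be sketched in isolation roughly
as follows.  This is a simplified standalone version, not the hook
itself: the Lmul enum and scale_latency are hypothetical names standing
in for riscv_vector::vlmul_type and the tail of riscv_sched_adjust_cost.

```cpp
#include <algorithm>

// Hypothetical stand-in for riscv_vector::vlmul_type (illustrative only).
enum class Lmul { M1, M2, M4, M8, F2, F4, F8 };

// Scale a base latency (given for LMUL == 1) by the LMUL factor.
// A nonzero latency is never allowed to drop to zero.
int scale_latency(int cost, Lmul lmul)
{
  double factor = 1.0;
  switch (lmul)
    {
    case Lmul::M2: factor = 2.0; break;
    case Lmul::M4: factor = 4.0; break;
    case Lmul::M8: factor = 8.0; break;
    case Lmul::F2: factor = 0.5; break;
    case Lmul::F4: factor = 0.25; break;
    case Lmul::F8: factor = 0.125; break;
    default: break;  // LMUL == 1 keeps the base latency.
    }

  // Truncate towards zero, then clamp so a positive latency stays >= 1.
  int scaled = static_cast<int>(cost * factor);
  return std::max(cost > 0 ? 1 : 0, scaled);
}
```

With this sketch a 3-cycle operation becomes 24 cycles at LMUL == 8,
while fractional LMULs shorten the latency but never below one cycle.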

Additionally, to make pre-RA vsetvls easy to recognize, this patch
introduces an insn type vsetvl_pre, which is used in sched_adjust_cost.

As mentioned, the latency numbers are guesswork at best.  I assumed
6-wide issue as most public announcements point in that direction,
and everything else is similarly coarse.  Feel free to correct me
in case I unnecessarily pessimized or underestimated something.

Regards
 Robin

gcc/ChangeLog:

	* config/riscv/riscv-cores.def (RISCV_TUNE): Add parameter.
	* config/riscv/riscv-opts.h (enum riscv_microarchitecture_type):
	Add generic_ooo.
	* config/riscv/riscv.cc (riscv_sched_adjust_cost): Implement
	scheduler hook.
	(TARGET_SCHED_ADJUST_COST): Define.
	* config/riscv/riscv.md: Add vsetvl_pre type and generic_ooo tune.
	Include generic-ooo.md.
	* config/riscv/riscv.opt: Add -madjust-lmul-cost.
	* config/riscv/generic-ooo.md: New file.
	* config/riscv/vector.md: Add vsetvl_pre.
---
 gcc/config/riscv/generic-ooo.md  | 284 +++++++++++++++++++++++++++++++
 gcc/config/riscv/riscv-cores.def |   1 +
 gcc/config/riscv/riscv-opts.h    |   3 +-
 gcc/config/riscv/riscv.cc        |  87 ++++++++++
 gcc/config/riscv/riscv.md        |   5 +-
 gcc/config/riscv/riscv.opt       |   3 +
 gcc/config/riscv/vector.md       |   4 +-
 7 files changed, 383 insertions(+), 4 deletions(-)
 create mode 100644 gcc/config/riscv/generic-ooo.md

diff --git a/gcc/config/riscv/generic-ooo.md b/gcc/config/riscv/generic-ooo.md
new file mode 100644
index 00000000000..78b9e48f935
--- /dev/null
+++ b/gcc/config/riscv/generic-ooo.md
@@ -0,0 +1,284 @@
+;; RISC-V generic out-of-order core scheduling model.
+;; Copyright (C) 2017-2023 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;; General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+(define_automaton "generic_ooo")
+
+;; Regarding functional units we assume a three-way split:
+;; - Integer ALU (IXU) - 4 symmetric units.
+;; - Floating-point (FXU) - 2 symmetric units.
+;; - Vector Unit (VXU) - 1 unit.
+
+;; We assume 6-wide issue:
+;; - 5-wide generic/integer issue.
+;; - 1-wide vector issue.
+
+;; For now, the only subunits are for non-pipelined integer division and
+;; vector div/mult/sqrt.
+;; No extra units for e.g. vector permutes, masking, everything is assumed to
+;; be on the same pipelined execution unit.
+
+;; Latency:
+;; - Regular integer operations take 1 cycle.
+;; - Multiplication/Division take multiple cycles.
+;; - Float operations take 4-6 cycles.
+;; - Regular vector operations take 2-6 cycles.
+;;   (This assumes LMUL = 1, latency for LMUL = 2, 4, 8 is scaled accordingly
+;;    by riscv_sched_adjust_cost when -madjust-lmul-cost is given)
+;; - Load/Store:
+;;   - To/From IXU: 4 cycles.
+;;   - To/From FXU: 6 cycles.
+;;   - To/From VXU: 6 cycles.
+
+;; Integer/float issue queues.
+(define_cpu_unit "issue0,issue1,issue2,issue3,issue4" "generic_ooo")
+
+;; Separate issue queue for vector instructions.
+(define_cpu_unit "generic_ooo_vxu_issue" "generic_ooo")
+
+;; Integer/float execution units.
+(define_cpu_unit "ixu0,ixu1,ixu2,ixu3" "generic_ooo")
+(define_cpu_unit "fxu0,fxu1" "generic_ooo")
+
+;; Integer subunit for division.
+(define_cpu_unit "generic_ooo_div" "generic_ooo")
+
+;; Vector execution unit.
+(define_cpu_unit "generic_ooo_vxu_alu" "generic_ooo")
+
+;; Vector subunit that does mult/div/sqrt.
+(define_cpu_unit "generic_ooo_vxu_multicycle" "generic_ooo")
+
+;; Shortcuts
+(define_reservation "generic_ooo_issue" "issue0|issue1|issue2|issue3|issue4")
+(define_reservation "generic_ooo_ixu_alu" "ixu0|ixu1|ixu2|ixu3")
+(define_reservation "generic_ooo_fxu" "fxu0|fxu1")
+
+
+;; Integer load/store
+(define_insn_reservation "generic_ooo_int_load" 4
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "load"))
+  "generic_ooo_issue,generic_ooo_ixu_alu")
+
+(define_insn_reservation "generic_ooo_int_store" 4
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "store"))
+  "generic_ooo_issue,generic_ooo_ixu_alu")
+
+;; Float load/store
+(define_insn_reservation "generic_ooo_float_load" 6
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "fpload"))
+  "generic_ooo_issue,generic_ooo_ixu_alu")
+
+(define_insn_reservation "generic_ooo_float_store" 6
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "fpstore"))
+  "generic_ooo_issue,generic_ooo_fxu")
+
+;; Vector load/store
+(define_insn_reservation "generic_ooo_vec_load" 6
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "vlde,vldm,vlds,vldux,vldox,vldff,vldr"))
+  "generic_ooo_vxu_issue,generic_ooo_vxu_alu")
+
+(define_insn_reservation "generic_ooo_vec_store" 6
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "vste,vstm,vsts,vstux,vstox,vstr"))
+  "generic_ooo_vxu_issue,generic_ooo_vxu_alu")
+
+;; Vector segment loads/stores.
+(define_insn_reservation "generic_ooo_vec_loadstore_seg" 10
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "vlsegde,vlsegds,vlsegdux,vlsegdox,vlsegdff,\
+			vssegte,vssegts,vssegtux,vssegtox"))
+  "generic_ooo_vxu_issue,generic_ooo_vxu_alu")
+
+
+;; Generic integer instructions.
+(define_insn_reservation "generic_ooo_alu" 1
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "unknown,const,arith,shift,slt,multi,auipc,nop,logical,\
+			move,bitmanip,min,max,minu,maxu,clz,ctz"))
+  "generic_ooo_issue,generic_ooo_ixu_alu")
+
+
+;; Float move, convert and compare.
+(define_insn_reservation "generic_ooo_float_move" 3
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "fmove"))
+  "generic_ooo_issue,generic_ooo_fxu")
+
+(define_insn_reservation "generic_ooo_fcvt" 3
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "fcvt"))
+  "generic_ooo_issue,generic_ooo_fxu")
+
+(define_insn_reservation "generic_ooo_fcmp" 2
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "fcmp"))
+  "generic_ooo_issue,generic_ooo_fxu")
+
+;; Integer multiplication.
+(define_insn_reservation "generic_ooo_imul" 4
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "imul"))
+  "generic_ooo_issue,generic_ooo_ixu_alu,generic_ooo_ixu_alu")
+
+;; Assume integer division is not pipelined.  Do not block the unit for more
+;; than three cycles so the DFA does not get too large.  Similar for other
+;; non-pipelined instructions.
+(define_insn_reservation "generic_ooo_idiv" 16
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "idiv"))
+  "generic_ooo_issue,generic_ooo_ixu_alu,generic_ooo_div,generic_ooo_div*3")
+
+;; Float addition and multiplication.
+(define_insn_reservation "generic_ooo_faddmul" 4
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "fadd,fmul"))
+  "generic_ooo_issue,generic_ooo_fxu")
+
+;; Float FMA.
+(define_insn_reservation "generic_ooo_float_fma" 6
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "fmadd"))
+  "generic_ooo_issue,generic_ooo_fxu")
+
+;; Assume float division and sqrt are not pipelined.
+(define_insn_reservation "generic_ooo_float_div_single" 12
+  (and (eq_attr "tune" "generic_ooo")
+       (and (eq_attr "type" "fdiv,fsqrt")
+	    (eq_attr "mode" "SF")))
+  "generic_ooo_issue,generic_ooo_fxu,generic_ooo_div,generic_ooo_div*3")
+
+(define_insn_reservation "generic_ooo_float_div_double" 16
+  (and (eq_attr "tune" "generic_ooo")
+       (and (eq_attr "type" "fdiv,fsqrt")
+	    (eq_attr "mode" "DF")))
+  "generic_ooo_issue,generic_ooo_fxu,generic_ooo_div,generic_ooo_div*3")
+
+;; Popcount and clmul.
+(define_insn_reservation "generic_ooo_popcount" 2
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "cpop,clmul"))
+  "generic_ooo_issue,generic_ooo_ixu_alu")
+
+;; Regular vector operations and integer comparisons.
+(define_insn_reservation "generic_ooo_vec_alu" 3
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "vialu,viwalu,vext,vicalu,vshift,vnshift,viminmax,vicmp,\
+		        vimov,vsalu,vaalu,vsshift,vnclip,vmov,vfmov"))
+  "generic_ooo_vxu_issue,generic_ooo_vxu_alu")
+
+;; Vector float comparison, conversion etc.
+(define_insn_reservation "generic_ooo_vec_fcmp" 3
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "vfrecp,vfminmax,vfcmp,vfsgnj,vfclass,vfcvtitof,\
+			vfcvtftoi,vfwcvtitof,vfwcvtftoi,vfwcvtftof,vfncvtitof,\
+			vfncvtftoi,vfncvtftof"))
+  "generic_ooo_vxu_issue,generic_ooo_vxu_alu")
+
+;; Vector integer multiplication.
+(define_insn_reservation "generic_ooo_vec_imul" 4
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "vimul,viwmul,vimuladd,viwmuladd,vsmul"))
+  "generic_ooo_vxu_issue,generic_ooo_vxu_alu")
+
+;; Vector float addition.
+(define_insn_reservation "generic_ooo_vec_fadd" 4
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "vfalu,vfwalu"))
+  "generic_ooo_vxu_issue,generic_ooo_vxu_alu")
+
+;; Vector float multiplication and FMA.
+(define_insn_reservation "generic_ooo_vec_fmul" 6
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "vfmul,vfwmul,vfmuladd,vfwmuladd"))
+  "generic_ooo_vxu_issue,generic_ooo_vxu_alu")
+
+;; Vector crypto, assumed to be a generic operation for now.
+(define_insn_reservation "generic_ooo_crypto" 4
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "crypto"))
+  "generic_ooo_vxu_issue,generic_ooo_vxu_alu")
+
+;; Vector permute.
+(define_insn_reservation "generic_ooo_perm" 3
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "vimerge,vfmerge,vslideup,vslidedown,vislide1up,\
+			vislide1down,vfslide1up,vfslide1down,vgather,vcompress"))
+  "generic_ooo_vxu_issue,generic_ooo_vxu_alu")
+
+;; Vector reduction.
+(define_insn_reservation "generic_ooo_vec_reduction" 8
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "vired,viwred,vfredu,vfwredu"))
+  "generic_ooo_vxu_issue,generic_ooo_vxu_multicycle")
+
+;; Vector ordered reduction, assume the latency number is for
+;; a 128-bit vector.  It is scaled in riscv_sched_adjust_cost
+;; for larger vectors.
+(define_insn_reservation "generic_ooo_vec_ordered_reduction" 10
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "vfredo,vfwredo"))
+  "generic_ooo_vxu_issue,generic_ooo_vxu_multicycle*3")
+
+;; Vector integer division, assume not pipelined.
+(define_insn_reservation "generic_ooo_vec_idiv" 16
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "vidiv"))
+  "generic_ooo_vxu_issue,generic_ooo_vxu_multicycle*3")
+
+;; Vector float divisions and sqrt, assume not pipelined.
+(define_insn_reservation "generic_ooo_vec_float_divsqrt" 16
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "vfdiv,vfsqrt"))
+  "generic_ooo_vxu_issue,generic_ooo_vxu_multicycle*3")
+
+;; Vector mask operations.
+(define_insn_reservation "generic_ooo_vec_mask" 2
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "vmalu,vmpop,vmffs,vmsfs,vmiota,vmidx,vimovvx,vimovxv,\
+			vfmovvf,vfmovfv"))
+  "generic_ooo_vxu_issue,generic_ooo_vxu_alu")
+
+;; Vector vsetvl.
+(define_insn_reservation "generic_ooo_vec_vsetvl" 1
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "vsetvl,vsetvl_pre"))
+  "generic_ooo_vxu_issue")
+
+;; Vector rounding mode setters, assume pipeline barrier.
+(define_insn_reservation "generic_ooo_vec_setrm" 20
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "wrvxrm,wrfrm"))
+  "generic_ooo_vxu_issue,generic_ooo_vxu_issue*3")
+
+;; Vector read vlen/vlenb.
+(define_insn_reservation "generic_ooo_vec_readlen" 4
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "rdvlenb,rdvl"))
+  "generic_ooo_vxu_issue,generic_ooo_vxu_issue")
+
+;; Transfer from/to coprocessor.  Assume not pipelined.
+(define_insn_reservation "generic_ooo_xfer" 4
+  (and (eq_attr "tune" "generic_ooo")
+       (eq_attr "type" "mfc,mtc"))
+  "generic_ooo_issue,generic_ooo_ixu_alu,generic_ooo_ixu_alu*3")
diff --git a/gcc/config/riscv/riscv-cores.def b/gcc/config/riscv/riscv-cores.def
index 7d87ab7ce28..91deabb6079 100644
--- a/gcc/config/riscv/riscv-cores.def
+++ b/gcc/config/riscv/riscv-cores.def
@@ -38,6 +38,7 @@ RISCV_TUNE("sifive-3-series", generic, rocket_tune_info)
 RISCV_TUNE("sifive-5-series", generic, rocket_tune_info)
 RISCV_TUNE("sifive-7-series", sifive_7, sifive_7_tune_info)
 RISCV_TUNE("thead-c906", generic, thead_c906_tune_info)
+RISCV_TUNE("generic-ooo", generic_ooo, generic_ooo_tune_info)
 RISCV_TUNE("size", generic, optimize_size_tune_info)
 
 #undef RISCV_TUNE
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 378a17699cd..2f22fcaf9bd 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -52,7 +52,8 @@ extern enum riscv_isa_spec_class riscv_isa_spec;
 /* Keep this list in sync with define_attr "tune" in riscv.md.  */
 enum riscv_microarchitecture_type {
   generic,
-  sifive_7
+  sifive_7,
+  generic_ooo
 };
 extern enum riscv_microarchitecture_type riscv_microarchitecture;
 
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 480f3124496..0d7bc2ae9b6 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -349,6 +349,21 @@ static const struct riscv_tune_param thead_c906_tune_info = {
   false		/* use_divmod_expansion */
 };
 
+/* Costs to use when optimizing for a generic ooo profile.  */
+static const struct riscv_tune_param generic_ooo_tune_info = {
+  {COSTS_N_INSNS (2), COSTS_N_INSNS (2)},	/* fp_add */
+  {COSTS_N_INSNS (5), COSTS_N_INSNS (6)},	/* fp_mul */
+  {COSTS_N_INSNS (7), COSTS_N_INSNS (8)},	/* fp_div */
+  {COSTS_N_INSNS (2), COSTS_N_INSNS (2)},	/* int_mul */
+  {COSTS_N_INSNS (6), COSTS_N_INSNS (6)},	/* int_div */
+  1,						/* issue_rate */
+  3,						/* branch_cost */
+  4,						/* memory_cost */
+  4,						/* fmv_cost */
+  false,					/* slow_unaligned_access */
+  false,					/* use_divmod_expansion */
+};
+
 /* Costs to use when optimizing for size.  */
 static const struct riscv_tune_param optimize_size_tune_info = {
   {COSTS_N_INSNS (1), COSTS_N_INSNS (1)},	/* fp_add */
@@ -6744,6 +6759,75 @@ riscv_sched_variable_issue (FILE *, int, rtx_insn *insn, int more)
   return more - 1;
 }
 
+/* Adjust the cost/latency of instructions for scheduling.
+   For now this is just used to change the latency of vector instructions
+   according to their LMUL.  We assume that an insn with LMUL == 8 requires
+   eight times more execution cycles than the same insn with LMUL == 1.
+   As this may cause very high latencies which lead to scheduling artifacts
+   we currently only perform the adjustment when -madjust-lmul-cost is given.
+   */
+static int
+riscv_sched_adjust_cost (rtx_insn *, int, rtx_insn *insn, int cost,
+			 unsigned int)
+{
+  /* Only do adjustments for the generic out-of-order scheduling model.  */
+  if (!TARGET_VECTOR || riscv_microarchitecture != generic_ooo)
+    return cost;
+
+  if (recog_memoized (insn) < 0)
+    return cost;
+
+  enum attr_type type = get_attr_type (insn);
+
+  if (type == TYPE_VFREDO || type == TYPE_VFWREDO)
+    {
+      /* TODO: For ordered reductions scale the base cost relative to the
+	 number of units.  */
+      ;
+    }
+
+  /* Don't do any LMUL-based latency adjustment unless explicitly asked to.  */
+  if (!TARGET_ADJUST_LMUL_COST)
+    return cost;
+
+  /* vsetvl has a vlmul attribute but its latency does not depend on it.  */
+  if (type == TYPE_VSETVL || type == TYPE_VSETVL_PRE)
+    return cost;
+
+  enum riscv_vector::vlmul_type lmul =
+    (riscv_vector::vlmul_type)get_attr_vlmul (insn);
+
+  double factor = 1;
+  switch (lmul)
+    {
+    case riscv_vector::LMUL_2:
+      factor = 2;
+      break;
+    case riscv_vector::LMUL_4:
+      factor = 4;
+      break;
+    case riscv_vector::LMUL_8:
+      factor = 8;
+      break;
+    case riscv_vector::LMUL_F2:
+      factor = 0.5;
+      break;
+    case riscv_vector::LMUL_F4:
+      factor = 0.25;
+      break;
+    case riscv_vector::LMUL_F8:
+      factor = 0.125;
+      break;
+    default:
+      factor = 1;
+    }
+
+  /* If the latency was nonzero, keep it that way.  */
+  int new_cost = MAX (cost > 0 ? 1 : 0, cost * factor);
+
+  return new_cost;
+}
+
 /* Auxiliary function to emit RISC-V ELF attribute. */
 static void
 riscv_emit_attribute ()
@@ -8469,6 +8553,9 @@ riscv_frame_pointer_required (void)
 #undef  TARGET_SCHED_VARIABLE_ISSUE
 #define TARGET_SCHED_VARIABLE_ISSUE riscv_sched_variable_issue
 
+#undef  TARGET_SCHED_ADJUST_COST
+#define TARGET_SCHED_ADJUST_COST riscv_sched_adjust_cost
+
 #undef TARGET_FUNCTION_OK_FOR_SIBCALL
 #define TARGET_FUNCTION_OK_FOR_SIBCALL riscv_function_ok_for_sibcall
 
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index b456fa6abb3..2a9bad0d8ee 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -406,7 +406,7 @@ (define_attr "type"
    mtc,mfc,const,arith,logical,shift,slt,imul,idiv,move,fmove,fadd,fmul,
    fmadd,fdiv,fcmp,fcvt,fsqrt,multi,auipc,sfb_alu,nop,ghost,bitmanip,rotate,
    clmul,min,max,minu,maxu,clz,ctz,cpop,
-   atomic,condmove,crypto,rdvlenb,rdvl,wrvxrm,wrfrm,rdfrm,vsetvl,
+   atomic,condmove,crypto,rdvlenb,rdvl,wrvxrm,wrfrm,rdfrm,vsetvl,vsetvl_pre,
    vlde,vste,vldm,vstm,vlds,vsts,
    vldux,vldox,vstux,vstox,vldff,vldr,vstr,
    vlsegde,vssegte,vlsegds,vssegts,vlsegdux,vlsegdox,vssegtux,vssegtox,vlsegdff,
@@ -534,7 +534,7 @@ (define_attr "cannot_copy" "no,yes" (const_string "no"))
 ;; Microarchitectures we know how to tune for.
 ;; Keep this in sync with enum riscv_microarchitecture.
 (define_attr "tune"
-  "generic,sifive_7"
+  "generic,sifive_7,generic_ooo"
   (const (symbol_ref "((enum attr_tune) riscv_microarchitecture)")))
 
 ;; Describe a user's asm statement.
@@ -3328,5 +3328,6 @@ (define_expand "msubhisi4"
 (include "generic.md")
 (include "sifive-7.md")
 (include "thead.md")
+(include "generic-ooo.md")
 (include "vector.md")
 (include "zicond.md")
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index a962ea8f9d4..724a5429e5d 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -311,3 +311,6 @@ Enum(riscv_autovec_lmul) String(m8) Value(RVV_M8)
 -param=riscv-autovec-lmul=
 Target RejectNegative Joined Enum(riscv_autovec_lmul) Var(riscv_autovec_lmul) Init(RVV_M1)
 -param=riscv-autovec-lmul=<string>	Set the RVV LMUL of auto-vectorization in the RISC-V port.
+
+madjust-lmul-cost
+Target Var(TARGET_ADJUST_LMUL_COST) Init(0)
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 6ceae25dbed..9c61d0ead8a 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -935,7 +935,9 @@ (define_insn "@vlmax_avl<mode>"
   [(set (match_operand:P 0 "register_operand" "=r")
 	(unspec:P [(match_operand:P 1 "const_int_operand" "i")] UNSPEC_VLMAX))]
   "TARGET_VECTOR"
-  "")
+  ""
+  [(set_attr "type" "vsetvl_pre")]
+  )
 
 ;; Set VXRM
 (define_insn "vxrmsi"