From patchwork Mon Nov 4 13:09:42 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Craig Blackmore
X-Patchwork-Id: 2006241
From: Craig Blackmore
To: gcc-patches@gcc.gnu.org
Cc: jeffreyalaw@gmail.com, Craig Blackmore
Subject: [PATCH v2 1/2] RISC-V: Make vectorized memset handle more cases
Date: Mon, 4 Nov 2024 13:09:42 +0000
Message-ID: <20241104130943.4041719-2-craig.blackmore@embecosm.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20241104130943.4041719-1-craig.blackmore@embecosm.com>
References: <20241018131300.1150819-1-craig.blackmore@embecosm.com>
 <20241104130943.4041719-1-craig.blackmore@embecosm.com>
List-Id: Gcc-patches mailing list

`expand_vec_setmem` only generated vectorized memset if it fitted into a
single vector store of at least (TARGET_MIN_VLEN / 8) bytes.  Also,
without dynamic LMUL the operation always used TARGET_MAX_LMUL even if
it would have fitted a smaller LMUL.

Allow vectorized memset to be generated for smaller lengths and smaller
LMUL by switching to using use_vector_stringop_p.
Smaller LMUL can be seen in setmem-3.c:f3.  Smaller lengths will be seen
after the second patch in this series which selectively disables by
pieces.

gcc/ChangeLog:

        * config/riscv/riscv-string.cc (use_vector_stringop_p): Add
        comment.
        (expand_vec_setmem): Use use_vector_stringop_p instead of
        check_vectorise_memory_operation.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/base/setmem-3.c: Expect smaller lmul.
---
 gcc/config/riscv/riscv-string.cc              | 37 ++++++++++---------
 .../gcc.target/riscv/rvv/base/setmem-3.c      |  6 +--
 2 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 118c02a4021..20395e19c60 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -1062,6 +1062,9 @@ struct stringop_info {
 
    MAX_EW is the maximum element width that the caller wants to use and
    LENGTH_IN is the length of the stringop in bytes.
+
+   This is currently used for cpymem and setmem.  If expand_vec_cmpmem switches
+   to using it too then check_vectorise_memory_operation can be removed.
 */
 
 static bool
@@ -1600,41 +1603,39 @@ check_vectorise_memory_operation (rtx length_in, HOST_WIDE_INT &lmul_out)
 bool
 expand_vec_setmem (rtx dst_in, rtx length_in, rtx fill_value_in)
 {
-  HOST_WIDE_INT lmul;
+  stringop_info info;
+
   /* Check we are able and allowed to vectorise this operation;
      bail if not.  */
-  if (!check_vectorise_memory_operation (length_in, lmul))
+  if (!use_vector_stringop_p (info, 1, length_in) || info.need_loop)
     return false;
 
-  machine_mode vmode
-    = riscv_vector::get_vector_mode (QImode, BYTES_PER_RISCV_VECTOR * lmul)
-        .require ();
   rtx dst_addr = copy_addr_to_reg (XEXP (dst_in, 0));
-  rtx dst = change_address (dst_in, vmode, dst_addr);
+  rtx dst = change_address (dst_in, info.vmode, dst_addr);
 
-  rtx fill_value = gen_reg_rtx (vmode);
+  rtx fill_value = gen_reg_rtx (info.vmode);
   rtx broadcast_ops[] = { fill_value, fill_value_in };
 
   /* If the length is exactly vlmax for the selected mode, do that.
      Otherwise, use a predicated store.  */
-  if (known_eq (GET_MODE_SIZE (vmode), INTVAL (length_in)))
+  if (known_eq (GET_MODE_SIZE (info.vmode), INTVAL (info.avl)))
     {
-      emit_vlmax_insn (code_for_pred_broadcast (vmode), UNARY_OP,
-                        broadcast_ops);
+      emit_vlmax_insn (code_for_pred_broadcast (info.vmode), UNARY_OP,
+                       broadcast_ops);
       emit_move_insn (dst, fill_value);
     }
   else
     {
-      if (!satisfies_constraint_K (length_in))
-          length_in = force_reg (Pmode, length_in);
-      emit_nonvlmax_insn (code_for_pred_broadcast (vmode), UNARY_OP,
-                          broadcast_ops, length_in);
+      if (!satisfies_constraint_K (info.avl))
+        info.avl = force_reg (Pmode, info.avl);
+      emit_nonvlmax_insn (code_for_pred_broadcast (info.vmode),
+                          riscv_vector::UNARY_OP, broadcast_ops, info.avl);
       machine_mode mask_mode
-        = riscv_vector::get_vector_mode (BImode, GET_MODE_NUNITS (vmode))
-          .require ();
+        = riscv_vector::get_vector_mode (BImode, GET_MODE_NUNITS (info.vmode))
+            .require ();
       rtx mask = CONSTM1_RTX (mask_mode);
-      emit_insn (gen_pred_store (vmode, dst, mask, fill_value, length_in,
-                 get_avl_type_rtx (riscv_vector::NONVLMAX)));
+      emit_insn (gen_pred_store (info.vmode, dst, mask, fill_value, info.avl,
+                                 get_avl_type_rtx (riscv_vector::NONVLMAX)));
     }
 
   return true;
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-3.c b/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-3.c
index 25be694d248..52766fece76 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-3.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-3.c
@@ -21,13 +21,13 @@ f1 (void *a, int const b)
   return __builtin_memset (a, b, MIN_VECTOR_BYTES - 1);
 }
 
-/* Vectorise+inline minimum vector register width using requested lmul.
+/* Vectorised code should use smallest lmul known to fit length.
 ** f2:
 ** (
-** vsetivli\s+zero,\d+,e8,m8,ta,ma
+** vsetivli\s+zero,\d+,e8,m1,ta,ma
 ** |
 ** li\s+a\d+,\d+
-** vsetvli\s+zero,a\d+,e8,m8,ta,ma
+** vsetvli\s+zero,a\d+,e8,m1,ta,ma
 ** )
 ** vmv\.v\.x\s+v\d+,a1
 ** vse8\.v\s+v\d+,0\(a0\)