From patchwork Thu Nov 7 09:18:08 2024
X-Patchwork-Submitter: Christophe Lyon
X-Patchwork-Id: 2007914
From: Christophe Lyon
To: gcc-patches@gcc.gnu.org, richard.earnshaw@arm.com, ramanara@nvidia.com, andre.simoesdiasvieira@arm.com
Cc: Christophe Lyon
Subject: [PATCH 03/15] arm: [MVE intrinsics] rework vstr?q_scatter_offset
Date: Thu, 7 Nov 2024 09:18:08 +0000
Message-Id: <20241107091820.2010568-4-christophe.lyon@linaro.org>
In-Reply-To: <20241107091820.2010568-1-christophe.lyon@linaro.org>
References: <20241107091820.2010568-1-christophe.lyon@linaro.org>

This patch implements vstr?q_scatter_offset using the new MVE builtins
framework.  It uses a similar approach to a previous patch which
grouped truncating and non-truncating stores in two sets of patterns,
rather than having groups of patterns depending on the destination
size.
We need to add the 'integer_64' set of type suffixes in order to
support vstrdq_scatter_offset.

The patch introduces the MVE_VLD_ST_scatter iterator, similar to
MVE_VLD_ST but which also includes V2DI (again, for
vstrdq_scatter_offset).  The new MVE_scatter_offset mode attribute is
used to map the destination type to the offset type (both are usually
equal, except when the destination is floating-point).

We end up with four sets of patterns:
- vector scatter stores with offset (non-truncating)
- predicated vector scatter stores with offset (non-truncating)
- truncating vector scatter stores with offset
- predicated truncating vector scatter stores with offset

gcc/ChangeLog:

	* config/arm/arm-mve-builtins-base.cc (class vstrq_scatter_impl):
	New.
	(vstrbq_scatter, vstrhq_scatter, vstrwq_scatter, vstrdq_scatter):
	New.
	* config/arm/arm-mve-builtins-base.def (vstrbq_scatter)
	(vstrhq_scatter, vstrwq_scatter, vstrdq_scatter): New.
	* config/arm/arm-mve-builtins-base.h (vstrbq_scatter)
	(vstrhq_scatter, vstrwq_scatter, vstrdq_scatter): New.
	* config/arm/arm-mve-builtins.cc (integer_64): New.
	* config/arm/arm_mve.h (vstrbq_scatter_offset): Delete.
	(vstrbq_scatter_offset_p): Delete.
	(vstrhq_scatter_offset): Delete.
	(vstrhq_scatter_offset_p): Delete.
	(vstrdq_scatter_offset_p): Delete.
	(vstrdq_scatter_offset): Delete.
	(vstrwq_scatter_offset_p): Delete.
	(vstrwq_scatter_offset): Delete.
	(vstrbq_scatter_offset_s8): Delete.
	(vstrbq_scatter_offset_u8): Delete.
	(vstrbq_scatter_offset_u16): Delete.
	(vstrbq_scatter_offset_s16): Delete.
	(vstrbq_scatter_offset_u32): Delete.
	(vstrbq_scatter_offset_s32): Delete.
	(vstrbq_scatter_offset_p_s8): Delete.
	(vstrbq_scatter_offset_p_s32): Delete.
	(vstrbq_scatter_offset_p_s16): Delete.
	(vstrbq_scatter_offset_p_u8): Delete.
	(vstrbq_scatter_offset_p_u32): Delete.
	(vstrbq_scatter_offset_p_u16): Delete.
	(vstrhq_scatter_offset_s32): Delete.
	(vstrhq_scatter_offset_s16): Delete.
	(vstrhq_scatter_offset_u32): Delete.
	(vstrhq_scatter_offset_u16): Delete.
	(vstrhq_scatter_offset_p_s32): Delete.
	(vstrhq_scatter_offset_p_s16): Delete.
	(vstrhq_scatter_offset_p_u32): Delete.
	(vstrhq_scatter_offset_p_u16): Delete.
	(vstrdq_scatter_offset_p_s64): Delete.
	(vstrdq_scatter_offset_p_u64): Delete.
	(vstrdq_scatter_offset_s64): Delete.
	(vstrdq_scatter_offset_u64): Delete.
	(vstrhq_scatter_offset_f16): Delete.
	(vstrhq_scatter_offset_p_f16): Delete.
	(vstrwq_scatter_offset_f32): Delete.
	(vstrwq_scatter_offset_p_f32): Delete.
	(vstrwq_scatter_offset_p_s32): Delete.
	(vstrwq_scatter_offset_p_u32): Delete.
	(vstrwq_scatter_offset_s32): Delete.
	(vstrwq_scatter_offset_u32): Delete.
	(__arm_vstrbq_scatter_offset_s8): Delete.
	(__arm_vstrbq_scatter_offset_s32): Delete.
	(__arm_vstrbq_scatter_offset_s16): Delete.
	(__arm_vstrbq_scatter_offset_u8): Delete.
	(__arm_vstrbq_scatter_offset_u32): Delete.
	(__arm_vstrbq_scatter_offset_u16): Delete.
	(__arm_vstrbq_scatter_offset_p_s8): Delete.
	(__arm_vstrbq_scatter_offset_p_s32): Delete.
	(__arm_vstrbq_scatter_offset_p_s16): Delete.
	(__arm_vstrbq_scatter_offset_p_u8): Delete.
	(__arm_vstrbq_scatter_offset_p_u32): Delete.
	(__arm_vstrbq_scatter_offset_p_u16): Delete.
	(__arm_vstrhq_scatter_offset_s32): Delete.
	(__arm_vstrhq_scatter_offset_s16): Delete.
	(__arm_vstrhq_scatter_offset_u32): Delete.
	(__arm_vstrhq_scatter_offset_u16): Delete.
	(__arm_vstrhq_scatter_offset_p_s32): Delete.
	(__arm_vstrhq_scatter_offset_p_s16): Delete.
	(__arm_vstrhq_scatter_offset_p_u32): Delete.
	(__arm_vstrhq_scatter_offset_p_u16): Delete.
	(__arm_vstrdq_scatter_offset_p_s64): Delete.
	(__arm_vstrdq_scatter_offset_p_u64): Delete.
	(__arm_vstrdq_scatter_offset_s64): Delete.
	(__arm_vstrdq_scatter_offset_u64): Delete.
	(__arm_vstrwq_scatter_offset_p_s32): Delete.
	(__arm_vstrwq_scatter_offset_p_u32): Delete.
	(__arm_vstrwq_scatter_offset_s32): Delete.
	(__arm_vstrwq_scatter_offset_u32): Delete.
	(__arm_vstrhq_scatter_offset_f16): Delete.
	(__arm_vstrhq_scatter_offset_p_f16): Delete.
	(__arm_vstrwq_scatter_offset_f32): Delete.
	(__arm_vstrwq_scatter_offset_p_f32): Delete.
	(__arm_vstrbq_scatter_offset): Delete.
	(__arm_vstrbq_scatter_offset_p): Delete.
	(__arm_vstrhq_scatter_offset): Delete.
	(__arm_vstrhq_scatter_offset_p): Delete.
	(__arm_vstrdq_scatter_offset_p): Delete.
	(__arm_vstrdq_scatter_offset): Delete.
	(__arm_vstrwq_scatter_offset_p): Delete.
	(__arm_vstrwq_scatter_offset): Delete.
	* config/arm/arm_mve_builtins.def (vstrbq_scatter_offset_s)
	(vstrbq_scatter_offset_u, vstrbq_scatter_offset_p_s)
	(vstrbq_scatter_offset_p_u, vstrhq_scatter_offset_p_u)
	(vstrhq_scatter_offset_u, vstrhq_scatter_offset_p_s)
	(vstrhq_scatter_offset_s, vstrdq_scatter_offset_s)
	(vstrhq_scatter_offset_f, vstrwq_scatter_offset_f)
	(vstrwq_scatter_offset_s, vstrdq_scatter_offset_p_s)
	(vstrhq_scatter_offset_p_f, vstrwq_scatter_offset_p_f)
	(vstrwq_scatter_offset_p_s, vstrdq_scatter_offset_u)
	(vstrwq_scatter_offset_u, vstrdq_scatter_offset_p_u)
	(vstrwq_scatter_offset_p_u): Delete.
	* config/arm/iterators.md (MVE_VLD_ST_scatter): New.
	(MVE_scatter_offset): New.
	(MVE_elem_ch): Add entry for V2DI.
	(supf): Remove VSTRBQSO_S, VSTRBQSO_U, VSTRHQSO_S, VSTRHQSO_U,
	VSTRDQSO_S, VSTRDQSO_U, VSTRWQSO_U, VSTRWQSO_S.
	(VSTRBSOQ, VSTRHSOQ, VSTRDSOQ, VSTRWSOQ): Delete.
	* config/arm/mve.md (mve_vstrbq_scatter_offset_): Delete.
	(mve_vstrbq_scatter_offset__insn): Delete.
	(mve_vstrbq_scatter_offset_p_): Delete.
	(mve_vstrbq_scatter_offset_p__insn): Delete.
	(mve_vstrhq_scatter_offset_p_): Delete.
	(mve_vstrhq_scatter_offset_p__insn): Delete.
	(mve_vstrhq_scatter_offset_): Delete.
	(mve_vstrhq_scatter_offset__insn): Delete.
	(mve_vstrdq_scatter_offset_p_v2di): Delete.
	(mve_vstrdq_scatter_offset_p_v2di_insn): Delete.
	(mve_vstrdq_scatter_offset_v2di): Delete.
	(mve_vstrdq_scatter_offset_v2di_insn): Delete.
	(mve_vstrhq_scatter_offset_fv8hf): Delete.
	(mve_vstrhq_scatter_offset_fv8hf_insn): Delete.
	(mve_vstrhq_scatter_offset_p_fv8hf): Delete.
	(mve_vstrhq_scatter_offset_p_fv8hf_insn): Delete.
	(mve_vstrwq_scatter_offset_fv4sf): Delete.
	(mve_vstrwq_scatter_offset_fv4sf_insn): Delete.
	(mve_vstrwq_scatter_offset_p_fv4sf): Delete.
	(mve_vstrwq_scatter_offset_p_fv4sf_insn): Delete.
	(mve_vstrwq_scatter_offset_p_v4si): Delete.
	(mve_vstrwq_scatter_offset_p_v4si_insn): Delete.
	(mve_vstrwq_scatter_offset_v4si): Delete.
	(mve_vstrwq_scatter_offset_v4si_insn): Delete.
	(@mve_vstrq_scatter_offset_): New.
	(@mve_vstrq_scatter_offset_p_): New.
	(@mve_vstrq_truncate_scatter_offset_): New.
	(@mve_vstrq_truncate_scatter_offset_p_): New.
	* config/arm/unspecs.md (VSTRBQSO_S, VSTRBQSO_U, VSTRHQSO_S)
	(VSTRDQSO_S, VSTRDQSO_U, VSTRWQSO_S, VSTRWQSO_U, VSTRHQSO_F)
	(VSTRWQSO_F, VSTRHQSO_U): Delete.
	(VSTRQSO, VSTRQSO_P, VSTRQSO_TRUNC, VSTRQSO_TRUNC_P): New.
---
 gcc/config/arm/arm-mve-builtins-base.cc  |  38 ++
 gcc/config/arm/arm-mve-builtins-base.def |   6 +
 gcc/config/arm/arm-mve-builtins-base.h   |   4 +
 gcc/config/arm/arm-mve-builtins.cc       |   5 +
 gcc/config/arm/arm_mve.h                 | 615 -----------------------
 gcc/config/arm/arm_mve_builtins.def      |  20 -
 gcc/config/arm/iterators.md              |  19 +-
 gcc/config/arm/mve.md                    | 439 +++-
 gcc/config/arm/unspecs.md                |  14 +-
 9 files changed, 143 insertions(+), 1017 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc
index 2c8ff461c53..855115c009f 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -239,6 +239,40 @@ public:
   }
 };
 
+/* Builds the vstrq_scatter*offset intrinsics.  */
+class vstrq_scatter_impl : public store_truncating
+{
+public:
+  using store_truncating::store_truncating;
+
+  rtx expand (function_expander &e) const override
+  {
+    insn_code icode;
+    machine_mode memory_mode = e.memory_vector_mode ();
+
+    switch (e.pred)
+      {
+      case PRED_none:
+	icode = (e.vector_mode (0) == memory_mode
+		 /* Non-truncating store case.  */
+		 ? code_for_mve_vstrq_scatter_offset (memory_mode)
+		 /* Truncating store case.
*/ + : code_for_mve_vstrq_truncate_scatter_offset (memory_mode)); + break; + + case PRED_p: + icode = (e.vector_mode (0) == memory_mode + ? code_for_mve_vstrq_scatter_offset_p (memory_mode) + : code_for_mve_vstrq_truncate_scatter_offset_p (memory_mode)); + break; + + default: + gcc_unreachable (); + } + return e.use_exact_insn (icode); + } +}; + /* Builds the vldrq* intrinsics. */ class vldrq_impl : public load_extending { @@ -1207,8 +1241,12 @@ FUNCTION_ONLY_N_NO_F (vsliq, VSLIQ) FUNCTION_ONLY_N_NO_F (vsriq, VSRIQ) FUNCTION (vst1q, vst1_impl,) FUNCTION (vstrbq, vstrq_impl, (QImode, opt_scalar_mode ())) +FUNCTION (vstrbq_scatter, vstrq_scatter_impl, (QImode, opt_scalar_mode ())) +FUNCTION (vstrdq_scatter, vstrq_scatter_impl, (DImode, opt_scalar_mode ())) FUNCTION (vstrhq, vstrq_impl, (HImode, HFmode)) +FUNCTION (vstrhq_scatter, vstrq_scatter_impl, (HImode, HFmode)) FUNCTION (vstrwq, vstrq_impl, (SImode, SFmode)) +FUNCTION (vstrwq_scatter, vstrq_scatter_impl, (SImode, SFmode)) FUNCTION_WITH_RTX_M_N (vsubq, MINUS, VSUBQ) FUNCTION (vuninitializedq, vuninitializedq_impl,) diff --git a/gcc/config/arm/arm-mve-builtins-base.def b/gcc/config/arm/arm-mve-builtins-base.def index 6166f1b38f4..30b576f01ed 100644 --- a/gcc/config/arm/arm-mve-builtins-base.def +++ b/gcc/config/arm/arm-mve-builtins-base.def @@ -171,8 +171,12 @@ DEF_MVE_FUNCTION (vsliq, ternary_lshift, all_integer, m_or_none) DEF_MVE_FUNCTION (vsriq, ternary_rshift, all_integer, m_or_none) DEF_MVE_FUNCTION (vst1q, store, all_integer, p_or_none) DEF_MVE_FUNCTION (vstrbq, store, all_integer, p_or_none) +DEF_MVE_FUNCTION (vstrbq_scatter, store_scatter_offset, all_integer, p_or_none) DEF_MVE_FUNCTION (vstrhq, store, integer_16_32, p_or_none) +DEF_MVE_FUNCTION (vstrhq_scatter, store_scatter_offset, integer_16_32, p_or_none) DEF_MVE_FUNCTION (vstrwq, store, integer_32, p_or_none) +DEF_MVE_FUNCTION (vstrwq_scatter, store_scatter_offset, integer_32, p_or_none) +DEF_MVE_FUNCTION (vstrdq_scatter, store_scatter_offset, 
integer_64, p_or_none) DEF_MVE_FUNCTION (vsubq, binary_opt_n, all_integer, mx_or_none) DEF_MVE_FUNCTION (vuninitializedq, inherent, all_integer_with_64, none) #undef REQUIRES_FLOAT @@ -242,7 +246,9 @@ DEF_MVE_FUNCTION (vrndq, unary, all_float, mx_or_none) DEF_MVE_FUNCTION (vrndxq, unary, all_float, mx_or_none) DEF_MVE_FUNCTION (vst1q, store, all_float, p_or_none) DEF_MVE_FUNCTION (vstrhq, store, float_16, p_or_none) +DEF_MVE_FUNCTION (vstrhq_scatter, store_scatter_offset, float_16, p_or_none) DEF_MVE_FUNCTION (vstrwq, store, float_32, p_or_none) +DEF_MVE_FUNCTION (vstrwq_scatter, store_scatter_offset, float_32, p_or_none) DEF_MVE_FUNCTION (vsubq, binary_opt_n, all_float, mx_or_none) DEF_MVE_FUNCTION (vuninitializedq, inherent, all_float, none) #undef REQUIRES_FLOAT diff --git a/gcc/config/arm/arm-mve-builtins-base.h b/gcc/config/arm/arm-mve-builtins-base.h index 7c866d81c44..6ff3b149bc6 100644 --- a/gcc/config/arm/arm-mve-builtins-base.h +++ b/gcc/config/arm/arm-mve-builtins-base.h @@ -206,8 +206,12 @@ extern const function_base *const vsliq; extern const function_base *const vsriq; extern const function_base *const vst1q; extern const function_base *const vstrbq; +extern const function_base *const vstrbq_scatter; +extern const function_base *const vstrdq_scatter; extern const function_base *const vstrhq; +extern const function_base *const vstrhq_scatter; extern const function_base *const vstrwq; +extern const function_base *const vstrwq_scatter; extern const function_base *const vsubq; extern const function_base *const vuninitializedq; diff --git a/gcc/config/arm/arm-mve-builtins.cc b/gcc/config/arm/arm-mve-builtins.cc index 4af32b5faa2..7b88ca6cce5 100644 --- a/gcc/config/arm/arm-mve-builtins.cc +++ b/gcc/config/arm/arm-mve-builtins.cc @@ -208,6 +208,10 @@ CONSTEXPR const type_suffix_info type_suffixes[NUM_TYPE_SUFFIXES + 1] = { #define TYPES_signed_32(S, D) \ S (s32) +/* s64 _u64. 
*/ +#define TYPES_integer_64(S, D) \ + S (s64), S (u64) + /* All the type combinations allowed by vcvtq. */ #define TYPES_cvt(S, D) \ D (f16, s16), \ @@ -315,6 +319,7 @@ DEF_MVE_TYPES_ARRAY (integer_8); DEF_MVE_TYPES_ARRAY (integer_8_16); DEF_MVE_TYPES_ARRAY (integer_16_32); DEF_MVE_TYPES_ARRAY (integer_32); +DEF_MVE_TYPES_ARRAY (integer_64); DEF_MVE_TYPES_ARRAY (poly_8_16); DEF_MVE_TYPES_ARRAY (signed_16_32); DEF_MVE_TYPES_ARRAY (signed_32); diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h index 8ffdbc7e109..fa76deec245 100644 --- a/gcc/config/arm/arm_mve.h +++ b/gcc/config/arm/arm_mve.h @@ -42,10 +42,8 @@ #ifndef __ARM_MVE_PRESERVE_USER_NAMESPACE #define vst4q(__addr, __value) __arm_vst4q(__addr, __value) -#define vstrbq_scatter_offset(__base, __offset, __value) __arm_vstrbq_scatter_offset(__base, __offset, __value) #define vstrwq_scatter_base(__addr, __offset, __value) __arm_vstrwq_scatter_base(__addr, __offset, __value) #define vldrbq_gather_offset(__base, __offset) __arm_vldrbq_gather_offset(__base, __offset) -#define vstrbq_scatter_offset_p(__base, __offset, __value, __p) __arm_vstrbq_scatter_offset_p(__base, __offset, __value, __p) #define vstrwq_scatter_base_p(__addr, __offset, __value, __p) __arm_vstrwq_scatter_base_p(__addr, __offset, __value, __p) #define vldrbq_gather_offset_z(__base, __offset, __p) __arm_vldrbq_gather_offset_z(__base, __offset, __p) #define vldrhq_gather_offset(__base, __offset) __arm_vldrhq_gather_offset(__base, __offset) @@ -60,18 +58,12 @@ #define vldrwq_gather_offset_z(__base, __offset, __p) __arm_vldrwq_gather_offset_z(__base, __offset, __p) #define vldrwq_gather_shifted_offset(__base, __offset) __arm_vldrwq_gather_shifted_offset(__base, __offset) #define vldrwq_gather_shifted_offset_z(__base, __offset, __p) __arm_vldrwq_gather_shifted_offset_z(__base, __offset, __p) -#define vstrhq_scatter_offset(__base, __offset, __value) __arm_vstrhq_scatter_offset(__base, __offset, __value) -#define 
vstrhq_scatter_offset_p(__base, __offset, __value, __p) __arm_vstrhq_scatter_offset_p(__base, __offset, __value, __p) #define vstrhq_scatter_shifted_offset(__base, __offset, __value) __arm_vstrhq_scatter_shifted_offset(__base, __offset, __value) #define vstrhq_scatter_shifted_offset_p(__base, __offset, __value, __p) __arm_vstrhq_scatter_shifted_offset_p(__base, __offset, __value, __p) #define vstrdq_scatter_base_p(__addr, __offset, __value, __p) __arm_vstrdq_scatter_base_p(__addr, __offset, __value, __p) #define vstrdq_scatter_base(__addr, __offset, __value) __arm_vstrdq_scatter_base(__addr, __offset, __value) -#define vstrdq_scatter_offset_p(__base, __offset, __value, __p) __arm_vstrdq_scatter_offset_p(__base, __offset, __value, __p) -#define vstrdq_scatter_offset(__base, __offset, __value) __arm_vstrdq_scatter_offset(__base, __offset, __value) #define vstrdq_scatter_shifted_offset_p(__base, __offset, __value, __p) __arm_vstrdq_scatter_shifted_offset_p(__base, __offset, __value, __p) #define vstrdq_scatter_shifted_offset(__base, __offset, __value) __arm_vstrdq_scatter_shifted_offset(__base, __offset, __value) -#define vstrwq_scatter_offset_p(__base, __offset, __value, __p) __arm_vstrwq_scatter_offset_p(__base, __offset, __value, __p) -#define vstrwq_scatter_offset(__base, __offset, __value) __arm_vstrwq_scatter_offset(__base, __offset, __value) #define vstrwq_scatter_shifted_offset_p(__base, __offset, __value, __p) __arm_vstrwq_scatter_shifted_offset_p(__base, __offset, __value, __p) #define vstrwq_scatter_shifted_offset(__base, __offset, __value) __arm_vstrwq_scatter_shifted_offset(__base, __offset, __value) #define vuninitializedq(__v) __arm_vuninitializedq(__v) @@ -95,12 +87,6 @@ #define vst4q_f16( __addr, __value) __arm_vst4q_f16( __addr, __value) #define vst4q_f32( __addr, __value) __arm_vst4q_f32( __addr, __value) #define vpnot(__a) __arm_vpnot(__a) -#define vstrbq_scatter_offset_s8( __base, __offset, __value) __arm_vstrbq_scatter_offset_s8( __base, 
__offset, __value) -#define vstrbq_scatter_offset_u8( __base, __offset, __value) __arm_vstrbq_scatter_offset_u8( __base, __offset, __value) -#define vstrbq_scatter_offset_u16( __base, __offset, __value) __arm_vstrbq_scatter_offset_u16( __base, __offset, __value) -#define vstrbq_scatter_offset_s16( __base, __offset, __value) __arm_vstrbq_scatter_offset_s16( __base, __offset, __value) -#define vstrbq_scatter_offset_u32( __base, __offset, __value) __arm_vstrbq_scatter_offset_u32( __base, __offset, __value) -#define vstrbq_scatter_offset_s32( __base, __offset, __value) __arm_vstrbq_scatter_offset_s32( __base, __offset, __value) #define vstrwq_scatter_base_s32(__addr, __offset, __value) __arm_vstrwq_scatter_base_s32(__addr, __offset, __value) #define vstrwq_scatter_base_u32(__addr, __offset, __value) __arm_vstrwq_scatter_base_u32(__addr, __offset, __value) #define vldrbq_gather_offset_u8(__base, __offset) __arm_vldrbq_gather_offset_u8(__base, __offset) @@ -111,12 +97,6 @@ #define vldrbq_gather_offset_s32(__base, __offset) __arm_vldrbq_gather_offset_s32(__base, __offset) #define vldrwq_gather_base_s32(__addr, __offset) __arm_vldrwq_gather_base_s32(__addr, __offset) #define vldrwq_gather_base_u32(__addr, __offset) __arm_vldrwq_gather_base_u32(__addr, __offset) -#define vstrbq_scatter_offset_p_s8( __base, __offset, __value, __p) __arm_vstrbq_scatter_offset_p_s8( __base, __offset, __value, __p) -#define vstrbq_scatter_offset_p_s32( __base, __offset, __value, __p) __arm_vstrbq_scatter_offset_p_s32( __base, __offset, __value, __p) -#define vstrbq_scatter_offset_p_s16( __base, __offset, __value, __p) __arm_vstrbq_scatter_offset_p_s16( __base, __offset, __value, __p) -#define vstrbq_scatter_offset_p_u8( __base, __offset, __value, __p) __arm_vstrbq_scatter_offset_p_u8( __base, __offset, __value, __p) -#define vstrbq_scatter_offset_p_u32( __base, __offset, __value, __p) __arm_vstrbq_scatter_offset_p_u32( __base, __offset, __value, __p) -#define vstrbq_scatter_offset_p_u16( 
__base, __offset, __value, __p) __arm_vstrbq_scatter_offset_p_u16( __base, __offset, __value, __p) #define vstrwq_scatter_base_p_s32(__addr, __offset, __value, __p) __arm_vstrwq_scatter_base_p_s32(__addr, __offset, __value, __p) #define vstrwq_scatter_base_p_u32(__addr, __offset, __value, __p) __arm_vstrwq_scatter_base_p_u32(__addr, __offset, __value, __p) #define vldrbq_gather_offset_z_s16(__base, __offset, __p) __arm_vldrbq_gather_offset_z_s16(__base, __offset, __p) @@ -173,14 +153,6 @@ #define vldrwq_gather_shifted_offset_z_f32(__base, __offset, __p) __arm_vldrwq_gather_shifted_offset_z_f32(__base, __offset, __p) #define vldrwq_gather_shifted_offset_z_s32(__base, __offset, __p) __arm_vldrwq_gather_shifted_offset_z_s32(__base, __offset, __p) #define vldrwq_gather_shifted_offset_z_u32(__base, __offset, __p) __arm_vldrwq_gather_shifted_offset_z_u32(__base, __offset, __p) -#define vstrhq_scatter_offset_s32( __base, __offset, __value) __arm_vstrhq_scatter_offset_s32( __base, __offset, __value) -#define vstrhq_scatter_offset_s16( __base, __offset, __value) __arm_vstrhq_scatter_offset_s16( __base, __offset, __value) -#define vstrhq_scatter_offset_u32( __base, __offset, __value) __arm_vstrhq_scatter_offset_u32( __base, __offset, __value) -#define vstrhq_scatter_offset_u16( __base, __offset, __value) __arm_vstrhq_scatter_offset_u16( __base, __offset, __value) -#define vstrhq_scatter_offset_p_s32( __base, __offset, __value, __p) __arm_vstrhq_scatter_offset_p_s32( __base, __offset, __value, __p) -#define vstrhq_scatter_offset_p_s16( __base, __offset, __value, __p) __arm_vstrhq_scatter_offset_p_s16( __base, __offset, __value, __p) -#define vstrhq_scatter_offset_p_u32( __base, __offset, __value, __p) __arm_vstrhq_scatter_offset_p_u32( __base, __offset, __value, __p) -#define vstrhq_scatter_offset_p_u16( __base, __offset, __value, __p) __arm_vstrhq_scatter_offset_p_u16( __base, __offset, __value, __p) #define vstrhq_scatter_shifted_offset_s32( __base, __offset, __value) 
__arm_vstrhq_scatter_shifted_offset_s32( __base, __offset, __value) #define vstrhq_scatter_shifted_offset_s16( __base, __offset, __value) __arm_vstrhq_scatter_shifted_offset_s16( __base, __offset, __value) #define vstrhq_scatter_shifted_offset_u32( __base, __offset, __value) __arm_vstrhq_scatter_shifted_offset_u32( __base, __offset, __value) @@ -193,26 +165,14 @@ #define vstrdq_scatter_base_p_u64(__addr, __offset, __value, __p) __arm_vstrdq_scatter_base_p_u64(__addr, __offset, __value, __p) #define vstrdq_scatter_base_s64(__addr, __offset, __value) __arm_vstrdq_scatter_base_s64(__addr, __offset, __value) #define vstrdq_scatter_base_u64(__addr, __offset, __value) __arm_vstrdq_scatter_base_u64(__addr, __offset, __value) -#define vstrdq_scatter_offset_p_s64(__base, __offset, __value, __p) __arm_vstrdq_scatter_offset_p_s64(__base, __offset, __value, __p) -#define vstrdq_scatter_offset_p_u64(__base, __offset, __value, __p) __arm_vstrdq_scatter_offset_p_u64(__base, __offset, __value, __p) -#define vstrdq_scatter_offset_s64(__base, __offset, __value) __arm_vstrdq_scatter_offset_s64(__base, __offset, __value) -#define vstrdq_scatter_offset_u64(__base, __offset, __value) __arm_vstrdq_scatter_offset_u64(__base, __offset, __value) #define vstrdq_scatter_shifted_offset_p_s64(__base, __offset, __value, __p) __arm_vstrdq_scatter_shifted_offset_p_s64(__base, __offset, __value, __p) #define vstrdq_scatter_shifted_offset_p_u64(__base, __offset, __value, __p) __arm_vstrdq_scatter_shifted_offset_p_u64(__base, __offset, __value, __p) #define vstrdq_scatter_shifted_offset_s64(__base, __offset, __value) __arm_vstrdq_scatter_shifted_offset_s64(__base, __offset, __value) #define vstrdq_scatter_shifted_offset_u64(__base, __offset, __value) __arm_vstrdq_scatter_shifted_offset_u64(__base, __offset, __value) -#define vstrhq_scatter_offset_f16(__base, __offset, __value) __arm_vstrhq_scatter_offset_f16(__base, __offset, __value) -#define vstrhq_scatter_offset_p_f16(__base, __offset, __value, 
__p) __arm_vstrhq_scatter_offset_p_f16(__base, __offset, __value, __p) #define vstrhq_scatter_shifted_offset_f16(__base, __offset, __value) __arm_vstrhq_scatter_shifted_offset_f16(__base, __offset, __value) #define vstrhq_scatter_shifted_offset_p_f16(__base, __offset, __value, __p) __arm_vstrhq_scatter_shifted_offset_p_f16(__base, __offset, __value, __p) #define vstrwq_scatter_base_f32(__addr, __offset, __value) __arm_vstrwq_scatter_base_f32(__addr, __offset, __value) #define vstrwq_scatter_base_p_f32(__addr, __offset, __value, __p) __arm_vstrwq_scatter_base_p_f32(__addr, __offset, __value, __p) -#define vstrwq_scatter_offset_f32(__base, __offset, __value) __arm_vstrwq_scatter_offset_f32(__base, __offset, __value) -#define vstrwq_scatter_offset_p_f32(__base, __offset, __value, __p) __arm_vstrwq_scatter_offset_p_f32(__base, __offset, __value, __p) -#define vstrwq_scatter_offset_p_s32(__base, __offset, __value, __p) __arm_vstrwq_scatter_offset_p_s32(__base, __offset, __value, __p) -#define vstrwq_scatter_offset_p_u32(__base, __offset, __value, __p) __arm_vstrwq_scatter_offset_p_u32(__base, __offset, __value, __p) -#define vstrwq_scatter_offset_s32(__base, __offset, __value) __arm_vstrwq_scatter_offset_s32(__base, __offset, __value) -#define vstrwq_scatter_offset_u32(__base, __offset, __value) __arm_vstrwq_scatter_offset_u32(__base, __offset, __value) #define vstrwq_scatter_shifted_offset_f32(__base, __offset, __value) __arm_vstrwq_scatter_shifted_offset_f32(__base, __offset, __value) #define vstrwq_scatter_shifted_offset_p_f32(__base, __offset, __value, __p) __arm_vstrwq_scatter_shifted_offset_p_f32(__base, __offset, __value, __p) #define vstrwq_scatter_shifted_offset_p_s32(__base, __offset, __value, __p) __arm_vstrwq_scatter_shifted_offset_p_s32(__base, __offset, __value, __p) @@ -384,48 +344,6 @@ __arm_vpnot (mve_pred16_t __a) return __builtin_mve_vpnotv16bi (__a); } -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, 
__artificial__)) -__arm_vstrbq_scatter_offset_s8 (int8_t * __base, uint8x16_t __offset, int8x16_t __value) -{ - __builtin_mve_vstrbq_scatter_offset_sv16qi ((__builtin_neon_qi *) __base, __offset, __value); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrbq_scatter_offset_s32 (int8_t * __base, uint32x4_t __offset, int32x4_t __value) -{ - __builtin_mve_vstrbq_scatter_offset_sv4si ((__builtin_neon_qi *) __base, __offset, __value); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrbq_scatter_offset_s16 (int8_t * __base, uint16x8_t __offset, int16x8_t __value) -{ - __builtin_mve_vstrbq_scatter_offset_sv8hi ((__builtin_neon_qi *) __base, __offset, __value); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrbq_scatter_offset_u8 (uint8_t * __base, uint8x16_t __offset, uint8x16_t __value) -{ - __builtin_mve_vstrbq_scatter_offset_uv16qi ((__builtin_neon_qi *) __base, __offset, __value); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrbq_scatter_offset_u32 (uint8_t * __base, uint32x4_t __offset, uint32x4_t __value) -{ - __builtin_mve_vstrbq_scatter_offset_uv4si ((__builtin_neon_qi *) __base, __offset, __value); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vstrbq_scatter_offset_u16 (uint8_t * __base, uint16x8_t __offset, uint16x8_t __value) -{ - __builtin_mve_vstrbq_scatter_offset_uv8hi ((__builtin_neon_qi *) __base, __offset, __value); -} - __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vstrwq_scatter_base_s32 (uint32x4_t __addr, const int __offset, int32x4_t __value) @@ -496,48 +414,6 @@ __arm_vldrwq_gather_base_u32 (uint32x4_t __addr, const int __offset) 
   return __builtin_mve_vldrwq_gather_base_uv4si (__addr, __offset);
 }
 
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrbq_scatter_offset_p_s8 (int8_t * __base, uint8x16_t __offset, int8x16_t __value, mve_pred16_t __p)
-{
-  __builtin_mve_vstrbq_scatter_offset_p_sv16qi ((__builtin_neon_qi *) __base, __offset, __value, __p);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrbq_scatter_offset_p_s32 (int8_t * __base, uint32x4_t __offset, int32x4_t __value, mve_pred16_t __p)
-{
-  __builtin_mve_vstrbq_scatter_offset_p_sv4si ((__builtin_neon_qi *) __base, __offset, __value, __p);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrbq_scatter_offset_p_s16 (int8_t * __base, uint16x8_t __offset, int16x8_t __value, mve_pred16_t __p)
-{
-  __builtin_mve_vstrbq_scatter_offset_p_sv8hi ((__builtin_neon_qi *) __base, __offset, __value, __p);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrbq_scatter_offset_p_u8 (uint8_t * __base, uint8x16_t __offset, uint8x16_t __value, mve_pred16_t __p)
-{
-  __builtin_mve_vstrbq_scatter_offset_p_uv16qi ((__builtin_neon_qi *) __base, __offset, __value, __p);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrbq_scatter_offset_p_u32 (uint8_t * __base, uint32x4_t __offset, uint32x4_t __value, mve_pred16_t __p)
-{
-  __builtin_mve_vstrbq_scatter_offset_p_uv4si ((__builtin_neon_qi *) __base, __offset, __value, __p);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrbq_scatter_offset_p_u16 (uint8_t * __base, uint16x8_t __offset, uint16x8_t __value, mve_pred16_t __p)
-{
-  __builtin_mve_vstrbq_scatter_offset_p_uv8hi ((__builtin_neon_qi *) __base, __offset, __value, __p);
-}
-
 __extension__ extern __inline void
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vstrwq_scatter_base_p_s32 (uint32x4_t __addr, const int __offset, int32x4_t __value, mve_pred16_t __p)
@@ -861,62 +737,6 @@ __arm_vldrwq_gather_shifted_offset_z_u32 (uint32_t const * __base, uint32x4_t __
   return __builtin_mve_vldrwq_gather_shifted_offset_z_uv4si ((__builtin_neon_si *) __base, __offset, __p);
 }
 
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrhq_scatter_offset_s32 (int16_t * __base, uint32x4_t __offset, int32x4_t __value)
-{
-  __builtin_mve_vstrhq_scatter_offset_sv4si ((__builtin_neon_hi *) __base, __offset, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrhq_scatter_offset_s16 (int16_t * __base, uint16x8_t __offset, int16x8_t __value)
-{
-  __builtin_mve_vstrhq_scatter_offset_sv8hi ((__builtin_neon_hi *) __base, __offset, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrhq_scatter_offset_u32 (uint16_t * __base, uint32x4_t __offset, uint32x4_t __value)
-{
-  __builtin_mve_vstrhq_scatter_offset_uv4si ((__builtin_neon_hi *) __base, __offset, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrhq_scatter_offset_u16 (uint16_t * __base, uint16x8_t __offset, uint16x8_t __value)
-{
-  __builtin_mve_vstrhq_scatter_offset_uv8hi ((__builtin_neon_hi *) __base, __offset, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrhq_scatter_offset_p_s32 (int16_t * __base, uint32x4_t __offset, int32x4_t __value, mve_pred16_t __p)
-{
-  __builtin_mve_vstrhq_scatter_offset_p_sv4si ((__builtin_neon_hi *) __base, __offset, __value, __p);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrhq_scatter_offset_p_s16 (int16_t * __base, uint16x8_t __offset, int16x8_t __value, mve_pred16_t __p)
-{
-  __builtin_mve_vstrhq_scatter_offset_p_sv8hi ((__builtin_neon_hi *) __base, __offset, __value, __p);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrhq_scatter_offset_p_u32 (uint16_t * __base, uint32x4_t __offset, uint32x4_t __value, mve_pred16_t __p)
-{
-  __builtin_mve_vstrhq_scatter_offset_p_uv4si ((__builtin_neon_hi *) __base, __offset, __value, __p);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrhq_scatter_offset_p_u16 (uint16_t * __base, uint16x8_t __offset, uint16x8_t __value, mve_pred16_t __p)
-{
-  __builtin_mve_vstrhq_scatter_offset_p_uv8hi ((__builtin_neon_hi *) __base, __offset, __value, __p);
-}
-
 __extension__ extern __inline void
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vstrhq_scatter_shifted_offset_s32 (int16_t * __base, uint32x4_t __offset, int32x4_t __value)
@@ -1001,34 +821,6 @@ __arm_vstrdq_scatter_base_u64 (uint64x2_t __addr, const int __offset, uint64x2_t
   __builtin_mve_vstrdq_scatter_base_uv2di (__addr, __offset, __value);
 }
 
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrdq_scatter_offset_p_s64 (int64_t * __base, uint64x2_t __offset, int64x2_t __value, mve_pred16_t __p)
-{
-  __builtin_mve_vstrdq_scatter_offset_p_sv2di ((__builtin_neon_di *) __base, __offset, __value, __p);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrdq_scatter_offset_p_u64 (uint64_t * __base, uint64x2_t __offset, uint64x2_t __value, mve_pred16_t __p)
-{
-  __builtin_mve_vstrdq_scatter_offset_p_uv2di ((__builtin_neon_di *) __base, __offset, __value, __p);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrdq_scatter_offset_s64 (int64_t * __base, uint64x2_t __offset, int64x2_t __value)
-{
-  __builtin_mve_vstrdq_scatter_offset_sv2di ((__builtin_neon_di *) __base, __offset, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrdq_scatter_offset_u64 (uint64_t * __base, uint64x2_t __offset, uint64x2_t __value)
-{
-  __builtin_mve_vstrdq_scatter_offset_uv2di ((__builtin_neon_di *) __base, __offset, __value);
-}
-
 __extension__ extern __inline void
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vstrdq_scatter_shifted_offset_p_s64 (int64_t * __base, uint64x2_t __offset, int64x2_t __value, mve_pred16_t __p)
@@ -1057,34 +849,6 @@ __arm_vstrdq_scatter_shifted_offset_u64 (uint64_t * __base, uint64x2_t __offset,
   __builtin_mve_vstrdq_scatter_shifted_offset_uv2di ((__builtin_neon_di *) __base, __offset, __value);
 }
 
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrwq_scatter_offset_p_s32 (int32_t * __base, uint32x4_t __offset, int32x4_t __value, mve_pred16_t __p)
-{
-  __builtin_mve_vstrwq_scatter_offset_p_sv4si ((__builtin_neon_si *) __base, __offset, __value, __p);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrwq_scatter_offset_p_u32 (uint32_t * __base, uint32x4_t __offset, uint32x4_t __value, mve_pred16_t __p)
-{
-  __builtin_mve_vstrwq_scatter_offset_p_uv4si ((__builtin_neon_si *) __base, __offset, __value, __p);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrwq_scatter_offset_s32 (int32_t * __base, uint32x4_t __offset, int32x4_t __value)
-{
-  __builtin_mve_vstrwq_scatter_offset_sv4si ((__builtin_neon_si *) __base, __offset, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrwq_scatter_offset_u32 (uint32_t * __base, uint32x4_t __offset, uint32x4_t __value)
-{
-  __builtin_mve_vstrwq_scatter_offset_uv4si ((__builtin_neon_si *) __base, __offset, __value);
-}
-
 __extension__ extern __inline void
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vstrwq_scatter_shifted_offset_p_s32 (int32_t * __base, uint32x4_t __offset, int32x4_t __value, mve_pred16_t __p)
@@ -1749,20 +1513,6 @@ __arm_vldrwq_gather_shifted_offset_z_f32 (float32_t const * __base, uint32x4_t _
   return __builtin_mve_vldrwq_gather_shifted_offset_z_fv4sf ((__builtin_neon_si *) __base, __offset, __p);
 }
 
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrhq_scatter_offset_f16 (float16_t * __base, uint16x8_t __offset, float16x8_t __value)
-{
-  __builtin_mve_vstrhq_scatter_offset_fv8hf ((__builtin_neon_hi *) __base, __offset, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrhq_scatter_offset_p_f16 (float16_t * __base, uint16x8_t __offset, float16x8_t __value, mve_pred16_t __p)
-{
-  __builtin_mve_vstrhq_scatter_offset_p_fv8hf ((__builtin_neon_hi *) __base, __offset, __value, __p);
-}
-
 __extension__ extern __inline void
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vstrhq_scatter_shifted_offset_f16 (float16_t * __base, uint16x8_t __offset, float16x8_t __value)
@@ -1791,20 +1541,6 @@ __arm_vstrwq_scatter_base_p_f32 (uint32x4_t __addr, const int __offset, float32x
   __builtin_mve_vstrwq_scatter_base_p_fv4sf (__addr, __offset, __value, __p);
 }
 
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrwq_scatter_offset_f32 (float32_t * __base, uint32x4_t __offset, float32x4_t __value)
-{
-  __builtin_mve_vstrwq_scatter_offset_fv4sf ((__builtin_neon_si *) __base, __offset, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrwq_scatter_offset_p_f32 (float32_t * __base, uint32x4_t __offset, float32x4_t __value, mve_pred16_t __p)
-{
-  __builtin_mve_vstrwq_scatter_offset_p_fv4sf ((__builtin_neon_si *) __base, __offset, __value, __p);
-}
-
 __extension__ extern __inline void
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vstrwq_scatter_shifted_offset_f32 (float32_t * __base, uint32x4_t __offset, float32x4_t __value)
@@ -1985,48 +1721,6 @@ __arm_vst4q (uint32_t * __addr, uint32x4x4_t __value)
   __arm_vst4q_u32 (__addr, __value);
 }
 
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrbq_scatter_offset (int8_t * __base, uint8x16_t __offset, int8x16_t __value)
-{
-  __arm_vstrbq_scatter_offset_s8 (__base, __offset, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrbq_scatter_offset (int8_t * __base, uint32x4_t __offset, int32x4_t __value)
-{
-  __arm_vstrbq_scatter_offset_s32 (__base, __offset, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrbq_scatter_offset (int8_t * __base, uint16x8_t __offset, int16x8_t __value)
-{
-  __arm_vstrbq_scatter_offset_s16 (__base, __offset, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrbq_scatter_offset (uint8_t * __base, uint8x16_t __offset, uint8x16_t __value)
-{
-  __arm_vstrbq_scatter_offset_u8 (__base, __offset, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrbq_scatter_offset (uint8_t * __base, uint32x4_t __offset, uint32x4_t __value)
-{
-  __arm_vstrbq_scatter_offset_u32 (__base, __offset, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrbq_scatter_offset (uint8_t * __base, uint16x8_t __offset, uint16x8_t __value)
-{
-  __arm_vstrbq_scatter_offset_u16 (__base, __offset, __value);
-}
-
 __extension__ extern __inline void
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vstrwq_scatter_base (uint32x4_t __addr, const int __offset, int32x4_t __value)
@@ -2083,48 +1777,6 @@ __arm_vldrbq_gather_offset (int8_t const * __base, uint32x4_t __offset)
   return __arm_vldrbq_gather_offset_s32 (__base, __offset);
 }
 
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrbq_scatter_offset_p (int8_t * __base, uint8x16_t __offset, int8x16_t __value, mve_pred16_t __p)
-{
-  __arm_vstrbq_scatter_offset_p_s8 (__base, __offset, __value, __p);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrbq_scatter_offset_p (int8_t * __base, uint32x4_t __offset, int32x4_t __value, mve_pred16_t __p)
-{
-  __arm_vstrbq_scatter_offset_p_s32 (__base, __offset, __value, __p);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrbq_scatter_offset_p (int8_t * __base, uint16x8_t __offset, int16x8_t __value, mve_pred16_t __p)
-{
-  __arm_vstrbq_scatter_offset_p_s16 (__base, __offset, __value, __p);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrbq_scatter_offset_p (uint8_t * __base, uint8x16_t __offset, uint8x16_t __value, mve_pred16_t __p)
-{
-  __arm_vstrbq_scatter_offset_p_u8 (__base, __offset, __value, __p);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrbq_scatter_offset_p (uint8_t * __base, uint32x4_t __offset, uint32x4_t __value, mve_pred16_t __p)
-{
-  __arm_vstrbq_scatter_offset_p_u32 (__base, __offset, __value, __p);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrbq_scatter_offset_p (uint8_t * __base, uint16x8_t __offset, uint16x8_t __value, mve_pred16_t __p)
-{
-  __arm_vstrbq_scatter_offset_p_u16 (__base, __offset, __value, __p);
-}
-
 __extension__ extern __inline void
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vstrwq_scatter_base_p (uint32x4_t __addr, const int __offset, int32x4_t __value, mve_pred16_t __p)
@@ -2405,62 +2057,6 @@ __arm_vldrwq_gather_shifted_offset_z (uint32_t const * __base, uint32x4_t __offs
   return __arm_vldrwq_gather_shifted_offset_z_u32 (__base, __offset, __p);
 }
 
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrhq_scatter_offset (int16_t * __base, uint32x4_t __offset, int32x4_t __value)
-{
-  __arm_vstrhq_scatter_offset_s32 (__base, __offset, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrhq_scatter_offset (int16_t * __base, uint16x8_t __offset, int16x8_t __value)
-{
-  __arm_vstrhq_scatter_offset_s16 (__base, __offset, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrhq_scatter_offset (uint16_t * __base, uint32x4_t __offset, uint32x4_t __value)
-{
-  __arm_vstrhq_scatter_offset_u32 (__base, __offset, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrhq_scatter_offset (uint16_t * __base, uint16x8_t __offset, uint16x8_t __value)
-{
-  __arm_vstrhq_scatter_offset_u16 (__base, __offset, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrhq_scatter_offset_p (int16_t * __base, uint32x4_t __offset, int32x4_t __value, mve_pred16_t __p)
-{
-  __arm_vstrhq_scatter_offset_p_s32 (__base, __offset, __value, __p);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrhq_scatter_offset_p (int16_t * __base, uint16x8_t __offset, int16x8_t __value, mve_pred16_t __p)
-{
-  __arm_vstrhq_scatter_offset_p_s16 (__base, __offset, __value, __p);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrhq_scatter_offset_p (uint16_t * __base, uint32x4_t __offset, uint32x4_t __value, mve_pred16_t __p)
-{
-  __arm_vstrhq_scatter_offset_p_u32 (__base, __offset, __value, __p);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrhq_scatter_offset_p (uint16_t * __base, uint16x8_t __offset, uint16x8_t __value, mve_pred16_t __p)
-{
-  __arm_vstrhq_scatter_offset_p_u16 (__base, __offset, __value, __p);
-}
-
 __extension__ extern __inline void
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vstrhq_scatter_shifted_offset (int16_t * __base, uint32x4_t __offset, int32x4_t __value)
@@ -2545,34 +2141,6 @@ __arm_vstrdq_scatter_base (uint64x2_t __addr, const int __offset, uint64x2_t __v
   __arm_vstrdq_scatter_base_u64 (__addr, __offset, __value);
 }
 
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrdq_scatter_offset_p (int64_t * __base, uint64x2_t __offset, int64x2_t __value, mve_pred16_t __p)
-{
-  __arm_vstrdq_scatter_offset_p_s64 (__base, __offset, __value, __p);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrdq_scatter_offset_p (uint64_t * __base, uint64x2_t __offset, uint64x2_t __value, mve_pred16_t __p)
-{
-  __arm_vstrdq_scatter_offset_p_u64 (__base, __offset, __value, __p);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrdq_scatter_offset (int64_t * __base, uint64x2_t __offset, int64x2_t __value)
-{
-  __arm_vstrdq_scatter_offset_s64 (__base, __offset, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrdq_scatter_offset (uint64_t * __base, uint64x2_t __offset, uint64x2_t __value)
-{
-  __arm_vstrdq_scatter_offset_u64 (__base, __offset, __value);
-}
-
 __extension__ extern __inline void
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vstrdq_scatter_shifted_offset_p (int64_t * __base, uint64x2_t __offset, int64x2_t __value, mve_pred16_t __p)
@@ -2601,34 +2169,6 @@ __arm_vstrdq_scatter_shifted_offset (uint64_t * __base, uint64x2_t __offset, uin
   __arm_vstrdq_scatter_shifted_offset_u64 (__base, __offset, __value);
 }
 
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrwq_scatter_offset_p (int32_t * __base, uint32x4_t __offset, int32x4_t __value, mve_pred16_t __p)
-{
-  __arm_vstrwq_scatter_offset_p_s32 (__base, __offset, __value, __p);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrwq_scatter_offset_p (uint32_t * __base, uint32x4_t __offset, uint32x4_t __value, mve_pred16_t __p)
-{
-  __arm_vstrwq_scatter_offset_p_u32 (__base, __offset, __value, __p);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrwq_scatter_offset (int32_t * __base, uint32x4_t __offset, int32x4_t __value)
-{
-  __arm_vstrwq_scatter_offset_s32 (__base, __offset, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrwq_scatter_offset (uint32_t * __base, uint32x4_t __offset, uint32x4_t __value)
-{
-  __arm_vstrwq_scatter_offset_u32 (__base, __offset, __value);
-}
-
 __extension__ extern __inline void
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vstrwq_scatter_shifted_offset_p (int32_t * __base, uint32x4_t __offset, int32x4_t __value, mve_pred16_t __p)
@@ -3023,20 +2563,6 @@ __arm_vldrwq_gather_shifted_offset_z (float32_t const * __base, uint32x4_t __off
   return __arm_vldrwq_gather_shifted_offset_z_f32 (__base, __offset, __p);
 }
 
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrhq_scatter_offset (float16_t * __base, uint16x8_t __offset, float16x8_t __value)
-{
-  __arm_vstrhq_scatter_offset_f16 (__base, __offset, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrhq_scatter_offset_p (float16_t * __base, uint16x8_t __offset, float16x8_t __value, mve_pred16_t __p)
-{
-  __arm_vstrhq_scatter_offset_p_f16 (__base, __offset, __value, __p);
-}
-
 __extension__ extern __inline void
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vstrhq_scatter_shifted_offset (float16_t * __base, uint16x8_t __offset, float16x8_t __value)
@@ -3065,20 +2591,6 @@ __arm_vstrwq_scatter_base_p (uint32x4_t __addr, const int __offset, float32x4_t
   __arm_vstrwq_scatter_base_p_f32 (__addr, __offset, __value, __p);
 }
 
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrwq_scatter_offset (float32_t * __base, uint32x4_t __offset, float32x4_t __value)
-{
-  __arm_vstrwq_scatter_offset_f32 (__base, __offset, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vstrwq_scatter_offset_p (float32_t * __base, uint32x4_t __offset, float32x4_t __value, mve_pred16_t __p)
-{
-  __arm_vstrwq_scatter_offset_p_f32 (__base, __offset, __value, __p);
-}
-
 __extension__ extern __inline void
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vstrwq_scatter_shifted_offset (float32_t * __base, uint32x4_t __offset, float32x4_t __value)
@@ -3589,24 +3101,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_float16_t_ptr][__ARM_mve_type_float16x8x2_t]: __arm_vst2q_f16 (__ARM_mve_coerce_f16_ptr(p0, float16_t *), __ARM_mve_coerce(__p1, float16x8x2_t)), \
   int (*)[__ARM_mve_type_float32_t_ptr][__ARM_mve_type_float32x4x2_t]: __arm_vst2q_f32 (__ARM_mve_coerce_f32_ptr(p0, float32_t *), __ARM_mve_coerce(__p1, float32x4x2_t)));})
 
-#define __arm_vstrhq_scatter_offset_p(p0,p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int16x8_t]: __arm_vstrhq_scatter_offset_p_s16 (__ARM_mve_coerce_s16_ptr(p0, int16_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int32x4_t]: __arm_vstrhq_scatter_offset_p_s32 (__ARM_mve_coerce_s16_ptr(p0, int16_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vstrhq_scatter_offset_p_u16 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vstrhq_scatter_offset_p_u32 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3), \
-  int (*)[__ARM_mve_type_float16_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_float16x8_t]: __arm_vstrhq_scatter_offset_p_f16 (__ARM_mve_coerce_f16_ptr(p0, float16_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3));})
-
-#define __arm_vstrhq_scatter_offset(p0,p1,p2) ({ __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int16x8_t]: __arm_vstrhq_scatter_offset_s16 (__ARM_mve_coerce_s16_ptr(p0, int16_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, int16x8_t)), \
-  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int32x4_t]: __arm_vstrhq_scatter_offset_s32 (__ARM_mve_coerce_s16_ptr(p0, int16_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, int32x4_t)), \
-  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vstrhq_scatter_offset_u16 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vstrhq_scatter_offset_u32 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t)), \
-  int (*)[__ARM_mve_type_float16_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_float16x8_t]: __arm_vstrhq_scatter_offset_f16 (__ARM_mve_coerce_f16_ptr(p0, float16_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, float16x8_t)));})
-
 #define __arm_vstrhq_scatter_shifted_offset_p(p0,p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
@@ -3625,24 +3119,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vstrhq_scatter_shifted_offset_u32 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t)), \
   int (*)[__ARM_mve_type_float16_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_float16x8_t]: __arm_vstrhq_scatter_shifted_offset_f16 (__ARM_mve_coerce_f16_ptr(p0, float16_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, float16x8_t)));})
 
-#define __arm_vstrhq_scatter_offset(p0,p1,p2) ({ __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int16x8_t]: __arm_vstrhq_scatter_offset_s16 (__ARM_mve_coerce_s16_ptr(p0, int16_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, int16x8_t)), \
-  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int32x4_t]: __arm_vstrhq_scatter_offset_s32 (__ARM_mve_coerce_s16_ptr(p0, int16_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, int32x4_t)), \
-  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vstrhq_scatter_offset_u16 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vstrhq_scatter_offset_u32 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t)), \
-  int (*)[__ARM_mve_type_float16_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_float16x8_t]: __arm_vstrhq_scatter_offset_f16 (__ARM_mve_coerce_f16_ptr(p0, float16_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, float16x8_t)));})
-
-#define __arm_vstrhq_scatter_offset_p(p0,p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int16x8_t]: __arm_vstrhq_scatter_offset_p_s16 (__ARM_mve_coerce_s16_ptr(p0, int16_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int32x4_t]: __arm_vstrhq_scatter_offset_p_s32 (__ARM_mve_coerce_s16_ptr(p0, int16_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vstrhq_scatter_offset_p_u16 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vstrhq_scatter_offset_p_u32 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3), \
-  int (*)[__ARM_mve_type_float16_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_float16x8_t]: __arm_vstrhq_scatter_offset_p_f16 (__ARM_mve_coerce_f16_ptr(p0, float16_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3));})
-
 #define __arm_vstrhq_scatter_shifted_offset(p0,p1,p2) ({ __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
@@ -3673,20 +3149,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint32x4_t]: __arm_vstrwq_scatter_base_p_u32(p0, p1, __ARM_mve_coerce(__p2, uint32x4_t), p3), \
   int (*)[__ARM_mve_type_float32x4_t]: __arm_vstrwq_scatter_base_p_f32(p0, p1, __ARM_mve_coerce(__p2, float32x4_t), p3));})
 
-#define __arm_vstrwq_scatter_offset(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4_t]: __arm_vstrwq_scatter_offset_s32 (__ARM_mve_coerce_s32_ptr(__p0, int32_t *), p1, __ARM_mve_coerce(__p2, int32x4_t)), \
-  int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4_t]: __arm_vstrwq_scatter_offset_u32 (__ARM_mve_coerce_u32_ptr(__p0, uint32_t *), p1, __ARM_mve_coerce(__p2, uint32x4_t)), \
-  int (*)[__ARM_mve_type_float32_t_ptr][__ARM_mve_type_float32x4_t]: __arm_vstrwq_scatter_offset_f32 (__ARM_mve_coerce_f32_ptr(__p0, float32_t *), p1, __ARM_mve_coerce(__p2, float32x4_t)));})
-
-#define __arm_vstrwq_scatter_offset_p(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4_t]: __arm_vstrwq_scatter_offset_p_s32 (__ARM_mve_coerce_s32_ptr(__p0, int32_t *), p1, __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4_t]: __arm_vstrwq_scatter_offset_p_u32 (__ARM_mve_coerce_u32_ptr(__p0, uint32_t *), p1, __ARM_mve_coerce(__p2, uint32x4_t), p3), \
-  int (*)[__ARM_mve_type_float32_t_ptr][__ARM_mve_type_float32x4_t]: __arm_vstrwq_scatter_offset_p_f32 (__ARM_mve_coerce_f32_ptr(__p0, float32_t *), p1, __ARM_mve_coerce(__p2, float32x4_t), p3));})
-
 #define __arm_vstrwq_scatter_shifted_offset(p0,p1,p2) ({ __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p2)])0, \
@@ -3864,21 +3326,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8x2_t]: __arm_vst2q_u16 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8x2_t)), \
   int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4x2_t]: __arm_vst2q_u32 (__ARM_mve_coerce_u32_ptr(p0, uint32_t *), __ARM_mve_coerce(__p1, uint32x4x2_t)));})
 
-#define __arm_vstrhq_scatter_offset_p(p0,p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int16x8_t]: __arm_vstrhq_scatter_offset_p_s16 (__ARM_mve_coerce_s16_ptr(p0, int16_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int32x4_t]: __arm_vstrhq_scatter_offset_p_s32 (__ARM_mve_coerce_s16_ptr(p0, int16_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vstrhq_scatter_offset_p_u16 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vstrhq_scatter_offset_p_u32 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
-
-#define __arm_vstrhq_scatter_offset(p0,p1,p2) ({ __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int16x8_t]: __arm_vstrhq_scatter_offset_s16 (__ARM_mve_coerce_s16_ptr(p0, int16_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, int16x8_t)), \
-  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int32x4_t]: __arm_vstrhq_scatter_offset_s32 (__ARM_mve_coerce_s16_ptr(p0, int16_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, int32x4_t)), \
-  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vstrhq_scatter_offset_u16 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vstrhq_scatter_offset_u32 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t)));})
-
 #define __arm_vstrhq_scatter_shifted_offset_p(p0,p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
@@ -3906,22 +3353,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int64x2_t]: __arm_vstrdq_scatter_base_s64 (p0, p1, __ARM_mve_coerce(__p2, int64x2_t)), \
   int (*)[__ARM_mve_type_uint64x2_t]: __arm_vstrdq_scatter_base_u64 (p0, p1, __ARM_mve_coerce(__p2, uint64x2_t)));})
 
-#define __arm_vstrhq_scatter_offset(p0,p1,p2) ({ __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int16x8_t]: __arm_vstrhq_scatter_offset_s16 (__ARM_mve_coerce_s16_ptr(p0, int16_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, int16x8_t)), \
-  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int32x4_t]: __arm_vstrhq_scatter_offset_s32 (__ARM_mve_coerce_s16_ptr(p0, int16_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, int32x4_t)), \
-  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vstrhq_scatter_offset_u16 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vstrhq_scatter_offset_u32 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t)));})
-
-#define __arm_vstrhq_scatter_offset_p(p0,p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int16x8_t]: __arm_vstrhq_scatter_offset_p_s16 (__ARM_mve_coerce_s16_ptr(p0, int16_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \
-  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int32x4_t]: __arm_vstrhq_scatter_offset_p_s32 (__ARM_mve_coerce_s16_ptr(p0, int16_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vstrhq_scatter_offset_p_u16 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
-  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vstrhq_scatter_offset_p_u32 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
-
 #define __arm_vstrhq_scatter_shifted_offset(p0,p1,p2) ({ __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
@@ -3938,18 +3369,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vstrhq_scatter_shifted_offset_p_u16 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \
   int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vstrhq_scatter_shifted_offset_p_u32 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));})
 
-#define __arm_vstrwq_scatter_offset(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4_t]: __arm_vstrwq_scatter_offset_s32 (__ARM_mve_coerce_s32_ptr(__p0, int32_t *), p1, __ARM_mve_coerce(__p2, int32x4_t)), \
-  int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4_t]: __arm_vstrwq_scatter_offset_u32 (__ARM_mve_coerce_u32_ptr(__p0, uint32_t *), p1, __ARM_mve_coerce(__p2, uint32x4_t)));})
-
-#define __arm_vstrwq_scatter_offset_p(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4_t]: __arm_vstrwq_scatter_offset_p_s32 (__ARM_mve_coerce_s32_ptr(__p0, int32_t *), p1, __ARM_mve_coerce(__p2, int32x4_t), p3), \
-  int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4_t]: __arm_vstrwq_scatter_offset_p_u32 (__ARM_mve_coerce_u32_ptr(__p0, uint32_t *), p1, __ARM_mve_coerce(__p2, uint32x4_t), p3));})
-
 #define __arm_vstrwq_scatter_shifted_offset(p0,p1,p2) ({ __typeof(p1) __p1 = (p1); \
   __typeof(p2) __p2 = (p2); \
   _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p2)])0, \
@@ -4070,40 +3489,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int64x2_t]: __arm_vstrdq_scatter_base_p_s64 (p0, p1, __ARM_mve_coerce(__p2, int64x2_t), p3), \
   int (*)[__ARM_mve_type_uint64x2_t]: __arm_vstrdq_scatter_base_p_u64 (p0, p1, __ARM_mve_coerce(__p2, uint64x2_t), p3));})
 
-#define __arm_vstrbq_scatter_offset(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  __typeof(p2) __p2 = (p2); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
-  int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int8x16_t]: __arm_vstrbq_scatter_offset_s8 (__ARM_mve_coerce_s8_ptr(__p0, int8_t *), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, int8x16_t)), \
-  int
(*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int16x8_t]: __arm_vstrbq_scatter_offset_s16 (__ARM_mve_coerce_s8_ptr(__p0, int8_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, int16x8_t)), \ - int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int32x4_t]: __arm_vstrbq_scatter_offset_s32 (__ARM_mve_coerce_s8_ptr(__p0, int8_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, int32x4_t)), \ - int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vstrbq_scatter_offset_u8 (__ARM_mve_coerce_u8_ptr(__p0, uint8_t *), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t)), \ - int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vstrbq_scatter_offset_u16 (__ARM_mve_coerce_u8_ptr(__p0, uint8_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t)), \ - int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vstrbq_scatter_offset_u32 (__ARM_mve_coerce_u8_ptr(__p0, uint8_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t)));}) - -#define __arm_vstrbq_scatter_offset_p(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \ - __typeof(p1) __p1 = (p1); \ - __typeof(p2) __p2 = (p2); \ - _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \ - int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_uint8x16_t][__ARM_mve_type_int8x16_t]: __arm_vstrbq_scatter_offset_p_s8 (__ARM_mve_coerce_s8_ptr(__p0, int8_t *), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, int8x16_t), p3), \ - int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_int16x8_t]: __arm_vstrbq_scatter_offset_p_s16 (__ARM_mve_coerce_s8_ptr(__p0, int8_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, int16x8_t), p3), \ - int 
(*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_int32x4_t]: __arm_vstrbq_scatter_offset_p_s32 (__ARM_mve_coerce_s8_ptr(__p0, int8_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, int32x4_t), p3), \ - int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vstrbq_scatter_offset_p_u8 (__ARM_mve_coerce_u8_ptr(__p0, uint8_t *), __ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, uint8x16_t), p3), \ - int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vstrbq_scatter_offset_p_u16 (__ARM_mve_coerce_u8_ptr(__p0, uint8_t *), __ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, uint16x8_t), p3), \ - int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vstrbq_scatter_offset_p_u32 (__ARM_mve_coerce_u8_ptr(__p0, uint8_t *), __ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, uint32x4_t), p3));}) - -#define __arm_vstrdq_scatter_offset_p(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \ - __typeof(p2) __p2 = (p2); \ - _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p2)])0, \ - int (*)[__ARM_mve_type_int64_t_ptr][__ARM_mve_type_int64x2_t]: __arm_vstrdq_scatter_offset_p_s64 (__ARM_mve_coerce_s64_ptr(__p0, int64_t *), p1, __ARM_mve_coerce(__p2, int64x2_t), p3), \ - int (*)[__ARM_mve_type_uint64_t_ptr][__ARM_mve_type_uint64x2_t]: __arm_vstrdq_scatter_offset_p_u64 (__ARM_mve_coerce_u64_ptr(__p0, uint64_t *), p1, __ARM_mve_coerce(__p2, uint64x2_t), p3));}) - -#define __arm_vstrdq_scatter_offset(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \ - __typeof(p2) __p2 = (p2); \ - _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p2)])0, \ - int (*)[__ARM_mve_type_int64_t_ptr][__ARM_mve_type_int64x2_t]: __arm_vstrdq_scatter_offset_s64 (__ARM_mve_coerce_s64_ptr(__p0, int64_t *), p1, __ARM_mve_coerce(__p2, int64x2_t)), \ - int (*)[__ARM_mve_type_uint64_t_ptr][__ARM_mve_type_uint64x2_t]: 
__arm_vstrdq_scatter_offset_u64 (__ARM_mve_coerce_u64_ptr(__p0, uint64_t *), p1, __ARM_mve_coerce(__p2, uint64x2_t)));}) - #define __arm_vstrdq_scatter_shifted_offset_p(p0,p1,p2,p3) ({ __typeof(p0) __p0 = (p0); \ __typeof(p2) __p2 = (p2); \ _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p2)])0, \ diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def index 5a0c7606339..167d272916c 100644 --- a/gcc/config/arm/arm_mve_builtins.def +++ b/gcc/config/arm/arm_mve_builtins.def @@ -663,16 +663,12 @@ VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, vandq_m_f, v8hf, v4sf) VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, vaddq_m_n_f, v8hf, v4sf) VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, vaddq_m_f, v8hf, v4sf) VAR2 (QUADOP_NONE_NONE_NONE_NONE_PRED, vabdq_m_f, v8hf, v4sf) -VAR3 (STRSS, vstrbq_scatter_offset_s, v16qi, v8hi, v4si) -VAR3 (STRSU, vstrbq_scatter_offset_u, v16qi, v8hi, v4si) VAR1 (STRSBS, vstrwq_scatter_base_s, v4si) VAR1 (STRSBU, vstrwq_scatter_base_u, v4si) VAR3 (LDRGU, vldrbq_gather_offset_u, v16qi, v8hi, v4si) VAR3 (LDRGS, vldrbq_gather_offset_s, v16qi, v8hi, v4si) VAR1 (LDRGBS, vldrwq_gather_base_s, v4si) VAR1 (LDRGBU, vldrwq_gather_base_u, v4si) -VAR3 (STRSS_P, vstrbq_scatter_offset_p_s, v16qi, v8hi, v4si) -VAR3 (STRSU_P, vstrbq_scatter_offset_p_u, v16qi, v8hi, v4si) VAR1 (STRSBS_P, vstrwq_scatter_base_p_s, v4si) VAR1 (STRSBU_P, vstrwq_scatter_base_p_u, v4si) VAR1 (LDRGBS_Z, vldrwq_gather_base_z_s, v4si) @@ -718,42 +714,26 @@ VAR1 (LDRGU_Z, vldrdq_gather_shifted_offset_z_u, v2di) VAR1 (LDRGU_Z, vldrwq_gather_offset_z_u, v4si) VAR1 (LDRGU_Z, vldrwq_gather_shifted_offset_z_u, v4si) VAR2 (STRSU_P, vstrhq_scatter_shifted_offset_p_u, v8hi, v4si) -VAR2 (STRSU_P, vstrhq_scatter_offset_p_u, v8hi, v4si) VAR2 (STRSU, vstrhq_scatter_shifted_offset_u, v8hi, v4si) -VAR2 (STRSU, vstrhq_scatter_offset_u, v8hi, v4si) VAR2 (STRSS_P, vstrhq_scatter_shifted_offset_p_s, v8hi, v4si) -VAR2 (STRSS_P, vstrhq_scatter_offset_p_s, v8hi, v4si) VAR2 (STRSS, 
vstrhq_scatter_shifted_offset_s, v8hi, v4si) -VAR2 (STRSS, vstrhq_scatter_offset_s, v8hi, v4si) VAR1 (STRSBS, vstrdq_scatter_base_s, v2di) VAR1 (STRSBS, vstrwq_scatter_base_f, v4sf) VAR1 (STRSBS_P, vstrdq_scatter_base_p_s, v2di) VAR1 (STRSBS_P, vstrwq_scatter_base_p_f, v4sf) VAR1 (STRSBU, vstrdq_scatter_base_u, v2di) VAR1 (STRSBU_P, vstrdq_scatter_base_p_u, v2di) -VAR1 (STRSS, vstrdq_scatter_offset_s, v2di) VAR1 (STRSS, vstrdq_scatter_shifted_offset_s, v2di) -VAR1 (STRSS, vstrhq_scatter_offset_f, v8hf) VAR1 (STRSS, vstrhq_scatter_shifted_offset_f, v8hf) -VAR1 (STRSS, vstrwq_scatter_offset_f, v4sf) -VAR1 (STRSS, vstrwq_scatter_offset_s, v4si) VAR1 (STRSS, vstrwq_scatter_shifted_offset_f, v4sf) VAR1 (STRSS, vstrwq_scatter_shifted_offset_s, v4si) -VAR1 (STRSS_P, vstrdq_scatter_offset_p_s, v2di) VAR1 (STRSS_P, vstrdq_scatter_shifted_offset_p_s, v2di) -VAR1 (STRSS_P, vstrhq_scatter_offset_p_f, v8hf) VAR1 (STRSS_P, vstrhq_scatter_shifted_offset_p_f, v8hf) -VAR1 (STRSS_P, vstrwq_scatter_offset_p_f, v4sf) -VAR1 (STRSS_P, vstrwq_scatter_offset_p_s, v4si) VAR1 (STRSS_P, vstrwq_scatter_shifted_offset_p_f, v4sf) VAR1 (STRSS_P, vstrwq_scatter_shifted_offset_p_s, v4si) -VAR1 (STRSU, vstrdq_scatter_offset_u, v2di) VAR1 (STRSU, vstrdq_scatter_shifted_offset_u, v2di) -VAR1 (STRSU, vstrwq_scatter_offset_u, v4si) VAR1 (STRSU, vstrwq_scatter_shifted_offset_u, v4si) -VAR1 (STRSU_P, vstrdq_scatter_offset_p_u, v2di) VAR1 (STRSU_P, vstrdq_scatter_shifted_offset_p_u, v2di) -VAR1 (STRSU_P, vstrwq_scatter_offset_p_u, v4si) VAR1 (STRSU_P, vstrwq_scatter_shifted_offset_p_u, v4si) VAR1 (STRSBWBU, vstrwq_scatter_base_wb_u, v4si) VAR1 (STRSBWBU, vstrdq_scatter_base_wb_u, v2di) diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md index 22f8c180565..f046225584a 100644 --- a/gcc/config/arm/iterators.md +++ b/gcc/config/arm/iterators.md @@ -273,6 +273,7 @@ (define_mode_iterator VBFCVTM [V2SI SF]) (define_mode_iterator MVE_types [V16QI V8HI V4SI V2DI TI V8HF V4SF V2DF]) 
 (define_mode_iterator MVE_vecs [V16QI V8HI V4SI V2DI V8HF V4SF V2DF])
 (define_mode_iterator MVE_VLD_ST [V16QI V8HI V4SI V8HF V4SF])
+(define_mode_iterator MVE_VLD_ST_scatter [V16QI V8HI V4SI V8HF V4SF V2DI])
 (define_mode_iterator MVE_0 [V8HF V4SF])
 (define_mode_iterator MVE_1 [V16QI V8HI V4SI V2DI])
 (define_mode_iterator MVE_3 [V16QI V8HI])
@@ -290,6 +291,7 @@ (define_mode_attr MVE_wide_n_TYPE [(V8QI "V8HI") (V4QI "V4SI") (V4HI "V4SI")])
 (define_mode_attr MVE_wide_n_type [(V8QI "v8hi") (V4QI "v4si") (V4HI "v4si")])
 (define_mode_attr MVE_wide_n_sz_elem [(V8QI "16") (V4QI "32") (V4HI "32")])
 (define_mode_attr MVE_wide_n_VPRED [(V8QI "V8BI") (V4QI "V4BI") (V4HI "V4BI")])
+(define_mode_attr MVE_scatter_offset [(V16QI "V16QI") (V8HI "V8HI") (V4SI "V4SI") (V8HF "V8HI") (V4SF "V4SI") (V2DI "V2DI")])
 
 ;;----------------------------------------------------------------------------
 ;; Code iterators
@@ -1817,7 +1819,8 @@ (define_mode_attr V_elem_ch [(V8QI "b") (V16QI "b")
 (define_mode_attr MVE_elem_ch [(V4QI "b") (V8QI "b") (V16QI "b")
 (V4HI "h") (V8HI "h") (V8HF "h")
- (V4SI "w") (V4SF "w")])
+ (V4SI "w") (V4SF "w")
+ (V2DI "d")])
 
 (define_mode_attr VH_elem_ch [(V4HI "s") (V8HI "s")
 (V4HF "s") (V8HF "s")
@@ -2521,8 +2524,7 @@ (define_int_attr supf [(VCVTQ_TO_F_S "s") (VCVTQ_TO_F_U "u") (VREV16Q_S "s")
 (VQRSHRNBQ_M_N_S "s") (VQRSHRNBQ_M_N_U "u")
 (VMLALDAVAXQ_P_S "s")
 (VMLALDAVAQ_P_S "s") (VMLALDAVAQ_P_U "u")
- (VSTRWQSB_S "s") (VSTRWQSB_U "u") (VSTRBQSO_S "s")
- (VSTRBQSO_U "u")
+ (VSTRWQSB_S "s") (VSTRWQSB_U "u")
 (VLDRBQGO_S "s") (VLDRBQGO_U "u") (VLDRWQGB_S "s") (VLDRWQGB_U "u")
 (VLDRHQGO_S "s") (VLDRHQGO_U "u") (VLDRHQGSO_S "s") (VLDRHQGSO_U "u")
@@ -2530,11 +2532,10 @@ (define_int_attr supf [(VCVTQ_TO_F_S "s") (VCVTQ_TO_F_U "u") (VREV16Q_S "s")
 (VLDRDQGO_S "s") (VLDRDQGO_U "u") (VLDRDQGSO_S "s") (VLDRDQGSO_U "u")
 (VLDRWQGO_S "s") (VLDRWQGO_U "u") (VLDRWQGSO_S "s") (VLDRWQGSO_U "u")
- (VSTRHQSO_S "s") (VSTRHQSO_U "u")
 (VSTRHQSSO_S "s") (VSTRHQSSO_U "u")
- (VSTRDQSB_S "s") (VSTRDQSB_U "u") (VSTRDQSO_S "s")
- (VSTRDQSO_U "u") (VSTRDQSSO_S "s") (VSTRDQSSO_U "u")
- (VSTRWQSO_U "u") (VSTRWQSO_S "s") (VSTRWQSSO_U "u")
+ (VSTRDQSB_S "s") (VSTRDQSB_U "u")
+ (VSTRDQSSO_S "s") (VSTRDQSSO_U "u")
+ (VSTRWQSSO_U "u") (VSTRWQSSO_S "s")
 (VSTRWQSBWB_S "s") (VSTRWQSBWB_U "u") (VLDRWQGBWB_S "s") (VLDRWQGBWB_U "u")
 (VLDRDQGBWB_S "s") (VLDRDQGBWB_U "u") (VSTRDQSBWB_S "s") (VADCQ_M_S "s")
@@ -2937,7 +2938,6 @@ (define_int_iterator VSHLLxQ_M_N [VSHLLBQ_M_N_U VSHLLBQ_M_N_S VSHLLTQ_M_N_U VSHL
 (define_int_iterator VSHRNBQ_M_N [VSHRNBQ_M_N_S VSHRNBQ_M_N_U])
 (define_int_iterator VSHRNTQ_M_N [VSHRNTQ_M_N_S VSHRNTQ_M_N_U])
 (define_int_iterator VSTRWSBQ [VSTRWQSB_S VSTRWQSB_U])
-(define_int_iterator VSTRBSOQ [VSTRBQSO_S VSTRBQSO_U])
 (define_int_iterator VLDRBGOQ [VLDRBQGO_S VLDRBQGO_U])
 (define_int_iterator VLDRWGBQ [VLDRWQGB_S VLDRWQGB_U])
 (define_int_iterator VLDRHGOQ [VLDRHQGO_S VLDRHQGO_U])
@@ -2947,12 +2947,9 @@ (define_int_iterator VLDRDGOQ [VLDRDQGO_S VLDRDQGO_U])
 (define_int_iterator VLDRDGSOQ [VLDRDQGSO_S VLDRDQGSO_U])
 (define_int_iterator VLDRWGOQ [VLDRWQGO_S VLDRWQGO_U])
 (define_int_iterator VLDRWGSOQ [VLDRWQGSO_S VLDRWQGSO_U])
-(define_int_iterator VSTRHSOQ [VSTRHQSO_S VSTRHQSO_U])
 (define_int_iterator VSTRHSSOQ [VSTRHQSSO_S VSTRHQSSO_U])
 (define_int_iterator VSTRDSBQ [VSTRDQSB_S VSTRDQSB_U])
-(define_int_iterator VSTRDSOQ [VSTRDQSO_S VSTRDQSO_U])
 (define_int_iterator VSTRDSSOQ [VSTRDQSSO_S VSTRDQSSO_U])
-(define_int_iterator VSTRWSOQ [VSTRWQSO_S VSTRWQSO_U])
 (define_int_iterator VSTRWSSOQ [VSTRWQSSO_S VSTRWQSSO_U])
 (define_int_iterator VSTRWSBWBQ [VSTRWQSBWB_S VSTRWQSBWB_U])
 (define_int_iterator VLDRWGBWBQ [VLDRWQGBWB_S VLDRWQGBWB_U])
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index e54153edb5d..b2fb2d878c4 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -3311,36 +3311,93 @@ (define_insn "@mve_vldrq_z_extend_"
 (set_attr "type" "mve_move")
 (set_attr "length" "8")])
 
+;; Vector scatter stores with offset
 ;;
-;; [vstrbq_scatter_offset_s vstrbq_scatter_offset_u]
+;; [vstrbq_scatter_offset_s8, vstrbq_scatter_offset_u8,
+;;  vstrhq_scatter_offset_s16, vstrhq_scatter_offset_u16,
+;;  vstrwq_scatter_offset_s32, vstrwq_scatter_offset_u32,
+;;  vstrdq_scatter_offset_s64, vstrdq_scatter_offset_u64,
+;;  vstrhq_scatter_offset_f16,
+;;  vstrwq_scatter_offset_f32]
 ;;
-(define_expand "mve_vstrbq_scatter_offset_"
- [(match_operand: 0 "mve_scatter_memory")
- (match_operand:MVE_2 1 "s_register_operand")
- (match_operand:MVE_2 2 "s_register_operand")
- (unspec:V4SI [(const_int 0)] VSTRBSOQ)]
- "TARGET_HAVE_MVE"
-{
- rtx ind = XEXP (operands[0], 0);
- gcc_assert (REG_P (ind));
- emit_insn (gen_mve_vstrbq_scatter_offset__insn (ind, operands[1],
- operands[2]));
- DONE;
-})
+(define_insn "@mve_vstrq_scatter_offset_"
+ [(set (mem:BLK (scratch))
+ (unspec:BLK
+ [(match_operand:SI 0 "register_operand" "r")
+ (match_operand: 1 "s_register_operand" "w")
+ (match_operand:MVE_VLD_ST_scatter 2 "s_register_operand" "w")]
+ VSTRQSO))]
+ "(TARGET_HAVE_MVE && VALID_MVE_SI_MODE (mode))
+ || (TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE (mode))"
+ "vstr.\t%q2, [%0, %q1]"
+ [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vstrq_scatter_offset_"))
+ (set_attr "length" "4")])
 
-(define_insn "mve_vstrbq_scatter_offset__insn"
+;; Predicated vector scatter stores with offset
+;;
+;; [vstrbq_scatter_offset_p_s8, vstrbq_scatter_offset_p_u8,
+;;  vstrhq_scatter_offset_p_s16, vstrhq_scatter_offset_p_u16,
+;;  vstrwq_scatter_offset_p_s32, vstrwq_scatter_offset_p_u32,
+;;  vstrdq_scatter_offset_p_s64, vstrdq_scatter_offset_p_u64,
+;;  vstrhq_scatter_offset_p_f16,
+;;  vstrwq_scatter_offset_p_f32]
+;;
+(define_insn "@mve_vstrq_scatter_offset_p_"
 [(set (mem:BLK (scratch))
 (unspec:BLK
 [(match_operand:SI 0 "register_operand" "r")
- (match_operand:MVE_2 1 "s_register_operand" "w")
- (match_operand:MVE_2 2 "s_register_operand" "w")]
- VSTRBSOQ))]
- "TARGET_HAVE_MVE"
- "vstrb.\t%q2, [%0, %q1]"
- [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vstrbq_scatter_offset__insn"))
+ (match_operand: 1 "s_register_operand" "w")
+ (match_operand:MVE_VLD_ST_scatter 2 "s_register_operand" "w")
+ (match_operand: 3 "vpr_register_operand" "Up")]
+ VSTRQSO_P))]
+ "(TARGET_HAVE_MVE && VALID_MVE_SI_MODE (mode))
+ || (TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE (mode))"
+ "vpst\;vstrt.\t%q2, [%0, %q1]"
+ [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vstrq_scatter_offset_"))
+ (set_attr "length" "8")])
+
+;; Truncating vector scatter stores with offset
+;;
+;; [vstrbq_scatter_offset_s16, vstrbq_scatter_offset_u16,
+;;  vstrbq_scatter_offset_s32, vstrbq_scatter_offset_u32,
+;;  vstrhq_scatter_offset_s32, vstrhq_scatter_offset_u32]
+;;
+(define_insn "@mve_vstrq_truncate_scatter_offset_"
+ [(set (mem:BLK (scratch))
+ (unspec:BLK
+ [(match_operand:SI 0 "register_operand" "r")
+ (match_operand: 1 "s_register_operand" "w")
+ (truncate:MVE_w_narrow_TYPE
+ (match_operand: 2 "s_register_operand" "w"))]
+ VSTRQSO_TRUNC))]
+ "(TARGET_HAVE_MVE && VALID_MVE_SI_MODE (mode))"
+ "vstr.\t%q2, [%0, %q1]"
+ [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vstrq_truncate_scatter_offset_"))
 (set_attr "length" "4")])
+
+;; Predicated truncating vector scatter stores with offset
+;;
+;; [vstrbq_scatter_offset_p_s16, vstrbq_scatter_offset_p_u16,
+;;  vstrbq_scatter_offset_p_s32, vstrbq_scatter_offset_p_u32,
+;;  vstrhq_scatter_offset_p_s32, vstrhq_scatter_offset_p_u32]
+;;
+(define_insn "@mve_vstrq_truncate_scatter_offset_p_"
+ [(set (mem:BLK (scratch))
+ (unspec:BLK
+ [(match_operand:SI 0 "register_operand" "r")
+ (match_operand: 1 "s_register_operand" "w")
+ (truncate:MVE_w_narrow_TYPE
+ (match_operand: 2 "s_register_operand" "w"))
+ (match_operand: 3 "vpr_register_operand" "Up")]
+ VSTRQSO_TRUNC_P))]
+ "(TARGET_HAVE_MVE && VALID_MVE_SI_MODE (mode))"
+ "vpst\;vstrt.\t%q2, [%0, %q1]"
+ [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vstrq_truncate_scatter_offset_"))
+ (set_attr "length" "8")])
+
 ;;
 ;; [vstrwq_scatter_base_s vstrwq_scatter_base_u]
 ;;
 (define_insn "mve_vstrwq_scatter_base_v4si"
@@ -3407,40 +3464,6 @@ (define_insn "mve_vldrwq_gather_base_v4si"
 }
 [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vldrwq_gather_base_v4si"))
 (set_attr "length" "4")])
-
-;;
-;; [vstrbq_scatter_offset_p_s vstrbq_scatter_offset_p_u]
-;;
-(define_expand "mve_vstrbq_scatter_offset_p_"
- [(match_operand: 0 "mve_scatter_memory")
- (match_operand:MVE_2 1 "s_register_operand")
- (match_operand:MVE_2 2 "s_register_operand")
- (match_operand: 3 "vpr_register_operand" "Up")
- (unspec:V4SI [(const_int 0)] VSTRBSOQ)]
- "TARGET_HAVE_MVE"
-{
- rtx ind = XEXP (operands[0], 0);
- gcc_assert (REG_P (ind));
- emit_insn (
- gen_mve_vstrbq_scatter_offset_p__insn (ind, operands[1],
- operands[2],
- operands[3]));
- DONE;
-})
-
-(define_insn "mve_vstrbq_scatter_offset_p__insn"
- [(set (mem:BLK (scratch))
- (unspec:BLK
- [(match_operand:SI 0 "register_operand" "r")
- (match_operand:MVE_2 1 "s_register_operand" "w")
- (match_operand:MVE_2 2 "s_register_operand" "w")
- (match_operand: 3 "vpr_register_operand" "Up")]
- VSTRBSOQ))]
- "TARGET_HAVE_MVE"
- "vpst\;vstrbt.\t%q2, [%0, %q1]"
- [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vstrbq_scatter_offset__insn"))
- (set_attr "length" "8")])
-
 ;;
 ;; [vstrwq_scatter_base_p_s vstrwq_scatter_base_p_u]
 ;;
@@ -4049,68 +4072,6 @@ (define_insn "mve_vldrwq_gather_shifted_offset_z_v4si"
 [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vldrwq_gather_shifted_offset_v4si"))
 (set_attr "length" "8")])
-;;
-;; [vstrhq_scatter_offset_p_s vstrhq_scatter_offset_p_u]
-;;
-(define_expand "mve_vstrhq_scatter_offset_p_"
- [(match_operand: 0 "mve_scatter_memory")
- (match_operand:MVE_5 1 "s_register_operand")
- (match_operand:MVE_5 2 "s_register_operand")
- (match_operand: 3 "vpr_register_operand")
- (unspec:V4SI [(const_int 0)] VSTRHSOQ)]
- "TARGET_HAVE_MVE"
-{
- rtx ind = XEXP (operands[0], 0);
- gcc_assert (REG_P (ind));
- emit_insn (
- gen_mve_vstrhq_scatter_offset_p__insn (ind, operands[1],
- operands[2],
- operands[3]));
- DONE;
-})
-
-(define_insn "mve_vstrhq_scatter_offset_p__insn"
- [(set (mem:BLK (scratch))
- (unspec:BLK
- [(match_operand:SI 0 "register_operand" "r")
- (match_operand:MVE_5 1 "s_register_operand" "w")
- (match_operand:MVE_5 2 "s_register_operand" "w")
- (match_operand: 3 "vpr_register_operand" "Up")]
- VSTRHSOQ))]
- "TARGET_HAVE_MVE"
- "vpst\;vstrht.\t%q2, [%0, %q1]"
- [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vstrhq_scatter_offset__insn"))
- (set_attr "length" "8")])
-
-;;
-;; [vstrhq_scatter_offset_s vstrhq_scatter_offset_u]
-;;
-(define_expand "mve_vstrhq_scatter_offset_"
- [(match_operand: 0 "mve_scatter_memory")
- (match_operand:MVE_5 1 "s_register_operand")
- (match_operand:MVE_5 2 "s_register_operand")
- (unspec:V4SI [(const_int 0)] VSTRHSOQ)]
- "TARGET_HAVE_MVE"
-{
- rtx ind = XEXP (operands[0], 0);
- gcc_assert (REG_P (ind));
- emit_insn (gen_mve_vstrhq_scatter_offset__insn (ind, operands[1],
- operands[2]));
- DONE;
-})
-
-(define_insn "mve_vstrhq_scatter_offset__insn"
- [(set (mem:BLK (scratch))
- (unspec:BLK
- [(match_operand:SI 0 "register_operand" "r")
- (match_operand:MVE_5 1 "s_register_operand" "w")
- (match_operand:MVE_5 2 "s_register_operand" "w")]
- VSTRHSOQ))]
- "TARGET_HAVE_MVE"
- "vstrh.\t%q2, [%0, %q1]"
- [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vstrhq_scatter_offset__insn"))
- (set_attr "length" "4")])
-
 ;;
 ;; [vstrhq_scatter_shifted_offset_p_s vstrhq_scatter_shifted_offset_p_u]
 ;;
@@ -4221,67 +4182,6 @@ (define_insn "mve_vstrdq_scatter_base_v2di"
 [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vstrdq_scatter_base_v2di"))
 (set_attr "length" "4")])
-;;
-;; [vstrdq_scatter_offset_p_s vstrdq_scatter_offset_p_u]
-;;
-(define_expand "mve_vstrdq_scatter_offset_p_v2di"
- [(match_operand:V2DI 0 "mve_scatter_memory")
- (match_operand:V2DI 1 "s_register_operand")
- (match_operand:V2DI 2 "s_register_operand")
- (match_operand:V2QI 3 "vpr_register_operand")
- (unspec:V4SI [(const_int 0)] VSTRDSOQ)]
- "TARGET_HAVE_MVE"
-{
- rtx ind = XEXP (operands[0], 0);
- gcc_assert (REG_P (ind));
- emit_insn (gen_mve_vstrdq_scatter_offset_p_v2di_insn (ind, operands[1],
- operands[2],
- operands[3]));
- DONE;
-})
-
-(define_insn "mve_vstrdq_scatter_offset_p_v2di_insn"
- [(set (mem:BLK (scratch))
- (unspec:BLK
- [(match_operand:SI 0 "register_operand" "r")
- (match_operand:V2DI 1 "s_register_operand" "w")
- (match_operand:V2DI 2 "s_register_operand" "w")
- (match_operand:V2QI 3 "vpr_register_operand" "Up")]
- VSTRDSOQ))]
- "TARGET_HAVE_MVE"
- "vpst\;vstrdt.64\t%q2, [%0, %q1]"
- [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vstrdq_scatter_offset_v2di_insn"))
- (set_attr "length" "8")])
-
-;;
-;; [vstrdq_scatter_offset_s vstrdq_scatter_offset_u]
-;;
-(define_expand "mve_vstrdq_scatter_offset_v2di"
- [(match_operand:V2DI 0 "mve_scatter_memory")
- (match_operand:V2DI 1 "s_register_operand")
- (match_operand:V2DI 2 "s_register_operand")
- (unspec:V4SI [(const_int 0)] VSTRDSOQ)]
- "TARGET_HAVE_MVE"
-{
- rtx ind = XEXP (operands[0], 0);
- gcc_assert (REG_P (ind));
- emit_insn (gen_mve_vstrdq_scatter_offset_v2di_insn (ind, operands[1],
- operands[2]));
- DONE;
-})
-
-(define_insn "mve_vstrdq_scatter_offset_v2di_insn"
- [(set (mem:BLK (scratch))
- (unspec:BLK
- [(match_operand:SI 0 "register_operand" "r")
- (match_operand:V2DI 1 "s_register_operand" "w")
- (match_operand:V2DI 2 "s_register_operand" "w")]
- VSTRDSOQ))]
- "TARGET_HAVE_MVE"
- "vstrd.64\t%q2, [%0, %q1]"
- [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vstrdq_scatter_offset_v2di_insn"))
- (set_attr "length" "4")])
-
 ;;
 ;; [vstrdq_scatter_shifted_offset_p_s vstrdq_scatter_shifted_offset_p_u]
 ;;
@@ -4345,67 +4245,6 @@ (define_insn "mve_vstrdq_scatter_shifted_offset_v2di_insn"
 [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vstrdq_scatter_shifted_offset_v2di_insn"))
 (set_attr "length" "4")])
-;;
-;; [vstrhq_scatter_offset_f]
-;;
-(define_expand "mve_vstrhq_scatter_offset_fv8hf"
- [(match_operand:V8HI 0 "mve_scatter_memory")
- (match_operand:V8HI 1 "s_register_operand")
- (match_operand:V8HF 2 "s_register_operand")
- (unspec:V4SI [(const_int 0)] VSTRHQSO_F)]
- "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-{
- rtx ind = XEXP (operands[0], 0);
- gcc_assert (REG_P (ind));
- emit_insn (gen_mve_vstrhq_scatter_offset_fv8hf_insn (ind, operands[1],
- operands[2]));
- DONE;
-})
-
-(define_insn "mve_vstrhq_scatter_offset_fv8hf_insn"
- [(set (mem:BLK (scratch))
- (unspec:BLK
- [(match_operand:SI 0 "register_operand" "r")
- (match_operand:V8HI 1 "s_register_operand" "w")
- (match_operand:V8HF 2 "s_register_operand" "w")]
- VSTRHQSO_F))]
- "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
- "vstrh.16\t%q2, [%0, %q1]"
- [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vstrhq_scatter_offset_fv8hf_insn"))
- (set_attr "length" "4")])
-
-;;
-;; [vstrhq_scatter_offset_p_f]
-;;
-(define_expand "mve_vstrhq_scatter_offset_p_fv8hf"
- [(match_operand:V8HI 0 "mve_scatter_memory")
- (match_operand:V8HI 1 "s_register_operand")
- (match_operand:V8HF 2 "s_register_operand")
- (match_operand:V8BI 3 "vpr_register_operand")
- (unspec:V4SI [(const_int 0)] VSTRHQSO_F)]
- "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-{
- rtx ind = XEXP (operands[0], 0);
- gcc_assert (REG_P (ind));
- emit_insn (gen_mve_vstrhq_scatter_offset_p_fv8hf_insn (ind, operands[1],
- operands[2],
- operands[3]));
- DONE;
-})
-
-(define_insn "mve_vstrhq_scatter_offset_p_fv8hf_insn"
- [(set (mem:BLK (scratch))
- (unspec:BLK
- [(match_operand:SI 0 "register_operand" "r")
- (match_operand:V8HI 1 "s_register_operand" "w")
- (match_operand:V8HF 2 "s_register_operand" "w")
- (match_operand:V8BI 3 "vpr_register_operand" "Up")]
- VSTRHQSO_F))]
- "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
- "vpst\;vstrht.16\t%q2, [%0, %q1]"
- [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vstrhq_scatter_offset_fv8hf_insn"))
- (set_attr "length" "8")])
-
 ;;
 ;; [vstrhq_scatter_shifted_offset_f]
 ;;
@@ -4515,128 +4354,6 @@ (define_insn "mve_vstrwq_scatter_base_p_fv4sf"
 [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vstrwq_scatter_base_fv4sf"))
 (set_attr "length" "8")])
-;;
-;; [vstrwq_scatter_offset_f]
-;;
-(define_expand "mve_vstrwq_scatter_offset_fv4sf"
- [(match_operand:V4SI 0 "mve_scatter_memory")
- (match_operand:V4SI 1 "s_register_operand")
- (match_operand:V4SF 2 "s_register_operand")
- (unspec:V4SI [(const_int 0)] VSTRWQSO_F)]
- "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-{
- rtx ind = XEXP (operands[0], 0);
- gcc_assert (REG_P (ind));
- emit_insn (gen_mve_vstrwq_scatter_offset_fv4sf_insn (ind, operands[1],
- operands[2]));
- DONE;
-})
-
-(define_insn "mve_vstrwq_scatter_offset_fv4sf_insn"
- [(set (mem:BLK (scratch))
- (unspec:BLK
- [(match_operand:SI 0 "register_operand" "r")
- (match_operand:V4SI 1 "s_register_operand" "w")
- (match_operand:V4SF 2 "s_register_operand" "w")]
- VSTRWQSO_F))]
- "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
- "vstrw.32\t%q2, [%0, %q1]"
- [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vstrwq_scatter_offset_fv4sf_insn"))
- (set_attr "length" "4")])
-
-;;
-;; [vstrwq_scatter_offset_p_f]
-;;
-(define_expand "mve_vstrwq_scatter_offset_p_fv4sf"
- [(match_operand:V4SI 0 "mve_scatter_memory")
- (match_operand:V4SI 1 "s_register_operand")
- (match_operand:V4SF 2 "s_register_operand")
- (match_operand:V4BI 3 "vpr_register_operand")
- (unspec:V4SI [(const_int 0)] VSTRWQSO_F)]
- "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-{
- rtx ind = XEXP (operands[0], 0);
- gcc_assert (REG_P (ind));
- emit_insn (gen_mve_vstrwq_scatter_offset_p_fv4sf_insn (ind, operands[1],
- operands[2],
- operands[3]));
- DONE;
-})
-
-(define_insn "mve_vstrwq_scatter_offset_p_fv4sf_insn"
- [(set (mem:BLK (scratch))
- (unspec:BLK
- [(match_operand:SI 0 "register_operand" "r")
- (match_operand:V4SI 1 "s_register_operand" "w")
- (match_operand:V4SF 2 "s_register_operand" "w")
- (match_operand:V4BI 3 "vpr_register_operand" "Up")]
- VSTRWQSO_F))]
- "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
- "vpst\;vstrwt.32\t%q2, [%0, %q1]"
- [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vstrwq_scatter_offset_fv4sf_insn"))
- (set_attr "length" "8")])
-
-;;
-;; [vstrwq_scatter_offset_s vstrwq_scatter_offset_u]
-;;
-(define_expand "mve_vstrwq_scatter_offset_p_v4si"
- [(match_operand:V4SI 0 "mve_scatter_memory")
- (match_operand:V4SI 1 "s_register_operand")
- (match_operand:V4SI 2 "s_register_operand")
- (match_operand:V4BI 3 "vpr_register_operand")
- (unspec:V4SI [(const_int 0)] VSTRWSOQ)]
- "TARGET_HAVE_MVE"
-{
- rtx ind = XEXP (operands[0], 0);
- gcc_assert (REG_P (ind));
- emit_insn (gen_mve_vstrwq_scatter_offset_p_v4si_insn (ind, operands[1],
- operands[2],
- operands[3]));
- DONE;
-})
-
-(define_insn "mve_vstrwq_scatter_offset_p_v4si_insn"
- [(set (mem:BLK (scratch))
- (unspec:BLK
- [(match_operand:SI 0 "register_operand" "r")
- (match_operand:V4SI 1 "s_register_operand" "w")
- (match_operand:V4SI 2 "s_register_operand" "w")
- (match_operand:V4BI 3 "vpr_register_operand" "Up")]
- VSTRWSOQ))]
- "TARGET_HAVE_MVE"
- "vpst\;vstrwt.32\t%q2, [%0, %q1]"
- [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vstrwq_scatter_offset_v4si_insn"))
- (set_attr "length" "8")])
-
-;;
-;; [vstrwq_scatter_offset_s vstrwq_scatter_offset_u]
-;;
-(define_expand "mve_vstrwq_scatter_offset_v4si"
- [(match_operand:V4SI 0 "mve_scatter_memory")
- (match_operand:V4SI 1 "s_register_operand")
- (match_operand:V4SI 2 "s_register_operand")
- (unspec:V4SI [(const_int 0)] VSTRWSOQ)]
- "TARGET_HAVE_MVE"
-{
- rtx ind = XEXP (operands[0], 0);
- gcc_assert (REG_P (ind));
- emit_insn (gen_mve_vstrwq_scatter_offset_v4si_insn (ind, operands[1],
- operands[2]));
- DONE;
-})
-
-(define_insn "mve_vstrwq_scatter_offset_v4si_insn"
- [(set (mem:BLK (scratch))
- (unspec:BLK
- [(match_operand:SI 0 "register_operand" "r")
- (match_operand:V4SI 1 "s_register_operand" "w")
- (match_operand:V4SI 2 "s_register_operand" "w")]
- VSTRWSOQ))]
- "TARGET_HAVE_MVE"
- "vstrw.32\t%q2, [%0, %q1]"
- [(set (attr "mve_unpredicated_insn") (symbol_ref "CODE_FOR_mve_vstrwq_scatter_offset_v4si_insn"))
- (set_attr "length" "4")])
-
 ;;
 ;; [vstrwq_scatter_shifted_offset_f]
 ;;
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 01963d54cd4..5315db580f3 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -1148,8 +1148,10 @@ (define_c_enum "unspec" [
 VSUBQ_M_F
 VSTRWQSB_S
 VSTRWQSB_U
- VSTRBQSO_S
- VSTRBQSO_U
+ VSTRQSO
+ VSTRQSO_P
+ VSTRQSO_TRUNC
+ VSTRQSO_TRUNC_P
 VLDRQ
 VLDRQ_Z
 VLDRQ_EXT
@@ -1181,21 +1183,14 @@ (define_c_enum "unspec" [
 VSTRQ_P
 VSTRQ_TRUNC
 VSTRQ_TRUNC_P
- VSTRHQSO_S
 VSTRDQSB_S
 VSTRDQSB_U
- VSTRDQSO_S
- VSTRDQSO_U
 VSTRDQSSO_S
 VSTRDQSSO_U
- VSTRWQSO_S
- VSTRWQSO_U
 VSTRWQSSO_S
 VSTRWQSSO_U
- VSTRHQSO_F
 VSTRHQSSO_F
 VSTRWQSB_F
- VSTRWQSO_F
 VSTRWQSSO_F
 VDDUPQ
 VDDUPQ_M
@@ -1236,7 +1231,6 @@ (define_c_enum "unspec" [
 VST2Q
 VSHLCQ_M_U
 VSHLCQ_M_S
- VSTRHQSO_U
 VSTRHQSSO_S
 VSTRHQSSO_U
 VSTRHQ_S