From patchwork Mon Dec 9 15:05:29 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Christophe Lyon
X-Patchwork-Id: 2020120
From: Christophe Lyon
To: gcc-patches@gcc.gnu.org, richard.earnshaw@arm.com,
 andre.simoesdiasvieira@arm.com, ramanara@nvidia.com
Cc: Christophe Lyon
Subject: [PATCH v2 1/4] arm: [MVE intrinsics] add modes for tuples
Date: Mon, 9 Dec 2024 15:05:29 +0000
Message-Id: <20241209150532.2174817-2-christophe.lyon@linaro.org>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20241209150532.2174817-1-christophe.lyon@linaro.org>
References: <20241209150532.2174817-1-christophe.lyon@linaro.org>
MIME-Version: 1.0
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.30
Precedence: list
List-Id: Gcc-patches mailing list

Add V2x and V4x modes, like we do on aarch64 for Advanced SIMD
q-registers.

gcc/ChangeLog:

	* config/arm/arm-modes.def (MVE_STRUCT_MODES): New.
---
 gcc/config/arm/arm-modes.def | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/gcc/config/arm/arm-modes.def b/gcc/config/arm/arm-modes.def
index 4d592ad3cfb..4774dfbcd4b 100644
--- a/gcc/config/arm/arm-modes.def
+++ b/gcc/config/arm/arm-modes.def
@@ -105,3 +105,25 @@ INT_MODE (EI, 24);
 INT_MODE (OI, 32);
 INT_MODE (CI, 48);
 INT_MODE (XI, 64);
+
+/* Define MVE modes for structures of 2 and 4 q-registers.  */
+#define MVE_STRUCT_MODES(NVECS, VB, VH, VS, VD) \
+  VECTOR_MODES_WITH_PREFIX (V##NVECS##x, INT, 16, 3); \
+  VECTOR_MODES_WITH_PREFIX (V##NVECS##x, FLOAT, 16, 3); \
+  \
+  ADJUST_NUNITS (VB##QI, NVECS * 16); \
+  ADJUST_NUNITS (VH##HI, NVECS * 8); \
+  ADJUST_NUNITS (VS##SI, NVECS * 4); \
+  ADJUST_NUNITS (VD##DI, NVECS * 2); \
+  ADJUST_NUNITS (VH##HF, NVECS * 8); \
+  ADJUST_NUNITS (VS##SF, NVECS * 4); \
+  \
+  ADJUST_ALIGNMENT (VB##QI, 16); \
+  ADJUST_ALIGNMENT (VH##HI, 16); \
+  ADJUST_ALIGNMENT (VS##SI, 16); \
+  ADJUST_ALIGNMENT (VD##DI, 16); \
+  ADJUST_ALIGNMENT (VH##HF, 16); \
+  ADJUST_ALIGNMENT (VS##SF, 16);
+
+MVE_STRUCT_MODES (2, V2x16, V2x8, V2x4, V2x2)
+MVE_STRUCT_MODES (4, V4x16, V4x8, V4x4, V4x2)

From patchwork Mon Dec 9 15:05:30 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Christophe Lyon
X-Patchwork-Id: 2020121
From: Christophe Lyon
To: gcc-patches@gcc.gnu.org, richard.earnshaw@arm.com,
 andre.simoesdiasvieira@arm.com, ramanara@nvidia.com
Cc: Christophe Lyon
Subject: [PATCH v2 2/4] arm: [MVE intrinsics] add support for tuples
Date: Mon, 9 Dec 2024 15:05:30 +0000
Message-Id: <20241209150532.2174817-3-christophe.lyon@linaro.org>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20241209150532.2174817-1-christophe.lyon@linaro.org>
References: <20241209150532.2174817-1-christophe.lyon@linaro.org>
MIME-Version: 1.0

This patch is largely a copy/paste from the aarch64 SVE counterpart,
and adds support for tuples to the MVE intrinsics framework.

Introduce function_resolver::infer_tuple_type which will be used to
resolve overloaded vst2q and vst4q function names in a later patch.

Fix access to acle_vector_types in a few places, as well as in
infer_vector_or_tuple_type because we should shift the tuple size to
the right by one bit when computing the array index.

The new wrap_type_in_struct, register_type_decl and infer_tuple_type
are largely copies of the aarch64 versions, and
register_builtin_tuple_types is very similar.

gcc/ChangeLog:

	* config/arm/arm-mve-builtins-shapes.cc (parse_type): Fix access
	to acle_vector_types.
	* config/arm/arm-mve-builtins.cc (wrap_type_in_struct): New.
	(register_type_decl): New.
	(register_builtin_tuple_types): Fix support for tuples.
	(function_resolver::infer_tuple_type): New.
	* config/arm/arm-mve-builtins.h
	(function_resolver::infer_tuple_type): Declare.
	(function_instance::tuple_type): Fix access to acle_vector_types.
---
 gcc/config/arm/arm-mve-builtins-shapes.cc |  2 +-
 gcc/config/arm/arm-mve-builtins.cc        | 76 +++++++++++++++++++----
 gcc/config/arm/arm-mve-builtins.h         |  3 +-
 3 files changed, 68 insertions(+), 13 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-mve-builtins-shapes.cc
index d7cfdca3acd..58ea2f5b988 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -205,7 +205,7 @@ parse_type (const function_instance &instance, const char *&format)
       type_suffix_index suffix = parse_element_type (instance, format);
       vector_type_index vector_type = type_suffixes[suffix].vector_type;
       unsigned int num_vectors = instance.vectors_per_tuple ();
-      return acle_vector_types[num_vectors - 1][vector_type];
+      return acle_vector_types[num_vectors >> 1][vector_type];
     }
 
   if (ch == 'v')
diff --git a/gcc/config/arm/arm-mve-builtins.cc b/gcc/config/arm/arm-mve-builtins.cc
index a70a9606e8b..0a7ffcfa546 100644
--- a/gcc/config/arm/arm-mve-builtins.cc
+++ b/gcc/config/arm/arm-mve-builtins.cc
@@ -38,6 +38,7 @@
 #include "gimple-iterator.h"
 #include "explow.h"
 #include "emit-rtl.h"
+#include "stor-layout.h"
 #include "langhooks.h"
 #include "stringpool.h"
 #include "attribs.h"
@@ -462,6 +463,48 @@ register_vector_type (vector_type_index type)
   acle_vector_types[0][type] = vectype;
 }
 
+/* Return a structure type that contains a single field of type FIELD_TYPE.
+   The field is called __val, but that's an internal detail rather than
+   an exposed part of the API.  */
+static tree
+wrap_type_in_struct (tree field_type)
+{
+  tree field = build_decl (input_location, FIELD_DECL,
+			   get_identifier ("__val"), field_type);
+  tree struct_type = lang_hooks.types.make_type (RECORD_TYPE);
+  DECL_FIELD_CONTEXT (field) = struct_type;
+  TYPE_FIELDS (struct_type) = field;
+  layout_type (struct_type);
+  return struct_type;
+}
+
+/* Register a built-in TYPE_DECL called NAME for TYPE.  This is used/needed
+   when TYPE is a structure type.  */
+static void
+register_type_decl (tree type, const char *name)
+{
+  tree decl = build_decl (input_location, TYPE_DECL,
+			  get_identifier (name), type);
+  TYPE_NAME (type) = decl;
+  TYPE_STUB_DECL (type) = decl;
+  lang_hooks.decls.pushdecl (decl);
+  /* ??? Undo the effect of set_underlying_type for C.  The C frontend
+     doesn't recognize DECL as a built-in because (as intended) the decl has
+     a real location instead of BUILTINS_LOCATION.  The frontend therefore
+     treats the decl like a normal C "typedef struct foo foo;", expecting
+     the type for tag "struct foo" to have a dummy unnamed TYPE_DECL instead
+     of the named one we attached above.  It then sets DECL_ORIGINAL_TYPE
+     on the supposedly unnamed decl, creating a circularity that upsets
+     dwarf2out.
+
+     We don't want to follow the normal C model and create "struct foo"
+     tags for tuple types since (a) the types are supposed to be opaque
+     and (b) they couldn't be defined as a real struct anyway.  Treating
+     the TYPE_DECLs as "typedef struct foo foo;" without creating
+     "struct foo" would lead to confusing error messages.  */
+  DECL_ORIGINAL_TYPE (decl) = NULL_TREE;
+}
+
 /* Register tuple types of element type TYPE under their arm_mve_types.h
    names.  */
 static void
@@ -479,7 +522,7 @@ register_builtin_tuple_types (vector_type_index type)
     {
       for (unsigned int num_vectors = 2; num_vectors <= 4; num_vectors += 2)
 	acle_vector_types[num_vectors >> 1][type] = void_type_node;
-	return;
+      return;
     }
 
   const char *vector_type_name = info->acle_name;
@@ -492,15 +535,16 @@ register_builtin_tuple_types (vector_type_index type)
 
       tree vectype = acle_vector_types[0][type];
       tree arrtype = build_array_type_nelts (vectype, num_vectors);
-      gcc_assert (TYPE_MODE_RAW (arrtype) == TYPE_MODE (arrtype));
-      tree field = build_decl (input_location, FIELD_DECL,
-			       get_identifier ("val"), arrtype);
-
-      tree t = lang_hooks.types.simulate_record_decl (input_location, buffer,
-						      make_array_slice (&field,
-									1));
-      gcc_assert (TYPE_MODE_RAW (t) == TYPE_MODE (t));
-      acle_vector_types[num_vectors >> 1][type] = TREE_TYPE (t);
+      gcc_assert (TYPE_MODE_RAW (arrtype) == TYPE_MODE (arrtype)
+		  && TYPE_ALIGN (arrtype) == 64);
+
+      tree tuple_type = wrap_type_in_struct (arrtype);
+      gcc_assert (TYPE_MODE_RAW (tuple_type) == TYPE_MODE (tuple_type)
+		  && TYPE_ALIGN (tuple_type) == 64);
+
+      register_type_decl (tuple_type, buffer);
+
+      acle_vector_types[num_vectors >> 1][type] = tuple_type;
     }
 }
 
@@ -1293,7 +1337,7 @@ function_resolver::infer_vector_or_tuple_type (unsigned int argno,
 	tree type = acle_vector_types[size_i][type_i];
 	if (type && matches_type_p (type, actual))
 	  {
-	    if (size_i + 1 == num_vectors)
+	    if (size_i == (num_vectors >> 1))
 	      return type_suffix_index (suffix_i);
 
 	    if (num_vectors == 1)
@@ -1334,6 +1378,16 @@ function_resolver::infer_vector_type (unsigned int argno)
   return infer_vector_or_tuple_type (argno, 1);
 }
 
+/* If the function operates on tuples of vectors, require argument ARGNO to be
+   a tuple with the appropriate number of vectors, otherwise require it to be a
+   single vector.  Return the associated type suffix on success.  Report an
+   error and return NUM_TYPE_SUFFIXES on failure.  */
+type_suffix_index
+function_resolver::infer_tuple_type (unsigned int argno)
+{
+  return infer_vector_or_tuple_type (argno, vectors_per_tuple ());
+}
+
 /* Require argument ARGNO to be a vector or scalar argument.  Return true if it
    is, otherwise report an appropriate error.  */
 bool
diff --git a/gcc/config/arm/arm-mve-builtins.h b/gcc/config/arm/arm-mve-builtins.h
index c6a929c3eee..b0b3b512408 100644
--- a/gcc/config/arm/arm-mve-builtins.h
+++ b/gcc/config/arm/arm-mve-builtins.h
@@ -387,6 +387,7 @@ public:
   type_suffix_index infer_pointer_type (unsigned int);
   type_suffix_index infer_vector_or_tuple_type (unsigned int, unsigned int);
   type_suffix_index infer_vector_type (unsigned int);
+  type_suffix_index infer_tuple_type (unsigned int);
 
   bool require_vector_or_scalar_type (unsigned int);
 
@@ -733,7 +734,7 @@ inline tree
 function_instance::tuple_type (unsigned int i) const
 {
   unsigned int num_vectors = vectors_per_tuple ();
-  return acle_vector_types[num_vectors - 1][type_suffix (i).vector_type];
+  return acle_vector_types[num_vectors >> 1][type_suffix (i).vector_type];
 }
 
 /* Return the vector or predicate mode associated with type suffix I.  */

From patchwork Mon Dec 9 15:05:31 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Christophe Lyon
X-Patchwork-Id: 2020124
From: Christophe Lyon
To: gcc-patches@gcc.gnu.org, richard.earnshaw@arm.com,
 andre.simoesdiasvieira@arm.com, ramanara@nvidia.com
Cc: Christophe Lyon
Subject: [PATCH v2 3/4] arm: [MVE intrinsics] fix store shape to support tuples
Date: Mon, 9 Dec 2024 15:05:31 +0000
Message-Id: <20241209150532.2174817-4-christophe.lyon@linaro.org>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20241209150532.2174817-1-christophe.lyon@linaro.org>
References: <20241209150532.2174817-1-christophe.lyon@linaro.org>
MIME-Version: 1.0

Now that tuples are properly supported, we can update the store shape
to expect "t0" instead of "v0" as last argument.

gcc/ChangeLog:

	* config/arm/arm-mve-builtins-shapes.cc (struct store_def): Add
	support for tuples.
---
 gcc/config/arm/arm-mve-builtins-shapes.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-mve-builtins-shapes.cc
index 58ea2f5b988..5b45ee2f465 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -1701,7 +1701,7 @@ struct store_def : public overloaded_base<0>
 	 bool preserve_user_namespace) const override
   {
     b.add_overloaded_functions (group, MODE_none, preserve_user_namespace);
-    build_all (b, "_,as,v0", group, MODE_none, preserve_user_namespace);
+    build_all (b, "_,as,t0", group, MODE_none, preserve_user_namespace);
   }
 
   tree
@@ -1713,7 +1713,7 @@ struct store_def : public overloaded_base<0>
     type_suffix_index type;
     if (!r.check_gp_argument (2, i, nargs)
 	|| !r.require_pointer_type (0)
-	|| (type = r.infer_vector_type (1)) == NUM_TYPE_SUFFIXES)
+	|| (type = r.infer_tuple_type (1)) == NUM_TYPE_SUFFIXES)
       return error_mark_node;
 
     return r.resolve_to (r.mode_suffix_id, type);

From patchwork Mon Dec 9 15:05:32 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Christophe Lyon
X-Patchwork-Id: 2020122
From: Christophe Lyon
To: gcc-patches@gcc.gnu.org, richard.earnshaw@arm.com, andre.simoesdiasvieira@arm.com, ramanara@nvidia.com
Cc: Christophe Lyon
Subject: [PATCH v2 4/4] arm: [MVE intrinsics] rework vst2q vst4q vld2q vld4q
Date: Mon, 9 Dec 2024 15:05:32 +0000
Message-Id: <20241209150532.2174817-5-christophe.lyon@linaro.org>
In-Reply-To: <20241209150532.2174817-1-christophe.lyon@linaro.org>
References: <20241209150532.2174817-1-christophe.lyon@linaro.org>

Implement vst2q, vst4q, vld2q and vld4q using the new MVE builtins framework.
Since MVE uses different tuple modes than Neon, we need to use VALID_MVE_STRUCT_MODE because VALID_NEON_STRUCT_MODE is no longer a superset of it, for instance in output_move_neon and arm_print_operand_address. In arm_hard_regno_mode_ok, the change is similar but a bit more intrusive.

Expand the VSTRUCT iterator, so that mov and neon_mov patterns from neon.md still work for MVE.

Besides the small updates to the patterns in mve.md, we have to update vec_load_lanes and vec_store_lanes in vec-common.md so that the vectorizer can handle the new modes. These patterns are now different from Neon's, so maybe we should move them back to neon.md and mve.md.

The patch adds arm_array_mode, which is used by build_array_type_nelts and makes it possible to support the new assert in register_builtin_tuple_types.

gcc/ChangeLog: * config/arm/arm-mve-builtins-base.cc (class vst24_impl): New. (class vld24_impl): New. (vld2q, vld4q, vst2q, vst4q): New. * config/arm/arm-mve-builtins-base.def (vld2q, vld4q, vst2q) (vst4q): New. * config/arm/arm-mve-builtins-base.h (vld2q, vld4q, vst2q, vst4q): New. * config/arm/arm-mve-builtins.cc (register_builtin_tuple_types): Add more asserts. * config/arm/arm.cc (TARGET_ARRAY_MODE): New. (output_move_neon): Handle MVE struct modes. (arm_print_operand_address): Likewise. (arm_hard_regno_mode_ok): Likewise. (arm_array_mode): New. * config/arm/arm.h (VALID_MVE_STRUCT_MODE): Likewise. * config/arm/arm_mve.h (vst4q): Delete. (vst2q): Delete. (vld2q): Delete. (vld4q): Delete. (vst4q_s8): Delete. (vst4q_s16): Delete. (vst4q_s32): Delete. (vst4q_u8): Delete. (vst4q_u16): Delete. (vst4q_u32): Delete. (vst4q_f16): Delete. (vst4q_f32): Delete. (vst2q_s8): Delete. (vst2q_u8): Delete. (vld2q_s8): Delete. (vld2q_u8): Delete. (vld4q_s8): Delete. (vld4q_u8): Delete. (vst2q_s16): Delete. (vst2q_u16): Delete. (vld2q_s16): Delete. (vld2q_u16): Delete. (vld4q_s16): Delete. (vld4q_u16): Delete. (vst2q_s32): Delete. (vst2q_u32): Delete. (vld2q_s32): Delete.
(vld2q_u32): Delete. (vld4q_s32): Delete. (vld4q_u32): Delete. (vld4q_f16): Delete. (vld2q_f16): Delete. (vst2q_f16): Delete. (vld4q_f32): Delete. (vld2q_f32): Delete. (vst2q_f32): Delete. (__arm_vst4q_s8): Delete. (__arm_vst4q_s16): Delete. (__arm_vst4q_s32): Delete. (__arm_vst4q_u8): Delete. (__arm_vst4q_u16): Delete. (__arm_vst4q_u32): Delete. (__arm_vst2q_s8): Delete. (__arm_vst2q_u8): Delete. (__arm_vld2q_s8): Delete. (__arm_vld2q_u8): Delete. (__arm_vld4q_s8): Delete. (__arm_vld4q_u8): Delete. (__arm_vst2q_s16): Delete. (__arm_vst2q_u16): Delete. (__arm_vld2q_s16): Delete. (__arm_vld2q_u16): Delete. (__arm_vld4q_s16): Delete. (__arm_vld4q_u16): Delete. (__arm_vst2q_s32): Delete. (__arm_vst2q_u32): Delete. (__arm_vld2q_s32): Delete. (__arm_vld2q_u32): Delete. (__arm_vld4q_s32): Delete. (__arm_vld4q_u32): Delete. (__arm_vst4q_f16): Delete. (__arm_vst4q_f32): Delete. (__arm_vld4q_f16): Delete. (__arm_vld2q_f16): Delete. (__arm_vst2q_f16): Delete. (__arm_vld4q_f32): Delete. (__arm_vld2q_f32): Delete. (__arm_vst2q_f32): Delete. (__arm_vst4q): Delete. (__arm_vst2q): Delete. (__arm_vld2q): Delete. (__arm_vld4q): Delete. * config/arm/arm_mve_builtins.def (vst4q, vst2q, vld4q, vld2q): Delete. * config/arm/iterators.md (VSTRUCT): Add V2x16QI, V2x8HI, V2x4SI, V2x8HF, V2x4SF, V4x16QI, V4x8HI, V4x4SI, V4x8HF, V4x4SF. (MVE_VLD2_VST2, MVE_vld2_vst2, MVE_VLD4_VST4, MVE_vld4_vst4): New. * config/arm/mve.md (mve_vst4q): Update into ... (@mve_vst4q): ... this. (mve_vst2q): Update into ... (@mve_vst2q): ... this. (mve_vld2q): Update into ... (@mve_vld2q): ... this. (mve_vld4q): Update into ... (@mve_vld4q): ... this. * config/arm/vec-common.md (vec_load_lanesoi): Remove MVE support. (vec_load_lanesxi): Likewise. (vec_store_lanesoi): Likewise. (vec_store_lanesxi): Likewise. (vec_load_lanes): New. (vec_store_lanes): New. (vec_load_lanes): New. (vec_store_lanes): New.
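
For reviewers less familiar with these intrinsics, here is a minimal scalar sketch of the de-interleaving and interleaving that vld2q and vst2q are expected to perform for 32-bit elements. It is only an illustration in portable C: the model_* names and the struct are hypothetical stand-ins, not part of arm_mve.h or of this patch, and the real intrinsics operate on MVE Q registers through the builtins framework.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for int32x4x2_t: a tuple of two 4-lane vectors.  */
typedef struct
{
  int32_t val[2][4];
} model_int32x4x2_t;

/* Scalar model (assumption, for illustration only) of vld2q_s32:
   read 8 consecutive elements and de-interleave them, so that
   val[0] receives the even-indexed and val[1] the odd-indexed ones.  */
static model_int32x4x2_t
model_vld2q_s32 (const int32_t *addr)
{
  model_int32x4x2_t r;
  for (int lane = 0; lane < 4; lane++)
    {
      r.val[0][lane] = addr[2 * lane];
      r.val[1][lane] = addr[2 * lane + 1];
    }
  return r;
}

/* Scalar model (assumption, for illustration only) of vst2q_s32:
   the inverse operation, interleaving the two vectors into memory.  */
static void
model_vst2q_s32 (int32_t *addr, model_int32x4x2_t value)
{
  for (int lane = 0; lane < 4; lane++)
    {
      addr[2 * lane] = value.val[0][lane];
      addr[2 * lane + 1] = value.val[1][lane];
    }
}
```

The vld4q and vst4q variants follow the same scheme with a stride of 4 and a tuple of four vectors.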
--- gcc/config/arm/arm-mve-builtins-base.cc | 71 +++ gcc/config/arm/arm-mve-builtins-base.def | 8 + gcc/config/arm/arm-mve-builtins-base.h | 4 + gcc/config/arm/arm-mve-builtins.cc | 6 +- gcc/config/arm/arm.cc | 43 +- gcc/config/arm/arm.h | 13 +- gcc/config/arm/arm_mve.h | 628 ----------------------- gcc/config/arm/arm_mve_builtins.def | 4 - gcc/config/arm/iterators.md | 36 +- gcc/config/arm/mve.md | 47 +- gcc/config/arm/vec-common.md | 76 ++- 11 files changed, 253 insertions(+), 683 deletions(-) diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc index 737403527a9..723004b53d7 100644 --- a/gcc/config/arm/arm-mve-builtins-base.cc +++ b/gcc/config/arm/arm-mve-builtins-base.cc @@ -1100,6 +1100,73 @@ public: } }; + +/* Implements vst2 and vst4. */ +class vst24_impl : public full_width_access +{ +public: + using full_width_access::full_width_access; + + unsigned int + call_properties (const function_instance &) const override + { + return CP_WRITE_MEMORY; + } + + rtx + expand (function_expander &e) const override + { + insn_code icode; + switch (vectors_per_tuple ()) + { + case 2: + icode = code_for_mve_vst2q (e.vector_mode (0)); + break; + + case 4: + icode = code_for_mve_vst4q (e.vector_mode (0)); + break; + + default: + gcc_unreachable (); + } + return e.use_contiguous_store_insn (icode); + } +}; + +/* Implements vld2 and vld4. 
*/ +class vld24_impl : public full_width_access +{ +public: + using full_width_access::full_width_access; + + unsigned int + call_properties (const function_instance &) const override + { + return CP_READ_MEMORY; + } + + rtx + expand (function_expander &e) const override + { + insn_code icode; + switch (vectors_per_tuple ()) + { + case 2: + icode = code_for_mve_vld2q (e.vector_mode (0)); + break; + + case 4: + icode = code_for_mve_vld4q (e.vector_mode (0)); + break; + + default: + gcc_unreachable (); + } + return e.use_contiguous_load_insn (icode); + } +}; + } /* end anonymous namespace */ namespace arm_mve { @@ -1326,6 +1393,8 @@ FUNCTION (vfmsq, unspec_mve_function_exact_insn, (-1, -1, VFMSQ_F, -1, -1, -1, - FUNCTION_WITH_M_N_NO_F (vhaddq, VHADDQ) FUNCTION_WITH_M_N_NO_F (vhsubq, VHSUBQ) FUNCTION (vld1q, vld1_impl,) +FUNCTION (vld2q, vld24_impl, (2)) +FUNCTION (vld4q, vld24_impl, (4)) FUNCTION (vldrbq, vldrq_impl, (TYPE_SUFFIX_s8, TYPE_SUFFIX_u8)) FUNCTION (vldrbq_gather, vldrq_gather_impl, (false, TYPE_SUFFIX_s8, TYPE_SUFFIX_u8)) FUNCTION (vldrdq_gather, vldrq_gather_impl, (false, TYPE_SUFFIX_s64, TYPE_SUFFIX_u64, NUM_TYPE_SUFFIXES)) @@ -1458,6 +1527,8 @@ FUNCTION_ONLY_N_NO_F (vshrq, VSHRQ) FUNCTION_ONLY_N_NO_F (vsliq, VSLIQ) FUNCTION_ONLY_N_NO_F (vsriq, VSRIQ) FUNCTION (vst1q, vst1_impl,) +FUNCTION (vst2q, vst24_impl, (2)) +FUNCTION (vst4q, vst24_impl, (4)) FUNCTION (vstrbq, vstrq_impl, (QImode, opt_scalar_mode ())) FUNCTION (vstrbq_scatter, vstrq_scatter_impl, (false, QImode, opt_scalar_mode ())) FUNCTION (vstrdq_scatter, vstrq_scatter_impl, (false, DImode, opt_scalar_mode ())) diff --git a/gcc/config/arm/arm-mve-builtins-base.def b/gcc/config/arm/arm-mve-builtins-base.def index 223d20436e0..73d70af1072 100644 --- a/gcc/config/arm/arm-mve-builtins-base.def +++ b/gcc/config/arm/arm-mve-builtins-base.def @@ -59,6 +59,8 @@ DEF_MVE_FUNCTION (vhsubq, binary_opt_n, all_integer, mx_or_none) DEF_MVE_FUNCTION (vidupq, viddup, all_unsigned, mx_or_none) DEF_MVE_FUNCTION 
(viwdupq, vidwdup, all_unsigned, mx_or_none) DEF_MVE_FUNCTION (vld1q, load, all_integer, z_or_none) +DEF_MVE_FUNCTION (vld2q, load, all_integer, none) +DEF_MVE_FUNCTION (vld4q, load, all_integer, none) DEF_MVE_FUNCTION (vldrbq, load_ext, all_integer, z_or_none) DEF_MVE_FUNCTION (vldrbq_gather, load_ext_gather_offset, all_integer, z_or_none) DEF_MVE_FUNCTION (vldrdq_gather, load_ext_gather_offset, integer_64, z_or_none) @@ -179,6 +181,8 @@ DEF_MVE_FUNCTION (vshrq, binary_rshift, all_integer, mx_or_none) DEF_MVE_FUNCTION (vsliq, ternary_lshift, all_integer, m_or_none) DEF_MVE_FUNCTION (vsriq, ternary_rshift, all_integer, m_or_none) DEF_MVE_FUNCTION (vst1q, store, all_integer, p_or_none) +DEF_MVE_FUNCTION (vst2q, store, all_integer, none) +DEF_MVE_FUNCTION (vst4q, store, all_integer, none) DEF_MVE_FUNCTION (vstrbq, store, all_integer, p_or_none) DEF_MVE_FUNCTION (vstrbq_scatter, store_scatter_offset, all_integer, p_or_none) DEF_MVE_FUNCTION (vstrhq, store, integer_16_32, p_or_none) @@ -234,6 +238,8 @@ DEF_MVE_FUNCTION (vfmaq, ternary_opt_n, all_float, m_or_none) DEF_MVE_FUNCTION (vfmasq, ternary_n, all_float, m_or_none) DEF_MVE_FUNCTION (vfmsq, ternary, all_float, m_or_none) DEF_MVE_FUNCTION (vld1q, load, all_float, z_or_none) +DEF_MVE_FUNCTION (vld2q, load, all_float, none) +DEF_MVE_FUNCTION (vld4q, load, all_float, none) DEF_MVE_FUNCTION (vldrhq, load_ext, float_16, z_or_none) DEF_MVE_FUNCTION (vldrhq_gather, load_ext_gather_offset, float_16, z_or_none) DEF_MVE_FUNCTION (vldrhq_gather_shifted, load_ext_gather_offset, float_16, z_or_none) @@ -264,6 +270,8 @@ DEF_MVE_FUNCTION (vrndpq, unary, all_float, mx_or_none) DEF_MVE_FUNCTION (vrndq, unary, all_float, mx_or_none) DEF_MVE_FUNCTION (vrndxq, unary, all_float, mx_or_none) DEF_MVE_FUNCTION (vst1q, store, all_float, p_or_none) +DEF_MVE_FUNCTION (vst2q, store, all_float, none) +DEF_MVE_FUNCTION (vst4q, store, all_float, none) DEF_MVE_FUNCTION (vstrhq, store, float_16, p_or_none) DEF_MVE_FUNCTION (vstrhq_scatter, 
store_scatter_offset, float_16, p_or_none) DEF_MVE_FUNCTION (vstrhq_scatter_shifted, store_scatter_offset, float_16, p_or_none) diff --git a/gcc/config/arm/arm-mve-builtins-base.h b/gcc/config/arm/arm-mve-builtins-base.h index 3bc1e933bfc..362eef5940a 100644 --- a/gcc/config/arm/arm-mve-builtins-base.h +++ b/gcc/config/arm/arm-mve-builtins-base.h @@ -82,6 +82,8 @@ extern const function_base *const vhsubq; extern const function_base *const vidupq; extern const function_base *const viwdupq; extern const function_base *const vld1q; +extern const function_base *const vld2q; +extern const function_base *const vld4q; extern const function_base *const vldrbq; extern const function_base *const vldrbq_gather; extern const function_base *const vldrdq_gather; @@ -214,6 +216,8 @@ extern const function_base *const vshrq; extern const function_base *const vsliq; extern const function_base *const vsriq; extern const function_base *const vst1q; +extern const function_base *const vst2q; +extern const function_base *const vst4q; extern const function_base *const vstrbq; extern const function_base *const vstrbq_scatter; extern const function_base *const vstrdq_scatter; diff --git a/gcc/config/arm/arm-mve-builtins.cc b/gcc/config/arm/arm-mve-builtins.cc index 0a7ffcfa546..8570e18fd96 100644 --- a/gcc/config/arm/arm-mve-builtins.cc +++ b/gcc/config/arm/arm-mve-builtins.cc @@ -535,11 +535,13 @@ register_builtin_tuple_types (vector_type_index type) tree vectype = acle_vector_types[0][type]; tree arrtype = build_array_type_nelts (vectype, num_vectors); - gcc_assert (TYPE_MODE_RAW (arrtype) == TYPE_MODE (arrtype) + gcc_assert (VECTOR_MODE_P (TYPE_MODE (arrtype)) + && TYPE_MODE_RAW (arrtype) == TYPE_MODE (arrtype) && TYPE_ALIGN (arrtype) == 64); tree tuple_type = wrap_type_in_struct (arrtype); - gcc_assert (TYPE_MODE_RAW (tuple_type) == TYPE_MODE (tuple_type) + gcc_assert (VECTOR_MODE_P (TYPE_MODE (tuple_type)) + && TYPE_MODE_RAW (tuple_type) == TYPE_MODE (tuple_type) && TYPE_ALIGN 
(tuple_type) == 64); register_type_decl (tuple_type, buffer); diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc index 4ee6fc9d670..777c737d1ff 100644 --- a/gcc/config/arm/arm.cc +++ b/gcc/config/arm/arm.cc @@ -278,6 +278,7 @@ static rtx_insn *arm_pic_static_addr (rtx orig, rtx reg); static bool cortex_a9_sched_adjust_cost (rtx_insn *, int, rtx_insn *, int *); static bool xscale_sched_adjust_cost (rtx_insn *, int, rtx_insn *, int *); static bool fa726te_sched_adjust_cost (rtx_insn *, int, rtx_insn *, int *); +static opt_machine_mode arm_array_mode (machine_mode, unsigned HOST_WIDE_INT); static bool arm_array_mode_supported_p (machine_mode, unsigned HOST_WIDE_INT); static machine_mode arm_preferred_simd_mode (scalar_mode); @@ -515,6 +516,8 @@ static const scoped_attribute_specs *const arm_attribute_table[] = #define TARGET_SHIFT_TRUNCATION_MASK arm_shift_truncation_mask #undef TARGET_VECTOR_MODE_SUPPORTED_P #define TARGET_VECTOR_MODE_SUPPORTED_P arm_vector_mode_supported_p +#undef TARGET_ARRAY_MODE +#define TARGET_ARRAY_MODE arm_array_mode #undef TARGET_ARRAY_MODE_SUPPORTED_P #define TARGET_ARRAY_MODE_SUPPORTED_P arm_array_mode_supported_p #undef TARGET_VECTORIZE_PREFERRED_SIMD_MODE @@ -20774,7 +20777,9 @@ output_move_neon (rtx *operands) || NEON_REGNO_OK_FOR_QUAD (regno)); gcc_assert (VALID_NEON_DREG_MODE (mode) || VALID_NEON_QREG_MODE (mode) - || VALID_NEON_STRUCT_MODE (mode)); + || VALID_NEON_STRUCT_MODE (mode) + || (TARGET_HAVE_MVE + && VALID_MVE_STRUCT_MODE (mode))); gcc_assert (MEM_P (mem)); addr = XEXP (mem, 0); @@ -24949,7 +24954,8 @@ arm_print_operand_address (FILE *stream, machine_mode mode, rtx x) REGNO (XEXP (x, 0)), GET_CODE (x) == PRE_DEC ? 
"-" : "", GET_MODE_SIZE (mode)); - else if (TARGET_HAVE_MVE && (mode == OImode || mode == XImode)) + else if (TARGET_HAVE_MVE + && VALID_MVE_STRUCT_MODE (mode)) asm_fprintf (stream, "[%r]!", REGNO (XEXP (x,0))); else asm_fprintf (stream, "[%r], #%s%d", REGNO (XEXP (x, 0)), @@ -25839,7 +25845,17 @@ arm_hard_regno_mode_ok (unsigned int regno, machine_mode mode) if (TARGET_HAVE_MVE) return ((VALID_MVE_MODE (mode) && NEON_REGNO_OK_FOR_QUAD (regno)) || (mode == OImode && NEON_REGNO_OK_FOR_NREGS (regno, 4)) - || (mode == XImode && NEON_REGNO_OK_FOR_NREGS (regno, 8))); + || (mode == V2x16QImode && NEON_REGNO_OK_FOR_NREGS (regno, 4)) + || (mode == V2x8HImode && NEON_REGNO_OK_FOR_NREGS (regno, 4)) + || (mode == V2x4SImode && NEON_REGNO_OK_FOR_NREGS (regno, 4)) + || (mode == V2x8HFmode && NEON_REGNO_OK_FOR_NREGS (regno, 4)) + || (mode == V2x4SFmode && NEON_REGNO_OK_FOR_NREGS (regno, 4)) + || (mode == XImode && NEON_REGNO_OK_FOR_NREGS (regno, 8)) + || (mode == V4x16QImode && NEON_REGNO_OK_FOR_NREGS (regno, 8)) + || (mode == V4x8HImode && NEON_REGNO_OK_FOR_NREGS (regno, 8)) + || (mode == V4x4SImode && NEON_REGNO_OK_FOR_NREGS (regno, 8)) + || (mode == V4x8HFmode && NEON_REGNO_OK_FOR_NREGS (regno, 8)) + || (mode == V4x4SFmode && NEON_REGNO_OK_FOR_NREGS (regno, 8))); return false; } @@ -29785,6 +29801,27 @@ arm_vector_mode_supported_p (machine_mode mode) return false; } +/* Implements target hook array_mode. */ +static opt_machine_mode +arm_array_mode (machine_mode mode, unsigned HOST_WIDE_INT nelems) +{ + if (TARGET_HAVE_MVE + /* MVE accepts only tuples of 2 or 4 vectors. */ + && (nelems == 2 + || nelems == 4)) + { + machine_mode struct_mode; + FOR_EACH_MODE_IN_CLASS (struct_mode, GET_MODE_CLASS (mode)) + { + if (GET_MODE_INNER (struct_mode) == GET_MODE_INNER (mode) + && known_eq (GET_MODE_NUNITS (struct_mode), + GET_MODE_NUNITS (mode) * nelems)) + return struct_mode; + } + } + return opt_machine_mode (); +} + /* Implements target hook array_mode_supported_p. 
*/ static bool diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index 13a90d854d2..b2044db938b 100644 --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -1127,8 +1127,17 @@ extern const int arm_arch_cde_coproc_bits[]; ((MODE) == TImode || (MODE) == EImode || (MODE) == OImode \ || (MODE) == CImode || (MODE) == XImode) -#define VALID_MVE_STRUCT_MODE(MODE) \ - ((MODE) == TImode || (MODE) == OImode || (MODE) == XImode) +#define VALID_MVE_STRUCT_MODE(MODE) \ + ((MODE) == V2x16QImode \ + || (MODE) == V2x8HImode \ + || (MODE) == V2x4SImode \ + || (MODE) == V2x8HFmode \ + || (MODE) == V2x4SFmode \ + || (MODE) == V4x16QImode \ + || (MODE) == V4x8HImode \ + || (MODE) == V4x4SImode \ + || (MODE) == V4x8HFmode \ + || (MODE) == V4x4SFmode) /* The conditions under which vector modes are supported for general arithmetic using Neon. */ diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h index 45b27ed9fb8..d2e382ee347 100644 --- a/gcc/config/arm/arm_mve.h +++ b/gcc/config/arm/arm_mve.h @@ -45,23 +45,11 @@ #endif #ifndef __ARM_MVE_PRESERVE_USER_NAMESPACE -#define vst4q(__addr, __value) __arm_vst4q(__addr, __value) #define vuninitializedq(__v) __arm_vuninitializedq(__v) -#define vst2q(__addr, __value) __arm_vst2q(__addr, __value) -#define vld2q(__addr) __arm_vld2q(__addr) -#define vld4q(__addr) __arm_vld4q(__addr) #define vsetq_lane(__a, __b, __idx) __arm_vsetq_lane(__a, __b, __idx) #define vgetq_lane(__a, __idx) __arm_vgetq_lane(__a, __idx) -#define vst4q_s8( __addr, __value) __arm_vst4q_s8( __addr, __value) -#define vst4q_s16( __addr, __value) __arm_vst4q_s16( __addr, __value) -#define vst4q_s32( __addr, __value) __arm_vst4q_s32( __addr, __value) -#define vst4q_u8( __addr, __value) __arm_vst4q_u8( __addr, __value) -#define vst4q_u16( __addr, __value) __arm_vst4q_u16( __addr, __value) -#define vst4q_u32( __addr, __value) __arm_vst4q_u32( __addr, __value) -#define vst4q_f16( __addr, __value) __arm_vst4q_f16( __addr, __value) -#define vst4q_f32( __addr, 
__value) __arm_vst4q_f32( __addr, __value) #define vpnot(__a) __arm_vpnot(__a) #define vuninitializedq_u8(void) __arm_vuninitializedq_u8(void) #define vuninitializedq_u16(void) __arm_vuninitializedq_u16(void) @@ -73,30 +61,6 @@ #define vuninitializedq_s64(void) __arm_vuninitializedq_s64(void) #define vuninitializedq_f16(void) __arm_vuninitializedq_f16(void) #define vuninitializedq_f32(void) __arm_vuninitializedq_f32(void) -#define vst2q_s8(__addr, __value) __arm_vst2q_s8(__addr, __value) -#define vst2q_u8(__addr, __value) __arm_vst2q_u8(__addr, __value) -#define vld2q_s8(__addr) __arm_vld2q_s8(__addr) -#define vld2q_u8(__addr) __arm_vld2q_u8(__addr) -#define vld4q_s8(__addr) __arm_vld4q_s8(__addr) -#define vld4q_u8(__addr) __arm_vld4q_u8(__addr) -#define vst2q_s16(__addr, __value) __arm_vst2q_s16(__addr, __value) -#define vst2q_u16(__addr, __value) __arm_vst2q_u16(__addr, __value) -#define vld2q_s16(__addr) __arm_vld2q_s16(__addr) -#define vld2q_u16(__addr) __arm_vld2q_u16(__addr) -#define vld4q_s16(__addr) __arm_vld4q_s16(__addr) -#define vld4q_u16(__addr) __arm_vld4q_u16(__addr) -#define vst2q_s32(__addr, __value) __arm_vst2q_s32(__addr, __value) -#define vst2q_u32(__addr, __value) __arm_vst2q_u32(__addr, __value) -#define vld2q_s32(__addr) __arm_vld2q_s32(__addr) -#define vld2q_u32(__addr) __arm_vld2q_u32(__addr) -#define vld4q_s32(__addr) __arm_vld4q_s32(__addr) -#define vld4q_u32(__addr) __arm_vld4q_u32(__addr) -#define vld4q_f16(__addr) __arm_vld4q_f16(__addr) -#define vld2q_f16(__addr) __arm_vld2q_f16(__addr) -#define vst2q_f16(__addr, __value) __arm_vst2q_f16(__addr, __value) -#define vld4q_f32(__addr) __arm_vld4q_f32(__addr) -#define vld2q_f32(__addr) __arm_vld2q_f32(__addr) -#define vst2q_f32(__addr, __value) __arm_vst2q_f32(__addr, __value) #define vsetq_lane_f16(__a, __b, __idx) __arm_vsetq_lane_f16(__a, __b, __idx) #define vsetq_lane_f32(__a, __b, __idx) __arm_vsetq_lane_f32(__a, __b, __idx) #define vsetq_lane_s16(__a, __b, __idx) 
__arm_vsetq_lane_s16(__a, __b, __idx) @@ -147,60 +111,6 @@ __builtin_arm_lane_check (__ARM_NUM_LANES(__vec), \ __ARM_LANEQ(__vec, __idx)) -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vst4q_s8 (int8_t * __addr, int8x16x4_t __value) -{ - union { int8x16x4_t __i; __builtin_neon_xi __o; } __rv; - __rv.__i = __value; - __builtin_mve_vst4qv16qi ((__builtin_neon_qi *) __addr, __rv.__o); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vst4q_s16 (int16_t * __addr, int16x8x4_t __value) -{ - union { int16x8x4_t __i; __builtin_neon_xi __o; } __rv; - __rv.__i = __value; - __builtin_mve_vst4qv8hi ((__builtin_neon_hi *) __addr, __rv.__o); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vst4q_s32 (int32_t * __addr, int32x4x4_t __value) -{ - union { int32x4x4_t __i; __builtin_neon_xi __o; } __rv; - __rv.__i = __value; - __builtin_mve_vst4qv4si ((__builtin_neon_si *) __addr, __rv.__o); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vst4q_u8 (uint8_t * __addr, uint8x16x4_t __value) -{ - union { uint8x16x4_t __i; __builtin_neon_xi __o; } __rv; - __rv.__i = __value; - __builtin_mve_vst4qv16qi ((__builtin_neon_qi *) __addr, __rv.__o); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vst4q_u16 (uint16_t * __addr, uint16x8x4_t __value) -{ - union { uint16x8x4_t __i; __builtin_neon_xi __o; } __rv; - __rv.__i = __value; - __builtin_mve_vst4qv8hi ((__builtin_neon_hi *) __addr, __rv.__o); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vst4q_u32 (uint32_t * __addr, uint32x4x4_t __value) -{ - union { uint32x4x4_t __i; __builtin_neon_xi __o; } __rv; - __rv.__i = __value; - 
__builtin_mve_vst4qv4si ((__builtin_neon_si *) __addr, __rv.__o); -} - __extension__ extern __inline mve_pred16_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vpnot (mve_pred16_t __a) @@ -208,168 +118,6 @@ __arm_vpnot (mve_pred16_t __a) return __builtin_mve_vpnotv16bi (__a); } -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vst2q_s8 (int8_t * __addr, int8x16x2_t __value) -{ - union { int8x16x2_t __i; __builtin_neon_oi __o; } __rv; - __rv.__i = __value; - __builtin_mve_vst2qv16qi ((__builtin_neon_qi *) __addr, __rv.__o); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vst2q_u8 (uint8_t * __addr, uint8x16x2_t __value) -{ - union { uint8x16x2_t __i; __builtin_neon_oi __o; } __rv; - __rv.__i = __value; - __builtin_mve_vst2qv16qi ((__builtin_neon_qi *) __addr, __rv.__o); -} - -__extension__ extern __inline int8x16x2_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vld2q_s8 (int8_t const * __addr) -{ - union { int8x16x2_t __i; __builtin_neon_oi __o; } __rv; - __rv.__o = __builtin_mve_vld2qv16qi ((__builtin_neon_qi *) __addr); - return __rv.__i; -} - -__extension__ extern __inline uint8x16x2_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vld2q_u8 (uint8_t const * __addr) -{ - union { uint8x16x2_t __i; __builtin_neon_oi __o; } __rv; - __rv.__o = __builtin_mve_vld2qv16qi ((__builtin_neon_qi *) __addr); - return __rv.__i; -} - -__extension__ extern __inline int8x16x4_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vld4q_s8 (int8_t const * __addr) -{ - union { int8x16x4_t __i; __builtin_neon_xi __o; } __rv; - __rv.__o = __builtin_mve_vld4qv16qi ((__builtin_neon_qi *) __addr); - return __rv.__i; -} - -__extension__ extern __inline uint8x16x4_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vld4q_u8 
(uint8_t const * __addr) -{ - union { uint8x16x4_t __i; __builtin_neon_xi __o; } __rv; - __rv.__o = __builtin_mve_vld4qv16qi ((__builtin_neon_qi *) __addr); - return __rv.__i; -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vst2q_s16 (int16_t * __addr, int16x8x2_t __value) -{ - union { int16x8x2_t __i; __builtin_neon_oi __o; } __rv; - __rv.__i = __value; - __builtin_mve_vst2qv8hi ((__builtin_neon_hi *) __addr, __rv.__o); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vst2q_u16 (uint16_t * __addr, uint16x8x2_t __value) -{ - union { uint16x8x2_t __i; __builtin_neon_oi __o; } __rv; - __rv.__i = __value; - __builtin_mve_vst2qv8hi ((__builtin_neon_hi *) __addr, __rv.__o); -} - -__extension__ extern __inline int16x8x2_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vld2q_s16 (int16_t const * __addr) -{ - union { int16x8x2_t __i; __builtin_neon_oi __o; } __rv; - __rv.__o = __builtin_mve_vld2qv8hi ((__builtin_neon_hi *) __addr); - return __rv.__i; -} - -__extension__ extern __inline uint16x8x2_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vld2q_u16 (uint16_t const * __addr) -{ - union { uint16x8x2_t __i; __builtin_neon_oi __o; } __rv; - __rv.__o = __builtin_mve_vld2qv8hi ((__builtin_neon_hi *) __addr); - return __rv.__i; -} - -__extension__ extern __inline int16x8x4_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vld4q_s16 (int16_t const * __addr) -{ - union { int16x8x4_t __i; __builtin_neon_xi __o; } __rv; - __rv.__o = __builtin_mve_vld4qv8hi ((__builtin_neon_hi *) __addr); - return __rv.__i; -} - -__extension__ extern __inline uint16x8x4_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vld4q_u16 (uint16_t const * __addr) -{ - union { uint16x8x4_t __i; __builtin_neon_xi __o; } __rv; - __rv.__o = 
__builtin_mve_vld4qv8hi ((__builtin_neon_hi *) __addr); - return __rv.__i; -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vst2q_s32 (int32_t * __addr, int32x4x2_t __value) -{ - union { int32x4x2_t __i; __builtin_neon_oi __o; } __rv; - __rv.__i = __value; - __builtin_mve_vst2qv4si ((__builtin_neon_si *) __addr, __rv.__o); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vst2q_u32 (uint32_t * __addr, uint32x4x2_t __value) -{ - union { uint32x4x2_t __i; __builtin_neon_oi __o; } __rv; - __rv.__i = __value; - __builtin_mve_vst2qv4si ((__builtin_neon_si *) __addr, __rv.__o); -} - -__extension__ extern __inline int32x4x2_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vld2q_s32 (int32_t const * __addr) -{ - union { int32x4x2_t __i; __builtin_neon_oi __o; } __rv; - __rv.__o = __builtin_mve_vld2qv4si ((__builtin_neon_si *) __addr); - return __rv.__i; -} - -__extension__ extern __inline uint32x4x2_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vld2q_u32 (uint32_t const * __addr) -{ - union { uint32x4x2_t __i; __builtin_neon_oi __o; } __rv; - __rv.__o = __builtin_mve_vld2qv4si ((__builtin_neon_si *) __addr); - return __rv.__i; -} - -__extension__ extern __inline int32x4x4_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vld4q_s32 (int32_t const * __addr) -{ - union { int32x4x4_t __i; __builtin_neon_xi __o; } __rv; - __rv.__o = __builtin_mve_vld4qv4si ((__builtin_neon_si *) __addr); - return __rv.__i; -} - -__extension__ extern __inline uint32x4x4_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vld4q_u32 (uint32_t const * __addr) -{ - union { uint32x4x4_t __i; __builtin_neon_xi __o; } __rv; - __rv.__o = __builtin_mve_vld4qv4si ((__builtin_neon_si *) __addr); - return __rv.__i; -} - __extension__ extern __inline int16x8_t 
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vsetq_lane_s16 (int16_t __a, int16x8_t __b, const int __idx) @@ -620,78 +368,6 @@ __arm_srshr (int32_t value, const int shift) #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point. */ -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vst4q_f16 (float16_t * __addr, float16x8x4_t __value) -{ - union { float16x8x4_t __i; __builtin_neon_xi __o; } __rv; - __rv.__i = __value; - __builtin_mve_vst4qv8hf (__addr, __rv.__o); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vst4q_f32 (float32_t * __addr, float32x4x4_t __value) -{ - union { float32x4x4_t __i; __builtin_neon_xi __o; } __rv; - __rv.__i = __value; - __builtin_mve_vst4qv4sf (__addr, __rv.__o); -} - -__extension__ extern __inline float16x8x4_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vld4q_f16 (float16_t const * __addr) -{ - union { float16x8x4_t __i; __builtin_neon_xi __o; } __rv; - __rv.__o = __builtin_mve_vld4qv8hf (__addr); - return __rv.__i; -} - -__extension__ extern __inline float16x8x2_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vld2q_f16 (float16_t const * __addr) -{ - union { float16x8x2_t __i; __builtin_neon_oi __o; } __rv; - __rv.__o = __builtin_mve_vld2qv8hf (__addr); - return __rv.__i; -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vst2q_f16 (float16_t * __addr, float16x8x2_t __value) -{ - union { float16x8x2_t __i; __builtin_neon_oi __o; } __rv; - __rv.__i = __value; - __builtin_mve_vst2qv8hf (__addr, __rv.__o); -} - -__extension__ extern __inline float32x4x4_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vld4q_f32 (float32_t const * __addr) -{ - union { float32x4x4_t __i; __builtin_neon_xi __o; } __rv; - __rv.__o = 
__builtin_mve_vld4qv4sf (__addr); - return __rv.__i; -} - -__extension__ extern __inline float32x4x2_t -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vld2q_f32 (float32_t const * __addr) -{ - union { float32x4x2_t __i; __builtin_neon_oi __o; } __rv; - __rv.__o = __builtin_mve_vld2qv4sf (__addr); - return __rv.__i; -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vst2q_f32 (float32_t * __addr, float32x4x2_t __value) -{ - union { float32x4x2_t __i; __builtin_neon_oi __o; } __rv; - __rv.__i = __value; - __builtin_mve_vst2qv4sf (__addr, __rv.__o); -} - __extension__ extern __inline float16x8_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vsetq_lane_f16 (float16_t __a, float16x8_t __b, const int __idx) @@ -728,173 +404,6 @@ __arm_vgetq_lane_f32 (float32x4_t __a, const int __idx) #endif #ifdef __cplusplus -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vst4q (int8_t * __addr, int8x16x4_t __value) -{ - __arm_vst4q_s8 (__addr, __value); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vst4q (int16_t * __addr, int16x8x4_t __value) -{ - __arm_vst4q_s16 (__addr, __value); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vst4q (int32_t * __addr, int32x4x4_t __value) -{ - __arm_vst4q_s32 (__addr, __value); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vst4q (uint8_t * __addr, uint8x16x4_t __value) -{ - __arm_vst4q_u8 (__addr, __value); -} - -__extension__ extern __inline void -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -__arm_vst4q (uint16_t * __addr, uint16x8x4_t __value) -{ - __arm_vst4q_u16 (__addr, __value); -} - -__extension__ extern __inline void 
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vst4q (uint32_t * __addr, uint32x4x4_t __value)
-{
-  __arm_vst4q_u32 (__addr, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vst2q (int8_t * __addr, int8x16x2_t __value)
-{
-  __arm_vst2q_s8 (__addr, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vst2q (uint8_t * __addr, uint8x16x2_t __value)
-{
-  __arm_vst2q_u8 (__addr, __value);
-}
-
-__extension__ extern __inline int8x16x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld2q (int8_t const * __addr)
-{
-  return __arm_vld2q_s8 (__addr);
-}
-
-__extension__ extern __inline uint8x16x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld2q (uint8_t const * __addr)
-{
-  return __arm_vld2q_u8 (__addr);
-}
-
-__extension__ extern __inline int8x16x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld4q (int8_t const * __addr)
-{
-  return __arm_vld4q_s8 (__addr);
-}
-
-__extension__ extern __inline uint8x16x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld4q (uint8_t const * __addr)
-{
-  return __arm_vld4q_u8 (__addr);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vst2q (int16_t * __addr, int16x8x2_t __value)
-{
-  __arm_vst2q_s16 (__addr, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vst2q (uint16_t * __addr, uint16x8x2_t __value)
-{
-  __arm_vst2q_u16 (__addr, __value);
-}
-
-__extension__ extern __inline int16x8x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld2q (int16_t const * __addr)
-{
-  return __arm_vld2q_s16 (__addr);
-}
-
-__extension__ extern __inline uint16x8x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld2q (uint16_t const * __addr)
-{
-  return __arm_vld2q_u16 (__addr);
-}
-
-__extension__ extern __inline int16x8x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld4q (int16_t const * __addr)
-{
-  return __arm_vld4q_s16 (__addr);
-}
-
-__extension__ extern __inline uint16x8x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld4q (uint16_t const * __addr)
-{
-  return __arm_vld4q_u16 (__addr);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vst2q (int32_t * __addr, int32x4x2_t __value)
-{
-  __arm_vst2q_s32 (__addr, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vst2q (uint32_t * __addr, uint32x4x2_t __value)
-{
-  __arm_vst2q_u32 (__addr, __value);
-}
-
-__extension__ extern __inline int32x4x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld2q (int32_t const * __addr)
-{
-  return __arm_vld2q_s32 (__addr);
-}
-
-__extension__ extern __inline uint32x4x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld2q (uint32_t const * __addr)
-{
-  return __arm_vld2q_u32 (__addr);
-}
-
-__extension__ extern __inline int32x4x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld4q (int32_t const * __addr)
-{
-  return __arm_vld4q_s32 (__addr);
-}
-
-__extension__ extern __inline uint32x4x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld4q (uint32_t const * __addr)
-{
-  return __arm_vld4q_u32 (__addr);
-}

 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
@@ -1010,62 +519,6 @@ __arm_vgetq_lane (uint64x2_t __a, const int __idx)

 #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point. */

-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vst4q (float16_t * __addr, float16x8x4_t __value)
-{
-  __arm_vst4q_f16 (__addr, __value);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vst4q (float32_t * __addr, float32x4x4_t __value)
-{
-  __arm_vst4q_f32 (__addr, __value);
-}
-
-__extension__ extern __inline float16x8x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld4q (float16_t const * __addr)
-{
-  return __arm_vld4q_f16 (__addr);
-}
-
-__extension__ extern __inline float16x8x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld2q (float16_t const * __addr)
-{
-  return __arm_vld2q_f16 (__addr);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vst2q (float16_t * __addr, float16x8x2_t __value)
-{
-  __arm_vst2q_f16 (__addr, __value);
-}
-
-__extension__ extern __inline float32x4x4_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld4q (float32_t const * __addr)
-{
-  return __arm_vld4q_f32 (__addr);
-}
-
-__extension__ extern __inline float32x4x2_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vld2q (float32_t const * __addr)
-{
-  return __arm_vld2q_f32 (__addr);
-}
-
-__extension__ extern __inline void
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vst2q (float32_t * __addr, float32x4x2_t __value)
-{
-  __arm_vst2q_f32 (__addr, __value);
-}
-
 __extension__ extern __inline float16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vsetq_lane (float16_t __a, float16x8_t __b, const int __idx)
@@ -1405,51 +858,6 @@ extern void *__ARM_undef;

 #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point. */

-#define __arm_vst4q(p0,p1) ({ __typeof(p0) __p0 = (p0); \
-  __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int8x16x4_t]: __arm_vst4q_s8 (__ARM_mve_coerce_s8_ptr(__p0, int8_t *), __ARM_mve_coerce(__p1, int8x16x4_t)), \
-  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_int16x8x4_t]: __arm_vst4q_s16 (__ARM_mve_coerce_s16_ptr(__p0, int16_t *), __ARM_mve_coerce(__p1, int16x8x4_t)), \
-  int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4x4_t]: __arm_vst4q_s32 (__ARM_mve_coerce_s32_ptr(__p0, int32_t *), __ARM_mve_coerce(__p1, int32x4x4_t)), \
-  int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint8x16x4_t]: __arm_vst4q_u8 (__ARM_mve_coerce_u8_ptr(__p0, uint8_t *), __ARM_mve_coerce(__p1, uint8x16x4_t)), \
-  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8x4_t]: __arm_vst4q_u16 (__ARM_mve_coerce_u16_ptr(__p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8x4_t)), \
-  int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4x4_t]: __arm_vst4q_u32 (__ARM_mve_coerce_u32_ptr(__p0, uint32_t *), __ARM_mve_coerce(__p1, uint32x4x4_t)), \
-  int (*)[__ARM_mve_type_float16_t_ptr][__ARM_mve_type_float16x8x4_t]: __arm_vst4q_f16 (__ARM_mve_coerce_f16_ptr(__p0, float16_t *), __ARM_mve_coerce(__p1, float16x8x4_t)), \
-  int (*)[__ARM_mve_type_float32_t_ptr][__ARM_mve_type_float32x4x4_t]: __arm_vst4q_f32 (__ARM_mve_coerce_f32_ptr(__p0, float32_t *), __ARM_mve_coerce(__p1, float32x4x4_t)));})
-
-#define __arm_vld2q(p0) ( \
-  _Generic( (int (*)[__ARM_mve_typeid(p0)])0, \
-  int (*)[__ARM_mve_type_int8_t_ptr]: __arm_vld2q_s8 (__ARM_mve_coerce_s8_ptr(p0, int8_t *)), \
-  int (*)[__ARM_mve_type_int16_t_ptr]: __arm_vld2q_s16 (__ARM_mve_coerce_s16_ptr(p0, int16_t *)), \
-  int (*)[__ARM_mve_type_int32_t_ptr]: __arm_vld2q_s32 (__ARM_mve_coerce_s32_ptr(p0, int32_t *)), \
-  int (*)[__ARM_mve_type_uint8_t_ptr]: __arm_vld2q_u8 (__ARM_mve_coerce_u8_ptr(p0, uint8_t *)), \
-  int (*)[__ARM_mve_type_uint16_t_ptr]: __arm_vld2q_u16 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *)), \
-  int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_vld2q_u32 (__ARM_mve_coerce_u32_ptr(p0, uint32_t *)), \
-  int (*)[__ARM_mve_type_float16_t_ptr]: __arm_vld2q_f16 (__ARM_mve_coerce_f16_ptr(p0, float16_t *)), \
-  int (*)[__ARM_mve_type_float32_t_ptr]: __arm_vld2q_f32 (__ARM_mve_coerce_f32_ptr(p0, float32_t *))))
-
-#define __arm_vld4q(p0) ( \
-  _Generic( (int (*)[__ARM_mve_typeid(p0)])0, \
-  int (*)[__ARM_mve_type_int8_t_ptr]: __arm_vld4q_s8 (__ARM_mve_coerce_s8_ptr(p0, int8_t *)), \
-  int (*)[__ARM_mve_type_int16_t_ptr]: __arm_vld4q_s16 (__ARM_mve_coerce_s16_ptr(p0, int16_t *)), \
-  int (*)[__ARM_mve_type_int32_t_ptr]: __arm_vld4q_s32 (__ARM_mve_coerce_s32_ptr(p0, int32_t *)), \
-  int (*)[__ARM_mve_type_uint8_t_ptr]: __arm_vld4q_u8 (__ARM_mve_coerce_u8_ptr(p0, uint8_t *)), \
-  int (*)[__ARM_mve_type_uint16_t_ptr]: __arm_vld4q_u16 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *)), \
-  int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_vld4q_u32 (__ARM_mve_coerce_u32_ptr(p0, uint32_t *)), \
-  int (*)[__ARM_mve_type_float16_t_ptr]: __arm_vld4q_f16 (__ARM_mve_coerce_f16_ptr(p0, float16_t *)), \
-  int (*)[__ARM_mve_type_float32_t_ptr]: __arm_vld4q_f32 (__ARM_mve_coerce_f32_ptr(p0, float32_t *))))
-
-#define __arm_vst2q(p0,p1) ({ __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int8x16x2_t]: __arm_vst2q_s8 (__ARM_mve_coerce_s8_ptr(p0, int8_t *), __ARM_mve_coerce(__p1, int8x16x2_t)), \
-  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_int16x8x2_t]: __arm_vst2q_s16 (__ARM_mve_coerce_s16_ptr(p0, int16_t *), __ARM_mve_coerce(__p1, int16x8x2_t)), \
-  int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4x2_t]: __arm_vst2q_s32 (__ARM_mve_coerce_s32_ptr(p0, int32_t *), __ARM_mve_coerce(__p1, int32x4x2_t)), \
-  int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint8x16x2_t]: __arm_vst2q_u8 (__ARM_mve_coerce_u8_ptr(p0, uint8_t *), __ARM_mve_coerce(__p1, uint8x16x2_t)), \
-  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8x2_t]: __arm_vst2q_u16 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8x2_t)), \
-  int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4x2_t]: __arm_vst2q_u32 (__ARM_mve_coerce_u32_ptr(p0, uint32_t *), __ARM_mve_coerce(__p1, uint32x4x2_t)), \
-  int (*)[__ARM_mve_type_float16_t_ptr][__ARM_mve_type_float16x8x2_t]: __arm_vst2q_f16 (__ARM_mve_coerce_f16_ptr(p0, float16_t *), __ARM_mve_coerce(__p1, float16x8x2_t)), \
-  int (*)[__ARM_mve_type_float32_t_ptr][__ARM_mve_type_float32x4x2_t]: __arm_vst2q_f32 (__ARM_mve_coerce_f32_ptr(p0, float32_t *), __ARM_mve_coerce(__p1, float32x4x2_t)));})
-
 #define __arm_vuninitializedq(p0) ({ __typeof(p0) __p0 = (p0); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
   int (*)[__ARM_mve_type_int8x16_t]: __arm_vuninitializedq_s8 (), \
@@ -1492,25 +900,6 @@ extern void *__ARM_undef;

 #else /* MVE Integer. */

-#define __arm_vst4q(p0,p1) ({ __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int8x16x4_t]: __arm_vst4q_s8 (__ARM_mve_coerce_s8_ptr(p0, int8_t *), __ARM_mve_coerce(__p1, int8x16x4_t)), \
-  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_int16x8x4_t]: __arm_vst4q_s16 (__ARM_mve_coerce_s16_ptr(p0, int16_t *), __ARM_mve_coerce(__p1, int16x8x4_t)), \
-  int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4x4_t]: __arm_vst4q_s32 (__ARM_mve_coerce_s32_ptr(p0, int32_t *), __ARM_mve_coerce(__p1, int32x4x4_t)), \
-  int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint8x16x4_t]: __arm_vst4q_u8 (__ARM_mve_coerce_u8_ptr(p0, uint8_t *), __ARM_mve_coerce(__p1, uint8x16x4_t)), \
-  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8x4_t]: __arm_vst4q_u16 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8x4_t)), \
-  int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4x4_t]: __arm_vst4q_u32 (__ARM_mve_coerce_u32_ptr(p0, uint32_t *), __ARM_mve_coerce(__p1, uint32x4x4_t)));})
-
-#define __arm_vst2q(p0,p1) ({ __typeof(p1) __p1 = (p1); \
-  _Generic( (int (*)[__ARM_mve_typeid(p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int8_t_ptr][__ARM_mve_type_int8x16x2_t]: __arm_vst2q_s8 (__ARM_mve_coerce_s8_ptr(p0, int8_t *), __ARM_mve_coerce(__p1, int8x16x2_t)), \
-  int (*)[__ARM_mve_type_int16_t_ptr][__ARM_mve_type_int16x8x2_t]: __arm_vst2q_s16 (__ARM_mve_coerce_s16_ptr(p0, int16_t *), __ARM_mve_coerce(__p1, int16x8x2_t)), \
-  int (*)[__ARM_mve_type_int32_t_ptr][__ARM_mve_type_int32x4x2_t]: __arm_vst2q_s32 (__ARM_mve_coerce_s32_ptr(p0, int32_t *), __ARM_mve_coerce(__p1, int32x4x2_t)), \
-  int (*)[__ARM_mve_type_uint8_t_ptr][__ARM_mve_type_uint8x16x2_t]: __arm_vst2q_u8 (__ARM_mve_coerce_u8_ptr(p0, uint8_t *), __ARM_mve_coerce(__p1, uint8x16x2_t)), \
-  int (*)[__ARM_mve_type_uint16_t_ptr][__ARM_mve_type_uint16x8x2_t]: __arm_vst2q_u16 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *), __ARM_mve_coerce(__p1, uint16x8x2_t)), \
-  int (*)[__ARM_mve_type_uint32_t_ptr][__ARM_mve_type_uint32x4x2_t]: __arm_vst2q_u32 (__ARM_mve_coerce_u32_ptr(p0, uint32_t *), __ARM_mve_coerce(__p1, uint32x4x2_t)));})
-
-
 #define __arm_vuninitializedq(p0) ({ __typeof(p0) __p0 = (p0); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
   int (*)[__ARM_mve_type_int8x16_t]: __arm_vuninitializedq_s8 (), \
@@ -1522,23 +911,6 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint32x4_t]: __arm_vuninitializedq_u32 (), \
   int (*)[__ARM_mve_type_uint64x2_t]: __arm_vuninitializedq_u64 ());})

-#define __arm_vld2q(p0) ( _Generic( (int (*)[__ARM_mve_typeid(p0)])0, \
-  int (*)[__ARM_mve_type_int8_t_ptr]: __arm_vld2q_s8 (__ARM_mve_coerce_s8_ptr(p0, int8_t *)), \
-  int (*)[__ARM_mve_type_int16_t_ptr]: __arm_vld2q_s16 (__ARM_mve_coerce_s16_ptr(p0, int16_t *)), \
-  int (*)[__ARM_mve_type_int32_t_ptr]: __arm_vld2q_s32 (__ARM_mve_coerce_s32_ptr(p0, int32_t *)), \
-  int (*)[__ARM_mve_type_uint8_t_ptr]: __arm_vld2q_u8 (__ARM_mve_coerce_u8_ptr(p0, uint8_t *)), \
-  int (*)[__ARM_mve_type_uint16_t_ptr]: __arm_vld2q_u16 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *)), \
-  int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_vld2q_u32 (__ARM_mve_coerce_u32_ptr(p0, uint32_t *))))
-
-
-#define __arm_vld4q(p0) ( _Generic( (int (*)[__ARM_mve_typeid(p0)])0, \
-  int (*)[__ARM_mve_type_int8_t_ptr]: __arm_vld4q_s8 (__ARM_mve_coerce_s8_ptr(p0, int8_t *)), \
-  int (*)[__ARM_mve_type_int16_t_ptr]: __arm_vld4q_s16 (__ARM_mve_coerce_s16_ptr(p0, int16_t *)), \
-  int (*)[__ARM_mve_type_int32_t_ptr]: __arm_vld4q_s32 (__ARM_mve_coerce_s32_ptr(p0, int32_t *)), \
-  int (*)[__ARM_mve_type_uint8_t_ptr]: __arm_vld4q_u8 (__ARM_mve_coerce_u8_ptr(p0, uint8_t *)), \
-  int (*)[__ARM_mve_type_uint16_t_ptr]: __arm_vld4q_u16 (__ARM_mve_coerce_u16_ptr(p0, uint16_t *)), \
-  int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_vld4q_u32 (__ARM_mve_coerce_u32_ptr(p0, uint32_t *))))
-
 #define __arm_vgetq_lane(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \
   int (*)[__ARM_mve_type_int8x16_t]: __arm_vgetq_lane_s8 (__ARM_mve_coerce(__p0, int8x16_t), p1), \
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index b85b334a81e..90d8f90b98f 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -18,7 +18,6 @@
    along with GCC; see the file COPYING3. If not see
    . */

-VAR5 (STORE1, vst4q, v16qi, v8hi, v4si, v8hf, v4sf)
 VAR2 (UNOP_NONE_NONE, vrndxq_f, v8hf, v4sf)
 VAR2 (UNOP_NONE_NONE, vrndq_f, v8hf, v4sf)
 VAR2 (UNOP_NONE_NONE, vrndpq_f, v8hf, v4sf)
@@ -679,9 +678,6 @@ VAR1 (QUADOP_NONE_NONE_NONE_NONE_PRED, vsbciq_m_s, v4si)
 VAR1 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vsbciq_m_u, v4si)
 VAR1 (QUADOP_NONE_NONE_NONE_NONE_PRED, vsbcq_m_s, v4si)
 VAR1 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vsbcq_m_u, v4si)
-VAR5 (STORE1, vst2q, v16qi, v8hi, v4si, v8hf, v4sf)
-VAR5 (LOAD1, vld4q, v16qi, v8hi, v4si, v8hf, v4sf)
-VAR5 (LOAD1, vld2q, v16qi, v8hi, v4si, v8hf, v4sf)
 VAR1 (ASRL, sqrshr_,si)
 VAR1 (ASRL, sqrshrl_sat64_,di)
 VAR1 (ASRL, sqrshrl_sat48_,di)
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 1caf5d18ad6..cfe712ceda9 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -139,7 +139,18 @@ (define_mode_iterator VQXMOV [V16QI V8HI V8HF V8BF V4SI V4SF V2DI TI])

 ;; Opaque structure types wider than TImode.
 (define_mode_iterator VSTRUCT [(EI "!TARGET_HAVE_MVE") OI
-                               (CI "!TARGET_HAVE_MVE") XI])
+                               (CI "!TARGET_HAVE_MVE") XI
+                               (V2x16QI "TARGET_HAVE_MVE")
+                               (V2x8HI "TARGET_HAVE_MVE")
+                               (V2x4SI "TARGET_HAVE_MVE")
+                               (V2x8HF "TARGET_HAVE_MVE_FLOAT")
+                               (V2x4SF "TARGET_HAVE_MVE_FLOAT")
+                               (V4x16QI "TARGET_HAVE_MVE")
+                               (V4x8HI "TARGET_HAVE_MVE")
+                               (V4x4SI "TARGET_HAVE_MVE")
+                               (V4x8HF "TARGET_HAVE_MVE_FLOAT")
+                               (V4x4SF "TARGET_HAVE_MVE_FLOAT")
+                               ])

 ;; Opaque structure types used in table lookups (except vtbl1/vtbx1).
 (define_mode_iterator VTAB [TI EI OI])
@@ -286,6 +297,29 @@ (define_mode_iterator MVE_7_HI [HI V16BI V8BI V4BI V2QI])
 (define_mode_iterator MVE_V8HF [V8HF])
 (define_mode_iterator MVE_V16QI [V16QI])

+(define_mode_attr MVE_VLD2_VST2 [(V16QI "V2x16QI")
+                                 (V8HI "V2x8HI")
+                                 (V4SI "V2x4SI")
+                                 (V8HF "V2x8HF")
+                                 (V4SF "V2x4SF")])
+(define_mode_attr MVE_vld2_vst2 [(V16QI "v2x16qi")
+                                 (V8HI "v2x8hi")
+                                 (V4SI "v2x4si")
+                                 (V8HF "v2x8hf")
+                                 (V4SF "v2x4sf")])
+
+(define_mode_attr MVE_VLD4_VST4 [(V16QI "V4x16QI")
+                                 (V8HI "V4x8HI")
+                                 (V4SI "V4x4SI")
+                                 (V8HF "V4x8HF")
+                                 (V4SF "V4x4SF")])
+
+(define_mode_attr MVE_vld4_vst4 [(V16QI "v4x16qi")
+                                 (V8HI "v4x8hi")
+                                 (V4SI "v4x4si")
+                                 (V8HF "v4x8hf")
+                                 (V4SF "v4x4sf")])
+
 ;; Types for MVE truncating stores and widening loads
 (define_mode_iterator MVE_w_narrow_TYPE [V8QI V4QI V4HI])
 (define_mode_attr MVE_w_narrow_type [(V8QI "v8qi") (V4QI "v4qi") (V4HI "v4hi")])
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 70f6ec6c2cc..325dad87833 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -110,13 +110,14 @@ (define_insn "@mve_vdupq_n"
 ;;
 ;; [vst4q])
 ;;
-(define_insn "mve_vst4q"
-  [(set (match_operand:XI 0 "mve_struct_operand" "=Ug")
-        (unspec:XI [(match_operand:XI 1 "s_register_operand" "w")
-                    (unspec:MVE_VLD_ST [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+(define_insn "@mve_vst4q"
+  [(set (match_operand: 0 "mve_struct_operand" "=Ug")
+        (unspec:
+                [(match_operand: 1 "s_register_operand" "w")
+                 (unspec:MVE_VLD_ST [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
          VST4Q))
   ]
-  "TARGET_HAVE_MVE"
+  "(TARGET_HAVE_MVE && VALID_MVE_STRUCT_MODE (mode))"
 {
   rtx ops[6];
   int regno = REGNO (operands[1]);
@@ -4061,14 +4062,14 @@ (define_insn "@mve_q_m_v4si"
 ;;
 ;; [vst2q])
 ;;
-(define_insn "mve_vst2q"
-  [(set (match_operand:OI 0 "mve_struct_operand" "=Ug")
-        (unspec:OI [(match_operand:OI 1 "s_register_operand" "w")
-                    (unspec:MVE_VLD_ST [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+(define_insn "@mve_vst2q"
+  [(set (match_operand: 0 "mve_struct_operand" "=Ug")
+        (unspec:
+                [(match_operand: 1 "s_register_operand" "w")
+                 (unspec:MVE_VLD_ST [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
          VST2Q))
   ]
-  "(TARGET_HAVE_MVE && VALID_MVE_SI_MODE (mode))
-   || (TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE (mode))"
+  "(TARGET_HAVE_MVE && VALID_MVE_STRUCT_MODE (mode))"
 {
   rtx ops[4];
   int regno = REGNO (operands[1]);
@@ -4089,14 +4090,14 @@ (define_insn "mve_vst2q"
 ;;
 ;; [vld2q])
 ;;
-(define_insn "mve_vld2q"
-  [(set (match_operand:OI 0 "s_register_operand" "=w")
-        (unspec:OI [(match_operand:OI 1 "mve_struct_operand" "Ug")
-                    (unspec:MVE_VLD_ST [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+(define_insn "@mve_vld2q"
+  [(set (match_operand: 0 "s_register_operand" "=w")
+        (unspec:
+                [(match_operand: 1 "mve_struct_operand" "Ug")
+                 (unspec:MVE_VLD_ST [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
          VLD2Q))
   ]
-  "(TARGET_HAVE_MVE && VALID_MVE_SI_MODE (mode))
-   || (TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE (mode))"
+  "(TARGET_HAVE_MVE && VALID_MVE_STRUCT_MODE (mode))"
 {
   rtx ops[4];
   int regno = REGNO (operands[0]);
@@ -4117,14 +4118,14 @@ (define_insn "mve_vld2q"
 ;;
 ;; [vld4q])
 ;;
-(define_insn "mve_vld4q"
-  [(set (match_operand:XI 0 "s_register_operand" "=w")
-        (unspec:XI [(match_operand:XI 1 "mve_struct_operand" "Ug")
-                    (unspec:MVE_VLD_ST [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+(define_insn "@mve_vld4q"
+  [(set (match_operand: 0 "s_register_operand" "=w")
+        (unspec:
+                [(match_operand: 1 "mve_struct_operand" "Ug")
+                 (unspec:MVE_VLD_ST [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
          VLD4Q))
   ]
-  "(TARGET_HAVE_MVE && VALID_MVE_SI_MODE (mode))
-   || (TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE (mode))"
+  "(TARGET_HAVE_MVE && VALID_MVE_STRUCT_MODE (mode))"
 {
   rtx ops[6];
   int regno = REGNO (operands[0]);
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index ff1c27a0d71..03a5cf9e7e3 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -492,12 +492,21 @@ (define_expand "vec_load_lanesoi"
         (unspec:OI [(match_operand:OI 1 "neon_struct_operand")
                     (unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
                    UNSPEC_VLD2))]
-  "TARGET_NEON || TARGET_HAVE_MVE"
+  "TARGET_NEON"
 {
-  if (TARGET_NEON)
-    emit_insn (gen_neon_vld2 (operands[0], operands[1]));
-  else
-    emit_insn (gen_mve_vld2q (operands[0], operands[1]));
+  emit_insn (gen_neon_vld2 (operands[0], operands[1]));
+  DONE;
+})
+
+;;; On MVE we use V2xYYY modes instead of OI
+(define_expand "vec_load_lanes"
+  [(set (match_operand: 0 "s_register_operand")
+        (unspec: [(match_operand: 1 "neon_struct_operand")
+                  (unspec:MVE_VLD_ST [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+                 UNSPEC_VLD2))]
+  "(TARGET_HAVE_MVE && VALID_MVE_STRUCT_MODE (mode))"
+{
+  emit_insn (gen_mve_vld2q (operands[0], operands[1]));
   DONE;
 })

@@ -506,12 +515,21 @@ (define_expand "vec_store_lanesoi"
         (unspec:OI [(match_operand:OI 1 "s_register_operand")
                     (unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
                    UNSPEC_VST2))]
-  "TARGET_NEON || TARGET_HAVE_MVE"
+  "TARGET_NEON"
 {
-  if (TARGET_NEON)
-    emit_insn (gen_neon_vst2 (operands[0], operands[1]));
-  else
-    emit_insn (gen_mve_vst2q (operands[0], operands[1]));
+  emit_insn (gen_neon_vst2 (operands[0], operands[1]));
+  DONE;
+})
+
+;;; On MVE we use V2xYYY modes instead of OI
+(define_expand "vec_store_lanes"
+  [(set (match_operand: 0 "neon_struct_operand")
+        (unspec: [(match_operand: 1 "s_register_operand")
+                  (unspec:MVE_VLD_ST [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+                 UNSPEC_VST2))]
+  "(TARGET_HAVE_MVE && VALID_MVE_STRUCT_MODE (mode))"
+{
+  emit_insn (gen_mve_vst2q (operands[0], operands[1]));
   DONE;
 })

@@ -519,12 +537,21 @@ (define_expand "vec_load_lanesxi"
   [(match_operand:XI 0 "s_register_operand")
    (match_operand:XI 1 "neon_struct_operand")
    (unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
-  "TARGET_NEON || TARGET_HAVE_MVE"
+  "TARGET_NEON"
 {
-  if (TARGET_NEON)
-    emit_insn (gen_neon_vld4 (operands[0], operands[1]));
-  else
-    emit_insn (gen_mve_vld4q (operands[0], operands[1]));
+  emit_insn (gen_neon_vld4 (operands[0], operands[1]));
+  DONE;
+})
+
+;;; On MVE we use V4xYYY modes instead of XI
+(define_expand "vec_load_lanes"
+  [(set (match_operand: 0 "s_register_operand")
+        (unspec: [(match_operand: 1 "neon_struct_operand")
+                  (unspec:MVE_VLD_ST [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+                 UNSPEC_VLD4))]
+  "(TARGET_HAVE_MVE && VALID_MVE_STRUCT_MODE (mode))"
+{
+  emit_insn (gen_mve_vld4q (operands[0], operands[1]));
   DONE;
 })

@@ -532,12 +559,21 @@ (define_expand "vec_store_lanesxi"
   [(match_operand:XI 0 "neon_struct_operand")
    (match_operand:XI 1 "s_register_operand")
    (unspec:VQ2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
-  "TARGET_NEON || TARGET_HAVE_MVE"
+  "TARGET_NEON"
 {
-  if (TARGET_NEON)
-    emit_insn (gen_neon_vst4 (operands[0], operands[1]));
-  else
-    emit_insn (gen_mve_vst4q (operands[0], operands[1]));
+  emit_insn (gen_neon_vst4 (operands[0], operands[1]));
+  DONE;
+})
+
+;;; On MVE we use V4xYYY modes instead of XI
+(define_expand "vec_store_lanes"
+  [(set (match_operand: 0 "neon_struct_operand")
+        (unspec: [(match_operand: 1 "s_register_operand")
+                  (unspec:MVE_VLD_ST [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+                 UNSPEC_VST4))]
+  "(TARGET_HAVE_MVE && VALID_MVE_STRUCT_MODE (mode))"
+{
+  emit_insn (gen_mve_vst4q (operands[0], operands[1]));
   DONE;
 })