From patchwork Fri Oct 18 14:22:13 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Robin Dapp X-Patchwork-Id: 1999196 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256 header.s=20230601 header.b=FpzWvcWd; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4XVRkW4cpzz1xvV for ; Sat, 19 Oct 2024 01:22:59 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id BD64D385841E for ; Fri, 18 Oct 2024 14:22:57 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-ed1-x534.google.com (mail-ed1-x534.google.com [IPv6:2a00:1450:4864:20::534]) by sourceware.org (Postfix) with ESMTPS id 9E40E3858D37 for ; Fri, 18 Oct 2024 14:22:24 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 9E40E3858D37 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 9E40E3858D37 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::534 ARC-Seal: i=1; 
a=rsa-sha256; d=sourceware.org; s=key; t=1729261349; cv=none; b=lC355l8MEIfbMo4EPhWTAABbigbbR9p/H2lZIfsQelFlZ0smxsj0ceOyZp6GYy12K2CfzvKREhl2KUo/c4F7cOlu2LaI9i+oSapDn/9+pja/wcFRrJVHdo1UC35lXlAr9MA2cwqodt5x+3lZ07ff7I/y3v8Uuq7A7Skui8KD+5A= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1729261349; c=relaxed/simple; bh=sgj4dbiNCasUojCaDOfHtFMAUzrlzSznuQCwvEqKLCc=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=VBxjnVf0Pt/Uy4L5ZwwEqJYs9hxXRUUxj8yOkiZL2LAeJshW/r2Ro6DbKh8w4o4jCu3M/VsjY5Mhf5gHqRXIFZ+kORn/5WIi9kFJpbBdUOrVBboQm6aU2ampvZ+Mh4zOjY/7r0aST3JmIBNFtiG6sWDs0vHo6y0+ls3bpPc82WE= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-ed1-x534.google.com with SMTP id 4fb4d7f45d1cf-5c9625cfe4dso2642152a12.0 for ; Fri, 18 Oct 2024 07:22:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1729261343; x=1729866143; darn=gcc.gnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=KhJO8uKERAhlVQtdkBzE2wZoVQVyy76Y+8GTqq/jovA=; b=FpzWvcWd40kQZgTF64XpmSIPZfuLbcCYWm6tuQmsVLkyOjo2LJ6hrMM/PNQOf5jPv0 3oBPO0sRxbNFoB+p90HkHHVGlKVq16x/iMmywmV2ySAWJZnPhBOkcSQDX4HuYfP8VLcF ngcJHWusp6XrH6ASisbaxla3Bu7HstIa2klsImXoWIA5Zt8Eqj5HN2E/4AujrFi1mHag cBCBrvRdKsrT+cP8i7ubjzC/jUrn0aRfkMIcsBVmnpVrn1Frue6fERs4VP9agis8mdU/ oTqk6vKF6f+6VX84Ueqjjtbm/a6GwcTbzUOg9EaNwiIDyHt/i9FDkNU80D+Ml8ZQaxCQ AHhg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1729261343; x=1729866143; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=KhJO8uKERAhlVQtdkBzE2wZoVQVyy76Y+8GTqq/jovA=; b=FT6v1v4Bi/bCUdh+r9Zy2NgWYo0xKn99FAJU9eP+Z1OuZ2EwFU74eOwzOgg3ITJ4Y0 of5jLylD4dvVTWpfZpqXo23AAiQoBO+bhbYSn67U8mVc9iIP71gXy2EI2t5rzy4xSIus 
yuLvbDsaqUXLF3qKJ3qM+nfqxUy6vzUbHggzgow0TWQUANNfFbBFF6aPv4CXTw8TDXKS n29zzQmlZ8WFV7IF+jda6vzlsr7noU5PTLOD9jfKFCQlpdDZtNBQYkMkuYS6YiJ81nbd ASTR3Sb7T8LCktoAUYtq6XpoKw0I+yZGlLKmHv+yFe71fXiBW/1Q4Whi5n5lEugHV02s +Jqw== X-Gm-Message-State: AOJu0YwhXsnmo6rOs+q6dMMmtcpVIEvHdTGYbgBMCi/Yywaj72grbumj +BUSmOcDi1ua2k6k2JMq36PjFn1oxCbnTkRAm4HonnordRBiMErxsLOXpA== X-Google-Smtp-Source: AGHT+IH5EVBfYdaUZhRM6eWp0rjt51U0jAHVQpIUtbeBYgbBT9WkXPy+GsfulNUisBUxG6V0lrorQQ== X-Received: by 2002:a17:907:86a5:b0:a99:77f0:51f7 with SMTP id a640c23a62f3a-a9a69ca375amr223915566b.61.1729261342517; Fri, 18 Oct 2024 07:22:22 -0700 (PDT) Received: from x1c10.dc1.ventanamicro.com (ip-149-172-150-237.um42.pools.vodafone-ip.de. [149.172.150.237]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-a9a68c2677esm102812166b.188.2024.10.18.07.22.21 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 18 Oct 2024 07:22:22 -0700 (PDT) From: Robin Dapp X-Google-Original-From: Robin Dapp To: gcc-patches@gcc.gnu.org Cc: rdapp.gcc@gmail.com, rguenther@suse.de, richard.sandiford@arm.com, jeffreyalaw@gmail.com, ams@baylibre.com Subject: [PATCH v2 1/8] docs: Document maskload else operand and behavior. 
Date: Fri, 18 Oct 2024 16:22:13 +0200
Message-ID: <20241018142220.173482-2-rdapp@ventanamicro.com>
X-Mailer: git-send-email 2.47.0
In-Reply-To: <20241018142220.173482-1-rdapp@ventanamicro.com>
References: <20241018142220.173482-1-rdapp@ventanamicro.com>
MIME-Version: 1.0
X-Spam-Status: No, score=-9.3 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0,
 RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham
 autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.30
Precedence: list
List-Id: Gcc-patches mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org

This patch amends the documentation for masked loads (maskload,
vec_mask_load_lanes, and mask_gather_load as well as their len
counterparts) with an else operand.

gcc/ChangeLog:

	* doc/md.texi: Document masked load else operand.
---
 gcc/doc/md.texi | 63 ++++++++++++++++++++++++++++++++-----------------
 1 file changed, 41 insertions(+), 22 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 603f74a78c0..632b036b36c 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5017,8 +5017,10 @@ This pattern is not allowed to @code{FAIL}.
 @item @samp{vec_mask_load_lanes@var{m}@var{n}}
 Like @samp{vec_load_lanes@var{m}@var{n}}, but takes an additional
 mask operand (operand 2) that specifies which elements of the destination
-vectors should be loaded. Other elements of the destination
-vectors are set to zero. The operation is equivalent to:
+vectors should be loaded. Other elements of the destination vectors are
+taken from operand 3, which is an else operand similar to the one in
+@code{maskload}.
+The operation is equivalent to:
 
 @smallexample
 int c = GET_MODE_SIZE (@var{m}) / GET_MODE_SIZE (@var{n});
@@ -5028,7 +5030,7 @@ for (j = 0; j < GET_MODE_NUNITS (@var{n}); j++)
       operand0[i][j] = operand1[j * c + i];
   else
     for (i = 0; i < c; i++)
-      operand0[i][j] = 0;
+      operand0[i][j] = operand3[j];
 @end smallexample
 
 This pattern is not allowed to @code{FAIL}.
@@ -5036,16 +5038,20 @@ This pattern is not allowed to @code{FAIL}.
 @cindex @code{vec_mask_len_load_lanes@var{m}@var{n}} instruction pattern
 @item @samp{vec_mask_len_load_lanes@var{m}@var{n}}
 Like @samp{vec_load_lanes@var{m}@var{n}}, but takes an additional
-mask operand (operand 2), length operand (operand 3) as well as bias operand (operand 4)
-that specifies which elements of the destination vectors should be loaded.
-Other elements of the destination vectors are undefined. The operation is equivalent to:
+mask operand (operand 2), length operand (operand 4) as well as bias operand
+(operand 5) that specifies which elements of the destination vectors should be
+loaded. Other elements of the destination vectors are taken from operand 3,
+which is an else operand similar to the one in @code{maskload}.
+The operation is equivalent to:
 
 @smallexample
 int c = GET_MODE_SIZE (@var{m}) / GET_MODE_SIZE (@var{n});
-for (j = 0; j < operand3 + operand4; j++)
-  if (operand2[j])
-    for (i = 0; i < c; i++)
+for (j = 0; j < operand4 + operand5; j++)
+  for (i = 0; i < c; i++)
+    if (operand2[j])
       operand0[i][j] = operand1[j * c + i];
+    else
+      operand0[i][j] = operand3[j];
 @end smallexample
 
 This pattern is not allowed to @code{FAIL}.
@@ -5125,18 +5131,25 @@ address width.
 @cindex @code{mask_gather_load@var{m}@var{n}} instruction pattern
 @item @samp{mask_gather_load@var{m}@var{n}}
 Like @samp{gather_load@var{m}@var{n}}, but takes an extra mask operand as
-operand 5. Bit @var{i} of the mask is set if element @var{i}
+operand 5.
+Other elements of the destination vectors are taken from operand 6,
+which is an else operand similar to the one in @code{maskload}.
+Bit @var{i} of the mask is set if element @var{i}
 of the result should be loaded from memory and clear if element @var{i}
-of the result should be set to zero.
+of the result should be set to operand 6.
 
 @cindex @code{mask_len_gather_load@var{m}@var{n}} instruction pattern
 @item @samp{mask_len_gather_load@var{m}@var{n}}
-Like @samp{gather_load@var{m}@var{n}}, but takes an extra mask operand (operand 5),
-a len operand (operand 6) as well as a bias operand (operand 7). Similar to mask_len_load,
-the instruction loads at most (operand 6 + operand 7) elements from memory.
+Like @samp{gather_load@var{m}@var{n}}, but takes an extra mask operand
+(operand 5) and an else operand (operand 6) as well as a len operand
+(operand 7) and a bias operand (operand 8).
+
+Similar to mask_len_load the instruction loads at
+most (operand 7 + operand 8) elements from memory.
 Bit @var{i} of the mask is set if element @var{i} of the result should
-be loaded from memory and clear if element @var{i} of the result should be undefined.
-Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
+be loaded from memory and clear if element @var{i} of the result should
+be set to element @var{i} of operand 6.
+Mask elements @var{i} with @var{i} > (operand 7 + operand 8) are ignored.
 
 @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
 @item @samp{scatter_store@var{m}@var{n}}
@@ -5368,8 +5381,13 @@ Operands 4 and 5 have a target-dependent scalar integer mode.
 @cindex @code{maskload@var{m}@var{n}} instruction pattern
 @item @samp{maskload@var{m}@var{n}}
 Perform a masked load of vector from memory operand 1 of mode @var{m}
-into register operand 0. Mask is provided in register operand 2 of
-mode @var{n}.
+into register operand 0. The mask is provided in register operand 2 of
+mode @var{n}. Operand 3 (the "else value") is of mode @var{m} and
+specifies which value is loaded when the mask is unset.
+The predicate of operand 3 must only accept the else values that the target
+actually supports. Currently three values are attempted, zero, -1, and
+undefined. GCC handles an else value of zero more efficiently than -1 or
+undefined.
 
 This pattern is not allowed to @code{FAIL}.
 
@@ -5435,15 +5453,16 @@ Operands 0 and 1 have mode @var{m}, which must be a vector mode. Operand 3
 has whichever integer mode the target prefers. A mask is specified in
 operand 2 which must be of type @var{n}. The mask has lower precedence than
 the length and is itself subject to length masking,
-i.e. only mask indices < (operand 3 + operand 4) are used.
+i.e. only mask indices < (operand 4 + operand 5) are used.
+Operand 3 is an else operand similar to the one in @code{maskload}.
 Operand 4 conceptually has mode @code{QI}.
 
-Operand 2 can be a variable or a constant amount. Operand 4 specifies a
+Operand 4 can be a variable or a constant amount. Operand 5 specifies a
 constant bias: it is either a constant 0 or a constant -1. The predicate on
-operand 4 must only accept the bias values that the target actually supports.
+operand 5 must only accept the bias values that the target actually supports.
 GCC handles a bias of 0 more efficiently than a bias of -1.
 
-If (operand 2 + operand 4) exceeds the number of elements in mode
+If (operand 4 + operand 5) exceeds the number of elements in mode
 @var{m}, the behavior is undefined.
 
If the target prefers the length to be measured in bytes From patchwork Fri Oct 18 14:22:14 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Robin Dapp X-Patchwork-Id: 1999206 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256 header.s=20230601 header.b=NZFMLxIN; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4XVRm41gTFz1xwG for ; Sat, 19 Oct 2024 01:24:19 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id BB1023858C2B for ; Fri, 18 Oct 2024 14:24:15 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-ed1-x535.google.com (mail-ed1-x535.google.com [IPv6:2a00:1450:4864:20::535]) by sourceware.org (Postfix) with ESMTPS id 1674E3858C42 for ; Fri, 18 Oct 2024 14:22:27 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 1674E3858C42 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 1674E3858C42 Authentication-Results: server2.sourceware.org; arc=none 
smtp.remote-ip=2a00:1450:4864:20::535 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1729261350; cv=none; b=oy0GEQLGl+RCsSQMC84CFpNp56SaaRLrmJubfaHXFDlZNeabAcaxLZZLEptomJ6NaGLO83LRWmGk+oFXFskUme0p5WZqsIA1jm+UJ0tm1nSkuxKJ/wiB3gvOkMTRypFf/8PzZZcJTZNen22CfINAfr8Cf2Y2IhXK1Mdp7MV2i+c= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1729261350; c=relaxed/simple; bh=cJMyerBZ5Wlu7iRlnIZMw39D6ThI9e6nMEoKr5TaS48=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=fPXietky320Z/+2h8hmFZ2qG0XKnxazMba7lygV/f5EJcG5O/BIEYfFE1FDgdUggtJBGmRKNR8vkzMb+ZtjWk8DPOOO6fZYh5MFMcFbmrFNXQlMpUbNp1i9Ov5FATATbzKsJwXDZG7+b1KMCFoKe0iS4KCKYbeJPR5eQ+FHgD1Y= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-ed1-x535.google.com with SMTP id 4fb4d7f45d1cf-5c9362c26d8so5406551a12.1 for ; Fri, 18 Oct 2024 07:22:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1729261344; x=1729866144; darn=gcc.gnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=qNoKdapSoWx0NMJKdzii9Ptoe/Ytijh7P6iTVOCnuOY=; b=NZFMLxINcdbLzdQRnE9vjxrt9t2hQAIzGGMXA/ABbth+1IxFfcp8hcZyOTmouN0hVe t1RX+WCJG6XudwrEWbhg73InHmT23KnQHHLcSutNmflZ3mKN+mUjnl41g+kAs/3IDPN2 PINenXYYliV9USZz3b6aHN6xAjxAvbvpasNJmr3jrdkxMLl/haL0QK8fVpQeewumsBqu gh1kQke6xpW3eyOYOW6/yAOg9qshI/QtZeclENit0g+L5w1HkiyKPrW9/xLJ1j8NoQI8 wj/pbVqPIcC8tMWXA4YY+kFmVEGYIhoLpNjg5WKvOpgqV7t7U3INl7/a4V9E5KHqb2sG eMgQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1729261344; x=1729866144; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=qNoKdapSoWx0NMJKdzii9Ptoe/Ytijh7P6iTVOCnuOY=; b=gm+jjupFHD2f5WVGFO5vOqTs5j2CHBP+E+caivKXIDNhgqyRkXtpziJ3XomgGYIemw 
w7AEVGU64i3HwF2Ts87EyvMbpp/GuwnjWeMyQEsq3FfZbQKPS7UMCr2hwESJMeiaQMBn q0Fl6ereXz/VG/q+W40v2vQariQfYcK4kOtYnyv9xn5lHJYYnx1B7p10emxvqeIUSB01 qq+2o9yYI91EsSj8JaB2Fem7zaaiwAz1RJ2M1JRwdQepwkV/oN3i/IgdL4U46wtrtc/+ ONvsQEPglCXi2/2p04xGpIHsOX5hn3+GacdOIdRzg+d8TG3KGOQgN1TpJuXmJez4g4F6 0noQ== X-Gm-Message-State: AOJu0Yy5qhw5mvhMtwR7+ktoz72J8YpyApkB9tz6aNHgd7cDd7UWFM4H loqTni2id/IRa2l9oofrKhuScbW/Fh5kffNXI/Vxd16VeFF7TBN7+jqbUw== X-Google-Smtp-Source: AGHT+IF5UB48tBcVYUfy9zq4LPEIh5TIELdE+DGydEjQ0MKM5Z7sGss3W6UDyr4Dqv547uQP7WnrTg== X-Received: by 2002:a17:906:c14c:b0:a9a:3da4:f42f with SMTP id a640c23a62f3a-a9a6a3eb48bmr296006266b.7.1729261343337; Fri, 18 Oct 2024 07:22:23 -0700 (PDT) Received: from x1c10.dc1.ventanamicro.com (ip-149-172-150-237.um42.pools.vodafone-ip.de. [149.172.150.237]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-a9a68c2677esm102812166b.188.2024.10.18.07.22.22 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 18 Oct 2024 07:22:22 -0700 (PDT) From: Robin Dapp X-Google-Original-From: Robin Dapp To: gcc-patches@gcc.gnu.org Cc: rdapp.gcc@gmail.com, rguenther@suse.de, richard.sandiford@arm.com, jeffreyalaw@gmail.com, ams@baylibre.com Subject: [PATCH v2 2/8] ifn: Add else-operand handling. 
Date: Fri, 18 Oct 2024 16:22:14 +0200
Message-ID: <20241018142220.173482-3-rdapp@ventanamicro.com>
X-Mailer: git-send-email 2.47.0
In-Reply-To: <20241018142220.173482-1-rdapp@ventanamicro.com>
References: <20241018142220.173482-1-rdapp@ventanamicro.com>
MIME-Version: 1.0
X-Spam-Status: No, score=-9.3 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0,
 RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham
 autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.30
Precedence: list
List-Id: Gcc-patches mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org

This patch adds else-operand handling to the internal functions.

gcc/ChangeLog:

	* internal-fn.cc (add_mask_and_len_args): Rename...
	(add_mask_else_and_len_args): ...to this and add else handling.
	(expand_partial_load_optab_fn): Use adjusted function.
	(expand_partial_store_optab_fn): Ditto.
	(expand_scatter_store_optab_fn): Ditto.
	(expand_gather_load_optab_fn): Ditto.
	(internal_fn_len_index): Add else handling.
	(internal_fn_else_index): Ditto.
	(internal_fn_mask_index): Ditto.
	(get_supported_else_vals): New function.
	(supported_else_val_p): New function.
	(internal_gather_scatter_fn_supported_p): Add else operand.
	* internal-fn.h (internal_gather_scatter_fn_supported_p): Define else
	constants.
	(MASK_LOAD_ELSE_ZERO): Ditto.
	(MASK_LOAD_ELSE_M1): Ditto.
	(MASK_LOAD_ELSE_UNDEFINED): Ditto.
	(get_supported_else_vals): Declare.
	(supported_else_val_p): Ditto.
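
As a reading aid only (not part of the patch), here is a minimal scalar
sketch of the else-operand semantics that the expanded masked loads are
documented to have: an active lane is read from memory, an inactive lane
takes the corresponding lane of the else operand. The name model_maskload
and the use of int lanes are assumptions made for illustration; nothing
below is a GCC function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar model of a masked load with an else operand (illustrative only;
// model_maskload is a hypothetical name, not part of GCC).
// Lane i is read from MEM when MASK[i] is set; otherwise it takes lane i
// of the else operand ELS.
std::vector<int>
model_maskload (const std::vector<int> &mem,
                const std::vector<bool> &mask,
                const std::vector<int> &els)
{
  // All three operands are assumed to have the same number of lanes.
  assert (mem.size () == mask.size () && mask.size () == els.size ());

  std::vector<int> result (mem.size ());
  for (std::size_t i = 0; i < mem.size (); i++)
    result[i] = mask[i] ? mem[i] : els[i];
  return result;
}
```

With an all-zero else operand this models the zero else value, and an
all-ones (-1) else operand models the -1 case. The undefined else value
has no faithful scalar model, since the inactive lanes may then hold
anything.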
---
 gcc/internal-fn.cc | 131 +++++++++++++++++++++++++++++++++++++++------
 gcc/internal-fn.h | 15 +++++-
 2 files changed, 129 insertions(+), 17 deletions(-)

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index d89a04fe412..b6049cec91e 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -331,17 +331,18 @@ get_multi_vector_move (tree array_type, convert_optab optab)
   return convert_optab_handler (optab, imode, vmode);
 }
 
-/* Add mask and len arguments according to the STMT. */
+/* Add mask, else, and len arguments according to the STMT. */
 
 static unsigned int
-add_mask_and_len_args (expand_operand *ops, unsigned int opno, gcall *stmt)
+add_mask_else_and_len_args (expand_operand *ops, unsigned int opno, gcall *stmt)
 {
   internal_fn ifn = gimple_call_internal_fn (stmt);
   int len_index = internal_fn_len_index (ifn);
   /* BIAS is always consecutive next of LEN. */
   int bias_index = len_index + 1;
   int mask_index = internal_fn_mask_index (ifn);
-  /* The order of arguments are always {len,bias,mask}. */
+
+  /* The order of arguments is always {mask, else, len, bias}. */
   if (mask_index >= 0)
     {
       tree mask = gimple_call_arg (stmt, mask_index);
@@ -362,6 +363,23 @@ add_mask_and_len_args (expand_operand *ops, unsigned int opno, gcall *stmt)
 
       create_input_operand (&ops[opno++], mask_rtx,
                             TYPE_MODE (TREE_TYPE (mask)));
+
+    }
+
+  int els_index = internal_fn_else_index (ifn);
+  if (els_index >= 0)
+    {
+      tree els = gimple_call_arg (stmt, els_index);
+      tree els_type = TREE_TYPE (els);
+      if (TREE_CODE (els) == SSA_NAME
+          && SSA_NAME_IS_DEFAULT_DEF (els)
+          && VAR_P (SSA_NAME_VAR (els)))
+        create_undefined_input_operand (&ops[opno++], TYPE_MODE (els_type));
+      else
+        {
+          rtx els_rtx = expand_normal (els);
+          create_input_operand (&ops[opno++], els_rtx, TYPE_MODE (els_type));
+        }
     }
   if (len_index >= 0)
     {
@@ -3014,7 +3032,7 @@ static void
 expand_partial_load_optab_fn (internal_fn ifn, gcall *stmt, convert_optab optab)
 {
   int i = 0;
-  class expand_operand ops[5];
+  class expand_operand ops[6];
   tree type, lhs, rhs, maskt;
   rtx mem, target;
   insn_code icode;
@@ -3044,7 +3062,7 @@ expand_partial_load_optab_fn (internal_fn ifn, gcall *stmt, convert_optab optab)
   target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
   create_call_lhs_operand (&ops[i++], target, TYPE_MODE (type));
   create_fixed_operand (&ops[i++], mem);
-  i = add_mask_and_len_args (ops, i, stmt);
+  i = add_mask_else_and_len_args (ops, i, stmt);
   expand_insn (icode, i, ops);
 
   assign_call_lhs (lhs, target, &ops[0]);
@@ -3090,7 +3108,7 @@ expand_partial_store_optab_fn (internal_fn ifn, gcall *stmt, convert_optab optab
   reg = expand_normal (rhs);
   create_fixed_operand (&ops[i++], mem);
   create_input_operand (&ops[i++], reg, TYPE_MODE (type));
-  i = add_mask_and_len_args (ops, i, stmt);
+  i = add_mask_else_and_len_args (ops, i, stmt);
   expand_insn (icode, i, ops);
 }
 
@@ -3676,7 +3694,7 @@ expand_scatter_store_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
   create_integer_operand (&ops[i++], TYPE_UNSIGNED (TREE_TYPE (offset)));
   create_integer_operand (&ops[i++], scale_int);
   create_input_operand (&ops[i++], rhs_rtx, TYPE_MODE (TREE_TYPE (rhs)));
-  i = add_mask_and_len_args (ops, i, stmt);
+  i = add_mask_else_and_len_args (ops, i, stmt);
 
   insn_code icode = convert_optab_handler (optab, TYPE_MODE (TREE_TYPE (rhs)),
                                            TYPE_MODE (TREE_TYPE (offset)));
@@ -3705,7 +3723,7 @@ expand_gather_load_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
   create_input_operand (&ops[i++], offset_rtx, TYPE_MODE (TREE_TYPE (offset)));
   create_integer_operand (&ops[i++], TYPE_UNSIGNED (TREE_TYPE (offset)));
   create_integer_operand (&ops[i++], scale_int);
-  i = add_mask_and_len_args (ops, i, stmt);
+  i = add_mask_else_and_len_args (ops, i, stmt);
   insn_code icode = convert_optab_handler (optab, TYPE_MODE (TREE_TYPE (lhs)),
                                            TYPE_MODE (TREE_TYPE (offset)));
   expand_insn (icode, i, ops);
@@ -4600,6 +4618,18 @@ get_len_internal_fn (internal_fn fn)
   case IFN_COND_##NAME: \
     return IFN_COND_LEN_##NAME;
 #include "internal-fn.def"
+    default:
+      break;
+    }
+
+  switch (fn)
+    {
+    case IFN_MASK_LOAD:
+      return IFN_MASK_LEN_LOAD;
+    case IFN_MASK_LOAD_LANES:
+      return IFN_MASK_LEN_LOAD_LANES;
+    case IFN_MASK_GATHER_LOAD:
+      return IFN_MASK_LEN_GATHER_LOAD;
     default:
       return IFN_LAST;
     }
@@ -4785,8 +4815,12 @@ internal_fn_len_index (internal_fn fn)
     case IFN_LEN_STORE:
       return 2;
 
-    case IFN_MASK_LEN_GATHER_LOAD:
     case IFN_MASK_LEN_SCATTER_STORE:
+      return 5;
+
+    case IFN_MASK_LEN_GATHER_LOAD:
+      return 6;
+
     case IFN_COND_LEN_FMA:
     case IFN_COND_LEN_FMS:
     case IFN_COND_LEN_FNMA:
@@ -4811,13 +4845,15 @@ internal_fn_len_index (internal_fn fn)
       return 4;
 
     case IFN_COND_LEN_NEG:
-    case IFN_MASK_LEN_LOAD:
     case IFN_MASK_LEN_STORE:
-    case IFN_MASK_LEN_LOAD_LANES:
     case IFN_MASK_LEN_STORE_LANES:
     case IFN_VCOND_MASK_LEN:
       return 3;
 
+    case IFN_MASK_LEN_LOAD:
+    case IFN_MASK_LEN_LOAD_LANES:
+      return 4;
+
     default:
       return -1;
     }
@@ -4867,6 +4903,12 @@ internal_fn_else_index (internal_fn fn)
     case IFN_COND_LEN_SHR:
       return 3;
 
+    case IFN_MASK_LOAD:
+    case IFN_MASK_LEN_LOAD:
+    case IFN_MASK_LOAD_LANES:
+    case IFN_MASK_LEN_LOAD_LANES:
+      return 3;
+
     case IFN_COND_FMA:
     case IFN_COND_FMS:
     case IFN_COND_FNMA:
@@ -4877,6 +4919,10 @@ internal_fn_else_index (internal_fn fn)
     case IFN_COND_LEN_FNMS:
       return 4;
 
+    case IFN_MASK_GATHER_LOAD:
+    case IFN_MASK_LEN_GATHER_LOAD:
+      return 5;
+
     default:
       return -1;
     }
@@ -4908,6 +4954,7 @@ internal_fn_mask_index (internal_fn fn)
     case IFN_MASK_LEN_SCATTER_STORE:
       return 4;
 
+    case IFN_VCOND_MASK:
     case IFN_VCOND_MASK_LEN:
       return 0;
 
@@ -4944,6 +4991,50 @@ internal_fn_stored_value_index (internal_fn fn)
     }
 }
 
+
+/* Push all supported else values for the optab referred to by ICODE
+   into ELSE_VALS. The index of the else operand must be specified in
+   ELSE_INDEX. */
+
+void
+get_supported_else_vals (enum insn_code icode, unsigned else_index,
+                         auto_vec &else_vals)
+{
+  const struct insn_data_d *data = &insn_data[icode];
+  if ((char)else_index >= data->n_operands)
+    return;
+
+  machine_mode else_mode = data->operand[else_index].mode;
+
+  /* For now we only support else values of 0, -1, and "undefined". */
+  if (insn_operand_matches (icode, else_index, CONST0_RTX (else_mode)))
+    else_vals.safe_push (MASK_LOAD_ELSE_ZERO);
+
+  if (insn_operand_matches (icode, else_index, gen_rtx_SCRATCH (else_mode)))
+    else_vals.safe_push (MASK_LOAD_ELSE_UNDEFINED);
+
+  if (GET_MODE_CLASS (else_mode) == MODE_VECTOR_INT
+      && insn_operand_matches (icode, else_index, CONSTM1_RTX (else_mode)))
+    else_vals.safe_push (MASK_LOAD_ELSE_M1);
+}
+
+/* Return true if the else value ELSE_VAL (one of MASK_LOAD_ELSE_ZERO,
+   MASK_LOAD_ELSE_M1, and MASK_LOAD_ELSE_UNDEFINED) is valid for the optab
+   referred to by ICODE. The index of the else operand must be specified
+   in ELSE_INDEX. */
+
+bool
+supported_else_val_p (enum insn_code icode, unsigned else_index, int else_val)
+{
+  if (else_val != MASK_LOAD_ELSE_ZERO && else_val != MASK_LOAD_ELSE_M1
+      && else_val != MASK_LOAD_ELSE_UNDEFINED)
+    __builtin_unreachable ();
+
+  auto_vec else_vals;
+  get_supported_else_vals (icode, else_index, else_vals);
+  return else_vals.contains (else_val);
+}
+
 /* Return true if the target supports gather load or scatter store function
    IFN. For loads, VECTOR_TYPE is the vector type of the load result,
    while for stores it is the vector type of the stored data argument.
@@ -4951,12 +5042,15 @@ internal_fn_stored_value_index (internal_fn fn)
    or stored. OFFSET_VECTOR_TYPE is the vector type that holds the
    offset from the shared base address of each loaded or stored element.
    SCALE is the amount by which these offsets should be multiplied
-   *after* they have been extended to address width. */
+   *after* they have been extended to address width.
+   If the target supports the gather load the supported else values
+   will be added to the vector ELSVAL points to if it is nonzero. */
 
 bool
 internal_gather_scatter_fn_supported_p (internal_fn ifn, tree vector_type,
                                         tree memory_element_type,
-                                        tree offset_vector_type, int scale)
+                                        tree offset_vector_type, int scale,
+                                        auto_vec *elsvals)
 {
   if (!tree_int_cst_equal (TYPE_SIZE (TREE_TYPE (vector_type)),
                            TYPE_SIZE (memory_element_type)))
@@ -4969,9 +5063,14 @@ internal_gather_scatter_fn_supported_p (internal_fn ifn, tree vector_type,
                                            TYPE_MODE (offset_vector_type));
   int output_ops = internal_load_fn_p (ifn) ? 1 : 0;
   bool unsigned_p = TYPE_UNSIGNED (TREE_TYPE (offset_vector_type));
-  return (icode != CODE_FOR_nothing
-          && insn_operand_matches (icode, 2 + output_ops, GEN_INT (unsigned_p))
-          && insn_operand_matches (icode, 3 + output_ops, GEN_INT (scale)));
+  bool ok = icode != CODE_FOR_nothing
+    && insn_operand_matches (icode, 2 + output_ops, GEN_INT (unsigned_p))
+    && insn_operand_matches (icode, 3 + output_ops, GEN_INT (scale));
+
+  if (ok && elsvals)
+    get_supported_else_vals (icode, MASK_LOAD_GATHER_ELSE_IDX, *elsvals);
+
+  return ok;
 }
 
 /* Return true if the target supports IFN_CHECK_{RAW,WAR}_PTRS function IFN
diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
index 2785a5a95a2..11bad4e5ed9 100644
--- a/gcc/internal-fn.h
+++ b/gcc/internal-fn.h
@@ -240,9 +240,22 @@ extern int internal_fn_len_index (internal_fn);
 extern int internal_fn_else_index (internal_fn);
 extern int internal_fn_stored_value_index (internal_fn);
 extern bool internal_gather_scatter_fn_supported_p (internal_fn, tree,
-                                                    tree, tree, int);
+                                                    tree, tree, int,
+                                                    auto_vec * = nullptr);
 extern bool internal_check_ptrs_fn_supported_p (internal_fn, tree,
                                                 poly_uint64, unsigned int);
+
+/* Integer constants representing which else value is supported for masked load
+   functions.
*/ +#define MASK_LOAD_ELSE_ZERO -1 +#define MASK_LOAD_ELSE_M1 -2 +#define MASK_LOAD_ELSE_UNDEFINED -3 + +#define MASK_LOAD_GATHER_ELSE_IDX 6 +extern void get_supported_else_vals (enum insn_code, unsigned, + auto_vec &); +extern bool supported_else_val_p (enum insn_code, unsigned, int); + #define VECT_PARTIAL_BIAS_UNSUPPORTED 127 extern signed char internal_len_load_store_bias (internal_fn ifn, From patchwork Fri Oct 18 14:22:15 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Robin Dapp X-Patchwork-Id: 1999205 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256 header.s=20230601 header.b=kuvAUJBt; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4XVRm30J19z1xvV for ; Sat, 19 Oct 2024 01:24:19 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 8627C3858430 for ; Fri, 18 Oct 2024 14:24:12 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-ej1-x632.google.com (mail-ej1-x632.google.com [IPv6:2a00:1450:4864:20::632]) by sourceware.org (Postfix) with ESMTPS id 434983858C48 for ; Fri, 18 Oct 2024 14:22:27 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 
sourceware.org 434983858C48 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 434983858C48 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::632 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1729261354; cv=none; b=WItv7nsuQiR3MwYImkR6aRbPLZ4bJwOENhu2Zp76Aoq3Sgz7R8jnX8YP7YgUC+KA2vpgOwoXLytpd73QyUwvGNAlnPG+54HFAk851OWQHZ1eWlGdcGJRunV+XMeEOegOh77c6rBSa3rk1A2LNQcxNMs28cW8+0fLyNtVGjMKX0s= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1729261354; c=relaxed/simple; bh=5lTZbFhJdHMgcDiODSFoe1aHaCLj/GXqrw6Rj5OivdM=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=gWChb9Jqldogui9pTF3rkmtRVWAJizFVv2jCBUesCzHwD7F/it4Tl7qd4gYwnDY2rtGcytSp4r48SUQHI2FhW5HkOQbdn3fSVIZqnEVuCdGG5n9GB0VEpTWzLq8GmYOb6Rihq5QrKbW6gjjW1kWfPaUOTdHc8ZKQJQNTJ3cdsSk= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-ej1-x632.google.com with SMTP id a640c23a62f3a-a9a68480164so99054366b.3 for ; Fri, 18 Oct 2024 07:22:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1729261345; x=1729866145; darn=gcc.gnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=WrxsIIY9b4olEl4/98R12txGi5ou4nZfWBerbC+6ExY=; b=kuvAUJBtaKyWCOdxINW3b3RfCn2Wpm2egYP3s42H7anR8vZ8jfrYiUKWGrhPYnnY8n y693UP+k3/98WSM3A95sQzukUkozkffnYJFnkvdNdW+oDD9qq448VggmtcC49SQLKRbL lncde7/kgBUZF5LUMpcxRIJZFfJrmuOHHNSaNPw+FqAAoC8PGqRLa7xL6SzrDZLlK4yo /v/r2j26uupc38m8JvxpZWwGaCnE+umvzbe3CYnYLtVTBy9g/AzmGwKzgyFTS8p4TYm/ tVfV6FQmS8YcAvXW7gmsomrbHM08oQ7cY7QH8EWyEvlGnqWXGcKI4q+Jun1gJpqTQPPm lRew== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1729261345; x=1729866145; 
h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=WrxsIIY9b4olEl4/98R12txGi5ou4nZfWBerbC+6ExY=; b=H+rsOPWXMsdl6s9Zl4wdy0dTseVSCVDMd8nq/eJ2qq479JBt+R9y+dP8BciaIRWGDb BGqAn96vr7Z0RtnCwVeER4f4AM3ps4m3oBIX47S+DfjzOfNKDr66nIm2nX5WxfD5Gf5e apDZSRW3IdM6CLKqpV7BMI2EPi1l4c92dU5G/htMZ9UWP7+iWmy3WssyYIgFc19EdboZ wqpDLa3yNRkO33BrxSHehwNk8jD41rN3P9MwjKW3LF1DbUG7llw/yGzRFqs3jhjwZkql E0nUKxvfXo3fUuco9Vuz60/zZZ90nMB326rWoLdrOMbmv6Gdxgea3j6BjHLVeN5a3LqN l7PQ== X-Gm-Message-State: AOJu0Yw9tq4ZMOVlm3WKipMF54LQkM47PLzGcWcnOxr5c8pbUtV439UJ onH7J9U0jh6OGdDmnJmZpOkOZpTtyyLhse9dhENInqf6KemURFyB7afZlA== X-Google-Smtp-Source: AGHT+IEeYkTcsrtMsqbI/CnYeRMH/amiQqUKV/QkOGakHhKa1k1k7jdI7qpZIYOVU69GpKAP+YjjOw== X-Received: by 2002:a17:907:94d4:b0:a99:f56e:ce40 with SMTP id a640c23a62f3a-a9a69c9e9c1mr258739466b.47.1729261344161; Fri, 18 Oct 2024 07:22:24 -0700 (PDT) Received: from x1c10.dc1.ventanamicro.com (ip-149-172-150-237.um42.pools.vodafone-ip.de. [149.172.150.237]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-a9a68c2677esm102812166b.188.2024.10.18.07.22.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 18 Oct 2024 07:22:23 -0700 (PDT) From: Robin Dapp X-Google-Original-From: Robin Dapp To: gcc-patches@gcc.gnu.org Cc: rdapp.gcc@gmail.com, rguenther@suse.de, richard.sandiford@arm.com, jeffreyalaw@gmail.com, ams@baylibre.com Subject: [PATCH v2 3/8] tree-ifcvt: Enforce zero else value after maskload. 
Date: Fri, 18 Oct 2024 16:22:15 +0200 Message-ID: <20241018142220.173482-4-rdapp@ventanamicro.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20241018142220.173482-1-rdapp@ventanamicro.com> References: <20241018142220.173482-1-rdapp@ventanamicro.com> MIME-Version: 1.0 X-Spam-Status: No, score=-9.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org When predicating a load we implicitly assume that the else value is zero. This matters when the loaded value is padded (e.g. a Bool), because we must then ensure that the padding bytes are zero on targets that don't implicitly zero inactive elements. To formalize this, this patch queries the target for its supported else operand and uses that for the maskload call. Subsequently, if the else operand is nonzero, a cond_expr enforcing a zero else value is emitted. gcc/ChangeLog: * tree-if-conv.cc (predicate_load_or_store): Enforce zero else value for padded types. (predicate_statements): Use sequence instead of statement. --- gcc/tree-if-conv.cc | 112 +++++++++++++++++++++++++++++++++++++------- 1 file changed, 94 insertions(+), 18 deletions(-) diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc index 90c754a4814..9623426e1e1 100644 --- a/gcc/tree-if-conv.cc +++ b/gcc/tree-if-conv.cc @@ -2531,12 +2531,15 @@ mask_exists (int size, const vec &vec) /* Helper function for predicate_statements. STMT is a memory read or write and it needs to be predicated by MASK. Return a statement - that does so. 
SSA_NAMES is the set of SSA names defined earlier in + STMT's block. */ -static gimple * -predicate_load_or_store (gimple_stmt_iterator *gsi, gassign *stmt, tree mask) +static gimple_seq +predicate_load_or_store (gimple_stmt_iterator *gsi, gassign *stmt, tree mask, + hash_set *ssa_names) { - gcall *new_stmt; + gimple_seq stmts = NULL; + gcall *call_stmt; tree lhs = gimple_assign_lhs (stmt); tree rhs = gimple_assign_rhs1 (stmt); @@ -2552,21 +2555,88 @@ predicate_load_or_store (gimple_stmt_iterator *gsi, gassign *stmt, tree mask) ref); if (TREE_CODE (lhs) == SSA_NAME) { - new_stmt - = gimple_build_call_internal (IFN_MASK_LOAD, 3, addr, - ptr, mask); - gimple_call_set_lhs (new_stmt, lhs); - gimple_set_vuse (new_stmt, gimple_vuse (stmt)); + /* Get the preferred vector mode and its corresponding mask for the + masked load. We need this to query the target's supported else + operands. */ + machine_mode mode = TYPE_MODE (TREE_TYPE (lhs)); + scalar_mode smode = as_a (mode); + + machine_mode vmode = targetm.vectorize.preferred_simd_mode (smode); + machine_mode mask_mode + = targetm.vectorize.get_mask_mode (vmode).require (); + + auto_vec elsvals; + internal_fn ifn; + bool have_masked_load + = target_supports_mask_load_store_p (vmode, mask_mode, true, &ifn, + &elsvals); + + /* We might need to explicitly zero inactive elements if there are + padding bits in the type that might leak otherwise. + Refer to PR115336. */ + bool need_zero + = TYPE_PRECISION (TREE_TYPE (lhs)) < GET_MODE_PRECISION (smode); + + int elsval; + bool implicit_zero = false; + if (have_masked_load) + { + gcc_assert (elsvals.length ()); + + /* But not if the target already provide implicit zeroing of inactive + elements. */ + implicit_zero = elsvals.contains (MASK_LOAD_ELSE_ZERO); + + /* For now, just use the first else value if zero is unsupported. */ + elsval = implicit_zero ? 
MASK_LOAD_ELSE_ZERO : *elsvals.begin (); + } + else + { + /* We cannot vectorize this either way so just use a zero even + if it is unsupported. */ + elsval = MASK_LOAD_ELSE_ZERO; + } + + tree els = vect_get_mask_load_else (elsval, TREE_TYPE (lhs)); + + call_stmt + = gimple_build_call_internal (IFN_MASK_LOAD, 4, addr, + ptr, mask, els); + + /* Build the load call and, if the else value is nonzero, + a COND_EXPR that enforces it. */ + tree loadlhs; + if (!need_zero || implicit_zero) + gimple_call_set_lhs (call_stmt, gimple_get_lhs (stmt)); + else + { + loadlhs = make_temp_ssa_name (TREE_TYPE (lhs), NULL, "_ifc_"); + ssa_names->add (loadlhs); + gimple_call_set_lhs (call_stmt, loadlhs); + } + gimple_set_vuse (call_stmt, gimple_vuse (stmt)); + gimple_seq_add_stmt (&stmts, call_stmt); + + if (need_zero && !implicit_zero) + { + tree cond_rhs + = fold_build_cond_expr (TREE_TYPE (loadlhs), mask, loadlhs, + build_zero_cst (TREE_TYPE (loadlhs))); + gassign *cond_stmt + = gimple_build_assign (gimple_get_lhs (stmt), cond_rhs); + gimple_seq_add_stmt (&stmts, cond_stmt); + } } else { - new_stmt + call_stmt = gimple_build_call_internal (IFN_MASK_STORE, 4, addr, ptr, mask, rhs); - gimple_move_vops (new_stmt, stmt); + gimple_move_vops (call_stmt, stmt); + gimple_seq_add_stmt (&stmts, call_stmt); } - gimple_call_set_nothrow (new_stmt, true); - return new_stmt; + gimple_call_set_nothrow (call_stmt, true); + return stmts; } /* STMT uses OP_LHS. 
Check whether it is equivalent to: @@ -2836,7 +2906,6 @@ predicate_statements (loop_p loop) { tree lhs = gimple_assign_lhs (stmt); tree mask; - gimple *new_stmt; gimple_seq stmts = NULL; machine_mode mode = TYPE_MODE (TREE_TYPE (lhs)); /* We checked before setting GF_PLF_2 that an equivalent @@ -2870,11 +2939,18 @@ predicate_statements (loop_p loop) vect_masks.safe_push (mask); } if (gimple_assign_single_p (stmt)) - new_stmt = predicate_load_or_store (&gsi, stmt, mask); - else - new_stmt = predicate_rhs_code (stmt, mask, cond, &ssa_names); + { + gimple_seq call_seq + = predicate_load_or_store (&gsi, stmt, mask, &ssa_names); - gsi_replace (&gsi, new_stmt, true); + gsi_replace_with_seq (&gsi, call_seq, true); + } + else + { + gimple *new_stmt; + new_stmt = predicate_rhs_code (stmt, mask, cond, &ssa_names); + gsi_replace (&gsi, new_stmt, true); + } } else if (((lhs = gimple_assign_lhs (stmt)), true) && (INTEGRAL_TYPE_P (TREE_TYPE (lhs)) From patchwork Fri Oct 18 14:22:16 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Robin Dapp X-Patchwork-Id: 1999199 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256 header.s=20230601 header.b=GNY+U3F5; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with 
ESMTPS id 4XVRkr5clBz1xwD for ; Sat, 19 Oct 2024 01:23:16 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 092B43857C58 for ; Fri, 18 Oct 2024 14:23:15 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-ej1-x632.google.com (mail-ej1-x632.google.com [IPv6:2a00:1450:4864:20::632]) by sourceware.org (Postfix) with ESMTPS id 415FC3858C53 for ; Fri, 18 Oct 2024 14:22:28 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 415FC3858C53 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 415FC3858C53 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::632 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1729261354; cv=none; b=Nl7wGb5x2xUCubx7O9xBjwZNyoR1gF5Le2SWYuqpQ2kXBphpObJntbZu9nk1OkNKxCG7QLsMe4zvlb0oSEQJW8wzXjGkcARrmlN3oeskRMrhg7qunUdQBnsDzZlJQlyMgkcYhvM+2s1Lzo9HZzTNa67wgm4chXIKx441PB1knOI= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1729261354; c=relaxed/simple; bh=Mn6gPzNwfBAu9Rh3L995WeTGezpqEqdk9LDM6wwtbYo=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=G30NZ4XndWNNph36A+dSLYXK0yxY8GMu1onvBzIsM2H15zQisyUBFW7GUlEFwBkctB2lWV1CUEvnQHoP18jQPbGiyl3C3o8BnxfIztlqGU5CYkzl9CsOlE2tEuhQ+lLtpmNK9OzpVS9+UpDj9ZTbw3s7B2SiB/Hy/wEsbPuhTXY= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-ej1-x632.google.com with SMTP id a640c23a62f3a-a9a0c7abaa6so243963966b.2 for ; Fri, 18 Oct 2024 07:22:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1729261346; x=1729866146; darn=gcc.gnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; 
bh=ht57u8R2oe3M271GOGID18pyHFrie0UNQRGNCwVmd0A=; b=GNY+U3F59vqEwNzhROkFtTBXEXC8YXYfSlQWSXgGPBE27YOxUkgK5FCiUukHfOHtH+ YTbixIgZAnQbFALGDyUkpZKzuVmdT/5BX3aoCcNkmiEpDHvQvuMwAMlO3alGL6CeJyfO 7NpK3a9GJkEImDK4iSHvBaVBnDI/BlU998pWHGeMcfoTpDbbzWBfqm8RLVUpz/M3Te3P GXHE+lk+oTo/VHtEv1ZkAUHLex5IyUPzOA/jFzMZPUG006sDa0NC0WkZpHM392O8k/dl V4Y6RwRgraT9G6FdfUTmk0gJJreHhMbi/vm09R82pBVfMLwYl5iGos8Uxn0qAeyT35ls +CEg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1729261346; x=1729866146; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=ht57u8R2oe3M271GOGID18pyHFrie0UNQRGNCwVmd0A=; b=PqufT/7UMy+XUsyEl6OJtdJmJ8MP4ngJhg9aWK+SB0gNkFZ9mHK9kD+8Hzy0bRBw4a gzOeQNvTdy1LQ4jmik2PI8fEllE51aNL+Y4ksHGvO2LJ86C3sGQmXvl6gjAaDJccOoIB 1Sg0F9FvtWsEEzpSrwwTJ6U5i6On1A51ul+dhRqfoX6YTKqPAzOWSlRJdX3Y/6Kc4Lan aT+dFytzcZMxQJCTqhfNtmJvDTbmBf18lSJsMXyu3NMb3dZ3vdZ+R2MyCC5Qh3XbDWhW 00Eo70G8pDngnubzVu5Fbj6dvixivd9bqONt2b9gJgkD2DmAye+oHAWuZDsW0KVzKNbG nF5w== X-Gm-Message-State: AOJu0Yxa43RDzt9nHl0crzuhg0xWHDfcS33U//Lcd+RfT+yt445wsQLu TkVwMrTsLY34puzrzhaq9TkqKTClDAdQ9WZRm6K67PzwFjsHBbsvCh7Oxg== X-Google-Smtp-Source: AGHT+IHQdYo4+EZUsymjulWiP4UB1bFdCIWrysbZBttnyZHxpfJSVMP05CxzSrkk8OfYKQDYvBz9tg== X-Received: by 2002:a17:907:3d8a:b0:a99:37f5:de59 with SMTP id a640c23a62f3a-a9a69de3304mr256263966b.53.1729261345838; Fri, 18 Oct 2024 07:22:25 -0700 (PDT) Received: from x1c10.dc1.ventanamicro.com (ip-149-172-150-237.um42.pools.vodafone-ip.de. 
[149.172.150.237]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-a9a68c2677esm102812166b.188.2024.10.18.07.22.24 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 18 Oct 2024 07:22:24 -0700 (PDT) From: Robin Dapp X-Google-Original-From: Robin Dapp To: gcc-patches@gcc.gnu.org Cc: rdapp.gcc@gmail.com, rguenther@suse.de, richard.sandiford@arm.com, jeffreyalaw@gmail.com, ams@baylibre.com Subject: [PATCH v2 4/8] vect: Add maskload else value support. Date: Fri, 18 Oct 2024 16:22:16 +0200 Message-ID: <20241018142220.173482-5-rdapp@ventanamicro.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20241018142220.173482-1-rdapp@ventanamicro.com> References: <20241018142220.173482-1-rdapp@ventanamicro.com> MIME-Version: 1.0 X-Spam-Status: No, score=-9.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org This patch adds an else operand to vectorized masked load calls. The current implementation adds else-value arguments to the respective target-querying functions; these are used to supply the vectorizer with the proper else value. Right now, the only spot where a zero else value is actually enforced is tree-ifcvt. Loop masking and other instances of masked loads in the vectorizer itself do not use vec_cond_exprs. gcc/ChangeLog: * optabs-query.cc (supports_vec_convert_optab_p): Return icode. (get_supported_else_val): Return supported else value for optab's operand at index. (supports_vec_gather_load_p): Add else argument. (supports_vec_scatter_store_p): Ditto. 
* optabs-query.h (supports_vec_gather_load_p): Ditto. (get_supported_else_val): Ditto. * optabs-tree.cc (target_supports_mask_load_store_p): Ditto. (can_vec_mask_load_store_p): Ditto. (target_supports_len_load_store_p): Ditto. (get_len_load_store_mode): Ditto. * optabs-tree.h (target_supports_mask_load_store_p): Ditto. (can_vec_mask_load_store_p): Ditto. * tree-vect-data-refs.cc (vect_lanes_optab_supported_p): Ditto. (vect_gather_scatter_fn_p): Ditto. (vect_check_gather_scatter): Ditto. (vect_load_lanes_supported): Ditto. * tree-vect-patterns.cc (vect_recog_gather_scatter_pattern): Ditto. * tree-vect-slp.cc (vect_get_operand_map): Adjust indices for else operand. (vect_slp_analyze_node_operations): Skip undefined else operand. * tree-vect-stmts.cc (exist_non_indexing_operands_for_use_p): Add else operand handling. (vect_get_vec_defs_for_operand): Handle undefined else operand. (check_load_store_for_partial_vectors): Add else argument. (vect_truncate_gather_scatter_offset): Ditto. (vect_use_strided_gather_scatters_p): Ditto. (get_group_load_store_type): Ditto. (get_load_store_type): Ditto. (vect_get_mask_load_else): Ditto. (vect_get_else_val_from_tree): Ditto. (vect_build_one_gather_load_call): Add zero else operand. (vectorizable_load): Use else operand. * tree-vectorizer.h (vect_gather_scatter_fn_p): Add else argument. (vect_load_lanes_supported): Ditto. (vect_get_mask_load_else): Ditto. (vect_get_else_val_from_tree): Ditto. 
--- gcc/optabs-query.cc | 59 ++++++--- gcc/optabs-query.h | 3 +- gcc/optabs-tree.cc | 62 ++++++--- gcc/optabs-tree.h | 8 +- gcc/tree-vect-data-refs.cc | 77 +++++++---- gcc/tree-vect-patterns.cc | 18 ++- gcc/tree-vect-slp.cc | 22 +++- gcc/tree-vect-stmts.cc | 257 +++++++++++++++++++++++++++++-------- gcc/tree-vectorizer.h | 11 +- 9 files changed, 394 insertions(+), 123 deletions(-) diff --git a/gcc/optabs-query.cc b/gcc/optabs-query.cc index cc52bc0f5ea..347a1322479 100644 --- a/gcc/optabs-query.cc +++ b/gcc/optabs-query.cc @@ -29,6 +29,9 @@ along with GCC; see the file COPYING3. If not see #include "rtl.h" #include "recog.h" #include "vec-perm-indices.h" +#include "internal-fn.h" +#include "memmodel.h" +#include "optabs.h" struct target_optabs default_target_optabs; struct target_optabs *this_fn_optabs = &default_target_optabs; @@ -672,34 +675,48 @@ lshift_cheap_p (bool speed_p) that mode, given that the second mode is always an integer vector. If MODE is VOIDmode, return true if OP supports any vector mode. */ -static bool +static enum insn_code supports_vec_convert_optab_p (optab op, machine_mode mode) { int start = mode == VOIDmode ? 0 : mode; int end = mode == VOIDmode ? MAX_MACHINE_MODE - 1 : mode; + enum insn_code icode = CODE_FOR_nothing; for (int i = start; i <= end; ++i) if (VECTOR_MODE_P ((machine_mode) i)) for (int j = MIN_MODE_VECTOR_INT; j < MAX_MODE_VECTOR_INT; ++j) - if (convert_optab_handler (op, (machine_mode) i, - (machine_mode) j) != CODE_FOR_nothing) - return true; + { + if ((icode + = convert_optab_handler (op, (machine_mode) i, + (machine_mode) j)) != CODE_FOR_nothing) + return icode; + } - return false; + return icode; } /* If MODE is not VOIDmode, return true if vec_gather_load is available for that mode. If MODE is VOIDmode, return true if gather_load is available - for at least one vector mode. */ + for at least one vector mode. + In that case, and if ELSVALS is nonzero, store the supported else values + into the vector it points to. 
*/ bool -supports_vec_gather_load_p (machine_mode mode) +supports_vec_gather_load_p (machine_mode mode, auto_vec *elsvals) { - if (!this_fn_optabs->supports_vec_gather_load[mode]) - this_fn_optabs->supports_vec_gather_load[mode] - = (supports_vec_convert_optab_p (gather_load_optab, mode) - || supports_vec_convert_optab_p (mask_gather_load_optab, mode) - || supports_vec_convert_optab_p (mask_len_gather_load_optab, mode) - ? 1 : -1); + enum insn_code icode = CODE_FOR_nothing; + if (!this_fn_optabs->supports_vec_gather_load[mode] || elsvals) + { + icode = supports_vec_convert_optab_p (gather_load_optab, mode); + if (icode == CODE_FOR_nothing) + icode = supports_vec_convert_optab_p (mask_gather_load_optab, mode); + if (icode == CODE_FOR_nothing) + icode = supports_vec_convert_optab_p (mask_len_gather_load_optab, mode); + this_fn_optabs->supports_vec_gather_load[mode] + = (icode != CODE_FOR_nothing) ? 1 : -1; + } + + if (elsvals && icode != CODE_FOR_nothing) + get_supported_else_vals (icode, MASK_LOAD_GATHER_ELSE_IDX, *elsvals); return this_fn_optabs->supports_vec_gather_load[mode] > 0; } @@ -711,12 +728,18 @@ supports_vec_gather_load_p (machine_mode mode) bool supports_vec_scatter_store_p (machine_mode mode) { + enum insn_code icode; if (!this_fn_optabs->supports_vec_scatter_store[mode]) - this_fn_optabs->supports_vec_scatter_store[mode] - = (supports_vec_convert_optab_p (scatter_store_optab, mode) - || supports_vec_convert_optab_p (mask_scatter_store_optab, mode) - || supports_vec_convert_optab_p (mask_len_scatter_store_optab, mode) - ? 1 : -1); + { + icode = supports_vec_convert_optab_p (scatter_store_optab, mode); + if (icode == CODE_FOR_nothing) + icode = supports_vec_convert_optab_p (mask_scatter_store_optab, mode); + if (icode == CODE_FOR_nothing) + icode = supports_vec_convert_optab_p (mask_len_scatter_store_optab, + mode); + this_fn_optabs->supports_vec_scatter_store[mode] + = (icode != CODE_FOR_nothing) ? 
1 : -1; + } return this_fn_optabs->supports_vec_scatter_store[mode] > 0; } diff --git a/gcc/optabs-query.h b/gcc/optabs-query.h index 0cb2c21ba85..5e0f59ee4b9 100644 --- a/gcc/optabs-query.h +++ b/gcc/optabs-query.h @@ -191,7 +191,8 @@ bool can_compare_and_swap_p (machine_mode, bool); bool can_atomic_exchange_p (machine_mode, bool); bool can_atomic_load_p (machine_mode); bool lshift_cheap_p (bool); -bool supports_vec_gather_load_p (machine_mode = E_VOIDmode); +bool supports_vec_gather_load_p (machine_mode = E_VOIDmode, + auto_vec * = nullptr); bool supports_vec_scatter_store_p (machine_mode = E_VOIDmode); bool can_vec_extract (machine_mode, machine_mode); diff --git a/gcc/optabs-tree.cc b/gcc/optabs-tree.cc index b69a5bc3676..ebdb6051c14 100644 --- a/gcc/optabs-tree.cc +++ b/gcc/optabs-tree.cc @@ -29,6 +29,7 @@ along with GCC; see the file COPYING3. If not see #include "optabs.h" #include "optabs-tree.h" #include "stor-layout.h" +#include "internal-fn.h" /* Return the optab used for computing the operation given by the tree code, CODE and the tree EXP. This function is not always usable (for example, it @@ -552,24 +553,38 @@ target_supports_op_p (tree type, enum tree_code code, or mask_len_{load,store}. This helper function checks whether target supports masked load/store and return corresponding IFN in the last argument - (IFN_MASK_{LOAD,STORE} or IFN_MASK_LEN_{LOAD,STORE}). */ + (IFN_MASK_{LOAD,STORE} or IFN_MASK_LEN_{LOAD,STORE}). + If there is support and ELSVALS is nonzero add the possible else values + to the vector it points to. */ -static bool +bool target_supports_mask_load_store_p (machine_mode mode, machine_mode mask_mode, - bool is_load, internal_fn *ifn) + bool is_load, internal_fn *ifn, + auto_vec *elsvals) { optab op = is_load ? maskload_optab : maskstore_optab; optab len_op = is_load ? 
mask_len_load_optab : mask_len_store_optab; - if (convert_optab_handler (op, mode, mask_mode) != CODE_FOR_nothing) + enum insn_code icode; + if ((icode = convert_optab_handler (op, mode, mask_mode)) + != CODE_FOR_nothing) { if (ifn) *ifn = is_load ? IFN_MASK_LOAD : IFN_MASK_STORE; + if (elsvals) + get_supported_else_vals (icode, + internal_fn_else_index (IFN_MASK_LOAD), + *elsvals); return true; } - else if (convert_optab_handler (len_op, mode, mask_mode) != CODE_FOR_nothing) + else if ((icode = convert_optab_handler (len_op, mode, mask_mode)) + != CODE_FOR_nothing) { if (ifn) *ifn = is_load ? IFN_MASK_LEN_LOAD : IFN_MASK_LEN_STORE; + if (elsvals) + get_supported_else_vals (icode, + internal_fn_else_index (IFN_MASK_LEN_LOAD), + *elsvals); return true; } return false; @@ -584,13 +599,15 @@ bool can_vec_mask_load_store_p (machine_mode mode, machine_mode mask_mode, bool is_load, - internal_fn *ifn) + internal_fn *ifn, + auto_vec *elsvals) { machine_mode vmode; /* If mode is vector mode, check it directly. */ if (VECTOR_MODE_P (mode)) - return target_supports_mask_load_store_p (mode, mask_mode, is_load, ifn); + return target_supports_mask_load_store_p (mode, mask_mode, is_load, ifn, + elsvals); /* Otherwise, return true if there is some vector mode with the mask load/store supported. 
*/ @@ -604,7 +621,8 @@ can_vec_mask_load_store_p (machine_mode mode, vmode = targetm.vectorize.preferred_simd_mode (smode); if (VECTOR_MODE_P (vmode) && targetm.vectorize.get_mask_mode (vmode).exists (&mask_mode) - && target_supports_mask_load_store_p (vmode, mask_mode, is_load, ifn)) + && target_supports_mask_load_store_p (vmode, mask_mode, is_load, ifn, + elsvals)) return true; auto_vector_modes vector_modes; @@ -612,7 +630,8 @@ can_vec_mask_load_store_p (machine_mode mode, for (machine_mode base_mode : vector_modes) if (related_vector_mode (base_mode, smode).exists (&vmode) && targetm.vectorize.get_mask_mode (vmode).exists (&mask_mode) - && target_supports_mask_load_store_p (vmode, mask_mode, is_load, ifn)) + && target_supports_mask_load_store_p (vmode, mask_mode, is_load, ifn, + elsvals)) return true; return false; } @@ -622,11 +641,13 @@ can_vec_mask_load_store_p (machine_mode mode, or mask_len_{load,store}. This helper function checks whether target supports len load/store and return corresponding IFN in the last argument - (IFN_LEN_{LOAD,STORE} or IFN_MASK_LEN_{LOAD,STORE}). */ + (IFN_LEN_{LOAD,STORE} or IFN_MASK_LEN_{LOAD,STORE}). + If there is support and ELSVALS is nonzero add the possible else values + to the vector it points to. */ static bool target_supports_len_load_store_p (machine_mode mode, bool is_load, - internal_fn *ifn) + internal_fn *ifn, auto_vec *elsvals) { optab op = is_load ? len_load_optab : len_store_optab; optab masked_op = is_load ? mask_len_load_optab : mask_len_store_optab; @@ -638,11 +659,17 @@ target_supports_len_load_store_p (machine_mode mode, bool is_load, return true; } machine_mode mask_mode; + enum insn_code icode; if (targetm.vectorize.get_mask_mode (mode).exists (&mask_mode) - && convert_optab_handler (masked_op, mode, mask_mode) != CODE_FOR_nothing) + && ((icode = convert_optab_handler (masked_op, mode, mask_mode)) + != CODE_FOR_nothing)) { if (ifn) *ifn = is_load ? 
IFN_MASK_LEN_LOAD : IFN_MASK_LEN_STORE; + if (elsvals) + get_supported_else_vals (icode, + internal_fn_else_index (IFN_MASK_LEN_LOAD), + *elsvals); return true; } return false; @@ -656,22 +683,25 @@ target_supports_len_load_store_p (machine_mode mode, bool is_load, VnQI to wrap the other supportable same size vector modes. An additional output in the last argument which is the IFN pointer. We set IFN as LEN_{LOAD,STORE} or MASK_LEN_{LOAD,STORE} according - which optab is supported in the target. */ + which optab is supported in the target. + If there is support and ELSVALS is nonzero add the possible else values + to the vector it points to. */ opt_machine_mode -get_len_load_store_mode (machine_mode mode, bool is_load, internal_fn *ifn) +get_len_load_store_mode (machine_mode mode, bool is_load, internal_fn *ifn, + auto_vec *elsvals) { gcc_assert (VECTOR_MODE_P (mode)); /* Check if length in lanes supported for this mode directly. */ - if (target_supports_len_load_store_p (mode, is_load, ifn)) + if (target_supports_len_load_store_p (mode, is_load, ifn, elsvals)) return mode; /* Check if length in bytes supported for same vector size VnQI. 
*/ machine_mode vmode; poly_uint64 nunits = GET_MODE_SIZE (mode); if (related_vector_mode (mode, QImode, nunits).exists (&vmode) - && target_supports_len_load_store_p (vmode, is_load, ifn)) + && target_supports_len_load_store_p (vmode, is_load, ifn, elsvals)) return vmode; return opt_machine_mode (); diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h index f2b49991462..390954bf998 100644 --- a/gcc/optabs-tree.h +++ b/gcc/optabs-tree.h @@ -47,9 +47,13 @@ bool expand_vec_cond_expr_p (tree, tree, enum tree_code); void init_tree_optimization_optabs (tree); bool target_supports_op_p (tree, enum tree_code, enum optab_subtype = optab_default); +bool target_supports_mask_load_store_p (machine_mode, machine_mode, + bool, internal_fn *, auto_vec *); bool can_vec_mask_load_store_p (machine_mode, machine_mode, bool, - internal_fn * = nullptr); + internal_fn * = nullptr, + auto_vec * = nullptr); opt_machine_mode get_len_load_store_mode (machine_mode, bool, - internal_fn * = nullptr); + internal_fn * = nullptr, + auto_vec * = nullptr); #endif diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc index 202af7a8952..d9f608dd2c0 100644 --- a/gcc/tree-vect-data-refs.cc +++ b/gcc/tree-vect-data-refs.cc @@ -55,13 +55,18 @@ along with GCC; see the file COPYING3. If not see #include "vec-perm-indices.h" #include "internal-fn.h" #include "gimple-fold.h" +#include "optabs-query.h" /* Return true if load- or store-lanes optab OPTAB is implemented for - COUNT vectors of type VECTYPE. NAME is the name of OPTAB. */ + COUNT vectors of type VECTYPE. NAME is the name of OPTAB. + + If it is implemented and ELSVALS is nonzero add the possible else values + to the vector it points to. 
*/ static bool vect_lanes_optab_supported_p (const char *name, convert_optab optab, - tree vectype, unsigned HOST_WIDE_INT count) + tree vectype, unsigned HOST_WIDE_INT count, + auto_vec *elsvals = nullptr) { machine_mode mode, array_mode; bool limit_p; @@ -81,7 +86,9 @@ vect_lanes_optab_supported_p (const char *name, convert_optab optab, } } - if (convert_optab_handler (optab, array_mode, mode) == CODE_FOR_nothing) + enum insn_code icode; + if ((icode = convert_optab_handler (optab, array_mode, mode)) + == CODE_FOR_nothing) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, @@ -92,8 +99,13 @@ vect_lanes_optab_supported_p (const char *name, convert_optab optab, if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, - "can use %s<%s><%s>\n", name, GET_MODE_NAME (array_mode), - GET_MODE_NAME (mode)); + "can use %s<%s><%s>\n", name, GET_MODE_NAME (array_mode), + GET_MODE_NAME (mode)); + + if (elsvals) + get_supported_else_vals (icode, + internal_fn_else_index (IFN_MASK_LEN_LOAD_LANES), + *elsvals); return true; } @@ -4177,13 +4189,15 @@ vect_prune_runtime_alias_test_list (loop_vec_info loop_vinfo) be multiplied *after* it has been converted to address width. Return true if the function is supported, storing the function id in - *IFN_OUT and the vector type for the offset in *OFFSET_VECTYPE_OUT. */ + *IFN_OUT and the vector type for the offset in *OFFSET_VECTYPE_OUT. + + If we can use gather and add the possible else values to ELSVALS. 
*/ bool vect_gather_scatter_fn_p (vec_info *vinfo, bool read_p, bool masked_p, tree vectype, tree memory_type, tree offset_type, int scale, internal_fn *ifn_out, - tree *offset_vectype_out) + tree *offset_vectype_out, auto_vec *elsvals) { unsigned int memory_bits = tree_to_uhwi (TYPE_SIZE (memory_type)); unsigned int element_bits = vector_element_bits (vectype); @@ -4221,7 +4235,8 @@ vect_gather_scatter_fn_p (vec_info *vinfo, bool read_p, bool masked_p, /* Test whether the target supports this combination. */ if (internal_gather_scatter_fn_supported_p (ifn, vectype, memory_type, - offset_vectype, scale)) + offset_vectype, scale, + elsvals)) { *ifn_out = ifn; *offset_vectype_out = offset_vectype; @@ -4231,7 +4246,7 @@ vect_gather_scatter_fn_p (vec_info *vinfo, bool read_p, bool masked_p, && internal_gather_scatter_fn_supported_p (alt_ifn, vectype, memory_type, offset_vectype, - scale)) + scale, elsvals)) { *ifn_out = alt_ifn; *offset_vectype_out = offset_vectype; @@ -4239,7 +4254,8 @@ vect_gather_scatter_fn_p (vec_info *vinfo, bool read_p, bool masked_p, } else if (internal_gather_scatter_fn_supported_p (alt_ifn2, vectype, memory_type, - offset_vectype, scale)) + offset_vectype, scale, + elsvals)) { *ifn_out = alt_ifn2; *offset_vectype_out = offset_vectype; @@ -4278,11 +4294,13 @@ vect_describe_gather_scatter_call (stmt_vec_info stmt_info, } /* Return true if a non-affine read or write in STMT_INFO is suitable for a - gather load or scatter store. Describe the operation in *INFO if so. */ + gather load or scatter store. Describe the operation in *INFO if so. + If it is suitable and ELSVALS is nonzero add the supported else values + to the vector it points to. 
*/ bool vect_check_gather_scatter (stmt_vec_info stmt_info, loop_vec_info loop_vinfo, - gather_scatter_info *info) + gather_scatter_info *info, auto_vec *elsvals) { HOST_WIDE_INT scale = 1; poly_int64 pbitpos, pbitsize; @@ -4306,6 +4324,16 @@ vect_check_gather_scatter (stmt_vec_info stmt_info, loop_vec_info loop_vinfo, ifn = gimple_call_internal_fn (call); if (internal_gather_scatter_fn_p (ifn)) { + /* Extract the else value from a masked-load call. This is + necessary when we created a gather_scatter pattern from a + maskload. It is a bit cumbersome to basically create the + same else value three times but it's probably acceptable until + tree-ifcvt goes away. */ + if (internal_fn_mask_index (ifn) >= 0 && elsvals) + { + tree els = gimple_call_arg (call, internal_fn_else_index (ifn)); + elsvals->safe_push (vect_get_else_val_from_tree (els)); + } vect_describe_gather_scatter_call (stmt_info, info); return true; } @@ -4315,7 +4343,8 @@ vect_check_gather_scatter (stmt_vec_info stmt_info, loop_vec_info loop_vinfo, /* True if we should aim to use internal functions rather than built-in functions. */ bool use_ifn_p = (DR_IS_READ (dr) - ? supports_vec_gather_load_p (TYPE_MODE (vectype)) + ? 
supports_vec_gather_load_p (TYPE_MODE (vectype), + elsvals) : supports_vec_scatter_store_p (TYPE_MODE (vectype))); base = DR_REF (dr); @@ -4472,12 +4501,14 @@ vect_check_gather_scatter (stmt_vec_info stmt_info, loop_vec_info loop_vinfo, masked_p, vectype, memory_type, signed_char_type_node, new_scale, &ifn, - &offset_vectype) + &offset_vectype, + elsvals) && !vect_gather_scatter_fn_p (loop_vinfo, DR_IS_READ (dr), masked_p, vectype, memory_type, unsigned_char_type_node, new_scale, &ifn, - &offset_vectype)) + &offset_vectype, + elsvals)) break; scale = new_scale; off = op0; @@ -4500,7 +4531,7 @@ vect_check_gather_scatter (stmt_vec_info stmt_info, loop_vec_info loop_vinfo, && vect_gather_scatter_fn_p (loop_vinfo, DR_IS_READ (dr), masked_p, vectype, memory_type, TREE_TYPE (off), scale, &ifn, - &offset_vectype)) + &offset_vectype, elsvals)) break; if (TYPE_PRECISION (TREE_TYPE (op0)) @@ -4554,7 +4585,7 @@ vect_check_gather_scatter (stmt_vec_info stmt_info, loop_vec_info loop_vinfo, { if (!vect_gather_scatter_fn_p (loop_vinfo, DR_IS_READ (dr), masked_p, vectype, memory_type, offtype, scale, - &ifn, &offset_vectype)) + &ifn, &offset_vectype, elsvals)) ifn = IFN_LAST; decl = NULL_TREE; } @@ -6391,27 +6422,29 @@ vect_grouped_load_supported (tree vectype, bool single_element_p, } /* Return FN if vec_{masked_,mask_len_}load_lanes is available for COUNT vectors - of type VECTYPE. MASKED_P says whether the masked form is needed. */ + of type VECTYPE. MASKED_P says whether the masked form is needed. + If it is available and ELSVALS is nonzero add the possible else values + to the vector it points to. 
*/ internal_fn vect_load_lanes_supported (tree vectype, unsigned HOST_WIDE_INT count, - bool masked_p) + bool masked_p, auto_vec *elsvals) { if (vect_lanes_optab_supported_p ("vec_mask_len_load_lanes", vec_mask_len_load_lanes_optab, vectype, - count)) + count, elsvals)) return IFN_MASK_LEN_LOAD_LANES; else if (masked_p) { if (vect_lanes_optab_supported_p ("vec_mask_load_lanes", vec_mask_load_lanes_optab, vectype, - count)) + count, elsvals)) return IFN_MASK_LOAD_LANES; } else { if (vect_lanes_optab_supported_p ("vec_load_lanes", vec_load_lanes_optab, - vectype, count)) + vectype, count, elsvals)) return IFN_LOAD_LANES; } return IFN_LAST; diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc index 746f100a084..184d150f96d 100644 --- a/gcc/tree-vect-patterns.cc +++ b/gcc/tree-vect-patterns.cc @@ -6630,7 +6630,8 @@ vect_recog_gather_scatter_pattern (vec_info *vinfo, /* Make sure that the target supports an appropriate internal function for the gather/scatter operation. */ gather_scatter_info gs_info; - if (!vect_check_gather_scatter (stmt_info, loop_vinfo, &gs_info) + auto_vec elsvals; + if (!vect_check_gather_scatter (stmt_info, loop_vinfo, &gs_info, &elsvals) || gs_info.ifn == IFN_LAST) return NULL; @@ -6653,20 +6654,27 @@ vect_recog_gather_scatter_pattern (vec_info *vinfo, tree offset = vect_add_conversion_to_pattern (vinfo, offset_type, gs_info.offset, stmt_info); + tree vec_els = NULL_TREE; /* Build the new pattern statement. 
*/ tree scale = size_int (gs_info.scale); gcall *pattern_stmt; + tree load_lhs; if (DR_IS_READ (dr)) { tree zero = build_zero_cst (gs_info.element_type); if (mask != NULL) - pattern_stmt = gimple_build_call_internal (gs_info.ifn, 5, base, - offset, scale, zero, mask); + { + int elsval = *elsvals.begin (); + vec_els = vect_get_mask_load_else (elsval, TREE_TYPE (gs_vectype)); + pattern_stmt = gimple_build_call_internal (gs_info.ifn, 6, base, + offset, scale, zero, mask, + vec_els); + } else pattern_stmt = gimple_build_call_internal (gs_info.ifn, 4, base, offset, scale, zero); - tree load_lhs = vect_recog_temp_ssa_var (gs_info.element_type, NULL); - gimple_call_set_lhs (pattern_stmt, load_lhs); + load_lhs = vect_recog_temp_ssa_var (gs_info.element_type, NULL); + gimple_set_lhs (pattern_stmt, load_lhs); } else { diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc index 8727246c27a..d161f28d62c 100644 --- a/gcc/tree-vect-slp.cc +++ b/gcc/tree-vect-slp.cc @@ -511,13 +511,13 @@ static const int cond_expr_maps[3][5] = { static const int no_arg_map[] = { 0 }; static const int arg0_map[] = { 1, 0 }; static const int arg1_map[] = { 1, 1 }; -static const int arg2_map[] = { 1, 2 }; -static const int arg1_arg4_map[] = { 2, 1, 4 }; +static const int arg2_arg3_map[] = { 2, 2, 3 }; +static const int arg1_arg4_arg5_map[] = { 3, 1, 4, 5 }; static const int arg3_arg2_map[] = { 2, 3, 2 }; static const int op1_op0_map[] = { 2, 1, 0 }; static const int off_map[] = { 1, -3 }; static const int off_op0_map[] = { 2, -3, 0 }; -static const int off_arg2_map[] = { 2, -3, 2 }; +static const int off_arg2_arg3_map[] = { 3, -3, 2, 3 }; static const int off_arg3_arg2_map[] = { 3, -3, 3, 2 }; static const int mask_call_maps[6][7] = { { 1, 1, }, @@ -564,14 +564,14 @@ vect_get_operand_map (const gimple *stmt, bool gather_scatter_p = false, switch (gimple_call_internal_fn (call)) { case IFN_MASK_LOAD: - return gather_scatter_p ? off_arg2_map : arg2_map; + return gather_scatter_p ? 
off_arg2_arg3_map : arg2_arg3_map; case IFN_GATHER_LOAD: return arg1_map; case IFN_MASK_GATHER_LOAD: case IFN_MASK_LEN_GATHER_LOAD: - return arg1_arg4_map; + return arg1_arg4_arg5_map; case IFN_MASK_STORE: return gather_scatter_p ? off_arg3_arg2_map : arg3_arg2_map; @@ -7775,6 +7775,18 @@ vect_slp_analyze_node_operations (vec_info *vinfo, slp_tree node, tree vector_type = SLP_TREE_VECTYPE (child); if (!vector_type) { + /* Masked loads can have an undefined (default SSA definition) + else operand. We do not need to cost it. */ + vec ops = SLP_TREE_SCALAR_OPS (child); + if ((STMT_VINFO_TYPE (SLP_TREE_REPRESENTATIVE (node)) + == load_vec_info_type) + && ((ops.length () && + TREE_CODE (ops[0]) == SSA_NAME + && SSA_NAME_IS_DEFAULT_DEF (ops[0]) + && VAR_P (SSA_NAME_VAR (ops[0]))) + || SLP_TREE_DEF_TYPE (child) == vect_constant_def)) + continue; + /* For shifts with a scalar argument we don't need to cost or code-generate anything. ??? Represent this more explicitely. */ diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index 9b14b96cb5a..74a437735a5 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -58,6 +58,7 @@ along with GCC; see the file COPYING3. If not see #include "regs.h" #include "attribs.h" #include "optabs-libfuncs.h" +#include "tree-dfa.h" /* For lang_hooks.types.type_for_mode. 
*/ #include "langhooks.h" @@ -469,6 +470,10 @@ exist_non_indexing_operands_for_use_p (tree use, stmt_vec_info stmt_info) if (mask_index >= 0 && use == gimple_call_arg (call, mask_index)) return true; + int els_index = internal_fn_else_index (ifn); + if (els_index >= 0 + && use == gimple_call_arg (call, els_index)) + return true; int stored_value_index = internal_fn_stored_value_index (ifn); if (stored_value_index >= 0 && use == gimple_call_arg (call, stored_value_index)) @@ -1280,7 +1285,17 @@ vect_get_vec_defs_for_operand (vec_info *vinfo, stmt_vec_info stmt_vinfo, vector_type = get_vectype_for_scalar_type (loop_vinfo, TREE_TYPE (op)); gcc_assert (vector_type); - tree vop = vect_init_vector (vinfo, stmt_vinfo, op, vector_type, NULL); + /* A masked load can have a default SSA definition as else operand. + We should "vectorize" this instead of creating a duplicate from the + scalar default. */ + tree vop; + if (TREE_CODE (op) == SSA_NAME + && SSA_NAME_IS_DEFAULT_DEF (op) + && VAR_P (SSA_NAME_VAR (op))) + vop = get_or_create_ssa_default_def (cfun, + create_tmp_var (vector_type)); + else + vop = vect_init_vector (vinfo, stmt_vinfo, op, vector_type, NULL); while (ncopies--) vec_oprnds->quick_push (vop); } @@ -1492,7 +1507,10 @@ static tree permute_vec_elements (vec_info *, tree, tree, tree, stmt_vec_info, Clear LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P if a loop using partial vectors is not supported, otherwise record the required rgroup control - types. */ + types. + + If partial vectors can be used and ELSVALS is nonzero the supported + else values will be added to the vector ELSVALS points to. */ static void check_load_store_for_partial_vectors (loop_vec_info loop_vinfo, tree vectype, @@ -1502,7 +1520,8 @@ check_load_store_for_partial_vectors (loop_vec_info loop_vinfo, tree vectype, vect_memory_access_type memory_access_type, gather_scatter_info *gs_info, - tree scalar_mask) + tree scalar_mask, + auto_vec *elsvals = nullptr) { /* Invariant loads need no special support. 
*/ if (memory_access_type == VMAT_INVARIANT) @@ -1518,7 +1537,8 @@ check_load_store_for_partial_vectors (loop_vec_info loop_vinfo, tree vectype, if (slp_node) nvectors /= group_size; internal_fn ifn - = (is_load ? vect_load_lanes_supported (vectype, group_size, true) + = (is_load ? vect_load_lanes_supported (vectype, group_size, true, + elsvals) : vect_store_lanes_supported (vectype, group_size, true)); if (ifn == IFN_MASK_LEN_LOAD_LANES || ifn == IFN_MASK_LEN_STORE_LANES) vect_record_loop_len (loop_vinfo, lens, nvectors, vectype, 1); @@ -1548,12 +1568,14 @@ check_load_store_for_partial_vectors (loop_vec_info loop_vinfo, tree vectype, if (internal_gather_scatter_fn_supported_p (len_ifn, vectype, gs_info->memory_type, gs_info->offset_vectype, - gs_info->scale)) + gs_info->scale, + elsvals)) vect_record_loop_len (loop_vinfo, lens, nvectors, vectype, 1); else if (internal_gather_scatter_fn_supported_p (ifn, vectype, gs_info->memory_type, gs_info->offset_vectype, - gs_info->scale)) + gs_info->scale, + elsvals)) vect_record_loop_mask (loop_vinfo, masks, nvectors, vectype, scalar_mask); else @@ -1607,7 +1629,8 @@ check_load_store_for_partial_vectors (loop_vec_info loop_vinfo, tree vectype, machine_mode mask_mode; machine_mode vmode; bool using_partial_vectors_p = false; - if (get_len_load_store_mode (vecmode, is_load).exists (&vmode)) + if (get_len_load_store_mode + (vecmode, is_load, nullptr, elsvals).exists (&vmode)) { nvectors = group_memory_nvectors (group_size * vf, nunits); unsigned factor = (vecmode == vmode) ? 
1 : GET_MODE_UNIT_SIZE (vecmode); @@ -1615,7 +1638,8 @@ check_load_store_for_partial_vectors (loop_vec_info loop_vinfo, tree vectype, using_partial_vectors_p = true; } else if (targetm.vectorize.get_mask_mode (vecmode).exists (&mask_mode) - && can_vec_mask_load_store_p (vecmode, mask_mode, is_load)) + && can_vec_mask_load_store_p (vecmode, mask_mode, is_load, NULL, + elsvals)) { nvectors = group_memory_nvectors (group_size * vf, nunits); vect_record_loop_mask (loop_vinfo, masks, nvectors, vectype, scalar_mask); @@ -1672,12 +1696,16 @@ prepare_vec_mask (loop_vec_info loop_vinfo, tree mask_type, tree loop_mask, without loss of precision, where X is STMT_INFO's DR_STEP. Return true if this is possible, describing the gather load or scatter - store in GS_INFO. MASKED_P is true if the load or store is conditional. */ + store in GS_INFO. MASKED_P is true if the load or store is conditional. + + If we can use gather/scatter and ELSVALS is nonzero the supported + else values will be added to the vector ELSVALS points to. */ static bool vect_truncate_gather_scatter_offset (stmt_vec_info stmt_info, loop_vec_info loop_vinfo, bool masked_p, - gather_scatter_info *gs_info) + gather_scatter_info *gs_info, + auto_vec *elsvals) { dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_info); data_reference *dr = dr_info->dr; @@ -1734,7 +1762,8 @@ vect_truncate_gather_scatter_offset (stmt_vec_info stmt_info, tree memory_type = TREE_TYPE (DR_REF (dr)); if (!vect_gather_scatter_fn_p (loop_vinfo, DR_IS_READ (dr), masked_p, vectype, memory_type, offset_type, scale, - &gs_info->ifn, &gs_info->offset_vectype) + &gs_info->ifn, &gs_info->offset_vectype, + elsvals) || gs_info->ifn == IFN_LAST) continue; @@ -1762,17 +1791,21 @@ vect_truncate_gather_scatter_offset (stmt_vec_info stmt_info, vectorize STMT_INFO, which is a grouped or strided load or store. MASKED_P is true if load or store is conditional. When returning true, fill in GS_INFO with the information required to perform the - operation. 
*/ + operation. + + If we can use gather/scatter and ELSVALS is nonzero the supported + else values will be added to the vector ELSVALS points to. */ static bool vect_use_strided_gather_scatters_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo, bool masked_p, - gather_scatter_info *gs_info) + gather_scatter_info *gs_info, + auto_vec *elsvals) { - if (!vect_check_gather_scatter (stmt_info, loop_vinfo, gs_info) + if (!vect_check_gather_scatter (stmt_info, loop_vinfo, gs_info, elsvals) || gs_info->ifn == IFN_LAST) return vect_truncate_gather_scatter_offset (stmt_info, loop_vinfo, - masked_p, gs_info); + masked_p, gs_info, elsvals); tree old_offset_type = TREE_TYPE (gs_info->offset); tree new_offset_type = TREE_TYPE (gs_info->offset_vectype); @@ -1985,7 +2018,8 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info, dr_alignment_support *alignment_support_scheme, int *misalignment, gather_scatter_info *gs_info, - internal_fn *lanes_ifn) + internal_fn *lanes_ifn, + auto_vec *elsvals) { loop_vec_info loop_vinfo = dyn_cast (vinfo); class loop *loop = loop_vinfo ? LOOP_VINFO_LOOP (loop_vinfo) : NULL; @@ -2074,7 +2108,8 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info, else if (slp_node->ldst_lanes && (*lanes_ifn = (vls_type == VLS_LOAD - ? vect_load_lanes_supported (vectype, group_size, masked_p) + ? vect_load_lanes_supported (vectype, group_size, + masked_p, elsvals) : vect_store_lanes_supported (vectype, group_size, masked_p))) != IFN_LAST) *memory_access_type = VMAT_LOAD_STORE_LANES; @@ -2242,7 +2277,8 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info, /* Otherwise try using LOAD/STORE_LANES. */ *lanes_ifn = vls_type == VLS_LOAD - ? vect_load_lanes_supported (vectype, group_size, masked_p) + ? 
vect_load_lanes_supported (vectype, group_size, masked_p, + elsvals) : vect_store_lanes_supported (vectype, group_size, masked_p); if (*lanes_ifn != IFN_LAST) @@ -2276,7 +2312,7 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info, && single_element_p && loop_vinfo && vect_use_strided_gather_scatters_p (stmt_info, loop_vinfo, - masked_p, gs_info)) + masked_p, gs_info, elsvals)) *memory_access_type = VMAT_GATHER_SCATTER; if (*memory_access_type == VMAT_GATHER_SCATTER @@ -2338,7 +2374,10 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info, SLP says whether we're performing SLP rather than loop vectorization. MASKED_P is true if the statement is conditional on a vectorized mask. VECTYPE is the vector type that the vectorized statements will use. - NCOPIES is the number of vector statements that will be needed. */ + NCOPIES is the number of vector statements that will be needed. + + If ELSVALS is nonzero the supported else values will be added to the + vector ELSVALS points to. 
*/ static bool get_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info, @@ -2350,7 +2389,8 @@ get_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info, dr_alignment_support *alignment_support_scheme, int *misalignment, gather_scatter_info *gs_info, - internal_fn *lanes_ifn) + internal_fn *lanes_ifn, + auto_vec *elsvals = nullptr) { loop_vec_info loop_vinfo = dyn_cast (vinfo); poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype); @@ -2359,7 +2399,8 @@ get_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info, if (STMT_VINFO_GATHER_SCATTER_P (stmt_info)) { *memory_access_type = VMAT_GATHER_SCATTER; - if (!vect_check_gather_scatter (stmt_info, loop_vinfo, gs_info)) + if (!vect_check_gather_scatter (stmt_info, loop_vinfo, gs_info, + elsvals)) gcc_unreachable (); /* When using internal functions, we rely on pattern recognition to convert the type of the offset to the type that the target @@ -2413,7 +2454,8 @@ get_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info, masked_p, vls_type, memory_access_type, poffset, alignment_support_scheme, - misalignment, gs_info, lanes_ifn)) + misalignment, gs_info, lanes_ifn, + elsvals)) return false; } else if (STMT_VINFO_STRIDED_P (stmt_info)) @@ -2421,7 +2463,7 @@ get_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info, gcc_assert (!slp_node); if (loop_vinfo && vect_use_strided_gather_scatters_p (stmt_info, loop_vinfo, - masked_p, gs_info)) + masked_p, gs_info, elsvals)) *memory_access_type = VMAT_GATHER_SCATTER; else *memory_access_type = VMAT_ELEMENTWISE; @@ -2689,6 +2731,53 @@ vect_build_zero_merge_argument (vec_info *vinfo, return vect_init_vector (vinfo, stmt_info, merge, vectype, NULL); } +/* Return the supported else value for a masked load internal function IFN. + The vector type is given in VECTYPE and the mask type in VECTYPE2. + TYPE specifies the type of the returned else value. 
*/ + +tree +vect_get_mask_load_else (int elsval, tree type) +{ + tree els; + if (elsval == MASK_LOAD_ELSE_UNDEFINED) + { + tree tmp = create_tmp_var (type); + /* No need to warn about anything. */ + TREE_NO_WARNING (tmp) = 1; + els = get_or_create_ssa_default_def (cfun, tmp); + } + else if (elsval == MASK_LOAD_ELSE_M1) + els = build_minus_one_cst (type); + else if (elsval == MASK_LOAD_ELSE_ZERO) + els = build_zero_cst (type); + else + __builtin_unreachable (); + + return els; +} + +/* Return the MASK_LOAD_ELSE_* constant that the tree else operand ELS + represents. This performs the inverse of vect_get_mask_load_else. Refer to + vect_check_gather_scatter for its usage rationale. */ + +int +vect_get_else_val_from_tree (tree els) +{ + if (TREE_CODE (els) == SSA_NAME + && SSA_NAME_IS_DEFAULT_DEF (els) + && TREE_CODE (SSA_NAME_VAR (els)) == VAR_DECL) + return MASK_LOAD_ELSE_UNDEFINED; + else + { + if (zerop (els)) + return MASK_LOAD_ELSE_ZERO; + else if (integer_minus_onep (els)) + return MASK_LOAD_ELSE_M1; + else + __builtin_unreachable (); + } +} + /* Build a gather load call while vectorizing STMT_INFO. Insert new instructions before GSI and add them to VEC_STMT. GS_INFO describes the gather load operation. 
If the load is conditional, MASK is the @@ -2770,8 +2859,14 @@ vect_build_one_gather_load_call (vec_info *vinfo, stmt_vec_info stmt_info, } tree scale = build_int_cst (scaletype, gs_info->scale); - gimple *new_stmt = gimple_build_call (gs_info->decl, 5, src_op, ptr, op, - mask_op, scale); + gimple *new_stmt; + + if (!mask) + new_stmt = gimple_build_call (gs_info->decl, 5, src_op, ptr, op, + mask_op, scale); + else + new_stmt = gimple_build_call (gs_info->decl, 5, src_op, ptr, op, + mask_op, scale); if (!useless_type_conversion_p (vectype, rettype)) { @@ -9967,6 +10062,7 @@ vectorizable_load (vec_info *vinfo, gather_scatter_info gs_info; tree ref_type; enum vect_def_type mask_dt = vect_unknown_def_type; + enum vect_def_type els_dt = vect_unknown_def_type; if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo) return false; @@ -9979,8 +10075,12 @@ vectorizable_load (vec_info *vinfo, return false; tree mask = NULL_TREE, mask_vectype = NULL_TREE; + tree els = NULL_TREE; tree els_vectype = NULL_TREE; + int mask_index = -1; + int els_index = -1; slp_tree slp_op = NULL; + slp_tree els_op = NULL; if (gassign *assign = dyn_cast (stmt_info->stmt)) { scalar_dest = gimple_assign_lhs (assign); @@ -10020,6 +10120,15 @@ vectorizable_load (vec_info *vinfo, && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index, &mask, &slp_op, &mask_dt, &mask_vectype)) return false; + + els_index = internal_fn_else_index (ifn); + if (els_index >= 0 && slp_node) + els_index = vect_slp_child_index_for_operand + (call, els_index, STMT_VINFO_GATHER_SCATTER_P (stmt_info)); + if (els_index >= 0 + && !vect_is_simple_use (vinfo, stmt_info, slp_node, els_index, + &els, &els_op, &els_dt, &els_vectype)) + return false; } tree vectype = STMT_VINFO_VECTYPE (stmt_info); @@ -10122,10 +10231,11 @@ vectorizable_load (vec_info *vinfo, int misalignment; poly_int64 poffset; internal_fn lanes_ifn; + auto_vec elsvals; if (!get_load_store_type (vinfo, stmt_info, vectype, slp_node, mask, VLS_LOAD, ncopies, 
&memory_access_type, &poffset, &alignment_support_scheme, &misalignment, &gs_info, - &lanes_ifn)) + &lanes_ifn, &elsvals)) return false; /* ??? The following checks should really be part of @@ -10191,7 +10301,8 @@ vectorizable_load (vec_info *vinfo, machine_mode vec_mode = TYPE_MODE (vectype); if (!VECTOR_MODE_P (vec_mode) || !can_vec_mask_load_store_p (vec_mode, - TYPE_MODE (mask_vectype), true)) + TYPE_MODE (mask_vectype), + true, NULL, &elsvals)) return false; } else if (memory_access_type != VMAT_LOAD_STORE_LANES @@ -10260,6 +10371,16 @@ vectorizable_load (vec_info *vinfo, STMT_VINFO_TYPE (stmt_info) = load_vec_info_type; } + else + { + /* Here just get the else values. */ + if (loop_vinfo + && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)) + check_load_store_for_partial_vectors (loop_vinfo, vectype, slp_node, + VLS_LOAD, group_size, + memory_access_type, &gs_info, + mask, &elsvals); + } if (!slp) gcc_assert (memory_access_type @@ -10930,6 +11051,7 @@ vectorizable_load (vec_info *vinfo, } tree vec_mask = NULL_TREE; + tree vec_els = NULL_TREE; if (memory_access_type == VMAT_LOAD_STORE_LANES) { gcc_assert (alignment_support_scheme == dr_aligned @@ -11020,6 +11142,11 @@ vectorizable_load (vec_info *vinfo, } } + if (final_mask) + vec_els = vect_get_mask_load_else + (elsvals.contains (MASK_LOAD_ELSE_ZERO) + ? MASK_LOAD_ELSE_ZERO : *elsvals.begin (), vectype); + gcall *call; if (final_len && final_mask) { @@ -11028,9 +11155,10 @@ vectorizable_load (vec_info *vinfo, VEC_MASK, LEN, BIAS). */ unsigned int align = TYPE_ALIGN (TREE_TYPE (vectype)); tree alias_ptr = build_int_cst (ref_type, align); - call = gimple_build_call_internal (IFN_MASK_LEN_LOAD_LANES, 5, + call = gimple_build_call_internal (IFN_MASK_LEN_LOAD_LANES, 6, dataref_ptr, alias_ptr, - final_mask, final_len, bias); + final_mask, vec_els, + final_len, bias); } else if (final_mask) { @@ -11039,9 +11167,9 @@ vectorizable_load (vec_info *vinfo, VEC_MASK). 
*/ unsigned int align = TYPE_ALIGN (TREE_TYPE (vectype)); tree alias_ptr = build_int_cst (ref_type, align); - call = gimple_build_call_internal (IFN_MASK_LOAD_LANES, 3, + call = gimple_build_call_internal (IFN_MASK_LOAD_LANES, 4, dataref_ptr, alias_ptr, - final_mask); + final_mask, vec_els); } else { @@ -11190,17 +11318,29 @@ vectorizable_load (vec_info *vinfo, } } + if (final_mask) + vec_els = vect_get_mask_load_else + (elsvals.contains (MASK_LOAD_ELSE_ZERO) + ? MASK_LOAD_ELSE_ZERO : *elsvals.begin (), vectype); + gcall *call; if (final_len && final_mask) - call - = gimple_build_call_internal (IFN_MASK_LEN_GATHER_LOAD, 7, - dataref_ptr, vec_offset, - scale, zero, final_mask, - final_len, bias); + { + call + = gimple_build_call_internal (IFN_MASK_LEN_GATHER_LOAD, + 8, dataref_ptr, + vec_offset, scale, zero, + final_mask, vec_els, + final_len, bias); + } else if (final_mask) - call = gimple_build_call_internal (IFN_MASK_GATHER_LOAD, 5, - dataref_ptr, vec_offset, - scale, zero, final_mask); + { + call = gimple_build_call_internal (IFN_MASK_GATHER_LOAD, + 6, dataref_ptr, + vec_offset, scale, + zero, final_mask, + vec_els); + } else call = gimple_build_call_internal (IFN_GATHER_LOAD, 4, dataref_ptr, vec_offset, @@ -11514,6 +11654,7 @@ vectorizable_load (vec_info *vinfo, tree final_mask = NULL_TREE; tree final_len = NULL_TREE; tree bias = NULL_TREE; + if (!costing_p) { if (mask) @@ -11566,7 +11707,8 @@ vectorizable_load (vec_info *vinfo, if (loop_lens) { opt_machine_mode new_ovmode - = get_len_load_store_mode (vmode, true, &partial_ifn); + = get_len_load_store_mode (vmode, true, &partial_ifn, + &elsvals); new_vmode = new_ovmode.require (); unsigned factor = (new_ovmode == vmode) ? 
1 : GET_MODE_UNIT_SIZE (vmode); @@ -11578,7 +11720,7 @@ vectorizable_load (vec_info *vinfo, { if (!can_vec_mask_load_store_p ( vmode, TYPE_MODE (TREE_TYPE (final_mask)), true, - &partial_ifn)) + &partial_ifn, &elsvals)) gcc_unreachable (); } @@ -11606,19 +11748,28 @@ vectorizable_load (vec_info *vinfo, bias = build_int_cst (intQI_type_node, biasval); } + tree vec_els; + if (final_len) { tree ptr = build_int_cst (ref_type, align * BITS_PER_UNIT); gcall *call; if (partial_ifn == IFN_MASK_LEN_LOAD) - call = gimple_build_call_internal (IFN_MASK_LEN_LOAD, 5, - dataref_ptr, ptr, - final_mask, final_len, - bias); + { + vec_els = vect_get_mask_load_else + (elsvals.contains (MASK_LOAD_ELSE_ZERO) + ? MASK_LOAD_ELSE_ZERO : *elsvals.begin (), vectype); + call = gimple_build_call_internal (IFN_MASK_LEN_LOAD, + 6, dataref_ptr, ptr, + final_mask, vec_els, + final_len, bias); + } else - call = gimple_build_call_internal (IFN_LEN_LOAD, 4, - dataref_ptr, ptr, - final_len, bias); + { + call = gimple_build_call_internal (IFN_LEN_LOAD, 4, + dataref_ptr, ptr, + final_len, bias); + } gimple_call_set_nothrow (call, true); new_stmt = call; data_ref = NULL_TREE; @@ -11641,9 +11792,13 @@ vectorizable_load (vec_info *vinfo, else if (final_mask) { tree ptr = build_int_cst (ref_type, align * BITS_PER_UNIT); - gcall *call = gimple_build_call_internal (IFN_MASK_LOAD, 3, + vec_els = vect_get_mask_load_else + (elsvals.contains (MASK_LOAD_ELSE_ZERO) + ? 
MASK_LOAD_ELSE_ZERO : *elsvals.begin (), vectype); + gcall *call = gimple_build_call_internal (IFN_MASK_LOAD, 4, dataref_ptr, ptr, - final_mask); + final_mask, + vec_els); gimple_call_set_nothrow (call, true); new_stmt = call; data_ref = NULL_TREE; diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index b7f2708fec0..0b20c36a7fe 100644 --- a/gcc/tree-vectorizer.h +++ b/gcc/tree-vectorizer.h @@ -2439,9 +2439,11 @@ extern bool vect_slp_analyze_instance_alignment (vec_info *, slp_instance); extern opt_result vect_analyze_data_ref_accesses (vec_info *, vec *); extern opt_result vect_prune_runtime_alias_test_list (loop_vec_info); extern bool vect_gather_scatter_fn_p (vec_info *, bool, bool, tree, tree, - tree, int, internal_fn *, tree *); + tree, int, internal_fn *, tree *, + auto_vec * = nullptr); extern bool vect_check_gather_scatter (stmt_vec_info, loop_vec_info, - gather_scatter_info *); + gather_scatter_info *, + auto_vec * = nullptr); extern opt_result vect_find_stmt_data_reference (loop_p, gimple *, vec *, vec *, int); @@ -2459,7 +2461,8 @@ extern tree vect_create_destination_var (tree, tree); extern bool vect_grouped_store_supported (tree, unsigned HOST_WIDE_INT); extern internal_fn vect_store_lanes_supported (tree, unsigned HOST_WIDE_INT, bool); extern bool vect_grouped_load_supported (tree, bool, unsigned HOST_WIDE_INT); -extern internal_fn vect_load_lanes_supported (tree, unsigned HOST_WIDE_INT, bool); +extern internal_fn vect_load_lanes_supported (tree, unsigned HOST_WIDE_INT, + bool, auto_vec * = nullptr); extern void vect_permute_store_chain (vec_info *, vec &, unsigned int, stmt_vec_info, gimple_stmt_iterator *, vec *); @@ -2605,6 +2608,8 @@ extern int vect_slp_child_index_for_operand (const gimple *, int op, bool); extern tree prepare_vec_mask (loop_vec_info, tree, tree, tree, gimple_stmt_iterator *); +extern tree vect_get_mask_load_else (int, tree); +extern int vect_get_else_val_from_tree (tree els); /* In tree-vect-patterns.cc. 
*/ extern void From patchwork Fri Oct 18 14:22:17 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Robin Dapp X-Patchwork-Id: 1999207 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256 header.s=20230601 header.b=k41AbV7L; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4XVRnS2KHkz1xw2 for ; Sat, 19 Oct 2024 01:25:32 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 84DC6385840C for ; Fri, 18 Oct 2024 14:25:30 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-lf1-x12b.google.com (mail-lf1-x12b.google.com [IPv6:2a00:1450:4864:20::12b]) by sourceware.org (Postfix) with ESMTPS id 463193858C32 for ; Fri, 18 Oct 2024 14:22:30 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 463193858C32 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 463193858C32 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::12b 
ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1729261357; cv=none; b=WCuISX6/FWiUaBamVAaRdC+iptQ/dNPsiJM1SR5nHymH/x8DazMB3K1EoKTN8GzWI61vpn1nAHIPLph36yjPlkJdDPAq+Y1IzMjyhFKznWiLP1boO4vtrsMo7RiggtgEqqjwmErnPPq5g0FHl5TYL8e0bxstfjCwNsFTo8/GqBo= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1729261357; c=relaxed/simple; bh=KCS7JuETTlVFsUwcPcl93h95X66/ScPGbp47QuwX9/E=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=Blv78zg4cNV+5Hy59Kp7IgnRKL4hPw5bYPJl2fDFw2bQo5DRS1i+JgmX4fsJwnTZJBHf5XG09Lh/svoeb1BSaWy2x3H7SQXbOk1VVKAJ+PgsTgLV4jrbts3dXbxbllB0lH740om7hUPUihdLBmQlo+J81tbhaYUMcrzYP1LW7hE= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-lf1-x12b.google.com with SMTP id 2adb3069b0e04-539fb49c64aso3219523e87.0 for ; Fri, 18 Oct 2024 07:22:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1729261347; x=1729866147; darn=gcc.gnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=7/Q9Nxj19Mch3awf7UQv3IMRqI1tHGJmxGQH+mqVC8w=; b=k41AbV7Lq/H3aUJ0hXr48uPk1IOIozvu6KSocglZQ4grJh0f9FL6qj3zbx4VF0PVUS ttqfFm18BeUntzCWy1MszXtSYkl8RsguVIyQk1MDx6g/4n/Ci7z+JRTiwtTmae5Nbo0H D5+EzDNZJcjZ2AAw2WsGlqbRiIDZqvxUGJdq/crjI4lXYtisA3xQDkCgBNZfLZnliO6w R5Hv6H16gA5zJt95Lc0RxAm1GzYlI1ab6EoUrB+DlumgMWBdTu669P4nmkgtKnCP5Oxs RELI3PvGEpqLMYao07P7+1NEHhfX2fy47TBkpW3yg2zzYaoxqMJipos1C/zOAteneSTP xs7Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1729261347; x=1729866147; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=7/Q9Nxj19Mch3awf7UQv3IMRqI1tHGJmxGQH+mqVC8w=; b=eipBS88GSffyYIw2BIm67rNRGv1kCUPLWmitAYS5/t2tr6czb/AhDbU2Q1RVwstumL upvw0NcATtwu3iT1gWhG93KZvBkMcvzMN8yVCSxhrefqLx6NEQKRQQepBVI2wkGzuRBp 
YfkCgDNlRaur8kZFaSPl92bil5sWNirugJEmg5niV2SkC5zmmX5lyil1IDYG0iskh6ZQ KtyMeXZLIpGQ9Ax/fjtc9aJBezDKBcfg+G2fsJeyEem7PSvBobE9KACu0/V1jI3rXH+t NDc4fiSx+exq6dkc1fhXdawGOPWRlbJmMR6K/21VWOEfHGSsgCohNqUU6a8TzjlSRGJp NKvQ== X-Gm-Message-State: AOJu0Yy+798qGBDq2h/6iPHgs1E2BGbxIvjgmEOly7MoK+1vEjrz/5HV UFcUacFNSUOp+krmoCR811BBk1OYd6TdlBZc/J71TV4tc+uRJNOBxVDwgA== X-Google-Smtp-Source: AGHT+IGGTv3VDQmyd3nsx/zA4O6W6asXDCklz2VYJAZGVm+ET8cEzeZcgvqpl8lKCTaaRLwW0ZFS0w== X-Received: by 2002:a05:6512:3b89:b0:52c:cd77:fe03 with SMTP id 2adb3069b0e04-53a1544481emr2644221e87.14.1729261346745; Fri, 18 Oct 2024 07:22:26 -0700 (PDT) Received: from x1c10.dc1.ventanamicro.com (ip-149-172-150-237.um42.pools.vodafone-ip.de. [149.172.150.237]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-a9a68c2677esm102812166b.188.2024.10.18.07.22.25 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 18 Oct 2024 07:22:26 -0700 (PDT) From: Robin Dapp X-Google-Original-From: Robin Dapp To: gcc-patches@gcc.gnu.org Cc: rdapp.gcc@gmail.com, rguenther@suse.de, richard.sandiford@arm.com, jeffreyalaw@gmail.com, ams@baylibre.com Subject: [PATCH v2 5/8] aarch64: Add masked-load else operands. 
Date: Fri, 18 Oct 2024 16:22:17 +0200
Message-ID: <20241018142220.173482-6-rdapp@ventanamicro.com>
X-Mailer: git-send-email 2.47.0
In-Reply-To: <20241018142220.173482-1-rdapp@ventanamicro.com>
References: <20241018142220.173482-1-rdapp@ventanamicro.com>
MIME-Version: 1.0

This adds zero else operands to masked loads and their intrinsics.
I needed to adjust more than initially thought because we rely on
combine for several instructions and a change in a "base" pattern
needs to propagate to all those.

For lack of a better idea I used a function call property to specify
whether a builtin needs an else operand or not.  Somebody with better
knowledge of the aarch64 target can surely improve that.

gcc/ChangeLog:

	* config/aarch64/aarch64-sve-builtins-base.cc: Add else handling.
	* config/aarch64/aarch64-sve-builtins.cc
	(function_expander::use_contiguous_load_insn): Ditto.
	* config/aarch64/aarch64-sve-builtins.h: Add "has else".
	* config/aarch64/aarch64-sve.md (*aarch64_load __mov): Add else
	operands.
	* config/aarch64/aarch64-sve2.md: Ditto.
	* config/aarch64/predicates.md (aarch64_maskload_else_operand):
	Add zero else operand.
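
The property-bit approach described above can be sketched in isolation
as follows.  This is a standalone illustration, not GCC code:
toy_expander and its members are hypothetical, operands are plain
integers, and the value given to CP_READ_MEMORY is an assumption.  Only
CP_HAS_ELSE = 1U << 11 and the idea of forwarding the last pushed
argument when the bit is set are taken from the patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the call-property bits.  Only the position of
// CP_HAS_ELSE (bit 11) is taken from the patch; the other value is assumed.
constexpr unsigned int CP_READ_MEMORY = 1U << 2;
constexpr unsigned int CP_HAS_ELSE = 1U << 11;

// Toy expander: an operand is just an integer tag here.
struct toy_expander
{
  unsigned int properties;    // what call_properties () would return
  std::vector<int> args;      // arguments pushed by the builtin's expand ()
  std::vector<int> operands;  // operands that end up on the instruction

  // Add the governing predicate (args[0]) and, only when the builtin
  // advertises CP_HAS_ELSE, forward the trailing else operand as well.
  void use_load_insn ()
  {
    assert (!args.empty ());
    operands.push_back (args[0]);
    if (properties & CP_HAS_ELSE)
      operands.push_back (args.back ());
  }
};
```

A builtin that does not set the bit keeps its old operand count, so
only the loads that were converted need to push an extra argument.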
--- .../aarch64/aarch64-sve-builtins-base.cc | 58 ++++++++++++++----- gcc/config/aarch64/aarch64-sve-builtins.cc | 5 ++ gcc/config/aarch64/aarch64-sve-builtins.h | 1 + gcc/config/aarch64/aarch64-sve.md | 47 +++++++++++++-- gcc/config/aarch64/aarch64-sve2.md | 3 +- gcc/config/aarch64/predicates.md | 4 ++ 6 files changed, 98 insertions(+), 20 deletions(-) diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc index 1c17149e1f0..08d2fb796dd 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc @@ -1476,7 +1476,7 @@ public: unsigned int call_properties (const function_instance &) const override { - return CP_READ_MEMORY; + return CP_READ_MEMORY | CP_HAS_ELSE; } gimple * @@ -1491,11 +1491,12 @@ public: gimple_seq stmts = NULL; tree pred = f.convert_pred (stmts, vectype, 0); tree base = f.fold_contiguous_base (stmts, vectype); + tree els = build_zero_cst (vectype); gsi_insert_seq_before (f.gsi, stmts, GSI_SAME_STMT); tree cookie = f.load_store_cookie (TREE_TYPE (vectype)); - gcall *new_call = gimple_build_call_internal (IFN_MASK_LOAD, 3, - base, cookie, pred); + gcall *new_call = gimple_build_call_internal (IFN_MASK_LOAD, 4, + base, cookie, pred, els); gimple_call_set_lhs (new_call, f.lhs); return new_call; } @@ -1505,10 +1506,16 @@ public: { insn_code icode; if (e.vectors_per_tuple () == 1) - icode = convert_optab_handler (maskload_optab, - e.vector_mode (0), e.gp_mode (0)); + { + icode = convert_optab_handler (maskload_optab, + e.vector_mode (0), e.gp_mode (0)); + e.args.quick_push (CONST0_RTX (e.vector_mode (0))); + } else - icode = code_for_aarch64 (UNSPEC_LD1_COUNT, e.tuple_mode (0)); + { + icode = code_for_aarch64 (UNSPEC_LD1_COUNT, e.tuple_mode (0)); + e.args.quick_push (CONST0_RTX (e.tuple_mode (0))); + } return e.use_contiguous_load_insn (icode); } }; @@ -1519,12 +1526,20 @@ class svld1_extend_impl : public extending_load public: using 
extending_load::extending_load; + unsigned int + call_properties (const function_instance &) const override + { + return CP_READ_MEMORY | CP_HAS_ELSE; + } + rtx expand (function_expander &e) const override { insn_code icode = code_for_aarch64_load (UNSPEC_LD1_SVE, extend_rtx_code (), e.vector_mode (0), e.memory_vector_mode ()); + /* Add the else operand. */ + e.args.quick_push (CONST0_RTX (e.vector_mode (1))); return e.use_contiguous_load_insn (icode); } }; @@ -1535,7 +1550,7 @@ public: unsigned int call_properties (const function_instance &) const override { - return CP_READ_MEMORY; + return CP_READ_MEMORY | CP_HAS_ELSE; } rtx @@ -1544,6 +1559,8 @@ public: e.prepare_gather_address_operands (1); /* Put the predicate last, as required by mask_gather_load_optab. */ e.rotate_inputs_left (0, 5); + /* Add the else operand. */ + e.args.quick_push (CONST0_RTX (e.vector_mode (0))); machine_mode mem_mode = e.memory_vector_mode (); machine_mode int_mode = aarch64_sve_int_mode (mem_mode); insn_code icode = convert_optab_handler (mask_gather_load_optab, @@ -1567,6 +1584,8 @@ public: e.rotate_inputs_left (0, 5); /* Add a constant predicate for the extension rtx. */ e.args.quick_push (CONSTM1_RTX (VNx16BImode)); + /* Add the else operand. */ + e.args.quick_push (CONST0_RTX (e.vector_mode (1))); insn_code icode = code_for_aarch64_gather_load (extend_rtx_code (), e.vector_mode (0), e.memory_vector_mode ()); @@ -1697,7 +1716,7 @@ public: unsigned int call_properties (const function_instance &) const override { - return CP_READ_MEMORY; + return CP_READ_MEMORY | CP_HAS_ELSE; } gimple * @@ -1709,6 +1728,7 @@ public: /* Get the predicate and base pointer. */ gimple_seq stmts = NULL; tree pred = f.convert_pred (stmts, vectype, 0); + tree els = build_zero_cst (vectype); tree base = f.fold_contiguous_base (stmts, vectype); gsi_insert_seq_before (f.gsi, stmts, GSI_SAME_STMT); @@ -1727,8 +1747,8 @@ public: /* Emit the load itself. 
*/ tree cookie = f.load_store_cookie (TREE_TYPE (vectype)); - gcall *new_call = gimple_build_call_internal (IFN_MASK_LOAD_LANES, 3, - base, cookie, pred); + gcall *new_call = gimple_build_call_internal (IFN_MASK_LOAD_LANES, 4, + base, cookie, pred, els); gimple_call_set_lhs (new_call, lhs_array); gsi_insert_after (f.gsi, new_call, GSI_SAME_STMT); @@ -1741,6 +1761,7 @@ public: machine_mode tuple_mode = e.result_mode (); insn_code icode = convert_optab_handler (vec_mask_load_lanes_optab, tuple_mode, e.vector_mode (0)); + e.args.quick_push (CONST0_RTX (e.vector_mode (0))); return e.use_contiguous_load_insn (icode); } }; @@ -1802,16 +1823,23 @@ public: unsigned int call_properties (const function_instance &) const override { - return CP_READ_MEMORY; + return CP_READ_MEMORY | CP_HAS_ELSE; } rtx expand (function_expander &e) const override { - insn_code icode = (e.vectors_per_tuple () == 1 - ? code_for_aarch64_ldnt1 (e.vector_mode (0)) - : code_for_aarch64 (UNSPEC_LDNT1_COUNT, - e.tuple_mode (0))); + insn_code icode; + if (e.vectors_per_tuple () == 1) + { + icode = code_for_aarch64_ldnt1 (e.vector_mode (0)); + e.args.quick_push (CONST0_RTX (e.vector_mode (0))); + } + else + { + icode = code_for_aarch64 (UNSPEC_LDNT1_COUNT, e.tuple_mode (0)); + e.args.quick_push (CONST0_RTX (e.tuple_mode (0))); + } return e.use_contiguous_load_insn (icode); } }; diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc index e7c703c987e..7214f1f5a3e 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins.cc +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc @@ -4207,6 +4207,11 @@ function_expander::use_contiguous_load_insn (insn_code icode) add_input_operand (icode, args[0]); if (GET_MODE_UNIT_BITSIZE (mem_mode) < type_suffix (0).element_bits) add_input_operand (icode, CONSTM1_RTX (VNx16BImode)); + + /* If we have an else operand, add it. 
*/ + if (call_properties () & CP_HAS_ELSE) + add_input_operand (icode, args.last ()); + return generate_insn (icode); } diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h b/gcc/config/aarch64/aarch64-sve-builtins.h index 645e56badbe..6cda8bd8a8c 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins.h +++ b/gcc/config/aarch64/aarch64-sve-builtins.h @@ -103,6 +103,7 @@ const unsigned int CP_READ_ZA = 1U << 7; const unsigned int CP_WRITE_ZA = 1U << 8; const unsigned int CP_READ_ZT0 = 1U << 9; const unsigned int CP_WRITE_ZT0 = 1U << 10; +const unsigned int CP_HAS_ELSE = 1U << 11; /* Enumerates the SVE predicate and (data) vector types, together called "vector types" for brevity. */ diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md index 06bd3e4bb2c..1e12fa3c982 100644 --- a/gcc/config/aarch64/aarch64-sve.md +++ b/gcc/config/aarch64/aarch64-sve.md @@ -1291,7 +1291,8 @@ (define_insn "maskload" [(set (match_operand:SVE_ALL 0 "register_operand" "=w") (unspec:SVE_ALL [(match_operand: 2 "register_operand" "Upl") - (match_operand:SVE_ALL 1 "memory_operand" "m")] + (match_operand:SVE_ALL 1 "memory_operand" "m") + (match_operand:SVE_ALL 3 "aarch64_maskload_else_operand")] UNSPEC_LD1_SVE))] "TARGET_SVE" "ld1\t%0., %2/z, %1" @@ -1302,11 +1303,14 @@ (define_expand "vec_load_lanes" [(set (match_operand:SVE_STRUCT 0 "register_operand") (unspec:SVE_STRUCT [(match_dup 2) - (match_operand:SVE_STRUCT 1 "memory_operand")] + (match_operand:SVE_STRUCT 1 "memory_operand") + (match_dup 3) + ] UNSPEC_LDN))] "TARGET_SVE" { operands[2] = aarch64_ptrue_reg (mode); + operands[3] = CONST0_RTX (mode); } ) @@ -1315,7 +1319,8 @@ (define_insn "vec_mask_load_lanes" [(set (match_operand:SVE_STRUCT 0 "register_operand" "=w") (unspec:SVE_STRUCT [(match_operand: 2 "register_operand" "Upl") - (match_operand:SVE_STRUCT 1 "memory_operand" "m")] + (match_operand:SVE_STRUCT 1 "memory_operand" "m") + (match_operand 3 "aarch64_maskload_else_operand")] UNSPEC_LDN))] "TARGET_SVE" 
"ld\t%0, %2/z, %1" @@ -1335,6 +1340,27 @@ (define_insn "vec_mask_load_lanes" ;; Predicated load and extend, with 8 elements per 128-bit block. (define_insn_and_rewrite "@aarch64_load_" + [(set (match_operand:SVE_HSDI 0 "register_operand" "=w") + (unspec:SVE_HSDI + [(match_operand: 3 "general_operand" "UplDnm") + (ANY_EXTEND:SVE_HSDI + (unspec:SVE_PARTIAL_I + [(match_operand: 2 "register_operand" "Upl") + (match_operand:SVE_PARTIAL_I 1 "memory_operand" "m") + (match_operand:SVE_PARTIAL_I 4 "aarch64_maskload_else_operand")] + SVE_PRED_LOAD))] + UNSPEC_PRED_X))] + "TARGET_SVE && (~ & ) == 0" + "ld1\t%0., %2/z, %1" + "&& !CONSTANT_P (operands[3])" + { + operands[3] = CONSTM1_RTX (mode); + } +) + +;; Same as above without the maskload_else_operand to still allow combine to +;; match a sign-extended pred_mov pattern. +(define_insn_and_rewrite "*aarch64_load__mov" [(set (match_operand:SVE_HSDI 0 "register_operand" "=w") (unspec:SVE_HSDI [(match_operand: 3 "general_operand" "UplDnm") @@ -1433,7 +1459,8 @@ (define_insn "@aarch64_ldnt1" [(set (match_operand:SVE_FULL 0 "register_operand" "=w") (unspec:SVE_FULL [(match_operand: 2 "register_operand" "Upl") - (match_operand:SVE_FULL 1 "memory_operand" "m")] + (match_operand:SVE_FULL 1 "memory_operand" "m") + (match_operand:SVE_FULL 3 "aarch64_maskload_else_operand")] UNSPEC_LDNT1_SVE))] "TARGET_SVE" "ldnt1\t%0., %2/z, %1" @@ -1456,11 +1483,13 @@ (define_expand "gather_load" (match_operand: 2 "register_operand") (match_operand:DI 3 "const_int_operand") (match_operand:DI 4 "aarch64_gather_scale_operand_") + (match_dup 6) (mem:BLK (scratch))] UNSPEC_LD1_GATHER))] "TARGET_SVE && TARGET_NON_STREAMING" { operands[5] = aarch64_ptrue_reg (mode); + operands[6] = CONST0_RTX (mode); } ) @@ -1474,6 +1503,7 @@ (define_insn "mask_gather_load" (match_operand:VNx4SI 2 "register_operand") (match_operand:DI 3 "const_int_operand") (match_operand:DI 4 "aarch64_gather_scale_operand_") + (match_operand:SVE_4 6 "aarch64_maskload_else_operand") 
(mem:BLK (scratch))] UNSPEC_LD1_GATHER))] "TARGET_SVE && TARGET_NON_STREAMING" @@ -1503,6 +1533,7 @@ (define_insn "mask_gather_load" (match_operand:VNx2DI 2 "register_operand") (match_operand:DI 3 "const_int_operand") (match_operand:DI 4 "aarch64_gather_scale_operand_") + (match_operand:SVE_2 6 "aarch64_maskload_else_operand") (mem:BLK (scratch))] UNSPEC_LD1_GATHER))] "TARGET_SVE && TARGET_NON_STREAMING" @@ -1531,6 +1562,7 @@ (define_insn_and_rewrite "*mask_gather_load_xtw_unpac UNSPEC_PRED_X) (match_operand:DI 3 "const_int_operand") (match_operand:DI 4 "aarch64_gather_scale_operand_") + (match_operand:SVE_2 7 "aarch64_maskload_else_operand") (mem:BLK (scratch))] UNSPEC_LD1_GATHER))] "TARGET_SVE && TARGET_NON_STREAMING" @@ -1561,6 +1593,7 @@ (define_insn_and_rewrite "*mask_gather_load_sxtw" UNSPEC_PRED_X) (match_operand:DI 3 "const_int_operand") (match_operand:DI 4 "aarch64_gather_scale_operand_") + (match_operand:SVE_2 7 "aarch64_maskload_else_operand") (mem:BLK (scratch))] UNSPEC_LD1_GATHER))] "TARGET_SVE && TARGET_NON_STREAMING" @@ -1588,6 +1621,7 @@ (define_insn "*mask_gather_load_uxtw" (match_operand:VNx2DI 6 "aarch64_sve_uxtw_immediate")) (match_operand:DI 3 "const_int_operand") (match_operand:DI 4 "aarch64_gather_scale_operand_") + (match_operand:SVE_2 7 "aarch64_maskload_else_operand") (mem:BLK (scratch))] UNSPEC_LD1_GATHER))] "TARGET_SVE && TARGET_NON_STREAMING" @@ -1624,6 +1658,7 @@ (define_insn_and_rewrite "@aarch64_gather_load_ (match_operand:VNx4SI 2 "register_operand") (match_operand:DI 3 "const_int_operand") (match_operand:DI 4 "aarch64_gather_scale_operand_") + (match_operand:SVE_4BHI 7 "aarch64_maskload_else_operand") (mem:BLK (scratch))] UNSPEC_LD1_GATHER))] UNSPEC_PRED_X))] @@ -1663,6 +1698,7 @@ (define_insn_and_rewrite "@aarch64_gather_load_") + (match_operand:SVE_2BHSI 7 "aarch64_maskload_else_operand") (mem:BLK (scratch))] UNSPEC_LD1_GATHER))] UNSPEC_PRED_X))] @@ -1701,6 +1737,7 @@ (define_insn_and_rewrite "*aarch64_gather_load_") + 
(match_operand:SVE_2BHSI 8 "aarch64_maskload_else_operand") (mem:BLK (scratch))] UNSPEC_LD1_GATHER))] UNSPEC_PRED_X))] @@ -1738,6 +1775,7 @@ (define_insn_and_rewrite "*aarch64_gather_load_") + (match_operand:SVE_2BHSI 8 "aarch64_maskload_else_operand") (mem:BLK (scratch))] UNSPEC_LD1_GATHER))] UNSPEC_PRED_X))] @@ -1772,6 +1810,7 @@ (define_insn_and_rewrite "*aarch64_gather_load_") + (match_operand:SVE_2BHSI 8 "aarch64_maskload_else_operand") (mem:BLK (scratch))] UNSPEC_LD1_GATHER))] UNSPEC_PRED_X))] diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md index 5f2697c3179..22e8632af80 100644 --- a/gcc/config/aarch64/aarch64-sve2.md +++ b/gcc/config/aarch64/aarch64-sve2.md @@ -138,7 +138,8 @@ (define_insn "@aarch64_" [(set (match_operand:SVE_FULLx24 0 "aligned_register_operand" "=Uw") (unspec:SVE_FULLx24 [(match_operand:VNx16BI 2 "register_operand" "Uph") - (match_operand:SVE_FULLx24 1 "memory_operand" "m")] + (match_operand:SVE_FULLx24 1 "memory_operand" "m") + (match_operand:SVE_FULLx24 3 "aarch64_maskload_else_operand")] LD1_COUNT))] "TARGET_STREAMING_SME2" "\t%0, %K2/z, %1" diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md index 8f3aab2272c..744f36ff67d 100644 --- a/gcc/config/aarch64/predicates.md +++ b/gcc/config/aarch64/predicates.md @@ -1069,3 +1069,7 @@ (define_predicate "aarch64_granule16_simm9" (and (match_code "const_int") (match_test "IN_RANGE (INTVAL (op), -4096, 4080) && !(INTVAL (op) & 0xf)"))) + +(define_predicate "aarch64_maskload_else_operand" + (and (match_code "const_int,const_vector") + (match_test "op == CONST0_RTX (GET_MODE (op))"))) From patchwork Fri Oct 18 14:22:18 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Robin Dapp X-Patchwork-Id: 1999209 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass 
From: Robin Dapp
To: gcc-patches@gcc.gnu.org
Cc: rdapp.gcc@gmail.com, rguenther@suse.de, richard.sandiford@arm.com, jeffreyalaw@gmail.com, ams@baylibre.com
Subject: [PATCH v2 6/8] gcn: Add else operand to masked loads.
Date: Fri, 18 Oct 2024 16:22:18 +0200
Message-ID: <20241018142220.173482-7-rdapp@ventanamicro.com>
X-Mailer: git-send-email 2.47.0
In-Reply-To: <20241018142220.173482-1-rdapp@ventanamicro.com>
References: <20241018142220.173482-1-rdapp@ventanamicro.com>
MIME-Version: 1.0

This patch adds an undefined else operand to the masked loads.

gcc/ChangeLog:

	* config/gcn/predicates.md (maskload_else_operand): New predicate.
	* config/gcn/gcn-valu.md: Use new predicate.
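
To make the difference between a zero and an undefined else operand
concrete, here is a scalar model.  It is a sketch with hypothetical
names, not GCN code.  A null els pointer stands for the undefined case
and is modelled by leaving inactive lanes untouched, which is only one
of the behaviours a target would be free to choose.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar model of a masked load.  Active lanes read memory.  Inactive
// lanes take *els when an else value is supplied; with els == nullptr
// (modelling an undefined else operand) they keep whatever dest held.
void
masked_load (std::vector<int> &dest, const std::vector<int> &mem,
	     const std::vector<bool> &mask, const int *els)
{
  assert (dest.size () == mem.size () && mem.size () == mask.size ());
  for (std::size_t i = 0; i < dest.size (); ++i)
    {
      if (mask[i])
	dest[i] = mem[i];
      else if (els != nullptr)
	dest[i] = *els;
    }
}
```

With an undefined else operand the expander no longer has to
initialize the destination before the load, which matches the removal
of the explicit zeroing move in this patch.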
--- gcc/config/gcn/gcn-valu.md | 12 ++++-------- gcc/config/gcn/predicates.md | 2 ++ 2 files changed, 6 insertions(+), 8 deletions(-) diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md index cb2f4a78035..15e9fe8da40 100644 --- a/gcc/config/gcn/gcn-valu.md +++ b/gcc/config/gcn/gcn-valu.md @@ -3989,7 +3989,8 @@ (define_expand "while_ultsidi" (define_expand "maskloaddi" [(match_operand:V_MOV 0 "register_operand") (match_operand:V_MOV 1 "memory_operand") - (match_operand 2 "")] + (match_operand 2 "") + (match_operand:V_MOV 3 "maskload_else_operand")] "" { rtx exec = force_reg (DImode, operands[2]); @@ -3998,9 +3999,6 @@ (define_expand "maskloaddi" rtx as = gen_rtx_CONST_INT (VOIDmode, MEM_ADDR_SPACE (operands[1])); rtx v = gen_rtx_CONST_INT (VOIDmode, MEM_VOLATILE_P (operands[1])); - /* Masked lanes are required to hold zero. */ - emit_move_insn (operands[0], gcn_vec_constant (mode, 0)); - emit_insn (gen_gather_expr_exec (operands[0], addr, as, v, operands[0], exec)); DONE; @@ -4027,7 +4025,8 @@ (define_expand "mask_gather_load" (match_operand: 2 "register_operand") (match_operand 3 "immediate_operand") (match_operand:SI 4 "gcn_alu_operand") - (match_operand:DI 5 "")] + (match_operand:DI 5 "") + (match_operand:V_MOV 6 "maskload_else_operand")] "" { rtx exec = force_reg (DImode, operands[5]); @@ -4036,9 +4035,6 @@ (define_expand "mask_gather_load" operands[2], operands[4], INTVAL (operands[3]), exec); - /* Masked lanes are required to hold zero. 
*/ - emit_move_insn (operands[0], gcn_vec_constant (mode, 0)); - if (GET_MODE (addr) == mode) emit_insn (gen_gather_insn_1offset_exec (operands[0], addr, const0_rtx, const0_rtx, diff --git a/gcc/config/gcn/predicates.md b/gcc/config/gcn/predicates.md index 3f59396a649..21beeb586a4 100644 --- a/gcc/config/gcn/predicates.md +++ b/gcc/config/gcn/predicates.md @@ -228,3 +228,5 @@ (define_predicate "ascending_zero_int_parallel" return gcn_stepped_zero_int_parallel_p (op, 1); }) +(define_predicate "maskload_else_operand" + (match_operand 0 "scratch_operand")) From patchwork Fri Oct 18 14:22:19 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Robin Dapp X-Patchwork-Id: 1999202 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256 header.s=20230601 header.b=IXUFGVHC; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4XVRlD2025z1xwD for ; Sat, 19 Oct 2024 01:23:36 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 7A4C63857C4F for ; Fri, 18 Oct 2024 14:23:34 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-ej1-x629.google.com 
From: Robin Dapp
To: gcc-patches@gcc.gnu.org
Cc: rdapp.gcc@gmail.com, rguenther@suse.de, richard.sandiford@arm.com, jeffreyalaw@gmail.com, ams@baylibre.com
Subject: [PATCH v2 7/8] i386: Add else operand to masked loads.
Date: Fri, 18 Oct 2024 16:22:19 +0200
Message-ID: <20241018142220.173482-8-rdapp@ventanamicro.com>
X-Mailer: git-send-email 2.47.0
In-Reply-To: <20241018142220.173482-1-rdapp@ventanamicro.com>
References: <20241018142220.173482-1-rdapp@ventanamicro.com>
MIME-Version: 1.0

This patch adds a zero else operand to masked loads, in particular the
masked gather load builtins that are used for gather vectorization.

gcc/ChangeLog:

	* config/i386/i386-expand.cc (ix86_expand_special_args_builtin):
	Add else-operand handling.
	(ix86_expand_builtin): Ditto.
	* config/i386/predicates.md (vcvtne2ps2bf_parallel): New predicate.
	(maskload_else_operand): Ditto.
	* config/i386/sse.md: Use predicate.
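
For readers less familiar with gathers, the effect of a zero else
operand on a masked gather load can be modelled in scalar code.  The
function below is a hypothetical sketch, not the exact semantics of any
x86 builtin: lane i reads base[index[i]] when its mask bit is set and
yields zero otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar model of a masked gather load with a zero else operand.
// The index of an inactive lane is never dereferenced, so it may be
// out of range without causing an access.
std::vector<int>
masked_gather_zero_else (const std::vector<int> &base,
			 const std::vector<std::size_t> &index,
			 const std::vector<bool> &mask)
{
  assert (index.size () == mask.size ());
  std::vector<int> result (index.size (), 0);  // else value: zero
  for (std::size_t i = 0; i < index.size (); ++i)
    if (mask[i])
      result[i] = base.at (index[i]);
  return result;
}
```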
--- gcc/config/i386/i386-expand.cc | 26 +++++-- gcc/config/i386/predicates.md | 4 ++ gcc/config/i386/sse.md | 124 ++++++++++++++++++++------------- 3 files changed, 101 insertions(+), 53 deletions(-) diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc index 63f5e348d64..f6a2c2d65b8 100644 --- a/gcc/config/i386/i386-expand.cc +++ b/gcc/config/i386/i386-expand.cc @@ -12994,10 +12994,11 @@ ix86_expand_special_args_builtin (const struct builtin_description *d, { tree arg; rtx pat, op; - unsigned int i, nargs, arg_adjust, memory; + unsigned int i, nargs, arg_adjust, memory = -1; unsigned int constant = 100; bool aligned_mem = false; - rtx xops[4]; + rtx xops[4] = {}; + bool add_els = false; enum insn_code icode = d->icode; const struct insn_data_d *insn_p = &insn_data[icode]; machine_mode tmode = insn_p->operand[0].mode; @@ -13124,6 +13125,9 @@ ix86_expand_special_args_builtin (const struct builtin_description *d, case V4DI_FTYPE_PCV4DI_V4DI: case V4SI_FTYPE_PCV4SI_V4SI: case V2DI_FTYPE_PCV2DI_V2DI: + /* Two actual args but an additional else operand. */ + add_els = true; + /* Fallthru. 
*/ case VOID_FTYPE_INT_INT64: nargs = 2; klass = load; @@ -13396,6 +13400,12 @@ ix86_expand_special_args_builtin (const struct builtin_description *d, xops[i]= op; } + if (add_els) + { + xops[i] = CONST0_RTX (GET_MODE (xops[0])); + nargs++; + } + switch (nargs) { case 0: @@ -13652,7 +13662,7 @@ ix86_expand_builtin (tree exp, rtx target, rtx subtarget, enum insn_code icode, icode2; tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0); tree arg0, arg1, arg2, arg3, arg4; - rtx op0, op1, op2, op3, op4, pat, pat2, insn; + rtx op0, op1, op2, op3, op4, opels, pat, pat2, insn; machine_mode mode0, mode1, mode2, mode3, mode4; unsigned int fcode = DECL_MD_FUNCTION_CODE (fndecl); HOST_WIDE_INT bisa, bisa2; @@ -15559,12 +15569,15 @@ rdseed_step: op3 = copy_to_reg (op3); op3 = lowpart_subreg (mode3, op3, GET_MODE (op3)); } + if (!insn_data[icode].operand[5].predicate (op4, mode4)) { - error ("the last argument must be scale 1, 2, 4, 8"); - return const0_rtx; + error ("the last argument must be scale 1, 2, 4, 8"); + return const0_rtx; } + opels = CONST0_RTX (GET_MODE (subtarget)); + /* Optimize. If mask is known to have all high bits set, replace op0 with pc_rtx to signal that the instruction overwrites the whole destination and doesn't use its @@ -15633,7 +15646,8 @@ rdseed_step: } } - pat = GEN_FCN (icode) (subtarget, op0, op1, op2, op3, op4); + pat = GEN_FCN (icode) (subtarget, op0, op1, op2, op3, op4, opels); + if (! 
pat) return const0_rtx; emit_insn (pat); diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md index 053312bbe27..7c7d8f61f11 100644 --- a/gcc/config/i386/predicates.md +++ b/gcc/config/i386/predicates.md @@ -2346,3 +2346,7 @@ (define_predicate "apx_evex_add_memory_operand" return true; }) + +(define_predicate "maskload_else_operand" + (and (match_code "const_int,const_vector") + (match_test "op == CONST0_RTX (GET_MODE (op))"))) diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index a45b50ad732..83955eee5a0 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -1575,7 +1575,8 @@ (define_expand "_load_mask" } else if (MEM_P (operands[1])) operands[1] = gen_rtx_UNSPEC (mode, - gen_rtvec(1, operands[1]), + gen_rtvec(2, operands[1], + CONST0_RTX (mode)), UNSPEC_MASKLOAD); }) @@ -1583,7 +1584,8 @@ (define_insn "*_load_mask" [(set (match_operand:V48_AVX512VL 0 "register_operand" "=v") (vec_merge:V48_AVX512VL (unspec:V48_AVX512VL - [(match_operand:V48_AVX512VL 1 "memory_operand" "m")] + [(match_operand:V48_AVX512VL 1 "memory_operand" "m") + (match_operand:V48_AVX512VL 4 "maskload_else_operand")] UNSPEC_MASKLOAD) (match_operand:V48_AVX512VL 2 "nonimm_or_0_operand" "0C") (match_operand: 3 "register_operand" "Yk")))] @@ -1611,7 +1613,8 @@ (define_insn "*_load_mask" (define_insn_and_split "*_load" [(set (match_operand:V48_AVX512VL 0 "register_operand") (unspec:V48_AVX512VL - [(match_operand:V48_AVX512VL 1 "memory_operand")] + [(match_operand:V48_AVX512VL 1 "memory_operand") + (match_operand:V48_AVX512VL 2 "maskload_else_operand")] UNSPEC_MASKLOAD))] "TARGET_AVX512F" "#" @@ -1633,7 +1636,8 @@ (define_expand "_load_mask" } else if (MEM_P (operands[1])) operands[1] = gen_rtx_UNSPEC (mode, - gen_rtvec(1, operands[1]), + gen_rtvec(2, operands[1], + CONST0_RTX (mode)), UNSPEC_MASKLOAD); }) @@ -1642,7 +1646,8 @@ (define_insn "*_load_mask" [(set (match_operand:VI12HFBF_AVX512VL 0 "register_operand" "=v") (vec_merge:VI12HFBF_AVX512VL 
(unspec:VI12HFBF_AVX512VL - [(match_operand:VI12HFBF_AVX512VL 1 "memory_operand" "m")] + [(match_operand:VI12HFBF_AVX512VL 1 "memory_operand" "m") + (match_operand:VI12HFBF_AVX512VL 4 "maskload_else_operand")] UNSPEC_MASKLOAD) (match_operand:VI12HFBF_AVX512VL 2 "nonimm_or_0_operand" "0C") (match_operand: 3 "register_operand" "Yk")))] @@ -1655,7 +1660,8 @@ (define_insn "*_load_mask" (define_insn_and_split "*_load" [(set (match_operand:VI12HFBF_AVX512VL 0 "register_operand" "=v") (unspec:VI12HFBF_AVX512VL - [(match_operand:VI12HFBF_AVX512VL 1 "memory_operand" "m")] + [(match_operand:VI12HFBF_AVX512VL 1 "memory_operand" "m") + (match_operand:VI12HFBF_AVX512VL 2 "maskload_else_operand")] UNSPEC_MASKLOAD))] "TARGET_AVX512BW" "#" @@ -28586,7 +28592,8 @@ (define_insn "_maskload" [(set (match_operand:V48_128_256 0 "register_operand" "=x") (unspec:V48_128_256 [(match_operand: 2 "register_operand" "x") - (match_operand:V48_128_256 1 "memory_operand" "jm")] + (match_operand:V48_128_256 1 "memory_operand" "jm") + (match_operand:V48_128_256 3 "maskload_else_operand")] UNSPEC_MASKMOV))] "TARGET_AVX" { @@ -28627,7 +28634,8 @@ (define_expand "maskload" [(set (match_operand:V48_128_256 0 "register_operand") (unspec:V48_128_256 [(match_operand: 2 "register_operand") - (match_operand:V48_128_256 1 "memory_operand")] + (match_operand:V48_128_256 1 "memory_operand") + (match_operand:V48_128_256 3 "maskload_else_operand")] UNSPEC_MASKMOV))] "TARGET_AVX") @@ -28635,20 +28643,24 @@ (define_expand "maskload" [(set (match_operand:V48_AVX512VL 0 "register_operand") (vec_merge:V48_AVX512VL (unspec:V48_AVX512VL - [(match_operand:V48_AVX512VL 1 "memory_operand")] + [(match_operand:V48_AVX512VL 1 "memory_operand") + (match_operand:V48_AVX512VL 3 "maskload_else_operand")] UNSPEC_MASKLOAD) (match_dup 0) - (match_operand: 2 "register_operand")))] + (match_operand: 2 "register_operand"))) + ] "TARGET_AVX512F") (define_expand "maskload" [(set (match_operand:VI12HFBF_AVX512VL 0 "register_operand") 
(vec_merge:VI12HFBF_AVX512VL (unspec:VI12HFBF_AVX512VL - [(match_operand:VI12HFBF_AVX512VL 1 "memory_operand")] + [(match_operand:VI12HFBF_AVX512VL 1 "memory_operand") + (match_operand:VI12HFBF_AVX512VL 3 "maskload_else_operand")] UNSPEC_MASKLOAD) (match_dup 0) - (match_operand: 2 "register_operand")))] + (match_operand: 2 "register_operand"))) + ] "TARGET_AVX512BW") (define_expand "maskstore" @@ -29214,20 +29226,22 @@ (define_expand "avx2_gathersi" (unspec:VEC_GATHER_MODE [(match_operand:VEC_GATHER_MODE 1 "register_operand") (mem: - (match_par_dup 6 + (match_par_dup 7 [(match_operand 2 "vsib_address_operand") (match_operand: 3 "register_operand") - (match_operand:SI 5 "const1248_operand ")])) + (match_operand:SI 5 "const1248_operand ") + (match_operand:VEC_GATHER_MODE 6 "maskload_else_operand")])) (mem:BLK (scratch)) (match_operand:VEC_GATHER_MODE 4 "register_operand")] UNSPEC_GATHER)) - (clobber (match_scratch:VEC_GATHER_MODE 7))])] + (clobber (match_scratch:VEC_GATHER_MODE 8))])] "TARGET_AVX2" { - operands[6] - = gen_rtx_UNSPEC (Pmode, gen_rtvec (3, operands[2], operands[3], - operands[5]), UNSPEC_VSIBADDR); + operands[7] + = gen_rtx_UNSPEC (Pmode, gen_rtvec (4, operands[2], operands[3], + operands[5], operands[6]), + UNSPEC_VSIBADDR); }) (define_insn "*avx2_gathersi" @@ -29238,7 +29252,8 @@ (define_insn "*avx2_gathersi" [(unspec:P [(match_operand:P 3 "vsib_address_operand" "jb") (match_operand: 4 "register_operand" "x") - (match_operand:SI 6 "const1248_operand")] + (match_operand:SI 6 "const1248_operand") + (match_operand:VEC_GATHER_MODE 8 "maskload_else_operand")] UNSPEC_VSIBADDR)]) (mem:BLK (scratch)) (match_operand:VEC_GATHER_MODE 5 "register_operand" "1")] @@ -29259,7 +29274,8 @@ (define_insn "*avx2_gathersi_2" [(unspec:P [(match_operand:P 2 "vsib_address_operand" "jb") (match_operand: 3 "register_operand" "x") - (match_operand:SI 5 "const1248_operand")] + (match_operand:SI 5 "const1248_operand") + (match_operand:VEC_GATHER_MODE 7 "maskload_else_operand")] 
UNSPEC_VSIBADDR)]) (mem:BLK (scratch)) (match_operand:VEC_GATHER_MODE 4 "register_operand" "1")] @@ -29277,20 +29293,22 @@ (define_expand "avx2_gatherdi" (unspec:VEC_GATHER_MODE [(match_operand: 1 "register_operand") (mem: - (match_par_dup 6 + (match_par_dup 7 [(match_operand 2 "vsib_address_operand") (match_operand: 3 "register_operand") - (match_operand:SI 5 "const1248_operand ")])) + (match_operand:SI 5 "const1248_operand ") + (match_operand:VEC_GATHER_MODE 6 "maskload_else_operand")])) (mem:BLK (scratch)) (match_operand: 4 "register_operand")] UNSPEC_GATHER)) - (clobber (match_scratch:VEC_GATHER_MODE 7))])] + (clobber (match_scratch:VEC_GATHER_MODE 8))])] "TARGET_AVX2" { - operands[6] - = gen_rtx_UNSPEC (Pmode, gen_rtvec (3, operands[2], operands[3], - operands[5]), UNSPEC_VSIBADDR); + operands[7] + = gen_rtx_UNSPEC (Pmode, gen_rtvec (4, operands[2], operands[3], + operands[5], operands[6]), + UNSPEC_VSIBADDR); }) (define_insn "*avx2_gatherdi" @@ -29301,7 +29319,8 @@ (define_insn "*avx2_gatherdi" [(unspec:P [(match_operand:P 3 "vsib_address_operand" "jb") (match_operand: 4 "register_operand" "x") - (match_operand:SI 6 "const1248_operand")] + (match_operand:SI 6 "const1248_operand") + (match_operand:VEC_GATHER_MODE 8 "maskload_else_operand")] UNSPEC_VSIBADDR)]) (mem:BLK (scratch)) (match_operand: 5 "register_operand" "1")] @@ -29322,7 +29341,8 @@ (define_insn "*avx2_gatherdi_2" [(unspec:P [(match_operand:P 2 "vsib_address_operand" "jb") (match_operand: 3 "register_operand" "x") - (match_operand:SI 5 "const1248_operand")] + (match_operand:SI 5 "const1248_operand") + (match_operand:VEC_GATHER_MODE 7 "maskload_else_operand")] UNSPEC_VSIBADDR)]) (mem:BLK (scratch)) (match_operand: 4 "register_operand" "1")] @@ -29348,7 +29368,8 @@ (define_insn "*avx2_gatherdi_3" [(unspec:P [(match_operand:P 3 "vsib_address_operand" "jb") (match_operand: 4 "register_operand" "x") - (match_operand:SI 6 "const1248_operand")] + (match_operand:SI 6 "const1248_operand") + 
(match_operand:VI4F_256 8 "maskload_else_operand")] UNSPEC_VSIBADDR)]) (mem:BLK (scratch)) (match_operand: 5 "register_operand" "1")] @@ -29372,7 +29393,8 @@ (define_insn "*avx2_gatherdi_4" [(unspec:P [(match_operand:P 2 "vsib_address_operand" "jb") (match_operand: 3 "register_operand" "x") - (match_operand:SI 5 "const1248_operand")] + (match_operand:SI 5 "const1248_operand") + (match_operand:VI4F_256 7 "maskload_else_operand")] UNSPEC_VSIBADDR)]) (mem:BLK (scratch)) (match_operand: 4 "register_operand" "1")] @@ -29393,17 +29415,19 @@ (define_expand "_gathersi" [(match_operand:VI48F 1 "register_operand") (match_operand: 4 "register_operand") (mem: - (match_par_dup 6 + (match_par_dup 7 [(match_operand 2 "vsib_address_operand") (match_operand: 3 "register_operand") - (match_operand:SI 5 "const1248_operand")]))] + (match_operand:SI 5 "const1248_operand") + (match_operand:VI48F 6 "maskload_else_operand")]))] UNSPEC_GATHER)) - (clobber (match_scratch: 7))])] + (clobber (match_scratch: 8))])] "TARGET_AVX512F" { - operands[6] - = gen_rtx_UNSPEC (Pmode, gen_rtvec (3, operands[2], operands[3], - operands[5]), UNSPEC_VSIBADDR); + operands[7] + = gen_rtx_UNSPEC (Pmode, gen_rtvec (4, operands[2], operands[3], + operands[5], operands[6]), + UNSPEC_VSIBADDR); }) (define_insn "*avx512f_gathersi" @@ -29415,7 +29439,8 @@ (define_insn "*avx512f_gathersi" [(unspec:P [(match_operand:P 4 "vsib_address_operand" "Tv") (match_operand: 3 "register_operand" "v") - (match_operand:SI 5 "const1248_operand")] + (match_operand:SI 5 "const1248_operand") + (match_operand:VI48F 8 "maskload_else_operand")] UNSPEC_VSIBADDR)])] UNSPEC_GATHER)) (clobber (match_scratch: 2 "=&Yk"))] @@ -29436,7 +29461,8 @@ (define_insn "*avx512f_gathersi_2" [(unspec:P [(match_operand:P 3 "vsib_address_operand" "Tv") (match_operand: 2 "register_operand" "v") - (match_operand:SI 4 "const1248_operand")] + (match_operand:SI 4 "const1248_operand") + (match_operand:VI48F 7 "maskload_else_operand")] UNSPEC_VSIBADDR)])] 
UNSPEC_GATHER)) (clobber (match_scratch: 1 "=&Yk"))] @@ -29455,17 +29481,19 @@ (define_expand "_gatherdi" [(match_operand: 1 "register_operand") (match_operand:QI 4 "register_operand") (mem: - (match_par_dup 6 + (match_par_dup 7 [(match_operand 2 "vsib_address_operand") (match_operand: 3 "register_operand") - (match_operand:SI 5 "const1248_operand")]))] + (match_operand:SI 5 "const1248_operand") + (match_operand:VI48F 6 "maskload_else_operand")]))] UNSPEC_GATHER)) - (clobber (match_scratch:QI 7))])] + (clobber (match_scratch:QI 8))])] "TARGET_AVX512F" { - operands[6] - = gen_rtx_UNSPEC (Pmode, gen_rtvec (3, operands[2], operands[3], - operands[5]), UNSPEC_VSIBADDR); + operands[7] + = gen_rtx_UNSPEC (Pmode, gen_rtvec (4, operands[2], operands[3], + operands[5], operands[6]), + UNSPEC_VSIBADDR); }) (define_insn "*avx512f_gatherdi" @@ -29477,7 +29505,8 @@ (define_insn "*avx512f_gatherdi" [(unspec:P [(match_operand:P 4 "vsib_address_operand" "Tv") (match_operand: 3 "register_operand" "v") - (match_operand:SI 5 "const1248_operand")] + (match_operand:SI 5 "const1248_operand") + (match_operand:VI48F 8 "maskload_else_operand")] UNSPEC_VSIBADDR)])] UNSPEC_GATHER)) (clobber (match_scratch:QI 2 "=&Yk"))] @@ -29498,7 +29527,8 @@ (define_insn "*avx512f_gatherdi_2" [(unspec:P [(match_operand:P 3 "vsib_address_operand" "Tv") (match_operand: 2 "register_operand" "v") - (match_operand:SI 4 "const1248_operand")] + (match_operand:SI 4 "const1248_operand") + (match_operand:VI48F 7 "maskload_else_operand")] UNSPEC_VSIBADDR)])] UNSPEC_GATHER)) (clobber (match_scratch:QI 1 "=&Yk"))] @@ -29535,7 +29565,7 @@ (define_expand "_scattersi" operands[5] = gen_rtx_UNSPEC (Pmode, gen_rtvec (4, operands[0], operands[2], operands[4], operands[1]), - UNSPEC_VSIBADDR); + UNSPEC_VSIBADDR); }) (define_insn "*avx512f_scattersi" From patchwork Fri Oct 18 14:22:20 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Robin Dapp 
X-Patchwork-Id: 1999208 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256 header.s=20230601 header.b=LrHbELlx; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4XVRnX4LvFz1xw2 for ; Sat, 19 Oct 2024 01:25:36 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id CD3453858410 for ; Fri, 18 Oct 2024 14:25:34 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-ej1-x630.google.com (mail-ej1-x630.google.com [IPv6:2a00:1450:4864:20::630]) by sourceware.org (Postfix) with ESMTPS id D71023858282 for ; Fri, 18 Oct 2024 14:22:32 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org D71023858282 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org D71023858282 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::630 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1729261362; cv=none; 
b=pituFBp5BwkJa8pU6lNDZn8NNkqO0r3VHHnb/FAONavKlJ1+pXMd1dU3ITwtpBFFYlbLBCUwdyTuYBSwtT9b2vUKyfuo2WB83tiCbLuom805Jd/DUX5/bhK+3VN7PuhY5fyclLVXYIo+sGOPcqKm4dzcaTsyDcYrxbimtZDzFPA= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1729261362; c=relaxed/simple; bh=27SJjBNZ68bmD0kJlpyV23ldoM7/AoNvU5ik9XrBXDs=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=cvsM4UUZYY5oqtuPRbGj2bxbXC+jEKmcB5t4kx4WtiFf9Ms6HqdCHw6OTGDYeayIJMJe+dAIXQLzdm0Dckef46isD4/r/BmpaglpFUTMo+Rh9/Yv5bH4rqeGoucL6uYmpKCMpgO50rd81QSHO5WLg8MNYPbXTc15bD7B66GgfBo= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-ej1-x630.google.com with SMTP id a640c23a62f3a-a9a2cdc6f0cso265544966b.2 for ; Fri, 18 Oct 2024 07:22:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1729261350; x=1729866150; darn=gcc.gnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=zyQkWOIHdJM/+cYOES9I1Dmo7cBKEOVJFKOh7QZ6xCg=; b=LrHbELlxDpKg8iIRLO6A84m6XMeSn5o8061rLHdR5E9YI/h9H2cPQPECerQdBmzCmz NnEDOk1EjiYxuT2Wr047hW6ulg3IkdwX3+Ji7ErN4UFyuHIhrxJtrihxxDUvnbIsB7uU GOqWoeugyO1k7aFbsHOLKG7PvPMkvwSPWysN5WSOgYhJii8l7cnb8Jh3VnDDWaJWW8K5 xICYeh2ZXbId3UiZnfnrfH28+wahDD56fUTAV+7gv9DhsLGh6sXlN3NauTi6tV4iqHTT Nb0I796lIrhTjb4pe8nuMmNpky1l2husUTc60cmkPbcwOxH5HJ+h55s8b7TuhhicdUPf 8YpQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1729261350; x=1729866150; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=zyQkWOIHdJM/+cYOES9I1Dmo7cBKEOVJFKOh7QZ6xCg=; b=f/c9joa8nAC5MnK7nH3MI5qxhFW7dyEgGcmDJEu5d1FfA9oAm2y0xfcUpfUPDBJcdc LU1YpnPMJB6B8nJ0iSWQm8fcqfAJ2Ou1HnoMHoT9gcpaNC141qcyFjdTy0Tj1ROwtoDR teGWtxMr1bBqcybba8QTYR3K4mTH5/yVltpUvZNvhfVW8fs7dMyloXa3mNF+Q/4+y+4M 
0KKJvjIyrSXOEN/DIbWFveEy77oeSCvQTQ8Ac8DDHLNPBbaYrgcy7xt+lOq1effzewFX bO0rPXF+JFEa2vM+SnoK0k31i6GrVs0/wKDgihnVOAnUvJbXtOrr2rvOeiRJj+GecEBo +IWg== X-Gm-Message-State: AOJu0YzIFNmj3MJkiwg6u87UmeO8iptS1dcEGBffyq0cNWXqaXzVnLfE vCeZFR9DU00j+8Ajdt2ytqvFSBbcqenswLM6KBlt9uQvRQWMnLtzmvlfGw== X-Google-Smtp-Source: AGHT+IH2qnXoZi9yG6TDmVykPpkpbsBjWiq8Bkivz2cNtfXmPlN4wFp+Dbj6zFemWNhcZ2B3vQf9oQ== X-Received: by 2002:a17:907:7f8e:b0:a99:5f16:3539 with SMTP id a640c23a62f3a-a9a695d79c0mr225955266b.0.1729261350156; Fri, 18 Oct 2024 07:22:30 -0700 (PDT) Received: from x1c10.dc1.ventanamicro.com (ip-149-172-150-237.um42.pools.vodafone-ip.de. [149.172.150.237]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-a9a68c2677esm102812166b.188.2024.10.18.07.22.29 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 18 Oct 2024 07:22:29 -0700 (PDT) From: Robin Dapp X-Google-Original-From: Robin Dapp To: gcc-patches@gcc.gnu.org Cc: rdapp.gcc@gmail.com, rguenther@suse.de, richard.sandiford@arm.com, jeffreyalaw@gmail.com, ams@baylibre.com Subject: [PATCH v2 8/8] RISC-V: Add else operand to masked loads [PR115336]. 
Date: Fri, 18 Oct 2024 16:22:20 +0200
Message-ID: <20241018142220.173482-9-rdapp@ventanamicro.com>
X-Mailer: git-send-email 2.47.0
In-Reply-To: <20241018142220.173482-1-rdapp@ventanamicro.com>
References: <20241018142220.173482-1-rdapp@ventanamicro.com>
MIME-Version: 1.0
X-Spam-Status: No, score=-9.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.30
Precedence: list
List-Id: Gcc-patches mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org

This patch adds else operands to masked loads.  Currently the default
else operand predicate accepts "undefined" (i.e. SCRATCH) as well as
all-ones values.

Note that this series introduces a large number of new RVV FAILs for
riscv.  All of them are due to us not being able to elide redundant
vec_cond_exprs.

	PR middle-end/115336
	PR middle-end/116059

gcc/ChangeLog:

	* config/riscv/autovec.md: Add else operand.
	* config/riscv/predicates.md (maskload_else_operand): New predicate.
	* config/riscv/riscv-v.cc (get_else_operand): Remove static.
	(expand_load_store): Use get_else_operand and adjust index.
	(expand_gather_scatter): Ditto.
	(expand_lanes_load_store): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/pr115336.c: New test.
	* gcc.target/riscv/rvv/autovec/pr116059.c: New test.
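
For readers unfamiliar with the term, the following is a rough scalar sketch of what an "else operand" on a masked load means: it names the value that inactive lanes of the destination receive.  This is not GCC code; the function name masked_load_model and its parameters are made up for illustration.  An "undefined" (SCRATCH) else value has no direct scalar equivalent, so the model assumes a concrete value is passed in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scalar model, not GCC code: lane I of DST receives
   SRC[I] when MASK[I] is set and ELSE_VAL otherwise.  SRC[I] is only
   read for active lanes, mirroring how a masked load must not touch
   memory for inactive ones.  */
void
masked_load_model (int *dst, const int *src, const bool *mask,
                   int else_val, size_t n)
{
  assert (dst != NULL && src != NULL && mask != NULL);
  for (size_t i = 0; i < n; i++)
    dst[i] = mask[i] ? src[i] : else_val;
}
```

Calling the model with an else value of 0 corresponds to a zeroing masked load.  With an undefined else value, a consumer that needs a specific value in the inactive lanes would have to select it explicitly afterwards.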
--- gcc/config/riscv/autovec.md | 45 +++++++++++-------- gcc/config/riscv/predicates.md | 3 ++ gcc/config/riscv/riscv-v.cc | 26 +++++++---- .../gcc.target/riscv/rvv/autovec/pr115336.c | 20 +++++++++ .../gcc.target/riscv/rvv/autovec/pr116059.c | 13 ++++++ 5 files changed, 80 insertions(+), 27 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115336.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr116059.c diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index 7dc78a48874..a09f94021ca 100644 --- a/gcc/config/riscv/autovec.md +++ b/gcc/config/riscv/autovec.md @@ -26,8 +26,9 @@ (define_expand "mask_len_load" [(match_operand:V 0 "register_operand") (match_operand:V 1 "memory_operand") (match_operand: 2 "vector_mask_operand") - (match_operand 3 "autovec_length_operand") - (match_operand 4 "const_0_operand")] + (match_operand:V 3 "maskload_else_operand") + (match_operand 4 "autovec_length_operand") + (match_operand 5 "const_0_operand")] "TARGET_VECTOR" { riscv_vector::expand_load_store (operands, true); @@ -57,8 +58,9 @@ (define_expand "mask_len_gather_load" (match_operand 3 "") (match_operand 4 "") (match_operand: 5 "vector_mask_operand") - (match_operand 6 "autovec_length_operand") - (match_operand 7 "const_0_operand")] + (match_operand 6 "maskload_else_operand") + (match_operand 7 "autovec_length_operand") + (match_operand 8 "const_0_operand")] "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)" { riscv_vector::expand_gather_scatter (operands, true); @@ -72,8 +74,9 @@ (define_expand "mask_len_gather_load" (match_operand 3 "") (match_operand 4 "") (match_operand: 5 "vector_mask_operand") - (match_operand 6 "autovec_length_operand") - (match_operand 7 "const_0_operand")] + (match_operand 6 "maskload_else_operand") + (match_operand 7 "autovec_length_operand") + (match_operand 8 "const_0_operand")] "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)" { 
riscv_vector::expand_gather_scatter (operands, true); @@ -87,8 +90,9 @@ (define_expand "mask_len_gather_load" (match_operand 3 "") (match_operand 4 "") (match_operand: 5 "vector_mask_operand") - (match_operand 6 "autovec_length_operand") - (match_operand 7 "const_0_operand")] + (match_operand 6 "maskload_else_operand") + (match_operand 7 "autovec_length_operand") + (match_operand 8 "const_0_operand")] "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)" { riscv_vector::expand_gather_scatter (operands, true); @@ -102,8 +106,9 @@ (define_expand "mask_len_gather_load" (match_operand 3 "") (match_operand 4 "") (match_operand: 5 "vector_mask_operand") - (match_operand 6 "autovec_length_operand") - (match_operand 7 "const_0_operand")] + (match_operand 6 "maskload_else_operand") + (match_operand 7 "autovec_length_operand") + (match_operand 8 "const_0_operand")] "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)" { riscv_vector::expand_gather_scatter (operands, true); @@ -117,8 +122,9 @@ (define_expand "mask_len_gather_load" (match_operand 3 "") (match_operand 4 "") (match_operand: 5 "vector_mask_operand") - (match_operand 6 "autovec_length_operand") - (match_operand 7 "const_0_operand")] + (match_operand 6 "maskload_else_operand") + (match_operand 7 "autovec_length_operand") + (match_operand 8 "const_0_operand")] "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)" { riscv_vector::expand_gather_scatter (operands, true); @@ -132,8 +138,9 @@ (define_expand "mask_len_gather_load" (match_operand 3 "") (match_operand 4 "") (match_operand: 5 "vector_mask_operand") - (match_operand 6 "autovec_length_operand") - (match_operand 7 "const_0_operand")] + (match_operand 6 "maskload_else_operand") + (match_operand 7 "autovec_length_operand") + (match_operand 8 "const_0_operand")] "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)" { riscv_vector::expand_gather_scatter (operands, true); @@ -151,8 +158,9 @@ 
(define_expand "mask_len_gather_load" (match_operand 3 "") (match_operand 4 "") (match_operand: 5 "vector_mask_operand") - (match_operand 6 "autovec_length_operand") - (match_operand 7 "const_0_operand")] + (match_operand 6 "maskload_else_operand") + (match_operand 7 "autovec_length_operand") + (match_operand 8 "const_0_operand")] "TARGET_VECTOR" { riscv_vector::expand_gather_scatter (operands, true); @@ -280,8 +288,9 @@ (define_expand "vec_mask_len_load_lanes" [(match_operand:VT 0 "register_operand") (match_operand:VT 1 "memory_operand") (match_operand: 2 "vector_mask_operand") - (match_operand 3 "autovec_length_operand") - (match_operand 4 "const_0_operand")] + (match_operand 3 "maskload_else_operand") + (match_operand 4 "autovec_length_operand") + (match_operand 5 "const_0_operand")] "TARGET_VECTOR" { riscv_vector::expand_lanes_load_store (operands, true); diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md index 9971fabc587..7cc7c2b1f9d 100644 --- a/gcc/config/riscv/predicates.md +++ b/gcc/config/riscv/predicates.md @@ -528,6 +528,9 @@ (define_predicate "autovec_else_operand" (ior (match_operand 0 "register_operand") (match_operand 0 "scratch_operand"))) +(define_predicate "maskload_else_operand" + (match_operand 0 "scratch_operand")) + (define_predicate "vector_arith_operand" (ior (match_operand 0 "register_operand") (and (match_code "const_vector") diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index fba35652cc2..137f7f20268 100644 --- a/gcc/config/riscv/riscv-v.cc +++ b/gcc/config/riscv/riscv-v.cc @@ -3781,12 +3781,23 @@ expand_select_vl (rtx *ops) emit_insn (gen_no_side_effects_vsetvl_rtx (rvv_mode, ops[0], ops[1])); } +/* Return RVV_VUNDEF if the ELSE value is scratch rtx. */ +static rtx +get_else_operand (rtx op) +{ + return GET_CODE (op) == SCRATCH ? RVV_VUNDEF (GET_MODE (op)) : op; +} + /* Expand MASK_LEN_{LOAD,STORE}. 
*/ void expand_load_store (rtx *ops, bool is_load) { - rtx mask = ops[2]; - rtx len = ops[3]; + int idx = 2; + rtx mask = ops[idx++]; + /* A masked load has a merge/else operand. */ + if (is_load) + get_else_operand (ops[idx++]); + rtx len = ops[idx]; machine_mode mode = GET_MODE (ops[0]); if (is_vlmax_len_p (mode, len)) @@ -3879,13 +3890,6 @@ expand_cond_len_op (unsigned icode, insn_flags op_type, rtx *ops, rtx len) emit_nonvlmax_insn (icode, insn_flags, ops, len); } -/* Return RVV_VUNDEF if the ELSE value is scratch rtx. */ -static rtx -get_else_operand (rtx op) -{ - return GET_CODE (op) == SCRATCH ? RVV_VUNDEF (GET_MODE (op)) : op; -} - /* Expand unary ops COND_LEN_*. */ void expand_cond_len_unop (unsigned icode, rtx *ops) @@ -4006,6 +4010,8 @@ expand_gather_scatter (rtx *ops, bool is_load) int shift; rtx mask = ops[5]; rtx len = ops[6]; + if (is_load) + len = ops[7]; if (is_load) { vec_reg = ops[0]; @@ -4228,6 +4234,8 @@ expand_lanes_load_store (rtx *ops, bool is_load) { rtx mask = ops[2]; rtx len = ops[3]; + if (is_load) + len = ops[4]; rtx addr = is_load ? XEXP (ops[1], 0) : XEXP (ops[0], 0); rtx reg = is_load ? ops[0] : ops[1]; machine_mode mode = GET_MODE (ops[0]); diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115336.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115336.c new file mode 100644 index 00000000000..aa2d02309be --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115336.c @@ -0,0 +1,20 @@ +/* { dg-do run } */ +/* { dg-options { -O3 -march=rv64gcv_zvl256b -mabi=lp64d } } */ +/* { dg-require-effective-target rvv_zvl256b_ok } */ + +short d[19]; +_Bool e[100][19][19]; +_Bool f[10000]; + +int main() +{ + for (long g = 0; g < 19; ++g) + d[g] = 3; + _Bool(*h)[19][19] = e; + for (short g = 0; g < 9; g++) + for (int i = 4; i < 16; i += 3) + f[i * 9 + g] = d[i] ? 
d[i] : h[g][i][2]; + for (long i = 120; i < 122; ++i) + if (f[i] != 1) + __builtin_abort (); +} diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr116059.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr116059.c new file mode 100644 index 00000000000..01650e7381d --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr116059.c @@ -0,0 +1,13 @@ +/* { dg-do run } */ +/* { dg-options "-march=rv64gcv -mabi=lp64d -O2" } */ + +char a; +_Bool b[11] = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}; +int main() { + _Bool *c = b; + for (signed d = 0; d < 11; d += 1) + a = d % 2 == 0 ? c[d] / c[d] + : c[d]; + if (a != 1) + __builtin_abort (); +}