From patchwork Sun Aug 11 21:00:35 2024
X-Patchwork-Submitter: Robin Dapp
X-Patchwork-Id: 1971380
Date: Sun, 11 Aug 2024 23:00:35 +0200
Cc: , "Richard Sandiford" , "Richard Biener"
To: "gcc-patches"
From: "Robin Dapp"
Subject: [PATCH 1/8] docs: Document maskload else operand and behavior.

This patch amends the documentation for masked loads (maskload,
vec_mask_load_lanes, and mask_gather_load as well as their len
counterparts) with an else operand.

gcc/ChangeLog:

	* doc/md.texi: Document masked load else operand.
---
 gcc/doc/md.texi | 60 +++++++++++++++++++++++++++++++------------------
 1 file changed, 38 insertions(+), 22 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 5dc0d55edd6..4047d8f58fe 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5017,8 +5017,9 @@ This pattern is not allowed to @code{FAIL}.
 @item @samp{vec_mask_load_lanes@var{m}@var{n}}
 Like @samp{vec_load_lanes@var{m}@var{n}}, but takes an additional
 mask operand (operand 2) that specifies which elements of the destination
-vectors should be loaded.  Other elements of the destination
-vectors are set to zero.  The operation is equivalent to:
+vectors should be loaded.  Operand 3 is an else operand similar to the one
+in @code{maskload}.
+The operation is equivalent to:

 @smallexample
 int c = GET_MODE_SIZE (@var{m}) / GET_MODE_SIZE (@var{n});
@@ -5028,7 +5029,7 @@ for (j = 0; j < GET_MODE_NUNITS (@var{n}); j++)
       operand0[i][j] = operand1[j * c + i];
   else
     for (i = 0; i < c; i++)
-      operand0[i][j] = 0;
+      operand0[i][j] = operand3;
 @end smallexample

 This pattern is not allowed to @code{FAIL}.

@@ -5036,16 +5037,20 @@ This pattern is not allowed to @code{FAIL}.
 @cindex @code{vec_mask_len_load_lanes@var{m}@var{n}} instruction pattern
 @item @samp{vec_mask_len_load_lanes@var{m}@var{n}}
 Like @samp{vec_load_lanes@var{m}@var{n}}, but takes an additional
-mask operand (operand 2), length operand (operand 3) as well as bias operand (operand 4)
-that specifies which elements of the destination vectors should be loaded.
-Other elements of the destination vectors are undefined.  The operation is equivalent to:
+mask operand (operand 2), length operand (operand 4) as well as bias operand
+(operand 5) that specifies which elements of the destination vectors should be
+loaded.  Operand 3 is an else operand similar to the one in @code{maskload}.
+Other elements of the destination vectors are undefined.  The operation
+is equivalent to:

 @smallexample
 int c = GET_MODE_SIZE (@var{m}) / GET_MODE_SIZE (@var{n});
-for (j = 0; j < operand3 + operand4; j++)
-  if (operand2[j])
-    for (i = 0; i < c; i++)
+for (j = 0; j < operand4 + operand5; j++)
+  for (i = 0; i < c; i++)
+    if (operand2[j])
       operand0[i][j] = operand1[j * c + i];
+    else
+      operand0[i][j] = operand3;
 @end smallexample

 This pattern is not allowed to @code{FAIL}.

@@ -5125,18 +5130,24 @@ address width.
 @cindex @code{mask_gather_load@var{m}@var{n}} instruction pattern
 @item @samp{mask_gather_load@var{m}@var{n}}
 Like @samp{gather_load@var{m}@var{n}}, but takes an extra mask operand as
-operand 5.  Bit @var{i} of the mask is set if element @var{i}
+operand 5 and an else operand 6 similar to the one in @code{maskload}.
+Bit @var{i} of the mask is set if element @var{i}
 of the result should be loaded from memory and clear if element @var{i}
-of the result should be set to zero.
+of the result should be set to operand 6.

 @cindex @code{mask_len_gather_load@var{m}@var{n}} instruction pattern
 @item @samp{mask_len_gather_load@var{m}@var{n}}
-Like @samp{gather_load@var{m}@var{n}}, but takes an extra mask operand (operand 5),
-a len operand (operand 6) as well as a bias operand (operand 7).  Similar to mask_len_load,
-the instruction loads at most (operand 6 + operand 7) elements from memory.
+Like @samp{gather_load@var{m}@var{n}}, but takes an extra mask operand
+(operand 5) and an else operand (operand 6) similar to the one in
+@code{maskload} as well as a len operand (operand 7) and a bias operand
+(operand 8).
+
+Similar to mask_len_load the instruction loads at
+most (operand 7 + operand 8) elements from memory.
 Bit @var{i} of the mask is set if element @var{i} of the result should
-be loaded from memory and clear if element @var{i} of the result should be undefined.
-Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
+be loaded from memory and clear if element @var{i} of the result should
+be set to operand 6.
+Mask elements @var{i} with @var{i} > (operand 7 + operand 8) are ignored.

 @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
 @item @samp{scatter_store@var{m}@var{n}}

@@ -5368,8 +5379,12 @@ Operands 4 and 5 have a target-dependent scalar integer mode.
 @cindex @code{maskload@var{m}@var{n}} instruction pattern
 @item @samp{maskload@var{m}@var{n}}
 Perform a masked load of vector from memory operand 1 of mode @var{m}
-into register operand 0.  Mask is provided in register operand 2 of
-mode @var{n}.
+into register operand 0.  The mask is provided in register operand 2 of
+mode @var{n}.  Operand 3 (the "else value") specifies which value is loaded
+when the mask is unset.  The predicate of operand 3 must only accept
+the else values that the target actually supports.
+Currently two values are attempted, zero and undefined.  GCC handles an
+else value of zero more efficiently than an undefined one.

 This pattern is not allowed to @code{FAIL}.

@@ -5435,15 +5450,16 @@ Operands 0 and 1 have mode @var{m}, which must be a vector mode.  Operand 3 has
 whichever integer mode the target prefers.  A mask is specified in
 operand 2 which must be of type @var{n}.  The mask has lower precedence than
 the length and is itself subject to length masking,
-i.e. only mask indices < (operand 3 + operand 4) are used.
+i.e. only mask indices < (operand 4 + operand 5) are used.
+Operand 3 is an else operand similar to the one in @code{maskload}.
 Operand 4 conceptually has mode @code{QI}.

-Operand 2 can be a variable or a constant amount.  Operand 4 specifies a
+Operand 2 can be a variable or a constant amount.  Operand 5 specifies a
 constant bias: it is either a constant 0 or a constant -1.  The predicate on
-operand 4 must only accept the bias values that the target actually supports.
+operand 5 must only accept the bias values that the target actually supports.
 GCC handles a bias of 0 more efficiently than a bias of -1.

-If (operand 2 + operand 4) exceeds the number of elements in mode
+If (operand 2 + operand 4 + operand 5) exceeds the number of elements in mode
 @var{m}, the behavior is undefined.

 If the target prefers the length to be measured in bytes
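[Reader's sketch, not part of the patch: the else-operand semantics that the @code{maskload} documentation above describes can be pictured in plain scalar C.  The function name, element type and signature below are made up for illustration; an else value of zero corresponds to the case the text says GCC handles most efficiently.]

```c
/* Scalar sketch of a masked load with an else operand (illustration only,
   not GCC code).  Element i is read from SRC when MASK[i] is set and
   receives the else value ELS otherwise.  */

#include <stddef.h>

void
maskload_sketch (int *dst, const int *src, const _Bool *mask, int els,
                 size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = mask[i] ? src[i] : els;  /* els == 0 models the "zero" case.  */
}
```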
From patchwork Sun Aug 11 21:00:45 2024
X-Patchwork-Submitter: Robin Dapp
X-Patchwork-Id: 1971381
Date: Sun, 11 Aug 2024 23:00:45 +0200
Subject: [PATCH 2/8] ifn: Add else-operand handling.
Cc: , "Richard Sandiford" , "Richard Biener" To: "gcc-patches" From: "Robin Dapp" X-Spam-Status: No, score=-9.0 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org This patch adds else-operand handling to the internal functions. gcc/ChangeLog: * internal-fn.cc (add_mask_and_len_args): Rename... (add_mask_else_and_len_args): ...to this and add else handling. (expand_partial_load_optab_fn): Use adjusted function. (expand_partial_store_optab_fn): Ditto. (expand_scatter_store_optab_fn): Ditto. (expand_gather_load_optab_fn): Ditto. (internal_fn_len_index): Adjust for masked loads. (internal_fn_else_index): Add masked loads. --- gcc/internal-fn.cc | 69 ++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 58 insertions(+), 11 deletions(-) diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc index 8a2e07f2f96..586978e8f3f 100644 --- a/gcc/internal-fn.cc +++ b/gcc/internal-fn.cc @@ -331,17 +331,18 @@ get_multi_vector_move (tree array_type, convert_optab optab) return convert_optab_handler (optab, imode, vmode); } -/* Add mask and len arguments according to the STMT. */ +/* Add mask, else, and len arguments according to the STMT. */ static unsigned int -add_mask_and_len_args (expand_operand *ops, unsigned int opno, gcall *stmt) +add_mask_else_and_len_args (expand_operand *ops, unsigned int opno, gcall *stmt) { internal_fn ifn = gimple_call_internal_fn (stmt); int len_index = internal_fn_len_index (ifn); /* BIAS is always consecutive next of LEN. */ int bias_index = len_index + 1; int mask_index = internal_fn_mask_index (ifn); - /* The order of arguments are always {len,bias,mask}. */ + + /* The order of arguments is always {mask, else, len, bias}. 
*/ if (mask_index >= 0) { tree mask = gimple_call_arg (stmt, mask_index); @@ -362,6 +363,23 @@ add_mask_and_len_args (expand_operand *ops, unsigned int opno, gcall *stmt) create_input_operand (&ops[opno++], mask_rtx, TYPE_MODE (TREE_TYPE (mask))); + + } + + int els_index = internal_fn_else_index (ifn); + if (els_index >= 0) + { + tree els = gimple_call_arg (stmt, els_index); + tree els_type = TREE_TYPE (els); + if (TREE_CODE (els) == SSA_NAME + && SSA_NAME_IS_DEFAULT_DEF (els) + && VAR_P (SSA_NAME_VAR (els))) + create_undefined_input_operand (&ops[opno++], TYPE_MODE (els_type)); + else + { + rtx els_rtx = expand_normal (els); + create_input_operand (&ops[opno++], els_rtx, TYPE_MODE (els_type)); + } } if (len_index >= 0) { @@ -3014,7 +3032,7 @@ static void expand_partial_load_optab_fn (internal_fn ifn, gcall *stmt, convert_optab optab) { int i = 0; - class expand_operand ops[5]; + class expand_operand ops[6]; tree type, lhs, rhs, maskt; rtx mem, target; insn_code icode; @@ -3044,7 +3062,7 @@ expand_partial_load_optab_fn (internal_fn ifn, gcall *stmt, convert_optab optab) target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE); create_call_lhs_operand (&ops[i++], target, TYPE_MODE (type)); create_fixed_operand (&ops[i++], mem); - i = add_mask_and_len_args (ops, i, stmt); + i = add_mask_else_and_len_args (ops, i, stmt); expand_insn (icode, i, ops); assign_call_lhs (lhs, target, &ops[0]); @@ -3090,7 +3108,7 @@ expand_partial_store_optab_fn (internal_fn ifn, gcall *stmt, convert_optab optab reg = expand_normal (rhs); create_fixed_operand (&ops[i++], mem); create_input_operand (&ops[i++], reg, TYPE_MODE (type)); - i = add_mask_and_len_args (ops, i, stmt); + i = add_mask_else_and_len_args (ops, i, stmt); expand_insn (icode, i, ops); } @@ -3676,7 +3694,7 @@ expand_scatter_store_optab_fn (internal_fn, gcall *stmt, direct_optab optab) create_integer_operand (&ops[i++], TYPE_UNSIGNED (TREE_TYPE (offset))); create_integer_operand (&ops[i++], scale_int); create_input_operand (&ops[i++], rhs_rtx, TYPE_MODE (TREE_TYPE (rhs))); - i = add_mask_and_len_args (ops, i, stmt); + i = add_mask_else_and_len_args (ops, i, stmt); insn_code icode = convert_optab_handler (optab, TYPE_MODE (TREE_TYPE (rhs)), TYPE_MODE (TREE_TYPE (offset))); @@ -3705,7 +3723,7 @@ expand_gather_load_optab_fn (internal_fn, gcall *stmt, direct_optab optab) create_input_operand (&ops[i++], offset_rtx, TYPE_MODE (TREE_TYPE (offset))); create_integer_operand (&ops[i++], TYPE_UNSIGNED (TREE_TYPE (offset))); create_integer_operand (&ops[i++], scale_int); - i = add_mask_and_len_args (ops, i, stmt); + i = add_mask_else_and_len_args (ops, i, stmt); insn_code icode = convert_optab_handler (optab, TYPE_MODE (TREE_TYPE (lhs)), TYPE_MODE (TREE_TYPE (offset))); expand_insn (icode, i, ops); @@ -4590,6 +4608,18 @@ get_len_internal_fn (internal_fn fn) case IFN_COND_##NAME: \ return IFN_COND_LEN_##NAME; #include "internal-fn.def" + default: + break; + } + + switch (fn) + { + case IFN_MASK_LOAD: + return IFN_MASK_LEN_LOAD; + case IFN_MASK_LOAD_LANES: + return IFN_MASK_LEN_LOAD_LANES; + case IFN_MASK_GATHER_LOAD: + return IFN_MASK_LEN_GATHER_LOAD; default: return IFN_LAST; } @@ -4775,8 +4805,12 @@ internal_fn_len_index (internal_fn fn) case IFN_LEN_STORE: return 2; - case IFN_MASK_LEN_GATHER_LOAD: case IFN_MASK_LEN_SCATTER_STORE: + return 5; + + case IFN_MASK_LEN_GATHER_LOAD: + return 6; + case IFN_COND_LEN_FMA: case IFN_COND_LEN_FMS: case IFN_COND_LEN_FNMA: @@ -4801,13 +4835,15 @@ internal_fn_len_index (internal_fn fn) return 4; case IFN_COND_LEN_NEG: 
- case IFN_MASK_LEN_LOAD: case IFN_MASK_LEN_STORE: - case IFN_MASK_LEN_LOAD_LANES: case IFN_MASK_LEN_STORE_LANES: case IFN_VCOND_MASK_LEN: return 3; + case IFN_MASK_LEN_LOAD: + case IFN_MASK_LEN_LOAD_LANES: + return 4; + default: return -1; } @@ -4857,6 +4893,12 @@ internal_fn_else_index (internal_fn fn) case IFN_COND_LEN_SHR: return 3; + case IFN_MASK_LOAD: + case IFN_MASK_LEN_LOAD: + case IFN_MASK_LOAD_LANES: + case IFN_MASK_LEN_LOAD_LANES: + return 3; + case IFN_COND_FMA: case IFN_COND_FMS: case IFN_COND_FNMA: @@ -4867,6 +4909,10 @@ internal_fn_else_index (internal_fn fn) case IFN_COND_LEN_FNMS: return 4; + case IFN_MASK_GATHER_LOAD: + case IFN_MASK_LEN_GATHER_LOAD: + return 5; + default: return -1; } @@ -4898,6 +4944,7 @@ internal_fn_mask_index (internal_fn fn) case IFN_MASK_LEN_SCATTER_STORE: return 4; + case IFN_VCOND_MASK: case IFN_VCOND_MASK_LEN: return 0; From patchwork Sun Aug 11 21:00:48 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Robin Dapp X-Patchwork-Id: 1971384 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256 header.s=20230601 header.b=ivhIDmdK; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4Whqpc33zWz1yYl for ; Mon, 12 Aug 2024 07:02:16 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 4560F385DDCF for ; Sun, 11 Aug 2024 21:02:14 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-lf1-x12b.google.com (mail-lf1-x12b.google.com [IPv6:2a00:1450:4864:20::12b]) by sourceware.org (Postfix) with ESMTPS id BE5AF3858D39 for ; Sun, 11 Aug 2024 21:00:51 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org BE5AF3858D39 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org BE5AF3858D39 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::12b ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1723410054; cv=none; b=PjEQrhT/AyyH8FzE3s3Nrk++IIuYrtvsatjq8XKjTTKofcR5ShqwUCpgXs9Xr6CLi0tNyShfPvUvD2VWW0/bA2LemVDmye6mSAGwMTshwWZQFlBbThjgd0prc5BfuInjZSnU8l0XN8VEbTvMX4/E7B/eaZxY579VeqmdHAAbUZI= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1723410054; c=relaxed/simple; bh=2OQX0oeM8HBVbUKX7p5hdFsivlG0iV2tUIV+PM4w3js=; h=DKIM-Signature:Mime-Version:Date:Message-Id:From:Subject:To; b=QyJ1K868DsnbZi7OIMjzXKtt+P0HnmweZLlhWptj1GUEDW4+RVYda47teNAjtRBR64KFBBGpNWpBdNlbDoT0ZpabFWBjyUGo54+OnbN2TzffIS3PnbP4wgNkU46a/wKrRJ2bpzjO1LRjZa0zRTmcVC7qrPdlXdeLirnLZB11eiY= ARC-Authentication-Results: i=1; server2.sourceware.org 
Date: Sun, 11 Aug 2024 23:00:48 +0200
From: "Robin Dapp"
Subject: [PATCH 3/8] tree-ifcvt: Enforce zero else value after maskload.
Cc: , "" , "Richard Biener"
To: "gcc-patches"

When predicating a load we implicitly assume that the else value is
zero.  In order to formalize this, this patch queries the target for
its supported else operand and uses that for the maskload call.
Subsequently, if the else operand is nonzero, a cond_expr enforcing
a zero else value is emitted.

gcc/ChangeLog:

	* tree-if-conv.cc (predicate_load_or_store): Enforce zero else
	value.
	(predicate_statements): Use sequence instead of statement.
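[Conceptual sketch, not part of the patch: in plain C, this is roughly what the if-converted loop looks like after the change.  All names below are invented; the real transformation emits GIMPLE calls to IFN_MASK_LOAD with the target's supported else value and, when that value is not zero, a COND_EXPR that re-imposes the zero later passes assume.]

```c
#include <stddef.h>

/* Stands in for IFN_MASK_LOAD (addr, ptr, mask, els): load when the mask
   bit is set, otherwise yield the target-provided else value.  */
static inline int
masked_load (const int *p, _Bool mask, int els)
{
  return mask ? *p : els;
}

/* Sketch of the if-converted body of
     if (mask[i]) r[i] = a[i]; else r[i] = 0;  */
void
ifcvt_sketch (int *restrict r, const int *restrict a,
              const _Bool *restrict mask, size_t n, int target_els)
{
  for (size_t i = 0; i < n; i++)
    {
      int tmp = masked_load (&a[i], mask[i], target_els);
      /* The select below models the COND_EXPR that the patch emits only
         when target_els is not known to be zero.  */
      r[i] = mask[i] ? tmp : 0;
    }
}
```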
--- gcc/tree-if-conv.cc | 78 +++++++++++++++++++++++++++++++++++---------- 1 file changed, 62 insertions(+), 16 deletions(-) diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc index 57992b6deca..54cb9ef0ef1 100644 --- a/gcc/tree-if-conv.cc +++ b/gcc/tree-if-conv.cc @@ -2452,10 +2452,12 @@ mask_exists (int size, const vec &vec) write and it needs to be predicated by MASK. Return a statement that does so. */ -static gimple * -predicate_load_or_store (gimple_stmt_iterator *gsi, gassign *stmt, tree mask) +static gimple_seq +predicate_load_or_store (gimple_stmt_iterator *gsi, gassign *stmt, tree mask, + hash_set *ssa_names) { - gcall *new_stmt; + gimple_seq stmts = NULL; + gcall *call_stmt; tree lhs = gimple_assign_lhs (stmt); tree rhs = gimple_assign_rhs1 (stmt); @@ -2471,21 +2473,59 @@ predicate_load_or_store (gimple_stmt_iterator *gsi, gassign *stmt, tree mask) ref); if (TREE_CODE (lhs) == SSA_NAME) { - new_stmt - = gimple_build_call_internal (IFN_MASK_LOAD, 3, addr, - ptr, mask); - gimple_call_set_lhs (new_stmt, lhs); - gimple_set_vuse (new_stmt, gimple_vuse (stmt)); + /* Get the preferred vector mode and its corresponding mask for the + masked load. We need this to query the target's supported else + operands. */ + machine_mode mode = TYPE_MODE (TREE_TYPE (addr)); + scalar_mode smode = as_a (mode); + + machine_mode vmode = targetm.vectorize.preferred_simd_mode (smode); + machine_mode mask_mode + = targetm.vectorize.get_mask_mode (vmode).require (); + + int elsval; + internal_fn ifn; + target_supports_mask_load_store_p (vmode, mask_mode, true, &ifn, &elsval); + tree els = vect_get_mask_load_else (elsval, TREE_TYPE (lhs)); + + call_stmt + = gimple_build_call_internal (IFN_MASK_LOAD, 4, addr, + ptr, mask, els); + + /* Build the load call and, if the else value is nonzero, + a COND_EXPR that enforces it. */ + tree loadlhs; + if (elsval == MASK_LOAD_ELSE_ZERO) + gimple_call_set_lhs (call_stmt, gimple_get_lhs (stmt)); + else + { + loadlhs = make_temp_ssa_name (TREE_TYPE (lhs), NULL, "_ifc_"); + ssa_names->add (loadlhs); + gimple_call_set_lhs (call_stmt, loadlhs); + } + gimple_set_vuse (call_stmt, gimple_vuse (stmt)); + gimple_seq_add_stmt (&stmts, call_stmt); + + if (elsval != MASK_LOAD_ELSE_ZERO) + { + tree cond_rhs + = fold_build_cond_expr (TREE_TYPE (loadlhs), mask, loadlhs, + build_zero_cst (TREE_TYPE (loadlhs))); + gassign *cond_stmt + = gimple_build_assign (gimple_get_lhs (stmt), cond_rhs); + gimple_seq_add_stmt (&stmts, cond_stmt); + } } else { - new_stmt + call_stmt = gimple_build_call_internal (IFN_MASK_STORE, 4, addr, ptr, mask, rhs); - gimple_move_vops (new_stmt, stmt); + gimple_move_vops (call_stmt, stmt); + gimple_seq_add_stmt (&stmts, call_stmt); } - gimple_call_set_nothrow (new_stmt, true); - return new_stmt; + gimple_call_set_nothrow (call_stmt, true); + return stmts; } /* STMT uses OP_LHS. 
Check whether it is equivalent to: @@ -2789,11 +2829,17 @@ predicate_statements (loop_p loop) vect_masks.safe_push (mask); } if (gimple_assign_single_p (stmt)) - new_stmt = predicate_load_or_store (&gsi, stmt, mask); - else - new_stmt = predicate_rhs_code (stmt, mask, cond, &ssa_names); + { + gimple_seq call_seq + = predicate_load_or_store (&gsi, stmt, mask, &ssa_names); - gsi_replace (&gsi, new_stmt, true); + gsi_replace_with_seq (&gsi, call_seq, true); + } + else + { + new_stmt = predicate_rhs_code (stmt, mask, cond, &ssa_names); + gsi_replace (&gsi, new_stmt, true); + } } else if (((lhs = gimple_assign_lhs (stmt)), true) && (INTEGRAL_TYPE_P (TREE_TYPE (lhs)) From patchwork Sun Aug 11 21:00:54 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Robin Dapp X-Patchwork-Id: 1971383 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256 header.s=20230601 header.b=BXA9TL1u; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4Whqnx5NdSz1yYl for ; Mon, 12 Aug 2024 07:01:41 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 96C77385C6C0 for ; Sun, 11 Aug 2024 21:01:39 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-lj1-x232.google.com (mail-lj1-x232.google.com [IPv6:2a00:1450:4864:20::232]) by sourceware.org (Postfix) with ESMTPS id 20DA03858CD9 for ; Sun, 11 Aug 2024 21:00:58 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 20DA03858CD9 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 20DA03858CD9 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::232 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1723410065; cv=none; b=ANlcQU/WSZx5OKS0Xa3RYF4az1RDCff5mQ0Mxg0zsoze6YY2KELspCcnOqW8XA6iFQdrUA9nxw22FnYJPyY3c7UIwB2Q9rFcMU902k4iAbyB+CZ3b3Z17m/cmUyBRXGZnSLU0dGsqXfdqGLrsiQKH1dKlUzm5NKhbnDCtE8wHEU= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1723410065; c=relaxed/simple; bh=50thlf9QyyruQW/6Qvp04qlvU2T90bct/iIN/rVnUQE=; h=DKIM-Signature:Mime-Version:Date:Message-Id:To:From:Subject; b=g5dXCK8fWhL0nEZr8tBwSsGN9Ugdj86lgAXH/7Dy5inJHe8H1lOGqXl7PafJijH0QvJi9teSpwRUHSKn/690p4tkw8LEzJ2jIP41Py84yT8EPXe0xFjPHDWqj6FpecT7M8u8lBQmsOcvQTK6g0OraJToBb7L9UD5oQj2DUrpXIo= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-lj1-x232.google.com with SMTP id 38308e7fff4ca-2f040733086so34777401fa.1 for ; Sun, 11 Aug 2024 14:00:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; 
Date: Sun, 11 Aug 2024 23:00:54 +0200
Cc: , "Richard Sandiford" , "Richard Biener"
To: "gcc-patches"
From: "Robin Dapp"
Subject: [PATCH 4/8] vect: Add maskload else value support.

This patch adds an else operand to vectorized masked load calls.
The current implementation adds else-value arguments to the respective
target-querying functions, which are used to supply the vectorizer with
the proper else value.

Right now, the only spot where a zero else value is actually enforced
is tree-ifcvt.  Loop masking and other instances of masked loads in
the vectorizer itself do not use vec_cond_exprs.

gcc/ChangeLog:

	* internal-fn.cc (internal_gather_scatter_fn_supported_p): Add
	else argument.
	* internal-fn.h (internal_gather_scatter_fn_supported_p): Ditto.
	(MASK_LOAD_ELSE_NONE): Define.
	(MASK_LOAD_ELSE_ZERO): Ditto.
	(MASK_LOAD_ELSE_M1): Ditto.
	(MASK_LOAD_ELSE_UNDEFINED): Ditto.
	* optabs-query.cc (supports_vec_convert_optab_p): Return icode.
(get_supported_else_val): Return supported else value for optab's operand at index. (supports_vec_gather_load_p): Add else argument. (supports_vec_scatter_store_p): Ditto. * optabs-query.h (supports_vec_gather_load_p): Ditto. (get_supported_else_val): Ditto. * optabs-tree.cc (target_supports_mask_load_store_p): Ditto. (can_vec_mask_load_store_p): Ditto. (target_supports_len_load_store_p): Ditto. (get_len_load_store_mode): Ditto. * optabs-tree.h (target_supports_mask_load_store_p): Ditto. (can_vec_mask_load_store_p): Ditto. * tree-vect-data-refs.cc (vect_lanes_optab_supported_p): Ditto. (vect_gather_scatter_fn_p): Ditto. (vect_check_gather_scatter): Ditto. (vect_load_lanes_supported): Ditto. * tree-vect-patterns.cc (vect_recog_gather_scatter_pattern): Ditto. * tree-vect-slp.cc (vect_get_operand_map): Adjust indices for else operand. (vect_slp_analyze_node_operations): Skip undefined else operand. * tree-vect-stmts.cc (exist_non_indexing_operands_for_use_p): Add else operand handling. (vect_get_vec_defs_for_operand): Handle undefined else operand. (check_load_store_for_partial_vectors): Add else argument. (vect_truncate_gather_scatter_offset): Ditto. (vect_use_strided_gather_scatters_p): Ditto. (get_group_load_store_type): Ditto. (get_load_store_type): Ditto. (vect_get_mask_load_else): Ditto. (vect_get_else_val_from_tree): Ditto. (vect_build_one_gather_load_call): Add zero else operand. (vectorizable_load): Use else operand. * tree-vectorizer.h (vect_gather_scatter_fn_p): Add else argument. (vect_load_lanes_supported): Ditto. (vect_get_mask_load_else): Ditto. (vect_get_else_val_from_tree): Ditto. --- gcc/internal-fn.cc | 19 +++- gcc/internal-fn.h | 11 +- gcc/optabs-query.cc | 83 +++++++++++--- gcc/optabs-query.h | 3 +- gcc/optabs-tree.cc | 43 +++++--- gcc/optabs-tree.h | 8 +- gcc/tree-vect-data-refs.cc | 39 +++++-- gcc/tree-vect-patterns.cc | 17 ++- gcc/tree-vect-slp.cc | 22 +++- gcc/tree-vect-stmts.cc | 218 +++++++++++++++++++++++++++++-------- gcc/tree-vectorizer.h | 11 +- 11 files changed, 367 insertions(+), 107 deletions(-) diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc index 586978e8f3f..2fc676e397c 100644 --- a/gcc/internal-fn.cc +++ b/gcc/internal-fn.cc @@ -4988,12 +4988,15 @@ internal_fn_stored_value_index (internal_fn fn) or stored. OFFSET_VECTOR_TYPE is the vector type that holds the offset from the shared base address of each loaded or stored element. SCALE is the amount by which these offsets should be multiplied - *after* they have been extended to address width. */ + *after* they have been extended to address width. + If the target supports the gather load the supported else value + will be written to the position ELSVAL points to if it is nonzero. */ bool internal_gather_scatter_fn_supported_p (internal_fn ifn, tree vector_type, tree memory_element_type, - tree offset_vector_type, int scale) + tree offset_vector_type, int scale, + int *elsval) { if (!tree_int_cst_equal (TYPE_SIZE (TREE_TYPE (vector_type)), TYPE_SIZE (memory_element_type))) @@ -5006,9 +5009,15 @@ internal_gather_scatter_fn_supported_p (internal_fn ifn, tree vector_type, TYPE_MODE (offset_vector_type)); int output_ops = internal_load_fn_p (ifn) ? 
1 : 0; bool unsigned_p = TYPE_UNSIGNED (TREE_TYPE (offset_vector_type)); - return (icode != CODE_FOR_nothing - && insn_operand_matches (icode, 2 + output_ops, GEN_INT (unsigned_p)) - && insn_operand_matches (icode, 3 + output_ops, GEN_INT (scale))); + bool ok = false; + ok = icode != CODE_FOR_nothing + && insn_operand_matches (icode, 2 + output_ops, GEN_INT (unsigned_p)) + && insn_operand_matches (icode, 3 + output_ops, GEN_INT (scale)); + + if (ok && elsval) + *elsval = get_supported_else_val (icode, 6); + + return ok; } /* Return true if the target supports IFN_CHECK_{RAW,WAR}_PTRS function IFN diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h index 2785a5a95a2..7b301732069 100644 --- a/gcc/internal-fn.h +++ b/gcc/internal-fn.h @@ -240,9 +240,18 @@ extern int internal_fn_len_index (internal_fn); extern int internal_fn_else_index (internal_fn); extern int internal_fn_stored_value_index (internal_fn); extern bool internal_gather_scatter_fn_supported_p (internal_fn, tree, - tree, tree, int); + tree, tree, int, + int * = nullptr); extern bool internal_check_ptrs_fn_supported_p (internal_fn, tree, poly_uint64, unsigned int); + +/* Integer constants representing which else value is supported for masked load + functions. */ +#define MASK_LOAD_ELSE_NONE 0 +#define MASK_LOAD_ELSE_ZERO -1 +#define MASK_LOAD_ELSE_M1 -2 +#define MASK_LOAD_ELSE_UNDEFINED -3 + #define VECT_PARTIAL_BIAS_UNSUPPORTED 127 extern signed char internal_len_load_store_bias (internal_fn ifn, diff --git a/gcc/optabs-query.cc b/gcc/optabs-query.cc index 5149de57468..93c1d7b8485 100644 --- a/gcc/optabs-query.cc +++ b/gcc/optabs-query.cc @@ -29,6 +29,9 @@ along with GCC; see the file COPYING3. If not see #include "rtl.h" #include "recog.h" #include "vec-perm-indices.h" +#include "internal-fn.h" +#include "memmodel.h" +#include "optabs.h" struct target_optabs default_target_optabs; struct target_optabs *this_fn_optabs = &default_target_optabs; @@ -665,34 +668,74 @@ lshift_cheap_p (bool speed_p) that mode, given that the second mode is always an integer vector. If MODE is VOIDmode, return true if OP supports any vector mode. */ -static bool +static enum insn_code supports_vec_convert_optab_p (optab op, machine_mode mode) { int start = mode == VOIDmode ? 0 : mode; int end = mode == VOIDmode ? MAX_MACHINE_MODE - 1 : mode; + enum insn_code icode = CODE_FOR_nothing; for (int i = start; i <= end; ++i) if (VECTOR_MODE_P ((machine_mode) i)) for (int j = MIN_MODE_VECTOR_INT; j < MAX_MODE_VECTOR_INT; ++j) - if (convert_optab_handler (op, (machine_mode) i, - (machine_mode) j) != CODE_FOR_nothing) - return true; + { + if ((icode + = convert_optab_handler (op, (machine_mode) i, + (machine_mode) j)) != CODE_FOR_nothing) + return icode; + } - return false; + return icode; } +/* Return the supported else value for the optab referred to by ICODE. The + index of the else operand must be specified in ELS_INDEX. + If no else value is supported, return MASK_LOAD_ELSE_NONE. */ +int +get_supported_else_val (enum insn_code icode, unsigned els_index) +{ + const struct insn_data_d *data = &insn_data[icode]; + machine_mode els_mode = data->operand[els_index].mode; + + /* For now we only support else values of 0, -1 and "undefined". */ + /* ??? Does a -1 constant make sense for anything but integer? 
*/ + if (GET_MODE_CLASS (els_mode) == MODE_VECTOR_INT + && insn_operand_matches (icode, els_index, CONSTM1_RTX (els_mode))) + { + return MASK_LOAD_ELSE_M1; + } + else if (insn_operand_matches (icode, els_index, gen_rtx_SCRATCH (els_mode))) + { + return MASK_LOAD_ELSE_UNDEFINED; + } + else if (insn_operand_matches (icode, els_index, CONST0_RTX (els_mode))) + { + return MASK_LOAD_ELSE_ZERO; + } + return MASK_LOAD_ELSE_NONE; +} + + /* If MODE is not VOIDmode, return true if vec_gather_load is available for that mode. If MODE is VOIDmode, return true if gather_load is available for at least one vector mode. */ bool -supports_vec_gather_load_p (machine_mode mode) +supports_vec_gather_load_p (machine_mode mode, int *elsval) { - if (!this_fn_optabs->supports_vec_gather_load[mode]) - this_fn_optabs->supports_vec_gather_load[mode] - = (supports_vec_convert_optab_p (gather_load_optab, mode) - || supports_vec_convert_optab_p (mask_gather_load_optab, mode) - || supports_vec_convert_optab_p (mask_len_gather_load_optab, mode) - ? 1 : -1); + enum insn_code icode = CODE_FOR_nothing; + if (!this_fn_optabs->supports_vec_gather_load[mode] || elsval) + { + icode = supports_vec_convert_optab_p (gather_load_optab, mode); + if (icode == CODE_FOR_nothing) + icode = supports_vec_convert_optab_p (mask_gather_load_optab, mode); + if (icode == CODE_FOR_nothing) + icode = supports_vec_convert_optab_p (mask_len_gather_load_optab, mode); + this_fn_optabs->supports_vec_gather_load[mode] + = (icode != CODE_FOR_nothing) ? 1 : -1; + } + + if (elsval && icode != CODE_FOR_nothing) + *elsval = get_supported_else_val (icode, 6); return this_fn_optabs->supports_vec_gather_load[mode] > 0; } @@ -704,12 +747,18 @@ supports_vec_gather_load_p (machine_mode mode) bool supports_vec_scatter_store_p (machine_mode mode) { + enum insn_code icode; if (!this_fn_optabs->supports_vec_scatter_store[mode]) - this_fn_optabs->supports_vec_scatter_store[mode] - = (supports_vec_convert_optab_p (scatter_store_optab, mode) - || supports_vec_convert_optab_p (mask_scatter_store_optab, mode) - || supports_vec_convert_optab_p (mask_len_scatter_store_optab, mode) - ? 1 : -1); + { + icode = supports_vec_convert_optab_p (scatter_store_optab, mode); + if (icode == CODE_FOR_nothing) + icode = supports_vec_convert_optab_p (mask_scatter_store_optab, mode); + if (icode == CODE_FOR_nothing) + icode = supports_vec_convert_optab_p (mask_len_scatter_store_optab, + mode); + this_fn_optabs->supports_vec_scatter_store[mode] + = (icode != CODE_FOR_nothing) ? 1 : -1; + } return this_fn_optabs->supports_vec_scatter_store[mode] > 0; } diff --git a/gcc/optabs-query.h b/gcc/optabs-query.h index 0cb2c21ba85..331832bfad2 100644 --- a/gcc/optabs-query.h +++ b/gcc/optabs-query.h @@ -191,9 +191,10 @@ bool can_compare_and_swap_p (machine_mode, bool); bool can_atomic_exchange_p (machine_mode, bool); bool can_atomic_load_p (machine_mode); bool lshift_cheap_p (bool); -bool supports_vec_gather_load_p (machine_mode = E_VOIDmode); +bool supports_vec_gather_load_p (machine_mode = E_VOIDmode, int * = nullptr); bool supports_vec_scatter_store_p (machine_mode = E_VOIDmode); bool can_vec_extract (machine_mode, machine_mode); +int get_supported_else_val (enum insn_code, unsigned); /* Version of find_widening_optab_handler_and_mode that operates on specific mode types. 
*/ diff --git a/gcc/optabs-tree.cc b/gcc/optabs-tree.cc index b69a5bc3676..68e1eb9167c 100644 --- a/gcc/optabs-tree.cc +++ b/gcc/optabs-tree.cc @@ -554,22 +554,30 @@ target_supports_op_p (tree type, enum tree_code code, load/store and return corresponding IFN in the last argument (IFN_MASK_{LOAD,STORE} or IFN_MASK_LEN_{LOAD,STORE}). */ -static bool +bool target_supports_mask_load_store_p (machine_mode mode, machine_mode mask_mode, - bool is_load, internal_fn *ifn) + bool is_load, internal_fn *ifn, + int *elsval) { optab op = is_load ? maskload_optab : maskstore_optab; optab len_op = is_load ? mask_len_load_optab : mask_len_store_optab; - if (convert_optab_handler (op, mode, mask_mode) != CODE_FOR_nothing) + enum insn_code icode; + if ((icode = convert_optab_handler (op, mode, mask_mode)) + != CODE_FOR_nothing) { if (ifn) *ifn = is_load ? IFN_MASK_LOAD : IFN_MASK_STORE; + if (elsval) + *elsval = get_supported_else_val (icode, 3); return true; } - else if (convert_optab_handler (len_op, mode, mask_mode) != CODE_FOR_nothing) + else if ((icode = convert_optab_handler (len_op, mode, mask_mode)) + != CODE_FOR_nothing) { if (ifn) *ifn = is_load ? IFN_MASK_LEN_LOAD : IFN_MASK_LEN_STORE; + if (elsval) + *elsval = get_supported_else_val (icode, 3); return true; } return false; @@ -584,13 +592,15 @@ bool can_vec_mask_load_store_p (machine_mode mode, machine_mode mask_mode, bool is_load, - internal_fn *ifn) + internal_fn *ifn, + int *elsval) { machine_mode vmode; /* If mode is vector mode, check it directly. */ if (VECTOR_MODE_P (mode)) - return target_supports_mask_load_store_p (mode, mask_mode, is_load, ifn); + return target_supports_mask_load_store_p (mode, mask_mode, is_load, ifn, + elsval); /* Otherwise, return true if there is some vector mode with the mask load/store supported. */ @@ -604,7 +614,8 @@ can_vec_mask_load_store_p (machine_mode mode, vmode = targetm.vectorize.preferred_simd_mode (smode); if (VECTOR_MODE_P (vmode) && targetm.vectorize.get_mask_mode (vmode).exists (&mask_mode) - && target_supports_mask_load_store_p (vmode, mask_mode, is_load, ifn)) + && target_supports_mask_load_store_p (vmode, mask_mode, is_load, ifn, + elsval)) return true; auto_vector_modes vector_modes; @@ -612,7 +623,8 @@ can_vec_mask_load_store_p (machine_mode mode, for (machine_mode base_mode : vector_modes) if (related_vector_mode (base_mode, smode).exists (&vmode) && targetm.vectorize.get_mask_mode (vmode).exists (&mask_mode) - && target_supports_mask_load_store_p (vmode, mask_mode, is_load, ifn)) + && target_supports_mask_load_store_p (vmode, mask_mode, is_load, ifn, + elsval)) return true; return false; } @@ -626,7 +638,7 @@ can_vec_mask_load_store_p (machine_mode mode, static bool target_supports_len_load_store_p (machine_mode mode, bool is_load, - internal_fn *ifn) + internal_fn *ifn, int *elsval) { optab op = is_load ? len_load_optab : len_store_optab; optab masked_op = is_load ? mask_len_load_optab : mask_len_store_optab; @@ -638,11 +650,15 @@ target_supports_len_load_store_p (machine_mode mode, bool is_load, return true; } machine_mode mask_mode; + enum insn_code icode; if (targetm.vectorize.get_mask_mode (mode).exists (&mask_mode) - && convert_optab_handler (masked_op, mode, mask_mode) != CODE_FOR_nothing) + && ((icode = convert_optab_handler (masked_op, mode, mask_mode)) + != CODE_FOR_nothing)) { if (ifn) *ifn = is_load ? 
IFN_MASK_LEN_LOAD : IFN_MASK_LEN_STORE; + if (elsval) + *elsval = get_supported_else_val (icode, 3); return true; } return false; @@ -659,19 +675,20 @@ target_supports_len_load_store_p (machine_mode mode, bool is_load, which optab is supported in the target. */ opt_machine_mode -get_len_load_store_mode (machine_mode mode, bool is_load, internal_fn *ifn) +get_len_load_store_mode (machine_mode mode, bool is_load, internal_fn *ifn, + int *elsval) { gcc_assert (VECTOR_MODE_P (mode)); /* Check if length in lanes supported for this mode directly. */ - if (target_supports_len_load_store_p (mode, is_load, ifn)) + if (target_supports_len_load_store_p (mode, is_load, ifn, elsval)) return mode; /* Check if length in bytes supported for same vector size VnQI. */ machine_mode vmode; poly_uint64 nunits = GET_MODE_SIZE (mode); if (related_vector_mode (mode, QImode, nunits).exists (&vmode) - && target_supports_len_load_store_p (vmode, is_load, ifn)) + && target_supports_len_load_store_p (vmode, is_load, ifn, elsval)) return vmode; return opt_machine_mode (); diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h index f2b49991462..117118c02fc 100644 --- a/gcc/optabs-tree.h +++ b/gcc/optabs-tree.h @@ -47,9 +47,13 @@ bool expand_vec_cond_expr_p (tree, tree, enum tree_code); void init_tree_optimization_optabs (tree); bool target_supports_op_p (tree, enum tree_code, enum optab_subtype = optab_default); +bool target_supports_mask_load_store_p (machine_mode, machine_mode, + bool, internal_fn *, int *); bool can_vec_mask_load_store_p (machine_mode, machine_mode, bool, - internal_fn * = nullptr); + internal_fn * = nullptr, + int * = nullptr); opt_machine_mode get_len_load_store_mode (machine_mode, bool, - internal_fn * = nullptr); + internal_fn * = nullptr, + int * = nullptr); #endif diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc index 39fd887a96b..17f3cbbdb6c 100644 --- a/gcc/tree-vect-data-refs.cc +++ b/gcc/tree-vect-data-refs.cc @@ -54,13 +54,15 @@ along with GCC; see the file COPYING3. If not see #include "vec-perm-indices.h" #include "internal-fn.h" #include "gimple-fold.h" +#include "optabs-query.h" /* Return true if load- or store-lanes optab OPTAB is implemented for COUNT vectors of type VECTYPE. NAME is the name of OPTAB. 
*/ static bool vect_lanes_optab_supported_p (const char *name, convert_optab optab, - tree vectype, unsigned HOST_WIDE_INT count) + tree vectype, unsigned HOST_WIDE_INT count, + int *elsval = nullptr) { machine_mode mode, array_mode; bool limit_p; @@ -80,7 +82,9 @@ vect_lanes_optab_supported_p (const char *name, convert_optab optab, } } - if (convert_optab_handler (optab, array_mode, mode) == CODE_FOR_nothing) + enum insn_code icode; + if ((icode = convert_optab_handler (optab, array_mode, mode)) + == CODE_FOR_nothing) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, @@ -94,6 +98,9 @@ vect_lanes_optab_supported_p (const char *name, convert_optab optab, "can use %s<%s><%s>\n", name, GET_MODE_NAME (array_mode), GET_MODE_NAME (mode)); + if (elsval) + *elsval = get_supported_else_val (icode, 3); + return true; } @@ -4176,7 +4183,7 @@ bool vect_gather_scatter_fn_p (vec_info *vinfo, bool read_p, bool masked_p, tree vectype, tree memory_type, tree offset_type, int scale, internal_fn *ifn_out, - tree *offset_vectype_out) + tree *offset_vectype_out, int *elsval) { unsigned int memory_bits = tree_to_uhwi (TYPE_SIZE (memory_type)); unsigned int element_bits = vector_element_bits (vectype); @@ -4214,7 +4221,8 @@ vect_gather_scatter_fn_p (vec_info *vinfo, bool read_p, bool masked_p, /* Test whether the target supports this combination. */ if (internal_gather_scatter_fn_supported_p (ifn, vectype, memory_type, - offset_vectype, scale)) + offset_vectype, scale, + elsval)) { *ifn_out = ifn; *offset_vectype_out = offset_vectype; @@ -4275,7 +4283,7 @@ vect_describe_gather_scatter_call (stmt_vec_info stmt_info, bool vect_check_gather_scatter (stmt_vec_info stmt_info, loop_vec_info loop_vinfo, - gather_scatter_info *info) + gather_scatter_info *info, int *elsval) { HOST_WIDE_INT scale = 1; poly_int64 pbitpos, pbitsize; @@ -4299,6 +4307,16 @@ vect_check_gather_scatter (stmt_vec_info stmt_info, loop_vec_info loop_vinfo, ifn = gimple_call_internal_fn (call); if (internal_gather_scatter_fn_p (ifn)) { + /* Extract the else value from a masked-load call. This is + necessary when we created a gather_scatter pattern from a + maskload. It is a bit cumbersome to basically create the + same else value three times but it's probably acceptable until + tree-ifcvt goes away. */ + if (internal_fn_mask_index (ifn) >= 0 && elsval) + { + tree els = gimple_call_arg (call, internal_fn_else_index (ifn)); + *elsval = vect_get_else_val_from_tree (els); + } vect_describe_gather_scatter_call (stmt_info, info); return true; } @@ -4308,7 +4326,8 @@ vect_check_gather_scatter (stmt_vec_info stmt_info, loop_vec_info loop_vinfo, /* True if we should aim to use internal functions rather than built-in functions. */ bool use_ifn_p = (DR_IS_READ (dr) - ? supports_vec_gather_load_p (TYPE_MODE (vectype)) + ? 
supports_vec_gather_load_p (TYPE_MODE (vectype), + elsval) : supports_vec_scatter_store_p (TYPE_MODE (vectype))); base = DR_REF (dr); @@ -6388,23 +6407,23 @@ vect_grouped_load_supported (tree vectype, bool single_element_p, internal_fn vect_load_lanes_supported (tree vectype, unsigned HOST_WIDE_INT count, - bool masked_p) + bool masked_p, int *elsval) { if (vect_lanes_optab_supported_p ("vec_mask_len_load_lanes", vec_mask_len_load_lanes_optab, vectype, - count)) + count, elsval)) return IFN_MASK_LEN_LOAD_LANES; else if (masked_p) { if (vect_lanes_optab_supported_p ("vec_mask_load_lanes", vec_mask_load_lanes_optab, vectype, - count)) + count, elsval)) return IFN_MASK_LOAD_LANES; } else { if (vect_lanes_optab_supported_p ("vec_load_lanes", vec_load_lanes_optab, - vectype, count)) + vectype, count, elsval)) return IFN_LOAD_LANES; } return IFN_LAST; diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc index 4674a16d15f..3bee280fd91 100644 --- a/gcc/tree-vect-patterns.cc +++ b/gcc/tree-vect-patterns.cc @@ -6466,7 +6466,8 @@ vect_recog_gather_scatter_pattern (vec_info *vinfo, /* Make sure that the target supports an appropriate internal function for the gather/scatter operation. */ gather_scatter_info gs_info; - if (!vect_check_gather_scatter (stmt_info, loop_vinfo, &gs_info) + int elsval; + if (!vect_check_gather_scatter (stmt_info, loop_vinfo, &gs_info, &elsval) || gs_info.ifn == IFN_LAST) return NULL; @@ -6489,20 +6490,26 @@ vect_recog_gather_scatter_pattern (vec_info *vinfo, tree offset = vect_add_conversion_to_pattern (vinfo, offset_type, gs_info.offset, stmt_info); + tree vec_els = NULL_TREE; /* Build the new pattern statement. */ tree scale = size_int (gs_info.scale); gcall *pattern_stmt; + tree load_lhs; if (DR_IS_READ (dr)) { tree zero = build_zero_cst (gs_info.element_type); if (mask != NULL) - pattern_stmt = gimple_build_call_internal (gs_info.ifn, 5, base, - offset, scale, zero, mask); + { + vec_els = vect_get_mask_load_else (elsval, TREE_TYPE (gs_vectype)); + pattern_stmt = gimple_build_call_internal (gs_info.ifn, 6, base, + offset, scale, zero, mask, + vec_els); + } else pattern_stmt = gimple_build_call_internal (gs_info.ifn, 4, base, offset, scale, zero); - tree load_lhs = vect_recog_temp_ssa_var (gs_info.element_type, NULL); - gimple_call_set_lhs (pattern_stmt, load_lhs); + load_lhs = vect_recog_temp_ssa_var (gs_info.element_type, NULL); + gimple_set_lhs (pattern_stmt, load_lhs); } else { diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc index 5f0d9e51c32..22448ec9917 100644 --- a/gcc/tree-vect-slp.cc +++ b/gcc/tree-vect-slp.cc @@ -507,13 +507,13 @@ static const int cond_expr_maps[3][5] = { }; static const int arg0_map[] = { 1, 0 }; static const int arg1_map[] = { 1, 1 }; -static const int arg2_map[] = { 1, 2 }; -static const int arg1_arg4_map[] = { 2, 1, 4 }; +static const int arg2_arg3_map[] = { 2, 2, 3 }; +static const int arg1_arg4_arg5_map[] = { 3, 1, 4, 5 }; static const int arg3_arg2_map[] = { 2, 3, 2 }; static const int op1_op0_map[] = { 2, 1, 0 }; static const int off_map[] = { 1, -3 }; static const int off_op0_map[] = { 2, -3, 0 }; -static const int off_arg2_map[] = { 2, -3, 2 }; +static const int off_arg2_arg3_map[] = { 3, -3, 2, 3 }; static const int off_arg3_arg2_map[] = { 3, -3, 3, 2 }; static const int mask_call_maps[6][7] = { { 1, 1, }, @@ -560,14 +560,14 @@ vect_get_operand_map (const gimple *stmt, bool gather_scatter_p = false, switch (gimple_call_internal_fn (call)) { case IFN_MASK_LOAD: - return gather_scatter_p ? 
off_arg2_map : arg2_map; + return gather_scatter_p ? off_arg2_arg3_map : arg2_arg3_map; case IFN_GATHER_LOAD: return arg1_map; case IFN_MASK_GATHER_LOAD: case IFN_MASK_LEN_GATHER_LOAD: - return arg1_arg4_map; + return arg1_arg4_arg5_map; case IFN_MASK_STORE: return gather_scatter_p ? off_arg3_arg2_map : arg3_arg2_map; @@ -6818,6 +6818,18 @@ vect_slp_analyze_node_operations (vec_info *vinfo, slp_tree node, tree vector_type = SLP_TREE_VECTYPE (child); if (!vector_type) { + /* Masked loads can have an undefined (default SSA definition) + else operand. We do not need to cost it. */ + vec ops = SLP_TREE_SCALAR_OPS (child); + if ((STMT_VINFO_TYPE (SLP_TREE_REPRESENTATIVE (node)) + == load_vec_info_type) + && ((ops.length () && + TREE_CODE (ops[0]) == SSA_NAME + && SSA_NAME_IS_DEFAULT_DEF (ops[0]) + && VAR_P (SSA_NAME_VAR (ops[0]))) + || SLP_TREE_DEF_TYPE (child) == vect_constant_def)) + continue; + /* For shifts with a scalar argument we don't need to cost or code-generate anything. ??? Represent this more explicitely. */ diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index 20cae83e820..9e721c72ddf 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -57,6 +57,7 @@ along with GCC; see the file COPYING3. If not see #include "regs.h" #include "attribs.h" #include "optabs-libfuncs.h" +#include "tree-dfa.h" /* For lang_hooks.types.type_for_mode. */ #include "langhooks.h" @@ -467,6 +468,10 @@ exist_non_indexing_operands_for_use_p (tree use, stmt_vec_info stmt_info) if (mask_index >= 0 && use == gimple_call_arg (call, mask_index)) return true; + int els_index = internal_fn_else_index (ifn); + if (els_index >= 0 + && use == gimple_call_arg (call, els_index)) + return true; int stored_value_index = internal_fn_stored_value_index (ifn); if (stored_value_index >= 0 && use == gimple_call_arg (call, stored_value_index)) @@ -1278,7 +1283,17 @@ vect_get_vec_defs_for_operand (vec_info *vinfo, stmt_vec_info stmt_vinfo, vector_type = get_vectype_for_scalar_type (loop_vinfo, TREE_TYPE (op)); gcc_assert (vector_type); - tree vop = vect_init_vector (vinfo, stmt_vinfo, op, vector_type, NULL); + /* A masked load can have a default SSA definition as else operand. + We should "vectorize" this instead of creating a duplicate from the + scalar default. */ + tree vop; + if (TREE_CODE (op) == SSA_NAME + && SSA_NAME_IS_DEFAULT_DEF (op) + && VAR_P (SSA_NAME_VAR (op))) + vop = get_or_create_ssa_default_def (cfun, + create_tmp_var (vector_type)); + else + vop = vect_init_vector (vinfo, stmt_vinfo, op, vector_type, NULL); while (ncopies--) vec_oprnds->quick_push (vop); } @@ -1500,7 +1515,8 @@ check_load_store_for_partial_vectors (loop_vec_info loop_vinfo, tree vectype, vect_memory_access_type memory_access_type, gather_scatter_info *gs_info, - tree scalar_mask) + tree scalar_mask, + int *elsval = nullptr) { /* Invariant loads need no special support. */ if (memory_access_type == VMAT_INVARIANT) @@ -1519,7 +1535,8 @@ check_load_store_for_partial_vectors (loop_vec_info loop_vinfo, tree vectype, if (memory_access_type == VMAT_LOAD_STORE_LANES) { internal_fn ifn - = (is_load ? vect_load_lanes_supported (vectype, group_size, true) + = (is_load ? 
vect_load_lanes_supported (vectype, group_size, true, + elsval) : vect_store_lanes_supported (vectype, group_size, true)); if (ifn == IFN_MASK_LEN_LOAD_LANES || ifn == IFN_MASK_LEN_STORE_LANES) vect_record_loop_len (loop_vinfo, lens, nvectors, vectype, 1); @@ -1549,7 +1566,8 @@ check_load_store_for_partial_vectors (loop_vec_info loop_vinfo, tree vectype, if (internal_gather_scatter_fn_supported_p (len_ifn, vectype, gs_info->memory_type, gs_info->offset_vectype, - gs_info->scale)) + gs_info->scale, + elsval)) vect_record_loop_len (loop_vinfo, lens, nvectors, vectype, 1); else if (internal_gather_scatter_fn_supported_p (ifn, vectype, gs_info->memory_type, @@ -1608,7 +1626,8 @@ check_load_store_for_partial_vectors (loop_vec_info loop_vinfo, tree vectype, machine_mode mask_mode; machine_mode vmode; bool using_partial_vectors_p = false; - if (get_len_load_store_mode (vecmode, is_load).exists (&vmode)) + if (get_len_load_store_mode + (vecmode, is_load, nullptr, elsval).exists (&vmode)) { nvectors = group_memory_nvectors (group_size * vf, nunits); unsigned factor = (vecmode == vmode) ? 1 : GET_MODE_UNIT_SIZE (vecmode); @@ -1616,7 +1635,8 @@ check_load_store_for_partial_vectors (loop_vec_info loop_vinfo, tree vectype, using_partial_vectors_p = true; } else if (targetm.vectorize.get_mask_mode (vecmode).exists (&mask_mode) - && can_vec_mask_load_store_p (vecmode, mask_mode, is_load)) + && can_vec_mask_load_store_p (vecmode, mask_mode, is_load, NULL, + elsval)) { nvectors = group_memory_nvectors (group_size * vf, nunits); vect_record_loop_mask (loop_vinfo, masks, nvectors, vectype, scalar_mask); @@ -1678,7 +1698,8 @@ prepare_vec_mask (loop_vec_info loop_vinfo, tree mask_type, tree loop_mask, static bool vect_truncate_gather_scatter_offset (stmt_vec_info stmt_info, loop_vec_info loop_vinfo, bool masked_p, - gather_scatter_info *gs_info) + gather_scatter_info *gs_info, + int *elsval) { dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_info); data_reference *dr = dr_info->dr; @@ -1735,7 +1756,8 @@ vect_truncate_gather_scatter_offset (stmt_vec_info stmt_info, tree memory_type = TREE_TYPE (DR_REF (dr)); if (!vect_gather_scatter_fn_p (loop_vinfo, DR_IS_READ (dr), masked_p, vectype, memory_type, offset_type, scale, - &gs_info->ifn, &gs_info->offset_vectype) + &gs_info->ifn, &gs_info->offset_vectype, + elsval) || gs_info->ifn == IFN_LAST) continue; @@ -1768,12 +1790,13 @@ vect_truncate_gather_scatter_offset (stmt_vec_info stmt_info, static bool vect_use_strided_gather_scatters_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo, bool masked_p, - gather_scatter_info *gs_info) + gather_scatter_info *gs_info, + int *elsval) { - if (!vect_check_gather_scatter (stmt_info, loop_vinfo, gs_info) + if (!vect_check_gather_scatter (stmt_info, loop_vinfo, gs_info, elsval) || gs_info->ifn == IFN_LAST) return vect_truncate_gather_scatter_offset (stmt_info, loop_vinfo, - masked_p, gs_info); + masked_p, gs_info, elsval); tree old_offset_type = TREE_TYPE (gs_info->offset); tree new_offset_type = TREE_TYPE (gs_info->offset_vectype); @@ -1986,7 +2009,8 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info, dr_alignment_support *alignment_support_scheme, int *misalignment, gather_scatter_info *gs_info, - internal_fn *lanes_ifn) + internal_fn *lanes_ifn, + int *elsval) { loop_vec_info loop_vinfo = dyn_cast (vinfo); class loop *loop = loop_vinfo ? LOOP_VINFO_LOOP (loop_vinfo) : NULL; @@ -2220,7 +2244,8 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info, /* Otherwise try using LOAD/STORE_LANES. 
*/ *lanes_ifn = vls_type == VLS_LOAD - ? vect_load_lanes_supported (vectype, group_size, masked_p) + ? vect_load_lanes_supported (vectype, group_size, masked_p, + elsval) : vect_store_lanes_supported (vectype, group_size, masked_p); if (*lanes_ifn != IFN_LAST) @@ -2253,7 +2278,7 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info, && single_element_p && loop_vinfo && vect_use_strided_gather_scatters_p (stmt_info, loop_vinfo, - masked_p, gs_info)) + masked_p, gs_info, elsval)) *memory_access_type = VMAT_GATHER_SCATTER; } @@ -2328,7 +2353,8 @@ get_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info, dr_alignment_support *alignment_support_scheme, int *misalignment, gather_scatter_info *gs_info, - internal_fn *lanes_ifn) + internal_fn *lanes_ifn, + int *elsval = nullptr) { loop_vec_info loop_vinfo = dyn_cast (vinfo); poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype); @@ -2337,7 +2363,8 @@ get_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info, if (STMT_VINFO_GATHER_SCATTER_P (stmt_info)) { *memory_access_type = VMAT_GATHER_SCATTER; - if (!vect_check_gather_scatter (stmt_info, loop_vinfo, gs_info)) + if (!vect_check_gather_scatter (stmt_info, loop_vinfo, gs_info, + elsval)) gcc_unreachable (); /* When using internal functions, we rely on pattern recognition to convert the type of the offset to the type that the target @@ -2391,7 +2418,8 @@ get_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info, masked_p, vls_type, memory_access_type, poffset, alignment_support_scheme, - misalignment, gs_info, lanes_ifn)) + misalignment, gs_info, lanes_ifn, + elsval)) return false; } else if (STMT_VINFO_STRIDED_P (stmt_info)) @@ -2399,7 +2427,7 @@ get_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info, gcc_assert (!slp_node); if (loop_vinfo && vect_use_strided_gather_scatters_p (stmt_info, loop_vinfo, - masked_p, gs_info)) + masked_p, gs_info, elsval)) *memory_access_type = VMAT_GATHER_SCATTER; else *memory_access_type = VMAT_ELEMENTWISE; @@ -2667,6 +2695,52 @@ vect_build_zero_merge_argument (vec_info *vinfo, return vect_init_vector (vinfo, stmt_info, merge, vectype, NULL); } +/* Return the supported else value for a masked load internal function IFN. + The vector type is given in VECTYPE and the mask type in VECTYPE2. + TYPE specifies the type of the returned else value. */ + +tree +vect_get_mask_load_else (int elsval, tree type) +{ + tree els; + if (elsval == MASK_LOAD_ELSE_UNDEFINED) + { + tree tmp = create_tmp_var (type); + /* No need to warn about anything. */ + TREE_NO_WARNING (tmp) = 1; + els = get_or_create_ssa_default_def (cfun, tmp); + } + else if (elsval == MASK_LOAD_ELSE_M1) + els = build_minus_one_cst (type); + else if (elsval == MASK_LOAD_ELSE_ZERO) + els = build_zero_cst (type); + else + __builtin_unreachable (); + + return els; +} + +/* Return the integer define a tree else operand ELS represents. + This performs the inverse of vect_get_mask_load_else. Refer to + vect_check_gather_scatter for its usage rationale. */ + +int +vect_get_else_val_from_tree (tree els) +{ + if (TREE_CODE (els) == SSA_NAME + && SSA_NAME_IS_DEFAULT_DEF (els)) + return MASK_LOAD_ELSE_UNDEFINED; + else + { + if (zerop (els)) + return MASK_LOAD_ELSE_ZERO; + else if (integer_minus_onep (els)) + return MASK_LOAD_ELSE_M1; + else + return MASK_LOAD_ELSE_NONE; + } +} + /* Build a gather load call while vectorizing STMT_INFO. Insert new instructions before GSI and add them to VEC_STMT. GS_INFO describes the gather load operation. 
If the load is conditional, MASK is the @@ -2748,8 +2822,20 @@ vect_build_one_gather_load_call (vec_info *vinfo, stmt_vec_info stmt_info, } tree scale = build_int_cst (scaletype, gs_info->scale); - gimple *new_stmt = gimple_build_call (gs_info->decl, 5, src_op, ptr, op, - mask_op, scale); + gimple *new_stmt; + + /* ??? Rather than trying to querying a builtin's predicates + in a cumbersome way go with a zero else value. + As this vectorizer path is x86 only and x86 gather loads + always zero-fill masked elements a hard-coded zero else value + seems reasonable. */ + tree vec_els = build_zero_cst (vectype); + if (!mask) + new_stmt = gimple_build_call (gs_info->decl, 5, src_op, ptr, op, + mask_op, scale); + else + new_stmt = gimple_build_call (gs_info->decl, 6, src_op, ptr, op, + mask_op, vec_els, scale); if (!useless_type_conversion_p (vectype, rettype)) { @@ -9832,6 +9918,7 @@ vectorizable_load (vec_info *vinfo, gather_scatter_info gs_info; tree ref_type; enum vect_def_type mask_dt = vect_unknown_def_type; + enum vect_def_type els_dt = vect_unknown_def_type; if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo) return false; @@ -9844,8 +9931,12 @@ vectorizable_load (vec_info *vinfo, return false; tree mask = NULL_TREE, mask_vectype = NULL_TREE; + tree els = NULL_TREE; tree els_vectype = NULL_TREE; + int mask_index = -1; + int els_index = -1; slp_tree slp_op = NULL; + slp_tree els_op = NULL; if (gassign *assign = dyn_cast (stmt_info->stmt)) { scalar_dest = gimple_assign_lhs (assign); @@ -9885,6 +9976,15 @@ vectorizable_load (vec_info *vinfo, && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index, &mask, &slp_op, &mask_dt, &mask_vectype)) return false; + + els_index = internal_fn_else_index (ifn); + if (els_index >= 0 && slp_node) + els_index = vect_slp_child_index_for_operand + (call, els_index, STMT_VINFO_GATHER_SCATTER_P (stmt_info)); + if (els_index >= 0 + && !vect_is_simple_use (vinfo, stmt_info, slp_node, els_index, + &els, &els_op, &els_dt, &els_vectype)) + return false; } tree vectype = STMT_VINFO_VECTYPE (stmt_info); @@ -10027,10 +10127,11 @@ vectorizable_load (vec_info *vinfo, int misalignment; poly_int64 poffset; internal_fn lanes_ifn; + int elsval; if (!get_load_store_type (vinfo, stmt_info, vectype, slp_node, mask, VLS_LOAD, ncopies, &memory_access_type, &poffset, &alignment_support_scheme, &misalignment, &gs_info, - &lanes_ifn)) + &lanes_ifn, &elsval)) return false; if (mask) @@ -10040,7 +10141,8 @@ vectorizable_load (vec_info *vinfo, machine_mode vec_mode = TYPE_MODE (vectype); if (!VECTOR_MODE_P (vec_mode) || !can_vec_mask_load_store_p (vec_mode, - TYPE_MODE (mask_vectype), true)) + TYPE_MODE (mask_vectype), + true, NULL, &elsval)) return false; } else if (memory_access_type != VMAT_LOAD_STORE_LANES @@ -10771,6 +10873,7 @@ vectorizable_load (vec_info *vinfo, } tree vec_mask = NULL_TREE; + tree vec_els = NULL_TREE; if (memory_access_type == VMAT_LOAD_STORE_LANES) { gcc_assert (alignment_support_scheme == dr_aligned @@ -10860,6 +10963,9 @@ vectorizable_load (vec_info *vinfo, } } + if (loop_masks || final_mask) + vec_els = vect_get_mask_load_else (elsval, vectype); + gcall *call; if (final_len && final_mask) { @@ -10868,9 +10974,10 @@ vectorizable_load (vec_info *vinfo, VEC_MASK, LEN, BIAS). 
*/ unsigned int align = TYPE_ALIGN (TREE_TYPE (vectype)); tree alias_ptr = build_int_cst (ref_type, align); - call = gimple_build_call_internal (IFN_MASK_LEN_LOAD_LANES, 5, + call = gimple_build_call_internal (IFN_MASK_LEN_LOAD_LANES, 6, dataref_ptr, alias_ptr, - final_mask, final_len, bias); + final_mask, vec_els, + final_len, bias); } else if (final_mask) { @@ -10879,9 +10986,9 @@ vectorizable_load (vec_info *vinfo, VEC_MASK). */ unsigned int align = TYPE_ALIGN (TREE_TYPE (vectype)); tree alias_ptr = build_int_cst (ref_type, align); - call = gimple_build_call_internal (IFN_MASK_LOAD_LANES, 3, + call = gimple_build_call_internal (IFN_MASK_LOAD_LANES, 4, dataref_ptr, alias_ptr, - final_mask); + final_mask, vec_els); } else { @@ -11023,17 +11130,27 @@ vectorizable_load (vec_info *vinfo, } } + if (final_mask) + vec_els = vect_get_mask_load_else (elsval, vectype); + gcall *call; if (final_len && final_mask) - call - = gimple_build_call_internal (IFN_MASK_LEN_GATHER_LOAD, 7, - dataref_ptr, vec_offset, - scale, zero, final_mask, - final_len, bias); + { + call + = gimple_build_call_internal (IFN_MASK_LEN_GATHER_LOAD, + 8, dataref_ptr, + vec_offset, scale, zero, + final_mask, vec_els, + final_len, bias); + } else if (final_mask) - call = gimple_build_call_internal (IFN_MASK_GATHER_LOAD, 5, - dataref_ptr, vec_offset, - scale, zero, final_mask); + { + call = gimple_build_call_internal (IFN_MASK_GATHER_LOAD, + 6, dataref_ptr, + vec_offset, scale, + zero, final_mask, + vec_els); + } else call = gimple_build_call_internal (IFN_GATHER_LOAD, 4, dataref_ptr, vec_offset, @@ -11347,6 +11464,7 @@ vectorizable_load (vec_info *vinfo, tree final_mask = NULL_TREE; tree final_len = NULL_TREE; tree bias = NULL_TREE; + if (!costing_p) { if (mask) @@ -11399,7 +11517,8 @@ vectorizable_load (vec_info *vinfo, if (loop_lens) { opt_machine_mode new_ovmode - = get_len_load_store_mode (vmode, true, &partial_ifn); + = get_len_load_store_mode (vmode, true, &partial_ifn, + &elsval); new_vmode = new_ovmode.require (); unsigned factor = (new_ovmode == vmode) ? 
1 : GET_MODE_UNIT_SIZE (vmode); @@ -11411,7 +11530,7 @@ vectorizable_load (vec_info *vinfo, { if (!can_vec_mask_load_store_p ( vmode, TYPE_MODE (TREE_TYPE (final_mask)), true, - &partial_ifn)) + &partial_ifn, &elsval)) gcc_unreachable (); } @@ -11439,19 +11558,27 @@ vectorizable_load (vec_info *vinfo, bias = build_int_cst (intQI_type_node, biasval); } + tree vec_els; + if (final_len || final_mask) + vec_els = vect_get_mask_load_else (elsval, vectype); + if (final_len) { tree ptr = build_int_cst (ref_type, align * BITS_PER_UNIT); gcall *call; if (partial_ifn == IFN_MASK_LEN_LOAD) - call = gimple_build_call_internal (IFN_MASK_LEN_LOAD, 5, - dataref_ptr, ptr, - final_mask, final_len, - bias); + { + call = gimple_build_call_internal (IFN_MASK_LEN_LOAD, + 6, dataref_ptr, ptr, + final_mask, vec_els, + final_len, bias); + } else - call = gimple_build_call_internal (IFN_LEN_LOAD, 4, - dataref_ptr, ptr, - final_len, bias); + { + call = gimple_build_call_internal (IFN_LEN_LOAD, 4, + dataref_ptr, ptr, + final_len, bias); + } gimple_call_set_nothrow (call, true); new_stmt = call; data_ref = NULL_TREE; @@ -11474,9 +11601,10 @@ vectorizable_load (vec_info *vinfo, else if (final_mask) { tree ptr = build_int_cst (ref_type, align * BITS_PER_UNIT); - gcall *call = gimple_build_call_internal (IFN_MASK_LOAD, 3, + gcall *call = gimple_build_call_internal (IFN_MASK_LOAD, 4, dataref_ptr, ptr, - final_mask); + final_mask, + vec_els); gimple_call_set_nothrow (call, true); new_stmt = call; data_ref = NULL_TREE; diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index df6c8ada2f7..e14b3f278b4 100644 --- a/gcc/tree-vectorizer.h +++ b/gcc/tree-vectorizer.h @@ -2399,9 +2399,11 @@ extern bool vect_slp_analyze_instance_alignment (vec_info *, slp_instance); extern opt_result vect_analyze_data_ref_accesses (vec_info *, vec *); extern opt_result vect_prune_runtime_alias_test_list (loop_vec_info); extern bool vect_gather_scatter_fn_p (vec_info *, bool, bool, tree, tree, - tree, int, internal_fn *, tree *); + tree, int, internal_fn *, tree *, + int * = nullptr); extern bool vect_check_gather_scatter (stmt_vec_info, loop_vec_info, - gather_scatter_info *); + gather_scatter_info *, + int * = nullptr); extern opt_result vect_find_stmt_data_reference (loop_p, gimple *, vec *, vec *, int); @@ -2419,7 +2421,8 @@ extern tree vect_create_destination_var (tree, tree); extern bool vect_grouped_store_supported (tree, unsigned HOST_WIDE_INT); extern internal_fn vect_store_lanes_supported (tree, unsigned HOST_WIDE_INT, bool); extern bool vect_grouped_load_supported (tree, bool, unsigned HOST_WIDE_INT); -extern internal_fn vect_load_lanes_supported (tree, unsigned HOST_WIDE_INT, bool); +extern internal_fn vect_load_lanes_supported (tree, unsigned HOST_WIDE_INT, + bool, int * = nullptr); extern void vect_permute_store_chain (vec_info *, vec &, unsigned int, stmt_vec_info, gimple_stmt_iterator *, vec *); @@ -2560,6 +2563,8 @@ extern int vect_slp_child_index_for_operand (const gimple *, int op, bool); extern tree prepare_vec_mask (loop_vec_info, tree, tree, tree, gimple_stmt_iterator *); +extern tree vect_get_mask_load_else (int, tree); +extern int vect_get_else_val_from_tree (tree els); /* In tree-vect-patterns.cc. 
*/ extern void

From patchwork Sun Aug 11 21:00:56 2024
X-Patchwork-Submitter: Robin Dapp
X-Patchwork-Id: 1971382
Date: Sun, 11 Aug 2024 23:00:56 +0200
Subject: [PATCH 5/8] aarch64: Add masked-load else operands.
From: "Robin Dapp"
To: "gcc-patches"
Cc: , "Richard Sandiford" , "Richard Biener"

This adds zero else operands to masked loads and their intrinsics.

I needed to adjust more than initially thought because we rely on combine
for several instructions and a change in a "base" pattern needs to
propagate to all those.

For the lack of a better idea I used a function call property to specify
whether a builtin needs an else operand or not.  Somebody with better
knowledge of the aarch64 target can surely improve that.
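Not part of the patch, just for context on why zero is the natural else value
for the intrinsics: the ACLE already specifies that inactive elements of an
svld1 result read as zero, which is exactly what the expanders in this patch
push as the else operand.  A minimal stand-alone illustration (function name
invented for the example; build with e.g. -march=armv8-a+sve):

  #include <arm_sve.h>

  /* Lanes where pg is inactive are defined to be zero in the result,
     matching the zero else operand added below.  */
  svint32_t
  load_active_or_zero (svbool_t pg, const int32_t *ptr)
  {
    return svld1_s32 (pg, ptr);
  }
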
--- .../aarch64/aarch64-sve-builtins-base.cc | 58 ++++++++++++++----- gcc/config/aarch64/aarch64-sve-builtins.cc | 5 ++ gcc/config/aarch64/aarch64-sve-builtins.h | 1 + gcc/config/aarch64/aarch64-sve.md | 47 +++++++++++++-- gcc/config/aarch64/aarch64-sve2.md | 3 +- gcc/config/aarch64/predicates.md | 4 ++ 6 files changed, 98 insertions(+), 20 deletions(-) diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc index d55bee0b72f..131c822a2cd 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc @@ -1459,7 +1459,7 @@ public: unsigned int call_properties (const function_instance &) const override { - return CP_READ_MEMORY; + return CP_READ_MEMORY | CP_HAS_ELSE; } gimple * @@ -1474,11 +1474,12 @@ public: gimple_seq stmts = NULL; tree pred = f.convert_pred (stmts, vectype, 0); tree base = f.fold_contiguous_base (stmts, vectype); + tree els = build_zero_cst (vectype); gsi_insert_seq_before (f.gsi, stmts, GSI_SAME_STMT); tree cookie = f.load_store_cookie (TREE_TYPE (vectype)); - gcall *new_call = gimple_build_call_internal (IFN_MASK_LOAD, 3, - base, cookie, pred); + gcall *new_call = gimple_build_call_internal (IFN_MASK_LOAD, 4, + base, cookie, pred, els); gimple_call_set_lhs (new_call, f.lhs); return new_call; } @@ -1488,10 +1489,16 @@ public: { insn_code icode; if (e.vectors_per_tuple () == 1) - icode = convert_optab_handler (maskload_optab, - e.vector_mode (0), e.gp_mode (0)); + { + icode = convert_optab_handler (maskload_optab, + e.vector_mode (0), e.gp_mode (0)); + e.args.quick_push (CONST0_RTX (e.vector_mode (0))); + } else - icode = code_for_aarch64 (UNSPEC_LD1_COUNT, e.tuple_mode (0)); + { + icode = code_for_aarch64 (UNSPEC_LD1_COUNT, e.tuple_mode (0)); + e.args.quick_push (CONST0_RTX (e.tuple_mode (0))); + } return e.use_contiguous_load_insn (icode); } }; @@ -1502,12 +1509,20 @@ class svld1_extend_impl : public extending_load public: using extending_load::extending_load; + unsigned int + call_properties (const function_instance &) const override + { + return CP_READ_MEMORY | CP_HAS_ELSE; + } + rtx expand (function_expander &e) const override { insn_code icode = code_for_aarch64_load (UNSPEC_LD1_SVE, extend_rtx_code (), e.vector_mode (0), e.memory_vector_mode ()); + /* Add the else operand. */ + e.args.quick_push (CONST0_RTX (e.vector_mode (1))); return e.use_contiguous_load_insn (icode); } }; @@ -1518,7 +1533,7 @@ public: unsigned int call_properties (const function_instance &) const override { - return CP_READ_MEMORY; + return CP_READ_MEMORY | CP_HAS_ELSE; } rtx @@ -1527,6 +1542,8 @@ public: e.prepare_gather_address_operands (1); /* Put the predicate last, as required by mask_gather_load_optab. */ e.rotate_inputs_left (0, 5); + /* Add the else operand. */ + e.args.quick_push (CONST0_RTX (e.vector_mode (0))); machine_mode mem_mode = e.memory_vector_mode (); machine_mode int_mode = aarch64_sve_int_mode (mem_mode); insn_code icode = convert_optab_handler (mask_gather_load_optab, @@ -1550,6 +1567,8 @@ public: e.rotate_inputs_left (0, 5); /* Add a constant predicate for the extension rtx. */ e.args.quick_push (CONSTM1_RTX (VNx16BImode)); + /* Add the else operand. 
*/ + e.args.quick_push (CONST0_RTX (e.vector_mode (1))); insn_code icode = code_for_aarch64_gather_load (extend_rtx_code (), e.vector_mode (0), e.memory_vector_mode ()); @@ -1680,7 +1699,7 @@ public: unsigned int call_properties (const function_instance &) const override { - return CP_READ_MEMORY; + return CP_READ_MEMORY | CP_HAS_ELSE; } gimple * @@ -1692,6 +1711,7 @@ public: /* Get the predicate and base pointer. */ gimple_seq stmts = NULL; tree pred = f.convert_pred (stmts, vectype, 0); + tree els = build_zero_cst (vectype); tree base = f.fold_contiguous_base (stmts, vectype); gsi_insert_seq_before (f.gsi, stmts, GSI_SAME_STMT); @@ -1710,8 +1730,8 @@ public: /* Emit the load itself. */ tree cookie = f.load_store_cookie (TREE_TYPE (vectype)); - gcall *new_call = gimple_build_call_internal (IFN_MASK_LOAD_LANES, 3, - base, cookie, pred); + gcall *new_call = gimple_build_call_internal (IFN_MASK_LOAD_LANES, 4, + base, cookie, pred, els); gimple_call_set_lhs (new_call, lhs_array); gsi_insert_after (f.gsi, new_call, GSI_SAME_STMT); @@ -1724,6 +1744,7 @@ public: machine_mode tuple_mode = e.result_mode (); insn_code icode = convert_optab_handler (vec_mask_load_lanes_optab, tuple_mode, e.vector_mode (0)); + e.args.quick_push (CONST0_RTX (e.vector_mode (0))); return e.use_contiguous_load_insn (icode); } }; @@ -1785,16 +1806,23 @@ public: unsigned int call_properties (const function_instance &) const override { - return CP_READ_MEMORY; + return CP_READ_MEMORY | CP_HAS_ELSE; } rtx expand (function_expander &e) const override { - insn_code icode = (e.vectors_per_tuple () == 1 - ? code_for_aarch64_ldnt1 (e.vector_mode (0)) - : code_for_aarch64 (UNSPEC_LDNT1_COUNT, - e.tuple_mode (0))); + insn_code icode; + if (e.vectors_per_tuple () == 1) + { + icode = code_for_aarch64_ldnt1 (e.vector_mode (0)); + e.args.quick_push (CONST0_RTX (e.vector_mode (0))); + } + else + { + icode = code_for_aarch64 (UNSPEC_LDNT1_COUNT, e.tuple_mode (0)); + e.args.quick_push (CONST0_RTX (e.tuple_mode (0))); + } return e.use_contiguous_load_insn (icode); } }; diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc index 0a560eaedca..d5448188788 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins.cc +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc @@ -4247,6 +4247,11 @@ function_expander::use_contiguous_load_insn (insn_code icode) add_input_operand (icode, args[0]); if (GET_MODE_UNIT_BITSIZE (mem_mode) < type_suffix (0).element_bits) add_input_operand (icode, CONSTM1_RTX (VNx16BImode)); + + /* If we have an else operand, add it. */ + if (call_properties () & CP_HAS_ELSE) + add_input_operand (icode, args.last ()); + return generate_insn (icode); } diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h b/gcc/config/aarch64/aarch64-sve-builtins.h index 9ab6f202c30..f5858651c0f 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins.h +++ b/gcc/config/aarch64/aarch64-sve-builtins.h @@ -103,6 +103,7 @@ const unsigned int CP_READ_ZA = 1U << 7; const unsigned int CP_WRITE_ZA = 1U << 8; const unsigned int CP_READ_ZT0 = 1U << 9; const unsigned int CP_WRITE_ZT0 = 1U << 10; +const unsigned int CP_HAS_ELSE = 1U << 11; /* Enumerates the SVE predicate and (data) vector types, together called "vector types" for brevity. 
*/ diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md index c3ed5075c4e..c61428f5be6 100644 --- a/gcc/config/aarch64/aarch64-sve.md +++ b/gcc/config/aarch64/aarch64-sve.md @@ -1291,7 +1291,8 @@ (define_insn "maskload" [(set (match_operand:SVE_ALL 0 "register_operand" "=w") (unspec:SVE_ALL [(match_operand: 2 "register_operand" "Upl") - (match_operand:SVE_ALL 1 "memory_operand" "m")] + (match_operand:SVE_ALL 1 "memory_operand" "m") + (match_operand:SVE_ALL 3 "aarch64_maskload_else_operand")] UNSPEC_LD1_SVE))] "TARGET_SVE" "ld1\t%0., %2/z, %1" @@ -1302,11 +1303,14 @@ (define_expand "vec_load_lanes" [(set (match_operand:SVE_STRUCT 0 "register_operand") (unspec:SVE_STRUCT [(match_dup 2) - (match_operand:SVE_STRUCT 1 "memory_operand")] + (match_operand:SVE_STRUCT 1 "memory_operand") + (match_dup 3) + ] UNSPEC_LDN))] "TARGET_SVE" { operands[2] = aarch64_ptrue_reg (mode); + operands[3] = CONST0_RTX (mode); } ) @@ -1315,7 +1319,8 @@ (define_insn "vec_mask_load_lanes" [(set (match_operand:SVE_STRUCT 0 "register_operand" "=w") (unspec:SVE_STRUCT [(match_operand: 2 "register_operand" "Upl") - (match_operand:SVE_STRUCT 1 "memory_operand" "m")] + (match_operand:SVE_STRUCT 1 "memory_operand" "m") + (match_operand 3 "aarch64_maskload_else_operand")] UNSPEC_LDN))] "TARGET_SVE" "ld\t%0, %2/z, %1" @@ -1335,6 +1340,27 @@ (define_insn "vec_mask_load_lanes" ;; Predicated load and extend, with 8 elements per 128-bit block. (define_insn_and_rewrite "@aarch64_load_" + [(set (match_operand:SVE_HSDI 0 "register_operand" "=w") + (unspec:SVE_HSDI + [(match_operand: 3 "general_operand" "UplDnm") + (ANY_EXTEND:SVE_HSDI + (unspec:SVE_PARTIAL_I + [(match_operand: 2 "register_operand" "Upl") + (match_operand:SVE_PARTIAL_I 1 "memory_operand" "m") + (match_operand:SVE_PARTIAL_I 4 "aarch64_maskload_else_operand")] + SVE_PRED_LOAD))] + UNSPEC_PRED_X))] + "TARGET_SVE && (~ & ) == 0" + "ld1\t%0., %2/z, %1" + "&& !CONSTANT_P (operands[3])" + { + operands[3] = CONSTM1_RTX (mode); + } +) + +;; Same as above without the maskload_else_operand to still allow combine to +;; match a sign-extended pred_mov pattern. 
+(define_insn_and_rewrite "*aarch64_load__mov" [(set (match_operand:SVE_HSDI 0 "register_operand" "=w") (unspec:SVE_HSDI [(match_operand: 3 "general_operand" "UplDnm") @@ -1433,7 +1459,8 @@ (define_insn "@aarch64_ldnt1" [(set (match_operand:SVE_FULL 0 "register_operand" "=w") (unspec:SVE_FULL [(match_operand: 2 "register_operand" "Upl") - (match_operand:SVE_FULL 1 "memory_operand" "m")] + (match_operand:SVE_FULL 1 "memory_operand" "m") + (match_operand:SVE_FULL 3 "aarch64_maskload_else_operand")] UNSPEC_LDNT1_SVE))] "TARGET_SVE" "ldnt1\t%0., %2/z, %1" @@ -1456,11 +1483,13 @@ (define_expand "gather_load" (match_operand: 2 "register_operand") (match_operand:DI 3 "const_int_operand") (match_operand:DI 4 "aarch64_gather_scale_operand_") + (match_dup 6) (mem:BLK (scratch))] UNSPEC_LD1_GATHER))] "TARGET_SVE && TARGET_NON_STREAMING" { operands[5] = aarch64_ptrue_reg (mode); + operands[6] = CONST0_RTX (mode); } ) @@ -1474,6 +1503,7 @@ (define_insn "mask_gather_load" (match_operand:VNx4SI 2 "register_operand") (match_operand:DI 3 "const_int_operand") (match_operand:DI 4 "aarch64_gather_scale_operand_") + (match_operand:SVE_4 6 "aarch64_maskload_else_operand") (mem:BLK (scratch))] UNSPEC_LD1_GATHER))] "TARGET_SVE && TARGET_NON_STREAMING" @@ -1503,6 +1533,7 @@ (define_insn "mask_gather_load" (match_operand:VNx2DI 2 "register_operand") (match_operand:DI 3 "const_int_operand") (match_operand:DI 4 "aarch64_gather_scale_operand_") + (match_operand:SVE_2 6 "aarch64_maskload_else_operand") (mem:BLK (scratch))] UNSPEC_LD1_GATHER))] "TARGET_SVE && TARGET_NON_STREAMING" @@ -1531,6 +1562,7 @@ (define_insn_and_rewrite "*mask_gather_load_xtw_unpac UNSPEC_PRED_X) (match_operand:DI 3 "const_int_operand") (match_operand:DI 4 "aarch64_gather_scale_operand_") + (match_operand:SVE_2 7 "aarch64_maskload_else_operand") (mem:BLK (scratch))] UNSPEC_LD1_GATHER))] "TARGET_SVE && TARGET_NON_STREAMING" @@ -1561,6 +1593,7 @@ (define_insn_and_rewrite "*mask_gather_load_sxtw" UNSPEC_PRED_X) (match_operand:DI 3 "const_int_operand") (match_operand:DI 4 "aarch64_gather_scale_operand_") + (match_operand:SVE_2 7 "aarch64_maskload_else_operand") (mem:BLK (scratch))] UNSPEC_LD1_GATHER))] "TARGET_SVE && TARGET_NON_STREAMING" @@ -1588,6 +1621,7 @@ (define_insn "*mask_gather_load_uxtw" (match_operand:VNx2DI 6 "aarch64_sve_uxtw_immediate")) (match_operand:DI 3 "const_int_operand") (match_operand:DI 4 "aarch64_gather_scale_operand_") + (match_operand:SVE_2 7 "aarch64_maskload_else_operand") (mem:BLK (scratch))] UNSPEC_LD1_GATHER))] "TARGET_SVE && TARGET_NON_STREAMING" @@ -1624,6 +1658,7 @@ (define_insn_and_rewrite "@aarch64_gather_load_ (match_operand:VNx4SI 2 "register_operand") (match_operand:DI 3 "const_int_operand") (match_operand:DI 4 "aarch64_gather_scale_operand_") + (match_operand:SVE_4BHI 7 "aarch64_maskload_else_operand") (mem:BLK (scratch))] UNSPEC_LD1_GATHER))] UNSPEC_PRED_X))] @@ -1663,6 +1698,7 @@ (define_insn_and_rewrite "@aarch64_gather_load_") + (match_operand:SVE_2BHSI 7 "aarch64_maskload_else_operand") (mem:BLK (scratch))] UNSPEC_LD1_GATHER))] UNSPEC_PRED_X))] @@ -1701,6 +1737,7 @@ (define_insn_and_rewrite "*aarch64_gather_load_") + (match_operand:SVE_2BHSI 8 "aarch64_maskload_else_operand") (mem:BLK (scratch))] UNSPEC_LD1_GATHER))] UNSPEC_PRED_X))] @@ -1738,6 +1775,7 @@ (define_insn_and_rewrite "*aarch64_gather_load_") + (match_operand:SVE_2BHSI 8 "aarch64_maskload_else_operand") (mem:BLK (scratch))] UNSPEC_LD1_GATHER))] UNSPEC_PRED_X))] @@ -1772,6 +1810,7 @@ (define_insn_and_rewrite "*aarch64_gather_load_") + 
(match_operand:SVE_2BHSI 8 "aarch64_maskload_else_operand") (mem:BLK (scratch))] UNSPEC_LD1_GATHER))] UNSPEC_PRED_X))] diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md index 972b03a4fef..30a5b041a0a 100644 --- a/gcc/config/aarch64/aarch64-sve2.md +++ b/gcc/config/aarch64/aarch64-sve2.md @@ -138,7 +138,8 @@ (define_insn "@aarch64_" [(set (match_operand:SVE_FULLx24 0 "aligned_register_operand" "=Uw") (unspec:SVE_FULLx24 [(match_operand:VNx16BI 2 "register_operand" "Uph") - (match_operand:SVE_FULLx24 1 "memory_operand" "m")] + (match_operand:SVE_FULLx24 1 "memory_operand" "m") + (match_operand:SVE_FULLx24 3 "aarch64_maskload_else_operand")] LD1_COUNT))] "TARGET_STREAMING_SME2" "\t%0, %K2/z, %1" diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md index 8f3aab2272c..744f36ff67d 100644 --- a/gcc/config/aarch64/predicates.md +++ b/gcc/config/aarch64/predicates.md @@ -1069,3 +1069,7 @@ (define_predicate "aarch64_granule16_simm9" (and (match_code "const_int") (match_test "IN_RANGE (INTVAL (op), -4096, 4080) && !(INTVAL (op) & 0xf)"))) + +(define_predicate "aarch64_maskload_else_operand" (and (match_code "const_int,const_vector") (match_test "op == CONST0_RTX (GET_MODE (op))")))

From patchwork Sun Aug 11 21:00:58 2024
X-Patchwork-Submitter: Robin Dapp
X-Patchwork-Id: 1971385
Date: Sun, 11 Aug 2024 23:00:58 +0200
From: "Robin Dapp"
Subject: [PATCH 6/8] gcn: Add else operand to masked loads.
Cc: , "Richard Sandiford" , "Richard Biener"
To: "gcc-patches"

This patch adds a zero else operand to the masked loads.

gcc/ChangeLog:

	* config/gcn/predicates.md (maskload_else_operand): New predicate.
	* config/gcn/gcn-valu.md: Use new predicate.
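Not part of the patch, only to restate the semantics the new operand encodes:
per element, a masked load with an else value behaves like the scalar model
below, and the new maskload_else_operand predicate restricts that value to
constant zero.  The function name is invented for the illustration.

  /* Scalar model of one element of a masked load with an else value.  */
  static inline int
  masked_load_element (int active, const int *p, int else_val)
  {
    return active ? *p : else_val;   /* else_val is always 0 here.  */
  }
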
--- gcc/config/gcn/gcn-valu.md | 6 ++++-- gcc/config/gcn/predicates.md | 3 +++ 2 files changed, 7 insertions(+), 2 deletions(-) diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md index b24cf9be32e..2344bc00ffc 100644 --- a/gcc/config/gcn/gcn-valu.md +++ b/gcc/config/gcn/gcn-valu.md @@ -4002,7 +4002,8 @@ (define_expand "while_ultsidi" (define_expand "maskloaddi" [(match_operand:V_MOV 0 "register_operand") (match_operand:V_MOV 1 "memory_operand") - (match_operand 2 "")] + (match_operand 2 "") + (match_operand:V_MOV 3 "maskload_else_operand")] "" { rtx exec = force_reg (DImode, operands[2]); @@ -4040,7 +4041,8 @@ (define_expand "mask_gather_load" (match_operand: 2 "register_operand") (match_operand 3 "immediate_operand") (match_operand:SI 4 "gcn_alu_operand") - (match_operand:DI 5 "")] + (match_operand:DI 5 "") + (match_operand:V_MOV 6 "maskload_else_operand")] "" { rtx exec = force_reg (DImode, operands[5]); diff --git a/gcc/config/gcn/predicates.md b/gcc/config/gcn/predicates.md index 3f59396a649..9bc806cf990 100644 --- a/gcc/config/gcn/predicates.md +++ b/gcc/config/gcn/predicates.md @@ -228,3 +228,6 @@ (define_predicate "ascending_zero_int_parallel" return gcn_stepped_zero_int_parallel_p (op, 1); }) +(define_predicate "maskload_else_operand" (and (match_code "const_int,const_vector") (match_test "op == CONST0_RTX (GET_MODE (op))")))

From patchwork Sun Aug 11 21:01:00 2024
X-Patchwork-Submitter: Robin Dapp
X-Patchwork-Id: 1971386
Date: Sun, 11 Aug 2024 23:01:00 +0200
Subject: [PATCH 7/8] i386: Add else operand to masked loads.
Cc: , "Richard Sandiford" , "Richard Biener" To: "gcc-patches" From: "Robin Dapp" X-Spam-Status: No, score=-9.1 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org This patch adds a zero else operand to masked loads, in particular the masked gather load builtins that are used for gather vectorization. gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_expand_special_args_builtin): Add else-operand handling. (ix86_expand_builtin): Ditto. * config/i386/predicates.md (vcvtne2ps2bf_parallel): New predicate. (maskload_else_operand): Ditto. * config/i386/sse.md: Use predicate. --- gcc/config/i386/i386-expand.cc | 59 +++++++++++++--- gcc/config/i386/predicates.md | 15 ++++ gcc/config/i386/sse.md | 124 ++++++++++++++++++++------------- 3 files changed, 142 insertions(+), 56 deletions(-) diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc index d9ad06264aa..b8505fe2c38 100644 --- a/gcc/config/i386/i386-expand.cc +++ b/gcc/config/i386/i386-expand.cc @@ -12462,10 +12462,11 @@ ix86_expand_special_args_builtin (const struct builtin_description *d, { tree arg; rtx pat, op; - unsigned int i, nargs, arg_adjust, memory; + unsigned int i, nargs, arg_adjust, memory = -1; unsigned int constant = 100; bool aligned_mem = false; - rtx xops[4]; + rtx xops[4] = {}; + bool add_els = false; enum insn_code icode = d->icode; const struct insn_data_d *insn_p = &insn_data[icode]; machine_mode tmode = insn_p->operand[0].mode; @@ -12592,6 +12593,9 @@ ix86_expand_special_args_builtin (const struct builtin_description *d, case V4DI_FTYPE_PCV4DI_V4DI: case V4SI_FTYPE_PCV4SI_V4SI: case V2DI_FTYPE_PCV2DI_V2DI: + /* Two actual args but an additional else operand. */ + add_els = true; + /* Fallthru. */ case VOID_FTYPE_INT_INT64: nargs = 2; klass = load; @@ -12864,6 +12868,12 @@ ix86_expand_special_args_builtin (const struct builtin_description *d, xops[i]= op; } + if (add_els) + { + xops[i] = CONST0_RTX (GET_MODE (xops[0])); + nargs++; + } + switch (nargs) { case 0: @@ -13113,10 +13123,11 @@ ix86_expand_builtin (tree exp, rtx target, rtx subtarget, size_t i; enum insn_code icode, icode2; tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0); - tree arg0, arg1, arg2, arg3, arg4; - rtx op0, op1, op2, op3, op4, pat, pat2, insn; - machine_mode mode0, mode1, mode2, mode3, mode4; + tree arg0, arg1, arg2, arg3, arg4, arg5; + rtx op0, op1, op2, op3, op4, op5, opels, pat, pat2, insn; + machine_mode mode0, mode1, mode2, mode3, mode4, mode5; unsigned int fcode = DECL_MD_FUNCTION_CODE (fndecl); + bool has_else_op; HOST_WIDE_INT bisa, bisa2; /* For CPU builtins that can be folded, fold first and expand the fold. 
*/ @@ -14919,6 +14930,7 @@ rdseed_step: arg2 = CALL_EXPR_ARG (exp, 2); arg3 = CALL_EXPR_ARG (exp, 3); arg4 = CALL_EXPR_ARG (exp, 4); + has_else_op = call_expr_nargs (exp) == 6; op0 = expand_normal (arg0); op1 = expand_normal (arg1); op2 = expand_normal (arg2); @@ -15021,10 +15033,38 @@ rdseed_step: op3 = copy_to_reg (op3); op3 = lowpart_subreg (mode3, op3, GET_MODE (op3)); } - if (!insn_data[icode].operand[5].predicate (op4, mode4)) + /* The vectorizer only adds an else operand for real masks. */ + if (has_else_op) + { + if (op4 != CONST0_RTX (GET_MODE (subtarget))) + { + error ("the else operand must be 0"); + return const0_rtx; + } + else + { + arg5 = CALL_EXPR_ARG (exp, 5); + op5 = expand_normal (arg5); + /* Note the arg order is different from the operand order. */ + mode5 = insn_data[icode].operand[5].mode; + if (!insn_data[icode].operand[5].predicate (op5, mode5)) + { + error ("the last argument must be scale 1, 2, 4, 8"); + return const0_rtx; + } + } + opels = op4; + op4 = op5; + mode4 = mode5; + } + else { - error ("the last argument must be scale 1, 2, 4, 8"); - return const0_rtx; + if (!insn_data[icode].operand[5].predicate (op4, mode4)) + { + error ("the last argument must be scale 1, 2, 4, 8"); + return const0_rtx; + } + opels = CONST0_RTX (GET_MODE (subtarget)); } /* Optimize. If mask is known to have all high bits set, @@ -15095,7 +15135,8 @@ rdseed_step: } } - pat = GEN_FCN (icode) (subtarget, op0, op1, op2, op3, op4); + pat = GEN_FCN (icode) (subtarget, op0, op1, op2, op3, op4, opels); + if (! pat) return const0_rtx; emit_insn (pat); diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md index 680594871de..aac7341aeab 100644 --- a/gcc/config/i386/predicates.md +++ b/gcc/config/i386/predicates.md @@ -2332,3 +2332,18 @@ (define_predicate "apx_ndd_add_memory_operand" return true; }) + +;; Check that each element is odd and incrementally increasing from 1 +(define_predicate "vcvtne2ps2bf_parallel" + (and (match_code "const_vector") + (match_code "const_int" "a")) +{ + for (int i = 0; i < XVECLEN (op, 0); ++i) + if (INTVAL (XVECEXP (op, 0, i)) != (2 * i + 1)) + return false; + return true; +}) + +(define_predicate "maskload_else_operand" + (and (match_code "const_int,const_vector") + (match_test "op == CONST0_RTX (GET_MODE (op))"))) diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index baaec689749..d1e64152000 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -1487,7 +1487,8 @@ (define_expand "_load_mask" } else if (MEM_P (operands[1])) operands[1] = gen_rtx_UNSPEC (mode, - gen_rtvec(1, operands[1]), + gen_rtvec(2, operands[1], + CONST0_RTX (mode)), UNSPEC_MASKLOAD); }) @@ -1495,7 +1496,8 @@ (define_insn "*_load_mask" [(set (match_operand:V48_AVX512VL 0 "register_operand" "=v") (vec_merge:V48_AVX512VL (unspec:V48_AVX512VL - [(match_operand:V48_AVX512VL 1 "memory_operand" "m")] + [(match_operand:V48_AVX512VL 1 "memory_operand" "m") + (match_operand:V48_AVX512VL 4 "maskload_else_operand")] UNSPEC_MASKLOAD) (match_operand:V48_AVX512VL 2 "nonimm_or_0_operand" "0C") (match_operand: 3 "register_operand" "Yk")))] @@ -1523,7 +1525,8 @@ (define_insn "*_load_mask" (define_insn_and_split "*_load" [(set (match_operand:V48_AVX512VL 0 "register_operand") (unspec:V48_AVX512VL - [(match_operand:V48_AVX512VL 1 "memory_operand")] + [(match_operand:V48_AVX512VL 1 "memory_operand") + (match_operand:V48_AVX512VL 2 "maskload_else_operand")] UNSPEC_MASKLOAD))] "TARGET_AVX512F" "#" @@ -1545,7 +1548,8 @@ (define_expand "_load_mask" } else if (MEM_P 
(operands[1])) operands[1] = gen_rtx_UNSPEC (mode, - gen_rtvec(1, operands[1]), + gen_rtvec(2, operands[1], + CONST0_RTX (mode)), UNSPEC_MASKLOAD); }) @@ -1554,7 +1558,8 @@ (define_insn "*_load_mask" [(set (match_operand:VI12HFBF_AVX512VL 0 "register_operand" "=v") (vec_merge:VI12HFBF_AVX512VL (unspec:VI12HFBF_AVX512VL - [(match_operand:VI12HFBF_AVX512VL 1 "memory_operand" "m")] + [(match_operand:VI12HFBF_AVX512VL 1 "memory_operand" "m") + (match_operand:VI12HFBF_AVX512VL 4 "maskload_else_operand")] UNSPEC_MASKLOAD) (match_operand:VI12HFBF_AVX512VL 2 "nonimm_or_0_operand" "0C") (match_operand: 3 "register_operand" "Yk")))] @@ -1567,7 +1572,8 @@ (define_insn "*_load_mask" (define_insn_and_split "*_load" [(set (match_operand:VI12HFBF_AVX512VL 0 "register_operand" "=v") (unspec:VI12HFBF_AVX512VL - [(match_operand:VI12HFBF_AVX512VL 1 "memory_operand" "m")] + [(match_operand:VI12HFBF_AVX512VL 1 "memory_operand" "m") + (match_operand:VI12HFBF_AVX512VL 2 "maskload_else_operand")] UNSPEC_MASKLOAD))] "TARGET_AVX512BW" "#" @@ -28440,7 +28446,8 @@ (define_insn "_maskload" [(set (match_operand:V48_128_256 0 "register_operand" "=x") (unspec:V48_128_256 [(match_operand: 2 "register_operand" "x") - (match_operand:V48_128_256 1 "memory_operand" "jm")] + (match_operand:V48_128_256 1 "memory_operand" "jm") + (match_operand:V48_128_256 3 "maskload_else_operand")] UNSPEC_MASKMOV))] "TARGET_AVX" { @@ -28481,7 +28488,8 @@ (define_expand "maskload" [(set (match_operand:V48_128_256 0 "register_operand") (unspec:V48_128_256 [(match_operand: 2 "register_operand") - (match_operand:V48_128_256 1 "memory_operand")] + (match_operand:V48_128_256 1 "memory_operand") + (match_operand:V48_128_256 3 "maskload_else_operand")] UNSPEC_MASKMOV))] "TARGET_AVX") @@ -28489,20 +28497,24 @@ (define_expand "maskload" [(set (match_operand:V48_AVX512VL 0 "register_operand") (vec_merge:V48_AVX512VL (unspec:V48_AVX512VL - [(match_operand:V48_AVX512VL 1 "memory_operand")] + [(match_operand:V48_AVX512VL 1 "memory_operand") + (match_operand:V48_AVX512VL 3 "maskload_else_operand")] UNSPEC_MASKLOAD) (match_dup 0) - (match_operand: 2 "register_operand")))] + (match_operand: 2 "register_operand"))) + ] "TARGET_AVX512F") (define_expand "maskload" [(set (match_operand:VI12HFBF_AVX512VL 0 "register_operand") (vec_merge:VI12HFBF_AVX512VL (unspec:VI12HFBF_AVX512VL - [(match_operand:VI12HFBF_AVX512VL 1 "memory_operand")] + [(match_operand:VI12HFBF_AVX512VL 1 "memory_operand") + (match_operand:VI12HFBF_AVX512VL 3 "maskload_else_operand")] UNSPEC_MASKLOAD) (match_dup 0) - (match_operand: 2 "register_operand")))] + (match_operand: 2 "register_operand"))) + ] "TARGET_AVX512BW") (define_expand "maskstore" @@ -29067,20 +29079,22 @@ (define_expand "avx2_gathersi" (unspec:VEC_GATHER_MODE [(match_operand:VEC_GATHER_MODE 1 "register_operand") (mem: - (match_par_dup 6 + (match_par_dup 7 [(match_operand 2 "vsib_address_operand") (match_operand: 3 "register_operand") - (match_operand:SI 5 "const1248_operand ")])) + (match_operand:SI 5 "const1248_operand ") + (match_operand:VEC_GATHER_MODE 6 "maskload_else_operand")])) (mem:BLK (scratch)) (match_operand:VEC_GATHER_MODE 4 "register_operand")] UNSPEC_GATHER)) - (clobber (match_scratch:VEC_GATHER_MODE 7))])] + (clobber (match_scratch:VEC_GATHER_MODE 8))])] "TARGET_AVX2" { - operands[6] - = gen_rtx_UNSPEC (Pmode, gen_rtvec (3, operands[2], operands[3], - operands[5]), UNSPEC_VSIBADDR); + operands[7] + = gen_rtx_UNSPEC (Pmode, gen_rtvec (4, operands[2], operands[3], + operands[5], operands[6]), + UNSPEC_VSIBADDR); }) 
(define_insn "*avx2_gathersi" @@ -29091,7 +29105,8 @@ (define_insn "*avx2_gathersi" [(unspec:P [(match_operand:P 3 "vsib_address_operand" "jb") (match_operand: 4 "register_operand" "x") - (match_operand:SI 6 "const1248_operand")] + (match_operand:SI 6 "const1248_operand") + (match_operand:VEC_GATHER_MODE 8 "maskload_else_operand")] UNSPEC_VSIBADDR)]) (mem:BLK (scratch)) (match_operand:VEC_GATHER_MODE 5 "register_operand" "1")] @@ -29112,7 +29127,8 @@ (define_insn "*avx2_gathersi_2" [(unspec:P [(match_operand:P 2 "vsib_address_operand" "jb") (match_operand: 3 "register_operand" "x") - (match_operand:SI 5 "const1248_operand")] + (match_operand:SI 5 "const1248_operand") + (match_operand:VEC_GATHER_MODE 7 "maskload_else_operand")] UNSPEC_VSIBADDR)]) (mem:BLK (scratch)) (match_operand:VEC_GATHER_MODE 4 "register_operand" "1")] @@ -29130,20 +29146,22 @@ (define_expand "avx2_gatherdi" (unspec:VEC_GATHER_MODE [(match_operand: 1 "register_operand") (mem: - (match_par_dup 6 + (match_par_dup 7 [(match_operand 2 "vsib_address_operand") (match_operand: 3 "register_operand") - (match_operand:SI 5 "const1248_operand ")])) + (match_operand:SI 5 "const1248_operand ") + (match_operand:VEC_GATHER_MODE 6 "maskload_else_operand")])) (mem:BLK (scratch)) (match_operand: 4 "register_operand")] UNSPEC_GATHER)) - (clobber (match_scratch:VEC_GATHER_MODE 7))])] + (clobber (match_scratch:VEC_GATHER_MODE 8))])] "TARGET_AVX2" { - operands[6] - = gen_rtx_UNSPEC (Pmode, gen_rtvec (3, operands[2], operands[3], - operands[5]), UNSPEC_VSIBADDR); + operands[7] + = gen_rtx_UNSPEC (Pmode, gen_rtvec (4, operands[2], operands[3], + operands[5], operands[6]), + UNSPEC_VSIBADDR); }) (define_insn "*avx2_gatherdi" @@ -29154,7 +29172,8 @@ (define_insn "*avx2_gatherdi" [(unspec:P [(match_operand:P 3 "vsib_address_operand" "jb") (match_operand: 4 "register_operand" "x") - (match_operand:SI 6 "const1248_operand")] + (match_operand:SI 6 "const1248_operand") + (match_operand:VEC_GATHER_MODE 8 "maskload_else_operand")] UNSPEC_VSIBADDR)]) (mem:BLK (scratch)) (match_operand: 5 "register_operand" "1")] @@ -29175,7 +29194,8 @@ (define_insn "*avx2_gatherdi_2" [(unspec:P [(match_operand:P 2 "vsib_address_operand" "jb") (match_operand: 3 "register_operand" "x") - (match_operand:SI 5 "const1248_operand")] + (match_operand:SI 5 "const1248_operand") + (match_operand:VEC_GATHER_MODE 7 "maskload_else_operand")] UNSPEC_VSIBADDR)]) (mem:BLK (scratch)) (match_operand: 4 "register_operand" "1")] @@ -29201,7 +29221,8 @@ (define_insn "*avx2_gatherdi_3" [(unspec:P [(match_operand:P 3 "vsib_address_operand" "jb") (match_operand: 4 "register_operand" "x") - (match_operand:SI 6 "const1248_operand")] + (match_operand:SI 6 "const1248_operand") + (match_operand:VI4F_256 8 "maskload_else_operand")] UNSPEC_VSIBADDR)]) (mem:BLK (scratch)) (match_operand: 5 "register_operand" "1")] @@ -29225,7 +29246,8 @@ (define_insn "*avx2_gatherdi_4" [(unspec:P [(match_operand:P 2 "vsib_address_operand" "jb") (match_operand: 3 "register_operand" "x") - (match_operand:SI 5 "const1248_operand")] + (match_operand:SI 5 "const1248_operand") + (match_operand:VI4F_256 7 "maskload_else_operand")] UNSPEC_VSIBADDR)]) (mem:BLK (scratch)) (match_operand: 4 "register_operand" "1")] @@ -29246,17 +29268,19 @@ (define_expand "_gathersi" [(match_operand:VI48F 1 "register_operand") (match_operand: 4 "register_operand") (mem: - (match_par_dup 6 + (match_par_dup 7 [(match_operand 2 "vsib_address_operand") (match_operand: 3 "register_operand") - (match_operand:SI 5 "const1248_operand")]))] + 
(match_operand:SI 5 "const1248_operand") + (match_operand:VI48F 6 "maskload_else_operand")]))] UNSPEC_GATHER)) - (clobber (match_scratch: 7))])] + (clobber (match_scratch: 8))])] "TARGET_AVX512F" { - operands[6] - = gen_rtx_UNSPEC (Pmode, gen_rtvec (3, operands[2], operands[3], - operands[5]), UNSPEC_VSIBADDR); + operands[7] + = gen_rtx_UNSPEC (Pmode, gen_rtvec (4, operands[2], operands[3], + operands[5], operands[6]), + UNSPEC_VSIBADDR); }) (define_insn "*avx512f_gathersi" @@ -29268,7 +29292,8 @@ (define_insn "*avx512f_gathersi" [(unspec:P [(match_operand:P 4 "vsib_address_operand" "Tv") (match_operand: 3 "register_operand" "v") - (match_operand:SI 5 "const1248_operand")] + (match_operand:SI 5 "const1248_operand") + (match_operand:VI48F 8 "maskload_else_operand")] UNSPEC_VSIBADDR)])] UNSPEC_GATHER)) (clobber (match_scratch: 2 "=&Yk"))] @@ -29289,7 +29314,8 @@ (define_insn "*avx512f_gathersi_2" [(unspec:P [(match_operand:P 3 "vsib_address_operand" "Tv") (match_operand: 2 "register_operand" "v") - (match_operand:SI 4 "const1248_operand")] + (match_operand:SI 4 "const1248_operand") + (match_operand:VI48F 7 "maskload_else_operand")] UNSPEC_VSIBADDR)])] UNSPEC_GATHER)) (clobber (match_scratch: 1 "=&Yk"))] @@ -29308,17 +29334,19 @@ (define_expand "_gatherdi" [(match_operand: 1 "register_operand") (match_operand:QI 4 "register_operand") (mem: - (match_par_dup 6 + (match_par_dup 7 [(match_operand 2 "vsib_address_operand") (match_operand: 3 "register_operand") - (match_operand:SI 5 "const1248_operand")]))] + (match_operand:SI 5 "const1248_operand") + (match_operand:VI48F 6 "maskload_else_operand")]))] UNSPEC_GATHER)) - (clobber (match_scratch:QI 7))])] + (clobber (match_scratch:QI 8))])] "TARGET_AVX512F" { - operands[6] - = gen_rtx_UNSPEC (Pmode, gen_rtvec (3, operands[2], operands[3], - operands[5]), UNSPEC_VSIBADDR); + operands[7] + = gen_rtx_UNSPEC (Pmode, gen_rtvec (4, operands[2], operands[3], + operands[5], operands[6]), + UNSPEC_VSIBADDR); }) (define_insn "*avx512f_gatherdi" @@ -29330,7 +29358,8 @@ (define_insn "*avx512f_gatherdi" [(unspec:P [(match_operand:P 4 "vsib_address_operand" "Tv") (match_operand: 3 "register_operand" "v") - (match_operand:SI 5 "const1248_operand")] + (match_operand:SI 5 "const1248_operand") + (match_operand:VI48F 8 "maskload_else_operand")] UNSPEC_VSIBADDR)])] UNSPEC_GATHER)) (clobber (match_scratch:QI 2 "=&Yk"))] @@ -29351,7 +29380,8 @@ (define_insn "*avx512f_gatherdi_2" [(unspec:P [(match_operand:P 3 "vsib_address_operand" "Tv") (match_operand: 2 "register_operand" "v") - (match_operand:SI 4 "const1248_operand")] + (match_operand:SI 4 "const1248_operand") + (match_operand:VI48F 7 "maskload_else_operand")] UNSPEC_VSIBADDR)])] UNSPEC_GATHER)) (clobber (match_scratch:QI 1 "=&Yk"))] @@ -29388,7 +29418,7 @@ (define_expand "_scattersi" operands[5] = gen_rtx_UNSPEC (Pmode, gen_rtvec (4, operands[0], operands[2], operands[4], operands[1]), - UNSPEC_VSIBADDR); + UNSPEC_VSIBADDR); }) (define_insn "*avx512f_scattersi" From patchwork Sun Aug 11 21:01:02 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Robin Dapp X-Patchwork-Id: 1971387 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256 header.s=20230601 header.b=Ks2p8BD1; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF 
From patchwork Sun Aug 11 21:01:02 2024
Date: Sun, 11 Aug 2024 23:01:02 +0200
Subject: [PATCH 8/8] RISC-V: Add else operand to masked loads [PR115336].
From: "Robin Dapp"
To: "gcc-patches"
Cc: "Richard Sandiford", "Richard Biener"

This patch adds else operands to masked loads. Currently the default else
operand predicate accepts "undefined" (i.e. SCRATCH) as well as all-ones
values.

Note that this series introduces a large number of new RVV FAILs for
riscv. All of them are due to us not being able to elide redundant
vec_cond_exprs.

	PR 115336
	PR 116059

gcc/ChangeLog:

	* config/riscv/autovec.md: Add else operand.
	* config/riscv/predicates.md (maskload_else_operand): New
	predicate.
	* config/riscv/riscv-v.cc (get_else_operand): Remove static.
	(expand_load_store): Use get_else_operand and adjust index.
	(expand_gather_scatter): Ditto.
	(expand_lanes_load_store): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/pr115336.c: New test.
	* gcc.target/riscv/rvv/autovec/pr116059.c: New test.
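
To make the remark about redundant vec_cond_exprs concrete, the sketch
below is an illustration of my own (not part of the patch; all names are
made up) of the kind of conditional gather for which mask_len_gather_load
now takes an else operand:

/* Illustrative sketch only.  vals[idx[i]] may only be read where cond[i]
   is nonzero, so the read becomes a masked gather.  Because the RVV else
   operand is only "undefined" (SCRATCH), the masked-off lanes of that
   gather are unspecified and still have to be kept out of the reduction
   explicitly, for example by a zero-selecting vec_cond_expr, which is the
   kind of select that cannot be elided yet.  */
int
cond_gather_sum (const int *restrict vals, const int *restrict idx,
                 const int *restrict cond, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    if (cond[i])
      sum += vals[idx[i]];
  return sum;
}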
--- gcc/config/riscv/autovec.md | 45 +++++++++++-------- gcc/config/riscv/predicates.md | 3 ++ gcc/config/riscv/riscv-v.cc | 26 +++++++---- .../gcc.target/riscv/rvv/autovec/pr115336.c | 20 +++++++++ .../gcc.target/riscv/rvv/autovec/pr116059.c | 9 ++++ 5 files changed, 76 insertions(+), 27 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115336.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr116059.c diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index d5793acc999..4111474309c 100644 --- a/gcc/config/riscv/autovec.md +++ b/gcc/config/riscv/autovec.md @@ -26,8 +26,9 @@ (define_expand "mask_len_load" [(match_operand:V 0 "register_operand") (match_operand:V 1 "memory_operand") (match_operand: 2 "vector_mask_operand") - (match_operand 3 "autovec_length_operand") - (match_operand 4 "const_0_operand")] + (match_operand:V 3 "maskload_else_operand") + (match_operand 4 "autovec_length_operand") + (match_operand 5 "const_0_operand")] "TARGET_VECTOR" { riscv_vector::expand_load_store (operands, true); @@ -57,8 +58,9 @@ (define_expand "mask_len_gather_load" (match_operand 3 "") (match_operand 4 "") (match_operand: 5 "vector_mask_operand") - (match_operand 6 "autovec_length_operand") - (match_operand 7 "const_0_operand")] + (match_operand 6 "maskload_else_operand") + (match_operand 7 "autovec_length_operand") + (match_operand 8 "const_0_operand")] "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)" { riscv_vector::expand_gather_scatter (operands, true); @@ -72,8 +74,9 @@ (define_expand "mask_len_gather_load" (match_operand 3 "") (match_operand 4 "") (match_operand: 5 "vector_mask_operand") - (match_operand 6 "autovec_length_operand") - (match_operand 7 "const_0_operand")] + (match_operand 6 "maskload_else_operand") + (match_operand 7 "autovec_length_operand") + (match_operand 8 "const_0_operand")] "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)" { riscv_vector::expand_gather_scatter (operands, true); @@ -87,8 +90,9 @@ (define_expand "mask_len_gather_load" (match_operand 3 "") (match_operand 4 "") (match_operand: 5 "vector_mask_operand") - (match_operand 6 "autovec_length_operand") - (match_operand 7 "const_0_operand")] + (match_operand 6 "maskload_else_operand") + (match_operand 7 "autovec_length_operand") + (match_operand 8 "const_0_operand")] "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)" { riscv_vector::expand_gather_scatter (operands, true); @@ -102,8 +106,9 @@ (define_expand "mask_len_gather_load" (match_operand 3 "") (match_operand 4 "") (match_operand: 5 "vector_mask_operand") - (match_operand 6 "autovec_length_operand") - (match_operand 7 "const_0_operand")] + (match_operand 6 "maskload_else_operand") + (match_operand 7 "autovec_length_operand") + (match_operand 8 "const_0_operand")] "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)" { riscv_vector::expand_gather_scatter (operands, true); @@ -117,8 +122,9 @@ (define_expand "mask_len_gather_load" (match_operand 3 "") (match_operand 4 "") (match_operand: 5 "vector_mask_operand") - (match_operand 6 "autovec_length_operand") - (match_operand 7 "const_0_operand")] + (match_operand 6 "maskload_else_operand") + (match_operand 7 "autovec_length_operand") + (match_operand 8 "const_0_operand")] "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)" { riscv_vector::expand_gather_scatter (operands, true); @@ -132,8 +138,9 @@ (define_expand "mask_len_gather_load" (match_operand 3 "") 
(match_operand 4 "") (match_operand: 5 "vector_mask_operand") - (match_operand 6 "autovec_length_operand") - (match_operand 7 "const_0_operand")] + (match_operand 6 "maskload_else_operand") + (match_operand 7 "autovec_length_operand") + (match_operand 8 "const_0_operand")] "TARGET_VECTOR && riscv_vector::gather_scatter_valid_offset_p (mode)" { riscv_vector::expand_gather_scatter (operands, true); @@ -151,8 +158,9 @@ (define_expand "mask_len_gather_load" (match_operand 3 "") (match_operand 4 "") (match_operand: 5 "vector_mask_operand") - (match_operand 6 "autovec_length_operand") - (match_operand 7 "const_0_operand")] + (match_operand 6 "maskload_else_operand") + (match_operand 7 "autovec_length_operand") + (match_operand 8 "const_0_operand")] "TARGET_VECTOR" { riscv_vector::expand_gather_scatter (operands, true); @@ -280,8 +288,9 @@ (define_expand "vec_mask_len_load_lanes" [(match_operand:VT 0 "register_operand") (match_operand:VT 1 "memory_operand") (match_operand: 2 "vector_mask_operand") - (match_operand 3 "autovec_length_operand") - (match_operand 4 "const_0_operand")] + (match_operand 3 "maskload_else_operand") + (match_operand 4 "autovec_length_operand") + (match_operand 5 "const_0_operand")] "TARGET_VECTOR" { riscv_vector::expand_lanes_load_store (operands, true); diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md index 9971fabc587..7cc7c2b1f9d 100644 --- a/gcc/config/riscv/predicates.md +++ b/gcc/config/riscv/predicates.md @@ -528,6 +528,9 @@ (define_predicate "autovec_else_operand" (ior (match_operand 0 "register_operand") (match_operand 0 "scratch_operand"))) +(define_predicate "maskload_else_operand" + (match_operand 0 "scratch_operand")) + (define_predicate "vector_arith_operand" (ior (match_operand 0 "register_operand") (and (match_code "const_vector") diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index d74c4723abc..ed40e768500 100644 --- a/gcc/config/riscv/riscv-v.cc +++ b/gcc/config/riscv/riscv-v.cc @@ -3818,12 +3818,23 @@ expand_select_vl (rtx *ops) emit_insn (gen_no_side_effects_vsetvl_rtx (rvv_mode, ops[0], ops[1])); } +/* Return RVV_VUNDEF if the ELSE value is scratch rtx. */ +static rtx +get_else_operand (rtx op) +{ + return GET_CODE (op) == SCRATCH ? RVV_VUNDEF (GET_MODE (op)) : op; +} + /* Expand MASK_LEN_{LOAD,STORE}. */ void expand_load_store (rtx *ops, bool is_load) { - rtx mask = ops[2]; - rtx len = ops[3]; + int idx = 2; + rtx mask = ops[idx++]; + /* A masked load has a merge/else operand. */ + if (is_load) + get_else_operand (ops[idx++]); + rtx len = ops[idx]; machine_mode mode = GET_MODE (ops[0]); if (is_vlmax_len_p (mode, len)) @@ -3916,13 +3927,6 @@ expand_cond_len_op (unsigned icode, insn_flags op_type, rtx *ops, rtx len) emit_nonvlmax_insn (icode, insn_flags, ops, len); } -/* Return RVV_VUNDEF if the ELSE value is scratch rtx. */ -static rtx -get_else_operand (rtx op) -{ - return GET_CODE (op) == SCRATCH ? RVV_VUNDEF (GET_MODE (op)) : op; -} - /* Expand unary ops COND_LEN_*. */ void expand_cond_len_unop (unsigned icode, rtx *ops) @@ -4043,6 +4047,8 @@ expand_gather_scatter (rtx *ops, bool is_load) int shift; rtx mask = ops[5]; rtx len = ops[6]; + if (is_load) + len = ops[7]; if (is_load) { vec_reg = ops[0]; @@ -4265,6 +4271,8 @@ expand_lanes_load_store (rtx *ops, bool is_load) { rtx mask = ops[2]; rtx len = ops[3]; + if (is_load) + len = ops[4]; rtx addr = is_load ? XEXP (ops[1], 0) : XEXP (ops[0], 0); rtx reg = is_load ? 
ops[0] : ops[1];
   machine_mode mode = GET_MODE (ops[0]);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115336.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115336.c
new file mode 100644
index 00000000000..29e55705a7a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115336.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options { -O3 -march=rv64gcv_zvl256b -mabi=lp64d } } */
+
+short d[19];
+_Bool e[100][19][19];
+_Bool f[10000];
+
+int main()
+{
+  for (long g = 0; g < 19; ++g)
+    d[g] = 3;
+  _Bool(*h)[19][19] = e;
+  for (short g = 0; g < 9; g++)
+    for (int i = 4; i < 16; i += 3)
+      f[i * 9 + g] = d[i] ? d[i] : h[g][i][2];
+  for (long i = 120; i < 122; ++i)
+    __builtin_printf("%d\n", f[i]);
+}
+
+/* { dg-final { scan-assembler-times {vmv.v.i\s*v[0-9]+,0} 3 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr116059.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr116059.c
new file mode 100644
index 00000000000..55e03c5ec0b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr116059.c
@@ -0,0 +1,9 @@
+char a;
+_Bool b[11] = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1};
+int main() {
+  _Bool *c = b;
+  for (signed d = 0; d < 11; d += 1)
+    a = d % 2 == 0 ? c[d] / c[d]
+                   : c[d];
+  __builtin_printf("%u\n", a);
+}
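
Finally, an illustrative sketch (mine, not part of the patch; names are
made up) of the contiguous case handled by mask_len_load, which after this
change carries the else value as operand 3, ahead of the length and bias
operands:

/* Illustrative sketch only.  In an if-converted copy like this, the load
   of b[] and the store to a[] can both be masked by c[].  An "undefined"
   (SCRATCH) else value, as accepted by the new RVV maskload_else_operand
   predicate, is sufficient here because masked-off lanes are never
   stored.  */
void
cond_copy (int *restrict a, const int *restrict b,
           const int *restrict c, int n)
{
  for (int i = 0; i < n; i++)
    if (c[i])
      a[i] = b[i];
}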