From patchwork Thu Oct 10 08:27:04 2024
X-Patchwork-Submitter: Jennifer Schmitz
X-Patchwork-Id: 1995326
From: Jennifer Schmitz
To: gcc-patches@gcc.gnu.org
Cc: Richard Sandiford, richard.earnshaw@arm.com, Kyrylo Tkachov,
 Tamar Christina
Subject: [PATCH][PR113816] AArch64: Use SIMD+GPR for logical vector reductions
Date: Thu, 10 Oct 2024 08:27:04 +0000

This patch implements the optabs reduc_and_scal_<mode>,
reduc_ior_scal_<mode>, and reduc_xor_scal_<mode> for ASIMD modes V8QI,
V16QI, V4HI, and V8HI for TARGET_SIMD to improve codegen for bitwise
logical vector reduction operations.
Previously, either only vector registers or only general-purpose
registers (GPRs) were used. Now, vector registers are used for the
reduction from 128 to 64 bits; a 64-bit GPR is used for the reduction
from 64 to 32 bits; and 32-bit GPRs are used for the remaining
reduction steps.

For example, the test case (V8HI)

int16_t foo (int16_t *a)
{
  int16_t b = -1;
  for (int i = 0; i < 8; ++i)
    b &= a[i];
  return b;
}

was previously compiled to (-O2):

foo:
	ldr	q0, [x0]
	movi	v30.4s, 0
	ext	v29.16b, v0.16b, v30.16b, #8
	and	v29.16b, v29.16b, v0.16b
	ext	v31.16b, v29.16b, v30.16b, #4
	and	v31.16b, v31.16b, v29.16b
	ext	v30.16b, v31.16b, v30.16b, #2
	and	v30.16b, v30.16b, v31.16b
	umov	w0, v30.h[0]
	ret

With the patch, it is compiled to:

foo:
	ldr	q31, [x0]
	ext	v30.16b, v31.16b, v31.16b, #8
	and	v31.8b, v30.8b, v31.8b
	fmov	x0, d31
	and	x0, x0, x0, lsr 32
	and	w0, w0, w0, lsr 16
	ret

For modes V4SI and V2DI, the pattern was not implemented, because the
current codegen (using only base instructions) is already efficient.

Note that the PR initially suggested using SVE reduction ops. However,
they have higher latency than the proposed sequence, which is why using
Neon and base instructions is preferable.

Test cases were added for 8/16-bit integers for all implemented modes
and all three operations to check the produced assembly.

We also added [istarget aarch64*-*-*] to the selector
vect_logical_reduc, because for aarch64 vector types, either the
logical reduction optabs are implemented or the codegen for reduction
operations is already good as it is. This was motivated by the failure
of a scan-tree-dump directive in the test cases
gcc.dg/vect/vect-reduc-or_1.c and gcc.dg/vect/vect-reduc-or_2.c.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no
regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz

gcc/
	PR target/113816
	* config/aarch64/aarch64-simd.md (reduc_<optab>_scal_<mode>):
	Implement for logical bitwise operations for VDQV_E.

gcc/testsuite/
	PR target/113816
	* lib/target-supports.exp (vect_logical_reduc): Add aarch64*.
	* gcc.target/aarch64/simd/logical_reduc.c: New test.
	* gcc.target/aarch64/vect-reduc-or_1.c: Adjust expected outcome.
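For reference, the GPR steps compute the full reduction because each
step combines a register with itself shifted right by half the
remaining width, halving the number of live lanes. A minimal C sketch
of this idea for the V8QI AND case (illustration only, not part of the
patch; the helper name is made up; lane order assumes little-endian,
as in the sequences above):

#include <stdint.h>
#include <string.h>

static int8_t
and_reduce_v8qi_sketch (const int8_t *a)
{
  uint64_t x;
  memcpy (&x, a, sizeof x);	/* ldr  x0, [x0] */
  /* After this step the low 32 bits hold a[i] & a[i + 4].  */
  x &= x >> 32;			/* and  x0, x0, x0, lsr 32 */
  uint32_t w = (uint32_t) x;
  w &= w >> 16;			/* and  w0, w0, w0, lsr 16 */
  w &= w >> 8;			/* and  w0, w0, w0, lsr 8 */
  /* The AND of all eight lanes is now in the low byte.  */
  return (int8_t) w;
}

The same folding works for OR and XOR, since all three operations are
associative and commutative.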
---
 gcc/config/aarch64/aarch64-simd.md            |  55 +++++
 .../gcc.target/aarch64/simd/logical_reduc.c   | 208 ++++++++++++++++++
 .../gcc.target/aarch64/vect-reduc-or_1.c      |   2 +-
 gcc/testsuite/lib/target-supports.exp         |   4 +-
 4 files changed, 267 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/logical_reduc.c

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 23c03a96371..00286b8b020 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3608,6 +3608,61 @@
 }
 )
 
+;; Emit a sequence for bitwise logical reductions over vectors for V8QI, V16QI,
+;; V4HI, and V8HI modes.  The reduction is achieved by iteratively operating
+;; on the two halves of the input.
+;; If the input has 128 bits, the first operation is performed in vector
+;; registers.  From 64 bits down, the reduction steps are performed in general
+;; purpose registers.
+;; For example, for V8HI and operation AND, the intended sequence is:
+;;   EXT  v1.16b, v0.16b, v0.16b, #8
+;;   AND  v0.8b, v1.8b, v0.8b
+;;   FMOV x0, d0
+;;   AND  x0, x0, x0, lsr 32
+;;   AND  w0, w0, w0, lsr 16
+;;
+;; For V8QI and operation AND, the sequence is:
+;;   AND  x0, x0, x0, lsr 32
+;;   AND  w0, w0, w0, lsr 16
+;;   AND  w0, w0, w0, lsr 8
+
+(define_expand "reduc_<optab>_scal_<mode>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (LOGICAL:VDQV_E (match_operand:VDQV_E 1 "register_operand"))]
+  "TARGET_SIMD"
+  {
+    rtx dst = operands[1];
+    rtx tdi = gen_reg_rtx (DImode);
+    rtx tsi = lowpart_subreg (SImode, tdi, DImode);
+    rtx op1_lo;
+    if (known_eq (GET_MODE_SIZE (<MODE>mode), 16))
+      {
+	rtx t0 = gen_reg_rtx (<MODE>mode);
+	rtx t1 = gen_reg_rtx (DImode);
+	rtx t2 = gen_reg_rtx (DImode);
+	rtx idx = GEN_INT (8 / GET_MODE_UNIT_SIZE (<MODE>mode));
+	emit_insn (gen_aarch64_ext<mode> (t0, dst, dst, idx));
+	op1_lo = lowpart_subreg (V2DImode, dst, <MODE>mode);
+	rtx t0_lo = lowpart_subreg (V2DImode, t0, <MODE>mode);
+	emit_insn (gen_aarch64_get_lanev2di (t1, op1_lo, GEN_INT (0)));
+	emit_insn (gen_aarch64_get_lanev2di (t2, t0_lo, GEN_INT (0)));
+	emit_insn (gen_<optab>di3 (t1, t1, t2));
+	emit_move_insn (tdi, t1);
+      }
+    else
+      {
+	op1_lo = lowpart_subreg (DImode, dst, <MODE>mode);
+	emit_move_insn (tdi, op1_lo);
+      }
+    emit_insn (gen_<optab>_lshrdi3 (tdi, tdi, GEN_INT (32), tdi));
+    emit_insn (gen_<optab>_lshrsi3 (tsi, tsi, GEN_INT (16), tsi));
+    if (known_eq (GET_MODE_UNIT_BITSIZE (<MODE>mode), 8))
+      emit_insn (gen_<optab>_lshrsi3 (tsi, tsi, GEN_INT (8), tsi));
+    emit_move_insn (operands[0], lowpart_subreg (<VEL>mode, tsi, SImode));
+    DONE;
+  }
+)
+
 (define_insn "aarch64_reduc_<optab>_internal<mode>"
   [(set (match_operand:VDQV_S 0 "register_operand" "=w")
 	(unspec:VDQV_S [(match_operand:VDQV_S 1 "register_operand" "w")]
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/logical_reduc.c b/gcc/testsuite/gcc.target/aarch64/simd/logical_reduc.c
new file mode 100644
index 00000000000..9508288b218
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/logical_reduc.c
@@ -0,0 +1,208 @@
+/* { dg-options "-O2 -ftree-vectorize" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <stdint.h>
+
+/*
+** fv16qi_and:
+**	ldr	q([0-9]+), \[x0\]
+**	ext	v([0-9]+)\.16b, v\1\.16b, v\1\.16b, #8
+**	and	v\1\.8b, v\2\.8b, v\1\.8b
+**	fmov	x0, d\1
+**	and	x0, x0, x0, lsr 32
+**	and	w0, w0, w0, lsr 16
+**	and	w0, w0, w0, lsr 8
+**	ret
+*/
+int8_t fv16qi_and (int8_t *a)
+{
+  int8_t b = -1;
+  for (int i = 0; i < 16; ++i)
+    b &= a[i];
+  return b;
+}
+
+/*
+** fv8hi_and:
+**	ldr	q([0-9]+), \[x0\]
+**	ext	v([0-9]+)\.16b, v\1\.16b, v\1\.16b, #8
+**	and	v\1\.8b, v\2\.8b, v\1\.8b
+**	fmov	x0, d\1
+**	and	x0, x0, x0, lsr 32
+**	and	w0, w0, w0, lsr 16
+**	ret
+*/
+int16_t fv8hi_and (int16_t *a)
+{
+  int16_t b = -1;
+  for (int i = 0; i < 8; ++i)
+    b &= a[i];
+  return b;
+}
+
+/*
+** fv16qi_or:
+**	ldr	q([0-9]+), \[x0\]
+**	ext	v([0-9]+)\.16b, v\1\.16b, v\1\.16b, #8
+**	orr	v\1\.8b, v\2\.8b, v\1\.8b
+**	fmov	x0, d\1
+**	orr	x0, x0, x0, lsr 32
+**	orr	w0, w0, w0, lsr 16
+**	orr	w0, w0, w0, lsr 8
+**	ret
+*/
+int8_t fv16qi_or (int8_t *a)
+{
+  int8_t b = 0;
+  for (int i = 0; i < 16; ++i)
+    b |= a[i];
+  return b;
+}
+
+/*
+** fv8hi_or:
+**	ldr	q([0-9]+), \[x0\]
+**	ext	v([0-9]+)\.16b, v\1\.16b, v\1\.16b, #8
+**	orr	v\1\.8b, v\2\.8b, v\1\.8b
+**	fmov	x0, d\1
+**	orr	x0, x0, x0, lsr 32
+**	orr	w0, w0, w0, lsr 16
+**	ret
+*/
+int16_t fv8hi_or (int16_t *a)
+{
+  int16_t b = 0;
+  for (int i = 0; i < 8; ++i)
+    b |= a[i];
+  return b;
+}
+
+/*
+** fv16qi_xor:
+**	ldr	q([0-9]+), \[x0\]
+**	ext	v([0-9]+)\.16b, v\1\.16b, v\1\.16b, #8
+**	eor	v\1\.8b, v\2\.8b, v\1\.8b
+**	fmov	x0, d\1
+**	eor	x0, x0, x0, lsr 32
+**	eor	w0, w0, w0, lsr 16
+**	eor	w0, w0, w0, lsr 8
+**	ret
+*/
+int8_t fv16qi_xor (int8_t *a)
+{
+  int8_t b = 0;
+  for (int i = 0; i < 16; ++i)
+    b ^= a[i];
+  return b;
+}
+
+/*
+** fv8hi_xor:
+**	ldr	q([0-9]+), \[x0\]
+**	ext	v([0-9]+)\.16b, v\1\.16b, v\1\.16b, #8
+**	eor	v\1\.8b, v\2\.8b, v\1\.8b
+**	fmov	x0, d\1
+**	eor	x0, x0, x0, lsr 32
+**	eor	w0, w0, w0, lsr 16
+**	ret
+*/
+int16_t fv8hi_xor (int16_t *a)
+{
+  int16_t b = 0;
+  for (int i = 0; i < 8; ++i)
+    b ^= a[i];
+  return b;
+}
+
+/*
+** fv8qi_and:
+**	ldr	x0, \[x0\]
+**	and	x0, x0, x0, lsr 32
+**	and	w0, w0, w0, lsr 16
+**	and	w0, w0, w0, lsr 8
+**	ret
+*/
+int8_t fv8qi_and (int8_t *a)
+{
+  int8_t b = -1;
+  for (int i = 0; i < 8; ++i)
+    b &= a[i];
+  return b;
+}
+
+/*
+** fv4hi_and:
+**	ldr	x0, \[x0\]
+**	and	x0, x0, x0, lsr 32
+**	and	w0, w0, w0, lsr 16
+**	ret
+*/
+int16_t fv4hi_and (int16_t *a)
+{
+  int16_t b = -1;
+  for (int i = 0; i < 4; ++i)
+    b &= a[i];
+  return b;
+}
+
+/*
+** fv8qi_or:
+**	ldr	x0, \[x0\]
+**	orr	x0, x0, x0, lsr 32
+**	orr	w0, w0, w0, lsr 16
+**	orr	w0, w0, w0, lsr 8
+**	ret
+*/
+int8_t fv8qi_or (int8_t *a)
+{
+  int8_t b = 0;
+  for (int i = 0; i < 8; ++i)
+    b |= a[i];
+  return b;
+}
+
+/*
+** fv4hi_or:
+**	ldr	x0, \[x0\]
+**	orr	x0, x0, x0, lsr 32
+**	orr	w0, w0, w0, lsr 16
+**	ret
+*/
+int16_t fv4hi_or (int16_t *a)
+{
+  int16_t b = 0;
+  for (int i = 0; i < 4; ++i)
+    b |= a[i];
+  return b;
+}
+
+/*
+** fv8qi_xor:
+**	ldr	x0, \[x0\]
+**	eor	x0, x0, x0, lsr 32
+**	eor	w0, w0, w0, lsr 16
+**	eor	w0, w0, w0, lsr 8
+**	ret
+*/
+int8_t fv8qi_xor (int8_t *a)
+{
+  int8_t b = 0;
+  for (int i = 0; i < 8; ++i)
+    b ^= a[i];
+  return b;
+}
+
+/*
+** fv4hi_xor:
+**	ldr	x0, \[x0\]
+**	eor	x0, x0, x0, lsr 32
+**	eor	w0, w0, w0, lsr 16
+**	ret
+*/
+int16_t fv4hi_xor (int16_t *a)
+{
+  int16_t b = 0;
+  for (int i = 0; i < 4; ++i)
+    b ^= a[i];
+  return b;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-reduc-or_1.c b/gcc/testsuite/gcc.target/aarch64/vect-reduc-or_1.c
index 918822a7d00..70c4ca18094 100644
--- a/gcc/testsuite/gcc.target/aarch64/vect-reduc-or_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/vect-reduc-or_1.c
@@ -32,4 +32,4 @@ main (unsigned char argc, char **argv)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "Reduce using vector shifts" "vect" } } */
+/* { dg-final { scan-tree-dump "Reduce using direct vector reduction" "vect" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 8f2afe866c7..44f737f15d0 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -9564,7 +9564,9 @@ proc check_effective_target_vect_logical_reduc { } {
	     || [istarget amdgcn-*-*]
	     || [check_effective_target_riscv_v]
	     || [check_effective_target_loongarch_sx]
-	     || [istarget i?86-*-*] || [istarget x86_64-*-*]}]
+	     || [istarget i?86-*-*]
+	     || [istarget x86_64-*-*]
+	     || [istarget aarch64*-*-*]}]
 }
 
 # Return 1 if the target supports the fold_extract_last optab.
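
P.S.: With aarch64 added to vect_logical_reduc, a vect test can key on
the selector directly; a hypothetical directive (usage sketch only, not
part of this patch) would look like:

/* { dg-final { scan-tree-dump "Reduce using direct vector reduction" "vect" { target vect_logical_reduc } } } */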