From patchwork Sun Jul 21 09:14:59 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Feng Xue OS <fxue@os.amperecomputing.com>
X-Patchwork-Id: 1962884
Return-Path: <gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@legolas.ozlabs.org
Authentication-Results: legolas.ozlabs.org;
	dkim=pass (1024-bit key;
 unprotected) header.d=os.amperecomputing.com header.i=@os.amperecomputing.com
 header.a=rsa-sha256 header.s=selector2 header.b=rVes/5QJ;
	dkim-atps=neutral
Authentication-Results: legolas.ozlabs.org;
 spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org
 (client-ip=8.43.85.97; helo=server2.sourceware.org;
 envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org;
 receiver=patchwork.ozlabs.org)
Received: from server2.sourceware.org (server2.sourceware.org [8.43.85.97])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384)
	(No client certificate requested)
	by legolas.ozlabs.org (Postfix) with ESMTPS id 4WRd6k0209z1yYm
	for <incoming@patchwork.ozlabs.org>; Sun, 21 Jul 2024 19:15:25 +1000 (AEST)
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id BC46E385DDFB
	for <incoming@patchwork.ozlabs.org>; Sun, 21 Jul 2024 09:15:23 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from CY4PR05CU001.outbound.protection.outlook.com
 (mail-westcentralusazlp170100000.outbound.protection.outlook.com
 [IPv6:2a01:111:f403:c112::])
 by sourceware.org (Postfix) with ESMTPS id 84C533858C39
 for <gcc-patches@gcc.gnu.org>; Sun, 21 Jul 2024 09:15:02 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 84C533858C39
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=os.amperecomputing.com
Authentication-Results: sourceware.org;
 spf=pass smtp.mailfrom=os.amperecomputing.com
ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 84C533858C39
Authentication-Results: server2.sourceware.org;
 arc=pass smtp.remote-ip=2a01:111:f403:c112::
ARC-Seal: i=2; a=rsa-sha256; d=sourceware.org; s=key; t=1721553304; cv=pass;
 b=KijPTX/21HjJCwOJv4y7GzmjVSkkVXQ3AXVYZBOYiPKFvkI6SFtpU84SK2AKnamV/QP2dls/BJ8M93ktW8BzxCP744ZRyVY8PhBthUzS8fHftzMJHA4a8MN0Z3s0rHgUU4Zldd2VdTBPgqbRUglaiXFCPttUUuHa43E71mxcYXk=
ARC-Message-Signature: i=2; a=rsa-sha256; d=sourceware.org; s=key;
 t=1721553304; c=relaxed/simple;
 bh=gEz6O6QwI0OEzID0gD8GpdxXGVS9gKUBWhSAMyPy+P4=;
 h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version;
 b=BDScgPbFnVOlC4oH0lFnwwj0yKzj6Xa8/pGI7WqF8fzpSjIVR4/9Msv72hfMGDbC0apRcGrlXESVyKgIBek/wIMJ0+SJE7B/UVRi/DDbhWY6qTKYPlZB/TeOU6H7Ki30Ff1lImVWG220Kd/LxNhbSVCJKDAsBiOdvCJnnOvwr5w=
ARC-Authentication-Results: i=2; server2.sourceware.org
ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none;
 b=doksAEPyj5mxLsOectPWN+A6AiWbcA3K0Ix+yE+UFOOivu5mBU7crjQJVAZnUDll84VKnReCLDOrjrtSV6mvnk6N74OX67O03+WLqmF4bZwSkPIpzQzKXZnbePA9LaiZxrwxReuj9btcUQny17+kyows/XgjdSw8KkwY2idEDdmDKuUAm/JutKFGHhbZe4H+onngHmAVn09Ezf+vKrjUF6xl3LKvUXdDDKA15liIjc3YB3rs5xShj6wkAnLxuUKqi0wK8MChFRb3IAUc0DdkMTk9xtFLDVlFN6ZGJJWHfgCbGN12gL9zpVu4Oudmra9xynSP13KYcYsEeuuA4336/g==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com;
 s=arcselector10001;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1;
 bh=HNDSINRcY8/OxUQogGuBQewkVe+wR/21k1X3QdrMfh0=;
 b=M/ZMvsKnw4iT6PUYx2TKFVPa7VAuNeGMJCrPYwloOSaOk2jX+DnpTN5WswXUn+vKBZ+BLsfQpx7Jo71bZvKS9BLvaiHhccwnt0PLhjRwTNawUaS8DecfzUT1wO6eaAG5Bptm2XhA+sUkUelDabWtwmT2jEnGxY2v+h4wdNB7BkQFGOhzOitClCDqJhNG+m9gRSs6KAU9NcXQ4G52vrS5SG94NWcUjj4by9pUnELwUMqV0p0L6tAexKPQsCG78wOxS8YUChUo+7VqofB4ViZKscTTflahMIOvz1TWzqS1s/3xupw7mkVlAk2sYLg6xxlcZLC1Oes9+2rX3T0B4vItDw==
ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass
 smtp.mailfrom=os.amperecomputing.com; dmarc=pass action=none
 header.from=os.amperecomputing.com; dkim=pass
 header.d=os.amperecomputing.com; arc=none
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=os.amperecomputing.com; s=selector2;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;
 bh=HNDSINRcY8/OxUQogGuBQewkVe+wR/21k1X3QdrMfh0=;
 b=rVes/5QJIftXTBO5EgY+fPdH8gz9BUDRi3GQacBBIp4GF0E1gdNBbu5z0rl96dMGL+4sDTQJ4/vvFDuAl3bsxVWC3rNNfTEdr+xTHxVAtFojisIISW5anOTjQMJst1W4wuQXsvygOKrFlSWW4SpAMMeWIRmvcRZw9jvkLm8AAm4=
Received: from LV2PR01MB7839.prod.exchangelabs.com (2603:10b6:408:14f::13) by
 DS7PR01MB7855.prod.exchangelabs.com (2603:10b6:8:82::13) with
 Microsoft SMTP
 Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 15.20.7784.14; Sun, 21 Jul 2024 09:14:59 +0000
Received: from LV2PR01MB7839.prod.exchangelabs.com
 ([fe80::2ac3:5a77:36fd:9c63]) by LV2PR01MB7839.prod.exchangelabs.com
 ([fe80::2ac3:5a77:36fd:9c63%4]) with mapi id 15.20.7784.016; Sun, 21 Jul 2024
 09:14:59 +0000
From: Feng Xue OS <fxue@os.amperecomputing.com>
To: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
CC: Richard Biener <richard.guenther@gmail.com>, Tamar Christina
 <Tamar.Christina@arm.com>, Richard Sandiford <Richard.Sandiford@arm.com>
Subject: [RFC][PATCH 1/5] vect: Fix single_imm_use in tree_vect_patterns
Thread-Topic: [RFC][PATCH 1/5] vect: Fix single_imm_use in tree_vect_patterns
Thread-Index: AQHa20mwT9DDm8/aJUK1dkZViTITDQ==
Date: Sun, 21 Jul 2024 09:14:59 +0000
Message-ID: 
 <LV2PR01MB78398755C9C2FFCC5CE994E8F7AF2@LV2PR01MB7839.prod.exchangelabs.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: yes
X-MS-TNEF-Correlator: 
msip_labels: MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_Enabled=True;
 MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_SiteId=3bc2b170-fd94-476d-b0ce-4229bdc904a7;
 MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_SetDate=2024-07-21T09:14:58.925Z;
 MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_Name=Confidential;
 MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_ContentBits=0;
 MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_Method=Standard;
authentication-results: dkim=none (message not signed)
 header.d=none;dmarc=none action=none header.from=os.amperecomputing.com;
x-ms-publictraffictype: Email
x-ms-traffictypediagnostic: LV2PR01MB7839:EE_|DS7PR01MB7855:EE_
x-ms-office365-filtering-correlation-id: dddba7b3-c503-4a9d-c132-08dca9659855
x-ms-exchange-senderadcheck: 1
x-ms-exchange-antispam-relay: 0
x-microsoft-antispam: BCL:0;
 ARA:13230040|376014|366016|1800799024|38070700018;
x-forefront-antispam-report: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:;
 IPV:NLI; SFV:NSPM; H:LV2PR01MB7839.prod.exchangelabs.com; PTR:; CAT:NONE;
 SFS:(13230040)(376014)(366016)(1800799024)(38070700018); DIR:OUT; SFP:1102;
x-ms-exchange-antispam-messagedata-chunkcount: 1
MIME-Version: 1.0
X-OriginatorOrg: os.amperecomputing.com
X-MS-Exchange-CrossTenant-AuthAs: Internal
X-MS-Exchange-CrossTenant-AuthSource: LV2PR01MB7839.prod.exchangelabs.com
X-MS-Exchange-CrossTenant-Network-Message-Id: 
 dddba7b3-c503-4a9d-c132-08dca9659855
X-MS-Exchange-CrossTenant-originalarrivaltime: 21 Jul 2024 09:14:59.1761 (UTC)
X-MS-Exchange-CrossTenant-fromentityheader: Hosted
X-MS-Exchange-CrossTenant-id: 3bc2b170-fd94-476d-b0ce-4229bdc904a7
X-MS-Exchange-CrossTenant-mailboxtype: HOSTED
X-MS-Exchange-Transport-CrossTenantHeadersStamped: DS7PR01MB7855
X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE,
 SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.30
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org

The work for the RFC (https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657860.html)
involves a considerable amount of code change, so I have split it into several
batches of patches. This patch and the following ones constitute the first batch.

Since pattern statements coexist with normal statements but are not linked into
the function body, we should not invoke utility procedures that depend on the
def/use graph on a pattern statement, such as counting the uses of an SSA name
defined by a pattern statement. This patch fixes a bug of this kind in vect
pattern formation.
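To make the failure mode concrete, here is a toy C++ model (the types ToyStmt
and ToyFunction and both helpers are hypothetical, not GCC's data structures)
in which use lists are recorded only for statements linked into the body. A
use that occurs only in an unlinked pattern statement is invisible to the
use-count query, so its answer is meaningless for values defined by pattern
statements. The guarded helper is a sketch in the spirit of the
!STMT_VINFO_RELATED_STMT (stmt_info) check added below.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical toy IR for illustration only; these are not GCC's types.
struct ToyStmt {
  std::string lhs;                   // value defined by the statement
  std::vector<std::string> ops;      // values read by the statement
  const ToyStmt *related = nullptr;  // set only for pattern statements
};

// Use lists are recorded only for statements linked into the body,
// mimicking how immediate-use information tracks the real IL.
struct ToyFunction {
  std::vector<const ToyStmt *> body;
  std::map<std::string, std::vector<const ToyStmt *>> uses;

  void link (const ToyStmt *s)
  {
    body.push_back (s);
    for (const std::string &op : s->ops)
      uses[op].push_back (s);
  }
};

// Analogue of single_imm_use: true if NAME has exactly one recorded use.
bool toy_single_use (const ToyFunction &fn, const std::string &name,
                     const ToyStmt **use_stmt)
{
  auto it = fn.uses.find (name);
  if (it == fn.uses.end () || it->second.size () != 1)
    return false;
  *use_stmt = it->second.front ();
  return true;
}

// Guarded query: skip the use-list lookup for pattern statements,
// whose uses are never recorded in this toy model.
bool toy_guarded_single_use (const ToyFunction &fn, const ToyStmt &stmt,
                             const ToyStmt **use_stmt)
{
  return !stmt.related && toy_single_use (fn, stmt.lhs, use_stmt);
}
```

For a linked statement the query behaves normally; for a pattern statement
whose only use is in another (unlinked) pattern statement, the unguarded query
reports no single use even though one exists.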

Thanks,
Feng
---
gcc/
	* tree-vect-patterns.cc (vect_recog_bitfield_ref_pattern): Only call
	single_imm_use if statement is not generated by pattern recognition.
---
 gcc/tree-vect-patterns.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

From 52e1725339fc7e4552eb7916570790c4ab7f133d Mon Sep 17 00:00:00 2001
From: Feng Xue <fxue@os.amperecomputing.com>
Date: Fri, 14 Jun 2024 15:49:23 +0800
Subject: [PATCH 1/5] vect: Fix single_imm_use in tree_vect_patterns

2024-06-14 Feng Xue <fxue@os.amperecomputing.com>

gcc/
	* tree-vect-patterns.cc (vect_recog_bitfield_ref_pattern): Only call
	single_imm_use if statement is not generated by pattern recognition.
---
 gcc/tree-vect-patterns.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 4570c25b664..ca8809e7cfd 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -2700,7 +2700,8 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
   /* If the only use of the result of this BIT_FIELD_REF + CONVERT is a
      PLUS_EXPR then do the shift last as some targets can combine the shift and
      add into a single instruction.  */
-  if (lhs && single_imm_use (lhs, &use_p, &use_stmt))
+  if (lhs && !STMT_VINFO_RELATED_STMT (stmt_info)
+      && single_imm_use (lhs, &use_p, &use_stmt))
     {
       if (gimple_code (use_stmt) == GIMPLE_ASSIGN
 	  && gimple_assign_rhs_code (use_stmt) == PLUS_EXPR)
-- 
2.17.1


From patchwork Sun Jul 21 09:15:33 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Feng Xue OS <fxue@os.amperecomputing.com>
X-Patchwork-Id: 1962887
Return-Path: <gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@legolas.ozlabs.org
Authentication-Results: legolas.ozlabs.org;
	dkim=fail reason="signature verification failed" (1024-bit key;
 unprotected) header.d=os.amperecomputing.com header.i=@os.amperecomputing.com
 header.a=rsa-sha256 header.s=selector2 header.b=EkE+Qc8/;
	dkim-atps=neutral
Authentication-Results: legolas.ozlabs.org;
 spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org
 (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org;
 envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org;
 receiver=patchwork.ozlabs.org)
Received: from server2.sourceware.org (server2.sourceware.org
 [IPv6:2620:52:3:1:0:246e:9693:128c])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384)
	(No client certificate requested)
	by legolas.ozlabs.org (Postfix) with ESMTPS id 4WRd8L4c3bz1yYm
	for <incoming@patchwork.ozlabs.org>; Sun, 21 Jul 2024 19:16:50 +1000 (AEST)
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id C5240386100A
	for <incoming@patchwork.ozlabs.org>; Sun, 21 Jul 2024 09:16:48 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from CY4PR05CU001.outbound.protection.outlook.com
 (mail-westcentralusazlp170100000.outbound.protection.outlook.com
 [IPv6:2a01:111:f403:c112::])
 by sourceware.org (Postfix) with ESMTPS id 532273861003
 for <gcc-patches@gcc.gnu.org>; Sun, 21 Jul 2024 09:15:37 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 532273861003
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=os.amperecomputing.com
Authentication-Results: sourceware.org;
 spf=pass smtp.mailfrom=os.amperecomputing.com
ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 532273861003
Authentication-Results: server2.sourceware.org;
 arc=pass smtp.remote-ip=2a01:111:f403:c112::
ARC-Seal: i=2; a=rsa-sha256; d=sourceware.org; s=key; t=1721553345; cv=pass;
 b=UDxTVKbDGE8w+qs6aShtEob/+bss4oaMBSHcMb+NGsM+dDwgkpx0PGBVWB+SND+1GKiZATfcmqYGWl/Ky5xgD1U+ZJ2Vgu0egvXJBAD8ViksZ9MP2FvSOEVz9QcgjTjOBTwiqAIV/gqa194iKbVggcRc3cHo9JWsGOCjp4XAYpw=
ARC-Message-Signature: i=2; a=rsa-sha256; d=sourceware.org; s=key;
 t=1721553345; c=relaxed/simple;
 bh=ml/pCwdQXA1C7MLSiUoEs7BmwDUHIJHqF4U3HKeaWN4=;
 h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version;
 b=OLhudATJNNbGX+0TG/Rvcexm/snXurItKgjYwuyaEjjsoeJo6OijQDWgEuI1Qd3osJKcIJ/lDD+zLMpK/OOuwdqNhn4NFvp//D4gfE2HlnpvC2ayx+ospnyQiudRrzoOV8fCNc0sN6BV9HxBbjAc01tEKCFPGeRCc5Y0KkFlTNw=
ARC-Authentication-Results: i=2; server2.sourceware.org
ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none;
 b=wgdScrJ1tbgeG2eXqZttoPgZA5j5BvZzmMnmAPKMaOSeSgw0uvrOzm+7Xe+WLpa12NKrWPQL8h4yG87z5bQrZfkChMpeuvE0xGYJKkTJ4Mtqs2EPh0bij6Nvx2YSwt9F1qtHMaAsX+uY+bZqjpz0iABaE5oaQfTdQlLV1CB5EVPSFO9yNA0YeiPZH8qyeRLjuBeGMh4aBeyfbgg+neeyAEPC7sM/OoxFSXoAi8pepNVPHRLS5VdDR7vRo3Z3lSVMxpI9PohWcPtWxD4PsfCWY//iSsAsFCY4fcf60IGR7PfpEp95nIu5r6hZUzFAIZkqliu4v29FEm/uShH5V4NvVw==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com;
 s=arcselector10001;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1;
 bh=CN8qYJNHtQeZYY/Tggn0XgH1Abjf8mMpx2QTMahokUk=;
 b=KiP3g9CWHg3XDeNWFUUIbtF0ahhCqaLSN9PxUklQsGrXjuhozBtlOCvoWnSirZLFMFgCL0h4Juof16iYyf9+KS/CpS/OMLuYvNn+85jL8R2mrXNnhdP9874q1VYhHWFDZj733eFptDEjWImqtoJIoCDYcGTzXVJKrRtlhFWdf2aNfNWSPDyQsP4o/3Tpak0tEs2mXRNNvo6FvGvPaLFI8GFamp3ZsTd31hO5+Szslbp3Njd3wMscuO7QSmwy2vgtgPEbRwFjXMMjFDE6GXdrgxggBXDaGTIZYzys4R6AiDe0xxMtD/r1Cfk1sP8AaiJIRFhfOPrKTkkhdkF+iEpZpQ==
ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass
 smtp.mailfrom=os.amperecomputing.com; dmarc=pass action=none
 header.from=os.amperecomputing.com; dkim=pass
 header.d=os.amperecomputing.com; arc=none
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=os.amperecomputing.com; s=selector2;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;
 bh=CN8qYJNHtQeZYY/Tggn0XgH1Abjf8mMpx2QTMahokUk=;
 b=EkE+Qc8/qB/WXNgdlBsDtuxhBPTJp+WS2DMV1XpypmZcXJwWI9EKPRbxBVMUb7m7tepIk0XS1hd0D9ZOnexfBDe5yw2R+KBTM0btsl1ARfcUCuS3FDUSl6EukDDrnODo0bzJjol3JZlh+qc52SsTdRsFp2KxY2md2p9hh9GIkJ8=
Received: from LV2PR01MB7839.prod.exchangelabs.com (2603:10b6:408:14f::13) by
 DS7PR01MB7855.prod.exchangelabs.com (2603:10b6:8:82::13) with
 Microsoft SMTP
 Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 15.20.7784.14; Sun, 21 Jul 2024 09:15:34 +0000
Received: from LV2PR01MB7839.prod.exchangelabs.com
 ([fe80::2ac3:5a77:36fd:9c63]) by LV2PR01MB7839.prod.exchangelabs.com
 ([fe80::2ac3:5a77:36fd:9c63%4]) with mapi id 15.20.7784.016; Sun, 21 Jul 2024
 09:15:34 +0000
From: Feng Xue OS <fxue@os.amperecomputing.com>
To: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
CC: Richard Biener <richard.guenther@gmail.com>, Tamar Christina
 <Tamar.Christina@arm.com>, Richard Sandiford <Richard.Sandiford@arm.com>
Subject: [RFC][PATCH 2/5] vect: Introduce loop reduction affine closure to
 vect pattern recog
Thread-Topic: [RFC][PATCH 2/5] vect: Introduce loop reduction affine closure
 to vect pattern recog
Thread-Index: AQHa20pyTUhuVtaz7kG4SqRzj95c2A==
Date: Sun, 21 Jul 2024 09:15:33 +0000
Message-ID: 
 <LV2PR01MB783928896DF80235611B7FD2F7AF2@LV2PR01MB7839.prod.exchangelabs.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: yes
X-MS-TNEF-Correlator: 
msip_labels: MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_Enabled=True;
 MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_SiteId=3bc2b170-fd94-476d-b0ce-4229bdc904a7;
 MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_SetDate=2024-07-21T09:15:33.745Z;
 MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_Name=Confidential;
 MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_ContentBits=0;
 MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_Method=Standard;
authentication-results: dkim=none (message not signed)
 header.d=none;dmarc=none action=none header.from=os.amperecomputing.com;
x-ms-publictraffictype: Email
x-ms-traffictypediagnostic: LV2PR01MB7839:EE_|DS7PR01MB7855:EE_
x-ms-office365-filtering-correlation-id: b5106731-09fb-4dd1-d300-08dca965ad15
x-ms-exchange-senderadcheck: 1
x-ms-exchange-antispam-relay: 0
x-microsoft-antispam: BCL:0;
 ARA:13230040|376014|366016|1800799024|38070700018;
x-forefront-antispam-report: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:;
 IPV:NLI; SFV:NSPM; H:LV2PR01MB7839.prod.exchangelabs.com; PTR:; CAT:NONE;
 SFS:(13230040)(376014)(366016)(1800799024)(38070700018); DIR:OUT; SFP:1102;
x-ms-exchange-antispam-messagedata-chunkcount: 1
MIME-Version: 1.0
X-OriginatorOrg: os.amperecomputing.com
X-MS-Exchange-CrossTenant-AuthAs: Internal
X-MS-Exchange-CrossTenant-AuthSource: LV2PR01MB7839.prod.exchangelabs.com
X-MS-Exchange-CrossTenant-Network-Message-Id: 
 b5106731-09fb-4dd1-d300-08dca965ad15
X-MS-Exchange-CrossTenant-originalarrivaltime: 21 Jul 2024 09:15:34.0042 (UTC)
X-MS-Exchange-CrossTenant-fromentityheader: Hosted
X-MS-Exchange-CrossTenant-id: 3bc2b170-fd94-476d-b0ce-4229bdc904a7
X-MS-Exchange-CrossTenant-mailboxtype: HOSTED
X-MS-Exchange-Transport-CrossTenantHeadersStamped: DS7PR01MB7855
X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_PASS,
 SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.30
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org

For a sum-based loop reduction, its affine closure is composed of statements
whose results, and any computation derived from them, end up only in the
reduction and are not used in any non-linear transform operation. This concept
underlies the generalized lane-reducing pattern recognition in the upcoming
patches. It can be shown mathematically that evaluating a value with a
lane-reducing pattern is legitimate only if its defining statement lies in the
affine closure. In other words, the canonical representation of a loop
reduction can take the following affine form, in which "opX" denotes an
operation for a lane-reducing pattern and h(i) represents the remaining
operations irrelevant to those patterns.

  for (i)
    sum += cst0 * op0 + cst1 * op1 + ... + cstN * opN + h(i);
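As a rough illustration of which statements belong to the closure, the
following toy C++ sketch marks them with a single backward walk over the loop
body. The Node/Op model, the use of -1 for invariants, constants and the
reduction PHI, and the rule that only addition, subtraction, negation and
multiplication by a constant count as linear uses are all assumptions made for
this example. It ignores uses outside the loop and is not the patch's
vect_mark_reduction_affine_closure.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical toy model, not GCC's representation: each node computes one
// value in the loop body; operand -1 stands for a loop invariant, a
// constant, or the reduction PHI.
enum class Op { Add, Sub, Neg, MulCst, NonLinear };

struct Node {
  Op op;
  std::vector<int> operands;  // for MulCst, operand 0 is the variable one
};

// NODES are in definition order (defs before uses).  REDUC is the node that
// adds into the accumulator.  A node is in the closure if it has uses and
// every use is a linear use by a node that is itself in the closure.
std::vector<bool>
mark_affine_closure (const std::vector<Node> &nodes, int reduc)
{
  const std::size_t n = nodes.size ();
  std::vector<bool> in_closure (n, false), has_use (n, false), bad_use (n, false);

  // Walk in reverse definition order so every use is seen before its def.
  for (std::size_t k = n; k-- > 0;)
    {
      const int i = static_cast<int> (k);
      in_closure[i] = (i == reduc) || (has_use[i] && !bad_use[i]);

      const Node &s = nodes[i];
      for (std::size_t j = 0; j < s.operands.size (); ++j)
        {
          const int d = s.operands[j];
          if (d < 0)
            continue;
          const bool linear = s.op == Op::Add || s.op == Op::Sub
                              || s.op == Op::Neg
                              || (s.op == Op::MulCst && j == 0);
          has_use[d] = true;
          if (!in_closure[i] || !linear)
            bad_use[d] = true;
        }
    }
  return in_closure;
}
```

For sum += 2 * (a[i] * b[i]) + (c[i] >> 1), this marks the product a[i] * b[i]
(an op0 candidate) as inside the closure, while a[i] and b[i] stay outside
because they feed a non-linear multiplication. If the product is additionally
used by some non-linear statement, it drops out of the closure.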

At initialization, a preprocessing step marks all statements in the affine
closure, so that the property can be queried cheaply during pattern matching.
Since a pattern hit replaces the original statement with new pattern
statements, a postprocessing step after recognition parses the semantics of
the new statements and incrementally updates the affine closure, or rolls back
the pattern change if it would break the completeness of the existing closure.
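The update-or-rollback step can be sketched in the same toy spirit: snapshot
the closure state, apply the effect of the new pattern statements, and restore
the snapshot if any statement already turned into a lane-reducing pattern
would fall outside the closure. ClosureState, closure_complete and
apply_pattern_or_rollback are hypothetical names, and the completeness check
is a simplified stand-in for whatever the patch actually verifies.

```cpp
#include <set>
#include <utility>
#include <vector>

// Hypothetical toy state, not GCC's: per-statement closure flags plus the
// set of statements already turned into lane-reducing patterns.
struct ClosureState {
  std::vector<bool> in_closure;
  std::set<int> lane_reducing;
};

// The closure is "complete" in this toy sense if every lane-reducing
// statement is still inside the affine closure.
bool closure_complete (const ClosureState &st)
{
  for (int idx : st.lane_reducing)
    if (idx < 0 || idx >= static_cast<int> (st.in_closure.size ())
        || !st.in_closure[idx])
      return false;
  return true;
}

// Apply a pattern's effect on the closure (UPDATE) and keep it only if the
// closure stays complete; otherwise restore the snapshot and report failure.
template <typename Update>
bool apply_pattern_or_rollback (ClosureState &st, Update update)
{
  ClosureState saved = st;  // snapshot for rollback
  update (st);
  if (closure_complete (st))
    return true;
  st = std::move (saved);
  return false;
}
```

Taking a full snapshot keeps the sketch simple; an incremental implementation
would more likely record only the statements it touched.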

Thus, inside the affine closure, the recog framework can uniformly handle both
lane-reducing and normal patterns. This patch also makes it possible to add
more sophisticated logic to enhance lane-reducing patterns.

Thanks,
Feng
---
gcc/
	* tree-vectorizer.h (enum vect_reduc_pattern_status): New enum.
	(_stmt_vec_info): Add a new field reduc_pattern_status.
	* tree-vect-patterns.cc (vect_split_statement): Adjust statement
	status for reduction affine closure.
	(vect_convert_input): Do not reuse conversion statement in process.
	(vect_reassociating_reduction_p): Add a condition check to only allow
	statement in reduction affine closure.
	(vect_pattern_expr_invariant_p): New function.
	(vect_get_affine_operands_mask): Likewise.
	(vect_mark_reduction_affine_closure): Likewise.
	(vect_mark_stmts_for_reduction_pattern_recog): Likewise.
	(vect_get_prev_reduction_stmt): Likewise.
	(vect_mark_reduction_pattern_sequence_formed): Likewise.
	(vect_check_pattern_stmts_for_reduction): Likewise.
	(vect_pattern_recog_1): Check if a pattern recognition would break
	existing lane-reducing pattern statements.
	(vect_pattern_recog): Mark loop reduction affine closure.
---
 gcc/tree-vect-patterns.cc | 722 +++++++++++++++++++++++++++++++++++++-
 gcc/tree-vectorizer.h     |  23 ++
 2 files changed, 742 insertions(+), 3 deletions(-)

From 737e7ea35dff9d85f5dbd5ec908e8b8229a6631d Mon Sep 17 00:00:00 2001
From: Feng Xue <fxue@os.amperecomputing.com>
Date: Mon, 8 Apr 2024 10:57:54 +0800
Subject: [PATCH 2/5] vect: Introduce loop reduction affine closure to vect
 pattern recog

For sum-based loop reduction, its affine closure is composed by statements
whose results and derived computation only end up in the reduction, and are
not used in any non-linear transform operation. The concept underlies the
generalized lane-reducing pattern recognition in the coming patches. As
mathematically proved, it is legitimate to optimize evaluation of a value
with lane-reducing pattern, only if its definition statement locates in affine
closure. That is to say, canonicalized representation for loop reduction
could be of the following affine form, in which "opX" denotes an operation
for lane-reducing pattern, h(i) represents remaining operations irrelvant to
those patterns.

  for (i)
    sum += cst0 * op0 + cst1 * op1 + ... + cstN * opN + h(i);

At initialization, a preprocessing step marks all statements in the affine
closure, which eases retrieval of this property during pattern matching.
Since a pattern hit replaces the original statement with new pattern
statements, a postprocessing step after recognition parses the semantics of
the new statements and incrementally updates the affine closure, or rolls
back the pattern change if it would break the completeness of the existing
closure.
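
The preprocessing walk can be sketched on a toy IR. The sketch is hypothetical
(`Stmt`, `Ir`, `affine_mask` and `mark_closure` are not GCC APIs) and only
loosely modeled on the vect_mark_reduction_affine_closure logic in this patch:
starting from a non-reduction addend, it marks a statement once every use of
its result has been reached from inside the closure, and expands only through
operands that the statement transforms linearly.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy IR: each SSA name maps to its defining statement; names
// absent from the map are loop invariants (constants, external defs).
enum class Op { Load, Plus, Minus, Mult, Square, Store };

struct Stmt
{
  Op op;
  std::vector<std::string> ops;
};

using Ir = std::map<std::string, Stmt>;

static bool invariant_p (const Ir &ir, const std::string &name)
{
  return ir.find (name) == ir.end ();
}

// Bit mask of operands through which the closure may expand.
static unsigned affine_mask (const Ir &ir, const Stmt &s)
{
  switch (s.op)
    {
    case Op::Plus:
    case Op::Minus:
      return (1u << 0) | (1u << 1);
    case Op::Mult:
      if (invariant_p (ir, s.ops[0]))
        return 1u << 1;
      if (invariant_p (ir, s.ops[1]))
        return 1u << 0;
      return 0;
    default:
      return 0;  // Non-linear or non-arithmetic: stop expanding here.
    }
}

// Collect the affine closure reachable from reduction addend START.
static std::set<std::string> mark_closure (const Ir &ir,
                                           const std::string &start)
{
  // Number of uses of each value that have not been reached yet.
  std::map<std::string, unsigned> use_cnt;
  for (const auto &entry : ir)
    for (const auto &opnd : entry.second.ops)
      use_cnt[opnd]++;

  std::set<std::string> closure;
  std::vector<std::string> worklist{start};
  while (!worklist.empty ())
    {
      std::string value = worklist.back ();
      worklist.pop_back ();
      auto it = ir.find (value);
      if (it == ir.end ())
        continue;  // Invariant: nothing to mark.
      if (--use_cnt[value] != 0)
        continue;  // Some use lies outside the closure, or is not reached yet.
      closure.insert (value);
      unsigned mask = affine_mask (ir, it->second);
      for (unsigned i = 0; i < it->second.ops.size (); i++)
        if (mask & (1u << i))
          worklist.push_back (it->second.ops[i]);
    }
  return closure;
}
```

For a loop computing `s = p * c3 + a * a` with `p = a * b`, the closure from
`s` contains `s`, the product with `c3`, `a * a` and `p`, so `p` remains a
dot-product candidate. If a store also consumes `p`, one of its uses is never
reached and `p` stays outside the closure.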

Thus, inside the affine closure, the recog framework can uniformly handle both
lane-reducing and normal patterns. This patch also makes it possible to add
more sophisticated logic to enhance lane-reducing patterns.
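
The rollback rule in the postprocessing step depends on rpatt_formed being
irreversible. Below is a hedged toy sketch of that rule; `Users`, `Status`,
`propagate_formed` and `pattern_acceptable` are hypothetical, and only the
three status values come from this patch. Once a lane-reducing statement is
formed, the bit is pushed forward through its users and around the reduction
cycle, and a later pattern that would evict a formed statement from the
closure is rejected rather than applied.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Status bits, with the values this patch gives enum vect_reduc_pattern_status.
constexpr unsigned rpatt_none = 0;
constexpr unsigned rpatt_allow = 1;
constexpr unsigned rpatt_formed = 2;

// Hypothetical toy def-use graph: each value maps to the values computed
// from it; a Status map holds the rpatt_* bits of each value.
using Users = std::map<std::string, std::vector<std::string>>;
using Status = std::map<std::string, unsigned>;

// Forward propagate rpatt_formed from a newly formed lane-reducing statement.
// The reduction chain is a cycle, so already-formed values stop the walk.
void propagate_formed (const Users &users, Status &status,
                       const std::string &from)
{
  std::vector<std::string> worklist{from};
  while (!worklist.empty ())
    {
      std::string value = worklist.back ();
      worklist.pop_back ();
      if (status[value] & rpatt_formed)
        continue;
      status[value] |= rpatt_formed;
      auto it = users.find (value);
      if (it != users.end ())
        for (const auto &user : it->second)
          worklist.push_back (user);
    }
}

// A pattern that would remove OPERAND from the closure (for example, by
// feeding it to a non-trivial conversion) is acceptable only if OPERAND has
// not been formed, because a formed lane-reducing pattern cannot be reverted.
bool pattern_acceptable (const Status &status, const std::string &operand)
{
  auto it = status.find (operand);
  return it == status.end () || !(it->second & rpatt_formed);
}
```

For instance, with users dp -> s -> sum_next -> sum -> sum_next and a side
input r -> s, forming dp also marks s, sum_next and sum, while r keeps only
rpatt_allow. A later pattern may still evict r from the closure, but not s.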

2024-04-08 Feng Xue <fxue@os.amperecomputing.com>

gcc/
	* tree-vectorizer.h (enum vect_reduc_pattern_status): New enum.
	(_stmt_vec_info): Add a new field reduc_pattern_status.
	* tree-vect-patterns.cc (vect_split_statement): Adjust statement
	status for reduction affine closure.
	(vect_convert_input): Do not reuse conversion statement in process.
	(vect_reassociating_reduction_p): Add a condition check to only allow
	statement in reduction affine closure.
	(vect_pattern_expr_invariant_p): New function.
	(vect_get_affine_operands_mask): Likewise.
	(vect_mark_reduction_affine_closure): Likewise.
	(vect_mark_stmts_for_reduction_pattern_recog): Likewise.
	(vect_get_prev_reduction_stmt): Likewise.
	(vect_mark_reduction_pattern_sequence_formed): Likewise.
	(vect_check_pattern_stmts_for_reduction): Likewise.
	(vect_pattern_recog_1): Check if a pattern recognition would break
	existing lane-reducing pattern statements.
	(vect_pattern_recog): Mark loop reduction affine closure.
---
 gcc/tree-vect-patterns.cc | 722 +++++++++++++++++++++++++++++++++++++-
 gcc/tree-vectorizer.h     |  23 ++
 2 files changed, 742 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index ca8809e7cfd..02f6b942026 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -750,7 +750,6 @@ vect_split_statement (vec_info *vinfo, stmt_vec_info stmt2_info, tree new_rhs,
 	  gimple_stmt_iterator gsi = gsi_for_stmt (stmt2_info->stmt, def_seq);
 	  gsi_insert_before_without_update (&gsi, stmt1, GSI_SAME_STMT);
 	}
-      return true;
     }
   else
     {
@@ -783,9 +782,35 @@ vect_split_statement (vec_info *vinfo, stmt_vec_info stmt2_info, tree new_rhs,
 	  dump_printf_loc (MSG_NOTE, vect_location, "and: %G",
 			   (gimple *) new_stmt2);
 	}
+    }
 
-      return true;
+  /* Since this function changes the existing conversion statement whether or
+     not the pattern is finally applied, check whether the affine closure of
+     the loop reduction needs to be adjusted for the impacted statements.  */
+  unsigned int status = stmt2_info->reduc_pattern_status;
+
+  if (status != rpatt_none)
+    {
+      tree rhs_type = TREE_TYPE (gimple_assign_rhs1 (stmt1));
+      tree new_rhs_type = TREE_TYPE (new_rhs);
+
+      /* The new statement generated by splitting is a natural widening
+	 conversion.  */
+      gcc_assert (TYPE_PRECISION (rhs_type) < TYPE_PRECISION (new_rhs_type));
+      gcc_assert (TYPE_UNSIGNED (rhs_type) || !TYPE_UNSIGNED (new_rhs_type));
+
+      /* The new statement does not break transform invariance of a
+	 lane-reducing operation if the original conversion depends on one
+	 formed previously.  In that case, it should also be marked with
+	 rpatt_formed status.  */
+      if (status & rpatt_formed)
+	vinfo->lookup_stmt (stmt1)->reduc_pattern_status = rpatt_formed;
+
+      if (!is_pattern_stmt_p (stmt2_info))
+	STMT_VINFO_RELATED_STMT (stmt2_info)->reduc_pattern_status = status;
     }
+
+  return true;
 }
 
 /* Look for the following pattern
@@ -890,7 +915,10 @@ vect_convert_input (vec_info *vinfo, stmt_vec_info stmt_info, tree type,
     return wide_int_to_tree (type, wi::to_widest (unprom->op));
 
   tree input = unprom->op;
-  if (unprom->caster)
+
+  /* Do not reuse the conversion if it is the very statement under pattern
+     recognition.  */
+  if (unprom->caster && unprom->caster != stmt_info)
     {
       tree lhs = gimple_get_lhs (unprom->caster->stmt);
       tree lhs_type = TREE_TYPE (lhs);
@@ -1018,6 +1046,11 @@ vect_reassociating_reduction_p (vec_info *vinfo,
   if (!loop_info)
     return false;
 
+  /* As a candidate for lane-reducing pattern matching, the statement must
+     be inside the affine closure of the loop reduction.  */
+  if (!(stmt_info->reduc_pattern_status & rpatt_allow))
+    return false;
+
   gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
   if (!assign || gimple_assign_rhs_code (assign) != code)
     return false;
@@ -7201,6 +7234,672 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
 
 const unsigned int NUM_PATTERNS = ARRAY_SIZE (vect_vect_recog_func_ptrs);
 
+/* Check if EXPR is invariant with respect to vectorization region VINFO.  */
+
+static bool
+vect_pattern_expr_invariant_p (vec_info *vinfo, tree expr)
+{
+  enum vect_def_type dt;
+
+  if (TREE_CODE (expr) == SSA_NAME)
+    {
+      if (SSA_NAME_IS_DEFAULT_DEF (expr))
+	return true;
+
+      /* This is a value that is defined by a pattern statement that has not
+	 been bound to its original statement yet.  */
+      if (!gimple_bb (SSA_NAME_DEF_STMT (expr)))
+	return false;
+    }
+
+  if (!vect_is_simple_use (expr, vinfo, &dt))
+    return false;
+
+  if (dt == vect_external_def || dt == vect_constant_def)
+    return true;
+
+  return false;
+}
+
+/* If OP is a linear transform operation, return a bit mask of the indices of
+   all possible variant operands; otherwise, return 0.  */
+
+static int
+vect_get_affine_operands_mask (vec_info *vinfo, const gimple_match_op &op)
+{
+  switch (op.code.safe_as_tree_code ())
+    {
+      CASE_CONVERT:
+	if (!tree_nop_conversion_p (op.type, TREE_TYPE (op.ops[0])))
+	  break;
+	/* FALLTHRU */
+
+      case SSA_NAME:
+      case NEGATE_EXPR:
+      case BIT_NOT_EXPR:
+	return 1 << 0;
+
+      case PLUS_EXPR:
+      case MINUS_EXPR:
+	return (1 << 0) | (1 << 1);
+
+      case MULT_EXPR:
+	if (vect_pattern_expr_invariant_p (vinfo, op.ops[0]))
+	  return 1 << 1;
+	/* FALLTHRU */
+
+      case LSHIFT_EXPR:
+	if (vect_pattern_expr_invariant_p (vinfo, op.ops[1]))
+	  return 1 << 0;
+	break;
+
+      default:
+	if (lane_reducing_op_p (op.code))
+	  {
+	    /* The last operand of lane-reducing op is for reduction.  */
+	    gcc_assert (op.num_ops > 1);
+	    return 1 << (op.num_ops - 1);
+	  }
+	break;
+    }
+
+  return 0;
+}
+
+/* Mark all statements in the affine closure whose computation leads to START,
+   a non-reduction addend of a loop reduction statement.  The corresponding
+   reduction PHI is represented by REDUC_INFO.  For each SSA name defined by a
+   marked statement, record the number of uses that have not been marked so
+   far into hash map USE_CNT_MAP.  This function is to be called for all
+   reduction statements in the loop.  */
+
+static void
+vect_mark_reduction_affine_closure (loop_vec_info loop_vinfo,
+				    tree start, stmt_vec_info reduc_info,
+				    hash_map<tree, unsigned> &use_cnt_map)
+{
+  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+  auto_vec<tree> worklist;
+
+  worklist.safe_push (start);
+
+  do
+    {
+      tree value = worklist.pop ();
+      stmt_vec_info stmt_info = loop_vinfo->lookup_def (value);
+
+      if (!stmt_info || STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def)
+	continue;
+
+      if (!has_single_use (value))
+	{
+	  bool exist;
+	  auto &use_cnt = use_cnt_map.get_or_insert (value, &exist);
+
+	  if (!exist)
+	    use_cnt = num_imm_uses (value);
+
+	  gcc_checking_assert (use_cnt > 0);
+
+	  /* As long as the value is not referenced by any statement outside
+	     the reduction affine closure, we are free to apply lane-reducing
+	     patterns to it without duplication, whether or not the value has a
+	     single use.  Thus even sharing a lane-reducing operation among
+	     multiple loop reductions is possible.  */
+	  if (--use_cnt)
+	    continue;
+	}
+
+      gimple *stmt = stmt_info->stmt;
+      gimple_match_op op;
+
+      /* Skip reduction PHI statement and leaf statement like "x = const".  */
+      if (!gimple_extract_op (stmt, &op))
+	continue;
+
+      if (needs_fold_left_reduction_p (op.type, op.code)
+	  || gimple_bb (stmt)->loop_father != loop)
+	continue;
+
+      stmt_info->reduc_pattern_status = rpatt_allow;
+
+      /* Vectorizable analysis and transform on lane-reducing operation needs
+	 some information in the associated reduction PHI statement.  */
+      STMT_VINFO_REDUC_DEF (stmt_info) = reduc_info;
+
+      if (auto mask = vect_get_affine_operands_mask (loop_vinfo, op))
+	{
+	  /* Try to expand affine closure to dependent affine operands.  */
+	  for (unsigned i = 0; i < op.num_ops; i++)
+	    {
+	      if (mask & (1 << i))
+		worklist.safe_push (op.ops[i]);
+	    }
+	}
+    } while (!worklist.is_empty ());
+}
+
+/* The prerequisite for optimizing evaluation of a value with a lane-reducing
+   pattern is that its definition statement must be located in the affine
+   closure of non-reduction addends of loop reduction statements.  To be
+   specific, the value and all computation derived from it only end up in loop
+   reductions, and are not used in any non-linear transform operation.  That
+   is to say, if such patterns are matched, the final pattern statements for
+   the loop reduction could be canonicalized to the following affine form, in
+   which "opX" denotes a lane-reducing operation and h(i) represents other
+   operations irrelevant to those patterns.
+
+     for (i)
+       sum += cst0 * op0 + cst1 * op1 + ... + cstN * opN + h(i);
+
+   This function traverses all loop reductions to discover affine closures
+   and mark all statements inside them.  */
+
+static void
+vect_mark_stmts_for_reduction_pattern_recog (loop_vec_info loop_vinfo)
+{
+  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+  const edge latch = loop_latch_edge (loop);
+  basic_block header = loop->header;
+  hash_map<tree, unsigned> use_cnt_map;
+
+  DUMP_VECT_SCOPE ("vect_mark_stmts_for_reduction_pattern_recog");
+
+  for (auto gsi = gsi_start_phis (header); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      gphi *phi = gsi.phi ();
+      stmt_vec_info reduc_info = loop_vinfo->lookup_stmt (phi);
+
+      if (!reduc_info
+	  || STMT_VINFO_DEF_TYPE (reduc_info) != vect_reduction_def
+	  || STMT_VINFO_REDUC_CODE (reduc_info) != PLUS_EXPR
+	  || STMT_VINFO_REDUC_TYPE (reduc_info) != TREE_CODE_REDUCTION)
+	continue;
+
+      tree start_def = PHI_RESULT (phi);
+      tree reduc_def = PHI_ARG_DEF_FROM_EDGE (phi, latch);
+      auto_vec<stmt_vec_info, 8> reduc_stmts;
+      auto_vec<tree, 8> addends;
+
+      while (reduc_def != start_def)
+	{
+	  gimple *stmt = SSA_NAME_DEF_STMT (reduc_def);
+	  gimple_match_op op;
+
+	  /* Do not step into inner loop.  */
+	  if (gimple_bb (stmt)->loop_father != loop)
+	    break;
+
+	  if (!gimple_extract_op (stmt, &op))
+	    {
+	      gcc_assert (gimple_code (stmt) == GIMPLE_PHI);
+	      break;
+	    }
+
+	  stmt_vec_info stmt_info = loop_vinfo->lookup_stmt (stmt);
+	  int reduc_idx = STMT_VINFO_REDUC_IDX (stmt_info);
+
+	  gcc_assert (reduc_idx >= 0 && reduc_idx < (int) op.num_ops);
+
+	  if (op.code == PLUS_EXPR || op.code == MINUS_EXPR)
+	    {
+	      if (needs_fold_left_reduction_p (op.type, op.code))
+		break;
+
+	      /* Record non-reduction addend.  */
+	      addends.safe_push (op.ops[reduc_idx ? 0 : 1]);
+	    }
+	  else
+	    {
+	      gcc_assert (CONVERT_EXPR_CODE_P (op.code));
+	      if (!tree_nop_conversion_p (op.type, TREE_TYPE (op.ops[0])))
+		break;
+	    }
+
+	  reduc_stmts.safe_push (stmt_info);
+	  reduc_def = op.ops[reduc_idx];
+	}
+
+      if (reduc_def == start_def)
+	{
+	  /* Mark reduction PHI statement although it would not be matched
+	     against any pattern.  */
+	  reduc_info->reduc_pattern_status = rpatt_allow;
+
+	  for (auto stmt_info : reduc_stmts)
+	    {
+	      /* Mark reduction statement itself.  */
+	      stmt_info->reduc_pattern_status = rpatt_allow;
+
+	      /* Vectorizable analysis and transform on lane-reducing operation
+		 needs some information in the associated reduction PHI
+		 statement.  */
+	      STMT_VINFO_REDUC_DEF (stmt_info) = reduc_info;
+	    }
+
+	  /* Mark statements that participate in loop reduction indirectly
+	     through non-reduction addends.  */
+	  for (auto addend : addends)
+	    vect_mark_reduction_affine_closure (loop_vinfo, addend,
+						reduc_info, use_cnt_map);
+	}
+    }
+}
+
+/* For a reduction statement STMT_INFO, which could also be the reduction PHI,
+   return the previous reduction statement that it depends on.  */
+
+static stmt_vec_info
+vect_get_prev_reduction_stmt (loop_vec_info loop_vinfo,
+			      stmt_vec_info stmt_info)
+{
+  gimple *stmt = stmt_info->stmt;
+  tree prev_def;
+
+  if (is_a <gphi *> (stmt))
+    {
+      class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+      const edge latch = loop_latch_edge (loop);
+
+      gcc_assert (STMT_VINFO_REDUC_DEF (stmt_info));
+      gcc_assert (loop == gimple_bb (stmt)->loop_father);
+      prev_def = PHI_ARG_DEF_FROM_EDGE (stmt, latch);
+    }
+  else
+    {
+      int reduc_idx = STMT_VINFO_REDUC_IDX (stmt_info);
+      gimple_match_op op;
+
+      if (!gimple_extract_op (stmt, &op))
+	gcc_unreachable ();
+
+      gcc_assert (reduc_idx >= 0 && reduc_idx < (int) op.num_ops);
+      prev_def = op.ops[reduc_idx];
+    }
+
+  return vect_stmt_to_vectorize (loop_vinfo->lookup_def (prev_def));
+}
+
+/* Given the pattern statement sequence for ORIG_STMT_INFO (including
+   PATTERN_STMT and STMT_VINFO_PATTERN_DEF_SEQ), a subset of it represented by
+   FORMED_STMTS is known to depend on (or be) lane-reducing operations.  This
+   function first marks the subset with rpatt_formed, then forward propagates
+   the status to every dependent pattern statement along paths that contribute
+   to PATTERN_STMT; other statements remain unchanged.  FORMED_STMTS is reset
+   to empty upon completion.  */
+
+static void
+vect_mark_reduction_pattern_sequence_formed (loop_vec_info loop_vinfo,
+					     stmt_vec_info orig_stmt_info,
+					     gimple *pattern_stmt,
+					     vec<stmt_vec_info> &formed_stmts)
+{
+  stmt_vec_info last_stmt = formed_stmts.last ();
+  stmt_vec_info related_stmt = STMT_VINFO_RELATED_STMT (last_stmt);
+  gimple_seq pattern_seq = STMT_VINFO_PATTERN_DEF_SEQ (orig_stmt_info);
+  hash_map<tree, auto_vec<stmt_vec_info>> use_map;
+
+  /* Due to lack of a mechanism to quickly get immediate uses for a pattern
+     def, we have to build a simple def-use graph out of pattern statement
+     sequence.  */
+  for (auto seq : { pattern_stmt, pattern_seq })
+    for (auto gsi = gsi_last (seq); !gsi_end_p (gsi); gsi_prev (&gsi))
+      {
+	stmt_vec_info stmt_info = loop_vinfo->lookup_stmt (gsi_stmt (gsi));
+	gimple_match_op op;
+
+	gcc_assert (STMT_VINFO_RELATED_STMT (stmt_info) == related_stmt);
+
+	/* Since elements are pushed to FORMED_STMTS in order of increasing
+	   distance from PATTERN_STMT, pattern statements visited after the last
+	   one in the set cannot depend on any lane-reducing operation, so
+	   there is no need to process them.  */
+	if (stmt_info == last_stmt)
+	  goto out;
+
+	if (!gimple_extract_op (stmt_info->stmt, &op))
+	  continue;
+
+	for (unsigned i = 0; i < op.num_ops; i++)
+	  {
+	    if (TREE_CODE (op.ops[i]) == SSA_NAME)
+	      use_map.get_or_insert (op.ops[i]).safe_push (stmt_info);
+	  }
+      }
+out:
+
+  basic_block bb = gimple_bb (pattern_stmt);
+
+  do
+    {
+      stmt_vec_info stmt_info = formed_stmts.pop ();
+      gimple *stmt = stmt_info->stmt;
+
+      gcc_assert (gimple_bb (stmt) == bb);
+
+      /* A statement may be reached from more than one lane-reducing
+	 operation, for example, when two dot-products are added
+	 together.  */
+      if (stmt_info->reduc_pattern_status & rpatt_formed)
+	continue;
+
+      stmt_info->reduc_pattern_status |= rpatt_formed;
+
+      /* Do not propagate status outside of pattern statement sequence.  */
+      if (stmt != pattern_stmt)
+	{
+	  auto *uses = use_map.get (gimple_get_lhs (stmt));
+
+	  gcc_assert (uses);
+	  for (auto use : *uses)
+	    formed_stmts.safe_push (use);
+	}
+    } while (!formed_stmts.is_empty ());
+}
+
+/* A successful pattern recognition replaces the matched statement with new
+   pattern statements, which might change the loop reduction affine closure.
+   On the one hand, a new linear-transform-like pattern statement could be
+   pulled into the closure; for example, this could happen with a pattern that
+   decomposes a mult-by-constant into a series of additions and shifts.  On
+   the other hand, some statements originally in the closure have to be kicked
+   out if the linearity of a relay statement linking into the closure would be
+   broken, for instance, due to the introduction of a non-trivial conversion.
+   However, this leads to a conflict when the impacted statement connects a
+   lane-reducing statement and a loop reduction statement, since a
+   lane-reducing pattern cannot be reverted once it has been formed.  The only
+   alternative is to invalidate the other pattern in process.
+
+   Therefore, after a pattern is recognized on ORIG_STMT_INFO, this function
+   is called to parse the semantics of all new pattern statements (including
+   PATTERN_STMT), and to check whether the resulting adjustment of the affine
+   closure would conflict with existing lane-reducing statements.  Return
+   true if there is no conflict, otherwise return false.  */
+
+static bool
+vect_check_pattern_stmts_for_reduction (loop_vec_info loop_vinfo,
+					stmt_vec_info orig_stmt_info,
+					gimple *pattern_stmt)
+{
+  unsigned status = orig_stmt_info->reduc_pattern_status;
+
+  /* Nothing to be done if original statement is reduction irrelevant.  */
+  if (status == rpatt_none)
+    return true;
+
+  /* Degraded lane-reducing statement is not in reduction affine closure.
+     Pattern recognition on such statement should be very rare.  Do not allow
+     it for simplicity.  */
+  if (!(status & rpatt_allow))
+    return false;
+
+  auto_vec<stmt_vec_info> non_reduc_stmts;
+  auto_vec<stmt_vec_info> rpatt_formed_stmts;
+  gimple_seq pattern_seq = STMT_VINFO_PATTERN_DEF_SEQ (orig_stmt_info);
+  gimple_match_op op;
+
+  for (auto seq : { pattern_stmt, pattern_seq })
+    for (auto gsi = gsi_last (seq); !gsi_end_p (gsi); gsi_prev (&gsi))
+      {
+	gimple *stmt = gsi_stmt (gsi);
+	stmt_vec_info stmt_info = loop_vinfo->lookup_stmt (stmt);
+
+	if (!stmt_info)
+	  stmt_info = loop_vinfo->add_stmt (stmt);
+
+	/* Replacement of original statement by pattern statement sequence has
+	   not been committed yet, so basic block is not set.  This fact could
+	   be used to distinguish these pending pattern statements from
+	   existing ones.  */
+	gcc_assert (!gimple_bb (stmt));
+
+	/* Initially mark pattern statement as in affine closure, and this
+	   status might be changed later according to def/use relationship
+	   among all pattern statements.  */
+	stmt_info->reduc_pattern_status = rpatt_allow;
+      }
+
+  /* Traverse statements in the order that use precedes def.  */
+  for (auto seq : { pattern_stmt, pattern_seq })
+    for (auto gsi = gsi_last (seq); !gsi_end_p (gsi); gsi_prev (&gsi))
+      {
+	gimple *stmt = gsi_stmt (gsi);
+
+	/* Nothing more to do for a leaf statement like "x = const".  */
+	if (!gimple_extract_op (stmt, &op))
+	  continue;
+
+	stmt_vec_info stmt_info = loop_vinfo->lookup_stmt (stmt);
+	int affine_oprnds_mask = 0;
+
+	if (needs_fold_left_reduction_p (op.type, op.code))
+	  stmt_info->reduc_pattern_status = rpatt_none;
+	else if (stmt_info->reduc_pattern_status == rpatt_allow)
+	  affine_oprnds_mask = vect_get_affine_operands_mask (loop_vinfo, op);
+
+	/* Record lane-reducing statement into a set from which forward
+	   propagation of rpatt_formed status would start.  */
+	if (lane_reducing_op_p (op.code))
+	  rpatt_formed_stmts.safe_push (stmt_info);
+
+	for (unsigned i = 0; i < op.num_ops; i++)
+	  {
+	    tree oprnd = op.ops[i];
+	    stmt_vec_info oprnd_info = loop_vinfo->lookup_def (oprnd);
+
+	    if (oprnd_info)
+	      oprnd_info = vect_stmt_to_vectorize (oprnd_info);
+	    else if (vect_pattern_expr_invariant_p (loop_vinfo, oprnd))
+	      continue;
+	    else
+	      {
+		/* Pattern statement contains unvectorizable operand, simply
+		   bail out.  */
+		return false;
+	      }
+
+	    if (!(affine_oprnds_mask & (1 << i)))
+	      {
+		/* It is expected that this operand would not be in affine
+		   closure.  */
+
+		if (!gimple_bb (oprnd_info->stmt))
+		  {
+		    /* The operand is defined by another uncommitted pattern
+		       statement, whose status should be changed to
+		       rpatt_none.  */
+		    oprnd_info->reduc_pattern_status = rpatt_none;
+		  }
+		else if (oprnd_info->reduc_pattern_status & rpatt_formed)
+		  {
+		    /* Conflict with an existing lane-reducing pattern
+		       statement, so fail the check.  TODO: Allow pattern
+		       statement that uses value defined by degraded lane-
+		       reducing statement.  */
+		    return false;
+		  }
+		else if (oprnd_info->reduc_pattern_status & rpatt_allow)
+		  {
+		    /* This statement has to be removed from affine closure.
+		       Here only record it into a set, and the actual removal
+		       action will be recursively performed later on it and
+		       all statements that are linked to the closure through
+		       it.  */
+		    non_reduc_stmts.safe_push (oprnd_info);
+		  }
+	      }
+	    else if (oprnd_info->reduc_pattern_status & rpatt_formed)
+	      {
+		/* There must be a path from the original statement to some
+		   lane-reducing statement.  */
+		gcc_assert (status & rpatt_formed);
+
+		/* The operand definition statement should not be an
+		   uncommitted pattern statement, for which propagation of
+		   rpatt_formed status has not been started.  */
+		gcc_assert (gimple_bb (oprnd_info->stmt));
+
+		/* The operand definition statement should be in the reduction
+		   affine closure.  */
+		gcc_assert (oprnd_info->reduc_pattern_status & rpatt_allow);
+
+		/* This uncommitted pattern statement is a boundary point to
+		   which rpatt_formed status would be propagated from other
+		   existing statements.  */
+		rpatt_formed_stmts.safe_push (stmt_info);
+	      }
+	  }
+      }
+
+  /* Forward propagate rpatt_formed status inside uncommitted pattern statement
+     sequence.  */
+  if (!rpatt_formed_stmts.is_empty ())
+    vect_mark_reduction_pattern_sequence_formed (loop_vinfo, orig_stmt_info,
+						 pattern_stmt,
+						 rpatt_formed_stmts);
+
+  stmt_vec_info pattern_stmt_info = loop_vinfo->lookup_stmt (pattern_stmt);
+  unsigned pattern_status = pattern_stmt_info->reduc_pattern_status;
+
+  /* Overriding formed lane-reducing operation by another new normal pattern
+     matching is not allowed.  */
+  if ((status & rpatt_formed) && !(pattern_status & rpatt_formed))
+    return false;
+
+  if (pattern_status == rpatt_none && vect_is_reduction (orig_stmt_info))
+    {
+      auto prev = vect_get_prev_reduction_stmt (loop_vinfo, orig_stmt_info);
+
+      /* Since statements in a reduction chain are cyclically dependent, we
+	 have to exclude the whole chain from affine closure if one reduction
+	 statement does not meet lane-reducing prerequisite.  Then prepare for
+	 propagating rpatt_none status to the previous reduction statement.  */
+      non_reduc_stmts.safe_push (prev);
+
+      /* Terminate propagation when rotating back to the original
+	 statement.  */
+      orig_stmt_info->reduc_pattern_status = rpatt_none;
+    }
+
+  /* Backward propagate rpatt_none status to existing statements.  */
+  while (!non_reduc_stmts.is_empty ())
+    {
+      stmt_vec_info stmt_info = non_reduc_stmts.pop ();
+      gimple *stmt = stmt_info->stmt;
+
+      gcc_assert (gimple_bb (stmt));
+
+      if (stmt_info->reduc_pattern_status == rpatt_none)
+	continue;
+
+      gcc_assert (!(stmt_info->reduc_pattern_status & rpatt_formed));
+      stmt_info->reduc_pattern_status = rpatt_none;
+
+      if (is_a <gphi *> (stmt))
+	{
+	  auto prev = vect_get_prev_reduction_stmt (loop_vinfo, stmt_info);
+
+	  /* For reduction PHI, propagation should be confined inside the loop,
+	     so only through latch edge.  */
+	  non_reduc_stmts.safe_push (prev);
+	  continue;
+	}
+
+      if (!gimple_extract_op (stmt, &op))
+	continue;
+
+      gcc_assert (!lane_reducing_op_p (op.code));
+
+      for (unsigned i = 0; i < op.num_ops; i++)
+	{
+	  if (auto oprnd_info = loop_vinfo->lookup_def (op.ops[i]))
+	    non_reduc_stmts.safe_push (vect_stmt_to_vectorize (oprnd_info));
+	}
+    }
+
+  if ((status & rpatt_formed) || !(pattern_status & rpatt_formed))
+    {
+      /* If a lane-reducing statement already exists on another path to the
+	 original statement, there is no need to propagate rpatt_formed status
+	 again.  If no lane-reducing statement is generated, nothing to do.  */
+      return true;
+    }
+
+  rpatt_formed_stmts.safe_push (orig_stmt_info);
+
+  /* Forward propagate rpatt_formed status inside existing pattern statement
+     sequence.  */
+  if (is_pattern_stmt_p (orig_stmt_info))
+    {
+      stmt_vec_info root_orig_info = vect_orig_stmt (orig_stmt_info);
+      stmt_vec_info root_pattern = vect_stmt_to_vectorize (root_orig_info);
+
+      vect_mark_reduction_pattern_sequence_formed (loop_vinfo, root_orig_info,
+						   root_pattern->stmt,
+						   rpatt_formed_stmts);
+
+      gcc_assert (root_pattern->reduc_pattern_status & rpatt_formed);
+      rpatt_formed_stmts.safe_push (root_orig_info);
+    }
+
+  /* Forward propagate rpatt_formed status to existing statements that have
+     not been processed for pattern recognition.  */
+  do
+    {
+      stmt_vec_info stmt_info = rpatt_formed_stmts.pop ();
+      gimple *stmt = stmt_info->stmt;
+
+      gcc_assert (gimple_bb (stmt));
+
+      if (stmt_info->reduc_pattern_status & rpatt_formed)
+	continue;
+
+      gcc_assert (stmt_info->reduc_pattern_status & rpatt_allow);
+      stmt_info->reduc_pattern_status |= rpatt_formed;
+
+      if (vect_is_reduction (stmt_info) || is_a <gphi *> (stmt))
+	{
+	  auto prev = vect_get_prev_reduction_stmt (loop_vinfo, stmt_info);
+
+	  /* We must consider statements in reduction chain as a whole in order
+	     to ensure legality of lane-reducing operations, for which all
+	     reduction statements should be marked with rpatt_formed status.
+	     As a special handling, here we traverse reduction statements
+	     backward, because some of them might be created by patterns,
+	     and currently there is no straightforward way to obtain
+	     immediate uses for a value defined by a pattern statement.  Since
+	     reduction statements form a cycle, this approach leads to the
+	     same marking as forward propagation.  */
+	  rpatt_formed_stmts.safe_push (prev);
+	  continue;
+	}
+
+      tree lhs = gimple_get_lhs (stmt);
+      imm_use_iterator iter;
+      gimple *use_stmt;
+
+      /* The statement has not been processed yet, so we could walk def/use
+	 chain by normal means.  */
+      gcc_assert (!STMT_VINFO_IN_PATTERN_P (stmt_info));
+      gcc_assert (!is_pattern_stmt_p (stmt_info));
+
+      FOR_EACH_IMM_USE_STMT (use_stmt, iter, lhs)
+	{
+	  if (is_gimple_debug (use_stmt))
+	    continue;
+
+	  stmt_vec_info use_stmt_info = loop_vinfo->lookup_stmt (use_stmt);
+
+	  /* Because the statement is neither a reduction statement nor a PHI,
+	     it should not have any use outside of the loop.  */
+	  gcc_assert (gimple_has_lhs (use_stmt) && use_stmt_info);
+	  rpatt_formed_stmts.safe_push (use_stmt_info);
+	}
+    } while (!rpatt_formed_stmts.is_empty ());
+
+  return true;
+}
+
 /* Mark statements that are involved in a pattern.  */
 
 void
@@ -7383,6 +8082,19 @@ vect_pattern_recog_1 (vec_info *vinfo,
       return;
     }
 
+  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
+
+  /* Check if the pattern would break existing lane-reducing pattern
+     statements.  */
+  if (loop_vinfo
+      && !vect_check_pattern_stmts_for_reduction (loop_vinfo, stmt_info,
+						  pattern_stmt))
+    {
+      /* Invalidate the pattern when detecting conflict.  */
+      STMT_VINFO_PATTERN_DEF_SEQ (stmt_info) = NULL;
+      return;
+    }
+
   /* Found a vectorizable pattern.  */
   if (dump_enabled_p ())
     dump_printf_loc (MSG_NOTE, vect_location,
@@ -7481,6 +8193,10 @@ vect_pattern_recog (vec_info *vinfo)
 
   DUMP_VECT_SCOPE ("vect_pattern_recog");
 
+  /* Mark loop reduction affine closure for lane-reducing patterns.  */
+  if (auto loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
+    vect_mark_stmts_for_reduction_pattern_recog (loop_vinfo);
+
   /* Scan through the stmts in the region, applying the pattern recognition
      functions starting at each stmt visited.  */
   for (unsigned i = 0; i < nbbs; i++)
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index df6c8ada2f7..52793ee87e9 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1185,6 +1185,25 @@ enum slp_vect_type {
   hybrid
 };
 
+/* The status of a statement for lane-reducing pattern matching.  */
+enum vect_reduc_pattern_status {
+  /* Statement is not in loop reduction affine closure.  */
+  rpatt_none = 0,
+
+  /* Statement is part of loop reduction affine closure, so it is a candidate
+     for lane-reducing patterns.  */
+  rpatt_allow = 1,
+
+  /* Statement is or depends on a lane-reducing pattern statement.  Once
+     marked, the status cannot be changed.  In most situations, the
+     statement also has the status of rpatt_allow.  One exception is when
+     lane-reducing to a given result type is not supported by the target,
+     and we settle for second best by creating a degraded lane-reducing
+     statement with a smaller intermediate result type.  Such a statement
+     is not in the affine closure.  */
+  rpatt_formed = 2
+};
+
 /* Says whether a statement is a load, a store of a vectorized statement
    result, or a store of an invariant value.  */
 enum vec_load_store_type {
@@ -1431,6 +1450,10 @@ public:
   /* Whether on this stmt reduction meta is recorded.  */
   bool is_reduc_info;
 
+  /* Describe how the statement would be handled when performing lane-reducing
+     pattern matching.  */
+  unsigned int reduc_pattern_status;
+
   /* If nonzero, the lhs of the statement could be truncated to this
      many bits without affecting any users of the result.  */
   unsigned int min_output_precision;
-- 
2.17.1


From patchwork Sun Jul 21 09:15:43 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Feng Xue OS <fxue@os.amperecomputing.com>
X-Patchwork-Id: 1962885
Return-Path: <gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@legolas.ozlabs.org
Authentication-Results: legolas.ozlabs.org;
	dkim=fail reason="signature verification failed" (1024-bit key;
 unprotected) header.d=os.amperecomputing.com header.i=@os.amperecomputing.com
 header.a=rsa-sha256 header.s=selector2 header.b=v4Ukireg;
	dkim-atps=neutral
Authentication-Results: legolas.ozlabs.org;
 spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org
 (client-ip=8.43.85.97; helo=server2.sourceware.org;
 envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org;
 receiver=patchwork.ozlabs.org)
Received: from server2.sourceware.org (server2.sourceware.org [8.43.85.97])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384)
	(No client certificate requested)
	by legolas.ozlabs.org (Postfix) with ESMTPS id 4WRd7p5Jdbz1yYm
	for <incoming@patchwork.ozlabs.org>; Sun, 21 Jul 2024 19:16:22 +1000 (AEST)
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 50AF33861003
	for <incoming@patchwork.ozlabs.org>; Sun, 21 Jul 2024 09:16:20 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from CY4PR05CU001.outbound.protection.outlook.com
 (mail-westcentralusazlp170100000.outbound.protection.outlook.com
 [IPv6:2a01:111:f403:c112::])
 by sourceware.org (Postfix) with ESMTPS id 40E193858C39
 for <gcc-patches@gcc.gnu.org>; Sun, 21 Jul 2024 09:15:45 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 40E193858C39
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=os.amperecomputing.com
Authentication-Results: sourceware.org;
 spf=pass smtp.mailfrom=os.amperecomputing.com
ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 40E193858C39
Authentication-Results: server2.sourceware.org;
 arc=pass smtp.remote-ip=2a01:111:f403:c112::
ARC-Seal: i=2; a=rsa-sha256; d=sourceware.org; s=key; t=1721553348; cv=pass;
 b=ec6VHfYfroLLe2fzgpGTRn3icN9VCvpJhLHQb2zRUyfNbtrLNSSX51idXE29hfeIvzEO11o8u5upwAURobj8Q4gtwT4ZxcoEfFQLu7X2K1tCubI4lyxNEs7iNIaWHWq9BD7LsTOI13NTnQxvbQ0PnUJJxXFYhJryETx5cxFa8Vs=
ARC-Message-Signature: i=2; a=rsa-sha256; d=sourceware.org; s=key;
 t=1721553348; c=relaxed/simple;
 bh=qINA8VTV7GbdLAwUheK5t4/gJI4n4VSpALbFfy+FEBE=;
 h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version;
 b=VxHwEX2fOGfc0AwVz1eJuAbhJKOxEQ8QKPpUnWdZIQw1VmhN2J4hmXtrHkh9EGGD+TXgvagAkirTTP6tazS+7AdpVncqlpvxKK+wIBRBXk2ccDBUaBcfSq0Y1TDpckN6Jmyo2AG5p9RymR4KyKXQGQfGNru4i4POhelyZtB6GZA=
ARC-Authentication-Results: i=2; server2.sourceware.org
ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none;
 b=bCzdy72zVj41bKq1wXMFxe1xIgjOgDnFQ3APshTj8dh9+rytkEOqhSbYyB19Jjxan/UnhxcMcc67dQlbL6lACnlG16uiZnNQ+wKtc2l9y9fkJus81I9lE47/0xZcffxM1cMLFcM20TyFw/76bn9Xb90RzFEy8ePPId7MgiAZxd+FgQQIbVDquvCJ10trwjkwoP+mPEEefmytQWY9SF+sS8EY1E2N15QSrnDTMMvm7UGDnNGztVF9xb14OpAQV6dKHIOwadocCqhTkp/gK24cROCNWtlZa2r7t3oLtT/FCkrhPMbOL7mryjbmFEU5ikBocc2/LLaw6eqhAt63QyOHXg==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com;
 s=arcselector10001;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1;
 bh=9NhSv7QuLcWK8WqLvXcbBxU2qbpciavLBN+hPP0giwI=;
 b=zDq/8O3ldUMO+7gvKZEUXQWEMDvsze4kXzRFrwtVcANpetD9HIH4WH5RwBc3dz/8N1gf5dxHZ0k90/BEv2DqVppD9e4po55n+StRC/E510hHFubMCq6p9dDqlsUtT/2RsYY1E9TBK+AdpJaepr0LEM0iiBv0yYEf3GzbWhXLSSQAMj5A27O5brLLHa10x3Xapl0OdIY8WlvTnBSH0HWVfSlr9hX+4dN8hbf2JB/q7ezk5SobsA1S/n8aFaQ7+Hj0UL0SQKekNSd+fVcdhJMp7E+FirxNb5o7UQHBWwWO/4XV1o22QdfCzwVQVF/LwCZezn/unzocfb1O+0e3M89X1g==
ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass
 smtp.mailfrom=os.amperecomputing.com; dmarc=pass action=none
 header.from=os.amperecomputing.com; dkim=pass
 header.d=os.amperecomputing.com; arc=none
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=os.amperecomputing.com; s=selector2;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;
 bh=9NhSv7QuLcWK8WqLvXcbBxU2qbpciavLBN+hPP0giwI=;
 b=v4Ukireg6W7uBC3iA/HFH3vYB7Hr/GvOXlr8emWjBwC6/b2p29QchH4p3HdAy4g6JuZojT9BGAkyFBsmJw1nYOn3tzwASJkSwrTzZRHCq09QeOhOwbeQNmweMwhuOtLfpCqdMZlGkeWUVcG7fxGsbyidRfGrKTpqNcHiLWqH/z0=
Received: from LV2PR01MB7839.prod.exchangelabs.com (2603:10b6:408:14f::13) by
 DS7PR01MB7855.prod.exchangelabs.com (2603:10b6:8:82::13) with
 Microsoft SMTP
 Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 15.20.7784.14; Sun, 21 Jul 2024 09:15:43 +0000
Received: from LV2PR01MB7839.prod.exchangelabs.com
 ([fe80::2ac3:5a77:36fd:9c63]) by LV2PR01MB7839.prod.exchangelabs.com
 ([fe80::2ac3:5a77:36fd:9c63%4]) with mapi id 15.20.7784.016; Sun, 21 Jul 2024
 09:15:43 +0000
From: Feng Xue OS <fxue@os.amperecomputing.com>
To: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
CC: Richard Biener <richard.guenther@gmail.com>, Tamar Christina
 <Tamar.Christina@arm.com>, Richard Sandiford <Richard.Sandiford@arm.com>
Subject: [RFC][PATCH 3/5] vect: Enable lane-reducing operation that is not
 loop reduction statement
Thread-Topic: [RFC][PATCH 3/5] vect: Enable lane-reducing operation that is
 not loop reduction statement
Thread-Index: AQHa20rQBvebQu0gaEOqQV2X1ti+tg==
Date: Sun, 21 Jul 2024 09:15:43 +0000
Message-ID: 
 <LV2PR01MB783953E7DB764D4FBCAF38D1F7AF2@LV2PR01MB7839.prod.exchangelabs.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: yes
X-MS-TNEF-Correlator: 
msip_labels: MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_Enabled=True;
 MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_SiteId=3bc2b170-fd94-476d-b0ce-4229bdc904a7;
 MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_SetDate=2024-07-21T09:15:43.184Z;
 MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_Name=Confidential;
 MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_ContentBits=0;
 MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_Method=Standard;
authentication-results: dkim=none (message not signed)
 header.d=none;dmarc=none action=none header.from=os.amperecomputing.com;
x-ms-publictraffictype: Email
x-ms-traffictypediagnostic: LV2PR01MB7839:EE_|DS7PR01MB7855:EE_
x-ms-office365-filtering-correlation-id: a7db6a83-8ff5-4172-d5d9-08dca965b2b5
x-ms-exchange-senderadcheck: 1
x-ms-exchange-antispam-relay: 0
x-microsoft-antispam: BCL:0;
 ARA:13230040|376014|366016|1800799024|38070700018;
x-forefront-antispam-report: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:;
 IPV:NLI; SFV:NSPM; H:LV2PR01MB7839.prod.exchangelabs.com; PTR:; CAT:NONE;
 SFS:(13230040)(376014)(366016)(1800799024)(38070700018); DIR:OUT; SFP:1102;
x-ms-exchange-antispam-messagedata-chunkcount: 1
MIME-Version: 1.0
X-OriginatorOrg: os.amperecomputing.com
X-MS-Exchange-CrossTenant-AuthAs: Internal
X-MS-Exchange-CrossTenant-AuthSource: LV2PR01MB7839.prod.exchangelabs.com
X-MS-Exchange-CrossTenant-Network-Message-Id: 
 a7db6a83-8ff5-4172-d5d9-08dca965b2b5
X-MS-Exchange-CrossTenant-originalarrivaltime: 21 Jul 2024 09:15:43.4347 (UTC)
X-MS-Exchange-CrossTenant-fromentityheader: Hosted
X-MS-Exchange-CrossTenant-id: 3bc2b170-fd94-476d-b0ce-4229bdc904a7
X-MS-Exchange-CrossTenant-mailboxtype: HOSTED
X-MS-Exchange-CrossTenant-userprincipalname: 
 f6/vKNu7vYbK1UyukwlGMAQQCfcgZNqpBZa8mIcfkv/Xlfog+fTy+7xrUOU3FW0HzHgvlPuesxx1WGOBPB/krWNIpJbF7mzIWrhMjzFzGUI=
X-MS-Exchange-Transport-CrossTenantHeadersStamped: DS7PR01MB7855
X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_PASS,
 SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.30
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org

This patch extends the vectorizer's analysis and transform phases to support
a new kind of lane-reducing operation that participates in the loop reduction
only indirectly: the operation is not itself a reduction statement, but its
value is eventually accumulated into the reduction result.
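For illustration, here is a minimal C sketch of the two source shapes. It uses plain scalar code, and `direct_form` and `indirect_form` are hypothetical names, not GCC functions. In the first shape, the widening multiply-add is itself the loop reduction statement. In the second, the product is first combined with `c[i]`, so it reaches `sum` only indirectly. Exactly which GIMPLE statements end up on the reduction chain depends on how GCC lowers and reassociates the loop. Treat this as an assumption about the source shape, not a guarantee about the IL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: the widening multiply-add is itself the
   reduction statement, i.e. sum = DOT_PROD (a, b, sum).  */
int
direct_form (const signed char *a, const signed char *b, size_t n)
{
  assert (n == 0 || (a && b));
  int sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += a[i] * b[i];
  return sum;
}

/* Hypothetical example: the product is first combined with c[i];
   only that combined value is added to SUM, so the multiply
   contributes to the reduction indirectly.  */
int
indirect_form (const signed char *a, const signed char *b, const int *c,
               size_t n)
{
  assert (n == 0 || (a && b && c));
  int sum = 0;
  for (size_t i = 0; i < n; i++)
    {
      int t = a[i] * b[i] + c[i];   /* not a reduction statement */
      sum += t;                     /* the reduction statement */
    }
  return sum;
}
```

Only the first loop satisfies the old `STMT_VINFO_REDUC_IDX (stmt_info) >= 0` check in `vectorizable_lane_reducing`. The second loop has the shape that this series appears intended to handle.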

Thanks,
Feng
---
gcc/
	* tree-vect-loop.cc (vectorizable_lane_reducing): Allow a lane-reducing
	operation that participates in the loop reduction indirectly.
	(vect_transform_reduction): Extend the transform to handle such an
	operation.
---
 gcc/tree-vect-loop.cc | 48 +++++++++++++++++++++++++++++++++++--------
 1 file changed, 40 insertions(+), 8 deletions(-)

From 5e65c65786d9594c172b58a6cd1af50c67efb927 Mon Sep 17 00:00:00 2001
From: Feng Xue <fxue@os.amperecomputing.com>
Date: Wed, 24 Apr 2024 16:46:49 +0800
Subject: [PATCH 3/5] vect: Enable lane-reducing operation that is not loop
 reduction statement

This patch extends the vectorizer's analysis and transform phases to support
a new kind of lane-reducing operation that participates in the loop reduction
only indirectly: the operation is not itself a reduction statement, but its
value is eventually accumulated into the reduction result.

2024-04-24 Feng Xue <fxue@os.amperecomputing.com>

gcc/
	* tree-vect-loop.cc (vectorizable_lane_reducing): Allow a lane-reducing
	operation that participates in the loop reduction indirectly.
	(vect_transform_reduction): Extend the transform to handle such an
	operation.
---
 gcc/tree-vect-loop.cc | 48 +++++++++++++++++++++++++++++++++++--------
 1 file changed, 40 insertions(+), 8 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index d7d628efa60..c344158b419 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -7520,9 +7520,7 @@ vectorizable_lane_reducing (loop_vec_info loop_vinfo, stmt_vec_info stmt_info,
 
   stmt_vec_info reduc_info = STMT_VINFO_REDUC_DEF (vect_orig_stmt (stmt_info));
 
-  /* TODO: Support lane-reducing operation that does not directly participate
-     in loop reduction.  */
-  if (!reduc_info || STMT_VINFO_REDUC_IDX (stmt_info) < 0)
+  if (!reduc_info)
     return false;
 
   /* Lane-reducing pattern inside any inner loop of LOOP_VINFO is not
@@ -7530,7 +7528,16 @@ vectorizable_lane_reducing (loop_vec_info loop_vinfo, stmt_vec_info stmt_info,
   gcc_assert (STMT_VINFO_DEF_TYPE (reduc_info) == vect_reduction_def);
   gcc_assert (STMT_VINFO_REDUC_TYPE (reduc_info) == TREE_CODE_REDUCTION);
 
-  for (int i = 0; i < (int) gimple_num_ops (stmt) - 1; i++)
+  int sum_idx = STMT_VINFO_REDUC_IDX (stmt_info);
+  int num_ops = (int) gimple_num_ops (stmt) - 1;
+
+  /* Participate in loop reduction either directly or indirectly.  */
+  if (sum_idx >= 0)
+    gcc_assert (sum_idx == num_ops - 1);
+  else
+    sum_idx = num_ops - 1;
+
+  for (int i = 0; i < num_ops; i++)
     {
       stmt_vec_info def_stmt_info;
       slp_tree slp_op;
@@ -7573,7 +7580,24 @@ vectorizable_lane_reducing (loop_vec_info loop_vinfo, stmt_vec_info stmt_info,
 
   tree vectype_in = STMT_VINFO_REDUC_VECTYPE_IN (stmt_info);
 
-  gcc_assert (vectype_in);
+  if (!vectype_in)
+    {
+      enum vect_def_type dt;
+      tree rhs1 = gimple_assign_rhs1 (stmt);
+
+      if (!vect_is_simple_use (rhs1, loop_vinfo, &dt, &vectype_in))
+	return false;
+
+      if (!vectype_in)
+	{
+	  vectype_in = get_vectype_for_scalar_type (loop_vinfo,
+						    TREE_TYPE (rhs1));
+	  if (!vectype_in)
+	    return false;
+	}
+
+      STMT_VINFO_REDUC_VECTYPE_IN (stmt_info) = vectype_in;
+    }
 
   /* Compute number of effective vector statements for costing.  */
   unsigned int ncopies_for_cost = vect_get_num_copies (loop_vinfo, slp_node,
@@ -8750,9 +8774,17 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
   gcc_assert (single_defuse_cycle || lane_reducing);
 
   if (lane_reducing)
-    {
-      /* The last operand of lane-reducing op is for reduction.  */
-      gcc_assert (reduc_index == (int) op.num_ops - 1);
+    {
+      if (reduc_index < 0)
+	{
+	  reduc_index = (int) op.num_ops - 1;
+	  single_defuse_cycle = false;
+	}
+      else
+	{
+	  /* The last operand of lane-reducing op is for reduction.  */
+	  gcc_assert (reduc_index == (int) op.num_ops - 1);
+	}
     }
 
   /* Create the destination vector  */
-- 
2.17.1


From patchwork Sun Jul 21 09:15:50 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Feng Xue OS <fxue@os.amperecomputing.com>
X-Patchwork-Id: 1962886
Return-Path: <gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@legolas.ozlabs.org
Authentication-Results: legolas.ozlabs.org;
	dkim=fail reason="signature verification failed" (1024-bit key;
 unprotected) header.d=os.amperecomputing.com header.i=@os.amperecomputing.com
 header.a=rsa-sha256 header.s=selector2 header.b=P6LdKsTG;
	dkim-atps=neutral
Authentication-Results: legolas.ozlabs.org;
 spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org
 (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org;
 envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org;
 receiver=patchwork.ozlabs.org)
Received: from server2.sourceware.org (server2.sourceware.org
 [IPv6:2620:52:3:1:0:246e:9693:128c])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384)
	(No client certificate requested)
	by legolas.ozlabs.org (Postfix) with ESMTPS id 4WRd8J4XwYz1yYm
	for <incoming@patchwork.ozlabs.org>; Sun, 21 Jul 2024 19:16:48 +1000 (AEST)
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id D0DA6386100E
	for <incoming@patchwork.ozlabs.org>; Sun, 21 Jul 2024 09:16:46 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from CY4PR05CU001.outbound.protection.outlook.com
 (mail-westcentralusazlp170100000.outbound.protection.outlook.com
 [IPv6:2a01:111:f403:c112::])
 by sourceware.org (Postfix) with ESMTPS id 8F3603861038
 for <gcc-patches@gcc.gnu.org>; Sun, 21 Jul 2024 09:15:53 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 8F3603861038
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=os.amperecomputing.com
Authentication-Results: sourceware.org;
 spf=pass smtp.mailfrom=os.amperecomputing.com
ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 8F3603861038
Authentication-Results: server2.sourceware.org;
 arc=pass smtp.remote-ip=2a01:111:f403:c112::
ARC-Seal: i=2; a=rsa-sha256; d=sourceware.org; s=key; t=1721553365; cv=pass;
 b=vTd+zlQNwSlqJbj8z/W+8aIcdgy4svE2wPj7DQaURzbG0UNHAHTBYGyVkw8CMIuyVpPTM03kopcfUBKpi1U+d3wsmx95VDsGRhZGmgVlZwKr6IyJOyhWebsWoKzhbW/txnL2VS1Vg/9v8qVXlBZK4DFmIxQiOUcUDh8nfACXQpI=
ARC-Message-Signature: i=2; a=rsa-sha256; d=sourceware.org; s=key;
 t=1721553365; c=relaxed/simple;
 bh=lFW4j0Bu88oOCP1V54iGbTvYoKvCsulWiASq4xMtnYc=;
 h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version;
 b=vacXBRzF0/WQUMoTVNMSDAUH+botS6ui40d4aTgGWSa4nU7hAbNXqpVjMNajzdJ1y/ZP2lHKoGTw0Wt94RnKql6tdB22Ru1mAM0mZm/QrM/fS0ig5acAm1e86p3ZmtMptmEuvfi5uXvmqfnx2W7GsmWkUWy9vFUspZysKKoYJ5s=
ARC-Authentication-Results: i=2; server2.sourceware.org
ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none;
 b=A/EznIE8i20xFRSSTPRKZUT6Wk1YgJGG4tPZvuFOIiiqdE7WKqBAc8+15fiwTaeRas8YcghlsXwQUsXUPRYAe9BQjmymUfPvT71U4xhEWixGuBYTBuanpj4TDCNHxtv3vMOdtJ+F6GL4Rb8GrqNerV8rQgfIA7G/bfbSrSQipyBBQ/D1Qc2+KutMQuYXig94cv6iqNer0WivV5wzdZmveMaEZCzW8I1sqaVYrx9MUrz+7tVjeR6uhmpkbmEm/voTcMvNs4moC27OZWa5uzp9PuulgjcOC2CoXfWpK5Yv+AgnaijbVg6dwog/DjJi3ZhnX2tctkb4ERlOkdmdi1ruRg==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com;
 s=arcselector10001;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1;
 bh=JQmJVVTCgn05ozv09MZ1rvYeC3A9uS2q0iCWakkpbXc=;
 b=DgpSqp+0W5SndkOXwnMNI0ryueVG/opc/gquSFWql8sWAMczMRZU4FpLjwxytOPGnlOirLTFX5QgTK+5xzxF9c6hEW41dBiSq0JQ+ei3imhTmM5vI8AOYg41VDug2PNUNBE11OfVieObi/xm993x9OOvJlhhhk0bO87PZWrTnmZy78vBVYgIf1oGwNqci7QYO8d7At8hs/dDkIaUjbHkjIUooYRnqOKLKOF/2Z5zBIfvK9C45uNvcExSU0YdJ0vAKjq7JX37Vhol4FlHt51h8HjwkwcA1bnUlU1CHDaqUu/jnYG02f+a38/NNDvaaZZEejDo5kA8L1kVpWkSYBra6g==
ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass
 smtp.mailfrom=os.amperecomputing.com; dmarc=pass action=none
 header.from=os.amperecomputing.com; dkim=pass
 header.d=os.amperecomputing.com; arc=none
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=os.amperecomputing.com; s=selector2;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;
 bh=JQmJVVTCgn05ozv09MZ1rvYeC3A9uS2q0iCWakkpbXc=;
 b=P6LdKsTGpCDPewFRF7fLyVZF9uXvDk+TSrvxdShI1BVgCJk/18EIEGPbmEM2P6cXlj2uESAJUFCXeFx4xN5Zptrp1TOe+EQVDzUgADniGCRhSsQn6GkZSz1XM1qByM8yjAB/WiAbjvnzhrLQJ1Mg+aduz4Akbrm7J0pYgJ0+94I=
Received: from LV2PR01MB7839.prod.exchangelabs.com (2603:10b6:408:14f::13) by
 DS7PR01MB7855.prod.exchangelabs.com (2603:10b6:8:82::13) with
 Microsoft SMTP
 Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 15.20.7784.14; Sun, 21 Jul 2024 09:15:50 +0000
Received: from LV2PR01MB7839.prod.exchangelabs.com
 ([fe80::2ac3:5a77:36fd:9c63]) by LV2PR01MB7839.prod.exchangelabs.com
 ([fe80::2ac3:5a77:36fd:9c63%4]) with mapi id 15.20.7784.016; Sun, 21 Jul 2024
 09:15:50 +0000
From: Feng Xue OS <fxue@os.amperecomputing.com>
To: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
CC: Richard Biener <richard.guenther@gmail.com>, Tamar Christina
 <Tamar.Christina@arm.com>, Richard Sandiford <Richard.Sandiford@arm.com>
Subject: [RFC][PATCH 4/5] vect: Extend lane-reducing patterns to
 non-loop-reduction statement
Thread-Topic: [RFC][PATCH 4/5] vect: Extend lane-reducing patterns to
 non-loop-reduction statement
Thread-Index: AQHa20sZK5ftqy49LEKBrm3IvwHYqg==
Date: Sun, 21 Jul 2024 09:15:50 +0000
Message-ID: 
 <LV2PR01MB78394B51D6494BB0898ABAC3F7AF2@LV2PR01MB7839.prod.exchangelabs.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: yes
X-MS-TNEF-Correlator: 
msip_labels: MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_Enabled=True;
 MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_SiteId=3bc2b170-fd94-476d-b0ce-4229bdc904a7;
 MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_SetDate=2024-07-21T09:15:50.352Z;
 MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_Name=Confidential;
 MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_ContentBits=0;
 MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_Method=Standard;
authentication-results: dkim=none (message not signed)
 header.d=none;dmarc=none action=none header.from=os.amperecomputing.com;
x-ms-publictraffictype: Email
x-ms-traffictypediagnostic: LV2PR01MB7839:EE_|DS7PR01MB7855:EE_
x-ms-office365-filtering-correlation-id: 5657580c-3b2d-4396-0a58-08dca965b6f5
x-ms-exchange-senderadcheck: 1
x-ms-exchange-antispam-relay: 0
x-microsoft-antispam: BCL:0;
 ARA:13230040|376014|366016|1800799024|38070700018;
x-forefront-antispam-report: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:;
 IPV:NLI; SFV:NSPM; H:LV2PR01MB7839.prod.exchangelabs.com; PTR:; CAT:NONE;
 SFS:(13230040)(376014)(366016)(1800799024)(38070700018); DIR:OUT; SFP:1102;
x-ms-exchange-antispam-messagedata-chunkcount: 1
MIME-Version: 1.0
X-OriginatorOrg: os.amperecomputing.com
X-MS-Exchange-CrossTenant-AuthAs: Internal
X-MS-Exchange-CrossTenant-AuthSource: LV2PR01MB7839.prod.exchangelabs.com
X-MS-Exchange-CrossTenant-Network-Message-Id: 
 5657580c-3b2d-4396-0a58-08dca965b6f5
X-MS-Exchange-CrossTenant-originalarrivaltime: 21 Jul 2024 09:15:50.6049 (UTC)
X-MS-Exchange-CrossTenant-fromentityheader: Hosted
X-MS-Exchange-CrossTenant-id: 3bc2b170-fd94-476d-b0ce-4229bdc904a7
X-MS-Exchange-CrossTenant-mailboxtype: HOSTED
X-MS-Exchange-CrossTenant-userprincipalname: 
 EQt9iUGFVjWwkswHHLuTr7NRfyJhAgCmTZOp8ntM3AKfkPGiZRh3MvGAOx6BiE/evPB+zcWUCz1kO32+wG07TpTbf3oU0BKcovwDpKze9W4=
X-MS-Exchange-Transport-CrossTenantHeadersStamped: DS7PR01MB7855
X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_PASS,
 SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.30
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org

Previously, only the simple lane-reducing case was supported, in which each
loop reduction statement is matched as a single pattern:

  char *d0, *d1, *s0, *s1, *w;
  for (i) {
    sum += d0[i] * d1[i];      // sum = DOT_PROD(d0, d1, sum);
    sum += abs(s0[i] - s1[i]); // sum = SAD(s0, s1, sum);
    sum += w[i];               // sum = WIDEN_SUM(w, sum);
  }
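The following scalar C emulation serves as a reference for what each of these operations computes in one vector iteration. It assumes a hypothetical vector shape of 16 narrow input lanes reduced into 4 int accumulator lanes. Each accumulator lane takes the four consecutive input lanes it overlaps, and only the final horizontal total matters for the reduction. The `*_emul` helpers and `reduce_lanes` are illustrative names, not GCC internals.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical vector shape: 16 narrow lanes in, 4 int lanes out.  */
#define IN_LANES 16
#define OUT_LANES 4
#define GROUP (IN_LANES / OUT_LANES)

/* DOT_PROD: acc[k] += sum of d0[j] * d1[j] over input group k.  */
static void
dot_prod_emul (int acc[OUT_LANES], const signed char *d0,
               const signed char *d1)
{
  assert (d0 && d1);
  for (int k = 0; k < OUT_LANES; k++)
    for (int j = k * GROUP; j < (k + 1) * GROUP; j++)
      acc[k] += d0[j] * d1[j];
}

/* SAD: acc[k] += sum of |s0[j] - s1[j]| over input group k.  */
static void
sad_emul (int acc[OUT_LANES], const unsigned char *s0,
          const unsigned char *s1)
{
  assert (s0 && s1);
  for (int k = 0; k < OUT_LANES; k++)
    for (int j = k * GROUP; j < (k + 1) * GROUP; j++)
      acc[k] += abs (s0[j] - s1[j]);
}

/* WIDEN_SUM: acc[k] += sum of w[j] over input group k.  */
static void
widen_sum_emul (int acc[OUT_LANES], const signed char *w)
{
  assert (w);
  for (int k = 0; k < OUT_LANES; k++)
    for (int j = k * GROUP; j < (k + 1) * GROUP; j++)
      acc[k] += w[j];
}

/* Horizontal reduction of the accumulator lanes, done once after
   the loop.  */
static int
reduce_lanes (const int acc[OUT_LANES])
{
  int total = 0;
  for (int k = 0; k < OUT_LANES; k++)
    total += acc[k];
  return total;
}
```

If all three helpers are applied to the same `acc`, `reduce_lanes (acc)` equals the scalar loop's `sum` over those 16 elements.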

This patch removes that limitation from the lane-reducing matching strategy
and extends the candidate scope to the whole affine closure of the loop
reduction. As a result, as many operations as possible can be turned into
lane-reducing ones, which generalizes pattern recognition to loops of the
following form ("opX" denotes an operation matched by a lane-reducing
pattern):

 for (i)
   sum += cst0 * op0 + cst1 * op1 + ... + cstN * opN + h(i);
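Every term enters `sum` through addition, and the constants are loop-invariant, so each opX can be accumulated independently of the others. The C sketch below checks only that algebraic identity, using a hypothetical instance with op0 = d0[i] * d1[i], op1 = w[i] and h(i) = i. It compares the original loop with a version that keeps one accumulator per term and applies the constants after the loop. It does not describe how the patch itself rewrites the loop. Unsigned arithmetic is used so that wrap-around is well defined and the identity holds exactly. `affine_ref` and `affine_split` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

/* Reference loop for the affine shape (hypothetical instance).  */
unsigned
affine_ref (const signed char *d0, const signed char *d1,
            const signed char *w, unsigned cst0, unsigned cst1, size_t n)
{
  assert (n == 0 || (d0 && d1 && w));
  unsigned sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += cst0 * (unsigned) (d0[i] * d1[i])
           + cst1 * (unsigned) w[i]
           + (unsigned) i;
  return sum;
}

/* Same value with one independent accumulator per term; the
   loop-invariant constants are applied once, after the loop.  */
unsigned
affine_split (const signed char *d0, const signed char *d1,
              const signed char *w, unsigned cst0, unsigned cst1, size_t n)
{
  assert (n == 0 || (d0 && d1 && w));
  unsigned acc0 = 0, acc1 = 0, acch = 0;
  for (size_t i = 0; i < n; i++)
    {
      acc0 += (unsigned) (d0[i] * d1[i]);   /* op0 partial sums */
      acc1 += (unsigned) w[i];              /* op1 partial sums */
      acch += (unsigned) i;                 /* h(i) partial sums */
    }
  return cst0 * acc0 + cst1 * acc1 + acch;
}
```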

A lane-reducing operation has two aspects: the main primitive operation and
the attached result accumulation. The original design matches both in a
single compound pattern, which does not work for an operation that does not
directly participate in the loop reduction. This patch handles only the
primitive operation and leaves the accumulation to another patch. An example
with dot-product:

    sum = DOT_PROD(d0, d1, sum);       // original
    sum = DOT_PROD(d0, d1, 0) + sum;   // now
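The rewrite preserves values because integer addition is associative and commutative. Folding `sum` into the operation or adding it afterwards gives the same lanes, provided no lane overflows. Below is a minimal C sketch under a hypothetical vector shape: 16 signed chars reduced into four int lanes, with each output lane taking the four input lanes it overlaps. `v4si`, `dot_prod`, `vadd` and `v4si_equal` are illustrative names, not GCC internals.

```c
#include <assert.h>

/* Hypothetical 4 x int vector.  */
typedef struct { int lane[4]; } v4si;

/* Emulated DOT_PROD (d0, d1, acc) on 16 x signed char inputs; each
   output lane accumulates the four input lanes it overlaps.  */
static v4si
dot_prod (const signed char d0[16], const signed char d1[16], v4si acc)
{
  assert (d0 && d1);
  for (int k = 0; k < 4; k++)
    for (int j = 4 * k; j < 4 * k + 4; j++)
      acc.lane[k] += d0[j] * d1[j];
  return acc;
}

/* Lane-wise vector addition.  */
static v4si
vadd (v4si x, v4si y)
{
  for (int k = 0; k < 4; k++)
    x.lane[k] += y.lane[k];
  return x;
}

/* Return 1 if all lanes of X and Y are equal, 0 otherwise.  */
static int
v4si_equal (v4si x, v4si y)
{
  for (int k = 0; k < 4; k++)
    if (x.lane[k] != y.lane[k])
      return 0;
  return 1;
}
```

`dot_prod (d0, d1, sum)` corresponds to the original form, and `vadd (dot_prod (d0, d1, zero), sum)` corresponds to the new form, where `zero` is an all-zero `v4si`.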

Thanks,
Feng
---
gcc/
	* tree-vect-patterns.cc (vect_reassociating_reduction_p): Remove.
	(vect_recog_dot_prod_pattern): Relax check to allow any statement in
	the reduction affine closure.
	(vect_recog_sad_pattern): Likewise.
	(vect_recog_widen_sum_pattern): Likewise.  Use dot-product if
	widen-sum is not supported.
	(vect_vect_recog_func_ptrs): Move lane-reducing patterns to the top.

gcc/testsuite/
	* gcc.dg/vect/vect-reduc-affine-1.c: New test.
	* gcc.dg/vect/vect-reduc-affine-2.c: New test.
	* gcc.dg/vect/vect-reduc-affine-slp-1.c: New test.
---
 .../gcc.dg/vect/vect-reduc-affine-1.c         | 112 ++++++
 .../gcc.dg/vect/vect-reduc-affine-2.c         |  81 +++++
 .../gcc.dg/vect/vect-reduc-affine-slp-1.c     |  74 ++++
 gcc/tree-vect-patterns.cc                     | 321 ++++++------------
 4 files changed, 372 insertions(+), 216 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-affine-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-affine-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-affine-slp-1.c

From 548026f343a3291a38cdf06575046be5d85fe33d Mon Sep 17 00:00:00 2001
From: Feng Xue <fxue@os.amperecomputing.com>
Date: Fri, 14 Jun 2024 15:45:26 +0800
Subject: [PATCH 4/5] vect: Extend lane-reducing patterns to non-loop-reduction
 statement

Previously, only the simple lane-reducing case was supported, in which each
loop reduction statement is matched as a single pattern:

  char *d0, *d1, *s0, *s1, *w;
  for (i) {
    sum += d0[i] * d1[i];      // sum = DOT_PROD(d0, d1, sum);
    sum += abs(s0[i] - s1[i]); // sum = SAD(s0, s1, sum);
    sum += w[i];               // sum = WIDEN_SUM(w, sum);
  }

This patch removes that limitation from the lane-reducing matching strategy
and extends the candidate scope to the whole affine closure of the loop
reduction. As a result, as many operations as possible can be turned into
lane-reducing ones, which generalizes pattern recognition to loops of the
following form ("opX" denotes an operation matched by a lane-reducing
pattern):

 for (i)
   sum += cst0 * op0 + cst1 * op1 + ... + cstN * opN + h(i);

A lane-reducing operation has two aspects: the main primitive operation and
the attached result accumulation. The original design matches both in a
single compound pattern, which does not work for an operation that does not
directly participate in the loop reduction. This patch handles only the
primitive operation and leaves the accumulation to another patch. An example
with dot-product:

    sum = DOT_PROD(d0, d1, sum);       // original
    sum = DOT_PROD(d0, d1, 0) + sum;   // now

2024-06-14 Feng Xue <fxue@os.amperecomputing.com>

gcc/
	* tree-vect-patterns.cc (vect_reassociating_reduction_p): Remove the
	function.
	(vect_recog_dot_prod_pattern): Relax check to allow any statement in
	the affine closure of the reduction.
	(vect_recog_sad_pattern): Likewise.
	(vect_recog_widen_sum_pattern): Likewise, and use dot-product if
	widen-sum is not supported.
	(vect_vect_recog_func_ptrs): Move lane-reducing patterns to the top.

gcc/testsuite/
	* gcc.dg/vect/vect-reduc-affine-1.c: New test.
	* gcc.dg/vect/vect-reduc-affine-2.c: New test.
	* gcc.dg/vect/vect-reduc-affine-slp-1.c: New test.
---
 .../gcc.dg/vect/vect-reduc-affine-1.c         | 112 ++++++
 .../gcc.dg/vect/vect-reduc-affine-2.c         |  81 +++++
 .../gcc.dg/vect/vect-reduc-affine-slp-1.c     |  74 ++++
 gcc/tree-vect-patterns.cc                     | 321 ++++++------------
 4 files changed, 372 insertions(+), 216 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-affine-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-affine-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-affine-slp-1.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-affine-1.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-affine-1.c
new file mode 100644
index 00000000000..a5e99ce703b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-affine-1.c
@@ -0,0 +1,112 @@
+/* Disabling epilogues until we find a better way to deal with scans.  */
+/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_dotprod_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_dotprod_neon }  */
+
+#include "tree-vect.h"
+
+#define N 50
+
+#define FN(name, S1, S2)				\
+S1 int __attribute__ ((noipa))				\
+name (S1 int res,					\
+      S2 char *restrict a,				\
+      S2 char *restrict b,				\
+      S2 int *restrict c,				\
+      S2 int cst1,					\
+      S2 int cst2,					\
+      int shift)					\
+{							\
+  for (int i = 0; i < N; i++)				\
+    res += a[i] * b[i] + 16;				\
+							\
+  asm volatile ("" ::: "memory");			\
+  for (int i = 0; i < N; i++)				\
+    res += a[i] * b[i] + cst1;				\
+							\
+  asm volatile ("" ::: "memory");			\
+  for (int i = 0; i < N; i++)				\
+    res += a[i] * b[i] + c[i];				\
+							\
+  asm volatile ("" ::: "memory");			\
+  for (int i = 0; i < N; i++)				\
+    res += a[i] * b[i] * 23;				\
+							\
+  asm volatile ("" ::: "memory");			\
+  for (int i = 0; i < N; i++)				\
+    res += a[i] * b[i] << 6;				\
+							\
+  asm volatile ("" ::: "memory");			\
+  for (int i = 0; i < N; i++)				\
+    res += a[i] * b[i] * cst2;				\
+							\
+  asm volatile ("" ::: "memory");			\
+  for (int i = 0; i < N; i++)				\
+    res += a[i] * b[i] << shift;			\
+							\
+  asm volatile ("" ::: "memory");			\
+  for (int i = 0; i < N; i++)				\
+    res += cst1 * 5 - a[i] * b[i];			\
+							\
+  asm volatile ("" ::: "memory");			\
+  for (int i = 0; i < N; i++)				\
+    res += ~(((a[i] * b[i] + 3) << shift) - c[i]);	\
+							\
+  asm volatile ("" ::: "memory");			\
+  for (int i = 0; i < N; i++)				\
+    {							\
+      S2 int t = a[i] * b[i];				\
+      res += (t * cst2) + ~((t - cst1) << 3);		\
+    }							\
+							\
+  asm volatile ("" ::: "memory");			\
+  S1 int res1 = 1;					\
+  S1 int res2 = 2;					\
+  for (int i = 0; i < N; i++)				\
+    {							\
+      S2 int t = a[i] * b[i];				\
+      res1 += (t * cst2) + 18;				\
+      res2 += (t - cst1) << shift;			\
+    }							\
+  res += res1 ^ res2;					\
+  return res;						\
+}
+
+FN(f1_vec_s, signed, signed)
+FN(f1_vec_u, unsigned, signed)
+
+#pragma GCC push_options
+#pragma GCC optimize ("O0")
+FN(f1_novec_s, signed, signed)
+FN(f1_novec_u, unsigned, signed)
+#pragma GCC pop_options
+
+#define BASE ((int) -1 < 0 ? -126 : 4)
+#define OFFSET 20
+
+int
+main (void)
+{
+  check_vect ();
+
+  signed char a[N], b[N];
+  int c[N];
+
+  #pragma GCC novector
+  for (int i = 0; i < N; ++i)
+    {
+      a[i] = BASE + i * 5;
+      b[i] = BASE + OFFSET + i * 4;
+      c[i] = i;
+    }
+
+  if (f1_vec_s (0x12345, a, b, c, -5, 17, 3) != f1_novec_s (0x12345, a, b, c, -5, 17, 3))
+    __builtin_abort ();
+
+  if (f1_vec_u (0x12345, a, b, c, -5, 17, 3) != f1_novec_u (0x12345, a, b, c, -5, 17, 3))
+    __builtin_abort ();
+}
+
+/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorizing statement: \\S+ = DOT_PROD_EXPR" 20 "vect" { target vect_sdot_qi } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-affine-2.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-affine-2.c
new file mode 100644
index 00000000000..a160bc72082
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-affine-2.c
@@ -0,0 +1,81 @@
+/* Disabling epilogues until we find a better way to deal with scans.  */
+/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_dotprod_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_dotprod_neon }  */
+
+#include "tree-vect.h"
+
+#define N 50
+
+#ifndef SIGNEDNESS_1
+#define SIGNEDNESS_1 signed
+#define SIGNEDNESS_2 unsigned
+#define SIGNEDNESS_3 signed
+#define SIGNEDNESS_4 signed
+#endif
+
+#define FN(name, S1, S2, S3, S4)						\
+S1 int __attribute__ ((noipa))							\
+name (S1 int res,								\
+   S2 char *restrict a,								\
+   S2 char *restrict b,								\
+   S3 char *restrict c,								\
+   S3 char *restrict d,								\
+   S4 short *restrict e,							\
+   S4 short *restrict f,							\
+   S1 int *restrict g,								\
+   S1 int cst1)									\
+{										\
+  for (int i = 0; i < N; ++i)							\
+    {										\
+      short diff = a[i] - b[i];							\
+      S2 short abs = diff < 0 ? -diff : diff;					\
+      res += ((abs + i) << 3) - (c[i] + 1) * cst1 + d[i] * 3 + e[i]  - g[i];	\
+    }										\
+										\
+  return res;									\
+}
+
+FN(f1_vec, signed, unsigned, signed, signed)
+
+#pragma GCC push_options
+#pragma GCC optimize ("O0")
+FN(f1_novec, signed, unsigned, signed, signed)
+#pragma GCC pop_options
+
+#define BASE2 ((unsigned int) -1 < 0 ? -126 : 4)
+#define BASE3 ((signed int) -1 < 0 ? -126 : 4)
+#define BASE4 ((signed int) -1 < 0 ? -1026 : 373)
+#define OFFSET 20
+
+int
+main (void)
+{
+  check_vect ();
+
+  unsigned char a[N], b[N];
+  signed char c[N], d[N];
+  signed short e[N], f[N];
+  signed int g[N];
+
+#pragma GCC novector
+  for (int i = 0; i < N; ++i)
+    {
+      a[i] = BASE2 + i * 5;
+      b[i] = BASE2 + OFFSET + i * 4;
+      c[i] = BASE3 + i * 2;
+      d[i] = BASE3 + OFFSET + i * 3;
+      e[i] = BASE4 + i * 6;
+      f[i] = BASE4 + OFFSET + i * 5;
+      g[i] = i;
+    }
+
+  if (f1_vec (0x12345, a, b, c, d, e, f, g, 17) != f1_novec (0x12345, a, b, c, d, e, f, g, 17))
+    __builtin_abort ();
+}
+
+/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump "vectorizing statement: \\S+ = DOT_PROD_EXPR" "vect" { target { vect_sdot_qi } } } } */
+/* { dg-final { scan-tree-dump "vectorizing statement: \\S+ = DOT_PROD_EXPR" "vect" { target { vect_udot_qi } } } } */
+/* { dg-final { scan-tree-dump "vectorizing statement: \\S+ = DOT_PROD_EXPR" "vect" { target { vect_sdot_hi } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-affine-slp-1.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-affine-slp-1.c
new file mode 100644
index 00000000000..0e76536925e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-affine-slp-1.c
@@ -0,0 +1,74 @@
+/* Disabling epilogues until we find a better way to deal with scans.  */
+/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_dotprod_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_dotprod_neon }  */
+
+#include "tree-vect.h"
+
+#define N 100
+
+#ifndef SIGNEDNESS_1
+#define SIGNEDNESS_1 signed
+#define SIGNEDNESS_2 unsigned
+#define SIGNEDNESS_3 signed
+#define SIGNEDNESS_4 signed
+#endif
+
+#define FN(name, S1, S2)					\
+S1 int __attribute__ ((noipa))					\
+name (S1 int res,						\
+      S2 char *restrict a,					\
+      S2 char *restrict b,					\
+      S2 short *restrict c,					\
+      S2 int *restrict d,					\
+      S1 int cst1,						\
+      S1 int cst2)						\
+{								\
+  for (int i = 0; i < N / 2; ++i)				\
+    {								\
+      res += ~((a[2 * i + 0] * b[2 * i + 0] + 1) << 3)		\
+	     - (c[2 * i + 0] + cst1) * cst2 + d[2 * i + 0];	\
+      res += ~((a[2 * i + 1] * b[2 * i + 1] + 1) << 3)		\
+	     - (c[2 * i + 1] + cst1) * cst2 + d[2 * i + 1];	\
+    }								\
+								\
+  return res;							\
+}
+
+FN(f1_vec, signed, signed)
+
+#pragma GCC push_options
+#pragma GCC optimize ("O0")
+FN(f1_novec, signed, signed)
+#pragma GCC pop_options
+
+#define BASE2 ((signed int) -1 < 0 ? -126 : 4)
+#define BASE3 ((signed int) -1 < 0 ? -1026 : 373)
+#define OFFSET 20
+
+int
+main (void)
+{
+  check_vect ();
+
+  signed char a[N], b[N];
+  signed short c[N];
+  signed int d[N];
+
+#pragma GCC novector
+  for (int i = 0; i < N; ++i)
+    {
+      a[i] = BASE2 + i * 5;
+      b[i] = BASE2 + OFFSET + i * 4;
+      c[i] = BASE3 + i * 6;
+      d[i] = i;
+    }
+
+  if (f1_vec (0x12345, a, b, c, d, -5, 17) != f1_novec (0x12345, a, b, c, d, -5, 17))
+    __builtin_abort ();
+}
+
+/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump "vectorizing statement: \\S+ = DOT_PROD_EXPR" "vect" { target { vect_sdot_qi } } } } */
+/* { dg-final { scan-tree-dump "vectorizing statement: \\S+ = DOT_PROD_EXPR" "vect" { target { vect_sdot_hi } } } } */
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 02f6b942026..bb037af0b68 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -1029,54 +1029,6 @@ vect_convert_output (vec_info *vinfo, stmt_vec_info stmt_info, tree type,
   return pattern_stmt;
 }
 
-/* Return true if STMT_VINFO describes a reduction for which reassociation
-   is allowed.  If STMT_INFO is part of a group, assume that it's part of
-   a reduction chain and optimistically assume that all statements
-   except the last allow reassociation.
-   Also require it to have code CODE and to be a reduction
-   in the outermost loop.  When returning true, store the operands in
-   *OP0_OUT and *OP1_OUT.  */
-
-static bool
-vect_reassociating_reduction_p (vec_info *vinfo,
-				stmt_vec_info stmt_info, tree_code code,
-				tree *op0_out, tree *op1_out)
-{
-  loop_vec_info loop_info = dyn_cast <loop_vec_info> (vinfo);
-  if (!loop_info)
-    return false;
-
-  /* As a candidate of lane-reducing pattern matching, the statement must
-     be inside affine closure of loop reduction.  */
-  if (!(stmt_info->reduc_pattern_status & rpatt_allow))
-    return false;
-
-  gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
-  if (!assign || gimple_assign_rhs_code (assign) != code)
-    return false;
-
-  /* We don't allow changing the order of the computation in the inner-loop
-     when doing outer-loop vectorization.  */
-  class loop *loop = LOOP_VINFO_LOOP (loop_info);
-  if (loop && nested_in_vect_loop_p (loop, stmt_info))
-    return false;
-
-  if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def)
-    {
-      if (needs_fold_left_reduction_p (TREE_TYPE (gimple_assign_lhs (assign)),
-				       code))
-	return false;
-    }
-  else if (REDUC_GROUP_FIRST_ELEMENT (stmt_info) == NULL)
-    return false;
-
-  *op0_out = gimple_assign_rhs1 (assign);
-  *op1_out = gimple_assign_rhs2 (assign);
-  if (commutative_tree_code (code) && STMT_VINFO_REDUC_IDX (stmt_info) == 0)
-    std::swap (*op0_out, *op1_out);
-  return true;
-}
-
 /* match.pd function to match
    (cond (cmp@3 a b) (convert@1 c) (convert@2 d))
    with conditions:
@@ -1189,96 +1141,60 @@ vect_recog_cond_expr_convert_pattern (vec_info *vinfo,
      S3  x_T = (TYPE1) x_t;
      S4  y_T = (TYPE1) y_t;
      S5  prod = x_T * y_T;
-     [S6  prod = (TYPE2) prod;  #optional]
-     S7  sum_1 = prod + sum_0;
+     [S6+ value = affine_fn (prod, ...);  #optional]
+     S7  sum_1 = value + sum_0;
 
-   where 'TYPE1' is exactly double the size of type 'type1a' and 'type1b',
-   the sign of 'TYPE1' must be one of 'type1a' or 'type1b' but the sign of
-   'type1a' and 'type1b' can differ.
+   There exists a natural widening conversion from both 'type1a' and
+   'type1b' to 'TYPE1'.  The function 'affine_fn' represents an affine
+   transform in the mathematical sense and may consist of several statements.
 
    Input:
 
    * STMT_VINFO: The stmt from which the pattern search begins.  In the
-   example, when this function is called with S7, the pattern {S3,S4,S5,S6,S7}
-   will be detected.
+   example, when this function is called with S5, the pattern {S3,S4,S5} is
+   detected if S5 lies in the affine closure of the reduction of 'sum'.
 
    Output:
 
-   * TYPE_OUT: The type of the output  of this pattern.
+   * TYPE_OUT: The type of the output of this pattern.
 
    * Return value: A new stmt that will be used to replace the sequence of
    stmts that constitute the pattern. In this case it will be:
-        WIDEN_DOT_PRODUCT <x_t, y_t, sum_0>
+	DOT_PROD_EXPR <x_t, y_t, 0>
 
    Note: The dot-prod idiom is a widening reduction pattern that is
-         vectorized without preserving all the intermediate results. It
-         produces only N/2 (widened) results (by summing up pairs of
-         intermediate results) rather than all N results.  Therefore, we
-         cannot allow this pattern when we want to get all the results and in
-         the correct order (as is the case when this computation is in an
-         inner-loop nested in an outer-loop that us being vectorized).  */
+	 vectorized without preserving all the intermediate results. It
+	 produces fewer than N (widened) results (by summing up groups of
+	 intermediate results) rather than all N results.  Therefore, we
+	 cannot allow this pattern when we want to get all the results and in
+	 the correct order (as is the case when this computation is in an
+	 inner-loop nested in an outer-loop that is being vectorized).  */
 
 static gimple *
 vect_recog_dot_prod_pattern (vec_info *vinfo,
 			     stmt_vec_info stmt_vinfo, tree *type_out)
 {
-  tree oprnd0, oprnd1;
-  gimple *last_stmt = stmt_vinfo->stmt;
-  tree type, half_type;
-  gimple *pattern_stmt;
-  tree var;
-
-  /* Look for the following pattern
-          DX = (TYPE1) X;
-          DY = (TYPE1) Y;
-          DPROD = DX * DY;
-          DDPROD = (TYPE2) DPROD;
-          sum_1 = DDPROD + sum_0;
-     In which
-     - DX is double the size of X
-     - DY is double the size of Y
-     - DX, DY, DPROD all have the same type but the sign
-       between X, Y and DPROD can differ.
-     - sum is the same size of DPROD or bigger
-     - sum has been recognized as a reduction variable.
-
-     This is equivalent to:
-       DPROD = X w* Y;          #widen mult
-       sum_1 = DPROD w+ sum_0;  #widen summation
-     or
-       DPROD = X w* Y;          #widen mult
-       sum_1 = DPROD + sum_0;   #summation
-   */
-
-  /* Starting from LAST_STMT, follow the defs of its uses in search
-     of the above pattern.  */
-
-  if (!vect_reassociating_reduction_p (vinfo, stmt_vinfo, PLUS_EXPR,
-				       &oprnd0, &oprnd1))
+  if (!(stmt_vinfo->reduc_pattern_status & rpatt_allow))
     return NULL;
 
-  type = TREE_TYPE (gimple_get_lhs (last_stmt));
-
+  gimple *last_stmt = stmt_vinfo->stmt;
+  tree value = gimple_get_lhs (last_stmt);
+  tree type = TREE_TYPE (value);
+  tree half_type;
   vect_unpromoted_value unprom_mult;
-  oprnd0 = vect_look_through_possible_promotion (vinfo, oprnd0, &unprom_mult);
 
-  /* So far so good.  Since last_stmt was detected as a (summation) reduction,
-     we know that oprnd1 is the reduction variable (defined by a loop-header
-     phi), and oprnd0 is an ssa-name defined by a stmt in the loop body.
-     Left to check that oprnd0 is defined by a (widen_)mult_expr  */
-  if (!oprnd0)
+  value = vect_look_through_possible_promotion (vinfo, value, &unprom_mult);
+  if (!value)
     return NULL;
 
-  stmt_vec_info mult_vinfo = vect_get_internal_def (vinfo, oprnd0);
+  stmt_vec_info mult_vinfo = vect_get_internal_def (vinfo, value);
   if (!mult_vinfo)
     return NULL;
 
-  /* FORNOW.  Can continue analyzing the def-use chain when this stmt in a phi
-     inside the loop (in case we are analyzing an outer-loop).  */
-  vect_unpromoted_value unprom0[2];
+  vect_unpromoted_value unprom[2];
   enum optab_subtype subtype = optab_vector;
   if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR, WIDEN_MULT_EXPR,
-			     false, 2, unprom0, &half_type, &subtype))
+			     false, 2, unprom, &half_type, &subtype))
     return NULL;
 
   /* If there are two widening operations, make sure they agree on the sign
@@ -1318,16 +1234,15 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
   /* Get the inputs in the appropriate types.  */
   tree mult_oprnd[2];
   vect_convert_inputs (vinfo, stmt_vinfo, 2, mult_oprnd, half_type,
-		       unprom0, half_vectype, subtype);
-
-  var = vect_recog_temp_ssa_var (type, NULL);
-  pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
-				      mult_oprnd[0], mult_oprnd[1], oprnd1);
+		       unprom, half_vectype, subtype);
 
+  tree var = vect_recog_temp_ssa_var (type, NULL);
+  gimple *pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
+					      mult_oprnd[0], mult_oprnd[1],
+					      build_zero_cst (type));
   return pattern_stmt;
 }
 
-
 /* Function vect_recog_sad_pattern
 
    Try to find the following Sum of Absolute Difference (SAD) pattern:
@@ -1343,18 +1258,20 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
      S4  y_T = (TYPE1) y_t;
      S5  diff = x_T - y_T;
      S6  abs_diff = ABS_EXPR <diff>;
-     [S7  abs_diff = (TYPE2) abs_diff;  #optional]
-     S8  sum_1 = abs_diff + sum_0;
+     [S7+ value = affine_fn (abs_diff, ...);  #optional]
+     S8  sum_1 = value + sum_0;
 
    where 'TYPE1' is at least double the size of type 'type', and 'TYPE2' is the
-   same size of 'TYPE1' or bigger. This is a special case of a reduction
-   computation.
+   same size as 'TYPE1' or bigger.  The function 'affine_fn' represents an
+   affine transform in the mathematical sense and may consist of several
+   statements.  This is a special case of a reduction computation.
 
    Input:
 
    * STMT_VINFO: The stmt from which the pattern search begins.  In the
-   example, when this function is called with S8, the pattern
-   {S3,S4,S5,S6,S7,S8} will be detected.
+   example, when this function is called with S6, the pattern {S3,S4,S5,S6}
+   is detected if S6 is known to be in the affine closure of the reduction
+   of 'sum'.
 
    Output:
 
@@ -1362,49 +1279,24 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
 
    * Return value: A new stmt that will be used to replace the sequence of
    stmts that constitute the pattern. In this case it will be:
-        SAD_EXPR <x_t, y_t, sum_0>
+	SAD_EXPR <x_t, y_t, 0>
   */
 
 static gimple *
 vect_recog_sad_pattern (vec_info *vinfo,
 			stmt_vec_info stmt_vinfo, tree *type_out)
 {
+  if (!(stmt_vinfo->reduc_pattern_status & rpatt_allow))
+    return NULL;
+
   gimple *last_stmt = stmt_vinfo->stmt;
   tree half_type;
 
-  /* Look for the following pattern
-          DX = (TYPE1) X;
-          DY = (TYPE1) Y;
-          DDIFF = DX - DY;
-          DAD = ABS_EXPR <DDIFF>;
-          DDPROD = (TYPE2) DPROD;
-          sum_1 = DAD + sum_0;
-     In which
-     - DX is at least double the size of X
-     - DY is at least double the size of Y
-     - DX, DY, DDIFF, DAD all have the same type
-     - sum is the same size of DAD or bigger
-     - sum has been recognized as a reduction variable.
-
-     This is equivalent to:
-       DDIFF = X w- Y;          #widen sub
-       DAD = ABS_EXPR <DDIFF>;
-       sum_1 = DAD w+ sum_0;    #widen summation
-     or
-       DDIFF = X w- Y;          #widen sub
-       DAD = ABS_EXPR <DDIFF>;
-       sum_1 = DAD + sum_0;     #summation
-   */
-
   /* Starting from LAST_STMT, follow the defs of its uses in search
      of the above pattern.  */
 
-  tree plus_oprnd0, plus_oprnd1;
-  if (!vect_reassociating_reduction_p (vinfo, stmt_vinfo, PLUS_EXPR,
-				       &plus_oprnd0, &plus_oprnd1))
-    return NULL;
-
-  tree sum_type = TREE_TYPE (gimple_get_lhs (last_stmt));
+  tree value = gimple_get_lhs (last_stmt);
+  tree type = TREE_TYPE (value);
 
   /* Any non-truncating sequence of conversions is OK here, since
      with a successful match, the result of the ABS(U) is known to fit
@@ -1412,23 +1304,15 @@ vect_recog_sad_pattern (vec_info *vinfo,
      negative of the minimum signed value due to the range of the widening
      MINUS_EXPR.)  */
   vect_unpromoted_value unprom_abs;
-  plus_oprnd0 = vect_look_through_possible_promotion (vinfo, plus_oprnd0,
-						      &unprom_abs);
-
-  /* So far so good.  Since last_stmt was detected as a (summation) reduction,
-     we know that plus_oprnd1 is the reduction variable (defined by a loop-header
-     phi), and plus_oprnd0 is an ssa-name defined by a stmt in the loop body.
-     Then check that plus_oprnd0 is defined by an abs_expr.  */
 
-  if (!plus_oprnd0)
+  value = vect_look_through_possible_promotion (vinfo, value, &unprom_abs);
+  if (!value)
     return NULL;
 
-  stmt_vec_info abs_stmt_vinfo = vect_get_internal_def (vinfo, plus_oprnd0);
+  stmt_vec_info abs_stmt_vinfo = vect_get_internal_def (vinfo, value);
   if (!abs_stmt_vinfo)
     return NULL;
 
-  /* FORNOW.  Can continue analyzing the def-use chain when this stmt in a phi
-     inside the loop (in case we are analyzing an outer-loop).  */
   gassign *abs_stmt = dyn_cast <gassign *> (abs_stmt_vinfo->stmt);
   vect_unpromoted_value unprom[2];
 
@@ -1467,22 +1351,22 @@ vect_recog_sad_pattern (vec_info *vinfo,
 					    unprom, NULL))
     return NULL;
 
-  vect_pattern_detected ("vect_recog_sad_pattern", last_stmt);
-
   tree half_vectype;
-  if (!vect_supportable_direct_optab_p (vinfo, sum_type, SAD_EXPR, half_type,
+  if (!vect_supportable_direct_optab_p (vinfo, type, SAD_EXPR, half_type,
 					type_out, &half_vectype))
     return NULL;
 
+  vect_pattern_detected ("vect_recog_sad_pattern", last_stmt);
+
   /* Get the inputs to the SAD_EXPR in the appropriate types.  */
   tree sad_oprnd[2];
   vect_convert_inputs (vinfo, stmt_vinfo, 2, sad_oprnd, half_type,
 		       unprom, half_vectype);
 
-  tree var = vect_recog_temp_ssa_var (sum_type, NULL);
+  tree var = vect_recog_temp_ssa_var (type, NULL);
   gimple *pattern_stmt = gimple_build_assign (var, SAD_EXPR, sad_oprnd[0],
-					      sad_oprnd[1], plus_oprnd1);
-
+					      sad_oprnd[1],
+					      build_zero_cst (type));
   return pattern_stmt;
 }
 
@@ -2492,30 +2376,35 @@ vect_recog_pow_pattern (vec_info *vinfo,
      TYPE x_T, sum = init;
    loop:
      sum_0 = phi <init, sum_1>
-     S1  x_t = *p;
+     S1  x_t = ...;
      S2  x_T = (TYPE) x_t;
-     S3  sum_1 = x_T + sum_0;
+     [S3+ value = affine_fn (x_T, ...);  #optional]
+     S4  sum_1 = value + sum_0;
 
    where type 'TYPE' is at least double the size of type 'type', i.e - we're
-   summing elements of type 'type' into an accumulator of type 'TYPE'. This is
-   a special case of a reduction computation.
+   summing elements of type 'type' into an accumulator of type 'TYPE'.  The
+   function 'affine_fn' represents an affine transform in the mathematical
+   sense and may consist of several statements.  This is a special case of a
+   reduction computation.
 
    Input:
 
    * STMT_VINFO: The stmt from which the pattern search begins. In the example,
-   when this function is called with S3, the pattern {S2,S3} will be detected.
+   when this function is called with S2, the pattern {S2} is detected if S2
+   is known to be in the affine closure of the reduction of 'sum'.
 
    Output:
 
    * TYPE_OUT: The type of the output of this pattern.
 
    * Return value: A new stmt that will be used to replace the sequence of
-   stmts that constitute the pattern. In this case it will be:
-        WIDEN_SUM <x_t, sum_0>
+   stmts that constitute the pattern.  In this case it will be
+   WIDEN_SUM_EXPR <x_t, 0> if the target supports it, or otherwise
+   DOT_PROD_EXPR <x_t, 1, 0> if dot-product can be used instead.
 
    Note: The widening-sum idiom is a widening reduction pattern that is
 	 vectorized without preserving all the intermediate results. It
-         produces only N/2 (widened) results (by summing up pairs of
+	 produces fewer than N (widened) results (by summing up groups of
 	 intermediate results) rather than all N results.  Therefore, we
 	 cannot allow this pattern when we want to get all the results and in
 	 the correct order (as is the case when this computation is in an
@@ -2525,49 +2414,42 @@ static gimple *
 vect_recog_widen_sum_pattern (vec_info *vinfo,
 			      stmt_vec_info stmt_vinfo, tree *type_out)
 {
+  if (!(stmt_vinfo->reduc_pattern_status & rpatt_allow))
+    return NULL;
+
   gimple *last_stmt = stmt_vinfo->stmt;
-  tree oprnd0, oprnd1;
-  tree type;
-  gimple *pattern_stmt;
+  tree value = gimple_get_lhs (last_stmt);
+  tree type = TREE_TYPE (value);
+  gimple *pattern_stmt = NULL;
+  vect_unpromoted_value unprom;
   tree var;
 
-  /* Look for the following pattern
-          DX = (TYPE) X;
-          sum_1 = DX + sum_0;
-     In which DX is at least double the size of X, and sum_1 has been
-     recognized as a reduction variable.
-   */
-
-  /* Starting from LAST_STMT, follow the defs of its uses in search
-     of the above pattern.  */
-
-  if (!vect_reassociating_reduction_p (vinfo, stmt_vinfo, PLUS_EXPR,
-				       &oprnd0, &oprnd1)
-      || TREE_CODE (oprnd0) != SSA_NAME
-      || !vinfo->lookup_def (oprnd0))
+  /* Check that value is defined by a widening cast.  */
+  if (!vect_look_through_possible_promotion (vinfo, value, &unprom)
+      || TYPE_PRECISION (unprom.type) * 2 > TYPE_PRECISION (type))
     return NULL;
 
-  type = TREE_TYPE (gimple_get_lhs (last_stmt));
-
-  /* So far so good.  Since last_stmt was detected as a (summation) reduction,
-     we know that oprnd1 is the reduction variable (defined by a loop-header
-     phi), and oprnd0 is an ssa-name defined by a stmt in the loop body.
-     Left to check that oprnd0 is defined by a cast from type 'type' to type
-     'TYPE'.  */
-
-  vect_unpromoted_value unprom0;
-  if (!vect_look_through_possible_promotion (vinfo, oprnd0, &unprom0)
-      || TYPE_PRECISION (unprom0.type) * 2 > TYPE_PRECISION (type))
+  /* TODO: Support widening-sum on boolean value.  */
+  if (TREE_CODE (unprom.type) != INTEGER_TYPE)
     return NULL;
 
-  vect_pattern_detected ("vect_recog_widen_sum_pattern", last_stmt);
-
-  if (!vect_supportable_direct_optab_p (vinfo, type, WIDEN_SUM_EXPR,
-					unprom0.type, type_out))
-    return NULL;
+  if (vect_supportable_direct_optab_p (vinfo, type, WIDEN_SUM_EXPR,
+				       unprom.type, type_out))
+    {
+      var = vect_recog_temp_ssa_var (type, NULL);
+      pattern_stmt = gimple_build_assign (var, WIDEN_SUM_EXPR, unprom.op,
+					  build_zero_cst (type));
+    }
+  else if (vect_supportable_direct_optab_p (vinfo, type, DOT_PROD_EXPR,
+					    unprom.type, type_out))
+    {
+      var = vect_recog_temp_ssa_var (type, NULL);
+      pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR, unprom.op,
+					  build_one_cst (unprom.type),
+					  build_zero_cst (type));
+    }
 
-  var = vect_recog_temp_ssa_var (type, NULL);
-  pattern_stmt = gimple_build_assign (var, WIDEN_SUM_EXPR, unprom0.op, oprnd1);
+  vect_pattern_detected ("vect_recog_widen_sum_pattern", last_stmt);
 
   return pattern_stmt;
 }
@@ -7191,8 +7073,18 @@ struct vect_recog_func
 
 /* Note that ordering matters - the first pattern matching on a stmt is
    taken which means usually the more complex one needs to preceed the
-   less comples onex (widen_sum only after dot_prod or sad for example).  */
+   less complex ones (widen_sum only after dot_prod or sad for example).  */
 static vect_recog_func vect_vect_recog_func_ptrs[] = {
+
+  /* Lane-reducing patterns (dot_prod/sad/widen_sum) are not local,
+     statement-based patterns, since they require knowledge of the
+     loop structure.  These patterns are expected to benefit loop
+     vectorization much more than peephole-like patterns, so give
+     lane-reducing patterns the highest priority.  */
+  { vect_recog_dot_prod_pattern, "dot_prod" },
+  { vect_recog_sad_pattern, "sad" },
+  { vect_recog_widen_sum_pattern, "widen_sum" },
+
   { vect_recog_bitfield_ref_pattern, "bitfield_ref" },
   { vect_recog_bit_insert_pattern, "bit_insert" },
   { vect_recog_abd_pattern, "abd" },
@@ -7204,9 +7096,6 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
   { vect_recog_mulhs_pattern, "mult_high" },
   { vect_recog_cast_forwprop_pattern, "cast_forwprop" },
   { vect_recog_widen_mult_pattern, "widen_mult" },
-  { vect_recog_dot_prod_pattern, "dot_prod" },
-  { vect_recog_sad_pattern, "sad" },
-  { vect_recog_widen_sum_pattern, "widen_sum" },
   { vect_recog_pow_pattern, "pow" },
   { vect_recog_popcount_clz_ctz_ffs_pattern, "popcount_clz_ctz_ffs" },
   { vect_recog_ctz_ffs_pattern, "ctz_ffs" },
-- 
2.17.1


From patchwork Sun Jul 21 09:15:56 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Feng Xue OS <fxue@os.amperecomputing.com>
X-Patchwork-Id: 1962888
Return-Path: <gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@legolas.ozlabs.org
Authentication-Results: legolas.ozlabs.org;
	dkim=fail reason="signature verification failed" (1024-bit key;
 unprotected) header.d=os.amperecomputing.com header.i=@os.amperecomputing.com
 header.a=rsa-sha256 header.s=selector2 header.b=cvrE7w79;
	dkim-atps=neutral
Authentication-Results: legolas.ozlabs.org;
 spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org
 (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org;
 envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org;
 receiver=patchwork.ozlabs.org)
Received: from server2.sourceware.org (server2.sourceware.org
 [IPv6:2620:52:3:1:0:246e:9693:128c])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384)
	(No client certificate requested)
	by legolas.ozlabs.org (Postfix) with ESMTPS id 4WRd9l72dvz1yYm
	for <incoming@patchwork.ozlabs.org>; Sun, 21 Jul 2024 19:18:03 +1000 (AEST)
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 39291386075D
	for <incoming@patchwork.ozlabs.org>; Sun, 21 Jul 2024 09:18:02 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from CY4PR05CU001.outbound.protection.outlook.com
 (mail-westcentralusazlp170100000.outbound.protection.outlook.com
 [IPv6:2a01:111:f403:c112::])
 by sourceware.org (Postfix) with ESMTPS id E6224385ED72
 for <gcc-patches@gcc.gnu.org>; Sun, 21 Jul 2024 09:16:05 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org E6224385ED72
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=os.amperecomputing.com
Authentication-Results: sourceware.org;
 spf=pass smtp.mailfrom=os.amperecomputing.com
ARC-Filter: OpenARC Filter v1.0.0 sourceware.org E6224385ED72
Authentication-Results: server2.sourceware.org;
 arc=pass smtp.remote-ip=2a01:111:f403:c112::
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=os.amperecomputing.com; s=selector2;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;
 bh=92owvmFg60FK1xhhdSdWd/j2iojsmNjkdIWNGHqfyLM=;
 b=cvrE7w79Tq7wzxObZhpUAN87/GSDx4cbTm2MI5NFqI186FX8JHVwq4JT0lRBl4G9u4SzPaV7OHljnao1M1R55KXFhGnhndGwkUNo1IGwujJj2ZQn6xjfxfSDbPXYxUoEL2ZB0aiu5+ke9lr571vjvcjs7WeBVwBEvYVj6TYI08w=
Received: from LV2PR01MB7839.prod.exchangelabs.com (2603:10b6:408:14f::13) by
 DS7PR01MB7855.prod.exchangelabs.com (2603:10b6:8:82::13) with
 Microsoft SMTP
 Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 15.20.7784.14; Sun, 21 Jul 2024 09:15:56 +0000
Received: from LV2PR01MB7839.prod.exchangelabs.com
 ([fe80::2ac3:5a77:36fd:9c63]) by LV2PR01MB7839.prod.exchangelabs.com
 ([fe80::2ac3:5a77:36fd:9c63%4]) with mapi id 15.20.7784.016; Sun, 21 Jul 2024
 09:15:56 +0000
From: Feng Xue OS <fxue@os.amperecomputing.com>
To: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
CC: Richard Biener <richard.guenther@gmail.com>, Tamar Christina
 <Tamar.Christina@arm.com>, Richard Sandiford <Richard.Sandiford@arm.com>
Subject: [RFC][PATCH 5/5] vect: Add accumulating-result pattern for
 lane-reducing operation
Thread-Topic: [RFC][PATCH 5/5] vect: Add accumulating-result pattern for
 lane-reducing operation
Thread-Index: AQHa20tQ4OgEd3B6NkG7ouF+1GXWyQ==
Date: Sun, 21 Jul 2024 09:15:56 +0000
Message-ID: 
 <LV2PR01MB7839AE52B14A240021B2E057F7AF2@LV2PR01MB7839.prod.exchangelabs.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: yes
X-MS-TNEF-Correlator: 
msip_labels: MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_Enabled=True;
 MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_SiteId=3bc2b170-fd94-476d-b0ce-4229bdc904a7;
 MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_SetDate=2024-07-21T09:15:56.401Z;
 MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_Name=Confidential;
 MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_ContentBits=0;
 MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_Method=Standard;
authentication-results: dkim=none (message not signed)
 header.d=none;dmarc=none action=none header.from=os.amperecomputing.com;
x-ms-publictraffictype: Email
x-ms-traffictypediagnostic: LV2PR01MB7839:EE_|DS7PR01MB7855:EE_
x-ms-office365-filtering-correlation-id: b2a2d631-c70e-452d-1f8c-08dca965ba92
x-ms-exchange-senderadcheck: 1
x-ms-exchange-antispam-relay: 0
x-microsoft-antispam: BCL:0;
 ARA:13230040|376014|366016|1800799024|38070700018;
x-forefront-antispam-report: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:;
 IPV:NLI; SFV:NSPM; H:LV2PR01MB7839.prod.exchangelabs.com; PTR:; CAT:NONE;
 SFS:(13230040)(376014)(366016)(1800799024)(38070700018); DIR:OUT; SFP:1102;
MIME-Version: 1.0
X-OriginatorOrg: os.amperecomputing.com
X-MS-Exchange-CrossTenant-AuthAs: Internal
X-MS-Exchange-CrossTenant-AuthSource: LV2PR01MB7839.prod.exchangelabs.com
X-MS-Exchange-CrossTenant-Network-Message-Id: 
 b2a2d631-c70e-452d-1f8c-08dca965ba92
X-MS-Exchange-CrossTenant-originalarrivaltime: 21 Jul 2024 09:15:56.6502 (UTC)
X-MS-Exchange-CrossTenant-fromentityheader: Hosted
X-MS-Exchange-CrossTenant-id: 3bc2b170-fd94-476d-b0ce-4229bdc904a7
X-MS-Exchange-CrossTenant-mailboxtype: HOSTED
X-MS-Exchange-CrossTenant-userprincipalname: 
 k2OIdxXWwMlQdNsMdrehhj1UsfN/vLISw4VvycbGzPKnjcSBZ2D2nHA8iGmlg117yMWKmqqmegFDbIVeWxICX46J2rbSe+GrOwr9ZA2ts2c=
X-MS-Exchange-Transport-CrossTenantHeadersStamped: DS7PR01MB7855
X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_PASS,
 SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.30
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org

This patch adds a pattern that folds a summation into the last (accumulator)
operand of a lane-reducing operation when appropriate, supplementing the
operation-specific patterns for dot-prod/sad/widen-sum.

  sum = lane-reducing-op(..., 0) + value;
=>
  sum = lane-reducing-op(..., value);
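
To see why the fold preserves the result, here is a small C sketch that
models one vector DOT_PROD_EXPR in scalar code and checks that adding VALUE
after a zero-accumulator dot-product yields the same lanes as passing VALUE
as the accumulator.  The vector shape (16 signed chars feeding 4 int lanes),
the grouping of four adjacent inputs per lane, and the helper names
dot_prod_model and accum_fold_holds are assumptions for illustration only;
they are not GCC internals, and real targets may assign products to
accumulator lanes differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IN_LANES 16                  /* assumed: 16 x int8 inputs       */
#define OUT_LANES 4                  /* assumed: 4 x int32 result lanes */
#define GROUP (IN_LANES / OUT_LANES) /* inputs summed into one lane     */

/* Hypothetical scalar model of DOT_PROD_EXPR <a, b, acc>: each output
   lane is its accumulator lane plus the products of GROUP adjacent
   input lanes.  */
static void
dot_prod_model (const int8_t *a, const int8_t *b,
                const int32_t *acc, int32_t *out)
{
  for (int i = 0; i < OUT_LANES; i++)
    {
      int32_t sum = acc[i];
      for (int j = 0; j < GROUP; j++)
        sum += (int32_t) a[i * GROUP + j] * b[i * GROUP + j];
      out[i] = sum;
    }
}

/* Return true if "dot_prod (a, b, 0) + value" and
   "dot_prod (a, b, value)" produce identical lanes.  */
static bool
accum_fold_holds (const int8_t *a, const int8_t *b, const int32_t *value)
{
  assert (a && b && value);

  const int32_t zero[OUT_LANES] = { 0 };
  int32_t unfolded[OUT_LANES], folded[OUT_LANES];

  /* Original form: sum = lane-reducing-op (..., 0) + value;  */
  dot_prod_model (a, b, zero, unfolded);
  for (int i = 0; i < OUT_LANES; i++)
    unfolded[i] += value[i];

  /* Folded form: sum = lane-reducing-op (..., value);  */
  dot_prod_model (a, b, value, folded);

  for (int i = 0; i < OUT_LANES; i++)
    if (unfolded[i] != folded[i])
      return false;
  return true;
}
```

Both forms agree because integer addition is associative and, in this
sketch, no lane overflows; the same reasoning applies to SAD_EXPR and
WIDEN_SUM_EXPR, whose last operand is likewise the accumulator.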

Thanks,
Feng
---
gcc/
	* tree-vect-patterns.cc (vect_recog_lane_reducing_accum_pattern):
	New pattern function.
	(vect_vect_recog_func_ptrs): Add the new pattern function.
	* params.opt (vect-lane-reducing-accum-pattern): New parameter.

gcc/testsuite/
	* gcc.dg/vect/vect-reduc-accum-pattern.c: New test.
---
 gcc/params.opt                                |   4 +
 .../gcc.dg/vect/vect-reduc-accum-pattern.c    |  61 ++++++++++
 gcc/tree-vect-patterns.cc                     | 106 ++++++++++++++++++
 3 files changed, 171 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-accum-pattern.c

From 94d34da8de2fd479c81e8398544466e6ffe7fdfc Mon Sep 17 00:00:00 2001
From: Feng Xue <fxue@os.amperecomputing.com>
Date: Wed, 22 May 2024 17:08:32 +0800
Subject: [PATCH 5/5] vect: Add accumulating-result pattern for lane-reducing
 operation

This patch adds a pattern that folds a summation into the last (accumulator)
operand of a lane-reducing operation when appropriate, supplementing the
operation-specific patterns for dot-prod/sad/widen-sum.

  sum = lane-reducing-op(..., 0) + value;
=>
  sum = lane-reducing-op(..., value);

2024-05-22 Feng Xue <fxue@os.amperecomputing.com>

gcc/
	* tree-vect-patterns.cc (vect_recog_lane_reducing_accum_pattern):
	New pattern function.
	(vect_vect_recog_func_ptrs): Add the new pattern function.
	* params.opt (vect-lane-reducing-accum-pattern): New parameter.

gcc/testsuite/
	* gcc.dg/vect/vect-reduc-accum-pattern.c: New test.
---
 gcc/params.opt                                |   4 +
 .../gcc.dg/vect/vect-reduc-accum-pattern.c    |  61 ++++++++++
 gcc/tree-vect-patterns.cc                     | 106 ++++++++++++++++++
 3 files changed, 171 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-accum-pattern.c

diff --git a/gcc/params.opt b/gcc/params.opt
index c17ba17b91b..b94bdc26cbd 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -1198,6 +1198,10 @@ The maximum factor which the loop vectorizer applies to the cost of statements i
 Common Joined UInteger Var(param_vect_induction_float) Init(1) IntegerRange(0, 1) Param Optimization
 Enable loop vectorization of floating point inductions.
 
+-param=vect-lane-reducing-accum-pattern=
+Common Joined UInteger Var(param_vect_lane_reducing_accum_pattern) Init(2) IntegerRange(0, 2) Param Optimization
+Control the pattern that combines a plus into a lane-reducing operation: 2 allows it for all statements, 1 only for loop reduction statements, and 0 disables it.
+
 -param=vrp-block-limit=
 Common Joined UInteger Var(param_vrp_block_limit) Init(150000) Optimization Param
 Maximum number of basic blocks before VRP switches to a fast model with less memory requirements.
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-accum-pattern.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-accum-pattern.c
new file mode 100644
index 00000000000..80a2c4f047e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-accum-pattern.c
@@ -0,0 +1,61 @@
+/* Disabling epilogues until we find a better way to deal with scans.  */
+/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_dotprod_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_dotprod_neon }  */
+
+#include "tree-vect.h"
+
+#define N 50
+
+#define FN(name, S1, S2)			\
+S1 int __attribute__ ((noipa))			\
+name (S1 int res,				\
+      S2 char *restrict a,			\
+      S2 char *restrict b,			\
+      S2 char *restrict c,			\
+      S2 char *restrict d)			\
+{						\
+  for (int i = 0; i < N; i++)			\
+    res += a[i] * b[i];				\
+						\
+  asm volatile ("" ::: "memory");		\
+  for (int i = 0; i < N; ++i)			\
+    res += (a[i] * b[i] + c[i] * d[i]) << 3;	\
+						\
+  return res;					\
+}
+
+FN(f1_vec, signed, signed)
+
+#pragma GCC push_options
+#pragma GCC optimize ("O0")
+FN(f1_novec, signed, signed)
+#pragma GCC pop_options
+
+#define BASE2 ((signed int) -1 < 0 ? -126 : 4)
+#define OFFSET 20
+
+int
+main (void)
+{
+  check_vect ();
+
+  signed char a[N], b[N];
+  signed char c[N], d[N];
+
+#pragma GCC novector
+  for (int i = 0; i < N; ++i)
+    {
+      a[i] = BASE2 + i * 5;
+      b[i] = BASE2 + OFFSET + i * 4;
+      c[i] = BASE2 + i * 6;
+      d[i] = BASE2 + OFFSET + i * 5;
+    }
+
+  if (f1_vec (0x12345, a, b, c, d) != f1_novec (0x12345, a, b, c, d))
+    __builtin_abort ();
+}
+
+/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump "vect_recog_lane_reducing_accum_pattern: detected" "vect" { target { vect_sdot_qi } } } } */
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index bb037af0b68..9a6b16532e4 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -1490,6 +1490,111 @@ vect_recog_abd_pattern (vec_info *vinfo,
   return vect_convert_output (vinfo, stmt_vinfo, out_type, stmt, vectype_out);
 }
 
+/* Function vect_recog_lane_reducing_accum_pattern
+
+   Try to fold a summation into the last operand of a lane-reducing operation.
+
+   sum = lane-reducing-op(..., 0) + value;
+
+   A lane-reducing operation has two aspects: the main primitive operation
+   and the accompanying result accumulation.  Pattern matching for the
+   former is handled by the specific patterns for dot-prod, sad and
+   widen-sum; this function handles the latter.
+
+   Input:
+
+   * STMT_VINFO: The stmt from which the pattern search begins.
+
+   Output:
+
+   * TYPE_OUT: The type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the sequence of
+   stmts that constitute the pattern, that is:
+	sum = lane-reducing-op(..., value);
+*/
+
+static gimple *
+vect_recog_lane_reducing_accum_pattern (vec_info *vinfo,
+					stmt_vec_info stmt_vinfo,
+					tree *type_out)
+{
+  if (!(stmt_vinfo->reduc_pattern_status & rpatt_formed))
+    return NULL;
+
+  if (param_vect_lane_reducing_accum_pattern == 0)
+    return NULL;
+
+  if (param_vect_lane_reducing_accum_pattern == 1)
+    {
+      /* Only allow combining for loop reduction statements.  */
+      if (STMT_VINFO_REDUC_IDX (stmt_vinfo) < 0)
+	return NULL;
+    }
+
+  gimple *last_stmt = stmt_vinfo->stmt;
+
+  if (!is_gimple_assign (last_stmt)
+      || gimple_assign_rhs_code (last_stmt) != PLUS_EXPR)
+    return NULL;
+
+  gimple *lane_reducing_stmt = NULL;
+  tree sum_oprnd = NULL_TREE;
+
+  for (unsigned i = 0; i < 2; i++)
+    {
+      tree oprnd = gimple_op (last_stmt, i + 1);
+      vect_unpromoted_value unprom;
+      bool single_use_p = true;
+
+      if (!vect_look_through_possible_promotion (vinfo, oprnd, &unprom,
+						 &single_use_p)
+	  || !single_use_p)
+	continue;
+
+      stmt_vec_info oprnd_vinfo = vect_get_internal_def (vinfo, unprom.op);
+
+      if (!oprnd_vinfo)
+	continue;
+
+      gimple *stmt = oprnd_vinfo->stmt;
+
+      if (lane_reducing_stmt_p (stmt)
+	  && integer_zerop (gimple_op (stmt, gimple_num_ops (stmt) - 1)))
+	{
+	  lane_reducing_stmt = stmt;
+	  sum_oprnd = gimple_op (last_stmt, 2 - i);
+	  break;
+	}
+    }
+
+  if (!lane_reducing_stmt)
+    return NULL;
+
+  tree type = TREE_TYPE (gimple_get_lhs (last_stmt));
+
+  *type_out = get_vectype_for_scalar_type (vinfo, type);
+  if (!*type_out)
+    return NULL;
+
+  vect_pattern_detected ("vect_recog_lane_reducing_accum_pattern", last_stmt);
+
+  tree var = vect_recog_temp_ssa_var (type, NULL);
+  enum tree_code code = gimple_assign_rhs_code (lane_reducing_stmt);
+  gimple *pattern_stmt;
+
+  if (code == WIDEN_SUM_EXPR)
+    pattern_stmt = gimple_build_assign (var, code,
+					gimple_op (lane_reducing_stmt, 1),
+					sum_oprnd);
+  else
+    pattern_stmt = gimple_build_assign (var, code,
+					gimple_op (lane_reducing_stmt, 1),
+					gimple_op (lane_reducing_stmt, 2),
+					sum_oprnd);
+  return pattern_stmt;
+}
+
 /* Recognize an operation that performs ORIG_CODE on widened inputs,
    so that it can be treated as though it had the form:
 
@@ -7084,6 +7189,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
   { vect_recog_dot_prod_pattern, "dot_prod" },
   { vect_recog_sad_pattern, "sad" },
   { vect_recog_widen_sum_pattern, "widen_sum" },
+  { vect_recog_lane_reducing_accum_pattern, "lane_reducing_accum" },
 
   { vect_recog_bitfield_ref_pattern, "bitfield_ref" },
   { vect_recog_bit_insert_pattern, "bit_insert" },
-- 
2.17.1