From patchwork Sat Jul 13 15:47:14 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Feng Xue OS <fxue@os.amperecomputing.com>
X-Patchwork-Id: 1960185
Return-Path: <gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@legolas.ozlabs.org
Authentication-Results: legolas.ozlabs.org;
	dkim=fail reason="signature verification failed" (1024-bit key;
 unprotected) header.d=os.amperecomputing.com header.i=@os.amperecomputing.com
 header.a=rsa-sha256 header.s=selector2 header.b=JkvlCGEg;
	dkim-atps=neutral
Authentication-Results: legolas.ozlabs.org;
 spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org
 (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org;
 envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org;
 receiver=patchwork.ozlabs.org)
Received: from server2.sourceware.org (server2.sourceware.org
 [IPv6:2620:52:3:1:0:246e:9693:128c])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384)
	(No client certificate requested)
	by legolas.ozlabs.org (Postfix) with ESMTPS id 4WLtCC5jZsz1xr4
	for <incoming@patchwork.ozlabs.org>; Sun, 14 Jul 2024 01:47:51 +1000 (AEST)
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 075383861004
	for <incoming@patchwork.ozlabs.org>; Sat, 13 Jul 2024 15:47:50 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from SN4PR2101CU001.outbound.protection.outlook.com
 (mail-southcentralusazlp170120000.outbound.protection.outlook.com
 [IPv6:2a01:111:f403:c10d::])
 by sourceware.org (Postfix) with ESMTPS id EF9D2385E027
 for <gcc-patches@gcc.gnu.org>; Sat, 13 Jul 2024 15:47:18 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org EF9D2385E027
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=os.amperecomputing.com
Authentication-Results: sourceware.org;
 spf=pass smtp.mailfrom=os.amperecomputing.com
ARC-Filter: OpenARC Filter v1.0.0 sourceware.org EF9D2385E027
Authentication-Results: server2.sourceware.org;
 arc=pass smtp.remote-ip=2a01:111:f403:c10d::
ARC-Seal: i=2; a=rsa-sha256; d=sourceware.org; s=key; t=1720885642; cv=pass;
 b=JwVg+5Jp1AgtqONjjJNziCnkqyVZR67yylHt9eRVIngDArUmi9crhsxTGGoAfQYs0VTTASq1j4q0MrQRYm+TBxFWXtzUHBFiurISNmtMcDklQ0heakhMxVkvMI8JKqkqJBWCumbYxlaU8I9Gha1vZdtOA2mT/Xr0ykaaMHBCwW8=
ARC-Message-Signature: i=2; a=rsa-sha256; d=sourceware.org; s=key;
 t=1720885642; c=relaxed/simple;
 bh=ZjFGe3kuMgp7+jdOhY2bpeu6av3lCh/yhQ380O7rKmQ=;
 h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version;
 b=basfd3L3pB5jtXIacfpiUQSxkoRxGTWkVCkNOOnvvE80tx0jzUJIjeoYfsVtvtGGRwgkAcURjUcbYT7xZMdHzhZ+qEJh1T6bh7QPI3DHEnD0KuLa007ON0usUHFrSjoIO84V7k95NrJ3NR85tmdYt3QLrhIpRgE58Wb8E6buJcw=
ARC-Authentication-Results: i=2; server2.sourceware.org
ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none;
 b=NbdBfoRebw1DjV9xJeWznUNl0lal5pPp2tILW06H4hBTCMrfuNTC11caJPX7R7x7otWnEizF6a/Ip8UHK7DPeGwIPejtXNmBvCLj5XH+43adtINKKwWQZP/jbfA/jSVjYF9gbJbqHc0fjZRxD8/Bacsxde/roluMCTxKCIPUVrpEwrSb9K22tfnC2tizzy6rHhG/qLM/o4Bk9D57AjORlRtzFq77ShLM4jS36j7jQP4s6imxRV2xjlUqhIbmRb7uuChWbqlc+AzIrR7JZ6yR1UnqPccuWS5mf0dWt0G8tiFFq6M0UoZtmn9bibxorg+eUiBI3TaCVfVXDixdKQCPlA==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com;
 s=arcselector10001;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1;
 bh=O0LeKJHHjMOB/Sc0nFqN43JdJ1bCvi9DGB1k3ZnmK+Q=;
 b=U7aOc8qdyzb4K+xNXmXdTqYq0lPKCsDrK0X2GPp57mXCRQC+/KrsJpdqe5uMMoyyNCjxp3cLBMysGIW9RxyrIa14faTaLD0RTAdT/QAFi9d3/EThKTuoaPkSsAEKnRUbpve2GrlMsWFDwAXyJfQ+xOlSfvRvXBlW8t1y4v/jVI63/62qm7XPuX1B9/0zaQgT3/d8M7UqCKpQacHIjLZP6UxFZ2RzXzSe4AXUhJjc82BimuuTAowB0KWwCPpZAv3FIC9O1dx3m01caKhrWj/x+RFzBnxm9fl7jF3XL0KvJ0sdiOTjciC/wmGpVVCXLbbXK+OmD4NySkPZacHBXJrGew==
ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass
 smtp.mailfrom=os.amperecomputing.com; dmarc=pass action=none
 header.from=os.amperecomputing.com; dkim=pass
 header.d=os.amperecomputing.com; arc=none
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=os.amperecomputing.com; s=selector2;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;
 bh=O0LeKJHHjMOB/Sc0nFqN43JdJ1bCvi9DGB1k3ZnmK+Q=;
 b=JkvlCGEg+Km6iFMss7bn9lylqRb0qiMMV3GU6Zt+FM96pFAJ25MpwW/0A1hDZ5LGow57eDHsQKL0itpXl6o0h8A+MkeI5kyLu7UqvJaxYDH370bPcrYl4HZHnjVYf5ST/JtjimiwGLaxR2gE6BOx2ZbVFNyHjYhARoXb6hQanU0=
Received: from LV2PR01MB7839.prod.exchangelabs.com (2603:10b6:408:14f::13) by
 PH0PR01MB7334.prod.exchangelabs.com (2603:10b6:510:10d::22) with
 Microsoft
 SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 15.20.7762.24; Sat, 13 Jul 2024 15:47:14 +0000
Received: from LV2PR01MB7839.prod.exchangelabs.com
 ([fe80::2ac3:5a77:36fd:9c63]) by LV2PR01MB7839.prod.exchangelabs.com
 ([fe80::2ac3:5a77:36fd:9c63%4]) with mapi id 15.20.7762.020; Sat, 13 Jul 2024
 15:47:14 +0000
From: Feng Xue OS <fxue@os.amperecomputing.com>
To: Richard Biener <richard.guenther@gmail.com>, "gcc-patches@gcc.gnu.org"
 <gcc-patches@gcc.gnu.org>
Subject: [PATCH 2/4] vect: Refit lane-reducing to be normal operation
Thread-Topic: [PATCH 2/4] vect: Refit lane-reducing to be normal operation
Thread-Index: AQHa1TvXLlSGrjJ6Oka5bbDPXp+L8A==
Date: Sat, 13 Jul 2024 15:47:14 +0000
Message-ID: 
 <LV2PR01MB7839EFE65507BCE03159DDAAF7A72@LV2PR01MB7839.prod.exchangelabs.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: yes
X-MS-TNEF-Correlator: 
msip_labels: MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_Enabled=True;
 MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_SiteId=3bc2b170-fd94-476d-b0ce-4229bdc904a7;
 MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_SetDate=2024-07-13T15:47:14.140Z;
 MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_Name=Confidential;
 MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_ContentBits=0;
 MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_Method=Standard;
authentication-results: dkim=none (message not signed)
 header.d=none;dmarc=none action=none header.from=os.amperecomputing.com;
x-ms-publictraffictype: Email
x-ms-traffictypediagnostic: LV2PR01MB7839:EE_|PH0PR01MB7334:EE_
x-ms-office365-filtering-correlation-id: 96707789-5010-482d-2f20-08dca3531116
x-ms-exchange-senderadcheck: 1
x-ms-exchange-antispam-relay: 0
x-microsoft-antispam: BCL:0;
 ARA:13230040|1800799024|376014|366016|38070700018;
x-forefront-antispam-report: CIP:255.255.255.255; CTRY:; LANG:ja; SCL:1; SRV:;
 IPV:NLI; SFV:NSPM; H:LV2PR01MB7839.prod.exchangelabs.com; PTR:; CAT:NONE;
 SFS:(13230040)(1800799024)(376014)(366016)(38070700018); DIR:OUT; SFP:1102;
MIME-Version: 1.0
X-OriginatorOrg: os.amperecomputing.com
X-MS-Exchange-CrossTenant-AuthAs: Internal
X-MS-Exchange-CrossTenant-AuthSource: LV2PR01MB7839.prod.exchangelabs.com
X-MS-Exchange-CrossTenant-Network-Message-Id: 
 96707789-5010-482d-2f20-08dca3531116
X-MS-Exchange-CrossTenant-originalarrivaltime: 13 Jul 2024 15:47:14.3806 (UTC)
X-MS-Exchange-CrossTenant-fromentityheader: Hosted
X-MS-Exchange-CrossTenant-id: 3bc2b170-fd94-476d-b0ce-4229bdc904a7
X-MS-Exchange-CrossTenant-mailboxtype: HOSTED
X-MS-Exchange-CrossTenant-userprincipalname: 
 46eyII7C7JSC79XRF591L6uRu5zyPMiBf8zla0CB+5oH+k1M01nb/iPF5yZBxA7NhT0KOCxhbKLjnCZZFFnWA1taHTIDhYtn9swTZXUYEf4=
X-MS-Exchange-Transport-CrossTenantHeadersStamped: PH0PR01MB7334
X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, GIT_PATCH_0, SPF_HELO_PASS, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.30
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org

The number of vector statements for an operation is calculated from its
output vectype.  For a lane-reducing operation this is an over-estimate,
which would cause a vector def/use mismatch once we support loop
reductions that mix lane-reducing and normal operations.  One solution is
to refit the lane-reducing operation so that it behaves like a normal one,
by adding trivial pass-through copies to close any def/use gap.  The
resulting superfluous statements can be optimized away after
vectorization.  For example:

  int sum = 1;
  for (i)
    {
      sum += d0[i] * d1[i];      // dot-prod <vector(16) char>
    }

  The vector size is 128 bits and the vectorization factor is 16.  The
  reduction statements would be transformed as:

  vector<4> int sum_v0 = { 0, 0, 0, 1 };
  vector<4> int sum_v1 = { 0, 0, 0, 0 };
  vector<4> int sum_v2 = { 0, 0, 0, 0 };
  vector<4> int sum_v3 = { 0, 0, 0, 0 };

  for (i / 16)
    {
      sum_v0 = DOT_PROD (d0_v0[i: 0 ~ 15], d1_v0[i: 0 ~ 15], sum_v0);
      sum_v1 = sum_v1;  // copy
      sum_v2 = sum_v2;  // copy
      sum_v3 = sum_v3;  // copy
    }

  sum_v = sum_v0 + sum_v1 + sum_v2 + sum_v3;   // = sum_v0
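
The example can be simulated in plain C++ to check that the copies are harmless.  This is only a sketch, not vectorizer code, and it rests on the following assumptions:
- V4SI, V16QI, dot_prod and vectorized_sum are hypothetical names.
- Each int lane receives four adjacent char products.  The real lane mapping of DOT_PROD is target-specific, and the final sum does not depend on it.
- n is a multiple of the vectorization factor, so no scalar epilogue is modelled.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical 128-bit vector types (not GCC internals).
using V4SI = std::array<int32_t, 4>;    // output vectype: vector(4) int
using V16QI = std::array<int8_t, 16>;   // input vectype: vector(16) char

// Model of a lane-reducing dot product: each int lane of the accumulator
// receives the sum of four char products (lane grouping is an assumption).
static V4SI
dot_prod (const V16QI &a, const V16QI &b, V4SI acc)
{
  for (int k = 0; k < 16; k++)
    acc[k / 4] += int32_t (a[k]) * int32_t (b[k]);
  return acc;
}

// Simulates the vectorized loop for: int sum = 1; sum += d0[i] * d1[i];
int32_t
vectorized_sum (const int8_t *d0, const int8_t *d1, int n)
{
  constexpr int vf = 16;               // vectorization factor
  constexpr int ncopies_out = vf / 4;  // counted from vector(4) int: 4
  constexpr int ncopies_in = vf / 16;  // counted from vector(16) char: 1
  assert (n % vf == 0);                // no scalar epilogue is modelled

  // Reduction PHIs exist for every output copy; the initial value 1 lives
  // in one lane, every other lane starts at the neutral value 0.
  V4SI sum_v[ncopies_out] = { { 0, 0, 0, 1 }, {}, {}, {} };

  for (int i = 0; i < n; i += vf)
    {
      V16QI a, b;
      for (int k = 0; k < 16; k++)
        {
          a[k] = d0[i + k];
          b[k] = d1[i + k];
        }
      for (int c = 0; c < ncopies_out; c++)
        {
          if (c < ncopies_in)
            sum_v[c] = dot_prod (a, b, sum_v[c]);  // effective statement
          // Otherwise the statement is the trivial copy sum_v[c] = sum_v[c],
          // which leaves the accumulator unchanged.
        }
    }

  // Epilogue: add the accumulators together, then reduce across lanes.
  int32_t total = 0;
  for (int c = 0; c < ncopies_out; c++)
    for (int lane = 0; lane < 4; lane++)
      total += sum_v[c][lane];
  return total;
}
```

Only the first accumulator does real work.  The other three stay zero through their pass-through copies, so the final reduction matches the scalar loop.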

Thanks,
Feng
---
gcc/
	* tree-vect-loop.cc (vect_reduction_update_partial_vector_usage):
	Calculate the effective number of vector stmts with the generic
	vect_get_num_copies.
	(vect_transform_reduction): Insert copies for lane-reducing ops to
	fix the over-estimated number of vector stmts.
	(vect_transform_cycle_phi): Calculate the number of vector PHIs
	based only on the output vectype.
	* tree-vect-slp.cc (vect_slp_analyze_node_operations_1): Remove
	adjustment on vector stmts number specific to slp reduction.
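
The vect_transform_reduction change works in two steps:
1. It pads the non-reduction operand vectors with empty slots, up to the length of the reduction operand vector.
2. It emits a trivial copy for every slot that lacks real operands.

The following is a simplified model of that control flow, not the patch itself.  Its assumptions and omissions:
- Def and emit_lane_reducing are hypothetical names.
- Vector defs are modelled as strings, and std::nullopt stands in for the NULL_TREE left by safe_grow_cleared.
- Masking is left out.
- The SLP shortcut is left out; in the patch, that shortcut reuses the defining statement of an SSA reduction operand instead of emitting a copy.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// A vector def is modelled as the name of an SSA value; std::nullopt plays
// the role of the NULL_TREE slots left by safe_grow_cleared.
using Def = std::optional<std::string>;

// Hypothetical model: returns one emitted statement per reduction copy.
std::vector<std::string>
emit_lane_reducing (std::vector<Def> op0, std::vector<Def> op1,
                    const std::vector<Def> &reduc_op)
{
  const std::size_t effec_ncopies = op0.size ();
  const std::size_t total_ncopies = reduc_op.size ();
  assert (op1.size () == effec_ncopies);
  assert (effec_ncopies <= total_ncopies);

  // Pad the non-reduction operands up to the number of reduction copies.
  op0.resize (total_ncopies);
  op1.resize (total_ncopies);

  std::vector<std::string> stmts;
  for (std::size_t i = 0; i < total_ncopies; i++)
    {
      assert (reduc_op[i].has_value ());
      if (!op0[i] || !op1[i])
        // No effective statement for this portion: hand the reduction
        // operand over to its users through a trivial copy.
        stmts.push_back ("copy " + *reduc_op[i]);
      else
        stmts.push_back ("DOT_PROD (" + *op0[i] + ", " + *op1[i] + ", "
                         + *reduc_op[i] + ")");
    }
  return stmts;
}
```

Take the commit-message example: one copy each of the two char operands, and four copies of the int accumulator.  The model emits one DOT_PROD followed by three copies.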
---
 gcc/tree-vect-loop.cc | 134 +++++++++++++++++++++++++++++++++++-------
 gcc/tree-vect-slp.cc  |  27 +++------
 2 files changed, 121 insertions(+), 40 deletions(-)
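
With the tree-vect-slp.cc change, SLP_TREE_NUMBER_OF_VEC_STMTS becomes a plain function of VF, the node's lane count and its vectype.  This is the same formula the removed vect_transform_cycle_phi code applied through vect_get_num_vectors (LOOP_VINFO_VECT_FACTOR (loop_vinfo) * SLP_TREE_LANES (slp_node), vectype_in).  The sketch below assumes constant (non-poly_int) sizes.  num_vec_stmts and pass_through_copies are hypothetical helpers, not GCC functions.

```cpp
// Hypothetical helper: vector statements needed to cover `lanes` scalar
// lanes over `vf` iterations with vectors of `nunits` elements, i.e.
// VF * SLP_TREE_LANES / TYPE_VECTOR_SUBPARTS for constant sizes.
constexpr unsigned
num_vec_stmts (unsigned vf, unsigned lanes, unsigned nunits)
{
  return vf * lanes / nunits;
}

// Hypothetical helper: trivial copies a lane-reducing node needs when it is
// allotted statements by its output vectype but only the statements
// counted by its input vectype do real work.
constexpr unsigned
pass_through_copies (unsigned vf, unsigned lanes, unsigned nunits_out,
                     unsigned nunits_in)
{
  return num_vec_stmts (vf, lanes, nunits_out)
         - num_vec_stmts (vf, lanes, nunits_in);
}

// The commit-message example: VF 16, single-lane SLP, 128-bit vectors.
static_assert (num_vec_stmts (16, 1, 4) == 4, "vector(4) int");
static_assert (num_vec_stmts (16, 1, 16) == 1, "vector(16) char");
static_assert (pass_through_copies (16, 1, 4, 16) == 3, "three copies");
```

For a two-lane SLP node, the same arithmetic gives 8 statements from the output vectype and 2 from the input vectype, leaving 6 pass-through copies.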

From 2b9b22f7f1a19816a17086c79e7ec5f7d0298af6 Mon Sep 17 00:00:00 2001
From: Feng Xue <fxue@os.amperecomputing.com>
Date: Tue, 2 Jul 2024 17:12:00 +0800
Subject: [PATCH 2/4] vect: Refit lane-reducing to be normal operation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

2024-07-02 Feng Xue <fxue@os.amperecomputing.com>

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index a64b5082bd1..5ac83e76975 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -7468,12 +7468,8 @@ vect_reduction_update_partial_vector_usage (loop_vec_info loop_vinfo,
 			= get_masked_reduction_fn (reduc_fn, vectype_in);
       vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
       vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
-      unsigned nvectors;
-
-      if (slp_node)
-	nvectors = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
-      else
-	nvectors = vect_get_num_copies (loop_vinfo, vectype_in);
+      unsigned nvectors = vect_get_num_copies (loop_vinfo, slp_node,
+					       vectype_in);
 
       if (mask_reduc_fn == IFN_MASK_LEN_FOLD_LEFT_PLUS)
 	vect_record_loop_len (loop_vinfo, lens, nvectors, vectype_in, 1);
@@ -8595,12 +8591,15 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
   stmt_vec_info phi_info = STMT_VINFO_REDUC_DEF (vect_orig_stmt (stmt_info));
   gphi *reduc_def_phi = as_a <gphi *> (phi_info->stmt);
   int reduc_index = STMT_VINFO_REDUC_IDX (stmt_info);
-  tree vectype_in = STMT_VINFO_REDUC_VECTYPE_IN (reduc_info);
+  tree vectype_in = STMT_VINFO_REDUC_VECTYPE_IN (stmt_info);
+
+  if (!vectype_in)
+    vectype_in = STMT_VINFO_VECTYPE (stmt_info);
 
   if (slp_node)
     {
       ncopies = 1;
-      vec_num = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
+      vec_num = vect_get_num_copies (loop_vinfo, slp_node, vectype_in);
     }
   else
     {
@@ -8658,13 +8657,40 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
   bool lane_reducing = lane_reducing_op_p (code);
   gcc_assert (single_defuse_cycle || lane_reducing);
 
+  if (lane_reducing)
+    {
+      /* The last operand of lane-reducing op is for reduction.  */
+      gcc_assert (reduc_index == (int) op.num_ops - 1);
+    }
+
   /* Create the destination vector  */
   tree scalar_dest = gimple_get_lhs (stmt_info->stmt);
   tree vec_dest = vect_create_destination_var (scalar_dest, vectype_out);
 
+  if (lane_reducing && !slp_node && !single_defuse_cycle)
+    {
+      /* Note: there are still vectorizable cases that cannot be handled by
+	 single-lane slp, and it will probably take some time for that
+	 feature to mature.  So we keep the non-slp code path below as a
+	 failsafe for lane-reducing support.  */
+      gcc_assert (op.num_ops <= 3);
+      for (unsigned i = 0; i < op.num_ops; i++)
+	{
+	  unsigned oprnd_ncopies = ncopies;
+
+	  if ((int) i == reduc_index)
+	    {
+	      tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+	      oprnd_ncopies = vect_get_num_copies (loop_vinfo, vectype);
+	    }
+
+	  vect_get_vec_defs_for_operand (loop_vinfo, stmt_info, oprnd_ncopies,
+					 op.ops[i], &vec_oprnds[i]);
+	}
+    }
   /* Get NCOPIES vector definitions for all operands except the reduction
      definition.  */
-  if (!cond_fn_p)
+  else if (!cond_fn_p)
     {
       gcc_assert (reduc_index >= 0 && reduc_index <= 2);
       vect_get_vec_defs (loop_vinfo, stmt_info, slp_node, ncopies,
@@ -8702,6 +8728,61 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
 			 reduc_index == 2 ? op.ops[2] : NULL_TREE,
 			 &vec_oprnds[2]);
     }
+  else if (lane_reducing)
+    {
+      /* For a normal reduction, consistency between vectorized defs and
+	 uses is naturally ensured when mapping from the scalar statement.
+	 But if a lane-reducing op is involved in the reduction, things
+	 become more complicated: the op's result and its accumulation
+	 operand are limited to fewer lanes than the other operands, which
+	 causes a def/use mismatch with the adjacent statements unless some
+	 specific adjustment is made.  One approach is to refit the
+	 lane-reducing op by introducing new trivial pass-through copies that
+	 fix any def/use gap, so that it behaves like a normal op.  Vector
+	 reduction PHIs are always generated to the full extent, whether or
+	 not a lane-reducing op exists.  Any copies or PHIs that turn out to
+	 be superfluous are cleaned up by passes after vectorization.  An
+	 example for single-lane slp is given below; the same handling
+	 applies to multiple-lane slp as well.
+
+	   int sum = 1;
+	   for (i)
+	     {
+	       sum += d0[i] * d1[i];      // dot-prod <vector(16) char>
+	     }
+
+	 The vector size is 128 bits and the vectorization factor is 16.  The
+	 reduction statements would be transformed as:
+
+	   vector<4> int sum_v0 = { 0, 0, 0, 1 };
+	   vector<4> int sum_v1 = { 0, 0, 0, 0 };
+	   vector<4> int sum_v2 = { 0, 0, 0, 0 };
+	   vector<4> int sum_v3 = { 0, 0, 0, 0 };
+
+	   for (i / 16)
+	     {
+	       sum_v0 = DOT_PROD (d0_v0[i: 0 ~ 15], d1_v0[i: 0 ~ 15], sum_v0);
+	       sum_v1 = sum_v1;  // copy
+	       sum_v2 = sum_v2;  // copy
+	       sum_v3 = sum_v3;  // copy
+	     }
+
+	   sum_v = sum_v0 + sum_v1 + sum_v2 + sum_v3;   // = sum_v0
+	*/
+      unsigned effec_ncopies = vec_oprnds[0].length ();
+      unsigned total_ncopies = vec_oprnds[reduc_index].length ();
+
+      gcc_assert (effec_ncopies <= total_ncopies);
+
+      if (effec_ncopies < total_ncopies)
+	{
+	  for (unsigned i = 0; i < op.num_ops - 1; i++)
+	    {
+	      gcc_assert (vec_oprnds[i].length () == effec_ncopies);
+	      vec_oprnds[i].safe_grow_cleared (total_ncopies);
+	    }
+	}
+    }
 
   bool emulated_mixed_dot_prod = vect_is_emulated_mixed_dot_prod (stmt_info);
   unsigned num = vec_oprnds[reduc_index == 0 ? 1 : 0].length ();
@@ -8710,7 +8791,27 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
     {
       gimple *new_stmt;
       tree vop[3] = { vec_oprnds[0][i], vec_oprnds[1][i], NULL_TREE };
-      if (masked_loop_p && !mask_by_cond_expr)
+      if (!vop[0] || !vop[1])
+	{
+	  tree reduc_vop = vec_oprnds[reduc_index][i];
+
+	  /* If no effective vector statement can be generated for the
+	     current portion of the reduction operand, insert a trivial copy
+	     that simply hands the operand over to dependent statements.  */
+	  gcc_assert (reduc_vop);
+
+	  if (slp_node && TREE_CODE (reduc_vop) == SSA_NAME
+	      && !SSA_NAME_IS_DEFAULT_DEF (reduc_vop))
+	    new_stmt = SSA_NAME_DEF_STMT (reduc_vop);
+	  else
+	    {
+	      new_temp = make_ssa_name (vec_dest);
+	      new_stmt = gimple_build_assign (new_temp, reduc_vop);
+	      vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
+					   gsi);
+	    }
+	}
+      else if (masked_loop_p && !mask_by_cond_expr)
 	{
 	  /* No conditional ifns have been defined for lane-reducing op
 	     yet.  */
@@ -8810,23 +8911,16 @@ vect_transform_cycle_phi (loop_vec_info loop_vinfo,
     /* Leave the scalar phi in place.  */
     return true;
 
-  tree vectype_in = STMT_VINFO_REDUC_VECTYPE_IN (reduc_info);
-  /* For a nested cycle we do not fill the above.  */
-  if (!vectype_in)
-    vectype_in = STMT_VINFO_VECTYPE (stmt_info);
-  gcc_assert (vectype_in);
-
   if (slp_node)
     {
-      /* The size vect_schedule_slp_instance computes is off for us.  */
-      vec_num = vect_get_num_vectors (LOOP_VINFO_VECT_FACTOR (loop_vinfo)
-				      * SLP_TREE_LANES (slp_node), vectype_in);
+      vec_num = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
       ncopies = 1;
     }
   else
     {
       vec_num = 1;
-      ncopies = vect_get_num_copies (loop_vinfo, vectype_in);
+      ncopies = vect_get_num_copies (loop_vinfo,
+				     STMT_VINFO_VECTYPE (stmt_info));
     }
 
   /* Check whether we should use a single PHI node and accumulate
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 4dadbc6854d..55ae496cbb2 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -6554,26 +6554,13 @@ vect_slp_analyze_node_operations_1 (vec_info *vinfo, slp_tree node,
 {
   stmt_vec_info stmt_info = SLP_TREE_REPRESENTATIVE (node);
 
-  /* Calculate the number of vector statements to be created for the
-     scalar stmts in this node.  For SLP reductions it is equal to the
-     number of vector statements in the children (which has already been
-     calculated by the recursive call).  Otherwise it is the number of
-     scalar elements in one scalar iteration (DR_GROUP_SIZE) multiplied by
-     VF divided by the number of elements in a vector.  */
-  if (SLP_TREE_CODE (node) != VEC_PERM_EXPR
-      && !STMT_VINFO_DATA_REF (stmt_info)
-      && REDUC_GROUP_FIRST_ELEMENT (stmt_info))
-    {
-      for (unsigned i = 0; i < SLP_TREE_CHILDREN (node).length (); ++i)
-	if (SLP_TREE_DEF_TYPE (SLP_TREE_CHILDREN (node)[i]) == vect_internal_def)
-	  {
-	    SLP_TREE_NUMBER_OF_VEC_STMTS (node)
-	      = SLP_TREE_NUMBER_OF_VEC_STMTS (SLP_TREE_CHILDREN (node)[i]);
-	    break;
-	  }
-    }
-  else
-    SLP_TREE_NUMBER_OF_VEC_STMTS (node) = vect_get_num_copies (vinfo, node);
+  /* Calculate the number of vector statements to be created for the scalar
+     stmts in this node.  It is the number of scalar elements in one scalar
+     iteration (DR_GROUP_SIZE) multiplied by VF divided by the number of
+     elements in a vector.  For a single-defuse-cycle, a lane-reducing op,
+     and a PHI that starts a reduction made up only of lane-reducing ops,
+     this exceeds the number of vector statements actually required.  */
+  SLP_TREE_NUMBER_OF_VEC_STMTS (node) = vect_get_num_copies (vinfo, node);
 
   /* Handle purely internal nodes.  */
   if (SLP_TREE_CODE (node) == VEC_PERM_EXPR)
-- 
2.17.1