From patchwork Sun Jun 16 07:20:57 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Feng Xue OS X-Patchwork-Id: 1948262 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=os.amperecomputing.com header.i=@os.amperecomputing.com header.a=rsa-sha256 header.s=selector2 header.b=D/ntsra0; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4W24FS2dLdz20Ws for ; Sun, 16 Jun 2024 17:21:30 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 2AF083858C32 for ; Sun, 16 Jun 2024 07:21:25 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from NAM10-DM6-obe.outbound.protection.outlook.com (mail-dm6nam10on20705.outbound.protection.outlook.com [IPv6:2a01:111:f400:7e88::705]) by sourceware.org (Postfix) with ESMTPS id 67DF73858D26 for ; Sun, 16 Jun 2024 07:21:03 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 67DF73858D26 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=os.amperecomputing.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=os.amperecomputing.com ARC-Filter: OpenARC Filter 
v1.0.0 sourceware.org 67DF73858D26 Authentication-Results: server2.sourceware.org; arc=pass smtp.remote-ip=2a01:111:f400:7e88::705 ARC-Seal: i=2; a=rsa-sha256; d=sourceware.org; s=key; t=1718522466; cv=pass; b=UFYxBxDFGvl6seKOUfMN0wUbpcs6GP3GCXBeDgGzWUaCZpioY71MIV3Hs8YU4Qub05V02URrMcOnSNUthW4CKwd23v60uZ30uwuzk/mgjhcUzQygDQB3XphxJRpcFeDf2CIbrEmo2SyceQL0GFpCfylTBt7VAIUvpb0NsSiGdCo= ARC-Message-Signature: i=2; a=rsa-sha256; d=sourceware.org; s=key; t=1718522466; c=relaxed/simple; bh=1DC6i/m7eEZX+QNfiE9Ah2xNfUiFv9cTbw8/AGxacjk=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=GtT62OV/4SEXfnS1foutPRz6FXJ+bhHu7ngNtTmFoyknyXKwt+1tZLAqvMR2j/oHtB5SPVlwUHISwJjf1BslYBzM31LWZKwk6wogFRywXwPnvOtRwRoF/GUOhw24QQc8e39G+tNQfihP1JNBJ4Ai/BXNHD3gfm1TexPxhoDVtWg= ARC-Authentication-Results: i=2; server2.sourceware.org ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=BZmIsOcMVW5DDZeYgfZLRepEi5+etgbJ63S/2QJYepKkvU6LWShtgj9AdqoiRzzTTWe0yXLZSWHl3VXsdfpVMsDl7iWgfw96hWXjr/h3120iC+G/VwApB+glcMdqFtcCTGJle1hwzzkwUDDr0q3ezk6VIrstGSvBj5N7MIxzGssma9PIverSKNj9c1jsfFmIdTiZiBhr3wT7O82oBv2GCuK/UjeiuIn3lIXD+9GHvhIHx60PWO0z9TzOmg9Oe3e8sBtVBqqvMqxO0rSmkFeJCBVTRjfEVw2fqabYOMbFJSkGJy9wOzmE/3XeHM8Z5ur/La6JV/q2raXQ/8VN1fW7hg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=2vfLCQE2xaDaMRdlxI5kuE7yW40x5OZKuNC6WAmg9/s=; b=Tiwe8JRVbH+u3vM16sSyLvpDPUyuOMsi1Q5Vgv16k1dsIidQ+sfKwjGX+tkAr8CzZMNd9QxLy9S7q16sQuXeGYXXy9lm5OesAKAKQQMZ5nSZ5eSwSAkACYE3RN+wTbyVIYTBfpigIcZ+GwUaBULOVSwh6I6POboN65as2FnvxKW+oF4eHJcjArZGnRtZruVdD8it9fbqJ4wurWESKslZPzwfiBBtTj60PUaWFMzHgRPssrqleqtb7bdUcbclSZVANbSBcicmjAKzGEp7XMhmOqOtREdmy1OMGdaPR5kxt5nHD/806GH6XuQa7ZGKp9G0k79pWoGLxT8W/qlhUJA9sQ== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass 
smtp.mailfrom=os.amperecomputing.com; dmarc=pass action=none header.from=os.amperecomputing.com; dkim=pass header.d=os.amperecomputing.com; arc=none
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=os.amperecomputing.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=2vfLCQE2xaDaMRdlxI5kuE7yW40x5OZKuNC6WAmg9/s=; b=D/ntsra0bJtc2G9Er/k5tlFPBwTy5hoA3UnYbhn2EldUmgiQzdc6VLIzcYQi2unlOwg648fe8HmzIzXVVscfSRU+Ib/qynAqsivHlEGKbdJjw++PY1hWZRtcqHY5ezDp4ATEQhilLNq4fhTuShjQ0VI4o/W68zORFHct4f6R3mg=
Received: from LV2PR01MB7839.prod.exchangelabs.com (2603:10b6:408:14f::13) by SA6PR01MB9024.prod.exchangelabs.com (2603:10b6:806:42f::11) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7677.28; Sun, 16 Jun 2024 07:20:57 +0000
Received: from LV2PR01MB7839.prod.exchangelabs.com ([fe80::2ac3:5a77:36fd:9c63]) by LV2PR01MB7839.prod.exchangelabs.com ([fe80::2ac3:5a77:36fd:9c63%4]) with mapi id 15.20.7677.029; Sun, 16 Jun 2024 07:20:57 +0000
From: Feng Xue OS
To: Richard Biener
CC: "gcc-patches@gcc.gnu.org"
Subject: [PATCH 1/8] vect: Add a function to check lane-reducing stmt
Thread-Topic: [PATCH 1/8] vect: Add a function to check lane-reducing stmt
Thread-Index: AQHav7vvD1oKwveQlkSUExRcrisCgA==
Date: Sun, 16 Jun 2024 07:20:57 +0000
Message-ID:
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: yes
X-MS-TNEF-Correlator:
msip_labels: MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_Enabled=True; MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_SiteId=3bc2b170-fd94-476d-b0ce-4229bdc904a7; MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_SetDate=2024-06-16T07:20:56.896Z; MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_Name=Confidential; MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_ContentBits=0; MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_Method=Standard;
authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none
header.from=os.amperecomputing.com;
x-ms-publictraffictype: Email
x-ms-traffictypediagnostic: LV2PR01MB7839:EE_|SA6PR01MB9024:EE_
x-ms-office365-filtering-correlation-id: e79bb5b2-f4d2-4c9d-effe-08dc8dd4ddbc
x-ms-exchange-senderadcheck: 1
x-ms-exchange-antispam-relay: 0
x-microsoft-antispam: BCL:0; ARA:13230037|366013|376011|1800799021|38070700015;
x-forefront-antispam-report: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:LV2PR01MB7839.prod.exchangelabs.com; PTR:; CAT:NONE; SFS:(13230037)(366013)(376011)(1800799021)(38070700015); DIR:OUT; SFP:1102;
MIME-Version: 1.0
X-OriginatorOrg: os.amperecomputing.com
X-MS-Exchange-CrossTenant-AuthAs: Internal
X-MS-Exchange-CrossTenant-AuthSource: LV2PR01MB7839.prod.exchangelabs.com
X-MS-Exchange-CrossTenant-Network-Message-Id: e79bb5b2-f4d2-4c9d-effe-08dc8dd4ddbc
X-MS-Exchange-CrossTenant-originalarrivaltime: 16 Jun 2024 07:20:57.1869 (UTC)
X-MS-Exchange-CrossTenant-fromentityheader: Hosted
X-MS-Exchange-CrossTenant-id: 3bc2b170-fd94-476d-b0ce-4229bdc904a7
X-MS-Exchange-CrossTenant-mailboxtype: HOSTED
X-MS-Exchange-Transport-CrossTenantHeadersStamped: SA6PR01MB9024
X-Spam-Status: No, score=-12.0 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_PASS, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.30
Precedence: list
List-Id: Gcc-patches mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To:
gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org

This patch series is meant to support multiple lane-reducing reduction
statements. Since the original patches conflicted with the new single-lane
SLP node patches, I have reworked most of them and split them into pieces
as small as possible, which should make code review easier.

In the first patch, I add a utility function that checks whether a statement
is a lane-reducing operation, which simplifies some existing code.

Thanks,
Feng
---
gcc/
	* tree-vectorizer.h (lane_reducing_stmt_p): New function.
	* tree-vect-slp.cc (vect_analyze_slp): Use new function
	lane_reducing_stmt_p to check statement.
---
 gcc/tree-vect-slp.cc  |  4 +---
 gcc/tree-vectorizer.h | 12 ++++++++++++
 2 files changed, 13 insertions(+), 3 deletions(-)

From 0a90550b4ed3addfb2a36c40085bfa9b4bb05b7c Mon Sep 17 00:00:00 2001
From: Feng Xue
Date: Sat, 15 Jun 2024 23:17:10 +0800
Subject: [PATCH 1/8] vect: Add a function to check lane-reducing stmt

Add a utility function to check whether a statement is a lane-reducing
operation, which simplifies some existing code.

2024-06-16  Feng Xue

gcc/
	* tree-vectorizer.h (lane_reducing_stmt_p): New function.
	* tree-vect-slp.cc (vect_analyze_slp): Use new function
	lane_reducing_stmt_p to check statement.
---
 gcc/tree-vect-slp.cc  |  4 +---
 gcc/tree-vectorizer.h | 12 ++++++++++++
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 7e3d0107b4e..b4ea2e18f00 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -3919,7 +3919,6 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size)
           scalar_stmts.create (loop_vinfo->reductions.length ());
           for (auto next_info : loop_vinfo->reductions)
             {
-              gassign *g;
               next_info = vect_stmt_to_vectorize (next_info);
               if ((STMT_VINFO_RELEVANT_P (next_info)
                    || STMT_VINFO_LIVE_P (next_info))
@@ -3931,8 +3930,7 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size)
                 {
                   /* Do not discover SLP reductions combining lane-reducing
                      ops, that will fail later. */
-                  if (!(g = dyn_cast <gassign *> (STMT_VINFO_STMT (next_info)))
-                      || !lane_reducing_op_p (gimple_assign_rhs_code (g)))
+                  if (!lane_reducing_stmt_p (STMT_VINFO_STMT (next_info)))
                     scalar_stmts.quick_push (next_info);
                   else
                     {
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 6bb0f5c3a56..60224f4e284 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -2169,12 +2169,24 @@ vect_apply_runtime_profitability_check_p (loop_vec_info loop_vinfo)
           && th >= vect_vf_for_cost (loop_vinfo));
 }

+/* Return true if CODE is a lane-reducing opcode. */
+
 inline bool
 lane_reducing_op_p (code_helper code)
 {
   return code == DOT_PROD_EXPR || code == WIDEN_SUM_EXPR || code == SAD_EXPR;
 }

+/* Return true if STMT is a lane-reducing statement. */
+
+inline bool
+lane_reducing_stmt_p (gimple *stmt)
+{
+  if (auto *assign = dyn_cast <gassign *> (stmt))
+    return lane_reducing_op_p (gimple_assign_rhs_code (assign));
+  return false;
+}
+
 /* Source location + hotness information. */
 extern dump_user_location_t vect_location;

--
2.17.1

From patchwork Sun Jun 16 07:22:20 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Feng Xue OS
X-Patchwork-Id: 1948263
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@legolas.ozlabs.org
Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=os.amperecomputing.com header.i=@os.amperecomputing.com header.a=rsa-sha256 header.s=selector2 header.b=phd5Zc0T; dkim-atps=neutral
Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org)
Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4W24Gs4TWYz20Ws for ; Sun, 16 Jun 2024 17:22:45 +1000 (AEST)
Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id E111A3858282 for ; Sun, 16 Jun 2024 07:22:43 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from NAM10-DM6-obe.outbound.protection.outlook.com (mail-dm6nam10on20711.outbound.protection.outlook.com [IPv6:2a01:111:f400:7e88::711]) by sourceware.org (Postfix) with ESMTPS id 23A663858D26 for ; Sun, 16 Jun 2024 07:22:24 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 23A663858D26
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=os.amperecomputing.com
Authentication-Results: sourceware.org; spf=pass
smtp.mailfrom=os.amperecomputing.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 23A663858D26 Authentication-Results: server2.sourceware.org; arc=pass smtp.remote-ip=2a01:111:f400:7e88::711 ARC-Seal: i=2; a=rsa-sha256; d=sourceware.org; s=key; t=1718522546; cv=pass; b=YxJGKDc3ygA4ELbjxIc93VLVh8ZpAca067ycDVT9uOPLqUa9U4lgj3GAOo5Kzq7UPdQLki1zRVI8xQGX0A11u6q55kcQwlZM6FdXVLr5RmKeZMKWNCVNhcFVV1DBjgGmhlp0kp48TNgz4m7Db4HZ5kJtQBnvzSYiJc5rA9kjFeU= ARC-Message-Signature: i=2; a=rsa-sha256; d=sourceware.org; s=key; t=1718522546; c=relaxed/simple; bh=Vshawn3vOanqDPtjmcnyzEjzqFAtnBGSvLAgNHiunSg=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=vuhZJ3Y3WgElgkk1z//RdUURd52PKW63gV5nv+3e0GfU7xcW4R+kVtO4fjRXV5QOiKly8RvOs3PLbFmMFnr/xo7B47yncv+yGHAgsUcYPjmoEmf1U6gwSQvA+wpoy87jMvD+eRMtUuMcRDm7qyVzfoP/R8tnH3tNxHNFV9mupkg= ARC-Authentication-Results: i=2; server2.sourceware.org ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=nQBLXCfj38vH/kytFkdyxDw8+gHeqDoIc66VfoIZDf/ebdxphQeNVt8juoUHI40+4JrFm6YtM18Btrzw/AzHgCIoz0BVqIzZ2tf5nZqXSLWNbBxpjwlhbWw2Jxm8bt5Y3tqHUPyac5xBpeupe2twzp0KsXDtPO7ziucAaqG11kkfl5bR2OzDoXI7nlCeNcPNp5O/f5VT177gOVutpC1hDk1iZ6+/nDNnoXI3gItKir2t/pY2wmrDDEOhdnol9svpOlO33tdXqY9vWXeKihVecgpba1LYH9TB52RLdaMDlm/dam0GuR7agQ9N9/GB/KQIg0XQxIkr+fmc6uu8nkhkKA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=/jre9g1fLjwP+mFAS+wgpzITXtH5m6ISz1E24A06z+w=; b=Yz4vmZJvg6WHRH9v7EqvJzQP/C0ihmW0aEKPD/6sWM6sb7HbURdMORFcuDm5pxG9FZg+/rGhz8+n3QE3DYILjSV4qpgWIdE/O+RrFCOw8wMF7k69YbZdOoNweeT2I9ytZ4vSZ8dH2E81SVKDukZeKOCLZVGd+hoNqO7QDAqtYzyGUU0e3CR0r7aKS/ZaOsTjBoiOQGQKol+BBCvllMJoDCUjUX9JXs2Lo4OtJ+UTIZZ3ZfTWAQHXJBL6QzVZ8Tf2UtOIh2giySogUIXIkSfU92Xo3LVMjVkTI1lwsqAjrBthPjiEUxK4Ql8oDSPjaEbX4OGuCVclRa/x2oA3xfPfVQ== 
ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=os.amperecomputing.com; dmarc=pass action=none header.from=os.amperecomputing.com; dkim=pass header.d=os.amperecomputing.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=os.amperecomputing.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=/jre9g1fLjwP+mFAS+wgpzITXtH5m6ISz1E24A06z+w=; b=phd5Zc0Tl0PBNPJnF9MbJ6m4iSq0rTHA1psx8fdV/yIEpu7WeXNL/8zyBoNvR5fXgWJhVm0VfEejsAardKD6+ngZq67yxog7yXSk3KyMXw3XwgGe9D/vAxBKnC7jMhsyw4hX4GOVWnq1c/k9OBNA6/eMThEQrEqJA7uit5K297g= Received: from LV2PR01MB7839.prod.exchangelabs.com (2603:10b6:408:14f::13) by SA6PR01MB9024.prod.exchangelabs.com (2603:10b6:806:42f::11) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7677.28; Sun, 16 Jun 2024 07:22:21 +0000 Received: from LV2PR01MB7839.prod.exchangelabs.com ([fe80::2ac3:5a77:36fd:9c63]) by LV2PR01MB7839.prod.exchangelabs.com ([fe80::2ac3:5a77:36fd:9c63%4]) with mapi id 15.20.7677.029; Sun, 16 Jun 2024 07:22:21 +0000 From: Feng Xue OS To: Richard Biener CC: "gcc-patches@gcc.gnu.org" Subject: [PATCH 2/8] vect: Remove duplicated check on reduction operand Thread-Topic: [PATCH 2/8] vect: Remove duplicated check on reduction operand Thread-Index: AQHav73P4qoyaQrLb0mhxKacgMmoeg== Date: Sun, 16 Jun 2024 07:22:20 +0000 Message-ID: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: yes X-MS-TNEF-Correlator: msip_labels: MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_Enabled=True; MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_SiteId=3bc2b170-fd94-476d-b0ce-4229bdc904a7; MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_SetDate=2024-06-16T07:22:20.709Z; MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_Name=Confidential; MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_ContentBits=0; MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_Method=Standard; authentication-results: dkim=none (message not 
signed) header.d=none;dmarc=none action=none header.from=os.amperecomputing.com;
x-ms-publictraffictype: Email
x-ms-traffictypediagnostic: LV2PR01MB7839:EE_|SA6PR01MB9024:EE_
x-ms-office365-filtering-correlation-id: 78075bf6-0dc8-425d-a5a8-08dc8dd50faa
x-ms-exchange-senderadcheck: 1
x-ms-exchange-antispam-relay: 0
x-microsoft-antispam: BCL:0; ARA:13230037|366013|376011|1800799021|38070700015;
x-forefront-antispam-report: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:LV2PR01MB7839.prod.exchangelabs.com; PTR:; CAT:NONE; SFS:(13230037)(366013)(376011)(1800799021)(38070700015); DIR:OUT; SFP:1102;
MIME-Version: 1.0
X-OriginatorOrg: os.amperecomputing.com
X-MS-Exchange-CrossTenant-AuthAs: Internal
X-MS-Exchange-CrossTenant-AuthSource: LV2PR01MB7839.prod.exchangelabs.com
X-MS-Exchange-CrossTenant-Network-Message-Id: 78075bf6-0dc8-425d-a5a8-08dc8dd50faa
X-MS-Exchange-CrossTenant-originalarrivaltime: 16 Jun 2024 07:22:20.9657 (UTC)
X-MS-Exchange-CrossTenant-fromentityheader: Hosted
X-MS-Exchange-CrossTenant-id: 3bc2b170-fd94-476d-b0ce-4229bdc904a7
X-MS-Exchange-CrossTenant-mailboxtype: HOSTED
X-MS-Exchange-Transport-CrossTenantHeadersStamped: SA6PR01MB9024
X-Spam-Status: No, score=-12.0 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_PASS, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.30
Precedence: list
List-Id: Gcc-patches mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org

In vectorizable_reduction, the index-based check on a reduction operand is
subsumed by the pointer-based check that follows it, so remove the former.

Thanks,
Feng
---
gcc/
	* tree-vect-loop.cc (vectorizable_reduction): Remove the duplicated
	check.
---
 gcc/tree-vect-loop.cc | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

From 5d2c22ad724856db12bf0ca568650f471447fa34 Mon Sep 17 00:00:00 2001
From: Feng Xue
Date: Sun, 16 Jun 2024 12:08:56 +0800
Subject: [PATCH 2/8] vect: Remove duplicated check on reduction operand

In vectorizable_reduction, the index-based check on a reduction operand is
subsumed by the pointer-based check that follows it, so remove the former.

2024-06-16  Feng Xue

gcc/
	* tree-vect-loop.cc (vectorizable_reduction): Remove the duplicated
	check.
---
 gcc/tree-vect-loop.cc | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index d9a2ad69484..6e8b3639daf 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -7815,11 +7815,9 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
                              "use not simple.\n");
           return false;
         }
-      if (i == STMT_VINFO_REDUC_IDX (stmt_info))
-        continue;

-      /* For an IFN_COND_OP we might hit the reduction definition operand
-         twice (once as definition, once as else). */
+      /* Skip reduction operands, and for an IFN_COND_OP we might hit the
+         reduction operand twice (once as definition, once as else). */
       if (op.ops[i] == op.ops[STMT_VINFO_REDUC_IDX (stmt_info)])
         continue;

--
2.17.1

From patchwork Sun Jun 16 07:23:39 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Feng Xue OS
X-Patchwork-Id: 1948264
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@legolas.ozlabs.org
Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=os.amperecomputing.com header.i=@os.amperecomputing.com header.a=rsa-sha256 header.s=selector2 header.b=mX3DTchB; dkim-atps=neutral
Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org)
Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4W24JS3Dcwz20Ws for ; Sun, 16 Jun 2024 17:24:08 +1000 (AEST)
Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id AE9DF385C6C7 for ; Sun, 16 Jun 2024 07:24:06 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from NAM11-CO1-obe.outbound.protection.outlook.com (mail-co1nam11on20700.outbound.protection.outlook.com [IPv6:2a01:111:f403:2416::700]) by sourceware.org (Postfix) with ESMTPS id EE5983858D26 for ; Sun, 16 Jun 2024 07:23:45 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org EE5983858D26
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=os.amperecomputing.com
Authentication-Results: sourceware.org; spf=pass
smtp.mailfrom=os.amperecomputing.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org EE5983858D26 Authentication-Results: server2.sourceware.org; arc=pass smtp.remote-ip=2a01:111:f403:2416::700
ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=os.amperecomputing.com; dmarc=pass action=none header.from=os.amperecomputing.com; dkim=pass header.d=os.amperecomputing.com; arc=none Received: from LV2PR01MB7839.prod.exchangelabs.com (2603:10b6:408:14f::13) by CH7PR01MB9001.prod.exchangelabs.com (2603:10b6:610:24f::5) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7677.29; Sun, 16 Jun 2024 07:23:39 +0000 Received: from LV2PR01MB7839.prod.exchangelabs.com ([fe80::2ac3:5a77:36fd:9c63]) by LV2PR01MB7839.prod.exchangelabs.com ([fe80::2ac3:5a77:36fd:9c63%4]) with mapi id 15.20.7677.029; Sun, 16 Jun 2024 07:23:39 +0000 From: Feng Xue OS To: Richard Biener CC: "gcc-patches@gcc.gnu.org" Subject: [PATCH 3/8] vect: Use one reduction_type local variable Thread-Topic: [PATCH 3/8] vect: Use one reduction_type local variable Thread-Index: AQHav74AxtC/0twDQU+gV5ElrJ33RQ== Date: Sun, 16 Jun 2024 07:23:39 +0000 Message-ID: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: yes X-MS-TNEF-Correlator: msip_labels: MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_Enabled=True; MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_SiteId=3bc2b170-fd94-476d-b0ce-4229bdc904a7; MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_SetDate=2024-06-16T07:23:38.871Z; MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_Name=Confidential; MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_ContentBits=0; MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_Method=Standard; authentication-results: dkim=none (message not signed)
header.d=none;dmarc=none action=none header.from=os.amperecomputing.com; x-ms-publictraffictype: Email x-ms-traffictypediagnostic: LV2PR01MB7839:EE_|CH7PR01MB9001:EE_ x-ms-office365-filtering-correlation-id: d30d41bf-4ec2-443b-7adc-08dc8dd53e40 x-ms-exchange-senderadcheck: 1 x-ms-exchange-antispam-relay: 0 x-microsoft-antispam: BCL:0; ARA:13230037|376011|1800799021|366013|38070700015; x-forefront-antispam-report: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:LV2PR01MB7839.prod.exchangelabs.com; PTR:; CAT:NONE; SFS:(13230037)(376011)(1800799021)(366013)(38070700015); DIR:OUT; SFP:1102; MIME-Version: 1.0 X-OriginatorOrg: os.amperecomputing.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-AuthSource: LV2PR01MB7839.prod.exchangelabs.com X-MS-Exchange-CrossTenant-Network-Message-Id: d30d41bf-4ec2-443b-7adc-08dc8dd53e40 X-MS-Exchange-CrossTenant-originalarrivaltime: 16 Jun 2024 07:23:39.1541 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: 3bc2b170-fd94-476d-b0ce-4229bdc904a7 X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-Transport-CrossTenantHeadersStamped: CH7PR01MB9001 X-Spam-Status: No, score=-12.0 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, GIT_PATCH_0, SPF_HELO_PASS, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To:
gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org

Two local variables were defined to refer to the same STMT_VINFO_REDUC_TYPE;
it is better to keep only one.

Thanks,
Feng
---
gcc/
	* tree-vect-loop.cc (vectorizable_reduction): Remove v_reduc_type, and
	replace it with another local variable reduction_type.
---
 gcc/tree-vect-loop.cc | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

From 19dc1c91f10ec22e695b9003cae1f4ab5aa45250 Mon Sep 17 00:00:00 2001
From: Feng Xue
Date: Sun, 16 Jun 2024 12:17:26 +0800
Subject: [PATCH 3/8] vect: Use one reduction_type local variable

Two local variables were defined to refer to the same STMT_VINFO_REDUC_TYPE;
it is better to keep only one.

2024-06-16  Feng Xue

gcc/
	* tree-vect-loop.cc (vectorizable_reduction): Remove v_reduc_type, and
	replace it with another local variable reduction_type.
---
 gcc/tree-vect-loop.cc | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 6e8b3639daf..0f7b125e72d 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -7868,10 +7868,10 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   if (lane_reducing)
     STMT_VINFO_REDUC_VECTYPE_IN (stmt_info) = vectype_in;
 
-  enum vect_reduction_type v_reduc_type = STMT_VINFO_REDUC_TYPE (phi_info);
-  STMT_VINFO_REDUC_TYPE (reduc_info) = v_reduc_type;
+  enum vect_reduction_type reduction_type = STMT_VINFO_REDUC_TYPE (phi_info);
+  STMT_VINFO_REDUC_TYPE (reduc_info) = reduction_type;
   /* If we have a condition reduction, see if we can simplify it further.  */
-  if (v_reduc_type == COND_REDUCTION)
+  if (reduction_type == COND_REDUCTION)
     {
       if (slp_node && SLP_TREE_LANES (slp_node) != 1)
         return false;
@@ -8038,7 +8038,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
 
   STMT_VINFO_REDUC_CODE (reduc_info) = orig_code;
 
-  vect_reduction_type reduction_type = STMT_VINFO_REDUC_TYPE (reduc_info);
+  reduction_type = STMT_VINFO_REDUC_TYPE (reduc_info);
   if (reduction_type == TREE_CODE_REDUCTION)
     {
       /* Check whether it's ok to change the order of the computation.
-- 
2.17.1

From patchwork Sun Jun 16 07:25:48 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Feng Xue OS X-Patchwork-Id: 1948265 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=os.amperecomputing.com header.i=@os.amperecomputing.com header.a=rsa-sha256 header.s=selector2 header.b=A9VbMU3u; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4W24Lr5ysNz20Pb for ; Sun, 16 Jun 2024 17:26:12 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 06F9C3857C4F for ; Sun, 16 Jun 2024 07:26:11 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from
NAM02-SN1-obe.outbound.protection.outlook.com (mail-sn1nam02on2070b.outbound.protection.outlook.com [IPv6:2a01:111:f400:7ea9::70b]) by sourceware.org (Postfix) with ESMTPS id 12FB13858D26 for ; Sun, 16 Jun 2024 07:25:52 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 12FB13858D26 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=os.amperecomputing.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=os.amperecomputing.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 12FB13858D26 Authentication-Results: server2.sourceware.org; arc=pass smtp.remote-ip=2a01:111:f400:7ea9::70b
ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=os.amperecomputing.com; dmarc=pass action=none header.from=os.amperecomputing.com; dkim=pass header.d=os.amperecomputing.com; arc=none Received: from LV2PR01MB7839.prod.exchangelabs.com (2603:10b6:408:14f::13) by CH7PR01MB9001.prod.exchangelabs.com (2603:10b6:610:24f::5) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7677.29; Sun, 16 Jun 2024 07:25:48 +0000 Received: from LV2PR01MB7839.prod.exchangelabs.com ([fe80::2ac3:5a77:36fd:9c63]) by LV2PR01MB7839.prod.exchangelabs.com ([fe80::2ac3:5a77:36fd:9c63%4]) with mapi id 15.20.7677.029; Sun, 16 Jun 2024 07:25:48 +0000 From: Feng Xue OS To: Richard Biener CC: "gcc-patches@gcc.gnu.org" Subject: [PATCH 4/8] vect: Determine input vectype for multiple lane-reducing Thread-Topic: [PATCH 4/8] vect: Determine input vectype for multiple lane-reducing Thread-Index: AQHav75YqlRgryS1j066PGFahXVsRA== Date: Sun, 16 Jun 2024 07:25:48 +0000
Message-ID: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: yes X-MS-TNEF-Correlator: msip_labels: MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_Enabled=True; MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_SiteId=3bc2b170-fd94-476d-b0ce-4229bdc904a7; MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_SetDate=2024-06-16T07:25:47.838Z; MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_Name=Confidential; MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_ContentBits=0; MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_Method=Standard; authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=os.amperecomputing.com; x-ms-publictraffictype: Email x-ms-traffictypediagnostic: LV2PR01MB7839:EE_|CH7PR01MB9001:EE_ x-ms-office365-filtering-correlation-id: 2278f9e8-e6d3-4f1a-d39e-08dc8dd58b26 x-ms-exchange-senderadcheck: 1 x-ms-exchange-antispam-relay: 0 x-microsoft-antispam: BCL:0; ARA:13230037|376011|1800799021|366013|38070700015; x-forefront-antispam-report: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:LV2PR01MB7839.prod.exchangelabs.com; PTR:; CAT:NONE; SFS:(13230037)(376011)(1800799021)(366013)(38070700015); DIR:OUT; SFP:1102; MIME-Version: 1.0 X-OriginatorOrg: os.amperecomputing.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-AuthSource: LV2PR01MB7839.prod.exchangelabs.com X-MS-Exchange-CrossTenant-Network-Message-Id: 2278f9e8-e6d3-4f1a-d39e-08dc8dd58b26 X-MS-Exchange-CrossTenant-originalarrivaltime: 16 Jun 2024 07:25:48.1277 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: 3bc2b170-fd94-476d-b0ce-4229bdc904a7 X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-CrossTenant-userprincipalname:
YZGcRFM2cDFCaKMlMCq5NpxxawyBthwl71sxwesGbaJ/4WYzk+vnoAgWLrTPjx9sqJJmScRUnn+LibpLNqvLpS+lQ5CqTsiOfGC4rHyTe3A= X-MS-Exchange-Transport-CrossTenantHeadersStamped: CH7PR01MB9001 X-Spam-Status: No, score=-12.0 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, GIT_PATCH_0, SPF_HELO_PASS, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org

The input vectype of the reduction PHI statement must be determined before
vect cost computation for the reduction.  Since a lane-reducing operation has
a different input vectype from a normal one, we need to traverse all reduction
statements to find the input vectype with the fewest lanes, and set that on
the PHI statement.

Thanks,
Feng
---
gcc/
	* tree-vect-loop.cc (vectorizable_reduction): Determine input vectype
	during traversal of reduction statements.
---
 gcc/tree-vect-loop.cc | 72 +++++++++++++++++++++++++++++--------------
 1 file changed, 49 insertions(+), 23 deletions(-)

From f9aa029eef44b65bd6b54f9cb236a4be71f8ad52 Mon Sep 17 00:00:00 2001
From: Feng Xue
Date: Sun, 16 Jun 2024 13:00:32 +0800
Subject: [PATCH 4/8] vect: Determine input vectype for multiple lane-reducing
 operations

The input vectype of the reduction PHI statement must be determined before
vect cost computation for the reduction.  Since a lane-reducing operation has
a different input vectype from a normal one, we need to traverse all reduction
statements to find the input vectype with the fewest lanes, and set that on
the PHI statement.

2024-06-16  Feng Xue

gcc/
	* tree-vect-loop.cc (vectorizable_reduction): Determine input vectype
	during traversal of reduction statements.
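As a rough illustration of the selection rule described in this message (not
GCC code), the sketch below picks, among the input vector types of several
lane-reducing operations, the one with the fewest lanes, and shows that this
choice needs the most vector copies.  The names VecType,
pick_phi_input_vectype and ncopies are hypothetical and exist only for this
example; it also assumes that all vector types have the same total width and
that the vectorization factor is a multiple of each lane count.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a vector type: total width and element width,
// both in bits.  This is not GCC's tree representation.
struct VecType {
  unsigned vector_bits;
  unsigned elem_bits;
  unsigned lanes() const { return vector_bits / elem_bits; }
};

// Return the input vector type with the fewest lanes (the widest element).
// Assumes op_inputs is non-empty.
VecType pick_phi_input_vectype(const std::vector<VecType>& op_inputs) {
  assert(!op_inputs.empty());
  VecType best = op_inputs.front();
  for (const VecType& vt : op_inputs)
    if (vt.lanes() < best.lanes())
      best = vt;
  return best;
}

// Number of vector statements needed to cover vf scalar iterations.
// Assumes vf is a multiple of the lane count.
unsigned ncopies(unsigned vf, const VecType& vt) {
  assert(vf % vt.lanes() == 0);
  return vf / vt.lanes();
}
```

With one operation reading 8-bit elements and another reading 16-bit elements
from 128-bit vectors, the 16-bit type has fewer lanes, so it would be the one
recorded for the PHI under this rule, and it requires more copies for the same
vectorization factor.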
---
 gcc/tree-vect-loop.cc | 72 +++++++++++++++++++++++++++++--------------
 1 file changed, 49 insertions(+), 23 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 0f7b125e72d..39aa5cb1197 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -7643,7 +7643,9 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
     {
       stmt_vec_info def = loop_vinfo->lookup_def (reduc_def);
       stmt_vec_info vdef = vect_stmt_to_vectorize (def);
-      if (STMT_VINFO_REDUC_IDX (vdef) == -1)
+      int reduc_idx = STMT_VINFO_REDUC_IDX (vdef);
+
+      if (reduc_idx == -1)
         {
           if (dump_enabled_p ())
             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -7686,10 +7688,50 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
               return false;
             }
         }
-      else if (!stmt_info)
-        /* First non-conversion stmt.  */
-        stmt_info = vdef;
-      reduc_def = op.ops[STMT_VINFO_REDUC_IDX (vdef)];
+      else
+        {
+          /* First non-conversion stmt.  */
+          if (!stmt_info)
+            stmt_info = vdef;
+
+          if (lane_reducing_op_p (op.code))
+            {
+              unsigned group_size = slp_node ? SLP_TREE_LANES (slp_node) : 0;
+              tree op_type = TREE_TYPE (op.ops[0]);
+              tree new_vectype_in = get_vectype_for_scalar_type (loop_vinfo,
+                                                                 op_type,
+                                                                 group_size);
+
+              /* The last operand of lane-reducing operation is for
+                 reduction.  */
+              gcc_assert (reduc_idx > 0 && reduc_idx == (int) op.num_ops - 1);
+
+              /* For lane-reducing operation vectorizable analysis needs the
+                 reduction PHI information */
+              STMT_VINFO_REDUC_DEF (def) = phi_info;
+
+              if (!new_vectype_in)
+                return false;
+
+              /* Each lane-reducing operation has its own input vectype, while
+                 reduction PHI will record the input vectype with the least
+                 lanes.  */
+              STMT_VINFO_REDUC_VECTYPE_IN (vdef) = new_vectype_in;
+
+              /* To accommodate lane-reducing operations of mixed input
+                 vectypes, choose input vectype with the least lanes for the
+                 reduction PHI statement, which would result in the most
+                 ncopies for vectorized reduction results.  */
+              if (!vectype_in
+                  || (GET_MODE_SIZE (SCALAR_TYPE_MODE (TREE_TYPE (vectype_in)))
+                      < GET_MODE_SIZE (SCALAR_TYPE_MODE (op_type))))
+                vectype_in = new_vectype_in;
+            }
+          else
+            vectype_in = STMT_VINFO_VECTYPE (phi_info);
+        }
+
+      reduc_def = op.ops[reduc_idx];
       reduc_chain_length++;
       if (!stmt_info && slp_node)
         slp_for_stmt_info = SLP_TREE_CHILDREN (slp_for_stmt_info)[0];
@@ -7747,6 +7789,8 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
 
   tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
   STMT_VINFO_REDUC_VECTYPE (reduc_info) = vectype_out;
+  STMT_VINFO_REDUC_VECTYPE_IN (reduc_info) = vectype_in;
+
   gimple_match_op op;
   if (!gimple_extract_op (stmt_info->stmt, &op))
     gcc_unreachable ();
@@ -7831,16 +7875,6 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
           = get_vectype_for_scalar_type (loop_vinfo,
                                          TREE_TYPE (op.ops[i]), slp_op[i]);
 
-      /* To properly compute ncopies we are interested in the widest
-         non-reduction input type in case we're looking at a widening
-         accumulation that we later handle in vect_transform_reduction.  */
-      if (lane_reducing
-          && vectype_op[i]
-          && (!vectype_in
-              || (GET_MODE_SIZE (SCALAR_TYPE_MODE (TREE_TYPE (vectype_in)))
-                  < GET_MODE_SIZE (SCALAR_TYPE_MODE (TREE_TYPE (vectype_op[i]))))))
-        vectype_in = vectype_op[i];
-
       /* Record how the non-reduction-def value of COND_EXPR is defined.
          ???  For a chain of multiple CONDs we'd have to match them up all.  */
       if (op.code == COND_EXPR && reduc_chain_length == 1)
@@ -7859,14 +7893,6 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
             }
         }
     }
-  if (!vectype_in)
-    vectype_in = STMT_VINFO_VECTYPE (phi_info);
-  STMT_VINFO_REDUC_VECTYPE_IN (reduc_info) = vectype_in;
-
-  /* Each lane-reducing operation has its own input vectype, while reduction
-     PHI records the input vectype with least lanes.  */
-  if (lane_reducing)
-    STMT_VINFO_REDUC_VECTYPE_IN (stmt_info) = vectype_in;
 
   enum vect_reduction_type reduction_type = STMT_VINFO_REDUC_TYPE (phi_info);
   STMT_VINFO_REDUC_TYPE (reduc_info) = reduction_type;
-- 
2.17.1

From patchwork Sun Jun 16 07:27:26 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Feng Xue OS X-Patchwork-Id: 1948266 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=os.amperecomputing.com header.i=@os.amperecomputing.com header.a=rsa-sha256 header.s=selector2 header.b=b5Rc2D+b; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4W24Nr6kslz20Pb for ; Sun, 16 Jun 2024 17:27:56 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id AC10D385C6C8 for ; Sun, 16 Jun 2024 07:27:51 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from NAM10-MW2-obe.outbound.protection.outlook.com (mail-mw2nam10on20710.outbound.protection.outlook.com [IPv6:2a01:111:f403:2412::710]) by sourceware.org (Postfix) with ESMTPS id 274233858D26 for ; Sun, 16 Jun 2024 07:27:31 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 274233858D26 Authentication-Results: sourceware.org;
dmarc=none (p=none dis=none) header.from=os.amperecomputing.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=os.amperecomputing.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 274233858D26 Authentication-Results: server2.sourceware.org; arc=pass smtp.remote-ip=2a01:111:f403:2412::710
ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=os.amperecomputing.com; dmarc=pass action=none header.from=os.amperecomputing.com; dkim=pass header.d=os.amperecomputing.com; arc=none Received: from LV2PR01MB7839.prod.exchangelabs.com (2603:10b6:408:14f::13) by SA1PR01MB8592.prod.exchangelabs.com (2603:10b6:806:383::7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7677.25; Sun, 16 Jun 2024 07:27:27 +0000 Received: from LV2PR01MB7839.prod.exchangelabs.com ([fe80::2ac3:5a77:36fd:9c63]) by LV2PR01MB7839.prod.exchangelabs.com ([fe80::2ac3:5a77:36fd:9c63%4]) with mapi id 15.20.7677.029; Sun, 16 Jun 2024 07:27:26 +0000 From: Feng Xue OS To: Richard Biener CC: "gcc-patches@gcc.gnu.org" Subject: [PATCH 5/8] vect: Use an array to replace 3 relevant variables Thread-Topic: [PATCH 5/8] vect: Use an array to replace 3 relevant variables Thread-Index: AQHav75/VIkSwHzYX0GZHg1+OGjcBw== Date: Sun, 16 Jun 2024 07:27:26 +0000 Message-ID: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: yes X-MS-TNEF-Correlator: msip_labels: MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_Enabled=True;
MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_SiteId=3bc2b170-fd94-476d-b0ce-4229bdc904a7; MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_SetDate=2024-06-16T07:27:26.074Z; MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_Name=Confidential; MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_ContentBits=0; MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_Method=Standard; authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=os.amperecomputing.com; x-ms-publictraffictype: Email x-ms-traffictypediagnostic: LV2PR01MB7839:EE_|SA1PR01MB8592:EE_ x-ms-office365-filtering-correlation-id: d4e10f95-a6f9-4d78-e823-08dc8dd5c5b1 x-ms-exchange-senderadcheck: 1 x-ms-exchange-antispam-relay: 0 x-microsoft-antispam: BCL:0; ARA:13230037|366013|376011|1800799021|38070700015; x-microsoft-antispam-message-info: =?iso-8859-1?q?f1cIL1wRn3yEV1wsNcYALyfhIP?= =?iso-8859-1?q?qNzl223u/ZHNVbWLGWbNhituror4/2/XE0p6wgQBvcieq7qffL0WBn6WiVPl?= =?iso-8859-1?q?ZVqmjwLb0YSk0vK624JtheRr0OT52EIeP4c5qnUlRWT4wV019MHgIMw1P2xH?= =?iso-8859-1?q?0BR8OQRtbd6r7A0TDQIdrkl5qpHicQrIKTpT9ns/pIgXgFNKEtzzMLOF9UhY?= =?iso-8859-1?q?i6lDYLVo0PqJyOa7UglSpt1U2fLLJNHTue/uQsEXA3ZiQ4Osm6b3s2ra1IxX?= =?iso-8859-1?q?BWnKsFaXhstJ4PrZKj8O9kMksTZzdUpBT7DGDpGdwBI7Yc4e0JMvsc5QGm99?= =?iso-8859-1?q?FhWwnMMAKm+SXqf/roqJiZT9MLTRsu2ISPP0Bh8YLfpU/wXF+H2ZFo9FfRml?= =?iso-8859-1?q?jXCbay4dLDCPN10jwZ8Fvv7x10aE+jauQX6ODXdisV5c4I0DY9E3IjbTwtQV?= =?iso-8859-1?q?YYyPeZIV7YbfYccJtVzF89dAYZHvQHKmit69ePHTIiMl4YsZcoevuAVo4MlX?= =?iso-8859-1?q?1/flNKnQwMySEcPgwstKtL6+GutPkXDNMiBziQi3i2FAke4XFjJKjEnRkpF5?= =?iso-8859-1?q?qcRLX6r+1zMQCQqNCllXVWue85RnlNW5m8Y1+u7+IFCAlyVxKBgBI7v9Ql73?= =?iso-8859-1?q?18KxfvAUR2/EQgmEfXvsNAQFY8oP4KjZwRgTgMqr40O4BwKnl/dFN9JX2ac/?= =?iso-8859-1?q?eaSU6luwSrhhYSjfz0OdXPaowBdR7ci5V/f6uhY1TfVlZRBaOAAVHhYEeTk5?= =?iso-8859-1?q?+jsF0pAp0hxn4XI5hEp73qL2stn/PogCmpza9u9KeQTrwdBu/GI+4FK/VqVi?= =?iso-8859-1?q?/7/52R929AqepXqHXco6Tmi7nbfOh3GLO7vfmSJ8kF1zH+CpOZrV+3AjMm/u?= 
=?iso-8859-1?q?hjaPP7XAFsrKF2/xR6axpv6pt+suZbVWDMGBNndLEIOvoPeySlDkb+pq0uTN?= =?iso-8859-1?q?0aFyrT24xexWFDWxy6lnRC+PiEEolsJcjd8EhFrh3L8Z79aj89myABmGcZ7j?= =?iso-8859-1?q?a6jc8BBW+cvT8Fj9N0IDZNn9DORQK7cVcHvx2Mpb8yuhltroCY6Vo9Vz/Dfb?= =?iso-8859-1?q?Exo88jKVlyjg1Mib14cfg+m36KguWcc3oIvn9ORROA8HxSl70pBcjillDG70?= =?iso-8859-1?q?n+Ou6t+eWkFqkLfQVhNuAMtZdChIqCW9KVjTiGHqU7lwAzKpNsksXqepN/7f?= =?iso-8859-1?q?wpbCO4hlwhPsG8qvTHBiQDEjuZxt0KAf3nmC+eSPx8K/3N/zBuj4FnZsMIc0?= =?iso-8859-1?q?fB0l/sRtsAAlCiCOWX56WZyT5BTEyvFoL4Z1mqvvqeQowkOOyclWgEe1UVrW?= =?iso-8859-1?q?lCLXmpNygIaNm1YXSjv3aBzRmT5hI1361dxu2y9eFzB35tJuCwmYb/ey3BXg?= =?iso-8859-1?q?YPHeUo+tBRTg0iHFgQDD+6mP7L3tcq7zNMNeW71zGZZED78TVDgLTuAWssBg?= =?iso-8859-1?q?G/2gEyc4CpXkLUV15z9FGJzDMA432oTypdFO9oqk2PRILk4HqPjHvkTbe2kY?= =?iso-8859-1?q?JASeg7qxAs2c1uddBO3XYyfj+3kRs1x51SxHWEOH5V8deukbYLlN+G1ULuIK?= =?iso-8859-1?q?afTdrYLGPjGkSuoGf7PKT/o4OTSziyUxON/MvzIx7BHU3G0Y5Bno/VsavBzg?= =?iso-8859-1?q?ilAmOOyg1ytgCVLwS8pJt9nWBv2SBW0CuN9irpSfDoQkHCohD4/VRbgIYQY2?= =?iso-8859-1?q?hHPPglY8ku9hJdCmMhlVOTZgDUGFCAvaLWUFRm079KW8pii739oI6lUiZnPn?= =?iso-8859-1?q?Ly6PhTP6p4XJwoWbC?= MIME-Version: 1.0 X-OriginatorOrg: os.amperecomputing.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-AuthSource: LV2PR01MB7839.prod.exchangelabs.com X-MS-Exchange-CrossTenant-Network-Message-Id: d4e10f95-a6f9-4d78-e823-08dc8dd5c5b1 X-MS-Exchange-CrossTenant-originalarrivaltime: 16 Jun 2024 07:27:26.3779 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: 3bc2b170-fd94-476d-b0ce-4229bdc904a7 X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-CrossTenant-userprincipalname: 8TFnq1SSF7DU5LJttYH5zE2iFxXfzYu/jartZQ+cGmFR3EhzFLgtsDik2uEUOzkBMGGIEMWJAentU+duUpRYlzhvV2wNj0OjwSPmx/NAsVk= X-MS-Exchange-Transport-CrossTenantHeadersStamped: SA1PR01MB8592 X-Spam-Status: No, score=-12.1 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_PASS, 
SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.30
Precedence: list
List-Id: Gcc-patches mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org

It's better to place the 3 relevant independent variables into an array, since
the following patch needs to access them via an index. At the same time, this
change makes some duplicated code more compact.

Thanks,
Feng

---
gcc/
        * tree-vect-loop.cc (vect_transform_reduction): Replace vec_oprnds0/1/2
        with one new array variable vec_oprnds[3].
---
 gcc/tree-vect-loop.cc | 42 +++++++++++++++++-------------------------
 1 file changed, 17 insertions(+), 25 deletions(-)

From 168a55952ae317fca34af55d025c1235b4ff34b5 Mon Sep 17 00:00:00 2001
From: Feng Xue
Date: Sun, 16 Jun 2024 13:21:13 +0800
Subject: [PATCH 5/8] vect: Use an array to replace 3 relevant variables

It's better to place the 3 relevant independent variables into an array, since
the following patch needs to access them via an index. At the same time, this
change makes some duplicated code more compact.

2024-06-16 Feng Xue

gcc/
        * tree-vect-loop.cc (vect_transform_reduction): Replace vec_oprnds0/1/2
        with one new array variable vec_oprnds[3].
---
 gcc/tree-vect-loop.cc | 42 +++++++++++++++++-------------------------
 1 file changed, 17 insertions(+), 25 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 39aa5cb1197..7909d63d4df 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -8605,9 +8605,7 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
 
   /* Transform.  */
   tree new_temp = NULL_TREE;
-  auto_vec<tree> vec_oprnds0;
-  auto_vec<tree> vec_oprnds1;
-  auto_vec<tree> vec_oprnds2;
+  auto_vec<tree> vec_oprnds[3];
 
   if (dump_enabled_p ())
     dump_printf_loc (MSG_NOTE, vect_location, "transform reduction.\n");
@@ -8657,12 +8655,12 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
     {
       vect_get_vec_defs (loop_vinfo, stmt_info, slp_node, ncopies,
                          single_defuse_cycle && reduc_index == 0
-                         ? NULL_TREE : op.ops[0], &vec_oprnds0,
+                         ? NULL_TREE : op.ops[0], &vec_oprnds[0],
                          single_defuse_cycle && reduc_index == 1
-                         ? NULL_TREE : op.ops[1], &vec_oprnds1,
+                         ? NULL_TREE : op.ops[1], &vec_oprnds[1],
                          op.num_ops == 3
                          && !(single_defuse_cycle && reduc_index == 2)
-                         ? op.ops[2] : NULL_TREE, &vec_oprnds2);
+                         ? op.ops[2] : NULL_TREE, &vec_oprnds[2]);
     }
   else
     {
@@ -8670,12 +8668,12 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
          vectype.  */
       gcc_assert (single_defuse_cycle
                   && (reduc_index == 1 || reduc_index == 2));
-      vect_get_vec_defs (loop_vinfo, stmt_info, slp_node, ncopies,
-                         op.ops[0], truth_type_for (vectype_in), &vec_oprnds0,
+      vect_get_vec_defs (loop_vinfo, stmt_info, slp_node, ncopies, op.ops[0],
+                         truth_type_for (vectype_in), &vec_oprnds[0],
                          reduc_index == 1 ? NULL_TREE : op.ops[1],
-                         NULL_TREE, &vec_oprnds1,
+                         NULL_TREE, &vec_oprnds[1],
                          reduc_index == 2 ? NULL_TREE : op.ops[2],
-                         NULL_TREE, &vec_oprnds2);
+                         NULL_TREE, &vec_oprnds[2]);
     }
 
   /* For single def-use cycles get one copy of the vectorized reduction
@@ -8683,20 +8681,21 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
   if (single_defuse_cycle)
     {
       vect_get_vec_defs (loop_vinfo, stmt_info, slp_node, 1,
-                         reduc_index == 0 ? op.ops[0] : NULL_TREE, &vec_oprnds0,
-                         reduc_index == 1 ? op.ops[1] : NULL_TREE, &vec_oprnds1,
+                         reduc_index == 0 ? op.ops[0] : NULL_TREE,
+                         &vec_oprnds[0],
+                         reduc_index == 1 ? op.ops[1] : NULL_TREE,
+                         &vec_oprnds[1],
                          reduc_index == 2 ? op.ops[2] : NULL_TREE,
-                         &vec_oprnds2);
+                         &vec_oprnds[2]);
     }
 
   bool emulated_mixed_dot_prod = vect_is_emulated_mixed_dot_prod (stmt_info);
+  unsigned num = vec_oprnds[reduc_index == 0 ? 1 : 0].length ();
 
-  unsigned num = (reduc_index == 0
-                  ? vec_oprnds1.length () : vec_oprnds0.length ());
   for (unsigned i = 0; i < num; ++i)
     {
       gimple *new_stmt;
-      tree vop[3] = { vec_oprnds0[i], vec_oprnds1[i], NULL_TREE };
+      tree vop[3] = { vec_oprnds[0][i], vec_oprnds[1][i], NULL_TREE };
       if (masked_loop_p && !mask_by_cond_expr)
         {
           /* No conditional ifns have been defined for dot-product yet.  */
@@ -8721,7 +8720,7 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
           else
             {
               if (op.num_ops >= 3)
-                vop[2] = vec_oprnds2[i];
+                vop[2] = vec_oprnds[2][i];
 
               if (masked_loop_p && mask_by_cond_expr)
                 {
@@ -8752,14 +8751,7 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
         }
 
       if (single_defuse_cycle && i < num - 1)
-        {
-          if (reduc_index == 0)
-            vec_oprnds0.safe_push (gimple_get_lhs (new_stmt));
-          else if (reduc_index == 1)
-            vec_oprnds1.safe_push (gimple_get_lhs (new_stmt));
-          else if (reduc_index == 2)
-            vec_oprnds2.safe_push (gimple_get_lhs (new_stmt));
-        }
+        vec_oprnds[reduc_index].safe_push (gimple_get_lhs (new_stmt));
       else if (slp_node)
         slp_node->push_vec_def (new_stmt);
       else
--
2.17.1

From patchwork Sun Jun 16 07:28:41 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Feng Xue OS
X-Patchwork-Id: 1948267
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@legolas.ozlabs.org
Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=os.amperecomputing.com header.i=@os.amperecomputing.com header.a=rsa-sha256 header.s=selector2 header.b=IpHzoklc; dkim-atps=neutral
Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=server2.sourceware.org;
envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4W24QB1dLQz20Pb for ; Sun, 16 Jun 2024 17:29:06 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 71C4B385C6C7 for ; Sun, 16 Jun 2024 07:29:04 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from NAM10-MW2-obe.outbound.protection.outlook.com (mail-mw2nam10on20728.outbound.protection.outlook.com [IPv6:2a01:111:f403:2412::728]) by sourceware.org (Postfix) with ESMTPS id A85273858CDA for ; Sun, 16 Jun 2024 07:28:44 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org A85273858CDA Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=os.amperecomputing.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=os.amperecomputing.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org A85273858CDA Authentication-Results: server2.sourceware.org; arc=pass smtp.remote-ip=2a01:111:f403:2412::728 ARC-Seal: i=2; a=rsa-sha256; d=sourceware.org; s=key; t=1718522926; cv=pass; b=DHjYfMjRHXygv2vnUtsKC6GGiQ7iJWSULYm/bn839osJv0TMgnVv67kaNtvsS5/rknKOby0cc0qlkGAaiBHsxvtb5AbVxKNBdwSmwUJSKuGqUBC8DLnYS7/KN1Xwk45bA6sCfBamPORV85YGQ3H4Hs6EdU8F2rzBs+zEumVSa94= ARC-Message-Signature: i=2; a=rsa-sha256; d=sourceware.org; s=key; t=1718522926; c=relaxed/simple; bh=T0KMC0U299jhAddZo7Om+yWBt1skA4vf7JDYtFA4msQ=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; 
b=lpqs4VyG8JB5EyrZzEcNYtZNOZJawufOFFUqdRqglDvMJBzYaiul15ZecbCa6JsEMx7+vMIYWMKt9eIXoiLGwDmsqePObq8oT3gIEHi2y8DGkYVNf/fiIhV99MV3ZL9gJyvPyJQZPfG6R4L2AyZCIEtb5F2SGG0nEjQshfxkz28= ARC-Authentication-Results: i=2; server2.sourceware.org ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=Pc5tSP5UKIDViKO2GayXmnpvR65hfGgB5GjZBmcstjWzCnQQmpC/DwSHc4kY1v3alxjeRmYD2xixhWjm4HTqZdKfv/D/hZWkMKDR5BwjEpgCkLRhK8ORIoWHQLQwMJ/w2hB07W3QcGNZUuBqUk/zmpjJHwTcnC1oQpm604mb8a1L5mOF8AVIRcb0+JRqJChp7Z25kH74yBx2DxUDLMI5LuWGdE0BccNhjE0Hp6fSeIPGYqFxd14uOZYsd2B424AqfZ471Ka2vy3DuGnrEvgsLzcIjjhTYpnKoi1S377xZeqRLTqc0I8pdqbYLqHhZ83CrB3XgcRpMkfGr0WSR8PzVA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=UdZp8SrxS1iZyGZpYwtjcp1SQP25fizcyNYwuTmvH/U=; b=GnL2khL5JTPJq+mY5zA6DVDfu9j/YAwzX9nD6BZwsCRQABSOnZ+M014zIKxvr3dFLlmHQ+ep8q57lghlbnZQvPVWxFLZ9BxMd6KP/SD58qzzNIpr37IS97I2Orp3GOUtyLmIWeJzDDzZpjaOtkgn277vwECVcCDkszqFVPU3B0Zf+fT58oaIvcn6Vx9R4abw2t/YnWuKFpAIh2Onp3d/moEi2Y/svncR0x/opMH+Fp2QYw26Ta/3/IahbHHaG6XHgJAdXSmt5Hi6o1FTGhCtXv1pYN87xKK7b9ST6sXS0NpWOyXZ6JM6tBz5ReB8IRBpH0vqEsC9jqyao5OQ2t0BqA== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=os.amperecomputing.com; dmarc=pass action=none header.from=os.amperecomputing.com; dkim=pass header.d=os.amperecomputing.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=os.amperecomputing.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=UdZp8SrxS1iZyGZpYwtjcp1SQP25fizcyNYwuTmvH/U=; b=IpHzoklcUw4NmIiJt/ThrJ6ojjRL2WR7HxE8lx+Jy4cke25a4YJB13QqDUXKFLXKePOVxK72NZFeDKoYhQKhKtVqUO7X5e3xUoovMMCm5OvN+2U3DI0qsv/RZsP9gfDApP74LcboubKMiD7bG/Dy6WQjlfmTFZMxtURGRL5YipE= Received: from LV2PR01MB7839.prod.exchangelabs.com 
(2603:10b6:408:14f::13) by SA1PR01MB8592.prod.exchangelabs.com (2603:10b6:806:383::7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7677.25; Sun, 16 Jun 2024 07:28:41 +0000 Received: from LV2PR01MB7839.prod.exchangelabs.com ([fe80::2ac3:5a77:36fd:9c63]) by LV2PR01MB7839.prod.exchangelabs.com ([fe80::2ac3:5a77:36fd:9c63%4]) with mapi id 15.20.7677.029; Sun, 16 Jun 2024 07:28:41 +0000 From: Feng Xue OS To: Richard Biener CC: "gcc-patches@gcc.gnu.org" Subject: [PATCH 6/8] vect: Tighten an assertion for lane-reducing in transform Thread-Topic: [PATCH 6/8] vect: Tighten an assertion for lane-reducing in transform Thread-Index: AQHav763B3aXyBkSJ0WCVeBwLMcO8w== Date: Sun, 16 Jun 2024 07:28:41 +0000 Message-ID: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: yes X-MS-TNEF-Correlator: msip_labels: MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_Enabled=True; MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_SiteId=3bc2b170-fd94-476d-b0ce-4229bdc904a7; MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_SetDate=2024-06-16T07:28:41.075Z; MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_Name=Confidential; MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_ContentBits=0; MSIP_Label_5b82cb1d-c2e0-4643-920a-bbe7b2d7cc47_Method=Standard; authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=os.amperecomputing.com; x-ms-publictraffictype: Email x-ms-traffictypediagnostic: LV2PR01MB7839:EE_|SA1PR01MB8592:EE_ x-ms-office365-filtering-correlation-id: 48250ff8-c83d-44e4-9f86-08dc8dd5f263 x-ms-exchange-senderadcheck: 1 x-ms-exchange-antispam-relay: 0 x-microsoft-antispam: BCL:0; ARA:13230037|1800799021|366013|376011|38070700015; x-microsoft-antispam-message-info: =?iso-8859-1?q?F1FuJ7YnkeXe01xDTMXpYFEOWI?= =?iso-8859-1?q?Wdd7XkGqjyWGXXm20bhbPz208PM/u8iWucaSuI03QU2TUrpNMFPmqhopuWPq?= =?iso-8859-1?q?faa3drPmIdiCxEph5i0fHlbdj1aERXK/mHW61bDaPVFjYU3Pj69eblhiqQh0?= 
=?iso-8859-1?q?gIOmzkR5s8LIZi3UEtTFflgd9Bp7HAeg3dkZFEIXX3qsBixMm4AJXef3Eh7/?= =?iso-8859-1?q?v8ifYe4u5Fjmhj7jD?=
MIME-Version: 1.0
X-OriginatorOrg: os.amperecomputing.com
X-MS-Exchange-CrossTenant-AuthAs: Internal
X-MS-Exchange-CrossTenant-AuthSource: LV2PR01MB7839.prod.exchangelabs.com
X-MS-Exchange-CrossTenant-Network-Message-Id: 48250ff8-c83d-44e4-9f86-08dc8dd5f263
X-MS-Exchange-CrossTenant-originalarrivaltime: 16 Jun 2024 07:28:41.3303 (UTC)
X-MS-Exchange-CrossTenant-fromentityheader: Hosted
X-MS-Exchange-CrossTenant-id: 3bc2b170-fd94-476d-b0ce-4229bdc904a7
X-MS-Exchange-CrossTenant-mailboxtype: HOSTED
X-MS-Exchange-CrossTenant-userprincipalname: nD2WEXfvo4ZsBX0POUkLQoQpPd75N3ylte3DZ3LOUwQpoQYhCxI2A2tk+3q9SaObScyqpDZEc7LN50BHwswlJlWYU6NyHQo1ZIJrtF8wmqw=
X-MS-Exchange-Transport-CrossTenantHeadersStamped: SA1PR01MB8592
X-Spam-Status: No, score=-12.1 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, GIT_PATCH_0, SPF_HELO_PASS, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.30
Precedence: list
List-Id: Gcc-patches mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org

According to the logic of the code near the assertion, no lane-reducing
operation should appear there, not just DOT_PROD_EXPR:
"use_mask_by_cond_expr_p" treats SAD_EXPR the same as DOT_PROD_EXPR, and
WIDEN_SUM_EXPR is already ruled out by the following assertion
"gcc_assert (commutative_binary_op_p (...))". So tighten the assertion.

Thanks,
Feng

---
gcc/
        * tree-vect-loop.cc (vect_transform_reduction): Change assertion to
        cover all lane-reducing ops.
---
 gcc/tree-vect-loop.cc | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

From d348e63c001e65067876a80dfae75abefe10c240 Mon Sep 17 00:00:00 2001
From: Feng Xue
Date: Sun, 16 Jun 2024 13:33:52 +0800
Subject: [PATCH 6/8] vect: Tighten an assertion for lane-reducing in transform

According to the logic of the code near the assertion, no lane-reducing
operation should appear there, not just DOT_PROD_EXPR:
"use_mask_by_cond_expr_p" treats SAD_EXPR the same as DOT_PROD_EXPR, and
WIDEN_SUM_EXPR is already ruled out by the following assertion
"gcc_assert (commutative_binary_op_p (...))". So tighten the assertion.

2024-06-16 Feng Xue

gcc/
        * tree-vect-loop.cc (vect_transform_reduction): Change assertion to
        cover all lane-reducing ops.
---
 gcc/tree-vect-loop.cc | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 7909d63d4df..e0561feddce 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -8643,7 +8643,8 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
     }
 
   bool single_defuse_cycle = STMT_VINFO_FORCE_SINGLE_CYCLE (reduc_info);
-  gcc_assert (single_defuse_cycle || lane_reducing_op_p (code));
+  bool lane_reducing = lane_reducing_op_p (code);
+  gcc_assert (single_defuse_cycle || lane_reducing);
 
   /* Create the destination vector  */
   tree scalar_dest = gimple_get_lhs (stmt_info->stmt);
@@ -8698,8 +8699,9 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
       tree vop[3] = { vec_oprnds[0][i], vec_oprnds[1][i], NULL_TREE };
       if (masked_loop_p && !mask_by_cond_expr)
         {
-          /* No conditional ifns have been defined for dot-product yet.  */
-          gcc_assert (code != DOT_PROD_EXPR);
+          /* No conditional ifns have been defined for lane-reducing op
+             yet.  */
+          gcc_assert (!lane_reducing);
 
           /* Make sure that the reduction accumulator is vop[0].  */
           if (reduc_index == 1)
--
2.17.1

From patchwork Sun Jun 16 07:31:08 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Feng Xue OS
X-Patchwork-Id: 1948268
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@legolas.ozlabs.org
Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=os.amperecomputing.com header.i=@os.amperecomputing.com header.a=rsa-sha256 header.s=selector2 header.b=LVIZ7JT8; dkim-atps=neutral
Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org)
Received: from server2.sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4W24TG03Lzz20Wb for ; Sun, 16 Jun 2024 17:31:45 +1000 (AEST)
Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 02346385C6C7 for ; Sun, 16 Jun 2024 07:31:44 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from NAM10-MW2-obe.outbound.protection.outlook.com (mail-mw2nam10on20715.outbound.protection.outlook.com [IPv6:2a01:111:f403:2412::715]) by sourceware.org (Postfix) with ESMTPS id 98B773858D28 for ; Sun, 16 Jun 2024 07:31:13 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 98B773858D28
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=os.amperecomputing.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=os.amperecomputing.com
ARC-Filter: OpenARC Filter v1.0.0
sourceware.org 98B773858D28 Authentication-Results: server2.sourceware.org; arc=pass smtp.remote-ip=2a01:111:f403:2412::715 ARC-Seal: i=2; a=rsa-sha256; d=sourceware.org; s=key; t=1718523082; cv=pass; b=auFscrV79zSQ5iDIBVnXnzDlq+PbgLceluC4qpiO4IqGtjBv6iIz4fSnbtuFhDjB6sFFlbqUN5du6LuEnsEvSIHFUM/UMvOPHfLNrwLkVcDiyuHKw838F7DzsyLxDV43BtHyj8U5imbQnMZSu6c1bW2ejy0M7PPIegli1XSA1wY= ARC-Message-Signature: i=2; a=rsa-sha256; d=sourceware.org; s=key; t=1718523082; c=relaxed/simple; bh=uoWEqwJGV01vnKScU+QUNR+ptI3Fu/28Ln7WF0fkhos=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=UafN6/6mbTA5RzusPIJjWCigjNFIa2wWzQVyadsm4cxr91zSUQvUEVObzkBFB6gQP1hcAlFhgh1hn6ZK46jntbKuUIISYppxjYi28/hx12On3NGfLfjFaPzw//BP427AS1DiKzUGPhImkiOhDFroEalC1K4mBfYvakdPn3qwmRA= ARC-Authentication-Results: i=2; server2.sourceware.org ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=U1lhsydz/DbV7cCzDrTIpXuVHcmi3ohaxJ2jkcdWq4MPWLsAPOSuKoQl7Iw1NF1x+ny/Xj5NwH6OEUEw0cPcZvOVrMG30nDz6L+8WMgBvGC9ugFPhMhWXTsVbC/N6Smy00ZBduAe+bQ4ue45V0Oy5dWUhbnxpXkb8vEdSKVuApUTYlhXwB5Q1b9pGLDEjRZOkxzedx3vHTbcmHLkUlDiX7vyrJG5LO6UgJWxmYBLynZ1PS4/ptD0W9OQ4UHGdp7X1YD5HCdw7+/PSB6uHBg9b5rfsEmFTTIC+Qzb4w2P6kaL3fSQYBvvwb8/r8hLwSi0GufQT4fLX6auJyBA/FxEew== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=TpFOt3D/MWrVap4DeLYjDQJaE8sE5t8L4lEx86MbbWE=; b=ZcGKF9dhE/q9iOht6M/MDhrDQoY77luwooKQDYVZZN0ubw+tBTotvgeiX9788/e2hq2Fk38Nnz/2iR3bWkb9lgP2Zg1YtLaK5M8EqNlByfQPJnlJg2Wx8SaKSFfWi2U93T7d88TOXgeCAUTuuPr/WwpICEGYW5mEegW7NK32ysSvxCV5ztltrlrXqTwVCbbqK4R9jdw1ma74LggMbRoQknvhFINdHvrO6O0wHTvCe0fROiA7d/fH8hXq6be/1puZ9VEqIBE0J6kaXC3qUc3uGSFDBgwYBXSInoCwol3WzM+u1wFUtbV0smBSO4eUz04efbZbA43Crr6X57ijl8OEIg== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass 
smtp.mailfrom=os.amperecomputing.com; dmarc=pass action=none header.from=os.amperecomputing.com; dkim=pass header.d=os.amperecomputing.com; arc=none
From: Feng Xue OS
To: Richard Biener
CC: "gcc-patches@gcc.gnu.org"
Subject: [PATCH 7/8] vect: Support multiple lane-reducing operations for loop reduction [PR114440]
Date: Sun, 16 Jun 2024 07:31:08 +0000
MIME-Version: 1.0
List-Id: Gcc-patches mailing list

For lane-reducing
operation (dot-prod/widen-sum/sad) in a loop reduction, the current vectorizer
can only handle the pattern if the reduction chain does not contain any other
operation, whether normal or lane-reducing.

To allow multiple arbitrary lane-reducing operations, we need to support
vectorization of a loop reduction chain with mixed input vectypes. Since the
number of lanes of a vectype may vary with the operation, the effective ncopies
of the vectorized statements may also differ from one operation to another,
which causes a mismatch on the vectorized def-use cycles. A simple way to
handle this is to align all operations with the one that has the most ncopies;
the gap is filled by generating extra trivial pass-through copies. For example:

   int sum = 0;
   for (i)
     {
       sum += d0[i] * d1[i];      // dot-prod
       sum += w[i];               // widen-sum
       sum += abs(s0[i] - s1[i]); // sad
       sum += n[i];               // normal
     }

The vector size is 128-bit and the vectorization factor is 16. Reduction
statements would be transformed as:

   vector<4> int sum_v0 = { 0, 0, 0, 0 };
   vector<4> int sum_v1 = { 0, 0, 0, 0 };
   vector<4> int sum_v2 = { 0, 0, 0, 0 };
   vector<4> int sum_v3 = { 0, 0, 0, 0 };

   for (i / 16)
     {
       sum_v0 = DOT_PROD (d0_v0[i: 0 ~ 15], d1_v0[i: 0 ~ 15], sum_v0);
       sum_v1 = sum_v1;  // copy
       sum_v2 = sum_v2;  // copy
       sum_v3 = sum_v3;  // copy

       sum_v0 = WIDEN_SUM (w_v0[i: 0 ~ 15], sum_v0);
       sum_v1 = sum_v1;  // copy
       sum_v2 = sum_v2;  // copy
       sum_v3 = sum_v3;  // copy

       sum_v0 = SAD (s0_v0[i: 0 ~ 7 ], s1_v0[i: 0 ~ 7 ], sum_v0);
       sum_v1 = SAD (s0_v1[i: 8 ~ 15], s1_v1[i: 8 ~ 15], sum_v1);
       sum_v2 = sum_v2;  // copy
       sum_v3 = sum_v3;  // copy

       sum_v0 += n_v0[i: 0 ~ 3 ];
       sum_v1 += n_v1[i: 4 ~ 7 ];
       sum_v2 += n_v2[i: 8 ~ 11];
       sum_v3 += n_v3[i: 12 ~ 15];
     }

Thanks,
Feng

From 67045272c75c3016c33cb87f893ce4cd3a8374a0 Mon Sep 17 00:00:00 2001
From: Feng Xue
Date: Wed, 29 May 2024 17:22:36 +0800
Subject: [PATCH 7/8] vect: Support multiple lane-reducing operations for loop reduction [PR114440]

2024-03-22  Feng Xue

gcc/
	PR tree-optimization/114440
	* tree-vectorizer.h (vectorizable_lane_reducing): New function
	declaration.
	* tree-vect-stmts.cc (vect_analyze_stmt): Call new function
	vectorizable_lane_reducing to analyze lane-reducing operation.
	* tree-vect-loop.cc (vect_model_reduction_cost): Remove cost computation
	code related to emulated_mixed_dot_prod.
	(vect_reduction_update_partial_vector_usage): Compute ncopies as the
	original means for single-lane slp node.
	(vectorizable_lane_reducing): New function.
	(vectorizable_reduction): Allow multiple lane-reducing operations in
	loop reduction. Move some original lane-reducing related code to
	vectorizable_lane_reducing.
	(vect_transform_reduction): Extend transformation to support reduction
	statements with mixed input vectypes.

gcc/testsuite/
	PR tree-optimization/114440
	* gcc.dg/vect/vect-reduc-chain-1.c
	* gcc.dg/vect/vect-reduc-chain-2.c
	* gcc.dg/vect/vect-reduc-chain-3.c
	* gcc.dg/vect/vect-reduc-chain-dot-slp-1.c
	* gcc.dg/vect/vect-reduc-chain-dot-slp-2.c
	* gcc.dg/vect/vect-reduc-chain-dot-slp-3.c
	* gcc.dg/vect/vect-reduc-chain-dot-slp-4.c
	* gcc.dg/vect/vect-reduc-dot-slp-1.c
---
 .../gcc.dg/vect/vect-reduc-chain-1.c          |  62 ++++
 .../gcc.dg/vect/vect-reduc-chain-2.c          |  77 +++++
 .../gcc.dg/vect/vect-reduc-chain-3.c          |  66 ++++
 .../gcc.dg/vect/vect-reduc-chain-dot-slp-1.c  |  95 +++++
 .../gcc.dg/vect/vect-reduc-chain-dot-slp-2.c  |  67 ++++
 .../gcc.dg/vect/vect-reduc-chain-dot-slp-3.c  |  79 +++++
 .../gcc.dg/vect/vect-reduc-chain-dot-slp-4.c  |  63 ++++
 .../gcc.dg/vect/vect-reduc-dot-slp-1.c        |  35 ++
 gcc/tree-vect-loop.cc                         | 324 ++++++++++++++----
 gcc/tree-vect-stmts.cc                        |   2 +
 gcc/tree-vectorizer.h                         |   2 +
 11 files changed, 802 insertions(+), 70 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-chain-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-chain-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-chain-3.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-chain-dot-slp-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-chain-dot-slp-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-chain-dot-slp-3.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-chain-dot-slp-4.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-dot-slp-1.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-chain-1.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-chain-1.c
new file mode 100644
index 00000000000..04bfc419dbd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-chain-1.c
@@ -0,0 +1,62 @@
+/* Disabling epilogues until we find a better way to deal with scans.
*/ +/* { dg-additional-options "--param vect-epilogues-nomask=0" } */ +/* { dg-require-effective-target vect_int } */ +/* { dg-require-effective-target arm_v8_2a_dotprod_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */ +/* { dg-add-options arm_v8_2a_dotprod_neon } */ + +#include "tree-vect.h" + +#define N 50 + +#ifndef SIGNEDNESS_1 +#define SIGNEDNESS_1 signed +#define SIGNEDNESS_2 signed +#endif + +SIGNEDNESS_1 int __attribute__ ((noipa)) +f (SIGNEDNESS_1 int res, + SIGNEDNESS_2 char *restrict a, + SIGNEDNESS_2 char *restrict b, + SIGNEDNESS_2 char *restrict c, + SIGNEDNESS_2 char *restrict d, + SIGNEDNESS_1 int *restrict e) +{ + for (int i = 0; i < N; ++i) + { + res += a[i] * b[i]; + res += c[i] * d[i]; + res += e[i]; + } + return res; +} + +#define BASE ((SIGNEDNESS_2 int) -1 < 0 ? -126 : 4) +#define OFFSET 20 + +int +main (void) +{ + check_vect (); + + SIGNEDNESS_2 char a[N], b[N]; + SIGNEDNESS_2 char c[N], d[N]; + SIGNEDNESS_1 int e[N]; + int expected = 0x12345; + for (int i = 0; i < N; ++i) + { + a[i] = BASE + i * 5; + b[i] = BASE + OFFSET + i * 4; + c[i] = BASE + i * 2; + d[i] = BASE + OFFSET + i * 3; + e[i] = i; + asm volatile ("" ::: "memory"); + expected += a[i] * b[i]; + expected += c[i] * d[i]; + expected += e[i]; + } + if (f (0x12345, a, b, c, d, e) != expected) + __builtin_abort (); +} + +/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */ +/* { dg-final { scan-tree-dump-times "vectorizing statement: \\S+ = DOT_PROD_EXPR" 2 "vect" { target vect_sdot_qi } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-chain-2.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-chain-2.c new file mode 100644 index 00000000000..6c803b80120 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-chain-2.c @@ -0,0 +1,77 @@ +/* Disabling epilogues until we find a better way to deal with scans. 
*/ +/* { dg-additional-options "--param vect-epilogues-nomask=0" } */ +/* { dg-require-effective-target vect_int } */ +/* { dg-require-effective-target arm_v8_2a_dotprod_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */ +/* { dg-add-options arm_v8_2a_dotprod_neon } */ + +#include "tree-vect.h" + +#define N 50 + +#ifndef SIGNEDNESS_1 +#define SIGNEDNESS_1 signed +#define SIGNEDNESS_2 unsigned +#define SIGNEDNESS_3 signed +#define SIGNEDNESS_4 signed +#endif + +SIGNEDNESS_1 int __attribute__ ((noipa)) +fn (SIGNEDNESS_1 int res, + SIGNEDNESS_2 char *restrict a, + SIGNEDNESS_2 char *restrict b, + SIGNEDNESS_3 char *restrict c, + SIGNEDNESS_3 char *restrict d, + SIGNEDNESS_4 short *restrict e, + SIGNEDNESS_4 short *restrict f, + SIGNEDNESS_1 int *restrict g) +{ + for (int i = 0; i < N; ++i) + { + res += a[i] * b[i]; + res += i + 1; + res += c[i] * d[i]; + res += e[i] * f[i]; + res += g[i]; + } + return res; +} + +#define BASE2 ((SIGNEDNESS_2 int) -1 < 0 ? -126 : 4) +#define BASE3 ((SIGNEDNESS_3 int) -1 < 0 ? -126 : 4) +#define BASE4 ((SIGNEDNESS_4 int) -1 < 0 ? 
-1026 : 373) +#define OFFSET 20 + +int +main (void) +{ + check_vect (); + + SIGNEDNESS_2 char a[N], b[N]; + SIGNEDNESS_3 char c[N], d[N]; + SIGNEDNESS_4 short e[N], f[N]; + SIGNEDNESS_1 int g[N]; + int expected = 0x12345; + for (int i = 0; i < N; ++i) + { + a[i] = BASE2 + i * 5; + b[i] = BASE2 + OFFSET + i * 4; + c[i] = BASE3 + i * 2; + d[i] = BASE3 + OFFSET + i * 3; + e[i] = BASE4 + i * 6; + f[i] = BASE4 + OFFSET + i * 5; + g[i] = i; + asm volatile ("" ::: "memory"); + expected += a[i] * b[i]; + expected += i + 1; + expected += c[i] * d[i]; + expected += e[i] * f[i]; + expected += g[i]; + } + if (fn (0x12345, a, b, c, d, e, f, g) != expected) + __builtin_abort (); +} + +/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */ +/* { dg-final { scan-tree-dump "vectorizing statement: \\S+ = DOT_PROD_EXPR" "vect" { target { vect_sdot_qi } } } } */ +/* { dg-final { scan-tree-dump "vectorizing statement: \\S+ = DOT_PROD_EXPR" "vect" { target { vect_udot_qi } } } } */ +/* { dg-final { scan-tree-dump "vectorizing statement: \\S+ = DOT_PROD_EXPR" "vect" { target { vect_sdot_hi } } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-chain-3.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-chain-3.c new file mode 100644 index 00000000000..a41e4b176c4 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-chain-3.c @@ -0,0 +1,66 @@ +/* Disabling epilogues until we find a better way to deal with scans. 
*/ +/* { dg-additional-options "--param vect-epilogues-nomask=0" } */ +/* { dg-require-effective-target vect_int } */ + +#include "tree-vect.h" + +#define N 50 + +#ifndef SIGNEDNESS_1 +#define SIGNEDNESS_1 signed +#define SIGNEDNESS_2 unsigned +#define SIGNEDNESS_3 signed +#endif + +SIGNEDNESS_1 int __attribute__ ((noipa)) +f (SIGNEDNESS_1 int res, + SIGNEDNESS_2 char *restrict a, + SIGNEDNESS_2 char *restrict b, + SIGNEDNESS_3 short *restrict c, + SIGNEDNESS_3 short *restrict d, + SIGNEDNESS_1 int *restrict e) +{ + for (int i = 0; i < N; ++i) + { + short diff = a[i] - b[i]; + SIGNEDNESS_2 short abs = diff < 0 ? -diff : diff; + res += abs; + res += c[i] * d[i]; + res += e[i]; + } + return res; +} + +#define BASE2 ((SIGNEDNESS_2 int) -1 < 0 ? -126 : 4) +#define BASE3 ((SIGNEDNESS_3 int) -1 < 0 ? -1236 : 373) +#define OFFSET 20 + +int +main (void) +{ + check_vect (); + + SIGNEDNESS_2 char a[N], b[N]; + SIGNEDNESS_3 short c[N], d[N]; + SIGNEDNESS_1 int e[N]; + int expected = 0x12345; + for (int i = 0; i < N; ++i) + { + a[i] = BASE2 + i * 5; + b[i] = BASE2 - i * 4; + c[i] = BASE3 + i * 2; + d[i] = BASE3 + OFFSET + i * 3; + e[i] = i; + asm volatile ("" ::: "memory"); + short diff = a[i] - b[i]; + SIGNEDNESS_2 short abs = diff < 0 ? -diff : diff; + expected += abs; + expected += c[i] * d[i]; + expected += e[i]; + } + if (f (0x12345, a, b, c, d, e) != expected) + __builtin_abort (); +} + +/* { dg-final { scan-tree-dump "vectorizing statement: \\S+ = SAD_EXPR" "vect" { target vect_udot_qi } } } */ +/* { dg-final { scan-tree-dump "vectorizing statement: \\S+ = DOT_PROD_EXPR" "vect" { target vect_sdot_hi } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-chain-dot-slp-1.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-chain-dot-slp-1.c new file mode 100644 index 00000000000..c2831fbcc8e --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-chain-dot-slp-1.c @@ -0,0 +1,95 @@ +/* Disabling epilogues until we find a better way to deal with scans. 
*/ +/* { dg-additional-options "--param vect-epilogues-nomask=0" } */ +/* { dg-require-effective-target vect_int } */ +/* { dg-require-effective-target arm_v8_2a_dotprod_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */ +/* { dg-add-options arm_v8_2a_dotprod_neon } */ + +#include "tree-vect.h" + +#ifndef SIGNEDNESS_1 +#define SIGNEDNESS_1 signed +#define SIGNEDNESS_2 signed +#endif + +SIGNEDNESS_1 int __attribute__ ((noipa)) +f (SIGNEDNESS_1 int res, + SIGNEDNESS_2 char *a, + SIGNEDNESS_2 char *b, + int step, int n) +{ + for (int i = 0; i < n; i++) + { + res += a[0] * b[0]; + res += a[1] * b[1]; + res += a[2] * b[2]; + res += a[3] * b[3]; + res += a[4] * b[4]; + res += a[5] * b[5]; + res += a[6] * b[6]; + res += a[7] * b[7]; + res += a[8] * b[8]; + res += a[9] * b[9]; + res += a[10] * b[10]; + res += a[11] * b[11]; + res += a[12] * b[12]; + res += a[13] * b[13]; + res += a[14] * b[14]; + res += a[15] * b[15]; + + a += step; + b += step; + } + + return res; +} + +#define BASE ((SIGNEDNESS_2 int) -1 < 0 ? 
-126 : 4) +#define OFFSET 20 + +int +main (void) +{ + check_vect (); + + SIGNEDNESS_2 char a[100], b[100]; + int expected = 0x12345; + int step = 16; + int n = 2; + int t = 0; + + for (int i = 0; i < sizeof (a) / sizeof (a[0]); ++i) + { + a[i] = BASE + i * 5; + b[i] = BASE + OFFSET + i * 4; + asm volatile ("" ::: "memory"); + } + + for (int i = 0; i < n; i++) + { + asm volatile ("" ::: "memory"); + expected += a[t + 0] * b[t + 0]; + expected += a[t + 1] * b[t + 1]; + expected += a[t + 2] * b[t + 2]; + expected += a[t + 3] * b[t + 3]; + expected += a[t + 4] * b[t + 4]; + expected += a[t + 5] * b[t + 5]; + expected += a[t + 6] * b[t + 6]; + expected += a[t + 7] * b[t + 7]; + expected += a[t + 8] * b[t + 8]; + expected += a[t + 9] * b[t + 9]; + expected += a[t + 10] * b[t + 10]; + expected += a[t + 11] * b[t + 11]; + expected += a[t + 12] * b[t + 12]; + expected += a[t + 13] * b[t + 13]; + expected += a[t + 14] * b[t + 14]; + expected += a[t + 15] * b[t + 15]; + t += step; + } + + if (f (0x12345, a, b, step, n) != expected) + __builtin_abort (); +} + +/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */ +/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */ +/* { dg-final { scan-tree-dump-times "vectorizing statement: \\S+ = DOT_PROD_EXPR" 16 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-chain-dot-slp-2.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-chain-dot-slp-2.c new file mode 100644 index 00000000000..4114264a364 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-chain-dot-slp-2.c @@ -0,0 +1,67 @@ +/* Disabling epilogues until we find a better way to deal with scans. 
*/ +/* { dg-additional-options "--param vect-epilogues-nomask=0" } */ +/* { dg-require-effective-target vect_int } */ +/* { dg-require-effective-target arm_v8_2a_dotprod_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */ +/* { dg-add-options arm_v8_2a_dotprod_neon } */ + +#include "tree-vect.h" + +#ifndef SIGNEDNESS_1 +#define SIGNEDNESS_1 signed +#define SIGNEDNESS_2 signed +#endif + +SIGNEDNESS_1 int __attribute__ ((noipa)) +f (SIGNEDNESS_1 int res, + SIGNEDNESS_2 char *a, + SIGNEDNESS_2 char *b, + int n) +{ + for (int i = 0; i < n; i++) + { + res += a[5 * i + 0] * b[5 * i + 0]; + res += a[5 * i + 1] * b[5 * i + 1]; + res += a[5 * i + 2] * b[5 * i + 2]; + res += a[5 * i + 3] * b[5 * i + 3]; + res += a[5 * i + 4] * b[5 * i + 4]; + } + + return res; +} + +#define BASE ((SIGNEDNESS_2 int) -1 < 0 ? -126 : 4) +#define OFFSET 20 + +int +main (void) +{ + check_vect (); + + SIGNEDNESS_2 char a[100], b[100]; + int expected = 0x12345; + int n = 18; + + for (int i = 0; i < sizeof (a) / sizeof (a[0]); ++i) + { + a[i] = BASE + i * 5; + b[i] = BASE + OFFSET + i * 4; + asm volatile ("" ::: "memory"); + } + + for (int i = 0; i < n; i++) + { + asm volatile ("" ::: "memory"); + expected += a[5 * i + 0] * b[5 * i + 0]; + expected += a[5 * i + 1] * b[5 * i + 1]; + expected += a[5 * i + 2] * b[5 * i + 2]; + expected += a[5 * i + 3] * b[5 * i + 3]; + expected += a[5 * i + 4] * b[5 * i + 4]; + } + + if (f (0x12345, a, b, n) != expected) + __builtin_abort (); +} + +/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */ +/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */ +/* { dg-final { scan-tree-dump-times "vectorizing statement: \\S+ = DOT_PROD_EXPR" 5 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-chain-dot-slp-3.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-chain-dot-slp-3.c new file mode 100644 index 00000000000..2cdecc36d16 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-chain-dot-slp-3.c @@ -0,0 +1,79 @@ 
+/* Disabling epilogues until we find a better way to deal with scans. */ +/* { dg-additional-options "--param vect-epilogues-nomask=0" } */ +/* { dg-require-effective-target vect_int } */ +/* { dg-require-effective-target arm_v8_2a_dotprod_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */ +/* { dg-add-options arm_v8_2a_dotprod_neon } */ + +#include "tree-vect.h" + +#ifndef SIGNEDNESS_1 +#define SIGNEDNESS_1 signed +#define SIGNEDNESS_2 signed +#endif + +SIGNEDNESS_1 int __attribute__ ((noipa)) +f (SIGNEDNESS_1 int res, + SIGNEDNESS_2 short *a, + SIGNEDNESS_2 short *b, + int step, int n) +{ + for (int i = 0; i < n; i++) + { + res += a[0] * b[0]; + res += a[1] * b[1]; + res += a[2] * b[2]; + res += a[3] * b[3]; + res += a[4] * b[4]; + res += a[5] * b[5]; + res += a[6] * b[6]; + res += a[7] * b[7]; + + a += step; + b += step; + } + + return res; +} + +#define BASE ((SIGNEDNESS_2 int) -1 < 0 ? -1026 : 373) +#define OFFSET 20 + +int +main (void) +{ + check_vect (); + + SIGNEDNESS_2 short a[100], b[100]; + int expected = 0x12345; + int step = 8; + int n = 2; + int t = 0; + + for (int i = 0; i < sizeof (a) / sizeof (a[0]); ++i) + { + a[i] = BASE + i * 5; + b[i] = BASE + OFFSET + i * 4; + asm volatile ("" ::: "memory"); + } + + for (int i = 0; i < n; i++) + { + asm volatile ("" ::: "memory"); + expected += a[t + 0] * b[t + 0]; + expected += a[t + 1] * b[t + 1]; + expected += a[t + 2] * b[t + 2]; + expected += a[t + 3] * b[t + 3]; + expected += a[t + 4] * b[t + 4]; + expected += a[t + 5] * b[t + 5]; + expected += a[t + 6] * b[t + 6]; + expected += a[t + 7] * b[t + 7]; + t += step; + } + + if (f (0x12345, a, b, step, n) != expected) + __builtin_abort (); +} + +/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */ +/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */ +/* { dg-final { scan-tree-dump-times "vectorizing statement: \\S+ = DOT_PROD_EXPR" 8 "vect" { target vect_sdot_hi } } } */ diff --git 
a/gcc/testsuite/gcc.dg/vect/vect-reduc-chain-dot-slp-4.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-chain-dot-slp-4.c new file mode 100644 index 00000000000..32c0f30c77b --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-chain-dot-slp-4.c @@ -0,0 +1,63 @@ +/* Disabling epilogues until we find a better way to deal with scans. */ +/* { dg-additional-options "--param vect-epilogues-nomask=0" } */ +/* { dg-require-effective-target vect_int } */ +/* { dg-require-effective-target arm_v8_2a_dotprod_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */ +/* { dg-add-options arm_v8_2a_dotprod_neon } */ + +#include "tree-vect.h" + +#ifndef SIGNEDNESS_1 +#define SIGNEDNESS_1 signed +#define SIGNEDNESS_2 signed +#endif + +SIGNEDNESS_1 int __attribute__ ((noipa)) +f (SIGNEDNESS_1 int res, + SIGNEDNESS_2 short *a, + SIGNEDNESS_2 short *b, + int n) +{ + for (int i = 0; i < n; i++) + { + res += a[3 * i + 0] * b[3 * i + 0]; + res += a[3 * i + 1] * b[3 * i + 1]; + res += a[3 * i + 2] * b[3 * i + 2]; + } + + return res; +} + +#define BASE ((SIGNEDNESS_2 int) -1 < 0 ? 
-1026 : 373) +#define OFFSET 20 + +int +main (void) +{ + check_vect (); + + SIGNEDNESS_2 short a[100], b[100]; + int expected = 0x12345; + int n = 18; + + for (int i = 0; i < sizeof (a) / sizeof (a[0]); ++i) + { + a[i] = BASE + i * 5; + b[i] = BASE + OFFSET + i * 4; + asm volatile ("" ::: "memory"); + } + + for (int i = 0; i < n; i++) + { + asm volatile ("" ::: "memory"); + expected += a[3 * i + 0] * b[3 * i + 0]; + expected += a[3 * i + 1] * b[3 * i + 1]; + expected += a[3 * i + 2] * b[3 * i + 2]; + } + + if (f (0x12345, a, b, n) != expected) + __builtin_abort (); +} + +/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */ +/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */ +/* { dg-final { scan-tree-dump-times "vectorizing statement: \\S+ = DOT_PROD_EXPR" 3 "vect" { target vect_sdot_hi } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-slp-1.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-slp-1.c new file mode 100644 index 00000000000..e17d6291f75 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-slp-1.c @@ -0,0 +1,35 @@ +/* Disabling epilogues until we find a better way to deal with scans. 
*/ +/* { dg-do compile } */ +/* { dg-additional-options "--param vect-epilogues-nomask=0" } */ +/* { dg-require-effective-target vect_int } */ +/* { dg-require-effective-target arm_v8_2a_dotprod_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */ +/* { dg-add-options arm_v8_2a_dotprod_neon } */ + +#include "tree-vect.h" + +#ifndef SIGNEDNESS_1 +#define SIGNEDNESS_1 signed +#define SIGNEDNESS_2 signed +#endif + +SIGNEDNESS_1 int __attribute__ ((noipa)) +f (SIGNEDNESS_1 int res0, + SIGNEDNESS_1 int res1, + SIGNEDNESS_1 int res2, + SIGNEDNESS_1 int res3, + SIGNEDNESS_2 short *a, + SIGNEDNESS_2 short *b) +{ + for (int i = 0; i < 64; i += 4) + { + res0 += a[i + 0] * b[i + 0]; + res1 += a[i + 1] * b[i + 1]; + res2 += a[i + 2] * b[i + 2]; + res3 += a[i + 3] * b[i + 3]; + } + + return res0 ^ res1 ^ res2 ^ res3; +} + +/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */ +/* { dg-final { scan-tree-dump-not "vectorizing stmts using SLP" "vect" } } */ diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index e0561feddce..6d91665a341 100644 --- a/gcc/tree-vect-loop.cc +++ b/gcc/tree-vect-loop.cc @@ -5324,8 +5324,6 @@ vect_model_reduction_cost (loop_vec_info loop_vinfo, if (!gimple_extract_op (orig_stmt_info->stmt, &op)) gcc_unreachable (); - bool emulated_mixed_dot_prod = vect_is_emulated_mixed_dot_prod (stmt_info); - if (reduction_type == EXTRACT_LAST_REDUCTION) /* No extra instructions are needed in the prologue. The loop body operations are costed in vectorizable_condition. */ @@ -5360,12 +5358,8 @@ vect_model_reduction_cost (loop_vec_info loop_vinfo, initial result of the data reduction, initial value of the index reduction. */ prologue_stmts = 4; - else if (emulated_mixed_dot_prod) - /* We need the initial reduction value and two invariants: - one that contains the minimum signed value and one that - contains half of its negative. */ - prologue_stmts = 3; else + /* We need the initial reduction value. 
*/ prologue_stmts = 1; prologue_cost += record_stmt_cost (cost_vec, prologue_stmts, scalar_to_vec, stmt_info, 0, @@ -7466,7 +7460,7 @@ vect_reduction_update_partial_vector_usage (loop_vec_info loop_vinfo, vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo); unsigned nvectors; - if (slp_node) + if (slp_node && SLP_TREE_LANES (slp_node) > 1) nvectors = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node); else nvectors = vect_get_num_copies (loop_vinfo, vectype_in); @@ -7478,6 +7472,152 @@ vect_reduction_update_partial_vector_usage (loop_vec_info loop_vinfo, } } +/* Check if STMT_INFO is a lane-reducing operation that can be vectorized in + the context of LOOP_VINFO, and vector cost will be recorded in COST_VEC. + Now there are three such kinds of operations: dot-prod/widen-sum/sad + (sum-of-absolute-differences). + + For a lane-reducing operation, the loop reduction path that it lies in, + may contain normal operation, or other lane-reducing operation of different + input type size, an example as: + + int sum = 0; + for (i) + { + ... + sum += d0[i] * d1[i]; // dot-prod + sum += w[i]; // widen-sum + sum += abs(s0[i] - s1[i]); // sad + sum += n[i]; // normal + ... + } + + Vectorization factor is essentially determined by operation whose input + vectype has the most lanes ("vector(16) char" in the example), while we + need to choose input vectype with the least lanes ("vector(4) int" in the + example) for the reduction PHI statement. */ + +bool +vectorizable_lane_reducing (loop_vec_info loop_vinfo, stmt_vec_info stmt_info, + slp_tree slp_node, stmt_vector_for_cost *cost_vec) +{ + gimple *stmt = stmt_info->stmt; + + if (!lane_reducing_stmt_p (stmt)) + return false; + + tree type = TREE_TYPE (gimple_assign_lhs (stmt)); + + if (!INTEGRAL_TYPE_P (type) && !SCALAR_FLOAT_TYPE_P (type)) + return false; + + /* Do not try to vectorize bit-precision reductions. 
*/ + if (!type_has_mode_precision_p (type)) + return false; + + if (!slp_node) + return false; + + for (int i = 0; i < (int) gimple_num_ops (stmt) - 1; i++) + { + stmt_vec_info def_stmt_info; + slp_tree slp_op; + tree op; + tree vectype; + enum vect_def_type dt; + + if (!vect_is_simple_use (loop_vinfo, stmt_info, slp_node, i, &op, + &slp_op, &dt, &vectype, &def_stmt_info)) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "use not simple.\n"); + return false; + } + + if (!vectype) + { + vectype = get_vectype_for_scalar_type (loop_vinfo, TREE_TYPE (op), + slp_op); + if (!vectype) + return false; + } + + if (!vect_maybe_update_slp_op_vectype (slp_op, vectype)) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "incompatible vector types for invariants\n"); + return false; + } + + if (i == STMT_VINFO_REDUC_IDX (stmt_info)) + continue; + + /* There should be at most one cycle def in the stmt. */ + if (VECTORIZABLE_CYCLE_DEF (dt)) + return false; + } + + stmt_vec_info reduc_info = STMT_VINFO_REDUC_DEF (vect_orig_stmt (stmt_info)); + + /* TODO: Support lane-reducing operation that does not directly participate + in loop reduction. */ + if (!reduc_info || STMT_VINFO_REDUC_IDX (stmt_info) < 0) + return false; + + /* Lane-reducing pattern inside any inner loop of LOOP_VINFO is not + recognized. */ + gcc_assert (STMT_VINFO_DEF_TYPE (reduc_info) == vect_reduction_def); + gcc_assert (STMT_VINFO_REDUC_TYPE (reduc_info) == TREE_CODE_REDUCTION); + + tree vectype_in = STMT_VINFO_REDUC_VECTYPE_IN (stmt_info); + int ncopies_for_cost; + + if (SLP_TREE_LANES (slp_node) > 1) + { + /* Now lane-reducing operations in a non-single-lane slp node should only + come from the same loop reduction path.
*/ + gcc_assert (REDUC_GROUP_FIRST_ELEMENT (stmt_info)); + ncopies_for_cost = 1; + } + else + { + ncopies_for_cost = vect_get_num_copies (loop_vinfo, vectype_in); + gcc_assert (ncopies_for_cost >= 1); + } + + if (vect_is_emulated_mixed_dot_prod (stmt_info)) + { + /* We need extra two invariants: one that contains the minimum signed + value and one that contains half of its negative. */ + int prologue_stmts = 2; + unsigned cost = record_stmt_cost (cost_vec, prologue_stmts, + scalar_to_vec, stmt_info, 0, + vect_prologue); + if (dump_enabled_p ()) + dump_printf (MSG_NOTE, "vectorizable_lane_reducing: " + "extra prologue_cost = %d .\n", cost); + + /* Three dot-products and a subtraction. */ + ncopies_for_cost *= 4; + } + + record_stmt_cost (cost_vec, ncopies_for_cost, vector_stmt, stmt_info, 0, + vect_body); + + if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)) + { + enum tree_code code = gimple_assign_rhs_code (stmt); + vect_reduction_update_partial_vector_usage (loop_vinfo, reduc_info, + slp_node, code, type, + vectype_in); + } + + STMT_VINFO_TYPE (stmt_info) = reduc_vec_info_type; + return true; +} + /* Function vectorizable_reduction. Check if STMT_INFO performs a reduction operation that can be vectorized. @@ -7804,18 +7944,6 @@ vectorizable_reduction (loop_vec_info loop_vinfo, if (!type_has_mode_precision_p (op.type)) return false; - /* For lane-reducing ops we're reducing the number of reduction PHIs - which means the only use of that may be in the lane-reducing operation. */ - if (lane_reducing - && reduc_chain_length != 1 - && !only_slp_reduc_chain) - { - if (dump_enabled_p ()) - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, - "lane-reducing reduction with extra stmts.\n"); - return false; - } - /* Lane-reducing ops also never can be used in a SLP reduction group since we'll mix lanes belonging to different reductions. 
But it's OK to use them in a reduction chain or when the reduction group @@ -8354,14 +8482,11 @@ vectorizable_reduction (loop_vec_info loop_vinfo, && loop_vinfo->suggested_unroll_factor == 1) single_defuse_cycle = true; - if (single_defuse_cycle || lane_reducing) + if (single_defuse_cycle && !lane_reducing) { gcc_assert (op.code != COND_EXPR); - /* 4. Supportable by target? */ - bool ok = true; - - /* 4.1. check support for the operation in the loop + /* 4. check support for the operation in the loop This isn't necessary for the lane reduction codes, since they can only be produced by pattern matching, and it's up to the @@ -8370,14 +8495,13 @@ vectorizable_reduction (loop_vec_info loop_vinfo, mixed-sign dot-products can be implemented using signed dot-products. */ machine_mode vec_mode = TYPE_MODE (vectype_in); - if (!lane_reducing - && !directly_supported_p (op.code, vectype_in, optab_vector)) + if (!directly_supported_p (op.code, vectype_in, optab_vector)) { if (dump_enabled_p ()) dump_printf (MSG_NOTE, "op not supported by target.\n"); if (maybe_ne (GET_MODE_SIZE (vec_mode), UNITS_PER_WORD) || !vect_can_vectorize_without_simd_p (op.code)) - ok = false; + single_defuse_cycle = false; else if (dump_enabled_p ()) dump_printf (MSG_NOTE, "proceeding using word mode.\n"); @@ -8390,16 +8514,6 @@ vectorizable_reduction (loop_vec_info loop_vinfo, dump_printf (MSG_NOTE, "using word mode not possible.\n"); return false; } - - /* lane-reducing operations have to go through vect_transform_reduction. - For the other cases try without the single cycle optimization. 
*/ - if (!ok) - { - if (lane_reducing) - return false; - else - single_defuse_cycle = false; - } } if (dump_enabled_p () && single_defuse_cycle) dump_printf_loc (MSG_NOTE, vect_location, @@ -8407,22 +8521,14 @@ vectorizable_reduction (loop_vec_info loop_vinfo, "multiple vectors to one in the loop body\n"); STMT_VINFO_FORCE_SINGLE_CYCLE (reduc_info) = single_defuse_cycle; - /* If the reduction stmt is one of the patterns that have lane - reduction embedded we cannot handle the case of ! single_defuse_cycle. */ - if ((ncopies > 1 && ! single_defuse_cycle) - && lane_reducing) - { - if (dump_enabled_p ()) - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, - "multi def-use cycle not possible for lane-reducing " - "reduction operation\n"); - return false; - } + /* For lane-reducing operation, the below processing related to single + defuse-cycle will be done in its own vectorizable function. One more + thing to note is that the operation must not be involved in fold-left + reduction. */ + single_defuse_cycle &= !lane_reducing; if (slp_node - && !(!single_defuse_cycle - && !lane_reducing - && reduction_type != FOLD_LEFT_REDUCTION)) + && (single_defuse_cycle || reduction_type == FOLD_LEFT_REDUCTION)) for (i = 0; i < (int) op.num_ops; i++) if (!vect_maybe_update_slp_op_vectype (slp_op[i], vectype_op[i])) { @@ -8435,28 +8541,20 @@ vectorizable_reduction (loop_vec_info loop_vinfo, vect_model_reduction_cost (loop_vinfo, stmt_info, reduc_fn, reduction_type, ncopies, cost_vec); /* Cost the reduction op inside the loop if transformed via - vect_transform_reduction. Otherwise this is costed by the - separate vectorizable_* routines. */ - if (single_defuse_cycle || lane_reducing) - { - int factor = 1; - if (vect_is_emulated_mixed_dot_prod (stmt_info)) - /* Three dot-products and a subtraction. */ - factor = 4; - record_stmt_cost (cost_vec, ncopies * factor, vector_stmt, - stmt_info, 0, vect_body); - } + vect_transform_reduction for non-lane-reducing operation. 
Otherwise + this is costed by the separate vectorizable_* routines. */ + if (single_defuse_cycle) + record_stmt_cost (cost_vec, ncopies, vector_stmt, stmt_info, 0, vect_body); if (dump_enabled_p () && reduction_type == FOLD_LEFT_REDUCTION) dump_printf_loc (MSG_NOTE, vect_location, "using an in-order (fold-left) reduction.\n"); STMT_VINFO_TYPE (orig_stmt_of_analysis) = cycle_phi_info_type; - /* All but single defuse-cycle optimized, lane-reducing and fold-left - reductions go through their own vectorizable_* routines. */ - if (!single_defuse_cycle - && !lane_reducing - && reduction_type != FOLD_LEFT_REDUCTION) + + /* All but single defuse-cycle optimized and fold-left reductions go + through their own vectorizable_* routines. */ + if (!single_defuse_cycle && reduction_type != FOLD_LEFT_REDUCTION) { stmt_vec_info tem = vect_stmt_to_vectorize (STMT_VINFO_REDUC_DEF (phi_info)); @@ -8646,6 +8744,15 @@ vect_transform_reduction (loop_vec_info loop_vinfo, bool lane_reducing = lane_reducing_op_p (code); gcc_assert (single_defuse_cycle || lane_reducing); + if (lane_reducing) + { + /* The last operand of lane-reducing op is for reduction. */ + gcc_assert (reduc_index == (int) op.num_ops - 1); + + /* Now all lane-reducing ops are covered by some slp node. */ + gcc_assert (slp_node); + } + /* Create the destination vector */ tree scalar_dest = gimple_get_lhs (stmt_info->stmt); tree vec_dest = vect_create_destination_var (scalar_dest, vectype_out); @@ -8689,6 +8796,58 @@ vect_transform_reduction (loop_vec_info loop_vinfo, reduc_index == 2 ? op.ops[2] : NULL_TREE, &vec_oprnds[2]); } + else if (lane_reducing && SLP_TREE_LANES (slp_node) == 1 + && vec_oprnds[0].length () < vec_oprnds[reduc_index].length ()) + { + /* For lane-reducing op covered by single-lane slp node, the input + vectype of the reduction PHI determines copies of vectorized def-use + cycles, which might be more than effective copies of vectorized lane- + reducing reduction statements. 
This could be complemented by + generating extra trivial pass-through copies. For example: + + int sum = 0; + for (i) + { + sum += d0[i] * d1[i]; // dot-prod + sum += abs(s0[i] - s1[i]); // sad + sum += n[i]; // normal + } + + The vector size is 128-bit,vectorization factor is 16. Reduction + statements would be transformed as: + + vector<4> int sum_v0 = { 0, 0, 0, 0 }; + vector<4> int sum_v1 = { 0, 0, 0, 0 }; + vector<4> int sum_v2 = { 0, 0, 0, 0 }; + vector<4> int sum_v3 = { 0, 0, 0, 0 }; + + for (i / 16) + { + sum_v0 = DOT_PROD (d0_v0[i: 0 ~ 15], d1_v0[i: 0 ~ 15], sum_v0); + sum_v1 = sum_v1; // copy + sum_v2 = sum_v2; // copy + sum_v3 = sum_v3; // copy + + sum_v0 = SAD (s0_v0[i: 0 ~ 7 ], s1_v0[i: 0 ~ 7 ], sum_v0); + sum_v1 = SAD (s0_v1[i: 8 ~ 15], s1_v1[i: 8 ~ 15], sum_v1); + sum_v2 = sum_v2; // copy + sum_v3 = sum_v3; // copy + + sum_v0 += n_v0[i: 0 ~ 3 ]; + sum_v1 += n_v1[i: 4 ~ 7 ]; + sum_v2 += n_v2[i: 8 ~ 11]; + sum_v3 += n_v3[i: 12 ~ 15]; + } + */ + unsigned using_ncopies = vec_oprnds[0].length (); + unsigned reduc_ncopies = vec_oprnds[reduc_index].length (); + + for (unsigned i = 0; i < op.num_ops - 1; i++) + { + gcc_assert (vec_oprnds[i].length () == using_ncopies); + vec_oprnds[i].safe_grow_cleared (reduc_ncopies); + } + } bool emulated_mixed_dot_prod = vect_is_emulated_mixed_dot_prod (stmt_info); unsigned num = vec_oprnds[reduc_index == 0 ? 1 : 0].length (); @@ -8697,7 +8856,21 @@ vect_transform_reduction (loop_vec_info loop_vinfo, { gimple *new_stmt; tree vop[3] = { vec_oprnds[0][i], vec_oprnds[1][i], NULL_TREE }; - if (masked_loop_p && !mask_by_cond_expr) + + if (!vop[0] || !vop[1]) + { + tree reduc_vop = vec_oprnds[reduc_index][i]; + + /* Insert trivial copy if no need to generate vectorized + statement. 
*/ + gcc_assert (reduc_vop); + + new_stmt = gimple_build_assign (vec_dest, reduc_vop); + new_temp = make_ssa_name (vec_dest, new_stmt); + gimple_set_lhs (new_stmt, new_temp); + vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt, gsi); + } + else if (masked_loop_p && !mask_by_cond_expr) { /* No conditional ifns have been defined for lane-reducing op yet. */ @@ -8726,8 +8899,19 @@ vect_transform_reduction (loop_vec_info loop_vinfo, if (masked_loop_p && mask_by_cond_expr) { + tree stmt_vectype_in = vectype_in; + unsigned nvectors = vec_num * ncopies; + + if (lane_reducing && SLP_TREE_LANES (slp_node) == 1) + { + /* Input vectype of the reduction PHI may be defferent from + that of lane-reducing operation. */ + stmt_vectype_in = STMT_VINFO_REDUC_VECTYPE_IN (stmt_info); + nvectors = vect_get_num_copies (loop_vinfo, stmt_vectype_in); + } + tree mask = vect_get_loop_mask (loop_vinfo, gsi, masks, - vec_num * ncopies, vectype_in, i); + nvectors, stmt_vectype_in, i); build_vect_cond_expr (code, vop, mask, gsi); } diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index ca6052662a3..1b73ef01ade 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -13350,6 +13350,8 @@ vect_analyze_stmt (vec_info *vinfo, NULL, NULL, node, cost_vec) || vectorizable_load (vinfo, stmt_info, NULL, NULL, node, cost_vec) || vectorizable_store (vinfo, stmt_info, NULL, NULL, node, cost_vec) + || vectorizable_lane_reducing (as_a (vinfo), + stmt_info, node, cost_vec) || vectorizable_reduction (as_a (vinfo), stmt_info, node, node_instance, cost_vec) || vectorizable_induction (as_a (vinfo), stmt_info, diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index 60224f4e284..94736736dcc 100644 --- a/gcc/tree-vectorizer.h +++ b/gcc/tree-vectorizer.h @@ -2455,6 +2455,8 @@ extern loop_vec_info vect_create_loop_vinfo (class loop *, vec_info_shared *, extern bool vectorizable_live_operation (vec_info *, stmt_vec_info, slp_tree, slp_instance, int, bool, stmt_vector_for_cost 
*); +extern bool vectorizable_lane_reducing (loop_vec_info, stmt_vec_info, + slp_tree, stmt_vector_for_cost *); extern bool vectorizable_reduction (loop_vec_info, stmt_vec_info, slp_tree, slp_instance, stmt_vector_for_cost *); -- 2.17.1 From patchwork Sun Jun 16 07:32:28 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Feng Xue OS X-Patchwork-Id: 1948269 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=os.amperecomputing.com header.i=@os.amperecomputing.com header.a=rsa-sha256 header.s=selector2 header.b=EDtg6g8r; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4W24Vd1ZKSz20Wb for ; Sun, 16 Jun 2024 17:32:56 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 9BD4D385840C for ; Sun, 16 Jun 2024 07:32:54 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from NAM10-MW2-obe.outbound.protection.outlook.com (mail-mw2nam10on2071d.outbound.protection.outlook.com [IPv6:2a01:111:f403:2412::71d]) by sourceware.org (Postfix) with ESMTPS id 28C403858428 for ; Sun, 16 Jun 2024 07:32:31 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 
From: Feng Xue OS
To: Richard Biener
CC: "gcc-patches@gcc.gnu.org"
Subject: [PATCH 8/8] vect: Optimize order of lane-reducing statements in loop def-use cycles
Date: Sun, 16 Jun 2024 07:32:28 +0000

When transforming multiple lane-reducing operations in a loop reduction chain,
the corresponding vectorized statements were originally generated into def-use
cycles starting from 0.  A def-use cycle with a smaller index would therefore
contain more statements, which means more instruction dependency.  For example:

   int sum = 0;
   for (i)
     {
       sum += d0[i] * d1[i];      // dot-prod
       sum += w[i];               // widen-sum
       sum += abs(s0[i] - s1[i]); // sad
     }

Original transformation result:

   for (i / 16)
     {
       sum_v0 = DOT_PROD (d0_v0[i: 0 ~ 15], d1_v0[i: 0 ~ 15], sum_v0);
       sum_v1 = sum_v1;  // copy
       sum_v2 = sum_v2;  // copy
       sum_v3 = sum_v3;  // copy

       sum_v0 = WIDEN_SUM (w_v0[i: 0 ~ 15], sum_v0);
       sum_v1 = sum_v1;  // copy
       sum_v2 = sum_v2;  // copy
       sum_v3 = sum_v3;  // copy

       sum_v0 = SAD (s0_v0[i: 0 ~ 7 ], s1_v0[i: 0 ~ 7 ], sum_v0);
       sum_v1 = SAD (s0_v1[i: 8 ~ 15], s1_v1[i: 8 ~ 15], sum_v1);
       sum_v2 = sum_v2;  // copy
       sum_v3 = sum_v3;  // copy
     }

For higher instruction parallelism in the final vectorized loop, a better
approach is to distribute those effective vectorized lane-reducing statements
evenly among all def-use cycles.  When DOT_PROD, WIDEN_SUM and SADs are
generated into disparate cycles, the instruction dependency among them can be
eliminated.

Thanks,
Feng
---
gcc/
	PR tree-optimization/114440
	* tree-vectorizer.h (struct _stmt_vec_info): Add a new field
	reduc_result_pos.
	* tree-vect-loop.cc (vect_transform_reduction): Generate lane-reducing
	statements in an optimized order.
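To make the dependency argument concrete, here is a small stand-alone C++ sketch that counts how many effective lane-reducing statements land in each def-use cycle under the two placements. It is only a model under stated assumptions: `stmts_per_cycle` and its parameters are hypothetical names and not GCC functions, and the copy counts {1, 1, 2} are read off the example above (one DOT_PROD, one WIDEN_SUM and two SADs spread over four `sum_v*` cycles).

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper, not part of GCC.  NCYCLES is the number of def-use
// cycles (sum_v0 .. sum_v3 in the example), COPIES[i] is the number of
// effective vector statements that the i-th lane-reducing operation needs.
// Returns how many effective statements end up in each cycle.
std::vector<unsigned>
stmts_per_cycle (unsigned ncycles, const std::vector<unsigned> &copies,
                 bool balanced)
{
  std::vector<unsigned> load (ncycles, 0);
  unsigned pos = 0;   // first cycle used by the next operation

  for (unsigned n : copies)
    {
      assert (n >= 1 && n <= ncycles);

      for (unsigned j = 0; j < n; j++)
        load[(pos + j) % ncycles]++;

      // Original scheme: every operation starts again at cycle 0.
      // Balanced scheme: the next operation continues after this one.
      if (balanced)
        pos = (pos + n) % ncycles;
    }

  return load;
}
```

For the example, the original placement yields {3, 1, 0, 0}, that is a chain of three dependent statements on sum_v0, whereas the balanced placement yields {1, 1, 1, 1}.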
---
 gcc/tree-vect-loop.cc | 39 +++++++++++++++++++++++++++++++++++----
 gcc/tree-vectorizer.h |  6 ++++++
 2 files changed, 41 insertions(+), 4 deletions(-)

From 1f2e05a6787eb4449a24a9d6e371ae162855aaff Mon Sep 17 00:00:00 2001
From: Feng Xue
Date: Wed, 29 May 2024 17:28:14 +0800
Subject: [PATCH 8/8] vect: Optimize order of lane-reducing statements in loop def-use cycles

When transforming multiple lane-reducing operations in a loop reduction chain,
the corresponding vectorized statements were originally generated into def-use
cycles starting from 0.  A def-use cycle with a smaller index would therefore
contain more statements, which means more instruction dependency.  For example:

   int sum = 0;
   for (i)
     {
       sum += d0[i] * d1[i];      // dot-prod
       sum += w[i];               // widen-sum
       sum += abs(s0[i] - s1[i]); // sad
     }

Original transformation result:

   for (i / 16)
     {
       sum_v0 = DOT_PROD (d0_v0[i: 0 ~ 15], d1_v0[i: 0 ~ 15], sum_v0);
       sum_v1 = sum_v1;  // copy
       sum_v2 = sum_v2;  // copy
       sum_v3 = sum_v3;  // copy

       sum_v0 = WIDEN_SUM (w_v0[i: 0 ~ 15], sum_v0);
       sum_v1 = sum_v1;  // copy
       sum_v2 = sum_v2;  // copy
       sum_v3 = sum_v3;  // copy

       sum_v0 = SAD (s0_v0[i: 0 ~ 7 ], s1_v0[i: 0 ~ 7 ], sum_v0);
       sum_v1 = SAD (s0_v1[i: 8 ~ 15], s1_v1[i: 8 ~ 15], sum_v1);
       sum_v2 = sum_v2;  // copy
       sum_v3 = sum_v3;  // copy
     }

For higher instruction parallelism in the final vectorized loop, a better
approach is to distribute those effective vectorized lane-reducing statements
evenly among all def-use cycles.  Transformed as below, DOT_PROD, WIDEN_SUM and
SADs are generated into disparate cycles, so the instruction dependency among
them can be eliminated.

   for (i / 16)
     {
       sum_v0 = DOT_PROD (d0_v0[i: 0 ~ 15], d1_v0[i: 0 ~ 15], sum_v0);
       sum_v1 = sum_v1;  // copy
       sum_v2 = sum_v2;  // copy
       sum_v3 = sum_v3;  // copy

       sum_v0 = sum_v0;  // copy
       sum_v1 = WIDEN_SUM (w_v1[i: 0 ~ 15], sum_v1);
       sum_v2 = sum_v2;  // copy
       sum_v3 = sum_v3;  // copy

       sum_v0 = sum_v0;  // copy
       sum_v1 = sum_v1;  // copy
       sum_v2 = SAD (s0_v2[i: 0 ~ 7 ], s1_v2[i: 0 ~ 7 ], sum_v2);
       sum_v3 = SAD (s0_v3[i: 8 ~ 15], s1_v3[i: 8 ~ 15], sum_v3);
     }

2024-03-22  Feng Xue

gcc/
	PR tree-optimization/114440
	* tree-vectorizer.h (struct _stmt_vec_info): Add a new field
	reduc_result_pos.
	* tree-vect-loop.cc (vect_transform_reduction): Generate lane-reducing
	statements in an optimized order.
---
 gcc/tree-vect-loop.cc | 39 +++++++++++++++++++++++++++++++++++----
 gcc/tree-vectorizer.h |  6 ++++++
 2 files changed, 41 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 6d91665a341..c7e13d655d8 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -8828,9 +8828,9 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
                sum_v2 = sum_v2;  // copy
                sum_v3 = sum_v3;  // copy
 
-               sum_v0 = SAD (s0_v0[i: 0 ~ 7 ], s1_v0[i: 0 ~ 7 ], sum_v0);
-               sum_v1 = SAD (s0_v1[i: 8 ~ 15], s1_v1[i: 8 ~ 15], sum_v1);
-               sum_v2 = sum_v2;  // copy
+               sum_v0 = sum_v0;  // copy
+               sum_v1 = SAD (s0_v1[i: 0 ~ 7 ], s1_v1[i: 0 ~ 7 ], sum_v1);
+               sum_v2 = SAD (s0_v2[i: 8 ~ 15], s1_v2[i: 8 ~ 15], sum_v2);
                sum_v3 = sum_v3;  // copy
 
                sum_v0 += n_v0[i: 0 ~ 3 ];
@@ -8838,14 +8838,45 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
                sum_v2 += n_v2[i: 8 ~ 11];
                sum_v3 += n_v3[i: 12 ~ 15];
              }
-      */
+
+         Moreover, for higher instruction parallelism in the final vectorized
+         loop, those effective vectorized lane-reducing statements are
+         distributed evenly among all def-use cycles.  In the above example,
+         SADs are generated into other cycles rather than that of
+         DOT_PROD.  */
       unsigned using_ncopies = vec_oprnds[0].length ();
       unsigned reduc_ncopies = vec_oprnds[reduc_index].length ();
+      unsigned result_pos = reduc_info->reduc_result_pos;
+
+      reduc_info->reduc_result_pos
+        = (result_pos + using_ncopies) % reduc_ncopies;
+      gcc_assert (result_pos < reduc_ncopies);
 
       for (unsigned i = 0; i < op.num_ops - 1; i++)
         {
           gcc_assert (vec_oprnds[i].length () == using_ncopies);
           vec_oprnds[i].safe_grow_cleared (reduc_ncopies);
+
+          /* Find suitable def-use cycles to generate vectorized statements
+             into, and reorder operands based on the selection.  */
+          if (result_pos)
+            {
+              unsigned count = reduc_ncopies - using_ncopies;
+              unsigned start = result_pos - count;
+
+              if ((int) start < 0)
+                {
+                  count = result_pos;
+                  start = 0;
+                }
+
+              for (unsigned j = using_ncopies; j > start; j--)
+                {
+                  unsigned k = j - 1;
+                  std::swap (vec_oprnds[i][k], vec_oprnds[i][k + count]);
+                  gcc_assert (!vec_oprnds[i][k]);
+                }
+            }
         }
     }
 
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 94736736dcc..64c6571a293 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1402,6 +1402,12 @@ public:
   /* The vector type for performing the actual reduction.  */
   tree reduc_vectype;
 
+  /* For loop reduction with multiple vectorized results (ncopies > 1), a
+     lane-reducing operation participating in it may not use all of those
+     results.  This field specifies the result index from which any
+     following lane-reducing operation would be assigned.  */
+  unsigned int reduc_result_pos;
+
   /* If IS_REDUC_INFO is true and if the vector code is performing
      N scalar reductions in parallel, this variable gives the initial
      scalar values of those N reductions.  */
-- 
2.17.1
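
The operand reordering added to vect_transform_reduction can be tried outside the compiler with the following C++ sketch. It is a model under stated assumptions, not GCC code: `shift_operands` and `next_result_pos` are hypothetical names, a `std::vector<std::string>` stands in for one `vec_oprnds[i]`, and an empty string stands in for a cleared (NULL_TREE) slot. Unlike the patch, which subtracts first and then tests `(int) start < 0`, the sketch compares before subtracting; the resulting `count` and `start` are intended to be the same.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of the round-robin bookkeeping: returns the cycle index
// at which the next lane-reducing statement should start.
unsigned
next_result_pos (unsigned result_pos, unsigned using_ncopies,
                 unsigned reduc_ncopies)
{
  return (result_pos + using_ncopies) % reduc_ncopies;
}

// Hypothetical model of the per-operand reordering.  OPRNDS initially holds
// the USING_NCOPIES effective operands; afterwards it has REDUC_NCOPIES
// slots, with the operands moved towards RESULT_POS and empty slots marking
// the cycles that only get a pass-through copy.
void
shift_operands (std::vector<std::string> &oprnds, unsigned reduc_ncopies,
                unsigned result_pos)
{
  unsigned using_ncopies = static_cast<unsigned> (oprnds.size ());

  assert (using_ncopies >= 1 && using_ncopies < reduc_ncopies);
  assert (result_pos < reduc_ncopies);

  // Counterpart of safe_grow_cleared: new slots are empty.
  oprnds.resize (reduc_ncopies);

  if (result_pos == 0)
    return;

  unsigned count = reduc_ncopies - using_ncopies;  // shift distance
  unsigned start = 0;                              // first slot to move

  if (result_pos >= count)
    start = result_pos - count;   // leading operands stay in place (wrap)
  else
    count = result_pos;           // shift everything by result_pos

  // Walk backwards so that no operand is overwritten before it is moved.
  for (unsigned j = using_ncopies; j > start; j--)
    {
      unsigned k = j - 1;
      std::swap (oprnds[k], oprnds[k + count]);
      assert (oprnds[k].empty ());
    }
}
```

With four def-use cycles, a single WIDEN_SUM operand placed at position 1 ends up in slot 1, and two SAD operands placed at position 2 end up in slots 2 and 3, which matches the transformed loop in the commit message. When the start position is near the end (for instance position 3 with two operands), the model keeps the first operand in slot 0 and moves the second to slot 3, so the placement wraps around.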