From patchwork Sun Jul 21 09:15:56 2024
X-Patchwork-Submitter: Feng Xue OS
X-Patchwork-Id: 1962888
From: Feng Xue OS
To: "gcc-patches@gcc.gnu.org"
CC: Richard Biener, Tamar Christina, Richard Sandiford
Subject: [RFC][PATCH 5/5] vect: Add accumulating-result pattern for lane-reducing operation
Date: Sun, 21 Jul 2024 09:15:56 +0000

This patch adds a pattern to fold a summation into the last operand of a
lane-reducing operation when appropriate, which supplements the
operation-specific patterns for dot-prod/sad/widen-sum.

  sum = lane-reducing-op(..., 0) + value;
=>
  sum = lane-reducing-op(..., value);

Thanks,
Feng
---
gcc/
	* tree-vect-patterns.cc (vect_recog_lane_reducing_accum_pattern): New
	pattern function.
	(vect_vect_recog_func_ptrs): Add the new pattern function.
	* params.opt (vect-lane-reducing-accum-pattern): New parameter.

gcc/testsuite/
	* gcc.dg/vect/vect-reduc-accum-pattern.c: New test.
---
 gcc/params.opt                                |   4 +
 .../gcc.dg/vect/vect-reduc-accum-pattern.c    |  61 ++++++++++
 gcc/tree-vect-patterns.cc                     | 106 ++++++++++++++++++
 3 files changed, 171 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-accum-pattern.c

From 94d34da8de2fd479c81e8398544466e6ffe7fdfc Mon Sep 17 00:00:00 2001
From: Feng Xue
Date: Wed, 22 May 2024 17:08:32 +0800
Subject: [PATCH 5/5] vect: Add accumulating-result pattern for lane-reducing operation

This patch adds a pattern to fold a summation into the last operand of a
lane-reducing operation when appropriate, which supplements the
operation-specific patterns for dot-prod/sad/widen-sum.

  sum = lane-reducing-op(..., 0) + value;
=>
  sum = lane-reducing-op(..., value);

2024-05-22  Feng Xue

gcc/
	* tree-vect-patterns.cc (vect_recog_lane_reducing_accum_pattern): New
	pattern function.
	(vect_vect_recog_func_ptrs): Add the new pattern function.
	* params.opt (vect-lane-reducing-accum-pattern): New parameter.

gcc/testsuite/
	* gcc.dg/vect/vect-reduc-accum-pattern.c: New test.
---
 gcc/params.opt                                |   4 +
 .../gcc.dg/vect/vect-reduc-accum-pattern.c    |  61 ++++++++++
 gcc/tree-vect-patterns.cc                     | 106 ++++++++++++++++++
 3 files changed, 171 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-accum-pattern.c

diff --git a/gcc/params.opt b/gcc/params.opt
index c17ba17b91b..b94bdc26cbd 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -1198,6 +1198,10 @@ The maximum factor which the loop vectorizer applies to the cost of statements i
 Common Joined UInteger Var(param_vect_induction_float) Init(1) IntegerRange(0, 1) Param Optimization
 Enable loop vectorization of floating point inductions.
 
+-param=vect-lane-reducing-accum-pattern=
+Common Joined UInteger Var(param_vect_lane_reducing_accum_pattern) Init(2) IntegerRange(0, 2) Param Optimization
+Control the pattern that folds an addition into a lane-reducing operation.  If the value is 2, allow it for all statements; if 1, only for reduction statements; if 0, disable it.
+
 -param=vrp-block-limit=
 Common Joined UInteger Var(param_vrp_block_limit) Init(150000) Optimization Param
 Maximum number of basic blocks before VRP switches to a fast model with less memory requirements.
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-accum-pattern.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-accum-pattern.c
new file mode 100644
index 00000000000..80a2c4f047e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-accum-pattern.c
@@ -0,0 +1,61 @@
+/* Disabling epilogues until we find a better way to deal with scans.  */
+/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_dotprod_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_dotprod_neon } */
+
+#include "tree-vect.h"
+
+#define N 50
+
+#define FN(name, S1, S2)                        \
+S1 int __attribute__ ((noipa))                  \
+name (S1 int res,                               \
+      S2 char *restrict a,                      \
+      S2 char *restrict b,                      \
+      S2 char *restrict c,                      \
+      S2 char *restrict d)                      \
+{                                               \
+  for (int i = 0; i < N; i++)                   \
+    res += a[i] * b[i];                         \
+                                                \
+  asm volatile ("" ::: "memory");               \
+  for (int i = 0; i < N; ++i)                   \
+    res += (a[i] * b[i] + c[i] * d[i]) << 3;    \
+                                                \
+  return res;                                   \
+}
+
+FN(f1_vec, signed, signed)
+
+#pragma GCC push_options
+#pragma GCC optimize ("O0")
+FN(f1_novec, signed, signed)
+#pragma GCC pop_options
+
+#define BASE2 ((signed int) -1 < 0 ? -126 : 4)
+#define OFFSET 20
+
+int
+main (void)
+{
+  check_vect ();
+
+  signed char a[N], b[N];
+  signed char c[N], d[N];
+
+#pragma GCC novector
+  for (int i = 0; i < N; ++i)
+    {
+      a[i] = BASE2 + i * 5;
+      b[i] = BASE2 + OFFSET + i * 4;
+      c[i] = BASE2 + i * 6;
+      d[i] = BASE2 + OFFSET + i * 5;
+    }
+
+  if (f1_vec (0x12345, a, b, c, d) != f1_novec (0x12345, a, b, c, d))
+    __builtin_abort ();
+}
+
+/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump "vect_recog_lane_reducing_accum_pattern: detected" "vect" { target { vect_sdot_qi } } } } */
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index bb037af0b68..9a6b16532e4 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -1490,6 +1490,111 @@ vect_recog_abd_pattern (vec_info *vinfo,
   return vect_convert_output (vinfo, stmt_vinfo, out_type, stmt, vectype_out);
 }
 
+/* Function vect_recog_lane_reducing_accum_pattern
+
+   Try to fold a summation into the last operand of a lane-reducing operation.
+
+     sum = lane-reducing-op(..., 0) + value;
+
+   A lane-reducing operation has two aspects: the main primitive operation
+   and the accompanying result accumulation.  Pattern matching for the former
+   is handled by the specific patterns for dot-prod/sad/widen-sum.  This
+   function takes care of the latter.
+
+   Input:
+
+   * STMT_VINFO: The stmt from which the pattern search begins.
+
+   Output:
+
+   * TYPE_OUT: The type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the sequence of
+   stmts that constitute the pattern, that is:
+       sum = lane-reducing-op(..., value);
+*/
+
+static gimple *
+vect_recog_lane_reducing_accum_pattern (vec_info *vinfo,
+                                        stmt_vec_info stmt_vinfo,
+                                        tree *type_out)
+{
+  if (!(stmt_vinfo->reduc_pattern_status & rpatt_formed))
+    return NULL;
+
+  if (param_vect_lane_reducing_accum_pattern == 0)
+    return NULL;
+
+  if (param_vect_lane_reducing_accum_pattern == 1)
+    {
+      /* Only allow combining for a loop reduction statement.  */
+      if (STMT_VINFO_REDUC_IDX (stmt_vinfo) < 0)
+        return NULL;
+    }
+
+  gimple *last_stmt = stmt_vinfo->stmt;
+
+  if (!is_gimple_assign (last_stmt)
+      || gimple_assign_rhs_code (last_stmt) != PLUS_EXPR)
+    return NULL;
+
+  gimple *lane_reducing_stmt = NULL;
+  tree sum_oprnd = NULL_TREE;
+
+  for (unsigned i = 0; i < 2; i++)
+    {
+      tree oprnd = gimple_op (last_stmt, i + 1);
+      vect_unpromoted_value unprom;
+      bool single_use_p = true;
+
+      if (!vect_look_through_possible_promotion (vinfo, oprnd, &unprom,
+                                                 &single_use_p)
+          || !single_use_p)
+        continue;
+
+      stmt_vec_info oprnd_vinfo = vect_get_internal_def (vinfo, unprom.op);
+
+      if (!oprnd_vinfo)
+        continue;
+
+      gimple *stmt = oprnd_vinfo->stmt;
+
+      if (lane_reducing_stmt_p (stmt)
+          && integer_zerop (gimple_op (stmt, gimple_num_ops (stmt) - 1)))
+        {
+          lane_reducing_stmt = stmt;
+          sum_oprnd = gimple_op (last_stmt, 2 - i);
+          break;
+        }
+    }
+
+  if (!lane_reducing_stmt)
+    return NULL;
+
+  tree type = TREE_TYPE (gimple_get_lhs (last_stmt));
+
+  *type_out = get_vectype_for_scalar_type (vinfo, type);
+  if (!*type_out)
+    return NULL;
+
+  vect_pattern_detected ("vect_recog_lane_reducing_accum_pattern", last_stmt);
+
+  tree var = vect_recog_temp_ssa_var (type, NULL);
+  enum tree_code code = gimple_assign_rhs_code (lane_reducing_stmt);
+  gimple *pattern_stmt;
+
+  if (code == WIDEN_SUM_EXPR)
+    pattern_stmt = gimple_build_assign (var, code,
+                                        gimple_op (lane_reducing_stmt, 1),
+                                        sum_oprnd);
+  else
+    pattern_stmt = gimple_build_assign (var, code,
+                                        gimple_op (lane_reducing_stmt, 1),
+                                        gimple_op (lane_reducing_stmt, 2),
+                                        sum_oprnd);
+  return pattern_stmt;
+}
+
 /* Recognize an operation that performs ORIG_CODE on widened inputs,
    so that it can be treated as though it had the form:
 
@@ -7084,6 +7189,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
   { vect_recog_dot_prod_pattern, "dot_prod" },
   { vect_recog_sad_pattern, "sad" },
   { vect_recog_widen_sum_pattern, "widen_sum" },
+  { vect_recog_lane_reducing_accum_pattern, "lane_reducing_accum" },
   { vect_recog_bitfield_ref_pattern, "bitfield_ref" },
   { vect_recog_bit_insert_pattern, "bit_insert" },
-- 
2.17.1
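
For illustration only, the rewrite that the pattern performs can be modelled in scalar C. This is a rough sketch, not code from the patch: dot4_accum is a hypothetical stand-in for a dot-product style lane-reducing operation, and sum_unfolded, sum_folded, fold_is_equivalent and check_fold_examples are made-up helper names. It only shows that adding VALUE after the operation gives the same result as passing VALUE as the accumulator operand.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of a 4-lane dot-product-accumulate: four
   narrow products are reduced into one wide value and added to ACC.
   This is an assumption made for illustration, not a GCC interface.  */
static int32_t
dot4_accum (const int8_t a[4], const int8_t b[4], int32_t acc)
{
  for (int i = 0; i < 4; i++)
    acc += (int32_t) a[i] * (int32_t) b[i];
  return acc;
}

/* Form before the pattern applies: the accumulator operand is 0 and
   VALUE is added by a separate PLUS.  */
static int32_t
sum_unfolded (const int8_t a[4], const int8_t b[4], int32_t value)
{
  return dot4_accum (a, b, 0) + value;
}

/* Form after the pattern applies: VALUE becomes the last operand.  */
static int32_t
sum_folded (const int8_t a[4], const int8_t b[4], int32_t value)
{
  return dot4_accum (a, b, value);
}

/* Return 1 if both forms agree for the given inputs, 0 otherwise.  */
static int
fold_is_equivalent (const int8_t a[4], const int8_t b[4], int32_t value)
{
  return sum_unfolded (a, b, value) == sum_folded (a, b, value);
}

/* Small self-check with fixed inputs:
   1*5 + (-2)*6 + 3*(-7) + (-4)*8 = -60, and -60 + 100 = 40.  */
static void
check_fold_examples (void)
{
  const int8_t a[4] = { 1, -2, 3, -4 };
  const int8_t b[4] = { 5, 6, -7, 8 };

  assert (sum_unfolded (a, b, 100) == 40);
  assert (sum_folded (a, b, 100) == 40);
  assert (fold_is_equivalent (a, b, -7));
}
```

Calling check_fold_examples () from a test driver exercises both forms. The inputs are kept small so that no intermediate sum overflows int32_t.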