From patchwork Mon Nov 23 11:26:20 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tamar Christina X-Patchwork-Id: 1404719 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=sourceware.org; envelope-from=gcc-patches-bounces@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gcc.gnu.org Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=Soi90wNp; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4CflHG2l0Dz9sTL for ; Mon, 23 Nov 2020 22:27:05 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id D07113865460; Mon, 23 Nov 2020 11:26:56 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org D07113865460 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1606130816; bh=U6BFD6T1vvh6/wgHxPow33EkT6i/HQT8YMGol7dic68=; h=Date:To:Subject:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:Cc:From; b=Soi90wNpV4SZFTEZWV4NUYs/KmmvRjOIhcK9B4LOg0SpJ8pj/DL8Xz8qO5kRYsXM+ 8ZmMSLc8W9mLIat7Id03p/bn2sNHjs6HHf9iNFy7yB2DeuYURDrFSEddJ0gGDrf3B1 21mmxhHm8LB6Owes1VuH8ZTeem+p+/PuSq6Q30js= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from EUR05-DB8-obe.outbound.protection.outlook.com (mail-db8eur05on2074.outbound.protection.outlook.com [40.107.20.74]) by sourceware.org (Postfix) with ESMTPS id 4BF8F385800D for ; Mon, 23 Nov 2020 11:26:41 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 4BF8F385800D Received: from AM6P192CA0089.EURP192.PROD.OUTLOOK.COM (2603:10a6:209:8d::30) by AM9PR08MB6050.eurprd08.prod.outlook.com (2603:10a6:20b:285::7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3589.20; Mon, 23 Nov 2020 11:26:37 +0000 Received: from VE1EUR03FT004.eop-EUR03.prod.protection.outlook.com (2603:10a6:209:8d:cafe::61) by AM6P192CA0089.outlook.office365.com (2603:10a6:209:8d::30) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3589.20 via Frontend Transport; Mon, 23 Nov 2020 11:26:37 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 63.35.35.123) smtp.mailfrom=arm.com; gcc.gnu.org; dkim=pass (signature was verified) header.d=armh.onmicrosoft.com;gcc.gnu.org; dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 63.35.35.123 as permitted sender) receiver=protection.outlook.com; client-ip=63.35.35.123; helo=64aa7808-outbound-1.mta.getcheckrecipient.com; Received: from 64aa7808-outbound-1.mta.getcheckrecipient.com (63.35.35.123) by VE1EUR03FT004.mail.protection.outlook.com (10.152.18.106) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3589.20 via Frontend Transport; Mon, 23 Nov 2020 11:26:37 +0000 Received: ("Tessian outbound 13ed5f5344c0:v71"); Mon, 23 Nov 2020 11:26:36 +0000 X-CheckRecipientChecked: true X-CR-MTA-CID: ba0711c6c840fe91 X-CR-MTA-TID: 64aa7808 Received: from 893ba396b9bc.1 by 64aa7808-outbound-1.mta.getcheckrecipient.com id 168FC930-CA78-4AAC-828C-792EEB386868.1; Mon, 23 Nov 2020 11:26:26 +0000 Received: from EUR04-DB3-obe.outbound.protection.outlook.com by 64aa7808-outbound-1.mta.getcheckrecipient.com with ESMTPS id 893ba396b9bc.1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384); Mon, 23 Nov 2020 11:26:26 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=TZ7jMHX0qBLPQ45VBGaz96K79mT4BGyuSAwYUv3k1fUxvPOBuXAtlaZdjw7TILdRmgRM6gRb085PdWqqfWA8Yn42R3MK3pgfFzu+/plT/vm0P+45YpYae7lZ5zwVFPLqmW127fEyES59zx0NAvgEeU0eO7T6dYOBd1eWCpuEfiNurP54cbF6RtuTfA5XF3wAHZJsoBNftu9p6+xv0+wEpmUx8oWWgi0os53QipWBoIZxmTeBX73wNUMvRL8GSJKZvdfeDVzI2F3nCmMjpEDAAIlaKO6cAt+OPUFIm29h2Z1xFjKbeHZIz92SiqoA+tZjcN6oWX1HCUwqe06KgEw/SA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=U6BFD6T1vvh6/wgHxPow33EkT6i/HQT8YMGol7dic68=; b=XLO8TP0SxrzZ9ALmnPxNCqZmNxgqZ0tfFpgyNStl/GViMONARHaHo3GQp1ZNz+nxnrVSYhHq3UpU7VhizpJ0IUz0rZcrrtHDEjVJaNvARdIRH2cCgco6Ueab6CrjRJBt1WQzACcxximrvGRKaI9E63VsXzVfHG5usv3nB0nXDQEnPzCpW/lrZ0bNRCd7fsUIg7WtuSXkzknUKUQk7gnmOuqzsthakB8oY45entvfgSuw6QHJlBdz63AeeLDCf+M3JR4otg/oHDHh9EkGtr+Cc8hAPlOGRAPErbkhUi3GbFm4ApNG58LrWmf5Oy4CiYSUQ26Jd7vBXmPxssB9CuC7iw== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass header.d=arm.com; arc=none Authentication-Results-Original: gcc.gnu.org; dkim=none (message not signed) header.d=none;gcc.gnu.org; dmarc=none action=none header.from=arm.com; Received: from VI1PR08MB5325.eurprd08.prod.outlook.com (2603:10a6:803:13e::17) by VE1PR08MB4654.eurprd08.prod.outlook.com (2603:10a6:802:a4::21) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3589.29; Mon, 23 Nov 2020 11:26:23 +0000 Received: from VI1PR08MB5325.eurprd08.prod.outlook.com ([fe80::f937:5b3:12e1:8297]) by VI1PR08MB5325.eurprd08.prod.outlook.com ([fe80::f937:5b3:12e1:8297%7]) with mapi id 15.20.3589.022; Mon, 23 Nov 2020 11:26:23 +0000 Date: Mon, 23 Nov 2020 11:26:20 +0000 To: gcc-patches@gcc.gnu.org Subject: [PATCH] middle-end: Support complex Addition Message-ID: Content-Disposition: inline User-Agent: Mutt/1.9.4 (2018-02-28) X-Originating-IP: [217.140.106.53] X-ClientProxiedBy: LO4P123CA0058.GBRP123.PROD.OUTLOOK.COM (2603:10a6:600:153::9) To VI1PR08MB5325.eurprd08.prod.outlook.com (2603:10a6:803:13e::17) MIME-Version: 1.0 X-MS-Exchange-MessageSentRepresentingType: 1 Received: from arm.com (217.140.106.53) by LO4P123CA0058.GBRP123.PROD.OUTLOOK.COM (2603:10a6:600:153::9) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3589.20 via Frontend Transport; Mon, 23 Nov 2020 11:26:22 +0000 X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-HT: Tenant X-MS-Office365-Filtering-Correlation-Id: 78a2e527-8b29-4f6e-dff5-08d88fa2a404 X-MS-TrafficTypeDiagnostic: VE1PR08MB4654:|AM9PR08MB6050: X-Microsoft-Antispam-PRVS: x-checkrecipientrouted: true NoDisclaimer: true X-MS-Oob-TLC-OOBClassifiers: OLM:8882;OLM:8882; X-MS-Exchange-SenderADCheck: 1 X-Microsoft-Antispam-Untrusted: BCL:0; X-Microsoft-Antispam-Message-Info-Original: lITvg7J06Fyokk9nv7g+IIpckMqVGAePKF+SeMLzpce+5msvXQ2k4XQxSgdKz3FfJ5lCLNU1uZClDcTqhWQlEvNQ9nk8RNB9Xk0oAgBvsmZyVUHc34q6Zye48cn9EX2vOKJYv3WOmNmwdG8ztOB42X328lJeJ1h2gcXjzJseU4voJvVkc62G/ttKMmoH7tLxvpO9TtOSA8ttncIgv8A9voqYsqrlr1fqWGqXAg/NkjkHcP2cdYrxyLe2bLemYlO7JXI9sUQEf6/BGW9s+HViW5JYQ0BSzTqyYCRv2+G3YFRgxC+E4v0OoVraxLNtxf8SEycxjbbISMweKCi+Pgc3LZ59m6/unGRNlMQIDFxXazOIRZx/Cikyr4gYxjfF6zZ9qew/58nKH8q+8bRBvaXbQc0CC1CvT8U7kP+o62IcGGKPhfVdNUNs/XvjZpd51LRL5xeUXl0oW3qjkXQUwFu3u5OT86uAUhGOGzlPrtz5lpCN5742nbxyYtpFCOLKKUNY X-Forefront-Antispam-Report-Untrusted: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:VI1PR08MB5325.eurprd08.prod.outlook.com; PTR:; CAT:NONE; SFS:(4636009)(39860400002)(366004)(136003)(346002)(396003)(376002)(7696005)(2906002)(186003)(478600001)(6916009)(16526019)(235185007)(316002)(8936002)(4743002)(8676002)(36756003)(44832011)(52116002)(8886007)(2616005)(30864003)(4326008)(86362001)(83380400001)(55016002)(26005)(44144004)(66476007)(66946007)(33964004)(66616009)(956004)(5660300002)(66556008)(4216001)(2004002)(2700100001)(579004)(559001); DIR:OUT; SFP:1101; X-MS-Exchange-AntiSpam-MessageData: pxbA4+NKyUCmHOaEtpefE9w5trKt47Xs7ik3ReVq1tYC51wbEnHkRm3Y02D3bT/EQvCEsHbl2lh+LZdYUNrYzV0KxAPlaI9GvxUJWUmDd816bsaWfZwna+xVadqwICpHm1AxmqpvhaHklNlw/8shMqJpD1Wv8SYY38MMIaBmRZWEUjkTnKheWdo1kv/4+MdRPdmhxlA9KKHAjpOw8TxBg5oR1SIboEkuT1Qk1NhKaexQtb/veJuuraujPoRB0jNit9pRMyGxYc3qxeNh9gRZtRJmQ2YmND+VTX1vWUI75bItyocUWkzMhDys8+l8FpxCu/DUCsElCDyYzHqpyWtxFVxdPleBCYIdRzLPa2YE75Z6YtgDMFXYGANji1qx5ZZyE5RM7xu7X+Xs7LtRlrP6kd/zvc24mtkYUAIyuI0VQX9S+Z4dJKF2If9HJ6R70/VV20jaWoQgY3ZblYXgbJLUH7iXWohWwFYMHlIqUD0vvTmpVksRYOCLDC+C7Yg+ULasa7YXJp4rHbsyDsikMNL+ePV3CTFjWrzNJDqRQs4yIDPRWuSb+TfU/HdV4iz8DjH6rL+FX+vJSGqZhmcQ+KCJeNp7dBnfZuUl8ahsPyM44uK6oouMGO8CTA4FHYHCyRdh/YcYkQd9Bmk+/2ZbZNtgyg== X-MS-Exchange-Transport-Forked: True X-MS-Exchange-Transport-CrossTenantHeadersStamped: VE1PR08MB4654 Original-Authentication-Results: gcc.gnu.org; dkim=none (message not signed) header.d=none;gcc.gnu.org; dmarc=none action=none header.from=arm.com; X-EOPAttributedMessage: 0 X-MS-Exchange-Transport-CrossTenantHeadersStripped: VE1EUR03FT004.eop-EUR03.prod.protection.outlook.com X-MS-Office365-Filtering-Correlation-Id-Prvs: 26368d3e-7748-4770-8b7d-08d88fa29b76 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: 5zhWX6eqe9lCg2njbih2Fc7JI3ddGeR+1UN1ogd9v1wJhtIwpIHJh07gvyz6Fo5urBw6GWnm8Vy0Z6fWGhQ+hFPMDwetGujBuOra3icQzAx++W5QNSKGvdAhghF/so58jG05OiQQCRK37YPIOpVPvFAlUtkgH1GMSFn/SBaBOMtGP1JcuVit2Cs8wMC+rLUKwfhqiA1PjPmD1jVpLwaBDRvyDg8uTTSvXD69EeAATaDk6LGbrM99En/oK7qayVugAaOndlz7QCyGK08VhIIosjKq+6Pdb43I1XWEfwxzI5Xfk9wDXFTrmeN/Aqhcew67qLCsSvW2nV7ldYhGot3ePObpNKwFVE/QAm2mXQsKlxOgvm15+QcSOL2jDz2UQxfJg07sfPOsY1759oIssFvpEtxEE6YmGXDvxuiwZbvQfMnScxM+d8Nfp0k9wWa1leWLHhuo8KKYHIlO5SDA1TDriLzgKCy9o1rV5apIWIdJW2YA+U8Pi2bn0AIuk5/NdvqtSQtp8GHNSh87ap4OzpMZmAw060DnCGVAr2CTlMRe9Uo= X-Forefront-Antispam-Report: CIP:63.35.35.123; CTRY:IE; LANG:en; SCL:1; SRV:; IPV:CAL; SFV:NSPM; H:64aa7808-outbound-1.mta.getcheckrecipient.com; PTR:ec2-63-35-35-123.eu-west-1.compute.amazonaws.com; CAT:NONE; SFS:(4636009)(396003)(39860400002)(136003)(376002)(346002)(46966005)(5660300002)(4326008)(235185007)(81166007)(55016002)(30864003)(6916009)(82310400003)(70586007)(66616009)(70206006)(4743002)(83380400001)(8886007)(47076004)(82740400003)(356005)(316002)(16526019)(26005)(36756003)(8676002)(336012)(33964004)(956004)(2616005)(44832011)(478600001)(186003)(8936002)(86362001)(2906002)(107886003)(7696005)(44144004)(4216001)(2004002)(2700100001)(559001)(579004); DIR:OUT; SFP:1101; X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 23 Nov 2020 11:26:37.0392 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 78a2e527-8b29-4f6e-dff5-08d88fa2a404 X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d; Ip=[63.35.35.123]; Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com] X-MS-Exchange-CrossTenant-AuthSource: VE1EUR03FT004.eop-EUR03.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: AM9PR08MB6050 X-Spam-Status: No, score=-13.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, GIT_PATCH_0, KAM_ASCII_DIVIDERS, KAM_LOTSOFHASH, KAM_SHORT, MSGID_FROM_MTA_HEADER, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H2, SCC_5_SHORT_WORD_LINES, SPF_HELO_PASS, SPF_PASS, TXREP, UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Tamar Christina via Gcc-patches From: Tamar Christina Reply-To: Tamar Christina Cc: hongtao.liu@intel.com, nd@arm.com, rguenther@suse.de, ook@ucw.cz Errors-To: gcc-patches-bounces@gcc.gnu.org Sender: "Gcc-patches" Hi All, This patch adds support for * Complex Addition with rotation of 90 and 270. Addition with rotation of the second argument around the Argand plane. Supported rotations are 90 and 180. c = a + (b * I) and c = a + (b * I * I * I) For the full code I have pushed a branch at refs/users/tnfchris/heads/complex-numbers. As a side note, I still needed to set STMT_SLP_TYPE (call_stmt_info) = pure_slp; as the new hybrid detection code only runs for loop aware SLP. Bootstrapped Regtested on aarch64-none-linux-gnu and no issues, but sorting out the testcases as TCL is processed before the CPP.. Ok for master? Thanks, Tamar gcc/ChangeLog: * tree-vect-slp-patterns.c: New file. * Makefile.in: Add it. * doc/passes.texi: Document it. * internal-fn.def (COMPLEX_ADD_ROT90, COMPLEX_ADD_ROT270): New. * optabs.def (cadd90_optab, cadd270_optab): New. * doc/md.texi: Document them. * tree-vect-slp.c: (vect_free_slp_instance, vect_create_new_slp_node): Export. (vect_match_slp_patterns_2, vect_match_slp_patterns): New. (vect_analyze_slp): Use it. * tree-vectorizer.h (vect_free_slp_tree): Export. (enum _complex_operation): Forward declare. (class vect_pattern): New gcc/testsuite/ChangeLog: * lib/target-supports.exp (check_effective_target_arm_v8_3a_complex_neon_ok_nocache): Fix it. (check_effective_target_vect_complex_add_byte ,check_effective_target_vect_complex_add_int ,check_effective_target_vect_complex_add_short ,check_effective_target_vect_complex_add_long ,check_effective_target_vect_complex_add_half ,check_effective_target_vect_complex_add_float ,check_effective_target_vect_complex_add_double): New. * gcc.dg/vect/complex/bb-slp-complex-add-byte.c: New test. * gcc.dg/vect/complex/bb-slp-complex-add-int.c: New test. * gcc.dg/vect/complex/bb-slp-complex-add-long.c: New test. * gcc.dg/vect/complex/bb-slp-complex-add-pattern-byte.c: New test. * gcc.dg/vect/complex/bb-slp-complex-add-pattern-int.c: New test. * gcc.dg/vect/complex/bb-slp-complex-add-pattern-long.c: New test. * gcc.dg/vect/complex/bb-slp-complex-add-pattern-short.c: New test. * gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-byte.c: New test. * gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-int.c: New test. * gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-long.c: New test. * gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-short.c: New test. * gcc.dg/vect/complex/bb-slp-complex-add-short.c: New test. * gcc.dg/vect/complex/bb-slp-complex-add-unsigned-byte.c: New test. * gcc.dg/vect/complex/bb-slp-complex-add-unsigned-int.c: New test. * gcc.dg/vect/complex/bb-slp-complex-add-unsigned-long.c: New test. * gcc.dg/vect/complex/bb-slp-complex-add-unsigned-short.c: New test. * gcc.dg/vect/complex/complex-add-pattern-template.c: New test. * gcc.dg/vect/complex/complex-add-template.c: New test. * gcc.dg/vect/complex/complex-operations-run.c: New test. * gcc.dg/vect/complex/complex-operations.c: New test. * gcc.dg/vect/complex/fast-math-bb-slp-complex-add-double.c: New test. * gcc.dg/vect/complex/fast-math-bb-slp-complex-add-float.c: New test. * gcc.dg/vect/complex/fast-math-bb-slp-complex-add-half-float.c: New test. * gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-double.c: New test. * gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-float.c: New test. * gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-half-float.c: New test. * gcc.dg/vect/complex/fast-math-complex-add-double.c: New test. * gcc.dg/vect/complex/fast-math-complex-add-float.c: New test. * gcc.dg/vect/complex/fast-math-complex-add-half-float.c: New test. * gcc.dg/vect/complex/fast-math-complex-add-pattern-double.c: New test. * gcc.dg/vect/complex/fast-math-complex-add-pattern-float.c: New test. * gcc.dg/vect/complex/fast-math-complex-add-pattern-half-float.c: New test. * gcc.dg/vect/complex/vect-complex-add-byte.c: New test. * gcc.dg/vect/complex/vect-complex-add-int.c: New test. * gcc.dg/vect/complex/vect-complex-add-long.c: New test. * gcc.dg/vect/complex/vect-complex-add-pattern-byte.c: New test. * gcc.dg/vect/complex/vect-complex-add-pattern-int.c: New test. * gcc.dg/vect/complex/vect-complex-add-pattern-long.c: New test. * gcc.dg/vect/complex/vect-complex-add-pattern-short.c: New test. * gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-byte.c: New test. * gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-int.c: New test. * gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-long.c: New test. * gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-short.c: New test. * gcc.dg/vect/complex/vect-complex-add-short.c: New test. * gcc.dg/vect/complex/vect-complex-add-unsigned-byte.c: New test. * gcc.dg/vect/complex/vect-complex-add-unsigned-int.c: New test. * gcc.dg/vect/complex/vect-complex-add-unsigned-long.c: New test. * gcc.dg/vect/complex/vect-complex-add-unsigned-short.c: New test. --- inline copy of patch -- diff --git a/gcc/Makefile.in b/gcc/Makefile.in index 778ec09c75d9af1cb9f2d5e7582b948c0397db65..d80657b089829fa30cede8bcfe036dda0ec06682 100644 diff --git a/gcc/Makefile.in b/gcc/Makefile.in index 778ec09c75d9af1cb9f2d5e7582b948c0397db65..d80657b089829fa30cede8bcfe036dda0ec06682 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -1646,6 +1646,7 @@ OBJS = \ tree-vect-loop.o \ tree-vect-loop-manip.o \ tree-vect-slp.o \ + tree-vect-slp-patterns.o \ tree-vectorizer.o \ tree-vector-builder.o \ tree-vrp.o \ diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index da8c9a283dd42e2b3078ed5f370a37180ee0b538..2a030a1d7373cd2b5837aa1c99936a6a4e4e1480 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -6154,6 +6154,54 @@ floating-point mode. This pattern is not allowed to @code{FAIL}. +@cindex @code{cadd90@var{m}3} instruction pattern +@item @samp{cadd90@var{m}3} +Perform vector add and subtract on even/odd number pairs. The operation being +matched is semantically described as + +@smallexample + for (int i = 0; i < N; i += 2) + @{ + c[i] = a[i] - b[i+1]; + c[i+1] = a[i+1] + b[i]; + @} +@end smallexample + +This operation is semantically equivalent to performing a vector addition of +complex numbers in operand 1 with operand 2 rotated by 90 degrees around +the argand plane and storing the result in operand 0. + +In GCC lane ordering the real part of the number must be in the even lanes with +the imaginary part in the odd lanes. + +The operation is only supported for vector modes @var{m}. + +This pattern is not allowed to @code{FAIL}. + +@cindex @code{cadd270@var{m}3} instruction pattern +@item @samp{cadd270@var{m}3} +Perform vector add and subtract on even/odd number pairs. The operation being +matched is semantically described as + +@smallexample + for (int i = 0; i < N; i += 2) + @{ + c[i] = a[i] + b[i+1]; + c[i+1] = a[i+1] - b[i]; + @} +@end smallexample + +This operation is semantically equivalent to performing a vector addition of +complex numbers in operand 1 with operand 2 rotated by 270 degrees around +the argand plane and storing the result in operand 0. + +In GCC lane ordering the real part of the number must be in the even lanes with +the imaginary part in the odd lanes. + +The operation is only supported for vector modes @var{m}. + +This pattern is not allowed to @code{FAIL}. + @cindex @code{ffs@var{m}2} instruction pattern @item @samp{ffs@var{m}2} Store into operand 0 one plus the index of the least significant 1-bit diff --git a/gcc/doc/passes.texi b/gcc/doc/passes.texi index a5ae4143a8c1293e674b499120372ee5fe5c412b..c86df5cd843084a5b7933ef99a23386891a7b0c1 100644 --- a/gcc/doc/passes.texi +++ b/gcc/doc/passes.texi @@ -709,7 +709,8 @@ loop. The pass is implemented in @file{tree-vectorizer.c} (the main driver), @file{tree-vect-loop.c} and @file{tree-vect-loop-manip.c} (loop specific parts and general loop utilities), @file{tree-vect-slp} (loop-aware SLP -functionality), @file{tree-vect-stmts.c} and @file{tree-vect-data-refs.c}. +functionality), @file{tree-vect-stmts.c}, @file{tree-vect-data-refs.c} and +@file{tree-vect-slp-patterns.c} containing the SLP pattern matcher. Analysis of data references is in @file{tree-data-ref.c}. SLP Vectorization. This pass performs vectorization of straight-line code. The diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def index 310d37aa53819791b5df1683afca831f08e5892a..33c54be1e158ddea25c4cd6b1148df8cf4a509b5 100644 --- a/gcc/internal-fn.def +++ b/gcc/internal-fn.def @@ -277,6 +277,9 @@ DEF_INTERNAL_FLT_FN (SCALB, ECF_CONST, scalb, binary) DEF_INTERNAL_FLT_FLOATN_FN (FMIN, ECF_CONST, fmin, binary) DEF_INTERNAL_FLT_FLOATN_FN (FMAX, ECF_CONST, fmax, binary) DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONST, xorsign, binary) +DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT90, ECF_CONST, cadd90, binary) +DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary) + /* FP scales. */ DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary) diff --git a/gcc/optabs.def b/gcc/optabs.def index 5607f51e6b4b775a92d1d8ffcd3e9b53e9270d6c..e9727def4dbf941bb9ac8b56f83f8ea0f52b262c 100644 --- a/gcc/optabs.def +++ b/gcc/optabs.def @@ -290,6 +290,8 @@ OPTAB_D (atan_optab, "atan$a2") OPTAB_D (atanh_optab, "atanh$a2") OPTAB_D (copysign_optab, "copysign$F$a3") OPTAB_D (xorsign_optab, "xorsign$F$a3") +OPTAB_D (cadd90_optab, "cadd90$a3") +OPTAB_D (cadd270_optab, "cadd270$a3") OPTAB_D (cos_optab, "cos$a2") OPTAB_D (cosh_optab, "cosh$a2") OPTAB_D (exp10_optab, "exp10$a2") diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-byte.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-byte.c new file mode 100644 index 0000000000000000000000000000000000000000..3b1e0837a323364c55094240b21dcc4938fa37c2 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-byte.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_complex_add_byte } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ + +#define TYPE int8_t +#define N 16 +#include +#include "complex-add-template.c" \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-int.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-int.c new file mode 100644 index 0000000000000000000000000000000000000000..33d3d13d629bb831272609c484c78e6d19a7b930 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-int.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_complex_add_int } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ + +#define TYPE int32_t +#define N 16 +#include +#include "complex-add-template.c" \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-long.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-long.c new file mode 100644 index 0000000000000000000000000000000000000000..54d0f1d6864c41fc656eeb1af32736ad37dcf381 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-long.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_complex_add_long } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ + +#define TYPE int64_t +#define N 16 +#include +#include "complex-add-template.c" \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-byte.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-byte.c new file mode 100644 index 0000000000000000000000000000000000000000..fac77f7b626c985e4b033818a10f126784d5a9a6 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-byte.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_complex_add_byte } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ + +#define TYPE int8_t +#define N 16 +#include +#include "complex-add-pattern-template.c" \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-int.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-int.c new file mode 100644 index 0000000000000000000000000000000000000000..41a836c10c8f2f45a521912186ab8ac5393f69fd --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-int.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_complex_add_int } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ + +#define TYPE int32_t +#define N 16 +#include +#include "complex-add-pattern-template.c" \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-long.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-long.c new file mode 100644 index 0000000000000000000000000000000000000000..175f51c46d125578520b5205c86ca8a836174a2f --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-long.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_complex_add_long } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ + +#define TYPE int64_t +#define N 16 +#include +#include "complex-add-pattern-template.c" \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-short.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-short.c new file mode 100644 index 0000000000000000000000000000000000000000..c4fe72712a4d90bb5e89e6f6b2359029715c0bd8 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-short.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_complex_add_short } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ + +#define TYPE int16_t +#define N 16 +#include +#include "complex-add-pattern-template.c" \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-byte.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-byte.c new file mode 100644 index 0000000000000000000000000000000000000000..534a4201d54f73e0419c99a59955900b473107c8 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-byte.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_complex_add_byte } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ + +#define TYPE uint8_t +#define N 16 +#include +#include "complex-add-pattern-template.c" \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-int.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-int.c new file mode 100644 index 0000000000000000000000000000000000000000..9e3cf8062668b87962e0c71710579939f950651c --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-int.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_complex_add_int } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ + +#define TYPE uint32_t +#define N 16 +#include +#include "complex-add-pattern-template.c" \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-long.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-long.c new file mode 100644 index 0000000000000000000000000000000000000000..398fc94154c88f2f9088910e50c3c1d4cc0ce17f --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-long.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_complex_add_long } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ + +#define TYPE uint64_t +#define N 16 +#include +#include "complex-add-pattern-template.c" \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-short.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-short.c new file mode 100644 index 0000000000000000000000000000000000000000..7326d29d86c27056705c6287da41dd0b85d5cc35 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-short.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_complex_add_short } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ + +#define TYPE uint16_t +#define N 16 +#include +#include "complex-add-pattern-template.c" \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-short.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-short.c new file mode 100644 index 0000000000000000000000000000000000000000..c1ce663dc7ab09875a06ad50381acc955dfd1fff --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-short.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_complex_add_short } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ + +#define TYPE int16_t +#define N 16 +#include +#include "complex-add-template.c" \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-unsigned-byte.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-unsigned-byte.c new file mode 100644 index 0000000000000000000000000000000000000000..8d0c817fdae8e6ff6cdc665d6a132b4fc322ea61 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-unsigned-byte.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_complex_add_byte } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ + +#define TYPE uint8_t +#define N 16 +#include +#include "complex-add-template.c" \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-unsigned-int.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-unsigned-int.c new file mode 100644 index 0000000000000000000000000000000000000000..3b08ecd0dd80f949ab88d7e747602bb99fea7acc --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-unsigned-int.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_complex_add_int } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ + +#define TYPE uint32_t +#define N 16 +#include +#include "complex-add-template.c" \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-unsigned-long.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-unsigned-long.c new file mode 100644 index 0000000000000000000000000000000000000000..4e069ee8297064dcad7447fff6012a10a34543e3 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-unsigned-long.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_complex_add_long } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ + +#define TYPE uint64_t +#define N 16 +#include +#include "complex-add-template.c" \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-unsigned-short.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-unsigned-short.c new file mode 100644 index 0000000000000000000000000000000000000000..88d21abd3c8ee59901df645cf5c036c548cc6b1c --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-unsigned-short.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_complex_add_short } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ + +#define TYPE uint16_t +#define N 16 +#include +#include "complex-add-template.c" \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/complex/complex-add-pattern-template.c b/gcc/testsuite/gcc.dg/vect/complex/complex-add-pattern-template.c new file mode 100644 index 0000000000000000000000000000000000000000..e8b8b19d1708673b17564b31d22df3443d667277 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/complex-add-pattern-template.c @@ -0,0 +1,60 @@ +void add90 (TYPE a[restrict N], TYPE b[restrict N], TYPE c[restrict N]) +{ + for (int i=0; i < N; i+=2) + { + c[i] = a[i] - b[i+1]; + c[i+1] = a[i+1] + b[i]; + } +} + +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "vect" } } */ + +void add270 (TYPE a[restrict N], TYPE b[restrict N], TYPE c[restrict N]) +{ + for (int i=0; i < N; i+=2) + { + c[i] = a[i] + b[i+1]; + c[i+1] = a[i+1] - b[i]; + } +} + +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "vect" } } */ + +void addMixed (TYPE a[restrict N], TYPE b[restrict N], TYPE c[restrict N]) +{ + for (int i=0; i < N; i+=4) + { + c[i] = a[i] - b[i+1]; + c[i+1] = a[i+1] + b[i]; + c[i+2] = a[i+2] + b[i+3]; + c[i+3] = a[i+3] - b[i+2]; + } +} + +void add90HandUnrolled (TYPE a[restrict N], TYPE b[restrict N], + TYPE c[restrict N]) +{ + for (int i=0; i < (N /2); i+=4) + { + c[i] = a[i] - b[i+1]; + c[i+2] = a[i+2] - b[i+3]; + c[i+1] = a[i+1] + b[i]; + c[i+3] = a[i+3] + b[i+2]; + } +} + +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "vect" } } */ + +void add90Hybrid (TYPE a[restrict N], TYPE b[restrict N], TYPE c[restrict N], + TYPE d[restrict N]) +{ + for (int i=0; i < N; i+=2) + { + c[i] = a[i] - b[i+1]; + c[i+1] = a[i+1] + b[i]; + d[i] = a[i] - b[i]; + d[i+1] = a[i+1] - b[i+1]; + } +} + +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 2 "vect" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/complex/complex-add-template.c b/gcc/testsuite/gcc.dg/vect/complex/complex-add-template.c new file mode 100644 index 0000000000000000000000000000000000000000..afe08e867473695f0a742de330944f495bc541d7 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/complex-add-template.c @@ -0,0 +1,77 @@ +void add0 (TYPE _Complex a[restrict N], TYPE _Complex b[restrict N], + TYPE _Complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] = a[i] + b[i]; +} + +void add90snd (TYPE _Complex a[restrict N], TYPE _Complex b[restrict N], + TYPE _Complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] = a[i] + (b[i] * 1.0i); +} + +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "vect" } } */ + +void add180snd (TYPE _Complex a[restrict N], TYPE _Complex b[restrict N], + TYPE _Complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] = a[i] + (b[i] * 1.0i * 1.0i); +} + +void add270snd (TYPE _Complex a[restrict N], TYPE _Complex b[restrict N], + TYPE _Complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] = a[i] + b[i]; +} + +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "vect" } } */ + +void add90fst (TYPE _Complex a[restrict N], TYPE _Complex b[restrict N], + TYPE _Complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] = (a[i] * 1.0i) + b[i]; +} + +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "vect" } } */ + +void add180fst (TYPE _Complex a[restrict N], TYPE _Complex b[restrict N], + TYPE _Complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] = (a[i] * 1.0i * 1.0i) + b[i]; +} + +void add270fst (TYPE _Complex a[restrict N], TYPE _Complex b[restrict N], + TYPE _Complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] = (a[i] * 1.0i * 1.0i * 1.0i) + b[i]; +} + +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "vect" } } */ + +void addconjfst (TYPE _Complex a[restrict N], TYPE _Complex b[restrict N], + TYPE _Complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] = ~a[i] + b[i]; +} + +void addconjsnd (TYPE _Complex a[restrict N], TYPE _Complex b[restrict N], + TYPE _Complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] = a[i] + ~b[i]; +} + +void addconjboth (TYPE _Complex a[restrict N], TYPE _Complex b[restrict N], + TYPE _Complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] = ~a[i] + ~b[i]; +} diff --git a/gcc/testsuite/gcc.dg/vect/complex/complex-operations-run.c b/gcc/testsuite/gcc.dg/vect/complex/complex-operations-run.c new file mode 100644 index 0000000000000000000000000000000000000000..a0348a7041ca384104bc5ab688d941c14e5b7381 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/complex-operations-run.c @@ -0,0 +1,103 @@ +/* { dg-do run } */ +/* { dg-require-effective-target vect_complex_add_double } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ + +#include +#include +#include +#include +#include + +#define PREF old +#pragma GCC push_options +#pragma GCC optimize ("no-tree-vectorize") +# include "complex-operations.c" +#pragma GCC pop_options +#undef PREF + +#define PREF new +# include "complex-operations.c" +#undef PREF + +#define TYPE double +#define TYPE2 double +#define EP pow(2, -45) + +#define xstr(s) str(s) +#define str(s) #s + +#define FCMP(A, B) \ + ((fabs (creal (A) - creal (B)) <= EP) && (fabs (cimag (A) - cimag (B)) <= EP)) + +#define CMP(A, B) \ + (FCMP(A,B) ? "PASS" : "FAIL") + +#define COMPARE(A,B) \ + memset (&c1, 0, sizeof (c1)); \ + memset (&c2, 0, sizeof (c2)); \ + A; B; \ + if (!FCMP(c1[0],c2[0]) || !FCMP(c1[1], c2[1])) \ + { \ + printf ("=> %s vs %s\n", xstr (A), xstr (B)); \ + printf ("%a\n", creal (c1[0]) - creal (c2[0])); \ + printf ("%a\n", cimag (c1[1]) - cimag (c2[1])); \ + printf ("%.2f+%.2fI == %.2f+%.2fI (%s)\n", creal (c1[0]), cimag (c1[0]), creal (c2[0]), cimag (c2[0]), CMP (c1[0], c2[0])); \ + printf ("%.2f+%.2fI == %.2f+%.2fI (%s)\n", creal (c1[1]), cimag (c1[1]), creal (c2[1]), cimag (c2[1]), CMP (c1[1], c2[1])); \ + printf ("\n"); \ + __builtin_abort (); \ + } + +int main () +{ + TYPE2 complex a[] = { 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I }; + TYPE complex b[] = { 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I }; + TYPE complex c2[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; + TYPE complex c1[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; + TYPE diff1, diff2; + + COMPARE(fma0_old(a, b, c1), fma0_new(a, b, c2)); + COMPARE(fma90_old(a, b, c1), fma90_new(a, b, c2)); + COMPARE(fma180_old(a, b, c1), fma180_new(a, b, c2)); + COMPARE(fma270_old(a, b, c1), fma270_new(a, b, c2)); + COMPARE(fma0_snd_old(a, b, c1), fma0_snd_new(a, b, c2)); + COMPARE(fma90_snd_old(a, b, c1), fma90_snd_new(a, b, c2)); + COMPARE(fma180_snd_old(a, b, c1), fma180_snd_new(a, b, c2)); + COMPARE(fma270_snd_old(a, b, c1), fma270_snd_new(a, b, c2)); + COMPARE(fma_conj_first_old(a, b, c1), fma_conj_first_new(a, b, c2)); + COMPARE(fma_conj_second_old(a, b, c1), fma_conj_second_new(a, b, c2)); + COMPARE(fma_conj_both_old(a, b, c1), fma_conj_both_new(a, b, c2)); + COMPARE(fms0_old(a, b, c1), fms0_new(a, b, c2)); + COMPARE(fms90_old(a, b, c1), fms90_new(a, b, c2)); + COMPARE(fms180_old(a, b, c1), fms180_new(a, b, c2)); + COMPARE(fms270_old(a, b, c1), fms270_new(a, b, c2)); + COMPARE(fms0_snd_old(a, b, c1), fms0_snd_new(a, b, c2)); + COMPARE(fms90_snd_old(a, b, c1), fms90_snd_new(a, b, c2)); + COMPARE(fms180_snd_old(a, b, c1), fms180_snd_new(a, b, c2)); + COMPARE(fms270_snd_old(a, b, c1), fms270_snd_new(a, b, c2)); + COMPARE(fms_conj_first_old(a, b, c1), fms_conj_first_new(a, b, c2)); + COMPARE(fms_conj_second_old(a, b, c1), fms_conj_second_new(a, b, c2)); + COMPARE(fms_conj_both_old(a, b, c1), fms_conj_both_new(a, b, c2)); + COMPARE(mul0_old(a, b, c1), mul0_new(a, b, c2)); + COMPARE(mul90_old(a, b, c1), mul90_new(a, b, c2)); + COMPARE(mul180_old(a, b, c1), mul180_new(a, b, c2)); + COMPARE(mul270_old(a, b, c1), mul270_new(a, b, c2)); + COMPARE(mul0_snd_old(a, b, c1), mul0_snd_new(a, b, c2)); + COMPARE(mul90_snd_old(a, b, c1), mul90_snd_new(a, b, c2)); + COMPARE(mul180_snd_old(a, b, c1), mul180_snd_new(a, b, c2)); + COMPARE(mul270_snd_old(a, b, c1), mul270_snd_new(a, b, c2)); + COMPARE(mul_conj_first_old(a, b, c1), mul_conj_first_new(a, b, c2)); + COMPARE(mul_conj_second_old(a, b, c1), mul_conj_second_new(a, b, c2)); + COMPARE(mul_conj_both_old(a, b, c1), mul_conj_both_new(a, b, c2)); + COMPARE(add0_old(a, b, c1), add0_new(a, b, c2)); + COMPARE(add90_old(a, b, c1), add90_new(a, b, c2)); + COMPARE(add180_old(a, b, c1), add180_new(a, b, c2)); + COMPARE(add270_old(a, b, c1), add270_new(a, b, c2)); + COMPARE(add0_snd_old(a, b, c1), add0_snd_new(a, b, c2)); + COMPARE(add90_snd_old(a, b, c1), add90_snd_new(a, b, c2)); + COMPARE(add180_snd_old(a, b, c1), add180_snd_new(a, b, c2)); + COMPARE(add270_snd_old(a, b, c1), add270_snd_new(a, b, c2)); + COMPARE(add_conj_first_old(a, b, c1), add_conj_first_new(a, b, c2)); + COMPARE(add_conj_second_old(a, b, c1), add_conj_second_new(a, b, c2)); + COMPARE(add_conj_both_old(a, b, c1), add_conj_both_new(a, b, c2)); +} diff --git a/gcc/testsuite/gcc.dg/vect/complex/complex-operations.c b/gcc/testsuite/gcc.dg/vect/complex/complex-operations.c new file mode 100644 index 0000000000000000000000000000000000000000..fdce995481d23c6a536293c8ee59eaf9ca9239bf --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/complex-operations.c @@ -0,0 +1,358 @@ +#include +#include + +#ifndef PREF +#define PREF c +#endif + +#define FX(N,P) P ## _ ## N +#define MK(N,P) FX(P,N) + +#define N 32 +#define TYPE double + +// ------ FMA + +// Complex FMA instructions rotating the result + +__attribute__((noinline,noipa)) +void MK(fma0, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] += a[i] * b[i]; +} + +__attribute__((noinline,noipa)) +void MK(fma90, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] += a[i] * b[i] * I; +} + +__attribute__((noinline,noipa)) +void MK(fma180, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] += a[i] * b[i] * I * I; +} + +__attribute__((noinline,noipa)) +void MK(fma270, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] += a[i] * b[i] * I * I * I; +} + +// Complex FMA instructions rotating the second parameter. + + +__attribute__((noinline,noipa)) +void MK(fma0_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] += a[i] * b[i]; +} + +__attribute__((noinline,noipa)) +void MK(fma90_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] += a[i] * (b[i] * I); +} + +__attribute__((noinline,noipa)) +void MK(fma180_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] += a[i] * (b[i] * I * I); +} + +__attribute__((noinline,noipa)) +void MK(fma270_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] += a[i] * (b[i] * I * I * I); +} + +// Complex FMA instructions with conjucated values. + + +__attribute__((noinline,noipa)) +void MK(fma_conj_first, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] += conj (a[i]) * b[i]; +} + +__attribute__((noinline,noipa)) +void MK(fma_conj_second, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] += a[i] * conj (b[i]); +} + +__attribute__((noinline,noipa)) +void MK(fma_conj_both, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] += conj (a[i]) * conj (b[i]); +} + +// ----- FMS + +// Complex FMS instructions rotating the result + +__attribute__((noinline,noipa)) +void MK(fms0, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] -= a[i] * b[i]; +} + +__attribute__((noinline,noipa)) +void MK(fms90, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] -= a[i] * b[i] * I; +} + +__attribute__((noinline,noipa)) +void MK(fms180, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] -= a[i] * b[i] * I * I; +} + +__attribute__((noinline,noipa)) +void MK(fms270, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] -= a[i] * b[i] * I * I * I; +} + +// Complex FMS instructions rotating the second parameter. + +__attribute__((noinline,noipa)) +void MK(fms0_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] -= a[i] * b[i]; +} + +__attribute__((noinline,noipa)) +void MK(fms90_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] -= a[i] * (b[i] * I); +} + +__attribute__((noinline,noipa)) +void MK(fms180_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] -= a[i] * (b[i] * I * I); +} + +__attribute__((noinline,noipa)) +void MK(fms270_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] -= a[i] * (b[i] * I * I * I); +} + +// Complex FMS instructions with conjucated values. + +__attribute__((noinline,noipa)) +void MK(fms_conj_first, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] -= conj (a[i]) * b[i]; +} + +__attribute__((noinline,noipa)) +void MK(fms_conj_second, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] -= a[i] * conj (b[i]); +} + +__attribute__((noinline,noipa)) +void MK(fms_conj_both, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] -= conj (a[i]) * conj (b[i]); +} + + +// ----- MUL + +// Complex MUL instructions rotating the result + +__attribute__((noinline,noipa)) +void MK(mul0, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] = a[i] * b[i]; +} + +__attribute__((noinline,noipa)) +void MK(mul90, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] = a[i] * b[i] * I; +} + +__attribute__((noinline,noipa)) +void MK(mul180, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] = a[i] * b[i] * I * I; +} + +__attribute__((noinline,noipa)) +void MK(mul270, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] = a[i] * b[i] * I * I * I; +} + +// Complex MUL instructions rotating the second parameter. + +__attribute__((noinline,noipa)) +void MK(mul0_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] = a[i] * b[i]; +} + +__attribute__((noinline,noipa)) +void MK(mul90_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] = a[i] * (b[i] * I); +} + +__attribute__((noinline,noipa)) +void MK(mul180_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] = a[i] * (b[i] * I * I); +} + +__attribute__((noinline,noipa)) +void MK(mul270_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] = a[i] * (b[i] * I * I * I); +} + +// Complex FMS instructions with conjucated values. + +__attribute__((noinline,noipa)) +void MK(mul_conj_first, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] = conj (a[i]) * b[i]; +} + +__attribute__((noinline,noipa)) +void MK(mul_conj_second, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] = a[i] * conj (b[i]); +} + +__attribute__((noinline,noipa)) +void MK(mul_conj_both, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] = conj (a[i]) * conj (b[i]); +} + + +// ----- ADD + +// Complex ADD instructions rotating the result + +__attribute__((noinline,noipa)) +void MK(add0, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] = a[i] + b[i]; +} + +__attribute__((noinline,noipa)) +void MK(add90, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] = (a[i] + b[i]) * I; +} + +__attribute__((noinline,noipa)) +void MK(add180, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] = (a[i] + b[i]) * I * I; +} + +__attribute__((noinline,noipa)) +void MK(add270, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] = (a[i] + b[i]) * I * I * I; +} + +// Complex ADD instructions rotating the second parameter. + +__attribute__((noinline,noipa)) +void MK(add0_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] = a[i] + b[i]; +} + +__attribute__((noinline,noipa)) +void MK(add90_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] = a[i] + (b[i] * I); +} + +__attribute__((noinline,noipa)) +void MK(add180_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] = a[i] + (b[i] * I * I); +} + +__attribute__((noinline,noipa)) +void MK(add270_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] = a[i] + (b[i] * I * I * I); +} + +// Complex ADD instructions with conjucated values. + +__attribute__((noinline,noipa)) +void MK(add_conj_first, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] = conj (a[i]) + b[i]; +} + +__attribute__((noinline,noipa)) +void MK(add_conj_second, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] = a[i] + conj (b[i]); +} + +__attribute__((noinline,noipa)) +void MK(add_conj_both, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N]) +{ + for (int i=0; i < N; i++) + c[i] = conj (a[i]) + conj (b[i]); +} + + diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-double.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-double.c new file mode 100644 index 0000000000000000000000000000000000000000..b5c252b176c7c21c9484574edc9a56d9d142e13c --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-double.c @@ -0,0 +1,8 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_complex_add_double } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ + +#define TYPE double +#define N 16 +#include "complex-add-template.c" \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-float.c new file mode 100644 index 0000000000000000000000000000000000000000..1a08e00bcede874d6acac9e2ebece5851c583530 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-float.c @@ -0,0 +1,8 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_complex_add_float } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ + +#define TYPE float +#define N 16 +#include "complex-add-template.c" \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-half-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-half-float.c new file mode 100644 index 0000000000000000000000000000000000000000..e4d5c55c0a88f4ac8d45262ee13632443318931f --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-half-float.c @@ -0,0 +1,8 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_complex_add_half } */ +/* { dg-add-options arm_v8_3a_fp16_complex_neon } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ + +#define TYPE _Float16 +#define N 16 +#include "complex-add-template.c" diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-double.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-double.c new file mode 100644 index 0000000000000000000000000000000000000000..6dd3f98a7a52b21f0365cd6c4394b20927a6a320 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-double.c @@ -0,0 +1,8 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_complex_add_double } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ + +#define TYPE double +#define N 16 +#include "complex-add-pattern-template.c" \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-float.c new file mode 100644 index 0000000000000000000000000000000000000000..3d02cd455340e9510ae536d8d109b39f811743f0 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-float.c @@ -0,0 +1,8 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_complex_add_float } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ + +#define TYPE float +#define N 16 +#include "complex-add-pattern-template.c" \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-half-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-half-float.c new file mode 100644 index 0000000000000000000000000000000000000000..51dcd2724f51cb2d91f0aa234abc39f92275aa42 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-half-float.c @@ -0,0 +1,8 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_complex_add_half } */ +/* { dg-add-options arm_v8_3a_fp16_complex_neon } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ + +#define TYPE _Float16 +#define N 16 +#include "complex-add-pattern-template.c" diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-double.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-double.c new file mode 100644 index 0000000000000000000000000000000000000000..606b8992b4890e4e2213157761bfac62f72aa40e --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-double.c @@ -0,0 +1,8 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_complex_add_double } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ + +#define TYPE double +#define N 200 +#include "complex-add-template.c" \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-float.c new file mode 100644 index 0000000000000000000000000000000000000000..5c640f0b14107b7cb8ad1535975d266e00b1d1b2 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-float.c @@ -0,0 +1,8 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_complex_add_float } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ + +#define TYPE float +#define N 200 +#include "complex-add-template.c" \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-half-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-half-float.c new file mode 100644 index 0000000000000000000000000000000000000000..6111356cbd4a9c86a9356bf67470512db44cfed2 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-half-float.c @@ -0,0 +1,8 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_complex_add_half } */ +/* { dg-add-options arm_v8_3a_fp16_complex_neon } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ + +#define TYPE _Float16 +#define N 200 +#include "complex-add-template.c" diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-pattern-double.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-pattern-double.c new file mode 100644 index 0000000000000000000000000000000000000000..00f383d8cfddd1176cf4894ac7fd4d0ae9bcb297 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-pattern-double.c @@ -0,0 +1,8 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_complex_add_double } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ + +#define TYPE double +#define N 200 +#include "complex-add-pattern-template.c" \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-pattern-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-pattern-float.c new file mode 100644 index 0000000000000000000000000000000000000000..ed108b14a3b704819a3c425b4d19d1103aeb432d --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-pattern-float.c @@ -0,0 +1,8 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_complex_add_float } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ + +#define TYPE float +#define N 200 +#include "complex-add-pattern-template.c" \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-pattern-half-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-pattern-half-float.c new file mode 100644 index 0000000000000000000000000000000000000000..aa239445a6563ea0ee15751a7f6a989fb1c9d9a7 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-pattern-half-float.c @@ -0,0 +1,8 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_complex_add_half } */ +/* { dg-add-options arm_v8_3a_fp16_complex_neon } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ + +#define TYPE _Float16 +#define N 200 +#include "complex-add-pattern-template.c" diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-byte.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-byte.c new file mode 100644 index 0000000000000000000000000000000000000000..4001f689671e0973b64665e6b9ea96c755277fae --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-byte.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_complex_add_byte } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ + +#define TYPE int8_t +#define N 200 +#include +#include "complex-add-template.c" \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-int.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-int.c new file mode 100644 index 0000000000000000000000000000000000000000..1f006556af09027f22cefe129475bd7e977054a0 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-int.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_complex_add_int } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ + +#define TYPE int32_t +#define N 200 +#include +#include "complex-add-template.c" \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-long.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-long.c new file mode 100644 index 0000000000000000000000000000000000000000..1e82657abf8316228e13651d111b7d256d0f266f --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-long.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_complex_add_long } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ + +#define TYPE int64_t +#define N 200 +#include +#include "complex-add-template.c" \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-byte.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-byte.c new file mode 100644 index 0000000000000000000000000000000000000000..db72e147c9dc4511fb46a036679b7ba77b97ffe3 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-byte.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_complex_add_byte } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ + +#define TYPE int8_t +#define N 200 +#include +#include "complex-add-pattern-template.c" \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-int.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-int.c new file mode 100644 index 0000000000000000000000000000000000000000..8d350d69ae0eefba073aba8ae7b3da4b39c845df --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-int.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_complex_add_int } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ + +#define TYPE int32_t +#define N 200 +#include +#include "complex-add-pattern-template.c" \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-long.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-long.c new file mode 100644 index 0000000000000000000000000000000000000000..c8e56cd4f91bc6254a5fb2177b1f2484859bcf98 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-long.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_complex_add_long } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ + +#define TYPE int64_t +#define N 200 +#include +#include "complex-add-pattern-template.c" \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-short.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-short.c new file mode 100644 index 0000000000000000000000000000000000000000..2c54d756c9b2f54352d6dba97ccf05d37865cbaa --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-short.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_complex_add_short } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ + +#define TYPE int16_t +#define N 200 +#include +#include "complex-add-pattern-template.c" \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-byte.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-byte.c new file mode 100644 index 0000000000000000000000000000000000000000..f54b903aa308a5dc68654b9ffd0a0c230f58e4cc --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-byte.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_complex_add_byte } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ + +#define TYPE uint8_t +#define N 200 +#include +#include "complex-add-pattern-template.c" \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-int.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-int.c new file mode 100644 index 0000000000000000000000000000000000000000..96824f16b821236f5499dcb90454e72a1326df5c --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-int.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_complex_add_int } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ + +#define TYPE uint32_t +#define N 200 +#include +#include "complex-add-pattern-template.c" \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-long.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-long.c new file mode 100644 index 0000000000000000000000000000000000000000..8bd9f077b233eaf6e0c4ff4df9b97c109df7d002 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-long.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_complex_add_long } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ + +#define TYPE uint64_t +#define N 200 +#include +#include "complex-add-pattern-template.c" \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-short.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-short.c new file mode 100644 index 0000000000000000000000000000000000000000..7e5154d73703512dceda39e37f0ebd0eb7c2e057 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-short.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_complex_add_short } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ + +#define TYPE uint16_t +#define N 200 +#include +#include "complex-add-pattern-template.c" \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-short.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-short.c new file mode 100644 index 0000000000000000000000000000000000000000..ca0d618b991255f3ba34ee40fb876fd053e8121b --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-short.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_complex_add_short } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ + +#define TYPE int16_t +#define N 200 +#include +#include "complex-add-template.c" \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-unsigned-byte.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-unsigned-byte.c new file mode 100644 index 0000000000000000000000000000000000000000..925cfc2ea27b0d4ffbdadfb86abc5c198f57469d --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-unsigned-byte.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_complex_add_byte } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ + +#define TYPE uint8_t +#define N 200 +#include +#include "complex-add-template.c" \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-unsigned-int.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-unsigned-int.c new file mode 100644 index 0000000000000000000000000000000000000000..6a70c6ebf0586c11a17cb1ad2cadd0d5927c2aca --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-unsigned-int.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_complex_add_int } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ + +#define TYPE uint32_t +#define N 200 +#include +#include "complex-add-template.c" \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-unsigned-long.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-unsigned-long.c new file mode 100644 index 0000000000000000000000000000000000000000..084080aeb4386bf41b0e23d0c684917b2b0435d1 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-unsigned-long.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_complex_add_long } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ + +#define TYPE uint64_t +#define N 200 +#include +#include "complex-add-template.c" \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-unsigned-short.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-unsigned-short.c new file mode 100644 index 0000000000000000000000000000000000000000..1379608a60310fd26b18e3db2b6294c28bf5bf2e --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-unsigned-short.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_complex_add_short } */ +/* { dg-add-options arm_v8_3a_complex_neon } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ + +#define TYPE uint16_t +#define N 200 +#include +#include "complex-add-template.c" \ No newline at end of file diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 22acda2a74fdfa51aebbc311d5cc84763b0ffc63..baa5e4a569263edda2125bd8aca6f5b19bbad783 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -3355,7 +3355,102 @@ proc check_effective_target_vect_int { } { }}] } -# Return 1 if the target supports signed int->float conversion +# Return 1 if the target supports hardware vectorization of complex additions of +# byte, 0 otherwise. +# +# This won't change for different subtargets so cache the result. + +proc check_effective_target_vect_complex_add_byte { } { + return [check_cached_effective_target_indexed vect_complex_add_byte { + expr { + [check_effective_target_aarch64_sve2] + || [check_effective_target_arm_v8_1m_mve_fp_ok] + }}] +} + +# Return 1 if the target supports hardware vectorization of complex additions of +# short, 0 otherwise. +# +# This won't change for different subtargets so cache the result. + +proc check_effective_target_vect_complex_add_short { } { + return [check_cached_effective_target_indexed vect_complex_add_short { + expr { + [check_effective_target_aarch64_sve2] + || [check_effective_target_arm_v8_1m_mve_fp_ok] + }}] +} + +# Return 1 if the target supports hardware vectorization of complex additions of +# int, 0 otherwise. +# +# This won't change for different subtargets so cache the result. + +proc check_effective_target_vect_complex_add_int { } { + return [check_cached_effective_target_indexed vect_complex_add_int { + expr { + [check_effective_target_aarch64_sve2] + || [check_effective_target_arm_v8_1m_mve_fp_ok] + }}] +} + +# Return 1 if the target supports hardware vectorization of complex additions of +# long, 0 otherwise. +# +# This won't change for different subtargets so cache the result. + +proc check_effective_target_vect_complex_add_long { } { + return [check_cached_effective_target_indexed vect_complex_add_long { + expr { + [check_effective_target_aarch64_sve2] + || [check_effective_target_arm_v8_1m_mve_fp_ok] + }}] +} + +# Return 1 if the target supports hardware vectorization of complex additions of +# half, 0 otherwise. +# +# This won't change for different subtargets so cache the result. + +proc check_effective_target_vect_complex_add_half { } { + return [check_cached_effective_target_indexed vect_complex_add_half { + expr { + [check_effective_target_arm_v8_3a_complex_neon_ok + && check_effective_target_arm_v8_2a_fp16_neon_ok] + || [check_effective_target_aarch64_sve2] + || [check_effective_target_arm_v8_1m_mve_fp_ok] + }}] +} + +# Return 1 if the target supports hardware vectorization of complex additions of +# float, 0 otherwise. +# +# This won't change for different subtargets so cache the result. + +proc check_effective_target_vect_complex_add_float { } { + return [check_cached_effective_target_indexed vect_complex_add_float { + expr { + [check_effective_target_arm_v8_3a_complex_neon_ok] + || [check_effective_target_aarch64_sve2] + || [check_effective_target_arm_v8_1m_mve_fp_ok] + }}] +} + +# Return 1 if the target supports hardware vectorization of complex additions of +# double, 0 otherwise. +# +# This won't change for different subtargets so cache the result. + +proc check_effective_target_vect_complex_add_double { } { + return [check_cached_effective_target_indexed vect_complex_add_double { + expr { + [check_effective_target_arm_v8_3a_complex_neon_ok] + || [check_effective_target_aarch64_sve2] + || [check_effective_target_arm_v8_1m_mve_fp_ok] + }}] +} + +# Return 1 if the target supports signed int->float conversion # proc check_effective_target_vect_intfloat_cvt { } { @@ -10367,7 +10462,7 @@ proc check_effective_target_arm_v8_3a_complex_neon_ok_nocache { } { set et_arm_v8_3a_complex_neon_flags "" if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } { - return 0; + return 1; } # Iterate through sets of options to find the compiler flags that @@ -10380,11 +10475,11 @@ proc check_effective_target_arm_v8_3a_complex_neon_ok_nocache { } { #endif } "$flags -march=armv8.3-a"] } { set et_arm_v8_3a_complex_neon_flags "$flags -march=armv8.3-a" - return 1 + return 0; } } - return 0; + return 1; } proc check_effective_target_arm_v8_3a_complex_neon_ok { } { @@ -10400,13 +10495,57 @@ proc add_options_for_arm_v8_3a_complex_neon { flags } { return "$flags $et_arm_v8_3a_complex_neon_flags" } +# Return 1 if the target supports ARMv8.3 Adv.SIMD + FP16 Complex instructions +# instructions, 0 otherwise. The test is valid for ARM and for AArch64. +# Record the command line options needed. + +proc check_effective_target_arm_v8_3a_fp16_complex_neon_ok_nocache { } { + global et_arm_v8_3a_fp16_complex_neon_flags + set et_arm_v8_3a_fp16_complex_neon_flags "" + + if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } { + return 1; + } + + # Iterate through sets of options to find the compiler flags that + # need to be added to the -march option. + foreach flags {"" "-mfloat-abi=softfp -mfpu=auto" "-mfloat-abi=hard -mfpu=auto"} { + if { [check_no_compiler_messages_nocache \ + arm_v8_3a_fp16_complex_neon_ok object { + #if !defined (__ARM_FEATURE_COMPLEX) + #error "__ARM_FEATURE_COMPLEX not defined" + #endif + } "$flags -march=armv8.3-a+fp16"] } { + set et_arm_v8_3a_fp16_complex_neon_flags \ + "$flags -march=armv8.3-a+fp16" + return 0; + } + } + + return 1; +} + +proc check_effective_target_arm_v8_3a_fp16_complex_neon_ok { } { + return [check_cached_effective_target arm_v8_3a_fp16_complex_neon_ok \ + check_effective_target_arm_v8_3a_fp16_complex_neon_ok_nocache] +} + +proc add_options_for_arm_v8_3a_fp16_complex_neon { flags } { + if { ! [check_effective_target_arm_v8_3a_fp16_complex_neon_ok] } { + return "$flags" + } + global et_arm_v8_3a_fp16_complex_neon_flags + return "$flags $et_arm_v8_3a_fp16_complex_neon_flags" +} + + # Return 1 if the target supports executing AdvSIMD instructions from ARMv8.3 # with the complex instruction extension, 0 otherwise. The test is valid for # ARM and for AArch64. proc check_effective_target_arm_v8_3a_complex_neon_hw { } { if { ![check_effective_target_arm_v8_3a_complex_neon_ok] } { - return 0; + return 1; } return [check_runtime arm_v8_3a_complex_neon_hw_available { #include "arm_neon.h" @@ -10431,7 +10570,7 @@ proc check_effective_target_arm_v8_3a_complex_neon_hw { } { : /* No clobbers. */); #endif - return (results[0] == 8 && results[1] == 24) ? 1 : 0; + return (results[0] == 8 && results[1] == 24) ? 0 : 1; } } [add_options_for_arm_v8_3a_complex_neon ""]] } diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c new file mode 100644 index 0000000000000000000000000000000000000000..aeb402289277c4bb48b62b7e9e074850a99d3182 --- /dev/null +++ b/gcc/tree-vect-slp-patterns.c @@ -0,0 +1,739 @@ +/* SLP - Pattern matcher on SLP trees + Copyright (C) 2020 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +. */ + +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "backend.h" +#include "target.h" +#include "rtl.h" +#include "tree.h" +#include "gimple.h" +#include "tree-pass.h" +#include "ssa.h" +#include "optabs-tree.h" +#include "insn-config.h" +#include "recog.h" /* FIXME: for insn_data */ +#include "fold-const.h" +#include "stor-layout.h" +#include "gimple-iterator.h" +#include "cfgloop.h" +#include "tree-vectorizer.h" +#include "langhooks.h" +#include "gimple-walk.h" +#include "dbgcnt.h" +#include "tree-vector-builder.h" +#include "vec-perm-indices.h" +#include "gimple-fold.h" +#include "internal-fn.h" + +/* SLP Pattern matching mechanism. + + This extension to the SLP vectorizer allows one to transform the generated SLP + tree based on any pattern. The difference between this and the normal vect + pattern matcher is that unlike the former, this matcher allows you to match + with instructions that do not belong to the same SSA dominator graph. + + The only requirement that this pattern matcher has is that you are only + only allowed to either match an entire group or none. + + The pattern matcher currently only allows you to perform replacements to + internal functions. + + Once the patterns are matched it is one way, these cannot be undone. It is + currently not supported to match patterns recursively. + + To add a new pattern, implement the vect_pattern class and add the type to + slp_patterns. + +*/ + +/******************************************************************************* + * vect_pattern class + ******************************************************************************/ + +/* Default implementation of recognize that peforms matching, validation and + replacement of nodes but that can be overriden if required. */ + +static bool +vect_pattern_validate_optab (internal_fn ifn, slp_tree node) +{ + tree vectype = SLP_TREE_VECTYPE (node); + if (ifn == IFN_LAST || !vectype) + return false; + + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "Found %s pattern in SLP tree\n", + internal_fn_name (ifn)); + + if (direct_internal_fn_supported_p (ifn, vectype, OPTIMIZE_FOR_SPEED)) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "Target supports %s vectorization with mode %T\n", + internal_fn_name (ifn), vectype); + } + else + { + if (dump_enabled_p ()) + { + if (!vectype) + dump_printf_loc (MSG_NOTE, vect_location, + "Target does not support vector type for %T\n", + SLP_TREE_DEF_TYPE (node)); + else + dump_printf_loc (MSG_NOTE, vect_location, + "Target does not support %s for vector type " + "%T\n", internal_fn_name (ifn), vectype); + } + return false; + } + return true; +} + +/******************************************************************************* + * General helper types + ******************************************************************************/ + +/* The COMPLEX_OPERATION enum denotes the possible pair of operations that can + be matched when looking for expressions that we are interested matching for + complex numbers addition and mla. */ + +typedef enum _complex_operation : unsigned { + PLUS_PLUS, + MINUS_PLUS, + PLUS_MINUS, + MULT_MULT, + CMPLX_NONE +} complex_operation_t; + +/******************************************************************************* + * General helper functions + ******************************************************************************/ + +/* Helper function of linear_loads_p that checks to see if the load permutation + is sequential and in monotonically increasing order of loads with no gaps. +*/ + +static inline bool +is_linear_load_p (load_permutation_t loads) +{ + if (loads.length() == 0) + return false; + + unsigned leader = loads[0]; + unsigned load, i; + FOR_EACH_VEC_ELT_FROM (loads, i, load, 1) + if (load != ++leader) + return false; + return true; +} + + +/* Check to see if all loads rooted in ROOT are linear. Linearity is + defined as having no gaps between values loaded. */ + +static load_permutation_t +linear_loads_p (slp_tree_to_load_perm_map_t *perm_cache, slp_tree root, + bool *linear) +{ + *linear = false; + if (!root) + return vNULL; + + unsigned i; + load_permutation_t loads = vNULL; + load_permutation_t *tmp; + + if ((tmp = perm_cache->get (root)) != NULL) + { + *linear = is_linear_load_p (*tmp); + return *tmp; + } + + perm_cache->put (root, vNULL); + + /* If it's a load node, then just read the load permute. */ + if (SLP_TREE_LOAD_PERMUTATION (root).exists ()) + { + loads = SLP_TREE_LOAD_PERMUTATION (root); + perm_cache->put (root, loads); + if (!is_linear_load_p (loads)) + return loads; + } + else if (SLP_TREE_DEF_TYPE (root) == vect_external_def) + { + loads.create (SLP_TREE_LANES (root)); + tree op; + FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_OPS (root), i, op) + { + if (TREE_CODE (op) != SSA_NAME) + return vNULL; + + gimple *defstmt = SSA_NAME_DEF_STMT (op); + if (!is_gimple_assign (defstmt)) + return vNULL; + + switch (gimple_assign_rhs_code (defstmt)) + { + case IMAGPART_EXPR: + loads.safe_push (1); + break; + case REALPART_EXPR: + loads.safe_push (0); + break; + default: + { + loads.release (); + return vNULL; + } + } + } + + perm_cache->put (root, loads); + if (!is_linear_load_p (loads)) + return loads; + } + else if (SLP_TREE_DEF_TYPE (root) != vect_internal_def) + return vNULL; + + auto_vec all_loads; + bool is_perm = SLP_TREE_LANE_PERMUTATION (root).exists (); + + slp_tree child; + FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (root), i, child) + { + loads = linear_loads_p (perm_cache, child, linear); + if ((!*linear && !is_perm) || !loads.exists ()) + return loads; + + all_loads.safe_push (loads); + } + + if (is_perm) + { + lane_permutation_t perm = SLP_TREE_LANE_PERMUTATION (root); + load_permutation_t nloads; + nloads.create (SLP_TREE_LANES (root)); + nloads.quick_grow (SLP_TREE_LANES (root)); + for (i = 0; i < SLP_TREE_LANES (root); i++) + nloads[i] = all_loads[perm[i].first][perm[i].second]; + + perm_cache->put (root, nloads); + if (!is_linear_load_p (nloads)) + return nloads; + loads = nloads; + } + + perm_cache->put (root, loads); + *linear = true; + return loads; +} + + +/* This function attempts to make a node rooted in NODE with parent PARENT + linear. If the node if already linear than the node itself is returned + in RESULT. + + If the node is not linear then a new VEC_PERM_EXPR node is created with a + lane permute that when applied will make the node linear. If such a + permute cannot be created then FALSE is returned from the function. + + Here linearity is defined as having a sequential, monotically increasing + load position inside the load permute generated by the loads reachable from + NODE. */ + +static bool +vect_slp_make_linear (slp_tree_to_load_perm_map_t *perm_cache, + slp_tree parent, slp_tree node, slp_tree *result) +{ + bool is_linear = false; + unsigned x, val; + load_permutation_t load_perm = linear_loads_p (perm_cache, node, &is_linear); + if (is_linear) + { + *result = node; + SLP_TREE_REF_COUNT (node)++; + return true; + } + + /* Attempt to linearise the permute. */ + vec > zipped; + zipped.create (load_perm.length ()); + FOR_EACH_VEC_ELT (load_perm, x, val) + zipped.quick_push (std::make_pair (val, x)); + + typedef const std::pair* cmp_t; + zipped.qsort ([](const void *a, const void *b) -> int + { return (int)((cmp_t)a)->first - (int)((cmp_t)b)->first; }); + + /* Verify if we have a linear permute sequence. */ + if (zipped.length () > 0) + { + unsigned leader = zipped[0].first; + for (x = 1; x < zipped.length (); x++) + if(!(is_linear = (zipped[x].first == ++leader))) + break; + } + + if (!is_linear) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "Loads could not be made linear %p\n", + node); + zipped.release (); + return false; + } + + for (x = 0; x < zipped.length (); x++) + zipped[x].first = 0; + + /* Create the new permute node and store it instead. */ + slp_tree vnode = vect_create_new_slp_node (vNULL, 1); + SLP_TREE_CODE (vnode) = VEC_PERM_EXPR; + SLP_TREE_LANE_PERMUTATION (vnode) = zipped; + SLP_TREE_VECTYPE (vnode) = SLP_TREE_VECTYPE (parent); + SLP_TREE_CHILDREN (vnode).quick_push (node); + SLP_TREE_REF_COUNT (vnode) = 1; + SLP_TREE_LANES (vnode) = SLP_TREE_LANES (node); + SLP_TREE_REPRESENTATIVE (vnode) = SLP_TREE_REPRESENTATIVE (parent); + SLP_TREE_REF_COUNT (node)++; + *result = vnode; + return is_linear; +} + +/* Checks to see of the expression represented by NODE is a gimple assign with + code CODE. */ + +static inline bool +vect_match_expression_p (slp_tree node, tree_code code) +{ + if (!node + || !SLP_TREE_REPRESENTATIVE (node)) + return false; + + gimple* expr = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (node)); + if (!is_gimple_assign (expr) + || gimple_assign_rhs_code (expr) != code) + return false; + + return true; +} + +/* Check if the given lane permute in PERMUTES matches an alternating sequence + of {P0 P1 P0 P1 ...}. This to account for unrolled loops. Further mode + there resulting permute must be linear. */ + +static inline bool +vect_check_lane_permute (lane_permutation_t &permutes, + unsigned p0, unsigned p1) +{ + if (permutes.length () == 0) + return false; + + unsigned val[2] = {p0, p1}; + unsigned seed = permutes[0].second; + for (unsigned i = 0; i < permutes.length (); i++) + if (permutes[i].first != val[i % 2] + || permutes[i].second != seed++) + return false; + + return true; +} + +/* This function will match the two gimple expressions representing NODE1 and + NODE2 in parallel and returns the pair operation that represents the two + expressions in the two statements. + + If match is successful then the corresponding complex_operation is + returned and the arguments to the two matched operations are returned in OPS. + + If TWO_OPERANDS it is expected that the LANES of the parent VEC_PERM select + from the two nodes alternatingly. + + If unsuccessful then CMPLX_NONE is returned and OPS is untouched. + + e.g. the following gimple statements + + stmt 0 _39 = _37 + _12; + stmt 1 _6 = _38 - _36; + + will return PLUS_MINUS along with OPS containing {_37, _12, _38, _36}. +*/ + +static complex_operation_t +vect_detect_pair_op (slp_tree node1, slp_tree node2, lane_permutation_t &lanes, + bool two_operands = true, vec *ops = NULL) +{ + complex_operation_t result = CMPLX_NONE; + + if (vect_match_expression_p (node1, MINUS_EXPR) + && vect_match_expression_p (node2, PLUS_EXPR) + && (!two_operands || vect_check_lane_permute (lanes, 0, 1))) + result = MINUS_PLUS; + else if (vect_match_expression_p (node1, PLUS_EXPR) + && vect_match_expression_p (node2, MINUS_EXPR) + && (!two_operands || vect_check_lane_permute (lanes, 0, 1))) + result = PLUS_MINUS; + else if (vect_match_expression_p (node1, PLUS_EXPR) + && vect_match_expression_p (node2, PLUS_EXPR)) + result = PLUS_PLUS; + else if (vect_match_expression_p (node1, MULT_EXPR) + && vect_match_expression_p (node2, MULT_EXPR)) + result = MULT_MULT; + + if (result != CMPLX_NONE && ops != NULL) + { + ops->create (2); + ops->quick_push (node1); + ops->quick_push (node2); + } + return result; +} + +/* Overload of vect_detect_pair_op that matches against the representative + statements in the children of NODE. It is expected that NODE has exactly + two children and when TWO_OPERANDS then NODE must be a VEC_PERM. */ + +static complex_operation_t +vect_detect_pair_op (slp_tree node, bool two_operands = true, + vec *ops = NULL) +{ + if (!two_operands && SLP_TREE_CODE (node) == VEC_PERM_EXPR) + return CMPLX_NONE; + + if (SLP_TREE_CHILDREN (node).length () != 2) + return CMPLX_NONE; + + vec children = SLP_TREE_CHILDREN (node); + lane_permutation_t &lanes = SLP_TREE_LANE_PERMUTATION (node); + + return vect_detect_pair_op (children[0], children[1], lanes, two_operands, + ops); +} + +/******************************************************************************* + * complex_pattern class + ******************************************************************************/ + +/* SLP Complex Numbers pattern matching. + + As an example, the following simple loop: + + double a[restrict N]; double b[restrict N]; double c[restrict N]; + + for (int i=0; i < N; i+=2) + { + c[i] = a[i] - b[i+1]; + c[i+1] = a[i+1] + b[i]; + } + + which represents a complex addition on with a rotation of 90* around the + argand plane. i.e. if `a` and `b` were complex numbers then this would be the + same as `a + (b * I)`. + + Here the expressions for `c[i]` and `c[i+1]` are independent but have to be + both recognized in order for the pattern to work. As an SLP tree this is + represented as + + +--------------------------------+ + | stmt 0 *_9 = _10; | + | stmt 1 *_15 = _16; | + +--------------------------------+ + | + | + v + +--------------------------------+ + | stmt 0 _10 = _4 - _8; | + | stmt 1 _16 = _12 + _14; | + | lane permutation { 0[0] 1[1] } | + +--------------------------------+ + | | + | | + | | + +-----+ | | +-----+ + | | | | | | + +-----| { } |<-----+ +----->| { } --------+ + | | | +------------------| | | + | +-----+ | +-----+ | + | | | | + | | | | + | +------|------------------+ | + | | | | + v v v v + +--------------------------+ +--------------------------------+ + | stmt 0 _8 = *_7; | | stmt 0 _4 = *_3; | + | stmt 1 _14 = *_13; | | stmt 1 _12 = *_11; | + | load permutation { 1 0 } | | load permutation { 0 1 } | + +--------------------------+ +--------------------------------+ + + The pattern matcher allows you to replace both statements 0 and 1 or none at + all. Because this operation is a two operands operation the actual nodes + being replaced are those in the { } nodes. The actual scalar statements + themselves are not replaced or used during the matching but instead the + SLP_TREE_REPRESENTATIVE statements are inspected. You are also allowed to + replace and match on any number of nodes. + + Because the pattern matcher matches on the representative statement for the + SLP node the case of two_operators it allows you to match the children of the + node. This is done using the method `recognize ()`. + +*/ + +/* The complex_pattern class contains common code for pattern matchers that work + on complex numbers. These provide functionality to allow de-construction and + validation of sequences depicting/transforming REAL and IMAG pairs. */ + +class complex_pattern : public vect_pattern +{ + protected: + auto_vec m_workset; + complex_pattern (slp_tree *node, vec *m_ops, internal_fn ifn) + : vect_pattern (node, m_ops, ifn) + { + this->m_workset.safe_push (*node); + } + + public: + void build (slp_tree_to_load_perm_map_t *, vec_info *); + + static internal_fn + matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, + vec *); +}; + +/* Create a replacement pattern statement for each node in m_node and inserts + the new statement into m_node as the new representative statement. The old + statement is marked as being in a pattern defined by the new statement. The + statement is created as call to internal function IFN with m_num_args + arguments. + + Futhermore the new pattern is also added to the vectorization information + structure VINFO and the old statement STMT_INFO is marked as unused while + the new statement is marked as used and the number of SLP uses of the new + statement is incremented. + + The newly created SLP nodes are marked as SLP only and will be dissolved + if SLP is aborted. + + The newly created gimple call is returned and the BB remains unchanged. + + This default method is designed to only match against simple operands where + all the input and output types are the same. +*/ + +void +complex_pattern::build (slp_tree_to_load_perm_map_t *perm_cache, + vec_info *vinfo) +{ + stmt_vec_info stmt_info; + + auto_vec args; + args.create (this->m_num_args); + args.quick_grow_cleared (this->m_num_args); + slp_tree node; + unsigned ix; + stmt_vec_info call_stmt_info; + gcall *call_stmt = NULL; + auto_vec nodes; + slp_tree tmp = NULL; + node = this->m_ops[0]; + + /* First re-arrange the children. */ + + FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), ix, tmp) + { + slp_tree vnode = NULL; + if (vect_slp_make_linear (perm_cache, node, tmp, &vnode)) + nodes.safe_push (vnode); + else + { + FOR_EACH_VEC_ELT (nodes, ix, tmp) + vect_free_slp_tree (tmp); + + return; + } + } + + FOR_EACH_VEC_ELT (this->m_ops, ix, node) + vect_free_slp_tree (node); + + SLP_TREE_CHILDREN (*this->m_node).truncate (0); + SLP_TREE_CHILDREN (*this->m_node).safe_splice (nodes); + + /* Now modify the nodes themselves. */ + FOR_EACH_VEC_ELT (this->m_workset, ix, node) + { + /* Calculate the location of the statement in NODE to replace. */ + stmt_info = SLP_TREE_REPRESENTATIVE (node); + gimple* old_stmt = STMT_VINFO_STMT (stmt_info); + tree lhs_old_stmt = gimple_get_lhs (old_stmt); + tree type = TREE_TYPE (lhs_old_stmt); + + /* Create the argument set for use by gimple_build_call_internal_vec. */ + for (unsigned i = 0; i < this->m_num_args; i++) + args[i] = lhs_old_stmt; + + /* Create the new pattern statements. */ + call_stmt = gimple_build_call_internal_vec (this->m_ifn, args); + tree var = make_temp_ssa_name (type, call_stmt, "slp_patt"); + gimple_call_set_lhs (call_stmt, var); + gimple_set_location (call_stmt, gimple_location (old_stmt)); + gimple_call_set_nothrow (call_stmt, true); + + /* Adjust the book-keeping for the new and old statements for use during + SLP. This is required to get the right VF and statement during SLP + analysis. These changes are created after relevancy has been set for + the nodes as such we need to manually update them. Any changes will be + undone if SLP is cancelled. */ + call_stmt_info + = vinfo->add_pattern_stmt (call_stmt, stmt_info); + STMT_VINFO_RELEVANT (call_stmt_info) = vect_used_in_scope; + + /* Unfortunately still need this on the new pattern because non-loop SLP + doesn't call vect_detect_hybrid_slp so it never updates it. */ + STMT_SLP_TYPE (call_stmt_info) = pure_slp; + + /* add_pattern_stmt can't be done in vect_mark_pattern_stmts because + the non-SLP pattern matchers already have added the statement to VINFO + by the time it is called. Some of them need to modify the returned + stmt_info. vect_mark_pattern_stmts is called by recog_pattern and it + would increase the size of each pattern with boilerplate code to make + the call there. */ + vect_mark_pattern_stmts (vinfo, stmt_info, call_stmt, + SLP_TREE_VECTYPE (node)); + + /* Since we are replacing all the statements in the group with the same + thing it doesn't really matter. So just set it every time a new stmt + is created. */ + SLP_TREE_REPRESENTATIVE (node) = call_stmt_info; + SLP_TREE_CODE (node) = CALL_EXPR; + } +} + +/******************************************************************************* + * complex_add_pattern class + ******************************************************************************/ + +class complex_add_pattern : public complex_pattern +{ + protected: + complex_add_pattern (slp_tree *node, vec *m_ops, internal_fn ifn) + : complex_pattern (node, m_ops, ifn) + { + this->m_num_args = 2; + } + + public: + static internal_fn + matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, + vec *); + + static vect_pattern* + recognize (slp_tree_to_load_perm_map_t *, slp_tree *); +}; + +/* Pattern matcher for trying to match complex addition pattern in SLP tree. + + If no match is found then IFN is set to IFN_LAST. + This function matches the patterns shaped as: + + c[i] = a[i] - b[i+1]; + c[i+1] = a[i+1] + b[i]; + + If a match occurred then TRUE is returned, else FALSE. The initial match is + expected to be in OP1 and the initial match operands in args0. */ + +internal_fn +complex_add_pattern::matches (complex_operation_t op, + slp_tree_to_load_perm_map_t *perm_cache, + vec *ops) +{ + internal_fn ifn = IFN_LAST; + + /* Find the two components. Rotation in the complex plane will modify + the operations: + + * Rotation 0: + + + * Rotation 90: - + + * Rotation 180: - - + * Rotation 270: + - + + Rotation 0 and 180 can be handled by normal SIMD code, so we don't need + to care about them here. */ + if (op == MINUS_PLUS) + ifn = IFN_COMPLEX_ADD_ROT90; + else if (op == PLUS_MINUS) + ifn = IFN_COMPLEX_ADD_ROT270; + else + return ifn; + + /* verify that there is a permute, otherwise this isn't a pattern we + we support. */ + bool is_linear = false; + gcc_assert (ops->length () == 2); + + vec children = SLP_TREE_CHILDREN ((*ops)[0]); + + /* First node must be unpermuted. */ + linear_loads_p (perm_cache, children[0], &is_linear); + if (!is_linear) + return IFN_LAST; + + /* Second node must be permuted. */ + if (linear_loads_p (perm_cache, children[1], &is_linear).length () > 0 + && is_linear) + return IFN_LAST; + + return ifn; +} + +vect_pattern* +complex_add_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache, + slp_tree *node) +{ + auto_vec ops; + complex_operation_t op + = vect_detect_pair_op (*node, true, &ops); + internal_fn ifn = complex_add_pattern::matches (op, perm_cache, &ops); + if (!vect_pattern_validate_optab (ifn, *node)) + return NULL; + + return new complex_add_pattern (node, &ops, ifn); +} + +/******************************************************************************* + * Pattern matching definitions + ******************************************************************************/ + +#define SLP_PATTERN(x) &x::recognize +vect_pattern_decl_t slp_patterns[] +{ + /* For least amount of back-tracking and more efficient matching + order patterns from the largest to the smallest. Especially if they + overlap in what they can detect. */ + + SLP_PATTERN (complex_add_pattern), +}; +#undef SLP_PATTERN + +/* Set the number of SLP pattern matchers available. */ +size_t num__slp_patterns = sizeof(slp_patterns)/sizeof(vect_pattern_decl_t); diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c index d19874f175703a96b1c1110874067fdbec48c068..7f5fbdbd4969036b5db1cb698da970304c87b03b 100644 --- a/gcc/tree-vect-slp.c +++ b/gcc/tree-vect-slp.c @@ -105,7 +105,7 @@ _slp_tree::~_slp_tree () /* Recursively free the memory allocated for the SLP tree rooted at NODE. */ -static void +void vect_free_slp_tree (slp_tree node) { int i; @@ -148,7 +148,7 @@ vect_free_slp_instance (slp_instance instance) /* Create an SLP node for SCALAR_STMTS. */ -slp_tree +static slp_tree vect_create_new_slp_node (slp_tree node, vec scalar_stmts, unsigned nops) { @@ -165,7 +165,7 @@ vect_create_new_slp_node (slp_tree node, /* Create an SLP node for SCALAR_STMTS. */ -static slp_tree +slp_tree vect_create_new_slp_node (vec scalar_stmts, unsigned nops) { return vect_create_new_slp_node (new _slp_tree, scalar_stmts, nops); @@ -2175,6 +2175,84 @@ calculate_unrolling_factor (poly_uint64 nunits, unsigned int group_size) return exact_div (common_multiple (nunits, group_size), group_size); } +/* Helper function of vect_match_slp_patterns. + + Attempts to match patterns against the slp tree rooted in REF_NODE using + VINFO. Patterns are matched in post-order traversal. + + If matching is successful the value in REF_NODE is updated and returned, if + not then it is returned unchanged. */ + +static bool +vect_match_slp_patterns_2 (slp_tree *ref_node, vec_info *vinfo, + slp_tree_to_load_perm_map_t *perm_cache, + hash_set *visited) +{ + unsigned i; + slp_tree node = *ref_node; + bool found_p = false; + if (!node || visited->add (node)) + return false; + + slp_tree child; + FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child) + found_p |= vect_match_slp_patterns_2 (&SLP_TREE_CHILDREN (node)[i], + vinfo, perm_cache, visited); + + for (unsigned x = 0; x < num__slp_patterns; x++) + { + vect_pattern *pattern = slp_patterns[x] (perm_cache, ref_node); + if (pattern) + { + pattern->build (perm_cache, vinfo); + delete pattern; + found_p = true; + } + } + + return found_p; +} + +/* Applies pattern matching to the given SLP tree rooted in REF_NODE using + vec_info VINFO. + + The modified tree is returned. Patterns are tried in order and multiple + patterns may match. */ + +static bool +vect_match_slp_patterns (slp_instance instance, vec_info *vinfo, + hash_set *visited, + slp_tree_to_load_perm_map_t *perm_cache, + scalar_stmts_to_slp_tree_map_t * /* bst_map */) +{ + DUMP_VECT_SCOPE ("vect_match_slp_patterns"); + slp_tree *ref_node = &SLP_INSTANCE_TREE (instance); + + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "Analyzing SLP tree %p for patterns\n", + SLP_INSTANCE_TREE (instance)); + + bool found_p + = vect_match_slp_patterns_2 (ref_node, vinfo, perm_cache, visited); + + if (found_p) + { + if (dump_enabled_p ()) + { + dump_printf_loc (MSG_NOTE, vect_location, + "Pattern matched SLP tree\n"); + vect_print_slp_graph (MSG_NOTE, vect_location, *ref_node); + } + } + + return found_p; +} + +/* Analyze an SLP instance starting from a group of grouped stores. Call + vect_build_slp_tree to build a tree of packed stmts if possible. + Return FALSE if it's impossible to SLP any stmt in the loop. */ + static bool vect_analyze_slp_instance (vec_info *vinfo, scalar_stmts_to_slp_tree_map_t *bst_map, @@ -2540,6 +2618,7 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size) { unsigned int i; stmt_vec_info first_element; + slp_instance instance; DUMP_VECT_SCOPE ("vect_analyze_slp"); @@ -2586,6 +2665,13 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size) slp_inst_kind_reduc_group, max_tree_size); } + hash_set visited_patterns; + slp_tree_to_load_perm_map_t perm_cache; + /* See if any patterns can be found in the SLP tree. */ + FOR_EACH_VEC_ELT (LOOP_VINFO_SLP_INSTANCES (vinfo), i, instance) + vect_match_slp_patterns (instance, vinfo, &visited_patterns, &perm_cache, + bst_map); + /* The map keeps a reference on SLP nodes built, release that. */ for (scalar_stmts_to_slp_tree_map_t::iterator it = bst_map->begin (); it != bst_map->end (); ++it) diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index 91e2e10761d591b99ad55467e4719219ea5c0e49..ea39f56365e6c6fcbaaeb9cde769a81a109d6af3 100644 --- a/gcc/tree-vectorizer.h +++ b/gcc/tree-vectorizer.h @@ -27,6 +27,7 @@ typedef class _stmt_vec_info *stmt_vec_info; #include "tree-hash-traits.h" #include "target.h" #include "alloc-pool.h" +#include "internal-fn.h" /* Used for naming of new temporaries. */ @@ -1994,6 +1995,7 @@ extern void duplicate_and_interleave (vec_info *, gimple_seq *, tree, extern int vect_get_place_in_interleaving_chain (stmt_vec_info, stmt_vec_info); extern bool vect_update_shared_vectype (stmt_vec_info, tree); extern slp_tree vect_create_new_slp_node (vec, unsigned); +extern void vect_free_slp_tree (slp_tree); /* In tree-vect-patterns.c. */ extern void @@ -2010,4 +2012,67 @@ void vect_free_loop_info_assumptions (class loop *); gimple *vect_loop_vectorized_call (class loop *, gcond **cond = NULL); bool vect_stmt_dominates_stmt_p (gimple *, gimple *); +/* SLP Pattern matcher types, tree-vect-slp-patterns.c. */ + +/* Forward declaration of possible two operands operation that can be matched + by the complex numbers pattern matchers. */ +enum _complex_operation : unsigned; + +/* Cache from nodes to the load permutation they represent. */ +typedef hash_map + slp_tree_to_load_perm_map_t; + +/* Vector pattern matcher base class. All SLP pattern matchers must inherit + from this type. */ + +class vect_pattern +{ + protected: + /* The number of arguments that the IFN requires. */ + unsigned m_num_args; + + /* The internal function that will be used when a pattern is created. */ + internal_fn m_ifn; + + /* The current node being inspected. */ + slp_tree *m_node; + + /* The list of operands to be the children for the node produced when the + internal function is created. */ + vec m_ops; + + /* Default constructor where NODE is the root of the tree to inspect. */ + vect_pattern (slp_tree *node, vec *m_ops, internal_fn ifn) + { + this->m_ifn = ifn; + this->m_node = node; + this->m_ops.create (0); + this->m_ops.safe_splice (*m_ops); + } + + public: + + /* Create a new instance of the pattern matcher class of the given type. */ + static vect_pattern* recognize (slp_tree_to_load_perm_map_t *, slp_tree *); + + /* Build the pattern from the data collected so far. */ + virtual void build (slp_tree_to_load_perm_map_t *, vec_info *) = 0; + + /* Default destructor. */ + virtual ~vect_pattern () + { + this->m_ops.release (); + } +}; + +/* Function pointer to create a new pattern matcher from a generic type. */ +typedef vect_pattern* (*vect_pattern_decl_t) (slp_tree_to_load_perm_map_t *, + slp_tree *); + +/* List of supported pattern matchers. */ +extern vect_pattern_decl_t slp_patterns[]; + +/* Number of supported pattern matchers. */ +extern size_t num__slp_patterns; + #endif /* GCC_TREE_VECTORIZER_H */