[v2,0/16,RFC,AArch64/Arm/SVE/SVE2/MVE] middle-end Add support for SLP vectorization of complex number instructions.

Message ID	20200925142704.GA9928@arm.com
Headers
Date: Fri, 25 Sep 2020 15:27:09 +0100
From: Tamar Christina <tamar.christina@arm.com>
To: gcc-patches@gcc.gnu.org
Cc: nd@arm.com, rguenther@suse.de, ook@ucw.cz
Subject: [PATCH v2 0/16][RFC][AArch64/Arm/SVE/SVE2/MVE] middle-end Add support for SLP vectorization of complex number instructions.
Message-ID: <20200925142704.GA9928@arm.com>
Series	middle-end Add support for SLP vectorization of complex number instructions.

[v2,0/16,RFC,AArch64/Arm/SVE/SVE2/MVE] middle-end Add support for SLP vectorization of complex numb…
[v2,1/16] middle-end: Refactor refcnt to use SLP_TREE_REF_COUNT for consistency
[v2,2/16] middle-end: Refactor and expose some vectorizer helper functions.
[v2,3/16] middle-end Add basic SLP pattern matching scaffolding.
[v2,4/16] middle-end: Add dissolve code for when SLP fails and non-SLP loop vectorization is to be …
[v2,5/16] middle-end: Add shared machinery for matching patterns involving complex numbers.
[v2,6/16] middle-end Add Complex Addition with rotation detection
[v2,7/16] middle-end: Add Complex Multiplication and Multiplication with Conjucate detection
[v2,8/16] middle-end: add Complex Multiply and Accumulate/Subtract and Multiply and Accumulate/Subt…
[v2,9/16,docs] Add some missing test directive documentaion.
[v2,10/16] AArch64: Add NEON RTL patterns for Complex Addition, Multiply and FMA.
[v2,11/16] AArch64: Add SVE RTL patterns for Complex Addition, Multiply and FMA.
[v2,12/16] AArch64: Add SVE2 Integer RTL patterns for Complex Addition, Multiply and FMA.
[v2,13/16] Arm: Add support for auto-vectorization using HF mode.
[v2,14/16] Arm: Add NEON RTL patterns for Complex Addition, Multiply and FMA.
[v2,15/16] Arm: Add MVE RTL patterns for Complex Addition, Multiply and FMA.
[v2,16/16] Testsuite: Add initial tests for NEON (incomplete)

Message

Tamar Christina Sept. 25, 2020, 2:27 p.m. UTC

Hi All,

This patch series adds support for SLP vectorization of complex instructions [1].

These instructions exist only in their vector forms and require you to recognize
two statements in parallel. Complex operations usually require a permute because
the real and imaginary parts are stored interleaved, but these vector
instructions expect that layout, so the compiler no longer needs to generate a
permute.
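
As an illustration of the layout issue (a sketch, not code from the series):
with complex values stored interleaved as {re, im, re, im, ...}, a complex
addition with the second operand rotated by 90 degrees, a + i*b, reads the
opposite lane of b for each result lane. The function name and the choice of
float are assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical example: c = a + i*b for n complex numbers stored
   interleaved as {re, im, re, im, ...}.
   Since i*(br + i*bi) = -bi + i*br, each result lane reads the
   *other* lane of b.  */
void cadd_rot90(const float *a, const float *b, float *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        c[2 * i]     = a[2 * i]     - b[2 * i + 1]; /* real: ar - bi */
        c[2 * i + 1] = a[2 * i + 1] + b[2 * i];     /* imag: ai + br */
    }
}
```

The two statements in the loop body are the pair that has to be recognized
together. Written with generic vector adds and subtracts, b would first need
a permute that swaps each re/im pair.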

For this reason the pass also re-orders the loads in the SLP tree so that they
become contiguous and no longer need the permutes. The basic blocks are left
untouched so that the scalar loop still correctly issues permutes.

The instructions also support rotations in the Argand plane; as such, the
operands have to be re-ordered to coincide with their load group.

For now, this patch only adds support for:

* Complex Addition with rotation of 90 and 270.
* Complex Multiplication and Multiplication where one operand is conjugated.
* Complex FMA and FMA where one operand is conjugated.
* Complex FMS and FMS where one operand is conjugated.
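
For reference, the scalar arithmetic behind the multiply-based entries can be
sketched as follows on interleaved {re, im} arrays. This is a hand-written
illustration: the function names are hypothetical, and the series matches the
lowered form inside the vectorizer rather than source written like this.

```c
#include <stddef.h>

/* Hypothetical helpers; all arrays hold n complex numbers stored
   interleaved as {re, im, re, im, ...}.  */

/* c = a * b */
void cmul(const float *a, const float *b, float *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float ar = a[2 * i], ai = a[2 * i + 1];
        float br = b[2 * i], bi = b[2 * i + 1];
        c[2 * i]     = ar * br - ai * bi;
        c[2 * i + 1] = ar * bi + ai * br;
    }
}

/* c = a * conj(b), where conj(br + i*bi) = br - i*bi */
void cmul_conj(const float *a, const float *b, float *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float ar = a[2 * i], ai = a[2 * i + 1];
        float br = b[2 * i], bi = b[2 * i + 1];
        c[2 * i]     = ar * br + ai * bi;
        c[2 * i + 1] = ai * br - ar * bi;
    }
}

/* acc += a * b (complex multiply and accumulate) */
void cfma(const float *a, const float *b, float *acc, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float ar = a[2 * i], ai = a[2 * i + 1];
        float br = b[2 * i], bi = b[2 * i + 1];
        acc[2 * i]     += ar * br - ai * bi;
        acc[2 * i + 1] += ar * bi + ai * br;
    }
}
```

The multiply-and-subtract variants have the same shape with the product
subtracted instead of added.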

Complex dot-product is not currently supported in this patch set as build_slp fails
for it. This will be provided as a future patch.

These are supported for both integer and floating point, and as such these don't
look for real or imaginary pairs but instead rely on the early lowering of
complex numbers by GCC and canonicalization of the operations, such that it just
recognizes any instruction sequence matching the operations requested.
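
To make the point about shape rather than type concrete, here is a sketch
(names are hypothetical, and whether a given loop is actually matched depends
on the target and the rest of the series). The first function uses the C99
complex type, which GCC lowers early to separate real and imaginary
arithmetic. The second is written by hand on plain integers and has the same
statement shape, a - i*b, i.e. an addition with rotation of 270.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical example using the C99 complex type: c = a - i*b.  */
void cadd_rot270_cplx(const float _Complex *a, const float _Complex *b,
                      float _Complex *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] - b[i] * I;
}

/* Same operation spelled out on interleaved integers {re, im, ...}.
   Since -i*(br + i*bi) = bi - i*br, the result is
   (ar + bi) + i*(ai - br).  No complex type is involved.  */
void cadd_rot270_int(const int *a, const int *b, int *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        c[2 * i]     = a[2 * i]     + b[2 * i + 1];
        c[2 * i + 1] = a[2 * i + 1] - b[2 * i];
    }
}
```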

To be safe, when it is not sure it can support the operation, or if it finds
something it does not understand, it backs off.

This patch is an RFC and I am looking for feedback on the approach. In
particular, this series has one problem, which occurs when it is decided that
SLP is not viable and that the normal loop vectorizer is to be used.

In this case I dissolve the changes but the compiler crashes because the use of
the pattern matcher essentially undoes two_operands. This means that the number
of copies needed when using the patterns and when not are different. When using
the patterns the two operands become the same and so are treated as manually
unrolled loops. The problem is that nunits has already been decided along with
the unroll factor, so when the dissolved statements are then analyzed they
fail. This is also the reason why I cannot analyze both the pattern and
original statements initially.

The relevant places in the source code have comments describing the problem.

[1] https://developer.arm.com/documentation/ddi0487/fc/

Thanks,
Tamar

Comments

Richard Biener Sept. 28, 2020, 11:55 a.m. UTC | #1

On Fri, 25 Sep 2020, Tamar Christina wrote:

> This patch is an RFC and I am looking for feedback on the approach.  In
> particular, this series has one problem, which occurs when it is decided that
> SLP is not viable and that the normal loop vectorizer is to be used.
> 
> In this case I dissolve the changes but the compiler crashes because the use of
> the pattern matcher essentially undoes two_operands.  This means that the number
> of copies needed when using the patterns and when not are different.  When using
> the patterns the two operands become the same and so are treated as manually
> unrolled loops.  The problem is that nunits has already been decided along with
> the unroll factor, so when the dissolved statements are then analyzed they
> fail.  This is also the reason why I cannot analyze both the pattern and
> original statements initially.

That's the same as with "regular" patterns, btw.: if vectorizing the
pattern fails, vectorization fails. We never re-consider, and we also
have no way to choose between multiple patterns.

The way "regular" patterns make this a non-issue is that they try
to only convert things that are likely unhandled/suboptimal and
most likely vectorizable.

That said - the solution to the ICE is to _not_ dissolve the changes and
instead make vectorization fail.

Richard.
