From patchwork Tue Jul 31 22:24:48 2018
X-Patchwork-Submitter: Steve Ellcey
X-Patchwork-Id: 951839
Message-ID: <1533075888.3879.14.camel@cavium.com>
Subject: [Patch][Aarch64] Implement Aarch64 SIMD ABI and aarch64_vector_pcs attribute
From: Steve Ellcey
Reply-To: sellcey@cavium.com
To: gcc-patches
Cc: Wilco Dijkstra,
	"richard.sandiford", "richard.earnshaw", "james.greenhalgh", Marcus Shawcroft
Date: Tue, 31 Jul 2018 15:24:48 -0700

Here is a new version of my patch to support the Aarch64 SIMD ABI [1] in
GCC.  I think this is complete enough to be considered for check in.  I
wrote a few new tests and put them in a new gcc.target/aarch64/torture
directory so they would be run with multiple optimization options.  I also
verified that there are no regressions in the GCC testsuite.

The significant difference between the standard ARM ABI and the SIMD ABI
is that in the normal ABI a callee saves only the lower 64 bits of
registers V8-V15, whereas in the SIMD ABI the callee must save all 128
bits of registers V8-V23.

As I mentioned in my RFC, I intend to (eventually) follow this patch with
two more, one to define the TARGET_SIMD_CLONE* macros and one to improve
the GCC register allocation/usage when calling SIMD functions.  Right now,
a caller calling a SIMD function will save more registers than it needs to
because some of those registers will also be saved by the callee.

Steve Ellcey
sellcey@cavium.com

[1] https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi

Compiler ChangeLog:

2018-07-31  Steve Ellcey

	* config/aarch64/aarch64-protos.h (aarch64_use_simple_return_insn_p):
	New prototype.
	(aarch64_epilogue_uses): Ditto.
	* config/aarch64/aarch64.c (aarch64_attribute_table): New array.
	(aarch64_simd_decl_p): New function.
	(aarch64_reg_save_mode): New function.
	(aarch64_is_simd_call_p): New function.
	(aarch64_function_ok_for_sibcall): Check for simd calls.
	(aarch64_layout_frame): Check for simd function.
	(aarch64_gen_storewb_pair): Handle E_TFmode.
	(aarch64_push_regs): Use aarch64_reg_save_mode to get mode.
	(aarch64_gen_loadwb_pair): Handle E_TFmode.
	(aarch64_pop_regs): Use aarch64_reg_save_mode to get mode.
	(aarch64_gen_store_pair): Handle E_TFmode.
	(aarch64_gen_load_pair): Ditto.
	(aarch64_save_callee_saves): Handle different mode sizes.
	(aarch64_restore_callee_saves): Ditto.
	(aarch64_components_for_bb): Check for simd function.
	(aarch64_epilogue_uses): New function.
	(aarch64_process_components): Ditto.
	(aarch64_expand_prologue): Ditto.
	(aarch64_expand_epilogue): Ditto.
	(aarch64_expand_call): Ditto.
	(TARGET_ATTRIBUTE_TABLE): New define.
	* config/aarch64/aarch64.h (EPILOGUE_USES): Redefine.
	(FP_SIMD_SAVED_REGNUM_P): New macro.
	* config/aarch64/aarch64.md (V23_REGNUM): New constant.
	(simple_return): New define_expand.
	(load_pair_dw_tftf): New instruction.
	(store_pair_dw_tftf): Ditto.
	(loadwb_pair_): Ditto.
	(storewb_pair_): Ditto.

Testsuite ChangeLog:

2018-07-31  Steve Ellcey

	* gcc.target/aarch64/torture/aarch64-torture.exp: New file.
	* gcc.target/aarch64/torture/simd-abi-1.c: New test.
	* gcc.target/aarch64/torture/simd-abi-2.c: Ditto.
	* gcc.target/aarch64/torture/simd-abi-3.c: Ditto.
	* gcc.target/aarch64/torture/simd-abi-4.c: Ditto.

diff --git a/gcc/testsuite/gcc.target/aarch64/torture/aarch64-torture.exp b/gcc/testsuite/gcc.target/aarch64/torture/aarch64-torture.exp
index e69de29..22f08ff 100644
--- a/gcc/testsuite/gcc.target/aarch64/torture/aarch64-torture.exp
+++ b/gcc/testsuite/gcc.target/aarch64/torture/aarch64-torture.exp
@@ -0,0 +1,41 @@
+# Copyright (C) 2018 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3. If not see
+# .
+
+# GCC testsuite that uses the `gcc-dg.exp' driver, looping over
+# optimization options.
+
+# Exit immediately if this isn't an Aarch64 target.
+if { ![istarget aarch64*-*-*] } then {
+  return
+}
+
+# Load support procs.
+load_lib gcc-dg.exp
+
+# If a testcase doesn't have special options, use these.
+global DEFAULT_CFLAGS
+if ![info exists DEFAULT_CFLAGS] then {
+    set DEFAULT_CFLAGS " -ansi -pedantic-errors"
+}
+
+# Initialize `dg'.
+dg-init
+
+# Main loop.
+gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cS\]]] "" $DEFAULT_CFLAGS
+
+# All done.
+dg-finish
diff --git a/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-1.c b/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-1.c
index e69de29..e11580a 100644
--- a/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-1.c
+++ b/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-1.c
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+
+void __attribute__ ((aarch64_vector_pcs))
+f (void)
+{
+  /* Clobber all fp/simd regs and verify that the correct ones are saved
+     and restored in the prologue and epilogue of a SIMD function. */
+  __asm__ __volatile__ ("" ::: "q0", "q1", "q2", "q3");
+  __asm__ __volatile__ ("" ::: "q4", "q5", "q6", "q7");
+  __asm__ __volatile__ ("" ::: "q8", "q9", "q10", "q11");
+  __asm__ __volatile__ ("" ::: "q12", "q13", "q14", "q15");
+  __asm__ __volatile__ ("" ::: "q16", "q17", "q18", "q19");
+  __asm__ __volatile__ ("" ::: "q20", "q21", "q22", "q23");
+  __asm__ __volatile__ ("" ::: "q24", "q25", "q26", "q27");
+  __asm__ __volatile__ ("" ::: "q28", "q29", "q30", "q31");
+}
+
+/* { dg-final { scan-assembler "\[ \t\]stp\tq8, q9" } } */
+/* { dg-final { scan-assembler "\[ \t\]stp\tq10, q11" } } */
+/* { dg-final { scan-assembler "\[ \t\]stp\tq12, q13" } } */
+/* { dg-final { scan-assembler "\[ \t\]stp\tq14, q15" } } */
+/* { dg-final { scan-assembler "\[ \t\]stp\tq16, q17" } } */
+/* { dg-final { scan-assembler "\[ \t\]stp\tq18, q19" } } */
+/* { dg-final { scan-assembler "\[ \t\]stp\tq20, q21" } } */
+/* { dg-final { scan-assembler "\[ \t\]stp\tq22, q23" } } */
+/* { dg-final { scan-assembler "\[ \t\]ldp\tq8, q9" } } */
+/* { dg-final { scan-assembler "\[ \t\]ldp\tq10, q11" } } */
+/* { dg-final { scan-assembler "\[ \t\]ldp\tq12, q13" } } */
+/* { dg-final { scan-assembler "\[ \t\]ldp\tq14, q15" } } */
+/* { dg-final { scan-assembler "\[ \t\]ldp\tq16, q17" } } */
+/* { dg-final { scan-assembler "\[ \t\]ldp\tq18, q19" } } */
+/* { dg-final { scan-assembler "\[ \t\]ldp\tq20, q21" } } */
+/* { dg-final { scan-assembler "\[ \t\]ldp\tq22, q23" } } */
+/* { dg-final { scan-assembler-not "\[ \t\]stp\tq\[034567\]" } } */
+/* { dg-final { scan-assembler-not "\[ \t\]ldp\tq\[034567\]" } } */
+/* { dg-final { scan-assembler-not "\[ \t\]stp\tq2\[456789\]" } } */
+/* { dg-final { scan-assembler-not "\[ \t\]ldp\tq2\[456789\]" } } */
+/* { dg-final { scan-assembler-not "\[ \t\]stp\td" } } */
+/* { dg-final { scan-assembler-not "\[ \t\]ldp\td" } } */
+/* { dg-final { scan-assembler-not "\[ \t\]str\t" } } */
+/* { dg-final { scan-assembler-not "\[ \t\]ldr\t" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-2.c b/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-2.c
index e69de29..ecc60d0 100644
--- a/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-2.c
+++ b/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-2.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+
+void
+f (void)
+{
+  /* Clobber all fp/simd regs and verify that the correct ones are saved
+     and restored in the prologue and epilogue of a non-SIMD function. */
+  __asm__ __volatile__ ("" ::: "q0", "q1", "q2", "q3");
+  __asm__ __volatile__ ("" ::: "q4", "q5", "q6", "q7");
+  __asm__ __volatile__ ("" ::: "q8", "q9", "q10", "q11");
+  __asm__ __volatile__ ("" ::: "q12", "q13", "q14", "q15");
+  __asm__ __volatile__ ("" ::: "q16", "q17", "q18", "q19");
+  __asm__ __volatile__ ("" ::: "q20", "q21", "q22", "q23");
+  __asm__ __volatile__ ("" ::: "q24", "q25", "q26", "q27");
+  __asm__ __volatile__ ("" ::: "q28", "q29", "q30", "q31");
+}
+
+/* { dg-final { scan-assembler "\[ \t\]stp\td8, d9" } } */
+/* { dg-final { scan-assembler "\[ \t\]stp\td10, d11" } } */
+/* { dg-final { scan-assembler "\[ \t\]stp\td12, d13" } } */
+/* { dg-final { scan-assembler "\[ \t\]stp\td14, d15" } } */
+/* { dg-final { scan-assembler "\[ \t\]ldp\td8, d9" } } */
+/* { dg-final { scan-assembler "\[ \t\]ldp\td10, d11" } } */
+/* { dg-final { scan-assembler "\[ \t\]ldp\td12, d13" } } */
+/* { dg-final { scan-assembler "\[ \t\]ldp\td14, d15" } } */
+/* { dg-final { scan-assembler-not "\[ \t\]stp\tq" } } */
+/* { dg-final { scan-assembler-not "\[ \t\]ldp\tq" } } */
+/* { dg-final { scan-assembler-not "\[ \t\]stp\tq\[01234567\]" } } */
+/* { dg-final { scan-assembler-not "\[ \t\]ldp\tq\[01234567\]" } } */
+/* { dg-final { scan-assembler-not "\[ \t\]stp\tq1\[6789\]" } } */
+/* { dg-final { scan-assembler-not "\[ \t\]ldp\tq1\[6789\]" } } */
+/* { dg-final { scan-assembler-not "\[ \t\]str\t" } } */
+/* { dg-final { scan-assembler-not "\[ \t\]ldr\t" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-3.c b/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-3.c
index e69de29..d7926d3 100644
--- a/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-3.c
+++ b/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-3.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+
+extern void g (void);
+
+void __attribute__ ((aarch64_vector_pcs))
+f (void)
+{
+  g();
+}
+
+/* { dg-final { scan-assembler "\[ \t\]stp\tq8, q9" } } */
+/* { dg-final { scan-assembler "\[ \t\]stp\tq10, q11" } } */
+/* { dg-final { scan-assembler "\[ \t\]stp\tq12, q13" } } */
+/* { dg-final { scan-assembler "\[ \t\]stp\tq14, q15" } } */
+/* { dg-final { scan-assembler "\[ \t\]stp\tq16, q17" } } */
+/* { dg-final { scan-assembler "\[ \t\]stp\tq18, q19" } } */
+/* { dg-final { scan-assembler "\[ \t\]stp\tq20, q21" } } */
+/* { dg-final { scan-assembler "\[ \t\]stp\tq22, q23" } } */
+/* { dg-final { scan-assembler "\[ \t\]ldp\tq8, q9" } } */
+/* { dg-final { scan-assembler "\[ \t\]ldp\tq10, q11" } } */
+/* { dg-final { scan-assembler "\[ \t\]ldp\tq12, q13" } } */
+/* { dg-final { scan-assembler "\[ \t\]ldp\tq14, q15" } } */
+/* { dg-final { scan-assembler "\[ \t\]ldp\tq16, q17" } } */
+/* { dg-final { scan-assembler "\[ \t\]ldp\tq18, q19" } } */
+/* { dg-final { scan-assembler "\[ \t\]ldp\tq20, q21" } } */
+/* { dg-final { scan-assembler "\[ \t\]ldp\tq22, q23" } } */
+/* { dg-final { scan-assembler-not "\[ \t\]stp\tq\[034567\]" } } */
+/* { dg-final { scan-assembler-not "\[ \t\]ldp\tq\[034567\]" } } */
+/* { dg-final { scan-assembler-not "\[ \t\]stp\tq2\[456789\]" } } */
+/* { dg-final { scan-assembler-not "\[ \t\]ldp\tq2\[456789\]" } } */
+/* { dg-final { scan-assembler-not "\[ \t\]stp\td" } } */
+/* { dg-final { scan-assembler-not "\[ \t\]ldp\td" } } */
+/* { dg-final { scan-assembler-not "\[ \t\]str\t" } } */
+/* { dg-final { scan-assembler-not "\[ \t\]ldr\t" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-4.c b/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-4.c
index e69de29..e399690 100644
--- a/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-4.c
+++ b/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-4.c
@@ -0,0 +1,34 @@
+/* { dg-do run } */
+/* { dg-additional-options "-std=c99" } */
+
+
+
+/* There is nothing special about the calculations here, this is just
+   a test that can be compiled and run. */
+
+extern void abort (void);
+
+__Float64x2_t __attribute__ ((noinline, aarch64_vector_pcs))
+foo(__Float64x2_t a, __Float64x2_t b, __Float64x2_t c,
+    __Float64x2_t d, __Float64x2_t e, __Float64x2_t f,
+    __Float64x2_t g, __Float64x2_t h, __Float64x2_t i)
+{
+  __Float64x2_t w, x, y, z;
+  w = a + b * c;
+  x = d + e * f;
+  y = g + h * i;
+  return w + x * y;
+}
+
+
+int main()
+{
+  __Float64x2_t a, b, c, d;
+  a = (__Float64x2_t) { 1.0, 2.0 };
+  b = (__Float64x2_t) { 3.0, 4.0 };
+  c = (__Float64x2_t) { 5.0, 6.0 };
+  d = foo (a, b, c, (a+b), (b+c), (a+c), (a-b), (b-c), (a-c)) + a + b + c;
+  if (d[0] != 337.0 || d[1] != 554.0)
+    abort ();
+  return 0;
+}
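
As a reading aid for the tests above, here is a small portable C sketch of the
callee-save rule stated in the cover text: the base ABI preserves the low 64
bits of V8-V15, the SIMD ABI preserves all 128 bits of V8-V23.  It is not
taken from the patch; the enum and function names are hypothetical and are
not GCC internals.  It only models which registers a callee has to save and
how many bytes that takes in the worst case.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names, for illustration only; not GCC internals.  */
enum pcs { PCS_BASE, PCS_SIMD };

/* Return true if vector register V<regno> is callee-saved under ABI.
   Base ABI: V8-V15.  SIMD ABI: V8-V23.  */
bool
v_reg_callee_saved_p (enum pcs abi, int regno)
{
  assert (regno >= 0 && regno <= 31);
  if (abi == PCS_SIMD)
    return regno >= 8 && regno <= 23;
  return regno >= 8 && regno <= 15;
}

/* Bytes of each callee-saved V register that must be preserved:
   the low 64 bits in the base ABI, all 128 bits in the SIMD ABI.  */
int
v_reg_save_bytes (enum pcs abi)
{
  return abi == PCS_SIMD ? 16 : 8;
}

/* Worst-case size of the FP/SIMD callee-save area, i.e. when the
   function clobbers every vector register as simd-abi-1.c and
   simd-abi-2.c do.  */
int
v_save_area_bytes (enum pcs abi)
{
  int total = 0;
  for (int regno = 0; regno <= 31; regno++)
    if (v_reg_callee_saved_p (abi, regno))
      total += v_reg_save_bytes (abi);
  return total;
}
```

Under this model v_save_area_bytes (PCS_BASE) is 8 * 8 = 64 bytes, which
corresponds to the four "stp d" pairs expected in simd-abi-2.c, and
v_save_area_bytes (PCS_SIMD) is 16 * 16 = 256 bytes, which corresponds to the
eight "stp q" pairs expected in simd-abi-1.c and simd-abi-3.c.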