From patchwork Wed Jul 18 21:54:15 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Steve Ellcey
X-Patchwork-Id: 945935
Message-ID: <1531950855.1378.56.camel@cavium.com>
Subject: RFC: Patch to implement Aarch64 SIMD ABI
From: Steve Ellcey
Reply-To: sellcey@cavium.com
To: gcc-patches
Cc: "richard.earnshaw", "james.greenhalgh", Marcus Shawcroft,
 richard.sandiford@arm.com
Date: Wed, 18 Jul 2018 14:54:15 -0700

This is a patch to support the AArch64 SIMD ABI [1] in GCC.  I intend to
eventually follow this up with two more patches: one to define the
TARGET_SIMD_CLONE* macros and one to improve GCC's register
allocation/usage when calling SIMD functions.

The significant difference between the standard ARM ABI and the SIMD ABI
is that in the normal ABI a callee saves only the lower 64 bits of
registers V8-V15, whereas in the SIMD ABI the callee must save all 128
bits of registers V8-V23.

This patch checks for SIMD functions and saves the extra registers when
needed.  It does not change the caller behavior, so with just this patch
some values may be saved by both the caller and the callee.  This is not
efficient, but it is correct code.

This patch bootstraps and passes the GCC testsuite, but that only
verifies that I haven't broken anything; it doesn't validate the handling
of SIMD functions.  I tried to write some tests, but I could never get
GCC to generate code that would save the FP callee-save registers in the
prologue.  Complex code might generate spills and fills, but it never
triggered the prologue/epilogue code to save V8-V23.  If anyone has ideas
on how to write a test that would cause GCC to generate this code, I
would appreciate them.  Just doing lots of calculations with lots of
intermediate values doesn't seem to be enough.

Steve Ellcey
sellcey@cavium.com

[1] https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi


2018-07-18  Steve Ellcey

	* config/aarch64/aarch64.c (aarch64_attribute_table): New array.
	(aarch64_simd_function_p): New function.
	(aarch64_layout_frame): Check for simd function.
	(aarch64_process_components): Ditto.
	(aarch64_expand_prologue): Ditto.
	(aarch64_expand_epilogue): Ditto.
	(TARGET_ATTRIBUTE_TABLE): New define.
	* config/aarch64/aarch64.h (FP_SIMD_SAVED_REGNUM_P): New define.
	* config/aarch64/aarch64.md (V23_REGNUM): New constant.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 1369704..b25da11 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1026,6 +1026,15 @@ static const struct processor *selected_tune;
 /* The current tuning set.  */
 struct tune_params aarch64_tune_params = generic_tunings;
 
+/* Table of machine attributes.  */
+static const struct attribute_spec aarch64_attribute_table[] =
+{
+  /* { name, min_len, max_len, decl_req, type_req, fn_type_req,
+       affects_type_identity, handler, exclude } */
+  { "aarch64_vector_pcs", 0, 0, true, false, false, false, NULL, NULL },
+  { NULL, 0, 0, false, false, false, false, NULL, NULL }
+};
+
 #define AARCH64_CPU_DEFAULT_FLAGS ((selected_cpu) ? selected_cpu->flags : 0)
 
 /* An ISA extension in the co-processor and main instruction set space.  */
@@ -1404,6 +1413,18 @@ aarch64_hard_regno_mode_ok (unsigned regno, machine_mode mode)
   return false;
 }
 
+/* Return true if this is a definition of a vectorized simd function.  */
+
+static bool
+aarch64_simd_function_p (tree fndecl)
+{
+  if (lookup_attribute ("aarch64_vector_pcs", DECL_ATTRIBUTES (fndecl)) != NULL)
+    return true;
+  if (lookup_attribute ("simd", DECL_ATTRIBUTES (fndecl)) == NULL)
+    return false;
+  return (VECTOR_TYPE_P (TREE_TYPE (TREE_TYPE (fndecl))));
+}
+
 /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  The callee only saves
    the lower 64 bits of a 128-bit register.  Tell the compiler the callee
    clobbers the top 64 bits when restoring the bottom 64 bits.  */
@@ -4034,6 +4055,7 @@ aarch64_layout_frame (void)
 {
   HOST_WIDE_INT offset = 0;
   int regno, last_fp_reg = INVALID_REGNUM;
+  bool simd_function = aarch64_simd_function_p (cfun->decl);
 
   if (reload_completed && cfun->machine->frame.laid_out)
     return;
@@ -4068,7 +4090,8 @@ aarch64_layout_frame (void)
 
   for (regno = V0_REGNUM; regno <= V31_REGNUM; regno++)
     if (df_regs_ever_live_p (regno)
-        && !call_used_regs[regno])
+        && (!call_used_regs[regno]
+            || (simd_function && FP_SIMD_SAVED_REGNUM_P (regno))))
       {
         cfun->machine->frame.reg_offset[regno] = SLOT_REQUIRED;
         last_fp_reg = regno;
@@ -4105,7 +4128,8 @@ aarch64_layout_frame (void)
       {
         /* If there is an alignment gap between integer and fp callee-saves,
            allocate the last fp register to it if possible.  */
-        if (regno == last_fp_reg && has_align_gap && (offset & 8) == 0)
+        if (regno == last_fp_reg && has_align_gap
+            && !simd_function && (offset & 8) == 0)
           {
             cfun->machine->frame.reg_offset[regno] = max_int_offset;
             break;
@@ -4117,7 +4141,7 @@ aarch64_layout_frame (void)
         else if (cfun->machine->frame.wb_candidate2 == INVALID_REGNUM
                  && cfun->machine->frame.wb_candidate1 >= V0_REGNUM)
           cfun->machine->frame.wb_candidate2 = regno;
-        offset += UNITS_PER_WORD;
+        offset += simd_function ? UNITS_PER_VREG : UNITS_PER_WORD;
       }
 
   offset = ROUND_UP (offset, STACK_BOUNDARY / BITS_PER_UNIT);
@@ -4706,8 +4730,11 @@ aarch64_process_components (sbitmap components, bool prologue_p)
   while (regno != last_regno)
     {
       /* AAPCS64 section 5.1.2 requires only the bottom 64 bits to be saved
-         so DFmode for the vector registers is enough.  */
-      machine_mode mode = GP_REGNUM_P (regno) ? E_DImode : E_DFmode;
+         so DFmode for the vector registers is enough.  For simd functions
+         we want to save the entire register.  */
+      machine_mode mode = GP_REGNUM_P (regno) ? E_DImode
+        : (aarch64_simd_function_p (cfun->decl) ? E_TFmode : E_DFmode);
+
       rtx reg = gen_rtx_REG (mode, regno);
       poly_int64 offset = cfun->machine->frame.reg_offset[regno];
       if (!frame_pointer_needed)
@@ -4736,6 +4763,7 @@ aarch64_process_components (sbitmap components, bool prologue_p)
          mergeable with the current one into a pair.  */
       if (!satisfies_constraint_Ump (mem)
           || GP_REGNUM_P (regno) != GP_REGNUM_P (regno2)
+          || (aarch64_simd_function_p (cfun->decl) && (FP_REGNUM_P (regno)))
           || maybe_ne ((offset2 - cfun->machine->frame.reg_offset[regno]),
                        GET_MODE_SIZE (mode)))
         {
@@ -4958,8 +4986,12 @@ aarch64_expand_prologue (void)
 
   aarch64_save_callee_saves (DImode, callee_offset, R0_REGNUM, R30_REGNUM,
                              callee_adjust != 0 || emit_frame_chain);
-  aarch64_save_callee_saves (DFmode, callee_offset, V0_REGNUM, V31_REGNUM,
-                             callee_adjust != 0 || emit_frame_chain);
+  if (aarch64_simd_function_p (cfun->decl))
+    aarch64_save_callee_saves (TFmode, callee_offset, V0_REGNUM, V31_REGNUM,
+                               callee_adjust != 0 || emit_frame_chain);
+  else
+    aarch64_save_callee_saves (DFmode, callee_offset, V0_REGNUM, V31_REGNUM,
+                               callee_adjust != 0 || emit_frame_chain);
   aarch64_sub_sp (ip1_rtx, ip0_rtx, final_adjust, !frame_pointer_needed);
 }
 
@@ -5040,8 +5072,12 @@ aarch64_expand_epilogue (bool for_sibcall)
 
   aarch64_restore_callee_saves (DImode, callee_offset, R0_REGNUM, R30_REGNUM,
                                 callee_adjust != 0, &cfi_ops);
-  aarch64_restore_callee_saves (DFmode, callee_offset, V0_REGNUM, V31_REGNUM,
-                                callee_adjust != 0, &cfi_ops);
+  if (aarch64_simd_function_p (cfun->decl))
+    aarch64_restore_callee_saves (TFmode, callee_offset, V0_REGNUM, V31_REGNUM,
+                                  callee_adjust != 0, &cfi_ops);
+  else
+    aarch64_restore_callee_saves (DFmode, callee_offset, V0_REGNUM, V31_REGNUM,
+                                  callee_adjust != 0, &cfi_ops);
 
   if (need_barrier_p)
     emit_insn (gen_stack_tie (stack_pointer_rtx, stack_pointer_rtx));
@@ -18070,6 +18106,9 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_SELECT_EARLY_REMAT_MODES
 #define TARGET_SELECT_EARLY_REMAT_MODES aarch64_select_early_remat_modes
 
+#undef TARGET_ATTRIBUTE_TABLE
+#define TARGET_ATTRIBUTE_TABLE aarch64_attribute_table
+
 #if CHECKING_P
 #undef TARGET_RUN_TARGET_SELFTESTS
 #define TARGET_RUN_TARGET_SELFTESTS selftest::aarch64_run_selftests
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index f284e74..d11474e 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -500,6 +500,8 @@ extern unsigned aarch64_architecture_version;
 
 #define PR_LO_REGNUM_P(REGNO)\
   (((unsigned) (REGNO - P0_REGNUM)) <= (P7_REGNUM - P0_REGNUM))
+#define FP_SIMD_SAVED_REGNUM_P(REGNO) \
+  (((unsigned) (REGNO - V8_REGNUM)) <= (V23_REGNUM - V8_REGNUM))
 
 /* Register and constant classes.  */
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index a014a01..d319430 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -63,6 +63,7 @@
     (V15_REGNUM		47)
     (V16_REGNUM		48)
     (V20_REGNUM		52)
+    (V23_REGNUM		55)
     (V24_REGNUM		56)
     (V28_REGNUM		60)
     (V31_REGNUM		63)
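
For readers following the diff, here is a minimal standalone sketch of the
range test behind FP_SIMD_SAVED_REGNUM_P and of how the save-slot size
changes for SIMD functions.  It is not part of the patch.  The helper names
(fp_simd_saved_regnum_p, fp_base_saved_regnum_p, callee_save_bytes) are
hypothetical, and the 8- and 16-byte slot sizes are assumed to correspond to
UNITS_PER_WORD and UNITS_PER_VREG.  The register numbers follow the
constants in aarch64.md (V15 is 47, V23 is 55, so V8 is 40).

```c
#include <assert.h>
#include <stdbool.h>

/* Register numbers as in aarch64.md: Vn is 32 + n.  */
enum { V0_REGNUM = 32, V8_REGNUM = 40, V15_REGNUM = 47,
       V23_REGNUM = 55, V31_REGNUM = 63 };

/* Same single-comparison range test as FP_SIMD_SAVED_REGNUM_P: a regno
   below V8 wraps to a large unsigned value and fails the comparison.  */
static bool
fp_simd_saved_regnum_p (int regno)
{
  return (unsigned) (regno - V8_REGNUM)
         <= (unsigned) (V23_REGNUM - V8_REGNUM);
}

/* Hypothetical counterpart for the base ABI, where V8-V15 are saved.  */
static bool
fp_base_saved_regnum_p (int regno)
{
  return (unsigned) (regno - V8_REGNUM)
         <= (unsigned) (V15_REGNUM - V8_REGNUM);
}

/* Bytes needed to save every callee-saved V register, assuming 8-byte
   slots in the base ABI and 16-byte slots for SIMD functions.  */
static int
callee_save_bytes (bool simd_function)
{
  int bytes = 0;
  for (int regno = V0_REGNUM; regno <= V31_REGNUM; regno++)
    if (simd_function ? fp_simd_saved_regnum_p (regno)
                      : fp_base_saved_regnum_p (regno))
      bytes += simd_function ? 16 : 8;
  return bytes;
}

/* Spot checks of the boundaries described in the cover letter.  */
static void
self_check (void)
{
  assert (!fp_simd_saved_regnum_p (V8_REGNUM - 1));
  assert (fp_simd_saved_regnum_p (V8_REGNUM));
  assert (fp_simd_saved_regnum_p (V23_REGNUM));
  assert (!fp_simd_saved_regnum_p (V23_REGNUM + 1));
  assert (callee_save_bytes (false) == 8 * 8);
  assert (callee_save_bytes (true) == 16 * 16);
}
```

Calling self_check () from a test driver exercises both ABIs.  The larger
slot size is also why the patch skips the alignment-gap optimisation for
SIMD functions: an 8-byte gap cannot hold a 16-byte save.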