From patchwork Fri Jan 6 14:17:48 2017
X-Patchwork-Submitter: "Andre Vieira (lists)"
X-Patchwork-Id: 711936
Subject: Re: [PATCH 3/6][ARM] Implement support for ACLE Coprocessor CDP intrinsics
To: Kyrill Tkachov, gcc-patches@gcc.gnu.org
References: <5822F3CB.3040202@arm.com> <5822F652.208@arm.com> <586E23A3.8090402@foss.arm.com>
Cc: Ramana Radhakrishnan
From: "Andre Vieira (lists)"
Message-ID: <586FA70C.7040100@arm.com>
Date: Fri, 6 Jan 2017 14:17:48 +0000
In-Reply-To: <586E23A3.8090402@foss.arm.com>

On 05/01/17 10:44, Kyrill Tkachov wrote:
> Hi Andre,
>
> On 09/11/16 10:11, Andre Vieira (lists) wrote:
>> Hi,
>>
>> This patch implements support for the ARM ACLE Coprocessor CDP
>> intrinsics.
>> See below a table mapping the intrinsics to their respective
>> instructions:
>>
>> +----------------------------------------------------+--------------------------------------+
>> | Intrinsic signature                                | Instruction pattern                  |
>> +----------------------------------------------------+--------------------------------------+
>> |void __arm_cdp(coproc, opc1, CRd, CRn, CRm, opc2)   |CDP coproc, opc1, CRd, CRn, CRm, opc2 |
>> +----------------------------------------------------+--------------------------------------+
>> |void __arm_cdp2(coproc, opc1, CRd, CRn, CRm, opc2)  |CDP2 coproc, opc1, CRd, CRn, CRm, opc2|
>> +----------------------------------------------------+--------------------------------------+
>>
>> Note that any untyped variable in the intrinsic signature is required to
>> be a compiler-time constant and has the type 'unsigned int'. We do some
>> boundary checks for coproc:[0-15], opc1:[0-15], CR*:[0-31], opc2:[0-7].
>> If either of these requirements are not met a diagnostic is issued.
>>
>> I renamed neon_const_bounds in this patch, to arm_const_bounds, simply
>> because it is also used in the Coprocessor intrinsics. It also requires
>> the expansion of the builtin frame work such that it accepted 'void'
>> modes and intrinsics with 6 arguments.
>>
>> I also changed acle.exp to run tests for multiple options, where all lto
>> option sets are appended with -ffat-objects to allow for assembly scans.
>>
>> Is this OK for trunk?
>
> This is okay if bootstrap and testing is ok (as part of the whole series)
> modulo a couple of nits in the documentation below.
>
> Thanks,
> Kyrill
>
>> Regards,
>> Andre
>>
>> gcc/ChangeLog:
>> 2016-11-09 Andre Vieira
>>
>>     * config/arm/arm.md (<cdp>): New.
>>     * config/arm/arm.c (neon_const_bounds): Rename this ...
>>     (arm_const_bounds): ... this.
>>     (arm_coproc_builtin_available): New.
>>     * config/arm/arm-builtins.c (SIMD_MAX_BUILTIN_ARGS): Increase.
>>     (arm_type_qualifiers): Add 'qualifier_unsigned_immediate'.
>>     (CDP_QUALIFIERS): Define to...
>>     (arm_cdp_qualifiers): ... this. New.
>>     (void_UP): Define.
>>     (arm_expand_builtin_args): Add case for 6 arguments.
>>     * config/arm/arm-protos.h (neon_const_bounds): Rename this ...
>>     (arm_const_bounds): ... this.
>>     (arm_coproc_builtin_available): New.
>>     * config/arm/arm_acle.h (__arm_cdp): New.
>>     (__arm_cdp2): New.
>>     * config/arm/arm_acle_builtins.def (cdp): New.
>>     (cdp2): New.
>>     * config/arm/iterators.md (CDPI,CDP,cdp): New.
>>     * config/arm/neon.md: Rename all 'neon_const_bounds' to
>>     'arm_const_bounds'.
>>     * config/arm/types.md (coproc): New.
>>     * config/arm/unspecs.md (VUNSPEC_CDP, VUNSPEC_CDP2): New.
>>     * gcc/doc/extend.texi (ACLE): Add a mention of Coprocessor intrinsics.
>>
>> gcc/testsuite/ChangeLog:
>> 2016-11-09 Andre Vieira
>>
>>     * gcc.target/arm/acle/acle.exp: Run tests for different options
>>     and make sure fat-lto-objects is used such that we can still do
>>     assemble scans.
>>     * gcc.target/arm/acle/cdp.c: New.
>>     * gcc.target/arm/acle/cdp2.c: New.
>>     * lib/target-supports.exp (check_effective_target_arm_coproc1_ok):
>>     New.
>>     (check_effective_target_arm_coproc1_ok_nocache): New.
>>     (check_effective_target_arm_coproc2_ok): New.
>>     (check_effective_target_arm_coproc2_ok_nocache): New.
>>     (check_effective_target_arm_coproc3_ok): New.
>>     (check_effective_target_arm_coproc3_ok_nocache): New.
>
> --- a/gcc/doc/sourcebuild.texi
> +++ b/gcc/doc/sourcebuild.texi
> @@ -1675,6 +1675,21 @@ and @code{MOVT} instructions available.
>  ARM target generates Thumb-1 code for @code{-mthumb} with
>  @code{CBZ} and @code{CBNZ} instructions available.
>
> +@item arm_coproc1_ok
> +@anchor{arm_coproc1_ok}
> +ARM target supports the following coprocessor instruction: @code{CDP},
> +@code{LDC}, @code{STC}, @code{MCR} and @code{MRC}.
>
> s/instruction/instructions/
>
> +@item arm_coproc2_ok
> +@anchor{arm_coproc2_ok}
> +ARM target supports the all the coprocessor instructions also listed as
> +supported in @ref{arm_coproc1_ok} and the following: @code{CDP2}, @code{LDC2},
> +@code{LDC2l}, @code{STC2}, @code{STC2l}, @code{MCR2} and @code{MRC2}.
> +
>
> s/the all the/all the/.
> Also, I'd prefer to say "in addition to the following" rather than "and
> the following"
>
> +@item arm_coproc3_ok
> +ARM target supports the all the coprocessor instructions also listed as
> +supported in @ref{arm_coproc2_ok} and the following: @code{MCRR}, @code{MCRR2},
> +@code{MRRC}, and @code{MRRC2}.
>
> Likewise.
>

Hi,

I reworked this patch after the comments and rebased it. While doing so I
noticed that I had grouped MCRR/MRRC and MCRR2/MRRC2 together, but the first
two are supported in ARMv5TE whereas the latter two are only supported in
ARMv6 and onwards. So I fixed the testsuite checks in this patch and the
code generation in the later patch.

I ran the patch series through a bootstrap and full regression on
arm-none-linux-gnueabihf.

Is this OK for trunk?

Regards,
Andre

gcc/ChangeLog:
2017-01-xx Andre Vieira

    * config/arm/arm.md (<cdp>): New.
    * config/arm/arm.c (neon_const_bounds): Rename this ...
    (arm_const_bounds): ... this.
    (arm_coproc_builtin_available): New.
    * config/arm/arm-builtins.c (SIMD_MAX_BUILTIN_ARGS): Increase.
    (arm_type_qualifiers): Add 'qualifier_unsigned_immediate'.
    (CDP_QUALIFIERS): Define to...
    (arm_cdp_qualifiers): ... this. New.
    (void_UP): Define.
    (arm_expand_builtin_args): Add case for 6 arguments.
    * config/arm/arm-protos.h (neon_const_bounds): Rename this ...
    (arm_const_bounds): ... this.
    (arm_coproc_builtin_available): New.
    * config/arm/arm_acle.h (__arm_cdp): New.
    (__arm_cdp2): New.
    * config/arm/arm_acle_builtins.def (cdp): New.
    (cdp2): New.
    * config/arm/iterators.md (CDPI,CDP,cdp): New.
    * config/arm/neon.md: Rename all 'neon_const_bounds' to
    'arm_const_bounds'.
    * config/arm/types.md (coproc): New.
    * config/arm/unspecs.md (VUNSPEC_CDP, VUNSPEC_CDP2): New.
    * gcc/doc/extend.texi (ACLE): Add a mention of Coprocessor intrinsics.

gcc/testsuite/ChangeLog:
2017-01-xx Andre Vieira

    * gcc.target/arm/acle/acle.exp: Run tests for different options
    and make sure fat-lto-objects is used such that we can still do
    assembly scans.
    * gcc.target/arm/acle/cdp.c: New.
    * gcc.target/arm/acle/cdp2.c: New.
    * lib/target-supports.exp (check_effective_target_arm_coproc1_ok): New.
    (check_effective_target_arm_coproc1_ok_nocache): New.
    (check_effective_target_arm_coproc2_ok): New.
    (check_effective_target_arm_coproc2_ok_nocache): New.
    (check_effective_target_arm_coproc3_ok): New.
    (check_effective_target_arm_coproc3_ok_nocache): New.
    (check_effective_target_arm_coproc4_ok): New.
    (check_effective_target_arm_coproc4_ok_nocache): New.

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index febbec9fca079ac03b93edec970ebc537e25309b..2bb9e22bb8cf7ae2d8a5698e970af4845016d93c 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -39,7 +39,7 @@
 #include "case-cfn-macros.h"
 #include "sbitmap.h"
 
-#define SIMD_MAX_BUILTIN_ARGS 5
+#define SIMD_MAX_BUILTIN_ARGS 7
 
 enum arm_type_qualifiers
 {
@@ -54,6 +54,7 @@ enum arm_type_qualifiers
   /* Used when expanding arguments if an operand could
      be an immediate. */
   qualifier_immediate = 0x8, /* 1 << 3 */
+  qualifier_unsigned_immediate = 0x9,
   qualifier_maybe_immediate = 0x10, /* 1 << 4 */
   /* void foo (...). */
   qualifier_void = 0x20, /* 1 << 5 */
@@ -165,6 +166,18 @@ arm_unsigned_binop_qualifiers[SIMD_MAX_BUILTIN_ARGS]
       qualifier_unsigned };
 #define UBINOP_QUALIFIERS (arm_unsigned_binop_qualifiers)
 
+/* void (unsigned immediate, unsigned immediate, unsigned immediate,
+   unsigned immediate, unsigned immediate, unsigned immediate). */
+static enum arm_type_qualifiers
+arm_cdp_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_void, qualifier_unsigned_immediate,
+      qualifier_unsigned_immediate,
+      qualifier_unsigned_immediate,
+      qualifier_unsigned_immediate,
+      qualifier_unsigned_immediate,
+      qualifier_unsigned_immediate };
+#define CDP_QUALIFIERS \
+  (arm_cdp_qualifiers)
 /* The first argument (return type) of a store should be void type,
    which we represent with qualifier_void. Their first operand will be
    a DImode pointer to the location to store to, so we must use
@@ -201,6 +214,7 @@ arm_storestruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define oi_UP OImode
 #define hf_UP HFmode
 #define si_UP SImode
+#define void_UP VOIDmode
 
 #define UP(X) X##_UP
 
@@ -2226,6 +2240,10 @@ constant_arg:
           pat = GEN_FCN (icode) (target, op[0], op[1], op[2], op[3], op[4]);
           break;
 
+        case 6:
+          pat = GEN_FCN (icode) (target, op[0], op[1], op[2], op[3], op[4], op[5]);
+          break;
+
         default:
           gcc_unreachable ();
         }
@@ -2252,6 +2270,10 @@ constant_arg:
           pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3], op[4]);
           break;
 
+        case 6:
+          pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3], op[4], op[5]);
+          break;
+
         default:
           gcc_unreachable ();
         }
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 9a8166bf4316c82722b3a299c8a1fcda878561a3..4d6a3ed3d47952728c3c4c1a8bd5ec0b9274bb16 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -96,7 +96,7 @@ extern rtx neon_make_constant (rtx);
 extern tree arm_builtin_vectorized_function (unsigned int, tree, tree);
 extern void neon_expand_vector_init (rtx, rtx);
 extern void neon_lane_bounds (rtx, HOST_WIDE_INT, HOST_WIDE_INT, const_tree);
-extern void neon_const_bounds (rtx, HOST_WIDE_INT, HOST_WIDE_INT);
+extern void arm_const_bounds (rtx, HOST_WIDE_INT, HOST_WIDE_INT);
 extern HOST_WIDE_INT neon_element_bits (machine_mode);
 extern void neon_emit_pair_result_insn (machine_mode,
                                         rtx (*) (rtx, rtx, rtx, rtx),
@@ -176,6 +176,7 @@ extern void arm_expand_compare_and_swap (rtx op[]);
 extern void arm_split_compare_and_swap (rtx op[]);
 extern void arm_split_atomic_op (enum rtx_code, rtx, rtx, rtx, rtx, rtx, rtx);
 extern rtx arm_load_tp (rtx);
+extern bool arm_coproc_builtin_available (enum unspecv);
 
 #if defined TREE_CODE
 extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 81a1b85812860739c8b414e467476ab3c26cecd5..64599981961d80c5493a88f30743b98a138ca932 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -12206,7 +12206,7 @@ neon_lane_bounds (rtx operand, HOST_WIDE_INT low, HOST_WIDE_INT high,
 /* Bounds-check constants. */
 
 void
-neon_const_bounds (rtx operand, HOST_WIDE_INT low, HOST_WIDE_INT high)
+arm_const_bounds (rtx operand, HOST_WIDE_INT low, HOST_WIDE_INT high)
 {
   bounds_check (operand, low, high, NULL_TREE, "constant");
 }
@@ -30888,4 +30888,34 @@ arm_expand_divmod_libfunc (rtx libfunc, machine_mode mode,
   *rem_p = remainder;
 }
 
+/* This function checks for the availability of the coprocessor builtin passed
+   in BUILTIN for the current target. Returns true if it is available and
+   false otherwise. If a BUILTIN is passed for which this function has not
+   been implemented it will cause an exception. */
+
+bool
+arm_coproc_builtin_available (enum unspecv builtin)
+{
+  /* None of these builtins are available in Thumb mode if the target only
+     supports Thumb-1. */
+  if (TARGET_THUMB1)
+    return false;
+
+  switch (builtin)
+    {
+      case VUNSPEC_CDP:
+        if (arm_arch4)
+          return true;
+        break;
+      case VUNSPEC_CDP2:
+        /* Only present in ARMv5*, ARMv6 (but not ARMv6-M), ARMv7* and
+           ARMv8-{A,M}. */
+        if (arm_arch5)
+          return true;
+        break;
+      default:
+        gcc_unreachable ();
+    }
+  return false;
+}
 #include "gt-arm.h"
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 7a0ac7f8476cddb51cd93716af85f9cb25ef7090..b5325013c2179c06e0079f35a5c5bd0ae9388d4c 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -11919,6 +11919,26 @@
   DONE;
 })
 
+(define_insn "<cdp>"
+  [(unspec_volatile [(match_operand:SI 0 "immediate_operand" "n")
+                     (match_operand:SI 1 "immediate_operand" "n")
+                     (match_operand:SI 2 "immediate_operand" "n")
+                     (match_operand:SI 3 "immediate_operand" "n")
+                     (match_operand:SI 4 "immediate_operand" "n")
+                     (match_operand:SI 5 "immediate_operand" "n")] CDPI)]
+  "arm_coproc_builtin_available (VUNSPEC_<CDP>)"
+{
+  arm_const_bounds (operands[0], 0, 16);
+  arm_const_bounds (operands[1], 0, 16);
+  arm_const_bounds (operands[2], 0, (1 << 5));
+  arm_const_bounds (operands[3], 0, (1 << 5));
+  arm_const_bounds (operands[4], 0, (1 << 5));
+  arm_const_bounds (operands[5], 0, 8);
+  return "<cdp>\\tp%c0, %1, CR%c2, CR%c3, CR%c4, %5";
+}
+  [(set_attr "length" "4")
+   (set_attr "type" "coproc")])
+
 ;; Vector bits common to IWMMXT and Neon
 (include "vec-common.md")
 ;; Load the Intel Wireless Multimedia Extension patterns
diff --git a/gcc/config/arm/arm_acle.h b/gcc/config/arm/arm_acle.h
index 03cd197b6c4c56c419072d52c46e85a5f2eb98ba..08add2b7ac79f487dea92477d39b9db886a3f027 100644
--- a/gcc/config/arm/arm_acle.h
+++ b/gcc/config/arm/arm_acle.h
@@ -32,6 +32,26 @@
 extern "C" {
 #endif
 
+#if (!__thumb__ || __thumb2__) && __ARM_ARCH >= 4
+__extension__ static __inline void __attribute__ ((__always_inline__))
+__arm_cdp (const unsigned int __coproc, const unsigned int __opc1,
+           const unsigned int __CRd, const unsigned int __CRn,
+           const unsigned int __CRm, const unsigned int __opc2)
+{
+  return __builtin_arm_cdp (__coproc, __opc1, __CRd, __CRn, __CRm, __opc2);
+}
+
+#if __ARM_ARCH >= 5
+__extension__ static __inline void __attribute__ ((__always_inline__))
+__arm_cdp2 (const unsigned int __coproc, const unsigned int __opc1,
+            const unsigned int __CRd, const unsigned int __CRn,
+            const unsigned int __CRm, const unsigned int __opc2)
+{
+  return __builtin_arm_cdp2 (__coproc, __opc1, __CRd, __CRn, __CRm, __opc2);
+}
+#endif /* __ARM_ARCH >= 5. */
+#endif /* (!__thumb__ || __thumb2__) && __ARM_ARCH >= 4. */
+
 #ifdef __ARM_FEATURE_CRC32
 __extension__ static __inline uint32_t __attribute__ ((__always_inline__))
 __crc32b (uint32_t __a, uint8_t __b)
diff --git a/gcc/config/arm/arm_acle_builtins.def b/gcc/config/arm/arm_acle_builtins.def
index 81ab7720971ba042a5d64c22b6bd19710147e602..03b5bf88ef2632bceedba1e64c0f83bc50337364 100644
--- a/gcc/config/arm/arm_acle_builtins.def
+++ b/gcc/config/arm/arm_acle_builtins.def
@@ -24,3 +24,5 @@ VAR1 (UBINOP, crc32w, si)
 VAR1 (UBINOP, crc32cb, si)
 VAR1 (UBINOP, crc32ch, si)
 VAR1 (UBINOP, crc32cw, si)
+VAR1 (CDP, cdp, void)
+VAR1 (CDP, cdp2, void)
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 4f04f1cc0f45205d75ba3200607e074d0e1a96bb..86d6aa70e5766bc42a4209f14e929942ee63b773 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -943,3 +943,8 @@
 ;; Attributes for VFMA_LANE/ VFMS_LANE
 (define_int_attr neon_vfm_lane_as
  [(UNSPEC_VFMA_LANE "a") (UNSPEC_VFMS_LANE "s")])
+
+;; An iterator for the CDP coprocessor instructions
+(define_int_iterator CDPI [VUNSPEC_CDP VUNSPEC_CDP2])
+(define_int_attr cdp [(VUNSPEC_CDP "cdp") (VUNSPEC_CDP2 "cdp2")])
+(define_int_attr CDP [(VUNSPEC_CDP "CDP") (VUNSPEC_CDP2 "CDP2")])
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 40f3a32befef869a0899bf47aa33c25486a8d178..cf281df0292d0f511d7d63e828886d860a3a8201 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -3654,7 +3654,7 @@ if (BYTES_BIG_ENDIAN)
            VCVT_US_N))]
   "TARGET_NEON"
 {
-  neon_const_bounds (operands[2], 1, 33);
+  arm_const_bounds (operands[2], 1, 33);
   return "vcvt.%#32.f32\t%0, %1, %2";
 }
   [(set_attr "type" "neon_fp_to_int_")]
@@ -3668,7 +3668,7 @@ if (BYTES_BIG_ENDIAN)
            VCVT_US_N))]
   "TARGET_NEON_FP16INST"
 {
-  neon_const_bounds (operands[2], 0, 17);
+  arm_const_bounds (operands[2], 0, 17);
   return "vcvt.%#16.f16\t%0, %1, %2";
 }
   [(set_attr "type" "neon_fp_to_int_")]
@@ -3681,7 +3681,7 @@ if (BYTES_BIG_ENDIAN)
            VCVT_US_N))]
   "TARGET_NEON"
 {
-  neon_const_bounds (operands[2], 1, 33);
+  arm_const_bounds (operands[2], 1, 33);
   return "vcvt.f32.%#32\t%0, %1, %2";
 }
   [(set_attr "type" "neon_int_to_fp_")]
@@ -3695,7 +3695,7 @@ if (BYTES_BIG_ENDIAN)
            VCVT_US_N))]
   "TARGET_NEON_FP16INST"
 {
-  neon_const_bounds (operands[2], 0, 17);
+  arm_const_bounds (operands[2], 0, 17);
   return "vcvt.f16.%#16\t%0, %1, %2";
 }
   [(set_attr "type" "neon_int_to_fp_")]
@@ -4300,7 +4300,7 @@ if (BYTES_BIG_ENDIAN)
            UNSPEC_VEXT))]
   "TARGET_NEON"
 {
-  neon_const_bounds (operands[3], 0, GET_MODE_NUNITS (mode));
+  arm_const_bounds (operands[3], 0, GET_MODE_NUNITS (mode));
   return "vext.\t%0, %1, %2, %3";
 }
   [(set_attr "type" "neon_ext")]
@@ -4397,7 +4397,7 @@ if (BYTES_BIG_ENDIAN)
            VSHR_N))]
   "TARGET_NEON"
 {
-  neon_const_bounds (operands[2], 1, neon_element_bits (mode) + 1);
+  arm_const_bounds (operands[2], 1, neon_element_bits (mode) + 1);
   return "v.%#\t%0, %1, %2";
 }
   [(set_attr "type" "neon_shift_imm")]
@@ -4411,7 +4411,7 @@ if (BYTES_BIG_ENDIAN)
            VSHRN_N))]
   "TARGET_NEON"
 {
-  neon_const_bounds (operands[2], 1, neon_element_bits (mode) / 2 + 1);
+  arm_const_bounds (operands[2], 1, neon_element_bits (mode) / 2 + 1);
   return "v.\t%P0, %q1, %2";
 }
   [(set_attr "type" "neon_shift_imm_narrow_q")]
@@ -4425,7 +4425,7 @@ if (BYTES_BIG_ENDIAN)
            VQSHRN_N))]
   "TARGET_NEON"
 {
-  neon_const_bounds (operands[2], 1, neon_element_bits (mode) / 2 + 1);
+  arm_const_bounds (operands[2], 1, neon_element_bits (mode) / 2 + 1);
   return "v.%#\t%P0, %q1, %2";
 }
   [(set_attr "type" "neon_sat_shift_imm_narrow_q")]
@@ -4439,7 +4439,7 @@ if (BYTES_BIG_ENDIAN)
            VQSHRUN_N))]
   "TARGET_NEON"
 {
-  neon_const_bounds (operands[2], 1, neon_element_bits (mode) / 2 + 1);
+  arm_const_bounds (operands[2], 1, neon_element_bits (mode) / 2 + 1);
   return "v.\t%P0, %q1, %2";
 }
   [(set_attr "type" "neon_sat_shift_imm_narrow_q")]
@@ -4452,7 +4452,7 @@ if (BYTES_BIG_ENDIAN)
            UNSPEC_VSHL_N))]
   "TARGET_NEON"
 {
-  neon_const_bounds (operands[2], 0, neon_element_bits (mode));
+  arm_const_bounds (operands[2], 0, neon_element_bits (mode));
   return "vshl.\t%0, %1, %2";
 }
   [(set_attr "type" "neon_shift_imm")]
@@ -4465,7 +4465,7 @@ if (BYTES_BIG_ENDIAN)
            VQSHL_N))]
   "TARGET_NEON"
 {
-  neon_const_bounds (operands[2], 0, neon_element_bits (mode));
+  arm_const_bounds (operands[2], 0, neon_element_bits (mode));
   return "vqshl.%#\t%0, %1, %2";
 }
   [(set_attr "type" "neon_sat_shift_imm")]
@@ -4478,7 +4478,7 @@ if (BYTES_BIG_ENDIAN)
            UNSPEC_VQSHLU_N))]
   "TARGET_NEON"
 {
-  neon_const_bounds (operands[2], 0, neon_element_bits (mode));
+  arm_const_bounds (operands[2], 0, neon_element_bits (mode));
   return "vqshlu.\t%0, %1, %2";
 }
   [(set_attr "type" "neon_sat_shift_imm")]
@@ -4492,7 +4492,7 @@ if (BYTES_BIG_ENDIAN)
   "TARGET_NEON"
 {
   /* The boundaries are: 0 < imm <= size. */
-  neon_const_bounds (operands[2], 0, neon_element_bits (mode) + 1);
+  arm_const_bounds (operands[2], 0, neon_element_bits (mode) + 1);
   return "vshll.%#\t%q0, %P1, %2";
 }
   [(set_attr "type" "neon_shift_imm_long")]
@@ -4507,7 +4507,7 @@ if (BYTES_BIG_ENDIAN)
            VSRA_N))]
   "TARGET_NEON"
 {
-  neon_const_bounds (operands[3], 1, neon_element_bits (mode) + 1);
+  arm_const_bounds (operands[3], 1, neon_element_bits (mode) + 1);
   return "v.%#\t%0, %2, %3";
 }
   [(set_attr "type" "neon_shift_acc")]
@@ -4521,7 +4521,7 @@ if (BYTES_BIG_ENDIAN)
            UNSPEC_VSRI))]
   "TARGET_NEON"
 {
-  neon_const_bounds (operands[3], 1, neon_element_bits (mode) + 1);
+  arm_const_bounds (operands[3], 1, neon_element_bits (mode) + 1);
   return "vsri.\t%0, %2, %3";
 }
   [(set_attr "type" "neon_shift_reg")]
@@ -4535,7 +4535,7 @@ if (BYTES_BIG_ENDIAN)
            UNSPEC_VSLI))]
   "TARGET_NEON"
 {
-  neon_const_bounds (operands[3], 0, neon_element_bits (mode));
+  arm_const_bounds (operands[3], 0, neon_element_bits (mode));
   return "vsli.\t%0, %2, %3";
 }
   [(set_attr "type" "neon_shift_reg")]
diff --git a/gcc/config/arm/types.md b/gcc/config/arm/types.md
index 3de138ca3c8960d4bf93e4db4d27413099bc4f72..b0b375c6ddfbe69fff9abc3bdb6bcd592dd341f2 100644
--- a/gcc/config/arm/types.md
+++ b/gcc/config/arm/types.md
@@ -539,6 +539,10 @@
 ; crypto_sha1_slow
 ; crypto_sha256_fast
 ; crypto_sha256_slow
+;
+; The classification below is for coprocessor instructions
+;
+; coproc
 
 (define_attr "type"
  "adc_imm,\
@@ -1073,7 +1077,8 @@
   crypto_sha1_fast,\
   crypto_sha1_slow,\
   crypto_sha256_fast,\
-  crypto_sha256_slow"
+  crypto_sha256_slow,\
+  coproc"
    (const_string "untyped"))
 
 ; Is this an (integer side) multiply with a 32-bit (or smaller) result?
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 719ea08c0c44d71b5f4ee6c7ac40e118d7dae60f..01dd700a0af8043ce40ada939f9b0c34d846eded 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -150,6 +150,8 @@
   VUNSPEC_GET_FPSCR ; Represent fetch of FPSCR content.
   VUNSPEC_SET_FPSCR ; Represent assign of FPSCR content.
   VUNSPEC_PROBE_STACK_RANGE ; Represent stack range probing.
+  VUNSPEC_CDP ; Represent the coprocessor cdp instruction.
+  VUNSPEC_CDP2 ; Represent the coprocessor cdp2 instruction.
 ])
 
 ;; Enumerators for NEON unspecs.
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index 29f62e51097d634357997bc9245bbb9952affc92..befdea9edd90f1b8c6cb81cb833c07bd2454fa80 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -1886,7 +1886,7 @@
           (float_truncate:HF (float:SF (match_dup 0))))]
   "TARGET_VFP_FP16INST"
 {
-  neon_const_bounds (operands[2], 1, 33);
+  arm_const_bounds (operands[2], 1, 33);
   return "vcvt.f16.32\t%0, %0, %2\;vmov.f32\t%3, %0";
 }
   [(set_attr "conds" "unconditional")
@@ -1903,7 +1903,7 @@
 {
   rtx op1 = gen_reg_rtx (SImode);
 
-  neon_const_bounds (operands[2], 1, 33);
+  arm_const_bounds (operands[2], 1, 33);
 
   emit_move_insn (op1, operands[1]);
   emit_insn (gen_neon_vcvth_nhf_unspec (op1, op1, operands[2],
@@ -1927,7 +1927,7 @@
            VCVT_SI_US_N))]
   "TARGET_VFP_FP16INST"
 {
-  neon_const_bounds (operands[2], 1, 33);
+  arm_const_bounds (operands[2], 1, 33);
   return "vmov.f32\t%0, %1\;vcvt.%#32.f16\t%0, %0, %2";
 }
   [(set_attr "conds" "unconditional")
@@ -1945,7 +1945,7 @@
 {
   rtx op1 = gen_reg_rtx (SImode);
 
-  neon_const_bounds (operands[2], 1, 33);
+  arm_const_bounds (operands[2], 1, 33);
   emit_insn (gen_neon_vcvth_nsi_unspec (op1, operands[1], operands[2]));
   emit_move_insn (operands[0], op1);
   DONE;
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 30bdcf07ad8b1bd086bbb87acb36fb0333944087..e85da3a03130f14a91c4cc5931fe275b11509939 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -12625,8 +12625,9 @@ The built-in intrinsics for the Advanced SIMD extension are available when
 NEON is enabled.
 
 Currently, ARM and AArch64 back ends do not support ACLE 2.0 fully. Both
-back ends support CRC32 intrinsics from @file{arm_acle.h}. The ARM back end's
-16-bit floating-point Advanced SIMD intrinsics currently comply to ACLE v1.1.
+back ends support CRC32 intrinsics and the ARM back end supports the
+Coprocessor intrinsics, all from @file{arm_acle.h}. The ARM back end's 16-bit
+floating-point Advanced SIMD intrinsics currently comply to ACLE v1.1.
 AArch64's back end does not have support for 16-bit floating point Advanced SIMD
 intrinsics yet.
 
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 204518d38daa3e0545f150bb3c4a8a1caee9330a..292a3c7e0a4d29650510db0685cb8d09411d3f7c 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1678,6 +1678,25 @@ div instruction.
 ARM target supports ARMv8-M Security Extensions, enabled by the @code{-mcmse}
 option.
 
+@item arm_coproc1_ok
+@anchor{arm_coproc1_ok}
+ARM target supports the following coprocessor instructions: @code{CDP},
+@code{LDC}, @code{STC}, @code{MCR} and @code{MRC}.
+
+@item arm_coproc2_ok
+@anchor{arm_coproc2_ok}
+ARM target supports all the coprocessor instructions also listed as supported
+in @ref{arm_coproc1_ok} in addition to the following: @code{CDP2}, @code{LDC2},
+@code{LDC2l}, @code{STC2}, @code{STC2l}, @code{MCR2} and @code{MRC2}.
+
+@item arm_coproc3_ok
+@anchor{arm_coproc3_ok}
+ARM target supports all the coprocessor instructions also listed as supported
+in @ref{arm_coproc2_ok} in addition to the following: @code{MCRR} and @code{MRRC}.
+
+@item arm_coproc4_ok
+ARM target supports all the coprocessor instructions also listed as supported
+in @ref{arm_coproc3_ok} in addition to the following: @code{MCRR2} and @code{MRRC2}.
 @end table
 
 @subsubsection AArch64-specific attributes
diff --git a/gcc/testsuite/gcc.target/arm/acle/acle.exp b/gcc/testsuite/gcc.target/arm/acle/acle.exp
index c05080ebf1953b3443823a6665ccd0a1a09edb3a..aebf71cfbae594d951960c9ebfd3608003f7df78 100644
--- a/gcc/testsuite/gcc.target/arm/acle/acle.exp
+++ b/gcc/testsuite/gcc.target/arm/acle/acle.exp
@@ -27,9 +27,26 @@ load_lib gcc-dg.exp
 # Initialize `dg'.
 dg-init
 
+set saved-dg-do-what-default ${dg-do-what-default}
+set dg-do-what-default "assemble"
+
+set saved-lto_torture_options ${LTO_TORTURE_OPTIONS}
+
+# Add -ffat-lto-objects option to all LTO options such that we can do assembly
+# scans.
+proc add_fat_objects { list } {
+    set res {}
+    foreach el $list {set res [lappend res [concat $el " -ffat-lto-objects"]]}
+    return $res
+};
+set LTO_TORTURE_OPTIONS [add_fat_objects ${LTO_TORTURE_OPTIONS}]
+
 # Main loop.
-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cCS\]]] \
+gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cCS\]]] \
         "" ""
 
+# Restore globals
+set dg-do-what-default ${saved-dg-do-what-default}
+set LTO_TORTURE_OPTIONS ${saved-lto_torture_options}
 # All done.
 dg-finish
diff --git a/gcc/testsuite/gcc.target/arm/acle/cdp.c b/gcc/testsuite/gcc.target/arm/acle/cdp.c
new file mode 100644
index 0000000000000000000000000000000000000000..28b218e7cfcdb7d6ce1381feb4c6dea3ff08a620
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/acle/cdp.c
@@ -0,0 +1,14 @@
+/* Test the cdp ACLE intrinsic. */
+
+/* { dg-do assemble } */
+/* { dg-options "-save-temps" } */
+/* { dg-require-effective-target arm_coproc1_ok } */
+
+#include "arm_acle.h"
+
+void test_cdp (void)
+{
+  __arm_cdp (10, 1, 2, 3, 4, 5);
+}
+
+/* { dg-final { scan-assembler "cdp\tp10, #1, CR2, CR3, CR4, #5\n" } } */
diff --git a/gcc/testsuite/gcc.target/arm/acle/cdp2.c b/gcc/testsuite/gcc.target/arm/acle/cdp2.c
new file mode 100644
index 0000000000000000000000000000000000000000..00bcd502b563cfe6df1e5d4c2e53f8034063d47e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/acle/cdp2.c
@@ -0,0 +1,14 @@
+/* Test the cdp2 ACLE intrinsic. */
+
+/* { dg-do assemble } */
+/* { dg-options "-save-temps" } */
+/* { dg-require-effective-target arm_coproc2_ok } */
+
+#include "arm_acle.h"
+
+void test_cdp2 (void)
+{
+  __arm_cdp2 (10, 4, 3, 2, 1, 0);
+}
+
+/* { dg-final { scan-assembler "cdp2\tp10, #4, CR3, CR2, CR1, #0\n" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index e4e015e721620e649b879aa398a59c550b5cbac8..342304da4b5fd02c70956496bcd03cdabaf78b01 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -8239,3 +8239,78 @@ proc check_effective_target_store_merge { } {
 
     return 0
 }
+
+# Return 1 if the target supports coprocessor instructions: cdp, ldc, stc, mcr and
+# mrc.
+proc check_effective_target_arm_coproc1_ok_nocache { } {
+    if { ![istarget arm*-*-*] } {
+        return 0
+    }
+    return [check_no_compiler_messages_nocache arm_coproc1_ok assembly {
+        #if (__thumb__ && !__thumb2__) || __ARM_ARCH < 4
+        #error FOO
+        #endif
+    }]
+}
+
+proc check_effective_target_arm_coproc1_ok { } {
+    return [check_cached_effective_target arm_coproc1_ok \
+                check_effective_target_arm_coproc1_ok_nocache]
+}
+
+# Return 1 if the target supports all coprocessor instructions checked by
+# check_effective_target_arm_coproc1_ok in addition to the following: cdp2,
+# ldc2, ldc2l, stc2, stc2l, mcr2 and mrc2.
+proc check_effective_target_arm_coproc2_ok_nocache { } {
+    if { ![check_effective_target_arm_coproc1_ok] } {
+        return 0
+    }
+    return [check_no_compiler_messages_nocache arm_coproc2_ok assembly {
+        #if __ARM_ARCH < 5
+        #error FOO
+        #endif
+    }]
+}
+
+proc check_effective_target_arm_coproc2_ok { } {
+    return [check_cached_effective_target arm_coproc2_ok \
+                check_effective_target_arm_coproc2_ok_nocache]
+}
+
+# Return 1 if the target supports all coprocessor instructions checked by
+# check_effective_target_arm_coproc2_ok in addition to the following: mcrr and
+# mrrc.
+proc check_effective_target_arm_coproc3_ok_nocache { } {
+    if { ![check_effective_target_arm_coproc2_ok] } {
+        return 0
+    }
+    return [check_no_compiler_messages_nocache arm_coproc3_ok assembly {
+        #if __ARM_ARCH < 6 && !defined (__ARM_ARCH_5TE__)
+        #error FOO
+        #endif
+    }]
+}
+
+proc check_effective_target_arm_coproc3_ok { } {
+    return [check_cached_effective_target arm_coproc3_ok \
+                check_effective_target_arm_coproc3_ok_nocache]
+}
+
+# Return 1 if the target supports all coprocessor instructions checked by
+# check_effective_target_arm_coproc3_ok in addition to the following: mcrr2 and
+# mrrc2.
+proc check_effective_target_arm_coproc4_ok_nocache { } {
+    if { ![check_effective_target_arm_coproc3_ok] } {
+        return 0
+    }
+    return [check_no_compiler_messages_nocache arm_coproc4_ok assembly {
+        #if __ARM_ARCH < 6
+        #error FOO
+        #endif
+    }]
+}
+
+proc check_effective_target_arm_coproc4_ok { } {
+    return [check_cached_effective_target arm_coproc4_ok \
+                check_effective_target_arm_coproc4_ok_nocache]
+}
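
For readers who want to see the operand checks and the expected assembly text outside the compiler, here is a minimal standalone C sketch. It is not part of the patch. `const_in_bounds` and `format_cdp` are hypothetical helpers invented for illustration. The sketch assumes the upper bound passed to arm_const_bounds is exclusive, which is inferred from bounds such as (0, 16) for a coprocessor number documented as [0-15]. The output format simply mirrors the strings that cdp.c and cdp2.c scan for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for arm_const_bounds.  Assumption: LOW is inclusive
   and HIGH is exclusive, as suggested by the (0, 16) bound used for a
   coprocessor number documented as [0-15].  */
bool
const_in_bounds (unsigned int value, unsigned int low, unsigned int high)
{
  return value >= low && value < high;
}

/* Hypothetical helper: check the six CDP operands against the ranges used in
   the patch and, if they all pass, write the assembly text the tests expect
   into BUF.  Returns the number of characters written, or -1 if an operand
   is out of range or BUF is too small.  */
int
format_cdp (char *buf, size_t size, const char *mnemonic,
            unsigned int coproc, unsigned int opc1, unsigned int crd,
            unsigned int crn, unsigned int crm, unsigned int opc2)
{
  int n;

  if (!const_in_bounds (coproc, 0, 16)
      || !const_in_bounds (opc1, 0, 16)
      || !const_in_bounds (crd, 0, 1 << 5)
      || !const_in_bounds (crn, 0, 1 << 5)
      || !const_in_bounds (crm, 0, 1 << 5)
      || !const_in_bounds (opc2, 0, 8))
    return -1;

  /* Same shape as the insn template: mnemonic, tab, pN, #opc1, CRd, CRn,
     CRm, #opc2.  */
  n = snprintf (buf, size, "%s\tp%u, #%u, CR%u, CR%u, CR%u, #%u",
                mnemonic, coproc, opc1, crd, crn, crm, opc2);
  return (n < 0 || (size_t) n >= size) ? -1 : n;
}
```

With a 64-byte buffer, format_cdp (buf, sizeof buf, "cdp", 10, 1, 2, 3, 4, 5) fills buf with the text matched in cdp.c (without the trailing newline), while a coprocessor number of 16 or an opc2 of 8 is rejected.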