From patchwork Tue Dec 5 10:25:01 2023
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 1872052
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Cc: Richard Sandiford
Subject: [pushed v2 3/5] aarch64: Add svboolx2_t
Date: Tue, 5 Dec 2023 10:25:01 +0000
Message-Id: <20231205102503.1923331-4-richard.sandiford@arm.com>
In-Reply-To: <20231205102503.1923331-1-richard.sandiford@arm.com>
References: <20231205102503.1923331-1-richard.sandiford@arm.com>

SME2 has some instructions that operate on pairs of predicates.
The SME2 ACLE defines an svboolx2_t type for the associated
intrinsics.
The patch uses a double-width predicate mode, VNx32BI, to represent
the contents, similarly to how data vector tuples work.  At present
there doesn't seem to be any need to define pairs for VNx2BI,
VNx4BI and VNx8BI.

We already supported pairs of svbool_ts at the PCS level, as part
of a more general framework.  All that changes on the PCS side is
that we now have an associated mode.

gcc/
	* config/aarch64/aarch64-modes.def (VNx32BI): New mode.
	* config/aarch64/aarch64-protos.h (aarch64_split_double_move):
	Declare.
	* config/aarch64/aarch64-sve-builtins.cc (register_tuple_type):
	Handle tuples of predicates.
	(handle_arm_sve_h): Define svboolx2_t as a pair of two svbool_ts.
	* config/aarch64/aarch64-sve.md (movvnx32bi): New insn.
	* config/aarch64/aarch64.cc
	(pure_scalable_type_info::piece::get_rtx): Use VNx32BI for pairs
	of predicates.
	(pure_scalable_type_info::add_piece): Don't try to form pairs of
	predicates.
	(VEC_STRUCT): Generalize comment.
	(aarch64_classify_vector_mode): Handle VNx32BI.
	(aarch64_array_mode): Likewise.  Return BLKmode for arrays of
	predicates that have no associated mode, rather than allowing
	an integer mode to be chosen.
	(aarch64_hard_regno_nregs): Handle VNx32BI.
	(aarch64_hard_regno_mode_ok): Likewise.
	(aarch64_split_double_move): New function, split out from...
	(aarch64_split_128bit_move): ...here.
	(aarch64_ptrue_reg): Tighten assert to aarch64_sve_pred_mode_p.
	(aarch64_pfalse_reg): Likewise.
	(aarch64_sve_same_pred_for_ptest_p): Likewise.
	(aarch64_sme_mode_switch_regs::add_reg): Handle VNx32BI.
	(aarch64_expand_mov_immediate): Restrict handling of boolean
	vector constants to single-predicate modes.
	(aarch64_classify_address): Handle VNx32BI, ensuring that both
	halves can be addressed.
	(aarch64_class_max_nregs): Handle VNx32BI.
	(aarch64_member_type_forces_blk): Don't force BLKmode for
	svboolx2_t.
	(aarch64_simd_valid_immediate): Allow all-zeros and all-ones for
	VNx32BI.
	(aarch64_mov_operand_p): Restrict predicate constant
	canonicalization to single-predicate modes.
	(aarch64_evpc_ext): Generalize exclusion to all predicate modes.
	(aarch64_evpc_rev_local, aarch64_evpc_dup): Likewise.
	* config/aarch64/constraints.md (Up2): New constraint.

gcc/testsuite/
	* gcc.target/aarch64/sve/pcs/struct_3_128.c (test_nonpst3):
	Adjust stack offsets.
	(ret_nonpst3): Remove XFAIL.
	* gcc.target/aarch64/sve/acle/general-c/svboolx2_1.c: New test.
---
 gcc/config/aarch64/aarch64-modes.def          |   3 +
 gcc/config/aarch64/aarch64-protos.h           |   1 +
 gcc/config/aarch64/aarch64-sve-builtins.cc    |  18 ++-
 gcc/config/aarch64/aarch64-sve.md             |  22 +++
 gcc/config/aarch64/aarch64.cc                 | 136 ++++++++++++------
 gcc/config/aarch64/constraints.md             |   4 +
 .../aarch64/sve/acle/general-c/svboolx2_1.c   | 135 +++++++++++++++++
 .../gcc.target/aarch64/sve/pcs/struct_3_128.c |   6 +-
 8 files changed, 272 insertions(+), 53 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/svboolx2_1.c

diff --git a/gcc/config/aarch64/aarch64-modes.def b/gcc/config/aarch64/aarch64-modes.def
index a3efc5b8484..ffca5517dec 100644
--- a/gcc/config/aarch64/aarch64-modes.def
+++ b/gcc/config/aarch64/aarch64-modes.def
@@ -48,16 +48,19 @@ ADJUST_FLOAT_FORMAT (HF, &ieee_half_format);
 
 /* Vector modes.  */
 
+VECTOR_BOOL_MODE (VNx32BI, 32, BI, 4);
 VECTOR_BOOL_MODE (VNx16BI, 16, BI, 2);
 VECTOR_BOOL_MODE (VNx8BI, 8, BI, 2);
 VECTOR_BOOL_MODE (VNx4BI, 4, BI, 2);
 VECTOR_BOOL_MODE (VNx2BI, 2, BI, 2);
 
+ADJUST_NUNITS (VNx32BI, aarch64_sve_vg * 16);
 ADJUST_NUNITS (VNx16BI, aarch64_sve_vg * 8);
 ADJUST_NUNITS (VNx8BI, aarch64_sve_vg * 4);
 ADJUST_NUNITS (VNx4BI, aarch64_sve_vg * 2);
 ADJUST_NUNITS (VNx2BI, aarch64_sve_vg);
 
+ADJUST_ALIGNMENT (VNx32BI, 2);
 ADJUST_ALIGNMENT (VNx16BI, 2);
 ADJUST_ALIGNMENT (VNx8BI, 2);
 ADJUST_ALIGNMENT (VNx4BI, 2);
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index ce7046b050e..25a9103f0e7 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -947,6 +947,7 @@ rtx aarch64_simd_expand_builtin (int, tree, rtx);
 void aarch64_simd_lane_bounds (rtx, HOST_WIDE_INT, HOST_WIDE_INT,
			       const_tree);
 rtx aarch64_endian_lane_rtx (machine_mode, unsigned int);
+void aarch64_split_double_move (rtx, rtx, machine_mode);
 void aarch64_split_128bit_move (rtx, rtx);
 bool aarch64_split_128bit_move_p (rtx, rtx);
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
index e32f0f8f903..7e4b9e67ed8 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -3949,6 +3949,9 @@ register_vector_type (vector_type_index type)
 static void
 register_tuple_type (unsigned int num_vectors, vector_type_index type)
 {
+  tree vector_type = acle_vector_types[0][type];
+  bool is_pred = GET_MODE_CLASS (TYPE_MODE (vector_type)) == MODE_VECTOR_BOOL;
+
   /* Work out the structure name.  */
   char buffer[sizeof ("svbfloat16x4_t")];
   const char *vector_type_name = vector_types[type].acle_name;
@@ -3969,17 +3972,19 @@ register_tuple_type (unsigned int num_vectors, vector_type_index type)
      Using arrays simplifies the handling of svget and svset for variable
      arguments.  */
-  tree vector_type = acle_vector_types[0][type];
   tree array_type = build_array_type_nelts (vector_type, num_vectors);
   gcc_assert (VECTOR_MODE_P (TYPE_MODE (array_type))
	      && TYPE_MODE_RAW (array_type) == TYPE_MODE (array_type)
-	      && TYPE_ALIGN (array_type) == 128);
+	      && TYPE_ALIGN (array_type) == (is_pred ? 16 : 128));
 
   tree tuple_type = wrap_type_in_struct (array_type);
-  add_sve_type_attribute (tuple_type, num_vectors, 0, NULL, buffer);
+  if (is_pred)
+    add_sve_type_attribute (tuple_type, 0, num_vectors, NULL, buffer);
+  else
+    add_sve_type_attribute (tuple_type, num_vectors, 0, NULL, buffer);
   gcc_assert (VECTOR_MODE_P (TYPE_MODE (tuple_type))
	      && TYPE_MODE_RAW (tuple_type) == TYPE_MODE (tuple_type)
-	      && TYPE_ALIGN (tuple_type) == 128);
+	      && TYPE_ALIGN (tuple_type) == TYPE_ALIGN (array_type));
 
   register_type_decl (tuple_type, buffer);
 
@@ -4031,9 +4036,10 @@ handle_arm_sve_h ()
     {
       vector_type_index type = vector_type_index (type_i);
       register_vector_type (type);
-      if (scalar_types[type_i] != boolean_type_node)
+      if (type != VECTOR_TYPE_svcount_t)
	for (unsigned int count = 2; count <= MAX_TUPLE_SIZE; ++count)
-	  register_tuple_type (count, type);
+	  if (type != VECTOR_TYPE_svbool_t || count == 2)
+	    register_tuple_type (count, type);
     }
 
   /* Define the enums.  */
diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index 3f48e4cdf26..3729c67eb69 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -33,6 +33,7 @@
 ;; ---- Moves of single vectors
 ;; ---- Moves of multiple vectors
 ;; ---- Moves of predicates
+;; ---- Moves of multiple predicates
 ;; ---- Moves relating to the FFR
 ;;
 ;; == Loads
@@ -1069,6 +1070,27 @@ (define_insn_and_rewrite "*aarch64_sve_ptrue_ptest"
 }
 )
 
+;; -------------------------------------------------------------------------
+;; ---- Moves of multiple predicates
+;; -------------------------------------------------------------------------
+
+(define_insn_and_split "movvnx32bi"
+  [(set (match_operand:VNx32BI 0 "nonimmediate_operand")
+	(match_operand:VNx32BI 1 "aarch64_mov_operand"))]
+  "TARGET_SVE"
+  {@ [ cons: =0 , 1   ]
+     [ Upa      , Upa ] #
+     [ Upa      , m   ] #
+     [ m        , Upa ] #
+  }
+  "&& reload_completed"
+  [(const_int 0)]
+  {
+    aarch64_split_double_move (operands[0], operands[1], VNx16BImode);
+    DONE;
+  }
+)
+
 ;; -------------------------------------------------------------------------
 ;; ---- Moves relating to the FFR
 ;; -------------------------------------------------------------------------
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 48b7811c100..b29d56b3743 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -846,7 +846,7 @@ pure_scalable_type_info::piece::get_rtx (unsigned int first_zr,
   if (num_zr > 0 && num_pr == 0)
     return gen_rtx_REG (mode, first_zr);
 
-  if (num_zr == 0 && num_pr == 1)
+  if (num_zr == 0 && num_pr <= 2)
     return gen_rtx_REG (mode, first_pr);
 
   gcc_unreachable ();
@@ -1069,6 +1069,7 @@ pure_scalable_type_info::add_piece (const piece &p)
       gcc_assert (VECTOR_MODE_P (p.mode) && VECTOR_MODE_P (prev.mode));
       unsigned int nelems1, nelems2;
       if (prev.orig_mode == p.orig_mode
+	  && GET_MODE_CLASS (p.orig_mode) != MODE_VECTOR_BOOL
	  && known_eq (prev.offset + GET_MODE_SIZE (prev.mode), p.offset)
	  && constant_multiple_p (GET_MODE_NUNITS (prev.mode),
				  GET_MODE_NUNITS (p.orig_mode), &nelems1)
@@ -1370,8 +1371,7 @@ aarch64_sve_pred_mode_p (machine_mode mode)
 const unsigned int VEC_ADVSIMD  = 1;
 const unsigned int VEC_SVE_DATA = 2;
 const unsigned int VEC_SVE_PRED = 4;
-/* Can be used in combination with VEC_ADVSIMD or VEC_SVE_DATA to indicate
-   a structure of 2, 3 or 4 vectors.  */
+/* Indicates a structure of 2, 3 or 4 vectors or predicates.  */
 const unsigned int VEC_STRUCT   = 8;
 /* Can be used in combination with VEC_SVE_DATA to indicate that the
    vector has fewer significant bytes than a full SVE vector.  */
@@ -1534,6 +1534,9 @@ aarch64_classify_vector_mode (machine_mode mode, bool any_target_p = false)
     case E_V2DFmode:
       return (TARGET_FLOAT || any_target_p) ? VEC_ADVSIMD : 0;
 
+    case E_VNx32BImode:
+      return TARGET_SVE ? VEC_SVE_PRED | VEC_STRUCT : 0;
+
     default:
       return 0;
     }
@@ -1661,12 +1664,24 @@ aarch64_sve_data_mode (scalar_mode inner_mode, poly_uint64 nunits)
 static opt_machine_mode
 aarch64_array_mode (machine_mode mode, unsigned HOST_WIDE_INT nelems)
 {
-  if (aarch64_classify_vector_mode (mode) == VEC_SVE_DATA
-      && IN_RANGE (nelems, 2, 4))
+  if (TARGET_SVE && GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL)
+    {
+      /* Use VNx32BI for pairs of predicates, but explicitly reject giving
+	 a mode to other array sizes.  Using integer modes requires a round
+	 trip through memory and generates terrible code.  */
+      if (nelems == 1)
+	return mode;
+      if (mode == VNx16BImode && nelems == 2)
+	return VNx32BImode;
+      return BLKmode;
+    }
+
+  auto flags = aarch64_classify_vector_mode (mode);
+  if (flags == VEC_SVE_DATA && IN_RANGE (nelems, 2, 4))
     return aarch64_sve_data_mode (GET_MODE_INNER (mode),
				  GET_MODE_NUNITS (mode) * nelems);
-  if (aarch64_classify_vector_mode (mode) == VEC_ADVSIMD
-      && IN_RANGE (nelems, 2, 4))
+
+  if (flags == VEC_ADVSIMD && IN_RANGE (nelems, 2, 4))
     return aarch64_advsimd_vector_array_mode (mode, nelems);
 
   return opt_machine_mode ();
@@ -1886,13 +1901,17 @@ aarch64_hard_regno_nregs (unsigned regno, machine_mode mode)
	  return GET_MODE_SIZE (mode).to_constant () / 8;
	return CEIL (lowest_size, UNITS_PER_VREG);
       }
+
     case PR_REGS:
     case PR_LO_REGS:
     case PR_HI_REGS:
+      return mode == VNx32BImode ? 2 : 1;
+
     case FFR_REGS:
     case PR_AND_FFR_REGS:
     case FAKE_REGS:
       return 1;
+
     default:
       return CEIL (lowest_size, UNITS_PER_WORD);
     }
@@ -1916,9 +1935,12 @@ aarch64_hard_regno_mode_ok (unsigned regno, machine_mode mode)
     return mode == DImode;
 
   unsigned int vec_flags = aarch64_classify_vector_mode (mode);
-  if (vec_flags & VEC_SVE_PRED)
+  if (vec_flags == VEC_SVE_PRED)
     return pr_or_ffr_regnum_p (regno);
 
+  if (vec_flags == (VEC_SVE_PRED | VEC_STRUCT))
+    return PR_REGNUM_P (regno);
+
   if (pr_or_ffr_regnum_p (regno))
     return false;
 
@@ -3000,6 +3022,33 @@ aarch64_emit_binop (rtx dest, optab binoptab, rtx op0, rtx op1)
     emit_move_insn (dest, tmp);
 }
 
+/* Split a move from SRC to DST into two moves of mode SINGLE_MODE.  */
+
+void
+aarch64_split_double_move (rtx dst, rtx src, machine_mode single_mode)
+{
+  machine_mode mode = GET_MODE (dst);
+
+  rtx dst0 = simplify_gen_subreg (single_mode, dst, mode, 0);
+  rtx dst1 = simplify_gen_subreg (single_mode, dst, mode,
+				  GET_MODE_SIZE (single_mode));
+  rtx src0 = simplify_gen_subreg (single_mode, src, mode, 0);
+  rtx src1 = simplify_gen_subreg (single_mode, src, mode,
+				  GET_MODE_SIZE (single_mode));
+
+  /* At most one pairing may overlap.  */
+  if (reg_overlap_mentioned_p (dst0, src1))
+    {
+      aarch64_emit_move (dst1, src1);
+      aarch64_emit_move (dst0, src0);
+    }
+  else
+    {
+      aarch64_emit_move (dst0, src0);
+      aarch64_emit_move (dst1, src1);
+    }
+}
+
 /* Split a 128-bit move operation into two 64-bit move operations,
    taking care to handle partial overlap of register to register
    copies.  Special cases are needed when moving between GP regs and
@@ -3009,9 +3058,6 @@ aarch64_emit_binop (rtx dest, optab binoptab, rtx op0, rtx op1)
 void
 aarch64_split_128bit_move (rtx dst, rtx src)
 {
-  rtx dst_lo, dst_hi;
-  rtx src_lo, src_hi;
-
   machine_mode mode = GET_MODE (dst);
 
   gcc_assert (mode == TImode || mode == TFmode || mode == TDmode);
@@ -3026,8 +3072,8 @@ aarch64_split_128bit_move (rtx dst, rtx src)
       /* Handle FP <-> GP regs.  */
       if (FP_REGNUM_P (dst_regno) && GP_REGNUM_P (src_regno))
	{
-	  src_lo = gen_lowpart (word_mode, src);
-	  src_hi = gen_highpart (word_mode, src);
+	  rtx src_lo = gen_lowpart (word_mode, src);
+	  rtx src_hi = gen_highpart (word_mode, src);
 
	  emit_insn (gen_aarch64_movlow_di (mode, dst, src_lo));
	  emit_insn (gen_aarch64_movhigh_di (mode, dst, src_hi));
@@ -3035,8 +3081,8 @@ aarch64_split_128bit_move (rtx dst, rtx src)
	}
       else if (GP_REGNUM_P (dst_regno) && FP_REGNUM_P (src_regno))
	{
-	  dst_lo = gen_lowpart (word_mode, dst);
-	  dst_hi = gen_highpart (word_mode, dst);
+	  rtx dst_lo = gen_lowpart (word_mode, dst);
+	  rtx dst_hi = gen_highpart (word_mode, dst);
 
	  emit_insn (gen_aarch64_movdi_low (mode, dst_lo, src));
	  emit_insn (gen_aarch64_movdi_high (mode, dst_hi, src));
@@ -3044,22 +3090,7 @@ aarch64_split_128bit_move (rtx dst, rtx src)
	}
     }
 
-  dst_lo = gen_lowpart (word_mode, dst);
-  dst_hi = gen_highpart (word_mode, dst);
-  src_lo = gen_lowpart (word_mode, src);
-  src_hi = gen_highpart_mode (word_mode, mode, src);
-
-  /* At most one pairing may overlap.  */
-  if (reg_overlap_mentioned_p (dst_lo, src_hi))
-    {
-      aarch64_emit_move (dst_hi, src_hi);
-      aarch64_emit_move (dst_lo, src_lo);
-    }
-  else
-    {
-      aarch64_emit_move (dst_lo, src_lo);
-      aarch64_emit_move (dst_hi, src_hi);
-    }
+  aarch64_split_double_move (dst, src, word_mode);
 }
 
 /* Return true if we should split a move from 128-bit value SRC
@@ -3325,7 +3356,7 @@ aarch64_ptrue_all (unsigned int elt_size)
 rtx
 aarch64_ptrue_reg (machine_mode mode)
 {
-  gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL);
+  gcc_assert (aarch64_sve_pred_mode_p (mode));
   rtx reg = force_reg (VNx16BImode, CONSTM1_RTX (VNx16BImode));
   return gen_lowpart (mode, reg);
 }
@@ -3335,7 +3366,7 @@ aarch64_ptrue_reg (machine_mode mode)
 rtx
 aarch64_pfalse_reg (machine_mode mode)
 {
-  gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL);
+  gcc_assert (aarch64_sve_pred_mode_p (mode));
   rtx reg = force_reg (VNx16BImode, CONST0_RTX (VNx16BImode));
   return gen_lowpart (mode, reg);
 }
@@ -3351,7 +3382,7 @@ aarch64_pfalse_reg (machine_mode mode)
 bool
 aarch64_sve_same_pred_for_ptest_p (rtx *pred1, rtx *pred2)
 {
   machine_mode mode = GET_MODE (pred1[0]);
-  gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL
+  gcc_assert (aarch64_sve_pred_mode_p (mode)
	      && mode == GET_MODE (pred2[0])
	      && aarch64_sve_ptrue_flag (pred1[1], SImode)
	      && aarch64_sve_ptrue_flag (pred2[1], SImode));
@@ -4824,7 +4855,9 @@ aarch64_sme_mode_switch_regs::add_reg (machine_mode mode, unsigned int regno)
   machine_mode submode = mode;
   if (vec_flags & VEC_STRUCT)
     {
-      if (vec_flags & VEC_SVE_DATA)
+      if (vec_flags & VEC_SVE_PRED)
+	submode = VNx16BImode;
+      else if (vec_flags & VEC_SVE_DATA)
	submode = SVE_BYTE_MODE;
       else if (vec_flags & VEC_PARTIAL)
	submode = V8QImode;
@@ -4833,7 +4866,7 @@ aarch64_sme_mode_switch_regs::add_reg (machine_mode mode, unsigned int regno)
     }
   save_location loc;
   loc.reg = gen_rtx_REG (submode, regno);
-  if (vec_flags == VEC_SVE_PRED)
+  if (vec_flags & VEC_SVE_PRED)
     {
       gcc_assert (PR_REGNUM_P (regno));
       loc.group = MEM_SVE_PRED;
@@ -5845,7 +5878,7 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
 
   if (!CONST_INT_P (imm))
     {
-      if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL)
+      if (aarch64_sve_pred_mode_p (mode))
	{
	  /* Only the low bit of each .H, .S and .D element is defined,
	     so we can set the upper bits to whatever we like.  If the
@@ -10311,6 +10344,15 @@ aarch64_classify_address (struct aarch64_address_info *info,
       if (vec_flags == VEC_SVE_PRED)
	return offset_9bit_signed_scaled_p (mode, offset);
 
+      if (vec_flags == (VEC_SVE_PRED | VEC_STRUCT))
+	{
+	  poly_int64 end_offset = (offset
+				   + GET_MODE_SIZE (mode)
+				   - BYTES_PER_SVE_PRED);
+	  return (offset_9bit_signed_scaled_p (VNx16BImode, end_offset)
+		  && offset_9bit_signed_scaled_p (VNx16BImode, offset));
+	}
+
       if (load_store_pair_p)
	return ((known_eq (GET_MODE_SIZE (mode), 4)
		 || known_eq (GET_MODE_SIZE (mode), 8)
@@ -12611,10 +12653,12 @@ aarch64_class_max_nregs (reg_class_t regclass, machine_mode mode)
	      ? CEIL (lowest_size, UNITS_PER_VREG)
	      : CEIL (lowest_size, UNITS_PER_WORD));
 
-    case STACK_REG:
     case PR_REGS:
     case PR_LO_REGS:
     case PR_HI_REGS:
+      return mode == VNx32BImode ? 2 : 1;
+
+    case STACK_REG:
     case FFR_REGS:
     case PR_AND_FFR_REGS:
     case FAKE_REGS:
@@ -20252,11 +20296,11 @@ aarch64_member_type_forces_blk (const_tree field_or_array, machine_mode mode)
      an ARRAY_TYPE.  In both cases we're interested in the TREE_TYPE.  */
   const_tree type = TREE_TYPE (field_or_array);
 
-  /* Assign BLKmode to anything that contains multiple SVE predicates.
+  /* Assign BLKmode to anything that contains more than 2 SVE predicates.
      For structures, the "multiple" case is indicated by MODE being
     VOIDmode.  */
   unsigned int num_zr, num_pr;
-  if (aarch64_sve::builtin_type_p (type, &num_zr, &num_pr) && num_pr != 0)
+  if (aarch64_sve::builtin_type_p (type, &num_zr, &num_pr) && num_pr > 2)
     {
       if (TREE_CODE (field_or_array) == ARRAY_TYPE)
	return !simple_cst_equal (TYPE_SIZE (field_or_array),
@@ -21496,6 +21540,9 @@ aarch64_simd_valid_immediate (rtx op, simd_immediate_info *info,
   if ((vec_flags & VEC_ADVSIMD) && !TARGET_SIMD)
     return false;
 
+  if (vec_flags == (VEC_SVE_PRED | VEC_STRUCT))
+    return op == CONST0_RTX (mode) || op == CONSTM1_RTX (mode);
+
   if (vec_flags & VEC_SVE_PRED)
     return aarch64_sve_pred_valid_immediate (op, info);
 
@@ -21669,7 +21716,8 @@ aarch64_mov_operand_p (rtx x, machine_mode mode)
      force everything to have a canonical form.  */
   if (!lra_in_progress
       && !reload_completed
-      && GET_MODE_CLASS (GET_MODE (x)) == MODE_VECTOR_BOOL
+      && aarch64_sve_pred_mode_p (GET_MODE (x))
+      && known_eq (GET_MODE_SIZE (GET_MODE (x)), BYTES_PER_SVE_PRED)
       && GET_MODE (x) != VNx16BImode)
     return false;
 
@@ -24272,7 +24320,7 @@ aarch64_evpc_ext (struct expand_vec_perm_d *d)
 
   /* The first element always refers to the first vector.
      Check if the extracted indices are increasing by one.  */
-  if (d->vec_flags == VEC_SVE_PRED
+  if ((d->vec_flags & VEC_SVE_PRED)
       || !d->perm[0].is_constant (&location)
       || !d->perm.series_p (0, 1, location, 1))
     return false;
@@ -24316,7 +24364,7 @@ aarch64_evpc_rev_local (struct expand_vec_perm_d *d)
   unsigned int i, size, unspec;
   machine_mode pred_mode;
 
-  if (d->vec_flags == VEC_SVE_PRED
+  if ((d->vec_flags & VEC_SVE_PRED)
       || !d->one_vector_p
       || !d->perm[0].is_constant (&diff)
       || !diff)
@@ -24397,7 +24445,7 @@ aarch64_evpc_dup (struct expand_vec_perm_d *d)
   machine_mode vmode = d->vmode;
   rtx lane;
 
-  if (d->vec_flags == VEC_SVE_PRED
+  if ((d->vec_flags & VEC_SVE_PRED)
       || d->perm.encoding ().encoded_nelts () != 1
       || !d->perm[0].is_constant (&elt))
     return false;
diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
index 38ed927ec14..78a62af1abf 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -42,6 +42,10 @@ (define_register_constraint "w" "FP_REGS"
 (define_register_constraint "Upa" "PR_REGS"
   "SVE predicate registers p0 - p15.")
 
+(define_register_constraint "Up2" "PR_REGS"
+  "An even SVE predicate register, p0 - p14."
+  "regno % 2 == 0")
+
 (define_register_constraint "Upl" "PR_LO_REGS"
   "SVE predicate registers p0 - p7.")
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/svboolx2_1.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/svboolx2_1.c
new file mode 100644
index 00000000000..877b1849986
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/svboolx2_1.c
@@ -0,0 +1,135 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_sve.h>
+
+/*
+** ret_p0:
+**	ret
+*/
+svboolx2_t
+ret_p0 (svboolx2_t p0)
+{
+  return p0;
+}
+
+/*
+** ret_p1:
+**	mov	p0\.b, p1\.b
+**	mov	p1\.b, p2\.b
+**	ret
+*/
+svboolx2_t
+ret_p1 (svbool_t p0, svboolx2_t p1)
+{
+  return p1;
+}
+
+/*
+** ret_p2:
+** (
+**	mov	p0\.b, p2\.b
+**	mov	p1\.b, p3\.b
+** |
+**	mov	p1\.b, p3\.b
+**	mov	p0\.b, p2\.b
+** )
+**	ret
+*/
+svboolx2_t
+ret_p2 (svboolx2_t p0, svboolx2_t p2)
+{
+  return p2;
+}
+
+/*
+** ret_mem:
+** (
+**	ldr	p0, \[x0\]
+**	ldr	p1, \[x0, #1, mul vl\]
+** |
+**	ldr	p1, \[x0, #1, mul vl\]
+**	ldr	p0, \[x0\]
+** )
+**	ret
+*/
+svboolx2_t
+ret_mem (svboolx2_t p0, svbool_t p2, svboolx2_t mem)
+{
+  return mem;
+}
+
+/*
+** load:
+** (
+**	ldr	p0, \[x0\]
+**	ldr	p1, \[x0, #1, mul vl\]
+** |
+**	ldr	p1, \[x0, #1, mul vl\]
+**	ldr	p0, \[x0\]
+** )
+**	ret
+*/
+svboolx2_t
+load (svboolx2_t *ptr)
+{
+  return *ptr;
+}
+
+/*
+** store:
+** (
+**	str	p1, \[x0\]
+**	str	p2, \[x0, #1, mul vl\]
+** |
+**	str	p2, \[x0, #1, mul vl\]
+**	str	p1, \[x0\]
+** )
+**	ret
+*/
+void
+store (svbool_t p0, svboolx2_t p1, svboolx2_t *ptr)
+{
+  *ptr = p1;
+}
+
+/*
+** upa_p1:
+**	ret
+*/
+void
+upa_p1 (svbool_t p0, svboolx2_t p1)
+{
+  asm volatile ("" :: "Upa" (p1));
+}
+
+/*
+** up2_p1:
+** (
+**	mov	p0\.b, p1\.b
+**	mov	p1\.b, p2\.b
+** |
+**	mov	p3\.b, p2\.b
+**	mov	p2\.b, p1\.b
+** )
+**	ret
+*/
+void
+up2_p1 (svbool_t p0, svboolx2_t p1)
+{
+  asm volatile ("" :: "Up2" (p1));
+}
+
+/*
+** p1_to_p2:
+**	mov	p3\.b, p2\.b
+**	mov	p2\.b, p1\.b
+**	ret
+*/
+void
+p1_to_p2 (svbool_t p0, svboolx2_t p1)
+{
+  register svboolx2_t p2 asm ("p2") = p1;
+  asm volatile ("" :: "Up2" (p2));
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/struct_3_128.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/struct_3_128.c
index f6d78469aa5..b8fe86058a9 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/struct_3_128.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/struct_3_128.c
@@ -908,8 +908,8 @@ SEL2 (union, nonpst3)
 /*
 ** test_nonpst3:
 **	sub	sp, sp, #16
-**	str	w0, \[sp, #?8\]
-**	ldr	p0, \[sp, #4, mul vl\]
+**	str	w0, \[sp, #?12\]
+**	ldr	p0, \[sp, #6, mul vl\]
 **	add	sp, sp, #?16
 **	ret
 */
@@ -921,7 +921,7 @@ test_nonpst3 (union nonpst3 x)
 }
 
 /*
-** ret_nonpst3: { xfail *-*-* }
+** ret_nonpst3:
 **	mov	w0, #?(?:0xffff|65535)
 **	ret
 */