From patchwork Thu Jul  4 03:22:43 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jiufu Guo
X-Patchwork-Id: 1956584
From: Jiufu Guo
To: gcc-patches@gcc.gnu.org
Cc: rguenther@suse.de, jeffreyalaw@gmail.com, richard.sandiford@arm.com,
 segher@kernel.crashing.org, linkw@gcc.gnu.org, dje.gcc@gmail.com,
 bergner@linux.ibm.com, guojiufu@linux.ibm.com
Subject: [PATCH V2] fsra: gimple final sra pass for parameters and returns
Date: Thu, 4 Jul 2024 11:22:43 +0800
Message-Id: <20240704032243.733068-1-guojiufu@linux.ibm.com>
X-Mailer: git-send-email 2.25.1

Hi,

There are a few PRs (meta-bug PR101926) about accessing aggregate
parameters/returns which are passed through registers.

We could use the current SRA pass in a special mode right before RTL
expansion for the incoming/outgoing part, as discussed at:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637935.html

This patch uses the IFNs ARG_PARTS and SET_RET_PARTS for parameters and
returns, and expands the IFNs according to the incoming/outgoing
registers.

Again, there are a few things that could be enhanced for this patch:
* Multi-register access
* Parameter access across calls
* Optimization of accesses to parameters which are in memory
* Checking of more cases/targets

Compared with the previous version:
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/654819.html
this version refactors expand_ARG_PARTS and expand_SET_RET_PARTS.

Bootstrapped/regtested on ppc64{,le} and x86_64 (with 2 expected
failures: bfxil_1.c and pr101908-3.c).

Is this OK for trunk, or should I continue to enhance the patch?
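
To illustrate the kind of code this is aimed at, here is a small sketch
modelled on the pr108073.c test in this patch.  The names `pair' and
`sum_pair' are made up for the illustration and are not part of the
patch; whether the struct really arrives in registers depends on the
target ABI.

```c
#include <assert.h>

/* A small aggregate that many ABIs pass in registers.
   Hypothetical type, used only for this illustration.  */
typedef struct pair
{
  float x;
  float y;
} pair;

/* Only reads fields of a by-value parameter, which is the case the
   cover letter describes for incoming arguments.  */
float
sum_pair (pair p)
{
  return p.x + p.y;
}
```

Following the form documented in internal-fn.cc, the two reads would be
expected to become something like `_1 = .ARG_PARTS (p, 0, 32, 0)' and
`_2 = .ARG_PARTS (p, 32, 32, 0)'.  This is a hand-written illustration,
not an actual dump.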

BR,
Jeff (Jiufu Guo)

	PR target/108073
	PR target/69143

gcc/ChangeLog:

	* cfgexpand.cc (expand_value_return): Update for rtx eq checking.
	(expand_return): Update for scalarized returns.
	* internal-fn.cc (query_position_in_parallel): New function.
	(get_incoming_reg): New function.
	(reference_alias_ptr_type): Declare extern.
	(expand_ARG_PARTS): New IFN expand.
	(get_outgoing_reg): New function.
	(expand_SET_RET_PARTS): New IFN expand.
	(expand_SET_RET_LAST_PARTS): New IFN expand.
	* internal-fn.def (ARG_PARTS): New IFN.
	(SET_RET_PARTS): New IFN.
	(SET_RET_LAST_PARTS): New IFN.
	* passes.def (pass_sra_final): Add new pass.
	* tree-pass.h (make_pass_sra_final): New function.
	* tree-sra.cc (enum sra_mode): New enum item SRA_MODE_FINAL_INTRA.
	(build_accesses_from_assign): Accept SRA_MODE_FINAL_INTRA.
	(scan_function): Update for argument in fsra.
	(find_var_candidates): Collect candidates for SRA_MODE_FINAL_INTRA.
	(analyze_access_subtree): Update analysis for fsra.
	(generate_subtree_copies): Update to generate new IFNs.
	(final_intra_sra): New function.
	(class pass_sra_final): New pass class.
	(make_pass_sra_final): New function.

gcc/testsuite/ChangeLog:

	* g++.target/powerpc/pr102024.C: Update instructions.
	* gcc.target/powerpc/pr108073-1.c: New test.
	* gcc.target/powerpc/pr108073.c: New test.
	* gcc.target/powerpc/pr69143.c: New test.
---
 gcc/cfgexpand.cc                              |   6 +-
 gcc/internal-fn.cc                            | 267 ++++++++++++++++++
 gcc/internal-fn.def                           |   9 +
 gcc/passes.def                                |   2 +
 gcc/tree-pass.h                               |   1 +
 gcc/tree-sra.cc                               | 156 +++++++++-
 gcc/testsuite/g++.target/powerpc/pr102024.C   |   3 +-
 gcc/testsuite/gcc.target/powerpc/pr108073-1.c |  76 +++++
 gcc/testsuite/gcc.target/powerpc/pr108073.c   |  74 +++++
 gcc/testsuite/gcc.target/powerpc/pr69143.c    |  23 ++
 10 files changed, 601 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr69143.c

diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index dad3ae1b7c6..c8dbdf94941 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -3789,7 +3789,7 @@ expand_value_return (rtx val)
 
   tree decl = DECL_RESULT (current_function_decl);
   rtx return_reg = DECL_RTL (decl);
-  if (return_reg != val)
+  if (!rtx_equal_p (return_reg, val))
     {
       tree funtype = TREE_TYPE (current_function_decl);
       tree type = TREE_TYPE (decl);
@@ -3862,6 +3862,10 @@ expand_return (tree retval)
      been stored into it, so we don't have to do anything special.  */
   if (TREE_CODE (retval_rhs) == RESULT_DECL)
     expand_value_return (result_rtl);
+  /* The return value is scalarized by fsra.  */
+  else if (VAR_P (retval_rhs)
+           && rtx_equal_p (result_rtl, DECL_RTL (retval_rhs)))
+    expand_null_return_1 ();
 
   /* If the result is an aggregate that is being returned in one (or more)
      registers, load the registers here.  */
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 4948b48bde8..e8d21096907 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -3474,6 +3474,273 @@ expand_ACCESS_WITH_SIZE (internal_fn, gcall *stmt)
   expand_assignment (lhs, ref_to_obj, false);
 }
 
+/* From the parallel rtx register series REGS, compute which registers
+   are touched at {BITPOS, BITSIZE}.  The results are stored into
+   START_INDEX, END_INDEX, LEFT_BITS and RIGHT_BITS.  */
+
+void
+query_position_in_parallel (HOST_WIDE_INT bitpos, HOST_WIDE_INT bitsize,
+                            rtx regs, int &start_index, int &end_index,
+                            HOST_WIDE_INT &left_bits, HOST_WIDE_INT &right_bits)
+{
+  int cur_index = XEXP (XVECEXP (regs, 0, 0), 0) ? 0 : 1;
+  for (; cur_index < XVECLEN (regs, 0); cur_index++)
+    {
+      rtx slot = XVECEXP (regs, 0, cur_index);
+      HOST_WIDE_INT off = UINTVAL (XEXP (slot, 1)) * BITS_PER_UNIT;
+      machine_mode mode = GET_MODE (XEXP (slot, 0));
+      HOST_WIDE_INT size = GET_MODE_BITSIZE (mode).to_constant ();
+      if (off <= bitpos && off + size > bitpos)
+        {
+          start_index = cur_index;
+          left_bits = bitpos - off;
+        }
+      if (off + size >= bitpos + bitsize)
+        {
+          end_index = cur_index;
+          right_bits = off + size - (bitpos + bitsize);
+          break;
+        }
+    }
+}
+
+/* For an access on ARG at {BITPOS, BITSIZE}, compute an RTX
+   expression for the access.  ARG is an aggregate parameter
+   of a function, and it should be passed through registers.
+   TYPE is the expected result type.
+   Return NULL_RTX if the expression cannot be extracted; otherwise
+   return the rtx expression.  */
+
+static rtx
+get_incoming_reg (tree arg, HOST_WIDE_INT bitpos, HOST_WIDE_INT bitsize,
+                  bool reversep, tree type)
+{
+  rtx regs = DECL_INCOMING_RTL (arg);
+  int start_index = -1;
+  int end_index = -1;
+  HOST_WIDE_INT left_bits = 0;
+  HOST_WIDE_INT right_bits = 0;
+
+  if (REG_P (regs) && GET_MODE (regs) == BLKmode)
+    {
+      HOST_WIDE_INT size = int_size_in_bytes (TREE_TYPE (arg));
+      /* For possible REG_PADDING.  */
+      if (size < UNITS_PER_WORD)
+        {
+          regs = gen_rtx_REG (word_mode, REGNO (regs));
+          rtx arg_rtx = arg->decl_with_rtl.rtl; // DECL_RTL
+          emit_move_insn (regs, adjust_address (arg_rtx, word_mode, 0));
+        }
+      HOST_WIDE_INT end_bits = bitpos + bitsize - 1;
+      start_index = bitpos / BITS_PER_WORD;
+      left_bits = bitpos % BITS_PER_WORD;
+      end_index = end_bits / BITS_PER_WORD;
+      right_bits = BITS_PER_WORD - 1 - (end_bits % BITS_PER_WORD);
+    }
+  else
+    {
+      if (GET_CODE (regs) != PARALLEL)
+        return NULL_RTX;
+      query_position_in_parallel (bitpos, bitsize, regs, start_index, end_index,
+                                  left_bits, right_bits);
+    }
+
+  gcc_assert (start_index >= 0 && end_index >= start_index);
+
+  /* Accesses spanning multiple registers are not handled.  */
+  if (end_index != start_index)
+    return NULL_RTX;
+
+  /* Just need one reg for the access.  */
+  rtx reg = REG_P (regs) ? gen_rtx_REG (word_mode, REGNO (regs) + start_index)
+                         : XEXP (XVECEXP (regs, 0, start_index), 0);
+
+  machine_mode expr_mode = TYPE_MODE (type);
+  if (left_bits == 0 && right_bits == 0)
+    {
+      if (reversep)
+        reg = flip_storage_order (GET_MODE (reg), reg);
+      if (GET_MODE (reg) != expr_mode)
+        reg = gen_lowpart (expr_mode, reg);
+      return reg;
+    }
+
+  /* Need to extract a bit-field from REG for the access:
+     left_bits != 0 or right_bits != 0.  */
+  scalar_int_mode imode;
+  if (!int_mode_for_mode (expr_mode).exists (&imode))
+    return NULL_RTX;
+
+  if (expr_mode != imode
+      && known_gt (GET_MODE_SIZE (GET_MODE (regs)), UNITS_PER_WORD))
+    return NULL_RTX;
+
+  machine_mode mode = GET_MODE (reg);
+  bool sgn = TYPE_UNSIGNED (type);
+  rtx bfld = extract_bit_field (reg, bitsize, left_bits, sgn, NULL_RTX, mode,
+                                imode, reversep, NULL);
+
+  if (GET_MODE (bfld) != imode)
+    bfld = gen_lowpart (imode, bfld);
+
+  if (expr_mode == imode)
+    return bfld;
+
+  /* expr_mode != imode, e.g. SF != SI.  */
+  rtx result = gen_reg_rtx (imode);
+  emit_move_insn (result, bfld);
+  return gen_lowpart (expr_mode, result);
+}
+
+/* Extern function for building MEM_REF rtx.  */
+tree
+reference_alias_ptr_type (tree t);
+
+/* Expand the IFN_ARG_PARTS function:
+   LHS = .ARG_PARTS (INCOMING_ARG, BIT_OFFSET, BIT_SIZE, REVERSEP).  */
+
+static void
+expand_ARG_PARTS (internal_fn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  tree arg = gimple_call_arg (stmt, 0);
+  HOST_WIDE_INT offset = tree_to_shwi (gimple_call_arg (stmt, 1));
+  HOST_WIDE_INT size = tree_to_shwi (gimple_call_arg (stmt, 2));
+  int reversep = tree_to_shwi (gimple_call_arg (stmt, 3));
+  rtx reg = get_incoming_reg (arg, offset, size, reversep, TREE_TYPE (lhs));
+  if (reg)
+    {
+      rtx dest = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+      if (dest && REG_P (dest))
+        {
+          emit_move_insn (dest, reg);
+          return;
+        }
+    }
+
+  tree type = TREE_TYPE (lhs);
+  /* Access a bit-field.  */
+  if ((INTEGRAL_TYPE_P (type) && !type_has_mode_precision_p (type))
+      || offset % BITS_PER_UNIT != 0 || size % BITS_PER_UNIT != 0)
+    {
+      machine_mode mode = TYPE_MODE (type);
+      rtx src
+        = expand_expr_real (arg, NULL, VOIDmode, EXPAND_NORMAL, NULL, true);
+      src = extract_bit_field (src, size, offset, TYPE_UNSIGNED (type), NULL,
+                               mode, mode, reversep, NULL);
+      rtx dest = expand_expr (lhs, NULL, VOIDmode, EXPAND_WRITE);
+      if (GET_CODE (dest) == SUBREG && SUBREG_PROMOTED_VAR_P (dest))
+        convert_move (SUBREG_REG (dest), src, SUBREG_PROMOTED_SIGN (dest));
+      else
+        emit_move_insn (dest, src);
+      return;
+    }
+
+  /* Fall back to the original expansion.  */
+  gcc_assert (offset % BITS_PER_UNIT == 0 && size % BITS_PER_UNIT == 0);
+  tree base = build_fold_addr_expr (arg);
+  tree atype = reference_alias_ptr_type (arg);
+  tree off = build_int_cst (atype, offset / BITS_PER_UNIT);
+  location_t loc = EXPR_LOCATION (arg);
+  tree rhs = fold_build2_loc (loc, MEM_REF, type, base, off);
+  REF_REVERSE_STORAGE_ORDER (rhs) = reversep;
+  expand_assignment (lhs, rhs, false);
+}
+
+/* REGS contains the function return registers; compute which register(s)
+   are touched at {BITPOS, BITSIZE}.
+   If the accessed register(s) are whole register(s) or lowpart of a
+   register, then return it(them) as it(they) can be the dest of an
+   assignment.  Otherwise return NULL_RTX.  */
+
+static rtx
+get_outgoing_reg (rtx regs, HOST_WIDE_INT bitpos, HOST_WIDE_INT bitsize)
+{
+  if (GET_CODE (regs) != PARALLEL)
+    return NULL_RTX;
+
+  int start_index = -1;
+  int end_index = -1;
+  HOST_WIDE_INT left_bits = 0;
+  HOST_WIDE_INT right_bits = 0;
+  query_position_in_parallel (bitpos, bitsize, regs, start_index, end_index,
+                              left_bits, right_bits);
+
+  gcc_assert (start_index >= 0 && end_index >= start_index);
+
+  if (end_index != start_index)
+    return NULL_RTX;
+
+  if (!((left_bits == 0 && !BITS_BIG_ENDIAN)
+        || (right_bits == 0 && BITS_BIG_ENDIAN)))
+    return NULL_RTX;
+
+  /* Just need one reg for the access.  */
+  rtx dest = XEXP (XVECEXP (regs, 0, start_index), 0);
+  machine_mode mode = GET_MODE (dest);
+
+  if (left_bits != 0 || right_bits != 0)
+    {
+      machine_mode small_mode;
+      if (!SCALAR_INT_MODE_P (mode)
+          || !mode_for_size (bitsize, GET_MODE_CLASS (mode), 0)
+                .exists (&small_mode))
+        return NULL_RTX;
+
+      dest = gen_lowpart (small_mode, dest);
+      mode = small_mode;
+    }
+
+  return dest;
+}
+
+/* Expand the IFN_SET_RET_PARTS function:
+   FIELD_OF_RET = .SET_RET_PARTS (RET_BASE, BIT_OFFSET, BIT_SIZE, SRC).
+   e.g. D.2774.f = .SET_RET_PARTS (D.2774, 0, 8, SRC); return D.2774;  */
+
+static void
+expand_SET_RET_PARTS (internal_fn, gcall *stmt)
+{
+  HOST_WIDE_INT offset = tree_to_shwi (gimple_call_arg (stmt, 1));
+  HOST_WIDE_INT size = tree_to_shwi (gimple_call_arg (stmt, 2));
+  tree rhs = gimple_call_arg (stmt, 3);
+  tree decl = DECL_RESULT (current_function_decl);
+  rtx dest_regs = decl->decl_with_rtl.rtl; // DECL_RTL (base);
+  rtx reg = get_outgoing_reg (dest_regs, offset, size);
+  if (reg)
+    {
+      rtx src = expand_expr (rhs, NULL_RTX, VOIDmode, EXPAND_NORMAL);
+      machine_mode mode = GET_MODE (reg);
+      if (mode != GET_MODE (src))
+        src = gen_lowpart (mode, src);
+      emit_move_insn (reg, src);
+    }
+  else
+    {
+      /* Reaching here means a bit-field access.  */
+      tree base = gimple_call_arg (stmt, 0);
+      tree lhs = gimple_call_lhs (stmt);
+      expand_assignment (base, decl, false);
+      expand_assignment (lhs, rhs, false);
+      expand_assignment (decl, base, false);
+    }
+}
+
+/* Similar to expand_SET_RET_PARTS; the only difference is that this IFN
+   indicates the last part of the function return is computed.  */
+
+static void
+expand_SET_RET_LAST_PARTS (internal_fn, gcall *stmt)
+{
+  expand_SET_RET_PARTS (IFN_SET_RET_PARTS, stmt);
+
+  /* Replace the RET_VAL's rtl with the real function's result rtl.  */
+  tree decl = DECL_RESULT (current_function_decl);
+  rtx dest_regs = decl->decl_with_rtl.rtl; // DECL_RTL (base);
+  tree base = gimple_call_arg (stmt, 0);
+  base->decl_with_rtl.rtl = dest_regs; // SET_DECL_RTL
+}
+
 /* The size of an OpenACC compute dimension.  */
 
 static void
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index a8c83437ada..106811b4457 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -518,6 +518,15 @@ DEF_INTERNAL_FN (DEFERRED_INIT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
    2nd argument.  */
 DEF_INTERNAL_FN (ACCESS_WITH_SIZE, ECF_PURE | ECF_LEAF | ECF_NOTHROW, NULL)
 
+/* A function to extract element(s) from an aggregate argument in fsra.  */
+DEF_INTERNAL_FN (ARG_PARTS, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
+
+/* Functions to set/construct element(s) for a 'return' aggregate.  */
+DEF_INTERNAL_FN (SET_RET_PARTS, ECF_LEAF | ECF_NOTHROW, NULL)
+/* Functions to set/construct element(s) for a 'return' aggregate just before
+   the return statement.  */
+DEF_INTERNAL_FN (SET_RET_LAST_PARTS, ECF_LEAF | ECF_NOTHROW, NULL)
+
 /* DIM_SIZE and DIM_POS return the size of a particular compute
    dimension and the executing thread's position within that
    dimension.  DIM_POS is pure (and not const) so that it isn't
diff --git a/gcc/passes.def b/gcc/passes.def
index 041229e47a6..4808bec61cf 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -450,6 +450,8 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_harden_conditional_branches);
   NEXT_PASS (pass_harden_compares);
   NEXT_PASS (pass_warn_access, /*early=*/false);
+  NEXT_PASS (pass_sra_final);
+
   NEXT_PASS (pass_cleanup_cfg_post_optimizing);
   NEXT_PASS (pass_warn_function_noreturn);
 
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index edebb2be245..0bfb1283b90 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -366,6 +366,7 @@ extern gimple_opt_pass *make_pass_early_tree_profile (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_cleanup_eh (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_sra (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_sra_early (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_sra_final (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tail_recursion (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tail_calls (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_fix_loops (gcc::context *ctxt);
diff --git a/gcc/tree-sra.cc b/gcc/tree-sra.cc
index 8040b0c5645..0c78084d4f9 100644
--- a/gcc/tree-sra.cc
+++ b/gcc/tree-sra.cc
@@ -21,14 +21,16 @@ along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */
 
 /* This file implements Scalar Reduction of Aggregates (SRA).  SRA is run
-   twice, once in the early stages of compilation (early SRA) and once in the
-   late stages (late SRA).  The aim of both is to turn references to scalar
-   parts of aggregates into uses of independent scalar variables.
+   three times: in the early stages of compilation (early SRA), in the late
+   stages (late SRA) and right before RTL expansion (final SRA).  All of them
+   turn references to scalar parts of aggregates into uses of scalar variables.
 
-   The two passes are nearly identical, the only difference is that early SRA
+   The three passes are nearly identical; the differences are that early SRA
    does not scalarize unions which are used as the result in a GIMPLE_RETURN
    statement because together with inlining this can lead to weird type
-   conversions.
+   conversions, and that the final pass only scalarizes parameters and return
+   values, which helps when those aggregates are passed in
+   registers.
 
    Both passes operate in four stages:
 
@@ -104,6 +106,7 @@ along with GCC; see the file COPYING3.  If not see
 /* Enumeration of all aggregate reductions we can do.  */
 enum sra_mode { SRA_MODE_EARLY_IPA,   /* early call regularization */
                 SRA_MODE_EARLY_INTRA, /* early intraprocedural SRA */
+                SRA_MODE_FINAL_INTRA, /* final gimple intraprocedural SRA */
                 SRA_MODE_INTRA };     /* late intraprocedural SRA */
 
 /* Global variable describing which aggregate reduction we are performing at
@@ -1549,7 +1552,8 @@ build_accesses_from_assign (gimple *stmt)
     }
 
   if (lacc && racc
-      && (sra_mode == SRA_MODE_EARLY_INTRA || sra_mode == SRA_MODE_INTRA)
+      && (sra_mode == SRA_MODE_EARLY_INTRA || sra_mode == SRA_MODE_INTRA
+          || sra_mode == SRA_MODE_FINAL_INTRA)
       && !lacc->grp_unscalarizable_region
       && !racc->grp_unscalarizable_region
       && AGGREGATE_TYPE_P (TREE_TYPE (lhs))
@@ -1616,7 +1620,8 @@ scan_function (void)
           tree t;
           unsigned i;
 
-          if (gimple_code (stmt) != GIMPLE_CALL)
+          if (gimple_code (stmt) != GIMPLE_CALL
+              || sra_mode == SRA_MODE_FINAL_INTRA)
             walk_stmt_load_store_addr_ops (stmt, NULL, NULL, NULL,
                                            scan_visit_addr);
 
@@ -2261,6 +2266,24 @@ find_var_candidates (void)
        parm = DECL_CHAIN (parm))
     ret |= maybe_add_sra_candidate (parm);
 
+  /* fsra only cares about parameters and returns.  */
+  if (sra_mode == SRA_MODE_FINAL_INTRA)
+    {
+      if (!DECL_RESULT (current_function_decl))
+        return ret;
+
+      edge_iterator ei;
+      edge e;
+      FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
+        if (greturn *r = safe_dyn_cast <greturn *> (*gsi_last_bb (e->src)))
+          {
+            tree val = gimple_return_retval (r);
+            if (val && VAR_P (val))
+              ret |= maybe_add_sra_candidate (val);
+          }
+      return ret;
+    }
+
   FOR_EACH_LOCAL_DECL (cfun, i, var)
     {
       if (!VAR_P (var))
@@ -2858,12 +2881,31 @@ analyze_access_subtree (struct access *root, struct access *parent,
         hole = true;
     }
 
+  auto check_rw = [] (struct access *root) -> bool {
+    if ((root->grp_scalar_read || root->grp_assignment_read)
+        && (root->grp_scalar_write || root->grp_assignment_write))
+      return true;
+    if (sra_mode != SRA_MODE_FINAL_INTRA)
+      return false;
+    if ((root->grp_scalar_read || root->grp_assignment_read)
+        && TREE_CODE (root->base) == PARM_DECL)
+      return true;
+    /* Now in fsra (SRA_MODE_FINAL_INTRA), only PARAM and RETURNS
+       are candidates, so if "VAR_P (root->base)", then it is used by
+       a return stmt.
+       TODO: add a flag to root->base to indicate it is used by return
+       stmt.  */
+    if ((root->grp_scalar_write || root->grp_assignment_write)
+        && VAR_P (root->base))
+      return true;
+
+    return false;
+  };
+
+  /* In fsra, a parameter is scalarizable even if it is never written.  */
   if (allow_replacements && scalar && !root->first_child
       && (totally || !root->grp_total_scalarization)
-      && (totally
-          || root->grp_hint
-          || ((root->grp_scalar_read || root->grp_assignment_read)
-              && (root->grp_scalar_write || root->grp_assignment_write))))
+      && (totally || root->grp_hint || check_rw (root)))
     {
       /* Always create access replacements that cover the whole access.
          For integral types this means the precision has to match.
@@ -2932,6 +2974,15 @@ analyze_access_subtree (struct access *root, struct access *parent,
     root->grp_covered = 1;
   else if (root->grp_write || comes_initialized_p (root->base))
     root->grp_unscalarized_data = 1; /* not covered and written to */
+
+  if (sra_mode == SRA_MODE_FINAL_INTRA)
+    { /* No support yet for written PARAM or partially unscalarized RET.  */
+      if (root->grp_unscalarized_data && (VAR_P (root->base)))
+        return false;
+      if (root->grp_write && TREE_CODE (root->base) == PARM_DECL)
+        return false;
+    }
+
   return sth_created;
 }
 
@@ -3887,7 +3938,7 @@ generate_subtree_copies (struct access *access, tree agg,
               || access->offset + access->size > start_offset))
         {
           tree expr, repl = get_access_replacement (access);
-          gassign *stmt;
+          gimple *stmt;
 
           expr = build_ref_for_model (loc, agg, access->offset - top_offset,
                                       access, gsi, insert_after);
@@ -3899,7 +3950,20 @@ generate_subtree_copies (struct access *access, tree agg,
                                                  !insert_after,
                                                  insert_after ? GSI_NEW_STMT
                                                  : GSI_SAME_STMT);
-              stmt = gimple_build_assign (repl, expr);
+              if (sra_mode == SRA_MODE_FINAL_INTRA
+                  && TREE_CODE (access->base) == PARM_DECL
+                  && (access->grp_scalar_read || access->grp_assignment_read))
+                {
+                  gimple *call = gimple_build_call_internal (
+                    IFN_ARG_PARTS, 4, access->base,
+                    wide_int_to_tree (sizetype, access->offset),
+                    wide_int_to_tree (sizetype, access->size),
+                    wide_int_to_tree (sizetype, access->reverse));
+                  gimple_call_set_lhs (call, repl);
+                  stmt = call;
+                }
+              else
+                stmt = gimple_build_assign (repl, expr);
             }
           else
             {
@@ -3909,7 +3973,25 @@ generate_subtree_copies (struct access *access, tree agg,
                                                  !insert_after,
                                                  insert_after ? GSI_NEW_STMT
                                                  : GSI_SAME_STMT);
-              stmt = gimple_build_assign (expr, repl);
+              if (sra_mode == SRA_MODE_FINAL_INTRA && VAR_P (access->base)
+                  && (access->grp_scalar_write || access->grp_assignment_write))
+                {
+                  enum internal_fn fcode;
+                  if (access->first_child == NULL
+                      && access->next_sibling == NULL)
+                    fcode = IFN_SET_RET_LAST_PARTS;
+                  else
+                    fcode = IFN_SET_RET_PARTS;
+
+                  gimple *call = gimple_build_call_internal (
+                    fcode, 4, access->base,
+                    wide_int_to_tree (sizetype, access->offset),
+                    wide_int_to_tree (sizetype, access->size), repl);
+                  gimple_call_set_lhs (call, expr);
+                  stmt = call;
+                }
+              else
+                stmt = gimple_build_assign (expr, repl);
             }
           gimple_set_location (stmt, loc);
 
@@ -5134,6 +5216,14 @@ late_intra_sra (void)
   return perform_intra_sra ();
 }
 
+/* Perform "final sra" intraprocedural SRA just before the expander.  */
+static unsigned int
+final_intra_sra (void)
+{
+  sra_mode = SRA_MODE_FINAL_INTRA;
+  return perform_intra_sra ();
+}
+
 
 static bool
 gate_intra_sra (void)
@@ -5217,6 +5307,44 @@ make_pass_sra (gcc::context *ctxt)
   return new pass_sra (ctxt);
 }
 
+namespace
+{
+const pass_data pass_data_sra_final = {
+  GIMPLE_PASS,           /* type */
+  "fsra",                /* name */
+  OPTGROUP_NONE,         /* optinfo_flags */
+  TV_TREE_SRA,           /* tv_id */
+  (PROP_cfg | PROP_ssa), /* properties_required */
+  0,                     /* properties_provided */
+  0,                     /* properties_destroyed */
+  0,                     /* todo_flags_start */
+  TODO_update_ssa,       /* todo_flags_finish */
+};
+
+class pass_sra_final : public gimple_opt_pass
+{
+public:
+  pass_sra_final (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_sra_final, ctxt)
+  {
+  }
+
+  /* opt_pass methods: */
+  bool gate (function *) final override { return gate_intra_sra (); }
+  unsigned int execute (function *) final override
+  {
+    return final_intra_sra ();
+  }
+
+}; // class pass_sra_final
+
+} // namespace
+
+gimple_opt_pass *
+make_pass_sra_final (gcc::context *ctxt)
+{
+  return new pass_sra_final (ctxt);
+}
 
 /* If type T cannot be totally scalarized, return false.  Otherwise return true
    and push to the vector within PC offsets and lengths of all padding in the
diff --git a/gcc/testsuite/g++.target/powerpc/pr102024.C b/gcc/testsuite/g++.target/powerpc/pr102024.C
index 769585052b5..4d9bbb0f050 100644
--- a/gcc/testsuite/g++.target/powerpc/pr102024.C
+++ b/gcc/testsuite/g++.target/powerpc/pr102024.C
@@ -5,7 +5,8 @@
 // Test that a zero-width bit field in an otherwise homogeneous aggregate
 // generates a psabi warning and passes arguments in GPRs.
 
-// { dg-final { scan-assembler-times {\mstd\M} 4 } }
+// { dg-final { scan-assembler-times {\mstd\M} 4 {target { ! has_arch_pwr8 } } } }
+// { dg-final { scan-assembler-times {\mmtvsrd\M} 4 {target { has_arch_pwr8 } } } }
 
 struct a_thing
 {
diff --git a/gcc/testsuite/gcc.target/powerpc/pr108073-1.c b/gcc/testsuite/gcc.target/powerpc/pr108073-1.c
new file mode 100644
index 00000000000..4892716e85f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr108073-1.c
@@ -0,0 +1,76 @@
+/* { dg-do run } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-O2 -save-temps" } */
+
+typedef struct DF
+{
+  double a[4];
+  short s1;
+  short s2;
+  short s3;
+  short s4;
+} DF;
+typedef struct SF
+{
+  float a[4];
+  int i1;
+  int i2;
+} SF;
+
+/* { dg-final { scan-assembler-times {\mmtvsrd|mtvsrws\M} 3 {target { lp64 && has_arch_pwr8 } } } } */
+/* { dg-final { scan-assembler-not {\mlwz\M} {target { lp64 && has_arch_pwr8 } } } } */
+/* { dg-final { scan-assembler-not {\mlhz\M} {target { lp64 && has_arch_pwr8 } } } } */
+
+#define NOIPA __attribute__ ((noipa))
+
+short NOIPA
+foo_hi (DF a, int flag)
+{
+  if (flag == 2)
+    return a.s2 + a.s3;
+  return 0;
+}
+int NOIPA
+foo_si (SF a, int flag)
+{
+  if (flag == 2)
+    return a.i2 + a.i1;
+  return 0;
+}
+double NOIPA
+foo_df (DF arg, int flag)
+{
+  if (flag == 2)
+    return arg.a[3];
+  else
+    return 0.0;
+}
+float NOIPA
+foo_sf (SF arg, int flag)
+{
+  if (flag == 2)
+    return arg.a[2];
+  return 0;
+}
+float NOIPA
+foo_sf1 (SF arg, int flag)
+{
+  if (flag == 2)
+    return arg.a[1];
+  return 0;
+}
+
+DF gdf = {{1.0, 2.0, 3.0, 4.0}, 1, 2, 3, 4};
+SF gsf = {{1.0f, 2.0f, 3.0f, 4.0f}, 1, 2};
+
+int
+main ()
+{
+  if (!(foo_hi (gdf, 2) == 5 && foo_si (gsf, 2) == 3 && foo_df (gdf, 2) == 4.0
+        && foo_sf (gsf, 2) == 3.0 && foo_sf1 (gsf, 2) == 2.0))
+    __builtin_abort ();
+  if (!(foo_hi (gdf, 1) == 0 && foo_si (gsf, 1) == 0 && foo_df (gdf, 1) == 0
+        && foo_sf (gsf, 1) == 0 && foo_sf1 (gsf, 1) == 0))
+    __builtin_abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/pr108073.c b/gcc/testsuite/gcc.target/powerpc/pr108073.c
new file mode 100644
index 00000000000..4e7feaa6810
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr108073.c
@@ -0,0 +1,74 @@
+/* { dg-do run } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-O2 -save-temps" } */
+
+/* { dg-final { scan-assembler-times {\mmtvsrd|mtvsrws\M} 5 {target { lp64 && { has_arch_pwr8 && be } } } } } */
+/* { dg-final { scan-assembler-times {\mxscvspdpn\M} 4 {target { lp64 && { has_arch_pwr8 && be } } } } } */
+/* { dg-final { scan-assembler-times {\mmtvsrd|mtvsrws\M} 3 {target { lp64 && { has_arch_pwr8 && le } } } } } */
+/* { dg-final { scan-assembler-times {\mxscvspdpn\M} 2 {target { lp64 && { has_arch_pwr8 && le } } } } } */
+/* { dg-final { scan-assembler-times {\mfadds\M} 2 {target { lp64 && has_arch_pwr8 } } } } */
+
+#define NOIPA __attribute__ ((noipa))
+typedef struct X
+{
+  float x;
+  float y;
+} X;
+
+float NOIPA
+fooX (X y)
+{
+  y.x += 1;
+  return y.x + y.y;
+}
+
+typedef struct Y
+{
+  double a[4];
+  long l;
+} Y;
+
+double NOIPA
+fooY (Y arg)
+{
+  return arg.a[3];
+}
+
+typedef struct Z
+{
+  float a[4];
+  short l;
+} Z;
+
+float NOIPA
+fooZ (Z arg)
+{
+  return arg.a[3];
+}
+
+float NOIPA
+fooZ2 (Z arg)
+{
+  return arg.a[2];
+}
+
+X x = {1.0f, 2.0f};
+Y y = {1.0, 2.0, 3.0, 4.0, 1};
+Z z = {1.0f, 2.0f, 3.0f, 4.0f, 1};
+int
+main ()
+{
+  if (fooX (x) != 4.0f)
+    __builtin_abort ();
+
+  if (fooY (y) != 4.0)
+    __builtin_abort ();
+
+  if (fooZ (z) != 4.0f)
+    __builtin_abort ();
+
+  if (fooZ2 (z) != 3.0f)
+    __builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/pr69143.c b/gcc/testsuite/gcc.target/powerpc/pr69143.c
new file mode 100644
index 00000000000..216a270fb7b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr69143.c
@@ -0,0 +1,23 @@
+/* { dg-require-effective-target hard_float } */
+/* { dg-require-effective-target powerpc_elfv2 } */
+/* { dg-options "-O2" } */
+
+/* { dg-final { scan-assembler-times {\mfmr\M} 3 {target { lp64 && has_arch_pwr8 } } } } */
+/* { dg-final { scan-assembler-not {\mxscvspdpn\M} {target { lp64 && has_arch_pwr8 } } } } */
+
+struct foo1
+{
+  float x;
+  float y;
+};
+
+struct foo1
+blah1 (struct foo1 y)
+{
+  struct foo1 x;
+
+  x.x = y.y;
+  x.y = y.x;
+
+  return x;
+}
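
For review purposes, the register lookup done by
query_position_in_parallel in the internal-fn.cc hunk above can be
modelled outside of GCC.  The following is only a sketch under stated
assumptions: `struct reg_slot', `struct reg_span' and `find_reg_span'
are hypothetical names standing in for the PARALLEL elements and the
four output parameters, and BITS_PER_UNIT is assumed to be 8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one element of a PARALLEL: a register that
   holds SIZE_BITS bits of the aggregate starting at byte OFFSET_BYTES.  */
struct reg_slot
{
  long offset_bytes;
  long size_bits;
};

/* Result of the lookup; the indices stay -1 if nothing matched.  */
struct reg_span
{
  int start_index;
  int end_index;
  long left_bits;
  long right_bits;
};

/* Same walk as query_position_in_parallel, but on a plain array.
   Assumes 8 bits per byte.  */
struct reg_span
find_reg_span (const struct reg_slot *slots, size_t n,
               long bitpos, long bitsize)
{
  struct reg_span span = { -1, -1, 0, 0 };
  for (size_t i = 0; i < n; i++)
    {
      long off = slots[i].offset_bytes * 8;
      long size = slots[i].size_bits;
      /* The slot containing the first accessed bit.  */
      if (off <= bitpos && off + size > bitpos)
        {
          span.start_index = (int) i;
          span.left_bits = bitpos - off;
        }
      /* The slot containing the last accessed bit.  */
      if (off + size >= bitpos + bitsize)
        {
          span.end_index = (int) i;
          span.right_bits = off + size - (bitpos + bitsize);
          break;
        }
    }
  return span;
}
```

With two 64-bit slots at byte offsets 0 and 8, an access at
{bitpos 96, bitsize 32} lands entirely in slot 1, while
{bitpos 32, bitsize 64} yields start_index 0 and end_index 1.  The
latter is the multi-register case that get_incoming_reg and
get_outgoing_reg currently reject by returning NULL_RTX.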