From patchwork Fri Oct 27 01:50:36 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jiufu Guo X-Patchwork-Id: 1856017 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=ibm.com header.i=@ibm.com header.a=rsa-sha256 header.s=pp1 header.b=C1PEIKou; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4SGlxm35Ntz1ypX for ; Fri, 27 Oct 2023 12:51:06 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 5BCB23857707 for ; Fri, 27 Oct 2023 01:51:04 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id DA5BD3858D35; Fri, 27 Oct 2023 01:50:48 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org DA5BD3858D35 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linux.ibm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org DA5BD3858D35 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=148.163.156.1 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1698371453; cv=none; b=Y94/a7yO2MRBDggaHwIRqBPm5nSTUJjjKmpKre8AvVeEepdfc2pByGr+Bfszyq26j5OaRia168ZcSsoHJWE7Tumdmnm0c4osCFoz0IA6KxQYliMGL/WH2Mm1WsNR4xaRIc23ECTOa0mHXjXm1u0kavWcZ5ERd+pO4kgyyNOQZ58= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1698371453; c=relaxed/simple; bh=w703gRcNH/o+kaGhNu9FAc+i89it7dznNH4F7FyXOQM=; h=DKIM-Signature:From:To:Subject:Date:Message-Id:MIME-Version; b=mF7CXs6hgdbegnqqbsyGVJ+lDyKSNFekrbKylL5E6LhUlOVPp0qIZBVTCRKv0TqfwNrXBLf7DrfGWSEAEOOrdRihJnEWeSLHKp001kFIcgXm1nvZxs3M856+mOsPFL3WWsz5tdsJ36nPVxbKXfIPW5Gp+IJfsTS4RrBqz/MCw24= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from pps.filterd (m0360083.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 39R1Ci3P027034; Fri, 27 Oct 2023 01:50:45 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : date : message-id : content-transfer-encoding : mime-version; s=pp1; bh=JaV3Qeps/ocKZeP/QJG2fP1ShHZRskDdalVBTQ7pHls=; b=C1PEIKou/S6jlm2OYkhoUmQhWfqiHQSR2Ud9eWbHMGCWCQmYhXtyvjAvP2fty6bDBXvR /nqYGA4VCVfVQvkHqnLnua0h8STfSnjU3JoCoSl8YJfHbJF2pIJudQs87bXGRdIRsH5/ xCz9+X7oa9r2AEZsvn0jSh1KKB5VME47Q2ATYf89LQgcadKMew6AeEmSpJbSwIdJoPbo CjZSJ+H7wJciAXWpx+VSagApTXK5G9NJ/sE41BSWIadE0AVHb9CJE3Lax1D66ZFwkW9Q oRpWg9dLH4vHIN7Rz83Ub8pe0vMWOTpDBv9hqdz5qlJP9ArNHwXcxAckWdsSKpyzZnW+ iA== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3u03bd8vfw-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 27 Oct 2023 01:50:44 +0000 Received: from m0360083.ppops.net (m0360083.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 39R1d48H003338; Fri, 27 Oct 2023 01:50:44 GMT Received: from ppma21.wdc07v.mail.ibm.com (5b.69.3da9.ip4.static.sl-reverse.com [169.61.105.91]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3u03bd8vfc-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 27 Oct 2023 01:50:43 +0000 Received: from pps.filterd (ppma21.wdc07v.mail.ibm.com [127.0.0.1]) by ppma21.wdc07v.mail.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 39R0uUkO021017; Fri, 27 Oct 2023 01:50:42 GMT Received: from smtprelay04.fra02v.mail.ibm.com ([9.218.2.228]) by ppma21.wdc07v.mail.ibm.com (PPS) with ESMTPS id 3tywqr9xgk-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 27 Oct 2023 01:50:42 +0000 Received: from smtpav05.fra02v.mail.ibm.com (smtpav05.fra02v.mail.ibm.com [10.20.54.104]) by smtprelay04.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 39R1ocPF45351434 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 27 Oct 2023 01:50:39 GMT Received: from smtpav05.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id C7B5F20040; Fri, 27 Oct 2023 01:50:38 +0000 (GMT) Received: from smtpav05.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 058562004B; Fri, 27 Oct 2023 01:50:37 +0000 (GMT) Received: from genoa.aus.stglabs.ibm.com (unknown [9.40.192.157]) by smtpav05.fra02v.mail.ibm.com (Postfix) with ESMTP; Fri, 27 Oct 2023 01:50:36 +0000 (GMT) From: Jiufu Guo To: gcc-patches@gcc.gnu.org Cc: rguenther@suse.de, jeffreyalaw@gmail.com, richard.sandiford@arm.com, steven@gcc.gnu.org, segher@kernel.crashing.org, dje.gcc@gmail.com, linkw@gcc.gnu.org, bergner@linux.ibm.com, guojiufu@linux.ibm.com Subject: [PATCH V2] introduce light expander sra Date: Fri, 27 Oct 2023 09:50:36 +0800 Message-Id: <20231027015036.3868319-1-guojiufu@linux.ibm.com> X-Mailer: git-send-email 2.25.1 X-TM-AS-GCONF: 00 X-Proofpoint-GUID: XeIqqh-Yg-oa8iQGsvqm5LSKGBxtyo7x X-Proofpoint-ORIG-GUID: fT83fBqIFjaee7Z3ASDH-myd7VVBruKl X-Proofpoint-UnRewURL: 0 URL was un-rewritten MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.272,Aquarius:18.0.987,Hydra:6.0.619,FMLib:17.11.176.26 definitions=2023-10-26_22,2023-10-26_01,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 clxscore=1011 suspectscore=0 mlxlogscore=999 priorityscore=1501 bulkscore=0 adultscore=0 phishscore=0 malwarescore=0 spamscore=0 impostorscore=0 mlxscore=0 lowpriorityscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2310240000 definitions=main-2310270014 X-Spam-Status: No, score=-10.8 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Hi, Compare with previous version: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632399.html This verion supports TI/VEC mode of the access. There are a few PRs (meta-bug PR101926) on various targets. The root causes of them are similar: the aggeragte param/ returns are passed by multi-registers, but they are stored to stack from registers first; and then, access the parameter through stack slot. A general idea to enhance this: accessing the aggregate parameters/returns directly through incoming/outgoing scalar registers. This idea would be a kind of SRA. This experimental patch for light-expander-sra contains below parts: a. Check if the parameters/returns are ok/profitable to scalarize, and set the incoming/outgoing registers( pseudos) for the parameter/return. - This is done in "expand_function_start", after the incoming/outgoing hard registers are determined for the paramter/return. The scalarized registers are recorded in DECL_RTL for the parameter/return in parallel form. - At the time when setting DECL_RTL, "scalarizable_aggregate" is called to check the accesses are ok/profitable to scalarize. We can continue to enhance this function, to support more cases. For example: - 'reverse storage order'. - 'writing to parameter'/'overlap accesses'. b. When expanding the accesses of the parameters/returns, according to the info of the access(e.g. bitpos,bitsize, mode), the scalar(pseudos) can be figured out to expand the access. This may happen when expand below accesses: - The component access of a parameter: "_1 = arg.f1". Or whole parameter access: rhs of "_2 = arg" - The assignment to a return val: "D.xx = yy; or D.xx.f = zz" where D.xx occurs on return stmt. - This is mainly done in expr.cc(expand_expr_real_1, and expand_assignment). Function "extract_sub_member" is used to figure out the scalar rtxs(pseudos). Besides the above two parts, some work are done in the GIMPLE tree: collect sra candidates for parameters/returns, and collect the SRA access info. This is mainly done at the beginning of the expander pass. Below are two major items of this part. - Collect light-expand-sra candidates. Each parameter is checked if it has the proper aggregate type. Collect return val (VAR_P) on each return stmts if the function is returning via registers. This is implemented in expand_sra::collect_sra_candidates. - Build/collect/manage all the access on the candidates. The function "scan_function" is used to do this work, it goes through all basicblocks, and all interesting stmts ( phi, return, assign, call, asm) are checked. If there is an interesting expression (e.g. COMPONENT_REF or PARM_DECL), then record the required info for the access (e.g. pos, size, type, base). And if it is risky to do SRA, the candidates may be removed. e.g. address-taken and accessed via memory. "foo(struct S arg) {bar (&arg);}" This patch is tested on ppc64{,le} and x86_64. Is this ok for trunk? BR, Jeff (Jiufu Guo) PR target/65421 gcc/ChangeLog: * cfgexpand.cc (struct access): New class. (struct expand_sra): New class. (expand_sra::collect_sra_candidates): New member function. (expand_sra::add_sra_candidate): Likewise. (expand_sra::build_access): Likewise. (expand_sra::analyze_phi): Likewise. (expand_sra::analyze_assign): Likewise. (expand_sra::visit_base): Likewise. (expand_sra::protect_mem_access_in_stmt): Likewise. (expand_sra::expand_sra): Class constructor. (expand_sra::~expand_sra): Class destructor. (expand_sra::scalarizable_access): New member function. (expand_sra::scalarizable_accesses): Likewise. (scalarizable_aggregate): New function. (set_scalar_rtx_for_returns): New function. (expand_value_return): Updated. (expand_debug_expr): Updated. (pass_expand::execute): Updated to use expand_sra. * cfgexpand.h (scalarizable_aggregate): New declare. (set_scalar_rtx_for_returns): New declare. * expr.cc (expand_assignment): Updated. (expand_constructor): Updated. (query_position_in_parallel): New function. (extract_sub_member): New function. (expand_expr_real_1): Updated. * expr.h (query_position_in_parallel): New declare. * function.cc (assign_parm_setup_block): Updated. (assign_parms): Updated. (expand_function_start): Updated. * tree-sra.h (struct sra_base_access): New class. (struct sra_default_analyzer): New class. (scan_function): New template function. * var-tracking.cc (track_loc_p): Updated. gcc/testsuite/ChangeLog: * g++.target/powerpc/pr102024.C: Updated * gcc.target/powerpc/pr108073.c: New test. * gcc.target/powerpc/pr65421-1.c: New test. * gcc.target/powerpc/pr65421-2.c: New test. --- gcc/cfgexpand.cc | 352 ++++++++++++++++++- gcc/cfgexpand.h | 2 + gcc/expr.cc | 179 +++++++++- gcc/expr.h | 3 + gcc/function.cc | 36 +- gcc/tree-sra.h | 76 ++++ gcc/var-tracking.cc | 3 +- gcc/testsuite/g++.target/powerpc/pr102024.C | 2 +- gcc/testsuite/gcc.target/i386/pr20020-2.c | 5 + gcc/testsuite/gcc.target/powerpc/pr108073.c | 29 ++ gcc/testsuite/gcc.target/powerpc/pr65421-1.c | 6 + gcc/testsuite/gcc.target/powerpc/pr65421-2.c | 32 ++ 12 files changed, 718 insertions(+), 7 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073.c create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-1.c create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-2.c diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc index 4262703a138..ef99ca8ac13 100644 --- a/gcc/cfgexpand.cc +++ b/gcc/cfgexpand.cc @@ -74,6 +74,7 @@ along with GCC; see the file COPYING3. If not see #include "output.h" #include "builtins.h" #include "opts.h" +#include "tree-sra.h" /* Some systems use __main in a way incompatible with its use in gcc, in these cases use the macros NAME__MAIN to give a quoted symbol and SYMBOL__MAIN to @@ -97,6 +98,343 @@ static bool defer_stack_allocation (tree, bool); static void record_alignment_for_reg_var (unsigned int); +/* For light SRA in expander about paramaters and returns. */ +struct access : public sra_base_access +{ +}; + +typedef struct access *access_p; + +struct expand_sra : public sra_default_analyzer +{ + /* Construct/destruct resources, e.g. sra candidates. */ + expand_sra (); + ~expand_sra (); + + /* No actions for pre_analyze_stmt, analyze_return. */ + + /* Overwrite phi,call,asm analyzations. */ + void analyze_phi (gphi *s); + + /* TODO: Check accesses on call/asm. */ + void analyze_call (gcall *s) { protect_mem_access_in_stmt (s); }; + void analyze_asm (gasm *s) { protect_mem_access_in_stmt (s); }; + + /* Check access of SRA on assignment. */ + void analyze_assign (gassign *); + + /* Check if the accesses of BASE(parameter or return) are + scalarizable, according to the incoming/outgoing REGS. */ + bool scalarizable_accesses (tree base, rtx regs); + +private: + /* Collect the parameter and returns to check if they are suitable for + scalarization. */ + bool collect_sra_candidates (void); + + /* Return true if VAR is added as a candidate for SRA. */ + bool add_sra_candidate (tree var); + + /* Return true if EXPR has interesting sra access, and created access, + return false otherwise. */ + access_p build_access (tree expr, bool write); + + /* Check if the access ACC is scalarizable. REGS is the incoming/outgoing + registers which the access is based on. */ + bool scalarizable_access (access_p acc, rtx regs, bool is_parm); + + /* If there is risk (stored/loaded or addr taken), + disqualify the sra candidates in the un-interesting STMT. */ + void protect_mem_access_in_stmt (gimple *stmt); + + /* Callback of walk_stmt_load_store_addr_ops, used to remove + unscalarizable accesses. */ + static bool visit_base (gimple *, tree op, tree, void *data); + + /* Base (tree) -> Vector (vec *) map. */ + hash_map > *base_access_vec; +}; + +bool +expand_sra::collect_sra_candidates (void) +{ + bool ret = false; + + /* Collect parameters. */ + for (tree parm = DECL_ARGUMENTS (current_function_decl); parm; + parm = DECL_CHAIN (parm)) + ret |= add_sra_candidate (parm); + + /* Collect VARs on returns. */ + if (DECL_RESULT (current_function_decl)) + { + edge_iterator ei; + edge e; + FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds) + if (greturn *r = safe_dyn_cast (*gsi_last_bb (e->src))) + { + tree val = gimple_return_retval (r); + /* To sclaraized the return, the return value should be only + writen (except this return stmt). Using 'true(write)' to + pretend the access only be 'writen'. */ + if (val && VAR_P (val)) + ret |= add_sra_candidate (val) && build_access (val, true); + } + } + + return ret; +} + +bool +expand_sra::add_sra_candidate (tree var) +{ + tree type = TREE_TYPE (var); + + if (!AGGREGATE_TYPE_P (type) || !tree_fits_shwi_p (TYPE_SIZE (type)) + || tree_to_shwi (TYPE_SIZE (type)) == 0 || TREE_THIS_VOLATILE (var) + || is_va_list_type (type)) + return false; + + base_access_vec->get_or_insert (var); + + return true; +} + +access_p +expand_sra::build_access (tree expr, bool write) +{ + enum tree_code code = TREE_CODE (expr); + if (code != VAR_DECL && code != PARM_DECL && code != COMPONENT_REF + && code != ARRAY_REF && code != ARRAY_RANGE_REF) + return NULL; + + HOST_WIDE_INT offset, size; + bool reverse; + tree base = get_ref_base_and_extent_hwi (expr, &offset, &size, &reverse); + if (!base || !DECL_P (base)) + return NULL; + + vec *access_vec = base_access_vec->get (base); + if (!access_vec) + return NULL; + + /* TODO: support reverse. */ + if (reverse || size <= 0 || offset + size > tree_to_shwi (DECL_SIZE (base))) + { + base_access_vec->remove (base); + return NULL; + } + + struct access *access = XNEWVEC (struct access, 1); + + memset (access, 0, sizeof (struct access)); + access->offset = offset; + access->size = size; + access->expr = expr; + access->write = write; + access->reverse = reverse; + + access_vec->safe_push (access); + + return access; +} + +/* Function protect_mem_access_in_stmt removes the SRA candidates if + there is addr-taken on the candidate in the STMT. */ + +void +expand_sra::analyze_phi (gphi *stmt) +{ + if (base_access_vec && !base_access_vec->is_empty ()) + walk_stmt_load_store_addr_ops (stmt, this, NULL, NULL, visit_base); +} + +void +expand_sra::analyze_assign (gassign *stmt) +{ + if (!base_access_vec || base_access_vec->is_empty ()) + return; + + if (gimple_assign_single_p (stmt) && !gimple_clobber_p (stmt)) + { + tree rhs = gimple_assign_rhs1 (stmt); + tree lhs = gimple_assign_lhs (stmt); + bool res_r = build_access (rhs, false); + bool res_l = build_access (lhs, true); + + if (res_l || res_r) + return; + } + + protect_mem_access_in_stmt (stmt); +} + +/* Callback of walk_stmt_load_store_addr_ops, used to remove + unscalarizable accesses. Called by protect_mem_access_in_stmt. */ + +bool +expand_sra::visit_base (gimple *, tree op, tree, void *data) +{ + op = get_base_address (op); + if (op && DECL_P (op)) + { + expand_sra *p = (expand_sra *) data; + p->base_access_vec->remove (op); + } + return false; +} + +/* Function protect_mem_access_in_stmt removes the SRA candidates if + there is store/load/addr-taken on the candidate in the STMT. + + For some statements, which SRA does not care about, if there are + possible memory operation on the SRA candidates, it would be risky + to scalarize it. */ + +void +expand_sra::protect_mem_access_in_stmt (gimple *stmt) +{ + if (base_access_vec && !base_access_vec->is_empty ()) + walk_stmt_load_store_addr_ops (stmt, this, visit_base, visit_base, + visit_base); +} + +expand_sra::expand_sra () : base_access_vec (NULL) +{ + if (optimize <= 0) + return; + + base_access_vec = new hash_map >; + collect_sra_candidates (); +} + +expand_sra::~expand_sra () +{ + if (optimize <= 0) + return; + + delete base_access_vec; +} + +bool +expand_sra::scalarizable_access (access_p acc, rtx regs, bool is_parm) +{ + /* Now only support reading from parms + or writing to returns. */ + if (is_parm && acc->write) + return false; + if (!is_parm && !acc->write) + return false; + + /* Compute the position of the access in the parallel regs. */ + int start_index = -1; + int end_index = -1; + HOST_WIDE_INT left_bits = 0; + HOST_WIDE_INT right_bits = 0; + query_position_in_parallel (acc->offset, acc->size, regs, start_index, + end_index, left_bits, right_bits); + + /* Invalid access possition: padding or outof bound. */ + if (start_index < 0 || end_index < 0) + return false; + + machine_mode expr_mode = TYPE_MODE (TREE_TYPE (acc->expr)); + /* Need multi-registers in a parallel for the access. */ + if (expr_mode == BLKmode || end_index > start_index) + { + if (left_bits || right_bits) + return false; + if (expr_mode == BLKmode) + return true; + + /* For large modes, only support TI/VECTOR in mult-registers. */ + if (known_gt (acc->size, GET_MODE_BITSIZE (word_mode))) + return expr_mode == TImode || VECTOR_MODE_P (expr_mode); + return true; + } + + gcc_assert (end_index == start_index); + + /* Just need one reg for the access. */ + if (left_bits == 0 && right_bits == 0) + return true; + + scalar_int_mode imode; + /* Need to extract bits from the reg for the access. */ + return !acc->write && int_mode_for_mode (expr_mode).exists (&imode); +} + +/* Now, the base (parm/return) is scalarizable, only if all + accesses of the BASE are scalariable. + + This function need to be updated, to support more complicate + cases, like: + - Some access are scalarizable, but some are not. + - Access is writing to a parameter. + - Writing accesses are overlap with multi-accesses. */ + +bool +expand_sra::scalarizable_accesses (tree base, rtx regs) +{ + if (!base_access_vec) + return false; + vec *access_vec = base_access_vec->get (base); + if (!access_vec) + return false; + if (access_vec->is_empty ()) + return false; + + bool is_parm = TREE_CODE (base) == PARM_DECL; + int n = access_vec->length (); + int cur_access_index = 0; + for (; cur_access_index < n; cur_access_index++) + if (!scalarizable_access ((*access_vec)[cur_access_index], regs, is_parm)) + break; + + /* It is ok if all access are scalarizable. */ + if (cur_access_index == n) + return true; + + base_access_vec->remove (base); + return false; +} + +static expand_sra *current_sra = NULL; + +/* Check if the PARAM (or return) is scalarizable. + + This interface is used in expand_function_start + to check sra possiblity for parmeters. */ + +bool +scalarizable_aggregate (tree parm, rtx regs) +{ + if (!current_sra) + return false; + return current_sra->scalarizable_accesses (parm, regs); +} + +/* Check if interesting returns, and if they are scalarizable, + set DECL_RTL as scalar registers. + + This interface is used in expand_function_start + when outgoing registers are determinded for DECL_RESULT. */ + +void +set_scalar_rtx_for_returns () +{ + rtx res = DECL_RTL (DECL_RESULT (current_function_decl)); + edge_iterator ei; + edge e; + FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds) + if (greturn *r = safe_dyn_cast (*gsi_last_bb (e->src))) + { + tree val = gimple_return_retval (r); + if (val && VAR_P (val) && scalarizable_aggregate (val, res)) + SET_DECL_RTL (val, res); + } +} + /* Return an expression tree corresponding to the RHS of GIMPLE statement STMT. */ @@ -3707,7 +4045,7 @@ expand_value_return (rtx val) tree decl = DECL_RESULT (current_function_decl); rtx return_reg = DECL_RTL (decl); - if (return_reg != val) + if (!rtx_equal_p (return_reg, val)) { tree funtype = TREE_TYPE (current_function_decl); tree type = TREE_TYPE (decl); @@ -4423,6 +4761,12 @@ expand_debug_expr (tree exp) addr_space_t as; scalar_int_mode op0_mode, op1_mode, addr_mode; + /* TODO: Enable to debug expand-sra optimized parm/returns. */ + tree base = get_base_address (exp); + if ((TREE_CODE (base) == PARM_DECL || (VAR_P (base) && DECL_RTL_SET_P (base))) + && GET_CODE (DECL_RTL (base)) == PARALLEL) + return NULL_RTX; + switch (TREE_CODE_CLASS (TREE_CODE (exp))) { case tcc_expression: @@ -6628,6 +6972,10 @@ pass_expand::execute (function *fun) auto_bitmap forced_stack_vars; discover_nonconstant_array_refs (forced_stack_vars); + /* Enable light-expander-sra. */ + current_sra = new expand_sra; + scan_function (cfun, *current_sra); + /* Make sure all values used by the optimization passes have sane defaults. */ reg_renumber = 0; @@ -7056,6 +7404,8 @@ pass_expand::execute (function *fun) loop_optimizer_finalize (); } + delete current_sra; + current_sra = NULL; timevar_pop (TV_POST_EXPAND); return 0; diff --git a/gcc/cfgexpand.h b/gcc/cfgexpand.h index 0e551f6cfd3..3415c217708 100644 --- a/gcc/cfgexpand.h +++ b/gcc/cfgexpand.h @@ -24,5 +24,7 @@ extern tree gimple_assign_rhs_to_tree (gimple *); extern HOST_WIDE_INT estimated_stack_frame_size (struct cgraph_node *); extern void set_parm_rtl (tree, rtx); +extern bool scalarizable_aggregate (tree, rtx); +extern void set_scalar_rtx_for_returns (); #endif /* GCC_CFGEXPAND_H */ diff --git a/gcc/expr.cc b/gcc/expr.cc index 763bd82c59f..5ba26e0ef52 100644 --- a/gcc/expr.cc +++ b/gcc/expr.cc @@ -5618,7 +5618,10 @@ expand_assignment (tree to, tree from, bool nontemporal) Assignment of an array element at a constant index, and assignment of an array element in an unaligned packed structure field, has the same problem. Same for (partially) storing into a non-memory object. */ - if (handled_component_p (to) + if ((handled_component_p (to) + && !(VAR_P (get_base_address (to)) + && DECL_RTL_SET_P (get_base_address (to)) + && GET_CODE (DECL_RTL (get_base_address (to))) == PARALLEL)) || (TREE_CODE (to) == MEM_REF && (REF_REVERSE_STORAGE_ORDER (to) || mem_ref_refers_to_non_mem_p (to))) @@ -8909,6 +8912,19 @@ expand_constructor (tree exp, rtx target, enum expand_modifier modifier, && ! mostly_zeros_p (exp)) return NULL_RTX; + if (target && GET_CODE (target) == PARALLEL && all_zeros_p (exp)) + { + int length = XVECLEN (target, 0); + int start = XEXP (XVECEXP (target, 0, 0), 0) ? 0 : 1; + for (int i = start; i < length; i++) + { + rtx dst = XEXP (XVECEXP (target, 0, i), 0); + rtx zero = CONST0_RTX (GET_MODE (dst)); + emit_move_insn (dst, zero); + } + return target; + } + /* Handle calls that pass values in multiple non-contiguous locations. The Irix 6 ABI has examples of this. */ if (target == 0 || ! safe_from_p (target, exp, 1) @@ -10621,6 +10637,157 @@ stmt_is_replaceable_p (gimple *stmt) return false; } +/* In the parallel rtx register series REGS, compute the position for given + {BITPOS, BITSIZE}. + START_INDEX, END_INDEX, LEFT_BITS and RIGHT_BITS are computed outputs. */ + +void +query_position_in_parallel (HOST_WIDE_INT bitpos, HOST_WIDE_INT bitsize, + rtx regs, int &start_index, int &end_index, + HOST_WIDE_INT &left_bits, HOST_WIDE_INT &right_bits) +{ + int cur_index = XEXP (XVECEXP (regs, 0, 0), 0) ? 0 : 1; + for (; cur_index < XVECLEN (regs, 0); cur_index++) + { + rtx slot = XVECEXP (regs, 0, cur_index); + HOST_WIDE_INT off = UINTVAL (XEXP (slot, 1)) * BITS_PER_UNIT; + machine_mode mode = GET_MODE (XEXP (slot, 0)); + HOST_WIDE_INT size = GET_MODE_BITSIZE (mode).to_constant (); + if (off <= bitpos && off + size > bitpos) + { + start_index = cur_index; + left_bits = bitpos - off; + } + if (off + size >= bitpos + bitsize) + { + end_index = cur_index; + right_bits = off + size - (bitpos + bitsize); + break; + } + } +} + +static rtx +extract_sub_member (rtx regs, HOST_WIDE_INT bitpos, HOST_WIDE_INT bitsize, + tree expr) +{ + int start_index = -1; + int end_index = -1; + HOST_WIDE_INT left_bits = 0; + HOST_WIDE_INT right_bits = 0; + query_position_in_parallel (bitpos, bitsize, regs, start_index, end_index, + left_bits, right_bits); + + machine_mode expr_mode = TYPE_MODE (TREE_TYPE (expr)); + if (end_index > start_index || expr_mode == BLKmode) + { + /* TImode in multi-registers. */ + if (expr_mode == TImode) + { + rtx res = gen_reg_rtx (expr_mode); + HOST_WIDE_INT start; + start = UINTVAL (XEXP (XVECEXP (regs, 0, start_index), 1)); + for (int index = start_index; index <= end_index; index++) + { + rtx reg = XEXP (XVECEXP (regs, 0, index), 0); + machine_mode mode = GET_MODE (reg); + HOST_WIDE_INT off; + off = UINTVAL (XEXP (XVECEXP (regs, 0, index), 1)) - start; + rtx sub = simplify_gen_subreg (mode, res, expr_mode, off); + emit_move_insn (sub, reg); + } + return res; + } + + /* Vector in multi-registers. */ + if (VECTOR_MODE_P (expr_mode)) + { + rtvec vector = rtvec_alloc (end_index - start_index + 1); + machine_mode emode; + emode = GET_MODE (XEXP (XVECEXP (regs, 0, start_index), 0)); + for (int index = start_index; index <= end_index; index++) + { + rtx reg = XEXP (XVECEXP (regs, 0, index), 0); + gcc_assert (emode == GET_MODE (reg)); + RTVEC_ELT (vector, index - start_index) = reg; + } + scalar_int_mode imode; + machine_mode vmode; + int nunits = end_index - start_index + 1; + if (!(int_mode_for_mode (emode).exists (&imode) + && mode_for_vector (imode, nunits).exists (&vmode))) + gcc_unreachable (); + + insn_code icode; + icode = convert_optab_handler (vec_init_optab, vmode, imode); + rtx res = gen_reg_rtx (vmode); + emit_insn (GEN_FCN (icode) (res, gen_rtx_PARALLEL (vmode, vector))); + if (expr_mode == vmode) + return res; + return simplify_gen_subreg (expr_mode, res, vmode, 0); + } + + /* Need multi-registers in a parallel for the access. */ + int num_words = end_index - start_index + 1; + rtx *tmps = XALLOCAVEC (rtx, num_words); + + int pos = 0; + HOST_WIDE_INT start; + start = UINTVAL (XEXP (XVECEXP (regs, 0, start_index), 1)); + /* Extract whole registers. */ + for (; pos < num_words; pos++) + { + int index = start_index + pos; + rtx reg = XEXP (XVECEXP (regs, 0, index), 0); + machine_mode mode = GET_MODE (reg); + HOST_WIDE_INT off; + off = UINTVAL (XEXP (XVECEXP (regs, 0, index), 1)) - start; + tmps[pos] = gen_rtx_EXPR_LIST (mode, reg, GEN_INT (off)); + } + + rtx res = gen_rtx_PARALLEL (expr_mode, gen_rtvec_v (pos, tmps)); + return res; + } + + gcc_assert (end_index == start_index); + + /* Just need one reg for the access. */ + if (left_bits == 0 && right_bits == 0) + { + rtx reg = XEXP (XVECEXP (regs, 0, start_index), 0); + if (GET_MODE (reg) != expr_mode) + reg = gen_lowpart (expr_mode, reg); + return reg; + } + + /* Need to extract bitfield part reg for the access. + left_bits != 0 or right_bits != 0 */ + rtx reg = XEXP (XVECEXP (regs, 0, start_index), 0); + bool sgn = TYPE_UNSIGNED (TREE_TYPE (expr)); + scalar_int_mode imode; + if (!int_mode_for_mode (expr_mode).exists (&imode)) + { + gcc_assert (false); + return NULL_RTX; + } + + machine_mode mode = GET_MODE (reg); + bool reverse = false; + rtx bfld = extract_bit_field (reg, bitsize, left_bits, sgn, NULL_RTX, mode, + imode, reverse, NULL); + + if (GET_MODE (bfld) != imode) + bfld = gen_lowpart (imode, bfld); + + if (expr_mode == imode) + return bfld; + + /* expr_mode != imode, e.g. SF != SI. */ + rtx result = gen_reg_rtx (imode); + emit_move_insn (result, bfld); + return gen_lowpart (expr_mode, result); +} + rtx expand_expr_real_1 (tree exp, rtx target, machine_mode tmode, enum expand_modifier modifier, rtx *alt_rtl, @@ -11498,6 +11665,16 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode, = expand_expr_real (tem, tem_target, VOIDmode, tem_modifier, NULL, true); + /* It is scalarizable access on param which is passed by registers. */ + if (GET_CODE (op0) == PARALLEL + && (TREE_CODE (tem) == PARM_DECL || VAR_P (tem))) + { + HOST_WIDE_INT pos, size; + size = bitsize.to_constant (); + pos = bitpos.to_constant (); + return extract_sub_member (op0, pos, size, exp); + } + /* If the field has a mode, we want to access it in the field's mode, not the computed mode. If a MEM has VOIDmode (external with incomplete type), diff --git a/gcc/expr.h b/gcc/expr.h index 2a172867fdb..8a9332aaad6 100644 --- a/gcc/expr.h +++ b/gcc/expr.h @@ -362,5 +362,8 @@ extern rtx expr_size (tree); extern bool mem_ref_refers_to_non_mem_p (tree); extern bool non_mem_decl_p (tree); +extern void query_position_in_parallel (HOST_WIDE_INT, HOST_WIDE_INT, rtx, + int &, int &, HOST_WIDE_INT &, + HOST_WIDE_INT &); #endif /* GCC_EXPR_H */ diff --git a/gcc/function.cc b/gcc/function.cc index afb0b33da9e..518250b2728 100644 --- a/gcc/function.cc +++ b/gcc/function.cc @@ -3107,8 +3107,29 @@ assign_parm_setup_block (struct assign_parm_data_all *all, emit_move_insn (mem, entry_parm); } else - move_block_from_reg (REGNO (entry_parm), mem, - size_stored / UNITS_PER_WORD); + { + int regno = REGNO (entry_parm); + int nregs = size_stored / UNITS_PER_WORD; + rtx *tmps = XALLOCAVEC (rtx, nregs); + machine_mode mode = word_mode; + HOST_WIDE_INT word_size = GET_MODE_SIZE (mode).to_constant (); + for (int i = 0; i < nregs; i++) + { + rtx reg = gen_rtx_REG (mode, regno + i); + rtx off = GEN_INT (word_size * i); + tmps[i] = gen_rtx_EXPR_LIST (VOIDmode, reg, off); + } + + rtx regs = gen_rtx_PARALLEL (BLKmode, gen_rtvec_v (nregs, tmps)); + if (scalarizable_aggregate (parm, regs)) + { + rtx pseudos = gen_group_rtx (regs); + emit_group_move (pseudos, regs); + stack_parm = pseudos; + } + else + move_block_from_reg (regno, mem, nregs); + } } else if (data->stack_parm == 0 && !TYPE_EMPTY_P (data->arg.type)) { @@ -3710,7 +3731,15 @@ assign_parms (tree fndecl) assign_parm_adjust_stack_rtl (&data); - if (assign_parm_setup_block_p (&data)) + rtx incoming = DECL_INCOMING_RTL (parm); + if (GET_CODE (incoming) == PARALLEL + && scalarizable_aggregate (parm, incoming)) + { + rtx pseudos = gen_group_rtx (incoming); + emit_group_move (pseudos, incoming); + set_parm_rtl (parm, pseudos); + } + else if (assign_parm_setup_block_p (&data)) assign_parm_setup_block (&all, parm, &data); else if (data.arg.pass_by_reference || use_register_for_decl (parm)) assign_parm_setup_reg (&all, parm, &data); @@ -5128,6 +5157,7 @@ expand_function_start (tree subr) { gcc_assert (GET_CODE (hard_reg) == PARALLEL); set_parm_rtl (res, gen_group_rtx (hard_reg)); + set_scalar_rtx_for_returns (); } } diff --git a/gcc/tree-sra.h b/gcc/tree-sra.h index f20266c4622..df3071ccf6e 100644 --- a/gcc/tree-sra.h +++ b/gcc/tree-sra.h @@ -19,6 +19,82 @@ You should have received a copy of the GNU General Public License along with GCC; see the file COPYING3. If not see . */ +struct sra_base_access +{ + /* Values returned by get_ref_base_and_extent, indicates the + OFFSET, SIZE and BASE of the access. */ + HOST_WIDE_INT offset; + HOST_WIDE_INT size; + + /* The context expression of this access. */ + tree expr; + + /* Indicates this is a write access. */ + bool write : 1; + + /* Indicates if this access is made in reverse storage order. */ + bool reverse : 1; +}; + +/* Default template for sra_scan_function. */ + +struct sra_default_analyzer +{ + /* Template analyze functions. */ + void analyze_phi (gphi *){}; + void pre_analyze_stmt (gimple *){}; + void analyze_return (greturn *){}; + void analyze_assign (gassign *){}; + void analyze_call (gcall *){}; + void analyze_asm (gasm *){}; + void analyze_default_stmt (gimple *){}; +}; + +/* Scan function and look for interesting expressions. */ + +template +void +scan_function (struct function *fun, analyzer &a) +{ + basic_block bb; + FOR_EACH_BB_FN (bb, fun) + { + for (gphi_iterator gsi = gsi_start_phis (bb); !gsi_end_p (gsi); + gsi_next (&gsi)) + a.analyze_phi (gsi.phi ()); + + for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi); + gsi_next (&gsi)) + { + gimple *stmt = gsi_stmt (gsi); + a.pre_analyze_stmt (stmt); + + switch (gimple_code (stmt)) + { + case GIMPLE_RETURN: + a.analyze_return (as_a (stmt)); + break; + + case GIMPLE_ASSIGN: + a.analyze_assign (as_a (stmt)); + break; + + case GIMPLE_CALL: + a.analyze_call (as_a (stmt)); + break; + + case GIMPLE_ASM: + a.analyze_asm (as_a (stmt)); + break; + + default: + a.analyze_default_stmt (stmt); + break; + } + } + } +} + bool type_internals_preclude_sra_p (tree type, const char **msg); /* Return true iff TYPE is stdarg va_list type (which early SRA and IPA-SRA diff --git a/gcc/var-tracking.cc b/gcc/var-tracking.cc index d8dafa5481a..7fc801f2612 100644 --- a/gcc/var-tracking.cc +++ b/gcc/var-tracking.cc @@ -5352,7 +5352,8 @@ track_loc_p (rtx loc, tree expr, poly_int64 offset, bool store_reg_p, because the real and imaginary parts are represented as separate pseudo registers, even if the whole complex value fits into one hard register. */ - if ((paradoxical_subreg_p (mode, DECL_MODE (expr)) + if (((DECL_MODE (expr) != BLKmode + && paradoxical_subreg_p (mode, DECL_MODE (expr))) || (store_reg_p && !COMPLEX_MODE_P (DECL_MODE (expr)) && hard_regno_nregs (REGNO (loc), DECL_MODE (expr)) == 1)) diff --git a/gcc/testsuite/g++.target/powerpc/pr102024.C b/gcc/testsuite/g++.target/powerpc/pr102024.C index 769585052b5..c8995cae707 100644 --- a/gcc/testsuite/g++.target/powerpc/pr102024.C +++ b/gcc/testsuite/g++.target/powerpc/pr102024.C @@ -5,7 +5,7 @@ // Test that a zero-width bit field in an otherwise homogeneous aggregate // generates a psabi warning and passes arguments in GPRs. -// { dg-final { scan-assembler-times {\mstd\M} 4 } } +// { dg-final { scan-assembler-times {\mmtvsrd\M} 4 } } struct a_thing { diff --git a/gcc/testsuite/gcc.target/i386/pr20020-2.c b/gcc/testsuite/gcc.target/i386/pr20020-2.c index fa8cb2528c5..723f1826630 100644 --- a/gcc/testsuite/gcc.target/i386/pr20020-2.c +++ b/gcc/testsuite/gcc.target/i386/pr20020-2.c @@ -15,10 +15,15 @@ struct shared_ptr_struct }; typedef struct shared_ptr_struct sptr_t; +void foo (sptr_t *); + void copy_sptr (sptr_t *dest, sptr_t src) { *dest = src; + + /* Prevent 'src' to be scalarized as registers. */ + foo (&src); } /* { dg-final { scan-rtl-dump "\\\(set \\\(reg:TI \[0-9\]*" "expand" } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/pr108073.c b/gcc/testsuite/gcc.target/powerpc/pr108073.c new file mode 100644 index 00000000000..293bf93fb9a --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr108073.c @@ -0,0 +1,29 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -save-temps" } */ + +typedef struct DF {double a[4]; short s1; short s2; short s3; short s4; } DF; +typedef struct SF {float a[4]; int i1; int i2; } SF; + +/* { dg-final { scan-assembler-times {\mmtvsrd|mtvsrws\M} 3 {target { has_arch_ppc64 && has_arch_pwr8 } } } } */ +/* { dg-final { scan-assembler-not {\mlwz\M} {target { has_arch_ppc64 && has_arch_pwr8 } } } } */ +/* { dg-final { scan-assembler-not {\mlhz\M} {target { has_arch_ppc64 && has_arch_pwr8 } } } } */ +short __attribute__ ((noipa)) foo_hi (DF a, int flag){if (flag == 2)return a.s2+a.s3;return 0;} +int __attribute__ ((noipa)) foo_si (SF a, int flag){if (flag == 2)return a.i2+a.i1;return 0;} +double __attribute__ ((noipa)) foo_df (DF arg, int flag){if (flag == 2)return arg.a[3];else return 0.0;} +float __attribute__ ((noipa)) foo_sf (SF arg, int flag){if (flag == 2)return arg.a[2]; return 0;} +float __attribute__ ((noipa)) foo_sf1 (SF arg, int flag){if (flag == 2)return arg.a[1];return 0;} + +DF gdf = {{1.0,2.0,3.0,4.0}, 1, 2, 3, 4}; +SF gsf = {{1.0f,2.0f,3.0f,4.0f}, 1, 2}; + +int main() +{ + if (!(foo_hi (gdf, 2) == 5 && foo_si (gsf, 2) == 3 && foo_df (gdf, 2) == 4.0 + && foo_sf (gsf, 2) == 3.0 && foo_sf1 (gsf, 2) == 2.0)) + __builtin_abort (); + if (!(foo_hi (gdf, 1) == 0 && foo_si (gsf, 1) == 0 && foo_df (gdf, 1) == 0 + && foo_sf (gsf, 1) == 0 && foo_sf1 (gsf, 1) == 0)) + __builtin_abort (); + return 0; +} + diff --git a/gcc/testsuite/gcc.target/powerpc/pr65421-1.c b/gcc/testsuite/gcc.target/powerpc/pr65421-1.c new file mode 100644 index 00000000000..4e1f87f7939 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr65421-1.c @@ -0,0 +1,6 @@ +/* PR target/65421 */ +/* { dg-options "-O2" } */ + +typedef struct LARGE {double a[4]; int arr[32];} LARGE; +LARGE foo (LARGE a){return a;} +/* { dg-final { scan-assembler-times {\mmemcpy\M} 1 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/pr65421-2.c b/gcc/testsuite/gcc.target/powerpc/pr65421-2.c new file mode 100644 index 00000000000..8a8e1a0e996 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr65421-2.c @@ -0,0 +1,32 @@ +/* PR target/65421 */ +/* { dg-options "-O2" } */ +/* { dg-require-effective-target powerpc_elfv2 } */ +/* { dg-require-effective-target has_arch_ppc64 } */ + +typedef struct FLOATS +{ + double a[3]; +} FLOATS; + +/* 3 lfd after returns also optimized */ +/* FLOATS ret_arg_pt (FLOATS *a){return *a;} */ + +/* 3 stfd */ +void st_arg (FLOATS a, FLOATS *p) {*p = a;} +/* { dg-final { scan-assembler-times {\mstfd\M} 3 } } */ + +/* blr */ +FLOATS ret_arg (FLOATS a) {return a;} + +typedef struct MIX +{ + double a[2]; + long l; +} MIX; + +/* std 3 param regs to return slot */ +MIX ret_arg1 (MIX a) {return a;} +/* { dg-final { scan-assembler-times {\mstd\M} 3 } } */ + +/* count insns */ +/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 9 } } */