From patchwork Thu Nov 24 09:41:48 2022
X-Patchwork-Submitter: Jiufu Guo
X-Patchwork-Id: 1708606
From: Jiufu Guo
To: gcc-patches@gcc.gnu.org
Cc: segher@kernel.crashing.org, dje.gcc@gmail.com, linkw@gcc.gnu.org,
 guojiufu@linux.ibm.com, rguenther@suse.de, jeffreyalaw@gmail.com
Subject: [PATCH V2] Update block move for struct param or returns
Date: Thu, 24 Nov 2022 17:41:48 +0800
Message-Id: <20221124094148.125303-1-guojiufu@linux.ibm.com>

Hi,

When assigning a parameter to a variable, or assigning a variable to a
return value of struct type, a "block move" is used to expand the
assignment.  It would be better to move the blocks in the register mode
chosen by the target/ABI.  This would also open up more opportunities for
other optimization passes (cse/dse/xprop).

For example (similar to the code in PR65421):

typedef struct SA {double a[3];} A;
A ret_arg_pt (A *a){return *a;} // on ppc64le, only 3 lfd(s)
A ret_arg (A a) {return a;} // just empty fun body
void st_arg (A a, A *p) {*p = a;} //only 3 stfd(s)

This patch is based on the previous version, which supports assignments
from parameters:
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605709.html
This patch also supports returns.

I also tried to update gimplify/nrv to replace "return D.xxx;" with
"return <retval>;".  However, there is one issue: a "<retval>" with PARALLEL
code cannot be accessed through address/component_ref.  This issue blocks
a few passes (e.g. sra, expand).

On ppc64, some dead stores are not eliminated, e.g. for ret_arg:
	.cfi_startproc
	std 4,56(1)//redundant
	std 5,64(1)//redundant
	std 6,72(1)//redundant
	std 4,0(3)
	std 5,8(3)
	std 6,16(3)
	blr

Bootstrapped and regtested on ppc64le and x86_64.

I'm wondering if this patch could be committed first.
Thanks for the comments and suggestions.

BR,
Jeff (Jiufu)

	PR target/65421

gcc/ChangeLog:

	* cfgexpand.cc (expand_used_vars): Collect return VARs.
	(expand_gimple_stmt_1): Call expand_special_struct_assignment.
	(pass_expand::execute): Free collections of return VARs.
	* expr.cc (expand_special_struct_assignment): New function.
	* expr.h (expand_special_struct_assignment): Declare.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/pr65421-1.c: New test.
	* gcc.target/powerpc/pr65421.c: New test.

---
 gcc/cfgexpand.cc                             | 37 +++++++++++++++++
 gcc/expr.cc                                  | 43 ++++++++++++++++++++
 gcc/expr.h                                   |  3 ++
 gcc/testsuite/gcc.target/powerpc/pr65421-1.c | 21 ++++++++++
 gcc/testsuite/gcc.target/powerpc/pr65421.c   | 19 +++++++++
 5 files changed, 123 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421.c

diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index dd29ffffc03..f185de39341 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -341,6 +341,9 @@ static hash_map *decl_to_stack_part;
    all of them in one big sweep.  */
 static bitmap_obstack stack_var_bitmap_obstack;
 
+/* Those VARs on returns.  */
+static bitmap return_vars;
+
 /* An array of indices such that stack_vars[stack_vars_sorted[i]].size
    is non-decreasing.  */
 static size_t *stack_vars_sorted;
@@ -2158,6 +2161,24 @@ expand_used_vars (bitmap forced_stack_vars)
       frame_phase = off ? align - off : 0;
     }
 
+  /* Collect VARs on returns.  */
+  return_vars = NULL;
+  if (DECL_RESULT (current_function_decl)
+      && TYPE_MODE (TREE_TYPE (DECL_RESULT (current_function_decl))) == BLKmode)
+    {
+      return_vars = BITMAP_ALLOC (NULL);
+
+      edge_iterator ei;
+      edge e;
+      FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
+        if (greturn *ret = safe_dyn_cast <greturn *> (last_stmt (e->src)))
+          {
+            tree val = gimple_return_retval (ret);
+            if (val && VAR_P (val))
+              bitmap_set_bit (return_vars, DECL_UID (val));
+          }
+    }
+
   /* Set TREE_USED on all variables in the local_decls.  */
   FOR_EACH_LOCAL_DECL (cfun, i, var)
     TREE_USED (var) = 1;
@@ -3942,6 +3963,17 @@ expand_gimple_stmt_1 (gimple *stmt)
               /* This is a clobber to mark the going out of scope for
                  this LHS.  */
               expand_clobber (lhs);
+            else if ((TREE_CODE (rhs) == PARM_DECL && DECL_INCOMING_RTL (rhs)
+                      && TYPE_MODE (TREE_TYPE (rhs)) == BLKmode
+                      && (GET_CODE (DECL_INCOMING_RTL (rhs)) == PARALLEL
+                          || REG_P (DECL_INCOMING_RTL (rhs))))
+                     || (VAR_P (lhs) && return_vars
+                         && DECL_RTL_SET_P (DECL_RESULT (current_function_decl))
+                         && GET_CODE (
+                              DECL_RTL (DECL_RESULT (current_function_decl)))
+                              == PARALLEL
+                         && bitmap_bit_p (return_vars, DECL_UID (lhs))))
+              expand_special_struct_assignment (lhs, rhs);
             else
               expand_assignment (lhs, rhs,
                                  gimple_assign_nontemporal_move_p (
@@ -7025,6 +7057,11 @@ pass_expand::execute (function *fun)
   /* After expanding, the return labels are no longer needed. */
   return_label = NULL;
   naked_return_label = NULL;
+  if (return_vars)
+    {
+      BITMAP_FREE (return_vars);
+      return_vars = NULL;
+    }
 
   /* After expanding, the tm_restart map is no longer needed.  */
   if (fun->gimple_df->tm_restart)
diff --git a/gcc/expr.cc b/gcc/expr.cc
index d9407432ea5..6ffd9439188 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -5559,6 +5559,49 @@ mem_ref_refers_to_non_mem_p (tree ref)
   return non_mem_decl_p (base);
 }
 
+/* Expand the assignment from parameter or to returns if it needs
+   "block move" on struct type.  */
+
+void
+expand_special_struct_assignment (tree to, tree from)
+{
+  rtx result;
+
+  push_temp_slots ();
+  rtx par_ret = TREE_CODE (from) == PARM_DECL
+                  ? DECL_INCOMING_RTL (from)
+                  : DECL_RTL (DECL_RESULT (current_function_decl));
+  machine_mode mode = GET_CODE (par_ret) == PARALLEL
+                        ? GET_MODE (XEXP (XVECEXP (par_ret, 0, 0), 0))
+                        : word_mode;
+  int mode_size = GET_MODE_SIZE (mode).to_constant ();
+  int size = INTVAL (expr_size (from));
+  rtx to_rtx = expand_expr (to, NULL_RTX, VOIDmode, EXPAND_WRITE);
+
+  /* Here using a heuristic number for how many words may pass via gprs.  */
+  int hurstc_num = 8;
+  if (size < mode_size || (size % mode_size) != 0
+      || (GET_CODE (par_ret) != PARALLEL && size > (mode_size * hurstc_num)))
+    result = store_expr (from, to_rtx, 0, false, false);
+  else
+    {
+      rtx from_rtx
+        = expand_expr (from, NULL_RTX, GET_MODE (to_rtx), EXPAND_NORMAL);
+      for (int i = 0; i < size / mode_size; i++)
+        {
+          rtx temp = gen_reg_rtx (mode);
+          rtx src = adjust_address (from_rtx, mode, mode_size * i);
+          rtx dest = adjust_address (to_rtx, mode, mode_size * i);
+          emit_move_insn (temp, src);
+          emit_move_insn (dest, temp);
+        }
+      result = to_rtx;
+    }
+
+  preserve_temp_slots (result);
+  pop_temp_slots ();
+}
+
 /* Expand an assignment that stores the value of FROM into TO.  If NONTEMPORAL
    is true, try generating a nontemporal store.  */
 
diff --git a/gcc/expr.h b/gcc/expr.h
index 08b59b8d869..10527f23a56 100644
--- a/gcc/expr.h
+++ b/gcc/expr.h
@@ -281,6 +281,9 @@ extern void get_bit_range (poly_uint64_pod *, poly_uint64_pod *, tree,
 /* Expand an assignment that stores the value of FROM into TO.  */
 extern void expand_assignment (tree, tree, bool);
 
+/* Expand an assignment from parameters or to returns.  */
+extern void expand_special_struct_assignment (tree, tree);
+
 /* Generate code for computing expression EXP,
    and storing the value into TARGET.
    If SUGGEST_REG is nonzero, copy the value through a register
diff --git a/gcc/testsuite/gcc.target/powerpc/pr65421-1.c b/gcc/testsuite/gcc.target/powerpc/pr65421-1.c
new file mode 100644
index 00000000000..f55a0fe0002
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr65421-1.c
@@ -0,0 +1,21 @@
+/* PR target/65421 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -m64" } */
+
+typedef struct SA
+{
+  double a[3];
+  long l;
+} A;
+
+A ret_arg_pt (A *a){return *a;}
+
+A ret_arg (A a) {return a;}
+
+void st_arg (A a, A *p) {*p = a;}
+
+/* { dg-final { scan-assembler-times {\mlxvd2x\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mstxvd2x\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mstd\M} 8 } } */
+/* { dg-final { scan-assembler-times {\mblr\M} 3 } } */
+/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 16 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr65421.c b/gcc/testsuite/gcc.target/powerpc/pr65421.c
new file mode 100644
index 00000000000..26e85468470
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr65421.c
@@ -0,0 +1,19 @@
+/* PR target/65421 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -m64" } */
+
+typedef struct SA
+{
+  double a[3];
+} A;
+
+A ret_arg_pt (A *a){return *a;}
+
+A ret_arg (A a) {return a;}
+
+void st_arg (A a, A *p) {*p = a;}
+
+/* { dg-final { scan-assembler-times {\mlfd\M} 3 } } */
+/* { dg-final { scan-assembler-times {\mstfd\M} 3 } } */
+/* { dg-final { scan-assembler-times {\mblr\M} 3 } } */
+/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 9 } } */
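
Note for readers: the decision made in expand_special_struct_assignment
can be modelled outside the compiler.  The sketch below is plain C, not
GCC code.  The names can_copy_by_pieces, copy_struct and MAX_WORD_PIECES
are hypothetical; MAX_WORD_PIECES stands in for hurstc_num, and a byte
copy stands in for the pair of emit_move_insn calls that go through a
pseudo register.  It only illustrates when the piecewise path is taken
and when the code falls back to a single block copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the patch's hurstc_num: the largest number of
   word-sized pieces copied when the value is not described by a PARALLEL.  */
enum { MAX_WORD_PIECES = 8 };

/* Return true when SIZE bytes can be moved as whole MODE_SIZE pieces.
   This mirrors the condition that guards the store_expr fallback.  */
static bool
can_copy_by_pieces (size_t size, size_t mode_size, bool is_parallel)
{
  if (mode_size == 0 || size < mode_size || size % mode_size != 0)
    return false;
  if (!is_parallel && size > mode_size * MAX_WORD_PIECES)
    return false;
  return true;
}

/* Copy SIZE bytes from SRC to DST.  Returns the number of pieces moved,
   or 0 when the whole object was copied as one block.  */
static size_t
copy_struct (void *dst, const void *src, size_t size, size_t mode_size,
             bool is_parallel)
{
  assert (dst != NULL && src != NULL);

  if (!can_copy_by_pieces (size, mode_size, is_parallel))
    {
      /* Models the generic block move.  */
      memcpy (dst, src, size);
      return 0;
    }

  size_t pieces = size / mode_size;
  for (size_t i = 0; i < pieces; i++)
    {
      /* Models one load into a temporary plus one store, both at
         offset mode_size * i.  */
      memcpy ((char *) dst + mode_size * i,
              (const char *) src + mode_size * i, mode_size);
    }
  return pieces;
}
```

With struct SA {double a[3];} and an 8-byte mode, size is 24 and the
piecewise path moves three pieces, which corresponds to the three
lfd/stfd instructions expected in pr65421.c.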