From patchwork Tue Feb 27 07:04:10 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jiufu Guo
X-Patchwork-Id: 1905024
From: Jiufu Guo
To: gcc-patches@gcc.gnu.org
Cc: rguenther@suse.de, jeffreyalaw@gmail.com, richard.sandiford@arm.com,
 segher@kernel.crashing.org, dje.gcc@gmail.com, linkw@gcc.gnu.org,
 bergner@linux.ibm.com, guojiufu@linux.ibm.com
Subject: [PATCH 1/3, RFC] fsra: Add final gimple sra just before expander
Date: Tue, 27 Feb 2024 15:04:10 +0800
Message-Id: <20240227070412.3471038-2-guojiufu@linux.ibm.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20240227070412.3471038-1-guojiufu@linux.ibm.com>
References: <20240227070412.3471038-1-guojiufu@linux.ibm.com>

This patch adds a new mode to the SRA pass: "fsra" (final SRA).  The
fsra pass treats function parameters and return values as scalarization
candidates, and it runs at the end of the GIMPLE pass sequence, just
before the expander.

gcc/ChangeLog:

	* passes.def: Add pass pass_sra_final.
	* tree-pass.h (make_pass_sra_final): Declare.
	* tree-sra.cc (enum sra_mode): New enum item SRA_MODE_FINAL_INTRA.
	(build_accesses_from_assign): Accept SRA_MODE_FINAL_INTRA.
	(find_var_candidates): Collect candidates for SRA_MODE_FINAL_INTRA.
	(final_intra_sra): New function.
	(class pass_sra_final): New pass class.
	(make_pass_sra_final): New function.
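
For illustration only, here is a minimal C sketch of the kind of source the
final SRA pass is aimed at: a small aggregate passed by value and an aggregate
returned by value.  The type Pair and the functions pair_sum, pair_swap and
self_check are hypothetical names chosen for this example; they are not part
of the patch.  Whether such an aggregate actually travels in registers depends
on the target ABI, which is an assumption here.

```c
#include <assert.h>

/* Hypothetical example type: small enough that many ABIs pass it in
   registers rather than in memory (target dependent).  */
typedef struct Pair
{
  float x;
  float y;
} Pair;

/* Reads two fields of an aggregate parameter.  */
float
pair_sum (Pair p)
{
  return p.x + p.y;
}

/* Returns an aggregate by value.  */
Pair
pair_swap (Pair p)
{
  Pair r = { p.y, p.x };
  return r;
}

/* Checks observable behaviour only; it says nothing about the
   generated code.  */
void
self_check (void)
{
  Pair p = { 1.0f, 2.0f };
  assert (pair_sum (p) == 3.0f);
  Pair q = pair_swap (p);
  assert (q.x == 2.0f && q.y == 1.0f);
}
```

Comparing the assembly generated for pair_sum with and without the series
would be one way to check whether the field reads still go through a stack
copy of the parameter.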
---
 gcc/passes.def  |  2 ++
 gcc/tree-pass.h |  1 +
 gcc/tree-sra.cc | 81 +++++++++++++++++++++++++++++++++++++++++++++----
 3 files changed, 78 insertions(+), 6 deletions(-)

diff --git a/gcc/passes.def b/gcc/passes.def
index 1cbbd413097..183c1becd65 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -449,6 +449,8 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_harden_conditional_branches);
   NEXT_PASS (pass_harden_compares);
   NEXT_PASS (pass_warn_access, /*early=*/false);
+  NEXT_PASS (pass_sra_final);
+
   NEXT_PASS (pass_cleanup_cfg_post_optimizing);
   NEXT_PASS (pass_warn_function_noreturn);
 
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 29267589eeb..2d0e12bd1bb 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -366,6 +366,7 @@ extern gimple_opt_pass *make_pass_early_tree_profile (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_cleanup_eh (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_sra (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_sra_early (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_sra_final (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tail_recursion (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tail_calls (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_fix_loops (gcc::context *ctxt);
diff --git a/gcc/tree-sra.cc b/gcc/tree-sra.cc
index f8e71ec48b9..aacc76f58b5 100644
--- a/gcc/tree-sra.cc
+++ b/gcc/tree-sra.cc
@@ -21,14 +21,16 @@ along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */
 
 /* This file implements Scalar Reduction of Aggregates (SRA).  SRA is run
-   twice, once in the early stages of compilation (early SRA) and once in the
-   late stages (late SRA).  The aim of both is to turn references to scalar
-   parts of aggregates into uses of independent scalar variables.
+   three times: early in compilation (early SRA), late (late SRA) and just
+   before expansion (final SRA).  The aim of all three is to turn references to
+   scalar parts of aggregates into uses of independent scalar variables.
 
-   The two passes are nearly identical, the only difference is that early SRA
+   The three passes are nearly identical.  One difference is that early SRA
    does not scalarize unions which are used as the result in a GIMPLE_RETURN
    statement because together with inlining this can lead to weird type
-   conversions.
+   conversions.  The third (final) pass concentrates on parameters and return
+   values, which helps when those parameters and return values are passed in
+   registers.
 
    Both passes operate in four stages:
 
@@ -104,6 +106,7 @@ along with GCC; see the file COPYING3.  If not see
 /* Enumeration of all aggregate reductions we can do.  */
 enum sra_mode { SRA_MODE_EARLY_IPA,   /* early call regularization */
                 SRA_MODE_EARLY_INTRA, /* early intraprocedural SRA */
+                SRA_MODE_FINAL_INTRA, /* final gimple intraprocedural SRA */
                 SRA_MODE_INTRA };     /* late intraprocedural SRA */
 
 /* Global variable describing which aggregate reduction we are performing at
@@ -1437,7 +1440,8 @@ build_accesses_from_assign (gimple *stmt)
     }
 
   if (lacc && racc
-      && (sra_mode == SRA_MODE_EARLY_INTRA || sra_mode == SRA_MODE_INTRA)
+      && (sra_mode == SRA_MODE_EARLY_INTRA || sra_mode == SRA_MODE_INTRA
+          || sra_mode == SRA_MODE_FINAL_INTRA)
       && !lacc->grp_unscalarizable_region
       && !racc->grp_unscalarizable_region
       && AGGREGATE_TYPE_P (TREE_TYPE (lhs))
@@ -2149,6 +2153,24 @@ find_var_candidates (void)
        parm = DECL_CHAIN (parm))
     ret |= maybe_add_sra_candidate (parm);
 
+  /* fsra only cares about parameters and return values.  */
+  if (sra_mode == SRA_MODE_FINAL_INTRA)
+    {
+      if (!DECL_RESULT (current_function_decl))
+        return ret;
+
+      edge_iterator ei;
+      edge e;
+      FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
+        if (greturn *r = safe_dyn_cast <greturn *> (*gsi_last_bb (e->src)))
+          {
+            tree val = gimple_return_retval (r);
+            if (val && VAR_P (val))
+              ret |= maybe_add_sra_candidate (val);
+          }
+      return ret;
+    }
+
   FOR_EACH_LOCAL_DECL (cfun, i, var)
     {
       if (!VAR_P (var))
@@ -5017,6 +5039,14 @@ late_intra_sra (void)
   return perform_intra_sra ();
 }
 
+/* Perform the "final SRA" intraprocedural pass just before the expander.  */
+static unsigned int
+final_intra_sra (void)
+{
+  sra_mode = SRA_MODE_FINAL_INTRA;
+  return perform_intra_sra ();
+}
+
 
 static bool
 gate_intra_sra (void)
@@ -5099,3 +5129,42 @@ make_pass_sra (gcc::context *ctxt)
 {
   return new pass_sra (ctxt);
 }
+
+namespace
+{
+const pass_data pass_data_sra_final = {
+  GIMPLE_PASS,           /* type */
+  "fsra",                /* name */
+  OPTGROUP_NONE,         /* optinfo_flags */
+  TV_TREE_SRA,           /* tv_id */
+  (PROP_cfg | PROP_ssa), /* properties_required */
+  0,                     /* properties_provided */
+  0,                     /* properties_destroyed */
+  0,                     /* todo_flags_start */
+  TODO_update_ssa,       /* todo_flags_finish */
+};
+
+class pass_sra_final : public gimple_opt_pass
+{
+public:
+  pass_sra_final (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_sra_final, ctxt)
+  {
+  }
+
+  /* opt_pass methods: */
+  bool gate (function *) final override { return gate_intra_sra (); }
+  unsigned int execute (function *) final override
+  {
+    return final_intra_sra ();
+  }
+
+}; // class pass_sra_final
+
+} // namespace
+
+gimple_opt_pass *
+make_pass_sra_final (gcc::context *ctxt)
+{
+  return new pass_sra_final (ctxt);
+}

From patchwork Tue Feb 27 07:04:11 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jiufu Guo
X-Patchwork-Id: 1905250
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@legolas.ozlabs.org
Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=ibm.com header.i=@ibm.com header.a=rsa-sha256 header.s=pp1 header.b=mXAT8Zgq; dkim-atps=neutral
Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org;
 envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org)
From: Jiufu Guo
To: gcc-patches@gcc.gnu.org
Cc: rguenther@suse.de, jeffreyalaw@gmail.com, richard.sandiford@arm.com,
 segher@kernel.crashing.org, dje.gcc@gmail.com, linkw@gcc.gnu.org,
 bergner@linux.ibm.com, guojiufu@linux.ibm.com
Subject: [PATCH 2/3, RFC] fsra: support ARG_PARTS
Date: Tue, 27 Feb 2024 15:04:11 +0800
Message-Id: <20240227070412.3471038-3-guojiufu@linux.ibm.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20240227070412.3471038-1-guojiufu@linux.ibm.com>
References: <20240227070412.3471038-1-guojiufu@linux.ibm.com>
MIME-Version: 1.0

This patch adds the internal function IFN_ARG_PARTS.  The fsra pass
generates this IFN for accesses to parameters, and the IFN is expanded
according to the incoming registers of the parameter.  "fsra" is also
tuned for parameter accesses.

	PR target/108073

gcc/ChangeLog:

	* internal-fn.cc (query_position_in_parallel): New function.
	(construct_reg_seq): New function.
	(get_incoming_element): New function.
	(reference_alias_ptr_type): Declare.
	(expand_ARG_PARTS): New expand function.
	* internal-fn.def (ARG_PARTS): New IFN.
	* tree-sra.cc (scan_function): Update for fsra.
	(analyze_access_subtree): Allow read-only parameter accesses to be
	scalarized for fsra.
	(generate_subtree_copies): Update to generate IFN_ARG_PARTS.

gcc/testsuite/ChangeLog:

	* g++.target/powerpc/pr102024.C: Update.
	* gcc.target/powerpc/pr108073-1.c: New test.
	* gcc.target/powerpc/pr108073.c: New test.
---
 gcc/internal-fn.cc                            | 164 ++++++++++++++++++
 gcc/internal-fn.def                           |   3 +
 gcc/tree-sra.cc                               |  43 ++++-
 gcc/testsuite/g++.target/powerpc/pr102024.C   |   2 +-
 gcc/testsuite/gcc.target/powerpc/pr108073-1.c |  76 ++++++++
 gcc/testsuite/gcc.target/powerpc/pr108073.c   |  74 ++++++++
 6 files changed, 354 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073.c

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index a07f25f3aee..ee19e155628 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -3393,6 +3393,170 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt)
     }
 }
 
+/* In the parallel rtx register series REGS, compute the register position for
+   given {BITPOS, BITSIZE}.  The results are stored into START_INDEX, END_INDEX,
+   LEFT_BITS and RIGHT_BITS.  */
+
+void
+query_position_in_parallel (HOST_WIDE_INT bitpos, HOST_WIDE_INT bitsize,
+                            rtx regs, int &start_index, int &end_index,
+                            HOST_WIDE_INT &left_bits, HOST_WIDE_INT &right_bits)
+{
+  int cur_index = XEXP (XVECEXP (regs, 0, 0), 0) ? 0 : 1;
+  for (; cur_index < XVECLEN (regs, 0); cur_index++)
+    {
+      rtx slot = XVECEXP (regs, 0, cur_index);
+      HOST_WIDE_INT off = UINTVAL (XEXP (slot, 1)) * BITS_PER_UNIT;
+      machine_mode mode = GET_MODE (XEXP (slot, 0));
+      HOST_WIDE_INT size = GET_MODE_BITSIZE (mode).to_constant ();
+      if (off <= bitpos && off + size > bitpos)
+        {
+          start_index = cur_index;
+          left_bits = bitpos - off;
+        }
+      if (off + size >= bitpos + bitsize)
+        {
+          end_index = cur_index;
+          right_bits = off + size - (bitpos + bitsize);
+          break;
+        }
+    }
+}
+
+/* Create a series of registers starting at FIRST_REG;
+   SIZE is the total size of those registers.  */
+static rtx
+construct_reg_seq (HOST_WIDE_INT size, rtx first_reg)
+{
+  int nregs = size / UNITS_PER_WORD + (((size % UNITS_PER_WORD) != 0) ? 1 : 0);
+  rtx *tmps = XALLOCAVEC (rtx, nregs);
+  int regno = REGNO (first_reg);
+  machine_mode mode = word_mode;
+  HOST_WIDE_INT word_size = GET_MODE_SIZE (mode).to_constant ();
+  for (int i = 0; i < nregs; i++)
+    {
+      rtx reg = gen_rtx_REG (mode, regno + i);
+      rtx off = GEN_INT (word_size * i);
+      tmps[i] = gen_rtx_EXPR_LIST (VOIDmode, reg, off);
+    }
+  return gen_rtx_PARALLEL (BLKmode, gen_rtvec_v (nregs, tmps));
+}
+
+static rtx
+get_incoming_element (tree arg, HOST_WIDE_INT bitpos, HOST_WIDE_INT bitsize,
+                      bool reversep, tree expr)
+{
+  rtx regs = DECL_INCOMING_RTL (arg);
+  bool has_padding = false;
+  if (REG_P (regs) && GET_MODE (regs) == BLKmode)
+    {
+      HOST_WIDE_INT size = int_size_in_bytes (TREE_TYPE (arg));
+      has_padding = (size % UNITS_PER_WORD) != 0;
+      regs = construct_reg_seq (size, regs);
+    }
+
+  if (GET_CODE (regs) != PARALLEL)
+    return NULL_RTX;
+
+  int start_index = -1;
+  int end_index = -1;
+  HOST_WIDE_INT left_bits = 0;
+  HOST_WIDE_INT right_bits = 0;
+  query_position_in_parallel (bitpos, bitsize, regs, start_index, end_index,
+                              left_bits, right_bits);
+
+  if (start_index < 0 || end_index < 0)
+    return NULL_RTX;
+
+  machine_mode expr_mode = TYPE_MODE (TREE_TYPE (expr));
+  /* Only accesses that fit in a single register are handled.  */
+  if (end_index != start_index)
+    return NULL_RTX;
+
+  rtx reg = XEXP (XVECEXP (regs, 0, start_index), 0);
+  /* The access covers the whole register.  */
+  if (left_bits == 0 && right_bits == 0)
+    {
+      if (GET_MODE (reg) != expr_mode)
+        reg = gen_lowpart (expr_mode, reg);
+      return reg;
+    }
+
+  /* The access covers only part of the register, so extract a bit field:
+     left_bits != 0 or right_bits != 0.  */
+  if (has_padding && end_index == XVECLEN (regs, 0) - 1)
+    return NULL_RTX;
+  scalar_int_mode imode;
+  if (!int_mode_for_mode (expr_mode).exists (&imode))
+    return NULL_RTX;
+
+  if (expr_mode != imode
+      && known_gt (GET_MODE_SIZE (GET_MODE (regs)), UNITS_PER_WORD))
+    return NULL_RTX;
+
+  machine_mode mode = GET_MODE (reg);
+  bool sgn = TYPE_UNSIGNED (TREE_TYPE (expr));
+  rtx bfld = extract_bit_field (reg, bitsize, left_bits, sgn, NULL_RTX, mode,
+                                imode, reversep, NULL);
+
+  if (GET_MODE (bfld) != imode)
+    bfld = gen_lowpart (imode, bfld);
+
+  if (expr_mode == imode)
+    return bfld;
+
+  /* expr_mode != imode, e.g. SF != SI.  */
+  rtx result = gen_reg_rtx (imode);
+  emit_move_insn (result, bfld);
+  return gen_lowpart (expr_mode, result);
+}
+
+tree
+reference_alias_ptr_type (tree t);
+
+static void
+expand_ARG_PARTS (internal_fn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  tree arg = gimple_call_arg (stmt, 0);
+  HOST_WIDE_INT offset = tree_to_shwi (gimple_call_arg (stmt, 1));
+  HOST_WIDE_INT size = tree_to_shwi (gimple_call_arg (stmt, 2));
+  int reversep = tree_to_shwi (gimple_call_arg (stmt, 3));
+  rtx sub_elem = get_incoming_element (arg, offset, size, reversep, lhs);
+  if (sub_elem)
+    {
+      rtx to_rtx = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+      if (to_rtx)
+        {
+          gcc_assert (REG_P (to_rtx));
+          emit_move_insn (to_rtx, sub_elem);
+          return;
+        }
+    }
+  /* Fall back to the normal expand method.  */
+  if ((offset % BITS_PER_WORD == 0) && (size % BITS_PER_WORD == 0))
+    {
+      tree base = build_fold_addr_expr (arg);
+      tree type = reference_alias_ptr_type (arg);
+      tree off = build_int_cst (type, offset / BITS_PER_UNIT);
+      location_t loc = EXPR_LOCATION (arg);
+      tree rhs = fold_build2_loc (loc, MEM_REF, TREE_TYPE (lhs), base, off);
+      REF_REVERSE_STORAGE_ORDER (rhs) = reversep;
+      expand_assignment (lhs, rhs, false);
+    }
+  else
+    {
+      tree type = TREE_TYPE (lhs);
+      machine_mode mode = TYPE_MODE (type);
+      rtx op0
+        = expand_expr_real (arg, NULL, VOIDmode, EXPAND_NORMAL, NULL, true);
+      op0 = extract_bit_field (op0, size, offset, TYPE_UNSIGNED (type), NULL,
+                               mode, mode, reversep, NULL);
+      rtx dest = expand_expr (lhs, NULL, VOIDmode, EXPAND_WRITE);
+      emit_move_insn (dest, op0);
+    }
+}
+
 /* The size of an OpenACC compute dimension.  */
 
 static void
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index c14d30365c1..2bbf70dd6a1 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -510,6 +510,9 @@ DEF_INTERNAL_FN (PHI, 0, NULL)
    automatic variable.  */
 DEF_INTERNAL_FN (DEFERRED_INIT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 
+/* A function to extract element(s) from an aggregate argument in fsra.  */
+DEF_INTERNAL_FN (ARG_PARTS, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
+
 /* DIM_SIZE and DIM_POS return the size of a particular compute
    dimension and the executing thread's position within that
    dimension.  DIM_POS is pure (and not const) so that it isn't
diff --git a/gcc/tree-sra.cc b/gcc/tree-sra.cc
index aacc76f58b5..0bbb8940921 100644
--- a/gcc/tree-sra.cc
+++ b/gcc/tree-sra.cc
@@ -1508,7 +1508,8 @@ scan_function (void)
           tree t;
           unsigned i;
 
-          if (gimple_code (stmt) != GIMPLE_CALL)
+          if (gimple_code (stmt) != GIMPLE_CALL
+              || sra_mode == SRA_MODE_FINAL_INTRA)
             walk_stmt_load_store_addr_ops (stmt, NULL, NULL, NULL,
                                            scan_visit_addr);
 
@@ -2767,12 +2768,22 @@ analyze_access_subtree (struct access *root, struct access *parent,
         hole = true;
     }
 
+  auto check_rw = [] (struct access *root) -> bool {
+    if ((root->grp_scalar_read || root->grp_assignment_read)
+        && (root->grp_scalar_write || root->grp_assignment_write))
+      return true;
+    if (sra_mode != SRA_MODE_FINAL_INTRA)
+      return false;
+    if ((root->grp_scalar_read || root->grp_assignment_read)
+        && TREE_CODE (root->base) == PARM_DECL)
+      return true;
+    return false;
+  };
+
+  /* In fsra, a parameter is scalarizable even if it is never written to.  */
   if (allow_replacements && scalar && !root->first_child
       && (totally || !root->grp_total_scalarization)
-      && (totally
-          || root->grp_hint
-          || ((root->grp_scalar_read || root->grp_assignment_read)
-              && (root->grp_scalar_write || root->grp_assignment_write))))
+      && (totally || root->grp_hint || check_rw (root)))
     {
       /* Always create access replacements that cover the whole access.
          For integral types this means the precision has to match.
@@ -2841,6 +2852,11 @@ analyze_access_subtree (struct access *root, struct access *parent,
     root->grp_covered = 1;
   else if (root->grp_write || comes_initialized_p (root->base))
     root->grp_unscalarized_data = 1; /* not covered and written to */
+
+  if (sra_mode == SRA_MODE_FINAL_INTRA && root->grp_write
+      && TREE_CODE (root->base) == PARM_DECL)
+    return false;
+
   return sth_created;
 }
 
@@ -3802,7 +3818,7 @@ generate_subtree_copies (struct access *access, tree agg,
               || access->offset + access->size > start_offset))
         {
           tree expr, repl = get_access_replacement (access);
-          gassign *stmt;
+          gimple *stmt;
 
           expr = build_ref_for_model (loc, agg, access->offset - top_offset,
                                       access, gsi, insert_after);
@@ -3814,7 +3830,20 @@ generate_subtree_copies (struct access *access, tree agg,
                                                  !insert_after,
                                                  insert_after ? GSI_NEW_STMT
                                                  : GSI_SAME_STMT);
-              stmt = gimple_build_assign (repl, expr);
+              if (sra_mode == SRA_MODE_FINAL_INTRA
+                  && TREE_CODE (access->base) == PARM_DECL
+                  && (access->grp_scalar_read || access->grp_assignment_read))
+                {
+                  gimple *call = gimple_build_call_internal (
+                    IFN_ARG_PARTS, 4, access->base,
+                    wide_int_to_tree (sizetype, access->offset),
+                    wide_int_to_tree (sizetype, access->size),
+                    wide_int_to_tree (sizetype, access->reverse));
+                  gimple_call_set_lhs (call, repl);
+                  stmt = call;
+                }
+              else
+                stmt = gimple_build_assign (repl, expr);
             }
           else
             {
diff --git a/gcc/testsuite/g++.target/powerpc/pr102024.C b/gcc/testsuite/g++.target/powerpc/pr102024.C
index 769585052b5..c8995cae707 100644
--- a/gcc/testsuite/g++.target/powerpc/pr102024.C
+++ b/gcc/testsuite/g++.target/powerpc/pr102024.C
@@ -5,7 +5,7 @@
 // Test that a zero-width bit field in an otherwise homogeneous aggregate
 // generates a psabi warning and passes arguments in GPRs.
 
-// { dg-final { scan-assembler-times {\mstd\M} 4 } }
+// { dg-final { scan-assembler-times {\mmtvsrd\M} 4 } }
 
 struct a_thing
 {
diff --git a/gcc/testsuite/gcc.target/powerpc/pr108073-1.c b/gcc/testsuite/gcc.target/powerpc/pr108073-1.c
new file mode 100644
index 00000000000..4892716e85f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr108073-1.c
@@ -0,0 +1,76 @@
+/* { dg-do run } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-O2 -save-temps" } */
+
+typedef struct DF
+{
+  double a[4];
+  short s1;
+  short s2;
+  short s3;
+  short s4;
+} DF;
+typedef struct SF
+{
+  float a[4];
+  int i1;
+  int i2;
+} SF;
+
+/* { dg-final { scan-assembler-times {\mmtvsrd|mtvsrws\M} 3 {target { lp64 && has_arch_pwr8 } } } } */
+/* { dg-final { scan-assembler-not {\mlwz\M} {target { lp64 && has_arch_pwr8 } } } } */
+/* { dg-final { scan-assembler-not {\mlhz\M} {target { lp64 && has_arch_pwr8 } } } } */
+
+#define NOIPA __attribute__ ((noipa))
+
+short NOIPA
+foo_hi (DF a, int flag)
+{
+  if (flag == 2)
+    return a.s2 + a.s3;
+  return 0;
+}
+int NOIPA
+foo_si (SF a, int flag)
+{
+  if (flag == 2)
+    return a.i2 + a.i1;
+  return 0;
+}
+double NOIPA
+foo_df (DF arg, int flag)
+{
+  if (flag == 2)
+    return arg.a[3];
+  else
+    return 0.0;
+}
+float NOIPA
+foo_sf (SF arg, int flag)
+{
+  if (flag == 2)
+    return arg.a[2];
+  return 0;
+}
+float NOIPA
+foo_sf1 (SF arg, int flag)
+{
+  if (flag == 2)
+    return arg.a[1];
+  return 0;
+}
+
+DF gdf = {{1.0, 2.0, 3.0, 4.0}, 1, 2, 3, 4};
+SF gsf = {{1.0f, 2.0f, 3.0f, 4.0f}, 1, 2};
+
+int
+main ()
+{
+  if (!(foo_hi (gdf, 2) == 5 && foo_si (gsf, 2) == 3 && foo_df (gdf, 2) == 4.0
+        && foo_sf (gsf, 2) == 3.0 && foo_sf1 (gsf, 2) == 2.0))
+    __builtin_abort ();
+  if (!(foo_hi (gdf, 1) == 0 && foo_si (gsf, 1) == 0 && foo_df (gdf, 1) == 0
+        && foo_sf (gsf, 1) == 0 && foo_sf1 (gsf, 1) == 0))
+    __builtin_abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/pr108073.c b/gcc/testsuite/gcc.target/powerpc/pr108073.c
new file mode 100644
index 00000000000..4e7feaa6810
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr108073.c
@@ -0,0 +1,74 @@
+/* { dg-do run } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-O2 -save-temps" } */
+
+/* { dg-final { scan-assembler-times {\mmtvsrd|mtvsrws\M} 5 {target { lp64 && { has_arch_pwr8 && be } } } } } */
+/* { dg-final { scan-assembler-times {\mxscvspdpn\M} 4 {target { lp64 && { has_arch_pwr8 && be } } } } } */
+/* { dg-final { scan-assembler-times {\mmtvsrd|mtvsrws\M} 3 {target { lp64 && { has_arch_pwr8 && le } } } } } */
+/* { dg-final { scan-assembler-times {\mxscvspdpn\M} 2 {target { lp64 && { has_arch_pwr8 && le } } } } } */
+/* { dg-final { scan-assembler-times {\mfadds\M} 2 {target { lp64 && has_arch_pwr8 } } } } */
+
+#define NOIPA __attribute__ ((noipa))
+typedef struct X
+{
+  float x;
+  float y;
+} X;
+
+float NOIPA
+fooX (X y)
+{
+  y.x += 1;
+  return y.x + y.y;
+}
+
+typedef struct Y
+{
+  double a[4];
+  long l;
+} Y;
+
+double NOIPA
+fooY (Y arg)
+{
+  return arg.a[3];
+}
+
+typedef struct Z
+{
+  float a[4];
+  short l;
+} Z;
+
+float NOIPA
+fooZ (Z arg)
+{
+  return arg.a[3];
+}
+
+float NOIPA
+fooZ2 (Z arg)
+{
+  return arg.a[2];
+}
+
+X x = {1.0f, 2.0f};
+Y y = {1.0, 2.0, 3.0, 4.0, 1};
+Z z = {1.0f, 2.0f, 3.0f, 4.0f, 1};
+int
+main ()
+{
+  if (fooX (x) != 4.0f)
+    __builtin_abort ();
+
+  if (fooY (y) != 4.0)
+    __builtin_abort ();
+
+  if (fooZ (z) != 4.0f)
+    __builtin_abort ();
+
+  if (fooZ2 (z) != 3.0f)
+    __builtin_abort ();
+
+  return 0;
+}

From patchwork Tue Feb 27 07:04:12 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jiufu Guo
X-Patchwork-Id: 1905026
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@legolas.ozlabs.org
Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=ibm.com header.i=@ibm.com header.a=rsa-sha256 header.s=pp1 header.b=FAG3eCdk;
dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4TkbRH5ZtHz23qN for ; Tue, 27 Feb 2024 22:51:07 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id CD0F23858C78 for ; Tue, 27 Feb 2024 11:51:05 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id C91383858427; Tue, 27 Feb 2024 11:50:29 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org C91383858427 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linux.ibm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org C91383858427 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=148.163.158.5 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1709034640; cv=none; b=BXLGsx1O563qkePZxyFBon1KnemvJUwFOsWIgiZD5LLd+ThlrZIz8UHdJM2rErgctzo9dgxxuz7qeI4KbNY/epb2+fBk1R57Vjnd+5hFC4j/GeUCAo6fD2oc1h4r5+toPu6uv/TbPkeXyZ1XgS+N4DTUH5r9f3UXSfgkmz/F4lI= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1709034640; c=relaxed/simple; bh=H7kIss9kL6LeW4kkEhEonQ1cIlABuhVPMdmHfnGknBg=; h=DKIM-Signature:From:To:Subject:Date:Message-Id:MIME-Version; 
b=RKueUngY2krPFoH/w+A3v2X0NSsmOzxQKPPQ+wi1GtJkpeabQrtnZrxNhXd5I2oPCuV18yrvQzGioQDSePMu+2FwtVenKhLIChxs0dUg8+GDbRxIwWTBgkn2551u8OvqIGXsqQ1i32rM85EUB5p5woiVKHOlpypS1/Pk0qyhWng= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from pps.filterd (m0360072.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 41RBlehk011978; Tue, 27 Feb 2024 11:50:25 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=pp1; bh=xqRcZa+ym3OHomHEa46ujffw0l0vFG/U/LpqUYimxJQ=; b=FAG3eCdkG4VJ8oWMQhbfJqkFmjz39+x/6InMhkOhA4I73wQUjrMZSjW1C+95jInZbIWI WrUmWXuMFS9FluAY1O6CmLTDhCYP1ATiuFui7PpdIDzgToQ7Z9PR44JpQxaeNxlz/ltI X/4+J7bH3ojj6+8DinYne41+698r2W2ZVV263iD3g7CW3ShPhcpGu0+TOSpnEnz0E30B +NHyPMKpI/0HNU1dJNVJ/HnklNZ6qQb7Ffb1Fck4S1t9lrxuxnN4NvM7F+v+bzQU6ZUv EPgsue1lMSRyAAzwy6OuZ19RPauQz9XhdqbyLXK5SxSgy3M3X8VxGfZ9Y0jopmIzOLaf UQ== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3whf62r2uc-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 27 Feb 2024 11:50:24 +0000 Received: from m0360072.ppops.net (m0360072.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 41RBm2tr013429; Tue, 27 Feb 2024 11:50:24 GMT Received: from ppma21.wdc07v.mail.ibm.com (5b.69.3da9.ip4.static.sl-reverse.com [169.61.105.91]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3whf62r2tq-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 27 Feb 2024 11:50:24 +0000 Received: from pps.filterd (ppma21.wdc07v.mail.ibm.com [127.0.0.1]) by ppma21.wdc07v.mail.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 41R9kLBX021291; Tue, 27 Feb 2024 11:50:23 GMT Received: from smtprelay07.fra02v.mail.ibm.com ([9.218.2.229]) by ppma21.wdc07v.mail.ibm.com (PPS) with ESMTPS id 3wfusnycc5-17 (version=TLSv1.2 
cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 27 Feb 2024 11:50:23 +0000 Received: from smtpav06.fra02v.mail.ibm.com (smtpav06.fra02v.mail.ibm.com [10.20.54.105]) by smtprelay07.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 41R74K2V7406196 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 27 Feb 2024 07:04:22 GMT Received: from smtpav06.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 8D2E520049; Tue, 27 Feb 2024 07:04:20 +0000 (GMT) Received: from smtpav06.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id F05B92004B; Tue, 27 Feb 2024 07:04:18 +0000 (GMT) Received: from genoa.aus.stglabs.ibm.com (unknown [9.40.192.157]) by smtpav06.fra02v.mail.ibm.com (Postfix) with ESMTP; Tue, 27 Feb 2024 07:04:18 +0000 (GMT) From: Jiufu Guo To: gcc-patches@gcc.gnu.org Cc: rguenther@suse.de, jeffreyalaw@gmail.com, richard.sandiford@arm.com, segher@kernel.crashing.org, dje.gcc@gmail.com, linkw@gcc.gnu.org, bergner@linux.ibm.com, guojiufu@linux.ibm.com Subject: [PATCH 3/3, RFC] fsra: support SET_RET_PART Date: Tue, 27 Feb 2024 15:04:12 +0800 Message-Id: <20240227070412.3471038-4-guojiufu@linux.ibm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20240227070412.3471038-1-guojiufu@linux.ibm.com> References: <20240227070412.3471038-1-guojiufu@linux.ibm.com> MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-GUID: -U2QBVofZthhgUdv7Qs0_nJekSlwDy8S X-Proofpoint-ORIG-GUID: c4FVhmggoFEExKxMgJTCbI84efjSMNxJ X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.272,Aquarius:18.0.1011,Hydra:6.0.619,FMLib:17.11.176.26 definitions=2024-02-26_11,2024-02-27_01,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 clxscore=1015 adultscore=0 malwarescore=0 mlxlogscore=999 lowpriorityscore=0 bulkscore=0 mlxscore=0 phishscore=0 priorityscore=1501 suspectscore=0 impostorscore=0 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 
engine=8.12.0-2311290000 definitions=main-2402270092 X-Spam-Status: No, score=-10.8 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org

This patch adds IFN_SET_RET_PARTS and generates this IFN in the fsra pass
for accesses to the aggregate that a function returns.  The IFN is
expanded according to the outgoing registers of the return value.  The
access analysis in "fsra" is tuned to handle returns.

IFN_SET_RET_LAST_PARTS exists only for this prototype; it helps to reuse
the decl information of the return variable.  As the implementation is
enhanced, this IFN may be removed.

	PR target/65421
	PR target/69143

gcc/ChangeLog:

	* cfgexpand.cc (expand_value_return): Update.
	(expand_return): Update the expansion of returns.
	* internal-fn.cc (store_outgoing_element): New function.
	(expand_SET_RET_PARTS): New IFN expand function.
	(expand_SET_RET_LAST_PARTS): New IFN expand function.
	* internal-fn.def (SET_RET_PARTS): New IFN.
	(SET_RET_LAST_PARTS): New IFN.
	* tree-sra.cc (analyze_access_subtree): Update for returns in fsra.
	(generate_subtree_copies): Generate IFN for returns.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/pr65421.c: New test.
	* gcc.target/powerpc/pr69143.c: New test.

--- gcc/cfgexpand.cc | 6 +- gcc/internal-fn.cc | 84 ++++++++++++++++++++++ gcc/internal-fn.def | 6 ++ gcc/tree-sra.cc | 39 ++++++++-- gcc/testsuite/gcc.target/powerpc/pr65421.c | 10 +++ gcc/testsuite/gcc.target/powerpc/pr69143.c | 23 ++++++ 6 files changed, 163 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421.c create mode 100644 gcc/testsuite/gcc.target/powerpc/pr69143.c diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc index eef565eddb5..1ec6c2d8102 100644 --- a/gcc/cfgexpand.cc +++ b/gcc/cfgexpand.cc @@ -3759,7 +3759,7 @@ expand_value_return (rtx val) tree decl = DECL_RESULT (current_function_decl); rtx return_reg = DECL_RTL (decl); - if (return_reg != val) + if (!rtx_equal_p (return_reg, val)) { tree funtype = TREE_TYPE (current_function_decl); tree type = TREE_TYPE (decl); @@ -3832,6 +3832,10 @@ expand_return (tree retval) been stored into it, so we don't have to do anything special. */ if (TREE_CODE (retval_rhs) == RESULT_DECL) expand_value_return (result_rtl); + /* return is scalarized by fsra: TODO use FLAG. */ + else if (VAR_P (retval_rhs) + && rtx_equal_p (result_rtl, DECL_RTL (retval_rhs))) + expand_null_return_1 (); /* If the result is an aggregate that is being returned in one (or more) registers, load the registers here. 
*/ diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc index ee19e155628..be06dc3a16c 100644 --- a/gcc/internal-fn.cc +++ b/gcc/internal-fn.cc @@ -3557,6 +3557,90 @@ expand_ARG_PARTS (internal_fn, gcall *stmt) } } +static bool +store_outgoing_element (rtx regs, HOST_WIDE_INT bitpos, HOST_WIDE_INT bitsize, + tree rhs) +{ + if (GET_CODE (regs) != PARALLEL) + return false; + + int start_index = -1; + int end_index = -1; + HOST_WIDE_INT left_bits = 0; + HOST_WIDE_INT right_bits = 0; + query_position_in_parallel (bitpos, bitsize, regs, start_index, end_index, + left_bits, right_bits); + + if (start_index < 0 || end_index < 0) + return false; + + if (end_index != start_index) + return false; + + if (!((left_bits == 0 && !BITS_BIG_ENDIAN) + || (right_bits == 0 && BITS_BIG_ENDIAN))) + return false; + + /* Just need one reg for the access. */ + rtx dest = XEXP (XVECEXP (regs, 0, start_index), 0); + machine_mode mode = GET_MODE (dest); + + if (left_bits != 0 || right_bits != 0) + { + machine_mode small_mode; + if (!SCALAR_INT_MODE_P (mode) + || !mode_for_size (bitsize, GET_MODE_CLASS (mode), 0) + .exists (&small_mode)) + return false; + + dest = gen_lowpart (small_mode, dest); + mode = small_mode; + } + + rtx src = expand_expr (rhs, NULL_RTX, VOIDmode, EXPAND_NORMAL); + if (!src) + return false; + + machine_mode src_mode = GET_MODE (src); + if (mode != src_mode) + src = gen_lowpart (mode, src); + + emit_move_insn (dest, src); + + return true; +} + +static void +expand_SET_RET_PARTS (internal_fn, gcall *stmt) +{ + HOST_WIDE_INT offset = tree_to_shwi (gimple_call_arg (stmt, 1)); + HOST_WIDE_INT size = tree_to_shwi (gimple_call_arg (stmt, 2)); + tree decl = DECL_RESULT (current_function_decl); + rtx dest_regs = decl->decl_with_rtl.rtl; // DECL_RTL (base); + tree rhs = gimple_call_arg (stmt, 3); + bool res = store_outgoing_element (dest_regs, offset, size, rhs); + if (!res) + { + tree base = gimple_call_arg (stmt, 0); + tree lhs = gimple_call_lhs (stmt); + expand_assignment 
(base, decl, false); + expand_assignment (lhs, rhs, false); + expand_assignment (decl, base, false); + } +} + +static void +expand_SET_RET_LAST_PARTS (internal_fn, gcall *stmt) +{ + expand_SET_RET_PARTS (IFN_SET_RET_PARTS, stmt); + + tree decl = DECL_RESULT (current_function_decl); + rtx dest_regs = decl->decl_with_rtl.rtl; // DECL_RTL (base); + tree base = gimple_call_arg (stmt, 0); + base->decl_with_rtl.rtl = dest_regs; // SET_DECL_RTL +} + + /* The size of an OpenACC compute dimension. */ static void diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def index 2bbf70dd6a1..e6fab4671d5 100644 --- a/gcc/internal-fn.def +++ b/gcc/internal-fn.def @@ -513,6 +513,12 @@ DEF_INTERNAL_FN (DEFERRED_INIT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL) /* A function to extract elemet(s) from an aggregate argument in fsra. */ DEF_INTERNAL_FN (ARG_PARTS, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL) +/* A function to set/construct element(s) of a 'return' aggregate. */ +DEF_INTERNAL_FN (SET_RET_PARTS, ECF_LEAF | ECF_NOTHROW, NULL) +/* A function to set/construct element(s) of a 'return' aggregate just before +the return statement. */ +DEF_INTERNAL_FN (SET_RET_LAST_PARTS, ECF_LEAF | ECF_NOTHROW, NULL) + /* DIM_SIZE and DIM_POS return the size of a particular compute dimension and the executing thread's position within that dimension. DIM_POS is pure (and not const) so that it isn't diff --git a/gcc/tree-sra.cc b/gcc/tree-sra.cc index 0bbb8940921..d78a2cc4b02 100644 --- a/gcc/tree-sra.cc +++ b/gcc/tree-sra.cc @@ -2777,6 +2777,15 @@ analyze_access_subtree (struct access *root, struct access *parent, if ((root->grp_scalar_read || root->grp_assignment_read) && TREE_CODE (root->base) == PARM_DECL) return true; + /* Now in fsra (SRA_MODE_FINAL_INTRA), only PARAM and RETURNS + are candidates, so if "VAR_P (root->base)", then it is used by + a return stmt. 
+ TODO: add a flag to root->base to indicate it is used by a return + stmt. */ + if ((root->grp_scalar_write || root->grp_assignment_write) + && VAR_P (root->base)) + return true; + return false; }; @@ -2853,9 +2862,13 @@ analyze_access_subtree (struct access *root, struct access *parent, else if (root->grp_write || comes_initialized_p (root->base)) root->grp_unscalarized_data = 1; /* not covered and written to */ - if (sra_mode == SRA_MODE_FINAL_INTRA && root->grp_write - && TREE_CODE (root->base) == PARM_DECL) - return false; + if (sra_mode == SRA_MODE_FINAL_INTRA) + {/* Writes to a PARM_DECL and partially unscalarized returns are not supported yet. */ + if (root->grp_unscalarized_data && (VAR_P (root->base))) + return false; + if (root->grp_write && TREE_CODE (root->base) == PARM_DECL) + return false; + } return sth_created; } @@ -3853,7 +3866,25 @@ generate_subtree_copies (struct access *access, tree agg, !insert_after, insert_after ? GSI_NEW_STMT : GSI_SAME_STMT); - stmt = gimple_build_assign (expr, repl); + if (sra_mode == SRA_MODE_FINAL_INTRA && VAR_P (access->base) + && (access->grp_scalar_write || access->grp_assignment_write)) + { + enum internal_fn fcode; + if (access->first_child == NULL + && access->next_sibling == NULL) + fcode = IFN_SET_RET_LAST_PARTS; + else + fcode = IFN_SET_RET_PARTS; + + gimple *call = gimple_build_call_internal ( + fcode, 4, access->base, + wide_int_to_tree (sizetype, access->offset), + wide_int_to_tree (sizetype, access->size), repl); + gimple_call_set_lhs (call, expr); + stmt = call; + } + else + stmt = gimple_build_assign (expr, repl); } gimple_set_location (stmt, loc); diff --git a/gcc/testsuite/gcc.target/powerpc/pr65421.c b/gcc/testsuite/gcc.target/powerpc/pr65421.c new file mode 100644 index 00000000000..ea86b53afbb --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr65421.c @@ -0,0 +1,10 @@ +/* { dg-require-effective-target hard_float } */ +/* { dg-require-effective-target powerpc_elfv2 } */ +/* { dg-options "-O2" } */ + +/* { 
dg-final { scan-assembler-times {\mlfd\M} 4 {target { lp64 && has_arch_pwr8 } } } } */ +/* { dg-final { scan-assembler-not {\mstd\M} {target { lp64 && has_arch_pwr8 } } } } */ +/* { dg-final { scan-assembler-not {\mld\M} {target { lp64 && has_arch_pwr8 } } } } */ + +typedef struct { double a[4]; } A; +A foo (const A *a) { return *a; } diff --git a/gcc/testsuite/gcc.target/powerpc/pr69143.c b/gcc/testsuite/gcc.target/powerpc/pr69143.c new file mode 100644 index 00000000000..216a270fb7b --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr69143.c @@ -0,0 +1,23 @@ +/* { dg-require-effective-target hard_float } */ +/* { dg-require-effective-target powerpc_elfv2 } */ +/* { dg-options "-O2" } */ + +/* { dg-final { scan-assembler-times {\mfmr\M} 3 {target { lp64 && has_arch_pwr8 } } } } */ +/* { dg-final { scan-assembler-not {\mxscvspdpn\M} {target { lp64 && has_arch_pwr8 } } } } */ + +struct foo1 +{ + float x; + float y; +}; + +struct foo1 +blah1 (struct foo1 y) +{ + struct foo1 x; + + x.x = y.y; + x.y = y.x; + + return x; +}