From patchwork Tue Jul 2 03:23:36 2024
X-Patchwork-Submitter: Hongyu Wang
X-Patchwork-Id: 1954999
From: Hongyu Wang
To: gcc-patches@gcc.gnu.org
Cc: hongtao.liu@intel.com, ubizjak@gmail.com
Subject: [PATCH] [APX PPX] Avoid generating unmatched pushp/popp in pro/epilogue
Date: Tue, 2 Jul 2024 11:23:36 +0800
Message-Id: <20240702032336.2766166-1-hongyu.wang@intel.com>

Hi,

According to the APX spec, pushp/popp pairs should be matched; otherwise
the PPX hint cannot take effect, which causes a performance loss.

In ix86_expand_epilogue, several optimizations may make the epilogue
restore registers with mov instead of pop. Record whether PPX was
applied in the prologue and, if so, prevent the use of mov/leave in the
epilogue.

Bootstrapped/regtested on x86_64-pc-linux-gnu.

Ok for trunk?

gcc/ChangeLog:

        * config/i386/i386.cc (ix86_expand_prologue): Set apx_ppx_used
        flag in m->fs when TARGET_APX_PPX && !crtl->calls_eh_return.
        (ix86_emit_save_regs): Emit the PPX hint only when
        TARGET_APX_PPX && !crtl->calls_eh_return.
        (ix86_expand_epilogue): Don't restore regs using mov when
        apx_ppx_used flag is true.
        * config/i386/i386.h (struct machine_frame_state): Add
        apx_ppx_used flag.

gcc/testsuite/ChangeLog:

        * gcc.target/i386/apx-ppx-2.c: New test.
        * gcc.target/i386/apx-ppx-3.c: Likewise.
---
 gcc/config/i386/i386.cc                   | 13 +++++++++----
 gcc/config/i386/i386.h                    |  4 ++++
 gcc/testsuite/gcc.target/i386/apx-ppx-2.c | 14 ++++++++++++++
 gcc/testsuite/gcc.target/i386/apx-ppx-3.c |  7 +++++++
 4 files changed, 34 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ppx-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ppx-3.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index bd7411190af..99def8d4a77 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -7429,6 +7429,7 @@ ix86_emit_save_regs (void)
 {
   int regno;
   rtx_insn *insn;
+  bool use_ppx = TARGET_APX_PPX && !crtl->calls_eh_return;
 
   if (!TARGET_APX_PUSH2POP2
       || !ix86_can_use_push2pop2 ()
@@ -7438,7 +7439,7 @@ ix86_emit_save_regs (void)
         if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, true, true))
           {
             insn = emit_insn (gen_push (gen_rtx_REG (word_mode, regno),
-                                        TARGET_APX_PPX));
+                                        use_ppx));
             RTX_FRAME_RELATED_P (insn) = 1;
           }
     }
@@ -7469,7 +7470,7 @@ ix86_emit_save_regs (void)
                                                               regno_list[0]),
                                                  gen_rtx_REG (word_mode,
                                                               regno_list[1]),
-                                                 TARGET_APX_PPX));
+                                                 use_ppx));
                     RTX_FRAME_RELATED_P (insn) = 1;
 
                     rtx dwarf = gen_rtx_SEQUENCE (VOIDmode, rtvec_alloc (3));
@@ -7502,7 +7503,7 @@ ix86_emit_save_regs (void)
               else
                 {
                   insn = emit_insn (gen_push (gen_rtx_REG (word_mode, regno),
-                                              TARGET_APX_PPX));
+                                              use_ppx));
                   RTX_FRAME_RELATED_P (insn) = 1;
                   aligned = true;
                 }
@@ -7511,7 +7512,7 @@ ix86_emit_save_regs (void)
         {
           insn = emit_insn (gen_push (gen_rtx_REG (word_mode,
                                                    regno_list[0]),
-                                      TARGET_APX_PPX));
+                                      use_ppx));
           RTX_FRAME_RELATED_P (insn) = 1;
         }
     }
@@ -8985,6 +8986,7 @@ ix86_expand_prologue (void)
       if (!frame.save_regs_using_mov)
         {
           ix86_emit_save_regs ();
+          m->fs.apx_ppx_used = TARGET_APX_PPX && !crtl->calls_eh_return;
           int_registers_saved = true;
           gcc_assert (m->fs.sp_offset == frame.reg_save_offset);
         }
@@ -9870,6 +9872,9 @@ ix86_expand_epilogue (int style)
   /* SEH requires the use of pops to identify the epilogue.  */
   else if (TARGET_SEH)
     restore_regs_via_mov = false;
+  /* If we already save reg with pushp, don't use move at epilogue.  */
+  else if (m->fs.apx_ppx_used)
+    restore_regs_via_mov = false;
   /* If we're only restoring one register and sp cannot be used then
      using a move instruction to restore the register since it's
      less work than reloading sp and popping the register.  */
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 147b12cd014..0c5292e1d64 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2693,6 +2693,10 @@ struct GTY(()) machine_frame_state
      The flags realigned and sp_realigned are mutually exclusive.  */
   BOOL_BITFIELD sp_realigned : 1;
 
+  /* When APX_PPX used in prologue, force epilogue to emit
+  popp instead of move and leave.  */
+  BOOL_BITFIELD apx_ppx_used : 1;
+
   /* If sp_realigned is set, this is the last valid offset from the CFA
      that can be used for access with the frame pointer.  */
   HOST_WIDE_INT sp_realigned_fp_last;
diff --git a/gcc/testsuite/gcc.target/i386/apx-ppx-2.c b/gcc/testsuite/gcc.target/i386/apx-ppx-2.c
new file mode 100644
index 00000000000..42a95340b55
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/apx-ppx-2.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O1 -mapx-features=ppx -fno-omit-frame-pointer" } */
+
+/* { dg-final { scan-assembler "pushp" } } */
+/* { dg-final { scan-assembler "popp" } } */
+/* { dg-final { scan-assembler-not "leave" } } */
+
+extern int bar (int a);
+extern int *q;
+
+void foo (int *a)
+{
+  q[2] = bar (q[1]);
+}
diff --git a/gcc/testsuite/gcc.target/i386/apx-ppx-3.c b/gcc/testsuite/gcc.target/i386/apx-ppx-3.c
new file mode 100644
index 00000000000..76931fbe294
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/apx-ppx-3.c
@@ -0,0 +1,7 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mapx-features=ppx" } */
+
+/* { dg-final { scan-assembler-not "pushp" } } */
+/* { dg-final { scan-assembler-not "popp" } } */
+
+#include "eh_return-2.c"