From patchwork Fri Dec 8 02:16:55 2023
X-Patchwork-Submitter: "Liu, Hongtao"
X-Patchwork-Id: 1873578
From: liuhongt
To: gcc-patches@gcc.gnu.org
Cc: rsandifo@gcc.gnu.org
Subject: [PATCH] Don't assume it's AVX_U128_CLEAN after call_insn whose
 abi.mode_clobbers (V4DImode) doesn't contain all SSE_REGS.
Date: Fri, 8 Dec 2023 10:16:55 +0800
Message-Id: <20231208021655.1595917-1-hongtao.liu@intel.com>

If the function doesn't clobber any SSE registers, or only clobbers
their 128-bit part, then vzeroupper isn't issued before the function
exits, so the status after the call is not CLEAN but ANY.

Also, for a sibling call it's safe to issue a vzeroupper, and a
vzeroupper could otherwise be missing since there's no mode_exit for
sibling_call_p.

Compared to the patch in the PR, this patch adds the sibling_call part.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk and backport?

gcc/ChangeLog:

	PR target/112891
	* config/i386/i386.cc (ix86_avx_u128_mode_after): Return
	AVX_U128_ANY if callee_abi doesn't clobber all_sse_regs to
	align with ix86_avx_u128_mode_needed.
	(ix86_avx_u128_mode_needed): Return AVX_U128_CLEAN for
	sibling_call.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr112891.c: New test.
	* gcc.target/i386/pr112891-2.c: New test.
---
 gcc/config/i386/i386.cc                    | 22 +++++++++++++---
 gcc/testsuite/gcc.target/i386/pr112891-2.c | 30 ++++++++++++++++++++++
 gcc/testsuite/gcc.target/i386/pr112891.c   | 29 +++++++++++++++++++++
 3 files changed, 78 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr112891-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr112891.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 7c5cab4e2c6..fe259cdb789 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -15038,8 +15038,12 @@ ix86_avx_u128_mode_needed (rtx_insn *insn)
          vzeroupper if all SSE registers are clobbered.  */
       const function_abi &abi = insn_callee_abi (insn);
       if (vzeroupper_pattern (PATTERN (insn), VOIDmode)
-          || !hard_reg_set_subset_p (reg_class_contents[SSE_REGS],
-                                     abi.mode_clobbers (V4DImode)))
+          /* It should be safe to issue a vzeroupper before a sibling call.
+             Also, there is no mode_exit for sibling_call, so a vzeroupper
+             could otherwise be missing.  */
+          || !(SIBLING_CALL_P (insn)
+               || hard_reg_set_subset_p (reg_class_contents[SSE_REGS],
+                                         abi.mode_clobbers (V4DImode))))
         return AVX_U128_ANY;
 
       return AVX_U128_CLEAN;
@@ -15177,7 +15181,19 @@ ix86_avx_u128_mode_after (int mode, rtx_insn *insn)
       bool avx_upper_reg_found = false;
       note_stores (insn, ix86_check_avx_upper_stores, &avx_upper_reg_found);
 
-      return avx_upper_reg_found ? AVX_U128_DIRTY : AVX_U128_CLEAN;
+      if (avx_upper_reg_found)
+        return AVX_U128_DIRTY;
+
+      /* If the function doesn't clobber any SSE registers, or only clobbers
+         their 128-bit part, then vzeroupper isn't issued before the function
+         exits, so the status after the call is not CLEAN but ANY.  */
+      const function_abi &abi = insn_callee_abi (insn);
+      if (!(SIBLING_CALL_P (insn)
+            || hard_reg_set_subset_p (reg_class_contents[SSE_REGS],
+                                      abi.mode_clobbers (V4DImode))))
+        return AVX_U128_ANY;
+
+      return AVX_U128_CLEAN;
     }
 
   /* Otherwise, return current mode.  Remember that if insn
diff --git a/gcc/testsuite/gcc.target/i386/pr112891-2.c b/gcc/testsuite/gcc.target/i386/pr112891-2.c
new file mode 100644
index 00000000000..164c3985d50
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr112891-2.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx2 -O3" } */
+/* { dg-final { scan-assembler-times "vzeroupper" 1 } } */
+
+void
+__attribute__((noinline))
+bar (double* a)
+{
+  a[0] = 1.0;
+  a[1] = 2.0;
+}
+
+double
+__attribute__((noinline))
+foo (double* __restrict a, double* b)
+{
+  a[0] += b[0];
+  a[1] += b[1];
+  a[2] += b[2];
+  a[3] += b[3];
+  bar (b);
+  return a[5] + b[5];
+}
+
+double
+foo1 (double* __restrict a, double* b)
+{
+  double c = foo (a, b);
+  return __builtin_exp (c);
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr112891.c b/gcc/testsuite/gcc.target/i386/pr112891.c
new file mode 100644
index 00000000000..dbf6c67948a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr112891.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx2 -O3" } */
+/* { dg-final { scan-assembler-times "vzeroupper" 1 } } */
+
+void
+__attribute__((noinline))
+bar (double* a)
+{
+  a[0] = 1.0;
+  a[1] = 2.0;
+}
+
+void
+__attribute__((noinline))
+foo (double* __restrict a, double* b)
+{
+  a[0] += b[0];
+  a[1] += b[1];
+  a[2] += b[2];
+  a[3] += b[3];
+  bar (b);
+}
+
+double
+foo1 (double* __restrict a, double* b)
+{
+  foo (a, b);
+  return __builtin_exp (b[1]);
+}
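
Reviewer aid, not part of the patch: the call handling that the two
hunks in i386.cc arrive at can be modelled as a small standalone C
sketch.  Everything here is hypothetical scaffolding (the enum, the
call_facts struct and both function names are invented for
illustration); each boolean stands in for the GCC query named in its
comment.  Only the logic visible in the diff is modelled, so any
earlier checks in the real functions are out of scope.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the AVX_U128_* modes used in the patch.  */
enum u128_mode { U128_CLEAN, U128_DIRTY, U128_ANY };

/* Hypothetical summary of one call insn.  */
struct call_facts
{
  bool is_vzeroupper;        /* vzeroupper_pattern (PATTERN (insn), VOIDmode) */
  bool is_sibling_call;      /* SIBLING_CALL_P (insn) */
  bool clobbers_all_sse_256; /* SSE_REGS is a subset of
                                abi.mode_clobbers (V4DImode) */
  bool avx_upper_reg_found;  /* result of the note_stores walk */
};

/* Mode wanted before the call, as in the patched
   ix86_avx_u128_mode_needed hunk.  */
static enum u128_mode
mode_needed_before_call (const struct call_facts *c)
{
  if (c->is_vzeroupper
      || !(c->is_sibling_call || c->clobbers_all_sse_256))
    return U128_ANY;
  return U128_CLEAN;
}

/* Mode assumed after the call, as in the patched
   ix86_avx_u128_mode_after hunk.  */
static enum u128_mode
mode_after_call (const struct call_facts *c)
{
  if (c->avx_upper_reg_found)
    return U128_DIRTY;
  if (!(c->is_sibling_call || c->clobbers_all_sse_256))
    return U128_ANY;
  return U128_CLEAN;
}

/* Small self-check of the case the PR is about: a callee that does not
   clobber the full 256-bit SSE registers no longer leaves the state
   CLEAN after the call.  */
static void
check_pr_case (void)
{
  struct call_facts callee = { false, false, false, false };
  assert (mode_after_call (&callee) == U128_ANY);
}
```

In this model the pre-patch behaviour corresponds to mode_after_call
returning U128_CLEAN whenever avx_upper_reg_found is false, which is
the assumption the subject line says should be dropped.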