From patchwork Wed May 15 08:20:54 2024
X-Patchwork-Submitter: Hongyu Wang
X-Patchwork-Id: 1935378
From: Hongyu Wang
To: gcc-patches@gcc.gnu.org
Cc: ubizjak@gmail.com, hongtao.liu@intel.com
Subject: [PATCH 3/3] [APX CCMP] Support ccmp for float compare
Date: Wed, 15 May 2024 16:20:54 +0800
Message-Id: <20240515082054.3934069-4-hongyu.wang@intel.com>
In-Reply-To: <20240515082054.3934069-1-hongyu.wang@intel.com>
References: <20240515082054.3934069-1-hongyu.wang@intel.com>

The ccmp insn itself does not support fp compares, but x86 has the fp
comi insns, which set EFLAGS, and those flags can serve as the scc
input to ccmp. Allow scalar fp compares in ix86_gen_ccmp_first, except
ORDERED/UNORDERED compares, which cannot be identified by ccmp.

gcc/ChangeLog:

	* config/i386/i386-expand.cc (ix86_gen_ccmp_first): Add fp
	compare and check the allowed fp compare type.
	(ix86_gen_ccmp_next): Adjust compare_code input to ccmp for
	fp compare.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-ccmp-1.c: Add test for fp compare.
	* gcc.target/i386/apx-ccmp-2.c: Likewise.
---
 gcc/config/i386/i386-expand.cc             | 53 ++++++++++++++++++++--
 gcc/testsuite/gcc.target/i386/apx-ccmp-1.c | 45 +++++++++++++++++-
 gcc/testsuite/gcc.target/i386/apx-ccmp-2.c | 47 +++++++++++++++++++
 3 files changed, 138 insertions(+), 7 deletions(-)

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index f00525e449f..7507034dc91 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -25571,18 +25571,58 @@ ix86_gen_ccmp_first (rtx_insn **prep_seq, rtx_insn **gen_seq,
   if (op_mode == VOIDmode)
     op_mode = GET_MODE (op1);
 
+  /* We only support the following scalar comparisons that use just 1
+     instruction: DI/SI/QI/HI/DF/SF/HF.
+     Unordered/Ordered compare cannot be correctly identified by
+     ccmp so they are not supported.  */
   if (!(op_mode == DImode || op_mode == SImode || op_mode == HImode
-	|| op_mode == QImode))
+	|| op_mode == QImode || op_mode == DFmode || op_mode == SFmode
+	|| op_mode == HFmode)
+      || code == ORDERED
+      || code == UNORDERED)
     {
       end_sequence ();
       return NULL_RTX;
     }
 
   /* Canonicalize the operands according to mode.  */
-  if (!nonimmediate_operand (op0, op_mode))
-    op0 = force_reg (op_mode, op0);
-  if (!x86_64_general_operand (op1, op_mode))
-    op1 = force_reg (op_mode, op1);
+  if (SCALAR_INT_MODE_P (op_mode))
+    {
+      if (!nonimmediate_operand (op0, op_mode))
+	op0 = force_reg (op_mode, op0);
+      if (!x86_64_general_operand (op1, op_mode))
+	op1 = force_reg (op_mode, op1);
+    }
+  else
+    {
+      /* op0/op1 can be canonicalized from expand_fp_compare, so
+	 just adjust the code to make it generate supported fp
+	 condition.  */
+      if (ix86_fp_compare_code_to_integer (code) == UNKNOWN)
+	{
+	  /* First try to split the condition if we don't need to honor
+	     NaNs, as the ORDERED/UNORDERED check always falls
+	     through.  */
+	  if (!HONOR_NANS (op_mode))
+	    {
+	      rtx_code first_code;
+	      split_comparison (code, op_mode, &first_code, &code);
+	    }
+	  /* Otherwise try to swap the operand order and check if
+	     the comparison is supported.  */
+	  else
+	    {
+	      code = swap_condition (code);
+	      std::swap (op0, op1);
+	    }
+
+	  if (ix86_fp_compare_code_to_integer (code) == UNKNOWN)
+	    {
+	      end_sequence ();
+	      return NULL_RTX;
+	    }
+	}
+    }
 
   *prep_seq = get_insns ();
   end_sequence ();
@@ -25647,6 +25687,9 @@ ix86_gen_ccmp_next (rtx_insn **prep_seq, rtx_insn **gen_seq, rtx prev,
   dfv = ix86_get_flags_cc ((rtx_code) cmp_code);
 
   prev_code = GET_CODE (prev);
+  /* Fixup FP compare code here.  */
+  if (GET_MODE (XEXP (prev, 0)) == CCFPmode)
+    prev_code = ix86_fp_compare_code_to_integer (prev_code);
 
   if (bit_code != AND)
     prev_code = reverse_condition (prev_code);
diff --git a/gcc/testsuite/gcc.target/i386/apx-ccmp-1.c b/gcc/testsuite/gcc.target/i386/apx-ccmp-1.c
index 5a2dad89f1f..e4e112f07e0 100644
--- a/gcc/testsuite/gcc.target/i386/apx-ccmp-1.c
+++ b/gcc/testsuite/gcc.target/i386/apx-ccmp-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target { ! ia32 } } } */
-/* { dg-options "-O2 -mapx-features=ccmp" } */
+/* { dg-options "-O2 -ffast-math -mapx-features=ccmp" } */
 
 int
 f1 (int a)
@@ -56,8 +56,49 @@ f9 (int a, int b)
   return a == 3 || a == 0;
 }
 
+int
+f10 (float a, int b, float c)
+{
+  return a > c || b < 19;
+}
+
+int
+f11 (float a, int b)
+{
+  return a == 0.0 && b > 21;
+}
+
+int
+f12 (double a, int b)
+{
+  return a < 3.0 && b != 23;
+}
+
+int
+f13 (double a, double b, int c, int d)
+{
+  a += b;
+  c += d;
+  return a != b || c == d;
+}
+
+int
+f14 (double a, int b)
+{
+  return b != 0 && a < 1.5;
+}
+
+int
+f15 (double a, double b, int c, int d)
+{
+  return c != d || a <= b;
+}
+
 /* { dg-final { scan-assembler-times "ccmpg" 2 } } */
 /* { dg-final { scan-assembler-times "ccmple" 2 } } */
 /* { dg-final { scan-assembler-times "ccmpne" 4 } } */
-/* { dg-final { scan-assembler-times "ccmpe" 1 } } */
+/* { dg-final { scan-assembler-times "ccmpe" 3 } } */
+/* { dg-final { scan-assembler-times "ccmpbe" 1 } } */
+/* { dg-final { scan-assembler-times "ccmpa" 1 } } */
+/* { dg-final { scan-assembler-times "ccmpbl" 2 } } */
 
diff --git a/gcc/testsuite/gcc.target/i386/apx-ccmp-2.c b/gcc/testsuite/gcc.target/i386/apx-ccmp-2.c
index 30a1c216c1b..0123a686d2c 100644
--- a/gcc/testsuite/gcc.target/i386/apx-ccmp-2.c
+++ b/gcc/testsuite/gcc.target/i386/apx-ccmp-2.c
@@ -42,6 +42,47 @@ int foo_noapx(int a, int b, int c, int d)
   return sum;
 }
 
+__attribute__((noinline, noclone,
+               optimize(("finite-math-only")), target("apxf")))
+double foo_fp_apx(int a, double b, int c, double d)
+{
+  int sum = a;
+  double sumd = b;
+
+  if (a != c)
+    {
+      sum += a;
+      if (a < c || sumd != d || sum > c)
+        {
+          c += a;
+          sum += a + c;
+        }
+    }
+
+  return sum + sumd;
+}
+
+__attribute__((noinline, noclone,
+               optimize(("finite-math-only")), target("no-apxf")))
+double foo_fp_noapx(int a, double b, int c, double d)
+{
+  int sum = a;
+  double sumd = b;
+
+  if (a != c)
+    {
+      sum += a;
+      if (a < c || sumd != d || sum > c)
+        {
+          c += a;
+          sum += a + c;
+        }
+    }
+
+  return sum + sumd;
+}
+
+
 int main (void)
 {
   if (!__builtin_cpu_supports ("apxf"))
@@ -53,5 +94,11 @@ int main (void)
   if (val1 != val2)
     __builtin_abort ();
 
+  double val3 = foo_fp_noapx (24, 7.5, 32, 2.0);
+  double val4 = foo_fp_apx (24, 7.5, 32, 2.0);
+
+  if (val3 != val4)
+    __builtin_abort ();
+
   return 0;
 }
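
For readers unfamiliar with why the patch maps fp compare codes onto integer-style condition codes and swaps operands for some of them, the following is a minimal sketch and not code from the patch. It assumes the documented COMISS/COMISD flag results (unordered sets ZF=PF=CF=1, greater clears all three, less sets CF, equal sets ZF). The names `comi_flags`, `model_comi`, `cc_a`, `cc_ae`, `cc_b` and the `fp_*` wrappers are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical model of the EFLAGS bits written by COMISS/COMISD.  */
struct comi_flags
{
  bool zf, pf, cf;
};

/* Model of "comi a, b": an unordered result sets ZF, PF and CF.  */
static struct comi_flags
model_comi (double a, double b)
{
  struct comi_flags f = { false, false, false };
  if (isnan (a) || isnan (b))
    f.zf = f.pf = f.cf = true;
  else if (a < b)
    f.cf = true;
  else if (a == b)
    f.zf = true;
  return f;
}

/* Integer-style (unsigned) condition codes evaluated on those flags.  */
static bool cc_a (struct comi_flags f) { return !f.cf && !f.zf; }
static bool cc_ae (struct comi_flags f) { return !f.cf; }
static bool cc_b (struct comi_flags f) { return f.cf; }

/* a > b and a >= b: "above" / "above or equal" are false for NaN.  */
static bool fp_gt (double a, double b) { return cc_a (model_comi (a, b)); }
static bool fp_ge (double a, double b) { return cc_ae (model_comi (a, b)); }

/* a < b: swap the operands and test "above", so NaN still yields false.  */
static bool fp_lt (double a, double b) { return cc_a (model_comi (b, a)); }

/* Without the swap, "below" is also true for unordered operands.  */
static bool fp_lt_unswapped (double a, double b)
{
  return cc_b (model_comi (a, b));
}
```

In this model `fp_lt (NAN, 1.0)` is false while `fp_lt_unswapped (NAN, 1.0)` is true. That difference is the reason a NaN-honoring less-than compare needs its operands swapped before a single flag condition can represent it.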