From patchwork Thu Jun 27 08:23:00 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: liuhongt
X-Patchwork-Id: 1953030
From: liuhongt
To: gcc-patches@gcc.gnu.org
Cc: crazylht@gmail.com, hjl.tools@gmail.com
Subject: [PATCH 0/7][x86] Remove vcond{,u,eq} expanders.
Date: Thu, 27 Jun 2024 16:23:00 +0800
Message-Id: <20240627082307.1166985-1-hongtao.liu@intel.com>
X-Mailer: git-send-email 2.31.1

Obsoleting vcond{,u,eq} causes several regressions.
Some regressions are due to optimizations done directly in
ix86_expand_{fp,int}_vcond, e.g. ix86_expand_sse_fp_minmax.
Others are due to optimizations that rely on the canonicalization done
in ix86_expand_{fp,int}_vcond.

This series adds define_split or define_insn_and_split patterns to
restore those optimizations at pass_combine. It fixes most regressions
in the GCC testsuite except for the ones compiled w/o sse4.1.
W/o sse4.1 a vector conditional move takes 3 instructions, and
pass_combine only supports combining at most 4 instructions.
One possible solution is to add fake "ssemovcc" instructions to help
combine, and split them back into real instructions. This series
doesn't handle that; it just adjusts the affected testcases to XFAIL.

I also tested performance on SPEC2017 with different option sets:
-march=sapphirerapids -O2
-march=x86-64-v3 -O2
-march=x86-64 -O2
-march=sapphirerapids -O2
I didn't observe any obvious performance change; the binaries are mostly
the same.
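
To make the w/o sse4.1 limitation above concrete, here is a minimal scalar
sketch of one lane of a vector conditional move. It assumes the 3
instructions are the usual and/andnot/or select (pand, pandn, por); that
mapping and the names select_lane and select_lanes are illustrative
assumptions, not code from this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of one 32-bit lane of a vector conditional move.
   MASK is expected to be all-ones (pick T) or all-zeros (pick F),
   which is what an SSE integer compare produces per lane.  */
static uint32_t
select_lane (uint32_t mask, uint32_t t, uint32_t f)
{
  assert (mask == 0 || mask == UINT32_MAX);
  uint32_t keep_t = mask & t;   /* step 1: and (assumed pand)     */
  uint32_t keep_f = ~mask & f;  /* step 2: andnot (assumed pandn) */
  return keep_t | keep_f;       /* step 3: or (assumed por)       */
}

/* Hypothetical helper: apply the select to N lanes.  */
static void
select_lanes (uint32_t *dst, const uint32_t *mask,
              const uint32_t *t, const uint32_t *f, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = select_lane (mask[i], t[i], f[i]);
}
```

If the select really is these three operations, then together with the
compare that produces the mask it already fills the 4-instruction combine
window, which would explain why a pattern that also needs a neighbouring
instruction can no longer be matched.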
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Any comments?

liuhongt (7):
  [x86] Add more splitters to match (unspec [op1 op2 (gt op3 constm1_operand)] UNSPEC_BLENDV)
  Lower AVX512 kmask comparison back to AVX2 comparison when op_{true,false} is vector -1/0.
  [x86] Match IEEE min/max with UNSPEC_IEEE_{MIN,MAX}.
  Add more splitter for mskmov with avx512 comparison.
  Adjust testcase for the regressed testcases after obsolete of vcond{,u,eq}.
  [x86] Optimize a < 0 ? -1 : 0 to (signed)a >> 31.
  Remove vcond{,u,eq} expanders since they will be obsolete.

 gcc/config/i386/mmx.md                        | 149 ++--
 gcc/config/i386/sse.md                        | 772 +++++++++++++-----
 gcc/testsuite/g++.target/i386/avx2-pr115517.C |  60 ++
 .../g++.target/i386/avx512-pr115517.C         |  70 ++
 gcc/testsuite/g++.target/i386/pr100637-1b.C   |   4 +-
 gcc/testsuite/g++.target/i386/pr100637-1w.C   |   4 +-
 gcc/testsuite/g++.target/i386/pr103861-1.C    |   4 +-
 .../g++.target/i386/sse4_1-pr100637-1b.C      |  17 +
 .../g++.target/i386/sse4_1-pr100637-1w.C      |  17 +
 .../g++.target/i386/sse4_1-pr103861-1.C       |  17 +
 gcc/testsuite/gcc.target/i386/avx2-pr115517.c |  33 +
 .../gcc.target/i386/avx512-pr115517.c         |  70 ++
 gcc/testsuite/gcc.target/i386/pr103941-2.c    |   2 +-
 gcc/testsuite/gcc.target/i386/pr111023-2.c    |   4 +-
 gcc/testsuite/gcc.target/i386/pr88540.c       |   4 +-
 .../gcc.target/i386/sse4_1-pr88540.c          |  10 +
 gcc/testsuite/gcc.target/i386/vect-div-1.c    |   3 +-
 17 files changed, 918 insertions(+), 322 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/i386/avx2-pr115517.C
 create mode 100644 gcc/testsuite/g++.target/i386/avx512-pr115517.C
 create mode 100644 gcc/testsuite/g++.target/i386/sse4_1-pr100637-1b.C
 create mode 100644 gcc/testsuite/g++.target/i386/sse4_1-pr100637-1w.C
 create mode 100644 gcc/testsuite/g++.target/i386/sse4_1-pr103861-1.C
 create mode 100644 gcc/testsuite/gcc.target/i386/avx2-pr115517.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512-pr115517.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse4_1-pr88540.c
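
The "a < 0 ? -1 : 0 to (signed)a >> 31" patch in the list above rests on
a simple identity for 32-bit signed elements, sketched here in scalar C.
The function names are hypothetical, and the sketch assumes that right
shift of a negative signed value is arithmetic (sign-extending), which
GCC documents but ISO C leaves implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Compare-and-select form: all-ones when A is negative, else zero.  */
static int32_t
sign_mask_select (int32_t a)
{
  return a < 0 ? -1 : 0;
}

/* Shift form: replicate the sign bit into every bit position.
   Relies on >> being an arithmetic shift for negative values.  */
static int32_t
sign_mask_shift (int32_t a)
{
  assert ((-1 >> 1) == -1);  /* check the arithmetic-shift assumption */
  return a >> 31;
}
```

Per vector lane this presumably replaces a compare against zero plus a
select with a single arithmetic right shift by 31.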