From patchwork Wed May 15 00:18:42 2024
X-Patchwork-Submitter: liuhongt
X-Patchwork-Id: 1935230
From: liuhongt
To: gcc-patches@gcc.gnu.org
Cc: crazylht@gmail.com, hjl.tools@gmail.com
Subject: [PATCH] [x86] Optimize ashift >> 7 to vpcmpgtb for vector int8.
Date: Wed, 15 May 2024 08:18:42 +0800
Message-Id: <20240515001842.1551438-1-hongtao.liu@intel.com>

x86 has no vector shift instruction for int8 elements, so shifts on
vector int8 are implemented with the vector int16 instructions.  For
some special shift counts the operation can be expanded more cheaply:
an arithmetic right shift by 7 leaves each byte as either 0 or -1
depending on its sign bit, which is exactly the result of comparing
0 > x with vpcmpgtb.  For V64QImode the same result is produced with
vpmovb2m followed by vpmovm2b.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ready to push to trunk.

gcc/ChangeLog:

	PR target/114514
	* config/i386/i386-expand.cc (ix86_expand_vec_shift_qihi_constant):
	Optimize ashift >> 7 to vpcmpgtb.
	(ix86_expand_vecop_qihi_partial): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr114514-shift.c: New test.
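The equivalence the patch relies on can be checked exhaustively in scalar C. The following is only a sketch and not part of the patch: the helper names sra7, gt_zero_mask and count_mismatches are hypothetical, and it assumes the compiler sign-extends when right-shifting a negative value (implementation-defined in ISO C, documented as such by GCC).

```c
#include <stdint.h>

/* Arithmetic right shift of one signed byte by 7.
   Assumption: >> on a negative value sign-extends, as GCC documents.  */
int8_t
sra7 (int8_t x)
{
  return (int8_t) (x >> 7);
}

/* What a single vpcmpgtb lane computes for the operands (0, x):
   all-ones (-1) when 0 > x, otherwise 0.  */
int8_t
gt_zero_mask (int8_t x)
{
  return (0 > x) ? (int8_t) -1 : (int8_t) 0;
}

/* Hypothetical helper: number of int8 values on which the two forms
   disagree.  */
int
count_mismatches (void)
{
  int bad = 0;
  for (int v = INT8_MIN; v <= INT8_MAX; v++)
    if (sra7 ((int8_t) v) != gt_zero_mask ((int8_t) v))
      bad++;
  return bad;
}
```

Calling count_mismatches () from a small driver should report no disagreement under the stated assumption, which is why the shift can be replaced by a compare against a zeroed register.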
---
 gcc/config/i386/i386-expand.cc                | 32 ++++++++++++
 .../gcc.target/i386/pr114514-shift.c          | 49 +++++++++++++++++++
 2 files changed, 81 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr114514-shift.c

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 1ab22fe7973..ab6631f51e3 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -24182,6 +24182,28 @@ ix86_expand_vec_shift_qihi_constant (enum rtx_code code,
     return false;
 
   gcc_assert (code == ASHIFT || code == ASHIFTRT || code == LSHIFTRT);
+
+
+  if (shift_amount == 7
+      && code == ASHIFTRT)
+    {
+      if (qimode == V16QImode
+	  || qimode == V32QImode)
+	{
+	  rtx zero = gen_reg_rtx (qimode);
+	  emit_move_insn (zero, CONST0_RTX (qimode));
+	  emit_move_insn (dest, gen_rtx_fmt_ee (GT, qimode, zero, op1));
+	}
+      else
+	{
+	  gcc_assert (qimode == V64QImode);
+	  rtx kmask = gen_reg_rtx (DImode);
+	  emit_insn (gen_avx512bw_cvtb2maskv64qi (kmask, op1));
+	  emit_insn (gen_avx512bw_cvtmask2bv64qi (dest, kmask));
+	}
+      return true;
+    }
+
   /* Record sign bit.  */
   xor_constant = 1 << (8 - shift_amount - 1);
 
@@ -24292,6 +24314,16 @@ ix86_expand_vecop_qihi_partial (enum rtx_code code, rtx dest, rtx op1, rtx op2)
       return;
     }
 
+  if (CONST_INT_P (op2)
+      && code == ASHIFTRT
+      && INTVAL (op2) == 7)
+    {
+      rtx zero = gen_reg_rtx (qimode);
+      emit_move_insn (zero, CONST0_RTX (qimode));
+      emit_move_insn (dest, gen_rtx_fmt_ee (GT, qimode, zero, op1));
+      return;
+    }
+
   switch (code)
     {
     case MULT:
diff --git a/gcc/testsuite/gcc.target/i386/pr114514-shift.c b/gcc/testsuite/gcc.target/i386/pr114514-shift.c
new file mode 100644
index 00000000000..cf8b32b3b1d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr114514-shift.c
@@ -0,0 +1,49 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512vl -mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "vpxor" 4 } } */
+/* { dg-final { scan-assembler-times "vpcmpgtb" 4 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vpcmpgtb" 5 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "vpmovb2m" 1 } } */
+/* { dg-final { scan-assembler-times "vpmovm2b" 1 } } */
+
+
+typedef char v16qi __attribute__((vector_size(16)));
+typedef char v32qi __attribute__((vector_size(32)));
+typedef char v64qi __attribute__((vector_size(64)));
+typedef char v8qi __attribute__((vector_size(8)));
+typedef char v4qi __attribute__((vector_size(4)));
+
+v4qi
+__attribute__((noipa))
+foo1 (v4qi a)
+{
+  return a >> 7;
+}
+
+v8qi
+__attribute__((noipa))
+foo2 (v8qi a)
+{
+  return a >> 7;
+}
+
+v16qi
+__attribute__((noipa))
+foo3 (v16qi a)
+{
+  return a >> 7;
+}
+
+v32qi
+__attribute__((noipa))
+foo4 (v32qi a)
+{
+  return a >> 7;
+}
+
+v64qi
+__attribute__((noipa))
+foo5 (v64qi a)
+{
+  return a >> 7;
+}
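
The V64QImode branch in the patch uses a mask round trip (vpmovb2m then vpmovm2b) instead of a compare. A portable C model of that round trip might look like the sketch below. It is illustrative only and not part of the patch: NLANES and the names bytes_to_mask, mask_to_bytes and roundtrip_matches_shift are hypothetical, and the reference shift again assumes that >> sign-extends negative values, as GCC documents.

```c
#include <stdint.h>

#define NLANES 64

/* Model of vpmovb2m: bit i of the result is the sign bit of byte i.  */
uint64_t
bytes_to_mask (const int8_t src[NLANES])
{
  uint64_t k = 0;
  for (int i = 0; i < NLANES; i++)
    if (src[i] < 0)
      k |= (uint64_t) 1 << i;
  return k;
}

/* Model of vpmovm2b: byte i becomes all-ones if mask bit i is set,
   otherwise 0.  */
void
mask_to_bytes (int8_t dst[NLANES], uint64_t k)
{
  for (int i = 0; i < NLANES; i++)
    dst[i] = ((k >> i) & 1) ? (int8_t) -1 : (int8_t) 0;
}

/* Hypothetical check: returns 1 if the round trip equals src[i] >> 7
   in every lane for a simple test pattern, 0 otherwise.  */
int
roundtrip_matches_shift (void)
{
  int8_t src[NLANES], dst[NLANES];

  for (int i = 0; i < NLANES; i++)
    {
      /* Spread values over the whole int8 range without relying on
	 out-of-range conversions.  */
      int v = (i * 37) & 0xff;
      src[i] = (int8_t) (v >= 128 ? v - 256 : v);
    }

  mask_to_bytes (dst, bytes_to_mask (src));

  for (int i = 0; i < NLANES; i++)
    if (dst[i] != (int8_t) (src[i] >> 7))
      return 0;
  return 1;
}
```

The model only captures the lane-wise semantics; it says nothing about instruction cost or about which sequence the compiler actually emits.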