From patchwork Wed Jan 19 06:31:19 2022
X-Patchwork-Submitter: liuhongt
X-Patchwork-Id: 1581669
From: liuhongt
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] Enhance vec_pack_trunc for integral mode mask.
Date: Wed, 19 Jan 2022 14:31:19 +0800
Message-Id: <20220119063119.21441-1-hongtao.liu@intel.com>

> your description above hints at that the actual modes involved in the
> vec_pack_sbool_trunc are the same so the TYPE_MODE (narrow_vectype)
> and TYPE_MODE (vectype) are not the actual modes participating.  I think
> it would be way better to fix that.
>
> I suppose that since we know TYPE_VECTOR_SUBPARTS is a power of two
> it's always going to be only QImode that is of interest here so maybe a better
> check would be TYPE_MODE (narrow_vectype) == QImode rather than
> the equality check or elide the mode check completely and only retain
> the TYPE_VECTOR_SUBPARTS check you add?
>
> >        optab1 = vec_pack_sbool_trunc_optab;
> >      else
> >        optab1 = optab_for_tree_code (c1, vectype, optab_default);
> > @@ -12213,7 +12216,9 @@ supportable_narrowing_operation (enum tree_code code,
> >        if (VECTOR_BOOLEAN_TYPE_P (intermediate_type)
> >         && VECTOR_BOOLEAN_TYPE_P (prev_type)
> >         && intermediate_mode == prev_mode
>
> Likewise here.
>
> So I think the change is OK if you remove the mode equality checks.

Thanks for the review.  Here is the updated patch; it survived bootstrap
and regtest.  I'm going to check it in if there are no surprises with
SPEC2017 on ICX.

For the testcase in the PR, the patch supports a QI:4 -> HI:16 pack in
multiple steps (QI:N denotes an N-element mask held in QImode): first
pack QI:4 -> QI:8 through vec_pack_sbool_trunc_qi, then pack
QI:8 -> HI:16 through vec_pack_trunc_hi.  The same applies to
QI:2 -> HI:16, which is test4 in mask-pack-prefer128.c.

gcc/ChangeLog:

	PR target/103771
	* tree-vect-stmts.cc (supportable_narrowing_operation): Enhance
	integral mode mask pack by multiple steps, which start with
	vec_pack_sbool_trunc_optab when the number of elements is less
	than BITS_PER_UNIT.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/mask-pack-prefer128.c: New test.
	* gcc.target/i386/mask-pack-prefer256.c: New test.
	* gcc.target/i386/pr103771.c: New test.
---
 .../gcc.target/i386/mask-pack-prefer128.c     |  8 ++++++++
 .../gcc.target/i386/mask-pack-prefer256.c     |  8 ++++++++
 gcc/testsuite/gcc.target/i386/pr103771.c      | 18 ++++++++++++++++++
 gcc/tree-vect-stmts.cc                        | 11 +++++++----
 4 files changed, 41 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/mask-pack-prefer128.c
 create mode 100644 gcc/testsuite/gcc.target/i386/mask-pack-prefer256.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr103771.c

diff --git a/gcc/testsuite/gcc.target/i386/mask-pack-prefer128.c b/gcc/testsuite/gcc.target/i386/mask-pack-prefer128.c
new file mode 100644
index 00000000000..c9ea37c7ed3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/mask-pack-prefer128.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-march=skylake-avx512 -O3 -fopenmp-simd -fdump-tree-vect-details -mprefer-vector-width=128" } */
+/* Disabling epilogues until we find a better way to deal with scans.  */
+/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 10 "vect" } } */
+/* { dg-final { scan-assembler-not "maskmov" } } */
+
+#include "mask-pack.c"
diff --git a/gcc/testsuite/gcc.target/i386/mask-pack-prefer256.c b/gcc/testsuite/gcc.target/i386/mask-pack-prefer256.c
new file mode 100644
index 00000000000..841f51b4041
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/mask-pack-prefer256.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-march=skylake-avx512 -O3 -fopenmp-simd -fdump-tree-vect-details -mprefer-vector-width=256" } */
+/* Disabling epilogues until we find a better way to deal with scans.  */
+/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 10 "vect" } } */
+/* { dg-final { scan-assembler-not "maskmov" } } */
+
+#include "mask-pack.c"
diff --git a/gcc/testsuite/gcc.target/i386/pr103771.c b/gcc/testsuite/gcc.target/i386/pr103771.c
new file mode 100644
index 00000000000..a1a9952b6a8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr103771.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=cascadelake -O3 -fdump-tree-vect-details -mprefer-vector-width=128" } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
+typedef unsigned char uint8_t;
+
+static uint8_t x264_clip_uint8 (int x)
+{
+  return x & (~255) ? (-x) >> 31 : x;
+}
+
+void
+mc_weight (uint8_t* __restrict dst, uint8_t* __restrict src,
+	   int i_width,int i_scale)
+{
+  for(int x = 0; x < i_width; x++)
+    dst[x] = x264_clip_uint8 (src[x] * i_scale);
+}
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 95be4f38eea..824ebb6354b 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -12124,6 +12124,7 @@ supportable_narrowing_operation (enum tree_code code,
   tree intermediate_type, prev_type;
   machine_mode intermediate_mode, prev_mode;
   int i;
+  unsigned HOST_WIDE_INT n_elts;
   bool uns;
 
   *multi_step_cvt = 0;
@@ -12133,8 +12134,9 @@ supportable_narrowing_operation (enum tree_code code,
       c1 = VEC_PACK_TRUNC_EXPR;
       if (VECTOR_BOOLEAN_TYPE_P (narrow_vectype)
 	  && VECTOR_BOOLEAN_TYPE_P (vectype)
-	  && TYPE_MODE (narrow_vectype) == TYPE_MODE (vectype)
-	  && SCALAR_INT_MODE_P (TYPE_MODE (vectype)))
+	  && SCALAR_INT_MODE_P (TYPE_MODE (vectype))
+	  && TYPE_VECTOR_SUBPARTS (vectype).is_constant (&n_elts)
+	  && n_elts < BITS_PER_UNIT)
 	optab1 = vec_pack_sbool_trunc_optab;
       else
 	optab1 = optab_for_tree_code (c1, vectype, optab_default);
@@ -12225,8 +12227,9 @@ supportable_narrowing_operation (enum tree_code code,
 	  = lang_hooks.types.type_for_mode (intermediate_mode, uns);
       if (VECTOR_BOOLEAN_TYPE_P (intermediate_type)
 	  && VECTOR_BOOLEAN_TYPE_P (prev_type)
-	  && intermediate_mode == prev_mode
-	  && SCALAR_INT_MODE_P (prev_mode))
+	  && SCALAR_INT_MODE_P (prev_mode)
+	  && TYPE_VECTOR_SUBPARTS (intermediate_type).is_constant (&n_elts)
+	  && n_elts < BITS_PER_UNIT)
 	interm_optab = vec_pack_sbool_trunc_optab;
       else
 	interm_optab