From patchwork Fri Jul 16 05:33:21 2021
X-Patchwork-Submitter: "Kewen.Lin"
X-Patchwork-Id: 1506013
Subject: [PATCH v4] vect: Recog mul_highpart pattern
From: "Kewen.Lin"
To: Richard Biener
Cc: Richard Sandiford, Bill Schmidt, GCC Patches, Segher Boessenkool
Date: Fri, 16 Jul 2021 13:33:21 +0800

on 2021/7/15 7:58 PM, Richard Biener wrote:
> On Thu, Jul 15, 2021 at 10:41 AM Kewen.Lin wrote:
>>
>> on 2021/7/15 4:04 PM, Kewen.Lin via Gcc-patches wrote:
>>> Hi Uros,
>>>
>>> on 2021/7/15 3:17 PM, Uros Bizjak wrote:
>>>> On Thu, Jul 15, 2021 at 9:07 AM Kewen.Lin wrote:
>>>>>
>>>>> on 2021/7/14 3:45 PM, Kewen.Lin via Gcc-patches wrote:
>>>>>> on 2021/7/14 2:38 PM, Richard Biener wrote:
>>>>>>> On Tue, Jul 13, 2021 at 4:59 PM Kewen.Lin wrote:
>>>>>>>>
>>>>>>>> on 2021/7/13 8:42 PM, Richard Biener wrote:
>>>>>>>>> On Tue, Jul 13, 2021 at 12:25 PM Kewen.Lin wrote:
>>>>>>>
>>>>>>>> I guess the proposed IFN would be directly mapped for [us]mul_highpart?
>>>>>>>
>>>>>>> Yes.
>>>>>>>
>>>>>>
>>>>>> Thanks for confirming! The related patch v2 is attached and the testing
>>>>>> is ongoing.
>>>>>>
>>>>>
>>>>> It's bootstrapped & regtested on powerpc64le-linux-gnu P9 and
>>>>> aarch64-linux-gnu. But on x86_64-redhat-linux there are XPASSes as below:
>>>>>
>>>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
>>>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
>>>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
>>>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
>>>>
>>>> These XFAILs should be removed after your patch.
>>>>
>>> I'm curious whether it's intentional not to specify -fno-vect-cost-model
>>> for this test case. As noted above, this case is sensitive on how we
>>> cost mult_highpart. Without cost modeling, the XFAILs can be removed
>>> only with this mul_highpart pattern support, no matter how we model it
>>> (x86 part of this patch exists or not).
>>>
>>>> This is PR100696 [1], we want PMULH.W here, so x86 part of the patch
>>>> is actually not needed.
>>>>
>>>
>>> Thanks for the information! The justification for the x86 part is that:
>>> the IFN_MULH essentially covers MULT_HIGHPART_EXPR with mul_highpart
>>> optab support, i386 port has already customized costing for
>>> MULT_HIGHPART_EXPR (should mean/involve the case with mul_highpart optab
>>> support), if we don't follow the same way for IFN_MULH, I'm worried that
>>> we may cost the IFN_MULH wrongly. If taking IFN_MULH as normal stmt is
>>> a right thing (we shouldn't cost it specially), it at least means we
>>> have to adjust ix86_multiplication_cost for MULT_HIGHPART_EXPR when it
>>> has direct mul_highpart optab support, I think they should be costed
>>> consistently. Does it sound reasonable?
>>>
>>
>> Hi Richard(s),
>>
>> This possibly inconsistent handling problem seems like a counter example
>> better to use a new IFN rather than the existing tree_code, it seems hard
>> to maintain (should remember to keep consistent for its handlings). ;)
>> From this perspective, maybe it's better to move backward to use tree_code
>> and guard it under can_mult_highpart_p == 1 (just like IFN and avoid
>> costing issue Richi pointed out before)?
>>
>> What do you think?
>
> No, whenever we want to do code generation based on machine
> capabilities the canonical way to test for those is to look at optabs
> and then it's most natural to keep that 1:1 relation and emit
> internal function calls which directly map to supported optabs
> instead of going back to some tree codes.
>
> When targets "lie" and provide expanders for something they can
> only emulate then they have to compensate in their costing.
> But as I understand this isn't the case for x86 here.
>
> Now, in this case we already have the MULT_HIGHPART_EXPR tree,
> so yes, it might make sense to use that instead of introducing an
> alternate way via the direct internal function.
> Somebody decided
> that MULT_HIGHPART is generic enough to warrant this - but I
> see that expand_mult_highpart can fail unless can_mult_highpart_p
> and this is exactly one of the cases we want to avoid - either
> we can handle something generally in which case it can be a
> tree code or we can't, then it should be 1:1 tied to optabs at best
> (mult_highpart has scalar support only for the direct optab,
> vector support also for widen_mult).
>

Thanks for the detailed explanation! The attached v4 keeps the preferred
IFN approach from v3; the only change is the test case update.

Bootstrapped & regtested again on powerpc64le-linux-gnu P9,
x86_64-redhat-linux and aarch64-linux-gnu.

Is it ok for trunk?

BR,
Kewen
-----
gcc/ChangeLog:

	PR tree-optimization/100696
	* internal-fn.c (first_commutative_argument): Add info for IFN_MULH.
	* internal-fn.def (IFN_MULH): New internal function.
	* tree-vect-patterns.c (vect_recog_mulhs_pattern): Add support to
	recog normal multiply highpart as IFN_MULH.
	* config/i386/i386.c (ix86_add_stmt_cost): Adjust for combined
	function CFN_MULH.

gcc/testsuite/ChangeLog:

	PR tree-optimization/100696
	* gcc.target/i386/pr100637-3w.c: Adjust for mul_highpart recog.
---
 gcc/config/i386/i386.c                      |  3 ++
 gcc/internal-fn.c                           |  1 +
 gcc/internal-fn.def                         |  2 ++
 gcc/testsuite/gcc.target/i386/pr100637-3w.c |  6 ++--
 gcc/tree-vect-patterns.c                    | 38 +++++++++++++++------
 5 files changed, 37 insertions(+), 13 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a93128fa0a4..1dd9108353c 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -22559,6 +22559,9 @@ ix86_add_stmt_cost (class vec_info *vinfo, void *data, int count,
                                    mode == SFmode ? ix86_cost->fmass
                                    : ix86_cost->fmasd);
         break;
+      case CFN_MULH:
+        stmt_cost = ix86_multiplication_cost (ix86_cost, mode);
+        break;
       default:
         break;
       }
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index fb8b43d1ce2..b1b4289357c 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -3703,6 +3703,7 @@ first_commutative_argument (internal_fn fn)
     case IFN_FNMS:
     case IFN_AVG_FLOOR:
     case IFN_AVG_CEIL:
+    case IFN_MULH:
     case IFN_MULHS:
     case IFN_MULHRS:
     case IFN_FMIN:
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index c3b8e730960..ed6d7de1680 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -169,6 +169,8 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (AVG_FLOOR, ECF_CONST | ECF_NOTHROW, first,
 
 DEF_INTERNAL_SIGNED_OPTAB_FN (AVG_CEIL, ECF_CONST | ECF_NOTHROW, first,
                               savg_ceil, uavg_ceil, binary)
+DEF_INTERNAL_SIGNED_OPTAB_FN (MULH, ECF_CONST | ECF_NOTHROW, first,
+                              smul_highpart, umul_highpart, binary)
 DEF_INTERNAL_SIGNED_OPTAB_FN (MULHS, ECF_CONST | ECF_NOTHROW, first,
                               smulhs, umulhs, binary)
 DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | ECF_NOTHROW, first,
diff --git a/gcc/testsuite/gcc.target/i386/pr100637-3w.c b/gcc/testsuite/gcc.target/i386/pr100637-3w.c
index b951f30f571..4ea467b4af5 100644
--- a/gcc/testsuite/gcc.target/i386/pr100637-3w.c
+++ b/gcc/testsuite/gcc.target/i386/pr100637-3w.c
@@ -1,6 +1,6 @@
 /* PR target/100637 */
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize -msse4" } */
+/* { dg-options "-O2 -ftree-vectorize -msse4 -fno-vect-cost-model" } */
 
 short r[2], a[2], b[2];
 unsigned short ur[2], ua[2], ub[2];
@@ -13,7 +13,7 @@ void mulh (void)
     r[i] = ((int) a[i] * b[i]) >> 16;
 }
 
-/* { dg-final { scan-assembler "pmulhw" { xfail *-*-* } } } */
+/* { dg-final { scan-assembler "pmulhw" } } */
 
 void mulhu (void)
 {
@@ -23,7 +23,7 @@ void mulhu (void)
     ur[i] = ((unsigned int) ua[i] * ub[i]) >> 16;
 }
 
-/* { dg-final { scan-assembler "pmulhuw" { xfail *-*-* } } } */
+/* { dg-final { scan-assembler "pmulhuw" } } */
 
 void mulhrs (void)
 {
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index b2e7fc2cc7a..ada89d7060b 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -1896,8 +1896,15 @@ vect_recog_over_widening_pattern (vec_info *vinfo,
 
    1) Multiply high with scaling
      TYPE res = ((TYPE) a * (TYPE) b) >> c;
+     Here, c is bitsize (TYPE) / 2 - 1.
+
    2) ... or also with rounding
      TYPE res = (((TYPE) a * (TYPE) b) >> d + 1) >> 1;
+     Here, d is bitsize (TYPE) / 2 - 2.
+
+   3) Normal multiply high
+     TYPE res = ((TYPE) a * (TYPE) b) >> e;
+     Here, e is bitsize (TYPE) / 2.
 
    where only the bottom half of res is used.  */
 
@@ -1942,7 +1949,6 @@ vect_recog_mulhs_pattern (vec_info *vinfo,
   stmt_vec_info mulh_stmt_info;
   tree scale_term;
   internal_fn ifn;
-  unsigned int expect_offset;
 
   /* Check for the presence of the rounding term.  */
   if (gimple_assign_rhs_code (rshift_input_stmt) == PLUS_EXPR)
@@ -1991,25 +1997,37 @@ vect_recog_mulhs_pattern (vec_info *vinfo,
 
       /* Get the scaling term.  */
       scale_term = gimple_assign_rhs2 (plus_input_stmt);
+      /* Check that the scaling factor is correct.  */
+      if (TREE_CODE (scale_term) != INTEGER_CST)
+        return NULL;
+
+      /* Check pattern 2).  */
+      if (wi::to_widest (scale_term) + target_precision + 2
+          != TYPE_PRECISION (lhs_type))
+        return NULL;
 
-      expect_offset = target_precision + 2;
       ifn = IFN_MULHRS;
     }
   else
     {
       mulh_stmt_info = rshift_input_stmt_info;
       scale_term = gimple_assign_rhs2 (last_stmt);
+      /* Check that the scaling factor is correct.  */
+      if (TREE_CODE (scale_term) != INTEGER_CST)
+        return NULL;
 
-      expect_offset = target_precision + 1;
-      ifn = IFN_MULHS;
+      /* Check for pattern 1).  */
+      if (wi::to_widest (scale_term) + target_precision + 1
+          == TYPE_PRECISION (lhs_type))
+        ifn = IFN_MULHS;
+      /* Check for pattern 3).  */
+      else if (wi::to_widest (scale_term) + target_precision
+               == TYPE_PRECISION (lhs_type))
+        ifn = IFN_MULH;
+      else
+        return NULL;
     }
 
-  /* Check that the scaling factor is correct.  */
-  if (TREE_CODE (scale_term) != INTEGER_CST
-      || wi::to_widest (scale_term) + expect_offset
-           != TYPE_PRECISION (lhs_type))
-    return NULL;
-
   /* Check whether the scaling input term can be seen as two widened
      inputs multiplied together.  */
   vect_unpromoted_value unprom_mult[2];
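
Here is a minimal scalar sketch in C of the three shapes listed in the updated vect_recog_mulhs_pattern comment. It uses 16-bit elements widened to a 32-bit TYPE, so bitsize (TYPE) / 2 is 16. The helper names (mulhs_s16, mulhrs_s16, mulh_s16, mulh_u16, mulh_loop) are hypothetical and only model the arithmetic; they are not GCC code.

The sketch assumes GCC's implementation-defined behavior in two places: right shifts of negative values are arithmetic, and narrowing conversions wrap modulo 2^N. The x86 instruction names in the comments are only examples of target instructions for these operations.

```c
#include <stdint.h>

/* Hypothetical reference helpers (not GCC APIs) for the scalar meaning of
   the shapes vect_recog_mulhs_pattern recognizes, with 16-bit inputs and
   a 32-bit TYPE.  Relies on GCC's arithmetic right shift of negative
   values and modulo narrowing conversions.  */

/* 1) Multiply high with scaling (IFN_MULHS): c = 32 / 2 - 1 = 15.  */
static int16_t
mulhs_s16 (int16_t a, int16_t b)
{
  return (int16_t) (((int32_t) a * b) >> 15);
}

/* 2) With rounding (IFN_MULHRS, e.g. x86 pmulhrsw): shift by
   d = 32 / 2 - 2 = 14, add 1, then shift right by 1.  */
static int16_t
mulhrs_s16 (int16_t a, int16_t b)
{
  return (int16_t) (((((int32_t) a * b) >> 14) + 1) >> 1);
}

/* 3) Normal multiply high (IFN_MULH, e.g. x86 pmulhw): shift by
   e = 32 / 2 = 16, i.e. keep the upper half of the full product.  */
static int16_t
mulh_s16 (int16_t a, int16_t b)
{
  return (int16_t) (((int32_t) a * b) >> 16);
}

/* Unsigned variant of 3) (umul_highpart, e.g. x86 pmulhuw).  */
static uint16_t
mulh_u16 (uint16_t a, uint16_t b)
{
  return (uint16_t) (((uint32_t) a * b) >> 16);
}

/* Same loop body as mulh () in pr100637-3w.c, generalized to N
   elements; this is the scalar shape that pattern 3) targets.  */
void
mulh_loop (short *r, const short *a, const short *b, int n)
{
  for (int i = 0; i < n; i++)
    r[i] = ((int) a[i] * b[i]) >> 16;
}
```

With a = b = 16384 (0.5 in Q15), mulhs_s16 and mulhrs_s16 both return 8192 (0.25 in Q15). mulh_s16 returns 4096 because it shifts one bit further.

With a = 1 and b = 16384, the exact Q15 result is one half of the lowest bit. mulhs_s16 truncates it to 0, while mulhrs_s16 rounds it to 1.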
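
The patch replaces the single expect_offset comparison with a separate shift-amount check for each pattern. Below is a rough standalone model of that decision, under these assumptions:

- The enum and function names are hypothetical.
- Plain int stands in for wi::to_widest and TYPE_PRECISION.
- The sketch covers only the shift-amount test. It ignores everything else the real pattern verifies, such as the widened multiplication inputs and whether the target supports the chosen internal function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names modeling which internal function the patched
   vect_recog_mulhs_pattern would pick; not GCC identifiers.  */
enum mulh_kind
{
  MULH_KIND_NONE,   /* Shift amount matches no pattern: give up.  */
  MULH_KIND_MULHS,  /* Pattern 1), IFN_MULHS.  */
  MULH_KIND_MULHRS, /* Pattern 2), IFN_MULHRS.  */
  MULH_KIND_MULH    /* Pattern 3), IFN_MULH (new in this patch).  */
};

/* LHS_PREC models TYPE_PRECISION (lhs_type), the precision of the wide
   TYPE being shifted.  TARGET_PREC models target_precision, the number
   of result bits the users need.  SHIFT models scale_term: the inner
   shift amount d when HAS_ROUNDING is true, otherwise the final shift
   amount.  */
static enum mulh_kind
classify_mulh (int lhs_prec, int target_prec, int shift, bool has_rounding)
{
  /* Assumed precondition: the needed result is narrower than the
     shifted type.  */
  assert (target_prec > 0 && target_prec < lhs_prec);

  if (has_rounding)
    /* Pattern 2): d == lhs_prec - target_prec - 2.  */
    return (shift + target_prec + 2 == lhs_prec
            ? MULH_KIND_MULHRS : MULH_KIND_NONE);

  /* Pattern 1): c == lhs_prec - target_prec - 1.  */
  if (shift + target_prec + 1 == lhs_prec)
    return MULH_KIND_MULHS;

  /* Pattern 3): e == lhs_prec - target_prec.  */
  if (shift + target_prec == lhs_prec)
    return MULH_KIND_MULH;

  return MULH_KIND_NONE;
}
```

For 16-bit results computed in 32-bit int, the model picks:

- IFN_MULHS for a shift of 15 (c in the updated comment).
- IFN_MULH for a shift of 16 (e).
- IFN_MULHRS for an inner shift of 14 (d).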