From patchwork Wed Aug 10 16:52:06 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Bill Schmidt
X-Patchwork-Id: 657755
Subject: Re: [PATCH, rs6000] Fix vec_construct vectorization cost to be somewhat more accurate
From: Bill Schmidt
To: Richard Biener
Cc: Segher Boessenkool, GCC Patches, David Edelsohn
References: <1b21afb4-a971-a95d-1084-53948c9c7f4c@linux.vnet.ibm.com> <20160718115657.GA14108@gate.crashing.org>
Date: Wed, 10 Aug 2016 11:52:06 -0500
Message-Id: <1470847926.5480.10.camel@oc8801110288.ibm.com>

Sorry for the long delay on getting back to this.  I took a look at the
suggested test cases with the cost model available, and did some SPEC
testing to validate the model.

I found that it is still important to model the 4xfloat case separately
to account for conversion from 64-bit to 32-bit in our internal
representation; correcting the calculation there actually results in no
net change, but the commentary is now better.

The default cost of N-1 that Richard added in his patch applies well to
POWER also, except for V2DI and V2DF modes (N=2) where this tends to
undercount the cost and encourage unprofitable SLP vectorization in some
cases.  Establishing a minimum cost of 2 avoids test suite regressions
and produces acceptable SPEC results.  (So rather than using N instead
of N-1 as in the previous version of this patch, I'm using N-1 with a
floor of 2.)

I looked through gcc.dg/vect/slp-4[35].c with the cost model enabled,
and the results are sensible with these changes.  SPEC results were all
in the noise range.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
regressions.  Ok for trunk?

Thanks,
Bill


2016-08-10  Bill Schmidt

	* config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost):
	Correct costs for vec_construct.


On Mon, 2016-07-18 at 14:29 +0200, Richard Biener wrote:
> On Mon, Jul 18, 2016 at 1:56 PM, Segher Boessenkool wrote:
> > Hi Bill,
> >
> > On Fri, Jul 15, 2016 at 08:55:08AM -0500, Bill Schmidt wrote:
> >> This patch is a follow-up to Richard's patch of
> >> https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00584.html.
> >> The cost of a vec_construct (initialization of an N-way vector by N
> >> scalars) is too low, which can cause too-aggressive vectorization in
> >> particular for N=8 or higher.  Richard changed the default cost to
> >> N-1, which is generally sensible.  For powerpc I am going with a
> >> slightly higher cost of N, which will keep us from being less
> >> conservative than the previous values when N=2.
> >>
> >> In any case, the purpose of this patch is simply to avoid vectorizing
> >> things we shouldn't when we've undercounted the cost of a vec_construct.
> >> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> >> regressions (hence the vectorization decisions in the test suite have
> >> not changed).  Is this ok for trunk?
> >
> > Do you also have a testcase where it does matter?  It would be good to
> > add that, then.  Or is it fixing a regression?
> >
> > I know nothing about the cost model, so someone else will have to review,
> > or I can just say "okay" ;-)
>
> You can maybe look at gcc.dg/vect/slp-4[35].c (and run it with the cost model
> enabled).
>
> Richard.
>
> >
> >
> > Segher
>

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 239310)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -5266,16 +5266,20 @@ rs6000_builtin_vectorization_cost (enum vect_cost_
         return 2;
 
       case vec_construct:
-        elements = TYPE_VECTOR_SUBPARTS (vectype);
+        /* This is a rough approximation assuming non-constant elements
+           constructed into a vector via element insertion.  FIXME:
+           vec_construct is not granular enough for uniformly good
+           decisions.  If the initialization is a splat, this is
+           cheaper than we estimate.  Improve this someday.  */
         elem_type = TREE_TYPE (vectype);
         /* 32-bit vectors loaded into registers are stored as double
-           precision, so we need n/2 converts in addition to the usual
-           n/2 merges to construct a vector of short floats from them.  */
+           precision, so we need 2 permutes, 2 converts, and 1 merge
+           to construct a vector of short floats from them.  */
         if (SCALAR_FLOAT_TYPE_P (elem_type)
             && TYPE_PRECISION (elem_type) == 32)
-          return elements + 1;
+          return 5;
         else
-          return elements / 2 + 1;
+          return max (2, TYPE_VECTOR_SUBPARTS (vectype) - 1);
 
       default:
         gcc_unreachable ();
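
For readers who want to compare the old and new vec_construct costs without
building GCC, the following is a minimal standalone sketch of the two
formulas in the hunk above.  It is not the GCC code: the function names
model_vec_construct_cost and old_vec_construct_cost and the self-check
function are hypothetical, and the element count and the "32-bit float
element" test are passed in as plain arguments rather than being read
from a tree type.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the patched cost.  NUNITS stands in for
   TYPE_VECTOR_SUBPARTS (vectype); ELEM_IS_FLOAT32 stands in for the
   SCALAR_FLOAT_TYPE_P / TYPE_PRECISION == 32 test.  */
static int
model_vec_construct_cost (int nunits, bool elem_is_float32)
{
  if (elem_is_float32)
    return 5;                     /* 2 permutes + 2 converts + 1 merge.  */

  int cost = nunits - 1;          /* Default of N-1 ...  */
  return cost < 2 ? 2 : cost;     /* ... with a floor of 2.  */
}

/* Hypothetical model of the cost computed by the code being replaced.  */
static int
old_vec_construct_cost (int nunits, bool elem_is_float32)
{
  return elem_is_float32 ? nunits + 1 : nunits / 2 + 1;
}

/* Illustrative self-check for a few element counts (assumed examples).  */
static void
vec_construct_cost_selfcheck (void)
{
  /* Four 32-bit floats: the formula changes but the value does not.  */
  assert (old_vec_construct_cost (4, true) == 5);
  assert (model_vec_construct_cost (4, true) == 5);

  /* N=2: the floor keeps the cost at 2 instead of N-1 = 1.  */
  assert (old_vec_construct_cost (2, false) == 2);
  assert (model_vec_construct_cost (2, false) == 2);

  /* Larger N: the new cost grows as N-1 rather than N/2 + 1.  */
  assert (old_vec_construct_cost (8, false) == 5);
  assert (model_vec_construct_cost (8, false) == 7);
}
```

Under these assumptions the sketch matches the description in the message:
the 4xfloat case sees no net change, N=2 stays at the previous value of 2,
and only the wider non-float cases (N=8 and up) become more expensive.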