From patchwork Thu May 11 15:59:41 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Bill Schmidt
X-Patchwork-Id: 761227
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
To: GCC Patches
Cc: Segher Boessenkool, David Edelsohn
From: Bill Schmidt
Subject: [PATCH, rs6000] Fix PR80695 (vec_construct cost modeling issue)
Date: Thu, 11 May 2017 10:59:41 -0500
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:45.0)
 Gecko/20100101 Thunderbird/45.8.0
Message-Id: <600d0ecf-e93d-79d9-1960-28147921fc6b@linux.vnet.ibm.com>
X-IsSubscribed: yes

Hi,

PR80695 identifies a case (similar to several others we've seen) where SLP
vectorization is too aggressive
about vectorizing stores.  The problem is that we undervalue the cost of a
vec_construct operation.  vec_construct is the vectorizer's representation
for building a vector from scalar elements.  When we construct an integer
vector type from its constituent parts, it requires a direct move from two
GPRs (one instruction on P9, two direct moves and a merge on P8).  The high
cost of this is not reflected in the current cost calculation, which only
counts the cost of combining the elements using N-1 inserts.  This patch
provides a higher estimate that is closer to reality.  Note that all cost
estimation for vectorization is a bit rough, so this should be viewed as a
heuristic.

The patch treats all integer vectors separately from the default case.
There is already special handling for V4SFmode, so this leaves only V2DFmode
in the default case.  It was previously established heuristically that a
cost factor of 2 was appropriate for V2DFmode, so that is left unchanged
here; but since V2DFmode is the only default, we can simplify the
calculation to just return 2.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
regressions.  Is this ok for trunk?

Thanks,
Bill


[gcc]

2017-05-11  Bill Schmidt

        PR target/80695
        * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost):
        Account for direct move costs for vec_construct of integer
        vectors.

[gcc/testsuite]

2017-05-11  Bill Schmidt

        PR target/80695
        * gcc.target/powerpc/pr80695-p8.c: New file.
        * gcc.target/powerpc/pr80695-p9.c: New file.


Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 247809)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -5849,8 +5849,20 @@ rs6000_builtin_vectorization_cost (enum vect_cost_
         if (SCALAR_FLOAT_TYPE_P (elem_type)
             && TYPE_PRECISION (elem_type) == 32)
           return 5;
+        /* On POWER9, integer vector types are built up in GPRs and then
+           use a direct move (2 cycles).  For POWER8 this is even worse,
+           as we need two direct moves and a merge, and the direct moves
+           are five cycles.  */
+        else if (INTEGRAL_TYPE_P (elem_type))
+          {
+            if (TARGET_P9_VECTOR)
+              return TYPE_VECTOR_SUBPARTS (vectype) - 1 + 2;
+            else
+              return TYPE_VECTOR_SUBPARTS (vectype) - 1 + 11;
+          }
         else
-          return max (2, TYPE_VECTOR_SUBPARTS (vectype) - 1);
+          /* V2DFmode doesn't need a direct move.  */
+          return 2;
 
       default:
         gcc_unreachable ();
Index: gcc/testsuite/gcc.target/powerpc/pr80695-p8.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr80695-p8.c	(nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr80695-p8.c	(working copy)
@@ -0,0 +1,18 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-options "-mcpu=power8 -O3 -fdump-tree-slp-details" } */
+
+/* PR80695: Verify cost model for vec_construct on POWER8.  */
+
+long a[10] __attribute__((aligned(16)));
+
+void foo (long i, long j, long k, long l)
+{
+  a[6] = i;
+  a[7] = j;
+  a[8] = k;
+  a[9] = l;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorization is not profitable" 1 "slp2" } } */
Index: gcc/testsuite/gcc.target/powerpc/pr80695-p9.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr80695-p9.c	(nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr80695-p9.c	(working copy)
@@ -0,0 +1,18 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-options "-mcpu=power9 -O3 -fdump-tree-slp-details" } */
+
+/* PR80695: Verify cost model for vec_construct on POWER9.  */
+
+long a[10] __attribute__((aligned(16)));
+
+void foo (long i, long j, long k, long l)
+{
+  a[6] = i;
+  a[7] = j;
+  a[8] = k;
+  a[9] = l;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorization is not profitable" 1 "slp2" } } */
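
For readers who want to see what the changed heuristic does to the numbers,
here is a rough standalone sketch of the vec_construct case before and after
the patch.  It is not GCC code: enum elem_kind and the two function names are
hypothetical stand-ins for the tree-type checks (SCALAR_FLOAT_TYPE_P,
INTEGRAL_TYPE_P) and for TARGET_P9_VECTOR, and nunits stands in for
TYPE_VECTOR_SUBPARTS (vectype).  The constant 11 is taken from the patch as
is; it presumably reflects two five-cycle direct moves plus a merge, but the
message does not spell out that derivation.

```c
#include <assert.h>

/* Hypothetical stand-in for the element-type checks done on GCC trees.  */
enum elem_kind { ELEM_INT, ELEM_FLOAT32, ELEM_FLOAT64 };

/* Cost before the patch: V4SF is special-cased, and everything else is
   charged N-1 inserts with a floor of 2.  */
int
vec_construct_cost_old (enum elem_kind kind, int nunits)
{
  assert (nunits >= 1);
  if (kind == ELEM_FLOAT32)
    return 5;
  return (nunits - 1 > 2) ? nunits - 1 : 2;
}

/* Cost after the patch: integer vectors additionally pay for moving the
   value from GPRs into a vector register (constants as in the patch).
   p9_vector is nonzero when modeling POWER9, zero for POWER8.  */
int
vec_construct_cost_new (enum elem_kind kind, int nunits, int p9_vector)
{
  assert (nunits >= 1);
  if (kind == ELEM_FLOAT32)
    return 5;
  if (kind == ELEM_INT)
    return nunits - 1 + (p9_vector ? 2 : 11);
  /* Only V2DF reaches this point; it needs no direct move.  */
  return 2;
}
```

Under this model a two-element integer vector (what the new tests would
build, assuming 64-bit long as on powerpc64le) goes from a cost of 2 to 3 on
POWER9 and to 12 on POWER8, while V4SF stays at 5 and V2DF stays at 2.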