From patchwork Wed Dec 11 12:31:53 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Kewen.Lin"
X-Patchwork-Id: 1207630
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
To: GCC Patches
Cc: Segher Boessenkool, Bill Schmidt
From: "Kewen.Lin"
Subject: [PATCH, rs6000] Adjust vectorization cost for scalar COND_EXPR
Date: Wed, 11 Dec 2019 20:31:53 +0800
Message-Id: <3329c840-dccc-5d06-740f-7e669fd5e39a@linux.ibm.com>

Hi,

We found that the vectorization cost modeling for scalar COND_EXPR is a
bit off on rs6000.  One typical case is 548.exchange2_r:
-Ofast -mcpu=power9 -mrecip -fvect-cost-model=unlimited is better than
-Ofast -mcpu=power9 -mrecip (where the default is
-fvect-cost-model=dynamic) by 1.94%.  A scalar COND_EXPR is normally
expanded into compare + branch or compare + isel, and either sequence
should be priced higher than a simple FXU operation.  This patch adds an
additional vectorization cost for scalar COND_EXPR on top of
builtin_vectorization_cost.

The reasons for using an additional cost of 2 rather than another value:

  1) I tried candidate values from 1 to 5, and 2 measured best on Power9.
  2) From a latency view, compare takes 3 cycles and isel takes 2 on
     Power9, which is 2.5 times a simple FXU instruction.  A simple FXU
     instruction has cost 1 in the current modeling, so the resulting
     cost of 3 (1 + 2) is close.
  3) It gives fine SPEC2017 ratios on Power8 as well.

The SPEC2017 performance evaluation on Power9 with explicit unrolling
shows a +2.35% gain on 548.exchange2_r but a -1.99% degradation on
526.blender_r; the others are trivial.  On further investigation of
526.blender_r, the assembly of the 10 hottest functions is unchanged, so
the impact should be due to some side effects.  SPECINT geomean +0.16%,
SPECFP geomean -0.16% (mainly due to blender_r).

Without explicit unrolling, 548.exchange2_r gains +1.78% and the others
are trivial.  SPECINT geomean +0.19%, SPECFP geomean +0.06%.

The SPEC2017 performance evaluation on Power8 shows a +1.32% gain on
500.perlbench_r and a +2.03% gain on 511.povray_r; the others are
trivial.  SPECINT geomean +0.08%, SPECFP geomean +0.18%.

Bootstrapped and regression tested on powerpc64le-linux-gnu.

Is it OK for trunk?

BR,
Kewen

---
gcc/ChangeLog

2019-12-11  Kewen Lin

	* config/rs6000/rs6000.c (adjust_vectorization_cost): New function.
	(rs6000_add_stmt_cost): Call adjust_vectorization_cost and update
	stmt_cost.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 2995348..5dad3cc 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -5016,6 +5016,29 @@ rs6000_init_cost (struct loop *loop_info)
   return data;
 }
 
+/* Adjust vectorization cost after calling rs6000_builtin_vectorization_cost.
+   For some statement, we would like to further fine-grain tweak the cost on
+   top of rs6000_builtin_vectorization_cost handling which doesn't have any
+   information on statement operation codes etc.  One typical case here is
+   COND_EXPR, it takes the same cost to simple FXU instruction when evaluating
+   for scalar cost, but it should be priced more whatever transformed to either
+   compare + branch or compare + isel instructions.  */
+
+static unsigned
+adjust_vectorization_cost (enum vect_cost_for_stmt kind,
+			   struct _stmt_vec_info *stmt_info)
+{
+  if (kind == scalar_stmt && stmt_info && stmt_info->stmt
+      && gimple_code (stmt_info->stmt) == GIMPLE_ASSIGN)
+    {
+      tree_code subcode = gimple_assign_rhs_code (stmt_info->stmt);
+      if (subcode == COND_EXPR)
+	return 2;
+    }
+
+  return 0;
+}
+
 /* Implement targetm.vectorize.add_stmt_cost.  */
 
 static unsigned
@@ -5031,6 +5054,7 @@ rs6000_add_stmt_cost (void *data, int count, enum vect_cost_for_stmt kind,
       tree vectype = stmt_info ? stmt_vectype (stmt_info) : NULL_TREE;
       int stmt_cost = rs6000_builtin_vectorization_cost (kind, vectype,
 							 misalign);
+      stmt_cost += adjust_vectorization_cost (kind, stmt_info);
       /* Statements in an inner loop relative to the loop being
 	 vectorized are weighted more heavily.  The value here is
 	 arbitrary and could potentially be improved with analysis.  */
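
For readers without a GCC tree at hand, the effect of the adjustment on
the scalar cost can be sketched in isolation.  This is only a toy model
under stated assumptions, not the code in rs6000.c: StmtKind, RhsCode,
base_cost, extra_cost, stmt_cost and scalar_body_cost are hypothetical
stand-ins for GCC's vect_cost_for_stmt, tree_code and cost hooks, and the
baseline cost of 1 per statement is taken from the "simple FXU
instruction costs 1" remark above.

```cpp
// Toy model only: these types are hypothetical stand-ins, not GCC's
// vect_cost_for_stmt / tree_code declarations.
enum class StmtKind { ScalarStmt, VectorStmt };
enum class RhsCode { PlusExpr, MultExpr, CondExpr };

// Assumed baseline: every statement costs 1, like a simple FXU operation.
constexpr unsigned base_cost (StmtKind, RhsCode) { return 1; }

// Extra cost in the spirit of the patch: 2 for a scalar COND_EXPR,
// 0 for everything else.
constexpr unsigned extra_cost (StmtKind kind, RhsCode code)
{
  return (kind == StmtKind::ScalarStmt && code == RhsCode::CondExpr) ? 2 : 0;
}

// Baseline plus adjustment, as the patched add_stmt_cost hook combines them.
constexpr unsigned stmt_cost (StmtKind kind, RhsCode code)
{
  return base_cost (kind, code) + extra_cost (kind, code);
}

// Cost of a scalar loop body made of n_simple plain statements and
// n_cond COND_EXPR statements.
constexpr unsigned scalar_body_cost (unsigned n_simple, unsigned n_cond)
{
  return n_simple * stmt_cost (StmtKind::ScalarStmt, RhsCode::PlusExpr)
	 + n_cond * stmt_cost (StmtKind::ScalarStmt, RhsCode::CondExpr);
}
```

In this model a scalar body with three plain statements and one COND_EXPR
is priced at 6 instead of 4, so the scalar side looks more expensive and
the dynamic cost model is more inclined to accept the vectorized loop.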