[rs6000] Adjust vectorization cost for scalar COND_EXPR

Message ID	3329c840-dccc-5d06-740f-7e669fd5e39a@linux.ibm.com
State	New
Headers	show Return-Path: <gcc-patches-return-515693-incoming=patchwork.ozlabs.org@gcc.gnu.org> DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:to:cc :from:subject:date:mime-version:content-type:message-id; q=dns; s=default; b=yN7p7aI81pUUepkl9VdSxkIfkysrU5ABTTlJxWBXDsVfg9oE9C z++3ciALBJMoby8Tgn1qqyIVVdeJjFvyY8MkEID682JLrTDhfLYk6z2Ki0/7CkQ9 NXJFCFd22GOvBAWTxkHY8r030F+tdEdsxAJYz9gvyr4X9bzEsquB0XV/U= Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk Sender: gcc-patches-owner@gcc.gnu.org Gateway: Authorized Use Only! Violators will be prosecuted for <gcc-patches@gcc.gnu.org> from <linkw@linux.ibm.com>; Wed, 11 Dec 2019 12:32:00 -0000 Gateway: Authorized Use Only! Violators will be prosecuted; (version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256) Wed, 11 Dec 2019 12:31:58 -0000 To: GCC Patches <gcc-patches@gcc.gnu.org> Cc: Segher Boessenkool <segher@kernel.crashing.org>, Bill Schmidt <wschmidt@linux.ibm.com> From: "Kewen.Lin" <linkw@linux.ibm.com> Subject: [PATCH, rs6000] Adjust vectorization cost for scalar COND_EXPR Date: Wed, 11 Dec 2019 20:31:53 +0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:68.0) Gecko/20100101 Thunderbird/68.2.2 MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="------------18927FB532E2C09819F6F220" Message-Id: <3329c840-dccc-5d06-740f-7e669fd5e39a@linux.ibm.com>
Series	[rs6000] Adjust vectorization cost for scalar COND_EXPR \| expand [rs6000] Adjust vectorization cost for scalar COND_EXPR

Message ID

3329c840-dccc-5d06-740f-7e669fd5e39a@linux.ibm.com

State

New

Headers

DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:to:cc
	:from:subject:date:mime-version:content-type:message-id; q=dns;
	s=default; b=yN7p7aI81pUUepkl9VdSxkIfkysrU5ABTTlJxWBXDsVfg9oE9C
	z++3ciALBJMoby8Tgn1qqyIVVdeJjFvyY8MkEID682JLrTDhfLYk6z2Ki0/7CkQ9
	NXJFCFd22GOvBAWTxkHY8r030F+tdEdsxAJYz9gvyr4X9bzEsquB0XV/U=
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
Sender: gcc-patches-owner@gcc.gnu.org
To: GCC Patches <gcc-patches@gcc.gnu.org>
Cc: Segher Boessenkool <segher@kernel.crashing.org>,
	Bill Schmidt <wschmidt@linux.ibm.com>
From: "Kewen.Lin" <linkw@linux.ibm.com>
Subject: [PATCH, rs6000] Adjust vectorization cost for scalar COND_EXPR
Date: Wed, 11 Dec 2019 20:31:53 +0800
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15;
	rv:68.0) Gecko/20100101 Thunderbird/68.2.2
MIME-Version: 1.0
Content-Type: multipart/mixed;
	boundary="------------18927FB532E2C09819F6F220"
Message-Id: <3329c840-dccc-5d06-740f-7e669fd5e39a@linux.ibm.com>

Series

[rs6000] Adjust vectorization cost for scalar COND_EXPR | expand

Commit Message

Kewen.Lin Dec. 11, 2019, 12:31 p.m. UTC

Hi,

We found that the vectorization cost modeling on scalar COND_EXPR is a bit off
on rs6000.  One typical case is 548.exchange2_r, -Ofast -mcpu=power9 -mrecip
-fvect-cost-model=unlimited is better than -Ofast -mcpu=power9 -mrecip (the
default is -fvect-cost-model=dynamic) by 1.94%.  Scalar COND_EXPR is expanded
into compare + branch or compare + isel normally, either of them should be
priced more than the simple FXU operation.  This patch is to add additional
vectorization cost onto scalar COND_EXPR on top of builtin_vectorization_cost.
The idea to use additional cost value 2 instead of the others: 1) try various
possible value candidates from 1 to 5, 2 is the best measured on Power9.  2) 
from latency view, compare takes 3 cycles and isel takes 2 on Power9, it's 
2.5 times of simple FXU instruction which takes cost 1 in the current
modeling, it's close.  3) get fine SPEC2017 ratio on Power8 as well.

The SPEC2017 performance evaluation on Power9 with explicit unrolling shows 
548.exchange2_r +2.35% gains, but 526.blender_r -1.99% degradation, the others
is trivial.  By further investigation on 526.blender_r, the assembly of 10
hottest functions are unchanged, the impact should be due to some side effects.
SPECINT geomean +0.16%, SPECFP geomean -0.16% (mainly due to blender_r).
Without explicit unrolling, 548.exchange2_r +1.78% gains and the others are
trivial.  SPECINT geomean +0.19%, SPECINT geomean +0.06%.

While the SPEC2017 performance evaluation on Power8 shows 500.perlbench_r
+1.32% gain and 511.povray_r +2.03% gain, the others are trivial.  SPECINT
geomean +0.08%, SPECINT geomean +0.18%.

Bootstrapped and regress tested on powerpc64le-linux-gnu.  
Is OK for trunk?

BR,
Kewen
---

gcc/ChangeLog

2019-12-11  Kewen Lin  <linkw@gcc.gnu.org>

	* config/rs6000/rs6000.c (adjust_vectorization_cost): New function.
	(rs6000_add_stmt_cost): Call adjust_vectorization_cost and update
	stmt_cost.

Comments

Bill Schmidt Dec. 11, 2019, 1:39 p.m. UTC | #1

Hi!

I can't approve this, but for what it's worth it looks fine to me.

Bill

On 12/11/19 6:31 AM, Kewen.Lin wrote:
> Hi,
>
> We found that the vectorization cost modeling on scalar COND_EXPR is a bit off
> on rs6000.  One typical case is 548.exchange2_r, -Ofast -mcpu=power9 -mrecip
> -fvect-cost-model=unlimited is better than -Ofast -mcpu=power9 -mrecip (the
> default is -fvect-cost-model=dynamic) by 1.94%.  Scalar COND_EXPR is expanded
> into compare + branch or compare + isel normally, either of them should be
> priced more than the simple FXU operation.  This patch is to add additional
> vectorization cost onto scalar COND_EXPR on top of builtin_vectorization_cost.
> The idea to use additional cost value 2 instead of the others: 1) try various
> possible value candidates from 1 to 5, 2 is the best measured on Power9.  2)
> from latency view, compare takes 3 cycles and isel takes 2 on Power9, it's
> 2.5 times of simple FXU instruction which takes cost 1 in the current
> modeling, it's close.  3) get fine SPEC2017 ratio on Power8 as well.
>
> The SPEC2017 performance evaluation on Power9 with explicit unrolling shows
> 548.exchange2_r +2.35% gains, but 526.blender_r -1.99% degradation, the others
> is trivial.  By further investigation on 526.blender_r, the assembly of 10
> hottest functions are unchanged, the impact should be due to some side effects.
> SPECINT geomean +0.16%, SPECFP geomean -0.16% (mainly due to blender_r).
> Without explicit unrolling, 548.exchange2_r +1.78% gains and the others are
> trivial.  SPECINT geomean +0.19%, SPECINT geomean +0.06%.
>
> While the SPEC2017 performance evaluation on Power8 shows 500.perlbench_r
> +1.32% gain and 511.povray_r +2.03% gain, the others are trivial.  SPECINT
> geomean +0.08%, SPECINT geomean +0.18%.
>
> Bootstrapped and regress tested on powerpc64le-linux-gnu.
> Is OK for trunk?
>
> BR,
> Kewen
> ---
>
> gcc/ChangeLog
>
> 2019-12-11  Kewen Lin  <linkw@gcc.gnu.org>
>
> 	* config/rs6000/rs6000.c (adjust_vectorization_cost): New function.
> 	(rs6000_add_stmt_cost): Call adjust_vectorization_cost and update
> 	stmt_cost.
>

Segher Boessenkool Dec. 11, 2019, 1:50 p.m. UTC | #2

On Wed, Dec 11, 2019 at 07:39:38AM -0600, Bill Schmidt wrote:
> I can't approve this, but for what it's worth it looks fine to me.

But I can :-)  Thanks for looking Bill!

The patch is okay for trunk.  Thanks Ke Wen!


Segher


> >2019-12-11  Kewen Lin  <linkw@gcc.gnu.org>
> >
> >	* config/rs6000/rs6000.c (adjust_vectorization_cost): New function.
> >	(rs6000_add_stmt_cost): Call adjust_vectorization_cost and update
> >	stmt_cost.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 2995348..5dad3cc 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -5016,6 +5016,29 @@  rs6000_init_cost (struct loop *loop_info)
   return data;
 }
 
+/* Adjust vectorization cost after calling rs6000_builtin_vectorization_cost.
+   For some statement, we would like to further fine-grain tweak the cost on
+   top of rs6000_builtin_vectorization_cost handling which doesn't have any
+   information on statement operation codes etc.  One typical case here is
+   COND_EXPR, it takes the same cost to simple FXU instruction when evaluating
+   for scalar cost, but it should be priced more whatever transformed to either
+   compare + branch or compare + isel instructions.  */
+
+static unsigned
+adjust_vectorization_cost (enum vect_cost_for_stmt kind,
+			   struct _stmt_vec_info *stmt_info)
+{
+  if (kind == scalar_stmt && stmt_info && stmt_info->stmt
+      && gimple_code (stmt_info->stmt) == GIMPLE_ASSIGN)
+    {
+      tree_code subcode = gimple_assign_rhs_code (stmt_info->stmt);
+      if (subcode == COND_EXPR)
+	return 2;
+    }
+
+  return 0;
+}
+
 /* Implement targetm.vectorize.add_stmt_cost.  */
 
 static unsigned
@@ -5031,6 +5054,7 @@  rs6000_add_stmt_cost (void *data, int count, enum vect_cost_for_stmt kind,
       tree vectype = stmt_info ? stmt_vectype (stmt_info) : NULL_TREE;
       int stmt_cost = rs6000_builtin_vectorization_cost (kind, vectype,
 							 misalign);
+      stmt_cost += adjust_vectorization_cost (kind, stmt_info);
       /* Statements in an inner loop relative to the loop being
 	 vectorized are weighted more heavily.  The value here is
 	 arbitrary and could potentially be improved with analysis.  */