From patchwork Mon Oct 27 08:10:09 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Evgeny Stupachenko
X-Patchwork-Id: 403444
In-Reply-To: <20141013122332.GA9404@atrey.karlin.mff.cuni.cz>
References: <20141013122332.GA9404@atrey.karlin.mff.cuni.cz>
Date: Mon, 27 Oct 2014 11:10:09 +0300
Subject: Re: [PATCH x86] Increase PARAM_MAX_COMPLETELY_PEELED_INSNS when branch is costly
From: Evgeny Stupachenko
To: Jan Hubicka
Cc: Richard Biener, Uros Bizjak, GCC Patches

The results are the same for Silvermont.
There are no significant changes on Haswell.
So I agree with Richard, let's enable this x86 wide.

Bootstrap passed.
Make check is in progress.

Is it ok?

2014-10-25  Evgeny Stupachenko

    * config/i386/i386.c (ix86_option_override_internal): Increase
    PARAM_MAX_COMPLETELY_PEELED_INSNS.

On Mon, Oct 13, 2014 at 4:23 PM, Jan Hubicka wrote:
>> On Fri, Oct 10, 2014 at 5:40 PM, Evgeny Stupachenko wrote:
>> > Hi,
>> >
>> > The patch increases PARAM_MAX_COMPLETELY_PEELED_INSNS for CPUs with
>> > high branch cost.
>> > Bootstrap and make check are in progress.
>> > The patch boosts (up to 2.5 times improvement) several benchmarks compiled
>> > with "-Ofast" on Silvermont
>> > Spec2000:
>> > +5% gain on 173.applu
>> > +1% gain on 255.vortex
>> >
>> > Is it ok for trunk when it passes bootstrap and make check?
>>
>> This is only a 20% increase - from 100 to 120.  I would instead suggest
>> to explore doing this change unconditionally if it helps that much.
>
> Agreed, I think the value of 100 was set a decade ago by Zdenek and me completely
> artificially.  I do not recall any serious tuning of this flag.
>
> Note that I plan to update
> https://gcc.gnu.org/ml/gcc-patches/2013-11/msg02270.html to current tree so
> PARAM_MAX_COMPLETELY_PEELED_INSNS will be used at gimple level rather than tree
> changing its meaning somewhat.
>
> Perhaps I could try to find time this or next week to update the patch so we do
> not need to do the tuning twice.
>
> Honza
>
>>
>> Richard.
>>
>> > Thanks,
>> > Evgeny
>> >
>> > 2014-10-10  Evgeny Stupachenko
>> >     * config/i386/i386.c (ix86_option_override_internal): Increase
>> >     PARAM_MAX_COMPLETELY_PEELED_INSNS for CPUs with high branch cost.
>> >     * config/i386/i386.h (TARGET_HIGH_BRANCH_COST): New.
>> >     * config/i386/x86-tune.def (X86_TUNE_HIGH_BRANCH_COST): Indicates
>> >     CPUs with high branch cost.
>> >
>> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>> > index 6337aa5..5ac10eb 100644
>> > --- a/gcc/config/i386/i386.c
>> > +++ b/gcc/config/i386/i386.c
>> > @@ -4081,6 +4081,14 @@ ix86_option_override_internal (bool main_args_p,
>> >                          opts->x_param_values,
>> >                          opts_set->x_param_values);
>> >
>> > +  /* Extend full peel max insns parameter for CPUs with high branch cost.  */
>> > +  if (TARGET_HIGH_BRANCH_COST)
>> > +    maybe_set_param_value (PARAM_MAX_COMPLETELY_PEELED_INSNS,
>> > +                           120,
>> > +                           opts->x_param_values,
>> > +                           opts_set->x_param_values);
>> > +
>> > +
>> >    /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful.  */
>> >    if (opts->x_flag_prefetch_loop_arrays < 0
>> >        && HAVE_prefetch
>> > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
>> > index 2c64162..da0c57b 100644
>> > --- a/gcc/config/i386/i386.h
>> > +++ b/gcc/config/i386/i386.h
>> > @@ -415,6 +415,7 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST];
>> >  #define TARGET_INTER_UNIT_CONVERSIONS \
>> >         ix86_tune_features[X86_TUNE_INTER_UNIT_CONVERSIONS]
>> >  #define TARGET_FOUR_JUMP_LIMIT ix86_tune_features[X86_TUNE_FOUR_JUMP_LIMIT]
>> > +#define TARGET_HIGH_BRANCH_COST
>> > ix86_tune_features[X86_TUNE_HIGH_BRANCH_COST]
>> >  #define TARGET_SCHEDULE ix86_tune_features[X86_TUNE_SCHEDULE]
>> >  #define TARGET_USE_BT ix86_tune_features[X86_TUNE_USE_BT]
>> >  #define TARGET_USE_INCDEC ix86_tune_features[X86_TUNE_USE_INCDEC]
>> > diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
>> > index b6b210e..04d8bf8 100644
>> > --- a/gcc/config/i386/x86-tune.def
>> > +++ b/gcc/config/i386/x86-tune.def
>> > @@ -208,6 +208,11 @@ DEF_TUNE (X86_TUNE_FOUR_JUMP_LIMIT, "four_jump_limit",
>> >            m_PPRO | m_P4_NOCONA | m_BONNELL | m_SILVERMONT | m_INTEL |
>> >            m_ATHLON_K8 | m_AMDFAM10)
>> >
>> > +/* X86_TUNE_HIGH_BRANCH_COST: Some CPUs have higher branch cost.  This could be
>> > +   used to tune unroll, if-cvt, inline... heuristics.  */
>> > +DEF_TUNE (X86_TUNE_HIGH_BRANCH_COST, "high_branch_cost",
>> > +          m_BONNELL | m_SILVERMONT | m_INTEL)
>> > +
>> >  /*****************************************************************************/
>> >  /* Integer instruction selection tuning                                      */
>> >  /*****************************************************************************/

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 6337aa5..5ac10eb 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -4081,6 +4081,12 @@ ix86_option_override_internal (bool main_args_p,
                         opts->x_param_values,
                         opts_set->x_param_values);
 
+  /* Extend full peel max insns parameter for x86.  */
+  maybe_set_param_value (PARAM_MAX_COMPLETELY_PEELED_INSNS,
+                         120,
+                         opts->x_param_values,
+                         opts_set->x_param_values);
+
   /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful.  */
   if (opts->x_flag_prefetch_loop_arrays < 0
       && HAVE_prefetch
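
A note on what the hunk above does in practice: as far as I understand maybe_set_param_value, it installs a target default only when the user has not passed the parameter explicitly, so an explicit --param max-completely-peeled-insns=N on the command line should still win over the new x86 default of 120. The sketch below is a simplified stand-alone model of that behavior, not GCC code. All model_* names are hypothetical, and the starting value of 100 is the generic default quoted earlier in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for GCC's parameter tables.  */
enum model_param { MODEL_MAX_COMPLETELY_PEELED_INSNS, MODEL_PARAM_COUNT };

struct model_opts
{
  int values[MODEL_PARAM_COUNT];   /* effective value of each parameter */
  bool is_set[MODEL_PARAM_COUNT];  /* true if the user set it explicitly */
};

/* Record an explicit user setting, as --param name=value would.  */
void
model_user_set (struct model_opts *o, enum model_param p, int value)
{
  assert ((int) p >= 0 && (int) p < MODEL_PARAM_COUNT);
  o->values[p] = value;
  o->is_set[p] = true;
}

/* Install a target default only if the user has not set the parameter.
   This mirrors the assumed behavior of maybe_set_param_value.  */
void
model_maybe_set (struct model_opts *o, enum model_param p, int value)
{
  assert ((int) p >= 0 && (int) p < MODEL_PARAM_COUNT);
  if (!o->is_set[p])
    o->values[p] = value;
}

/* Effective peel limit after the x86 override from the patch is applied.
   USER_VALUE is ignored when USER_SET is false.  */
int
model_x86_peel_limit (bool user_set, int user_value)
{
  /* Generic default of 100, as quoted in the thread.  */
  struct model_opts o = { { 100 }, { false } };

  if (user_set)
    model_user_set (&o, MODEL_MAX_COMPLETELY_PEELED_INSNS, user_value);

  /* The x86-wide default proposed by the patch.  */
  model_maybe_set (&o, MODEL_MAX_COMPLETELY_PEELED_INSNS, 120);
  return o.values[MODEL_MAX_COMPLETELY_PEELED_INSNS];
}
```

In this model, model_x86_peel_limit (false, 0) yields the new default of 120, while model_x86_peel_limit (true, 80) keeps the user's 80, which is the behavior one would expect when comparing runs with and without an explicit --param.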