[2/3,x86] X86 Silvermont vector cost model tune

Message ID CAOvf_xxCs8B1WHUAHTx8HwC0bWw0rU8kiVqOX5tSAywAjYZ8AQ@mail.gmail.com
State New

Commit Message

Evgeny Stupachenko April 15, 2014, 4:08 p.m. UTC
2d part:

2014-04-15  Evgeny Stupachenko  <evstupac@gmail.com>

       * config/i386/x86-tune.def (TARGET_SLOW_PHUFFB): Target for slow byte
       shuffle on some x86 architectures.
       * config/i386/i386.h (TARGET_SLOW_PHUFFB): Ditto.
       * config/i386/i386.c (expand_vec_perm_even_odd_1): Avoid byte shuffles
       in architectures where they are slow (TARGET_SLOW_PHUFFB).


 /*****************************************************************************/

On Thu, Mar 6, 2014 at 12:58 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> slm_cost/intel_cost and TARGET_SLOW_PSHUFB are just preparation for the
> next vectorization patch.
> The changes in ix86_add_stmt_cost give the real performance gain on Silvermont.
> Let's move everything to stage1.
>
> On Wed, Mar 5, 2014 at 9:29 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> On Wed, Mar 5, 2014 at 5:46 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Wed, Mar 5, 2014 at 7:58 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>>>> Hi,
>>>>
>>>> The patch is for x86 Silvermont.
>>>> It improves x86 Silvermont vector cost model.
>>>> It gives +20% on facerec spec on Silvermont.
>>>> It passes make check and bootstrap on x86.
>>>>
>>>> Is this patch ok for stage1?
>>>>
>>>> ChangeLog:
>>>>
>>>> 2014-03-05  Evgeny Stupachenko  <evstupac@gmail.com>
>>>>
>>>>     * config/i386/x86-tune.def (TARGET_SLOW_PSHUFB): Target for slow byte
>>>>     shuffle on some x86 architectures.
>>>>     * config/i386/i386.h (TARGET_SLOW_PSHUFB): Ditto.
>>>>     * config/i386/i386.c (processor_costs): Fix vec_to_scalar_cost for
>>>>     Silvermont according to the latency table.
>>>>     (expand_vec_perm_even_odd_1): Avoid byte shuffles in architectures
>>>>     where they are slow (TARGET_SLOW_PSHUFB).
>>>>     (ix86_add_stmt_cost): Fix vector cost model for Silvermont.
>>>>
>>>> Thanks,
>>>> Evgeny
>>>
>>> There are 3 separate changes in this patch:
>>>
>>> 1. Update slm_cost, which doesn't have a ChangeLog entry.
>>> 2. Add TARGET_SLOW_PSHUFB.
>>> 3. Update ix86_add_stmt_cost.
>>>
>>> I suggest you break it into 3 independent patches.
>>
>> I think that slm_cost/intel_cost and TARGET_SLOW_PSHUFB changes can
>> still go into mainline at this stage since they are trivial tuning
>> changes that should not destabilize the compiler.
>>
>> The ix86_add_stmt_cost change should wait for stage 1.
>>
>> Uros.

Comments

Uros Bizjak April 16, 2014, 7:58 a.m. UTC | #1
On Tue, Apr 15, 2014 at 6:08 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> 2d part:
>
> 2014-04-15  Evgeny Stupachenko  <evstupac@gmail.com>
>
>        * config/i386/x86-tune.def (TARGET_SLOW_PHUFFB): Target for slow byte
>        shuffle on some x86 architectures.

... (X86_TUNE_SLOW_PSHUFB): New tune definition.

Typo: TARGET_SLOW_PHUFFB -> TARGET_SLOW_PSHUFB.

>        * config/i386/i386.h (TARGET_SLOW_PHUFFB): Ditto.

... : New tune flag.

>        * config/i386/i386.c (expand_vec_perm_even_odd_1): Avoid byte shuffles
>        in architectures where they are slow (TARGET_SLOW_PHUFFB).

...: Avoid byte shuffles for TARGET_SLOW_PSHUFB.

OK for mainline with the above ChangeLog modifications.

Thanks,
Uros.
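
For readers unfamiliar with the even/odd permutation this patch touches, the following is a minimal scalar sketch of how the even or odd elements of two V8HImode operands can be extracted with word interleaves only, without a byte shuffle. The type v8hi and the functions interleave_low, interleave_high and extract_even_odd are hypothetical names for illustration; they are not GCC code, and whether this matches the exact sequence in the non-pshufb branch of expand_vec_perm_even_odd_1 is an assumption.

```c
#include <stdint.h>

#define NELT 8  /* V8HImode: eight 16-bit elements */

/* Hypothetical scalar stand-in for a 128-bit vector of 16-bit words.  */
typedef struct { uint16_t e[NELT]; } v8hi;

/* Model of a low word interleave (punpcklwd-like):
   a0 b0 a1 b1 a2 b2 a3 b3.  */
v8hi interleave_low (v8hi a, v8hi b)
{
  v8hi r;
  for (int i = 0; i < NELT / 2; i++)
    {
      r.e[2 * i] = a.e[i];
      r.e[2 * i + 1] = b.e[i];
    }
  return r;
}

/* Model of a high word interleave (punpckhwd-like):
   a4 b4 a5 b5 a6 b6 a7 b7.  */
v8hi interleave_high (v8hi a, v8hi b)
{
  v8hi r;
  for (int i = 0; i < NELT / 2; i++)
    {
      r.e[2 * i] = a.e[NELT / 2 + i];
      r.e[2 * i + 1] = b.e[NELT / 2 + i];
    }
  return r;
}

/* Extract the even (odd == 0) or odd (odd != 0) elements of the
   concatenation op0:op1 using 2*log2(NELT)-1 = 5 interleaves.  */
v8hi extract_even_odd (v8hi op0, v8hi op1, int odd)
{
  v8hi t1 = interleave_high (op0, op1);  /* a4 b4 a5 b5 a6 b6 a7 b7 */
  v8hi t0 = interleave_low (op0, op1);   /* a0 b0 a1 b1 a2 b2 a3 b3 */
  v8hi t2 = interleave_high (t0, t1);    /* a2 a6 b2 b6 a3 a7 b3 b7 */
  t0 = interleave_low (t0, t1);          /* a0 a4 b0 b4 a1 a5 b1 b5 */
  return odd ? interleave_high (t0, t2)  /* a1 a3 a5 a7 b1 b3 b5 b7 */
             : interleave_low (t0, t2);  /* a0 a2 a4 a6 b0 b2 b4 b6 */
}
```

The sketch uses five interleaves where the pshufb path needs fewer instructions, so the trade-off only pays off on targets where the byte shuffle itself is slow, which is the case TARGET_SLOW_PSHUFB is meant to flag.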

Patch

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index bf4d576..0ae3cda 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -44026,7 +44026,7 @@  expand_vec_perm_even_odd_1 (struct expand_vec_perm_d *d, unsigned odd)
       gcc_unreachable ();

     case V8HImode:
-      if (TARGET_SSSE3)
+      if (TARGET_SSSE3 && !TARGET_SLOW_PSHUFB)
        return expand_vec_perm_pshufb2 (d);
       else
        {
@@ -44049,7 +44049,7 @@  expand_vec_perm_even_odd_1 (struct expand_vec_perm_d *d, unsigned odd)
       break;

     case V16QImode:
-      if (TARGET_SSSE3)
+      if (TARGET_SSSE3 && !TARGET_SLOW_PSHUFB)
        return expand_vec_perm_pshufb2 (d);
       else
        {
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 51659de..1a884d8 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -425,6 +425,8 @@  extern unsigned char ix86_tune_features[X86_TUNE_LAST];
        ix86_tune_features[X86_TUNE_USE_VECTOR_FP_CONVERTS]
 #define TARGET_USE_VECTOR_CONVERTS \
        ix86_tune_features[X86_TUNE_USE_VECTOR_CONVERTS]
+#define TARGET_SLOW_PSHUFB \
+       ix86_tune_features[X86_TUNE_SLOW_PSHUFB]
 #define TARGET_FUSE_CMP_AND_BRANCH_32 \
        ix86_tune_features[X86_TUNE_FUSE_CMP_AND_BRANCH_32]
 #define TARGET_FUSE_CMP_AND_BRANCH_64 \
diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index 8399102..9b0ff36 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -386,6 +386,10 @@  DEF_TUNE (X86_TUNE_USE_VECTOR_FP_CONVERTS, "use_vector_fp_converts",
    from integer to FP. */
 DEF_TUNE (X86_TUNE_USE_VECTOR_CONVERTS, "use_vector_converts", m_AMDFAM10)

+/* X86_TUNE_SLOW_PSHUFB: Indicates tunings with slow pshufb instruction.  */
+DEF_TUNE (X86_TUNE_SLOW_PSHUFB, "slow_pshufb",
+          m_BONNELL | m_SILVERMONT | m_INTEL)
+
 /*****************************************************************************/
 /* AVX instruction selection tuning (some of SSE flags affects AVX, too)     */
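
The three hunks above fit together as a tune-flag lookup: a per-flag processor mask, a feature array filled in for the selected tuning, and a macro that the expander tests. The following is a simplified, self-contained sketch of that pattern. All names in it (enum processor, tune_masks, set_tune_features, use_pshufb and the PROC_* and TUNE_* constants) are hypothetical, and the way the array is filled from the masks is an assumption about the mechanism, not a copy of GCC's implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical processor identifiers, standing in for the m_* masks.  */
enum processor
{
  PROC_GENERIC, PROC_AMDFAM10, PROC_BONNELL, PROC_SILVERMONT,
  PROC_INTEL, PROC_HASWELL
};

/* One bit per processor.  */
#define M(p) (1u << (p))

/* Hypothetical tune indices, standing in for X86_TUNE_*.  */
enum tune_index { TUNE_USE_VECTOR_CONVERTS, TUNE_SLOW_PSHUFB, TUNE_LAST };

/* Per-flag masks, analogous to the last argument of DEF_TUNE.  */
const unsigned tune_masks[TUNE_LAST] = {
  [TUNE_USE_VECTOR_CONVERTS] = M (PROC_AMDFAM10),
  [TUNE_SLOW_PSHUFB] = M (PROC_BONNELL) | M (PROC_SILVERMONT) | M (PROC_INTEL),
};

/* Analogous to ix86_tune_features[]: one byte per flag.  */
unsigned char tune_features[TUNE_LAST];

/* Fill the feature array for the processor being tuned for.  */
void set_tune_features (enum processor tune)
{
  for (size_t i = 0; i < TUNE_LAST; i++)
    tune_features[i] = (tune_masks[i] & M (tune)) != 0;
}

/* Analogous to the TARGET_SLOW_PSHUFB macro added in i386.h.  */
#define SLOW_PSHUFB tune_features[TUNE_SLOW_PSHUFB]

/* The condition the patch introduces: use the byte shuffle only when
   SSSE3 is available and the tuning does not mark pshufb as slow.  */
bool use_pshufb (bool have_ssse3)
{
  return have_ssse3 && !SLOW_PSHUFB;
}
```

With this model, calling set_tune_features (PROC_SILVERMONT) makes use_pshufb (true) return false, so the expander would fall through to the non-pshufb branch, while a tuning outside the mask keeps the pshufb path.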