[PR52252] Vectorization for load/store groups of size 3.

The cost model patch has been committed.
I've split the rest of the patch, which handles load/store groups, into
two parts: one for load groups and one for store groups.
The first part, for load groups, is below.
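
For illustration only, here is a minimal sketch of the kind of loop that
forms a load group of size 3. It is not taken from the patch or from
pr52252-ld.c; the function name rgb_to_planes and its parameters are
made up for this example.

```c
#include <stddef.h>

/* Hypothetical example, not part of the patch.
   Each iteration reads three adjacent bytes (r, g, b) from the same
   array, so the three loads form an interleaved group of size 3.  */
void
rgb_to_planes (const unsigned char *restrict rgb,
               unsigned char *restrict r,
               unsigned char *restrict g,
               unsigned char *restrict b,
               size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      r[i] = rgb[3 * i];      /* group member 0 */
      g[i] = rgb[3 * i + 1];  /* group member 1 */
      b[i] = rgb[3 * i + 2];  /* group member 2 */
    }
}
```

To vectorize such a loop, the vectorizer has to load three consecutive
vectors and permute them into one vector per group member.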

Bootstrap and make check passed on x86.

Is it ok?

ChangeLog:

2014-05-06  Evgeny Stupachenko  <evstupac@gmail.com>

        * tree-vect-data-refs.c (vect_grouped_load_supported): New
        check for loads group of length 3.
        (vect_permute_load_chain): New permutations for loads group of
        length 3.
        * tree-vect-stmts.c (vect_model_load_cost): Change cost
        of vec_perm_shuffle for the new permutations.

ChangeLog for testsuite:

2014-05-06  Evgeny Stupachenko  <evstupac@gmail.com>

       PR tree-optimization/52252
       * gcc.dg/vect/pr52252-ld.c: Test on loads group of size 3.
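
As a rough sketch of what "permutations for loads group of length 3"
can mean, the scalar model below extracts every third element from
three consecutive vectors using two two-input permutes per output
vector. This shows one possible scheme under assumptions and is not a
copy of the code in the patch: NELT, perm2 and deinterleave3 are
hypothetical names, and the vector length of 8 is chosen only for the
example.

```c
#define NELT 8  /* assumed number of elements per vector */

/* Scalar model of a two-input permute:
   out[i] = concat (a, b)[sel[i]].  OUT must not alias A or B.  */
static void
perm2 (const int *a, const int *b, const unsigned *sel, int *out)
{
  for (unsigned i = 0; i < NELT; i++)
    out[i] = sel[i] < NELT ? a[sel[i]] : b[sel[i] - NELT];
}

/* Gather elements k, k + 3, k + 6, ... (k in 0..2) of the sequence
   v0 | v1 | v2 into OUT, using two permutes.  */
static void
deinterleave3 (const int *v0, const int *v1, const int *v2,
               unsigned k, int *out)
{
  unsigned lo[NELT], hi[NELT];
  int tmp[NELT];
  unsigned j = 0;

  for (unsigned i = 0; i < NELT; i++)
    if (3 * i + k < 2 * NELT)
      {
        lo[i] = 3 * i + k;  /* wanted element is in v0 or v1 */
        hi[i] = i;          /* second permute keeps it in place */
      }
    else
      {
        lo[i] = 0;          /* placeholder, replaced by second permute */
        /* First matching element of v2 is at offset (NELT + k) % 3.  */
        hi[i] = NELT + (NELT + k) % 3 + 3 * j++;
      }

  perm2 (v0, v1, lo, tmp);   /* take what v0 and v1 can provide */
  perm2 (tmp, v2, hi, out);  /* fill the tail from v2 */
}
```

Calling deinterleave3 with k = 0, 1 and 2 on the same three input
vectors yields the three deinterleaved vectors, so a group of size 3
costs six permutes in this model.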

On Wed, Apr 30, 2014 at 6:31 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
> Ping.
>
> On Fri, Apr 18, 2014 at 2:05 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>> Hi,
>>
>> Merged with current master, the patch passes bootstrap and gives the
>> expected gains.
>> Patch and new tests are attached.
>>
>> ChangeLog:
>>
>> 2014-04-18  Evgeny Stupachenko  <evstupac@gmail.com>
>>
>>         * tree-vect-data-refs.c (vect_grouped_store_supported): New
>>         check for stores group of length 3.
>>         (vect_permute_store_chain): New permutations for stores group of
>>         length 3.
>>         (vect_grouped_load_supported): New check for loads group of length 3.
>>         (vect_permute_load_chain): New permutations for loads group of length 3.
>>         * tree-vect-stmts.c (vect_model_store_cost): Change cost
>>         of vec_perm_shuffle for the new permutations.
>>         (vect_model_load_cost): Ditto.
>>
>> ChangeLog for testsuite:
>>
>> 2014-04-18  Evgeny Stupachenko  <evstupac@gmail.com>
>>
>>        PR tree-optimization/52252
>>        * gcc.dg/vect/pr52252-ld.c: Test on loads group of size 3.
>>        * gcc.dg/vect/pr52252-st.c: Test on stores group of size 3.
>>
>> Evgeny
>>
>> On Thu, Mar 6, 2014 at 6:44 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>>> Missed attachment.
>>>
>>> On Thu, Mar 6, 2014 at 6:42 PM, Evgeny Stupachenko <evstupac@gmail.com> wrote:
>>>> I've separated the patch into 2: cost model tuning and load/store
>>>> groups parallelism.
>>>> SLM tuning was partially introduced in the patch:
>>>> http://gcc.gnu.org/ml/gcc-patches/2014-03/msg00226.html
>>>> The patch introducing vectorization for load/store groups of size 3 is attached.
>>>>
>>>> Is it ok for stage1?
>>>>
>>>> ChangeLog:
>>>>
>>>> 2014-03-06  Evgeny Stupachenko  <evstupac@gmail.com>
>>>>
>>>>        * tree-vect-data-refs.c (vect_grouped_store_supported): New
>>>>        check for stores group of length 3.
>>>>        (vect_permute_store_chain): New permutations for stores group of
>>>>        length 3.
>>>>        (vect_grouped_load_supported): New check for loads group of length 3.
>>>>        (vect_permute_load_chain): New permutations for loads group of length 3.
>>>>        * tree-vect-stmts.c (vect_model_store_cost): Change cost
>>>>        of vec_perm_shuffle for the new permutations.
>>>>        (vect_model_load_cost): Ditto.
>>>>
>>>>
>>>>
>>>> On Tue, Feb 11, 2014 at 7:19 PM, Richard Biener <rguenther@suse.de> wrote:
>>>>> On Tue, 11 Feb 2014, Evgeny Stupachenko wrote:
>>>>>
>>>>>> Missed patch attached in plain-text.
>>>>>>
>>>>>> I have copyright assignment on file with the FSF covering work on GCC.
>>>>>>
>>>>>> Load/store groups of length 3 are the most frequent non-power-of-2
>>>>>> case. They are used in RGB image processing (like the test case in
>>>>>> PR52252). We can certainly extend the patch to length 5 and more.
>>>>>> However, this could affect performance on some other architectures
>>>>>> and requires more testing. So length 3 is just the first step. The
>>>>>> algorithm in the patch could be modified for the general case in
>>>>>> several steps.
>>>>>>
>>>>>> I understand that the patch should wait for stage 1. However, since
>>>>>> it's ready, we can discuss it right now and make some changes (like
>>>>>> a general group size).
>>>>>
>>>>> Other than that I'd like to see a vectorizer hook querying the cost of a
>>>>> vec_perm_const expansion instead of adding vec_perm_shuffle
>>>>> (thus requires the constant shuffle mask to be passed as well
>>>>> as the vector type).  That's more useful for other uses that
>>>>> would require (arbitrary) shuffles.
>>>>>
>>>>> Didn't look at the rest of the patch yet - queued in my review
>>>>> pipeline.
>>>>>
>>>>> Thanks,
>>>>> Richard.
>>>>>
>>>>>> Thanks,
>>>>>> Evgeny
>>>>>>
>>>>>> On Tue, Feb 11, 2014 at 5:00 PM, Richard Biener <rguenther@suse.de> wrote:
>>>>>> >
>>>>>> > On Tue, 11 Feb 2014, Evgeny Stupachenko wrote:
>>>>>> >
>>>>>> > > Hi,
>>>>>> > >
>>>>>> > > The patch gives the expected 3x speedup for the test case in PR52252
>>>>>> > > (and even 6x with AVX2).
>>>>>> > > It passes make check and bootstrap on x86.
>>>>>> > > spec2000/spec2006 show no regressions or gains on x86.
>>>>>> > >
>>>>>> > > Is this patch ok?
>>>>>> >
>>>>>> > I've worked on generalizing the permutation support in the light
>>>>>> > of the availability of the generic shuffle support in the IL
>>>>>> > but hit some road-blocks in the way code-generation works for
>>>>>> > group loads with permutations (I don't remember if I posted all patches).
>>>>>> >
>>>>>> > This patch seems to touch a slightly different place, but it again
>>>>>> > special-cases a specific permutation.  Why's that?  Why can't we
>>>>>> > support groups of size 7 for example?  So - can this be generalized
>>>>>> > to support arbitrary non-power-of-two load/store groups?
>>>>>> >
>>>>>> > Other than that the patch has to wait for stage1 to open again,
>>>>>> > of course.  And it misses a testcase.
>>>>>> >
>>>>>> > Btw, do you have a copyright assignment on file with the FSF covering
>>>>>> > work on GCC?
>>>>>> >
>>>>>> > Thanks,
>>>>>> > Richard.
>>>>>> >
>>>>>> > > ChangeLog:
>>>>>> > >
>>>>>> > > 2014-02-11  Evgeny Stupachenko  <evstupac@gmail.com>
>>>>>> > >
>>>>>> > >         * target.h (vect_cost_for_stmt): Defining new cost vec_perm_shuffle.
>>>>>> > >         * tree-vect-data-refs.c (vect_grouped_store_supported): New
>>>>>> > >         check for stores group of length 3.
>>>>>> > >         (vect_permute_store_chain): New permutations for stores group of
>>>>>> > >         length 3.
>>>>>> > >         (vect_grouped_load_supported): New check for loads group of
>>>>>> > >         length 3.
>>>>>> > >         (vect_permute_load_chain): New permutations for loads group of
>>>>>> > >         length 3.
>>>>>> > >         * tree-vect-stmts.c (vect_model_store_cost): New cost
>>>>>> > >         vec_perm_shuffle for the new permutations.
>>>>>> > >         (vect_model_load_cost): Ditto.
>>>>>> > >         * config/aarch64/aarch64.c (builtin_vectorization_cost): Adding
>>>>>> > >         vec_perm_shuffle cost as equivalent of vec_perm cost.
>>>>>> > >         * config/arm/arm.c: Ditto.
>>>>>> > >         * config/rs6000/rs6000.c: Ditto.
>>>>>> > >         * config/spu/spu.c: Ditto.
>>>>>> > >         * config/i386/x86-tune.def (TARGET_SLOW_PHUFFB): Target for slow
>>>>>> > >         byte shuffle on some x86 architectures.
>>>>>> > >         * config/i386/i386.h (processor_costs): Defining pshuffb cost.
>>>>>> > >         * config/i386/i386.c (processor_costs): Adding pshuffb cost.
>>>>>> > >         (ix86_builtin_vectorization_cost): Adding cost for the new
>>>>>> > >         permutations.  Fixing cost for other permutations.
>>>>>> > >         (expand_vec_perm_even_odd_1): Avoid byte shuffles when they are
>>>>>> > >         slow (TARGET_SLOW_PHUFFB).
>>>>>> > >         (ix86_add_stmt_cost): Adding cost when STMT is WIDEN_MULTIPLY.
>>>>>> > >         Adding new shuffle cost only when byte shuffle is expected.
>>>>>> > >         Fixing cost model for Silvermont.
>>>>>> > >
>>>>>> > > Thanks,
>>>>>> > > Evgeny
>>>>>> > >
>>>>>> >
>>>>>> > --
>>>>>> > Richard Biener <rguenther@suse.de>
>>>>>> > SUSE / SUSE Labs
>>>>>> > SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
>>>>>> > GF: Jeff Hawn, Jennifer Guild, Felix Imend"orffer
>>>>>>
>>>>>
>>>>> --
>>>>> Richard Biener <rguenther@suse.de>
>>>>> SUSE / SUSE Labs
>>>>> SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
>>>>> GF: Jeff Hawn, Jennifer Guild, Felix Imend"orffer
