
[OG14] Revert "[og10] vect: Add target hook to prefer gather/scatter instructions" (was: [PATCH] [og10] vect: Add target hook to prefer gather/scatter instructions)

Message ID 87o76t2goc.fsf@euler.schwinge.ddns.net
State New

Commit Message

Thomas Schwinge July 19, 2024, 9:52 p.m. UTC
Hi!

On 2021-01-13T15:48:42-0800, Julian Brown <julian@codesourcery.com> wrote:
> For AMD GCN, the instructions available for loading/storing vectors are
> always scatter/gather operations (i.e. there are separate addresses for
> each vector lane), so the current heuristic to avoid gather/scatter
> operations with too many elements in get_group_load_store_type is
> counterproductive. Avoiding such operations in that function can
> subsequently lead to a missed vectorization opportunity whereby later
> analyses in the vectorizer try to use a very wide array type which is
> not available on this target, and thus it bails out.
>
> The attached patch adds a target hook to override the "single_element_p"
> heuristic in that function, and activates it for GCN. This
> allows much better code to be generated for affected loops.
>
> Tested with offloading to AMD GCN. I will apply to the og10 branch
> shortly.

Testing current OG14 commit 735bbbfc6eaf58522c3ebb0946b66f33958ea134 for
'--target=amdgcn-amdhsa' (I've tested '-march=gfx908', '-march=gfx1100'),
this change has been identified as causing ~100 instances of execution
test PASS -> FAIL, that is, wrong-code generation.  It's possible that
we've had the same misbehavior on OG13 and earlier, too, and that nobody
ever tested for it.  And/or, at some point in time, the original patch
fell out of sync, that is, wasn't updated for relevant upstream
vectorizer changes.  Until someone gets to analyze that (and to upstream
these changes), we shall revert this commit on OG14.  Pushed to
devel/omp/gcc-14 branch commit 8678fc697046fba1014f1db6321ee670538b0881
'Revert "[og10] vect: Add target hook to prefer gather/scatter instructions"',
see attached.


List of GCC 14.1 vs OG14 regressions (... avoided by this revert commit):

'-march=gfx1100' only:

    PASS: g++.dg/vect/pr97255.cc  -std=c++14 (test for excess errors)
    [-PASS:-]{+FAIL:+} g++.dg/vect/pr97255.cc  -std=c++14 execution test
    PASS: g++.dg/vect/pr97255.cc  -std=c++17 (test for excess errors)
    [-PASS:-]{+FAIL:+} g++.dg/vect/pr97255.cc  -std=c++17 execution test
    PASS: g++.dg/vect/pr97255.cc  -std=c++20 (test for excess errors)
    [-PASS:-]{+FAIL:+} g++.dg/vect/pr97255.cc  -std=c++20 execution test
    UNSUPPORTED: g++.dg/vect/pr97255.cc  -std=c++98

    GCN Kernel Aborted

    @@ -101950,11 +101950,11 @@ PASS: gcc.dg/torture/pr52028.c   -O0  execution test
    PASS: gcc.dg/torture/pr52028.c   -O1  (test for excess errors)
    PASS: gcc.dg/torture/pr52028.c   -O1  execution test
    PASS: gcc.dg/torture/pr52028.c   -O2  (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/torture/pr52028.c   -O2  execution test
    PASS: gcc.dg/torture/pr52028.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/torture/pr52028.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
    PASS: gcc.dg/torture/pr52028.c   -O3 -g  (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/torture/pr52028.c   -O3 -g  execution test
    PASS: gcc.dg/torture/pr52028.c   -Os  (test for excess errors)
    PASS: gcc.dg/torture/pr52028.c   -Os  execution test

    GCN Kernel Aborted

    @@ -102160,11 +102160,11 @@ PASS: gcc.dg/torture/pr53366-1.c   -O0  execution test
    PASS: gcc.dg/torture/pr53366-1.c   -O1  (test for excess errors)
    PASS: gcc.dg/torture/pr53366-1.c   -O1  execution test
    PASS: gcc.dg/torture/pr53366-1.c   -O2  (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/torture/pr53366-1.c   -O2  execution test
    PASS: gcc.dg/torture/pr53366-1.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/torture/pr53366-1.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
    PASS: gcc.dg/torture/pr53366-1.c   -O3 -g  (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/torture/pr53366-1.c   -O3 -g  execution test
    PASS: gcc.dg/torture/pr53366-1.c   -Os  (test for excess errors)
    PASS: gcc.dg/torture/pr53366-1.c   -Os  execution test

    GCN Kernel Aborted

    PASS: gcc.dg/torture/pr93868.c   -O0  (test for excess errors)
    PASS: gcc.dg/torture/pr93868.c   -O0  execution test
    PASS: gcc.dg/torture/pr93868.c   -O1  (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/torture/pr93868.c   -O1  execution test
    PASS: gcc.dg/torture/pr93868.c   -O2  (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/torture/pr93868.c   -O2  execution test
    PASS: gcc.dg/torture/pr93868.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/torture/pr93868.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
    PASS: gcc.dg/torture/pr93868.c   -O3 -g  (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/torture/pr93868.c   -O3 -g  execution test
    PASS: gcc.dg/torture/pr93868.c   -Os  (test for excess errors)
    PASS: gcc.dg/torture/pr93868.c   -Os  execution test

    GCN Kernel Aborted

    PASS: gcc.target/gcn/complex.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.target/gcn/complex.c execution test

    GCN Kernel Aborted

'gcc.dg/vect/': generally, both '-march=gfx908', '-march=gfx1100':

    PASS: gcc.dg/vect/pr45752.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/pr45752.c execution test
    PASS: gcc.dg/vect/pr45752.c scan-tree-dump-times vect "gaps requires scalar epilogue loop" 0
    PASS: gcc.dg/vect/pr45752.c scan-tree-dump-times vect "vectorized 1 loops" 1
    [-PASS:-]{+FAIL:+} gcc.dg/vect/pr45752.c scan-tree-dump-times vect "vectorizing stmts using SLP" 2

    PASS: gcc.dg/vect/pr66636.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/pr66636.c execution test

    PASS: gcc.dg/vect/pr78558.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/pr78558.c execution test

    PASS: gcc.dg/vect/slp-12a.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-12a.c execution test
    PASS: gcc.dg/vect/slp-12a.c scan-tree-dump-times vect "vectorized 1 loops" 1
    PASS: gcc.dg/vect/slp-12a.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1
      '-march=gfx908' only.
    PASS: gcc.dg/vect/slp-12a.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-12a.c execution test
    PASS: gcc.dg/vect/slp-12a.c scan-tree-dump-times vect "vectorized 1 loops" 1
    [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-12a.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1
      '-march=gfx1100' only.

    PASS: gcc.dg/vect/slp-19c.c (test for excess errors)
    PASS: gcc.dg/vect/slp-19c.c execution test
    [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-19c.c scan-tree-dump-times vect "vectorized 1 loops" 1
    [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-19c.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1

    PASS: gcc.dg/vect/slp-21.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-21.c execution test
    PASS: gcc.dg/vect/slp-21.c scan-tree-dump-times vect "vectorized 4 loops" 1
    FAIL: gcc.dg/vect/slp-21.c scan-tree-dump-times vect "vectorizing stmts using SLP" 2

    PASS: gcc.dg/vect/slp-perm-12.c (test for excess errors)
    PASS: gcc.dg/vect/slp-perm-12.c execution test
    [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-perm-12.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1
      '-march=gfx908' only.
    PASS: gcc.dg/vect/slp-perm-12.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-perm-12.c execution test
    [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-perm-12.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1
      '-march=gfx1100' only.

    PASS: gcc.dg/vect/slp-perm-4.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-perm-4.c execution test
    PASS: gcc.dg/vect/slp-perm-4.c scan-tree-dump-times vect "gaps requires scalar epilogue loop" 0
    PASS: gcc.dg/vect/slp-perm-4.c scan-tree-dump-times vect "vectorized 1 loops" 1
    [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-perm-4.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1

    PASS: gcc.dg/vect/vect-avg-16.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/vect-avg-16.c execution test
    PASS: gcc.dg/vect/vect-avg-16.c scan-tree-dump vect "vect_recog_average_pattern: detected"

    PASS: gcc.dg/vect/vect-strided-a-u8-i8-gap7-big-array.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/vect-strided-a-u8-i8-gap7-big-array.c execution test
    PASS: gcc.dg/vect/vect-strided-a-u8-i8-gap7-big-array.c scan-tree-dump-times vect "vectorized 1 loops" 1

    PASS: gcc.dg/vect/vect-strided-a-u8-i8-gap7.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/vect-strided-a-u8-i8-gap7.c execution test
    PASS: gcc.dg/vect/vect-strided-a-u8-i8-gap7.c scan-tree-dump-times vect "vectorized 1 loops" 1

    PASS: gcc.dg/vect/vect-strided-u8-i8-gap4-big-array.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/vect-strided-u8-i8-gap4-big-array.c execution test
    PASS: gcc.dg/vect/vect-strided-u8-i8-gap4-big-array.c scan-tree-dump-times vect "vectorized 2 loops" 1

    PASS: gcc.dg/vect/vect-strided-u8-i8-gap4.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/vect-strided-u8-i8-gap4.c execution test
    PASS: gcc.dg/vect/vect-strided-u8-i8-gap4.c scan-tree-dump-times vect "vectorized 2 loops" 1

    PASS: gcc.dg/vect/vect-strided-u8-i8-gap7-big-array.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/vect-strided-u8-i8-gap7-big-array.c execution test
    PASS: gcc.dg/vect/vect-strided-u8-i8-gap7-big-array.c scan-tree-dump-times vect "vectorized 1 loops" 1

    PASS: gcc.dg/vect/vect-strided-u8-i8-gap7.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/vect-strided-u8-i8-gap7.c execution test
    PASS: gcc.dg/vect/vect-strided-u8-i8-gap7.c scan-tree-dump-times vect "vectorized 1 loops" 1

'gcc.dg/vect/': not '-march=gfx908'; '-march=gfx1100' only:

    PASS: gcc.dg/vect/no-scevccp-outer-18.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/no-scevccp-outer-18.c execution test
    PASS: gcc.dg/vect/no-scevccp-outer-18.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1

    PASS: gcc.dg/vect/pr101445.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/pr101445.c execution test

    PASS: gcc.dg/vect/pr37027.c (test for excess errors)
    PASS: gcc.dg/vect/pr37027.c scan-tree-dump-times vect "VEC_PERM_EXPR" 0
    PASS: gcc.dg/vect/pr37027.c scan-tree-dump-times vect "vectorized 1 loops" 1
    [-PASS:-]{+FAIL:+} gcc.dg/vect/pr37027.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1

    PASS: gcc.dg/vect/pr37539.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/pr37539.c execution test
    PASS: gcc.dg/vect/pr37539.c scan-tree-dump-times vect "vectorized 1 loops" 2

    PASS: gcc.dg/vect/pr56826.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/pr56826.c execution test

    PASS: gcc.dg/vect/pr59354.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/pr59354.c execution test
    PASS: gcc.dg/vect/pr59354.c scan-tree-dump vect "vectorized 1 loop"

    PASS: gcc.dg/vect/pr61680.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/pr61680.c execution test

    PASS: gcc.dg/vect/pr64252.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/pr64252.c execution test

    PASS: gcc.dg/vect/pr66253.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/pr66253.c execution test
    PASS: gcc.dg/vect/pr66253.c scan-tree-dump-times vect "vectorized 1 loops" 1

    PASS: gcc.dg/vect/pr67790.c (test for excess errors)
    PASS: gcc.dg/vect/pr67790.c execution test
    [-PASS:-]{+FAIL:+} gcc.dg/vect/pr67790.c scan-tree-dump vect "vectorizing stmts using SLP"
    PASS: gcc.dg/vect/pr67790.c scan-tree-dump-times vect "VEC_PERM_EXPR" 0

    PASS: gcc.dg/vect/pr68445.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/pr68445.c scan-tree-dump vect "vectorizing stmts using SLP"

    PASS: gcc.dg/vect/pr71259.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/pr71259.c execution test

    PASS: gcc.dg/vect/pr81410.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/pr81410.c execution test
    PASS: gcc.dg/vect/pr81410.c scan-tree-dump vect "vectorized 1 loops"

    PASS: gcc.dg/vect/pr82108.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/pr82108.c execution test
    PASS: gcc.dg/vect/pr82108.c scan-tree-dump-times vect "vectorized 1 loops" 1

    PASS: gcc.dg/vect/pr87288-1.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/pr87288-1.c execution test
    PASS: gcc.dg/vect/pr87288-1.c scan-tree-dump-times vect "LOOP VECTORIZED" 1

    PASS: gcc.dg/vect/pr87288-2.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/pr87288-2.c execution test
    PASS: gcc.dg/vect/pr87288-2.c scan-tree-dump vect "LOOP VECTORIZED"

    PASS: gcc.dg/vect/pr87288-3.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/pr87288-3.c execution test
    PASS: gcc.dg/vect/pr87288-3.c scan-tree-dump vect "LOOP VECTORIZED"

    PASS: gcc.dg/vect/pr92420.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/pr92420.c execution test

    PASS: gcc.dg/vect/pr96783-2.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/pr96783-2.c execution test

    PASS: gcc.dg/vect/pr97832-1.c (test for excess errors)
    PASS: gcc.dg/vect/pr97832-1.c scan-tree-dump vect "Loop contains only SLP stmts"
    [-PASS:-]{+FAIL:+} gcc.dg/vect/pr97832-1.c scan-tree-dump vect "vectorizing stmts using SLP"

    PASS: gcc.dg/vect/pr97832-2.c (test for excess errors)
    PASS: gcc.dg/vect/pr97832-2.c scan-tree-dump vect "Loop contains only SLP stmts"
    [-PASS:-]{+FAIL:+} gcc.dg/vect/pr97832-2.c scan-tree-dump vect "vectorizing stmts using SLP"

    PASS: gcc.dg/vect/pr97832-3.c (test for excess errors)
    PASS: gcc.dg/vect/pr97832-3.c scan-tree-dump vect "Loop contains only SLP stmts"
    [-PASS:-]{+FAIL:+} gcc.dg/vect/pr97832-3.c scan-tree-dump vect "vectorizing stmts using SLP"

    PASS: gcc.dg/vect/pr97832-4.c (test for excess errors)
    PASS: gcc.dg/vect/pr97832-4.c scan-tree-dump vect "Loop contains only SLP stmts"
    [-PASS:-]{+FAIL:+} gcc.dg/vect/pr97832-4.c scan-tree-dump vect "vectorizing stmts using SLP"

    PASS: gcc.dg/vect/slp-11a.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-11a.c execution test
    PASS: gcc.dg/vect/slp-11a.c scan-tree-dump-times vect "vectorized 1 loops" 1
    PASS: gcc.dg/vect/slp-11a.c scan-tree-dump-times vect "vectorizing stmts using SLP" 0

    PASS: gcc.dg/vect/slp-11b.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-11b.c execution test
    PASS: gcc.dg/vect/slp-11b.c scan-tree-dump-times vect "vectorized 1 loops" 1
    [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-11b.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1

    PASS: gcc.dg/vect/slp-11c.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-11c.c execution test
    PASS: gcc.dg/vect/slp-11c.c scan-tree-dump-times vect "vectorized 1 loops" 1
    PASS: gcc.dg/vect/slp-11c.c scan-tree-dump-times vect "vectorizing stmts using SLP" 0

    PASS: gcc.dg/vect/slp-23.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-23.c execution test
    PASS: gcc.dg/vect/slp-23.c scan-tree-dump-times vect "vectorized 2 loops" 1
    [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-23.c scan-tree-dump-times vect "vectorizing stmts using SLP" 2

    PASS: gcc.dg/vect/slp-42.c (test for excess errors)
    PASS: gcc.dg/vect/slp-42.c scan-tree-dump vect "vectorized 1 loops"
    [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-42.c scan-tree-dump vect "vectorizing stmts using SLP"

    PASS: gcc.dg/vect/slp-46.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-46.c execution test
    FAIL: gcc.dg/vect/slp-46.c scan-tree-dump-times vect "vectorizing stmts using SLP" 4

    PASS: gcc.dg/vect/slp-47.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-47.c execution test
    [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-47.c scan-tree-dump-times vect "vectorizing stmts using SLP" 2

    PASS: gcc.dg/vect/slp-48.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-48.c execution test
    [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-48.c scan-tree-dump-times vect "vectorizing stmts using SLP" 2

    PASS: gcc.dg/vect/slp-perm-1.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-perm-1.c execution test
    PASS: gcc.dg/vect/slp-perm-1.c scan-tree-dump-times vect "vectorized 1 loops" 1

    PASS: gcc.dg/vect/slp-perm-10.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-perm-10.c execution test
    PASS: gcc.dg/vect/slp-perm-10.c scan-tree-dump-times vect "vectorized 1 loops" 1
    [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-perm-10.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1

    PASS: gcc.dg/vect/slp-perm-11.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-perm-11.c execution test
    [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-perm-11.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1

    PASS: gcc.dg/vect/slp-perm-2.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-perm-2.c execution test
    PASS: gcc.dg/vect/slp-perm-2.c scan-tree-dump-times vect "vectorized 1 loops" 1
    [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-perm-2.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1

    PASS: gcc.dg/vect/slp-perm-3.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-perm-3.c execution test
    PASS: gcc.dg/vect/slp-perm-3.c scan-tree-dump-times vect "vectorized 1 loops" 1
    [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-perm-3.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1

    PASS: gcc.dg/vect/slp-perm-5.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-perm-5.c execution test
    PASS: gcc.dg/vect/slp-perm-5.c scan-tree-dump-times vect "vectorized 1 loops" 1

    PASS: gcc.dg/vect/slp-perm-7.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-perm-7.c execution test
    PASS: gcc.dg/vect/slp-perm-7.c scan-tree-dump-times vect "vectorized 1 loops" 1

    PASS: gcc.dg/vect/slp-perm-8.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-perm-8.c execution test
    PASS: gcc.dg/vect/slp-perm-8.c scan-tree-dump-times vect "vectorized 1 loops" 1

    PASS: gcc.dg/vect/slp-perm-9.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-perm-9.c execution test
    PASS: gcc.dg/vect/slp-perm-9.c scan-tree-dump-not vect "permutation requires at least three vectors"
    PASS: gcc.dg/vect/slp-perm-9.c scan-tree-dump-times vect "vectorized 1 loops" 1
    [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-perm-9.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1

    PASS: gcc.dg/vect/slp-reduc-1.c (test for excess errors)
    PASS: gcc.dg/vect/slp-reduc-1.c execution test
    PASS: gcc.dg/vect/slp-reduc-1.c scan-tree-dump-times vect "VEC_PERM_EXPR" 0
    PASS: gcc.dg/vect/slp-reduc-1.c scan-tree-dump-times vect "vectorized 1 loops" 1
    [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-reduc-1.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1

    @@ -119631,23 +119631,23 @@ PASS: gcc.dg/vect/slp-reduc-2.c (test for excess errors)
    PASS: gcc.dg/vect/slp-reduc-2.c execution test
    PASS: gcc.dg/vect/slp-reduc-2.c scan-tree-dump-times vect "VEC_PERM_EXPR" 0
    PASS: gcc.dg/vect/slp-reduc-2.c scan-tree-dump-times vect "vectorized 1 loops" 1
    [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-reduc-2.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1

    PASS: gcc.dg/vect/slp-reduc-3.c (test for excess errors)
    PASS: gcc.dg/vect/slp-reduc-3.c execution test
    PASS: gcc.dg/vect/slp-reduc-3.c scan-tree-dump-times vect "VEC_PERM_EXPR" 0
    XFAIL: gcc.dg/vect/slp-reduc-3.c scan-tree-dump-times vect "vect_recog_dot_prod_pattern: detected" 1
    PASS: gcc.dg/vect/slp-reduc-3.c scan-tree-dump-times vect "vectorized 1 loops" 1
    [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-reduc-3.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1

    PASS: gcc.dg/vect/slp-reduc-4.c (test for excess errors)
    PASS: gcc.dg/vect/slp-reduc-4.c execution test
    [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-reduc-4.c scan-tree-dump vect "vectorizing stmts using SLP"
    PASS: gcc.dg/vect/slp-reduc-4.c scan-tree-dump-times vect "VEC_PERM_EXPR" 0
    PASS: gcc.dg/vect/slp-reduc-4.c scan-tree-dump-times vect "vectorized 1 loops" 1

    PASS: gcc.dg/vect/slp-reduc-5.c (test for excess errors)
    PASS: gcc.dg/vect/slp-reduc-5.c execution test
    PASS: gcc.dg/vect/slp-reduc-5.c scan-tree-dump-times vect "VEC_PERM_EXPR" 0
    PASS: gcc.dg/vect/slp-reduc-5.c scan-tree-dump-times vect "vectorized 1 loops" 2
    [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-reduc-5.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1

    @@ -119657,7 +119657,7 @@ PASS: gcc.dg/vect/slp-reduc-7.c (test for excess errors)
    PASS: gcc.dg/vect/slp-reduc-7.c execution test
    PASS: gcc.dg/vect/slp-reduc-7.c scan-tree-dump-times vect "VEC_PERM_EXPR" 0
    PASS: gcc.dg/vect/slp-reduc-7.c scan-tree-dump-times vect "vectorized 1 loops" 1
    [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-reduc-7.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1

    PASS: gcc.dg/vect/tsvc/vect-tsvc-s127.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/tsvc/vect-tsvc-s127.c execution test
    PASS: gcc.dg/vect/tsvc/vect-tsvc-s127.c scan-tree-dump-times vect "vectorized 1 loops" 1

    PASS: gcc.dg/vect/vect-119.c (test for excess errors)
    PASS: gcc.dg/vect/vect-119.c scan-tree-dump-not optimized "Invalid sum"
    [-FAIL:-]{+PASS:+} gcc.dg/vect/vect-119.c scan-tree-dump-times vect "Detected interleaving load of size 2" 1

    PASS: gcc.dg/vect/vect-cselim-1.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/vect-cselim-1.c execution test
    PASS: gcc.dg/vect/vect-cselim-1.c scan-tree-dump-times vect "vectorized 2 loops" 1

    PASS: gcc.dg/vect/vect-fmax-3.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/vect-fmax-3.c execution test
    PASS: gcc.dg/vect/vect-fmax-3.c scan-tree-dump vect "Detected reduction"

    PASS: gcc.dg/vect/vect-pr114375.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/vect-pr114375.c execution test

    PASS: gcc.dg/vect/vect-strided-a-mult.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/vect-strided-a-mult.c execution test
    PASS: gcc.dg/vect/vect-strided-a-mult.c scan-tree-dump-times vect "vectorized 1 loops" 1

    PASS: gcc.dg/vect/vect-strided-a-u16-i2.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/vect-strided-a-u16-i2.c execution test
    PASS: gcc.dg/vect/vect-strided-a-u16-i2.c scan-tree-dump-times vect "vectorized 1 loops" 1

    PASS: gcc.dg/vect/vect-strided-a-u16-i4.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/vect-strided-a-u16-i4.c execution test
    PASS: gcc.dg/vect/vect-strided-a-u16-i4.c scan-tree-dump-times vect "vectorized 1 loops" 1

    PASS: gcc.dg/vect/vect-strided-a-u16-mult.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/vect-strided-a-u16-mult.c execution test
    PASS: gcc.dg/vect/vect-strided-a-u16-mult.c scan-tree-dump-times vect "vectorized 1 loops" 1

    PASS: gcc.dg/vect/vect-strided-a-u32-mult.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/vect-strided-a-u32-mult.c execution test
    PASS: gcc.dg/vect/vect-strided-a-u32-mult.c scan-tree-dump-times vect "vectorized 1 loops" 1

    PASS: gcc.dg/vect/vect-strided-a-u8-i2-gap.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/vect-strided-a-u8-i2-gap.c execution test
    PASS: gcc.dg/vect/vect-strided-a-u8-i2-gap.c scan-tree-dump-times vect "vectorized 2 loops" 1

    PASS: gcc.dg/vect/vect-strided-a-u8-i8-gap2-big-array.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/vect-strided-a-u8-i8-gap2-big-array.c execution test
    PASS: gcc.dg/vect/vect-strided-a-u8-i8-gap2-big-array.c scan-tree-dump-times vect "vectorized 1 loops" 1

    PASS: gcc.dg/vect/vect-strided-a-u8-i8-gap2.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/vect-strided-a-u8-i8-gap2.c execution test
    PASS: gcc.dg/vect/vect-strided-a-u8-i8-gap2.c scan-tree-dump-times vect "vectorized 1 loops" 1

    PASS: gcc.dg/vect/vect-strided-float.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/vect-strided-float.c execution test
    XFAIL: gcc.dg/vect/vect-strided-float.c scan-tree-dump-times vect "vectorized 0 loops" 1

    PASS: gcc.dg/vect/vect-strided-mult-char-ls.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/vect-strided-mult-char-ls.c execution test
    PASS: gcc.dg/vect/vect-strided-mult-char-ls.c scan-tree-dump-times vect "vectorized 1 loops" 1

    PASS: gcc.dg/vect/vect-strided-mult.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/vect-strided-mult.c execution test
    PASS: gcc.dg/vect/vect-strided-mult.c scan-tree-dump-times vect "vectorized 1 loops" 1

    PASS: gcc.dg/vect/vect-strided-same-dr.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/vect-strided-same-dr.c execution test
    PASS: gcc.dg/vect/vect-strided-same-dr.c scan-tree-dump-times vect "vectorized 1 loops" 1

    PASS: gcc.dg/vect/vect-strided-store-a-u8-i2.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/vect-strided-store-a-u8-i2.c execution test
    PASS: gcc.dg/vect/vect-strided-store-a-u8-i2.c scan-tree-dump-times vect "vectorized 1 loops" 1

    PASS: gcc.dg/vect/vect-strided-store-u16-i4.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/vect-strided-store-u16-i4.c execution test
    PASS: gcc.dg/vect/vect-strided-store-u16-i4.c scan-tree-dump-times vect "vectorized 1 loops" 2

    PASS: gcc.dg/vect/vect-strided-store-u32-i2.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/vect-strided-store-u32-i2.c execution test
    XFAIL: gcc.dg/vect/vect-strided-store-u32-i2.c scan-tree-dump-times vect "vectorized 0 loops" 1
    PASS: gcc.dg/vect/vect-strided-store-u32-i2.c scan-tree-dump-times vect "vectorized 1 loops" 1

    PASS: gcc.dg/vect/vect-strided-u16-i2.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/vect-strided-u16-i2.c execution test
    PASS: gcc.dg/vect/vect-strided-u16-i2.c scan-tree-dump-times vect "vectorized 1 loops" 1

    PASS: gcc.dg/vect/vect-strided-u16-i3.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/vect-strided-u16-i3.c execution test
    PASS: gcc.dg/vect/vect-strided-u16-i3.c scan-tree-dump-times vect "vectorized 4 loops" 1

    PASS: gcc.dg/vect/vect-strided-u16-i4.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/vect-strided-u16-i4.c execution test
    PASS: gcc.dg/vect/vect-strided-u16-i4.c scan-tree-dump-times vect "vectorized 1 loops" 1

    PASS: gcc.dg/vect/vect-strided-u32-i4.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/vect-strided-u32-i4.c execution test
    PASS: gcc.dg/vect/vect-strided-u32-i4.c scan-tree-dump-times vect "vectorized 1 loops" 1

    PASS: gcc.dg/vect/vect-strided-u32-i8.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/vect-strided-u32-i8.c execution test
    PASS: gcc.dg/vect/vect-strided-u32-i8.c scan-tree-dump-times vect "vectorized 1 loops" 1

    PASS: gcc.dg/vect/vect-strided-u32-mult.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/vect-strided-u32-mult.c execution test
    PASS: gcc.dg/vect/vect-strided-u32-mult.c scan-tree-dump-times vect "vectorized 1 loops" 1

    PASS: gcc.dg/vect/vect-strided-u8-i2-gap.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/vect-strided-u8-i2-gap.c execution test
    PASS: gcc.dg/vect/vect-strided-u8-i2-gap.c scan-tree-dump-times vect "vectorized 2 loops" 1

    PASS: gcc.dg/vect/vect-strided-u8-i2.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/vect-strided-u8-i2.c execution test
    PASS: gcc.dg/vect/vect-strided-u8-i2.c scan-tree-dump-times vect "vectorized 1 loops" 1

    PASS: gcc.dg/vect/vect-strided-u8-i8-gap2-big-array.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/vect-strided-u8-i8-gap2-big-array.c execution test
    PASS: gcc.dg/vect/vect-strided-u8-i8-gap2-big-array.c scan-tree-dump-times vect "vectorized 1 loops" 1

    PASS: gcc.dg/vect/vect-strided-u8-i8-gap2.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/vect-strided-u8-i8-gap2.c execution test
    PASS: gcc.dg/vect/vect-strided-u8-i8-gap2.c scan-tree-dump-times vect "vectorized 1 loops" 1

    PASS: gcc.dg/vect/vect-strided-u8-i8-gap4-unknown.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/vect-strided-u8-i8-gap4-unknown.c execution test
    PASS: gcc.dg/vect/vect-strided-u8-i8-gap4-unknown.c scan-tree-dump-times vect "vectorized 1 loops" 1

    PASS: gcc.dg/vect/vect-strided-u8-i8-gap4.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/vect-strided-u8-i8-gap4.c execution test
    PASS: gcc.dg/vect/vect-strided-u8-i8-gap4.c scan-tree-dump-times vect "vectorized 2 loops" 1

    PASS: gcc.dg/vect/vect-strided-u8-i8.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/vect-strided-u8-i8.c execution test
    PASS: gcc.dg/vect/vect-strided-u8-i8.c scan-tree-dump-times vect "vectorized 1 loops" 1

    PASS: gcc.dg/vect/vect-vfa-03.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/vect/vect-vfa-03.c execution test
    XFAIL: gcc.dg/vect/vect-vfa-03.c scan-tree-dump-times vect "vectorized 1 loops" 0
    PASS: gcc.dg/vect/vect-vfa-03.c scan-tree-dump-times vect "vectorized 1 loops" 1
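
The status changes listed above appear to use the '[-old-]{+new+}'
word-diff notation (as produced by, for example, 'git diff --word-diff').
For anyone who wants to re-count them, here is a minimal sketch of a
tally script.  The function names 'tally' and 'execution_regressions' are
made up for illustration and are not part of any GCC tooling; the script
assumes one test result per line, in the same shape as the lists above.

```python
import re
from collections import Counter

# Matches a word-diff status change such as:
#   [-PASS:-]{+FAIL:+} gcc.dg/vect/pr66636.c execution test
# Group 1 is the old status, group 2 the new status, group 3 the test name.
CHANGE = re.compile(r"\[-(\w+):-\]\{\+(\w+):\+\}\s+(.*)")


def tally(lines):
    """Count (old status, new status) transitions in word-diff lines."""
    counts = Counter()
    for line in lines:
        match = CHANGE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


def execution_regressions(lines):
    """Return names of 'execution test' entries that went PASS -> FAIL."""
    names = []
    for line in lines:
        match = CHANGE.search(line)
        if not match:
            continue
        old, new, name = match.group(1), match.group(2), match.group(3).strip()
        if (old, new) == ("PASS", "FAIL") and name.endswith("execution test"):
            names.append(name)
    return names


if __name__ == "__main__":
    import sys

    # Read the pasted test-result lists from standard input.
    text = sys.stdin.read().splitlines()
    for (old, new), count in sorted(tally(text).items()):
        print(f"{old} -> {new}: {count}")
    print(f"execution test PASS -> FAIL: {len(execution_regressions(text))}")
```

Feeding the lists from this email into the script on standard input
prints one line per kind of transition plus the number of execution
tests that regressed.  Unchanged 'PASS:' lines and '@@' hunk headers
contain no word-diff markers and are skipped.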

Miscellaneous GCN target Fortran regressions: generally, both
'-march=gfx908', '-march=gfx1100':

    @@ -811,9 +811,9 @@ PASS: gfortran.dg/c-interop/fc-descriptor-7.f90   -O1  execution test
    PASS: gfortran.dg/c-interop/fc-descriptor-7.f90   -O2  (test for excess errors)
    PASS: gfortran.dg/c-interop/fc-descriptor-7.f90   -O2  execution test
    PASS: gfortran.dg/c-interop/fc-descriptor-7.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
    [-PASS:-]{+FAIL:+} gfortran.dg/c-interop/fc-descriptor-7.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
    PASS: gfortran.dg/c-interop/fc-descriptor-7.f90   -O3 -g  (test for excess errors)
    [-PASS:-]{+FAIL:+} gfortran.dg/c-interop/fc-descriptor-7.f90   -O3 -g  execution test
    PASS: gfortran.dg/c-interop/fc-descriptor-7.f90   -Os  (test for excess errors)
    PASS: gfortran.dg/c-interop/fc-descriptor-7.f90   -Os  execution test

    @@ -1013,9 +1013,9 @@ PASS: gfortran.dg/c-interop/ff-descriptor-7.f90   -O1  execution test
    PASS: gfortran.dg/c-interop/ff-descriptor-7.f90   -O2  (test for excess errors)
    PASS: gfortran.dg/c-interop/ff-descriptor-7.f90   -O2  execution test
    PASS: gfortran.dg/c-interop/ff-descriptor-7.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
    [-PASS:-]{+FAIL:+} gfortran.dg/c-interop/ff-descriptor-7.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
    PASS: gfortran.dg/c-interop/ff-descriptor-7.f90   -O3 -g  (test for excess errors)
    [-PASS:-]{+FAIL:+} gfortran.dg/c-interop/ff-descriptor-7.f90   -O3 -g  execution test
    PASS: gfortran.dg/c-interop/ff-descriptor-7.f90   -Os  (test for excess errors)
    PASS: gfortran.dg/c-interop/ff-descriptor-7.f90   -Os  execution test

    @@ -26750,9 +26751,9 @@ PASS: gfortran.dg/finalize_15.f90   -O1  execution test
    PASS: gfortran.dg/finalize_15.f90   -O2  (test for excess errors)
    PASS: gfortran.dg/finalize_15.f90   -O2  execution test
    PASS: gfortran.dg/finalize_15.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
    [-PASS:-]{+FAIL:+} gfortran.dg/finalize_15.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
    PASS: gfortran.dg/finalize_15.f90   -O3 -g  (test for excess errors)
    [-PASS:-]{+FAIL:+} gfortran.dg/finalize_15.f90   -O3 -g  execution test
    PASS: gfortran.dg/finalize_15.f90   -Os  (test for excess errors)
    PASS: gfortran.dg/finalize_15.f90   -Os  execution test

    PASS: gfortran.dg/inline_matmul_10.f90   -O  (test for excess errors)
    [-PASS:-]{+FAIL:+} gfortran.dg/inline_matmul_10.f90   -O  execution test

    @@ -30726,17 +30729,17 @@ PASS: gfortran.dg/inline_matmul_24.f90   -O2  (test for excess errors)
    PASS: gfortran.dg/inline_matmul_24.f90   -O2  execution test
    PASS: gfortran.dg/inline_matmul_24.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions   scan-tree-dump-times original "gamma5\\[__var_1_do \\* 4 \\+ __var_2_do\\]|gamma5\\[NON_LVALUE_EXPR <__var_1_do> \\* 4 \\+ NON_LVALUE_EXPR <__var_2_do>\\]" 1
    PASS: gfortran.dg/inline_matmul_24.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
    [-PASS:-]{+FAIL:+} gfortran.dg/inline_matmul_24.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
    PASS: gfortran.dg/inline_matmul_24.f90   -O3 -g   scan-tree-dump-times original "gamma5\\[__var_1_do \\* 4 \\+ __var_2_do\\]|gamma5\\[NON_LVALUE_EXPR <__var_1_do> \\* 4 \\+ NON_LVALUE_EXPR <__var_2_do>\\]" 1
    PASS: gfortran.dg/inline_matmul_24.f90   -O3 -g  (test for excess errors)
    [-PASS:-]{+FAIL:+} gfortran.dg/inline_matmul_24.f90   -O3 -g  execution test
    PASS: gfortran.dg/inline_matmul_24.f90   -Os   scan-tree-dump-times original "gamma5\\[__var_1_do \\* 4 \\+ __var_2_do\\]|gamma5\\[NON_LVALUE_EXPR <__var_1_do> \\* 4 \\+ NON_LVALUE_EXPR <__var_2_do>\\]" 1
    PASS: gfortran.dg/inline_matmul_24.f90   -Os  (test for excess errors)
    PASS: gfortran.dg/inline_matmul_24.f90   -Os  execution test

    PASS: gfortran.dg/inline_matmul_3.f90   -O   scan-tree-dump-times optimized "_gfortran_matmul" 8
    PASS: gfortran.dg/inline_matmul_3.f90   -O  (test for excess errors)
    [-PASS:-]{+FAIL:+} gfortran.dg/inline_matmul_3.f90   -O  execution test

    @@ -30877,7 +30880,7 @@ PASS: gfortran.dg/inline_transpose_1.f90   -O0   scan-tree-dump-times original "
    PASS: gfortran.dg/inline_transpose_1.f90   -O0   scan-tree-dump-times original "_gfortran_transpose" 0
    PASS: gfortran.dg/inline_transpose_1.f90   -O0   scan-tree-dump-times original "struct[^\\n]*atmp" 24
    PASS: gfortran.dg/inline_transpose_1.f90   -O0  (test for excess errors)
    [-PASS:-]{+FAIL:+} gfortran.dg/inline_transpose_1.f90   -O0  execution test
    PASS: gfortran.dg/inline_transpose_1.f90   -O1   (test for warnings, line 112)
    PASS: gfortran.dg/inline_transpose_1.f90   -O1   (test for warnings, line 120)
    PASS: gfortran.dg/inline_transpose_1.f90   -O1   (test for warnings, line 144)
    @@ -30903,7 +30906,7 @@ PASS: gfortran.dg/inline_transpose_1.f90   -O1   scan-tree-dump-times original "
    PASS: gfortran.dg/inline_transpose_1.f90   -O1   scan-tree-dump-times original "_gfortran_transpose" 0
    PASS: gfortran.dg/inline_transpose_1.f90   -O1   scan-tree-dump-times original "struct[^\\n]*atmp" 24
    PASS: gfortran.dg/inline_transpose_1.f90   -O1  (test for excess errors)
    [-PASS:-]{+FAIL:+} gfortran.dg/inline_transpose_1.f90   -O1  execution test
    PASS: gfortran.dg/inline_transpose_1.f90   -O2   (test for warnings, line 112)
    PASS: gfortran.dg/inline_transpose_1.f90   -O2   (test for warnings, line 120)
    PASS: gfortran.dg/inline_transpose_1.f90   -O2   (test for warnings, line 144)
    @@ -30929,7 +30932,7 @@ PASS: gfortran.dg/inline_transpose_1.f90   -O2   scan-tree-dump-times original "
    PASS: gfortran.dg/inline_transpose_1.f90   -O2   scan-tree-dump-times original "_gfortran_transpose" 0
    PASS: gfortran.dg/inline_transpose_1.f90   -O2   scan-tree-dump-times original "struct[^\\n]*atmp" 24
    PASS: gfortran.dg/inline_transpose_1.f90   -O2  (test for excess errors)
    [-PASS:-]{+FAIL:+} gfortran.dg/inline_transpose_1.f90   -O2  execution test
    PASS: gfortran.dg/inline_transpose_1.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions   (test for warnings, line 112)
    PASS: gfortran.dg/inline_transpose_1.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions   (test for warnings, line 120)
    PASS: gfortran.dg/inline_transpose_1.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions   (test for warnings, line 144)
    @@ -30955,7 +30958,7 @@ PASS: gfortran.dg/inline_transpose_1.f90   -O3 -fomit-frame-pointer -funroll-loo
    PASS: gfortran.dg/inline_transpose_1.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions   scan-tree-dump-times original "_gfortran_transpose" 0
    PASS: gfortran.dg/inline_transpose_1.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions   scan-tree-dump-times original "struct[^\\n]*atmp" 24
    PASS: gfortran.dg/inline_transpose_1.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
    [-PASS:-]{+FAIL:+} gfortran.dg/inline_transpose_1.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
    PASS: gfortran.dg/inline_transpose_1.f90   -O3 -g   (test for warnings, line 112)
    PASS: gfortran.dg/inline_transpose_1.f90   -O3 -g   (test for warnings, line 120)
    PASS: gfortran.dg/inline_transpose_1.f90   -O3 -g   (test for warnings, line 144)
    @@ -30981,7 +30984,7 @@ PASS: gfortran.dg/inline_transpose_1.f90   -O3 -g   scan-tree-dump-times origina
    PASS: gfortran.dg/inline_transpose_1.f90   -O3 -g   scan-tree-dump-times original "_gfortran_transpose" 0
    PASS: gfortran.dg/inline_transpose_1.f90   -O3 -g   scan-tree-dump-times original "struct[^\\n]*atmp" 24
    PASS: gfortran.dg/inline_transpose_1.f90   -O3 -g  (test for excess errors)
    [-PASS:-]{+FAIL:+} gfortran.dg/inline_transpose_1.f90   -O3 -g  execution test
    PASS: gfortran.dg/inline_transpose_1.f90   -Os   (test for warnings, line 112)
    PASS: gfortran.dg/inline_transpose_1.f90   -Os   (test for warnings, line 120)
    PASS: gfortran.dg/inline_transpose_1.f90   -Os   (test for warnings, line 144)
    @@ -31007,7 +31010,7 @@ PASS: gfortran.dg/inline_transpose_1.f90   -Os   scan-tree-dump-times original "
    PASS: gfortran.dg/inline_transpose_1.f90   -Os   scan-tree-dump-times original "_gfortran_transpose" 0
    PASS: gfortran.dg/inline_transpose_1.f90   -Os   scan-tree-dump-times original "struct[^\\n]*atmp" 24
    PASS: gfortran.dg/inline_transpose_1.f90   -Os  (test for excess errors)
    [-PASS:-]{+FAIL:+} gfortran.dg/inline_transpose_1.f90   -Os  execution test

    PASS: gfortran.dg/intrinsic_intkinds_1.f90   -O0  (test for excess errors)
    [-PASS:-]{+FAIL:+} gfortran.dg/intrinsic_intkinds_1.f90   -O0  execution test
    PASS: gfortran.dg/intrinsic_intkinds_1.f90   -O1  (test for excess errors)
    PASS: gfortran.dg/intrinsic_intkinds_1.f90   -O1  execution test
    PASS: gfortran.dg/intrinsic_intkinds_1.f90   -O2  (test for excess errors)

    PASS: gfortran.dg/matmul_1.f90   -O0  (test for excess errors)
    [-PASS:-]{+FAIL:+} gfortran.dg/matmul_1.f90   -O0  execution test
    PASS: gfortran.dg/matmul_1.f90   -O1  (test for excess errors)
    PASS: gfortran.dg/matmul_1.f90   -O1  execution test
    PASS: gfortran.dg/matmul_1.f90   -O2  (test for excess errors)
    PASS: gfortran.dg/matmul_1.f90   -O2  execution test
    PASS: gfortran.dg/matmul_1.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
    [-PASS:-]{+FAIL:+} gfortran.dg/matmul_1.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
    PASS: gfortran.dg/matmul_1.f90   -O3 -g  (test for excess errors)
    [-PASS:-]{+FAIL:+} gfortran.dg/matmul_1.f90   -O3 -g  execution test
    PASS: gfortran.dg/matmul_1.f90   -Os  (test for excess errors)
    PASS: gfortran.dg/matmul_1.f90   -Os  execution test

    PASS: gfortran.dg/matmul_10.f90   -O0   (test for warnings, line 12)
    PASS: gfortran.dg/matmul_10.f90   -O0   (test for warnings, line 17)
    PASS: gfortran.dg/matmul_10.f90   -O0  (test for excess errors)
    [-PASS:-]{+FAIL:+} gfortran.dg/matmul_10.f90   -O0  execution test
    PASS: gfortran.dg/matmul_10.f90   -O1   (test for warnings, line 12)
    PASS: gfortran.dg/matmul_10.f90   -O1   (test for warnings, line 17)
    PASS: gfortran.dg/matmul_10.f90   -O1  (test for excess errors)
    @@ -34670,7 +34673,7 @@ PASS: gfortran.dg/matmul_10.f90   -Os  execution test

    PASS: gfortran.dg/matmul_12.f90   -O0  (test for excess errors)
    [-PASS:-]{+FAIL:+} gfortran.dg/matmul_12.f90   -O0  execution test
    PASS: gfortran.dg/matmul_12.f90   -O1  (test for excess errors)
    PASS: gfortran.dg/matmul_12.f90   -O1  execution test
    PASS: gfortran.dg/matmul_12.f90   -O2  (test for excess errors)

    PASS: gfortran.dg/matmul_2.f90   -O0  (test for excess errors)
    [-PASS:-]{+FAIL:+} gfortran.dg/matmul_2.f90   -O0  execution test
    PASS: gfortran.dg/matmul_2.f90   -O1  (test for excess errors)
    PASS: gfortran.dg/matmul_2.f90   -O1  execution test
    PASS: gfortran.dg/matmul_2.f90   -O2  (test for excess errors)

    PASS: gfortran.dg/matmul_3.f90   -O0  (test for excess errors)
    [-PASS:-]{+FAIL:+} gfortran.dg/matmul_3.f90   -O0  execution test
    PASS: gfortran.dg/matmul_3.f90   -O1  (test for excess errors)
    PASS: gfortran.dg/matmul_3.f90   -O1  execution test
    PASS: gfortran.dg/matmul_3.f90   -O2  (test for excess errors)

    PASS: gfortran.dg/matmul_6.f90   -O0  (test for excess errors)
    [-PASS:-]{+FAIL:+} gfortran.dg/matmul_6.f90   -O0  execution test
    PASS: gfortran.dg/matmul_6.f90   -O1  (test for excess errors)
    PASS: gfortran.dg/matmul_6.f90   -O1  execution test
    PASS: gfortran.dg/matmul_6.f90   -O2  (test for excess errors)

    @@ -38912,11 +38915,11 @@ PASS: gfortran.dg/overload_5.f90   -O0  execution test
    PASS: gfortran.dg/overload_5.f90   -O1  (test for excess errors)
    PASS: gfortran.dg/overload_5.f90   -O1  execution test
    PASS: gfortran.dg/overload_5.f90   -O2  (test for excess errors)
    [-PASS:-]{+FAIL:+} gfortran.dg/overload_5.f90   -O2  execution test
    PASS: gfortran.dg/overload_5.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
    [-PASS:-]{+FAIL:+} gfortran.dg/overload_5.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
    PASS: gfortran.dg/overload_5.f90   -O3 -g  (test for excess errors)
    [-PASS:-]{+FAIL:+} gfortran.dg/overload_5.f90   -O3 -g  execution test
    PASS: gfortran.dg/overload_5.f90   -Os  (test for excess errors)
    PASS: gfortran.dg/overload_5.f90   -Os  execution test
      Not '-march=gfx908'; '-march=gfx1100' only.

    @@ -39828,9 +39831,9 @@ PASS: gfortran.dg/pointer_assign_4.f90   -O1  execution test
    PASS: gfortran.dg/pointer_assign_4.f90   -O2  (test for excess errors)
    PASS: gfortran.dg/pointer_assign_4.f90   -O2  execution test
    PASS: gfortran.dg/pointer_assign_4.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
    [-PASS:-]{+FAIL:+} gfortran.dg/pointer_assign_4.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
    PASS: gfortran.dg/pointer_assign_4.f90   -O3 -g  (test for excess errors)
    [-PASS:-]{+FAIL:+} gfortran.dg/pointer_assign_4.f90   -O3 -g  execution test
    PASS: gfortran.dg/pointer_assign_4.f90   -Os  (test for excess errors)
    PASS: gfortran.dg/pointer_assign_4.f90   -Os  execution test

    @@ -40290,11 +40293,11 @@ PASS: gfortran.dg/pointer_remapping_10.f90   -O0  execution test
    PASS: gfortran.dg/pointer_remapping_10.f90   -O1  (test for excess errors)
    PASS: gfortran.dg/pointer_remapping_10.f90   -O1  execution test
    PASS: gfortran.dg/pointer_remapping_10.f90   -O2  (test for excess errors)
    [-PASS:-]{+FAIL:+} gfortran.dg/pointer_remapping_10.f90   -O2  execution test
    PASS: gfortran.dg/pointer_remapping_10.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
    [-PASS:-]{+FAIL:+} gfortran.dg/pointer_remapping_10.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
    PASS: gfortran.dg/pointer_remapping_10.f90   -O3 -g  (test for excess errors)
    [-PASS:-]{+FAIL:+} gfortran.dg/pointer_remapping_10.f90   -O3 -g  execution test
    PASS: gfortran.dg/pointer_remapping_10.f90   -Os  (test for excess errors)
    PASS: gfortran.dg/pointer_remapping_10.f90   -Os  execution test
      Not '-march=gfx908'; '-march=gfx1100' only.

    PASS: gfortran.dg/transpose_4.f90   -O0  (test for excess errors)
    [-PASS:-]{+FAIL:+} gfortran.dg/transpose_4.f90   -O0  execution test
    PASS: gfortran.dg/transpose_4.f90   -O1  (test for excess errors)
    PASS: gfortran.dg/transpose_4.f90   -O1  execution test
    PASS: gfortran.dg/transpose_4.f90   -O2  (test for excess errors)

    PASS: gfortran.dg/vector_subscript_5.f90   -O0  (test for excess errors)
    [-PASS:-]{+FAIL:+} gfortran.dg/vector_subscript_5.f90   -O0  execution test
    PASS: gfortran.dg/vector_subscript_5.f90   -O1  (test for excess errors)
    PASS: gfortran.dg/vector_subscript_5.f90   -O1  execution test
    PASS: gfortran.dg/vector_subscript_5.f90   -O2  (test for excess errors)

    @@ -56898,7 +56901,7 @@ PASS: gfortran.fortran-torture/execute/arrayarg.f90 execution,  -O2
    PASS: gfortran.fortran-torture/execute/arrayarg.f90 execution,  -O2 -fbounds-check 
    PASS: gfortran.fortran-torture/execute/arrayarg.f90 execution,  -O2 -fomit-frame-pointer -finline-functions 
    PASS: gfortran.fortran-torture/execute/arrayarg.f90 execution,  -O2 -fomit-frame-pointer -finline-functions -funroll-loops 
    [-PASS:-]{+FAIL:+} gfortran.fortran-torture/execute/arrayarg.f90 execution,  -O3 -g 
    PASS: gfortran.fortran-torture/execute/arrayarg.f90 execution,  -Os 

    @@ -57982,11 +57985,11 @@ PASS: gfortran.fortran-torture/execute/in-pack.f90 compilation,  -O3 -g
    PASS: gfortran.fortran-torture/execute/in-pack.f90 compilation,  -Os 
    PASS: gfortran.fortran-torture/execute/in-pack.f90 execution,  -O0 
    PASS: gfortran.fortran-torture/execute/in-pack.f90 execution,  -O1 
    [-PASS:-]{+FAIL:+} gfortran.fortran-torture/execute/in-pack.f90 execution,  -O2 
    [-PASS:-]{+FAIL:+} gfortran.fortran-torture/execute/in-pack.f90 execution,  -O2 -fbounds-check 
    [-PASS:-]{+FAIL:+} gfortran.fortran-torture/execute/in-pack.f90 execution,  -O2 -fomit-frame-pointer -finline-functions 
    [-PASS:-]{+FAIL:+} gfortran.fortran-torture/execute/in-pack.f90 execution,  -O2 -fomit-frame-pointer -finline-functions -funroll-loops 
    [-PASS:-]{+FAIL:+} gfortran.fortran-torture/execute/in-pack.f90 execution,  -O3 -g 
    PASS: gfortran.fortran-torture/execute/in-pack.f90 execution,  -Os 
      Not '-march=gfx908'; '-march=gfx1100' only.

    @@ -58460,7 +58463,7 @@ PASS: gfortran.fortran-torture/execute/intrinsic_matmul.f90 compilation,  -O2 -f
    PASS: gfortran.fortran-torture/execute/intrinsic_matmul.f90 compilation,  -O2 -fomit-frame-pointer -finline-functions -funroll-loops 
    PASS: gfortran.fortran-torture/execute/intrinsic_matmul.f90 compilation,  -O3 -g 
    PASS: gfortran.fortran-torture/execute/intrinsic_matmul.f90 compilation,  -Os 
    [-PASS:-]{+FAIL:+} gfortran.fortran-torture/execute/intrinsic_matmul.f90 execution,  -O0 
    PASS: gfortran.fortran-torture/execute/intrinsic_matmul.f90 execution,  -O1 
    PASS: gfortran.fortran-torture/execute/intrinsic_matmul.f90 execution,  -O2 
    PASS: gfortran.fortran-torture/execute/intrinsic_matmul.f90 execution,  -O2 -fbounds-check 

    @@ -60034,7 +60037,7 @@ PASS: gfortran.fortran-torture/execute/where_1.f90 execution,  -O2
    PASS: gfortran.fortran-torture/execute/where_1.f90 execution,  -O2 -fbounds-check 
    PASS: gfortran.fortran-torture/execute/where_1.f90 execution,  -O2 -fomit-frame-pointer -finline-functions 
    PASS: gfortran.fortran-torture/execute/where_1.f90 execution,  -O2 -fomit-frame-pointer -finline-functions -funroll-loops 
    [-PASS:-]{+FAIL:+} gfortran.fortran-torture/execute/where_1.f90 execution,  -O3 -g 
    PASS: gfortran.fortran-torture/execute/where_1.f90 execution,  -Os 

    @@ -60226,7 +60229,7 @@ PASS: gfortran.fortran-torture/execute/where_6.f90 execution,  -O2
    PASS: gfortran.fortran-torture/execute/where_6.f90 execution,  -O2 -fbounds-check 
    PASS: gfortran.fortran-torture/execute/where_6.f90 execution,  -O2 -fomit-frame-pointer -finline-functions 
    PASS: gfortran.fortran-torture/execute/where_6.f90 execution,  -O2 -fomit-frame-pointer -finline-functions -funroll-loops 
    [-PASS:-]{+FAIL:+} gfortran.fortran-torture/execute/where_6.f90 execution,  -O3 -g 
    PASS: gfortran.fortran-torture/execute/where_6.f90 execution,  -Os 


Regards
 Thomas


> 2021-01-13  Julian Brown  <julian@codesourcery.com>
>
> gcc/
> 	* doc/tm.texi.in (TARGET_VECTORIZE_PREFER_GATHER_SCATTER): Add
> 	documentation hook.
> 	* doc/tm.texi: Regenerate.
> 	* target.def (prefer_gather_scatter): Add target hook under vectorizer.
> 	* tree-vect-stmts.c (get_group_load_store_type): Optionally prefer
> 	gather/scatter instructions to scalar/elementwise fallback.
> 	* config/gcn/gcn.c (TARGET_VECTORIZE_PREFER_GATHER_SCATTER): Define
> 	hook.
> ---
>  gcc/config/gcn/gcn.c  | 2 ++
>  gcc/doc/tm.texi       | 5 +++++
>  gcc/doc/tm.texi.in    | 2 ++
>  gcc/target.def        | 8 ++++++++
>  gcc/tree-vect-stmts.c | 9 +++++++--
>  5 files changed, 24 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
> index ee9f00558305..ea88b5e91244 100644
> --- a/gcc/config/gcn/gcn.c
> +++ b/gcc/config/gcn/gcn.c
> @@ -6501,6 +6501,8 @@ gcn_dwarf_register_span (rtx rtl)
>    gcn_vector_alignment_reachable
>  #undef  TARGET_VECTOR_MODE_SUPPORTED_P
>  #define TARGET_VECTOR_MODE_SUPPORTED_P gcn_vector_mode_supported_p
> +#undef  TARGET_VECTORIZE_PREFER_GATHER_SCATTER
> +#define TARGET_VECTORIZE_PREFER_GATHER_SCATTER true
>  
>  struct gcc_target targetm = TARGET_INITIALIZER;
>  
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index 581b7b51eeb0..bd0b2eea477a 100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -6122,6 +6122,11 @@ The default is @code{NULL_TREE} which means to not vectorize scatter
>  stores.
>  @end deftypefn
>  
> +@deftypevr {Target Hook} bool TARGET_VECTORIZE_PREFER_GATHER_SCATTER
> +This hook is set to TRUE if gather loads or scatter stores are cheaper on
> +this target than a sequence of elementwise loads or stores.
> +@end deftypevr
> +
>  @deftypefn {Target Hook} int TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN (struct cgraph_node *@var{}, struct cgraph_simd_clone *@var{}, @var{tree}, @var{int})
>  This hook should set @var{vecsize_mangle}, @var{vecsize_int}, @var{vecsize_float}
>  fields in @var{simd_clone} structure pointed by @var{clone_info} argument and also
> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> index afa19d4ac63c..c0883e5da82c 100644
> --- a/gcc/doc/tm.texi.in
> +++ b/gcc/doc/tm.texi.in
> @@ -4195,6 +4195,8 @@ address;  but often a machine-dependent strategy can generate better code.
>  
>  @hook TARGET_VECTORIZE_BUILTIN_SCATTER
>  
> +@hook TARGET_VECTORIZE_PREFER_GATHER_SCATTER
> +
>  @hook TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN
>  
>  @hook TARGET_SIMD_CLONE_ADJUST
> diff --git a/gcc/target.def b/gcc/target.def
> index 00421f3a6acd..0b34ab5c3d52 100644
> --- a/gcc/target.def
> +++ b/gcc/target.def
> @@ -2027,6 +2027,14 @@ all zeros.  GCC can then try to branch around the instruction instead.",
>   (unsigned ifn),
>   default_empty_mask_is_expensive)
>  
> +/* Prefer gather/scatter loads/stores to e.g. elementwise accesses if\n\
> +we cannot use a contiguous access.  */
> +DEFHOOKPOD
> +(prefer_gather_scatter,
> + "This hook is set to TRUE if gather loads or scatter stores are cheaper on\n\
> +this target than a sequence of elementwise loads or stores.",
> + bool, false)
> +
>  /* Target builtin that implements vector gather operation.  */
>  DEFHOOK
>  (builtin_gather,
> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index 9ace345fc5e2..e117d3d16afc 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -2444,9 +2444,14 @@ get_group_load_store_type (stmt_vec_info stmt_info, tree vectype, bool slp,
>  	 it probably isn't a win to use separate strided accesses based
>  	 on nearby locations.  Or, even if it's a win over scalar code,
>  	 it might not be a win over vectorizing at a lower VF, if that
> -	 allows us to use contiguous accesses.  */
> +	 allows us to use contiguous accesses.
> +
> +	 On some targets (e.g. AMD GCN), always use gather/scatter accesses
> +	 here since those are the only types of vector loads/stores available,
> +	 and the fallback case of using elementwise accesses is very
> +	 inefficient.  */
>        if (*memory_access_type == VMAT_ELEMENTWISE
> -	  && single_element_p
> +	  && (targetm.vectorize.prefer_gather_scatter || single_element_p)
>  	  && loop_vinfo
>  	  && vect_use_strided_gather_scatters_p (stmt_info, loop_vinfo,
>  						 masked_p, gs_info))

Patch

From 8678fc697046fba1014f1db6321ee670538b0881 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <tschwinge@baylibre.com>
Date: Wed, 3 Jul 2024 12:20:17 +0200
Subject: [PATCH] Revert "[og10] vect: Add target hook to prefer gather/scatter
 instructions"

Testing current OG14 commit 735bbbfc6eaf58522c3ebb0946b66f33958ea134 for
'--target=amdgcn-amdhsa' (I've tested '-march=gfx908', '-march=gfx1100'),
this change has been identified to be causing ~100 instances of execution
test PASS -> FAIL, thus wrong-code generation.  It's possible that we've
had the same misbehavior also on OG13 and earlier, but just nobody ever
tested that.  And/or, that at some point in time, the original patch fell
out of sync, wasn't updated for relevant upstream vectorizer changes.
Until someone gets to analyze that (and upstream these changes here), we
shall revert this commit on OG14.

	gcc/
	* doc/tm.texi.in (TARGET_VECTORIZE_PREFER_GATHER_SCATTER): Remove
	documentation hook.
	* doc/tm.texi: Regenerate.
	* target.def (prefer_gather_scatter): Remove target hook under
	vectorizer.
	* tree-vect-stmts.cc (get_group_load_store_type): Remove code to
	optionally prefer gather/scatter instructions to
	scalar/elementwise fallback.
	* config/gcn/gcn.cc (TARGET_VECTORIZE_PREFER_GATHER_SCATTER):
	Remove hook definition.

This reverts OG14 commit 4abc54b6d6c3129cf4233e49231b1255b236c2be.
---
 gcc/ChangeLog.omp      | 13 +++++++++++++
 gcc/config/gcn/gcn.cc  |  2 --
 gcc/doc/tm.texi        |  5 -----
 gcc/doc/tm.texi.in     |  2 --
 gcc/target.def         |  8 --------
 gcc/tree-vect-stmts.cc |  9 ++-------
 6 files changed, 15 insertions(+), 24 deletions(-)

diff --git a/gcc/ChangeLog.omp b/gcc/ChangeLog.omp
index ac4a30e81c8..3dd5bd03dc9 100644
--- a/gcc/ChangeLog.omp
+++ b/gcc/ChangeLog.omp
@@ -1,3 +1,16 @@ 
+2024-07-03  Thomas Schwinge  <tschwinge@baylibre.com>
+
+	* doc/tm.texi.in (TARGET_VECTORIZE_PREFER_GATHER_SCATTER): Remove
+	documentation hook.
+	* doc/tm.texi: Regenerate.
+	* target.def (prefer_gather_scatter): Remove target hook under
+	vectorizer.
+	* tree-vect-stmts.cc (get_group_load_store_type): Remove code to
+	optionally prefer gather/scatter instructions to
+	scalar/elementwise fallback.
+	* config/gcn/gcn.cc (TARGET_VECTORIZE_PREFER_GATHER_SCATTER):
+	Remove hook definition.
+
 2024-05-19  Roger Sayle  <roger@nextmovesoftware.com>
 
 	* config/nvptx/nvptx.md (popcount<mode>2): Split into...
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index a247eecd8e8..d6531f55190 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -8059,8 +8059,6 @@  gcn_dwarf_register_span (rtx rtl)
   gcn_vector_alignment_reachable
 #undef  TARGET_VECTOR_MODE_SUPPORTED_P
 #define TARGET_VECTOR_MODE_SUPPORTED_P gcn_vector_mode_supported_p
-#undef  TARGET_VECTORIZE_PREFER_GATHER_SCATTER
-#define TARGET_VECTORIZE_PREFER_GATHER_SCATTER true
 
 struct gcc_target targetm = TARGET_INITIALIZER;
 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index e64c7541f60..c8b8b126b24 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6482,11 +6482,6 @@  The default is @code{NULL_TREE} which means to not vectorize scatter
 stores.
 @end deftypefn
 
-@deftypevr {Target Hook} bool TARGET_VECTORIZE_PREFER_GATHER_SCATTER
-This hook is set to TRUE if gather loads or scatter stores are cheaper on
-this target than a sequence of elementwise loads or stores.
-@end deftypevr
-
 @deftypefn {Target Hook} int TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN (struct cgraph_node *@var{}, struct cgraph_simd_clone *@var{}, @var{tree}, @var{int}, @var{bool})
 This hook should set @var{vecsize_mangle}, @var{vecsize_int}, @var{vecsize_float}
 fields in @var{simd_clone} structure pointed by @var{clone_info} argument and also
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 645950b12d7..658e1e63371 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4309,8 +4309,6 @@  address;  but often a machine-dependent strategy can generate better code.
 
 @hook TARGET_VECTORIZE_BUILTIN_SCATTER
 
-@hook TARGET_VECTORIZE_PREFER_GATHER_SCATTER
-
 @hook TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN
 
 @hook TARGET_SIMD_CLONE_ADJUST
diff --git a/gcc/target.def b/gcc/target.def
index e4b26a7df3e..fdad7bbc93e 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -2044,14 +2044,6 @@  all zeros.  GCC can then try to branch around the instruction instead.",
  (unsigned ifn),
  default_empty_mask_is_expensive)
 
-/* Prefer gather/scatter loads/stores to e.g. elementwise accesses if\n\
-we cannot use a contiguous access.  */
-DEFHOOKPOD
-(prefer_gather_scatter,
- "This hook is set to TRUE if gather loads or scatter stores are cheaper on\n\
-this target than a sequence of elementwise loads or stores.",
- bool, false)
-
 /* Target builtin that implements vector gather operation.  */
 DEFHOOK
 (builtin_gather,
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index a7e33120eda..f8d8636b139 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2217,14 +2217,9 @@  get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
 	 it probably isn't a win to use separate strided accesses based
 	 on nearby locations.  Or, even if it's a win over scalar code,
 	 it might not be a win over vectorizing at a lower VF, if that
-	 allows us to use contiguous accesses.
-
-	 On some targets (e.g. AMD GCN), always use gather/scatter accesses
-	 here since those are the only types of vector loads/stores available,
-	 and the fallback case of using elementwise accesses is very
-	 inefficient.  */
+	 allows us to use contiguous accesses.  */
       if (*memory_access_type == VMAT_ELEMENTWISE
-	  && (targetm.vectorize.prefer_gather_scatter || single_element_p)
+	  && single_element_p
 	  && loop_vinfo
 	  && vect_use_strided_gather_scatters_p (stmt_info, loop_vinfo,
 						 masked_p, gs_info))
-- 
2.34.1
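
For readers who want to see what the revert changes in isolation, here is a
minimal standalone sketch of the guard in get_group_load_store_type, before
and after. It is not GCC code: the 'access_query' struct and both function
names are hypothetical, and each boolean field merely stands in for one
operand of the real condition shown in the tree-vect-stmts.cc hunk above.

```cpp
// Standalone model of the guard touched by the revert; not GCC code.
#include <cassert>

// Hypothetical stand-in for the operands of the real condition.
struct access_query {
  bool elementwise;        // models *memory_access_type == VMAT_ELEMENTWISE
  bool single_element_p;   // models single_element_p
  bool in_loop;            // models loop_vinfo being non-null
  bool strided_gs_usable;  // models vect_use_strided_gather_scatters_p (...)
};

// Guard with the og10 patch applied: the target hook can override
// the single_element_p requirement.
bool use_gather_scatter_with_hook(const access_query &q,
                                  bool prefer_gather_scatter) {
  return q.elementwise
         && (prefer_gather_scatter || q.single_element_p)
         && q.in_loop
         && q.strided_gs_usable;
}

// Guard after the revert: single_element_p is required again.
bool use_gather_scatter_reverted(const access_query &q) {
  return q.elementwise
         && q.single_element_p
         && q.in_loop
         && q.strided_gs_usable;
}
```

In this model the two guards differ only for accesses where single_element_p
is false: with the hook set to true (as GCN defined it) such an access still
takes the gather/scatter branch, whereas after the revert it does not. The
sketch says nothing about why the hooked path produces wrong code; that
analysis is still open, as noted in the commit message.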