| Message ID | 20240611094538.1129822-1-victor.donascimento@arm.com |
|---|---|
| State | New |
| Series | [v2] middle-end: Drop __builtin_prefetch calls in autovectorization [PR114061] |
On Tue, Jun 11, 2024 at 11:46 AM Victor Do Nascimento
<victor.donascimento@arm.com> wrote:
>
> At present the autovectorizer fails to vectorize simple loops
> involving calls to `__builtin_prefetch'.  A simple example of such
> a loop is given below:
>
> void foo(double * restrict a, double * restrict b, int n){
>   int i;
>   for(i=0; i<n; ++i){
>     a[i] = a[i] + b[i];
>     __builtin_prefetch(&(b[i+8]));
>   }
> }
>
> The failure stems from two issues:
>
> 1. Given that it is typically not possible to fully reason about a
>    function call due to the possibility of side effects, the
>    autovectorizer does not attempt to vectorize loops which make such
>    calls.
>
>    Given the memory reference passed to `__builtin_prefetch', in the
>    absence of assurances about its effect on the passed memory
>    location, the compiler deems the function unsafe to vectorize,
>    marking it as clobbering memory in `vect_find_stmt_data_reference'.
>    This leads to the failure in autovectorization.
>
> 2. Notwithstanding the above issue, though the prefetch statement
>    would be classed as `vect_unused_in_scope', the loop invariant
>    used in the address of the prefetch is the scalar loop's and not
>    the vector loop's IV.  That is, it still uses `i' and not `vec_iv'
>    because the instruction wasn't vectorized, causing DCE to think the
>    value is live, such that we now have both the vector and scalar loop
>    invariant actively used in the loop.
>
> This patch addresses both of these issues:
>
> 1. Regarding the memory clobber, a data prefetch does not generate
>    faults if its address argument is invalid and does not write to
>    memory.  Therefore, it does not alter the internal state of the
>    program or its control flow under any circumstance.  As such, it
>    is reasonable that the function be marked as not affecting memory
>    contents.
>
>    To achieve this, we add the necessary logic to
>    `get_references_in_stmt' to ensure that builtin functions are
>    given the same treatment as internal functions.
>    If the gimple call is to a builtin function and its function code
>    is `BUILT_IN_PREFETCH', we mark `clobbers_memory' as false.
>
> 2. Finding precedence in the way clobber statements are handled,
>    whereby the vectorizer drops these from both the scalar and
>    vectorized versions of a given loop, we choose to drop prefetch
>    hints in a similar fashion.  This seems appropriate given how
>    software prefetch hints are typically ignored by processors across
>    architectures, as they seldom lead to performance gain over their
>    hardware counterparts.

OK.

Thanks,
Richard.

> PR tree-optimization/114061
>
> gcc/ChangeLog:
>
>         * tree-data-ref.cc (get_references_in_stmt): Set
>         `clobbers_memory' to false for __builtin_prefetch.
>         * tree-vect-loop.cc (vect_transform_loop): Drop all
>         __builtin_prefetch calls from loops.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/vect/vect-prefetch-drop.c: New test.
>         * gcc.target/aarch64/vect-prefetch-drop.c: Likewise.
> ---
>  gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c | 12 ++++++++++++
>  .../gcc.target/aarch64/vect-prefetch-drop.c    | 13 +++++++++++++
>  gcc/tree-data-ref.cc                           |  2 ++
>  gcc/tree-vect-loop.cc                          |  6 ++++--
>  4 files changed, 31 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-prefetch-drop.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c b/gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
> new file mode 100644
> index 00000000000..7a8915eb716
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_int } */
> +
> +void foo(int * restrict a, int * restrict b, int n){
> +  int i;
> +  for(i=0; i<n; ++i){
> +    a[i] = a[i] + b[i];
> +    __builtin_prefetch(&(b[i+8]));
> +  }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/vect-prefetch-drop.c b/gcc/testsuite/gcc.target/aarch64/vect-prefetch-drop.c
> new file mode 100644
> index 00000000000..e654b99fde8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vect-prefetch-drop.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile { target { aarch64*-*-* } } } */
> +/* { dg-additional-options "-O3 -march=armv9.2-a+sve --std=c99" { target { aarch64*-*-* } } } */
> +
> +void foo(double * restrict a, double * restrict b, int n){
> +  int i;
> +  for(i=0; i<n; ++i){
> +    a[i] = a[i] + b[i];
> +    __builtin_prefetch(&(b[i+8]));
> +  }
> +}
> +
> +/* { dg-final { scan-assembler-not "prfm" } } */
> +/* { dg-final { scan-assembler "fadd\tz\[0-9\]+.d, p\[0-9\]+/m, z\[0-9\]+.d, z\[0-9\]+.d" } } */
> diff --git a/gcc/tree-data-ref.cc b/gcc/tree-data-ref.cc
> index 7b5f2d16238..bd61069b631 100644
> --- a/gcc/tree-data-ref.cc
> +++ b/gcc/tree-data-ref.cc
> @@ -5868,6 +5868,8 @@ get_references_in_stmt (gimple *stmt, vec<data_ref_loc, va_heap> *references)
>  	  clobbers_memory = true;
>  	  break;
>  	}
> +      else if (gimple_call_builtin_p (stmt, BUILT_IN_PREFETCH))
> +	clobbers_memory = false;
>        else
>  	clobbers_memory = true;
>      }
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index c471f1564a7..89cc6e64589 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -12177,8 +12177,10 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
>  	   !gsi_end_p (si);)
>  	{
>  	  stmt = gsi_stmt (si);
> -	  /* During vectorization remove existing clobber stmts.  */
> -	  if (gimple_clobber_p (stmt))
> +	  /* During vectorization remove existing clobber stmts and
> +	     prefetches.  */
> +	  if (gimple_clobber_p (stmt)
> +	      || gimple_call_builtin_p (stmt, BUILT_IN_PREFETCH))
>  	    {
>  	      unlink_stmt_vdef (stmt);
>  	      gsi_remove (&si, true);
> --
> 2.34.1
>