Message ID | 20240930133350.3202348-1-victor.donascimento@arm.com |
---|---|
State | New |
Headers | show |
Series | middle-end: Fix ifcvt predicate generation for masked function calls | expand |
Hi Victor, Thanks! This looks good to me with one minor comment: > -----Original Message----- > From: Victor Do Nascimento <victor.donascimento@arm.com> > Sent: Monday, September 30, 2024 2:34 PM > To: gcc-patches@gcc.gnu.org > Cc: Tamar Christina <Tamar.Christina@arm.com>; richard.guenther@gmail.com; > Victor Do Nascimento <Victor.DoNascimento@arm.com> > Subject: [PATCH] middle-end: Fix ifcvt predicate generation for masked function > calls > > Up until now, due to a latent bug in the code for the ifcvt pass, > irrespective of the branch taken in a conditional statement, the > original condition for the if statement was used in masking the > function call. > > Thus, for code such as: > > if (a[i] > limit) > b[i] = fixed_const; > else > b[i] = fn (a[i]); > > we would generate the following (wrong) if-converted tree code: > > _1 = a[i_1]; > _2 = _1 > limit; > _3 = .MASK_CALL (fn, _1, _2); > cstore_4 = _2 ? fixed_const : _3; > > as opposed to the correct expected sequence: > > _1 = a[i_1]; > _2 = _1 > limit; > _3 = ~_2; > _4 = .MASK_CALL (fn, _1, _3); > cstore_5 = _2 ? fixed_const : _4; > > This patch ensures that the correct predicate mask generation is > carried out such that, upon autovectorization, the correct vector > lanes are selected in the vectorized function call. > > gcc/ChangeLog: > > * tree-if-conv.cc (predicate_statements): Fix handling of > predicated function calls. > > gcc/testsuite/ChangeLog: > > * gcc.dg/vect/vect-fncall-mask.c: New. > --- > gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c | 31 ++++++++++++++++++++ > gcc/tree-if-conv.cc | 14 ++++++++- > 2 files changed, 44 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c > b/gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c > new file mode 100644 > index 00000000000..554488e0630 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c > @@ -0,0 +1,31 @@ > +/* { dg-do compile { target { aarch64*-*-* } } } */ > +/* { dg-additional-options "-march=armv8.2-a+sve -fdump-tree-ifcvt-raw -Ofast" > { target { aarch64*-*-* } } } */ > + > +extern int __attribute__ ((simd, const)) fn (int); > + > +const int N = 20; > +const float lim = 101.0; > +const float cst = -1.0; > +float tot = 0.0; > + > +float b[20]; > +float a[20] = { [0 ... 9] = 1.7014118e39, /* If branch. */ > + [10 ... 19] = 100.0 }; /* Else branch. */ > + > +int main (void) > +{ > + #pragma omp simd > + for (int i = 0; i < N; i += 1) > + { > + if (a[i] > lim) > + b[i] = cst; > + else > + b[i] = fn (a[i]); > + tot += b[i]; > + } > + return (0); > +} > + > +/* { dg-final { scan-tree-dump {gimple_assign <gt_expr, _12, _1, 1.01e\+2, > NULL>} ifcvt } } */ > +/* { dg-final { scan-tree-dump {gimple_assign <bit_not_expr, _34, _12, NULL, > NULL>} ifcvt } } */ > +/* { dg-final { scan-tree-dump {gimple_call <.MASK_CALL, _3, fn, _2, _34>} ifcvt } } > */ > diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc > index 0346a1376c5..246a6bb5bd1 100644 > --- a/gcc/tree-if-conv.cc > +++ b/gcc/tree-if-conv.cc > @@ -2907,6 +2907,8 @@ predicate_statements (loop_p loop) > This will cause the vectorizer to match the "in branch" > clone variants, and serves to build the mask vector > in a natural way. */ > + tree mask = cond; > + gimple_seq stmts = NULL; > gcall *call = dyn_cast <gcall *> (gsi_stmt (gsi)); > tree orig_fn = gimple_call_fn (call); > int orig_nargs = gimple_call_num_args (call); > @@ -2914,7 +2916,17 @@ predicate_statements (loop_p loop) > args.safe_push (orig_fn); > for (int i = 0; i < orig_nargs; i++) > args.safe_push (gimple_call_arg (call, i)); > - args.safe_push (cond); > + /* If `swap', we invert the mask used for the if branch for use > + when masking the function call. */ > + if (swap) > + { > + tree true_val > + = constant_boolean_node (true, TREE_TYPE (mask)); > + mask = gimple_build (&stmts, BIT_XOR_EXPR, > + TREE_TYPE (mask), mask, true_val); > + } > + gsi_insert_seq_before (&gsi, stmts, GSI_SAME_STMT); Looks like this mirrors what is currently being done for gimple_assign, but you can move the gsi_insert_seq and the declaration of stmts into the if block since they're only used there. Otherwise looks good to me but can't approve. Thanks, Tamar > + args.safe_push (mask); > > /* Replace the call with a IFN_MASK_CALL that has the extra > condition parameter. */ > -- > 2.34.1
On Mon, Sep 30, 2024 at 8:40 PM Tamar Christina <Tamar.Christina@arm.com> wrote: > > Hi Victor, > > Thanks! This looks good to me with one minor comment: > > > -----Original Message----- > > From: Victor Do Nascimento <victor.donascimento@arm.com> > > Sent: Monday, September 30, 2024 2:34 PM > > To: gcc-patches@gcc.gnu.org > > Cc: Tamar Christina <Tamar.Christina@arm.com>; richard.guenther@gmail.com; > > Victor Do Nascimento <Victor.DoNascimento@arm.com> > > Subject: [PATCH] middle-end: Fix ifcvt predicate generation for masked function > > calls > > > > Up until now, due to a latent bug in the code for the ifcvt pass, > > irrespective of the branch taken in a conditional statement, the > > original condition for the if statement was used in masking the > > function call. > > > > Thus, for code such as: > > > > if (a[i] > limit) > > b[i] = fixed_const; > > else > > b[i] = fn (a[i]); > > > > we would generate the following (wrong) if-converted tree code: > > > > _1 = a[i_1]; > > _2 = _1 > limit; > > _3 = .MASK_CALL (fn, _1, _2); > > cstore_4 = _2 ? fixed_const : _3; > > > > as opposed to the correct expected sequence: > > > > _1 = a[i_1]; > > _2 = _1 > limit; > > _3 = ~_2; > > _4 = .MASK_CALL (fn, _1, _3); > > cstore_5 = _2 ? fixed_const : _4; > > > > This patch ensures that the correct predicate mask generation is > > carried out such that, upon autovectorization, the correct vector > > lanes are selected in the vectorized function call. > > > > gcc/ChangeLog: > > > > * tree-if-conv.cc (predicate_statements): Fix handling of > > predicated function calls. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.dg/vect/vect-fncall-mask.c: New. > > --- > > gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c | 31 ++++++++++++++++++++ > > gcc/tree-if-conv.cc | 14 ++++++++- > > 2 files changed, 44 insertions(+), 1 deletion(-) > > create mode 100644 gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c > > b/gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c > > new file mode 100644 > > index 00000000000..554488e0630 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c > > @@ -0,0 +1,31 @@ > > +/* { dg-do compile { target { aarch64*-*-* } } } */ > > +/* { dg-additional-options "-march=armv8.2-a+sve -fdump-tree-ifcvt-raw -Ofast" > > { target { aarch64*-*-* } } } */ > > + > > +extern int __attribute__ ((simd, const)) fn (int); > > + > > +const int N = 20; > > +const float lim = 101.0; > > +const float cst = -1.0; > > +float tot = 0.0; > > + > > +float b[20]; > > +float a[20] = { [0 ... 9] = 1.7014118e39, /* If branch. */ > > + [10 ... 19] = 100.0 }; /* Else branch. */ > > + > > +int main (void) > > +{ > > + #pragma omp simd > > + for (int i = 0; i < N; i += 1) > > + { > > + if (a[i] > lim) > > + b[i] = cst; > > + else > > + b[i] = fn (a[i]); > > + tot += b[i]; > > + } > > + return (0); > > +} > > + > > +/* { dg-final { scan-tree-dump {gimple_assign <gt_expr, _12, _1, 1.01e\+2, > > NULL>} ifcvt } } */ > > +/* { dg-final { scan-tree-dump {gimple_assign <bit_not_expr, _34, _12, NULL, > > NULL>} ifcvt } } */ > > +/* { dg-final { scan-tree-dump {gimple_call <.MASK_CALL, _3, fn, _2, _34>} ifcvt } } > > */ > > diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc > > index 0346a1376c5..246a6bb5bd1 100644 > > --- a/gcc/tree-if-conv.cc > > +++ b/gcc/tree-if-conv.cc > > @@ -2907,6 +2907,8 @@ predicate_statements (loop_p loop) > > This will cause the vectorizer to match the "in branch" > > clone variants, and serves to build the mask vector > > in a natural way. */ > > + tree mask = cond; > > + gimple_seq stmts = NULL; > > gcall *call = dyn_cast <gcall *> (gsi_stmt (gsi)); > > tree orig_fn = gimple_call_fn (call); > > int orig_nargs = gimple_call_num_args (call); > > @@ -2914,7 +2916,17 @@ predicate_statements (loop_p loop) > > args.safe_push (orig_fn); > > for (int i = 0; i < orig_nargs; i++) > > args.safe_push (gimple_call_arg (call, i)); > > - args.safe_push (cond); > > + /* If `swap', we invert the mask used for the if branch for use > > + when masking the function call. */ > > + if (swap) > > + { > > + tree true_val > > + = constant_boolean_node (true, TREE_TYPE (mask)); > > + mask = gimple_build (&stmts, BIT_XOR_EXPR, > > + TREE_TYPE (mask), mask, true_val); > > + } > > + gsi_insert_seq_before (&gsi, stmts, GSI_SAME_STMT); > > Looks like this mirrors what is currently being done for gimple_assign, but > you can move the gsi_insert_seq and the declaration of stmts into the if > block since they're only used there. > > Otherwise looks good to me but can't approve. OK. The issue is also present on the 13 and 14 branches? Can you see to backport the fix? Thanks, Richard. > Thanks, > Tamar > > > + args.safe_push (mask); > > > > /* Replace the call with a IFN_MASK_CALL that has the extra > > condition parameter. */ > > -- > > 2.34.1 >
On 10/1/24 13:10, Richard Biener wrote: > On Mon, Sep 30, 2024 at 8:40 PM Tamar Christina <Tamar.Christina@arm.com> wrote: >> >> Hi Victor, >> >> Thanks! This looks good to me with one minor comment: >> >>> -----Original Message----- >>> From: Victor Do Nascimento <victor.donascimento@arm.com> >>> Sent: Monday, September 30, 2024 2:34 PM >>> To: gcc-patches@gcc.gnu.org >>> Cc: Tamar Christina <Tamar.Christina@arm.com>; richard.guenther@gmail.com; >>> Victor Do Nascimento <Victor.DoNascimento@arm.com> >>> Subject: [PATCH] middle-end: Fix ifcvt predicate generation for masked function >>> calls >>> >>> Up until now, due to a latent bug in the code for the ifcvt pass, >>> irrespective of the branch taken in a conditional statement, the >>> original condition for the if statement was used in masking the >>> function call. >>> >>> Thus, for code such as: >>> >>> if (a[i] > limit) >>> b[i] = fixed_const; >>> else >>> b[i] = fn (a[i]); >>> >>> we would generate the following (wrong) if-converted tree code: >>> >>> _1 = a[i_1]; >>> _2 = _1 > limit; >>> _3 = .MASK_CALL (fn, _1, _2); >>> cstore_4 = _2 ? fixed_const : _3; >>> >>> as opposed to the correct expected sequence: >>> >>> _1 = a[i_1]; >>> _2 = _1 > limit; >>> _3 = ~_2; >>> _4 = .MASK_CALL (fn, _1, _3); >>> cstore_5 = _2 ? fixed_const : _4; >>> >>> This patch ensures that the correct predicate mask generation is >>> carried out such that, upon autovectorization, the correct vector >>> lanes are selected in the vectorized function call. >>> >>> gcc/ChangeLog: >>> >>> * tree-if-conv.cc (predicate_statements): Fix handling of >>> predicated function calls. >>> >>> gcc/testsuite/ChangeLog: >>> >>> * gcc.dg/vect/vect-fncall-mask.c: New. >>> --- >>> gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c | 31 ++++++++++++++++++++ >>> gcc/tree-if-conv.cc | 14 ++++++++- >>> 2 files changed, 44 insertions(+), 1 deletion(-) >>> create mode 100644 gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c >>> >>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c >>> b/gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c >>> new file mode 100644 >>> index 00000000000..554488e0630 >>> --- /dev/null >>> +++ b/gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c >>> @@ -0,0 +1,31 @@ >>> +/* { dg-do compile { target { aarch64*-*-* } } } */ >>> +/* { dg-additional-options "-march=armv8.2-a+sve -fdump-tree-ifcvt-raw -Ofast" >>> { target { aarch64*-*-* } } } */ >>> + >>> +extern int __attribute__ ((simd, const)) fn (int); >>> + >>> +const int N = 20; >>> +const float lim = 101.0; >>> +const float cst = -1.0; >>> +float tot = 0.0; >>> + >>> +float b[20]; >>> +float a[20] = { [0 ... 9] = 1.7014118e39, /* If branch. */ >>> + [10 ... 19] = 100.0 }; /* Else branch. */ >>> + >>> +int main (void) >>> +{ >>> + #pragma omp simd >>> + for (int i = 0; i < N; i += 1) >>> + { >>> + if (a[i] > lim) >>> + b[i] = cst; >>> + else >>> + b[i] = fn (a[i]); >>> + tot += b[i]; >>> + } >>> + return (0); >>> +} >>> + >>> +/* { dg-final { scan-tree-dump {gimple_assign <gt_expr, _12, _1, 1.01e\+2, >>> NULL>} ifcvt } } */ >>> +/* { dg-final { scan-tree-dump {gimple_assign <bit_not_expr, _34, _12, NULL, >>> NULL>} ifcvt } } */ >>> +/* { dg-final { scan-tree-dump {gimple_call <.MASK_CALL, _3, fn, _2, _34>} ifcvt } } >>> */ >>> diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc >>> index 0346a1376c5..246a6bb5bd1 100644 >>> --- a/gcc/tree-if-conv.cc >>> +++ b/gcc/tree-if-conv.cc >>> @@ -2907,6 +2907,8 @@ predicate_statements (loop_p loop) >>> This will cause the vectorizer to match the "in branch" >>> clone variants, and serves to build the mask vector >>> in a natural way. */ >>> + tree mask = cond; >>> + gimple_seq stmts = NULL; >>> gcall *call = dyn_cast <gcall *> (gsi_stmt (gsi)); >>> tree orig_fn = gimple_call_fn (call); >>> int orig_nargs = gimple_call_num_args (call); >>> @@ -2914,7 +2916,17 @@ predicate_statements (loop_p loop) >>> args.safe_push (orig_fn); >>> for (int i = 0; i < orig_nargs; i++) >>> args.safe_push (gimple_call_arg (call, i)); >>> - args.safe_push (cond); >>> + /* If `swap', we invert the mask used for the if branch for use >>> + when masking the function call. */ >>> + if (swap) >>> + { >>> + tree true_val >>> + = constant_boolean_node (true, TREE_TYPE (mask)); >>> + mask = gimple_build (&stmts, BIT_XOR_EXPR, >>> + TREE_TYPE (mask), mask, true_val); >>> + } >>> + gsi_insert_seq_before (&gsi, stmts, GSI_SAME_STMT); >> >> Looks like this mirrors what is currently being done for gimple_assign, but >> you can move the gsi_insert_seq and the declaration of stmts into the if >> block since they're only used there. >> >> Otherwise looks good to me but can't approve. > > OK. > > The issue is also present on the 13 and 14 branches? Can you see to > backport the fix? > > Thanks, > Richard. Of course, will look at getting it backported ASAP. Many thanks, Victor. >> Thanks, >> Tamar >> >>> + args.safe_push (mask); >>> >>> /* Replace the call with a IFN_MASK_CALL that has the extra >>> condition parameter. */ >>> -- >>> 2.34.1 >>
diff --git a/gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c b/gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c new file mode 100644 index 00000000000..554488e0630 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c @@ -0,0 +1,31 @@ +/* { dg-do compile { target { aarch64*-*-* } } } */ +/* { dg-additional-options "-march=armv8.2-a+sve -fdump-tree-ifcvt-raw -Ofast" { target { aarch64*-*-* } } } */ + +extern int __attribute__ ((simd, const)) fn (int); + +const int N = 20; +const float lim = 101.0; +const float cst = -1.0; +float tot = 0.0; + +float b[20]; +float a[20] = { [0 ... 9] = 1.7014118e39, /* If branch. */ + [10 ... 19] = 100.0 }; /* Else branch. */ + +int main (void) +{ + #pragma omp simd + for (int i = 0; i < N; i += 1) + { + if (a[i] > lim) + b[i] = cst; + else + b[i] = fn (a[i]); + tot += b[i]; + } + return (0); +} + +/* { dg-final { scan-tree-dump {gimple_assign <gt_expr, _12, _1, 1.01e\+2, NULL>} ifcvt } } */ +/* { dg-final { scan-tree-dump {gimple_assign <bit_not_expr, _34, _12, NULL, NULL>} ifcvt } } */ +/* { dg-final { scan-tree-dump {gimple_call <.MASK_CALL, _3, fn, _2, _34>} ifcvt } } */ diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc index 0346a1376c5..246a6bb5bd1 100644 --- a/gcc/tree-if-conv.cc +++ b/gcc/tree-if-conv.cc @@ -2907,6 +2907,8 @@ predicate_statements (loop_p loop) This will cause the vectorizer to match the "in branch" clone variants, and serves to build the mask vector in a natural way. */ + tree mask = cond; + gimple_seq stmts = NULL; gcall *call = dyn_cast <gcall *> (gsi_stmt (gsi)); tree orig_fn = gimple_call_fn (call); int orig_nargs = gimple_call_num_args (call); @@ -2914,7 +2916,17 @@ predicate_statements (loop_p loop) args.safe_push (orig_fn); for (int i = 0; i < orig_nargs; i++) args.safe_push (gimple_call_arg (call, i)); - args.safe_push (cond); + /* If `swap', we invert the mask used for the if branch for use + when masking the function call. */ + if (swap) + { + tree true_val + = constant_boolean_node (true, TREE_TYPE (mask)); + mask = gimple_build (&stmts, BIT_XOR_EXPR, + TREE_TYPE (mask), mask, true_val); + } + gsi_insert_seq_before (&gsi, stmts, GSI_SAME_STMT); + args.safe_push (mask); /* Replace the call with a IFN_MASK_CALL that has the extra condition parameter. */