Message ID | C3D79422-304A-46CD-86A3-8753A797D6D8@nvidia.com |
---|---|
State | New |
Headers | show |
Series | [PR106329] SVE intrinsics: Fold calls with pfalse predicate. | expand |
Thanks for doing this and sorry for the slow review. Jennifer Schmitz <jschmitz@nvidia.com> writes: > If an SVE intrinsic has predicate pfalse, we can fold the call to > a simplified assignment statement: For _m, _x, and implicit predication, > the LHS can be assigned the operand for inactive values and for _z, we can > assign a zero vector. _x functions don't really specify the values of inactive lanes. The values are instead "don't care" and so we have a free choice. Using zeros (as for _z) should be more efficient than reusing one of the inputs. > For example, > svint32_t foo (svint32_t op1, svint32_t op2) > { > return svadd_s32_m (svpfalse_b (), op1, op2); > } > can be folded to lhs <- op1, such that foo is compiled to just a RET. > > We implemented this optimization during gimple folding by calling a new method > function_shape::fold_pfalse from gimple_folder::fold. > The implementations of fold_pfalse in the function_shape subclasses define the > expected behavior for a pfalse predicate for the different predications > and return an appropriate gimple statement. > To avoid code duplication, function_shape::fold_pfalse calls a new method > gimple_folder:fold_by_pred that takes arguments of type tree for each > predication and returns the new assignment statement depending on the > predication. > > We tested the new behavior for each intrinsic with all supported predications > and data types and checked the produced assembly. There is a test file > for each shape subclass with scan-assembler-times tests that look for > the simplified instruction sequences, such as individual RET instructions > or zeroing moves. There is an additional directive counting the total number of > functions in the test, which must be the sum of counts of all other > directives. This is to check that all tested intrinsics were optimized. > > In this patch, we only implemented function_shape::fold_pfalse for > binary shapes. But we plan to cover more shapes in follow-up patches, > after getting feedback on this patch. > > The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression. > OK for mainline? My main question is: can we not do this generically for _z, _x and _m, without the virtual method? (I wanted to prototype this locally before asking, hence the delay, but never found time.) The only snag is that we need to check whether the inactive lanes for _m are specified using a separate parameter before the predicate or whether they're taken from the parameter after the predicate. But that should be a generic rule that can be checked by looking at the types. That is, if the predication is PRED_z, PRED_m, or PRED_x, then I think we can say that: (1) if the first argument is a vector and the second argument is a pfalse predicate, the call folds to the first argument (2) if the first argument is a pfalse predicate and predication is PRED_m, the call folds to the second argument (3) if the first argument is a pfalse predicate and predication is not PRED_m, the call folds to zero. But perhaps I've forgotten a case. The implicit cases would still need to be handled separately. Perhaps for that case we could add a general fold method to the shape? I don't think we need to specialise the name of the virtual function beyond just "fold". It would be good to put: > + gimple_seq stmts = NULL; > + gimple *g = gimple_build_assign (lhs, new_lhs); > + gimple_seq_add_stmt_without_update (&stmts, g); > + gsi_replace_with_seq_vops (gsi, stmts); > + return g; into a helper so that we can use it for any g. (Maybe with yet another helper for folding to a gimple value, as here.) Some of the existing routines could use that helper too, so it would make a natural prepatch. I think I've made that sound more complicated than it really is, sorry. But I think the net effect should be less code, with automatic support for unary operations. Thanks, Richard > @@ -3666,6 +3683,39 @@ gimple_folder::fold_active_lanes_to (tree x) > return gimple_build_assign (lhs, VEC_COND_EXPR, pred, x, vec_inactive); > } > > +/* Fold call to assignment statement > + lhs = new_lhs, > + where new_lhs is determined by the predication. > + Return the gimple statement on success, else return NULL. */ > +gimple * > +gimple_folder::fold_by_pred (tree m, tree x, tree z, tree implicit) > +{ > + tree new_lhs = NULL; > + switch (pred) > + { > + case PRED_z: > + new_lhs = z; > + break; > + case PRED_m: > + new_lhs = m; > + break; > + case PRED_x: > + new_lhs = x; > + break; > + case PRED_implicit: > + new_lhs = implicit; > + break; > + default: > + return NULL; > + } > + gcc_assert (new_lhs); > + gimple_seq stmts = NULL; > + gimple *g = gimple_build_assign (lhs, new_lhs); > + gimple_seq_add_stmt_without_update (&stmts, g); > + gsi_replace_with_seq_vops (gsi, stmts); > + return g; > +} > + > /* Try to fold the call. Return the new statement on success and null > on failure. */ > gimple * > @@ -3685,6 +3735,9 @@ gimple_folder::fold () > /* First try some simplifications that are common to many functions. */ > if (auto *call = redirect_pred_x ()) > return call; > + if (pred != PRED_none) > + if (auto *call = shape->fold_pfalse (*this)) > + return call; > > return base->fold (*this); > } > diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h b/gcc/config/aarch64/aarch64-sve-builtins.h > index 4cdc0541bdc..4e443a8192e 100644 > --- a/gcc/config/aarch64/aarch64-sve-builtins.h > +++ b/gcc/config/aarch64/aarch64-sve-builtins.h > @@ -632,12 +632,15 @@ public: > gcall *redirect_call (const function_instance &); > gimple *redirect_pred_x (); > > + bool arg_is_pfalse_p (unsigned int idx); > + > gimple *fold_to_cstu (poly_uint64); > gimple *fold_to_pfalse (); > gimple *fold_to_ptrue (); > gimple *fold_to_vl_pred (unsigned int); > gimple *fold_const_binary (enum tree_code); > gimple *fold_active_lanes_to (tree); > + gimple *fold_by_pred (tree, tree, tree, tree); > > gimple *fold (); > > @@ -796,6 +799,11 @@ public: > /* Check whether the given call is semantically valid. Return true > if it is, otherwise report an error and return false. */ > virtual bool check (function_checker &) const { return true; } > + > + /* For a pfalse predicate, try to fold the given gimple call. > + Return the new gimple statement on success, otherwise return null. */ > + virtual gimple *fold_pfalse (gimple_folder &) const { return NULL; } > + > }; > > /* RAII class for enabling enough SVE features to define the built-in > @@ -829,6 +837,7 @@ extern tree acle_svprfop; > > bool vector_cst_all_same (tree, unsigned int); > bool is_ptrue (tree, unsigned int); > +bool is_pfalse (tree); > const function_instance *lookup_fndecl (tree); > > /* Try to find a mode with the given mode_suffix_info fields. Return the > diff --git a/gcc/testsuite/gcc.target/aarch64/pfalse-binary_0.c b/gcc/testsuite/gcc.target/aarch64/pfalse-binary_0.c > new file mode 100644 > index 00000000000..3910ab36b6b > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/pfalse-binary_0.c > @@ -0,0 +1,176 @@ > +#include <arm_sve.h> > + > +#define MXZ(F, RTY, TY1, TY2) \ > + RTY F##_f (TY1 op1, TY2 op2) \ > + { \ > + return sv##F (svpfalse_b (), op1, op2); \ > + } > + > +#define PRED_MXv(F, RTY, TYPE1, TYPE2, TY) \ > + MXZ (F##_##TY##_m, sv##RTY, sv##TYPE1, sv##TYPE2) \ > + MXZ (F##_##TY##_x, sv##RTY, sv##TYPE1, sv##TYPE2) > + > +#define PRED_Zv(F, RTY, TYPE1, TYPE2, TY) \ > + MXZ (F##_##TY##_z, sv##RTY, sv##TYPE1, sv##TYPE2) > + > +#define PRED_MXZv(F, RTY, TYPE1, TYPE2, TY) \ > + PRED_MXv (F, RTY, TYPE1, TYPE2, TY) \ > + PRED_Zv (F, RTY, TYPE1, TYPE2, TY) > + > +#define PRED_Z(F, RTY, TYPE1, TYPE2, TY) \ > + PRED_Zv (F, RTY, TYPE1, TYPE2, TY) \ > + MXZ (F##_n_##TY##_z, sv##RTY, sv##TYPE1, TYPE2) > + > +#define PRED_MXZ(F, RTY, TYPE1, TYPE2, TY) \ > + PRED_MXv (F, RTY, TYPE1, TYPE2, TY) \ > + MXZ (F##_n_##TY##_m, sv##RTY, sv##TYPE1, TYPE2) \ > + MXZ (F##_n_##TY##_x, sv##RTY, sv##TYPE1, TYPE2) \ > + PRED_Z (F, RTY, TYPE1, TYPE2, TY) > + > +#define PRED_IMPLICIT(F, RTY, TYPE1, TYPE2, TY) \ > + MXZ (F##_##TY, sv##RTY, sv##TYPE1, sv##TYPE2) > + > +#define ALL_Q_INTEGER(F, P) \ > + PRED_##P (F, uint8_t, uint8_t, uint8_t, u8) \ > + PRED_##P (F, int8_t, int8_t, int8_t, s8) > + > +#define ALL_Q_INTEGER_UINT(F, P) \ > + PRED_##P (F, uint8_t, uint8_t, uint8_t, u8) \ > + PRED_##P (F, int8_t, int8_t, uint8_t, s8) > + > +#define ALL_Q_INTEGER_INT(F, P) \ > + PRED_##P (F, uint8_t, uint8_t, int8_t, u8) \ > + PRED_##P (F, int8_t, int8_t, int8_t, s8) > + > +#define ALL_H_INTEGER(F, P) \ > + PRED_##P (F, uint16_t, uint16_t, uint16_t, u16) \ > + PRED_##P (F, int16_t, int16_t, int16_t, s16) > + > +#define ALL_H_INTEGER_UINT(F, P) \ > + PRED_##P (F, uint16_t, uint16_t, uint16_t, u16) \ > + PRED_##P (F, int16_t, int16_t, uint16_t, s16) > + > +#define ALL_H_INTEGER_INT(F, P) \ > + PRED_##P (F, uint16_t, uint16_t, int16_t, u16) \ > + PRED_##P (F, int16_t, int16_t, int16_t, s16) > + > +#define ALL_H_INTEGER_WIDE(F, P) \ > + PRED_##P (F, uint16_t, uint16_t, uint8_t, u16) \ > + PRED_##P (F, int16_t, int16_t, int8_t, s16) > + > +#define ALL_S_INTEGER(F, P) \ > + PRED_##P (F, uint32_t, uint32_t, uint32_t, u32) \ > + PRED_##P (F, int32_t, int32_t, int32_t, s32) > + > +#define ALL_S_INTEGER_UINT(F, P) \ > + PRED_##P (F, uint32_t, uint32_t, uint32_t, u32) \ > + PRED_##P (F, int32_t, int32_t, uint32_t, s32) > + > +#define ALL_S_INTEGER_INT(F, P) \ > + PRED_##P (F, uint32_t, uint32_t, int32_t, u32) \ > + PRED_##P (F, int32_t, int32_t, int32_t, s32) > + > +#define ALL_S_INTEGER_WIDE(F, P) \ > + PRED_##P (F, uint32_t, uint32_t, uint16_t, u32) \ > + PRED_##P (F, int32_t, int32_t, int16_t, s32) > + > +#define ALL_D_INTEGER(F, P) \ > + PRED_##P (F, uint64_t, uint64_t, uint64_t, u64) \ > + PRED_##P (F, int64_t, int64_t, int64_t, s64) > + > +#define ALL_D_INTEGER_UINT(F, P) \ > + PRED_##P (F, uint64_t, uint64_t, uint64_t, u64) \ > + PRED_##P (F, int64_t, int64_t, uint64_t, s64) > + > +#define ALL_D_INTEGER_INT(F, P) \ > + PRED_##P (F, uint64_t, uint64_t, int64_t, u64) \ > + PRED_##P (F, int64_t, int64_t, int64_t, s64) > + > +#define ALL_D_INTEGER_WIDE(F, P) \ > + PRED_##P (F, uint64_t, uint64_t, uint32_t, u64) \ > + PRED_##P (F, int64_t, int64_t, int32_t, s64) > + > +#define SD_INTEGER_TO_UINT(F, P) \ > + PRED_##P (F, uint32_t, uint32_t, uint32_t, u32) \ > + PRED_##P (F, uint64_t, uint64_t, uint64_t, u64) \ > + PRED_##P (F, uint32_t, int32_t, int32_t, s32) \ > + PRED_##P (F, uint64_t, int64_t, int64_t, s64) > + > +#define BHS_UNSIGNED_UINT64(F, P) \ > + PRED_##P (F, uint8_t, uint8_t, uint64_t, u8) \ > + PRED_##P (F, uint16_t, uint16_t, uint64_t, u16) \ > + PRED_##P (F, uint32_t, uint32_t, uint64_t, u32) > + > +#define BHS_SIGNED_UINT64(F, P) \ > + PRED_##P (F, int8_t, int8_t, uint64_t, s8) \ > + PRED_##P (F, int16_t, int16_t, uint64_t, s16) \ > + PRED_##P (F, int32_t, int32_t, uint64_t, s32) > + > +#define ALL_UNSIGNED_UINT(F, P) \ > + PRED_##P (F, uint8_t, uint8_t, uint8_t, u8) \ > + PRED_##P (F, uint16_t, uint16_t, uint16_t, u16) \ > + PRED_##P (F, uint32_t, uint32_t, uint32_t, u32) \ > + PRED_##P (F, uint64_t, uint64_t, uint64_t, u64) > + > +#define ALL_SIGNED_UINT(F, P) \ > + PRED_##P (F, int8_t, int8_t, uint8_t, s8) \ > + PRED_##P (F, int16_t, int16_t, uint16_t, s16) \ > + PRED_##P (F, int32_t, int32_t, uint32_t, s32) \ > + PRED_##P (F, int64_t, int64_t, uint64_t, s64) > + > +#define ALL_FLOAT(F, P) \ > + PRED_##P (F, float16_t, float16_t, float16_t, f16) \ > + PRED_##P (F, float32_t, float32_t, float32_t, f32) \ > + PRED_##P (F, float64_t, float64_t, float64_t, f64) > + > +#define ALL_FLOAT_INT(F, P) \ > + PRED_##P (F, float16_t, float16_t, int16_t, f16) \ > + PRED_##P (F, float32_t, float32_t, int32_t, f32) \ > + PRED_##P (F, float64_t, float64_t, int64_t, f64) > + > +#define B(F, P) \ > + PRED_##P (F, bool_t, bool_t, bool_t, b) > + > +#define ALL_SD_INTEGER(F, P) \ > + ALL_S_INTEGER (F, P) \ > + ALL_D_INTEGER (F, P) > + > +#define HSD_INTEGER_WIDE(F, P) \ > + ALL_H_INTEGER_WIDE (F, P) \ > + ALL_S_INTEGER_WIDE (F, P) \ > + ALL_D_INTEGER_WIDE (F, P) > + > +#define BHS_INTEGER_UINT64(F, P) \ > + BHS_UNSIGNED_UINT64 (F, P) \ > + BHS_SIGNED_UINT64 (F, P) > + > +#define ALL_INTEGER(F, P) \ > + ALL_Q_INTEGER (F, P) \ > + ALL_H_INTEGER (F, P) \ > + ALL_S_INTEGER (F, P) \ > + ALL_D_INTEGER (F, P) > + > +#define ALL_INTEGER_UINT(F, P) \ > + ALL_Q_INTEGER_UINT (F, P) \ > + ALL_H_INTEGER_UINT (F, P) \ > + ALL_S_INTEGER_UINT (F, P) \ > + ALL_D_INTEGER_UINT (F, P) > + > +#define ALL_INTEGER_INT(F, P) \ > + ALL_Q_INTEGER_INT (F, P) \ > + ALL_H_INTEGER_INT (F, P) \ > + ALL_S_INTEGER_INT (F, P) \ > + ALL_D_INTEGER_INT (F, P) > + > +#define ALL_FLOAT_AND_SD_INTEGER(F, P) \ > + ALL_SD_INTEGER (F, P) \ > + ALL_FLOAT (F, P) > + > +#define ALL_ARITH(F, P) \ > + ALL_INTEGER (F, P) \ > + ALL_FLOAT (F, P) > + > +#define ALL_DATA(F, P) \ > + ALL_ARITH (F, P) \ > + PRED_##P (F, bfloat16_t, bfloat16_t, bfloat16_t, bf16) > + > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary.c > new file mode 100644 > index 00000000000..ef629413185 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary.c > @@ -0,0 +1,13 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > + > +#include "../pfalse-binary_0.c" > + > +B (brkn, Zv) > +B (brkpa, Zv) > +B (brkpb, Zv) > +ALL_DATA (splice, IMPLICIT) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tpfalse\tp0\.b\n\tret\n} 3 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmov\tz0\.d, z1\.d\n\tret\n} 12 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 15 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_int_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_int_opt_n.c > new file mode 100644 > index 00000000000..91d574f9249 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_int_opt_n.c > @@ -0,0 +1,10 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > + > +#include "../pfalse-binary_0.c" > + > +ALL_FLOAT_INT (scale, MXZ) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 12 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 6 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 18 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_n.c > new file mode 100644 > index 00000000000..25c793ff40f > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_n.c > @@ -0,0 +1,30 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > + > +#include "../pfalse-binary_0.c" > + > +ALL_ARITH (abd, MXZ) > +ALL_ARITH (add, MXZ) > +ALL_INTEGER (and, MXZ) > +B (and, Zv) > +ALL_INTEGER (bic, MXZ) > +B (bic, Zv) > +ALL_FLOAT_AND_SD_INTEGER (div, MXZ) > +ALL_FLOAT_AND_SD_INTEGER (divr, MXZ) > +ALL_INTEGER (eor, MXZ) > +B (eor, Zv) > +ALL_ARITH (mul, MXZ) > +ALL_INTEGER (mulh, MXZ) > +ALL_FLOAT (mulx, MXZ) > +B (nand, Zv) > +B (nor, Zv) > +B (orn, Zv) > +ALL_INTEGER (orr, MXZ) > +B (orr, Zv) > +ALL_ARITH (sub, MXZ) > +ALL_ARITH (subr, MXZ) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 448 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 224 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tpfalse\tp0\.b\n\tret\n} 7 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 679 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_single_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_single_n.c > new file mode 100644 > index 00000000000..8d187c22eec > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_single_n.c > @@ -0,0 +1,13 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > + > +#include "../pfalse-binary_0.c" > + > +ALL_ARITH (max, MXZ) > +ALL_ARITH (min, MXZ) > +ALL_FLOAT (maxnm, MXZ) > +ALL_FLOAT (minnm, MXZ) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 112 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 56 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 168 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_rotate.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_rotate.c > new file mode 100644 > index 00000000000..9940866d5ef > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_rotate.c > @@ -0,0 +1,26 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > + > +#include <arm_sve.h> > + > +#define MXZ4(F, TYPE) \ > + TYPE F##_f (TYPE op1, TYPE op2) \ > + { \ > + return sv##F (svpfalse_b (), op1, op2, 90); \ > + } > + > +#define PRED_MXZ(F, TYPE, TY) \ > + MXZ4 (F##_##TY##_m, TYPE) \ > + MXZ4 (F##_##TY##_x, TYPE) \ > + MXZ4 (F##_##TY##_z, TYPE) > + > +#define ALL_FLOAT(F, P) \ > + PRED_##P (F, svfloat16_t, f16) \ > + PRED_##P (F, svfloat32_t, f32) \ > + PRED_##P (F, svfloat64_t, f64) > + > +ALL_FLOAT (cadd, MXZ) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 6 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 3 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 9 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint64_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint64_opt_n.c > new file mode 100644 > index 00000000000..f8fd18043e6 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint64_opt_n.c > @@ -0,0 +1,12 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > + > +#include "../pfalse-binary_0.c" > + > +BHS_SIGNED_UINT64 (asr_wide, MXZ) > +BHS_INTEGER_UINT64 (lsl_wide, MXZ) > +BHS_UNSIGNED_UINT64 (lsr_wide, MXZ) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 48 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 24 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 72 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint_opt_n.c > new file mode 100644 > index 00000000000..2f1d7721bc8 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint_opt_n.c > @@ -0,0 +1,12 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > + > +#include "../pfalse-binary_0.c" > + > +ALL_SIGNED_UINT (asr, MXZ) > +ALL_INTEGER_UINT (lsl, MXZ) > +ALL_UNSIGNED_UINT (lsr, MXZ) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 64 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 32 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 96 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary.c > new file mode 100644 > index 00000000000..723fcd0a203 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary.c > @@ -0,0 +1,13 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > + > +#include "../pfalse-binary_0.c" > + > +ALL_ARITH (addp, MXv) > +ALL_ARITH (maxp, MXv) > +ALL_FLOAT (maxnmp, MXv) > +ALL_ARITH (minp, MXv) > +ALL_FLOAT (minnmp, MXv) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 78 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 78 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_int_opt_single_n.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_int_opt_single_n.c > new file mode 100644 > index 00000000000..6e8be86f9b0 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_int_opt_single_n.c > @@ -0,0 +1,10 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > + > +#include "../pfalse-binary_0.c" > + > +ALL_INTEGER_INT (rshl, MXZ) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 32 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 16 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 48 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_opt_n.c > new file mode 100644 > index 00000000000..7335a4ff011 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_opt_n.c > @@ -0,0 +1,16 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > + > +#include "../pfalse-binary_0.c" > + > +ALL_INTEGER (hadd, MXZ) > +ALL_INTEGER (hsub, MXZ) > +ALL_INTEGER (hsubr, MXZ) > +ALL_INTEGER (qadd, MXZ) > +ALL_INTEGER (qsub, MXZ) > +ALL_INTEGER (qsubr, MXZ) > +ALL_INTEGER (rhadd, MXZ) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 224 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 112 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 336 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_to_uint.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_to_uint.c > new file mode 100644 > index 00000000000..e03e1e890f8 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_to_uint.c > @@ -0,0 +1,9 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > + > +#include "../pfalse-binary_0.c" > + > +SD_INTEGER_TO_UINT (histcnt, Zv) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 4 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 4 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_uint_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_uint_opt_n.c > new file mode 100644 > index 00000000000..2649fc01954 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_uint_opt_n.c > @@ -0,0 +1,10 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > + > +#include "../pfalse-binary_0.c" > + > +ALL_SIGNED_UINT (uqadd, MXZ) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 16 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 8 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 24 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_wide.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_wide.c > new file mode 100644 > index 00000000000..72693d01ad0 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_wide.c > @@ -0,0 +1,10 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > + > +#include "../pfalse-binary_0.c" > + > +HSD_INTEGER_WIDE (adalp, MXZv) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 12 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 6 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 18 } } */
Richard Sandiford <richard.sandiford@arm.com> writes: > Thanks for doing this and sorry for the slow review. > > Jennifer Schmitz <jschmitz@nvidia.com> writes: >> If an SVE intrinsic has predicate pfalse, we can fold the call to >> a simplified assignment statement: For _m, _x, and implicit predication, >> the LHS can be assigned the operand for inactive values and for _z, we can >> assign a zero vector. > > _x functions don't really specify the values of inactive lanes. > The values are instead "don't care" and so we have a free choice. > Using zeros (as for _z) should be more efficient than reusing one > of the inputs. > >> For example, >> svint32_t foo (svint32_t op1, svint32_t op2) >> { >> return svadd_s32_m (svpfalse_b (), op1, op2); >> } >> can be folded to lhs <- op1, such that foo is compiled to just a RET. >> >> We implemented this optimization during gimple folding by calling a new method >> function_shape::fold_pfalse from gimple_folder::fold. >> The implementations of fold_pfalse in the function_shape subclasses define the >> expected behavior for a pfalse predicate for the different predications >> and return an appropriate gimple statement. >> To avoid code duplication, function_shape::fold_pfalse calls a new method >> gimple_folder:fold_by_pred that takes arguments of type tree for each >> predication and returns the new assignment statement depending on the >> predication. >> >> We tested the new behavior for each intrinsic with all supported predications >> and data types and checked the produced assembly. There is a test file >> for each shape subclass with scan-assembler-times tests that look for >> the simplified instruction sequences, such as individual RET instructions >> or zeroing moves. There is an additional directive counting the total number of >> functions in the test, which must be the sum of counts of all other >> directives. This is to check that all tested intrinsics were optimized. >> >> In this patch, we only implemented function_shape::fold_pfalse for >> binary shapes. But we plan to cover more shapes in follow-up patches, >> after getting feedback on this patch. >> >> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression. >> OK for mainline? > > My main question is: can we not do this generically for _z, _x and _m, > without the virtual method? (I wanted to prototype this locally before > asking, hence the delay, but never found time.) The only snag is that > we need to check whether the inactive lanes for _m are specified using > a separate parameter before the predicate or whether they're taken from > the parameter after the predicate. But that should be a generic rule > that can be checked by looking at the types. > > That is, if the predication is PRED_z, PRED_m, or PRED_x, then I think > we can say that: > > (1) if the first argument is a vector and the second argument is a pfalse > predicate, the call folds to the first argument > > (2) if the first argument is a pfalse predicate and predication is PRED_m, > the call folds to the second argument > > (3) if the first argument is a pfalse predicate and predication is > not PRED_m, the call folds to zero. > > But perhaps I've forgotten a case. > > The implicit cases would still need to be handled separately. > Perhaps for that case we could add a general fold method to the shape? > I don't think we need to specialise the name of the virtual function > beyond just "fold". Actually, thinking more about it: I think we can handle some implicit cases directly too. If the call properties include CP_READ_MEMORY then the result should be zero. If the call properties include CP_WRITE_MEMORY or CP_PREFETCH_MEMORY then the call can be replaced by a nop. For the others, it would probably be better to stick to function-specific fold routines, rather than do it based on the shape. We could use common base classes for things like reductions. Thanks, Richard > > It would be good to put: > >> + gimple_seq stmts = NULL; >> + gimple *g = gimple_build_assign (lhs, new_lhs); >> + gimple_seq_add_stmt_without_update (&stmts, g); >> + gsi_replace_with_seq_vops (gsi, stmts); >> + return g; > > into a helper so that we can use it for any g. (Maybe with yet another > helper for folding to a gimple value, as here.) Some of the existing > routines could use that helper too, so it would make a natural prepatch. > > I think I've made that sound more complicated than it really is, sorry. > But I think the net effect should be less code, with automatic support > for unary operations. > > Thanks, > Richard > >> @@ -3666,6 +3683,39 @@ gimple_folder::fold_active_lanes_to (tree x) >> return gimple_build_assign (lhs, VEC_COND_EXPR, pred, x, vec_inactive); >> } >> >> +/* Fold call to assignment statement >> + lhs = new_lhs, >> + where new_lhs is determined by the predication. >> + Return the gimple statement on success, else return NULL. */ >> +gimple * >> +gimple_folder::fold_by_pred (tree m, tree x, tree z, tree implicit) >> +{ >> + tree new_lhs = NULL; >> + switch (pred) >> + { >> + case PRED_z: >> + new_lhs = z; >> + break; >> + case PRED_m: >> + new_lhs = m; >> + break; >> + case PRED_x: >> + new_lhs = x; >> + break; >> + case PRED_implicit: >> + new_lhs = implicit; >> + break; >> + default: >> + return NULL; >> + } >> + gcc_assert (new_lhs); >> + gimple_seq stmts = NULL; >> + gimple *g = gimple_build_assign (lhs, new_lhs); >> + gimple_seq_add_stmt_without_update (&stmts, g); >> + gsi_replace_with_seq_vops (gsi, stmts); >> + return g; >> +} >> + >> /* Try to fold the call. Return the new statement on success and null >> on failure. */ >> gimple * >> @@ -3685,6 +3735,9 @@ gimple_folder::fold () >> /* First try some simplifications that are common to many functions. */ >> if (auto *call = redirect_pred_x ()) >> return call; >> + if (pred != PRED_none) >> + if (auto *call = shape->fold_pfalse (*this)) >> + return call; >> >> return base->fold (*this); >> } >> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h b/gcc/config/aarch64/aarch64-sve-builtins.h >> index 4cdc0541bdc..4e443a8192e 100644 >> --- a/gcc/config/aarch64/aarch64-sve-builtins.h >> +++ b/gcc/config/aarch64/aarch64-sve-builtins.h >> @@ -632,12 +632,15 @@ public: >> gcall *redirect_call (const function_instance &); >> gimple *redirect_pred_x (); >> >> + bool arg_is_pfalse_p (unsigned int idx); >> + >> gimple *fold_to_cstu (poly_uint64); >> gimple *fold_to_pfalse (); >> gimple *fold_to_ptrue (); >> gimple *fold_to_vl_pred (unsigned int); >> gimple *fold_const_binary (enum tree_code); >> gimple *fold_active_lanes_to (tree); >> + gimple *fold_by_pred (tree, tree, tree, tree); >> >> gimple *fold (); >> >> @@ -796,6 +799,11 @@ public: >> /* Check whether the given call is semantically valid. Return true >> if it is, otherwise report an error and return false. */ >> virtual bool check (function_checker &) const { return true; } >> + >> + /* For a pfalse predicate, try to fold the given gimple call. >> + Return the new gimple statement on success, otherwise return null. */ >> + virtual gimple *fold_pfalse (gimple_folder &) const { return NULL; } >> + >> }; >> >> /* RAII class for enabling enough SVE features to define the built-in >> @@ -829,6 +837,7 @@ extern tree acle_svprfop; >> >> bool vector_cst_all_same (tree, unsigned int); >> bool is_ptrue (tree, unsigned int); >> +bool is_pfalse (tree); >> const function_instance *lookup_fndecl (tree); >> >> /* Try to find a mode with the given mode_suffix_info fields. Return the >> diff --git a/gcc/testsuite/gcc.target/aarch64/pfalse-binary_0.c b/gcc/testsuite/gcc.target/aarch64/pfalse-binary_0.c >> new file mode 100644 >> index 00000000000..3910ab36b6b >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/aarch64/pfalse-binary_0.c >> @@ -0,0 +1,176 @@ >> +#include <arm_sve.h> >> + >> +#define MXZ(F, RTY, TY1, TY2) \ >> + RTY F##_f (TY1 op1, TY2 op2) \ >> + { \ >> + return sv##F (svpfalse_b (), op1, op2); \ >> + } >> + >> +#define PRED_MXv(F, RTY, TYPE1, TYPE2, TY) \ >> + MXZ (F##_##TY##_m, sv##RTY, sv##TYPE1, sv##TYPE2) \ >> + MXZ (F##_##TY##_x, sv##RTY, sv##TYPE1, sv##TYPE2) >> + >> +#define PRED_Zv(F, RTY, TYPE1, TYPE2, TY) \ >> + MXZ (F##_##TY##_z, sv##RTY, sv##TYPE1, sv##TYPE2) >> + >> +#define PRED_MXZv(F, RTY, TYPE1, TYPE2, TY) \ >> + PRED_MXv (F, RTY, TYPE1, TYPE2, TY) \ >> + PRED_Zv (F, RTY, TYPE1, TYPE2, TY) >> + >> +#define PRED_Z(F, RTY, TYPE1, TYPE2, TY) \ >> + PRED_Zv (F, RTY, TYPE1, TYPE2, TY) \ >> + MXZ (F##_n_##TY##_z, sv##RTY, sv##TYPE1, TYPE2) >> + >> +#define PRED_MXZ(F, RTY, TYPE1, TYPE2, TY) \ >> + PRED_MXv (F, RTY, TYPE1, TYPE2, TY) \ >> + MXZ (F##_n_##TY##_m, sv##RTY, sv##TYPE1, TYPE2) \ >> + MXZ (F##_n_##TY##_x, sv##RTY, sv##TYPE1, TYPE2) \ >> + PRED_Z (F, RTY, TYPE1, TYPE2, TY) >> + >> +#define PRED_IMPLICIT(F, RTY, TYPE1, TYPE2, TY) \ >> + MXZ (F##_##TY, sv##RTY, sv##TYPE1, sv##TYPE2) >> + >> +#define ALL_Q_INTEGER(F, P) \ >> + PRED_##P (F, uint8_t, uint8_t, uint8_t, u8) \ >> + PRED_##P (F, int8_t, int8_t, int8_t, s8) >> + >> +#define ALL_Q_INTEGER_UINT(F, P) \ >> + PRED_##P (F, uint8_t, uint8_t, uint8_t, u8) \ >> + PRED_##P (F, int8_t, int8_t, uint8_t, s8) >> + >> +#define ALL_Q_INTEGER_INT(F, P) \ >> + PRED_##P (F, uint8_t, uint8_t, int8_t, u8) \ >> + PRED_##P (F, int8_t, int8_t, int8_t, s8) >> + >> +#define ALL_H_INTEGER(F, P) \ >> + PRED_##P (F, uint16_t, uint16_t, uint16_t, u16) \ >> + PRED_##P (F, int16_t, int16_t, int16_t, s16) >> + >> +#define ALL_H_INTEGER_UINT(F, P) \ >> + PRED_##P (F, uint16_t, uint16_t, uint16_t, u16) \ >> + PRED_##P (F, int16_t, int16_t, uint16_t, s16) >> + >> +#define ALL_H_INTEGER_INT(F, P) \ >> + PRED_##P (F, uint16_t, uint16_t, int16_t, u16) \ >> + PRED_##P (F, int16_t, int16_t, int16_t, s16) >> + >> +#define ALL_H_INTEGER_WIDE(F, P) \ >> + PRED_##P (F, uint16_t, uint16_t, uint8_t, u16) \ >> + PRED_##P (F, int16_t, int16_t, int8_t, s16) >> + >> +#define ALL_S_INTEGER(F, P) \ >> + PRED_##P (F, uint32_t, uint32_t, uint32_t, u32) \ >> + PRED_##P (F, int32_t, int32_t, int32_t, s32) >> + >> +#define ALL_S_INTEGER_UINT(F, P) \ >> + PRED_##P (F, uint32_t, uint32_t, uint32_t, u32) \ >> + PRED_##P (F, int32_t, int32_t, uint32_t, s32) >> + >> +#define ALL_S_INTEGER_INT(F, P) \ >> + PRED_##P (F, uint32_t, uint32_t, int32_t, u32) \ >> + PRED_##P (F, int32_t, int32_t, int32_t, s32) >> + >> +#define ALL_S_INTEGER_WIDE(F, P) \ >> + PRED_##P (F, uint32_t, uint32_t, uint16_t, u32) \ >> + PRED_##P (F, int32_t, int32_t, int16_t, s32) >> + >> +#define ALL_D_INTEGER(F, P) \ >> + PRED_##P (F, uint64_t, uint64_t, uint64_t, u64) \ >> + PRED_##P (F, int64_t, int64_t, int64_t, s64) >> + >> +#define ALL_D_INTEGER_UINT(F, P) \ >> + PRED_##P (F, uint64_t, uint64_t, uint64_t, u64) \ >> + PRED_##P (F, int64_t, int64_t, uint64_t, s64) >> + >> +#define ALL_D_INTEGER_INT(F, P) \ >> + PRED_##P (F, uint64_t, uint64_t, int64_t, u64) \ >> + PRED_##P (F, int64_t, int64_t, int64_t, s64) >> + >> +#define ALL_D_INTEGER_WIDE(F, P) \ >> + PRED_##P (F, uint64_t, uint64_t, uint32_t, u64) \ >> + PRED_##P (F, int64_t, int64_t, int32_t, s64) >> + >> +#define SD_INTEGER_TO_UINT(F, P) \ >> + PRED_##P (F, uint32_t, uint32_t, uint32_t, u32) \ >> + PRED_##P (F, uint64_t, uint64_t, uint64_t, u64) \ >> + PRED_##P (F, uint32_t, int32_t, int32_t, s32) \ >> + PRED_##P (F, uint64_t, int64_t, int64_t, s64) >> + >> +#define BHS_UNSIGNED_UINT64(F, P) \ >> + PRED_##P (F, uint8_t, uint8_t, uint64_t, u8) \ >> + PRED_##P (F, uint16_t, uint16_t, uint64_t, u16) \ >> + PRED_##P (F, uint32_t, uint32_t, uint64_t, u32) >> + >> +#define BHS_SIGNED_UINT64(F, P) \ >> + PRED_##P (F, int8_t, int8_t, uint64_t, s8) \ >> + PRED_##P (F, int16_t, int16_t, uint64_t, s16) \ >> + PRED_##P (F, int32_t, int32_t, uint64_t, s32) >> + >> +#define ALL_UNSIGNED_UINT(F, P) \ >> + PRED_##P (F, uint8_t, uint8_t, uint8_t, u8) \ >> + PRED_##P (F, uint16_t, uint16_t, uint16_t, u16) \ >> + PRED_##P (F, uint32_t, uint32_t, uint32_t, u32) \ >> + PRED_##P (F, uint64_t, uint64_t, uint64_t, u64) >> + >> +#define ALL_SIGNED_UINT(F, P) \ >> + PRED_##P (F, int8_t, int8_t, uint8_t, s8) \ >> + PRED_##P (F, int16_t, int16_t, uint16_t, s16) \ >> + PRED_##P (F, int32_t, int32_t, uint32_t, s32) \ >> + PRED_##P (F, int64_t, int64_t, uint64_t, s64) >> + >> +#define ALL_FLOAT(F, P) \ >> + PRED_##P (F, float16_t, float16_t, float16_t, f16) \ >> + PRED_##P (F, float32_t, float32_t, float32_t, f32) \ >> + PRED_##P (F, float64_t, float64_t, float64_t, f64) >> + >> +#define ALL_FLOAT_INT(F, P) \ >> + PRED_##P (F, float16_t, float16_t, int16_t, f16) \ >> + PRED_##P (F, float32_t, float32_t, int32_t, f32) \ >> + PRED_##P (F, float64_t, float64_t, int64_t, f64) >> + >> +#define B(F, P) \ >> + PRED_##P (F, bool_t, bool_t, bool_t, b) >> + >> +#define ALL_SD_INTEGER(F, P) \ >> + ALL_S_INTEGER (F, P) \ >> + ALL_D_INTEGER (F, P) >> + >> +#define HSD_INTEGER_WIDE(F, P) \ >> + ALL_H_INTEGER_WIDE (F, P) \ >> + ALL_S_INTEGER_WIDE (F, P) \ >> + ALL_D_INTEGER_WIDE (F, P) >> + >> +#define BHS_INTEGER_UINT64(F, P) \ >> + BHS_UNSIGNED_UINT64 (F, P) \ >> + BHS_SIGNED_UINT64 (F, P) >> + >> +#define ALL_INTEGER(F, P) \ >> + ALL_Q_INTEGER (F, P) \ >> + ALL_H_INTEGER (F, P) \ >> + ALL_S_INTEGER (F, P) \ >> + ALL_D_INTEGER (F, P) >> + >> +#define ALL_INTEGER_UINT(F, P) \ >> + ALL_Q_INTEGER_UINT (F, P) \ >> + ALL_H_INTEGER_UINT (F, P) \ >> + ALL_S_INTEGER_UINT (F, P) \ >> + ALL_D_INTEGER_UINT (F, P) >> + >> +#define ALL_INTEGER_INT(F, P) \ >> + ALL_Q_INTEGER_INT (F, P) \ >> + ALL_H_INTEGER_INT (F, P) \ >> + ALL_S_INTEGER_INT (F, P) \ >> + ALL_D_INTEGER_INT (F, P) >> + >> +#define ALL_FLOAT_AND_SD_INTEGER(F, P) \ >> + ALL_SD_INTEGER (F, P) \ >> + ALL_FLOAT (F, P) >> + >> +#define ALL_ARITH(F, P) \ >> + ALL_INTEGER (F, P) \ >> + ALL_FLOAT (F, P) >> + >> +#define ALL_DATA(F, P) \ >> + ALL_ARITH (F, P) \ >> + PRED_##P (F, bfloat16_t, bfloat16_t, bfloat16_t, bf16) >> + >> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary.c >> new file mode 100644 >> index 00000000000..ef629413185 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary.c >> @@ -0,0 +1,13 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-O2" } */ >> + >> +#include "../pfalse-binary_0.c" >> + >> +B (brkn, Zv) >> +B (brkpa, Zv) >> +B (brkpb, Zv) >> +ALL_DATA (splice, IMPLICIT) >> + >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tpfalse\tp0\.b\n\tret\n} 3 } } */ >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmov\tz0\.d, z1\.d\n\tret\n} 12 } } */ >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 15 } } */ >> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_int_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_int_opt_n.c >> new file mode 100644 >> index 00000000000..91d574f9249 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_int_opt_n.c >> @@ -0,0 +1,10 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-O2" } */ >> + >> +#include "../pfalse-binary_0.c" >> + >> +ALL_FLOAT_INT (scale, MXZ) >> + >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 12 } } */ >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 6 } } */ >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 18 } } */ >> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_n.c >> new file mode 100644 >> index 00000000000..25c793ff40f >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_n.c >> @@ -0,0 +1,30 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-O2" } */ >> + >> +#include "../pfalse-binary_0.c" >> + >> +ALL_ARITH (abd, MXZ) >> +ALL_ARITH (add, MXZ) >> +ALL_INTEGER (and, MXZ) >> +B (and, Zv) >> +ALL_INTEGER (bic, MXZ) >> +B (bic, Zv) >> +ALL_FLOAT_AND_SD_INTEGER (div, MXZ) >> +ALL_FLOAT_AND_SD_INTEGER (divr, MXZ) >> +ALL_INTEGER (eor, MXZ) >> +B (eor, Zv) >> +ALL_ARITH (mul, MXZ) >> +ALL_INTEGER (mulh, MXZ) >> +ALL_FLOAT (mulx, MXZ) >> +B (nand, Zv) >> +B (nor, Zv) >> +B (orn, Zv) >> +ALL_INTEGER (orr, MXZ) >> +B (orr, Zv) >> +ALL_ARITH (sub, MXZ) >> +ALL_ARITH (subr, MXZ) >> + >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 448 } } */ >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 224 } } */ >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tpfalse\tp0\.b\n\tret\n} 7 } } */ >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 679 } } */ >> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_single_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_single_n.c >> new file mode 100644 >> index 00000000000..8d187c22eec >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_single_n.c >> @@ -0,0 +1,13 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-O2" } */ >> + >> +#include "../pfalse-binary_0.c" >> + >> +ALL_ARITH (max, MXZ) >> +ALL_ARITH (min, MXZ) >> +ALL_FLOAT (maxnm, MXZ) >> +ALL_FLOAT (minnm, MXZ) >> + >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 112 } } */ >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 56 } } */ >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 168 } } */ >> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_rotate.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_rotate.c >> new file mode 100644 >> index 00000000000..9940866d5ef >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_rotate.c >> @@ -0,0 +1,26 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-O2" } */ >> + >> +#include <arm_sve.h> >> + >> +#define MXZ4(F, TYPE) \ >> + TYPE F##_f (TYPE op1, TYPE op2) \ >> + { \ >> + return sv##F (svpfalse_b (), op1, op2, 90); \ >> + } >> + >> +#define PRED_MXZ(F, TYPE, TY) \ >> + MXZ4 (F##_##TY##_m, TYPE) \ >> + MXZ4 (F##_##TY##_x, TYPE) \ >> + MXZ4 (F##_##TY##_z, TYPE) >> + >> +#define ALL_FLOAT(F, P) \ >> + PRED_##P (F, svfloat16_t, f16) \ >> + PRED_##P (F, svfloat32_t, f32) \ >> + PRED_##P (F, svfloat64_t, f64) >> + >> +ALL_FLOAT (cadd, MXZ) >> + >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 6 } } */ >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 3 } } */ >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 9 } } */ >> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint64_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint64_opt_n.c >> new file mode 100644 >> index 00000000000..f8fd18043e6 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint64_opt_n.c >> @@ -0,0 +1,12 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-O2" } */ >> + >> +#include "../pfalse-binary_0.c" >> + >> +BHS_SIGNED_UINT64 (asr_wide, MXZ) >> +BHS_INTEGER_UINT64 (lsl_wide, MXZ) >> +BHS_UNSIGNED_UINT64 (lsr_wide, MXZ) >> + >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 48 } } */ >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 24 } } */ >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 72 } } */ >> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint_opt_n.c >> new file mode 100644 >> index 00000000000..2f1d7721bc8 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint_opt_n.c >> @@ -0,0 +1,12 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-O2" } */ >> + >> +#include "../pfalse-binary_0.c" >> + >> +ALL_SIGNED_UINT (asr, MXZ) >> +ALL_INTEGER_UINT (lsl, MXZ) >> +ALL_UNSIGNED_UINT (lsr, MXZ) >> + >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 64 } } */ >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 32 } } */ >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 96 } } */ >> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary.c >> new file mode 100644 >> index 00000000000..723fcd0a203 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary.c >> @@ -0,0 +1,13 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-O2" } */ >> + >> +#include "../pfalse-binary_0.c" >> + >> +ALL_ARITH (addp, MXv) >> +ALL_ARITH (maxp, MXv) >> +ALL_FLOAT (maxnmp, MXv) >> +ALL_ARITH (minp, MXv) >> +ALL_FLOAT (minnmp, MXv) >> + >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 78 } } */ >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 78 } } */ >> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_int_opt_single_n.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_int_opt_single_n.c >> new file mode 100644 >> index 00000000000..6e8be86f9b0 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_int_opt_single_n.c >> @@ -0,0 +1,10 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-O2" } */ >> + >> +#include "../pfalse-binary_0.c" >> + >> +ALL_INTEGER_INT (rshl, MXZ) >> + >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 32 } } */ >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 16 } } */ >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 48 } } */ >> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_opt_n.c >> new file mode 100644 >> index 00000000000..7335a4ff011 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_opt_n.c >> @@ -0,0 +1,16 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-O2" } */ >> + >> +#include "../pfalse-binary_0.c" >> + >> +ALL_INTEGER (hadd, MXZ) >> +ALL_INTEGER (hsub, MXZ) >> +ALL_INTEGER (hsubr, MXZ) >> +ALL_INTEGER (qadd, MXZ) >> +ALL_INTEGER (qsub, MXZ) >> +ALL_INTEGER (qsubr, MXZ) >> +ALL_INTEGER (rhadd, MXZ) >> + >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 224 } } */ >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 112 } } */ >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 336 } } */ >> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_to_uint.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_to_uint.c >> new file mode 100644 >> index 00000000000..e03e1e890f8 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_to_uint.c >> @@ -0,0 +1,9 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-O2" } */ >> + >> +#include "../pfalse-binary_0.c" >> + >> +SD_INTEGER_TO_UINT (histcnt, Zv) >> + >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 4 } } */ >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 4 } } */ >> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_uint_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_uint_opt_n.c >> new file mode 100644 >> index 00000000000..2649fc01954 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_uint_opt_n.c >> @@ -0,0 +1,10 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-O2" } */ >> + >> +#include "../pfalse-binary_0.c" >> + >> +ALL_SIGNED_UINT (uqadd, MXZ) >> + >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 16 } } */ >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 8 } } */ >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 24 } } */ >> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_wide.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_wide.c >> new file mode 100644 >> index 00000000000..72693d01ad0 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_wide.c >> @@ -0,0 +1,10 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-O2" } */ >> + >> +#include "../pfalse-binary_0.c" >> + >> +HSD_INTEGER_WIDE (adalp, MXZv) >> + >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 12 } } */ >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 6 } } */ >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 18 } } */
> On 7 Nov 2024, at 08:49, Richard Sandiford <richard.sandiford@arm.com> wrote: > > External email: Use caution opening links or attachments > > > Richard Sandiford <richard.sandiford@arm.com> writes: >> Thanks for doing this and sorry for the slow review. >> >> Jennifer Schmitz <jschmitz@nvidia.com> writes: >>> If an SVE intrinsic has predicate pfalse, we can fold the call to >>> a simplified assignment statement: For _m, _x, and implicit predication, >>> the LHS can be assigned the operand for inactive values and for _z, we can >>> assign a zero vector. >> >> _x functions don't really specify the values of inactive lanes. >> The values are instead "don't care" and so we have a free choice. >> Using zeros (as for _z) should be more efficient than reusing one >> of the inputs. >> >>> For example, >>> svint32_t foo (svint32_t op1, svint32_t op2) >>> { >>> return svadd_s32_m (svpfalse_b (), op1, op2); >>> } >>> can be folded to lhs <- op1, such that foo is compiled to just a RET. >>> >>> We implemented this optimization during gimple folding by calling a new method >>> function_shape::fold_pfalse from gimple_folder::fold. >>> The implementations of fold_pfalse in the function_shape subclasses define the >>> expected behavior for a pfalse predicate for the different predications >>> and return an appropriate gimple statement. >>> To avoid code duplication, function_shape::fold_pfalse calls a new method >>> gimple_folder:fold_by_pred that takes arguments of type tree for each >>> predication and returns the new assignment statement depending on the >>> predication. >>> >>> We tested the new behavior for each intrinsic with all supported predications >>> and data types and checked the produced assembly. There is a test file >>> for each shape subclass with scan-assembler-times tests that look for >>> the simplified instruction sequences, such as individual RET instructions >>> or zeroing moves. There is an additional directive counting the total number of >>> functions in the test, which must be the sum of counts of all other >>> directives. This is to check that all tested intrinsics were optimized. >>> >>> In this patch, we only implemented function_shape::fold_pfalse for >>> binary shapes. But we plan to cover more shapes in follow-up patches, >>> after getting feedback on this patch. >>> >>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression. >>> OK for mainline? >> >> My main question is: can we not do this generically for _z, _x and _m, >> without the virtual method? (I wanted to prototype this locally before >> asking, hence the delay, but never found time.) The only snag is that >> we need to check whether the inactive lanes for _m are specified using >> a separate parameter before the predicate or whether they're taken from >> the parameter after the predicate. But that should be a generic rule >> that can be checked by looking at the types. >> >> That is, if the predication is PRED_z, PRED_m, or PRED_x, then I think >> we can say that: >> >> (1) if the first argument is a vector and the second argument is a pfalse >> predicate, the call folds to the first argument >> >> (2) if the first argument is a pfalse predicate and predication is PRED_m, >> the call folds to the second argument >> >> (3) if the first argument is a pfalse predicate and predication is >> not PRED_m, the call folds to zero. >> >> But perhaps I've forgotten a case. >> >> The implicit cases would still need to be handled separately. >> Perhaps for that case we could add a general fold method to the shape? >> I don't think we need to specialise the name of the virtual function >> beyond just "fold". > > Actually, thinking more about it: I think we can handle some implicit cases > directly too. If the call properties include CP_READ_MEMORY then the result > should be zero. If the call properties include CP_WRITE_MEMORY or > CP_PREFETCH_MEMORY then the call can be replaced by a nop. > > For the others, it would probably be better to stick to function-specific > fold routines, rather than do it based on the shape. We could use common > base classes for things like reductions. > Thanks for the feedback. It was possible to use these rules to implement the folding generically in a gimple_folder class method. Some few intrinsics did not fit this pattern and I added guards to ensure correct behavior. Below is the revised patch that addresses all intrinsic shapes, not just the binary ones that were covered in the first version of the patch. Thanks, Jennifer If an SVE intrinsic has predicate pfalse, we can fold the call to a simplified assignment statement: For _m predication, the LHS can be assigned the operand for inactive values and for _z, we can assign a zero vector. For _x, the returned values can be arbitrary and as suggested by Richard Sandiford, we fold to a zero vector. For example, svint32_t foo (svint32_t op1, svint32_t op2) { return svadd_s32_m (svpfalse_b (), op1, op2); } can be folded to lhs = op1, such that foo is compiled to just a RET. For implicit predication, a case distinction is necessary: Intrinsics that read from memory can be folded to a zero vector. Intrinsics that write to memory or prefetch can be folded to a no-op. Other intrinsics need case-by-case implemenation, which we added in the corresponding svxxx_impl::fold. We implemented this optimization during gimple folding by calling a new method gimple_folder::fold_pfalse from gimple_folder::fold, which covers the generic cases described above. We tested the new behavior for each intrinsic with all supported predications and data types and checked the produced assembly. There is a test file for each shape subclass with scan-assembler-times tests that look for the simplified instruction sequences, such as individual RET instructions or zeroing moves. There is an additional directive counting the total number of functions in the test, which must be the sum of counts of all other directives. This is to check that all tested intrinsics were optimized. Some few intrinsics were not covered by this patch: - svlasta and svlastb already have an implementation to cover a pfalse predicate. No changes were made to them. - svld1/2/3/4 return aggregate types and were excluded from the case that folds calls with implicit predication to lhs = {0, ...}. - svst1/2/3/4 already have an implementation in svstx_impl that precedes our optimization, such that it is not triggered. The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression. OK for mainline? Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com> gcc/ChangeLog: PR target/106329 * config/aarch64/aarch64-sve-builtins-base.cc (class svaddv_impl): Add folding if pfalse predicate. (class svandv_impl): Likewise. (class svcompact_impl): Likewise. (class svcvtnt_impl): Likewise. (class sveorv_impl): Likewise. (class svminv_impl): Likewise. (class svmaxnmv_impl): Likewise. (class svmaxv_impl): Likewise. (class svminnmv_impl): Likewise. (class svorv_impl): Likewise. (class svsplice_impl): Likewise. * config/aarch64/aarch64-sve-builtins-sve2.cc (class svcvtxnt_impl): Likewise. (class svqshlu_impl): Likewise. * config/aarch64/aarch64-sve-builtins.cc (is_pfalse): Return true if tree is pfalse. (gimple_folder::fold_pfalse): Fold calls with pfalse predicate. (gimple_folder::fold_call_to): Fold call to lhs = t for given tree t. (gimple_folder::fold_to_stmt_vops): Helper function that folds the call to given stmt and adjusts virtual operands. (gimple_folder::fold): Call fold_pfalse. * config/aarch64/aarch64-sve-builtins.h (is_pfalse): Declare is_pfalse. gcc/testsuite/ChangeLog: PR target/106329 * gcc.target/aarch64/pfalse-binary_0.c: New test. * gcc.target/aarch64/pfalse-unary_0.c: New test. * gcc.target/aarch64/sve/pfalse-binary.c: New test. * gcc.target/aarch64/sve/pfalse-binary_int_opt_n.c: New test. * gcc.target/aarch64/sve/pfalse-binary_opt_n.c: New test. * gcc.target/aarch64/sve/pfalse-binary_opt_single_n.c: New test. * gcc.target/aarch64/sve/pfalse-binary_rotate.c: New test. * gcc.target/aarch64/sve/pfalse-binary_uint64_opt_n.c: New test. * gcc.target/aarch64/sve/pfalse-binary_uint_opt_n.c: New test. * gcc.target/aarch64/sve/pfalse-binaryxn.c: New test. * gcc.target/aarch64/sve/pfalse-clast.c: New test. * gcc.target/aarch64/sve/pfalse-compare_opt_n.c: New test. * gcc.target/aarch64/sve/pfalse-compare_wide_opt_n.c: New test. * gcc.target/aarch64/sve/pfalse-count_pred.c: New test. * gcc.target/aarch64/sve/pfalse-fold_left.c: New test. * gcc.target/aarch64/sve/pfalse-load.c: New test. * gcc.target/aarch64/sve/pfalse-load_ext.c: New test. * gcc.target/aarch64/sve/pfalse-load_ext_gather_index.c: New test. * gcc.target/aarch64/sve/pfalse-load_ext_gather_offset.c: New test. * gcc.target/aarch64/sve/pfalse-load_gather_sv.c: New test. * gcc.target/aarch64/sve/pfalse-load_gather_vs.c: New test. * gcc.target/aarch64/sve/pfalse-load_replicate.c: New test. * gcc.target/aarch64/sve/pfalse-prefetch.c: New test. * gcc.target/aarch64/sve/pfalse-prefetch_gather_index.c: New test. * gcc.target/aarch64/sve/pfalse-prefetch_gather_offset.c: New test. * gcc.target/aarch64/sve/pfalse-ptest.c: New test. * gcc.target/aarch64/sve/pfalse-rdffr.c: New test. * gcc.target/aarch64/sve/pfalse-reduction.c: New test. * gcc.target/aarch64/sve/pfalse-reduction_wide.c: New test. * gcc.target/aarch64/sve/pfalse-shift_right_imm.c: New test. * gcc.target/aarch64/sve/pfalse-store.c: New test. * gcc.target/aarch64/sve/pfalse-store_scatter_index.c: New test. * gcc.target/aarch64/sve/pfalse-store_scatter_offset.c: New test. * gcc.target/aarch64/sve/pfalse-storexn.c: New test. * gcc.target/aarch64/sve/pfalse-ternary_opt_n.c: New test. * gcc.target/aarch64/sve/pfalse-ternary_rotate.c: New test. * gcc.target/aarch64/sve/pfalse-unary.c: New test. * gcc.target/aarch64/sve/pfalse-unary_convert_narrowt.c: New test. * gcc.target/aarch64/sve/pfalse-unary_convertxn.c: New test. * gcc.target/aarch64/sve/pfalse-unary_n.c: New test. * gcc.target/aarch64/sve/pfalse-unary_pred.c: New test. * gcc.target/aarch64/sve/pfalse-unary_to_uint.c: New test. * gcc.target/aarch64/sve/pfalse-unaryxn.c: New test. * gcc.target/aarch64/sve2/pfalse-binary.c: New test. * gcc.target/aarch64/sve2/pfalse-binary_int_opt_n.c: New test. * gcc.target/aarch64/sve2/pfalse-binary_int_opt_single_n.c: New test. * gcc.target/aarch64/sve2/pfalse-binary_opt_n.c: New test. * gcc.target/aarch64/sve2/pfalse-binary_opt_single_n.c: New test. * gcc.target/aarch64/sve2/pfalse-binary_to_uint.c: New test. * gcc.target/aarch64/sve2/pfalse-binary_uint_opt_n.c: New test. * gcc.target/aarch64/sve2/pfalse-binary_wide.c: New test. * gcc.target/aarch64/sve2/pfalse-compare.c: New test. * gcc.target/aarch64/sve2/pfalse-load_ext_gather_index_restricted.c: New test. * gcc.target/aarch64/sve2/pfalse-load_ext_gather_offset_restricted.c: New test. * gcc.target/aarch64/sve2/pfalse-load_gather_sv_restricted.c: New test. * gcc.target/aarch64/sve2/pfalse-load_gather_vs.c: New test. * gcc.target/aarch64/sve2/pfalse-shift_left_imm_to_uint.c: New test. * gcc.target/aarch64/sve2/pfalse-shift_right_imm.c: New test. * gcc.target/aarch64/sve2/pfalse-store_scatter_index_restricted.c: New test. * gcc.target/aarch64/sve2/pfalse-store_scatter_offset_restricted.c: New test. * gcc.target/aarch64/sve2/pfalse-unary.c: New test. * gcc.target/aarch64/sve2/pfalse-unary_convert.c: New test. * gcc.target/aarch64/sve2/pfalse-unary_convert_narrowt.c: New test. * gcc.target/aarch64/sve2/pfalse-unary_to_int.c: New test. --- .../aarch64/aarch64-sve-builtins-base.cc | 266 +++++++++++++++++- .../aarch64/aarch64-sve-builtins-sve2.cc | 42 ++- gcc/config/aarch64/aarch64-sve-builtins.cc | 73 +++++ gcc/config/aarch64/aarch64-sve-builtins.h | 4 + .../gcc.target/aarch64/pfalse-binary_0.c | 240 ++++++++++++++++ .../gcc.target/aarch64/pfalse-unary_0.c | 195 +++++++++++++ .../gcc.target/aarch64/sve/pfalse-binary.c | 13 + .../aarch64/sve/pfalse-binary_int_opt_n.c | 10 + .../aarch64/sve/pfalse-binary_opt_n.c | 30 ++ .../aarch64/sve/pfalse-binary_opt_single_n.c | 13 + .../aarch64/sve/pfalse-binary_rotate.c | 26 ++ .../aarch64/sve/pfalse-binary_uint64_opt_n.c | 12 + .../aarch64/sve/pfalse-binary_uint_opt_n.c | 12 + .../gcc.target/aarch64/sve/pfalse-binaryxn.c | 30 ++ .../gcc.target/aarch64/sve/pfalse-clast.c | 10 + .../aarch64/sve/pfalse-compare_opt_n.c | 19 ++ .../aarch64/sve/pfalse-compare_wide_opt_n.c | 14 + .../aarch64/sve/pfalse-count_pred.c | 9 + .../gcc.target/aarch64/sve/pfalse-fold_left.c | 9 + .../gcc.target/aarch64/sve/pfalse-load.c | 31 ++ .../gcc.target/aarch64/sve/pfalse-load_ext.c | 39 +++ .../sve/pfalse-load_ext_gather_index.c | 50 ++++ .../sve/pfalse-load_ext_gather_offset.c | 70 +++++ .../aarch64/sve/pfalse-load_gather_sv.c | 31 ++ .../aarch64/sve/pfalse-load_gather_vs.c | 36 +++ .../aarch64/sve/pfalse-load_replicate.c | 30 ++ .../gcc.target/aarch64/sve/pfalse-prefetch.c | 21 ++ .../sve/pfalse-prefetch_gather_index.c | 26 ++ .../sve/pfalse-prefetch_gather_offset.c | 24 ++ .../gcc.target/aarch64/sve/pfalse-ptest.c | 11 + .../gcc.target/aarch64/sve/pfalse-rdffr.c | 12 + .../gcc.target/aarch64/sve/pfalse-reduction.c | 18 ++ .../aarch64/sve/pfalse-reduction_wide.c | 9 + .../aarch64/sve/pfalse-shift_right_imm.c | 27 ++ .../gcc.target/aarch64/sve/pfalse-store.c | 50 ++++ .../aarch64/sve/pfalse-store_scatter_index.c | 39 +++ .../aarch64/sve/pfalse-store_scatter_offset.c | 67 +++++ .../gcc.target/aarch64/sve/pfalse-storexn.c | 29 ++ .../aarch64/sve/pfalse-ternary_opt_n.c | 50 ++++ .../aarch64/sve/pfalse-ternary_rotate.c | 26 ++ .../gcc.target/aarch64/sve/pfalse-unary.c | 32 +++ .../sve/pfalse-unary_convert_narrowt.c | 20 ++ .../aarch64/sve/pfalse-unary_convertxn.c | 49 ++++ .../gcc.target/aarch64/sve/pfalse-unary_n.c | 42 +++ .../aarch64/sve/pfalse-unary_pred.c | 9 + .../aarch64/sve/pfalse-unary_to_uint.c | 12 + .../gcc.target/aarch64/sve/pfalse-unaryxn.c | 13 + .../gcc.target/aarch64/sve2/pfalse-binary.c | 14 + .../aarch64/sve2/pfalse-binary_int_opt_n.c | 12 + .../sve2/pfalse-binary_int_opt_single_n.c | 10 + .../aarch64/sve2/pfalse-binary_opt_n.c | 16 ++ .../aarch64/sve2/pfalse-binary_opt_single_n.c | 11 + .../aarch64/sve2/pfalse-binary_to_uint.c | 9 + .../aarch64/sve2/pfalse-binary_uint_opt_n.c | 10 + .../aarch64/sve2/pfalse-binary_wide.c | 10 + .../gcc.target/aarch64/sve2/pfalse-compare.c | 10 + .../pfalse-load_ext_gather_index_restricted.c | 45 +++ ...pfalse-load_ext_gather_offset_restricted.c | 64 +++++ .../sve2/pfalse-load_gather_sv_restricted.c | 27 ++ .../aarch64/sve2/pfalse-load_gather_vs.c | 35 +++ .../sve2/pfalse-shift_left_imm_to_uint.c | 27 ++ .../aarch64/sve2/pfalse-shift_right_imm.c | 31 ++ .../pfalse-store_scatter_index_restricted.c | 49 ++++ .../pfalse-store_scatter_offset_restricted.c | 64 +++++ .../gcc.target/aarch64/sve2/pfalse-unary.c | 31 ++ .../aarch64/sve2/pfalse-unary_convert.c | 30 ++ .../sve2/pfalse-unary_convert_narrowt.c | 22 ++ .../aarch64/sve2/pfalse-unary_to_int.c | 10 + 68 files changed, 2422 insertions(+), 15 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/pfalse-binary_0.c create mode 100644 gcc/testsuite/gcc.target/aarch64/pfalse-unary_0.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_int_opt_n.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_n.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_single_n.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_rotate.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint64_opt_n.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint_opt_n.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-binaryxn.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-clast.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-compare_opt_n.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-compare_wide_opt_n.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-count_pred.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-fold_left.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-load.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_ext.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_ext_gather_index.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_ext_gather_offset.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_gather_sv.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_gather_vs.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_replicate.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-prefetch.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-prefetch_gather_index.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-prefetch_gather_offset.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-ptest.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-rdffr.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-reduction.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-reduction_wide.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-shift_right_imm.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-store.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-store_scatter_index.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-store_scatter_offset.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-storexn.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-ternary_opt_n.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-ternary_rotate.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_convert_narrowt.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_convertxn.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_n.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_pred.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_to_uint.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-unaryxn.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_int_opt_n.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_int_opt_single_n.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_opt_n.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_opt_single_n.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_to_uint.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_uint_opt_n.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_wide.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-compare.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_ext_gather_index_restricted.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_ext_gather_offset_restricted.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_gather_sv_restricted.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_gather_vs.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-shift_left_imm_to_uint.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-shift_right_imm.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-store_scatter_index_restricted.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-store_scatter_offset_restricted.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary_convert.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary_convert_narrowt.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary_to_int.c diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc index 2117eceb606..596e406890b 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc @@ -201,6 +201,15 @@ class svac_impl : public function_base public: CONSTEXPR svac_impl (int unspec) : m_unspec (unspec) {} + gimple * + fold (gimple_folder &f) const override + { + tree pg = gimple_call_arg (f.call, 0); + if (is_pfalse (pg)) + return f.fold_call_to (pg); + return NULL; + } + rtx expand (function_expander &e) const override { @@ -216,6 +225,14 @@ public: class svadda_impl : public function_base { public: + gimple * + fold (gimple_folder &f) const override + { + if (is_pfalse (gimple_call_arg (f.call, 0))) + return f.fold_call_to (gimple_call_arg (f.call, 1)); + return NULL; + } + rtx expand (function_expander &e) const override { @@ -227,6 +244,21 @@ public: } }; +class svaddv_impl : public reduction +{ +public: + CONSTEXPR svaddv_impl () + : reduction (UNSPEC_SADDV, UNSPEC_UADDV, UNSPEC_FADDV) {} + + gimple * + fold (gimple_folder &f) const override + { + if (is_pfalse (gimple_call_arg (f.call, 0))) + return f.fold_call_to (build_zero_cst (TREE_TYPE (f.lhs))); + return NULL; + } +}; + /* Implements svadr[bhwd]. */ class svadr_bhwd_impl : public function_base { @@ -245,11 +277,25 @@ public: e.args.quick_push (expand_vector_broadcast (mode, shift)); return e.use_exact_insn (code_for_aarch64_adr_shift (mode)); } - /* How many bits left to shift the vector displacement. */ unsigned int m_shift; }; + +class svandv_impl : public reduction +{ +public: + CONSTEXPR svandv_impl () : reduction (UNSPEC_ANDV) {} + + gimple * + fold (gimple_folder &f) const override + { + if (is_pfalse (gimple_call_arg (f.call, 0))) + return f.fold_call_to (build_all_ones_cst (TREE_TYPE (f.lhs))); + return NULL; + } +}; + class svbic_impl : public function_base { public: @@ -333,6 +379,14 @@ class svclast_impl : public quiet<function_base> public: CONSTEXPR svclast_impl (int unspec) : m_unspec (unspec) {} + gimple * + fold (gimple_folder &f) const override + { + if (is_pfalse (gimple_call_arg (f.call, 0))) + return f.fold_call_to (gimple_call_arg (f.call, 1)); + return NULL; + } + rtx expand (function_expander &e) const override { @@ -425,6 +479,8 @@ public: return gimple_build_assign (f.lhs, m_code, rhs1, rhs2); } + if (is_pfalse (pg)) + return f.fold_call_to (pg); return NULL; } @@ -464,6 +520,15 @@ public: : m_code (code), m_unspec_for_sint (unspec_for_sint), m_unspec_for_uint (unspec_for_uint) {} + gimple * + fold (gimple_folder &f) const override + { + tree pg = gimple_call_arg (f.call, 0); + if (is_pfalse (pg)) + return f.fold_call_to (pg); + return NULL; + } + rtx expand (function_expander &e) const override { @@ -502,6 +567,16 @@ public: class svcmpuo_impl : public quiet<function_base> { public: + + gimple * + fold (gimple_folder &f) const override + { + tree pg = gimple_call_arg (f.call, 0); + if (is_pfalse (pg)) + return f.fold_call_to (pg); + return NULL; + } + rtx expand (function_expander &e) const override { @@ -598,6 +673,16 @@ public: class svcntp_impl : public function_base { public: + + gimple * + fold (gimple_folder &f) const override + { + tree pg = gimple_call_arg (f.call, 0); + if (is_pfalse (pg)) + return f.fold_call_to (build_zero_cst (TREE_TYPE (f.lhs))); + return NULL; + } + rtx expand (function_expander &e) const override { @@ -613,6 +698,19 @@ public: } }; +class svcompact_impl + : public QUIET_CODE_FOR_MODE0 (aarch64_sve_compact) +{ +public: + gimple * + fold (gimple_folder &f) const override + { + if (is_pfalse (gimple_call_arg (f.call, 0))) + return f.fold_call_to (build_zero_cst (TREE_TYPE (f.lhs))); + return NULL; + } +}; + /* Implements svcreate2, svcreate3 and svcreate4. */ class svcreate_impl : public quiet<multi_vector_function> { @@ -746,6 +844,18 @@ public: } }; +class svcvtnt_impl : public CODE_FOR_MODE0 (aarch64_sve_cvtnt) +{ +public: + gimple * + fold (gimple_folder &f) const override + { + if (f.pred == PRED_x && is_pfalse (gimple_call_arg (f.call, 1))) + f.fold_call_to (build_zero_cst (TREE_TYPE (f.lhs))); + return NULL; + } +}; + class svdiv_impl : public rtx_code_function { public: @@ -1138,6 +1248,20 @@ public: } }; +class sveorv_impl : public reduction +{ +public: + CONSTEXPR sveorv_impl () : reduction (UNSPEC_XORV) {} + + gimple * + fold (gimple_folder &f) const override + { + if (is_pfalse (gimple_call_arg (f.call, 0))) + return f.fold_call_to (build_zero_cst (TREE_TYPE (f.lhs))); + return NULL; + } +}; + /* Implements svextb, svexth and svextw. */ class svext_bhw_impl : public function_base { @@ -1380,7 +1504,8 @@ public: BIT_FIELD_REF lowers to Advanced SIMD element extract, so we have to ensure the index of the element being accessed is in the range of a Advanced SIMD vector width. */ - gimple *fold (gimple_folder & f) const override + gimple * + fold (gimple_folder & f) const override { tree pred = gimple_call_arg (f.call, 0); tree val = gimple_call_arg (f.call, 1); @@ -1950,6 +2075,80 @@ public: } }; +class svminv_impl : public reduction +{ +public: + CONSTEXPR svminv_impl () + : reduction (UNSPEC_SMINV, UNSPEC_UMINV, UNSPEC_FMINV) {} + + gimple * + fold (gimple_folder &f) const override + { + if (is_pfalse (gimple_call_arg (f.call, 0))) + { + tree rhs = f.type_suffix (0).integer_p + ? TYPE_MAX_VALUE (TREE_TYPE (f.lhs)) + : build_real (TREE_TYPE (f.lhs), dconstinf); + return f.fold_call_to (rhs); + } + return NULL; + } +}; + +class svmaxnmv_impl : public reduction +{ +public: + CONSTEXPR svmaxnmv_impl () : reduction (UNSPEC_FMAXNMV) {} + gimple * + fold (gimple_folder &f) const override + { + if (is_pfalse (gimple_call_arg (f.call, 0))) + { + REAL_VALUE_TYPE rnan = dconst0; + rnan.cl = rvc_nan; + return f.fold_call_to (build_real (TREE_TYPE (f.lhs), rnan)); + } + return NULL; + } +}; + +class svmaxv_impl : public reduction +{ +public: + CONSTEXPR svmaxv_impl () + : reduction (UNSPEC_SMAXV, UNSPEC_UMAXV, UNSPEC_FMAXV) {} + + gimple * + fold (gimple_folder &f) const override + { + if (is_pfalse (gimple_call_arg (f.call, 0))) + { + tree rhs = f.type_suffix (0).integer_p + ? TYPE_MIN_VALUE (TREE_TYPE (f.lhs)) + : build_real (TREE_TYPE (f.lhs), dconstninf); + return f.fold_call_to (rhs); + } + return NULL; + } +}; + +class svminnmv_impl : public reduction +{ +public: + CONSTEXPR svminnmv_impl () : reduction (UNSPEC_FMINNMV) {} + gimple * + fold (gimple_folder &f) const override + { + if (is_pfalse (gimple_call_arg (f.call, 0))) + { + REAL_VALUE_TYPE rnan = dconst0; + rnan.cl = rvc_nan; + return f.fold_call_to (build_real (TREE_TYPE (f.lhs), rnan)); + } + return NULL; + } +}; + class svmla_impl : public function_base { public: @@ -2198,6 +2397,20 @@ public: } }; +class svorv_impl : public reduction +{ +public: + CONSTEXPR svorv_impl () : reduction (UNSPEC_IORV) {} + + gimple * + fold (gimple_folder &f) const override + { + if (is_pfalse (gimple_call_arg (f.call, 0))) + return f.fold_call_to (build_zero_cst (TREE_TYPE (f.lhs))); + return NULL; + } +}; + class svpfalse_impl : public function_base { public: @@ -2222,6 +2435,14 @@ class svpfirst_svpnext_impl : public function_base { public: CONSTEXPR svpfirst_svpnext_impl (int unspec) : m_unspec (unspec) {} + gimple * + fold (gimple_folder &f) const override + { + tree pg = gimple_call_arg (f.call, 0); + if (is_pfalse (pg)) + return f.fold_call_to (pg); + return NULL; + } rtx expand (function_expander &e) const override @@ -2302,6 +2523,13 @@ class svptest_impl : public function_base { public: CONSTEXPR svptest_impl (rtx_code compare) : m_compare (compare) {} + gimple * + fold (gimple_folder &f) const override + { + if (is_pfalse (gimple_call_arg (f.call, 0))) + return f.fold_call_to (boolean_false_node); + return NULL; + } rtx expand (function_expander &e) const override @@ -2717,6 +2945,18 @@ public: } }; +class svsplice_impl : public QUIET_CODE_FOR_MODE0 (aarch64_sve_splice) +{ +public: + gimple * + fold (gimple_folder &f) const override + { + if (is_pfalse (gimple_call_arg (f.call, 0))) + return f.fold_call_to (gimple_call_arg (f.call, 2)); + return NULL; + } +}; + class svst1_impl : public full_width_access { public: @@ -3170,13 +3410,13 @@ FUNCTION (svacle, svac_impl, (UNSPEC_COND_FCMLE)) FUNCTION (svaclt, svac_impl, (UNSPEC_COND_FCMLT)) FUNCTION (svadd, rtx_code_function, (PLUS, PLUS, UNSPEC_COND_FADD)) FUNCTION (svadda, svadda_impl,) -FUNCTION (svaddv, reduction, (UNSPEC_SADDV, UNSPEC_UADDV, UNSPEC_FADDV)) +FUNCTION (svaddv, svaddv_impl,) FUNCTION (svadrb, svadr_bhwd_impl, (0)) FUNCTION (svadrd, svadr_bhwd_impl, (3)) FUNCTION (svadrh, svadr_bhwd_impl, (1)) FUNCTION (svadrw, svadr_bhwd_impl, (2)) FUNCTION (svand, rtx_code_function, (AND, AND)) -FUNCTION (svandv, reduction, (UNSPEC_ANDV)) +FUNCTION (svandv, svandv_impl,) FUNCTION (svasr, rtx_code_function, (ASHIFTRT, ASHIFTRT)) FUNCTION (svasr_wide, shift_wide, (ASHIFTRT, UNSPEC_ASHIFTRT_WIDE)) FUNCTION (svasrd, unspec_based_function, (UNSPEC_ASRD, -1, -1)) @@ -3233,12 +3473,12 @@ FUNCTION (svcnth_pat, svcnt_bhwd_pat_impl, (VNx8HImode)) FUNCTION (svcntp, svcntp_impl,) FUNCTION (svcntw, svcnt_bhwd_impl, (VNx4SImode)) FUNCTION (svcntw_pat, svcnt_bhwd_pat_impl, (VNx4SImode)) -FUNCTION (svcompact, QUIET_CODE_FOR_MODE0 (aarch64_sve_compact),) +FUNCTION (svcompact, svcompact_impl,) FUNCTION (svcreate2, svcreate_impl, (2)) FUNCTION (svcreate3, svcreate_impl, (3)) FUNCTION (svcreate4, svcreate_impl, (4)) FUNCTION (svcvt, svcvt_impl,) -FUNCTION (svcvtnt, CODE_FOR_MODE0 (aarch64_sve_cvtnt),) +FUNCTION (svcvtnt, svcvtnt_impl,) FUNCTION (svdiv, svdiv_impl,) FUNCTION (svdivr, rtx_code_function_rotated, (DIV, UDIV, UNSPEC_COND_FDIV)) FUNCTION (svdot, svdot_impl,) @@ -3249,7 +3489,7 @@ FUNCTION (svdup_lane, svdup_lane_impl,) FUNCTION (svdupq, svdupq_impl,) FUNCTION (svdupq_lane, svdupq_lane_impl,) FUNCTION (sveor, rtx_code_function, (XOR, XOR, -1)) -FUNCTION (sveorv, reduction, (UNSPEC_XORV)) +FUNCTION (sveorv, sveorv_impl,) FUNCTION (svexpa, unspec_based_function, (-1, -1, UNSPEC_FEXPA)) FUNCTION (svext, QUIET_CODE_FOR_MODE0 (aarch64_sve_ext),) FUNCTION (svextb, svext_bhw_impl, (QImode)) @@ -3313,14 +3553,14 @@ FUNCTION (svmax, rtx_code_function, (SMAX, UMAX, UNSPEC_COND_FMAX, UNSPEC_FMAX)) FUNCTION (svmaxnm, cond_or_uncond_unspec_function, (UNSPEC_COND_FMAXNM, UNSPEC_FMAXNM)) -FUNCTION (svmaxnmv, reduction, (UNSPEC_FMAXNMV)) -FUNCTION (svmaxv, reduction, (UNSPEC_SMAXV, UNSPEC_UMAXV, UNSPEC_FMAXV)) +FUNCTION (svmaxnmv, svmaxnmv_impl,) +FUNCTION (svmaxv, svmaxv_impl,) FUNCTION (svmin, rtx_code_function, (SMIN, UMIN, UNSPEC_COND_FMIN, UNSPEC_FMIN)) FUNCTION (svminnm, cond_or_uncond_unspec_function, (UNSPEC_COND_FMINNM, UNSPEC_FMINNM)) -FUNCTION (svminnmv, reduction, (UNSPEC_FMINNMV)) -FUNCTION (svminv, reduction, (UNSPEC_SMINV, UNSPEC_UMINV, UNSPEC_FMINV)) +FUNCTION (svminnmv, svminnmv_impl,) +FUNCTION (svminv, svminv_impl,) FUNCTION (svmla, svmla_impl,) FUNCTION (svmla_lane, svmla_lane_impl,) FUNCTION (svmls, svmls_impl,) @@ -3343,7 +3583,7 @@ FUNCTION (svnor, svnor_impl,) FUNCTION (svnot, svnot_impl,) FUNCTION (svorn, svorn_impl,) FUNCTION (svorr, rtx_code_function, (IOR, IOR)) -FUNCTION (svorv, reduction, (UNSPEC_IORV)) +FUNCTION (svorv, svorv_impl,) FUNCTION (svpfalse, svpfalse_impl,) FUNCTION (svpfirst, svpfirst_svpnext_impl, (UNSPEC_PFIRST)) FUNCTION (svpnext, svpfirst_svpnext_impl, (UNSPEC_PNEXT)) @@ -3405,7 +3645,7 @@ FUNCTION (svset2, svset_impl, (2)) FUNCTION (svset3, svset_impl, (3)) FUNCTION (svset4, svset_impl, (4)) FUNCTION (svsetffr, svsetffr_impl,) -FUNCTION (svsplice, QUIET_CODE_FOR_MODE0 (aarch64_sve_splice),) +FUNCTION (svsplice, svsplice_impl,) FUNCTION (svsqrt, rtx_code_function, (SQRT, SQRT, UNSPEC_COND_FSQRT)) FUNCTION (svst1, svst1_impl,) FUNCTION (svst1_scatter, svst1_scatter_impl,) diff --git a/gcc/config/aarch64/aarch64-sve-builtins-sve2.cc b/gcc/config/aarch64/aarch64-sve-builtins-sve2.cc index fd0c98c6b68..0d1210f7a6b 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins-sve2.cc +++ b/gcc/config/aarch64/aarch64-sve-builtins-sve2.cc @@ -221,6 +221,18 @@ public: } }; +class svcvtxnt_impl : public CODE_FOR_MODE1 (aarch64_sve2_cvtxnt) +{ +public: + gimple * + fold (gimple_folder &f) const override + { + if (f.pred == PRED_x && is_pfalse (gimple_call_arg (f.call, 1))) + return f.fold_call_to (build_zero_cst (TREE_TYPE (f.lhs))); + return NULL; + } +}; + class svdup_laneq_impl : public function_base { public: @@ -358,6 +370,14 @@ class svmatch_svnmatch_impl : public function_base { public: CONSTEXPR svmatch_svnmatch_impl (int unspec) : m_unspec (unspec) {} + gimple * + fold (gimple_folder &f) const override + { + tree pg = gimple_call_arg (f.call, 0); + if (is_pfalse (pg)) + return f.fold_call_to (pg); + return NULL; + } rtx expand (function_expander &e) const override @@ -556,6 +576,24 @@ public: } }; +class svqshlu_impl : public unspec_based_function +{ +public: + CONSTEXPR svqshlu_impl () + : unspec_based_function (UNSPEC_SQSHLU, -1, -1) {} + gimple * + fold (gimple_folder &f) const override + { + if (is_pfalse (gimple_call_arg (f.call, 0))) + { + tree rhs = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (f.lhs), + gimple_call_arg (f.call, 1)); + return gimple_build_assign (f.lhs, VIEW_CONVERT_EXPR, rhs); + } + return NULL; + } +}; + class svrshl_impl : public unspec_based_function { public: @@ -910,7 +948,7 @@ FUNCTION (svclamp, svclamp_impl,) FUNCTION (svcvtlt, unspec_based_function, (-1, -1, UNSPEC_COND_FCVTLT)) FUNCTION (svcvtn, svcvtn_impl,) FUNCTION (svcvtx, unspec_based_function, (-1, -1, UNSPEC_COND_FCVTX)) -FUNCTION (svcvtxnt, CODE_FOR_MODE1 (aarch64_sve2_cvtxnt),) +FUNCTION (svcvtxnt, svcvtxnt_impl,) FUNCTION (svdup_laneq, svdup_laneq_impl,) FUNCTION (sveor3, CODE_FOR_MODE0 (aarch64_sve2_eor3),) FUNCTION (sveorbt, unspec_based_function, (UNSPEC_EORBT, UNSPEC_EORBT, -1)) @@ -1042,7 +1080,7 @@ FUNCTION (svqrshrun, unspec_based_uncond_function, (UNSPEC_SQRSHRUN, -1, -1, 1)) FUNCTION (svqrshrunb, unspec_based_function, (UNSPEC_SQRSHRUNB, -1, -1)) FUNCTION (svqrshrunt, unspec_based_function, (UNSPEC_SQRSHRUNT, -1, -1)) FUNCTION (svqshl, svqshl_impl,) -FUNCTION (svqshlu, unspec_based_function, (UNSPEC_SQSHLU, -1, -1)) +FUNCTION (svqshlu, svqshlu_impl,) FUNCTION (svqshrnb, unspec_based_function, (UNSPEC_SQSHRNB, UNSPEC_UQSHRNB, -1)) FUNCTION (svqshrnt, unspec_based_function, (UNSPEC_SQSHRNT, diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc index b3d961452d3..6d21736f685 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins.cc +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc @@ -3517,6 +3517,15 @@ is_ptrue (tree v, unsigned int step) && vector_cst_all_same (v, step)); } +/* Return true if V is a constant predicate that acts as a pfalse. */ +bool +is_pfalse (tree v) +{ + return (TREE_CODE (v) == VECTOR_CST + && TYPE_MODE (TREE_TYPE (v)) == VNx16BImode + && integer_zerop (v)); +} + gimple_folder::gimple_folder (const function_instance &instance, tree fndecl, gimple_stmt_iterator *gsi_in, gcall *call_in) : function_call_info (gimple_location (call_in), instance, fndecl), @@ -3622,6 +3631,51 @@ gimple_folder::redirect_pred_x () return redirect_call (instance); } +/* Fold calls with predicate pfalse: + _m predication: lhs = op1. + _x or _z: lhs = {0, ...}. + Implicit predication that reads from memory: lhs = {0, ...}. + Implicit predication that writes to memory or prefetches: no-op. + Return the new gimple statement on success, else NULL. */ +gimple * +gimple_folder::fold_pfalse () +{ + if (pred == PRED_none) + return nullptr; + tree arg0 = gimple_call_arg (call, 0); + if (pred == PRED_m) + { + /* Unary function shapes with _m predication are folded to the + inactive vector (arg0), while other function shapes are folded + to op1 (arg1). + In some intrinsics, e.g. svqshlu, lhs and op1 have different types, + such that folding to op1 is not appropriate. */ + tree arg1 = gimple_call_arg (call, 1); + tree t; + if (is_pfalse (arg1)) + t = arg0; + else if (is_pfalse (arg0)) + t = arg1; + else return nullptr; + if (TREE_TYPE (t) == TREE_TYPE (lhs)) + return fold_call_to (t); + } + if ((pred == PRED_x || pred == PRED_z) && is_pfalse (arg0)) + return fold_call_to (build_zero_cst (TREE_TYPE (lhs))); + if (pred == PRED_implicit && is_pfalse (arg0)) + { + unsigned int flags = call_properties (); + /* Folding to lhs = {0, ...} is not appropriate for intrinsics with + AGGREGATE types as lhs. */ + if ((flags & CP_READ_MEMORY) + && !AGGREGATE_TYPE_P (TREE_TYPE (lhs))) + return fold_call_to (build_zero_cst (TREE_TYPE (lhs))); + if (flags & (CP_WRITE_MEMORY | CP_PREFETCH_MEMORY)) + return fold_to_stmt_vops (gimple_build_nop ()); + } + return nullptr; +} + /* Fold the call to constant VAL. */ gimple * gimple_folder::fold_to_cstu (poly_uint64 val) @@ -3724,6 +3778,23 @@ gimple_folder::fold_active_lanes_to (tree x) return gimple_build_assign (lhs, VEC_COND_EXPR, pred, x, vec_inactive); } +/* Fold call to assignment statement lhs = t. */ +gimple * +gimple_folder::fold_call_to (tree t) +{ + return fold_to_stmt_vops (gimple_build_assign (lhs, t)); +} + +/* Fold call to G, incl. adjustments to the virtual operands. */ +gimple * +gimple_folder::fold_to_stmt_vops (gimple *g) +{ + gimple_seq stmts = NULL; + gimple_seq_add_stmt_without_update (&stmts, g); + gsi_replace_with_seq_vops (gsi, stmts); + return g; +} + /* Try to fold the call. Return the new statement on success and null on failure. */ gimple * @@ -3743,6 +3814,8 @@ gimple_folder::fold () /* First try some simplifications that are common to many functions. */ if (auto *call = redirect_pred_x ()) return call; + if (auto *call = fold_pfalse ()) + return call; return base->fold (*this); } diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h b/gcc/config/aarch64/aarch64-sve-builtins.h index 5bd9b88d117..ad4d06dd5f5 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins.h +++ b/gcc/config/aarch64/aarch64-sve-builtins.h @@ -632,6 +632,7 @@ public: gcall *redirect_call (const function_instance &); gimple *redirect_pred_x (); + gimple *fold_pfalse (); gimple *fold_to_cstu (poly_uint64); gimple *fold_to_pfalse (); @@ -639,6 +640,8 @@ public: gimple *fold_to_vl_pred (unsigned int); gimple *fold_const_binary (enum tree_code); gimple *fold_active_lanes_to (tree); + gimple *fold_call_to (tree); + gimple *fold_to_stmt_vops (gimple *); gimple *fold (); @@ -832,6 +835,7 @@ extern tree acle_svprfop; bool vector_cst_all_same (tree, unsigned int); bool is_ptrue (tree, unsigned int); +bool is_pfalse (tree); const function_instance *lookup_fndecl (tree); /* Try to find a mode with the given mode_suffix_info fields. Return the diff --git a/gcc/testsuite/gcc.target/aarch64/pfalse-binary_0.c b/gcc/testsuite/gcc.target/aarch64/pfalse-binary_0.c new file mode 100644 index 00000000000..72fb5b6ac6d --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/pfalse-binary_0.c @@ -0,0 +1,240 @@ +#include <arm_sve.h> + +#define MXZ(F, RTY, TY1, TY2) \ + RTY F##_f (TY1 op1, TY2 op2) \ + { \ + return sv##F (svpfalse_b (), op1, op2); \ + } + +#define PRED_MXv(F, RTY, TYPE1, TYPE2, TY) \ + MXZ (F##_##TY##_m, RTY, TYPE1, sv##TYPE2) \ + MXZ (F##_##TY##_x, RTY, TYPE1, sv##TYPE2) + +#define PRED_Zv(F, RTY, TYPE1, TYPE2, TY) \ + MXZ (F##_##TY##_z, RTY, TYPE1, sv##TYPE2) + +#define PRED_MXZv(F, RTY, TYPE1, TYPE2, TY) \ + PRED_MXv (F, RTY, TYPE1, TYPE2, TY) \ + PRED_Zv (F, RTY, TYPE1, TYPE2, TY) + +#define PRED_Z(F, RTY, TYPE1, TYPE2, TY) \ + PRED_Zv (F, RTY, TYPE1, TYPE2, TY) \ + MXZ (F##_n_##TY##_z, RTY, TYPE1, TYPE2) + +#define PRED_MXZ(F, RTY, TYPE1, TYPE2, TY) \ + PRED_MXv (F, RTY, TYPE1, TYPE2, TY) \ + MXZ (F##_n_##TY##_m, RTY, TYPE1, TYPE2) \ + MXZ (F##_n_##TY##_x, RTY, TYPE1, TYPE2) \ + PRED_Z (F, RTY, TYPE1, TYPE2, TY) + +#define PRED_IMPLICITv(F, RTY, TYPE1, TYPE2, TY) \ + MXZ (F##_##TY, RTY, TYPE1, sv##TYPE2) + +#define PRED_IMPLICIT(F, RTY, TYPE1, TYPE2, TY) \ + PRED_IMPLICITv (F, RTY, TYPE1, TYPE2, TY) \ + MXZ (F##_n_##TY, RTY, TYPE1, TYPE2) + +#define ALL_Q_INTEGER(F, P) \ + PRED_##P (F, svuint8_t, svuint8_t, uint8_t, u8) \ + PRED_##P (F, svint8_t, svint8_t, int8_t, s8) + +#define ALL_Q_INTEGER_UINT(F, P) \ + PRED_##P (F, svuint8_t, svuint8_t, uint8_t, u8) \ + PRED_##P (F, svint8_t, svint8_t, uint8_t, s8) + +#define ALL_Q_INTEGER_INT(F, P) \ + PRED_##P (F, svuint8_t, svuint8_t, int8_t, u8) \ + PRED_##P (F, svint8_t, svint8_t, int8_t, s8) + +#define ALL_Q_INTEGER_BOOL(F, P) \ + PRED_##P (F, svbool_t, svuint8_t, uint8_t, u8) \ + PRED_##P (F, svbool_t, svint8_t, int8_t, s8) + +#define ALL_H_INTEGER(F, P) \ + PRED_##P (F, svuint16_t, svuint16_t, uint16_t, u16) \ + PRED_##P (F, svint16_t, svint16_t, int16_t, s16) + +#define ALL_H_INTEGER_UINT(F, P) \ + PRED_##P (F, svuint16_t, svuint16_t, uint16_t, u16) \ + PRED_##P (F, svint16_t, svint16_t, uint16_t, s16) + +#define ALL_H_INTEGER_INT(F, P) \ + PRED_##P (F, svuint16_t, svuint16_t, int16_t, u16) \ + PRED_##P (F, svint16_t, svint16_t, int16_t, s16) + +#define ALL_H_INTEGER_WIDE(F, P) \ + PRED_##P (F, svuint16_t, svuint16_t, uint8_t, u16) \ + PRED_##P (F, svint16_t, svint16_t, int8_t, s16) + +#define ALL_H_INTEGER_BOOL(F, P) \ + PRED_##P (F, svbool_t, svuint16_t, uint16_t, u16) \ + PRED_##P (F, svbool_t, svint16_t, int16_t, s16) + +#define ALL_S_INTEGER(F, P) \ + PRED_##P (F, svuint32_t, svuint32_t, uint32_t, u32) \ + PRED_##P (F, svint32_t, svint32_t, int32_t, s32) + +#define ALL_S_INTEGER_UINT(F, P) \ + PRED_##P (F, svuint32_t, svuint32_t, uint32_t, u32) \ + PRED_##P (F, svint32_t, svint32_t, uint32_t, s32) + +#define ALL_S_INTEGER_INT(F, P) \ + PRED_##P (F, svuint32_t, svuint32_t, int32_t, u32) \ + PRED_##P (F, svint32_t, svint32_t, int32_t, s32) + +#define ALL_S_INTEGER_WIDE(F, P) \ + PRED_##P (F, svuint32_t, svuint32_t, uint16_t, u32) \ + PRED_##P (F, svint32_t, svint32_t, int16_t, s32) + +#define ALL_S_INTEGER_BOOL(F, P) \ + PRED_##P (F, svbool_t, svuint32_t, uint32_t, u32) \ + PRED_##P (F, svbool_t, svint32_t, int32_t, s32) + +#define ALL_D_INTEGER(F, P) \ + PRED_##P (F, svuint64_t, svuint64_t, uint64_t, u64) \ + PRED_##P (F, svint64_t, svint64_t, int64_t, s64) + +#define ALL_D_INTEGER_UINT(F, P) \ + PRED_##P (F, svuint64_t, svuint64_t, uint64_t, u64) \ + PRED_##P (F, svint64_t, svint64_t, uint64_t, s64) + +#define ALL_D_INTEGER_INT(F, P) \ + PRED_##P (F, svuint64_t, svuint64_t, int64_t, u64) \ + PRED_##P (F, svint64_t, svint64_t, int64_t, s64) + +#define ALL_D_INTEGER_WIDE(F, P) \ + PRED_##P (F, svuint64_t, svuint64_t, uint32_t, u64) \ + PRED_##P (F, svint64_t, svint64_t, int32_t, s64) + +#define ALL_D_INTEGER_BOOL(F, P) \ + PRED_##P (F, svbool_t, svuint64_t, uint64_t, u64) \ + PRED_##P (F, svbool_t, svint64_t, int64_t, s64) + +#define SD_INTEGER_TO_UINT(F, P) \ + PRED_##P (F, svuint32_t, svuint32_t, uint32_t, u32) \ + PRED_##P (F, svuint64_t, svuint64_t, uint64_t, u64) \ + PRED_##P (F, svuint32_t, svint32_t, int32_t, s32) \ + PRED_##P (F, svuint64_t, svint64_t, int64_t, s64) + +#define BH_INTEGER_BOOL(F, P) \ + ALL_Q_INTEGER_BOOL (F, P) \ + ALL_H_INTEGER_BOOL (F, P) + +#define BHS_UNSIGNED_UINT64(F, P) \ + PRED_##P (F, svuint8_t, svuint8_t, uint64_t, u8) \ + PRED_##P (F, svuint16_t, svuint16_t, uint64_t, u16) \ + PRED_##P (F, svuint32_t, svuint32_t, uint64_t, u32) + +#define BHS_UNSIGNED_WIDE_BOOL(F, P) \ + PRED_##P (F, svbool_t, svuint8_t, uint64_t, u8) \ + PRED_##P (F, svbool_t, svuint16_t, uint64_t, u16) \ + PRED_##P (F, svbool_t, svuint32_t, uint64_t, u32) + +#define BHS_SIGNED_UINT64(F, P) \ + PRED_##P (F, svint8_t, svint8_t, uint64_t, s8) \ + PRED_##P (F, svint16_t, svint16_t, uint64_t, s16) \ + PRED_##P (F, svint32_t, svint32_t, uint64_t, s32) + +#define BHS_SIGNED_WIDE_BOOL(F, P) \ + PRED_##P (F, svbool_t, svint8_t, int64_t, s8) \ + PRED_##P (F, svbool_t, svint16_t, int64_t, s16) \ + PRED_##P (F, svbool_t, svint32_t, int64_t, s32) + +#define ALL_UNSIGNED_UINT(F, P) \ + PRED_##P (F, svuint8_t, svuint8_t, uint8_t, u8) \ + PRED_##P (F, svuint16_t, svuint16_t, uint16_t, u16) \ + PRED_##P (F, svuint32_t, svuint32_t, uint32_t, u32) \ + PRED_##P (F, svuint64_t, svuint64_t, uint64_t, u64) + +#define ALL_UNSIGNED_INT(F, P) \ + PRED_##P (F, svuint8_t, svuint8_t, int8_t, u8) \ + PRED_##P (F, svuint16_t, svuint16_t, int16_t, u16) \ + PRED_##P (F, svuint32_t, svuint32_t, int32_t, u32) \ + PRED_##P (F, svuint64_t, svuint64_t, int64_t, u64) + +#define ALL_SIGNED_UINT(F, P) \ + PRED_##P (F, svint8_t, svint8_t, uint8_t, s8) \ + PRED_##P (F, svint16_t, svint16_t, uint16_t, s16) \ + PRED_##P (F, svint32_t, svint32_t, uint32_t, s32) \ + PRED_##P (F, svint64_t, svint64_t, uint64_t, s64) + +#define ALL_FLOAT(F, P) \ + PRED_##P (F, svfloat16_t, svfloat16_t, float16_t, f16) \ + PRED_##P (F, svfloat32_t, svfloat32_t, float32_t, f32) \ + PRED_##P (F, svfloat64_t, svfloat64_t, float64_t, f64) + +#define ALL_FLOAT_INT(F, P) \ + PRED_##P (F, svfloat16_t, svfloat16_t, int16_t, f16) \ + PRED_##P (F, svfloat32_t, svfloat32_t, int32_t, f32) \ + PRED_##P (F, svfloat64_t, svfloat64_t, int64_t, f64) + +#define ALL_FLOAT_BOOL(F, P) \ + PRED_##P (F, svbool_t, svfloat16_t, float16_t, f16) \ + PRED_##P (F, svbool_t, svfloat32_t, float32_t, f32) \ + PRED_##P (F, svbool_t, svfloat64_t, float64_t, f64) + +#define ALL_FLOAT_SCALAR(F, P) \ + PRED_##P (F, float16_t, float16_t, float16_t, f16) \ + PRED_##P (F, float32_t, float32_t, float32_t, f32) \ + PRED_##P (F, float64_t, float64_t, float64_t, f64) + +#define B(F, P) \ + PRED_##P (F, svbool_t, svbool_t, bool_t, b) + +#define ALL_SD_INTEGER(F, P) \ + ALL_S_INTEGER (F, P) \ + ALL_D_INTEGER (F, P) + +#define HSD_INTEGER_WIDE(F, P) \ + ALL_H_INTEGER_WIDE (F, P) \ + ALL_S_INTEGER_WIDE (F, P) \ + ALL_D_INTEGER_WIDE (F, P) + +#define BHS_INTEGER_UINT64(F, P) \ + BHS_UNSIGNED_UINT64 (F, P) \ + BHS_SIGNED_UINT64 (F, P) + +#define BHS_INTEGER_WIDE_BOOL(F, P) \ + BHS_UNSIGNED_WIDE_BOOL (F, P) \ + BHS_SIGNED_WIDE_BOOL (F, P) + +#define ALL_INTEGER(F, P) \ + ALL_Q_INTEGER (F, P) \ + ALL_H_INTEGER (F, P) \ + ALL_S_INTEGER (F, P) \ + ALL_D_INTEGER (F, P) + +#define ALL_INTEGER_UINT(F, P) \ + ALL_Q_INTEGER_UINT (F, P) \ + ALL_H_INTEGER_UINT (F, P) \ + ALL_S_INTEGER_UINT (F, P) \ + ALL_D_INTEGER_UINT (F, P) + +#define ALL_INTEGER_INT(F, P) \ + ALL_Q_INTEGER_INT (F, P) \ + ALL_H_INTEGER_INT (F, P) \ + ALL_S_INTEGER_INT (F, P) \ + ALL_D_INTEGER_INT (F, P) + +#define ALL_INTEGER_BOOL(F, P) \ + ALL_Q_INTEGER_BOOL (F, P) \ + ALL_H_INTEGER_BOOL (F, P) \ + ALL_S_INTEGER_BOOL (F, P) \ + ALL_D_INTEGER_BOOL (F, P) + +#define ALL_FLOAT_AND_SD_INTEGER(F, P) \ + ALL_SD_INTEGER (F, P) \ + ALL_FLOAT (F, P) + +#define ALL_ARITH(F, P) \ + ALL_INTEGER (F, P) \ + ALL_FLOAT (F, P) + +#define ALL_ARITH_BOOL(F, P) \ + ALL_INTEGER_BOOL (F, P) \ + ALL_FLOAT_BOOL (F, P) + +#define ALL_DATA(F, P) \ + ALL_ARITH (F, P) \ + PRED_##P (F, svbfloat16_t, svbfloat16_t, bfloat16_t, bf16) + diff --git a/gcc/testsuite/gcc.target/aarch64/pfalse-unary_0.c b/gcc/testsuite/gcc.target/aarch64/pfalse-unary_0.c new file mode 100644 index 00000000000..f6183a14325 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/pfalse-unary_0.c @@ -0,0 +1,195 @@ +#include <arm_sve.h> +#include <stdbool.h> + +#define M(F, RTY, TY) \ + RTY F##_f (RTY inactive, TY op) \ + { \ + return sv##F (inactive, svpfalse_b (), op); \ + } + +#define XZI(F, RTY, TY) \ + RTY F##_f (TY op) \ + { \ + return sv##F (svpfalse_b (), op); \ + } + +#define PRED_Z(F, RTY, TYPE, TY) \ + XZI (F##_##TY##_z, RTY, sv##TYPE) \ + +#define PRED_MZ(F, RTY, TYPE, TY) \ + M (F##_##TY##_m, RTY, sv##TYPE) \ + PRED_Z (F, RTY, TYPE, TY) + +#define PRED_MXZ(F, RTY, TYPE, TY) \ + XZI (F##_##TY##_x, RTY, sv##TYPE) \ + PRED_MZ (F, RTY, TYPE, TY) + +#define PRED_IMPLICIT(F, RTY, TYPE, TY) \ + XZI (F##_##TY, RTY, sv##TYPE) + +#define PRED_IMPLICITn(F, RTY, TYPE) \ + XZI (F, RTY, sv##TYPE) + +#define Q_INTEGER(F, P) \ + PRED_##P (F, svuint8_t, uint8_t, u8) \ + PRED_##P (F, svint8_t, int8_t, s8) + +#define Q_INTEGER_SCALAR(F, P) \ + PRED_##P (F, uint8_t, uint8_t, u8) \ + PRED_##P (F, int8_t, int8_t, s8) + +#define Q_INTEGER_SCALAR_WIDE(F, P) \ + PRED_##P (F, uint64_t, uint8_t, u8) \ + PRED_##P (F, int64_t, int8_t, s8) + +#define H_INTEGER(F, P) \ + PRED_##P (F, svuint16_t, uint16_t, u16) \ + PRED_##P (F, svint16_t, int16_t, s16) + +#define H_INTEGER_SCALAR(F, P) \ + PRED_##P (F, uint16_t, uint16_t, u16) \ + PRED_##P (F, int16_t, int16_t, s16) + +#define H_INTEGER_SCALAR_WIDE(F, P) \ + PRED_##P (F, uint64_t, uint16_t, u16) \ + PRED_##P (F, int64_t, int16_t, s16) + +#define S_INTEGER(F, P) \ + PRED_##P (F, svuint32_t, uint32_t, u32) \ + PRED_##P (F, svint32_t, int32_t, s32) + +#define S_INTEGER_SCALAR(F, P) \ + PRED_##P (F, uint32_t, uint32_t, u32) \ + PRED_##P (F, int32_t, int32_t, s32) + +#define S_INTEGER_SCALAR_WIDE(F, P) \ + PRED_##P (F, uint64_t, uint32_t, u32) \ + PRED_##P (F, int64_t, int32_t, s32) + +#define S_UNSIGNED(F, P) \ + PRED_##P (F, svuint32_t, uint32_t, u32) + +#define D_INTEGER(F, P) \ + PRED_##P (F, svuint64_t, uint64_t, u64) \ + PRED_##P (F, svint64_t, int64_t, s64) + +#define D_INTEGER_SCALAR(F, P) \ + PRED_##P (F, uint64_t, uint64_t, u64) \ + PRED_##P (F, int64_t, int64_t, s64) + +#define SD_INTEGER(F, P) \ + S_INTEGER (F, P) \ + D_INTEGER (F, P) + +#define SD_DATA(F, P) \ + PRED_##P (F, svfloat32_t, float32_t, f32) \ + PRED_##P (F, svfloat64_t, float64_t, f64) \ + S_INTEGER (F, P) \ + D_INTEGER (F, P) + +#define ALL_SIGNED(F, P) \ + PRED_##P (F, svint8_t, int8_t, s8) \ + PRED_##P (F, svint16_t, int16_t, s16) \ + PRED_##P (F, svint32_t, int32_t, s32) \ + PRED_##P (F, svint64_t, int64_t, s64) + +#define ALL_SIGNED_UINT(F, P) \ + PRED_##P (F, svuint8_t, int8_t, s8) \ + PRED_##P (F, svuint16_t, int16_t, s16) \ + PRED_##P (F, svuint32_t, int32_t, s32) \ + PRED_##P (F, svuint64_t, int64_t, s64) + +#define ALL_UNSIGNED_UINT(F, P) \ + PRED_##P (F, svuint8_t, uint8_t, u8) \ + PRED_##P (F, svuint16_t, uint16_t, u16) \ + PRED_##P (F, svuint32_t, uint32_t, u32) \ + PRED_##P (F, svuint64_t, uint64_t, u64) + +#define HSD_INTEGER(F, P) \ + H_INTEGER (F, P) \ + S_INTEGER (F, P) \ + D_INTEGER (F, P) + +#define ALL_INTEGER(F, P) \ + Q_INTEGER (F, P) \ + HSD_INTEGER (F, P) + +#define ALL_INTEGER_SCALAR(F, P) \ + Q_INTEGER_SCALAR (F, P) \ + H_INTEGER_SCALAR (F, P) \ + S_INTEGER_SCALAR (F, P) \ + D_INTEGER_SCALAR (F, P) + +#define ALL_INTEGER_SCALAR_WIDE(F, P) \ + Q_INTEGER_SCALAR_WIDE (F, P) \ + H_INTEGER_SCALAR_WIDE (F, P) \ + S_INTEGER_SCALAR_WIDE (F, P) \ + D_INTEGER_SCALAR (F, P) + +#define ALL_INTEGER_UINT(F, P) \ + ALL_SIGNED_UINT (F, P) \ + ALL_UNSIGNED_UINT (F, P) + +#define ALL_FLOAT(F, P) \ + PRED_##P (F, svfloat16_t, float16_t, f16) \ + PRED_##P (F, svfloat32_t, float32_t, f32) \ + PRED_##P (F, svfloat64_t, float64_t, f64) + +#define ALL_FLOAT_SCALAR(F, P) \ + PRED_##P (F, float16_t, float16_t, f16) \ + PRED_##P (F, float32_t, float32_t, f32) \ + PRED_##P (F, float64_t, float64_t, f64) + +#define ALL_FLOAT_INT(F, P) \ + PRED_##P (F, svint16_t, float16_t, f16) \ + PRED_##P (F, svint32_t, float32_t, f32) \ + PRED_##P (F, svint64_t, float64_t, f64) + +#define ALL_FLOAT_UINT(F, P) \ + PRED_##P (F, svuint16_t, float16_t, f16) \ + PRED_##P (F, svuint32_t, float32_t, f32) \ + PRED_##P (F, svuint64_t, float64_t, f64) + +#define ALL_FLOAT_AND_SIGNED(F, P) \ + ALL_SIGNED (F, P) \ + ALL_FLOAT (F, P) + +#define ALL_ARITH_SCALAR(F, P) \ + ALL_INTEGER_SCALAR (F, P) \ + ALL_FLOAT_SCALAR (F, P) + +#define ALL_ARITH_SCALAR_WIDE(F, P) \ + ALL_INTEGER_SCALAR_WIDE (F, P) \ + ALL_FLOAT_SCALAR (F, P) + +#define ALL_DATA(F, P) \ + ALL_INTEGER (F, P) \ + ALL_FLOAT (F, P) \ + PRED_##P (F, svbfloat16_t, bfloat16_t, bf16) + +#define ALL_DATA_SCALAR(F, P) \ + ALL_ARITH_SCALAR (F, P) \ + PRED_##P (F, bfloat16_t, bfloat16_t, bf16) + +#define ALL_DATA_UINT(F, P) \ + ALL_INTEGER_UINT (F, P) \ + ALL_FLOAT_UINT (F, P) \ + PRED_##P (F, svuint16_t, bfloat16_t, bf16) + +#define B(F, P) \ + PRED_##P (F, svbool_t, bool_t, b) + +#define BN(F, P) \ + PRED_##P (F, svbool_t, bool_t, b8) \ + PRED_##P (F, svbool_t, bool_t, b16) \ + PRED_##P (F, svbool_t, bool_t, b32) \ + PRED_##P (F, svbool_t, bool_t, b64) + +#define BOOL(F, P) \ + PRED_##P (F, bool, bool_t) + +#define ALL_PRED_UINT64(F, P) \ + PRED_##P (F, uint64_t, bool_t, b8) \ + PRED_##P (F, uint64_t, bool_t, b16) \ + PRED_##P (F, uint64_t, bool_t, b32) \ + PRED_##P (F, uint64_t, bool_t, b64) diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary.c new file mode 100644 index 00000000000..c1779818fc5 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.c" + +B (brkn, Zv) +B (brkpa, Zv) +B (brkpb, Zv) +ALL_DATA (splice, IMPLICITv) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tpfalse\tp0\.b\n\tret\n} 3 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmov\tz0\.d, z1\.d\n\tret\n} 12 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 15 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_int_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_int_opt_n.c new file mode 100644 index 00000000000..3ccdd3fedf5 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_int_opt_n.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.c" + +ALL_FLOAT_INT (scale, MXZ) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 6 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 12 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 18 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_n.c new file mode 100644 index 00000000000..567d442c98e --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_n.c @@ -0,0 +1,30 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.c" + +ALL_ARITH (abd, MXZ) +ALL_ARITH (add, MXZ) +ALL_INTEGER (and, MXZ) +B (and, Zv) +ALL_INTEGER (bic, MXZ) +B (bic, Zv) +ALL_FLOAT_AND_SD_INTEGER (div, MXZ) +ALL_FLOAT_AND_SD_INTEGER (divr, MXZ) +ALL_INTEGER (eor, MXZ) +B (eor, Zv) +ALL_ARITH (mul, MXZ) +ALL_INTEGER (mulh, MXZ) +ALL_FLOAT (mulx, MXZ) +B (nand, Zv) +B (nor, Zv) +B (orn, Zv) +ALL_INTEGER (orr, MXZ) +B (orr, Zv) +ALL_ARITH (sub, MXZ) +ALL_ARITH (subr, MXZ) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 224 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 448 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tpfalse\tp0\.b\n\tret\n} 7 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 679 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_single_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_single_n.c new file mode 100644 index 00000000000..ada5a6613cf --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_single_n.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.c" + +ALL_ARITH (max, MXZ) +ALL_ARITH (min, MXZ) +ALL_FLOAT (maxnm, MXZ) +ALL_FLOAT (minnm, MXZ) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 56 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 112 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 168 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_rotate.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_rotate.c new file mode 100644 index 00000000000..8c3a6f89f9e --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_rotate.c @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define MXZ4(F, TYPE) \ + TYPE F##_f (TYPE op1, TYPE op2) \ + { \ + return sv##F (svpfalse_b (), op1, op2, 90); \ + } + +#define PRED_MXZ(F, TYPE, TY) \ + MXZ4 (F##_##TY##_m, TYPE) \ + MXZ4 (F##_##TY##_x, TYPE) \ + MXZ4 (F##_##TY##_z, TYPE) + +#define ALL_FLOAT(F, P) \ + PRED_##P (F, svfloat16_t, f16) \ + PRED_##P (F, svfloat32_t, f32) \ + PRED_##P (F, svfloat64_t, f64) + +ALL_FLOAT (cadd, MXZ) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 3 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 6 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 9 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint64_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint64_opt_n.c new file mode 100644 index 00000000000..5275064ef44 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint64_opt_n.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.c" + +BHS_SIGNED_UINT64 (asr_wide, MXZ) +BHS_INTEGER_UINT64 (lsl_wide, MXZ) +BHS_UNSIGNED_UINT64 (lsr_wide, MXZ) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 24 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 48 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 72 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint_opt_n.c new file mode 100644 index 00000000000..b4bad639942 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint_opt_n.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.c" + +ALL_SIGNED_UINT (asr, MXZ) +ALL_INTEGER_UINT (lsl, MXZ) +ALL_UNSIGNED_UINT (lsr, MXZ) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 32 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 64 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 96 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binaryxn.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binaryxn.c new file mode 100644 index 00000000000..1b0a27ca635 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binaryxn.c @@ -0,0 +1,30 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define T(F, TY) \ + sv##TY F##_f (sv##TY op1, sv##TY op2) \ + { \ + return sv##F (svpfalse_b (), op1, op2); \ + } + +#define ALL_DATA(F) \ + T (F##_bf16, bfloat16_t) \ + T (F##_f16, float16_t) \ + T (F##_f32, float32_t) \ + T (F##_f64, float64_t) \ + T (F##_s8, int8_t) \ + T (F##_s16, int16_t) \ + T (F##_s32, int32_t) \ + T (F##_s64, int64_t) \ + T (F##_u8, uint8_t) \ + T (F##_u16, uint16_t) \ + T (F##_u32, uint32_t) \ + T (F##_u64, uint64_t) \ + T (F##_b, bool_t) + +ALL_DATA (sel) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmov\t[zp]0\.[db], [zp]1\.[db]\n\tret\n} 13 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 13 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-clast.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-clast.c new file mode 100644 index 00000000000..f45d4d4357e --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-clast.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.c" + +ALL_DATA (clasta, IMPLICITv) +ALL_DATA (clastb, IMPLICITv) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 24 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 24 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-compare_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-compare_opt_n.c new file mode 100644 index 00000000000..7a8ba535f1c --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-compare_opt_n.c @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.c" + +ALL_FLOAT_BOOL (acge, IMPLICIT) +ALL_FLOAT_BOOL (acgt, IMPLICIT) +ALL_FLOAT_BOOL (acle, IMPLICIT) +ALL_FLOAT_BOOL (aclt, IMPLICIT) +ALL_ARITH_BOOL (cmpeq, IMPLICIT) +ALL_ARITH_BOOL (cmpge, IMPLICIT) +ALL_ARITH_BOOL (cmpgt, IMPLICIT) +ALL_ARITH_BOOL (cmple, IMPLICIT) +ALL_ARITH_BOOL (cmplt, IMPLICIT) +ALL_ARITH_BOOL (cmpne, IMPLICIT) +ALL_FLOAT_BOOL (cmpuo, IMPLICIT) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tpfalse\tp0\.b\n\tret\n} 162 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 162 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-compare_wide_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-compare_wide_opt_n.c new file mode 100644 index 00000000000..b1a40cafb08 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-compare_wide_opt_n.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.c" + +BHS_SIGNED_WIDE_BOOL (cmpeq_wide, IMPLICIT) +BHS_INTEGER_WIDE_BOOL (cmpge_wide, IMPLICIT) +BHS_INTEGER_WIDE_BOOL (cmpgt_wide, IMPLICIT) +BHS_INTEGER_WIDE_BOOL (cmple_wide, IMPLICIT) +BHS_INTEGER_WIDE_BOOL (cmplt_wide, IMPLICIT) +BHS_SIGNED_WIDE_BOOL (cmpne_wide, IMPLICIT) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tpfalse\tp0\.b\n\tret\n} 60 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 60 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-count_pred.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-count_pred.c new file mode 100644 index 00000000000..63c9e397c0b --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-count_pred.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-unary_0.c" + +ALL_PRED_UINT64 (cntp, IMPLICIT) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmov\tx0, 0\n\tret\n} 4 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 4 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-fold_left.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-fold_left.c new file mode 100644 index 00000000000..921fc7002e2 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-fold_left.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.c" + +ALL_FLOAT_SCALAR (adda, IMPLICITv) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 3 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 3 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load.c new file mode 100644 index 00000000000..0f01a0a5d63 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load.c @@ -0,0 +1,31 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define T(F, TY) \ + sv##TY F##_f (const TY *base) \ + { \ + return sv##F (svpfalse_b (), base); \ + } + +#define ALL_DATA(F) \ + T (F##_bf16, bfloat16_t) \ + T (F##_f16, float16_t) \ + T (F##_f32, float32_t) \ + T (F##_f64, float64_t) \ + T (F##_s8, int8_t) \ + T (F##_s16, int16_t) \ + T (F##_s32, int32_t) \ + T (F##_s64, int64_t) \ + T (F##_u8, uint8_t) \ + T (F##_u16, uint16_t) \ + T (F##_u32, uint32_t) \ + T (F##_u64, uint64_t) \ + +ALL_DATA (ldff1) +ALL_DATA (ldnf1) +ALL_DATA (ldnt1) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 36 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 36 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_ext.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_ext.c new file mode 100644 index 00000000000..e112b8d01af --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_ext.c @@ -0,0 +1,39 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define T(F, RTY, TY) \ + RTY F##_f (const TY *base) \ + { \ + return sv##F (svpfalse_b (), base); \ + } + +#define D_INTEGER(F, TY) \ + T (F##_s64, svint64_t, TY) \ + T (F##_u64, svuint64_t, TY) + +#define SD_INTEGER(F, TY) \ + D_INTEGER (F, TY) \ + T (F##_s32, svint32_t, TY) \ + T (F##_u32, svuint32_t, TY) + +#define HSD_INTEGER(F, TY) \ + SD_INTEGER (F, TY) \ + T (F##_s16, svint16_t, TY) \ + T (F##_u16, svuint16_t, TY) + +#define TEST(F) \ + HSD_INTEGER (F##sb, int8_t) \ + SD_INTEGER (F##sh, int16_t) \ + D_INTEGER (F##sw, int32_t) \ + HSD_INTEGER (F##ub, uint8_t) \ + SD_INTEGER (F##uh, uint16_t) \ + D_INTEGER (F##uw, uint32_t) \ + +TEST (ld1) +TEST (ldff1) +TEST (ldnf1) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 72 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 72 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_ext_gather_index.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_ext_gather_index.c new file mode 100644 index 00000000000..e664b294ea2 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_ext_gather_index.c @@ -0,0 +1,50 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define T1(F, RTY, TY1, TY2) \ + RTY F##_f (const TY1 *base, TY2 indices) \ + { \ + return sv##F (svpfalse_b (), base, indices); \ + } + +#define T2(F, RTY, TY1) \ + RTY F##_f (TY1 bases, int64_t index) \ + { \ + return sv##F (svpfalse_b (), bases, index); \ + } + +#define T3(F, B, RTY, TY, TYPE) \ + T1 (F##_gather_s##B##index_##TY, RTY, TYPE, svint##B##_t) \ + T1 (F##_gather_u##B##index_##TY, RTY, TYPE, svuint##B##_t) + +#define T4(F, B, RTY, TY) \ + T2 (F##_gather_##TY##base_index_s##B, svint##B##_t, RTY) \ + T2 (F##_gather_##TY##base_index_u##B, svuint##B##_t, RTY) + +#define TEST(F) \ + T3 (F##sh, 32, svint32_t, s32, int16_t) \ + T3 (F##sh, 32, svuint32_t, u32, int16_t) \ + T3 (F##sh, 64, svint64_t, s64, int16_t) \ + T3 (F##sh, 64, svuint64_t, u64, int16_t) \ + T4 (F##sh, 32, svuint32_t, u32) \ + T4 (F##sh, 64, svuint64_t, u64) \ + T3 (F##sw, 64, svint64_t, s64, int32_t) \ + T3 (F##sw, 64, svuint64_t, u64, int32_t) \ + T4 (F##sw, 64, svuint64_t, u64) \ + T3 (F##uh, 32, svint32_t, s32, uint16_t) \ + T3 (F##uh, 32, svuint32_t, u32, uint16_t) \ + T3 (F##uh, 64, svint64_t, s64, uint16_t) \ + T3 (F##uh, 64, svuint64_t, u64, uint16_t) \ + T4 (F##uh, 32, svuint32_t, u32) \ + T4 (F##uh, 64, svuint64_t, u64) \ + T3 (F##uw, 64, svint64_t, s64, uint32_t) \ + T3 (F##uw, 64, svuint64_t, u64, uint32_t) \ + T4 (F##uw, 64, svuint64_t, u64) \ + +TEST (ld1) +TEST (ldff1) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 72 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 72 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_ext_gather_offset.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_ext_gather_offset.c new file mode 100644 index 00000000000..86a406b5487 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_ext_gather_offset.c @@ -0,0 +1,70 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define T1(F, RTY, TY1, TY2) \ + RTY F##_f (const TY1 *base, TY2 offsets) \ + { \ + return sv##F (svpfalse_b (), base, offsets); \ + } + +#define T2(F, RTY, TY1) \ + RTY F##_f (TY1 bases, int64_t offset) \ + { \ + return sv##F (svpfalse_b (), bases, offset); \ + } + +#define T5(F, RTY, TY1) \ + RTY F##_f (TY1 bases) \ + { \ + return sv##F (svpfalse_b (), bases); \ + } + +#define T3(F, B, RTY, TY, TYPE) \ + T1 (F##_gather_s##B##offset_##TY, RTY, TYPE, svint##B##_t) \ + T1 (F##_gather_u##B##offset_##TY, RTY, TYPE, svuint##B##_t) + +#define T4(F, B, RTY, TY) \ + T2 (F##_gather_##TY##base_offset_s##B, svint##B##_t, RTY) \ + T2 (F##_gather_##TY##base_offset_u##B, svuint##B##_t, RTY) \ + T5 (F##_gather_##TY##base_s##B, svint##B##_t, RTY) \ + T5 (F##_gather_##TY##base_u##B, svuint##B##_t, RTY) + +#define TEST(F) \ + T3 (F##sb, 32, svint32_t, s32, int8_t) \ + T3 (F##sb, 32, svuint32_t, u32, int8_t) \ + T3 (F##sb, 64, svint64_t, s64, int8_t) \ + T3 (F##sb, 64, svuint64_t, u64, int8_t) \ + T4 (F##sb, 32, svuint32_t, u32) \ + T4 (F##sb, 64, svuint64_t, u64) \ + T3 (F##sh, 32, svint32_t, s32, int16_t) \ + T3 (F##sh, 32, svuint32_t, u32, int16_t) \ + T3 (F##sh, 64, svint64_t, s64, int16_t) \ + T3 (F##sh, 64, svuint64_t, u64, int16_t) \ + T4 (F##sh, 32, svuint32_t, u32) \ + T4 (F##sh, 64, svuint64_t, u64) \ + T3 (F##sw, 64, svint64_t, s64, int32_t) \ + T3 (F##sw, 64, svuint64_t, u64, int32_t) \ + T4 (F##sw, 64, svuint64_t, u64) \ + T3 (F##ub, 32, svint32_t, s32, uint8_t) \ + T3 (F##ub, 32, svuint32_t, u32, uint8_t) \ + T3 (F##ub, 64, svint64_t, s64, uint8_t) \ + T3 (F##ub, 64, svuint64_t, u64, uint8_t) \ + T4 (F##ub, 32, svuint32_t, u32) \ + T4 (F##ub, 64, svuint64_t, u64) \ + T3 (F##uh, 32, svint32_t, s32, uint16_t) \ + T3 (F##uh, 32, svuint32_t, u32, uint16_t) \ + T3 (F##uh, 64, svint64_t, s64, uint16_t) \ + T3 (F##uh, 64, svuint64_t, u64, uint16_t) \ + T4 (F##uh, 32, svuint32_t, u32) \ + T4 (F##uh, 64, svuint64_t, u64) \ + T3 (F##uw, 64, svint64_t, s64, uint32_t) \ + T3 (F##uw, 64, svuint64_t, u64, uint32_t) \ + T4 (F##uw, 64, svuint64_t, u64) \ + +TEST (ld1) +TEST (ldff1) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 160 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 160 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_gather_sv.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_gather_sv.c new file mode 100644 index 00000000000..d87ae23d757 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_gather_sv.c @@ -0,0 +1,31 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define T1(F, RTY, TY1, TY2) \ + RTY F##_f (TY1 *base, TY2 values) \ + { \ + return sv##F (svpfalse_b (), base, values); \ + } + +#define T3(F, TY, B) \ + T1 (F##_f##B, svfloat##B##_t, float##B##_t, TY) \ + T1 (F##_s##B, svint##B##_t, int##B##_t, TY) \ + T1 (F##_u##B, svuint##B##_t, uint##B##_t, TY) \ + +#define T2(F, B) \ + T3 (F##_gather_u##B##offset, svuint##B##_t, B) \ + T3 (F##_gather_u##B##index, svuint##B##_t, B) \ + T3 (F##_gather_s##B##offset, svint##B##_t, B) \ + T3 (F##_gather_s##B##index, svint##B##_t, B) + +#define SD_DATA(F) \ + T2 (F, 32) \ + T2 (F, 64) + +SD_DATA (ld1) +SD_DATA (ldff1) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 48 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 48 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_gather_vs.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_gather_vs.c new file mode 100644 index 00000000000..7f05ffb753b --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_gather_vs.c @@ -0,0 +1,36 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define T1(F, RTY, TY) \ + RTY F##_f (TY bases) \ + { \ + return sv##F (svpfalse_b (), bases); \ + } + +#define T2(F, RTY, TY) \ + RTY F##_f (TY bases, int64_t value) \ + { \ + return sv##F (svpfalse_b (), bases, value); \ + } + +#define T4(F, TY, TEST, B) \ + TEST (F##_f##B, svfloat##B##_t, TY) \ + TEST (F##_s##B, svint##B##_t, TY) \ + TEST (F##_u##B, svuint##B##_t, TY) \ + +#define T3(F, B) \ + T4 (F##_gather_u##B##base, svuint##B##_t, T1, B) \ + T4 (F##_gather_u##B##base_offset, svuint##B##_t, T2, B) \ + T4 (F##_gather_u##B##base_index, svuint##B##_t, T2, B) + +#define SD_DATA(F) \ + T3 (F, 32) \ + T3 (F, 64) + +SD_DATA (ld1) +SD_DATA (ldff1) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 36 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 36 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_replicate.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_replicate.c new file mode 100644 index 00000000000..468cc220230 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_replicate.c @@ -0,0 +1,30 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=armv8.2-a+sve+f64mm" } */ + +#include <arm_sve.h> + +#define T(F, TY) \ + sv##TY F##_f (const TY *base) \ + { \ + return sv##F (svpfalse_b (), base); \ + } + +#define ALL_DATA(F) \ + T (F##_bf16, bfloat16_t) \ + T (F##_f16, float16_t) \ + T (F##_f32, float32_t) \ + T (F##_f64, float64_t) \ + T (F##_s8, int8_t) \ + T (F##_s16, int16_t) \ + T (F##_s32, int32_t) \ + T (F##_s64, int64_t) \ + T (F##_u8, uint8_t) \ + T (F##_u16, uint16_t) \ + T (F##_u32, uint32_t) \ + T (F##_u64, uint64_t) \ + +ALL_DATA (ld1rq) +ALL_DATA (ld1ro) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 24 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 24 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-prefetch.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-prefetch.c new file mode 100644 index 00000000000..4e191c3021c --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-prefetch.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define T(F) \ + void F##_f (const void *base) \ + { \ + return sv##F (svpfalse_b (), base, 0); \ + } + +#define ALL_PREFETCH \ + T (prfb) \ + T (prfh) \ + T (prfw) \ + T (prfd) + +ALL_PREFETCH + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 4 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 4 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-prefetch_gather_index.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-prefetch_gather_index.c new file mode 100644 index 00000000000..c42a8a2ab7c --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-prefetch_gather_index.c @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define T(F, TYPE, TY) \ + void F##_##TY##_f (const void *base, sv##TYPE indices) \ + { \ + return sv##F##_##TY##index (svpfalse_b (), base, indices, 0);\ + } + +#define T1(F) \ + T (F, uint32_t, u32) \ + T (F, uint64_t, u64) \ + T (F, int32_t, s32) \ + T (F, int64_t, s64) + +#define ALL_PREFETCH_GATHER_INDEX \ + T1 (prfh_gather) \ + T1 (prfw_gather) \ + T1 (prfd_gather) + +ALL_PREFETCH_GATHER_INDEX + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 12 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 12 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-prefetch_gather_offset.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-prefetch_gather_offset.c new file mode 100644 index 00000000000..060a533f168 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-prefetch_gather_offset.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define T(F, TYPE, TY) \ + void F##_##TY##_f (const void *base, sv##TYPE offsets) \ + { \ + return sv##F##_##TY##offset (svpfalse_b (), base, offsets, 0); \ + } + +#define T1(F) \ + T (F, uint32_t, u32) \ + T (F, uint64_t, u64) \ + T (F, int32_t, s32) \ + T (F, int64_t, s64) + +#define ALL_PREFETCH_GATHER_OFFSET \ + T1 (prfb_gather) \ + +ALL_PREFETCH_GATHER_OFFSET + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 4 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 4 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-ptest.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-ptest.c new file mode 100644 index 00000000000..12408863a61 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-ptest.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-unary_0.c" + +BOOL (ptest_any, IMPLICITn) +BOOL (ptest_first, IMPLICITn) +BOOL (ptest_last, IMPLICITn) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmov\tw0, 0\n\tret\n} 3 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 3 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-rdffr.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-rdffr.c new file mode 100644 index 00000000000..550001a212f --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-rdffr.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +svbool_t rdffr_f () +{ + return svrdffr_z (svpfalse_b ()); +} + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tpfalse\tp0\.b\n\tret\n} 1 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 1 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-reduction.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-reduction.c new file mode 100644 index 00000000000..9e74b0d1a35 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-reduction.c @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +#include "../pfalse-unary_0.c" + +ALL_INTEGER_SCALAR (andv, IMPLICIT) +ALL_INTEGER_SCALAR (eorv, IMPLICIT) +ALL_ARITH_SCALAR (maxv, IMPLICIT) +ALL_ARITH_SCALAR (minv, IMPLICIT) +ALL_INTEGER_SCALAR (orv, IMPLICIT) +ALL_FLOAT_SCALAR (maxnmv, IMPLICIT) +ALL_FLOAT_SCALAR (minnmv, IMPLICIT) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmov\t[wx]0, 0\n\tret\n} 20 } } */ +/* { dg-final { scan-tree-dump-times "return Nan" 6 "optimized" } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmov\t[wxz]0(?:\.[sd])?, #?-?[1-9]+[0-9]*\n\tret\n} 27 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\t(movi|mvni)\tv0\.(2s|4h), 0x[0-9a-f]+, [a-z]+ [0-9]+\n\tret\n} 5 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 52 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-reduction_wide.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-reduction_wide.c new file mode 100644 index 00000000000..404f87f3fe3 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-reduction_wide.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-unary_0.c" + +ALL_ARITH_SCALAR_WIDE (addv, IMPLICIT) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[dwvx]0(?:\.(2s|4h))?, #?0\n\tret\n} 11 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 11 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-shift_right_imm.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-shift_right_imm.c new file mode 100644 index 00000000000..0fb7db44ba8 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-shift_right_imm.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define MXZ(F, TY) \ + TY F##_f (TY op1) \ + { \ + return sv##F (svpfalse_b (), op1, 2); \ + } + +#define PRED_MXZn(F, TYPE, TY) \ + MXZ (F##_n_##TY##_m, TYPE) \ + MXZ (F##_n_##TY##_x, TYPE) \ + MXZ (F##_n_##TY##_z, TYPE) + +#define ALL_SIGNED_IMM(F, P) \ + PRED_##P (F, svint8_t, s8) \ + PRED_##P (F, svint16_t, s16) \ + PRED_##P (F, svint32_t, s32) \ + PRED_##P (F, svint64_t, s64) + +ALL_SIGNED_IMM (asrd, MXZn) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 4 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 8 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 12 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-store.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-store.c new file mode 100644 index 00000000000..3da28712988 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-store.c @@ -0,0 +1,50 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define T(F, TY1, TY2) \ + void F##_f (TY1 *base, sv##TY2 data) \ + { \ + return sv##F (svpfalse_b (), base, data); \ + } + +#define D_INTEGER(F, TY) \ + T (F##_s64, TY, int64_t) \ + T (F##_u64, TY, uint64_t) + +#define SD_INTEGER(F, TY) \ + D_INTEGER (F, TY) \ + T (F##_s32, TY, int32_t) \ + T (F##_u32, TY, uint32_t) + +#define HSD_INTEGER(F, TY) \ + SD_INTEGER (F, TY) \ + T (F##_s16, TY, int16_t) \ + T (F##_u16, TY, uint16_t) + +#define ALL_DATA(F, A) \ + T (F##_bf16, bfloat16_t, bfloat16##A) \ + T (F##_f16, float16_t, float16##A) \ + T (F##_f32, float32_t, float32##A) \ + T (F##_f64, float64_t, float64##A) \ + T (F##_s8, int8_t, int8##A) \ + T (F##_s16, int16_t, int16##A) \ + T (F##_s32, int32_t, int32##A) \ + T (F##_s64, int64_t, int64##A) \ + T (F##_u8, uint8_t, uint8##A) \ + T (F##_u16, uint16_t, uint16##A) \ + T (F##_u32, uint32_t, uint32##A) \ + T (F##_u64, uint64_t, uint64##A) \ + +HSD_INTEGER (st1b, int8_t) +SD_INTEGER (st1h, int16_t) +D_INTEGER (st1w, int32_t) +ALL_DATA (st1, _t) +ALL_DATA (st2, x2_t) +ALL_DATA (st3, x3_t) +ALL_DATA (st4, x4_t) + + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 12 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 60 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-store_scatter_index.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-store_scatter_index.c new file mode 100644 index 00000000000..5db6922b1c3 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-store_scatter_index.c @@ -0,0 +1,39 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define T1(F, TY1, TY2, TY3) \ + void F##_f (TY1 *base, TY2 indices, TY3 data) \ + { \ + sv##F (svpfalse_b (), base, indices, data); \ + } + +#define T2(F, TY1, TY3) \ + void F##_f (TY1 bases, int64_t index, TY3 data) \ + { \ + sv##F (svpfalse_b (), bases, index, data); \ + } + +#define T3(F, B, TYPE1, TY, TYPE2) \ + T1 (F##_scatter_s##B##index_##TY, TYPE2, svint##B##_t, TYPE1) \ + T1 (F##_scatter_u##B##index_##TY, TYPE2, svuint##B##_t, TYPE1) + +#define T4(F, B, TYPE1, TY) \ + T2 (F##_scatter_u##B##base_index_##TY, svuint##B##_t, TYPE1) + +#define TEST(F) \ + T3 (F##h, 32, svint32_t, s32, int16_t) \ + T3 (F##h, 32, svuint32_t, u32, int16_t) \ + T3 (F##h, 64, svint64_t, s64, int16_t) \ + T3 (F##h, 64, svuint64_t, u64, int16_t) \ + T4 (F##h, 32, svuint32_t, u32) \ + T4 (F##h, 64, svuint64_t, u64) \ + T3 (F##w, 64, svint64_t, s64, int32_t) \ + T3 (F##w, 64, svuint64_t, u64, int32_t) \ + T4 (F##w, 64, svuint64_t, u64) \ + +TEST (st1) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 15 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 15 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-store_scatter_offset.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-store_scatter_offset.c new file mode 100644 index 00000000000..760797ad236 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-store_scatter_offset.c @@ -0,0 +1,67 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define T1(F, TY1, TY2, TY3) \ + void F##_f (TY1 *base, TY2 offsets, TY3 data) \ + { \ + sv##F (svpfalse_b (), base, offsets, data); \ + } + +#define T2(F, TY1, TY3) \ + void F##_f (TY1 bases, int64_t offset, TY3 data) \ + { \ + sv##F (svpfalse_b (), bases, offset, data); \ + } + +#define T5(F, TY1, TY3) \ + void F##_f (TY1 bases, TY3 data) \ + { \ + sv##F (svpfalse_b (), bases, data); \ + } + + +#define T3(F, B, TYPE1, TY, TYPE2) \ + T1 (F##_scatter_s##B##offset_##TY, TYPE2, svint##B##_t, TYPE1)\ + T1 (F##_scatter_u##B##offset_##TY, TYPE2, svuint##B##_t, TYPE1) + +#define T4(F, B, TYPE1, TY) \ + T2 (F##_scatter_u##B##base_offset_##TY, svuint##B##_t, TYPE1) \ + T5 (F##_scatter_u##B##base_##TY, svuint##B##_t, TYPE1) + +#define D_INTEGER(F, BHW) \ + T3 (F, 64, svint64_t, s64, int##BHW##_t) \ + T3 (F, 64, svuint64_t, u64, int##BHW##_t) \ + T4 (F, 64, svint64_t, s64) \ + T4 (F, 64, svuint64_t, u64) + +#define SD_INTEGER(F, BHW) \ + D_INTEGER (F, BHW) \ + T3 (F, 32, svint32_t, s32, int##BHW##_t) \ + T3 (F, 32, svuint32_t, u32, int##BHW##_t) \ + T4 (F, 32, svint32_t, s32) \ + T4 (F, 32, svuint32_t, u32) + +#define SD_DATA(F) \ + T3 (F, 32, svint32_t, s32, int32_t) \ + T3 (F, 64, svint64_t, s64, int64_t) \ + T4 (F, 32, svint32_t, s32) \ + T4 (F, 64, svint64_t, s64) \ + T3 (F, 32, svuint32_t, u32, uint32_t) \ + T3 (F, 64, svuint64_t, u64, uint64_t) \ + T4 (F, 32, svuint32_t, u32) \ + T4 (F, 64, svuint64_t, u64) \ + T3 (F, 32, svfloat32_t, f32, float32_t) \ + T3 (F, 64, svfloat64_t, f64, float64_t) \ + T4 (F, 32, svfloat32_t, f32) \ + T4 (F, 64, svfloat64_t, f64) + + +SD_DATA (st1) +SD_INTEGER (st1b, 8) +SD_INTEGER (st1h, 16) +D_INTEGER (st1w, 32) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 64 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 64 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-storexn.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-storexn.c new file mode 100644 index 00000000000..a3bef095a81 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-storexn.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define T(F, TY) \ + void F##_f (TY *base, sv##TY data) \ + { \ + return sv##F (svpfalse_b (), base, data); \ + } + +#define ALL_DATA(F) \ + T (F##_bf16, bfloat16_t) \ + T (F##_f16, float16_t) \ + T (F##_f32, float32_t) \ + T (F##_f64, float64_t) \ + T (F##_s8, int8_t) \ + T (F##_s16, int16_t) \ + T (F##_s32, int32_t) \ + T (F##_s64, int64_t) \ + T (F##_u8, uint8_t) \ + T (F##_u16, uint16_t) \ + T (F##_u32, uint32_t) \ + T (F##_u64, uint64_t) + +ALL_DATA (stnt1) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 12 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 12 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-ternary_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-ternary_opt_n.c new file mode 100644 index 00000000000..9512e56f079 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-ternary_opt_n.c @@ -0,0 +1,50 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define MXZ(F, TY12, TY3) \ + TY12 F##_f (TY12 op1, TY12 op2, TY3 op3) \ + { \ + return sv##F (svpfalse_b (), op1, op2, op3); \ + } + +#define PRED_MXZ(F, TYPE, TY) \ + MXZ (F##_##TY##_m, sv##TYPE, sv##TYPE) \ + MXZ (F##_n_##TY##_m, sv##TYPE, TYPE) \ + MXZ (F##_##TY##_x, sv##TYPE, sv##TYPE) \ + MXZ (F##_n_##TY##_x, sv##TYPE, TYPE) \ + MXZ (F##_##TY##_z, sv##TYPE, sv##TYPE) \ + MXZ (F##_n_##TY##_z, sv##TYPE, TYPE) + +#define ALL_FLOAT(F, P) \ + PRED_##P (F, float16_t, f16) \ + PRED_##P (F, float32_t, f32) \ + PRED_##P (F, float64_t, f64) + +#define ALL_INTEGER(F, P) \ + PRED_##P (F, uint8_t, u8) \ + PRED_##P (F, uint16_t, u16) \ + PRED_##P (F, uint32_t, u32) \ + PRED_##P (F, uint64_t, u64) \ + PRED_##P (F, int8_t, s8) \ + PRED_##P (F, int16_t, s16) \ + PRED_##P (F, int32_t, s32) \ + PRED_##P (F, int64_t, s64) \ + +#define ALL_ARITH(F, P) \ + ALL_INTEGER (F, P) \ + ALL_FLOAT (F, P) + +ALL_ARITH (mad, MXZ) +ALL_ARITH (mla, MXZ) +ALL_ARITH (mls, MXZ) +ALL_ARITH (msb, MXZ) +ALL_FLOAT (nmad, MXZ) +ALL_FLOAT (nmla, MXZ) +ALL_FLOAT (nmls, MXZ) +ALL_FLOAT (nmsb, MXZ) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 112 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 224 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 336 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-ternary_rotate.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-ternary_rotate.c new file mode 100644 index 00000000000..49b4cb5aff3 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-ternary_rotate.c @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define MXZ(F, TY) \ + TY F##_f (TY op1, TY op2, TY op3) \ + { \ + return sv##F (svpfalse_b (), op1, op2, op3, 90); \ + } + +#define PRED_MXZ(F, TYPE, TY) \ + MXZ (F##_##TY##_m, sv##TYPE) \ + MXZ (F##_##TY##_x, sv##TYPE) \ + MXZ (F##_##TY##_z, sv##TYPE) + +#define ALL_FLOAT(F, P) \ + PRED_##P (F, float16_t, f16) \ + PRED_##P (F, float32_t, f32) \ + PRED_##P (F, float64_t, f64) + +ALL_FLOAT (cmla, MXZ) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 3 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 6 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 9 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary.c new file mode 100644 index 00000000000..6d16341423c --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary.c @@ -0,0 +1,32 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-unary_0.c" + +ALL_FLOAT_AND_SIGNED (abs, MXZ) +B (brka, MZ) +B (brkb, MZ) +ALL_INTEGER (cnot, MXZ) +HSD_INTEGER (extb, MXZ) +SD_INTEGER (exth, MXZ) +D_INTEGER (extw, MXZ) +B (mov, Z) +ALL_FLOAT_AND_SIGNED (neg, MXZ) +ALL_INTEGER (not, MXZ) +B (not, Z) +B (pfirst, IMPLICIT) +ALL_INTEGER (rbit, MXZ) +ALL_FLOAT (recpx, MXZ) +HSD_INTEGER (revb, MXZ) +SD_INTEGER (revh, MXZ) +D_INTEGER (revw, MXZ) +ALL_FLOAT (rinti, MXZ) +ALL_FLOAT (rintx, MXZ) +ALL_FLOAT (rintz, MXZ) +ALL_FLOAT (sqrt, MXZ) +SD_DATA (compact, IMPLICIT) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 79 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 160 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tpfalse\tp0\.b\n\tret\n} 5 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 244 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_convert_narrowt.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_convert_narrowt.c new file mode 100644 index 00000000000..f18f2fcbe2c --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_convert_narrowt.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=armv8.2-a+sve+bf16" } */ + +#include <arm_sve.h> + +#define T(F, TYPE1, TY1, TYPE2, TY2) \ + TYPE1 F##_##TY1##_##TY2##_x_f (TYPE1 even, TYPE2 op) \ + { \ + return sv##F##_##TY1##_##TY2##_x (even, svpfalse_b (), op); \ + } \ + TYPE1 F##_##TY1##_##TY2##_m_f (TYPE1 even, TYPE2 op) \ + { \ + return sv##F##_##TY1##_##TY2##_m (even, svpfalse_b (), op); \ + } + +T (cvtnt, svbfloat16_t, bf16, svfloat32_t, f32) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 1 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?: [0-9]*[bhsd])?, #?0\n\tret\n} 1 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 2 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_convertxn.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_convertxn.c new file mode 100644 index 00000000000..676054c472e --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_convertxn.c @@ -0,0 +1,49 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=armv8.2-a+sve+bf16" } */ + +#include <arm_sve.h> + +#define T(TYPE1, TY1, TYPE2, TY2) \ + TYPE1 cvt_##TY1##_##TY2##_x_f (TYPE2 op) \ + { \ + return svcvt_##TY1##_##TY2##_x (svpfalse_b (), op); \ + } \ + TYPE1 cvt_##TY1##_##TY2##_z_f (TYPE2 op) \ + { \ + return svcvt_##TY1##_##TY2##_z (svpfalse_b (), op); \ + } \ + TYPE1 cvt_##TY1##_##TY2##_m_f (TYPE1 inactive, TYPE2 op) \ + { \ + return svcvt_##TY1##_##TY2##_m (inactive, svpfalse_b (), op); \ + } + +#define SWAP(TYPE1, TY1, TYPE2, TY2) \ + T (TYPE1, TY1, TYPE2, TY2) \ + T (TYPE2, TY2, TYPE1, TY1) + +#define TEST_ALL \ + T (svbfloat16_t, bf16, svfloat32_t, f32) \ + SWAP (svfloat16_t, f16, svfloat32_t, f32) \ + SWAP (svfloat16_t, f16, svfloat64_t, f64) \ + SWAP (svfloat32_t, f32, svfloat64_t, f64) \ + SWAP (svint16_t, s16, svfloat16_t, f16) \ + SWAP (svint32_t, s32, svfloat16_t, f16) \ + SWAP (svint32_t, s32, svfloat32_t, f32) \ + SWAP (svint32_t, s32, svfloat64_t, f64) \ + SWAP (svint64_t, s64, svfloat16_t, f16) \ + SWAP (svint64_t, s64, svfloat32_t, f32) \ + SWAP (svint64_t, s64, svfloat64_t, f64) \ + SWAP (svuint16_t, u16, svfloat16_t, f16) \ + SWAP (svuint32_t, u32, svfloat16_t, f16) \ + SWAP (svuint32_t, u32, svfloat32_t, f32) \ + SWAP (svuint32_t, u32, svfloat64_t, f64) \ + SWAP (svuint64_t, u64, svfloat16_t, f16) \ + SWAP (svuint64_t, u64, svfloat32_t, f32) \ + SWAP (svuint64_t, u64, svfloat64_t, f64) + +TEST_ALL + + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 35 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 70 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 105 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_n.c new file mode 100644 index 00000000000..e161f70f7ac --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_n.c @@ -0,0 +1,42 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define M(F, TY) \ + sv##TY F##_f (sv##TY inactive, TY op) \ + { \ + return sv##F (inactive, svpfalse_b (), op); \ + } + +#define XZ(F, TY) \ + sv##TY F##_f (TY op) \ + { \ + return sv##F (svpfalse_b (), op); \ + } + +#define PRED_MXZ(F, TYPE, TY) \ + M (F##_##TY##_m, TYPE) \ + XZ (F##_##TY##_x, TYPE) \ + XZ (F##_##TY##_z, TYPE) + +#define ALL_DATA(F, P) \ + PRED_##P (F, uint8_t, u8) \ + PRED_##P (F, uint16_t, u16) \ + PRED_##P (F, uint32_t, u32) \ + PRED_##P (F, uint64_t, u64) \ + PRED_##P (F, int8_t, s8) \ + PRED_##P (F, int16_t, s16) \ + PRED_##P (F, int32_t, s32) \ + PRED_##P (F, int64_t, s64) \ + PRED_##P (F, float16_t, f16) \ + PRED_##P (F, float32_t, f32) \ + PRED_##P (F, float64_t, f64) \ + PRED_##P (F, bfloat16_t, bf16) \ + +ALL_DATA (dup, MXZ) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 12 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 12 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmov\tz0(?:\.[bhsd])?, [wxhsd]0\n\tret\n} 12 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 36 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_pred.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_pred.c new file mode 100644 index 00000000000..de98eff6e15 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_pred.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-unary_0.c" + +BN (pnext, IMPLICIT) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tpfalse\tp0\.b\n\tret\n} 4 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 4 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_to_uint.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_to_uint.c new file mode 100644 index 00000000000..c825997b07c --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_to_uint.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-unary_0.c" + +ALL_SIGNED_UINT (cls, MXZ) +ALL_INTEGER_UINT (clz, MXZ) +ALL_DATA_UINT (cnt, MXZ) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 24 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 48 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 72 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unaryxn.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unaryxn.c new file mode 100644 index 00000000000..d086adfa2bc --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unaryxn.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-unary_0.c" + +ALL_FLOAT (rinta, MXZ) +ALL_FLOAT (rintm, MXZ) +ALL_FLOAT (rintn, MXZ) +ALL_FLOAT (rintp, MXZ) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 12} } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 24 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 36 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary.c new file mode 100644 index 00000000000..95ad8f14641 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.c" + +ALL_ARITH (addp, MXv) +ALL_ARITH (maxp, MXv) +ALL_FLOAT (maxnmp, MXv) +ALL_ARITH (minp, MXv) +ALL_FLOAT (minnmp, MXv) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 39 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 39 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 78 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_int_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_int_opt_n.c new file mode 100644 index 00000000000..98a62e3c54f --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_int_opt_n.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.c" + +ALL_INTEGER_INT (qshl, MXZ) +ALL_INTEGER_INT (qrshl, MXZ) +ALL_UNSIGNED_INT (sqadd, MXZ) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 40 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 80 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 120 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_int_opt_single_n.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_int_opt_single_n.c new file mode 100644 index 00000000000..7ba6ba09178 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_int_opt_single_n.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.c" + +ALL_INTEGER_INT (rshl, MXZ) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 16 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 32 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 48 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_opt_n.c new file mode 100644 index 00000000000..a6015872bdc --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_opt_n.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.c" + +ALL_INTEGER (hadd, MXZ) +ALL_INTEGER (hsub, MXZ) +ALL_INTEGER (hsubr, MXZ) +ALL_INTEGER (qadd, MXZ) +ALL_INTEGER (qsub, MXZ) +ALL_INTEGER (qsubr, MXZ) +ALL_INTEGER (rhadd, MXZ) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 112 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 224 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 336 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_opt_single_n.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_opt_single_n.c new file mode 100644 index 00000000000..37c8a099d02 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_opt_single_n.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=armv8.2-a+sve2+faminmax" } */ + +#include "../pfalse-binary_0.c" + +ALL_FLOAT (amax, MXZ) +ALL_FLOAT (amin, MXZ) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 12 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 24 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 36 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_to_uint.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_to_uint.c new file mode 100644 index 00000000000..e03e1e890f8 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_to_uint.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.c" + +SD_INTEGER_TO_UINT (histcnt, Zv) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 4 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 4 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_uint_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_uint_opt_n.c new file mode 100644 index 00000000000..93e3a67f012 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_uint_opt_n.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.c" + +ALL_SIGNED_UINT (uqadd, MXZ) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 8 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 16 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 24 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_wide.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_wide.c new file mode 100644 index 00000000000..21f27e84d4f --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_wide.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.c" + +HSD_INTEGER_WIDE (adalp, MXZv) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 6 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 12 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 18 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-compare.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-compare.c new file mode 100644 index 00000000000..c5c179d281f --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-compare.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.c" + +BH_INTEGER_BOOL (match, IMPLICITv) +BH_INTEGER_BOOL (nmatch, IMPLICITv) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tpfalse\tp0\.b\n\tret\n} 8 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 8 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_ext_gather_index_restricted.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_ext_gather_index_restricted.c new file mode 100644 index 00000000000..40c132ba6ac --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_ext_gather_index_restricted.c @@ -0,0 +1,45 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define T1(F, RTY, TY1, TY2) \ + RTY F##_f (const TY1 *base, TY2 indices) \ + { \ + return sv##F (svpfalse_b (), base, indices); \ + } + +#define T2(F, RTY, TY1) \ + RTY F##_f (TY1 bases, int64_t index) \ + { \ + return sv##F (svpfalse_b (), bases, index); \ + } + +#define T3(F, B, RTY, TY, TYPE) \ + T1 (F##_gather_s##B##index_##TY, RTY, TYPE, svint##B##_t) \ + T1 (F##_gather_u##B##index_##TY, RTY, TYPE, svuint##B##_t) + +#define T4(F, B, RTY, TY) \ + T2 (F##_gather_##TY##base_index_s##B, svint##B##_t, RTY) \ + T2 (F##_gather_##TY##base_index_u##B, svuint##B##_t, RTY) + +#define TEST(F) \ + T3 (F##sh, 64, svint64_t, s64, int16_t) \ + T3 (F##sh, 64, svuint64_t, u64, int16_t) \ + T4 (F##sh, 32, svuint32_t, u32) \ + T4 (F##sh, 64, svuint64_t, u64) \ + T3 (F##sw, 64, svint64_t, s64, int32_t) \ + T3 (F##sw, 64, svuint64_t, u64, int32_t) \ + T4 (F##sw, 64, svuint64_t, u64) \ + T3 (F##uh, 64, svint64_t, s64, uint16_t) \ + T3 (F##uh, 64, svuint64_t, u64, uint16_t) \ + T4 (F##uh, 32, svuint32_t, u32) \ + T4 (F##uh, 64, svuint64_t, u64) \ + T3 (F##uw, 64, svint64_t, s64, uint32_t) \ + T3 (F##uw, 64, svuint64_t, u64, uint32_t) \ + T4 (F##uw, 64, svuint64_t, u64) \ + +TEST (ldnt1) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 28 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 28 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_ext_gather_offset_restricted.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_ext_gather_offset_restricted.c new file mode 100644 index 00000000000..05e47bcebac --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_ext_gather_offset_restricted.c @@ -0,0 +1,64 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define T1(F, RTY, TY1, TY2) \ + RTY F##_f (const TY1 *base, TY2 offsets) \ + { \ + return sv##F (svpfalse_b (), base, offsets); \ + } + +#define T2(F, RTY, TY1) \ + RTY F##_f (TY1 bases, int64_t offset) \ + { \ + return sv##F (svpfalse_b (), bases, offset); \ + } + +#define T5(F, RTY, TY1) \ + RTY F##_f (TY1 bases) \ + { \ + return sv##F (svpfalse_b (), bases); \ + } + +#define T3(F, B, RTY, TY, TYPE) \ + T1 (F##_gather_u##B##offset_##TY, RTY, TYPE, svuint##B##_t) + +#define T4(F, B, RTY, TY) \ + T2 (F##_gather_##TY##base_offset_s##B, svint##B##_t, RTY) \ + T2 (F##_gather_##TY##base_offset_u##B, svuint##B##_t, RTY) \ + T5 (F##_gather_##TY##base_s##B, svint##B##_t, RTY) \ + T5 (F##_gather_##TY##base_u##B, svuint##B##_t, RTY) + +#define TEST(F) \ + T3 (F##sb, 32, svuint32_t, u32, int8_t) \ + T3 (F##sb, 64, svint64_t, s64, int8_t) \ + T3 (F##sb, 64, svuint64_t, u64, int8_t) \ + T4 (F##sb, 32, svuint32_t, u32) \ + T4 (F##sb, 64, svuint64_t, u64) \ + T3 (F##sh, 32, svuint32_t, u32, int16_t) \ + T3 (F##sh, 64, svint64_t, s64, int16_t) \ + T3 (F##sh, 64, svuint64_t, u64, int16_t) \ + T4 (F##sh, 32, svuint32_t, u32) \ + T4 (F##sh, 64, svuint64_t, u64) \ + T3 (F##sw, 64, svint64_t, s64, int32_t) \ + T3 (F##sw, 64, svuint64_t, u64, int32_t) \ + T4 (F##sw, 64, svuint64_t, u64) \ + T3 (F##ub, 32, svuint32_t, u32, uint8_t) \ + T3 (F##ub, 64, svint64_t, s64, uint8_t) \ + T3 (F##ub, 64, svuint64_t, u64, uint8_t) \ + T4 (F##ub, 32, svuint32_t, u32) \ + T4 (F##ub, 64, svuint64_t, u64) \ + T3 (F##uh, 32, svuint32_t, u32, uint16_t) \ + T3 (F##uh, 64, svint64_t, s64, uint16_t) \ + T3 (F##uh, 64, svuint64_t, u64, uint16_t) \ + T4 (F##uh, 32, svuint32_t, u32) \ + T4 (F##uh, 64, svuint64_t, u64) \ + T3 (F##uw, 64, svint64_t, s64, uint32_t) \ + T3 (F##uw, 64, svuint64_t, u64, uint32_t) \ + T4 (F##uw, 64, svuint64_t, u64) \ + +TEST (ldnt1) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 56 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 56 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_gather_sv_restricted.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_gather_sv_restricted.c new file mode 100644 index 00000000000..2ca3e29dbc8 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_gather_sv_restricted.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define T2(F, RTY, TY1, TY2) \ + RTY F##_f (TY1 *base, TY2 values) \ + { \ + return sv##F (svpfalse_b (), base, values); \ + } + +#define T4(F, TY, B) \ + T2 (F##_f##B, svfloat##B##_t, float##B##_t, TY) \ + T2 (F##_s##B, svint##B##_t, int##B##_t, TY) \ + T2 (F##_u##B, svuint##B##_t, uint##B##_t, TY) \ + +#define SD_DATA(F) \ + T4 (F##_gather_s64index, svint64_t, 64) \ + T4 (F##_gather_u64index, svuint64_t, 64) \ + T4 (F##_gather_u32offset, svuint32_t, 32) \ + T4 (F##_gather_u64offset, svuint64_t, 64) \ + T4 (F##_gather_s64offset, svint64_t, 64) \ + +SD_DATA (ldnt1) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 15 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 15 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_gather_vs.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_gather_vs.c new file mode 100644 index 00000000000..e6c2e25ef52 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_gather_vs.c @@ -0,0 +1,35 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define T1(F, RTY, TY) \ + RTY F##_f (TY bases) \ + { \ + return sv##F (svpfalse_b (), bases); \ + } + +#define T2(F, RTY, TY) \ + RTY F##_f (TY bases, int64_t value) \ + { \ + return sv##F (svpfalse_b (), bases, value); \ + } + +#define T4(F, TY, TEST, B) \ + TEST (F##_f##B, svfloat##B##_t, TY) \ + TEST (F##_s##B, svint##B##_t, TY) \ + TEST (F##_u##B, svuint##B##_t, TY) \ + +#define T3(F, B) \ + T4 (F##_gather_u##B##base, svuint##B##_t, T1, B) \ + T4 (F##_gather_u##B##base_offset, svuint##B##_t, T2, B) \ + T4 (F##_gather_u##B##base_index, svuint##B##_t, T2, B) + +#define SD_DATA(F) \ + T3 (F, 32) \ + T3 (F, 64) + +SD_DATA (ldnt1) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 18 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 18 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-shift_left_imm_to_uint.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-shift_left_imm_to_uint.c new file mode 100644 index 00000000000..dbc1850ce31 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-shift_left_imm_to_uint.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define MXZ(F, RTY, TY) \ + RTY F##_f (TY op1) \ + { \ + return sv##F (svpfalse_b (), op1, 2); \ + } + +#define PRED_MXZn(F, RTY, TYPE, TY) \ + MXZ (F##_n_##TY##_m, RTY, TYPE) \ + MXZ (F##_n_##TY##_x, RTY, TYPE) \ + MXZ (F##_n_##TY##_z, RTY, TYPE) + +#define ALL_SIGNED_IMM(F, P) \ + PRED_##P (F, svuint8_t, svint8_t, s8) \ + PRED_##P (F, svuint16_t, svint16_t, s16) \ + PRED_##P (F, svuint32_t, svint32_t, s32) \ + PRED_##P (F, svuint64_t, svint64_t, s64) + +ALL_SIGNED_IMM (qshlu, MXZn) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 4 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 8 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 12 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-shift_right_imm.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-shift_right_imm.c new file mode 100644 index 00000000000..3bf30d17c97 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-shift_right_imm.c @@ -0,0 +1,31 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define MXZ(F, TY) \ + TY F##_f (TY op1) \ + { \ + return sv##F (svpfalse_b (), op1, 2); \ + } + +#define PRED_MXZn(F, TYPE, TY) \ + MXZ (F##_n_##TY##_m, TYPE) \ + MXZ (F##_n_##TY##_x, TYPE) \ + MXZ (F##_n_##TY##_z, TYPE) + +#define ALL_INTEGER_IMM(F, P) \ + PRED_##P (F, svuint8_t, u8) \ + PRED_##P (F, svuint16_t, u16) \ + PRED_##P (F, svuint32_t, u32) \ + PRED_##P (F, svuint64_t, u64) \ + PRED_##P (F, svint8_t, s8) \ + PRED_##P (F, svint16_t, s16) \ + PRED_##P (F, svint32_t, s32) \ + PRED_##P (F, svint64_t, s64) + +ALL_INTEGER_IMM (rshr, MXZn) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 8 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 16 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 24 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-store_scatter_index_restricted.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-store_scatter_index_restricted.c new file mode 100644 index 00000000000..644124a0f7f --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-store_scatter_index_restricted.c @@ -0,0 +1,49 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define T1(F, TY1, TY2, TY3) \ + void F##_f (TY1 *base, TY2 indices, TY3 data) \ + { \ + sv##F (svpfalse_b (), base, indices, data); \ + } + +#define T2(F, TY1, TY3) \ + void F##_f (TY1 bases, int64_t index, TY3 data) \ + { \ + sv##F (svpfalse_b (), bases, index, data); \ + } + +#define T3(F, B, TYPE1, TY, TYPE2) \ + T1 (F##_scatter_s##B##index_##TY, TYPE2, svint##B##_t, TYPE1) \ + T1 (F##_scatter_u##B##index_##TY, TYPE2, svuint##B##_t, TYPE1) + +#define T4(F, B, TYPE1, TY) \ + T2 (F##_scatter_u##B##base_index_##TY, svuint##B##_t, TYPE1) + +#define TEST(F) \ + T3 (F##h, 64, svint64_t, s64, int16_t) \ + T3 (F##h, 64, svuint64_t, u64, int16_t) \ + T4 (F##h, 32, svuint32_t, u32) \ + T4 (F##h, 64, svuint64_t, u64) \ + T3 (F##w, 64, svint64_t, s64, int32_t) \ + T3 (F##w, 64, svuint64_t, u64, int32_t) \ + T4 (F##w, 64, svuint64_t, u64) \ + +#define SD_DATA(F) \ + T3 (F, 64, svfloat64_t, f64, float64_t) \ + T3 (F, 64, svint64_t, s64, int64_t) \ + T3 (F, 64, svuint64_t, u64, uint64_t) \ + T4 (F, 32, svfloat32_t, f32) \ + T4 (F, 32, svint32_t, s32) \ + T4 (F, 32, svuint32_t, u32) \ + T4 (F, 64, svfloat64_t, f64) \ + T4 (F, 64, svint64_t, s64) \ + T4 (F, 64, svuint64_t, u64) \ + +TEST (stnt1) +SD_DATA (stnt1) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 23 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 23 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-store_scatter_offset_restricted.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-store_scatter_offset_restricted.c new file mode 100644 index 00000000000..350c16b6ed1 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-store_scatter_offset_restricted.c @@ -0,0 +1,64 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define T1(F, TY1, TY2, TY3) \ + void F##_f (TY1 *base, TY2 offsets, TY3 data) \ + { \ + sv##F (svpfalse_b (), base, offsets, data); \ + } + +#define T2(F, TY1, TY3) \ + void F##_f (TY1 bases, int64_t offset, TY3 data) \ + { \ + sv##F (svpfalse_b (), bases, offset, data); \ + } + +#define T5(F, TY1, TY3) \ + void F##_f (TY1 bases, TY3 data) \ + { \ + sv##F (svpfalse_b (), bases, data); \ + } + + +#define T3(F, B, TYPE1, TY, TYPE2) \ + T1 (F##_scatter_u##B##offset_##TY, TYPE2, svuint##B##_t, TYPE1) + +#define T4(F, B, TYPE1, TY) \ + T2 (F##_scatter_u##B##base_offset_##TY, svuint##B##_t, TYPE1) \ + T5 (F##_scatter_u##B##base_##TY, svuint##B##_t, TYPE1) + +#define D_INTEGER(F, BHW) \ + T3 (F, 64, svint64_t, s64, int##BHW##_t) \ + T3 (F, 64, svuint64_t, u64, int##BHW##_t) \ + T4 (F, 64, svint64_t, s64) \ + T4 (F, 64, svuint64_t, u64) + +#define SD_INTEGER(F, BHW) \ + D_INTEGER (F, BHW) \ + T3 (F, 32, svuint32_t, u32, int##BHW##_t) \ + T4 (F, 32, svint32_t, s32) \ + T4 (F, 32, svuint32_t, u32) + +#define SD_DATA(F) \ + T3 (F, 64, svint64_t, s64, int64_t) \ + T4 (F, 32, svint32_t, s32) \ + T4 (F, 64, svint64_t, s64) \ + T3 (F, 32, svuint32_t, u32, uint32_t) \ + T3 (F, 64, svuint64_t, u64, uint64_t) \ + T4 (F, 32, svuint32_t, u32) \ + T4 (F, 64, svuint64_t, u64) \ + T3 (F, 32, svfloat32_t, f32, float32_t) \ + T3 (F, 64, svfloat64_t, f64, float64_t) \ + T4 (F, 32, svfloat32_t, f32) \ + T4 (F, 64, svfloat64_t, f64) + + +SD_DATA (stnt1) +SD_INTEGER (stnt1b, 8) +SD_INTEGER (stnt1h, 16) +D_INTEGER (stnt1w, 32) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 45 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 45 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary.c new file mode 100644 index 00000000000..5d44224d821 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary.c @@ -0,0 +1,31 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=armv9.2-a+sve+sme" } */ + +#include "../pfalse-unary_0.c" + +ALL_SIGNED (qabs, MXZ) +ALL_SIGNED (qneg, MXZ) +S_UNSIGNED (recpe, MXZ) +S_UNSIGNED (rsqrte, MXZ) + +#undef M +#define M(F, RTY, TY) \ + __arm_streaming \ + RTY F##_f (RTY inactive, TY op) \ + { \ + return sv##F (inactive, svpfalse_b (), op); \ + } + +#undef XZI +#define XZI(F, RTY, TY) \ + __arm_streaming \ + RTY F##_f (TY op) \ + { \ + return sv##F (svpfalse_b (), op); \ + } + +ALL_DATA (revd, MXZ) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 22 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 44 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 66 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary_convert.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary_convert.c new file mode 100644 index 00000000000..0dd04c4785c --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary_convert.c @@ -0,0 +1,30 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define XZ(F, TYPE1, TY1, TYPE2, TY2, P) \ + TYPE1 F##_##TY1##_##TY2##_##P##_f (TYPE2 op) \ + { \ + return sv##F##_##TY1##_##TY2##_##P (svpfalse_b (), op); \ + } \ + +#define M(F, TYPE1, TY1, TYPE2, TY2) \ + TYPE1 F##_##TY1##_##TY2##_m_f (TYPE1 inactive, TYPE2 op) \ + { \ + return sv##F##_##TY1##_##TY2##_m (inactive, svpfalse_b (), op); \ + } + +M (cvtx, svfloat32_t, f32, svfloat64_t, f64) +XZ (cvtx, svfloat32_t, f32, svfloat64_t, f64, x) +XZ (cvtx, svfloat32_t, f32, svfloat64_t, f64, z) +M (cvtlt, svfloat32_t, f32, svfloat16_t, f16) +XZ (cvtlt, svfloat32_t, f32, svfloat16_t, f16, x) +M (cvtlt, svfloat64_t, f64, svfloat32_t, f32) +XZ (cvtlt, svfloat64_t, f64, svfloat32_t, f32, x) + + + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 3 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 4 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 7 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary_convert_narrowt.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary_convert_narrowt.c new file mode 100644 index 00000000000..26bec84144f --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary_convert_narrowt.c @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define T(F, TYPE1, TY1, TYPE2, TY2) \ + TYPE1 F##_##TY1##_##TY2##_x_f (TYPE1 even, TYPE2 op) \ + { \ + return sv##F##_##TY1##_##TY2##_x (even, svpfalse_b (), op); \ + } \ + TYPE1 F##_##TY1##_##TY2##_m_f (TYPE1 even, TYPE2 op) \ + { \ + return sv##F##_##TY1##_##TY2##_m (even, svpfalse_b (), op); \ + } + +T (cvtnt, svfloat16_t, f16, svfloat32_t, f32) +T (cvtnt, svfloat32_t, f32, svfloat64_t, f64) +T (cvtxnt, svfloat32_t, f32, svfloat64_t, f64) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 3 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?: [0-9]*[bhsd])?, #?0\n\tret\n} 3 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 6 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary_to_int.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary_to_int.c new file mode 100644 index 00000000000..d7d66a72770 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary_to_int.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-unary_0.c" + +ALL_FLOAT_INT (logb, MXZ) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 3} } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 6 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 9 } } */
Jennifer Schmitz <jschmitz@nvidia.com> writes: > If an SVE intrinsic has predicate pfalse, we can fold the call to > a simplified assignment statement: For _m predication, the LHS can be assigned > the operand for inactive values and for _z, we can assign a zero vector. > For _x, the returned values can be arbitrary and as suggested by > Richard Sandiford, we fold to a zero vector. > > For example, > svint32_t foo (svint32_t op1, svint32_t op2) > { > return svadd_s32_m (svpfalse_b (), op1, op2); > } > can be folded to lhs = op1, such that foo is compiled to just a RET. > > For implicit predication, a case distinction is necessary: > Intrinsics that read from memory can be folded to a zero vector. > Intrinsics that write to memory or prefetch can be folded to a no-op. > Other intrinsics need case-by-case implemenation, which we added in > the corresponding svxxx_impl::fold. > > We implemented this optimization during gimple folding by calling a new method > gimple_folder::fold_pfalse from gimple_folder::fold, which covers the generic > cases described above. > > We tested the new behavior for each intrinsic with all supported predications > and data types and checked the produced assembly. There is a test file > for each shape subclass with scan-assembler-times tests that look for > the simplified instruction sequences, such as individual RET instructions > or zeroing moves. There is an additional directive counting the total number of > functions in the test, which must be the sum of counts of all other > directives. This is to check that all tested intrinsics were optimized. > > Some few intrinsics were not covered by this patch: > - svlasta and svlastb already have an implementation to cover a pfalse > predicate. No changes were made to them. > - svld1/2/3/4 return aggregate types and were excluded from the case > that folds calls with implicit predication to lhs = {0, ...}. > - svst1/2/3/4 already have an implementation in svstx_impl that precedes > our optimization, such that it is not triggered. > > The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression. > OK for mainline? > > Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com> Thanks, looks really good. > [...] > @@ -2222,6 +2435,14 @@ class svpfirst_svpnext_impl : public function_base > { > public: > CONSTEXPR svpfirst_svpnext_impl (int unspec) : m_unspec (unspec) {} > + gimple * > + fold (gimple_folder &f) const override > + { > + tree pg = gimple_call_arg (f.call, 0); > + if (is_pfalse (pg)) > + return f.fold_call_to (pg); > + return NULL; > + } I think this should instead return the second argument for UNSPEC_PFIRST. > [...] > @@ -556,6 +576,24 @@ public: > } > }; > > +class svqshlu_impl : public unspec_based_function > +{ > +public: > + CONSTEXPR svqshlu_impl () > + : unspec_based_function (UNSPEC_SQSHLU, -1, -1) {} > + gimple * > + fold (gimple_folder &f) const override > + { > + if (is_pfalse (gimple_call_arg (f.call, 0))) > + { > + tree rhs = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (f.lhs), > + gimple_call_arg (f.call, 1)); > + return gimple_build_assign (f.lhs, VIEW_CONVERT_EXPR, rhs); > + } > + return NULL; > + } > +}; > + Could we handle this in the generic code instead? (More below.) > [...] > @@ -3622,6 +3631,51 @@ gimple_folder::redirect_pred_x () > return redirect_call (instance); > } > > +/* Fold calls with predicate pfalse: > + _m predication: lhs = op1. > + _x or _z: lhs = {0, ...}. > + Implicit predication that reads from memory: lhs = {0, ...}. > + Implicit predication that writes to memory or prefetches: no-op. > + Return the new gimple statement on success, else NULL. */ > +gimple * > +gimple_folder::fold_pfalse () > +{ > + if (pred == PRED_none) > + return nullptr; > + tree arg0 = gimple_call_arg (call, 0); > + if (pred == PRED_m) > + { > + /* Unary function shapes with _m predication are folded to the > + inactive vector (arg0), while other function shapes are folded > + to op1 (arg1). > + In some intrinsics, e.g. svqshlu, lhs and op1 have different types, > + such that folding to op1 is not appropriate. */ > + tree arg1 = gimple_call_arg (call, 1); > + tree t; > + if (is_pfalse (arg1)) > + t = arg0; > + else if (is_pfalse (arg0)) > + t = arg1; > + else return nullptr; Formatting nit: the return should be on its own line. > + if (TREE_TYPE (t) == TREE_TYPE (lhs)) > + return fold_call_to (t); Going back to the question above, is there a case that this type check has to reject for correctness reasons? (I should try to work this out for myself, sorry, but I'm guessing you already have an example.) > + } > + if ((pred == PRED_x || pred == PRED_z) && is_pfalse (arg0)) > + return fold_call_to (build_zero_cst (TREE_TYPE (lhs))); > + if (pred == PRED_implicit && is_pfalse (arg0)) > + { > + unsigned int flags = call_properties (); > + /* Folding to lhs = {0, ...} is not appropriate for intrinsics with > + AGGREGATE types as lhs. */ We could probably fold to a __builtin_memset call instead, but yeah, that's probably best left as future work. > + if ((flags & CP_READ_MEMORY) > + && !AGGREGATE_TYPE_P (TREE_TYPE (lhs))) > + return fold_call_to (build_zero_cst (TREE_TYPE (lhs))); > + if (flags & (CP_WRITE_MEMORY | CP_PREFETCH_MEMORY)) > + return fold_to_stmt_vops (gimple_build_nop ()); > + } > + return nullptr; > +} > + > /* Fold the call to constant VAL. */ > gimple * > gimple_folder::fold_to_cstu (poly_uint64 val) > [...] > diff --git a/gcc/testsuite/gcc.target/aarch64/pfalse-binary_0.c b/gcc/testsuite/gcc.target/aarch64/pfalse-binary_0.c > new file mode 100644 > index 00000000000..72fb5b6ac6d > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/pfalse-binary_0.c It looks like this and pfalse-unary_0.c are more like header files than standalone tests. It would probably be better to turn them into .h files if so. > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary.c > new file mode 100644 > index 00000000000..c1779818fc5 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary.c > @@ -0,0 +1,13 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > + > +#include "../pfalse-binary_0.c" > + > +B (brkn, Zv) > +B (brkpa, Zv) > +B (brkpb, Zv) > +ALL_DATA (splice, IMPLICITv) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tpfalse\tp0\.b\n\tret\n} 3 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmov\tz0\.d, z1\.d\n\tret\n} 12 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 15 } } */ Nice approach! However, I think ".cfi_startproc" is specific to ELF and so will fail on Darwin. Rather than try to work with both, it would probably be easier to add: /* { dg-require-effective-target elf } */ after the dg-do line. Same for the other tests. > [...] > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-reduction.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-reduction.c > new file mode 100644 > index 00000000000..9e74b0d1a35 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-reduction.c > @@ -0,0 +1,18 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > + > +#include "../pfalse-unary_0.c" > + > +ALL_INTEGER_SCALAR (andv, IMPLICIT) > +ALL_INTEGER_SCALAR (eorv, IMPLICIT) > +ALL_ARITH_SCALAR (maxv, IMPLICIT) > +ALL_ARITH_SCALAR (minv, IMPLICIT) > +ALL_INTEGER_SCALAR (orv, IMPLICIT) > +ALL_FLOAT_SCALAR (maxnmv, IMPLICIT) > +ALL_FLOAT_SCALAR (minnmv, IMPLICIT) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmov\t[wx]0, 0\n\tret\n} 20 } } */ > +/* { dg-final { scan-tree-dump-times "return Nan" 6 "optimized" } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmov\t[wxz]0(?:\.[sd])?, #?-?[1-9]+[0-9]*\n\tret\n} 27 } } */ This is quite a liberal regexp. Could we add one that matches specifically -1, for the ANDV and UMINV cases? It could be in addition to this one, to avoid complicating things by excluding -1 above. > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\t(movi|mvni)\tv0\.(2s|4h), 0x[0-9a-f]+, [a-z]+ [0-9]+\n\tret\n} 5 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 52 } } */ > [...] > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-store.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-store.c > new file mode 100644 > index 00000000000..3da28712988 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-store.c > @@ -0,0 +1,50 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > + > +#include <arm_sve.h> > + > +#define T(F, TY1, TY2) \ > + void F##_f (TY1 *base, sv##TY2 data) \ > + { \ > + return sv##F (svpfalse_b (), base, data); \ > + } > + > +#define D_INTEGER(F, TY) \ > + T (F##_s64, TY, int64_t) \ > + T (F##_u64, TY, uint64_t) > + > +#define SD_INTEGER(F, TY) \ > + D_INTEGER (F, TY) \ > + T (F##_s32, TY, int32_t) \ > + T (F##_u32, TY, uint32_t) > + > +#define HSD_INTEGER(F, TY) \ > + SD_INTEGER (F, TY) \ > + T (F##_s16, TY, int16_t) \ > + T (F##_u16, TY, uint16_t) > + > +#define ALL_DATA(F, A) \ > + T (F##_bf16, bfloat16_t, bfloat16##A) \ > + T (F##_f16, float16_t, float16##A) \ > + T (F##_f32, float32_t, float32##A) \ > + T (F##_f64, float64_t, float64##A) \ > + T (F##_s8, int8_t, int8##A) \ > + T (F##_s16, int16_t, int16##A) \ > + T (F##_s32, int32_t, int32##A) \ > + T (F##_s64, int64_t, int64##A) \ > + T (F##_u8, uint8_t, uint8##A) \ > + T (F##_u16, uint16_t, uint16##A) \ > + T (F##_u32, uint32_t, uint32##A) \ > + T (F##_u64, uint64_t, uint64##A) \ > + > +HSD_INTEGER (st1b, int8_t) > +SD_INTEGER (st1h, int16_t) > +D_INTEGER (st1w, int32_t) > +ALL_DATA (st1, _t) > +ALL_DATA (st2, x2_t) > +ALL_DATA (st3, x3_t) > +ALL_DATA (st4, x4_t) > + > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 12 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 60 } } */ What happens for the other 48? I assume they're for ST[234]? The test above is OK if we can't optimise those properly yet, but it would be good to add a FIXME comment above the first regexp to explain that it ought to be 60 eventually. Thanks, Richard
> On 19 Nov 2024, at 13:00, Richard Sandiford <richard.sandiford@arm.com> wrote: > > External email: Use caution opening links or attachments > > > Jennifer Schmitz <jschmitz@nvidia.com> writes: >> If an SVE intrinsic has predicate pfalse, we can fold the call to >> a simplified assignment statement: For _m predication, the LHS can be assigned >> the operand for inactive values and for _z, we can assign a zero vector. >> For _x, the returned values can be arbitrary and as suggested by >> Richard Sandiford, we fold to a zero vector. >> >> For example, >> svint32_t foo (svint32_t op1, svint32_t op2) >> { >> return svadd_s32_m (svpfalse_b (), op1, op2); >> } >> can be folded to lhs = op1, such that foo is compiled to just a RET. >> >> For implicit predication, a case distinction is necessary: >> Intrinsics that read from memory can be folded to a zero vector. >> Intrinsics that write to memory or prefetch can be folded to a no-op. >> Other intrinsics need case-by-case implemenation, which we added in >> the corresponding svxxx_impl::fold. >> >> We implemented this optimization during gimple folding by calling a new method >> gimple_folder::fold_pfalse from gimple_folder::fold, which covers the generic >> cases described above. >> >> We tested the new behavior for each intrinsic with all supported predications >> and data types and checked the produced assembly. There is a test file >> for each shape subclass with scan-assembler-times tests that look for >> the simplified instruction sequences, such as individual RET instructions >> or zeroing moves. There is an additional directive counting the total number of >> functions in the test, which must be the sum of counts of all other >> directives. This is to check that all tested intrinsics were optimized. >> >> Some few intrinsics were not covered by this patch: >> - svlasta and svlastb already have an implementation to cover a pfalse >> predicate. No changes were made to them. >> - svld1/2/3/4 return aggregate types and were excluded from the case >> that folds calls with implicit predication to lhs = {0, ...}. >> - svst1/2/3/4 already have an implementation in svstx_impl that precedes >> our optimization, such that it is not triggered. >> >> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression. >> OK for mainline? >> >> Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com> > > Thanks, looks really good. Thanks for the review and positive feedback on the patch! Below are the responses to your comments and the revised patch. It was re-tested, no regression. Best, Jennifer > >> [...] >> @@ -2222,6 +2435,14 @@ class svpfirst_svpnext_impl : public function_base >> { >> public: >> CONSTEXPR svpfirst_svpnext_impl (int unspec) : m_unspec (unspec) {} >> + gimple * >> + fold (gimple_folder &f) const override >> + { >> + tree pg = gimple_call_arg (f.call, 0); >> + if (is_pfalse (pg)) >> + return f.fold_call_to (pg); >> + return NULL; >> + } > > I think this should instead return the second argument for UNSPEC_PFIRST. Sorry about that and thanks for noticing it! I added the case distinction below. The test (sve/pfalse-unary.c) was adjusted accordingly (one more count for RET and one less for PFALSE + RET): tree pg = gimple_call_arg (f.call, 0); if (is_pfalse (pg)) - return f.fold_call_to (pg); + return f.fold_call_to (m_unspec == UNSPEC_PFIRST + ? gimple_call_arg (f.call, 1) + : pg); return NULL; > >> [...] >> @@ -556,6 +576,24 @@ public: >> } >> }; >> >> +class svqshlu_impl : public unspec_based_function >> +{ >> +public: >> + CONSTEXPR svqshlu_impl () >> + : unspec_based_function (UNSPEC_SQSHLU, -1, -1) {} >> + gimple * >> + fold (gimple_folder &f) const override >> + { >> + if (is_pfalse (gimple_call_arg (f.call, 0))) >> + { >> + tree rhs = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (f.lhs), >> + gimple_call_arg (f.call, 1)); >> + return gimple_build_assign (f.lhs, VIEW_CONVERT_EXPR, rhs); >> + } >> + return NULL; >> + } >> +}; >> + > > Could we handle this in the generic code instead? (More below.) Yes, see below. > >> [...] >> @@ -3622,6 +3631,51 @@ gimple_folder::redirect_pred_x () >> return redirect_call (instance); >> } >> >> +/* Fold calls with predicate pfalse: >> + _m predication: lhs = op1. >> + _x or _z: lhs = {0, ...}. >> + Implicit predication that reads from memory: lhs = {0, ...}. >> + Implicit predication that writes to memory or prefetches: no-op. >> + Return the new gimple statement on success, else NULL. */ >> +gimple * >> +gimple_folder::fold_pfalse () >> +{ >> + if (pred == PRED_none) >> + return nullptr; >> + tree arg0 = gimple_call_arg (call, 0); >> + if (pred == PRED_m) >> + { >> + /* Unary function shapes with _m predication are folded to the >> + inactive vector (arg0), while other function shapes are folded >> + to op1 (arg1). >> + In some intrinsics, e.g. svqshlu, lhs and op1 have different types, >> + such that folding to op1 is not appropriate. */ >> + tree arg1 = gimple_call_arg (call, 1); >> + tree t; >> + if (is_pfalse (arg1)) >> + t = arg0; >> + else if (is_pfalse (arg0)) >> + t = arg1; >> + else return nullptr; > > Formatting nit: the return should be on its own line. Done. > >> + if (TREE_TYPE (t) == TREE_TYPE (lhs)) >> + return fold_call_to (t); > > Going back to the question above, is there a case that this type check > has to reject for correctness reasons? (I should try to work this out > for myself, sorry, but I'm guessing you already have an example.) The check was added because svqshlu has a signed op1 but returns an unsigned type. Without the view_convert, it ICEs with a non-trivial conversion error. But I added the conditional type conversion in the generic code (diff to previous patch version) and reverted svqshlu_impl: + /* In some intrinsics, e.g. svqshlu, lhs and op1 have different types, + such that folding to op1 needs a type conversion. */ if (TREE_TYPE (t) == TREE_TYPE (lhs)) return fold_call_to (t); + else + { + tree rhs = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), t); + return gimple_build_assign (lhs, VIEW_CONVERT_EXPR, rhs); + } > >> + } >> + if ((pred == PRED_x || pred == PRED_z) && is_pfalse (arg0)) >> + return fold_call_to (build_zero_cst (TREE_TYPE (lhs))); >> + if (pred == PRED_implicit && is_pfalse (arg0)) >> + { >> + unsigned int flags = call_properties (); >> + /* Folding to lhs = {0, ...} is not appropriate for intrinsics with >> + AGGREGATE types as lhs. */ > > We could probably fold to a __builtin_memset call instead, but yeah, > that's probably best left as future work. Ok, I made no changes here. > >> + if ((flags & CP_READ_MEMORY) >> + && !AGGREGATE_TYPE_P (TREE_TYPE (lhs))) >> + return fold_call_to (build_zero_cst (TREE_TYPE (lhs))); >> + if (flags & (CP_WRITE_MEMORY | CP_PREFETCH_MEMORY)) >> + return fold_to_stmt_vops (gimple_build_nop ()); >> + } >> + return nullptr; >> +} >> + >> /* Fold the call to constant VAL. */ >> gimple * >> gimple_folder::fold_to_cstu (poly_uint64 val) >> [...] >> diff --git a/gcc/testsuite/gcc.target/aarch64/pfalse-binary_0.c b/gcc/testsuite/gcc.target/aarch64/pfalse-binary_0.c >> new file mode 100644 >> index 00000000000..72fb5b6ac6d >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/aarch64/pfalse-binary_0.c > > It looks like this and pfalse-unary_0.c are more like header files > than standalone tests. It would probably be better to turn them > into .h files if so. Done. > >> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary.c >> new file mode 100644 >> index 00000000000..c1779818fc5 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary.c >> @@ -0,0 +1,13 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-O2" } */ >> + >> +#include "../pfalse-binary_0.c" >> + >> +B (brkn, Zv) >> +B (brkpa, Zv) >> +B (brkpb, Zv) >> +ALL_DATA (splice, IMPLICITv) >> + >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tpfalse\tp0\.b\n\tret\n} 3 } } */ >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmov\tz0\.d, z1\.d\n\tret\n} 12 } } */ >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 15 } } */ > > Nice approach! However, I think ".cfi_startproc" is specific to ELF > and so will fail on Darwin. Rather than try to work with both, it would > probably be easier to add: > > /* { dg-require-effective-target elf } */ > > after the dg-do line. Same for the other tests. Done. > >> [...] >> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-reduction.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-reduction.c >> new file mode 100644 >> index 00000000000..9e74b0d1a35 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-reduction.c >> @@ -0,0 +1,18 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-O2 -fdump-tree-optimized" } */ >> + >> +#include "../pfalse-unary_0.c" >> + >> +ALL_INTEGER_SCALAR (andv, IMPLICIT) >> +ALL_INTEGER_SCALAR (eorv, IMPLICIT) >> +ALL_ARITH_SCALAR (maxv, IMPLICIT) >> +ALL_ARITH_SCALAR (minv, IMPLICIT) >> +ALL_INTEGER_SCALAR (orv, IMPLICIT) >> +ALL_FLOAT_SCALAR (maxnmv, IMPLICIT) >> +ALL_FLOAT_SCALAR (minnmv, IMPLICIT) >> + >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmov\t[wx]0, 0\n\tret\n} 20 } } */ >> +/* { dg-final { scan-tree-dump-times "return Nan" 6 "optimized" } } */ >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmov\t[wxz]0(?:\.[sd])?, #?-?[1-9]+[0-9]*\n\tret\n} 27 } } */ > > This is quite a liberal regexp. Could we add one that matches specifically > -1, for the ANDV and UMINV cases? It could be in addition to this one, > to avoid complicating things by excluding -1 above. Done. I added a comment at the end of the test explaining the extra 12 counts. > >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\t(movi|mvni)\tv0\.(2s|4h), 0x[0-9a-f]+, [a-z]+ [0-9]+\n\tret\n} 5 } } */ >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 52 } } */ >> [...] >> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-store.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-store.c >> new file mode 100644 >> index 00000000000..3da28712988 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-store.c >> @@ -0,0 +1,50 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-O2" } */ >> + >> +#include <arm_sve.h> >> + >> +#define T(F, TY1, TY2) \ >> + void F##_f (TY1 *base, sv##TY2 data) \ >> + { \ >> + return sv##F (svpfalse_b (), base, data); \ >> + } >> + >> +#define D_INTEGER(F, TY) \ >> + T (F##_s64, TY, int64_t) \ >> + T (F##_u64, TY, uint64_t) >> + >> +#define SD_INTEGER(F, TY) \ >> + D_INTEGER (F, TY) \ >> + T (F##_s32, TY, int32_t) \ >> + T (F##_u32, TY, uint32_t) >> + >> +#define HSD_INTEGER(F, TY) \ >> + SD_INTEGER (F, TY) \ >> + T (F##_s16, TY, int16_t) \ >> + T (F##_u16, TY, uint16_t) >> + >> +#define ALL_DATA(F, A) \ >> + T (F##_bf16, bfloat16_t, bfloat16##A) \ >> + T (F##_f16, float16_t, float16##A) \ >> + T (F##_f32, float32_t, float32##A) \ >> + T (F##_f64, float64_t, float64##A) \ >> + T (F##_s8, int8_t, int8##A) \ >> + T (F##_s16, int16_t, int16##A) \ >> + T (F##_s32, int32_t, int32##A) \ >> + T (F##_s64, int64_t, int64##A) \ >> + T (F##_u8, uint8_t, uint8##A) \ >> + T (F##_u16, uint16_t, uint16##A) \ >> + T (F##_u32, uint32_t, uint32##A) \ >> + T (F##_u64, uint64_t, uint64##A) \ >> + >> +HSD_INTEGER (st1b, int8_t) >> +SD_INTEGER (st1h, int16_t) >> +D_INTEGER (st1w, int32_t) >> +ALL_DATA (st1, _t) >> +ALL_DATA (st2, x2_t) >> +ALL_DATA (st3, x3_t) >> +ALL_DATA (st4, x4_t) >> + >> + >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 12 } } */ >> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 60 } } */ > > What happens for the other 48? I assume they're for ST[234]? Yes, that’s right. The tests are already there (ALL_DATA (stX,…), but they are not folded yet. > > The test above is OK if we can't optimise those properly yet, but it would > be good to add a FIXME comment above the first regexp to explain that > it ought to be 60 eventually. Done. > > Thanks, > Richard If an SVE intrinsic has predicate pfalse, we can fold the call to a simplified assignment statement: For _m predication, the LHS can be assigned the operand for inactive values and for _z, we can assign a zero vector. For _x, the returned values can be arbitrary and as suggested by Richard Sandiford, we fold to a zero vector. For example, svint32_t foo (svint32_t op1, svint32_t op2) { return svadd_s32_m (svpfalse_b (), op1, op2); } can be folded to lhs = op1, such that foo is compiled to just a RET. For implicit predication, a case distinction is necessary: Intrinsics that read from memory can be folded to a zero vector. Intrinsics that write to memory or prefetch can be folded to a no-op. Other intrinsics need case-by-case implemenation, which we added in the corresponding svxxx_impl::fold. We implemented this optimization during gimple folding by calling a new method gimple_folder::fold_pfalse from gimple_folder::fold, which covers the generic cases described above. We tested the new behavior for each intrinsic with all supported predications and data types and checked the produced assembly. There is a test file for each shape subclass with scan-assembler-times tests that look for the simplified instruction sequences, such as individual RET instructions or zeroing moves. There is an additional directive counting the total number of functions in the test, which must be the sum of counts of all other directives. This is to check that all tested intrinsics were optimized. Some few intrinsics were not covered by this patch: - svlasta and svlastb already have an implementation to cover a pfalse predicate. No changes were made to them. - svld1/2/3/4 return aggregate types and were excluded from the case that folds calls with implicit predication to lhs = {0, ...}. - svst1/2/3/4 already have an implementation in svstx_impl that precedes our optimization, such that it is not triggered. The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression. OK for mainline? Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com> gcc/ChangeLog: PR target/106329 * config/aarch64/aarch64-sve-builtins-base.cc (svac_impl::fold): Add folding if pfalse predicate. (svadda_impl::fold): Likewise. (class svaddv_impl): Likewise. (class svandv_impl): Likewise. (svclast_impl::fold): Likewise. (svcmp_impl::fold): Likewise. (svcmp_wide_impl::fold): Likewise. (svcmpuo_impl::fold): Likewise. (svcntp_impl::fold): Likewise. (class svcompact_impl): Likewise. (class svcvtnt_impl): Likewise. (class sveorv_impl): Likewise. (class svminv_impl): Likewise. (class svmaxnmv_impl): Likewise. (class svmaxv_impl): Likewise. (class svminnmv_impl): Likewise. (class svorv_impl): Likewise. (svpfirst_svpnext_impl::fold): Likewise. (svptest_impl::fold): Likewise. (class svsplice_impl): Likewise. * config/aarch64/aarch64-sve-builtins-sve2.cc (class svcvtxnt_impl): Likewise. (svmatch_svnmatch_impl::fold): Likewise. * config/aarch64/aarch64-sve-builtins.cc (is_pfalse): Return true if tree is pfalse. (gimple_folder::fold_pfalse): Fold calls with pfalse predicate. (gimple_folder::fold_call_to): Fold call to lhs = t for given tree t. (gimple_folder::fold_to_stmt_vops): Helper function that folds the call to given stmt and adjusts virtual operands. (gimple_folder::fold): Call fold_pfalse. * config/aarch64/aarch64-sve-builtins.h (is_pfalse): Declare is_pfalse. gcc/testsuite/ChangeLog: PR target/106329 * gcc.target/aarch64/pfalse-binary_0.c: New test. * gcc.target/aarch64/pfalse-unary_0.c: New test. * gcc.target/aarch64/sve/pfalse-binary.c: New test. * gcc.target/aarch64/sve/pfalse-binary_int_opt_n.c: New test. * gcc.target/aarch64/sve/pfalse-binary_opt_n.c: New test. * gcc.target/aarch64/sve/pfalse-binary_opt_single_n.c: New test. * gcc.target/aarch64/sve/pfalse-binary_rotate.c: New test. * gcc.target/aarch64/sve/pfalse-binary_uint64_opt_n.c: New test. * gcc.target/aarch64/sve/pfalse-binary_uint_opt_n.c: New test. * gcc.target/aarch64/sve/pfalse-binaryxn.c: New test. * gcc.target/aarch64/sve/pfalse-clast.c: New test. * gcc.target/aarch64/sve/pfalse-compare_opt_n.c: New test. * gcc.target/aarch64/sve/pfalse-compare_wide_opt_n.c: New test. * gcc.target/aarch64/sve/pfalse-count_pred.c: New test. * gcc.target/aarch64/sve/pfalse-fold_left.c: New test. * gcc.target/aarch64/sve/pfalse-load.c: New test. * gcc.target/aarch64/sve/pfalse-load_ext.c: New test. * gcc.target/aarch64/sve/pfalse-load_ext_gather_index.c: New test. * gcc.target/aarch64/sve/pfalse-load_ext_gather_offset.c: New test. * gcc.target/aarch64/sve/pfalse-load_gather_sv.c: New test. * gcc.target/aarch64/sve/pfalse-load_gather_vs.c: New test. * gcc.target/aarch64/sve/pfalse-load_replicate.c: New test. * gcc.target/aarch64/sve/pfalse-prefetch.c: New test. * gcc.target/aarch64/sve/pfalse-prefetch_gather_index.c: New test. * gcc.target/aarch64/sve/pfalse-prefetch_gather_offset.c: New test. * gcc.target/aarch64/sve/pfalse-ptest.c: New test. * gcc.target/aarch64/sve/pfalse-rdffr.c: New test. * gcc.target/aarch64/sve/pfalse-reduction.c: New test. * gcc.target/aarch64/sve/pfalse-reduction_wide.c: New test. * gcc.target/aarch64/sve/pfalse-shift_right_imm.c: New test. * gcc.target/aarch64/sve/pfalse-store.c: New test. * gcc.target/aarch64/sve/pfalse-store_scatter_index.c: New test. * gcc.target/aarch64/sve/pfalse-store_scatter_offset.c: New test. * gcc.target/aarch64/sve/pfalse-storexn.c: New test. * gcc.target/aarch64/sve/pfalse-ternary_opt_n.c: New test. * gcc.target/aarch64/sve/pfalse-ternary_rotate.c: New test. * gcc.target/aarch64/sve/pfalse-unary.c: New test. * gcc.target/aarch64/sve/pfalse-unary_convert_narrowt.c: New test. * gcc.target/aarch64/sve/pfalse-unary_convertxn.c: New test. * gcc.target/aarch64/sve/pfalse-unary_n.c: New test. * gcc.target/aarch64/sve/pfalse-unary_pred.c: New test. * gcc.target/aarch64/sve/pfalse-unary_to_uint.c: New test. * gcc.target/aarch64/sve/pfalse-unaryxn.c: New test. * gcc.target/aarch64/sve2/pfalse-binary.c: New test. * gcc.target/aarch64/sve2/pfalse-binary_int_opt_n.c: New test. * gcc.target/aarch64/sve2/pfalse-binary_int_opt_single_n.c: New test. * gcc.target/aarch64/sve2/pfalse-binary_opt_n.c: New test. * gcc.target/aarch64/sve2/pfalse-binary_opt_single_n.c: New test. * gcc.target/aarch64/sve2/pfalse-binary_to_uint.c: New test. * gcc.target/aarch64/sve2/pfalse-binary_uint_opt_n.c: New test. * gcc.target/aarch64/sve2/pfalse-binary_wide.c: New test. * gcc.target/aarch64/sve2/pfalse-compare.c: New test. * gcc.target/aarch64/sve2/pfalse-load_ext_gather_index_restricted.c: New test. * gcc.target/aarch64/sve2/pfalse-load_ext_gather_offset_restricted.c: New test. * gcc.target/aarch64/sve2/pfalse-load_gather_sv_restricted.c: New test. * gcc.target/aarch64/sve2/pfalse-load_gather_vs.c: New test. * gcc.target/aarch64/sve2/pfalse-shift_left_imm_to_uint.c: New test. * gcc.target/aarch64/sve2/pfalse-shift_right_imm.c: New test. * gcc.target/aarch64/sve2/pfalse-store_scatter_index_restricted.c: New test. * gcc.target/aarch64/sve2/pfalse-store_scatter_offset_restricted.c: New test. * gcc.target/aarch64/sve2/pfalse-unary.c: New test. * gcc.target/aarch64/sve2/pfalse-unary_convert.c: New test. * gcc.target/aarch64/sve2/pfalse-unary_convert_narrowt.c: New test. * gcc.target/aarch64/sve2/pfalse-unary_to_int.c: New test. --- .../aarch64/aarch64-sve-builtins-base.cc | 268 +++++++++++++++++- .../aarch64/aarch64-sve-builtins-sve2.cc | 22 +- gcc/config/aarch64/aarch64-sve-builtins.cc | 79 ++++++ gcc/config/aarch64/aarch64-sve-builtins.h | 4 + .../gcc.target/aarch64/pfalse-binary_0.h | 240 ++++++++++++++++ .../gcc.target/aarch64/pfalse-unary_0.h | 195 +++++++++++++ .../gcc.target/aarch64/sve/pfalse-binary.c | 14 + .../aarch64/sve/pfalse-binary_int_opt_n.c | 11 + .../aarch64/sve/pfalse-binary_opt_n.c | 31 ++ .../aarch64/sve/pfalse-binary_opt_single_n.c | 14 + .../aarch64/sve/pfalse-binary_rotate.c | 27 ++ .../aarch64/sve/pfalse-binary_uint64_opt_n.c | 13 + .../aarch64/sve/pfalse-binary_uint_opt_n.c | 13 + .../gcc.target/aarch64/sve/pfalse-binaryxn.c | 31 ++ .../gcc.target/aarch64/sve/pfalse-clast.c | 11 + .../aarch64/sve/pfalse-compare_opt_n.c | 20 ++ .../aarch64/sve/pfalse-compare_wide_opt_n.c | 15 + .../aarch64/sve/pfalse-count_pred.c | 10 + .../gcc.target/aarch64/sve/pfalse-fold_left.c | 10 + .../gcc.target/aarch64/sve/pfalse-load.c | 32 +++ .../gcc.target/aarch64/sve/pfalse-load_ext.c | 40 +++ .../sve/pfalse-load_ext_gather_index.c | 51 ++++ .../sve/pfalse-load_ext_gather_offset.c | 71 +++++ .../aarch64/sve/pfalse-load_gather_sv.c | 32 +++ .../aarch64/sve/pfalse-load_gather_vs.c | 37 +++ .../aarch64/sve/pfalse-load_replicate.c | 31 ++ .../gcc.target/aarch64/sve/pfalse-prefetch.c | 22 ++ .../sve/pfalse-prefetch_gather_index.c | 27 ++ .../sve/pfalse-prefetch_gather_offset.c | 25 ++ .../gcc.target/aarch64/sve/pfalse-ptest.c | 12 + .../gcc.target/aarch64/sve/pfalse-rdffr.c | 13 + .../gcc.target/aarch64/sve/pfalse-reduction.c | 23 ++ .../aarch64/sve/pfalse-reduction_wide.c | 10 + .../aarch64/sve/pfalse-shift_right_imm.c | 28 ++ .../gcc.target/aarch64/sve/pfalse-store.c | 53 ++++ .../aarch64/sve/pfalse-store_scatter_index.c | 40 +++ .../aarch64/sve/pfalse-store_scatter_offset.c | 68 +++++ .../gcc.target/aarch64/sve/pfalse-storexn.c | 30 ++ .../aarch64/sve/pfalse-ternary_opt_n.c | 51 ++++ .../aarch64/sve/pfalse-ternary_rotate.c | 27 ++ .../gcc.target/aarch64/sve/pfalse-unary.c | 33 +++ .../sve/pfalse-unary_convert_narrowt.c | 21 ++ .../aarch64/sve/pfalse-unary_convertxn.c | 50 ++++ .../gcc.target/aarch64/sve/pfalse-unary_n.c | 43 +++ .../aarch64/sve/pfalse-unary_pred.c | 10 + .../aarch64/sve/pfalse-unary_to_uint.c | 13 + .../gcc.target/aarch64/sve/pfalse-unaryxn.c | 14 + .../gcc.target/aarch64/sve2/pfalse-binary.c | 15 + .../aarch64/sve2/pfalse-binary_int_opt_n.c | 13 + .../sve2/pfalse-binary_int_opt_single_n.c | 11 + .../aarch64/sve2/pfalse-binary_opt_n.c | 17 ++ .../aarch64/sve2/pfalse-binary_opt_single_n.c | 12 + .../aarch64/sve2/pfalse-binary_to_uint.c | 10 + .../aarch64/sve2/pfalse-binary_uint_opt_n.c | 11 + .../aarch64/sve2/pfalse-binary_wide.c | 11 + .../gcc.target/aarch64/sve2/pfalse-compare.c | 11 + .../pfalse-load_ext_gather_index_restricted.c | 46 +++ ...pfalse-load_ext_gather_offset_restricted.c | 65 +++++ .../sve2/pfalse-load_gather_sv_restricted.c | 28 ++ .../aarch64/sve2/pfalse-load_gather_vs.c | 36 +++ .../sve2/pfalse-shift_left_imm_to_uint.c | 28 ++ .../aarch64/sve2/pfalse-shift_right_imm.c | 32 +++ .../pfalse-store_scatter_index_restricted.c | 50 ++++ .../pfalse-store_scatter_offset_restricted.c | 65 +++++ .../gcc.target/aarch64/sve2/pfalse-unary.c | 32 +++ .../aarch64/sve2/pfalse-unary_convert.c | 31 ++ .../sve2/pfalse-unary_convert_narrowt.c | 23 ++ .../aarch64/sve2/pfalse-unary_to_int.c | 11 + 68 files changed, 2479 insertions(+), 14 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/pfalse-binary_0.h create mode 100644 gcc/testsuite/gcc.target/aarch64/pfalse-unary_0.h create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_int_opt_n.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_n.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_single_n.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_rotate.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint64_opt_n.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint_opt_n.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-binaryxn.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-clast.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-compare_opt_n.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-compare_wide_opt_n.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-count_pred.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-fold_left.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-load.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_ext.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_ext_gather_index.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_ext_gather_offset.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_gather_sv.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_gather_vs.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_replicate.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-prefetch.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-prefetch_gather_index.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-prefetch_gather_offset.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-ptest.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-rdffr.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-reduction.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-reduction_wide.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-shift_right_imm.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-store.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-store_scatter_index.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-store_scatter_offset.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-storexn.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-ternary_opt_n.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-ternary_rotate.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_convert_narrowt.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_convertxn.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_n.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_pred.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_to_uint.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-unaryxn.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_int_opt_n.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_int_opt_single_n.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_opt_n.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_opt_single_n.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_to_uint.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_uint_opt_n.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_wide.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-compare.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_ext_gather_index_restricted.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_ext_gather_offset_restricted.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_gather_sv_restricted.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_gather_vs.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-shift_left_imm_to_uint.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-shift_right_imm.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-store_scatter_index_restricted.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-store_scatter_offset_restricted.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary_convert.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary_convert_narrowt.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary_to_int.c diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc index 2117eceb606..aaa293a9091 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc @@ -201,6 +201,15 @@ class svac_impl : public function_base public: CONSTEXPR svac_impl (int unspec) : m_unspec (unspec) {} + gimple * + fold (gimple_folder &f) const override + { + tree pg = gimple_call_arg (f.call, 0); + if (is_pfalse (pg)) + return f.fold_call_to (pg); + return NULL; + } + rtx expand (function_expander &e) const override { @@ -216,6 +225,14 @@ public: class svadda_impl : public function_base { public: + gimple * + fold (gimple_folder &f) const override + { + if (is_pfalse (gimple_call_arg (f.call, 0))) + return f.fold_call_to (gimple_call_arg (f.call, 1)); + return NULL; + } + rtx expand (function_expander &e) const override { @@ -227,6 +244,21 @@ public: } }; +class svaddv_impl : public reduction +{ +public: + CONSTEXPR svaddv_impl () + : reduction (UNSPEC_SADDV, UNSPEC_UADDV, UNSPEC_FADDV) {} + + gimple * + fold (gimple_folder &f) const override + { + if (is_pfalse (gimple_call_arg (f.call, 0))) + return f.fold_call_to (build_zero_cst (TREE_TYPE (f.lhs))); + return NULL; + } +}; + /* Implements svadr[bhwd]. */ class svadr_bhwd_impl : public function_base { @@ -245,11 +277,25 @@ public: e.args.quick_push (expand_vector_broadcast (mode, shift)); return e.use_exact_insn (code_for_aarch64_adr_shift (mode)); } - /* How many bits left to shift the vector displacement. */ unsigned int m_shift; }; + +class svandv_impl : public reduction +{ +public: + CONSTEXPR svandv_impl () : reduction (UNSPEC_ANDV) {} + + gimple * + fold (gimple_folder &f) const override + { + if (is_pfalse (gimple_call_arg (f.call, 0))) + return f.fold_call_to (build_all_ones_cst (TREE_TYPE (f.lhs))); + return NULL; + } +}; + class svbic_impl : public function_base { public: @@ -333,6 +379,14 @@ class svclast_impl : public quiet<function_base> public: CONSTEXPR svclast_impl (int unspec) : m_unspec (unspec) {} + gimple * + fold (gimple_folder &f) const override + { + if (is_pfalse (gimple_call_arg (f.call, 0))) + return f.fold_call_to (gimple_call_arg (f.call, 1)); + return NULL; + } + rtx expand (function_expander &e) const override { @@ -425,6 +479,8 @@ public: return gimple_build_assign (f.lhs, m_code, rhs1, rhs2); } + if (is_pfalse (pg)) + return f.fold_call_to (pg); return NULL; } @@ -464,6 +520,15 @@ public: : m_code (code), m_unspec_for_sint (unspec_for_sint), m_unspec_for_uint (unspec_for_uint) {} + gimple * + fold (gimple_folder &f) const override + { + tree pg = gimple_call_arg (f.call, 0); + if (is_pfalse (pg)) + return f.fold_call_to (pg); + return NULL; + } + rtx expand (function_expander &e) const override { @@ -502,6 +567,16 @@ public: class svcmpuo_impl : public quiet<function_base> { public: + + gimple * + fold (gimple_folder &f) const override + { + tree pg = gimple_call_arg (f.call, 0); + if (is_pfalse (pg)) + return f.fold_call_to (pg); + return NULL; + } + rtx expand (function_expander &e) const override { @@ -598,6 +673,16 @@ public: class svcntp_impl : public function_base { public: + + gimple * + fold (gimple_folder &f) const override + { + tree pg = gimple_call_arg (f.call, 0); + if (is_pfalse (pg)) + return f.fold_call_to (build_zero_cst (TREE_TYPE (f.lhs))); + return NULL; + } + rtx expand (function_expander &e) const override { @@ -613,6 +698,19 @@ public: } }; +class svcompact_impl + : public QUIET_CODE_FOR_MODE0 (aarch64_sve_compact) +{ +public: + gimple * + fold (gimple_folder &f) const override + { + if (is_pfalse (gimple_call_arg (f.call, 0))) + return f.fold_call_to (build_zero_cst (TREE_TYPE (f.lhs))); + return NULL; + } +}; + /* Implements svcreate2, svcreate3 and svcreate4. */ class svcreate_impl : public quiet<multi_vector_function> { @@ -746,6 +844,18 @@ public: } }; +class svcvtnt_impl : public CODE_FOR_MODE0 (aarch64_sve_cvtnt) +{ +public: + gimple * + fold (gimple_folder &f) const override + { + if (f.pred == PRED_x && is_pfalse (gimple_call_arg (f.call, 1))) + f.fold_call_to (build_zero_cst (TREE_TYPE (f.lhs))); + return NULL; + } +}; + class svdiv_impl : public rtx_code_function { public: @@ -1138,6 +1248,20 @@ public: } }; +class sveorv_impl : public reduction +{ +public: + CONSTEXPR sveorv_impl () : reduction (UNSPEC_XORV) {} + + gimple * + fold (gimple_folder &f) const override + { + if (is_pfalse (gimple_call_arg (f.call, 0))) + return f.fold_call_to (build_zero_cst (TREE_TYPE (f.lhs))); + return NULL; + } +}; + /* Implements svextb, svexth and svextw. */ class svext_bhw_impl : public function_base { @@ -1380,7 +1504,8 @@ public: BIT_FIELD_REF lowers to Advanced SIMD element extract, so we have to ensure the index of the element being accessed is in the range of a Advanced SIMD vector width. */ - gimple *fold (gimple_folder & f) const override + gimple * + fold (gimple_folder & f) const override { tree pred = gimple_call_arg (f.call, 0); tree val = gimple_call_arg (f.call, 1); @@ -1950,6 +2075,80 @@ public: } }; +class svminv_impl : public reduction +{ +public: + CONSTEXPR svminv_impl () + : reduction (UNSPEC_SMINV, UNSPEC_UMINV, UNSPEC_FMINV) {} + + gimple * + fold (gimple_folder &f) const override + { + if (is_pfalse (gimple_call_arg (f.call, 0))) + { + tree rhs = f.type_suffix (0).integer_p + ? TYPE_MAX_VALUE (TREE_TYPE (f.lhs)) + : build_real (TREE_TYPE (f.lhs), dconstinf); + return f.fold_call_to (rhs); + } + return NULL; + } +}; + +class svmaxnmv_impl : public reduction +{ +public: + CONSTEXPR svmaxnmv_impl () : reduction (UNSPEC_FMAXNMV) {} + gimple * + fold (gimple_folder &f) const override + { + if (is_pfalse (gimple_call_arg (f.call, 0))) + { + REAL_VALUE_TYPE rnan = dconst0; + rnan.cl = rvc_nan; + return f.fold_call_to (build_real (TREE_TYPE (f.lhs), rnan)); + } + return NULL; + } +}; + +class svmaxv_impl : public reduction +{ +public: + CONSTEXPR svmaxv_impl () + : reduction (UNSPEC_SMAXV, UNSPEC_UMAXV, UNSPEC_FMAXV) {} + + gimple * + fold (gimple_folder &f) const override + { + if (is_pfalse (gimple_call_arg (f.call, 0))) + { + tree rhs = f.type_suffix (0).integer_p + ? TYPE_MIN_VALUE (TREE_TYPE (f.lhs)) + : build_real (TREE_TYPE (f.lhs), dconstninf); + return f.fold_call_to (rhs); + } + return NULL; + } +}; + +class svminnmv_impl : public reduction +{ +public: + CONSTEXPR svminnmv_impl () : reduction (UNSPEC_FMINNMV) {} + gimple * + fold (gimple_folder &f) const override + { + if (is_pfalse (gimple_call_arg (f.call, 0))) + { + REAL_VALUE_TYPE rnan = dconst0; + rnan.cl = rvc_nan; + return f.fold_call_to (build_real (TREE_TYPE (f.lhs), rnan)); + } + return NULL; + } +}; + class svmla_impl : public function_base { public: @@ -2198,6 +2397,20 @@ public: } }; +class svorv_impl : public reduction +{ +public: + CONSTEXPR svorv_impl () : reduction (UNSPEC_IORV) {} + + gimple * + fold (gimple_folder &f) const override + { + if (is_pfalse (gimple_call_arg (f.call, 0))) + return f.fold_call_to (build_zero_cst (TREE_TYPE (f.lhs))); + return NULL; + } +}; + class svpfalse_impl : public function_base { public: @@ -2222,6 +2435,16 @@ class svpfirst_svpnext_impl : public function_base { public: CONSTEXPR svpfirst_svpnext_impl (int unspec) : m_unspec (unspec) {} + gimple * + fold (gimple_folder &f) const override + { + tree pg = gimple_call_arg (f.call, 0); + if (is_pfalse (pg)) + return f.fold_call_to (m_unspec == UNSPEC_PFIRST + ? gimple_call_arg (f.call, 1) + : pg); + return NULL; + } rtx expand (function_expander &e) const override @@ -2302,6 +2525,13 @@ class svptest_impl : public function_base { public: CONSTEXPR svptest_impl (rtx_code compare) : m_compare (compare) {} + gimple * + fold (gimple_folder &f) const override + { + if (is_pfalse (gimple_call_arg (f.call, 0))) + return f.fold_call_to (boolean_false_node); + return NULL; + } rtx expand (function_expander &e) const override @@ -2717,6 +2947,18 @@ public: } }; +class svsplice_impl : public QUIET_CODE_FOR_MODE0 (aarch64_sve_splice) +{ +public: + gimple * + fold (gimple_folder &f) const override + { + if (is_pfalse (gimple_call_arg (f.call, 0))) + return f.fold_call_to (gimple_call_arg (f.call, 2)); + return NULL; + } +}; + class svst1_impl : public full_width_access { public: @@ -3170,13 +3412,13 @@ FUNCTION (svacle, svac_impl, (UNSPEC_COND_FCMLE)) FUNCTION (svaclt, svac_impl, (UNSPEC_COND_FCMLT)) FUNCTION (svadd, rtx_code_function, (PLUS, PLUS, UNSPEC_COND_FADD)) FUNCTION (svadda, svadda_impl,) -FUNCTION (svaddv, reduction, (UNSPEC_SADDV, UNSPEC_UADDV, UNSPEC_FADDV)) +FUNCTION (svaddv, svaddv_impl,) FUNCTION (svadrb, svadr_bhwd_impl, (0)) FUNCTION (svadrd, svadr_bhwd_impl, (3)) FUNCTION (svadrh, svadr_bhwd_impl, (1)) FUNCTION (svadrw, svadr_bhwd_impl, (2)) FUNCTION (svand, rtx_code_function, (AND, AND)) -FUNCTION (svandv, reduction, (UNSPEC_ANDV)) +FUNCTION (svandv, svandv_impl,) FUNCTION (svasr, rtx_code_function, (ASHIFTRT, ASHIFTRT)) FUNCTION (svasr_wide, shift_wide, (ASHIFTRT, UNSPEC_ASHIFTRT_WIDE)) FUNCTION (svasrd, unspec_based_function, (UNSPEC_ASRD, -1, -1)) @@ -3233,12 +3475,12 @@ FUNCTION (svcnth_pat, svcnt_bhwd_pat_impl, (VNx8HImode)) FUNCTION (svcntp, svcntp_impl,) FUNCTION (svcntw, svcnt_bhwd_impl, (VNx4SImode)) FUNCTION (svcntw_pat, svcnt_bhwd_pat_impl, (VNx4SImode)) -FUNCTION (svcompact, QUIET_CODE_FOR_MODE0 (aarch64_sve_compact),) +FUNCTION (svcompact, svcompact_impl,) FUNCTION (svcreate2, svcreate_impl, (2)) FUNCTION (svcreate3, svcreate_impl, (3)) FUNCTION (svcreate4, svcreate_impl, (4)) FUNCTION (svcvt, svcvt_impl,) -FUNCTION (svcvtnt, CODE_FOR_MODE0 (aarch64_sve_cvtnt),) +FUNCTION (svcvtnt, svcvtnt_impl,) FUNCTION (svdiv, svdiv_impl,) FUNCTION (svdivr, rtx_code_function_rotated, (DIV, UDIV, UNSPEC_COND_FDIV)) FUNCTION (svdot, svdot_impl,) @@ -3249,7 +3491,7 @@ FUNCTION (svdup_lane, svdup_lane_impl,) FUNCTION (svdupq, svdupq_impl,) FUNCTION (svdupq_lane, svdupq_lane_impl,) FUNCTION (sveor, rtx_code_function, (XOR, XOR, -1)) -FUNCTION (sveorv, reduction, (UNSPEC_XORV)) +FUNCTION (sveorv, sveorv_impl,) FUNCTION (svexpa, unspec_based_function, (-1, -1, UNSPEC_FEXPA)) FUNCTION (svext, QUIET_CODE_FOR_MODE0 (aarch64_sve_ext),) FUNCTION (svextb, svext_bhw_impl, (QImode)) @@ -3313,14 +3555,14 @@ FUNCTION (svmax, rtx_code_function, (SMAX, UMAX, UNSPEC_COND_FMAX, UNSPEC_FMAX)) FUNCTION (svmaxnm, cond_or_uncond_unspec_function, (UNSPEC_COND_FMAXNM, UNSPEC_FMAXNM)) -FUNCTION (svmaxnmv, reduction, (UNSPEC_FMAXNMV)) -FUNCTION (svmaxv, reduction, (UNSPEC_SMAXV, UNSPEC_UMAXV, UNSPEC_FMAXV)) +FUNCTION (svmaxnmv, svmaxnmv_impl,) +FUNCTION (svmaxv, svmaxv_impl,) FUNCTION (svmin, rtx_code_function, (SMIN, UMIN, UNSPEC_COND_FMIN, UNSPEC_FMIN)) FUNCTION (svminnm, cond_or_uncond_unspec_function, (UNSPEC_COND_FMINNM, UNSPEC_FMINNM)) -FUNCTION (svminnmv, reduction, (UNSPEC_FMINNMV)) -FUNCTION (svminv, reduction, (UNSPEC_SMINV, UNSPEC_UMINV, UNSPEC_FMINV)) +FUNCTION (svminnmv, svminnmv_impl,) +FUNCTION (svminv, svminv_impl,) FUNCTION (svmla, svmla_impl,) FUNCTION (svmla_lane, svmla_lane_impl,) FUNCTION (svmls, svmls_impl,) @@ -3343,7 +3585,7 @@ FUNCTION (svnor, svnor_impl,) FUNCTION (svnot, svnot_impl,) FUNCTION (svorn, svorn_impl,) FUNCTION (svorr, rtx_code_function, (IOR, IOR)) -FUNCTION (svorv, reduction, (UNSPEC_IORV)) +FUNCTION (svorv, svorv_impl,) FUNCTION (svpfalse, svpfalse_impl,) FUNCTION (svpfirst, svpfirst_svpnext_impl, (UNSPEC_PFIRST)) FUNCTION (svpnext, svpfirst_svpnext_impl, (UNSPEC_PNEXT)) @@ -3405,7 +3647,7 @@ FUNCTION (svset2, svset_impl, (2)) FUNCTION (svset3, svset_impl, (3)) FUNCTION (svset4, svset_impl, (4)) FUNCTION (svsetffr, svsetffr_impl,) -FUNCTION (svsplice, QUIET_CODE_FOR_MODE0 (aarch64_sve_splice),) +FUNCTION (svsplice, svsplice_impl,) FUNCTION (svsqrt, rtx_code_function, (SQRT, SQRT, UNSPEC_COND_FSQRT)) FUNCTION (svst1, svst1_impl,) FUNCTION (svst1_scatter, svst1_scatter_impl,) diff --git a/gcc/config/aarch64/aarch64-sve-builtins-sve2.cc b/gcc/config/aarch64/aarch64-sve-builtins-sve2.cc index fd0c98c6b68..073763916a8 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins-sve2.cc +++ b/gcc/config/aarch64/aarch64-sve-builtins-sve2.cc @@ -221,6 +221,18 @@ public: } }; +class svcvtxnt_impl : public CODE_FOR_MODE1 (aarch64_sve2_cvtxnt) +{ +public: + gimple * + fold (gimple_folder &f) const override + { + if (f.pred == PRED_x && is_pfalse (gimple_call_arg (f.call, 1))) + return f.fold_call_to (build_zero_cst (TREE_TYPE (f.lhs))); + return NULL; + } +}; + class svdup_laneq_impl : public function_base { public: @@ -358,6 +370,14 @@ class svmatch_svnmatch_impl : public function_base { public: CONSTEXPR svmatch_svnmatch_impl (int unspec) : m_unspec (unspec) {} + gimple * + fold (gimple_folder &f) const override + { + tree pg = gimple_call_arg (f.call, 0); + if (is_pfalse (pg)) + return f.fold_call_to (pg); + return NULL; + } rtx expand (function_expander &e) const override @@ -910,7 +930,7 @@ FUNCTION (svclamp, svclamp_impl,) FUNCTION (svcvtlt, unspec_based_function, (-1, -1, UNSPEC_COND_FCVTLT)) FUNCTION (svcvtn, svcvtn_impl,) FUNCTION (svcvtx, unspec_based_function, (-1, -1, UNSPEC_COND_FCVTX)) -FUNCTION (svcvtxnt, CODE_FOR_MODE1 (aarch64_sve2_cvtxnt),) +FUNCTION (svcvtxnt, svcvtxnt_impl,) FUNCTION (svdup_laneq, svdup_laneq_impl,) FUNCTION (sveor3, CODE_FOR_MODE0 (aarch64_sve2_eor3),) FUNCTION (sveorbt, unspec_based_function, (UNSPEC_EORBT, UNSPEC_EORBT, -1)) diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc index b3d961452d3..554984b786c 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins.cc +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc @@ -3517,6 +3517,15 @@ is_ptrue (tree v, unsigned int step) && vector_cst_all_same (v, step)); } +/* Return true if V is a constant predicate that acts as a pfalse. */ +bool +is_pfalse (tree v) +{ + return (TREE_CODE (v) == VECTOR_CST + && TYPE_MODE (TREE_TYPE (v)) == VNx16BImode + && integer_zerop (v)); +} + gimple_folder::gimple_folder (const function_instance &instance, tree fndecl, gimple_stmt_iterator *gsi_in, gcall *call_in) : function_call_info (gimple_location (call_in), instance, fndecl), @@ -3622,6 +3631,57 @@ gimple_folder::redirect_pred_x () return redirect_call (instance); } +/* Fold calls with predicate pfalse: + _m predication: lhs = op1. + _x or _z: lhs = {0, ...}. + Implicit predication that reads from memory: lhs = {0, ...}. + Implicit predication that writes to memory or prefetches: no-op. + Return the new gimple statement on success, else NULL. */ +gimple * +gimple_folder::fold_pfalse () +{ + if (pred == PRED_none) + return nullptr; + tree arg0 = gimple_call_arg (call, 0); + if (pred == PRED_m) + { + /* Unary function shapes with _m predication are folded to the + inactive vector (arg0), while other function shapes are folded + to op1 (arg1). */ + tree arg1 = gimple_call_arg (call, 1); + tree t; + if (is_pfalse (arg1)) + t = arg0; + else if (is_pfalse (arg0)) + t = arg1; + else + return nullptr; + /* In some intrinsics, e.g. svqshlu, lhs and op1 have different types, + such that folding to op1 needs a type conversion. */ + if (TREE_TYPE (t) == TREE_TYPE (lhs)) + return fold_call_to (t); + else + { + tree rhs = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), t); + return gimple_build_assign (lhs, VIEW_CONVERT_EXPR, rhs); + } + } + if ((pred == PRED_x || pred == PRED_z) && is_pfalse (arg0)) + return fold_call_to (build_zero_cst (TREE_TYPE (lhs))); + if (pred == PRED_implicit && is_pfalse (arg0)) + { + unsigned int flags = call_properties (); + /* Folding to lhs = {0, ...} is not appropriate for intrinsics with + AGGREGATE types as lhs. */ + if ((flags & CP_READ_MEMORY) + && !AGGREGATE_TYPE_P (TREE_TYPE (lhs))) + return fold_call_to (build_zero_cst (TREE_TYPE (lhs))); + if (flags & (CP_WRITE_MEMORY | CP_PREFETCH_MEMORY)) + return fold_to_stmt_vops (gimple_build_nop ()); + } + return nullptr; +} + /* Fold the call to constant VAL. */ gimple * gimple_folder::fold_to_cstu (poly_uint64 val) @@ -3724,6 +3784,23 @@ gimple_folder::fold_active_lanes_to (tree x) return gimple_build_assign (lhs, VEC_COND_EXPR, pred, x, vec_inactive); } +/* Fold call to assignment statement lhs = t. */ +gimple * +gimple_folder::fold_call_to (tree t) +{ + return fold_to_stmt_vops (gimple_build_assign (lhs, t)); +} + +/* Fold call to G, incl. adjustments to the virtual operands. */ +gimple * +gimple_folder::fold_to_stmt_vops (gimple *g) +{ + gimple_seq stmts = NULL; + gimple_seq_add_stmt_without_update (&stmts, g); + gsi_replace_with_seq_vops (gsi, stmts); + return g; +} + /* Try to fold the call. Return the new statement on success and null on failure. */ gimple * @@ -3743,6 +3820,8 @@ gimple_folder::fold () /* First try some simplifications that are common to many functions. */ if (auto *call = redirect_pred_x ()) return call; + if (auto *call = fold_pfalse ()) + return call; return base->fold (*this); } diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h b/gcc/config/aarch64/aarch64-sve-builtins.h index 5bd9b88d117..ad4d06dd5f5 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins.h +++ b/gcc/config/aarch64/aarch64-sve-builtins.h @@ -632,6 +632,7 @@ public: gcall *redirect_call (const function_instance &); gimple *redirect_pred_x (); + gimple *fold_pfalse (); gimple *fold_to_cstu (poly_uint64); gimple *fold_to_pfalse (); @@ -639,6 +640,8 @@ public: gimple *fold_to_vl_pred (unsigned int); gimple *fold_const_binary (enum tree_code); gimple *fold_active_lanes_to (tree); + gimple *fold_call_to (tree); + gimple *fold_to_stmt_vops (gimple *); gimple *fold (); @@ -832,6 +835,7 @@ extern tree acle_svprfop; bool vector_cst_all_same (tree, unsigned int); bool is_ptrue (tree, unsigned int); +bool is_pfalse (tree); const function_instance *lookup_fndecl (tree); /* Try to find a mode with the given mode_suffix_info fields. Return the diff --git a/gcc/testsuite/gcc.target/aarch64/pfalse-binary_0.h b/gcc/testsuite/gcc.target/aarch64/pfalse-binary_0.h new file mode 100644 index 00000000000..72fb5b6ac6d --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/pfalse-binary_0.h @@ -0,0 +1,240 @@ +#include <arm_sve.h> + +#define MXZ(F, RTY, TY1, TY2) \ + RTY F##_f (TY1 op1, TY2 op2) \ + { \ + return sv##F (svpfalse_b (), op1, op2); \ + } + +#define PRED_MXv(F, RTY, TYPE1, TYPE2, TY) \ + MXZ (F##_##TY##_m, RTY, TYPE1, sv##TYPE2) \ + MXZ (F##_##TY##_x, RTY, TYPE1, sv##TYPE2) + +#define PRED_Zv(F, RTY, TYPE1, TYPE2, TY) \ + MXZ (F##_##TY##_z, RTY, TYPE1, sv##TYPE2) + +#define PRED_MXZv(F, RTY, TYPE1, TYPE2, TY) \ + PRED_MXv (F, RTY, TYPE1, TYPE2, TY) \ + PRED_Zv (F, RTY, TYPE1, TYPE2, TY) + +#define PRED_Z(F, RTY, TYPE1, TYPE2, TY) \ + PRED_Zv (F, RTY, TYPE1, TYPE2, TY) \ + MXZ (F##_n_##TY##_z, RTY, TYPE1, TYPE2) + +#define PRED_MXZ(F, RTY, TYPE1, TYPE2, TY) \ + PRED_MXv (F, RTY, TYPE1, TYPE2, TY) \ + MXZ (F##_n_##TY##_m, RTY, TYPE1, TYPE2) \ + MXZ (F##_n_##TY##_x, RTY, TYPE1, TYPE2) \ + PRED_Z (F, RTY, TYPE1, TYPE2, TY) + +#define PRED_IMPLICITv(F, RTY, TYPE1, TYPE2, TY) \ + MXZ (F##_##TY, RTY, TYPE1, sv##TYPE2) + +#define PRED_IMPLICIT(F, RTY, TYPE1, TYPE2, TY) \ + PRED_IMPLICITv (F, RTY, TYPE1, TYPE2, TY) \ + MXZ (F##_n_##TY, RTY, TYPE1, TYPE2) + +#define ALL_Q_INTEGER(F, P) \ + PRED_##P (F, svuint8_t, svuint8_t, uint8_t, u8) \ + PRED_##P (F, svint8_t, svint8_t, int8_t, s8) + +#define ALL_Q_INTEGER_UINT(F, P) \ + PRED_##P (F, svuint8_t, svuint8_t, uint8_t, u8) \ + PRED_##P (F, svint8_t, svint8_t, uint8_t, s8) + +#define ALL_Q_INTEGER_INT(F, P) \ + PRED_##P (F, svuint8_t, svuint8_t, int8_t, u8) \ + PRED_##P (F, svint8_t, svint8_t, int8_t, s8) + +#define ALL_Q_INTEGER_BOOL(F, P) \ + PRED_##P (F, svbool_t, svuint8_t, uint8_t, u8) \ + PRED_##P (F, svbool_t, svint8_t, int8_t, s8) + +#define ALL_H_INTEGER(F, P) \ + PRED_##P (F, svuint16_t, svuint16_t, uint16_t, u16) \ + PRED_##P (F, svint16_t, svint16_t, int16_t, s16) + +#define ALL_H_INTEGER_UINT(F, P) \ + PRED_##P (F, svuint16_t, svuint16_t, uint16_t, u16) \ + PRED_##P (F, svint16_t, svint16_t, uint16_t, s16) + +#define ALL_H_INTEGER_INT(F, P) \ + PRED_##P (F, svuint16_t, svuint16_t, int16_t, u16) \ + PRED_##P (F, svint16_t, svint16_t, int16_t, s16) + +#define ALL_H_INTEGER_WIDE(F, P) \ + PRED_##P (F, svuint16_t, svuint16_t, uint8_t, u16) \ + PRED_##P (F, svint16_t, svint16_t, int8_t, s16) + +#define ALL_H_INTEGER_BOOL(F, P) \ + PRED_##P (F, svbool_t, svuint16_t, uint16_t, u16) \ + PRED_##P (F, svbool_t, svint16_t, int16_t, s16) + +#define ALL_S_INTEGER(F, P) \ + PRED_##P (F, svuint32_t, svuint32_t, uint32_t, u32) \ + PRED_##P (F, svint32_t, svint32_t, int32_t, s32) + +#define ALL_S_INTEGER_UINT(F, P) \ + PRED_##P (F, svuint32_t, svuint32_t, uint32_t, u32) \ + PRED_##P (F, svint32_t, svint32_t, uint32_t, s32) + +#define ALL_S_INTEGER_INT(F, P) \ + PRED_##P (F, svuint32_t, svuint32_t, int32_t, u32) \ + PRED_##P (F, svint32_t, svint32_t, int32_t, s32) + +#define ALL_S_INTEGER_WIDE(F, P) \ + PRED_##P (F, svuint32_t, svuint32_t, uint16_t, u32) \ + PRED_##P (F, svint32_t, svint32_t, int16_t, s32) + +#define ALL_S_INTEGER_BOOL(F, P) \ + PRED_##P (F, svbool_t, svuint32_t, uint32_t, u32) \ + PRED_##P (F, svbool_t, svint32_t, int32_t, s32) + +#define ALL_D_INTEGER(F, P) \ + PRED_##P (F, svuint64_t, svuint64_t, uint64_t, u64) \ + PRED_##P (F, svint64_t, svint64_t, int64_t, s64) + +#define ALL_D_INTEGER_UINT(F, P) \ + PRED_##P (F, svuint64_t, svuint64_t, uint64_t, u64) \ + PRED_##P (F, svint64_t, svint64_t, uint64_t, s64) + +#define ALL_D_INTEGER_INT(F, P) \ + PRED_##P (F, svuint64_t, svuint64_t, int64_t, u64) \ + PRED_##P (F, svint64_t, svint64_t, int64_t, s64) + +#define ALL_D_INTEGER_WIDE(F, P) \ + PRED_##P (F, svuint64_t, svuint64_t, uint32_t, u64) \ + PRED_##P (F, svint64_t, svint64_t, int32_t, s64) + +#define ALL_D_INTEGER_BOOL(F, P) \ + PRED_##P (F, svbool_t, svuint64_t, uint64_t, u64) \ + PRED_##P (F, svbool_t, svint64_t, int64_t, s64) + +#define SD_INTEGER_TO_UINT(F, P) \ + PRED_##P (F, svuint32_t, svuint32_t, uint32_t, u32) \ + PRED_##P (F, svuint64_t, svuint64_t, uint64_t, u64) \ + PRED_##P (F, svuint32_t, svint32_t, int32_t, s32) \ + PRED_##P (F, svuint64_t, svint64_t, int64_t, s64) + +#define BH_INTEGER_BOOL(F, P) \ + ALL_Q_INTEGER_BOOL (F, P) \ + ALL_H_INTEGER_BOOL (F, P) + +#define BHS_UNSIGNED_UINT64(F, P) \ + PRED_##P (F, svuint8_t, svuint8_t, uint64_t, u8) \ + PRED_##P (F, svuint16_t, svuint16_t, uint64_t, u16) \ + PRED_##P (F, svuint32_t, svuint32_t, uint64_t, u32) + +#define BHS_UNSIGNED_WIDE_BOOL(F, P) \ + PRED_##P (F, svbool_t, svuint8_t, uint64_t, u8) \ + PRED_##P (F, svbool_t, svuint16_t, uint64_t, u16) \ + PRED_##P (F, svbool_t, svuint32_t, uint64_t, u32) + +#define BHS_SIGNED_UINT64(F, P) \ + PRED_##P (F, svint8_t, svint8_t, uint64_t, s8) \ + PRED_##P (F, svint16_t, svint16_t, uint64_t, s16) \ + PRED_##P (F, svint32_t, svint32_t, uint64_t, s32) + +#define BHS_SIGNED_WIDE_BOOL(F, P) \ + PRED_##P (F, svbool_t, svint8_t, int64_t, s8) \ + PRED_##P (F, svbool_t, svint16_t, int64_t, s16) \ + PRED_##P (F, svbool_t, svint32_t, int64_t, s32) + +#define ALL_UNSIGNED_UINT(F, P) \ + PRED_##P (F, svuint8_t, svuint8_t, uint8_t, u8) \ + PRED_##P (F, svuint16_t, svuint16_t, uint16_t, u16) \ + PRED_##P (F, svuint32_t, svuint32_t, uint32_t, u32) \ + PRED_##P (F, svuint64_t, svuint64_t, uint64_t, u64) + +#define ALL_UNSIGNED_INT(F, P) \ + PRED_##P (F, svuint8_t, svuint8_t, int8_t, u8) \ + PRED_##P (F, svuint16_t, svuint16_t, int16_t, u16) \ + PRED_##P (F, svuint32_t, svuint32_t, int32_t, u32) \ + PRED_##P (F, svuint64_t, svuint64_t, int64_t, u64) + +#define ALL_SIGNED_UINT(F, P) \ + PRED_##P (F, svint8_t, svint8_t, uint8_t, s8) \ + PRED_##P (F, svint16_t, svint16_t, uint16_t, s16) \ + PRED_##P (F, svint32_t, svint32_t, uint32_t, s32) \ + PRED_##P (F, svint64_t, svint64_t, uint64_t, s64) + +#define ALL_FLOAT(F, P) \ + PRED_##P (F, svfloat16_t, svfloat16_t, float16_t, f16) \ + PRED_##P (F, svfloat32_t, svfloat32_t, float32_t, f32) \ + PRED_##P (F, svfloat64_t, svfloat64_t, float64_t, f64) + +#define ALL_FLOAT_INT(F, P) \ + PRED_##P (F, svfloat16_t, svfloat16_t, int16_t, f16) \ + PRED_##P (F, svfloat32_t, svfloat32_t, int32_t, f32) \ + PRED_##P (F, svfloat64_t, svfloat64_t, int64_t, f64) + +#define ALL_FLOAT_BOOL(F, P) \ + PRED_##P (F, svbool_t, svfloat16_t, float16_t, f16) \ + PRED_##P (F, svbool_t, svfloat32_t, float32_t, f32) \ + PRED_##P (F, svbool_t, svfloat64_t, float64_t, f64) + +#define ALL_FLOAT_SCALAR(F, P) \ + PRED_##P (F, float16_t, float16_t, float16_t, f16) \ + PRED_##P (F, float32_t, float32_t, float32_t, f32) \ + PRED_##P (F, float64_t, float64_t, float64_t, f64) + +#define B(F, P) \ + PRED_##P (F, svbool_t, svbool_t, bool_t, b) + +#define ALL_SD_INTEGER(F, P) \ + ALL_S_INTEGER (F, P) \ + ALL_D_INTEGER (F, P) + +#define HSD_INTEGER_WIDE(F, P) \ + ALL_H_INTEGER_WIDE (F, P) \ + ALL_S_INTEGER_WIDE (F, P) \ + ALL_D_INTEGER_WIDE (F, P) + +#define BHS_INTEGER_UINT64(F, P) \ + BHS_UNSIGNED_UINT64 (F, P) \ + BHS_SIGNED_UINT64 (F, P) + +#define BHS_INTEGER_WIDE_BOOL(F, P) \ + BHS_UNSIGNED_WIDE_BOOL (F, P) \ + BHS_SIGNED_WIDE_BOOL (F, P) + +#define ALL_INTEGER(F, P) \ + ALL_Q_INTEGER (F, P) \ + ALL_H_INTEGER (F, P) \ + ALL_S_INTEGER (F, P) \ + ALL_D_INTEGER (F, P) + +#define ALL_INTEGER_UINT(F, P) \ + ALL_Q_INTEGER_UINT (F, P) \ + ALL_H_INTEGER_UINT (F, P) \ + ALL_S_INTEGER_UINT (F, P) \ + ALL_D_INTEGER_UINT (F, P) + +#define ALL_INTEGER_INT(F, P) \ + ALL_Q_INTEGER_INT (F, P) \ + ALL_H_INTEGER_INT (F, P) \ + ALL_S_INTEGER_INT (F, P) \ + ALL_D_INTEGER_INT (F, P) + +#define ALL_INTEGER_BOOL(F, P) \ + ALL_Q_INTEGER_BOOL (F, P) \ + ALL_H_INTEGER_BOOL (F, P) \ + ALL_S_INTEGER_BOOL (F, P) \ + ALL_D_INTEGER_BOOL (F, P) + +#define ALL_FLOAT_AND_SD_INTEGER(F, P) \ + ALL_SD_INTEGER (F, P) \ + ALL_FLOAT (F, P) + +#define ALL_ARITH(F, P) \ + ALL_INTEGER (F, P) \ + ALL_FLOAT (F, P) + +#define ALL_ARITH_BOOL(F, P) \ + ALL_INTEGER_BOOL (F, P) \ + ALL_FLOAT_BOOL (F, P) + +#define ALL_DATA(F, P) \ + ALL_ARITH (F, P) \ + PRED_##P (F, svbfloat16_t, svbfloat16_t, bfloat16_t, bf16) + diff --git a/gcc/testsuite/gcc.target/aarch64/pfalse-unary_0.h b/gcc/testsuite/gcc.target/aarch64/pfalse-unary_0.h new file mode 100644 index 00000000000..f6183a14325 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/pfalse-unary_0.h @@ -0,0 +1,195 @@ +#include <arm_sve.h> +#include <stdbool.h> + +#define M(F, RTY, TY) \ + RTY F##_f (RTY inactive, TY op) \ + { \ + return sv##F (inactive, svpfalse_b (), op); \ + } + +#define XZI(F, RTY, TY) \ + RTY F##_f (TY op) \ + { \ + return sv##F (svpfalse_b (), op); \ + } + +#define PRED_Z(F, RTY, TYPE, TY) \ + XZI (F##_##TY##_z, RTY, sv##TYPE) \ + +#define PRED_MZ(F, RTY, TYPE, TY) \ + M (F##_##TY##_m, RTY, sv##TYPE) \ + PRED_Z (F, RTY, TYPE, TY) + +#define PRED_MXZ(F, RTY, TYPE, TY) \ + XZI (F##_##TY##_x, RTY, sv##TYPE) \ + PRED_MZ (F, RTY, TYPE, TY) + +#define PRED_IMPLICIT(F, RTY, TYPE, TY) \ + XZI (F##_##TY, RTY, sv##TYPE) + +#define PRED_IMPLICITn(F, RTY, TYPE) \ + XZI (F, RTY, sv##TYPE) + +#define Q_INTEGER(F, P) \ + PRED_##P (F, svuint8_t, uint8_t, u8) \ + PRED_##P (F, svint8_t, int8_t, s8) + +#define Q_INTEGER_SCALAR(F, P) \ + PRED_##P (F, uint8_t, uint8_t, u8) \ + PRED_##P (F, int8_t, int8_t, s8) + +#define Q_INTEGER_SCALAR_WIDE(F, P) \ + PRED_##P (F, uint64_t, uint8_t, u8) \ + PRED_##P (F, int64_t, int8_t, s8) + +#define H_INTEGER(F, P) \ + PRED_##P (F, svuint16_t, uint16_t, u16) \ + PRED_##P (F, svint16_t, int16_t, s16) + +#define H_INTEGER_SCALAR(F, P) \ + PRED_##P (F, uint16_t, uint16_t, u16) \ + PRED_##P (F, int16_t, int16_t, s16) + +#define H_INTEGER_SCALAR_WIDE(F, P) \ + PRED_##P (F, uint64_t, uint16_t, u16) \ + PRED_##P (F, int64_t, int16_t, s16) + +#define S_INTEGER(F, P) \ + PRED_##P (F, svuint32_t, uint32_t, u32) \ + PRED_##P (F, svint32_t, int32_t, s32) + +#define S_INTEGER_SCALAR(F, P) \ + PRED_##P (F, uint32_t, uint32_t, u32) \ + PRED_##P (F, int32_t, int32_t, s32) + +#define S_INTEGER_SCALAR_WIDE(F, P) \ + PRED_##P (F, uint64_t, uint32_t, u32) \ + PRED_##P (F, int64_t, int32_t, s32) + +#define S_UNSIGNED(F, P) \ + PRED_##P (F, svuint32_t, uint32_t, u32) + +#define D_INTEGER(F, P) \ + PRED_##P (F, svuint64_t, uint64_t, u64) \ + PRED_##P (F, svint64_t, int64_t, s64) + +#define D_INTEGER_SCALAR(F, P) \ + PRED_##P (F, uint64_t, uint64_t, u64) \ + PRED_##P (F, int64_t, int64_t, s64) + +#define SD_INTEGER(F, P) \ + S_INTEGER (F, P) \ + D_INTEGER (F, P) + +#define SD_DATA(F, P) \ + PRED_##P (F, svfloat32_t, float32_t, f32) \ + PRED_##P (F, svfloat64_t, float64_t, f64) \ + S_INTEGER (F, P) \ + D_INTEGER (F, P) + +#define ALL_SIGNED(F, P) \ + PRED_##P (F, svint8_t, int8_t, s8) \ + PRED_##P (F, svint16_t, int16_t, s16) \ + PRED_##P (F, svint32_t, int32_t, s32) \ + PRED_##P (F, svint64_t, int64_t, s64) + +#define ALL_SIGNED_UINT(F, P) \ + PRED_##P (F, svuint8_t, int8_t, s8) \ + PRED_##P (F, svuint16_t, int16_t, s16) \ + PRED_##P (F, svuint32_t, int32_t, s32) \ + PRED_##P (F, svuint64_t, int64_t, s64) + +#define ALL_UNSIGNED_UINT(F, P) \ + PRED_##P (F, svuint8_t, uint8_t, u8) \ + PRED_##P (F, svuint16_t, uint16_t, u16) \ + PRED_##P (F, svuint32_t, uint32_t, u32) \ + PRED_##P (F, svuint64_t, uint64_t, u64) + +#define HSD_INTEGER(F, P) \ + H_INTEGER (F, P) \ + S_INTEGER (F, P) \ + D_INTEGER (F, P) + +#define ALL_INTEGER(F, P) \ + Q_INTEGER (F, P) \ + HSD_INTEGER (F, P) + +#define ALL_INTEGER_SCALAR(F, P) \ + Q_INTEGER_SCALAR (F, P) \ + H_INTEGER_SCALAR (F, P) \ + S_INTEGER_SCALAR (F, P) \ + D_INTEGER_SCALAR (F, P) + +#define ALL_INTEGER_SCALAR_WIDE(F, P) \ + Q_INTEGER_SCALAR_WIDE (F, P) \ + H_INTEGER_SCALAR_WIDE (F, P) \ + S_INTEGER_SCALAR_WIDE (F, P) \ + D_INTEGER_SCALAR (F, P) + +#define ALL_INTEGER_UINT(F, P) \ + ALL_SIGNED_UINT (F, P) \ + ALL_UNSIGNED_UINT (F, P) + +#define ALL_FLOAT(F, P) \ + PRED_##P (F, svfloat16_t, float16_t, f16) \ + PRED_##P (F, svfloat32_t, float32_t, f32) \ + PRED_##P (F, svfloat64_t, float64_t, f64) + +#define ALL_FLOAT_SCALAR(F, P) \ + PRED_##P (F, float16_t, float16_t, f16) \ + PRED_##P (F, float32_t, float32_t, f32) \ + PRED_##P (F, float64_t, float64_t, f64) + +#define ALL_FLOAT_INT(F, P) \ + PRED_##P (F, svint16_t, float16_t, f16) \ + PRED_##P (F, svint32_t, float32_t, f32) \ + PRED_##P (F, svint64_t, float64_t, f64) + +#define ALL_FLOAT_UINT(F, P) \ + PRED_##P (F, svuint16_t, float16_t, f16) \ + PRED_##P (F, svuint32_t, float32_t, f32) \ + PRED_##P (F, svuint64_t, float64_t, f64) + +#define ALL_FLOAT_AND_SIGNED(F, P) \ + ALL_SIGNED (F, P) \ + ALL_FLOAT (F, P) + +#define ALL_ARITH_SCALAR(F, P) \ + ALL_INTEGER_SCALAR (F, P) \ + ALL_FLOAT_SCALAR (F, P) + +#define ALL_ARITH_SCALAR_WIDE(F, P) \ + ALL_INTEGER_SCALAR_WIDE (F, P) \ + ALL_FLOAT_SCALAR (F, P) + +#define ALL_DATA(F, P) \ + ALL_INTEGER (F, P) \ + ALL_FLOAT (F, P) \ + PRED_##P (F, svbfloat16_t, bfloat16_t, bf16) + +#define ALL_DATA_SCALAR(F, P) \ + ALL_ARITH_SCALAR (F, P) \ + PRED_##P (F, bfloat16_t, bfloat16_t, bf16) + +#define ALL_DATA_UINT(F, P) \ + ALL_INTEGER_UINT (F, P) \ + ALL_FLOAT_UINT (F, P) \ + PRED_##P (F, svuint16_t, bfloat16_t, bf16) + +#define B(F, P) \ + PRED_##P (F, svbool_t, bool_t, b) + +#define BN(F, P) \ + PRED_##P (F, svbool_t, bool_t, b8) \ + PRED_##P (F, svbool_t, bool_t, b16) \ + PRED_##P (F, svbool_t, bool_t, b32) \ + PRED_##P (F, svbool_t, bool_t, b64) + +#define BOOL(F, P) \ + PRED_##P (F, bool, bool_t) + +#define ALL_PRED_UINT64(F, P) \ + PRED_##P (F, uint64_t, bool_t, b8) \ + PRED_##P (F, uint64_t, bool_t, b16) \ + PRED_##P (F, uint64_t, bool_t, b32) \ + PRED_##P (F, uint64_t, bool_t, b64) diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary.c new file mode 100644 index 00000000000..a8fd4c8e8f2 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.h" + +B (brkn, Zv) +B (brkpa, Zv) +B (brkpb, Zv) +ALL_DATA (splice, IMPLICITv) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tpfalse\tp0\.b\n\tret\n} 3 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmov\tz0\.d, z1\.d\n\tret\n} 12 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 15 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_int_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_int_opt_n.c new file mode 100644 index 00000000000..08cd6a07c57 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_int_opt_n.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.h" + +ALL_FLOAT_INT (scale, MXZ) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 6 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 12 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 18 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_n.c new file mode 100644 index 00000000000..f5c9cbf1ae6 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_n.c @@ -0,0 +1,31 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.h" + +ALL_ARITH (abd, MXZ) +ALL_ARITH (add, MXZ) +ALL_INTEGER (and, MXZ) +B (and, Zv) +ALL_INTEGER (bic, MXZ) +B (bic, Zv) +ALL_FLOAT_AND_SD_INTEGER (div, MXZ) +ALL_FLOAT_AND_SD_INTEGER (divr, MXZ) +ALL_INTEGER (eor, MXZ) +B (eor, Zv) +ALL_ARITH (mul, MXZ) +ALL_INTEGER (mulh, MXZ) +ALL_FLOAT (mulx, MXZ) +B (nand, Zv) +B (nor, Zv) +B (orn, Zv) +ALL_INTEGER (orr, MXZ) +B (orr, Zv) +ALL_ARITH (sub, MXZ) +ALL_ARITH (subr, MXZ) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 224 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 448 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tpfalse\tp0\.b\n\tret\n} 7 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 679 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_single_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_single_n.c new file mode 100644 index 00000000000..91ae3c853f8 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_single_n.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.h" + +ALL_ARITH (max, MXZ) +ALL_ARITH (min, MXZ) +ALL_FLOAT (maxnm, MXZ) +ALL_FLOAT (minnm, MXZ) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 56 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 112 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 168 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_rotate.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_rotate.c new file mode 100644 index 00000000000..12368ce39e6 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_rotate.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define MXZ4(F, TYPE) \ + TYPE F##_f (TYPE op1, TYPE op2) \ + { \ + return sv##F (svpfalse_b (), op1, op2, 90); \ + } + +#define PRED_MXZ(F, TYPE, TY) \ + MXZ4 (F##_##TY##_m, TYPE) \ + MXZ4 (F##_##TY##_x, TYPE) \ + MXZ4 (F##_##TY##_z, TYPE) + +#define ALL_FLOAT(F, P) \ + PRED_##P (F, svfloat16_t, f16) \ + PRED_##P (F, svfloat32_t, f32) \ + PRED_##P (F, svfloat64_t, f64) + +ALL_FLOAT (cadd, MXZ) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 3 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 6 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 9 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint64_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint64_opt_n.c new file mode 100644 index 00000000000..dd52a5807e8 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint64_opt_n.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.h" + +BHS_SIGNED_UINT64 (asr_wide, MXZ) +BHS_INTEGER_UINT64 (lsl_wide, MXZ) +BHS_UNSIGNED_UINT64 (lsr_wide, MXZ) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 24 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 48 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 72 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint_opt_n.c new file mode 100644 index 00000000000..e55ddfb674f --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint_opt_n.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.h" + +ALL_SIGNED_UINT (asr, MXZ) +ALL_INTEGER_UINT (lsl, MXZ) +ALL_UNSIGNED_UINT (lsr, MXZ) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 32 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 64 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 96 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binaryxn.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binaryxn.c new file mode 100644 index 00000000000..6796229fb31 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binaryxn.c @@ -0,0 +1,31 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define T(F, TY) \ + sv##TY F##_f (sv##TY op1, sv##TY op2) \ + { \ + return sv##F (svpfalse_b (), op1, op2); \ + } + +#define ALL_DATA(F) \ + T (F##_bf16, bfloat16_t) \ + T (F##_f16, float16_t) \ + T (F##_f32, float32_t) \ + T (F##_f64, float64_t) \ + T (F##_s8, int8_t) \ + T (F##_s16, int16_t) \ + T (F##_s32, int32_t) \ + T (F##_s64, int64_t) \ + T (F##_u8, uint8_t) \ + T (F##_u16, uint16_t) \ + T (F##_u32, uint32_t) \ + T (F##_u64, uint64_t) \ + T (F##_b, bool_t) + +ALL_DATA (sel) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmov\t[zp]0\.[db], [zp]1\.[db]\n\tret\n} 13 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 13 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-clast.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-clast.c new file mode 100644 index 00000000000..7f2ec4acc26 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-clast.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.h" + +ALL_DATA (clasta, IMPLICITv) +ALL_DATA (clastb, IMPLICITv) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 24 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 24 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-compare_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-compare_opt_n.c new file mode 100644 index 00000000000..d18427bbf26 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-compare_opt_n.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.h" + +ALL_FLOAT_BOOL (acge, IMPLICIT) +ALL_FLOAT_BOOL (acgt, IMPLICIT) +ALL_FLOAT_BOOL (acle, IMPLICIT) +ALL_FLOAT_BOOL (aclt, IMPLICIT) +ALL_ARITH_BOOL (cmpeq, IMPLICIT) +ALL_ARITH_BOOL (cmpge, IMPLICIT) +ALL_ARITH_BOOL (cmpgt, IMPLICIT) +ALL_ARITH_BOOL (cmple, IMPLICIT) +ALL_ARITH_BOOL (cmplt, IMPLICIT) +ALL_ARITH_BOOL (cmpne, IMPLICIT) +ALL_FLOAT_BOOL (cmpuo, IMPLICIT) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tpfalse\tp0\.b\n\tret\n} 162 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 162 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-compare_wide_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-compare_wide_opt_n.c new file mode 100644 index 00000000000..983ab5c160d --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-compare_wide_opt_n.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.h" + +BHS_SIGNED_WIDE_BOOL (cmpeq_wide, IMPLICIT) +BHS_INTEGER_WIDE_BOOL (cmpge_wide, IMPLICIT) +BHS_INTEGER_WIDE_BOOL (cmpgt_wide, IMPLICIT) +BHS_INTEGER_WIDE_BOOL (cmple_wide, IMPLICIT) +BHS_INTEGER_WIDE_BOOL (cmplt_wide, IMPLICIT) +BHS_SIGNED_WIDE_BOOL (cmpne_wide, IMPLICIT) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tpfalse\tp0\.b\n\tret\n} 60 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 60 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-count_pred.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-count_pred.c new file mode 100644 index 00000000000..de36b66ffc0 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-count_pred.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-unary_0.h" + +ALL_PRED_UINT64 (cntp, IMPLICIT) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmov\tx0, 0\n\tret\n} 4 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 4 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-fold_left.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-fold_left.c new file mode 100644 index 00000000000..333140d897a --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-fold_left.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.h" + +ALL_FLOAT_SCALAR (adda, IMPLICITv) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 3 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 3 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load.c new file mode 100644 index 00000000000..93d66937364 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load.c @@ -0,0 +1,32 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define T(F, TY) \ + sv##TY F##_f (const TY *base) \ + { \ + return sv##F (svpfalse_b (), base); \ + } + +#define ALL_DATA(F) \ + T (F##_bf16, bfloat16_t) \ + T (F##_f16, float16_t) \ + T (F##_f32, float32_t) \ + T (F##_f64, float64_t) \ + T (F##_s8, int8_t) \ + T (F##_s16, int16_t) \ + T (F##_s32, int32_t) \ + T (F##_s64, int64_t) \ + T (F##_u8, uint8_t) \ + T (F##_u16, uint16_t) \ + T (F##_u32, uint32_t) \ + T (F##_u64, uint64_t) \ + +ALL_DATA (ldff1) +ALL_DATA (ldnf1) +ALL_DATA (ldnt1) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 36 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 36 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_ext.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_ext.c new file mode 100644 index 00000000000..c88686a2dd8 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_ext.c @@ -0,0 +1,40 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define T(F, RTY, TY) \ + RTY F##_f (const TY *base) \ + { \ + return sv##F (svpfalse_b (), base); \ + } + +#define D_INTEGER(F, TY) \ + T (F##_s64, svint64_t, TY) \ + T (F##_u64, svuint64_t, TY) + +#define SD_INTEGER(F, TY) \ + D_INTEGER (F, TY) \ + T (F##_s32, svint32_t, TY) \ + T (F##_u32, svuint32_t, TY) + +#define HSD_INTEGER(F, TY) \ + SD_INTEGER (F, TY) \ + T (F##_s16, svint16_t, TY) \ + T (F##_u16, svuint16_t, TY) + +#define TEST(F) \ + HSD_INTEGER (F##sb, int8_t) \ + SD_INTEGER (F##sh, int16_t) \ + D_INTEGER (F##sw, int32_t) \ + HSD_INTEGER (F##ub, uint8_t) \ + SD_INTEGER (F##uh, uint16_t) \ + D_INTEGER (F##uw, uint32_t) \ + +TEST (ld1) +TEST (ldff1) +TEST (ldnf1) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 72 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 72 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_ext_gather_index.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_ext_gather_index.c new file mode 100644 index 00000000000..5f4b562fca3 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_ext_gather_index.c @@ -0,0 +1,51 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define T1(F, RTY, TY1, TY2) \ + RTY F##_f (const TY1 *base, TY2 indices) \ + { \ + return sv##F (svpfalse_b (), base, indices); \ + } + +#define T2(F, RTY, TY1) \ + RTY F##_f (TY1 bases, int64_t index) \ + { \ + return sv##F (svpfalse_b (), bases, index); \ + } + +#define T3(F, B, RTY, TY, TYPE) \ + T1 (F##_gather_s##B##index_##TY, RTY, TYPE, svint##B##_t) \ + T1 (F##_gather_u##B##index_##TY, RTY, TYPE, svuint##B##_t) + +#define T4(F, B, RTY, TY) \ + T2 (F##_gather_##TY##base_index_s##B, svint##B##_t, RTY) \ + T2 (F##_gather_##TY##base_index_u##B, svuint##B##_t, RTY) + +#define TEST(F) \ + T3 (F##sh, 32, svint32_t, s32, int16_t) \ + T3 (F##sh, 32, svuint32_t, u32, int16_t) \ + T3 (F##sh, 64, svint64_t, s64, int16_t) \ + T3 (F##sh, 64, svuint64_t, u64, int16_t) \ + T4 (F##sh, 32, svuint32_t, u32) \ + T4 (F##sh, 64, svuint64_t, u64) \ + T3 (F##sw, 64, svint64_t, s64, int32_t) \ + T3 (F##sw, 64, svuint64_t, u64, int32_t) \ + T4 (F##sw, 64, svuint64_t, u64) \ + T3 (F##uh, 32, svint32_t, s32, uint16_t) \ + T3 (F##uh, 32, svuint32_t, u32, uint16_t) \ + T3 (F##uh, 64, svint64_t, s64, uint16_t) \ + T3 (F##uh, 64, svuint64_t, u64, uint16_t) \ + T4 (F##uh, 32, svuint32_t, u32) \ + T4 (F##uh, 64, svuint64_t, u64) \ + T3 (F##uw, 64, svint64_t, s64, uint32_t) \ + T3 (F##uw, 64, svuint64_t, u64, uint32_t) \ + T4 (F##uw, 64, svuint64_t, u64) \ + +TEST (ld1) +TEST (ldff1) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 72 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 72 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_ext_gather_offset.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_ext_gather_offset.c new file mode 100644 index 00000000000..0fe8ab3ea61 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_ext_gather_offset.c @@ -0,0 +1,71 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define T1(F, RTY, TY1, TY2) \ + RTY F##_f (const TY1 *base, TY2 offsets) \ + { \ + return sv##F (svpfalse_b (), base, offsets); \ + } + +#define T2(F, RTY, TY1) \ + RTY F##_f (TY1 bases, int64_t offset) \ + { \ + return sv##F (svpfalse_b (), bases, offset); \ + } + +#define T5(F, RTY, TY1) \ + RTY F##_f (TY1 bases) \ + { \ + return sv##F (svpfalse_b (), bases); \ + } + +#define T3(F, B, RTY, TY, TYPE) \ + T1 (F##_gather_s##B##offset_##TY, RTY, TYPE, svint##B##_t) \ + T1 (F##_gather_u##B##offset_##TY, RTY, TYPE, svuint##B##_t) + +#define T4(F, B, RTY, TY) \ + T2 (F##_gather_##TY##base_offset_s##B, svint##B##_t, RTY) \ + T2 (F##_gather_##TY##base_offset_u##B, svuint##B##_t, RTY) \ + T5 (F##_gather_##TY##base_s##B, svint##B##_t, RTY) \ + T5 (F##_gather_##TY##base_u##B, svuint##B##_t, RTY) + +#define TEST(F) \ + T3 (F##sb, 32, svint32_t, s32, int8_t) \ + T3 (F##sb, 32, svuint32_t, u32, int8_t) \ + T3 (F##sb, 64, svint64_t, s64, int8_t) \ + T3 (F##sb, 64, svuint64_t, u64, int8_t) \ + T4 (F##sb, 32, svuint32_t, u32) \ + T4 (F##sb, 64, svuint64_t, u64) \ + T3 (F##sh, 32, svint32_t, s32, int16_t) \ + T3 (F##sh, 32, svuint32_t, u32, int16_t) \ + T3 (F##sh, 64, svint64_t, s64, int16_t) \ + T3 (F##sh, 64, svuint64_t, u64, int16_t) \ + T4 (F##sh, 32, svuint32_t, u32) \ + T4 (F##sh, 64, svuint64_t, u64) \ + T3 (F##sw, 64, svint64_t, s64, int32_t) \ + T3 (F##sw, 64, svuint64_t, u64, int32_t) \ + T4 (F##sw, 64, svuint64_t, u64) \ + T3 (F##ub, 32, svint32_t, s32, uint8_t) \ + T3 (F##ub, 32, svuint32_t, u32, uint8_t) \ + T3 (F##ub, 64, svint64_t, s64, uint8_t) \ + T3 (F##ub, 64, svuint64_t, u64, uint8_t) \ + T4 (F##ub, 32, svuint32_t, u32) \ + T4 (F##ub, 64, svuint64_t, u64) \ + T3 (F##uh, 32, svint32_t, s32, uint16_t) \ + T3 (F##uh, 32, svuint32_t, u32, uint16_t) \ + T3 (F##uh, 64, svint64_t, s64, uint16_t) \ + T3 (F##uh, 64, svuint64_t, u64, uint16_t) \ + T4 (F##uh, 32, svuint32_t, u32) \ + T4 (F##uh, 64, svuint64_t, u64) \ + T3 (F##uw, 64, svint64_t, s64, uint32_t) \ + T3 (F##uw, 64, svuint64_t, u64, uint32_t) \ + T4 (F##uw, 64, svuint64_t, u64) \ + +TEST (ld1) +TEST (ldff1) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 160 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 160 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_gather_sv.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_gather_sv.c new file mode 100644 index 00000000000..758f00fe175 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_gather_sv.c @@ -0,0 +1,32 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define T1(F, RTY, TY1, TY2) \ + RTY F##_f (TY1 *base, TY2 values) \ + { \ + return sv##F (svpfalse_b (), base, values); \ + } + +#define T3(F, TY, B) \ + T1 (F##_f##B, svfloat##B##_t, float##B##_t, TY) \ + T1 (F##_s##B, svint##B##_t, int##B##_t, TY) \ + T1 (F##_u##B, svuint##B##_t, uint##B##_t, TY) \ + +#define T2(F, B) \ + T3 (F##_gather_u##B##offset, svuint##B##_t, B) \ + T3 (F##_gather_u##B##index, svuint##B##_t, B) \ + T3 (F##_gather_s##B##offset, svint##B##_t, B) \ + T3 (F##_gather_s##B##index, svint##B##_t, B) + +#define SD_DATA(F) \ + T2 (F, 32) \ + T2 (F, 64) + +SD_DATA (ld1) +SD_DATA (ldff1) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 48 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 48 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_gather_vs.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_gather_vs.c new file mode 100644 index 00000000000..f82471f3b1e --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_gather_vs.c @@ -0,0 +1,37 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define T1(F, RTY, TY) \ + RTY F##_f (TY bases) \ + { \ + return sv##F (svpfalse_b (), bases); \ + } + +#define T2(F, RTY, TY) \ + RTY F##_f (TY bases, int64_t value) \ + { \ + return sv##F (svpfalse_b (), bases, value); \ + } + +#define T4(F, TY, TEST, B) \ + TEST (F##_f##B, svfloat##B##_t, TY) \ + TEST (F##_s##B, svint##B##_t, TY) \ + TEST (F##_u##B, svuint##B##_t, TY) \ + +#define T3(F, B) \ + T4 (F##_gather_u##B##base, svuint##B##_t, T1, B) \ + T4 (F##_gather_u##B##base_offset, svuint##B##_t, T2, B) \ + T4 (F##_gather_u##B##base_index, svuint##B##_t, T2, B) + +#define SD_DATA(F) \ + T3 (F, 32) \ + T3 (F, 64) + +SD_DATA (ld1) +SD_DATA (ldff1) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 36 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 36 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_replicate.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_replicate.c new file mode 100644 index 00000000000..ba500b6cab8 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_replicate.c @@ -0,0 +1,31 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2 -march=armv8.2-a+sve+f64mm" } */ + +#include <arm_sve.h> + +#define T(F, TY) \ + sv##TY F##_f (const TY *base) \ + { \ + return sv##F (svpfalse_b (), base); \ + } + +#define ALL_DATA(F) \ + T (F##_bf16, bfloat16_t) \ + T (F##_f16, float16_t) \ + T (F##_f32, float32_t) \ + T (F##_f64, float64_t) \ + T (F##_s8, int8_t) \ + T (F##_s16, int16_t) \ + T (F##_s32, int32_t) \ + T (F##_s64, int64_t) \ + T (F##_u8, uint8_t) \ + T (F##_u16, uint16_t) \ + T (F##_u32, uint32_t) \ + T (F##_u64, uint64_t) \ + +ALL_DATA (ld1rq) +ALL_DATA (ld1ro) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 24 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 24 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-prefetch.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-prefetch.c new file mode 100644 index 00000000000..71894c4b38a --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-prefetch.c @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define T(F) \ + void F##_f (const void *base) \ + { \ + return sv##F (svpfalse_b (), base, 0); \ + } + +#define ALL_PREFETCH \ + T (prfb) \ + T (prfh) \ + T (prfw) \ + T (prfd) + +ALL_PREFETCH + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 4 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 4 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-prefetch_gather_index.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-prefetch_gather_index.c new file mode 100644 index 00000000000..1b7cc42b969 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-prefetch_gather_index.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define T(F, TYPE, TY) \ + void F##_##TY##_f (const void *base, sv##TYPE indices) \ + { \ + return sv##F##_##TY##index (svpfalse_b (), base, indices, 0);\ + } + +#define T1(F) \ + T (F, uint32_t, u32) \ + T (F, uint64_t, u64) \ + T (F, int32_t, s32) \ + T (F, int64_t, s64) + +#define ALL_PREFETCH_GATHER_INDEX \ + T1 (prfh_gather) \ + T1 (prfw_gather) \ + T1 (prfd_gather) + +ALL_PREFETCH_GATHER_INDEX + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 12 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 12 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-prefetch_gather_offset.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-prefetch_gather_offset.c new file mode 100644 index 00000000000..7f4ff2df523 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-prefetch_gather_offset.c @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define T(F, TYPE, TY) \ + void F##_##TY##_f (const void *base, sv##TYPE offsets) \ + { \ + return sv##F##_##TY##offset (svpfalse_b (), base, offsets, 0); \ + } + +#define T1(F) \ + T (F, uint32_t, u32) \ + T (F, uint64_t, u64) \ + T (F, int32_t, s32) \ + T (F, int64_t, s64) + +#define ALL_PREFETCH_GATHER_OFFSET \ + T1 (prfb_gather) \ + +ALL_PREFETCH_GATHER_OFFSET + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 4 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 4 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-ptest.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-ptest.c new file mode 100644 index 00000000000..0a587fca869 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-ptest.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-unary_0.h" + +BOOL (ptest_any, IMPLICITn) +BOOL (ptest_first, IMPLICITn) +BOOL (ptest_last, IMPLICITn) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmov\tw0, 0\n\tret\n} 3 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 3 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-rdffr.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-rdffr.c new file mode 100644 index 00000000000..d795f8ec3f8 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-rdffr.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +svbool_t rdffr_f () +{ + return svrdffr_z (svpfalse_b ()); +} + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tpfalse\tp0\.b\n\tret\n} 1 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 1 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-reduction.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-reduction.c new file mode 100644 index 00000000000..42b37aef0b5 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-reduction.c @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +#include "../pfalse-unary_0.h" + +ALL_INTEGER_SCALAR (andv, IMPLICIT) +ALL_INTEGER_SCALAR (eorv, IMPLICIT) +ALL_ARITH_SCALAR (maxv, IMPLICIT) +ALL_ARITH_SCALAR (minv, IMPLICIT) +ALL_INTEGER_SCALAR (orv, IMPLICIT) +ALL_FLOAT_SCALAR (maxnmv, IMPLICIT) +ALL_FLOAT_SCALAR (minnmv, IMPLICIT) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmov\t[wx]0, 0\n\tret\n} 20 } } */ +/* { dg-final { scan-tree-dump-times "return Nan" 6 "optimized" } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmov\t[wx]0, -1\n\tret\n} 12 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmov\t[wxz]0(?:\.[sd])?, #?-?[1-9]+[0-9]*\n\tret\n} 27 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\t(movi|mvni)\tv0\.(2s|4h), 0x[0-9a-f]+, [a-z]+ [0-9]+\n\tret\n} 5 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 52 } } */ + +/* The sum of tested cases is 52 + 12, because mov [wx]0, -1 is tested in two + patterns. */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-reduction_wide.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-reduction_wide.c new file mode 100644 index 00000000000..bd9a9807611 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-reduction_wide.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-unary_0.h" + +ALL_ARITH_SCALAR_WIDE (addv, IMPLICIT) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[dwvx]0(?:\.(2s|4h))?, #?0\n\tret\n} 11 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 11 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-shift_right_imm.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-shift_right_imm.c new file mode 100644 index 00000000000..62a07557e82 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-shift_right_imm.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define MXZ(F, TY) \ + TY F##_f (TY op1) \ + { \ + return sv##F (svpfalse_b (), op1, 2); \ + } + +#define PRED_MXZn(F, TYPE, TY) \ + MXZ (F##_n_##TY##_m, TYPE) \ + MXZ (F##_n_##TY##_x, TYPE) \ + MXZ (F##_n_##TY##_z, TYPE) + +#define ALL_SIGNED_IMM(F, P) \ + PRED_##P (F, svint8_t, s8) \ + PRED_##P (F, svint16_t, s16) \ + PRED_##P (F, svint32_t, s32) \ + PRED_##P (F, svint64_t, s64) + +ALL_SIGNED_IMM (asrd, MXZn) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 4 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 8 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 12 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-store.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-store.c new file mode 100644 index 00000000000..751e60e2ba2 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-store.c @@ -0,0 +1,53 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define T(F, TY1, TY2) \ + void F##_f (TY1 *base, sv##TY2 data) \ + { \ + return sv##F (svpfalse_b (), base, data); \ + } + +#define D_INTEGER(F, TY) \ + T (F##_s64, TY, int64_t) \ + T (F##_u64, TY, uint64_t) + +#define SD_INTEGER(F, TY) \ + D_INTEGER (F, TY) \ + T (F##_s32, TY, int32_t) \ + T (F##_u32, TY, uint32_t) + +#define HSD_INTEGER(F, TY) \ + SD_INTEGER (F, TY) \ + T (F##_s16, TY, int16_t) \ + T (F##_u16, TY, uint16_t) + +#define ALL_DATA(F, A) \ + T (F##_bf16, bfloat16_t, bfloat16##A) \ + T (F##_f16, float16_t, float16##A) \ + T (F##_f32, float32_t, float32##A) \ + T (F##_f64, float64_t, float64##A) \ + T (F##_s8, int8_t, int8##A) \ + T (F##_s16, int16_t, int16##A) \ + T (F##_s32, int32_t, int32##A) \ + T (F##_s64, int64_t, int64##A) \ + T (F##_u8, uint8_t, uint8##A) \ + T (F##_u16, uint16_t, uint16##A) \ + T (F##_u32, uint32_t, uint32##A) \ + T (F##_u64, uint64_t, uint64##A) \ + +HSD_INTEGER (st1b, int8_t) +SD_INTEGER (st1h, int16_t) +D_INTEGER (st1w, int32_t) +ALL_DATA (st1, _t) +ALL_DATA (st2, x2_t) +ALL_DATA (st3, x3_t) +ALL_DATA (st4, x4_t) + +/* FIXME: Currently, st1/2/3/4 are not folded with a pfalse + predicate, which is the reason for the 48 missing cases below. Once + folding is implemented for these intrinsics, the sum should be 60. */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 12 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 60 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-store_scatter_index.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-store_scatter_index.c new file mode 100644 index 00000000000..44792d30b42 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-store_scatter_index.c @@ -0,0 +1,40 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define T1(F, TY1, TY2, TY3) \ + void F##_f (TY1 *base, TY2 indices, TY3 data) \ + { \ + sv##F (svpfalse_b (), base, indices, data); \ + } + +#define T2(F, TY1, TY3) \ + void F##_f (TY1 bases, int64_t index, TY3 data) \ + { \ + sv##F (svpfalse_b (), bases, index, data); \ + } + +#define T3(F, B, TYPE1, TY, TYPE2) \ + T1 (F##_scatter_s##B##index_##TY, TYPE2, svint##B##_t, TYPE1) \ + T1 (F##_scatter_u##B##index_##TY, TYPE2, svuint##B##_t, TYPE1) + +#define T4(F, B, TYPE1, TY) \ + T2 (F##_scatter_u##B##base_index_##TY, svuint##B##_t, TYPE1) + +#define TEST(F) \ + T3 (F##h, 32, svint32_t, s32, int16_t) \ + T3 (F##h, 32, svuint32_t, u32, int16_t) \ + T3 (F##h, 64, svint64_t, s64, int16_t) \ + T3 (F##h, 64, svuint64_t, u64, int16_t) \ + T4 (F##h, 32, svuint32_t, u32) \ + T4 (F##h, 64, svuint64_t, u64) \ + T3 (F##w, 64, svint64_t, s64, int32_t) \ + T3 (F##w, 64, svuint64_t, u64, int32_t) \ + T4 (F##w, 64, svuint64_t, u64) \ + +TEST (st1) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 15 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 15 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-store_scatter_offset.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-store_scatter_offset.c new file mode 100644 index 00000000000..f3820e065f9 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-store_scatter_offset.c @@ -0,0 +1,68 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define T1(F, TY1, TY2, TY3) \ + void F##_f (TY1 *base, TY2 offsets, TY3 data) \ + { \ + sv##F (svpfalse_b (), base, offsets, data); \ + } + +#define T2(F, TY1, TY3) \ + void F##_f (TY1 bases, int64_t offset, TY3 data) \ + { \ + sv##F (svpfalse_b (), bases, offset, data); \ + } + +#define T5(F, TY1, TY3) \ + void F##_f (TY1 bases, TY3 data) \ + { \ + sv##F (svpfalse_b (), bases, data); \ + } + + +#define T3(F, B, TYPE1, TY, TYPE2) \ + T1 (F##_scatter_s##B##offset_##TY, TYPE2, svint##B##_t, TYPE1)\ + T1 (F##_scatter_u##B##offset_##TY, TYPE2, svuint##B##_t, TYPE1) + +#define T4(F, B, TYPE1, TY) \ + T2 (F##_scatter_u##B##base_offset_##TY, svuint##B##_t, TYPE1) \ + T5 (F##_scatter_u##B##base_##TY, svuint##B##_t, TYPE1) + +#define D_INTEGER(F, BHW) \ + T3 (F, 64, svint64_t, s64, int##BHW##_t) \ + T3 (F, 64, svuint64_t, u64, int##BHW##_t) \ + T4 (F, 64, svint64_t, s64) \ + T4 (F, 64, svuint64_t, u64) + +#define SD_INTEGER(F, BHW) \ + D_INTEGER (F, BHW) \ + T3 (F, 32, svint32_t, s32, int##BHW##_t) \ + T3 (F, 32, svuint32_t, u32, int##BHW##_t) \ + T4 (F, 32, svint32_t, s32) \ + T4 (F, 32, svuint32_t, u32) + +#define SD_DATA(F) \ + T3 (F, 32, svint32_t, s32, int32_t) \ + T3 (F, 64, svint64_t, s64, int64_t) \ + T4 (F, 32, svint32_t, s32) \ + T4 (F, 64, svint64_t, s64) \ + T3 (F, 32, svuint32_t, u32, uint32_t) \ + T3 (F, 64, svuint64_t, u64, uint64_t) \ + T4 (F, 32, svuint32_t, u32) \ + T4 (F, 64, svuint64_t, u64) \ + T3 (F, 32, svfloat32_t, f32, float32_t) \ + T3 (F, 64, svfloat64_t, f64, float64_t) \ + T4 (F, 32, svfloat32_t, f32) \ + T4 (F, 64, svfloat64_t, f64) + + +SD_DATA (st1) +SD_INTEGER (st1b, 8) +SD_INTEGER (st1h, 16) +D_INTEGER (st1w, 32) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 64 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 64 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-storexn.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-storexn.c new file mode 100644 index 00000000000..e49266dab68 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-storexn.c @@ -0,0 +1,30 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define T(F, TY) \ + void F##_f (TY *base, sv##TY data) \ + { \ + return sv##F (svpfalse_b (), base, data); \ + } + +#define ALL_DATA(F) \ + T (F##_bf16, bfloat16_t) \ + T (F##_f16, float16_t) \ + T (F##_f32, float32_t) \ + T (F##_f64, float64_t) \ + T (F##_s8, int8_t) \ + T (F##_s16, int16_t) \ + T (F##_s32, int32_t) \ + T (F##_s64, int64_t) \ + T (F##_u8, uint8_t) \ + T (F##_u16, uint16_t) \ + T (F##_u32, uint32_t) \ + T (F##_u64, uint64_t) + +ALL_DATA (stnt1) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 12 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 12 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-ternary_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-ternary_opt_n.c new file mode 100644 index 00000000000..acdd1417af4 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-ternary_opt_n.c @@ -0,0 +1,51 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define MXZ(F, TY12, TY3) \ + TY12 F##_f (TY12 op1, TY12 op2, TY3 op3) \ + { \ + return sv##F (svpfalse_b (), op1, op2, op3); \ + } + +#define PRED_MXZ(F, TYPE, TY) \ + MXZ (F##_##TY##_m, sv##TYPE, sv##TYPE) \ + MXZ (F##_n_##TY##_m, sv##TYPE, TYPE) \ + MXZ (F##_##TY##_x, sv##TYPE, sv##TYPE) \ + MXZ (F##_n_##TY##_x, sv##TYPE, TYPE) \ + MXZ (F##_##TY##_z, sv##TYPE, sv##TYPE) \ + MXZ (F##_n_##TY##_z, sv##TYPE, TYPE) + +#define ALL_FLOAT(F, P) \ + PRED_##P (F, float16_t, f16) \ + PRED_##P (F, float32_t, f32) \ + PRED_##P (F, float64_t, f64) + +#define ALL_INTEGER(F, P) \ + PRED_##P (F, uint8_t, u8) \ + PRED_##P (F, uint16_t, u16) \ + PRED_##P (F, uint32_t, u32) \ + PRED_##P (F, uint64_t, u64) \ + PRED_##P (F, int8_t, s8) \ + PRED_##P (F, int16_t, s16) \ + PRED_##P (F, int32_t, s32) \ + PRED_##P (F, int64_t, s64) \ + +#define ALL_ARITH(F, P) \ + ALL_INTEGER (F, P) \ + ALL_FLOAT (F, P) + +ALL_ARITH (mad, MXZ) +ALL_ARITH (mla, MXZ) +ALL_ARITH (mls, MXZ) +ALL_ARITH (msb, MXZ) +ALL_FLOAT (nmad, MXZ) +ALL_FLOAT (nmla, MXZ) +ALL_FLOAT (nmls, MXZ) +ALL_FLOAT (nmsb, MXZ) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 112 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 224 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 336 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-ternary_rotate.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-ternary_rotate.c new file mode 100644 index 00000000000..7698045d27c --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-ternary_rotate.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define MXZ(F, TY) \ + TY F##_f (TY op1, TY op2, TY op3) \ + { \ + return sv##F (svpfalse_b (), op1, op2, op3, 90); \ + } + +#define PRED_MXZ(F, TYPE, TY) \ + MXZ (F##_##TY##_m, sv##TYPE) \ + MXZ (F##_##TY##_x, sv##TYPE) \ + MXZ (F##_##TY##_z, sv##TYPE) + +#define ALL_FLOAT(F, P) \ + PRED_##P (F, float16_t, f16) \ + PRED_##P (F, float32_t, f32) \ + PRED_##P (F, float64_t, f64) + +ALL_FLOAT (cmla, MXZ) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 3 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 6 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 9 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary.c new file mode 100644 index 00000000000..037376b3a4a --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary.c @@ -0,0 +1,33 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-unary_0.h" + +ALL_FLOAT_AND_SIGNED (abs, MXZ) +B (brka, MZ) +B (brkb, MZ) +ALL_INTEGER (cnot, MXZ) +HSD_INTEGER (extb, MXZ) +SD_INTEGER (exth, MXZ) +D_INTEGER (extw, MXZ) +B (mov, Z) +ALL_FLOAT_AND_SIGNED (neg, MXZ) +ALL_INTEGER (not, MXZ) +B (not, Z) +B (pfirst, IMPLICIT) +ALL_INTEGER (rbit, MXZ) +ALL_FLOAT (recpx, MXZ) +HSD_INTEGER (revb, MXZ) +SD_INTEGER (revh, MXZ) +D_INTEGER (revw, MXZ) +ALL_FLOAT (rinti, MXZ) +ALL_FLOAT (rintx, MXZ) +ALL_FLOAT (rintz, MXZ) +ALL_FLOAT (sqrt, MXZ) +SD_DATA (compact, IMPLICIT) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 80 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 160 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tpfalse\tp0\.b\n\tret\n} 4 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 244 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_convert_narrowt.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_convert_narrowt.c new file mode 100644 index 00000000000..1287a70a7ed --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_convert_narrowt.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2 -march=armv8.2-a+sve+bf16" } */ + +#include <arm_sve.h> + +#define T(F, TYPE1, TY1, TYPE2, TY2) \ + TYPE1 F##_##TY1##_##TY2##_x_f (TYPE1 even, TYPE2 op) \ + { \ + return sv##F##_##TY1##_##TY2##_x (even, svpfalse_b (), op); \ + } \ + TYPE1 F##_##TY1##_##TY2##_m_f (TYPE1 even, TYPE2 op) \ + { \ + return sv##F##_##TY1##_##TY2##_m (even, svpfalse_b (), op); \ + } + +T (cvtnt, svbfloat16_t, bf16, svfloat32_t, f32) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 1 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?: [0-9]*[bhsd])?, #?0\n\tret\n} 1 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 2 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_convertxn.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_convertxn.c new file mode 100644 index 00000000000..f5192666f12 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_convertxn.c @@ -0,0 +1,50 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2 -march=armv8.2-a+sve+bf16" } */ + +#include <arm_sve.h> + +#define T(TYPE1, TY1, TYPE2, TY2) \ + TYPE1 cvt_##TY1##_##TY2##_x_f (TYPE2 op) \ + { \ + return svcvt_##TY1##_##TY2##_x (svpfalse_b (), op); \ + } \ + TYPE1 cvt_##TY1##_##TY2##_z_f (TYPE2 op) \ + { \ + return svcvt_##TY1##_##TY2##_z (svpfalse_b (), op); \ + } \ + TYPE1 cvt_##TY1##_##TY2##_m_f (TYPE1 inactive, TYPE2 op) \ + { \ + return svcvt_##TY1##_##TY2##_m (inactive, svpfalse_b (), op); \ + } + +#define SWAP(TYPE1, TY1, TYPE2, TY2) \ + T (TYPE1, TY1, TYPE2, TY2) \ + T (TYPE2, TY2, TYPE1, TY1) + +#define TEST_ALL \ + T (svbfloat16_t, bf16, svfloat32_t, f32) \ + SWAP (svfloat16_t, f16, svfloat32_t, f32) \ + SWAP (svfloat16_t, f16, svfloat64_t, f64) \ + SWAP (svfloat32_t, f32, svfloat64_t, f64) \ + SWAP (svint16_t, s16, svfloat16_t, f16) \ + SWAP (svint32_t, s32, svfloat16_t, f16) \ + SWAP (svint32_t, s32, svfloat32_t, f32) \ + SWAP (svint32_t, s32, svfloat64_t, f64) \ + SWAP (svint64_t, s64, svfloat16_t, f16) \ + SWAP (svint64_t, s64, svfloat32_t, f32) \ + SWAP (svint64_t, s64, svfloat64_t, f64) \ + SWAP (svuint16_t, u16, svfloat16_t, f16) \ + SWAP (svuint32_t, u32, svfloat16_t, f16) \ + SWAP (svuint32_t, u32, svfloat32_t, f32) \ + SWAP (svuint32_t, u32, svfloat64_t, f64) \ + SWAP (svuint64_t, u64, svfloat16_t, f16) \ + SWAP (svuint64_t, u64, svfloat32_t, f32) \ + SWAP (svuint64_t, u64, svfloat64_t, f64) + +TEST_ALL + + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 35 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 70 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 105 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_n.c new file mode 100644 index 00000000000..fabde3e52a5 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_n.c @@ -0,0 +1,43 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define M(F, TY) \ + sv##TY F##_f (sv##TY inactive, TY op) \ + { \ + return sv##F (inactive, svpfalse_b (), op); \ + } + +#define XZ(F, TY) \ + sv##TY F##_f (TY op) \ + { \ + return sv##F (svpfalse_b (), op); \ + } + +#define PRED_MXZ(F, TYPE, TY) \ + M (F##_##TY##_m, TYPE) \ + XZ (F##_##TY##_x, TYPE) \ + XZ (F##_##TY##_z, TYPE) + +#define ALL_DATA(F, P) \ + PRED_##P (F, uint8_t, u8) \ + PRED_##P (F, uint16_t, u16) \ + PRED_##P (F, uint32_t, u32) \ + PRED_##P (F, uint64_t, u64) \ + PRED_##P (F, int8_t, s8) \ + PRED_##P (F, int16_t, s16) \ + PRED_##P (F, int32_t, s32) \ + PRED_##P (F, int64_t, s64) \ + PRED_##P (F, float16_t, f16) \ + PRED_##P (F, float32_t, f32) \ + PRED_##P (F, float64_t, f64) \ + PRED_##P (F, bfloat16_t, bf16) \ + +ALL_DATA (dup, MXZ) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 12 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 12 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmov\tz0(?:\.[bhsd])?, [wxhsd]0\n\tret\n} 12 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 36 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_pred.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_pred.c new file mode 100644 index 00000000000..46c9592c47d --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_pred.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-unary_0.h" + +BN (pnext, IMPLICIT) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tpfalse\tp0\.b\n\tret\n} 4 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 4 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_to_uint.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_to_uint.c new file mode 100644 index 00000000000..b820bde817f --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_to_uint.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-unary_0.h" + +ALL_SIGNED_UINT (cls, MXZ) +ALL_INTEGER_UINT (clz, MXZ) +ALL_DATA_UINT (cnt, MXZ) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 24 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 48 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 72 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unaryxn.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unaryxn.c new file mode 100644 index 00000000000..1e99b7f2f8b --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unaryxn.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-unary_0.h" + +ALL_FLOAT (rinta, MXZ) +ALL_FLOAT (rintm, MXZ) +ALL_FLOAT (rintn, MXZ) +ALL_FLOAT (rintp, MXZ) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 12} } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 24 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 36 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary.c new file mode 100644 index 00000000000..94470a5d617 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.h" + +ALL_ARITH (addp, MXv) +ALL_ARITH (maxp, MXv) +ALL_FLOAT (maxnmp, MXv) +ALL_ARITH (minp, MXv) +ALL_FLOAT (minnmp, MXv) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 39 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 39 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 78 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_int_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_int_opt_n.c new file mode 100644 index 00000000000..b8747b8d50a --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_int_opt_n.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.h" + +ALL_INTEGER_INT (qshl, MXZ) +ALL_INTEGER_INT (qrshl, MXZ) +ALL_UNSIGNED_INT (sqadd, MXZ) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 40 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 80 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 120 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_int_opt_single_n.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_int_opt_single_n.c new file mode 100644 index 00000000000..7cb7ee5203c --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_int_opt_single_n.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.h" + +ALL_INTEGER_INT (rshl, MXZ) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 16 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 32 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 48 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_opt_n.c new file mode 100644 index 00000000000..787126f70bd --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_opt_n.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.h" + +ALL_INTEGER (hadd, MXZ) +ALL_INTEGER (hsub, MXZ) +ALL_INTEGER (hsubr, MXZ) +ALL_INTEGER (qadd, MXZ) +ALL_INTEGER (qsub, MXZ) +ALL_INTEGER (qsubr, MXZ) +ALL_INTEGER (rhadd, MXZ) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 112 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 224 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 336 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_opt_single_n.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_opt_single_n.c new file mode 100644 index 00000000000..6b2b0a424d3 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_opt_single_n.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2 -march=armv8.2-a+sve2+faminmax" } */ + +#include "../pfalse-binary_0.h" + +ALL_FLOAT (amax, MXZ) +ALL_FLOAT (amin, MXZ) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 12 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 24 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 36 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_to_uint.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_to_uint.c new file mode 100644 index 00000000000..a0a7f809f82 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_to_uint.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.h" + +SD_INTEGER_TO_UINT (histcnt, Zv) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 4 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 4 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_uint_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_uint_opt_n.c new file mode 100644 index 00000000000..c13db48948b --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_uint_opt_n.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.h" + +ALL_SIGNED_UINT (uqadd, MXZ) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 8 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 16 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 24 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_wide.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_wide.c new file mode 100644 index 00000000000..145b07760d7 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_wide.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.h" + +HSD_INTEGER_WIDE (adalp, MXZv) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 6 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 12 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 18 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-compare.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-compare.c new file mode 100644 index 00000000000..da175db92e4 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-compare.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.h" + +BH_INTEGER_BOOL (match, IMPLICITv) +BH_INTEGER_BOOL (nmatch, IMPLICITv) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tpfalse\tp0\.b\n\tret\n} 8 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 8 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_ext_gather_index_restricted.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_ext_gather_index_restricted.c new file mode 100644 index 00000000000..c0476ce9c74 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_ext_gather_index_restricted.c @@ -0,0 +1,46 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define T1(F, RTY, TY1, TY2) \ + RTY F##_f (const TY1 *base, TY2 indices) \ + { \ + return sv##F (svpfalse_b (), base, indices); \ + } + +#define T2(F, RTY, TY1) \ + RTY F##_f (TY1 bases, int64_t index) \ + { \ + return sv##F (svpfalse_b (), bases, index); \ + } + +#define T3(F, B, RTY, TY, TYPE) \ + T1 (F##_gather_s##B##index_##TY, RTY, TYPE, svint##B##_t) \ + T1 (F##_gather_u##B##index_##TY, RTY, TYPE, svuint##B##_t) + +#define T4(F, B, RTY, TY) \ + T2 (F##_gather_##TY##base_index_s##B, svint##B##_t, RTY) \ + T2 (F##_gather_##TY##base_index_u##B, svuint##B##_t, RTY) + +#define TEST(F) \ + T3 (F##sh, 64, svint64_t, s64, int16_t) \ + T3 (F##sh, 64, svuint64_t, u64, int16_t) \ + T4 (F##sh, 32, svuint32_t, u32) \ + T4 (F##sh, 64, svuint64_t, u64) \ + T3 (F##sw, 64, svint64_t, s64, int32_t) \ + T3 (F##sw, 64, svuint64_t, u64, int32_t) \ + T4 (F##sw, 64, svuint64_t, u64) \ + T3 (F##uh, 64, svint64_t, s64, uint16_t) \ + T3 (F##uh, 64, svuint64_t, u64, uint16_t) \ + T4 (F##uh, 32, svuint32_t, u32) \ + T4 (F##uh, 64, svuint64_t, u64) \ + T3 (F##uw, 64, svint64_t, s64, uint32_t) \ + T3 (F##uw, 64, svuint64_t, u64, uint32_t) \ + T4 (F##uw, 64, svuint64_t, u64) \ + +TEST (ldnt1) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 28 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 28 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_ext_gather_offset_restricted.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_ext_gather_offset_restricted.c new file mode 100644 index 00000000000..f6440246683 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_ext_gather_offset_restricted.c @@ -0,0 +1,65 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define T1(F, RTY, TY1, TY2) \ + RTY F##_f (const TY1 *base, TY2 offsets) \ + { \ + return sv##F (svpfalse_b (), base, offsets); \ + } + +#define T2(F, RTY, TY1) \ + RTY F##_f (TY1 bases, int64_t offset) \ + { \ + return sv##F (svpfalse_b (), bases, offset); \ + } + +#define T5(F, RTY, TY1) \ + RTY F##_f (TY1 bases) \ + { \ + return sv##F (svpfalse_b (), bases); \ + } + +#define T3(F, B, RTY, TY, TYPE) \ + T1 (F##_gather_u##B##offset_##TY, RTY, TYPE, svuint##B##_t) + +#define T4(F, B, RTY, TY) \ + T2 (F##_gather_##TY##base_offset_s##B, svint##B##_t, RTY) \ + T2 (F##_gather_##TY##base_offset_u##B, svuint##B##_t, RTY) \ + T5 (F##_gather_##TY##base_s##B, svint##B##_t, RTY) \ + T5 (F##_gather_##TY##base_u##B, svuint##B##_t, RTY) + +#define TEST(F) \ + T3 (F##sb, 32, svuint32_t, u32, int8_t) \ + T3 (F##sb, 64, svint64_t, s64, int8_t) \ + T3 (F##sb, 64, svuint64_t, u64, int8_t) \ + T4 (F##sb, 32, svuint32_t, u32) \ + T4 (F##sb, 64, svuint64_t, u64) \ + T3 (F##sh, 32, svuint32_t, u32, int16_t) \ + T3 (F##sh, 64, svint64_t, s64, int16_t) \ + T3 (F##sh, 64, svuint64_t, u64, int16_t) \ + T4 (F##sh, 32, svuint32_t, u32) \ + T4 (F##sh, 64, svuint64_t, u64) \ + T3 (F##sw, 64, svint64_t, s64, int32_t) \ + T3 (F##sw, 64, svuint64_t, u64, int32_t) \ + T4 (F##sw, 64, svuint64_t, u64) \ + T3 (F##ub, 32, svuint32_t, u32, uint8_t) \ + T3 (F##ub, 64, svint64_t, s64, uint8_t) \ + T3 (F##ub, 64, svuint64_t, u64, uint8_t) \ + T4 (F##ub, 32, svuint32_t, u32) \ + T4 (F##ub, 64, svuint64_t, u64) \ + T3 (F##uh, 32, svuint32_t, u32, uint16_t) \ + T3 (F##uh, 64, svint64_t, s64, uint16_t) \ + T3 (F##uh, 64, svuint64_t, u64, uint16_t) \ + T4 (F##uh, 32, svuint32_t, u32) \ + T4 (F##uh, 64, svuint64_t, u64) \ + T3 (F##uw, 64, svint64_t, s64, uint32_t) \ + T3 (F##uw, 64, svuint64_t, u64, uint32_t) \ + T4 (F##uw, 64, svuint64_t, u64) \ + +TEST (ldnt1) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 56 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 56 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_gather_sv_restricted.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_gather_sv_restricted.c new file mode 100644 index 00000000000..a48a8a9db51 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_gather_sv_restricted.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define T2(F, RTY, TY1, TY2) \ + RTY F##_f (TY1 *base, TY2 values) \ + { \ + return sv##F (svpfalse_b (), base, values); \ + } + +#define T4(F, TY, B) \ + T2 (F##_f##B, svfloat##B##_t, float##B##_t, TY) \ + T2 (F##_s##B, svint##B##_t, int##B##_t, TY) \ + T2 (F##_u##B, svuint##B##_t, uint##B##_t, TY) \ + +#define SD_DATA(F) \ + T4 (F##_gather_s64index, svint64_t, 64) \ + T4 (F##_gather_u64index, svuint64_t, 64) \ + T4 (F##_gather_u32offset, svuint32_t, 32) \ + T4 (F##_gather_u64offset, svuint64_t, 64) \ + T4 (F##_gather_s64offset, svint64_t, 64) \ + +SD_DATA (ldnt1) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 15 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 15 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_gather_vs.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_gather_vs.c new file mode 100644 index 00000000000..1fc08a3c53b --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_gather_vs.c @@ -0,0 +1,36 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define T1(F, RTY, TY) \ + RTY F##_f (TY bases) \ + { \ + return sv##F (svpfalse_b (), bases); \ + } + +#define T2(F, RTY, TY) \ + RTY F##_f (TY bases, int64_t value) \ + { \ + return sv##F (svpfalse_b (), bases, value); \ + } + +#define T4(F, TY, TEST, B) \ + TEST (F##_f##B, svfloat##B##_t, TY) \ + TEST (F##_s##B, svint##B##_t, TY) \ + TEST (F##_u##B, svuint##B##_t, TY) \ + +#define T3(F, B) \ + T4 (F##_gather_u##B##base, svuint##B##_t, T1, B) \ + T4 (F##_gather_u##B##base_offset, svuint##B##_t, T2, B) \ + T4 (F##_gather_u##B##base_index, svuint##B##_t, T2, B) + +#define SD_DATA(F) \ + T3 (F, 32) \ + T3 (F, 64) + +SD_DATA (ldnt1) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 18 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 18 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-shift_left_imm_to_uint.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-shift_left_imm_to_uint.c new file mode 100644 index 00000000000..bd2c9371e65 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-shift_left_imm_to_uint.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define MXZ(F, RTY, TY) \ + RTY F##_f (TY op1) \ + { \ + return sv##F (svpfalse_b (), op1, 2); \ + } + +#define PRED_MXZn(F, RTY, TYPE, TY) \ + MXZ (F##_n_##TY##_m, RTY, TYPE) \ + MXZ (F##_n_##TY##_x, RTY, TYPE) \ + MXZ (F##_n_##TY##_z, RTY, TYPE) + +#define ALL_SIGNED_IMM(F, P) \ + PRED_##P (F, svuint8_t, svint8_t, s8) \ + PRED_##P (F, svuint16_t, svint16_t, s16) \ + PRED_##P (F, svuint32_t, svint32_t, s32) \ + PRED_##P (F, svuint64_t, svint64_t, s64) + +ALL_SIGNED_IMM (qshlu, MXZn) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 4 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 8 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 12 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-shift_right_imm.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-shift_right_imm.c new file mode 100644 index 00000000000..f4994de4c80 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-shift_right_imm.c @@ -0,0 +1,32 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define MXZ(F, TY) \ + TY F##_f (TY op1) \ + { \ + return sv##F (svpfalse_b (), op1, 2); \ + } + +#define PRED_MXZn(F, TYPE, TY) \ + MXZ (F##_n_##TY##_m, TYPE) \ + MXZ (F##_n_##TY##_x, TYPE) \ + MXZ (F##_n_##TY##_z, TYPE) + +#define ALL_INTEGER_IMM(F, P) \ + PRED_##P (F, svuint8_t, u8) \ + PRED_##P (F, svuint16_t, u16) \ + PRED_##P (F, svuint32_t, u32) \ + PRED_##P (F, svuint64_t, u64) \ + PRED_##P (F, svint8_t, s8) \ + PRED_##P (F, svint16_t, s16) \ + PRED_##P (F, svint32_t, s32) \ + PRED_##P (F, svint64_t, s64) + +ALL_INTEGER_IMM (rshr, MXZn) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 8 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 16 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 24 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-store_scatter_index_restricted.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-store_scatter_index_restricted.c new file mode 100644 index 00000000000..6bec3b34ece --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-store_scatter_index_restricted.c @@ -0,0 +1,50 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define T1(F, TY1, TY2, TY3) \ + void F##_f (TY1 *base, TY2 indices, TY3 data) \ + { \ + sv##F (svpfalse_b (), base, indices, data); \ + } + +#define T2(F, TY1, TY3) \ + void F##_f (TY1 bases, int64_t index, TY3 data) \ + { \ + sv##F (svpfalse_b (), bases, index, data); \ + } + +#define T3(F, B, TYPE1, TY, TYPE2) \ + T1 (F##_scatter_s##B##index_##TY, TYPE2, svint##B##_t, TYPE1) \ + T1 (F##_scatter_u##B##index_##TY, TYPE2, svuint##B##_t, TYPE1) + +#define T4(F, B, TYPE1, TY) \ + T2 (F##_scatter_u##B##base_index_##TY, svuint##B##_t, TYPE1) + +#define TEST(F) \ + T3 (F##h, 64, svint64_t, s64, int16_t) \ + T3 (F##h, 64, svuint64_t, u64, int16_t) \ + T4 (F##h, 32, svuint32_t, u32) \ + T4 (F##h, 64, svuint64_t, u64) \ + T3 (F##w, 64, svint64_t, s64, int32_t) \ + T3 (F##w, 64, svuint64_t, u64, int32_t) \ + T4 (F##w, 64, svuint64_t, u64) \ + +#define SD_DATA(F) \ + T3 (F, 64, svfloat64_t, f64, float64_t) \ + T3 (F, 64, svint64_t, s64, int64_t) \ + T3 (F, 64, svuint64_t, u64, uint64_t) \ + T4 (F, 32, svfloat32_t, f32) \ + T4 (F, 32, svint32_t, s32) \ + T4 (F, 32, svuint32_t, u32) \ + T4 (F, 64, svfloat64_t, f64) \ + T4 (F, 64, svint64_t, s64) \ + T4 (F, 64, svuint64_t, u64) \ + +TEST (stnt1) +SD_DATA (stnt1) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 23 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 23 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-store_scatter_offset_restricted.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-store_scatter_offset_restricted.c new file mode 100644 index 00000000000..bcb4a148d62 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-store_scatter_offset_restricted.c @@ -0,0 +1,65 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define T1(F, TY1, TY2, TY3) \ + void F##_f (TY1 *base, TY2 offsets, TY3 data) \ + { \ + sv##F (svpfalse_b (), base, offsets, data); \ + } + +#define T2(F, TY1, TY3) \ + void F##_f (TY1 bases, int64_t offset, TY3 data) \ + { \ + sv##F (svpfalse_b (), bases, offset, data); \ + } + +#define T5(F, TY1, TY3) \ + void F##_f (TY1 bases, TY3 data) \ + { \ + sv##F (svpfalse_b (), bases, data); \ + } + + +#define T3(F, B, TYPE1, TY, TYPE2) \ + T1 (F##_scatter_u##B##offset_##TY, TYPE2, svuint##B##_t, TYPE1) + +#define T4(F, B, TYPE1, TY) \ + T2 (F##_scatter_u##B##base_offset_##TY, svuint##B##_t, TYPE1) \ + T5 (F##_scatter_u##B##base_##TY, svuint##B##_t, TYPE1) + +#define D_INTEGER(F, BHW) \ + T3 (F, 64, svint64_t, s64, int##BHW##_t) \ + T3 (F, 64, svuint64_t, u64, int##BHW##_t) \ + T4 (F, 64, svint64_t, s64) \ + T4 (F, 64, svuint64_t, u64) + +#define SD_INTEGER(F, BHW) \ + D_INTEGER (F, BHW) \ + T3 (F, 32, svuint32_t, u32, int##BHW##_t) \ + T4 (F, 32, svint32_t, s32) \ + T4 (F, 32, svuint32_t, u32) + +#define SD_DATA(F) \ + T3 (F, 64, svint64_t, s64, int64_t) \ + T4 (F, 32, svint32_t, s32) \ + T4 (F, 64, svint64_t, s64) \ + T3 (F, 32, svuint32_t, u32, uint32_t) \ + T3 (F, 64, svuint64_t, u64, uint64_t) \ + T4 (F, 32, svuint32_t, u32) \ + T4 (F, 64, svuint64_t, u64) \ + T3 (F, 32, svfloat32_t, f32, float32_t) \ + T3 (F, 64, svfloat64_t, f64, float64_t) \ + T4 (F, 32, svfloat32_t, f32) \ + T4 (F, 64, svfloat64_t, f64) + + +SD_DATA (stnt1) +SD_INTEGER (stnt1b, 8) +SD_INTEGER (stnt1h, 16) +D_INTEGER (stnt1w, 32) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 45 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 45 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary.c new file mode 100644 index 00000000000..ba7e931f058 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary.c @@ -0,0 +1,32 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2 -march=armv9.2-a+sve+sme" } */ + +#include "../pfalse-unary_0.h" + +ALL_SIGNED (qabs, MXZ) +ALL_SIGNED (qneg, MXZ) +S_UNSIGNED (recpe, MXZ) +S_UNSIGNED (rsqrte, MXZ) + +#undef M +#define M(F, RTY, TY) \ + __arm_streaming \ + RTY F##_f (RTY inactive, TY op) \ + { \ + return sv##F (inactive, svpfalse_b (), op); \ + } + +#undef XZI +#define XZI(F, RTY, TY) \ + __arm_streaming \ + RTY F##_f (TY op) \ + { \ + return sv##F (svpfalse_b (), op); \ + } + +ALL_DATA (revd, MXZ) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 22 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 44 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 66 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary_convert.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary_convert.c new file mode 100644 index 00000000000..7aa59ff866f --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary_convert.c @@ -0,0 +1,31 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define XZ(F, TYPE1, TY1, TYPE2, TY2, P) \ + TYPE1 F##_##TY1##_##TY2##_##P##_f (TYPE2 op) \ + { \ + return sv##F##_##TY1##_##TY2##_##P (svpfalse_b (), op); \ + } \ + +#define M(F, TYPE1, TY1, TYPE2, TY2) \ + TYPE1 F##_##TY1##_##TY2##_m_f (TYPE1 inactive, TYPE2 op) \ + { \ + return sv##F##_##TY1##_##TY2##_m (inactive, svpfalse_b (), op); \ + } + +M (cvtx, svfloat32_t, f32, svfloat64_t, f64) +XZ (cvtx, svfloat32_t, f32, svfloat64_t, f64, x) +XZ (cvtx, svfloat32_t, f32, svfloat64_t, f64, z) +M (cvtlt, svfloat32_t, f32, svfloat16_t, f16) +XZ (cvtlt, svfloat32_t, f32, svfloat16_t, f16, x) +M (cvtlt, svfloat64_t, f64, svfloat32_t, f32) +XZ (cvtlt, svfloat64_t, f64, svfloat32_t, f32, x) + + + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 3 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 4 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 7 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary_convert_narrowt.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary_convert_narrowt.c new file mode 100644 index 00000000000..1a4525cc769 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary_convert_narrowt.c @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define T(F, TYPE1, TY1, TYPE2, TY2) \ + TYPE1 F##_##TY1##_##TY2##_x_f (TYPE1 even, TYPE2 op) \ + { \ + return sv##F##_##TY1##_##TY2##_x (even, svpfalse_b (), op); \ + } \ + TYPE1 F##_##TY1##_##TY2##_m_f (TYPE1 even, TYPE2 op) \ + { \ + return sv##F##_##TY1##_##TY2##_m (even, svpfalse_b (), op); \ + } + +T (cvtnt, svfloat16_t, f16, svfloat32_t, f32) +T (cvtnt, svfloat32_t, f32, svfloat64_t, f64) +T (cvtxnt, svfloat32_t, f32, svfloat64_t, f64) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 3 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?: [0-9]*[bhsd])?, #?0\n\tret\n} 3 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 6 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary_to_int.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary_to_int.c new file mode 100644 index 00000000000..b64bfc31944 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary_to_int.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target elf } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-unary_0.h" + +ALL_FLOAT_INT (logb, MXZ) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 3} } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 6 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 9 } } */
Ping. Thanks, Jennifer > On 20 Nov 2024, at 20:22, Jennifer Schmitz <jschmitz@nvidia.com> wrote: > > > >> On 19 Nov 2024, at 13:00, Richard Sandiford <richard.sandiford@arm.com> wrote: >> >> External email: Use caution opening links or attachments >> >> >> Jennifer Schmitz <jschmitz@nvidia.com> writes: >>> If an SVE intrinsic has predicate pfalse, we can fold the call to >>> a simplified assignment statement: For _m predication, the LHS can be assigned >>> the operand for inactive values and for _z, we can assign a zero vector. >>> For _x, the returned values can be arbitrary and as suggested by >>> Richard Sandiford, we fold to a zero vector. >>> >>> For example, >>> svint32_t foo (svint32_t op1, svint32_t op2) >>> { >>> return svadd_s32_m (svpfalse_b (), op1, op2); >>> } >>> can be folded to lhs = op1, such that foo is compiled to just a RET. >>> >>> For implicit predication, a case distinction is necessary: >>> Intrinsics that read from memory can be folded to a zero vector. >>> Intrinsics that write to memory or prefetch can be folded to a no-op. >>> Other intrinsics need case-by-case implemenation, which we added in >>> the corresponding svxxx_impl::fold. >>> >>> We implemented this optimization during gimple folding by calling a new method >>> gimple_folder::fold_pfalse from gimple_folder::fold, which covers the generic >>> cases described above. >>> >>> We tested the new behavior for each intrinsic with all supported predications >>> and data types and checked the produced assembly. There is a test file >>> for each shape subclass with scan-assembler-times tests that look for >>> the simplified instruction sequences, such as individual RET instructions >>> or zeroing moves. There is an additional directive counting the total number of >>> functions in the test, which must be the sum of counts of all other >>> directives. This is to check that all tested intrinsics were optimized. >>> >>> Some few intrinsics were not covered by this patch: >>> - svlasta and svlastb already have an implementation to cover a pfalse >>> predicate. No changes were made to them. >>> - svld1/2/3/4 return aggregate types and were excluded from the case >>> that folds calls with implicit predication to lhs = {0, ...}. >>> - svst1/2/3/4 already have an implementation in svstx_impl that precedes >>> our optimization, such that it is not triggered. >>> >>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression. >>> OK for mainline? >>> >>> Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com> >> >> Thanks, looks really good. > Thanks for the review and positive feedback on the patch! > Below are the responses to your comments and the revised patch. It was re-tested, no regression. > Best, > Jennifer >> >>> [...] >>> @@ -2222,6 +2435,14 @@ class svpfirst_svpnext_impl : public function_base >>> { >>> public: >>> CONSTEXPR svpfirst_svpnext_impl (int unspec) : m_unspec (unspec) {} >>> + gimple * >>> + fold (gimple_folder &f) const override >>> + { >>> + tree pg = gimple_call_arg (f.call, 0); >>> + if (is_pfalse (pg)) >>> + return f.fold_call_to (pg); >>> + return NULL; >>> + } >> >> I think this should instead return the second argument for UNSPEC_PFIRST. > Sorry about that and thanks for noticing it! I added the case distinction below. The test (sve/pfalse-unary.c) was adjusted accordingly (one more count for RET and one less for PFALSE + RET): > > tree pg = gimple_call_arg (f.call, 0); > if (is_pfalse (pg)) > - return f.fold_call_to (pg); > + return f.fold_call_to (m_unspec == UNSPEC_PFIRST > + ? gimple_call_arg (f.call, 1) > + : pg); > return NULL; > >> >>> [...] >>> @@ -556,6 +576,24 @@ public: >>> } >>> }; >>> >>> +class svqshlu_impl : public unspec_based_function >>> +{ >>> +public: >>> + CONSTEXPR svqshlu_impl () >>> + : unspec_based_function (UNSPEC_SQSHLU, -1, -1) {} >>> + gimple * >>> + fold (gimple_folder &f) const override >>> + { >>> + if (is_pfalse (gimple_call_arg (f.call, 0))) >>> + { >>> + tree rhs = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (f.lhs), >>> + gimple_call_arg (f.call, 1)); >>> + return gimple_build_assign (f.lhs, VIEW_CONVERT_EXPR, rhs); >>> + } >>> + return NULL; >>> + } >>> +}; >>> + >> >> Could we handle this in the generic code instead? (More below.) > Yes, see below. >> >>> [...] >>> @@ -3622,6 +3631,51 @@ gimple_folder::redirect_pred_x () >>> return redirect_call (instance); >>> } >>> >>> +/* Fold calls with predicate pfalse: >>> + _m predication: lhs = op1. >>> + _x or _z: lhs = {0, ...}. >>> + Implicit predication that reads from memory: lhs = {0, ...}. >>> + Implicit predication that writes to memory or prefetches: no-op. >>> + Return the new gimple statement on success, else NULL. */ >>> +gimple * >>> +gimple_folder::fold_pfalse () >>> +{ >>> + if (pred == PRED_none) >>> + return nullptr; >>> + tree arg0 = gimple_call_arg (call, 0); >>> + if (pred == PRED_m) >>> + { >>> + /* Unary function shapes with _m predication are folded to the >>> + inactive vector (arg0), while other function shapes are folded >>> + to op1 (arg1). >>> + In some intrinsics, e.g. svqshlu, lhs and op1 have different types, >>> + such that folding to op1 is not appropriate. */ >>> + tree arg1 = gimple_call_arg (call, 1); >>> + tree t; >>> + if (is_pfalse (arg1)) >>> + t = arg0; >>> + else if (is_pfalse (arg0)) >>> + t = arg1; >>> + else return nullptr; >> >> Formatting nit: the return should be on its own line. > Done. >> >>> + if (TREE_TYPE (t) == TREE_TYPE (lhs)) >>> + return fold_call_to (t); >> >> Going back to the question above, is there a case that this type check >> has to reject for correctness reasons? (I should try to work this out >> for myself, sorry, but I'm guessing you already have an example.) > The check was added because svqshlu has a signed op1 but returns an unsigned type. Without the view_convert, it ICEs with a non-trivial conversion error. But I added the conditional type conversion in the generic code (diff to previous patch version) and reverted svqshlu_impl: > > + /* In some intrinsics, e.g. svqshlu, lhs and op1 have different types, > + such that folding to op1 needs a type conversion. */ > if (TREE_TYPE (t) == TREE_TYPE (lhs)) > return fold_call_to (t); > + else > + { > + tree rhs = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), t); > + return gimple_build_assign (lhs, VIEW_CONVERT_EXPR, rhs); > + } > >> >>> + } >>> + if ((pred == PRED_x || pred == PRED_z) && is_pfalse (arg0)) >>> + return fold_call_to (build_zero_cst (TREE_TYPE (lhs))); >>> + if (pred == PRED_implicit && is_pfalse (arg0)) >>> + { >>> + unsigned int flags = call_properties (); >>> + /* Folding to lhs = {0, ...} is not appropriate for intrinsics with >>> + AGGREGATE types as lhs. */ >> >> We could probably fold to a __builtin_memset call instead, but yeah, >> that's probably best left as future work. > Ok, I made no changes here. >> >>> + if ((flags & CP_READ_MEMORY) >>> + && !AGGREGATE_TYPE_P (TREE_TYPE (lhs))) >>> + return fold_call_to (build_zero_cst (TREE_TYPE (lhs))); >>> + if (flags & (CP_WRITE_MEMORY | CP_PREFETCH_MEMORY)) >>> + return fold_to_stmt_vops (gimple_build_nop ()); >>> + } >>> + return nullptr; >>> +} >>> + >>> /* Fold the call to constant VAL. */ >>> gimple * >>> gimple_folder::fold_to_cstu (poly_uint64 val) >>> [...] >>> diff --git a/gcc/testsuite/gcc.target/aarch64/pfalse-binary_0.c b/gcc/testsuite/gcc.target/aarch64/pfalse-binary_0.c >>> new file mode 100644 >>> index 00000000000..72fb5b6ac6d >>> --- /dev/null >>> +++ b/gcc/testsuite/gcc.target/aarch64/pfalse-binary_0.c >> >> It looks like this and pfalse-unary_0.c are more like header files >> than standalone tests. It would probably be better to turn them >> into .h files if so. > Done. >> >>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary.c >>> new file mode 100644 >>> index 00000000000..c1779818fc5 >>> --- /dev/null >>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary.c >>> @@ -0,0 +1,13 @@ >>> +/* { dg-do compile } */ >>> +/* { dg-options "-O2" } */ >>> + >>> +#include "../pfalse-binary_0.c" >>> + >>> +B (brkn, Zv) >>> +B (brkpa, Zv) >>> +B (brkpb, Zv) >>> +ALL_DATA (splice, IMPLICITv) >>> + >>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tpfalse\tp0\.b\n\tret\n} 3 } } */ >>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmov\tz0\.d, z1\.d\n\tret\n} 12 } } */ >>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 15 } } */ >> >> Nice approach! However, I think ".cfi_startproc" is specific to ELF >> and so will fail on Darwin. Rather than try to work with both, it would >> probably be easier to add: >> >> /* { dg-require-effective-target elf } */ >> >> after the dg-do line. Same for the other tests. > Done. >> >>> [...] >>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-reduction.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-reduction.c >>> new file mode 100644 >>> index 00000000000..9e74b0d1a35 >>> --- /dev/null >>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-reduction.c >>> @@ -0,0 +1,18 @@ >>> +/* { dg-do compile } */ >>> +/* { dg-options "-O2 -fdump-tree-optimized" } */ >>> + >>> +#include "../pfalse-unary_0.c" >>> + >>> +ALL_INTEGER_SCALAR (andv, IMPLICIT) >>> +ALL_INTEGER_SCALAR (eorv, IMPLICIT) >>> +ALL_ARITH_SCALAR (maxv, IMPLICIT) >>> +ALL_ARITH_SCALAR (minv, IMPLICIT) >>> +ALL_INTEGER_SCALAR (orv, IMPLICIT) >>> +ALL_FLOAT_SCALAR (maxnmv, IMPLICIT) >>> +ALL_FLOAT_SCALAR (minnmv, IMPLICIT) >>> + >>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmov\t[wx]0, 0\n\tret\n} 20 } } */ >>> +/* { dg-final { scan-tree-dump-times "return Nan" 6 "optimized" } } */ >>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmov\t[wxz]0(?:\.[sd])?, #?-?[1-9]+[0-9]*\n\tret\n} 27 } } */ >> >> This is quite a liberal regexp. Could we add one that matches specifically >> -1, for the ANDV and UMINV cases? It could be in addition to this one, >> to avoid complicating things by excluding -1 above. > Done. I added a comment at the end of the test explaining the extra 12 counts. > >> >>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\t(movi|mvni)\tv0\.(2s|4h), 0x[0-9a-f]+, [a-z]+ [0-9]+\n\tret\n} 5 } } */ >>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 52 } } */ >>> [...] >>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-store.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-store.c >>> new file mode 100644 >>> index 00000000000..3da28712988 >>> --- /dev/null >>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-store.c >>> @@ -0,0 +1,50 @@ >>> +/* { dg-do compile } */ >>> +/* { dg-options "-O2" } */ >>> + >>> +#include <arm_sve.h> >>> + >>> +#define T(F, TY1, TY2) \ >>> + void F##_f (TY1 *base, sv##TY2 data) \ >>> + { \ >>> + return sv##F (svpfalse_b (), base, data); \ >>> + } >>> + >>> +#define D_INTEGER(F, TY) \ >>> + T (F##_s64, TY, int64_t) \ >>> + T (F##_u64, TY, uint64_t) >>> + >>> +#define SD_INTEGER(F, TY) \ >>> + D_INTEGER (F, TY) \ >>> + T (F##_s32, TY, int32_t) \ >>> + T (F##_u32, TY, uint32_t) >>> + >>> +#define HSD_INTEGER(F, TY) \ >>> + SD_INTEGER (F, TY) \ >>> + T (F##_s16, TY, int16_t) \ >>> + T (F##_u16, TY, uint16_t) >>> + >>> +#define ALL_DATA(F, A) \ >>> + T (F##_bf16, bfloat16_t, bfloat16##A) \ >>> + T (F##_f16, float16_t, float16##A) \ >>> + T (F##_f32, float32_t, float32##A) \ >>> + T (F##_f64, float64_t, float64##A) \ >>> + T (F##_s8, int8_t, int8##A) \ >>> + T (F##_s16, int16_t, int16##A) \ >>> + T (F##_s32, int32_t, int32##A) \ >>> + T (F##_s64, int64_t, int64##A) \ >>> + T (F##_u8, uint8_t, uint8##A) \ >>> + T (F##_u16, uint16_t, uint16##A) \ >>> + T (F##_u32, uint32_t, uint32##A) \ >>> + T (F##_u64, uint64_t, uint64##A) \ >>> + >>> +HSD_INTEGER (st1b, int8_t) >>> +SD_INTEGER (st1h, int16_t) >>> +D_INTEGER (st1w, int32_t) >>> +ALL_DATA (st1, _t) >>> +ALL_DATA (st2, x2_t) >>> +ALL_DATA (st3, x3_t) >>> +ALL_DATA (st4, x4_t) >>> + >>> + >>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 12 } } */ >>> +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 60 } } */ >> >> What happens for the other 48? I assume they're for ST[234]? > Yes, that’s right. The tests are already there (ALL_DATA (stX,…), but they are not folded yet. >> >> The test above is OK if we can't optimise those properly yet, but it would >> be good to add a FIXME comment above the first regexp to explain that >> it ought to be 60 eventually. > Done. >> >> Thanks, >> Richard > > If an SVE intrinsic has predicate pfalse, we can fold the call to > a simplified assignment statement: For _m predication, the LHS can be assigned > the operand for inactive values and for _z, we can assign a zero vector. > For _x, the returned values can be arbitrary and as suggested by > Richard Sandiford, we fold to a zero vector. > > For example, > svint32_t foo (svint32_t op1, svint32_t op2) > { > return svadd_s32_m (svpfalse_b (), op1, op2); > } > can be folded to lhs = op1, such that foo is compiled to just a RET. > > For implicit predication, a case distinction is necessary: > Intrinsics that read from memory can be folded to a zero vector. > Intrinsics that write to memory or prefetch can be folded to a no-op. > Other intrinsics need case-by-case implemenation, which we added in > the corresponding svxxx_impl::fold. > > We implemented this optimization during gimple folding by calling a new method > gimple_folder::fold_pfalse from gimple_folder::fold, which covers the generic > cases described above. > > We tested the new behavior for each intrinsic with all supported predications > and data types and checked the produced assembly. There is a test file > for each shape subclass with scan-assembler-times tests that look for > the simplified instruction sequences, such as individual RET instructions > or zeroing moves. There is an additional directive counting the total number of > functions in the test, which must be the sum of counts of all other > directives. This is to check that all tested intrinsics were optimized. > > Some few intrinsics were not covered by this patch: > - svlasta and svlastb already have an implementation to cover a pfalse > predicate. No changes were made to them. > - svld1/2/3/4 return aggregate types and were excluded from the case > that folds calls with implicit predication to lhs = {0, ...}. > - svst1/2/3/4 already have an implementation in svstx_impl that precedes > our optimization, such that it is not triggered. > > The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression. > OK for mainline? > > Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com> > > gcc/ChangeLog: > > PR target/106329 > * config/aarch64/aarch64-sve-builtins-base.cc > (svac_impl::fold): Add folding if pfalse predicate. > (svadda_impl::fold): Likewise. > (class svaddv_impl): Likewise. > (class svandv_impl): Likewise. > (svclast_impl::fold): Likewise. > (svcmp_impl::fold): Likewise. > (svcmp_wide_impl::fold): Likewise. > (svcmpuo_impl::fold): Likewise. > (svcntp_impl::fold): Likewise. > (class svcompact_impl): Likewise. > (class svcvtnt_impl): Likewise. > (class sveorv_impl): Likewise. > (class svminv_impl): Likewise. > (class svmaxnmv_impl): Likewise. > (class svmaxv_impl): Likewise. > (class svminnmv_impl): Likewise. > (class svorv_impl): Likewise. > (svpfirst_svpnext_impl::fold): Likewise. > (svptest_impl::fold): Likewise. > (class svsplice_impl): Likewise. > * config/aarch64/aarch64-sve-builtins-sve2.cc > (class svcvtxnt_impl): Likewise. > (svmatch_svnmatch_impl::fold): Likewise. > * config/aarch64/aarch64-sve-builtins.cc > (is_pfalse): Return true if tree is pfalse. > (gimple_folder::fold_pfalse): Fold calls with pfalse predicate. > (gimple_folder::fold_call_to): Fold call to lhs = t for given tree t. > (gimple_folder::fold_to_stmt_vops): Helper function that folds the > call to given stmt and adjusts virtual operands. > (gimple_folder::fold): Call fold_pfalse. > * config/aarch64/aarch64-sve-builtins.h (is_pfalse): Declare is_pfalse. > > gcc/testsuite/ChangeLog: > > PR target/106329 > * gcc.target/aarch64/pfalse-binary_0.c: New test. > * gcc.target/aarch64/pfalse-unary_0.c: New test. > * gcc.target/aarch64/sve/pfalse-binary.c: New test. > * gcc.target/aarch64/sve/pfalse-binary_int_opt_n.c: New test. > * gcc.target/aarch64/sve/pfalse-binary_opt_n.c: New test. > * gcc.target/aarch64/sve/pfalse-binary_opt_single_n.c: New test. > * gcc.target/aarch64/sve/pfalse-binary_rotate.c: New test. > * gcc.target/aarch64/sve/pfalse-binary_uint64_opt_n.c: New test. > * gcc.target/aarch64/sve/pfalse-binary_uint_opt_n.c: New test. > * gcc.target/aarch64/sve/pfalse-binaryxn.c: New test. > * gcc.target/aarch64/sve/pfalse-clast.c: New test. > * gcc.target/aarch64/sve/pfalse-compare_opt_n.c: New test. > * gcc.target/aarch64/sve/pfalse-compare_wide_opt_n.c: New test. > * gcc.target/aarch64/sve/pfalse-count_pred.c: New test. > * gcc.target/aarch64/sve/pfalse-fold_left.c: New test. > * gcc.target/aarch64/sve/pfalse-load.c: New test. > * gcc.target/aarch64/sve/pfalse-load_ext.c: New test. > * gcc.target/aarch64/sve/pfalse-load_ext_gather_index.c: New test. > * gcc.target/aarch64/sve/pfalse-load_ext_gather_offset.c: New test. > * gcc.target/aarch64/sve/pfalse-load_gather_sv.c: New test. > * gcc.target/aarch64/sve/pfalse-load_gather_vs.c: New test. > * gcc.target/aarch64/sve/pfalse-load_replicate.c: New test. > * gcc.target/aarch64/sve/pfalse-prefetch.c: New test. > * gcc.target/aarch64/sve/pfalse-prefetch_gather_index.c: New test. > * gcc.target/aarch64/sve/pfalse-prefetch_gather_offset.c: New test. > * gcc.target/aarch64/sve/pfalse-ptest.c: New test. > * gcc.target/aarch64/sve/pfalse-rdffr.c: New test. > * gcc.target/aarch64/sve/pfalse-reduction.c: New test. > * gcc.target/aarch64/sve/pfalse-reduction_wide.c: New test. > * gcc.target/aarch64/sve/pfalse-shift_right_imm.c: New test. > * gcc.target/aarch64/sve/pfalse-store.c: New test. > * gcc.target/aarch64/sve/pfalse-store_scatter_index.c: New test. > * gcc.target/aarch64/sve/pfalse-store_scatter_offset.c: New test. > * gcc.target/aarch64/sve/pfalse-storexn.c: New test. > * gcc.target/aarch64/sve/pfalse-ternary_opt_n.c: New test. > * gcc.target/aarch64/sve/pfalse-ternary_rotate.c: New test. > * gcc.target/aarch64/sve/pfalse-unary.c: New test. > * gcc.target/aarch64/sve/pfalse-unary_convert_narrowt.c: New test. > * gcc.target/aarch64/sve/pfalse-unary_convertxn.c: New test. > * gcc.target/aarch64/sve/pfalse-unary_n.c: New test. > * gcc.target/aarch64/sve/pfalse-unary_pred.c: New test. > * gcc.target/aarch64/sve/pfalse-unary_to_uint.c: New test. > * gcc.target/aarch64/sve/pfalse-unaryxn.c: New test. > * gcc.target/aarch64/sve2/pfalse-binary.c: New test. > * gcc.target/aarch64/sve2/pfalse-binary_int_opt_n.c: New test. > * gcc.target/aarch64/sve2/pfalse-binary_int_opt_single_n.c: New test. > * gcc.target/aarch64/sve2/pfalse-binary_opt_n.c: New test. > * gcc.target/aarch64/sve2/pfalse-binary_opt_single_n.c: New test. > * gcc.target/aarch64/sve2/pfalse-binary_to_uint.c: New test. > * gcc.target/aarch64/sve2/pfalse-binary_uint_opt_n.c: New test. > * gcc.target/aarch64/sve2/pfalse-binary_wide.c: New test. > * gcc.target/aarch64/sve2/pfalse-compare.c: New test. > * gcc.target/aarch64/sve2/pfalse-load_ext_gather_index_restricted.c: > New test. > * gcc.target/aarch64/sve2/pfalse-load_ext_gather_offset_restricted.c: > New test. > * gcc.target/aarch64/sve2/pfalse-load_gather_sv_restricted.c: New test. > * gcc.target/aarch64/sve2/pfalse-load_gather_vs.c: New test. > * gcc.target/aarch64/sve2/pfalse-shift_left_imm_to_uint.c: New test. > * gcc.target/aarch64/sve2/pfalse-shift_right_imm.c: New test. > * gcc.target/aarch64/sve2/pfalse-store_scatter_index_restricted.c: > New test. > * gcc.target/aarch64/sve2/pfalse-store_scatter_offset_restricted.c: > New test. > * gcc.target/aarch64/sve2/pfalse-unary.c: New test. > * gcc.target/aarch64/sve2/pfalse-unary_convert.c: New test. > * gcc.target/aarch64/sve2/pfalse-unary_convert_narrowt.c: New test. > * gcc.target/aarch64/sve2/pfalse-unary_to_int.c: New test. > --- > .../aarch64/aarch64-sve-builtins-base.cc | 268 +++++++++++++++++- > .../aarch64/aarch64-sve-builtins-sve2.cc | 22 +- > gcc/config/aarch64/aarch64-sve-builtins.cc | 79 ++++++ > gcc/config/aarch64/aarch64-sve-builtins.h | 4 + > .../gcc.target/aarch64/pfalse-binary_0.h | 240 ++++++++++++++++ > .../gcc.target/aarch64/pfalse-unary_0.h | 195 +++++++++++++ > .../gcc.target/aarch64/sve/pfalse-binary.c | 14 + > .../aarch64/sve/pfalse-binary_int_opt_n.c | 11 + > .../aarch64/sve/pfalse-binary_opt_n.c | 31 ++ > .../aarch64/sve/pfalse-binary_opt_single_n.c | 14 + > .../aarch64/sve/pfalse-binary_rotate.c | 27 ++ > .../aarch64/sve/pfalse-binary_uint64_opt_n.c | 13 + > .../aarch64/sve/pfalse-binary_uint_opt_n.c | 13 + > .../gcc.target/aarch64/sve/pfalse-binaryxn.c | 31 ++ > .../gcc.target/aarch64/sve/pfalse-clast.c | 11 + > .../aarch64/sve/pfalse-compare_opt_n.c | 20 ++ > .../aarch64/sve/pfalse-compare_wide_opt_n.c | 15 + > .../aarch64/sve/pfalse-count_pred.c | 10 + > .../gcc.target/aarch64/sve/pfalse-fold_left.c | 10 + > .../gcc.target/aarch64/sve/pfalse-load.c | 32 +++ > .../gcc.target/aarch64/sve/pfalse-load_ext.c | 40 +++ > .../sve/pfalse-load_ext_gather_index.c | 51 ++++ > .../sve/pfalse-load_ext_gather_offset.c | 71 +++++ > .../aarch64/sve/pfalse-load_gather_sv.c | 32 +++ > .../aarch64/sve/pfalse-load_gather_vs.c | 37 +++ > .../aarch64/sve/pfalse-load_replicate.c | 31 ++ > .../gcc.target/aarch64/sve/pfalse-prefetch.c | 22 ++ > .../sve/pfalse-prefetch_gather_index.c | 27 ++ > .../sve/pfalse-prefetch_gather_offset.c | 25 ++ > .../gcc.target/aarch64/sve/pfalse-ptest.c | 12 + > .../gcc.target/aarch64/sve/pfalse-rdffr.c | 13 + > .../gcc.target/aarch64/sve/pfalse-reduction.c | 23 ++ > .../aarch64/sve/pfalse-reduction_wide.c | 10 + > .../aarch64/sve/pfalse-shift_right_imm.c | 28 ++ > .../gcc.target/aarch64/sve/pfalse-store.c | 53 ++++ > .../aarch64/sve/pfalse-store_scatter_index.c | 40 +++ > .../aarch64/sve/pfalse-store_scatter_offset.c | 68 +++++ > .../gcc.target/aarch64/sve/pfalse-storexn.c | 30 ++ > .../aarch64/sve/pfalse-ternary_opt_n.c | 51 ++++ > .../aarch64/sve/pfalse-ternary_rotate.c | 27 ++ > .../gcc.target/aarch64/sve/pfalse-unary.c | 33 +++ > .../sve/pfalse-unary_convert_narrowt.c | 21 ++ > .../aarch64/sve/pfalse-unary_convertxn.c | 50 ++++ > .../gcc.target/aarch64/sve/pfalse-unary_n.c | 43 +++ > .../aarch64/sve/pfalse-unary_pred.c | 10 + > .../aarch64/sve/pfalse-unary_to_uint.c | 13 + > .../gcc.target/aarch64/sve/pfalse-unaryxn.c | 14 + > .../gcc.target/aarch64/sve2/pfalse-binary.c | 15 + > .../aarch64/sve2/pfalse-binary_int_opt_n.c | 13 + > .../sve2/pfalse-binary_int_opt_single_n.c | 11 + > .../aarch64/sve2/pfalse-binary_opt_n.c | 17 ++ > .../aarch64/sve2/pfalse-binary_opt_single_n.c | 12 + > .../aarch64/sve2/pfalse-binary_to_uint.c | 10 + > .../aarch64/sve2/pfalse-binary_uint_opt_n.c | 11 + > .../aarch64/sve2/pfalse-binary_wide.c | 11 + > .../gcc.target/aarch64/sve2/pfalse-compare.c | 11 + > .../pfalse-load_ext_gather_index_restricted.c | 46 +++ > ...pfalse-load_ext_gather_offset_restricted.c | 65 +++++ > .../sve2/pfalse-load_gather_sv_restricted.c | 28 ++ > .../aarch64/sve2/pfalse-load_gather_vs.c | 36 +++ > .../sve2/pfalse-shift_left_imm_to_uint.c | 28 ++ > .../aarch64/sve2/pfalse-shift_right_imm.c | 32 +++ > .../pfalse-store_scatter_index_restricted.c | 50 ++++ > .../pfalse-store_scatter_offset_restricted.c | 65 +++++ > .../gcc.target/aarch64/sve2/pfalse-unary.c | 32 +++ > .../aarch64/sve2/pfalse-unary_convert.c | 31 ++ > .../sve2/pfalse-unary_convert_narrowt.c | 23 ++ > .../aarch64/sve2/pfalse-unary_to_int.c | 11 + > 68 files changed, 2479 insertions(+), 14 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/aarch64/pfalse-binary_0.h > create mode 100644 gcc/testsuite/gcc.target/aarch64/pfalse-unary_0.h > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_int_opt_n.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_n.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_single_n.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_rotate.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint64_opt_n.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint_opt_n.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-binaryxn.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-clast.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-compare_opt_n.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-compare_wide_opt_n.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-count_pred.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-fold_left.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-load.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_ext.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_ext_gather_index.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_ext_gather_offset.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_gather_sv.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_gather_vs.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_replicate.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-prefetch.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-prefetch_gather_index.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-prefetch_gather_offset.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-ptest.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-rdffr.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-reduction.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-reduction_wide.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-shift_right_imm.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-store.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-store_scatter_index.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-store_scatter_offset.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-storexn.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-ternary_opt_n.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-ternary_rotate.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_convert_narrowt.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_convertxn.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_n.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_pred.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_to_uint.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-unaryxn.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_int_opt_n.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_int_opt_single_n.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_opt_n.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_opt_single_n.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_to_uint.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_uint_opt_n.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_wide.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-compare.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_ext_gather_index_restricted.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_ext_gather_offset_restricted.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_gather_sv_restricted.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_gather_vs.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-shift_left_imm_to_uint.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-shift_right_imm.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-store_scatter_index_restricted.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-store_scatter_offset_restricted.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary_convert.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary_convert_narrowt.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary_to_int.c > > diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc > index 2117eceb606..aaa293a9091 100644 > --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc > +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc > @@ -201,6 +201,15 @@ class svac_impl : public function_base > public: > CONSTEXPR svac_impl (int unspec) : m_unspec (unspec) {} > > + gimple * > + fold (gimple_folder &f) const override > + { > + tree pg = gimple_call_arg (f.call, 0); > + if (is_pfalse (pg)) > + return f.fold_call_to (pg); > + return NULL; > + } > + > rtx > expand (function_expander &e) const override > { > @@ -216,6 +225,14 @@ public: > class svadda_impl : public function_base > { > public: > + gimple * > + fold (gimple_folder &f) const override > + { > + if (is_pfalse (gimple_call_arg (f.call, 0))) > + return f.fold_call_to (gimple_call_arg (f.call, 1)); > + return NULL; > + } > + > rtx > expand (function_expander &e) const override > { > @@ -227,6 +244,21 @@ public: > } > }; > > +class svaddv_impl : public reduction > +{ > +public: > + CONSTEXPR svaddv_impl () > + : reduction (UNSPEC_SADDV, UNSPEC_UADDV, UNSPEC_FADDV) {} > + > + gimple * > + fold (gimple_folder &f) const override > + { > + if (is_pfalse (gimple_call_arg (f.call, 0))) > + return f.fold_call_to (build_zero_cst (TREE_TYPE (f.lhs))); > + return NULL; > + } > +}; > + > /* Implements svadr[bhwd]. */ > class svadr_bhwd_impl : public function_base > { > @@ -245,11 +277,25 @@ public: > e.args.quick_push (expand_vector_broadcast (mode, shift)); > return e.use_exact_insn (code_for_aarch64_adr_shift (mode)); > } > - > /* How many bits left to shift the vector displacement. */ > unsigned int m_shift; > }; > > + > +class svandv_impl : public reduction > +{ > +public: > + CONSTEXPR svandv_impl () : reduction (UNSPEC_ANDV) {} > + > + gimple * > + fold (gimple_folder &f) const override > + { > + if (is_pfalse (gimple_call_arg (f.call, 0))) > + return f.fold_call_to (build_all_ones_cst (TREE_TYPE (f.lhs))); > + return NULL; > + } > +}; > + > class svbic_impl : public function_base > { > public: > @@ -333,6 +379,14 @@ class svclast_impl : public quiet<function_base> > public: > CONSTEXPR svclast_impl (int unspec) : m_unspec (unspec) {} > > + gimple * > + fold (gimple_folder &f) const override > + { > + if (is_pfalse (gimple_call_arg (f.call, 0))) > + return f.fold_call_to (gimple_call_arg (f.call, 1)); > + return NULL; > + } > + > rtx > expand (function_expander &e) const override > { > @@ -425,6 +479,8 @@ public: > return gimple_build_assign (f.lhs, m_code, rhs1, rhs2); > } > > + if (is_pfalse (pg)) > + return f.fold_call_to (pg); > return NULL; > } > > @@ -464,6 +520,15 @@ public: > : m_code (code), m_unspec_for_sint (unspec_for_sint), > m_unspec_for_uint (unspec_for_uint) {} > > + gimple * > + fold (gimple_folder &f) const override > + { > + tree pg = gimple_call_arg (f.call, 0); > + if (is_pfalse (pg)) > + return f.fold_call_to (pg); > + return NULL; > + } > + > rtx > expand (function_expander &e) const override > { > @@ -502,6 +567,16 @@ public: > class svcmpuo_impl : public quiet<function_base> > { > public: > + > + gimple * > + fold (gimple_folder &f) const override > + { > + tree pg = gimple_call_arg (f.call, 0); > + if (is_pfalse (pg)) > + return f.fold_call_to (pg); > + return NULL; > + } > + > rtx > expand (function_expander &e) const override > { > @@ -598,6 +673,16 @@ public: > class svcntp_impl : public function_base > { > public: > + > + gimple * > + fold (gimple_folder &f) const override > + { > + tree pg = gimple_call_arg (f.call, 0); > + if (is_pfalse (pg)) > + return f.fold_call_to (build_zero_cst (TREE_TYPE (f.lhs))); > + return NULL; > + } > + > rtx > expand (function_expander &e) const override > { > @@ -613,6 +698,19 @@ public: > } > }; > > +class svcompact_impl > + : public QUIET_CODE_FOR_MODE0 (aarch64_sve_compact) > +{ > +public: > + gimple * > + fold (gimple_folder &f) const override > + { > + if (is_pfalse (gimple_call_arg (f.call, 0))) > + return f.fold_call_to (build_zero_cst (TREE_TYPE (f.lhs))); > + return NULL; > + } > +}; > + > /* Implements svcreate2, svcreate3 and svcreate4. */ > class svcreate_impl : public quiet<multi_vector_function> > { > @@ -746,6 +844,18 @@ public: > } > }; > > +class svcvtnt_impl : public CODE_FOR_MODE0 (aarch64_sve_cvtnt) > +{ > +public: > + gimple * > + fold (gimple_folder &f) const override > + { > + if (f.pred == PRED_x && is_pfalse (gimple_call_arg (f.call, 1))) > + f.fold_call_to (build_zero_cst (TREE_TYPE (f.lhs))); > + return NULL; > + } > +}; > + > class svdiv_impl : public rtx_code_function > { > public: > @@ -1138,6 +1248,20 @@ public: > } > }; > > +class sveorv_impl : public reduction > +{ > +public: > + CONSTEXPR sveorv_impl () : reduction (UNSPEC_XORV) {} > + > + gimple * > + fold (gimple_folder &f) const override > + { > + if (is_pfalse (gimple_call_arg (f.call, 0))) > + return f.fold_call_to (build_zero_cst (TREE_TYPE (f.lhs))); > + return NULL; > + } > +}; > + > /* Implements svextb, svexth and svextw. */ > class svext_bhw_impl : public function_base > { > @@ -1380,7 +1504,8 @@ public: > BIT_FIELD_REF lowers to Advanced SIMD element extract, so we have to > ensure the index of the element being accessed is in the range of a > Advanced SIMD vector width. */ > - gimple *fold (gimple_folder & f) const override > + gimple * > + fold (gimple_folder & f) const override > { > tree pred = gimple_call_arg (f.call, 0); > tree val = gimple_call_arg (f.call, 1); > @@ -1950,6 +2075,80 @@ public: > } > }; > > +class svminv_impl : public reduction > +{ > +public: > + CONSTEXPR svminv_impl () > + : reduction (UNSPEC_SMINV, UNSPEC_UMINV, UNSPEC_FMINV) {} > + > + gimple * > + fold (gimple_folder &f) const override > + { > + if (is_pfalse (gimple_call_arg (f.call, 0))) > + { > + tree rhs = f.type_suffix (0).integer_p > + ? TYPE_MAX_VALUE (TREE_TYPE (f.lhs)) > + : build_real (TREE_TYPE (f.lhs), dconstinf); > + return f.fold_call_to (rhs); > + } > + return NULL; > + } > +}; > + > +class svmaxnmv_impl : public reduction > +{ > +public: > + CONSTEXPR svmaxnmv_impl () : reduction (UNSPEC_FMAXNMV) {} > + gimple * > + fold (gimple_folder &f) const override > + { > + if (is_pfalse (gimple_call_arg (f.call, 0))) > + { > + REAL_VALUE_TYPE rnan = dconst0; > + rnan.cl = rvc_nan; > + return f.fold_call_to (build_real (TREE_TYPE (f.lhs), rnan)); > + } > + return NULL; > + } > +}; > + > +class svmaxv_impl : public reduction > +{ > +public: > + CONSTEXPR svmaxv_impl () > + : reduction (UNSPEC_SMAXV, UNSPEC_UMAXV, UNSPEC_FMAXV) {} > + > + gimple * > + fold (gimple_folder &f) const override > + { > + if (is_pfalse (gimple_call_arg (f.call, 0))) > + { > + tree rhs = f.type_suffix (0).integer_p > + ? TYPE_MIN_VALUE (TREE_TYPE (f.lhs)) > + : build_real (TREE_TYPE (f.lhs), dconstninf); > + return f.fold_call_to (rhs); > + } > + return NULL; > + } > +}; > + > +class svminnmv_impl : public reduction > +{ > +public: > + CONSTEXPR svminnmv_impl () : reduction (UNSPEC_FMINNMV) {} > + gimple * > + fold (gimple_folder &f) const override > + { > + if (is_pfalse (gimple_call_arg (f.call, 0))) > + { > + REAL_VALUE_TYPE rnan = dconst0; > + rnan.cl = rvc_nan; > + return f.fold_call_to (build_real (TREE_TYPE (f.lhs), rnan)); > + } > + return NULL; > + } > +}; > + > class svmla_impl : public function_base > { > public: > @@ -2198,6 +2397,20 @@ public: > } > }; > > +class svorv_impl : public reduction > +{ > +public: > + CONSTEXPR svorv_impl () : reduction (UNSPEC_IORV) {} > + > + gimple * > + fold (gimple_folder &f) const override > + { > + if (is_pfalse (gimple_call_arg (f.call, 0))) > + return f.fold_call_to (build_zero_cst (TREE_TYPE (f.lhs))); > + return NULL; > + } > +}; > + > class svpfalse_impl : public function_base > { > public: > @@ -2222,6 +2435,16 @@ class svpfirst_svpnext_impl : public function_base > { > public: > CONSTEXPR svpfirst_svpnext_impl (int unspec) : m_unspec (unspec) {} > + gimple * > + fold (gimple_folder &f) const override > + { > + tree pg = gimple_call_arg (f.call, 0); > + if (is_pfalse (pg)) > + return f.fold_call_to (m_unspec == UNSPEC_PFIRST > + ? gimple_call_arg (f.call, 1) > + : pg); > + return NULL; > + } > > rtx > expand (function_expander &e) const override > @@ -2302,6 +2525,13 @@ class svptest_impl : public function_base > { > public: > CONSTEXPR svptest_impl (rtx_code compare) : m_compare (compare) {} > + gimple * > + fold (gimple_folder &f) const override > + { > + if (is_pfalse (gimple_call_arg (f.call, 0))) > + return f.fold_call_to (boolean_false_node); > + return NULL; > + } > > rtx > expand (function_expander &e) const override > @@ -2717,6 +2947,18 @@ public: > } > }; > > +class svsplice_impl : public QUIET_CODE_FOR_MODE0 (aarch64_sve_splice) > +{ > +public: > + gimple * > + fold (gimple_folder &f) const override > + { > + if (is_pfalse (gimple_call_arg (f.call, 0))) > + return f.fold_call_to (gimple_call_arg (f.call, 2)); > + return NULL; > + } > +}; > + > class svst1_impl : public full_width_access > { > public: > @@ -3170,13 +3412,13 @@ FUNCTION (svacle, svac_impl, (UNSPEC_COND_FCMLE)) > FUNCTION (svaclt, svac_impl, (UNSPEC_COND_FCMLT)) > FUNCTION (svadd, rtx_code_function, (PLUS, PLUS, UNSPEC_COND_FADD)) > FUNCTION (svadda, svadda_impl,) > -FUNCTION (svaddv, reduction, (UNSPEC_SADDV, UNSPEC_UADDV, UNSPEC_FADDV)) > +FUNCTION (svaddv, svaddv_impl,) > FUNCTION (svadrb, svadr_bhwd_impl, (0)) > FUNCTION (svadrd, svadr_bhwd_impl, (3)) > FUNCTION (svadrh, svadr_bhwd_impl, (1)) > FUNCTION (svadrw, svadr_bhwd_impl, (2)) > FUNCTION (svand, rtx_code_function, (AND, AND)) > -FUNCTION (svandv, reduction, (UNSPEC_ANDV)) > +FUNCTION (svandv, svandv_impl,) > FUNCTION (svasr, rtx_code_function, (ASHIFTRT, ASHIFTRT)) > FUNCTION (svasr_wide, shift_wide, (ASHIFTRT, UNSPEC_ASHIFTRT_WIDE)) > FUNCTION (svasrd, unspec_based_function, (UNSPEC_ASRD, -1, -1)) > @@ -3233,12 +3475,12 @@ FUNCTION (svcnth_pat, svcnt_bhwd_pat_impl, (VNx8HImode)) > FUNCTION (svcntp, svcntp_impl,) > FUNCTION (svcntw, svcnt_bhwd_impl, (VNx4SImode)) > FUNCTION (svcntw_pat, svcnt_bhwd_pat_impl, (VNx4SImode)) > -FUNCTION (svcompact, QUIET_CODE_FOR_MODE0 (aarch64_sve_compact),) > +FUNCTION (svcompact, svcompact_impl,) > FUNCTION (svcreate2, svcreate_impl, (2)) > FUNCTION (svcreate3, svcreate_impl, (3)) > FUNCTION (svcreate4, svcreate_impl, (4)) > FUNCTION (svcvt, svcvt_impl,) > -FUNCTION (svcvtnt, CODE_FOR_MODE0 (aarch64_sve_cvtnt),) > +FUNCTION (svcvtnt, svcvtnt_impl,) > FUNCTION (svdiv, svdiv_impl,) > FUNCTION (svdivr, rtx_code_function_rotated, (DIV, UDIV, UNSPEC_COND_FDIV)) > FUNCTION (svdot, svdot_impl,) > @@ -3249,7 +3491,7 @@ FUNCTION (svdup_lane, svdup_lane_impl,) > FUNCTION (svdupq, svdupq_impl,) > FUNCTION (svdupq_lane, svdupq_lane_impl,) > FUNCTION (sveor, rtx_code_function, (XOR, XOR, -1)) > -FUNCTION (sveorv, reduction, (UNSPEC_XORV)) > +FUNCTION (sveorv, sveorv_impl,) > FUNCTION (svexpa, unspec_based_function, (-1, -1, UNSPEC_FEXPA)) > FUNCTION (svext, QUIET_CODE_FOR_MODE0 (aarch64_sve_ext),) > FUNCTION (svextb, svext_bhw_impl, (QImode)) > @@ -3313,14 +3555,14 @@ FUNCTION (svmax, rtx_code_function, (SMAX, UMAX, UNSPEC_COND_FMAX, > UNSPEC_FMAX)) > FUNCTION (svmaxnm, cond_or_uncond_unspec_function, (UNSPEC_COND_FMAXNM, > UNSPEC_FMAXNM)) > -FUNCTION (svmaxnmv, reduction, (UNSPEC_FMAXNMV)) > -FUNCTION (svmaxv, reduction, (UNSPEC_SMAXV, UNSPEC_UMAXV, UNSPEC_FMAXV)) > +FUNCTION (svmaxnmv, svmaxnmv_impl,) > +FUNCTION (svmaxv, svmaxv_impl,) > FUNCTION (svmin, rtx_code_function, (SMIN, UMIN, UNSPEC_COND_FMIN, > UNSPEC_FMIN)) > FUNCTION (svminnm, cond_or_uncond_unspec_function, (UNSPEC_COND_FMINNM, > UNSPEC_FMINNM)) > -FUNCTION (svminnmv, reduction, (UNSPEC_FMINNMV)) > -FUNCTION (svminv, reduction, (UNSPEC_SMINV, UNSPEC_UMINV, UNSPEC_FMINV)) > +FUNCTION (svminnmv, svminnmv_impl,) > +FUNCTION (svminv, svminv_impl,) > FUNCTION (svmla, svmla_impl,) > FUNCTION (svmla_lane, svmla_lane_impl,) > FUNCTION (svmls, svmls_impl,) > @@ -3343,7 +3585,7 @@ FUNCTION (svnor, svnor_impl,) > FUNCTION (svnot, svnot_impl,) > FUNCTION (svorn, svorn_impl,) > FUNCTION (svorr, rtx_code_function, (IOR, IOR)) > -FUNCTION (svorv, reduction, (UNSPEC_IORV)) > +FUNCTION (svorv, svorv_impl,) > FUNCTION (svpfalse, svpfalse_impl,) > FUNCTION (svpfirst, svpfirst_svpnext_impl, (UNSPEC_PFIRST)) > FUNCTION (svpnext, svpfirst_svpnext_impl, (UNSPEC_PNEXT)) > @@ -3405,7 +3647,7 @@ FUNCTION (svset2, svset_impl, (2)) > FUNCTION (svset3, svset_impl, (3)) > FUNCTION (svset4, svset_impl, (4)) > FUNCTION (svsetffr, svsetffr_impl,) > -FUNCTION (svsplice, QUIET_CODE_FOR_MODE0 (aarch64_sve_splice),) > +FUNCTION (svsplice, svsplice_impl,) > FUNCTION (svsqrt, rtx_code_function, (SQRT, SQRT, UNSPEC_COND_FSQRT)) > FUNCTION (svst1, svst1_impl,) > FUNCTION (svst1_scatter, svst1_scatter_impl,) > diff --git a/gcc/config/aarch64/aarch64-sve-builtins-sve2.cc b/gcc/config/aarch64/aarch64-sve-builtins-sve2.cc > index fd0c98c6b68..073763916a8 100644 > --- a/gcc/config/aarch64/aarch64-sve-builtins-sve2.cc > +++ b/gcc/config/aarch64/aarch64-sve-builtins-sve2.cc > @@ -221,6 +221,18 @@ public: > } > }; > > +class svcvtxnt_impl : public CODE_FOR_MODE1 (aarch64_sve2_cvtxnt) > +{ > +public: > + gimple * > + fold (gimple_folder &f) const override > + { > + if (f.pred == PRED_x && is_pfalse (gimple_call_arg (f.call, 1))) > + return f.fold_call_to (build_zero_cst (TREE_TYPE (f.lhs))); > + return NULL; > + } > +}; > + > class svdup_laneq_impl : public function_base > { > public: > @@ -358,6 +370,14 @@ class svmatch_svnmatch_impl : public function_base > { > public: > CONSTEXPR svmatch_svnmatch_impl (int unspec) : m_unspec (unspec) {} > + gimple * > + fold (gimple_folder &f) const override > + { > + tree pg = gimple_call_arg (f.call, 0); > + if (is_pfalse (pg)) > + return f.fold_call_to (pg); > + return NULL; > + } > > rtx > expand (function_expander &e) const override > @@ -910,7 +930,7 @@ FUNCTION (svclamp, svclamp_impl,) > FUNCTION (svcvtlt, unspec_based_function, (-1, -1, UNSPEC_COND_FCVTLT)) > FUNCTION (svcvtn, svcvtn_impl,) > FUNCTION (svcvtx, unspec_based_function, (-1, -1, UNSPEC_COND_FCVTX)) > -FUNCTION (svcvtxnt, CODE_FOR_MODE1 (aarch64_sve2_cvtxnt),) > +FUNCTION (svcvtxnt, svcvtxnt_impl,) > FUNCTION (svdup_laneq, svdup_laneq_impl,) > FUNCTION (sveor3, CODE_FOR_MODE0 (aarch64_sve2_eor3),) > FUNCTION (sveorbt, unspec_based_function, (UNSPEC_EORBT, UNSPEC_EORBT, -1)) > diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc > index b3d961452d3..554984b786c 100644 > --- a/gcc/config/aarch64/aarch64-sve-builtins.cc > +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc > @@ -3517,6 +3517,15 @@ is_ptrue (tree v, unsigned int step) > && vector_cst_all_same (v, step)); > } > > +/* Return true if V is a constant predicate that acts as a pfalse. */ > +bool > +is_pfalse (tree v) > +{ > + return (TREE_CODE (v) == VECTOR_CST > + && TYPE_MODE (TREE_TYPE (v)) == VNx16BImode > + && integer_zerop (v)); > +} > + > gimple_folder::gimple_folder (const function_instance &instance, tree fndecl, > gimple_stmt_iterator *gsi_in, gcall *call_in) > : function_call_info (gimple_location (call_in), instance, fndecl), > @@ -3622,6 +3631,57 @@ gimple_folder::redirect_pred_x () > return redirect_call (instance); > } > > +/* Fold calls with predicate pfalse: > + _m predication: lhs = op1. > + _x or _z: lhs = {0, ...}. > + Implicit predication that reads from memory: lhs = {0, ...}. > + Implicit predication that writes to memory or prefetches: no-op. > + Return the new gimple statement on success, else NULL. */ > +gimple * > +gimple_folder::fold_pfalse () > +{ > + if (pred == PRED_none) > + return nullptr; > + tree arg0 = gimple_call_arg (call, 0); > + if (pred == PRED_m) > + { > + /* Unary function shapes with _m predication are folded to the > + inactive vector (arg0), while other function shapes are folded > + to op1 (arg1). */ > + tree arg1 = gimple_call_arg (call, 1); > + tree t; > + if (is_pfalse (arg1)) > + t = arg0; > + else if (is_pfalse (arg0)) > + t = arg1; > + else > + return nullptr; > + /* In some intrinsics, e.g. svqshlu, lhs and op1 have different types, > + such that folding to op1 needs a type conversion. */ > + if (TREE_TYPE (t) == TREE_TYPE (lhs)) > + return fold_call_to (t); > + else > + { > + tree rhs = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), t); > + return gimple_build_assign (lhs, VIEW_CONVERT_EXPR, rhs); > + } > + } > + if ((pred == PRED_x || pred == PRED_z) && is_pfalse (arg0)) > + return fold_call_to (build_zero_cst (TREE_TYPE (lhs))); > + if (pred == PRED_implicit && is_pfalse (arg0)) > + { > + unsigned int flags = call_properties (); > + /* Folding to lhs = {0, ...} is not appropriate for intrinsics with > + AGGREGATE types as lhs. */ > + if ((flags & CP_READ_MEMORY) > + && !AGGREGATE_TYPE_P (TREE_TYPE (lhs))) > + return fold_call_to (build_zero_cst (TREE_TYPE (lhs))); > + if (flags & (CP_WRITE_MEMORY | CP_PREFETCH_MEMORY)) > + return fold_to_stmt_vops (gimple_build_nop ()); > + } > + return nullptr; > +} > + > /* Fold the call to constant VAL. */ > gimple * > gimple_folder::fold_to_cstu (poly_uint64 val) > @@ -3724,6 +3784,23 @@ gimple_folder::fold_active_lanes_to (tree x) > return gimple_build_assign (lhs, VEC_COND_EXPR, pred, x, vec_inactive); > } > > +/* Fold call to assignment statement lhs = t. */ > +gimple * > +gimple_folder::fold_call_to (tree t) > +{ > + return fold_to_stmt_vops (gimple_build_assign (lhs, t)); > +} > + > +/* Fold call to G, incl. adjustments to the virtual operands. */ > +gimple * > +gimple_folder::fold_to_stmt_vops (gimple *g) > +{ > + gimple_seq stmts = NULL; > + gimple_seq_add_stmt_without_update (&stmts, g); > + gsi_replace_with_seq_vops (gsi, stmts); > + return g; > +} > + > /* Try to fold the call. Return the new statement on success and null > on failure. */ > gimple * > @@ -3743,6 +3820,8 @@ gimple_folder::fold () > /* First try some simplifications that are common to many functions. */ > if (auto *call = redirect_pred_x ()) > return call; > + if (auto *call = fold_pfalse ()) > + return call; > > return base->fold (*this); > } > diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h b/gcc/config/aarch64/aarch64-sve-builtins.h > index 5bd9b88d117..ad4d06dd5f5 100644 > --- a/gcc/config/aarch64/aarch64-sve-builtins.h > +++ b/gcc/config/aarch64/aarch64-sve-builtins.h > @@ -632,6 +632,7 @@ public: > > gcall *redirect_call (const function_instance &); > gimple *redirect_pred_x (); > + gimple *fold_pfalse (); > > gimple *fold_to_cstu (poly_uint64); > gimple *fold_to_pfalse (); > @@ -639,6 +640,8 @@ public: > gimple *fold_to_vl_pred (unsigned int); > gimple *fold_const_binary (enum tree_code); > gimple *fold_active_lanes_to (tree); > + gimple *fold_call_to (tree); > + gimple *fold_to_stmt_vops (gimple *); > > gimple *fold (); > > @@ -832,6 +835,7 @@ extern tree acle_svprfop; > > bool vector_cst_all_same (tree, unsigned int); > bool is_ptrue (tree, unsigned int); > +bool is_pfalse (tree); > const function_instance *lookup_fndecl (tree); > > /* Try to find a mode with the given mode_suffix_info fields. Return the > diff --git a/gcc/testsuite/gcc.target/aarch64/pfalse-binary_0.h b/gcc/testsuite/gcc.target/aarch64/pfalse-binary_0.h > new file mode 100644 > index 00000000000..72fb5b6ac6d > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/pfalse-binary_0.h > @@ -0,0 +1,240 @@ > +#include <arm_sve.h> > + > +#define MXZ(F, RTY, TY1, TY2) \ > + RTY F##_f (TY1 op1, TY2 op2) \ > + { \ > + return sv##F (svpfalse_b (), op1, op2); \ > + } > + > +#define PRED_MXv(F, RTY, TYPE1, TYPE2, TY) \ > + MXZ (F##_##TY##_m, RTY, TYPE1, sv##TYPE2) \ > + MXZ (F##_##TY##_x, RTY, TYPE1, sv##TYPE2) > + > +#define PRED_Zv(F, RTY, TYPE1, TYPE2, TY) \ > + MXZ (F##_##TY##_z, RTY, TYPE1, sv##TYPE2) > + > +#define PRED_MXZv(F, RTY, TYPE1, TYPE2, TY) \ > + PRED_MXv (F, RTY, TYPE1, TYPE2, TY) \ > + PRED_Zv (F, RTY, TYPE1, TYPE2, TY) > + > +#define PRED_Z(F, RTY, TYPE1, TYPE2, TY) \ > + PRED_Zv (F, RTY, TYPE1, TYPE2, TY) \ > + MXZ (F##_n_##TY##_z, RTY, TYPE1, TYPE2) > + > +#define PRED_MXZ(F, RTY, TYPE1, TYPE2, TY) \ > + PRED_MXv (F, RTY, TYPE1, TYPE2, TY) \ > + MXZ (F##_n_##TY##_m, RTY, TYPE1, TYPE2) \ > + MXZ (F##_n_##TY##_x, RTY, TYPE1, TYPE2) \ > + PRED_Z (F, RTY, TYPE1, TYPE2, TY) > + > +#define PRED_IMPLICITv(F, RTY, TYPE1, TYPE2, TY) \ > + MXZ (F##_##TY, RTY, TYPE1, sv##TYPE2) > + > +#define PRED_IMPLICIT(F, RTY, TYPE1, TYPE2, TY) \ > + PRED_IMPLICITv (F, RTY, TYPE1, TYPE2, TY) \ > + MXZ (F##_n_##TY, RTY, TYPE1, TYPE2) > + > +#define ALL_Q_INTEGER(F, P) \ > + PRED_##P (F, svuint8_t, svuint8_t, uint8_t, u8) \ > + PRED_##P (F, svint8_t, svint8_t, int8_t, s8) > + > +#define ALL_Q_INTEGER_UINT(F, P) \ > + PRED_##P (F, svuint8_t, svuint8_t, uint8_t, u8) \ > + PRED_##P (F, svint8_t, svint8_t, uint8_t, s8) > + > +#define ALL_Q_INTEGER_INT(F, P) \ > + PRED_##P (F, svuint8_t, svuint8_t, int8_t, u8) \ > + PRED_##P (F, svint8_t, svint8_t, int8_t, s8) > + > +#define ALL_Q_INTEGER_BOOL(F, P) \ > + PRED_##P (F, svbool_t, svuint8_t, uint8_t, u8) \ > + PRED_##P (F, svbool_t, svint8_t, int8_t, s8) > + > +#define ALL_H_INTEGER(F, P) \ > + PRED_##P (F, svuint16_t, svuint16_t, uint16_t, u16) \ > + PRED_##P (F, svint16_t, svint16_t, int16_t, s16) > + > +#define ALL_H_INTEGER_UINT(F, P) \ > + PRED_##P (F, svuint16_t, svuint16_t, uint16_t, u16) \ > + PRED_##P (F, svint16_t, svint16_t, uint16_t, s16) > + > +#define ALL_H_INTEGER_INT(F, P) \ > + PRED_##P (F, svuint16_t, svuint16_t, int16_t, u16) \ > + PRED_##P (F, svint16_t, svint16_t, int16_t, s16) > + > +#define ALL_H_INTEGER_WIDE(F, P) \ > + PRED_##P (F, svuint16_t, svuint16_t, uint8_t, u16) \ > + PRED_##P (F, svint16_t, svint16_t, int8_t, s16) > + > +#define ALL_H_INTEGER_BOOL(F, P) \ > + PRED_##P (F, svbool_t, svuint16_t, uint16_t, u16) \ > + PRED_##P (F, svbool_t, svint16_t, int16_t, s16) > + > +#define ALL_S_INTEGER(F, P) \ > + PRED_##P (F, svuint32_t, svuint32_t, uint32_t, u32) \ > + PRED_##P (F, svint32_t, svint32_t, int32_t, s32) > + > +#define ALL_S_INTEGER_UINT(F, P) \ > + PRED_##P (F, svuint32_t, svuint32_t, uint32_t, u32) \ > + PRED_##P (F, svint32_t, svint32_t, uint32_t, s32) > + > +#define ALL_S_INTEGER_INT(F, P) \ > + PRED_##P (F, svuint32_t, svuint32_t, int32_t, u32) \ > + PRED_##P (F, svint32_t, svint32_t, int32_t, s32) > + > +#define ALL_S_INTEGER_WIDE(F, P) \ > + PRED_##P (F, svuint32_t, svuint32_t, uint16_t, u32) \ > + PRED_##P (F, svint32_t, svint32_t, int16_t, s32) > + > +#define ALL_S_INTEGER_BOOL(F, P) \ > + PRED_##P (F, svbool_t, svuint32_t, uint32_t, u32) \ > + PRED_##P (F, svbool_t, svint32_t, int32_t, s32) > + > +#define ALL_D_INTEGER(F, P) \ > + PRED_##P (F, svuint64_t, svuint64_t, uint64_t, u64) \ > + PRED_##P (F, svint64_t, svint64_t, int64_t, s64) > + > +#define ALL_D_INTEGER_UINT(F, P) \ > + PRED_##P (F, svuint64_t, svuint64_t, uint64_t, u64) \ > + PRED_##P (F, svint64_t, svint64_t, uint64_t, s64) > + > +#define ALL_D_INTEGER_INT(F, P) \ > + PRED_##P (F, svuint64_t, svuint64_t, int64_t, u64) \ > + PRED_##P (F, svint64_t, svint64_t, int64_t, s64) > + > +#define ALL_D_INTEGER_WIDE(F, P) \ > + PRED_##P (F, svuint64_t, svuint64_t, uint32_t, u64) \ > + PRED_##P (F, svint64_t, svint64_t, int32_t, s64) > + > +#define ALL_D_INTEGER_BOOL(F, P) \ > + PRED_##P (F, svbool_t, svuint64_t, uint64_t, u64) \ > + PRED_##P (F, svbool_t, svint64_t, int64_t, s64) > + > +#define SD_INTEGER_TO_UINT(F, P) \ > + PRED_##P (F, svuint32_t, svuint32_t, uint32_t, u32) \ > + PRED_##P (F, svuint64_t, svuint64_t, uint64_t, u64) \ > + PRED_##P (F, svuint32_t, svint32_t, int32_t, s32) \ > + PRED_##P (F, svuint64_t, svint64_t, int64_t, s64) > + > +#define BH_INTEGER_BOOL(F, P) \ > + ALL_Q_INTEGER_BOOL (F, P) \ > + ALL_H_INTEGER_BOOL (F, P) > + > +#define BHS_UNSIGNED_UINT64(F, P) \ > + PRED_##P (F, svuint8_t, svuint8_t, uint64_t, u8) \ > + PRED_##P (F, svuint16_t, svuint16_t, uint64_t, u16) \ > + PRED_##P (F, svuint32_t, svuint32_t, uint64_t, u32) > + > +#define BHS_UNSIGNED_WIDE_BOOL(F, P) \ > + PRED_##P (F, svbool_t, svuint8_t, uint64_t, u8) \ > + PRED_##P (F, svbool_t, svuint16_t, uint64_t, u16) \ > + PRED_##P (F, svbool_t, svuint32_t, uint64_t, u32) > + > +#define BHS_SIGNED_UINT64(F, P) \ > + PRED_##P (F, svint8_t, svint8_t, uint64_t, s8) \ > + PRED_##P (F, svint16_t, svint16_t, uint64_t, s16) \ > + PRED_##P (F, svint32_t, svint32_t, uint64_t, s32) > + > +#define BHS_SIGNED_WIDE_BOOL(F, P) \ > + PRED_##P (F, svbool_t, svint8_t, int64_t, s8) \ > + PRED_##P (F, svbool_t, svint16_t, int64_t, s16) \ > + PRED_##P (F, svbool_t, svint32_t, int64_t, s32) > + > +#define ALL_UNSIGNED_UINT(F, P) \ > + PRED_##P (F, svuint8_t, svuint8_t, uint8_t, u8) \ > + PRED_##P (F, svuint16_t, svuint16_t, uint16_t, u16) \ > + PRED_##P (F, svuint32_t, svuint32_t, uint32_t, u32) \ > + PRED_##P (F, svuint64_t, svuint64_t, uint64_t, u64) > + > +#define ALL_UNSIGNED_INT(F, P) \ > + PRED_##P (F, svuint8_t, svuint8_t, int8_t, u8) \ > + PRED_##P (F, svuint16_t, svuint16_t, int16_t, u16) \ > + PRED_##P (F, svuint32_t, svuint32_t, int32_t, u32) \ > + PRED_##P (F, svuint64_t, svuint64_t, int64_t, u64) > + > +#define ALL_SIGNED_UINT(F, P) \ > + PRED_##P (F, svint8_t, svint8_t, uint8_t, s8) \ > + PRED_##P (F, svint16_t, svint16_t, uint16_t, s16) \ > + PRED_##P (F, svint32_t, svint32_t, uint32_t, s32) \ > + PRED_##P (F, svint64_t, svint64_t, uint64_t, s64) > + > +#define ALL_FLOAT(F, P) \ > + PRED_##P (F, svfloat16_t, svfloat16_t, float16_t, f16) \ > + PRED_##P (F, svfloat32_t, svfloat32_t, float32_t, f32) \ > + PRED_##P (F, svfloat64_t, svfloat64_t, float64_t, f64) > + > +#define ALL_FLOAT_INT(F, P) \ > + PRED_##P (F, svfloat16_t, svfloat16_t, int16_t, f16) \ > + PRED_##P (F, svfloat32_t, svfloat32_t, int32_t, f32) \ > + PRED_##P (F, svfloat64_t, svfloat64_t, int64_t, f64) > + > +#define ALL_FLOAT_BOOL(F, P) \ > + PRED_##P (F, svbool_t, svfloat16_t, float16_t, f16) \ > + PRED_##P (F, svbool_t, svfloat32_t, float32_t, f32) \ > + PRED_##P (F, svbool_t, svfloat64_t, float64_t, f64) > + > +#define ALL_FLOAT_SCALAR(F, P) \ > + PRED_##P (F, float16_t, float16_t, float16_t, f16) \ > + PRED_##P (F, float32_t, float32_t, float32_t, f32) \ > + PRED_##P (F, float64_t, float64_t, float64_t, f64) > + > +#define B(F, P) \ > + PRED_##P (F, svbool_t, svbool_t, bool_t, b) > + > +#define ALL_SD_INTEGER(F, P) \ > + ALL_S_INTEGER (F, P) \ > + ALL_D_INTEGER (F, P) > + > +#define HSD_INTEGER_WIDE(F, P) \ > + ALL_H_INTEGER_WIDE (F, P) \ > + ALL_S_INTEGER_WIDE (F, P) \ > + ALL_D_INTEGER_WIDE (F, P) > + > +#define BHS_INTEGER_UINT64(F, P) \ > + BHS_UNSIGNED_UINT64 (F, P) \ > + BHS_SIGNED_UINT64 (F, P) > + > +#define BHS_INTEGER_WIDE_BOOL(F, P) \ > + BHS_UNSIGNED_WIDE_BOOL (F, P) \ > + BHS_SIGNED_WIDE_BOOL (F, P) > + > +#define ALL_INTEGER(F, P) \ > + ALL_Q_INTEGER (F, P) \ > + ALL_H_INTEGER (F, P) \ > + ALL_S_INTEGER (F, P) \ > + ALL_D_INTEGER (F, P) > + > +#define ALL_INTEGER_UINT(F, P) \ > + ALL_Q_INTEGER_UINT (F, P) \ > + ALL_H_INTEGER_UINT (F, P) \ > + ALL_S_INTEGER_UINT (F, P) \ > + ALL_D_INTEGER_UINT (F, P) > + > +#define ALL_INTEGER_INT(F, P) \ > + ALL_Q_INTEGER_INT (F, P) \ > + ALL_H_INTEGER_INT (F, P) \ > + ALL_S_INTEGER_INT (F, P) \ > + ALL_D_INTEGER_INT (F, P) > + > +#define ALL_INTEGER_BOOL(F, P) \ > + ALL_Q_INTEGER_BOOL (F, P) \ > + ALL_H_INTEGER_BOOL (F, P) \ > + ALL_S_INTEGER_BOOL (F, P) \ > + ALL_D_INTEGER_BOOL (F, P) > + > +#define ALL_FLOAT_AND_SD_INTEGER(F, P) \ > + ALL_SD_INTEGER (F, P) \ > + ALL_FLOAT (F, P) > + > +#define ALL_ARITH(F, P) \ > + ALL_INTEGER (F, P) \ > + ALL_FLOAT (F, P) > + > +#define ALL_ARITH_BOOL(F, P) \ > + ALL_INTEGER_BOOL (F, P) \ > + ALL_FLOAT_BOOL (F, P) > + > +#define ALL_DATA(F, P) \ > + ALL_ARITH (F, P) \ > + PRED_##P (F, svbfloat16_t, svbfloat16_t, bfloat16_t, bf16) > + > diff --git a/gcc/testsuite/gcc.target/aarch64/pfalse-unary_0.h b/gcc/testsuite/gcc.target/aarch64/pfalse-unary_0.h > new file mode 100644 > index 00000000000..f6183a14325 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/pfalse-unary_0.h > @@ -0,0 +1,195 @@ > +#include <arm_sve.h> > +#include <stdbool.h> > + > +#define M(F, RTY, TY) \ > + RTY F##_f (RTY inactive, TY op) \ > + { \ > + return sv##F (inactive, svpfalse_b (), op); \ > + } > + > +#define XZI(F, RTY, TY) \ > + RTY F##_f (TY op) \ > + { \ > + return sv##F (svpfalse_b (), op); \ > + } > + > +#define PRED_Z(F, RTY, TYPE, TY) \ > + XZI (F##_##TY##_z, RTY, sv##TYPE) \ > + > +#define PRED_MZ(F, RTY, TYPE, TY) \ > + M (F##_##TY##_m, RTY, sv##TYPE) \ > + PRED_Z (F, RTY, TYPE, TY) > + > +#define PRED_MXZ(F, RTY, TYPE, TY) \ > + XZI (F##_##TY##_x, RTY, sv##TYPE) \ > + PRED_MZ (F, RTY, TYPE, TY) > + > +#define PRED_IMPLICIT(F, RTY, TYPE, TY) \ > + XZI (F##_##TY, RTY, sv##TYPE) > + > +#define PRED_IMPLICITn(F, RTY, TYPE) \ > + XZI (F, RTY, sv##TYPE) > + > +#define Q_INTEGER(F, P) \ > + PRED_##P (F, svuint8_t, uint8_t, u8) \ > + PRED_##P (F, svint8_t, int8_t, s8) > + > +#define Q_INTEGER_SCALAR(F, P) \ > + PRED_##P (F, uint8_t, uint8_t, u8) \ > + PRED_##P (F, int8_t, int8_t, s8) > + > +#define Q_INTEGER_SCALAR_WIDE(F, P) \ > + PRED_##P (F, uint64_t, uint8_t, u8) \ > + PRED_##P (F, int64_t, int8_t, s8) > + > +#define H_INTEGER(F, P) \ > + PRED_##P (F, svuint16_t, uint16_t, u16) \ > + PRED_##P (F, svint16_t, int16_t, s16) > + > +#define H_INTEGER_SCALAR(F, P) \ > + PRED_##P (F, uint16_t, uint16_t, u16) \ > + PRED_##P (F, int16_t, int16_t, s16) > + > +#define H_INTEGER_SCALAR_WIDE(F, P) \ > + PRED_##P (F, uint64_t, uint16_t, u16) \ > + PRED_##P (F, int64_t, int16_t, s16) > + > +#define S_INTEGER(F, P) \ > + PRED_##P (F, svuint32_t, uint32_t, u32) \ > + PRED_##P (F, svint32_t, int32_t, s32) > + > +#define S_INTEGER_SCALAR(F, P) \ > + PRED_##P (F, uint32_t, uint32_t, u32) \ > + PRED_##P (F, int32_t, int32_t, s32) > + > +#define S_INTEGER_SCALAR_WIDE(F, P) \ > + PRED_##P (F, uint64_t, uint32_t, u32) \ > + PRED_##P (F, int64_t, int32_t, s32) > + > +#define S_UNSIGNED(F, P) \ > + PRED_##P (F, svuint32_t, uint32_t, u32) > + > +#define D_INTEGER(F, P) \ > + PRED_##P (F, svuint64_t, uint64_t, u64) \ > + PRED_##P (F, svint64_t, int64_t, s64) > + > +#define D_INTEGER_SCALAR(F, P) \ > + PRED_##P (F, uint64_t, uint64_t, u64) \ > + PRED_##P (F, int64_t, int64_t, s64) > + > +#define SD_INTEGER(F, P) \ > + S_INTEGER (F, P) \ > + D_INTEGER (F, P) > + > +#define SD_DATA(F, P) \ > + PRED_##P (F, svfloat32_t, float32_t, f32) \ > + PRED_##P (F, svfloat64_t, float64_t, f64) \ > + S_INTEGER (F, P) \ > + D_INTEGER (F, P) > + > +#define ALL_SIGNED(F, P) \ > + PRED_##P (F, svint8_t, int8_t, s8) \ > + PRED_##P (F, svint16_t, int16_t, s16) \ > + PRED_##P (F, svint32_t, int32_t, s32) \ > + PRED_##P (F, svint64_t, int64_t, s64) > + > +#define ALL_SIGNED_UINT(F, P) \ > + PRED_##P (F, svuint8_t, int8_t, s8) \ > + PRED_##P (F, svuint16_t, int16_t, s16) \ > + PRED_##P (F, svuint32_t, int32_t, s32) \ > + PRED_##P (F, svuint64_t, int64_t, s64) > + > +#define ALL_UNSIGNED_UINT(F, P) \ > + PRED_##P (F, svuint8_t, uint8_t, u8) \ > + PRED_##P (F, svuint16_t, uint16_t, u16) \ > + PRED_##P (F, svuint32_t, uint32_t, u32) \ > + PRED_##P (F, svuint64_t, uint64_t, u64) > + > +#define HSD_INTEGER(F, P) \ > + H_INTEGER (F, P) \ > + S_INTEGER (F, P) \ > + D_INTEGER (F, P) > + > +#define ALL_INTEGER(F, P) \ > + Q_INTEGER (F, P) \ > + HSD_INTEGER (F, P) > + > +#define ALL_INTEGER_SCALAR(F, P) \ > + Q_INTEGER_SCALAR (F, P) \ > + H_INTEGER_SCALAR (F, P) \ > + S_INTEGER_SCALAR (F, P) \ > + D_INTEGER_SCALAR (F, P) > + > +#define ALL_INTEGER_SCALAR_WIDE(F, P) \ > + Q_INTEGER_SCALAR_WIDE (F, P) \ > + H_INTEGER_SCALAR_WIDE (F, P) \ > + S_INTEGER_SCALAR_WIDE (F, P) \ > + D_INTEGER_SCALAR (F, P) > + > +#define ALL_INTEGER_UINT(F, P) \ > + ALL_SIGNED_UINT (F, P) \ > + ALL_UNSIGNED_UINT (F, P) > + > +#define ALL_FLOAT(F, P) \ > + PRED_##P (F, svfloat16_t, float16_t, f16) \ > + PRED_##P (F, svfloat32_t, float32_t, f32) \ > + PRED_##P (F, svfloat64_t, float64_t, f64) > + > +#define ALL_FLOAT_SCALAR(F, P) \ > + PRED_##P (F, float16_t, float16_t, f16) \ > + PRED_##P (F, float32_t, float32_t, f32) \ > + PRED_##P (F, float64_t, float64_t, f64) > + > +#define ALL_FLOAT_INT(F, P) \ > + PRED_##P (F, svint16_t, float16_t, f16) \ > + PRED_##P (F, svint32_t, float32_t, f32) \ > + PRED_##P (F, svint64_t, float64_t, f64) > + > +#define ALL_FLOAT_UINT(F, P) \ > + PRED_##P (F, svuint16_t, float16_t, f16) \ > + PRED_##P (F, svuint32_t, float32_t, f32) \ > + PRED_##P (F, svuint64_t, float64_t, f64) > + > +#define ALL_FLOAT_AND_SIGNED(F, P) \ > + ALL_SIGNED (F, P) \ > + ALL_FLOAT (F, P) > + > +#define ALL_ARITH_SCALAR(F, P) \ > + ALL_INTEGER_SCALAR (F, P) \ > + ALL_FLOAT_SCALAR (F, P) > + > +#define ALL_ARITH_SCALAR_WIDE(F, P) \ > + ALL_INTEGER_SCALAR_WIDE (F, P) \ > + ALL_FLOAT_SCALAR (F, P) > + > +#define ALL_DATA(F, P) \ > + ALL_INTEGER (F, P) \ > + ALL_FLOAT (F, P) \ > + PRED_##P (F, svbfloat16_t, bfloat16_t, bf16) > + > +#define ALL_DATA_SCALAR(F, P) \ > + ALL_ARITH_SCALAR (F, P) \ > + PRED_##P (F, bfloat16_t, bfloat16_t, bf16) > + > +#define ALL_DATA_UINT(F, P) \ > + ALL_INTEGER_UINT (F, P) \ > + ALL_FLOAT_UINT (F, P) \ > + PRED_##P (F, svuint16_t, bfloat16_t, bf16) > + > +#define B(F, P) \ > + PRED_##P (F, svbool_t, bool_t, b) > + > +#define BN(F, P) \ > + PRED_##P (F, svbool_t, bool_t, b8) \ > + PRED_##P (F, svbool_t, bool_t, b16) \ > + PRED_##P (F, svbool_t, bool_t, b32) \ > + PRED_##P (F, svbool_t, bool_t, b64) > + > +#define BOOL(F, P) \ > + PRED_##P (F, bool, bool_t) > + > +#define ALL_PRED_UINT64(F, P) \ > + PRED_##P (F, uint64_t, bool_t, b8) \ > + PRED_##P (F, uint64_t, bool_t, b16) \ > + PRED_##P (F, uint64_t, bool_t, b32) \ > + PRED_##P (F, uint64_t, bool_t, b64) > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary.c > new file mode 100644 > index 00000000000..a8fd4c8e8f2 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary.c > @@ -0,0 +1,14 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include "../pfalse-binary_0.h" > + > +B (brkn, Zv) > +B (brkpa, Zv) > +B (brkpb, Zv) > +ALL_DATA (splice, IMPLICITv) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tpfalse\tp0\.b\n\tret\n} 3 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmov\tz0\.d, z1\.d\n\tret\n} 12 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 15 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_int_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_int_opt_n.c > new file mode 100644 > index 00000000000..08cd6a07c57 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_int_opt_n.c > @@ -0,0 +1,11 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include "../pfalse-binary_0.h" > + > +ALL_FLOAT_INT (scale, MXZ) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 6 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 12 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 18 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_n.c > new file mode 100644 > index 00000000000..f5c9cbf1ae6 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_n.c > @@ -0,0 +1,31 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include "../pfalse-binary_0.h" > + > +ALL_ARITH (abd, MXZ) > +ALL_ARITH (add, MXZ) > +ALL_INTEGER (and, MXZ) > +B (and, Zv) > +ALL_INTEGER (bic, MXZ) > +B (bic, Zv) > +ALL_FLOAT_AND_SD_INTEGER (div, MXZ) > +ALL_FLOAT_AND_SD_INTEGER (divr, MXZ) > +ALL_INTEGER (eor, MXZ) > +B (eor, Zv) > +ALL_ARITH (mul, MXZ) > +ALL_INTEGER (mulh, MXZ) > +ALL_FLOAT (mulx, MXZ) > +B (nand, Zv) > +B (nor, Zv) > +B (orn, Zv) > +ALL_INTEGER (orr, MXZ) > +B (orr, Zv) > +ALL_ARITH (sub, MXZ) > +ALL_ARITH (subr, MXZ) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 224 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 448 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tpfalse\tp0\.b\n\tret\n} 7 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 679 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_single_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_single_n.c > new file mode 100644 > index 00000000000..91ae3c853f8 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_single_n.c > @@ -0,0 +1,14 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include "../pfalse-binary_0.h" > + > +ALL_ARITH (max, MXZ) > +ALL_ARITH (min, MXZ) > +ALL_FLOAT (maxnm, MXZ) > +ALL_FLOAT (minnm, MXZ) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 56 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 112 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 168 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_rotate.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_rotate.c > new file mode 100644 > index 00000000000..12368ce39e6 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_rotate.c > @@ -0,0 +1,27 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include <arm_sve.h> > + > +#define MXZ4(F, TYPE) \ > + TYPE F##_f (TYPE op1, TYPE op2) \ > + { \ > + return sv##F (svpfalse_b (), op1, op2, 90); \ > + } > + > +#define PRED_MXZ(F, TYPE, TY) \ > + MXZ4 (F##_##TY##_m, TYPE) \ > + MXZ4 (F##_##TY##_x, TYPE) \ > + MXZ4 (F##_##TY##_z, TYPE) > + > +#define ALL_FLOAT(F, P) \ > + PRED_##P (F, svfloat16_t, f16) \ > + PRED_##P (F, svfloat32_t, f32) \ > + PRED_##P (F, svfloat64_t, f64) > + > +ALL_FLOAT (cadd, MXZ) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 3 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 6 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 9 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint64_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint64_opt_n.c > new file mode 100644 > index 00000000000..dd52a5807e8 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint64_opt_n.c > @@ -0,0 +1,13 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include "../pfalse-binary_0.h" > + > +BHS_SIGNED_UINT64 (asr_wide, MXZ) > +BHS_INTEGER_UINT64 (lsl_wide, MXZ) > +BHS_UNSIGNED_UINT64 (lsr_wide, MXZ) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 24 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 48 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 72 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint_opt_n.c > new file mode 100644 > index 00000000000..e55ddfb674f > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint_opt_n.c > @@ -0,0 +1,13 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include "../pfalse-binary_0.h" > + > +ALL_SIGNED_UINT (asr, MXZ) > +ALL_INTEGER_UINT (lsl, MXZ) > +ALL_UNSIGNED_UINT (lsr, MXZ) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 32 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 64 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 96 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binaryxn.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binaryxn.c > new file mode 100644 > index 00000000000..6796229fb31 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binaryxn.c > @@ -0,0 +1,31 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include <arm_sve.h> > + > +#define T(F, TY) \ > + sv##TY F##_f (sv##TY op1, sv##TY op2) \ > + { \ > + return sv##F (svpfalse_b (), op1, op2); \ > + } > + > +#define ALL_DATA(F) \ > + T (F##_bf16, bfloat16_t) \ > + T (F##_f16, float16_t) \ > + T (F##_f32, float32_t) \ > + T (F##_f64, float64_t) \ > + T (F##_s8, int8_t) \ > + T (F##_s16, int16_t) \ > + T (F##_s32, int32_t) \ > + T (F##_s64, int64_t) \ > + T (F##_u8, uint8_t) \ > + T (F##_u16, uint16_t) \ > + T (F##_u32, uint32_t) \ > + T (F##_u64, uint64_t) \ > + T (F##_b, bool_t) > + > +ALL_DATA (sel) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmov\t[zp]0\.[db], [zp]1\.[db]\n\tret\n} 13 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 13 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-clast.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-clast.c > new file mode 100644 > index 00000000000..7f2ec4acc26 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-clast.c > @@ -0,0 +1,11 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include "../pfalse-binary_0.h" > + > +ALL_DATA (clasta, IMPLICITv) > +ALL_DATA (clastb, IMPLICITv) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 24 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 24 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-compare_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-compare_opt_n.c > new file mode 100644 > index 00000000000..d18427bbf26 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-compare_opt_n.c > @@ -0,0 +1,20 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include "../pfalse-binary_0.h" > + > +ALL_FLOAT_BOOL (acge, IMPLICIT) > +ALL_FLOAT_BOOL (acgt, IMPLICIT) > +ALL_FLOAT_BOOL (acle, IMPLICIT) > +ALL_FLOAT_BOOL (aclt, IMPLICIT) > +ALL_ARITH_BOOL (cmpeq, IMPLICIT) > +ALL_ARITH_BOOL (cmpge, IMPLICIT) > +ALL_ARITH_BOOL (cmpgt, IMPLICIT) > +ALL_ARITH_BOOL (cmple, IMPLICIT) > +ALL_ARITH_BOOL (cmplt, IMPLICIT) > +ALL_ARITH_BOOL (cmpne, IMPLICIT) > +ALL_FLOAT_BOOL (cmpuo, IMPLICIT) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tpfalse\tp0\.b\n\tret\n} 162 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 162 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-compare_wide_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-compare_wide_opt_n.c > new file mode 100644 > index 00000000000..983ab5c160d > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-compare_wide_opt_n.c > @@ -0,0 +1,15 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include "../pfalse-binary_0.h" > + > +BHS_SIGNED_WIDE_BOOL (cmpeq_wide, IMPLICIT) > +BHS_INTEGER_WIDE_BOOL (cmpge_wide, IMPLICIT) > +BHS_INTEGER_WIDE_BOOL (cmpgt_wide, IMPLICIT) > +BHS_INTEGER_WIDE_BOOL (cmple_wide, IMPLICIT) > +BHS_INTEGER_WIDE_BOOL (cmplt_wide, IMPLICIT) > +BHS_SIGNED_WIDE_BOOL (cmpne_wide, IMPLICIT) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tpfalse\tp0\.b\n\tret\n} 60 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 60 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-count_pred.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-count_pred.c > new file mode 100644 > index 00000000000..de36b66ffc0 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-count_pred.c > @@ -0,0 +1,10 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include "../pfalse-unary_0.h" > + > +ALL_PRED_UINT64 (cntp, IMPLICIT) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmov\tx0, 0\n\tret\n} 4 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 4 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-fold_left.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-fold_left.c > new file mode 100644 > index 00000000000..333140d897a > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-fold_left.c > @@ -0,0 +1,10 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include "../pfalse-binary_0.h" > + > +ALL_FLOAT_SCALAR (adda, IMPLICITv) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 3 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 3 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load.c > new file mode 100644 > index 00000000000..93d66937364 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load.c > @@ -0,0 +1,32 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include <arm_sve.h> > + > +#define T(F, TY) \ > + sv##TY F##_f (const TY *base) \ > + { \ > + return sv##F (svpfalse_b (), base); \ > + } > + > +#define ALL_DATA(F) \ > + T (F##_bf16, bfloat16_t) \ > + T (F##_f16, float16_t) \ > + T (F##_f32, float32_t) \ > + T (F##_f64, float64_t) \ > + T (F##_s8, int8_t) \ > + T (F##_s16, int16_t) \ > + T (F##_s32, int32_t) \ > + T (F##_s64, int64_t) \ > + T (F##_u8, uint8_t) \ > + T (F##_u16, uint16_t) \ > + T (F##_u32, uint32_t) \ > + T (F##_u64, uint64_t) \ > + > +ALL_DATA (ldff1) > +ALL_DATA (ldnf1) > +ALL_DATA (ldnt1) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 36 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 36 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_ext.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_ext.c > new file mode 100644 > index 00000000000..c88686a2dd8 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_ext.c > @@ -0,0 +1,40 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include <arm_sve.h> > + > +#define T(F, RTY, TY) \ > + RTY F##_f (const TY *base) \ > + { \ > + return sv##F (svpfalse_b (), base); \ > + } > + > +#define D_INTEGER(F, TY) \ > + T (F##_s64, svint64_t, TY) \ > + T (F##_u64, svuint64_t, TY) > + > +#define SD_INTEGER(F, TY) \ > + D_INTEGER (F, TY) \ > + T (F##_s32, svint32_t, TY) \ > + T (F##_u32, svuint32_t, TY) > + > +#define HSD_INTEGER(F, TY) \ > + SD_INTEGER (F, TY) \ > + T (F##_s16, svint16_t, TY) \ > + T (F##_u16, svuint16_t, TY) > + > +#define TEST(F) \ > + HSD_INTEGER (F##sb, int8_t) \ > + SD_INTEGER (F##sh, int16_t) \ > + D_INTEGER (F##sw, int32_t) \ > + HSD_INTEGER (F##ub, uint8_t) \ > + SD_INTEGER (F##uh, uint16_t) \ > + D_INTEGER (F##uw, uint32_t) \ > + > +TEST (ld1) > +TEST (ldff1) > +TEST (ldnf1) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 72 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 72 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_ext_gather_index.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_ext_gather_index.c > new file mode 100644 > index 00000000000..5f4b562fca3 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_ext_gather_index.c > @@ -0,0 +1,51 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include <arm_sve.h> > + > +#define T1(F, RTY, TY1, TY2) \ > + RTY F##_f (const TY1 *base, TY2 indices) \ > + { \ > + return sv##F (svpfalse_b (), base, indices); \ > + } > + > +#define T2(F, RTY, TY1) \ > + RTY F##_f (TY1 bases, int64_t index) \ > + { \ > + return sv##F (svpfalse_b (), bases, index); \ > + } > + > +#define T3(F, B, RTY, TY, TYPE) \ > + T1 (F##_gather_s##B##index_##TY, RTY, TYPE, svint##B##_t) \ > + T1 (F##_gather_u##B##index_##TY, RTY, TYPE, svuint##B##_t) > + > +#define T4(F, B, RTY, TY) \ > + T2 (F##_gather_##TY##base_index_s##B, svint##B##_t, RTY) \ > + T2 (F##_gather_##TY##base_index_u##B, svuint##B##_t, RTY) > + > +#define TEST(F) \ > + T3 (F##sh, 32, svint32_t, s32, int16_t) \ > + T3 (F##sh, 32, svuint32_t, u32, int16_t) \ > + T3 (F##sh, 64, svint64_t, s64, int16_t) \ > + T3 (F##sh, 64, svuint64_t, u64, int16_t) \ > + T4 (F##sh, 32, svuint32_t, u32) \ > + T4 (F##sh, 64, svuint64_t, u64) \ > + T3 (F##sw, 64, svint64_t, s64, int32_t) \ > + T3 (F##sw, 64, svuint64_t, u64, int32_t) \ > + T4 (F##sw, 64, svuint64_t, u64) \ > + T3 (F##uh, 32, svint32_t, s32, uint16_t) \ > + T3 (F##uh, 32, svuint32_t, u32, uint16_t) \ > + T3 (F##uh, 64, svint64_t, s64, uint16_t) \ > + T3 (F##uh, 64, svuint64_t, u64, uint16_t) \ > + T4 (F##uh, 32, svuint32_t, u32) \ > + T4 (F##uh, 64, svuint64_t, u64) \ > + T3 (F##uw, 64, svint64_t, s64, uint32_t) \ > + T3 (F##uw, 64, svuint64_t, u64, uint32_t) \ > + T4 (F##uw, 64, svuint64_t, u64) \ > + > +TEST (ld1) > +TEST (ldff1) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 72 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 72 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_ext_gather_offset.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_ext_gather_offset.c > new file mode 100644 > index 00000000000..0fe8ab3ea61 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_ext_gather_offset.c > @@ -0,0 +1,71 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include <arm_sve.h> > + > +#define T1(F, RTY, TY1, TY2) \ > + RTY F##_f (const TY1 *base, TY2 offsets) \ > + { \ > + return sv##F (svpfalse_b (), base, offsets); \ > + } > + > +#define T2(F, RTY, TY1) \ > + RTY F##_f (TY1 bases, int64_t offset) \ > + { \ > + return sv##F (svpfalse_b (), bases, offset); \ > + } > + > +#define T5(F, RTY, TY1) \ > + RTY F##_f (TY1 bases) \ > + { \ > + return sv##F (svpfalse_b (), bases); \ > + } > + > +#define T3(F, B, RTY, TY, TYPE) \ > + T1 (F##_gather_s##B##offset_##TY, RTY, TYPE, svint##B##_t) \ > + T1 (F##_gather_u##B##offset_##TY, RTY, TYPE, svuint##B##_t) > + > +#define T4(F, B, RTY, TY) \ > + T2 (F##_gather_##TY##base_offset_s##B, svint##B##_t, RTY) \ > + T2 (F##_gather_##TY##base_offset_u##B, svuint##B##_t, RTY) \ > + T5 (F##_gather_##TY##base_s##B, svint##B##_t, RTY) \ > + T5 (F##_gather_##TY##base_u##B, svuint##B##_t, RTY) > + > +#define TEST(F) \ > + T3 (F##sb, 32, svint32_t, s32, int8_t) \ > + T3 (F##sb, 32, svuint32_t, u32, int8_t) \ > + T3 (F##sb, 64, svint64_t, s64, int8_t) \ > + T3 (F##sb, 64, svuint64_t, u64, int8_t) \ > + T4 (F##sb, 32, svuint32_t, u32) \ > + T4 (F##sb, 64, svuint64_t, u64) \ > + T3 (F##sh, 32, svint32_t, s32, int16_t) \ > + T3 (F##sh, 32, svuint32_t, u32, int16_t) \ > + T3 (F##sh, 64, svint64_t, s64, int16_t) \ > + T3 (F##sh, 64, svuint64_t, u64, int16_t) \ > + T4 (F##sh, 32, svuint32_t, u32) \ > + T4 (F##sh, 64, svuint64_t, u64) \ > + T3 (F##sw, 64, svint64_t, s64, int32_t) \ > + T3 (F##sw, 64, svuint64_t, u64, int32_t) \ > + T4 (F##sw, 64, svuint64_t, u64) \ > + T3 (F##ub, 32, svint32_t, s32, uint8_t) \ > + T3 (F##ub, 32, svuint32_t, u32, uint8_t) \ > + T3 (F##ub, 64, svint64_t, s64, uint8_t) \ > + T3 (F##ub, 64, svuint64_t, u64, uint8_t) \ > + T4 (F##ub, 32, svuint32_t, u32) \ > + T4 (F##ub, 64, svuint64_t, u64) \ > + T3 (F##uh, 32, svint32_t, s32, uint16_t) \ > + T3 (F##uh, 32, svuint32_t, u32, uint16_t) \ > + T3 (F##uh, 64, svint64_t, s64, uint16_t) \ > + T3 (F##uh, 64, svuint64_t, u64, uint16_t) \ > + T4 (F##uh, 32, svuint32_t, u32) \ > + T4 (F##uh, 64, svuint64_t, u64) \ > + T3 (F##uw, 64, svint64_t, s64, uint32_t) \ > + T3 (F##uw, 64, svuint64_t, u64, uint32_t) \ > + T4 (F##uw, 64, svuint64_t, u64) \ > + > +TEST (ld1) > +TEST (ldff1) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 160 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 160 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_gather_sv.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_gather_sv.c > new file mode 100644 > index 00000000000..758f00fe175 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_gather_sv.c > @@ -0,0 +1,32 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include <arm_sve.h> > + > +#define T1(F, RTY, TY1, TY2) \ > + RTY F##_f (TY1 *base, TY2 values) \ > + { \ > + return sv##F (svpfalse_b (), base, values); \ > + } > + > +#define T3(F, TY, B) \ > + T1 (F##_f##B, svfloat##B##_t, float##B##_t, TY) \ > + T1 (F##_s##B, svint##B##_t, int##B##_t, TY) \ > + T1 (F##_u##B, svuint##B##_t, uint##B##_t, TY) \ > + > +#define T2(F, B) \ > + T3 (F##_gather_u##B##offset, svuint##B##_t, B) \ > + T3 (F##_gather_u##B##index, svuint##B##_t, B) \ > + T3 (F##_gather_s##B##offset, svint##B##_t, B) \ > + T3 (F##_gather_s##B##index, svint##B##_t, B) > + > +#define SD_DATA(F) \ > + T2 (F, 32) \ > + T2 (F, 64) > + > +SD_DATA (ld1) > +SD_DATA (ldff1) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 48 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 48 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_gather_vs.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_gather_vs.c > new file mode 100644 > index 00000000000..f82471f3b1e > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_gather_vs.c > @@ -0,0 +1,37 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include <arm_sve.h> > + > +#define T1(F, RTY, TY) \ > + RTY F##_f (TY bases) \ > + { \ > + return sv##F (svpfalse_b (), bases); \ > + } > + > +#define T2(F, RTY, TY) \ > + RTY F##_f (TY bases, int64_t value) \ > + { \ > + return sv##F (svpfalse_b (), bases, value); \ > + } > + > +#define T4(F, TY, TEST, B) \ > + TEST (F##_f##B, svfloat##B##_t, TY) \ > + TEST (F##_s##B, svint##B##_t, TY) \ > + TEST (F##_u##B, svuint##B##_t, TY) \ > + > +#define T3(F, B) \ > + T4 (F##_gather_u##B##base, svuint##B##_t, T1, B) \ > + T4 (F##_gather_u##B##base_offset, svuint##B##_t, T2, B) \ > + T4 (F##_gather_u##B##base_index, svuint##B##_t, T2, B) > + > +#define SD_DATA(F) \ > + T3 (F, 32) \ > + T3 (F, 64) > + > +SD_DATA (ld1) > +SD_DATA (ldff1) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 36 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 36 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_replicate.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_replicate.c > new file mode 100644 > index 00000000000..ba500b6cab8 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-load_replicate.c > @@ -0,0 +1,31 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2 -march=armv8.2-a+sve+f64mm" } */ > + > +#include <arm_sve.h> > + > +#define T(F, TY) \ > + sv##TY F##_f (const TY *base) \ > + { \ > + return sv##F (svpfalse_b (), base); \ > + } > + > +#define ALL_DATA(F) \ > + T (F##_bf16, bfloat16_t) \ > + T (F##_f16, float16_t) \ > + T (F##_f32, float32_t) \ > + T (F##_f64, float64_t) \ > + T (F##_s8, int8_t) \ > + T (F##_s16, int16_t) \ > + T (F##_s32, int32_t) \ > + T (F##_s64, int64_t) \ > + T (F##_u8, uint8_t) \ > + T (F##_u16, uint16_t) \ > + T (F##_u32, uint32_t) \ > + T (F##_u64, uint64_t) \ > + > +ALL_DATA (ld1rq) > +ALL_DATA (ld1ro) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 24 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 24 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-prefetch.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-prefetch.c > new file mode 100644 > index 00000000000..71894c4b38a > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-prefetch.c > @@ -0,0 +1,22 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include <arm_sve.h> > + > +#define T(F) \ > + void F##_f (const void *base) \ > + { \ > + return sv##F (svpfalse_b (), base, 0); \ > + } > + > +#define ALL_PREFETCH \ > + T (prfb) \ > + T (prfh) \ > + T (prfw) \ > + T (prfd) > + > +ALL_PREFETCH > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 4 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 4 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-prefetch_gather_index.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-prefetch_gather_index.c > new file mode 100644 > index 00000000000..1b7cc42b969 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-prefetch_gather_index.c > @@ -0,0 +1,27 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include <arm_sve.h> > + > +#define T(F, TYPE, TY) \ > + void F##_##TY##_f (const void *base, sv##TYPE indices) \ > + { \ > + return sv##F##_##TY##index (svpfalse_b (), base, indices, 0);\ > + } > + > +#define T1(F) \ > + T (F, uint32_t, u32) \ > + T (F, uint64_t, u64) \ > + T (F, int32_t, s32) \ > + T (F, int64_t, s64) > + > +#define ALL_PREFETCH_GATHER_INDEX \ > + T1 (prfh_gather) \ > + T1 (prfw_gather) \ > + T1 (prfd_gather) > + > +ALL_PREFETCH_GATHER_INDEX > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 12 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 12 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-prefetch_gather_offset.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-prefetch_gather_offset.c > new file mode 100644 > index 00000000000..7f4ff2df523 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-prefetch_gather_offset.c > @@ -0,0 +1,25 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include <arm_sve.h> > + > +#define T(F, TYPE, TY) \ > + void F##_##TY##_f (const void *base, sv##TYPE offsets) \ > + { \ > + return sv##F##_##TY##offset (svpfalse_b (), base, offsets, 0); \ > + } > + > +#define T1(F) \ > + T (F, uint32_t, u32) \ > + T (F, uint64_t, u64) \ > + T (F, int32_t, s32) \ > + T (F, int64_t, s64) > + > +#define ALL_PREFETCH_GATHER_OFFSET \ > + T1 (prfb_gather) \ > + > +ALL_PREFETCH_GATHER_OFFSET > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 4 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 4 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-ptest.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-ptest.c > new file mode 100644 > index 00000000000..0a587fca869 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-ptest.c > @@ -0,0 +1,12 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include "../pfalse-unary_0.h" > + > +BOOL (ptest_any, IMPLICITn) > +BOOL (ptest_first, IMPLICITn) > +BOOL (ptest_last, IMPLICITn) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmov\tw0, 0\n\tret\n} 3 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 3 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-rdffr.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-rdffr.c > new file mode 100644 > index 00000000000..d795f8ec3f8 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-rdffr.c > @@ -0,0 +1,13 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include <arm_sve.h> > + > +svbool_t rdffr_f () > +{ > + return svrdffr_z (svpfalse_b ()); > +} > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tpfalse\tp0\.b\n\tret\n} 1 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 1 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-reduction.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-reduction.c > new file mode 100644 > index 00000000000..42b37aef0b5 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-reduction.c > @@ -0,0 +1,23 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > + > +#include "../pfalse-unary_0.h" > + > +ALL_INTEGER_SCALAR (andv, IMPLICIT) > +ALL_INTEGER_SCALAR (eorv, IMPLICIT) > +ALL_ARITH_SCALAR (maxv, IMPLICIT) > +ALL_ARITH_SCALAR (minv, IMPLICIT) > +ALL_INTEGER_SCALAR (orv, IMPLICIT) > +ALL_FLOAT_SCALAR (maxnmv, IMPLICIT) > +ALL_FLOAT_SCALAR (minnmv, IMPLICIT) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmov\t[wx]0, 0\n\tret\n} 20 } } */ > +/* { dg-final { scan-tree-dump-times "return Nan" 6 "optimized" } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmov\t[wx]0, -1\n\tret\n} 12 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmov\t[wxz]0(?:\.[sd])?, #?-?[1-9]+[0-9]*\n\tret\n} 27 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\t(movi|mvni)\tv0\.(2s|4h), 0x[0-9a-f]+, [a-z]+ [0-9]+\n\tret\n} 5 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 52 } } */ > + > +/* The sum of tested cases is 52 + 12, because mov [wx]0, -1 is tested in two > + patterns. */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-reduction_wide.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-reduction_wide.c > new file mode 100644 > index 00000000000..bd9a9807611 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-reduction_wide.c > @@ -0,0 +1,10 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include "../pfalse-unary_0.h" > + > +ALL_ARITH_SCALAR_WIDE (addv, IMPLICIT) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[dwvx]0(?:\.(2s|4h))?, #?0\n\tret\n} 11 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 11 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-shift_right_imm.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-shift_right_imm.c > new file mode 100644 > index 00000000000..62a07557e82 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-shift_right_imm.c > @@ -0,0 +1,28 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include <arm_sve.h> > + > +#define MXZ(F, TY) \ > + TY F##_f (TY op1) \ > + { \ > + return sv##F (svpfalse_b (), op1, 2); \ > + } > + > +#define PRED_MXZn(F, TYPE, TY) \ > + MXZ (F##_n_##TY##_m, TYPE) \ > + MXZ (F##_n_##TY##_x, TYPE) \ > + MXZ (F##_n_##TY##_z, TYPE) > + > +#define ALL_SIGNED_IMM(F, P) \ > + PRED_##P (F, svint8_t, s8) \ > + PRED_##P (F, svint16_t, s16) \ > + PRED_##P (F, svint32_t, s32) \ > + PRED_##P (F, svint64_t, s64) > + > +ALL_SIGNED_IMM (asrd, MXZn) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 4 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 8 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 12 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-store.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-store.c > new file mode 100644 > index 00000000000..751e60e2ba2 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-store.c > @@ -0,0 +1,53 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include <arm_sve.h> > + > +#define T(F, TY1, TY2) \ > + void F##_f (TY1 *base, sv##TY2 data) \ > + { \ > + return sv##F (svpfalse_b (), base, data); \ > + } > + > +#define D_INTEGER(F, TY) \ > + T (F##_s64, TY, int64_t) \ > + T (F##_u64, TY, uint64_t) > + > +#define SD_INTEGER(F, TY) \ > + D_INTEGER (F, TY) \ > + T (F##_s32, TY, int32_t) \ > + T (F##_u32, TY, uint32_t) > + > +#define HSD_INTEGER(F, TY) \ > + SD_INTEGER (F, TY) \ > + T (F##_s16, TY, int16_t) \ > + T (F##_u16, TY, uint16_t) > + > +#define ALL_DATA(F, A) \ > + T (F##_bf16, bfloat16_t, bfloat16##A) \ > + T (F##_f16, float16_t, float16##A) \ > + T (F##_f32, float32_t, float32##A) \ > + T (F##_f64, float64_t, float64##A) \ > + T (F##_s8, int8_t, int8##A) \ > + T (F##_s16, int16_t, int16##A) \ > + T (F##_s32, int32_t, int32##A) \ > + T (F##_s64, int64_t, int64##A) \ > + T (F##_u8, uint8_t, uint8##A) \ > + T (F##_u16, uint16_t, uint16##A) \ > + T (F##_u32, uint32_t, uint32##A) \ > + T (F##_u64, uint64_t, uint64##A) \ > + > +HSD_INTEGER (st1b, int8_t) > +SD_INTEGER (st1h, int16_t) > +D_INTEGER (st1w, int32_t) > +ALL_DATA (st1, _t) > +ALL_DATA (st2, x2_t) > +ALL_DATA (st3, x3_t) > +ALL_DATA (st4, x4_t) > + > +/* FIXME: Currently, st1/2/3/4 are not folded with a pfalse > + predicate, which is the reason for the 48 missing cases below. Once > + folding is implemented for these intrinsics, the sum should be 60. */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 12 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 60 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-store_scatter_index.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-store_scatter_index.c > new file mode 100644 > index 00000000000..44792d30b42 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-store_scatter_index.c > @@ -0,0 +1,40 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include <arm_sve.h> > + > +#define T1(F, TY1, TY2, TY3) \ > + void F##_f (TY1 *base, TY2 indices, TY3 data) \ > + { \ > + sv##F (svpfalse_b (), base, indices, data); \ > + } > + > +#define T2(F, TY1, TY3) \ > + void F##_f (TY1 bases, int64_t index, TY3 data) \ > + { \ > + sv##F (svpfalse_b (), bases, index, data); \ > + } > + > +#define T3(F, B, TYPE1, TY, TYPE2) \ > + T1 (F##_scatter_s##B##index_##TY, TYPE2, svint##B##_t, TYPE1) \ > + T1 (F##_scatter_u##B##index_##TY, TYPE2, svuint##B##_t, TYPE1) > + > +#define T4(F, B, TYPE1, TY) \ > + T2 (F##_scatter_u##B##base_index_##TY, svuint##B##_t, TYPE1) > + > +#define TEST(F) \ > + T3 (F##h, 32, svint32_t, s32, int16_t) \ > + T3 (F##h, 32, svuint32_t, u32, int16_t) \ > + T3 (F##h, 64, svint64_t, s64, int16_t) \ > + T3 (F##h, 64, svuint64_t, u64, int16_t) \ > + T4 (F##h, 32, svuint32_t, u32) \ > + T4 (F##h, 64, svuint64_t, u64) \ > + T3 (F##w, 64, svint64_t, s64, int32_t) \ > + T3 (F##w, 64, svuint64_t, u64, int32_t) \ > + T4 (F##w, 64, svuint64_t, u64) \ > + > +TEST (st1) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 15 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 15 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-store_scatter_offset.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-store_scatter_offset.c > new file mode 100644 > index 00000000000..f3820e065f9 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-store_scatter_offset.c > @@ -0,0 +1,68 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include <arm_sve.h> > + > +#define T1(F, TY1, TY2, TY3) \ > + void F##_f (TY1 *base, TY2 offsets, TY3 data) \ > + { \ > + sv##F (svpfalse_b (), base, offsets, data); \ > + } > + > +#define T2(F, TY1, TY3) \ > + void F##_f (TY1 bases, int64_t offset, TY3 data) \ > + { \ > + sv##F (svpfalse_b (), bases, offset, data); \ > + } > + > +#define T5(F, TY1, TY3) \ > + void F##_f (TY1 bases, TY3 data) \ > + { \ > + sv##F (svpfalse_b (), bases, data); \ > + } > + > + > +#define T3(F, B, TYPE1, TY, TYPE2) \ > + T1 (F##_scatter_s##B##offset_##TY, TYPE2, svint##B##_t, TYPE1)\ > + T1 (F##_scatter_u##B##offset_##TY, TYPE2, svuint##B##_t, TYPE1) > + > +#define T4(F, B, TYPE1, TY) \ > + T2 (F##_scatter_u##B##base_offset_##TY, svuint##B##_t, TYPE1) \ > + T5 (F##_scatter_u##B##base_##TY, svuint##B##_t, TYPE1) > + > +#define D_INTEGER(F, BHW) \ > + T3 (F, 64, svint64_t, s64, int##BHW##_t) \ > + T3 (F, 64, svuint64_t, u64, int##BHW##_t) \ > + T4 (F, 64, svint64_t, s64) \ > + T4 (F, 64, svuint64_t, u64) > + > +#define SD_INTEGER(F, BHW) \ > + D_INTEGER (F, BHW) \ > + T3 (F, 32, svint32_t, s32, int##BHW##_t) \ > + T3 (F, 32, svuint32_t, u32, int##BHW##_t) \ > + T4 (F, 32, svint32_t, s32) \ > + T4 (F, 32, svuint32_t, u32) > + > +#define SD_DATA(F) \ > + T3 (F, 32, svint32_t, s32, int32_t) \ > + T3 (F, 64, svint64_t, s64, int64_t) \ > + T4 (F, 32, svint32_t, s32) \ > + T4 (F, 64, svint64_t, s64) \ > + T3 (F, 32, svuint32_t, u32, uint32_t) \ > + T3 (F, 64, svuint64_t, u64, uint64_t) \ > + T4 (F, 32, svuint32_t, u32) \ > + T4 (F, 64, svuint64_t, u64) \ > + T3 (F, 32, svfloat32_t, f32, float32_t) \ > + T3 (F, 64, svfloat64_t, f64, float64_t) \ > + T4 (F, 32, svfloat32_t, f32) \ > + T4 (F, 64, svfloat64_t, f64) > + > + > +SD_DATA (st1) > +SD_INTEGER (st1b, 8) > +SD_INTEGER (st1h, 16) > +D_INTEGER (st1w, 32) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 64 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 64 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-storexn.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-storexn.c > new file mode 100644 > index 00000000000..e49266dab68 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-storexn.c > @@ -0,0 +1,30 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include <arm_sve.h> > + > +#define T(F, TY) \ > + void F##_f (TY *base, sv##TY data) \ > + { \ > + return sv##F (svpfalse_b (), base, data); \ > + } > + > +#define ALL_DATA(F) \ > + T (F##_bf16, bfloat16_t) \ > + T (F##_f16, float16_t) \ > + T (F##_f32, float32_t) \ > + T (F##_f64, float64_t) \ > + T (F##_s8, int8_t) \ > + T (F##_s16, int16_t) \ > + T (F##_s32, int32_t) \ > + T (F##_s64, int64_t) \ > + T (F##_u8, uint8_t) \ > + T (F##_u16, uint16_t) \ > + T (F##_u32, uint32_t) \ > + T (F##_u64, uint64_t) > + > +ALL_DATA (stnt1) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 12 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 12 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-ternary_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-ternary_opt_n.c > new file mode 100644 > index 00000000000..acdd1417af4 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-ternary_opt_n.c > @@ -0,0 +1,51 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include <arm_sve.h> > + > +#define MXZ(F, TY12, TY3) \ > + TY12 F##_f (TY12 op1, TY12 op2, TY3 op3) \ > + { \ > + return sv##F (svpfalse_b (), op1, op2, op3); \ > + } > + > +#define PRED_MXZ(F, TYPE, TY) \ > + MXZ (F##_##TY##_m, sv##TYPE, sv##TYPE) \ > + MXZ (F##_n_##TY##_m, sv##TYPE, TYPE) \ > + MXZ (F##_##TY##_x, sv##TYPE, sv##TYPE) \ > + MXZ (F##_n_##TY##_x, sv##TYPE, TYPE) \ > + MXZ (F##_##TY##_z, sv##TYPE, sv##TYPE) \ > + MXZ (F##_n_##TY##_z, sv##TYPE, TYPE) > + > +#define ALL_FLOAT(F, P) \ > + PRED_##P (F, float16_t, f16) \ > + PRED_##P (F, float32_t, f32) \ > + PRED_##P (F, float64_t, f64) > + > +#define ALL_INTEGER(F, P) \ > + PRED_##P (F, uint8_t, u8) \ > + PRED_##P (F, uint16_t, u16) \ > + PRED_##P (F, uint32_t, u32) \ > + PRED_##P (F, uint64_t, u64) \ > + PRED_##P (F, int8_t, s8) \ > + PRED_##P (F, int16_t, s16) \ > + PRED_##P (F, int32_t, s32) \ > + PRED_##P (F, int64_t, s64) \ > + > +#define ALL_ARITH(F, P) \ > + ALL_INTEGER (F, P) \ > + ALL_FLOAT (F, P) > + > +ALL_ARITH (mad, MXZ) > +ALL_ARITH (mla, MXZ) > +ALL_ARITH (mls, MXZ) > +ALL_ARITH (msb, MXZ) > +ALL_FLOAT (nmad, MXZ) > +ALL_FLOAT (nmla, MXZ) > +ALL_FLOAT (nmls, MXZ) > +ALL_FLOAT (nmsb, MXZ) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 112 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 224 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 336 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-ternary_rotate.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-ternary_rotate.c > new file mode 100644 > index 00000000000..7698045d27c > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-ternary_rotate.c > @@ -0,0 +1,27 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include <arm_sve.h> > + > +#define MXZ(F, TY) \ > + TY F##_f (TY op1, TY op2, TY op3) \ > + { \ > + return sv##F (svpfalse_b (), op1, op2, op3, 90); \ > + } > + > +#define PRED_MXZ(F, TYPE, TY) \ > + MXZ (F##_##TY##_m, sv##TYPE) \ > + MXZ (F##_##TY##_x, sv##TYPE) \ > + MXZ (F##_##TY##_z, sv##TYPE) > + > +#define ALL_FLOAT(F, P) \ > + PRED_##P (F, float16_t, f16) \ > + PRED_##P (F, float32_t, f32) \ > + PRED_##P (F, float64_t, f64) > + > +ALL_FLOAT (cmla, MXZ) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 3 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 6 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 9 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary.c > new file mode 100644 > index 00000000000..037376b3a4a > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary.c > @@ -0,0 +1,33 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include "../pfalse-unary_0.h" > + > +ALL_FLOAT_AND_SIGNED (abs, MXZ) > +B (brka, MZ) > +B (brkb, MZ) > +ALL_INTEGER (cnot, MXZ) > +HSD_INTEGER (extb, MXZ) > +SD_INTEGER (exth, MXZ) > +D_INTEGER (extw, MXZ) > +B (mov, Z) > +ALL_FLOAT_AND_SIGNED (neg, MXZ) > +ALL_INTEGER (not, MXZ) > +B (not, Z) > +B (pfirst, IMPLICIT) > +ALL_INTEGER (rbit, MXZ) > +ALL_FLOAT (recpx, MXZ) > +HSD_INTEGER (revb, MXZ) > +SD_INTEGER (revh, MXZ) > +D_INTEGER (revw, MXZ) > +ALL_FLOAT (rinti, MXZ) > +ALL_FLOAT (rintx, MXZ) > +ALL_FLOAT (rintz, MXZ) > +ALL_FLOAT (sqrt, MXZ) > +SD_DATA (compact, IMPLICIT) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 80 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 160 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tpfalse\tp0\.b\n\tret\n} 4 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 244 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_convert_narrowt.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_convert_narrowt.c > new file mode 100644 > index 00000000000..1287a70a7ed > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_convert_narrowt.c > @@ -0,0 +1,21 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2 -march=armv8.2-a+sve+bf16" } */ > + > +#include <arm_sve.h> > + > +#define T(F, TYPE1, TY1, TYPE2, TY2) \ > + TYPE1 F##_##TY1##_##TY2##_x_f (TYPE1 even, TYPE2 op) \ > + { \ > + return sv##F##_##TY1##_##TY2##_x (even, svpfalse_b (), op); \ > + } \ > + TYPE1 F##_##TY1##_##TY2##_m_f (TYPE1 even, TYPE2 op) \ > + { \ > + return sv##F##_##TY1##_##TY2##_m (even, svpfalse_b (), op); \ > + } > + > +T (cvtnt, svbfloat16_t, bf16, svfloat32_t, f32) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 1 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?: [0-9]*[bhsd])?, #?0\n\tret\n} 1 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 2 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_convertxn.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_convertxn.c > new file mode 100644 > index 00000000000..f5192666f12 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_convertxn.c > @@ -0,0 +1,50 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2 -march=armv8.2-a+sve+bf16" } */ > + > +#include <arm_sve.h> > + > +#define T(TYPE1, TY1, TYPE2, TY2) \ > + TYPE1 cvt_##TY1##_##TY2##_x_f (TYPE2 op) \ > + { \ > + return svcvt_##TY1##_##TY2##_x (svpfalse_b (), op); \ > + } \ > + TYPE1 cvt_##TY1##_##TY2##_z_f (TYPE2 op) \ > + { \ > + return svcvt_##TY1##_##TY2##_z (svpfalse_b (), op); \ > + } \ > + TYPE1 cvt_##TY1##_##TY2##_m_f (TYPE1 inactive, TYPE2 op) \ > + { \ > + return svcvt_##TY1##_##TY2##_m (inactive, svpfalse_b (), op); \ > + } > + > +#define SWAP(TYPE1, TY1, TYPE2, TY2) \ > + T (TYPE1, TY1, TYPE2, TY2) \ > + T (TYPE2, TY2, TYPE1, TY1) > + > +#define TEST_ALL \ > + T (svbfloat16_t, bf16, svfloat32_t, f32) \ > + SWAP (svfloat16_t, f16, svfloat32_t, f32) \ > + SWAP (svfloat16_t, f16, svfloat64_t, f64) \ > + SWAP (svfloat32_t, f32, svfloat64_t, f64) \ > + SWAP (svint16_t, s16, svfloat16_t, f16) \ > + SWAP (svint32_t, s32, svfloat16_t, f16) \ > + SWAP (svint32_t, s32, svfloat32_t, f32) \ > + SWAP (svint32_t, s32, svfloat64_t, f64) \ > + SWAP (svint64_t, s64, svfloat16_t, f16) \ > + SWAP (svint64_t, s64, svfloat32_t, f32) \ > + SWAP (svint64_t, s64, svfloat64_t, f64) \ > + SWAP (svuint16_t, u16, svfloat16_t, f16) \ > + SWAP (svuint32_t, u32, svfloat16_t, f16) \ > + SWAP (svuint32_t, u32, svfloat32_t, f32) \ > + SWAP (svuint32_t, u32, svfloat64_t, f64) \ > + SWAP (svuint64_t, u64, svfloat16_t, f16) \ > + SWAP (svuint64_t, u64, svfloat32_t, f32) \ > + SWAP (svuint64_t, u64, svfloat64_t, f64) > + > +TEST_ALL > + > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 35 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 70 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 105 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_n.c > new file mode 100644 > index 00000000000..fabde3e52a5 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_n.c > @@ -0,0 +1,43 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include <arm_sve.h> > + > +#define M(F, TY) \ > + sv##TY F##_f (sv##TY inactive, TY op) \ > + { \ > + return sv##F (inactive, svpfalse_b (), op); \ > + } > + > +#define XZ(F, TY) \ > + sv##TY F##_f (TY op) \ > + { \ > + return sv##F (svpfalse_b (), op); \ > + } > + > +#define PRED_MXZ(F, TYPE, TY) \ > + M (F##_##TY##_m, TYPE) \ > + XZ (F##_##TY##_x, TYPE) \ > + XZ (F##_##TY##_z, TYPE) > + > +#define ALL_DATA(F, P) \ > + PRED_##P (F, uint8_t, u8) \ > + PRED_##P (F, uint16_t, u16) \ > + PRED_##P (F, uint32_t, u32) \ > + PRED_##P (F, uint64_t, u64) \ > + PRED_##P (F, int8_t, s8) \ > + PRED_##P (F, int16_t, s16) \ > + PRED_##P (F, int32_t, s32) \ > + PRED_##P (F, int64_t, s64) \ > + PRED_##P (F, float16_t, f16) \ > + PRED_##P (F, float32_t, f32) \ > + PRED_##P (F, float64_t, f64) \ > + PRED_##P (F, bfloat16_t, bf16) \ > + > +ALL_DATA (dup, MXZ) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 12 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 12 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmov\tz0(?:\.[bhsd])?, [wxhsd]0\n\tret\n} 12 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 36 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_pred.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_pred.c > new file mode 100644 > index 00000000000..46c9592c47d > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_pred.c > @@ -0,0 +1,10 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include "../pfalse-unary_0.h" > + > +BN (pnext, IMPLICIT) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tpfalse\tp0\.b\n\tret\n} 4 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 4 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_to_uint.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_to_uint.c > new file mode 100644 > index 00000000000..b820bde817f > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unary_to_uint.c > @@ -0,0 +1,13 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include "../pfalse-unary_0.h" > + > +ALL_SIGNED_UINT (cls, MXZ) > +ALL_INTEGER_UINT (clz, MXZ) > +ALL_DATA_UINT (cnt, MXZ) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 24 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 48 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 72 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unaryxn.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unaryxn.c > new file mode 100644 > index 00000000000..1e99b7f2f8b > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-unaryxn.c > @@ -0,0 +1,14 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include "../pfalse-unary_0.h" > + > +ALL_FLOAT (rinta, MXZ) > +ALL_FLOAT (rintm, MXZ) > +ALL_FLOAT (rintn, MXZ) > +ALL_FLOAT (rintp, MXZ) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 12} } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 24 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 36 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary.c > new file mode 100644 > index 00000000000..94470a5d617 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary.c > @@ -0,0 +1,15 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include "../pfalse-binary_0.h" > + > +ALL_ARITH (addp, MXv) > +ALL_ARITH (maxp, MXv) > +ALL_FLOAT (maxnmp, MXv) > +ALL_ARITH (minp, MXv) > +ALL_FLOAT (minnmp, MXv) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 39 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 39 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 78 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_int_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_int_opt_n.c > new file mode 100644 > index 00000000000..b8747b8d50a > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_int_opt_n.c > @@ -0,0 +1,13 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include "../pfalse-binary_0.h" > + > +ALL_INTEGER_INT (qshl, MXZ) > +ALL_INTEGER_INT (qrshl, MXZ) > +ALL_UNSIGNED_INT (sqadd, MXZ) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 40 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 80 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 120 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_int_opt_single_n.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_int_opt_single_n.c > new file mode 100644 > index 00000000000..7cb7ee5203c > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_int_opt_single_n.c > @@ -0,0 +1,11 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include "../pfalse-binary_0.h" > + > +ALL_INTEGER_INT (rshl, MXZ) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 16 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 32 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 48 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_opt_n.c > new file mode 100644 > index 00000000000..787126f70bd > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_opt_n.c > @@ -0,0 +1,17 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include "../pfalse-binary_0.h" > + > +ALL_INTEGER (hadd, MXZ) > +ALL_INTEGER (hsub, MXZ) > +ALL_INTEGER (hsubr, MXZ) > +ALL_INTEGER (qadd, MXZ) > +ALL_INTEGER (qsub, MXZ) > +ALL_INTEGER (qsubr, MXZ) > +ALL_INTEGER (rhadd, MXZ) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 112 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 224 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 336 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_opt_single_n.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_opt_single_n.c > new file mode 100644 > index 00000000000..6b2b0a424d3 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_opt_single_n.c > @@ -0,0 +1,12 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2 -march=armv8.2-a+sve2+faminmax" } */ > + > +#include "../pfalse-binary_0.h" > + > +ALL_FLOAT (amax, MXZ) > +ALL_FLOAT (amin, MXZ) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 12 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 24 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 36 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_to_uint.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_to_uint.c > new file mode 100644 > index 00000000000..a0a7f809f82 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_to_uint.c > @@ -0,0 +1,10 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include "../pfalse-binary_0.h" > + > +SD_INTEGER_TO_UINT (histcnt, Zv) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 4 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 4 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_uint_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_uint_opt_n.c > new file mode 100644 > index 00000000000..c13db48948b > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_uint_opt_n.c > @@ -0,0 +1,11 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include "../pfalse-binary_0.h" > + > +ALL_SIGNED_UINT (uqadd, MXZ) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 8 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 16 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 24 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_wide.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_wide.c > new file mode 100644 > index 00000000000..145b07760d7 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_wide.c > @@ -0,0 +1,11 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include "../pfalse-binary_0.h" > + > +HSD_INTEGER_WIDE (adalp, MXZv) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 6 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 12 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 18 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-compare.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-compare.c > new file mode 100644 > index 00000000000..da175db92e4 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-compare.c > @@ -0,0 +1,11 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include "../pfalse-binary_0.h" > + > +BH_INTEGER_BOOL (match, IMPLICITv) > +BH_INTEGER_BOOL (nmatch, IMPLICITv) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tpfalse\tp0\.b\n\tret\n} 8 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 8 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_ext_gather_index_restricted.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_ext_gather_index_restricted.c > new file mode 100644 > index 00000000000..c0476ce9c74 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_ext_gather_index_restricted.c > @@ -0,0 +1,46 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include <arm_sve.h> > + > +#define T1(F, RTY, TY1, TY2) \ > + RTY F##_f (const TY1 *base, TY2 indices) \ > + { \ > + return sv##F (svpfalse_b (), base, indices); \ > + } > + > +#define T2(F, RTY, TY1) \ > + RTY F##_f (TY1 bases, int64_t index) \ > + { \ > + return sv##F (svpfalse_b (), bases, index); \ > + } > + > +#define T3(F, B, RTY, TY, TYPE) \ > + T1 (F##_gather_s##B##index_##TY, RTY, TYPE, svint##B##_t) \ > + T1 (F##_gather_u##B##index_##TY, RTY, TYPE, svuint##B##_t) > + > +#define T4(F, B, RTY, TY) \ > + T2 (F##_gather_##TY##base_index_s##B, svint##B##_t, RTY) \ > + T2 (F##_gather_##TY##base_index_u##B, svuint##B##_t, RTY) > + > +#define TEST(F) \ > + T3 (F##sh, 64, svint64_t, s64, int16_t) \ > + T3 (F##sh, 64, svuint64_t, u64, int16_t) \ > + T4 (F##sh, 32, svuint32_t, u32) \ > + T4 (F##sh, 64, svuint64_t, u64) \ > + T3 (F##sw, 64, svint64_t, s64, int32_t) \ > + T3 (F##sw, 64, svuint64_t, u64, int32_t) \ > + T4 (F##sw, 64, svuint64_t, u64) \ > + T3 (F##uh, 64, svint64_t, s64, uint16_t) \ > + T3 (F##uh, 64, svuint64_t, u64, uint16_t) \ > + T4 (F##uh, 32, svuint32_t, u32) \ > + T4 (F##uh, 64, svuint64_t, u64) \ > + T3 (F##uw, 64, svint64_t, s64, uint32_t) \ > + T3 (F##uw, 64, svuint64_t, u64, uint32_t) \ > + T4 (F##uw, 64, svuint64_t, u64) \ > + > +TEST (ldnt1) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 28 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 28 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_ext_gather_offset_restricted.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_ext_gather_offset_restricted.c > new file mode 100644 > index 00000000000..f6440246683 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_ext_gather_offset_restricted.c > @@ -0,0 +1,65 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include <arm_sve.h> > + > +#define T1(F, RTY, TY1, TY2) \ > + RTY F##_f (const TY1 *base, TY2 offsets) \ > + { \ > + return sv##F (svpfalse_b (), base, offsets); \ > + } > + > +#define T2(F, RTY, TY1) \ > + RTY F##_f (TY1 bases, int64_t offset) \ > + { \ > + return sv##F (svpfalse_b (), bases, offset); \ > + } > + > +#define T5(F, RTY, TY1) \ > + RTY F##_f (TY1 bases) \ > + { \ > + return sv##F (svpfalse_b (), bases); \ > + } > + > +#define T3(F, B, RTY, TY, TYPE) \ > + T1 (F##_gather_u##B##offset_##TY, RTY, TYPE, svuint##B##_t) > + > +#define T4(F, B, RTY, TY) \ > + T2 (F##_gather_##TY##base_offset_s##B, svint##B##_t, RTY) \ > + T2 (F##_gather_##TY##base_offset_u##B, svuint##B##_t, RTY) \ > + T5 (F##_gather_##TY##base_s##B, svint##B##_t, RTY) \ > + T5 (F##_gather_##TY##base_u##B, svuint##B##_t, RTY) > + > +#define TEST(F) \ > + T3 (F##sb, 32, svuint32_t, u32, int8_t) \ > + T3 (F##sb, 64, svint64_t, s64, int8_t) \ > + T3 (F##sb, 64, svuint64_t, u64, int8_t) \ > + T4 (F##sb, 32, svuint32_t, u32) \ > + T4 (F##sb, 64, svuint64_t, u64) \ > + T3 (F##sh, 32, svuint32_t, u32, int16_t) \ > + T3 (F##sh, 64, svint64_t, s64, int16_t) \ > + T3 (F##sh, 64, svuint64_t, u64, int16_t) \ > + T4 (F##sh, 32, svuint32_t, u32) \ > + T4 (F##sh, 64, svuint64_t, u64) \ > + T3 (F##sw, 64, svint64_t, s64, int32_t) \ > + T3 (F##sw, 64, svuint64_t, u64, int32_t) \ > + T4 (F##sw, 64, svuint64_t, u64) \ > + T3 (F##ub, 32, svuint32_t, u32, uint8_t) \ > + T3 (F##ub, 64, svint64_t, s64, uint8_t) \ > + T3 (F##ub, 64, svuint64_t, u64, uint8_t) \ > + T4 (F##ub, 32, svuint32_t, u32) \ > + T4 (F##ub, 64, svuint64_t, u64) \ > + T3 (F##uh, 32, svuint32_t, u32, uint16_t) \ > + T3 (F##uh, 64, svint64_t, s64, uint16_t) \ > + T3 (F##uh, 64, svuint64_t, u64, uint16_t) \ > + T4 (F##uh, 32, svuint32_t, u32) \ > + T4 (F##uh, 64, svuint64_t, u64) \ > + T3 (F##uw, 64, svint64_t, s64, uint32_t) \ > + T3 (F##uw, 64, svuint64_t, u64, uint32_t) \ > + T4 (F##uw, 64, svuint64_t, u64) \ > + > +TEST (ldnt1) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 56 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 56 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_gather_sv_restricted.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_gather_sv_restricted.c > new file mode 100644 > index 00000000000..a48a8a9db51 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_gather_sv_restricted.c > @@ -0,0 +1,28 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include <arm_sve.h> > + > +#define T2(F, RTY, TY1, TY2) \ > + RTY F##_f (TY1 *base, TY2 values) \ > + { \ > + return sv##F (svpfalse_b (), base, values); \ > + } > + > +#define T4(F, TY, B) \ > + T2 (F##_f##B, svfloat##B##_t, float##B##_t, TY) \ > + T2 (F##_s##B, svint##B##_t, int##B##_t, TY) \ > + T2 (F##_u##B, svuint##B##_t, uint##B##_t, TY) \ > + > +#define SD_DATA(F) \ > + T4 (F##_gather_s64index, svint64_t, 64) \ > + T4 (F##_gather_u64index, svuint64_t, 64) \ > + T4 (F##_gather_u32offset, svuint32_t, 32) \ > + T4 (F##_gather_u64offset, svuint64_t, 64) \ > + T4 (F##_gather_s64offset, svint64_t, 64) \ > + > +SD_DATA (ldnt1) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 15 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 15 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_gather_vs.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_gather_vs.c > new file mode 100644 > index 00000000000..1fc08a3c53b > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-load_gather_vs.c > @@ -0,0 +1,36 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include <arm_sve.h> > + > +#define T1(F, RTY, TY) \ > + RTY F##_f (TY bases) \ > + { \ > + return sv##F (svpfalse_b (), bases); \ > + } > + > +#define T2(F, RTY, TY) \ > + RTY F##_f (TY bases, int64_t value) \ > + { \ > + return sv##F (svpfalse_b (), bases, value); \ > + } > + > +#define T4(F, TY, TEST, B) \ > + TEST (F##_f##B, svfloat##B##_t, TY) \ > + TEST (F##_s##B, svint##B##_t, TY) \ > + TEST (F##_u##B, svuint##B##_t, TY) \ > + > +#define T3(F, B) \ > + T4 (F##_gather_u##B##base, svuint##B##_t, T1, B) \ > + T4 (F##_gather_u##B##base_offset, svuint##B##_t, T2, B) \ > + T4 (F##_gather_u##B##base_index, svuint##B##_t, T2, B) > + > +#define SD_DATA(F) \ > + T3 (F, 32) \ > + T3 (F, 64) > + > +SD_DATA (ldnt1) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 18 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 18 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-shift_left_imm_to_uint.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-shift_left_imm_to_uint.c > new file mode 100644 > index 00000000000..bd2c9371e65 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-shift_left_imm_to_uint.c > @@ -0,0 +1,28 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include <arm_sve.h> > + > +#define MXZ(F, RTY, TY) \ > + RTY F##_f (TY op1) \ > + { \ > + return sv##F (svpfalse_b (), op1, 2); \ > + } > + > +#define PRED_MXZn(F, RTY, TYPE, TY) \ > + MXZ (F##_n_##TY##_m, RTY, TYPE) \ > + MXZ (F##_n_##TY##_x, RTY, TYPE) \ > + MXZ (F##_n_##TY##_z, RTY, TYPE) > + > +#define ALL_SIGNED_IMM(F, P) \ > + PRED_##P (F, svuint8_t, svint8_t, s8) \ > + PRED_##P (F, svuint16_t, svint16_t, s16) \ > + PRED_##P (F, svuint32_t, svint32_t, s32) \ > + PRED_##P (F, svuint64_t, svint64_t, s64) > + > +ALL_SIGNED_IMM (qshlu, MXZn) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 4 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 8 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 12 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-shift_right_imm.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-shift_right_imm.c > new file mode 100644 > index 00000000000..f4994de4c80 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-shift_right_imm.c > @@ -0,0 +1,32 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include <arm_sve.h> > + > +#define MXZ(F, TY) \ > + TY F##_f (TY op1) \ > + { \ > + return sv##F (svpfalse_b (), op1, 2); \ > + } > + > +#define PRED_MXZn(F, TYPE, TY) \ > + MXZ (F##_n_##TY##_m, TYPE) \ > + MXZ (F##_n_##TY##_x, TYPE) \ > + MXZ (F##_n_##TY##_z, TYPE) > + > +#define ALL_INTEGER_IMM(F, P) \ > + PRED_##P (F, svuint8_t, u8) \ > + PRED_##P (F, svuint16_t, u16) \ > + PRED_##P (F, svuint32_t, u32) \ > + PRED_##P (F, svuint64_t, u64) \ > + PRED_##P (F, svint8_t, s8) \ > + PRED_##P (F, svint16_t, s16) \ > + PRED_##P (F, svint32_t, s32) \ > + PRED_##P (F, svint64_t, s64) > + > +ALL_INTEGER_IMM (rshr, MXZn) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 8 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 16 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 24 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-store_scatter_index_restricted.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-store_scatter_index_restricted.c > new file mode 100644 > index 00000000000..6bec3b34ece > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-store_scatter_index_restricted.c > @@ -0,0 +1,50 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include <arm_sve.h> > + > +#define T1(F, TY1, TY2, TY3) \ > + void F##_f (TY1 *base, TY2 indices, TY3 data) \ > + { \ > + sv##F (svpfalse_b (), base, indices, data); \ > + } > + > +#define T2(F, TY1, TY3) \ > + void F##_f (TY1 bases, int64_t index, TY3 data) \ > + { \ > + sv##F (svpfalse_b (), bases, index, data); \ > + } > + > +#define T3(F, B, TYPE1, TY, TYPE2) \ > + T1 (F##_scatter_s##B##index_##TY, TYPE2, svint##B##_t, TYPE1) \ > + T1 (F##_scatter_u##B##index_##TY, TYPE2, svuint##B##_t, TYPE1) > + > +#define T4(F, B, TYPE1, TY) \ > + T2 (F##_scatter_u##B##base_index_##TY, svuint##B##_t, TYPE1) > + > +#define TEST(F) \ > + T3 (F##h, 64, svint64_t, s64, int16_t) \ > + T3 (F##h, 64, svuint64_t, u64, int16_t) \ > + T4 (F##h, 32, svuint32_t, u32) \ > + T4 (F##h, 64, svuint64_t, u64) \ > + T3 (F##w, 64, svint64_t, s64, int32_t) \ > + T3 (F##w, 64, svuint64_t, u64, int32_t) \ > + T4 (F##w, 64, svuint64_t, u64) \ > + > +#define SD_DATA(F) \ > + T3 (F, 64, svfloat64_t, f64, float64_t) \ > + T3 (F, 64, svint64_t, s64, int64_t) \ > + T3 (F, 64, svuint64_t, u64, uint64_t) \ > + T4 (F, 32, svfloat32_t, f32) \ > + T4 (F, 32, svint32_t, s32) \ > + T4 (F, 32, svuint32_t, u32) \ > + T4 (F, 64, svfloat64_t, f64) \ > + T4 (F, 64, svint64_t, s64) \ > + T4 (F, 64, svuint64_t, u64) \ > + > +TEST (stnt1) > +SD_DATA (stnt1) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 23 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 23 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-store_scatter_offset_restricted.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-store_scatter_offset_restricted.c > new file mode 100644 > index 00000000000..bcb4a148d62 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-store_scatter_offset_restricted.c > @@ -0,0 +1,65 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include <arm_sve.h> > + > +#define T1(F, TY1, TY2, TY3) \ > + void F##_f (TY1 *base, TY2 offsets, TY3 data) \ > + { \ > + sv##F (svpfalse_b (), base, offsets, data); \ > + } > + > +#define T2(F, TY1, TY3) \ > + void F##_f (TY1 bases, int64_t offset, TY3 data) \ > + { \ > + sv##F (svpfalse_b (), bases, offset, data); \ > + } > + > +#define T5(F, TY1, TY3) \ > + void F##_f (TY1 bases, TY3 data) \ > + { \ > + sv##F (svpfalse_b (), bases, data); \ > + } > + > + > +#define T3(F, B, TYPE1, TY, TYPE2) \ > + T1 (F##_scatter_u##B##offset_##TY, TYPE2, svuint##B##_t, TYPE1) > + > +#define T4(F, B, TYPE1, TY) \ > + T2 (F##_scatter_u##B##base_offset_##TY, svuint##B##_t, TYPE1) \ > + T5 (F##_scatter_u##B##base_##TY, svuint##B##_t, TYPE1) > + > +#define D_INTEGER(F, BHW) \ > + T3 (F, 64, svint64_t, s64, int##BHW##_t) \ > + T3 (F, 64, svuint64_t, u64, int##BHW##_t) \ > + T4 (F, 64, svint64_t, s64) \ > + T4 (F, 64, svuint64_t, u64) > + > +#define SD_INTEGER(F, BHW) \ > + D_INTEGER (F, BHW) \ > + T3 (F, 32, svuint32_t, u32, int##BHW##_t) \ > + T4 (F, 32, svint32_t, s32) \ > + T4 (F, 32, svuint32_t, u32) > + > +#define SD_DATA(F) \ > + T3 (F, 64, svint64_t, s64, int64_t) \ > + T4 (F, 32, svint32_t, s32) \ > + T4 (F, 64, svint64_t, s64) \ > + T3 (F, 32, svuint32_t, u32, uint32_t) \ > + T3 (F, 64, svuint64_t, u64, uint64_t) \ > + T4 (F, 32, svuint32_t, u32) \ > + T4 (F, 64, svuint64_t, u64) \ > + T3 (F, 32, svfloat32_t, f32, float32_t) \ > + T3 (F, 64, svfloat64_t, f64, float64_t) \ > + T4 (F, 32, svfloat32_t, f32) \ > + T4 (F, 64, svfloat64_t, f64) > + > + > +SD_DATA (stnt1) > +SD_INTEGER (stnt1b, 8) > +SD_INTEGER (stnt1h, 16) > +D_INTEGER (stnt1w, 32) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 45 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 45 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary.c > new file mode 100644 > index 00000000000..ba7e931f058 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary.c > @@ -0,0 +1,32 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2 -march=armv9.2-a+sve+sme" } */ > + > +#include "../pfalse-unary_0.h" > + > +ALL_SIGNED (qabs, MXZ) > +ALL_SIGNED (qneg, MXZ) > +S_UNSIGNED (recpe, MXZ) > +S_UNSIGNED (rsqrte, MXZ) > + > +#undef M > +#define M(F, RTY, TY) \ > + __arm_streaming \ > + RTY F##_f (RTY inactive, TY op) \ > + { \ > + return sv##F (inactive, svpfalse_b (), op); \ > + } > + > +#undef XZI > +#define XZI(F, RTY, TY) \ > + __arm_streaming \ > + RTY F##_f (TY op) \ > + { \ > + return sv##F (svpfalse_b (), op); \ > + } > + > +ALL_DATA (revd, MXZ) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 22 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 44 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 66 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary_convert.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary_convert.c > new file mode 100644 > index 00000000000..7aa59ff866f > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary_convert.c > @@ -0,0 +1,31 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include <arm_sve.h> > + > +#define XZ(F, TYPE1, TY1, TYPE2, TY2, P) \ > + TYPE1 F##_##TY1##_##TY2##_##P##_f (TYPE2 op) \ > + { \ > + return sv##F##_##TY1##_##TY2##_##P (svpfalse_b (), op); \ > + } \ > + > +#define M(F, TYPE1, TY1, TYPE2, TY2) \ > + TYPE1 F##_##TY1##_##TY2##_m_f (TYPE1 inactive, TYPE2 op) \ > + { \ > + return sv##F##_##TY1##_##TY2##_m (inactive, svpfalse_b (), op); \ > + } > + > +M (cvtx, svfloat32_t, f32, svfloat64_t, f64) > +XZ (cvtx, svfloat32_t, f32, svfloat64_t, f64, x) > +XZ (cvtx, svfloat32_t, f32, svfloat64_t, f64, z) > +M (cvtlt, svfloat32_t, f32, svfloat16_t, f16) > +XZ (cvtlt, svfloat32_t, f32, svfloat16_t, f16, x) > +M (cvtlt, svfloat64_t, f64, svfloat32_t, f32) > +XZ (cvtlt, svfloat64_t, f64, svfloat32_t, f32, x) > + > + > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 3 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 4 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 7 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary_convert_narrowt.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary_convert_narrowt.c > new file mode 100644 > index 00000000000..1a4525cc769 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary_convert_narrowt.c > @@ -0,0 +1,23 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include <arm_sve.h> > + > +#define T(F, TYPE1, TY1, TYPE2, TY2) \ > + TYPE1 F##_##TY1##_##TY2##_x_f (TYPE1 even, TYPE2 op) \ > + { \ > + return sv##F##_##TY1##_##TY2##_x (even, svpfalse_b (), op); \ > + } \ > + TYPE1 F##_##TY1##_##TY2##_m_f (TYPE1 even, TYPE2 op) \ > + { \ > + return sv##F##_##TY1##_##TY2##_m (even, svpfalse_b (), op); \ > + } > + > +T (cvtnt, svfloat16_t, f16, svfloat32_t, f32) > +T (cvtnt, svfloat32_t, f32, svfloat64_t, f64) > +T (cvtxnt, svfloat32_t, f32, svfloat64_t, f64) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 3 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?: [0-9]*[bhsd])?, #?0\n\tret\n} 3 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 6 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary_to_int.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary_to_int.c > new file mode 100644 > index 00000000000..b64bfc31944 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-unary_to_int.c > @@ -0,0 +1,11 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target elf } */ > +/* { dg-options "-O2" } */ > + > +#include "../pfalse-unary_0.h" > + > +ALL_FLOAT_INT (logb, MXZ) > + > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 3} } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 6 } } */ > +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 9 } } */ > -- > 2.44.0
Jennifer Schmitz <jschmitz@nvidia.com> writes: > @@ -3622,6 +3631,57 @@ gimple_folder::redirect_pred_x () > return redirect_call (instance); > } > > +/* Fold calls with predicate pfalse: > + _m predication: lhs = op1. > + _x or _z: lhs = {0, ...}. > + Implicit predication that reads from memory: lhs = {0, ...}. > + Implicit predication that writes to memory or prefetches: no-op. > + Return the new gimple statement on success, else NULL. */ > +gimple * > +gimple_folder::fold_pfalse () > +{ > + if (pred == PRED_none) > + return nullptr; > + tree arg0 = gimple_call_arg (call, 0); > + if (pred == PRED_m) > + { > + /* Unary function shapes with _m predication are folded to the > + inactive vector (arg0), while other function shapes are folded > + to op1 (arg1). */ > + tree arg1 = gimple_call_arg (call, 1); > + tree t; > + if (is_pfalse (arg1)) > + t = arg0; > + else if (is_pfalse (arg0)) > + t = arg1; > + else > + return nullptr; > + /* In some intrinsics, e.g. svqshlu, lhs and op1 have different types, > + such that folding to op1 needs a type conversion. */ > + if (TREE_TYPE (t) == TREE_TYPE (lhs)) > + return fold_call_to (t); > + else > + { > + tree rhs = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), t); > + return gimple_build_assign (lhs, VIEW_CONVERT_EXPR, rhs); > + } I think we should move the VIEW_CONVERT_EXPR into fold_call_to, in case it's useful elsewhere. Admittedly that could also mask bugs, but still, it's seems like a nice facility to have. Using types_compatible_p (TREE_TYPE (lhs), TREE_TYPE (t)) would be more general than direct == between types. This block would then become: /* Unary function shapes with _m predication are folded to the inactive vector (arg0), while other function shapes are folded to op1 (arg1). */ tree arg1 = gimple_call_arg (call, 1); if (is_pfalse (arg1)) return fold_call_to (arg0); if (is_pfalse (arg0)) return fold_call_to (arg1); return nullptr; > + } > + if ((pred == PRED_x || pred == PRED_z) && is_pfalse (arg0)) > + return fold_call_to (build_zero_cst (TREE_TYPE (lhs))); > + if (pred == PRED_implicit && is_pfalse (arg0)) > + { > + unsigned int flags = call_properties (); > + /* Folding to lhs = {0, ...} is not appropriate for intrinsics with > + AGGREGATE types as lhs. */ > + if ((flags & CP_READ_MEMORY) > + && !AGGREGATE_TYPE_P (TREE_TYPE (lhs))) > + return fold_call_to (build_zero_cst (TREE_TYPE (lhs))); > + if (flags & (CP_WRITE_MEMORY | CP_PREFETCH_MEMORY)) > + return fold_to_stmt_vops (gimple_build_nop ()); > + } > + return nullptr; > +} > + > /* Fold the call to constant VAL. */ > gimple * > gimple_folder::fold_to_cstu (poly_uint64 val) > @@ -3724,6 +3784,23 @@ gimple_folder::fold_active_lanes_to (tree x) > return gimple_build_assign (lhs, VEC_COND_EXPR, pred, x, vec_inactive); > } > > +/* Fold call to assignment statement lhs = t. */ > +gimple * > +gimple_folder::fold_call_to (tree t) > +{ > + return fold_to_stmt_vops (gimple_build_assign (lhs, t)); > +} This would then become something like: /* Fold the result of the call to T, bitcasting to the right type if necessary. */ gimple * gimple_folder::fold_call_to (tree t) { if (types_compatible_p (TREE_TYPE (lhs), TREE_TYPE (t))) return fold_to_stmt_vops (gimple_build_assign (lhs, t)); tree rhs = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), t); return fold_to_stmt_vops (gimple_build_assign (lhs, VIEW_CONVERT_EXPR, rhs)); } OK with those changes, thanks. Richard
> On 5 Dec 2024, at 16:56, Richard Sandiford <richard.sandiford@arm.com> wrote: > > External email: Use caution opening links or attachments > > > Jennifer Schmitz <jschmitz@nvidia.com> writes: >> @@ -3622,6 +3631,57 @@ gimple_folder::redirect_pred_x () >> return redirect_call (instance); >> } >> >> +/* Fold calls with predicate pfalse: >> + _m predication: lhs = op1. >> + _x or _z: lhs = {0, ...}. >> + Implicit predication that reads from memory: lhs = {0, ...}. >> + Implicit predication that writes to memory or prefetches: no-op. >> + Return the new gimple statement on success, else NULL. */ >> +gimple * >> +gimple_folder::fold_pfalse () >> +{ >> + if (pred == PRED_none) >> + return nullptr; >> + tree arg0 = gimple_call_arg (call, 0); >> + if (pred == PRED_m) >> + { >> + /* Unary function shapes with _m predication are folded to the >> + inactive vector (arg0), while other function shapes are folded >> + to op1 (arg1). */ >> + tree arg1 = gimple_call_arg (call, 1); >> + tree t; >> + if (is_pfalse (arg1)) >> + t = arg0; >> + else if (is_pfalse (arg0)) >> + t = arg1; >> + else >> + return nullptr; >> + /* In some intrinsics, e.g. svqshlu, lhs and op1 have different types, >> + such that folding to op1 needs a type conversion. */ >> + if (TREE_TYPE (t) == TREE_TYPE (lhs)) >> + return fold_call_to (t); >> + else >> + { >> + tree rhs = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), t); >> + return gimple_build_assign (lhs, VIEW_CONVERT_EXPR, rhs); >> + } > > I think we should move the VIEW_CONVERT_EXPR into fold_call_to, > in case it's useful elsewhere. Admittedly that could also mask bugs, > but still, it's seems like a nice facility to have. > > Using types_compatible_p (TREE_TYPE (lhs), TREE_TYPE (t)) would be > more general than direct == between types. > > This block would then become: > > /* Unary function shapes with _m predication are folded to the > inactive vector (arg0), while other function shapes are folded > to op1 (arg1). */ > tree arg1 = gimple_call_arg (call, 1); > if (is_pfalse (arg1)) > return fold_call_to (arg0); > if (is_pfalse (arg0)) > return fold_call_to (arg1); > return nullptr; > >> + } >> + if ((pred == PRED_x || pred == PRED_z) && is_pfalse (arg0)) >> + return fold_call_to (build_zero_cst (TREE_TYPE (lhs))); >> + if (pred == PRED_implicit && is_pfalse (arg0)) >> + { >> + unsigned int flags = call_properties (); >> + /* Folding to lhs = {0, ...} is not appropriate for intrinsics with >> + AGGREGATE types as lhs. */ >> + if ((flags & CP_READ_MEMORY) >> + && !AGGREGATE_TYPE_P (TREE_TYPE (lhs))) >> + return fold_call_to (build_zero_cst (TREE_TYPE (lhs))); >> + if (flags & (CP_WRITE_MEMORY | CP_PREFETCH_MEMORY)) >> + return fold_to_stmt_vops (gimple_build_nop ()); >> + } >> + return nullptr; >> +} >> + >> /* Fold the call to constant VAL. */ >> gimple * >> gimple_folder::fold_to_cstu (poly_uint64 val) >> @@ -3724,6 +3784,23 @@ gimple_folder::fold_active_lanes_to (tree x) >> return gimple_build_assign (lhs, VEC_COND_EXPR, pred, x, vec_inactive); >> } >> >> +/* Fold call to assignment statement lhs = t. */ >> +gimple * >> +gimple_folder::fold_call_to (tree t) >> +{ >> + return fold_to_stmt_vops (gimple_build_assign (lhs, t)); >> +} > > This would then become something like: > > /* Fold the result of the call to T, bitcasting to the right type if > necessary. */ > gimple * > gimple_folder::fold_call_to (tree t) > { > if (types_compatible_p (TREE_TYPE (lhs), TREE_TYPE (t))) > return fold_to_stmt_vops (gimple_build_assign (lhs, t)); > > tree rhs = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), t); > return fold_to_stmt_vops (gimple_build_assign (lhs, VIEW_CONVERT_EXPR, rhs)); > } > > OK with those changes, thanks. Thank you. I made the changes and pushed to trunk: 5289540ed58e42ae66255e31f22afe4ca0a6e15e. Best, Jennifer > > Richard
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc index f190770250f..3350ecfcca4 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc +++ b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc @@ -22,6 +22,10 @@ #include "coretypes.h" #include "tm.h" #include "tree.h" +#include "basic-block.h" +#include "function.h" +#include "gimple.h" +#include "gimple-iterator.h" #include "rtl.h" #include "tm_p.h" #include "memmodel.h" @@ -1244,6 +1248,17 @@ struct binary_def : public overloaded_base<0> { return r.resolve_uniform (2); } + + gimple * + fold_pfalse (gimple_folder &f) const override + { + if (!f.arg_is_pfalse_p (0)) + return NULL; + return f.fold_by_pred (gimple_call_arg (f.call, 1), + gimple_call_arg (f.call, 1), + build_zero_cst (TREE_TYPE (f.lhs)), + gimple_call_arg (f.call, 2)); + } }; SHAPE (binary) @@ -1273,6 +1288,17 @@ struct binary_int_opt_n_def : public overloaded_base<0> return r.finish_opt_n_resolution (i + 1, i, type, TYPE_signed); } + + gimple * + fold_pfalse (gimple_folder &f) const override + { + if (!f.arg_is_pfalse_p (0)) + return NULL; + return f.fold_by_pred (gimple_call_arg (f.call, 1), + gimple_call_arg (f.call, 1), + build_zero_cst (TREE_TYPE (f.lhs)), + NULL); + } }; SHAPE (binary_int_opt_n) @@ -1308,6 +1334,17 @@ struct binary_int_opt_single_n_def : public overloaded_base<0> ? r.finish_opt_n_resolution (i + 1, i, type.type, TYPE_signed) : r.finish_opt_single_resolution (i + 1, i, type, TYPE_signed)); } + + gimple * + fold_pfalse (gimple_folder &f) const override + { + if (!f.arg_is_pfalse_p (0)) + return NULL; + return f.fold_by_pred (gimple_call_arg (f.call, 1), + gimple_call_arg (f.call, 1), + build_zero_cst (TREE_TYPE (f.lhs)), + NULL); + } }; SHAPE (binary_int_opt_single_n) @@ -1512,6 +1549,17 @@ struct binary_opt_n_def : public overloaded_base<0> { return r.resolve_uniform_opt_n (2); } + + gimple * + fold_pfalse (gimple_folder &f) const override + { + if (!f.arg_is_pfalse_p (0)) + return NULL; + return f.fold_by_pred (gimple_call_arg (f.call, 1), + gimple_call_arg (f.call, 1), + build_zero_cst (TREE_TYPE (f.lhs)), + NULL); + } }; SHAPE (binary_opt_n) @@ -1547,6 +1595,17 @@ struct binary_opt_single_n_def : public overloaded_base<0> ? r.finish_opt_n_resolution (i + 1, i, type.type) : r.finish_opt_single_resolution (i + 1, i, type)); } + + gimple * + fold_pfalse (gimple_folder &f) const override + { + if (!f.arg_is_pfalse_p (0)) + return NULL; + return f.fold_by_pred (gimple_call_arg (f.call, 1), + gimple_call_arg (f.call, 1), + build_zero_cst (TREE_TYPE (f.lhs)), + NULL); + } }; SHAPE (binary_opt_single_n) @@ -1584,6 +1643,17 @@ struct binary_rotate_def : public overloaded_base<0> { return c.require_immediate_either_or (2, 90, 270); } + + gimple * + fold_pfalse (gimple_folder &f) const override + { + if (!f.arg_is_pfalse_p (0)) + return NULL; + return f.fold_by_pred (gimple_call_arg (f.call, 1), + gimple_call_arg (f.call, 1), + build_zero_cst (TREE_TYPE (f.lhs)), + NULL); + } }; SHAPE (binary_rotate) @@ -1645,6 +1715,15 @@ struct binary_to_uint_def : public overloaded_base<0> { return r.resolve_uniform (2); } + + gimple * + fold_pfalse (gimple_folder &f) const override + { + if (!f.arg_is_pfalse_p (0)) + return NULL; + return f.fold_by_pred (NULL, NULL, + build_zero_cst (TREE_TYPE (f.lhs)), NULL); + } }; SHAPE (binary_to_uint) @@ -1730,6 +1809,17 @@ struct binary_uint_opt_n_def : public overloaded_base<0> return r.finish_opt_n_resolution (i + 1, i, type, TYPE_unsigned); } + + gimple * + fold_pfalse (gimple_folder &f) const override + { + if (!f.arg_is_pfalse_p (0)) + return NULL; + return f.fold_by_pred (gimple_call_arg (f.call, 1), + gimple_call_arg (f.call, 1), + build_zero_cst (TREE_TYPE (f.lhs)), + NULL); + } }; SHAPE (binary_uint_opt_n) @@ -1787,6 +1877,17 @@ struct binary_uint64_opt_n_def : public overloaded_base<0> return r.finish_opt_n_resolution (i + 1, i, type, TYPE_unsigned, 64); } + + gimple * + fold_pfalse (gimple_folder &f) const override + { + if (!f.arg_is_pfalse_p (0)) + return NULL; + return f.fold_by_pred (gimple_call_arg (f.call, 1), + gimple_call_arg (f.call, 1), + build_zero_cst (TREE_TYPE (f.lhs)), + NULL); + } }; SHAPE (binary_uint64_opt_n) @@ -1813,6 +1914,17 @@ struct binary_wide_def : public overloaded_base<0> return r.resolve_to (r.mode_suffix_id, type); } + + gimple * + fold_pfalse (gimple_folder &f) const override + { + if (!f.arg_is_pfalse_p (0)) + return NULL; + return f.fold_by_pred (gimple_call_arg (f.call, 1), + gimple_call_arg (f.call, 1), + build_zero_cst (TREE_TYPE (f.lhs)), + NULL); + } }; SHAPE (binary_wide) diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc index af6469fff71..5409fdff3e5 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins.cc +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc @@ -3459,6 +3459,15 @@ is_ptrue (tree v, unsigned int step) && vector_cst_all_same (v, step)); } +/* Return true if V is a constant predicate that acts as a pfalse. */ +bool +is_pfalse (tree v) +{ + return (TREE_CODE (v) == VECTOR_CST + && TYPE_MODE (TREE_TYPE (v)) == VNx16BImode + && integer_zerop (v)); +} + gimple_folder::gimple_folder (const function_instance &instance, tree fndecl, gimple_stmt_iterator *gsi_in, gcall *call_in) : function_call_info (gimple_location (call_in), instance, fndecl), @@ -3564,6 +3573,14 @@ gimple_folder::redirect_pred_x () return redirect_call (instance); } +/* Return true if the argument at position IDX is pfalse, + else return false. */ +bool +gimple_folder::arg_is_pfalse_p (unsigned int idx) +{ + return is_pfalse (gimple_call_arg (call, idx)); +} + /* Fold the call to constant VAL. */ gimple * gimple_folder::fold_to_cstu (poly_uint64 val) @@ -3666,6 +3683,39 @@ gimple_folder::fold_active_lanes_to (tree x) return gimple_build_assign (lhs, VEC_COND_EXPR, pred, x, vec_inactive); } +/* Fold call to assignment statement + lhs = new_lhs, + where new_lhs is determined by the predication. + Return the gimple statement on success, else return NULL. */ +gimple * +gimple_folder::fold_by_pred (tree m, tree x, tree z, tree implicit) +{ + tree new_lhs = NULL; + switch (pred) + { + case PRED_z: + new_lhs = z; + break; + case PRED_m: + new_lhs = m; + break; + case PRED_x: + new_lhs = x; + break; + case PRED_implicit: + new_lhs = implicit; + break; + default: + return NULL; + } + gcc_assert (new_lhs); + gimple_seq stmts = NULL; + gimple *g = gimple_build_assign (lhs, new_lhs); + gimple_seq_add_stmt_without_update (&stmts, g); + gsi_replace_with_seq_vops (gsi, stmts); + return g; +} + /* Try to fold the call. Return the new statement on success and null on failure. */ gimple * @@ -3685,6 +3735,9 @@ gimple_folder::fold () /* First try some simplifications that are common to many functions. */ if (auto *call = redirect_pred_x ()) return call; + if (pred != PRED_none) + if (auto *call = shape->fold_pfalse (*this)) + return call; return base->fold (*this); } diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h b/gcc/config/aarch64/aarch64-sve-builtins.h index 4cdc0541bdc..4e443a8192e 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins.h +++ b/gcc/config/aarch64/aarch64-sve-builtins.h @@ -632,12 +632,15 @@ public: gcall *redirect_call (const function_instance &); gimple *redirect_pred_x (); + bool arg_is_pfalse_p (unsigned int idx); + gimple *fold_to_cstu (poly_uint64); gimple *fold_to_pfalse (); gimple *fold_to_ptrue (); gimple *fold_to_vl_pred (unsigned int); gimple *fold_const_binary (enum tree_code); gimple *fold_active_lanes_to (tree); + gimple *fold_by_pred (tree, tree, tree, tree); gimple *fold (); @@ -796,6 +799,11 @@ public: /* Check whether the given call is semantically valid. Return true if it is, otherwise report an error and return false. */ virtual bool check (function_checker &) const { return true; } + + /* For a pfalse predicate, try to fold the given gimple call. + Return the new gimple statement on success, otherwise return null. */ + virtual gimple *fold_pfalse (gimple_folder &) const { return NULL; } + }; /* RAII class for enabling enough SVE features to define the built-in @@ -829,6 +837,7 @@ extern tree acle_svprfop; bool vector_cst_all_same (tree, unsigned int); bool is_ptrue (tree, unsigned int); +bool is_pfalse (tree); const function_instance *lookup_fndecl (tree); /* Try to find a mode with the given mode_suffix_info fields. Return the diff --git a/gcc/testsuite/gcc.target/aarch64/pfalse-binary_0.c b/gcc/testsuite/gcc.target/aarch64/pfalse-binary_0.c new file mode 100644 index 00000000000..3910ab36b6b --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/pfalse-binary_0.c @@ -0,0 +1,176 @@ +#include <arm_sve.h> + +#define MXZ(F, RTY, TY1, TY2) \ + RTY F##_f (TY1 op1, TY2 op2) \ + { \ + return sv##F (svpfalse_b (), op1, op2); \ + } + +#define PRED_MXv(F, RTY, TYPE1, TYPE2, TY) \ + MXZ (F##_##TY##_m, sv##RTY, sv##TYPE1, sv##TYPE2) \ + MXZ (F##_##TY##_x, sv##RTY, sv##TYPE1, sv##TYPE2) + +#define PRED_Zv(F, RTY, TYPE1, TYPE2, TY) \ + MXZ (F##_##TY##_z, sv##RTY, sv##TYPE1, sv##TYPE2) + +#define PRED_MXZv(F, RTY, TYPE1, TYPE2, TY) \ + PRED_MXv (F, RTY, TYPE1, TYPE2, TY) \ + PRED_Zv (F, RTY, TYPE1, TYPE2, TY) + +#define PRED_Z(F, RTY, TYPE1, TYPE2, TY) \ + PRED_Zv (F, RTY, TYPE1, TYPE2, TY) \ + MXZ (F##_n_##TY##_z, sv##RTY, sv##TYPE1, TYPE2) + +#define PRED_MXZ(F, RTY, TYPE1, TYPE2, TY) \ + PRED_MXv (F, RTY, TYPE1, TYPE2, TY) \ + MXZ (F##_n_##TY##_m, sv##RTY, sv##TYPE1, TYPE2) \ + MXZ (F##_n_##TY##_x, sv##RTY, sv##TYPE1, TYPE2) \ + PRED_Z (F, RTY, TYPE1, TYPE2, TY) + +#define PRED_IMPLICIT(F, RTY, TYPE1, TYPE2, TY) \ + MXZ (F##_##TY, sv##RTY, sv##TYPE1, sv##TYPE2) + +#define ALL_Q_INTEGER(F, P) \ + PRED_##P (F, uint8_t, uint8_t, uint8_t, u8) \ + PRED_##P (F, int8_t, int8_t, int8_t, s8) + +#define ALL_Q_INTEGER_UINT(F, P) \ + PRED_##P (F, uint8_t, uint8_t, uint8_t, u8) \ + PRED_##P (F, int8_t, int8_t, uint8_t, s8) + +#define ALL_Q_INTEGER_INT(F, P) \ + PRED_##P (F, uint8_t, uint8_t, int8_t, u8) \ + PRED_##P (F, int8_t, int8_t, int8_t, s8) + +#define ALL_H_INTEGER(F, P) \ + PRED_##P (F, uint16_t, uint16_t, uint16_t, u16) \ + PRED_##P (F, int16_t, int16_t, int16_t, s16) + +#define ALL_H_INTEGER_UINT(F, P) \ + PRED_##P (F, uint16_t, uint16_t, uint16_t, u16) \ + PRED_##P (F, int16_t, int16_t, uint16_t, s16) + +#define ALL_H_INTEGER_INT(F, P) \ + PRED_##P (F, uint16_t, uint16_t, int16_t, u16) \ + PRED_##P (F, int16_t, int16_t, int16_t, s16) + +#define ALL_H_INTEGER_WIDE(F, P) \ + PRED_##P (F, uint16_t, uint16_t, uint8_t, u16) \ + PRED_##P (F, int16_t, int16_t, int8_t, s16) + +#define ALL_S_INTEGER(F, P) \ + PRED_##P (F, uint32_t, uint32_t, uint32_t, u32) \ + PRED_##P (F, int32_t, int32_t, int32_t, s32) + +#define ALL_S_INTEGER_UINT(F, P) \ + PRED_##P (F, uint32_t, uint32_t, uint32_t, u32) \ + PRED_##P (F, int32_t, int32_t, uint32_t, s32) + +#define ALL_S_INTEGER_INT(F, P) \ + PRED_##P (F, uint32_t, uint32_t, int32_t, u32) \ + PRED_##P (F, int32_t, int32_t, int32_t, s32) + +#define ALL_S_INTEGER_WIDE(F, P) \ + PRED_##P (F, uint32_t, uint32_t, uint16_t, u32) \ + PRED_##P (F, int32_t, int32_t, int16_t, s32) + +#define ALL_D_INTEGER(F, P) \ + PRED_##P (F, uint64_t, uint64_t, uint64_t, u64) \ + PRED_##P (F, int64_t, int64_t, int64_t, s64) + +#define ALL_D_INTEGER_UINT(F, P) \ + PRED_##P (F, uint64_t, uint64_t, uint64_t, u64) \ + PRED_##P (F, int64_t, int64_t, uint64_t, s64) + +#define ALL_D_INTEGER_INT(F, P) \ + PRED_##P (F, uint64_t, uint64_t, int64_t, u64) \ + PRED_##P (F, int64_t, int64_t, int64_t, s64) + +#define ALL_D_INTEGER_WIDE(F, P) \ + PRED_##P (F, uint64_t, uint64_t, uint32_t, u64) \ + PRED_##P (F, int64_t, int64_t, int32_t, s64) + +#define SD_INTEGER_TO_UINT(F, P) \ + PRED_##P (F, uint32_t, uint32_t, uint32_t, u32) \ + PRED_##P (F, uint64_t, uint64_t, uint64_t, u64) \ + PRED_##P (F, uint32_t, int32_t, int32_t, s32) \ + PRED_##P (F, uint64_t, int64_t, int64_t, s64) + +#define BHS_UNSIGNED_UINT64(F, P) \ + PRED_##P (F, uint8_t, uint8_t, uint64_t, u8) \ + PRED_##P (F, uint16_t, uint16_t, uint64_t, u16) \ + PRED_##P (F, uint32_t, uint32_t, uint64_t, u32) + +#define BHS_SIGNED_UINT64(F, P) \ + PRED_##P (F, int8_t, int8_t, uint64_t, s8) \ + PRED_##P (F, int16_t, int16_t, uint64_t, s16) \ + PRED_##P (F, int32_t, int32_t, uint64_t, s32) + +#define ALL_UNSIGNED_UINT(F, P) \ + PRED_##P (F, uint8_t, uint8_t, uint8_t, u8) \ + PRED_##P (F, uint16_t, uint16_t, uint16_t, u16) \ + PRED_##P (F, uint32_t, uint32_t, uint32_t, u32) \ + PRED_##P (F, uint64_t, uint64_t, uint64_t, u64) + +#define ALL_SIGNED_UINT(F, P) \ + PRED_##P (F, int8_t, int8_t, uint8_t, s8) \ + PRED_##P (F, int16_t, int16_t, uint16_t, s16) \ + PRED_##P (F, int32_t, int32_t, uint32_t, s32) \ + PRED_##P (F, int64_t, int64_t, uint64_t, s64) + +#define ALL_FLOAT(F, P) \ + PRED_##P (F, float16_t, float16_t, float16_t, f16) \ + PRED_##P (F, float32_t, float32_t, float32_t, f32) \ + PRED_##P (F, float64_t, float64_t, float64_t, f64) + +#define ALL_FLOAT_INT(F, P) \ + PRED_##P (F, float16_t, float16_t, int16_t, f16) \ + PRED_##P (F, float32_t, float32_t, int32_t, f32) \ + PRED_##P (F, float64_t, float64_t, int64_t, f64) + +#define B(F, P) \ + PRED_##P (F, bool_t, bool_t, bool_t, b) + +#define ALL_SD_INTEGER(F, P) \ + ALL_S_INTEGER (F, P) \ + ALL_D_INTEGER (F, P) + +#define HSD_INTEGER_WIDE(F, P) \ + ALL_H_INTEGER_WIDE (F, P) \ + ALL_S_INTEGER_WIDE (F, P) \ + ALL_D_INTEGER_WIDE (F, P) + +#define BHS_INTEGER_UINT64(F, P) \ + BHS_UNSIGNED_UINT64 (F, P) \ + BHS_SIGNED_UINT64 (F, P) + +#define ALL_INTEGER(F, P) \ + ALL_Q_INTEGER (F, P) \ + ALL_H_INTEGER (F, P) \ + ALL_S_INTEGER (F, P) \ + ALL_D_INTEGER (F, P) + +#define ALL_INTEGER_UINT(F, P) \ + ALL_Q_INTEGER_UINT (F, P) \ + ALL_H_INTEGER_UINT (F, P) \ + ALL_S_INTEGER_UINT (F, P) \ + ALL_D_INTEGER_UINT (F, P) + +#define ALL_INTEGER_INT(F, P) \ + ALL_Q_INTEGER_INT (F, P) \ + ALL_H_INTEGER_INT (F, P) \ + ALL_S_INTEGER_INT (F, P) \ + ALL_D_INTEGER_INT (F, P) + +#define ALL_FLOAT_AND_SD_INTEGER(F, P) \ + ALL_SD_INTEGER (F, P) \ + ALL_FLOAT (F, P) + +#define ALL_ARITH(F, P) \ + ALL_INTEGER (F, P) \ + ALL_FLOAT (F, P) + +#define ALL_DATA(F, P) \ + ALL_ARITH (F, P) \ + PRED_##P (F, bfloat16_t, bfloat16_t, bfloat16_t, bf16) + diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary.c new file mode 100644 index 00000000000..ef629413185 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.c" + +B (brkn, Zv) +B (brkpa, Zv) +B (brkpb, Zv) +ALL_DATA (splice, IMPLICIT) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tpfalse\tp0\.b\n\tret\n} 3 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmov\tz0\.d, z1\.d\n\tret\n} 12 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 15 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_int_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_int_opt_n.c new file mode 100644 index 00000000000..91d574f9249 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_int_opt_n.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.c" + +ALL_FLOAT_INT (scale, MXZ) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 12 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 6 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 18 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_n.c new file mode 100644 index 00000000000..25c793ff40f --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_n.c @@ -0,0 +1,30 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.c" + +ALL_ARITH (abd, MXZ) +ALL_ARITH (add, MXZ) +ALL_INTEGER (and, MXZ) +B (and, Zv) +ALL_INTEGER (bic, MXZ) +B (bic, Zv) +ALL_FLOAT_AND_SD_INTEGER (div, MXZ) +ALL_FLOAT_AND_SD_INTEGER (divr, MXZ) +ALL_INTEGER (eor, MXZ) +B (eor, Zv) +ALL_ARITH (mul, MXZ) +ALL_INTEGER (mulh, MXZ) +ALL_FLOAT (mulx, MXZ) +B (nand, Zv) +B (nor, Zv) +B (orn, Zv) +ALL_INTEGER (orr, MXZ) +B (orr, Zv) +ALL_ARITH (sub, MXZ) +ALL_ARITH (subr, MXZ) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 448 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 224 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tpfalse\tp0\.b\n\tret\n} 7 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 679 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_single_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_single_n.c new file mode 100644 index 00000000000..8d187c22eec --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_single_n.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.c" + +ALL_ARITH (max, MXZ) +ALL_ARITH (min, MXZ) +ALL_FLOAT (maxnm, MXZ) +ALL_FLOAT (minnm, MXZ) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 112 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 56 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 168 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_rotate.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_rotate.c new file mode 100644 index 00000000000..9940866d5ef --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_rotate.c @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include <arm_sve.h> + +#define MXZ4(F, TYPE) \ + TYPE F##_f (TYPE op1, TYPE op2) \ + { \ + return sv##F (svpfalse_b (), op1, op2, 90); \ + } + +#define PRED_MXZ(F, TYPE, TY) \ + MXZ4 (F##_##TY##_m, TYPE) \ + MXZ4 (F##_##TY##_x, TYPE) \ + MXZ4 (F##_##TY##_z, TYPE) + +#define ALL_FLOAT(F, P) \ + PRED_##P (F, svfloat16_t, f16) \ + PRED_##P (F, svfloat32_t, f32) \ + PRED_##P (F, svfloat64_t, f64) + +ALL_FLOAT (cadd, MXZ) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 6 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 3 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 9 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint64_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint64_opt_n.c new file mode 100644 index 00000000000..f8fd18043e6 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint64_opt_n.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.c" + +BHS_SIGNED_UINT64 (asr_wide, MXZ) +BHS_INTEGER_UINT64 (lsl_wide, MXZ) +BHS_UNSIGNED_UINT64 (lsr_wide, MXZ) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 48 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 24 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 72 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint_opt_n.c new file mode 100644 index 00000000000..2f1d7721bc8 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint_opt_n.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.c" + +ALL_SIGNED_UINT (asr, MXZ) +ALL_INTEGER_UINT (lsl, MXZ) +ALL_UNSIGNED_UINT (lsr, MXZ) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 64 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 32 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 96 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary.c new file mode 100644 index 00000000000..723fcd0a203 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.c" + +ALL_ARITH (addp, MXv) +ALL_ARITH (maxp, MXv) +ALL_FLOAT (maxnmp, MXv) +ALL_ARITH (minp, MXv) +ALL_FLOAT (minnmp, MXv) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 78 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 78 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_int_opt_single_n.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_int_opt_single_n.c new file mode 100644 index 00000000000..6e8be86f9b0 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_int_opt_single_n.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.c" + +ALL_INTEGER_INT (rshl, MXZ) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 32 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 16 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 48 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_opt_n.c new file mode 100644 index 00000000000..7335a4ff011 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_opt_n.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.c" + +ALL_INTEGER (hadd, MXZ) +ALL_INTEGER (hsub, MXZ) +ALL_INTEGER (hsubr, MXZ) +ALL_INTEGER (qadd, MXZ) +ALL_INTEGER (qsub, MXZ) +ALL_INTEGER (qsubr, MXZ) +ALL_INTEGER (rhadd, MXZ) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 224 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 112 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 336 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_to_uint.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_to_uint.c new file mode 100644 index 00000000000..e03e1e890f8 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_to_uint.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.c" + +SD_INTEGER_TO_UINT (histcnt, Zv) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 4 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 4 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_uint_opt_n.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_uint_opt_n.c new file mode 100644 index 00000000000..2649fc01954 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_uint_opt_n.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.c" + +ALL_SIGNED_UINT (uqadd, MXZ) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 16 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 8 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 24 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_wide.c b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_wide.c new file mode 100644 index 00000000000..72693d01ad0 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_wide.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include "../pfalse-binary_0.c" + +HSD_INTEGER_WIDE (adalp, MXZv) + +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tret\n} 12 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n\tmovi?\t[vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0\n\tret\n} 6 } } */ +/* { dg-final { scan-assembler-times {\t.cfi_startproc\n} 18 } } */
If an SVE intrinsic has predicate pfalse, we can fold the call to a simplified assignment statement: For _m, _x, and implicit predication, the LHS can be assigned the operand for inactive values and for _z, we can assign a zero vector. For example, svint32_t foo (svint32_t op1, svint32_t op2) { return svadd_s32_m (svpfalse_b (), op1, op2); } can be folded to lhs <- op1, such that foo is compiled to just a RET. We implemented this optimization during gimple folding by calling a new method function_shape::fold_pfalse from gimple_folder::fold. The implementations of fold_pfalse in the function_shape subclasses define the expected behavior for a pfalse predicate for the different predications and return an appropriate gimple statement. To avoid code duplication, function_shape::fold_pfalse calls a new method gimple_folder:fold_by_pred that takes arguments of type tree for each predication and returns the new assignment statement depending on the predication. We tested the new behavior for each intrinsic with all supported predications and data types and checked the produced assembly. There is a test file for each shape subclass with scan-assembler-times tests that look for the simplified instruction sequences, such as individual RET instructions or zeroing moves. There is an additional directive counting the total number of functions in the test, which must be the sum of counts of all other directives. This is to check that all tested intrinsics were optimized. In this patch, we only implemented function_shape::fold_pfalse for binary shapes. But we plan to cover more shapes in follow-up patches, after getting feedback on this patch. The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression. OK for mainline? Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com> gcc/ PR target/106329 * config/aarch64/aarch64-sve-builtins-shapes.cc (struct binary_def): Implement fold_pfalse to simplify function calls with predicate pfalse. (struct binary_int_opt_n_def): Likewise. (struct binary_int_opt_single_n_def): Likewise. (struct binary_opt_n_def): Likewise. (struct binary_opt_single_n_def): Likewise. (struct binary_rotate_def): Likewise. (struct binary_to_uint_def): Likewise. (struct binary_uint_opt_n_def): Likewise. (struct binary_uint64_opt_n_def): Likewise. (struct binary_wide_def): Likewise. * config/aarch64/aarch64-sve-builtins.cc (is_pfalse): New function checking whether a given tree is a pfalse predicate. (gimple_folder::arg_is_pfalse_p): New function checking that the argument at a given index is a pfalse predicate. (gimple_folder::fold_by_pred): New function that folds the call based on the predication. (gimple_folder::fold): Call function_shape::fold_pfalse. * config/aarch64/aarch64-sve-builtins.h: Declare is_pfalse, gimple_folder::fold_by_pred, and gimple_folder::arg_is_pfalse_p. gcc/testsuite/ PR target/106329 * gcc.target/aarch64/pfalse-binary_0.c: New test. * gcc.target/aarch64/sve/pfalse-binary.c: New test. * gcc.target/aarch64/sve/pfalse-binary_int_opt_n.c: New test. * gcc.target/aarch64/sve/pfalse-binary_opt_n.c: New test. * gcc.target/aarch64/sve/pfalse-binary_opt_single_n.c: New test. * gcc.target/aarch64/sve/pfalse-binary_rotate.c: New test. * gcc.target/aarch64/sve/pfalse-binary_uint64_opt_n.c: New test. * gcc.target/aarch64/sve/pfalse-binary_uint_opt_n.c: New test. * gcc.target/aarch64/sve2/pfalse-binary.c: New test. * gcc.target/aarch64/sve2/pfalse-binary_int_opt_single_n.c: New test. * gcc.target/aarch64/sve2/pfalse-binary_opt_n.c: New test. * gcc.target/aarch64/sve2/pfalse-binary_to_uint.c: New test. * gcc.target/aarch64/sve2/pfalse-binary_uint_opt_n.c: New test. * gcc.target/aarch64/sve2/pfalse-binary_wide.c: New test. --- .../aarch64/aarch64-sve-builtins-shapes.cc | 112 +++++++++++ gcc/config/aarch64/aarch64-sve-builtins.cc | 53 ++++++ gcc/config/aarch64/aarch64-sve-builtins.h | 9 + .../gcc.target/aarch64/pfalse-binary_0.c | 176 ++++++++++++++++++ .../gcc.target/aarch64/sve/pfalse-binary.c | 13 ++ .../aarch64/sve/pfalse-binary_int_opt_n.c | 10 + .../aarch64/sve/pfalse-binary_opt_n.c | 30 +++ .../aarch64/sve/pfalse-binary_opt_single_n.c | 13 ++ .../aarch64/sve/pfalse-binary_rotate.c | 26 +++ .../aarch64/sve/pfalse-binary_uint64_opt_n.c | 12 ++ .../aarch64/sve/pfalse-binary_uint_opt_n.c | 12 ++ .../gcc.target/aarch64/sve2/pfalse-binary.c | 13 ++ .../sve2/pfalse-binary_int_opt_single_n.c | 10 + .../aarch64/sve2/pfalse-binary_opt_n.c | 16 ++ .../aarch64/sve2/pfalse-binary_to_uint.c | 9 + .../aarch64/sve2/pfalse-binary_uint_opt_n.c | 10 + .../aarch64/sve2/pfalse-binary_wide.c | 10 + 17 files changed, 534 insertions(+) create mode 100644 gcc/testsuite/gcc.target/aarch64/pfalse-binary_0.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_int_opt_n.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_n.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_single_n.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_rotate.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint64_opt_n.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_uint_opt_n.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_int_opt_single_n.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_opt_n.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_to_uint.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_uint_opt_n.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pfalse-binary_wide.c