Message ID | 20230922001238.97411-1-pan2.li@intel.com |
---|---|
State | New |
Series | [v4] RISC-V: Support ceil and ceilf auto-vectorization |
LGTM

juzhe.zhong@rivai.ai

From: pan2.li
Date: 2023-09-22 08:12
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v4] RISC-V: Support ceil and ceilf auto-vectorization

From: Pan Li <pan2.li@intel.com>

Update in v4:

* Add test for _Float16.
* Remove unnecessary macro in def.h for test.

Original log:

This patch adds auto-vectorization support for both ceil and ceilf
from math.h. It depends on the -ffast-math option.

When we call ceil/ceilf like v2 = ceil (v1), we convert it into the
insns below (following the implementation of llvm).

* vfcvt.x.f v3, v1, RUP
* vfcvt.f.x v2, v3

However, a floating-point value does not need the conversion above if
it has no fractional bits, that is, if it is already an integer. Take
the single precision floating point values below as an example.

  +-----------+---------------+
  |   float   | binary layout |
  +-----------+---------------+
  | 8388607.5 |    0x4affffff |
  | 8388608.0 |    0x4b000000 |
  | 8388609.0 |    0x4b000001 |
  +-----------+---------------+

Every single precision value whose magnitude is greater than or equal
to 8388608.0 (2^23) has no fractional bits. We use vfabs and vmflt to
build a mask that filters those elements out of the vector, and only
do the conversion on the masked elements.

Before this patch:

math-ceil-1.c:21:1: missed: couldn't vectorize loop

  ...
.L3:
  flw     fa0,0(s0)
  addi    s0,s0,4
  addi    s1,s1,4
  call    ceilf
  fsw     fa0,-4(s1)
  bne     s0,s2,.L3

After this patch:

  ...
  fsrmi   3
.L4:
  vfabs.v     v0,v1
  vmv1r.v     v2,v1
  vmflt.vv    v0,v0,v4
  sub         a3,a3,a4
  vfcvt.x.f.v v3,v1,v0.t
  vfcvt.f.x.v v2,v3,v0.t
  vfsgnj.vv   v2,v2,v1
  bne         .L4
.L14:
  fsrm    a6
  ret

Please note that VLS mode is also involved in this patch and is
covered by the test cases.

gcc/ChangeLog:

	* config/riscv/autovec.md (ceil<mode>2): New pattern.
	* config/riscv/riscv-protos.h (enum insn_flags): New enum type.
	(enum insn_type): Ditto.
	(expand_vec_ceil): New function decl.
	* config/riscv/riscv-v.cc (gen_ceil_const_fp): New function impl.
	(expand_vec_float_cmp_mask): Ditto.
	(expand_vec_copysign): Ditto.
	(expand_vec_ceil): Ditto.
	* config/riscv/vector.md: Add VLS mode support.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/math-ceil-0.c: New test.
	* gcc.target/riscv/rvv/autovec/math-ceil-1.c: New test.
	* gcc.target/riscv/rvv/autovec/math-ceil-2.c: New test.
	* gcc.target/riscv/rvv/autovec/math-ceil-3.c: New test.
	* gcc.target/riscv/rvv/autovec/math-ceil-run-0.c: New test.
	* gcc.target/riscv/rvv/autovec/math-ceil-run-1.c: New test.
	* gcc.target/riscv/rvv/autovec/math-ceil-run-2.c: New test.
	* gcc.target/riscv/rvv/autovec/test-math.h: New test.
	* gcc.target/riscv/rvv/autovec/vls/math-ceil-1.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
---
 gcc/config/riscv/autovec.md                   |  16 +++
 gcc/config/riscv/riscv-protos.h               |   5 +
 gcc/config/riscv/riscv-v.cc                   | 133 ++++++++++++++++++
 gcc/config/riscv/vector.md                    |   2 +-
 .../riscv/rvv/autovec/math-ceil-0.c           |  26 ++++
 .../riscv/rvv/autovec/math-ceil-1.c           |  26 ++++
 .../riscv/rvv/autovec/math-ceil-2.c           |  26 ++++
 .../riscv/rvv/autovec/math-ceil-3.c           |  28 ++++
 .../riscv/rvv/autovec/math-ceil-run-0.c       |  39 +++++
 .../riscv/rvv/autovec/math-ceil-run-1.c       |  39 +++++
 .../riscv/rvv/autovec/math-ceil-run-2.c       |  39 +++++
 .../gcc.target/riscv/rvv/autovec/test-math.h  |  38 +++++
 .../riscv/rvv/autovec/vls/math-ceil-1.c       |  56 ++++++++
 13 files changed, 472 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/test-math.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-ceil-1.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index f0f1abc4e82..1b4bd82f9ec 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2239,3 +2239,19 @@ (define_expand "<u>avg<v_double_trunc>3_ceil"
   riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, ops3);
   DONE;
 })
+
+;; -------------------------------------------------------------------------
+;; ---- [FP] Math.h.
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - ceil/ceilf
+;; -------------------------------------------------------------------------
+(define_expand "ceil<mode>2"
+  [(match_operand:V_VLSF 0 "register_operand")
+   (match_operand:V_VLSF 1 "register_operand")]
+  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math"
+  {
+    riscv_vector::expand_vec_ceil (operands[0], operands[1], <MODE>mode, <VCONVERT>mode);
+    DONE;
+  }
+)
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 9ea0bcf15d3..07b4ffe3edf 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -250,6 +250,9 @@ enum insn_flags : unsigned int
   /* flags for the floating-point rounding mode. */
   /* Means INSN has FRM operand and the value is FRM_DYN. */
   FRM_DYN_P = 1 << 15,
+
+  /* Means INSN has FRM operand and the value is FRM_RUP. */
+  FRM_RUP_P = 1 << 16,
 };
 
 enum insn_type : unsigned int
@@ -290,6 +293,7 @@ enum insn_type : unsigned int
   UNARY_OP_TAMA = __MASK_OP_TAMA | UNARY_OP_P,
   UNARY_OP_TAMU = __MASK_OP_TAMU | UNARY_OP_P,
   UNARY_OP_FRM_DYN = UNARY_OP | FRM_DYN_P,
+  UNARY_OP_TAMU_FRM_RUP = UNARY_OP_TAMU | FRM_RUP_P,
 
   /* Binary operator. */
   BINARY_OP = __NORMAL_OP | BINARY_OP_P,
@@ -432,6 +436,7 @@ bool expand_vec_cmp_float (rtx, rtx_code, rtx, rtx, bool);
 void expand_cond_len_unop (unsigned, rtx *);
 void expand_cond_len_binop (unsigned, rtx *);
 void expand_reduction (unsigned, unsigned, rtx *, rtx);
+void expand_vec_ceil (rtx, rtx, machine_mode, machine_mode);
 #endif
 bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode,
                           bool, void (*)(rtx *, rtx));
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 4b9a494f8eb..f63dec573ef 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -323,6 +323,8 @@ public:
     /* Add rounding mode operand. */
     if (m_insn_flags & FRM_DYN_P)
       add_rounding_mode_operand (FRM_DYN);
+    if (m_insn_flags & FRM_RUP_P)
+      add_rounding_mode_operand (FRM_RUP);
 
     gcc_assert (insn_data[(int) icode].n_operands == m_opno);
     expand (icode, any_mem_p);
@@ -3439,4 +3441,135 @@ cmp_lmul_gt_one (machine_mode mode)
   return false;
 }
 
+/* We don't have to convert the floating point to integer when the
+   mantissa is zero. Thus, there will be a limitation for both the
+   single and double precision floating point. There will be no
+   mantissa if the floating point is greater than the limit.
+
+   1. Half floating point.
+     +-----------+---------------+
+     |   float   | binary layout |
+     +-----------+---------------+
+     |    1023.5 |        0x63ff |
+     +-----------+---------------+
+     |    1024.0 |        0x6400 |
+     +-----------+---------------+
+     |    1025.0 |        0x6401 |
+     +-----------+---------------+
+     |       ... |           ... |
+
+     All half floating point will be unchanged for ceil if it is
+     greater than or equal to 1024.
+
+   2. Single floating point.
+     +-----------+---------------+
+     |   float   | binary layout |
+     +-----------+---------------+
+     | 8388607.5 |    0x4affffff |
+     +-----------+---------------+
+     | 8388608.0 |    0x4b000000 |
+     +-----------+---------------+
+     | 8388609.0 |    0x4b000001 |
+     +-----------+---------------+
+     |       ... |           ... |
+
+     All single floating point will be unchanged for ceil if it is
+     greater than or equal to 8388608.
+
+   3. Double floating point.
+     +--------------------+--------------------+
+     |       float        |   binary layout    |
+     +--------------------+--------------------+
+     | 4503599627370495.5 | 0X432fffffffffffff |
+     +--------------------+--------------------+
+     | 4503599627370496.0 | 0X4330000000000000 |
+     +--------------------+--------------------+
+     | 4503599627370497.0 | 0X4330000000000001 |
+     +--------------------+--------------------+
+     |                ... |                ... |
+
+     All double floating point will be unchanged for ceil if it is
+     greater than or equal to 4503599627370496.
+ */
+static rtx
+gen_ceil_const_fp (machine_mode inner_mode)
+{
+  REAL_VALUE_TYPE real;
+
+  if (inner_mode == E_HFmode)
+    real_from_integer (&real, inner_mode, 1024, SIGNED);
+  else if (inner_mode == E_SFmode)
+    real_from_integer (&real, inner_mode, 8388608, SIGNED);
+  else if (inner_mode == E_DFmode)
+    real_from_integer (&real, inner_mode, 4503599627370496, SIGNED);
+  else
+    gcc_unreachable ();
+
+  return const_double_from_real_value (real, inner_mode);
+}
+
+static rtx
+expand_vec_float_cmp_mask (rtx fp_vector, rtx_code code, rtx fp_scalar,
+                           machine_mode vec_fp_mode)
+{
+  /* Step-1: Get the abs float value for mask generation. */
+  rtx tmp = gen_reg_rtx (vec_fp_mode);
+  rtx abs_ops[] = {tmp, fp_vector};
+  insn_code icode = code_for_pred (ABS, vec_fp_mode);
+  emit_vlmax_insn (icode, UNARY_OP, abs_ops);
+
+  /* Step-2: Prepare the scalar float compare register. */
+  rtx fp_reg = gen_reg_rtx (GET_MODE_INNER (vec_fp_mode));
+  emit_insn (gen_move_insn (fp_reg, fp_scalar));
+
+  /* Step-3: Prepare the vector float compare register. */
+  rtx vec_dup = gen_reg_rtx (vec_fp_mode);
+  icode = code_for_pred_broadcast (vec_fp_mode);
+  rtx vfmv_ops[] = {vec_dup, fp_reg};
+  emit_vlmax_insn (icode, UNARY_OP, vfmv_ops);
+
+  /* Step-4: Generate the mask. */
+  machine_mode mask_mode = get_mask_mode (vec_fp_mode);
+  rtx mask = gen_reg_rtx (mask_mode);
+  expand_vec_cmp (mask, code, tmp, vec_dup);
+
+  return mask;
+}
+
+static void
+expand_vec_copysign (rtx op_dest, rtx op_src_0, rtx op_src_1,
+                     machine_mode vec_mode)
+{
+  rtx sgnj_ops[] = {op_dest, op_src_0, op_src_1};
+  insn_code icode = code_for_pred (UNSPEC_VCOPYSIGN, vec_mode);
+
+  emit_vlmax_insn (icode, BINARY_OP, sgnj_ops);
+}
+
+void
+expand_vec_ceil (rtx op_0, rtx op_1, machine_mode vec_fp_mode,
+                 machine_mode vec_int_mode)
+{
+  /* Step-1: Generate the mask on const fp. */
+  rtx const_fp = gen_ceil_const_fp (GET_MODE_INNER (vec_fp_mode));
+  rtx mask = expand_vec_float_cmp_mask (op_1, LT, const_fp, vec_fp_mode);
+
+  /* Step-2: Convert to integer on mask, with rounding up (aka ceil). */
+  rtx tmp = gen_reg_rtx (vec_int_mode);
+  rtx cvt_x_ops[] = {tmp, mask, tmp, op_1};
+  insn_code icode = code_for_pred_fcvt_x_f (UNSPEC_VFCVT, vec_fp_mode);
+  emit_vlmax_insn (icode, UNARY_OP_TAMU_FRM_RUP, cvt_x_ops);
+
+  /* Step-3: Convert to floating-point on mask for the final result.
+     To avoid unnecessary frm register access, we use RUP here and it will
+     never do the rounding up because the tmp rtx comes from the float
+     to int conversion. */
+  rtx cvt_fp_ops[] = {op_0, mask, op_1, tmp};
+  icode = code_for_pred (FLOAT, vec_fp_mode);
+  emit_vlmax_insn (icode, UNARY_OP_TAMU_FRM_RUP, cvt_fp_ops);
+
+  /* Step-4: Retrieve the sign bit. */
+  expand_vec_copysign (op_0, op_0, op_1, vec_fp_mode);
+}
+
 } // namespace riscv_vector
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 36f0256c747..73f90dea36b 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -7438,7 +7438,7 @@ (define_insn "@pred_fcvt_x<v_su>_f<mode>"
              (reg:SI VTYPE_REGNUM)
              (reg:SI FRM_REGNUM)] UNSPEC_VPREDICATE)
           (unspec:<VCONVERT>
-             [(match_operand:VF 3 "register_operand" " vr, vr, vr, vr")] VFCVTS)
+             [(match_operand:V_VLSF 3 "register_operand" " vr, vr, vr, vr")] VFCVTS)
           (match_operand:<VCONVERT> 2 "vector_merge_operand" " vu, 0, vu, 0")))]
   "TARGET_VECTOR"
   "vfcvt.x<v_su>.f.v\t%0,%3%p1"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-0.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-0.c
new file mode 100644
index 00000000000..88a2ac4b338
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-0.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "test-math.h"
+
+/*
+** test__Float16___builtin_ceilf16:
+** frrm\s+[atx][0-9]+
+** ...
+** fsrmi\s+3
+** ...
+** vsetvli\s+[atx][0-9]+,\s*zero,\s*e16,\s*m1,\s*ta,\s*mu
+** vfabs\.v\s+v[0-9]+,\s*v[0-9]+
+** ...
+** vmflt\.vv\s+v0,\s*v[0-9]+,\s*v[0-9]+
+** ...
+** vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
+** ...
+** vfcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
+** vfsgnj\.vv\s+v[0-9]+,v[0-9]+,v[0-9]+
+** ...
+** fsrm\s+[atx][0-9]+
+** ...
+*/
+TEST_CEIL(_Float16, __builtin_ceilf16)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-1.c
new file mode 100644
index 00000000000..0908ef269bd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-1.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "test-math.h"
+
+/*
+** test_float___builtin_ceilf:
+** frrm\s+[atx][0-9]+
+** ...
+** fsrmi\s+3
+** ...
+** vsetvli\s+[atx][0-9]+,\s*zero,\s*e32,\s*m1,\s*ta,\s*mu
+** vfabs\.v\s+v[0-9]+,\s*v[0-9]+
+** ...
+** vmflt\.vv\s+v0,\s*v[0-9]+,\s*v[0-9]+
+** ...
+** vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
+** ...
+** vfcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
+** vfsgnj\.vv\s+v[0-9]+,v[0-9]+,v[0-9]+
+** ...
+** fsrm\s+[atx][0-9]+
+** ...
+*/
+TEST_CEIL(float, __builtin_ceilf)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-2.c
new file mode 100644
index 00000000000..65d4807edef
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-2.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "test-math.h"
+
+/*
+** test_double___builtin_ceil:
+** frrm\s+[atx][0-9]+
+** ...
+** fsrmi\s+3
+** ...
+** vsetvli\s+[atx][0-9]+,\s*zero,\s*e64,\s*m1,\s*ta,\s*mu
+** vfabs\.v\s+v[0-9]+,\s*v[0-9]+
+** ...
+** vmflt\.vv\s+v0,\s*v[0-9]+,\s*v[0-9]+
+** ...
+** vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
+** ...
+** vfcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
+** vfsgnj\.vv\s+v[0-9]+,v[0-9]+,v[0-9]+
+** ...
+** fsrm\s+[atx][0-9]+
+** ...
+*/
+TEST_CEIL(double, __builtin_ceil)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-3.c
new file mode 100644
index 00000000000..416698a753e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-3.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "test-math.h"
+
+/*
+** test_float___builtin_ceilf:
+** frrm\s+[atx][0-9]+
+** ...
+** fsrmi\s+3
+** ...
+** vsetvli\s+[atx][0-9]+,\s*zero,\s*e32,\s*m1,\s*ta,\s*mu
+** vfabs\.v\s+v[0-9]+,\s*v[0-9]+
+** ...
+** vmflt\.vv\s+v0,\s*v[0-9]+,\s*v[0-9]+
+** ...
+** vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
+** ...
+** vfcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
+** vfsgnj\.vv\s+v[0-9]+,v[0-9]+,v[0-9]+
+** ...
+** vmerge\.vvm\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+,\s*v0
+** ...
+** fsrm\s+[atx][0-9]+
+** ...
+*/
+TEST_COND_CEIL(float, __builtin_ceilf)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-0.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-0.c
new file mode 100644
index 00000000000..f1946e197cc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-0.c
@@ -0,0 +1,39 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "-march=rv64gcv_zvfh -std=c2x -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math" } */
+
+#include "test-math.h"
+
+#define ARRAY_SIZE 128
+
+_Float16 in[ARRAY_SIZE];
+_Float16 out[ARRAY_SIZE];
+_Float16 ref[ARRAY_SIZE];
+
+TEST_CEIL (_Float16, __builtin_ceilf16)
+TEST_ASSERT (_Float16)
+
+TEST_INIT (_Float16, 1.2, 2.0, 1)
+TEST_INIT (_Float16, -1.2, -1.0, 2)
+TEST_INIT (_Float16, 3.0, 3.0, 3)
+TEST_INIT (_Float16, 1023.5, 1024.0, 4)
+TEST_INIT (_Float16, 1025.0, 1025.0, 5)
+TEST_INIT (_Float16, 0.0, 0.0, 6)
+TEST_INIT (_Float16, -0.0, -0.0, 7)
+TEST_INIT (_Float16, -1023.5, -1023.0, 8)
+TEST_INIT (_Float16, -1024.0, -1024.0, 9)
+
+int
+main ()
+{
+  RUN_TEST (_Float16, 1, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 2, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 3, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 4, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 5, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 6, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 7, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 8, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 9, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-1.c
new file mode 100644
index 00000000000..202944ddd92
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-1.c
@@ -0,0 +1,39 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "-march=rv64gcv -std=c99 -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math" } */
+
+#include "test-math.h"
+
+#define ARRAY_SIZE 128
+
+float in[ARRAY_SIZE];
+float out[ARRAY_SIZE];
+float ref[ARRAY_SIZE];
+
+TEST_CEIL (float, __builtin_ceilf)
+TEST_ASSERT (float)
+
+TEST_INIT (float, 1.2, 2.0, 1)
+TEST_INIT (float, -1.2, -1.0, 2)
+TEST_INIT (float, 3.0, 3.0, 3)
+TEST_INIT (float, 8388607.5, 8388608.0, 4)
+TEST_INIT (float, 8388609.0, 8388609.0, 5)
+TEST_INIT (float, 0.0, 0.0, 6)
+TEST_INIT (float, -0.0, -0.0, 7)
+TEST_INIT (float, -8388607.5, -8388607.0, 8)
+TEST_INIT (float, -8388608.0, -8388608.0, 9)
+
+int
+main ()
+{
+  RUN_TEST (float, 1, __builtin_ceilf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (float, 2, __builtin_ceilf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (float, 3, __builtin_ceilf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (float, 4, __builtin_ceilf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (float, 5, __builtin_ceilf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (float, 6, __builtin_ceilf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (float, 7, __builtin_ceilf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (float, 8, __builtin_ceilf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (float, 9, __builtin_ceilf, in, out, ref, ARRAY_SIZE);
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-2.c
new file mode 100644
index 00000000000..f0ff9bca0af
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-2.c
@@ -0,0 +1,39 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "-march=rv64gcv -std=c99 -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math" } */
+
+#include "test-math.h"
+
+#define ARRAY_SIZE 128
+
+double in[ARRAY_SIZE];
+double out[ARRAY_SIZE];
+double ref[ARRAY_SIZE];
+
+TEST_CEIL (double, __builtin_ceil)
+TEST_ASSERT (double)
+
+TEST_INIT (double, 1.2, 2.0, 1)
+TEST_INIT (double, -1.2, -1.0, 2)
+TEST_INIT (double, 3.0, 3.0, 3)
+TEST_INIT (double, 4503599627370495.5, 4503599627370496.0, 4)
+TEST_INIT (double, 4503599627370497.0, 4503599627370497.0, 5)
+TEST_INIT (double, 0.0, 0.0, 6)
+TEST_INIT (double, -0.0, -0.0, 7)
+TEST_INIT (double, -4503599627370495.5, -4503599627370495.0, 8)
+TEST_INIT (double, -4503599627370496.0, -4503599627370496.0, 9)
+
+int
+main ()
+{
+  RUN_TEST (double, 1, __builtin_ceil, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (double, 2, __builtin_ceil, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (double, 3, __builtin_ceil, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (double, 4, __builtin_ceil, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (double, 5, __builtin_ceil, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (double, 6, __builtin_ceil, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (double, 7, __builtin_ceil, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (double, 8, __builtin_ceil, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (double, 9, __builtin_ceil, in, out, ref, ARRAY_SIZE);
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/test-math.h b/gcc/testsuite/gcc.target/riscv/rvv/autovec/test-math.h
new file mode 100644
index 00000000000..6e913da37f4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/test-math.h
@@ -0,0 +1,38 @@
+#define TEST_CEIL(TYPE, CALL) \
+  void test_##TYPE##_##CALL (TYPE *out, TYPE *in, unsigned count) \
+  { \
+    for (unsigned i = 0; i < count; i++) \
+      out[i] = CALL (in[i]); \
+  }
+
+#define TEST_COND_CEIL(TYPE, CALL) \
+  void test_##TYPE##_##CALL (TYPE *out, int *cond, TYPE *in, unsigned count) \
+  { \
+    for (unsigned i = 0; i < count; i++) \
+      out[i] = cond[i] ? CALL (in[i]) : in[i]; \
+  }
+
+#define TEST_INIT(TYPE, VAL_IN, VAL_REF, NUM) \
+  void test_##TYPE##_init_##NUM (TYPE *in, TYPE *ref, unsigned size) \
+  { \
+    for (unsigned i = 0; i < size; i++) \
+      { \
+        in[i] = VAL_IN; \
+        ref[i] = VAL_REF; \
+      } \
+  }
+
+#define TEST_ASSERT(TYPE) \
+  void test_##TYPE##_assert (TYPE *out, TYPE *ref, unsigned size) \
+  { \
+    for (unsigned i = 0; i < size; i++) \
+      { \
+        if (out[i] != ref[i]) \
+          __builtin_abort (); \
+      } \
+  }
+
+#define RUN_TEST(TYPE, NUM, CALL, IN, OUT, REF, SIZE) \
+  test_##TYPE##_init_##NUM (IN, REF, SIZE); \
+  test_##TYPE##_##CALL (OUT, IN, SIZE); \
+  test_##TYPE##_assert (OUT, REF, SIZE);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-ceil-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-ceil-1.c
new file mode 100644
index 00000000000..b113df80c5f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-ceil-1.c
@@ -0,0 +1,56 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfh_zvl4096b -mabi=lp64d -O3 --param=riscv-autovec-lmul=m8 -ffast-math -fdump-tree-optimized" } */
+
+#include "def.h"
+
+DEF_OP_V (ceilf16, 1, _Float16, __builtin_ceilf16)
+DEF_OP_V (ceilf16, 2, _Float16, __builtin_ceilf16)
+DEF_OP_V (ceilf16, 4, _Float16, __builtin_ceilf16)
+DEF_OP_V (ceilf16, 8, _Float16, __builtin_ceilf16)
+DEF_OP_V (ceilf16, 16, _Float16, __builtin_ceilf16)
+DEF_OP_V (ceilf16, 32, _Float16, __builtin_ceilf16)
+DEF_OP_V (ceilf16, 64, _Float16, __builtin_ceilf16)
+DEF_OP_V (ceilf16, 128, _Float16, __builtin_ceilf16)
+DEF_OP_V (ceilf16, 256, _Float16, __builtin_ceilf16)
+DEF_OP_V (ceilf16, 512, _Float16, __builtin_ceilf16)
+DEF_OP_V (ceilf16, 1024, _Float16, __builtin_ceilf16)
+DEF_OP_V (ceilf16, 2048, _Float16, __builtin_ceilf16)
+
+DEF_OP_V (ceilf, 1, float, __builtin_ceilf)
+DEF_OP_V (ceilf, 2, float, __builtin_ceilf)
+DEF_OP_V (ceilf, 4, float, __builtin_ceilf)
+DEF_OP_V (ceilf, 8, float, __builtin_ceilf)
+DEF_OP_V (ceilf, 16, float, __builtin_ceilf)
+DEF_OP_V (ceilf, 32, float, __builtin_ceilf)
+DEF_OP_V (ceilf, 64, float, __builtin_ceilf)
+DEF_OP_V (ceilf, 128, float, __builtin_ceilf)
+DEF_OP_V (ceilf, 256, float, __builtin_ceilf)
+DEF_OP_V (ceilf, 512, float, __builtin_ceilf)
+DEF_OP_V (ceilf, 1024, float, __builtin_ceilf)
+
+DEF_OP_V (ceil, 1, double, __builtin_ceil)
+DEF_OP_V (ceil, 2, double, __builtin_ceil)
+DEF_OP_V (ceil, 4, double, __builtin_ceil)
+DEF_OP_V (ceil, 8, double, __builtin_ceil)
+DEF_OP_V (ceil, 16, double, __builtin_ceil)
+DEF_OP_V (ceil, 32, double, __builtin_ceil)
+DEF_OP_V (ceil, 64, double, __builtin_ceil)
+DEF_OP_V (ceil, 128, double, __builtin_ceil)
+DEF_OP_V (ceil, 256, double, __builtin_ceil)
+DEF_OP_V (ceil, 512, double, __builtin_ceil)
+
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-not "1,1" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "2,2" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "4,4" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "16,16" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "32,32" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "64,64" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "128,128" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "256,256" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "512,512" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "1024,1024" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "2048,2048" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "4096,4096" "optimized" } } */
+/* { dg-final { scan-assembler-times {vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t} 30 } } */
+/* { dg-final { scan-assembler-times {vfcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t} 30 } } */
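For readers who want to check the expansion without an RVV target, the four steps of expand_vec_ceil can be modelled per lane in plain C. This is only a sketch under stated assumptions, not code from the patch: ceil_lane_model, float_bits, double_bits and ceil_model_self_check are hypothetical names, and the RUP rounding mode is emulated with a truncating cast plus an adjustment because standard C has no per-conversion rounding mode. The self check also confirms the bit patterns at the limits; note that 4503599627370497.0 encodes as 0x4330000000000001.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Raw IEEE 754 bit pattern of a float (hypothetical helper).  */
static uint32_t
float_bits (float x)
{
  uint32_t bits;
  memcpy (&bits, &x, sizeof bits);
  return bits;
}

/* Raw IEEE 754 bit pattern of a double (hypothetical helper).  */
static uint64_t
double_bits (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  return bits;
}

/* Hypothetical scalar model of one SFmode lane of the expansion.  */
static float
ceil_lane_model (float x)
{
  const float limit = 8388608.0f;   /* 2^23, the SFmode constant.  */
  float result = x;                 /* Masked-off lanes keep the input.  */

  if (fabsf (x) < limit)            /* Step 1: vfabs.v + vmflt.vv.  */
    {
      int32_t i = (int32_t) x;      /* Truncates toward zero.  */
      if ((float) i < x)            /* Step 2: emulate RUP; truncation
                                       rounded down only when i < x.  */
        i += 1;
      result = (float) i;           /* Step 3: exact, since |i| <= 2^23.  */
    }

  return copysignf (result, x);     /* Step 4: vfsgnj.vv keeps -0.0.  */
}

/* Inputs and expected outputs mirror math-ceil-run-1.c.  */
static void
ceil_model_self_check (void)
{
  assert (float_bits (8388607.5f) == 0x4affffffu);
  assert (float_bits (8388608.0f) == 0x4b000000u);
  assert (float_bits (8388609.0f) == 0x4b000001u);
  assert (double_bits (4503599627370495.5) == 0x432fffffffffffffULL);
  assert (double_bits (4503599627370496.0) == 0x4330000000000000ULL);
  assert (double_bits (4503599627370497.0) == 0x4330000000000001ULL);

  assert (ceil_lane_model (1.2f) == 2.0f);
  assert (ceil_lane_model (-1.2f) == -1.0f);
  assert (ceil_lane_model (3.0f) == 3.0f);
  assert (ceil_lane_model (8388607.5f) == 8388608.0f);
  assert (ceil_lane_model (8388609.0f) == 8388609.0f);
  assert (ceil_lane_model (-8388607.5f) == -8388607.0f);
  assert (ceil_lane_model (-8388608.0f) == -8388608.0f);

  /* Without step 4, -0.5 would come back as +0.0.  */
  assert (signbit (ceil_lane_model (-0.5f)));
  assert (signbit (ceil_lane_model (-0.0f)));
}
```

Calling ceil_model_self_check () from a main () is enough to run it. The -0.5 case shows why the final sign-injection step matters: the integer round trip alone yields +0.0, while ceilf (-0.5) is -0.0.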
Committed, thanks Juzhe.
Pan
From: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>
Sent: Friday, September 22, 2023 8:47 AM
To: Li, Pan2 <pan2.li@intel.com>; gcc-patches <gcc-patches@gcc.gnu.org>
Cc: Li, Pan2 <pan2.li@intel.com>; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng <kito.cheng@gmail.com>
Subject: Re: [PATCH v4] RISC-V: Support ceil and ceilf auto-vectorization
LGTM
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index f0f1abc4e82..1b4bd82f9ec 100644 --- a/gcc/config/riscv/autovec.md +++ b/gcc/config/riscv/autovec.md @@ -2239,3 +2239,19 @@ (define_expand "<u>avg<v_double_trunc>3_ceil" riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, ops3); DONE; }) + +;; ------------------------------------------------------------------------- +;; ---- [FP] Math.h. +;; ------------------------------------------------------------------------- +;; Includes: +;; - ceil/ceilf +;; ------------------------------------------------------------------------- +(define_expand "ceil<mode>2" + [(match_operand:V_VLSF 0 "register_operand") + (match_operand:V_VLSF 1 "register_operand")] + "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math" + { + riscv_vector::expand_vec_ceil (operands[0], operands[1], <MODE>mode, <VCONVERT>mode); + DONE; + } +) diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h index 9ea0bcf15d3..07b4ffe3edf 100644 --- a/gcc/config/riscv/riscv-protos.h +++ b/gcc/config/riscv/riscv-protos.h @@ -250,6 +250,9 @@ enum insn_flags : unsigned int /* flags for the floating-point rounding mode. */ /* Means INSN has FRM operand and the value is FRM_DYN. */ FRM_DYN_P = 1 << 15, + + /* Means INSN has FRM operand and the value is FRM_RUP. */ + FRM_RUP_P = 1 << 16, }; enum insn_type : unsigned int @@ -290,6 +293,7 @@ enum insn_type : unsigned int UNARY_OP_TAMA = __MASK_OP_TAMA | UNARY_OP_P, UNARY_OP_TAMU = __MASK_OP_TAMU | UNARY_OP_P, UNARY_OP_FRM_DYN = UNARY_OP | FRM_DYN_P, + UNARY_OP_TAMU_FRM_RUP = UNARY_OP_TAMU | FRM_RUP_P, /* Binary operator. 
*/ BINARY_OP = __NORMAL_OP | BINARY_OP_P, @@ -432,6 +436,7 @@ bool expand_vec_cmp_float (rtx, rtx_code, rtx, rtx, bool); void expand_cond_len_unop (unsigned, rtx *); void expand_cond_len_binop (unsigned, rtx *); void expand_reduction (unsigned, unsigned, rtx *, rtx); +void expand_vec_ceil (rtx, rtx, machine_mode, machine_mode); #endif bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode, bool, void (*)(rtx *, rtx)); diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index 4b9a494f8eb..f63dec573ef 100644 --- a/gcc/config/riscv/riscv-v.cc +++ b/gcc/config/riscv/riscv-v.cc @@ -323,6 +323,8 @@ public: /* Add rounding mode operand. */ if (m_insn_flags & FRM_DYN_P) add_rounding_mode_operand (FRM_DYN); + if (m_insn_flags & FRM_RUP_P) + add_rounding_mode_operand (FRM_RUP); gcc_assert (insn_data[(int) icode].n_operands == m_opno); expand (icode, any_mem_p); @@ -3439,4 +3441,135 @@ cmp_lmul_gt_one (machine_mode mode) return false; } +/* We don't have to convert the floating point to integer when the + mantissa is zero. Thus, ther will be a limitation for both the + single and double precision floating point. There will be no + mantissa if the floating point is greater than the limit. + + 1. Half floating point. + +-----------+---------------+ + | float | binary layout | + +-----------+---------------+ + | 1023.5 | 0x63ff | + +-----------+---------------+ + | 1024.0 | 0x6400 | + +-----------+---------------+ + | 1025.0 | 0x6401 | + +-----------+---------------+ + | ... | ... | + + All half floating point will be unchanged for ceil if it is + greater than and equal to 1024. + + 2. Single floating point. + +-----------+---------------+ + | float | binary layout | + +-----------+---------------+ + | 8388607.5 | 0x4affffff | + +-----------+---------------+ + | 8388608.0 | 0x4b000000 | + +-----------+---------------+ + | 8388609.0 | 0x4b000001 | + +-----------+---------------+ + | ... | ... 
|
+
+     All single floating point will be unchanged for ceil if it is
+     greater than or equal to 8388608.
+
+  3. Double floating point.
+     +--------------------+--------------------+
+     | float              | binary layout      |
+     +--------------------+--------------------+
+     | 4503599627370495.5 | 0X432fffffffffffff |
+     +--------------------+--------------------+
+     | 4503599627370496.0 | 0X4330000000000000 |
+     +--------------------+--------------------+
+     | 4503599627370497.0 | 0X4330000000000001 |
+     +--------------------+--------------------+
+     | ...                | ...                |
+
+     All double floating point will be unchanged for ceil if it is
+     greater than or equal to 4503599627370496.
+ */
+static rtx
+gen_ceil_const_fp (machine_mode inner_mode)
+{
+  REAL_VALUE_TYPE real;
+
+  if (inner_mode == E_HFmode)
+    real_from_integer (&real, inner_mode, 1024, SIGNED);
+  else if (inner_mode == E_SFmode)
+    real_from_integer (&real, inner_mode, 8388608, SIGNED);
+  else if (inner_mode == E_DFmode)
+    real_from_integer (&real, inner_mode, 4503599627370496, SIGNED);
+  else
+    gcc_unreachable ();
+
+  return const_double_from_real_value (real, inner_mode);
+}
+
+static rtx
+expand_vec_float_cmp_mask (rtx fp_vector, rtx_code code, rtx fp_scalar,
+                           machine_mode vec_fp_mode)
+{
+  /* Step-1: Get the abs float value for mask generation.  */
+  rtx tmp = gen_reg_rtx (vec_fp_mode);
+  rtx abs_ops[] = {tmp, fp_vector};
+  insn_code icode = code_for_pred (ABS, vec_fp_mode);
+  emit_vlmax_insn (icode, UNARY_OP, abs_ops);
+
+  /* Step-2: Prepare the scalar float compare register.  */
+  rtx fp_reg = gen_reg_rtx (GET_MODE_INNER (vec_fp_mode));
+  emit_insn (gen_move_insn (fp_reg, fp_scalar));
+
+  /* Step-3: Prepare the vector float compare register.  */
+  rtx vec_dup = gen_reg_rtx (vec_fp_mode);
+  icode = code_for_pred_broadcast (vec_fp_mode);
+  rtx vfmv_ops[] = {vec_dup, fp_reg};
+  emit_vlmax_insn (icode, UNARY_OP, vfmv_ops);
+
+  /* Step-4: Generate the mask.  */
+  machine_mode mask_mode = get_mask_mode (vec_fp_mode);
+  rtx mask = gen_reg_rtx (mask_mode);
+  expand_vec_cmp (mask, code, tmp, vec_dup);
+
+  return mask;
+}
+
+static void
+expand_vec_copysign (rtx op_dest, rtx op_src_0, rtx op_src_1,
+                     machine_mode vec_mode)
+{
+  rtx sgnj_ops[] = {op_dest, op_src_0, op_src_1};
+  insn_code icode = code_for_pred (UNSPEC_VCOPYSIGN, vec_mode);
+
+  emit_vlmax_insn (icode, BINARY_OP, sgnj_ops);
+}
+
+void
+expand_vec_ceil (rtx op_0, rtx op_1, machine_mode vec_fp_mode,
+                 machine_mode vec_int_mode)
+{
+  /* Step-1: Generate the mask on const fp.  */
+  rtx const_fp = gen_ceil_const_fp (GET_MODE_INNER (vec_fp_mode));
+  rtx mask = expand_vec_float_cmp_mask (op_1, LT, const_fp, vec_fp_mode);
+
+  /* Step-2: Convert to integer on mask, with rounding up (aka ceil).  */
+  rtx tmp = gen_reg_rtx (vec_int_mode);
+  rtx cvt_x_ops[] = {tmp, mask, tmp, op_1};
+  insn_code icode = code_for_pred_fcvt_x_f (UNSPEC_VFCVT, vec_fp_mode);
+  emit_vlmax_insn (icode, UNARY_OP_TAMU_FRM_RUP, cvt_x_ops);
+
+  /* Step-3: Convert to floating-point on mask for the final result.
+     To avoid unnecessary frm register access, we use RUP here and it will
+     never do the rounding up because the tmp rtx comes from the float
+     to int conversion.  */
+  rtx cvt_fp_ops[] = {op_0, mask, op_1, tmp};
+  icode = code_for_pred (FLOAT, vec_fp_mode);
+  emit_vlmax_insn (icode, UNARY_OP_TAMU_FRM_RUP, cvt_fp_ops);
+
+  /* Step-4: Retrieve the sign bit.  */
+  expand_vec_copysign (op_0, op_0, op_1, vec_fp_mode);
+}
+
 } // namespace riscv_vector
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 36f0256c747..73f90dea36b 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -7438,7 +7438,7 @@ (define_insn "@pred_fcvt_x<v_su>_f<mode>"
              (reg:SI VTYPE_REGNUM)
              (reg:SI FRM_REGNUM)] UNSPEC_VPREDICATE)
           (unspec:<VCONVERT>
-             [(match_operand:VF 3 "register_operand" " vr, vr, vr, vr")] VFCVTS)
+             [(match_operand:V_VLSF 3 "register_operand" " vr, vr, vr, vr")] VFCVTS)
           (match_operand:<VCONVERT> 2 "vector_merge_operand" " vu, 0, vu, 0")))]
   "TARGET_VECTOR"
   "vfcvt.x<v_su>.f.v\t%0,%3%p1"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-0.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-0.c
new file mode 100644
index 00000000000..88a2ac4b338
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-0.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "test-math.h"
+
+/*
+** test__Float16___builtin_ceilf16:
+**	frrm\s+[atx][0-9]+
+**	...
+**	fsrmi\s+3
+**	...
+**	vsetvli\s+[atx][0-9]+,\s*zero,\s*e16,\s*m1,\s*ta,\s*mu
+**	vfabs\.v\s+v[0-9]+,\s*v[0-9]+
+**	...
+**	vmflt\.vv\s+v0,\s*v[0-9]+,\s*v[0-9]+
+**	...
+**	vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
+**	...
+**	vfcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
+**	vfsgnj\.vv\s+v[0-9]+,v[0-9]+,v[0-9]+
+**	...
+**	fsrm\s+[atx][0-9]+
+**	...
+*/
+TEST_CEIL(_Float16, __builtin_ceilf16)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-1.c
new file mode 100644
index 00000000000..0908ef269bd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-1.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "test-math.h"
+
+/*
+** test_float___builtin_ceilf:
+**	frrm\s+[atx][0-9]+
+**	...
+**	fsrmi\s+3
+**	...
+**	vsetvli\s+[atx][0-9]+,\s*zero,\s*e32,\s*m1,\s*ta,\s*mu
+**	vfabs\.v\s+v[0-9]+,\s*v[0-9]+
+**	...
+**	vmflt\.vv\s+v0,\s*v[0-9]+,\s*v[0-9]+
+**	...
+**	vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
+**	...
+**	vfcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
+**	vfsgnj\.vv\s+v[0-9]+,v[0-9]+,v[0-9]+
+**	...
+**	fsrm\s+[atx][0-9]+
+**	...
+*/
+TEST_CEIL(float, __builtin_ceilf)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-2.c
new file mode 100644
index 00000000000..65d4807edef
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-2.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "test-math.h"
+
+/*
+** test_double___builtin_ceil:
+**	frrm\s+[atx][0-9]+
+**	...
+**	fsrmi\s+3
+**	...
+**	vsetvli\s+[atx][0-9]+,\s*zero,\s*e64,\s*m1,\s*ta,\s*mu
+**	vfabs\.v\s+v[0-9]+,\s*v[0-9]+
+**	...
+**	vmflt\.vv\s+v0,\s*v[0-9]+,\s*v[0-9]+
+**	...
+**	vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
+**	...
+**	vfcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
+**	vfsgnj\.vv\s+v[0-9]+,v[0-9]+,v[0-9]+
+**	...
+**	fsrm\s+[atx][0-9]+
+**	...
+*/
+TEST_CEIL(double, __builtin_ceil)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-3.c
new file mode 100644
index 00000000000..416698a753e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-3.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "test-math.h"
+
+/*
+** test_float___builtin_ceilf:
+**	frrm\s+[atx][0-9]+
+**	...
+**	fsrmi\s+3
+**	...
+**	vsetvli\s+[atx][0-9]+,\s*zero,\s*e32,\s*m1,\s*ta,\s*mu
+**	vfabs\.v\s+v[0-9]+,\s*v[0-9]+
+**	...
+**	vmflt\.vv\s+v0,\s*v[0-9]+,\s*v[0-9]+
+**	...
+**	vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
+**	...
+**	vfcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
+**	vfsgnj\.vv\s+v[0-9]+,v[0-9]+,v[0-9]+
+**	...
+**	vmerge\.vvm\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+,\s*v0
+**	...
+**	fsrm\s+[atx][0-9]+
+**	...
+*/
+TEST_COND_CEIL(float, __builtin_ceilf)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-0.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-0.c
new file mode 100644
index 00000000000..f1946e197cc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-0.c
@@ -0,0 +1,39 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "-march=rv64gcv_zvfh -std=c2x -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math" } */
+
+#include "test-math.h"
+
+#define ARRAY_SIZE 128
+
+_Float16 in[ARRAY_SIZE];
+_Float16 out[ARRAY_SIZE];
+_Float16 ref[ARRAY_SIZE];
+
+TEST_CEIL (_Float16, __builtin_ceilf16)
+TEST_ASSERT (_Float16)
+
+TEST_INIT (_Float16, 1.2, 2.0, 1)
+TEST_INIT (_Float16, -1.2, -1.0, 2)
+TEST_INIT (_Float16, 3.0, 3.0, 3)
+TEST_INIT (_Float16, 1023.5, 1024.0, 4)
+TEST_INIT (_Float16, 1025.0, 1025.0, 5)
+TEST_INIT (_Float16, 0.0, 0.0, 6)
+TEST_INIT (_Float16, -0.0, -0.0, 7)
+TEST_INIT (_Float16, -1023.5, -1023.0, 8)
+TEST_INIT (_Float16, -1024.0, -1024.0, 9)
+
+int
+main ()
+{
+  RUN_TEST (_Float16, 1, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 2, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 3, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 4, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 5, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 6, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 7, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 8, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (_Float16, 9, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-1.c
new file mode 100644
index 00000000000..202944ddd92
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-1.c
@@ -0,0 +1,39 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "-march=rv64gcv -std=c99 -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math" } */
+
+#include "test-math.h"
+
+#define ARRAY_SIZE 128
+
+float in[ARRAY_SIZE];
+float out[ARRAY_SIZE];
+float ref[ARRAY_SIZE];
+
+TEST_CEIL (float, __builtin_ceilf)
+TEST_ASSERT (float)
+
+TEST_INIT (float, 1.2, 2.0, 1)
+TEST_INIT (float, -1.2, -1.0, 2)
+TEST_INIT (float, 3.0, 3.0, 3)
+TEST_INIT (float, 8388607.5, 8388608.0, 4)
+TEST_INIT (float, 8388609.0, 8388609.0, 5)
+TEST_INIT (float, 0.0, 0.0, 6)
+TEST_INIT (float, -0.0, -0.0, 7)
+TEST_INIT (float, -8388607.5, -8388607.0, 8)
+TEST_INIT (float, -8388608.0, -8388608.0, 9)
+
+int
+main ()
+{
+  RUN_TEST (float, 1, __builtin_ceilf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (float, 2, __builtin_ceilf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (float, 3, __builtin_ceilf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (float, 4, __builtin_ceilf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (float, 5, __builtin_ceilf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (float, 6, __builtin_ceilf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (float, 7, __builtin_ceilf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (float, 8, __builtin_ceilf, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (float, 9, __builtin_ceilf, in, out, ref, ARRAY_SIZE);
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-2.c
new file mode 100644
index 00000000000..f0ff9bca0af
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-2.c
@@ -0,0 +1,39 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "-march=rv64gcv -std=c99 -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math" } */
+
+#include "test-math.h"
+
+#define ARRAY_SIZE 128
+
+double in[ARRAY_SIZE];
+double out[ARRAY_SIZE];
+double ref[ARRAY_SIZE];
+
+TEST_CEIL (double, __builtin_ceil)
+TEST_ASSERT (double)
+
+TEST_INIT (double, 1.2, 2.0, 1)
+TEST_INIT (double, -1.2, -1.0, 2)
+TEST_INIT (double, 3.0, 3.0, 3)
+TEST_INIT (double, 4503599627370495.5, 4503599627370496.0, 4)
+TEST_INIT (double, 4503599627370497.0, 4503599627370497.0, 5)
+TEST_INIT (double, 0.0, 0.0, 6)
+TEST_INIT (double, -0.0, -0.0, 7)
+TEST_INIT (double, -4503599627370495.5, -4503599627370495.0, 8)
+TEST_INIT (double, -4503599627370496.0, -4503599627370496.0, 9)
+
+int
+main ()
+{
+  RUN_TEST (double, 1, __builtin_ceil, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (double, 2, __builtin_ceil, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (double, 3, __builtin_ceil, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (double, 4, __builtin_ceil, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (double, 5, __builtin_ceil, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (double, 6, __builtin_ceil, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (double, 7, __builtin_ceil, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (double, 8, __builtin_ceil, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (double, 9, __builtin_ceil, in, out, ref, ARRAY_SIZE);
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/test-math.h b/gcc/testsuite/gcc.target/riscv/rvv/autovec/test-math.h
new file mode 100644
index 00000000000..6e913da37f4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/test-math.h
@@ -0,0 +1,38 @@
+#define TEST_CEIL(TYPE, CALL) \
+  void test_##TYPE##_##CALL (TYPE *out, TYPE *in, unsigned count) \
+  { \
+    for (unsigned i = 0; i < count; i++) \
+      out[i] = CALL (in[i]); \
+  }
+
+#define TEST_COND_CEIL(TYPE, CALL) \
+  void test_##TYPE##_##CALL (TYPE *out, int *cond, TYPE *in, unsigned count) \
+  { \
+    for (unsigned i = 0; i < count; i++) \
+      out[i] = cond[i] ? CALL (in[i]) : in[i]; \
+  }
+
+#define TEST_INIT(TYPE, VAL_IN, VAL_REF, NUM) \
+  void test_##TYPE##_init_##NUM (TYPE *in, TYPE *ref, unsigned size) \
+  { \
+    for (unsigned i = 0; i < size; i++) \
+      { \
+        in[i] = VAL_IN; \
+        ref[i] = VAL_REF; \
+      } \
+  }
+
+#define TEST_ASSERT(TYPE) \
+  void test_##TYPE##_assert (TYPE *out, TYPE *ref, unsigned size) \
+  { \
+    for (unsigned i = 0; i < size; i++) \
+      { \
+        if (out[i] != ref[i]) \
+          __builtin_abort (); \
+      } \
+  }
+
+#define RUN_TEST(TYPE, NUM, CALL, IN, OUT, REF, SIZE) \
+  test_##TYPE##_init_##NUM (IN, REF, SIZE); \
+  test_##TYPE##_##CALL (OUT, IN, SIZE); \
+  test_##TYPE##_assert (OUT, REF, SIZE);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-ceil-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-ceil-1.c
new file mode 100644
index 00000000000..b113df80c5f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-ceil-1.c
@@ -0,0 +1,56 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfh_zvl4096b -mabi=lp64d -O3 --param=riscv-autovec-lmul=m8 -ffast-math -fdump-tree-optimized" } */
+
+#include "def.h"
+
+DEF_OP_V (ceilf16, 1, _Float16, __builtin_ceilf16)
+DEF_OP_V (ceilf16, 2, _Float16, __builtin_ceilf16)
+DEF_OP_V (ceilf16, 4, _Float16, __builtin_ceilf16)
+DEF_OP_V (ceilf16, 8, _Float16, __builtin_ceilf16)
+DEF_OP_V (ceilf16, 16, _Float16, __builtin_ceilf16)
+DEF_OP_V (ceilf16, 32, _Float16, __builtin_ceilf16)
+DEF_OP_V (ceilf16, 64, _Float16, __builtin_ceilf16)
+DEF_OP_V (ceilf16, 128, _Float16, __builtin_ceilf16)
+DEF_OP_V (ceilf16, 256, _Float16, __builtin_ceilf16)
+DEF_OP_V (ceilf16, 512, _Float16, __builtin_ceilf16)
+DEF_OP_V (ceilf16, 1024, _Float16, __builtin_ceilf16)
+DEF_OP_V (ceilf16, 2048, _Float16, __builtin_ceilf16)
+
+DEF_OP_V (ceilf, 1, float, __builtin_ceilf)
+DEF_OP_V (ceilf, 2, float, __builtin_ceilf)
+DEF_OP_V (ceilf, 4, float, __builtin_ceilf)
+DEF_OP_V (ceilf, 8, float, __builtin_ceilf)
+DEF_OP_V (ceilf, 16, float, __builtin_ceilf)
+DEF_OP_V (ceilf, 32, float, __builtin_ceilf)
+DEF_OP_V (ceilf, 64, float, __builtin_ceilf)
+DEF_OP_V (ceilf, 128, float, __builtin_ceilf)
+DEF_OP_V (ceilf, 256, float, __builtin_ceilf)
+DEF_OP_V (ceilf, 512, float, __builtin_ceilf)
+DEF_OP_V (ceilf, 1024, float, __builtin_ceilf)
+
+DEF_OP_V (ceil, 1, double, __builtin_ceil)
+DEF_OP_V (ceil, 2, double, __builtin_ceil)
+DEF_OP_V (ceil, 4, double, __builtin_ceil)
+DEF_OP_V (ceil, 8, double, __builtin_ceil)
+DEF_OP_V (ceil, 16, double, __builtin_ceil)
+DEF_OP_V (ceil, 32, double, __builtin_ceil)
+DEF_OP_V (ceil, 64, double, __builtin_ceil)
+DEF_OP_V (ceil, 128, double, __builtin_ceil)
+DEF_OP_V (ceil, 256, double, __builtin_ceil)
+DEF_OP_V (ceil, 512, double, __builtin_ceil)
+
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-not "1,1" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "2,2" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "4,4" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "16,16" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "32,32" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "64,64" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "128,128" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "256,256" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "512,512" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "1024,1024" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "2048,2048" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "4096,4096" "optimized" } } */
+/* { dg-final { scan-assembler-times {vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t} 30 } } */
+/* { dg-final { scan-assembler-times {vfcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t} 30 } } */
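
For readers following the four steps of expand_vec_ceil, here is a minimal scalar sketch in C of what a single float lane goes through. It is only an illustration under stated assumptions, not code from the patch: `ceil_lane_model` is a hypothetical name, and because a plain C cast truncates toward zero, the RUP conversion of vfcvt.x.f.v is emulated here with a truncate-and-adjust step.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical scalar model of one float lane of the masked sequence
   (vfabs.v, vmflt.vv, vfcvt.x.f.v, vfcvt.f.x.v, vfsgnj.vv).  */
static float
ceil_lane_model (float x)
{
  const float limit = 8388608.0f;   /* 2^23, as in gen_ceil_const_fp.  */
  float result = x;                 /* Masked-off lanes keep the input.  */

  /* Mask: |x| < 2^23.  NaN compares false and so passes through.  */
  if (fabsf (x) < limit)
    {
      /* Float to int with round-up.  The cast truncates toward zero,
         so bump the integer when truncation went below x.  */
      int32_t i = (int32_t) x;
      if ((float) i < x)
        i += 1;

      /* Int back to float; exact, since |i| <= 2^23.  */
      result = (float) i;

      /* Restore the sign of the input, so that e.g. -0.5 gives -0.0.  */
      result = copysignf (result, x);
    }

  return result;
}
```

Under this model, ceil_lane_model (8388607.5f) takes the conversion path and yields 8388608.0f, while 8388609.0f fails the mask test and is returned unchanged, which matches the reference values used in math-ceil-run-1.c.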