[v1] Internal-fn: Add new internal function SAT_ADDU

Message ID	20240217103029.3120318-1-pan2.li@intel.com
State	New
Headers	From: pan2.li@intel.com
	To: gcc-patches@gcc.gnu.org
	Cc: juzhe.zhong@rivai.ai, pan2.li@intel.com, yanzhang.wang@intel.com, kito.cheng@gmail.com, richard.guenther@gmail.com, Tamar.Christina@arm.com
	Subject: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU
	Date: Sat, 17 Feb 2024 18:30:29 +0800
	Message-Id: <20240217103029.3120318-1-pan2.li@intel.com>
Series	[v1] Internal-fn: Add new internal function SAT_ADDU

Li, Pan2 Feb. 17, 2024, 10:30 a.m. UTC

From: Pan Li <pan2.li@intel.com>

This patch adds a middle-end representation for unsigned saturating
addition, i.e. the result of the add is set to the maximum value of the
type when it overflows.  It matches a pattern similar to the one below.

SAT_ADDU (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))

Taking uint8_t as an example, we have:

* SAT_ADDU (1, 254)   => 255.
* SAT_ADDU (1, 255)   => 255.
* SAT_ADDU (2, 255)   => 255.
* SAT_ADDU (255, 255) => 255.
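As a quick sanity check of the formula and the uint8_t examples above, the following C sketch compares the branchless idiom, plus the `< y` variant that the match.pd comment in the diff also lists, against a plain widen-and-clamp reference for every pair of uint8_t inputs. The function names are hypothetical and chosen only for illustration. Note the `(uint8_t)` casts: without them, integer promotion computes `x + y` in `int` and the overflow test can never be true.

```c
#include <stdint.h>

/* The idiom from the patch description, instantiated for uint8_t.  */
static uint8_t
sat_addu_u8 (uint8_t x, uint8_t y)
{
  return (x + y) | (-(uint8_t) ((uint8_t) (x + y) < x));
}

/* Same idiom, but testing the truncated sum against y instead of x.  */
static uint8_t
sat_addu_u8_y (uint8_t x, uint8_t y)
{
  return (x + y) | (-(uint8_t) ((uint8_t) (x + y) < y));
}

/* Reference: add in a wider type, then clamp to UINT8_MAX.  */
static uint8_t
sat_addu_u8_ref (uint8_t x, uint8_t y)
{
  unsigned sum = (unsigned) x + y;
  return sum > UINT8_MAX ? UINT8_MAX : (uint8_t) sum;
}

/* Exhaustively compare both idioms with the reference and return the
   number of (x, y) pairs on which they disagree.  */
static int
count_mismatches_u8 (void)
{
  int mismatches = 0;
  for (unsigned x = 0; x <= UINT8_MAX; x++)
    for (unsigned y = 0; y <= UINT8_MAX; y++)
      {
        uint8_t ref = sat_addu_u8_ref ((uint8_t) x, (uint8_t) y);
        if (sat_addu_u8 ((uint8_t) x, (uint8_t) y) != ref
            || sat_addu_u8_y ((uint8_t) x, (uint8_t) y) != ref)
          mismatches++;
      }
  return mismatches;
}
```

With these definitions, count_mismatches_u8 () returns 0, i.e. both forms of the idiom agree with clamping on all 65536 input pairs.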

The patch also implements SAT_ADDU in the RISC-V backend as a
sample.  Given the example below:

uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
{
  return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
}

Before this patch:

uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
{
  long unsigned int _1;
  _Bool _2;
  long unsigned int _3;
  long unsigned int _4;
  uint64_t _7;
  long unsigned int _10;
  __complex__ long unsigned int _11;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
  _1 = REALPART_EXPR <_11>;
  _10 = IMAGPART_EXPR <_11>;
  _2 = _10 != 0;
  _3 = (long unsigned int) _2;
  _4 = -_3;
  _7 = _1 | _4;
  return _7;
;;    succ:       EXIT

}
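The `.ADD_OVERFLOW` sequence in the dump above is GCC's internal form of an overflow-checked add: REALPART_EXPR holds the wrapped sum and IMAGPART_EXPR the overflow flag. A roughly equivalent C formulation, sketched here with GCC's `__builtin_add_overflow` (the helper name sat_add_u64_overflow is hypothetical), makes those two roles explicit next to the original idiom:

```c
#include <stdbool.h>
#include <stdint.h>

/* Overflow-checked form: __builtin_add_overflow stores the wrapped sum in
   *sum and returns true when the exact result does not fit in uint64_t.  */
static uint64_t
sat_add_u64_overflow (uint64_t x, uint64_t y)
{
  uint64_t sum;                                   /* like REALPART_EXPR */
  bool ovf = __builtin_add_overflow (x, y, &sum); /* like IMAGPART_EXPR != 0 */
  return sum | -(uint64_t) ovf;                   /* all-ones on overflow */
}

/* The branchless idiom from the description, for comparison.  */
static uint64_t
sat_add_uint64_t (uint64_t x, uint64_t y)
{
  return (x + y) | (-(uint64_t) ((uint64_t) (x + y) < x));
}
```

Both functions compute the same value; the first spells out the carry flag that the second derives from the `(x + y) < x` comparison.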

After this patch:

uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
{
  uint64_t _7;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _7 = .SAT_ADDU (x_5(D), y_6(D)); [tail call]
  return _7;
;;    succ:       EXIT

}

With this patch, the middle end therefore represents the idiom as a
single .SAT_ADDU internal function call.
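For reference, the RISC-V expansion in riscv_expand_saturation_addu (in the diff quoted below) works on full 64-bit registers on RV64. The C model below sketches the QImode/HImode path step by step, under stated assumptions: registers are modelled as uint64_t, the inputs are assumed to arrive already zero-extended, and the helper name rv64_sat_addu_narrow is hypothetical. The SImode path, which uses addw instead of the shift pair, is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the RV64 sequence for an N-bit unsigned type
   (N = 8 or 16); each statement mirrors one step of the expander.  */
static uint64_t
rv64_sat_addu_narrow (uint64_t x, uint64_t y, unsigned bits)
{
  assert (bits == 8 || bits == 16);
  unsigned shift = 64 - bits;        /* XLEN - GET_MODE_BITSIZE (mode) */

  uint64_t sum = x + y;              /* Step-1:   add */
  sum = (sum << shift) >> shift;     /* Step-1.1: slli + srli, keep N bits */
  uint64_t lt = sum < x;             /* Step-2:   sltu */
  lt = -lt;                          /* Step-3:   neg, 0 or all-ones */
  uint64_t dest = sum | lt;          /* Step-4:   or */
  return (dest << shift) >> shift;   /* Step-5:   lowpart of the mode,
                                        zero-extended for the return */
}
```

In the uint8_t test (sat_addu-1.c), the truncation is expected as a single `andi ..., 0xff` rather than the slli/srli pair.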

	PR target/51492
	PR target/112600

gcc/ChangeLog:

	* config/riscv/riscv-protos.h (riscv_expand_saturation_addu):
	New function declaration for the SAT_ADDU expansion.
	* config/riscv/riscv.cc (riscv_expand_saturation_addu): New
	function implementing the SAT_ADDU expansion.
	* config/riscv/riscv.md (sat_addu_<mode>3): New expander for the
	sat_addu_<mode>3 standard name.
	* doc/md.texi: Document sat_addu_<mode>3.
	* internal-fn.cc (commutative_binary_fn_p): Add IFN_SAT_ADDU.
	* internal-fn.def (SAT_ADDU): New internal function.
	* match.pd: Add simplification patterns for SAT_ADDU.
	* optabs.def (OPTAB_D): Add sat_addu_optab.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/sat_addu-1.c: New test.
	* gcc.target/riscv/sat_addu-2.c: New test.
	* gcc.target/riscv/sat_addu-3.c: New test.
	* gcc.target/riscv/sat_addu-4.c: New test.
	* gcc.target/riscv/sat_addu-run-1.c: New test.
	* gcc.target/riscv/sat_addu-run-2.c: New test.
	* gcc.target/riscv/sat_addu-run-3.c: New test.
	* gcc.target/riscv/sat_addu-run-4.c: New test.
	* gcc.target/riscv/sat_arith.h: New helper header.

Signed-off-by: Pan Li <pan2.li@intel.com>
---
 gcc/config/riscv/riscv-protos.h               |  1 +
 gcc/config/riscv/riscv.cc                     | 46 +++++++++++++++++
 gcc/config/riscv/riscv.md                     | 11 +++++
 gcc/doc/md.texi                               | 11 +++++
 gcc/internal-fn.cc                            |  1 +
 gcc/internal-fn.def                           |  1 +
 gcc/match.pd                                  | 22 +++++++++
 gcc/optabs.def                                |  2 +
 gcc/testsuite/gcc.target/riscv/sat_addu-1.c   | 18 +++++++
 gcc/testsuite/gcc.target/riscv/sat_addu-2.c   | 20 ++++++++
 gcc/testsuite/gcc.target/riscv/sat_addu-3.c   | 17 +++++++
 gcc/testsuite/gcc.target/riscv/sat_addu-4.c   | 16 ++++++
 .../gcc.target/riscv/sat_addu-run-1.c         | 42 ++++++++++++++++
 .../gcc.target/riscv/sat_addu-run-2.c         | 42 ++++++++++++++++
 .../gcc.target/riscv/sat_addu-run-3.c         | 42 ++++++++++++++++
 .../gcc.target/riscv/sat_addu-run-4.c         | 49 +++++++++++++++++++
 gcc/testsuite/gcc.target/riscv/sat_arith.h    | 15 ++++++
 17 files changed, 356 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-run-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_arith.h

Richard Biener Feb. 19, 2024, 7:36 a.m. UTC

On Sat, Feb 17, 2024 at 11:30 AM <pan2.li@intel.com> wrote:

I'll note that on RTL we already have SS_PLUS/US_PLUS and friends and
the corresponding ssadd/usadd optabs.  There's not much documentation
unfortunately besides the gen_*_fixed_libfunc usage, where the comment
suggests this is used for fixed-point operations.  It looks like arm uses
fractional/accumulator modes for this but for example bfin has ssaddsi3.

So the question is whether the fixed-point case can be distinguished from
the integer case based on mode.

There's also FIXED_POINT_TYPE on the GENERIC/GIMPLE side and
no special tree operator codes for them.  So compared to what appears
to be the case on RTL we'd need a way to represent saturating integer
operations on GIMPLE.

The natural thing is to use direct optab internal functions (that's what you
basically did, but you added a new optab, IMO without good reason).
More GIMPLE-like would be to let the types involved decide whether
it's signed or unsigned saturation.  That's actually what I'd prefer here
and if we don't map 1:1 to optabs then instead use tree codes like
S_PLUS_EXPR (mimicking RTL here).
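To illustrate why the signedness of the type matters for such a representation: unsigned saturation only clamps at the maximum value, while signed saturation clamps in both directions, so a type-driven operation would select between two different semantics. The sketch below contrasts the two for 32-bit operands; the helper names are hypothetical, and the signed case uses GCC's `__builtin_add_overflow`.

```c
#include <stdint.h>

/* Unsigned saturation: clamp only at UINT32_MAX.  */
static uint32_t
sat_add_u32 (uint32_t x, uint32_t y)
{
  uint32_t sum = x + y;                /* wraps modulo 2^32 */
  return sum | -(uint32_t) (sum < x);  /* all-ones on carry-out */
}

/* Signed saturation: clamp at INT32_MAX or INT32_MIN.  Overflow is only
   possible when x and y have the same sign, and that sign decides the
   direction of the clamp.  */
static int32_t
sat_add_s32 (int32_t x, int32_t y)
{
  int32_t sum;
  if (!__builtin_add_overflow (x, y, &sum))
    return sum;
  return x < 0 ? INT32_MIN : INT32_MAX;
}
```

The unsigned form reduces to the OR-with-mask idiom from the patch, whereas the signed form needs a two-sided clamp, which is one reason the two cases do not share a single bit-trick.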

Any other opinions?  Anyone knows more about fixed-point and RTL/modes?

Richard.

> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index ae1685850ac..f201b2384f9 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -132,6 +132,7 @@ extern void riscv_asm_output_external (FILE *, const tree, const char *);
>  extern bool
>  riscv_zcmp_valid_stack_adj_bytes_p (HOST_WIDE_INT, int);
>  extern void riscv_legitimize_poly_move (machine_mode, rtx, rtx, rtx);
> +extern void riscv_expand_saturation_addu (rtx, rtx, rtx);
>
>  #ifdef RTX_CODE
>  extern void riscv_expand_int_scc (rtx, enum rtx_code, rtx, rtx, bool *invert_ptr = 0);
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 799d7919a4a..84e86eb5d49 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -10657,6 +10657,52 @@ riscv_vector_mode_supported_any_target_p (machine_mode)
>    return true;
>  }
>
> +/* Emit insn for the saturation addu, aka (x + y) | - ((x + y) < x).  */
> +void
> +riscv_expand_saturation_addu (rtx dest, rtx x, rtx y)
> +{
> +  machine_mode mode = GET_MODE (dest);
> +  rtx pmode_sum = gen_reg_rtx (Pmode);
> +  rtx pmode_lt = gen_reg_rtx (Pmode);
> +  rtx pmode_x = gen_lowpart (Pmode, x);
> +  rtx pmode_y = gen_lowpart (Pmode, y);
> +  rtx pmode_dest = gen_reg_rtx (Pmode);
> +
> +  /* Step-1: sum = x + y  */
> +  if (mode == SImode && mode != Pmode)
> +    { /* Take addw to avoid the sum truncate.  */
> +      rtx simode_sum = gen_reg_rtx (SImode);
> +      riscv_emit_binary (PLUS, simode_sum, x, y);
> +      emit_move_insn (pmode_sum, gen_lowpart (Pmode, simode_sum));
> +    }
> +  else
> +    riscv_emit_binary (PLUS, pmode_sum, pmode_x, pmode_y);
> +
> +  /* Step-1.1: truncate sum for HI and QI as we have no insn for add QI/HI.  */
> +  if (mode == HImode || mode == QImode)
> +    {
> +      int shift_bits = GET_MODE_BITSIZE (Pmode)
> +       - GET_MODE_BITSIZE (mode).to_constant ();
> +
> +      gcc_assert (shift_bits > 0);
> +
> +      riscv_emit_binary (ASHIFT, pmode_sum, pmode_sum, GEN_INT (shift_bits));
> +      riscv_emit_binary (LSHIFTRT, pmode_sum, pmode_sum, GEN_INT (shift_bits));
> +    }
> +
> +  /* Step-2: lt = sum < x  */
> +  riscv_emit_binary (LTU, pmode_lt, pmode_sum, pmode_x);
> +
> +  /* Step-3: lt = -lt  */
> +  riscv_emit_unary (NEG, pmode_lt, pmode_lt);
> +
> +  /* Step-4: pmode_dest = sum | lt  */
> +  riscv_emit_binary (IOR, pmode_dest, pmode_lt, pmode_sum);
> +
> +  /* Step-5: dest = pmode_dest */
> +  emit_move_insn (dest, gen_lowpart (mode, pmode_dest));
> +}
> +
>  /* Initialize the GCC target structure.  */
>  #undef TARGET_ASM_ALIGNED_HI_OP
>  #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
> diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> index 39b29795cd6..03cbe5a2ca9 100644
> --- a/gcc/config/riscv/riscv.md
> +++ b/gcc/config/riscv/riscv.md
> @@ -3841,6 +3841,17 @@ (define_insn "*large_load_address"
>    [(set_attr "type" "load")
>     (set (attr "length") (const_int 8))])
>
> +(define_expand "sat_addu_<mode>3"
> +  [(match_operand:ANYI   0 "register_operand")
> +   (match_operand:ANYI   1 "register_operand")
> +   (match_operand:ANYI   2 "register_operand")]
> +  ""
> +  {
> +    riscv_expand_saturation_addu (operands[0], operands[1], operands[2]);
> +    DONE;
> +  }
> +)
> +
>  (include "bitmanip.md")
>  (include "crypto.md")
>  (include "sync.md")
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index b0c61925120..5867afdb1a0 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -6653,6 +6653,17 @@ The operation is only supported for vector modes @var{m}.
>
>  This pattern is not allowed to @code{FAIL}.
>
> +@cindex @code{sat_addu_@var{m}3} instruction pattern
> +@item @samp{sat_addu_@var{m}3}
> +Perform the saturation unsigned add for the operand 1 and operand 2 and
> +store the result into the operand 0.  All operands have mode @var{m},
> +which is a scalar integer mode.
> +
> +@smallexample
> +  typedef unsigned char uint8_t;
> +  uint8_t sat_addu (uint8_t x, uint8_t y) => return (x + y) | -(uint8_t)((uint8_t)(x + y) < x);
> +@end smallexample
> +
>  @cindex @code{cmla@var{m}4} instruction pattern
>  @item @samp{cmla@var{m}4}
>  Perform a vector multiply and accumulate that is semantically the same as
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index a07f25f3aee..dee73dbc614 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4159,6 +4159,7 @@ commutative_binary_fn_p (internal_fn fn)
>      case IFN_VEC_WIDEN_PLUS_HI:
>      case IFN_VEC_WIDEN_PLUS_EVEN:
>      case IFN_VEC_WIDEN_PLUS_ODD:
> +    case IFN_SAT_ADDU:
>        return true;
>
>      default:
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index c14d30365c1..a04592fc779 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -428,6 +428,7 @@ DEF_INTERNAL_WIDENING_OPTAB_FN (VEC_WIDEN_ABD,
>                                 binary)
>  DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary)
>  DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary)
> +DEF_INTERNAL_OPTAB_FN (SAT_ADDU, ECF_CONST | ECF_NOTHROW, sat_addu, binary)
>
>  /* FP scales.  */
>  DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 711c3a10c3f..9de1106adcf 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -1994,6 +1994,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   )
>  )
>
> +#if GIMPLE
> +
> +/* Saturation add unsigned, aka:
> +   SAT_ADDU = (X + Y) | - ((X + Y) < X) or
> +   SAT_ADDU = (X + Y) | - ((X + Y) < Y).  */
> +(simplify
> + (bit_ior:c (plus:c@2 @0 @1) (negate (convert (lt @2 @0))))
> +   (if (optimize
> +       && INTEGRAL_TYPE_P (type)
> +       && TYPE_UNSIGNED (TREE_TYPE (@0))
> +       && types_match (type, TREE_TYPE (@0))
> +       && types_match (type, TREE_TYPE (@1))
> +       && direct_internal_fn_supported_p (IFN_SAT_ADDU, type, OPTIMIZE_FOR_BOTH))
> +   (IFN_SAT_ADDU @0 @1)))
> +
> +/* SAT_ADDU (X, 0) = X  */
> +(simplify
> + (IFN_SAT_ADDU:c @0 integer_zerop)
> + @0)
> +
> +#endif
> +
>  /* A few cases of fold-const.cc negate_expr_p predicate.  */
>  (match negate_expr_p
>   INTEGER_CST
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index ad14f9328b9..a2c11b7707b 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -300,6 +300,8 @@ OPTAB_D (usubc5_optab, "usubc$I$a5")
>  OPTAB_D (addptr3_optab, "addptr$a3")
>  OPTAB_D (spaceship_optab, "spaceship$a3")
>
> +OPTAB_D (sat_addu_optab, "sat_addu_$a3")
> +
>  OPTAB_D (smul_highpart_optab, "smul$a3_highpart")
>  OPTAB_D (umul_highpart_optab, "umul$a3_highpart")
>
> diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-1.c b/gcc/testsuite/gcc.target/riscv/sat_addu-1.c
> new file mode 100644
> index 00000000000..229abef0faa
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-1.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */
> +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +#include "sat_arith.h"
> +
> +/*
> +** sat_addu_uint8_t:
> +** add\s+[atx][0-9]+,\s*a0,\s*a1
> +** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*0xff
> +** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> +** neg\s+[atx][0-9]+,\s*[atx][0-9]+
> +** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> +** andi\s+a0,\s*a0,\s*0xff
> +** ret
> +*/
> +DEF_SAT_ADDU(uint8_t)
> diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-2.c b/gcc/testsuite/gcc.target/riscv/sat_addu-2.c
> new file mode 100644
> index 00000000000..4023b030811
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-2.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */
> +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +#include "sat_arith.h"
> +
> +/*
> +** sat_addu_uint16_t:
> +** add\s+[atx][0-9]+,\s*a0,\s*a1
> +** slli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
> +** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
> +** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> +** neg\s+[atx][0-9]+,\s*[atx][0-9]+
> +** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> +** slli\s+a0,\s*a0,\s*48
> +** srli\s+a0,\s*a0,\s*48
> +** ret
> +*/
> +DEF_SAT_ADDU(uint16_t)
> diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-3.c b/gcc/testsuite/gcc.target/riscv/sat_addu-3.c
> new file mode 100644
> index 00000000000..4d0af97fb67
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-3.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */
> +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +#include "sat_arith.h"
> +
> +/*
> +** sat_addu_uint32_t:
> +** addw\s+[atx][0-9]+,\s*a0,\s*a1
> +** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> +** neg\s+[atx][0-9]+,\s*[atx][0-9]+
> +** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> +** sext.w\s+a0,\s*a0
> +** ret
> +*/
> +DEF_SAT_ADDU(uint32_t)
> diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-4.c b/gcc/testsuite/gcc.target/riscv/sat_addu-4.c
> new file mode 100644
> index 00000000000..926f31266e3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-4.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */
> +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +#include "sat_arith.h"
> +
> +/*
> +** sat_addu_uint64_t:
> +** add\s+[atx][0-9]+,\s*a0,\s*a1
> +** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> +** neg\s+[atx][0-9]+,\s*[atx][0-9]+
> +** or\s+a0,\s*[atx][0-9]+,\s*[atx][0-9]+
> +** ret
> +*/
> +DEF_SAT_ADDU(uint64_t)
> diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-run-1.c b/gcc/testsuite/gcc.target/riscv/sat_addu-run-1.c
> new file mode 100644
> index 00000000000..b19515c39d1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-run-1.c
> @@ -0,0 +1,42 @@
> +/* { dg-do run { target { riscv_v } } } */
> +/* { dg-additional-options "-std=c99" } */
> +
> +#include "sat_arith.h"
> +
> +DEF_SAT_ADDU(uint8_t)
> +
> +int
> +main ()
> +{
> +  if (RUN_SAT_ADDU (uint8_t, 0, 0) != 0)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint8_t, 0, 1) != 1)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint8_t, 1, 1) != 2)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint8_t, 0, 254) != 254)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint8_t, 1, 254) != 255)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint8_t, 2, 254) != 255)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint8_t, 0, 255) != 255)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint8_t, 1, 255) != 255)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint8_t, 2, 255) != 255)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint8_t, 255, 255) != 255)
> +    __builtin_abort ();
> +
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-run-2.c b/gcc/testsuite/gcc.target/riscv/sat_addu-run-2.c
> new file mode 100644
> index 00000000000..90073fbe4ba
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-run-2.c
> @@ -0,0 +1,42 @@
> +/* { dg-do run { target { riscv_v } } } */
> +/* { dg-additional-options "-std=c99" } */
> +
> +#include "sat_arith.h"
> +
> +DEF_SAT_ADDU(uint16_t)
> +
> +int
> +main ()
> +{
> +  if (RUN_SAT_ADDU (uint16_t, 0, 0) != 0)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint16_t, 0, 1) != 1)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint16_t, 1, 1) != 2)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint16_t, 0, 65534) != 65534)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint16_t, 1, 65534) != 65535)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint16_t, 2, 65534) != 65535)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint16_t, 0, 65535) != 65535)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint16_t, 1, 65535) != 65535)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint16_t, 2, 65535) != 65535)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint16_t, 65535, 65535) != 65535)
> +    __builtin_abort ();
> +
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-run-3.c b/gcc/testsuite/gcc.target/riscv/sat_addu-run-3.c
> new file mode 100644
> index 00000000000..996dd3de737
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-run-3.c
> @@ -0,0 +1,42 @@
> +/* { dg-do run { target { riscv_v } } } */
> +/* { dg-additional-options "-std=c99" } */
> +
> +#include "sat_arith.h"
> +
> +DEF_SAT_ADDU(uint32_t)
> +
> +int
> +main ()
> +{
> +  if (RUN_SAT_ADDU (uint32_t, 0, 0) != 0)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint32_t, 0, 1) != 1)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint32_t, 1, 1) != 2)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint32_t, 0, 4294967294) != 4294967294)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint32_t, 1, 4294967294) != 4294967295)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint32_t, 2, 4294967294) != 4294967295)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint32_t, 0, 4294967295) != 4294967295)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint32_t, 1, 4294967295) != 4294967295)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint32_t, 2, 4294967295) != 4294967295)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint32_t, 4294967295, 4294967295) != 4294967295)
> +    __builtin_abort ();
> +
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-run-4.c b/gcc/testsuite/gcc.target/riscv/sat_addu-run-4.c
> new file mode 100644
> index 00000000000..51a5421577b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-run-4.c
> @@ -0,0 +1,49 @@
> +/* { dg-do run { target { riscv_v } } } */
> +/* { dg-additional-options "-std=c99" } */
> +
> +#include "sat_arith.h"
> +
> +DEF_SAT_ADDU(uint64_t)
> +
> +int
> +main ()
> +{
> +  if (RUN_SAT_ADDU (uint64_t, 0, 0) != 0)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint64_t, 0, 1) != 1)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint64_t, 1, 1) != 2)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint64_t, 0, 18446744073709551614u)
> +    != 18446744073709551614u)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint64_t, 1, 18446744073709551614u)
> +    != 18446744073709551615u)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint64_t, 2, 18446744073709551614u)
> +    != 18446744073709551615u)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint64_t, 0, 18446744073709551615u)
> +    != 18446744073709551615u)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint64_t, 1, 18446744073709551615u)
> +    != 18446744073709551615u)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint64_t, 2, 18446744073709551615u)
> +    != 18446744073709551615u)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint64_t, 18446744073709551615u, 18446744073709551615u)
> +    != 18446744073709551615u)
> +    __builtin_abort ();
> +
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h b/gcc/testsuite/gcc.target/riscv/sat_arith.h
> new file mode 100644
> index 00000000000..4c00157685e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
> @@ -0,0 +1,15 @@
> +#ifndef HAVE_SAT_ARITH
> +#define HAVE_SAT_ARITH
> +
> +#include <stdint.h>
> +
> +#define DEF_SAT_ADDU(TYPE)                       \
> +TYPE __attribute__((noinline))                   \
> +sat_addu_##TYPE (TYPE x, TYPE y)                 \
> +{                                                \
> +  return (x + y) | (-(TYPE)((TYPE)(x + y) < x)); \
> +}
> +
> +#define RUN_SAT_ADDU(TYPE, x, y) sat_addu_##TYPE(x, y)
> +
> +#endif
> --
> 2.34.1
>

Andrew Pinski Feb. 19, 2024, 8:30 a.m. UTC

On Sun, Feb 18, 2024 at 11:37 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> I'll note that on RTL we already have SS_PLUS/US_PLUS and friends and
> the corresponding ssadd/usadd optabs.  There's not much documentation
> unfortunately besides the gen_*_fixed_libfunc usage, where the comment
> suggests this is used for fixed-point operations.  It looks like arm uses
> fractional/accumulator modes for this but for example bfin has ssaddsi3.
>
> So the question is whether the fixed-point case can be distinguished from
> the integer case based on mode.
>
> There's also FIXED_POINT_TYPE on the GENERIC/GIMPLE side and
> no special tree operator codes for them.  So compared to what appears
> to be the case on RTL we'd need a way to represent saturating integer
> operations on GIMPLE.
>
> The natural thing is to use direct optab internal functions (that's what you
> basically did, but you added a new optab, IMO without good reason).
> More GIMPLE-like would be to let the types involved decide whether
> it's signed or unsigned saturation.  That's actually what I'd prefer here
> and if we don't map 1:1 to optabs then instead use tree codes like
> S_PLUS_EXPR (mimicking RTL here).
>
> Any other opinions?  Anyone knows more about fixed-point and RTL/modes?

There was a discussion about this back in 2021:
https://gcc.gnu.org/pipermail/gcc/2021-May/236015.html

Including a reference to the much older discussion from JSM about
fixed-point types and lowering and such:
https://gcc.gnu.org/legacy-ml/gcc-patches/2011-05/msg00846.html

I am not 100% sure how much of this applies here though.

I have not looked fully into either thread to get a sense of what was
decided in the end.

Thanks,
Andrew

> >  extern bool
> >  riscv_zcmp_valid_stack_adj_bytes_p (HOST_WIDE_INT, int);
> >  extern void riscv_legitimize_poly_move (machine_mode, rtx, rtx, rtx);
> > +extern void riscv_expand_saturation_addu (rtx, rtx, rtx);
> >
> >  #ifdef RTX_CODE
> >  extern void riscv_expand_int_scc (rtx, enum rtx_code, rtx, rtx, bool *invert_ptr = 0);
> > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> > index 799d7919a4a..84e86eb5d49 100644
> > --- a/gcc/config/riscv/riscv.cc
> > +++ b/gcc/config/riscv/riscv.cc
> > @@ -10657,6 +10657,52 @@ riscv_vector_mode_supported_any_target_p (machine_mode)
> >    return true;
> >  }
> >
> > +/* Emit insn for the saturation addu, aka (x + y) | - ((x + y) < x).  */
> > +void
> > +riscv_expand_saturation_addu (rtx dest, rtx x, rtx y)
> > +{
> > +  machine_mode mode = GET_MODE (dest);
> > +  rtx pmode_sum = gen_reg_rtx (Pmode);
> > +  rtx pmode_lt = gen_reg_rtx (Pmode);
> > +  rtx pmode_x = gen_lowpart (Pmode, x);
> > +  rtx pmode_y = gen_lowpart (Pmode, y);
> > +  rtx pmode_dest = gen_reg_rtx (Pmode);
> > +
> > +  /* Step-1: sum = x + y  */
> > +  if (mode == SImode && mode != Pmode)
> > +    { /* Take addw to avoid the sum truncate.  */
> > +      rtx simode_sum = gen_reg_rtx (SImode);
> > +      riscv_emit_binary (PLUS, simode_sum, x, y);
> > +      emit_move_insn (pmode_sum, gen_lowpart (Pmode, simode_sum));
> > +    }
> > +  else
> > +    riscv_emit_binary (PLUS, pmode_sum, pmode_x, pmode_y);
> > +
> > +  /* Step-1.1: truncate sum for HI and QI as we have no insn for add QI/HI.  */
> > +  if (mode == HImode || mode == QImode)
> > +    {
> > +      int shift_bits = GET_MODE_BITSIZE (Pmode)
> > +       - GET_MODE_BITSIZE (mode).to_constant ();
> > +
> > +      gcc_assert (shift_bits > 0);
> > +
> > +      riscv_emit_binary (ASHIFT, pmode_sum, pmode_sum, GEN_INT (shift_bits));
> > +      riscv_emit_binary (LSHIFTRT, pmode_sum, pmode_sum, GEN_INT (shift_bits));
> > +    }
> > +
> > +  /* Step-2: lt = sum < x  */
> > +  riscv_emit_binary (LTU, pmode_lt, pmode_sum, pmode_x);
> > +
> > +  /* Step-3: lt = -lt  */
> > +  riscv_emit_unary (NEG, pmode_lt, pmode_lt);
> > +
> > +  /* Step-4: pmode_dest = sum | lt  */
> > +  riscv_emit_binary (IOR, pmode_dest, pmode_lt, pmode_sum);
> > +
> > +  /* Step-5: dest = pmode_dest */
> > +  emit_move_insn (dest, gen_lowpart (mode, pmode_dest));
> > +}
> > +
> >  /* Initialize the GCC target structure.  */
> >  #undef TARGET_ASM_ALIGNED_HI_OP
> >  #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
> > diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> > index 39b29795cd6..03cbe5a2ca9 100644
> > --- a/gcc/config/riscv/riscv.md
> > +++ b/gcc/config/riscv/riscv.md
> > @@ -3841,6 +3841,17 @@ (define_insn "*large_load_address"
> >    [(set_attr "type" "load")
> >     (set (attr "length") (const_int 8))])
> >
> > +(define_expand "sat_addu_<mode>3"
> > +  [(match_operand:ANYI   0 "register_operand")
> > +   (match_operand:ANYI   1 "register_operand")
> > +   (match_operand:ANYI   2 "register_operand")]
> > +  ""
> > +  {
> > +    riscv_expand_saturation_addu (operands[0], operands[1], operands[2]);
> > +    DONE;
> > +  }
> > +)
> > +
> >  (include "bitmanip.md")
> >  (include "crypto.md")
> >  (include "sync.md")
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > index b0c61925120..5867afdb1a0 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -6653,6 +6653,17 @@ The operation is only supported for vector modes @var{m}.
> >
> >  This pattern is not allowed to @code{FAIL}.
> >
> > +@cindex @code{sat_addu_@var{m}3} instruction pattern
> > +@item @samp{sat_addu_@var{m}3}
> > +Perform the saturation unsigned add for the operand 1 and operand 2 and
> > +store the result into the operand 0.  All operands have mode @var{m},
> > +which is a scalar integer mode.
> > +
> > +@smallexample
> > +  typedef unsigned char uint8_t;
> > +  uint8_t sat_addu (uint8_t x, uint8_t y) => return (x + y) | -((x + y) < x);
> > +@end smallexample
> > +
> >  @cindex @code{cmla@var{m}4} instruction pattern
> >  @item @samp{cmla@var{m}4}
> >  Perform a vector multiply and accumulate that is semantically the same as
> > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> > index a07f25f3aee..dee73dbc614 100644
> > --- a/gcc/internal-fn.cc
> > +++ b/gcc/internal-fn.cc
> > @@ -4159,6 +4159,7 @@ commutative_binary_fn_p (internal_fn fn)
> >      case IFN_VEC_WIDEN_PLUS_HI:
> >      case IFN_VEC_WIDEN_PLUS_EVEN:
> >      case IFN_VEC_WIDEN_PLUS_ODD:
> > +    case IFN_SAT_ADDU:
> >        return true;
> >
> >      default:
> > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > index c14d30365c1..a04592fc779 100644
> > --- a/gcc/internal-fn.def
> > +++ b/gcc/internal-fn.def
> > @@ -428,6 +428,7 @@ DEF_INTERNAL_WIDENING_OPTAB_FN (VEC_WIDEN_ABD,
> >                                 binary)
> >  DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary)
> >  DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary)
> > +DEF_INTERNAL_OPTAB_FN (SAT_ADDU, ECF_CONST | ECF_NOTHROW, sat_addu, binary)
> >
> >  /* FP scales.  */
> >  DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 711c3a10c3f..9de1106adcf 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -1994,6 +1994,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >   )
> >  )
> >
> > +#if GIMPLE
> > +
> > +/* Saturation add unsigned, aka:
> > +   SAT_ADDU = (X + Y) | - ((X + Y) < X) or
> > +   SAT_ADDU = (X + Y) | - ((X + Y) < Y).  */
> > +(simplify
> > + (bit_ior:c (plus:c@2 @0 @1) (negate (convert (lt @2 @0))))
> > +   (if (optimize
> > +       && INTEGRAL_TYPE_P (type)
> > +       && TYPE_UNSIGNED (TREE_TYPE (@0))
> > +       && types_match (type, TREE_TYPE (@0))
> > +       && types_match (type, TREE_TYPE (@1))
> > +       && direct_internal_fn_supported_p (IFN_SAT_ADDU, type, OPTIMIZE_FOR_BOTH))
> > +   (IFN_SAT_ADDU @0 @1)))
> > +
> > +/* SAT_ADDU (X, 0) = X  */
> > +(simplify
> > + (IFN_SAT_ADDU:c @0 integer_zerop)
> > + @0)
> > +
> > +#endif
> > +
> >  /* A few cases of fold-const.cc negate_expr_p predicate.  */
> >  (match negate_expr_p
> >   INTEGER_CST
> > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > index ad14f9328b9..a2c11b7707b 100644
> > --- a/gcc/optabs.def
> > +++ b/gcc/optabs.def
> > @@ -300,6 +300,8 @@ OPTAB_D (usubc5_optab, "usubc$I$a5")
> >  OPTAB_D (addptr3_optab, "addptr$a3")
> >  OPTAB_D (spaceship_optab, "spaceship$a3")
> >
> > +OPTAB_D (sat_addu_optab, "sat_addu_$a3")
> > +
> >  OPTAB_D (smul_highpart_optab, "smul$a3_highpart")
> >  OPTAB_D (umul_highpart_optab, "umul$a3_highpart")
> >
> > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-1.c b/gcc/testsuite/gcc.target/riscv/sat_addu-1.c
> > new file mode 100644
> > index 00000000000..229abef0faa
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-1.c
> > @@ -0,0 +1,18 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
> > +/* { dg-final { check-function-bodies "**" "" } } */
> > +
> > +#include "sat_arith.h"
> > +
> > +/*
> > +** sat_addu_uint8_t:
> > +** add\s+[atx][0-9]+,\s*a0,\s*a1
> > +** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*0xff
> > +** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > +** neg\s+[atx][0-9]+,\s*[atx][0-9]+
> > +** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > +** andi\s+a0,\s*a0,\s*0xff
> > +** ret
> > +*/
> > +DEF_SAT_ADDU(uint8_t)
> > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-2.c b/gcc/testsuite/gcc.target/riscv/sat_addu-2.c
> > new file mode 100644
> > index 00000000000..4023b030811
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-2.c
> > @@ -0,0 +1,20 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
> > +/* { dg-final { check-function-bodies "**" "" } } */
> > +
> > +#include "sat_arith.h"
> > +
> > +/*
> > +** sat_addu_uint16_t:
> > +** add\s+[atx][0-9]+,\s*a0,\s*a1
> > +** slli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
> > +** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
> > +** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > +** neg\s+[atx][0-9]+,\s*[atx][0-9]+
> > +** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > +** slli\s+a0,\s*a0,\s*48
> > +** srli\s+a0,\s*a0,\s*48
> > +** ret
> > +*/
> > +DEF_SAT_ADDU(uint16_t)
> > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-3.c b/gcc/testsuite/gcc.target/riscv/sat_addu-3.c
> > new file mode 100644
> > index 00000000000..4d0af97fb67
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-3.c
> > @@ -0,0 +1,17 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
> > +/* { dg-final { check-function-bodies "**" "" } } */
> > +
> > +#include "sat_arith.h"
> > +
> > +/*
> > +** sat_addu_uint32_t:
> > +** addw\s+[atx][0-9]+,\s*a0,\s*a1
> > +** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > +** neg\s+[atx][0-9]+,\s*[atx][0-9]+
> > +** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > +** sext.w\s+a0,\s*a0
> > +** ret
> > +*/
> > +DEF_SAT_ADDU(uint32_t)
> > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-4.c b/gcc/testsuite/gcc.target/riscv/sat_addu-4.c
> > new file mode 100644
> > index 00000000000..926f31266e3
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-4.c
> > @@ -0,0 +1,16 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
> > +/* { dg-final { check-function-bodies "**" "" } } */
> > +
> > +#include "sat_arith.h"
> > +
> > +/*
> > +** sat_addu_uint64_t:
> > +** add\s+[atx][0-9]+,\s*a0,\s*a1
> > +** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > +** neg\s+[atx][0-9]+,\s*[atx][0-9]+
> > +** or\s+a0,\s*[atx][0-9]+,\s*[atx][0-9]+
> > +** ret
> > +*/
> > +DEF_SAT_ADDU(uint64_t)
> > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-run-1.c b/gcc/testsuite/gcc.target/riscv/sat_addu-run-1.c
> > new file mode 100644
> > index 00000000000..b19515c39d1
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-run-1.c
> > @@ -0,0 +1,42 @@
> > +/* { dg-do run { target { riscv_v } } } */
> > +/* { dg-additional-options "-std=c99" } */
> > +
> > +#include "sat_arith.h"
> > +
> > +DEF_SAT_ADDU(uint8_t)
> > +
> > +int
> > +main ()
> > +{
> > +  if (RUN_SAT_ADDU (uint8_t, 0, 0) != 0)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint8_t, 0, 1) != 1)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint8_t, 1, 1) != 2)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint8_t, 0, 254) != 254)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint8_t, 1, 254) != 255)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint8_t, 2, 254) != 255)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint8_t, 0, 255) != 255)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint8_t, 1, 255) != 255)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint8_t, 2, 255) != 255)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint8_t, 255, 255) != 255)
> > +    __builtin_abort ();
> > +
> > +  return 0;
> > +}
> > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-run-2.c b/gcc/testsuite/gcc.target/riscv/sat_addu-run-2.c
> > new file mode 100644
> > index 00000000000..90073fbe4ba
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-run-2.c
> > @@ -0,0 +1,42 @@
> > +/* { dg-do run { target { riscv_v } } } */
> > +/* { dg-additional-options "-std=c99" } */
> > +
> > +#include "sat_arith.h"
> > +
> > +DEF_SAT_ADDU(uint16_t)
> > +
> > +int
> > +main ()
> > +{
> > +  if (RUN_SAT_ADDU (uint16_t, 0, 0) != 0)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint16_t, 0, 1) != 1)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint16_t, 1, 1) != 2)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint16_t, 0, 65534) != 65534)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint16_t, 1, 65534) != 65535)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint16_t, 2, 65534) != 65535)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint16_t, 0, 65535) != 65535)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint16_t, 1, 65535) != 65535)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint16_t, 2, 65535) != 65535)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint16_t, 65535, 65535) != 65535)
> > +    __builtin_abort ();
> > +
> > +  return 0;
> > +}
> > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-run-3.c b/gcc/testsuite/gcc.target/riscv/sat_addu-run-3.c
> > new file mode 100644
> > index 00000000000..996dd3de737
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-run-3.c
> > @@ -0,0 +1,42 @@
> > +/* { dg-do run { target { riscv_v } } } */
> > +/* { dg-additional-options "-std=c99" } */
> > +
> > +#include "sat_arith.h"
> > +
> > +DEF_SAT_ADDU(uint32_t)
> > +
> > +int
> > +main ()
> > +{
> > +  if (RUN_SAT_ADDU (uint32_t, 0, 0) != 0)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint32_t, 0, 1) != 1)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint32_t, 1, 1) != 2)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint32_t, 0, 4294967294) != 4294967294)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint32_t, 1, 4294967294) != 4294967295)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint32_t, 2, 4294967294) != 4294967295)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint32_t, 0, 4294967295) != 4294967295)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint32_t, 1, 4294967295) != 4294967295)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint32_t, 2, 4294967295) != 4294967295)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint32_t, 4294967295, 4294967295) != 4294967295)
> > +    __builtin_abort ();
> > +
> > +  return 0;
> > +}
> > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-run-4.c b/gcc/testsuite/gcc.target/riscv/sat_addu-run-4.c
> > new file mode 100644
> > index 00000000000..51a5421577b
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-run-4.c
> > @@ -0,0 +1,49 @@
> > +/* { dg-do run { target { riscv_v } } } */
> > +/* { dg-additional-options "-std=c99" } */
> > +
> > +#include "sat_arith.h"
> > +
> > +DEF_SAT_ADDU(uint64_t)
> > +
> > +int
> > +main ()
> > +{
> > +  if (RUN_SAT_ADDU (uint64_t, 0, 0) != 0)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint64_t, 0, 1) != 1)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint64_t, 1, 1) != 2)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint64_t, 0, 18446744073709551614u)
> > +    != 18446744073709551614u)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint64_t, 1, 18446744073709551614u)
> > +    != 18446744073709551615u)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint64_t, 2, 18446744073709551614u)
> > +    != 18446744073709551615u)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint64_t, 0, 18446744073709551615u)
> > +    != 18446744073709551615u)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint64_t, 1, 18446744073709551615u)
> > +    != 18446744073709551615u)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint64_t, 2, 18446744073709551615u)
> > +    != 18446744073709551615u)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint64_t, 18446744073709551615u, 18446744073709551615u)
> > +    != 18446744073709551615u)
> > +    __builtin_abort ();
> > +
> > +  return 0;
> > +}
> > diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h b/gcc/testsuite/gcc.target/riscv/sat_arith.h
> > new file mode 100644
> > index 00000000000..4c00157685e
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
> > @@ -0,0 +1,15 @@
> > +#ifndef HAVE_SAT_ARITH
> > +#define HAVE_SAT_ARITH
> > +
> > +#include <stdint.h>
> > +
> > +#define DEF_SAT_ADDU(TYPE)                       \
> > +TYPE __attribute__((noinline))                   \
> > +sat_addu_##TYPE (TYPE x, TYPE y)                 \
> > +{                                                \
> > +  return (x + y) | (-(TYPE)((TYPE)(x + y) < x)); \
> > +}
> > +
> > +#define RUN_SAT_ADDU(TYPE, x, y) sat_addu_##TYPE(x, y)
> > +
> > +#endif
> > --
> > 2.34.1
> >
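The expansion steps in riscv_expand_saturation_addu can be modelled in plain C for HImode on rv64. This is an illustrative sketch, not the backend code: the function name is hypothetical, and it assumes the 16-bit operands are already zero-extended in 64-bit registers.

```c
#include <stdint.h>

/* Hypothetical C model of the Step-1..Step-5 sequence for HImode on a
   64-bit target.  Assumes x16/y16 arrive zero-extended to 64 bits. */
uint16_t model_rv64_sat_addu_hi (uint16_t x16, uint16_t y16)
{
  uint64_t x = x16, y = y16;       /* pmode_x, pmode_y */
  const int shift_bits = 64 - 16;  /* GET_MODE_BITSIZE (Pmode) - 16 */

  uint64_t sum = x + y;            /* Step-1:   add  */
  sum <<= shift_bits;              /* Step-1.1: slli (drop bits above 16) */
  sum >>= shift_bits;              /*           srli (zero-extend back) */
  uint64_t lt = sum < x;           /* Step-2:   sltu, 1 on overflow */
  lt = -lt;                        /* Step-3:   neg, all-ones on overflow */
  uint64_t dest = sum | lt;        /* Step-4:   or */
  return (uint16_t) dest;          /* Step-5:   take the HImode lowpart */
}
```

The shift pair corresponds to the slli/srli by 48 expected in sat_addu-2.c.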

Li, Pan2 Feb. 19, 2024, 8:41 a.m. UTC | #3

Thanks Richard for comments.

> I'll note that on RTL we already have SS_PLUS/US_PLUS and friends and
> the corresponding ssadd/usadd optabs.  There's not much documentation
> unfortunately besides the gen_*_fixed_libfunc usage where the comment
> suggests this is used for fixed-point operations.  It looks like arm uses
> fractional/accumulator modes for this but for example bfin has ssaddsi3.

I found the related description of the plus family in the GCC internals doc, but it doesn't say
anything specific about the mode m here.

(plus:m x y)
(ss_plus:m x y)
(us_plus:m x y)
These three expressions all represent the sum of the values represented by x
and y carried out in machine mode m. They differ in their behavior on overflow
of integer modes. plus wraps round modulo the width of m; ss_plus saturates
at the maximum signed value representable in m; us_plus saturates at the
maximum unsigned value.
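As a minimal sketch of those three behaviours, the C functions below model plus, us_plus and ss_plus for an 8-bit mode. The function names are hypothetical, and the signed variant assumes the usual convention that negative overflow clamps to the minimum signed value (the quoted text only mentions the maximum).

```c
#include <stdint.h>

/* plus:QI model: wraps modulo 2^8. */
uint8_t plus_qi (uint8_t x, uint8_t y)
{
  return (uint8_t) (x + y);
}

/* us_plus:QI model: saturates at the maximum unsigned value (255). */
uint8_t us_plus_qi (uint8_t x, uint8_t y)
{
  unsigned sum = (unsigned) x + y;   /* exact, at most 510 */
  return sum > UINT8_MAX ? UINT8_MAX : (uint8_t) sum;
}

/* ss_plus:QI model: saturates at the signed limits (-128 and 127). */
int8_t ss_plus_qi (int8_t x, int8_t y)
{
  int sum = x + y;                   /* exact, in [-256, 254] */
  if (sum > INT8_MAX)
    return INT8_MAX;
  if (sum < INT8_MIN)
    return INT8_MIN;
  return (int8_t) sum;
}
```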

> The natural thing is to use direct optab internal functions (that's what you
> basically did, but you added a new optab, IMO without good reason).

That makes sense to me; I will try to leverage US_PLUS here instead.

> More GIMPLE-like would be to let the types involved decide whether
> it's signed or unsigned saturation.  That's actually what I'd prefer here
> and if we don't map 1:1 to optabs then instead use tree codes like
> S_PLUS_EXPR (mimicking RTL here).

Sorry, I don't quite get the point about the GIMPLE-like way. For .SAT_ADDU, I added a restriction
like unsigned_p (type) in match.pd. It looks like there is a better way here.

> Any other opinions?  Anyone knows more about fixed-point and RTL/modes?

AFAIK, the RISC-V backend has no scalar fixed-point support, but the vector extension does. The vector
fixed-point operations share the same modes as vector integers, for example RVVM1SI in vector-iterators.md.
Kito and Juzhe, please correct me if I have misunderstood anything.
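For reference, the RVV specification does provide vector saturating add instructions (vsaddu/vsadd). The loop below is a hedged sketch of the kind of element-wise source a vectorizer could map onto such an instruction; the function name is hypothetical, and nothing here claims that GCC currently vectorizes it that way.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element-wise unsigned saturating add over byte arrays,
   using the same branchless idiom as the scalar tests. */
void vec_sat_addu_u8 (uint8_t *restrict out, const uint8_t *restrict a,
                      const uint8_t *restrict b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      uint8_t sum = (uint8_t) (a[i] + b[i]);   /* wrapped 8-bit sum */
      /* -(sum < a[i]) is 0 or -1; or-ing -1 yields 255 after the cast. */
      out[i] = (uint8_t) (sum | -(sum < a[i]));
    }
}
```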

Pan

-----Original Message-----
From: Richard Biener <richard.guenther@gmail.com> 
Sent: Monday, February 19, 2024 3:36 PM
To: Li, Pan2 <pan2.li@intel.com>
Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com; Tamar.Christina@arm.com
Subject: Re: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU

On Sat, Feb 17, 2024 at 11:30 AM <pan2.li@intel.com> wrote:
>
> From: Pan Li <pan2.li@intel.com>
>
> This patch adds a middle-end representation for the unsigned
> saturating add, i.e. the result of the add is set to the maximum
> value on overflow.  It matches a pattern similar to the one below.
>
> SAT_ADDU (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
>
> Take uint8_t as example, we will have:
>
> * SAT_ADDU (1, 254)   => 255.
> * SAT_ADDU (1, 255)   => 255.
> * SAT_ADDU (2, 255)   => 255.
> * SAT_ADDU (255, 255) => 255.
>
> The patch also implements SAT_ADDU in the riscv backend as
> an example.  Given the example below:
>
> uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> {
>   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> }
>
> Before this patch:
>
> uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> {
>   long unsigned int _1;
>   _Bool _2;
>   long unsigned int _3;
>   long unsigned int _4;
>   uint64_t _7;
>   long unsigned int _10;
>   __complex__ long unsigned int _11;
>
> ;;   basic block 2, loop depth 0
> ;;    pred:       ENTRY
>   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
>   _1 = REALPART_EXPR <_11>;
>   _10 = IMAGPART_EXPR <_11>;
>   _2 = _10 != 0;
>   _3 = (long unsigned int) _2;
>   _4 = -_3;
>   _7 = _1 | _4;
>   return _7;
> ;;    succ:       EXIT
>
> }
>
> After this patch:
>
> uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> {
>   uint64_t _7;
>
> ;;   basic block 2, loop depth 0
> ;;    pred:       ENTRY
>   _7 = .SAT_ADDU (x_5(D), y_6(D)); [tail call]
>   return _7;
> ;;    succ:       EXIT
>
> }
>
> After this patch, the middle end thus has a dedicated
> representation, .SAT_ADDU, for this idiom.
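The branchless idiom can be checked against a straightforward reference. The sketch below (all function names are hypothetical) compares, for every uint8_t pair, the idiom from sat_arith.h, a version built on GCC's __builtin_add_overflow that mirrors the .ADD_OVERFLOW / REALPART_EXPR / IMAGPART_EXPR sequence in the dump above, and a widen-and-clamp reference. It also shows why the inner (TYPE) cast is needed: without it, integer promotion computes x + y in int, the comparison is never true for uint8_t, and the result simply wraps.

```c
#include <stdint.h>

/* Widen-and-clamp reference. */
uint8_t sat_addu_u8_ref (uint8_t x, uint8_t y)
{
  unsigned sum = (unsigned) x + y;
  return sum > UINT8_MAX ? UINT8_MAX : (uint8_t) sum;
}

/* The idiom from sat_arith.h instantiated for uint8_t. */
uint8_t sat_addu_u8_idiom (uint8_t x, uint8_t y)
{
  return (x + y) | (-(uint8_t) ((uint8_t) (x + y) < x));
}

/* Same computation via the overflow builtin. */
uint8_t sat_addu_u8_ovf (uint8_t x, uint8_t y)
{
  uint8_t sum;                                    /* like REALPART_EXPR */
  int ovf = __builtin_add_overflow (x, y, &sum);  /* like IMAGPART_EXPR */
  return sum | (uint8_t) -ovf;
}

/* Without the inner cast: x + y is an int, so (x + y) < x is always
   false for uint8_t and the result wraps instead of saturating. */
uint8_t sat_addu_u8_nocast (uint8_t x, uint8_t y)
{
  return (x + y) | -((x + y) < x);
}

/* Returns 1 if the idiom and the builtin form match the reference
   on every uint8_t pair. */
int sat_addu_u8_forms_agree (void)
{
  for (unsigned x = 0; x <= UINT8_MAX; x++)
    for (unsigned y = 0; y <= UINT8_MAX; y++)
      {
        uint8_t r = sat_addu_u8_ref ((uint8_t) x, (uint8_t) y);
        if (sat_addu_u8_idiom ((uint8_t) x, (uint8_t) y) != r
            || sat_addu_u8_ovf ((uint8_t) x, (uint8_t) y) != r)
          return 0;
      }
  return 1;
}
```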

I'll note that on RTL we already have SS_PLUS/US_PLUS and friends and
the corresponding ssadd/usadd optabs.  There's not much documentation
unfortunately besides the gen_*_fixed_libfunc usage where the comment
suggests this is used for fixed-point operations.  It looks like arm uses
fractional/accumulator modes for this but for example bfin has ssaddsi3.

So the question is whether the fixed-point case can be distinguished from
the integer case based on mode.

There's also FIXED_POINT_TYPE on the GENERIC/GIMPLE side and
no special tree operator codes for them.  So compared to what appears
to be the case on RTL we'd need a way to represent saturating integer
operations on GIMPLE.

The natural thing is to use direct optab internal functions (that's what you
basically did, but you added a new optab, IMO without good reason).
More GIMPLE-like would be to let the types involved decide whether
it's signed or unsigned saturation.  That's actually what I'd prefer here
and if we don't map 1:1 to optabs then instead use tree codes like
S_PLUS_EXPR (mimicking RTL here).
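To illustrate letting the types choose between signed and unsigned saturation, here is a small C11 sketch in which one hypothetical SAT_ADD macro dispatches on the operand type via _Generic. It only models the semantics at the source level and does not represent how GIMPLE or the optabs would encode this.

```c
#include <stdint.h>

/* Unsigned saturation: clamp to UINT32_MAX on overflow. */
uint32_t sat_add_u32 (uint32_t x, uint32_t y)
{
  uint32_t sum;
  return __builtin_add_overflow (x, y, &sum) ? UINT32_MAX : sum;
}

/* Signed saturation: overflow is only possible when x and y share a
   sign, so the sign of x selects which bound to clamp to. */
int32_t sat_add_s32 (int32_t x, int32_t y)
{
  int32_t sum;
  if (__builtin_add_overflow (x, y, &sum))
    return x < 0 ? INT32_MIN : INT32_MAX;
  return sum;
}

/* Hypothetical type-generic saturating plus: the operand type alone
   decides between signed and unsigned saturation. */
#define SAT_ADD(x, y) \
  _Generic ((x), int32_t: sat_add_s32, uint32_t: sat_add_u32) ((x), (y))
```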

Any other opinions?  Anyone knows more about fixed-point and RTL/modes?

Richard.

>         PR target/51492
>         PR target/112600
>
> gcc/ChangeLog:
>
>         * config/riscv/riscv-protos.h (riscv_expand_saturation_addu):
>         New func decl for the SAT_ADDU expand.
>         * config/riscv/riscv.cc (riscv_expand_saturation_addu): New func
>         impl for the SAT_ADDU expand.
>         * config/riscv/riscv.md (sat_addu_<mode>3): New pattern to impl
>         the standard name SAT_ADDU.
>         * doc/md.texi: Add doc for SAT_ADDU.
>         * internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADDU.
>         * internal-fn.def (SAT_ADDU): Add SAT_ADDU.
>         * match.pd: Add simplify pattern patch for SAT_ADDU.
>         * optabs.def (OPTAB_D): Add sat_addu_optab.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/riscv/sat_addu-1.c: New test.
>         * gcc.target/riscv/sat_addu-2.c: New test.
>         * gcc.target/riscv/sat_addu-3.c: New test.
>         * gcc.target/riscv/sat_addu-4.c: New test.
>         * gcc.target/riscv/sat_addu-run-1.c: New test.
>         * gcc.target/riscv/sat_addu-run-2.c: New test.
>         * gcc.target/riscv/sat_addu-run-3.c: New test.
>         * gcc.target/riscv/sat_addu-run-4.c: New test.
>         * gcc.target/riscv/sat_arith.h: New test.
>
> Signed-off-by: Pan Li <pan2.li@intel.com>
> ---
>  gcc/config/riscv/riscv-protos.h               |  1 +
>  gcc/config/riscv/riscv.cc                     | 46 +++++++++++++++++
>  gcc/config/riscv/riscv.md                     | 11 +++++
>  gcc/doc/md.texi                               | 11 +++++
>  gcc/internal-fn.cc                            |  1 +
>  gcc/internal-fn.def                           |  1 +
>  gcc/match.pd                                  | 22 +++++++++
>  gcc/optabs.def                                |  2 +
>  gcc/testsuite/gcc.target/riscv/sat_addu-1.c   | 18 +++++++
>  gcc/testsuite/gcc.target/riscv/sat_addu-2.c   | 20 ++++++++
>  gcc/testsuite/gcc.target/riscv/sat_addu-3.c   | 17 +++++++
>  gcc/testsuite/gcc.target/riscv/sat_addu-4.c   | 16 ++++++
>  .../gcc.target/riscv/sat_addu-run-1.c         | 42 ++++++++++++++++
>  .../gcc.target/riscv/sat_addu-run-2.c         | 42 ++++++++++++++++
>  .../gcc.target/riscv/sat_addu-run-3.c         | 42 ++++++++++++++++
>  .../gcc.target/riscv/sat_addu-run-4.c         | 49 +++++++++++++++++++
>  gcc/testsuite/gcc.target/riscv/sat_arith.h    | 15 ++++++
>  17 files changed, 356 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-3.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-4.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-run-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-run-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-run-3.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-run-4.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_arith.h
>
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index ae1685850ac..f201b2384f9 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -132,6 +132,7 @@ extern void riscv_asm_output_external (FILE *, const tree, const char *);
>  extern bool
>  riscv_zcmp_valid_stack_adj_bytes_p (HOST_WIDE_INT, int);
>  extern void riscv_legitimize_poly_move (machine_mode, rtx, rtx, rtx);
> +extern void riscv_expand_saturation_addu (rtx, rtx, rtx);
>
>  #ifdef RTX_CODE
>  extern void riscv_expand_int_scc (rtx, enum rtx_code, rtx, rtx, bool *invert_ptr = 0);
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 799d7919a4a..84e86eb5d49 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -10657,6 +10657,52 @@ riscv_vector_mode_supported_any_target_p (machine_mode)
>    return true;
>  }
>
> +/* Emit insn for the saturation addu, aka (x + y) | - ((x + y) < x).  */
> +void
> +riscv_expand_saturation_addu (rtx dest, rtx x, rtx y)
> +{
> +  machine_mode mode = GET_MODE (dest);
> +  rtx pmode_sum = gen_reg_rtx (Pmode);
> +  rtx pmode_lt = gen_reg_rtx (Pmode);
> +  rtx pmode_x = gen_lowpart (Pmode, x);
> +  rtx pmode_y = gen_lowpart (Pmode, y);
> +  rtx pmode_dest = gen_reg_rtx (Pmode);
> +
> +  /* Step-1: sum = x + y  */
> +  if (mode == SImode && mode != Pmode)
> +    { /* Take addw to avoid the sum truncate.  */
> +      rtx simode_sum = gen_reg_rtx (SImode);
> +      riscv_emit_binary (PLUS, simode_sum, x, y);
> +      emit_move_insn (pmode_sum, gen_lowpart (Pmode, simode_sum));
> +    }
> +  else
> +    riscv_emit_binary (PLUS, pmode_sum, pmode_x, pmode_y);
> +
> +  /* Step-1.1: truncate sum for HI and QI as we have no insn for add QI/HI.  */
> +  if (mode == HImode || mode == QImode)
> +    {
> +      int shift_bits = GET_MODE_BITSIZE (Pmode)
> +       - GET_MODE_BITSIZE (mode).to_constant ();
> +
> +      gcc_assert (shift_bits > 0);
> +
> +      riscv_emit_binary (ASHIFT, pmode_sum, pmode_sum, GEN_INT (shift_bits));
> +      riscv_emit_binary (LSHIFTRT, pmode_sum, pmode_sum, GEN_INT (shift_bits));
> +    }
> +
> +  /* Step-2: lt = sum < x  */
> +  riscv_emit_binary (LTU, pmode_lt, pmode_sum, pmode_x);
> +
> +  /* Step-3: lt = -lt  */
> +  riscv_emit_unary (NEG, pmode_lt, pmode_lt);
> +
> +  /* Step-4: pmode_dest = sum | lt  */
> +  riscv_emit_binary (IOR, pmode_dest, pmode_lt, pmode_sum);
> +
> +  /* Step-5: dest = pmode_dest */
> +  emit_move_insn (dest, gen_lowpart (mode, pmode_dest));
> +}
> +
>  /* Initialize the GCC target structure.  */
>  #undef TARGET_ASM_ALIGNED_HI_OP
>  #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
> diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> index 39b29795cd6..03cbe5a2ca9 100644
> --- a/gcc/config/riscv/riscv.md
> +++ b/gcc/config/riscv/riscv.md
> @@ -3841,6 +3841,17 @@ (define_insn "*large_load_address"
>    [(set_attr "type" "load")
>     (set (attr "length") (const_int 8))])
>
> +(define_expand "sat_addu_<mode>3"
> +  [(match_operand:ANYI   0 "register_operand")
> +   (match_operand:ANYI   1 "register_operand")
> +   (match_operand:ANYI   2 "register_operand")]
> +  ""
> +  {
> +    riscv_expand_saturation_addu (operands[0], operands[1], operands[2]);
> +    DONE;
> +  }
> +)
> +
>  (include "bitmanip.md")
>  (include "crypto.md")
>  (include "sync.md")
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index b0c61925120..5867afdb1a0 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -6653,6 +6653,17 @@ The operation is only supported for vector modes @var{m}.
>
>  This pattern is not allowed to @code{FAIL}.
>
> +@cindex @code{sat_addu_@var{m}3} instruction pattern
> +@item @samp{sat_addu_@var{m}3}
> +Perform the saturation unsigned add for the operand 1 and operand 2 and
> +store the result into the operand 0.  All operands have mode @var{m},
> +which is a scalar integer mode.
> +
> +@smallexample
> +  typedef unsigned char uint8_t;
> +  uint8_t sat_addu (uint8_t x, uint8_t y) => return (x + y) | -((x + y) < x);
> +@end smallexample
> +
>  @cindex @code{cmla@var{m}4} instruction pattern
>  @item @samp{cmla@var{m}4}
>  Perform a vector multiply and accumulate that is semantically the same as
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index a07f25f3aee..dee73dbc614 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4159,6 +4159,7 @@ commutative_binary_fn_p (internal_fn fn)
>      case IFN_VEC_WIDEN_PLUS_HI:
>      case IFN_VEC_WIDEN_PLUS_EVEN:
>      case IFN_VEC_WIDEN_PLUS_ODD:
> +    case IFN_SAT_ADDU:
>        return true;
>
>      default:
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index c14d30365c1..a04592fc779 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -428,6 +428,7 @@ DEF_INTERNAL_WIDENING_OPTAB_FN (VEC_WIDEN_ABD,
>                                 binary)
>  DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary)
>  DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary)
> +DEF_INTERNAL_OPTAB_FN (SAT_ADDU, ECF_CONST | ECF_NOTHROW, sat_addu, binary)
>
>  /* FP scales.  */
>  DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 711c3a10c3f..9de1106adcf 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -1994,6 +1994,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   )
>  )
>
> +#if GIMPLE
> +
> +/* Saturation add unsigned, aka:
> +   SAT_ADDU = (X + Y) | - ((X + Y) < X) or
> +   SAT_ADDU = (X + Y) | - ((X + Y) < Y).  */
> +(simplify
> + (bit_ior:c (plus:c@2 @0 @1) (negate (convert (lt @2 @0))))
> +   (if (optimize
> +       && INTEGRAL_TYPE_P (type)
> +       && TYPE_UNSIGNED (TREE_TYPE (@0))
> +       && types_match (type, TREE_TYPE (@0))
> +       && types_match (type, TREE_TYPE (@1))
> +       && direct_internal_fn_supported_p (IFN_SAT_ADDU, type, OPTIMIZE_FOR_BOTH))
> +   (IFN_SAT_ADDU @0 @1)))
> +
> +/* SAT_ADDU (X, 0) = X  */
> +(simplify
> + (IFN_SAT_ADDU:c @0 integer_zerop)
> + @0)
> +
> +#endif
> +
>  /* A few cases of fold-const.cc negate_expr_p predicate.  */
>  (match negate_expr_p
>   INTEGER_CST
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index ad14f9328b9..a2c11b7707b 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -300,6 +300,8 @@ OPTAB_D (usubc5_optab, "usubc$I$a5")
>  OPTAB_D (addptr3_optab, "addptr$a3")
>  OPTAB_D (spaceship_optab, "spaceship$a3")
>
> +OPTAB_D (sat_addu_optab, "sat_addu_$a3")
> +
>  OPTAB_D (smul_highpart_optab, "smul$a3_highpart")
>  OPTAB_D (umul_highpart_optab, "umul$a3_highpart")
>
> diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-1.c b/gcc/testsuite/gcc.target/riscv/sat_addu-1.c
> new file mode 100644
> index 00000000000..229abef0faa
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-1.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */
> +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +#include "sat_arith.h"
> +
> +/*
> +** sat_addu_uint8_t:
> +** add\s+[atx][0-9]+,\s*a0,\s*a1
> +** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*0xff
> +** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> +** neg\s+[atx][0-9]+,\s*[atx][0-9]+
> +** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> +** andi\s+a0,\s*a0,\s*0xff
> +** ret
> +*/
> +DEF_SAT_ADDU(uint8_t)
> diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-2.c b/gcc/testsuite/gcc.target/riscv/sat_addu-2.c
> new file mode 100644
> index 00000000000..4023b030811
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-2.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */
> +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +#include "sat_arith.h"
> +
> +/*
> +** sat_addu_uint16_t:
> +** add\s+[atx][0-9]+,\s*a0,\s*a1
> +** slli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
> +** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
> +** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> +** neg\s+[atx][0-9]+,\s*[atx][0-9]+
> +** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> +** slli\s+a0,\s*a0,\s*48
> +** srli\s+a0,\s*a0,\s*48
> +** ret
> +*/
> +DEF_SAT_ADDU(uint16_t)
> diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-3.c b/gcc/testsuite/gcc.target/riscv/sat_addu-3.c
> new file mode 100644
> index 00000000000..4d0af97fb67
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-3.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */
> +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +#include "sat_arith.h"
> +
> +/*
> +** sat_addu_uint32_t:
> +** addw\s+[atx][0-9]+,\s*a0,\s*a1
> +** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> +** neg\s+[atx][0-9]+,\s*[atx][0-9]+
> +** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> +** sext.w\s+a0,\s*a0
> +** ret
> +*/
> +DEF_SAT_ADDU(uint32_t)
> diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-4.c b/gcc/testsuite/gcc.target/riscv/sat_addu-4.c
> new file mode 100644
> index 00000000000..926f31266e3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-4.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */
> +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +#include "sat_arith.h"
> +
> +/*
> +** sat_addu_uint64_t:
> +** add\s+[atx][0-9]+,\s*a0,\s*a1
> +** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> +** neg\s+[atx][0-9]+,\s*[atx][0-9]+
> +** or\s+a0,\s*[atx][0-9]+,\s*[atx][0-9]+
> +** ret
> +*/
> +DEF_SAT_ADDU(uint64_t)
> diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-run-1.c b/gcc/testsuite/gcc.target/riscv/sat_addu-run-1.c
> new file mode 100644
> index 00000000000..b19515c39d1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-run-1.c
> @@ -0,0 +1,42 @@
> +/* { dg-do run { target { riscv_v } } } */
> +/* { dg-additional-options "-std=c99" } */
> +
> +#include "sat_arith.h"
> +
> +DEF_SAT_ADDU(uint8_t)
> +
> +int
> +main ()
> +{
> +  if (RUN_SAT_ADDU (uint8_t, 0, 0) != 0)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint8_t, 0, 1) != 1)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint8_t, 1, 1) != 2)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint8_t, 0, 254) != 254)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint8_t, 1, 254) != 255)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint8_t, 2, 254) != 255)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint8_t, 0, 255) != 255)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint8_t, 1, 255) != 255)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint8_t, 2, 255) != 255)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint8_t, 255, 255) != 255)
> +    __builtin_abort ();
> +
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-run-2.c b/gcc/testsuite/gcc.target/riscv/sat_addu-run-2.c
> new file mode 100644
> index 00000000000..90073fbe4ba
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-run-2.c
> @@ -0,0 +1,42 @@
> +/* { dg-do run { target { riscv_v } } } */
> +/* { dg-additional-options "-std=c99" } */
> +
> +#include "sat_arith.h"
> +
> +DEF_SAT_ADDU(uint16_t)
> +
> +int
> +main ()
> +{
> +  if (RUN_SAT_ADDU (uint16_t, 0, 0) != 0)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint16_t, 0, 1) != 1)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint16_t, 1, 1) != 2)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint16_t, 0, 65534) != 65534)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint16_t, 1, 65534) != 65535)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint16_t, 2, 65534) != 65535)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint16_t, 0, 65535) != 65535)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint16_t, 1, 65535) != 65535)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint16_t, 2, 65535) != 65535)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint16_t, 65535, 65535) != 65535)
> +    __builtin_abort ();
> +
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-run-3.c b/gcc/testsuite/gcc.target/riscv/sat_addu-run-3.c
> new file mode 100644
> index 00000000000..996dd3de737
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-run-3.c
> @@ -0,0 +1,42 @@
> +/* { dg-do run { target { riscv_v } } } */
> +/* { dg-additional-options "-std=c99" } */
> +
> +#include "sat_arith.h"
> +
> +DEF_SAT_ADDU(uint32_t)
> +
> +int
> +main ()
> +{
> +  if (RUN_SAT_ADDU (uint32_t, 0, 0) != 0)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint32_t, 0, 1) != 1)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint32_t, 1, 1) != 2)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint32_t, 0, 4294967294) != 4294967294)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint32_t, 1, 4294967294) != 4294967295)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint32_t, 2, 4294967294) != 4294967295)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint32_t, 0, 4294967295) != 4294967295)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint32_t, 1, 4294967295) != 4294967295)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint32_t, 2, 4294967295) != 4294967295)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint32_t, 4294967295, 4294967295) != 4294967295)
> +    __builtin_abort ();
> +
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-run-4.c b/gcc/testsuite/gcc.target/riscv/sat_addu-run-4.c
> new file mode 100644
> index 00000000000..51a5421577b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-run-4.c
> @@ -0,0 +1,49 @@
> +/* { dg-do run { target { riscv_v } } } */
> +/* { dg-additional-options "-std=c99" } */
> +
> +#include "sat_arith.h"
> +
> +DEF_SAT_ADDU(uint64_t)
> +
> +int
> +main ()
> +{
> +  if (RUN_SAT_ADDU (uint64_t, 0, 0) != 0)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint64_t, 0, 1) != 1)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint64_t, 1, 1) != 2)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint64_t, 0, 18446744073709551614u)
> +    != 18446744073709551614u)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint64_t, 1, 18446744073709551614u)
> +    != 18446744073709551615u)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint64_t, 2, 18446744073709551614u)
> +    != 18446744073709551615u)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint64_t, 0, 18446744073709551615u)
> +    != 18446744073709551615u)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint64_t, 1, 18446744073709551615u)
> +    != 18446744073709551615u)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint64_t, 2, 18446744073709551615u)
> +    != 18446744073709551615u)
> +    __builtin_abort ();
> +
> +  if (RUN_SAT_ADDU (uint64_t, 18446744073709551615u, 18446744073709551615u)
> +    != 18446744073709551615u)
> +    __builtin_abort ();
> +
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h b/gcc/testsuite/gcc.target/riscv/sat_arith.h
> new file mode 100644
> index 00000000000..4c00157685e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
> @@ -0,0 +1,15 @@
> +#ifndef HAVE_SAT_ARITH
> +#define HAVE_SAT_ARITH
> +
> +#include <stdint.h>
> +
> +#define DEF_SAT_ADDU(TYPE)                       \
> +TYPE __attribute__((noinline))                   \
> +sat_addu_##TYPE (TYPE x, TYPE y)                 \
> +{                                                \
> +  return (x + y) | (-(TYPE)((TYPE)(x + y) < x)); \
> +}
> +
> +#define RUN_SAT_ADDU(TYPE, x, y) sat_addu_##TYPE(x, y)
> +
> +#endif
> --
> 2.34.1
>

Li, Pan2 Feb. 19, 2024, 8:42 a.m. UTC | #4

> There was a discussion about this back in 2021:
> https://gcc.gnu.org/pipermail/gcc/2021-May/236015.html
> Including a reference to the much older discussion from JSM about
> fixed-point types and lowering and such:
> https://gcc.gnu.org/legacy-ml/gcc-patches/2011-05/msg00846.html

Thanks Andrew, I will go through them for more details.

Pan

-----Original Message-----
From: Andrew Pinski <pinskia@gmail.com> 
Sent: Monday, February 19, 2024 4:31 PM
To: Richard Biener <richard.guenther@gmail.com>
Cc: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com; Tamar.Christina@arm.com
Subject: Re: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU

On Sun, Feb 18, 2024 at 11:37 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Sat, Feb 17, 2024 at 11:30 AM <pan2.li@intel.com> wrote:
> >
> > From: Pan Li <pan2.li@intel.com>
> >
> > This patch adds a middle-end representation for unsigned saturating
> > addition, i.e. the result of the add is clamped to the maximum value
> > on overflow.  It matches a pattern similar to the one below.
> >
> > SAT_ADDU (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
> >
> > Take uint8_t as example, we will have:
> >
> > * SAT_ADDU (1, 254)   => 255.
> > * SAT_ADDU (1, 255)   => 255.
> > * SAT_ADDU (2, 255)   => 255.
> > * SAT_ADDU (255, 255) => 255.
> >
> > The patch also implements SAT_ADDU in the riscv backend as an
> > example.  Given the example below:
> >
> > uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> > {
> >   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> > }
> >
> > Before this patch:
> >
> > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > {
> >   long unsigned int _1;
> >   _Bool _2;
> >   long unsigned int _3;
> >   long unsigned int _4;
> >   uint64_t _7;
> >   long unsigned int _10;
> >   __complex__ long unsigned int _11;
> >
> > ;;   basic block 2, loop depth 0
> > ;;    pred:       ENTRY
> >   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
> >   _1 = REALPART_EXPR <_11>;
> >   _10 = IMAGPART_EXPR <_11>;
> >   _2 = _10 != 0;
> >   _3 = (long unsigned int) _2;
> >   _4 = -_3;
> >   _7 = _1 | _4;
> >   return _7;
> > ;;    succ:       EXIT
> >
> > }
> >
> > After this patch:
> >
> > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > {
> >   uint64_t _7;
> >
> > ;;   basic block 2, loop depth 0
> > ;;    pred:       ENTRY
> >   _7 = .SAT_ADDU (x_5(D), y_6(D)); [tail call]
> >   return _7;
> > ;;    succ:       EXIT
> >
> > }
> >
> > After this patch, the middle end has a .SAT_ADDU representation
> > for this pattern.
>
> I'll note that on RTL we already have SS_PLUS/US_PLUS and friends and
> the corresponding ssadd/usadd optabs.  There's not much documentation,
> unfortunately, besides the gen_*_fixed_libfunc usage, where the comment
> suggests this is used for fixed-point operations.  It looks like arm uses
> fractional/accumulator modes for this but for example bfin has ssaddsi3.
>
> So the question is whether the fixed-point case can be distinguished from
> the integer case based on mode.
>
> There's also FIXED_POINT_TYPE on the GENERIC/GIMPLE side and
> no special tree operator codes for them.  So compared to what appears
> to be the case on RTL we'd need a way to represent saturating integer
> operations on GIMPLE.
>
> The natural thing is to use direct optab internal functions (that's what you
> basically did, but you added a new optab, IMO without good reason).
> More GIMPLE-like would be to let the types involved decide whether
> it's signed or unsigned saturation.  That's actually what I'd prefer here
> and if we don't map 1:1 to optabs then instead use tree codes like
> S_PLUS_EXPR (mimicking RTL here).
>
> Any other opinions?  Anyone knows more about fixed-point and RTL/modes?

There was a discussion about this back in 2021:
https://gcc.gnu.org/pipermail/gcc/2021-May/236015.html

Including a reference to the much older discussion from JSM about
fixed-point types and lowering and such:
https://gcc.gnu.org/legacy-ml/gcc-patches/2011-05/msg00846.html

I am not 100% sure how much of this applies here though.

I have not looked fully into either thread to get a sense of what was
decided in the end.

Thanks,
Andrew

>
> Richard.
>
> >         PR target/51492
> >         PR target/112600
> >
> > gcc/ChangeLog:
> >
> >         * config/riscv/riscv-protos.h (riscv_expand_saturation_addu):
> >         New func decl for the SAT_ADDU expand.
> >         * config/riscv/riscv.cc (riscv_expand_saturation_addu): New func
> >         impl for the SAT_ADDU expand.
> >         * config/riscv/riscv.md (sat_addu_<mode>3): New pattern to impl
> >         the standard name SAT_ADDU.
> >         * doc/md.texi: Add doc for SAT_ADDU.
> >         * internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADDU.
> >         * internal-fn.def (SAT_ADDU): Add SAT_ADDU.
> >         * match.pd: Add simplify patterns for SAT_ADDU.
> >         * optabs.def (OPTAB_D): Add sat_addu_optab.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.target/riscv/sat_addu-1.c: New test.
> >         * gcc.target/riscv/sat_addu-2.c: New test.
> >         * gcc.target/riscv/sat_addu-3.c: New test.
> >         * gcc.target/riscv/sat_addu-4.c: New test.
> >         * gcc.target/riscv/sat_addu-run-1.c: New test.
> >         * gcc.target/riscv/sat_addu-run-2.c: New test.
> >         * gcc.target/riscv/sat_addu-run-3.c: New test.
> >         * gcc.target/riscv/sat_addu-run-4.c: New test.
> >         * gcc.target/riscv/sat_arith.h: New test.
> >
> > Signed-off-by: Pan Li <pan2.li@intel.com>
> > ---
> >  gcc/config/riscv/riscv-protos.h               |  1 +
> >  gcc/config/riscv/riscv.cc                     | 46 +++++++++++++++++
> >  gcc/config/riscv/riscv.md                     | 11 +++++
> >  gcc/doc/md.texi                               | 11 +++++
> >  gcc/internal-fn.cc                            |  1 +
> >  gcc/internal-fn.def                           |  1 +
> >  gcc/match.pd                                  | 22 +++++++++
> >  gcc/optabs.def                                |  2 +
> >  gcc/testsuite/gcc.target/riscv/sat_addu-1.c   | 18 +++++++
> >  gcc/testsuite/gcc.target/riscv/sat_addu-2.c   | 20 ++++++++
> >  gcc/testsuite/gcc.target/riscv/sat_addu-3.c   | 17 +++++++
> >  gcc/testsuite/gcc.target/riscv/sat_addu-4.c   | 16 ++++++
> >  .../gcc.target/riscv/sat_addu-run-1.c         | 42 ++++++++++++++++
> >  .../gcc.target/riscv/sat_addu-run-2.c         | 42 ++++++++++++++++
> >  .../gcc.target/riscv/sat_addu-run-3.c         | 42 ++++++++++++++++
> >  .../gcc.target/riscv/sat_addu-run-4.c         | 49 +++++++++++++++++++
> >  gcc/testsuite/gcc.target/riscv/sat_arith.h    | 15 ++++++
> >  17 files changed, 356 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-2.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-3.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-4.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-run-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-run-2.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-run-3.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-run-4.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_arith.h
> >
> > diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> > index ae1685850ac..f201b2384f9 100644
> > --- a/gcc/config/riscv/riscv-protos.h
> > +++ b/gcc/config/riscv/riscv-protos.h
> > @@ -132,6 +132,7 @@ extern void riscv_asm_output_external (FILE *, const tree, const char *);
> >  extern bool
> >  riscv_zcmp_valid_stack_adj_bytes_p (HOST_WIDE_INT, int);
> >  extern void riscv_legitimize_poly_move (machine_mode, rtx, rtx, rtx);
> > +extern void riscv_expand_saturation_addu (rtx, rtx, rtx);
> >
> >  #ifdef RTX_CODE
> >  extern void riscv_expand_int_scc (rtx, enum rtx_code, rtx, rtx, bool *invert_ptr = 0);
> > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> > index 799d7919a4a..84e86eb5d49 100644
> > --- a/gcc/config/riscv/riscv.cc
> > +++ b/gcc/config/riscv/riscv.cc
> > @@ -10657,6 +10657,52 @@ riscv_vector_mode_supported_any_target_p (machine_mode)
> >    return true;
> >  }
> >
> > +/* Emit insn for the saturation addu, aka (x + y) | - ((x + y) < x).  */
> > +void
> > +riscv_expand_saturation_addu (rtx dest, rtx x, rtx y)
> > +{
> > +  machine_mode mode = GET_MODE (dest);
> > +  rtx pmode_sum = gen_reg_rtx (Pmode);
> > +  rtx pmode_lt = gen_reg_rtx (Pmode);
> > +  rtx pmode_x = gen_lowpart (Pmode, x);
> > +  rtx pmode_y = gen_lowpart (Pmode, y);
> > +  rtx pmode_dest = gen_reg_rtx (Pmode);
> > +
> > +  /* Step-1: sum = x + y  */
> > +  if (mode == SImode && mode != Pmode)
> > +    { /* Take addw to avoid the sum truncate.  */
> > +      rtx simode_sum = gen_reg_rtx (SImode);
> > +      riscv_emit_binary (PLUS, simode_sum, x, y);
> > +      emit_move_insn (pmode_sum, gen_lowpart (Pmode, simode_sum));
> > +    }
> > +  else
> > +    riscv_emit_binary (PLUS, pmode_sum, pmode_x, pmode_y);
> > +
> > +  /* Step-1.1: truncate sum for HI and QI as we have no insn for add QI/HI.  */
> > +  if (mode == HImode || mode == QImode)
> > +    {
> > +      int shift_bits = GET_MODE_BITSIZE (Pmode)
> > +       - GET_MODE_BITSIZE (mode).to_constant ();
> > +
> > +      gcc_assert (shift_bits > 0);
> > +
> > +      riscv_emit_binary (ASHIFT, pmode_sum, pmode_sum, GEN_INT (shift_bits));
> > +      riscv_emit_binary (LSHIFTRT, pmode_sum, pmode_sum, GEN_INT (shift_bits));
> > +    }
> > +
> > +  /* Step-2: lt = sum < x  */
> > +  riscv_emit_binary (LTU, pmode_lt, pmode_sum, pmode_x);
> > +
> > +  /* Step-3: lt = -lt  */
> > +  riscv_emit_unary (NEG, pmode_lt, pmode_lt);
> > +
> > +  /* Step-4: pmode_dest = sum | lt  */
> > +  riscv_emit_binary (IOR, pmode_dest, pmode_lt, pmode_sum);
> > +
> > +  /* Step-5: dest = pmode_dest */
> > +  emit_move_insn (dest, gen_lowpart (mode, pmode_dest));
> > +}
> > +
> >  /* Initialize the GCC target structure.  */
> >  #undef TARGET_ASM_ALIGNED_HI_OP
> >  #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
> > diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> > index 39b29795cd6..03cbe5a2ca9 100644
> > --- a/gcc/config/riscv/riscv.md
> > +++ b/gcc/config/riscv/riscv.md
> > @@ -3841,6 +3841,17 @@ (define_insn "*large_load_address"
> >    [(set_attr "type" "load")
> >     (set (attr "length") (const_int 8))])
> >
> > +(define_expand "sat_addu_<mode>3"
> > +  [(match_operand:ANYI   0 "register_operand")
> > +   (match_operand:ANYI   1 "register_operand")
> > +   (match_operand:ANYI   2 "register_operand")]
> > +  ""
> > +  {
> > +    riscv_expand_saturation_addu (operands[0], operands[1], operands[2]);
> > +    DONE;
> > +  }
> > +)
> > +
> >  (include "bitmanip.md")
> >  (include "crypto.md")
> >  (include "sync.md")
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > index b0c61925120..5867afdb1a0 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -6653,6 +6653,17 @@ The operation is only supported for vector modes @var{m}.
> >
> >  This pattern is not allowed to @code{FAIL}.
> >
> > +@cindex @code{sat_addu_@var{m}3} instruction pattern
> > +@item @samp{sat_addu_@var{m}3}
> > +Perform unsigned saturating addition of operands 1 and 2, clamping the
> > +result to the maximum value representable in @var{m} on overflow, and
> > +store it in operand 0.  All operands have mode @var{m}, a scalar integer mode.
> > +
> > +@smallexample
> > +  typedef unsigned char uint8_t;
> > +  uint8_t sat_addu (uint8_t x, uint8_t y) => return (x + y) | -((uint8_t)(x + y) < x);
> > +@end smallexample
> > +
> >  @cindex @code{cmla@var{m}4} instruction pattern
> >  @item @samp{cmla@var{m}4}
> >  Perform a vector multiply and accumulate that is semantically the same as
> > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> > index a07f25f3aee..dee73dbc614 100644
> > --- a/gcc/internal-fn.cc
> > +++ b/gcc/internal-fn.cc
> > @@ -4159,6 +4159,7 @@ commutative_binary_fn_p (internal_fn fn)
> >      case IFN_VEC_WIDEN_PLUS_HI:
> >      case IFN_VEC_WIDEN_PLUS_EVEN:
> >      case IFN_VEC_WIDEN_PLUS_ODD:
> > +    case IFN_SAT_ADDU:
> >        return true;
> >
> >      default:
> > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > index c14d30365c1..a04592fc779 100644
> > --- a/gcc/internal-fn.def
> > +++ b/gcc/internal-fn.def
> > @@ -428,6 +428,7 @@ DEF_INTERNAL_WIDENING_OPTAB_FN (VEC_WIDEN_ABD,
> >                                 binary)
> >  DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary)
> >  DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary)
> > +DEF_INTERNAL_OPTAB_FN (SAT_ADDU, ECF_CONST | ECF_NOTHROW, sat_addu, binary)
> >
> >  /* FP scales.  */
> >  DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 711c3a10c3f..9de1106adcf 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -1994,6 +1994,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >   )
> >  )
> >
> > +#if GIMPLE
> > +
> > +/* Saturation add unsigned, aka:
> > +   SAT_ADDU = (X + Y) | - ((X + Y) < X) or
> > +   SAT_ADDU = (X + Y) | - ((X + Y) < Y).  */
> > +(simplify
> > + (bit_ior:c (plus:c@2 @0 @1) (negate (convert (lt @2 @0))))
> > +   (if (optimize
> > +       && INTEGRAL_TYPE_P (type)
> > +       && TYPE_UNSIGNED (TREE_TYPE (@0))
> > +       && types_match (type, TREE_TYPE (@0))
> > +       && types_match (type, TREE_TYPE (@1))
> > +       && direct_internal_fn_supported_p (IFN_SAT_ADDU, type, OPTIMIZE_FOR_BOTH))
> > +   (IFN_SAT_ADDU @0 @1)))
> > +
> > +/* SAT_ADDU (X, 0) = X  */
> > +(simplify
> > + (IFN_SAT_ADDU:c @0 integer_zerop)
> > + @0)
> > +
> > +#endif
> > +
> >  /* A few cases of fold-const.cc negate_expr_p predicate.  */
> >  (match negate_expr_p
> >   INTEGER_CST
> > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > index ad14f9328b9..a2c11b7707b 100644
> > --- a/gcc/optabs.def
> > +++ b/gcc/optabs.def
> > @@ -300,6 +300,8 @@ OPTAB_D (usubc5_optab, "usubc$I$a5")
> >  OPTAB_D (addptr3_optab, "addptr$a3")
> >  OPTAB_D (spaceship_optab, "spaceship$a3")
> >
> > +OPTAB_D (sat_addu_optab, "sat_addu_$a3")
> > +
> >  OPTAB_D (smul_highpart_optab, "smul$a3_highpart")
> >  OPTAB_D (umul_highpart_optab, "umul$a3_highpart")
> >
> > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-1.c b/gcc/testsuite/gcc.target/riscv/sat_addu-1.c
> > new file mode 100644
> > index 00000000000..229abef0faa
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-1.c
> > @@ -0,0 +1,18 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
> > +/* { dg-final { check-function-bodies "**" "" } } */
> > +
> > +#include "sat_arith.h"
> > +
> > +/*
> > +** sat_addu_uint8_t:
> > +** add\s+[atx][0-9]+,\s*a0,\s*a1
> > +** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*0xff
> > +** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > +** neg\s+[atx][0-9]+,\s*[atx][0-9]+
> > +** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > +** andi\s+a0,\s*a0,\s*0xff
> > +** ret
> > +*/
> > +DEF_SAT_ADDU(uint8_t)
> > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-2.c b/gcc/testsuite/gcc.target/riscv/sat_addu-2.c
> > new file mode 100644
> > index 00000000000..4023b030811
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-2.c
> > @@ -0,0 +1,20 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
> > +/* { dg-final { check-function-bodies "**" "" } } */
> > +
> > +#include "sat_arith.h"
> > +
> > +/*
> > +** sat_addu_uint16_t:
> > +** add\s+[atx][0-9]+,\s*a0,\s*a1
> > +** slli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
> > +** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
> > +** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > +** neg\s+[atx][0-9]+,\s*[atx][0-9]+
> > +** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > +** slli\s+a0,\s*a0,\s*48
> > +** srli\s+a0,\s*a0,\s*48
> > +** ret
> > +*/
> > +DEF_SAT_ADDU(uint16_t)
> > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-3.c b/gcc/testsuite/gcc.target/riscv/sat_addu-3.c
> > new file mode 100644
> > index 00000000000..4d0af97fb67
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-3.c
> > @@ -0,0 +1,17 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
> > +/* { dg-final { check-function-bodies "**" "" } } */
> > +
> > +#include "sat_arith.h"
> > +
> > +/*
> > +** sat_addu_uint32_t:
> > +** addw\s+[atx][0-9]+,\s*a0,\s*a1
> > +** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > +** neg\s+[atx][0-9]+,\s*[atx][0-9]+
> > +** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > +** sext.w\s+a0,\s*a0
> > +** ret
> > +*/
> > +DEF_SAT_ADDU(uint32_t)
> > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-4.c b/gcc/testsuite/gcc.target/riscv/sat_addu-4.c
> > new file mode 100644
> > index 00000000000..926f31266e3
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-4.c
> > @@ -0,0 +1,16 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
> > +/* { dg-final { check-function-bodies "**" "" } } */
> > +
> > +#include "sat_arith.h"
> > +
> > +/*
> > +** sat_addu_uint64_t:
> > +** add\s+[atx][0-9]+,\s*a0,\s*a1
> > +** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > +** neg\s+[atx][0-9]+,\s*[atx][0-9]+
> > +** or\s+a0,\s*[atx][0-9]+,\s*[atx][0-9]+
> > +** ret
> > +*/
> > +DEF_SAT_ADDU(uint64_t)
> > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-run-1.c b/gcc/testsuite/gcc.target/riscv/sat_addu-run-1.c
> > new file mode 100644
> > index 00000000000..b19515c39d1
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-run-1.c
> > @@ -0,0 +1,42 @@
> > +/* { dg-do run { target { riscv_v } } } */
> > +/* { dg-additional-options "-std=c99" } */
> > +
> > +#include "sat_arith.h"
> > +
> > +DEF_SAT_ADDU(uint8_t)
> > +
> > +int
> > +main ()
> > +{
> > +  if (RUN_SAT_ADDU (uint8_t, 0, 0) != 0)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint8_t, 0, 1) != 1)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint8_t, 1, 1) != 2)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint8_t, 0, 254) != 254)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint8_t, 1, 254) != 255)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint8_t, 2, 254) != 255)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint8_t, 0, 255) != 255)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint8_t, 1, 255) != 255)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint8_t, 2, 255) != 255)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint8_t, 255, 255) != 255)
> > +    __builtin_abort ();
> > +
> > +  return 0;
> > +}
> > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-run-2.c b/gcc/testsuite/gcc.target/riscv/sat_addu-run-2.c
> > new file mode 100644
> > index 00000000000..90073fbe4ba
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-run-2.c
> > @@ -0,0 +1,42 @@
> > +/* { dg-do run { target { riscv_v } } } */
> > +/* { dg-additional-options "-std=c99" } */
> > +
> > +#include "sat_arith.h"
> > +
> > +DEF_SAT_ADDU(uint16_t)
> > +
> > +int
> > +main ()
> > +{
> > +  if (RUN_SAT_ADDU (uint16_t, 0, 0) != 0)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint16_t, 0, 1) != 1)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint16_t, 1, 1) != 2)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint16_t, 0, 65534) != 65534)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint16_t, 1, 65534) != 65535)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint16_t, 2, 65534) != 65535)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint16_t, 0, 65535) != 65535)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint16_t, 1, 65535) != 65535)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint16_t, 2, 65535) != 65535)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint16_t, 65535, 65535) != 65535)
> > +    __builtin_abort ();
> > +
> > +  return 0;
> > +}
> > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-run-3.c b/gcc/testsuite/gcc.target/riscv/sat_addu-run-3.c
> > new file mode 100644
> > index 00000000000..996dd3de737
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-run-3.c
> > @@ -0,0 +1,42 @@
> > +/* { dg-do run { target { riscv_v } } } */
> > +/* { dg-additional-options "-std=c99" } */
> > +
> > +#include "sat_arith.h"
> > +
> > +DEF_SAT_ADDU(uint32_t)
> > +
> > +int
> > +main ()
> > +{
> > +  if (RUN_SAT_ADDU (uint32_t, 0, 0) != 0)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint32_t, 0, 1) != 1)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint32_t, 1, 1) != 2)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint32_t, 0, 4294967294) != 4294967294)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint32_t, 1, 4294967294) != 4294967295)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint32_t, 2, 4294967294) != 4294967295)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint32_t, 0, 4294967295) != 4294967295)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint32_t, 1, 4294967295) != 4294967295)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint32_t, 2, 4294967295) != 4294967295)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint32_t, 4294967295, 4294967295) != 4294967295)
> > +    __builtin_abort ();
> > +
> > +  return 0;
> > +}
> > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-run-4.c b/gcc/testsuite/gcc.target/riscv/sat_addu-run-4.c
> > new file mode 100644
> > index 00000000000..51a5421577b
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-run-4.c
> > @@ -0,0 +1,49 @@
> > +/* { dg-do run { target { riscv_v } } } */
> > +/* { dg-additional-options "-std=c99" } */
> > +
> > +#include "sat_arith.h"
> > +
> > +DEF_SAT_ADDU(uint64_t)
> > +
> > +int
> > +main ()
> > +{
> > +  if (RUN_SAT_ADDU (uint64_t, 0, 0) != 0)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint64_t, 0, 1) != 1)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint64_t, 1, 1) != 2)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint64_t, 0, 18446744073709551614u)
> > +    != 18446744073709551614u)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint64_t, 1, 18446744073709551614u)
> > +    != 18446744073709551615u)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint64_t, 2, 18446744073709551614u)
> > +    != 18446744073709551615u)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint64_t, 0, 18446744073709551615u)
> > +    != 18446744073709551615u)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint64_t, 1, 18446744073709551615u)
> > +    != 18446744073709551615u)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint64_t, 2, 18446744073709551615u)
> > +    != 18446744073709551615u)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint64_t, 18446744073709551615u, 18446744073709551615u)
> > +    != 18446744073709551615u)
> > +    __builtin_abort ();
> > +
> > +  return 0;
> > +}
> > diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h b/gcc/testsuite/gcc.target/riscv/sat_arith.h
> > new file mode 100644
> > index 00000000000..4c00157685e
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
> > @@ -0,0 +1,15 @@
> > +#ifndef HAVE_SAT_ARITH
> > +#define HAVE_SAT_ARITH
> > +
> > +#include <stdint.h>
> > +
> > +#define DEF_SAT_ADDU(TYPE)                       \
> > +TYPE __attribute__((noinline))                   \
> > +sat_addu_##TYPE (TYPE x, TYPE y)                 \
> > +{                                                \
> > +  return (x + y) | (-(TYPE)((TYPE)(x + y) < x)); \
> > +}
> > +
> > +#define RUN_SAT_ADDU(TYPE, x, y) sat_addu_##TYPE(x, y)
> > +
> > +#endif
> > --
> > 2.34.1
> >
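A short sketch of why the inner (TYPE) cast in DEF_SAT_ADDU matters for types narrower than int. Without it, x + y is evaluated in int after the usual integer promotions, so it never wraps and the overflow comparison never fires. Both function names below are hypothetical.

```c
#include <stdint.h>

/* Same expression as DEF_SAT_ADDU (uint8_t): the wrapped 8-bit sum is
   compared against x, so the overflow test can fire.  */
static inline uint8_t
sat_addu_u8_with_cast (uint8_t x, uint8_t y)
{
  return (x + y) | (-(uint8_t) ((uint8_t) (x + y) < x));
}

/* Hypothetical variant without the inner cast: x and y are promoted to
   int, x + y never wraps, (x + y) < x is always 0, and the result just
   wraps when it is converted back to uint8_t on return.  */
static inline uint8_t
sat_addu_u8_without_cast (uint8_t x, uint8_t y)
{
  return (x + y) | -((x + y) < x);
}
```

For instance, with x = 200 and y = 100 the first function returns 255, while the second returns 44 (300 modulo 256).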

Tamar Christina Feb. 19, 2024, 8:55 a.m. UTC | #5

Thanks for doing this!

> -----Original Message-----
> From: Li, Pan2 <pan2.li@intel.com>
> Sent: Monday, February 19, 2024 8:42 AM
> To: Richard Biener <richard.guenther@gmail.com>
> Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang
> <yanzhang.wang@intel.com>; kito.cheng@gmail.com; Tamar Christina
> <Tamar.Christina@arm.com>
> Subject: RE: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU
> 
> Thanks, Richard, for the comments.
> 
> > I'll note that on RTL we already have SS_PLUS/US_PLUS and friends and
> > the corresponding ssadd/usadd optabs.  There's not much documentation
> > unfortunately besides the use of gen_*_fixed_libfunc usage where the comment
> > suggests this is used for fixed-point operations.  It looks like arm uses
> > fractional/accumulator modes for this but for example bfin has ssaddsi3.
> 
> I found the description of the plus family in the GCC internals doc, but it doesn't
> say anything about mode m here.
> 
> (plus:m x y)
> (ss_plus:m x y)
> (us_plus:m x y)
> These three expressions all represent the sum of the values represented by x
> and y carried out in machine mode m. They differ in their behavior on overflow
> of integer modes. plus wraps round modulo the width of m; ss_plus saturates
> at the maximum signed value representable in m; us_plus saturates at the
> maximum unsigned value.
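As an informal illustration of the three overflow behaviours described in the excerpt above, the following C sketch models them on raw 8-bit patterns (a QImode-sized value). The helper names are hypothetical and are not GCC internals.

```c
#include <stdint.h>

/* plus: wrap around modulo 2^8.  */
static inline uint8_t
plus_qi (uint8_t x, uint8_t y)
{
  return (uint8_t) (x + y);
}

/* ss_plus: read the bits as signed and clamp to [INT8_MIN, INT8_MAX].
   The uint8_t -> int8_t conversion relies on GCC's modulo semantics.  */
static inline uint8_t
ss_plus_qi (uint8_t x, uint8_t y)
{
  int sum = (int8_t) x + (int8_t) y;
  if (sum > INT8_MAX)
    sum = INT8_MAX;
  else if (sum < INT8_MIN)
    sum = INT8_MIN;
  return (uint8_t) sum;
}

/* us_plus: read the bits as unsigned and clamp to UINT8_MAX.  */
static inline uint8_t
us_plus_qi (uint8_t x, uint8_t y)
{
  unsigned sum = (unsigned) x + y;
  return sum > UINT8_MAX ? UINT8_MAX : (uint8_t) sum;
}
```

For example, 0x7f + 0x01 yields 0x80, 0x7f and 0x80 respectively, and 0xff + 0x02 yields 0x01, 0x01 and 0xff: the same bit patterns in the same mode saturate differently depending on whether they are read as signed or unsigned.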
> 
> > The natural thing is to use direct optab internal functions (that's what you
> > basically did, but you added a new optab, IMO without good reason).

I think we should actually do an indirect optab here, because the IFN can be used
to replace the general representation of saturating arithmetic.

E.g. the __builtin_add_overflow case in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600
is inefficient on all targets, so the IFN can always expand to something more
efficient, like the branchless version add_sat2.

I think this is why you suggested a new tree code below, but we don't really need
tree codes for this. It can be done more cleanly in the same way as DEF_INTERNAL_INT_EXT_FN.
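To make the comparison concrete, here are two hypothetical C functions computing the same unsigned 32-bit saturating add: one through __builtin_add_overflow with a branch, and one branchless. They only sketch the two styles under discussion; the exact add_sat2 code attached to PR 112600 may differ.

```c
#include <stdint.h>

/* Branchy form: ask the builtin whether the add wrapped.  */
static inline uint32_t
sat_add_overflow_u32 (uint32_t x, uint32_t y)
{
  uint32_t sum;
  if (__builtin_add_overflow (x, y, &sum))
    return UINT32_MAX;
  return sum;
}

/* Branchless form: on overflow the comparison yields 1, its negation is
   all ones, and OR-ing that into the wrapped sum gives UINT32_MAX.  */
static inline uint32_t
sat_add_branchless_u32 (uint32_t x, uint32_t y)
{
  uint32_t sum = x + y;
  return sum | -(uint32_t) (sum < x);
}
```

Since both functions compute the same value, a single IFN could in principle be matched from either source form and expanded to whichever sequence is cheapest on the target.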

> 
> That makes sense to me; I will try to leverage US_PLUS here instead.
> 
> > More GIMPLE-like would be to let the types involved decide whether
> > it's signed or unsigned saturation.  That's actually what I'd prefer here
> > and if we don't map 1:1 to optabs then instead use tree codes like
> > S_PLUS_EXPR (mimicking RTL here).
> 
> Sorry, I don't quite get the point of the GIMPLE-like way. For .SAT_ADDU, I added a
> restriction like unsigned_p (type) in match.pd. It looks like there is a better way here.
> 

Richard means that there shouldn't be separate .SAT_ADDU and .SAT_ADDS, and that the sign
should be determined by the types at expansion time.  I.e. there should only be
.SAT_ADD.

So instead of this

+DEF_INTERNAL_OPTAB_FN (SAT_ADDU, ECF_CONST | ECF_NOTHROW, sat_addu, binary)

You should use DEF_INTERNAL_SIGNED_OPTAB_FN.
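A rough C analogy of a single .SAT_ADD whose signedness comes from its type: one entry point, with the signed or unsigned behaviour selected from the operand type via C11 _Generic. All names are hypothetical, and in GCC the choice between a signed and an unsigned optab (for example ssadd/usadd) would be made at expansion time rather than in the source.

```c
#include <stdint.h>

/* Unsigned 8-bit saturating add (branchless, as in the patch).  */
static inline uint8_t
sat_add_u8 (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);
  return sum | -(uint8_t) (sum < x);
}

/* Signed 8-bit saturating add: compute exactly in int, then clamp.  */
static inline int8_t
sat_add_s8 (int8_t x, int8_t y)
{
  int sum = x + y;
  if (sum > INT8_MAX)
    return INT8_MAX;
  if (sum < INT8_MIN)
    return INT8_MIN;
  return (int8_t) sum;
}

/* One name; the type of the first operand picks the semantics.  */
#define SAT_ADD(x, y)                  \
  _Generic ((x),                       \
            uint8_t: sat_add_u8,       \
            int8_t: sat_add_s8) ((x), (y))
```

With uint8_t operands 200 and 100 the macro yields 255, while with int8_t operands 100 and 100 it yields 127.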

Regards,
Tamar

> > Any other opinions?  Does anyone know more about fixed-point and RTL/modes?
> 
> AFAIK, the RISC-V backend has no fixed-point support for scalars, but the vector
> extension does, and it shares the same modes as vector integers, for example RVVM1SI
> in vector-iterators.md. Kito and Juzhe can correct me if I have misunderstood anything.
> 
> Pan
> 
> -----Original Message-----
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: Monday, February 19, 2024 3:36 PM
> To: Li, Pan2 <pan2.li@intel.com>
> Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang
> <yanzhang.wang@intel.com>; kito.cheng@gmail.com; Tamar.Christina@arm.com
> Subject: Re: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU
> 
> On Sat, Feb 17, 2024 at 11:30 AM <pan2.li@intel.com> wrote:
> >
> > From: Pan Li <pan2.li@intel.com>
> >
> > This patch adds a middle-end representation for unsigned saturating
> > addition, i.e. the result of the add is set to the maximum value on
> > overflow.  It matches a pattern similar to the one below.
> >
> > SAT_ADDU (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
> >
> > Take uint8_t as example, we will have:
> >
> > * SAT_ADDU (1, 254)   => 255.
> > * SAT_ADDU (1, 255)   => 255.
> > * SAT_ADDU (2, 255)   => 255.
> > * SAT_ADDU (255, 255) => 255.
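The uint8_t examples above can be checked exhaustively against a plain reference definition. The sketch below (hypothetical names) also confirms that comparing the wrapped sum against x or against y detects the same overflow, which is why the pattern can be matched in either form.

```c
#include <stdint.h>

/* Reference definition: clamp the exact sum to UINT8_MAX.  */
static inline uint8_t
sat_addu_ref (uint8_t x, uint8_t y)
{
  unsigned sum = (unsigned) x + y;
  return sum > UINT8_MAX ? UINT8_MAX : (uint8_t) sum;
}

/* The pattern from the description, instantiated for uint8_t.  */
static inline uint8_t
sat_addu_pattern (uint8_t x, uint8_t y)
{
  return (x + y) | (-(uint8_t) ((uint8_t) (x + y) < x));
}

/* Returns 1 if the pattern agrees with the reference for all 65536
   operand pairs and the "< x" and "< y" overflow tests always agree.  */
static int
sat_addu_pattern_matches_ref (void)
{
  for (unsigned x = 0; x <= UINT8_MAX; x++)
    for (unsigned y = 0; y <= UINT8_MAX; y++)
      {
        uint8_t sum = (uint8_t) (x + y);
        if (sat_addu_pattern (x, y) != sat_addu_ref (x, y))
          return 0;
        if ((sum < x) != (sum < y))
          return 0;
      }
  return 1;
}
```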
> >
> > The patch also implements SAT_ADDU in the RISC-V backend as an
> > example.  Consider the example below:
> >
> > uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> > {
> >   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> > }
> >
> > Before this patch:
> >
> > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > {
> >   long unsigned int _1;
> >   _Bool _2;
> >   long unsigned int _3;
> >   long unsigned int _4;
> >   uint64_t _7;
> >   long unsigned int _10;
> >   __complex__ long unsigned int _11;
> >
> > ;;   basic block 2, loop depth 0
> > ;;    pred:       ENTRY
> >   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
> >   _1 = REALPART_EXPR <_11>;
> >   _10 = IMAGPART_EXPR <_11>;
> >   _2 = _10 != 0;
> >   _3 = (long unsigned int) _2;
> >   _4 = -_3;
> >   _7 = _1 | _4;
> >   return _7;
> > ;;    succ:       EXIT
> >
> > }
> >
> > After this patch:
> >
> > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > {
> >   uint64_t _7;
> >
> > ;;   basic block 2, loop depth 0
> > ;;    pred:       ENTRY
> >   _7 = .SAT_ADDU (x_5(D), y_6(D)); [tail call]
> >   return _7;
> > ;;    succ:       EXIT
> >
> > }
> >
> > After this patch, the middle end represents the pattern as
> > .SAT_ADDU.
> 
> I'll note that on RTL we already have SS_PLUS/US_PLUS and friends and
> the corresponding ssadd/usadd optabs.  There's not much documentation
> unfortunately besides the use of gen_*_fixed_libfunc usage where the comment
> suggests this is used for fixed-point operations.  It looks like arm uses
> fractional/accumulator modes for this but for example bfin has ssaddsi3.
> 
> So the question is whether the fixed-point case can be distinguished from
> the integer case based on mode.
> 
> There's also FIXED_POINT_TYPE on the GENERIC/GIMPLE side and
> no special tree operator codes for them.  So compared to what appears
> to be the case on RTL we'd need a way to represent saturating integer
> operations on GIMPLE.
> 
> The natural thing is to use direct optab internal functions (that's what you
> basically did, but you added a new optab, IMO without good reason).
> More GIMPLE-like would be to let the types involved decide whether
> it's signed or unsigned saturation.  That's actually what I'd prefer here
> and if we don't map 1:1 to optabs then instead use tree codes like
> S_PLUS_EXPR (mimicking RTL here).
> 
> Any other opinions?  Does anyone know more about fixed-point and RTL/modes?
> 
> Richard.
> 
> >         PR target/51492
> >         PR target/112600
> >
> > gcc/ChangeLog:
> >
> >         * config/riscv/riscv-protos.h (riscv_expand_saturation_addu):
> >         New func decl for the SAT_ADDU expand.
> >         * config/riscv/riscv.cc (riscv_expand_saturation_addu): New func
> >         impl for the SAT_ADDU expand.
> >         * config/riscv/riscv.md (sat_addu_<mode>3): New pattern to impl
> >         the standard name SAT_ADDU.
> >         * doc/md.texi: Add doc for SAT_ADDU.
> >         * internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADDU.
> >         * internal-fn.def (SAT_ADDU): Add SAT_ADDU.
> >         * match.pd: Add simplify pattern patch for SAT_ADDU.
> >         * optabs.def (OPTAB_D): Add sat_addu_optab.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.target/riscv/sat_addu-1.c: New test.
> >         * gcc.target/riscv/sat_addu-2.c: New test.
> >         * gcc.target/riscv/sat_addu-3.c: New test.
> >         * gcc.target/riscv/sat_addu-4.c: New test.
> >         * gcc.target/riscv/sat_addu-run-1.c: New test.
> >         * gcc.target/riscv/sat_addu-run-2.c: New test.
> >         * gcc.target/riscv/sat_addu-run-3.c: New test.
> >         * gcc.target/riscv/sat_addu-run-4.c: New test.
> >         * gcc.target/riscv/sat_arith.h: New test.
> >
> > Signed-off-by: Pan Li <pan2.li@intel.com>
> > ---
> >  gcc/config/riscv/riscv-protos.h               |  1 +
> >  gcc/config/riscv/riscv.cc                     | 46 +++++++++++++++++
> >  gcc/config/riscv/riscv.md                     | 11 +++++
> >  gcc/doc/md.texi                               | 11 +++++
> >  gcc/internal-fn.cc                            |  1 +
> >  gcc/internal-fn.def                           |  1 +
> >  gcc/match.pd                                  | 22 +++++++++
> >  gcc/optabs.def                                |  2 +
> >  gcc/testsuite/gcc.target/riscv/sat_addu-1.c   | 18 +++++++
> >  gcc/testsuite/gcc.target/riscv/sat_addu-2.c   | 20 ++++++++
> >  gcc/testsuite/gcc.target/riscv/sat_addu-3.c   | 17 +++++++
> >  gcc/testsuite/gcc.target/riscv/sat_addu-4.c   | 16 ++++++
> >  .../gcc.target/riscv/sat_addu-run-1.c         | 42 ++++++++++++++++
> >  .../gcc.target/riscv/sat_addu-run-2.c         | 42 ++++++++++++++++
> >  .../gcc.target/riscv/sat_addu-run-3.c         | 42 ++++++++++++++++
> >  .../gcc.target/riscv/sat_addu-run-4.c         | 49 +++++++++++++++++++
> >  gcc/testsuite/gcc.target/riscv/sat_arith.h    | 15 ++++++
> >  17 files changed, 356 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-2.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-3.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-4.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-run-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-run-2.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-run-3.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-run-4.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_arith.h
> >
> > diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> > index ae1685850ac..f201b2384f9 100644
> > --- a/gcc/config/riscv/riscv-protos.h
> > +++ b/gcc/config/riscv/riscv-protos.h
> > @@ -132,6 +132,7 @@ extern void riscv_asm_output_external (FILE *, const tree, const char *);
> >  extern bool
> >  riscv_zcmp_valid_stack_adj_bytes_p (HOST_WIDE_INT, int);
> >  extern void riscv_legitimize_poly_move (machine_mode, rtx, rtx, rtx);
> > +extern void riscv_expand_saturation_addu (rtx, rtx, rtx);
> >
> >  #ifdef RTX_CODE
> >  extern void riscv_expand_int_scc (rtx, enum rtx_code, rtx, rtx, bool *invert_ptr = 0);
> > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> > index 799d7919a4a..84e86eb5d49 100644
> > --- a/gcc/config/riscv/riscv.cc
> > +++ b/gcc/config/riscv/riscv.cc
> > @@ -10657,6 +10657,52 @@ riscv_vector_mode_supported_any_target_p (machine_mode)
> >    return true;
> >  }
> >
> > +/* Emit insn for the saturation addu, aka (x + y) | - ((x + y) < x).  */
> > +void
> > +riscv_expand_saturation_addu (rtx dest, rtx x, rtx y)
> > +{
> > +  machine_mode mode = GET_MODE (dest);
> > +  rtx pmode_sum = gen_reg_rtx (Pmode);
> > +  rtx pmode_lt = gen_reg_rtx (Pmode);
> > +  rtx pmode_x = gen_lowpart (Pmode, x);
> > +  rtx pmode_y = gen_lowpart (Pmode, y);
> > +  rtx pmode_dest = gen_reg_rtx (Pmode);
> > +
> > +  /* Step-1: sum = x + y  */
> > +  if (mode == SImode && mode != Pmode)
> > +    { /* Take addw to avoid the sum truncate.  */
> > +      rtx simode_sum = gen_reg_rtx (SImode);
> > +      riscv_emit_binary (PLUS, simode_sum, x, y);
> > +      emit_move_insn (pmode_sum, gen_lowpart (Pmode, simode_sum));
> > +    }
> > +  else
> > +    riscv_emit_binary (PLUS, pmode_sum, pmode_x, pmode_y);
> > +
> > +  /* Step-1.1: truncate sum for HI and QI as we have no insn for add QI/HI.  */
> > +  if (mode == HImode || mode == QImode)
> > +    {
> > +      int shift_bits = GET_MODE_BITSIZE (Pmode)
> > +       - GET_MODE_BITSIZE (mode).to_constant ();
> > +
> > +      gcc_assert (shift_bits > 0);
> > +
> > +      riscv_emit_binary (ASHIFT, pmode_sum, pmode_sum, GEN_INT (shift_bits));
> > +      riscv_emit_binary (LSHIFTRT, pmode_sum, pmode_sum, GEN_INT (shift_bits));
> > +    }
> > +
> > +  /* Step-2: lt = sum < x  */
> > +  riscv_emit_binary (LTU, pmode_lt, pmode_sum, pmode_x);
> > +
> > +  /* Step-3: lt = -lt  */
> > +  riscv_emit_unary (NEG, pmode_lt, pmode_lt);
> > +
> > +  /* Step-4: pmode_dest = sum | lt  */
> > +  riscv_emit_binary (IOR, pmode_dest, pmode_lt, pmode_sum);
> > +
> > +  /* Step-5: dest = pmode_dest */
> > +  emit_move_insn (dest, gen_lowpart (mode, pmode_dest));
> > +}
> > +
> >  /* Initialize the GCC target structure.  */
> >  #undef TARGET_ASM_ALIGNED_HI_OP
> >  #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
> > diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> > index 39b29795cd6..03cbe5a2ca9 100644
> > --- a/gcc/config/riscv/riscv.md
> > +++ b/gcc/config/riscv/riscv.md
> > @@ -3841,6 +3841,17 @@ (define_insn "*large_load_address"
> >    [(set_attr "type" "load")
> >     (set (attr "length") (const_int 8))])
> >
> > +(define_expand "sat_addu_<mode>3"
> > +  [(match_operand:ANYI   0 "register_operand")
> > +   (match_operand:ANYI   1 "register_operand")
> > +   (match_operand:ANYI   2 "register_operand")]
> > +  ""
> > +  {
> > +    riscv_expand_saturation_addu (operands[0], operands[1], operands[2]);
> > +    DONE;
> > +  }
> > +)
> > +
> >  (include "bitmanip.md")
> >  (include "crypto.md")
> >  (include "sync.md")
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > index b0c61925120..5867afdb1a0 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -6653,6 +6653,17 @@ The operation is only supported for vector modes @var{m}.
> >
> >  This pattern is not allowed to @code{FAIL}.
> >
> > +@cindex @code{sat_addu_@var{m}3} instruction pattern
> > +@item @samp{sat_addu_@var{m}3}
> > +Perform the saturation unsigned add for the operand 1 and operand 2 and
> > +store the result into the operand 0.  All operands have mode @var{m},
> > +which is a scalar integer mode.
> > +
> > +@smallexample
> > +  typedef unsigned char uint8_t;
> > +  uint8_t sat_addu (uint8_t x, uint8_t y) => return (x + y) | -((x + y) < x);
> > +@end smallexample
> > +
> >  @cindex @code{cmla@var{m}4} instruction pattern
> >  @item @samp{cmla@var{m}4}
> >  Perform a vector multiply and accumulate that is semantically the same as
> > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> > index a07f25f3aee..dee73dbc614 100644
> > --- a/gcc/internal-fn.cc
> > +++ b/gcc/internal-fn.cc
> > @@ -4159,6 +4159,7 @@ commutative_binary_fn_p (internal_fn fn)
> >      case IFN_VEC_WIDEN_PLUS_HI:
> >      case IFN_VEC_WIDEN_PLUS_EVEN:
> >      case IFN_VEC_WIDEN_PLUS_ODD:
> > +    case IFN_SAT_ADDU:
> >        return true;
> >
> >      default:
> > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > index c14d30365c1..a04592fc779 100644
> > --- a/gcc/internal-fn.def
> > +++ b/gcc/internal-fn.def
> > @@ -428,6 +428,7 @@ DEF_INTERNAL_WIDENING_OPTAB_FN (VEC_WIDEN_ABD,
> >                                 binary)
> >  DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary)
> >  DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary)
> > +DEF_INTERNAL_OPTAB_FN (SAT_ADDU, ECF_CONST | ECF_NOTHROW, sat_addu, binary)
> >
> >  /* FP scales.  */
> >  DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 711c3a10c3f..9de1106adcf 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -1994,6 +1994,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >   )
> >  )
> >
> > +#if GIMPLE
> > +
> > +/* Saturation add unsigned, aka:
> > +   SAT_ADDU = (X + Y) | - ((X + Y) < X) or
> > +   SAT_ADDU = (X + Y) | - ((X + Y) < Y).  */
> > +(simplify
> > + (bit_ior:c (plus:c@2 @0 @1) (negate (convert (lt @2 @0))))
> > +   (if (optimize
> > +       && INTEGRAL_TYPE_P (type)
> > +       && TYPE_UNSIGNED (TREE_TYPE (@0))
> > +       && types_match (type, TREE_TYPE (@0))
> > +       && types_match (type, TREE_TYPE (@1))
> > +       && direct_internal_fn_supported_p (IFN_SAT_ADDU, type, OPTIMIZE_FOR_BOTH))
> > +   (IFN_SAT_ADDU @0 @1)))
> > +
> > +/* SAT_ADDU (X, 0) = X  */
> > +(simplify
> > + (IFN_SAT_ADDU:c @0 integer_zerop)
> > + @0)
> > +
> > +#endif
> > +
> >  /* A few cases of fold-const.cc negate_expr_p predicate.  */
> >  (match negate_expr_p
> >   INTEGER_CST
> > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > index ad14f9328b9..a2c11b7707b 100644
> > --- a/gcc/optabs.def
> > +++ b/gcc/optabs.def
> > @@ -300,6 +300,8 @@ OPTAB_D (usubc5_optab, "usubc$I$a5")
> >  OPTAB_D (addptr3_optab, "addptr$a3")
> >  OPTAB_D (spaceship_optab, "spaceship$a3")
> >
> > +OPTAB_D (sat_addu_optab, "sat_addu_$a3")
> > +
> >  OPTAB_D (smul_highpart_optab, "smul$a3_highpart")
> >  OPTAB_D (umul_highpart_optab, "umul$a3_highpart")
> >
> > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-1.c b/gcc/testsuite/gcc.target/riscv/sat_addu-1.c
> > new file mode 100644
> > index 00000000000..229abef0faa
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-1.c
> > @@ -0,0 +1,18 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
> > +/* { dg-final { check-function-bodies "**" "" } } */
> > +
> > +#include "sat_arith.h"
> > +
> > +/*
> > +** sat_addu_uint8_t:
> > +** add\s+[atx][0-9]+,\s*a0,\s*a1
> > +** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*0xff
> > +** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > +** neg\s+[atx][0-9]+,\s*[atx][0-9]+
> > +** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > +** andi\s+a0,\s*a0,\s*0xff
> > +** ret
> > +*/
> > +DEF_SAT_ADDU(uint8_t)
> > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-2.c b/gcc/testsuite/gcc.target/riscv/sat_addu-2.c
> > new file mode 100644
> > index 00000000000..4023b030811
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-2.c
> > @@ -0,0 +1,20 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
> > +/* { dg-final { check-function-bodies "**" "" } } */
> > +
> > +#include "sat_arith.h"
> > +
> > +/*
> > +** sat_addu_uint16_t:
> > +** add\s+[atx][0-9]+,\s*a0,\s*a1
> > +** slli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
> > +** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
> > +** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > +** neg\s+[atx][0-9]+,\s*[atx][0-9]+
> > +** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > +** slli\s+a0,\s*a0,\s*48
> > +** srli\s+a0,\s*a0,\s*48
> > +** ret
> > +*/
> > +DEF_SAT_ADDU(uint16_t)
> > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-3.c b/gcc/testsuite/gcc.target/riscv/sat_addu-3.c
> > new file mode 100644
> > index 00000000000..4d0af97fb67
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-3.c
> > @@ -0,0 +1,17 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
> > +/* { dg-final { check-function-bodies "**" "" } } */
> > +
> > +#include "sat_arith.h"
> > +
> > +/*
> > +** sat_addu_uint32_t:
> > +** addw\s+[atx][0-9]+,\s*a0,\s*a1
> > +** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > +** neg\s+[atx][0-9]+,\s*[atx][0-9]+
> > +** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > +** sext.w\s+a0,\s*a0
> > +** ret
> > +*/
> > +DEF_SAT_ADDU(uint32_t)
> > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-4.c b/gcc/testsuite/gcc.target/riscv/sat_addu-4.c
> > new file mode 100644
> > index 00000000000..926f31266e3
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-4.c
> > @@ -0,0 +1,16 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
> > +/* { dg-final { check-function-bodies "**" "" } } */
> > +
> > +#include "sat_arith.h"
> > +
> > +/*
> > +** sat_addu_uint64_t:
> > +** add\s+[atx][0-9]+,\s*a0,\s*a1
> > +** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > +** neg\s+[atx][0-9]+,\s*[atx][0-9]+
> > +** or\s+a0,\s*[atx][0-9]+,\s*[atx][0-9]+
> > +** ret
> > +*/
> > +DEF_SAT_ADDU(uint64_t)
> > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-run-1.c b/gcc/testsuite/gcc.target/riscv/sat_addu-run-1.c
> > new file mode 100644
> > index 00000000000..b19515c39d1
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-run-1.c
> > @@ -0,0 +1,42 @@
> > +/* { dg-do run { target { riscv_v } } } */
> > +/* { dg-additional-options "-std=c99" } */
> > +
> > +#include "sat_arith.h"
> > +
> > +DEF_SAT_ADDU(uint8_t)
> > +
> > +int
> > +main ()
> > +{
> > +  if (RUN_SAT_ADDU (uint8_t, 0, 0) != 0)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint8_t, 0, 1) != 1)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint8_t, 1, 1) != 2)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint8_t, 0, 254) != 254)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint8_t, 1, 254) != 255)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint8_t, 2, 254) != 255)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint8_t, 0, 255) != 255)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint8_t, 1, 255) != 255)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint8_t, 2, 255) != 255)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint8_t, 255, 255) != 255)
> > +    __builtin_abort ();
> > +
> > +  return 0;
> > +}
> > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-run-2.c b/gcc/testsuite/gcc.target/riscv/sat_addu-run-2.c
> > new file mode 100644
> > index 00000000000..90073fbe4ba
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-run-2.c
> > @@ -0,0 +1,42 @@
> > +/* { dg-do run { target { riscv_v } } } */
> > +/* { dg-additional-options "-std=c99" } */
> > +
> > +#include "sat_arith.h"
> > +
> > +DEF_SAT_ADDU(uint16_t)
> > +
> > +int
> > +main ()
> > +{
> > +  if (RUN_SAT_ADDU (uint16_t, 0, 0) != 0)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint16_t, 0, 1) != 1)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint16_t, 1, 1) != 2)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint16_t, 0, 65534) != 65534)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint16_t, 1, 65534) != 65535)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint16_t, 2, 65534) != 65535)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint16_t, 0, 65535) != 65535)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint16_t, 1, 65535) != 65535)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint16_t, 2, 65535) != 65535)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint16_t, 65535, 65535) != 65535)
> > +    __builtin_abort ();
> > +
> > +  return 0;
> > +}
> > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-run-3.c b/gcc/testsuite/gcc.target/riscv/sat_addu-run-3.c
> > new file mode 100644
> > index 00000000000..996dd3de737
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-run-3.c
> > @@ -0,0 +1,42 @@
> > +/* { dg-do run { target { riscv_v } } } */
> > +/* { dg-additional-options "-std=c99" } */
> > +
> > +#include "sat_arith.h"
> > +
> > +DEF_SAT_ADDU(uint32_t)
> > +
> > +int
> > +main ()
> > +{
> > +  if (RUN_SAT_ADDU (uint32_t, 0, 0) != 0)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint32_t, 0, 1) != 1)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint32_t, 1, 1) != 2)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint32_t, 0, 4294967294) != 4294967294)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint32_t, 1, 4294967294) != 4294967295)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint32_t, 2, 4294967294) != 4294967295)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint32_t, 0, 4294967295) != 4294967295)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint32_t, 1, 4294967295) != 4294967295)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint32_t, 2, 4294967295) != 4294967295)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint32_t, 4294967295, 4294967295) != 4294967295)
> > +    __builtin_abort ();
> > +
> > +  return 0;
> > +}
> > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-run-4.c b/gcc/testsuite/gcc.target/riscv/sat_addu-run-4.c
> > new file mode 100644
> > index 00000000000..51a5421577b
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-run-4.c
> > @@ -0,0 +1,49 @@
> > +/* { dg-do run { target { riscv_v } } } */
> > +/* { dg-additional-options "-std=c99" } */
> > +
> > +#include "sat_arith.h"
> > +
> > +DEF_SAT_ADDU(uint64_t)
> > +
> > +int
> > +main ()
> > +{
> > +  if (RUN_SAT_ADDU (uint64_t, 0, 0) != 0)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint64_t, 0, 1) != 1)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint64_t, 1, 1) != 2)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint64_t, 0, 18446744073709551614u)
> > +    != 18446744073709551614u)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint64_t, 1, 18446744073709551614u)
> > +    != 18446744073709551615u)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint64_t, 2, 18446744073709551614u)
> > +    != 18446744073709551615u)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint64_t, 0, 18446744073709551615u)
> > +    != 18446744073709551615u)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint64_t, 1, 18446744073709551615u)
> > +    != 18446744073709551615u)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint64_t, 2, 18446744073709551615u)
> > +    != 18446744073709551615u)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint64_t, 18446744073709551615u, 18446744073709551615u)
> > +    != 18446744073709551615u)
> > +    __builtin_abort ();
> > +
> > +  return 0;
> > +}
> > diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h b/gcc/testsuite/gcc.target/riscv/sat_arith.h
> > new file mode 100644
> > index 00000000000..4c00157685e
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
> > @@ -0,0 +1,15 @@
> > +#ifndef HAVE_SAT_ARITH
> > +#define HAVE_SAT_ARITH
> > +
> > +#include <stdint.h>
> > +
> > +#define DEF_SAT_ADDU(TYPE)                       \
> > +TYPE __attribute__((noinline))                   \
> > +sat_addu_##TYPE (TYPE x, TYPE y)                 \
> > +{                                                \
> > +  return (x + y) | (-(TYPE)((TYPE)(x + y) < x)); \
> > +}
> > +
> > +#define RUN_SAT_ADDU(TYPE, x, y) sat_addu_##TYPE(x, y)
> > +
> > +#endif
> > --
> > 2.34.1
> >
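For readers following the RISC-V expander in the quoted patch, the sketch below models in C the sequence riscv_expand_saturation_addu emits for HImode on RV64 (add, slli/srli by 48, sltu, neg, or, then the lowpart). It assumes both operands arrive zero-extended in 64-bit registers; the function name is hypothetical and this is not backend code.

```c
#include <stdint.h>

/* Hypothetical C model of the RV64 HImode expansion, assuming the
   16-bit inputs are zero-extended into 64-bit registers.  */
static uint16_t
model_sat_addu_hi_rv64 (uint16_t x16, uint16_t y16)
{
  uint64_t x = x16, y = y16;           /* zero-extended operands */
  const unsigned shift = 64 - 16;      /* Pmode bits minus HImode bits */

  uint64_t sum = x + y;                /* Step-1: add */
  sum = (sum << shift) >> shift;       /* Step-1.1: slli/srli keep 16 bits */
  uint64_t lt = sum < x;               /* Step-2: sltu, 0 or 1 */
  lt = -lt;                            /* Step-3: neg, 0 or all ones */
  uint64_t dest = sum | lt;            /* Step-4: or */
  return (uint16_t) dest;              /* Step-5: take the HImode lowpart */
}
```

For example, 2 + 65534 wraps to 0 after the shift pair, 0 < 2 sets lt, and the OR with all ones yields 65535.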

Li, Pan2 Feb. 19, 2024, 12:59 p.m. UTC | #6

Thanks, Tamar, for the comments and explanations.

> I think we should actually do an indirect optab here, because the IFN can be used
> to replace the general representation of saturating arithmetic.

> e.g. the __builtin_add_overflow case in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600
> is inefficient on all targets and so the IFN can always expand to something that's more
> efficient like the branchless version add_sat2. 

> I think this is why you suggested a new tree code below, but we don't really need
> tree-codes for this. It can be done cleaner using the same way as DEF_INTERNAL_INT_EXT_FN

Yes, the backend could choose branchless code generation (we always want to avoid branches for performance) or, even better, a single saturation insn if one exists.
Good to learn about DEF_INTERNAL_INT_EXT_FN; I will give it a try.

> Richard means that there shouldn't be .SAT_ADDU and .SAT_ADDS  and that the sign
> should be determined by the types at expansion time.  i.e. there should only be
> .SAT_ADD.

Got it. My initial idea came from the fact that we may have two insns for saturating add, since such insns usually come in signed and unsigned variants, for example slt/sltu in the RISC-V scalar ISA. But I am not clear about the following scenario: during define_expand in the backend we hit the standard name
sat_add_<m>3, but can we tell whether it is signed or not there? AFAIK, we only have the modes QI, HI, SI and DI.
Maybe I will find the answer after trying DEF_INTERNAL_SIGNED_OPTAB_FN; I will keep you posted.

Pan

-----Original Message-----
From: Tamar Christina <Tamar.Christina@arm.com> 
Sent: Monday, February 19, 2024 4:55 PM
To: Li, Pan2 <pan2.li@intel.com>; Richard Biener <richard.guenther@gmail.com>
Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com
Subject: RE: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU

Thanks for doing this!

> -----Original Message-----
> From: Li, Pan2 <pan2.li@intel.com>
> Sent: Monday, February 19, 2024 8:42 AM
> To: Richard Biener <richard.guenther@gmail.com>
> Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang
> <yanzhang.wang@intel.com>; kito.cheng@gmail.com; Tamar Christina
> <Tamar.Christina@arm.com>
> Subject: RE: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU
> 
> Thanks Richard for comments.
> 
> > I'll note that on RTL we already have SS_PLUS/US_PLUS and friends and
> > the corresponding ssadd/usadd optabs.  There's not much documentation
> > unfortunately besides the use of gen_*_fixed_libfunc usage where the comment
> > suggests this is used for fixed-point operations.  It looks like arm uses
> > fractional/accumulator modes for this but for example bfin has ssaddsi3.
> 
> I found the description of the plus family in the GCC internals doc, but it doesn't
> mention anything specific about the mode m here.
> 
> (plus:m x y)
> (ss_plus:m x y)
> (us_plus:m x y)
> These three expressions all represent the sum of the values represented by x
> and y carried out in machine mode m. They differ in their behavior on overflow
> of integer modes. plus wraps round modulo the width of m; ss_plus saturates
> at the maximum signed value representable in m; us_plus saturates at the
> maximum unsigned value.
> 
> > The natural thing is to use direct optab internal functions (that's what you
> > basically did, but you added a new optab, IMO without good reason).

I think we should actually do an indirect optab here, because the IFN can be used
to replace the general representation of saturating arithmetic.

e.g. the __builtin_add_overflow case in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600
is inefficient on all targets and so the IFN can always expand to something that's more
efficient like the branchless version add_sat2. 

I think this is why you suggested a new tree code below, but we don't really need
tree-codes for this. It can be done cleaner using the same way as DEF_INTERNAL_INT_EXT_FN.

> 
> That makes sense to me; I will try to leverage US_PLUS here instead.
> 
> > More GIMPLE-like would be to let the types involved decide whether
> > it's signed or unsigned saturation.  That's actually what I'd prefer here
> > and if we don't map 1:1 to optabs then instead use tree codes like
> > S_PLUS_EXPR (mimicking RTL here).
> 
> Sorry, I don't quite get the point about the GIMPLE-like way. For .SAT_ADDU, I added a
> restriction like unsigned_p (type) in match.pd. It looks like there is a better way here.
> 

Richard means that there shouldn't be .SAT_ADDU and .SAT_ADDS  and that the sign
should be determined by the types at expansion time.  i.e. there should only be
.SAT_ADD. 

i.e. instead of this

+DEF_INTERNAL_OPTAB_FN (SAT_ADDU, ECF_CONST | ECF_NOTHROW, sat_addu, binary)

You should use DEF_INTERNAL_SIGNED_OPTAB_FN.

Regards,
Tamar

> > Any other opinions?  Does anyone know more about fixed-point and RTL/modes?
> 
> AFAIK, the RISC-V scalar ISA doesn't have fixed-point support but the vector extension
> does. Those operations share the same modes as vector integers, for example RVVM1SI in
> vector-iterators.md. Kito and Juzhe can correct me if I have misunderstood anything.
> 
> Pan
> 
> -----Original Message-----
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: Monday, February 19, 2024 3:36 PM
> To: Li, Pan2 <pan2.li@intel.com>
> Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang
> <yanzhang.wang@intel.com>; kito.cheng@gmail.com; Tamar.Christina@arm.com
> Subject: Re: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU
> 
> On Sat, Feb 17, 2024 at 11:30 AM <pan2.li@intel.com> wrote:
> >
> > From: Pan Li <pan2.li@intel.com>
> >
> > This patch adds a middle-end representation for the unsigned
> > saturating add, i.e. the result of the add is set to the maximum
> > value on overflow.  It matches a pattern similar to the one below.
> >
> > SAT_ADDU (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
> >
> > Take uint8_t as example, we will have:
> >
> > * SAT_ADDU (1, 254)   => 255.
> > * SAT_ADDU (1, 255)   => 255.
> > * SAT_ADDU (2, 255)   => 255.
> > * SAT_ADDU (255, 255) => 255.
> >
> > The patch also implements SAT_ADDU in the RISC-V backend as a
> > sample.  Given the example below:
> >
> > uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> > {
> >   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> > }
> >
> > Before this patch:
> >
> > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > {
> >   long unsigned int _1;
> >   _Bool _2;
> >   long unsigned int _3;
> >   long unsigned int _4;
> >   uint64_t _7;
> >   long unsigned int _10;
> >   __complex__ long unsigned int _11;
> >
> > ;;   basic block 2, loop depth 0
> > ;;    pred:       ENTRY
> >   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
> >   _1 = REALPART_EXPR <_11>;
> >   _10 = IMAGPART_EXPR <_11>;
> >   _2 = _10 != 0;
> >   _3 = (long unsigned int) _2;
> >   _4 = -_3;
> >   _7 = _1 | _4;
> >   return _7;
> > ;;    succ:       EXIT
> >
> > }
> >
> > After this patch:
> >
> > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > {
> >   uint64_t _7;
> >
> > ;;   basic block 2, loop depth 0
> > ;;    pred:       ENTRY
> >   _7 = .SAT_ADDU (x_5(D), y_6(D)); [tail call]
> >   return _7;
> > ;;    succ:       EXIT
> >
> > }
> >
> > With this patch, the middle end gets the .SAT_ADDU representation
> > shown above.
> 
> I'll note that on RTL we already have SS_PLUS/US_PLUS and friends and
> the corresponding ssadd/usadd optabs.  There's not much documentation
> unfortunately besides the use of gen_*_fixed_libfunc usage where the comment
> suggests this is used for fixed-point operations.  It looks like arm uses
> fractional/accumulator modes for this but for example bfin has ssaddsi3.
> 
> So the question is whether the fixed-point case can be distinguished from
> the integer case based on mode.
> 
> There's also FIXED_POINT_TYPE on the GENERIC/GIMPLE side and
> no special tree operator codes for them.  So compared to what appears
> to be the case on RTL we'd need a way to represent saturating integer
> operations on GIMPLE.
> 
> The natural thing is to use direct optab internal functions (that's what you
> basically did, but you added a new optab, IMO without good reason).
> More GIMPLE-like would be to let the types involved decide whether
> it's signed or unsigned saturation.  That's actually what I'd prefer here
> and if we don't map 1:1 to optabs then instead use tree codes like
> S_PLUS_EXPR (mimicking RTL here).
> 
> Any other opinions?  Does anyone know more about fixed-point and RTL/modes?
> 
> Richard.
> 
> >         PR target/51492
> >         PR target/112600
> >
> > gcc/ChangeLog:
> >
> >         * config/riscv/riscv-protos.h (riscv_expand_saturation_addu):
> >         New func decl for the SAT_ADDU expand.
> >         * config/riscv/riscv.cc (riscv_expand_saturation_addu): New func
> >         impl for the SAT_ADDU expand.
> >         * config/riscv/riscv.md (sat_addu_<mode>3): New pattern to impl
> >         the standard name SAT_ADDU.
> >         * doc/md.texi: Add doc for SAT_ADDU.
> >         * internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADDU.
> >         * internal-fn.def (SAT_ADDU): Add SAT_ADDU.
> >         * match.pd: Add simplify pattern patch for SAT_ADDU.
> >         * optabs.def (OPTAB_D): Add sat_addu_optab.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.target/riscv/sat_addu-1.c: New test.
> >         * gcc.target/riscv/sat_addu-2.c: New test.
> >         * gcc.target/riscv/sat_addu-3.c: New test.
> >         * gcc.target/riscv/sat_addu-4.c: New test.
> >         * gcc.target/riscv/sat_addu-run-1.c: New test.
> >         * gcc.target/riscv/sat_addu-run-2.c: New test.
> >         * gcc.target/riscv/sat_addu-run-3.c: New test.
> >         * gcc.target/riscv/sat_addu-run-4.c: New test.
> >         * gcc.target/riscv/sat_arith.h: New test.
> >
> > Signed-off-by: Pan Li <pan2.li@intel.com>
> > ---
> >  gcc/config/riscv/riscv-protos.h               |  1 +
> >  gcc/config/riscv/riscv.cc                     | 46 +++++++++++++++++
> >  gcc/config/riscv/riscv.md                     | 11 +++++
> >  gcc/doc/md.texi                               | 11 +++++
> >  gcc/internal-fn.cc                            |  1 +
> >  gcc/internal-fn.def                           |  1 +
> >  gcc/match.pd                                  | 22 +++++++++
> >  gcc/optabs.def                                |  2 +
> >  gcc/testsuite/gcc.target/riscv/sat_addu-1.c   | 18 +++++++
> >  gcc/testsuite/gcc.target/riscv/sat_addu-2.c   | 20 ++++++++
> >  gcc/testsuite/gcc.target/riscv/sat_addu-3.c   | 17 +++++++
> >  gcc/testsuite/gcc.target/riscv/sat_addu-4.c   | 16 ++++++
> >  .../gcc.target/riscv/sat_addu-run-1.c         | 42 ++++++++++++++++
> >  .../gcc.target/riscv/sat_addu-run-2.c         | 42 ++++++++++++++++
> >  .../gcc.target/riscv/sat_addu-run-3.c         | 42 ++++++++++++++++
> >  .../gcc.target/riscv/sat_addu-run-4.c         | 49 +++++++++++++++++++
> >  gcc/testsuite/gcc.target/riscv/sat_arith.h    | 15 ++++++
> >  17 files changed, 356 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-2.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-3.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-4.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-run-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-run-2.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-run-3.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-run-4.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_arith.h
> >
> > diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> > index ae1685850ac..f201b2384f9 100644
> > --- a/gcc/config/riscv/riscv-protos.h
> > +++ b/gcc/config/riscv/riscv-protos.h
> > @@ -132,6 +132,7 @@ extern void riscv_asm_output_external (FILE *, const tree, const char *);
> >  extern bool
> >  riscv_zcmp_valid_stack_adj_bytes_p (HOST_WIDE_INT, int);
> >  extern void riscv_legitimize_poly_move (machine_mode, rtx, rtx, rtx);
> > +extern void riscv_expand_saturation_addu (rtx, rtx, rtx);
> >
> >  #ifdef RTX_CODE
> >  extern void riscv_expand_int_scc (rtx, enum rtx_code, rtx, rtx, bool *invert_ptr = 0);
> > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> > index 799d7919a4a..84e86eb5d49 100644
> > --- a/gcc/config/riscv/riscv.cc
> > +++ b/gcc/config/riscv/riscv.cc
> > @@ -10657,6 +10657,52 @@ riscv_vector_mode_supported_any_target_p (machine_mode)
> >    return true;
> >  }
> >
> > +/* Emit insn for the saturation addu, aka (x + y) | - ((x + y) < x).  */
> > +void
> > +riscv_expand_saturation_addu (rtx dest, rtx x, rtx y)
> > +{
> > +  machine_mode mode = GET_MODE (dest);
> > +  rtx pmode_sum = gen_reg_rtx (Pmode);
> > +  rtx pmode_lt = gen_reg_rtx (Pmode);
> > +  rtx pmode_x = gen_lowpart (Pmode, x);
> > +  rtx pmode_y = gen_lowpart (Pmode, y);
> > +  rtx pmode_dest = gen_reg_rtx (Pmode);
> > +
> > +  /* Step-1: sum = x + y  */
> > +  if (mode == SImode && mode != Pmode)
> > +    { /* Take addw to avoid the sum truncate.  */
> > +      rtx simode_sum = gen_reg_rtx (SImode);
> > +      riscv_emit_binary (PLUS, simode_sum, x, y);
> > +      emit_move_insn (pmode_sum, gen_lowpart (Pmode, simode_sum));
> > +    }
> > +  else
> > +    riscv_emit_binary (PLUS, pmode_sum, pmode_x, pmode_y);
> > +
> > +  /* Step-1.1: truncate sum for HI and QI as we have no insn for add QI/HI.  */
> > +  if (mode == HImode || mode == QImode)
> > +    {
> > +      int shift_bits = GET_MODE_BITSIZE (Pmode)
> > +       - GET_MODE_BITSIZE (mode).to_constant ();
> > +
> > +      gcc_assert (shift_bits > 0);
> > +
> > +      riscv_emit_binary (ASHIFT, pmode_sum, pmode_sum, GEN_INT (shift_bits));
> > +      riscv_emit_binary (LSHIFTRT, pmode_sum, pmode_sum, GEN_INT (shift_bits));
> > +    }
> > +
> > +  /* Step-2: lt = sum < x  */
> > +  riscv_emit_binary (LTU, pmode_lt, pmode_sum, pmode_x);
> > +
> > +  /* Step-3: lt = -lt  */
> > +  riscv_emit_unary (NEG, pmode_lt, pmode_lt);
> > +
> > +  /* Step-4: pmode_dest = sum | lt  */
> > +  riscv_emit_binary (IOR, pmode_dest, pmode_lt, pmode_sum);
> > +
> > +  /* Step-5: dest = pmode_dest */
> > +  emit_move_insn (dest, gen_lowpart (mode, pmode_dest));
> > +}
> > +
> >  /* Initialize the GCC target structure.  */
> >  #undef TARGET_ASM_ALIGNED_HI_OP
> >  #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
> > diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> > index 39b29795cd6..03cbe5a2ca9 100644
> > --- a/gcc/config/riscv/riscv.md
> > +++ b/gcc/config/riscv/riscv.md
> > @@ -3841,6 +3841,17 @@ (define_insn "*large_load_address"
> >    [(set_attr "type" "load")
> >     (set (attr "length") (const_int 8))])
> >
> > +(define_expand "sat_addu_<mode>3"
> > +  [(match_operand:ANYI   0 "register_operand")
> > +   (match_operand:ANYI   1 "register_operand")
> > +   (match_operand:ANYI   2 "register_operand")]
> > +  ""
> > +  {
> > +    riscv_expand_saturation_addu (operands[0], operands[1], operands[2]);
> > +    DONE;
> > +  }
> > +)
> > +
> >  (include "bitmanip.md")
> >  (include "crypto.md")
> >  (include "sync.md")
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > index b0c61925120..5867afdb1a0 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -6653,6 +6653,17 @@ The operation is only supported for vector modes @var{m}.
> >
> >  This pattern is not allowed to @code{FAIL}.
> >
> > +@cindex @code{sat_addu_@var{m}3} instruction pattern
> > +@item @samp{sat_addu_@var{m}3}
> > +Perform the saturation unsigned add for the operand 1 and operand 2 and
> > +store the result into the operand 0.  All operands have mode @var{m},
> > +which is a scalar integer mode.
> > +
> > +@smallexample
> > +  typedef unsigned char uint8_t;
> > +  uint8_t sat_addu (uint8_t x, uint8_t y) => return (x + y) | -((x + y) < x);
> > +@end smallexample
> > +
> >  @cindex @code{cmla@var{m}4} instruction pattern
> >  @item @samp{cmla@var{m}4}
> >  Perform a vector multiply and accumulate that is semantically the same as
> > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> > index a07f25f3aee..dee73dbc614 100644
> > --- a/gcc/internal-fn.cc
> > +++ b/gcc/internal-fn.cc
> > @@ -4159,6 +4159,7 @@ commutative_binary_fn_p (internal_fn fn)
> >      case IFN_VEC_WIDEN_PLUS_HI:
> >      case IFN_VEC_WIDEN_PLUS_EVEN:
> >      case IFN_VEC_WIDEN_PLUS_ODD:
> > +    case IFN_SAT_ADDU:
> >        return true;
> >
> >      default:
> > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > index c14d30365c1..a04592fc779 100644
> > --- a/gcc/internal-fn.def
> > +++ b/gcc/internal-fn.def
> > @@ -428,6 +428,7 @@ DEF_INTERNAL_WIDENING_OPTAB_FN (VEC_WIDEN_ABD,
> >                                 binary)
> >  DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary)
> >  DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary)
> > +DEF_INTERNAL_OPTAB_FN (SAT_ADDU, ECF_CONST | ECF_NOTHROW, sat_addu, binary)
> >
> >  /* FP scales.  */
> >  DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 711c3a10c3f..9de1106adcf 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -1994,6 +1994,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >   )
> >  )
> >
> > +#if GIMPLE
> > +
> > +/* Saturation add unsigned, aka:
> > +   SAT_ADDU = (X + Y) | - ((X + Y) < X) or
> > +   SAT_ADDU = (X + Y) | - ((X + Y) < Y).  */
> > +(simplify
> > + (bit_ior:c (plus:c@2 @0 @1) (negate (convert (lt @2 @0))))
> > +   (if (optimize
> > +       && INTEGRAL_TYPE_P (type)
> > +       && TYPE_UNSIGNED (TREE_TYPE (@0))
> > +       && types_match (type, TREE_TYPE (@0))
> > +       && types_match (type, TREE_TYPE (@1))
> > +       && direct_internal_fn_supported_p (IFN_SAT_ADDU, type, OPTIMIZE_FOR_BOTH))
> > +   (IFN_SAT_ADDU @0 @1)))
> > +
> > +/* SAT_ADDU (X, 0) = X  */
> > +(simplify
> > + (IFN_SAT_ADDU:c @0 integer_zerop)
> > + @0)
> > +
> > +#endif
> > +
> >  /* A few cases of fold-const.cc negate_expr_p predicate.  */
> >  (match negate_expr_p
> >   INTEGER_CST
> > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > index ad14f9328b9..a2c11b7707b 100644
> > --- a/gcc/optabs.def
> > +++ b/gcc/optabs.def
> > @@ -300,6 +300,8 @@ OPTAB_D (usubc5_optab, "usubc$I$a5")
> >  OPTAB_D (addptr3_optab, "addptr$a3")
> >  OPTAB_D (spaceship_optab, "spaceship$a3")
> >
> > +OPTAB_D (sat_addu_optab, "sat_addu_$a3")
> > +
> >  OPTAB_D (smul_highpart_optab, "smul$a3_highpart")
> >  OPTAB_D (umul_highpart_optab, "umul$a3_highpart")
> >
> > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-1.c b/gcc/testsuite/gcc.target/riscv/sat_addu-1.c
> > new file mode 100644
> > index 00000000000..229abef0faa
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-1.c
> > @@ -0,0 +1,18 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
> > +/* { dg-final { check-function-bodies "**" "" } } */
> > +
> > +#include "sat_arith.h"
> > +
> > +/*
> > +** sat_addu_uint8_t:
> > +** add\s+[atx][0-9]+,\s*a0,\s*a1
> > +** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*0xff
> > +** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > +** neg\s+[atx][0-9]+,\s*[atx][0-9]+
> > +** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > +** andi\s+a0,\s*a0,\s*0xff
> > +** ret
> > +*/
> > +DEF_SAT_ADDU(uint8_t)
> > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-2.c b/gcc/testsuite/gcc.target/riscv/sat_addu-2.c
> > new file mode 100644
> > index 00000000000..4023b030811
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-2.c
> > @@ -0,0 +1,20 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
> > +/* { dg-final { check-function-bodies "**" "" } } */
> > +
> > +#include "sat_arith.h"
> > +
> > +/*
> > +** sat_addu_uint16_t:
> > +** add\s+[atx][0-9]+,\s*a0,\s*a1
> > +** slli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
> > +** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
> > +** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > +** neg\s+[atx][0-9]+,\s*[atx][0-9]+
> > +** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > +** slli\s+a0,\s*a0,\s*48
> > +** srli\s+a0,\s*a0,\s*48
> > +** ret
> > +*/
> > +DEF_SAT_ADDU(uint16_t)
> > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-3.c b/gcc/testsuite/gcc.target/riscv/sat_addu-3.c
> > new file mode 100644
> > index 00000000000..4d0af97fb67
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-3.c
> > @@ -0,0 +1,17 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
> > +/* { dg-final { check-function-bodies "**" "" } } */
> > +
> > +#include "sat_arith.h"
> > +
> > +/*
> > +** sat_addu_uint32_t:
> > +** addw\s+[atx][0-9]+,\s*a0,\s*a1
> > +** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > +** neg\s+[atx][0-9]+,\s*[atx][0-9]+
> > +** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > +** sext.w\s+a0,\s*a0
> > +** ret
> > +*/
> > +DEF_SAT_ADDU(uint32_t)
> > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-4.c b/gcc/testsuite/gcc.target/riscv/sat_addu-4.c
> > new file mode 100644
> > index 00000000000..926f31266e3
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-4.c
> > @@ -0,0 +1,16 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */
> > +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
> > +/* { dg-final { check-function-bodies "**" "" } } */
> > +
> > +#include "sat_arith.h"
> > +
> > +/*
> > +** sat_addu_uint64_t:
> > +** add\s+[atx][0-9]+,\s*a0,\s*a1
> > +** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > +** neg\s+[atx][0-9]+,\s*[atx][0-9]+
> > +** or\s+a0,\s*[atx][0-9]+,\s*[atx][0-9]+
> > +** ret
> > +*/
> > +DEF_SAT_ADDU(uint64_t)
> > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-run-1.c b/gcc/testsuite/gcc.target/riscv/sat_addu-run-1.c
> > new file mode 100644
> > index 00000000000..b19515c39d1
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-run-1.c
> > @@ -0,0 +1,42 @@
> > +/* { dg-do run { target { riscv_v } } } */
> > +/* { dg-additional-options "-std=c99" } */
> > +
> > +#include "sat_arith.h"
> > +
> > +DEF_SAT_ADDU(uint8_t)
> > +
> > +int
> > +main ()
> > +{
> > +  if (RUN_SAT_ADDU (uint8_t, 0, 0) != 0)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint8_t, 0, 1) != 1)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint8_t, 1, 1) != 2)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint8_t, 0, 254) != 254)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint8_t, 1, 254) != 255)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint8_t, 2, 254) != 255)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint8_t, 0, 255) != 255)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint8_t, 1, 255) != 255)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint8_t, 2, 255) != 255)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint8_t, 255, 255) != 255)
> > +    __builtin_abort ();
> > +
> > +  return 0;
> > +}
> > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-run-2.c b/gcc/testsuite/gcc.target/riscv/sat_addu-run-2.c
> > new file mode 100644
> > index 00000000000..90073fbe4ba
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-run-2.c
> > @@ -0,0 +1,42 @@
> > +/* { dg-do run { target { riscv_v } } } */
> > +/* { dg-additional-options "-std=c99" } */
> > +
> > +#include "sat_arith.h"
> > +
> > +DEF_SAT_ADDU(uint16_t)
> > +
> > +int
> > +main ()
> > +{
> > +  if (RUN_SAT_ADDU (uint16_t, 0, 0) != 0)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint16_t, 0, 1) != 1)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint16_t, 1, 1) != 2)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint16_t, 0, 65534) != 65534)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint16_t, 1, 65534) != 65535)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint16_t, 2, 65534) != 65535)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint16_t, 0, 65535) != 65535)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint16_t, 1, 65535) != 65535)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint16_t, 2, 65535) != 65535)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint16_t, 65535, 65535) != 65535)
> > +    __builtin_abort ();
> > +
> > +  return 0;
> > +}
> > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-run-3.c b/gcc/testsuite/gcc.target/riscv/sat_addu-run-3.c
> > new file mode 100644
> > index 00000000000..996dd3de737
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-run-3.c
> > @@ -0,0 +1,42 @@
> > +/* { dg-do run { target { riscv_v } } } */
> > +/* { dg-additional-options "-std=c99" } */
> > +
> > +#include "sat_arith.h"
> > +
> > +DEF_SAT_ADDU(uint32_t)
> > +
> > +int
> > +main ()
> > +{
> > +  if (RUN_SAT_ADDU (uint32_t, 0, 0) != 0)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint32_t, 0, 1) != 1)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint32_t, 1, 1) != 2)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint32_t, 0, 4294967294) != 4294967294)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint32_t, 1, 4294967294) != 4294967295)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint32_t, 2, 4294967294) != 4294967295)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint32_t, 0, 4294967295) != 4294967295)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint32_t, 1, 4294967295) != 4294967295)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint32_t, 2, 4294967295) != 4294967295)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint32_t, 4294967295, 4294967295) != 4294967295)
> > +    __builtin_abort ();
> > +
> > +  return 0;
> > +}
> > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-run-4.c b/gcc/testsuite/gcc.target/riscv/sat_addu-run-4.c
> > new file mode 100644
> > index 00000000000..51a5421577b
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-run-4.c
> > @@ -0,0 +1,49 @@
> > +/* { dg-do run { target { riscv_v } } } */
> > +/* { dg-additional-options "-std=c99" } */
> > +
> > +#include "sat_arith.h"
> > +
> > +DEF_SAT_ADDU(uint64_t)
> > +
> > +int
> > +main ()
> > +{
> > +  if (RUN_SAT_ADDU (uint64_t, 0, 0) != 0)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint64_t, 0, 1) != 1)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint64_t, 1, 1) != 2)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint64_t, 0, 18446744073709551614u)
> > +    != 18446744073709551614u)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint64_t, 1, 18446744073709551614u)
> > +    != 18446744073709551615u)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint64_t, 2, 18446744073709551614u)
> > +    != 18446744073709551615u)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint64_t, 0, 18446744073709551615u)
> > +    != 18446744073709551615u)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint64_t, 1, 18446744073709551615u)
> > +    != 18446744073709551615u)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint64_t, 2, 18446744073709551615u)
> > +    != 18446744073709551615u)
> > +    __builtin_abort ();
> > +
> > +  if (RUN_SAT_ADDU (uint64_t, 18446744073709551615u, 18446744073709551615u)
> > +    != 18446744073709551615u)
> > +    __builtin_abort ();
> > +
> > +  return 0;
> > +}
> > diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h b/gcc/testsuite/gcc.target/riscv/sat_arith.h
> > new file mode 100644
> > index 00000000000..4c00157685e
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
> > @@ -0,0 +1,15 @@
> > +#ifndef HAVE_SAT_ARITH
> > +#define HAVE_SAT_ARITH
> > +
> > +#include <stdint.h>
> > +
> > +#define DEF_SAT_ADDU(TYPE)                       \
> > +TYPE __attribute__((noinline))                   \
> > +sat_addu_##TYPE (TYPE x, TYPE y)                 \
> > +{                                                \
> > +  return (x + y) | (-(TYPE)((TYPE)(x + y) < x)); \
> > +}
> > +
> > +#define RUN_SAT_ADDU(TYPE, x, y) sat_addu_##TYPE(x, y)
> > +
> > +#endif
> > --
> > 2.34.1
> >
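To make Steps 1 to 5 of `riscv_expand_saturation_addu` concrete, here is a C model of the sequence for a 16-bit operand on RV64 (64-bit registers). It is an illustration only, not GCC code. `model_sat_addu_hi` and its `truncate_sum` switch are hypothetical, and the inputs are assumed to arrive zero-extended. Skipping Step 1.1 (the `slli`/`srli` pair by 48 that `sat_addu-2.c` expects) leaves bit 16 of the sum set. In that case `sltu` never fires and the result wraps instead of saturating.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the expansion for HImode with Pmode == DImode.
   Assumption: x16/y16 are zero-extended into 64-bit registers.  */
static uint16_t
model_sat_addu_hi (uint16_t x16, uint16_t y16, int truncate_sum)
{
  uint64_t x = x16, y = y16;
  const int shift_bits = 64 - 16;           /* 48, as in slli/srli.  */

  uint64_t sum = x + y;                     /* Step 1: add.  */
  if (truncate_sum)                         /* Step 1.1: slli + srli.  */
    sum = (sum << shift_bits) >> shift_bits;
  uint64_t lt = sum < x;                    /* Step 2: sltu.  */
  lt = -lt;                                 /* Step 3: neg.  */
  uint64_t dest = sum | lt;                 /* Step 4: or.  */
  return (uint16_t) dest;                   /* Step 5: lowpart.  */
}

/* Compare the model (with Step 1.1) against a widened reference on
   boundary values.  */
static void
check_model_hi (void)
{
  static const uint16_t vals[] = { 0, 1, 2, 0x7fff, 0x8000, 65534, 65535 };
  const unsigned n = sizeof vals / sizeof vals[0];
  for (unsigned i = 0; i < n; i++)
    for (unsigned j = 0; j < n; j++)
      {
        uint32_t wide = (uint32_t) vals[i] + vals[j];
        uint16_t ref = wide > UINT16_MAX ? UINT16_MAX : (uint16_t) wide;
        assert (model_sat_addu_hi (vals[i], vals[j], 1) == ref);
      }
}
```

For example, `model_sat_addu_hi (1, 65535, 1)` yields 65535, while the same call with `truncate_sum` set to 0 yields 0. That gap is why the expander emits the shift pair for QImode and HImode.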

Tamar Christina Feb. 19, 2024, 1:04 p.m. UTC | #7

> -----Original Message-----
> From: Li, Pan2 <pan2.li@intel.com>
> Sent: Monday, February 19, 2024 12:59 PM
> To: Tamar Christina <Tamar.Christina@arm.com>; Richard Biener
> <richard.guenther@gmail.com>
> Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang
> <yanzhang.wang@intel.com>; kito.cheng@gmail.com
> Subject: RE: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU
> 
> Thanks Tamar for comments and explanations.
> 
> > I think we should actually do an indirect optab here, because the IFN can be used
> > to replace the general representation of saturating arithmetic.
> 
> > e.g. the __builtin_add_overflow case in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600
> > is inefficient on all targets and so the IFN can always expand to something that's more
> > efficient like the branchless version add_sat2.
> 
> > I think this is why you suggested a new tree code below, but we don't really need
> > tree-codes for this. It can be done cleaner using the same way as DEF_INTERNAL_INT_EXT_FN
> 
> Yes, the backend could choose branchless code-gen (we always dislike branches for performance), or even better, a single saturation insn if the target has one.
> Good to learn about DEF_INTERNAL_INT_EXT_FN; I will give it a try.
> 
> > Richard means that there shouldn't be .SAT_ADDU and .SAT_ADDS  and that the sign
> > should be determined by the types at expansion time.  i.e. there should only be
> > .SAT_ADD.
> 
> Got it. My initial idea came from the fact that a target may have two insns for saturating add, and such insns usually come in signed and unsigned flavours, like slt/sltu in the RISC-V scalar ISA. One scenario I am not clear about: in the backend define_expand we hit the standard name sat_add_<m>3, but can we tell there whether it is signed or not? AFAIK, the mode is only one of QI, HI, SI and DI.

Yeah, the way DEF_INTERNAL_SIGNED_OPTAB_FN works is that you give it two optabs,
one for when it's signed and one for when it's unsigned, and the right one is picked
automatically during expansion.  But in GIMPLE you'd only have one IFN.
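Tamar's description can be pictured with a plain C analogy. This is not GCC internals: it shows one generic `SAT_ADD` entry point whose operand type selects a signed or an unsigned implementation, much as a single `.SAT_ADD` IFN would be expanded through either a signed or an unsigned optab. The `SAT_ADD` macro and the helper names are hypothetical, and the sketch needs C11 `_Generic`. The two helpers also model the overflow behaviour the internals manual gives for `ss_plus` and `us_plus` in an 8-bit mode.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned saturation (cf. us_plus): clamp to UINT8_MAX.  */
static uint8_t
sat_add_u8 (uint8_t x, uint8_t y)
{
  unsigned sum = (unsigned) x + y;
  return sum > UINT8_MAX ? UINT8_MAX : (uint8_t) sum;
}

/* Signed saturation (cf. ss_plus): clamp to [INT8_MIN, INT8_MAX].  */
static int8_t
sat_add_s8 (int8_t x, int8_t y)
{
  int sum = x + y;  /* Exact in int, no overflow possible.  */
  if (sum > INT8_MAX)
    return INT8_MAX;
  if (sum < INT8_MIN)
    return INT8_MIN;
  return (int8_t) sum;
}

/* One generic operation; the operand type picks the variant, the way
   the signedness of the type would pick the optab at expansion.
   Operands must already have type uint8_t or int8_t.  */
#define SAT_ADD(x, y) \
  _Generic ((x), uint8_t: sat_add_u8, int8_t: sat_add_s8) ((x), (y))

static void
check_sat_add_dispatch (void)
{
  uint8_t a = 200, b = 100;
  int8_t c = 100, e = -100;
  assert (SAT_ADD (a, b) == UINT8_MAX);
  assert (SAT_ADD (c, c) == INT8_MAX);
  assert (SAT_ADD (e, e) == INT8_MIN);
}
```

Integer literals have type `int` and match neither association, so the macro is only meant for operands that already have one of the two 8-bit types.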

> Maybe I will have the answer after trying DEF_INTERNAL_SIGNED_OPTAB_FN; I will keep you posted.

Awesome, Thanks!

Tamar
> 
> Pan
> 
> -----Original Message-----
> From: Tamar Christina <Tamar.Christina@arm.com>
> Sent: Monday, February 19, 2024 4:55 PM
> To: Li, Pan2 <pan2.li@intel.com>; Richard Biener <richard.guenther@gmail.com>
> Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang
> <yanzhang.wang@intel.com>; kito.cheng@gmail.com
> Subject: RE: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU
> 
> Thanks for doing this!
> 
> > -----Original Message-----
> > From: Li, Pan2 <pan2.li@intel.com>
> > Sent: Monday, February 19, 2024 8:42 AM
> > To: Richard Biener <richard.guenther@gmail.com>
> > Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang
> > <yanzhang.wang@intel.com>; kito.cheng@gmail.com; Tamar Christina
> > <Tamar.Christina@arm.com>
> > Subject: RE: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU
> >
> > Thanks Richard for comments.
> >
> > > I'll note that on RTL we already have SS_PLUS/US_PLUS and friends and
> > > the corresponding ssadd/usadd optabs.  There's not much documentation
> > > unfortunately besides the use of gen_*_fixed_libfunc usage where the
> comment
> > > suggests this is used for fixed-point operations.  It looks like arm uses
> > > fractional/accumulator modes for this but for example bfin has ssaddsi3.
> >
> > I find the related description about plus family in GCC internals doc but it doesn't
> > mention
> > anything about mode m here.
> >
> > (plus:m x y)
> > (ss_plus:m x y)
> > (us_plus:m x y)
> > These three expressions all represent the sum of the values represented by x
> > and y carried out in machine mode m. They diff er in their behavior on overflow
> > of integer modes. plus wraps round modulo the width of m; ss_plus saturates
> > at the maximum signed value representable in m; us_plus saturates at the
> > maximum unsigned value.
> >
> > > The natural thing is to use direct optab internal functions (that's what you
> > > basically did, but you added a new optab, IMO without good reason).
> 
> I think we should actually do an indirect optab here, because the IFN can be used
> to replace the general representation of saturating arithmetic.
> 
> e.g. the __builtin_add_overflow case in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600
> is inefficient on all targets and so the IFN can always expand to something that's
> more
> efficient like the branchless version add_sat2.
> 
> I think this is why you suggested a new tree code below, but we don't really need
> tree-codes for this. It can be done cleaner using the same way as
> DEF_INTERNAL_INT_EXT_FN.
> 
> >
> > That makes sense to me, I will try to leverage US_PLUS instead here.
> >
> > > More GIMPLE-like would be to let the types involved decide whether
> > > it's signed or unsigned saturation.  That's actually what I'd prefer here
> > > and if we don't map 1:1 to optabs then instead use tree codes like
> > > S_PLUS_EXPR (mimicing RTL here).
> >
> > Sorry, I don't quite get the point of the GIMPLE-like way. For .SAT_ADDU, I added
> > a restriction like unsigned_p (type) in match.pd. It looks like there is a better way here.
> >
> 
> Richard means that there shouldn't be .SAT_ADDU and .SAT_ADDS and that the sign
> should be determined by the types at expansion time, i.e. there should only be
> .SAT_ADD.
> 
> i.e. instead of this
> 
> +DEF_INTERNAL_OPTAB_FN (SAT_ADDU, ECF_CONST | ECF_NOTHROW, sat_addu, binary)
> 
> You should use DEF_INTERNAL_SIGNED_OPTAB_FN.
> 
> Regards,
> Tamar
> 
> > > Any other opinions?  Does anyone know more about fixed-point and RTL/modes?
> >
> > AFAIK, the riscv backend has no scalar fixed-point support, but the vector extension
> > does. The vector fixed-point operations share the same modes as vector integers, for
> > example RVVM1SI in vector-iterators.md. Kito and Juzhe can correct me if I have
> > misunderstood anything.
> >
> > Pan
> >
> > -----Original Message-----
> > From: Richard Biener <richard.guenther@gmail.com>
> > Sent: Monday, February 19, 2024 3:36 PM
> > To: Li, Pan2 <pan2.li@intel.com>
> > Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang
> > <yanzhang.wang@intel.com>; kito.cheng@gmail.com; Tamar.Christina@arm.com
> > Subject: Re: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU
> >
> > On Sat, Feb 17, 2024 at 11:30 AM <pan2.li@intel.com> wrote:
> > >
> > > From: Pan Li <pan2.li@intel.com>
> > >
> > > This patch adds a middle-end representation for the unsigned
> > > saturating add, i.e. the result of the add is set to the maximum
> > > value on overflow.  It matches a pattern similar to the one below.
> > >
> > > SAT_ADDU (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
> > >
> > > Take uint8_t as example, we will have:
> > >
> > > * SAT_ADDU (1, 254)   => 255.
> > > * SAT_ADDU (1, 255)   => 255.
> > > * SAT_ADDU (2, 255)   => 255.
> > > * SAT_ADDU (255, 255) => 255.
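The formula can be checked against a plain reference (the sum clamped to 255) for uint8_t. This is a sketch with hypothetical function names. Note the inner (TYPE) cast, which sat_arith.h also uses: with uint8_t operands, x + y is computed in int after the usual integer promotions, so without the cast x + y < x is never true and the result simply wraps.

```c
#include <stdint.h>

/* The patch's expression for uint8_t, including the inner cast that
   sat_arith.h also uses.  */
static uint8_t sat_addu_u8 (uint8_t x, uint8_t y)
{
  return (x + y) | (-(uint8_t) ((uint8_t) (x + y) < x));
}

/* Reference: the exact sum clamped to UINT8_MAX.  */
static uint8_t sat_addu_ref (uint8_t x, uint8_t y)
{
  unsigned sum = (unsigned) x + y;
  return sum > UINT8_MAX ? UINT8_MAX : (uint8_t) sum;
}

/* Same expression without the inner cast: x + y is an int in the range
   0..510, so x + y < x is always false and the result just wraps.  */
static uint8_t no_cast_u8 (uint8_t x, uint8_t y)
{
  return (x + y) | (-(uint8_t) ((x + y) < x));
}
```

For example, no_cast_u8 (1, 255) returns 0 rather than 255. The @smallexample added to md.texi later in this patch writes (x + y) < x without the cast, so read literally as C with uint8_t operands it would not saturate.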
> > >
> > > The patch also implements SAT_ADDU in the riscv backend as
> > > an example.  Given the example below:
> > >
> > > uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> > > {
> > >   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> > > }
> > >
> > > Before this patch:
> > >
> > > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > > {
> > >   long unsigned int _1;
> > >   _Bool _2;
> > >   long unsigned int _3;
> > >   long unsigned int _4;
> > >   uint64_t _7;
> > >   long unsigned int _10;
> > >   __complex__ long unsigned int _11;
> > >
> > > ;;   basic block 2, loop depth 0
> > > ;;    pred:       ENTRY
> > >   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
> > >   _1 = REALPART_EXPR <_11>;
> > >   _10 = IMAGPART_EXPR <_11>;
> > >   _2 = _10 != 0;
> > >   _3 = (long unsigned int) _2;
> > >   _4 = -_3;
> > >   _7 = _1 | _4;
> > >   return _7;
> > > ;;    succ:       EXIT
> > >
> > > }
> > >
> > > After this patch:
> > >
> > > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > > {
> > >   uint64_t _7;
> > >
> > > ;;   basic block 2, loop depth 0
> > > ;;    pred:       ENTRY
> > >   _7 = .SAT_ADDU (x_5(D), y_6(D)); [tail call]
> > >   return _7;
> > > ;;    succ:       EXIT
> > >
> > > }
> > >
> > > With this patch, the middle end represents this pattern as
> > > .SAT_ADDU.
> >
> > I'll note that on RTL we already have SS_PLUS/US_PLUS and friends and
> > the corresponding ssadd/usadd optabs.  There's not much documentation
> > unfortunately besides the use of gen_*_fixed_libfunc usage where the comment
> > suggests this is used for fixed-point operations.  It looks like arm uses
> > fractional/accumulator modes for this but for example bfin has ssaddsi3.
> >
> > So the question is whether the fixed-point case can be distinguished from
> > the integer case based on mode.
> >
> > There's also FIXED_POINT_TYPE on the GENERIC/GIMPLE side and
> > no special tree operator codes for them.  So compared to what appears
> > to be the case on RTL we'd need a way to represent saturating integer
> > operations on GIMPLE.
> >
> > The natural thing is to use direct optab internal functions (that's what you
> > basically did, but you added a new optab, IMO without good reason).
> > More GIMPLE-like would be to let the types involved decide whether
> > it's signed or unsigned saturation.  That's actually what I'd prefer here
> > and if we don't map 1:1 to optabs then instead use tree codes like
> > S_PLUS_EXPR (mimicking RTL here).
> >
> > Any other opinions?  Does anyone know more about fixed-point and RTL/modes?
> >
> > Richard.
> >
> > >         PR target/51492
> > >         PR target/112600
> > >
> > > gcc/ChangeLog:
> > >
> > >         * config/riscv/riscv-protos.h (riscv_expand_saturation_addu):
> > >         New func decl for the SAT_ADDU expand.
> > >         * config/riscv/riscv.cc (riscv_expand_saturation_addu): New func
> > >         impl for the SAT_ADDU expand.
> > >         * config/riscv/riscv.md (sat_addu_<mode>3): New pattern to impl
> > >         the standard name SAT_ADDU.
> > >         * doc/md.texi: Add doc for SAT_ADDU.
> > >         * internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADDU.
> > >         * internal-fn.def (SAT_ADDU): Add SAT_ADDU.
> > >         * match.pd: Add simplify pattern patch for SAT_ADDU.
> > >         * optabs.def (OPTAB_D): Add sat_addu_optab.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >         * gcc.target/riscv/sat_addu-1.c: New test.
> > >         * gcc.target/riscv/sat_addu-2.c: New test.
> > >         * gcc.target/riscv/sat_addu-3.c: New test.
> > >         * gcc.target/riscv/sat_addu-4.c: New test.
> > >         * gcc.target/riscv/sat_addu-run-1.c: New test.
> > >         * gcc.target/riscv/sat_addu-run-2.c: New test.
> > >         * gcc.target/riscv/sat_addu-run-3.c: New test.
> > >         * gcc.target/riscv/sat_addu-run-4.c: New test.
> > >         * gcc.target/riscv/sat_arith.h: New test.
> > >
> > > Signed-off-by: Pan Li <pan2.li@intel.com>
> > > ---
> > >  gcc/config/riscv/riscv-protos.h               |  1 +
> > >  gcc/config/riscv/riscv.cc                     | 46 +++++++++++++++++
> > >  gcc/config/riscv/riscv.md                     | 11 +++++
> > >  gcc/doc/md.texi                               | 11 +++++
> > >  gcc/internal-fn.cc                            |  1 +
> > >  gcc/internal-fn.def                           |  1 +
> > >  gcc/match.pd                                  | 22 +++++++++
> > >  gcc/optabs.def                                |  2 +
> > >  gcc/testsuite/gcc.target/riscv/sat_addu-1.c   | 18 +++++++
> > >  gcc/testsuite/gcc.target/riscv/sat_addu-2.c   | 20 ++++++++
> > >  gcc/testsuite/gcc.target/riscv/sat_addu-3.c   | 17 +++++++
> > >  gcc/testsuite/gcc.target/riscv/sat_addu-4.c   | 16 ++++++
> > >  .../gcc.target/riscv/sat_addu-run-1.c         | 42 ++++++++++++++++
> > >  .../gcc.target/riscv/sat_addu-run-2.c         | 42 ++++++++++++++++
> > >  .../gcc.target/riscv/sat_addu-run-3.c         | 42 ++++++++++++++++
> > >  .../gcc.target/riscv/sat_addu-run-4.c         | 49 +++++++++++++++++++
> > >  gcc/testsuite/gcc.target/riscv/sat_arith.h    | 15 ++++++
> > >  17 files changed, 356 insertions(+)
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-1.c
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-2.c
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-3.c
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-4.c
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-run-1.c
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-run-2.c
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-run-3.c
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-run-4.c
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_arith.h
> > >
> > > diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> > > index ae1685850ac..f201b2384f9 100644
> > > --- a/gcc/config/riscv/riscv-protos.h
> > > +++ b/gcc/config/riscv/riscv-protos.h
> > > @@ -132,6 +132,7 @@ extern void riscv_asm_output_external (FILE *, const tree, const char *);
> > >  extern bool
> > >  riscv_zcmp_valid_stack_adj_bytes_p (HOST_WIDE_INT, int);
> > >  extern void riscv_legitimize_poly_move (machine_mode, rtx, rtx, rtx);
> > > +extern void riscv_expand_saturation_addu (rtx, rtx, rtx);
> > >
> > >  #ifdef RTX_CODE
> > >  extern void riscv_expand_int_scc (rtx, enum rtx_code, rtx, rtx, bool *invert_ptr = 0);
> > > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> > > index 799d7919a4a..84e86eb5d49 100644
> > > --- a/gcc/config/riscv/riscv.cc
> > > +++ b/gcc/config/riscv/riscv.cc
> > > @@ -10657,6 +10657,52 @@ riscv_vector_mode_supported_any_target_p (machine_mode)
> > >    return true;
> > >  }
> > >
> > > +/* Emit insn for the saturation addu, aka (x + y) | - ((x + y) < x).  */
> > > +void
> > > +riscv_expand_saturation_addu (rtx dest, rtx x, rtx y)
> > > +{
> > > +  machine_mode mode = GET_MODE (dest);
> > > +  rtx pmode_sum = gen_reg_rtx (Pmode);
> > > +  rtx pmode_lt = gen_reg_rtx (Pmode);
> > > +  rtx pmode_x = gen_lowpart (Pmode, x);
> > > +  rtx pmode_y = gen_lowpart (Pmode, y);
> > > +  rtx pmode_dest = gen_reg_rtx (Pmode);
> > > +
> > > +  /* Step-1: sum = x + y  */
> > > +  if (mode == SImode && mode != Pmode)
> > > +    { /* Take addw to avoid the sum truncate.  */
> > > +      rtx simode_sum = gen_reg_rtx (SImode);
> > > +      riscv_emit_binary (PLUS, simode_sum, x, y);
> > > +      emit_move_insn (pmode_sum, gen_lowpart (Pmode, simode_sum));
> > > +    }
> > > +  else
> > > +    riscv_emit_binary (PLUS, pmode_sum, pmode_x, pmode_y);
> > > +
> > > +  /* Step-1.1: truncate sum for HI and QI as we have no insn for add QI/HI.  */
> > > +  if (mode == HImode || mode == QImode)
> > > +    {
> > > +      int shift_bits = GET_MODE_BITSIZE (Pmode)
> > > +       - GET_MODE_BITSIZE (mode).to_constant ();
> > > +
> > > +      gcc_assert (shift_bits > 0);
> > > +
> > > +      riscv_emit_binary (ASHIFT, pmode_sum, pmode_sum, GEN_INT (shift_bits));
> > > +      riscv_emit_binary (LSHIFTRT, pmode_sum, pmode_sum, GEN_INT (shift_bits));
> > > +    }
> > > +
> > > +  /* Step-2: lt = sum < x  */
> > > +  riscv_emit_binary (LTU, pmode_lt, pmode_sum, pmode_x);
> > > +
> > > +  /* Step-3: lt = -lt  */
> > > +  riscv_emit_unary (NEG, pmode_lt, pmode_lt);
> > > +
> > > +  /* Step-4: pmode_dest = sum | lt  */
> > > +  riscv_emit_binary (IOR, pmode_dest, pmode_lt, pmode_sum);
> > > +
> > > +  /* Step-5: dest = pmode_dest */
> > > +  emit_move_insn (dest, gen_lowpart (mode, pmode_dest));
> > > +}
> > > +
> > >  /* Initialize the GCC target structure.  */
> > >  #undef TARGET_ASM_ALIGNED_HI_OP
> > >  #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
> > > diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> > > index 39b29795cd6..03cbe5a2ca9 100644
> > > --- a/gcc/config/riscv/riscv.md
> > > +++ b/gcc/config/riscv/riscv.md
> > > @@ -3841,6 +3841,17 @@ (define_insn "*large_load_address"
> > >    [(set_attr "type" "load")
> > >     (set (attr "length") (const_int 8))])
> > >
> > > +(define_expand "sat_addu_<mode>3"
> > > +  [(match_operand:ANYI   0 "register_operand")
> > > +   (match_operand:ANYI   1 "register_operand")
> > > +   (match_operand:ANYI   2 "register_operand")]
> > > +  ""
> > > +  {
> > > +    riscv_expand_saturation_addu (operands[0], operands[1], operands[2]);
> > > +    DONE;
> > > +  }
> > > +)
> > > +
> > >  (include "bitmanip.md")
> > >  (include "crypto.md")
> > >  (include "sync.md")
> > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > > index b0c61925120..5867afdb1a0 100644
> > > --- a/gcc/doc/md.texi
> > > +++ b/gcc/doc/md.texi
> > > @@ -6653,6 +6653,17 @@ The operation is only supported for vector modes @var{m}.
> > >
> > >  This pattern is not allowed to @code{FAIL}.
> > >
> > > +@cindex @code{sat_addu_@var{m}3} instruction pattern
> > > +@item @samp{sat_addu_@var{m}3}
> > > +Perform the saturation unsigned add for the operand 1 and operand 2 and
> > > +store the result into the operand 0.  All operands have mode @var{m},
> > > +which is a scalar integer mode.
> > > +
> > > +@smallexample
> > > +  typedef unsigned char uint8_t;
> > > +  uint8_t sat_addu (uint8_t x, uint8_t y) => return (x + y) | -((x + y) < x);
> > > +@end smallexample
> > > +
> > >  @cindex @code{cmla@var{m}4} instruction pattern
> > >  @item @samp{cmla@var{m}4}
> > >  Perform a vector multiply and accumulate that is semantically the same as
> > > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> > > index a07f25f3aee..dee73dbc614 100644
> > > --- a/gcc/internal-fn.cc
> > > +++ b/gcc/internal-fn.cc
> > > @@ -4159,6 +4159,7 @@ commutative_binary_fn_p (internal_fn fn)
> > >      case IFN_VEC_WIDEN_PLUS_HI:
> > >      case IFN_VEC_WIDEN_PLUS_EVEN:
> > >      case IFN_VEC_WIDEN_PLUS_ODD:
> > > +    case IFN_SAT_ADDU:
> > >        return true;
> > >
> > >      default:
> > > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > > index c14d30365c1..a04592fc779 100644
> > > --- a/gcc/internal-fn.def
> > > +++ b/gcc/internal-fn.def
> > > @@ -428,6 +428,7 @@ DEF_INTERNAL_WIDENING_OPTAB_FN (VEC_WIDEN_ABD,
> > >                                 binary)
> > >  DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary)
> > >  DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary)
> > > +DEF_INTERNAL_OPTAB_FN (SAT_ADDU, ECF_CONST | ECF_NOTHROW, sat_addu, binary)
> > >
> > >  /* FP scales.  */
> > >  DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
> > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > index 711c3a10c3f..9de1106adcf 100644
> > > --- a/gcc/match.pd
> > > +++ b/gcc/match.pd
> > > @@ -1994,6 +1994,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > >   )
> > >  )
> > >
> > > +#if GIMPLE
> > > +
> > > +/* Saturation add unsigned, aka:
> > > +   SAT_ADDU = (X + Y) | - ((X + Y) < X) or
> > > +   SAT_ADDU = (X + Y) | - ((X + Y) < Y).  */
> > > +(simplify
> > > + (bit_ior:c (plus:c@2 @0 @1) (negate (convert (lt @2 @0))))
> > > +   (if (optimize
> > > +       && INTEGRAL_TYPE_P (type)
> > > +       && TYPE_UNSIGNED (TREE_TYPE (@0))
> > > +       && types_match (type, TREE_TYPE (@0))
> > > +       && types_match (type, TREE_TYPE (@1))
> > > +       && direct_internal_fn_supported_p (IFN_SAT_ADDU, type, OPTIMIZE_FOR_BOTH))
> > > +   (IFN_SAT_ADDU @0 @1)))
> > > +
> > > +/* SAT_ADDU (X, 0) = X  */
> > > +(simplify
> > > + (IFN_SAT_ADDU:c @0 integer_zerop)
> > > + @0)
> > > +
> > > +#endif
> > > +
> > >  /* A few cases of fold-const.cc negate_expr_p predicate.  */
> > >  (match negate_expr_p
> > >   INTEGER_CST
> > > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > > index ad14f9328b9..a2c11b7707b 100644
> > > --- a/gcc/optabs.def
> > > +++ b/gcc/optabs.def
> > > @@ -300,6 +300,8 @@ OPTAB_D (usubc5_optab, "usubc$I$a5")
> > >  OPTAB_D (addptr3_optab, "addptr$a3")
> > >  OPTAB_D (spaceship_optab, "spaceship$a3")
> > >
> > > +OPTAB_D (sat_addu_optab, "sat_addu_$a3")
> > > +
> > >  OPTAB_D (smul_highpart_optab, "smul$a3_highpart")
> > >  OPTAB_D (umul_highpart_optab, "umul$a3_highpart")
> > >
> > > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-1.c b/gcc/testsuite/gcc.target/riscv/sat_addu-1.c
> > > new file mode 100644
> > > index 00000000000..229abef0faa
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-1.c
> > > @@ -0,0 +1,18 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */
> > > +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
> > > +/* { dg-final { check-function-bodies "**" "" } } */
> > > +
> > > +#include "sat_arith.h"
> > > +
> > > +/*
> > > +** sat_addu_uint8_t:
> > > +** add\s+[atx][0-9]+,\s*a0,\s*a1
> > > +** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*0xff
> > > +** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > > +** neg\s+[atx][0-9]+,\s*[atx][0-9]+
> > > +** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > > +** andi\s+a0,\s*a0,\s*0xff
> > > +** ret
> > > +*/
> > > +DEF_SAT_ADDU(uint8_t)
> > > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-2.c b/gcc/testsuite/gcc.target/riscv/sat_addu-2.c
> > > new file mode 100644
> > > index 00000000000..4023b030811
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-2.c
> > > @@ -0,0 +1,20 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */
> > > +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
> > > +/* { dg-final { check-function-bodies "**" "" } } */
> > > +
> > > +#include "sat_arith.h"
> > > +
> > > +/*
> > > +** sat_addu_uint16_t:
> > > +** add\s+[atx][0-9]+,\s*a0,\s*a1
> > > +** slli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
> > > +** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
> > > +** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > > +** neg\s+[atx][0-9]+,\s*[atx][0-9]+
> > > +** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > > +** slli\s+a0,\s*a0,\s*48
> > > +** srli\s+a0,\s*a0,\s*48
> > > +** ret
> > > +*/
> > > +DEF_SAT_ADDU(uint16_t)
> > > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-3.c b/gcc/testsuite/gcc.target/riscv/sat_addu-3.c
> > > new file mode 100644
> > > index 00000000000..4d0af97fb67
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-3.c
> > > @@ -0,0 +1,17 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */
> > > +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
> > > +/* { dg-final { check-function-bodies "**" "" } } */
> > > +
> > > +#include "sat_arith.h"
> > > +
> > > +/*
> > > +** sat_addu_uint32_t:
> > > +** addw\s+[atx][0-9]+,\s*a0,\s*a1
> > > +** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > > +** neg\s+[atx][0-9]+,\s*[atx][0-9]+
> > > +** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > > +** sext.w\s+a0,\s*a0
> > > +** ret
> > > +*/
> > > +DEF_SAT_ADDU(uint32_t)
> > > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-4.c b/gcc/testsuite/gcc.target/riscv/sat_addu-4.c
> > > new file mode 100644
> > > index 00000000000..926f31266e3
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-4.c
> > > @@ -0,0 +1,16 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */
> > > +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
> > > +/* { dg-final { check-function-bodies "**" "" } } */
> > > +
> > > +#include "sat_arith.h"
> > > +
> > > +/*
> > > +** sat_addu_uint64_t:
> > > +** add\s+[atx][0-9]+,\s*a0,\s*a1
> > > +** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > > +** neg\s+[atx][0-9]+,\s*[atx][0-9]+
> > > +** or\s+a0,\s*[atx][0-9]+,\s*[atx][0-9]+
> > > +** ret
> > > +*/
> > > +DEF_SAT_ADDU(uint64_t)
> > > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-run-1.c b/gcc/testsuite/gcc.target/riscv/sat_addu-run-1.c
> > > new file mode 100644
> > > index 00000000000..b19515c39d1
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-run-1.c
> > > @@ -0,0 +1,42 @@
> > > +/* { dg-do run { target { riscv_v } } } */
> > > +/* { dg-additional-options "-std=c99" } */
> > > +
> > > +#include "sat_arith.h"
> > > +
> > > +DEF_SAT_ADDU(uint8_t)
> > > +
> > > +int
> > > +main ()
> > > +{
> > > +  if (RUN_SAT_ADDU (uint8_t, 0, 0) != 0)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint8_t, 0, 1) != 1)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint8_t, 1, 1) != 2)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint8_t, 0, 254) != 254)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint8_t, 1, 254) != 255)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint8_t, 2, 254) != 255)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint8_t, 0, 255) != 255)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint8_t, 1, 255) != 255)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint8_t, 2, 255) != 255)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint8_t, 255, 255) != 255)
> > > +    __builtin_abort ();
> > > +
> > > +  return 0;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-run-2.c b/gcc/testsuite/gcc.target/riscv/sat_addu-run-2.c
> > > new file mode 100644
> > > index 00000000000..90073fbe4ba
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-run-2.c
> > > @@ -0,0 +1,42 @@
> > > +/* { dg-do run { target { riscv_v } } } */
> > > +/* { dg-additional-options "-std=c99" } */
> > > +
> > > +#include "sat_arith.h"
> > > +
> > > +DEF_SAT_ADDU(uint16_t)
> > > +
> > > +int
> > > +main ()
> > > +{
> > > +  if (RUN_SAT_ADDU (uint16_t, 0, 0) != 0)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint16_t, 0, 1) != 1)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint16_t, 1, 1) != 2)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint16_t, 0, 65534) != 65534)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint16_t, 1, 65534) != 65535)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint16_t, 2, 65534) != 65535)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint16_t, 0, 65535) != 65535)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint16_t, 1, 65535) != 65535)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint16_t, 2, 65535) != 65535)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint16_t, 65535, 65535) != 65535)
> > > +    __builtin_abort ();
> > > +
> > > +  return 0;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-run-3.c b/gcc/testsuite/gcc.target/riscv/sat_addu-run-3.c
> > > new file mode 100644
> > > index 00000000000..996dd3de737
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-run-3.c
> > > @@ -0,0 +1,42 @@
> > > +/* { dg-do run { target { riscv_v } } } */
> > > +/* { dg-additional-options "-std=c99" } */
> > > +
> > > +#include "sat_arith.h"
> > > +
> > > +DEF_SAT_ADDU(uint32_t)
> > > +
> > > +int
> > > +main ()
> > > +{
> > > +  if (RUN_SAT_ADDU (uint32_t, 0, 0) != 0)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint32_t, 0, 1) != 1)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint32_t, 1, 1) != 2)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint32_t, 0, 4294967294) != 4294967294)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint32_t, 1, 4294967294) != 4294967295)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint32_t, 2, 4294967294) != 4294967295)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint32_t, 0, 4294967295) != 4294967295)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint32_t, 1, 4294967295) != 4294967295)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint32_t, 2, 4294967295) != 4294967295)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint32_t, 4294967295, 4294967295) != 4294967295)
> > > +    __builtin_abort ();
> > > +
> > > +  return 0;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-run-4.c b/gcc/testsuite/gcc.target/riscv/sat_addu-run-4.c
> > > new file mode 100644
> > > index 00000000000..51a5421577b
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-run-4.c
> > > @@ -0,0 +1,49 @@
> > > +/* { dg-do run { target { riscv_v } } } */
> > > +/* { dg-additional-options "-std=c99" } */
> > > +
> > > +#include "sat_arith.h"
> > > +
> > > +DEF_SAT_ADDU(uint64_t)
> > > +
> > > +int
> > > +main ()
> > > +{
> > > +  if (RUN_SAT_ADDU (uint64_t, 0, 0) != 0)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint64_t, 0, 1) != 1)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint64_t, 1, 1) != 2)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint64_t, 0, 18446744073709551614u)
> > > +    != 18446744073709551614u)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint64_t, 1, 18446744073709551614u)
> > > +    != 18446744073709551615u)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint64_t, 2, 18446744073709551614u)
> > > +    != 18446744073709551615u)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint64_t, 0, 18446744073709551615u)
> > > +    != 18446744073709551615u)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint64_t, 1, 18446744073709551615u)
> > > +    != 18446744073709551615u)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint64_t, 2, 18446744073709551615u)
> > > +    != 18446744073709551615u)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint64_t, 18446744073709551615u, 18446744073709551615u)
> > > +    != 18446744073709551615u)
> > > +    __builtin_abort ();
> > > +
> > > +  return 0;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h b/gcc/testsuite/gcc.target/riscv/sat_arith.h
> > > new file mode 100644
> > > index 00000000000..4c00157685e
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
> > > @@ -0,0 +1,15 @@
> > > +#ifndef HAVE_SAT_ARITH
> > > +#define HAVE_SAT_ARITH
> > > +
> > > +#include <stdint.h>
> > > +
> > > +#define DEF_SAT_ADDU(TYPE)                       \
> > > +TYPE __attribute__((noinline))                   \
> > > +sat_addu_##TYPE (TYPE x, TYPE y)                 \
> > > +{                                                \
> > > +  return (x + y) | (-(TYPE)((TYPE)(x + y) < x)); \
> > > +}
> > > +
> > > +#define RUN_SAT_ADDU(TYPE, x, y) sat_addu_##TYPE(x, y)
> > > +
> > > +#endif
> > > --
> > > 2.34.1
> > >
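The five steps of riscv_expand_saturation_addu can be modelled in C on 64-bit values. This assumes RV64 (Pmode is 64 bits) and assumes, per the LP64 calling convention, that unsigned QImode/HImode arguments arrive zero-extended and SImode arguments arrive sign-extended. It is a hypothetical model for reasoning about the emitted RTL, not code from the patch, and the function names are made up.

```c
#include <stdint.h>

/* Hypothetical helper: sign-extend a 32-bit value to 64 bits (what addw
   produces), without relying on implementation-defined conversions.  */
static uint64_t sext32 (uint32_t v)
{
  return ((uint64_t) v ^ 0x80000000u) - 0x80000000u;
}

/* QImode/HImode path (bits == 8 or 16); x and y are assumed to be
   zero-extended already.  */
static uint64_t model_sat_addu_qi_hi (uint64_t x, uint64_t y, int bits)
{
  int shift_bits = 64 - bits;
  uint64_t sum = x + y;                          /* Step-1: add */
  sum = (sum << shift_bits) >> shift_bits;       /* Step-1.1: slli + srli */
  uint64_t lt = sum < x;                         /* Step-2: sltu */
  lt = -lt;                                      /* Step-3: neg */
  uint64_t dest = sum | lt;                      /* Step-4: or */
  return dest & ((UINT64_C (1) << bits) - 1);    /* Step-5: low part */
}

/* SImode path: addw yields a sign-extended 32-bit sum, and x is assumed
   to be sign-extended as well.  */
static uint32_t model_sat_addu_si (uint32_t x, uint32_t y)
{
  uint64_t sum = sext32 (x + y);                 /* Step-1: addw */
  uint64_t lt = -(uint64_t) (sum < sext32 (x));  /* Steps 2-3: sltu, neg */
  return (uint32_t) (sum | lt);                  /* Steps 4-5: or, low part */
}
```

The unsigned compare on sign-extended values still gives the right answer, because sign extension from 32 to 64 bits preserves the unsigned ordering of 32-bit values; that is presumably why the SImode path can use addw and skip a separate truncation.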

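The match.pd comment in the patch lists two equivalent forms. For unsigned X and Y, the wrapped sum is smaller than X exactly when it is smaller than Y, since both happen precisely on overflow, and the :c flags let either operand order of the plus and of the bit_ior match. A small exhaustive check over uint8_t, with hypothetical names, also covers the SAT_ADDU (X, 0) = X simplification.

```c
#include <stdbool.h>
#include <stdint.h>

/* (X + Y) | -((X + Y) < X)  */
static uint8_t sat_cmp_x (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);
  return sum | -(uint8_t) (sum < x);
}

/* -((X + Y) < Y) | (X + Y): compares against Y and swaps the IOR operands.  */
static uint8_t sat_cmp_y (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);
  return -(uint8_t) (sum < y) | sum;
}

/* Checks both forms against the clamped sum for every 8-bit pair, and
   that adding zero is the identity.  */
static bool sat_forms_agree (void)
{
  for (unsigned x = 0; x <= UINT8_MAX; x++)
    {
      if (sat_cmp_x ((uint8_t) x, 0) != x)
        return false;
      for (unsigned y = 0; y <= UINT8_MAX; y++)
        {
          unsigned ref = x + y > UINT8_MAX ? UINT8_MAX : x + y;
          if (sat_cmp_x ((uint8_t) x, (uint8_t) y) != ref
              || sat_cmp_y ((uint8_t) x, (uint8_t) y) != ref)
            return false;
        }
    }
  return true;
}
```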
Li, Pan2 Feb. 24, 2024, 11:18 a.m. UTC | #8

Hi Tamar and Richard.

I gave DEF_INTERNAL_INT_EXT_FN a try in the draft patch below. I'm not very sure
my understanding is correct here (I mostly followed the popcount implementation).
Thanks a lot.

https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646442.html

Pan

-----Original Message-----
From: Tamar Christina <Tamar.Christina@arm.com> 
Sent: Monday, February 19, 2024 9:05 PM
To: Li, Pan2 <pan2.li@intel.com>; Richard Biener <richard.guenther@gmail.com>
Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com
Subject: RE: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU

> -----Original Message-----
> From: Li, Pan2 <pan2.li@intel.com>
> Sent: Monday, February 19, 2024 12:59 PM
> To: Tamar Christina <Tamar.Christina@arm.com>; Richard Biener
> <richard.guenther@gmail.com>
> Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang
> <yanzhang.wang@intel.com>; kito.cheng@gmail.com
> Subject: RE: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU
> 
> Thanks Tamar for comments and explanations.
> 
> > I think we should actually do an indirect optab here, because the IFN can be used
> > to replace the general representation of saturating arithmetic.
> 
> > e.g. the __builtin_add_overflow case in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600
> > is inefficient on all targets and so the IFN can always expand to something that's
> > more efficient like the branchless version add_sat2.
> 
> > I think this is why you suggested a new tree code below, but we don't really need
> > tree-codes for this. It can be done cleaner using the same way as DEF_INTERNAL_INT_EXT_FN
> 
> Yes, the backend could choose a branchless code-gen (of course we always want to avoid
> branches for performance), or, even better, a single saturation insn if there is one.
> Good to learn about DEF_INTERNAL_INT_EXT_FN; I will give it a try.
> 
> > Richard means that there shouldn't be .SAT_ADDU and .SAT_ADDS and that the sign
> > should be determined by the types at expansion time.  i.e. there should only be
> > .SAT_ADD.
> 
> Got it. My initial idea came from the fact that we may have two insns for saturation add,
> since such insns usually come in signed and unsigned flavors, for example slt/sltu in
> the riscv scalar ISA. But I am not very clear about the following scenario.
> During define_expand in the backend, we hit the standard name sat_add_<m>3, but can
> we tell whether it is signed or not there? AFAIK, we only have QI, HI, SI and DI.

Yeah, the way DEF_INTERNAL_SIGNED_OPTAB_FN works is that you give it two optabs,
one for when it's signed and one for when it's unsigned, and the right one is picked
automatically during expansion.  But in GIMPLE you'd only have one IFN.
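As a loose C analogy for one IFN whose signedness comes from the operand type, C11 _Generic can route a single name to a signed or an unsigned implementation. Everything here is hypothetical (SAT_ADD, sat_add_u32, sat_add_s32); in GCC the selection would instead happen at expansion time between two optabs, which could be, for instance, the existing ssadd/usadd optabs Richard mentioned.

```c
#include <stdint.h>

/* Unsigned variant: the branchless expression from this patch.  */
static uint32_t sat_add_u32 (uint32_t x, uint32_t y)
{
  uint32_t sum = x + y;
  return sum | -(uint32_t) (sum < x);
}

/* Signed variant: compute exactly in 64 bits, then clamp both ways.  */
static int32_t sat_add_s32 (int32_t x, int32_t y)
{
  int64_t sum = (int64_t) x + y;
  if (sum > INT32_MAX)
    return INT32_MAX;
  if (sum < INT32_MIN)
    return INT32_MIN;
  return (int32_t) sum;
}

/* One user-visible name; the type of the first operand picks the
   variant, loosely mirroring a single .SAT_ADD that is expanded via a
   signed or an unsigned optab.  */
#define SAT_ADD(x, y) \
  _Generic ((x), uint32_t: sat_add_u32, int32_t: sat_add_s32) ((x), (y))
```

For example, SAT_ADD ((int32_t) INT32_MAX, (int32_t) 1) clamps to INT32_MAX, while SAT_ADD ((uint32_t) UINT32_MAX, (uint32_t) 1) clamps to UINT32_MAX.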

> Maybe I will have the answer after trying DEF_INTERNAL_SIGNED_OPTAB_FN; I will
> keep you posted.

Awesome, Thanks!

Tamar
> 
> Pan
> 
> -----Original Message-----
> From: Tamar Christina <Tamar.Christina@arm.com>
> Sent: Monday, February 19, 2024 4:55 PM
> To: Li, Pan2 <pan2.li@intel.com>; Richard Biener <richard.guenther@gmail.com>
> Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang
> <yanzhang.wang@intel.com>; kito.cheng@gmail.com
> Subject: RE: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU
> 
> Thanks for doing this!
> 
> > -----Original Message-----
> > From: Li, Pan2 <pan2.li@intel.com>
> > Sent: Monday, February 19, 2024 8:42 AM
> > To: Richard Biener <richard.guenther@gmail.com>
> > Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang
> > <yanzhang.wang@intel.com>; kito.cheng@gmail.com; Tamar Christina
> > <Tamar.Christina@arm.com>
> > Subject: RE: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU
> >
> > Thanks Richard for comments.
> >
> > > I'll note that on RTL we already have SS_PLUS/US_PLUS and friends and
> > > the corresponding ssadd/usadd optabs.  There's not much documentation
> > > unfortunately besides the use of gen_*_fixed_libfunc usage where the comment
> > > suggests this is used for fixed-point operations.  It looks like arm uses
> > > fractional/accumulator modes for this but for example bfin has ssaddsi3.
> >
> > I found the related description of the plus family in the GCC internals doc, but it
> > doesn't mention anything about mode m here.
> >
> > (plus:m x y)
> > (ss_plus:m x y)
> > (us_plus:m x y)
> > These three expressions all represent the sum of the values represented by x
> > and y carried out in machine mode m. They differ in their behavior on overflow
> > of integer modes. plus wraps round modulo the width of m; ss_plus saturates
> > at the maximum signed value representable in m; us_plus saturates at the
> > maximum unsigned value.
> >
> > > The natural thing is to use direct optab internal functions (that's what you
> > > basically did, but you added a new optab, IMO without good reason).
> 
> I think we should actually do an indirect optab here, because the IFN can be used
> to replace the general representation of saturating arithmetic.
> 
> e.g. the __builtin_add_overflow case in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600
> is inefficient on all targets and so the IFN can always expand to something that's
> more efficient like the branchless version add_sat2.
> 
> I think this is why you suggested a new tree code below, but we don't really need
> tree-codes for this. It can be done cleaner using the same way as
> DEF_INTERNAL_INT_EXT_FN.
> 
> >
> > That makes sense to me, I will try to leverage US_PLUS instead here.
> >
> > > More GIMPLE-like would be to let the types involved decide whether
> > > it's signed or unsigned saturation.  That's actually what I'd prefer here
> > > and if we don't map 1:1 to optabs then instead use tree codes like
> > > S_PLUS_EXPR (mimicking RTL here).
> >
> > Sorry, I don't quite get the point of the GIMPLE-like way. For .SAT_ADDU, I added
> > a restriction like unsigned_p (type) in match.pd. It looks like there is a better way here.
> >
> 
> Richard means that there shouldn't be .SAT_ADDU and .SAT_ADDS and that the sign
> should be determined by the types at expansion time, i.e. there should only be
> .SAT_ADD.
> 
> i.e. instead of this
> 
> +DEF_INTERNAL_OPTAB_FN (SAT_ADDU, ECF_CONST | ECF_NOTHROW, sat_addu, binary)
> 
> You should use DEF_INTERNAL_SIGNED_OPTAB_FN.
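Why the type can carry the signedness is easy to see at the source level: the same bit patterns need different saturated results depending on whether they are read as signed or unsigned. The sketch below is only a C analogy, assuming C11 _Generic; SAT_ADD and the helpers are hypothetical and say nothing about how DEF_INTERNAL_SIGNED_OPTAB_FN is implemented inside GCC.

```c
#include <stdint.h>

static uint16_t sat_add_u16 (uint16_t x, uint16_t y)
{
  uint32_t sum = (uint32_t) x + y;     /* at most 0x1fffe, cannot wrap */
  return sum > UINT16_MAX ? UINT16_MAX : (uint16_t) sum;
}

static int16_t sat_add_s16 (int16_t x, int16_t y)
{
  int32_t sum = (int32_t) x + y;       /* in [-65536, 65534] */
  if (sum > INT16_MAX)
    return INT16_MAX;
  if (sum < INT16_MIN)
    return INT16_MIN;
  return (int16_t) sum;
}

/* One "SAT_ADD": the operand type, not the name, picks the saturation kind.  */
#define SAT_ADD(x, y) \
  _Generic ((x), uint16_t: sat_add_u16, int16_t: sat_add_s16) ((x), (y))
```

With these definitions, 0x7fff + 1 gives 0x8000 for uint16_t but saturates to 32767 for int16_t, while 0xffff + 1 saturates to 0xffff as uint16_t and the same bits as int16_t (-1 + 1) give 0.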
> 
> Regards,
> Tamar
> 
> > > Any other opinions?  Does anyone know more about fixed-point and RTL/modes?
> >
> > AFAIK, the scalar part of the riscv backend doesn't have fixed-point support, but the
> > vector part does. They share the same modes as the vector integers, for example
> > RVVM1SI in vector-iterators.md. Kito and Juzhe can correct me if I have
> > misunderstood anything.
> >
> > Pan
> >
> > -----Original Message-----
> > From: Richard Biener <richard.guenther@gmail.com>
> > Sent: Monday, February 19, 2024 3:36 PM
> > To: Li, Pan2 <pan2.li@intel.com>
> > Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang
> > <yanzhang.wang@intel.com>; kito.cheng@gmail.com; Tamar.Christina@arm.com
> > Subject: Re: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU
> >
> > On Sat, Feb 17, 2024 at 11:30 AM <pan2.li@intel.com> wrote:
> > >
> > > From: Pan Li <pan2.li@intel.com>
> > >
> > > This patch adds the middle-end representation for the
> > > unsigned saturating add, i.e. the result of the add is set to the max
> > > on overflow.  It matches a pattern similar to the one below.
> > >
> > > SAT_ADDU (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
> > >
> > > Take uint8_t as an example; we will have:
> > >
> > > * SAT_ADDU (1, 254)   => 255.
> > > * SAT_ADDU (1, 255)   => 255.
> > > * SAT_ADDU (2, 255)   => 255.
> > > * SAT_ADDU (255, 255) => 255.
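The formula can be checked exhaustively for uint8_t against a reference that adds in a wider type and clamps. The sketch below also covers the (X + Y) < Y variant from the match.pd comment in the patch. Note that x + y is evaluated in int after the usual promotions, so on overflow the OR yields -1 and it is the conversion back to uint8_t on return that produces 255. The function names are hypothetical.

```c
#include <stdint.h>

/* The formula from the patch description, instantiated for uint8_t.  */
static uint8_t sat_addu_formula (uint8_t x, uint8_t y)
{
  return (x + y) | (-(uint8_t) ((uint8_t) (x + y) < x));
}

/* Same, but detecting the wrap by comparing against y.  */
static uint8_t sat_addu_formula_y (uint8_t x, uint8_t y)
{
  return (x + y) | (-(uint8_t) ((uint8_t) (x + y) < y));
}

/* Reference: add in a wider type, then clamp.  */
static uint8_t sat_addu_ref (uint8_t x, uint8_t y)
{
  unsigned sum = (unsigned) x + y;
  return sum > UINT8_MAX ? UINT8_MAX : (uint8_t) sum;
}

/* Returns 1 if both forms agree with the reference for all 65536 pairs.  */
static int sat_addu_formula_ok (void)
{
  for (unsigned x = 0; x <= UINT8_MAX; x++)
    for (unsigned y = 0; y <= UINT8_MAX; y++)
      {
        uint8_t ref = sat_addu_ref ((uint8_t) x, (uint8_t) y);
        if (sat_addu_formula ((uint8_t) x, (uint8_t) y) != ref
            || sat_addu_formula_y ((uint8_t) x, (uint8_t) y) != ref)
          return 0;
      }
  return 1;
}
```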
> > >
> > > The patch also implements SAT_ADDU in the riscv backend as
> > > a sample.  Given the example below:
> > >
> > > uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> > > {
> > >   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> > > }
> > >
> > > Before this patch:
> > >
> > > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > > {
> > >   long unsigned int _1;
> > >   _Bool _2;
> > >   long unsigned int _3;
> > >   long unsigned int _4;
> > >   uint64_t _7;
> > >   long unsigned int _10;
> > >   __complex__ long unsigned int _11;
> > >
> > > ;;   basic block 2, loop depth 0
> > > ;;    pred:       ENTRY
> > >   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
> > >   _1 = REALPART_EXPR <_11>;
> > >   _10 = IMAGPART_EXPR <_11>;
> > >   _2 = _10 != 0;
> > >   _3 = (long unsigned int) _2;
> > >   _4 = -_3;
> > >   _7 = _1 | _4;
> > >   return _7;
> > > ;;    succ:       EXIT
> > >
> > > }
> > >
> > > After this patch:
> > >
> > > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > > {
> > >   uint64_t _7;
> > >
> > > ;;   basic block 2, loop depth 0
> > > ;;    pred:       ENTRY
> > >   _7 = .SAT_ADDU (x_5(D), y_6(D)); [tail call]
> > >   return _7;
> > > ;;    succ:       EXIT
> > >
> > > }
> > >
> > > After this patch we have the middle-end representation .SAT_ADDU
> > > for this pattern.
> >
> > I'll note that on RTL we already have SS_PLUS/US_PLUS and friends and
> > the corresponding ssadd/usadd optabs.  There's not much documentation
> > unfortunately besides the gen_*_fixed_libfunc usage where the comment
> > suggests this is used for fixed-point operations.  It looks like arm uses
> > fractional/accumulator modes for this but for example bfin has ssaddsi3.
> >
> > So the question is whether the fixed-point case can be distinguished from
> > the integer case based on mode.
> >
> > There's also FIXED_POINT_TYPE on the GENERIC/GIMPLE side and
> > no special tree operator codes for them.  So compared to what appears
> > to be the case on RTL we'd need a way to represent saturating integer
> > operations on GIMPLE.
> >
> > The natural thing is to use direct optab internal functions (that's what you
> > basically did, but you added a new optab, IMO without good reason).
> > More GIMPLE-like would be to let the types involved decide whether
> > it's signed or unsigned saturation.  That's actually what I'd prefer here
> > and if we don't map 1:1 to optabs then instead use tree codes like
> > S_PLUS_EXPR (mimicking RTL here).
> >
> > Any other opinions?  Does anyone know more about fixed-point and RTL/modes?
> >
> > Richard.
> >
> > >         PR target/51492
> > >         PR target/112600
> > >
> > > gcc/ChangeLog:
> > >
> > >         * config/riscv/riscv-protos.h (riscv_expand_saturation_addu):
> > >         New func decl for the SAT_ADDU expand.
> > >         * config/riscv/riscv.cc (riscv_expand_saturation_addu): New func
> > >         impl for the SAT_ADDU expand.
> > >         * config/riscv/riscv.md (sat_addu_<mode>3): New pattern to impl
> > >         the standard name SAT_ADDU.
> > >         * doc/md.texi: Add doc for SAT_ADDU.
> > >         * internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADDU.
> > >         * internal-fn.def (SAT_ADDU): Add SAT_ADDU.
> > >         * match.pd: Add simplify pattern patch for SAT_ADDU.
> > >         * optabs.def (OPTAB_D): Add sat_addu_optab.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >         * gcc.target/riscv/sat_addu-1.c: New test.
> > >         * gcc.target/riscv/sat_addu-2.c: New test.
> > >         * gcc.target/riscv/sat_addu-3.c: New test.
> > >         * gcc.target/riscv/sat_addu-4.c: New test.
> > >         * gcc.target/riscv/sat_addu-run-1.c: New test.
> > >         * gcc.target/riscv/sat_addu-run-2.c: New test.
> > >         * gcc.target/riscv/sat_addu-run-3.c: New test.
> > >         * gcc.target/riscv/sat_addu-run-4.c: New test.
> > >         * gcc.target/riscv/sat_arith.h: New test.
> > >
> > > Signed-off-by: Pan Li <pan2.li@intel.com>
> > > ---
> > >  gcc/config/riscv/riscv-protos.h               |  1 +
> > >  gcc/config/riscv/riscv.cc                     | 46 +++++++++++++++++
> > >  gcc/config/riscv/riscv.md                     | 11 +++++
> > >  gcc/doc/md.texi                               | 11 +++++
> > >  gcc/internal-fn.cc                            |  1 +
> > >  gcc/internal-fn.def                           |  1 +
> > >  gcc/match.pd                                  | 22 +++++++++
> > >  gcc/optabs.def                                |  2 +
> > >  gcc/testsuite/gcc.target/riscv/sat_addu-1.c   | 18 +++++++
> > >  gcc/testsuite/gcc.target/riscv/sat_addu-2.c   | 20 ++++++++
> > >  gcc/testsuite/gcc.target/riscv/sat_addu-3.c   | 17 +++++++
> > >  gcc/testsuite/gcc.target/riscv/sat_addu-4.c   | 16 ++++++
> > >  .../gcc.target/riscv/sat_addu-run-1.c         | 42 ++++++++++++++++
> > >  .../gcc.target/riscv/sat_addu-run-2.c         | 42 ++++++++++++++++
> > >  .../gcc.target/riscv/sat_addu-run-3.c         | 42 ++++++++++++++++
> > >  .../gcc.target/riscv/sat_addu-run-4.c         | 49 +++++++++++++++++++
> > >  gcc/testsuite/gcc.target/riscv/sat_arith.h    | 15 ++++++
> > >  17 files changed, 356 insertions(+)
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-1.c
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-2.c
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-3.c
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-4.c
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-run-1.c
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-run-2.c
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-run-3.c
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-run-4.c
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/sat_arith.h
> > >
> > > diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> > > index ae1685850ac..f201b2384f9 100644
> > > --- a/gcc/config/riscv/riscv-protos.h
> > > +++ b/gcc/config/riscv/riscv-protos.h
> > > @@ -132,6 +132,7 @@ extern void riscv_asm_output_external (FILE *, const tree, const char *);
> > >  extern bool
> > >  riscv_zcmp_valid_stack_adj_bytes_p (HOST_WIDE_INT, int);
> > >  extern void riscv_legitimize_poly_move (machine_mode, rtx, rtx, rtx);
> > > +extern void riscv_expand_saturation_addu (rtx, rtx, rtx);
> > >
> > >  #ifdef RTX_CODE
> > >  extern void riscv_expand_int_scc (rtx, enum rtx_code, rtx, rtx, bool *invert_ptr = 0);
> > > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> > > index 799d7919a4a..84e86eb5d49 100644
> > > --- a/gcc/config/riscv/riscv.cc
> > > +++ b/gcc/config/riscv/riscv.cc
> > > @@ -10657,6 +10657,52 @@ riscv_vector_mode_supported_any_target_p (machine_mode)
> > >    return true;
> > >  }
> > >
> > > +/* Emit insn for the saturation addu, aka (x + y) | - ((x + y) < x).  */
> > > +void
> > > +riscv_expand_saturation_addu (rtx dest, rtx x, rtx y)
> > > +{
> > > +  machine_mode mode = GET_MODE (dest);
> > > +  rtx pmode_sum = gen_reg_rtx (Pmode);
> > > +  rtx pmode_lt = gen_reg_rtx (Pmode);
> > > +  rtx pmode_x = gen_lowpart (Pmode, x);
> > > +  rtx pmode_y = gen_lowpart (Pmode, y);
> > > +  rtx pmode_dest = gen_reg_rtx (Pmode);
> > > +
> > > +  /* Step-1: sum = x + y  */
> > > +  if (mode == SImode && mode != Pmode)
> > > +    { /* Take addw to avoid the sum truncate.  */
> > > +      rtx simode_sum = gen_reg_rtx (SImode);
> > > +      riscv_emit_binary (PLUS, simode_sum, x, y);
> > > +      emit_move_insn (pmode_sum, gen_lowpart (Pmode, simode_sum));
> > > +    }
> > > +  else
> > > +    riscv_emit_binary (PLUS, pmode_sum, pmode_x, pmode_y);
> > > +
> > > +  /* Step-1.1: truncate sum for HI and QI as we have no insn for add QI/HI.  */
> > > +  if (mode == HImode || mode == QImode)
> > > +    {
> > > +      int shift_bits = GET_MODE_BITSIZE (Pmode)
> > > +       - GET_MODE_BITSIZE (mode).to_constant ();
> > > +
> > > +      gcc_assert (shift_bits > 0);
> > > +
> > > +      riscv_emit_binary (ASHIFT, pmode_sum, pmode_sum, GEN_INT (shift_bits));
> > > +      riscv_emit_binary (LSHIFTRT, pmode_sum, pmode_sum, GEN_INT (shift_bits));
> > > +    }
> > > +
> > > +  /* Step-2: lt = sum < x  */
> > > +  riscv_emit_binary (LTU, pmode_lt, pmode_sum, pmode_x);
> > > +
> > > +  /* Step-3: lt = -lt  */
> > > +  riscv_emit_unary (NEG, pmode_lt, pmode_lt);
> > > +
> > > +  /* Step-4: pmode_dest = sum | lt  */
> > > +  riscv_emit_binary (IOR, pmode_dest, pmode_lt, pmode_sum);
> > > +
> > > +  /* Step-5: dest = pmode_dest */
> > > +  emit_move_insn (dest, gen_lowpart (mode, pmode_dest));
> > > +}
> > > +
> > >  /* Initialize the GCC target structure.  */
> > >  #undef TARGET_ASM_ALIGNED_HI_OP
> > >  #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
> > > diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> > > index 39b29795cd6..03cbe5a2ca9 100644
> > > --- a/gcc/config/riscv/riscv.md
> > > +++ b/gcc/config/riscv/riscv.md
> > > @@ -3841,6 +3841,17 @@ (define_insn "*large_load_address"
> > >    [(set_attr "type" "load")
> > >     (set (attr "length") (const_int 8))])
> > >
> > > +(define_expand "sat_addu_<mode>3"
> > > +  [(match_operand:ANYI   0 "register_operand")
> > > +   (match_operand:ANYI   1 "register_operand")
> > > +   (match_operand:ANYI   2 "register_operand")]
> > > +  ""
> > > +  {
> > > +    riscv_expand_saturation_addu (operands[0], operands[1], operands[2]);
> > > +    DONE;
> > > +  }
> > > +)
> > > +
> > >  (include "bitmanip.md")
> > >  (include "crypto.md")
> > >  (include "sync.md")
> > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > > index b0c61925120..5867afdb1a0 100644
> > > --- a/gcc/doc/md.texi
> > > +++ b/gcc/doc/md.texi
> > > @@ -6653,6 +6653,17 @@ The operation is only supported for vector modes @var{m}.
> > >
> > >  This pattern is not allowed to @code{FAIL}.
> > >
> > > +@cindex @code{sat_addu_@var{m}3} instruction pattern
> > > +@item @samp{sat_addu_@var{m}3}
> > > +Perform the saturation unsigned add for the operand 1 and operand 2 and
> > > +store the result into the operand 0.  All operands have mode @var{m},
> > > +which is a scalar integer mode.
> > > +
> > > +@smallexample
> > > +  typedef unsigned char uint8_t;
> > > +  uint8_t sat_addu (uint8_t x, uint8_t y) => return (x + y) | -((x + y) < x);
> > > +@end smallexample
> > > +
> > >  @cindex @code{cmla@var{m}4} instruction pattern
> > >  @item @samp{cmla@var{m}4}
> > >  Perform a vector multiply and accumulate that is semantically the same as
> > > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> > > index a07f25f3aee..dee73dbc614 100644
> > > --- a/gcc/internal-fn.cc
> > > +++ b/gcc/internal-fn.cc
> > > @@ -4159,6 +4159,7 @@ commutative_binary_fn_p (internal_fn fn)
> > >      case IFN_VEC_WIDEN_PLUS_HI:
> > >      case IFN_VEC_WIDEN_PLUS_EVEN:
> > >      case IFN_VEC_WIDEN_PLUS_ODD:
> > > +    case IFN_SAT_ADDU:
> > >        return true;
> > >
> > >      default:
> > > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > > index c14d30365c1..a04592fc779 100644
> > > --- a/gcc/internal-fn.def
> > > +++ b/gcc/internal-fn.def
> > > @@ -428,6 +428,7 @@ DEF_INTERNAL_WIDENING_OPTAB_FN (VEC_WIDEN_ABD,
> > >                                 binary)
> > >  DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary)
> > >  DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary)
> > > +DEF_INTERNAL_OPTAB_FN (SAT_ADDU, ECF_CONST | ECF_NOTHROW, sat_addu, binary)
> > >
> > >  /* FP scales.  */
> > >  DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
> > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > index 711c3a10c3f..9de1106adcf 100644
> > > --- a/gcc/match.pd
> > > +++ b/gcc/match.pd
> > > @@ -1994,6 +1994,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > >   )
> > >  )
> > >
> > > +#if GIMPLE
> > > +
> > > +/* Saturation add unsigned, aka:
> > > +   SAT_ADDU = (X + Y) | - ((X + Y) < X) or
> > > +   SAT_ADDU = (X + Y) | - ((X + Y) < Y).  */
> > > +(simplify
> > > + (bit_ior:c (plus:c@2 @0 @1) (negate (convert (lt @2 @0))))
> > > +   (if (optimize
> > > +       && INTEGRAL_TYPE_P (type)
> > > +       && TYPE_UNSIGNED (TREE_TYPE (@0))
> > > +       && types_match (type, TREE_TYPE (@0))
> > > +       && types_match (type, TREE_TYPE (@1))
> > > +       && direct_internal_fn_supported_p (IFN_SAT_ADDU, type, OPTIMIZE_FOR_BOTH))
> > > +   (IFN_SAT_ADDU @0 @1)))
> > > +
> > > +/* SAT_ADDU (X, 0) = X  */
> > > +(simplify
> > > + (IFN_SAT_ADDU:c @0 integer_zerop)
> > > + @0)
> > > +
> > > +#endif
> > > +
> > >  /* A few cases of fold-const.cc negate_expr_p predicate.  */
> > >  (match negate_expr_p
> > >   INTEGER_CST
> > > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > > index ad14f9328b9..a2c11b7707b 100644
> > > --- a/gcc/optabs.def
> > > +++ b/gcc/optabs.def
> > > @@ -300,6 +300,8 @@ OPTAB_D (usubc5_optab, "usubc$I$a5")
> > >  OPTAB_D (addptr3_optab, "addptr$a3")
> > >  OPTAB_D (spaceship_optab, "spaceship$a3")
> > >
> > > +OPTAB_D (sat_addu_optab, "sat_addu_$a3")
> > > +
> > >  OPTAB_D (smul_highpart_optab, "smul$a3_highpart")
> > >  OPTAB_D (umul_highpart_optab, "umul$a3_highpart")
> > >
> > > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-1.c b/gcc/testsuite/gcc.target/riscv/sat_addu-1.c
> > > new file mode 100644
> > > index 00000000000..229abef0faa
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-1.c
> > > @@ -0,0 +1,18 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */
> > > +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
> > > +/* { dg-final { check-function-bodies "**" "" } } */
> > > +
> > > +#include "sat_arith.h"
> > > +
> > > +/*
> > > +** sat_addu_uint8_t:
> > > +** add\s+[atx][0-9]+,\s*a0,\s*a1
> > > +** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*0xff
> > > +** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > > +** neg\s+[atx][0-9]+,\s*[atx][0-9]+
> > > +** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > > +** andi\s+a0,\s*a0,\s*0xff
> > > +** ret
> > > +*/
> > > +DEF_SAT_ADDU(uint8_t)
> > > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-2.c b/gcc/testsuite/gcc.target/riscv/sat_addu-2.c
> > > new file mode 100644
> > > index 00000000000..4023b030811
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-2.c
> > > @@ -0,0 +1,20 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */
> > > +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
> > > +/* { dg-final { check-function-bodies "**" "" } } */
> > > +
> > > +#include "sat_arith.h"
> > > +
> > > +/*
> > > +** sat_addu_uint16_t:
> > > +** add\s+[atx][0-9]+,\s*a0,\s*a1
> > > +** slli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
> > > +** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
> > > +** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > > +** neg\s+[atx][0-9]+,\s*[atx][0-9]+
> > > +** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > > +** slli\s+a0,\s*a0,\s*48
> > > +** srli\s+a0,\s*a0,\s*48
> > > +** ret
> > > +*/
> > > +DEF_SAT_ADDU(uint16_t)
> > > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-3.c b/gcc/testsuite/gcc.target/riscv/sat_addu-3.c
> > > new file mode 100644
> > > index 00000000000..4d0af97fb67
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-3.c
> > > @@ -0,0 +1,17 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */
> > > +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
> > > +/* { dg-final { check-function-bodies "**" "" } } */
> > > +
> > > +#include "sat_arith.h"
> > > +
> > > +/*
> > > +** sat_addu_uint32_t:
> > > +** addw\s+[atx][0-9]+,\s*a0,\s*a1
> > > +** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > > +** neg\s+[atx][0-9]+,\s*[atx][0-9]+
> > > +** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > > +** sext.w\s+a0,\s*a0
> > > +** ret
> > > +*/
> > > +DEF_SAT_ADDU(uint32_t)
> > > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-4.c b/gcc/testsuite/gcc.target/riscv/sat_addu-4.c
> > > new file mode 100644
> > > index 00000000000..926f31266e3
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-4.c
> > > @@ -0,0 +1,16 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */
> > > +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
> > > +/* { dg-final { check-function-bodies "**" "" } } */
> > > +
> > > +#include "sat_arith.h"
> > > +
> > > +/*
> > > +** sat_addu_uint64_t:
> > > +** add\s+[atx][0-9]+,\s*a0,\s*a1
> > > +** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> > > +** neg\s+[atx][0-9]+,\s*[atx][0-9]+
> > > +** or\s+a0,\s*[atx][0-9]+,\s*[atx][0-9]+
> > > +** ret
> > > +*/
> > > +DEF_SAT_ADDU(uint64_t)
> > > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-run-1.c b/gcc/testsuite/gcc.target/riscv/sat_addu-run-1.c
> > > new file mode 100644
> > > index 00000000000..b19515c39d1
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-run-1.c
> > > @@ -0,0 +1,42 @@
> > > +/* { dg-do run { target { riscv_v } } } */
> > > +/* { dg-additional-options "-std=c99" } */
> > > +
> > > +#include "sat_arith.h"
> > > +
> > > +DEF_SAT_ADDU(uint8_t)
> > > +
> > > +int
> > > +main ()
> > > +{
> > > +  if (RUN_SAT_ADDU (uint8_t, 0, 0) != 0)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint8_t, 0, 1) != 1)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint8_t, 1, 1) != 2)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint8_t, 0, 254) != 254)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint8_t, 1, 254) != 255)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint8_t, 2, 254) != 255)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint8_t, 0, 255) != 255)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint8_t, 1, 255) != 255)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint8_t, 2, 255) != 255)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint8_t, 255, 255) != 255)
> > > +    __builtin_abort ();
> > > +
> > > +  return 0;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-run-2.c b/gcc/testsuite/gcc.target/riscv/sat_addu-run-2.c
> > > new file mode 100644
> > > index 00000000000..90073fbe4ba
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-run-2.c
> > > @@ -0,0 +1,42 @@
> > > +/* { dg-do run { target { riscv_v } } } */
> > > +/* { dg-additional-options "-std=c99" } */
> > > +
> > > +#include "sat_arith.h"
> > > +
> > > +DEF_SAT_ADDU(uint16_t)
> > > +
> > > +int
> > > +main ()
> > > +{
> > > +  if (RUN_SAT_ADDU (uint16_t, 0, 0) != 0)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint16_t, 0, 1) != 1)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint16_t, 1, 1) != 2)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint16_t, 0, 65534) != 65534)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint16_t, 1, 65534) != 65535)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint16_t, 2, 65534) != 65535)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint16_t, 0, 65535) != 65535)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint16_t, 1, 65535) != 65535)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint16_t, 2, 65535) != 65535)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint16_t, 65535, 65535) != 65535)
> > > +    __builtin_abort ();
> > > +
> > > +  return 0;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-run-3.c b/gcc/testsuite/gcc.target/riscv/sat_addu-run-3.c
> > > new file mode 100644
> > > index 00000000000..996dd3de737
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-run-3.c
> > > @@ -0,0 +1,42 @@
> > > +/* { dg-do run { target { riscv_v } } } */
> > > +/* { dg-additional-options "-std=c99" } */
> > > +
> > > +#include "sat_arith.h"
> > > +
> > > +DEF_SAT_ADDU(uint32_t)
> > > +
> > > +int
> > > +main ()
> > > +{
> > > +  if (RUN_SAT_ADDU (uint32_t, 0, 0) != 0)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint32_t, 0, 1) != 1)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint32_t, 1, 1) != 2)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint32_t, 0, 4294967294) != 4294967294)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint32_t, 1, 4294967294) != 4294967295)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint32_t, 2, 4294967294) != 4294967295)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint32_t, 0, 4294967295) != 4294967295)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint32_t, 1, 4294967295) != 4294967295)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint32_t, 2, 4294967295) != 4294967295)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint32_t, 4294967295, 4294967295) != 4294967295)
> > > +    __builtin_abort ();
> > > +
> > > +  return 0;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-run-4.c b/gcc/testsuite/gcc.target/riscv/sat_addu-run-4.c
> > > new file mode 100644
> > > index 00000000000..51a5421577b
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-run-4.c
> > > @@ -0,0 +1,49 @@
> > > +/* { dg-do run { target { riscv_v } } } */
> > > +/* { dg-additional-options "-std=c99" } */
> > > +
> > > +#include "sat_arith.h"
> > > +
> > > +DEF_SAT_ADDU(uint64_t)
> > > +
> > > +int
> > > +main ()
> > > +{
> > > +  if (RUN_SAT_ADDU (uint64_t, 0, 0) != 0)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint64_t, 0, 1) != 1)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint64_t, 1, 1) != 2)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint64_t, 0, 18446744073709551614u)
> > > +    != 18446744073709551614u)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint64_t, 1, 18446744073709551614u)
> > > +    != 18446744073709551615u)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint64_t, 2, 18446744073709551614u)
> > > +    != 18446744073709551615u)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint64_t, 0, 18446744073709551615u)
> > > +    != 18446744073709551615u)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint64_t, 1, 18446744073709551615u)
> > > +    != 18446744073709551615u)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint64_t, 2, 18446744073709551615u)
> > > +    != 18446744073709551615u)
> > > +    __builtin_abort ();
> > > +
> > > +  if (RUN_SAT_ADDU (uint64_t, 18446744073709551615u, 18446744073709551615u)
> > > +    != 18446744073709551615u)
> > > +    __builtin_abort ();
> > > +
> > > +  return 0;
> > > +}
> > > diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h b/gcc/testsuite/gcc.target/riscv/sat_arith.h
> > > new file mode 100644
> > > index 00000000000..4c00157685e
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
> > > @@ -0,0 +1,15 @@
> > > +#ifndef HAVE_SAT_ARITH
> > > +#define HAVE_SAT_ARITH
> > > +
> > > +#include <stdint.h>
> > > +
> > > +#define DEF_SAT_ADDU(TYPE)                       \
> > > +TYPE __attribute__((noinline))                   \
> > > +sat_addu_##TYPE (TYPE x, TYPE y)                 \
> > > +{                                                \
> > > +  return (x + y) | (-(TYPE)((TYPE)(x + y) < x)); \
> > > +}
> > > +
> > > +#define RUN_SAT_ADDU(TYPE, x, y) sat_addu_##TYPE(x, y)
> > > +
> > > +#endif
> > > --
> > > 2.34.1
> > >
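For reference, the RV64 sequence that riscv_expand_saturation_addu emits for HImode (add, slli/srli by 48, sltu, neg, or, then the lowpart move, matching what sat_addu-2.c expects) can be modelled in C, with uint64_t standing in for a 64-bit X register. This is a hypothetical model, not code from the patch, and it assumes the 16-bit operands arrive zero-extended in their registers, which the sltu comparison against the full-width x depends on.

```c
#include <stdint.h>

/* Hypothetical C model of the RV64 SAT_ADDU expansion for a 16-bit mode.  */
static uint16_t model_sat_addu_hi (uint16_t x, uint16_t y)
{
  const int shift = 64 - 16;       /* Pmode bits minus HImode bits = 48 */
  uint64_t px = x, py = y;         /* assumed zero-extended operands */

  uint64_t sum = px + py;          /* Step-1:   add */
  sum = (sum << shift) >> shift;   /* Step-1.1: slli/srli truncate to 16 bits */
  uint64_t lt = sum < px;          /* Step-2:   sltu, 0 or 1 */
  lt = -lt;                        /* Step-3:   neg, 0 or all ones */
  uint64_t dest = sum | lt;        /* Step-4:   or */
  return (uint16_t) dest;          /* Step-5:   lowpart back to HImode */
}
```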

Georg-Johann Lay Feb. 27, 2024, 10:53 a.m. UTC

On 19.02.24 at 08:36, Richard Biener wrote:
> On Sat, Feb 17, 2024 at 11:30 AM <pan2.li@intel.com> wrote:
>>
>> From: Pan Li <pan2.li@intel.com>
>>
>> This patch adds the middle-end representation for the
>> unsigned saturating add, i.e. the result of the add is set to the max
>> on overflow.  It matches a pattern similar to the one below.
>>
>> SAT_ADDU (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))

Does this even try to work out the costs?

For example, take the following code:


#define T __UINT16_TYPE__

T sat_add1 (T x, T y)
{
   return (x + y) | (- (T)((T)(x + y) < x));
}

T sat_add2 (T x, T y)
{
     T z = x + y;
     if (z < x)
         z = (T) -1;
     return z;
}

Then "avr-gcc -S -Os -dp" produces the following code:


sat_add1:
	add r22,r24	 ;  7	[c=8 l=2]  *addhi3/0
	adc r23,r25
	ldi r18,lo8(1)	 ;  8	[c=4 l=2]  *movhi/4
	ldi r19,0	
	cp r22,r24	 ;  9	[c=8 l=2]  cmphi3/2
	cpc r23,r25
	brlo .L2		 ;  10	[c=16 l=1]  branch
	ldi r19,0		 ;  31	[c=4 l=1]  movqi_insn/0
	ldi r18,0		 ;  32	[c=4 l=1]  movqi_insn/0
.L2:
	clr r24	 ;  13	[c=12 l=4]  neghi2/1
	clr r25
	sub r24,r18
	sbc r25,r19
	or r24,r22		 ;  29	[c=4 l=1]  iorqi3/0
	or r25,r23		 ;  30	[c=4 l=1]  iorqi3/0
	ret		 ;  35	[c=0 l=1]  return

sat_add2:
	add r22,r24	 ;  8	[c=8 l=2]  *addhi3/0
	adc r23,r25
	cp r22,r24	 ;  9	[c=8 l=2]  cmphi3/2
	cpc r23,r25
	brsh .L3		 ;  10	[c=16 l=1]  branch
	ldi r22,lo8(-1)	 ;  5	[c=4 l=2]  *movhi/4
	ldi r23,lo8(-1)
.L3:
	mov r25,r23	 ;  21	[c=4 l=1]  movqi_insn/0
	mov r24,r22	 ;  22	[c=4 l=1]  movqi_insn/0
	ret		 ;  25	[c=0 l=1]  return

i.e. the conditional jump is better than overly smart arithmetic
(smaller and faster code with less register pressure).
With larger types the difference is even more pronounced.

Johann


>> Take uint8_t as an example; we will have:
>>
>> * SAT_ADDU (1, 254)   => 255.
>> * SAT_ADDU (1, 255)   => 255.
>> * SAT_ADDU (2, 255)   => 255.
>> * SAT_ADDU (255, 255) => 255.
>>
>> The patch also implements SAT_ADDU in the riscv backend as
>> a sample.  Given the example below:
>>
>> uint64_t sat_add_u64 (uint64_t x, uint64_t y)
>> {
>>    return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
>> }
>>
>> Before this patch:
>>
>> uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
>> {
>>    long unsigned int _1;
>>    _Bool _2;
>>    long unsigned int _3;
>>    long unsigned int _4;
>>    uint64_t _7;
>>    long unsigned int _10;
>>    __complex__ long unsigned int _11;
>>
>> ;;   basic block 2, loop depth 0
>> ;;    pred:       ENTRY
>>    _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
>>    _1 = REALPART_EXPR <_11>;
>>    _10 = IMAGPART_EXPR <_11>;
>>    _2 = _10 != 0;
>>    _3 = (long unsigned int) _2;
>>    _4 = -_3;
>>    _7 = _1 | _4;
>>    return _7;
>> ;;    succ:       EXIT
>>
>> }
>>
>> After this patch:
>>
>> uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
>> {
>>    uint64_t _7;
>>
>> ;;   basic block 2, loop depth 0
>> ;;    pred:       ENTRY
>>    _7 = .SAT_ADDU (x_5(D), y_6(D)); [tail call]
>>    return _7;
>> ;;    succ:       EXIT
>>
>> }
>>
>> After this patch we have the middle-end representation .SAT_ADDU
>> for this pattern.
> 
> I'll note that on RTL we already have SS_PLUS/US_PLUS and friends and
> the corresponding ssadd/usadd optabs.  There's not much documentation
> unfortunately besides the gen_*_fixed_libfunc usage where the comment
> suggests this is used for fixed-point operations.  It looks like arm uses
> fractional/accumulator modes for this but for example bfin has ssaddsi3.
> 
> So the question is whether the fixed-point case can be distinguished from
> the integer case based on mode.
> 
> There's also FIXED_POINT_TYPE on the GENERIC/GIMPLE side and
> no special tree operator codes for them.  So compared to what appears
> to be the case on RTL we'd need a way to represent saturating integer
> operations on GIMPLE.
> 
> The natural thing is to use direct optab internal functions (that's what you
> basically did, but you added a new optab, IMO without good reason).
> More GIMPLE-like would be to let the types involved decide whether
> it's signed or unsigned saturation.  That's actually what I'd prefer here
> and if we don't map 1:1 to optabs then instead use tree codes like
> S_PLUS_EXPR (mimicking RTL here).
> 
> Any other opinions?  Does anyone know more about fixed-point and RTL/modes?
> 
> Richard.
> 
>>          PR target/51492
>>          PR target/112600
>>
>> gcc/ChangeLog:
>>
>>          * config/riscv/riscv-protos.h (riscv_expand_saturation_addu):
>>          New func decl for the SAT_ADDU expand.
>>          * config/riscv/riscv.cc (riscv_expand_saturation_addu): New func
>>          impl for the SAT_ADDU expand.
>>          * config/riscv/riscv.md (sat_addu_<mode>3): New pattern to impl
>>          the standard name SAT_ADDU.
>>          * doc/md.texi: Add doc for SAT_ADDU.
>>          * internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADDU.
>>          * internal-fn.def (SAT_ADDU): Add SAT_ADDU.
>>          * match.pd: Add simplify pattern patch for SAT_ADDU.
>>          * optabs.def (OPTAB_D): Add sat_addu_optab.
>>
>> gcc/testsuite/ChangeLog:
>>
>>          * gcc.target/riscv/sat_addu-1.c: New test.
>>          * gcc.target/riscv/sat_addu-2.c: New test.
>>          * gcc.target/riscv/sat_addu-3.c: New test.
>>          * gcc.target/riscv/sat_addu-4.c: New test.
>>          * gcc.target/riscv/sat_addu-run-1.c: New test.
>>          * gcc.target/riscv/sat_addu-run-2.c: New test.
>>          * gcc.target/riscv/sat_addu-run-3.c: New test.
>>          * gcc.target/riscv/sat_addu-run-4.c: New test.
>>          * gcc.target/riscv/sat_arith.h: New test.
>>
>> Signed-off-by: Pan Li <pan2.li@intel.com>
>> ---
>>   gcc/config/riscv/riscv-protos.h               |  1 +
>>   gcc/config/riscv/riscv.cc                     | 46 +++++++++++++++++
>>   gcc/config/riscv/riscv.md                     | 11 +++++
>>   gcc/doc/md.texi                               | 11 +++++
>>   gcc/internal-fn.cc                            |  1 +
>>   gcc/internal-fn.def                           |  1 +
>>   gcc/match.pd                                  | 22 +++++++++
>>   gcc/optabs.def                                |  2 +
>>   gcc/testsuite/gcc.target/riscv/sat_addu-1.c   | 18 +++++++
>>   gcc/testsuite/gcc.target/riscv/sat_addu-2.c   | 20 ++++++++
>>   gcc/testsuite/gcc.target/riscv/sat_addu-3.c   | 17 +++++++
>>   gcc/testsuite/gcc.target/riscv/sat_addu-4.c   | 16 ++++++
>>   .../gcc.target/riscv/sat_addu-run-1.c         | 42 ++++++++++++++++
>>   .../gcc.target/riscv/sat_addu-run-2.c         | 42 ++++++++++++++++
>>   .../gcc.target/riscv/sat_addu-run-3.c         | 42 ++++++++++++++++
>>   .../gcc.target/riscv/sat_addu-run-4.c         | 49 +++++++++++++++++++
>>   gcc/testsuite/gcc.target/riscv/sat_arith.h    | 15 ++++++
>>   17 files changed, 356 insertions(+)
>>   create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-1.c
>>   create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-2.c
>>   create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-3.c
>>   create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-4.c
>>   create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-run-1.c
>>   create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-run-2.c
>>   create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-run-3.c
>>   create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-run-4.c
>>   create mode 100644 gcc/testsuite/gcc.target/riscv/sat_arith.h
>>
>> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
>> index ae1685850ac..f201b2384f9 100644
>> --- a/gcc/config/riscv/riscv-protos.h
>> +++ b/gcc/config/riscv/riscv-protos.h
>> @@ -132,6 +132,7 @@ extern void riscv_asm_output_external (FILE *, const tree, const char *);
>>   extern bool
>>   riscv_zcmp_valid_stack_adj_bytes_p (HOST_WIDE_INT, int);
>>   extern void riscv_legitimize_poly_move (machine_mode, rtx, rtx, rtx);
>> +extern void riscv_expand_saturation_addu (rtx, rtx, rtx);
>>
>>   #ifdef RTX_CODE
>>   extern void riscv_expand_int_scc (rtx, enum rtx_code, rtx, rtx, bool *invert_ptr = 0);
>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>> index 799d7919a4a..84e86eb5d49 100644
>> --- a/gcc/config/riscv/riscv.cc
>> +++ b/gcc/config/riscv/riscv.cc
>> @@ -10657,6 +10657,52 @@ riscv_vector_mode_supported_any_target_p (machine_mode)
>>     return true;
>>   }
>>
>> +/* Emit insn for the saturation addu, aka (x + y) | - ((x + y) < x).  */
>> +void
>> +riscv_expand_saturation_addu (rtx dest, rtx x, rtx y)
>> +{
>> +  machine_mode mode = GET_MODE (dest);
>> +  rtx pmode_sum = gen_reg_rtx (Pmode);
>> +  rtx pmode_lt = gen_reg_rtx (Pmode);
>> +  rtx pmode_x = gen_lowpart (Pmode, x);
>> +  rtx pmode_y = gen_lowpart (Pmode, y);
>> +  rtx pmode_dest = gen_reg_rtx (Pmode);
>> +
>> +  /* Step-1: sum = x + y  */
>> +  if (mode == SImode && mode != Pmode)
>> +    { /* Take addw to avoid the sum truncate.  */
>> +      rtx simode_sum = gen_reg_rtx (SImode);
>> +      riscv_emit_binary (PLUS, simode_sum, x, y);
>> +      emit_move_insn (pmode_sum, gen_lowpart (Pmode, simode_sum));
>> +    }
>> +  else
>> +    riscv_emit_binary (PLUS, pmode_sum, pmode_x, pmode_y);
>> +
>> +  /* Step-1.1: truncate sum for HI and QI as we have no insn for add QI/HI.  */
>> +  if (mode == HImode || mode == QImode)
>> +    {
>> +      int shift_bits = GET_MODE_BITSIZE (Pmode)
>> +       - GET_MODE_BITSIZE (mode).to_constant ();
>> +
>> +      gcc_assert (shift_bits > 0);
>> +
>> +      riscv_emit_binary (ASHIFT, pmode_sum, pmode_sum, GEN_INT (shift_bits));
>> +      riscv_emit_binary (LSHIFTRT, pmode_sum, pmode_sum, GEN_INT (shift_bits));
>> +    }
>> +
>> +  /* Step-2: lt = sum < x  */
>> +  riscv_emit_binary (LTU, pmode_lt, pmode_sum, pmode_x);
>> +
>> +  /* Step-3: lt = -lt  */
>> +  riscv_emit_unary (NEG, pmode_lt, pmode_lt);
>> +
>> +  /* Step-4: pmode_dest = sum | lt  */
>> +  riscv_emit_binary (IOR, pmode_dest, pmode_lt, pmode_sum);
>> +
>> +  /* Step-5: dest = pmode_dest */
>> +  emit_move_insn (dest, gen_lowpart (mode, pmode_dest));
>> +}
>> +
>>   /* Initialize the GCC target structure.  */
>>   #undef TARGET_ASM_ALIGNED_HI_OP
>>   #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
>> diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
>> index 39b29795cd6..03cbe5a2ca9 100644
>> --- a/gcc/config/riscv/riscv.md
>> +++ b/gcc/config/riscv/riscv.md
>> @@ -3841,6 +3841,17 @@ (define_insn "*large_load_address"
>>     [(set_attr "type" "load")
>>      (set (attr "length") (const_int 8))])
>>
>> +(define_expand "sat_addu_<mode>3"
>> +  [(match_operand:ANYI   0 "register_operand")
>> +   (match_operand:ANYI   1 "register_operand")
>> +   (match_operand:ANYI   2 "register_operand")]
>> +  ""
>> +  {
>> +    riscv_expand_saturation_addu (operands[0], operands[1], operands[2]);
>> +    DONE;
>> +  }
>> +)
>> +
>>   (include "bitmanip.md")
>>   (include "crypto.md")
>>   (include "sync.md")
>> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
>> index b0c61925120..5867afdb1a0 100644
>> --- a/gcc/doc/md.texi
>> +++ b/gcc/doc/md.texi
>> @@ -6653,6 +6653,17 @@ The operation is only supported for vector modes @var{m}.
>>
>>   This pattern is not allowed to @code{FAIL}.
>>
>> +@cindex @code{sat_addu_@var{m}3} instruction pattern
>> +@item @samp{sat_addu_@var{m}3}
>> +Perform the saturation unsigned add for the operand 1 and operand 2 and
>> +store the result into the operand 0.  All operands have mode @var{m},
>> +which is a scalar integer mode.
>> +
>> +@smallexample
>> +  typedef unsigned char uint8_t;
>> +  uint8_t sat_addu (uint8_t x, uint8_t y) => return (x + y) | -((x + y) < x);
>> +@end smallexample
>> +
>>   @cindex @code{cmla@var{m}4} instruction pattern
>>   @item @samp{cmla@var{m}4}
>>   Perform a vector multiply and accumulate that is semantically the same as
>> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
>> index a07f25f3aee..dee73dbc614 100644
>> --- a/gcc/internal-fn.cc
>> +++ b/gcc/internal-fn.cc
>> @@ -4159,6 +4159,7 @@ commutative_binary_fn_p (internal_fn fn)
>>       case IFN_VEC_WIDEN_PLUS_HI:
>>       case IFN_VEC_WIDEN_PLUS_EVEN:
>>       case IFN_VEC_WIDEN_PLUS_ODD:
>> +    case IFN_SAT_ADDU:
>>         return true;
>>
>>       default:
>> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
>> index c14d30365c1..a04592fc779 100644
>> --- a/gcc/internal-fn.def
>> +++ b/gcc/internal-fn.def
>> @@ -428,6 +428,7 @@ DEF_INTERNAL_WIDENING_OPTAB_FN (VEC_WIDEN_ABD,
>>                                  binary)
>>   DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary)
>>   DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary)
>> +DEF_INTERNAL_OPTAB_FN (SAT_ADDU, ECF_CONST | ECF_NOTHROW, sat_addu, binary)
>>
>>   /* FP scales.  */
>>   DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
>> diff --git a/gcc/match.pd b/gcc/match.pd
>> index 711c3a10c3f..9de1106adcf 100644
>> --- a/gcc/match.pd
>> +++ b/gcc/match.pd
>> @@ -1994,6 +1994,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>>    )
>>   )
>>
>> +#if GIMPLE
>> +
>> +/* Saturation add unsigned, aka:
>> +   SAT_ADDU = (X + Y) | - ((X + Y) < X) or
>> +   SAT_ADDU = (X + Y) | - ((X + Y) < Y).  */
>> +(simplify
>> + (bit_ior:c (plus:c@2 @0 @1) (negate (convert (lt @2 @0))))
>> +   (if (optimize
>> +       && INTEGRAL_TYPE_P (type)
>> +       && TYPE_UNSIGNED (TREE_TYPE (@0))
>> +       && types_match (type, TREE_TYPE (@0))
>> +       && types_match (type, TREE_TYPE (@1))
>> +       && direct_internal_fn_supported_p (IFN_SAT_ADDU, type, OPTIMIZE_FOR_BOTH))
>> +   (IFN_SAT_ADDU @0 @1)))
>> +
>> +/* SAT_ADDU (X, 0) = X  */
>> +(simplify
>> + (IFN_SAT_ADDU:c @0 integer_zerop)
>> + @0)
>> +
>> +#endif
>> +
>>   /* A few cases of fold-const.cc negate_expr_p predicate.  */
>>   (match negate_expr_p
>>    INTEGER_CST
>> diff --git a/gcc/optabs.def b/gcc/optabs.def
>> index ad14f9328b9..a2c11b7707b 100644
>> --- a/gcc/optabs.def
>> +++ b/gcc/optabs.def
>> @@ -300,6 +300,8 @@ OPTAB_D (usubc5_optab, "usubc$I$a5")
>>   OPTAB_D (addptr3_optab, "addptr$a3")
>>   OPTAB_D (spaceship_optab, "spaceship$a3")
>>
>> +OPTAB_D (sat_addu_optab, "sat_addu_$a3")
>> +
>>   OPTAB_D (smul_highpart_optab, "smul$a3_highpart")
>>   OPTAB_D (umul_highpart_optab, "umul$a3_highpart")
>>
>> diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-1.c b/gcc/testsuite/gcc.target/riscv/sat_addu-1.c
>> new file mode 100644
>> index 00000000000..229abef0faa
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-1.c
>> @@ -0,0 +1,18 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */
>> +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
>> +/* { dg-final { check-function-bodies "**" "" } } */
>> +
>> +#include "sat_arith.h"
>> +
>> +/*
>> +** sat_addu_uint8_t:
>> +** add\s+[atx][0-9]+,\s*a0,\s*a1
>> +** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*0xff
>> +** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
>> +** neg\s+[atx][0-9]+,\s*[atx][0-9]+
>> +** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
>> +** andi\s+a0,\s*a0,\s*0xff
>> +** ret
>> +*/
>> +DEF_SAT_ADDU(uint8_t)
>> diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-2.c b/gcc/testsuite/gcc.target/riscv/sat_addu-2.c
>> new file mode 100644
>> index 00000000000..4023b030811
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-2.c
>> @@ -0,0 +1,20 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */
>> +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
>> +/* { dg-final { check-function-bodies "**" "" } } */
>> +
>> +#include "sat_arith.h"
>> +
>> +/*
>> +** sat_addu_uint16_t:
>> +** add\s+[atx][0-9]+,\s*a0,\s*a1
>> +** slli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
>> +** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
>> +** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
>> +** neg\s+[atx][0-9]+,\s*[atx][0-9]+
>> +** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
>> +** slli\s+a0,\s*a0,\s*48
>> +** srli\s+a0,\s*a0,\s*48
>> +** ret
>> +*/
>> +DEF_SAT_ADDU(uint16_t)
>> diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-3.c b/gcc/testsuite/gcc.target/riscv/sat_addu-3.c
>> new file mode 100644
>> index 00000000000..4d0af97fb67
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-3.c
>> @@ -0,0 +1,17 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */
>> +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
>> +/* { dg-final { check-function-bodies "**" "" } } */
>> +
>> +#include "sat_arith.h"
>> +
>> +/*
>> +** sat_addu_uint32_t:
>> +** addw\s+[atx][0-9]+,\s*a0,\s*a1
>> +** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
>> +** neg\s+[atx][0-9]+,\s*[atx][0-9]+
>> +** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
>> +** sext.w\s+a0,\s*a0
>> +** ret
>> +*/
>> +DEF_SAT_ADDU(uint32_t)
>> diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-4.c b/gcc/testsuite/gcc.target/riscv/sat_addu-4.c
>> new file mode 100644
>> index 00000000000..926f31266e3
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-4.c
>> @@ -0,0 +1,16 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */
>> +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
>> +/* { dg-final { check-function-bodies "**" "" } } */
>> +
>> +#include "sat_arith.h"
>> +
>> +/*
>> +** sat_addu_uint64_t:
>> +** add\s+[atx][0-9]+,\s*a0,\s*a1
>> +** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
>> +** neg\s+[atx][0-9]+,\s*[atx][0-9]+
>> +** or\s+a0,\s*[atx][0-9]+,\s*[atx][0-9]+
>> +** ret
>> +*/
>> +DEF_SAT_ADDU(uint64_t)
>> diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-run-1.c b/gcc/testsuite/gcc.target/riscv/sat_addu-run-1.c
>> new file mode 100644
>> index 00000000000..b19515c39d1
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-run-1.c
>> @@ -0,0 +1,42 @@
>> +/* { dg-do run { target { riscv_v } } } */
>> +/* { dg-additional-options "-std=c99" } */
>> +
>> +#include "sat_arith.h"
>> +
>> +DEF_SAT_ADDU(uint8_t)
>> +
>> +int
>> +main ()
>> +{
>> +  if (RUN_SAT_ADDU (uint8_t, 0, 0) != 0)
>> +    __builtin_abort ();
>> +
>> +  if (RUN_SAT_ADDU (uint8_t, 0, 1) != 1)
>> +    __builtin_abort ();
>> +
>> +  if (RUN_SAT_ADDU (uint8_t, 1, 1) != 2)
>> +    __builtin_abort ();
>> +
>> +  if (RUN_SAT_ADDU (uint8_t, 0, 254) != 254)
>> +    __builtin_abort ();
>> +
>> +  if (RUN_SAT_ADDU (uint8_t, 1, 254) != 255)
>> +    __builtin_abort ();
>> +
>> +  if (RUN_SAT_ADDU (uint8_t, 2, 254) != 255)
>> +    __builtin_abort ();
>> +
>> +  if (RUN_SAT_ADDU (uint8_t, 0, 255) != 255)
>> +    __builtin_abort ();
>> +
>> +  if (RUN_SAT_ADDU (uint8_t, 1, 255) != 255)
>> +    __builtin_abort ();
>> +
>> +  if (RUN_SAT_ADDU (uint8_t, 2, 255) != 255)
>> +    __builtin_abort ();
>> +
>> +  if (RUN_SAT_ADDU (uint8_t, 255, 255) != 255)
>> +    __builtin_abort ();
>> +
>> +  return 0;
>> +}
>> diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-run-2.c b/gcc/testsuite/gcc.target/riscv/sat_addu-run-2.c
>> new file mode 100644
>> index 00000000000..90073fbe4ba
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-run-2.c
>> @@ -0,0 +1,42 @@
>> +/* { dg-do run { target { riscv_v } } } */
>> +/* { dg-additional-options "-std=c99" } */
>> +
>> +#include "sat_arith.h"
>> +
>> +DEF_SAT_ADDU(uint16_t)
>> +
>> +int
>> +main ()
>> +{
>> +  if (RUN_SAT_ADDU (uint16_t, 0, 0) != 0)
>> +    __builtin_abort ();
>> +
>> +  if (RUN_SAT_ADDU (uint16_t, 0, 1) != 1)
>> +    __builtin_abort ();
>> +
>> +  if (RUN_SAT_ADDU (uint16_t, 1, 1) != 2)
>> +    __builtin_abort ();
>> +
>> +  if (RUN_SAT_ADDU (uint16_t, 0, 65534) != 65534)
>> +    __builtin_abort ();
>> +
>> +  if (RUN_SAT_ADDU (uint16_t, 1, 65534) != 65535)
>> +    __builtin_abort ();
>> +
>> +  if (RUN_SAT_ADDU (uint16_t, 2, 65534) != 65535)
>> +    __builtin_abort ();
>> +
>> +  if (RUN_SAT_ADDU (uint16_t, 0, 65535) != 65535)
>> +    __builtin_abort ();
>> +
>> +  if (RUN_SAT_ADDU (uint16_t, 1, 65535) != 65535)
>> +    __builtin_abort ();
>> +
>> +  if (RUN_SAT_ADDU (uint16_t, 2, 65535) != 65535)
>> +    __builtin_abort ();
>> +
>> +  if (RUN_SAT_ADDU (uint16_t, 65535, 65535) != 65535)
>> +    __builtin_abort ();
>> +
>> +  return 0;
>> +}
>> diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-run-3.c b/gcc/testsuite/gcc.target/riscv/sat_addu-run-3.c
>> new file mode 100644
>> index 00000000000..996dd3de737
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-run-3.c
>> @@ -0,0 +1,42 @@
>> +/* { dg-do run { target { riscv_v } } } */
>> +/* { dg-additional-options "-std=c99" } */
>> +
>> +#include "sat_arith.h"
>> +
>> +DEF_SAT_ADDU(uint32_t)
>> +
>> +int
>> +main ()
>> +{
>> +  if (RUN_SAT_ADDU (uint32_t, 0, 0) != 0)
>> +    __builtin_abort ();
>> +
>> +  if (RUN_SAT_ADDU (uint32_t, 0, 1) != 1)
>> +    __builtin_abort ();
>> +
>> +  if (RUN_SAT_ADDU (uint32_t, 1, 1) != 2)
>> +    __builtin_abort ();
>> +
>> +  if (RUN_SAT_ADDU (uint32_t, 0, 4294967294) != 4294967294)
>> +    __builtin_abort ();
>> +
>> +  if (RUN_SAT_ADDU (uint32_t, 1, 4294967294) != 4294967295)
>> +    __builtin_abort ();
>> +
>> +  if (RUN_SAT_ADDU (uint32_t, 2, 4294967294) != 4294967295)
>> +    __builtin_abort ();
>> +
>> +  if (RUN_SAT_ADDU (uint32_t, 0, 4294967295) != 4294967295)
>> +    __builtin_abort ();
>> +
>> +  if (RUN_SAT_ADDU (uint32_t, 1, 4294967295) != 4294967295)
>> +    __builtin_abort ();
>> +
>> +  if (RUN_SAT_ADDU (uint32_t, 2, 4294967295) != 4294967295)
>> +    __builtin_abort ();
>> +
>> +  if (RUN_SAT_ADDU (uint32_t, 4294967295, 4294967295) != 4294967295)
>> +    __builtin_abort ();
>> +
>> +  return 0;
>> +}
>> diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-run-4.c b/gcc/testsuite/gcc.target/riscv/sat_addu-run-4.c
>> new file mode 100644
>> index 00000000000..51a5421577b
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-run-4.c
>> @@ -0,0 +1,49 @@
>> +/* { dg-do run { target { riscv_v } } } */
>> +/* { dg-additional-options "-std=c99" } */
>> +
>> +#include "sat_arith.h"
>> +
>> +DEF_SAT_ADDU(uint64_t)
>> +
>> +int
>> +main ()
>> +{
>> +  if (RUN_SAT_ADDU (uint64_t, 0, 0) != 0)
>> +    __builtin_abort ();
>> +
>> +  if (RUN_SAT_ADDU (uint64_t, 0, 1) != 1)
>> +    __builtin_abort ();
>> +
>> +  if (RUN_SAT_ADDU (uint64_t, 1, 1) != 2)
>> +    __builtin_abort ();
>> +
>> +  if (RUN_SAT_ADDU (uint64_t, 0, 18446744073709551614u)
>> +    != 18446744073709551614u)
>> +    __builtin_abort ();
>> +
>> +  if (RUN_SAT_ADDU (uint64_t, 1, 18446744073709551614u)
>> +    != 18446744073709551615u)
>> +    __builtin_abort ();
>> +
>> +  if (RUN_SAT_ADDU (uint64_t, 2, 18446744073709551614u)
>> +    != 18446744073709551615u)
>> +    __builtin_abort ();
>> +
>> +  if (RUN_SAT_ADDU (uint64_t, 0, 18446744073709551615u)
>> +    != 18446744073709551615u)
>> +    __builtin_abort ();
>> +
>> +  if (RUN_SAT_ADDU (uint64_t, 1, 18446744073709551615u)
>> +    != 18446744073709551615u)
>> +    __builtin_abort ();
>> +
>> +  if (RUN_SAT_ADDU (uint64_t, 2, 18446744073709551615u)
>> +    != 18446744073709551615u)
>> +    __builtin_abort ();
>> +
>> +  if (RUN_SAT_ADDU (uint64_t, 18446744073709551615u, 18446744073709551615u)
>> +    != 18446744073709551615u)
>> +    __builtin_abort ();
>> +
>> +  return 0;
>> +}
>> diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h b/gcc/testsuite/gcc.target/riscv/sat_arith.h
>> new file mode 100644
>> index 00000000000..4c00157685e
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
>> @@ -0,0 +1,15 @@
>> +#ifndef HAVE_SAT_ARITH
>> +#define HAVE_SAT_ARITH
>> +
>> +#include <stdint.h>
>> +
>> +#define DEF_SAT_ADDU(TYPE)                       \
>> +TYPE __attribute__((noinline))                   \
>> +sat_addu_##TYPE (TYPE x, TYPE y)                 \
>> +{                                                \
>> +  return (x + y) | (-(TYPE)((TYPE)(x + y) < x)); \
>> +}
>> +
>> +#define RUN_SAT_ADDU(TYPE, x, y) sat_addu_##TYPE(x, y)
>> +
>> +#endif
>> --
>> 2.34.1
>>

Tamar Christina Feb. 27, 2024, 11:15 a.m. UTC

> On 19.02.24 at 08:36, Richard Biener wrote:
> > On Sat, Feb 17, 2024 at 11:30 AM <pan2.li@intel.com> wrote:
> >>
> >> From: Pan Li <pan2.li@intel.com>
> >>
> >> This patch adds a middle-end representation for the unsigned
> >> saturating add, i.e. the result of the addition is set to the
> >> maximum value on overflow.  It matches a pattern similar to the one below.
> >>
> >> SAT_ADDU (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
> 
> Does this even try to work out the costs?
> 
> For example, take the following code:
> 
> 
> #define T __UINT16_TYPE__
> 
> T sat_add1 (T x, T y)
> {
>    return (x + y) | (- (T)((T)(x + y) < x));
> }
> 
> T sat_add2 (T x, T y)
> {
>      T z = x + y;
>      if (z < x)
>          z = (T) -1;
>      return z;
> }
> 
> And then with "avr-gcc -S -Os -dp" the code is:
> 
> 
> sat_add1:
> 	add r22,r24	 ;  7	[c=8 l=2]  *addhi3/0
> 	adc r23,r25
> 	ldi r18,lo8(1)	 ;  8	[c=4 l=2]  *movhi/4
> 	ldi r19,0
> 	cp r22,r24	 ;  9	[c=8 l=2]  cmphi3/2
> 	cpc r23,r25
> 	brlo .L2		 ;  10	[c=16 l=1]  branch
> 	ldi r19,0		 ;  31	[c=4 l=1]  movqi_insn/0
> 	ldi r18,0		 ;  32	[c=4 l=1]  movqi_insn/0
> .L2:
> 	clr r24	 ;  13	[c=12 l=4]  neghi2/1
> 	clr r25
> 	sub r24,r18
> 	sbc r25,r19
> 	or r24,r22		 ;  29	[c=4 l=1]  iorqi3/0
> 	or r25,r23		 ;  30	[c=4 l=1]  iorqi3/0
> 	ret		 ;  35	[c=0 l=1]  return
> 
> sat_add2:
> 	add r22,r24	 ;  8	[c=8 l=2]  *addhi3/0
> 	adc r23,r25
> 	cp r22,r24	 ;  9	[c=8 l=2]  cmphi3/2
> 	cpc r23,r25
> 	brsh .L3		 ;  10	[c=16 l=1]  branch
> 	ldi r22,lo8(-1)	 ;  5	[c=4 l=2]  *movhi/4
> 	ldi r23,lo8(-1)
> .L3:
> 	mov r25,r23	 ;  21	[c=4 l=1]  movqi_insn/0
> 	mov r24,r22	 ;  22	[c=4 l=1]  movqi_insn/0
> 	ret		 ;  25	[c=0 l=1]  return
> 
> i.e. the conditional jump is better than overly smart arithmetic
> (smaller and faster code with less register pressure).
> With larger types the difference is even more pronounced.
> 

*on AVR. https://godbolt.org/z/7jaExbTa8 shows the branchless code is better.
And the branchy code will vectorize worse, if at all; see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600
and https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492

But looking at that output it just seems like it's your expansion that's inefficient.

But fair point: perhaps it should just be a normal DEF_INTERNAL_SIGNED_OPTAB_FN so that we
provide the additional optimization only for targets that want it.

Tamar

> >> Take uint8_t as example, we will have:
> >>
> >> * SAT_ADDU (1, 254)   => 255.
> >> * SAT_ADDU (1, 255)   => 255.
> >> * SAT_ADDU (2, 255)   => 255.
> >> * SAT_ADDU (255, 255) => 255.
> >>
> >> The patch also implements SAT_ADDU in the riscv backend as
> >> an example.  Given the example below:
> >>
> >> uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> >> {
> >>    return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> >> }
> >>
> >> Before this patch:
> >>
> >> uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> >> {
> >>    long unsigned int _1;
> >>    _Bool _2;
> >>    long unsigned int _3;
> >>    long unsigned int _4;
> >>    uint64_t _7;
> >>    long unsigned int _10;
> >>    __complex__ long unsigned int _11;
> >>
> >> ;;   basic block 2, loop depth 0
> >> ;;    pred:       ENTRY
> >>    _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
> >>    _1 = REALPART_EXPR <_11>;
> >>    _10 = IMAGPART_EXPR <_11>;
> >>    _2 = _10 != 0;
> >>    _3 = (long unsigned int) _2;
> >>    _4 = -_3;
> >>    _7 = _1 | _4;
> >>    return _7;
> >> ;;    succ:       EXIT
> >>
> >> }
> >>
> >> After this patch:
> >>
> >> uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> >> {
> >>    uint64_t _7;
> >>
> >> ;;   basic block 2, loop depth 0
> >> ;;    pred:       ENTRY
> >>    _7 = .SAT_ADDU (x_5(D), y_6(D)); [tail call]
> >>    return _7;
> >> ;;    succ:       EXIT
> >>
> >> }
> >>
> >> After this patch, the middle end therefore represents the operation
> >> as .SAT_ADDU.
> >
> > I'll note that on RTL we already have SS_PLUS/US_PLUS and friends and
> > the corresponding ssadd/usadd optabs.  Unfortunately there's not much
> > documentation besides the gen_*_fixed_libfunc usage, where the comment
> > suggests this is used for fixed-point operations.  It looks like arm uses
> > fractional/accumulator modes for this, but bfin, for example, has ssaddsi3.
> >
> > So the question is whether the fixed-point case can be distinguished from
> > the integer case based on mode.
> >
> > There's also FIXED_POINT_TYPE on the GENERIC/GIMPLE side and
> > no special tree operator codes for them.  So compared to what appears
> > to be the case on RTL we'd need a way to represent saturating integer
> > operations on GIMPLE.
> >
> > The natural thing is to use direct optab internal functions (that's what you
> > basically did, but you added a new optab, IMO without good reason).
> > More GIMPLE-like would be to let the types involved decide whether
> > it's signed or unsigned saturation.  That's actually what I'd prefer here
> > and if we don't map 1:1 to optabs then instead use tree codes like
> > S_PLUS_EXPR (mimicking RTL here).
> >
> > Any other opinions?  Does anyone know more about fixed-point and RTL/modes?
> >
> > Richard.
> >
> >>          PR target/51492
> >>          PR target/112600
> >>
> >> gcc/ChangeLog:
> >>
> >>          * config/riscv/riscv-protos.h (riscv_expand_saturation_addu):
> >>          New func decl for the SAT_ADDU expand.
> >>          * config/riscv/riscv.cc (riscv_expand_saturation_addu): New func
> >>          impl for the SAT_ADDU expand.
> >>          * config/riscv/riscv.md (sat_addu_<mode>3): New pattern to impl
> >>          the standard name SAT_ADDU.
> >>          * doc/md.texi: Add doc for SAT_ADDU.
> >>          * internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADDU.
> >>          * internal-fn.def (SAT_ADDU): Add SAT_ADDU.
> >>          * match.pd: Add simplify pattern patch for SAT_ADDU.
> >>          * optabs.def (OPTAB_D): Add sat_addu_optab.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>          * gcc.target/riscv/sat_addu-1.c: New test.
> >>          * gcc.target/riscv/sat_addu-2.c: New test.
> >>          * gcc.target/riscv/sat_addu-3.c: New test.
> >>          * gcc.target/riscv/sat_addu-4.c: New test.
> >>          * gcc.target/riscv/sat_addu-run-1.c: New test.
> >>          * gcc.target/riscv/sat_addu-run-2.c: New test.
> >>          * gcc.target/riscv/sat_addu-run-3.c: New test.
> >>          * gcc.target/riscv/sat_addu-run-4.c: New test.
> >>          * gcc.target/riscv/sat_arith.h: New test.
> >>
> >> Signed-off-by: Pan Li <pan2.li@intel.com>
> >> ---
> >>   gcc/config/riscv/riscv-protos.h               |  1 +
> >>   gcc/config/riscv/riscv.cc                     | 46 +++++++++++++++++
> >>   gcc/config/riscv/riscv.md                     | 11 +++++
> >>   gcc/doc/md.texi                               | 11 +++++
> >>   gcc/internal-fn.cc                            |  1 +
> >>   gcc/internal-fn.def                           |  1 +
> >>   gcc/match.pd                                  | 22 +++++++++
> >>   gcc/optabs.def                                |  2 +
> >>   gcc/testsuite/gcc.target/riscv/sat_addu-1.c   | 18 +++++++
> >>   gcc/testsuite/gcc.target/riscv/sat_addu-2.c   | 20 ++++++++
> >>   gcc/testsuite/gcc.target/riscv/sat_addu-3.c   | 17 +++++++
> >>   gcc/testsuite/gcc.target/riscv/sat_addu-4.c   | 16 ++++++
> >>   .../gcc.target/riscv/sat_addu-run-1.c         | 42 ++++++++++++++++
> >>   .../gcc.target/riscv/sat_addu-run-2.c         | 42 ++++++++++++++++
> >>   .../gcc.target/riscv/sat_addu-run-3.c         | 42 ++++++++++++++++
> >>   .../gcc.target/riscv/sat_addu-run-4.c         | 49 +++++++++++++++++++
> >>   gcc/testsuite/gcc.target/riscv/sat_arith.h    | 15 ++++++
> >>   17 files changed, 356 insertions(+)
> >>   create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-1.c
> >>   create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-2.c
> >>   create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-3.c
> >>   create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-4.c
> >>   create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-run-1.c
> >>   create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-run-2.c
> >>   create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-run-3.c
> >>   create mode 100644 gcc/testsuite/gcc.target/riscv/sat_addu-run-4.c
> >>   create mode 100644 gcc/testsuite/gcc.target/riscv/sat_arith.h
> >>
> >> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> >> index ae1685850ac..f201b2384f9 100644
> >> --- a/gcc/config/riscv/riscv-protos.h
> >> +++ b/gcc/config/riscv/riscv-protos.h
> >> @@ -132,6 +132,7 @@ extern void riscv_asm_output_external (FILE *, const tree, const char *);
> >>   extern bool
> >>   riscv_zcmp_valid_stack_adj_bytes_p (HOST_WIDE_INT, int);
> >>   extern void riscv_legitimize_poly_move (machine_mode, rtx, rtx, rtx);
> >> +extern void riscv_expand_saturation_addu (rtx, rtx, rtx);
> >>
> >>   #ifdef RTX_CODE
> >>   extern void riscv_expand_int_scc (rtx, enum rtx_code, rtx, rtx, bool *invert_ptr = 0);
> >> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> >> index 799d7919a4a..84e86eb5d49 100644
> >> --- a/gcc/config/riscv/riscv.cc
> >> +++ b/gcc/config/riscv/riscv.cc
> >> @@ -10657,6 +10657,52 @@ riscv_vector_mode_supported_any_target_p (machine_mode)
> >>     return true;
> >>   }
> >>
> >> +/* Emit insn for the saturation addu, aka (x + y) | - ((x + y) < x).  */
> >> +void
> >> +riscv_expand_saturation_addu (rtx dest, rtx x, rtx y)
> >> +{
> >> +  machine_mode mode = GET_MODE (dest);
> >> +  rtx pmode_sum = gen_reg_rtx (Pmode);
> >> +  rtx pmode_lt = gen_reg_rtx (Pmode);
> >> +  rtx pmode_x = gen_lowpart (Pmode, x);
> >> +  rtx pmode_y = gen_lowpart (Pmode, y);
> >> +  rtx pmode_dest = gen_reg_rtx (Pmode);
> >> +
> >> +  /* Step-1: sum = x + y  */
> >> +  if (mode == SImode && mode != Pmode)
> >> +    { /* Take addw to avoid the sum truncate.  */
> >> +      rtx simode_sum = gen_reg_rtx (SImode);
> >> +      riscv_emit_binary (PLUS, simode_sum, x, y);
> >> +      emit_move_insn (pmode_sum, gen_lowpart (Pmode, simode_sum));
> >> +    }
> >> +  else
> >> +    riscv_emit_binary (PLUS, pmode_sum, pmode_x, pmode_y);
> >> +
> >> +  /* Step-1.1: truncate sum for HI and QI as we have no insn for add QI/HI.  */
> >> +  if (mode == HImode || mode == QImode)
> >> +    {
> >> +      int shift_bits = GET_MODE_BITSIZE (Pmode)
> >> +       - GET_MODE_BITSIZE (mode).to_constant ();
> >> +
> >> +      gcc_assert (shift_bits > 0);
> >> +
> >> +      riscv_emit_binary (ASHIFT, pmode_sum, pmode_sum, GEN_INT (shift_bits));
> >> +      riscv_emit_binary (LSHIFTRT, pmode_sum, pmode_sum, GEN_INT (shift_bits));
> >> +    }
> >> +
> >> +  /* Step-2: lt = sum < x  */
> >> +  riscv_emit_binary (LTU, pmode_lt, pmode_sum, pmode_x);
> >> +
> >> +  /* Step-3: lt = -lt  */
> >> +  riscv_emit_unary (NEG, pmode_lt, pmode_lt);
> >> +
> >> +  /* Step-4: pmode_dest = sum | lt  */
> >> +  riscv_emit_binary (IOR, pmode_dest, pmode_lt, pmode_sum);
> >> +
> >> +  /* Step-5: dest = pmode_dest */
> >> +  emit_move_insn (dest, gen_lowpart (mode, pmode_dest));
> >> +}
> >> +
> >>   /* Initialize the GCC target structure.  */
> >>   #undef TARGET_ASM_ALIGNED_HI_OP
> >>   #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
> >> diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> >> index 39b29795cd6..03cbe5a2ca9 100644
> >> --- a/gcc/config/riscv/riscv.md
> >> +++ b/gcc/config/riscv/riscv.md
> >> @@ -3841,6 +3841,17 @@ (define_insn "*large_load_address"
> >>     [(set_attr "type" "load")
> >>      (set (attr "length") (const_int 8))])
> >>
> >> +(define_expand "sat_addu_<mode>3"
> >> +  [(match_operand:ANYI   0 "register_operand")
> >> +   (match_operand:ANYI   1 "register_operand")
> >> +   (match_operand:ANYI   2 "register_operand")]
> >> +  ""
> >> +  {
> >> +    riscv_expand_saturation_addu (operands[0], operands[1], operands[2]);
> >> +    DONE;
> >> +  }
> >> +)
> >> +
> >>   (include "bitmanip.md")
> >>   (include "crypto.md")
> >>   (include "sync.md")
> >> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> >> index b0c61925120..5867afdb1a0 100644
> >> --- a/gcc/doc/md.texi
> >> +++ b/gcc/doc/md.texi
> >> @@ -6653,6 +6653,17 @@ The operation is only supported for vector modes @var{m}.
> >>
> >>   This pattern is not allowed to @code{FAIL}.
> >>
> >> +@cindex @code{sat_addu_@var{m}3} instruction pattern
> >> +@item @samp{sat_addu_@var{m}3}
> >> +Perform the saturation unsigned add for the operand 1 and operand 2 and
> >> +store the result into the operand 0.  All operands have mode @var{m},
> >> +which is a scalar integer mode.
> >> +
> >> +@smallexample
> >> +  typedef unsigned char uint8_t;
> >> +  uint8_t sat_addu (uint8_t x, uint8_t y) => return (x + y) | -((x + y) < x);
> >> +@end smallexample
> >> +
> >>   @cindex @code{cmla@var{m}4} instruction pattern
> >>   @item @samp{cmla@var{m}4}
> >>   Perform a vector multiply and accumulate that is semantically the same as
> >> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> >> index a07f25f3aee..dee73dbc614 100644
> >> --- a/gcc/internal-fn.cc
> >> +++ b/gcc/internal-fn.cc
> >> @@ -4159,6 +4159,7 @@ commutative_binary_fn_p (internal_fn fn)
> >>       case IFN_VEC_WIDEN_PLUS_HI:
> >>       case IFN_VEC_WIDEN_PLUS_EVEN:
> >>       case IFN_VEC_WIDEN_PLUS_ODD:
> >> +    case IFN_SAT_ADDU:
> >>         return true;
> >>
> >>       default:
> >> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> >> index c14d30365c1..a04592fc779 100644
> >> --- a/gcc/internal-fn.def
> >> +++ b/gcc/internal-fn.def
> >> @@ -428,6 +428,7 @@ DEF_INTERNAL_WIDENING_OPTAB_FN (VEC_WIDEN_ABD,
> >>                                  binary)
> >>   DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary)
> >>   DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary)
> >> +DEF_INTERNAL_OPTAB_FN (SAT_ADDU, ECF_CONST | ECF_NOTHROW, sat_addu, binary)
> >>
> >>   /* FP scales.  */
> >>   DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
> >> diff --git a/gcc/match.pd b/gcc/match.pd
> >> index 711c3a10c3f..9de1106adcf 100644
> >> --- a/gcc/match.pd
> >> +++ b/gcc/match.pd
> >> @@ -1994,6 +1994,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >>    )
> >>   )
> >>
> >> +#if GIMPLE
> >> +
> >> +/* Saturation add unsigned, aka:
> >> +   SAT_ADDU = (X + Y) | - ((X + Y) < X) or
> >> +   SAT_ADDU = (X + Y) | - ((X + Y) < Y).  */
> >> +(simplify
> >> + (bit_ior:c (plus:c@2 @0 @1) (negate (convert (lt @2 @0))))
> >> +   (if (optimize
> >> +       && INTEGRAL_TYPE_P (type)
> >> +       && TYPE_UNSIGNED (TREE_TYPE (@0))
> >> +       && types_match (type, TREE_TYPE (@0))
> >> +       && types_match (type, TREE_TYPE (@1))
> >> +       && direct_internal_fn_supported_p (IFN_SAT_ADDU, type, OPTIMIZE_FOR_BOTH))
> >> +   (IFN_SAT_ADDU @0 @1)))
> >> +
> >> +/* SAT_ADDU (X, 0) = X  */
> >> +(simplify
> >> + (IFN_SAT_ADDU:c @0 integer_zerop)
> >> + @0)
> >> +
> >> +#endif
> >> +
> >>   /* A few cases of fold-const.cc negate_expr_p predicate.  */
> >>   (match negate_expr_p
> >>    INTEGER_CST
> >> diff --git a/gcc/optabs.def b/gcc/optabs.def
> >> index ad14f9328b9..a2c11b7707b 100644
> >> --- a/gcc/optabs.def
> >> +++ b/gcc/optabs.def
> >> @@ -300,6 +300,8 @@ OPTAB_D (usubc5_optab, "usubc$I$a5")
> >>   OPTAB_D (addptr3_optab, "addptr$a3")
> >>   OPTAB_D (spaceship_optab, "spaceship$a3")
> >>
> >> +OPTAB_D (sat_addu_optab, "sat_addu_$a3")
> >> +
> >>   OPTAB_D (smul_highpart_optab, "smul$a3_highpart")
> >>   OPTAB_D (umul_highpart_optab, "umul$a3_highpart")
> >>
> >> diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-1.c b/gcc/testsuite/gcc.target/riscv/sat_addu-1.c
> >> new file mode 100644
> >> index 00000000000..229abef0faa
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-1.c
> >> @@ -0,0 +1,18 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */
> >> +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
> >> +/* { dg-final { check-function-bodies "**" "" } } */
> >> +
> >> +#include "sat_arith.h"
> >> +
> >> +/*
> >> +** sat_addu_uint8_t:
> >> +** add\s+[atx][0-9]+,\s*a0,\s*a1
> >> +** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*0xff
> >> +** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> >> +** neg\s+[atx][0-9]+,\s*[atx][0-9]+
> >> +** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> >> +** andi\s+a0,\s*a0,\s*0xff
> >> +** ret
> >> +*/
> >> +DEF_SAT_ADDU(uint8_t)
> >> diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-2.c b/gcc/testsuite/gcc.target/riscv/sat_addu-2.c
> >> new file mode 100644
> >> index 00000000000..4023b030811
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-2.c
> >> @@ -0,0 +1,20 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */
> >> +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
> >> +/* { dg-final { check-function-bodies "**" "" } } */
> >> +
> >> +#include "sat_arith.h"
> >> +
> >> +/*
> >> +** sat_addu_uint16_t:
> >> +** add\s+[atx][0-9]+,\s*a0,\s*a1
> >> +** slli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
> >> +** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
> >> +** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> >> +** neg\s+[atx][0-9]+,\s*[atx][0-9]+
> >> +** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> >> +** slli\s+a0,\s*a0,\s*48
> >> +** srli\s+a0,\s*a0,\s*48
> >> +** ret
> >> +*/
> >> +DEF_SAT_ADDU(uint16_t)
> >> diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-3.c b/gcc/testsuite/gcc.target/riscv/sat_addu-3.c
> >> new file mode 100644
> >> index 00000000000..4d0af97fb67
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-3.c
> >> @@ -0,0 +1,17 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */
> >> +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
> >> +/* { dg-final { check-function-bodies "**" "" } } */
> >> +
> >> +#include "sat_arith.h"
> >> +
> >> +/*
> >> +** sat_addu_uint32_t:
> >> +** addw\s+[atx][0-9]+,\s*a0,\s*a1
> >> +** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> >> +** neg\s+[atx][0-9]+,\s*[atx][0-9]+
> >> +** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> >> +** sext.w\s+a0,\s*a0
> >> +** ret
> >> +*/
> >> +DEF_SAT_ADDU(uint32_t)
> >> diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-4.c b/gcc/testsuite/gcc.target/riscv/sat_addu-4.c
> >> new file mode 100644
> >> index 00000000000..926f31266e3
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-4.c
> >> @@ -0,0 +1,16 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */
> >> +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
> >> +/* { dg-final { check-function-bodies "**" "" } } */
> >> +
> >> +#include "sat_arith.h"
> >> +
> >> +/*
> >> +** sat_addu_uint64_t:
> >> +** add\s+[atx][0-9]+,\s*a0,\s*a1
> >> +** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
> >> +** neg\s+[atx][0-9]+,\s*[atx][0-9]+
> >> +** or\s+a0,\s*[atx][0-9]+,\s*[atx][0-9]+
> >> +** ret
> >> +*/
> >> +DEF_SAT_ADDU(uint64_t)
> >> diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-run-1.c b/gcc/testsuite/gcc.target/riscv/sat_addu-run-1.c
> >> new file mode 100644
> >> index 00000000000..b19515c39d1
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-run-1.c
> >> @@ -0,0 +1,42 @@
> >> +/* { dg-do run { target { riscv_v } } } */
> >> +/* { dg-additional-options "-std=c99" } */
> >> +
> >> +#include "sat_arith.h"
> >> +
> >> +DEF_SAT_ADDU(uint8_t)
> >> +
> >> +int
> >> +main ()
> >> +{
> >> +  if (RUN_SAT_ADDU (uint8_t, 0, 0) != 0)
> >> +    __builtin_abort ();
> >> +
> >> +  if (RUN_SAT_ADDU (uint8_t, 0, 1) != 1)
> >> +    __builtin_abort ();
> >> +
> >> +  if (RUN_SAT_ADDU (uint8_t, 1, 1) != 2)
> >> +    __builtin_abort ();
> >> +
> >> +  if (RUN_SAT_ADDU (uint8_t, 0, 254) != 254)
> >> +    __builtin_abort ();
> >> +
> >> +  if (RUN_SAT_ADDU (uint8_t, 1, 254) != 255)
> >> +    __builtin_abort ();
> >> +
> >> +  if (RUN_SAT_ADDU (uint8_t, 2, 254) != 255)
> >> +    __builtin_abort ();
> >> +
> >> +  if (RUN_SAT_ADDU (uint8_t, 0, 255) != 255)
> >> +    __builtin_abort ();
> >> +
> >> +  if (RUN_SAT_ADDU (uint8_t, 1, 255) != 255)
> >> +    __builtin_abort ();
> >> +
> >> +  if (RUN_SAT_ADDU (uint8_t, 2, 255) != 255)
> >> +    __builtin_abort ();
> >> +
> >> +  if (RUN_SAT_ADDU (uint8_t, 255, 255) != 255)
> >> +    __builtin_abort ();
> >> +
> >> +  return 0;
> >> +}
> >> diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-run-2.c b/gcc/testsuite/gcc.target/riscv/sat_addu-run-2.c
> >> new file mode 100644
> >> index 00000000000..90073fbe4ba
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-run-2.c
> >> @@ -0,0 +1,42 @@
> >> +/* { dg-do run { target { riscv_v } } } */
> >> +/* { dg-additional-options "-std=c99" } */
> >> +
> >> +#include "sat_arith.h"
> >> +
> >> +DEF_SAT_ADDU(uint16_t)
> >> +
> >> +int
> >> +main ()
> >> +{
> >> +  if (RUN_SAT_ADDU (uint16_t, 0, 0) != 0)
> >> +    __builtin_abort ();
> >> +
> >> +  if (RUN_SAT_ADDU (uint16_t, 0, 1) != 1)
> >> +    __builtin_abort ();
> >> +
> >> +  if (RUN_SAT_ADDU (uint16_t, 1, 1) != 2)
> >> +    __builtin_abort ();
> >> +
> >> +  if (RUN_SAT_ADDU (uint16_t, 0, 65534) != 65534)
> >> +    __builtin_abort ();
> >> +
> >> +  if (RUN_SAT_ADDU (uint16_t, 1, 65534) != 65535)
> >> +    __builtin_abort ();
> >> +
> >> +  if (RUN_SAT_ADDU (uint16_t, 2, 65534) != 65535)
> >> +    __builtin_abort ();
> >> +
> >> +  if (RUN_SAT_ADDU (uint16_t, 0, 65535) != 65535)
> >> +    __builtin_abort ();
> >> +
> >> +  if (RUN_SAT_ADDU (uint16_t, 1, 65535) != 65535)
> >> +    __builtin_abort ();
> >> +
> >> +  if (RUN_SAT_ADDU (uint16_t, 2, 65535) != 65535)
> >> +    __builtin_abort ();
> >> +
> >> +  if (RUN_SAT_ADDU (uint16_t, 65535, 65535) != 65535)
> >> +    __builtin_abort ();
> >> +
> >> +  return 0;
> >> +}
> >> diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-run-3.c b/gcc/testsuite/gcc.target/riscv/sat_addu-run-3.c
> >> new file mode 100644
> >> index 00000000000..996dd3de737
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-run-3.c
> >> @@ -0,0 +1,42 @@
> >> +/* { dg-do run { target { riscv_v } } } */
> >> +/* { dg-additional-options "-std=c99" } */
> >> +
> >> +#include "sat_arith.h"
> >> +
> >> +DEF_SAT_ADDU(uint32_t)
> >> +
> >> +int
> >> +main ()
> >> +{
> >> +  if (RUN_SAT_ADDU (uint32_t, 0, 0) != 0)
> >> +    __builtin_abort ();
> >> +
> >> +  if (RUN_SAT_ADDU (uint32_t, 0, 1) != 1)
> >> +    __builtin_abort ();
> >> +
> >> +  if (RUN_SAT_ADDU (uint32_t, 1, 1) != 2)
> >> +    __builtin_abort ();
> >> +
> >> +  if (RUN_SAT_ADDU (uint32_t, 0, 4294967294) != 4294967294)
> >> +    __builtin_abort ();
> >> +
> >> +  if (RUN_SAT_ADDU (uint32_t, 1, 4294967294) != 4294967295)
> >> +    __builtin_abort ();
> >> +
> >> +  if (RUN_SAT_ADDU (uint32_t, 2, 4294967294) != 4294967295)
> >> +    __builtin_abort ();
> >> +
> >> +  if (RUN_SAT_ADDU (uint32_t, 0, 4294967295) != 4294967295)
> >> +    __builtin_abort ();
> >> +
> >> +  if (RUN_SAT_ADDU (uint32_t, 1, 4294967295) != 4294967295)
> >> +    __builtin_abort ();
> >> +
> >> +  if (RUN_SAT_ADDU (uint32_t, 2, 4294967295) != 4294967295)
> >> +    __builtin_abort ();
> >> +
> >> +  if (RUN_SAT_ADDU (uint32_t, 4294967295, 4294967295) != 4294967295)
> >> +    __builtin_abort ();
> >> +
> >> +  return 0;
> >> +}
> >> diff --git a/gcc/testsuite/gcc.target/riscv/sat_addu-run-4.c b/gcc/testsuite/gcc.target/riscv/sat_addu-run-4.c
> >> new file mode 100644
> >> index 00000000000..51a5421577b
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/riscv/sat_addu-run-4.c
> >> @@ -0,0 +1,49 @@
> >> +/* { dg-do run { target { riscv_v } } } */
> >> +/* { dg-additional-options "-std=c99" } */
> >> +
> >> +#include "sat_arith.h"
> >> +
> >> +DEF_SAT_ADDU(uint64_t)
> >> +
> >> +int
> >> +main ()
> >> +{
> >> +  if (RUN_SAT_ADDU (uint64_t, 0, 0) != 0)
> >> +    __builtin_abort ();
> >> +
> >> +  if (RUN_SAT_ADDU (uint64_t, 0, 1) != 1)
> >> +    __builtin_abort ();
> >> +
> >> +  if (RUN_SAT_ADDU (uint64_t, 1, 1) != 2)
> >> +    __builtin_abort ();
> >> +
> >> +  if (RUN_SAT_ADDU (uint64_t, 0, 18446744073709551614u)
> >> +    != 18446744073709551614u)
> >> +    __builtin_abort ();
> >> +
> >> +  if (RUN_SAT_ADDU (uint64_t, 1, 18446744073709551614u)
> >> +    != 18446744073709551615u)
> >> +    __builtin_abort ();
> >> +
> >> +  if (RUN_SAT_ADDU (uint64_t, 2, 18446744073709551614u)
> >> +    != 18446744073709551615u)
> >> +    __builtin_abort ();
> >> +
> >> +  if (RUN_SAT_ADDU (uint64_t, 0, 18446744073709551615u)
> >> +    != 18446744073709551615u)
> >> +    __builtin_abort ();
> >> +
> >> +  if (RUN_SAT_ADDU (uint64_t, 1, 18446744073709551615u)
> >> +    != 18446744073709551615u)
> >> +    __builtin_abort ();
> >> +
> >> +  if (RUN_SAT_ADDU (uint64_t, 2, 18446744073709551615u)
> >> +    != 18446744073709551615u)
> >> +    __builtin_abort ();
> >> +
> >> +  if (RUN_SAT_ADDU (uint64_t, 18446744073709551615u, 18446744073709551615u)
> >> +    != 18446744073709551615u)
> >> +    __builtin_abort ();
> >> +
> >> +  return 0;
> >> +}
> >> diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h b/gcc/testsuite/gcc.target/riscv/sat_arith.h
> >> new file mode 100644
> >> index 00000000000..4c00157685e
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
> >> @@ -0,0 +1,15 @@
> >> +#ifndef HAVE_SAT_ARITH
> >> +#define HAVE_SAT_ARITH
> >> +
> >> +#include <stdint.h>
> >> +
> >> +#define DEF_SAT_ADDU(TYPE)                       \
> >> +TYPE __attribute__((noinline))                   \
> >> +sat_addu_##TYPE (TYPE x, TYPE y)                 \
> >> +{                                                \
> >> +  return (x + y) | (-(TYPE)((TYPE)(x + y) < x)); \
> >> +}
> >> +
> >> +#define RUN_SAT_ADDU(TYPE, x, y) sat_addu_##TYPE(x, y)
> >> +
> >> +#endif
> >> --
> >> 2.34.1
> >>
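A note on the C spellings used in this patch, offered as a sketch rather than as part of the submission: for `uint8_t` and `uint16_t`, `x + y` is evaluated in `int` after the usual integer promotions, so `(x + y) < x` is never true. Read literally as C, the `@smallexample` line in the md.texi hunk therefore never saturates for `uint8_t`; the `DEF_SAT_ADDU` macro in `sat_arith.h` avoids this with the inner `(TYPE)` cast. The helpers below (`sat_addu_nocast`, `sat_addu_cast_x`, `sat_addu_cast_y`, `sat_addu_ref`, `count_mismatches`) are hypothetical names for illustration only. They compare both spellings, plus the `< Y` variant from the match.pd comment, against a widen-and-clamp reference on all 65536 `uint8_t` input pairs.

```c
#include <stdint.h>

/* Hypothetical helper names, for illustration only.  */

/* md.texi example taken literally as C: x + y is computed in int, so the
   comparison is never true and the result simply wraps.  */
static uint8_t sat_addu_nocast (uint8_t x, uint8_t y)
{
  return (x + y) | -((x + y) < x);
}

/* Spelling used by DEF_SAT_ADDU: the inner cast wraps the sum to 8 bits
   before comparing, so the comparison detects overflow.  */
static uint8_t sat_addu_cast_x (uint8_t x, uint8_t y)
{
  return (x + y) | (-(uint8_t)((uint8_t)(x + y) < x));
}

/* Variant from the match.pd comment that compares against Y instead.  */
static uint8_t sat_addu_cast_y (uint8_t x, uint8_t y)
{
  return (x + y) | (-(uint8_t)((uint8_t)(x + y) < y));
}

/* Reference: add in a wider type, then clamp to UINT8_MAX.  */
static uint8_t sat_addu_ref (uint8_t x, uint8_t y)
{
  unsigned sum = (unsigned) x + y;
  return sum > UINT8_MAX ? UINT8_MAX : (uint8_t) sum;
}

/* Count the input pairs on which F disagrees with the reference.  */
static unsigned count_mismatches (uint8_t (*f) (uint8_t, uint8_t))
{
  unsigned bad = 0;
  for (unsigned x = 0; x <= UINT8_MAX; x++)
    for (unsigned y = 0; y <= UINT8_MAX; y++)
      if (f ((uint8_t) x, (uint8_t) y)
          != sat_addu_ref ((uint8_t) x, (uint8_t) y))
        bad++;
  return bad;
}
```

With these definitions, `count_mismatches (sat_addu_cast_x)` and `count_mismatches (sat_addu_cast_y)` should both return 0, while `count_mismatches (sat_addu_nocast)` returns 32640: exactly the pairs with `x + y > 255`, on which the literal spelling wraps instead of saturating.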

Georg-Johann Lay Feb. 27, 2024, 12:07 p.m. UTC

Am 27.02.24 um 12:15 schrieb Tamar Christina:
>> Am 19.02.24 um 08:36 schrieb Richard Biener:
>>> On Sat, Feb 17, 2024 at 11:30 AM <pan2.li@intel.com> wrote:
>>>>
>>>> From: Pan Li <pan2.li@intel.com>
>>>>
>>>> This patch adds the middle-end representation for the
>>>> unsigned saturating add, i.e. the result of the add is set to
>>>> the maximum value on overflow.  It matches a pattern similar to the one below.
>>>>
>>>> SAT_ADDU (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
>>
>> Does this even try to work out the costs?
>>
>> For example, with the following code
>>
>>
>> #define T __UINT16_TYPE__
>>
>> T sat_add1 (T x, T y)
>> {
>>     return (x + y) | (- (T)((T)(x + y) < x));
>> }
>>
>> T sat_add2 (T x, T y)
>> {
>>       T z = x + y;
>>       if (z < x)
>>           z = (T) -1;
>>       return z;
>> }
>>
>> And then "avr-gcc -S -Os -dp" the code is
>>
>>
>> sat_add1:
>> 	add r22,r24	 ;  7	[c=8 l=2]  *addhi3/0
>> 	adc r23,r25
>> 	ldi r18,lo8(1)	 ;  8	[c=4 l=2]  *movhi/4
>> 	ldi r19,0
>> 	cp r22,r24	 ;  9	[c=8 l=2]  cmphi3/2
>> 	cpc r23,r25
>> 	brlo .L2		 ;  10	[c=16 l=1]  branch
>> 	ldi r19,0		 ;  31	[c=4 l=1]  movqi_insn/0
>> 	ldi r18,0		 ;  32	[c=4 l=1]  movqi_insn/0
>> .L2:
>> 	clr r24	 ;  13	[c=12 l=4]  neghi2/1
>> 	clr r25
>> 	sub r24,r18
>> 	sbc r25,r19
>> 	or r24,r22		 ;  29	[c=4 l=1]  iorqi3/0
>> 	or r25,r23		 ;  30	[c=4 l=1]  iorqi3/0
>> 	ret		 ;  35	[c=0 l=1]  return
>>
>> sat_add2:
>> 	add r22,r24	 ;  8	[c=8 l=2]  *addhi3/0
>> 	adc r23,r25
>> 	cp r22,r24	 ;  9	[c=8 l=2]  cmphi3/2
>> 	cpc r23,r25
>> 	brsh .L3		 ;  10	[c=16 l=1]  branch
>> 	ldi r22,lo8(-1)	 ;  5	[c=4 l=2]  *movhi/4
>> 	ldi r23,lo8(-1)
>> .L3:
>> 	mov r25,r23	 ;  21	[c=4 l=1]  movqi_insn/0
>> 	mov r24,r22	 ;  22	[c=4 l=1]  movqi_insn/0
>> 	ret		 ;  25	[c=0 l=1]  return
>>
>> i.e. the conditional jump is better than overly smart arithmetic
>> (smaller and faster code with less register pressure).
>> With larger types the difference is even more pronounced.
>>
> 
> *on AVR. https://godbolt.org/z/7jaExbTa8  shows the branchless code is better.
> And the branchy code will vectorize worse if at all https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492

AVR is a GCC backend

https://gcc.gnu.org/git/?p=gcc.git;a=tree;f=gcc/config/avr

and likely not the only backend where tricky arithmetic is more
expensive than branching more often than not.
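Whichever form is cheaper on a given target, `sat_add1` and `sat_add2` from the example above compute the same function, so the choice between them is a code-generation question rather than a semantic one. Below is a minimal sketch of that check, assuming GCC or Clang (for the predefined `__UINT16_TYPE__`). `sat_add_ref`, `agrees` and `count_disagreements` are hypothetical helpers. Instead of all 2^32 input pairs, the check sweeps every `x` against a strided set of `y` values plus the two `y` values on either side of the overflow boundary.

```c
#include <stdint.h>

#define T __UINT16_TYPE__

/* Branchless spelling, as in the example.  */
T sat_add1 (T x, T y)
{
  return (x + y) | (- (T)((T)(x + y) < x));
}

/* Branchy spelling, as in the example.  */
T sat_add2 (T x, T y)
{
  T z = x + y;
  if (z < x)
    z = (T) -1;
  return z;
}

/* Hypothetical reference: add in 32 bits, then clamp to 0xffff.  */
static T sat_add_ref (T x, T y)
{
  uint32_t sum = (uint32_t) x + y;
  return sum > 0xffffu ? (T) 0xffff : (T) sum;
}

/* Nonzero if both spellings match the reference for (x, y).  */
static int agrees (T x, T y)
{
  T want = sat_add_ref (x, y);
  return sat_add1 (x, y) == want && sat_add2 (x, y) == want;
}

/* Hypothetical driver: returns the number of disagreements found.  */
static unsigned long count_disagreements (void)
{
  unsigned long bad = 0;
  for (uint32_t x = 0; x <= 0xffff; x++)
    {
      /* Strided sweep over y: 0, 257, ..., 65535.  */
      for (uint32_t y = 0; y <= 0xffff; y += 257)
        bad += !agrees ((T) x, (T) y);
      /* x + y == 0xffff: the largest sum that does not overflow.  */
      bad += !agrees ((T) x, (T) (0xffffu - x));
      /* x + y == 0x10000: the first overflowing sum (for x == 0 this
         wraps to y == 0, which does not overflow).  */
      bad += !agrees ((T) x, (T) (0x10000u - x));
    }
  return bad;
}
```

`count_disagreements ()` should return 0; for example, both spellings give 65535 for `(1, 65535)` and 2 for `(1, 1)`.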

Johann


> 
> But looking at that output it just seems like it's your expansion that's inefficient.
> 
> But fair point, perhaps it should be just a normal DEF_INTERNAL_SIGNED_OPTAB_FN so that we
> provide the additional optimization only for targets that want it.
> 
> Tamar
