From patchwork Wed May 15 02:14:05 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Li, Pan2" X-Patchwork-Id: 1935239 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.a=rsa-sha256 header.s=Intel header.b=LC1acAFx; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4VfGyJ6Y2pz1ymf for ; Wed, 15 May 2024 12:14:46 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 77B4A3849AE2 for ; Wed, 15 May 2024 02:14:43 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.12]) by sourceware.org (Postfix) with ESMTPS id A08533858D35 for ; Wed, 15 May 2024 02:14:17 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org A08533858D35 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=intel.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org A08533858D35 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=198.175.65.12 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1715739261; cv=none; b=w4Y2mpaaMBT4xPZasPvaXz8ZgJYtB4ot5Fef20zaJDd4pcmPAJEXXf23flJUxetk3+Pz0xv/J+GxRZSma8o7kx7P3GlmEcHDT9k2rCKrXTcgsGnIYImkkJjzqrC4an6R5ln7Z8cg/JgizFC0xN/ijcjqptxcpjsQ4Lb2pCcvxwI= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1715739261; c=relaxed/simple; bh=UDlhstPXQo+2IKr3btFWxFkb2f5NFo74ru8fs6BQdCs=; h=DKIM-Signature:From:To:Subject:Date:Message-Id:MIME-Version; b=IyR7y0nBCDTWACZ66ZwEd2dh3l9qc6ggwvWaaGtOZOtp6DWNt+bDMsOEFzewxw4Iw6ya29JnXNtCl2Y2elqGXQjOF6mr1QZyRehSf88vP0PemAZrFVV0Wg4hnnEnqIowmrAbI4hQVfwkGbmeCDoYY9O0um0d9xILGX3Je4Nkx+0= ARC-Authentication-Results: i=1; server2.sourceware.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1715739258; x=1747275258; h=from:to:cc:subject:date:message-id:mime-version: content-transfer-encoding; bh=UDlhstPXQo+2IKr3btFWxFkb2f5NFo74ru8fs6BQdCs=; b=LC1acAFx1dv0/gynmOUUI3ft1EIwl0ltzrQdUOPldxa8swHUu4JrcLUk hmWz7w3w5Kf4rP0Dv+XLvXJ/tQpGzlkc697M3MewuAoI/mcjpjeDMNG27 SJhXOf6kCjh1EEBlvwFqFO1X7yRWpSr9GJ2EYr5vOAmRLGJ57xVg96Mur e9JRaCKSPcpLKU1bNF/CINUlMtQUAOH+b8obp0ZRGg03nIoCe/0LpVwTW zgMjduIzfgCkiCPESftEy9IdijpF6VWPxqCRE0tIVor5y3koOID2a2kvK nVGrZgzLd1BffPwF9mbtB8DM/KAi4eF+0UiF/xC/hPNhJ9lD16Giy2QHC g==; X-CSE-ConnectionGUID: 3TkDC9oVQ9O2HimqfCdQ0Q== X-CSE-MsgGUID: 75Etd+wqRQqCHaaVSw0l5g== X-IronPort-AV: E=McAfee;i="6600,9927,11073"; a="23168667" X-IronPort-AV: E=Sophos;i="6.08,160,1712646000"; d="scan'208";a="23168667" Received: from orviesa002.jf.intel.com ([10.64.159.142]) by orvoesa104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 May 2024 19:14:16 -0700 X-CSE-ConnectionGUID: Yv1Bvj1dQjmNujakdXBlyg== X-CSE-MsgGUID: tnysZp8KTGqvEkBmgnB8Fg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.08,160,1712646000"; d="scan'208";a="61716356" Received: from shvmail03.sh.intel.com ([10.239.245.20]) by orviesa002.jf.intel.com with ESMTP; 14 May 2024 19:14:12 -0700 Received: from pli-ubuntu.sh.intel.com (pli-ubuntu.sh.intel.com [10.239.159.47]) by shvmail03.sh.intel.com (Postfix) with ESMTP id 02D8F10081FF; Wed, 15 May 2024 10:14:12 +0800 (CST) From: pan2.li@intel.com To: gcc-patches@gcc.gnu.org Cc: juzhe.zhong@rivai.ai, kito.cheng@gmail.com, tamar.christina@arm.com, richard.guenther@gmail.com, hongtao.liu@intel.com, Pan Li Subject: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int Date: Wed, 15 May 2024 10:14:05 +0800 Message-Id: <20240515021407.1287623-1-pan2.li@intel.com> X-Mailer: git-send-email 2.34.1 MIME-Version: 1.0 X-Spam-Status: No, score=-11.8 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org From: Pan Li This patch would like to add the middle-end presentation for the saturation add. Aka set the result of add to the max when overflow. It will take the pattern similar as below. SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x)) Take uint8_t as example, we will have: * SAT_ADD (1, 254) => 255. * SAT_ADD (1, 255) => 255. * SAT_ADD (2, 255) => 255. * SAT_ADD (255, 255) => 255. Given below example for the unsigned scalar integer uint64_t: uint64_t sat_add_u64 (uint64_t x, uint64_t y) { return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x)); } Before this patch: uint64_t sat_add_uint64_t (uint64_t x, uint64_t y) { long unsigned int _1; _Bool _2; long unsigned int _3; long unsigned int _4; uint64_t _7; long unsigned int _10; __complex__ long unsigned int _11; ;; basic block 2, loop depth 0 ;; pred: ENTRY _11 = .ADD_OVERFLOW (x_5(D), y_6(D)); _1 = REALPART_EXPR <_11>; _10 = IMAGPART_EXPR <_11>; _2 = _10 != 0; _3 = (long unsigned int) _2; _4 = -_3; _7 = _1 | _4; return _7; ;; succ: EXIT } After this patch: uint64_t sat_add_uint64_t (uint64_t x, uint64_t y) { uint64_t _7; ;; basic block 2, loop depth 0 ;; pred: ENTRY _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call] return _7; ;; succ: EXIT } The below tests are passed for this patch: 1. The riscv fully regression tests. 3. The x86 bootstrap tests. 4. The x86 fully regression tests. PR target/51492 PR target/112600 gcc/ChangeLog: * internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD to the return true switch case(s). * internal-fn.def (SAT_ADD): Add new signed optab SAT_ADD. * match.pd: Add unsigned SAT_ADD match(es). * optabs.def (OPTAB_NL): Remove fixed-point limitation for us/ssadd. * tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_add): New extern func decl generated in match.pd match. (match_saturation_arith): New func impl to match the saturation arith. (math_opts_dom_walker::after_dom_children): Try match saturation arith when IOR expr. Signed-off-by: Pan Li --- gcc/internal-fn.cc | 1 + gcc/internal-fn.def | 2 ++ gcc/match.pd | 51 +++++++++++++++++++++++++++++++++++++++ gcc/optabs.def | 4 +-- gcc/tree-ssa-math-opts.cc | 32 ++++++++++++++++++++++++ 5 files changed, 88 insertions(+), 2 deletions(-) diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc index 0a7053c2286..73045ca8c8c 100644 --- a/gcc/internal-fn.cc +++ b/gcc/internal-fn.cc @@ -4202,6 +4202,7 @@ commutative_binary_fn_p (internal_fn fn) case IFN_UBSAN_CHECK_MUL: case IFN_ADD_OVERFLOW: case IFN_MUL_OVERFLOW: + case IFN_SAT_ADD: case IFN_VEC_WIDEN_PLUS: case IFN_VEC_WIDEN_PLUS_LO: case IFN_VEC_WIDEN_PLUS_HI: diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def index 848bb9dbff3..25badbb86e5 100644 --- a/gcc/internal-fn.def +++ b/gcc/internal-fn.def @@ -275,6 +275,8 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHS, ECF_CONST | ECF_NOTHROW, first, DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | ECF_NOTHROW, first, smulhrs, umulhrs, binary) +DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST, first, ssadd, usadd, binary) + DEF_INTERNAL_COND_FN (ADD, ECF_CONST, add, binary) DEF_INTERNAL_COND_FN (SUB, ECF_CONST, sub, binary) DEF_INTERNAL_COND_FN (MUL, ECF_CONST, smul, binary) diff --git a/gcc/match.pd b/gcc/match.pd index 07e743ae464..0f9c34fa897 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -3043,6 +3043,57 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) || POINTER_TYPE_P (itype)) && wi::eq_p (wi::to_wide (int_cst), wi::max_value (itype)))))) +/* Unsigned Saturation Add */ +(match (usadd_left_part_1 @0 @1) + (plus:c @0 @1) + (if (INTEGRAL_TYPE_P (type) + && TYPE_UNSIGNED (TREE_TYPE (@0)) + && types_match (type, TREE_TYPE (@0)) + && types_match (type, TREE_TYPE (@1))))) + +(match (usadd_left_part_2 @0 @1) + (realpart (IFN_ADD_OVERFLOW:c @0 @1)) + (if (INTEGRAL_TYPE_P (type) + && TYPE_UNSIGNED (TREE_TYPE (@0)) + && types_match (type, TREE_TYPE (@0)) + && types_match (type, TREE_TYPE (@1))))) + +(match (usadd_right_part_1 @0 @1) + (negate (convert (lt (plus:c @0 @1) @0))) + (if (INTEGRAL_TYPE_P (type) + && TYPE_UNSIGNED (TREE_TYPE (@0)) + && types_match (type, TREE_TYPE (@0)) + && types_match (type, TREE_TYPE (@1))))) + +(match (usadd_right_part_1 @0 @1) + (negate (convert (gt @0 (plus:c @0 @1)))) + (if (INTEGRAL_TYPE_P (type) + && TYPE_UNSIGNED (TREE_TYPE (@0)) + && types_match (type, TREE_TYPE (@0)) + && types_match (type, TREE_TYPE (@1))))) + +(match (usadd_right_part_2 @0 @1) + (negate (convert (ne (imagpart (IFN_ADD_OVERFLOW:c @0 @1)) integer_zerop))) + (if (INTEGRAL_TYPE_P (type) + && TYPE_UNSIGNED (TREE_TYPE (@0)) + && types_match (type, TREE_TYPE (@0)) + && types_match (type, TREE_TYPE (@1))))) + +/* We cannot merge or overload usadd_left_part_1 and usadd_left_part_2 + because the sub part of left_part_2 cannot work with right_part_1. + For example, left_part_2 pattern focus one .ADD_OVERFLOW but the + right_part_1 has nothing to do with .ADD_OVERFLOW. */ + +/* Unsigned saturation add, case 1 (branchless): + SAT_U_ADD = (X + Y) | - ((X + Y) < X) or + SAT_U_ADD = (X + Y) | - (X > (X + Y)). */ +(match (unsigned_integer_sat_add @0 @1) + (bit_ior:c (usadd_left_part_1 @0 @1) (usadd_right_part_1 @0 @1))) + +/* Unsigned saturation add, case 2 (branchless with .ADD_OVERFLOW). */ +(match (unsigned_integer_sat_add @0 @1) + (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1))) + /* x > y && x != XXX_MIN --> x > y x > y && x == XXX_MIN --> false . */ (for eqne (eq ne) diff --git a/gcc/optabs.def b/gcc/optabs.def index ad14f9328b9..3f2cb46aff8 100644 --- a/gcc/optabs.def +++ b/gcc/optabs.def @@ -111,8 +111,8 @@ OPTAB_NX(add_optab, "add$F$a3") OPTAB_NX(add_optab, "add$Q$a3") OPTAB_VL(addv_optab, "addv$I$a3", PLUS, "add", '3', gen_intv_fp_libfunc) OPTAB_VX(addv_optab, "add$F$a3") -OPTAB_NL(ssadd_optab, "ssadd$Q$a3", SS_PLUS, "ssadd", '3', gen_signed_fixed_libfunc) -OPTAB_NL(usadd_optab, "usadd$Q$a3", US_PLUS, "usadd", '3', gen_unsigned_fixed_libfunc) +OPTAB_NL(ssadd_optab, "ssadd$a3", SS_PLUS, "ssadd", '3', gen_signed_fixed_libfunc) +OPTAB_NL(usadd_optab, "usadd$a3", US_PLUS, "usadd", '3', gen_unsigned_fixed_libfunc) OPTAB_NL(sub_optab, "sub$P$a3", MINUS, "sub", '3', gen_int_fp_fixed_libfunc) OPTAB_NX(sub_optab, "sub$F$a3") OPTAB_NX(sub_optab, "sub$Q$a3") diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc index e8c804f09b7..62da1c5ee08 100644 --- a/gcc/tree-ssa-math-opts.cc +++ b/gcc/tree-ssa-math-opts.cc @@ -4086,6 +4086,36 @@ arith_overflow_check_p (gimple *stmt, gimple *cast_stmt, gimple *&use_stmt, return 0; } +extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree)); + +/* + * Try to match saturation arith pattern(s). + * 1. SAT_ADD (unsigned) + * _7 = _4 + _6; + * _8 = _4 > _7; + * _9 = (long unsigned int) _8; + * _10 = -_9; + * _12 = _7 | _10; + * => + * _12 = .SAT_ADD (_4, _6); */ +static void +match_saturation_arith (gimple_stmt_iterator *gsi, gassign *stmt) +{ + gcall *call = NULL; + + tree ops[2]; + tree lhs = gimple_assign_lhs (stmt); + + if (gimple_unsigned_integer_sat_add (lhs, ops, NULL) + && direct_internal_fn_supported_p (IFN_SAT_ADD, TREE_TYPE (lhs), + OPTIMIZE_FOR_BOTH)) + { + call = gimple_build_call_internal (IFN_SAT_ADD, 2, ops[0], ops[1]); + gimple_call_set_lhs (call, lhs); + gsi_replace (gsi, call, true); + } +} + /* Recognize for unsigned x x = y - z; if (x > y) @@ -6048,6 +6078,8 @@ math_opts_dom_walker::after_dom_children (basic_block bb) break; case BIT_IOR_EXPR: + match_saturation_arith (&gsi, as_a (stmt)); + /* fall-through */ case BIT_XOR_EXPR: match_uaddc_usubc (&gsi, stmt, code); break;