From patchwork Fri Jun 21 03:52:52 2024
X-Patchwork-Submitter: "Li, Pan2"
X-Patchwork-Id: 1950525
From: pan2.li@intel.com
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai, kito.cheng@gmail.com, richard.guenther@gmail.com,
 jeffreyalaw@gmail.com, rdapp.gcc@gmail.com, Pan Li
Subject: [PATCH v1] Ifcvt: Add cond tree reconcile for truncated .SAT_SUB
Date: Fri, 21 Jun 2024 11:52:52 +0800
Message-Id: <20240621035252.742099-1-pan2.li@intel.com>

From: Pan Li

The zip benchmark of coremark-pro has one SAT_SUB-like pattern whose
result is truncated, as below:

void test (uint16_t *x, unsigned b, unsigned n)
{
  unsigned a = 0;
  register uint16_t *p = x;

  do {
    a = *--p;
    *p = (uint16_t)(a >= b ? a - b : 0); // Truncate the result of SAT_SUB
  } while (--n);
}

After the ifcvt pass it produces the gimple below, which does not hit
any SAT_SUB pattern and therefore cannot be vectorized to SAT_SUB.

  _2 = a_11 - b_12(D);
  iftmp.0_13 = (short unsigned int) _2;
  _18 = a_11 >= b_12(D);
  iftmp.0_5 = _18 ? iftmp.0_13 : 0;

This patch reconciles the above pattern so that it matches the SAT_SUB
pattern.  The underlying vect pass is then able to vectorize the SAT_SUB.

  _2 = a_11 - b_12(D);
  _18 = a_11 >= b_12(D);
  _pattmp = _18 ? _2 : 0; // .SAT_SUB pattern
  iftmp.0_13 = (short unsigned int) _pattmp;
  iftmp.0_5 = iftmp.0_13;

The following tests were run for this patch.
1. The rv64gcv full regression tests.
2. The rv64gcv build with glibc.
3. The x86 bootstrap tests.
4. The x86 full regression tests.

gcc/ChangeLog:

        * match.pd: Add new match for truncated unsigned sat_sub.
        * tree-if-conv.cc (gimple_truncated_unsigned_integer_sat_sub):
        New external decl from match.pd.
        (tree_if_cond_reconcile_unsigned_integer_sat_sub): New func impl
        to reconcile the truncated sat_sub pattern.
        (tree_if_cond_reconcile): New func impl to reconcile.
        (pass_if_conversion::execute): Try to reconcile after ifcvt.

Signed-off-by: Pan Li
---
 gcc/match.pd        |  9 +++++
 gcc/tree-if-conv.cc | 83 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 92 insertions(+)

diff --git a/gcc/match.pd b/gcc/match.pd
index 3d0689c9312..9617a5f9d5e 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3210,6 +3210,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
       && types_match (type, @0, @1))))
 
+/* Unsigned saturation sub and then truncated, aka:
+   Truncated = X >= Y ? (Other Type) (X - Y) : 0.
+ */
+(match (truncated_unsigned_integer_sat_sub @0 @1)
+ (cond (ge @0 @1) (convert (minus @0 @1)) integer_zerop)
+ (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
+      && types_match (@0, @1)
+      && tree_int_cst_lt (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (@0))))))
+
 /* x > y && x != XXX_MIN --> x > y
    x > y && x == XXX_MIN --> false . */
 (for eqne (eq ne)
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 57992b6deca..535743130f2 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -3738,6 +3738,87 @@ bitfields_to_lower_p (class loop *loop,
   return !reads_to_lower.is_empty () || !writes_to_lower.is_empty ();
 }
 
+extern bool gimple_truncated_unsigned_integer_sat_sub (tree, tree*,
+                                                       tree (*)(tree));
+
+/*
+ * Try to reconcile the stmt pattern as below to match the SAT_SUB
+ * in vectorization.  If and only if the related internal_fn has
+ * been implemented already.
+ *
+ * The reconcile will insert one new stmt named 'a' in below example,
+ * replace the stmt '4' by new added stmt 'b' as well.  Then the stmt
+ * pattern is able to hit the SAT_SUB pattern in the underlying pass.
+ *
+ * 1.   _2 = a_11 - b_12(D);
+ * 2.   iftmp.0_13 = (short unsigned int) _2;
+ * 3.   _18 = a_11 >= b_12(D);
+ * 4.   iftmp.0_5 = _18 ? iftmp.0_13 : 0;
+ * ==>
+ * 1.   _2 = a_11 - b_12(D);
+ * 3.   _18 = a_11 >= b_12(D);
+ * a.   _pattmp = _18 ? _2 : 0;                     // New insertion
+ * 2.   iftmp.0_13 = (short unsigned int) _pattmp;  // Move before
+ * b.   iftmp.0_5 = iftmp.0_13;
+ *      == Replace ==> 4. iftmp.0_5 = _18 ? iftmp.0_13 : 0;
+ */
+static void
+tree_if_cond_reconcile_unsigned_integer_sat_sub (gimple_stmt_iterator *gsi,
+                                                 gassign *stmt)
+{
+  tree ops[2];
+  tree lhs = gimple_assign_lhs (stmt);
+  bool supported_p = direct_internal_fn_supported_p (IFN_SAT_SUB,
+                                                     TREE_TYPE (lhs),
+                                                     OPTIMIZE_FOR_BOTH);
+
+  if (supported_p && gimple_truncated_unsigned_integer_sat_sub (lhs, ops, NULL))
+    {
+      tree cond = gimple_assign_rhs1 (stmt); // aka _18
+      tree truncated = gimple_assign_rhs2 (stmt); // aka iftmp.0_13
+      gimple *stmt_2 = SSA_NAME_DEF_STMT (truncated);
+      tree minus = gimple_assign_rhs1 (stmt_2); // aka _2
+      tree raw_type = TREE_TYPE (minus);
+      tree zero = build_zero_cst (raw_type);
+      tree tmp = make_temp_ssa_name (raw_type, NULL, "sat_sub_tmp");
+
+      /* For stmt 'a' in above example */
+      gimple *stmt_a = gimple_build_assign (tmp, COND_EXPR, cond, minus, zero);
+      gsi_insert_before (gsi, stmt_a, GSI_SAME_STMT);
+      update_stmt (stmt_a);
+
+      /* For stmt '2' in above example */
+      gimple_stmt_iterator stmt_2_gsi = gsi_for_stmt (stmt_2);
+      gsi_move_before (&stmt_2_gsi, gsi, GSI_SAME_STMT);
+      gimple_assign_set_rhs1 (stmt_2, tmp);
+      update_stmt (stmt_2);
+
+      /* For stmt 'b' in above example */
+      gimple *stmt_b = gimple_build_assign (lhs, NOP_EXPR, truncated);
+      gsi_replace (gsi, stmt_b, /* update_eh_info */ true);
+      update_stmt (stmt_b);
+    }
+}
+
+static void
+tree_if_cond_reconcile (function *fun)
+{
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, fun)
+    {
+      gimple_stmt_iterator gsi;
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+        {
+          gimple *stmt = gsi_stmt (gsi);
+
+          if (is_gimple_assign (stmt))
+            {
+              gassign *assign = dyn_cast <gassign *> (stmt);
+              tree_if_cond_reconcile_unsigned_integer_sat_sub (&gsi, assign);
+            }
+        }
+    }
+}
 
 /* If-convert LOOP when it is legal.  For the moment this pass has no
    profitability analysis.  Returns non-zero todo flags when something
@@ -4063,6 +4144,8 @@ pass_if_conversion::execute (function *fun)
         }
     }
 
+  tree_if_cond_reconcile (fun);
+
   return 0;
 }
 
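
Note on the reordering: the reconcile moves the narrowing conversion
from before the select to after it.  The scalar sketch below is not part
of the patch; it only illustrates why the two shapes should give the
same value, assuming 32-bit unsigned operands narrowed to uint16_t as in
the coremark-pro test case.  All helper names (sat_sub_u32,
truncate_then_select, select_then_truncate, shapes_agree) are
hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of unsigned saturating subtraction in the wide type.
   This is the value the "_18 ? _2 : 0" statement computes.  */
static uint32_t sat_sub_u32 (uint32_t a, uint32_t b)
{
  return a >= b ? a - b : 0;
}

/* Shape seen after ifcvt: truncate the difference first, then select.  */
static uint16_t truncate_then_select (uint32_t a, uint32_t b)
{
  uint16_t truncated = (uint16_t) (a - b);
  return a >= b ? truncated : 0;
}

/* Shape after the reconcile: select in the wide type, then truncate.  */
static uint16_t select_then_truncate (uint32_t a, uint32_t b)
{
  return (uint16_t) sat_sub_u32 (a, b);
}

/* Compare both shapes on a small set of edge-case inputs.
   Returns 1 when every pair agrees, 0 on the first mismatch.  */
static int shapes_agree (void)
{
  static const uint32_t samples[] = {
    0u, 1u, 2u, 0xFFFFu, 0x10000u, 0x10001u,
    0x7FFFFFFFu, 0x80000000u, 0xFFFFFFFFu
  };
  const size_t n = sizeof samples / sizeof samples[0];

  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      if (truncate_then_select (samples[i], samples[j])
          != select_then_truncate (samples[i], samples[j]))
        return 0;

  return 1;
}
```

Both branches of the select commute with the truncation: when a >= b
each shape truncates the same difference, and otherwise each yields
zero.  Calling shapes_agree () from a test driver is one way to
spot-check this on the sampled inputs; it is not an exhaustive proof.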