From patchwork Tue Dec 12 06:12:08 2023
X-Patchwork-Submitter: "Liu, Hongtao"
X-Patchwork-Id: 1874886
From: liuhongt
To: gcc-patches@gcc.gnu.org
Cc: crazylht@gmail.com, hjl.tools@gmail.com
Subject: [PATCH] Adjust vectorized cost for reduction.
Date: Tue, 12 Dec 2023 14:12:08 +0800
Message-Id: <20231212061208.234184-1-hongtao.liu@intel.com>

x86 doesn't support horizontal reduction instructions, so reduc_op_scal_m
is emulated with vec_extract_half + op (on the half-length vector).
Take that into account when calculating the cost of vectorization.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
No significant performance impact on SPEC2017 as measured on ICX.

Ok for trunk?

gcc/ChangeLog:

	PR target/112325
	* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
	Handle reduction vec_to_scalar.
	(ix86_vector_costs::ix86_vect_reduc_cost): New function.
---
 gcc/config/i386/i386.cc | 45 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 4b6bad37c8f..02c9a5004a1 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -24603,6 +24603,7 @@ private:
 
   /* Estimate register pressure of the vectorized code.  */
   void ix86_vect_estimate_reg_pressure ();
+  unsigned ix86_vect_reduc_cost (stmt_vec_info, tree);
   /* Number of GENERAL_REGS/SSE_REGS used in the vectorizer, it's used for
      estimation of register pressure.
      ??? Currently it's only used by vec_construct/scalar_to_vec
@@ -24845,6 +24846,12 @@ ix86_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
         if (TREE_CODE (op) == SSA_NAME)
           TREE_VISITED (op) = 0;
     }
+  /* This is a reduc_*_scal_m; x86 supports reduc_*_scal_m via emulation.  */
+  else if (kind == vec_to_scalar
+           && stmt_info
+           && vect_is_reduction (stmt_info))
+    stmt_cost = ix86_vect_reduc_cost (stmt_info, vectype);
+
   if (stmt_cost == -1)
     stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, misalign);
 
@@ -24875,6 +24882,44 @@ ix86_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
   return retval;
 }
 
+/* x86 doesn't support horizontal reduction instructions,
+   reduc_op_scal_m is emulated with vec_extract_hi + op.  */
+unsigned
+ix86_vector_costs::ix86_vect_reduc_cost (stmt_vec_info stmt_info,
+                                         tree vectype)
+{
+  gcc_assert (vectype);
+  unsigned cost = 0;
+  machine_mode mode = TYPE_MODE (vectype);
+  unsigned len = GET_MODE_SIZE (mode);
+
+  /* PSADBW is used for reduc_plus_scal_{v16qi, v8qi, v4qi}.  */
+  if (GET_MODE_INNER (mode) == E_QImode
+      && stmt_info
+      && stmt_info->stmt && gimple_code (stmt_info->stmt) == GIMPLE_ASSIGN
+      && gimple_assign_rhs_code (stmt_info->stmt) == PLUS_EXPR)
+    {
+      cost = ix86_cost->sse_op;
+      /* vec_extract_hi + vpaddb for 256/512-bit reduc_plus_scal_v*qi.  */
+      if (len > 16)
+        cost += exact_log2 (len >> 4) * ix86_cost->sse_op * 2;
+    }
+  else
+    /* vec_extract_hi + op.  */
+    cost = ix86_cost->sse_op * exact_log2 (TYPE_VECTOR_SUBPARTS (vectype)) * 2;
+
+  /* Count extra uops for TARGET_*_SPLIT_REGS.  NB: There's no target which
+     supports 512-bit vectors but has TARGET_AVX256/128_SPLIT_REGS.
+     ix86_vec_cost is not used since the reduction instruction sequence
+     consists of mixed vector-length instructions after vec_extract_hi.  */
+  if ((len == 64 && TARGET_AVX512_SPLIT_REGS)
+      || (len == 32 && TARGET_AVX256_SPLIT_REGS)
+      || (len == 16 && TARGET_AVX256_SPLIT_REGS))
+    cost += ix86_cost->sse_op;
+
+  return cost;
+}
+
 void
 ix86_vector_costs::ix86_vect_estimate_reg_pressure ()
 {