From patchwork Thu Jul 4 01:28:09 2024
X-Patchwork-Submitter: liuhongt
X-Patchwork-Id: 1956564
From: liuhongt
To: gcc-patches@gcc.gnu.org
Cc: crazylht@gmail.com, hjl.tools@gmail.com
Subject: [PATCH V2] x86: Update branch hint for Redwood Cove.
Date: Thu, 4 Jul 2024 09:28:09 +0800
Message-Id: <20240704012809.2385444-1-hongtao.liu@intel.com>

From: "H.J. Lu"

>The above reads like it would be worth splitting branch_prediction_hints
>into branch_prediction_hints_taken and branch_prediction_hints_not_taken
>given not-taken is the default and thus will just increase code size?
>According to Intel® 64 and IA-32 Architectures Optimization Reference
>Manual[1], Branch Hint is updated for Redwood Cove.

Changed.

--------cut from [1]-------------------------
Starting with the Redwood Cove microarchitecture, if the predictor has
no stored information about a branch, the branch has the Intel® SSE2
branch taken hint (i.e., instruction prefix 3EH), When the codec
decodes the branch, it flips the branch’s prediction from not-taken to
taken. It then flushes the pipeline in front of it and steers this
pipeline to fetch the taken path of the branch.
--------cut end -----------------------------

Split tune branch_prediction_hints into branch_prediction_hints_taken
and branch_prediction_hints_not_taken. When the corresponding tune is
enabled, always generate the branch hint for conditional branches.
Both tunes are disabled by default.

[1] https://www.intel.com/content/www/us/en/content-details/821612/intel-64-and-ia-32-architectures-optimization-reference-manual-volume-1.html

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

gcc/
	* config/i386/i386.cc (ix86_print_operand): Always generate
	branch hint for conditional branches.
	* config/i386/i386.h (TARGET_BRANCH_PREDICTION_HINTS): Split
	into ..
	(TARGET_BRANCH_PREDICTION_HINTS_TAKEN): .. this, and ..
	(TARGET_BRANCH_PREDICTION_HINTS_NOT_TAKEN): .. this.
	* config/i386/x86-tune.def (X86_TUNE_BRANCH_PREDICTION_HINTS):
	Split into ..
	(X86_TUNE_BRANCH_PREDICTION_HINTS_TAKEN): .. this, and ..
	(X86_TUNE_BRANCH_PREDICTION_HINTS_NOT_TAKEN): .. this.
---
 gcc/config/i386/i386.cc      | 29 +++++++++--------------------
 gcc/config/i386/i386.h       |  6 ++++--
 gcc/config/i386/x86-tune.def | 13 +++++++++++--
 3 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 1f71ed04be6..ea9cb620f8d 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -14041,7 +14041,8 @@ ix86_print_operand (FILE *file, rtx x, int code)
 
             if (!optimize
                 || optimize_function_for_size_p (cfun)
-                || !TARGET_BRANCH_PREDICTION_HINTS)
+                || (!TARGET_BRANCH_PREDICTION_HINTS_NOT_TAKEN
+                    && !TARGET_BRANCH_PREDICTION_HINTS_TAKEN))
               return;
 
             x = find_reg_note (current_output_insn, REG_BR_PROB, 0);
@@ -14050,25 +14051,13 @@ ix86_print_operand (FILE *file, rtx x, int code)
                 int pred_val = profile_probability::from_reg_br_prob_note
                                  (XINT (x, 0)).to_reg_br_prob_base ();
 
-                if (pred_val < REG_BR_PROB_BASE * 45 / 100
-                    || pred_val > REG_BR_PROB_BASE * 55 / 100)
-                  {
-                    bool taken = pred_val > REG_BR_PROB_BASE / 2;
-                    bool cputaken
-                      = final_forward_branch_p (current_output_insn) == 0;
-
-                    /* Emit hints only in the case default branch prediction
-                       heuristics would fail.  */
-                    if (taken != cputaken)
-                      {
-                        /* We use 3e (DS) prefix for taken branches and
-                           2e (CS) prefix for not taken branches.  */
-                        if (taken)
-                          fputs ("ds ; ", file);
-                        else
-                          fputs ("cs ; ", file);
-                      }
-                  }
+                bool taken = pred_val > REG_BR_PROB_BASE / 2;
+                /* We use 3e (DS) prefix for taken branches and
+                   2e (CS) prefix for not taken branches.  */
+                if (taken && TARGET_BRANCH_PREDICTION_HINTS_TAKEN)
+                  fputs ("ds ; ", file);
+                else if (!taken && TARGET_BRANCH_PREDICTION_HINTS_NOT_TAKEN)
+                  fputs ("cs ; ", file);
               }
             return;
           }
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 9ed225ec587..50ebed221dc 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -309,8 +309,10 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST];
 #define TARGET_ZERO_EXTEND_WITH_AND \
         ix86_tune_features[X86_TUNE_ZERO_EXTEND_WITH_AND]
 #define TARGET_UNROLL_STRLEN    ix86_tune_features[X86_TUNE_UNROLL_STRLEN]
-#define TARGET_BRANCH_PREDICTION_HINTS \
-        ix86_tune_features[X86_TUNE_BRANCH_PREDICTION_HINTS]
+#define TARGET_BRANCH_PREDICTION_HINTS_NOT_TAKEN \
+        ix86_tune_features[X86_TUNE_BRANCH_PREDICTION_HINTS_NOT_TAKEN]
+#define TARGET_BRANCH_PREDICTION_HINTS_TAKEN \
+        ix86_tune_features[X86_TUNE_BRANCH_PREDICTION_HINTS_TAKEN]
 #define TARGET_DOUBLE_WITH_ADD  ix86_tune_features[X86_TUNE_DOUBLE_WITH_ADD]
 #define TARGET_USE_SAHF         ix86_tune_features[X86_TUNE_USE_SAHF]
 #define TARGET_MOVX             ix86_tune_features[X86_TUNE_MOVX]
diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index 343c32c291f..3d29bffc49c 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -683,15 +683,24 @@ DEF_TUNE (X86_TUNE_NOT_VECTORMODE, "not_vectormode", m_K6)
 DEF_TUNE (X86_TUNE_AVOID_VECTOR_DECODE, "avoid_vector_decode",
           m_K8)
 
+/* X86_TUNE_BRANCH_PREDICTION_HINTS_TAKEN, starting with the Redwood Cove
+   microarchitecture, if the predictor has no stored information about a branch,
+   the branch has the Intel® SSE2 branch taken hint
+   (i.e., instruction prefix 3EH), When the codec decodes the branch, it flips
+   the branch’s prediction from not-taken to taken. It then flushes the pipeline
+   in front of it and steers this pipeline to fetch the taken path of the
+   branch.  */
+DEF_TUNE (X86_TUNE_BRANCH_PREDICTION_HINTS_TAKEN, "branch_prediction_hints_taken", m_NONE)
+
 /*****************************************************************************/
 /* This never worked well before.                                            */
 /*****************************************************************************/
 
-/* X86_TUNE_BRANCH_PREDICTION_HINTS: Branch hints were put in P4 based
+/* X86_TUNE_BRANCH_PREDICTION_HINTS_NOT_TAKEN: Branch hints were put in P4 based
    on simulation result. But after P4 was made, no performance benefit
    was observed with branch hints.  It also increases the code size.
    As a result, icc never generates branch hints.  */
-DEF_TUNE (X86_TUNE_BRANCH_PREDICTION_HINTS, "branch_prediction_hints", m_NONE)
+DEF_TUNE (X86_TUNE_BRANCH_PREDICTION_HINTS_NOT_TAKEN, "branch_prediction_hints_not_taken", m_NONE)
 
 /* X86_TUNE_QIMODE_MATH: Enable use of 8bit arithmetic.  */
 DEF_TUNE (X86_TUNE_QIMODE_MATH, "qimode_math", m_ALL)
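
For reviewers, the new prefix selection in ix86_print_operand can be
modelled outside of GCC roughly as below. This is only a sketch, not
part of the patch: kRegBrProbBase, BranchHintTunes and
select_branch_hint are hypothetical names, and the base value 10000 is
an assumption standing in for REG_BR_PROB_BASE. It shows that the old
45%/55% dead band and the final_forward_branch_p comparison are gone,
so the prefix now depends only on the predicted direction and on which
of the two tunes is enabled.

```cpp
#include <string>

// Assumed stand-in for GCC's REG_BR_PROB_BASE (hypothetical name).
constexpr int kRegBrProbBase = 10000;

// Hypothetical model of the two tune flags introduced by the patch.
struct BranchHintTunes {
  bool hints_taken;      // models TARGET_BRANCH_PREDICTION_HINTS_TAKEN
  bool hints_not_taken;  // models TARGET_BRANCH_PREDICTION_HINTS_NOT_TAKEN
};

// Returns the prefix text printed before a conditional branch,
// or an empty string when no hint is emitted.
std::string select_branch_hint(int pred_val, const BranchHintTunes &tunes) {
  // A branch counts as taken only when its probability is above one half.
  bool taken = pred_val > kRegBrProbBase / 2;
  if (taken && tunes.hints_taken)
    return "ds ; ";  // 3e (DS) prefix: branch-taken hint
  if (!taken && tunes.hints_not_taken)
    return "cs ; ";  // 2e (CS) prefix: branch-not-taken hint
  return "";         // the tune for this direction is disabled
}
```

With only the taken tune enabled, a branch predicted taken gets
"ds ; " and a branch predicted not taken gets no prefix, which avoids
the code size increase the review comment was concerned about.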