Message ID | 20240702025133.2737549-1-hongtao.liu@intel.com |
---|---|
State | New |
Headers | show |
Series | x86: Update branch hint for Redwood Cove. | expand |
On Tue, Jul 2, 2024 at 4:54 AM liuhongt <hongtao.liu@intel.com> wrote: > > From: "H.J. Lu" <hjl.tools@gmail.com> > > According to Intel® 64 and IA-32 Architectures Optimization Reference > Manual[1], Branch Hint is updated for Redwood Cove. > > --------cut from [1]------------------------- > Starting with the Redwood Cove microarchitecture, if the predictor has > no stored information about a branch, the branch has the Intel® SSE2 > branch taken hint (i.e., instruction prefix 3EH), When the codec > decodes the branch, it flips the branch’s prediction from not-taken to > taken. It then flushes the pipeline in front of it and steers this > pipeline to fetch the taken path of the branch. > --------cut end ----------------------------- The above reads like it would be worth splitting branc_prediction_hits into branch_prediction_hints_taken and branch_prediction_hints_not_taken given not-taken is the default and thus will just increase code size? > For -mtune-ctrl=branch_prediction_hints, always generate branch hint for > conditional branches, this tune is disabled by default. > > [1] https://www.intel.com/content/www/us/en/content-details/821612/intel-64-and-ia-32-architectures-optimization-reference-manual-volume-1.html > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ready push to trunk. > > gcc/ > > * config/i386/i386.cc (ix86_print_operand): Always generate > branch hint for conditional branches. > --- > gcc/config/i386/i386.cc | 24 +++++------------------- > 1 file changed, 5 insertions(+), 19 deletions(-) > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc > index 1f71ed04be6..9992b9d6186 100644 > --- a/gcc/config/i386/i386.cc > +++ b/gcc/config/i386/i386.cc > @@ -14050,25 +14050,11 @@ ix86_print_operand (FILE *file, rtx x, int code) > int pred_val = profile_probability::from_reg_br_prob_note > (XINT (x, 0)).to_reg_br_prob_base (); > > - if (pred_val < REG_BR_PROB_BASE * 45 / 100 > - || pred_val > REG_BR_PROB_BASE * 55 / 100) > - { > - bool taken = pred_val > REG_BR_PROB_BASE / 2; > - bool cputaken > - = final_forward_branch_p (current_output_insn) == 0; > - > - /* Emit hints only in the case default branch prediction > - heuristics would fail. */ > - if (taken != cputaken) > - { > - /* We use 3e (DS) prefix for taken branches and > - 2e (CS) prefix for not taken branches. */ > - if (taken) > - fputs ("ds ; ", file); > - else > - fputs ("cs ; ", file); > - } > - } > + bool taken = pred_val > REG_BR_PROB_BASE / 2; > + /* We use 3e (DS) prefix for taken branches and > + 2e (CS) prefix for not taken branches. */ > + if (taken) > + fputs ("ds ; ", file); > } > return; > } > -- > 2.31.1 >
liuhongt <hongtao.liu@intel.com> writes: > From: "H.J. Lu" <hjl.tools@gmail.com> > > According to Intel® 64 and IA-32 Architectures Optimization Reference > Manual[1], Branch Hint is updated for Redwood Cove. > > --------cut from [1]------------------------- > Starting with the Redwood Cove microarchitecture, if the predictor has > no stored information about a branch, the branch has the Intel® SSE2 > branch taken hint (i.e., instruction prefix 3EH), When the codec > decodes the branch, it flips the branch’s prediction from not-taken to > taken. It then flushes the pipeline in front of it and steers this > pipeline to fetch the taken path of the branch. > --------cut end ----------------------------- > > For -mtune-ctrl=branch_prediction_hints, always generate branch hint for > conditional branches, this tune is disabled by default. > > [1] https://www.intel.com/content/www/us/en/content-details/821612/intel-64-and-ia-32-architectures-optimization-reference-manual-volume-1.html > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ready push to trunk. So what does it do to code size? You may not want to do it with -Os. Maybe it should be only done with actual profile feedback data available, i'm not sure if the builtin heuristics are good enough to justify it and there is a risk that it is very wrong. Yes as long as it's disabled by default that's all not a problem, but it would need to be solved to enable it. -Andi
On Wed, Jul 3, 2024 at 2:10 AM Andi Kleen <ak@linux.intel.com> wrote: > > liuhongt <hongtao.liu@intel.com> writes: > > > From: "H.J. Lu" <hjl.tools@gmail.com> > > > > According to Intel® 64 and IA-32 Architectures Optimization Reference > > Manual[1], Branch Hint is updated for Redwood Cove. > > > > --------cut from [1]------------------------- > > Starting with the Redwood Cove microarchitecture, if the predictor has > > no stored information about a branch, the branch has the Intel® SSE2 > > branch taken hint (i.e., instruction prefix 3EH), When the codec > > decodes the branch, it flips the branch’s prediction from not-taken to > > taken. It then flushes the pipeline in front of it and steers this > > pipeline to fetch the taken path of the branch. > > --------cut end ----------------------------- > > > > For -mtune-ctrl=branch_prediction_hints, always generate branch hint for > > conditional branches, this tune is disabled by default. > > > > [1] https://www.intel.com/content/www/us/en/content-details/821612/intel-64-and-ia-32-architectures-optimization-reference-manual-volume-1.html > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > > Ready push to trunk. > > So what does it do to code size? > You may not want to do it with -Os. It won't be enabled by Os, because it's guarded by if (!optimize || optimize_function_for_size_p (cfun) || !TARGET_BRANCH_PREDICTION_HINTS) return; > > Maybe it should be only done with actual profile feedback data > available, i'm not sure if the builtin heuristics are good enough to > justify it and there is a risk that it is very wrong. Yes, best used with a PGO. > > Yes as long as it's disabled by default that's all not a problem, but it > would need to be solved to enable it. > > -Andi
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc index 1f71ed04be6..9992b9d6186 100644 --- a/gcc/config/i386/i386.cc +++ b/gcc/config/i386/i386.cc @@ -14050,25 +14050,11 @@ ix86_print_operand (FILE *file, rtx x, int code) int pred_val = profile_probability::from_reg_br_prob_note (XINT (x, 0)).to_reg_br_prob_base (); - if (pred_val < REG_BR_PROB_BASE * 45 / 100 - || pred_val > REG_BR_PROB_BASE * 55 / 100) - { - bool taken = pred_val > REG_BR_PROB_BASE / 2; - bool cputaken - = final_forward_branch_p (current_output_insn) == 0; - - /* Emit hints only in the case default branch prediction - heuristics would fail. */ - if (taken != cputaken) - { - /* We use 3e (DS) prefix for taken branches and - 2e (CS) prefix for not taken branches. */ - if (taken) - fputs ("ds ; ", file); - else - fputs ("cs ; ", file); - } - } + bool taken = pred_val > REG_BR_PROB_BASE / 2; + /* We use 3e (DS) prefix for taken branches and + 2e (CS) prefix for not taken branches. */ + if (taken) + fputs ("ds ; ", file); } return; }
From: "H.J. Lu" <hjl.tools@gmail.com> According to Intel® 64 and IA-32 Architectures Optimization Reference Manual[1], Branch Hint is updated for Redwood Cove. --------cut from [1]------------------------- Starting with the Redwood Cove microarchitecture, if the predictor has no stored information about a branch, the branch has the Intel® SSE2 branch taken hint (i.e., instruction prefix 3EH), When the codec decodes the branch, it flips the branch’s prediction from not-taken to taken. It then flushes the pipeline in front of it and steers this pipeline to fetch the taken path of the branch. --------cut end ----------------------------- For -mtune-ctrl=branch_prediction_hints, always generate branch hint for conditional branches, this tune is disabled by default. [1] https://www.intel.com/content/www/us/en/content-details/821612/intel-64-and-ia-32-architectures-optimization-reference-manual-volume-1.html Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ready push to trunk. gcc/ * config/i386/i386.cc (ix86_print_operand): Always generate branch hint for conditional branches. --- gcc/config/i386/i386.cc | 24 +++++------------------- 1 file changed, 5 insertions(+), 19 deletions(-)