Message ID | 20240827032001.4000076-1-hongtao.liu@intel.com |
---|---|
State | New |
Headers | show |
Series | [v2,1/2] Enhance cse_insn to handle all-zeros and all-ones for vector mode. | expand |
On Tue, Aug 27, 2024 at 5:20 AM liuhongt <hongtao.liu@intel.com> wrote: > > > You are possibly overwriting src_related_elt - I'd suggest to either break > > here or do the loop below for each found elt? > Changed. > > > Do we know that will always succeed? > 1) validate_subreg allows subreg for 2 vector modes with same component modes. > 2) gen_lowpart in cse.cc is defined as gen_lowpart_if_possible, > If it fails, it returns 0, just fallback to src_related = 0. > > > So on the GIMPLE side we are trying to handle such cases by maintaining > > only a single element in the hashtables, thus hash and compare them > > the same - them in this case (vec_dup:M (reg:c)) and (vec_dup:N (reg:c)), > > leaving it up to the consumer to reject or pun mismatches. > rtx_cost will be used to decided if it's profitable > ((subreg:M (reg: N) 0) vs (vec_dup:M (reg:c))), if M and N is > not tieable, rtx_cost will be expensive and failed the replacement. I see. > > > > For constants that would hold even more - note CSEing vs. duplicating > > constants might not be universally good. > Assume you mean (reg:c) in (vec_dup:M (reg:c) is from a constant, the later I was refering to the const0_rtx and constm1_rtx hunk at the start of the patch (maybe split the patch for bisection purposes?). And mostly from the context of RA and register pressure - though I would guess RA should be good enough to handle re-materialization here but still there's a single vs. a split live-range. > rtl optimizer (.i.e forwprop/combine) will try to do the further simplication > for the constants if rtx_cost is profitable.) > For const_vector, it handled by the other codes > > 5063 /* Try to re-materialize a vec_dup with an existing constant. */ > 5064 rtx src_elt; > 5065 if ((!src_eqv_here || CONSTANT_P (src_eqv_here)) > 5066 && const_vec_duplicate_p (src, &src_elt)) > 5067 { > Another general remark would be - did you try to handle these transforms in postreload CSE? That's supposed to detect "noop moves" and in fact all this does for x86 is to re-use a hardreg? I'll leave actual review to CSE/RTL experts. Thanks, Richard. > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ok for trunk? > > Also try to handle redundant broadcasts when there's already a > broadcast to a bigger mode with exactly the same component value. > For broadcast, component mode needs to be the same. > For all-zeros/ones, only need to check the bigger mode. > > gcc/ChangeLog: > > PR rtl-optimization/92080 > * cse.cc (cse_insn): Handle all-ones/all-zeros, and vec_dup > with variables. > --- > gcc/cse.cc | 82 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 82 insertions(+) > > diff --git a/gcc/cse.cc b/gcc/cse.cc > index 65794ac5f2c..fab2f515f8c 100644 > --- a/gcc/cse.cc > +++ b/gcc/cse.cc > @@ -4870,6 +4870,50 @@ cse_insn (rtx_insn *insn) > } > } > > + /* Try to handle special const_vector with elt 0 or -1. > + They can be represented with different modes, and can be cse. */ > + if (src_const && src_related == 0 && CONST_VECTOR_P (src_const) > + && (src_const == CONST0_RTX (mode) > + || src_const == CONSTM1_RTX (mode)) > + && GET_MODE_CLASS (mode) == MODE_VECTOR_INT) > + { > + machine_mode mode_iter; > + > + for (int l = 0; l != 2; l++) > + { > + FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_VECTOR_INT) > + { > + if (maybe_lt (GET_MODE_SIZE (mode_iter), > + GET_MODE_SIZE (mode))) > + continue; > + > + rtx src_const_iter = (src_const == CONST0_RTX (mode) > + ? CONST0_RTX (mode_iter) > + : CONSTM1_RTX (mode_iter)); > + > + struct table_elt *const_elt > + = lookup (src_const_iter, HASH (src_const_iter, mode_iter), > + mode_iter); > + > + if (const_elt == 0) > + continue; > + > + for (const_elt = const_elt->first_same_value; > + const_elt; const_elt = const_elt->next_same_value) > + if (REG_P (const_elt->exp)) > + { > + src_related = gen_lowpart (mode, const_elt->exp); > + break; > + } > + > + if (src_related != 0) > + break; > + } > + if (src_related != 0) > + break; > + } > + } > + > /* See if we have a CONST_INT that is already in a register in a > wider mode. */ > > @@ -5041,6 +5085,44 @@ cse_insn (rtx_insn *insn) > } > } > > + /* Try to find something like (vec_dup:v16si (reg:c)) > + for (vec_dup:v8si (reg:c)). */ > + if (src_related == 0 > + && VECTOR_MODE_P (mode) > + && GET_CODE (src) == VEC_DUPLICATE) > + { > + poly_uint64 nunits = GET_MODE_NUNITS (GET_MODE (src)) * 2; > + rtx inner_elt = XEXP (src, 0); > + machine_mode result_mode; > + struct table_elt *src_related_elt = NULL;; > + while (related_vector_mode (mode, GET_MODE_INNER (mode), > + nunits).exists (&result_mode)) > + { > + rtx vec_dup = gen_rtx_VEC_DUPLICATE (result_mode, inner_elt); > + struct table_elt* tmp = lookup (vec_dup, HASH (vec_dup, result_mode), > + result_mode); > + if (tmp) > + { > + src_related_elt = tmp; > + break; > + } > + nunits *= 2; > + } > + > + if (src_related_elt) > + { > + for (src_related_elt = src_related_elt->first_same_value; > + src_related_elt; > + src_related_elt = src_related_elt->next_same_value) > + if (REG_P (src_related_elt->exp)) > + { > + src_related = gen_lowpart (mode, src_related_elt->exp); > + break; > + } > + } > + } > + > + > if (src == src_folded) > src_folded = 0; > > -- > 2.31.1 >
diff --git a/gcc/cse.cc b/gcc/cse.cc index 65794ac5f2c..fab2f515f8c 100644 --- a/gcc/cse.cc +++ b/gcc/cse.cc @@ -4870,6 +4870,50 @@ cse_insn (rtx_insn *insn) } } + /* Try to handle special const_vector with elt 0 or -1. + They can be represented with different modes, and can be cse. */ + if (src_const && src_related == 0 && CONST_VECTOR_P (src_const) + && (src_const == CONST0_RTX (mode) + || src_const == CONSTM1_RTX (mode)) + && GET_MODE_CLASS (mode) == MODE_VECTOR_INT) + { + machine_mode mode_iter; + + for (int l = 0; l != 2; l++) + { + FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_VECTOR_INT) + { + if (maybe_lt (GET_MODE_SIZE (mode_iter), + GET_MODE_SIZE (mode))) + continue; + + rtx src_const_iter = (src_const == CONST0_RTX (mode) + ? CONST0_RTX (mode_iter) + : CONSTM1_RTX (mode_iter)); + + struct table_elt *const_elt + = lookup (src_const_iter, HASH (src_const_iter, mode_iter), + mode_iter); + + if (const_elt == 0) + continue; + + for (const_elt = const_elt->first_same_value; + const_elt; const_elt = const_elt->next_same_value) + if (REG_P (const_elt->exp)) + { + src_related = gen_lowpart (mode, const_elt->exp); + break; + } + + if (src_related != 0) + break; + } + if (src_related != 0) + break; + } + } + /* See if we have a CONST_INT that is already in a register in a wider mode. */ @@ -5041,6 +5085,44 @@ cse_insn (rtx_insn *insn) } } + /* Try to find something like (vec_dup:v16si (reg:c)) + for (vec_dup:v8si (reg:c)). */ + if (src_related == 0 + && VECTOR_MODE_P (mode) + && GET_CODE (src) == VEC_DUPLICATE) + { + poly_uint64 nunits = GET_MODE_NUNITS (GET_MODE (src)) * 2; + rtx inner_elt = XEXP (src, 0); + machine_mode result_mode; + struct table_elt *src_related_elt = NULL;; + while (related_vector_mode (mode, GET_MODE_INNER (mode), + nunits).exists (&result_mode)) + { + rtx vec_dup = gen_rtx_VEC_DUPLICATE (result_mode, inner_elt); + struct table_elt* tmp = lookup (vec_dup, HASH (vec_dup, result_mode), + result_mode); + if (tmp) + { + src_related_elt = tmp; + break; + } + nunits *= 2; + } + + if (src_related_elt) + { + for (src_related_elt = src_related_elt->first_same_value; + src_related_elt; + src_related_elt = src_related_elt->next_same_value) + if (REG_P (src_related_elt->exp)) + { + src_related = gen_lowpart (mode, src_related_elt->exp); + break; + } + } + } + + if (src == src_folded) src_folded = 0;