Message ID | 20240515082054.3934069-3-hongyu.wang@intel.com |
---|---|
State | New |
Headers | show |
Series | Support Intel APX CCMP | expand |
CC'd Richard for ccmp part as previously it is added only for aarch64. The original logic will not interrupted since if aarch64_gen_ccmp_first succeeded, aarch64_gen_ccmp_next will also success, the cmp/fcmp and ccmp/fccmp supports all GPI/GPF, and the prepare_operand will fixup the input that cmp supports but ccmp not, so ret/ret2 will all be valid when comparing cost. Thanks in advance. Hongyu Wang <hongyu.wang@intel.com> 于2024年5月15日周三 16:22写道: > > For general ccmp scenario, the tree sequence is like > > _1 = (a < b) > _2 = (c < d) > _3 = _1 & _2 > > current ccmp expanding will try to swap compare order for _1 and _2, > compare the cost/cost2 between compare _1 and _2 first, then return the > sequence with lower cost. > > For x86 ccmp, we don't support FP compare as ccmp operand, but we > support fp comi + int ccmp sequence. With current cost comparison > model, the fp comi + int ccmp can never be generated since it doesn't > check whether expand_ccmp_next returns available result and the rtl > cost for the empty ccmp sequence is always smaller. > > Check the expand_ccmp_next result ret and ret2, returns the valid one > before cost comparison. > > gcc/ChangeLog: > > * ccmp.cc (expand_ccmp_expr_1): Check ret and ret2 of > expand_ccmp_next, returns the valid one first before > comparing cost. > --- > gcc/ccmp.cc | 12 +++++++++++- > 1 file changed, 11 insertions(+), 1 deletion(-) > > diff --git a/gcc/ccmp.cc b/gcc/ccmp.cc > index 7cb525addf4..4b424220068 100644 > --- a/gcc/ccmp.cc > +++ b/gcc/ccmp.cc > @@ -247,7 +247,17 @@ expand_ccmp_expr_1 (gimple *g, rtx_insn **prep_seq, rtx_insn **gen_seq) > cost2 = seq_cost (prep_seq_2, speed_p); > cost2 += seq_cost (gen_seq_2, speed_p); > } > - if (cost2 < cost1) > + > + /* For x86 target the ccmp does not support fp operands, but > + have fcomi insn that can produce eflags and then do int > + ccmp. So if one of the op is fp compare, ret1 or ret2 can > + fail, and the cost of the corresponding empty seq will > + always be smaller, then the NULL sequence will be returned. > + Add check for ret and ret2, returns the available one if > + the other is NULL. */ > + if ((!ret && ret2) > + || (!(ret && !ret2) > + && cost2 < cost1)) > { > *prep_seq = prep_seq_2; > *gen_seq = gen_seq_2; > -- > 2.31.1 >
Gently ping for this :) Hi Richard, Is it OK to adopt the ccmp change? Or did you know who can help to review this part? Thanks. Hongyu Wang <wwwhhhyyy333@gmail.com> 于2024年5月15日周三 16:25写道: > > CC'd Richard for ccmp part as previously it is added only for aarch64. > The original logic will not interrupted since if > aarch64_gen_ccmp_first succeeded, aarch64_gen_ccmp_next will also > success, the cmp/fcmp and ccmp/fccmp supports all GPI/GPF, and the > prepare_operand will fixup the input that cmp supports but ccmp not, > so ret/ret2 will all be valid when comparing cost. > Thanks in advance. > > Hongyu Wang <hongyu.wang@intel.com> 于2024年5月15日周三 16:22写道: > > > > For general ccmp scenario, the tree sequence is like > > > > _1 = (a < b) > > _2 = (c < d) > > _3 = _1 & _2 > > > > current ccmp expanding will try to swap compare order for _1 and _2, > > compare the cost/cost2 between compare _1 and _2 first, then return the > > sequence with lower cost. > > > > For x86 ccmp, we don't support FP compare as ccmp operand, but we > > support fp comi + int ccmp sequence. With current cost comparison > > model, the fp comi + int ccmp can never be generated since it doesn't > > check whether expand_ccmp_next returns available result and the rtl > > cost for the empty ccmp sequence is always smaller. > > > > Check the expand_ccmp_next result ret and ret2, returns the valid one > > before cost comparison. > > > > gcc/ChangeLog: > > > > * ccmp.cc (expand_ccmp_expr_1): Check ret and ret2 of > > expand_ccmp_next, returns the valid one first before > > comparing cost. > > --- > > gcc/ccmp.cc | 12 +++++++++++- > > 1 file changed, 11 insertions(+), 1 deletion(-) > > > > diff --git a/gcc/ccmp.cc b/gcc/ccmp.cc > > index 7cb525addf4..4b424220068 100644 > > --- a/gcc/ccmp.cc > > +++ b/gcc/ccmp.cc > > @@ -247,7 +247,17 @@ expand_ccmp_expr_1 (gimple *g, rtx_insn **prep_seq, rtx_insn **gen_seq) > > cost2 = seq_cost (prep_seq_2, speed_p); > > cost2 += seq_cost (gen_seq_2, speed_p); > > } > > - if (cost2 < cost1) > > + > > + /* For x86 target the ccmp does not support fp operands, but > > + have fcomi insn that can produce eflags and then do int > > + ccmp. So if one of the op is fp compare, ret1 or ret2 can > > + fail, and the cost of the corresponding empty seq will > > + always be smaller, then the NULL sequence will be returned. > > + Add check for ret and ret2, returns the available one if > > + the other is NULL. */ > > + if ((!ret && ret2) > > + || (!(ret && !ret2) > > + && cost2 < cost1)) > > { > > *prep_seq = prep_seq_2; > > *gen_seq = gen_seq_2; > > -- > > 2.31.1 > >
Gently ping :) Hi Richard, Is it OK to adopt the ccmp change? Or did you know who can help to review this part? Thanks. Hongyu Wang <wwwhhhyyy333@gmail.com> 于2024年5月23日周四 16:27写道: > > Gently ping for this :) > Hi Richard, Is it OK to adopt the ccmp change? Or did you know who can > help to review this part? > Thanks. > > Hongyu Wang <wwwhhhyyy333@gmail.com> 于2024年5月15日周三 16:25写道: > > > > CC'd Richard for ccmp part as previously it is added only for aarch64. > > The original logic will not interrupted since if > > aarch64_gen_ccmp_first succeeded, aarch64_gen_ccmp_next will also > > success, the cmp/fcmp and ccmp/fccmp supports all GPI/GPF, and the > > prepare_operand will fixup the input that cmp supports but ccmp not, > > so ret/ret2 will all be valid when comparing cost. > > Thanks in advance. > > > > Hongyu Wang <hongyu.wang@intel.com> 于2024年5月15日周三 16:22写道: > > > > > > For general ccmp scenario, the tree sequence is like > > > > > > _1 = (a < b) > > > _2 = (c < d) > > > _3 = _1 & _2 > > > > > > current ccmp expanding will try to swap compare order for _1 and _2, > > > compare the cost/cost2 between compare _1 and _2 first, then return the > > > sequence with lower cost. > > > > > > For x86 ccmp, we don't support FP compare as ccmp operand, but we > > > support fp comi + int ccmp sequence. With current cost comparison > > > model, the fp comi + int ccmp can never be generated since it doesn't > > > check whether expand_ccmp_next returns available result and the rtl > > > cost for the empty ccmp sequence is always smaller. > > > > > > Check the expand_ccmp_next result ret and ret2, returns the valid one > > > before cost comparison. > > > > > > gcc/ChangeLog: > > > > > > * ccmp.cc (expand_ccmp_expr_1): Check ret and ret2 of > > > expand_ccmp_next, returns the valid one first before > > > comparing cost. > > > --- > > > gcc/ccmp.cc | 12 +++++++++++- > > > 1 file changed, 11 insertions(+), 1 deletion(-) > > > > > > diff --git a/gcc/ccmp.cc b/gcc/ccmp.cc > > > index 7cb525addf4..4b424220068 100644 > > > --- a/gcc/ccmp.cc > > > +++ b/gcc/ccmp.cc > > > @@ -247,7 +247,17 @@ expand_ccmp_expr_1 (gimple *g, rtx_insn **prep_seq, rtx_insn **gen_seq) > > > cost2 = seq_cost (prep_seq_2, speed_p); > > > cost2 += seq_cost (gen_seq_2, speed_p); > > > } > > > - if (cost2 < cost1) > > > + > > > + /* For x86 target the ccmp does not support fp operands, but > > > + have fcomi insn that can produce eflags and then do int > > > + ccmp. So if one of the op is fp compare, ret1 or ret2 can > > > + fail, and the cost of the corresponding empty seq will > > > + always be smaller, then the NULL sequence will be returned. > > > + Add check for ret and ret2, returns the available one if > > > + the other is NULL. */ > > > + if ((!ret && ret2) > > > + || (!(ret && !ret2) > > > + && cost2 < cost1)) > > > { > > > *prep_seq = prep_seq_2; > > > *gen_seq = gen_seq_2; > > > -- > > > 2.31.1 > > >
Hongyu Wang <wwwhhhyyy333@gmail.com> writes: > CC'd Richard for ccmp part as previously it is added only for aarch64. > The original logic will not interrupted since if > aarch64_gen_ccmp_first succeeded, aarch64_gen_ccmp_next will also > success, the cmp/fcmp and ccmp/fccmp supports all GPI/GPF, and the > prepare_operand will fixup the input that cmp supports but ccmp not, > so ret/ret2 will all be valid when comparing cost. > Thanks in advance. Sorry for the slow review. > Hongyu Wang <hongyu.wang@intel.com> 于2024年5月15日周三 16:22写道: >> >> For general ccmp scenario, the tree sequence is like >> >> _1 = (a < b) >> _2 = (c < d) >> _3 = _1 & _2 >> >> current ccmp expanding will try to swap compare order for _1 and _2, >> compare the cost/cost2 between compare _1 and _2 first, then return the >> sequence with lower cost. >> >> For x86 ccmp, we don't support FP compare as ccmp operand, but we >> support fp comi + int ccmp sequence. With current cost comparison >> model, the fp comi + int ccmp can never be generated since it doesn't >> check whether expand_ccmp_next returns available result and the rtl >> cost for the empty ccmp sequence is always smaller. >> >> Check the expand_ccmp_next result ret and ret2, returns the valid one >> before cost comparison. >> >> gcc/ChangeLog: >> >> * ccmp.cc (expand_ccmp_expr_1): Check ret and ret2 of >> expand_ccmp_next, returns the valid one first before >> comparing cost. >> --- >> gcc/ccmp.cc | 12 +++++++++++- >> 1 file changed, 11 insertions(+), 1 deletion(-) >> >> diff --git a/gcc/ccmp.cc b/gcc/ccmp.cc >> index 7cb525addf4..4b424220068 100644 >> --- a/gcc/ccmp.cc >> +++ b/gcc/ccmp.cc >> @@ -247,7 +247,17 @@ expand_ccmp_expr_1 (gimple *g, rtx_insn **prep_seq, rtx_insn **gen_seq) >> cost2 = seq_cost (prep_seq_2, speed_p); >> cost2 += seq_cost (gen_seq_2, speed_p); >> } >> - if (cost2 < cost1) >> + >> + /* For x86 target the ccmp does not support fp operands, but >> + have fcomi insn that can produce eflags and then do int >> + ccmp. So if one of the op is fp compare, ret1 or ret2 can >> + fail, and the cost of the corresponding empty seq will >> + always be smaller, then the NULL sequence will be returned. >> + Add check for ret and ret2, returns the available one if >> + the other is NULL. */ I think the more fundamental point is that the cost of a failed expansion isn't meaningful. So how about: /* It's possible that one expansion succeeds and the other fails. For example, x86 has int ccmp but not fp ccmp, and so a combined fp and int comparison must be ordered such that the fp comparison happens first. The costs are not meaningful for failed expansions. */ >> + if ((!ret && ret2) >> + || (!(ret && !ret2) >> + && cost2 < cost1)) I think this simplifies to: if (ret2 && (!ret1 || cost2 < cost1)) OK with those changes, thanks. Richard >> { >> *prep_seq = prep_seq_2; >> *gen_seq = gen_seq_2; >> -- >> 2.31.1 >>
Thanks, this is the patch I'm going to check-in. For general ccmp scenario, the tree sequence is like _1 = (a < b) _2 = (c < d) _3 = _1 & _2 current ccmp expanding will try to swap compare order for _1 and _2, compare the expansion cost/cost2 for expanding _1 or _2 first, then return the sequence with lower cost. It is possible that one expansion succeeds and the other fails. For example, x86 has int ccmp but not fp ccmp, so a combined fp and int comparison must be ordered such that the fp comparison happens first. The costs are not meaningful for failed expansions. Check the expand_ccmp_next result ret and ret2, returns the valid one before cost comparison. gcc/ChangeLog: * ccmp.cc (expand_ccmp_expr_1): Check ret and ret2 of expand_ccmp_next, returns the valid one first instead of comparing cost. --- gcc/ccmp.cc | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/gcc/ccmp.cc b/gcc/ccmp.cc index 7cb525addf4..4d50708d986 100644 --- a/gcc/ccmp.cc +++ b/gcc/ccmp.cc @@ -247,7 +247,15 @@ expand_ccmp_expr_1 (gimple *g, rtx_insn **prep_seq, rtx_insn **gen_seq) cost2 = seq_cost (prep_seq_2, speed_p); cost2 += seq_cost (gen_seq_2, speed_p); } - if (cost2 < cost1) + + /* It's possible that one expansion succeeds and the other + fails. + For example, x86 has int ccmp but not fp ccmp, and so a + combined fp and int comparison must be ordered such that + the fp comparison happens first. The costs are not + meaningful for failed expansions. */ + + if (ret2 && (!ret || cost2 < cost1)) { *prep_seq = prep_seq_2; *gen_seq = gen_seq_2; -- 2.31.1 Richard Sandiford <richard.sandiford@arm.com> 于2024年6月5日周三 17:21写道: > > Hongyu Wang <wwwhhhyyy333@gmail.com> writes: > > CC'd Richard for ccmp part as previously it is added only for aarch64. > > The original logic will not interrupted since if > > aarch64_gen_ccmp_first succeeded, aarch64_gen_ccmp_next will also > > success, the cmp/fcmp and ccmp/fccmp supports all GPI/GPF, and the > > prepare_operand will fixup the input that cmp supports but ccmp not, > > so ret/ret2 will all be valid when comparing cost. > > Thanks in advance. > > Sorry for the slow review. > > > Hongyu Wang <hongyu.wang@intel.com> 于2024年5月15日周三 16:22写道: > >> > >> For general ccmp scenario, the tree sequence is like > >> > >> _1 = (a < b) > >> _2 = (c < d) > >> _3 = _1 & _2 > >> > >> current ccmp expanding will try to swap compare order for _1 and _2, > >> compare the cost/cost2 between compare _1 and _2 first, then return the > >> sequence with lower cost. > >> > >> For x86 ccmp, we don't support FP compare as ccmp operand, but we > >> support fp comi + int ccmp sequence. With current cost comparison > >> model, the fp comi + int ccmp can never be generated since it doesn't > >> check whether expand_ccmp_next returns available result and the rtl > >> cost for the empty ccmp sequence is always smaller. > >> > >> Check the expand_ccmp_next result ret and ret2, returns the valid one > >> before cost comparison. > >> > >> gcc/ChangeLog: > >> > >> * ccmp.cc (expand_ccmp_expr_1): Check ret and ret2 of > >> expand_ccmp_next, returns the valid one first before > >> comparing cost. > >> --- > >> gcc/ccmp.cc | 12 +++++++++++- > >> 1 file changed, 11 insertions(+), 1 deletion(-) > >> > >> diff --git a/gcc/ccmp.cc b/gcc/ccmp.cc > >> index 7cb525addf4..4b424220068 100644 > >> --- a/gcc/ccmp.cc > >> +++ b/gcc/ccmp.cc > >> @@ -247,7 +247,17 @@ expand_ccmp_expr_1 (gimple *g, rtx_insn **prep_seq, rtx_insn **gen_seq) > >> cost2 = seq_cost (prep_seq_2, speed_p); > >> cost2 += seq_cost (gen_seq_2, speed_p); > >> } > >> - if (cost2 < cost1) > >> + > >> + /* For x86 target the ccmp does not support fp operands, but > >> + have fcomi insn that can produce eflags and then do int > >> + ccmp. So if one of the op is fp compare, ret1 or ret2 can > >> + fail, and the cost of the corresponding empty seq will > >> + always be smaller, then the NULL sequence will be returned. > >> + Add check for ret and ret2, returns the available one if > >> + the other is NULL. */ > > I think the more fundamental point is that the cost of a failed > expansion isn't meaningful. So how about: > > /* It's possible that one expansion succeeds and the other fails. > For example, x86 has int ccmp but not fp ccmp, and so a combined > fp and int comparison must be ordered such that the fp comparison > happens first. The costs are not meaningful for failed > expansions. */ > > >> + if ((!ret && ret2) > >> + || (!(ret && !ret2) > >> + && cost2 < cost1)) > > I think this simplifies to: > > if (ret2 && (!ret1 || cost2 < cost1)) > > OK with those changes, thanks. > > Richard > > >> { > >> *prep_seq = prep_seq_2; > >> *gen_seq = gen_seq_2; > >> -- > >> 2.31.1 > >>
diff --git a/gcc/ccmp.cc b/gcc/ccmp.cc index 7cb525addf4..4b424220068 100644 --- a/gcc/ccmp.cc +++ b/gcc/ccmp.cc @@ -247,7 +247,17 @@ expand_ccmp_expr_1 (gimple *g, rtx_insn **prep_seq, rtx_insn **gen_seq) cost2 = seq_cost (prep_seq_2, speed_p); cost2 += seq_cost (gen_seq_2, speed_p); } - if (cost2 < cost1) + + /* For x86 target the ccmp does not support fp operands, but + have fcomi insn that can produce eflags and then do int + ccmp. So if one of the op is fp compare, ret1 or ret2 can + fail, and the cost of the corresponding empty seq will + always be smaller, then the NULL sequence will be returned. + Add check for ret and ret2, returns the available one if + the other is NULL. */ + if ((!ret && ret2) + || (!(ret && !ret2) + && cost2 < cost1)) { *prep_seq = prep_seq_2; *gen_seq = gen_seq_2;