Message ID | 1348247620-12734-8-git-send-email-rth@twiddle.net |
---|---|
State | New |
Headers | show |
On Fri, Sep 21, 2012 at 10:13:40AM -0700, Richard Henderson wrote: > When movcond_i32 is available we can further reduce the generated > op count from 12 to 6, and the generated code size on i686 from > 88 to 74 bytes. > > Signed-off-by: Richard Henderson <rth@twiddle.net> > --- > tcg/tcg-op.h | 22 +++++++++++++++------- > 1 file changed, 15 insertions(+), 7 deletions(-) > > diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h > index 3e375ea..0145a09 100644 > --- a/tcg/tcg-op.h > +++ b/tcg/tcg-op.h > @@ -2147,16 +2147,24 @@ static inline void tcg_gen_movcond_i64(TCGCond cond, TCGv_i64 ret, > tcg_gen_op6i_i32(INDEX_op_setcond2_i32, t0, > TCGV_LOW(c1), TCGV_HIGH(c1), > TCGV_LOW(c2), TCGV_HIGH(c2), cond); > - tcg_gen_neg_i32(t0, t0); > > - tcg_gen_and_i32(t1, TCGV_LOW(v1), t0); > - tcg_gen_andc_i32(TCGV_LOW(ret), TCGV_LOW(v2), t0); > - tcg_gen_or_i32(TCGV_LOW(ret), TCGV_LOW(ret), t1); > + if (TCG_TARGET_HAS_movcond_i32) { > + tcg_gen_movi_i32(t1, 0); > + tcg_gen_movcond_i32(TCG_COND_NE, TCGV_LOW(ret), t0, t1, > + TCGV_LOW(v1), TCGV_LOW(v2)); > + tcg_gen_movcond_i32(TCG_COND_NE, TCGV_HIGH(ret), t0, t1, > + TCGV_HIGH(v1), TCGV_HIGH(v2)); > + } else { > + tcg_gen_neg_i32(t0, t0); > > - tcg_gen_and_i32(t1, TCGV_HIGH(v1), t0); > - tcg_gen_andc_i32(TCGV_HIGH(ret), TCGV_HIGH(v2), t0); > - tcg_gen_or_i32(TCGV_HIGH(ret), TCGV_HIGH(ret), t1); > + tcg_gen_and_i32(t1, TCGV_LOW(v1), t0); > + tcg_gen_andc_i32(TCGV_LOW(ret), TCGV_LOW(v2), t0); > + tcg_gen_or_i32(TCGV_LOW(ret), TCGV_LOW(ret), t1); > > + tcg_gen_and_i32(t1, TCGV_HIGH(v1), t0); > + tcg_gen_andc_i32(TCGV_HIGH(ret), TCGV_HIGH(v2), t0); > + tcg_gen_or_i32(TCGV_HIGH(ret), TCGV_HIGH(ret), t1); > + } > tcg_temp_free_i32(t0); > tcg_temp_free_i32(t1); > } else { At some point I tried to think how to implement movcond_i64 for MIPS directly in the backend. I just tried your patch, and I got this kind of code: | 0x2bb2ae58: sltu at,zero,s4 | 0x2bb2ae5c: sltu t0,zero,s3 | 0x2bb2ae60: or s3,at,t0 | 0x2bb2ae64: movz s1,s5,s3 | 0x2bb2ae68: movz s2,s6,s3 | | (in some cases some constants/globals loading appear in the middle, but | that's not due to movcond). It's basically the kind of code I would have written. It's clearly better to implement it directly in TCG. Now I wonder if it wouldn't be better to write brcond2 as setcond2 + brcond. And even setcond2 as a pair of setcond in TCG, which would allow some optimizations in case both high parts are zero. Tested-by: Aurelien Jarno <aurelien@aurel32.net> Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
On 09/21/2012 02:23 PM, Aurelien Jarno wrote: > Now I wonder if it wouldn't be better to write brcond2 as setcond2 + > brcond. And even setcond2 as a pair of setcond in TCG, which would allow > some optimizations in case both high parts are zero. I think brcond2 vs setcond2 is a choice that has to be made on a host-by-host basis. E.g. for i386 we implement setcond2 with branches. E.g. for hppa setcond2, while not using branches, is twice the size of brcond2. But there's nothing saying you couldn't have the mips version of brcond2 use setcond2 internals to do its job. On the other hand, having tcg/optimize.c reduce both setcond2 and brcond2 to setcond and brcond with the appropriate values of zero would be a most welcome improvement. Also, I've been thinking about having a tcg.h function that produces the cond-without-equality table that I introduced to fix hppa recently. Using that could reduce code size in some of the other backends as well. r~
On Fri, Sep 21, 2012 at 03:27:52PM -0700, Richard Henderson wrote: > On 09/21/2012 02:23 PM, Aurelien Jarno wrote: > > Now I wonder if it wouldn't be better to write brcond2 as setcond2 + > > brcond. And even setcond2 as a pair of setcond in TCG, which would allow > > some optimizations in case both high parts are zero. > > I think brcond2 vs setcond2 is a choice that has to be made on a > host-by-host basis. E.g. for i386 we implement setcond2 with branches. > E.g. for hppa setcond2, while not using branches, is twice the size of > brcond2. But there's nothing saying you couldn't have the mips > version of brcond2 use setcond2 internals to do its job. Yes and now. Currently the MIPS version of setcond2, uses 2 temporary registers and the return argument. This is therefore not possible to make brcond2 to use setcond2. I have just seen it's possible to reorder the instructions to use two temporaries only, but you see the problem there. Doing that as the TCG level means you can use more temporaries, at the backend level means you have to play with the registers you have. > On the other hand, having tcg/optimize.c reduce both setcond2 and brcond2 > to setcond and brcond with the appropriate values of zero would be a most > welcome improvement. That's a good idea. > Also, I've been thinking about having a tcg.h function that produces the > cond-without-equality table that I introduced to fix hppa recently. Using > that could reduce code size in some of the other backends as well. > I am using such a table in the MIPS backend, but done with a big switch.
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h index 3e375ea..0145a09 100644 --- a/tcg/tcg-op.h +++ b/tcg/tcg-op.h @@ -2147,16 +2147,24 @@ static inline void tcg_gen_movcond_i64(TCGCond cond, TCGv_i64 ret, tcg_gen_op6i_i32(INDEX_op_setcond2_i32, t0, TCGV_LOW(c1), TCGV_HIGH(c1), TCGV_LOW(c2), TCGV_HIGH(c2), cond); - tcg_gen_neg_i32(t0, t0); - tcg_gen_and_i32(t1, TCGV_LOW(v1), t0); - tcg_gen_andc_i32(TCGV_LOW(ret), TCGV_LOW(v2), t0); - tcg_gen_or_i32(TCGV_LOW(ret), TCGV_LOW(ret), t1); + if (TCG_TARGET_HAS_movcond_i32) { + tcg_gen_movi_i32(t1, 0); + tcg_gen_movcond_i32(TCG_COND_NE, TCGV_LOW(ret), t0, t1, + TCGV_LOW(v1), TCGV_LOW(v2)); + tcg_gen_movcond_i32(TCG_COND_NE, TCGV_HIGH(ret), t0, t1, + TCGV_HIGH(v1), TCGV_HIGH(v2)); + } else { + tcg_gen_neg_i32(t0, t0); - tcg_gen_and_i32(t1, TCGV_HIGH(v1), t0); - tcg_gen_andc_i32(TCGV_HIGH(ret), TCGV_HIGH(v2), t0); - tcg_gen_or_i32(TCGV_HIGH(ret), TCGV_HIGH(ret), t1); + tcg_gen_and_i32(t1, TCGV_LOW(v1), t0); + tcg_gen_andc_i32(TCGV_LOW(ret), TCGV_LOW(v2), t0); + tcg_gen_or_i32(TCGV_LOW(ret), TCGV_LOW(ret), t1); + tcg_gen_and_i32(t1, TCGV_HIGH(v1), t0); + tcg_gen_andc_i32(TCGV_HIGH(ret), TCGV_HIGH(v2), t0); + tcg_gen_or_i32(TCGV_HIGH(ret), TCGV_HIGH(ret), t1); + } tcg_temp_free_i32(t0); tcg_temp_free_i32(t1); } else {
When movcond_i32 is available we can further reduce the generated op count from 12 to 6, and the generated code size on i686 from 88 to 74 bytes. Signed-off-by: Richard Henderson <rth@twiddle.net> --- tcg/tcg-op.h | 22 +++++++++++++++------- 1 file changed, 15 insertions(+), 7 deletions(-)