Message ID: 4B2BF650.80902@twiddle.net
On Fri, Dec 18, 2009 at 10:38 PM, Richard Henderson <rth@twiddle.net> wrote:
> On 12/18/2009 03:37 AM, Laurent Desnogues wrote:
>>> tcg: Generic support for conditional set and conditional move.
>>
>> Needs cosmetics changes.
>
> Fixed, attachment 1.
>
>>> tcg-x86_64: Implement setcond and movcond.
>>
>> Some cosmetics and comments, but overall good.
>
> Fixed, attachment 2.
>
>>> tcg-i386: Implement small forward branches.
>>
>> I think this contains a bug.
>
> Fixed, attachment 3.  I've added an abort to patch_reloc to verify
> that the relocation is in range.  I've propagated the "small" flag to
> all of the branch functions so that...
>
>>> tcg-i386: Simplify brcond2.
>>
>> I don't like the rewrite of brcond2.
>
> ... this patch is dropped.
>
>>> tcg-i386: Implement setcond, movcond, setcond2.
>>
>> Not yet reviewed.
>
> Fixed, attachment 4.  Similar changes to the amd64 patch.

Everything looks good to me, though for i386 I'd not bet my life it's
100% correct. OTOH I think this is good enough to go into mainline, so
that people can start adding support in the front-ends.

BTW for compiling to 32-bit on a 64-bit x86, the configure script is
broken as it checks gcc before having set -m32, so passing -march=i686
will make it fail (in the end it means I could not convince QEMU to use
cmov :-).

Thanks,

Laurent

Acked-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Hello,

On 18.12.2009, at 22:38, Richard Henderson wrote:
> <commit-cmov-1.txt><commit-cmov-amd64.txt><commit-cmov-i386-jmps.txt>
> <commit-cmov-i386.txt>

Please send patches inline (and one patch per mail, like your original
series), then a) you get more review comments and b) we can apply the
patches with git-am (including your description) for testing. You can
use git-send-email's --in-reply-to= to make them appear as a reply in
this thread.

Thanks,

Andreas
On Fri, Dec 18, 2009 at 01:38:24PM -0800, Richard Henderson wrote:
> On 12/18/2009 03:37 AM, Laurent Desnogues wrote:
>>> tcg: Generic support for conditional set and conditional move.
>>
>> Needs cosmetics changes.
>
> Fixed, attachment 1.
>
>>> tcg-x86_64: Implement setcond and movcond.
>>
>> Some cosmetics and comments, but overall good.
>
> Fixed, attachment 2.
>
>>> tcg-i386: Implement small forward branches.
>>
>> I think this contains a bug.
>
> Fixed, attachment 3.  I've added an abort to patch_reloc to verify
> that the relocation is in range.  I've propagated the "small" flag to
> all of the branch functions so that...
>
>>> tcg-i386: Simplify brcond2.
>>
>> I don't like the rewrite of brcond2.
>
> ... this patch is dropped.
>
>>> tcg-i386: Implement setcond, movcond, setcond2.
>>
>> Not yet reviewed.
>
> Fixed, attachment 4.  Similar changes to the amd64 patch.

Could you please send the patches inline instead? It makes them easier
to comment on.

Please find my comments here:

- I am fine with the setcond instruction.
- For the movcond instruction, is there a real use for vtrue and vfalse
  values? Most CPUs actually implement a version with one value.
  Implementing it with two values moves complexity into the
  arch-specific tcg code.
- Do we really want to make movcond mandatory? It can be easily
  implemented using setcond, and, or instructions.
- The cmov instruction is not supported on all i386 CPUs. While it is
  unlikely that someone runs qemu-system on such a CPU, it is not
  improbable that someone runs qemu-user on one. We should probably
  provide alternative code and a check in configure (e.g. trying to
  compile inline asm code containing a cmov instruction).
- I am not sure using xor followed by setcc *conditionally* instead of
  a movzb after setcc is a good idea. The two instructions are supposed
  to execute at the same speed, and time spent in code generation is
  not negligible.
- Pay attention to the coding style, there are a few cases of if
  without {}.

A final comment: would it be possible to split setcond and movcond into
different patches? setcond looks ready to go; there are probably some
more concerns/comments about movcond.
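[Editor's note: Aurelien's remark that movcond can be built from setcond
plus bitwise ops can be illustrated at the value level. This is a hedged
sketch in plain C, not TCG emitter code: the helper names are invented,
the condition is fixed to unsigned less-than, and a negation is used to
turn setcond's 0/1 result into an all-ones/all-zeroes mask.]

```c
#include <stdint.h>

/* setcond: dest = (c1 cond c2), with cond fixed here to unsigned "<" */
static uint32_t setcond_ltu(uint32_t c1, uint32_t c2)
{
    return c1 < c2 ? 1 : 0;
}

/* movcond fallback: select vtrue/vfalse without a branch, using
   setcond, neg (to build the mask), and, or */
static uint32_t movcond_ltu(uint32_t c1, uint32_t c2,
                            uint32_t vtrue, uint32_t vfalse)
{
    uint32_t t = setcond_ltu(c1, c2);   /* 0 or 1 */
    uint32_t mask = -t;                 /* 0 or 0xffffffff */
    return (vtrue & mask) | (vfalse & ~mask);
}
```

A real backend would emit the corresponding host instructions instead;
the point is only that no conditional branch is required.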
On Sat, Dec 19, 2009 at 02:03:46PM +0100, Aurelien Jarno wrote:
> On Fri, Dec 18, 2009 at 01:38:24PM -0800, Richard Henderson wrote:
> > [...]
> >
> > Fixed, attachment 4.  Similar changes to the amd64 patch.
>
> Could you please send the patches inline instead? It makes them easier
> to comment on.
>
> Please find my comments here:
> [...]
> - The cmov instruction is not supported on all i386 CPUs. While it is
>   unlikely that someone runs qemu-system on such a CPU, it is not
>   improbable that someone runs qemu-user on one. We should probably
>   provide alternative code and a check in configure (e.g. trying to
>   compile inline asm code containing a cmov instruction).

Forget about that, I read the i386 patch too quickly.
On 12/19/2009 03:40 AM, Laurent Desnogues wrote:
> BTW for compiling to 32-bit on a 64-bit x86, the configure
> script is broken as it checks gcc before having set -m32,
> so passing -march=i686 will make it fail (in the end it means
> I could not convince QEMU to use cmov :-).

One of my servers has a 32-bit install. I got tired of remembering the
moderately long command line required to configure GCC for a 32-bit
build on a 64-bit host. :-)

r~
On 12/19/2009 05:03 AM, Aurelien Jarno wrote:
> - For the movcond instruction, is there a real use for vtrue and vfalse
>   values? Most CPUs actually implement a version with one value.
>   Implementing it with two values moves complexity into the
>   arch-specific tcg code.

The reason I added both is that rather than force TCG to insert extra
moves in order to always match VFALSE with DEST, I could have the
backend invert the condition if VTRUE happened to match DEST already.

I suppose it would be possible to tweak the TCG register allocator to
understand such commutative operations. That seemed harder, but perhaps
it really isn't, considering that the inversion code has to be
replicated across all the targets. It's certainly something to think
about.

> - Do we really want to make movcond mandatory? It can be easily
>   implemented using setcond, and, or instructions.

I think so, in that every host does have something useful to emit
that's better than the generic fallback, and we'll quickly have
implementations for each.

> - I am not sure using xor followed by setcc *conditionally* instead of
>   a movzb after setcc is a good idea. The two instructions are supposed
>   to execute at the same speed, and time spent in code generation is
>   not negligible.

I can certainly remove that bit if you insist.

> - Pay attention to the coding style, there are a few cases of if
>   without {}.

Ok. I try to remember, but it's at odds with my usual style, so please
understand momentary lapses.

> A final comment: would it be possible to split setcond and movcond
> into different patches? setcond looks ready to go; there are probably
> some more concerns/comments about movcond.

Ok.

r~
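[Editor's note: the condition-inversion trick Richard describes, where
the backend flips the condition when VTRUE already sits in DEST so that
only the DEST == VFALSE case needs handling, can be sketched at the
value level. This is a hypothetical illustration, not code from the
patches: "registers" are modelled as pointers so the aliasing of DEST
with VTRUE can be shown, and only EQ/NE conditions are modelled.]

```c
enum { COND_EQ, COND_NE };   /* minimal stand-in for TCGCond */

static int invert_cond(int cond)
{
    return cond == COND_EQ ? COND_NE : COND_EQ;
}

/* "registers" are ints addressed by pointer; dest may alias vtrue */
static void movcond_backend(int cond, int *dest, int c1, int c2,
                            int *vtrue, int *vfalse)
{
    if (dest == vtrue) {
        /* DEST already holds VTRUE: invert the condition and swap the
           value operands, so the code below only ever has to handle
           the DEST == VFALSE case */
        int *tmp = vtrue;
        cond = invert_cond(cond);
        vtrue = vfalse;
        vfalse = tmp;
    }
    /* the "cmov": dest holds vfalse; overwrite it when cond is true */
    int taken = (cond == COND_EQ) ? (c1 == c2) : (c1 != c2);
    *dest = taken ? *vtrue : *vfalse;
}
```

Either way the observable semantics are dest = (c1 cond c2 ? vtrue :
vfalse); the swap only spares the extra move.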
On Sat, Dec 19, 2009 at 08:19:32AM -0800, Richard Henderson wrote:
> On 12/19/2009 05:03 AM, Aurelien Jarno wrote:
> > - For the movcond instruction, is there a real use for vtrue and
> >   vfalse values? Most CPUs actually implement a version with one
> >   value. Implementing it with two values moves complexity into the
> >   arch-specific tcg code.
>
> The reason I added both is that rather than force TCG to insert extra
> moves in order to always match VFALSE with DEST, I could have the
> backend invert the condition if VTRUE happened to match DEST already.
>
> I suppose it would be possible to tweak the TCG register allocator to
> understand such commutative operations. That seemed harder, but
> perhaps it really isn't, considering that the inversion code has to
> be replicated across all the targets. It's certainly something to
> think about.

My main concern here is that we are introducing complex code that has
to be multiplied by the number of TCG targets we have. On the other
hand, I am not sure there are a lot of use cases across the different
architectures we support.

In addition, I am hesitant about making such big changes given I am not
sure they will result in a speed gain. All the previous setcond
implementations have been stopped because there was no measurable gain.
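[Editor's note: the setcond2_i32 op added by the patches below builds a
64-bit comparison out of 32-bit halves. The following is a plain-C
sketch of its semantics only; the helper name is invented and the
condition is fixed to unsigned less-than, whereas the real op takes the
condition as a parameter.]

```c
#include <stdint.h>

/* setcond2 semantics for unsigned "<": compare two 64-bit values
   supplied as 32-bit low/high halves, producing 0 or 1 */
static uint32_t setcond2_ltu(uint32_t a_lo, uint32_t a_hi,
                             uint32_t b_lo, uint32_t b_hi)
{
    /* the high words decide unless they are equal,
       in which case the low words do */
    if (a_hi != b_hi) {
        return a_hi < b_hi;
    }
    return a_lo < b_lo;
}
```

This mirrors what a 32-bit backend has to emit for setcond2_i32: a
compare of the high parts plus a compare of the low parts, combined
according to the condition.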
diff --git a/tcg/README b/tcg/README
index e672258..6163abe 100644
--- a/tcg/README
+++ b/tcg/README
@@ -152,6 +152,11 @@ Conditional jump if t0 cond t1 is true. cond can be:
     TCG_COND_LEU /* unsigned */
     TCG_COND_GTU /* unsigned */
 
+* brcond2_i32 cond, t0_low, t0_high, t1_low, t1_high, label
+
+Similar to brcond, except that the 64-bit values T0 and T1
+are formed from two 32-bit arguments.
+
 ********* Arithmetic
 
 * add_i32/i64 t0, t1, t2
@@ -282,6 +287,25 @@ order bytes must be set to zero.
 Indicate that the value of t0 won't be used later. It is useful to
 force dead code elimination.
 
+********* Conditional moves
+
+* setcond_i32/i64 cond, dest, t1, t2
+
+dest = (t1 cond t2)
+
+Set DEST to 1 if (T1 cond T2) is true, otherwise set to 0.
+
+* movcond_i32/i64 cond, dest, c1, c2, vtrue, vfalse
+
+dest = (c1 cond c2 ? vtrue : vfalse)
+
+Set DEST to VTRUE if (C1 cond C2) is true, otherwise set to VFALSE.
+
+* setcond2_i32 cond, dest, t1_low, t1_high, t2_low, t2_high
+
+Similar to setcond, except that the 64-bit values T1 and T2 are
+formed from two 32-bit arguments. The result is a 32-bit value.
+
 ********* Type conversions
 
 * ext_i32_i64 t0, t1
@@ -375,7 +399,7 @@ The target word size (TCG_TARGET_REG_BITS) is expected to be 32 bit or
 
 On a 32 bit target, all 64 bit operations are converted to 32 bits. A
 few specific operations must be implemented to allow it (see add2_i32,
-sub2_i32, brcond2_i32).
+sub2_i32, brcond2_i32, setcond2_i32).
 
 Floating point operations are not supported in this version.
 A previous incarnation of the code generator had full support of them,
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index faf2e8b..f43ed16 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -280,6 +280,32 @@ static inline void tcg_gen_op6_i64(int opc, TCGv_i64 arg1, TCGv_i64 arg2,
     *gen_opparam_ptr++ = GET_TCGV_I64(arg6);
 }
 
+static inline void tcg_gen_op6i_i32(int opc, TCGv_i32 arg1, TCGv_i32 arg2,
+                                    TCGv_i32 arg3, TCGv_i32 arg4,
+                                    TCGv_i32 arg5, TCGArg arg6)
+{
+    *gen_opc_ptr++ = opc;
+    *gen_opparam_ptr++ = GET_TCGV_I32(arg1);
+    *gen_opparam_ptr++ = GET_TCGV_I32(arg2);
+    *gen_opparam_ptr++ = GET_TCGV_I32(arg3);
+    *gen_opparam_ptr++ = GET_TCGV_I32(arg4);
+    *gen_opparam_ptr++ = GET_TCGV_I32(arg5);
+    *gen_opparam_ptr++ = arg6;
+}
+
+static inline void tcg_gen_op6i_i64(int opc, TCGv_i64 arg1, TCGv_i64 arg2,
+                                    TCGv_i64 arg3, TCGv_i64 arg4,
+                                    TCGv_i64 arg5, TCGArg arg6)
+{
+    *gen_opc_ptr++ = opc;
+    *gen_opparam_ptr++ = GET_TCGV_I64(arg1);
+    *gen_opparam_ptr++ = GET_TCGV_I64(arg2);
+    *gen_opparam_ptr++ = GET_TCGV_I64(arg3);
+    *gen_opparam_ptr++ = GET_TCGV_I64(arg4);
+    *gen_opparam_ptr++ = GET_TCGV_I64(arg5);
+    *gen_opparam_ptr++ = arg6;
+}
+
 static inline void tcg_gen_op6ii_i32(int opc, TCGv_i32 arg1, TCGv_i32 arg2,
                                      TCGv_i32 arg3, TCGv_i32 arg4,
                                      TCGArg arg5, TCGArg arg6)
@@ -1795,6 +1821,67 @@ static inline void tcg_gen_rotri_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
     }
 }
 
+static inline void tcg_gen_setcond_i32(int cond, TCGv_i32 ret,
+                                       TCGv_i32 arg1, TCGv_i32 arg2)
+{
+    tcg_gen_op4i_i32(INDEX_op_setcond_i32, ret, arg1, arg2, cond);
+}
+
+static inline void tcg_gen_setcond_i64(int cond, TCGv_i64 ret,
+                                       TCGv_i64 arg1, TCGv_i64 arg2)
+{
+#if TCG_TARGET_REG_BITS == 64
+    tcg_gen_op4i_i64(INDEX_op_setcond_i64, ret, arg1, arg2, cond);
+#else
+    tcg_gen_op6i_i32(INDEX_op_setcond2_i32, TCGV_LOW(ret),
+                     TCGV_LOW(arg1), TCGV_HIGH(arg1),
+                     TCGV_LOW(arg2), TCGV_HIGH(arg2), cond);
+    tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
+#endif
+}
+
+static inline void tcg_gen_movcond_i32(int cond, TCGv_i32 ret,
+                                       TCGv_i32 cmp1, TCGv_i32 cmp2,
+                                       TCGv_i32 op_t, TCGv_i32 op_f)
+{
+    if (TCGV_EQUAL_I32(op_t, op_f)) {
+        tcg_gen_mov_i32(ret, op_t);
+        return;
+    }
+    tcg_gen_op6i_i32(INDEX_op_movcond_i32, ret, cmp1, cmp2, op_t, op_f, cond);
+}
+
+static inline void tcg_gen_movcond_i64(int cond, TCGv_i64 ret,
+                                       TCGv_i64 cmp1, TCGv_i64 cmp2,
+                                       TCGv_i64 op_t, TCGv_i64 op_f)
+{
+    if (TCGV_EQUAL_I64(op_t, op_f)) {
+        tcg_gen_mov_i64(ret, op_t);
+        return;
+    }
+#if TCG_TARGET_REG_BITS == 64
+    tcg_gen_op6i_i64(INDEX_op_movcond_i64, ret, cmp1, cmp2, op_t, op_f, cond);
+#else
+    {
+        TCGv_i32 t0 = tcg_temp_new_i32();
+        TCGv_i32 zero = tcg_const_i32(0);
+
+        tcg_gen_op6i_i32(INDEX_op_setcond2_i32, t0,
+                         TCGV_LOW(cmp1), TCGV_HIGH(cmp1),
+                         TCGV_LOW(cmp2), TCGV_HIGH(cmp2), cond);
+
+        /* ??? We could perhaps conditionally define a movcond2_i32.  */
+        tcg_gen_movcond_i32(TCG_COND_NE, TCGV_LOW(ret), t0, zero,
+                            TCGV_LOW(op_t), TCGV_LOW(op_f));
+        tcg_gen_movcond_i32(TCG_COND_NE, TCGV_HIGH(ret), t0, zero,
+                            TCGV_HIGH(op_t), TCGV_HIGH(op_f));
+
+        tcg_temp_free_i32(t0);
+        tcg_temp_free_i32(zero);
+    }
+#endif
+}
+
 /***************************************/
 /* QEMU specific operations. Their type depend on the QEMU CPU
    type.
 */
@@ -2067,6 +2154,8 @@ static inline void tcg_gen_qemu_st64(TCGv_i64 arg, TCGv addr, int mem_index)
 #define tcg_gen_sari_tl tcg_gen_sari_i64
 #define tcg_gen_brcond_tl tcg_gen_brcond_i64
 #define tcg_gen_brcondi_tl tcg_gen_brcondi_i64
+#define tcg_gen_setcond_tl tcg_gen_setcond_i64
+#define tcg_gen_movcond_tl tcg_gen_movcond_i64
 #define tcg_gen_mul_tl tcg_gen_mul_i64
 #define tcg_gen_muli_tl tcg_gen_muli_i64
 #define tcg_gen_div_tl tcg_gen_div_i64
@@ -2137,6 +2226,8 @@ static inline void tcg_gen_qemu_st64(TCGv_i64 arg, TCGv addr, int mem_index)
 #define tcg_gen_sari_tl tcg_gen_sari_i32
 #define tcg_gen_brcond_tl tcg_gen_brcond_i32
 #define tcg_gen_brcondi_tl tcg_gen_brcondi_i32
+#define tcg_gen_setcond_tl tcg_gen_setcond_i32
+#define tcg_gen_movcond_tl tcg_gen_movcond_i32
 #define tcg_gen_mul_tl tcg_gen_mul_i32
 #define tcg_gen_muli_tl tcg_gen_muli_i32
 #define tcg_gen_div_tl tcg_gen_div_i32
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index b7f3fd7..086968c 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -42,6 +42,8 @@ DEF2(br, 0, 0, 1, TCG_OPF_BB_END | TCG_OPF_SIDE_EFFECTS)
 
 DEF2(mov_i32, 1, 1, 0, 0)
 DEF2(movi_i32, 1, 0, 1, 0)
+DEF2(setcond_i32, 1, 2, 1, 0)
+DEF2(movcond_i32, 1, 4, 1, 0)
 /* load/store */
 DEF2(ld8u_i32, 1, 1, 1, 0)
 DEF2(ld8s_i32, 1, 1, 1, 0)
@@ -82,6 +84,7 @@ DEF2(add2_i32, 2, 4, 0, 0)
 DEF2(sub2_i32, 2, 4, 0, 0)
 DEF2(brcond2_i32, 0, 4, 2, TCG_OPF_BB_END | TCG_OPF_SIDE_EFFECTS)
 DEF2(mulu2_i32, 2, 2, 0, 0)
+DEF2(setcond2_i32, 1, 4, 1, 0)
 #endif
 #ifdef TCG_TARGET_HAS_ext8s_i32
 DEF2(ext8s_i32, 1, 1, 0, 0)
@@ -111,6 +114,8 @@ DEF2(neg_i32, 1, 1, 0, 0)
 #if TCG_TARGET_REG_BITS == 64
 DEF2(mov_i64, 1, 1, 0, 0)
 DEF2(movi_i64, 1, 0, 1, 0)
+DEF2(setcond_i64, 1, 2, 1, 0)
+DEF2(movcond_i64, 1, 4, 1, 0)
 /* load/store */
 DEF2(ld8u_i64, 1, 1, 1, 0)
 DEF2(ld8s_i64, 1, 1, 1, 0)
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 3c0e296..f7ea727 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -670,6 +670,7 @@ void tcg_gen_shifti_i64(TCGv_i64 ret, TCGv_i64 arg1,
 }
 #endif
 
+
 static void tcg_reg_alloc_start(TCGContext *s)
 {
     int i;
@@ -888,21 +889,31 @@ void tcg_dump_ops(TCGContext *s, FILE *outfile)
                 fprintf(outfile, "%s",
                         tcg_get_arg_str_idx(s, buf, sizeof(buf), args[k++]));
             }
-            if (c == INDEX_op_brcond_i32
+            switch (c) {
+            case INDEX_op_brcond_i32:
+#if TCG_TARGET_REG_BITS == 32
+            case INDEX_op_brcond2_i32:
+#elif TCG_TARGET_REG_BITS == 64
+            case INDEX_op_brcond_i64:
+#endif
+            case INDEX_op_setcond_i32:
+            case INDEX_op_movcond_i32:
 #if TCG_TARGET_REG_BITS == 32
-                || c == INDEX_op_brcond2_i32
+            case INDEX_op_setcond2_i32:
 #elif TCG_TARGET_REG_BITS == 64
-                || c == INDEX_op_brcond_i64
+            case INDEX_op_setcond_i64:
+            case INDEX_op_movcond_i64:
 #endif
-                ) {
                 if (args[k] < ARRAY_SIZE(cond_name) && cond_name[args[k]])
                     fprintf(outfile, ",%s", cond_name[args[k++]]);
                 else
                     fprintf(outfile, ",$0x%" TCG_PRIlx, args[k++]);
                 i = 1;
-            }
-            else
+                break;
+            default:
                 i = 0;
+                break;
+            }
             for(; i < nb_cargs; i++) {
                 if (k != 0)
                     fprintf(outfile, ",");