Message ID | 1256133153-3121-2-git-send-email-uli@suse.de |
---|---|
State | New |
Headers | show |
On Wed, Oct 21, 2009 at 03:52:22PM +0200, Ulrich Hecht wrote: > sync allows concurrent accesses to locations in memory through different TCG > variables. This comes in handy when you are emulating CPU registers that can > be used as either 32 or 64 bit, as TCG doesn't know anything about aliases. > See the s390x target for an example. > > Fixed sync_i64 build failure on 32-bit targets. Looking more in details to the use case of this patch, I think it can be useful in QEMU. However I don't feel very comfortable in merging it without having the opinion of more persons. Paul, Malc Blue Swirl or others, any opinion? Anyway this patch should come with (simple) documentation to drop into tcg/README. Otherwise, please find my comments inline. > Signed-off-by: Ulrich Hecht <uli@suse.de> > --- > tcg/tcg-op.h | 12 ++++++++++++ > tcg/tcg-opc.h | 2 ++ > tcg/tcg.c | 6 ++++++ > 3 files changed, 20 insertions(+), 0 deletions(-) > > diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h > index faf2e8b..c1b4710 100644 > --- a/tcg/tcg-op.h > +++ b/tcg/tcg-op.h > @@ -316,6 +316,18 @@ static inline void tcg_gen_br(int label) > tcg_gen_op1i(INDEX_op_br, label); > } > > +static inline void tcg_gen_sync_i32(TCGv_i32 arg) > +{ > + tcg_gen_op1_i32(INDEX_op_sync_i32, arg); > +} > + > +#if TCG_TARGET_REG_BITS == 64 > +static inline void tcg_gen_sync_i64(TCGv_i64 arg) > +{ > + tcg_gen_op1_i64(INDEX_op_sync_i64, arg); > +} > +#endif > + > static inline void tcg_gen_mov_i32(TCGv_i32 ret, TCGv_i32 arg) > { > if (!TCGV_EQUAL_I32(ret, arg)) You may also want to add a sync_tl as a #define at the end of the file. > diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h > index b7f3fd7..5dcdeba 100644 > --- a/tcg/tcg-opc.h > +++ b/tcg/tcg-opc.h > @@ -40,6 +40,7 @@ DEF2(call, 0, 1, 2, TCG_OPF_SIDE_EFFECTS) /* variable number of parameters */ > DEF2(jmp, 0, 1, 0, TCG_OPF_BB_END | TCG_OPF_SIDE_EFFECTS) > DEF2(br, 0, 0, 1, TCG_OPF_BB_END | TCG_OPF_SIDE_EFFECTS) > > +DEF2(sync_i32, 0, 1, 0, 0) > DEF2(mov_i32, 1, 1, 0, 0) > DEF2(movi_i32, 1, 0, 1, 0) > /* load/store */ > @@ -109,6 +110,7 @@ DEF2(neg_i32, 1, 1, 0, 0) > #endif > > #if TCG_TARGET_REG_BITS == 64 > +DEF2(sync_i64, 0, 1, 0, 0) > DEF2(mov_i64, 1, 1, 0, 0) > DEF2(movi_i64, 1, 0, 1, 0) > /* load/store */ > diff --git a/tcg/tcg.c b/tcg/tcg.c > index 3c0e296..8eb60f8 100644 > --- a/tcg/tcg.c > +++ b/tcg/tcg.c > @@ -1930,6 +1930,12 @@ static inline int tcg_gen_code_common(TCGContext *s, uint8_t *gen_code_buf, > // dump_regs(s); > #endif > switch(opc) { > + case INDEX_op_sync_i32: > +#if TCG_TARGET_REG_BITS == 64 > + case INDEX_op_sync_i64: > +#endif > + temp_save(s, args[0], s->reserved_regs); > + break; This won't work for fixed register. You should probably add something like: if (s->temps[args[0]].fixed_reg) { tcg_abort(); } > case INDEX_op_mov_i32: > #if TCG_TARGET_REG_BITS == 64 > case INDEX_op_mov_i64: > -- > 1.6.2.1 > >
On Thu, 22 Oct 2009, Aurelien Jarno wrote: > On Wed, Oct 21, 2009 at 03:52:22PM +0200, Ulrich Hecht wrote: > > sync allows concurrent accesses to locations in memory through different TCG > > variables. This comes in handy when you are emulating CPU registers that can > > be used as either 32 or 64 bit, as TCG doesn't know anything about aliases. > > See the s390x target for an example. > > > > Fixed sync_i64 build failure on 32-bit targets. > > Looking more in details to the use case of this patch, I think it can be > useful in QEMU. However I don't feel very comfortable in merging it > without having the opinion of more persons. Paul, Malc Blue Swirl or > others, any opinion? I don't really understand the problem/solution so no - no opinion. [..snip..]
On Thursday 22 October 2009, Aurelien Jarno wrote: > On Wed, Oct 21, 2009 at 03:52:22PM +0200, Ulrich Hecht wrote: > > sync allows concurrent accesses to locations in memory through different > > TCG variables. This comes in handy when you are emulating CPU registers > > that can be used as either 32 or 64 bit, as TCG doesn't know anything > > about aliases. See the s390x target for an example. > > > > Fixed sync_i64 build failure on 32-bit targets. > > Looking more in details to the use case of this patch, I think it can be > useful in QEMU. However I don't feel very comfortable in merging it > without having the opinion of more persons. Paul, Malc Blue Swirl or > others, any opinion? I don't think this is the right solution. IIUC the basic problem is that we have a register file where adjacent pairs of 32-bit registers are also accessed as a 64-bit value. This is something many other targets need to do (at least ARM, PPC, MIPS and SPARC). While sync appears attractive as a quick hack to achieve this, I think it is liable to be abused, and cause us serious pain long-term. If you need an easy solution then use ld/st (as with ARM VFP registers). If you want a good solution then fix whichever bit of TCG makes accessing a pair of registers horribly slow. We already have some support for this (concat_i32_i64). Paul
Paul Brook wrote: > On Thursday 22 October 2009, Aurelien Jarno wrote: > >> On Wed, Oct 21, 2009 at 03:52:22PM +0200, Ulrich Hecht wrote: >> >>> sync allows concurrent accesses to locations in memory through different >>> TCG variables. This comes in handy when you are emulating CPU registers >>> that can be used as either 32 or 64 bit, as TCG doesn't know anything >>> about aliases. See the s390x target for an example. >>> >>> Fixed sync_i64 build failure on 32-bit targets. >>> >> Looking more in details to the use case of this patch, I think it can be >> useful in QEMU. However I don't feel very comfortable in merging it >> without having the opinion of more persons. Paul, Malc Blue Swirl or >> others, any opinion? >> > > I don't think this is the right solution. > > IIUC the basic problem is that we have a register file where adjacent pairs of > 32-bit registers are also accessed as a 64-bit value. This is something many > other targets need to do (at least ARM, PPC, MIPS and SPARC). > > While sync appears attractive as a quick hack to achieve this, I think it is > liable to be abused, and cause us serious pain long-term. If you need an easy > solution then use ld/st (as with ARM VFP registers). If you want a good > solution then fix whichever bit of TCG makes accessing a pair of registers > horribly slow. We already have some support for this (concat_i32_i64). > Could we please include it nevertheless? I don't want to see S390 TCG and KVM targets not included because of this sync operation. If you like, add some big fat warning around it and maybe break compilation if anyone but the s390 target uses that sync op, but let's not keep a whole target from inclusion because of a single feature that _might_ one day affect others. Alex
> > While sync appears attractive as a quick hack to achieve this, I think it > > is liable to be abused, and cause us serious pain long-term. If you need > > an easy solution then use ld/st (as with ARM VFP registers). If you want > > a good solution then fix whichever bit of TCG makes accessing a pair of > > registers horribly slow. We already have some support for this > > (concat_i32_i64). > > Could we please include it nevertheless? I don't want to see S390 TCG > and KVM targets not included because of this sync operation. > > If you like, add some big fat warning around it and maybe break > compilation if anyone but the s390 target uses that sync op, but let's > not keep a whole target from inclusion because of a single feature that > _might_ one day affect others. I'd rather not include sync, and instead use the explicit ld/st code you already wrote. Paul
Paul Brook wrote: >>> While sync appears attractive as a quick hack to achieve this, I think it >>> is liable to be abused, and cause us serious pain long-term. If you need >>> an easy solution then use ld/st (as with ARM VFP registers). If you want >>> a good solution then fix whichever bit of TCG makes accessing a pair of >>> registers horribly slow. We already have some support for this >>> (concat_i32_i64). >>> >> Could we please include it nevertheless? I don't want to see S390 TCG >> and KVM targets not included because of this sync operation. >> >> If you like, add some big fat warning around it and maybe break >> compilation if anyone but the s390 target uses that sync op, but let's >> not keep a whole target from inclusion because of a single feature that >> _might_ one day affect others. >> > > I'd rather not include sync, and instead use the explicit ld/st code you > already wrote. > As long as the KVM code comes in I'm good. I don't really care that much about the TCG parts and only need them to have the build system surroundings set up. So yes, I don't care about sync. As long as we finally get any solution here I'm good, but patches lying around for weeks surely doesn't help qemu. Alex
On Wed, Nov 11, 2009 at 12:56:47AM +0000, Paul Brook wrote: > On Thursday 22 October 2009, Aurelien Jarno wrote: > > On Wed, Oct 21, 2009 at 03:52:22PM +0200, Ulrich Hecht wrote: > > > sync allows concurrent accesses to locations in memory through different > > > TCG variables. This comes in handy when you are emulating CPU registers > > > that can be used as either 32 or 64 bit, as TCG doesn't know anything > > > about aliases. See the s390x target for an example. > > > > > > Fixed sync_i64 build failure on 32-bit targets. > > > > Looking more in details to the use case of this patch, I think it can be > > useful in QEMU. However I don't feel very comfortable in merging it > > without having the opinion of more persons. Paul, Malc Blue Swirl or > > others, any opinion? > > I don't think this is the right solution. > > IIUC the basic problem is that we have a register file where adjacent pairs of > 32-bit registers are also accessed as a 64-bit value. This is something many > other targets need to do (at least ARM, PPC, MIPS and SPARC). > > While sync appears attractive as a quick hack to achieve this, I think it is > liable to be abused, and cause us serious pain long-term. If you need an easy > solution then use ld/st (as with ARM VFP registers). If you want a good > solution then fix whichever bit of TCG makes accessing a pair of registers > horribly slow. We already have some support for this (concat_i32_i64). > What is probably needed here are merge_low and merge_high ops, merging a 32-bit value into the low or high part of a 64-bit value, leaving the other part unchanged. Not sure we can really optimize that on x86/x86_64 compared to standard TCG code though.
On 18.11.2009, at 00:40, Aurelien Jarno wrote: > On Wed, Nov 11, 2009 at 12:56:47AM +0000, Paul Brook wrote: >> On Thursday 22 October 2009, Aurelien Jarno wrote: >>> On Wed, Oct 21, 2009 at 03:52:22PM +0200, Ulrich Hecht wrote: >>>> sync allows concurrent accesses to locations in memory through >>>> different >>>> TCG variables. This comes in handy when you are emulating CPU >>>> registers >>>> that can be used as either 32 or 64 bit, as TCG doesn't know >>>> anything >>>> about aliases. See the s390x target for an example. >>>> >>>> Fixed sync_i64 build failure on 32-bit targets. >>> >>> Looking more in details to the use case of this patch, I think it >>> can be >>> useful in QEMU. However I don't feel very comfortable in merging it >>> without having the opinion of more persons. Paul, Malc Blue Swirl or >>> others, any opinion? >> >> I don't think this is the right solution. >> >> IIUC the basic problem is that we have a register file where >> adjacent pairs of >> 32-bit registers are also accessed as a 64-bit value. This is >> something many >> other targets need to do (at least ARM, PPC, MIPS and SPARC). >> >> While sync appears attractive as a quick hack to achieve this, I >> think it is >> liable to be abused, and cause us serious pain long-term. If you >> need an easy >> solution then use ld/st (as with ARM VFP registers). If you want a >> good >> solution then fix whichever bit of TCG makes accessing a pair of >> registers >> horribly slow. We already have some support for this >> (concat_i32_i64). >> > > What is probably needed here are merge_low and merge_high ops, > merging a > 32-bit value into the low or high part of a 64-bit value, leaving the > other part unchanged. Not sure we can really optimize that on x86/ > x86_64 > compared to standard TCG code though. Maybe I got the whole point wrong but isn't this about having 2 virtual register sets for the same target registers? So you'd have: TCGvar my_reg_32; TCGvar my_reg_64; and whenever you work with either of them you want to have the correct value present in both (cut off for 32bit, extended for 64 bit). So what's really necessary is some internal coupling and dirty log of variables within TCG so it knows that an access to the 64 bit of a coupled variable means it needs to sync the 32 bit value over and vice versa. Then magically everything would just work as expected. Or am I totally wrong here? Alex
On Wed, Nov 18, 2009 at 01:01:13AM +0100, Alexander Graf wrote: > > On 18.11.2009, at 00:40, Aurelien Jarno wrote: > >> On Wed, Nov 11, 2009 at 12:56:47AM +0000, Paul Brook wrote: >>> On Thursday 22 October 2009, Aurelien Jarno wrote: >>>> On Wed, Oct 21, 2009 at 03:52:22PM +0200, Ulrich Hecht wrote: >>>>> sync allows concurrent accesses to locations in memory through >>>>> different >>>>> TCG variables. This comes in handy when you are emulating CPU >>>>> registers >>>>> that can be used as either 32 or 64 bit, as TCG doesn't know >>>>> anything >>>>> about aliases. See the s390x target for an example. >>>>> >>>>> Fixed sync_i64 build failure on 32-bit targets. >>>> >>>> Looking more in details to the use case of this patch, I think it >>>> can be >>>> useful in QEMU. However I don't feel very comfortable in merging it >>>> without having the opinion of more persons. Paul, Malc Blue Swirl or >>>> others, any opinion? >>> >>> I don't think this is the right solution. >>> >>> IIUC the basic problem is that we have a register file where >>> adjacent pairs of >>> 32-bit registers are also accessed as a 64-bit value. This is >>> something many >>> other targets need to do (at least ARM, PPC, MIPS and SPARC). >>> >>> While sync appears attractive as a quick hack to achieve this, I >>> think it is >>> liable to be abused, and cause us serious pain long-term. If you >>> need an easy >>> solution then use ld/st (as with ARM VFP registers). If you want a >>> good >>> solution then fix whichever bit of TCG makes accessing a pair of >>> registers >>> horribly slow. We already have some support for this >>> (concat_i32_i64). >>> >> >> What is probably needed here are merge_low and merge_high ops, merging >> a >> 32-bit value into the low or high part of a 64-bit value, leaving the >> other part unchanged. Not sure we can really optimize that on x86/ >> x86_64 >> compared to standard TCG code though. > > Maybe I got the whole point wrong but isn't this about having 2 virtual > register sets for the same target registers? So you'd have: > > TCGvar my_reg_32; > TCGvar my_reg_64; > > and whenever you work with either of them you want to have the correct > value present in both (cut off for 32bit, extended for 64 bit). > > So what's really necessary is some internal coupling and dirty log of > variables within TCG so it knows that an access to the 64 bit of a > coupled variable means it needs to sync the 32 bit value over and vice > versa. Then magically everything would just work as expected. > That's an option, basically keeping the list (or only one ?) of aliased TCG variables for each of them, and freeing the others before using one.
Aurelien Jarno wrote: > On Wed, Nov 18, 2009 at 01:01:13AM +0100, Alexander Graf wrote: > >> On 18.11.2009, at 00:40, Aurelien Jarno wrote: >> >> >>> On Wed, Nov 11, 2009 at 12:56:47AM +0000, Paul Brook wrote: >>> >>>> On Thursday 22 October 2009, Aurelien Jarno wrote: >>>> >>>>> On Wed, Oct 21, 2009 at 03:52:22PM +0200, Ulrich Hecht wrote: >>>>> >>>>>> sync allows concurrent accesses to locations in memory through >>>>>> different >>>>>> TCG variables. This comes in handy when you are emulating CPU >>>>>> registers >>>>>> that can be used as either 32 or 64 bit, as TCG doesn't know >>>>>> anything >>>>>> about aliases. See the s390x target for an example. >>>>>> >>>>>> Fixed sync_i64 build failure on 32-bit targets. >>>>>> >>>>> Looking more in details to the use case of this patch, I think it >>>>> can be >>>>> useful in QEMU. However I don't feel very comfortable in merging it >>>>> without having the opinion of more persons. Paul, Malc Blue Swirl or >>>>> others, any opinion? >>>>> >>>> I don't think this is the right solution. >>>> >>>> IIUC the basic problem is that we have a register file where >>>> adjacent pairs of >>>> 32-bit registers are also accessed as a 64-bit value. This is >>>> something many >>>> other targets need to do (at least ARM, PPC, MIPS and SPARC). >>>> >>>> While sync appears attractive as a quick hack to achieve this, I >>>> think it is >>>> liable to be abused, and cause us serious pain long-term. If you >>>> need an easy >>>> solution then use ld/st (as with ARM VFP registers). If you want a >>>> good >>>> solution then fix whichever bit of TCG makes accessing a pair of >>>> registers >>>> horribly slow. We already have some support for this >>>> (concat_i32_i64). >>>> >>>> >>> What is probably needed here are merge_low and merge_high ops, merging >>> a >>> 32-bit value into the low or high part of a 64-bit value, leaving the >>> other part unchanged. Not sure we can really optimize that on x86/ >>> x86_64 >>> compared to standard TCG code though. >>> >> Maybe I got the whole point wrong but isn't this about having 2 virtual >> register sets for the same target registers? So you'd have: >> >> TCGvar my_reg_32; >> TCGvar my_reg_64; >> >> and whenever you work with either of them you want to have the correct >> value present in both (cut off for 32bit, extended for 64 bit). >> >> So what's really necessary is some internal coupling and dirty log of >> variables within TCG so it knows that an access to the 64 bit of a >> coupled variable means it needs to sync the 32 bit value over and vice >> versa. Then magically everything would just work as expected. >> >> > > That's an option, basically keeping the list (or only one ?) of aliased > TCG variables for each of them, and freeing the others before using one. > Yeah, only one. I don't think this should be getting overengineered (and thus slow) :-). Doesn't really sound hard, does it? Any TCG magicians around? :) Alex
> > That's an option, basically keeping the list (or only one ?) of aliased > > TCG variables for each of them, and freeing the others before using one. > > Yeah, only one. I don't think this should be getting overengineered (and > thus slow) :-). > Doesn't really sound hard, does it? Any TCG magicians around? :) Maybe. This is the sort of thing that tends to get harder the more you think about it. Paul
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h index faf2e8b..c1b4710 100644 --- a/tcg/tcg-op.h +++ b/tcg/tcg-op.h @@ -316,6 +316,18 @@ static inline void tcg_gen_br(int label) tcg_gen_op1i(INDEX_op_br, label); } +static inline void tcg_gen_sync_i32(TCGv_i32 arg) +{ + tcg_gen_op1_i32(INDEX_op_sync_i32, arg); +} + +#if TCG_TARGET_REG_BITS == 64 +static inline void tcg_gen_sync_i64(TCGv_i64 arg) +{ + tcg_gen_op1_i64(INDEX_op_sync_i64, arg); +} +#endif + static inline void tcg_gen_mov_i32(TCGv_i32 ret, TCGv_i32 arg) { if (!TCGV_EQUAL_I32(ret, arg)) diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h index b7f3fd7..5dcdeba 100644 --- a/tcg/tcg-opc.h +++ b/tcg/tcg-opc.h @@ -40,6 +40,7 @@ DEF2(call, 0, 1, 2, TCG_OPF_SIDE_EFFECTS) /* variable number of parameters */ DEF2(jmp, 0, 1, 0, TCG_OPF_BB_END | TCG_OPF_SIDE_EFFECTS) DEF2(br, 0, 0, 1, TCG_OPF_BB_END | TCG_OPF_SIDE_EFFECTS) +DEF2(sync_i32, 0, 1, 0, 0) DEF2(mov_i32, 1, 1, 0, 0) DEF2(movi_i32, 1, 0, 1, 0) /* load/store */ @@ -109,6 +110,7 @@ DEF2(neg_i32, 1, 1, 0, 0) #endif #if TCG_TARGET_REG_BITS == 64 +DEF2(sync_i64, 0, 1, 0, 0) DEF2(mov_i64, 1, 1, 0, 0) DEF2(movi_i64, 1, 0, 1, 0) /* load/store */ diff --git a/tcg/tcg.c b/tcg/tcg.c index 3c0e296..8eb60f8 100644 --- a/tcg/tcg.c +++ b/tcg/tcg.c @@ -1930,6 +1930,12 @@ static inline int tcg_gen_code_common(TCGContext *s, uint8_t *gen_code_buf, // dump_regs(s); #endif switch(opc) { + case INDEX_op_sync_i32: +#if TCG_TARGET_REG_BITS == 64 + case INDEX_op_sync_i64: +#endif + temp_save(s, args[0], s->reserved_regs); + break; case INDEX_op_mov_i32: #if TCG_TARGET_REG_BITS == 64 case INDEX_op_mov_i64:
sync allows concurrent accesses to locations in memory through different TCG variables. This comes in handy when you are emulating CPU registers that can be used as either 32 or 64 bit, as TCG doesn't know anything about aliases. See the s390x target for an example. Fixed sync_i64 build failure on 32-bit targets. Signed-off-by: Ulrich Hecht <uli@suse.de> --- tcg/tcg-op.h | 12 ++++++++++++ tcg/tcg-opc.h | 2 ++ tcg/tcg.c | 6 ++++++ 3 files changed, 20 insertions(+), 0 deletions(-)