Message ID | 20240623115704.315645-3-mark.cave-ayland@ilande.co.uk |
---|---|
State | New |
Headers | show |
Series | target/m68k: implement unaligned accesses for m68k CPUs | expand |
On Sun, 23 Jun 2024, Mark Cave-Ayland wrote: > Now that do_unaligned_access has been implemented for 68k CPUs, pass the required > alignment into the TCG memory load/store routines. This allows the TCG memory core > to generate an Address Error exception for unaligned memory accesses if required. > > Suggested-by: Laurent Vivier <laurent@vivier.eu> > Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2165 > --- > target/m68k/translate.c | 18 +++++++++++++++--- > 1 file changed, 15 insertions(+), 3 deletions(-) > > diff --git a/target/m68k/translate.c b/target/m68k/translate.c > index 445966fb6a..661a7b4def 100644 > --- a/target/m68k/translate.c > +++ b/target/m68k/translate.c > @@ -303,13 +303,18 @@ static inline TCGv gen_load(DisasContext *s, int opsize, TCGv addr, > int sign, int index) > { > TCGv tmp = tcg_temp_new_i32(); > + MemOp memop = opsize | (sign ? MO_SIGN : 0) | MO_TE; > > switch (opsize) { > case OS_BYTE: > + tcg_gen_qemu_ld_tl(tmp, addr, index, memop); > + break; > case OS_WORD: > case OS_LONG: > - tcg_gen_qemu_ld_tl(tmp, addr, index, > - opsize | (sign ? MO_SIGN : 0) | MO_TE); > + if (!m68k_feature(s->env, M68K_FEATURE_UNALIGNED_DATA)) { > + memop |= MO_ALIGN_2; > + } > + tcg_gen_qemu_ld_tl(tmp, addr, index, memop); You could swap the order of these so byte comes last and fall through to it from word/long to avoid duplicated line. Maybe this answers my question about where it's restriced by CPU type. I wonder if this check for M68K_FEATURE_UNALIGNED_DATA could be avoded here and done by checking it in init and only set the unaligned method for CPUs that need it to not add overhead for most CPUs that don't need it. Regards, BALATON Zoltan > break; > default: > g_assert_not_reached(); > @@ -321,11 +326,18 @@ static inline TCGv gen_load(DisasContext *s, int opsize, TCGv addr, > static inline void gen_store(DisasContext *s, int opsize, TCGv addr, TCGv val, > int index) > { > + MemOp memop = opsize | MO_TE; > + > switch (opsize) { > case OS_BYTE: > + tcg_gen_qemu_st_tl(val, addr, index, memop); > + break; > case OS_WORD: > case OS_LONG: > - tcg_gen_qemu_st_tl(val, addr, index, opsize | MO_TE); > + if (!m68k_feature(s->env, M68K_FEATURE_UNALIGNED_DATA)) { > + memop |= MO_ALIGN_2; > + } > + tcg_gen_qemu_st_tl(val, addr, index, memop); > break; > default: > g_assert_not_reached(); >
On 6/23/24 08:23, BALATON Zoltan wrote: > On Sun, 23 Jun 2024, Mark Cave-Ayland wrote: >> Now that do_unaligned_access has been implemented for 68k CPUs, pass the required >> alignment into the TCG memory load/store routines. This allows the TCG memory core >> to generate an Address Error exception for unaligned memory accesses if required. >> >> Suggested-by: Laurent Vivier <laurent@vivier.eu> >> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> >> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2165 >> --- >> target/m68k/translate.c | 18 +++++++++++++++--- >> 1 file changed, 15 insertions(+), 3 deletions(-) >> >> diff --git a/target/m68k/translate.c b/target/m68k/translate.c >> index 445966fb6a..661a7b4def 100644 >> --- a/target/m68k/translate.c >> +++ b/target/m68k/translate.c >> @@ -303,13 +303,18 @@ static inline TCGv gen_load(DisasContext *s, int opsize, TCGv addr, >> int sign, int index) >> { >> TCGv tmp = tcg_temp_new_i32(); >> + MemOp memop = opsize | (sign ? MO_SIGN : 0) | MO_TE; >> >> switch (opsize) { >> case OS_BYTE: >> + tcg_gen_qemu_ld_tl(tmp, addr, index, memop); >> + break; >> case OS_WORD: >> case OS_LONG: >> - tcg_gen_qemu_ld_tl(tmp, addr, index, >> - opsize | (sign ? MO_SIGN : 0) | MO_TE); >> + if (!m68k_feature(s->env, M68K_FEATURE_UNALIGNED_DATA)) { >> + memop |= MO_ALIGN_2; >> + } >> + tcg_gen_qemu_ld_tl(tmp, addr, index, memop); > > You could swap the order of these so byte comes last and fall through to it from word/long > to avoid duplicated line. > > Maybe this answers my question about where it's restriced by CPU type. I wonder if this > check for M68K_FEATURE_UNALIGNED_DATA could be avoded here and done by checking it in init > and only set the unaligned method for CPUs that need it to not add overhead for most CPUs > that don't need it. No, there's no overhead in having the unaligned method present always. But swapping the order of case labels, or sinking the tcg_gen_qemu_ld_tl call below the switch, is a good idea. r~
On 23/06/2024 16:23, BALATON Zoltan wrote: > On Sun, 23 Jun 2024, Mark Cave-Ayland wrote: >> Now that do_unaligned_access has been implemented for 68k CPUs, pass the required >> alignment into the TCG memory load/store routines. This allows the TCG memory core >> to generate an Address Error exception for unaligned memory accesses if required. >> >> Suggested-by: Laurent Vivier <laurent@vivier.eu> >> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> >> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2165 >> --- >> target/m68k/translate.c | 18 +++++++++++++++--- >> 1 file changed, 15 insertions(+), 3 deletions(-) >> >> diff --git a/target/m68k/translate.c b/target/m68k/translate.c >> index 445966fb6a..661a7b4def 100644 >> --- a/target/m68k/translate.c >> +++ b/target/m68k/translate.c >> @@ -303,13 +303,18 @@ static inline TCGv gen_load(DisasContext *s, int opsize, TCGv >> addr, >> int sign, int index) >> { >> TCGv tmp = tcg_temp_new_i32(); >> + MemOp memop = opsize | (sign ? MO_SIGN : 0) | MO_TE; >> >> switch (opsize) { >> case OS_BYTE: >> + tcg_gen_qemu_ld_tl(tmp, addr, index, memop); >> + break; >> case OS_WORD: >> case OS_LONG: >> - tcg_gen_qemu_ld_tl(tmp, addr, index, >> - opsize | (sign ? MO_SIGN : 0) | MO_TE); >> + if (!m68k_feature(s->env, M68K_FEATURE_UNALIGNED_DATA)) { >> + memop |= MO_ALIGN_2; >> + } >> + tcg_gen_qemu_ld_tl(tmp, addr, index, memop); > > You could swap the order of these so byte comes last and fall through to it from > word/long to avoid duplicated line. > > Maybe this answers my question about where it's restriced by CPU type. I wonder if > this check for M68K_FEATURE_UNALIGNED_DATA could be avoded here and done by checking > it in init and only set the unaligned method for CPUs that need it to not add > overhead for most CPUs that don't need it. I don't think that it matters too much if the method isn't implemented as the logic surrounding when to call do_unaligned_access is contained within the TCG core. I'll have a go at updating the ordering and send a v2 if it looks good. ATB, Mark.
On Mon, 24 Jun 2024, Mark Cave-Ayland wrote: > On 23/06/2024 16:23, BALATON Zoltan wrote: >> On Sun, 23 Jun 2024, Mark Cave-Ayland wrote: >>> Now that do_unaligned_access has been implemented for 68k CPUs, pass the >>> required >>> alignment into the TCG memory load/store routines. This allows the TCG >>> memory core >>> to generate an Address Error exception for unaligned memory accesses if >>> required. >>> >>> Suggested-by: Laurent Vivier <laurent@vivier.eu> >>> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> >>> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2165 >>> --- >>> target/m68k/translate.c | 18 +++++++++++++++--- >>> 1 file changed, 15 insertions(+), 3 deletions(-) >>> >>> diff --git a/target/m68k/translate.c b/target/m68k/translate.c >>> index 445966fb6a..661a7b4def 100644 >>> --- a/target/m68k/translate.c >>> +++ b/target/m68k/translate.c >>> @@ -303,13 +303,18 @@ static inline TCGv gen_load(DisasContext *s, int >>> opsize, TCGv addr, >>> int sign, int index) >>> { >>> TCGv tmp = tcg_temp_new_i32(); >>> + MemOp memop = opsize | (sign ? MO_SIGN : 0) | MO_TE; >>> >>> switch (opsize) { >>> case OS_BYTE: >>> + tcg_gen_qemu_ld_tl(tmp, addr, index, memop); >>> + break; >>> case OS_WORD: >>> case OS_LONG: >>> - tcg_gen_qemu_ld_tl(tmp, addr, index, >>> - opsize | (sign ? MO_SIGN : 0) | MO_TE); >>> + if (!m68k_feature(s->env, M68K_FEATURE_UNALIGNED_DATA)) { >>> + memop |= MO_ALIGN_2; >>> + } >>> + tcg_gen_qemu_ld_tl(tmp, addr, index, memop); >> >> You could swap the order of these so byte comes last and fall through to it >> from word/long to avoid duplicated line. >> >> Maybe this answers my question about where it's restriced by CPU type. I >> wonder if this check for M68K_FEATURE_UNALIGNED_DATA could be avoded here >> and done by checking it in init and only set the unaligned method for CPUs >> that need it to not add overhead for most CPUs that don't need it. > > I don't think that it matters too much if the method isn't implemented as the > logic surrounding when to call do_unaligned_access is contained within the > TCG core. I've seen this after I've sent a patch for PPC where removing a conditional from a helper often called for memory access showed it had an effect on performance. So I thought adding a conditional here might cause some slow down for CPUs that don't need this. But this is compile time so maybe it's not that big problem as in a hepler. Yet only adding the unaligned method for CPUs then always set MO_ALIGN here avoiding the if only calls the method when needed if that works at all. I don't know if that's worth it, you could test with some memory intensive benchmark such as STREAM benchmark to check if this has any effect. Regards, BALATON Zotan > I'll have a go at updating the ordering and send a v2 if it looks good. > > > ATB, > > Mark. > >
diff --git a/target/m68k/translate.c b/target/m68k/translate.c index 445966fb6a..661a7b4def 100644 --- a/target/m68k/translate.c +++ b/target/m68k/translate.c @@ -303,13 +303,18 @@ static inline TCGv gen_load(DisasContext *s, int opsize, TCGv addr, int sign, int index) { TCGv tmp = tcg_temp_new_i32(); + MemOp memop = opsize | (sign ? MO_SIGN : 0) | MO_TE; switch (opsize) { case OS_BYTE: + tcg_gen_qemu_ld_tl(tmp, addr, index, memop); + break; case OS_WORD: case OS_LONG: - tcg_gen_qemu_ld_tl(tmp, addr, index, - opsize | (sign ? MO_SIGN : 0) | MO_TE); + if (!m68k_feature(s->env, M68K_FEATURE_UNALIGNED_DATA)) { + memop |= MO_ALIGN_2; + } + tcg_gen_qemu_ld_tl(tmp, addr, index, memop); break; default: g_assert_not_reached(); @@ -321,11 +326,18 @@ static inline TCGv gen_load(DisasContext *s, int opsize, TCGv addr, static inline void gen_store(DisasContext *s, int opsize, TCGv addr, TCGv val, int index) { + MemOp memop = opsize | MO_TE; + switch (opsize) { case OS_BYTE: + tcg_gen_qemu_st_tl(val, addr, index, memop); + break; case OS_WORD: case OS_LONG: - tcg_gen_qemu_st_tl(val, addr, index, opsize | MO_TE); + if (!m68k_feature(s->env, M68K_FEATURE_UNALIGNED_DATA)) { + memop |= MO_ALIGN_2; + } + tcg_gen_qemu_st_tl(val, addr, index, memop); break; default: g_assert_not_reached();
Now that do_unaligned_access has been implemented for 68k CPUs, pass the required alignment into the TCG memory load/store routines. This allows the TCG memory core to generate an Address Error exception for unaligned memory accesses if required. Suggested-by: Laurent Vivier <laurent@vivier.eu> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2165 --- target/m68k/translate.c | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-)