Message ID | 1349814652-22325-2-git-send-email-aurelien@aurel32.net
---|---
State | New
On 9 October 2012 21:30, Aurelien Jarno <aurelien@aurel32.net> wrote:
> The TCG arm backend considers likely that the offset to the TLB
> entries does not exceed 12 bits for mem_index = 0. In practice this is
> not true for at list the MIPS target.
>
> The current patch fixes that by loading the bits 23-12 with a separate
> instruction, and using loads with address writeback, independently of
> the value of mem_idx. In total this allow a 24-bit offset, which is a
> lot more than needed.

Would probably be good to assert() that the bits above 24 are zero,
though.

> Cc: Andrzej Zaborowski <balrogg@gmail.com>
> Cc: Peter Maydell <peter.maydell@linaro.org>
> Cc: qemu-stable@nongnu.org
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
>  tcg/arm/tcg-target.c | 73 +++++++++++++++++++++++++-------------------------
>  1 file changed, 37 insertions(+), 36 deletions(-)
>
> diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
> index 737200e..6cde512 100644
> --- a/tcg/arm/tcg-target.c
> +++ b/tcg/arm/tcg-target.c
> @@ -624,6 +624,19 @@ static inline void tcg_out_ld32_12(TCGContext *s, int cond,
>                      (rn << 16) | (rd << 12) | ((-im) & 0xfff));
>  }
>
> +/* Offset pre-increment with base writeback. */
> +static inline void tcg_out_ld32_12wb(TCGContext *s, int cond,
> +                                     int rd, int rn, tcg_target_long im)
> +{

Worth asserting that rd != rn ? (that's an UNPREDICTABLE case
for ldr with writeback)

> +    if (im >= 0) {
> +        tcg_out32(s, (cond << 28) | 0x05b00000 |
> +                  (rn << 16) | (rd << 12) | (im & 0xfff));
> +    } else {
> +        tcg_out32(s, (cond << 28) | 0x05300000 |
> +                  (rn << 16) | (rd << 12) | ((-im) & 0xfff));
> +    }

you could avoid the if() by just setting the U bit using "(im >= 0) << 23"
Clearer? Dunno.

Looks OK otherwise (though I haven't tested it.)

Random aside: we emit a pretty long string of COND_EQ predicated
code in these functions, especially in the 64 bit address and
byteswapped cases. It might be interesting to see if performance
on an A9, say, was any better if we just did a conditional branch
over it instead... Probably borderline to no-effect, though.

-- PMM
On Tue, Oct 9, 2012 at 11:13 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 9 October 2012 21:30, Aurelien Jarno <aurelien@aurel32.net> wrote:
>> The TCG arm backend considers likely that the offset to the TLB
>> entries does not exceed 12 bits for mem_index = 0. In practice this is
>> not true for at list the MIPS target.
>>
>> The current patch fixes that by loading the bits 23-12 with a separate
>> instruction, and using loads with address writeback, independently of
>> the value of mem_idx. In total this allow a 24-bit offset, which is a
>> lot more than needed.
>
> Would probably be good to assert() that the bits above 24 are zero,
> though.
>
>> Cc: Andrzej Zaborowski <balrogg@gmail.com>
>> Cc: Peter Maydell <peter.maydell@linaro.org>
>> Cc: qemu-stable@nongnu.org
>> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
>> ---
>>  tcg/arm/tcg-target.c | 73 +++++++++++++++++++++++++-------------------------
>>  1 file changed, 37 insertions(+), 36 deletions(-)
>>
>> diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
>> index 737200e..6cde512 100644
>> --- a/tcg/arm/tcg-target.c
>> +++ b/tcg/arm/tcg-target.c
>> @@ -624,6 +624,19 @@ static inline void tcg_out_ld32_12(TCGContext *s, int cond,
>>                      (rn << 16) | (rd << 12) | ((-im) & 0xfff));
>>  }
>>
>> +/* Offset pre-increment with base writeback. */
>> +static inline void tcg_out_ld32_12wb(TCGContext *s, int cond,
>> +                                     int rd, int rn, tcg_target_long im)
>> +{
>
> Worth asserting that rd != rn ? (that's an UNPREDICTABLE case
> for ldr with writeback)
>
>> +    if (im >= 0) {
>> +        tcg_out32(s, (cond << 28) | 0x05b00000 |
>> +                  (rn << 16) | (rd << 12) | (im & 0xfff));
>> +    } else {
>> +        tcg_out32(s, (cond << 28) | 0x05300000 |
>> +                  (rn << 16) | (rd << 12) | ((-im) & 0xfff));
>> +    }
>
> you could avoid the if() by just setting the U bit using "(im >= 0) << 23"
> Clearer? Dunno.

You also have to negate the im value so it's not enough to just
change the opcode.

Laurent

> Looks OK otherwise (though I haven't tested it.)
>
> Random aside: we emit a pretty long string of COND_EQ predicated
> code in these functions, especially in the 64 bit address and
> byteswapped cases. It might be interesting to see if performance
> on an A9, say, was any better if we just did a conditional branch
> over it instead... Probably borderline to no-effect, though.
>
> -- PMM
On 10 October 2012 11:00, Laurent Desnogues <laurent.desnogues@gmail.com> wrote:
> On Tue, Oct 9, 2012 at 11:13 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
>> On 9 October 2012 21:30, Aurelien Jarno <aurelien@aurel32.net> wrote:
>>> +    if (im >= 0) {
>>> +        tcg_out32(s, (cond << 28) | 0x05b00000 |
>>> +                  (rn << 16) | (rd << 12) | (im & 0xfff));
>>> +    } else {
>>> +        tcg_out32(s, (cond << 28) | 0x05300000 |
>>> +                  (rn << 16) | (rd << 12) | ((-im) & 0xfff));
>>> +    }
>>
>> you could avoid the if() by just setting the U bit using "(im >= 0) << 23"
>> Clearer? Dunno.
>
> You also have to negate the im value so it's not enough to just
> change the opcode.

Doh, of course. Forget what I wrote, then :-)

-- PMM
diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
index 737200e..6cde512 100644
--- a/tcg/arm/tcg-target.c
+++ b/tcg/arm/tcg-target.c
@@ -624,6 +624,19 @@ static inline void tcg_out_ld32_12(TCGContext *s, int cond,
                     (rn << 16) | (rd << 12) | ((-im) & 0xfff));
 }
 
+/* Offset pre-increment with base writeback. */
+static inline void tcg_out_ld32_12wb(TCGContext *s, int cond,
+                                     int rd, int rn, tcg_target_long im)
+{
+    if (im >= 0) {
+        tcg_out32(s, (cond << 28) | 0x05b00000 |
+                  (rn << 16) | (rd << 12) | (im & 0xfff));
+    } else {
+        tcg_out32(s, (cond << 28) | 0x05300000 |
+                  (rn << 16) | (rd << 12) | ((-im) & 0xfff));
+    }
+}
+
 static inline void tcg_out_st32_12(TCGContext *s, int cond,
                                    int rd, int rn, tcg_target_long im)
 {
@@ -1056,7 +1069,7 @@ static inline void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
 {
     int addr_reg, data_reg, data_reg2, bswap;
 #ifdef CONFIG_SOFTMMU
-    int mem_index, s_bits;
+    int mem_index, s_bits, tlb_offset;
     TCGReg argreg;
 # if TARGET_LONG_BITS == 64
     int addr_reg2;
@@ -1096,19 +1109,14 @@ static inline void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
                     TCG_REG_R0, TCG_REG_R8, CPU_TLB_SIZE - 1);
     tcg_out_dat_reg(s, COND_AL, ARITH_ADD, TCG_REG_R0, TCG_AREG0,
                     TCG_REG_R0, SHIFT_IMM_LSL(CPU_TLB_ENTRY_BITS));
-    /* In the
-     * ldr r1 [r0, #(offsetof(CPUArchState, tlb_table[mem_index][0].addr_read))]
-     * below, the offset is likely to exceed 12 bits if mem_index != 0 and
-     * not exceed otherwise, so use an
-     * add r0, r0, #(mem_index * sizeof *CPUArchState.tlb_table)
-     * before.
-     */
-    if (mem_index)
+    /* We assume that the offset is contained within 24 bits.  */
+    tlb_offset = offsetof(CPUArchState, tlb_table[mem_index][0].addr_read);
+    if (tlb_offset > 0xfff) {
         tcg_out_dat_imm(s, COND_AL, ARITH_ADD, TCG_REG_R0, TCG_REG_R0,
-                        (mem_index << (TLB_SHIFT & 1)) |
-                        ((16 - (TLB_SHIFT >> 1)) << 8));
-    tcg_out_ld32_12(s, COND_AL, TCG_REG_R1, TCG_REG_R0,
-                    offsetof(CPUArchState, tlb_table[0][0].addr_read));
+                        0xa00 | (tlb_offset >> 12));
+        tlb_offset &= 0xfff;
+    }
+    tcg_out_ld32_12wb(s, COND_AL, TCG_REG_R1, TCG_REG_R0, tlb_offset);
     tcg_out_dat_reg(s, COND_AL, ARITH_CMP, 0, TCG_REG_R1,
                     TCG_REG_R8, SHIFT_IMM_LSL(TARGET_PAGE_BITS));
     /* Check alignment.  */
@@ -1116,15 +1124,14 @@ static inline void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
         tcg_out_dat_imm(s, COND_EQ, ARITH_TST, 0,
                         addr_reg, (1 << s_bits) - 1);
 # if TARGET_LONG_BITS == 64
-    /* XXX: possibly we could use a block data load or writeback in
-     * the first access.  */
-    tcg_out_ld32_12(s, COND_EQ, TCG_REG_R1, TCG_REG_R0,
-                    offsetof(CPUArchState, tlb_table[0][0].addr_read) + 4);
+    /* XXX: possibly we could use a block data load in the first access.  */
+    tcg_out_ld32_12(s, COND_EQ, TCG_REG_R1, TCG_REG_R0, 4);
     tcg_out_dat_reg(s, COND_EQ, ARITH_CMP, 0,
                     TCG_REG_R1, addr_reg2, SHIFT_IMM_LSL(0));
 # endif
     tcg_out_ld32_12(s, COND_EQ, TCG_REG_R1, TCG_REG_R0,
-                    offsetof(CPUArchState, tlb_table[0][0].addend));
+                    offsetof(CPUTLBEntry, addend)
+                    - offsetof(CPUTLBEntry, addr_read));
 
     switch (opc) {
     case 0:
@@ -1273,7 +1280,7 @@ static inline void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
 {
     int addr_reg, data_reg, data_reg2, bswap;
 #ifdef CONFIG_SOFTMMU
-    int mem_index, s_bits;
+    int mem_index, s_bits, tlb_offset;
     TCGReg argreg;
 # if TARGET_LONG_BITS == 64
     int addr_reg2;
@@ -1310,19 +1317,14 @@ static inline void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
                     TCG_REG_R0, TCG_REG_R8, CPU_TLB_SIZE - 1);
     tcg_out_dat_reg(s, COND_AL, ARITH_ADD, TCG_REG_R0, TCG_AREG0,
                     TCG_REG_R0, SHIFT_IMM_LSL(CPU_TLB_ENTRY_BITS));
-    /* In the
-     * ldr r1 [r0, #(offsetof(CPUArchState, tlb_table[mem_index][0].addr_write))]
-     * below, the offset is likely to exceed 12 bits if mem_index != 0 and
-     * not exceed otherwise, so use an
-     * add r0, r0, #(mem_index * sizeof *CPUArchState.tlb_table)
-     * before.
-     */
-    if (mem_index)
+    /* We assume that the offset is contained within 24 bits.  */
+    tlb_offset = offsetof(CPUArchState, tlb_table[mem_index][0].addr_write);
+    if (tlb_offset > 0xfff) {
         tcg_out_dat_imm(s, COND_AL, ARITH_ADD, TCG_REG_R0, TCG_REG_R0,
-                        (mem_index << (TLB_SHIFT & 1)) |
-                        ((16 - (TLB_SHIFT >> 1)) << 8));
-    tcg_out_ld32_12(s, COND_AL, TCG_REG_R1, TCG_REG_R0,
-                    offsetof(CPUArchState, tlb_table[0][0].addr_write));
+                        0xa00 | (tlb_offset >> 12));
+        tlb_offset &= 0xfff;
+    }
+    tcg_out_ld32_12wb(s, COND_AL, TCG_REG_R1, TCG_REG_R0, tlb_offset);
     tcg_out_dat_reg(s, COND_AL, ARITH_CMP, 0, TCG_REG_R1,
                     TCG_REG_R8, SHIFT_IMM_LSL(TARGET_PAGE_BITS));
     /* Check alignment.  */
@@ -1330,15 +1332,14 @@ static inline void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
         tcg_out_dat_imm(s, COND_EQ, ARITH_TST, 0,
                         addr_reg, (1 << s_bits) - 1);
 # if TARGET_LONG_BITS == 64
-    /* XXX: possibly we could use a block data load or writeback in
-     * the first access.  */
-    tcg_out_ld32_12(s, COND_EQ, TCG_REG_R1, TCG_REG_R0,
-                    offsetof(CPUArchState, tlb_table[0][0].addr_write) + 4);
+    /* XXX: possibly we could use a block data load in the first access.  */
+    tcg_out_ld32_12(s, COND_EQ, TCG_REG_R1, TCG_REG_R0, 4);
     tcg_out_dat_reg(s, COND_EQ, ARITH_CMP, 0,
                     TCG_REG_R1, addr_reg2, SHIFT_IMM_LSL(0));
 # endif
     tcg_out_ld32_12(s, COND_EQ, TCG_REG_R1, TCG_REG_R0,
-                    offsetof(CPUArchState, tlb_table[0][0].addend));
+                    offsetof(CPUTLBEntry, addend)
+                    - offsetof(CPUTLBEntry, addr_write));
 
     switch (opc) {
     case 0:
The TCG ARM backend assumes that the offset to the TLB entries does not
exceed 12 bits for mem_index = 0. In practice this is not true for at
least the MIPS target.

The current patch fixes that by loading bits 23-12 with a separate
instruction, and by using loads with address writeback, independently of
the value of mem_index. In total this allows a 24-bit offset, which is
far more than needed.

Cc: Andrzej Zaborowski <balrogg@gmail.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: qemu-stable@nongnu.org
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/arm/tcg-target.c | 73 +++++++++++++++++++++++++-------------------------
 1 file changed, 37 insertions(+), 36 deletions(-)