Message ID | 20231023024059.3858349-1-gaosong@loongson.cn |
---|---|
State | New |
Series | target/loongarch: Support 4K page size |
On 2023/10/23 at 10:40 AM, Song Gao wrote:
> The LoongArch kernel supports 4K page size.
> Change TARGET_PAGE_BITS to 12.
>
> Signed-off-by: Song Gao <gaosong@loongson.cn>
> ---
>  target/loongarch/cpu-param.h  | 2 +-
>  target/loongarch/tlb_helper.c | 9 ++++-----
>  2 files changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/target/loongarch/cpu-param.h b/target/loongarch/cpu-param.h
> index 1265dc7cb5..cfe195db4e 100644
> --- a/target/loongarch/cpu-param.h
> +++ b/target/loongarch/cpu-param.h
> @@ -12,6 +12,6 @@
>  #define TARGET_PHYS_ADDR_SPACE_BITS 48
>  #define TARGET_VIRT_ADDR_SPACE_BITS 48
>
> -#define TARGET_PAGE_BITS 14
> +#define TARGET_PAGE_BITS 12

Hi Gaosong,

The popular OSes for LoongArch still use a 16K page size, so QEMU should
follow the OS rather than define a 4K page size alone.

Regards
Bibo Mao

>
>  #endif
> diff --git a/target/loongarch/tlb_helper.c b/target/loongarch/tlb_helper.c
> index c8b8b0497f..449043c68b 100644
> --- a/target/loongarch/tlb_helper.c
> +++ b/target/loongarch/tlb_helper.c
> @@ -60,6 +60,9 @@ static int loongarch_map_tlb_entry(CPULoongArchState *env, hwaddr *physical,
>          tlb_rplv = 0;
>      }
>
> +    /* Remove sw bit between bit12 -- bit PS*/
> +    tlb_ppn = tlb_ppn & ~(((0x1UL << (tlb_ps - 12)) -1));
> +
>      /* Check access rights */
>      if (!tlb_v) {
>          return TLBRET_INVALID;
> @@ -82,10 +85,6 @@ static int loongarch_map_tlb_entry(CPULoongArchState *env, hwaddr *physical,
>          return TLBRET_DIRTY;
>      }
>
> -    /*
> -     * tlb_entry contains ppn[47:12] while 16KiB ppn is [47:15]
> -     * need adjust.
> -     */
>      *physical = (tlb_ppn << R_TLBENTRY_64_PPN_SHIFT) |
>                  (address & MAKE_64BIT_MASK(0, tlb_ps));
>      *prot = PAGE_READ;
> @@ -774,7 +773,7 @@ void helper_ldpte(CPULoongArchState *env, target_ulong base, target_ulong odd,
>          /* Move Global bit */
>          tmp0 = ((tmp0 & (1 << LOONGARCH_HGLOBAL_SHIFT)) >>
>                  LOONGARCH_HGLOBAL_SHIFT) << R_TLBENTRY_G_SHIFT |
> -                (tmp0 & (~(1 << R_TLBENTRY_G_SHIFT)));
> +                (tmp0 & (~(1 << LOONGARCH_HGLOBAL_SHIFT)));
>          ps = ptbase + ptwidth - 1;
>          if (odd) {
>              tmp0 += MAKE_64BIT_MASK(ps, 1);
>
On Mon, 23 Oct 2023 at 05:06, maobibo <maobibo@loongson.cn> wrote:
>
>
>
> On 2023/10/23 at 10:40 AM, Song Gao wrote:
> > The LoongArch kernel supports 4K page size.
> > Change TARGET_PAGE_BITS to 12.
> >
> > Signed-off-by: Song Gao <gaosong@loongson.cn>
> > ---
> >  target/loongarch/cpu-param.h  | 2 +-
> >  target/loongarch/tlb_helper.c | 9 ++++-----
> >  2 files changed, 5 insertions(+), 6 deletions(-)
> >
> > diff --git a/target/loongarch/cpu-param.h b/target/loongarch/cpu-param.h
> > index 1265dc7cb5..cfe195db4e 100644
> > --- a/target/loongarch/cpu-param.h
> > +++ b/target/loongarch/cpu-param.h
> > @@ -12,6 +12,6 @@
> >  #define TARGET_PHYS_ADDR_SPACE_BITS 48
> >  #define TARGET_VIRT_ADDR_SPACE_BITS 48
> >
> > -#define TARGET_PAGE_BITS 14
> > +#define TARGET_PAGE_BITS 12
> Hi Gaosong,
>
> The popular OS about LoongArch still uses 16K page size, qemu should
> follow the rule of OS rather than defining 4K page size alone.

The TARGET_PAGE_BITS value in QEMU is a property of the hardware,
not the guest OS. It should specify the smallest page size the
guest can configure the CPU to use. If the guest asks for a
larger page size than the minimum then that works fine. See
for example PPC64 -- on this architecture both 4K and 64K
pages are possible, so we define TARGET_PAGE_BITS to 12,
even though a lot of Linux guests use 64K pages.

It is slightly less efficient when the guest uses a page size
larger than the TARGET_PAGE_BITS value indicates, so if you
have an architecture where some CPUs support small pages
but most do not, you can do what Arm does, and use the
TARGET_PAGE_BITS_VARY support. This makes the TARGET_PAGE_BITS
macro be a runtime-configurable value, where a machine model can
set the mc->minimum_page_bits value to indicate that that
machine doesn't need the small-pages handling.

thanks
-- PMM
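As an illustration of the arithmetic behind the message above, here is a minimal sketch (not QEMU code) of how a guest page larger than the minimum is covered by several minimum-size pages. The names `MIN_PAGE_BITS` and `min_pages_per_guest_page` are hypothetical and exist only for this example; the value 12 is assumed to match the TARGET_PAGE_BITS proposed in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed minimum page size: 4 KiB, i.e. TARGET_PAGE_BITS == 12 as proposed. */
#define MIN_PAGE_BITS 12

/*
 * Hypothetical helper: number of minimum-size pages needed to cover one
 * guest page of 2^guest_page_bits bytes.
 */
static inline uint64_t min_pages_per_guest_page(unsigned guest_page_bits)
{
    /* A guest cannot configure a page smaller than the hardware minimum. */
    assert(guest_page_bits >= MIN_PAGE_BITS && guest_page_bits < 64);
    return UINT64_C(1) << (guest_page_bits - MIN_PAGE_BITS);
}
```

Under these assumptions a 16K guest page (14 bits) spans four 4K units and a 64K page (16 bits) spans sixteen, which is one way to picture why a guest using pages larger than the minimum still works but costs somewhat more to handle.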
On 2023/10/23 at 6:22 PM, Peter Maydell wrote:
> On Mon, 23 Oct 2023 at 05:06, maobibo <maobibo@loongson.cn> wrote:
>>
>>
>>
>> On 2023/10/23 at 10:40 AM, Song Gao wrote:
>>> The LoongArch kernel supports 4K page size.
>>> Change TARGET_PAGE_BITS to 12.
>>>
>>> Signed-off-by: Song Gao <gaosong@loongson.cn>
>>> ---
>>>  target/loongarch/cpu-param.h  | 2 +-
>>>  target/loongarch/tlb_helper.c | 9 ++++-----
>>>  2 files changed, 5 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/target/loongarch/cpu-param.h b/target/loongarch/cpu-param.h
>>> index 1265dc7cb5..cfe195db4e 100644
>>> --- a/target/loongarch/cpu-param.h
>>> +++ b/target/loongarch/cpu-param.h
>>> @@ -12,6 +12,6 @@
>>>  #define TARGET_PHYS_ADDR_SPACE_BITS 48
>>>  #define TARGET_VIRT_ADDR_SPACE_BITS 48
>>>
>>> -#define TARGET_PAGE_BITS 14
>>> +#define TARGET_PAGE_BITS 12
>> Hi Gaosong,
>>
>> The popular OS about LoongArch still uses 16K page size, qemu should
>> follow the rule of OS rather than defining 4K page size alone.
>
> The TARGET_PAGE_BITS value in QEMU is a property of the hardware,
> not the guest OS. It should specify the smallest page size the
> guest can configure the CPU to use. If the guest asks for a
> larger page size than the minimum then that works fine. See
> for example PPC64 -- on this architecture both 4K and 64K
> pages are possible, so we define TARGET_PAGE_BITS to 12,
> even though a lot of Linux guests use 64K pages.
>
> It is slightly less efficient when the guest uses a page size
> larger than the TARGET_PAGE_BITS value indicates, so if you
> have an architecture where some CPUs support small pages
> but most do not, you can do what Arm does, and use the
> TARGET_PAGE_BITS_VARY support. This makes the TARGET_PAGE_BITS
> macro be a runtime-configurable value, where a machine model can
> set the mc->minimum_page_bits value to indicate that that
> machine doesn't need the small-pages handling.

Peter,

Thanks for your guidance; the TARGET_PAGE_BITS setting has puzzled us
for a long time.

I ran a simple test with kernels built for 4K and 16K page sizes, and
they boot well when TARGET_PAGE_BITS is set to 12. We will do more
testing, and we will switch TARGET_PAGE_BITS to 12 if all the tests pass.

Regards
Bibo Mao

>
> thanks
> -- PMM
>
On 23/10/23 12:22, Peter Maydell wrote:
> On Mon, 23 Oct 2023 at 05:06, maobibo <maobibo@loongson.cn> wrote:
>>
>>
>>
>> On 2023/10/23 at 10:40 AM, Song Gao wrote:
>>> The LoongArch kernel supports 4K page size.
>>> Change TARGET_PAGE_BITS to 12.
>>>
>>> Signed-off-by: Song Gao <gaosong@loongson.cn>
>>> ---
>>>  target/loongarch/cpu-param.h  | 2 +-
>>>  target/loongarch/tlb_helper.c | 9 ++++-----
>>>  2 files changed, 5 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/target/loongarch/cpu-param.h b/target/loongarch/cpu-param.h
>>> index 1265dc7cb5..cfe195db4e 100644
>>> --- a/target/loongarch/cpu-param.h
>>> +++ b/target/loongarch/cpu-param.h
>>> @@ -12,6 +12,6 @@
>>>  #define TARGET_PHYS_ADDR_SPACE_BITS 48
>>>  #define TARGET_VIRT_ADDR_SPACE_BITS 48
>>>
>>> -#define TARGET_PAGE_BITS 14
>>> +#define TARGET_PAGE_BITS 12
>> Hi Gaosong,
>>
>> The popular OS about LoongArch still uses 16K page size, qemu should
>> follow the rule of OS rather than defining 4K page size alone.
>
> The TARGET_PAGE_BITS value in QEMU is a property of the hardware,
> not the guest OS. It should specify the smallest page size the
> guest can configure the CPU to use. If the guest asks for a
> larger page size than the minimum then that works fine. See
> for example PPC64 -- on this architecture both 4K and 64K
> pages are possible, so we define TARGET_PAGE_BITS to 12,
> even though a lot of Linux guests use 64K pages.
>
> It is slightly less efficient when the guest uses a page size
> larger than the TARGET_PAGE_BITS value indicates, so if you
> have an architecture where some CPUs support small pages
> but most do not, you can do what Arm does, and use the
> TARGET_PAGE_BITS_VARY support. This makes the TARGET_PAGE_BITS
> macro be a runtime-configurable value, where a machine model can
> set the mc->minimum_page_bits value to indicate that that
> machine doesn't need the small-pages handling.

With heterogeneous architecture emulation, eventually all targets will
use TARGET_PAGE_BITS_VARY.
23.10.2023 05:40, Song Gao wrote:
> The LoongArch kernel supports 4K page size.
> Change TARGET_PAGE_BITS to 12.

This change appears to have 2 issues.

First, the subject is misleading: it does not only introduce support for 4K page
size, it actually *switches* to 4K page size. But this is sort of minor.

More interesting is that it has quite a noticeable effect on performance. For
example, https://gitlab.com/qemu-project/qemu/-/issues/2491 - I confirm a 7z
decompression performance drop from ~110Mb/s before this change to ~73Mb/s
after it.

Is such a performance drop expected?

Thanks,

/mjt
On 10/7/24 07:48, Michael Tokarev wrote:
> 23.10.2023 05:40, Song Gao wrote:
>> The LoongArch kernel supports 4K page size.
>> Change TARGET_PAGE_BITS to 12.
>
> This change appears to have 2 issues.
>
> First, the subject is misleading, - it does not only introduces support for 4K page
> size, it actually *switches* to 4K page size. But this is sort of minor.
>
> More interestingly is that it has quite noticeable effect on performance. For
> example, https://gitlab.com/qemu-project/qemu/-/issues/2491 - I confirm 7z
> decompression performance drop from ~110Mb/s before this change to ~73Mb/s
> after it.
>
> Is such a performance drop expected?

The #2491 issue appears to be for user-mode emulation. Because the reported host is x86,
I would expect guest page size == host page size to improve performance, not degrade it.

If this were system mode emulation, quite possibly. If the guest loongarch kernel is
still using 16k pages, then all pages that are given to softmmu are "large pages", which
perform poorly. I hope to address this at some point.

If this is really about user-mode, then perf may be your friend in determining where the
extra overhead is coming from.

r~
07.10.2024 18:09, Richard Henderson wrote:

> The #2491 issue appears to be for user-mode emulation. Because the reported host is x86, I would expect guest page size == host page size to improve
> performance, not degrade it.

Yes, it is about linux-user.

> If this is really about user-mode, then perf may be your friend in determining where the extra overhead is coming from.

I updated the issue adding some perf output. It looks like the 4k-pagesize
case just calls tb_lookup() and extract64() significantly more times than
with 16K pages.

Thanks,

/mjt
diff --git a/target/loongarch/cpu-param.h b/target/loongarch/cpu-param.h
index 1265dc7cb5..cfe195db4e 100644
--- a/target/loongarch/cpu-param.h
+++ b/target/loongarch/cpu-param.h
@@ -12,6 +12,6 @@
 #define TARGET_PHYS_ADDR_SPACE_BITS 48
 #define TARGET_VIRT_ADDR_SPACE_BITS 48
 
-#define TARGET_PAGE_BITS 14
+#define TARGET_PAGE_BITS 12
 
 #endif
diff --git a/target/loongarch/tlb_helper.c b/target/loongarch/tlb_helper.c
index c8b8b0497f..449043c68b 100644
--- a/target/loongarch/tlb_helper.c
+++ b/target/loongarch/tlb_helper.c
@@ -60,6 +60,9 @@ static int loongarch_map_tlb_entry(CPULoongArchState *env, hwaddr *physical,
         tlb_rplv = 0;
     }
 
+    /* Remove sw bit between bit12 -- bit PS*/
+    tlb_ppn = tlb_ppn & ~(((0x1UL << (tlb_ps - 12)) -1));
+
     /* Check access rights */
     if (!tlb_v) {
         return TLBRET_INVALID;
@@ -82,10 +85,6 @@ static int loongarch_map_tlb_entry(CPULoongArchState *env, hwaddr *physical,
         return TLBRET_DIRTY;
     }
 
-    /*
-     * tlb_entry contains ppn[47:12] while 16KiB ppn is [47:15]
-     * need adjust.
-     */
     *physical = (tlb_ppn << R_TLBENTRY_64_PPN_SHIFT) |
                 (address & MAKE_64BIT_MASK(0, tlb_ps));
     *prot = PAGE_READ;
@@ -774,7 +773,7 @@ void helper_ldpte(CPULoongArchState *env, target_ulong base, target_ulong odd,
         /* Move Global bit */
         tmp0 = ((tmp0 & (1 << LOONGARCH_HGLOBAL_SHIFT)) >>
                 LOONGARCH_HGLOBAL_SHIFT) << R_TLBENTRY_G_SHIFT |
-                (tmp0 & (~(1 << R_TLBENTRY_G_SHIFT)));
+                (tmp0 & (~(1 << LOONGARCH_HGLOBAL_SHIFT)));
         ps = ptbase + ptwidth - 1;
         if (odd) {
             tmp0 += MAKE_64BIT_MASK(ps, 1);
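The masking step added to loongarch_map_tlb_entry() in the diff above can be pictured with a small standalone sketch. This is not the QEMU implementation: `compose_physical` and `PPN_SHIFT` are hypothetical names, and the shift of 12 is an assumption taken from the removed comment, which says the TLB entry holds ppn[47:12].

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the PPN field counts 4 KiB units, so it is shifted by 12. */
#define PPN_SHIFT 12

/*
 * Hypothetical stand-in for the address composition in the patch:
 * mask the PPN down to the page boundary, then add the page offset.
 */
static inline uint64_t compose_physical(uint64_t ppn, unsigned page_bits,
                                        uint64_t vaddr)
{
    assert(page_bits >= PPN_SHIFT && page_bits < 64);

    /* Clear the PPN bits that lie below the page size (bit 12 up to bit PS-1). */
    uint64_t sw_mask = (UINT64_C(1) << (page_bits - PPN_SHIFT)) - 1;
    uint64_t frame = (ppn & ~sw_mask) << PPN_SHIFT;

    /* Keep the low page_bits bits of the virtual address as the page offset. */
    uint64_t offset = vaddr & ((UINT64_C(1) << page_bits) - 1);

    return frame | offset;
}
```

With a 16K page (page_bits 14), a PPN of 0x12345 and a virtual address of 0x2ABC, the two low PPN bits are dropped before the offset is merged in, giving 0x12346ABC. Without the mask those bits would be ORed with offset bits 12 and 13, which appears to be the case the new comment in the patch is guarding against.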
The LoongArch kernel supports 4K page size.
Change TARGET_PAGE_BITS to 12.

Signed-off-by: Song Gao <gaosong@loongson.cn>
---
 target/loongarch/cpu-param.h  | 2 +-
 target/loongarch/tlb_helper.c | 9 ++++-----
 2 files changed, 5 insertions(+), 6 deletions(-)