Message ID | 20140317215455.2f14b61f@redhat.com |
---|---|
State | New |
Headers | show |
On 2014-03-18 02:54, Luiz Capitulino wrote: > If you start a Linux guest with more than 4GB of memory and try to look at a > memory address, you will get an error from gdb: > > (gdb) p node_data[0]->node_id > Cannot access memory at address 0xffff88013fffd3a0 > (gdb) I suppose this is x86-64, not 32-bit with PTE, right? > > I debugged this down to x86_cpu_get_phys_page_debug(), it doesn't handle the > case where the PDPTE has the PS bit set (although I didn't check where Linux > sets that bit). This commit adds the PS bit handling, which fixes the problem > for me. > > Signed-off-by: Luiz capitulino <lcapitulino@redhat.com> > --- > > Two observations: > > 1. This bug has always existed, so it's not a regression, so I'm not sure > it's worth it to fix for 2.0 > > 2. I'm not familiar with every detail of x86_cpu_get_phys_page_debug(), > so I'm not completely sure this is the right thing to do > > target-i386/helper.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/target-i386/helper.c b/target-i386/helper.c > index 4f447b8..9b7803f 100644 > --- a/target-i386/helper.c > +++ b/target-i386/helper.c > @@ -951,6 +951,13 @@ hwaddr x86_cpu_get_phys_page_debug(CPUState *cs, vaddr addr) > return -1; > } > > + if (pdpe & PG_PSE_MASK) { > + page_size = 1024 * 1024 * 1024; > + pte = pdpe & ~( (page_size - 1) & ~0xfff); > + pte &= ~(PG_NX_MASK | PG_HI_USER_MASK); > + goto out; > + } Does this also apply if we are not in long mode? I'll check this more carefully later. Jan > + > pde_addr = ((pdpe & ~0xfff & ~(PG_NX_MASK | PG_HI_USER_MASK)) + > (((addr >> 21) & 0x1ff) << 3)) & env->a20_mask; > pde = ldq_phys(cs->as, pde_addr); > @@ -993,6 +1000,7 @@ hwaddr x86_cpu_get_phys_page_debug(CPUState *cs, vaddr addr) > pte = pte & env->a20_mask; > } > > +out: > page_offset = (addr & TARGET_PAGE_MASK) & (page_size - 1); > paddr = (pte & TARGET_PAGE_MASK) + page_offset; > return paddr; >
Il 18/03/2014 08:19, Jan Kiszka ha scritto: > On 2014-03-18 02:54, Luiz Capitulino wrote: >> If you start a Linux guest with more than 4GB of memory and try to look at a >> memory address, you will get an error from gdb: >> >> (gdb) p node_data[0]->node_id >> Cannot access memory at address 0xffff88013fffd3a0 >> (gdb) > > I suppose this is x86-64, not 32-bit with PTE, right? > >> >> I debugged this down to x86_cpu_get_phys_page_debug(), it doesn't handle the >> case where the PDPTE has the PS bit set (although I didn't check where Linux >> sets that bit). This commit adds the PS bit handling, which fixes the problem >> for me. >> >> Signed-off-by: Luiz capitulino <lcapitulino@redhat.com> >> --- >> >> Two observations: >> >> 1. This bug has always existed, so it's not a regression, so I'm not sure >> it's worth it to fix for 2.0 Sure, why not? >> 2. I'm not familiar with every detail of x86_cpu_get_phys_page_debug(), >> so I'm not completely sure this is the right thing to do >> >> target-i386/helper.c | 8 ++++++++ >> 1 file changed, 8 insertions(+) >> >> diff --git a/target-i386/helper.c b/target-i386/helper.c >> index 4f447b8..9b7803f 100644 >> --- a/target-i386/helper.c >> +++ b/target-i386/helper.c >> @@ -951,6 +951,13 @@ hwaddr x86_cpu_get_phys_page_debug(CPUState *cs, vaddr addr) >> return -1; >> } >> >> + if (pdpe & PG_PSE_MASK) { >> + page_size = 1024 * 1024 * 1024; >> + pte = pdpe & ~( (page_size - 1) & ~0xfff); >> + pte &= ~(PG_NX_MASK | PG_HI_USER_MASK); >> + goto out; >> + } > > Does this also apply if we are not in long mode? No, it doesn't. The only valid bits in a PAE PDPTE are P, PWT and PCD. Bit 7 (PS) is reserved. Paolo
On 2014-03-18 08:36, Paolo Bonzini wrote: > Il 18/03/2014 08:19, Jan Kiszka ha scritto: >> On 2014-03-18 02:54, Luiz Capitulino wrote: >>> If you start a Linux guest with more than 4GB of memory and try to >>> look at a >>> memory address, you will get an error from gdb: >>> >>> (gdb) p node_data[0]->node_id >>> Cannot access memory at address 0xffff88013fffd3a0 >>> (gdb) >> >> I suppose this is x86-64, not 32-bit with PTE, right? >> >>> >>> I debugged this down to x86_cpu_get_phys_page_debug(), it doesn't >>> handle the >>> case where the PDPTE has the PS bit set (although I didn't check >>> where Linux >>> sets that bit). This commit adds the PS bit handling, which fixes the >>> problem >>> for me. >>> >>> Signed-off-by: Luiz capitulino <lcapitulino@redhat.com> >>> --- >>> >>> Two observations: >>> >>> 1. This bug has always existed, so it's not a regression, so I'm not >>> sure >>> it's worth it to fix for 2.0 > > Sure, why not? > >>> 2. I'm not familiar with every detail of x86_cpu_get_phys_page_debug(), >>> so I'm not completely sure this is the right thing to do >>> >>> target-i386/helper.c | 8 ++++++++ >>> 1 file changed, 8 insertions(+) >>> >>> diff --git a/target-i386/helper.c b/target-i386/helper.c >>> index 4f447b8..9b7803f 100644 >>> --- a/target-i386/helper.c >>> +++ b/target-i386/helper.c >>> @@ -951,6 +951,13 @@ hwaddr x86_cpu_get_phys_page_debug(CPUState *cs, >>> vaddr addr) >>> return -1; >>> } >>> >>> + if (pdpe & PG_PSE_MASK) { >>> + page_size = 1024 * 1024 * 1024; >>> + pte = pdpe & ~( (page_size - 1) & ~0xfff); >>> + pte &= ~(PG_NX_MASK | PG_HI_USER_MASK); >>> + goto out; >>> + } >> >> Does this also apply if we are not in long mode? > > No, it doesn't. The only valid bits in a PAE PDPTE are P, PWT and PCD. > Bit 7 (PS) is reserved. Right, this belongs in the "if (env->hflags & HF_LMA_MASK)" block. And the subject or description should mention that x86_cpu_get_phys_page_debug was lacking support for 1G hugepages. Jan
On Tue, 18 Mar 2014 08:19:52 +0100 Jan Kiszka <jan.kiszka@siemens.com> wrote: > On 2014-03-18 02:54, Luiz Capitulino wrote: > > If you start a Linux guest with more than 4GB of memory and try to look at a > > memory address, you will get an error from gdb: > > > > (gdb) p node_data[0]->node_id > > Cannot access memory at address 0xffff88013fffd3a0 > > (gdb) > > I suppose this is x86-64, not 32-bit with PTE, right? Right. > > > > > I debugged this down to x86_cpu_get_phys_page_debug(), it doesn't handle the > > case where the PDPTE has the PS bit set (although I didn't check where Linux > > sets that bit). This commit adds the PS bit handling, which fixes the problem > > for me. > > > > Signed-off-by: Luiz capitulino <lcapitulino@redhat.com> > > --- > > > > Two observations: > > > > 1. This bug has always existed, so it's not a regression, so I'm not sure > > it's worth it to fix for 2.0 > > > > 2. I'm not familiar with every detail of x86_cpu_get_phys_page_debug(), > > so I'm not completely sure this is the right thing to do > > > > target-i386/helper.c | 8 ++++++++ > > 1 file changed, 8 insertions(+) > > > > diff --git a/target-i386/helper.c b/target-i386/helper.c > > index 4f447b8..9b7803f 100644 > > --- a/target-i386/helper.c > > +++ b/target-i386/helper.c > > @@ -951,6 +951,13 @@ hwaddr x86_cpu_get_phys_page_debug(CPUState *cs, vaddr addr) > > return -1; > > } > > > > + if (pdpe & PG_PSE_MASK) { > > + page_size = 1024 * 1024 * 1024; > > + pte = pdpe & ~( (page_size - 1) & ~0xfff); > > + pte &= ~(PG_NX_MASK | PG_HI_USER_MASK); > > + goto out; > > + } > > Does this also apply if we are not in long mode? > > I'll check this more carefully later. > > Jan > > > + > > pde_addr = ((pdpe & ~0xfff & ~(PG_NX_MASK | PG_HI_USER_MASK)) + > > (((addr >> 21) & 0x1ff) << 3)) & env->a20_mask; > > pde = ldq_phys(cs->as, pde_addr); > > @@ -993,6 +1000,7 @@ hwaddr x86_cpu_get_phys_page_debug(CPUState *cs, vaddr addr) > > pte = pte & env->a20_mask; > > } > > > > +out: > > page_offset = (addr & TARGET_PAGE_MASK) & (page_size - 1); > > paddr = (pte & TARGET_PAGE_MASK) + page_offset; > > return paddr; > > >
On Tue, 18 Mar 2014 11:30:42 +0100 Jan Kiszka <jan.kiszka@siemens.com> wrote: > On 2014-03-18 08:36, Paolo Bonzini wrote: > > Il 18/03/2014 08:19, Jan Kiszka ha scritto: > >> On 2014-03-18 02:54, Luiz Capitulino wrote: > >>> If you start a Linux guest with more than 4GB of memory and try to > >>> look at a > >>> memory address, you will get an error from gdb: > >>> > >>> (gdb) p node_data[0]->node_id > >>> Cannot access memory at address 0xffff88013fffd3a0 > >>> (gdb) > >> > >> I suppose this is x86-64, not 32-bit with PTE, right? > >> > >>> > >>> I debugged this down to x86_cpu_get_phys_page_debug(), it doesn't > >>> handle the > >>> case where the PDPTE has the PS bit set (although I didn't check > >>> where Linux > >>> sets that bit). This commit adds the PS bit handling, which fixes the > >>> problem > >>> for me. > >>> > >>> Signed-off-by: Luiz capitulino <lcapitulino@redhat.com> > >>> --- > >>> > >>> Two observations: > >>> > >>> 1. This bug has always existed, so it's not a regression, so I'm not > >>> sure > >>> it's worth it to fix for 2.0 > > > > Sure, why not? > > > >>> 2. I'm not familiar with every detail of x86_cpu_get_phys_page_debug(), > >>> so I'm not completely sure this is the right thing to do > >>> > >>> target-i386/helper.c | 8 ++++++++ > >>> 1 file changed, 8 insertions(+) > >>> > >>> diff --git a/target-i386/helper.c b/target-i386/helper.c > >>> index 4f447b8..9b7803f 100644 > >>> --- a/target-i386/helper.c > >>> +++ b/target-i386/helper.c > >>> @@ -951,6 +951,13 @@ hwaddr x86_cpu_get_phys_page_debug(CPUState *cs, > >>> vaddr addr) > >>> return -1; > >>> } > >>> > >>> + if (pdpe & PG_PSE_MASK) { > >>> + page_size = 1024 * 1024 * 1024; > >>> + pte = pdpe & ~( (page_size - 1) & ~0xfff); > >>> + pte &= ~(PG_NX_MASK | PG_HI_USER_MASK); > >>> + goto out; > >>> + } > >> > >> Does this also apply if we are not in long mode? > > > > No, it doesn't. The only valid bits in a PAE PDPTE are P, PWT and PCD. > > Bit 7 (PS) is reserved. > > Right, this belongs in the "if (env->hflags & HF_LMA_MASK)" block. > > And the subject or description should mention that > x86_cpu_get_phys_page_debug was lacking support for 1G hugepages. To be honest, although the PS bit is set and that indicates a 1GB page, I didn't know Linux does that. I thought Linux would use 4KB pages for everything unless it's explicitly asked to use bigger pages. Also, note that I was using gdb to debug really early kernel boot code (start_kernel()). I'd feel more confident to have such a changelog after I find out where exactly Linux sets that bit, but I won't have time in the next days. On the other hand, the patch does fix the problem to me.
On 2014-03-18 14:00, Luiz Capitulino wrote: > On Tue, 18 Mar 2014 11:30:42 +0100 > Jan Kiszka <jan.kiszka@siemens.com> wrote: > >> On 2014-03-18 08:36, Paolo Bonzini wrote: >>> Il 18/03/2014 08:19, Jan Kiszka ha scritto: >>>> On 2014-03-18 02:54, Luiz Capitulino wrote: >>>>> If you start a Linux guest with more than 4GB of memory and try to >>>>> look at a >>>>> memory address, you will get an error from gdb: >>>>> >>>>> (gdb) p node_data[0]->node_id >>>>> Cannot access memory at address 0xffff88013fffd3a0 >>>>> (gdb) >>>> >>>> I suppose this is x86-64, not 32-bit with PTE, right? >>>> >>>>> >>>>> I debugged this down to x86_cpu_get_phys_page_debug(), it doesn't >>>>> handle the >>>>> case where the PDPTE has the PS bit set (although I didn't check >>>>> where Linux >>>>> sets that bit). This commit adds the PS bit handling, which fixes the >>>>> problem >>>>> for me. >>>>> >>>>> Signed-off-by: Luiz capitulino <lcapitulino@redhat.com> >>>>> --- >>>>> >>>>> Two observations: >>>>> >>>>> 1. This bug has always existed, so it's not a regression, so I'm not >>>>> sure >>>>> it's worth it to fix for 2.0 >>> >>> Sure, why not? >>> >>>>> 2. I'm not familiar with every detail of x86_cpu_get_phys_page_debug(), >>>>> so I'm not completely sure this is the right thing to do >>>>> >>>>> target-i386/helper.c | 8 ++++++++ >>>>> 1 file changed, 8 insertions(+) >>>>> >>>>> diff --git a/target-i386/helper.c b/target-i386/helper.c >>>>> index 4f447b8..9b7803f 100644 >>>>> --- a/target-i386/helper.c >>>>> +++ b/target-i386/helper.c >>>>> @@ -951,6 +951,13 @@ hwaddr x86_cpu_get_phys_page_debug(CPUState *cs, >>>>> vaddr addr) >>>>> return -1; >>>>> } >>>>> >>>>> + if (pdpe & PG_PSE_MASK) { >>>>> + page_size = 1024 * 1024 * 1024; >>>>> + pte = pdpe & ~( (page_size - 1) & ~0xfff); >>>>> + pte &= ~(PG_NX_MASK | PG_HI_USER_MASK); >>>>> + goto out; >>>>> + } >>>> >>>> Does this also apply if we are not in long mode? >>> >>> No, it doesn't. The only valid bits in a PAE PDPTE are P, PWT and PCD. >>> Bit 7 (PS) is reserved. >> >> Right, this belongs in the "if (env->hflags & HF_LMA_MASK)" block. >> >> And the subject or description should mention that >> x86_cpu_get_phys_page_debug was lacking support for 1G hugepages. > > To be honest, although the PS bit is set and that indicates a 1GB page, > I didn't know Linux does that. I thought Linux would use 4KB pages for > everything unless it's explicitly asked to use bigger pages. Also, note that > I was using gdb to debug really early kernel boot code (start_kernel()). I could imagine that Linux initially creates a giant identity mapping page table for the startup process and only later on switches to fine-grained tables of 4K and 2M pages. Giant pages still require hughtlbfs, IIRC. > > I'd feel more confident to have such a changelog after I find out where > exactly Linux sets that bit, but I won't have time in the next days. On the > other hand, the patch does fix the problem to me. Don't worry about Linux (the code should work with any OS anyway), just believe your reviewers. ;) Alternatively, check Intel IA32 SDM on page table structures. Jan
On Tue, 18 Mar 2014 15:36:45 +0100 Jan Kiszka <jan.kiszka@siemens.com> wrote: > >> Right, this belongs in the "if (env->hflags & HF_LMA_MASK)" block. > >> > >> And the subject or description should mention that > >> x86_cpu_get_phys_page_debug was lacking support for 1G hugepages. > > > > To be honest, although the PS bit is set and that indicates a 1GB page, > > I didn't know Linux does that. I thought Linux would use 4KB pages for > > everything unless it's explicitly asked to use bigger pages. Also, note that > > I was using gdb to debug really early kernel boot code (start_kernel()). > > I could imagine that Linux initially creates a giant identity mapping > page table for the startup process and only later on switches to > fine-grained tables of 4K and 2M pages. Giant pages still require > hughtlbfs, IIRC. > > > > > I'd feel more confident to have such a changelog after I find out where > > exactly Linux sets that bit, but I won't have time in the next days. On the > > other hand, the patch does fix the problem to me. > > Don't worry about Linux (the code should work with any OS anyway), just > believe your reviewers. ;) Alternatively, check Intel IA32 SDM on page > table structures. OK, so you want me to change the subject? Anything else for v2?
Il 18/03/2014 17:23, Luiz Capitulino ha scritto: > On Tue, 18 Mar 2014 15:36:45 +0100 > Jan Kiszka <jan.kiszka@siemens.com> wrote: > >>>> Right, this belongs in the "if (env->hflags & HF_LMA_MASK)" block. >>>> >>>> And the subject or description should mention that >>>> x86_cpu_get_phys_page_debug was lacking support for 1G hugepages. >>> >>> To be honest, although the PS bit is set and that indicates a 1GB page, >>> I didn't know Linux does that. I thought Linux would use 4KB pages for >>> everything unless it's explicitly asked to use bigger pages. Also, note that >>> I was using gdb to debug really early kernel boot code (start_kernel()). >> >> I could imagine that Linux initially creates a giant identity mapping >> page table for the startup process and only later on switches to >> fine-grained tables of 4K and 2M pages. Giant pages still require >> hughtlbfs, IIRC. >> >>> >>> I'd feel more confident to have such a changelog after I find out where >>> exactly Linux sets that bit, but I won't have time in the next days. On the >>> other hand, the patch does fix the problem to me. >> >> Don't worry about Linux (the code should work with any OS anyway), just >> believe your reviewers. ;) Alternatively, check Intel IA32 SDM on page >> table structures. > > OK, so you want me to change the subject? Anything else for v2? You only need to move the new code into the "if (env->hflags & HF_LMA_MASK)", I think. The subject is ok. Paolo
On 2014-03-18 17:37, Paolo Bonzini wrote: > Il 18/03/2014 17:23, Luiz Capitulino ha scritto: >> On Tue, 18 Mar 2014 15:36:45 +0100 >> Jan Kiszka <jan.kiszka@siemens.com> wrote: >> >>>>> Right, this belongs in the "if (env->hflags & HF_LMA_MASK)" block. >>>>> >>>>> And the subject or description should mention that >>>>> x86_cpu_get_phys_page_debug was lacking support for 1G hugepages. >>>> >>>> To be honest, although the PS bit is set and that indicates a 1GB page, >>>> I didn't know Linux does that. I thought Linux would use 4KB pages for >>>> everything unless it's explicitly asked to use bigger pages. Also, >>>> note that >>>> I was using gdb to debug really early kernel boot code >>>> (start_kernel()). >>> >>> I could imagine that Linux initially creates a giant identity mapping >>> page table for the startup process and only later on switches to >>> fine-grained tables of 4K and 2M pages. Giant pages still require >>> hughtlbfs, IIRC. >>> >>>> >>>> I'd feel more confident to have such a changelog after I find out where >>>> exactly Linux sets that bit, but I won't have time in the next days. >>>> On the >>>> other hand, the patch does fix the problem to me. >>> >>> Don't worry about Linux (the code should work with any OS anyway), just >>> believe your reviewers. ;) Alternatively, check Intel IA32 SDM on page >>> table structures. >> >> OK, so you want me to change the subject? Anything else for v2? > > You only need to move the new code into the "if (env->hflags & > HF_LMA_MASK)", I think. The subject is ok. Yes. Subject is fine, a reference to GB pages in the description would be nice-to-have. Jan
diff --git a/target-i386/helper.c b/target-i386/helper.c index 4f447b8..9b7803f 100644 --- a/target-i386/helper.c +++ b/target-i386/helper.c @@ -951,6 +951,13 @@ hwaddr x86_cpu_get_phys_page_debug(CPUState *cs, vaddr addr) return -1; } + if (pdpe & PG_PSE_MASK) { + page_size = 1024 * 1024 * 1024; + pte = pdpe & ~( (page_size - 1) & ~0xfff); + pte &= ~(PG_NX_MASK | PG_HI_USER_MASK); + goto out; + } + pde_addr = ((pdpe & ~0xfff & ~(PG_NX_MASK | PG_HI_USER_MASK)) + (((addr >> 21) & 0x1ff) << 3)) & env->a20_mask; pde = ldq_phys(cs->as, pde_addr); @@ -993,6 +1000,7 @@ hwaddr x86_cpu_get_phys_page_debug(CPUState *cs, vaddr addr) pte = pte & env->a20_mask; } +out: page_offset = (addr & TARGET_PAGE_MASK) & (page_size - 1); paddr = (pte & TARGET_PAGE_MASK) + page_offset; return paddr;
If you start a Linux guest with more than 4GB of memory and try to look at a memory address, you will get an error from gdb: (gdb) p node_data[0]->node_id Cannot access memory at address 0xffff88013fffd3a0 (gdb) I debugged this down to x86_cpu_get_phys_page_debug(), it doesn't handle the case where the PDPTE has the PS bit set (although I didn't check where Linux sets that bit). This commit adds the PS bit handling, which fixes the problem for me. Signed-off-by: Luiz capitulino <lcapitulino@redhat.com> --- Two observations: 1. This bug has always existed, so it's not a regression, so I'm not sure it's worth it to fix for 2.0 2. I'm not familiar with every detail of x86_cpu_get_phys_page_debug(), so I'm not completely sure this is the right thing to do target-i386/helper.c | 8 ++++++++ 1 file changed, 8 insertions(+)