Message ID | 20170302202338.ci6wwb3yzjmdy4n2@wfg-t540p.sh.intel.com |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
On 03/02/2017 09:23 PM, Fengguang Wu wrote: [...] > I confirm that the below patch provided by Daniel fixes the above > issues on mainline kernel, too. Where should this patch be sent to? If nobody objects, I could send it to -net tree via Dave due to being BPF related, but I don't mind sending it elsewhere too (f.e. Linus directly?) in order to stop your bot from continuing to send such mails. The issue seems only related to i386 and doesn't trigger each time with Fengguang's kernel config and qemu image when I try to reproduce it. set_memory_ro()/set_memory_rw() on i386 seems to work in general, but when it's used/reproduced, from time to time (perhaps some corner-case?) it looks like that memory area can have issues much later on after being fed back to the allocator which then causes a GPF from random locations. Gut feeling, it might be an issue in set_memory_*() that my commit uncovered. Still looking into it, but mean-time I could just send the below, sure. Thanks, Daniel > It'd be very noisy if all these Oops hit the upcoming RC1 kernel. > > Daniel thinks there may be deeper problem in i386 set_memory_rw(). > However that could take much longer time to debug. > > Thanks, > Fengguang > --- > > Re: [bpf] 9d876e79df: BUG: unable to handle kernel paging request at 653a8346 > >> On Tue, Feb 28, 2017 at 04:39:36PM +0100, Daniel Borkmann wrote: > > I have a rough feeling what it is, but I didn't have cycles to work on > it yet (due to travel, sorry about that). The issue is likely shut down > by just doing: > > --- > arch/x86/Kconfig | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > --- linux.orig/arch/x86/Kconfig 2017-03-03 03:44:35.962022996 +0800 > +++ linux/arch/x86/Kconfig 2017-03-03 03:44:35.962022996 +0800 > @@ -54,7 +54,7 @@ config X86 > select ARCH_HAS_KCOV if X86_64 > select ARCH_HAS_MMIO_FLUSH > select ARCH_HAS_PMEM_API if X86_64 > - select ARCH_HAS_SET_MEMORY > + select ARCH_HAS_SET_MEMORY if X86_64 > select ARCH_HAS_SG_CHAIN > select ARCH_HAS_STRICT_KERNEL_RWX > select ARCH_HAS_STRICT_MODULE_RWX
Adding x86 people too, since this seems to be something off about ARCH_HAS_SET_MEMORY for x86-32. The code seems to be shared between x86-32 and 64, I'm not seeing why set_memory_r[ow]() should fail on one but not the other. Considering that it seems to be flaky even on 32-bit, maybe it's timing-related, or possibly related to TLB sizes or whatever (ie more likely hidden by a larger TLB on more modern hardware?) Anyway, just looking at change_page_attr_set_clr(), I notice that the page alias checking treats NX specially: /* No alias checking for _NX bit modifications */ checkalias = (pgprot_val(mask_set) | pgprot_val(mask_clr)) != _PAGE_NX; which seems insane. Why would NX be different from other protection bits (like _PAGE_RW)? But that doesn't explain why the bpf code would have issues with this all only on x86-32. Maybe somebody else can see why ARCH_HAS_SET_MEMORY would depend on 64-bit only.. Linus On Thu, Mar 2, 2017 at 12:40 PM, Daniel Borkmann <daniel@iogearbox.net> wrote: > On 03/02/2017 09:23 PM, Fengguang Wu wrote: > [...] >> >> I confirm that the below patch provided by Daniel fixes the above >> issues on mainline kernel, too. Where should this patch be sent to? > > > If nobody objects, I could send it to -net tree via Dave due to being > BPF related, but I don't mind sending it elsewhere too (f.e. Linus > directly?) in order to stop your bot from continuing to send such mails. > > The issue seems only related to i386 and doesn't trigger each time with > Fengguang's kernel config and qemu image when I try to reproduce it. > set_memory_ro()/set_memory_rw() on i386 seems to work in general, but > when it's used/reproduced, from time to time (perhaps some corner-case?) > it looks like that memory area can have issues much later on after being > fed back to the allocator which then causes a GPF from random locations. > Gut feeling, it might be an issue in set_memory_*() that my commit > uncovered. Still looking into it, but mean-time I could just send the > below, sure. > > Thanks, > Daniel > > >> It'd be very noisy if all these Oops hit the upcoming RC1 kernel. >> >> Daniel thinks there may be deeper problem in i386 set_memory_rw(). >> However that could take much longer time to debug. >> >> Thanks, >> Fengguang >> --- >> >> Re: [bpf] 9d876e79df: BUG: unable to handle kernel paging request at >> 653a8346 >> >>> On Tue, Feb 28, 2017 at 04:39:36PM +0100, Daniel Borkmann wrote: >> >> >> I have a rough feeling what it is, but I didn't have cycles to work on >> it yet (due to travel, sorry about that). The issue is likely shut down >> by just doing: >> >> --- >> arch/x86/Kconfig | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> --- linux.orig/arch/x86/Kconfig 2017-03-03 03:44:35.962022996 +0800 >> +++ linux/arch/x86/Kconfig 2017-03-03 03:44:35.962022996 +0800 >> @@ -54,7 +54,7 @@ config X86 >> select ARCH_HAS_KCOV if X86_64 >> select ARCH_HAS_MMIO_FLUSH >> select ARCH_HAS_PMEM_API if X86_64 >> - select ARCH_HAS_SET_MEMORY >> + select ARCH_HAS_SET_MEMORY if X86_64 >> select ARCH_HAS_SG_CHAIN >> select ARCH_HAS_STRICT_KERNEL_RWX >> select ARCH_HAS_STRICT_MODULE_RWX > >
On Wed, 8 Mar 2017, Linus Torvalds wrote: > Adding x86 people too, since this seems to be something off about > ARCH_HAS_SET_MEMORY for x86-32. > > The code seems to be shared between x86-32 and 64, I'm not seeing why > set_memory_r[ow]() should fail on one but not the other. Indeed. > Considering that it seems to be flaky even on 32-bit, maybe it's > timing-related, or possibly related to TLB sizes or whatever (ie more > likely hidden by a larger TLB on more modern hardware?) The only difference I can see is the way how __tlb_flush_all() is happening. We have 3 variants: invpcid_flush_all() - depends on X86_FEATURE_INVPCID and X86_FEATURE_PGE cr4 based flush - depends on X86_FEATURE_PGE cr3 based flush No idea which variant is used in that failure case. > Anyway, just looking at change_page_attr_set_clr(), I notice that the > page alias checking treats NX specially: > > /* No alias checking for _NX bit modifications */ > checkalias = (pgprot_val(mask_set) | pgprot_val(mask_clr)) != _PAGE_NX; > > which seems insane. Why would NX be different from other protection > bits (like _PAGE_RW)? The reason is that the alias mapping should never be executable at all. Thanks, tglx
--- linux.orig/arch/x86/Kconfig 2017-03-03 03:44:35.962022996 +0800 +++ linux/arch/x86/Kconfig 2017-03-03 03:44:35.962022996 +0800 @@ -54,7 +54,7 @@ config X86 select ARCH_HAS_KCOV if X86_64 select ARCH_HAS_MMIO_FLUSH select ARCH_HAS_PMEM_API if X86_64 - select ARCH_HAS_SET_MEMORY + select ARCH_HAS_SET_MEMORY if X86_64 select ARCH_HAS_SG_CHAIN select ARCH_HAS_STRICT_KERNEL_RWX select ARCH_HAS_STRICT_MODULE_RWX