| Message ID | 20170525033650.10891-2-bsingharora@gmail.com (mailing list archive) |
|---|---|
| State | Superseded, archived |
Hi Balbir,

[auto build test ERROR on powerpc/next]
[also build test ERROR on v4.12-rc2 next-20170525]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:      https://github.com/0day-ci/linux/commits/Balbir-Singh/Enable-STRICT_KERNEL_RWX/20170525-150234
base:     https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config:   powerpc-allnoconfig (attached as .config)
compiler: powerpc-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=powerpc

All errors (new ones prefixed by >>):

   arch/powerpc/lib/code-patching.c: In function 'kernel_map_addr':
>> arch/powerpc/lib/code-patching.c:57:8: error: implicit declaration of function 'map_kernel_page' [-Werror=implicit-function-declaration]
     err = map_kernel_page((unsigned long)text_poke_area->addr,
           ^~~~~~~~~~~~~~~
   cc1: all warnings being treated as errors

vim +/map_kernel_page +57 arch/powerpc/lib/code-patching.c

    51
    52		if (is_vmalloc_addr(addr))
    53			pfn = vmalloc_to_pfn(addr);
    54		else
    55			pfn = __pa_symbol(addr) >> PAGE_SHIFT;
    56
  > 57		err = map_kernel_page((unsigned long)text_poke_area->addr,
    58			(pfn << PAGE_SHIFT), _PAGE_KERNEL_RW | _PAGE_PRESENT);
    59		pr_devel("Mapped addr %p with pfn %lx\n", text_poke_area->addr, pfn);
    60		if (err)

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
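The failing config is a 32-bit build (powerpc-allnoconfig), where map_kernel_page() was not declared; the 32-bit tree of that era exposed the equivalent helper as map_page(). A minimal sketch of a compatibility shim that would let code-patching.c build on both — the #ifndef guard and the map_page() prototype are assumptions, not part of the posted patch:

```c
/*
 * Hypothetical shim for arch/powerpc/lib/code-patching.c, not part of
 * the posted patch: on 32-bit powerpc the page-mapping helper was
 * still named map_page(), so alias it to the 64-bit name.  The
 * map_page() signature is assumed from the 32-bit tree of the era.
 */
#ifndef CONFIG_PPC64
extern int map_page(unsigned long va, phys_addr_t pa, int flags);
#define map_kernel_page(va, pa, flags)	map_page((va), (pa), (flags))
#endif
```

Alternatively, the declaration could be unified in the headers so both word sizes expose map_kernel_page().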
On 25/05/2017 at 05:36, Balbir Singh wrote:
> Today our patching happens via direct copy and
> patch_instruction. The patching code is well
> contained in the sense that copying bits are limited.
>
> While considering implementation of CONFIG_STRICT_RWX,
> the first requirement is to create another mapping
> that will allow for patching. We create the window using
> text_poke_area, allocated via get_vm_area(), which might
> be overkill. We can do per-cpu stuff as well. The
> downside of these patches is that patch_instruction is
> now synchronized using a lock. Other arches do similar
> things, but use fixmaps. The reason for not using
> fixmaps is to make use of any randomization in the
> future. The code also relies on set_pte_at and pte_clear
> to do the appropriate TLB flushing.

Isn't it overkill to remap the text in another area?

Among the 6 arches implementing CONFIG_STRICT_KERNEL_RWX (arm, arm64, parisc, s390, x86/32, x86/64):
- arm, x86/32 and x86/64 set text RW during the modification
- s390 seems to use a special instruction which bypasses write protection
- parisc doesn't seem to implement any function which modifies kernel text.

Therefore it seems only arm64 does it via another mapping.

Wouldn't it be lighter to just unprotect the memory during the modification, as done on arm and x86? Or another alternative could be to disable the DMMU and do the write at the physical address?

Christophe

> Signed-off-by: Balbir Singh <bsingharora@gmail.com>
> ---
>  arch/powerpc/lib/code-patching.c | 88 ++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 84 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
> index 500b0f6..0a16b2f 100644
> --- a/arch/powerpc/lib/code-patching.c
> +++ b/arch/powerpc/lib/code-patching.c
> @@ -16,19 +16,98 @@
>  #include <asm/code-patching.h>
>  #include <linux/uaccess.h>
>  #include <linux/kprobes.h>
> +#include <asm/pgtable.h>
> +#include <asm/tlbflush.h>
>
> +struct vm_struct *text_poke_area;
> +static DEFINE_RAW_SPINLOCK(text_poke_lock);
>
> -int patch_instruction(unsigned int *addr, unsigned int instr)
> +/*
> + * This is an early_initcall and early_initcalls happen at the right time
> + * for us, after slab is enabled and before we mark ro pages R/O. In the
> + * future if get_vm_area is randomized, this will be more flexible than
> + * fixmap
> + */
> +static int __init setup_text_poke_area(void)
>  {
> +	text_poke_area = get_vm_area(PAGE_SIZE, VM_ALLOC);
> +	if (!text_poke_area) {
> +		WARN_ONCE(1, "could not create area for mapping kernel addrs"
> +				" which allow for patching kernel code\n");
> +		return 0;
> +	}
> +	pr_info("text_poke area ready...\n");
> +	raw_spin_lock_init(&text_poke_lock);
> +	return 0;
> +}
> +
> +/*
> + * This can be called for kernel text or a module.
> + */
> +static int kernel_map_addr(void *addr)
> +{
> +	unsigned long pfn;
>  	int err;
>
> -	__put_user_size(instr, addr, 4, err);
> +	if (is_vmalloc_addr(addr))
> +		pfn = vmalloc_to_pfn(addr);
> +	else
> +		pfn = __pa_symbol(addr) >> PAGE_SHIFT;
> +
> +	err = map_kernel_page((unsigned long)text_poke_area->addr,
> +			(pfn << PAGE_SHIFT), _PAGE_KERNEL_RW | _PAGE_PRESENT);
> +	pr_devel("Mapped addr %p with pfn %lx\n", text_poke_area->addr, pfn);
>  	if (err)
> -		return err;
> -	asm ("dcbst 0, %0; sync; icbi 0,%0; sync; isync" : : "r" (addr));
> +		return -1;
>  	return 0;
>  }
>
> +static inline void kernel_unmap_addr(void *addr)
> +{
> +	pte_t *pte;
> +	unsigned long kaddr = (unsigned long)addr;
> +
> +	pte = pte_offset_kernel(pmd_offset(pud_offset(pgd_offset_k(kaddr),
> +				kaddr), kaddr), kaddr);
> +	pr_devel("clearing mm %p, pte %p, kaddr %lx\n", &init_mm, pte, kaddr);
> +	pte_clear(&init_mm, kaddr, pte);
> +}
> +
> +int patch_instruction(unsigned int *addr, unsigned int instr)
> +{
> +	int err;
> +	unsigned int *dest = NULL;
> +	unsigned long flags;
> +	unsigned long kaddr = (unsigned long)addr;
> +
> +	/*
> +	 * During early early boot patch_instruction is called
> +	 * when text_poke_area is not ready, but we still need
> +	 * to allow patching. We just do the plain old patching
> +	 */
> +	if (!text_poke_area) {
> +		__put_user_size(instr, addr, 4, err);
> +		asm ("dcbst 0, %0; sync; icbi 0,%0; sync; isync" :: "r" (addr));
> +		return 0;
> +	}
> +
> +	raw_spin_lock_irqsave(&text_poke_lock, flags);
> +	if (kernel_map_addr(addr)) {
> +		err = -1;
> +		goto out;
> +	}
> +
> +	dest = (unsigned int *)(text_poke_area->addr) +
> +		((kaddr & ~PAGE_MASK) / sizeof(unsigned int));
> +	__put_user_size(instr, dest, 4, err);
> +	asm ("dcbst 0, %0; sync; icbi 0,%0; sync; isync" :: "r" (dest));
> +	kernel_unmap_addr(text_poke_area->addr);
> +out:
> +	raw_spin_unlock_irqrestore(&text_poke_lock, flags);
> +	return err;
> +}
> +NOKPROBE_SYMBOL(patch_instruction);
> +
>  int patch_branch(unsigned int *addr, unsigned long target, int flags)
>  {
>  	return patch_instruction(addr, create_branch(addr, target, flags));
> @@ -514,3 +593,4 @@ static int __init test_code_patching(void)
>  late_initcall(test_code_patching);
>
>  #endif /* CONFIG_CODE_PATCHING_SELFTEST */
> +early_initcall(setup_text_poke_area);
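For comparison, a minimal sketch of the in-place alternative Christophe describes: make the existing text mapping writable around the store instead of creating a second mapping. set_memory_rw()/set_memory_ro() are assumed helpers with the x86-style (address, page count) signature — powerpc did not provide them in this thread's timeframe — so this shows the shape of the approach, not working powerpc code:

```c
/*
 * Sketch of the arm/x86-style approach: temporarily drop write
 * protection on the page being patched, store, then re-protect.
 * set_memory_rw()/set_memory_ro() are assumed to exist here.
 */
static int patch_instruction_in_place(unsigned int *addr, unsigned int instr)
{
	unsigned long page = (unsigned long)addr & PAGE_MASK;
	int err;

	set_memory_rw(page, 1);			/* unprotect one page */
	__put_user_size(instr, addr, 4, err);	/* 4-byte store, as in the patch */
	/* flush d-cache and invalidate i-cache for the patched word */
	asm ("dcbst 0, %0; sync; icbi 0,%0; sync; isync" :: "r" (addr));
	set_memory_ro(page, 1);			/* restore protection */
	return err;
}
```

The trade-off Balbir raises later in the thread is that this leaves the real text mapping writable for the duration of the patch, however briefly.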
On 25/05/2017 at 05:36, Balbir Singh wrote:
> Today our patching happens via direct copy and
> patch_instruction. The patching code is well
> contained in the sense that copying bits are limited.
>
> While considering implementation of CONFIG_STRICT_RWX,
> the first requirement is to create another mapping
> that will allow for patching. We create the window using
> text_poke_area, allocated via get_vm_area(), which might
> be overkill. We can do per-cpu stuff as well. The
> downside of these patches is that patch_instruction is
> now synchronized using a lock. Other arches do similar
> things, but use fixmaps. The reason for not using
> fixmaps is to make use of any randomization in the
> future. The code also relies on set_pte_at and pte_clear
> to do the appropriate TLB flushing.
>
> Signed-off-by: Balbir Singh <bsingharora@gmail.com>
> ---
>  arch/powerpc/lib/code-patching.c | 88 ++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 84 insertions(+), 4 deletions(-)
>
> [...]
>
> +static int kernel_map_addr(void *addr)
> +{
> +	unsigned long pfn;
>  	int err;
>
> -	__put_user_size(instr, addr, 4, err);
> +	if (is_vmalloc_addr(addr))
> +		pfn = vmalloc_to_pfn(addr);
> +	else
> +		pfn = __pa_symbol(addr) >> PAGE_SHIFT;
> +
> +	err = map_kernel_page((unsigned long)text_poke_area->addr,
> +			(pfn << PAGE_SHIFT), _PAGE_KERNEL_RW | _PAGE_PRESENT);

Why not use PAGE_KERNEL instead of _PAGE_KERNEL_RW | _PAGE_PRESENT?

From asm/pte-common.h:

#define PAGE_KERNEL	__pgprot(_PAGE_BASE | _PAGE_KERNEL_RW)
#define _PAGE_BASE	(_PAGE_BASE_NC)
#define _PAGE_BASE_NC	(_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_PSIZE)

Also, in pte-common.h, maybe the following defines could/should be reworked once your series is applied, shouldn't they?

/* Protection used for kernel text. We want the debuggers to be able to
 * set breakpoints anywhere, so don't write protect the kernel text
 * on platforms where such control is possible.
 */
#if defined(CONFIG_KGDB) || defined(CONFIG_XMON) || defined(CONFIG_BDI_SWITCH) ||\
	defined(CONFIG_KPROBES) || defined(CONFIG_DYNAMIC_FTRACE)
#define PAGE_KERNEL_TEXT	PAGE_KERNEL_X
#else
#define PAGE_KERNEL_TEXT	PAGE_KERNEL_ROX
#endif

Christophe
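Applied to the patch, the suggestion would reduce kernel_map_addr() to something like the sketch below. The pgprot_val() unwrapping is an assumption about how map_kernel_page() wants its flags argument on this tree; the rest is unchanged from the posted code:

```c
static int kernel_map_addr(void *addr)
{
	unsigned long pfn;
	int err;

	if (is_vmalloc_addr(addr))
		pfn = vmalloc_to_pfn(addr);
	else
		pfn = __pa_symbol(addr) >> PAGE_SHIFT;

	/* PAGE_KERNEL already carries _PAGE_PRESENT via _PAGE_BASE */
	err = map_kernel_page((unsigned long)text_poke_area->addr,
			(pfn << PAGE_SHIFT), pgprot_val(PAGE_KERNEL));
	pr_devel("Mapped addr %p with pfn %lx\n", text_poke_area->addr, pfn);
	return err ? -1 : 0;
}
```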
On Sun, 2017-05-28 at 20:00 +0200, christophe leroy wrote:
>
> On 25/05/2017 at 05:36, Balbir Singh wrote:
> > Today our patching happens via direct copy and
> > patch_instruction. The patching code is well
> > contained in the sense that copying bits are limited.
> >
> > While considering implementation of CONFIG_STRICT_RWX,
> > the first requirement is to create another mapping
> > that will allow for patching. We create the window using
> > text_poke_area, allocated via get_vm_area(), which might
> > be overkill. We can do per-cpu stuff as well. The
> > downside of these patches is that patch_instruction is
> > now synchronized using a lock. Other arches do similar
> > things, but use fixmaps. The reason for not using
> > fixmaps is to make use of any randomization in the
> > future. The code also relies on set_pte_at and pte_clear
> > to do the appropriate TLB flushing.
> >
> > Signed-off-by: Balbir Singh <bsingharora@gmail.com>
> > ---
> >  arch/powerpc/lib/code-patching.c | 88 ++++++++++++++++++++++++++++++++++++++--
> >  1 file changed, 84 insertions(+), 4 deletions(-)
> >
>
> [...]
>
> > +static int kernel_map_addr(void *addr)
> > +{
> > +	unsigned long pfn;
> >  	int err;
> >
> > -	__put_user_size(instr, addr, 4, err);
> > +	if (is_vmalloc_addr(addr))
> > +		pfn = vmalloc_to_pfn(addr);
> > +	else
> > +		pfn = __pa_symbol(addr) >> PAGE_SHIFT;
> > +
> > +	err = map_kernel_page((unsigned long)text_poke_area->addr,
> > +			(pfn << PAGE_SHIFT), _PAGE_KERNEL_RW | _PAGE_PRESENT);
>
> Why not use PAGE_KERNEL instead of _PAGE_KERNEL_RW | _PAGE_PRESENT?

Will do.

> From asm/pte-common.h:
>
> #define PAGE_KERNEL	__pgprot(_PAGE_BASE | _PAGE_KERNEL_RW)
> #define _PAGE_BASE	(_PAGE_BASE_NC)
> #define _PAGE_BASE_NC	(_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_PSIZE)
>
> Also, in pte-common.h, maybe the following defines could/should be
> reworked once your series is applied, shouldn't they?
>
> /* Protection used for kernel text. We want the debuggers to be able to
>  * set breakpoints anywhere, so don't write protect the kernel text
>  * on platforms where such control is possible.
>  */
> #if defined(CONFIG_KGDB) || defined(CONFIG_XMON) || defined(CONFIG_BDI_SWITCH) ||\
> 	defined(CONFIG_KPROBES) || defined(CONFIG_DYNAMIC_FTRACE)
> #define PAGE_KERNEL_TEXT	PAGE_KERNEL_X
> #else
> #define PAGE_KERNEL_TEXT	PAGE_KERNEL_ROX
> #endif

Yes, I did see them and I want to rework them.

Thanks,
Balbir Singh.
On Sun, 2017-05-28 at 17:59 +0200, christophe leroy wrote:
>
> On 25/05/2017 at 05:36, Balbir Singh wrote:
> > Today our patching happens via direct copy and
> > patch_instruction. The patching code is well
> > contained in the sense that copying bits are limited.
> >
> > While considering implementation of CONFIG_STRICT_RWX,
> > the first requirement is to create another mapping
> > that will allow for patching. We create the window using
> > text_poke_area, allocated via get_vm_area(), which might
> > be overkill. We can do per-cpu stuff as well. The
> > downside of these patches is that patch_instruction is
> > now synchronized using a lock. Other arches do similar
> > things, but use fixmaps. The reason for not using
> > fixmaps is to make use of any randomization in the
> > future. The code also relies on set_pte_at and pte_clear
> > to do the appropriate TLB flushing.
>
> Isn't it overkill to remap the text in another area?
>
> Among the 6 arches implementing CONFIG_STRICT_KERNEL_RWX (arm, arm64,
> parisc, s390, x86/32, x86/64):
> - arm, x86/32 and x86/64 set text RW during the modification

x86 uses set_fixmap() in text_poke(), am I missing something?

> - s390 seems to use a special instruction which bypasses write protection
> - parisc doesn't seem to implement any function which modifies kernel text.
>
> Therefore it seems only arm64 does it via another mapping.
> Wouldn't it be lighter to just unprotect the memory during the modification,
> as done on arm and x86?

I am not sure the trade-off is quite that simple. For security, I thought:

1. It would be better to randomize text_poke_area, which is why I
   dynamically allocated it. If we start randomizing get_vm_area(), we get
   the benefit.
2. text_poke_area is RW and the normal text is RX; for any attack to
   succeed, it would need to find text_poke_area at the time of patching,
   patch the kernel in that small window, and use the normal mapping for
   execution.

Generally patch_instruction() is not a fast path, except for ftrace and
tracing. In my tests I did not find the slowdown noticeable.

> Or another alternative could be to disable the DMMU and do the write at
> the physical address?

That would be worse, I think, but we were discussing doing something like
that for xmon. For other cases, I think it opens up a bigger window.

> Christophe

Balbir Singh
On 29/05/2017 at 00:50, Balbir Singh wrote:
> On Sun, 2017-05-28 at 17:59 +0200, christophe leroy wrote:
>>
>> On 25/05/2017 at 05:36, Balbir Singh wrote:
>>> Today our patching happens via direct copy and
>>> patch_instruction. The patching code is well
>>> contained in the sense that copying bits are limited.
>>>
>>> While considering implementation of CONFIG_STRICT_RWX,
>>> the first requirement is to create another mapping
>>> that will allow for patching. We create the window using
>>> text_poke_area, allocated via get_vm_area(), which might
>>> be overkill. We can do per-cpu stuff as well. The
>>> downside of these patches is that patch_instruction is
>>> now synchronized using a lock. Other arches do similar
>>> things, but use fixmaps. The reason for not using
>>> fixmaps is to make use of any randomization in the
>>> future. The code also relies on set_pte_at and pte_clear
>>> to do the appropriate TLB flushing.
>>
>> Isn't it overkill to remap the text in another area?
>>
>> Among the 6 arches implementing CONFIG_STRICT_KERNEL_RWX (arm, arm64,
>> parisc, s390, x86/32, x86/64):
>> - arm, x86/32 and x86/64 set text RW during the modification
>
> x86 uses set_fixmap() in text_poke(), am I missing something?
>

Indeed. I looked at how it is done in ftrace: on x86, text modifications
are done using ftrace_write(), which calls probe_kernel_write(), which
doesn't remap anything. It first calls ftrace_arch_code_modify_prepare(),
which sets the kernel text to RW.

You are right that text_poke() remaps via fixmap. However, it looks like
text_poke() is used only for kgdb and kprobes.

Christophe
```diff
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 500b0f6..0a16b2f 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -16,19 +16,98 @@
 #include <asm/code-patching.h>
 #include <linux/uaccess.h>
 #include <linux/kprobes.h>
+#include <asm/pgtable.h>
+#include <asm/tlbflush.h>
 
+struct vm_struct *text_poke_area;
+static DEFINE_RAW_SPINLOCK(text_poke_lock);
 
-int patch_instruction(unsigned int *addr, unsigned int instr)
+/*
+ * This is an early_initcall and early_initcalls happen at the right time
+ * for us, after slab is enabled and before we mark ro pages R/O. In the
+ * future if get_vm_area is randomized, this will be more flexible than
+ * fixmap
+ */
+static int __init setup_text_poke_area(void)
 {
+	text_poke_area = get_vm_area(PAGE_SIZE, VM_ALLOC);
+	if (!text_poke_area) {
+		WARN_ONCE(1, "could not create area for mapping kernel addrs"
+				" which allow for patching kernel code\n");
+		return 0;
+	}
+	pr_info("text_poke area ready...\n");
+	raw_spin_lock_init(&text_poke_lock);
+	return 0;
+}
+
+/*
+ * This can be called for kernel text or a module.
+ */
+static int kernel_map_addr(void *addr)
+{
+	unsigned long pfn;
 	int err;
 
-	__put_user_size(instr, addr, 4, err);
+	if (is_vmalloc_addr(addr))
+		pfn = vmalloc_to_pfn(addr);
+	else
+		pfn = __pa_symbol(addr) >> PAGE_SHIFT;
+
+	err = map_kernel_page((unsigned long)text_poke_area->addr,
+			(pfn << PAGE_SHIFT), _PAGE_KERNEL_RW | _PAGE_PRESENT);
+	pr_devel("Mapped addr %p with pfn %lx\n", text_poke_area->addr, pfn);
 	if (err)
-		return err;
-	asm ("dcbst 0, %0; sync; icbi 0,%0; sync; isync" : : "r" (addr));
+		return -1;
 	return 0;
 }
 
+static inline void kernel_unmap_addr(void *addr)
+{
+	pte_t *pte;
+	unsigned long kaddr = (unsigned long)addr;
+
+	pte = pte_offset_kernel(pmd_offset(pud_offset(pgd_offset_k(kaddr),
+				kaddr), kaddr), kaddr);
+	pr_devel("clearing mm %p, pte %p, kaddr %lx\n", &init_mm, pte, kaddr);
+	pte_clear(&init_mm, kaddr, pte);
+}
+
+int patch_instruction(unsigned int *addr, unsigned int instr)
+{
+	int err;
+	unsigned int *dest = NULL;
+	unsigned long flags;
+	unsigned long kaddr = (unsigned long)addr;
+
+	/*
+	 * During early early boot patch_instruction is called
+	 * when text_poke_area is not ready, but we still need
+	 * to allow patching. We just do the plain old patching
+	 */
+	if (!text_poke_area) {
+		__put_user_size(instr, addr, 4, err);
+		asm ("dcbst 0, %0; sync; icbi 0,%0; sync; isync" :: "r" (addr));
+		return 0;
+	}
+
+	raw_spin_lock_irqsave(&text_poke_lock, flags);
+	if (kernel_map_addr(addr)) {
+		err = -1;
+		goto out;
+	}
+
+	dest = (unsigned int *)(text_poke_area->addr) +
+		((kaddr & ~PAGE_MASK) / sizeof(unsigned int));
+	__put_user_size(instr, dest, 4, err);
+	asm ("dcbst 0, %0; sync; icbi 0,%0; sync; isync" :: "r" (dest));
+	kernel_unmap_addr(text_poke_area->addr);
+out:
+	raw_spin_unlock_irqrestore(&text_poke_lock, flags);
+	return err;
+}
+NOKPROBE_SYMBOL(patch_instruction);
+
 int patch_branch(unsigned int *addr, unsigned long target, int flags)
 {
 	return patch_instruction(addr, create_branch(addr, target, flags));
@@ -514,3 +593,4 @@ static int __init test_code_patching(void)
 late_initcall(test_code_patching);
 
 #endif /* CONFIG_CODE_PATCHING_SELFTEST */
+early_initcall(setup_text_poke_area);
```
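For orientation, a hypothetical caller of the interface this patch reroutes — modifications of kernel text on powerpc funnel through patch_instruction(), usually via helpers such as patch_branch(). The function and symbol names below are made up for illustration:

```c
#include <asm/code-patching.h>

/*
 * Hypothetical example (not from the thread): retarget a known patch
 * site to a new function, the way ftrace-style users drive this API.
 */
static void example_retarget(unsigned int *site, void *new_target)
{
	/* BRANCH_SET_LINK turns the branch into a 'bl' (branch-and-link) */
	patch_branch(site, (unsigned long)new_target, BRANCH_SET_LINK);
}
```

With the patch applied, the underlying store lands in the writable text_poke_area alias, under text_poke_lock, instead of writing through the read-execute text mapping directly.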
Today our patching happens via direct copy and patch_instruction. The patching code is well contained in the sense that copying bits are limited.

While considering implementation of CONFIG_STRICT_RWX, the first requirement is to create another mapping that will allow for patching. We create the window using text_poke_area, allocated via get_vm_area(), which might be overkill. We can do per-cpu stuff as well. The downside of these patches is that patch_instruction is now synchronized using a lock. Other arches do similar things, but use fixmaps. The reason for not using fixmaps is to make use of any randomization in the future. The code also relies on set_pte_at and pte_clear to do the appropriate TLB flushing.

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
---
 arch/powerpc/lib/code-patching.c | 88 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 84 insertions(+), 4 deletions(-)