
[v1,1/8] powerpc/lib/code-patching: Enhance code patching

Message ID 20170525033650.10891-2-bsingharora@gmail.com (mailing list archive)
State Superseded, archived

Commit Message

Balbir Singh May 25, 2017, 3:36 a.m. UTC
Today our patching happens via direct copy and
patch_instruction. The patching code is well
contained in the sense that copying bits are limited.

While considering the implementation of CONFIG_STRICT_KERNEL_RWX,
the first requirement is to create another mapping
that will allow for patching. We create the window using
text_poke_area, allocated via get_vm_area(), which might
be overkill. We could do per-CPU mappings as well. The
downside of these patches is that patch_instruction is
now synchronized using a lock. Other arches do similar
things, but use fixmaps. The reason for not using
fixmaps is to make use of any randomization in the
future. The code also relies on set_pte_at and pte_clear
to do the appropriate tlb flushing.

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
---
 arch/powerpc/lib/code-patching.c | 88 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 84 insertions(+), 4 deletions(-)

Comments

kernel test robot May 25, 2017, 9:11 a.m. UTC | #1
Hi Balbir,

[auto build test ERROR on powerpc/next]
[also build test ERROR on v4.12-rc2 next-20170525]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Balbir-Singh/Enable-STRICT_KERNEL_RWX/20170525-150234
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-allnoconfig (attached as .config)
compiler: powerpc-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=powerpc 

All errors (new ones prefixed by >>):

   arch/powerpc/lib/code-patching.c: In function 'kernel_map_addr':
>> arch/powerpc/lib/code-patching.c:57:8: error: implicit declaration of function 'map_kernel_page' [-Werror=implicit-function-declaration]
     err = map_kernel_page((unsigned long)text_poke_area->addr,
           ^~~~~~~~~~~~~~~
   cc1: all warnings being treated as errors

vim +/map_kernel_page +57 arch/powerpc/lib/code-patching.c

    51	
    52		if (is_vmalloc_addr(addr))
    53			pfn = vmalloc_to_pfn(addr);
    54		else
    55			pfn = __pa_symbol(addr) >> PAGE_SHIFT;
    56	
  > 57		err = map_kernel_page((unsigned long)text_poke_area->addr,
    58				(pfn << PAGE_SHIFT), _PAGE_KERNEL_RW | _PAGE_PRESENT);
    59		pr_devel("Mapped addr %p with pfn %lx\n", text_poke_area->addr, pfn);
    60		if (err)
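
The failing config is 32-bit, where map_kernel_page() does not yet exist
under that name. A hypothetical compatibility shim (assuming the 32-bit
helper is still called map_page() with the same three arguments and is not
declared in a shared header; an assumption, not part of the posted series)
could look like:

/*
 * Hypothetical 32-bit build fix: forward the new name to the old helper.
 * map_page() and its signature are assumed here; illustrative only.
 */
#ifdef CONFIG_PPC32
int map_page(unsigned long va, phys_addr_t pa, int flags);

static inline int map_kernel_page(unsigned long va, phys_addr_t pa, int flags)
{
	return map_page(va, pa, flags);
}
#endif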

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
Christophe Leroy May 28, 2017, 3:59 p.m. UTC | #2
On 25/05/2017 at 05:36, Balbir Singh wrote:
> Today our patching happens via direct copy and
> patch_instruction. The patching code is well
> contained in the sense that copying bits are limited.
>
> While considering the implementation of CONFIG_STRICT_KERNEL_RWX,
> the first requirement is to create another mapping
> that will allow for patching. We create the window using
> text_poke_area, allocated via get_vm_area(), which might
> be overkill. We could do per-CPU mappings as well. The
> downside of these patches is that patch_instruction is
> now synchronized using a lock. Other arches do similar
> things, but use fixmaps. The reason for not using
> fixmaps is to make use of any randomization in the
> future. The code also relies on set_pte_at and pte_clear
> to do the appropriate tlb flushing.

Isn't it overkill to remap the text in another area?

Among the 6 arches implementing CONFIG_STRICT_KERNEL_RWX (arm, arm64, 
parisc, s390, x86/32, x86/64):
- arm, x86/32 and x86/64 set text RW during the modification
- s390 seems to use a special instruction which bypasses write protection
- parisc doesn't seem to implement any function which modifies kernel text.

Therefore it seems only arm64 does it via another mapping.
Wouldn't it be lighter to just unprotect the memory during the modification,
as done on arm and x86?
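
A minimal sketch of that flow, assuming hypothetical powerpc counterparts of
x86's set_kernel_text_rw()/set_kernel_text_ro() existed (this series does not
provide them):

/*
 * Sketch of the "make text writable around the write" approach used by arm
 * and x86. set_kernel_text_rw()/set_kernel_text_ro() are hypothetical here.
 */
static int patch_instruction_unprotect(unsigned int *addr, unsigned int instr)
{
	int err;

	set_kernel_text_rw();		/* drop write protection on kernel text */
	__put_user_size(instr, addr, 4, err);
	asm ("dcbst 0, %0; sync; icbi 0,%0; sync; isync" : : "r" (addr));
	set_kernel_text_ro();		/* restore the read-only mapping */

	return err;
}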

Or another alternative could be to disable the DMMU and do the write at the
physical address?

Christophe

>
> Signed-off-by: Balbir Singh <bsingharora@gmail.com>
> ---
>  arch/powerpc/lib/code-patching.c | 88 ++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 84 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
> index 500b0f6..0a16b2f 100644
> --- a/arch/powerpc/lib/code-patching.c
> +++ b/arch/powerpc/lib/code-patching.c
> @@ -16,19 +16,98 @@
>  #include <asm/code-patching.h>
>  #include <linux/uaccess.h>
>  #include <linux/kprobes.h>
> +#include <asm/pgtable.h>
> +#include <asm/tlbflush.h>
>
> +struct vm_struct *text_poke_area;
> +static DEFINE_RAW_SPINLOCK(text_poke_lock);
>
> -int patch_instruction(unsigned int *addr, unsigned int instr)
> +/*
> + * This is an early_initcall and early_initcalls happen at the right time
> + * for us, after slab is enabled and before we mark ro pages R/O. In the
> + * future if get_vm_area is randomized, this will be more flexible than
> + * fixmap
> + */
> +static int __init setup_text_poke_area(void)
>  {
> +	text_poke_area = get_vm_area(PAGE_SIZE, VM_ALLOC);
> +	if (!text_poke_area) {
> +		WARN_ONCE(1, "could not create area for mapping kernel addrs"
> +				" which allow for patching kernel code\n");
> +		return 0;
> +	}
> +	pr_info("text_poke area ready...\n");
> +	raw_spin_lock_init(&text_poke_lock);
> +	return 0;
> +}
> +
> +/*
> + * This can be called for kernel text or a module.
> + */
> +static int kernel_map_addr(void *addr)
> +{
> +	unsigned long pfn;
>  	int err;
>
> -	__put_user_size(instr, addr, 4, err);
> +	if (is_vmalloc_addr(addr))
> +		pfn = vmalloc_to_pfn(addr);
> +	else
> +		pfn = __pa_symbol(addr) >> PAGE_SHIFT;
> +
> +	err = map_kernel_page((unsigned long)text_poke_area->addr,
> +			(pfn << PAGE_SHIFT), _PAGE_KERNEL_RW | _PAGE_PRESENT);
> +	pr_devel("Mapped addr %p with pfn %lx\n", text_poke_area->addr, pfn);
>  	if (err)
> -		return err;
> -	asm ("dcbst 0, %0; sync; icbi 0,%0; sync; isync" : : "r" (addr));
> +		return -1;
>  	return 0;
>  }
>
> +static inline void kernel_unmap_addr(void *addr)
> +{
> +	pte_t *pte;
> +	unsigned long kaddr = (unsigned long)addr;
> +
> +	pte = pte_offset_kernel(pmd_offset(pud_offset(pgd_offset_k(kaddr),
> +				kaddr), kaddr), kaddr);
> +	pr_devel("clearing mm %p, pte %p, kaddr %lx\n", &init_mm, pte, kaddr);
> +	pte_clear(&init_mm, kaddr, pte);
> +}
> +
> +int patch_instruction(unsigned int *addr, unsigned int instr)
> +{
> +	int err;
> +	unsigned int *dest = NULL;
> +	unsigned long flags;
> +	unsigned long kaddr = (unsigned long)addr;
> +
> +	/*
> +	 * During early early boot patch_instruction is called
> +	 * when text_poke_area is not ready, but we still need
> +	 * to allow patching. We just do the plain old patching
> +	 */
> +	if (!text_poke_area) {
> +		__put_user_size(instr, addr, 4, err);
> +		asm ("dcbst 0, %0; sync; icbi 0,%0; sync; isync" :: "r" (addr));
> +		return 0;
> +	}
> +
> +	raw_spin_lock_irqsave(&text_poke_lock, flags);
> +	if (kernel_map_addr(addr)) {
> +		err = -1;
> +		goto out;
> +	}
> +
> +	dest = (unsigned int *)(text_poke_area->addr) +
> +		((kaddr & ~PAGE_MASK) / sizeof(unsigned int));
> +	__put_user_size(instr, dest, 4, err);
> +	asm ("dcbst 0, %0; sync; icbi 0,%0; sync; isync" :: "r" (dest));
> +	kernel_unmap_addr(text_poke_area->addr);
> +out:
> +	raw_spin_unlock_irqrestore(&text_poke_lock, flags);
> +	return err;
> +}
> +NOKPROBE_SYMBOL(patch_instruction);
> +
>  int patch_branch(unsigned int *addr, unsigned long target, int flags)
>  {
>  	return patch_instruction(addr, create_branch(addr, target, flags));
> @@ -514,3 +593,4 @@ static int __init test_code_patching(void)
>  late_initcall(test_code_patching);
>
>  #endif /* CONFIG_CODE_PATCHING_SELFTEST */
> +early_initcall(setup_text_poke_area);
>

Christophe Leroy May 28, 2017, 6 p.m. UTC | #3
On 25/05/2017 at 05:36, Balbir Singh wrote:
> Today our patching happens via direct copy and
> patch_instruction. The patching code is well
> contained in the sense that copying bits are limited.
>
> While considering the implementation of CONFIG_STRICT_KERNEL_RWX,
> the first requirement is to create another mapping
> that will allow for patching. We create the window using
> text_poke_area, allocated via get_vm_area(), which might
> be overkill. We could do per-CPU mappings as well. The
> downside of these patches is that patch_instruction is
> now synchronized using a lock. Other arches do similar
> things, but use fixmaps. The reason for not using
> fixmaps is to make use of any randomization in the
> future. The code also relies on set_pte_at and pte_clear
> to do the appropriate tlb flushing.
>
> Signed-off-by: Balbir Singh <bsingharora@gmail.com>
> ---
>  arch/powerpc/lib/code-patching.c | 88 ++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 84 insertions(+), 4 deletions(-)
>

[...]

> +static int kernel_map_addr(void *addr)
> +{
> +	unsigned long pfn;
>  	int err;
>
> -	__put_user_size(instr, addr, 4, err);
> +	if (is_vmalloc_addr(addr))
> +		pfn = vmalloc_to_pfn(addr);
> +	else
> +		pfn = __pa_symbol(addr) >> PAGE_SHIFT;
> +
> +	err = map_kernel_page((unsigned long)text_poke_area->addr,
> +			(pfn << PAGE_SHIFT), _PAGE_KERNEL_RW | _PAGE_PRESENT);



Why not use PAGE_KERNEL instead of _PAGE_KERNEL_RW | _PAGE_PRESENT?

From asm/pte-common.h:

#define PAGE_KERNEL	__pgprot(_PAGE_BASE | _PAGE_KERNEL_RW)
#define _PAGE_BASE	(_PAGE_BASE_NC)
#define _PAGE_BASE_NC	(_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_PSIZE)

Also, in pte-common.h, maybe the following defines could/should be
reworked once your series is applied, shouldn't they?

/* Protection used for kernel text. We want the debuggers to be able to
  * set breakpoints anywhere, so don't write protect the kernel text
  * on platforms where such control is possible.
  */
#if defined(CONFIG_KGDB) || defined(CONFIG_XMON) || defined(CONFIG_BDI_SWITCH) ||\
	defined(CONFIG_KPROBES) || defined(CONFIG_DYNAMIC_FTRACE)
#define PAGE_KERNEL_TEXT	PAGE_KERNEL_X
#else
#define PAGE_KERNEL_TEXT	PAGE_KERNEL_ROX
#endif


Christophe

Balbir Singh May 28, 2017, 10:15 p.m. UTC | #4
On Sun, 2017-05-28 at 20:00 +0200, christophe leroy wrote:
> 
> On 25/05/2017 at 05:36, Balbir Singh wrote:
> > Today our patching happens via direct copy and
> > patch_instruction. The patching code is well
> > contained in the sense that copying bits are limited.
> > 
> > While considering the implementation of CONFIG_STRICT_KERNEL_RWX,
> > the first requirement is to create another mapping
> > that will allow for patching. We create the window using
> > text_poke_area, allocated via get_vm_area(), which might
> > be overkill. We could do per-CPU mappings as well. The
> > downside of these patches is that patch_instruction is
> > now synchronized using a lock. Other arches do similar
> > things, but use fixmaps. The reason for not using
> > fixmaps is to make use of any randomization in the
> > future. The code also relies on set_pte_at and pte_clear
> > to do the appropriate tlb flushing.
> > 
> > Signed-off-by: Balbir Singh <bsingharora@gmail.com>
> > ---
> >  arch/powerpc/lib/code-patching.c | 88 ++++++++++++++++++++++++++++++++++++++--
> >  1 file changed, 84 insertions(+), 4 deletions(-)
> > 
> 
> [...]
> 
> > +static int kernel_map_addr(void *addr)
> > +{
> > +	unsigned long pfn;
> >  	int err;
> > 
> > -	__put_user_size(instr, addr, 4, err);
> > +	if (is_vmalloc_addr(addr))
> > +		pfn = vmalloc_to_pfn(addr);
> > +	else
> > +		pfn = __pa_symbol(addr) >> PAGE_SHIFT;
> > +
> > +	err = map_kernel_page((unsigned long)text_poke_area->addr,
> > +			(pfn << PAGE_SHIFT), _PAGE_KERNEL_RW | _PAGE_PRESENT);
> 
> 
> 
> Why not use PAGE_KERNEL instead of _PAGE_KERNEL_RW | _PAGE_PRESENT?
>

Will do
 
> From asm/pte-common.h:
> 
> #define PAGE_KERNEL	__pgprot(_PAGE_BASE | _PAGE_KERNEL_RW)
> #define _PAGE_BASE	(_PAGE_BASE_NC)
> #define _PAGE_BASE_NC	(_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_PSIZE)
> 
> Also, in pte-common.h, maybe the following defines could/should be
> reworked once your series is applied, shouldn't they?
> 
> /* Protection used for kernel text. We want the debuggers to be able to
>   * set breakpoints anywhere, so don't write protect the kernel text
>   * on platforms where such control is possible.
>   */
> #if defined(CONFIG_KGDB) || defined(CONFIG_XMON) || defined(CONFIG_BDI_SWITCH) ||\
> 	defined(CONFIG_KPROBES) || defined(CONFIG_DYNAMIC_FTRACE)
> #define PAGE_KERNEL_TEXT	PAGE_KERNEL_X
> #else
> #define PAGE_KERNEL_TEXT	PAGE_KERNEL_ROX
> #endif

Yes, I did see them and I want to rework them.

Thanks,
Balbir Singh.
Balbir Singh May 28, 2017, 10:50 p.m. UTC | #5
On Sun, 2017-05-28 at 17:59 +0200, christophe leroy wrote:
> 
> On 25/05/2017 at 05:36, Balbir Singh wrote:
> > Today our patching happens via direct copy and
> > patch_instruction. The patching code is well
> > contained in the sense that copying bits are limited.
> > 
> > While considering the implementation of CONFIG_STRICT_KERNEL_RWX,
> > the first requirement is to create another mapping
> > that will allow for patching. We create the window using
> > text_poke_area, allocated via get_vm_area(), which might
> > be overkill. We could do per-CPU mappings as well. The
> > downside of these patches is that patch_instruction is
> > now synchronized using a lock. Other arches do similar
> > things, but use fixmaps. The reason for not using
> > fixmaps is to make use of any randomization in the
> > future. The code also relies on set_pte_at and pte_clear
> > to do the appropriate tlb flushing.
> 
> Isn't it overkill to remap the text in another area?
>
> Among the 6 arches implementing CONFIG_STRICT_KERNEL_RWX (arm, arm64, 
> parisc, s390, x86/32, x86/64):
> - arm, x86/32 and x86/64 set text RW during the modification

x86 uses set_fixmap() in text_poke(), am I missing something?

> - s390 seems to use a special instruction which bypasses write protection
> - parisc doesn't seem to implement any function which modifies kernel text.
> 
> Therefore it seems only arm64 does it via another mapping.
> Wouldn't it be lighter to just unprotect the memory during the modification,
> as done on arm and x86?
> 

I am not sure the trade-off is quite that simple. For security, I thought:

1. It would be better to randomize text_poke_area(), which is why I dynamically
allocated it. If we start randomizing get_vm_area(), we get that benefit for free.
2. text_poke_area() is RW and the normal text is RX; for an attack
to succeed, it would need to find text_poke_area() at the time of patching,
patch the kernel in that small window, and use the normal mapping for
execution.

Generally patch_instruction() is not a fast path, except for ftrace and tracing.
In my tests I did not find the slowdown noticeable.

> Or another alternative could be to disable the DMMU and do the write at the
> physical address?
>

This would be worse, I think, though we were discussing doing something
like that for xmon. For other cases, I think it opens up a bigger window.

> Christophe

Balbir Singh
Christophe Leroy May 29, 2017, 5:50 a.m. UTC | #6
On 29/05/2017 at 00:50, Balbir Singh wrote:
> On Sun, 2017-05-28 at 17:59 +0200, christophe leroy wrote:
>>
>> On 25/05/2017 at 05:36, Balbir Singh wrote:
>>> Today our patching happens via direct copy and
>>> patch_instruction. The patching code is well
>>> contained in the sense that copying bits are limited.
>>>
>>> While considering the implementation of CONFIG_STRICT_KERNEL_RWX,
>>> the first requirement is to create another mapping
>>> that will allow for patching. We create the window using
>>> text_poke_area, allocated via get_vm_area(), which might
>>> be overkill. We could do per-CPU mappings as well. The
>>> downside of these patches is that patch_instruction is
>>> now synchronized using a lock. Other arches do similar
>>> things, but use fixmaps. The reason for not using
>>> fixmaps is to make use of any randomization in the
>>> future. The code also relies on set_pte_at and pte_clear
>>> to do the appropriate tlb flushing.
>>
>> Isn't it overkill to remap the text in another area?
>>
>> Among the 6 arches implementing CONFIG_STRICT_KERNEL_RWX (arm, arm64,
>> parisc, s390, x86/32, x86/64):
>> - arm, x86/32 and x86/64 set text RW during the modification
> 
> x86 uses set_fixmap() in text_poke(), am I missing something?
> 

Indeed, I looked at how it is done in ftrace. On x86, text modifications are
done using ftrace_write(), which calls probe_kernel_write(), which doesn't
remap anything. It first calls ftrace_arch_code_modify_prepare(), which
sets the kernel text to RW.

Indeed you are right, text_poke() remaps via fixmap. However, it looks
like text_poke() is used only for kgdb and kprobes.

Christophe

Patch

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 500b0f6..0a16b2f 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -16,19 +16,98 @@ 
 #include <asm/code-patching.h>
 #include <linux/uaccess.h>
 #include <linux/kprobes.h>
+#include <asm/pgtable.h>
+#include <asm/tlbflush.h>
 
+struct vm_struct *text_poke_area;
+static DEFINE_RAW_SPINLOCK(text_poke_lock);
 
-int patch_instruction(unsigned int *addr, unsigned int instr)
+/*
+ * This is an early_initcall and early_initcalls happen at the right time
+ * for us, after slab is enabled and before we mark ro pages R/O. In the
+ * future if get_vm_area is randomized, this will be more flexible than
+ * fixmap
+ */
+static int __init setup_text_poke_area(void)
 {
+	text_poke_area = get_vm_area(PAGE_SIZE, VM_ALLOC);
+	if (!text_poke_area) {
+		WARN_ONCE(1, "could not create area for mapping kernel addrs"
+				" which allow for patching kernel code\n");
+		return 0;
+	}
+	pr_info("text_poke area ready...\n");
+	raw_spin_lock_init(&text_poke_lock);
+	return 0;
+}
+
+/*
+ * This can be called for kernel text or a module.
+ */
+static int kernel_map_addr(void *addr)
+{
+	unsigned long pfn;
 	int err;
 
-	__put_user_size(instr, addr, 4, err);
+	if (is_vmalloc_addr(addr))
+		pfn = vmalloc_to_pfn(addr);
+	else
+		pfn = __pa_symbol(addr) >> PAGE_SHIFT;
+
+	err = map_kernel_page((unsigned long)text_poke_area->addr,
+			(pfn << PAGE_SHIFT), _PAGE_KERNEL_RW | _PAGE_PRESENT);
+	pr_devel("Mapped addr %p with pfn %lx\n", text_poke_area->addr, pfn);
 	if (err)
-		return err;
-	asm ("dcbst 0, %0; sync; icbi 0,%0; sync; isync" : : "r" (addr));
+		return -1;
 	return 0;
 }
 
+static inline void kernel_unmap_addr(void *addr)
+{
+	pte_t *pte;
+	unsigned long kaddr = (unsigned long)addr;
+
+	pte = pte_offset_kernel(pmd_offset(pud_offset(pgd_offset_k(kaddr),
+				kaddr), kaddr), kaddr);
+	pr_devel("clearing mm %p, pte %p, kaddr %lx\n", &init_mm, pte, kaddr);
+	pte_clear(&init_mm, kaddr, pte);
+}
+
+int patch_instruction(unsigned int *addr, unsigned int instr)
+{
+	int err;
+	unsigned int *dest = NULL;
+	unsigned long flags;
+	unsigned long kaddr = (unsigned long)addr;
+
+	/*
+	 * During early early boot patch_instruction is called
+	 * when text_poke_area is not ready, but we still need
+	 * to allow patching. We just do the plain old patching
+	 */
+	if (!text_poke_area) {
+		__put_user_size(instr, addr, 4, err);
+		asm ("dcbst 0, %0; sync; icbi 0,%0; sync; isync" :: "r" (addr));
+		return 0;
+	}
+
+	raw_spin_lock_irqsave(&text_poke_lock, flags);
+	if (kernel_map_addr(addr)) {
+		err = -1;
+		goto out;
+	}
+
+	dest = (unsigned int *)(text_poke_area->addr) +
+		((kaddr & ~PAGE_MASK) / sizeof(unsigned int));
+	__put_user_size(instr, dest, 4, err);
+	asm ("dcbst 0, %0; sync; icbi 0,%0; sync; isync" :: "r" (dest));
+	kernel_unmap_addr(text_poke_area->addr);
+out:
+	raw_spin_unlock_irqrestore(&text_poke_lock, flags);
+	return err;
+}
+NOKPROBE_SYMBOL(patch_instruction);
+
 int patch_branch(unsigned int *addr, unsigned long target, int flags)
 {
 	return patch_instruction(addr, create_branch(addr, target, flags));
@@ -514,3 +593,4 @@  static int __init test_code_patching(void)
 late_initcall(test_code_patching);
 
 #endif /* CONFIG_CODE_PATCHING_SELFTEST */
+early_initcall(setup_text_poke_area);