Message ID:   20190909223236.157099-1-samitolvanen@google.com
State:        Changes Requested
Delegated to: BPF Maintainers
Series:       bpf: validate bpf_func when BPF_JIT is enabled
On 9/9/19 11:32 PM, Sami Tolvanen wrote:
> With CONFIG_BPF_JIT, the kernel makes indirect calls to dynamically
> generated code. This change adds basic sanity checking to ensure
> we are jumping to a valid location, which narrows down the attack
> surface on the stored pointer. This also prepares the code for future
> Control-Flow Integrity (CFI) checking, which adds indirect call
> validation to call targets that can be determined at compile-time, but
> cannot validate calls to jited functions.
>
> In addition, this change adds a weak arch_bpf_jit_check_func function,
> which architectures that implement BPF JIT can override to perform
> additional validation, such as verifying that the pointer points to
> the correct memory region.

You did not mention BPF_BINARY_HEADER_MAGIC and added member
of `magic` in bpf_binary_header. Could you add some details
on what is the purpose for this `magic` member?

> Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
> ---
>  include/linux/filter.h | 26 ++++++++++++++++++++++++--
>  kernel/bpf/core.c      | 25 +++++++++++++++++++++++++
>  2 files changed, 49 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index 92c6e31fb008..abfb0e1b21a8 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -511,7 +511,10 @@ struct sock_fprog_kern {
>      struct sock_filter *filter;
>  };
>
> +#define BPF_BINARY_HEADER_MAGIC 0x05de0e82
> +
>  struct bpf_binary_header {
> +    u32 magic;
>      u32 pages;
>      /* Some arches need word alignment for their instructions */
>      u8 image[] __aligned(4);
> @@ -553,20 +556,39 @@ struct sk_filter {
>
>  DECLARE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
>
> +#ifdef CONFIG_BPF_JIT
> +/*
> + * With JIT, the kernel makes an indirect call to dynamically generated
> + * code. Use bpf_call_func to perform additional validation of the call
> + * target to narrow down attack surface. Architectures implementing BPF
> + * JIT can override arch_bpf_jit_check_func for arch-specific checking.
> + */
> +extern unsigned int bpf_call_func(const struct bpf_prog *prog,
> +                                  const void *ctx);
> +
> +extern bool arch_bpf_jit_check_func(const struct bpf_prog *prog);
> +#else
> +static inline unsigned int bpf_call_func(const struct bpf_prog *prog,
> +                                         const void *ctx)
> +{
> +    return prog->bpf_func(ctx, prog->insnsi);
> +}
> +#endif
> +
>  #define BPF_PROG_RUN(prog, ctx) ({                              \
>      u32 ret;                                                    \
>      cant_sleep();                                               \
>      if (static_branch_unlikely(&bpf_stats_enabled_key)) {       \
>          struct bpf_prog_stats *stats;                           \
>          u64 start = sched_clock();                              \
> -        ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi);         \
> +        ret = bpf_call_func(prog, ctx);                         \
>          stats = this_cpu_ptr(prog->aux->stats);                 \
>          u64_stats_update_begin(&stats->syncp);                  \
>          stats->cnt++;                                           \
>          stats->nsecs += sched_clock() - start;                  \
>          u64_stats_update_end(&stats->syncp);                    \
>      } else {                                                    \
> -        ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi);         \
> +        ret = bpf_call_func(prog, ctx);                         \
>      }                                                           \
>      ret; })
>
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 66088a9e9b9e..7aad58f67105 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -792,6 +792,30 @@ void __weak bpf_jit_free_exec(void *addr)
>      module_memfree(addr);
>  }
>
> +#ifdef CONFIG_BPF_JIT
> +bool __weak arch_bpf_jit_check_func(const struct bpf_prog *prog)
> +{
> +    return true;
> +}
> +
> +unsigned int bpf_call_func(const struct bpf_prog *prog, const void *ctx)
> +{
> +    const struct bpf_binary_header *hdr = bpf_jit_binary_hdr(prog);
> +
> +    if (!IS_ENABLED(CONFIG_BPF_JIT_ALWAYS_ON) && !prog->jited)
> +        return prog->bpf_func(ctx, prog->insnsi);
> +
> +    if (unlikely(hdr->magic != BPF_BINARY_HEADER_MAGIC ||
> +                 !arch_bpf_jit_check_func(prog))) {
> +        WARN(1, "attempt to jump to an invalid address");
> +        return 0;
> +    }
> +
> +    return prog->bpf_func(ctx, prog->insnsi);
> +}

The above can be rewritten as

    if (IS_ENABLED(CONFIG_BPF_JIT_ALWAYS_ON) || prog->jited ||
        hdr->magic != BPF_BINARY_HEADER_MAGIC ||
        !arch_bpf_jit_check_func(prog))) {
        WARN(1, "attempt to jump to an invalid address");
        return 0;
    }
    return prog->bpf_func(ctx, prog->insnsi);

BPF_PROG_RUN() will be called during xdp fast path.
Have you measured how much slowdown the above change could
cost for the performance?

> +EXPORT_SYMBOL_GPL(bpf_call_func);
> +#endif
> +
>  struct bpf_binary_header *
>  bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
>                       unsigned int alignment,
> @@ -818,6 +842,7 @@ bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
>      /* Fill space with illegal/arch-dep instructions. */
>      bpf_fill_ill_insns(hdr, size);
>
> +    hdr->magic = BPF_BINARY_HEADER_MAGIC;
>      hdr->pages = pages;
>      hole = min_t(unsigned int, size - (proglen + sizeof(*hdr)),
>                   PAGE_SIZE - sizeof(*hdr));
On Tue, Sep 10, 2019 at 08:37:19AM +0000, Yonghong Song wrote:
> You did not mention BPF_BINARY_HEADER_MAGIC and added member
> of `magic` in bpf_binary_header. Could you add some details
> on what is the purpose for this `magic` member?

Sure, I'll add a description to the next version.

The magic is a random number used to identify bpf_binary_header in
memory. The purpose of this patch is to limit the possible call
targets for the function pointer and checking for the magic helps
ensure we are jumping to a page that contains a jited function,
instead of allowing calls to arbitrary targets.

This is particularly useful when combined with the compiler-based
Control-Flow Integrity (CFI) mitigation, which Google started shipping
in Pixel kernels last year. The compiler injects checks to all
indirect calls, but cannot obviously validate jumps to dynamically
generated code.

> > +unsigned int bpf_call_func(const struct bpf_prog *prog, const void *ctx)
> > +{
> > +    const struct bpf_binary_header *hdr = bpf_jit_binary_hdr(prog);
> > +
> > +    if (!IS_ENABLED(CONFIG_BPF_JIT_ALWAYS_ON) && !prog->jited)
> > +        return prog->bpf_func(ctx, prog->insnsi);
> > +
> > +    if (unlikely(hdr->magic != BPF_BINARY_HEADER_MAGIC ||
> > +                 !arch_bpf_jit_check_func(prog))) {
> > +        WARN(1, "attempt to jump to an invalid address");
> > +        return 0;
> > +    }
> > +
> > +    return prog->bpf_func(ctx, prog->insnsi);
> > +}

> The above can be rewritten as
>     if (IS_ENABLED(CONFIG_BPF_JIT_ALWAYS_ON) || prog->jited ||
>         hdr->magic != BPF_BINARY_HEADER_MAGIC ||
>         !arch_bpf_jit_check_func(prog))) {
>         WARN(1, "attempt to jump to an invalid address");
>         return 0;
>     }

That doesn't look quite equivalent, but yes, this can be rewritten as a
single if statement like this:

    if ((IS_ENABLED(CONFIG_BPF_JIT_ALWAYS_ON) || prog->jited) &&
        (hdr->magic != BPF_BINARY_HEADER_MAGIC ||
         !arch_bpf_jit_check_func(prog)))

I think splitting the interpreter and JIT paths would be more readable,
but I can certainly change this if you prefer.

> BPF_PROG_RUN() will be called during xdp fast path.
> Have you measured how much slowdown the above change could
> cost for the performance?

I have not measured the overhead, but it shouldn't be significant. Is
there a particular benchmark you'd like me to run?

Sami
On 9/10/19 6:22 PM, Sami Tolvanen wrote:
> On Tue, Sep 10, 2019 at 08:37:19AM +0000, Yonghong Song wrote:
>> You did not mention BPF_BINARY_HEADER_MAGIC and added member
>> of `magic` in bpf_binary_header. Could you add some details
>> on what is the purpose for this `magic` member?
>
> Sure, I'll add a description to the next version.
>
> The magic is a random number used to identify bpf_binary_header in
> memory. The purpose of this patch is to limit the possible call
> targets for the function pointer and checking for the magic helps
> ensure we are jumping to a page that contains a jited function,
> instead of allowing calls to arbitrary targets.
>
> This is particularly useful when combined with the compiler-based
> Control-Flow Integrity (CFI) mitigation, which Google started shipping
> in Pixel kernels last year. The compiler injects checks to all
> indirect calls, but cannot obviously validate jumps to dynamically
> generated code.
>
>>> +unsigned int bpf_call_func(const struct bpf_prog *prog, const void *ctx)
>>> +{
>>> +    const struct bpf_binary_header *hdr = bpf_jit_binary_hdr(prog);
>>> +
>>> +    if (!IS_ENABLED(CONFIG_BPF_JIT_ALWAYS_ON) && !prog->jited)
>>> +        return prog->bpf_func(ctx, prog->insnsi);
>>> +
>>> +    if (unlikely(hdr->magic != BPF_BINARY_HEADER_MAGIC ||
>>> +                 !arch_bpf_jit_check_func(prog))) {
>>> +        WARN(1, "attempt to jump to an invalid address");
>>> +        return 0;
>>> +    }
>>> +
>>> +    return prog->bpf_func(ctx, prog->insnsi);
>>> +}
>
>> The above can be rewritten as
>>     if (IS_ENABLED(CONFIG_BPF_JIT_ALWAYS_ON) || prog->jited ||
>>         hdr->magic != BPF_BINARY_HEADER_MAGIC ||
>>         !arch_bpf_jit_check_func(prog))) {
>>         WARN(1, "attempt to jump to an invalid address");
>>         return 0;
>>     }
>
> That doesn't look quite equivalent, but yes, this can be rewritten as a

Indeed, I made a mistake. Your below change is correct.

> single if statement like this:
>
>     if ((IS_ENABLED(CONFIG_BPF_JIT_ALWAYS_ON) ||
>          prog->jited) &&
>         (hdr->magic != BPF_BINARY_HEADER_MAGIC ||
>          !arch_bpf_jit_check_func(prog)))
>
> I think splitting the interpreter and JIT paths would be more readable,
> but I can certainly change this if you prefer.

How about this:

    if (!IS_ENABLED(CONFIG_BPF_JIT_ALWAYS_ON) && !prog->jited)
        goto out;

    if (unlikely(hdr->magic != BPF_BINARY_HEADER_MAGIC ||
                 !arch_bpf_jit_check_func(prog))) {
        WARN(1, "attempt to jump to an invalid address");
        return 0;
    }
out:
    return prog->bpf_func(ctx, prog->insnsi);

>> BPF_PROG_RUN() will be called during xdp fast path.
>> Have you measured how much slowdown the above change could
>> cost for the performance?
>
> I have not measured the overhead, but it shouldn't be significant. Is
> there a particular benchmark you'd like me to run?

I am not an expert in XDP testing. Toke, Björn, could you give some
suggestions what to test for XDP performance here?

> Sami
On 2019-09-11 09:42, Yonghong Song wrote:
> I am not an expert in XDP testing. Toke, Björn, could you give some
> suggestions what to test for XDP performance here?

I ran the "xdp_rxq_info" sample with and without Sami's patch:

$ sudo ./xdp_rxq_info --dev enp134s0f0 --action XDP_DROP

Before:

Running XDP on dev:enp134s0f0 (ifindex:6) action:XDP_DROP options:no_touch
XDP stats       CPU     pps         issue-pps
XDP-RX CPU      20      23923874    0
XDP-RX CPU      total   23923874

RXQ stats       RXQ:CPU pps         issue-pps
rx_queue_index   20:20  23923878    0
rx_queue_index   20:sum 23923878

After Sami's patch:

Running XDP on dev:enp134s0f0 (ifindex:6) action:XDP_DROP options:no_touch
XDP stats       CPU     pps         issue-pps
XDP-RX CPU      20      22998700    0
XDP-RX CPU      total   22998700

RXQ stats       RXQ:CPU pps         issue-pps
rx_queue_index   20:20  22998705    0
rx_queue_index   20:sum 22998705

So, roughly ~4% for this somewhat naive scenario. As for XDP
performance tests; I guess some of the XDP selftests could be used as
well!

Björn
Björn Töpel <bjorn.topel@intel.com> writes:

> On 2019-09-11 09:42, Yonghong Song wrote:
>> I am not an expert in XDP testing. Toke, Björn, could you give some
>> suggestions what to test for XDP performance here?
>
> I ran the "xdp_rxq_info" sample with and without Sami's patch:

Thanks for doing this!

> $ sudo ./xdp_rxq_info --dev enp134s0f0 --action XDP_DROP
>
> Before:
>
> Running XDP on dev:enp134s0f0 (ifindex:6) action:XDP_DROP options:no_touch
> XDP stats       CPU     pps         issue-pps
> XDP-RX CPU      20      23923874    0
> XDP-RX CPU      total   23923874
>
> RXQ stats       RXQ:CPU pps         issue-pps
> rx_queue_index   20:20  23923878    0
> rx_queue_index   20:sum 23923878
>
> After Sami's patch:
>
> Running XDP on dev:enp134s0f0 (ifindex:6) action:XDP_DROP options:no_touch
> XDP stats       CPU     pps         issue-pps
> XDP-RX CPU      20      22998700    0
> XDP-RX CPU      total   22998700
>
> RXQ stats       RXQ:CPU pps         issue-pps
> rx_queue_index   20:20  22998705    0
> rx_queue_index   20:sum 22998705
>
> So, roughly ~4% for this somewhat naive scenario.

Or (1/22998700 - 1/23923874) * 10**9 == 1.7 nanoseconds of overhead.

I guess that is not *too* bad; but it's still chipping away at
performance; anything we could do to lower the overhead?

-Toke
On Wed, Sep 11, 2019 at 12:43 AM Yonghong Song <yhs@fb.com> wrote:
> How about this:
>
>     if (!IS_ENABLED(CONFIG_BPF_JIT_ALWAYS_ON) && !prog->jited)
>         goto out;
>
>     if (unlikely(hdr->magic != BPF_BINARY_HEADER_MAGIC ||
>                  !arch_bpf_jit_check_func(prog))) {
>         WARN(1, "attempt to jump to an invalid address");
>         return 0;
>     }
> out:
>     return prog->bpf_func(ctx, prog->insnsi);

Sure, that does look cleaner. I'll use this in the next version. Thanks.

Sami
On Wed, Sep 11, 2019 at 5:09 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Björn Töpel <bjorn.topel@intel.com> writes:
> > I ran the "xdp_rxq_info" sample with and without Sami's patch:
>
> Thanks for doing this!

Yes, thanks for testing this Björn!

> Or (1/22998700 - 1/23923874) * 10**9 == 1.7 nanoseconds of overhead.
>
> I guess that is not *too* bad; but it's still chipping away at
> performance; anything we could do to lower the overhead?

The check is already rather minimal, but I could move this to a static
inline function to help ensure the compiler doesn't generate an
additional function call for this. I'm also fine with gating this
behind a separate config option, but I'm not sure if that's worth it.
Any thoughts?

Sami
Sami Tolvanen <samitolvanen@google.com> writes:

> On Wed, Sep 11, 2019 at 5:09 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Björn Töpel <bjorn.topel@intel.com> writes:
>> > I ran the "xdp_rxq_info" sample with and without Sami's patch:
>>
>> Thanks for doing this!
>
> Yes, thanks for testing this Björn!
>
>> Or (1/22998700 - 1/23923874) * 10**9 == 1.7 nanoseconds of overhead.
>>
>> I guess that is not *too* bad; but it's still chipping away at
>> performance; anything we could do to lower the overhead?
>
> The check is already rather minimal, but I could move this to a static
> inline function to help ensure the compiler doesn't generate an
> additional function call for this. I'm also fine with gating this
> behind a separate config option, but I'm not sure if that's worth it.
> Any thoughts?

I think it would be good if you do both. I'm a bit worried that XDP
performance will end up in a "death by a thousand paper cuts" situation,
so I'd rather push back on even relatively small overheads like this; so
being able to turn it off in the config would be good.

Can you share more details about what the "future CFI checking" is
likely to look like?

-Toke
On Thu, Sep 12, 2019 at 3:52 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> I think it would be good if you do both. I'm a bit worried that XDP
> performance will end up in a "death by a thousand paper cuts" situation,
> so I'd rather push back on even relatively small overheads like this; so
> being able to turn it off in the config would be good.

OK, thanks for the feedback. In that case, I think it's probably
better to wait until we have CFI ready for upstreaming and use the
same config for this one.

> Can you share more details about what the "future CFI checking" is
> likely to look like?

Sure, I posted an overview of CFI and what we're doing in Pixel devices here:

https://android-developers.googleblog.com/2018/10/control-flow-integrity-in-android-kernel.html

Sami
Sami Tolvanen <samitolvanen@google.com> writes:

> On Thu, Sep 12, 2019 at 3:52 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> I think it would be good if you do both. I'm a bit worried that XDP
>> performance will end up in a "death by a thousand paper cuts" situation,
>> so I'd rather push back on even relatively small overheads like this; so
>> being able to turn it off in the config would be good.
>
> OK, thanks for the feedback. In that case, I think it's probably
> better to wait until we have CFI ready for upstreaming and use the
> same config for this one.

SGTM, thanks!

>> Can you share more details about what the "future CFI checking" is
>> likely to look like?
>
> Sure, I posted an overview of CFI and what we're doing in Pixel devices here:
>
> https://android-developers.googleblog.com/2018/10/control-flow-integrity-in-android-kernel.html

Great, thank you.

-Toke
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 92c6e31fb008..abfb0e1b21a8 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -511,7 +511,10 @@ struct sock_fprog_kern {
     struct sock_filter *filter;
 };
 
+#define BPF_BINARY_HEADER_MAGIC 0x05de0e82
+
 struct bpf_binary_header {
+    u32 magic;
     u32 pages;
     /* Some arches need word alignment for their instructions */
     u8 image[] __aligned(4);
@@ -553,20 +556,39 @@ struct sk_filter {
 
 DECLARE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
 
+#ifdef CONFIG_BPF_JIT
+/*
+ * With JIT, the kernel makes an indirect call to dynamically generated
+ * code. Use bpf_call_func to perform additional validation of the call
+ * target to narrow down attack surface. Architectures implementing BPF
+ * JIT can override arch_bpf_jit_check_func for arch-specific checking.
+ */
+extern unsigned int bpf_call_func(const struct bpf_prog *prog,
+                                  const void *ctx);
+
+extern bool arch_bpf_jit_check_func(const struct bpf_prog *prog);
+#else
+static inline unsigned int bpf_call_func(const struct bpf_prog *prog,
+                                         const void *ctx)
+{
+    return prog->bpf_func(ctx, prog->insnsi);
+}
+#endif
+
 #define BPF_PROG_RUN(prog, ctx) ({                              \
     u32 ret;                                                    \
     cant_sleep();                                               \
     if (static_branch_unlikely(&bpf_stats_enabled_key)) {       \
         struct bpf_prog_stats *stats;                           \
         u64 start = sched_clock();                              \
-        ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi);         \
+        ret = bpf_call_func(prog, ctx);                         \
         stats = this_cpu_ptr(prog->aux->stats);                 \
         u64_stats_update_begin(&stats->syncp);                  \
         stats->cnt++;                                           \
         stats->nsecs += sched_clock() - start;                  \
         u64_stats_update_end(&stats->syncp);                    \
     } else {                                                    \
-        ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi);         \
+        ret = bpf_call_func(prog, ctx);                         \
     }                                                           \
     ret; })
 
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 66088a9e9b9e..7aad58f67105 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -792,6 +792,30 @@ void __weak bpf_jit_free_exec(void *addr)
     module_memfree(addr);
 }
 
+#ifdef CONFIG_BPF_JIT
+bool __weak arch_bpf_jit_check_func(const struct bpf_prog *prog)
+{
+    return true;
+}
+
+unsigned int bpf_call_func(const struct bpf_prog *prog, const void *ctx)
+{
+    const struct bpf_binary_header *hdr = bpf_jit_binary_hdr(prog);
+
+    if (!IS_ENABLED(CONFIG_BPF_JIT_ALWAYS_ON) && !prog->jited)
+        return prog->bpf_func(ctx, prog->insnsi);
+
+    if (unlikely(hdr->magic != BPF_BINARY_HEADER_MAGIC ||
+                 !arch_bpf_jit_check_func(prog))) {
+        WARN(1, "attempt to jump to an invalid address");
+        return 0;
+    }
+
+    return prog->bpf_func(ctx, prog->insnsi);
+}
+EXPORT_SYMBOL_GPL(bpf_call_func);
+#endif
+
 struct bpf_binary_header *
 bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
                      unsigned int alignment,
@@ -818,6 +842,7 @@ bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
     /* Fill space with illegal/arch-dep instructions. */
     bpf_fill_ill_insns(hdr, size);
 
+    hdr->magic = BPF_BINARY_HEADER_MAGIC;
     hdr->pages = pages;
     hole = min_t(unsigned int, size - (proglen + sizeof(*hdr)),
                  PAGE_SIZE - sizeof(*hdr));
With CONFIG_BPF_JIT, the kernel makes indirect calls to dynamically
generated code. This change adds basic sanity checking to ensure
we are jumping to a valid location, which narrows down the attack
surface on the stored pointer. This also prepares the code for future
Control-Flow Integrity (CFI) checking, which adds indirect call
validation to call targets that can be determined at compile-time, but
cannot validate calls to jited functions.

In addition, this change adds a weak arch_bpf_jit_check_func function,
which architectures that implement BPF JIT can override to perform
additional validation, such as verifying that the pointer points to
the correct memory region.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
---
 include/linux/filter.h | 26 ++++++++++++++++++++++++--
 kernel/bpf/core.c      | 25 +++++++++++++++++++++++++
 2 files changed, 49 insertions(+), 2 deletions(-)