Message ID:   20190909223236.157099-1-samitolvanen@google.com
State:        Changes Requested
Delegated to: BPF Maintainers
Series:       bpf: validate bpf_func when BPF_JIT is enabled
On 9/9/19 11:32 PM, Sami Tolvanen wrote:
> With CONFIG_BPF_JIT, the kernel makes indirect calls to dynamically
> generated code. This change adds basic sanity checking to ensure
> we are jumping to a valid location, which narrows down the attack
> surface on the stored pointer. This also prepares the code for future
> Control-Flow Integrity (CFI) checking, which adds indirect call
> validation to call targets that can be determined at compile-time, but
> cannot validate calls to jited functions.
>
> In addition, this change adds a weak arch_bpf_jit_check_func function,
> which architectures that implement BPF JIT can override to perform
> additional validation, such as verifying that the pointer points to
> the correct memory region.

You did not mention BPF_BINARY_HEADER_MAGIC and added member
of `magic` in bpf_binary_header. Could you add some details
on what is the purpose for this `magic` member?

> Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
> ---
>  include/linux/filter.h | 26 ++++++++++++++++++++++++--
>  kernel/bpf/core.c      | 25 +++++++++++++++++++++++++
>  2 files changed, 49 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index 92c6e31fb008..abfb0e1b21a8 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -511,7 +511,10 @@ struct sock_fprog_kern {
>      struct sock_filter *filter;
>  };
>
> +#define BPF_BINARY_HEADER_MAGIC 0x05de0e82
> +
>  struct bpf_binary_header {
> +    u32 magic;
>      u32 pages;
>      /* Some arches need word alignment for their instructions */
>      u8 image[] __aligned(4);
> @@ -553,20 +556,39 @@ struct sk_filter {
>
>  DECLARE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
>
> +#ifdef CONFIG_BPF_JIT
> +/*
> + * With JIT, the kernel makes an indirect call to dynamically generated
> + * code. Use bpf_call_func to perform additional validation of the call
> + * target to narrow down attack surface. Architectures implementing BPF
> + * JIT can override arch_bpf_jit_check_func for arch-specific checking.
> + */
> +extern unsigned int bpf_call_func(const struct bpf_prog *prog,
> +                                  const void *ctx);
> +
> +extern bool arch_bpf_jit_check_func(const struct bpf_prog *prog);
> +#else
> +static inline unsigned int bpf_call_func(const struct bpf_prog *prog,
> +                                         const void *ctx)
> +{
> +    return prog->bpf_func(ctx, prog->insnsi);
> +}
> +#endif
> +
>  #define BPF_PROG_RUN(prog, ctx) ({                              \
>      u32 ret;                                                    \
>      cant_sleep();                                               \
>      if (static_branch_unlikely(&bpf_stats_enabled_key)) {       \
>          struct bpf_prog_stats *stats;                           \
>          u64 start = sched_clock();                              \
> -        ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi);         \
> +        ret = bpf_call_func(prog, ctx);                         \
>          stats = this_cpu_ptr(prog->aux->stats);                 \
>          u64_stats_update_begin(&stats->syncp);                  \
>          stats->cnt++;                                           \
>          stats->nsecs += sched_clock() - start;                  \
>          u64_stats_update_end(&stats->syncp);                    \
>      } else {                                                    \
> -        ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi);         \
> +        ret = bpf_call_func(prog, ctx);                         \
>      }                                                           \
>      ret; })
>
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 66088a9e9b9e..7aad58f67105 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -792,6 +792,30 @@ void __weak bpf_jit_free_exec(void *addr)
>      module_memfree(addr);
>  }
>
> +#ifdef CONFIG_BPF_JIT
> +bool __weak arch_bpf_jit_check_func(const struct bpf_prog *prog)
> +{
> +    return true;
> +}
> +
> +unsigned int bpf_call_func(const struct bpf_prog *prog, const void *ctx)
> +{
> +    const struct bpf_binary_header *hdr = bpf_jit_binary_hdr(prog);
> +
> +    if (!IS_ENABLED(CONFIG_BPF_JIT_ALWAYS_ON) && !prog->jited)
> +        return prog->bpf_func(ctx, prog->insnsi);
> +
> +    if (unlikely(hdr->magic != BPF_BINARY_HEADER_MAGIC ||
> +                 !arch_bpf_jit_check_func(prog))) {
> +        WARN(1, "attempt to jump to an invalid address");
> +        return 0;
> +    }
> +
> +    return prog->bpf_func(ctx, prog->insnsi);
> +}

The above can be rewritten as

    if (IS_ENABLED(CONFIG_BPF_JIT_ALWAYS_ON) || prog->jited ||
        hdr->magic != BPF_BINARY_HEADER_MAGIC ||
        !arch_bpf_jit_check_func(prog))) {
        WARN(1, "attempt to jump to an invalid address");
        return 0;
    }
    return prog->bpf_func(ctx, prog->insnsi);

BPF_PROG_RUN() will be called during xdp fast path.
Have you measured how much slowdown the above change could
cost for the performance?

> +EXPORT_SYMBOL_GPL(bpf_call_func);
> +#endif
> +
>  struct bpf_binary_header *
>  bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
>                       unsigned int alignment,
> @@ -818,6 +842,7 @@ bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
>      /* Fill space with illegal/arch-dep instructions. */
>      bpf_fill_ill_insns(hdr, size);
>
> +    hdr->magic = BPF_BINARY_HEADER_MAGIC;
>      hdr->pages = pages;
>      hole = min_t(unsigned int, size - (proglen + sizeof(*hdr)),
>                   PAGE_SIZE - sizeof(*hdr));
On Tue, Sep 10, 2019 at 08:37:19AM +0000, Yonghong Song wrote:
> You did not mention BPF_BINARY_HEADER_MAGIC and added member
> of `magic` in bpf_binary_header. Could you add some details
> on what is the purpose for this `magic` member?

Sure, I'll add a description to the next version.

The magic is a random number used to identify bpf_binary_header in
memory. The purpose of this patch is to limit the possible call
targets for the function pointer and checking for the magic helps
ensure we are jumping to a page that contains a jited function,
instead of allowing calls to arbitrary targets.

This is particularly useful when combined with the compiler-based
Control-Flow Integrity (CFI) mitigation, which Google started shipping
in Pixel kernels last year. The compiler injects checks to all
indirect calls, but cannot obviously validate jumps to dynamically
generated code.

> > +unsigned int bpf_call_func(const struct bpf_prog *prog, const void *ctx)
> > +{
> > +    const struct bpf_binary_header *hdr = bpf_jit_binary_hdr(prog);
> > +
> > +    if (!IS_ENABLED(CONFIG_BPF_JIT_ALWAYS_ON) && !prog->jited)
> > +        return prog->bpf_func(ctx, prog->insnsi);
> > +
> > +    if (unlikely(hdr->magic != BPF_BINARY_HEADER_MAGIC ||
> > +                 !arch_bpf_jit_check_func(prog))) {
> > +        WARN(1, "attempt to jump to an invalid address");
> > +        return 0;
> > +    }
> > +
> > +    return prog->bpf_func(ctx, prog->insnsi);
> > +}

> The above can be rewritten as
>     if (IS_ENABLED(CONFIG_BPF_JIT_ALWAYS_ON) || prog->jited ||
>         hdr->magic != BPF_BINARY_HEADER_MAGIC ||
>         !arch_bpf_jit_check_func(prog))) {
>         WARN(1, "attempt to jump to an invalid address");
>         return 0;
>     }

That doesn't look quite equivalent, but yes, this can be rewritten as a
single if statement like this:

    if ((IS_ENABLED(CONFIG_BPF_JIT_ALWAYS_ON) || prog->jited) &&
        (hdr->magic != BPF_BINARY_HEADER_MAGIC ||
         !arch_bpf_jit_check_func(prog)))

I think splitting the interpreter and JIT paths would be more readable,
but I can certainly change this if you prefer.

> BPF_PROG_RUN() will be called during xdp fast path.
> Have you measured how much slowdown the above change could
> cost for the performance?

I have not measured the overhead, but it shouldn't be significant. Is
there a particular benchmark you'd like me to run?

Sami
On 9/10/19 6:22 PM, Sami Tolvanen wrote:
> On Tue, Sep 10, 2019 at 08:37:19AM +0000, Yonghong Song wrote:
>> You did not mention BPF_BINARY_HEADER_MAGIC and added member
>> of `magic` in bpf_binary_header. Could you add some details
>> on what is the purpose for this `magic` member?
>
> Sure, I'll add a description to the next version.
>
> The magic is a random number used to identify bpf_binary_header in
> memory. The purpose of this patch is to limit the possible call
> targets for the function pointer and checking for the magic helps
> ensure we are jumping to a page that contains a jited function,
> instead of allowing calls to arbitrary targets.
>
> This is particularly useful when combined with the compiler-based
> Control-Flow Integrity (CFI) mitigation, which Google started shipping
> in Pixel kernels last year. The compiler injects checks to all
> indirect calls, but cannot obviously validate jumps to dynamically
> generated code.
>
>>> +unsigned int bpf_call_func(const struct bpf_prog *prog, const void *ctx)
>>> +{
>>> +    const struct bpf_binary_header *hdr = bpf_jit_binary_hdr(prog);
>>> +
>>> +    if (!IS_ENABLED(CONFIG_BPF_JIT_ALWAYS_ON) && !prog->jited)
>>> +        return prog->bpf_func(ctx, prog->insnsi);
>>> +
>>> +    if (unlikely(hdr->magic != BPF_BINARY_HEADER_MAGIC ||
>>> +                 !arch_bpf_jit_check_func(prog))) {
>>> +        WARN(1, "attempt to jump to an invalid address");
>>> +        return 0;
>>> +    }
>>> +
>>> +    return prog->bpf_func(ctx, prog->insnsi);
>>> +}
>
>> The above can be rewritten as
>>     if (IS_ENABLED(CONFIG_BPF_JIT_ALWAYS_ON) || prog->jited ||
>>         hdr->magic != BPF_BINARY_HEADER_MAGIC ||
>>         !arch_bpf_jit_check_func(prog))) {
>>         WARN(1, "attempt to jump to an invalid address");
>>         return 0;
>>     }
>
> That doesn't look quite equivalent, but yes, this can be rewritten as a

Indeed, I made a mistake. Your below change is correct.

> single if statement like this:
>
>     if ((IS_ENABLED(CONFIG_BPF_JIT_ALWAYS_ON) ||
>          prog->jited) &&
>         (hdr->magic != BPF_BINARY_HEADER_MAGIC ||
>          !arch_bpf_jit_check_func(prog)))
>
> I think splitting the interpreter and JIT paths would be more readable,
> but I can certainly change this if you prefer.

How about this:

    if (!IS_ENABLED(CONFIG_BPF_JIT_ALWAYS_ON) && !prog->jited)
        goto out;

    if (unlikely(hdr->magic != BPF_BINARY_HEADER_MAGIC ||
                 !arch_bpf_jit_check_func(prog))) {
        WARN(1, "attempt to jump to an invalid address");
        return 0;
    }
out:
    return prog->bpf_func(ctx, prog->insnsi);

>> BPF_PROG_RUN() will be called during xdp fast path.
>> Have you measured how much slowdown the above change could
>> cost for the performance?
>
> I have not measured the overhead, but it shouldn't be significant. Is
> there a particular benchmark you'd like me to run?

I am not an expert in XDP testing. Toke, Björn, could you give some
suggestions what to test for XDP performance here?

> Sami
On 2019-09-11 09:42, Yonghong Song wrote:
> I am not an expert in XDP testing. Toke, Björn, could you give some
> suggestions what to test for XDP performance here?

I ran the "xdp_rxq_info" sample with and without Sami's patch:

$ sudo ./xdp_rxq_info --dev enp134s0f0 --action XDP_DROP

Before:

Running XDP on dev:enp134s0f0 (ifindex:6) action:XDP_DROP options:no_touch
XDP stats       CPU     pps         issue-pps
XDP-RX CPU      20      23923874    0
XDP-RX CPU      total   23923874

RXQ stats       RXQ:CPU pps         issue-pps
rx_queue_index   20:20  23923878    0
rx_queue_index   20:sum 23923878

After Sami's patch:

Running XDP on dev:enp134s0f0 (ifindex:6) action:XDP_DROP options:no_touch
XDP stats       CPU     pps         issue-pps
XDP-RX CPU      20      22998700    0
XDP-RX CPU      total   22998700

RXQ stats       RXQ:CPU pps         issue-pps
rx_queue_index   20:20  22998705    0
rx_queue_index   20:sum 22998705

So, roughly ~4% for this somewhat naive scenario. As for XDP
performance tests; I guess some of the XDP selftests could be used as
well!

Björn
Björn Töpel <bjorn.topel@intel.com> writes:

> On 2019-09-11 09:42, Yonghong Song wrote:
>> I am not an expert in XDP testing. Toke, Björn, could you give some
>> suggestions what to test for XDP performance here?
>
> I ran the "xdp_rxq_info" sample with and without Sami's patch:

Thanks for doing this!

> $ sudo ./xdp_rxq_info --dev enp134s0f0 --action XDP_DROP
>
> Before:
>
> Running XDP on dev:enp134s0f0 (ifindex:6) action:XDP_DROP options:no_touch
> XDP stats       CPU     pps         issue-pps
> XDP-RX CPU      20      23923874    0
> XDP-RX CPU      total   23923874
>
> RXQ stats       RXQ:CPU pps         issue-pps
> rx_queue_index   20:20  23923878    0
> rx_queue_index   20:sum 23923878
>
> After Sami's patch:
>
> Running XDP on dev:enp134s0f0 (ifindex:6) action:XDP_DROP options:no_touch
> XDP stats       CPU     pps         issue-pps
> XDP-RX CPU      20      22998700    0
> XDP-RX CPU      total   22998700
>
> RXQ stats       RXQ:CPU pps         issue-pps
> rx_queue_index   20:20  22998705    0
> rx_queue_index   20:sum 22998705
>
> So, roughly ~4% for this somewhat naive scenario.

Or (1/22998700 - 1/23923874) * 10**9 == 1.7 nanoseconds of overhead.

I guess that is not *too* bad; but it's still chipping away at
performance; anything we could do to lower the overhead?

-Toke
On Wed, Sep 11, 2019 at 12:43 AM Yonghong Song <yhs@fb.com> wrote:
> How about this:
>
>     if (!IS_ENABLED(CONFIG_BPF_JIT_ALWAYS_ON) && !prog->jited)
>         goto out;
>
>     if (unlikely(hdr->magic != BPF_BINARY_HEADER_MAGIC ||
>                  !arch_bpf_jit_check_func(prog))) {
>         WARN(1, "attempt to jump to an invalid address");
>         return 0;
>     }
> out:
>     return prog->bpf_func(ctx, prog->insnsi);

Sure, that does look cleaner. I'll use this in the next version. Thanks.

Sami
On Wed, Sep 11, 2019 at 5:09 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Björn Töpel <bjorn.topel@intel.com> writes:
> > I ran the "xdp_rxq_info" sample with and without Sami's patch:
>
> Thanks for doing this!

Yes, thanks for testing this Björn!

> Or (1/22998700 - 1/23923874) * 10**9 == 1.7 nanoseconds of overhead.
>
> I guess that is not *too* bad; but it's still chipping away at
> performance; anything we could do to lower the overhead?

The check is already rather minimal, but I could move this to a static
inline function to help ensure the compiler doesn't generate an
additional function call for this. I'm also fine with gating this
behind a separate config option, but I'm not sure if that's worth it.
Any thoughts?

Sami
Sami Tolvanen <samitolvanen@google.com> writes:

> On Wed, Sep 11, 2019 at 5:09 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Björn Töpel <bjorn.topel@intel.com> writes:
>> > I ran the "xdp_rxq_info" sample with and without Sami's patch:
>>
>> Thanks for doing this!
>
> Yes, thanks for testing this Björn!
>
>> Or (1/22998700 - 1/23923874) * 10**9 == 1.7 nanoseconds of overhead.
>>
>> I guess that is not *too* bad; but it's still chipping away at
>> performance; anything we could do to lower the overhead?
>
> The check is already rather minimal, but I could move this to a static
> inline function to help ensure the compiler doesn't generate an
> additional function call for this. I'm also fine with gating this
> behind a separate config option, but I'm not sure if that's worth it.
> Any thoughts?

I think it would be good if you do both. I'm a bit worried that XDP
performance will end up in a "death by a thousand paper cuts" situation,
so I'd rather push back on even relatively small overheads like this; so
being able to turn it off in the config would be good.

Can you share more details about what the "future CFI checking" is
likely to look like?

-Toke
On Thu, Sep 12, 2019 at 3:52 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> I think it would be good if you do both. I'm a bit worried that XDP
> performance will end up in a "death by a thousand paper cuts" situation,
> so I'd rather push back on even relatively small overheads like this; so
> being able to turn it off in the config would be good.

OK, thanks for the feedback. In that case, I think it's probably
better to wait until we have CFI ready for upstreaming and use the
same config for this one.

> Can you share more details about what the "future CFI checking" is
> likely to look like?

Sure, I posted an overview of CFI and what we're doing in Pixel devices here:

https://android-developers.googleblog.com/2018/10/control-flow-integrity-in-android-kernel.html

Sami
Sami Tolvanen <samitolvanen@google.com> writes:

> On Thu, Sep 12, 2019 at 3:52 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> I think it would be good if you do both. I'm a bit worried that XDP
>> performance will end up in a "death by a thousand paper cuts" situation,
>> so I'd rather push back on even relatively small overheads like this; so
>> being able to turn it off in the config would be good.
>
> OK, thanks for the feedback. In that case, I think it's probably
> better to wait until we have CFI ready for upstreaming and use the
> same config for this one.

SGTM, thanks!

>> Can you share more details about what the "future CFI checking" is
>> likely to look like?
>
> Sure, I posted an overview of CFI and what we're doing in Pixel devices here:
>
> https://android-developers.googleblog.com/2018/10/control-flow-integrity-in-android-kernel.html

Great, thank you.

-Toke
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 92c6e31fb008..abfb0e1b21a8 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -511,7 +511,10 @@ struct sock_fprog_kern {
     struct sock_filter *filter;
 };
 
+#define BPF_BINARY_HEADER_MAGIC 0x05de0e82
+
 struct bpf_binary_header {
+    u32 magic;
     u32 pages;
     /* Some arches need word alignment for their instructions */
     u8 image[] __aligned(4);
@@ -553,20 +556,39 @@ struct sk_filter {
 
 DECLARE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
 
+#ifdef CONFIG_BPF_JIT
+/*
+ * With JIT, the kernel makes an indirect call to dynamically generated
+ * code. Use bpf_call_func to perform additional validation of the call
+ * target to narrow down attack surface. Architectures implementing BPF
+ * JIT can override arch_bpf_jit_check_func for arch-specific checking.
+ */
+extern unsigned int bpf_call_func(const struct bpf_prog *prog,
+                                  const void *ctx);
+
+extern bool arch_bpf_jit_check_func(const struct bpf_prog *prog);
+#else
+static inline unsigned int bpf_call_func(const struct bpf_prog *prog,
+                                         const void *ctx)
+{
+    return prog->bpf_func(ctx, prog->insnsi);
+}
+#endif
+
 #define BPF_PROG_RUN(prog, ctx) ({                              \
     u32 ret;                                                    \
     cant_sleep();                                               \
     if (static_branch_unlikely(&bpf_stats_enabled_key)) {       \
         struct bpf_prog_stats *stats;                           \
         u64 start = sched_clock();                              \
-        ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi);         \
+        ret = bpf_call_func(prog, ctx);                         \
         stats = this_cpu_ptr(prog->aux->stats);                 \
         u64_stats_update_begin(&stats->syncp);                  \
         stats->cnt++;                                           \
         stats->nsecs += sched_clock() - start;                  \
         u64_stats_update_end(&stats->syncp);                    \
     } else {                                                    \
-        ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi);         \
+        ret = bpf_call_func(prog, ctx);                         \
     }                                                           \
     ret; })
 
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 66088a9e9b9e..7aad58f67105 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -792,6 +792,30 @@ void __weak bpf_jit_free_exec(void *addr)
     module_memfree(addr);
 }
 
+#ifdef CONFIG_BPF_JIT
+bool __weak arch_bpf_jit_check_func(const struct bpf_prog *prog)
+{
+    return true;
+}
+
+unsigned int bpf_call_func(const struct bpf_prog *prog, const void *ctx)
+{
+    const struct bpf_binary_header *hdr = bpf_jit_binary_hdr(prog);
+
+    if (!IS_ENABLED(CONFIG_BPF_JIT_ALWAYS_ON) && !prog->jited)
+        return prog->bpf_func(ctx, prog->insnsi);
+
+    if (unlikely(hdr->magic != BPF_BINARY_HEADER_MAGIC ||
+                 !arch_bpf_jit_check_func(prog))) {
+        WARN(1, "attempt to jump to an invalid address");
+        return 0;
+    }
+
+    return prog->bpf_func(ctx, prog->insnsi);
+}
+EXPORT_SYMBOL_GPL(bpf_call_func);
+#endif
+
 struct bpf_binary_header *
 bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
                      unsigned int alignment,
@@ -818,6 +842,7 @@ bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
     /* Fill space with illegal/arch-dep instructions. */
     bpf_fill_ill_insns(hdr, size);
 
+    hdr->magic = BPF_BINARY_HEADER_MAGIC;
     hdr->pages = pages;
     hole = min_t(unsigned int, size - (proglen + sizeof(*hdr)),
                  PAGE_SIZE - sizeof(*hdr));
With CONFIG_BPF_JIT, the kernel makes indirect calls to dynamically
generated code. This change adds basic sanity checking to ensure
we are jumping to a valid location, which narrows down the attack
surface on the stored pointer. This also prepares the code for future
Control-Flow Integrity (CFI) checking, which adds indirect call
validation to call targets that can be determined at compile-time, but
cannot validate calls to jited functions.

In addition, this change adds a weak arch_bpf_jit_check_func function,
which architectures that implement BPF JIT can override to perform
additional validation, such as verifying that the pointer points to
the correct memory region.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
---
 include/linux/filter.h | 26 ++++++++++++++++++++++++--
 kernel/bpf/core.c      | 25 +++++++++++++++++++++++++
 2 files changed, 49 insertions(+), 2 deletions(-)