Message ID | 20201022082138.2322434-13-jolsa@kernel.org |
---|---|
State | Not Applicable |
Delegated to: | BPF Maintainers |
Headers | show |
Series | bpf: Speed up trampoline attach | expand |
Context | Check | Description |
---|---|---|
jkicinski/cover_letter | success | Link |
jkicinski/fixes_present | success | Link |
jkicinski/patch_count | fail | Series longer than 15 patches |
jkicinski/tree_selection | success | Clearly marked for bpf-next |
jkicinski/subject_prefix | success | Link |
jkicinski/source_inline | success | Was 0 now: 0 |
jkicinski/verify_signedoff | success | Link |
jkicinski/module_param | success | Was 0 now: 0 |
jkicinski/build_32bit | success | Errors and warnings before: 1 this patch: 1 |
jkicinski/kdoc | success | Errors and warnings before: 0 this patch: 0 |
jkicinski/verify_fixes | success | Link |
jkicinski/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 24 lines checked |
jkicinski/build_allmodconfig_warn | success | Errors and warnings before: 1 this patch: 1 |
jkicinski/header_inline | success | Link |
jkicinski/stable | success | Stable not CCed |
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 19fb608546c0..b315803c34d3 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -31,6 +31,7 @@ #include <linux/poll.h> #include <linux/bpf-netns.h> #include <linux/rcupdate_trace.h> +#include <linux/rcupdate_wait.h> #define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \ (map)->map_type == BPF_MAP_TYPE_CGROUP_ARRAY || \ @@ -2920,6 +2921,8 @@ static int bpf_trampoline_batch(const union bpf_attr *attr, int cmd) if (!batch) goto out_clean; + synchronize_rcu_mult(call_rcu_tasks, call_rcu_tasks_trace); + for (i = 0; i < count; i++) { if (cmd == BPF_TRAMPOLINE_BATCH_ATTACH) { prog = bpf_prog_get(in[i]); diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c index cdad87461e5d..0d5e4c5860a9 100644 --- a/kernel/bpf/trampoline.c +++ b/kernel/bpf/trampoline.c @@ -271,7 +271,8 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr, * programs finish executing. * Wait for these two grace periods together. */ - synchronize_rcu_mult(call_rcu_tasks, call_rcu_tasks_trace); + if (!batch) + synchronize_rcu_mult(call_rcu_tasks, call_rcu_tasks_trace); err = arch_prepare_bpf_trampoline(new_image, new_image + PAGE_SIZE / 2, &tr->func.model, flags, tprogs,
I noticed some of the profiled workloads did not spend more cycles, but took more time to finish than current code. I tracked it to rcu synchronize_rcu_mult call in bpf_trampoline_update and when I called it just once for batch mode it got faster. The current processing when attaching the program is: for each program: bpf(BPF_RAW_TRACEPOINT_OPEN bpf_tracing_prog_attach bpf_trampoline_link_prog bpf_trampoline_update synchronize_rcu_mult register_ftrace_direct With the change the synchronize_rcu_mult is called just once: bpf(BPF_TRAMPOLINE_BATCH_ATTACH for each program: bpf_tracing_prog_attach bpf_trampoline_link_prog bpf_trampoline_update synchronize_rcu_mult register_ftrace_direct_ips I'm not sure this does not break stuff, because I don't follow rcu code that much ;-) However stats are nicer now: Before: Performance counter stats for './test_progs -t attach_test' (5 runs): 37,410,887 cycles:k ( +- 0.98% ) 70,062,158 cycles:u ( +- 0.39% ) 26.80 +- 4.10 seconds time elapsed ( +- 15.31% ) After: Performance counter stats for './test_progs -t attach_test' (5 runs): 36,812,432 cycles:k ( +- 2.52% ) 69,907,191 cycles:u ( +- 0.38% ) 15.04 +- 2.94 seconds time elapsed ( +- 19.54% ) Signed-off-by: Jiri Olsa <jolsa@kernel.org> --- kernel/bpf/syscall.c | 3 +++ kernel/bpf/trampoline.c | 3 ++- 2 files changed, 5 insertions(+), 1 deletion(-)