Message ID | 1432334575-16959-1-git-send-email-ast@plumgrid.com |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
On 05/23/2015 12:42 AM, Alexei Starovoitov wrote:
> x86 has variable length encoding. x86 JIT compiler is trying
> to pick the shortest encoding for given bpf instruction.
> While doing so the jump targets are changing, so JIT is doing
> multiple passes over the program. Typical program needs 3 passes.
> Some very short programs converge with 2 passes. Large programs
> may need 4 or 5. But specially crafted bpf programs may hit the
> pass limit and if the program converges on the last iteration
> the JIT compiler will be producing an image full of 'int 3' insns.
> Fix this corner case by doing final iteration over bpf program.
>
> Fixes: 0a14842f5a3c ("net: filter: Just In Time compiler for x86-64")
> Reported-by: Daniel Borkmann <daniel@iogearbox.net>
> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>

LGTM, thanks!

Tested-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Alexei Starovoitov <ast@plumgrid.com>
Date: Fri, 22 May 2015 15:42:55 -0700

> x86 has variable length encoding. x86 JIT compiler is trying
> to pick the shortest encoding for given bpf instruction.
> While doing so the jump targets are changing, so JIT is doing
> multiple passes over the program. Typical program needs 3 passes.
> Some very short programs converge with 2 passes. Large programs
> may need 4 or 5. But specially crafted bpf programs may hit the
> pass limit and if the program converges on the last iteration
> the JIT compiler will be producing an image full of 'int 3' insns.
> Fix this corner case by doing final iteration over bpf program.
>
> Fixes: 0a14842f5a3c ("net: filter: Just In Time compiler for x86-64")
> Reported-by: Daniel Borkmann <daniel@iogearbox.net>
> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>

Applied and queued up for -stable, thanks.
From: Alexei Starovoitov
> Sent: 22 May 2015 23:43
> x86 has variable length encoding. x86 JIT compiler is trying
> to pick the shortest encoding for given bpf instruction.
> While doing so the jump targets are changing, so JIT is doing
> multiple passes over the program. Typical program needs 3 passes.
> Some very short programs converge with 2 passes. Large programs
> may need 4 or 5. But specially crafted bpf programs may hit the
> pass limit and if the program converges on the last iteration
> the JIT compiler will be producing an image full of 'int 3' insns.
> Fix this corner case by doing final iteration over bpf program.

If the JIT compiler is only changing the encoding of the constants
in the x86 instructions (rather than changing the instructions themselves)
then there is likely to be an unmeasurable change in the execution time.
For instance I don't remember there being a difference in execution time
between long and short branches - the only difference is the amount of
cache they use.

David
On Tue, 2015-05-26 at 13:40 +0000, David Laight wrote:
> If the JIT compiler is only changing the encoding of the constants
> in the x86 instructions (rather than changing the instructions themselves)
> then there is likely to be an unmeasurable change in the execution time.
> For instance I don't remember there being a difference in execution time
> between long and short branches - the only difference is the amount of
> cache they use.

icache is precisely the matter here.
In the end, it makes a difference.

You could check this interesting study Ingo did recently:
https://lkml.org/lkml/2015/5/19/1009
On Tue, 2015-05-26 at 15:13 +0000, David Laight wrote:
> Yes, interesting, a benchmark that manages to run a lot of code 'cold cache'.
We have binaries here at Google with 400 or 500 MBytes of text.
Not benchmark, super real workloads you know.
From: Eric Dumazet
> Sent: 26 May 2015 16:30
>
> > Yes, interesting, a benchmark that manages to run a lot of code 'cold cache'.
>
> We have binaries here at Google with 400 or 500 MBytes of text.
> Not benchmark, super real workloads you know.

Indeed, and a lot of the code is likely to be running 'cold cache'.

I was alluding to the problem where people will benchmark a small function
by running it 1000s of times in a tight loop with exactly the same data.
Not only is it 'hot cache' but any dynamic branch prediction is 'trained'
to the specific data.

David
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 99f76103c6b7..ddeff4844a10 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -966,7 +966,12 @@ void bpf_int_jit_compile(struct bpf_prog *prog)
 	}
 	ctx.cleanup_addr = proglen;
 
-	for (pass = 0; pass < 10; pass++) {
+	/* JITed image shrinks with every pass and the loop iterates
+	 * until the image stops shrinking. Very large bpf programs
+	 * may converge on the last pass. In such case do one more
+	 * pass to emit the final image
+	 */
+	for (pass = 0; pass < 10 || image; pass++) {
 		proglen = do_jit(prog, addrs, image, oldproglen, &ctx);
 		if (proglen <= 0) {
 			image = NULL;
x86 has variable length encoding. x86 JIT compiler is trying
to pick the shortest encoding for given bpf instruction.
While doing so the jump targets are changing, so JIT is doing
multiple passes over the program. Typical program needs 3 passes.
Some very short programs converge with 2 passes. Large programs
may need 4 or 5. But specially crafted bpf programs may hit the
pass limit and if the program converges on the last iteration
the JIT compiler will be producing an image full of 'int 3' insns.
Fix this corner case by doing final iteration over bpf program.

Fixes: 0a14842f5a3c ("net: filter: Just In Time compiler for x86-64")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
Daniel wrote the 'Edge hopping nuthouse' test case with 4k jump instructions
that managed to trigger this bug. The test case is nuts and the bug is real.
It's an old bug, but I think worth backporting all the way.
Though this fix will apply cleanly only till commit:
f3c2af7ba17a ("net: filter: x86: split bpf_jit_compile()")
The older kernels should be similar. They have
'for (pass = 0; pass < 10; pass++) {' at the line 153 or so.
and all have similar problem as far as I can see.

 arch/x86/net/bpf_jit_comp.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)
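
For readers who want to see the corner case in isolation, here is a rough userspace sketch of the pass loop described above. It is not the kernel code: the toy program (forward jumps only), the names toy_jit(), do_pass(), skip_of() and struct jit_result, and the max_passes/with_fix parameters are all made up for illustration. The only pieces taken from the thread are the shape of the loop, the 'int 3' filler, and the '|| image' condition. The jump sizes assume the usual x86 encodings (2 bytes for jmp rel8, 5 bytes for jmp rel32).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NINSN 64        /* toy program length (hypothetical) */
#define INT3  0xCC      /* x86 'int 3' opcode, used as filler */

struct jit_result {
	uint8_t *image;      /* malloc'ed buffer (caller frees), or NULL */
	int len;             /* image length in bytes */
	int converged_pass;  /* pass on which the length stopped shrinking, or -1 */
};

/* Toy program: insn i is a forward jump over the next skip_of(i) insns. */
static int skip_of(int i)
{
	int s = 26;
	return (i + s < NINSN) ? s : NINSN - 1 - i;
}

/* One pass. addrs[i] holds the end offset of insn i as estimated by the
 * previous pass and is updated in place. Bytes are written only when
 * image is non-NULL. Returns the total program length. */
static int do_pass(int *addrs, uint8_t *image)
{
	int off = 0;

	for (int i = 0; i < NINSN; i++) {
		int t = i + skip_of(i);
		assert(t < NINSN);
		int disp = addrs[t] - addrs[i];   /* both still old estimates */
		int len = (disp <= 127) ? 2 : 5;  /* jmp rel8 vs jmp rel32 */

		if (image) {
			uint32_t d = (uint32_t)disp;
			image[off] = (len == 2) ? 0xEB : 0xE9;
			for (int b = 1; b < len; b++)   /* little-endian displacement */
				image[off + b] = (uint8_t)(d >> (8 * (b - 1)));
		}
		off += len;
		addrs[i] = off;
	}
	return off;
}

/* Pass loop modelled on the one in the patch. with_fix selects between
 * 'pass < max_passes' and 'pass < max_passes || image'. */
static struct jit_result toy_jit(int max_passes, int with_fix)
{
	struct jit_result r = { NULL, 0, -1 };
	int addrs[NINSN];
	int proglen = 0, oldproglen = 0;

	for (int i = 0; i < NINSN; i++) {   /* pessimistic start: all rel32 */
		proglen += 5;
		addrs[i] = proglen;
	}

	for (int pass = 0; pass < max_passes || (with_fix && r.image); pass++) {
		proglen = do_pass(addrs, r.image);
		if (r.image)
			break;                      /* this pass emitted the real bytes */
		if (proglen == oldproglen) {    /* converged: allocate, emit next pass */
			r.image = malloc((size_t)proglen);
			if (!r.image)
				break;
			memset(r.image, INT3, (size_t)proglen);
			r.len = proglen;
			r.converged_pass = pass;
		}
		oldproglen = proglen;
	}
	return r;
}
```

Calling toy_jit() with a generous pass limit reports in converged_pass the pass on which the length stopped shrinking. Calling it again with max_passes set to converged_pass + 1 and with_fix = 0 should leave the returned image holding nothing but 0xCC bytes, which is the failure described in the commit message. With with_fix = 1 the loop gets the one extra pass it needs and the image holds the jump encodings. The toy sticks to forward jumps so that instruction sizes can only shrink from pass to pass, which keeps the "length unchanged means converged" test valid here.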