Message ID | 1559202287-15553-1-git-send-email-jiong.wang@netronome.com |
---|---|
State | Changes Requested |
Delegated to: | BPF Maintainers |
Series | [bpf-next] bpf: doc: update answer for 32-bit subregister question |
On Thu, May 30, 2019 at 12:46 AM Jiong Wang <jiong.wang@netronome.com> wrote:
>
> There has been quite a few progress around the two steps mentioned in the
> answer to the following question:
>
> Q: BPF 32-bit subregister requirements
>
> This patch updates the answer to reflect what has been done.
>
> v1:
> - Integrated rephrase from Quentin and Jakub.
>
> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
> ---
>  Documentation/bpf/bpf_design_QA.rst | 30 +++++++++++++++++++++++++-----
>  1 file changed, 25 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/bpf/bpf_design_QA.rst b/Documentation/bpf/bpf_design_QA.rst
> index cb402c5..5092a2a 100644
> --- a/Documentation/bpf/bpf_design_QA.rst
> +++ b/Documentation/bpf/bpf_design_QA.rst
> @@ -172,11 +172,31 @@ registers which makes BPF inefficient virtual machine for 32-bit
>  CPU architectures and 32-bit HW accelerators. Can true 32-bit registers
>  be added to BPF in the future?
>
> -A: NO. The first thing to improve performance on 32-bit archs is to teach
> -LLVM to generate code that uses 32-bit subregisters. Then second step
> -is to teach verifier to mark operations where zero-ing upper bits
> -is unnecessary. Then JITs can take advantage of those markings and
> -drastically reduce size of generated code and improve performance.
> +A: NO

Add period "."?

> +
> +But some optimizations on zero-ing the upper 32 bits for BPF registers are
> +available, and can be leveraged to improve the performance of JIT compilers
> +for 32-bit architectures.

I guess it should be "improve the performance of JITed BPF programs for 32-bit
architectures"?

Thanks,
Song

> +
> +Starting with version 7, LLVM is able to generate instructions that operate
> +on 32-bit subregisters, provided the option -mattr=+alu32 is passed for
> +compiling a program. Furthermore, the verifier can now mark the
> +instructions for which zero-ing the upper bits of the destination register
> +is required, and insert an explicit zero-extension (zext) instruction
> +(a mov32 variant). This means that for architectures without zext hardware
> +support, the JIT back-ends do not need to clear the upper bits for
> +subregisters written by alu32 instructions or narrow loads. Instead, the
> +back-ends simply need to support code generation for that mov32 variant,
> +and to overwrite bpf_jit_needs_zext() to make it return "true" (in order to
> +enable zext insertion in the verifier).
> +
> +Note that it is possible for a JIT back-end to have partial hardware
> +support for zext. In that case, if verifier zext insertion is enabled,
> +it could lead to the insertion of unnecessary zext instructions. Such
> +instructions could be removed by creating a simple peephole inside the JIT
> +back-end: if one instruction has hardware support for zext and if the next
> +instruction is an explicit zext, then the latter can be skipped when doing
> +the code generation.
>
>  Q: Does BPF have a stable ABI?
>  ------------------------------
> --
> 2.7.4
>
Song Liu writes:

> On Thu, May 30, 2019 at 12:46 AM Jiong Wang <jiong.wang@netronome.com> wrote:
>>
>> There has been quite a few progress around the two steps mentioned in the
>> answer to the following question:
>>
>> Q: BPF 32-bit subregister requirements
>>
>> This patch updates the answer to reflect what has been done.
>>
>> v1:
>> - Integrated rephrase from Quentin and Jakub.
>>
>> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
>> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
>> ---
>>  Documentation/bpf/bpf_design_QA.rst | 30 +++++++++++++++++++++++++-----
>>  1 file changed, 25 insertions(+), 5 deletions(-)
>>
>> diff --git a/Documentation/bpf/bpf_design_QA.rst b/Documentation/bpf/bpf_design_QA.rst
>> index cb402c5..5092a2a 100644
>> --- a/Documentation/bpf/bpf_design_QA.rst
>> +++ b/Documentation/bpf/bpf_design_QA.rst
>> @@ -172,11 +172,31 @@ registers which makes BPF inefficient virtual machine for 32-bit
>>  CPU architectures and 32-bit HW accelerators. Can true 32-bit registers
>>  be added to BPF in the future?
>>
>> -A: NO. The first thing to improve performance on 32-bit archs is to teach
>> -LLVM to generate code that uses 32-bit subregisters. Then second step
>> -is to teach verifier to mark operations where zero-ing upper bits
>> -is unnecessary. Then JITs can take advantage of those markings and
>> -drastically reduce size of generated code and improve performance.
>> +A: NO
>
> Add period "."?

Ack

>
>> +
>> +But some optimizations on zero-ing the upper 32 bits for BPF registers are
>> +available, and can be leveraged to improve the performance of JIT compilers
>> +for 32-bit architectures.
>
> I guess it should be "improve the performance of JITed BPF programs for 32-bit
> architectures"?

Ack, that is more accurate.

Will respin. Thanks.

Regards,
Jiong

>
> Thanks,
> Song
>
>> +
>> +Starting with version 7, LLVM is able to generate instructions that operate
>> +on 32-bit subregisters, provided the option -mattr=+alu32 is passed for
>> +compiling a program. Furthermore, the verifier can now mark the
>> +instructions for which zero-ing the upper bits of the destination register
>> +is required, and insert an explicit zero-extension (zext) instruction
>> +(a mov32 variant). This means that for architectures without zext hardware
>> +support, the JIT back-ends do not need to clear the upper bits for
>> +subregisters written by alu32 instructions or narrow loads. Instead, the
>> +back-ends simply need to support code generation for that mov32 variant,
>> +and to overwrite bpf_jit_needs_zext() to make it return "true" (in order to
>> +enable zext insertion in the verifier).
>> +
>> +Note that it is possible for a JIT back-end to have partial hardware
>> +support for zext. In that case, if verifier zext insertion is enabled,
>> +it could lead to the insertion of unnecessary zext instructions. Such
>> +instructions could be removed by creating a simple peephole inside the JIT
>> +back-end: if one instruction has hardware support for zext and if the next
>> +instruction is an explicit zext, then the latter can be skipped when doing
>> +the code generation.
>>
>> Q: Does BPF have a stable ABI?
>> ------------------------------
>> --
>> 2.7.4
>>
diff --git a/Documentation/bpf/bpf_design_QA.rst b/Documentation/bpf/bpf_design_QA.rst
index cb402c5..5092a2a 100644
--- a/Documentation/bpf/bpf_design_QA.rst
+++ b/Documentation/bpf/bpf_design_QA.rst
@@ -172,11 +172,31 @@ registers which makes BPF inefficient virtual machine for 32-bit
 CPU architectures and 32-bit HW accelerators. Can true 32-bit registers
 be added to BPF in the future?
 
-A: NO. The first thing to improve performance on 32-bit archs is to teach
-LLVM to generate code that uses 32-bit subregisters. Then second step
-is to teach verifier to mark operations where zero-ing upper bits
-is unnecessary. Then JITs can take advantage of those markings and
-drastically reduce size of generated code and improve performance.
+A: NO
+
+But some optimizations on zero-ing the upper 32 bits for BPF registers are
+available, and can be leveraged to improve the performance of JIT compilers
+for 32-bit architectures.
+
+Starting with version 7, LLVM is able to generate instructions that operate
+on 32-bit subregisters, provided the option -mattr=+alu32 is passed for
+compiling a program. Furthermore, the verifier can now mark the
+instructions for which zero-ing the upper bits of the destination register
+is required, and insert an explicit zero-extension (zext) instruction
+(a mov32 variant). This means that for architectures without zext hardware
+support, the JIT back-ends do not need to clear the upper bits for
+subregisters written by alu32 instructions or narrow loads. Instead, the
+back-ends simply need to support code generation for that mov32 variant,
+and to overwrite bpf_jit_needs_zext() to make it return "true" (in order to
+enable zext insertion in the verifier).
+
+Note that it is possible for a JIT back-end to have partial hardware
+support for zext. In that case, if verifier zext insertion is enabled,
+it could lead to the insertion of unnecessary zext instructions. Such
+instructions could be removed by creating a simple peephole inside the JIT
+back-end: if one instruction has hardware support for zext and if the next
+instruction is an explicit zext, then the latter can be skipped when doing
+the code generation.
 
 Q: Does BPF have a stable ABI?
 ------------------------------
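
For illustration only, the peephole described in the last paragraph of the
patch could look roughly like the sketch below. This is not kernel code: the
toy_* types and functions are hypothetical stand-ins for a JIT back-end's
instruction walk, and the choice that only narrow loads zero-extend in
hardware is an assumption made up for the example. The destination register
comparison is an extra sanity check that the patch text does not mention.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified instruction model (not struct bpf_insn). */
enum toy_op {
	TOY_ALU32_ADD,	/* 32-bit ALU op writing a subregister */
	TOY_LDX_W,	/* narrow (32-bit) load */
	TOY_ZEXT,	/* explicit zext inserted by the verifier */
	TOY_ALU64_ADD,	/* full 64-bit ALU op */
};

struct toy_insn {
	enum toy_op op;
	int dst_reg;
};

/*
 * Assumption for this sketch: the imaginary target clears the upper
 * 32 bits in hardware for narrow loads only (partial zext support).
 */
bool toy_hw_zero_extends(const struct toy_insn *insn)
{
	return insn->op == TOY_LDX_W;
}

/*
 * Walk the program as a JIT would and count how many explicit zext
 * instructions still need native code. A zext is skipped when the
 * previous instruction already zero-extended the same register.
 */
size_t toy_count_emitted_zext(const struct toy_insn *insns, size_t n)
{
	size_t emitted = 0;
	size_t i;

	assert(insns != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (insns[i].op != TOY_ZEXT)
			continue;
		/* Peephole: the hardware has already done the work. */
		if (i > 0 && toy_hw_zero_extends(&insns[i - 1]) &&
		    insns[i - 1].dst_reg == insns[i].dst_reg)
			continue;
		emitted++;
	}
	return emitted;
}
```

With a program made of a narrow load followed by a zext, then an alu32 add
followed by a zext, only the second zext would be emitted under these
assumptions, since the add has no hardware zero-extension in this toy model.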