[v6,0/5] powerpc/bpf: use BPF prog pack allocator

Message ID 20231012200310.235137-1-hbathini@linux.ibm.com (mailing list archive)

Message

Hari Bathini Oct. 12, 2023, 8:03 p.m. UTC
Most BPF programs are small, but they consume a page each. For systems
with busy traffic and many BPF programs, this may also add significant
pressure on the instruction TLB. High iTLB pressure usually slows down
the whole system, causing visible performance degradation for production
workloads.

bpf_prog_pack, a customized allocator that packs multiple bpf programs
into preallocated memory chunks, was proposed [1] to address it. This
series extends this support to powerpc.
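
To illustrate the packing idea described above, here is a minimal
user-space sketch, not the kernel's bpf_prog_pack code. It assumes a
single fixed-size chunk split into fixed-size slots; the names
prog_pack, pack_alloc() and pack_free(), and the sizes PACK_SIZE and
SLOT_SIZE, are hypothetical and chosen only for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes for illustration; not the kernel's values. */
#define PACK_SIZE 4096u                    /* one preallocated chunk */
#define SLOT_SIZE 64u                      /* allocation granularity */
#define NUM_SLOTS (PACK_SIZE / SLOT_SIZE)

struct prog_pack {
	uint8_t mem[PACK_SIZE];            /* stands in for a shared text page */
	uint8_t used[NUM_SLOTS];           /* 1 if the slot is taken */
};

/* Number of slots needed to hold `size` bytes, rounded up. */
static size_t slots_for(size_t size)
{
	return (size + SLOT_SIZE - 1) / SLOT_SIZE;
}

/* First-fit: claim a run of free slots big enough for `size` bytes. */
static void *pack_alloc(struct prog_pack *pack, size_t size)
{
	size_t need, run = 0;

	if (size == 0 || size > PACK_SIZE)
		return NULL;
	need = slots_for(size);

	for (size_t i = 0; i < NUM_SLOTS; i++) {
		run = pack->used[i] ? 0 : run + 1;
		if (run == need) {
			size_t first = i + 1 - need;

			for (size_t j = first; j <= i; j++)
				pack->used[j] = 1;
			return &pack->mem[first * SLOT_SIZE];
		}
	}
	return NULL;                       /* chunk is full or too fragmented */
}

/* Release the slots backing an earlier pack_alloc(pack, size). */
static void pack_free(struct prog_pack *pack, void *ptr, size_t size)
{
	size_t first = (size_t)((uint8_t *)ptr - pack->mem) / SLOT_SIZE;
	size_t need = slots_for(size);

	for (size_t j = first; j < first + need; j++)
		pack->used[j] = 0;
}
```

With this layout, many small programs share one chunk instead of each
taking a page of its own, which is the property that reduces iTLB
pressure.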

Both the bpf_arch_text_copy() and bpf_arch_text_invalidate() functions,
needed for this support, depend on instruction patching in the text area.
Currently, patch_instruction() supports patching only one instruction
at a time. The first patch introduces the patch_instructions() function
to enable patching more than one instruction at a time. This helps in
avoiding performance degradation while JITing bpf programs.
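
As a rough user-space sketch of why batching helps, and not the code
from this series: the example below assumes each patching step pays a
fixed setup cost (modelled by the counter map_count), so patching a
chunk of instructions per step needs far fewer steps than patching one
instruction per step. The helpers patch_chunk(), patch_many(),
text_copy() and text_invalidate(), and the chunk size CHUNK_BYTES, are
hypothetical names used only here. The fill value 0x7fe00008 is the
powerpc trap encoding, used as an illustrative invalid-text filler.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_BYTES 4096u          /* hypothetical: at most one page per step */

static unsigned long map_count;    /* counts simulated per-step setup costs */

/* Stand-in for patching one chunk through a temporary writable mapping. */
static void patch_chunk(uint32_t *dst, const uint32_t *src, size_t n, bool repeat)
{
	map_count++;
	if (repeat) {
		for (size_t i = 0; i < n; i++)
			dst[i] = *src;     /* fill with one repeated instruction */
	} else {
		memcpy(dst, src, n * sizeof(*dst));
	}
}

/* Patch `len` bytes of instructions, one chunk per step. */
static int patch_many(uint32_t *dst, const uint32_t *src, size_t len, bool repeat)
{
	if (len % sizeof(uint32_t))
		return -1;                 /* must be a whole number of instructions */

	while (len) {
		size_t step = len < CHUNK_BYTES ? len : CHUNK_BYTES;
		size_t n = step / sizeof(uint32_t);

		patch_chunk(dst, src, n, repeat);
		dst += n;
		if (!repeat)
			src += n;
		len -= step;
	}
	return 0;
}

/* Copy a JITed image into the text area; returns dst, or NULL on error. */
static void *text_copy(void *dst, const void *src, size_t len)
{
	return patch_many(dst, src, len, false) ? NULL : dst;
}

/* Overwrite a freed region with trap instructions. */
static int text_invalidate(void *dst, size_t len)
{
	const uint32_t trap = 0x7fe00008u;

	return patch_many(dst, &trap, len, true);
}
```

In this model, copying a 6000-byte image takes two steps instead of
1500 single-instruction patches.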

Patches 2 & 3 implement the above mentioned arch specific functions
using patch_instructions(). Patch 4 fixes a misnomer in bpf JITing
code. The last patch enables the use of the BPF prog pack allocator on
powerpc and also ensures that cleanup is handled gracefully.

[1] https://lore.kernel.org/bpf/20220204185742.271030-1-song@kernel.org/

Changes in v6:
* No changes in patches 2-5/5 except addition of Acked-by tags from Song.
* Skipped merging the code paths of patch_instruction() & patch_instructions()
  to avoid the performance overhead observed on ppc32 with that change.

Changes in v5:
* Moved introduction of patch_instructions() as 1st patch in series.
* Improved patch_instructions() to use memset & memcpy.
* Fixed the misnomer in JITing code as a separate patch.
* Removed unused bpf_flush_icache() function.

Changes in v4:
* Updated bpf_patch_instructions() definition in patch 1/5 so that
  it doesn't have to be updated again in patch 2/5.
* Addressed Christophe's comment on bpf_arch_text_invalidate() return
  value in patch 2/5.

Changes in v3:
* Fixed segfault issue observed on ppc32 due to inaccurate offset
  calculation for branching.
* Tried to minimize the performance impact for patch_instruction()
  with the introduction of patch_instructions().
* Corrected uses of u32* vs ppc_instr_t.
* Moved the change that introduces patch_instructions() to after
  enabling bpf_prog_pack support.
* Added a few comments to improve code readability.

Changes in v2:
* Introduced patch_instructions() to help with patching bpf programs.


Hari Bathini (5):
  powerpc/code-patching: introduce patch_instructions()
  powerpc/bpf: implement bpf_arch_text_copy
  powerpc/bpf: implement bpf_arch_text_invalidate for bpf_prog_pack
  powerpc/bpf: rename powerpc64_jit_data to powerpc_jit_data
  powerpc/bpf: use bpf_jit_binary_pack_[alloc|finalize|free]

 arch/powerpc/include/asm/code-patching.h |   1 +
 arch/powerpc/lib/code-patching.c         | 138 +++++++++++++++++++++
 arch/powerpc/net/bpf_jit.h               |  18 +--
 arch/powerpc/net/bpf_jit_comp.c          | 145 ++++++++++++++++++-----
 arch/powerpc/net/bpf_jit_comp32.c        |  13 +-
 arch/powerpc/net/bpf_jit_comp64.c        |  10 +-
 6 files changed, 271 insertions(+), 54 deletions(-)

Comments

Daniel Borkmann Oct. 16, 2023, 12:07 p.m. UTC | #1
On 10/12/23 10:03 PM, Hari Bathini wrote:
> Most BPF programs are small, but they consume a page each. For systems
> with busy traffic and many BPF programs, this may also add significant
> pressure on instruction TLB. High iTLB pressure usually slows down the
> whole system causing visible performance degradation for production
> workloads.
> 
> bpf_prog_pack, a customized allocator that packs multiple bpf programs
> into preallocated memory chunks, was proposed [1] to address it. This
> series extends this support on powerpc.
> 
> Both bpf_arch_text_copy() & bpf_arch_text_invalidate() functions,
> needed for this support depend on instruction patching in text area.
> Currently, patch_instruction() supports patching only one instruction
> at a time. The first patch introduces patch_instructions() function
> to enable patching more than one instruction at a time. This helps in
> avoiding performance degradation while JITing bpf programs.
> 
> Patches 2 & 3 implement the above mentioned arch specific functions
> using patch_instructions(). Patch 4 fixes a misnomer in bpf JITing
> code. The last patch enables the use of BPF prog pack allocator on
> powerpc and also, ensures cleanup is handled gracefully.
> 
> [1] https://lore.kernel.org/bpf/20220204185742.271030-1-song@kernel.org/
> 
> Changes in v6:
> * No changes in patches 2-5/5 except addition of Acked-by tags from Song.
> * Skipped merging code path of patch_instruction() & patch_instructions()
>    to avoid performance overhead observed on ppc32 with that.

I presume this will be routed via Michael?

Thanks,
Daniel

Hari Bathini Oct. 17, 2023, 6:26 a.m. UTC | #2
On 16/10/23 5:37 pm, Daniel Borkmann wrote:
> On 10/12/23 10:03 PM, Hari Bathini wrote:
>> Most BPF programs are small, but they consume a page each. For systems
>> with busy traffic and many BPF programs, this may also add significant
>> pressure on instruction TLB. High iTLB pressure usually slows down the
>> whole system causing visible performance degradation for production
>> workloads.
>>
>> bpf_prog_pack, a customized allocator that packs multiple bpf programs
>> into preallocated memory chunks, was proposed [1] to address it. This
>> series extends this support on powerpc.
>>
>> Both bpf_arch_text_copy() & bpf_arch_text_invalidate() functions,
>> needed for this support depend on instruction patching in text area.
>> Currently, patch_instruction() supports patching only one instruction
>> at a time. The first patch introduces patch_instructions() function
>> to enable patching more than one instruction at a time. This helps in
>> avoiding performance degradation while JITing bpf programs.
>>
>> Patches 2 & 3 implement the above mentioned arch specific functions
>> using patch_instructions(). Patch 4 fixes a misnomer in bpf JITing
>> code. The last patch enables the use of BPF prog pack allocator on
>> powerpc and also, ensures cleanup is handled gracefully.
>>
>> [1] https://lore.kernel.org/bpf/20220204185742.271030-1-song@kernel.org/
>>
>> Changes in v6:
>> * No changes in patches 2-5/5 except addition of Acked-by tags from Song.
>> * Skipped merging code path of patch_instruction() & patch_instructions()
>>    to avoid performance overhead observed on ppc32 with that.
> 
> I presume this will be routed via Michael?

Yes, Daniel. This can go via linuxppc tree.

Thanks
Hari