Message ID | 20191113170005.48813-1-iii@linux.ibm.com |
---|---|
State | Changes Requested |
Delegated to: | BPF Maintainers |
Headers | show |
Series | [bpf-next] bpf: make bpf_jit_binary_alloc support alignment > 4 | expand |
On Wed, Nov 13, 2019 at 9:20 AM Ilya Leoshkevich <iii@linux.ibm.com> wrote: > > Currently passing alignment greater than 4 to bpf_jit_binary_alloc does > not work: in such cases it aligns only to 4 bytes. > > However, this is required on s390, where in order to load a constant > from memory in a large (>512k) BPF program, one must use lgrl > instruction, whose memory operand must be aligned on an 8-byte boundary. > > This patch makes it possible to request an arbitrary power-of-2 > alignment from bpf_jit_binary_alloc by allocating extra padding bytes > and aligning the resulting pointer rather than the start offset. > > An alternative would be to simply increase the alignment of > bpf_binary_header.image to 8, but this would increase the risk of > wasting a page on arches that don't need it, and would also be > insufficient in case someone needs e.g. 16-byte alignment in the > future. > > Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Maybe we can just make it 8 byte aligned for all architectures? #define BPF_IMAGE_ALIGNMENT 8
On Thu, Nov 14, 2019 at 9:40 AM Song Liu <liu.song.a23@gmail.com> wrote: > > On Wed, Nov 13, 2019 at 9:20 AM Ilya Leoshkevich <iii@linux.ibm.com> wrote: > > > > Currently passing alignment greater than 4 to bpf_jit_binary_alloc does > > not work: in such cases it aligns only to 4 bytes. > > > > However, this is required on s390, where in order to load a constant > > from memory in a large (>512k) BPF program, one must use lgrl > > instruction, whose memory operand must be aligned on an 8-byte boundary. > > > > This patch makes it possible to request an arbitrary power-of-2 > > alignment from bpf_jit_binary_alloc by allocating extra padding bytes > > and aligning the resulting pointer rather than the start offset. > > > > An alternative would be to simply increase the alignment of > > bpf_binary_header.image to 8, but this would increase the risk of > > wasting a page on arches that don't need it, and would also be > > insufficient in case someone needs e.g. 16-byte alignment in the > > future. why not 8 or 16? I don't follow why that would waste a page. > > > > Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> > > Maybe we can just make it 8 byte aligned for all architectures? > > #define BPF_IMAGE_ALIGNMENT 8
> Am 14.11.2019 um 19:14 schrieb Alexei Starovoitov <alexei.starovoitov@gmail.com>: > > On Thu, Nov 14, 2019 at 9:40 AM Song Liu <liu.song.a23@gmail.com> wrote: >> >> On Wed, Nov 13, 2019 at 9:20 AM Ilya Leoshkevich <iii@linux.ibm.com> wrote: >>> >>> Currently passing alignment greater than 4 to bpf_jit_binary_alloc does >>> not work: in such cases it aligns only to 4 bytes. >>> >>> However, this is required on s390, where in order to load a constant >>> from memory in a large (>512k) BPF program, one must use lgrl >>> instruction, whose memory operand must be aligned on an 8-byte boundary. >>> >>> This patch makes it possible to request an arbitrary power-of-2 >>> alignment from bpf_jit_binary_alloc by allocating extra padding bytes >>> and aligning the resulting pointer rather than the start offset. >>> >>> An alternative would be to simply increase the alignment of >>> bpf_binary_header.image to 8, but this would increase the risk of >>> wasting a page on arches that don't need it, and would also be >>> insufficient in case someone needs e.g. 16-byte alignment in the >>> future. > > why not 8 or 16? I don't follow why that would waste a page. It might waste a page because bpf_jit_binary_alloc rounds up allocation size to PAGE_SIZE, and unnecessary padding might be the last straw that would cause another page to be allocated. But that would apply only to a tiny amount of programs, whose JITed size is slightly smaller than a multiple of PAGE_SIZE. Sorry, I didn't fully get the 8 vs 16 question. At the moment, for s390-specific purpose 8 would be enough. I used 16 just to demonstrate that this solution wouldn't be future-proof. AFAIK some Intel instructions might want 16 (VMOVDQA?). But maybe it's better to think about it when someone actually needs to use them. >>> >>> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> >> >> Maybe we can just make it 8 byte aligned for all architectures? >> >> #define BPF_IMAGE_ALIGNMENT 8 Seems like I'm overthinking this. If just bumping the alignment to 8 is OK, then I'll send a simpler patch. Best regards, Ilya
On Thu, Nov 14, 2019 at 10:35 AM Ilya Leoshkevich <iii@linux.ibm.com> wrote: > > > > > Am 14.11.2019 um 19:14 schrieb Alexei Starovoitov <alexei.starovoitov@gmail.com>: > > > > On Thu, Nov 14, 2019 at 9:40 AM Song Liu <liu.song.a23@gmail.com> wrote: > >> > >> On Wed, Nov 13, 2019 at 9:20 AM Ilya Leoshkevich <iii@linux.ibm.com> wrote: > >>> > >>> Currently passing alignment greater than 4 to bpf_jit_binary_alloc does > >>> not work: in such cases it aligns only to 4 bytes. > >>> > >>> However, this is required on s390, where in order to load a constant > >>> from memory in a large (>512k) BPF program, one must use lgrl > >>> instruction, whose memory operand must be aligned on an 8-byte boundary. > >>> > >>> This patch makes it possible to request an arbitrary power-of-2 > >>> alignment from bpf_jit_binary_alloc by allocating extra padding bytes > >>> and aligning the resulting pointer rather than the start offset. > >>> > >>> An alternative would be to simply increase the alignment of > >>> bpf_binary_header.image to 8, but this would increase the risk of > >>> wasting a page on arches that don't need it, and would also be > >>> insufficient in case someone needs e.g. 16-byte alignment in the > >>> future. > > > > why not 8 or 16? I don't follow why that would waste a page. > > It might waste a page because bpf_jit_binary_alloc rounds up allocation > size to PAGE_SIZE, and unnecessary padding might be the last straw that > would cause another page to be allocated. But that would apply only to > a tiny amount of programs, whose JITed size is slightly smaller than a > multiple of PAGE_SIZE. > > Sorry, I didn't fully get the 8 vs 16 question. At the moment, for > s390-specific purpose 8 would be enough. I used 16 just to demonstrate > that this solution wouldn't be future-proof. AFAIK some Intel > instructions might want 16 (VMOVDQA?). But maybe it's better to think > about it when someone actually needs to use them. > > >>> > >>> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> > >> > >> Maybe we can just make it 8 byte aligned for all architectures? > >> > >> #define BPF_IMAGE_ALIGNMENT 8 > > Seems like I'm overthinking this. If just bumping the alignment to 8 is > OK, then I'll send a simpler patch. yes. I think that's better.
diff --git a/include/linux/filter.h b/include/linux/filter.h index 7a6f8f6f1da4..351a31eec24b 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -515,10 +515,12 @@ struct sock_fprog_kern { struct sock_filter *filter; }; +/* Some arches need word alignment for their instructions */ +#define BPF_IMAGE_ALIGNMENT 4 + struct bpf_binary_header { u32 pages; - /* Some arches need word alignment for their instructions */ - u8 image[] __aligned(4); + u8 image[] __aligned(BPF_IMAGE_ALIGNMENT); }; struct bpf_prog { diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index c1fde0303280..75dd3a43ada0 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -31,6 +31,8 @@ #include <linux/rcupdate.h> #include <linux/perf_event.h> #include <linux/extable.h> +#include <linux/kernel.h> +#include <linux/log2.h> #include <asm/unaligned.h> /* Registers */ @@ -812,14 +814,20 @@ bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr, unsigned int alignment, bpf_jit_fill_hole_t bpf_fill_ill_insns) { + u32 size, hole, start, pages, padding; struct bpf_binary_header *hdr; - u32 size, hole, start, pages; + + WARN_ON_ONCE(!is_power_of_2(alignment)); + if (alignment <= BPF_IMAGE_ALIGNMENT) + padding = 0; + else + padding = alignment - BPF_IMAGE_ALIGNMENT + 1; /* Most of BPF filters are really small, but if some of them * fill a page, allow at least 128 extra bytes to insert a * random section of illegal instructions. */ - size = round_up(proglen + sizeof(*hdr) + 128, PAGE_SIZE); + size = round_up(proglen + sizeof(*hdr) + padding + 128, PAGE_SIZE); pages = size / PAGE_SIZE; if (bpf_jit_charge_modmem(pages)) @@ -834,12 +842,16 @@ bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr, bpf_fill_ill_insns(hdr, size); hdr->pages = pages; - hole = min_t(unsigned int, size - (proglen + sizeof(*hdr)), + hole = min_t(unsigned int, + size - (proglen + sizeof(*hdr) + padding), PAGE_SIZE - sizeof(*hdr)); - start = (get_random_int() % hole) & ~(alignment - 1); + start = get_random_int() % hole; /* Leave a random number of instructions before BPF code. */ - *image_ptr = &hdr->image[start]; + if (alignment <= BPF_IMAGE_ALIGNMENT) + *image_ptr = &hdr->image[start & ~(alignment - 1)]; + else + *image_ptr = PTR_ALIGN(&hdr->image[start], alignment); return hdr; }
Currently passing alignment greater than 4 to bpf_jit_binary_alloc does not work: in such cases it aligns only to 4 bytes. However, this is required on s390, where in order to load a constant from memory in a large (>512k) BPF program, one must use lgrl instruction, whose memory operand must be aligned on an 8-byte boundary. This patch makes it possible to request an arbitrary power-of-2 alignment from bpf_jit_binary_alloc by allocating extra padding bytes and aligning the resulting pointer rather than the start offset. An alternative would be to simply increase the alignment of bpf_binary_header.image to 8, but this would increase the risk of wasting a page on arches that don't need it, and would also be insufficient in case someone needs e.g. 16-byte alignment in the future. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> --- include/linux/filter.h | 6 ++++-- kernel/bpf/core.c | 22 +++++++++++++++++----- 2 files changed, 21 insertions(+), 7 deletions(-)