diff mbox series

[bpf-next] bpf: make bpf_jit_binary_alloc support alignment > 4

Message ID 20191113170005.48813-1-iii@linux.ibm.com
State Changes Requested
Delegated to: BPF Maintainers
Headers show
Series [bpf-next] bpf: make bpf_jit_binary_alloc support alignment > 4 | expand

Commit Message

Ilya Leoshkevich Nov. 13, 2019, 5 p.m. UTC
Currently passing alignment greater than 4 to bpf_jit_binary_alloc does
not work: in such cases it aligns only to 4 bytes.

However, this is required on s390, where in order to load a constant
from memory in a large (>512k) BPF program, one must use lgrl
instruction, whose memory operand must be aligned on an 8-byte boundary.

This patch makes it possible to request an arbitrary power-of-2
alignment from bpf_jit_binary_alloc by allocating extra padding bytes
and aligning the resulting pointer rather than the start offset.

An alternative would be to simply increase the alignment of
bpf_binary_header.image to 8, but this would increase the risk of
wasting a page on arches that don't need it, and would also be
insufficient in case someone needs e.g. 16-byte alignment in the
future.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
---
 include/linux/filter.h |  6 ++++--
 kernel/bpf/core.c      | 22 +++++++++++++++++-----
 2 files changed, 21 insertions(+), 7 deletions(-)

Comments

Song Liu Nov. 14, 2019, 5:40 p.m. UTC | #1
On Wed, Nov 13, 2019 at 9:20 AM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>
> Currently passing alignment greater than 4 to bpf_jit_binary_alloc does
> not work: in such cases it aligns only to 4 bytes.
>
> However, this is required on s390, where in order to load a constant
> from memory in a large (>512k) BPF program, one must use lgrl
> instruction, whose memory operand must be aligned on an 8-byte boundary.
>
> This patch makes it possible to request an arbitrary power-of-2
> alignment from bpf_jit_binary_alloc by allocating extra padding bytes
> and aligning the resulting pointer rather than the start offset.
>
> An alternative would be to simply increase the alignment of
> bpf_binary_header.image to 8, but this would increase the risk of
> wasting a page on arches that don't need it, and would also be
> insufficient in case someone needs e.g. 16-byte alignment in the
> future.
>
> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>

Maybe we can just make it 8 byte aligned for all architectures?

#define BPF_IMAGE_ALIGNMENT 8
Alexei Starovoitov Nov. 14, 2019, 6:14 p.m. UTC | #2
On Thu, Nov 14, 2019 at 9:40 AM Song Liu <liu.song.a23@gmail.com> wrote:
>
> On Wed, Nov 13, 2019 at 9:20 AM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
> >
> > Currently passing alignment greater than 4 to bpf_jit_binary_alloc does
> > not work: in such cases it aligns only to 4 bytes.
> >
> > However, this is required on s390, where in order to load a constant
> > from memory in a large (>512k) BPF program, one must use lgrl
> > instruction, whose memory operand must be aligned on an 8-byte boundary.
> >
> > This patch makes it possible to request an arbitrary power-of-2
> > alignment from bpf_jit_binary_alloc by allocating extra padding bytes
> > and aligning the resulting pointer rather than the start offset.
> >
> > An alternative would be to simply increase the alignment of
> > bpf_binary_header.image to 8, but this would increase the risk of
> > wasting a page on arches that don't need it, and would also be
> > insufficient in case someone needs e.g. 16-byte alignment in the
> > future.

why not 8 or 16? I don't follow why that would waste a page.

> >
> > Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
>
> Maybe we can just make it 8 byte aligned for all architectures?
>
> #define BPF_IMAGE_ALIGNMENT 8
Ilya Leoshkevich Nov. 14, 2019, 6:35 p.m. UTC | #3
> Am 14.11.2019 um 19:14 schrieb Alexei Starovoitov <alexei.starovoitov@gmail.com>:
> 
> On Thu, Nov 14, 2019 at 9:40 AM Song Liu <liu.song.a23@gmail.com> wrote:
>> 
>> On Wed, Nov 13, 2019 at 9:20 AM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>>> 
>>> Currently passing alignment greater than 4 to bpf_jit_binary_alloc does
>>> not work: in such cases it aligns only to 4 bytes.
>>> 
>>> However, this is required on s390, where in order to load a constant
>>> from memory in a large (>512k) BPF program, one must use lgrl
>>> instruction, whose memory operand must be aligned on an 8-byte boundary.
>>> 
>>> This patch makes it possible to request an arbitrary power-of-2
>>> alignment from bpf_jit_binary_alloc by allocating extra padding bytes
>>> and aligning the resulting pointer rather than the start offset.
>>> 
>>> An alternative would be to simply increase the alignment of
>>> bpf_binary_header.image to 8, but this would increase the risk of
>>> wasting a page on arches that don't need it, and would also be
>>> insufficient in case someone needs e.g. 16-byte alignment in the
>>> future.
> 
> why not 8 or 16? I don't follow why that would waste a page.

It might waste a page because bpf_jit_binary_alloc rounds up allocation
size to PAGE_SIZE, and unnecessary padding might be the last straw that
would cause another page to be allocated. But that would apply only to
a tiny amount of programs, whose JITed size is slightly smaller than a
multiple of PAGE_SIZE.

Sorry, I didn't fully get the 8 vs 16 question. At the moment, for
s390-specific purpose 8 would be enough. I used 16 just to demonstrate
that this solution wouldn't be future-proof. AFAIK some Intel
instructions might want 16 (VMOVDQA?). But maybe it's better to think
about it when someone actually needs to use them.

>>> 
>>> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
>> 
>> Maybe we can just make it 8 byte aligned for all architectures?
>> 
>> #define BPF_IMAGE_ALIGNMENT 8

Seems like I'm overthinking this. If just bumping the alignment to 8 is
OK, then I'll send a simpler patch.

Best regards,
Ilya
Alexei Starovoitov Nov. 14, 2019, 6:37 p.m. UTC | #4
On Thu, Nov 14, 2019 at 10:35 AM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>
>
>
> > Am 14.11.2019 um 19:14 schrieb Alexei Starovoitov <alexei.starovoitov@gmail.com>:
> >
> > On Thu, Nov 14, 2019 at 9:40 AM Song Liu <liu.song.a23@gmail.com> wrote:
> >>
> >> On Wed, Nov 13, 2019 at 9:20 AM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
> >>>
> >>> Currently passing alignment greater than 4 to bpf_jit_binary_alloc does
> >>> not work: in such cases it aligns only to 4 bytes.
> >>>
> >>> However, this is required on s390, where in order to load a constant
> >>> from memory in a large (>512k) BPF program, one must use lgrl
> >>> instruction, whose memory operand must be aligned on an 8-byte boundary.
> >>>
> >>> This patch makes it possible to request an arbitrary power-of-2
> >>> alignment from bpf_jit_binary_alloc by allocating extra padding bytes
> >>> and aligning the resulting pointer rather than the start offset.
> >>>
> >>> An alternative would be to simply increase the alignment of
> >>> bpf_binary_header.image to 8, but this would increase the risk of
> >>> wasting a page on arches that don't need it, and would also be
> >>> insufficient in case someone needs e.g. 16-byte alignment in the
> >>> future.
> >
> > why not 8 or 16? I don't follow why that would waste a page.
>
> It might waste a page because bpf_jit_binary_alloc rounds up allocation
> size to PAGE_SIZE, and unnecessary padding might be the last straw that
> would cause another page to be allocated. But that would apply only to
> a tiny amount of programs, whose JITed size is slightly smaller than a
> multiple of PAGE_SIZE.
>
> Sorry, I didn't fully get the 8 vs 16 question. At the moment, for
> s390-specific purpose 8 would be enough. I used 16 just to demonstrate
> that this solution wouldn't be future-proof. AFAIK some Intel
> instructions might want 16 (VMOVDQA?). But maybe it's better to think
> about it when someone actually needs to use them.
>
> >>>
> >>> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
> >>
> >> Maybe we can just make it 8 byte aligned for all architectures?
> >>
> >> #define BPF_IMAGE_ALIGNMENT 8
>
> Seems like I'm overthinking this. If just bumping the alignment to 8 is
> OK, then I'll send a simpler patch.

yes. I think that's better.
diff mbox series

Patch

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 7a6f8f6f1da4..351a31eec24b 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -515,10 +515,12 @@  struct sock_fprog_kern {
 	struct sock_filter	*filter;
 };
 
+/* Some arches need word alignment for their instructions */
+#define BPF_IMAGE_ALIGNMENT 4
+
 struct bpf_binary_header {
 	u32 pages;
-	/* Some arches need word alignment for their instructions */
-	u8 image[] __aligned(4);
+	u8 image[] __aligned(BPF_IMAGE_ALIGNMENT);
 };
 
 struct bpf_prog {
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index c1fde0303280..75dd3a43ada0 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -31,6 +31,8 @@ 
 #include <linux/rcupdate.h>
 #include <linux/perf_event.h>
 #include <linux/extable.h>
+#include <linux/kernel.h>
+#include <linux/log2.h>
 #include <asm/unaligned.h>
 
 /* Registers */
@@ -812,14 +814,20 @@  bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
 		     unsigned int alignment,
 		     bpf_jit_fill_hole_t bpf_fill_ill_insns)
 {
+	u32 size, hole, start, pages, padding;
 	struct bpf_binary_header *hdr;
-	u32 size, hole, start, pages;
+
+	WARN_ON_ONCE(!is_power_of_2(alignment));
+	if (alignment <= BPF_IMAGE_ALIGNMENT)
+		padding = 0;
+	else
+		padding = alignment - BPF_IMAGE_ALIGNMENT + 1;
 
 	/* Most of BPF filters are really small, but if some of them
 	 * fill a page, allow at least 128 extra bytes to insert a
 	 * random section of illegal instructions.
 	 */
-	size = round_up(proglen + sizeof(*hdr) + 128, PAGE_SIZE);
+	size = round_up(proglen + sizeof(*hdr) + padding + 128, PAGE_SIZE);
 	pages = size / PAGE_SIZE;
 
 	if (bpf_jit_charge_modmem(pages))
@@ -834,12 +842,16 @@  bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
 	bpf_fill_ill_insns(hdr, size);
 
 	hdr->pages = pages;
-	hole = min_t(unsigned int, size - (proglen + sizeof(*hdr)),
+	hole = min_t(unsigned int,
+		     size - (proglen + sizeof(*hdr) + padding),
 		     PAGE_SIZE - sizeof(*hdr));
-	start = (get_random_int() % hole) & ~(alignment - 1);
+	start = get_random_int() % hole;
 
 	/* Leave a random number of instructions before BPF code. */
-	*image_ptr = &hdr->image[start];
+	if (alignment <= BPF_IMAGE_ALIGNMENT)
+		*image_ptr = &hdr->image[start & ~(alignment - 1)];
+	else
+		*image_ptr = PTR_ALIGN(&hdr->image[start], alignment);
 
 	return hdr;
 }