Message ID | 20190716115910.23093-1-iii@linux.ibm.com
---|---
State | Changes Requested
Delegated to | BPF Maintainers
Series | [bpf] bpf: fix narrower loads on s390
[sorry, resending again as the previous one came through with the text
messed up due to networking issues]

On Tue, Jul 16, 2019 at 10:08 PM Y Song <ys114321@gmail.com> wrote:
>
> On Tue, Jul 16, 2019 at 4:59 AM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
> >
> > test_pkt_md_access is failing on s390, since the associated eBPF prog
> > returns TC_ACT_SHOT, which in turn happens because loading a part of a
> > struct __sk_buff field produces an incorrect result.
> >
> > The problem is that when verifier emits the code to replace partial load
> > of a field with a full load, a shift and a bitwise AND, it assumes that
> > the machine is little endian.
> >
> > Adjust shift count calculation to account for endianness.
> >
> > Fixes: 31fd85816dbe ("bpf: permits narrower load from bpf program context fields")
> > Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
> > ---
> >  kernel/bpf/verifier.c | 8 ++++++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 5900cbb966b1..3f9353653558 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -8616,8 +8616,12 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
> >                 }
> >
> >                 if (is_narrower_load && size < target_size) {
> > -                       u8 shift = (off & (size_default - 1)) * 8;
> > -
> > +                       u8 load_off = off & (size_default - 1);
> > +#ifdef __LITTLE_ENDIAN
> > +                       u8 shift = load_off * 8;
> > +#else
> > +                       u8 shift = (size_default - (load_off + size)) * 8;
> > +#endif

All the values are in register. The shifting operations should be the
same for big endian and little endian, e.g., value 64 >> 2 = 16 when
value "64" is in register. So I did not see a problem here.

Could you elaborate which field access in test_pkt_md_access caused the
problem? It would be good if you could give a detailed memory layout and
register values to illustrate the problem.

> >                         if (ctx_field_size <= 4) {
> >                                 if (shift)
> >                                         insn_buf[cnt++] = BPF_ALU32_IMM(BPF_RSH,
> > --
> > 2.21.0
> On 17.07.2019 at 07:11, Y Song <ys114321@gmail.com> wrote:
>
> [sorry, resending again as the previous one came through with the text
> messed up due to networking issues]
>
> On Tue, Jul 16, 2019 at 10:08 PM Y Song <ys114321@gmail.com> wrote:
>>
>> On Tue, Jul 16, 2019 at 4:59 AM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>>>
>>> test_pkt_md_access is failing on s390, since the associated eBPF prog
>>> returns TC_ACT_SHOT, which in turn happens because loading a part of a
>>> struct __sk_buff field produces an incorrect result.
>>> [...]
>>>                 if (is_narrower_load && size < target_size) {
>>> -                       u8 shift = (off & (size_default - 1)) * 8;
>>> -
>>> +                       u8 load_off = off & (size_default - 1);
>>> +#ifdef __LITTLE_ENDIAN
>>> +                       u8 shift = load_off * 8;
>>> +#else
>>> +                       u8 shift = (size_default - (load_off + size)) * 8;
>>> +#endif
>
> All the values are in register. The shifting operations should be the
> same for big endian and little endian, e.g., value 64 >> 2 = 16 when
> value "64" is in register. So I did not see a problem here.
>
> Could you elaborate which field access in test_pkt_md_access
> caused the problem?

The very first one: TEST_FIELD(__u8, len, 0xFF);

> It would be good if you could give a detailed memory layout and register
> values to illustrate the problem.

Suppose len = 0x11223344. On a big endian system, this would be

    11 22 33 44

Now, we would like to do *(u8 *)&len; the desired result is 0x11. The
verifier should emit ((*(u32 *)&len) >> 24) & 0xff, but as of today it
misses the shift.

On a little endian system the layout is:

    44 33 22 11

and the desired result is different - 0x44. The verifier correctly emits
(*(u32 *)&len) & 0xff.

>
>>
>>>                         if (ctx_field_size <= 4) {
>>>                                 if (shift)
>>>                                         insn_buf[cnt++] = BPF_ALU32_IMM(BPF_RSH,
>>> --
>>> 2.21.0
> On 17.07.2019 at 11:21, Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>
> [...]
>
> Suppose len = 0x11223344. On a big endian system, this would be
>
>     11 22 33 44
>
> Now, we would like to do *(u8 *)&len; the desired result is 0x11. The
> verifier should emit ((*(u32 *)&len) >> 24) & 0xff, but as of today it
> misses the shift.
>
> [...]

I've just realized that this example does not reflect what the test is
doing on big-endian systems (there is an #ifdef for those).

Here is a better one: len = 0x11223344, and we would like to do
((u8 *)&len)[3].

len is represented as `11 22 33 44` in memory, so the desired result is
0x44. It can be obtained by doing (*(u32 *)&len) & 0xff, but today the
verifier does ((*(u32 *)&len) >> 24) & 0xff instead.
On Wed, Jul 17, 2019 at 3:36 AM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>
> [...]
>
> I've just realized that this example does not reflect what the test is
> doing on big-endian systems (there is an #ifdef for those).
>
> Here is a better one: len = 0x11223344, and we would like to do
> ((u8 *)&len)[3].
>
> len is represented as `11 22 33 44` in memory, so the desired result is
> 0x44. It can be obtained by doing (*(u32 *)&len) & 0xff, but today the
> verifier does ((*(u32 *)&len) >> 24) & 0xff instead.

What you described above for the memory layout all makes sense. The root
cause is that for big endian, we should do *((u8 *)&len + 3). This is
exactly what the macros in test_pkt_md_access.c try to do:

#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define TEST_FIELD(TYPE, FIELD, MASK)					\
{									\
	TYPE tmp = *(volatile TYPE *)&skb->FIELD;			\
	if (tmp != ((*(volatile __u32 *)&skb->FIELD) & MASK))		\
		return TC_ACT_SHOT;					\
}
#else
#define TEST_FIELD_OFFSET(a, b)	((sizeof(a) - sizeof(b)) / sizeof(b))
#define TEST_FIELD(TYPE, FIELD, MASK)					\
{									\
	TYPE tmp = *((volatile TYPE *)&skb->FIELD +			\
		     TEST_FIELD_OFFSET(skb->FIELD, TYPE));		\
	if (tmp != ((*(volatile __u32 *)&skb->FIELD) & MASK))		\
		return TC_ACT_SHOT;					\
}
#endif

Could you check whether your __BYTE_ORDER__ is set correctly for this
case? You may need to tweak the Makefile if you are doing cross
compilation; I am not sure how, as I do not have the environment.
> On 17.07.2019 at 18:25, Y Song <ys114321@gmail.com> wrote:
>
> [...]
>
> Could you check whether your __BYTE_ORDER__ is set correctly for this
> case? You may need to tweak the Makefile if you are doing cross
> compilation; I am not sure how, as I do not have the environment.

I'm building natively on s390.

Here is the (formatted) preprocessed C code for the first condition:

{
	__u8 tmp = *((volatile __u8 *)&skb->len +
		     ((sizeof(skb->len) - sizeof(__u8)) / sizeof(__u8)));
	if (tmp != ((*(volatile __u32 *)&skb->len) & 0xFF))
		return 2;
};

So I believe the endianness is chosen correctly.

Here is the clang-generated BPF bytecode for the first condition:

# llvm-objdump -d test_pkt_md_access.o

0000000000000000 process:
       0:	71 21 00 03 00 00 00 00	r2 = *(u8 *)(r1 + 3)
       1:	61 31 00 00 00 00 00 00	r3 = *(u32 *)(r1 + 0)
       2:	57 30 00 00 00 00 00 ff	r3 &= 255
       3:	5d 23 00 1d 00 00 00 00	if r2 != r3 goto +29 <LBB0_10>

This also looks good to me.

Finally, here is the verifier-generated BPF bytecode:

# bpftool prog dump xlated id 14
; TEST_FIELD(__u8, len, 0xFF);
   0: (61) r2 = *(u32 *)(r1 +104)
   1: (bc) w2 = w2
   2: (74) w2 >>= 24
   3: (bc) w2 = w2
   4: (54) w2 &= 255
   5: (bc) w2 = w2

Here we can see the shift that I'm referring to. I believe we should
translate *(u8 *)(r1 + 3) in this case without this shift on big-endian
machines.

Best regards,
Ilya
On Wed, Jul 17, 2019 at 1:52 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>
> [...]
>
> Finally, here is the verifier-generated BPF bytecode:
>
> # bpftool prog dump xlated id 14
> ; TEST_FIELD(__u8, len, 0xFF);
>    0: (61) r2 = *(u32 *)(r1 +104)
>    1: (bc) w2 = w2
>    2: (74) w2 >>= 24
>    3: (bc) w2 = w2
>    4: (54) w2 &= 255
>    5: (bc) w2 = w2
>
> Here we can see the shift that I'm referring to. I believe we should
> translate *(u8 *)(r1 + 3) in this case without this shift on big-endian
> machines.

Thanks for the detailed illustration. Now, with your detailed output of
the byte codes and the xlated program, it becomes apparent that the
verifier should not do the shift at insn 2.

I was correct that after insn 0, register r2 holds the same value on big
and little endian. But I missed in the first review that insn->off of
the original first insn differs: it is r2 = *(u8 *)(r1 + 3) on big
endian, and r2 = *(u8 *)r1 on little endian, so the two cannot use the
same shift amount. Therefore, indeed, the shift amount is different
between big and little endian, and your code is correct.

Could you add a macro in linux/filter.h? Most narrow-load-related macros
are there. This way, we keep verifier.c free of byte-order macros.

Also, could you put your above analysis in the commit message? It will
help with reasoning about this change later on.

Thanks!

> Best regards,
> Ilya
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 5900cbb966b1..3f9353653558 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -8616,8 +8616,12 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
 		}
 
 		if (is_narrower_load && size < target_size) {
-			u8 shift = (off & (size_default - 1)) * 8;
-
+			u8 load_off = off & (size_default - 1);
+#ifdef __LITTLE_ENDIAN
+			u8 shift = load_off * 8;
+#else
+			u8 shift = (size_default - (load_off + size)) * 8;
+#endif
 			if (ctx_field_size <= 4) {
 				if (shift)
 					insn_buf[cnt++] = BPF_ALU32_IMM(BPF_RSH,
test_pkt_md_access is failing on s390, since the associated eBPF prog
returns TC_ACT_SHOT, which in turn happens because loading a part of a
struct __sk_buff field produces an incorrect result.

The problem is that when verifier emits the code to replace partial load
of a field with a full load, a shift and a bitwise AND, it assumes that
the machine is little endian.

Adjust shift count calculation to account for endianness.

Fixes: 31fd85816dbe ("bpf: permits narrower load from bpf program context fields")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
---
 kernel/bpf/verifier.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)