| Message ID | 20190703205100.142904-1-sdf@google.com |
|---|---|
| State | Superseded |
| Delegated to | BPF Maintainers |
| Series | [bpf-next] selftests/bpf: make verifier loop tests arch independent |
On Wed, Jul 3, 2019 at 1:51 PM Stanislav Fomichev <sdf@google.com> wrote:
>
> Take the first x bytes of pt_regs for the scalability tests; there is
> no real reason we need the x86-specific rax.
>
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>  tools/testing/selftests/bpf/progs/loop1.c | 3 ++-
>  tools/testing/selftests/bpf/progs/loop2.c | 3 ++-
>  tools/testing/selftests/bpf/progs/loop3.c | 3 ++-
>  3 files changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/tools/testing/selftests/bpf/progs/loop1.c b/tools/testing/selftests/bpf/progs/loop1.c
> index dea395af9ea9..d530c61d2517 100644
> --- a/tools/testing/selftests/bpf/progs/loop1.c
> +++ b/tools/testing/selftests/bpf/progs/loop1.c
> @@ -14,11 +14,12 @@ SEC("raw_tracepoint/kfree_skb")
>  int nested_loops(volatile struct pt_regs* ctx)
>  {
>  	int i, j, sum = 0, m;
> +	volatile int *any_reg = (volatile int *)ctx;
>
>  	for (j = 0; j < 300; j++)
>  		for (i = 0; i < j; i++) {
>  			if (j & 1)
> -				m = ctx->rax;
> +				m = *any_reg;

I agree. ctx->rax here is only there to generate some operations that
the compiler cannot optimize away, and dereferencing a volatile pointee
serves that purpose just as well.

Comparing the byte code generated with ctx->rax and with *any_reg, they
are slightly different. Using *any_reg is slightly worse, but it should
still be okay for the test.

[...]

Ilya Leoshkevich (iii@linux.ibm.com, cc'ed) has another patch set
trying to solve this problem by introducing s390 arch register access
macros. I guess for now that patch set is not needed any more?
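As a minimal stand-alone sketch of the volatile trick discussed above (a hypothetical userspace example, not part of the patch): every read through a pointer to volatile is an observable side effect, so the compiler must emit each load rather than hoisting it out of the loop or folding the sum to a constant.

```c
#include <stdio.h>

/* Each read of *any_reg must actually happen: volatile loads cannot
 * be hoisted out of the loop or constant-folded, so the loop body
 * survives optimization, which is all the selftest needs. */
static int work(volatile int *any_reg)
{
	int sum = 0;
	int i;

	for (i = 0; i < 300; i++)
		sum += *any_reg;	/* 300 real loads */
	return sum;
}

int main(void)
{
	int reg = 1;

	printf("%d\n", work(&reg));
	return 0;
}
```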
On 07/03, Y Song wrote:
> On Wed, Jul 3, 2019 at 1:51 PM Stanislav Fomichev <sdf@google.com> wrote:
> >
> > Take the first x bytes of pt_regs for the scalability tests; there is
> > no real reason we need the x86-specific rax.
>
> [...]
>
> I agree. ctx->rax here is only there to generate some operations that
> the compiler cannot optimize away, and dereferencing a volatile pointee
> serves that purpose just as well.
>
> [...]
>
> Ilya Leoshkevich (iii@linux.ibm.com, cc'ed) has another patch set
> trying to solve this problem by introducing s390 arch register access
> macros. I guess for now that patch set is not needed any more?

Oh, I missed them. Do they fix the tests for other (non-s390) arches as
well? I was trying to fix the issue by not depending on any arch
specific stuff, because the test really doesn't care :-)
> Am 08.07.2019 um 18:13 schrieb Stanislav Fomichev <sdf@fomichev.me>:
>
> On 07/03, Y Song wrote:
>> [...]
>>
>> Ilya Leoshkevich (iii@linux.ibm.com, cc'ed) has another patch set
>> trying to solve this problem by introducing s390 arch register access
>> macros. I guess for now that patch set is not needed any more?
>
> Oh, I missed them. Do they fix the tests for other (non-s390) arches as
> well? I was trying to fix the issue by not depending on any arch
> specific stuff, because the test really doesn't care :-)

They are supposed to work for everything that defines PT_REGS_RC in
bpf_helpers.h, but I have to admit I tested only x86_64 and s390.

The main source of problems with my approach was that the userspace and
kernel definitions of struct pt_regs do not match; because of that, some
tweaking was required for both arches. I will double-check how it looks
for the others (arm, mips, ppc, sparc) tomorrow.

Best regards,
Ilya
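For reference, a rough sketch of what the PT_REGS_RC-based variant of one of these tests could look like (assuming the PT_REGS_RC macro from the selftests' bpf_helpers.h; the exact shape of Ilya's series may differ, and the break condition here is invented to keep the sketch self-contained):

```c
#include <linux/ptrace.h>
#include <linux/bpf.h>
#include "bpf_helpers.h"

char _license[] SEC("license") = "GPL";

SEC("raw_tracepoint/consume_skb")
int while_true(volatile struct pt_regs *ctx)
{
	int i = 0;

	while (true) {
		/* PT_REGS_RC hides the arch-specific return-value
		 * register (rax/ax on x86_64, gprs[2] on s390, ...),
		 * so the test no longer hard-codes rax. */
		if (PT_REGS_RC(ctx) & 1)
			i += 3;
		else
			i += 7;
		if (i > 40)
			break;
	}
	return i;
}
```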
On 07/08, Ilya Leoshkevich wrote:
> > Am 08.07.2019 um 18:13 schrieb Stanislav Fomichev <sdf@fomichev.me>:
> >
> > [...]
> >
> > Oh, I missed them. Do they fix the tests for other (non-s390) arches as
> > well? I was trying to fix the issue by not depending on any arch
> > specific stuff, because the test really doesn't care :-)
>
> They are supposed to work for everything that defines PT_REGS_RC in
> bpf_helpers.h, but I have to admit I tested only x86_64 and s390.
>
> The main source of problems with my approach was that the userspace and
> kernel definitions of struct pt_regs do not match; because of that, some
> tweaking was required for both arches. I will double-check how it looks
> for the others (arm, mips, ppc, sparc) tomorrow.

Thanks, I've tested your patches and they fix my issue as well. So you
can have my Tested-by if we go with your approach.

One thing I don't understand is: why do you add 'ifdef __KERNEL__' to
the bpf_helpers.h for the x86 case? Who is using bpf_helpers.h with
__KERNEL__ defined? Is it perf?

> Best regards,
> Ilya
> Am 08.07.2019 um 23:20 schrieb Stanislav Fomichev <sdf@fomichev.me>:
>
> On 07/08, Ilya Leoshkevich wrote:
>> [...]
>>
>> The main source of problems with my approach was that the userspace and
>> kernel definitions of struct pt_regs do not match; because of that, some
>> tweaking was required for both arches. I will double-check how it looks
>> for the others (arm, mips, ppc, sparc) tomorrow.
>
> Thanks, I've tested your patches and they fix my issue as well. So you
> can have my Tested-by if we go with your approach.
>
> One thing I don't understand is: why do you add 'ifdef __KERNEL__' to
> the bpf_helpers.h for the x86 case? Who is using bpf_helpers.h with
> __KERNEL__ defined? Is it perf?

That's samples/bpf. Also, there is a modified copy of it in bcc
(src/cc/export/helpers.h), which also gets built with __KERNEL__.

Best regards,
Ilya
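To make the __KERNEL__ question above concrete: on x86_64 the kernel's struct pt_regs names the return-value register ax, while the userspace definition names it rax, so a single PT_REGS_RC definition cannot serve both consumers. An illustrative sketch of the guard (the field names are the real ones; the exact macro layout in Ilya's series may differ):

```c
#ifdef __KERNEL__
/* Consumers built against the kernel's struct pt_regs,
 * e.g. samples/bpf and bcc's copy of the helpers. */
#define PT_REGS_RC(x) ((x)->ax)
#else
/* Consumers built against the userspace struct pt_regs
 * from <asm/ptrace.h>. */
#define PT_REGS_RC(x) ((x)->rax)
#endif
```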
diff --git a/tools/testing/selftests/bpf/progs/loop1.c b/tools/testing/selftests/bpf/progs/loop1.c
index dea395af9ea9..d530c61d2517 100644
--- a/tools/testing/selftests/bpf/progs/loop1.c
+++ b/tools/testing/selftests/bpf/progs/loop1.c
@@ -14,11 +14,12 @@ SEC("raw_tracepoint/kfree_skb")
 int nested_loops(volatile struct pt_regs* ctx)
 {
 	int i, j, sum = 0, m;
+	volatile int *any_reg = (volatile int *)ctx;
 
 	for (j = 0; j < 300; j++)
 		for (i = 0; i < j; i++) {
 			if (j & 1)
-				m = ctx->rax;
+				m = *any_reg;
 			else
 				m = j;
 			sum += i * m;
diff --git a/tools/testing/selftests/bpf/progs/loop2.c b/tools/testing/selftests/bpf/progs/loop2.c
index 0637bd8e8bcf..91bb89d901e3 100644
--- a/tools/testing/selftests/bpf/progs/loop2.c
+++ b/tools/testing/selftests/bpf/progs/loop2.c
@@ -14,9 +14,10 @@ SEC("raw_tracepoint/consume_skb")
 int while_true(volatile struct pt_regs* ctx)
 {
 	int i = 0;
+	volatile int *any_reg = (volatile int *)ctx;
 
 	while (true) {
-		if (ctx->rax & 1)
+		if (*any_reg & 1)
 			i += 3;
 		else
 			i += 7;
diff --git a/tools/testing/selftests/bpf/progs/loop3.c b/tools/testing/selftests/bpf/progs/loop3.c
index 30a0f6cba080..3a7f12d7186c 100644
--- a/tools/testing/selftests/bpf/progs/loop3.c
+++ b/tools/testing/selftests/bpf/progs/loop3.c
@@ -14,9 +14,10 @@ SEC("raw_tracepoint/consume_skb")
 int while_true(volatile struct pt_regs* ctx)
 {
 	__u64 i = 0, sum = 0;
+	volatile __u64 *any_reg = (volatile __u64 *)ctx;
 	do {
 		i++;
-		sum += ctx->rax;
+		sum += *any_reg;
 	} while (i < 0x100000000ULL);
 	return sum;
 }
Take the first x bytes of pt_regs for the scalability tests; there is
no real reason we need the x86-specific rax.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 tools/testing/selftests/bpf/progs/loop1.c | 3 ++-
 tools/testing/selftests/bpf/progs/loop2.c | 3 ++-
 tools/testing/selftests/bpf/progs/loop3.c | 3 ++-
 3 files changed, 6 insertions(+), 3 deletions(-)