Message ID | 20191028122902.9763-1-iii@linux.ibm.com |
---|---|
State | Accepted |
Delegated to: | BPF Maintainers |
Headers | show |
Series | [bpf] bpf: allow narrow loads of bpf_sysctl fields with offset > 0 | expand |
Ilya Leoshkevich <iii@linux.ibm.com> [Mon, 2019-10-28 05:29 -0700]: > "ctx:file_pos sysctl:read read ok narrow" works on s390 by accident: it > reads the wrong byte, which happens to have the expected value of 0. > Improve the test by seeking to the 4th byte and expecting 4 instead of > 0. > > This makes the latent problem apparent: the test attempts to read the > first byte of bpf_sysctl.file_pos, assuming this is the least-significant > byte, which is not the case on big-endian machines: a non-zero offset is > needed. > > The point of the test is to verify narrow loads, so we cannot cheat our > way out by simply using BPF_W. The existence of the test means that such > loads have to be supported, most likely because llvm can generate them. Right, llvm sometimes generates narrow load even if C proram uses u32 and this is the reason to support them. > Fix the test by adding a big-endian variant, which uses an offset to > access the least-significant byte of bpf_sysctl.file_pos. > > This reveals the final problem: verifier rejects accesses to bpf_sysctl > fields with offset > 0. Such accesses are already allowed for a wide > range of structs: __sk_buff, bpf_sock_addr and sk_msg_md to name a few. > Extend this support to bpf_sysctl by using bpf_ctx_range instead of > offsetof when matching field offsets. > > Fixes: 7b146cebe30c ("bpf: Sysctl hook") > Fixes: e1550bfe0de4 ("bpf: Add file_pos field to bpf_sysctl ctx") > Fixes: 9a1027e52535 ("selftests/bpf: Test file_pos field in bpf_sysctl ctx") > Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> > --- > kernel/bpf/cgroup.c | 4 ++-- > tools/testing/selftests/bpf/test_sysctl.c | 8 +++++++- > 2 files changed, 9 insertions(+), 3 deletions(-) > > diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c > index ddd8addcdb5c..a3eaf08e7dd3 100644 > --- a/kernel/bpf/cgroup.c > +++ b/kernel/bpf/cgroup.c > @@ -1311,12 +1311,12 @@ static bool sysctl_is_valid_access(int off, int size, enum bpf_access_type type, > return false; > > switch (off) { > - case offsetof(struct bpf_sysctl, write): > + case bpf_ctx_range(struct bpf_sysctl, write): > if (type != BPF_READ) > return false; > bpf_ctx_record_field_size(info, size_default); > return bpf_ctx_narrow_access_ok(off, size, size_default); LGTM, but could you please add a test case for narrow load from `write`? From what I see all existing test cases use BPF_W. > - case offsetof(struct bpf_sysctl, file_pos): > + case bpf_ctx_range(struct bpf_sysctl, file_pos): > if (type == BPF_READ) { > bpf_ctx_record_field_size(info, size_default); > return bpf_ctx_narrow_access_ok(off, size, size_default); > diff --git a/tools/testing/selftests/bpf/test_sysctl.c b/tools/testing/selftests/bpf/test_sysctl.c > index a320e3844b17..7c6e5b173f33 100644 > --- a/tools/testing/selftests/bpf/test_sysctl.c > +++ b/tools/testing/selftests/bpf/test_sysctl.c > @@ -161,9 +161,14 @@ static struct sysctl_test tests[] = { > .descr = "ctx:file_pos sysctl:read read ok narrow", > .insns = { > /* If (file_pos == X) */ > +#if __BYTE_ORDER == __LITTLE_ENDIAN > BPF_LDX_MEM(BPF_B, BPF_REG_7, BPF_REG_1, > offsetof(struct bpf_sysctl, file_pos)), > - BPF_JMP_IMM(BPF_JNE, BPF_REG_7, 0, 2), > +#else > + BPF_LDX_MEM(BPF_B, BPF_REG_7, BPF_REG_1, > + offsetof(struct bpf_sysctl, file_pos) + 3), > +#endif > + BPF_JMP_IMM(BPF_JNE, BPF_REG_7, 4, 2), > > /* return ALLOW; */ > BPF_MOV64_IMM(BPF_REG_0, 1), > @@ -176,6 +181,7 @@ static struct sysctl_test tests[] = { > .attach_type = BPF_CGROUP_SYSCTL, > .sysctl = "kernel/ostype", > .open_flags = O_RDONLY, > + .seek = 4, > .result = SUCCESS, > }, > { > -- > 2.23.0 >
On Mon, Oct 28, 2019 at 1:09 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote: > > "ctx:file_pos sysctl:read read ok narrow" works on s390 by accident: it > reads the wrong byte, which happens to have the expected value of 0. > Improve the test by seeking to the 4th byte and expecting 4 instead of > 0. > > This makes the latent problem apparent: the test attempts to read the > first byte of bpf_sysctl.file_pos, assuming this is the least-significant > byte, which is not the case on big-endian machines: a non-zero offset is > needed. > > The point of the test is to verify narrow loads, so we cannot cheat our > way out by simply using BPF_W. The existence of the test means that such > loads have to be supported, most likely because llvm can generate them. > Fix the test by adding a big-endian variant, which uses an offset to > access the least-significant byte of bpf_sysctl.file_pos. > > This reveals the final problem: verifier rejects accesses to bpf_sysctl > fields with offset > 0. Such accesses are already allowed for a wide > range of structs: __sk_buff, bpf_sock_addr and sk_msg_md to name a few. > Extend this support to bpf_sysctl by using bpf_ctx_range instead of > offsetof when matching field offsets. > > Fixes: 7b146cebe30c ("bpf: Sysctl hook") > Fixes: e1550bfe0de4 ("bpf: Add file_pos field to bpf_sysctl ctx") > Fixes: 9a1027e52535 ("selftests/bpf: Test file_pos field in bpf_sysctl ctx") > Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> > --- > kernel/bpf/cgroup.c | 4 ++-- > tools/testing/selftests/bpf/test_sysctl.c | 8 +++++++- > 2 files changed, 9 insertions(+), 3 deletions(-) > > diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c > index ddd8addcdb5c..a3eaf08e7dd3 100644 > --- a/kernel/bpf/cgroup.c > +++ b/kernel/bpf/cgroup.c > @@ -1311,12 +1311,12 @@ static bool sysctl_is_valid_access(int off, int size, enum bpf_access_type type, > return false; > > switch (off) { > - case offsetof(struct bpf_sysctl, write): > + case bpf_ctx_range(struct bpf_sysctl, write): this will actually allow reads pas t write field (e.g., offset = 2, size = 4). > if (type != BPF_READ) > return false; > bpf_ctx_record_field_size(info, size_default); > return bpf_ctx_narrow_access_ok(off, size, size_default); > - case offsetof(struct bpf_sysctl, file_pos): > + case bpf_ctx_range(struct bpf_sysctl, file_pos) this will allow read past context struct altogether. When we allow ranges, we will have to adjust allowed read size. > if (type == BPF_READ) { > bpf_ctx_record_field_size(info, size_default); > return bpf_ctx_narrow_access_ok(off, size, size_default); > diff --git a/tools/testing/selftests/bpf/test_sysctl.c b/tools/testing/selftests/bpf/test_sysctl.c > index a320e3844b17..7c6e5b173f33 100644 > --- a/tools/testing/selftests/bpf/test_sysctl.c > +++ b/tools/testing/selftests/bpf/test_sysctl.c > @@ -161,9 +161,14 @@ static struct sysctl_test tests[] = { > .descr = "ctx:file_pos sysctl:read read ok narrow", > .insns = { > /* If (file_pos == X) */ > +#if __BYTE_ORDER == __LITTLE_ENDIAN > BPF_LDX_MEM(BPF_B, BPF_REG_7, BPF_REG_1, > offsetof(struct bpf_sysctl, file_pos)), > - BPF_JMP_IMM(BPF_JNE, BPF_REG_7, 0, 2), > +#else > + BPF_LDX_MEM(BPF_B, BPF_REG_7, BPF_REG_1, > + offsetof(struct bpf_sysctl, file_pos) + 3), > +#endif > + BPF_JMP_IMM(BPF_JNE, BPF_REG_7, 4, 2), > > /* return ALLOW; */ > BPF_MOV64_IMM(BPF_REG_0, 1), > @@ -176,6 +181,7 @@ static struct sysctl_test tests[] = { > .attach_type = BPF_CGROUP_SYSCTL, > .sysctl = "kernel/ostype", > .open_flags = O_RDONLY, > + .seek = 4, > .result = SUCCESS, > }, > { > -- > 2.23.0 >
> Am 29.10.2019 um 05:36 schrieb Andrii Nakryiko <andrii.nakryiko@gmail.com>: > > On Mon, Oct 28, 2019 at 1:09 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote: >> >> --- a/kernel/bpf/cgroup.c >> +++ b/kernel/bpf/cgroup.c >> @@ -1311,12 +1311,12 @@ static bool sysctl_is_valid_access(int off, int size, enum bpf_access_type type, >> return false; >> >> switch (off) { >> - case offsetof(struct bpf_sysctl, write): >> + case bpf_ctx_range(struct bpf_sysctl, write): > > this will actually allow reads pas t write field (e.g., offset = 2, size = 4). Wouldn't if (off < 0 || off + size > sizeof(struct bpf_sysctl) || off % size) return false; prevent all OOB read-write attempts? Especially the off % size part - I think it has the effect of preventing OOB accesses for fields. In particular, it would filter offset = 2, size = 4 case. I have also checked the other usages of bpf_ctx_range, for example, bpf_skb_is_valid_access, and they don't seem to be doing anything special. > >> if (type != BPF_READ) >> return false; >> bpf_ctx_record_field_size(info, size_default); >> return bpf_ctx_narrow_access_ok(off, size, size_default); >> - case offsetof(struct bpf_sysctl, file_pos): >> + case bpf_ctx_range(struct bpf_sysctl, file_pos) > > this will allow read past context struct altogether. When we allow > ranges, we will have to adjust allowed read size. Same here.
Ilya Leoshkevich <iii@linux.ibm.com> [Tue, 2019-10-29 07:20 -0700]: > > Am 29.10.2019 um 05:36 schrieb Andrii Nakryiko <andrii.nakryiko@gmail.com>: > > > > On Mon, Oct 28, 2019 at 1:09 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote: > >> > >> --- a/kernel/bpf/cgroup.c > >> +++ b/kernel/bpf/cgroup.c > >> @@ -1311,12 +1311,12 @@ static bool sysctl_is_valid_access(int off, int size, enum bpf_access_type type, > >> return false; > >> > >> switch (off) { > >> - case offsetof(struct bpf_sysctl, write): > >> + case bpf_ctx_range(struct bpf_sysctl, write): > > > > this will actually allow reads pas t write field (e.g., offset = 2, size = 4). > > Wouldn't > > if (off < 0 || off + size > sizeof(struct bpf_sysctl) || off % size) > return false; > > prevent all OOB read-write attempts? Especially the off % size part - I > think it has the effect of preventing OOB accesses for fields. In > particular, it would filter offset = 2, size = 4 case. Yes, it would. This code makes sure that narrow accesses are aligned so that offset = 2 would allow only size = 2 or size = 1. > I have also checked the other usages of bpf_ctx_range, for example, > bpf_skb_is_valid_access, and they don't seem to be doing anything > special. Yes, sysctl hook follows logic similar to that of other program types. > >> if (type != BPF_READ) > >> return false; > >> bpf_ctx_record_field_size(info, size_default); > >> return bpf_ctx_narrow_access_ok(off, size, size_default); > >> - case offsetof(struct bpf_sysctl, file_pos): > >> + case bpf_ctx_range(struct bpf_sysctl, file_pos) > > > > this will allow read past context struct altogether. When we allow > > ranges, we will have to adjust allowed read size. > > Same here.
Ilya Leoshkevich <iii@linux.ibm.com> [Mon, 2019-10-28 05:29 -0700]: > "ctx:file_pos sysctl:read read ok narrow" works on s390 by accident: it > reads the wrong byte, which happens to have the expected value of 0. > Improve the test by seeking to the 4th byte and expecting 4 instead of > 0. > > This makes the latent problem apparent: the test attempts to read the > first byte of bpf_sysctl.file_pos, assuming this is the least-significant > byte, which is not the case on big-endian machines: a non-zero offset is > needed. > > The point of the test is to verify narrow loads, so we cannot cheat our > way out by simply using BPF_W. The existence of the test means that such > loads have to be supported, most likely because llvm can generate them. > Fix the test by adding a big-endian variant, which uses an offset to > access the least-significant byte of bpf_sysctl.file_pos. > > This reveals the final problem: verifier rejects accesses to bpf_sysctl > fields with offset > 0. Such accesses are already allowed for a wide > range of structs: __sk_buff, bpf_sock_addr and sk_msg_md to name a few. > Extend this support to bpf_sysctl by using bpf_ctx_range instead of > offsetof when matching field offsets. > > Fixes: 7b146cebe30c ("bpf: Sysctl hook") > Fixes: e1550bfe0de4 ("bpf: Add file_pos field to bpf_sysctl ctx") > Fixes: 9a1027e52535 ("selftests/bpf: Test file_pos field in bpf_sysctl ctx") > Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Thanks for following up with the test case and for the bugfix itself! Acked-by: Andrey Ignatov <rdna@fb.com> > --- > kernel/bpf/cgroup.c | 4 ++-- > tools/testing/selftests/bpf/test_sysctl.c | 8 +++++++- > 2 files changed, 9 insertions(+), 3 deletions(-) > > diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c > index ddd8addcdb5c..a3eaf08e7dd3 100644 > --- a/kernel/bpf/cgroup.c > +++ b/kernel/bpf/cgroup.c > @@ -1311,12 +1311,12 @@ static bool sysctl_is_valid_access(int off, int size, enum bpf_access_type type, > return false; > > switch (off) { > - case offsetof(struct bpf_sysctl, write): > + case bpf_ctx_range(struct bpf_sysctl, write): > if (type != BPF_READ) > return false; > bpf_ctx_record_field_size(info, size_default); > return bpf_ctx_narrow_access_ok(off, size, size_default); > - case offsetof(struct bpf_sysctl, file_pos): > + case bpf_ctx_range(struct bpf_sysctl, file_pos): > if (type == BPF_READ) { > bpf_ctx_record_field_size(info, size_default); > return bpf_ctx_narrow_access_ok(off, size, size_default); > diff --git a/tools/testing/selftests/bpf/test_sysctl.c b/tools/testing/selftests/bpf/test_sysctl.c > index a320e3844b17..7c6e5b173f33 100644 > --- a/tools/testing/selftests/bpf/test_sysctl.c > +++ b/tools/testing/selftests/bpf/test_sysctl.c > @@ -161,9 +161,14 @@ static struct sysctl_test tests[] = { > .descr = "ctx:file_pos sysctl:read read ok narrow", > .insns = { > /* If (file_pos == X) */ > +#if __BYTE_ORDER == __LITTLE_ENDIAN > BPF_LDX_MEM(BPF_B, BPF_REG_7, BPF_REG_1, > offsetof(struct bpf_sysctl, file_pos)), > - BPF_JMP_IMM(BPF_JNE, BPF_REG_7, 0, 2), > +#else > + BPF_LDX_MEM(BPF_B, BPF_REG_7, BPF_REG_1, > + offsetof(struct bpf_sysctl, file_pos) + 3), > +#endif > + BPF_JMP_IMM(BPF_JNE, BPF_REG_7, 4, 2), > > /* return ALLOW; */ > BPF_MOV64_IMM(BPF_REG_0, 1), > @@ -176,6 +181,7 @@ static struct sysctl_test tests[] = { > .attach_type = BPF_CGROUP_SYSCTL, > .sysctl = "kernel/ostype", > .open_flags = O_RDONLY, > + .seek = 4, > .result = SUCCESS, > }, > { > -- > 2.23.0 >
On Tue, Oct 29, 2019 at 8:16 AM Andrey Ignatov <rdna@fb.com> wrote: > > Ilya Leoshkevich <iii@linux.ibm.com> [Tue, 2019-10-29 07:20 -0700]: > > > Am 29.10.2019 um 05:36 schrieb Andrii Nakryiko <andrii.nakryiko@gmail.com>: > > > > > > On Mon, Oct 28, 2019 at 1:09 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote: > > >> > > >> --- a/kernel/bpf/cgroup.c > > >> +++ b/kernel/bpf/cgroup.c > > >> @@ -1311,12 +1311,12 @@ static bool sysctl_is_valid_access(int off, int size, enum bpf_access_type type, > > >> return false; > > >> > > >> switch (off) { > > >> - case offsetof(struct bpf_sysctl, write): > > >> + case bpf_ctx_range(struct bpf_sysctl, write): > > > > > > this will actually allow reads pas t write field (e.g., offset = 2, size = 4). > > > > Wouldn't > > > > if (off < 0 || off + size > sizeof(struct bpf_sysctl) || off % size) > > return false; > > > > prevent all OOB read-write attempts? Especially the off % size part - I > > think it has the effect of preventing OOB accesses for fields. In > > particular, it would filter offset = 2, size = 4 case. > > Yes, it would. This code makes sure that narrow accesses are aligned so > that offset = 2 would allow only size = 2 or size = 1. Yes, you both are right, I missed the "off % size" check above. Thanks. Looks good to me as well. Acked-by: Andrii Nakryiko <andriin@fb.com> > > > I have also checked the other usages of bpf_ctx_range, for example, > > bpf_skb_is_valid_access, and they don't seem to be doing anything > > special. > > Yes, sysctl hook follows logic similar to that of other program types. > > > >> if (type != BPF_READ) > > >> return false; > > >> bpf_ctx_record_field_size(info, size_default); > > >> return bpf_ctx_narrow_access_ok(off, size, size_default); > > >> - case offsetof(struct bpf_sysctl, file_pos): > > >> + case bpf_ctx_range(struct bpf_sysctl, file_pos) > > > > > > this will allow read past context struct altogether. When we allow > > > ranges, we will have to adjust allowed read size. > > > > Same here. > > -- > Andrey Ignatov
Applied to bpf tree. Thanks
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c index ddd8addcdb5c..a3eaf08e7dd3 100644 --- a/kernel/bpf/cgroup.c +++ b/kernel/bpf/cgroup.c @@ -1311,12 +1311,12 @@ static bool sysctl_is_valid_access(int off, int size, enum bpf_access_type type, return false; switch (off) { - case offsetof(struct bpf_sysctl, write): + case bpf_ctx_range(struct bpf_sysctl, write): if (type != BPF_READ) return false; bpf_ctx_record_field_size(info, size_default); return bpf_ctx_narrow_access_ok(off, size, size_default); - case offsetof(struct bpf_sysctl, file_pos): + case bpf_ctx_range(struct bpf_sysctl, file_pos): if (type == BPF_READ) { bpf_ctx_record_field_size(info, size_default); return bpf_ctx_narrow_access_ok(off, size, size_default); diff --git a/tools/testing/selftests/bpf/test_sysctl.c b/tools/testing/selftests/bpf/test_sysctl.c index a320e3844b17..7c6e5b173f33 100644 --- a/tools/testing/selftests/bpf/test_sysctl.c +++ b/tools/testing/selftests/bpf/test_sysctl.c @@ -161,9 +161,14 @@ static struct sysctl_test tests[] = { .descr = "ctx:file_pos sysctl:read read ok narrow", .insns = { /* If (file_pos == X) */ +#if __BYTE_ORDER == __LITTLE_ENDIAN BPF_LDX_MEM(BPF_B, BPF_REG_7, BPF_REG_1, offsetof(struct bpf_sysctl, file_pos)), - BPF_JMP_IMM(BPF_JNE, BPF_REG_7, 0, 2), +#else + BPF_LDX_MEM(BPF_B, BPF_REG_7, BPF_REG_1, + offsetof(struct bpf_sysctl, file_pos) + 3), +#endif + BPF_JMP_IMM(BPF_JNE, BPF_REG_7, 4, 2), /* return ALLOW; */ BPF_MOV64_IMM(BPF_REG_0, 1), @@ -176,6 +181,7 @@ static struct sysctl_test tests[] = { .attach_type = BPF_CGROUP_SYSCTL, .sysctl = "kernel/ostype", .open_flags = O_RDONLY, + .seek = 4, .result = SUCCESS, }, {
"ctx:file_pos sysctl:read read ok narrow" works on s390 by accident: it reads the wrong byte, which happens to have the expected value of 0. Improve the test by seeking to the 4th byte and expecting 4 instead of 0. This makes the latent problem apparent: the test attempts to read the first byte of bpf_sysctl.file_pos, assuming this is the least-significant byte, which is not the case on big-endian machines: a non-zero offset is needed. The point of the test is to verify narrow loads, so we cannot cheat our way out by simply using BPF_W. The existence of the test means that such loads have to be supported, most likely because llvm can generate them. Fix the test by adding a big-endian variant, which uses an offset to access the least-significant byte of bpf_sysctl.file_pos. This reveals the final problem: verifier rejects accesses to bpf_sysctl fields with offset > 0. Such accesses are already allowed for a wide range of structs: __sk_buff, bpf_sock_addr and sk_msg_md to name a few. Extend this support to bpf_sysctl by using bpf_ctx_range instead of offsetof when matching field offsets. Fixes: 7b146cebe30c ("bpf: Sysctl hook") Fixes: e1550bfe0de4 ("bpf: Add file_pos field to bpf_sysctl ctx") Fixes: 9a1027e52535 ("selftests/bpf: Test file_pos field in bpf_sysctl ctx") Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> --- kernel/bpf/cgroup.c | 4 ++-- tools/testing/selftests/bpf/test_sysctl.c | 8 +++++++- 2 files changed, 9 insertions(+), 3 deletions(-)