Message ID | 20190628231049.22149-1-sdf@google.com |
---|---|
State | Changes Requested |
Delegated to: | BPF Maintainers |
Series | [bpf-next,1/2] bpf: allow wide (u64) aligned stores for some fields of bpf_sock_addr |
On 6/28/19 4:10 PM, Stanislav Fomichev wrote:
> Since commit cd17d7770578 ("bpf/tools: sync bpf.h") clang decided
> that it can do a single u64 store into user_ip6[2] instead of two
> separate u32 ones:
>
>  # 17: (18) r2 = 0x100000000000000
>  # ; ctx->user_ip6[2] = bpf_htonl(DST_REWRITE_IP6_2);
>  # 19: (7b) *(u64 *)(r1 +16) = r2
>  # invalid bpf_context access off=16 size=8
>
> From the compiler point of view it does look like a correct thing
> to do, so let's support it on the kernel side.
>
> Credit to Andrii Nakryiko for a proper implementation of
> bpf_ctx_wide_store_ok.
>
> Cc: Andrii Nakryiko <andriin@fb.com>
> Cc: Yonghong Song <yhs@fb.com>
> Fixes: cd17d7770578 ("bpf/tools: sync bpf.h")
> Reported-by: kernel test robot <rong.a.chen@intel.com>
> Signed-off-by: Stanislav Fomichev <sdf@google.com>

The change looks good to me with the following nits:
  1. Could you add a cover letter for the patch set?
     Typically, if there is more than one patch,
     it is good practice to include a cover letter.
     See bpf_devel_QA.rst.
  2. With this change, the comments in uapi bpf.h
     are not accurate any more.
        __u32 user_ip6[4];      /* Allows 1,2,4-byte read and 4-byte write.
                                 * Stored in network byte order.
                                 */
        __u32 msg_src_ip6[4];   /* Allows 1,2,4-byte read and 4-byte write.
                                 * Stored in network byte order.
                                 */
     Now, for stores, an aligned 8-byte write is permitted.
     Could you update this as well?

From the typical usage pattern, I did not see a need
for an 8-byte read of user_ip6 and msg_src_ip6 yet. So let
us just deal with writes for now.
With the above two nits,
Acked-by: Yonghong Song <yhs@fb.com>

> ---
>  include/linux/filter.h |  6 ++++++
>  net/core/filter.c      | 22 ++++++++++++++--------
>  2 files changed, 20 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index 340f7d648974..3901007e36f1 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -746,6 +746,12 @@ bpf_ctx_narrow_access_ok(u32 off, u32 size, u32 size_default)
>  	return size <= size_default && (size & (size - 1)) == 0;
>  }
>
> +#define bpf_ctx_wide_store_ok(off, size, type, field)	\
> +	(size == sizeof(__u64) &&	\
> +	off >= offsetof(type, field) &&	\
> +	off + sizeof(__u64) <= offsetofend(type, field) &&	\
> +	off % sizeof(__u64) == 0)
> +
>  #define bpf_classic_proglen(fprog) (fprog->len * sizeof(fprog->filter[0]))
>
>  static inline void bpf_prog_lock_ro(struct bpf_prog *fp)
> diff --git a/net/core/filter.c b/net/core/filter.c
> index dc8534be12fc..5d33f2146dab 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -6849,6 +6849,16 @@ static bool sock_addr_is_valid_access(int off, int size,
>  		if (!bpf_ctx_narrow_access_ok(off, size, size_default))
>  			return false;
>  	} else {
> +		if (bpf_ctx_wide_store_ok(off, size,
> +					  struct bpf_sock_addr,
> +					  user_ip6))
> +			return true;
> +
> +		if (bpf_ctx_wide_store_ok(off, size,
> +					  struct bpf_sock_addr,
> +					  msg_src_ip6))
> +			return true;
> +
>  		if (size != size_default)
>  			return false;
>  	}
> @@ -7689,9 +7699,6 @@ static u32 xdp_convert_ctx_access(enum bpf_access_type type,
>  /* SOCK_ADDR_STORE_NESTED_FIELD_OFF() has semantic similar to
>   * SOCK_ADDR_LOAD_NESTED_FIELD_SIZE_OFF() but for store operation.
>   *
> - * It doesn't support SIZE argument though since narrow stores are not
> - * supported for now.
> - *
>   * In addition it uses Temporary Field TF (member of struct S) as the 3rd
>   * "register" since two registers available in convert_ctx_access are not
>   * enough: we can't override neither SRC, since it contains value to store, nor
> @@ -7699,7 +7706,7 @@ static u32 xdp_convert_ctx_access(enum bpf_access_type type,
>   * instructions. But we need a temporary place to save pointer to nested
>   * structure whose field we want to store to.
>   */
> -#define SOCK_ADDR_STORE_NESTED_FIELD_OFF(S, NS, F, NF, OFF, TF)	\
> +#define SOCK_ADDR_STORE_NESTED_FIELD_OFF(S, NS, F, NF, SIZE, OFF, TF)	\
>  	do {	\
>  		int tmp_reg = BPF_REG_9;	\
>  		if (si->src_reg == tmp_reg || si->dst_reg == tmp_reg)	\
> @@ -7710,8 +7717,7 @@ static u32 xdp_convert_ctx_access(enum bpf_access_type type,
>  				      offsetof(S, TF));	\
>  		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(S, F), tmp_reg,	\
>  				      si->dst_reg, offsetof(S, F));	\
> -		*insn++ = BPF_STX_MEM(	\
> -			BPF_FIELD_SIZEOF(NS, NF), tmp_reg, si->src_reg,	\
> +		*insn++ = BPF_STX_MEM(SIZE, tmp_reg, si->src_reg,	\
>  			bpf_target_off(NS, NF, FIELD_SIZEOF(NS, NF),	\
>  				       target_size)	\
>  				+ OFF);	\
> @@ -7723,8 +7729,8 @@ static u32 xdp_convert_ctx_access(enum bpf_access_type type,
>  						      TF)	\
>  	do {	\
>  		if (type == BPF_WRITE) {	\
> -			SOCK_ADDR_STORE_NESTED_FIELD_OFF(S, NS, F, NF, OFF,	\
> -							 TF);	\
> +			SOCK_ADDR_STORE_NESTED_FIELD_OFF(S, NS, F, NF, SIZE,	\
> +							 OFF, TF);	\
>  		} else {	\
>  			SOCK_ADDR_LOAD_NESTED_FIELD_SIZE_OFF(	\
>  				S, NS, F, NF, SIZE, OFF);	\
>
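
The store merging described in the commit message can be illustrated in plain user-space C. This is only a sketch under stated assumptions: `store_two_u32` and `store_one_u64` are hypothetical helper names, not kernel or clang code, and the array stands in for `user_ip6`. It shows that one 8-byte store over words 2 and 3 leaves memory in the same state as two 4-byte stores, which is why the compiler may treat the two forms as interchangeable.

```c
#include <stdint.h>
#include <string.h>

/* Two separate 32-bit stores, as the program source is written. */
static void store_two_u32(uint32_t ip6[4], uint32_t w2, uint32_t w3)
{
	ip6[2] = w2;
	ip6[3] = w3;
}

/* One 64-bit store covering the same 8 bytes, as a compiler may emit.
 * memcpy is used so the sketch stays free of aliasing problems.
 */
static void store_one_u64(uint32_t ip6[4], uint32_t w2, uint32_t w3)
{
	uint32_t pair[2] = { w2, w3 };
	uint64_t wide;

	memcpy(&wide, pair, sizeof(wide));    /* pack both words */
	memcpy(&ip6[2], &wide, sizeof(wide)); /* single 8-byte write */
}
```

Both helpers leave the array byte-for-byte identical; the difference is only in the access size that the verifier sees (4 versus 8).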
On Sat, Jun 29, 2019 at 10:53 PM Yonghong Song <yhs@fb.com> wrote:
>
> From the typical usage pattern, I did not see a need
> for an 8-byte read of user_ip6 and msg_src_ip6 yet. So let
> us just deal with writes for now.

But I guess it's still possible for clang to optimize two consecutive
4-byte reads into a single 8-byte read in some circumstances? If that's
the case, maybe it's a good idea to have corresponding read checks as
well?

But overall this looks good to me:

Acked-by: Andrii Nakryiko <andriin@fb.com>
On 06/30, Yonghong Song wrote:
> The change looks good to me with the following nits:
>   1. Could you add a cover letter for the patch set?
>   2. With this change, the comments in uapi bpf.h
>      are not accurate any more.
>
> With the above two nits,
> Acked-by: Yonghong Song <yhs@fb.com>

Thank you for the review, will follow up with a v2 shortly with both
things addressed!
On 07/01, Andrii Nakryiko wrote:
> But I guess it's still possible for clang to optimize two consecutive
> 4-byte reads into a single 8-byte read in some circumstances? If that's
> the case, maybe it's a good idea to have corresponding read checks as
> well?

I guess clang can do those kinds of optimizations. I can put it on my
todo and address it later (or when we actually see it out in the wild).

> But overall this looks good to me:
>
> Acked-by: Andrii Nakryiko <andriin@fb.com>

Thanks for the review!
On 7/1/19 9:04 AM, Stanislav Fomichev wrote:
> On 07/01, Andrii Nakryiko wrote:
>> But I guess it's still possible for clang to optimize two consecutive
>> 4-byte reads into a single 8-byte read in some circumstances? If that's
>> the case, maybe it's a good idea to have corresponding read checks as
>> well?
> I guess clang can do those kinds of optimizations. I can put it on my
> todo and address it later (or when we actually see it out in the wild).

Okay, I found a Facebook internal app that does try to read the 4 bytes
and compare them to a predefined loopback address. We may need to handle
read cases as well. But this can be a follow-up after an actual tryout.
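
The read pattern mentioned here, comparing the address against a loopback constant, might look roughly like the sketch below. It is a user-space illustration only: `loopback6`, `is_loopback_u32` and `is_loopback_u64` are hypothetical names, and whether clang actually merges the loads in a given BPF program is an assumption that would need the tryout discussed in the thread. The point is that a word-by-word comparison and a comparison using two 8-byte loads are equivalent, so a compiler is free to pick the wider form.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* ::1 as raw bytes in network byte order. */
static const uint8_t loopback6[16] = { [15] = 1 };

/* Word-by-word comparison: four 4-byte loads, as typically written. */
static bool is_loopback_u32(const uint32_t ip6[4])
{
	uint32_t want[4];

	memcpy(want, loopback6, sizeof(want));
	return ip6[0] == want[0] && ip6[1] == want[1] &&
	       ip6[2] == want[2] && ip6[3] == want[3];
}

/* The same check using two 8-byte loads, which a compiler may produce. */
static bool is_loopback_u64(const uint32_t ip6[4])
{
	uint64_t got[2], want[2];

	memcpy(got, ip6, sizeof(got));
	memcpy(want, loopback6, sizeof(want));
	return got[0] == want[0] && got[1] == want[1];
}
```

If the second form were emitted against the context structure, it would need 8-byte reads to be accepted, which is the follow-up being considered.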
On 07/01, Yonghong Song wrote:
> Okay, I found a Facebook internal app that does try to read the 4 bytes
> and compare them to a predefined loopback address. We may need to handle
> read cases as well. But this can be a follow-up after an actual tryout.

Sounds good, will follow up on that.
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 340f7d648974..3901007e36f1 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -746,6 +746,12 @@ bpf_ctx_narrow_access_ok(u32 off, u32 size, u32 size_default)
 	return size <= size_default && (size & (size - 1)) == 0;
 }
 
+#define bpf_ctx_wide_store_ok(off, size, type, field)	\
+	(size == sizeof(__u64) &&	\
+	off >= offsetof(type, field) &&	\
+	off + sizeof(__u64) <= offsetofend(type, field) &&	\
+	off % sizeof(__u64) == 0)
+
 #define bpf_classic_proglen(fprog) (fprog->len * sizeof(fprog->filter[0]))
 
 static inline void bpf_prog_lock_ro(struct bpf_prog *fp)
diff --git a/net/core/filter.c b/net/core/filter.c
index dc8534be12fc..5d33f2146dab 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -6849,6 +6849,16 @@ static bool sock_addr_is_valid_access(int off, int size,
 			if (!bpf_ctx_narrow_access_ok(off, size, size_default))
 				return false;
 		} else {
+			if (bpf_ctx_wide_store_ok(off, size,
+						  struct bpf_sock_addr,
+						  user_ip6))
+				return true;
+
+			if (bpf_ctx_wide_store_ok(off, size,
+						  struct bpf_sock_addr,
+						  msg_src_ip6))
+				return true;
+
 			if (size != size_default)
 				return false;
 		}
@@ -7689,9 +7699,6 @@ static u32 xdp_convert_ctx_access(enum bpf_access_type type,
 /* SOCK_ADDR_STORE_NESTED_FIELD_OFF() has semantic similar to
  * SOCK_ADDR_LOAD_NESTED_FIELD_SIZE_OFF() but for store operation.
  *
- * It doesn't support SIZE argument though since narrow stores are not
- * supported for now.
- *
  * In addition it uses Temporary Field TF (member of struct S) as the 3rd
  * "register" since two registers available in convert_ctx_access are not
  * enough: we can't override neither SRC, since it contains value to store, nor
@@ -7699,7 +7706,7 @@ static u32 xdp_convert_ctx_access(enum bpf_access_type type,
  * instructions. But we need a temporary place to save pointer to nested
  * structure whose field we want to store to.
  */
-#define SOCK_ADDR_STORE_NESTED_FIELD_OFF(S, NS, F, NF, OFF, TF)	\
+#define SOCK_ADDR_STORE_NESTED_FIELD_OFF(S, NS, F, NF, SIZE, OFF, TF)	\
 	do {	\
 		int tmp_reg = BPF_REG_9;	\
 		if (si->src_reg == tmp_reg || si->dst_reg == tmp_reg)	\
@@ -7710,8 +7717,7 @@ static u32 xdp_convert_ctx_access(enum bpf_access_type type,
 				      offsetof(S, TF));	\
 		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(S, F), tmp_reg,	\
 				      si->dst_reg, offsetof(S, F));	\
-		*insn++ = BPF_STX_MEM(	\
-			BPF_FIELD_SIZEOF(NS, NF), tmp_reg, si->src_reg,	\
+		*insn++ = BPF_STX_MEM(SIZE, tmp_reg, si->src_reg,	\
 			bpf_target_off(NS, NF, FIELD_SIZEOF(NS, NF),	\
 				       target_size)	\
 				+ OFF);	\
@@ -7723,8 +7729,8 @@ static u32 xdp_convert_ctx_access(enum bpf_access_type type,
 						      TF)	\
 	do {	\
 		if (type == BPF_WRITE) {	\
-			SOCK_ADDR_STORE_NESTED_FIELD_OFF(S, NS, F, NF, OFF,	\
-							 TF);	\
+			SOCK_ADDR_STORE_NESTED_FIELD_OFF(S, NS, F, NF, SIZE,	\
+							 OFF, TF);	\
 		} else {	\
 			SOCK_ADDR_LOAD_NESTED_FIELD_SIZE_OFF(	\
 				S, NS, F, NF, SIZE, OFF);	\
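To see which accesses the bpf_ctx_wide_store_ok() conditions admit, here is a minimal userspace sketch, not kernel code. It assumes user_ip6 starts at byte 8 of the context, which is what off=16 for user_ip6[2] in the verifier log implies. The names struct mock_sock_addr, wide_store_ok and user_ip6_wide_store_ok are hypothetical and exist only for this illustration; offsetofend is redefined locally because the kernel helper is not available in userspace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the leading fields of struct bpf_sock_addr;
 * only the position and size of user_ip6 matter here. */
struct mock_sock_addr {
	uint32_t user_family;
	uint32_t user_ip4;
	uint32_t user_ip6[4];	/* assumed to occupy bytes 8..23 */
	uint32_t user_port;
};

static_assert(offsetof(struct mock_sock_addr, user_ip6) == 8,
	      "user_ip6 is assumed to start at byte 8");

/* Local userspace equivalent of the kernel's offsetofend(). */
#define offsetofend(type, field) \
	(offsetof(type, field) + sizeof(((type *)0)->field))

/* Same four conditions as the macro in the patch; the arguments are
 * parenthesized here so the sketch is safe with arbitrary expressions. */
#define wide_store_ok(off, size, type, field)			\
	((size) == sizeof(uint64_t) &&				\
	 (off) >= offsetof(type, field) &&			\
	 (off) + sizeof(uint64_t) <= offsetofend(type, field) &&	\
	 (off) % sizeof(uint64_t) == 0)

/* Returns nonzero if a store of `size` bytes at byte offset `off`
 * is an aligned 8-byte store that lies entirely inside user_ip6. */
static int user_ip6_wide_store_ok(size_t off, size_t size)
{
	return wide_store_ok(off, size, struct mock_sock_addr, user_ip6);
}
```

Under the assumed layout, 8-byte stores at offsets 8 and 16 pass. An unaligned 8-byte store at offset 12, an 8-byte store at offset 20 that would run past the end of the field, and any 4-byte store are rejected by this check and fall through to the existing size_default comparison.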
Since commit cd17d7770578 ("bpf/tools: sync bpf.h") clang decided
that it can do a single u64 store into user_ip6[2] instead of two
separate u32 ones:

 #  17: (18) r2 = 0x100000000000000
 #  ; ctx->user_ip6[2] = bpf_htonl(DST_REWRITE_IP6_2);
 #  19: (7b) *(u64 *)(r1 +16) = r2
 #  invalid bpf_context access off=16 size=8

From the compiler point of view it does look like a correct thing
to do, so let's support it on the kernel side.

Credit to Andrii Nakryiko for a proper implementation of
bpf_ctx_wide_store_ok.

Cc: Andrii Nakryiko <andriin@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Fixes: cd17d7770578 ("bpf/tools: sync bpf.h")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 include/linux/filter.h |  6 ++++++
 net/core/filter.c      | 22 ++++++++++++++--------
 2 files changed, 20 insertions(+), 8 deletions(-)
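The claim that the merged store is correct from the compiler's point of view can be checked in userspace. The sketch below is an illustration under stated assumptions, not code from the patch: struct mock_ctx is a hypothetical stand-in for the start of struct bpf_sock_addr, and store_narrow, store_wide and stores_match are made-up helper names. It compares two adjacent 4-byte stores into user_ip6[2] and user_ip6[3] with one 8-byte store at byte offset 16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the start of struct bpf_sock_addr. */
struct mock_ctx {
	uint32_t user_family;
	uint32_t user_ip4;
	uint32_t user_ip6[4];
};

/* Byte offset of user_ip6[2]; the verifier log above reports off=16. */
#define WIDE_OFF (offsetof(struct mock_ctx, user_ip6) + 2 * sizeof(uint32_t))

static_assert(WIDE_OFF == 16, "user_ip6[2] is assumed to sit at byte 16");

/* What the C source asks for: two separate 4-byte stores. */
static void store_narrow(struct mock_ctx *ctx, uint32_t w2, uint32_t w3)
{
	ctx->user_ip6[2] = w2;
	ctx->user_ip6[3] = w3;
}

/* What the compiler may emit instead: one 8-byte store at WIDE_OFF. */
static void store_wide(struct mock_ctx *ctx, uint32_t w2, uint32_t w3)
{
	uint32_t words[2] = { w2, w3 };
	uint64_t wide;

	memcpy(&wide, words, sizeof(wide));	/* combine in memory order */
	memcpy((char *)ctx + WIDE_OFF, &wide, sizeof(wide));
}

/* Returns 1 when both strategies leave identical bytes in the context. */
static int stores_match(uint32_t w2, uint32_t w3)
{
	struct mock_ctx a, b;

	memset(&a, 0, sizeof(a));
	memset(&b, 0, sizeof(b));
	store_narrow(&a, w2, w3);
	store_wide(&b, w2, w3);
	return memcmp(&a, &b, sizeof(a)) == 0;
}
```

Both strategies leave the same bytes in memory, so only the verifier's access check distinguishes them. As a reading of the log rather than something the commit message states, the constant 0x100000000000000 in r2 is what a little-endian target would hold when user_ip6[2] is 0 and user_ip6[3] is bpf_htonl(1).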