Message ID: 20170426182419.14574-7-hannes@stressinduktion.org
State: Changes Requested, archived
Delegated to: David Miller
On Wed, Apr 26, 2017 at 08:24:19PM +0200, Hannes Frederic Sowa wrote:
>
> +static const char *bpf_type_string(enum bpf_prog_type type)
> +{
> +	static const char *bpf_type_names[] = {
> +#define X(type) #type
> +		BPF_PROG_TYPES
> +#undef X
> +	};
> +
> +	if (type >= ARRAY_SIZE(bpf_type_names))
> +		return "<unknown>";
> +
> +	return bpf_type_names[type];
> +}
> +
>  static int ebpf_proc_show(struct seq_file *s, void *v)
>  {
> +	struct bpf_prog *prog;
> +	struct bpf_prog_aux *aux;
> +	char prog_tag[sizeof(prog->tag) * 2 + 1] = { };
> +
>  	if (v == SEQ_START_TOKEN) {
> -		seq_printf(s, "# tag\n");
> +		seq_printf(s, "# tag\t\t\ttype\t\t\truntime\tcap\tmemlock\n");
>  		return 0;
>  	}
>
> +	aux = v;
> +	prog = aux->prog;
> +
> +	bin2hex(prog_tag, prog->tag, sizeof(prog->tag));
> +	seq_printf(s, "%s\t%s\t%s\t%s\t%llu\n", prog_tag,
> +		   bpf_type_string(prog->type),
> +		   prog->jited ? "jit" : "int",
> +		   prog->priv_cap_sys_admin ? "priv" : "unpriv",
> +		   prog->pages * 1ULL << PAGE_SHIFT);

As I said several times already, I'm strongly against a procfs style of exposing information about the programs. I don't want this to become debugfs for bpf.

Maintaining the list of all loaded programs is fine, and we need a way to iterate through them, but procfs is obviously not the interface to do that. Programs/maps are binary, whereas any fs interface is text. It also doesn't scale with a large number of programs/maps.

I prefer Daniel's suggestion of adding a 'get_next'-like API. Also it would be good if you can wait for Martin to finish his prog->handle/id patches. Then user space will be able to iterate through all the progs/maps and fetch all info about them through the syscall in an extensible way. And you wouldn't need to abuse the kallsyms list for a different purpose.
On 04/26/2017 08:24 PM, Hannes Frederic Sowa wrote:
> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> ---
>  include/uapi/linux/bpf.h | 32 +++++++++++++++++++-------------
>  kernel/bpf/core.c        | 30 +++++++++++++++++++++++++++++-
>  2 files changed, 48 insertions(+), 14 deletions(-)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index e553529929f683..d6506e320953d5 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -101,20 +101,26 @@ enum bpf_map_type {
>  	BPF_MAP_TYPE_HASH_OF_MAPS,
>  };
>
> +#define BPF_PROG_TYPES \
> +	X(BPF_PROG_TYPE_UNSPEC), \
> +	X(BPF_PROG_TYPE_SOCKET_FILTER), \
> +	X(BPF_PROG_TYPE_KPROBE), \
> +	X(BPF_PROG_TYPE_SCHED_CLS), \
> +	X(BPF_PROG_TYPE_SCHED_ACT), \
> +	X(BPF_PROG_TYPE_TRACEPOINT), \
> +	X(BPF_PROG_TYPE_XDP), \
> +	X(BPF_PROG_TYPE_PERF_EVENT), \
> +	X(BPF_PROG_TYPE_CGROUP_SKB), \
> +	X(BPF_PROG_TYPE_CGROUP_SOCK), \
> +	X(BPF_PROG_TYPE_LWT_IN), \
> +	X(BPF_PROG_TYPE_LWT_OUT), \
> +	X(BPF_PROG_TYPE_LWT_XMIT),
> +
> +
>  enum bpf_prog_type {
> -	BPF_PROG_TYPE_UNSPEC,
> -	BPF_PROG_TYPE_SOCKET_FILTER,
> -	BPF_PROG_TYPE_KPROBE,
> -	BPF_PROG_TYPE_SCHED_CLS,
> -	BPF_PROG_TYPE_SCHED_ACT,
> -	BPF_PROG_TYPE_TRACEPOINT,
> -	BPF_PROG_TYPE_XDP,
> -	BPF_PROG_TYPE_PERF_EVENT,
> -	BPF_PROG_TYPE_CGROUP_SKB,
> -	BPF_PROG_TYPE_CGROUP_SOCK,
> -	BPF_PROG_TYPE_LWT_IN,
> -	BPF_PROG_TYPE_LWT_OUT,
> -	BPF_PROG_TYPE_LWT_XMIT,
> +#define X(type) type

Defining X in uapi could easily clash with other headers, e.g. from the application side.

> +	BPF_PROG_TYPES
> +#undef X
>  };
>
>  enum bpf_attach_type {
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 3ba175a24e971a..685c1d0f31e029 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -536,13 +536,41 @@ static void ebpf_proc_stop(struct seq_file *s, void *v)
>  	rcu_read_unlock();
>  }
>
> +static const char *bpf_type_string(enum bpf_prog_type type)
> +{
> +	static const char *bpf_type_names[] = {
> +#define X(type) #type
> +		BPF_PROG_TYPES
> +#undef X
> +	};
> +
> +	if (type >= ARRAY_SIZE(bpf_type_names))
> +		return "<unknown>";
> +
> +	return bpf_type_names[type];
> +}
> +
>  static int ebpf_proc_show(struct seq_file *s, void *v)
>  {
> +	struct bpf_prog *prog;
> +	struct bpf_prog_aux *aux;
> +	char prog_tag[sizeof(prog->tag) * 2 + 1] = { };
> +
>  	if (v == SEQ_START_TOKEN) {
> -		seq_printf(s, "# tag\n");
> +		seq_printf(s, "# tag\t\t\ttype\t\t\truntime\tcap\tmemlock\n");
>  		return 0;
>  	}
>
> +	aux = v;
> +	prog = aux->prog;
> +
> +	bin2hex(prog_tag, prog->tag, sizeof(prog->tag));
> +	seq_printf(s, "%s\t%s\t%s\t%s\t%llu\n", prog_tag,
> +		   bpf_type_string(prog->type),
> +		   prog->jited ? "jit" : "int",
> +		   prog->priv_cap_sys_admin ? "priv" : "unpriv",
> +		   prog->pages * 1ULL << PAGE_SHIFT);

Yeah, so that would be quite similar to what we dump in bpf_prog_show_fdinfo() modulo the priv bit.

I generally agree that a facility for dumping all progs is needed, and it was also on the TODO list after the bpf(2) cmd for dumping program insns back to user space. I think the procfs interface has pros and cons: the upside is that you can use it with tools like cat to inspect it, but what you still cannot do is say that you want to see the prog insns for, say, prog #4 from that list. If we could iterate over that list through fds via the bpf(2) syscall, you could i) present the same info you have above via fdinfo already, and ii) also dump the BPF insns from that specific program through a BPF_PROG_DUMP bpf(2) command. Once that dump also supports maps in progs, you could go further and fetch related map fds for inspection, etc.

Such an option of iterating through that would need a new BPF syscall cmd, aka BPF_PROG_GET_NEXT, which returns the first prog from the list, and you would walk to the next one by passing in the current fd, which can later also be closed when not needed anymore. We could restrict that dump to capable(CAP_SYS_ADMIN), and the kernel tree would need to ship a tool, e.g. under tools/bpf/, that can be used for inspection.

> +
>  	return 0;
>  }
>
On 26.04.2017 23:35, Daniel Borkmann wrote:
> On 04/26/2017 08:24 PM, Hannes Frederic Sowa wrote:
>> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
>> ---
>>  include/uapi/linux/bpf.h | 32 +++++++++++++++++++-------------
>>  kernel/bpf/core.c        | 30 +++++++++++++++++++++++++++++-
>>  2 files changed, 48 insertions(+), 14 deletions(-)
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index e553529929f683..d6506e320953d5 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -101,20 +101,26 @@ enum bpf_map_type {
>>  	BPF_MAP_TYPE_HASH_OF_MAPS,
>>  };
>>
>> +#define BPF_PROG_TYPES \
>> +	X(BPF_PROG_TYPE_UNSPEC), \
>> +	X(BPF_PROG_TYPE_SOCKET_FILTER), \
>> +	X(BPF_PROG_TYPE_KPROBE), \
>> +	X(BPF_PROG_TYPE_SCHED_CLS), \
>> +	X(BPF_PROG_TYPE_SCHED_ACT), \
>> +	X(BPF_PROG_TYPE_TRACEPOINT), \
>> +	X(BPF_PROG_TYPE_XDP), \
>> +	X(BPF_PROG_TYPE_PERF_EVENT), \
>> +	X(BPF_PROG_TYPE_CGROUP_SKB), \
>> +	X(BPF_PROG_TYPE_CGROUP_SOCK), \
>> +	X(BPF_PROG_TYPE_LWT_IN), \
>> +	X(BPF_PROG_TYPE_LWT_OUT), \
>> +	X(BPF_PROG_TYPE_LWT_XMIT),
>> +
>> +
>>  enum bpf_prog_type {
>> -	BPF_PROG_TYPE_UNSPEC,
>> -	BPF_PROG_TYPE_SOCKET_FILTER,
>> -	BPF_PROG_TYPE_KPROBE,
>> -	BPF_PROG_TYPE_SCHED_CLS,
>> -	BPF_PROG_TYPE_SCHED_ACT,
>> -	BPF_PROG_TYPE_TRACEPOINT,
>> -	BPF_PROG_TYPE_XDP,
>> -	BPF_PROG_TYPE_PERF_EVENT,
>> -	BPF_PROG_TYPE_CGROUP_SKB,
>> -	BPF_PROG_TYPE_CGROUP_SOCK,
>> -	BPF_PROG_TYPE_LWT_IN,
>> -	BPF_PROG_TYPE_LWT_OUT,
>> -	BPF_PROG_TYPE_LWT_XMIT,
>> +#define X(type) type
>
> Defining X in uapi could easily clash with other headers,
> e.g. from the application side.

I think that X-macros are so widely used that code review should catch that no X macro is leaked beyond its scope. I certainly can use some other macro names, maybe something along the lines of bpf_types.h.
>> +	BPF_PROG_TYPES
>> +#undef X
>>  };
>>
>>  enum bpf_attach_type {
>> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
>> index 3ba175a24e971a..685c1d0f31e029 100644
>> --- a/kernel/bpf/core.c
>> +++ b/kernel/bpf/core.c
>> @@ -536,13 +536,41 @@ static void ebpf_proc_stop(struct seq_file *s, void *v)
>>  	rcu_read_unlock();
>>  }
>>
>> +static const char *bpf_type_string(enum bpf_prog_type type)
>> +{
>> +	static const char *bpf_type_names[] = {
>> +#define X(type) #type
>> +		BPF_PROG_TYPES
>> +#undef X
>> +	};
>> +
>> +	if (type >= ARRAY_SIZE(bpf_type_names))
>> +		return "<unknown>";
>> +
>> +	return bpf_type_names[type];
>> +}
>> +
>>  static int ebpf_proc_show(struct seq_file *s, void *v)
>>  {
>> +	struct bpf_prog *prog;
>> +	struct bpf_prog_aux *aux;
>> +	char prog_tag[sizeof(prog->tag) * 2 + 1] = { };
>> +
>>  	if (v == SEQ_START_TOKEN) {
>> -		seq_printf(s, "# tag\n");
>> +		seq_printf(s, "# tag\t\t\ttype\t\t\truntime\tcap\tmemlock\n");
>>  		return 0;
>>  	}
>>
>> +	aux = v;
>> +	prog = aux->prog;
>> +
>> +	bin2hex(prog_tag, prog->tag, sizeof(prog->tag));
>> +	seq_printf(s, "%s\t%s\t%s\t%s\t%llu\n", prog_tag,
>> +		   bpf_type_string(prog->type),
>> +		   prog->jited ? "jit" : "int",
>> +		   prog->priv_cap_sys_admin ? "priv" : "unpriv",
>> +		   prog->pages * 1ULL << PAGE_SHIFT);
>
> Yeah, so that would be quite similar to what we dump in
> bpf_prog_show_fdinfo() modulo the priv bit.

And, additionally, you don't need to have a file descriptor handy to do so, which is my main point.

> I generally agree that a facility for dumping all progs is needed
> and it was also on the TODO list after the bpf(2) cmd for dumping
> program insns back to user space.

I think this is orthogonal.

> I think the procfs interface has pros and cons: the upside is that
> you can use it with tools like cat to inspect it, but what you still
> cannot do is to say that you want to see the prog insns for, say,
> prog #4 from that list. If we could iterate over that list through fds
> via bpf(2) syscall, you could i) present the same info you have above
> via fdinfo already and ii) also dump the BPF insns from that specific
> program through a BPF_PROG_DUMP bpf(2) command. Once that dump also
> supports maps in progs, you could go further and fetch related map
> fds for inspection, etc.

Sure, that sounds super. But so far Linux and most (maybe I should write all) subsystems have always provided some easy way to get insights into the kernel without having to code or rely on special tools. And I very much enjoyed that part of Linux. FreeBSD very often abstracted details behind shared libraries and commands that have no easy counterpart usable with cat and grep.

> Such option of iterating through that would need a new BPF syscall
> cmd aka BPF_PROG_GET_NEXT which returns the first prog from the list
> and you would walk the next one by passing the current fd, which can
> later also be closed as not needed anymore. We could restrict that
> dump to capable(CAP_SYS_ADMIN), and the kernel tree would need to
> ship a tool e.g. under tools/bpf/ that can be used for inspection.

Martin already posted his patches which add a bpf_prog_id to ebpf programs. I will have a look at those and will comment over there.

Thanks for the review,
Hannes
On 26.04.2017 23:25, Alexei Starovoitov wrote:
> On Wed, Apr 26, 2017 at 08:24:19PM +0200, Hannes Frederic Sowa wrote:
>>
>> +static const char *bpf_type_string(enum bpf_prog_type type)
>> +{
>> +	static const char *bpf_type_names[] = {
>> +#define X(type) #type
>> +		BPF_PROG_TYPES
>> +#undef X
>> +	};
>> +
>> +	if (type >= ARRAY_SIZE(bpf_type_names))
>> +		return "<unknown>";
>> +
>> +	return bpf_type_names[type];
>> +}
>> +
>>  static int ebpf_proc_show(struct seq_file *s, void *v)
>>  {
>> +	struct bpf_prog *prog;
>> +	struct bpf_prog_aux *aux;
>> +	char prog_tag[sizeof(prog->tag) * 2 + 1] = { };
>> +
>>  	if (v == SEQ_START_TOKEN) {
>> -		seq_printf(s, "# tag\n");
>> +		seq_printf(s, "# tag\t\t\ttype\t\t\truntime\tcap\tmemlock\n");
>>  		return 0;
>>  	}
>>
>> +	aux = v;
>> +	prog = aux->prog;
>> +
>> +	bin2hex(prog_tag, prog->tag, sizeof(prog->tag));
>> +	seq_printf(s, "%s\t%s\t%s\t%s\t%llu\n", prog_tag,
>> +		   bpf_type_string(prog->type),
>> +		   prog->jited ? "jit" : "int",
>> +		   prog->priv_cap_sys_admin ? "priv" : "unpriv",
>> +		   prog->pages * 1ULL << PAGE_SHIFT);
>
> As I said several times already I'm strongly against procfs
> style of exposing information about the programs.

I would not expand the procfs interface to dump ebpf bytecode or jit code, albeit I would prefer a file system abstraction for that.

> I don't want this to become debugfs for bpf.

Right now it just prints a list of ebpf programs. Do you reject where things are going, or do you already reject this particular patch?

> Maintaining the list of all loaded programs is fine
> and we need a way to iterate through them, but procfs
> is obviously not the interface to do that.
> Programs/maps are binary whereas any fs interface is text.

It should give admins only a high-level overview of what is going on in their system. I don't want to print any kind of opcodes there, just give people the possibility to see what is going on.

> It also doesn't scale with large number of programs/maps.
> I prefer Daniel's suggestion on adding 'get_next' like API.
> Also would be good if you can wait for Martin to finish his
> prog->handle/id patches. Then user space will be able
> to iterate through all the progs/maps and fetch all info about
> them through syscall in extensible way.
> And you wouldn't need to abuse kallsyms list for different purpose.

I don't buy the scalability argument: what is the scalability issue with this O(1) approach on loading and O(n) (also without any locks held) on dumping? You can't do any better. You can even dump several programs with one syscall instead of slowly iterating over them.

Certainly I can wait with those patches until Martin has presented his patches. And regarding abusing the list, I just renamed it. If you think I abuse something, I can also add another list, like the bpf_prog_id patches do?

Bye bye,
Hannes
From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Thu, 27 Apr 2017 15:22:49 +0200

> Sure, that sounds super. But so far Linux and most (maybe I should write
> all) subsystems have always provided some easy way to get insights into
> the kernel without having to code or rely on special tools.

Not true. You cannot fully dump socket TCP internal state without netlink-based tools. It is just one of many examples.

Can you dump all nftables rules without a special tool?

I don't think this is a legitimate line of argument, and I want this to be done via the bpf() system call, which is what people are working on.

Thanks.
On 27.04.2017 18:00, David Miller wrote:
> From: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Date: Thu, 27 Apr 2017 15:22:49 +0200
>
>> Sure, that sounds super. But so far Linux and most (maybe I should write
>> all) subsystems have always provided some easy way to get insights into
>> the kernel without having to code or rely on special tools.
>
> Not true.

Yes, I should not have written it that generally. ;)

> You cannot fully dump socket TCP internal state without netlink-based
> tools. It is just one of many examples.
>
> Can you dump all nftables rules without a special tool?

You got me here; I agree that not all state is discoverable via procfs. But to some degree even netfilter and tcp do expose a considerable amount of data via procfs. In the case of netfilter it might be less valuable, though, I have to agree.

> I don't think this is a legitimate line of argument, and I want
> this to be done via the bpf() system call which is what people
> are working on.

I hope you saw that I was absolutely not against dumping or enumeration via the bpf syscall. It will be the primary interface to debug ebpf, and I completely agree. I merely tried to establish the procfs interface as a quick-look interface to see whether some type of bpf program is loaded, which could start any further diagnosis. This interface should not have any dependencies and should even work on embedded devices, where sometimes it might be difficult to get a binary for the correct architecture installed ad hoc (I am thinking about openwrt). But this is definitely also solvable. I do think that if a common utility, like an lsbpf in util-linux, is available, I will be fine.

Anyway, I will take this argument back.

Thanks,
Hannes
From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Thu, 27 Apr 2017 18:28:17 +0200

> I merely tried to establish the procfs interface as a quick-look
> interface

Show me that "quick look" nftables dumping facility via procfs and I'll start to listen to you.

What you are proposing has no real value once we have bpf() system call based traversal, and has no strict precedent across the networking subsystem.

Thank you.