Message ID: 20170426182419.14574-7-hannes@stressinduktion.org
State: Changes Requested, archived
Delegated to: David Miller
On Wed, Apr 26, 2017 at 08:24:19PM +0200, Hannes Frederic Sowa wrote:
>
> +static const char *bpf_type_string(enum bpf_prog_type type)
> +{
> +	static const char *bpf_type_names[] = {
> +#define X(type) #type
> +		BPF_PROG_TYPES
> +#undef X
> +	};
> +
> +	if (type >= ARRAY_SIZE(bpf_type_names))
> +		return "<unknown>";
> +
> +	return bpf_type_names[type];
> +}
> +
>  static int ebpf_proc_show(struct seq_file *s, void *v)
>  {
> +	struct bpf_prog *prog;
> +	struct bpf_prog_aux *aux;
> +	char prog_tag[sizeof(prog->tag) * 2 + 1] = { };
> +
>  	if (v == SEQ_START_TOKEN) {
> -		seq_printf(s, "# tag\n");
> +		seq_printf(s, "# tag\t\t\ttype\t\t\truntime\tcap\tmemlock\n");
>  		return 0;
>  	}
>
> +	aux = v;
> +	prog = aux->prog;
> +
> +	bin2hex(prog_tag, prog->tag, sizeof(prog->tag));
> +	seq_printf(s, "%s\t%s\t%s\t%s\t%llu\n", prog_tag,
> +		   bpf_type_string(prog->type),
> +		   prog->jited ? "jit" : "int",
> +		   prog->priv_cap_sys_admin ? "priv" : "unpriv",
> +		   prog->pages * 1ULL << PAGE_SHIFT);

As I said several times already, I'm strongly against a procfs style of exposing information about the programs. I don't want this to become debugfs for bpf.

Maintaining the list of all loaded programs is fine, and we need a way to iterate through them, but procfs is obviously not the interface to do that. Programs/maps are binary, whereas any fs interface is text. It also doesn't scale with a large number of programs/maps.

I prefer Daniel's suggestion of adding a 'get_next'-like API. Also it would be good if you can wait for Martin to finish his prog->handle/id patches. Then user space will be able to iterate through all the progs/maps and fetch all info about them through the syscall in an extensible way. And you wouldn't need to abuse the kallsyms list for a different purpose.
On 04/26/2017 08:24 PM, Hannes Frederic Sowa wrote:
> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> ---
>  include/uapi/linux/bpf.h | 32 +++++++++++++++++++-------------
>  kernel/bpf/core.c        | 30 +++++++++++++++++++++++++++++-
>  2 files changed, 48 insertions(+), 14 deletions(-)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index e553529929f683..d6506e320953d5 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -101,20 +101,26 @@ enum bpf_map_type {
>  	BPF_MAP_TYPE_HASH_OF_MAPS,
>  };
>
> +#define BPF_PROG_TYPES \
> +	X(BPF_PROG_TYPE_UNSPEC), \
> +	X(BPF_PROG_TYPE_SOCKET_FILTER), \
> +	X(BPF_PROG_TYPE_KPROBE), \
> +	X(BPF_PROG_TYPE_SCHED_CLS), \
> +	X(BPF_PROG_TYPE_SCHED_ACT), \
> +	X(BPF_PROG_TYPE_TRACEPOINT), \
> +	X(BPF_PROG_TYPE_XDP), \
> +	X(BPF_PROG_TYPE_PERF_EVENT), \
> +	X(BPF_PROG_TYPE_CGROUP_SKB), \
> +	X(BPF_PROG_TYPE_CGROUP_SOCK), \
> +	X(BPF_PROG_TYPE_LWT_IN), \
> +	X(BPF_PROG_TYPE_LWT_OUT), \
> +	X(BPF_PROG_TYPE_LWT_XMIT),
> +
> +
>  enum bpf_prog_type {
> -	BPF_PROG_TYPE_UNSPEC,
> -	BPF_PROG_TYPE_SOCKET_FILTER,
> -	BPF_PROG_TYPE_KPROBE,
> -	BPF_PROG_TYPE_SCHED_CLS,
> -	BPF_PROG_TYPE_SCHED_ACT,
> -	BPF_PROG_TYPE_TRACEPOINT,
> -	BPF_PROG_TYPE_XDP,
> -	BPF_PROG_TYPE_PERF_EVENT,
> -	BPF_PROG_TYPE_CGROUP_SKB,
> -	BPF_PROG_TYPE_CGROUP_SOCK,
> -	BPF_PROG_TYPE_LWT_IN,
> -	BPF_PROG_TYPE_LWT_OUT,
> -	BPF_PROG_TYPE_LWT_XMIT,
> +#define X(type) type

Defining X in uapi could easily clash with other headers, e.g. from the application side.

> +	BPF_PROG_TYPES
> +#undef X
>  };
>
>  enum bpf_attach_type {
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 3ba175a24e971a..685c1d0f31e029 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -536,13 +536,41 @@ static void ebpf_proc_stop(struct seq_file *s, void *v)
>  	rcu_read_unlock();
>  }
>
> +static const char *bpf_type_string(enum bpf_prog_type type)
> +{
> +	static const char *bpf_type_names[] = {
> +#define X(type) #type
> +		BPF_PROG_TYPES
> +#undef X
> +	};
> +
> +	if (type >= ARRAY_SIZE(bpf_type_names))
> +		return "<unknown>";
> +
> +	return bpf_type_names[type];
> +}
> +
>  static int ebpf_proc_show(struct seq_file *s, void *v)
>  {
> +	struct bpf_prog *prog;
> +	struct bpf_prog_aux *aux;
> +	char prog_tag[sizeof(prog->tag) * 2 + 1] = { };
> +
>  	if (v == SEQ_START_TOKEN) {
> -		seq_printf(s, "# tag\n");
> +		seq_printf(s, "# tag\t\t\ttype\t\t\truntime\tcap\tmemlock\n");
>  		return 0;
>  	}
>
> +	aux = v;
> +	prog = aux->prog;
> +
> +	bin2hex(prog_tag, prog->tag, sizeof(prog->tag));
> +	seq_printf(s, "%s\t%s\t%s\t%s\t%llu\n", prog_tag,
> +		   bpf_type_string(prog->type),
> +		   prog->jited ? "jit" : "int",
> +		   prog->priv_cap_sys_admin ? "priv" : "unpriv",
> +		   prog->pages * 1ULL << PAGE_SHIFT);

Yeah, so that would be quite similar to what we dump in bpf_prog_show_fdinfo() modulo the priv bit.

I generally agree that a facility for dumping all progs is needed, and it was also on the TODO list after the bpf(2) cmd for dumping program insns back to user space. I think the procfs interface has pros and cons: the upside is that you can use it with tools like cat to inspect it, but what you still cannot do is say that you want to see the prog insns for, say, prog #4 from that list. If we could iterate over that list through fds via the bpf(2) syscall, you could i) present the same info you have above via fdinfo already, and ii) also dump the BPF insns from that specific program through a BPF_PROG_DUMP bpf(2) command. Once that dump also supports maps in progs, you could go further and fetch related map fds for inspection, etc.

Such an option of iterating through that would need a new BPF syscall cmd, aka BPF_PROG_GET_NEXT, which returns the first prog from the list, and you would walk to the next one by passing in the current fd, which can later also be closed when not needed anymore. We could restrict that dump to capable(CAP_SYS_ADMIN), and the kernel tree would need to ship a tool, e.g. under tools/bpf/, that can be used for inspection.

> +
>  	return 0;
>  }
>
On 26.04.2017 23:35, Daniel Borkmann wrote:
> On 04/26/2017 08:24 PM, Hannes Frederic Sowa wrote:
>> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
>> ---
>>  include/uapi/linux/bpf.h | 32 +++++++++++++++++++-------------
>>  kernel/bpf/core.c        | 30 +++++++++++++++++++++++++++++-
>>  2 files changed, 48 insertions(+), 14 deletions(-)
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index e553529929f683..d6506e320953d5 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -101,20 +101,26 @@ enum bpf_map_type {
>>  	BPF_MAP_TYPE_HASH_OF_MAPS,
>>  };
>>
>> +#define BPF_PROG_TYPES \
>> +	X(BPF_PROG_TYPE_UNSPEC), \
>> +	X(BPF_PROG_TYPE_SOCKET_FILTER), \
>> +	X(BPF_PROG_TYPE_KPROBE), \
>> +	X(BPF_PROG_TYPE_SCHED_CLS), \
>> +	X(BPF_PROG_TYPE_SCHED_ACT), \
>> +	X(BPF_PROG_TYPE_TRACEPOINT), \
>> +	X(BPF_PROG_TYPE_XDP), \
>> +	X(BPF_PROG_TYPE_PERF_EVENT), \
>> +	X(BPF_PROG_TYPE_CGROUP_SKB), \
>> +	X(BPF_PROG_TYPE_CGROUP_SOCK), \
>> +	X(BPF_PROG_TYPE_LWT_IN), \
>> +	X(BPF_PROG_TYPE_LWT_OUT), \
>> +	X(BPF_PROG_TYPE_LWT_XMIT),
>> +
>> +
>>  enum bpf_prog_type {
>> -	BPF_PROG_TYPE_UNSPEC,
>> -	BPF_PROG_TYPE_SOCKET_FILTER,
>> -	BPF_PROG_TYPE_KPROBE,
>> -	BPF_PROG_TYPE_SCHED_CLS,
>> -	BPF_PROG_TYPE_SCHED_ACT,
>> -	BPF_PROG_TYPE_TRACEPOINT,
>> -	BPF_PROG_TYPE_XDP,
>> -	BPF_PROG_TYPE_PERF_EVENT,
>> -	BPF_PROG_TYPE_CGROUP_SKB,
>> -	BPF_PROG_TYPE_CGROUP_SOCK,
>> -	BPF_PROG_TYPE_LWT_IN,
>> -	BPF_PROG_TYPE_LWT_OUT,
>> -	BPF_PROG_TYPE_LWT_XMIT,
>> +#define X(type) type
>
> Defining X in uapi could easily clash with other headers,
> e.g. from the application side.

I think that X-macros are so widely used that code review should catch that no X macro is leaked beyond its scope. I certainly can use some other macro names, maybe something along the lines of bpf_types.h.
>> +	BPF_PROG_TYPES
>> +#undef X
>>  };
>>
>>  enum bpf_attach_type {
>> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
>> index 3ba175a24e971a..685c1d0f31e029 100644
>> --- a/kernel/bpf/core.c
>> +++ b/kernel/bpf/core.c
>> @@ -536,13 +536,41 @@ static void ebpf_proc_stop(struct seq_file *s, void *v)
>>  	rcu_read_unlock();
>>  }
>>
>> +static const char *bpf_type_string(enum bpf_prog_type type)
>> +{
>> +	static const char *bpf_type_names[] = {
>> +#define X(type) #type
>> +		BPF_PROG_TYPES
>> +#undef X
>> +	};
>> +
>> +	if (type >= ARRAY_SIZE(bpf_type_names))
>> +		return "<unknown>";
>> +
>> +	return bpf_type_names[type];
>> +}
>> +
>>  static int ebpf_proc_show(struct seq_file *s, void *v)
>>  {
>> +	struct bpf_prog *prog;
>> +	struct bpf_prog_aux *aux;
>> +	char prog_tag[sizeof(prog->tag) * 2 + 1] = { };
>> +
>>  	if (v == SEQ_START_TOKEN) {
>> -		seq_printf(s, "# tag\n");
>> +		seq_printf(s, "# tag\t\t\ttype\t\t\truntime\tcap\tmemlock\n");
>>  		return 0;
>>  	}
>>
>> +	aux = v;
>> +	prog = aux->prog;
>> +
>> +	bin2hex(prog_tag, prog->tag, sizeof(prog->tag));
>> +	seq_printf(s, "%s\t%s\t%s\t%s\t%llu\n", prog_tag,
>> +		   bpf_type_string(prog->type),
>> +		   prog->jited ? "jit" : "int",
>> +		   prog->priv_cap_sys_admin ? "priv" : "unpriv",
>> +		   prog->pages * 1ULL << PAGE_SHIFT);
>
> Yeah, so that would be quite similar to what we dump in
> bpf_prog_show_fdinfo() modulo the priv bit.

And, additionally, you don't need to have a file descriptor handy to do so, which is my main point.

> I generally agree that a facility for dumping all progs is needed
> and it was also on the TODO list after the bpf(2) cmd for dumping
> program insns back to user space.

I think this is orthogonal.

> I think the procfs interface has pros and cons: the upside is that
> you can use it with tools like cat to inspect it, but what you still
> cannot do is to say that you want to see the prog insns for, say,
> prog #4 from that list. If we could iterate over that list through fds
> via bpf(2) syscall, you could i) present the same info you have above
> via fdinfo already and ii) also dump the BPF insns from that specific
> program through a BPF_PROG_DUMP bpf(2) command. Once that dump also
> supports maps in progs, you could go further and fetch related map
> fds for inspection, etc.

Sure, that sounds super. But so far Linux and most (maybe I should write all) subsystems have always provided some easy way to get insights into the kernel without having to code or rely on special tools. And I very much enjoyed that part of Linux. FreeBSD very often abstracted details behind shared libraries and commands that have no easy counterpart usable with cat and grep.

> Such option of iterating through that would need a new BPF syscall
> cmd aka BPF_PROG_GET_NEXT which returns the first prog from the list
> and you would walk the next one by passing the current fd, which can
> later also be closed as not needed anymore. We could restrict that
> dump to capable(CAP_SYS_ADMIN), and the kernel tree would need to
> ship a tool e.g. under tools/bpf/ that can be used for inspection.

Martin already posted his patches which add a bpf_prog_id to ebpf programs. I will have a look at those and will comment over there.

Thanks for the review,
Hannes
On 26.04.2017 23:25, Alexei Starovoitov wrote:
> On Wed, Apr 26, 2017 at 08:24:19PM +0200, Hannes Frederic Sowa wrote:
>>
>> +static const char *bpf_type_string(enum bpf_prog_type type)
>> +{
>> +	static const char *bpf_type_names[] = {
>> +#define X(type) #type
>> +		BPF_PROG_TYPES
>> +#undef X
>> +	};
>> +
>> +	if (type >= ARRAY_SIZE(bpf_type_names))
>> +		return "<unknown>";
>> +
>> +	return bpf_type_names[type];
>> +}
>> +
>>  static int ebpf_proc_show(struct seq_file *s, void *v)
>>  {
>> +	struct bpf_prog *prog;
>> +	struct bpf_prog_aux *aux;
>> +	char prog_tag[sizeof(prog->tag) * 2 + 1] = { };
>> +
>>  	if (v == SEQ_START_TOKEN) {
>> -		seq_printf(s, "# tag\n");
>> +		seq_printf(s, "# tag\t\t\ttype\t\t\truntime\tcap\tmemlock\n");
>>  		return 0;
>>  	}
>>
>> +	aux = v;
>> +	prog = aux->prog;
>> +
>> +	bin2hex(prog_tag, prog->tag, sizeof(prog->tag));
>> +	seq_printf(s, "%s\t%s\t%s\t%s\t%llu\n", prog_tag,
>> +		   bpf_type_string(prog->type),
>> +		   prog->jited ? "jit" : "int",
>> +		   prog->priv_cap_sys_admin ? "priv" : "unpriv",
>> +		   prog->pages * 1ULL << PAGE_SHIFT);
>
> As I said several times already I'm strongly against procfs
> style of exposing information about the programs.

I would not expand the procfs interface to dump ebpf bytecode or jit code, albeit I would prefer a file system abstraction for that.

> I don't want this to become debugfs for bpf.

Right now it just prints a list of ebpf programs. Do you reject where things are going, or do you already reject this particular patch?

> Maintaining the list of all loaded programs is fine
> and we need a way to iterate through them, but procfs
> is obviously not the interface to do that.
> Programs/maps are binary whereas any fs interface is text.

It should give admins only a high-level overview of what is going on in their system. I don't want to print any kind of opcodes there, just give people the possibility to see what is going on.

> It also doesn't scale with large number of programs/maps.
> I prefer Daniel's suggestion on adding 'get_next' like API.
> Also would be good if you can wait for Martin to finish his
> prog->handle/id patches. Then user space will be able
> to iterate through all the progs/maps and fetch all info about
> them through syscall in extensible way.
> And you wouldn't need to abuse kallsyms list for different purpose.

I don't buy the scalability argument: what is the scalability issue with this O(1) approach on loading and O(n) (also without any locks held) on dumping? You can't do any better. You can even dump several programs with one syscall instead of slowly iterating over them.

Certainly I can wait with those patches until Martin has presented his patches. And regarding abusing the list, I just renamed it. If you think I abuse something, I can also add another list, like the bpf_prog_id patches do?

Bye bye,
Hannes
From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Thu, 27 Apr 2017 15:22:49 +0200

> Sure, that sounds super. But so far Linux and most (maybe I should write
> all) subsystems have always provided some easy way to get insights into
> the kernel without having to code or rely on special tools.

Not true. You cannot fully dump socket TCP internal state without netlink-based tools. It is just one of many examples.

Can you dump all nftables rules without a special tool?

I don't think this is a legitimate line of argument, and I want this to be done via the bpf() system call, which is what people are working on.

Thanks.
On 27.04.2017 18:00, David Miller wrote:
> From: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Date: Thu, 27 Apr 2017 15:22:49 +0200
>
>> Sure, that sounds super. But so far Linux and most (maybe I should write
>> all) subsystems have always provided some easy way to get insights into
>> the kernel without having to code or rely on special tools.
>
> Not true.

Yes, I should not have written it that generally. ;)

> You cannot fully dump socket TCP internal state without netlink-based
> tools. It is just one of many examples.
>
> Can you dump all nftables rules without a special tool?

You got me here; I agree that not all state is discoverable via procfs. But to some degree even netfilter and tcp do expose a considerable amount of data via procfs. In the case of netfilter it might be less valuable, though, I have to agree.

> I don't think this is a legitimate line of argument, and I want
> this to be done via the bpf() system call which is what people
> are working on.

I hope you saw that I was absolutely not against dumping or enumeration via the bpf syscall. It will be the primary interface to debug ebpf, and I completely agree. I merely tried to establish the procfs interface as a quick-look interface to see whether some type of bpf program is loaded, which could start any further diagnosis. This interface should not have any dependencies and should even work on embedded devices, where sometimes it might be difficult to get a binary for the correct architecture installed ad hoc (I am thinking about openwrt). But this is definitely also solvable. I do think that if a common utility, like an lsbpf in util-linux, is available, I will be fine.

Anyway, I will take this argument back.

Thanks,
Hannes
From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Thu, 27 Apr 2017 18:28:17 +0200

> I merely tried to establish the procfs interface as a quick-look
> interface

Show me that "quick look" nftables dumping facility via procfs and I'll start to listen to you.

What you are proposing has no real value once we have bpf() system call based traversal, and has no strict precedent across the networking subsystem.

Thank you.