[RFC,bpf-next,0/6] bpf, printk: add BTF-based type printing

Message ID	1587120160-3030-1-git-send-email-alan.maguire@oracle.com
Headers	show Return-Path: <bpf-owner@vger.kernel.org> From: Alan Maguire <alan.maguire@oracle.com> To: ast@kernel.org, daniel@iogearbox.net, yhs@fb.com Cc: kafai@fb.com, songliubraving@fb.com, andriin@fb.com, john.fastabend@gmail.com, kpsingh@chromium.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, Alan Maguire <alan.maguire@oracle.com> Subject: [RFC PATCH bpf-next 0/6] bpf, printk: add BTF-based type printing Date: Fri, 17 Apr 2020 11:42:34 +0100 Message-Id: <1587120160-3030-1-git-send-email-alan.maguire@oracle.com> Sender: bpf-owner@vger.kernel.org Precedence: bulk
Series	bpf, printk: add BTF-based type printing \| expand [RFC,bpf-next,0/6] bpf, printk: add BTF-based type printing [RFC,bpf-next,1/6] bpf: provide function to get vmlinux BTF information [RFC,bpf-next,2/6] bpf: btf->resolved_[ids,sizes] should not be used for vmlinux BTF [RFC,bpf-next,3/6] bpf: move to generic BTF show support, apply it to seq files/strings [RFC,bpf-next,4/6] checkpatch: add new BTF pointer format specifier [RFC,bpf-next,5/6] printk: add type-printing %pT<type> format specifier which uses BTF [RFC,bpf-next,6/6] printk: extend test_printf to test %pT BTF-based format specifier

Alan Maguire April 17, 2020, 10:42 a.m. UTC

The printk family of functions support printing specific pointer types
using %p format specifiers (MAC addresses, IP addresses, etc).  For
full details see Documentation/core-api/printk-formats.rst.

This RFC patchset proposes introducing a "print typed pointer" format
specifier "%pT<type>"; the type specified is then looked up in the BPF
Type Format (BTF) information provided for vmlinux to support display.

There is already support in kernel/bpf/btf.c for "show" functionality;
the changes here generalize that support from seq-file specific
verifier display to the more generic case and add another specific
use case; snprintf()-style rendering of type information to a
provided buffer.  This support is then used to support printk
rendering of types, but the intent is to provide a function
that might be useful in other in-kernel scenarios; for example:

- ftrace could possibly utilize the function to support typed
  display of function arguments by cross-referencing BTF function
  information to derive the types of arguments
- oops/panic messaging could extend the information displayed to
  dig into data structures associated with failing functions

The above potential use cases hint at a potential reply to
a reasonable objection that such typed display should be
solved by tracing programs, where the in kernel tracing records
data and the userspace program prints it out.  While this
is certainly the recommended approach for most cases, I
believe having an in-kernel mechanism would be valuable
also.

The function the printk() family of functions rely on
could potentially be used directly for other use cases
like ftrace where we might have the BTF ids of the
pointers we wish to display; its signature is as follows:

int btf_type_snprintf_show(const struct btf *btf, u32 type_id, void *obj,
                           char *buf, int len, u64 flags);

So if ftrace say had the BTF ids of the types of arguments,
we see that the above would allow us to convert the
pointer data into displayable form.

To give a flavour for what the printed-out data looks like,
here we use pr_info() to display a struct sk_buff *.  Note
we specify the 'N' modifier to show type field names:

  struct sk_buff *skb = alloc_skb(64, GFP_KERNEL);

  pr_info("%pTN<struct sk_buff>", skb);

...gives us:

{{{.next=00000000c7916e9c,.prev=00000000c7916e9c,{.dev=00000000c7916e9c|.dev_scratch=0}}|.rbnode={.__rb_parent_color=0,.rb_right=00000000c7916e9c,.rb_left=00000000c7916e9c}|.list={.next=00000000c7916e9c,.prev=00000000c7916e9c}},{.sk=00000000c7916e9c|.ip_defrag_offset=0},{.tstamp=0|.skb_mstamp_ns=0},.cb=['\0'],{{._skb_refdst=0,.destructor=00000000c7916e9c}|.tcp_tsorted_anchor={.next=00000000c7916e9c,.prev=00000000c7916e9c}},._nfct=0,.len=0,.data_len=0,.mac_len=0,.hdr_len=0,.queue_mapping=0,.__cloned_offset=[],.cloned=0x0,.nohdr=0x0,.fclone=0x0,.peeked=0x0,.head_frag=0x0,.pfmemalloc=0x0,.active_extensions=0,.headers_start=[],.__pkt_type_offset=[],.pkt_type=0x0,.ignore_df=0x0,.nf_trace=0x0,.ip_summed=0x0,.ooo_okay=0x0,.l4_hash=0x0,.sw_hash=0x0,.wifi_acked_valid=0x0,.wifi_acked=0x0,.no_fcs=0x0,.encapsulation=0x0,.encap_hdr_csum=0x0,.csum_valid=0x0,.__pkt_vlan_present_offset=[],.vlan_present=0x0,.csum_complete_sw=0x0,.csum_level=0x0,.csum_not_inet=0x0,.dst_pending_co

printk output is truncated at 1024 bytes.  For such cases, the compact
display mode (minus the field info 'N') may be used at the expense of
readability. "|" differentiates between different union members, but
aside from that the format is intended to minic a valid C initiailizer
for the given type.

The hope is that this functionality will be useful for debugging,
and possibly help facilitate the cases mentioned above in the future.

The patches are marked RFC for several reasons

- There's already an RFC patchset in flight dealing with BTF dumping;

https://www.spinics.net/lists/netdev/msg644412.html

  The reason I'm posting this is the approach is a bit different 
  and there may be ways of synthesizing the approaches.
- The mechanism of vmlinux BTF initialization is not fit for purpose
  in a printk() setting as I understand it (it uses mutex locking
  to prevent multiple initializations of the BTF info).  A simple
  approach to support printk might be to simply initialize the
  BTF vmlinux case early in boot; it only needs to happen once.
  Any suggestions here would be great.
- BTF-based rendering is more complex than other printk() format
  specifier-driven methods; that said, because of its generality it
  does provide significant value I think
- More tests are needed.

Alan Maguire (6):
  bpf: provide function to get vmlinux BTF information
  bpf: btf->resolved_[ids,sizes] should not be used for vmlinux BTF
  bpf: move to generic BTF show support, apply it to seq files/strings
  checkpatch: add new BTF pointer format specifier
  printk: add type-printing %pT<type> format specifier which uses BTF
  printk: extend test_printf to test %pT BTF-based format specifier

 Documentation/core-api/printk-formats.rst |   8 +
 include/linux/bpf.h                       |   2 +
 include/linux/btf.h                       |  35 ++-
 kernel/bpf/btf.c                          | 466 ++++++++++++++++++++++++------
 kernel/bpf/verifier.c                     |  18 +-
 lib/Kconfig                               |  16 +
 lib/test_printf.c                         | 118 ++++++++
 lib/vsprintf.c                            | 145 +++++++++-
 scripts/checkpatch.pl                     |   2 +-
 9 files changed, 704 insertions(+), 106 deletions(-)

Arnaldo Carvalho de Melo April 17, 2020, 4:47 p.m. UTC | #1

Em Fri, Apr 17, 2020 at 11:42:34AM +0100, Alan Maguire escreveu:
> To give a flavour for what the printed-out data looks like,
> here we use pr_info() to display a struct sk_buff *.  Note
> we specify the 'N' modifier to show type field names:
> 
>   struct sk_buff *skb = alloc_skb(64, GFP_KERNEL);
> 
>   pr_info("%pTN<struct sk_buff>", skb);
> 
> ...gives us:
> 
> {{{.next=00000000c7916e9c,.prev=00000000c7916e9c,{.dev=00000000c7916e9c|.dev_scratch=0}}|.rbnode={.__rb_parent_color=0,.rb_right=00000000c7916e9c,.rb_left=00000000c7916e9c}|.list={.next=00000000c7916e9c,.prev=00000000c7916e9c}},{.sk=00000000c7916e9c|.ip_defrag_offset=0},{.tstamp=0|.skb_mstamp_ns=0},.cb=['\0'],{{._skb_refdst=0,.destructor=00000000c7916e9c}|.tcp_tsorted_anchor={.next=00000000c7916e9c,.prev=00000000c7916e9c}},._nfct=0,.len=0,.data_len=0,.mac_len=0,.hdr_len=0,.queue_mapping=0,.__cloned_offset=[],.cloned=0x0,.nohdr=0x0,.fclone=0x0,.peeked=0x0,.head_frag=0x0,.pfmemalloc=0x0,.active_extensions=0,.headers_start=[],.__pkt_type_offset=[],.pkt_type=0x0,.ignore_df=0x0,.nf_trace=0x0,.ip_summed=0x0,.ooo_okay=0x0,.l4_hash=0x0,.sw_hash=0x0,.wifi_acked_valid=0x0,.wifi_acked=0x0,.no_fcs=0x0,.encapsulation=0x0,.encap_hdr_csum=0x0,.csum_valid=0x0,.__pkt_vlan_present_offset=[],.vlan_present=0x0,.csum_complete_sw=0x0,.csum_level=0x0,.csum_not_inet=0x0,.dst_pending_co

One suggestion, to make this more compact, one could have %pTNz<struct
sk_buff>" that wouldn't print any integral type member that is zeroed
:-)
 
- Arnaldo

Alan Maguire April 17, 2020, 5:06 p.m. UTC | #2

On Fri, 17 Apr 2020, Arnaldo Carvalho de Melo wrote:

> Em Fri, Apr 17, 2020 at 11:42:34AM +0100, Alan Maguire escreveu:
> > To give a flavour for what the printed-out data looks like,
> > here we use pr_info() to display a struct sk_buff *.  Note
> > we specify the 'N' modifier to show type field names:
> > 
> >   struct sk_buff *skb = alloc_skb(64, GFP_KERNEL);
> > 
> >   pr_info("%pTN<struct sk_buff>", skb);
> > 
> > ...gives us:
> > 
> > {{{.next=00000000c7916e9c,.prev=00000000c7916e9c,{.dev=00000000c7916e9c|.dev_scratch=0}}|.rbnode={.__rb_parent_color=0,.rb_right=00000000c7916e9c,.rb_left=00000000c7916e9c}|.list={.next=00000000c7916e9c,.prev=00000000c7916e9c}},{.sk=00000000c7916e9c|.ip_defrag_offset=0},{.tstamp=0|.skb_mstamp_ns=0},.cb=['\0'],{{._skb_refdst=0,.destructor=00000000c7916e9c}|.tcp_tsorted_anchor={.next=00000000c7916e9c,.prev=00000000c7916e9c}},._nfct=0,.len=0,.data_len=0,.mac_len=0,.hdr_len=0,.queue_mapping=0,.__cloned_offset=[],.cloned=0x0,.nohdr=0x0,.fclone=0x0,.peeked=0x0,.head_frag=0x0,.pfmemalloc=0x0,.active_extensions=0,.headers_start=[],.__pkt_type_offset=[],.pkt_type=0x0,.ignore_df=0x0,.nf_trace=0x0,.ip_summed=0x0,.ooo_okay=0x0,.l4_hash=0x0,.sw_hash=0x0,.wifi_acked_valid=0x0,.wifi_acked=0x0,.no_fcs=0x0,.encapsulation=0x0,.encap_hdr_csum=0x0,.csum_valid=0x0,.__pkt_vlan_present_offset=[],.vlan_present=0x0,.csum_complete_sw=0x0,.csum_level=0x0,.csum_not_inet=0x0,.dst_pending_co
> 
> One suggestion, to make this more compact, one could have %pTNz<struct
> sk_buff>" that wouldn't print any integral type member that is zeroed
> :-)
>

That's a great idea, thanks Arnaldo! I'll add that.

Alan
 
> - Arnaldo
>

Alexei Starovoitov April 18, 2020, 4:05 p.m. UTC | #3

On Fri, Apr 17, 2020 at 11:42:34AM +0100, Alan Maguire wrote:
> The printk family of functions support printing specific pointer types
> using %p format specifiers (MAC addresses, IP addresses, etc).  For
> full details see Documentation/core-api/printk-formats.rst.
> 
> This RFC patchset proposes introducing a "print typed pointer" format
> specifier "%pT<type>"; the type specified is then looked up in the BPF
> Type Format (BTF) information provided for vmlinux to support display.

This is great idea! Love it.

> The above potential use cases hint at a potential reply to
> a reasonable objection that such typed display should be
> solved by tracing programs, where the in kernel tracing records
> data and the userspace program prints it out.  While this
> is certainly the recommended approach for most cases, I
> believe having an in-kernel mechanism would be valuable
> also.

yep. This is useful for general purpose printk.
The only piece that must be highlighted in the printk documentation
that unlike the rest of BPF there are zero safety guarantees here.
The programmer can pass wrong pointer to printk() and the kernel _will_ crash.

>   struct sk_buff *skb = alloc_skb(64, GFP_KERNEL);
> 
>   pr_info("%pTN<struct sk_buff>", skb);

why follow "TN" convention?
I think "%p<struct sk_buff>" is much more obvious, unambiguous, and
equally easy to parse.

> ...gives us:
> 
> {{{.next=00000000c7916e9c,.prev=00000000c7916e9c,{.dev=00000000c7916e9c|.dev_scratch=0}}|.rbnode={.__rb_parent_color=0,

This is unreadable.
I like the choice of C style output, but please format it similar to drgn. Like:
*(struct task_struct *)0xffff889ff8a08000 = {
	.thread_info = (struct thread_info){
		.flags = (unsigned long)0,
		.status = (u32)0,
	},
	.state = (volatile long)1,
	.stack = (void *)0xffffc9000c4dc000,
	.usage = (refcount_t){
		.refs = (atomic_t){
			.counter = (int)2,
		},
	},
	.flags = (unsigned int)4194560,
	.ptrace = (unsigned int)0,

I like Arnaldo's idea as well, but I prefer zeros to be dropped by default.
Just like %d doesn't print leading zeros by default.
"%p0<struct sk_buff>" would print them.

> The patches are marked RFC for several reasons
> 
> - There's already an RFC patchset in flight dealing with BTF dumping;
> 
> https://www.spinics.net/lists/netdev/msg644412.html
> 
>   The reason I'm posting this is the approach is a bit different 
>   and there may be ways of synthesizing the approaches.

I see no overlap between patch sets whatsoever.
Why do you think there is?

> - The mechanism of vmlinux BTF initialization is not fit for purpose
>   in a printk() setting as I understand it (it uses mutex locking
>   to prevent multiple initializations of the BTF info).  A simple
>   approach to support printk might be to simply initialize the
>   BTF vmlinux case early in boot; it only needs to happen once.
>   Any suggestions here would be great.
> - BTF-based rendering is more complex than other printk() format
>   specifier-driven methods; that said, because of its generality it
>   does provide significant value I think
> - More tests are needed.

yep. Please make sure to add one to selftest/bpf as well.
bpf maintainers don't run printk tests as part of workflow, so
future BTF changes will surely break it if there are no selftests/bpf.

Patch 2 isn't quite correct. Early parse of vmlinux BTF does not compute
resolved_ids to save kernel memory. The trade off is execution time vs kernel
memory. I believe that saving memory is more important here, since execution is
not in critical path. There is __get_type_size(). It should be used in later
patches instead of btf_type_id_size() that relies on pre-computed
resolved_sizes and resolved_ids.

Arnaldo Carvalho de Melo April 18, 2020, 8:31 p.m. UTC | #4

On April 18, 2020 1:05:36 PM GMT-03:00, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>On Fri, Apr 17, 2020 at 11:42:34AM +0100, Alan Maguire wrote:
>> The printk family of functions support printing specific pointer
>types
>> using %p format specifiers (MAC addresses, IP addresses, etc).  For
>> full details see Documentation/core-api/printk-formats.rst.
>> 
>> This RFC patchset proposes introducing a "print typed pointer" format
>> specifier "%pT<type>"; the type specified is then looked up in the
>BPF
>> Type Format (BTF) information provided for vmlinux to support
>display.
>
>This is great idea! Love it.

21st century finally! 8-)

>> The above potential use cases hint at a potential reply to
>> a reasonable objection that such typed display should be
>> solved by tracing programs, where the in kernel tracing records
>> data and the userspace program prints it out.  While this
>> is certainly the recommended approach for most cases, I
>> believe having an in-kernel mechanism would be valuable
>> also.
>
>yep. This is useful for general purpose printk.
>The only piece that must be highlighted in the printk documentation
>that unlike the rest of BPF there are zero safety guarantees here.
>The programmer can pass wrong pointer to printk() and the kernel _will_
>crash.
>
>>   struct sk_buff *skb = alloc_skb(64, GFP_KERNEL);
>> 
>>   pr_info("%pTN<struct sk_buff>", skb);
>
>why follow "TN" convention?
>I think "%p<struct sk_buff>" is much more obvious, unambiguous, and
>equally easy to parse.
>
>> ...gives us:
>> 
>>
>{{{.next=00000000c7916e9c,.prev=00000000c7916e9c,{.dev=00000000c7916e9c|.dev_scratch=0}}|.rbnode={.__rb_parent_color=0,
>
>This is unreadable.
>I like the choice of C style output, but please format it similar to
>drgn. Like:
>*(struct task_struct *)0xffff889ff8a08000 = {
>	.thread_info = (struct thread_info){
>		.flags = (unsigned long)0,
>		.status = (u32)0,
>	},
>	.state = (volatile long)1,
>	.stack = (void *)0xffffc9000c4dc000,
>	.usage = (refcount_t){
>		.refs = (atomic_t){
>			.counter = (int)2,
>		},
>	},
>	.flags = (unsigned int)4194560,
>	.ptrace = (unsigned int)0,
>
>I like Arnaldo's idea as well, but I prefer zeros to be dropped by
>default.

That's my preference as well, it's just I feel ashamed of not participating in this party as much as I feel I should, so I was just being overly cautious in my suggestions.

'perf trace'  zaps any zero syscall arg out of the way by default, so that's my preference, sure.

 - Arnaldo

>Just like %d doesn't print leading zeros by default.
>"%p0<struct sk_buff>" would print them.
>
>> The patches are marked RFC for several reasons
>> 
>> - There's already an RFC patchset in flight dealing with BTF dumping;
>> 
>> https://www.spinics.net/lists/netdev/msg644412.html
>> 
>>   The reason I'm posting this is the approach is a bit different 
>>   and there may be ways of synthesizing the approaches.
>
>I see no overlap between patch sets whatsoever.
>Why do you think there is?
>
>> - The mechanism of vmlinux BTF initialization is not fit for purpose
>>   in a printk() setting as I understand it (it uses mutex locking
>>   to prevent multiple initializations of the BTF info).  A simple
>>   approach to support printk might be to simply initialize the
>>   BTF vmlinux case early in boot; it only needs to happen once.
>>   Any suggestions here would be great.
>> - BTF-based rendering is more complex than other printk() format
>>   specifier-driven methods; that said, because of its generality it
>>   does provide significant value I think
>> - More tests are needed.
>
>yep. Please make sure to add one to selftest/bpf as well.
>bpf maintainers don't run printk tests as part of workflow, so
>future BTF changes will surely break it if there are no selftests/bpf.
>
>Patch 2 isn't quite correct. Early parse of vmlinux BTF does not
>compute
>resolved_ids to save kernel memory. The trade off is execution time vs
>kernel
>memory. I believe that saving memory is more important here, since
>execution is
>not in critical path. There is __get_type_size(). It should be used in
>later
>patches instead of btf_type_id_size() that relies on pre-computed
>resolved_sizes and resolved_ids.

Alan Maguire April 20, 2020, 3:29 p.m. UTC | #5

On Sat, 18 Apr 2020, Alexei Starovoitov wrote:

> On Fri, Apr 17, 2020 at 11:42:34AM +0100, Alan Maguire wrote:
> > The printk family of functions support printing specific pointer types
> > using %p format specifiers (MAC addresses, IP addresses, etc).  For
> > full details see Documentation/core-api/printk-formats.rst.
> > 
> > This RFC patchset proposes introducing a "print typed pointer" format
> > specifier "%pT<type>"; the type specified is then looked up in the BPF
> > Type Format (BTF) information provided for vmlinux to support display.
> 
> This is great idea! Love it.
>

Thanks for taking a look!
 
> > The above potential use cases hint at a potential reply to
> > a reasonable objection that such typed display should be
> > solved by tracing programs, where the in kernel tracing records
> > data and the userspace program prints it out.  While this
> > is certainly the recommended approach for most cases, I
> > believe having an in-kernel mechanism would be valuable
> > also.
> 
> yep. This is useful for general purpose printk.
> The only piece that must be highlighted in the printk documentation
> that unlike the rest of BPF there are zero safety guarantees here.
> The programmer can pass wrong pointer to printk() and the kernel _will_ crash.
> 

Good point; I'll highlight the fact that we aren't
executing in BPF context, no verifier etc.

> >   struct sk_buff *skb = alloc_skb(64, GFP_KERNEL);
> > 
> >   pr_info("%pTN<struct sk_buff>", skb);
> 
> why follow "TN" convention?
> I think "%p<struct sk_buff>" is much more obvious, unambiguous, and
> equally easy to parse.
> 

That was my first choice, but the first character
after the 'p' in the '%p' specifier signifies the
pointer format specifier. If we use '<', and have
'%p<', where do we put the modifiers? '%p<xYz struct foo>'
seems clunky to me.

> > ...gives us:
> > 
> > {{{.next=00000000c7916e9c,.prev=00000000c7916e9c,{.dev=00000000c7916e9c|.dev_scratch=0}}|.rbnode={.__rb_parent_color=0,
> 
> This is unreadable.
> I like the choice of C style output, but please format it similar to drgn. Like:
> *(struct task_struct *)0xffff889ff8a08000 = {
> 	.thread_info = (struct thread_info){
> 		.flags = (unsigned long)0,
> 		.status = (u32)0,
> 	},
> 	.state = (volatile long)1,
> 	.stack = (void *)0xffffc9000c4dc000,
> 	.usage = (refcount_t){
> 		.refs = (atomic_t){
> 			.counter = (int)2,
> 		},
> 	},
> 	.flags = (unsigned int)4194560,
> 	.ptrace = (unsigned int)0,
> 
> I like Arnaldo's idea as well, but I prefer zeros to be dropped by default.
> Just like %d doesn't print leading zeros by default.
> "%p0<struct sk_buff>" would print them.
> 

I'll try and match the above as closely as possible for v2
while retaining the compact form for the verifier's use.

> > The patches are marked RFC for several reasons
> > 
> > - There's already an RFC patchset in flight dealing with BTF dumping;
> > 
> > https://www.spinics.net/lists/netdev/msg644412.html
> > 
> >   The reason I'm posting this is the approach is a bit different 
> >   and there may be ways of synthesizing the approaches.
> 
> I see no overlap between patch sets whatsoever.
> Why do you think there is?
>

Because I hadn't read through Yonghong's patchset properly ;-)
I see potential future overlap in having a dumper support a 
"raw" mode using BTF-based display if needed, but no actual
overlap in what's there (and here) today.
 
> > - The mechanism of vmlinux BTF initialization is not fit for purpose
> >   in a printk() setting as I understand it (it uses mutex locking
> >   to prevent multiple initializations of the BTF info).  A simple
> >   approach to support printk might be to simply initialize the
> >   BTF vmlinux case early in boot; it only needs to happen once.
> >   Any suggestions here would be great.
> > - BTF-based rendering is more complex than other printk() format
> >   specifier-driven methods; that said, because of its generality it
> >   does provide significant value I think
> > - More tests are needed.
> 
> yep. Please make sure to add one to selftest/bpf as well.
> bpf maintainers don't run printk tests as part of workflow, so
> future BTF changes will surely break it if there are no selftests/bpf.
> 

Absolutely.

> Patch 2 isn't quite correct. Early parse of vmlinux BTF does not compute
> resolved_ids to save kernel memory. The trade off is execution time vs kernel
> memory. I believe that saving memory is more important here, since execution is
> not in critical path. There is __get_type_size(). It should be used in later
> patches instead of btf_type_id_size() that relies on pre-computed
> resolved_sizes and resolved_ids.
>

Thanks for the info, will fix for v2!

Alan

Joe Perches April 20, 2020, 4:32 p.m. UTC | #6

On Mon, 2020-04-20 at 16:29 +0100, Alan Maguire wrote:
> On Sat, 18 Apr 2020, Alexei Starovoitov wrote:
> 
> > On Fri, Apr 17, 2020 at 11:42:34AM +0100, Alan Maguire wrote:
> > > The printk family of functions support printing specific pointer types
> > > using %p format specifiers (MAC addresses, IP addresses, etc).  For
> > > full details see Documentation/core-api/printk-formats.rst.
> > > 
> > > This RFC patchset proposes introducing a "print typed pointer" format
> > > specifier "%pT<type>"; the type specified is then looked up in the BPF
> > > Type Format (BTF) information provided for vmlinux to support display.
> > 
> > This is great idea! Love it.
> > 
> 
> Thanks for taking a look!
>  
> > > The above potential use cases hint at a potential reply to
> > > a reasonable objection that such typed display should be
> > > solved by tracing programs, where the in kernel tracing records
> > > data and the userspace program prints it out.  While this
> > > is certainly the recommended approach for most cases, I
> > > believe having an in-kernel mechanism would be valuable
> > > also.
> > 
> > yep. This is useful for general purpose printk.
> > The only piece that must be highlighted in the printk documentation
> > that unlike the rest of BPF there are zero safety guarantees here.
> > The programmer can pass wrong pointer to printk() and the kernel _will_ crash.
> > 
> 
> Good point; I'll highlight the fact that we aren't
> executing in BPF context, no verifier etc.
> 
> > >   struct sk_buff *skb = alloc_skb(64, GFP_KERNEL);
> > > 
> > >   pr_info("%pTN<struct sk_buff>", skb);
> > 
> > why follow "TN" convention?
> > I think "%p<struct sk_buff>" is much more obvious, unambiguous, and
> > equally easy to parse.
> > 
> 
> That was my first choice, but the first character
> after the 'p' in the '%p' specifier signifies the
> pointer format specifier. If we use '<', and have
> '%p<', where do we put the modifiers? '%p<xYz struct foo>'
> seems clunky to me.

While I don't really like the %p<struct type> block,
it's at least obvious what's being attempted.

Modifiers could easily go after the <struct type> block.

It appears a %p<struct type> output might be a lot of
total characters so another potential issue might be
the maximum length of each individual printk.

> > I like the choice of C style output, but please format it similar to drgn. Like:
> > *(struct task_struct *)0xffff889ff8a08000 = {
> > 	.thread_info = (struct thread_info){
> > 		.flags = (unsigned long)0,
> > 		.status = (u32)0,
> > 	},
> > 	.state = (volatile long)1,
> > 	.stack = (void *)0xffffc9000c4dc000,
> > 	.usage = (refcount_t){
> > 		.refs = (atomic_t){
> > 			.counter = (int)2,
> > 		},
> > 	},
> > 	.flags = (unsigned int)4194560,
> > 	.ptrace = (unsigned int)0,

And here, the issue might be duplicating the log level
for each line of output and/or prefixing each line with
whatever struct type is being dumped as interleaving
with other concurrent logging is possible.

Here as well the individual field types don't contain
enough information to determine if a field should be
output as %x or %u.

Arnaldo Carvalho de Melo April 20, 2020, 8:54 p.m. UTC | #7

Em Sat, Apr 18, 2020 at 09:05:36AM -0700, Alexei Starovoitov escreveu:
> On Fri, Apr 17, 2020 at 11:42:34AM +0100, Alan Maguire wrote:
> > ...gives us:
> > 
> > {{{.next=00000000c7916e9c,.prev=00000000c7916e9c,{.dev=00000000c7916e9c|.dev_scratch=0}}|.rbnode={.__rb_parent_color=0,
 
> This is unreadable.
> I like the choice of C style output, but please format it similar to drgn. Like:
> *(struct task_struct *)0xffff889ff8a08000 = {
> 	.thread_info = (struct thread_info){
> 		.flags = (unsigned long)0,
> 		.status = (u32)0,
> 	},
> 	.state = (volatile long)1,
> 	.stack = (void *)0xffffc9000c4dc000,
> 	.usage = (refcount_t){
> 		.refs = (atomic_t){
> 			.counter = (int)2,
> 		},
> 	},
> 	.flags = (unsigned int)4194560,
> 	.ptrace = (unsigned int)0,
 
> I like Arnaldo's idea as well, but I prefer zeros to be dropped by default.
> Just like %d doesn't print leading zeros by default.
> "%p0<struct sk_buff>" would print them.

I was thinking about another way to compress the output of a given data
structure someone is tracking, having to print it from time to time,
which is to store a copy of the struct as you print it and then, when
printing it again, print just its pointer, i.e. that:

*(struct task_struct *)0xffff889ff8a08000 = {

Line, then just printing the fields that changed, say just that refcount
was bumped, so it first print:

*(struct task_struct *)0xffff889ff8a08000 = {
      .thread_info = (struct thread_info){
              .flags = (unsigned long)0,
              .status = (u32)0,
      },
      .state = (volatile long)1,
      .stack = (void *)0xffffc9000c4dc000,
      .usage = (refcount_t){
              .refs = (atomic_t){
                      .counter = (int)2,
              },
      },
      .flags = (unsigned int)4194560,
      .ptrace = (unsigned int)0,

Then, the next time it would print:

*(struct task_struct *)0xffff889ff8a08000 = {
      .usage = (refcount_t){
              .refs = (atomic_t){
                      .counter = (int)3,
              },
      },
},

- Arnaldo

Rasmus Villemoes April 29, 2020, 12:15 p.m. UTC | #8

On 20/04/2020 18.32, Joe Perches wrote:
> On Mon, 2020-04-20 at 16:29 +0100, Alan Maguire wrote:

>>>>   struct sk_buff *skb = alloc_skb(64, GFP_KERNEL);
>>>>
>>>>   pr_info("%pTN<struct sk_buff>", skb);
>>>
>>> why follow "TN" convention?
>>> I think "%p<struct sk_buff>" is much more obvious, unambiguous, and
>>> equally easy to parse.
>>>
>>
>> That was my first choice, but the first character
>> after the 'p' in the '%p' specifier signifies the
>> pointer format specifier. If we use '<', and have
>> '%p<', where do we put the modifiers? '%p<xYz struct foo>'
>> seems clunky to me.

There's also the issue that %p followed by alnum has been understood to
be reserved/specially handled by the kernel's printf implementation for
a decade, while other characters have so far been treated as "OK, this
was just a normal %p". A quick grep for %p< only gives a hit in
drivers/scsi/dc395x.c, but there could be others (with field width
modifier between % and p), and in any case I think it's a bad idea to
extend the set of characters that cannot follow %p.

Rasmus

Alan Maguire April 30, 2020, 10:03 a.m. UTC | #9

On Mon, 20 Apr 2020, Joe Perches wrote:

> On Mon, 2020-04-20 at 16:29 +0100, Alan Maguire wrote:
> > On Sat, 18 Apr 2020, Alexei Starovoitov wrote:
> > 
> > > On Fri, Apr 17, 2020 at 11:42:34AM +0100, Alan Maguire wrote:
> > > > The printk family of functions support printing specific pointer types
> > > > using %p format specifiers (MAC addresses, IP addresses, etc).  For
> > > > full details see Documentation/core-api/printk-formats.rst.
> > > > 
> > > > This RFC patchset proposes introducing a "print typed pointer" format
> > > > specifier "%pT<type>"; the type specified is then looked up in the BPF
> > > > Type Format (BTF) information provided for vmlinux to support display.
> > > 
> > > This is great idea! Love it.
> > > 
> > 
> > Thanks for taking a look!
> >  
> > > > The above potential use cases hint at a potential reply to
> > > > a reasonable objection that such typed display should be
> > > > solved by tracing programs, where the in kernel tracing records
> > > > data and the userspace program prints it out.  While this
> > > > is certainly the recommended approach for most cases, I
> > > > believe having an in-kernel mechanism would be valuable
> > > > also.
> > > 
> > > yep. This is useful for general purpose printk.
> > > The only piece that must be highlighted in the printk documentation
> > > that unlike the rest of BPF there are zero safety guarantees here.
> > > The programmer can pass wrong pointer to printk() and the kernel _will_ crash.
> > > 
> > 
> > Good point; I'll highlight the fact that we aren't
> > executing in BPF context, no verifier etc.
> > 
> > > >   struct sk_buff *skb = alloc_skb(64, GFP_KERNEL);
> > > > 
> > > >   pr_info("%pTN<struct sk_buff>", skb);
> > > 
> > > why follow "TN" convention?
> > > I think "%p<struct sk_buff>" is much more obvious, unambiguous, and
> > > equally easy to parse.
> > > 
> > 
> > That was my first choice, but the first character
> > after the 'p' in the '%p' specifier signifies the
> > pointer format specifier. If we use '<', and have
> > '%p<', where do we put the modifiers? '%p<xYz struct foo>'
> > seems clunky to me.
> 
> While I don't really like the %p<struct type> block,
> it's at least obvious what's being attempted.
> 
> Modifiers could easily go after the <struct type> block.
>

Good idea; I'll go with that approach for v2.
 
> It appears a %p<struct type> output might be a lot of
> total characters so another potential issue might be
> the maximum length of each individual printk.
>

Right, that's a concern. On the other side, it turns out
that with the "omit zeroed values" behaviour by default,
the output trims down nicely.  The omit zero functionality
works for nested types too, so a freshly-allocated skb
fits easily inside the printk limit now.  The in-progress
output looks like this now (v2 coming shortly I hope):

printk(KERN_INFO "%p<struct sk_buff>", skb);

(struct sk_buff){
 .transport_header = (__u16)65535,
 .mac_header = (__u16)65535,
 .end = (sk_buff_data_t)192,
 .head = (unsigned char *)000000007524fd8b,
 .data = (unsigned char *)000000007524fd8b,
 .truesize = (unsigned int)768,
 .users = (refcount_t){
  .refs = (atomic_t){
   .counter = (int)1,
  },
 },
}

Of course as structures get used and values get set
(precisely when we need to see them!) more fields will
be displayed and we will bump against printk limits.

The modifiers I'm working on for v2 are

'c' - compact mode (no pretty-printing), i.e.

(struct sk_buff){.transport_header = (__u16)65535,.mac_header = (__u16)65535,.end = (sk_buff_data_t)192,.head = (unsigned char *)000000007524fd8b,.data = (unsigned char *)000000007524fd8b,.truesize = (unsigned int)768,.users = (refcount_t){.refs = (atomic_t){.counter = (int)1,},},}

      This saves a fair few characters, especially for highly-indented
      data structures.

'T' - omit type/member names.
'x' - avoid pointer obfuscation
'0' - show zeroed values, as suggested by Arnaldo and Alexei

...so the default output of "%p<struct sk_buff>"
will be like the above example.

I was hoping to punt on pointer obfuscation and just
use %p[K] since obfuscation can be sysctl-controlled;
however that control assumes a controlling process context
as I understand it.  Since BTF printk can be invoked
anywhere (especially via trace_printk() in BPF programs),
those settings aren't hugely relevant, so I _think_ we need
a way to directly control obfuscation for this use case.

> > > I like the choice of C style output, but please format it similar to drgn. Like:
> > > *(struct task_struct *)0xffff889ff8a08000 = {
> > > 	.thread_info = (struct thread_info){
> > > 		.flags = (unsigned long)0,
> > > 		.status = (u32)0,
> > > 	},
> > > 	.state = (volatile long)1,
> > > 	.stack = (void *)0xffffc9000c4dc000,
> > > 	.usage = (refcount_t){
> > > 		.refs = (atomic_t){
> > > 			.counter = (int)2,
> > > 		},
> > > 	},
> > > 	.flags = (unsigned int)4194560,
> > > 	.ptrace = (unsigned int)0,
> 
> And here, the issue might be duplicating the log level
> for each line of output and/or prefixing each line with
> whatever struct type is being dumped as interleaving
> with other concurrent logging is possible.
> 

That's a good idea but sadly it makes the problem of longer
display worse.  Compact/type-omitted options help for
the particularly bad cases at least I suppose.

> Here as well the individual field types don't contain
> enough information to determine if a field should be
> output as %x or %u.
> 
>

Right, we could add some more format modifiers for cases
like that I guess.  Currently the display formats used are

- numbers are represented as decimal
- bitfields are represented in hex
- pointers are obfuscated unless the 'x' option is used
- char arrays are printed as chars if printable,
  [ 'l', 'i', 'k', 'e', ' ', 't', 'h', 'i', 's' ]

I'd be happy to add more format specifiers to control
these options, or tweak the defaults if needed. A
"print numbers in hex" option seems worthwhile perhaps?

Thanks for the feedback!

Alan

Joe Perches May 2, 2020, 12:25 a.m. UTC | #10

On Thu, 2020-04-30 at 03:03 -0700, Alan Maguire wrote:
> On Mon, 20 Apr 2020, Joe Perches wrote:

> > Here as well the individual field types don't contain
> > enough information to determine if a field should be
> > output as %x or %u.
> Right, we could add some more format modifiers for cases
> like that I guess.  Currently the display formats used are
> - numbers are represented as decimal
> - bitfields are represented in hex
> - pointers are obfuscated unless the 'x' option is used
> - char arrays are printed as chars if printable,
>   [ 'l', 'i', 'k', 'e', ' ', 't', 'h', 'i', 's' ]
> 
> I'd be happy to add more format specifiers to control
> these options, or tweak the defaults if needed. A
> "print numbers in hex" option seems worthwhile perhaps?

Or maybe add and use new typedefs like x8,x16,x32,x64 to the
bpf struct definitions where u8,u16,u32,u64 are %u and
x8,x16,x32,x64 are %x

[RFC,bpf-next,0/6] bpf, printk: add BTF-based type printing

Message

Comments