diff mbox series

[v2,bpf-next,2/2] btf: expose API to work with raw btf data

Message ID 20190206002949.1915237-3-andriin@fb.com
State Changes Requested
Delegated to: BPF Maintainers
Series tools/btf: extends libbpf APIs to work with btf w/o kernel

Commit Message

Andrii Nakryiko Feb. 6, 2019, 12:29 a.m. UTC
This patch exposes two new APIs, btf__get_raw_data_size() and
btf__get_raw_data(), that allow getting a copy of the raw BTF data out of
struct btf. This is useful for external programs that need to manipulate
raw data, e.g., pahole using btf__dedup() to deduplicate BTF type info
and then writing it back to a file.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
---
 tools/lib/bpf/btf.c      | 10 ++++++++++
 tools/lib/bpf/btf.h      |  2 ++
 tools/lib/bpf/libbpf.map |  2 ++
 3 files changed, 14 insertions(+)

Comments

Alexei Starovoitov Feb. 6, 2019, 3:07 a.m. UTC | #1
On Tue, Feb 05, 2019 at 04:29:49PM -0800, Andrii Nakryiko wrote:
> This patch exposes two new APIs btf__get_raw_data_size() and
> btf__get_raw_data() that allows to get a copy of raw BTF data out of
> struct btf. This is useful for external programs that need to manipulate
> raw data, e.g., pahole using btf__dedup() to deduplicate BTF type info
> and then writing it back to file.
> 
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> Acked-by: Song Liu <songliubraving@fb.com>
> ---
>  tools/lib/bpf/btf.c      | 10 ++++++++++
>  tools/lib/bpf/btf.h      |  2 ++
>  tools/lib/bpf/libbpf.map |  2 ++
>  3 files changed, 14 insertions(+)
> 
> diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> index 1c2ba7182400..34bfb3641aac 100644
> --- a/tools/lib/bpf/btf.c
> +++ b/tools/lib/bpf/btf.c
> @@ -437,6 +437,16 @@ int btf__fd(const struct btf *btf)
>  	return btf->fd;
>  }
>  
> +__u32 btf__get_raw_data_size(const struct btf *btf)
> +{
> +	return btf->data_size;
> +}
> +
> +void btf__get_raw_data(const struct btf *btf, char *data)
> +{
> +	memcpy(data, btf->data, btf->data_size);
> +}

I cannot think of any way to use this API other than
to call btf__get_raw_data_size() first,
then malloc that much memory, and then call btf__get_raw_data()
to store the BTF data into it.

If so, maybe the API should be a single call that allocates, copies,
and returns a pointer to the new memory and its size?
Probably less error-prone?
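
For illustration, the single-call shape suggested above could look roughly
like the sketch below. Everything in it is an assumption made for the
example: struct btf_sketch is a stand-in for libbpf's opaque struct btf
(only the two fields the patch reads), and btf_sketch__copy_raw_data() is a
hypothetical name, not a libbpf function.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for libbpf's struct btf; not the real definition. */
struct btf_sketch {
	const void *data;
	unsigned int data_size;
};

/*
 * Hypothetical single-call variant: allocate a buffer, copy the raw
 * data into it, and report its size through *size.
 * Returns NULL on allocation failure; the caller must free() the result.
 */
static void *btf_sketch__copy_raw_data(const struct btf_sketch *btf,
				       unsigned int *size)
{
	void *copy;

	assert(btf && size);

	/* malloc(0) may legally return NULL, so always ask for >= 1 byte */
	copy = malloc(btf->data_size ? btf->data_size : 1);
	if (!copy)
		return NULL;

	if (btf->data_size)
		memcpy(copy, btf->data, btf->data_size);
	*size = btf->data_size;
	return copy;
}
```

With this shape the caller cannot pass a buffer that is too small, but it
takes on ownership of the returned memory and has to free() it.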
Andrii Nakryiko Feb. 6, 2019, 5:46 a.m. UTC | #2
On Tue, Feb 5, 2019 at 7:07 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Feb 05, 2019 at 04:29:49PM -0800, Andrii Nakryiko wrote:
> > This patch exposes two new APIs btf__get_raw_data_size() and
> > btf__get_raw_data() that allows to get a copy of raw BTF data out of
> > struct btf. This is useful for external programs that need to manipulate
> > raw data, e.g., pahole using btf__dedup() to deduplicate BTF type info
> > and then writing it back to file.
> >
> > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> > Acked-by: Song Liu <songliubraving@fb.com>
> > ---
> >  tools/lib/bpf/btf.c      | 10 ++++++++++
> >  tools/lib/bpf/btf.h      |  2 ++
> >  tools/lib/bpf/libbpf.map |  2 ++
> >  3 files changed, 14 insertions(+)
> >
> > diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> > index 1c2ba7182400..34bfb3641aac 100644
> > --- a/tools/lib/bpf/btf.c
> > +++ b/tools/lib/bpf/btf.c
> > @@ -437,6 +437,16 @@ int btf__fd(const struct btf *btf)
> >       return btf->fd;
> >  }
> >
> > +__u32 btf__get_raw_data_size(const struct btf *btf)
> > +{
> > +     return btf->data_size;
> > +}
> > +
> > +void btf__get_raw_data(const struct btf *btf, char *data)
> > +{
> > +     memcpy(data, btf->data, btf->data_size);
> > +}
>
> I cannot think of any other way to use this api,
> but to call btf__get_raw_data_size() first,
> then malloc that much memory and then call btf__get_raw_data()
> to store btf into it.
>
> If so, may be api should be single call that allocates, copies,
> and returns pointer to new mem and its size?
> Probably less error prone?
>

I don't have a strong preference, but having the caller pass a pointer to
memory it already owns seems more flexible and allows more clever/optimal
use of memory on the caller side. E.g., instead of doing two mallocs, you
can imagine doing something like:

__u32 btf_size = btf__get_raw_data_size(btf);
__u32 ext_size = btf_ext__get_raw_data_size(btf_ext);
/* one buffer, sized for the larger of the two sections */
char *m = malloc(btf_size > ext_size ? btf_size : ext_size);
if (!m)
	return -ENOMEM;
btf__get_raw_data(btf, m);
dump_btf_section_to_file(m, some_file);
btf_ext__get_raw_data(btf_ext, m);
dump_btf_ext_section_to_file(m, some_file);
free(m);

Also, the pointer could refer to an mmap()'ed file, for instance. In
general, it might be a good thing for a library not to be prescriptive
about how one gets that piece of memory.

If those examples are not convincing enough, I'm happy to go with a single
btf__get_raw_data() call that does the allocation and returns a pointer.
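
To make the mmap() case concrete, here is a minimal sketch of a caller
copying raw data straight into a file-backed mapping. It is an
illustration only: struct btf_sketch and btf_sketch__get_raw_data() are
stand-ins that mirror the shape of the patch, and dump_raw_data_via_mmap()
is a hypothetical helper, not part of libbpf or pahole.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Stand-in for libbpf's struct btf; not the real definition. */
struct btf_sketch {
	const void *data;
	unsigned int data_size;
};

/* Same shape as the patch's btf__get_raw_data(): copy into caller memory. */
static void btf_sketch__get_raw_data(const struct btf_sketch *btf, char *dst)
{
	memcpy(dst, btf->data, btf->data_size);
}

/* Hypothetical helper: write raw data to `path` through a shared mapping. */
static int dump_raw_data_via_mmap(const struct btf_sketch *btf,
				  const char *path)
{
	size_t len;
	char *map;
	int fd;

	assert(btf && path);

	len = btf->data_size;
	if (len == 0)
		return -1;	/* mmap() rejects zero-length mappings */

	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -1;
	/* the file must be at least as long as the mapping we write to */
	if (ftruncate(fd, (off_t)len) < 0) {
		close(fd);
		return -1;
	}
	map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);		/* the mapping stays valid after close() */
	if (map == MAP_FAILED)
		return -1;

	btf_sketch__get_raw_data(btf, map);	/* no intermediate buffer */

	if (msync(map, len, MS_SYNC) < 0) {
		munmap(map, len);
		return -1;
	}
	return munmap(map, len);
}
```

The library never sees where the memory came from, which is the point being
made: the destination can be heap, stack, or a mapping.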
Alexei Starovoitov Feb. 6, 2019, 6:24 a.m. UTC | #3
On Tue, Feb 05, 2019 at 09:46:14PM -0800, Andrii Nakryiko wrote:
> On Tue, Feb 5, 2019 at 7:07 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Tue, Feb 05, 2019 at 04:29:49PM -0800, Andrii Nakryiko wrote:
> > > This patch exposes two new APIs btf__get_raw_data_size() and
> > > btf__get_raw_data() that allows to get a copy of raw BTF data out of
> > > struct btf. This is useful for external programs that need to manipulate
> > > raw data, e.g., pahole using btf__dedup() to deduplicate BTF type info
> > > and then writing it back to file.
> > >
> > > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> > > Acked-by: Song Liu <songliubraving@fb.com>
> > > ---
> > >  tools/lib/bpf/btf.c      | 10 ++++++++++
> > >  tools/lib/bpf/btf.h      |  2 ++
> > >  tools/lib/bpf/libbpf.map |  2 ++
> > >  3 files changed, 14 insertions(+)
> > >
> > > diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> > > index 1c2ba7182400..34bfb3641aac 100644
> > > --- a/tools/lib/bpf/btf.c
> > > +++ b/tools/lib/bpf/btf.c
> > > @@ -437,6 +437,16 @@ int btf__fd(const struct btf *btf)
> > >       return btf->fd;
> > >  }
> > >
> > > +__u32 btf__get_raw_data_size(const struct btf *btf)
> > > +{
> > > +     return btf->data_size;
> > > +}
> > > +
> > > +void btf__get_raw_data(const struct btf *btf, char *data)
> > > +{
> > > +     memcpy(data, btf->data, btf->data_size);
> > > +}
> >
> > I cannot think of any other way to use this api,
> > but to call btf__get_raw_data_size() first,
> > then malloc that much memory and then call btf__get_raw_data()
> > to store btf into it.
> >
> > If so, may be api should be single call that allocates, copies,
> > and returns pointer to new mem and its size?
> > Probably less error prone?
> >
> 
> I don't have strong preference, but providing pointer to allocated memory
> seems more flexible and allows more clever/optimal use of memory from caller
> side. E.g., instead of doing two mallocs, you can imagine doing something
> like:
> 
> int max_size = max(btf__get_raw_data_size(btf),
>                    btf_ext__get_raw_data_size(btf_ext));
> char *m = malloc(max_size);
> btf__get_raw_data(btf, m);
> dump_btf_section_to_file(m, some_file);
> btf_ext__get_raw_data(btf_ext, m);
> dump_btf_ext_section_to_file(m, some_file);
> free(m);
> 
> Also, pointer to memory could be mmap()'ed file, for instance. In general,
> for a library it might be a good thing to not be prescriptive as to how one
> gets that piece of memory.

Plausible, but I'd like to see pahole patches to be convinced ;)
Andrii Nakryiko Feb. 7, 2019, 7:21 p.m. UTC | #4
On Tue, Feb 5, 2019 at 10:25 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Feb 05, 2019 at 09:46:14PM -0800, Andrii Nakryiko wrote:
> > On Tue, Feb 5, 2019 at 7:07 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Tue, Feb 05, 2019 at 04:29:49PM -0800, Andrii Nakryiko wrote:
> > > > This patch exposes two new APIs btf__get_raw_data_size() and
> > > > btf__get_raw_data() that allows to get a copy of raw BTF data out of
> > > > struct btf. This is useful for external programs that need to manipulate
> > > > raw data, e.g., pahole using btf__dedup() to deduplicate BTF type info
> > > > and then writing it back to file.
> > > >
> > > > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> > > > Acked-by: Song Liu <songliubraving@fb.com>
> > > > ---
> > > >  tools/lib/bpf/btf.c      | 10 ++++++++++
> > > >  tools/lib/bpf/btf.h      |  2 ++
> > > >  tools/lib/bpf/libbpf.map |  2 ++
> > > >  3 files changed, 14 insertions(+)
> > > >
> > > > diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> > > > index 1c2ba7182400..34bfb3641aac 100644
> > > > --- a/tools/lib/bpf/btf.c
> > > > +++ b/tools/lib/bpf/btf.c
> > > > @@ -437,6 +437,16 @@ int btf__fd(const struct btf *btf)
> > > >       return btf->fd;
> > > >  }
> > > >
> > > > +__u32 btf__get_raw_data_size(const struct btf *btf)
> > > > +{
> > > > +     return btf->data_size;
> > > > +}
> > > > +
> > > > +void btf__get_raw_data(const struct btf *btf, char *data)
> > > > +{
> > > > +     memcpy(data, btf->data, btf->data_size);
> > > > +}
> > >
> > > I cannot think of any other way to use this api,
> > > but to call btf__get_raw_data_size() first,
> > > then malloc that much memory and then call btf__get_raw_data()
> > > to store btf into it.
> > >
> > > If so, may be api should be single call that allocates, copies,
> > > and returns pointer to new mem and its size?
> > > Probably less error prone?
> > >
> >
> > I don't have strong preference, but providing pointer to allocated memory
> > seems more flexible and allows more clever/optimal use of memory from caller
> > side. E.g., instead of doing two mallocs, you can imagine doing something
> > like:
> >
> > int max_size = max(btf__get_raw_data_size(btf),
> >                    btf_ext__get_raw_data_size(btf_ext));
> > char *m = malloc(max_size);
> > btf__get_raw_data(btf, m);
> > dump_btf_section_to_file(m, some_file);
> > btf_ext__get_raw_data(btf_ext, m);
> > dump_btf_ext_section_to_file(m, some_file);
> > free(m);
> >
> > Also, pointer to memory could be mmap()'ed file, for instance. In general,
> > for a library it might be a good thing to not be prescriptive as to how one
> > gets that piece of memory.
>
> Plausible, but I'd like to see pahole patches to be convinced ;)
>

Here's a summary of the proposed ways to expose raw data through the new
API, with pros/cons.

1. The originally proposed two functions: `__u32 btf__get_raw_data_size()`
to get the size, and `void btf__get_raw_data(char *data)` to write the raw
data into a caller-provided buffer.

Pros:
  - allows maximal flexibility for users of this API. They can manage
memory however is convenient for them (e.g., reusing the same buffer for
multiple btf and btf_ext raw data).
  - allows using mmap()'ed memory, as allocation and memory ownership
are delegated to the user

Cons:
  - risks buffer overflows if the user doesn't provide a big enough buffer


2. Alexei's proposal: combine getting the size and the data into a single
function that internally allocates a new memory buffer, copies the data,
and returns it to the user to use and later free.

Pros:
  - one fewer API function
  - more straightforward usage; it's hard to misuse (except for
leaking memory, if the buffer is not freed)

Cons:
  - always allocates on every call
  - least flexible approach: doesn't allow the caller to manage memory
and prevents any kind of direct write to an mmap()'ed file

3. Daniel proposed a realloc-like approach, where the caller optionally
provides a memory buffer, but we always call realloc() internally to
ensure the buffer is long enough.

Pros:
  - allows callers to provide their own memory buffer (similar to approach
#1, but see cons below)
  - prevents the user error of providing too small a buffer (similar to
approach #2)

Cons:
  - realloc() expects that the memory was allocated by a previous malloc()
call, so the caller can't allocate a bigger chunk of memory and pass a
pointer into the middle of that area (behavior is undefined in that case).
This requirement is not immediately obvious, so this approach feels more
error-prone than either approach #1 or #2
  - still doesn't allow mmap()'ed usage, again due to realloc()'s requirements
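
A rough sketch of what the realloc-like variant might look like, to make
the constraint visible. The names are hypothetical and struct btf_sketch
is only a stand-in for libbpf's struct btf; this is not code from the
thread.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for libbpf's struct btf; not the real definition. */
struct btf_sketch {
	const void *data;
	unsigned int data_size;
};

/*
 * Hypothetical realloc-style variant.
 *
 * `buf` must be NULL or a pointer previously returned by malloc(),
 * calloc() or realloc(). Anything else (a pointer into the middle of
 * an allocation, stack memory, an mmap()'ed region) is undefined
 * behavior inside realloc().
 *
 * Returns the possibly moved buffer, or NULL on failure, in which
 * case `buf` is untouched and still owned by the caller.
 */
static void *btf_sketch__get_raw_data_realloc(const struct btf_sketch *btf,
					      void *buf, unsigned int *size)
{
	void *grown;

	assert(btf && size);

	/* never request 0 bytes: realloc(p, 0) is not portable */
	grown = realloc(buf, btf->data_size ? btf->data_size : 1);
	if (!grown)
		return NULL;

	if (btf->data_size)
		memcpy(grown, btf->data, btf->data_size);
	*size = btf->data_size;
	return grown;
}
```

Note that the caller must always continue with the returned pointer, since
realloc() is free to move the block.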


Approach #3 looks the most subtly error-prone, as it's too easy to pass a
pointer that's not at the beginning of malloc()'ed memory; this might not
be detected immediately and could potentially lead to silent memory
corruption.

I'd still go with approach #1, as it provides the least restrictive API,
even though approach #2 would provide marginally better usability for
common cases.

Alexei, Daniel, which approach would you prefer in the end, after
considering all the pros and cons?
Andrii Nakryiko Feb. 7, 2019, 8:13 p.m. UTC | #5
On Thu, Feb 7, 2019 at 11:21 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Tue, Feb 5, 2019 at 10:25 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Tue, Feb 05, 2019 at 09:46:14PM -0800, Andrii Nakryiko wrote:
> > > On Tue, Feb 5, 2019 at 7:07 PM Alexei Starovoitov
> > > <alexei.starovoitov@gmail.com> wrote:
> > > >
> > > > On Tue, Feb 05, 2019 at 04:29:49PM -0800, Andrii Nakryiko wrote:
> > > > > This patch exposes two new APIs btf__get_raw_data_size() and
> > > > > btf__get_raw_data() that allows to get a copy of raw BTF data out of
> > > > > struct btf. This is useful for external programs that need to manipulate
> > > > > raw data, e.g., pahole using btf__dedup() to deduplicate BTF type info
> > > > > and then writing it back to file.
> > > > >
> > > > > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> > > > > Acked-by: Song Liu <songliubraving@fb.com>
> > > > > ---
> > > > >  tools/lib/bpf/btf.c      | 10 ++++++++++
> > > > >  tools/lib/bpf/btf.h      |  2 ++
> > > > >  tools/lib/bpf/libbpf.map |  2 ++
> > > > >  3 files changed, 14 insertions(+)
> > > > >
> > > > > diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> > > > > index 1c2ba7182400..34bfb3641aac 100644
> > > > > --- a/tools/lib/bpf/btf.c
> > > > > +++ b/tools/lib/bpf/btf.c
> > > > > @@ -437,6 +437,16 @@ int btf__fd(const struct btf *btf)
> > > > >       return btf->fd;
> > > > >  }
> > > > >
> > > > > +__u32 btf__get_raw_data_size(const struct btf *btf)
> > > > > +{
> > > > > +     return btf->data_size;
> > > > > +}
> > > > > +
> > > > > +void btf__get_raw_data(const struct btf *btf, char *data)
> > > > > +{
> > > > > +     memcpy(data, btf->data, btf->data_size);
> > > > > +}
> > > >
> > > > I cannot think of any other way to use this api,
> > > > but to call btf__get_raw_data_size() first,
> > > > then malloc that much memory and then call btf__get_raw_data()
> > > > to store btf into it.
> > > >
> > > > If so, may be api should be single call that allocates, copies,
> > > > and returns pointer to new mem and its size?
> > > > Probably less error prone?
> > > >
> > >
> > > I don't have strong preference, but providing pointer to allocated memory
> > > seems more flexible and allows more clever/optimal use of memory from caller
> > > side. E.g., instead of doing two mallocs, you can imagine doing something
> > > like:
> > >
> > > int max_size = max(btf__get_raw_data_size(btf),
> > >                    btf_ext__get_raw_data_size(btf_ext));
> > > char *m = malloc(max_size);
> > > btf__get_raw_data(btf, m);
> > > dump_btf_section_to_file(m, some_file);
> > > btf_ext__get_raw_data(btf_ext, m);
> > > dump_btf_ext_section_to_file(m, some_file);
> > > free(m);
> > >
> > > Also, pointer to memory could be mmap()'ed file, for instance. In general,
> > > for a library it might be a good thing to not be prescriptive as to how one
> > > gets that piece of memory.
> >
> > Plausible, but I'd like to see pahole patches to be convinced ;)
> >
>
> Here's a summary of proposed ways to expose raw data through new api,
> with pros/cons.
>
> 1. Originally proposed two functions. `int btf__get_raw_data_size()`
> to get size, `void btf__get_raw_data(void* buf)` to write raw data to
> a provided buf.
>
> Pros:
>   - allows maximal flexibility for users of this API. They can manage
> memory as it's convenient for them (e.g., reusing same buffer for
> multiple btf and btf_ext raw data).
>   - allows using mmap()'ed memory, as allocation and memory ownership
> is delegated to user
>
> Cons:
>   - has potential of buffer overflows, if user doesn't provide big enough buffer
>
>
> 2. Alexei's proposal to combine getting size in single function that
> internally allocates new memory buffer, copies data and returns it to
> users to use and later free.
>
> Pros:
>   - one less API function
>   - more straightforward usage, it's hard to misuse it (except for
> memory leaking, if memory is not freed)
>
> Cons:
>   - always allocated for each call
>   - least flexible approach, doesn't allow caller to manage memory,
> prevents any kind of direct write to mmap()'ed file
>
> 3. Daniel proposed realloc-like approach, where caller optionally
> provides memory buffer, but we always call realloc() internally to
> ensure we have long enough buffer.
>
> Pros:
>   - allows callers to provide their memory buffer (similar to approach
> #1, but see cons below)
>   - prevents user error with providing too small buffer (similar to approach #2)
>
> Cons:
>   - realloc expects that memory was allocated by previous malloc()
> call, so caller can't allocate bigger chunk of memory and provide
> pointer inside that area (behavior is undefined in that case). This
> requirement is not immediately obvious, so this approach feels more
> error prone than either of approach #1 and #2
>   - still doesn't allow mmap()'ed usage, again due to realloc()'s requirements
>

There is actually an approach #4: just return a const void* to the
internal memory buffer. This is trivial for struct btf and will require
just slight changes for struct btf_ext, but it puts all the control in the
user's hands without imposing any unnecessary allocations. This approach
seems to provide the best of all approaches with no downsides.
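
As a sketch of how approach #4 might look from both sides (the names and
the size out-parameter are assumptions for illustration; struct btf_sketch
stands in for libbpf's struct btf):

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for libbpf's struct btf; not the real definition. */
struct btf_sketch {
	const void *data;
	unsigned int data_size;
};

/*
 * Hypothetical accessor: hand out a read-only view of the internal
 * buffer. No allocation and no copy. Assumption: the pointer is only
 * valid while the btf object is alive and not modified.
 */
static const void *btf_sketch__raw_data(const struct btf_sketch *btf,
					unsigned int *size)
{
	assert(btf && size);

	*size = btf->data_size;
	return btf->data;
}

/* Caller side: stream the bytes to a file without an intermediate copy. */
static int write_raw_data(const struct btf_sketch *btf, FILE *out)
{
	unsigned int size;
	const void *raw = btf_sketch__raw_data(btf, &size);

	if (size && fwrite(raw, 1, size, out) != size)
		return -1;
	return 0;
}
```

A caller that does need its own copy (or wants the data in an mmap()'ed
region) can still memcpy() from the returned pointer, so nothing from
approach #1 is lost.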

>
> Approach #3 looks most subtly-error-prone, as it's too easy to just
> provide pointer that's not at the beginning of malloc()'ed memory, but
> this might not be detected immediately, and could potentially lead to
> silent memory corruption.
>
> I'd still go with approach #1 as it provides least restrictive API,
> even though approach #2 will provide marginally better usability for
> common cases.
>
> Alexei, Daniel, which approach you'd prefer in the end after
> considering all pros and cons?

Patch

diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index 1c2ba7182400..34bfb3641aac 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -437,6 +437,16 @@  int btf__fd(const struct btf *btf)
 	return btf->fd;
 }
 
+__u32 btf__get_raw_data_size(const struct btf *btf)
+{
+	return btf->data_size;
+}
+
+void btf__get_raw_data(const struct btf *btf, char *data)
+{
+	memcpy(data, btf->data, btf->data_size);
+}
+
 void btf__get_strings(const struct btf *btf, const char **strings,
 		      __u32 *str_len)
 {
diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
index e8410887f93a..d46f680b9416 100644
--- a/tools/lib/bpf/btf.h
+++ b/tools/lib/bpf/btf.h
@@ -66,6 +66,8 @@  LIBBPF_API const struct btf_type *btf__type_by_id(const struct btf *btf,
 LIBBPF_API __s64 btf__resolve_size(const struct btf *btf, __u32 type_id);
 LIBBPF_API int btf__resolve_type(const struct btf *btf, __u32 type_id);
 LIBBPF_API int btf__fd(const struct btf *btf);
+LIBBPF_API __u32 btf__get_raw_data_size(const struct btf *btf);
+LIBBPF_API void btf__get_raw_data(const struct btf *btf, char *data);
 LIBBPF_API void btf__get_strings(const struct btf *btf, const char **strings,
 				 __u32 *str_len);
 LIBBPF_API const char *btf__name_by_offset(const struct btf *btf, __u32 offset);
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index f5372df143f4..0ebbee13a3cd 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -136,6 +136,8 @@  LIBBPF_0.0.2 {
 		btf__dedup;
 		btf__get_map_kv_tids;
 		btf__get_nr_types;
+		btf__get_raw_data;
+		btf__get_raw_data_size;
 		btf__get_strings;
 		btf__load;
 		btf_ext__free;