Message ID: 20200828193603.335512-4-sdf@google.com
State: Changes Requested
Delegated to: BPF Maintainers
Series: Allow storage of flexible metadata information for eBPF programs
On Fri, Aug 28, 2020 at 12:37 PM Stanislav Fomichev <sdf@google.com> wrote:
>
> From: YiFei Zhu <zhuyifei@google.com>
>
> The patch adds a simple wrapper bpf_prog_bind_map around the syscall.
> And when using libbpf to load a program, it will probe the kernel for
> the support of this syscall, and scan for the .metadata ELF section
> and load it as an internal map like a .data section.
>
> In the case that kernel supports the BPF_PROG_BIND_MAP syscall and
> a .metadata section exists, the map will be explicitly bound to
> the program via the syscall immediately after program is loaded.
> -EEXIST is ignored for this syscall.

Here is the question I have. How important is it that all this
metadata is in a separate map? What if libbpf just PROG_BIND_MAP all
the maps inside a single BPF .o file to all BPF programs in that file?
Including ARRAY maps created for .data, .rodata and .bss, even if the
BPF program doesn't use any of the global variables? If it's too
extreme, we could do it only for global data maps, leaving explicit
map definitions in SEC(".maps") alone. Would that be terrible?
Conceptually it makes sense, because when you program in user-space,
you expect global variables to be there, even if you don't reference
it directly, right? The only downside is that you won't have a special
".metadata" map, rather it will be part of ".rodata" one.

> Cc: YiFei Zhu <zhuyifei1999@gmail.com>
> Signed-off-by: YiFei Zhu <zhuyifei@google.com>
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>  tools/lib/bpf/bpf.c      |  13 ++++
>  tools/lib/bpf/bpf.h      |   8 +++
>  tools/lib/bpf/libbpf.c   | 130 ++++++++++++++++++++++++++++++++-------
>  tools/lib/bpf/libbpf.map |   1 +
>  4 files changed, 131 insertions(+), 21 deletions(-)
>
[...]
> @@ -3592,18 +3619,13 @@ static int probe_kern_prog_name(void)
>  	return probe_fd(ret);
>  }
>
> -static int probe_kern_global_data(void)
> +static void __probe_create_global_data(int *prog, int *map,
> +				       struct bpf_insn *insns, size_t insns_cnt)

all those static functions are internal, no need for double underscore
in names

>  {
>  	struct bpf_load_program_attr prg_attr;
>  	struct bpf_create_map_attr map_attr;
>  	char *cp, errmsg[STRERR_BUFSIZE];
> -	struct bpf_insn insns[] = {
> -		BPF_LD_MAP_VALUE(BPF_REG_1, 0, 16),
> -		BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 42),
> -		BPF_MOV64_IMM(BPF_REG_0, 0),
> -		BPF_EXIT_INSN(),
> -	};
> -	int ret, map;
> +	int err;
>
>  	memset(&map_attr, 0, sizeof(map_attr));
>  	map_attr.map_type = BPF_MAP_TYPE_ARRAY;
> @@ -3611,26 +3633,40 @@ static int probe_kern_global_data(void)
>  	map_attr.value_size = 32;
>  	map_attr.max_entries = 1;
>
> -	map = bpf_create_map_xattr(&map_attr);
> -	if (map < 0) {
> -		ret = -errno;
> -		cp = libbpf_strerror_r(ret, errmsg, sizeof(errmsg));
> +	*map = bpf_create_map_xattr(&map_attr);
> +	if (*map < 0) {
> +		err = errno;
> +		cp = libbpf_strerror_r(err, errmsg, sizeof(errmsg));
>  		pr_warn("Error in %s():%s(%d). Couldn't create simple array map.\n",
> -			__func__, cp, -ret);
> -		return ret;
> +			__func__, cp, -err);
> +		return;
>  	}
>
> -	insns[0].imm = map;
> +	insns[0].imm = *map;

you are making confusing and error prone assumptions about the first
instruction passed in, I really-really don't like this, super easy to
miss and super easy to screw up.

>
>  	memset(&prg_attr, 0, sizeof(prg_attr));
>  	prg_attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
>  	prg_attr.insns = insns;
> -	prg_attr.insns_cnt = ARRAY_SIZE(insns);
> +	prg_attr.insns_cnt = insns_cnt;
>  	prg_attr.license = "GPL";
>
> -	ret = bpf_load_program_xattr(&prg_attr, NULL, 0);
> +	*prog = bpf_load_program_xattr(&prg_attr, NULL, 0);
> +}
> +
[...]
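For reference, the bpf_prog_bind_map() wrapper being added here is a thin layer over a new BPF_PROG_BIND_MAP syscall command. A minimal sketch of the raw call, assuming the command number and attr layout from this series' uapi changes (hardcoded below in case the installed linux/bpf.h predates them; on older kernels the call simply fails with -EINVAL):

```c
#include <stdint.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Command number added by this series; defined locally so the sketch
 * builds even against pre-series kernel headers. */
#define PROG_BIND_MAP_CMD 35

/* Mirrors the prog_bind_map member this series adds to union bpf_attr. */
struct prog_bind_map_attr {
	uint32_t prog_fd;
	uint32_t map_fd;
	uint32_t flags;
};

/* Returns 0 on success, -1 with errno set on failure (EINVAL on kernels
 * without the command; per the commit message, EEXIST should be ignored
 * by callers since the map may already be bound). */
static int prog_bind_map(int prog_fd, int map_fd)
{
	struct prog_bind_map_attr attr = {
		.prog_fd = (uint32_t)prog_fd,
		.map_fd = (uint32_t)map_fd,
		.flags = 0,
	};

	return syscall(__NR_bpf, PROG_BIND_MAP_CMD, &attr, sizeof(attr));
}
```

With a wrapper like this, binding .rodata (or every global-data map) to every program right after load, ignoring -EEXIST, is only a few extra lines in the loader.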
On Wed, Sep 02, 2020 at 07:31:33PM -0700, Andrii Nakryiko wrote:
> On Fri, Aug 28, 2020 at 12:37 PM Stanislav Fomichev <sdf@google.com> wrote:
> >
> > From: YiFei Zhu <zhuyifei@google.com>
> >
> > The patch adds a simple wrapper bpf_prog_bind_map around the syscall.
> > And when using libbpf to load a program, it will probe the kernel for
> > the support of this syscall, and scan for the .metadata ELF section
> > and load it as an internal map like a .data section.
> >
> > In the case that kernel supports the BPF_PROG_BIND_MAP syscall and
> > a .metadata section exists, the map will be explicitly bound to
> > the program via the syscall immediately after program is loaded.
> > -EEXIST is ignored for this syscall.
>
> Here is the question I have. How important is it that all this
> metadata is in a separate map? What if libbpf just PROG_BIND_MAP all
> the maps inside a single BPF .o file to all BPF programs in that file?
> Including ARRAY maps created for .data, .rodata and .bss, even if the
> BPF program doesn't use any of the global variables? If it's too
> extreme, we could do it only for global data maps, leaving explicit
> map definitions in SEC(".maps") alone. Would that be terrible?
> Conceptually it makes sense, because when you program in user-space,
> you expect global variables to be there, even if you don't reference
> it directly, right? The only downside is that you won't have a special
> ".metadata" map, rather it will be part of ".rodata" one.

That's an interesting idea.
Indeed. If we have BPF_PROG_BIND_MAP command why do we need to create
another map that behaves exactly like .rodata but has a different name?
Wouldn't it be better to identify metadata elements some other way?
Like by common prefix/suffix name of the variables or
via grouping them under one structure with standard prefix?
Like:
struct bpf_prog_metadata_blahblah {
	char compiler_name[];
	int my_internal_prog_version;
} = { .compiler_name[] = "clang v.12", ...};

In the past we did this hack for 'version' and for 'license',
but we did it because we didn't have BTF and there was no other way
to determine the boundaries.

I think libbpf can and should support multiple rodata sections with
arbitrary names, but hardcoding one specific ".metadata" name?
Hmm. Let's think through the implications.
Multiple .o support and static linking is coming soon.
When two .o-s with multiple bpf progs are statically linked libbpf
won't have any choice but to merge them together under single
".metadata" section and single map that will be BPF_PROG_BIND_MAP-ed
to different progs. Meaning that metadata applies to final elf file
after linking. It's _not_ per program metadata.

May be we should talk about problem statement and goals.
Do we actually need metadata per program or metadata per single .o
or metadata per final .o with multiple .o linked together?
What is this metadata?

If it's just unreferenced by program read only data then no special names or
prefixes are needed. We can introduce BPF_PROG_BIND_MAP to bind any map to any
program and it would be up to tooling to decide the meaning of the data in the
map. For example, bpftool can choose to print all variables from all read only
maps that match "bpf_metadata_" prefix, but it will be bpftool convention only
and not hard coded in libbpf.
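Spelled out as compilable C, the group-under-one-struct idea might look like the following (type name, field sizes, and values are purely illustrative; the sketch above deliberately leaves them open):

```c
#include <string.h>

/* Hedged, compilable version of the struct-with-standard-prefix idea:
 * metadata grouped under one struct whose type name tooling can find
 * via BTF. Sizes and values here are invented for illustration. */
struct bpf_prog_metadata_example {
	char compiler_name[16];
	int my_internal_prog_version;
};

static const struct bpf_prog_metadata_example prog_meta = {
	.compiler_name = "clang v.12",
	.my_internal_prog_version = 1,
};
```

In a BPF object this would sit in .rodata, and a tool that understands the type-name prefix could dump the fields with their BTF-described boundaries, no hardcoded section name needed.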
On Thu, Sep 3, 2020 at 6:29 PM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Wed, Sep 02, 2020 at 07:31:33PM -0700, Andrii Nakryiko wrote: > > On Fri, Aug 28, 2020 at 12:37 PM Stanislav Fomichev <sdf@google.com> wrote: > > > > > > From: YiFei Zhu <zhuyifei@google.com> > > > > > > The patch adds a simple wrapper bpf_prog_bind_map around the syscall. > > > And when using libbpf to load a program, it will probe the kernel for > > > the support of this syscall, and scan for the .metadata ELF section > > > and load it as an internal map like a .data section. > > > > > > In the case that kernel supports the BPF_PROG_BIND_MAP syscall and > > > a .metadata section exists, the map will be explicitly bound to > > > the program via the syscall immediately after program is loaded. > > > -EEXIST is ignored for this syscall. > > > > Here is the question I have. How important is it that all this > > metadata is in a separate map? What if libbpf just PROG_BIND_MAP all > > the maps inside a single BPF .o file to all BPF programs in that file? > > Including ARRAY maps created for .data, .rodata and .bss, even if the > > BPF program doesn't use any of the global variables? If it's too > > extreme, we could do it only for global data maps, leaving explicit > > map definitions in SEC(".maps") alone. Would that be terrible? > > Conceptually it makes sense, because when you program in user-space, > > you expect global variables to be there, even if you don't reference > > it directly, right? The only downside is that you won't have a special > > ".metadata" map, rather it will be part of ".rodata" one. > > That's an interesting idea. > Indeed. If we have BPF_PROG_BIND_MAP command why do we need to create > another map that behaves exactly like .rodata but has a different name? That was exactly my thought when I re-read this patch set :) > Wouldn't it be better to identify metadata elements some other way? 
> Like by common prefix/suffix name of the variables or > via grouping them under one structure with standard prefix? > Like: > struct bpf_prog_metadata_blahblah { > char compiler_name[]; > int my_internal_prog_version; > } = { .compiler_name[] = "clang v.12", ...}; > > In the past we did this hack for 'version' and for 'license', > but we did it because we didn't have BTF and there was no other way > to determine the boundaries. > I think libbpf can and should support multiple rodata sections with Yep, that's coming, we already have a pretty common .rodata.str1.1 section emitted by Clang for some cases, which libbpf currently ignores, but that should change. Creating a separate map for all such small sections seems excessive, so my plan is to combine them and their BTFs into one, as you assumed below. > arbitrary names, but hardcoding one specific ".metadata" name? > Hmm. Let's think through the implications. > Multiple .o support and static linking is coming soon. > When two .o-s with multiple bpf progs are statically linked libbpf > won't have any choice but to merge them together under single > ".metadata" section and single map that will be BPF_PROG_BIND_MAP-ed > to different progs. Meaning that metadata applies to final elf file > after linking. It's _not_ per program metadata. Right, exactly. > May be we should talk about problem statement and goals. > Do we actually need metadata per program or metadata per single .o > or metadata per final .o with multiple .o linked together? > What is this metadata? Yep, that's a very valid question. I've also CC'ed Andrey. > If it's just unreferenced by program read only data then no special names or > prefixes are needed. We can introduce BPF_PROG_BIND_MAP to bind any map to any > program and it would be up to tooling to decide the meaning of the data in the > map. 
For example, bpftool can choose to print all variables from all read only > maps that match "bpf_metadata_" prefix, but it will be bpftool convention only > and not hard coded in libbpf. Agree as well. It feels a bit odd for libbpf to handle ".metadata" specially, given libbpf itself doesn't care about its contents at all. So thanks for bringing this up, I think this is an important discussion to have.
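The bpftool convention floated here would amount to a plain name-prefix check when dumping variables from read-only maps; a sketch (the "bpf_metadata_" prefix is the proposed convention only, nothing libbpf itself would enforce):

```c
#include <stdbool.h>
#include <string.h>

/* Convention-only check: treat a read-only map variable as metadata
 * iff its name starts with "bpf_metadata_". */
static bool is_bpf_metadata_var(const char *name)
{
	static const char pfx[] = "bpf_metadata_";

	return strncmp(name, pfx, sizeof(pfx) - 1) == 0;
}
```

bpftool would walk the variables of each bound read-only map (via the map's BTF) and print only the ones passing this check.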
Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

>> May be we should talk about problem statement and goals.
>> Do we actually need metadata per program or metadata per single .o
>> or metadata per final .o with multiple .o linked together?
>> What is this metadata?
>
> Yep, that's a very valid question. I've also CC'ed Andrey.

For the libxdp use case, I need metadata per program. But I'm already
sticking that in a single section and disambiguating by struct name
(just prefixing the function name with a _), so I think it's fine to
have this kind of "concatenated metadata" per elf file and parse out the
per-program information from that. This is similar to the BTF-encoded
"metadata" we can do today.

>> If it's just unreferenced by program read only data then no special names or
>> prefixes are needed. We can introduce BPF_PROG_BIND_MAP to bind any map to any
>> program and it would be up to tooling to decide the meaning of the data in the
>> map. For example, bpftool can choose to print all variables from all read only
>> maps that match "bpf_metadata_" prefix, but it will be bpftool convention only
>> and not hard coded in libbpf.
>
> Agree as well. It feels a bit odd for libbpf to handle ".metadata"
> specially, given libbpf itself doesn't care about its contents at all.
>
> So thanks for bringing this up, I think this is an important
> discussion to have.

I'm fine with having this be part of .rodata. One drawback, though, is
that if any metadata is defined, it becomes a bit more complicated to
use bpf_map__set_initial_value() because that now also has to include
the metadata. Any way we can improve upon that?

-Toke
On Mon, Sep 7, 2020 at 1:49 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > > Andrii Nakryiko <andrii.nakryiko@gmail.com> writes: > > >> May be we should talk about problem statement and goals. > >> Do we actually need metadata per program or metadata per single .o > >> or metadata per final .o with multiple .o linked together? > >> What is this metadata? > > > > Yep, that's a very valid question. I've also CC'ed Andrey. > > For the libxdp use case, I need metadata per program. But I'm already > sticking that in a single section and disambiguating by struct name > (just prefixing the function name with a _ ), so I think it's fine to > have this kind of "concatenated metadata" per elf file and parse out the > per-program information from that. This is similar to the BTF-encoded > "metadata" we can do today. We've come full circle :-) I think we discussed that approach originally - to stick everything into existing global .data/.rodata and use some variable prefix for the metadata. I'm fine with that approach. The only thing I don't understand is - why bother with the additional .rodata.metadata section and merging? Can we unconditionally do BPF_PROG_BIND_MAP(.rodata) from libbpf (and ignore the error) and be done? Sticking to the original question: for our use-case, the metadata is per .o file. I'm not sure how it would work in the 'multiple .o linked together' use case. Ideally, we'd need to preserve all metadata? > >> If it's just unreferenced by program read only data then no special names or > >> prefixes are needed. We can introduce BPF_PROG_BIND_MAP to bind any map to any > >> program and it would be up to tooling to decide the meaning of the data in the > >> map. For example, bpftool can choose to print all variables from all read only > >> maps that match "bpf_metadata_" prefix, but it will be bpftool convention only > >> and not hard coded in libbpf. > > > > Agree as well. 
It feels a bit odd for libbpf to handle ".metadata" > > specially, given libbpf itself doesn't care about its contents at all. > > > > So thanks for bringing this up, I think this is an important > > discussion to have. > > I'm fine with having this be part of .rodata. One drawback, though, is > that if any metadata is defined, it becomes a bit more complicated to > use bpf_map__set_initial_value() because that now also has to include > the metadata. Any way we can improve upon that? Right. One additional thing we wanted this metadata to have is the comm of the process who loaded this bpf program (to be filled/added by libbpf). I suppose .rodata.metadata section can help with that?
Andrii Nakryiko <andrii.nakryiko@gmail.com> [Fri, 2020-09-04 16:19 -0700]: > On Thu, Sep 3, 2020 at 6:29 PM Alexei Starovoitov > <alexei.starovoitov@gmail.com> wrote: > > > > On Wed, Sep 02, 2020 at 07:31:33PM -0700, Andrii Nakryiko wrote: > > > On Fri, Aug 28, 2020 at 12:37 PM Stanislav Fomichev <sdf@google.com> wrote: > > > > > > > > From: YiFei Zhu <zhuyifei@google.com> > > > > > > > > The patch adds a simple wrapper bpf_prog_bind_map around the syscall. > > > > And when using libbpf to load a program, it will probe the kernel for > > > > the support of this syscall, and scan for the .metadata ELF section > > > > and load it as an internal map like a .data section. > > > > > > > > In the case that kernel supports the BPF_PROG_BIND_MAP syscall and > > > > a .metadata section exists, the map will be explicitly bound to > > > > the program via the syscall immediately after program is loaded. > > > > -EEXIST is ignored for this syscall. > > > > > > Here is the question I have. How important is it that all this > > > metadata is in a separate map? What if libbpf just PROG_BIND_MAP all > > > the maps inside a single BPF .o file to all BPF programs in that file? > > > Including ARRAY maps created for .data, .rodata and .bss, even if the > > > BPF program doesn't use any of the global variables? If it's too > > > extreme, we could do it only for global data maps, leaving explicit > > > map definitions in SEC(".maps") alone. Would that be terrible? > > > Conceptually it makes sense, because when you program in user-space, > > > you expect global variables to be there, even if you don't reference > > > it directly, right? The only downside is that you won't have a special > > > ".metadata" map, rather it will be part of ".rodata" one. > > > > That's an interesting idea. > > Indeed. If we have BPF_PROG_BIND_MAP command why do we need to create > > another map that behaves exactly like .rodata but has a different name? 
> > That was exactly my thought when I re-read this patch set :) > > > Wouldn't it be better to identify metadata elements some other way? > > Like by common prefix/suffix name of the variables or > > via grouping them under one structure with standard prefix? > > Like: > > struct bpf_prog_metadata_blahblah { > > char compiler_name[]; > > int my_internal_prog_version; > > } = { .compiler_name[] = "clang v.12", ...}; > > > > In the past we did this hack for 'version' and for 'license', > > but we did it because we didn't have BTF and there was no other way > > to determine the boundaries. > > I think libbpf can and should support multiple rodata sections with > > Yep, that's coming, we already have a pretty common .rodata.str1.1 > section emitted by Clang for some cases, which libbpf currently > ignores, but that should change. Creating a separate map for all such > small sections seems excessive, so my plan is to combine them and > their BTFs into one, as you assumed below. > > > arbitrary names, but hardcoding one specific ".metadata" name? > > Hmm. Let's think through the implications. > > Multiple .o support and static linking is coming soon. > > When two .o-s with multiple bpf progs are statically linked libbpf > > won't have any choice but to merge them together under single > > ".metadata" section and single map that will be BPF_PROG_BIND_MAP-ed > > to different progs. Meaning that metadata applies to final elf file > > after linking. It's _not_ per program metadata. > > Right, exactly. > > > May be we should talk about problem statement and goals. > > Do we actually need metadata per program or metadata per single .o > > or metadata per final .o with multiple .o linked together? > > What is this metadata? > > Yep, that's a very valid question. I've also CC'ed Andrey. 
From my side the problem statement is to be able to save a bunch of
metadata fields per BPF object file (I don't distinguish "final .o" and
"multiple .o linked together" since we have only the former in prod).

Specifically things like oncall team who owns the programs in the object
(the most important info), build info (repository revision, build commit
time, build time), etc. The plan is to integrate it with build system
and be able to quickly identify source code and point of contact for any
particular program.

All these things are always the same for all programs in one object. It
may change in the future, but at the moment I'm not aware of any
use-case where these things can be different for different programs in
the same object.

I don't have strong preferences on the implementation side as long as it
covers the use-case, e.g. the one in the patch set would work FWIW.

> > If it's just unreferenced by program read only data then no special names or
> > prefixes are needed. We can introduce BPF_PROG_BIND_MAP to bind any map to any
> > program and it would be up to tooling to decide the meaning of the data in the
> > map. For example, bpftool can choose to print all variables from all read only
> > maps that match "bpf_metadata_" prefix, but it will be bpftool convention only
> > and not hard coded in libbpf.
>
> Agree as well. It feels a bit odd for libbpf to handle ".metadata"
> specially, given libbpf itself doesn't care about its contents at all.
>
> So thanks for bringing this up, I think this is an important discussion to have.
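Under the prefix convention, fields like these could be expressed as plain prefixed read-only globals; a sketch with invented names and values (in a real BPF object they would be compiled with clang -target bpf and land in a read-only section; the SEC() stand-in below just lets the snippet also build as host C):

```c
/* Stand-in for the SEC() macro from bpf_helpers.h, so this sketch
 * compiles as host C too; section name is illustrative. */
#define SEC(name) __attribute__((section(name), used))

/* Hypothetical metadata fields following the "bpf_metadata_" prefix
 * convention; all values are made up for illustration. */
const char bpf_metadata_oncall[] SEC(".rodata.meta") = "bpf-oncall-team";
const char bpf_metadata_build_rev[] SEC(".rodata.meta") = "deadbeef";
const int bpf_metadata_build_time SEC(".rodata.meta") = 1599000000;
```

Since every program in the object gets the containing map bound to it, these fields naturally have per-object (not per-program) scope, matching the use-case described above.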
On Mon, Sep 7, 2020 at 1:49 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > > Andrii Nakryiko <andrii.nakryiko@gmail.com> writes: > > >> May be we should talk about problem statement and goals. > >> Do we actually need metadata per program or metadata per single .o > >> or metadata per final .o with multiple .o linked together? > >> What is this metadata? > > > > Yep, that's a very valid question. I've also CC'ed Andrey. > > For the libxdp use case, I need metadata per program. But I'm already > sticking that in a single section and disambiguating by struct name > (just prefixing the function name with a _ ), so I think it's fine to > have this kind of "concatenated metadata" per elf file and parse out the > per-program information from that. This is similar to the BTF-encoded > "metadata" we can do today. > > >> If it's just unreferenced by program read only data then no special names or > >> prefixes are needed. We can introduce BPF_PROG_BIND_MAP to bind any map to any > >> program and it would be up to tooling to decide the meaning of the data in the > >> map. For example, bpftool can choose to print all variables from all read only > >> maps that match "bpf_metadata_" prefix, but it will be bpftool convention only > >> and not hard coded in libbpf. > > > > Agree as well. It feels a bit odd for libbpf to handle ".metadata" > > specially, given libbpf itself doesn't care about its contents at all. > > > > So thanks for bringing this up, I think this is an important > > discussion to have. > > I'm fine with having this be part of .rodata. One drawback, though, is > that if any metadata is defined, it becomes a bit more complicated to > use bpf_map__set_initial_value() because that now also has to include > the metadata. Any way we can improve upon that? 
I know that skeleton is not an answer for you, so you'll have to find
the DATASEC and corresponding variable offset and size (libbpf provides
APIs for all those operations, but you'll need to combine them
together). Then mmap() the map and you can do partial updates. There
is no other way to update only portions of an ARRAY map, except
through memory-mapping.

>
> -Toke
>
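The flow described above — look up the variable's offset and size in the DATASEC, then rewrite only that slice of the mmap()'ed map contents — can be modeled like this (simplified: real code would pull the offsets from the object's BTF through libbpf's BTF API rather than a hand-built table):

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for BTF DATASEC variable info; in practice libbpf
 * derives name/offset/size from the object's BTF. */
struct var_info {
	const char *name;
	size_t offset;
	size_t size;
};

/* Patch a single variable inside the section image (in practice the
 * mmap()'ed map memory), leaving all other variables untouched. */
static int patch_var(char *sec_mem, const struct var_info *vars,
		     size_t nr_vars, const char *name,
		     const void *val, size_t len)
{
	for (size_t i = 0; i < nr_vars; i++) {
		if (strcmp(vars[i].name, name) != 0)
			continue;
		if (len != vars[i].size)
			return -1;	/* size mismatch */
		memcpy(sec_mem + vars[i].offset, val, len);
		return 0;
	}
	return -1;			/* no such variable */
}
```

This is exactly the "partial update" shape: the metadata portion of the section stays intact while only the caller's own variable changes.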
On Tue, Sep 8, 2020 at 8:20 AM Stanislav Fomichev <sdf@google.com> wrote: > > On Mon, Sep 7, 2020 at 1:49 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > > > > Andrii Nakryiko <andrii.nakryiko@gmail.com> writes: > > > > >> May be we should talk about problem statement and goals. > > >> Do we actually need metadata per program or metadata per single .o > > >> or metadata per final .o with multiple .o linked together? > > >> What is this metadata? > > > > > > Yep, that's a very valid question. I've also CC'ed Andrey. > > > > For the libxdp use case, I need metadata per program. But I'm already > > sticking that in a single section and disambiguating by struct name > > (just prefixing the function name with a _ ), so I think it's fine to > > have this kind of "concatenated metadata" per elf file and parse out the > > per-program information from that. This is similar to the BTF-encoded > > "metadata" we can do today. > We've come full circle :-) > I think we discussed that approach originally - to stick everything > into existing global .data/.rodata and use some variable prefix for > the metadata. I'm fine with that approach. The only thing I don't That's what we wanted all along, but the problem was with keeping reference to bpf_map from bpf_prog. We eventually gave up and concluded that extra BPF command is necessary. But somewhere along the road we somehow concluded we need an entire new special map/section, and I didn't realize at first (and it seems it wasn't just me) that the latter part is unnecessary. > understand is - why bother with the additional .rodata.metadata > section and merging? > Can we unconditionally do BPF_PROG_BIND_MAP(.rodata) from libbpf (and > ignore the error) and be done? That's exactly what we are proposing, to stick to .rodata, instead of having extra .metadata section. Multiple .rodata/.data sections are orthogonal concerns, which we need to solve as well, because the compiler does emit many of them in some cases. 
So in that context, once we support multiple .rodata's, it would be possible to have metadata-only "sub-sections". But we don't have to do that, keeping everything simple and put into .rodata works just fine. > > Sticking to the original question: for our use-case, the metadata is > per .o file. I'm not sure how it would work in the 'multiple .o linked > together' use case. Ideally, we'd need to preserve all metadata? Just like in user-space, when you have multiple .c files compiled into .o files and later linked into a final library or binary, all the .data and .rodata sections are combined. That's what will happen with BPF .o files as well. So it will be automatically preserved, as you seem to want. > > > >> If it's just unreferenced by program read only data then no special names or > > >> prefixes are needed. We can introduce BPF_PROG_BIND_MAP to bind any map to any > > >> program and it would be up to tooling to decide the meaning of the data in the > > >> map. For example, bpftool can choose to print all variables from all read only > > >> maps that match "bpf_metadata_" prefix, but it will be bpftool convention only > > >> and not hard coded in libbpf. > > > > > > Agree as well. It feels a bit odd for libbpf to handle ".metadata" > > > specially, given libbpf itself doesn't care about its contents at all. > > > > > > So thanks for bringing this up, I think this is an important > > > discussion to have. > > > > I'm fine with having this be part of .rodata. One drawback, though, is > > that if any metadata is defined, it becomes a bit more complicated to > > use bpf_map__set_initial_value() because that now also has to include > > the metadata. Any way we can improve upon that? > Right. One additional thing we wanted this metadata to have is the > comm of the process who loaded this bpf program (to be filled/added by > libbpf). > I suppose .rodata.metadata section can help with that? .rodata.metadata has nothing to do with this. 
I'm also not sure whether it's a responsibility of libbpf to provide process's comm as a metadata, to be honest. Next thing it will be user name/user id, then cgroup name, then some other application-level concept and so on. I'd prefer to keep it simple and let applications handle that for themselves. Luckily, using a BPF skeleton this is **extremely** easy.
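For completeness: an application that wants its own comm recorded can fetch it itself and copy it into a writable global through the skeleton before load; the fetching part is just a /proc read (sketch, Linux-only):

```c
#include <stdio.h>
#include <string.h>

/* Read this process's comm (what libbpf would otherwise have recorded);
 * the caller then copies it into e.g. a .data variable via the skeleton. */
static int read_self_comm(char *buf, size_t len)
{
	FILE *f = fopen("/proc/self/comm", "r");

	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';	/* strip trailing newline */
	return 0;
}
```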
On Tue, Sep 8, 2020 at 10:45 AM Andrey Ignatov <rdna@fb.com> wrote: > > Andrii Nakryiko <andrii.nakryiko@gmail.com> [Fri, 2020-09-04 16:19 -0700]: > > On Thu, Sep 3, 2020 at 6:29 PM Alexei Starovoitov > > <alexei.starovoitov@gmail.com> wrote: > > > > > > On Wed, Sep 02, 2020 at 07:31:33PM -0700, Andrii Nakryiko wrote: > > > > On Fri, Aug 28, 2020 at 12:37 PM Stanislav Fomichev <sdf@google.com> wrote: > > > > > > > > > > From: YiFei Zhu <zhuyifei@google.com> > > > > > > > > > > The patch adds a simple wrapper bpf_prog_bind_map around the syscall. > > > > > And when using libbpf to load a program, it will probe the kernel for > > > > > the support of this syscall, and scan for the .metadata ELF section > > > > > and load it as an internal map like a .data section. > > > > > > > > > > In the case that kernel supports the BPF_PROG_BIND_MAP syscall and > > > > > a .metadata section exists, the map will be explicitly bound to > > > > > the program via the syscall immediately after program is loaded. > > > > > -EEXIST is ignored for this syscall. > > > > > > > > Here is the question I have. How important is it that all this > > > > metadata is in a separate map? What if libbpf just PROG_BIND_MAP all > > > > the maps inside a single BPF .o file to all BPF programs in that file? > > > > Including ARRAY maps created for .data, .rodata and .bss, even if the > > > > BPF program doesn't use any of the global variables? If it's too > > > > extreme, we could do it only for global data maps, leaving explicit > > > > map definitions in SEC(".maps") alone. Would that be terrible? > > > > Conceptually it makes sense, because when you program in user-space, > > > > you expect global variables to be there, even if you don't reference > > > > it directly, right? The only downside is that you won't have a special > > > > ".metadata" map, rather it will be part of ".rodata" one. > > > > > > That's an interesting idea. > > > Indeed. 
If we have BPF_PROG_BIND_MAP command why do we need to create > > > another map that behaves exactly like .rodata but has a different name? > > > > That was exactly my thought when I re-read this patch set :) > > > > > Wouldn't it be better to identify metadata elements some other way? > > > Like by common prefix/suffix name of the variables or > > > via grouping them under one structure with standard prefix? > > > Like: > > > struct bpf_prog_metadata_blahblah { > > > char compiler_name[]; > > > int my_internal_prog_version; > > > } = { .compiler_name[] = "clang v.12", ...}; > > > > > > In the past we did this hack for 'version' and for 'license', > > > but we did it because we didn't have BTF and there was no other way > > > to determine the boundaries. > > > I think libbpf can and should support multiple rodata sections with > > > > Yep, that's coming, we already have a pretty common .rodata.str1.1 > > section emitted by Clang for some cases, which libbpf currently > > ignores, but that should change. Creating a separate map for all such > > small sections seems excessive, so my plan is to combine them and > > their BTFs into one, as you assumed below. > > > > > arbitrary names, but hardcoding one specific ".metadata" name? > > > Hmm. Let's think through the implications. > > > Multiple .o support and static linking is coming soon. > > > When two .o-s with multiple bpf progs are statically linked libbpf > > > won't have any choice but to merge them together under single > > > ".metadata" section and single map that will be BPF_PROG_BIND_MAP-ed > > > to different progs. Meaning that metadata applies to final elf file > > > after linking. It's _not_ per program metadata. > > > > Right, exactly. > > > > > May be we should talk about problem statement and goals. > > > Do we actually need metadata per program or metadata per single .o > > > or metadata per final .o with multiple .o linked together? > > > What is this metadata? 
> > > > Yep, that's a very valid question. I've also CC'ed Andrey. > > From my side the problem statement is to be able to save a bunch of > metadata fields per BPF object file (I don't distinguish "final .o" and > "multiple .o linked together" since we have only the former in prod). We don't *yet*. But reading below, you have the entire BPF application (i.e., a collection of maps, variables and programs) in mind, not specifically single .c file compiled into a single .o file. So everything works out, I think. > > Specifically things like oncall team who owns the programs in the object > (the most important info), build info (repository revision, build commit > time, build time), etc. The plan is to integrate it with build system > and be able to quickly identify source code and point of contact for any > particular program. > > All these things are always the same for all programs in one object. It > may change in the future, but at the moment I'm not aware of any > use-case where these things can be different for different programs in > the same object. > > I don't have strong preferences on the implementation side as long as it > covers the use-case, e.g. the one in the patch set would work FWIW. > > > > If it's just unreferenced by program read only data then no special names or > > > prefixes are needed. We can introduce BPF_PROG_BIND_MAP to bind any map to any > > > program and it would be up to tooling to decide the meaning of the data in the > > > map. For example, bpftool can choose to print all variables from all read only > > > maps that match "bpf_metadata_" prefix, but it will be bpftool convention only > > > and not hard coded in libbpf. > > > > Agree as well. It feels a bit odd for libbpf to handle ".metadata" > > specially, given libbpf itself doesn't care about its contents at all. > > > > So thanks for bringing this up, I think this is an important discussion to have. > > -- > Andrey Ignatov
Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Mon, Sep 7, 2020 at 1:49 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>>
>> >> May be we should talk about problem statement and goals.
>> >> Do we actually need metadata per program or metadata per single .o
>> >> or metadata per final .o with multiple .o linked together?
>> >> What is this metadata?
>> >
>> > Yep, that's a very valid question. I've also CC'ed Andrey.
>>
>> For the libxdp use case, I need metadata per program. But I'm already
>> sticking that in a single section and disambiguating by struct name
>> (just prefixing the function name with a _ ), so I think it's fine to
>> have this kind of "concatenated metadata" per elf file and parse out the
>> per-program information from that. This is similar to the BTF-encoded
>> "metadata" we can do today.
>>
>> >> If it's just unreferenced by program read only data then no special names or
>> >> prefixes are needed. We can introduce BPF_PROG_BIND_MAP to bind any map to any
>> >> program and it would be up to tooling to decide the meaning of the data in the
>> >> map. For example, bpftool can choose to print all variables from all read only
>> >> maps that match "bpf_metadata_" prefix, but it will be bpftool convention only
>> >> and not hard coded in libbpf.
>> >
>> > Agree as well. It feels a bit odd for libbpf to handle ".metadata"
>> > specially, given libbpf itself doesn't care about its contents at all.
>> >
>> > So thanks for bringing this up, I think this is an important
>> > discussion to have.
>>
>> I'm fine with having this be part of .rodata. One drawback, though, is
>> that if any metadata is defined, it becomes a bit more complicated to
>> use bpf_map__set_initial_value() because that now also has to include
>> the metadata. Any way we can improve upon that?
>
> I know that skeleton is not an answer for you, so you'll have to find
> DATASEC and corresponding variable offset and size (libbpf provides
> APIs for all those operations, but you'll need to combine them
> together). Then mmap() map and then you can do partial updates. There
> is no other way to update only portions of an ARRAY map, except
> through memory-mapping.

Well, I wouldn't mind having to go digging through the section. But is
it really possible to pick out and modify parts of it by mmap() before
the object is loaded (and the map frozen)? How? I seem to recall we
added bpf_map__set_initial_value() because this was *not* possible with
the public API?

Also, for this, a bpf_map__get_initial_value() could be a simple way to
allow partial modifications. The caller could just get the whole map
value, modify it, and set it again afterwards with
__set_initial_value(). Any objections to adding that?

-Toke
On Wed, Sep 9, 2020 at 3:58 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>
> > On Mon, Sep 7, 2020 at 1:49 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> >>
> >> >> May be we should talk about problem statement and goals.
> >> >> Do we actually need metadata per program or metadata per single .o
> >> >> or metadata per final .o with multiple .o linked together?
> >> >> What is this metadata?
> >> >
> >> > Yep, that's a very valid question. I've also CC'ed Andrey.
> >>
> >> For the libxdp use case, I need metadata per program. But I'm already
> >> sticking that in a single section and disambiguating by struct name
> >> (just prefixing the function name with a _ ), so I think it's fine to
> >> have this kind of "concatenated metadata" per elf file and parse out the
> >> per-program information from that. This is similar to the BTF-encoded
> >> "metadata" we can do today.
> >>
> >> >> If it's just unreferenced by program read only data then no special names or
> >> >> prefixes are needed. We can introduce BPF_PROG_BIND_MAP to bind any map to any
> >> >> program and it would be up to tooling to decide the meaning of the data in the
> >> >> map. For example, bpftool can choose to print all variables from all read only
> >> >> maps that match "bpf_metadata_" prefix, but it will be bpftool convention only
> >> >> and not hard coded in libbpf.
> >> >
> >> > Agree as well. It feels a bit odd for libbpf to handle ".metadata"
> >> > specially, given libbpf itself doesn't care about its contents at all.
> >> >
> >> > So thanks for bringing this up, I think this is an important
> >> > discussion to have.
> >>
> >> I'm fine with having this be part of .rodata. One drawback, though, is
> >> that if any metadata is defined, it becomes a bit more complicated to
> >> use bpf_map__set_initial_value() because that now also has to include
> >> the metadata.
> >> Any way we can improve upon that?
> >
> > I know that skeleton is not an answer for you, so you'll have to find
> > DATASEC and corresponding variable offset and size (libbpf provides
> > APIs for all those operations, but you'll need to combine them
> > together). Then mmap() map and then you can do partial updates. There
> > is no other way to update only portions of an ARRAY map, except
> > through memory-mapping.
>
> Well, I wouldn't mind having to go digging through the section. But is
> it really possible to pick out and modify parts of it by mmap() before
> the object is loaded (and the map frozen)? How? I seem to recall we
> added bpf_map__set_initial_value() because this was *not* possible with
> the public API?

Ah, right, .rodata is frozen on load, forgot we are talking about
.rodata here.

> Also, for this, a bpf_map__get_initial_value() could be a simple way to
> allow partial modifications. The caller could just get the whole map
> value, modify it, and set it again afterwards with
> __set_initial_value(). Any objections to adding that?

Yeah, I think having an API for getting initial map value makes sense.
But please follow the naming convention for getters and call it
bpf_map__initial_value().

>
> -Toke
>
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 82b983ff6569..5f6c5676cc45 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -872,3 +872,16 @@ int bpf_enable_stats(enum bpf_stats_type type)
 
 	return sys_bpf(BPF_ENABLE_STATS, &attr, sizeof(attr));
 }
+
+int bpf_prog_bind_map(int prog_fd, int map_fd,
+		      const struct bpf_prog_bind_opts *opts)
+{
+	union bpf_attr attr;
+
+	memset(&attr, 0, sizeof(attr));
+	attr.prog_bind_map.prog_fd = prog_fd;
+	attr.prog_bind_map.map_fd = map_fd;
+	attr.prog_bind_map.flags = OPTS_GET(opts, flags, 0);
+
+	return sys_bpf(BPF_PROG_BIND_MAP, &attr, sizeof(attr));
+}
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 015d13f25fcc..8c1ac4b42f90 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -243,6 +243,14 @@ LIBBPF_API int bpf_task_fd_query(int pid, int fd, __u32 flags, char *buf,
 enum bpf_stats_type; /* defined in up-to-date linux/bpf.h */
 LIBBPF_API int bpf_enable_stats(enum bpf_stats_type type);
 
+struct bpf_prog_bind_opts {
+	size_t sz; /* size of this struct for forward/backward compatibility */
+	__u32 flags;
+};
+#define bpf_prog_bind_opts__last_field flags
+
+LIBBPF_API int bpf_prog_bind_map(int prog_fd, int map_fd,
+				 const struct bpf_prog_bind_opts *opts);
+
 #ifdef __cplusplus
 } /* extern "C" */
 #endif
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 8cdb2528482e..2b21021b66bb 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -176,6 +176,8 @@ enum kern_feature_id {
 	FEAT_EXP_ATTACH_TYPE,
 	/* bpf_probe_read_{kernel,user}[_str] helpers */
 	FEAT_PROBE_READ_KERN,
+	/* BPF_PROG_BIND_MAP is supported */
+	FEAT_PROG_BIND_MAP,
 	__FEAT_CNT,
 };
 
@@ -285,6 +287,7 @@ struct bpf_struct_ops {
 #define KCONFIG_SEC ".kconfig"
 #define KSYMS_SEC ".ksyms"
 #define STRUCT_OPS_SEC ".struct_ops"
+#define METADATA_SEC ".metadata"
 
 enum libbpf_map_type {
 	LIBBPF_MAP_UNSPEC,
@@ -292,6 +295,7 @@ enum libbpf_map_type {
 	LIBBPF_MAP_BSS,
 	LIBBPF_MAP_RODATA,
 	LIBBPF_MAP_KCONFIG,
+	LIBBPF_MAP_METADATA,
 };
 
 static const char * const libbpf_type_to_btf_name[] = {
@@ -299,6 +303,7 @@ static const char * const libbpf_type_to_btf_name[] = {
 	[LIBBPF_MAP_BSS] = BSS_SEC,
 	[LIBBPF_MAP_RODATA] = RODATA_SEC,
 	[LIBBPF_MAP_KCONFIG] = KCONFIG_SEC,
+	[LIBBPF_MAP_METADATA] = METADATA_SEC,
 };
 
 struct bpf_map {
@@ -381,6 +386,7 @@ struct bpf_object {
 	struct extern_desc *externs;
 	int nr_extern;
 	int kconfig_map_idx;
+	int metadata_map_idx;
 
 	bool loaded;
 	bool has_pseudo_calls;
@@ -400,6 +406,7 @@ struct bpf_object {
 		Elf_Data *rodata;
 		Elf_Data *bss;
 		Elf_Data *st_ops_data;
+		Elf_Data *metadata;
 		size_t shstrndx; /* section index for section name strings */
 		size_t strtabidx;
 		struct {
@@ -416,6 +423,7 @@ struct bpf_object {
 		int rodata_shndx;
 		int bss_shndx;
 		int st_ops_shndx;
+		int metadata_shndx;
 	} efile;
 	/*
 	 * All loaded bpf_object is linked in a list, which is
@@ -1027,11 +1035,13 @@ static struct bpf_object *bpf_object__new(const char *path,
 	obj->efile.obj_buf_sz = obj_buf_sz;
 	obj->efile.maps_shndx = -1;
 	obj->efile.btf_maps_shndx = -1;
+	obj->efile.metadata_shndx = -1;
 	obj->efile.data_shndx = -1;
 	obj->efile.rodata_shndx = -1;
 	obj->efile.bss_shndx = -1;
 	obj->efile.st_ops_shndx = -1;
 	obj->kconfig_map_idx = -1;
+	obj->metadata_map_idx = -1;
 
 	obj->kern_version = get_kernel_version();
 	obj->loaded = false;
@@ -1343,7 +1353,8 @@ bpf_object__init_internal_map(struct bpf_object *obj, enum libbpf_map_type type,
 	def->key_size = sizeof(int);
 	def->value_size = data_sz;
 	def->max_entries = 1;
-	def->map_flags = type == LIBBPF_MAP_RODATA || type == LIBBPF_MAP_KCONFIG
+	def->map_flags = type == LIBBPF_MAP_RODATA || type == LIBBPF_MAP_KCONFIG ||
+			 type == LIBBPF_MAP_METADATA
 			 ? BPF_F_RDONLY_PROG : 0;
 	def->map_flags |= BPF_F_MMAPABLE;
@@ -1399,6 +1410,16 @@ static int bpf_object__init_global_data_maps(struct bpf_object *obj)
 		if (err)
 			return err;
 	}
+	if (obj->efile.metadata_shndx >= 0) {
+		err = bpf_object__init_internal_map(obj, LIBBPF_MAP_METADATA,
+						    obj->efile.metadata_shndx,
+						    obj->efile.metadata->d_buf,
+						    obj->efile.metadata->d_size);
+		if (err)
+			return err;
+
+		obj->metadata_map_idx = obj->nr_maps - 1;
+	}
 
 	return 0;
 }
@@ -2790,6 +2811,9 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
 			} else if (strcmp(name, STRUCT_OPS_SEC) == 0) {
 				obj->efile.st_ops_data = data;
 				obj->efile.st_ops_shndx = idx;
+			} else if (strcmp(name, METADATA_SEC) == 0) {
+				obj->efile.metadata = data;
+				obj->efile.metadata_shndx = idx;
 			} else {
 				pr_info("elf: skipping unrecognized data section(%d) %s\n",
 					idx, name);
@@ -3201,7 +3225,8 @@ static bool bpf_object__shndx_is_data(const struct bpf_object *obj,
 {
 	return shndx == obj->efile.data_shndx ||
 	       shndx == obj->efile.bss_shndx ||
-	       shndx == obj->efile.rodata_shndx;
+	       shndx == obj->efile.rodata_shndx ||
+	       shndx == obj->efile.metadata_shndx;
 }
 
 static bool bpf_object__shndx_is_maps(const struct bpf_object *obj,
@@ -3222,6 +3247,8 @@ bpf_object__section_to_libbpf_map_type(const struct bpf_object *obj, int shndx)
 		return LIBBPF_MAP_RODATA;
 	else if (shndx == obj->efile.symbols_shndx)
 		return LIBBPF_MAP_KCONFIG;
+	else if (shndx == obj->efile.metadata_shndx)
+		return LIBBPF_MAP_METADATA;
 	else
 		return LIBBPF_MAP_UNSPEC;
 }
@@ -3592,18 +3619,13 @@ static int probe_kern_prog_name(void)
 	return probe_fd(ret);
 }
 
-static int probe_kern_global_data(void)
+static void __probe_create_global_data(int *prog, int *map,
+				       struct bpf_insn *insns, size_t insns_cnt)
 {
 	struct bpf_load_program_attr prg_attr;
 	struct bpf_create_map_attr map_attr;
 	char *cp, errmsg[STRERR_BUFSIZE];
-	struct bpf_insn insns[] = {
-		BPF_LD_MAP_VALUE(BPF_REG_1, 0, 16),
-		BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 42),
-		BPF_MOV64_IMM(BPF_REG_0, 0),
-		BPF_EXIT_INSN(),
-	};
-	int ret, map;
+	int err;
 
 	memset(&map_attr, 0, sizeof(map_attr));
 	map_attr.map_type = BPF_MAP_TYPE_ARRAY;
@@ -3611,26 +3633,40 @@ static int probe_kern_global_data(void)
 	map_attr.value_size = 32;
 	map_attr.max_entries = 1;
 
-	map = bpf_create_map_xattr(&map_attr);
-	if (map < 0) {
-		ret = -errno;
-		cp = libbpf_strerror_r(ret, errmsg, sizeof(errmsg));
+	*map = bpf_create_map_xattr(&map_attr);
+	if (*map < 0) {
+		err = errno;
+		cp = libbpf_strerror_r(err, errmsg, sizeof(errmsg));
 		pr_warn("Error in %s():%s(%d). Couldn't create simple array map.\n",
-			__func__, cp, -ret);
-		return ret;
+			__func__, cp, -err);
+		return;
 	}
 
-	insns[0].imm = map;
+	insns[0].imm = *map;
 
 	memset(&prg_attr, 0, sizeof(prg_attr));
 	prg_attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
 	prg_attr.insns = insns;
-	prg_attr.insns_cnt = ARRAY_SIZE(insns);
+	prg_attr.insns_cnt = insns_cnt;
 	prg_attr.license = "GPL";
 
-	ret = bpf_load_program_xattr(&prg_attr, NULL, 0);
+	*prog = bpf_load_program_xattr(&prg_attr, NULL, 0);
+}
+
+static int probe_kern_global_data(void)
+{
+	struct bpf_insn insns[] = {
+		BPF_LD_MAP_VALUE(BPF_REG_1, 0, 16),
+		BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 42),
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	};
+	int prog = -1, map = -1;
+
+	__probe_create_global_data(&prog, &map, insns, ARRAY_SIZE(insns));
+	close(map);
 
-	return probe_fd(ret);
+	return probe_fd(prog);
 }
 
 static int probe_kern_btf(void)
@@ -3757,6 +3793,32 @@ static int probe_kern_probe_read_kernel(void)
 	return probe_fd(bpf_load_program_xattr(&attr, NULL, 0));
 }
 
+static int probe_prog_bind_map(void)
+{
+	struct bpf_insn insns[] = {
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	};
+	int prog = -1, map = -1, ret = 0;
+
+	if (!kernel_supports(FEAT_GLOBAL_DATA))
+		return 0;
+
+	__probe_create_global_data(&prog, &map, insns, ARRAY_SIZE(insns));
+
+	if (map >= 0 && prog < 0) {
+		close(map);
+		return 0;
+	}
+
+	if (!bpf_prog_bind_map(prog, map, NULL))
+		ret = 1;
+
+	close(map);
+	close(prog);
+	return ret;
+}
+
 enum kern_feature_result {
 	FEAT_UNKNOWN = 0,
 	FEAT_SUPPORTED = 1,
@@ -3797,6 +3859,9 @@ static struct kern_feature_desc {
 	},
 	[FEAT_PROBE_READ_KERN] = {
 		"bpf_probe_read_kernel() helper", probe_kern_probe_read_kernel,
+	},
+	[FEAT_PROG_BIND_MAP] = {
+		"BPF_PROG_BIND_MAP support", probe_prog_bind_map,
 	}
 };
 
@@ -3897,7 +3962,8 @@ bpf_object__populate_internal_map(struct bpf_object *obj, struct bpf_map *map)
 	}
 
 	/* Freeze .rodata and .kconfig map as read-only from syscall side. */
-	if (map_type == LIBBPF_MAP_RODATA || map_type == LIBBPF_MAP_KCONFIG) {
+	if (map_type == LIBBPF_MAP_RODATA || map_type == LIBBPF_MAP_KCONFIG ||
+	    map_type == LIBBPF_MAP_METADATA) {
 		err = bpf_map_freeze(map->fd);
 		if (err) {
 			err = -errno;
@@ -6057,6 +6123,28 @@ load_program(struct bpf_program *prog, struct bpf_insn *insns, int insns_cnt,
 	if (ret >= 0) {
 		if (log_buf && load_attr.log_level)
 			pr_debug("verifier log:\n%s", log_buf);
+
+		if (prog->obj->metadata_map_idx >= 0 &&
+		    kernel_supports(FEAT_PROG_BIND_MAP)) {
+			struct bpf_map *metadata_map =
+				&prog->obj->maps[prog->obj->metadata_map_idx];
+
+			/* EEXIST to bpf_prog_bind_map means the map is already
+			 * bound to the program. This can happen if the program
+			 * refers to the map in its code. Since all we are doing
+			 * is to make sure when we iterate the program's maps
+			 * metadata map is always inside, EEXIST is okay; we
+			 * ignore this errno
+			 */
+			if (bpf_prog_bind_map(ret, bpf_map__fd(metadata_map), NULL) &&
+			    errno != EEXIST) {
+				cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
+				pr_warn("prog '%s': failed to bind .metadata map: %s\n",
					prog->name, cp);
+				/* Don't fail hard if can't load metadata. */
+			}
+		}
+
 		*pfd = ret;
 		ret = 0;
 		goto out;
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 66a6286d0716..529b99c0c2c3 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -302,6 +302,7 @@ LIBBPF_0.1.0 {
 
 LIBBPF_0.2.0 {
 	global:
+		bpf_prog_bind_map;
 		perf_buffer__buffer_cnt;
 		perf_buffer__buffer_fd;
 		perf_buffer__epoll_fd;