Message ID | 20200612223150.1177182-3-andriin@fb.com |
---|---|
State | RFC |
Delegated to: | BPF Maintainers |
Headers | show |
Series | libbpf ksym support and bpftool show PIDs | expand |
On Mon, Jun 15, 2020 at 9:44 AM Hao Luo <haoluo@google.com> wrote: > > Thanks, Andrii, > > This change looks nice! A couple of comments: > > 1. A 'void' type variable looks slightly odd from a user's perspective. How about using 'u64' or 'void *'? Or at least, a named type, which aliases to 'void'? That choice is very deliberate one. `extern const void` is the right way in C language to access linker-generated symbols, for instance, which is quite similar to what the intent is her. Having void type is very explicit that you don't know/care about that value pointed to by extern address, the only operation you can perform is to get it's address. Once we add kernel variables support, that's when types will start to be specified and libbpf will do extra checks (type matching) and extra work (generating ldimm64 with BTF ID, for instance), to allow C code to access data pointed to by extern address. Switching type to u64 would be misleading in allowing C code to implicitly dereference value of extern. E.g., there is a big difference between: extern u64 bla; printf("%lld\n", bla); /* de-reference happens here, we get contents of memory pointed to by "bla" symbol */ printf("%p\n", &bla); /* here we get value of linker symbol/address of extern variable */ Currently I explicitly support only the latter and want to prevent the former, until we have kernel variables in BTF. Using `extern void` makes compiler enforce that only the &bla form is allowed. Everything else is compilation error. > 2. About the type size of ksym, IIUC, it looks strange that the values read from kallsyms have 8 bytes but their corresponding vs->size is 4 bytes and vs->type points to 4-byte int. Can we make them of the same size? That's a bit of a hack on my part. Variable needs to point to some type, which size will match the size of datasec's varinfo entry. This is checked and enforced by kernel. I'm looking for 4-byte int, because it's almost guaranteed that it will be present in program's BTF and I won't have to explicitly add it (it's because all BPF programs return int, so it must be in program's BTF already). While 8-byte long is less likely to be there. In the future, if we have a nicer way to extend BTF (and we will soon), we can do this a bit better, but either way that .ksyms DATASEC type isn't used for anything (there is no map with that DATASEC as a value type), so it doesn't matter. > > Hao > > On Fri, Jun 12, 2020 at 3:35 PM Andrii Nakryiko <andriin@fb.com> wrote: >> >> Add support for another (in addition to existing Kconfig) special kind of >> externs in BPF code, kernel symbol externs. Such externs allow BPF code to >> "know" kernel symbol address and either use it for comparisons with kernel >> data structures (e.g., struct file's f_op pointer, to distinguish different >> kinds of file), or, with the help of bpf_probe_user_kernel(), to follow >> pointers and read data from global variables. Kernel symbol addresses are >> found through /proc/kallsyms, which should be present in the system. >> >> Currently, such kernel symbol variables are typeless: they have to be defined >> as `extern const void <symbol>` and the only operation you can do (in C code) >> with them is to take its address. Such extern should reside in a special >> section '.ksyms'. bpf_helpers.h header provides __ksym macro for this. Strong >> vs weak semantics stays the same as with Kconfig externs. If symbol is not >> found in /proc/kallsyms, this will be a failure for strong (non-weak) extern, >> but will be defaulted to 0 for weak externs. >> >> If the same symbol is defined multiple times in /proc/kallsyms, then it will >> be error if any of the associated addresses differs. In that case, address is >> ambiguous, so libbpf falls on the side of caution, rather than confusing user >> with randomly chosen address. >> >> In the future, once kernel is extended with variables BTF information, such >> ksym externs will be supported in a typed version, which will allow BPF >> program to read variable's contents directly, similarly to how it's done for >> fentry/fexit input arguments. >> >> Signed-off-by: Andrii Nakryiko <andriin@fb.com> >> --- >> tools/lib/bpf/bpf_helpers.h | 1 + >> tools/lib/bpf/btf.h | 5 ++ >> tools/lib/bpf/libbpf.c | 138 ++++++++++++++++++++++++++++++++++-- >> 3 files changed, 139 insertions(+), 5 deletions(-) >> [...] >> >> enum extern_type { >> EXT_UNKNOWN, >> + EXT_KSYM, >> >> EXT_KCFG, >> }; > > > Minor, let EXT_KSYM come after EXT_KCFG. I wanted ksym externs to go before KCFG ones, but not sure why. I'll double check, I don't think it should matter. > >> >> [...] >> +static int bpf_object__read_kallsyms_file(struct bpf_object *obj) >> +{ >> + char sym_type, sym_name[256]; >> + unsigned long sym_addr; >> + struct extern_desc *ext; >> + int ret, err = 0; >> + FILE *f; >> + >> + f = fopen("/proc/kallsyms", "r"); >> + if (!f) { >> + err = -errno; >> + pr_warn("failed to open /proc/kallsyms: %d\n", err); >> + return err; >> + } >> + >> + while (true) { >> + ret = fscanf(f, "%lx %c %s%*[^\n]\n", >> + &sym_addr, &sym_type, sym_name); > > > Maybe better follow the existing pattern in kernel (scripts/kallsyms.c https://github.com/torvalds/linux/blob/master/scripts/kallsyms.c#L177) oh, didn't know about this "%499s" trick, will change. > >> >> + if (ret == EOF && feof(f)) >> + break; >> + if (ret != 3) { >> + err = -EINVAL; >> + goto out; >> + } >> + [...]
On Mon, Jun 15, 2020 at 12:08 PM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote: > > On Mon, Jun 15, 2020 at 9:44 AM Hao Luo <haoluo@google.com> wrote: > > > > Thanks, Andrii, > > > > This change looks nice! A couple of comments: > > > > 1. A 'void' type variable looks slightly odd from a user's perspective. How about using 'u64' or 'void *'? Or at least, a named type, which aliases to 'void'? > > That choice is very deliberate one. `extern const void` is the right > way in C language to access linker-generated symbols, for instance, > which is quite similar to what the intent is her. Having void type is > very explicit that you don't know/care about that value pointed to by > extern address, the only operation you can perform is to get it's > address. > > Once we add kernel variables support, that's when types will start to > be specified and libbpf will do extra checks (type matching) and extra > work (generating ldimm64 with BTF ID, for instance), to allow C code > to access data pointed to by extern address. > > Switching type to u64 would be misleading in allowing C code to > implicitly dereference value of extern. E.g., there is a big > difference between: > > extern u64 bla; > > printf("%lld\n", bla); /* de-reference happens here, we get contents > of memory pointed to by "bla" symbol */ > > printf("%p\n", &bla); /* here we get value of linker symbol/address of > extern variable */ > > Currently I explicitly support only the latter and want to prevent the > former, until we have kernel variables in BTF. Using `extern void` > makes compiler enforce that only the &bla form is allowed. Everything > else is compilation error. > Ah, I see. I've been taking the extern variable as an actual variable that contains the symbol's address, which is the first case. Your approach makes sense. Thanks for explaining. > > 2. About the type size of ksym, IIUC, it looks strange that the values read from kallsyms have 8 bytes but their corresponding vs->size is 4 bytes and vs->type points to 4-byte int. Can we make them of the same size? > > That's a bit of a hack on my part. Variable needs to point to some > type, which size will match the size of datasec's varinfo entry. This > is checked and enforced by kernel. I'm looking for 4-byte int, because > it's almost guaranteed that it will be present in program's BTF and I > won't have to explicitly add it (it's because all BPF programs return > int, so it must be in program's BTF already). While 8-byte long is > less likely to be there. > > In the future, if we have a nicer way to extend BTF (and we will > soon), we can do this a bit better, but either way that .ksyms DATASEC > type isn't used for anything (there is no map with that DATASEC as a > value type), so it doesn't matter. > Thanks for explaining, Andrii. These explanations as comments in the code would be quite helpful, IMHO.
Andrii, Do you think we need to put the kernel's variables in one single DATASEC in vmlinux BTF? It looks like all the ksyms in the program will be under one ".ksyms" section, so we are not able to tell whether a ksym is from a percpu section or a .rodata section. Without this information, if the vmlinux has multiple DATASECs, the loader may need to traverse all of them. If vmlinux BTF has only one DATASEC, it matches the object's BTF better. Right now, the percpu vars are in a ".data..percpu" DATASEC in my patch and the plan seems that we will introduce more DATASECs to hold other data. Please let me know your insights here. Thanks. Hao On Tue, Jun 16, 2020 at 1:05 AM Hao Luo <haoluo@google.com> wrote: > > On Mon, Jun 15, 2020 at 12:08 PM Andrii Nakryiko > <andrii.nakryiko@gmail.com> wrote: > > > > On Mon, Jun 15, 2020 at 9:44 AM Hao Luo <haoluo@google.com> wrote: > > > > > > Thanks, Andrii, > > > > > > This change looks nice! A couple of comments: > > > > > > 1. A 'void' type variable looks slightly odd from a user's perspective. How about using 'u64' or 'void *'? Or at least, a named type, which aliases to 'void'? > > > > That choice is very deliberate one. `extern const void` is the right > > way in C language to access linker-generated symbols, for instance, > > which is quite similar to what the intent is her. Having void type is > > very explicit that you don't know/care about that value pointed to by > > extern address, the only operation you can perform is to get it's > > address. > > > > Once we add kernel variables support, that's when types will start to > > be specified and libbpf will do extra checks (type matching) and extra > > work (generating ldimm64 with BTF ID, for instance), to allow C code > > to access data pointed to by extern address. > > > > Switching type to u64 would be misleading in allowing C code to > > implicitly dereference value of extern. E.g., there is a big > > difference between: > > > > extern u64 bla; > > > > printf("%lld\n", bla); /* de-reference happens here, we get contents > > of memory pointed to by "bla" symbol */ > > > > printf("%p\n", &bla); /* here we get value of linker symbol/address of > > extern variable */ > > > > Currently I explicitly support only the latter and want to prevent the > > former, until we have kernel variables in BTF. Using `extern void` > > makes compiler enforce that only the &bla form is allowed. Everything > > else is compilation error. > > > > Ah, I see. I've been taking the extern variable as an actual variable > that contains the symbol's address, which is the first case. Your > approach makes sense. Thanks for explaining. > > > > 2. About the type size of ksym, IIUC, it looks strange that the values read from kallsyms have 8 bytes but their corresponding vs->size is 4 bytes and vs->type points to 4-byte int. Can we make them of the same size? > > > > That's a bit of a hack on my part. Variable needs to point to some > > type, which size will match the size of datasec's varinfo entry. This > > is checked and enforced by kernel. I'm looking for 4-byte int, because > > it's almost guaranteed that it will be present in program's BTF and I > > won't have to explicitly add it (it's because all BPF programs return > > int, so it must be in program's BTF already). While 8-byte long is > > less likely to be there. > > > > In the future, if we have a nicer way to extend BTF (and we will > > soon), we can do this a bit better, but either way that .ksyms DATASEC > > type isn't used for anything (there is no map with that DATASEC as a > > value type), so it doesn't matter. > > > > Thanks for explaining, Andrii. > > These explanations as comments in the code would be quite helpful, IMHO.
On Tue, Jun 16, 2020 at 6:24 PM Hao Luo <haoluo@google.com> wrote: > > Andrii, > > Do you think we need to put the kernel's variables in one single > DATASEC in vmlinux BTF? It looks like all the ksyms in the program > will be under one ".ksyms" section, so we are not able to tell whether > a ksym is from a percpu section or a .rodata section. Without this > information, if the vmlinux has multiple DATASECs, the loader may need > to traverse all of them. If vmlinux BTF has only one DATASEC, it > matches the object's BTF better. > > Right now, the percpu vars are in a ".data..percpu" DATASEC in my > patch and the plan seems that we will introduce more DATASECs to hold > other data. > > Please let me know your insights here. Thanks. I think we should keep original DATASECs in vmlinux's BTF, so that they match ELF sections. Otherwise BTF is going to lie and will cause confusion down the road in the longer term. On the BPF program side, though, I think we'll limit it to just two special sections: .ksyms and .ksyms.percpu. libbpf will have to traverse all vmlinux DATASECs to find corresponding variables, but that's ok, it has to do the linear scan either way. > > Hao > > On Tue, Jun 16, 2020 at 1:05 AM Hao Luo <haoluo@google.com> wrote: > > > > On Mon, Jun 15, 2020 at 12:08 PM Andrii Nakryiko > > <andrii.nakryiko@gmail.com> wrote: > > > > > > On Mon, Jun 15, 2020 at 9:44 AM Hao Luo <haoluo@google.com> wrote: > > > > > > > > Thanks, Andrii, > > > > > > > > This change looks nice! A couple of comments: > > > > > > > > 1. A 'void' type variable looks slightly odd from a user's perspective. How about using 'u64' or 'void *'? Or at least, a named type, which aliases to 'void'? > > > > > > That choice is very deliberate one. `extern const void` is the right > > > way in C language to access linker-generated symbols, for instance, > > > which is quite similar to what the intent is her. Having void type is > > > very explicit that you don't know/care about that value pointed to by > > > extern address, the only operation you can perform is to get it's > > > address. > > > > > > Once we add kernel variables support, that's when types will start to > > > be specified and libbpf will do extra checks (type matching) and extra > > > work (generating ldimm64 with BTF ID, for instance), to allow C code > > > to access data pointed to by extern address. > > > > > > Switching type to u64 would be misleading in allowing C code to > > > implicitly dereference value of extern. E.g., there is a big > > > difference between: > > > > > > extern u64 bla; > > > > > > printf("%lld\n", bla); /* de-reference happens here, we get contents > > > of memory pointed to by "bla" symbol */ > > > > > > printf("%p\n", &bla); /* here we get value of linker symbol/address of > > > extern variable */ > > > > > > Currently I explicitly support only the latter and want to prevent the > > > former, until we have kernel variables in BTF. Using `extern void` > > > makes compiler enforce that only the &bla form is allowed. Everything > > > else is compilation error. > > > > > > > Ah, I see. I've been taking the extern variable as an actual variable > > that contains the symbol's address, which is the first case. Your > > approach makes sense. Thanks for explaining. > > > > > > 2. About the type size of ksym, IIUC, it looks strange that the values read from kallsyms have 8 bytes but their corresponding vs->size is 4 bytes and vs->type points to 4-byte int. Can we make them of the same size? > > > > > > That's a bit of a hack on my part. Variable needs to point to some > > > type, which size will match the size of datasec's varinfo entry. This > > > is checked and enforced by kernel. I'm looking for 4-byte int, because > > > it's almost guaranteed that it will be present in program's BTF and I > > > won't have to explicitly add it (it's because all BPF programs return > > > int, so it must be in program's BTF already). While 8-byte long is > > > less likely to be there. > > > > > > In the future, if we have a nicer way to extend BTF (and we will > > > soon), we can do this a bit better, but either way that .ksyms DATASEC > > > type isn't used for anything (there is no map with that DATASEC as a > > > value type), so it doesn't matter. > > > > > > > Thanks for explaining, Andrii. > > > > These explanations as comments in the code would be quite helpful, IMHO.
Sounds good. Thanks. On Tue, Jun 16, 2020 at 6:37 PM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote: > > On Tue, Jun 16, 2020 at 6:24 PM Hao Luo <haoluo@google.com> wrote: > > > > Andrii, > > > > Do you think we need to put the kernel's variables in one single > > DATASEC in vmlinux BTF? It looks like all the ksyms in the program > > will be under one ".ksyms" section, so we are not able to tell whether > > a ksym is from a percpu section or a .rodata section. Without this > > information, if the vmlinux has multiple DATASECs, the loader may need > > to traverse all of them. If vmlinux BTF has only one DATASEC, it > > matches the object's BTF better. > > > > Right now, the percpu vars are in a ".data..percpu" DATASEC in my > > patch and the plan seems that we will introduce more DATASECs to hold > > other data. > > > > Please let me know your insights here. Thanks. > > I think we should keep original DATASECs in vmlinux's BTF, so that > they match ELF sections. Otherwise BTF is going to lie and will cause > confusion down the road in the longer term. > > On the BPF program side, though, I think we'll limit it to just two > special sections: .ksyms and .ksyms.percpu. libbpf will have to > traverse all vmlinux DATASECs to find corresponding variables, but > that's ok, it has to do the linear scan either way. > > > > > Hao > > > > On Tue, Jun 16, 2020 at 1:05 AM Hao Luo <haoluo@google.com> wrote: > > > > > > On Mon, Jun 15, 2020 at 12:08 PM Andrii Nakryiko > > > <andrii.nakryiko@gmail.com> wrote: > > > > > > > > On Mon, Jun 15, 2020 at 9:44 AM Hao Luo <haoluo@google.com> wrote: > > > > > > > > > > Thanks, Andrii, > > > > > > > > > > This change looks nice! A couple of comments: > > > > > > > > > > 1. A 'void' type variable looks slightly odd from a user's perspective. How about using 'u64' or 'void *'? Or at least, a named type, which aliases to 'void'? > > > > > > > > That choice is very deliberate one. `extern const void` is the right > > > > way in C language to access linker-generated symbols, for instance, > > > > which is quite similar to what the intent is her. Having void type is > > > > very explicit that you don't know/care about that value pointed to by > > > > extern address, the only operation you can perform is to get it's > > > > address. > > > > > > > > Once we add kernel variables support, that's when types will start to > > > > be specified and libbpf will do extra checks (type matching) and extra > > > > work (generating ldimm64 with BTF ID, for instance), to allow C code > > > > to access data pointed to by extern address. > > > > > > > > Switching type to u64 would be misleading in allowing C code to > > > > implicitly dereference value of extern. E.g., there is a big > > > > difference between: > > > > > > > > extern u64 bla; > > > > > > > > printf("%lld\n", bla); /* de-reference happens here, we get contents > > > > of memory pointed to by "bla" symbol */ > > > > > > > > printf("%p\n", &bla); /* here we get value of linker symbol/address of > > > > extern variable */ > > > > > > > > Currently I explicitly support only the latter and want to prevent the > > > > former, until we have kernel variables in BTF. Using `extern void` > > > > makes compiler enforce that only the &bla form is allowed. Everything > > > > else is compilation error. > > > > > > > > > > Ah, I see. I've been taking the extern variable as an actual variable > > > that contains the symbol's address, which is the first case. Your > > > approach makes sense. Thanks for explaining. > > > > > > > > 2. About the type size of ksym, IIUC, it looks strange that the values read from kallsyms have 8 bytes but their corresponding vs->size is 4 bytes and vs->type points to 4-byte int. Can we make them of the same size? > > > > > > > > That's a bit of a hack on my part. Variable needs to point to some > > > > type, which size will match the size of datasec's varinfo entry. This > > > > is checked and enforced by kernel. I'm looking for 4-byte int, because > > > > it's almost guaranteed that it will be present in program's BTF and I > > > > won't have to explicitly add it (it's because all BPF programs return > > > > int, so it must be in program's BTF already). While 8-byte long is > > > > less likely to be there. > > > > > > > > In the future, if we have a nicer way to extend BTF (and we will > > > > soon), we can do this a bit better, but either way that .ksyms DATASEC > > > > type isn't used for anything (there is no map with that DATASEC as a > > > > value type), so it doesn't matter. > > > > > > > > > > Thanks for explaining, Andrii. > > > > > > These explanations as comments in the code would be quite helpful, IMHO.
diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h index f67dce2af802..a510d8ed716f 100644 --- a/tools/lib/bpf/bpf_helpers.h +++ b/tools/lib/bpf/bpf_helpers.h @@ -75,5 +75,6 @@ enum libbpf_tristate { }; #define __kconfig __attribute__((section(".kconfig"))) +#define __ksym __attribute__((section(".ksyms"))) #endif diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h index 70c1b7ec2bd0..06cd1731c154 100644 --- a/tools/lib/bpf/btf.h +++ b/tools/lib/bpf/btf.h @@ -168,6 +168,11 @@ static inline bool btf_kflag(const struct btf_type *t) return BTF_INFO_KFLAG(t->info); } +static inline bool btf_is_void(const struct btf_type *t) +{ + return btf_kind(t) == BTF_KIND_UNKN; +} + static inline bool btf_is_int(const struct btf_type *t) { return btf_kind(t) == BTF_KIND_INT; diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index b1a36c965fc0..451fb0b5615b 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -285,6 +285,7 @@ struct bpf_struct_ops { #define BSS_SEC ".bss" #define RODATA_SEC ".rodata" #define KCONFIG_SEC ".kconfig" +#define KSYMS_SEC ".ksyms" #define STRUCT_OPS_SEC ".struct_ops" enum libbpf_map_type { @@ -329,6 +330,7 @@ struct bpf_map { enum extern_type { EXT_UNKNOWN, + EXT_KSYM, EXT_KCFG, }; @@ -357,6 +359,9 @@ struct extern_desc { int data_off; bool is_signed; } kcfg; + struct { + unsigned long addr; + } ksym; }; }; @@ -2812,9 +2817,25 @@ static int cmp_externs(const void *_a, const void *_b) return strcmp(a->name, b->name); } +static int find_int_btf_id(const struct btf *btf) +{ + const struct btf_type *t; + int i, n; + + n = btf__get_nr_types(btf); + for (i = 1; i <= n; i++) { + t = btf__type_by_id(btf, i); + + if (btf_is_int(t) && btf_int_bits(t) == 32) + return i; + } + + return 0; +} + static int bpf_object__collect_externs(struct bpf_object *obj) { - struct btf_type *sec, *kcfg_sec = NULL; + struct btf_type *sec, *kcfg_sec = NULL, *ksym_sec = NULL; const struct btf_type *t; struct extern_desc *ext; int i, n, off; @@ -2895,6 +2916,17 @@ static int bpf_object__collect_externs(struct bpf_object *obj) pr_warn("kconfig extern '%s' type is unsupported\n", ext_name); return -ENOTSUP; } + } else if (strcmp(sec_name, KSYMS_SEC) == 0) { + const struct btf_type *vt; + + ksym_sec = sec; + ext->type = EXT_KSYM; + + vt = skip_mods_and_typedefs(obj->btf, t->type, NULL); + if (!btf_is_void(vt)) { + pr_warn("ksym extern '%s' is not typeless (void)\n", ext_name); + return -ENOTSUP; + } } else { pr_warn("unrecognized extern section '%s'\n", sec_name); return -ENOTSUP; @@ -2908,6 +2940,43 @@ static int bpf_object__collect_externs(struct bpf_object *obj) /* sort externs by type, for kcfg ones also by (align, size, name) */ qsort(obj->externs, obj->nr_extern, sizeof(*ext), cmp_externs); + /* for .ksyms section, we need to turn all externs into allocated + * variables in BTF to pass kernel verification; we do this by + * pretending that each extern is a 8-byte variable + */ + if (ksym_sec) { + /* find existing 4-byte integer type in BTF to use for fake + * extern variables in DATASEC + */ + int int_btf_id = find_int_btf_id(obj->btf); + + for (i = 0; i < n; i++) { + pr_debug("extern #%d (ksym): symbol %d, name %s\n", + i, ext->sym_idx, ext->name); + } + + sec = ksym_sec; + n = btf_vlen(sec); + for (i = 0, off = 0; i < n; i++, off += sizeof(int)) { + struct btf_var_secinfo *vs = btf_var_secinfos(sec) + i; + struct btf_type *vt; + + vt = (void *)btf__type_by_id(obj->btf, vs->type); + ext_name = btf__name_by_offset(obj->btf, vt->name_off); + ext = find_extern_by_name(obj, ext_name); + if (!ext) { + pr_warn("failed to find extern definition for BTF var '%s'\n", + ext_name); + return -ESRCH; + } + btf_var(vt)->linkage = BTF_VAR_GLOBAL_ALLOCATED; + vt->type = int_btf_id; + vs->offset = off; + vs->size = sizeof(int); + } + sec->size = off; + } + if (kcfg_sec) { sec = kcfg_sec; /* for kcfg externs calculate their offsets within a .kconfig map */ @@ -5009,9 +5078,14 @@ bpf_program__relocate(struct bpf_program *prog, struct bpf_object *obj) break; case RELO_EXTERN: ext = &obj->externs[relo->sym_off]; - insn[0].src_reg = BPF_PSEUDO_MAP_VALUE; - insn[0].imm = obj->maps[obj->kconfig_map_idx].fd; - insn[1].imm = ext->kcfg.data_off; + if (ext->type == EXT_KCFG) { + insn[0].src_reg = BPF_PSEUDO_MAP_VALUE; + insn[0].imm = obj->maps[obj->kconfig_map_idx].fd; + insn[1].imm = ext->kcfg.data_off; + } else /* EXT_KSYM */ { + insn[0].imm = (__u32)ext->ksym.addr; + insn[1].imm = ext->ksym.addr >> 32; + } break; case RELO_CALL: err = bpf_program__reloc_text(prog, obj, relo); @@ -5630,10 +5704,57 @@ static int bpf_object__sanitize_maps(struct bpf_object *obj) return 0; } +static int bpf_object__read_kallsyms_file(struct bpf_object *obj) +{ + char sym_type, sym_name[256]; + unsigned long sym_addr; + struct extern_desc *ext; + int ret, err = 0; + FILE *f; + + f = fopen("/proc/kallsyms", "r"); + if (!f) { + err = -errno; + pr_warn("failed to open /proc/kallsyms: %d\n", err); + return err; + } + + while (true) { + ret = fscanf(f, "%lx %c %s%*[^\n]\n", + &sym_addr, &sym_type, sym_name); + if (ret == EOF && feof(f)) + break; + if (ret != 3) { + err = -EINVAL; + goto out; + } + + ext = find_extern_by_name(obj, sym_name); + if (!ext || ext->type != EXT_KSYM) + continue; + + if (ext->is_set && ext->ksym.addr != sym_addr) { + pr_warn("ksym extern '%s' resolution is ambiguous: 0x%lx or 0x%lx\n", + sym_name, ext->ksym.addr, sym_addr); + err = -EINVAL; + goto out; + } + if (!ext->is_set) { + ext->is_set = true; + ext->ksym.addr = sym_addr; + pr_debug("ksym extern %s=0x%lx\n", sym_name, sym_addr); + } + } + +out: + fclose(f); + return err; +} + static int bpf_object__resolve_externs(struct bpf_object *obj, const char *extra_kconfig) { - bool need_config = false; + bool need_config = false, need_kallsyms = false; struct extern_desc *ext; void *kcfg_data; int err, i; @@ -5663,6 +5784,8 @@ static int bpf_object__resolve_externs(struct bpf_object *obj, } else if (ext->type == EXT_KCFG && strncmp(ext->name, "CONFIG_", 7) == 0) { need_config = true; + } else if (ext->type == EXT_KSYM) { + need_kallsyms = true; } else { pr_warn("unrecognized extern '%s'\n", ext->name); return -EINVAL; @@ -5686,6 +5809,11 @@ static int bpf_object__resolve_externs(struct bpf_object *obj, if (err) return -EINVAL; } + if (need_kallsyms) { + err = bpf_object__read_kallsyms_file(obj); + if (err) + return -EINVAL; + } for (i = 0; i < obj->nr_extern; i++) { ext = &obj->externs[i];
Add support for another (in addition to existing Kconfig) special kind of externs in BPF code, kernel symbol externs. Such externs allow BPF code to "know" kernel symbol address and either use it for comparisons with kernel data structures (e.g., struct file's f_op pointer, to distinguish different kinds of file), or, with the help of bpf_probe_user_kernel(), to follow pointers and read data from global variables. Kernel symbol addresses are found through /proc/kallsyms, which should be present in the system. Currently, such kernel symbol variables are typeless: they have to be defined as `extern const void <symbol>` and the only operation you can do (in C code) with them is to take its address. Such extern should reside in a special section '.ksyms'. bpf_helpers.h header provides __ksym macro for this. Strong vs weak semantics stays the same as with Kconfig externs. If symbol is not found in /proc/kallsyms, this will be a failure for strong (non-weak) extern, but will be defaulted to 0 for weak externs. If the same symbol is defined multiple times in /proc/kallsyms, then it will be error if any of the associated addresses differs. In that case, address is ambiguous, so libbpf falls on the side of caution, rather than confusing user with randomly chosen address. In the future, once kernel is extended with variables BTF information, such ksym externs will be supported in a typed version, which will allow BPF program to read variable's contents directly, similarly to how it's done for fentry/fexit input arguments. Signed-off-by: Andrii Nakryiko <andriin@fb.com> --- tools/lib/bpf/bpf_helpers.h | 1 + tools/lib/bpf/btf.h | 5 ++ tools/lib/bpf/libbpf.c | 138 ++++++++++++++++++++++++++++++++++-- 3 files changed, 139 insertions(+), 5 deletions(-)