[V11,0/4] BPF: New helper to obtain namespace data from current task

Message ID 20190924152005.4659-1-cneirabustos@gmail.com

Message

Carlos Antonio Neira Bustos Sept. 24, 2019, 3:20 p.m. UTC
Currently bpf_get_current_pid_tgid() is used to do pid filtering in bcc's
scripts, but this helper returns the pid as seen by the root namespace, which is
fine only when a bcc script is not executed inside a container.
When the process of interest is inside a container, pid filtering will not work
if bpf_get_current_pid_tgid() is used.
The new helper addresses this limitation by returning the pid as seen by the
namespace in which the script is executing.

In the future, different pid_ns files may belong to different devices, according to the
discussion between Eric Biederman and Yonghong at the 2017 Linux Plumbers Conference.
To address that situation, the helper requires the inum and dev_t from /proc/self/ns/pid.
This helper has the same use cases as bpf_get_current_pid_tgid(), as it can be
used to do pid filtering even inside a container.
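
As a hedged illustration of the inum/dev_t requirement, the sketch below shows
one way userspace could obtain both values with stat(2) on /proc/self/ns/pid
before handing them to a bpf program. The struct ns_id type and the
read_ns_id() name are made up for this example and are not part of the series.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical type: identity of a namespace file (device plus inode). */
struct ns_id {
    dev_t dev;  /* device the nsfs file lives on (st_dev) */
    ino_t ino;  /* inode number of the namespace (st_ino) */
};

/*
 * Hypothetical helper: fill *id from a namespace file such as
 * /proc/self/ns/pid. Returns 0 on success, -1 if stat() fails
 * (errno is left as set by stat()).
 */
int read_ns_id(const char *path, struct ns_id *id)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;

    id->dev = st.st_dev;
    id->ino = st.st_ino;
    return 0;
}
```

A caller would run read_ns_id("/proc/self/ns/pid", &id) in the namespace of
interest and then make id.dev and id.ino available to the bpf program; how
that is done depends on the loader.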

Signed-off-by: Carlos Neira <cneirabustos@gmail.com>

Carlos Neira (4):
  fs/nsfs.c: added ns_match
  bpf: added new helper bpf_get_ns_current_pid_tgid
  tools: Added bpf_get_ns_current_pid_tgid helper
  tools/testing/selftests/bpf: Add self-tests for new helper. self tests
    added for new helper

 fs/nsfs.c                                     |   8 +
 include/linux/bpf.h                           |   1 +
 include/linux/proc_ns.h                       |   2 +
 include/uapi/linux/bpf.h                      |  18 ++-
 kernel/bpf/core.c                             |   1 +
 kernel/bpf/helpers.c                          |  32 ++++
 kernel/trace/bpf_trace.c                      |   2 +
 tools/include/uapi/linux/bpf.h                |  18 ++-
 tools/testing/selftests/bpf/Makefile          |   2 +-
 tools/testing/selftests/bpf/bpf_helpers.h     |   3 +
 .../selftests/bpf/progs/test_pidns_kern.c     |  71 ++++++++
 tools/testing/selftests/bpf/test_pidns.c      | 152 ++++++++++++++++++
 12 files changed, 307 insertions(+), 3 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/test_pidns_kern.c
 create mode 100644 tools/testing/selftests/bpf/test_pidns.c

Comments

Daniel Borkmann Sept. 24, 2019, 6:01 p.m. UTC | #1
On Tue, Sep 24, 2019 at 12:20:01PM -0300, Carlos Neira wrote:
> Currently bpf_get_current_pid_tgid(), is used to do pid filtering in bcc's
> scripts but this helper returns the pid as seen by the root namespace which is
> fine when a bcc script is not executed inside a container.
> When the process of interest is inside a container, pid filtering will not work
> if bpf_get_current_pid_tgid() is used.
> This helper addresses this limitation returning the pid as it's seen by the current
> namespace where the script is executing.
> 
> In the future different pid_ns files may belong to different devices, according to the
> discussion between Eric Biederman and Yonghong in 2017 Linux plumbers conference.
> To address that situation the helper requires inum and dev_t from /proc/self/ns/pid.
> This helper has the same use cases as bpf_get_current_pid_tgid() as it can be
> used to do pid filtering even inside a container.
> 
> Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
> 
> Carlos Neira (4):
>   fs/nsfs.c: added ns_match
>   bpf: added new helper bpf_get_ns_current_pid_tgid
>   tools: Added bpf_get_ns_current_pid_tgid helper
>   tools/testing/selftests/bpf: Add self-tests for new helper. self tests
>     added for new helper

bpf-next is currently closed due to merge window. Please resubmit once back open, thanks.
Carlos Antonio Neira Bustos Sept. 24, 2019, 6:14 p.m. UTC | #2
On Tue, Sep 24, 2019 at 08:01:17PM +0200, Daniel Borkmann wrote:
> On Tue, Sep 24, 2019 at 12:20:01PM -0300, Carlos Neira wrote:
> > Currently bpf_get_current_pid_tgid(), is used to do pid filtering in bcc's
> > scripts but this helper returns the pid as seen by the root namespace which is
> > fine when a bcc script is not executed inside a container.
> > When the process of interest is inside a container, pid filtering will not work
> > if bpf_get_current_pid_tgid() is used.
> > This helper addresses this limitation returning the pid as it's seen by the current
> > namespace where the script is executing.
> > 
> > In the future different pid_ns files may belong to different devices, according to the
> > discussion between Eric Biederman and Yonghong in 2017 Linux plumbers conference.
> > To address that situation the helper requires inum and dev_t from /proc/self/ns/pid.
> > This helper has the same use cases as bpf_get_current_pid_tgid() as it can be
> > used to do pid filtering even inside a container.
> > 
> > Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
> > 
> > Carlos Neira (4):
> >   fs/nsfs.c: added ns_match
> >   bpf: added new helper bpf_get_ns_current_pid_tgid
> >   tools: Added bpf_get_ns_current_pid_tgid helper
> >   tools/testing/selftests/bpf: Add self-tests for new helper. self tests
> >     added for new helper
> 
> bpf-next is currently closed due to merge window. Please resubmit once back open, thanks.

Thanks, Daniel, I'll do so.

Bests.
Eric W. Biederman Sept. 26, 2019, 12:59 a.m. UTC | #3
Carlos Neira <cneirabustos@gmail.com> writes:

> Currently bpf_get_current_pid_tgid(), is used to do pid filtering in bcc's
> scripts but this helper returns the pid as seen by the root namespace which is
> fine when a bcc script is not executed inside a container.
> When the process of interest is inside a container, pid filtering will not work
> if bpf_get_current_pid_tgid() is used.
> This helper addresses this limitation returning the pid as it's seen by the current
> namespace where the script is executing.
>
> In the future different pid_ns files may belong to different devices, according to the
> discussion between Eric Biederman and Yonghong in 2017 Linux plumbers conference.
> To address that situation the helper requires inum and dev_t from /proc/self/ns/pid.
> This helper has the same use cases as bpf_get_current_pid_tgid() as it can be
> used to do pid filtering even inside a container.

I think I may have asked this before.  If I am repeating old ground
please excuse me.

Am I correct in understanding that these new helpers are designed to be used
when programs running in ``containers'', that is, inside pid namespaces,
register bpf programs for tracing?

If so would it be possible to change how the existing bpf opcodes
operate when they are used in the context of a pid namespace?

The latter would seem to allow just moving an existing application into
a pid namespace with no modifications.   If we can do this with trivial
cost at bpf compile time and with no userspace changes that would seem
a better approach.

If not can someone point me to why we can't do that?  What am I missing?

Eric

> Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
>
> Carlos Neira (4):
>   fs/nsfs.c: added ns_match
>   bpf: added new helper bpf_get_ns_current_pid_tgid
>   tools: Added bpf_get_ns_current_pid_tgid helper
>   tools/testing/selftests/bpf: Add self-tests for new helper. self tests
>     added for new helper
>
>  fs/nsfs.c                                     |   8 +
>  include/linux/bpf.h                           |   1 +
>  include/linux/proc_ns.h                       |   2 +
>  include/uapi/linux/bpf.h                      |  18 ++-
>  kernel/bpf/core.c                             |   1 +
>  kernel/bpf/helpers.c                          |  32 ++++
>  kernel/trace/bpf_trace.c                      |   2 +
>  tools/include/uapi/linux/bpf.h                |  18 ++-
>  tools/testing/selftests/bpf/Makefile          |   2 +-
>  tools/testing/selftests/bpf/bpf_helpers.h     |   3 +
>  .../selftests/bpf/progs/test_pidns_kern.c     |  71 ++++++++
>  tools/testing/selftests/bpf/test_pidns.c      | 152 ++++++++++++++++++
>  12 files changed, 307 insertions(+), 3 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/progs/test_pidns_kern.c
>  create mode 100644 tools/testing/selftests/bpf/test_pidns.c
Yonghong Song Sept. 26, 2019, 3:51 p.m. UTC | #4
On 9/25/19 5:59 PM, Eric W. Biederman wrote:
> Carlos Neira <cneirabustos@gmail.com> writes:
> 
>> Currently bpf_get_current_pid_tgid(), is used to do pid filtering in bcc's
>> scripts but this helper returns the pid as seen by the root namespace which is
>> fine when a bcc script is not executed inside a container.
>> When the process of interest is inside a container, pid filtering will not work
>> if bpf_get_current_pid_tgid() is used.
>> This helper addresses this limitation returning the pid as it's seen by the current
>> namespace where the script is executing.
>>
>> In the future different pid_ns files may belong to different devices, according to the
>> discussion between Eric Biederman and Yonghong in 2017 Linux plumbers conference.
>> To address that situation the helper requires inum and dev_t from /proc/self/ns/pid.
>> This helper has the same use cases as bpf_get_current_pid_tgid() as it can be
>> used to do pid filtering even inside a container.
> 
> I think I may have asked this before.  If I am repeating old ground
> please excuse me.
> 
> Am I correct in understanding that these new helpers are designed to be used
> when programs running in ``containers'', that is, inside pid namespaces,
> register bpf programs for tracing?

Right.

> 
> If so would it be possible to change how the existing bpf opcodes
> operate when they are used in the context of a pid namespace?


Today, a typical bpf program gets the pid like this:
    uint64_t pid_tgid = bpf_get_current_pid_tgid();
    pid_t pid = pid_tgid >> 32;
    pid_t tid = pid_tgid;

    /* possible filtering ... */
    if (pid == <user_provided pid>) ....
    ...

    /* record pid in some places */
    map_val->pid = pid;
    ...

bpf_get_current_pid_tgid() is a kernel helper:
    BPF_CALL_0(bpf_get_current_pid_tgid)
    {
         struct task_struct *task = current;

         if (unlikely(!task))
                 return -EINVAL;

         return (u64) task->tgid << 32 | task->pid;
    }

So bpf_get_current_pid_tgid() gets the tgid/pid outside any
pid namespace.
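
To make the return value layout concrete, here is a small userspace sketch of
the same packing (tgid in the upper 32 bits, pid in the lower 32 bits) and of
the unpacking a bpf program does on its side. The function names are
illustrative only and do not exist in the kernel.

```c
#include <stdint.h>

/* Pack the two ids the same way the helper above does. */
uint64_t pack_pid_tgid(uint32_t tgid, uint32_t pid)
{
    return ((uint64_t)tgid << 32) | pid;
}

/* Upper half: the tgid, which is what userspace calls the process id. */
uint32_t unpack_tgid(uint64_t pid_tgid)
{
    return (uint32_t)(pid_tgid >> 32);
}

/* Lower half: the kernel pid, which is what userspace calls the thread id. */
uint32_t unpack_pid(uint64_t pid_tgid)
{
    return (uint32_t)pid_tgid;
}
```

This matches the bpf program snippet above, where "pid" is taken from the
upper half and "tid" from the lower half.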

To make the program work inside the container, just getting the namespace
pid/tgid is not enough. You need to make sure the namespace you are
tracking is the one you are in. That is what the newly proposed
helper does.
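
A rough userspace model of that check, for illustration only: the namespaced
tgid/pid is handed back only when the (dev, ino) pair supplied by the caller
identifies the task's own pid namespace. All model_* names are hypothetical,
and the -EINVAL convention on a mismatch is an assumption of this sketch, not
something taken from the patches.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical stand-in for the identity of a task's pid namespace. */
struct model_pidns {
    dev_t dev;  /* device of the namespace file */
    ino_t ino;  /* inode number of the namespace */
};

/* Hypothetical stand-in for a task and its ids inside its own namespace. */
struct model_task {
    struct model_pidns ns;
    uint32_t ns_tgid;
    uint32_t ns_pid;
};

/* True only if both the device and the inode identify the same namespace. */
bool model_ns_match(const struct model_pidns *ns, dev_t dev, ino_t ino)
{
    return ns->dev == dev && ns->ino == ino;
}

/*
 * Store the packed namespaced tgid/pid in *out and return 0 when the caller's
 * (dev, ino) matches the task's namespace; return -EINVAL otherwise
 * (assumed error convention).
 */
int model_get_ns_pid_tgid(const struct model_task *t, dev_t dev, ino_t ino,
                          uint64_t *out)
{
    if (!model_ns_match(&t->ns, dev, ino))
        return -EINVAL;

    *out = ((uint64_t)t->ns_tgid << 32) | t->ns_pid;
    return 0;
}
```

With this shape, a tracing program attached in one container ignores events
from tasks in any other pid namespace, because their (dev, ino) pair does not
match.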

Do you suggest we change
    bpf_get_current_pid_tgid()
to return namespaced tgid/pid?
First, this will break the user API (a kernel helper is an API) and second,
even if we do get the pid/tgid, we are still not sure whether
it is for my namespace or not.

Do you have something in mind to address this issue?

> 
> The latter would seem to allow just moving an existing application into
> a pid namespace with no modifications.   If we can do this with trivial
> cost at bpf compile time and with no userspace changes that would seem
> a better approach.
> 
> If not can someone point me to why we can't do that?  What am I missing?
> 
> Eric
> 
>> Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
>>
>> Carlos Neira (4):
>>    fs/nsfs.c: added ns_match
>>    bpf: added new helper bpf_get_ns_current_pid_tgid
>>    tools: Added bpf_get_ns_current_pid_tgid helper
>>    tools/testing/selftests/bpf: Add self-tests for new helper. self tests
>>      added for new helper
>>
>>   fs/nsfs.c                                     |   8 +
>>   include/linux/bpf.h                           |   1 +
>>   include/linux/proc_ns.h                       |   2 +
>>   include/uapi/linux/bpf.h                      |  18 ++-
>>   kernel/bpf/core.c                             |   1 +
>>   kernel/bpf/helpers.c                          |  32 ++++
>>   kernel/trace/bpf_trace.c                      |   2 +
>>   tools/include/uapi/linux/bpf.h                |  18 ++-
>>   tools/testing/selftests/bpf/Makefile          |   2 +-
>>   tools/testing/selftests/bpf/bpf_helpers.h     |   3 +
>>   .../selftests/bpf/progs/test_pidns_kern.c     |  71 ++++++++
>>   tools/testing/selftests/bpf/test_pidns.c      | 152 ++++++++++++++++++
>>   12 files changed, 307 insertions(+), 3 deletions(-)
>>   create mode 100644 tools/testing/selftests/bpf/progs/test_pidns_kern.c
>>   create mode 100644 tools/testing/selftests/bpf/test_pidns.c
John Fastabend Sept. 26, 2019, 4:16 p.m. UTC | #5
Eric W. Biederman wrote:
> Carlos Neira <cneirabustos@gmail.com> writes:
> 
> > Currently bpf_get_current_pid_tgid(), is used to do pid filtering in bcc's
> > scripts but this helper returns the pid as seen by the root namespace which is
> > fine when a bcc script is not executed inside a container.
> > When the process of interest is inside a container, pid filtering will not work
> > if bpf_get_current_pid_tgid() is used.
> > This helper addresses this limitation returning the pid as it's seen by the current
> > namespace where the script is executing.
> >
> > In the future different pid_ns files may belong to different devices, according to the
> > discussion between Eric Biederman and Yonghong in 2017 Linux plumbers conference.
> > To address that situation the helper requires inum and dev_t from /proc/self/ns/pid.
> > This helper has the same use cases as bpf_get_current_pid_tgid() as it can be
> > used to do pid filtering even inside a container.
> 
> I think I may have asked this before.  If I am repeating old ground
> please excuse me.
> 
> Am I correct in understanding that these new helpers are designed to be used
> when programs running in ``containers'', that is, inside pid namespaces,
> register bpf programs for tracing?
> 
> If so would it be possible to change how the existing bpf opcodes
> operate when they are used in the context of a pid namespace?
> 
> The latter would seem to allow just moving an existing application into
> a pid namespace with no modifications.   If we can do this with trivial
> cost at bpf compile time and with no userspace changes that would seem
> a better approach.
> 
> If not can someone point me to why we can't do that?  What am I missing?

We have some management/observability bpf programs loaded from privileged
containers that end up getting triggered in multiple container contexts. Here
we want the root namespace pid; otherwise there would be collisions (same pid
in multiple containers) when it's used as a key and we would have difficulty
finding the pid from the root namespace.

I guess at load time if it's an unprivileged program we could convert it to
use the pid of the current namespace?

Or if the application is moved into an unprivileged container?

Our code is outside bcc so not sure exactly how the bcc case works. Just
wanted to point out we use the root namespace pid for various things
so I think it might need to be a bit smarter than just moving an
existing application into a pid namespace.

.John
Yonghong Song Sept. 26, 2019, 5:01 p.m. UTC | #6
On 9/26/19 9:16 AM, John Fastabend wrote:
> Eric W. Biederman wrote:
>> Carlos Neira <cneirabustos@gmail.com> writes:
>>
>>> Currently bpf_get_current_pid_tgid(), is used to do pid filtering in bcc's
>>> scripts but this helper returns the pid as seen by the root namespace which is
>>> fine when a bcc script is not executed inside a container.
>>> When the process of interest is inside a container, pid filtering will not work
>>> if bpf_get_current_pid_tgid() is used.
>>> This helper addresses this limitation returning the pid as it's seen by the current
>>> namespace where the script is executing.
>>>
>>> In the future different pid_ns files may belong to different devices, according to the
>>> discussion between Eric Biederman and Yonghong in 2017 Linux plumbers conference.
>>> To address that situation the helper requires inum and dev_t from /proc/self/ns/pid.
>>> This helper has the same use cases as bpf_get_current_pid_tgid() as it can be
>>> used to do pid filtering even inside a container.
>>
>> I think I may have asked this before.  If I am repeating old ground
>> please excuse me.
>>
>> Am I correct in understanding that these new helpers are designed to be used
>> when programs running in ``containers'', that is, inside pid namespaces,
>> register bpf programs for tracing?
>>
>> If so would it be possible to change how the existing bpf opcodes
>> operate when they are used in the context of a pid namespace?
>>
>> The latter would seem to allow just moving an existing application into
>> a pid namespace with no modifications.   If we can do this with trivial
>> cost at bpf compile time and with no userspace changes that would seem
>> a better approach.
>>
>> If not can someone point me to why we can't do that?  What am I missing?
> 
> We have some management/observability bpf programs loaded from privileged
> containers that end up getting triggered in multiple container contexts. Here
> we want the root namespace pid; otherwise there would be collisions (same pid
> in multiple containers) when it's used as a key and we would have difficulty
> finding the pid from the root namespace.

Yes, using the root namespace pid will work.

I am referring to a privileged container (currently root; in the future maybe
just CAP_BPF and CAP_TRACING) where you do not need to go to root
to check root pids. Also, there are cases where we collect statistics at
pid-namespace scope, so filtering based on the namespace "id" is also needed.

> 
> I guess at load time if it's an unprivileged program we could convert it to
> use the pid of the current namespace?

This way we will need a helper to get the current namespace pid.

> 
> Or if the application is moved into an unprivileged container?

Ya. A helper will be needed.

> 
> Our code is outside bcc so not sure exactly how the bcc case works. Just
> wanted to point out we use the root namespace pid for various things
> so I think it might need to be a bit smarter than just moving an
> existing application into a pid namespace.

As a workaround, we do this as well. The goal is to improve usability,
so that we do not need to go to root to find these pids.
When filtering at the namespace level, we sometimes have to approximate, as
it is sometimes impossible to track all pids in the container.

> 
> .John
>