diff mbox series

[v5,next,5/5] net: modules: use request_module_cap() to load 'netdev-%s' modules

Message ID 1511803118-2552-6-git-send-email-tixxdz@gmail.com
State Not Applicable, archived
Delegated to: David Miller
Headers show
Series Improve Module autoloading infrastructure | expand

Commit Message

Djalal Harouni Nov. 27, 2017, 5:18 p.m. UTC
This uses the new request_module_cap() facility to directly propagate
CAP_NET_ADMIN capability and the 'netdev' module prefix to the
capability subsystem as it was suggested.

We do not remove the explicit capable(CAP_NET_ADMIN) check here, but we
may remove it in future versions since it is also performed by the
capability subsystem. This allows to have a better interface where other
subsystems will just use this call and let the capability subsystem
handles the permission checks, if the modules should be loaded or not.

This is also an infrastructure fix since historically Linux always
allowed to auto-load modules without privileges, and later the net code
started to check capabilities and prefixes, adapted the CAP_NET_ADMIN
check with the 'netdev' prefix to prevent abusing the capability by
loading non-netdev modules. However from a bigger picture we want to
continue to support automatic module loading as non privileged but also
implement easy policy solutions like:

User=djalal
DenyNewFeatures=no

Which will translate to allow the interactive user djalal to load extra
Linux features. Others, volatile accounts or guests can be easily
blocked from doing so. We have introduced in previous patches the
necessary infrastructure and now with this change we start to use the
new request_module_cap() function to explicitly tell the capability
subsystem that we want to auto-load modules with CAP_NET_ADMIN if they
are prefixed.

This is also based on suggestions from Rusty Russel and Kees Cook [1]

[1] https://lkml.org/lkml/2017/4/26/735

Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Solar Designer <solar@openwall.com>
Cc: Andy Lutomirski <luto@kernel.org>
Suggested-by: Rusty Russell <rusty@rustcorp.com.au>
Suggested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Djalal Harouni <tixxdz@gmail.com>
---
 net/core/dev_ioctl.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Linus Torvalds Nov. 27, 2017, 6:44 p.m. UTC | #1
On Mon, Nov 27, 2017 at 9:18 AM, Djalal Harouni <tixxdz@gmail.com> wrote:
> This uses the new request_module_cap() facility to directly propagate
> CAP_NET_ADMIN capability and the 'netdev' module prefix to the
> capability subsystem as it was suggested.

This is the kind of complexity that I wonder if it's worth it at all.

Nobody sane actually uses those stupid capability bits. Have you ever
actually seen it used in real life?

They were a mistake, and we should never have done them - another case
of security people who think that complexity == security, when in
reality nobody actually wants the complexity or is willing to set it
up and manage it.

                   Linus
Djalal Harouni Nov. 27, 2017, 9:41 p.m. UTC | #2
Hi Linus,

On Mon, Nov 27, 2017 at 7:44 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Mon, Nov 27, 2017 at 9:18 AM, Djalal Harouni <tixxdz@gmail.com> wrote:
>> This uses the new request_module_cap() facility to directly propagate
>> CAP_NET_ADMIN capability and the 'netdev' module prefix to the
>> capability subsystem as it was suggested.
>
> This is the kind of complexity that I wonder if it's worth it at all.
>
> Nobody sane actually uses those stupid capability bits. Have you ever
> actually seen it used in real life?

Yes they are complicated even for developers, and normal users do not
understand them, however yes every sandbox and container is exposing
them to endusers directly, they are documented!  so yes CAP_SYS_MODULE
is exposed but it does not cover autoloading.

However, we are trying hard to abstract some semantics that are easy
to grasp, we are mutating capabilities and seccomp to have an
abstracted "yes/no" options for our endusers.

Now, if you are referring to kernel code, the networking subsystem is
using them and I don't want to break any assumption here. There is
still the request_module(), the request_module_cap() was suggested so
networking code later won't have to do the checks on its own, and
maybe it can be consistent in the long term. The phonet sockets even
needs CAP_SYS_ADMIN...


>
> They were a mistake, and we should never have done them - another case
> of security people who think that complexity == security, when in
> reality nobody actually wants the complexity or is willing to set it
> up and manage it.

Alright, but I guess we are stuck, is there something better on how we
can do this or describe this ?


Please note in these patches, the mode is specifically described as:

* allowed: for backward compatibility  (I would have done without it)
* privileged: which includes capabilities (backward compatibility too)
or we can add what ever in the future
* disabled: even for privileged.

So I would have preferred if it is something like "yes/no" but...
However in userspace we will try hard to hide this complexity and the
capability bits.

Now I can see that the code comments and doc refer to privileged with
capabilities a lot, where we can maybe update that doc and code to
less state that privileged means capabilities ? Suggestions ?

Thanks!

>                    Linus
Linus Torvalds Nov. 27, 2017, 10:04 p.m. UTC | #3
On Mon, Nov 27, 2017 at 1:41 PM, Djalal Harouni <tixxdz@gmail.com> wrote:
>
> However, we are trying hard to abstract some semantics that are easy
> to grasp, we are mutating capabilities and seccomp to have an
> abstracted "yes/no" options for our endusers.

Yes.

Sadly, it looks like we actually do have users that just expect to
load modules dynamically without any capabilities at all.

So we can't actually disallow it by default at all, which imho makes
this security option essentially useless.

A security option that people can't use without breaking their system
is pointless.

We saw that with SELinux - people ended up just disabling it for
_years_, simply because it ended up breaking so much in practice. And
yes, it got fixed eventually, but at an incredibly high maintenance
cost of all the crazy rules databases.

> Alright, but I guess we are stuck, is there something better on how we
> can do this or describe this ?

So I wonder if we can perhaps look at the places that actually do
"requerst_module()", and start filtering them on that basis.

Some of them will already have checked for capabilities.

Others clearly expect to juist work even _without_ capabilities (ie
the bluetoothd case).

So the whole "let's add a global config option" model is broken. There
is no possible global rule. It will break things, which in turn mean
that people won't turn it on (and we can't turn it on by default),
which in turn makes this pointless.

In other words, I really think that anything that just adds a mode
flag cannot work.

So instead of having one "modules_autoload_mode" thing, maybe the
individual requerst_module() cases need to simply be audited.

Put another way: I think the part of your patch series that does that
"request_module_cap()" and makes the netdev modules use it is a good
addition.

It's the "mode" part I really don't agree with, because apparently we
really need to default it to permissive.

So how about instead:

 - add that "request_module_cap()" and make the networking code that
already uses CAP_ADMIN_NET use it.

 - make "request_module()" itself default to being
"request_module_cap(CAP_SYS_MODULE,..)"

 - make sure that when the capability check fails, we print an error
message, and then for the ones that trigger, we will audit them and
see if it's ok.

Because that "mode" flag defaulting to off will just mean that the
default case will remain the existing unsafe one, and that's bad.

Opt-in really doesn't work. We've done it.

Global flags for varied behavior really doesn't work. We've done that
too. Different cases want different behavior, the global flag is just
fundamentally broken.

              Linus
Kees Cook Nov. 27, 2017, 10:59 p.m. UTC | #4
On Mon, Nov 27, 2017 at 2:04 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Mon, Nov 27, 2017 at 1:41 PM, Djalal Harouni <tixxdz@gmail.com> wrote:
>>
>> However, we are trying hard to abstract some semantics that are easy
>> to grasp, we are mutating capabilities and seccomp to have an
>> abstracted "yes/no" options for our endusers.
>
> Yes.
>
> Sadly, it looks like we actually do have users that just expect to
> load modules dynamically without any capabilities at all.
>
> So we can't actually disallow it by default at all, which imho makes
> this security option essentially useless.

The good news here is that this series comes with an expected user:
systemd, for which I suspect this will option will get wider and wider
use.

> A security option that people can't use without breaking their system
> is pointless.

Another bit of good news is that systems _depending_ cap-less module
loading are uncommon enough that many system builders can enable this
protection and nothing will break. (Even you thought it was rare
enough that we could just make this default-enabled.)

Besides, distros understand that they have to keep an eye out for
these things since the kernel has a long history of not allowing
default-enabled protections that change userspace behavior, etc.

> So I wonder if we can perhaps look at the places that actually do
> "requerst_module()", and start filtering them on that basis.
>
> Some of them will already have checked for capabilities.
>
> Others clearly expect to just work even _without_ capabilities (ie
> the bluetoothd case).
>
> So the whole "let's add a global config option" model is broken. There
> is no possible global rule. It will break things, which in turn mean
> that people won't turn it on (and we can't turn it on by default),
> which in turn makes this pointless.
>
> In other words, I really think that anything that just adds a mode
> flag cannot work.
>
> So instead of having one "modules_autoload_mode" thing, maybe the
> individual requerst_module() cases need to simply be audited.
>
> Put another way: I think the part of your patch series that does that
> "request_module_cap()" and makes the netdev modules use it is a good
> addition.
>
> It's the "mode" part I really don't agree with, because apparently we
> really need to default it to permissive.
>
> So how about instead:
>
>  - add that "request_module_cap()" and make the networking code that
> already uses CAP_ADMIN_NET use it.
>
>  - make "request_module()" itself default to being
> "request_module_cap(CAP_SYS_MODULE,..)"
>
>  - make sure that when the capability check fails, we print an error
> message, and then for the ones that trigger, we will audit them and
> see if it's ok.

The issue isn't "is it always okay to cap-less-load this module?" it's
"should the kernel's attack surface be allowed to grown by an
unprivileged user?" Nearly all the request_module() sites have already
been audited and adjusted to avoid _arbitrary_ module loading (usually
by adding module prefixes: netdev-, crypto-, etc), and that greatly
helped reduce the ability for a unprivileged user to load a
known-buggy module, but there will remain many subsystems that
continue to expect to grow the kernel's features on demand within
their subsystem, and sometimes those modules will have bugs that an
unprivileged user can exploit (this history of this happening is quite
real; it's not a theoretical concern). This is especially problematic
for distro kernels where they build tons of modules.

There isn't a way for an admin to say "I only want root to be able to
load modules". They can't delete all the "unused" kernel modules,
because maybe root will want the module, they can't change module file
permissions because the modules are read with init's permissions (and
it would be terrible to repeatedly change all the file perms each time
a new kernel got installed), modules can only be blacklisted (not
whitelisted) in /etc/modprobe.d/, etc.

> Because that "mode" flag defaulting to off will just mean that the
> default case will remain the existing unsafe one, and that's bad.
>
> Opt-in really doesn't work. We've done it.
>
> Global flags for varied behavior really doesn't work. We've done that
> too. Different cases want different behavior, the global flag is just
> fundamentally broken.

I don't disagree that a global should be avoided, but I'm struggling
to see another option here. We can't break userspace by default so we
can't restrict cap-less loading by default. But we can allow userspace
to _choose_ to break itself, especially within a container. This isn't
uncommon, especially for modules, where we even have the global
"modules_disabled" sysctl already. The level of granularity of control
here is the issue, and it's what this series solves.

The options I see for module loading control are:
1) monolithic kernel (no modules)
2) modular kernel that flips on modules_disabled after boot (no
modules after boot)
3) modular kernel that allows per-subsystem unpriv module loading (all
modules loadable)

There is a demand for something between 2 and 3 where only root can
load modules. (And as pointed out in the series, this is _especially_
true for containers where the admin may want to leave module loading
alone in the init namespace, but stop any module loading in the
container.)

-Kees
Linus Torvalds Nov. 27, 2017, 11:14 p.m. UTC | #5
On Mon, Nov 27, 2017 at 2:59 PM, Kees Cook <keescook@chromium.org> wrote:
>
> I don't disagree that a global should be avoided, but I'm struggling
> to see another option here. We can't break userspace by default so we
> can't restrict cap-less loading by default. But we can allow userspace
> to _choose_ to break itself, especially within a container. This isn't
> uncommon, especially for modules, where we even have the global
> "modules_disabled" sysctl already. The level of granularity of control
> here is the issue, and it's what this series solves.

So there's two "global" here

 - if a container were to choose to break itself, it should damn well
be container-specific, not some global option

   This part seems to be ok in the patch series, since the "global" is
really per-task. So it's not global in the "system-wide" sense.

 - if _one_ request_module() caller were to say "I don't want to be
loaded by a normal user", that doesn't mean that _other_
request_module() cases shouldn't.

   This is the part I'm objecting to, because it means that we can't
enable this stricter policy by default.

And the thing is, the patch series seems to already introduce largely
the better model of just making it site-specific. Introducing that
request_module_cap() thing and then using it for networking is a good
step.

But I also suspect that we _could_ just make the stricter rules
actually be default, if we just fixed the thing up to not be "every
request_module() is the same".

For example, several request_module() calls come from device node
opens, and it makes sense that we can just say: "if you have access to
the device node, then you have the right to request the module".

But that would need to be not a global "request_module()" behavior,
but a behavior that is tied to the particular call-site.

IOW, extend on that request_module_cap() model, and introduce
(perhaps) a "request_module_dev()" call that basically means "the user
opened the device node for the requested module".

Because those kinds of permissions aren't necessarily about
capabilities, but about things like "I'm in the dialout group, I get
to open tty devices and by implication request their modules".

And that _really_ isn't global behavior.  The fact that I might be
able to load a serial; module has *nothing* to do with whether I can
load some other kind of module at all.

That global mode is just wrong.

                Linus
Kees Cook Nov. 27, 2017, 11:19 p.m. UTC | #6
On Mon, Nov 27, 2017 at 3:14 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Mon, Nov 27, 2017 at 2:59 PM, Kees Cook <keescook@chromium.org> wrote:
>>
>> I don't disagree that a global should be avoided, but I'm struggling
>> to see another option here. We can't break userspace by default so we
>> can't restrict cap-less loading by default. But we can allow userspace
>> to _choose_ to break itself, especially within a container. This isn't
>> uncommon, especially for modules, where we even have the global
>> "modules_disabled" sysctl already. The level of granularity of control
>> here is the issue, and it's what this series solves.
>
> So there's two "global" here
>
>  - if a container were to choose to break itself, it should damn well
> be container-specific, not some global option
>
>    This part seems to be ok in the patch series, since the "global" is
> really per-task. So it's not global in the "system-wide" sense.
>
>  - if _one_ request_module() caller were to say "I don't want to be
> loaded by a normal user", that doesn't mean that _other_
> request_module() cases shouldn't.
>
>    This is the part I'm objecting to, because it means that we can't
> enable this stricter policy by default.
>
> And the thing is, the patch series seems to already introduce largely
> the better model of just making it site-specific. Introducing that
> request_module_cap() thing and then using it for networking is a good
> step.
>
> But I also suspect that we _could_ just make the stricter rules
> actually be default, if we just fixed the thing up to not be "every
> request_module() is the same".
>
> For example, several request_module() calls come from device node
> opens, and it makes sense that we can just say: "if you have access to
> the device node, then you have the right to request the module".
>
> But that would need to be not a global "request_module()" behavior,
> but a behavior that is tied to the particular call-site.
>
> IOW, extend on that request_module_cap() model, and introduce
> (perhaps) a "request_module_dev()" call that basically means "the user
> opened the device node for the requested module".
>
> Because those kinds of permissions aren't necessarily about
> capabilities, but about things like "I'm in the dialout group, I get
> to open tty devices and by implication request their modules".
>
> And that _really_ isn't global behavior.  The fact that I might be
> able to load a serial; module has *nothing* to do with whether I can
> load some other kind of module at all.
>
> That global mode is just wrong.

What about exporting this entirely to userspace, giving it as much
context as possible? i.e. inform modprobe about the user doing it,
maybe the subsystem, etc?

-Kees
Linus Torvalds Nov. 27, 2017, 11:35 p.m. UTC | #7
On Mon, Nov 27, 2017 at 3:19 PM, Kees Cook <keescook@chromium.org> wrote:
>
> What about exporting this entirely to userspace, giving it as much
> context as possible? i.e. inform modprobe about the user doing it,
> maybe the subsystem, etc?

Yeah, except for the fact that we don't trust user-mode?

We used to do that exact thing. It was a nasty disaster, and caused
version skew and other horrible problems.

So no. Th e"let's just let user mode sort it out" doesn't work. User
mode doesn't sort anything out, it just makes it worse.

It's not some made-up example when I say that user-mode has decided
that kernel requests have to be completely serialized, and recusive
invocations will just hang.

So no. We do not go down that particular rat-hole. It's just a bigger
chance of getting things wrong.

                Linus
Kees Cook Nov. 28, 2017, 1:23 a.m. UTC | #8
On Mon, Nov 27, 2017 at 3:14 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Mon, Nov 27, 2017 at 2:59 PM, Kees Cook <keescook@chromium.org> wrote:
>>
>> I don't disagree that a global should be avoided, but I'm struggling
>> to see another option here. We can't break userspace by default so we
>> can't restrict cap-less loading by default. But we can allow userspace
>> to _choose_ to break itself, especially within a container. This isn't
>> uncommon, especially for modules, where we even have the global
>> "modules_disabled" sysctl already. The level of granularity of control
>> here is the issue, and it's what this series solves.
>
> So there's two "global" here
>
>  - if a container were to choose to break itself, it should damn well
> be container-specific, not some global option
>
>    This part seems to be ok in the patch series, since the "global" is
> really per-task. So it's not global in the "system-wide" sense.

Right, though in the case of init, it could flip that toggle for
itself and it would then effectively be system-wide.

>  - if _one_ request_module() caller were to say "I don't want to be
> loaded by a normal user", that doesn't mean that _other_
> request_module() cases shouldn't.
>
>    This is the part I'm objecting to, because it means that we can't
> enable this stricter policy by default.

Okay, I see what you mean here. You want to clearly distinguish
between unprivileged and privileged-only. I'm unconvinced that's going
to change much, as the bulk of the exposed request_module() users are
already expecting to be unprivileged (and that's why they were all
converted to requiring a named prefix).

> And the thing is, the patch series seems to already introduce largely
> the better model of just making it site-specific. Introducing that
> request_module_cap() thing and then using it for networking is a good
> step.
>
> But I also suspect that we _could_ just make the stricter rules
> actually be default, if we just fixed the thing up to not be "every
> request_module() is the same".

When doing some of the older module name prefix work, I did consider
introducing a new request_module() API that included the prefix name
as an explicit argument (instead of embedding it in the format
string). We could easily start there, and then have "plain"
request_module() require privs. But we'll still need a way to say
"admin doesn't want unpriv module auto-loading".

> For example, several request_module() calls come from device node
> opens, and it makes sense that we can just say: "if you have access to
> the device node, then you have the right to request the module".

Many of these callers are using network interfaces to do this work, so
there isn't as clean a permission model associated with those like
there might be with a filesystem open(). But that doesn't matter (see
below).

> But that would need to be not a global "request_module()" behavior,
> but a behavior that is tied to the particular call-site.
>
> IOW, extend on that request_module_cap() model, and introduce
> (perhaps) a "request_module_dev()" call that basically means "the user
> opened the device node for the requested module".
>
> Because those kinds of permissions aren't necessarily about
> capabilities, but about things like "I'm in the dialout group, I get
> to open tty devices and by implication request their modules".

This really doesn't address the main concern that is the problem:
whitelisting vs blacklisting. In your example, the dialout group gives
access to specific ttys or serial ports, etc, but an admin may want a
way to make sure the users don't load some buggy line discipline. Now,
that admin could blacklist all those modules one at a time, but new
stuff might get introduced, it doesn't handle other subsystems, etc.

We need to provide a way to whitelist autoloaded modules. The
demonstrated need (to whitelist _no_ modules, addressed by this
series) provides that level of control on a task basis (effectively
making it container-specific).

> And that _really_ isn't global behavior.  The fact that I might be
> able to load a serial; module has *nothing* to do with whether I can
> load some other kind of module at all.
>
> That global mode is just wrong.

If the per-task thing stays and the global sysctl goes, that would be
fine by me. That still gives admins a way to control the autoload
behavior, assuming their init knows how to set the flag. The global
sysctl, in my mind, is really more of a way for an admin to do this
after the fact without rebooting, etc. But I don't have a strong
opinion about the global sysctl.

-Kees
diff mbox series

Patch

diff --git a/net/core/dev_ioctl.c b/net/core/dev_ioctl.c
index 7e690d0..fdd8560 100644
--- a/net/core/dev_ioctl.c
+++ b/net/core/dev_ioctl.c
@@ -382,8 +382,10 @@  void dev_load(struct net *net, const char *name)
 	rcu_read_unlock();
 
 	no_module = !dev;
+	/* "netdev-%s" modules are allowed if CAP_NET_ADMIN is set */
 	if (no_module && capable(CAP_NET_ADMIN))
-		no_module = request_module("netdev-%s", name);
+		no_module = request_module_cap(CAP_NET_ADMIN, "netdev",
+					       "%s", name);
 	if (no_module && capable(CAP_SYS_MODULE))
 		request_module("%s", name);
 }