Message ID | 1511803118-2552-6-git-send-email-tixxdz@gmail.com |
---|---|
State | Not Applicable, archived |
Delegated to: | David Miller |
Headers | show |
Series | Improve Module autoloading infrastructure | expand |
On Mon, Nov 27, 2017 at 9:18 AM, Djalal Harouni <tixxdz@gmail.com> wrote: > This uses the new request_module_cap() facility to directly propagate > CAP_NET_ADMIN capability and the 'netdev' module prefix to the > capability subsystem as it was suggested. This is the kind of complexity that I wonder if it's worth it at all. Nobody sane actually uses those stupid capability bits. Have you ever actually seen it used in real life? They were a mistake, and we should never have done them - another case of security people who think that complexity == security, when in reality nobody actually wants the complexity or is willing to set it up and manage it. Linus
Hi Linus, On Mon, Nov 27, 2017 at 7:44 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Mon, Nov 27, 2017 at 9:18 AM, Djalal Harouni <tixxdz@gmail.com> wrote: >> This uses the new request_module_cap() facility to directly propagate >> CAP_NET_ADMIN capability and the 'netdev' module prefix to the >> capability subsystem as it was suggested. > > This is the kind of complexity that I wonder if it's worth it at all. > > Nobody sane actually uses those stupid capability bits. Have you ever > actually seen it used in real life? Yes they are complicated even for developers, and normal users do not understand them, however yes every sandbox and container is exposing them to endusers directly, they are documented! so yes CAP_SYS_MODULE is exposed but it does not cover autoloading. However, we are trying hard to abstract some semantics that are easy to grasp, we are mutating capabilities and seccomp to have an abstracted "yes/no" options for our endusers. Now, if you are referring to kernel code, the networking subsystem is using them and I don't want to break any assumption here. There is still the request_module(), the request_module_cap() was suggested so networking code later won't have to do the checks on its own, and maybe it can be consistent in the long term. The phonet sockets even needs CAP_SYS_ADMIN... > > They were a mistake, and we should never have done them - another case > of security people who think that complexity == security, when in > reality nobody actually wants the complexity or is willing to set it > up and manage it. Alright, but I guess we are stuck, is there something better on how we can do this or describe this ? Please note in these patches, the mode is specifically described as: * allowed: for backward compatibility (I would have done without it) * privileged: which includes capabilities (backward compatibility too) or we can add what ever in the future * disabled: even for privileged. So I would have preferred if it is something like "yes/no" but... However in userspace we will try hard to hide this complexity and the capability bits. Now I can see that the code comments and doc refer to privileged with capabilities a lot, where we can maybe update that doc and code to less state that privileged means capabilities ? Suggestions ? Thanks! > Linus
On Mon, Nov 27, 2017 at 1:41 PM, Djalal Harouni <tixxdz@gmail.com> wrote: > > However, we are trying hard to abstract some semantics that are easy > to grasp, we are mutating capabilities and seccomp to have an > abstracted "yes/no" options for our endusers. Yes. Sadly, it looks like we actually do have users that just expect to load modules dynamically without any capabilities at all. So we can't actually disallow it by default at all, which imho makes this security option essentially useless. A security option that people can't use without breaking their system is pointless. We saw that with SELinux - people ended up just disabling it for _years_, simply because it ended up breaking so much in practice. And yes, it got fixed eventually, but at an incredibly high maintenance cost of all the crazy rules databases. > Alright, but I guess we are stuck, is there something better on how we > can do this or describe this ? So I wonder if we can perhaps look at the places that actually do "requerst_module()", and start filtering them on that basis. Some of them will already have checked for capabilities. Others clearly expect to juist work even _without_ capabilities (ie the bluetoothd case). So the whole "let's add a global config option" model is broken. There is no possible global rule. It will break things, which in turn mean that people won't turn it on (and we can't turn it on by default), which in turn makes this pointless. In other words, I really think that anything that just adds a mode flag cannot work. So instead of having one "modules_autoload_mode" thing, maybe the individual requerst_module() cases need to simply be audited. Put another way: I think the part of your patch series that does that "request_module_cap()" and makes the netdev modules use it is a good addition. It's the "mode" part I really don't agree with, because apparently we really need to default it to permissive. So how about instead: - add that "request_module_cap()" and make the networking code that already uses CAP_ADMIN_NET use it. - make "request_module()" itself default to being "request_module_cap(CAP_SYS_MODULE,..)" - make sure that when the capability check fails, we print an error message, and then for the ones that trigger, we will audit them and see if it's ok. Because that "mode" flag defaulting to off will just mean that the default case will remain the existing unsafe one, and that's bad. Opt-in really doesn't work. We've done it. Global flags for varied behavior really doesn't work. We've done that too. Different cases want different behavior, the global flag is just fundamentally broken. Linus
On Mon, Nov 27, 2017 at 2:04 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Mon, Nov 27, 2017 at 1:41 PM, Djalal Harouni <tixxdz@gmail.com> wrote: >> >> However, we are trying hard to abstract some semantics that are easy >> to grasp, we are mutating capabilities and seccomp to have an >> abstracted "yes/no" options for our endusers. > > Yes. > > Sadly, it looks like we actually do have users that just expect to > load modules dynamically without any capabilities at all. > > So we can't actually disallow it by default at all, which imho makes > this security option essentially useless. The good news here is that this series comes with an expected user: systemd, for which I suspect this will option will get wider and wider use. > A security option that people can't use without breaking their system > is pointless. Another bit of good news is that systems _depending_ cap-less module loading are uncommon enough that many system builders can enable this protection and nothing will break. (Even you thought it was rare enough that we could just make this default-enabled.) Besides, distros understand that they have to keep an eye out for these things since the kernel has a long history of not allowing default-enabled protections that change userspace behavior, etc. > So I wonder if we can perhaps look at the places that actually do > "requerst_module()", and start filtering them on that basis. > > Some of them will already have checked for capabilities. > > Others clearly expect to just work even _without_ capabilities (ie > the bluetoothd case). > > So the whole "let's add a global config option" model is broken. There > is no possible global rule. It will break things, which in turn mean > that people won't turn it on (and we can't turn it on by default), > which in turn makes this pointless. > > In other words, I really think that anything that just adds a mode > flag cannot work. > > So instead of having one "modules_autoload_mode" thing, maybe the > individual requerst_module() cases need to simply be audited. > > Put another way: I think the part of your patch series that does that > "request_module_cap()" and makes the netdev modules use it is a good > addition. > > It's the "mode" part I really don't agree with, because apparently we > really need to default it to permissive. > > So how about instead: > > - add that "request_module_cap()" and make the networking code that > already uses CAP_ADMIN_NET use it. > > - make "request_module()" itself default to being > "request_module_cap(CAP_SYS_MODULE,..)" > > - make sure that when the capability check fails, we print an error > message, and then for the ones that trigger, we will audit them and > see if it's ok. The issue isn't "is it always okay to cap-less-load this module?" it's "should the kernel's attack surface be allowed to grown by an unprivileged user?" Nearly all the request_module() sites have already been audited and adjusted to avoid _arbitrary_ module loading (usually by adding module prefixes: netdev-, crypto-, etc), and that greatly helped reduce the ability for a unprivileged user to load a known-buggy module, but there will remain many subsystems that continue to expect to grow the kernel's features on demand within their subsystem, and sometimes those modules will have bugs that an unprivileged user can exploit (this history of this happening is quite real; it's not a theoretical concern). This is especially problematic for distro kernels where they build tons of modules. There isn't a way for an admin to say "I only want root to be able to load modules". They can't delete all the "unused" kernel modules, because maybe root will want the module, they can't change module file permissions because the modules are read with init's permissions (and it would be terrible to repeatedly change all the file perms each time a new kernel got installed), modules can only be blacklisted (not whitelisted) in /etc/modprobe.d/, etc. > Because that "mode" flag defaulting to off will just mean that the > default case will remain the existing unsafe one, and that's bad. > > Opt-in really doesn't work. We've done it. > > Global flags for varied behavior really doesn't work. We've done that > too. Different cases want different behavior, the global flag is just > fundamentally broken. I don't disagree that a global should be avoided, but I'm struggling to see another option here. We can't break userspace by default so we can't restrict cap-less loading by default. But we can allow userspace to _choose_ to break itself, especially within a container. This isn't uncommon, especially for modules, where we even have the global "modules_disabled" sysctl already. The level of granularity of control here is the issue, and it's what this series solves. The options I see for module loading control are: 1) monolithic kernel (no modules) 2) modular kernel that flips on modules_disabled after boot (no modules after boot) 3) modular kernel that allows per-subsystem unpriv module loading (all modules loadable) There is a demand for something between 2 and 3 where only root can load modules. (And as pointed out in the series, this is _especially_ true for containers where the admin may want to leave module loading alone in the init namespace, but stop any module loading in the container.) -Kees
On Mon, Nov 27, 2017 at 2:59 PM, Kees Cook <keescook@chromium.org> wrote: > > I don't disagree that a global should be avoided, but I'm struggling > to see another option here. We can't break userspace by default so we > can't restrict cap-less loading by default. But we can allow userspace > to _choose_ to break itself, especially within a container. This isn't > uncommon, especially for modules, where we even have the global > "modules_disabled" sysctl already. The level of granularity of control > here is the issue, and it's what this series solves. So there's two "global" here - if a container were to choose to break itself, it should damn well be container-specific, not some global option This part seems to be ok in the patch series, since the "global" is really per-task. So it's not global in the "system-wide" sense. - if _one_ request_module() caller were to say "I don't want to be loaded by a normal user", that doesn't mean that _other_ request_module() cases shouldn't. This is the part I'm objecting to, because it means that we can't enable this stricter policy by default. And the thing is, the patch series seems to already introduce largely the better model of just making it site-specific. Introducing that request_module_cap() thing and then using it for networking is a good step. But I also suspect that we _could_ just make the stricter rules actually be default, if we just fixed the thing up to not be "every request_module() is the same". For example, several request_module() calls come from device node opens, and it makes sense that we can just say: "if you have access to the device node, then you have the right to request the module". But that would need to be not a global "request_module()" behavior, but a behavior that is tied to the particular call-site. IOW, extend on that request_module_cap() model, and introduce (perhaps) a "request_module_dev()" call that basically means "the user opened the device node for the requested module". Because those kinds of permissions aren't necessarily about capabilities, but about things like "I'm in the dialout group, I get to open tty devices and by implication request their modules". And that _really_ isn't global behavior. The fact that I might be able to load a serial; module has *nothing* to do with whether I can load some other kind of module at all. That global mode is just wrong. Linus
On Mon, Nov 27, 2017 at 3:14 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Mon, Nov 27, 2017 at 2:59 PM, Kees Cook <keescook@chromium.org> wrote: >> >> I don't disagree that a global should be avoided, but I'm struggling >> to see another option here. We can't break userspace by default so we >> can't restrict cap-less loading by default. But we can allow userspace >> to _choose_ to break itself, especially within a container. This isn't >> uncommon, especially for modules, where we even have the global >> "modules_disabled" sysctl already. The level of granularity of control >> here is the issue, and it's what this series solves. > > So there's two "global" here > > - if a container were to choose to break itself, it should damn well > be container-specific, not some global option > > This part seems to be ok in the patch series, since the "global" is > really per-task. So it's not global in the "system-wide" sense. > > - if _one_ request_module() caller were to say "I don't want to be > loaded by a normal user", that doesn't mean that _other_ > request_module() cases shouldn't. > > This is the part I'm objecting to, because it means that we can't > enable this stricter policy by default. > > And the thing is, the patch series seems to already introduce largely > the better model of just making it site-specific. Introducing that > request_module_cap() thing and then using it for networking is a good > step. > > But I also suspect that we _could_ just make the stricter rules > actually be default, if we just fixed the thing up to not be "every > request_module() is the same". > > For example, several request_module() calls come from device node > opens, and it makes sense that we can just say: "if you have access to > the device node, then you have the right to request the module". > > But that would need to be not a global "request_module()" behavior, > but a behavior that is tied to the particular call-site. > > IOW, extend on that request_module_cap() model, and introduce > (perhaps) a "request_module_dev()" call that basically means "the user > opened the device node for the requested module". > > Because those kinds of permissions aren't necessarily about > capabilities, but about things like "I'm in the dialout group, I get > to open tty devices and by implication request their modules". > > And that _really_ isn't global behavior. The fact that I might be > able to load a serial; module has *nothing* to do with whether I can > load some other kind of module at all. > > That global mode is just wrong. What about exporting this entirely to userspace, giving it as much context as possible? i.e. inform modprobe about the user doing it, maybe the subsystem, etc? -Kees
On Mon, Nov 27, 2017 at 3:19 PM, Kees Cook <keescook@chromium.org> wrote: > > What about exporting this entirely to userspace, giving it as much > context as possible? i.e. inform modprobe about the user doing it, > maybe the subsystem, etc? Yeah, except for the fact that we don't trust user-mode? We used to do that exact thing. It was a nasty disaster, and caused version skew and other horrible problems. So no. Th e"let's just let user mode sort it out" doesn't work. User mode doesn't sort anything out, it just makes it worse. It's not some made-up example when I say that user-mode has decided that kernel requests have to be completely serialized, and recusive invocations will just hang. So no. We do not go down that particular rat-hole. It's just a bigger chance of getting things wrong. Linus
On Mon, Nov 27, 2017 at 3:14 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Mon, Nov 27, 2017 at 2:59 PM, Kees Cook <keescook@chromium.org> wrote: >> >> I don't disagree that a global should be avoided, but I'm struggling >> to see another option here. We can't break userspace by default so we >> can't restrict cap-less loading by default. But we can allow userspace >> to _choose_ to break itself, especially within a container. This isn't >> uncommon, especially for modules, where we even have the global >> "modules_disabled" sysctl already. The level of granularity of control >> here is the issue, and it's what this series solves. > > So there's two "global" here > > - if a container were to choose to break itself, it should damn well > be container-specific, not some global option > > This part seems to be ok in the patch series, since the "global" is > really per-task. So it's not global in the "system-wide" sense. Right, though in the case of init, it could flip that toggle for itself and it would then effectively be system-wide. > - if _one_ request_module() caller were to say "I don't want to be > loaded by a normal user", that doesn't mean that _other_ > request_module() cases shouldn't. > > This is the part I'm objecting to, because it means that we can't > enable this stricter policy by default. Okay, I see what you mean here. You want to clearly distinguish between unprivileged and privileged-only. I'm unconvinced that's going to change much, as the bulk of the exposed request_module() users are already expecting to be unprivileged (and that's why they were all converted to requiring a named prefix). > And the thing is, the patch series seems to already introduce largely > the better model of just making it site-specific. Introducing that > request_module_cap() thing and then using it for networking is a good > step. > > But I also suspect that we _could_ just make the stricter rules > actually be default, if we just fixed the thing up to not be "every > request_module() is the same". When doing some of the older module name prefix work, I did consider introducing a new request_module() API that included the prefix name as an explicit argument (instead of embedding it in the format string). We could easily start there, and then have "plain" request_module() require privs. But we'll still need a way to say "admin doesn't want unpriv module auto-loading". > For example, several request_module() calls come from device node > opens, and it makes sense that we can just say: "if you have access to > the device node, then you have the right to request the module". Many of these callers are using network interfaces to do this work, so there isn't as clean a permission model associated with those like there might be with a filesystem open(). But that doesn't matter (see below). > But that would need to be not a global "request_module()" behavior, > but a behavior that is tied to the particular call-site. > > IOW, extend on that request_module_cap() model, and introduce > (perhaps) a "request_module_dev()" call that basically means "the user > opened the device node for the requested module". > > Because those kinds of permissions aren't necessarily about > capabilities, but about things like "I'm in the dialout group, I get > to open tty devices and by implication request their modules". This really doesn't address the main concern that is the problem: whitelisting vs blacklisting. In your example, the dialout group gives access to specific ttys or serial ports, etc, but an admin may want a way to make sure the users don't load some buggy line discipline. Now, that admin could blacklist all those modules one at a time, but new stuff might get introduced, it doesn't handle other subsystems, etc. We need to provide a way to whitelist autoloaded modules. The demonstrated need (to whitelist _no_ modules, addressed by this series) provides that level of control on a task basis (effectively making it container-specific). > And that _really_ isn't global behavior. The fact that I might be > able to load a serial; module has *nothing* to do with whether I can > load some other kind of module at all. > > That global mode is just wrong. If the per-task thing stays and the global sysctl goes, that would be fine by me. That still gives admins a way to control the autoload behavior, assuming their init knows how to set the flag. The global sysctl, in my mind, is really more of a way for an admin to do this after the fact without rebooting, etc. But I don't have a strong opinion about the global sysctl. -Kees
diff --git a/net/core/dev_ioctl.c b/net/core/dev_ioctl.c index 7e690d0..fdd8560 100644 --- a/net/core/dev_ioctl.c +++ b/net/core/dev_ioctl.c @@ -382,8 +382,10 @@ void dev_load(struct net *net, const char *name) rcu_read_unlock(); no_module = !dev; + /* "netdev-%s" modules are allowed if CAP_NET_ADMIN is set */ if (no_module && capable(CAP_NET_ADMIN)) - no_module = request_module("netdev-%s", name); + no_module = request_module_cap(CAP_NET_ADMIN, "netdev", + "%s", name); if (no_module && capable(CAP_SYS_MODULE)) request_module("%s", name); }
This uses the new request_module_cap() facility to directly propagate CAP_NET_ADMIN capability and the 'netdev' module prefix to the capability subsystem as it was suggested. We do not remove the explicit capable(CAP_NET_ADMIN) check here, but we may remove it in future versions since it is also performed by the capability subsystem. This allows to have a better interface where other subsystems will just use this call and let the capability subsystem handles the permission checks, if the modules should be loaded or not. This is also an infrastructure fix since historically Linux always allowed to auto-load modules without privileges, and later the net code started to check capabilities and prefixes, adapted the CAP_NET_ADMIN check with the 'netdev' prefix to prevent abusing the capability by loading non-netdev modules. However from a bigger picture we want to continue to support automatic module loading as non privileged but also implement easy policy solutions like: User=djalal DenyNewFeatures=no Which will translate to allow the interactive user djalal to load extra Linux features. Others, volatile accounts or guests can be easily blocked from doing so. We have introduced in previous patches the necessary infrastructure and now with this change we start to use the new request_module_cap() function to explicitly tell the capability subsystem that we want to auto-load modules with CAP_NET_ADMIN if they are prefixed. This is also based on suggestions from Rusty Russel and Kees Cook [1] [1] https://lkml.org/lkml/2017/4/26/735 Cc: Ben Hutchings <ben.hutchings@codethink.co.uk> Cc: James Morris <james.l.morris@oracle.com> Cc: Serge Hallyn <serge@hallyn.com> Cc: Solar Designer <solar@openwall.com> Cc: Andy Lutomirski <luto@kernel.org> Suggested-by: Rusty Russell <rusty@rustcorp.com.au> Suggested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Djalal Harouni <tixxdz@gmail.com> --- net/core/dev_ioctl.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)