| Message ID | 201008302227.DJH30258.OQFMFtFJOOVSHL@I-love.SAKURA.ne.jp |
|---|---|
| State | Changes Requested, archived |
| Delegated to: | David Miller |
On Monday, 30 August 2010 at 22:27 +0900, Tetsuo Handa wrote:
> I tried to create and autobind 1048576 UNIX domain sockets using
> http://I-love.SAKURA.ne.jp/tmp/unixloop.asm on 2.6.18-194.11.1.el5.x86_64,
> and found that it needs about 2GB of RAM. Thus, on systems where
> /proc/sys/fs/file-max is larger than 1048576, a local user can create and
> autobind 1048576 UNIX domain sockets in order to let applications fall into a
>
>     while (1)
>         yield();
>
> loop. Below is the patch (only compile tested) to avoid falling into this loop.
> Is there any reason a name has to be '\0' + 5 letters?
> Should I give up after checking 1048576 names rather than after checking
> 4294967296 names?

Yes please. Fix the bug first. Then, a following patch to increase the current limit, if necessary.

> ----------------------------------------
>  	err = -ENOMEM;
> -	addr = kzalloc(sizeof(*addr) + sizeof(short) + 16, GFP_KERNEL);
> +	addr = kzalloc(sizeof(*addr) + sizeof(short) + 19, GFP_KERNEL);

IMHO, this 16 or 19 value is wrong; we need less memory than that. (But this will be addressed in a 2nd patch.)

Thanks

--
To unsubscribe from this list: send the line "unsubscribe netdev"
in the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
I contacted Brad Spengler regarding the topic below
http://kerneltrap.org/mailarchive/linux-netdev/2010/8/21/6283491
and received some comments.

Brad Spengler wrote:
> Tetsuo Handa wrote:
> > Brad Spengler wrote:
> > > Nice find, seems like the problem isn't just the exhaustion of the space
> > > itself, but that a single non-root user can completely exhaust it with
> > > no limitation (given enough RAM). There should be some per-user
> > > resource limit for this (similar to RLIMIT_NPROC) or the space should be
> > > made less rigid.
> >
> > I posted a patch in the third message of that thread, but no response so far.
> > I don't know whether we should allow 4294967296 names or not,
> > because more RAM will be consumed by an attacker if more names are allowed.
>
> The max size for sun_path is considerably large; why couldn't we encode
> the uid/namespace into it somehow and reduce the amount each user can
> allocate? (conceptually it'd be something like, <namespace
> uuid><uid><hex to operate on as previously>) The OOM killer needs to
> make sure creators of large numbers of these get killed, but
> implementing a per-user limit on this keeps it from turning into a DoS.

Is it possible to make the OOM killer kill processes consuming kernel memory rather than userspace memory?

I'm happy if the OOM killer can select the victim process based on both kernel memory usage and userspace memory usage. For example, opening a file requires kernel memory allocation (e.g. opening /sys/kernel/security/tomoyo/self_domain allocates 8KB of kernel memory). Therefore, I refuse to allow everybody to open that file even if the content is public (because an attacker would open /sys/kernel/security/tomoyo/self_domain as many times as possible using fork() and open() in order to exhaust kernel memory).
On Thu, 2 Sep 2010, Tetsuo Handa wrote:
> Is it possible to make the OOM killer kill processes consuming kernel memory
> rather than userspace memory?
>
> I'm happy if the OOM killer can select the victim process based on both
> kernel memory usage and userspace memory usage. For example, opening a file
> requires kernel memory allocation (e.g. /sys/kernel/security/tomoyo/self_domain
> allocates 8KB of kernel memory). Therefore, I refuse to allow everybody to
> open that file even if the content is public (because an attacker would open
> /sys/kernel/security/tomoyo/self_domain as many times as possible using
> fork() and open() in order to exhaust kernel memory).

The oom killer has been rewritten for 2.6.36, so it'd probably help to frame this in the context of its new behavior, which you can try out in 2.6.36-rc3.

The case you're describing would be difficult to handle with the oom killer because each fork() and exec() would be considered a separate user of memory, so an aggregate of kernel memory amongst these threads wouldn't be considered. The oom killer isn't really concerned with whether the memory was allocated in userspace or kernelspace; the only thing that matters is that it's used and is unreclaimable (and non-migratable).

The memory controller accounts for user memory, so it's probably not a sufficient alternative in this case either.

Is it not possible for the kernel to intelligently limit the amount of memory that it can allocate through an interface at any given time?
> Is it possible to make the OOM killer kill processes consuming kernel memory
> rather than userspace memory?
>
> I'm happy if the OOM killer can select the victim process based on both
> kernel memory usage and userspace memory usage. For example, opening a file
> requires kernel memory allocation (e.g. /sys/kernel/security/tomoyo/self_domain
> allocates 8KB of kernel memory). Therefore, I refuse to allow everybody to
> open that file even if the content is public (because an attacker would open
> /sys/kernel/security/tomoyo/self_domain as many times as possible using
> fork() and open() in order to exhaust kernel memory).

Unfortunately, it's impossible. We have zero information about which kernel memory should be bound to the caller task. Almost all kernel memory is task-unrelated, so we can't blindly bind it to the caller task.

Thanks.
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 4414a18..1271e98 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -692,6 +692,7 @@ static int unix_autobind(struct socket *sock)
 	static u32 ordernum = 1;
 	struct unix_address *addr;
 	int err;
+	u32 loop = 0;
 
 	mutex_lock(&u->readlock);
 
@@ -700,7 +701,7 @@ static int unix_autobind(struct socket *sock)
 		goto out;
 
 	err = -ENOMEM;
-	addr = kzalloc(sizeof(*addr) + sizeof(short) + 16, GFP_KERNEL);
+	addr = kzalloc(sizeof(*addr) + sizeof(short) + 19, GFP_KERNEL);
 	if (!addr)
 		goto out;
 
@@ -708,11 +709,11 @@ static int unix_autobind(struct socket *sock)
 	atomic_set(&addr->refcnt, 1);
 
 retry:
-	addr->len = sprintf(addr->name->sun_path+1, "%05x", ordernum) + 1 + sizeof(short);
+	addr->len = sprintf(addr->name->sun_path+1, "%08x", ordernum) + 1 + sizeof(short);
 	addr->hash = unix_hash_fold(csum_partial(addr->name, addr->len, 0));
 
 	spin_lock(&unix_table_lock);
-	ordernum = (ordernum+1)&0xFFFFF;
+	ordernum++;
 
 	if (__unix_find_socket_byname(net, addr->name, addr->len, sock->type,
 				      addr->hash)) {
@@ -720,6 +721,12 @@ retry:
 		/* Sanity yield. It is unusual case, but yet... */
 		if (!(ordernum&0xFF))
 			yield();
+		/* Give up if all names are in use. */
+		if (unlikely(!++loop)) {
+			err = -EAGAIN;
+			kfree(addr);
+			goto out;
+		}
 		goto retry;
 	}
 	addr->hash ^= sk->sk_type;