diff mbox

UNIX: Do not loop forever at unix_autobind().

Message ID 201009040740.o847eB4f040772@www262.sakura.ne.jp
State Superseded, archived
Delegated to: David Miller
Headers show

Commit Message

Tetsuo Handa Sept. 4, 2010, 7:40 a.m. UTC
Eric Dumazet wrote:
> Sorry, this wont work very well if you have many processes using
> autobind(). Some of them will loop many time before hitting
> "stop_ordernum".

I see. Then, we should use local counter rather than global counter for yield()
checking in case there are multiple threads hitting this loop.
----------------------------------------
From 57a49c7b5a39de58d4538b85c60c758be3d2ce4f Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Sat, 4 Sep 2010 16:23:46 +0900
Subject: [PATCH] UNIX: Do not loop forever at unix_autobind().

We assumed that unix_autobind() never fails if kzalloc() succeeded.
But unix_autobind() allows only 1048576 names. If /proc/sys/fs/file-max is
larger than 1048576 (e.g. systems with more than 10GB of RAM), a local user can
consume all names using fork()/socket()/bind().

If all names are in use, those who call bind() with addr_len == sizeof(short)
or connect()/sendmsg() with setsockopt(SO_PASSCRED) will continue

  while (1)
  	yield();

loop at unix_autobind() till a name becomes available.
This patch adds a loop counter in order to give up after 1048576 attempts and
use the loop counter in order to count yield() interval more reliably when
there are many threads hitting this loop.

Note that currently a local user can consume 2GB of kernel memory if the user
is allowed to create and autobind 1048576 UNIX domain sockets. We should
consider adding some restriction for autobind operation.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 net/unix/af_unix.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

Comments

Eric Dumazet Sept. 4, 2010, 8:24 a.m. UTC | #1
Le samedi 04 septembre 2010 à 16:40 +0900, Tetsuo Handa a écrit :
> Eric Dumazet wrote:
> > Sorry, this wont work very well if you have many processes using
> > autobind(). Some of them will loop many time before hitting
> > "stop_ordernum".
> 
> I see. Then, we should use local counter rather than global counter for yield()
> checking in case there are multiple threads hitting this loop.
> ----------------------------------------
> From 57a49c7b5a39de58d4538b85c60c758be3d2ce4f Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Date: Sat, 4 Sep 2010 16:23:46 +0900
> Subject: [PATCH] UNIX: Do not loop forever at unix_autobind().
> 
> We assumed that unix_autobind() never fails if kzalloc() succeeded.
> But unix_autobind() allows only 1048576 names. If /proc/sys/fs/file-max is
> larger than 1048576 (e.g. systems with more than 10GB of RAM), a local user can
> consume all names using fork()/socket()/bind().
> 
> If all names are in use, those who call bind() with addr_len == sizeof(short)
> or connect()/sendmsg() with setsockopt(SO_PASSCRED) will continue
> 
>   while (1)
>   	yield();
> 
> loop at unix_autobind() till a name becomes available.
> This patch adds a loop counter in order to give up after 1048576 attempts and
> use the loop counter in order to count yield() interval more reliably when
> there are many threads hitting this loop.
> 
> Note that currently a local user can consume 2GB of kernel memory if the user
> is allowed to create and autobind 1048576 UNIX domain sockets. We should
> consider adding some restriction for autobind operation.
> 
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> ---
>  net/unix/af_unix.c |    9 ++++++++-
>  1 files changed, 8 insertions(+), 1 deletions(-)
> 
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 4414a18..eedfe50 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -692,6 +692,7 @@ static int unix_autobind(struct socket *sock)
>  	static u32 ordernum = 1;
>  	struct unix_address *addr;
>  	int err;
> +	unsigned int retries = 0;
>  
>  	mutex_lock(&u->readlock);
>  
> @@ -717,8 +718,14 @@ retry:
>  	if (__unix_find_socket_byname(net, addr->name, addr->len, sock->type,
>  				      addr->hash)) {
>  		spin_unlock(&unix_table_lock);
> +		/* Give up if all names seems to be in use. */
> +		if (retries++ == 0xFFFFF) {
> +			err = -ENOMEM;
> +			kfree(addr);
> +			goto out;
> +		}
>  		/* Sanity yield. It is unusual case, but yet... */
> -		if (!(ordernum&0xFF))
> +		if (!(retries & 0xFF))
>  			yield();
>  		goto retry;
>  	}


Quite frankly, given __unix_find_socket_byname() can take quite a long
time with one million entries in table, we should just remove

if (whatever)
	yield();

and use a more friendly :

cond_resched();



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 4414a18..eedfe50 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -692,6 +692,7 @@  static int unix_autobind(struct socket *sock)
 	static u32 ordernum = 1;
 	struct unix_address *addr;
 	int err;
+	unsigned int retries = 0;
 
 	mutex_lock(&u->readlock);
 
@@ -717,8 +718,14 @@  retry:
 	if (__unix_find_socket_byname(net, addr->name, addr->len, sock->type,
 				      addr->hash)) {
 		spin_unlock(&unix_table_lock);
+		/* Give up if all names seems to be in use. */
+		if (retries++ == 0xFFFFF) {
+			err = -ENOMEM;
+			kfree(addr);
+			goto out;
+		}
 		/* Sanity yield. It is unusual case, but yet... */
-		if (!(ordernum&0xFF))
+		if (!(retries & 0xFF))
 			yield();
 		goto retry;
 	}