From patchwork Wed Jan 18 17:11:26 2012
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Flavio Leitner
X-Patchwork-Id: 136661
X-Patchwork-Delegate: davem@davemloft.net
Date: Wed, 18 Jan 2012 15:11:26 -0200
From: Flavio Leitner
To: netdev
Cc: Marcelo Leitner
Subject: bind()/inet_csk_get_port() fails when no port is requested
Message-ID: <20120118151126.01a74dc5@asterix.rh>
X-Mailing-List: netdev@vger.kernel.org

Hi folks,

It has been reported to me that bind() fails when the port is left up to
the kernel to choose, yet succeeds when a specific port is requested under
the same conditions.

For example, let's restrict the ephemeral port range to only 3 ports:

# echo "32768 32770" > /proc/sys/net/ipv4/ip_local_port_range

Assuming the system has two IP addresses, 172.31.1.6/24 and
192.168.100.6/24, run the following Python script, which allocates all
ephemeral ports on one IP address and then tries to bind one more socket
on the other IP address.

#!/usr/bin/python

import socket

ip1 = []
s = None
for i in [ 1, 2, 3, 4, 5, 6 ]:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP)
    try:
        s.bind(('172.31.1.6', 0))
        ip1.append(s)
    except socket.error, err:
        # socket.error: (98, 'Address already in use')
        if err.args[0] == 98:
            break
        else:
            raise

print '%d sockets bound at 172.31.1.6' % len(ip1)
print 'Now binding at 192.168.100.6'

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP)
s.bind(('192.168.100.6', 0))

This is the result:

# ./ephemeral.py
3 sockets bound at 172.31.1.6
Now binding at 192.168.100.6
Traceback (most recent call last):
  File "./ephemeral.py", line 23, in <module>
    s.bind(('192.168.100.6', 0))
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
socket.error: [Errno 98] Address already in use

The last bind() fails even though it uses a different IP address.

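As a side note on the arithmetic: the two values written to
ip_local_port_range are inclusive bounds, which is why "32768 32770"
leaves exactly 3 ports. A minimal sketch of that calculation follows; the
helper names parse_port_range() and ephemeral_port_count() are made up
for illustration and are not part of any kernel or Python API.

```python
def parse_port_range(text):
    """Parse the two whitespace-separated bounds found in ip_local_port_range."""
    low, high = (int(field) for field in text.split())
    if low > high:
        raise ValueError("low bound %d is above high bound %d" % (low, high))
    return low, high


def ephemeral_port_count(text):
    """Number of ports in the range; both bounds are inclusive."""
    low, high = parse_port_range(text)
    return high - low + 1


if __name__ == "__main__":
    # The same string that the echo command above writes to the sysctl file.
    print(ephemeral_port_count("32768 32770"))
```

On a live system the input string could be read from
/proc/sys/net/ipv4/ip_local_port_range instead of being hard-coded.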
Now, if we change the reproducer to use fixed port numbers instead:

#!/usr/bin/python

import socket

ip1 = []
s = None
first_port=32768
port=first_port
for i in [ 1, 2, 3, 4, 5, 6 ]:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP)
    try:
        s.bind(('172.31.1.6', port))
        ip1.append(s)
    except socket.error, err:
        # socket.error: (98, 'Address already in use')
        if err.args[0] == 98:
            break
        else:
            raise
    port = port + 1

print '%d sockets bound at 172.31.1.6' % len(ip1)
print 'Now binding at 192.168.100.6'

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP)
s.bind(('192.168.100.6', first_port))

This is the result:

# ./fixedports.py
6 sockets bound at 172.31.1.6
Now binding at 192.168.100.6    <-- works out!

Conclusion:
When using ephemeral ports, inet_csk_get_port() fails without checking
whether there would actually be a conflict. When using fixed ports, on
the other hand, inet_csk_get_port() works as expected.

I will attach a quick hack to illustrate what I am thinking. The idea is
to scan all ports first as is done today and, if that fails, to scan
again looking for a port that is in use but doesn't conflict. So, for
most cases, the algorithm is the same, but when the system has run out of
ports, there is still hope :-)

Is there a reason for it to behave like that, or is this a real bug?
It sounds like a FAQ, but I have not found an explanation for this on
the net yet.

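To make the two-pass idea concrete, here is a rough userspace model of
it. This is only a sketch under simplifying assumptions, not the kernel
code: the bind table is a plain dict, conflicts() is a hypothetical
stand-in for the bind_conflict() callback that only compares addresses,
and SO_REUSEADDR, socket state and the random starting port are all
ignored.

```python
from typing import Dict, Optional, Set

# Hypothetical model of the bind hash: port -> local IPs already bound to it.
BindTable = Dict[int, Set[str]]

ANY = "0.0.0.0"


def conflicts(table: BindTable, port: int, ip: str) -> bool:
    """Simplified stand-in for bind_conflict(): same port and overlapping IP."""
    owners = table.get(port, set())
    return any(owner == ip or owner == ANY or ip == ANY for owner in owners)


def pick_port(table: BindTable, ip: str, low: int, high: int) -> Optional[int]:
    """Pick a port in [low, high] for ip, or None if every port conflicts."""
    # Pass 1: models the current behaviour, only a completely unused port is taken.
    for port in range(low, high + 1):
        if not table.get(port):
            return port
    # Pass 2: models the proposed retry, a used port is fine if it does not conflict.
    for port in range(low, high + 1):
        if not conflicts(table, port, ip):
            return port
    return None


if __name__ == "__main__":
    # All three ephemeral ports are already bound on the first address.
    table = {port: {"172.31.1.6"} for port in range(32768, 32771)}
    print(pick_port(table, "192.168.100.6", 32768, 32770))  # second pass finds a port
    print(pick_port(table, "172.31.1.6", 32768, 32770))     # genuinely exhausted
```

With only the first pass, both calls in the example would fail, which
corresponds to the behaviour the reproducer shows.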
*hack*

thanks,
fbl

---
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 2e4e244..2911f06 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -97,7 +97,9 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 	int ret, attempts = 5;
 	struct net *net = sock_net(sk);
 	int smallest_size = -1, smallest_rover;
+	bool check_conflict;
 
+	check_conflict = false;
 	local_bh_disable();
 	if (!snum) {
 		int remaining, rover, low, high;
@@ -128,6 +130,13 @@ again:
 							goto have_snum;
 						}
 					}
+
+					if (check_conflict && !inet_csk(sk)->icsk_af_ops->bind_conflict(sk, tb)) {
+						spin_unlock(&head->lock);
+						snum = rover;
+						goto have_snum;
+					}
+
 					goto next;
 				}
 			break;
@@ -150,6 +159,11 @@ again:
 				snum = smallest_rover;
 				goto have_snum;
 			}
+			/* try again checking if a port can be reused */
+			if (!check_conflict) {
+				check_conflict = true;
+				goto again;
+			}
 			goto fail;
 		}
 		/* OK, here is the one we will use.  HEAD is