Message ID | 20120118151126.01a74dc5@asterix.rh |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
Le mercredi 18 janvier 2012 à 15:11 -0200, Flavio Leitner a écrit : > Hi folks, > > It has been reported to me that bind() fails when you leave > the port up to the kernel to choose and succeed when you > request a certain port in the same conditions. > > For example, let's restrict the ephemeral port range to 3 ports only: > # echo "32768 32770" > /proc/sys/net/ipv4/ip_local_port_range > > Assuming the system has two IP addresses: 172.31.1.6/24 and > 192.168.100.6/24 then run the following python script which > allocates all ephemeral ports using one IP address and then > try to bind another one using another IP address. > ... > Conclusion: When using ephemeral ports, inet_csk_get_port() > fails without checking if a conflict had happened. When using > fixed ports on the other hand, inet_csk_get_port() works > as expected. > > I will attach a quick hack to illustrate what I am thinking. > The idea is to check all ports first and if it fails, then > try again looking for a port that doesn't conflict. So, for > most cases, the algorithm is the same, but when the system > ran out of ports, there is a hope :-) > > Is there a reason to behave like that? or is this a real bug? > Sounds like a FAQ, but I am not finding an explanation for this > on the net yet. > Hi Flavio This seems a very good idea. Only drawback is when table is really full, we'll scan it twice. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le mercredi 18 janvier 2012 à 18:36 +0100, Eric Dumazet a écrit : > Only drawback is when table is really full, we'll scan it twice. > We should at least add a local_bh_enable() and cond_resched() and jump to beginning of inet_csk_get_port() to allow pending softirqs/rescheds to be handled at half way. if (!check_conflict) { check_conflict = true; local_bh_enable(); cond_resched(); goto start; } -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 18 Jan 2012 18:36:59 +0100 Eric Dumazet <eric.dumazet@gmail.com> wrote: > Le mercredi 18 janvier 2012 à 15:11 -0200, Flavio Leitner a écrit : > > Hi folks, > > > > It has been reported to me that bind() fails when you leave > > the port up to the kernel to choose and succeed when you > > request a certain port in the same conditions. > > > > For example, let's restrict the ephemeral port range to 3 ports only: > > # echo "32768 32770" > /proc/sys/net/ipv4/ip_local_port_range > > > > Assuming the system has two IP addresses: 172.31.1.6/24 and > > 192.168.100.6/24 then run the following python script which > > allocates all ephemeral ports using one IP address and then > > try to bind another one using another IP address. > > > ... > > Conclusion: When using ephemeral ports, inet_csk_get_port() > > fails without checking if a conflict had happened. When using > > fixed ports on the other hand, inet_csk_get_port() works > > as expected. > > > > I will attach a quick hack to illustrate what I am thinking. > > The idea is to check all ports first and if it fails, then > > try again looking for a port that doesn't conflict. So, for > > most cases, the algorithm is the same, but when the system > > ran out of ports, there is a hope :-) > > > > Is there a reason to behave like that? or is this a real bug? > > Sounds like a FAQ, but I am not finding an explanation for this > > on the net yet. > > > > Hi Flavio > > This seems a very good idea. > > Only drawback is when table is really full, we'll scan it twice. > That's right. Maybe checking for conflicts in the first round helps, but it would add a cost for the average case scenario. But first, I'd like to know if this is a bug before spending more time working on it. Maybe there is a reason for behaving like that. thanks! fbl -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index 2e4e244..2911f06 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -97,7 +97,9 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum) int ret, attempts = 5; struct net *net = sock_net(sk); int smallest_size = -1, smallest_rover; + bool check_conflict; + check_conflict = false; local_bh_disable(); if (!snum) { int remaining, rover, low, high; @@ -128,6 +130,13 @@ again: goto have_snum; } } + + if (check_conflict && !inet_csk(sk)->icsk_af_ops->bind_conflict(sk, tb)) { + spin_unlock(&head->lock); + snum = rover; + goto have_snum; + } + goto next; } break; @@ -150,6 +159,11 @@ again: snum = smallest_rover; goto have_snum; } + /* try again checking if a port can be reused */ + if (!check_conflict) { + check_conflict = true; + goto again; + } goto fail; } /* OK, here is the one we will use. HEAD is