diff mbox

[v2] ipv4: synchronize bind() with RTM_NEWADDR notifications

Message ID 1287666383-17615-1-git-send-email-timo.teras@iki.fi
State Rejected, archived
Delegated to: David Miller
Headers show

Commit Message

Timo Teras Oct. 21, 2010, 1:06 p.m. UTC
Otherwise we have race condition to user land:
 1. process A: changes IP address
 2. process A: kernel sends RTM_NEWADDR (and schedules out)
 3. process B: gets notification
 4. process B: tries to bind() to new IP, but fails with EADDRNOTAVAIL
      because FIB is not yet updated and inet_addr_type() in inet_bind()
      does not recognize the IP as local
 5. process A: calls inetaddr_chain notifiers which updates FIB

Fix the error path to synchronize with configuration changes and retry
the address type check.

IPv6 side seems to handle the notifications properly: bind() immediately
after RTM_NEWADDR succeeds as expected.  This is because ipv6_chk_addr()
uses inet6_addr_lst which is updated before address notification.

Signed-off-by: Timo Teräs <timo.teras@iki.fi>
---
Since there was no reply to my question if this is ok, I interpreted it
as "maybe". So here's the code for review. Hopefully this helps determining
if this is an acceptable fix.

 net/ipv4/af_inet.c  |   18 ++++++++++++++++--
 net/ipv6/af_inet6.c |   13 ++++++++++---
 2 files changed, 26 insertions(+), 5 deletions(-)

Comments

Eric Dumazet Oct. 21, 2010, 2:10 p.m. UTC | #1
Le jeudi 21 octobre 2010 à 16:06 +0300, Timo Teräs a écrit :
> Otherwise we have race condition to user land:
>  1. process A: changes IP address
>  2. process A: kernel sends RTM_NEWADDR (and schedules out)
>  3. process B: gets notification
>  4. process B: tries to bind() to new IP, but fails with EADDRNOTAVAIL
>       because FIB is not yet updated and inet_addr_type() in inet_bind()
>       does not recognize the IP as local
>  5. process A: calls inetaddr_chain notifiers which updates FIB
> 
> Fix the error path to synchronize with configuration changes and retry
> the address type check.
> 
> IPv6 side seems to handle the notifications properly: bind() immediately
> after RTM_NEWADDR succeeds as expected.  This is because ipv6_chk_addr()
> uses inet6_addr_lst which is updated before address notification.
> 
> Signed-off-by: Timo Teräs <timo.teras@iki.fi>
> ---
> Since there was no reply to my question if this is ok, I interpreted it
> as "maybe". So here's the code for review. Hopefully this helps determining
> if this is an acceptable fix.
> 

Just say : no

Really Timo, this problem must get another fix.

I understand you need an urgent fix, you can use your patch in the
meantime, of course ;)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Timo Teras Oct. 21, 2010, 7:01 p.m. UTC | #2
On 10/21/2010 05:10 PM, Eric Dumazet wrote:
> Le jeudi 21 octobre 2010 à 16:06 +0300, Timo Teräs a écrit :
>> Otherwise we have race condition to user land:
>>  1. process A: changes IP address
>>  2. process A: kernel sends RTM_NEWADDR (and schedules out)
>>  3. process B: gets notification
>>  4. process B: tries to bind() to new IP, but fails with EADDRNOTAVAIL
>>       because FIB is not yet updated and inet_addr_type() in inet_bind()
>>       does not recognize the IP as local
>>  5. process A: calls inetaddr_chain notifiers which updates FIB
>>
>> Fix the error path to synchronize with configuration changes and retry
>> the address type check.
>>
>> IPv6 side seems to handle the notifications properly: bind() immediately
>> after RTM_NEWADDR succeeds as expected.  This is because ipv6_chk_addr()
>> uses inet6_addr_lst which is updated before address notification.
>>
>> Signed-off-by: Timo Teräs <timo.teras@iki.fi>
>> ---
>> Since there was no reply to my question if this is ok, I interpreted it
>> as "maybe". So here's the code for review. Hopefully this helps determining
>> if this is an acceptable fix.
> 
> Just say : no
> 
> Really Timo, this problem must get another fix.
> 
> I understand you need an urgent fix, you can use your patch in the
> meantime, of course ;)

Fine. I figured you might feel this way. My final idea to fix this, is
to modify inet_addr_type() do the address type check using the ifa
lists. This probably involves making ifa lists rcu (did not take close
look on how ifa lists work in current ipv4 code). This is basically how
ipv6 side works. I'm not too sure how to go on with this, so I'll wait
up until someone more qualified gets the time to look at this.

I think for my immediate needs I might be able to get away with using
the kludge patch, enabling non-local binding or fixing my userland code
to listen RTM_NEWROUTE/RTN_LOCAL events (though, IPv6 link-local
addresses are special and would need RTM_NEWADDR handling still to get
the real device's ifindex).

Btw. why do IPv6 RTN_LOCAL routes have loopback interface as dst instead
of the real device? IPv4 RTN_LOCAL routes seem to have the real device
so this looks inconsistent at first sight. I guess IPv6 requires this
for a bunch of other things. This way it's just not really possible to
get link-local IPv6 routes.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 6a1100c..013ab11 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -481,8 +481,22 @@  int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 	    addr->sin_addr.s_addr != htonl(INADDR_ANY) &&
 	    chk_addr_ret != RTN_LOCAL &&
 	    chk_addr_ret != RTN_MULTICAST &&
-	    chk_addr_ret != RTN_BROADCAST)
-		goto out;
+	    chk_addr_ret != RTN_BROADCAST) {
+		/* inet_addr_type() does a FIB lookup to check the
+		 * address type. Since FIB is updated after sending
+		 * RTM_NEWADDR notification, an application may end up
+		 * doing bind() before the FIB is updated. To avoid
+		 * returning a false negative, wait for possible ongoing
+		 * address changes to finish by acquiring rtnl lock and
+		 * retry the address type lookup. */
+		rtnl_lock();
+		rtnl_unlock();
+		chk_addr_ret = inet_addr_type(sock_net(sk), addr->sin_addr.s_addr);
+		if (chk_addr_ret != RTN_LOCAL &&
+		    chk_addr_ret != RTN_MULTICAST &&
+		    chk_addr_ret != RTN_BROADCAST)
+			goto out;
+	}
 
 	snum = ntohs(addr->sin_port);
 	err = -EACCES;
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 56b9bf2..b1a83e1 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -300,7 +300,7 @@  int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 			goto out;
 		}
 
-		/* Reproduce AF_INET checks to make the bindings consitant */
+		/* Reproduce AF_INET checks to make the bindings consistent */
 		v4addr = addr->sin6_addr.s6_addr32[3];
 		chk_addr_ret = inet_addr_type(net, v4addr);
 		if (!sysctl_ip_nonlocal_bind &&
@@ -309,8 +309,15 @@  int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 		    chk_addr_ret != RTN_LOCAL &&
 		    chk_addr_ret != RTN_MULTICAST &&
 		    chk_addr_ret != RTN_BROADCAST) {
-			err = -EADDRNOTAVAIL;
-			goto out;
+			rtnl_lock();
+			rtnl_unlock();
+			chk_addr_ret = inet_addr_type(net, v4addr);
+			if (chk_addr_ret != RTN_LOCAL &&
+			    chk_addr_ret != RTN_MULTICAST &&
+			    chk_addr_ret != RTN_BROADCAST) {
+				err = -EADDRNOTAVAIL;
+				goto out;
+			}
 		}
 	} else {
 		if (addr_type != IPV6_ADDR_ANY) {