diff mbox

ipx: fix locking regression in ipx_sendmsg and ipx_recvmsg

Message ID 20141117013448.GA26743@midget.suse.cz
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Jiri Bohac Nov. 17, 2014, 1:34 a.m. UTC
This fixes an old regression introduced by commit
b0d0d915 (ipx: remove the BKL).

When a recvmsg syscall blocks waiting for new data, no data can be sent on the
same socket with sendmsg because ipx_recvmsg() sleeps with the socket locked.

This breaks mars-nwe (NetWare emulator):
- the ncpserv process reads the request using recvmsg
- ncpserv forks and spawns nwconn
- ncpserv calls a (blocking) recvmsg and waits for new requests
- nwconn deadlocks in sendmsg on the same socket 

Commit b0d0d915 has simply replaced BKL locking with
lock_sock/release_sock. Unlike now, BKL got unlocked while
sleeping, so a blocking recvmsg did not block a concurrent
sendmsg.

Similarly, a potentially sleeping sendmsg() could block calls to recvmsg().

Only keep the socket locked while actually working with the socket data and
release it prior to calling skb_recv_datagram() / ipxitf_send().


Signed-off-by: Jiri Bohac <jbohac@suse.cz>

Comments

Arnd Bergmann Nov. 18, 2014, 1:37 p.m. UTC | #1
On Monday 17 November 2014 02:34:48 Jiri Bohac wrote:
> This fixes an old regression introduced by commit
> b0d0d915 (ipx: remove the BKL).
> 
> When a recvmsg syscall blocks waiting for new data, no data can be sent on the
> same socket with sendmsg because ipx_recvmsg() sleeps with the socket locked.
> 
> This breaks mars-nwe (NetWare emulator):
> - the ncpserv process reads the request using recvmsg
> - ncpserv forks and spawns nwconn
> - ncpserv calls a (blocking) recvmsg and waits for new requests
> - nwconn deadlocks in sendmsg on the same socket 
> 
> Commit b0d0d915 has simply replaced BKL locking with
> lock_sock/release_sock. Unlike now, BKL got unlocked while
> sleeping, so a blocking recvmsg did not block a concurrent
> sendmsg.
> 
> Similarly, a potentially sleeping sendmsg() could block calls to recvmsg().
> 
> Only keep the socket locked while actually working with the socket data and
> release it prior to calling skb_recv_datagram() / ipxitf_send().
> 
> 
> Signed-off-by: Jiri Bohac <jbohac@suse.cz>

Hi Jiri,

I'm very sorry about the regression my patch introduced, glad you worked
it out. Your patch looks correct to me, but I suspect we can do it in
a simpler way, based on what I found I did in the respective appletalk
and x25 BKL removal patches. From all I can tell, those do not have
the same problem, which is a relief to me.

Some questions:

> @@ -1745,12 +1745,16 @@ static int ipx_sendmsg(struct kiocb *iocb, struct socket *sock,
>  		memcpy(usipx->sipx_node, ipxs->dest_addr.node, IPX_NODE_LEN);
>  	}
>  
> +	/* releases sk */
>  	rc = ipxrtr_route_packet(sk, usipx, msg->msg_iov, len,
>  				 flags & MSG_DONTWAIT);
>  	if (rc >= 0)
>  		rc = len;
> -out:
> +	goto out;
> +
> +out_release:
>  	release_sock(sk);
> +out:
>  	return rc;
>  }
>  

Does ipxrtr_route_packet() actually sleep while waiting for the network,
or is it possible that you only need to change the recvmsg path?

If you need to change this function, have you considered doing it
one of these two ways:

a) only change the ipxrtr_route_packet function to release the lock
   before sleeping and then reaquiring it but not change ipx_sendmsg

b) figure out whether ipx_sendmsg actually relies on the lock at all,
   and if it doesn't then remove the locking, or limit the scope to
   the parts that do.


> @@ -1776,20 +1780,21 @@ static int ipx_recvmsg(struct kiocb *iocb, struct socket *sock,
>  #ifdef CONFIG_IPX_INTERN
>  		rc = -ENETDOWN;
>  		if (!ipxs->intrfc)
> -			goto out; /* Someone zonked the iface */
> +			goto out_release; /* Someone zonked the iface */
>  		memcpy(uaddr.sipx_node, ipxs->intrfc->if_node, IPX_NODE_LEN);
>  #endif	/* CONFIG_IPX_INTERN */
>  
>  		rc = __ipx_bind(sock, (struct sockaddr *)&uaddr,
>  			      sizeof(struct sockaddr_ipx));
>  		if (rc)
> -			goto out;
> +			goto out_release;
>  	}
>  
>  	rc = -ENOTCONN;
>  	if (sock_flag(sk, SOCK_ZAPPED))
> -		goto out;
> +		goto out_release;
>  
> +	release_sock(sk);
>  	skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT,
>  				flags & MSG_DONTWAIT, &rc);
>  	if (!skb) {

Same thing here: I think your patch could be simplified if you just
release the socket lock before calling skb_recv_datagram and get
it back afterwards, and it would be much simpler if you could
show that the lock is not needed at all.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Nov. 18, 2014, 8:49 p.m. UTC | #2
From: Arnd Bergmann <arnd@arndb.de>
Date: Tue, 18 Nov 2014 14:37:26 +0100

> Does ipxrtr_route_packet() actually sleep while waiting for the network,
> or is it possible that you only need to change the recvmsg path?

I went over that code path and it doesn't appear to sleep at all.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Bohac Nov. 18, 2014, 10:10 p.m. UTC | #3
On Tue, Nov 18, 2014 at 02:37:26PM +0100, Arnd Bergmann wrote:
> Does ipxrtr_route_packet() actually sleep while waiting for the network,
> or is it possible that you only need to change the recvmsg path?

You're right. 
In fact, it can sleep in sock_alloc_send_skb(), but my patch does
not fix this - it releases the lock after that.
So let's ignore that for now, I'll send a V2 modifying only
ipx_recvmsg().

> >  	if (sock_flag(sk, SOCK_ZAPPED))
> > -		goto out;
> > +		goto out_release;
> >  
> > +	release_sock(sk);
> >  	skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT,
> >  				flags & MSG_DONTWAIT, &rc);
> >  	if (!skb) {
> 
> Same thing here: I think your patch could be simplified if you just
> release the socket lock before calling skb_recv_datagram and get
> it back afterwards, 

It would simplify the code a little to just get the lock again.
But do we really want to optimize for source code size at the cost of
taking locks that are not necessarry?

> and it would be much simpler if you could show that the lock is
> not needed at all.

At least the ipxitf_insert_socket() inside __ipx_bind() looks
like it must be protected not to corrupt the intrfc->if_sklist.
I am not familiar with the code, so there may be other things.

Thanks for the review!
Arnd Bergmann Nov. 19, 2014, 8:32 a.m. UTC | #4
On Tuesday 18 November 2014 23:10:57 Jiri Bohac wrote:
> On Tue, Nov 18, 2014 at 02:37:26PM +0100, Arnd Bergmann wrote:
> > Does ipxrtr_route_packet() actually sleep while waiting for the network,
> > or is it possible that you only need to change the recvmsg path?
> 
> You're right. 
> In fact, it can sleep in sock_alloc_send_skb(), but my patch does
> not fix this - it releases the lock after that.
> So let's ignore that for now, I'll send a V2 modifying only
> ipx_recvmsg().

Ok.

> > >  	if (sock_flag(sk, SOCK_ZAPPED))
> > > -		goto out;
> > > +		goto out_release;
> > >  
> > > +	release_sock(sk);
> > >  	skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT,
> > >  				flags & MSG_DONTWAIT, &rc);
> > >  	if (!skb) {
> > 
> > Same thing here: I think your patch could be simplified if you just
> > release the socket lock before calling skb_recv_datagram and get
> > it back afterwards, 
> 
> It would simplify the code a little to just get the lock again.
> But do we really want to optimize for source code size at the cost of
> taking locks that are not necessarry?

I'm more interested in the code structure, in particular this bit

@@ -1807,8 +1812,10 @@ static int ipx_recvmsg(struct kiocb *iocb, struct socket *sock,
 
        rc = skb_copy_datagram_iovec(skb, sizeof(struct ipxhdr), msg->msg_iov,
                                     copied);
-       if (rc)
-               goto out_free;
+       if (rc) {
+               skb_free_datagram(sk, skb);
+               goto out;
+       }


after your change mixes coding styles: in some failure cases you just goto
a cleanup part, in other cases you do the cleanup before the goto.

If I'm reading the patch correctly, this change has introduced a leak
because you no longer call skb_free_datagram() in the success case.
Changing the locking only around the skb_recv_datagram() call would
have avoided this problem or at least (if I'm reading it wrong and
the patch is indeed correct) have made it easier to review what the
new code flow and what the change is.

> > and it would be much simpler if you could show that the lock is
> > not needed at all.
> 
> At least the ipxitf_insert_socket() inside __ipx_bind() looks
> like it must be protected not to corrupt the intrfc->if_sklist.
> I am not familiar with the code, so there may be other things.

Ok. Better to do just a less invasive change then. Clearly this code
is not getting much testing, so leaving the locking in place (aside
from fixing the bug) seems the better choice.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Bohac Nov. 19, 2014, 10:34 a.m. UTC | #5
On Wed, Nov 19, 2014 at 09:32:54AM +0100, Arnd Bergmann wrote:
> I'm more interested in the code structure, in particular this bit
> 
> @@ -1807,8 +1812,10 @@ static int ipx_recvmsg(struct kiocb *iocb, struct socket *sock,
>  
>         rc = skb_copy_datagram_iovec(skb, sizeof(struct ipxhdr), msg->msg_iov,
>                                      copied);
> -       if (rc)
> -               goto out_free;
> +       if (rc) {
> +               skb_free_datagram(sk, skb);
> +               goto out;
> +       }
> 
> 
> after your change mixes coding styles: in some failure cases you just goto
> a cleanup part, in other cases you do the cleanup before the goto.
> 
> If I'm reading the patch correctly, this change has introduced a leak

Very lame of me - thanks so much for spotitng this!
Sending a restructured v3.
diff mbox

Patch

diff --git a/net/ipx/af_ipx.c b/net/ipx/af_ipx.c
index 91729b8..1e0d796 100644
--- a/net/ipx/af_ipx.c
+++ b/net/ipx/af_ipx.c
@@ -1703,11 +1703,11 @@  static int ipx_sendmsg(struct kiocb *iocb, struct socket *sock,
 /*	if (sk->sk_zapped)
 		return -EIO; */	/* Socket not bound */
 	if (flags & ~(MSG_DONTWAIT|MSG_CMSG_COMPAT))
-		goto out;
+		goto out_release;
 
 	/* Max possible packet size limited by 16 bit pktsize in header */
 	if (len >= 65535 - sizeof(struct ipxhdr))
-		goto out;
+		goto out_release;
 
 	if (usipx) {
 		if (!ipxs->port) {
@@ -1718,24 +1718,24 @@  static int ipx_sendmsg(struct kiocb *iocb, struct socket *sock,
 #ifdef CONFIG_IPX_INTERN
 			rc = -ENETDOWN;
 			if (!ipxs->intrfc)
-				goto out; /* Someone zonked the iface */
+				goto out_release; /* Someone zonked the iface */
 			memcpy(uaddr.sipx_node, ipxs->intrfc->if_node,
 				IPX_NODE_LEN);
 #endif
 			rc = __ipx_bind(sock, (struct sockaddr *)&uaddr,
 					sizeof(struct sockaddr_ipx));
 			if (rc)
-				goto out;
+				goto out_release;
 		}
 
 		rc = -EINVAL;
 		if (msg->msg_namelen < sizeof(*usipx) ||
 		    usipx->sipx_family != AF_IPX)
-			goto out;
+			goto out_release;
 	} else {
 		rc = -ENOTCONN;
 		if (sk->sk_state != TCP_ESTABLISHED)
-			goto out;
+			goto out_release;
 
 		usipx = &local_sipx;
 		usipx->sipx_family 	= AF_IPX;
@@ -1745,12 +1745,16 @@  static int ipx_sendmsg(struct kiocb *iocb, struct socket *sock,
 		memcpy(usipx->sipx_node, ipxs->dest_addr.node, IPX_NODE_LEN);
 	}
 
+	/* releases sk */
 	rc = ipxrtr_route_packet(sk, usipx, msg->msg_iov, len,
 				 flags & MSG_DONTWAIT);
 	if (rc >= 0)
 		rc = len;
-out:
+	goto out;
+
+out_release:
 	release_sock(sk);
+out:
 	return rc;
 }
 
@@ -1776,20 +1780,21 @@  static int ipx_recvmsg(struct kiocb *iocb, struct socket *sock,
 #ifdef CONFIG_IPX_INTERN
 		rc = -ENETDOWN;
 		if (!ipxs->intrfc)
-			goto out; /* Someone zonked the iface */
+			goto out_release; /* Someone zonked the iface */
 		memcpy(uaddr.sipx_node, ipxs->intrfc->if_node, IPX_NODE_LEN);
 #endif	/* CONFIG_IPX_INTERN */
 
 		rc = __ipx_bind(sock, (struct sockaddr *)&uaddr,
 			      sizeof(struct sockaddr_ipx));
 		if (rc)
-			goto out;
+			goto out_release;
 	}
 
 	rc = -ENOTCONN;
 	if (sock_flag(sk, SOCK_ZAPPED))
-		goto out;
+		goto out_release;
 
+	release_sock(sk);
 	skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT,
 				flags & MSG_DONTWAIT, &rc);
 	if (!skb) {
@@ -1807,8 +1812,10 @@  static int ipx_recvmsg(struct kiocb *iocb, struct socket *sock,
 
 	rc = skb_copy_datagram_iovec(skb, sizeof(struct ipxhdr), msg->msg_iov,
 				     copied);
-	if (rc)
-		goto out_free;
+	if (rc) {
+		skb_free_datagram(sk, skb);
+		goto out;
+	}
 	if (skb->tstamp.tv64)
 		sk->sk_stamp = skb->tstamp;
 
@@ -1822,11 +1829,11 @@  static int ipx_recvmsg(struct kiocb *iocb, struct socket *sock,
 		msg->msg_namelen	= sizeof(*sipx);
 	}
 	rc = copied;
+	goto out;
 
-out_free:
-	skb_free_datagram(sk, skb);
-out:
+out_release:
 	release_sock(sk);
+out:
 	return rc;
 }
 
diff --git a/net/ipx/ipx_route.c b/net/ipx/ipx_route.c
index 67e7ad3..2f082af 100644
--- a/net/ipx/ipx_route.c
+++ b/net/ipx/ipx_route.c
@@ -163,6 +163,7 @@  int ipxrtr_route_skb(struct sk_buff *skb)
 
 /*
  * Route an outgoing frame from a socket.
+ * Expects sk to be locked and releases it before returning.
  */
 int ipxrtr_route_packet(struct sock *sk, struct sockaddr_ipx *usipx,
 			struct iovec *iov, size_t len, int noblock)
@@ -184,7 +185,7 @@  int ipxrtr_route_packet(struct sock *sk, struct sockaddr_ipx *usipx,
 		rt = ipxrtr_lookup(usipx->sipx_network);
 		rc = -ENETUNREACH;
 		if (!rt)
-			goto out;
+			goto out_release;
 		intrfc = rt->ir_intrfc;
 	}
 
@@ -242,12 +243,16 @@  int ipxrtr_route_packet(struct sock *sk, struct sockaddr_ipx *usipx,
 	else
 		ipx->ipx_checksum = ipx_cksum(ipx, len + sizeof(struct ipxhdr));
 
+	release_sock(sk);
 	rc = ipxitf_send(intrfc, skb, (rt && rt->ir_routed) ?
 			 rt->ir_router_node : ipx->ipx_dest.node);
+	goto out;
 out_put:
 	ipxitf_put(intrfc);
 	if (rt)
 		ipxrtr_put(rt);
+out_release:
+	release_sock(sk);
 out:
 	return rc;
 }