Message ID | 20141117013448.GA26743@midget.suse.cz |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
On Monday 17 November 2014 02:34:48 Jiri Bohac wrote: > This fixes an old regression introduced by commit > b0d0d915 (ipx: remove the BKL). > > When a recvmsg syscall blocks waiting for new data, no data can be sent on the > same socket with sendmsg because ipx_recvmsg() sleeps with the socket locked. > > This breaks mars-nwe (NetWare emulator): > - the ncpserv process reads the request using recvmsg > - ncpserv forks and spawns nwconn > - ncpserv calls a (blocking) recvmsg and waits for new requests > - nwconn deadlocks in sendmsg on the same socket > > Commit b0d0d915 has simply replaced BKL locking with > lock_sock/release_sock. Unlike now, BKL got unlocked while > sleeping, so a blocking recvmsg did not block a concurrent > sendmsg. > > Similarly, a potentially sleeping sendmsg() could block calls to recvmsg(). > > Only keep the socket locked while actually working with the socket data and > release it prior to calling skb_recv_datagram() / ipxitf_send(). > > > Signed-off-by: Jiri Bohac <jbohac@suse.cz> Hi Jiri, I'm very sorry about the regression my patch introduced, glad you worked it out. Your patch looks correct to me, but I suspect we can do it in a simpler way, based on what I found I did in the respective appletalk and x25 BKL removal patches. From all I can tell, those do not have the same problem, which is a relief to me. Some questions: > @@ -1745,12 +1745,16 @@ static int ipx_sendmsg(struct kiocb *iocb, struct socket *sock, > memcpy(usipx->sipx_node, ipxs->dest_addr.node, IPX_NODE_LEN); > } > > + /* releases sk */ > rc = ipxrtr_route_packet(sk, usipx, msg->msg_iov, len, > flags & MSG_DONTWAIT); > if (rc >= 0) > rc = len; > -out: > + goto out; > + > +out_release: > release_sock(sk); > +out: > return rc; > } > Does ipxrtr_route_packet() actually sleep while waiting for the network, or is it possible that you only need to change the recvmsg path? If you need to change this function, have you considered doing it one of these two ways: a) only change the ipxrtr_route_packet function to release the lock before sleeping and then reaquiring it but not change ipx_sendmsg b) figure out whether ipx_sendmsg actually relies on the lock at all, and if it doesn't then remove the locking, or limit the scope to the parts that do. > @@ -1776,20 +1780,21 @@ static int ipx_recvmsg(struct kiocb *iocb, struct socket *sock, > #ifdef CONFIG_IPX_INTERN > rc = -ENETDOWN; > if (!ipxs->intrfc) > - goto out; /* Someone zonked the iface */ > + goto out_release; /* Someone zonked the iface */ > memcpy(uaddr.sipx_node, ipxs->intrfc->if_node, IPX_NODE_LEN); > #endif /* CONFIG_IPX_INTERN */ > > rc = __ipx_bind(sock, (struct sockaddr *)&uaddr, > sizeof(struct sockaddr_ipx)); > if (rc) > - goto out; > + goto out_release; > } > > rc = -ENOTCONN; > if (sock_flag(sk, SOCK_ZAPPED)) > - goto out; > + goto out_release; > > + release_sock(sk); > skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT, > flags & MSG_DONTWAIT, &rc); > if (!skb) { Same thing here: I think your patch could be simplified if you just release the socket lock before calling skb_recv_datagram and get it back afterwards, and it would be much simpler if you could show that the lock is not needed at all. Arnd -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Arnd Bergmann <arnd@arndb.de> Date: Tue, 18 Nov 2014 14:37:26 +0100 > Does ipxrtr_route_packet() actually sleep while waiting for the network, > or is it possible that you only need to change the recvmsg path? I went over that code path and it doesn't appear to sleep at all. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Nov 18, 2014 at 02:37:26PM +0100, Arnd Bergmann wrote: > Does ipxrtr_route_packet() actually sleep while waiting for the network, > or is it possible that you only need to change the recvmsg path? You're right. In fact, it can sleep in sock_alloc_send_skb(), but my patch does not fix this - it releases the lock after that. So let's ignore that for now, I'll send a V2 modifying only ipx_recvmsg(). > > if (sock_flag(sk, SOCK_ZAPPED)) > > - goto out; > > + goto out_release; > > > > + release_sock(sk); > > skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT, > > flags & MSG_DONTWAIT, &rc); > > if (!skb) { > > Same thing here: I think your patch could be simplified if you just > release the socket lock before calling skb_recv_datagram and get > it back afterwards, It would simplify the code a little to just get the lock again. But do we really want to optimize for source code size at the cost of taking locks that are not necessarry? > and it would be much simpler if you could show that the lock is > not needed at all. At least the ipxitf_insert_socket() inside __ipx_bind() looks like it must be protected not to corrupt the intrfc->if_sklist. I am not familiar with the code, so there may be other things. Thanks for the review!
On Tuesday 18 November 2014 23:10:57 Jiri Bohac wrote: > On Tue, Nov 18, 2014 at 02:37:26PM +0100, Arnd Bergmann wrote: > > Does ipxrtr_route_packet() actually sleep while waiting for the network, > > or is it possible that you only need to change the recvmsg path? > > You're right. > In fact, it can sleep in sock_alloc_send_skb(), but my patch does > not fix this - it releases the lock after that. > So let's ignore that for now, I'll send a V2 modifying only > ipx_recvmsg(). Ok. > > > if (sock_flag(sk, SOCK_ZAPPED)) > > > - goto out; > > > + goto out_release; > > > > > > + release_sock(sk); > > > skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT, > > > flags & MSG_DONTWAIT, &rc); > > > if (!skb) { > > > > Same thing here: I think your patch could be simplified if you just > > release the socket lock before calling skb_recv_datagram and get > > it back afterwards, > > It would simplify the code a little to just get the lock again. > But do we really want to optimize for source code size at the cost of > taking locks that are not necessarry? I'm more interested in the code structure, in particular this bit @@ -1807,8 +1812,10 @@ static int ipx_recvmsg(struct kiocb *iocb, struct socket *sock, rc = skb_copy_datagram_iovec(skb, sizeof(struct ipxhdr), msg->msg_iov, copied); - if (rc) - goto out_free; + if (rc) { + skb_free_datagram(sk, skb); + goto out; + } after your change mixes coding styles: in some failure cases you just goto a cleanup part, in other cases you do the cleanup before the goto. If I'm reading the patch correctly, this change has introduced a leak because you no longer call skb_free_datagram() in the success case. Changing the locking only around the skb_recv_datagram() call would have avoided this problem or at least (if I'm reading it wrong and the patch is indeed correct) have made it easier to review what the new code flow and what the change is. > > and it would be much simpler if you could show that the lock is > > not needed at all. > > At least the ipxitf_insert_socket() inside __ipx_bind() looks > like it must be protected not to corrupt the intrfc->if_sklist. > I am not familiar with the code, so there may be other things. Ok. Better to do just a less invasive change then. Clearly this code is not getting much testing, so leaving the locking in place (aside from fixing the bug) seems the better choice. Arnd -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Nov 19, 2014 at 09:32:54AM +0100, Arnd Bergmann wrote: > I'm more interested in the code structure, in particular this bit > > @@ -1807,8 +1812,10 @@ static int ipx_recvmsg(struct kiocb *iocb, struct socket *sock, > > rc = skb_copy_datagram_iovec(skb, sizeof(struct ipxhdr), msg->msg_iov, > copied); > - if (rc) > - goto out_free; > + if (rc) { > + skb_free_datagram(sk, skb); > + goto out; > + } > > > after your change mixes coding styles: in some failure cases you just goto > a cleanup part, in other cases you do the cleanup before the goto. > > If I'm reading the patch correctly, this change has introduced a leak Very lame of me - thanks so much for spotitng this! Sending a restructured v3.
diff --git a/net/ipx/af_ipx.c b/net/ipx/af_ipx.c index 91729b8..1e0d796 100644 --- a/net/ipx/af_ipx.c +++ b/net/ipx/af_ipx.c @@ -1703,11 +1703,11 @@ static int ipx_sendmsg(struct kiocb *iocb, struct socket *sock, /* if (sk->sk_zapped) return -EIO; */ /* Socket not bound */ if (flags & ~(MSG_DONTWAIT|MSG_CMSG_COMPAT)) - goto out; + goto out_release; /* Max possible packet size limited by 16 bit pktsize in header */ if (len >= 65535 - sizeof(struct ipxhdr)) - goto out; + goto out_release; if (usipx) { if (!ipxs->port) { @@ -1718,24 +1718,24 @@ static int ipx_sendmsg(struct kiocb *iocb, struct socket *sock, #ifdef CONFIG_IPX_INTERN rc = -ENETDOWN; if (!ipxs->intrfc) - goto out; /* Someone zonked the iface */ + goto out_release; /* Someone zonked the iface */ memcpy(uaddr.sipx_node, ipxs->intrfc->if_node, IPX_NODE_LEN); #endif rc = __ipx_bind(sock, (struct sockaddr *)&uaddr, sizeof(struct sockaddr_ipx)); if (rc) - goto out; + goto out_release; } rc = -EINVAL; if (msg->msg_namelen < sizeof(*usipx) || usipx->sipx_family != AF_IPX) - goto out; + goto out_release; } else { rc = -ENOTCONN; if (sk->sk_state != TCP_ESTABLISHED) - goto out; + goto out_release; usipx = &local_sipx; usipx->sipx_family = AF_IPX; @@ -1745,12 +1745,16 @@ static int ipx_sendmsg(struct kiocb *iocb, struct socket *sock, memcpy(usipx->sipx_node, ipxs->dest_addr.node, IPX_NODE_LEN); } + /* releases sk */ rc = ipxrtr_route_packet(sk, usipx, msg->msg_iov, len, flags & MSG_DONTWAIT); if (rc >= 0) rc = len; -out: + goto out; + +out_release: release_sock(sk); +out: return rc; } @@ -1776,20 +1780,21 @@ static int ipx_recvmsg(struct kiocb *iocb, struct socket *sock, #ifdef CONFIG_IPX_INTERN rc = -ENETDOWN; if (!ipxs->intrfc) - goto out; /* Someone zonked the iface */ + goto out_release; /* Someone zonked the iface */ memcpy(uaddr.sipx_node, ipxs->intrfc->if_node, IPX_NODE_LEN); #endif /* CONFIG_IPX_INTERN */ rc = __ipx_bind(sock, (struct sockaddr *)&uaddr, sizeof(struct sockaddr_ipx)); if (rc) - goto out; + goto out_release; } rc = -ENOTCONN; if (sock_flag(sk, SOCK_ZAPPED)) - goto out; + goto out_release; + release_sock(sk); skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT, flags & MSG_DONTWAIT, &rc); if (!skb) { @@ -1807,8 +1812,10 @@ static int ipx_recvmsg(struct kiocb *iocb, struct socket *sock, rc = skb_copy_datagram_iovec(skb, sizeof(struct ipxhdr), msg->msg_iov, copied); - if (rc) - goto out_free; + if (rc) { + skb_free_datagram(sk, skb); + goto out; + } if (skb->tstamp.tv64) sk->sk_stamp = skb->tstamp; @@ -1822,11 +1829,11 @@ static int ipx_recvmsg(struct kiocb *iocb, struct socket *sock, msg->msg_namelen = sizeof(*sipx); } rc = copied; + goto out; -out_free: - skb_free_datagram(sk, skb); -out: +out_release: release_sock(sk); +out: return rc; } diff --git a/net/ipx/ipx_route.c b/net/ipx/ipx_route.c index 67e7ad3..2f082af 100644 --- a/net/ipx/ipx_route.c +++ b/net/ipx/ipx_route.c @@ -163,6 +163,7 @@ int ipxrtr_route_skb(struct sk_buff *skb) /* * Route an outgoing frame from a socket. + * Expects sk to be locked and releases it before returning. */ int ipxrtr_route_packet(struct sock *sk, struct sockaddr_ipx *usipx, struct iovec *iov, size_t len, int noblock) @@ -184,7 +185,7 @@ int ipxrtr_route_packet(struct sock *sk, struct sockaddr_ipx *usipx, rt = ipxrtr_lookup(usipx->sipx_network); rc = -ENETUNREACH; if (!rt) - goto out; + goto out_release; intrfc = rt->ir_intrfc; } @@ -242,12 +243,16 @@ int ipxrtr_route_packet(struct sock *sk, struct sockaddr_ipx *usipx, else ipx->ipx_checksum = ipx_cksum(ipx, len + sizeof(struct ipxhdr)); + release_sock(sk); rc = ipxitf_send(intrfc, skb, (rt && rt->ir_routed) ? rt->ir_router_node : ipx->ipx_dest.node); + goto out; out_put: ipxitf_put(intrfc); if (rt) ipxrtr_put(rt); +out_release: + release_sock(sk); out: return rc; }
This fixes an old regression introduced by commit b0d0d915 (ipx: remove the BKL). When a recvmsg syscall blocks waiting for new data, no data can be sent on the same socket with sendmsg because ipx_recvmsg() sleeps with the socket locked. This breaks mars-nwe (NetWare emulator): - the ncpserv process reads the request using recvmsg - ncpserv forks and spawns nwconn - ncpserv calls a (blocking) recvmsg and waits for new requests - nwconn deadlocks in sendmsg on the same socket Commit b0d0d915 has simply replaced BKL locking with lock_sock/release_sock. Unlike now, BKL got unlocked while sleeping, so a blocking recvmsg did not block a concurrent sendmsg. Similarly, a potentially sleeping sendmsg() could block calls to recvmsg(). Only keep the socket locked while actually working with the socket data and release it prior to calling skb_recv_datagram() / ipxitf_send(). Signed-off-by: Jiri Bohac <jbohac@suse.cz>