diff mbox series

[tipc-discussion,net,v2,1/1] tipc: fix race condition causing hung sendto

Message ID 20190221033121.4220-1-tung.q.nguyen@dektech.com.au
State Changes Requested
Delegated to: David Miller
Headers show
Series [tipc-discussion,net,v2,1/1] tipc: fix race condition causing hung sendto | expand

Commit Message

Tung Quang Nguyen Feb. 21, 2019, 3:31 a.m. UTC
When sending multicast messages via blocking socket,
if sending link is congested (tsk->cong_link_cnt is set to 1),
the sending thread will be put into sleeping state. However,
tipc_sk_filter_rcv() is called under socket spin lock but
tipc_wait_for_cond() is not. So, there is no guarantee that
the setting of tsk->cong_link_cnt to 0 in tipc_sk_proto_rcv() in
CPU-1 will be perceived by CPU-0. If that is the case, the sending
thread in CPU-0 after being waken up, will continue to see
tsk->cong_link_cnt as 1 and put the sending thread into sleeping
state again. The sending thread will sleep forever.

CPU-0                                | CPU-1
tipc_wait_for_cond()                 |
{                                    |
 // condition_ = !tsk->cong_link_cnt |
 while ((rc_ = !(condition_))) {     |
  ...                                |
  release_sock(sk_);                 |
  wait_woken();                      |
                                     | if (!sock_owned_by_user(sk))
                                     |  tipc_sk_filter_rcv()
                                     |  {
                                     |   ...
                                     |   tipc_sk_proto_rcv()
                                     |   {
                                     |    ...
                                     |    tsk->cong_link_cnt--;
                                     |    ...
                                     |    sk->sk_write_space(sk);
                                     |    ...
                                     |   }
                                     |   ...
                                     |  }
  sched_annotate_sleep();            |
  lock_sock(sk_);                    |
  remove_wait_queue();               |
 }                                   |
}                                    |

This commit fixes it by adding memory barrier to tipc_sk_proto_rcv()
and tipc_wait_for_cond().

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
---
 net/tipc/socket.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Comments

David Miller Feb. 21, 2019, 3:51 a.m. UTC | #1
So if the previous version didn't even compile, I have to ask.

How did you test this?
Tung Quang Nguyen Feb. 21, 2019, 4:04 a.m. UTC | #2
Hi David,

I compiled previous version and tested it. But I forgot to observe kernel
warning.

For the way to reproduce the issue: calling sendto() and printf() right
after that to print out a log to see if sendto() was successful. On some
occasions, the log was never printed and stack trace showed sendto() was not
returned.
By applying the patch, the issue disappeared.

Thanks.

Best regards,
Tung Nguyen

-----Original Message-----
From: David Miller <davem@davemloft.net> 
Sent: Thursday, February 21, 2019 10:51 AM
To: tung.q.nguyen@dektech.com.au
Cc: netdev@vger.kernel.org; tipc-discussion@lists.sourceforge.net
Subject: Re: [tipc-discussion][net v2 1/1] tipc: fix race condition causing
hung sendto


So if the previous version didn't even compile, I have to ask.

How did you test this?
David Miller Feb. 22, 2019, 11:24 p.m. UTC | #3
From: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Date: Thu, 21 Feb 2019 10:31:21 +0700

> When sending multicast messages via blocking socket,

This patch does not apply cleanly to net, please respin:

[davem@localhost net]$ git am --signoff tipc-discussion-net-v2-1-1-tipc-fix-race-condition-causing-hung-sendto.patch 
Applying: tipc: fix race condition causing hung sendto
error: patch failed: net/tipc/socket.c:379
error: net/tipc/socket.c: patch does not apply
Patch failed at 0001 tipc: fix race condition causing hung sendto
hint: Use 'git am --show-current-patch' to see the failed patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
diff mbox series

Patch

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 1217c90a363b..7cfa0c8ccc2a 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -379,16 +379,18 @@  static int tipc_sk_sock_err(struct socket *sock, long *timeout)
 
 #define tipc_wait_for_cond(sock_, timeo_, condition_)			       \
 ({                                                                             \
+	DEFINE_WAIT_FUNC(wait_, woken_wake_function);                          \
 	struct sock *sk_;						       \
 	int rc_;							       \
 									       \
 	while ((rc_ = !(condition_))) {					       \
-		DEFINE_WAIT_FUNC(wait_, woken_wake_function);	               \
+		/* coupled with smp_wmb() in tipc_sk_proto_rcv() */            \
+		smp_rmb();                                                     \
 		sk_ = (sock_)->sk;					       \
 		rc_ = tipc_sk_sock_err((sock_), timeo_);		       \
 		if (rc_)						       \
 			break;						       \
-		prepare_to_wait(sk_sleep(sk_), &wait_, TASK_INTERRUPTIBLE);    \
+		add_wait_queue(sk_sleep(sk_), &wait_);                         \
 		release_sock(sk_);					       \
 		*(timeo_) = wait_woken(&wait_, TASK_INTERRUPTIBLE, *(timeo_)); \
 		sched_annotate_sleep();				               \
@@ -1982,6 +1984,8 @@  static void tipc_sk_proto_rcv(struct sock *sk,
 		return;
 	case SOCK_WAKEUP:
 		tipc_dest_del(&tsk->cong_links, msg_orignode(hdr), 0);
+		/* coupled with smp_rmb() in tipc_wait_for_cond() */
+		smp_wmb();
 		tsk->cong_link_cnt--;
 		wakeup = true;
 		break;