diff mbox

tcp: perform DMA to userspace only if there is a task waiting for it

Message ID alpine.LNX.2.00.1207271556320.11375@pobox.suse.cz
State Superseded, archived
Delegated to: David Miller
Headers show

Commit Message

Jiri Kosina July 27, 2012, 2:05 p.m. UTC
Back in 2006, commit 1a2449a87b ("[I/OAT]: TCP recv offload to I/OAT") 
added support for receive offloading to IOAT dma engine if available.

The code in tcp_rcv_established() tries to perform early DMA copy if 
applicable. It however does so without checking whether the userspace task 
is actually expecting the data in the buffer.

This is not a problem under normal circumstances, but there is a corner 
case where this doesn't work -- and that's when MSG_TRUNC flag to 
recvmsg() is used.

If the IOAT dma engine is not used, the code properly checks whether there 
is a valid ucopy.task and the socket is owned by userspace, but misses the 
check in the dmaengine case.

This problem can be observed in real trivially -- for example 'tbench' is 
a good reproducer, as it makes a heavy use of MSG_TRUNC. On systems 
utilizing IOAT, you will soon find tbench waiting indefinitely in 
sk_wait_data(), as the data have already been early-copied in 
tcp_rcv_established() using dma engine.

This patch introduces the same check we are performing in the simple iovec 
copy case to the IOAT case as well. It fixes the indefinite 
recvmsg(MSG_TRUNC) hangs.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
---
 net/ipv4/tcp_input.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

Comments

David Miller July 27, 2012, 8:31 p.m. UTC | #1
From: Jiri Kosina <jkosina@suse.cz>
Date: Fri, 27 Jul 2012 16:05:06 +0200 (CEST)

>  #ifdef CONFIG_NET_DMA
> -				if (tcp_dma_try_early_copy(sk, skb, tcp_header_len)) {
> +				if (tp->ucopy.task == current &&
> +						sock_owned_by_user(sk) &&
> +						tcp_dma_try_early_copy(sk,
> +							skb, tcp_header_len)) {

This indentation is absolutely terrible.

If you are only able to indent lines using TAB characters, rather than
using an appropriate mixture of TAB and SPACE characters to get the
lines to line up properly, please do not even bother submitting
patches here.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 3e07a64..f8059f9 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5475,7 +5475,10 @@  int tcp_rcv_established(struct sock *sk, struct sk_buff *skb,
 			if (tp->copied_seq == tp->rcv_nxt &&
 			    len - tcp_header_len <= tp->ucopy.len) {
 #ifdef CONFIG_NET_DMA
-				if (tcp_dma_try_early_copy(sk, skb, tcp_header_len)) {
+				if (tp->ucopy.task == current &&
+						sock_owned_by_user(sk) &&
+						tcp_dma_try_early_copy(sk,
+							skb, tcp_header_len)) {
 					copied_early = 1;
 					eaten = 1;
 				}