From patchwork Fri Aug 5 04:29:29 2011 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tetsuo Handa X-Patchwork-Id: 108606 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 3C5C3B6F69 for ; Fri, 5 Aug 2011 14:29:38 +1000 (EST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1750990Ab1HEE3d (ORCPT ); Fri, 5 Aug 2011 00:29:33 -0400 Received: from www262.sakura.ne.jp ([202.181.97.72]:64291 "EHLO www262.sakura.ne.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750766Ab1HEE3b (ORCPT ); Fri, 5 Aug 2011 00:29:31 -0400 Received: from www262.sakura.ne.jp (ksav01.sakura.ne.jp [210.224.165.38]) by www262.sakura.ne.jp (8.14.3/8.14.3) with ESMTP id p754TTLB030944; Fri, 5 Aug 2011 13:29:29 +0900 (JST) (envelope-from penguin-kernel@i-love.sakura.ne.jp) X-Nat-Received: from [202.181.97.72]:50437 [ident-empty] by smtp-proxy.isp with TPROXY id 1312518569.21076 Received: from www262.sakura.ne.jp (localhost [127.0.0.1]) by www262.sakura.ne.jp (8.14.3/8.14.3) with ESMTP id p754TTBY030940; Fri, 5 Aug 2011 13:29:29 +0900 (JST) (envelope-from penguin-kernel@i-love.sakura.ne.jp) Received: (from i-love@localhost) by www262.sakura.ne.jp (8.14.3/8.14.3/Submit) id p754TTBa030939; Fri, 5 Aug 2011 13:29:29 +0900 (JST) (envelope-from penguin-kernel@i-love.sakura.ne.jp) Message-Id: <201108050429.p754TTBa030939@www262.sakura.ne.jp> X-Authentication-Warning: www262.sakura.ne.jp: i-love set sender to penguin-kernel@i-love.sakura.ne.jp using -f Subject: Re: [PATCH 2/3] net: Cap number of elements for sendmmsg From: Tetsuo Handa To: anton@samba.org, acme@redhat.com Cc: netdev@vger.kernel.org, linux-security-module@vger.kernel.org, davem@davemloft.net, eparis@parisplace.org, casey@schaufler-ca.com, mjt@tls.msk.ru MIME-Version: 1.0 Date: Fri, 05 Aug 2011 13:29:29 +0900 References: <20110805000737.743684961@samba.org> <20110805000822.327910762@samba.org> In-Reply-To: <20110805000822.327910762@samba.org> X-Anti-Virus: Kaspersky Anti-Virus for Linux Mail Server 5.6.44/RELEASE, bases: 05082011 #5813173, status: clean Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Anton Blanchard wrote: > To limit the amount of time we can spend in sendmmsg, cap the > number of elements to UIO_MAXIOV (currently 1024). > > For error handling an application using sendmmsg needs to retry at > the first unsent message, so capping is simpler and requires less > application logic than returning EINVAL. > > Signed-off-by: Anton Blanchard > Cc: stable [3.0+] I think 1024 is a reasonable value. But I also worry that programmers may wish to send more. Apart from the upper limit for vlen argument of sendmmsg()/recvmmsg(), we need to deal with stall problem (described below). It may be possible to abuse sendmmsg() which was introduced in Linux 3.0 and recvmmsg() which was introduced in Linux 2.6.33 for triggering CPU stall warning. I ran below program ---------- Test program start ---------- #include #include #include #include #include #include #include #include #include #include #ifndef __NR_sendmmsg #if defined( __PPC__) #define __NR_sendmmsg 349 #elif defined(__x86_64__) #define __NR_sendmmsg 307 #elif defined(__i386__) #define __NR_sendmmsg 345 #else #error __NR_sendmmsg not defined #endif #endif struct mmsghdr { struct msghdr msg_hdr; unsigned int msg_len; }; static inline int sendmmsg(int fd, struct mmsghdr *mmsg, unsigned vlen, unsigned flags) { return syscall(__NR_sendmmsg, fd, mmsg, vlen, flags, NULL); } #define NUMBATCH (1048576 * 64) int main(int argc, char *argv[]) { const int fd = socket(AF_INET, SOCK_DGRAM, 0); struct mmsghdr *datagrams; unsigned int i; struct iovec iovec = { }; struct sockaddr_in addr = { }; addr.sin_family = AF_INET; addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK); addr.sin_port = htons(10000); datagrams = calloc(sizeof(*datagrams), NUMBATCH); for (i = 0; i < NUMBATCH; ++i) { datagrams[i].msg_hdr.msg_iov = &iovec; datagrams[i].msg_hdr.msg_iovlen = 1; datagrams[i].msg_hdr.msg_name = &addr; datagrams[i].msg_hdr.msg_namelen = sizeof(addr); } printf("Calling sendmmsg()\n"); printf("%d\n", sendmmsg(fd, datagrams, NUMBATCH, 0)); printf("Done\n"); return 0; } ---------- Test program end ---------- and got below output. # time ./a.out Calling sendmmsg() INFO: rcu_sched_state detected stall on CPU 0 (t=15000 jiffies) 67108864 Done real 2m48.736s user 0m0.128s sys 0m16.489s If this application created threads that matches number of CPUs available, and entered into sendmmsg(), the machine will hang for many seconds. Also, signals are ignored when this application is in sendmmsg(). That is, if this application is holding much RAM (like above program) and is selected by OOM-killer, this application cannot be killed until returns from sendmmsg(). Applying below patch solves the stall message and signal delaying problem. ---------------------------------------- [PATCH] net: Fix sendmmsg() stall problem. If the caller passed a huge value (e.g. 64M) to vlen argument of sendmmsg(), the caller triggers "INFO: rcu_sched_state detected stall on CPU X" message because there is no chance to call scheduler. Thus give a chance to call scheduler and also check for pending signal. Also, if the caller passed a value where IS_ERR() returns true (e.g. UINT_MAX), the caller will get EOF and errno will be set to (e.g.) EPERM when all datagrams are successfully sent. Thus, limit the max value of vlen to INT_MAX. Signed-off-by: Tetsuo Handa Cc: stable [3.0+] --- net/socket.c | 5 +++++ 1 file changed, 5 insertions(+) ---------------------------------------- -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html --- linux-3.0.orig/net/socket.c +++ linux-3.0/net/socket.c @@ -1999,6 +1999,8 @@ int __sys_sendmmsg(int fd, struct mmsghd struct compat_mmsghdr __user *compat_entry; struct msghdr msg_sys; + if ((int) vlen < 0) + return -EINVAL; datagrams = 0; sock = sockfd_lookup_light(fd, &err, &fput_needed); @@ -2035,6 +2037,9 @@ int __sys_sendmmsg(int fd, struct mmsghd if (err) break; ++datagrams; + cond_resched(); + if (signal_pending(current)) + break; } out_put: ---------------------------------------- The situation is similar regaring recvmmsg(). Although recvmmsg() less likely stalls than sendmmsg() does, it could happen if a huge number of datagrams are in the socket's receive queue and the caller attempted to fetch all of them at once. Thus, we may want below patch as well. ---------------------------------------- [PATCH] net: Fix recvmmsg() stall problem. If the caller passed a huge value to vlen argument of recvmmsg() and there are enough datagrams in the socket's receive queue, trying to pick up all at once may trigger "INFO: rcu_sched_state detected stall on CPU X" message because there is no chance to call scheduler. Thus give a chance to call scheduler and also check for pending signal. Also, if the caller passed a value where IS_ERR() returns true (e.g. UINT_MAX), the caller will get EOF and errno will be set to (e.g.) EPERM when all datagrams are successfully received. Thus, limit the max value of vlen to INT_MAX. Signed-off-by: Tetsuo Handa Cc: stable [2.6.33+] --- net/socket.c | 5 +++++ 1 file changed, 5 insertions(+) --- linux-3.0.orig/net/socket.c +++ linux-3.0/net/socket.c @@ -2204,6 +2204,8 @@ int __sys_recvmmsg(int fd, struct mmsghd struct msghdr msg_sys; struct timespec end_time; + if ((int) vlen < 0) + return -EINVAL; if (timeout && poll_select_set_timeout(&end_time, timeout->tv_sec, timeout->tv_nsec)) @@ -2247,6 +2249,9 @@ int __sys_recvmmsg(int fd, struct mmsghd if (err) break; ++datagrams; + cond_resched(); + if (signal_pending(current)) + break; /* MSG_WAITFORONE turns on MSG_DONTWAIT after one packet */ if (flags & MSG_WAITFORONE)