From patchwork Tue May 1 19:39:15 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Soheil Hassas Yeganeh X-Patchwork-Id: 907191 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="MEp0VS4j"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40bBYp2Wq3z9s1w for ; Wed, 2 May 2018 05:39:26 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751116AbeEATjW (ORCPT ); Tue, 1 May 2018 15:39:22 -0400 Received: from mail-qt0-f196.google.com ([209.85.216.196]:37388 "EHLO mail-qt0-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751027AbeEATjV (ORCPT ); Tue, 1 May 2018 15:39:21 -0400 Received: by mail-qt0-f196.google.com with SMTP id q13-v6so14929949qtp.4 for ; Tue, 01 May 2018 12:39:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id; bh=LgEZRG+IVFMQnnijmb/Bcr5gjWBx0uMQOCDxzKJIRgc=; b=MEp0VS4j6NDIsCW5uAbHVn9Y7HHtXRfemjSfqpxoCsAwNsmrlWr8IYFUAr7wjYAoz3 VSDAw5+l3764pkdhNGqBHxTnsVa2b1WuH6xW8OzXpbC5FrbtJJV5op1hgXtVBOVvBmxW 8R7pcaisdfr9lX/+PgI8q7p8mVAoVilA25HzqhWYYxGFMeDP4oOicqvnuOvF3D2At2o/ P3NkzK+lsdgjkNfuV0F7Zrw7sCWRns4zafxYmT9CNTMPDX7PyGAaOB8mqjqOmPcVPNeA jm8R5Gm1Gif5023mTYp0nHiZgRIF/j0eWpvN8IKh9Bt2Zhe2ZJLwh3MkrTBRmMByRKOu omGA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id; bh=LgEZRG+IVFMQnnijmb/Bcr5gjWBx0uMQOCDxzKJIRgc=; b=SOBQtJrwgyIk8nMDzh3PGGuOXh6y/KFTkziVzJHQARjlZOOIE9UA8spU9N5dy2hrM+ x3Gt8SJYgsN6KxiAbuaqi5tFnD3KEGYRL+6Tr37P+YqNm9BWIo7ttrt0SwSrpqqKp+YH ZwykhqyEqtWOJEVGZXXj/kps6FqWGENU5nbQBbN1P+ZHi1Xyc6lqWNNsf5iGDQ1ktDnO QaOizUH56w7L+QbAg8CZoXOaLSdOYg7jtg2Yi4mYnq8rGb0KNGGZQTrTPpR71cK9ylPZ kgxpjdGKmoqlYBGK1AGAtf9PC3+y4utPtIQfRImQX8y4sdk0lUwd2RsW/hu52VT/D5lD M9Kw== X-Gm-Message-State: ALQs6tBsWYXt+9eTWSLV+mVm7MvfN9XOhz4Q4y9kpIZ3lT068gTGAqNC Q8zvV78l66wF2CJJ4EtGALg= X-Google-Smtp-Source: AB8JxZq8mOj9t3VkdC/oI6JGop/O5vG3qYBZ2cZ8mzwuyNIJoQsHkKzXAWXBKl8hDJsrhDynFqwMUQ== X-Received: by 2002:a0c:bd92:: with SMTP id n18-v6mr14259998qvg.132.1525203560811; Tue, 01 May 2018 12:39:20 -0700 (PDT) Received: from z.nyc.corp.google.com ([2620:0:1003:315:9c67:ffa0:44c0:d273]) by smtp.gmail.com with ESMTPSA id j25-v6sm7389308qtn.29.2018.05.01.12.39.19 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 01 May 2018 12:39:20 -0700 (PDT) From: Soheil Hassas Yeganeh To: davem@davemloft.net, netdev@vger.kernel.org Cc: ycheng@google.com, ncardwell@google.com, edumazet@google.com, willemb@google.com, Soheil Hassas Yeganeh Subject: [PATCH V4 net-next 1/2] tcp: send in-queue bytes in cmsg upon read Date: Tue, 1 May 2018 15:39:15 -0400 Message-Id: <20180501193916.113642-1-soheil.kdev@gmail.com> X-Mailer: git-send-email 2.17.0.441.gb46fe60e1d-goog Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Soheil Hassas Yeganeh Applications with many concurrent connections, high variance in receive queue length and tight memory bounds cannot allocate worst-case buffer size to drain sockets. Knowing the size of receive queue length, applications can optimize how they allocate buffers to read from the socket. The number of bytes pending on the socket is directly available through ioctl(FIONREAD/SIOCINQ) and can be approximated using getsockopt(MEMINFO) (rmem_alloc includes skb overheads in addition to application data). But, both of these options add an extra syscall per recvmsg. Moreover, ioctl(FIONREAD/SIOCINQ) takes the socket lock. Add the TCP_INQ socket option to TCP. When this socket option is set, recvmsg() relays the number of bytes available on the socket for reading to the application via the TCP_CM_INQ control message. Calculate the number of bytes after releasing the socket lock to include the processed backlog, if any. To avoid an extra branch in the hot path of recvmsg() for this new control message, move all cmsg processing inside an existing branch for processing receive timestamps. Since the socket lock is not held when calculating the size of receive queue, TCP_INQ is a hint. For example, it can overestimate the queue size by one byte, if FIN is received. With this method, applications can start reading from the socket using a small buffer, and then use larger buffers based on the remaining data when needed. V3 change-log: As suggested by David Miller, added loads with barrier to check whether we have multiple threads calling recvmsg in parallel. When that happens we lock the socket to calculate inq. V4 change-log: Removed inline from a static function. Signed-off-by: Soheil Hassas Yeganeh Signed-off-by: Yuchung Cheng Signed-off-by: Willem de Bruijn Reviewed-by: Eric Dumazet Reviewed-by: Neal Cardwell Suggested-by: David Miller --- include/linux/tcp.h | 2 +- include/uapi/linux/tcp.h | 3 +++ net/ipv4/tcp.c | 43 ++++++++++++++++++++++++++++++++++++---- 3 files changed, 43 insertions(+), 5 deletions(-) diff --git a/include/linux/tcp.h b/include/linux/tcp.h index 20585d5c4e1c3..807776928cb86 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -228,7 +228,7 @@ struct tcp_sock { unused:2; u8 nonagle : 4,/* Disable Nagle algorithm? */ thin_lto : 1,/* Use linear timeouts for thin streams */ - unused1 : 1, + recvmsg_inq : 1,/* Indicate # of bytes in queue upon recvmsg */ repair : 1, frto : 1;/* F-RTO (RFC5682) activated in CA_Loss */ u8 repair_queue; diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h index e9e8373b34b9d..29eb659aa77a1 100644 --- a/include/uapi/linux/tcp.h +++ b/include/uapi/linux/tcp.h @@ -123,6 +123,9 @@ enum { #define TCP_FASTOPEN_KEY 33 /* Set the key for Fast Open (cookie) */ #define TCP_FASTOPEN_NO_COOKIE 34 /* Enable TFO without a TFO cookie */ #define TCP_ZEROCOPY_RECEIVE 35 +#define TCP_INQ 36 /* Notify bytes available to read as a cmsg on read */ + +#define TCP_CM_INQ TCP_INQ struct tcp_repair_opt { __u32 opt_code; diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 4028ddd14dd5a..868ed74a76a80 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -1889,6 +1889,22 @@ static void tcp_recv_timestamp(struct msghdr *msg, const struct sock *sk, } } +static int tcp_inq_hint(struct sock *sk) +{ + const struct tcp_sock *tp = tcp_sk(sk); + u32 copied_seq = READ_ONCE(tp->copied_seq); + u32 rcv_nxt = READ_ONCE(tp->rcv_nxt); + int inq; + + inq = rcv_nxt - copied_seq; + if (unlikely(inq < 0 || copied_seq != READ_ONCE(tp->copied_seq))) { + lock_sock(sk); + inq = tp->rcv_nxt - tp->copied_seq; + release_sock(sk); + } + return inq; +} + /* * This routine copies from a sock struct into the user buffer. * @@ -1905,13 +1921,14 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock, u32 peek_seq; u32 *seq; unsigned long used; - int err; + int err, inq; int target; /* Read at least this many bytes */ long timeo; struct sk_buff *skb, *last; u32 urg_hole = 0; struct scm_timestamping tss; bool has_tss = false; + bool has_cmsg; if (unlikely(flags & MSG_ERRQUEUE)) return inet_recv_error(sk, msg, len, addr_len); @@ -1926,6 +1943,7 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock, if (sk->sk_state == TCP_LISTEN) goto out; + has_cmsg = tp->recvmsg_inq; timeo = sock_rcvtimeo(sk, nonblock); /* Urgent data needs to be handled specially. */ @@ -2112,6 +2130,7 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock, if (TCP_SKB_CB(skb)->has_rxtstamp) { tcp_update_recv_tstamps(skb, &tss); has_tss = true; + has_cmsg = true; } if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) goto found_fin_ok; @@ -2131,13 +2150,20 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock, * on connected socket. I was just happy when found this 8) --ANK */ - if (has_tss) - tcp_recv_timestamp(msg, sk, &tss); - /* Clean up data we have read: This will do ACK frames. */ tcp_cleanup_rbuf(sk, copied); release_sock(sk); + + if (has_cmsg) { + if (has_tss) + tcp_recv_timestamp(msg, sk, &tss); + if (tp->recvmsg_inq) { + inq = tcp_inq_hint(sk); + put_cmsg(msg, SOL_TCP, TCP_CM_INQ, sizeof(inq), &inq); + } + } + return copied; out: @@ -3006,6 +3032,12 @@ static int do_tcp_setsockopt(struct sock *sk, int level, tp->notsent_lowat = val; sk->sk_write_space(sk); break; + case TCP_INQ: + if (val > 1 || val < 0) + err = -EINVAL; + else + tp->recvmsg_inq = val; + break; default: err = -ENOPROTOOPT; break; @@ -3431,6 +3463,9 @@ static int do_tcp_getsockopt(struct sock *sk, int level, case TCP_NOTSENT_LOWAT: val = tp->notsent_lowat; break; + case TCP_INQ: + val = tp->recvmsg_inq; + break; case TCP_SAVE_SYN: val = tp->save_syn; break; From patchwork Tue May 1 19:39:16 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Soheil Hassas Yeganeh X-Patchwork-Id: 907192 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="udoSfsNM"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40bBYq4TFxz9s21 for ; Wed, 2 May 2018 05:39:27 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751171AbeEATjZ (ORCPT ); Tue, 1 May 2018 15:39:25 -0400 Received: from mail-qt0-f195.google.com ([209.85.216.195]:38462 "EHLO mail-qt0-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750766AbeEATjX (ORCPT ); Tue, 1 May 2018 15:39:23 -0400 Received: by mail-qt0-f195.google.com with SMTP id z23-v6so15728835qti.5 for ; Tue, 01 May 2018 12:39:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=YitvEtbMdH1HckPwE4JGeCmrP9i5GkMp7WXFODtCF8g=; b=udoSfsNMrrOOlqp47AgxmioMJPnAegPlOmsie6fqBR0aX3h9osXM+GOsn8Yd38WqvP UIcOkrPOSw9oCW19OtER7Y9R184MnvglbeGoo8lJY5o2yC3Nm4X+PzZdjuY5qaupwMD4 r/CxwCRcSm1ozKuZuzhUopWB0aVHjX+Q/ULDi7880e9plUQPUs2KUn97bqf03SaKUw0U 4RZT5dqiTVP977xKvxgrfJfi7FWNmY/5xUjCo72VsDEb8scvBVuNuVLSlC8ZyPQhCd3C HeD4gH56U83xcke4kNNejaLhBItMJelckRg5IwPQiqNL3Tipr8R1HaAhdwDDYQb4Dc70 YPHA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=YitvEtbMdH1HckPwE4JGeCmrP9i5GkMp7WXFODtCF8g=; b=gW96PypHnnrBkPGyJdP5wzLVUAJ+L4Sl/N+Kj698zhWUKjDEH3Itc7rdv8bIqW+86G 2xzo44yrdU6s/PjY2+ZLoBMvNurUOKshf45gVmHKQSyEQyVvG8LKUXan8d1nHqy8fnvK B3vQ2g0+1vobMbF/YAzbTLIMCKhlD1BajVXUBEsm5z01SOJAhCCuuUNBspL93zLCUi3P y9BwloDskI5ifoXsZpY1L4BLJaQKlD25z2YPTsA6HgL9/TtBVrgmpsgit+UCNHAi3+Fb 9MzdwBMJBTbg5QjmvZuLmuGEan6f71Y7P/8W8eFcEi32UetL6A+PBp7GW9zn5hFDxmJ5 QCNA== X-Gm-Message-State: ALQs6tBzyMyfsspR1q9gLs2VfIP9n5EpMvW/bbFOZVFxWIAjTBwwu7Od mj37zGxigUEkrEquAplNETk= X-Google-Smtp-Source: AB8JxZoLDpk+i9JNDWVX5E8kUgOd7WREtZpp/ETWUjIA6xx1AQMPM6PpJ9tLa/Otz/hwaRyBvhCqRg== X-Received: by 2002:a0c:c342:: with SMTP id j2-v6mr13973961qvi.49.1525203562736; Tue, 01 May 2018 12:39:22 -0700 (PDT) Received: from z.nyc.corp.google.com ([2620:0:1003:315:9c67:ffa0:44c0:d273]) by smtp.gmail.com with ESMTPSA id j25-v6sm7389308qtn.29.2018.05.01.12.39.21 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 01 May 2018 12:39:22 -0700 (PDT) From: Soheil Hassas Yeganeh To: davem@davemloft.net, netdev@vger.kernel.org Cc: ycheng@google.com, ncardwell@google.com, edumazet@google.com, willemb@google.com, Soheil Hassas Yeganeh Subject: [PATCH V4 net-next 2/2] selftest: add test for TCP_INQ Date: Tue, 1 May 2018 15:39:16 -0400 Message-Id: <20180501193916.113642-2-soheil.kdev@gmail.com> X-Mailer: git-send-email 2.17.0.441.gb46fe60e1d-goog In-Reply-To: <20180501193916.113642-1-soheil.kdev@gmail.com> References: <20180501193916.113642-1-soheil.kdev@gmail.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Soheil Hassas Yeganeh Signed-off-by: Soheil Hassas Yeganeh Signed-off-by: Yuchung Cheng Signed-off-by: Willem de Bruijn Reviewed-by: Eric Dumazet Reviewed-by: Neal Cardwell --- tools/testing/selftests/net/Makefile | 3 +- tools/testing/selftests/net/tcp_inq.c | 189 ++++++++++++++++++++++++++ 2 files changed, 191 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/net/tcp_inq.c diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile index df9102ec7b7af..0a1821f8dfb18 100644 --- a/tools/testing/selftests/net/Makefile +++ b/tools/testing/selftests/net/Makefile @@ -9,7 +9,7 @@ TEST_PROGS += fib_tests.sh fib-onlink-tests.sh in_netns.sh pmtu.sh udpgso.sh TEST_PROGS += udpgso_bench.sh TEST_GEN_FILES = socket TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy -TEST_GEN_FILES += tcp_mmap +TEST_GEN_FILES += tcp_mmap tcp_inq TEST_GEN_PROGS = reuseport_bpf reuseport_bpf_cpu reuseport_bpf_numa TEST_GEN_PROGS += reuseport_dualstack reuseaddr_conflict TEST_GEN_PROGS += udpgso udpgso_bench_tx udpgso_bench_rx @@ -18,3 +18,4 @@ include ../lib.mk $(OUTPUT)/reuseport_bpf_numa: LDFLAGS += -lnuma $(OUTPUT)/tcp_mmap: LDFLAGS += -lpthread +$(OUTPUT)/tcp_inq: LDFLAGS += -lpthread diff --git a/tools/testing/selftests/net/tcp_inq.c b/tools/testing/selftests/net/tcp_inq.c new file mode 100644 index 0000000000000..d044b29ddabcc --- /dev/null +++ b/tools/testing/selftests/net/tcp_inq.c @@ -0,0 +1,189 @@ +/* + * Copyright 2018 Google Inc. + * Author: Soheil Hassas Yeganeh (soheil@google.com) + * + * Simple example on how to use TCP_INQ and TCP_CM_INQ. + * + * License (GPLv2): + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. * See the GNU General Public License for + * more details. + */ +#define _GNU_SOURCE + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#ifndef TCP_INQ +#define TCP_INQ 36 +#endif + +#ifndef TCP_CM_INQ +#define TCP_CM_INQ TCP_INQ +#endif + +#define BUF_SIZE 8192 +#define CMSG_SIZE 32 + +static int family = AF_INET6; +static socklen_t addr_len = sizeof(struct sockaddr_in6); +static int port = 4974; + +static void setup_loopback_addr(int family, struct sockaddr_storage *sockaddr) +{ + struct sockaddr_in6 *addr6 = (void *) sockaddr; + struct sockaddr_in *addr4 = (void *) sockaddr; + + switch (family) { + case PF_INET: + memset(addr4, 0, sizeof(*addr4)); + addr4->sin_family = AF_INET; + addr4->sin_addr.s_addr = htonl(INADDR_LOOPBACK); + addr4->sin_port = htons(port); + break; + case PF_INET6: + memset(addr6, 0, sizeof(*addr6)); + addr6->sin6_family = AF_INET6; + addr6->sin6_addr = in6addr_loopback; + addr6->sin6_port = htons(port); + break; + default: + error(1, 0, "illegal family"); + } +} + +void *start_server(void *arg) +{ + int server_fd = (int)(unsigned long)arg; + struct sockaddr_in addr; + socklen_t addrlen = sizeof(addr); + char *buf; + int fd; + int r; + + buf = malloc(BUF_SIZE); + + for (;;) { + fd = accept(server_fd, (struct sockaddr *)&addr, &addrlen); + if (fd == -1) { + perror("accept"); + break; + } + do { + r = send(fd, buf, BUF_SIZE, 0); + } while (r < 0 && errno == EINTR); + if (r < 0) + perror("send"); + if (r != BUF_SIZE) + fprintf(stderr, "can only send %d bytes\n", r); + /* TCP_INQ can overestimate in-queue by one byte if we send + * the FIN packet. Sleep for 1 second, so that the client + * likely invoked recvmsg(). + */ + sleep(1); + close(fd); + } + + free(buf); + close(server_fd); + pthread_exit(0); +} + +int main(int argc, char *argv[]) +{ + struct sockaddr_storage listen_addr, addr; + int c, one = 1, inq = -1; + pthread_t server_thread; + char cmsgbuf[CMSG_SIZE]; + struct iovec iov[1]; + struct cmsghdr *cm; + struct msghdr msg; + int server_fd, fd; + char *buf; + + while ((c = getopt(argc, argv, "46p:")) != -1) { + switch (c) { + case '4': + family = PF_INET; + addr_len = sizeof(struct sockaddr_in); + break; + case '6': + family = PF_INET6; + addr_len = sizeof(struct sockaddr_in6); + break; + case 'p': + port = atoi(optarg); + break; + } + } + + server_fd = socket(family, SOCK_STREAM, 0); + if (server_fd < 0) + error(1, errno, "server socket"); + setup_loopback_addr(family, &listen_addr); + if (setsockopt(server_fd, SOL_SOCKET, SO_REUSEADDR, + &one, sizeof(one)) != 0) + error(1, errno, "setsockopt(SO_REUSEADDR)"); + if (bind(server_fd, (const struct sockaddr *)&listen_addr, + addr_len) == -1) + error(1, errno, "bind"); + if (listen(server_fd, 128) == -1) + error(1, errno, "listen"); + if (pthread_create(&server_thread, NULL, start_server, + (void *)(unsigned long)server_fd) != 0) + error(1, errno, "pthread_create"); + + fd = socket(family, SOCK_STREAM, 0); + if (fd < 0) + error(1, errno, "client socket"); + setup_loopback_addr(family, &addr); + if (connect(fd, (const struct sockaddr *)&addr, addr_len) == -1) + error(1, errno, "connect"); + if (setsockopt(fd, SOL_TCP, TCP_INQ, &one, sizeof(one)) != 0) + error(1, errno, "setsockopt(TCP_INQ)"); + + msg.msg_name = NULL; + msg.msg_namelen = 0; + msg.msg_iov = iov; + msg.msg_iovlen = 1; + msg.msg_control = cmsgbuf; + msg.msg_controllen = sizeof(cmsgbuf); + msg.msg_flags = 0; + + buf = malloc(BUF_SIZE); + iov[0].iov_base = buf; + iov[0].iov_len = BUF_SIZE / 2; + + if (recvmsg(fd, &msg, 0) != iov[0].iov_len) + error(1, errno, "recvmsg"); + if (msg.msg_flags & MSG_CTRUNC) + error(1, 0, "control message is truncated"); + + for (cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm)) + if (cm->cmsg_level == SOL_TCP && cm->cmsg_type == TCP_CM_INQ) + inq = *((int *) CMSG_DATA(cm)); + + if (inq != BUF_SIZE - iov[0].iov_len) { + fprintf(stderr, "unexpected inq: %d\n", inq); + exit(1); + } + + printf("PASSED\n"); + free(buf); + close(fd); + return 0; +}