From patchwork Thu Sep 10 00:50:46 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wei Wang
X-Patchwork-Id: 1361143
X-Patchwork-Delegate: davem@davemloft.net
Date: Wed, 9 Sep 2020 17:50:46 -0700
In-Reply-To: <20200910005048.4146399-1-weiwan@google.com>
Message-Id: <20200910005048.4146399-2-weiwan@google.com>
References: <20200910005048.4146399-1-weiwan@google.com>
X-Mailer: git-send-email 2.28.0.618.gf4bc123cb7-goog
Subject: [PATCH net-next 1/3] tcp: record received TOS value in the request socket
From: Wei Wang
To: "David S . Miller", netdev@vger.kernel.org
Cc: Eric Dumazet, Wei Wang
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

Add a new field to the request sock to record the TOS value received on
the listening socket during the three-way handshake (3WHS):
When not under SYN flood, it records the TOS value sent in the SYN.
When under SYN flood (syncookies), it records the TOS value sent in the
ACK.
This is a preparation patch for the TOS reflection added in a later
commit.

Signed-off-by: Wei Wang
Signed-off-by: Eric Dumazet
---
 include/linux/tcp.h   | 1 +
 net/ipv4/syncookies.c | 6 +++---
 net/ipv4/tcp_input.c  | 1 +
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 56ff2952edaf..2f87377e9af7 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -134,6 +134,7 @@ struct tcp_request_sock {
 						  * FastOpen it's the seq#
 						  * after data-in-SYN.
 						  */
+	u8 syn_tos;
 };
 
 static inline struct tcp_request_sock *tcp_rsk(const struct request_sock *req)
diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index f0794f0232ba..c375c126f436 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -286,11 +286,10 @@ struct request_sock *cookie_tcp_reqsk_alloc(const struct request_sock_ops *ops,
 					    struct sock *sk,
 					    struct sk_buff *skb)
 {
+	struct tcp_request_sock *treq;
 	struct request_sock *req;
 
 #ifdef CONFIG_MPTCP
-	struct tcp_request_sock *treq;
-
 	if (sk_is_mptcp(sk))
 		ops = &mptcp_subflow_request_sock_ops;
 #endif
@@ -299,8 +298,9 @@ struct request_sock *cookie_tcp_reqsk_alloc(const struct request_sock_ops *ops,
 	if (!req)
 		return NULL;
 
-#if IS_ENABLED(CONFIG_MPTCP)
 	treq = tcp_rsk(req);
+	treq->syn_tos = TCP_SKB_CB(skb)->ip_dsfield;
+#if IS_ENABLED(CONFIG_MPTCP)
 	treq->is_mptcp = sk_is_mptcp(sk);
 	if (treq->is_mptcp) {
 		int err = mptcp_subflow_init_cookie_req(req, sk, skb);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 4337841faeff..3658ad84f0c6 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6834,6 +6834,7 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
 
 	tcp_rsk(req)->snt_isn = isn;
 	tcp_rsk(req)->txhash = net_tx_rndhash();
+	tcp_rsk(req)->syn_tos = TCP_SKB_CB(skb)->ip_dsfield;
 	tcp_openreq_init_rwin(req, sk, dst);
 	sk_rx_queue_set(req_to_sk(req), skb);
 	if (!want_cookie) {

From patchwork Thu Sep 10 00:50:47 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wei Wang
X-Patchwork-Id: 1361139
X-Patchwork-Delegate: davem@davemloft.net
Date: Wed, 9 Sep 2020 17:50:47 -0700
In-Reply-To: <20200910005048.4146399-1-weiwan@google.com>
Message-Id: <20200910005048.4146399-3-weiwan@google.com>
References: <20200910005048.4146399-1-weiwan@google.com>
X-Mailer: git-send-email 2.28.0.618.gf4bc123cb7-goog
Subject: [PATCH net-next 2/3] ip: pass tos into ip_build_and_send_pkt()
From: Wei Wang
To: "David S . Miller", netdev@vger.kernel.org
Cc: Eric Dumazet, Wei Wang
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

This commit adds tos as a new parameter to ip_build_and_send_pkt(),
which will be used in a later commit.
This is a pure refactoring with no functional change: every existing
caller passes the tos of the socket it already hands in.

Signed-off-by: Wei Wang
Signed-off-by: Eric Dumazet
---
 include/net/ip.h     | 2 +-
 net/dccp/ipv4.c      | 6 ++++--
 net/ipv4/ip_output.c | 5 +++--
 net/ipv4/tcp_ipv4.c  | 3 ++-
 4 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index b09c48d862cc..0f72bf8c0cbf 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -151,7 +151,7 @@ int igmp_mc_init(void);
 
 int ip_build_and_send_pkt(struct sk_buff *skb, const struct sock *sk,
 			  __be32 saddr, __be32 daddr,
-			  struct ip_options_rcu *opt);
+			  struct ip_options_rcu *opt, u8 tos);
 int ip_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt,
 	   struct net_device *orig_dev);
 void ip_list_rcv(struct list_head *head, struct packet_type *pt,
diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index d8f3751a512b..bb3d70664dde 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -495,7 +495,8 @@ static int dccp_v4_send_response(const struct sock *sk, struct request_sock *req
 		rcu_read_lock();
 		err = ip_build_and_send_pkt(skb, sk, ireq->ir_loc_addr,
 					    ireq->ir_rmt_addr,
-					    rcu_dereference(ireq->ireq_opt));
+					    rcu_dereference(ireq->ireq_opt),
+					    inet_sk(sk)->tos);
 		rcu_read_unlock();
 		err = net_xmit_eval(err);
 	}
@@ -537,7 +538,8 @@ static void dccp_v4_ctl_send_reset(const struct sock *sk, struct sk_buff *rxskb)
 	local_bh_disable();
 	bh_lock_sock(ctl_sk);
 	err = ip_build_and_send_pkt(skb, ctl_sk,
-				    rxiph->daddr, rxiph->saddr, NULL);
+				    rxiph->daddr, rxiph->saddr, NULL,
+				    inet_sk(ctl_sk)->tos);
 	bh_unlock_sock(ctl_sk);
 
 	if (net_xmit_eval(err) == 0) {
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index b931d0b02e49..5fb536ff51f0 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -142,7 +142,8 @@ static inline int ip_select_ttl(struct inet_sock *inet, struct dst_entry *dst)
  *
  */
 int ip_build_and_send_pkt(struct sk_buff *skb, const struct sock *sk,
-			  __be32 saddr, __be32 daddr, struct ip_options_rcu *opt)
+			  __be32 saddr, __be32 daddr, struct ip_options_rcu *opt,
+			  u8 tos)
 {
 	struct inet_sock *inet = inet_sk(sk);
 	struct rtable *rt = skb_rtable(skb);
@@ -155,7 +156,7 @@ int ip_build_and_send_pkt(struct sk_buff *skb, const struct sock *sk,
 	iph = ip_hdr(skb);
 	iph->version = 4;
 	iph->ihl = 5;
-	iph->tos = inet->tos;
+	iph->tos = tos;
 	iph->ttl = ip_select_ttl(inet, &rt->dst);
 	iph->daddr = (opt && opt->opt.srr ? opt->opt.faddr : daddr);
 	iph->saddr = saddr;
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index af27cfa9d8d3..c4c7ad4c8b5a 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -985,7 +985,8 @@ static int tcp_v4_send_synack(const struct sock *sk, struct dst_entry *dst,
 		rcu_read_lock();
 		err = ip_build_and_send_pkt(skb, sk, ireq->ir_loc_addr,
 					    ireq->ir_rmt_addr,
-					    rcu_dereference(ireq->ireq_opt));
+					    rcu_dereference(ireq->ireq_opt),
+					    inet_sk(sk)->tos);
 		rcu_read_unlock();
 		err = net_xmit_eval(err);
 	}

From patchwork Thu Sep 10 00:50:48 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wei Wang
X-Patchwork-Id: 1361131
X-Patchwork-Delegate: davem@davemloft.net
Date: Wed, 9 Sep 2020 17:50:48 -0700
In-Reply-To: <20200910005048.4146399-1-weiwan@google.com>
Message-Id: <20200910005048.4146399-4-weiwan@google.com>
References: <20200910005048.4146399-1-weiwan@google.com>
X-Mailer: git-send-email 2.28.0.618.gf4bc123cb7-goog
Subject: [PATCH net-next 3/3] tcp: reflect tos value received in SYN to the socket
From: Wei Wang
To: "David S . Miller", netdev@vger.kernel.org
Cc: Eric Dumazet, Wei Wang
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

This commit adds a new TCP feature that reflects the tos value received
in the SYN: the value is sent back on the SYN-ACK and is then set as the
tos value of the established socket. The ECN bits of the received value
are masked out before it is reflected.
This gives all traffic on a connection the same traffic class/QoS level
as the incoming SYN. It could be useful in data centers to provide QoS
that matches the incoming request.
This feature is guarded by /proc/sys/net/ipv4/tcp_reflect_tos, and is
turned off by default.
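
The selection logic described above can be sketched in user space as follows. This is only an illustration under stated assumptions, not the kernel code: synack_tos(), child_tos(), reflect_tos_selfcheck() and ECN_MASK are hypothetical names chosen for this example (ECN_MASK has the same value as the kernel's INET_ECN_MASK), and the reflect_tos argument stands in for the tcp_reflect_tos sysctl.

```c
#include <assert.h>
#include <stdint.h>

/* The low two bits of the TOS / traffic-class byte carry ECN.
 * Hypothetical stand-in for the kernel's INET_ECN_MASK. */
#define ECN_MASK 0x3u

/* Hypothetical helper: TOS to put on the SYN-ACK.
 * With reflection on, use the value recorded from the SYN (syn_tos);
 * otherwise use the listener's own TOS. ECN bits are cleared either way,
 * mirroring the "tos & ~INET_ECN_MASK" argument in the patch. */
uint8_t synack_tos(int reflect_tos, uint8_t syn_tos, uint8_t listener_tos)
{
	uint8_t tos = reflect_tos ? syn_tos : listener_tos;

	return (uint8_t)(tos & ~ECN_MASK);
}

/* Hypothetical helper: TOS of the newly established (child) socket.
 * Only overwritten when reflection is on; otherwise the inherited
 * value is left untouched. */
uint8_t child_tos(int reflect_tos, uint8_t syn_tos, uint8_t inherited_tos)
{
	return reflect_tos ? (uint8_t)(syn_tos & ~ECN_MASK) : inherited_tos;
}

/* Usage example: 0xb9 is DSCP EF (0xb8) with an ECN codepoint set. */
void reflect_tos_selfcheck(void)
{
	assert(synack_tos(1, 0xb9, 0x00) == 0xb8); /* reflected, ECN cleared */
	assert(synack_tos(0, 0xb9, 0x10) == 0x10); /* listener's TOS is used */
	assert(child_tos(1, 0xb9, 0x10) == 0xb8);  /* child takes the SYN's DSCP */
	assert(child_tos(0, 0xb9, 0x10) == 0x10);  /* unchanged when disabled */
}
```

Calling reflect_tos_selfcheck() from a small main() exercises all four cases. The point of the sketch is that the ECN bits never leak from the SYN into the reflected value, so only the DSCP part is copied.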

Signed-off-by: Wei Wang
Signed-off-by: Eric Dumazet
---
 include/net/netns/ipv4.h   |  1 +
 net/ipv4/sysctl_net_ipv4.c |  9 +++++++++
 net/ipv4/tcp_ipv4.c        | 10 +++++++++-
 net/ipv6/tcp_ipv6.c        | 10 +++++++++-
 4 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 9e36738c1fe1..8e4fcac4df72 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -183,6 +183,7 @@ struct netns_ipv4 {
 	unsigned int sysctl_tcp_fastopen_blackhole_timeout;
 	atomic_t tfo_active_disable_times;
 	unsigned long tfo_active_disable_stamp;
+	int sysctl_tcp_reflect_tos;
 
 	int sysctl_udp_wmem_min;
 	int sysctl_udp_rmem_min;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 54023a46db04..3e5f4f2e705e 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -1329,6 +1329,15 @@ static struct ctl_table ipv4_net_table[] = {
 		.extra1 = SYSCTL_ZERO,
 		.extra2 = &comp_sack_nr_max,
 	},
+	{
+		.procname = "tcp_reflect_tos",
+		.data = &init_net.ipv4.sysctl_tcp_reflect_tos,
+		.maxlen = sizeof(int),
+		.mode = 0644,
+		.proc_handler = proc_dointvec_minmax,
+		.extra1 = SYSCTL_ZERO,
+		.extra2 = SYSCTL_ONE,
+	},
 	{
 		.procname = "udp_rmem_min",
 		.data = &init_net.ipv4.sysctl_udp_rmem_min,
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index c4c7ad4c8b5a..ace48b2790ff 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -972,6 +972,7 @@ static int tcp_v4_send_synack(const struct sock *sk, struct dst_entry *dst,
 	struct flowi4 fl4;
 	int err = -1;
 	struct sk_buff *skb;
+	u8 tos;
 
 	/* First, grab a route. */
 	if (!dst && (dst = inet_csk_route_req(sk, &fl4, req)) == NULL)
@@ -979,6 +980,9 @@ static int tcp_v4_send_synack(const struct sock *sk, struct dst_entry *dst,
 
 	skb = tcp_make_synack(sk, dst, req, foc, synack_type, syn_skb);
 
+	tos = sock_net(sk)->ipv4.sysctl_tcp_reflect_tos ?
+			tcp_rsk(req)->syn_tos : inet_sk(sk)->tos;
+
 	if (skb) {
 		__tcp_v4_send_check(skb, ireq->ir_loc_addr, ireq->ir_rmt_addr);
 
@@ -986,7 +990,7 @@ static int tcp_v4_send_synack(const struct sock *sk, struct dst_entry *dst,
 		err = ip_build_and_send_pkt(skb, sk, ireq->ir_loc_addr,
 					    ireq->ir_rmt_addr,
 					    rcu_dereference(ireq->ireq_opt),
-					    inet_sk(sk)->tos);
+					    tos & ~INET_ECN_MASK);
 		rcu_read_unlock();
 		err = net_xmit_eval(err);
 	}
@@ -1531,6 +1535,10 @@ struct sock *tcp_v4_syn_recv_sock(const struct sock *sk, struct sk_buff *skb,
 		inet_csk(newsk)->icsk_ext_hdr_len = inet_opt->opt.optlen;
 	newinet->inet_id = prandom_u32();
 
+	/* Set ToS of the new socket based upon the value of incoming SYN. */
+	if (sock_net(sk)->ipv4.sysctl_tcp_reflect_tos)
+		newinet->tos = tcp_rsk(req)->syn_tos & ~INET_ECN_MASK;
+
 	if (!dst) {
 		dst = inet_csk_route_child_sock(sk, newsk, req);
 		if (!dst)
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 04efa3ee80ef..862058dce6d0 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -510,6 +510,7 @@ static int tcp_v6_send_synack(const struct sock *sk, struct dst_entry *dst,
 	struct flowi6 *fl6 = &fl->u.ip6;
 	struct sk_buff *skb;
 	int err = -ENOMEM;
+	u8 tclass;
 
 	/* First, grab a route. */
 	if (!dst && (dst = inet6_csk_route_req(sk, fl6, req,
@@ -528,9 +529,12 @@ static int tcp_v6_send_synack(const struct sock *sk, struct dst_entry *dst,
 
 		rcu_read_lock();
 		opt = ireq->ipv6_opt;
+		tclass = sock_net(sk)->ipv4.sysctl_tcp_reflect_tos ?
+				tcp_rsk(req)->syn_tos : np->tclass;
 		if (!opt)
 			opt = rcu_dereference(np->opt);
-		err = ip6_xmit(sk, skb, fl6, sk->sk_mark, opt, np->tclass,
+		err = ip6_xmit(sk, skb, fl6, sk->sk_mark, opt,
+			       tclass & ~INET_ECN_MASK,
 			       sk->sk_priority);
 		rcu_read_unlock();
 		err = net_xmit_eval(err);
@@ -1310,6 +1314,10 @@ static struct sock *tcp_v6_syn_recv_sock(const struct sock *sk, struct sk_buff *
 	if (np->repflow)
 		newnp->flow_label = ip6_flowlabel(ipv6_hdr(skb));
 
+	/* Set ToS of the new socket based upon the value of incoming SYN. */
+	if (sock_net(sk)->ipv4.sysctl_tcp_reflect_tos)
+		newnp->tclass = tcp_rsk(req)->syn_tos & ~INET_ECN_MASK;
+
 	/* Clone native IPv6 options from listening socket (if any)
 
 	   Yes, keeping reference count would be much more clever,