From patchwork Mon Dec 18 21:51:01 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Christoph Paasch
X-Patchwork-Id: 850433
X-Patchwork-Delegate: davem@davemloft.net
From: Christoph Paasch
To: netdev@vger.kernel.org
Cc: Eric Dumazet, Mat Martineau, Alexei Starovoitov, Ursula Braun
Subject: [RFC 06/14] tcp_smc: Make SMC use TCP extra-option framework
Date: Mon, 18 Dec 2017 13:51:01 -0800
Message-id: <20171218215109.38700-7-cpaasch@apple.com>
X-Mailer: git-send-email 2.15.0
In-reply-to: <20171218215109.38700-1-cpaasch@apple.com>
References: <20171218215109.38700-1-cpaasch@apple.com>
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

Adopt the extra-option framework for SMC. This lets us remove the SMC
code from the TCP stack entirely.

The static key is gone, as this is now covered by the static key of the
extra-option framework.

We allocate state (struct tcp_smc_opt) that records whether SMC was
successfully negotiated, and check this state in the relevant functions.

Cc: Ursula Braun
Signed-off-by: Christoph Paasch
Reviewed-by: Mat Martineau
---
 include/linux/tcp.h      |   3 +-
 include/net/inet_sock.h  |   3 +-
 include/net/tcp.h        |   4 -
 net/ipv4/tcp.c           |   5 --
 net/ipv4/tcp_input.c     |  36 ---------
 net/ipv4/tcp_minisocks.c |  18 -----
 net/ipv4/tcp_output.c    |  54 --------------
 net/smc/af_smc.c         | 190 +++++++++++++++++++++++++++++++++++++++++++++--
 8 files changed, 186 insertions(+), 127 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 4756bd2c4b54..231b352f587f 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -257,8 +257,7 @@ struct tcp_sock {
 		syn_fastopen_ch:1, /* Active TFO re-enabling probe */
 		syn_data_acked:1,/* data in SYN is acked by SYN-ACK */
 		save_syn:1,	/* Save headers of SYN packet */
-		is_cwnd_limited:1,/* forward progress limited by snd_cwnd? */
-		syn_smc:1;	/* SYN includes SMC */
+		is_cwnd_limited:1;/* forward progress limited by snd_cwnd? */
 	u32	tlp_high_seq;	/* snd_nxt at the time of TLP retransmit. */
 
 /* RTT measurement */
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 39efb968b7a4..8e51b4a69088 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -90,8 +90,7 @@ struct inet_request_sock {
 				wscale_ok  : 1,
 				ecn_ok	   : 1,
 				acked	   : 1,
-				no_srccheck: 1,
-				smc_ok	   : 1;
+				no_srccheck: 1;
 	u32                     ir_mark;
 	union {
 		struct ip_options_rcu __rcu	*ireq_opt;
diff --git a/include/net/tcp.h b/include/net/tcp.h
index ac62ceff9815..a5c4856e25c7 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2062,10 +2062,6 @@ static inline bool tcp_bpf_ca_needs_ecn(struct sock *sk)
 	return (tcp_call_bpf(sk, BPF_SOCK_OPS_NEEDS_ECN) == 1);
 }
 
-#if IS_ENABLED(CONFIG_SMC)
-extern struct static_key_false tcp_have_smc;
-#endif
-
 struct tcp_extopt_store;
 
 struct tcp_extopt_ops {
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 17f38afb4212..0a1cabee6d5e 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -294,11 +294,6 @@ EXPORT_SYMBOL(sysctl_tcp_mem);
 atomic_long_t tcp_memory_allocated;	/* Current allocated memory. */
 EXPORT_SYMBOL(tcp_memory_allocated);
 
-#if IS_ENABLED(CONFIG_SMC)
-DEFINE_STATIC_KEY_FALSE(tcp_have_smc);
-EXPORT_SYMBOL(tcp_have_smc);
-#endif
-
 /*
  * Current number of TCP sockets.
  */
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 1950ff80fb3f..af8f4f9fd098 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3671,24 +3671,6 @@ static void tcp_parse_fastopen_option(int len, const unsigned char *cookie,
 	foc->exp = exp_opt;
 }
 
-static int smc_parse_options(const struct tcphdr *th,
-			     struct tcp_options_received *opt_rx,
-			     const unsigned char *ptr,
-			     int opsize)
-{
-#if IS_ENABLED(CONFIG_SMC)
-	if (static_branch_unlikely(&tcp_have_smc)) {
-		if (th->syn && !(opsize & 1) &&
-		    opsize >= TCPOLEN_EXP_SMC_BASE &&
-		    get_unaligned_be32(ptr) == TCPOPT_SMC_MAGIC) {
-			opt_rx->smc_ok = 1;
-			return 1;
-		}
-	}
-#endif
-	return 0;
-}
-
 /* Look for tcp options. Normally only called on SYN and SYNACK packets.
  * But, this can also be called on packets in the established flow when
  * the fast version below fails.
@@ -3796,9 +3778,6 @@ void tcp_parse_options(const struct net *net,
 					tcp_parse_fastopen_option(opsize -
 						TCPOLEN_EXP_FASTOPEN_BASE,
 						ptr + 2, th->syn, foc, true);
-				else if (smc_parse_options(th, opt_rx, ptr,
-							   opsize))
-					break;
 				else if (opsize >= TCPOLEN_EXP_BASE)
 					tcp_extopt_parse(get_unaligned_be32(ptr),
 							 opsize, ptr, skb,
@@ -5572,16 +5551,6 @@ static bool tcp_rcv_fastopen_synack(struct sock *sk, struct sk_buff *synack,
 	return false;
 }
 
-static void smc_check_reset_syn(struct tcp_sock *tp)
-{
-#if IS_ENABLED(CONFIG_SMC)
-	if (static_branch_unlikely(&tcp_have_smc)) {
-		if (tp->syn_smc && !tp->rx_opt.smc_ok)
-			tp->syn_smc = 0;
-	}
-#endif
-}
-
 static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb,
 					 const struct tcphdr *th)
 {
@@ -5692,8 +5661,6 @@ static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb,
 		 * is initialized. */
 		tp->copied_seq = tp->rcv_nxt;
 
-		smc_check_reset_syn(tp);
-
 		smp_mb();
 
 		tcp_finish_connect(sk, skb);
@@ -6150,9 +6117,6 @@ static void tcp_openreq_init(struct request_sock *req,
 	ireq->ir_rmt_port = tcp_hdr(skb)->source;
 	ireq->ir_num = ntohs(tcp_hdr(skb)->dest);
 	ireq->ir_mark = inet_request_mark(sk, skb);
-#if IS_ENABLED(CONFIG_SMC)
-	ireq->smc_ok = rx_opt->smc_ok;
-#endif
 }
 
 struct request_sock *inet_reqsk_alloc(const struct request_sock_ops *ops,
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 676ad7ca13ad..aa2ff9aadad0 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -435,21 +435,6 @@ void tcp_ca_openreq_child(struct sock *sk, const struct dst_entry *dst)
 }
 EXPORT_SYMBOL_GPL(tcp_ca_openreq_child);
 
-static void smc_check_reset_syn_req(struct tcp_sock *oldtp,
-				    struct request_sock *req,
-				    struct tcp_sock *newtp)
-{
-#if IS_ENABLED(CONFIG_SMC)
-	struct inet_request_sock *ireq;
-
-	if (static_branch_unlikely(&tcp_have_smc)) {
-		ireq = inet_rsk(req);
-		if (oldtp->syn_smc && !ireq->smc_ok)
-			newtp->syn_smc = 0;
-	}
-#endif
-}
-
 /* This is not only more efficient than what we used to do, it eliminates
  * a lot of code duplication between IPv4/IPv6 SYN recv processing. -DaveM
  *
@@ -467,9 +452,6 @@ struct sock *tcp_create_openreq_child(const struct sock *sk,
 	struct tcp_request_sock *treq = tcp_rsk(req);
 	struct inet_connection_sock *newicsk = inet_csk(newsk);
 	struct tcp_sock *newtp = tcp_sk(newsk);
-	struct tcp_sock *oldtp = tcp_sk(sk);
-
-	smc_check_reset_syn_req(oldtp, req, newtp);
 
 	/* Now setup tcp_sock */
 	newtp->pred_flags = 0;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 6804a9325107..baf1c913ca7f 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -398,21 +398,6 @@ static inline bool tcp_urg_mode(const struct tcp_sock *tp)
 	return tp->snd_una != tp->snd_up;
 }
 
-static void smc_options_write(__be32 *ptr, u16 *options)
-{
-#if IS_ENABLED(CONFIG_SMC)
-	if (static_branch_unlikely(&tcp_have_smc)) {
-		if (unlikely(OPTION_SMC & *options)) {
-			*ptr++ = htonl((TCPOPT_NOP << 24) |
-				       (TCPOPT_NOP << 16) |
-				       (TCPOPT_EXP << 8) |
-				       (TCPOLEN_EXP_SMC_BASE));
-			*ptr++ = htonl(TCPOPT_SMC_MAGIC);
-		}
-	}
-#endif
-}
-
 /* Write previously computed TCP options to the packet.
  *
  * Beware: Something in the Internet is very sensitive to the ordering of
@@ -527,45 +512,10 @@ static void tcp_options_write(__be32 *ptr, struct sk_buff *skb, struct sock *sk,
 		ptr += (len + 3) >> 2;
 	}
 
-	smc_options_write(ptr, &options);
-
 	if (unlikely(!hlist_empty(extopt_list)))
 		tcp_extopt_write(ptr, skb, opts, sk);
 }
 
-static void smc_set_option(const struct tcp_sock *tp,
-			   struct tcp_out_options *opts,
-			   unsigned int *remaining)
-{
-#if IS_ENABLED(CONFIG_SMC)
-	if (static_branch_unlikely(&tcp_have_smc)) {
-		if (tp->syn_smc) {
-			if (*remaining >= TCPOLEN_EXP_SMC_BASE_ALIGNED) {
-				opts->options |= OPTION_SMC;
-				*remaining -= TCPOLEN_EXP_SMC_BASE_ALIGNED;
-			}
-		}
-	}
-#endif
-}
-
-static void smc_set_option_cond(const struct tcp_sock *tp,
-				const struct inet_request_sock *ireq,
-				struct tcp_out_options *opts,
-				unsigned int *remaining)
-{
-#if IS_ENABLED(CONFIG_SMC)
-	if (static_branch_unlikely(&tcp_have_smc)) {
-		if (tp->syn_smc && ireq->smc_ok) {
-			if (*remaining >= TCPOLEN_EXP_SMC_BASE_ALIGNED) {
-				opts->options |= OPTION_SMC;
-				*remaining -= TCPOLEN_EXP_SMC_BASE_ALIGNED;
-			}
-		}
-	}
-#endif
-}
-
 /* Compute TCP options for SYN packets. This is not the final
  * network wire format yet.
  */
@@ -631,8 +581,6 @@ static unsigned int tcp_syn_options(struct sock *sk, struct sk_buff *skb,
 		}
 	}
 
-	smc_set_option(tp, opts, &remaining);
-
 	if (unlikely(!hlist_empty(&tp->tcp_option_list)))
 		remaining -= tcp_extopt_prepare(skb, TCPHDR_SYN, remaining,
 						opts, tcp_to_sk(tp));
@@ -698,8 +646,6 @@ static unsigned int tcp_synack_options(const struct sock *sk,
 		}
 	}
 
-	smc_set_option_cond(tcp_sk(sk), ireq, opts, &remaining);
-
 	if (unlikely(!hlist_empty(&tcp_rsk(req)->tcp_option_list)))
 		remaining -= tcp_extopt_prepare(skb, TCPHDR_SYN | TCPHDR_ACK,
 						remaining, opts,
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index daf8075f5a4c..14bb84f81a50 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -44,6 +44,149 @@
 #include "smc_rx.h"
 #include "smc_close.h"
 
+static unsigned int tcp_smc_opt_prepare(struct sk_buff *skb, u8 flags,
+					unsigned int remaining,
+					struct tcp_out_options *opts,
+					const struct sock *sk,
+					struct tcp_extopt_store *store);
+static __be32 *tcp_smc_opt_write(__be32 *ptr, struct sk_buff *skb,
+				 struct tcp_out_options *opts,
+				 struct sock *sk,
+				 struct tcp_extopt_store *store);
+static void tcp_smc_opt_parse(int opsize, const unsigned char *opptr,
+			      const struct sk_buff *skb,
+			      struct tcp_options_received *opt_rx,
+			      struct sock *sk,
+			      struct tcp_extopt_store *store);
+static void tcp_smc_opt_post_process(struct sock *sk,
+				     struct tcp_options_received *opt,
+				     struct tcp_extopt_store *store);
+static struct tcp_extopt_store *tcp_smc_opt_copy(struct sock *listener,
+						 struct request_sock *req,
+						 struct tcp_options_received *opt,
+						 struct tcp_extopt_store *store);
+static void tcp_smc_opt_destroy(struct tcp_extopt_store *store);
+
+struct tcp_smc_opt {
+	struct tcp_extopt_store store;
+	int smc_ok:1; /* SMC supported on this connection */
+	struct rcu_head rcu;
+};
+
+static const struct tcp_extopt_ops tcp_smc_extra_ops = {
+	.option_kind = TCPOPT_SMC_MAGIC,
+	.parse = tcp_smc_opt_parse,
+	.post_process = tcp_smc_opt_post_process,
+	.prepare = tcp_smc_opt_prepare,
+	.write = tcp_smc_opt_write,
+	.copy = tcp_smc_opt_copy,
+	.destroy = tcp_smc_opt_destroy,
+	.owner = THIS_MODULE,
+};
+
+static struct tcp_smc_opt *tcp_extopt_to_smc(struct tcp_extopt_store *store)
+{
+	return container_of(store, struct tcp_smc_opt, store);
+}
+
+static struct tcp_smc_opt *tcp_smc_opt_find(struct sock *sk)
+{
+	struct tcp_extopt_store *ext_opt;
+
+	ext_opt = tcp_extopt_find_kind(TCPOPT_SMC_MAGIC, sk);
+
+	return tcp_extopt_to_smc(ext_opt);
+}
+
+static unsigned int tcp_smc_opt_prepare(struct sk_buff *skb, u8 flags,
+					unsigned int remaining,
+					struct tcp_out_options *opts,
+					const struct sock *sk,
+					struct tcp_extopt_store *store)
+{
+	if (!(flags & TCPHDR_SYN))
+		return 0;
+
+	if (remaining >= TCPOLEN_EXP_SMC_BASE_ALIGNED) {
+		opts->options |= OPTION_SMC;
+		return TCPOLEN_EXP_SMC_BASE_ALIGNED;
+	}
+
+	return 0;
+}
+
+static __be32 *tcp_smc_opt_write(__be32 *ptr, struct sk_buff *skb,
+				 struct tcp_out_options *opts,
+				 struct sock *sk,
+				 struct tcp_extopt_store *store)
+{
+	if (unlikely(OPTION_SMC & opts->options)) {
+		*ptr++ = htonl((TCPOPT_NOP << 24) |
+			       (TCPOPT_NOP << 16) |
+			       (TCPOPT_EXP << 8) |
+			       (TCPOLEN_EXP_SMC_BASE));
+		*ptr++ = htonl(TCPOPT_SMC_MAGIC);
+	}
+
+	return ptr;
+}
+
+static void tcp_smc_opt_parse(int opsize, const unsigned char *opptr,
+			      const struct sk_buff *skb,
+			      struct tcp_options_received *opt_rx,
+			      struct sock *sk,
+			      struct tcp_extopt_store *store)
+{
+	struct tcphdr *th = tcp_hdr(skb);
+
+	if (th->syn && !(opsize & 1) && opsize >= TCPOLEN_EXP_SMC_BASE)
+		opt_rx->smc_ok = 1;
+}
+
+static void tcp_smc_opt_post_process(struct sock *sk,
+				     struct tcp_options_received *opt,
+				     struct tcp_extopt_store *store)
+{
+	struct tcp_smc_opt *smc_opt = tcp_extopt_to_smc(store);
+
+	if (sk->sk_state != TCP_SYN_SENT)
+		return;
+
+	if (opt->smc_ok)
+		smc_opt->smc_ok = 1;
+	else
+		smc_opt->smc_ok = 0;
+}
+
+static struct tcp_extopt_store *tcp_smc_opt_copy(struct sock *listener,
+						 struct request_sock *req,
+						 struct tcp_options_received *opt,
+						 struct tcp_extopt_store *store)
+{
+	struct tcp_smc_opt *smc_opt;
+
+	/* First, check if the peer sent us the smc-opt */
+	if (!opt->smc_ok)
+		return NULL;
+
+	smc_opt = kzalloc(sizeof(*smc_opt), GFP_ATOMIC);
+	if (!smc_opt)
+		return NULL;
+
+	smc_opt->store.ops = &tcp_smc_extra_ops;
+
+	smc_opt->smc_ok = 1;
+
+	return (struct tcp_extopt_store *)smc_opt;
+}
+
+static void tcp_smc_opt_destroy(struct tcp_extopt_store *store)
+{
+	struct tcp_smc_opt *smc_opt = tcp_extopt_to_smc(store);
+
+	kfree_rcu(smc_opt, rcu);
+}
+
 static DEFINE_MUTEX(smc_create_lgr_pending);	/* serialize link group
 						 * creation
 						 */
@@ -384,13 +527,15 @@ static int smc_connect_rdma(struct smc_sock *smc)
 	struct smc_clc_msg_accept_confirm aclc;
 	int local_contact = SMC_FIRST_CONTACT;
 	struct smc_ib_device *smcibdev;
+	struct tcp_smc_opt *smc_opt;
 	struct smc_link *link;
 	u8 srv_first_contact;
 	int reason_code = 0;
 	int rc = 0;
 	u8 ibport;
 
-	if (!tcp_sk(smc->clcsock->sk)->syn_smc) {
+	smc_opt = tcp_smc_opt_find(smc->clcsock->sk);
+	if (!smc_opt || !smc_opt->smc_ok) {
 		/* peer has not signalled SMC-capability */
 		smc->use_fallback = true;
 		goto out_connected;
@@ -535,6 +680,7 @@ static int smc_connect_rdma(struct smc_sock *smc)
 static int smc_connect(struct socket *sock, struct sockaddr *addr,
 		       int alen, int flags)
 {
+	struct tcp_smc_opt *smc_opt;
 	struct sock *sk = sock->sk;
 	struct smc_sock *smc;
 	int rc = -EINVAL;
@@ -548,9 +694,17 @@ static int smc_connect(struct socket *sock, struct sockaddr *addr,
 		goto out_err;
 	smc->addr = addr;	/* needed for nonblocking connect */
 
+	smc_opt = kzalloc(sizeof(*smc_opt), GFP_KERNEL);
+	if (!smc_opt) {
+		rc = -ENOMEM;
+		goto out_err;
+	}
+	smc_opt->store.ops = &tcp_smc_extra_ops;
+
 	lock_sock(sk);
 	switch (sk->sk_state) {
 	default:
+		rc = -EINVAL;
 		goto out;
 	case SMC_ACTIVE:
 		rc = -EISCONN;
@@ -560,8 +714,15 @@ static int smc_connect(struct socket *sock, struct sockaddr *addr,
 		break;
 	}
 
+	/* We are the only owner of smc->clcsock->sk, so we can be lockless */
+	rc = tcp_register_extopt(&smc_opt->store, smc->clcsock->sk);
+	if (rc) {
+		release_sock(sk);
+		kfree(smc_opt);
+		goto out_err;
+	}
+
 	smc_copy_sock_settings_to_clc(smc);
-	tcp_sk(smc->clcsock->sk)->syn_smc = 1;
 	rc = kernel_connect(smc->clcsock, addr, alen, flags);
 	if (rc)
 		goto out;
@@ -760,6 +921,7 @@ static void smc_listen_work(struct work_struct *work)
 	struct smc_clc_msg_proposal *pclc;
 	struct smc_ib_device *smcibdev;
 	struct sockaddr_in peeraddr;
+	struct tcp_smc_opt *smc_opt;
 	u8 buf[SMC_CLC_MAX_LEN];
 	struct smc_link *link;
 	int reason_code = 0;
@@ -769,7 +931,8 @@ static void smc_listen_work(struct work_struct *work)
 	u8 ibport;
 
 	/* check if peer is smc capable */
-	if (!tcp_sk(newclcsock->sk)->syn_smc) {
+	smc_opt = tcp_smc_opt_find(newclcsock->sk);
+	if (!smc_opt || !smc_opt->smc_ok) {
 		new_smc->use_fallback = true;
 		goto out_connected;
 	}
@@ -962,10 +1125,18 @@ static void smc_tcp_listen_work(struct work_struct *work)
 
 static int smc_listen(struct socket *sock, int backlog)
 {
+	struct tcp_smc_opt *smc_opt;
 	struct sock *sk = sock->sk;
 	struct smc_sock *smc;
 	int rc;
 
+	smc_opt = kzalloc(sizeof(*smc_opt), GFP_KERNEL);
+	if (!smc_opt) {
+		rc = -ENOMEM;
+		goto out_err;
+	}
+	smc_opt->store.ops = &tcp_smc_extra_ops;
+
 	smc = smc_sk(sk);
 	lock_sock(sk);
 
@@ -978,11 +1149,19 @@ static int smc_listen(struct socket *sock, int backlog)
 		sk->sk_max_ack_backlog = backlog;
 		goto out;
 	}
+
+	/* We are the only owner of smc->clcsock->sk, so we can be lockless */
+	rc = tcp_register_extopt(&smc_opt->store, smc->clcsock->sk);
+	if (rc) {
+		release_sock(sk);
+		kfree(smc_opt);
+		goto out_err;
+	}
+
 	/* some socket options are handled in core, so we could not apply
 	 * them to the clc socket -- copy smc socket options to clc socket
 	 */
 	smc_copy_sock_settings_to_clc(smc);
-	tcp_sk(smc->clcsock->sk)->syn_smc = 1;
 
 	rc = kernel_listen(smc->clcsock, backlog);
 	if (rc)
@@ -995,6 +1174,7 @@ static int smc_listen(struct socket *sock, int backlog)
 
 out:
 	release_sock(sk);
+out_err:
 	return rc;
 }
 
@@ -1425,7 +1605,6 @@ static int __init smc_init(void)
 		goto out_sock;
 	}
 
-	static_branch_enable(&tcp_have_smc);
 	return 0;
 
 out_sock:
@@ -1450,7 +1629,6 @@ static void __exit smc_exit(void)
 		list_del_init(&lgr->list);
 		smc_lgr_free(lgr); /* free link group */
 	}
-	static_branch_disable(&tcp_have_smc);
 	smc_ib_unregister_client();
 	sock_unregister(PF_SMC);
 	proto_unregister(&smc_proto);
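
For reference, the 8-byte option that tcp_smc_opt_write() emits, and the
checks that the removed smc_parse_options() applied to it, can be sketched
in user space as below. This is only an illustration under stated
assumptions: the numeric constants are taken from the kernel's
include/net/tcp.h and are not part of this patch, and be32_at(),
smc_opt_encode() and smc_opt_present() are hypothetical helpers. In the
patch itself the magic comparison appears to move into the extra-option
framework, which looks the handler up by option_kind, so
tcp_smc_opt_parse() only checks the SYN flag and the option size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values assumed from include/net/tcp.h; not defined by this patch. */
#define TCPOPT_NOP		1
#define TCPOPT_EXP		254
#define TCPOLEN_EXP_SMC_BASE	6
#define TCPOPT_SMC_MAGIC	0xE2D4C3D9u

/* Hypothetical helper: read a big-endian 32-bit value. */
static uint32_t be32_at(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical helper: write NOP, NOP, EXP, length, then the magic in
 * network byte order. Returns the number of bytes written.
 */
static size_t smc_opt_encode(uint8_t *out)
{
	assert(out);
	out[0] = TCPOPT_NOP;
	out[1] = TCPOPT_NOP;
	out[2] = TCPOPT_EXP;
	out[3] = TCPOLEN_EXP_SMC_BASE;
	out[4] = (TCPOPT_SMC_MAGIC >> 24) & 0xff;
	out[5] = (TCPOPT_SMC_MAGIC >> 16) & 0xff;
	out[6] = (TCPOPT_SMC_MAGIC >> 8) & 0xff;
	out[7] = TCPOPT_SMC_MAGIC & 0xff;
	return 8;
}

/* Hypothetical helper: walk a TCP option block and report whether a valid
 * SMC experimental option is present. Only SYN segments qualify.
 */
static int smc_opt_present(const uint8_t *opts, size_t len, int syn)
{
	size_t i = 0;

	while (i < len) {
		uint8_t kind = opts[i];
		uint8_t opsize;

		if (kind == 0)			/* end of option list */
			break;
		if (kind == TCPOPT_NOP) {	/* one-byte padding */
			i++;
			continue;
		}
		if (i + 1 >= len)
			break;
		opsize = opts[i + 1];
		if (opsize < 2 || i + opsize > len)
			break;			/* malformed option */
		if (kind == TCPOPT_EXP && syn && !(opsize & 1) &&
		    opsize >= TCPOLEN_EXP_SMC_BASE &&
		    be32_at(&opts[i + 2]) == TCPOPT_SMC_MAGIC)
			return 1;
		i += opsize;
	}
	return 0;
}
```

The two leading NOPs pad the 6-byte option to 8 bytes, which matches the
TCPOLEN_EXP_SMC_BASE_ALIGNED amount that tcp_smc_opt_prepare() reserves.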