From patchwork Mon Mar 7 11:05:15 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Valerio X-Patchwork-Id: 1602056 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=JEitYwyu; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.136; helo=smtp3.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from smtp3.osuosl.org (smtp3.osuosl.org [140.211.166.136]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4KBwc764mVz9sFt for ; Mon, 7 Mar 2022 22:05:42 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id 59E0260E41; Mon, 7 Mar 2022 11:05:40 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 1UyTKB0JDDck; Mon, 7 Mar 2022 11:05:37 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [IPv6:2605:bc80:3010:104::8cd3:938]) by smtp3.osuosl.org (Postfix) with ESMTPS id C8833600C7; Mon, 7 Mar 2022 11:05:36 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 8662CC0012; Mon, 7 Mar 2022 11:05:36 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp1.osuosl.org (smtp1.osuosl.org [IPv6:2605:bc80:3010::138]) by lists.linuxfoundation.org (Postfix) with ESMTP id A85D8C000B for ; Mon, 7 Mar 2022 11:05:34 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp1.osuosl.org (Postfix) with ESMTP id 8D7D681BB0 for ; Mon, 7 Mar 2022 11:05:34 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Authentication-Results: smtp1.osuosl.org (amavisd-new); dkim=fail (1024-bit key) reason="fail (body has been altered)" header.d=redhat.com Received: from smtp1.osuosl.org ([127.0.0.1]) by localhost (smtp1.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id D9oNeRA3LB1j for ; Mon, 7 Mar 2022 11:05:32 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by smtp1.osuosl.org (Postfix) with ESMTPS id ECAC381B72 for ; Mon, 7 Mar 2022 11:05:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1646651130; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=eAmNKE5MdWBdJlGZ/cct0Zoy8IcpNd6Cf0jz+NVHzKI=; b=JEitYwyuFL43bK0quDbc86AbFmBW5XCvJEq2jlMhqbepCWKoYG5XPRdnlaWHTUPgmxiWtF NA/+tCHgNfhBjAGmU9gAVekyLvUs1aK03jhIzaookKZUIg24LYlpM3bnYtuwVVdkdZkUi5 LdODCvRHKi3+FME3oktbgJrW+H+icro= Received: from mail-wm1-f72.google.com (mail-wm1-f72.google.com [209.85.128.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-342-rGhg8wMtNOyZg64VKf-TiA-1; Mon, 07 Mar 2022 06:05:25 -0500 X-MC-Unique: rGhg8wMtNOyZg64VKf-TiA-1 Received: by mail-wm1-f72.google.com with SMTP id l13-20020a7bcf0d000000b0038982c6bf8fso4573808wmg.7 for ; Mon, 07 Mar 2022 03:05:25 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:subject:from:to:cc:date:message-id:user-agent :mime-version:content-transfer-encoding; bh=EuATgHbecmQOGeWkVcTLlP6rTbU8Zkm3x/LmnbfE9zM=; b=RgKzX49BkWTaOowZZdA1IL9hEfJB/vv3Me3swomNkblQILzMF1pWbslKlesgFpk4Nt H/ohfmylzq3R6PoIUSCCj48yj1RUUeKyYTgstLmspyH6o2y8lEiYN/l54QA/+chfAKTp yyBzijPg7EYEsCfguiJQmMWgW5g9Up7rVhA/7ZXem8ILjJ837YPqOmot8u5q8DqtJ4uF pc0GyYSxrr7YgO0WO/nnap/uXA04oZ38gC3IhYxibsFouXS3qN47fjrdf89nfxyc1GO3 mBQkxkSHJdP8XKYezdBDjuWH2umraZHZIDv0BGgAytEKVGcI/jheXrvjQZOF6c+/18Nf r1Og== X-Gm-Message-State: AOAM532lGnIkzIq/IKCrRTNxvS736eFCkxcLG3njqRLCUDr15+Iio9qv quJ98GOsD2yb7JJRNC9VUe5LHgg5fjpxUCdlSgogYEvO3iUBh4MUOJCFXAPRE4JCS3vfqdueZQE tSgxWpAV49pPHQ+PlI/CW8oArhDmbBxxZ8RuaXgAKrrlD//iTcFMt9gMWCqugdlJQ X-Received: by 2002:adf:eb87:0:b0:1ef:85f5:6ab4 with SMTP id t7-20020adfeb87000000b001ef85f56ab4mr8413710wrn.158.1646651123676; Mon, 07 Mar 2022 03:05:23 -0800 (PST) X-Google-Smtp-Source: ABdhPJyZe06eFv84A/S7aQEA/lFdjLhsiB3RVoHZKitHZXCJzfKb+hGE0l7HNpQjLBw+pX6hyB5llA== X-Received: by 2002:adf:eb87:0:b0:1ef:85f5:6ab4 with SMTP id t7-20020adfeb87000000b001ef85f56ab4mr8413646wrn.158.1646651122748; Mon, 07 Mar 2022 03:05:22 -0800 (PST) Received: from localhost (net-93-66-41-83.cust.vodafonedsl.it. [93.66.41.83]) by smtp.gmail.com with ESMTPSA id s17-20020adfdb11000000b001f02d5fea43sm11886986wri.98.2022.03.07.03.05.22 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Mar 2022 03:05:22 -0800 (PST) From: Paolo Valerio To: dev@openvswitch.org Date: Mon, 07 Mar 2022 12:05:15 +0100 Message-ID: <164665111586.3858280.2879577107463178457.stgit@fed.void> User-Agent: StGit/1.1 MIME-Version: 1.0 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=pvalerio@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Cc: i.maximets@ovn.org Subject: [ovs-dev] [PATCH v2] conntrack: Replaces nat_conn introducing key directionality. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Peng He Currently, when doing NAT, the userspace conntrack will use an extra conn for the two directions in a flow. However, each conn has actually the two keys for both orig and rev directions. This patch introduces a key_node[CT_DIR_MAX] member in the conn which consists of key and a cmap_node for hash lookup. Both keys can now be accessed in the following way: conn->key_node[CT_DIR_{FWD,REV}].key similarly to what Aaron Conole suggested. This patch avoids the extra allocation for nat_conn, and makes userspace code cleaner. Signed-off-by: Peng He Co-authored-by: Paolo Valerio Signed-off-by: Paolo Valerio Reviewed-by: Gaetan Rivet Acked-by: Aaron Conole --- The patch was initially only part of an RFC series (see the link below), but it makes sense outside of that context, so let's keep it separate. RFC -> v1: added Gaetan's Reviewed-by Link to the RFC: https://patchwork.ozlabs.org/project/openvswitch/patch/163820915323.3001886.1292868316263973958.stgit@fed.void/ Link to the initial patch: https://patchwork.ozlabs.org/project/openvswitch/patch/20201129033255.64647-2-hepeng.0320@bytedance.com/ v2: Rebased against current master --- lib/conntrack-private.h | 20 +- lib/conntrack-tp.c | 6 - lib/conntrack.c | 507 ++++++++++++++++++++--------------------------- 3 files changed, 233 insertions(+), 300 deletions(-) diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h index dfdf4e676..c4af3cbf0 100644 --- a/lib/conntrack-private.h +++ b/lib/conntrack-private.h @@ -48,9 +48,16 @@ struct ct_endpoint { * hashing in ct_endpoint_hash_add(). */ BUILD_ASSERT_DECL(sizeof(struct ct_endpoint) == sizeof(union ct_addr) + 4); +enum key_dir { + CT_DIR_FWD = 0, + CT_DIR_REV, + CT_DIR_MAX, +}; + /* Changes to this structure need to be reflected in conn_key_hash() * and conn_key_cmp(). */ struct conn_key { + enum key_dir dir; struct ct_endpoint src; struct ct_endpoint dst; @@ -86,21 +93,19 @@ struct alg_exp_node { bool nat_rpl_dst; }; -enum OVS_PACKED_ENUM ct_conn_type { - CT_CONN_TYPE_DEFAULT, - CT_CONN_TYPE_UN_NAT, +struct conn_key_node { + struct conn_key key; + struct cmap_node cm_node; }; struct conn { /* Immutable data. */ - struct conn_key key; - struct conn_key rev_key; + struct conn_key_node key_node[CT_DIR_MAX]; struct conn_key parent_key; /* Only used for orig_tuple support. */ struct ovs_list exp_node; - struct cmap_node cm_node; + uint16_t nat_action; char *alg; - struct conn *nat_conn; /* The NAT 'conn' context, if there is one. */ /* Mutable data. */ struct ovs_mutex lock; /* Guards all mutable fields. */ @@ -120,7 +125,6 @@ struct conn { /* Immutable data. */ bool alg_related; /* True if alg data connection. */ - enum ct_conn_type conn_type; uint32_t tp_id; /* Timeout policy ID. */ }; diff --git a/lib/conntrack-tp.c b/lib/conntrack-tp.c index a586d3a8d..2bdda6711 100644 --- a/lib/conntrack-tp.c +++ b/lib/conntrack-tp.c @@ -276,7 +276,8 @@ conn_update_expiration(struct conntrack *ct, struct conn *conn, ovs_mutex_lock(&conn->lock); VLOG_DBG_RL(&rl, "Update timeout %s zone=%u with policy id=%d " "val=%u sec.", - ct_timeout_str[tm], conn->key.zone, conn->tp_id, val); + ct_timeout_str[tm], conn->key_node[CT_DIR_FWD].key.zone, + conn->tp_id, val); conn_update_expiration__(ct, conn, tm, now, val); } @@ -307,7 +308,8 @@ conn_init_expiration(struct conntrack *ct, struct conn *conn, } VLOG_DBG_RL(&rl, "Init timeout %s zone=%u with policy id=%d val=%u sec.", - ct_timeout_str[tm], conn->key.zone, conn->tp_id, val); + ct_timeout_str[tm], conn->key_node[CT_DIR_FWD].key.zone, + conn->tp_id, val); conn_init_expiration__(ct, conn, tm, now, val); } diff --git a/lib/conntrack.c b/lib/conntrack.c index 40690e5f0..f520334a1 100644 --- a/lib/conntrack.c +++ b/lib/conntrack.c @@ -96,7 +96,6 @@ static struct conn *new_conn(struct conntrack *ct, struct dp_packet *pkt, uint32_t tp_id); static void delete_conn_cmn(struct conn *); static void delete_conn(struct conn *); -static void delete_conn_one(struct conn *conn); static enum ct_update_res conn_update(struct conntrack *ct, struct conn *conn, struct dp_packet *pkt, struct conn_lookup_ctx *ctx, @@ -110,8 +109,7 @@ static void set_label(struct dp_packet *, struct conn *, static void *clean_thread_main(void *f_); static bool -nat_get_unique_tuple(struct conntrack *ct, const struct conn *conn, - struct conn *nat_conn, +nat_get_unique_tuple(struct conntrack *ct, struct conn *conn, const struct nat_action_info_t *nat_info); static uint8_t @@ -231,61 +229,6 @@ conn_key_cmp(const struct conn_key *key1, const struct conn_key *key2) return 1; } -static void -ct_print_conn_info(const struct conn *c, const char *log_msg, - enum vlog_level vll, bool force, bool rl_on) -{ -#define CT_VLOG(RL_ON, LEVEL, ...) \ - do { \ - if (RL_ON) { \ - static struct vlog_rate_limit rl_ = VLOG_RATE_LIMIT_INIT(5, 5); \ - vlog_rate_limit(&this_module, LEVEL, &rl_, __VA_ARGS__); \ - } else { \ - vlog(&this_module, LEVEL, __VA_ARGS__); \ - } \ - } while (0) - - if (OVS_UNLIKELY(force || vlog_is_enabled(&this_module, vll))) { - if (c->key.dl_type == htons(ETH_TYPE_IP)) { - CT_VLOG(rl_on, vll, "%s: src ip "IP_FMT" dst ip "IP_FMT" rev src " - "ip "IP_FMT" rev dst ip "IP_FMT" src/dst ports " - "%"PRIu16"/%"PRIu16" rev src/dst ports " - "%"PRIu16"/%"PRIu16" zone/rev zone " - "%"PRIu16"/%"PRIu16" nw_proto/rev nw_proto " - "%"PRIu8"/%"PRIu8, log_msg, - IP_ARGS(c->key.src.addr.ipv4), - IP_ARGS(c->key.dst.addr.ipv4), - IP_ARGS(c->rev_key.src.addr.ipv4), - IP_ARGS(c->rev_key.dst.addr.ipv4), - ntohs(c->key.src.port), ntohs(c->key.dst.port), - ntohs(c->rev_key.src.port), ntohs(c->rev_key.dst.port), - c->key.zone, c->rev_key.zone, c->key.nw_proto, - c->rev_key.nw_proto); - } else { - char ip6_s[INET6_ADDRSTRLEN]; - inet_ntop(AF_INET6, &c->key.src.addr.ipv6, ip6_s, sizeof ip6_s); - char ip6_d[INET6_ADDRSTRLEN]; - inet_ntop(AF_INET6, &c->key.dst.addr.ipv6, ip6_d, sizeof ip6_d); - char ip6_rs[INET6_ADDRSTRLEN]; - inet_ntop(AF_INET6, &c->rev_key.src.addr.ipv6, ip6_rs, - sizeof ip6_rs); - char ip6_rd[INET6_ADDRSTRLEN]; - inet_ntop(AF_INET6, &c->rev_key.dst.addr.ipv6, ip6_rd, - sizeof ip6_rd); - - CT_VLOG(rl_on, vll, "%s: src ip %s dst ip %s rev src ip %s" - " rev dst ip %s src/dst ports %"PRIu16"/%"PRIu16 - " rev src/dst ports %"PRIu16"/%"PRIu16" zone/rev zone " - "%"PRIu16"/%"PRIu16" nw_proto/rev nw_proto " - "%"PRIu8"/%"PRIu8, log_msg, ip6_s, ip6_d, ip6_rs, - ip6_rd, ntohs(c->key.src.port), ntohs(c->key.dst.port), - ntohs(c->rev_key.src.port), ntohs(c->rev_key.dst.port), - c->key.zone, c->rev_key.zone, c->key.nw_proto, - c->rev_key.nw_proto); - } - } -} - /* Initializes the connection tracker 'ct'. The caller is responsible for * calling 'conntrack_destroy()', when the instance is not needed anymore */ struct conntrack * @@ -444,34 +387,28 @@ zone_limit_delete(struct conntrack *ct, uint16_t zone) } static void -conn_clean_cmn(struct conntrack *ct, struct conn *conn) +conn_clean(struct conntrack *ct, struct conn *conn) OVS_REQUIRES(ct->ct_lock) { + struct zone_limit *zl; + uint32_t hash; + if (conn->alg) { - expectation_clean(ct, &conn->key); + expectation_clean(ct, &conn->key_node[CT_DIR_FWD].key); } - uint32_t hash = conn_key_hash(&conn->key, ct->hash_basis); - cmap_remove(&ct->conns, &conn->cm_node, hash); + hash = conn_key_hash(&conn->key_node[CT_DIR_FWD].key, ct->hash_basis); + cmap_remove(&ct->conns, &conn->key_node[CT_DIR_FWD].cm_node, hash); - struct zone_limit *zl = zone_limit_lookup(ct, conn->admit_zone); + zl = zone_limit_lookup(ct, conn->admit_zone); if (zl && zl->czl.zone_limit_seq == conn->zone_limit_seq) { zl->czl.count--; } -} -/* Must be called with 'conn' of 'conn_type' CT_CONN_TYPE_DEFAULT. Also - * removes the associated nat 'conn' from the lookup datastructures. */ -static void -conn_clean(struct conntrack *ct, struct conn *conn) - OVS_REQUIRES(ct->ct_lock) -{ - ovs_assert(conn->conn_type == CT_CONN_TYPE_DEFAULT); - - conn_clean_cmn(ct, conn); - if (conn->nat_conn) { - uint32_t hash = conn_key_hash(&conn->nat_conn->key, ct->hash_basis); - cmap_remove(&ct->conns, &conn->nat_conn->cm_node, hash); + if (conn->nat_action) { + hash = conn_key_hash(&conn->key_node[CT_DIR_REV].key, + ct->hash_basis); + cmap_remove(&ct->conns, &conn->key_node[CT_DIR_REV].cm_node, hash); } ovs_list_remove(&conn->exp_node); conn->cleaned = true; @@ -479,33 +416,26 @@ conn_clean(struct conntrack *ct, struct conn *conn) atomic_count_dec(&ct->n_conn); } -static void -conn_clean_one(struct conntrack *ct, struct conn *conn) - OVS_REQUIRES(ct->ct_lock) -{ - conn_clean_cmn(ct, conn); - if (conn->conn_type == CT_CONN_TYPE_DEFAULT) { - ovs_list_remove(&conn->exp_node); - conn->cleaned = true; - atomic_count_dec(&ct->n_conn); - } - ovsrcu_postpone(delete_conn_one, conn); -} - /* Destroys the connection tracker 'ct' and frees all the allocated memory. * The caller of this function must already have shut down packet input * and PMD threads (which would have been quiesced). */ void conntrack_destroy(struct conntrack *ct) { + struct conn_key_node *keyn; struct conn *conn; + latch_set(&ct->clean_thread_exit); pthread_join(ct->clean_thread, NULL); latch_destroy(&ct->clean_thread_exit); ovs_mutex_lock(&ct->ct_lock); - CMAP_FOR_EACH (conn, cm_node, &ct->conns) { - conn_clean_one(ct, conn); + CMAP_FOR_EACH (keyn, cm_node, &ct->conns) { + if (keyn->key.dir != CT_DIR_FWD) { + continue; + } + conn = CONTAINER_OF(keyn, struct conn, key_node[keyn->key.dir]); + conn_clean(ct, conn); } cmap_destroy(&ct->conns); @@ -544,31 +474,31 @@ conn_key_lookup(struct conntrack *ct, const struct conn_key *key, uint32_t hash, long long now, struct conn **conn_out, bool *reply) { - struct conn *conn; + struct conn_key_node *keyn; + struct conn *conn = NULL; bool found = false; - CMAP_FOR_EACH_WITH_HASH (conn, cm_node, hash, &ct->conns) { - if (!conn_key_cmp(&conn->key, key) && !conn_expired(conn, now)) { - found = true; - if (reply) { - *reply = false; - } - break; - } - if (!conn_key_cmp(&conn->rev_key, key) && !conn_expired(conn, now)) { - found = true; - if (reply) { - *reply = true; + CMAP_FOR_EACH_WITH_HASH (keyn, cm_node, hash, &ct->conns) { + conn = CONTAINER_OF(keyn, struct conn, key_node[keyn->key.dir]); + for (int i = CT_DIR_FWD; i < CT_DIR_MAX; i++) { + if (!conn_key_cmp(&conn->key_node[i].key, key) && + !conn_expired(conn, now)) { + found = true; + if (reply) { + *reply = i; + } + goto out_found; } - break; } } +out_found: if (found && conn_out) { *conn_out = conn; } else if (conn_out) { *conn_out = NULL; } + return found; } @@ -602,7 +532,7 @@ write_ct_md(struct dp_packet *pkt, uint16_t zone, const struct conn *conn, if (conn->alg_related) { key = &conn->parent_key; } else { - key = &conn->key; + key = &conn->key_node[CT_DIR_FWD].key; } } else if (alg_exp) { pkt->md.ct_mark = alg_exp->parent_mark; @@ -725,21 +655,24 @@ handle_alg_ctl(struct conntrack *ct, const struct conn_lookup_ctx *ctx, static void pat_packet(struct dp_packet *pkt, const struct conn *conn) { + const struct conn_key *rev_key = &conn->key_node[CT_DIR_REV].key, + *key = &conn->key_node[CT_DIR_FWD].key; + if (conn->nat_action & NAT_ACTION_SRC) { - if (conn->key.nw_proto == IPPROTO_TCP) { + if (key->nw_proto == IPPROTO_TCP) { struct tcp_header *th = dp_packet_l4(pkt); - packet_set_tcp_port(pkt, conn->rev_key.dst.port, th->tcp_dst); - } else if (conn->key.nw_proto == IPPROTO_UDP) { + packet_set_tcp_port(pkt, rev_key->dst.port, th->tcp_dst); + } else if (key->nw_proto == IPPROTO_UDP) { struct udp_header *uh = dp_packet_l4(pkt); - packet_set_udp_port(pkt, conn->rev_key.dst.port, uh->udp_dst); + packet_set_udp_port(pkt, rev_key->dst.port, uh->udp_dst); } } else if (conn->nat_action & NAT_ACTION_DST) { - if (conn->key.nw_proto == IPPROTO_TCP) { - packet_set_tcp_port(pkt, conn->rev_key.dst.port, - conn->rev_key.src.port); - } else if (conn->key.nw_proto == IPPROTO_UDP) { - packet_set_udp_port(pkt, conn->rev_key.dst.port, - conn->rev_key.src.port); + if (key->nw_proto == IPPROTO_TCP) { + packet_set_tcp_port(pkt, rev_key->dst.port, + rev_key->src.port); + } else if (key->nw_proto == IPPROTO_UDP) { + packet_set_udp_port(pkt, rev_key->dst.port, + rev_key->src.port); } } } @@ -747,32 +680,35 @@ pat_packet(struct dp_packet *pkt, const struct conn *conn) static void nat_packet(struct dp_packet *pkt, const struct conn *conn, bool related) { + const struct conn_key *rev_key = &conn->key_node[CT_DIR_REV].key, + *key = &conn->key_node[CT_DIR_FWD].key; + if (conn->nat_action & NAT_ACTION_SRC) { pkt->md.ct_state |= CS_SRC_NAT; - if (conn->key.dl_type == htons(ETH_TYPE_IP)) { + if (key->dl_type == htons(ETH_TYPE_IP)) { struct ip_header *nh = dp_packet_l3(pkt); packet_set_ipv4_addr(pkt, &nh->ip_src, - conn->rev_key.dst.addr.ipv4); + rev_key->dst.addr.ipv4); } else { struct ovs_16aligned_ip6_hdr *nh6 = dp_packet_l3(pkt); - packet_set_ipv6_addr(pkt, conn->key.nw_proto, + packet_set_ipv6_addr(pkt, key->nw_proto, nh6->ip6_src.be32, - &conn->rev_key.dst.addr.ipv6, true); + &rev_key->dst.addr.ipv6, true); } if (!related) { pat_packet(pkt, conn); } } else if (conn->nat_action & NAT_ACTION_DST) { pkt->md.ct_state |= CS_DST_NAT; - if (conn->key.dl_type == htons(ETH_TYPE_IP)) { + if (key->dl_type == htons(ETH_TYPE_IP)) { struct ip_header *nh = dp_packet_l3(pkt); packet_set_ipv4_addr(pkt, &nh->ip_dst, - conn->rev_key.src.addr.ipv4); + rev_key->src.addr.ipv4); } else { struct ovs_16aligned_ip6_hdr *nh6 = dp_packet_l3(pkt); - packet_set_ipv6_addr(pkt, conn->key.nw_proto, + packet_set_ipv6_addr(pkt, key->nw_proto, nh6->ip6_dst.be32, - &conn->rev_key.src.addr.ipv6, true); + &rev_key->src.addr.ipv6, true); } if (!related) { pat_packet(pkt, conn); @@ -783,19 +719,21 @@ nat_packet(struct dp_packet *pkt, const struct conn *conn, bool related) static void un_pat_packet(struct dp_packet *pkt, const struct conn *conn) { + const struct conn_key *key = &conn->key_node[CT_DIR_FWD].key; + if (conn->nat_action & NAT_ACTION_SRC) { - if (conn->key.nw_proto == IPPROTO_TCP) { + if (key->nw_proto == IPPROTO_TCP) { struct tcp_header *th = dp_packet_l4(pkt); - packet_set_tcp_port(pkt, th->tcp_src, conn->key.src.port); - } else if (conn->key.nw_proto == IPPROTO_UDP) { + packet_set_tcp_port(pkt, th->tcp_src, key->src.port); + } else if (key->nw_proto == IPPROTO_UDP) { struct udp_header *uh = dp_packet_l4(pkt); - packet_set_udp_port(pkt, uh->udp_src, conn->key.src.port); + packet_set_udp_port(pkt, uh->udp_src, key->src.port); } } else if (conn->nat_action & NAT_ACTION_DST) { - if (conn->key.nw_proto == IPPROTO_TCP) { - packet_set_tcp_port(pkt, conn->key.dst.port, conn->key.src.port); - } else if (conn->key.nw_proto == IPPROTO_UDP) { - packet_set_udp_port(pkt, conn->key.dst.port, conn->key.src.port); + if (key->nw_proto == IPPROTO_TCP) { + packet_set_tcp_port(pkt, key->dst.port, key->src.port); + } else if (key->nw_proto == IPPROTO_UDP) { + packet_set_udp_port(pkt, key->dst.port, key->src.port); } } } @@ -803,23 +741,25 @@ un_pat_packet(struct dp_packet *pkt, const struct conn *conn) static void reverse_pat_packet(struct dp_packet *pkt, const struct conn *conn) { + const struct conn_key *key = &conn->key_node[CT_DIR_FWD].key; + if (conn->nat_action & NAT_ACTION_SRC) { - if (conn->key.nw_proto == IPPROTO_TCP) { + if (key->nw_proto == IPPROTO_TCP) { struct tcp_header *th_in = dp_packet_l4(pkt); - packet_set_tcp_port(pkt, conn->key.src.port, + packet_set_tcp_port(pkt, key->src.port, th_in->tcp_dst); - } else if (conn->key.nw_proto == IPPROTO_UDP) { + } else if (key->nw_proto == IPPROTO_UDP) { struct udp_header *uh_in = dp_packet_l4(pkt); - packet_set_udp_port(pkt, conn->key.src.port, + packet_set_udp_port(pkt, key->src.port, uh_in->udp_dst); } } else if (conn->nat_action & NAT_ACTION_DST) { - if (conn->key.nw_proto == IPPROTO_TCP) { - packet_set_tcp_port(pkt, conn->key.src.port, - conn->key.dst.port); - } else if (conn->key.nw_proto == IPPROTO_UDP) { - packet_set_udp_port(pkt, conn->key.src.port, - conn->key.dst.port); + if (key->nw_proto == IPPROTO_TCP) { + packet_set_tcp_port(pkt, key->src.port, + key->dst.port); + } else if (key->nw_proto == IPPROTO_UDP) { + packet_set_udp_port(pkt, key->src.port, + key->dst.port); } } } @@ -827,6 +767,7 @@ reverse_pat_packet(struct dp_packet *pkt, const struct conn *conn) static void reverse_nat_packet(struct dp_packet *pkt, const struct conn *conn) { + const struct conn_key *key = &conn->key_node[CT_DIR_FWD].key; char *tail = dp_packet_tail(pkt); uint16_t pad = dp_packet_l2_pad_size(pkt); struct conn_key inner_key; @@ -834,7 +775,7 @@ reverse_nat_packet(struct dp_packet *pkt, const struct conn *conn) uint16_t orig_l3_ofs = pkt->l3_ofs; uint16_t orig_l4_ofs = pkt->l4_ofs; - if (conn->key.dl_type == htons(ETH_TYPE_IP)) { + if (key->dl_type == htons(ETH_TYPE_IP)) { struct ip_header *nh = dp_packet_l3(pkt); struct icmp_header *icmp = dp_packet_l4(pkt); struct ip_header *inner_l3 = (struct ip_header *) (icmp + 1); @@ -847,10 +788,10 @@ reverse_nat_packet(struct dp_packet *pkt, const struct conn *conn) if (conn->nat_action & NAT_ACTION_SRC) { packet_set_ipv4_addr(pkt, &inner_l3->ip_src, - conn->key.src.addr.ipv4); + key->src.addr.ipv4); } else if (conn->nat_action & NAT_ACTION_DST) { packet_set_ipv4_addr(pkt, &inner_l3->ip_dst, - conn->key.dst.addr.ipv4); + key->dst.addr.ipv4); } reverse_pat_packet(pkt, conn); @@ -870,13 +811,13 @@ reverse_nat_packet(struct dp_packet *pkt, const struct conn *conn) pkt->l4_ofs += inner_l4 - (char *) icmp6; if (conn->nat_action & NAT_ACTION_SRC) { - packet_set_ipv6_addr(pkt, conn->key.nw_proto, + packet_set_ipv6_addr(pkt, key->nw_proto, inner_l3_6->ip6_src.be32, - &conn->key.src.addr.ipv6, true); + &key->src.addr.ipv6, true); } else if (conn->nat_action & NAT_ACTION_DST) { - packet_set_ipv6_addr(pkt, conn->key.nw_proto, + packet_set_ipv6_addr(pkt, key->nw_proto, inner_l3_6->ip6_dst.be32, - &conn->key.dst.addr.ipv6, true); + &key->dst.addr.ipv6, true); } reverse_pat_packet(pkt, conn); icmp6->icmp6_base.icmp6_cksum = 0; @@ -891,17 +832,19 @@ static void un_nat_packet(struct dp_packet *pkt, const struct conn *conn, bool related) { + const struct conn_key *key = &conn->key_node[CT_DIR_FWD].key; + if (conn->nat_action & NAT_ACTION_SRC) { pkt->md.ct_state |= CS_DST_NAT; - if (conn->key.dl_type == htons(ETH_TYPE_IP)) { + if (key->dl_type == htons(ETH_TYPE_IP)) { struct ip_header *nh = dp_packet_l3(pkt); packet_set_ipv4_addr(pkt, &nh->ip_dst, - conn->key.src.addr.ipv4); + key->src.addr.ipv4); } else { struct ovs_16aligned_ip6_hdr *nh6 = dp_packet_l3(pkt); - packet_set_ipv6_addr(pkt, conn->key.nw_proto, + packet_set_ipv6_addr(pkt, key->nw_proto, nh6->ip6_dst.be32, - &conn->key.src.addr.ipv6, true); + &key->src.addr.ipv6, true); } if (OVS_UNLIKELY(related)) { @@ -911,15 +854,15 @@ un_nat_packet(struct dp_packet *pkt, const struct conn *conn, } } else if (conn->nat_action & NAT_ACTION_DST) { pkt->md.ct_state |= CS_SRC_NAT; - if (conn->key.dl_type == htons(ETH_TYPE_IP)) { + if (key->dl_type == htons(ETH_TYPE_IP)) { struct ip_header *nh = dp_packet_l3(pkt); packet_set_ipv4_addr(pkt, &nh->ip_src, - conn->key.dst.addr.ipv4); + key->dst.addr.ipv4); } else { struct ovs_16aligned_ip6_hdr *nh6 = dp_packet_l3(pkt); - packet_set_ipv6_addr(pkt, conn->key.nw_proto, + packet_set_ipv6_addr(pkt, key->nw_proto, nh6->ip6_src.be32, - &conn->key.dst.addr.ipv6, true); + &key->dst.addr.ipv6, true); } if (OVS_UNLIKELY(related)) { @@ -937,7 +880,7 @@ conn_seq_skew_set(struct conntrack *ct, const struct conn *conn_in, { struct conn *conn; ovs_mutex_unlock(&conn_in->lock); - conn_lookup(ct, &conn_in->key, now, &conn, NULL); + conn_lookup(ct, &conn_in->key_node[CT_DIR_FWD].key, now, &conn, NULL); ovs_mutex_lock(&conn_in->lock); if (conn && seq_skew) { @@ -975,7 +918,7 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt, OVS_REQUIRES(ct->ct_lock) { struct conn *nc = NULL; - struct conn *nat_conn = NULL; + uint32_t rev_hash = ctx->hash; if (!valid_new(pkt, &ctx->key)) { pkt->md.ct_state = CS_INVALID; @@ -989,6 +932,8 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt, } if (commit) { + struct conn_key_node *fwd_key_node, *rev_key_node; + struct zone_limit *zl = zone_limit_lookup_or_default(ct, ctx->key.zone); if (zl && zl->czl.count >= zl->czl.limit) { @@ -1003,9 +948,12 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt, } nc = new_conn(ct, pkt, &ctx->key, now, tp_id); - memcpy(&nc->key, &ctx->key, sizeof nc->key); - memcpy(&nc->rev_key, &nc->key, sizeof nc->rev_key); - conn_key_reverse(&nc->rev_key); + fwd_key_node = &nc->key_node[CT_DIR_FWD]; + rev_key_node = &nc->key_node[CT_DIR_REV]; + memcpy(&fwd_key_node->key, &ctx->key, sizeof fwd_key_node->key); + memcpy(&rev_key_node->key, &fwd_key_node->key, + sizeof rev_key_node->key); + conn_key_reverse(&rev_key_node->key); if (ct_verify_helper(helper, ct_alg_ctl)) { nc->alg = nullable_xstrdup(helper); @@ -1020,45 +968,33 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt, if (nat_action_info) { nc->nat_action = nat_action_info->nat_action; - nat_conn = xzalloc(sizeof *nat_conn); if (alg_exp) { if (alg_exp->nat_rpl_dst) { - nc->rev_key.dst.addr = alg_exp->alg_nat_repl_addr; + rev_key_node->key.dst.addr = alg_exp->alg_nat_repl_addr; nc->nat_action = NAT_ACTION_SRC; } else { - nc->rev_key.src.addr = alg_exp->alg_nat_repl_addr; + rev_key_node->key.src.addr = alg_exp->alg_nat_repl_addr; nc->nat_action = NAT_ACTION_DST; } } else { - memcpy(nat_conn, nc, sizeof *nat_conn); - bool nat_res = nat_get_unique_tuple(ct, nc, nat_conn, - nat_action_info); + bool nat_res = nat_get_unique_tuple(ct, nc, nat_action_info); if (!nat_res) { goto nat_res_exhaustion; } - - /* Update nc with nat adjustments made to nat_conn by - * nat_get_unique_tuple(). */ - memcpy(nc, nat_conn, sizeof *nc); } nat_packet(pkt, nc, ctx->icmp_related); - memcpy(&nat_conn->key, &nc->rev_key, sizeof nat_conn->key); - memcpy(&nat_conn->rev_key, &nc->key, sizeof nat_conn->rev_key); - nat_conn->conn_type = CT_CONN_TYPE_UN_NAT; - nat_conn->nat_action = 0; - nat_conn->alg = NULL; - nat_conn->nat_conn = NULL; - uint32_t nat_hash = conn_key_hash(&nat_conn->key, ct->hash_basis); - cmap_insert(&ct->conns, &nat_conn->cm_node, nat_hash); + rev_hash = conn_key_hash(&rev_key_node->key, ct->hash_basis); + rev_key_node->key.dir = CT_DIR_REV; + cmap_insert(&ct->conns, &rev_key_node->cm_node, rev_hash); } - nc->nat_conn = nat_conn; ovs_mutex_init_adaptive(&nc->lock); - nc->conn_type = CT_CONN_TYPE_DEFAULT; - cmap_insert(&ct->conns, &nc->cm_node, ctx->hash); + fwd_key_node->key.dir = CT_DIR_FWD; + cmap_insert(&ct->conns, &fwd_key_node->cm_node, ctx->hash); + atomic_count_inc(&ct->n_conn); ctx->conn = nc; /* For completeness. */ if (zl) { @@ -1078,7 +1014,6 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt, * firewall rules or a separate firewall. Also using zone partitioning * can limit DoS impact. */ nat_res_exhaustion: - free(nat_conn); ovs_list_remove(&nc->exp_node); delete_conn_cmn(nc); static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 5); @@ -1092,7 +1027,6 @@ conn_update_state(struct conntrack *ct, struct dp_packet *pkt, struct conn_lookup_ctx *ctx, struct conn *conn, long long now) { - ovs_assert(conn->conn_type == CT_CONN_TYPE_DEFAULT); bool create_new_conn = false; if (ctx->icmp_related) { @@ -1120,7 +1054,8 @@ conn_update_state(struct conntrack *ct, struct dp_packet *pkt, break; case CT_UPDATE_NEW: ovs_mutex_lock(&ct->ct_lock); - if (conn_lookup(ct, &conn->key, now, NULL, NULL)) { + if (conn_lookup(ct, &conn->key_node[CT_DIR_FWD].key, + now, NULL, NULL)) { conn_clean(ct, conn); } ovs_mutex_unlock(&ct->ct_lock); @@ -1301,11 +1236,12 @@ initial_conn_lookup(struct conntrack *ct, struct conn_lookup_ctx *ctx, if (natted) { if (OVS_LIKELY(ctx->conn)) { ctx->reply = !ctx->reply; - ctx->key = ctx->reply ? ctx->conn->rev_key : ctx->conn->key; + ctx->key = ctx->conn->key_node[ctx->reply ? + CT_DIR_REV : CT_DIR_FWD].key; ctx->hash = conn_key_hash(&ctx->key, ct->hash_basis); } else { /* A lookup failure does not necessarily imply that an - * error occurred, it may simply indicate that a conn got + * error occurred, it may simply indicate that a ctx->conn got * removed during the recirculation. */ COVERAGE_INC(conntrack_lookup_natted_miss); conn_key_reverse(&ctx->key); @@ -1335,32 +1271,14 @@ process_one(struct conntrack *ct, struct dp_packet *pkt, /* Delete found entry if in wrong direction. 'force' implies commit. */ if (OVS_UNLIKELY(force && ctx->reply && conn)) { ovs_mutex_lock(&ct->ct_lock); - if (conn_lookup(ct, &conn->key, now, NULL, NULL)) { + if (conn_lookup(ct, &conn->key_node[CT_DIR_FWD].key, + now, NULL, NULL)) { conn_clean(ct, conn); } ovs_mutex_unlock(&ct->ct_lock); conn = NULL; } - if (OVS_LIKELY(conn)) { - if (conn->conn_type == CT_CONN_TYPE_UN_NAT) { - - ctx->reply = true; - struct conn *rev_conn = conn; /* Save for debugging. */ - uint32_t hash = conn_key_hash(&conn->rev_key, ct->hash_basis); - conn_key_lookup(ct, &ctx->key, hash, now, &conn, &ctx->reply); - - if (!conn) { - pkt->md.ct_state |= CS_INVALID; - write_ct_md(pkt, zone, NULL, NULL, NULL); - char *log_msg = xasprintf("Missing parent conn %p", rev_conn); - ct_print_conn_info(rev_conn, log_msg, VLL_INFO, true, true); - free(log_msg); - return; - } - } - } - enum ct_alg_ctl_type ct_alg_ctl = get_alg_ctl_type(pkt, tp_src, tp_dst, helper); @@ -1453,7 +1371,8 @@ conntrack_execute(struct conntrack *ct, struct dp_packet_batch *pkt_batch, struct conn *conn = packet->md.conn; if (OVS_UNLIKELY(packet->md.ct_state == CS_INVALID)) { write_ct_md(packet, zone, NULL, NULL, NULL); - } else if (conn && conn->key.zone == zone && !force + } else if (conn && + conn->key_node[CT_DIR_FWD].key.zone == zone && !force && !get_alg_ctl_type(packet, tp_src, tp_dst, helper)) { process_one_fast(zone, setmark, setlabel, nat_action_info, conn, packet); @@ -2234,7 +2153,7 @@ nat_ipv6_addr_increment(struct in6_addr *ipv6, uint32_t increment) } static uint32_t -nat_range_hash(const struct conn *conn, uint32_t basis, +nat_range_hash(struct conn_key *key, uint32_t basis, const struct nat_action_info_t *nat_info) { uint32_t hash = basis; @@ -2244,11 +2163,11 @@ nat_range_hash(const struct conn *conn, uint32_t basis, hash = hash_add(hash, (nat_info->max_port << 16) | nat_info->min_port); - hash = ct_endpoint_hash_add(hash, &conn->key.src); - hash = ct_endpoint_hash_add(hash, &conn->key.dst); - hash = hash_add(hash, (OVS_FORCE uint32_t) conn->key.dl_type); - hash = hash_add(hash, conn->key.nw_proto); - hash = hash_add(hash, conn->key.zone); + hash = ct_endpoint_hash_add(hash, &key->src); + hash = ct_endpoint_hash_add(hash, &key->dst); + hash = hash_add(hash, (OVS_FORCE uint32_t) key->dl_type); + hash = hash_add(hash, key->nw_proto); + hash = hash_add(hash, key->zone); /* The purpose of the second parameter is to distinguish hashes of data of * different length; our data always has the same length so there is no @@ -2322,7 +2241,7 @@ get_addr_in_range(union ct_addr *min, union ct_addr *max, } static void -get_initial_addr(const struct conn *conn, union ct_addr *min, +get_initial_addr(struct conn_key *key, union ct_addr *min, union ct_addr *max, union ct_addr *curr, uint32_t hash, bool ipv4, const struct nat_action_info_t *nat_info) @@ -2332,9 +2251,9 @@ get_initial_addr(const struct conn *conn, union ct_addr *min, /* All-zero case. */ if (!memcmp(min, &zero_ip, sizeof *min)) { if (nat_info->nat_action & NAT_ACTION_SRC) { - *curr = conn->key.src.addr; + *curr = key->src.addr; } else if (nat_info->nat_action & NAT_ACTION_DST) { - *curr = conn->key.dst.addr; + *curr = key->dst.addr; } } else { get_addr_in_range(min, max, curr, hash, ipv4); @@ -2398,7 +2317,7 @@ next_addr_in_range_guarded(union ct_addr *curr, union ct_addr *min, } static bool -nat_get_unique_l4(struct conntrack *ct, struct conn *nat_conn, +nat_get_unique_l4(struct conntrack *ct, struct conn_key *rev_key, ovs_be16 *port, uint16_t curr, uint16_t min, uint16_t max) { @@ -2406,8 +2325,8 @@ nat_get_unique_l4(struct conntrack *ct, struct conn *nat_conn, FOR_EACH_PORT_IN_RANGE (curr, min, max) { *port = htons(curr); - if (!conn_lookup(ct, &nat_conn->rev_key, - time_msec(), NULL, NULL)) { + if (!conn_lookup(ct, rev_key, time_msec(), + NULL, NULL)) { return true; } } @@ -2439,44 +2358,47 @@ nat_get_unique_l4(struct conntrack *ct, struct conn *nat_conn, * * If none can be found, return exhaustion to the caller. */ static bool -nat_get_unique_tuple(struct conntrack *ct, const struct conn *conn, - struct conn *nat_conn, +nat_get_unique_tuple(struct conntrack *ct, struct conn *conn, const struct nat_action_info_t *nat_info) { union ct_addr min_addr = {0}, max_addr = {0}, curr_addr = {0}, guard_addr = {0}; - uint32_t hash = nat_range_hash(conn, ct->hash_basis, nat_info); - bool pat_proto = conn->key.nw_proto == IPPROTO_TCP || - conn->key.nw_proto == IPPROTO_UDP; + struct conn_key *fwd_key = &conn->key_node[CT_DIR_FWD].key, + *rev_key = &conn->key_node[CT_DIR_REV].key; + bool pat_proto = fwd_key->nw_proto == IPPROTO_TCP || + fwd_key->nw_proto == IPPROTO_UDP; uint16_t min_dport, max_dport, curr_dport; uint16_t min_sport, max_sport, curr_sport; + uint32_t hash; + + hash = nat_range_hash(fwd_key, ct->hash_basis, nat_info); min_addr = nat_info->min_addr; max_addr = nat_info->max_addr; - get_initial_addr(conn, &min_addr, &max_addr, &curr_addr, hash, - (conn->key.dl_type == htons(ETH_TYPE_IP)), nat_info); + get_initial_addr(fwd_key, &min_addr, &max_addr, &curr_addr, hash, + (fwd_key->dl_type == htons(ETH_TYPE_IP)), nat_info); /* Save the address we started from so that * we can stop once we reach it. */ guard_addr = curr_addr; - set_sport_range(nat_info, &conn->key, hash, &curr_sport, + set_sport_range(nat_info, fwd_key, hash, &curr_sport, &min_sport, &max_sport); - set_dport_range(nat_info, &conn->key, hash, &curr_dport, + set_dport_range(nat_info, fwd_key, hash, &curr_dport, &min_dport, &max_dport); if (pat_proto) { - nat_conn->rev_key.src.port = htons(curr_dport); - nat_conn->rev_key.dst.port = htons(curr_sport); + rev_key->src.port = htons(curr_dport); + rev_key->dst.port = htons(curr_sport); } another_round: - store_addr_to_key(&curr_addr, &nat_conn->rev_key, + store_addr_to_key(&curr_addr, rev_key, nat_info->nat_action); if (!pat_proto) { - if (!conn_lookup(ct, &nat_conn->rev_key, + if (!conn_lookup(ct, rev_key, time_msec(), NULL, NULL)) { return true; } @@ -2486,12 +2408,12 @@ another_round: bool found = false; if (nat_info->nat_action & NAT_ACTION_DST_PORT) { - found = nat_get_unique_l4(ct, nat_conn, &nat_conn->rev_key.src.port, + found = nat_get_unique_l4(ct, rev_key, &rev_key->src.port, curr_dport, min_dport, max_dport); } if (!found) { - found = nat_get_unique_l4(ct, nat_conn, &nat_conn->rev_key.dst.port, + found = nat_get_unique_l4(ct, rev_key, &rev_key->dst.port, curr_sport, min_sport, max_sport); } @@ -2504,7 +2426,7 @@ another_round: next_addr: if (next_addr_in_range_guarded(&curr_addr, &min_addr, &max_addr, &guard_addr, - conn->key.dl_type == htons(ETH_TYPE_IP))) { + fwd_key->dl_type == htons(ETH_TYPE_IP))) { return false; } @@ -2516,9 +2438,9 @@ conn_update(struct conntrack *ct, struct conn *conn, struct dp_packet *pkt, struct conn_lookup_ctx *ctx, long long now) { ovs_mutex_lock(&conn->lock); + uint8_t nw_proto = conn->key_node[CT_DIR_FWD].key.nw_proto; enum ct_update_res update_res = - l4_protos[conn->key.nw_proto]->conn_update(ct, conn, pkt, ctx->reply, - now); + l4_protos[nw_proto]->conn_update(ct, conn, pkt, ctx->reply, now); ovs_mutex_unlock(&conn->lock); return update_res; } @@ -2526,13 +2448,10 @@ conn_update(struct conntrack *ct, struct conn *conn, struct dp_packet *pkt, static bool conn_expired(struct conn *conn, long long now) { - if (conn->conn_type == CT_CONN_TYPE_DEFAULT) { - ovs_mutex_lock(&conn->lock); - bool expired = now >= conn->expiration ? true : false; - ovs_mutex_unlock(&conn->lock); - return expired; - } - return false; + ovs_mutex_lock(&conn->lock); + bool expired = now >= conn->expiration ? true : false; + ovs_mutex_unlock(&conn->lock); + return expired; } static bool @@ -2558,19 +2477,7 @@ delete_conn_cmn(struct conn *conn) static void delete_conn(struct conn *conn) { - ovs_assert(conn->conn_type == CT_CONN_TYPE_DEFAULT); ovs_mutex_destroy(&conn->lock); - free(conn->nat_conn); - delete_conn_cmn(conn); -} - -/* Only used by conn_clean_one(). */ -static void -delete_conn_one(struct conn *conn) -{ - if (conn->conn_type == CT_CONN_TYPE_DEFAULT) { - ovs_mutex_destroy(&conn->lock); - } delete_conn_cmn(conn); } @@ -2662,11 +2569,14 @@ static void conn_to_ct_dpif_entry(const struct conn *conn, struct ct_dpif_entry *entry, long long now) { + const struct conn_key *key = &conn->key_node[CT_DIR_FWD].key, + *rev_key = &conn->key_node[CT_DIR_REV].key; + memset(entry, 0, sizeof *entry); - conn_key_to_tuple(&conn->key, &entry->tuple_orig); - conn_key_to_tuple(&conn->rev_key, &entry->tuple_reply); + conn_key_to_tuple(key, &entry->tuple_orig); + conn_key_to_tuple(rev_key, &entry->tuple_reply); - entry->zone = conn->key.zone; + entry->zone = key->zone; ovs_mutex_lock(&conn->lock); entry->mark = conn->mark; @@ -2674,7 +2584,7 @@ conn_to_ct_dpif_entry(const struct conn *conn, struct ct_dpif_entry *entry, long long expiration = conn->expiration - now; - struct ct_l4_proto *class = l4_protos[conn->key.nw_proto]; + struct ct_l4_proto *class = l4_protos[key->nw_proto]; if (class->conn_get_protoinfo) { class->conn_get_protoinfo(conn, &entry->protoinfo); } @@ -2722,10 +2632,12 @@ conntrack_dump_next(struct conntrack_dump *dump, struct ct_dpif_entry *entry) if (!cm_node) { break; } + struct conn_key_node *keyn; struct conn *conn; - INIT_CONTAINER(conn, cm_node, cm_node); - if ((!dump->filter_zone || conn->key.zone == dump->zone) && - (conn->conn_type != CT_CONN_TYPE_UN_NAT)) { + INIT_CONTAINER(keyn, cm_node, cm_node); + conn = CONTAINER_OF(keyn, struct conn, key_node[keyn->key.dir]); + if ((!dump->filter_zone || keyn->key.zone == dump->zone) && + (keyn->key.dir == CT_DIR_FWD)) { conn_to_ct_dpif_entry(conn, entry, now); return 0; } @@ -2743,12 +2655,17 @@ conntrack_dump_done(struct conntrack_dump *dump OVS_UNUSED) int conntrack_flush(struct conntrack *ct, const uint16_t *zone) { + struct conn_key_node *keyn; struct conn *conn; ovs_mutex_lock(&ct->ct_lock); - CMAP_FOR_EACH (conn, cm_node, &ct->conns) { - if (!zone || *zone == conn->key.zone) { - conn_clean_one(ct, conn); + CMAP_FOR_EACH (keyn, cm_node, &ct->conns) { + if (keyn->key.dir != CT_DIR_FWD) { + continue; + } + conn = CONTAINER_OF(keyn, struct conn, key_node[keyn->key.dir]); + if (!zone || *zone == keyn->key.zone) { + conn_clean(ct, conn); } } ovs_mutex_unlock(&ct->ct_lock); @@ -2769,10 +2686,10 @@ conntrack_flush_tuple(struct conntrack *ct, const struct ct_dpif_tuple *tuple, ovs_mutex_lock(&ct->ct_lock); conn_lookup(ct, &key, time_msec(), &conn, NULL); - if (conn && conn->conn_type == CT_CONN_TYPE_DEFAULT) { + if (conn) { conn_clean(ct, conn); } else { - VLOG_WARN("Must flush tuple using the original pre-NATed tuple"); + VLOG_WARN("Tuple not found"); error = ENOENT; } @@ -2916,50 +2833,52 @@ expectation_create(struct conntrack *ct, ovs_be16 dst_port, const struct conn *parent_conn, bool reply, bool src_ip_wc, bool skip_nat) { + const struct conn_key *pconn_key = &parent_conn->key_node[CT_DIR_FWD].key, + *pconn_rev_key = &parent_conn->key_node[CT_DIR_REV].key; union ct_addr src_addr; union ct_addr dst_addr; union ct_addr alg_nat_repl_addr; struct alg_exp_node *alg_exp_node = xzalloc(sizeof *alg_exp_node); if (reply) { - src_addr = parent_conn->key.src.addr; - dst_addr = parent_conn->key.dst.addr; + src_addr = pconn_key->src.addr; + dst_addr = pconn_key->dst.addr; alg_exp_node->nat_rpl_dst = true; if (skip_nat) { alg_nat_repl_addr = dst_addr; } else if (parent_conn->nat_action & NAT_ACTION_DST) { - alg_nat_repl_addr = parent_conn->rev_key.src.addr; + alg_nat_repl_addr = pconn_rev_key->src.addr; alg_exp_node->nat_rpl_dst = false; } else { - alg_nat_repl_addr = parent_conn->rev_key.dst.addr; + alg_nat_repl_addr = pconn_rev_key->dst.addr; } } else { - src_addr = parent_conn->rev_key.src.addr; - dst_addr = parent_conn->rev_key.dst.addr; + src_addr = pconn_rev_key->src.addr; + dst_addr = pconn_rev_key->dst.addr; alg_exp_node->nat_rpl_dst = false; if (skip_nat) { alg_nat_repl_addr = src_addr; } else if (parent_conn->nat_action & NAT_ACTION_DST) { - alg_nat_repl_addr = parent_conn->key.dst.addr; + alg_nat_repl_addr = pconn_key->dst.addr; alg_exp_node->nat_rpl_dst = true; } else { - alg_nat_repl_addr = parent_conn->key.src.addr; + alg_nat_repl_addr = pconn_key->src.addr; } } if (src_ip_wc) { memset(&src_addr, 0, sizeof src_addr); } - alg_exp_node->key.dl_type = parent_conn->key.dl_type; - alg_exp_node->key.nw_proto = parent_conn->key.nw_proto; - alg_exp_node->key.zone = parent_conn->key.zone; + alg_exp_node->key.dl_type = pconn_key->dl_type; + alg_exp_node->key.nw_proto = pconn_key->nw_proto; + alg_exp_node->key.zone = pconn_key->zone; alg_exp_node->key.src.addr = src_addr; alg_exp_node->key.dst.addr = dst_addr; alg_exp_node->key.src.port = ALG_WC_SRC_PORT; alg_exp_node->key.dst.port = dst_port; alg_exp_node->parent_mark = parent_conn->mark; alg_exp_node->parent_label = parent_conn->label; - memcpy(&alg_exp_node->parent_key, &parent_conn->key, + memcpy(&alg_exp_node->parent_key, pconn_key, sizeof alg_exp_node->parent_key); /* Take the write lock here because it is almost 100% * likely that the lookup will fail and @@ -3211,12 +3130,16 @@ process_ftp_ctl_v4(struct conntrack *ct, switch (mode) { case CT_FTP_MODE_ACTIVE: - *v4_addr_rep = conn_for_expectation->rev_key.dst.addr.ipv4; - conn_ipv4_addr = conn_for_expectation->key.src.addr.ipv4; + *v4_addr_rep = + conn_for_expectation->key_node[CT_DIR_REV].key.dst.addr.ipv4; + conn_ipv4_addr = + conn_for_expectation->key_node[CT_DIR_FWD].key.src.addr.ipv4; break; case CT_FTP_MODE_PASSIVE: - *v4_addr_rep = conn_for_expectation->key.dst.addr.ipv4; - conn_ipv4_addr = conn_for_expectation->rev_key.src.addr.ipv4; + *v4_addr_rep = + conn_for_expectation->key_node[CT_DIR_FWD].key.dst.addr.ipv4; + conn_ipv4_addr = + conn_for_expectation->key_node[CT_DIR_REV].key.src.addr.ipv4; break; case CT_TFTP_MODE: default: @@ -3316,17 +3239,20 @@ process_ftp_ctl_v6(struct conntrack *ct, switch (*mode) { case CT_FTP_MODE_ACTIVE: - *v6_addr_rep = conn_for_expectation->rev_key.dst.addr; + *v6_addr_rep = + conn_for_expectation->key_node[CT_DIR_REV].key.dst.addr; /* Although most servers will block this exploit, there may be some * less well managed. */ if (memcmp(&ip6_addr, &v6_addr_rep->ipv6, sizeof ip6_addr) && - memcmp(&ip6_addr, &conn_for_expectation->key.src.addr.ipv6, + memcmp(&ip6_addr, + &conn_for_expectation->key_node[CT_DIR_FWD].key.src.addr.ipv6, sizeof ip6_addr)) { return CT_FTP_CTL_INVALID; } break; case CT_FTP_MODE_PASSIVE: - *v6_addr_rep = conn_for_expectation->key.dst.addr; + *v6_addr_rep = + conn_for_expectation->key_node[CT_DIR_FWD].key.dst.addr; break; case CT_TFTP_MODE: default: @@ -3487,7 +3413,8 @@ handle_tftp_ctl(struct conntrack *ct, long long now OVS_UNUSED, enum ftp_ctl_pkt ftp_ctl OVS_UNUSED, bool nat OVS_UNUSED) { - expectation_create(ct, conn_for_expectation->key.src.port, + expectation_create(ct, + conn_for_expectation->key_node[CT_DIR_FWD].key.src.port, conn_for_expectation, !!(pkt->md.ct_state & CS_REPLY_DIR), false, false); }