From patchwork Fri Jul 1 13:34:51 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Valerio X-Patchwork-Id: 1651202 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=fZr0Mf+c; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::138; helo=smtp1.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from smtp1.osuosl.org (smtp1.osuosl.org [IPv6:2605:bc80:3010::138]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4LZGQt6bxNz9s09 for ; Fri, 1 Jul 2022 23:35:02 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by smtp1.osuosl.org (Postfix) with ESMTP id D2B88846C5; Fri, 1 Jul 2022 13:35:00 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 smtp1.osuosl.org D2B88846C5 Authentication-Results: smtp1.osuosl.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=fZr0Mf+c X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp1.osuosl.org ([127.0.0.1]) by localhost (smtp1.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id IocdAZkHdf6v; Fri, 1 Jul 2022 13:34:59 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by smtp1.osuosl.org (Postfix) with ESMTPS id BA9ED846D4; Fri, 1 Jul 2022 13:34:58 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 smtp1.osuosl.org BA9ED846D4 Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 8E28DC0035; Fri, 1 Jul 2022 13:34:58 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp3.osuosl.org (smtp3.osuosl.org [IPv6:2605:bc80:3010::136]) by lists.linuxfoundation.org (Postfix) with ESMTP id 4AE33C002D for ; Fri, 1 Jul 2022 13:34:57 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id 3324B61351 for ; Fri, 1 Jul 2022 13:34:57 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 smtp3.osuosl.org 3324B61351 Authentication-Results: smtp3.osuosl.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=fZr0Mf+c X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Yr-wNIxc1DtH for ; Fri, 1 Jul 2022 13:34:56 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 DKIM-Filter: OpenDKIM Filter v2.11.0 smtp3.osuosl.org 1F99E613D8 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by smtp3.osuosl.org (Postfix) with ESMTPS id 1F99E613D8 for ; Fri, 1 Jul 2022 13:34:56 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1656682495; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=iyjX1zla/MAKp4qeqrM9ZdmRrINc/MouYuroqpWHHmI=; b=fZr0Mf+cANRQTk4ybI+1ZzuPukZq5L31jYoQHHDDEdrulE7OPCy8oPBzROhbBXX/ssvZy8 Dqfn7h/kkXOutZtimnMpmDMiJdIWDzNk2oW9Y2GQj2LgFl4fs90EcQnDssGGn9N4PxgGoa EDBxKl4gLiJiPiMhZ53PMPsoGKQ0FEo= Received: from mail-wm1-f70.google.com (mail-wm1-f70.google.com [209.85.128.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-265-8_jjkP0mOzGuOgmmAGuJDw-1; Fri, 01 Jul 2022 09:34:54 -0400 X-MC-Unique: 8_jjkP0mOzGuOgmmAGuJDw-1 Received: by mail-wm1-f70.google.com with SMTP id p6-20020a05600c358600b003a0483b3c2eso1425666wmq.3 for ; Fri, 01 Jul 2022 06:34:53 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:subject:from:to:cc:date:message-id:in-reply-to :references:user-agent:mime-version:content-transfer-encoding; bh=iyjX1zla/MAKp4qeqrM9ZdmRrINc/MouYuroqpWHHmI=; b=1TDhE4hhHelaynss9wFV+BGchGPPxrtby2vopEPs6g/tn6j1D5u6DFTHN5/Vfxv6aF JyxdSUuQQ0p7l4E2KBC22NsAPQ84+JwrcJigWvOHdYUQGdJ8B5BtVl/OWOHgU2//jmk5 1OvEFuNHavAHNXafKG+YWyx79IpGgrwoeOvaBNDDAVaqkkUgFXpBALTePQ8Qz5HPlnNQ FDOTDrKsAuBq0YBCx78Xl+bYQBDL+6MK8Jc08T/2XJDa+n7u8pULkhXczVnqRmU4SIhY eM1MMSZiQbsrNkKMowRPCsCBMppURPxs9X/XhkCHUuXTqMA/OUpnPAMUk2z61GT+ATnk Ywog== X-Gm-Message-State: AJIora+FFszMYTDardupgucAETu1FpEno0wyNGA9X2L/NDSnLXA2P4A6 W72nD20hpaQ0rwqZAmPs4rpAm7iGrFnBSPPx8xO/SB78hlWHJOyuZhZsGONGX0lWZDCgUTbUPiA Gg5fGNPAdrOx1OjStlWa/SCXX0vbN1YF6BGqeLgCSJcNYJ9xeApT/M0ia9bFZF3dG X-Received: by 2002:a05:600c:4f86:b0:3a0:56ad:da10 with SMTP id n6-20020a05600c4f8600b003a056adda10mr15987782wmq.17.1656682492567; Fri, 01 Jul 2022 06:34:52 -0700 (PDT) X-Google-Smtp-Source: AGRyM1uodho+fShCnr1Y9h0stCELYDCCEHZE26p0IzmPRlw/aF7Ew2y85EncIBaX8zp4ks+nqcxFRQ== X-Received: by 2002:a05:600c:4f86:b0:3a0:56ad:da10 with SMTP id n6-20020a05600c4f8600b003a056adda10mr15987745wmq.17.1656682492202; Fri, 01 Jul 2022 06:34:52 -0700 (PDT) Received: from localhost (net-5-95-131-226.cust.vodafonedsl.it. [5.95.131.226]) by smtp.gmail.com with ESMTPSA id ay30-20020a05600c1e1e00b003a033177655sm6810502wmb.29.2022.07.01.06.34.51 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 01 Jul 2022 06:34:51 -0700 (PDT) From: Paolo Valerio To: dev@openvswitch.org Date: Fri, 01 Jul 2022 15:34:51 +0200 Message-ID: <165668249107.1967719.117828246513672504.stgit@fed.void> In-Reply-To: <165668246947.1967719.4502709837520339365.stgit@fed.void> References: <165668246947.1967719.4502709837520339365.stgit@fed.void> User-Agent: StGit/1.1 MIME-Version: 1.0 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=pvalerio@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Cc: i.maximets@ovn.org Subject: [ovs-dev] [PATCH v4 1/6] conntrack: Use a cmap to store zone limits X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Gaetan Rivet Change the data structure from hmap to cmap for zone limits. As they are shared amongst multiple conntrack users, multiple readers want to check the current zone limit state before progressing in their processing. Using a CMAP allows doing lookups without taking the global 'ct_lock', thus reducing contention. Signed-off-by: Gaetan Rivet Reviewed-by: Eli Britstein Signed-off-by: Paolo Valerio --- lib/conntrack-private.h | 2 + lib/conntrack.c | 70 ++++++++++++++++++++++++++++++++--------------- lib/conntrack.h | 2 + lib/dpif-netdev.c | 5 ++- 4 files changed, 53 insertions(+), 26 deletions(-) diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h index dfdf4e676..d9461b811 100644 --- a/lib/conntrack-private.h +++ b/lib/conntrack-private.h @@ -192,7 +192,7 @@ struct conntrack { struct ovs_mutex ct_lock; /* Protects 2 following fields. */ struct cmap conns OVS_GUARDED; struct ovs_list exp_lists[N_CT_TM] OVS_GUARDED; - struct hmap zone_limits OVS_GUARDED; + struct cmap zone_limits OVS_GUARDED; struct hmap timeout_policies OVS_GUARDED; uint32_t hash_basis; /* Salt for hashing a connection key. */ pthread_t clean_thread; /* Periodically cleans up connection tracker. */ diff --git a/lib/conntrack.c b/lib/conntrack.c index faa2d6ab7..6df1142b9 100644 --- a/lib/conntrack.c +++ b/lib/conntrack.c @@ -81,7 +81,7 @@ enum ct_alg_ctl_type { }; struct zone_limit { - struct hmap_node node; + struct cmap_node node; struct conntrack_zone_limit czl; }; @@ -311,7 +311,7 @@ conntrack_init(void) for (unsigned i = 0; i < ARRAY_SIZE(ct->exp_lists); i++) { ovs_list_init(&ct->exp_lists[i]); } - hmap_init(&ct->zone_limits); + cmap_init(&ct->zone_limits); ct->zone_limit_seq = 0; timeout_policy_init(ct); ovs_mutex_unlock(&ct->ct_lock); @@ -346,12 +346,25 @@ zone_key_hash(int32_t zone, uint32_t basis) } static struct zone_limit * -zone_limit_lookup(struct conntrack *ct, int32_t zone) +zone_limit_lookup_protected(struct conntrack *ct, int32_t zone) OVS_REQUIRES(ct->ct_lock) { uint32_t hash = zone_key_hash(zone, ct->hash_basis); struct zone_limit *zl; - HMAP_FOR_EACH_IN_BUCKET (zl, node, hash, &ct->zone_limits) { + CMAP_FOR_EACH_WITH_HASH_PROTECTED (zl, node, hash, &ct->zone_limits) { + if (zl->czl.zone == zone) { + return zl; + } + } + return NULL; +} + +static struct zone_limit * +zone_limit_lookup(struct conntrack *ct, int32_t zone) +{ + uint32_t hash = zone_key_hash(zone, ct->hash_basis); + struct zone_limit *zl; + CMAP_FOR_EACH_WITH_HASH (zl, node, hash, &ct->zone_limits) { if (zl->czl.zone == zone) { return zl; } @@ -361,7 +374,6 @@ zone_limit_lookup(struct conntrack *ct, int32_t zone) static struct zone_limit * zone_limit_lookup_or_default(struct conntrack *ct, int32_t zone) - OVS_REQUIRES(ct->ct_lock) { struct zone_limit *zl = zone_limit_lookup(ct, zone); return zl ? zl : zone_limit_lookup(ct, DEFAULT_ZONE); @@ -370,13 +382,16 @@ zone_limit_lookup_or_default(struct conntrack *ct, int32_t zone) struct conntrack_zone_limit zone_limit_get(struct conntrack *ct, int32_t zone) { - ovs_mutex_lock(&ct->ct_lock); - struct conntrack_zone_limit czl = {DEFAULT_ZONE, 0, 0, 0}; + struct conntrack_zone_limit czl = { + .zone = DEFAULT_ZONE, + .limit = 0, + .count = ATOMIC_COUNT_INIT(0), + .zone_limit_seq = 0, + }; struct zone_limit *zl = zone_limit_lookup_or_default(ct, zone); if (zl) { czl = zl->czl; } - ovs_mutex_unlock(&ct->ct_lock); return czl; } @@ -384,13 +399,19 @@ static int zone_limit_create(struct conntrack *ct, int32_t zone, uint32_t limit) OVS_REQUIRES(ct->ct_lock) { + struct zone_limit *zl = zone_limit_lookup_protected(ct, zone); + + if (zl) { + return 0; + } + if (zone >= DEFAULT_ZONE && zone <= MAX_ZONE) { - struct zone_limit *zl = xzalloc(sizeof *zl); + zl = xzalloc(sizeof *zl); zl->czl.limit = limit; zl->czl.zone = zone; zl->czl.zone_limit_seq = ct->zone_limit_seq++; uint32_t hash = zone_key_hash(zone, ct->hash_basis); - hmap_insert(&ct->zone_limits, &zl->node, hash); + cmap_insert(&ct->zone_limits, &zl->node, hash); return 0; } else { return EINVAL; @@ -401,13 +422,14 @@ int zone_limit_update(struct conntrack *ct, int32_t zone, uint32_t limit) { int err = 0; - ovs_mutex_lock(&ct->ct_lock); struct zone_limit *zl = zone_limit_lookup(ct, zone); if (zl) { zl->czl.limit = limit; VLOG_INFO("Changed zone limit of %u for zone %d", limit, zone); } else { + ovs_mutex_lock(&ct->ct_lock); err = zone_limit_create(ct, zone, limit); + ovs_mutex_unlock(&ct->ct_lock); if (!err) { VLOG_INFO("Created zone limit of %u for zone %d", limit, zone); } else { @@ -415,7 +437,6 @@ zone_limit_update(struct conntrack *ct, int32_t zone, uint32_t limit) zone); } } - ovs_mutex_unlock(&ct->ct_lock); return err; } @@ -423,23 +444,25 @@ static void zone_limit_clean(struct conntrack *ct, struct zone_limit *zl) OVS_REQUIRES(ct->ct_lock) { - hmap_remove(&ct->zone_limits, &zl->node); - free(zl); + uint32_t hash = zone_key_hash(zl->czl.zone, ct->hash_basis); + cmap_remove(&ct->zone_limits, &zl->node, hash); + ovsrcu_postpone(free, zl); } int zone_limit_delete(struct conntrack *ct, uint16_t zone) { ovs_mutex_lock(&ct->ct_lock); - struct zone_limit *zl = zone_limit_lookup(ct, zone); + struct zone_limit *zl = zone_limit_lookup_protected(ct, zone); if (zl) { zone_limit_clean(ct, zl); + ovs_mutex_unlock(&ct->ct_lock); VLOG_INFO("Deleted zone limit for zone %d", zone); } else { + ovs_mutex_unlock(&ct->ct_lock); VLOG_INFO("Attempted delete of non-existent zone limit: zone %d", zone); } - ovs_mutex_unlock(&ct->ct_lock); return 0; } @@ -456,7 +479,7 @@ conn_clean_cmn(struct conntrack *ct, struct conn *conn) struct zone_limit *zl = zone_limit_lookup(ct, conn->admit_zone); if (zl && zl->czl.zone_limit_seq == conn->zone_limit_seq) { - zl->czl.count--; + atomic_count_dec(&zl->czl.count); } } @@ -510,10 +533,13 @@ conntrack_destroy(struct conntrack *ct) cmap_destroy(&ct->conns); struct zone_limit *zl; - HMAP_FOR_EACH_POP (zl, node, &ct->zone_limits) { - free(zl); + CMAP_FOR_EACH (zl, node, &ct->zone_limits) { + uint32_t hash = zone_key_hash(zl->czl.zone, ct->hash_basis); + + cmap_remove(&ct->zone_limits, &zl->node, hash); + ovsrcu_postpone(free, zl); } - hmap_destroy(&ct->zone_limits); + cmap_destroy(&ct->zone_limits); struct timeout_policy *tp; HMAP_FOR_EACH_POP (tp, node, &ct->timeout_policies) { @@ -991,7 +1017,7 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt, if (commit) { struct zone_limit *zl = zone_limit_lookup_or_default(ct, ctx->key.zone); - if (zl && zl->czl.count >= zl->czl.limit) { + if (zl && atomic_count_get(&zl->czl.count) >= zl->czl.limit) { return nc; } @@ -1064,7 +1090,7 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt, if (zl) { nc->admit_zone = zl->czl.zone; nc->zone_limit_seq = zl->czl.zone_limit_seq; - zl->czl.count++; + atomic_count_inc(&zl->czl.count); } else { nc->admit_zone = INVALID_ZONE; } diff --git a/lib/conntrack.h b/lib/conntrack.h index 9553b188a..58b181834 100644 --- a/lib/conntrack.h +++ b/lib/conntrack.h @@ -108,7 +108,7 @@ struct conntrack_dump { struct conntrack_zone_limit { int32_t zone; uint32_t limit; - uint32_t count; + atomic_count count; uint32_t zone_limit_seq; /* Used to disambiguate zone limit counts. */ }; diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index d09138f2c..760cde1a4 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -9263,7 +9263,8 @@ dpif_netdev_ct_get_limits(struct dpif *dpif, czl = zone_limit_get(dp->conntrack, zone_limit->zone); if (czl.zone == zone_limit->zone || czl.zone == DEFAULT_ZONE) { ct_dpif_push_zone_limit(zone_limits_reply, zone_limit->zone, - czl.limit, czl.count); + czl.limit, + atomic_count_get(&czl.count)); } else { return EINVAL; } @@ -9273,7 +9274,7 @@ dpif_netdev_ct_get_limits(struct dpif *dpif, czl = zone_limit_get(dp->conntrack, z); if (czl.zone == z) { ct_dpif_push_zone_limit(zone_limits_reply, z, czl.limit, - czl.count); + atomic_count_get(&czl.count)); } } } From patchwork Fri Jul 1 13:34:57 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Valerio X-Patchwork-Id: 1651203 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=KHTVDd4M; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.136; helo=smtp3.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from smtp3.osuosl.org (smtp3.osuosl.org [140.211.166.136]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4LZGR50c8Dz9s09 for ; Fri, 1 Jul 2022 23:35:13 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id 3E8CB613F8; Fri, 1 Jul 2022 13:35:11 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 smtp3.osuosl.org 3E8CB613F8 Authentication-Results: smtp3.osuosl.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=KHTVDd4M X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id VUCWXr9a_inY; Fri, 1 Jul 2022 13:35:10 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [IPv6:2605:bc80:3010:104::8cd3:938]) by smtp3.osuosl.org (Postfix) with ESMTPS id 1FFF5613E3; Fri, 1 Jul 2022 13:35:09 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 smtp3.osuosl.org 1FFF5613E3 Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id DB9E8C0035; Fri, 1 Jul 2022 13:35:08 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp4.osuosl.org (smtp4.osuosl.org [IPv6:2605:bc80:3010::137]) by lists.linuxfoundation.org (Postfix) with ESMTP id 15E81C0037 for ; Fri, 1 Jul 2022 13:35:08 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp4.osuosl.org (Postfix) with ESMTP id 52ABA417C6 for ; Fri, 1 Jul 2022 13:35:06 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 smtp4.osuosl.org 52ABA417C6 Authentication-Results: smtp4.osuosl.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=KHTVDd4M X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp4.osuosl.org ([127.0.0.1]) by localhost (smtp4.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id C9OhM5l27_hg for ; Fri, 1 Jul 2022 13:35:04 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 DKIM-Filter: OpenDKIM Filter v2.11.0 smtp4.osuosl.org C24F242463 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by smtp4.osuosl.org (Postfix) with ESMTPS id C24F242463 for ; Fri, 1 Jul 2022 13:35:02 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1656682501; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=DQ8AJ5QUxo9XnIkLf81k7Qz4hZ0b0lm4rQh44tOhYXk=; b=KHTVDd4M/3aGx/eqtne8sytYN3GUD15Wi4JCnb0vi/W+MnaTyyG40KsbnZOFSoFp/fwA6R JI1sFDNvKZiXumlQQSiLKrPiiVl5izpfDpk6xXDIrKZ5EeM+1Z/q3Bx38YaBizBxSxpOR8 bz/77lmT2aHMCfnsLO62anwjJ13IZok= Received: from mail-wr1-f70.google.com (mail-wr1-f70.google.com [209.85.221.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-160-wq9BQlJ4NviQAX0FfWm84A-1; Fri, 01 Jul 2022 09:35:00 -0400 X-MC-Unique: wq9BQlJ4NviQAX0FfWm84A-1 Received: by mail-wr1-f70.google.com with SMTP id az28-20020adfe19c000000b0021bc8df3721so386728wrb.7 for ; Fri, 01 Jul 2022 06:35:00 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:subject:from:to:cc:date:message-id:in-reply-to :references:user-agent:mime-version:content-transfer-encoding; bh=DQ8AJ5QUxo9XnIkLf81k7Qz4hZ0b0lm4rQh44tOhYXk=; b=e6XldJQnDdJu41tEPIZWtaykvkwgsL0E5N9DUI0UIonRVs+Ka9cCxoE3obnLZlsJ7D XLVouMDF2zWqVM7tekcgxUwGPzKSjgYIQpfu2mDOzkwHqgAY2hbtQBU8cPaCCg2zGnsL QU6jSRS2chhAx4G1AP7jQBZBi5GRbNBTF8KV9376KMLV25szs6IGeK3Tbe4TkaJP2sF0 Ou7IBx5yQpJIzdpr7mG/8kb+NAt+CuTqwtPb7adql8ovBqNZYKn2iuNq9g35qn+Odip9 frjE69IFV1DEQHgiNufIEz9FanqVzt43Hmd/ukcbJwxYslzpcxIverZcbb3XZIhnH8gc 6LZQ== X-Gm-Message-State: AJIora/Mp3Aa2sFElMlbtSWBewdn29CktlP5Fuc77kuD5GZihv8wVb5L F0vwlJmujferQzrqVi0lCGY4iQjT/fPhg4p17tMgknM7yr1fp7Awav+G4xqkeqizRFJbgCkEtGr SEMCiDmR3vD6A+akqhSuKb+bQqcKlPf0LISwi7BT0NyW0ETirq4OeLsDNc4pC0JK7 X-Received: by 2002:a1c:f213:0:b0:39b:ad32:5e51 with SMTP id s19-20020a1cf213000000b0039bad325e51mr16320591wmc.72.1656682498742; Fri, 01 Jul 2022 06:34:58 -0700 (PDT) X-Google-Smtp-Source: AGRyM1uG3vFR06IrFIV7JzXdfhrYsgwtfPHS4BtebYBiont/u547plvXlV3zpcWMJTifX5Zrs/3tTg== X-Received: by 2002:a1c:f213:0:b0:39b:ad32:5e51 with SMTP id s19-20020a1cf213000000b0039bad325e51mr16320565wmc.72.1656682498428; Fri, 01 Jul 2022 06:34:58 -0700 (PDT) Received: from localhost (net-5-95-131-226.cust.vodafonedsl.it. [5.95.131.226]) by smtp.gmail.com with ESMTPSA id j22-20020a05600c485600b003a04b248896sm6470059wmo.35.2022.07.01.06.34.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 01 Jul 2022 06:34:58 -0700 (PDT) From: Paolo Valerio To: dev@openvswitch.org Date: Fri, 01 Jul 2022 15:34:57 +0200 Message-ID: <165668249729.1967719.17248672795731437916.stgit@fed.void> In-Reply-To: <165668246947.1967719.4502709837520339365.stgit@fed.void> References: <165668246947.1967719.4502709837520339365.stgit@fed.void> User-Agent: StGit/1.1 MIME-Version: 1.0 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=pvalerio@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Cc: i.maximets@ovn.org Subject: [ovs-dev] [PATCH v4 2/6] conntrack-tp: Use a cmap to store timeout policies X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Gaetan Rivet Multiple lookups are done to stored timeout policies, each time blocking the global 'ct_lock'. This is usually not necessary and it should be acceptable to get policy updates slightly delayed (by one RCU sync at most). Using a CMAP reduces multiple lock taking and releasing in the connection insertion path. Signed-off-by: Gaetan Rivet Reviewed-by: Eli Britstein Acked-by: William Tu Signed-off-by: Paolo Valerio --- lib/conntrack-private.h | 2 +- lib/conntrack-tp.c | 54 ++++++++++++++++++++++++++--------------------- lib/conntrack.c | 9 +++++--- lib/conntrack.h | 2 +- 4 files changed, 38 insertions(+), 29 deletions(-) diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h index d9461b811..34c688821 100644 --- a/lib/conntrack-private.h +++ b/lib/conntrack-private.h @@ -193,7 +193,7 @@ struct conntrack { struct cmap conns OVS_GUARDED; struct ovs_list exp_lists[N_CT_TM] OVS_GUARDED; struct cmap zone_limits OVS_GUARDED; - struct hmap timeout_policies OVS_GUARDED; + struct cmap timeout_policies OVS_GUARDED; uint32_t hash_basis; /* Salt for hashing a connection key. */ pthread_t clean_thread; /* Periodically cleans up connection tracker. */ struct latch clean_thread_exit; /* To destroy the 'clean_thread'. */ diff --git a/lib/conntrack-tp.c b/lib/conntrack-tp.c index a586d3a8d..c2245038b 100644 --- a/lib/conntrack-tp.c +++ b/lib/conntrack-tp.c @@ -47,14 +47,15 @@ static unsigned int ct_dpif_netdev_tp_def[] = { }; static struct timeout_policy * -timeout_policy_lookup(struct conntrack *ct, int32_t tp_id) +timeout_policy_lookup_protected(struct conntrack *ct, int32_t tp_id) OVS_REQUIRES(ct->ct_lock) { struct timeout_policy *tp; uint32_t hash; hash = hash_int(tp_id, ct->hash_basis); - HMAP_FOR_EACH_IN_BUCKET (tp, node, hash, &ct->timeout_policies) { + CMAP_FOR_EACH_WITH_HASH_PROTECTED (tp, node, hash, + &ct->timeout_policies) { if (tp->policy.id == tp_id) { return tp; } @@ -62,20 +63,25 @@ timeout_policy_lookup(struct conntrack *ct, int32_t tp_id) return NULL; } -struct timeout_policy * -timeout_policy_get(struct conntrack *ct, int32_t tp_id) +static struct timeout_policy * +timeout_policy_lookup(struct conntrack *ct, int32_t tp_id) { struct timeout_policy *tp; + uint32_t hash; - ovs_mutex_lock(&ct->ct_lock); - tp = timeout_policy_lookup(ct, tp_id); - if (!tp) { - ovs_mutex_unlock(&ct->ct_lock); - return NULL; + hash = hash_int(tp_id, ct->hash_basis); + CMAP_FOR_EACH_WITH_HASH (tp, node, hash, &ct->timeout_policies) { + if (tp->policy.id == tp_id) { + return tp; + } } + return NULL; +} - ovs_mutex_unlock(&ct->ct_lock); - return tp; +struct timeout_policy * +timeout_policy_get(struct conntrack *ct, int32_t tp_id) +{ + return timeout_policy_lookup(ct, tp_id); } static void @@ -125,27 +131,30 @@ timeout_policy_create(struct conntrack *ct, init_default_tp(tp, tp_id); update_existing_tp(tp, new_tp); hash = hash_int(tp_id, ct->hash_basis); - hmap_insert(&ct->timeout_policies, &tp->node, hash); + cmap_insert(&ct->timeout_policies, &tp->node, hash); } static void timeout_policy_clean(struct conntrack *ct, struct timeout_policy *tp) OVS_REQUIRES(ct->ct_lock) { - hmap_remove(&ct->timeout_policies, &tp->node); - free(tp); + uint32_t hash = hash_int(tp->policy.id, ct->hash_basis); + cmap_remove(&ct->timeout_policies, &tp->node, hash); + ovsrcu_postpone(free, tp); } static int -timeout_policy_delete__(struct conntrack *ct, uint32_t tp_id) +timeout_policy_delete__(struct conntrack *ct, uint32_t tp_id, + bool warn_on_error) OVS_REQUIRES(ct->ct_lock) { + struct timeout_policy *tp; int err = 0; - struct timeout_policy *tp = timeout_policy_lookup(ct, tp_id); + tp = timeout_policy_lookup_protected(ct, tp_id); if (tp) { timeout_policy_clean(ct, tp); - } else { + } else if (warn_on_error) { VLOG_WARN_RL(&rl, "Failed to delete a non-existent timeout " "policy: id=%d", tp_id); err = ENOENT; @@ -159,7 +168,7 @@ timeout_policy_delete(struct conntrack *ct, uint32_t tp_id) int err; ovs_mutex_lock(&ct->ct_lock); - err = timeout_policy_delete__(ct, tp_id); + err = timeout_policy_delete__(ct, tp_id, true); ovs_mutex_unlock(&ct->ct_lock); return err; } @@ -170,7 +179,7 @@ timeout_policy_init(struct conntrack *ct) { struct timeout_policy tp; - hmap_init(&ct->timeout_policies); + cmap_init(&ct->timeout_policies); /* Create default timeout policy. */ memset(&tp, 0, sizeof tp); @@ -182,14 +191,11 @@ int timeout_policy_update(struct conntrack *ct, struct timeout_policy *new_tp) { - int err = 0; uint32_t tp_id = new_tp->policy.id; + int err = 0; ovs_mutex_lock(&ct->ct_lock); - struct timeout_policy *tp = timeout_policy_lookup(ct, tp_id); - if (tp) { - err = timeout_policy_delete__(ct, tp_id); - } + timeout_policy_delete__(ct, tp_id, false); timeout_policy_create(ct, new_tp); ovs_mutex_unlock(&ct->ct_lock); return err; diff --git a/lib/conntrack.c b/lib/conntrack.c index 6df1142b9..38ddc9b91 100644 --- a/lib/conntrack.c +++ b/lib/conntrack.c @@ -542,10 +542,13 @@ conntrack_destroy(struct conntrack *ct) cmap_destroy(&ct->zone_limits); struct timeout_policy *tp; - HMAP_FOR_EACH_POP (tp, node, &ct->timeout_policies) { - free(tp); + CMAP_FOR_EACH (tp, node, &ct->timeout_policies) { + uint32_t hash = hash_int(tp->policy.id, ct->hash_basis); + + cmap_remove(&ct->timeout_policies, &tp->node, hash); + ovsrcu_postpone(free, tp); } - hmap_destroy(&ct->timeout_policies); + cmap_destroy(&ct->timeout_policies); ovs_mutex_unlock(&ct->ct_lock); ovs_mutex_destroy(&ct->ct_lock); diff --git a/lib/conntrack.h b/lib/conntrack.h index 58b181834..b064abc9f 100644 --- a/lib/conntrack.h +++ b/lib/conntrack.h @@ -113,7 +113,7 @@ struct conntrack_zone_limit { }; struct timeout_policy { - struct hmap_node node; + struct cmap_node node; struct ct_dpif_timeout_policy policy; }; From patchwork Fri Jul 1 13:35:03 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Valerio X-Patchwork-Id: 1651204 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=ByGFsOK4; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::136; helo=smtp3.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from smtp3.osuosl.org (smtp3.osuosl.org [IPv6:2605:bc80:3010::136]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4LZGRY3MsVz9s09 for ; Fri, 1 Jul 2022 23:35:36 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id 72B5261417; Fri, 1 Jul 2022 13:35:33 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 smtp3.osuosl.org 72B5261417 Authentication-Results: smtp3.osuosl.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=ByGFsOK4 X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id XTQtp8SE-WYJ; Fri, 1 Jul 2022 13:35:31 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by smtp3.osuosl.org (Postfix) with ESMTPS id 3982460BF5; Fri, 1 Jul 2022 13:35:30 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 smtp3.osuosl.org 3982460BF5 Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id ECBAEC0035; Fri, 1 Jul 2022 13:35:29 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp4.osuosl.org (smtp4.osuosl.org [IPv6:2605:bc80:3010::137]) by lists.linuxfoundation.org (Postfix) with ESMTP id 54820C0037 for ; Fri, 1 Jul 2022 13:35:28 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp4.osuosl.org (Postfix) with ESMTP id 266464247B for ; Fri, 1 Jul 2022 13:35:16 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 smtp4.osuosl.org 266464247B Authentication-Results: smtp4.osuosl.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=ByGFsOK4 X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp4.osuosl.org ([127.0.0.1]) by localhost (smtp4.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id fY2KJfo9dd1h for ; Fri, 1 Jul 2022 13:35:11 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 DKIM-Filter: OpenDKIM Filter v2.11.0 smtp4.osuosl.org 332DD42477 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by smtp4.osuosl.org (Postfix) with ESMTPS id 332DD42477 for ; Fri, 1 Jul 2022 13:35:11 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1656682510; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=KXEgNvNgOQwly3milRXIE55DN7iZ9Ctx4DAokBkVbbI=; b=ByGFsOK4SAYsebKRTG/4L43bI50FO6nIdMlwAFG7wQPgr3UtSvu1Kwp0si2AvbROGH8AWQ cw1pqccOJ+I2/lU16z7Mim98Qyv+PY4rekWwhvGlus4mrFU435nfof3wmVcXuag60nuzmW aeojM38q12iy0VbqouIZbQfWkZUsyMo= Received: from mail-wr1-f70.google.com (mail-wr1-f70.google.com [209.85.221.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-335-d1odU5vDPHepswiV2ltGHw-1; Fri, 01 Jul 2022 09:35:07 -0400 X-MC-Unique: d1odU5vDPHepswiV2ltGHw-1 Received: by mail-wr1-f70.google.com with SMTP id m7-20020adfa3c7000000b0021b94088ba2so390156wrb.9 for ; Fri, 01 Jul 2022 06:35:07 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:subject:from:to:cc:date:message-id:in-reply-to :references:user-agent:mime-version:content-transfer-encoding; bh=JXg9rSbLDtfm6OY7wWZb6DtcCOLGWtJZ9AzruOYhQ44=; b=NETiJdkqi7eUoRGmAaYlqJqaLneJ05tQZanLKw3+I1K1NjHLfFsq2Iu8muFCDIHVOE DrRRmpprY8tV4FYhPrvvWIs75mpYRzaIH9L5Sviz8IZiyDRZOXH94/b/Csep5kgYfx3A HpiT1Kfw1YZE4ZNV1yBzb9ZExCvlTTaLdLsVsgRpwMb/zlc8wPHVfDrjbO5yG4w6+wq+ N4Z0MiLtWwQuNEJznvfdq2UxRFZTTZYMYvFDBXQa0s0S0QMGNmUlafsusW1jTPVbt0GP p6pxYlEuBwe22df8jKcO/6xFzsLydc3wxv5OOJ5QglVcQafjVH6CXDeSv4oooFgO0p6j tDmQ== X-Gm-Message-State: AJIora/QLQCCWu/zjdhBlgvOGe+YRHS3yxyfvFx7k1VlWaqAxy8yR40Q ldeMbKsBZRH4S8gV4BYDVipZShc6885nCEQX7CBC+D0gJdTO7T4Du8Qepl/mhfeFpDD7NB0fOg3 273LhEmRVWmk27h2JqTZoVU8tlDkm4QVAhKoz8UximKAbEyt1lD8R8EHg+FTlgBLL X-Received: by 2002:adf:fb43:0:b0:21a:22eb:da43 with SMTP id c3-20020adffb43000000b0021a22ebda43mr13544959wrs.347.1656682505684; Fri, 01 Jul 2022 06:35:05 -0700 (PDT) X-Google-Smtp-Source: AGRyM1vcJSLWZBoaimduWwxzeP/ajqrBoZXP7DdNblF0u6e1AWvAT/9qsfRkGxCYCWIcl4obJqlYIQ== X-Received: by 2002:adf:fb43:0:b0:21a:22eb:da43 with SMTP id c3-20020adffb43000000b0021a22ebda43mr13544878wrs.347.1656682504704; Fri, 01 Jul 2022 06:35:04 -0700 (PDT) Received: from localhost (net-5-95-131-226.cust.vodafonedsl.it. [5.95.131.226]) by smtp.gmail.com with ESMTPSA id q13-20020adfdfcd000000b0021b8cd8a068sm22179628wrn.49.2022.07.01.06.35.04 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 01 Jul 2022 06:35:04 -0700 (PDT) From: Paolo Valerio To: dev@openvswitch.org Date: Fri, 01 Jul 2022 15:35:03 +0200 Message-ID: <165668250354.1967719.13981409928749386554.stgit@fed.void> In-Reply-To: <165668246947.1967719.4502709837520339365.stgit@fed.void> References: <165668246947.1967719.4502709837520339365.stgit@fed.void> User-Agent: StGit/1.1 MIME-Version: 1.0 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=pvalerio@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Cc: i.maximets@ovn.org Subject: [ovs-dev] [PATCH v4 3/6] conntrack: Replaces nat_conn introducing key directionality. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Peng He Currently, when doing NAT, the userspace conntrack will use an extra conn for the two directions in a flow. However, each conn has actually the two keys for both orig and rev directions. This patch introduces a key_node[CT_DIR_MAX] member in the conn which consists of key and a cmap_node for hash lookup. Both keys can now be accessed in the following way: conn->key_node[CT_DIR_{FWD,REV}].key similarly to what Aaron Conole suggested. This patch avoids the extra allocation for nat_conn, and makes userspace code cleaner. Signed-off-by: Peng He Co-authored-by: Paolo Valerio Signed-off-by: Paolo Valerio Reviewed-by: Gaetan Rivet Acked-by: Aaron Conole --- - moved dir into the keynode as it is a better place to store it --- lib/conntrack-private.h | 20 +- lib/conntrack-tp.c | 6 - lib/conntrack.c | 505 ++++++++++++++++++++--------------------------- 3 files changed, 232 insertions(+), 299 deletions(-) diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h index 34c688821..a9546ee9c 100644 --- a/lib/conntrack-private.h +++ b/lib/conntrack-private.h @@ -48,6 +48,12 @@ struct ct_endpoint { * hashing in ct_endpoint_hash_add(). */ BUILD_ASSERT_DECL(sizeof(struct ct_endpoint) == sizeof(union ct_addr) + 4); +enum key_dir { + CT_DIR_FWD = 0, + CT_DIR_REV, + CT_DIR_MAX, +}; + /* Changes to this structure need to be reflected in conn_key_hash() * and conn_key_cmp(). */ struct conn_key { @@ -86,21 +92,20 @@ struct alg_exp_node { bool nat_rpl_dst; }; -enum OVS_PACKED_ENUM ct_conn_type { - CT_CONN_TYPE_DEFAULT, - CT_CONN_TYPE_UN_NAT, +struct conn_key_node { + enum key_dir dir; + struct conn_key key; + struct cmap_node cm_node; }; struct conn { /* Immutable data. */ - struct conn_key key; - struct conn_key rev_key; + struct conn_key_node key_node[CT_DIR_MAX]; struct conn_key parent_key; /* Only used for orig_tuple support. */ struct ovs_list exp_node; - struct cmap_node cm_node; + uint16_t nat_action; char *alg; - struct conn *nat_conn; /* The NAT 'conn' context, if there is one. */ /* Mutable data. */ struct ovs_mutex lock; /* Guards all mutable fields. */ @@ -120,7 +125,6 @@ struct conn { /* Immutable data. */ bool alg_related; /* True if alg data connection. */ - enum ct_conn_type conn_type; uint32_t tp_id; /* Timeout policy ID. */ }; diff --git a/lib/conntrack-tp.c b/lib/conntrack-tp.c index c2245038b..9ecb06978 100644 --- a/lib/conntrack-tp.c +++ b/lib/conntrack-tp.c @@ -282,7 +282,8 @@ conn_update_expiration(struct conntrack *ct, struct conn *conn, ovs_mutex_lock(&conn->lock); VLOG_DBG_RL(&rl, "Update timeout %s zone=%u with policy id=%d " "val=%u sec.", - ct_timeout_str[tm], conn->key.zone, conn->tp_id, val); + ct_timeout_str[tm], conn->key_node[CT_DIR_FWD].key.zone, + conn->tp_id, val); conn_update_expiration__(ct, conn, tm, now, val); } @@ -313,7 +314,8 @@ conn_init_expiration(struct conntrack *ct, struct conn *conn, } VLOG_DBG_RL(&rl, "Init timeout %s zone=%u with policy id=%d val=%u sec.", - ct_timeout_str[tm], conn->key.zone, conn->tp_id, val); + ct_timeout_str[tm], conn->key_node[CT_DIR_FWD].key.zone, + conn->tp_id, val); conn_init_expiration__(ct, conn, tm, now, val); } diff --git a/lib/conntrack.c b/lib/conntrack.c index 38ddc9b91..371a34b89 100644 --- a/lib/conntrack.c +++ b/lib/conntrack.c @@ -96,7 +96,6 @@ static struct conn *new_conn(struct conntrack *ct, struct dp_packet *pkt, uint32_t tp_id); static void delete_conn_cmn(struct conn *); static void delete_conn(struct conn *); -static void delete_conn_one(struct conn *conn); static enum ct_update_res conn_update(struct conntrack *ct, struct conn *conn, struct dp_packet *pkt, struct conn_lookup_ctx *ctx, @@ -110,8 +109,7 @@ static void set_label(struct dp_packet *, struct conn *, static void *clean_thread_main(void *f_); static bool -nat_get_unique_tuple(struct conntrack *ct, const struct conn *conn, - struct conn *nat_conn, +nat_get_unique_tuple(struct conntrack *ct, struct conn *conn, const struct nat_action_info_t *nat_info); static uint8_t @@ -231,61 +229,6 @@ conn_key_cmp(const struct conn_key *key1, const struct conn_key *key2) return 1; } -static void -ct_print_conn_info(const struct conn *c, const char *log_msg, - enum vlog_level vll, bool force, bool rl_on) -{ -#define CT_VLOG(RL_ON, LEVEL, ...) \ - do { \ - if (RL_ON) { \ - static struct vlog_rate_limit rl_ = VLOG_RATE_LIMIT_INIT(5, 5); \ - vlog_rate_limit(&this_module, LEVEL, &rl_, __VA_ARGS__); \ - } else { \ - vlog(&this_module, LEVEL, __VA_ARGS__); \ - } \ - } while (0) - - if (OVS_UNLIKELY(force || vlog_is_enabled(&this_module, vll))) { - if (c->key.dl_type == htons(ETH_TYPE_IP)) { - CT_VLOG(rl_on, vll, "%s: src ip "IP_FMT" dst ip "IP_FMT" rev src " - "ip "IP_FMT" rev dst ip "IP_FMT" src/dst ports " - "%"PRIu16"/%"PRIu16" rev src/dst ports " - "%"PRIu16"/%"PRIu16" zone/rev zone " - "%"PRIu16"/%"PRIu16" nw_proto/rev nw_proto " - "%"PRIu8"/%"PRIu8, log_msg, - IP_ARGS(c->key.src.addr.ipv4), - IP_ARGS(c->key.dst.addr.ipv4), - IP_ARGS(c->rev_key.src.addr.ipv4), - IP_ARGS(c->rev_key.dst.addr.ipv4), - ntohs(c->key.src.port), ntohs(c->key.dst.port), - ntohs(c->rev_key.src.port), ntohs(c->rev_key.dst.port), - c->key.zone, c->rev_key.zone, c->key.nw_proto, - c->rev_key.nw_proto); - } else { - char ip6_s[INET6_ADDRSTRLEN]; - inet_ntop(AF_INET6, &c->key.src.addr.ipv6, ip6_s, sizeof ip6_s); - char ip6_d[INET6_ADDRSTRLEN]; - inet_ntop(AF_INET6, &c->key.dst.addr.ipv6, ip6_d, sizeof ip6_d); - char ip6_rs[INET6_ADDRSTRLEN]; - inet_ntop(AF_INET6, &c->rev_key.src.addr.ipv6, ip6_rs, - sizeof ip6_rs); - char ip6_rd[INET6_ADDRSTRLEN]; - inet_ntop(AF_INET6, &c->rev_key.dst.addr.ipv6, ip6_rd, - sizeof ip6_rd); - - CT_VLOG(rl_on, vll, "%s: src ip %s dst ip %s rev src ip %s" - " rev dst ip %s src/dst ports %"PRIu16"/%"PRIu16 - " rev src/dst ports %"PRIu16"/%"PRIu16" zone/rev zone " - "%"PRIu16"/%"PRIu16" nw_proto/rev nw_proto " - "%"PRIu8"/%"PRIu8, log_msg, ip6_s, ip6_d, ip6_rs, - ip6_rd, ntohs(c->key.src.port), ntohs(c->key.dst.port), - ntohs(c->rev_key.src.port), ntohs(c->rev_key.dst.port), - c->key.zone, c->rev_key.zone, c->key.nw_proto, - c->rev_key.nw_proto); - } - } -} - /* Initializes the connection tracker 'ct'. The caller is responsible for * calling 'conntrack_destroy()', when the instance is not needed anymore */ struct conntrack * @@ -467,34 +410,28 @@ zone_limit_delete(struct conntrack *ct, uint16_t zone) } static void -conn_clean_cmn(struct conntrack *ct, struct conn *conn) +conn_clean(struct conntrack *ct, struct conn *conn) OVS_REQUIRES(ct->ct_lock) { + struct zone_limit *zl; + uint32_t hash; + if (conn->alg) { - expectation_clean(ct, &conn->key); + expectation_clean(ct, &conn->key_node[CT_DIR_FWD].key); } - uint32_t hash = conn_key_hash(&conn->key, ct->hash_basis); - cmap_remove(&ct->conns, &conn->cm_node, hash); + hash = conn_key_hash(&conn->key_node[CT_DIR_FWD].key, ct->hash_basis); + cmap_remove(&ct->conns, &conn->key_node[CT_DIR_FWD].cm_node, hash); - struct zone_limit *zl = zone_limit_lookup(ct, conn->admit_zone); + zl = zone_limit_lookup(ct, conn->admit_zone); if (zl && zl->czl.zone_limit_seq == conn->zone_limit_seq) { atomic_count_dec(&zl->czl.count); } -} -/* Must be called with 'conn' of 'conn_type' CT_CONN_TYPE_DEFAULT. Also - * removes the associated nat 'conn' from the lookup datastructures. */ -static void -conn_clean(struct conntrack *ct, struct conn *conn) - OVS_REQUIRES(ct->ct_lock) -{ - ovs_assert(conn->conn_type == CT_CONN_TYPE_DEFAULT); - - conn_clean_cmn(ct, conn); - if (conn->nat_conn) { - uint32_t hash = conn_key_hash(&conn->nat_conn->key, ct->hash_basis); - cmap_remove(&ct->conns, &conn->nat_conn->cm_node, hash); + if (conn->nat_action) { + hash = conn_key_hash(&conn->key_node[CT_DIR_REV].key, + ct->hash_basis); + cmap_remove(&ct->conns, &conn->key_node[CT_DIR_REV].cm_node, hash); } ovs_list_remove(&conn->exp_node); conn->cleaned = true; @@ -502,33 +439,26 @@ conn_clean(struct conntrack *ct, struct conn *conn) atomic_count_dec(&ct->n_conn); } -static void -conn_clean_one(struct conntrack *ct, struct conn *conn) - OVS_REQUIRES(ct->ct_lock) -{ - conn_clean_cmn(ct, conn); - if (conn->conn_type == CT_CONN_TYPE_DEFAULT) { - ovs_list_remove(&conn->exp_node); - conn->cleaned = true; - atomic_count_dec(&ct->n_conn); - } - ovsrcu_postpone(delete_conn_one, conn); -} - /* Destroys the connection tracker 'ct' and frees all the allocated memory. * The caller of this function must already have shut down packet input * and PMD threads (which would have been quiesced). */ void conntrack_destroy(struct conntrack *ct) { + struct conn_key_node *keyn; struct conn *conn; + latch_set(&ct->clean_thread_exit); pthread_join(ct->clean_thread, NULL); latch_destroy(&ct->clean_thread_exit); ovs_mutex_lock(&ct->ct_lock); - CMAP_FOR_EACH (conn, cm_node, &ct->conns) { - conn_clean_one(ct, conn); + CMAP_FOR_EACH (keyn, cm_node, &ct->conns) { + if (keyn->dir != CT_DIR_FWD) { + continue; + } + conn = CONTAINER_OF(keyn, struct conn, key_node[keyn->dir]); + conn_clean(ct, conn); } cmap_destroy(&ct->conns); @@ -573,31 +503,31 @@ conn_key_lookup(struct conntrack *ct, const struct conn_key *key, uint32_t hash, long long now, struct conn **conn_out, bool *reply) { - struct conn *conn; + struct conn_key_node *keyn; + struct conn *conn = NULL; bool found = false; - CMAP_FOR_EACH_WITH_HASH (conn, cm_node, hash, &ct->conns) { - if (!conn_key_cmp(&conn->key, key) && !conn_expired(conn, now)) { - found = true; - if (reply) { - *reply = false; - } - break; - } - if (!conn_key_cmp(&conn->rev_key, key) && !conn_expired(conn, now)) { - found = true; - if (reply) { - *reply = true; + CMAP_FOR_EACH_WITH_HASH (keyn, cm_node, hash, &ct->conns) { + conn = CONTAINER_OF(keyn, struct conn, key_node[keyn->dir]); + for (int i = CT_DIR_FWD; i < CT_DIR_MAX; i++) { + if (!conn_key_cmp(&conn->key_node[i].key, key) && + !conn_expired(conn, now)) { + found = true; + if (reply) { + *reply = i; + } + goto out_found; } - break; } } +out_found: if (found && conn_out) { *conn_out = conn; } else if (conn_out) { *conn_out = NULL; } + return found; } @@ -631,7 +561,7 @@ write_ct_md(struct dp_packet *pkt, uint16_t zone, const struct conn *conn, if (conn->alg_related) { key = &conn->parent_key; } else { - key = &conn->key; + key = &conn->key_node[CT_DIR_FWD].key; } } else if (alg_exp) { pkt->md.ct_mark = alg_exp->parent_mark; @@ -754,21 +684,24 @@ handle_alg_ctl(struct conntrack *ct, const struct conn_lookup_ctx *ctx, static void pat_packet(struct dp_packet *pkt, const struct conn *conn) { + const struct conn_key *rev_key = &conn->key_node[CT_DIR_REV].key, + *key = &conn->key_node[CT_DIR_FWD].key; + if (conn->nat_action & NAT_ACTION_SRC) { - if (conn->key.nw_proto == IPPROTO_TCP) { + if (key->nw_proto == IPPROTO_TCP) { struct tcp_header *th = dp_packet_l4(pkt); - packet_set_tcp_port(pkt, conn->rev_key.dst.port, th->tcp_dst); - } else if (conn->key.nw_proto == IPPROTO_UDP) { + packet_set_tcp_port(pkt, rev_key->dst.port, th->tcp_dst); + } else if (key->nw_proto == IPPROTO_UDP) { struct udp_header *uh = dp_packet_l4(pkt); - packet_set_udp_port(pkt, conn->rev_key.dst.port, uh->udp_dst); + packet_set_udp_port(pkt, rev_key->dst.port, uh->udp_dst); } } else if (conn->nat_action & NAT_ACTION_DST) { - if (conn->key.nw_proto == IPPROTO_TCP) { - packet_set_tcp_port(pkt, conn->rev_key.dst.port, - conn->rev_key.src.port); - } else if (conn->key.nw_proto == IPPROTO_UDP) { - packet_set_udp_port(pkt, conn->rev_key.dst.port, - conn->rev_key.src.port); + if (key->nw_proto == IPPROTO_TCP) { + packet_set_tcp_port(pkt, rev_key->dst.port, + rev_key->src.port); + } else if (key->nw_proto == IPPROTO_UDP) { + packet_set_udp_port(pkt, rev_key->dst.port, + rev_key->src.port); } } } @@ -776,32 +709,35 @@ pat_packet(struct dp_packet *pkt, const struct conn *conn) static void nat_packet(struct dp_packet *pkt, const struct conn *conn, bool related) { + const struct conn_key *rev_key = &conn->key_node[CT_DIR_REV].key, + *key = &conn->key_node[CT_DIR_FWD].key; + if (conn->nat_action & NAT_ACTION_SRC) { pkt->md.ct_state |= CS_SRC_NAT; - if (conn->key.dl_type == htons(ETH_TYPE_IP)) { + if (key->dl_type == htons(ETH_TYPE_IP)) { struct ip_header *nh = dp_packet_l3(pkt); packet_set_ipv4_addr(pkt, &nh->ip_src, - conn->rev_key.dst.addr.ipv4); + rev_key->dst.addr.ipv4); } else { struct ovs_16aligned_ip6_hdr *nh6 = dp_packet_l3(pkt); - packet_set_ipv6_addr(pkt, conn->key.nw_proto, + packet_set_ipv6_addr(pkt, key->nw_proto, nh6->ip6_src.be32, - &conn->rev_key.dst.addr.ipv6, true); + &rev_key->dst.addr.ipv6, true); } if (!related) { pat_packet(pkt, conn); } } else if (conn->nat_action & NAT_ACTION_DST) { pkt->md.ct_state |= CS_DST_NAT; - if (conn->key.dl_type == htons(ETH_TYPE_IP)) { + if (key->dl_type == htons(ETH_TYPE_IP)) { struct ip_header *nh = dp_packet_l3(pkt); packet_set_ipv4_addr(pkt, &nh->ip_dst, - conn->rev_key.src.addr.ipv4); + rev_key->src.addr.ipv4); } else { struct ovs_16aligned_ip6_hdr *nh6 = dp_packet_l3(pkt); - packet_set_ipv6_addr(pkt, conn->key.nw_proto, + packet_set_ipv6_addr(pkt, key->nw_proto, nh6->ip6_dst.be32, - &conn->rev_key.src.addr.ipv6, true); + &rev_key->src.addr.ipv6, true); } if (!related) { pat_packet(pkt, conn); @@ -812,19 +748,21 @@ nat_packet(struct dp_packet *pkt, const struct conn *conn, bool related) static void un_pat_packet(struct dp_packet *pkt, const struct conn *conn) { + const struct conn_key *key = &conn->key_node[CT_DIR_FWD].key; + if (conn->nat_action & NAT_ACTION_SRC) { - if (conn->key.nw_proto == IPPROTO_TCP) { + if (key->nw_proto == IPPROTO_TCP) { struct tcp_header *th = dp_packet_l4(pkt); - packet_set_tcp_port(pkt, th->tcp_src, conn->key.src.port); - } else if (conn->key.nw_proto == IPPROTO_UDP) { + packet_set_tcp_port(pkt, th->tcp_src, key->src.port); + } else if (key->nw_proto == IPPROTO_UDP) { struct udp_header *uh = dp_packet_l4(pkt); - packet_set_udp_port(pkt, uh->udp_src, conn->key.src.port); + packet_set_udp_port(pkt, uh->udp_src, key->src.port); } } else if (conn->nat_action & NAT_ACTION_DST) { - if (conn->key.nw_proto == IPPROTO_TCP) { - packet_set_tcp_port(pkt, conn->key.dst.port, conn->key.src.port); - } else if (conn->key.nw_proto == IPPROTO_UDP) { - packet_set_udp_port(pkt, conn->key.dst.port, conn->key.src.port); + if (key->nw_proto == IPPROTO_TCP) { + packet_set_tcp_port(pkt, key->dst.port, key->src.port); + } else if (key->nw_proto == IPPROTO_UDP) { + packet_set_udp_port(pkt, key->dst.port, key->src.port); } } } @@ -832,23 +770,25 @@ un_pat_packet(struct dp_packet *pkt, const struct conn *conn) static void reverse_pat_packet(struct dp_packet *pkt, const struct conn *conn) { + const struct conn_key *key = &conn->key_node[CT_DIR_FWD].key; + if (conn->nat_action & NAT_ACTION_SRC) { - if (conn->key.nw_proto == IPPROTO_TCP) { + if (key->nw_proto == IPPROTO_TCP) { struct tcp_header *th_in = dp_packet_l4(pkt); - packet_set_tcp_port(pkt, conn->key.src.port, + packet_set_tcp_port(pkt, key->src.port, th_in->tcp_dst); - } else if (conn->key.nw_proto == IPPROTO_UDP) { + } else if (key->nw_proto == IPPROTO_UDP) { struct udp_header *uh_in = dp_packet_l4(pkt); - packet_set_udp_port(pkt, conn->key.src.port, + packet_set_udp_port(pkt, key->src.port, uh_in->udp_dst); } } else if (conn->nat_action & NAT_ACTION_DST) { - if (conn->key.nw_proto == IPPROTO_TCP) { - packet_set_tcp_port(pkt, conn->key.src.port, - conn->key.dst.port); - } else if (conn->key.nw_proto == IPPROTO_UDP) { - packet_set_udp_port(pkt, conn->key.src.port, - conn->key.dst.port); + if (key->nw_proto == IPPROTO_TCP) { + packet_set_tcp_port(pkt, key->src.port, + key->dst.port); + } else if (key->nw_proto == IPPROTO_UDP) { + packet_set_udp_port(pkt, key->src.port, + key->dst.port); } } } @@ -856,6 +796,7 @@ reverse_pat_packet(struct dp_packet *pkt, const struct conn *conn) static void reverse_nat_packet(struct dp_packet *pkt, const struct conn *conn) { + const struct conn_key *key = &conn->key_node[CT_DIR_FWD].key; char *tail = dp_packet_tail(pkt); uint16_t pad = dp_packet_l2_pad_size(pkt); struct conn_key inner_key; @@ -863,7 +804,7 @@ reverse_nat_packet(struct dp_packet *pkt, const struct conn *conn) uint16_t orig_l3_ofs = pkt->l3_ofs; uint16_t orig_l4_ofs = pkt->l4_ofs; - if (conn->key.dl_type == htons(ETH_TYPE_IP)) { + if (key->dl_type == htons(ETH_TYPE_IP)) { struct ip_header *nh = dp_packet_l3(pkt); struct icmp_header *icmp = dp_packet_l4(pkt); struct ip_header *inner_l3 = (struct ip_header *) (icmp + 1); @@ -876,10 +817,10 @@ reverse_nat_packet(struct dp_packet *pkt, const struct conn *conn) if (conn->nat_action & NAT_ACTION_SRC) { packet_set_ipv4_addr(pkt, &inner_l3->ip_src, - conn->key.src.addr.ipv4); + key->src.addr.ipv4); } else if (conn->nat_action & NAT_ACTION_DST) { packet_set_ipv4_addr(pkt, &inner_l3->ip_dst, - conn->key.dst.addr.ipv4); + key->dst.addr.ipv4); } reverse_pat_packet(pkt, conn); @@ -899,13 +840,13 @@ reverse_nat_packet(struct dp_packet *pkt, const struct conn *conn) pkt->l4_ofs += inner_l4 - (char *) icmp6; if (conn->nat_action & NAT_ACTION_SRC) { - packet_set_ipv6_addr(pkt, conn->key.nw_proto, + packet_set_ipv6_addr(pkt, key->nw_proto, inner_l3_6->ip6_src.be32, - &conn->key.src.addr.ipv6, true); + &key->src.addr.ipv6, true); } else if (conn->nat_action & NAT_ACTION_DST) { - packet_set_ipv6_addr(pkt, conn->key.nw_proto, + packet_set_ipv6_addr(pkt, key->nw_proto, inner_l3_6->ip6_dst.be32, - &conn->key.dst.addr.ipv6, true); + &key->dst.addr.ipv6, true); } reverse_pat_packet(pkt, conn); icmp6->icmp6_base.icmp6_cksum = 0; @@ -920,17 +861,19 @@ static void un_nat_packet(struct dp_packet *pkt, const struct conn *conn, bool related) { + const struct conn_key *key = &conn->key_node[CT_DIR_FWD].key; + if (conn->nat_action & NAT_ACTION_SRC) { pkt->md.ct_state |= CS_DST_NAT; - if (conn->key.dl_type == htons(ETH_TYPE_IP)) { + if (key->dl_type == htons(ETH_TYPE_IP)) { struct ip_header *nh = dp_packet_l3(pkt); packet_set_ipv4_addr(pkt, &nh->ip_dst, - conn->key.src.addr.ipv4); + key->src.addr.ipv4); } else { struct ovs_16aligned_ip6_hdr *nh6 = dp_packet_l3(pkt); - packet_set_ipv6_addr(pkt, conn->key.nw_proto, + packet_set_ipv6_addr(pkt, key->nw_proto, nh6->ip6_dst.be32, - &conn->key.src.addr.ipv6, true); + &key->src.addr.ipv6, true); } if (OVS_UNLIKELY(related)) { @@ -940,15 +883,15 @@ un_nat_packet(struct dp_packet *pkt, const struct conn *conn, } } else if (conn->nat_action & NAT_ACTION_DST) { pkt->md.ct_state |= CS_SRC_NAT; - if (conn->key.dl_type == htons(ETH_TYPE_IP)) { + if (key->dl_type == htons(ETH_TYPE_IP)) { struct ip_header *nh = dp_packet_l3(pkt); packet_set_ipv4_addr(pkt, &nh->ip_src, - conn->key.dst.addr.ipv4); + key->dst.addr.ipv4); } else { struct ovs_16aligned_ip6_hdr *nh6 = dp_packet_l3(pkt); - packet_set_ipv6_addr(pkt, conn->key.nw_proto, + packet_set_ipv6_addr(pkt, key->nw_proto, nh6->ip6_src.be32, - &conn->key.dst.addr.ipv6, true); + &key->dst.addr.ipv6, true); } if (OVS_UNLIKELY(related)) { @@ -966,7 +909,7 @@ conn_seq_skew_set(struct conntrack *ct, const struct conn *conn_in, { struct conn *conn; ovs_mutex_unlock(&conn_in->lock); - conn_lookup(ct, &conn_in->key, now, &conn, NULL); + conn_lookup(ct, &conn_in->key_node[CT_DIR_FWD].key, now, &conn, NULL); ovs_mutex_lock(&conn_in->lock); if (conn && seq_skew) { @@ -1004,7 +947,7 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt, OVS_REQUIRES(ct->ct_lock) { struct conn *nc = NULL; - struct conn *nat_conn = NULL; + uint32_t rev_hash = ctx->hash; if (!valid_new(pkt, &ctx->key)) { pkt->md.ct_state = CS_INVALID; @@ -1018,6 +961,8 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt, } if (commit) { + struct conn_key_node *fwd_key_node, *rev_key_node; + struct zone_limit *zl = zone_limit_lookup_or_default(ct, ctx->key.zone); if (zl && atomic_count_get(&zl->czl.count) >= zl->czl.limit) { @@ -1032,9 +977,12 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt, } nc = new_conn(ct, pkt, &ctx->key, now, tp_id); - memcpy(&nc->key, &ctx->key, sizeof nc->key); - memcpy(&nc->rev_key, &nc->key, sizeof nc->rev_key); - conn_key_reverse(&nc->rev_key); + fwd_key_node = &nc->key_node[CT_DIR_FWD]; + rev_key_node = &nc->key_node[CT_DIR_REV]; + memcpy(&fwd_key_node->key, &ctx->key, sizeof fwd_key_node->key); + memcpy(&rev_key_node->key, &fwd_key_node->key, + sizeof rev_key_node->key); + conn_key_reverse(&rev_key_node->key); if (ct_verify_helper(helper, ct_alg_ctl)) { nc->alg = nullable_xstrdup(helper); @@ -1049,45 +997,33 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt, if (nat_action_info) { nc->nat_action = nat_action_info->nat_action; - nat_conn = xzalloc(sizeof *nat_conn); if (alg_exp) { if (alg_exp->nat_rpl_dst) { - nc->rev_key.dst.addr = alg_exp->alg_nat_repl_addr; + rev_key_node->key.dst.addr = alg_exp->alg_nat_repl_addr; nc->nat_action = NAT_ACTION_SRC; } else { - nc->rev_key.src.addr = alg_exp->alg_nat_repl_addr; + rev_key_node->key.src.addr = alg_exp->alg_nat_repl_addr; nc->nat_action = NAT_ACTION_DST; } } else { - memcpy(nat_conn, nc, sizeof *nat_conn); - bool nat_res = nat_get_unique_tuple(ct, nc, nat_conn, - nat_action_info); + bool nat_res = nat_get_unique_tuple(ct, nc, nat_action_info); if (!nat_res) { goto nat_res_exhaustion; } - - /* Update nc with nat adjustments made to nat_conn by - * nat_get_unique_tuple(). */ - memcpy(nc, nat_conn, sizeof *nc); } nat_packet(pkt, nc, ctx->icmp_related); - memcpy(&nat_conn->key, &nc->rev_key, sizeof nat_conn->key); - memcpy(&nat_conn->rev_key, &nc->key, sizeof nat_conn->rev_key); - nat_conn->conn_type = CT_CONN_TYPE_UN_NAT; - nat_conn->nat_action = 0; - nat_conn->alg = NULL; - nat_conn->nat_conn = NULL; - uint32_t nat_hash = conn_key_hash(&nat_conn->key, ct->hash_basis); - cmap_insert(&ct->conns, &nat_conn->cm_node, nat_hash); + rev_hash = conn_key_hash(&rev_key_node->key, ct->hash_basis); + rev_key_node->dir = CT_DIR_REV; + cmap_insert(&ct->conns, &rev_key_node->cm_node, rev_hash); } - nc->nat_conn = nat_conn; ovs_mutex_init_adaptive(&nc->lock); - nc->conn_type = CT_CONN_TYPE_DEFAULT; - cmap_insert(&ct->conns, &nc->cm_node, ctx->hash); + fwd_key_node->dir = CT_DIR_FWD; + cmap_insert(&ct->conns, &fwd_key_node->cm_node, ctx->hash); + atomic_count_inc(&ct->n_conn); ctx->conn = nc; /* For completeness. */ if (zl) { @@ -1107,7 +1043,6 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt, * firewall rules or a separate firewall. Also using zone partitioning * can limit DoS impact. */ nat_res_exhaustion: - free(nat_conn); ovs_list_remove(&nc->exp_node); delete_conn_cmn(nc); static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 5); @@ -1121,7 +1056,6 @@ conn_update_state(struct conntrack *ct, struct dp_packet *pkt, struct conn_lookup_ctx *ctx, struct conn *conn, long long now) { - ovs_assert(conn->conn_type == CT_CONN_TYPE_DEFAULT); bool create_new_conn = false; if (ctx->icmp_related) { @@ -1149,7 +1083,8 @@ conn_update_state(struct conntrack *ct, struct dp_packet *pkt, break; case CT_UPDATE_NEW: ovs_mutex_lock(&ct->ct_lock); - if (conn_lookup(ct, &conn->key, now, NULL, NULL)) { + if (conn_lookup(ct, &conn->key_node[CT_DIR_FWD].key, + now, NULL, NULL)) { conn_clean(ct, conn); } ovs_mutex_unlock(&ct->ct_lock); @@ -1330,11 +1265,12 @@ initial_conn_lookup(struct conntrack *ct, struct conn_lookup_ctx *ctx, if (natted) { if (OVS_LIKELY(ctx->conn)) { ctx->reply = !ctx->reply; - ctx->key = ctx->reply ? ctx->conn->rev_key : ctx->conn->key; + ctx->key = ctx->conn->key_node[ctx->reply ? + CT_DIR_REV : CT_DIR_FWD].key; ctx->hash = conn_key_hash(&ctx->key, ct->hash_basis); } else { /* A lookup failure does not necessarily imply that an - * error occurred, it may simply indicate that a conn got + * error occurred, it may simply indicate that a ctx->conn got * removed during the recirculation. */ COVERAGE_INC(conntrack_lookup_natted_miss); conn_key_reverse(&ctx->key); @@ -1364,32 +1300,14 @@ process_one(struct conntrack *ct, struct dp_packet *pkt, /* Delete found entry if in wrong direction. 'force' implies commit. */ if (OVS_UNLIKELY(force && ctx->reply && conn)) { ovs_mutex_lock(&ct->ct_lock); - if (conn_lookup(ct, &conn->key, now, NULL, NULL)) { + if (conn_lookup(ct, &conn->key_node[CT_DIR_FWD].key, + now, NULL, NULL)) { conn_clean(ct, conn); } ovs_mutex_unlock(&ct->ct_lock); conn = NULL; } - if (OVS_LIKELY(conn)) { - if (conn->conn_type == CT_CONN_TYPE_UN_NAT) { - - ctx->reply = true; - struct conn *rev_conn = conn; /* Save for debugging. */ - uint32_t hash = conn_key_hash(&conn->rev_key, ct->hash_basis); - conn_key_lookup(ct, &ctx->key, hash, now, &conn, &ctx->reply); - - if (!conn) { - pkt->md.ct_state |= CS_INVALID; - write_ct_md(pkt, zone, NULL, NULL, NULL); - char *log_msg = xasprintf("Missing parent conn %p", rev_conn); - ct_print_conn_info(rev_conn, log_msg, VLL_INFO, true, true); - free(log_msg); - return; - } - } - } - enum ct_alg_ctl_type ct_alg_ctl = get_alg_ctl_type(pkt, tp_src, tp_dst, helper); @@ -1482,7 +1400,8 @@ conntrack_execute(struct conntrack *ct, struct dp_packet_batch *pkt_batch, struct conn *conn = packet->md.conn; if (OVS_UNLIKELY(packet->md.ct_state == CS_INVALID)) { write_ct_md(packet, zone, NULL, NULL, NULL); - } else if (conn && conn->key.zone == zone && !force + } else if (conn && + conn->key_node[CT_DIR_FWD].key.zone == zone && !force && !get_alg_ctl_type(packet, tp_src, tp_dst, helper)) { process_one_fast(zone, setmark, setlabel, nat_action_info, conn, packet); @@ -2263,7 +2182,7 @@ nat_ipv6_addr_increment(struct in6_addr *ipv6, uint32_t increment) } static uint32_t -nat_range_hash(const struct conn *conn, uint32_t basis, +nat_range_hash(struct conn_key *key, uint32_t basis, const struct nat_action_info_t *nat_info) { uint32_t hash = basis; @@ -2273,11 +2192,11 @@ nat_range_hash(const struct conn *conn, uint32_t basis, hash = hash_add(hash, ((uint32_t) nat_info->max_port << 16) | nat_info->min_port); - hash = ct_endpoint_hash_add(hash, &conn->key.src); - hash = ct_endpoint_hash_add(hash, &conn->key.dst); - hash = hash_add(hash, (OVS_FORCE uint32_t) conn->key.dl_type); - hash = hash_add(hash, conn->key.nw_proto); - hash = hash_add(hash, conn->key.zone); + hash = ct_endpoint_hash_add(hash, &key->src); + hash = ct_endpoint_hash_add(hash, &key->dst); + hash = hash_add(hash, (OVS_FORCE uint32_t) key->dl_type); + hash = hash_add(hash, key->nw_proto); + hash = hash_add(hash, key->zone); /* The purpose of the second parameter is to distinguish hashes of data of * different length; our data always has the same length so there is no @@ -2351,7 +2270,7 @@ get_addr_in_range(union ct_addr *min, union ct_addr *max, } static void -find_addr(const struct conn *conn, union ct_addr *min, +find_addr(struct conn_key *key, union ct_addr *min, union ct_addr *max, union ct_addr *curr, uint32_t hash, bool ipv4, const struct nat_action_info_t *nat_info) @@ -2361,9 +2280,9 @@ find_addr(const struct conn *conn, union ct_addr *min, /* All-zero case. */ if (!memcmp(min, &zero_ip, sizeof *min)) { if (nat_info->nat_action & NAT_ACTION_SRC) { - *curr = conn->key.src.addr; + *curr = key->src.addr; } else if (nat_info->nat_action & NAT_ACTION_DST) { - *curr = conn->key.dst.addr; + *curr = key->dst.addr; } } else { get_addr_in_range(min, max, curr, hash, ipv4); @@ -2382,7 +2301,7 @@ store_addr_to_key(union ct_addr *addr, struct conn_key *key, } static bool -nat_get_unique_l4(struct conntrack *ct, struct conn *nat_conn, +nat_get_unique_l4(struct conntrack *ct, struct conn_key *rev_key, ovs_be16 *port, uint16_t curr, uint16_t min, uint16_t max) { @@ -2405,8 +2324,8 @@ another_round: } *port = htons(curr); - if (!conn_lookup(ct, &nat_conn->rev_key, - time_msec(), NULL, NULL)) { + if (!conn_lookup(ct, rev_key, time_msec(), + NULL, NULL)) { return true; } } @@ -2444,38 +2363,41 @@ another_round: * * If none can be found, return exhaustion to the caller. */ static bool -nat_get_unique_tuple(struct conntrack *ct, const struct conn *conn, - struct conn *nat_conn, +nat_get_unique_tuple(struct conntrack *ct, struct conn *conn, const struct nat_action_info_t *nat_info) { - uint32_t hash = nat_range_hash(conn, ct->hash_basis, nat_info); union ct_addr min_addr = {0}, max_addr = {0}, addr = {0}; - bool pat_proto = conn->key.nw_proto == IPPROTO_TCP || - conn->key.nw_proto == IPPROTO_UDP; + struct conn_key *fwd_key = &conn->key_node[CT_DIR_FWD].key, + *rev_key = &conn->key_node[CT_DIR_REV].key; + bool pat_proto = fwd_key->nw_proto == IPPROTO_TCP || + fwd_key->nw_proto == IPPROTO_UDP; uint16_t min_dport, max_dport, curr_dport; uint16_t min_sport, max_sport, curr_sport; + uint32_t hash; + + hash = nat_range_hash(fwd_key, ct->hash_basis, nat_info); min_addr = nat_info->min_addr; max_addr = nat_info->max_addr; - find_addr(conn, &min_addr, &max_addr, &addr, hash, - (conn->key.dl_type == htons(ETH_TYPE_IP)), nat_info); + find_addr(fwd_key, &min_addr, &max_addr, &addr, hash, + (fwd_key->dl_type == htons(ETH_TYPE_IP)), nat_info); - set_sport_range(nat_info, &conn->key, hash, &curr_sport, + set_sport_range(nat_info, fwd_key, hash, &curr_sport, &min_sport, &max_sport); - set_dport_range(nat_info, &conn->key, hash, &curr_dport, + set_dport_range(nat_info, fwd_key, hash, &curr_dport, &min_dport, &max_dport); if (pat_proto) { - nat_conn->rev_key.src.port = htons(curr_dport); - nat_conn->rev_key.dst.port = htons(curr_sport); + rev_key->src.port = htons(curr_dport); + rev_key->dst.port = htons(curr_sport); } - store_addr_to_key(&addr, &nat_conn->rev_key, + store_addr_to_key(&addr, rev_key, nat_info->nat_action); if (!pat_proto) { - if (!conn_lookup(ct, &nat_conn->rev_key, + if (!conn_lookup(ct, rev_key, time_msec(), NULL, NULL)) { return true; } @@ -2485,12 +2407,12 @@ nat_get_unique_tuple(struct conntrack *ct, const struct conn *conn, bool found = false; if (nat_info->nat_action & NAT_ACTION_DST_PORT) { - found = nat_get_unique_l4(ct, nat_conn, &nat_conn->rev_key.src.port, + found = nat_get_unique_l4(ct, rev_key, &rev_key->src.port, curr_dport, min_dport, max_dport); } if (!found) { - found = nat_get_unique_l4(ct, nat_conn, &nat_conn->rev_key.dst.port, + found = nat_get_unique_l4(ct, rev_key, &rev_key->dst.port, curr_sport, min_sport, max_sport); } @@ -2506,9 +2428,9 @@ conn_update(struct conntrack *ct, struct conn *conn, struct dp_packet *pkt, struct conn_lookup_ctx *ctx, long long now) { ovs_mutex_lock(&conn->lock); + uint8_t nw_proto = conn->key_node[CT_DIR_FWD].key.nw_proto; enum ct_update_res update_res = - l4_protos[conn->key.nw_proto]->conn_update(ct, conn, pkt, ctx->reply, - now); + l4_protos[nw_proto]->conn_update(ct, conn, pkt, ctx->reply, now); ovs_mutex_unlock(&conn->lock); return update_res; } @@ -2516,13 +2438,10 @@ conn_update(struct conntrack *ct, struct conn *conn, struct dp_packet *pkt, static bool conn_expired(struct conn *conn, long long now) { - if (conn->conn_type == CT_CONN_TYPE_DEFAULT) { - ovs_mutex_lock(&conn->lock); - bool expired = now >= conn->expiration ? true : false; - ovs_mutex_unlock(&conn->lock); - return expired; - } - return false; + ovs_mutex_lock(&conn->lock); + bool expired = now >= conn->expiration ? true : false; + ovs_mutex_unlock(&conn->lock); + return expired; } static bool @@ -2548,19 +2467,7 @@ delete_conn_cmn(struct conn *conn) static void delete_conn(struct conn *conn) { - ovs_assert(conn->conn_type == CT_CONN_TYPE_DEFAULT); ovs_mutex_destroy(&conn->lock); - free(conn->nat_conn); - delete_conn_cmn(conn); -} - -/* Only used by conn_clean_one(). */ -static void -delete_conn_one(struct conn *conn) -{ - if (conn->conn_type == CT_CONN_TYPE_DEFAULT) { - ovs_mutex_destroy(&conn->lock); - } delete_conn_cmn(conn); } @@ -2652,11 +2559,14 @@ static void conn_to_ct_dpif_entry(const struct conn *conn, struct ct_dpif_entry *entry, long long now) { + const struct conn_key *key = &conn->key_node[CT_DIR_FWD].key, + *rev_key = &conn->key_node[CT_DIR_REV].key; + memset(entry, 0, sizeof *entry); - conn_key_to_tuple(&conn->key, &entry->tuple_orig); - conn_key_to_tuple(&conn->rev_key, &entry->tuple_reply); + conn_key_to_tuple(key, &entry->tuple_orig); + conn_key_to_tuple(rev_key, &entry->tuple_reply); - entry->zone = conn->key.zone; + entry->zone = key->zone; ovs_mutex_lock(&conn->lock); entry->mark = conn->mark; @@ -2664,7 +2574,7 @@ conn_to_ct_dpif_entry(const struct conn *conn, struct ct_dpif_entry *entry, long long expiration = conn->expiration - now; - struct ct_l4_proto *class = l4_protos[conn->key.nw_proto]; + struct ct_l4_proto *class = l4_protos[key->nw_proto]; if (class->conn_get_protoinfo) { class->conn_get_protoinfo(conn, &entry->protoinfo); } @@ -2712,10 +2622,12 @@ conntrack_dump_next(struct conntrack_dump *dump, struct ct_dpif_entry *entry) if (!cm_node) { break; } + struct conn_key_node *keyn; struct conn *conn; - INIT_CONTAINER(conn, cm_node, cm_node); - if ((!dump->filter_zone || conn->key.zone == dump->zone) && - (conn->conn_type != CT_CONN_TYPE_UN_NAT)) { + INIT_CONTAINER(keyn, cm_node, cm_node); + conn = CONTAINER_OF(keyn, struct conn, key_node[keyn->dir]); + if ((!dump->filter_zone || keyn->key.zone == dump->zone) && + (keyn->dir == CT_DIR_FWD)) { conn_to_ct_dpif_entry(conn, entry, now); return 0; } @@ -2733,12 +2645,17 @@ conntrack_dump_done(struct conntrack_dump *dump OVS_UNUSED) int conntrack_flush(struct conntrack *ct, const uint16_t *zone) { + struct conn_key_node *keyn; struct conn *conn; ovs_mutex_lock(&ct->ct_lock); - CMAP_FOR_EACH (conn, cm_node, &ct->conns) { - if (!zone || *zone == conn->key.zone) { - conn_clean_one(ct, conn); + CMAP_FOR_EACH (keyn, cm_node, &ct->conns) { + if (keyn->dir != CT_DIR_FWD) { + continue; + } + conn = CONTAINER_OF(keyn, struct conn, key_node[keyn->dir]); + if (!zone || *zone == keyn->key.zone) { + conn_clean(ct, conn); } } ovs_mutex_unlock(&ct->ct_lock); @@ -2759,10 +2676,10 @@ conntrack_flush_tuple(struct conntrack *ct, const struct ct_dpif_tuple *tuple, ovs_mutex_lock(&ct->ct_lock); conn_lookup(ct, &key, time_msec(), &conn, NULL); - if (conn && conn->conn_type == CT_CONN_TYPE_DEFAULT) { + if (conn) { conn_clean(ct, conn); } else { - VLOG_WARN("Must flush tuple using the original pre-NATed tuple"); + VLOG_WARN("Tuple not found"); error = ENOENT; } @@ -2906,50 +2823,52 @@ expectation_create(struct conntrack *ct, ovs_be16 dst_port, const struct conn *parent_conn, bool reply, bool src_ip_wc, bool skip_nat) { + const struct conn_key *pconn_key = &parent_conn->key_node[CT_DIR_FWD].key, + *pconn_rev_key = &parent_conn->key_node[CT_DIR_REV].key; union ct_addr src_addr; union ct_addr dst_addr; union ct_addr alg_nat_repl_addr; struct alg_exp_node *alg_exp_node = xzalloc(sizeof *alg_exp_node); if (reply) { - src_addr = parent_conn->key.src.addr; - dst_addr = parent_conn->key.dst.addr; + src_addr = pconn_key->src.addr; + dst_addr = pconn_key->dst.addr; alg_exp_node->nat_rpl_dst = true; if (skip_nat) { alg_nat_repl_addr = dst_addr; } else if (parent_conn->nat_action & NAT_ACTION_DST) { - alg_nat_repl_addr = parent_conn->rev_key.src.addr; + alg_nat_repl_addr = pconn_rev_key->src.addr; alg_exp_node->nat_rpl_dst = false; } else { - alg_nat_repl_addr = parent_conn->rev_key.dst.addr; + alg_nat_repl_addr = pconn_rev_key->dst.addr; } } else { - src_addr = parent_conn->rev_key.src.addr; - dst_addr = parent_conn->rev_key.dst.addr; + src_addr = pconn_rev_key->src.addr; + dst_addr = pconn_rev_key->dst.addr; alg_exp_node->nat_rpl_dst = false; if (skip_nat) { alg_nat_repl_addr = src_addr; } else if (parent_conn->nat_action & NAT_ACTION_DST) { - alg_nat_repl_addr = parent_conn->key.dst.addr; + alg_nat_repl_addr = pconn_key->dst.addr; alg_exp_node->nat_rpl_dst = true; } else { - alg_nat_repl_addr = parent_conn->key.src.addr; + alg_nat_repl_addr = pconn_key->src.addr; } } if (src_ip_wc) { memset(&src_addr, 0, sizeof src_addr); } - alg_exp_node->key.dl_type = parent_conn->key.dl_type; - alg_exp_node->key.nw_proto = parent_conn->key.nw_proto; - alg_exp_node->key.zone = parent_conn->key.zone; + alg_exp_node->key.dl_type = pconn_key->dl_type; + alg_exp_node->key.nw_proto = pconn_key->nw_proto; + alg_exp_node->key.zone = pconn_key->zone; alg_exp_node->key.src.addr = src_addr; alg_exp_node->key.dst.addr = dst_addr; alg_exp_node->key.src.port = ALG_WC_SRC_PORT; alg_exp_node->key.dst.port = dst_port; alg_exp_node->parent_mark = parent_conn->mark; alg_exp_node->parent_label = parent_conn->label; - memcpy(&alg_exp_node->parent_key, &parent_conn->key, + memcpy(&alg_exp_node->parent_key, pconn_key, sizeof alg_exp_node->parent_key); /* Take the write lock here because it is almost 100% * likely that the lookup will fail and @@ -3201,12 +3120,16 @@ process_ftp_ctl_v4(struct conntrack *ct, switch (mode) { case CT_FTP_MODE_ACTIVE: - *v4_addr_rep = conn_for_expectation->rev_key.dst.addr.ipv4; - conn_ipv4_addr = conn_for_expectation->key.src.addr.ipv4; + *v4_addr_rep = + conn_for_expectation->key_node[CT_DIR_REV].key.dst.addr.ipv4; + conn_ipv4_addr = + conn_for_expectation->key_node[CT_DIR_FWD].key.src.addr.ipv4; break; case CT_FTP_MODE_PASSIVE: - *v4_addr_rep = conn_for_expectation->key.dst.addr.ipv4; - conn_ipv4_addr = conn_for_expectation->rev_key.src.addr.ipv4; + *v4_addr_rep = + conn_for_expectation->key_node[CT_DIR_FWD].key.dst.addr.ipv4; + conn_ipv4_addr = + conn_for_expectation->key_node[CT_DIR_REV].key.src.addr.ipv4; break; case CT_TFTP_MODE: default: @@ -3306,17 +3229,20 @@ process_ftp_ctl_v6(struct conntrack *ct, switch (*mode) { case CT_FTP_MODE_ACTIVE: - *v6_addr_rep = conn_for_expectation->rev_key.dst.addr; + *v6_addr_rep = + conn_for_expectation->key_node[CT_DIR_REV].key.dst.addr; /* Although most servers will block this exploit, there may be some * less well managed. */ if (memcmp(&ip6_addr, &v6_addr_rep->ipv6, sizeof ip6_addr) && - memcmp(&ip6_addr, &conn_for_expectation->key.src.addr.ipv6, + memcmp(&ip6_addr, + &conn_for_expectation->key_node[CT_DIR_FWD].key.src.addr.ipv6, sizeof ip6_addr)) { return CT_FTP_CTL_INVALID; } break; case CT_FTP_MODE_PASSIVE: - *v6_addr_rep = conn_for_expectation->key.dst.addr; + *v6_addr_rep = + conn_for_expectation->key_node[CT_DIR_FWD].key.dst.addr; break; case CT_TFTP_MODE: default: @@ -3477,7 +3403,8 @@ handle_tftp_ctl(struct conntrack *ct, long long now OVS_UNUSED, enum ftp_ctl_pkt ftp_ctl OVS_UNUSED, bool nat OVS_UNUSED) { - expectation_create(ct, conn_for_expectation->key.src.port, + expectation_create(ct, + conn_for_expectation->key_node[CT_DIR_FWD].key.src.port, conn_for_expectation, !!(pkt->md.ct_state & CS_REPLY_DIR), false, false); } From patchwork Fri Jul 1 13:35:09 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Valerio X-Patchwork-Id: 1651205 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=FuGHRkSN; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::136; helo=smtp3.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from smtp3.osuosl.org (smtp3.osuosl.org [IPv6:2605:bc80:3010::136]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4LZGRd1YY3z9s09 for ; Fri, 1 Jul 2022 23:35:41 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id 4CC7961423; Fri, 1 Jul 2022 13:35:37 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 smtp3.osuosl.org 4CC7961423 Authentication-Results: smtp3.osuosl.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=FuGHRkSN X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id OPPIDdBEVvs1; Fri, 1 Jul 2022 13:35:34 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by smtp3.osuosl.org (Postfix) with ESMTPS id 2515C60BC4; Fri, 1 Jul 2022 13:35:33 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 smtp3.osuosl.org 2515C60BC4 Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 96607C0081; Fri, 1 Jul 2022 13:35:31 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp1.osuosl.org (smtp1.osuosl.org [140.211.166.138]) by lists.linuxfoundation.org (Postfix) with ESMTP id 59BDCC002D for ; Fri, 1 Jul 2022 13:35:29 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp1.osuosl.org (Postfix) with ESMTP id 2D4A8846ED for ; Fri, 1 Jul 2022 13:35:18 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 smtp1.osuosl.org 2D4A8846ED Authentication-Results: smtp1.osuosl.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=FuGHRkSN X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp1.osuosl.org ([127.0.0.1]) by localhost (smtp1.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id HHU8q_WGh1gn for ; Fri, 1 Jul 2022 13:35:16 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 DKIM-Filter: OpenDKIM Filter v2.11.0 smtp1.osuosl.org 2C92A846E5 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by smtp1.osuosl.org (Postfix) with ESMTPS id 2C92A846E5 for ; Fri, 1 Jul 2022 13:35:16 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1656682515; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=1Km7bS6Jht315yyuO+3UPnhmHzmC4v8VIIWCpJS/ttI=; b=FuGHRkSN8A+eivCcqgJ0kMyKvv2kaNT13nwjXhnstSkCUelArnV/cks0XSRyg/t0I+btud l/6ZGXppABjWgYJXm6O/Jsgt3UGhhlh4qdYDysHk9QAdB69j8iUYfwebo6zp07Yvx+bMx4 XjAWJibs4wj98u3FZtQwD15TO7VDXzc= Received: from mail-wm1-f70.google.com (mail-wm1-f70.google.com [209.85.128.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-444-Yxwx0YkcM7G-zS__b8E5gQ-1; Fri, 01 Jul 2022 09:35:13 -0400 X-MC-Unique: Yxwx0YkcM7G-zS__b8E5gQ-1 Received: by mail-wm1-f70.google.com with SMTP id bg6-20020a05600c3c8600b003a03d5d19e4so1433685wmb.1 for ; Fri, 01 Jul 2022 06:35:13 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:subject:from:to:cc:date:message-id:in-reply-to :references:user-agent:mime-version:content-transfer-encoding; bh=JV3V/lMdx6ux3hDUSneuvsU1aYvAJ+RUXdnpNROtxnk=; b=QS9/4ks79YuPYnxpZsDnmeEGXFZ5kscjwrG57EF4WO8lVyKhbDWpXVOLro1oiT70I2 YFJAeryJhFdTqwjFUK7P2xhOIFtqWIXres8fZ3CSd2F0AZMcem9L9QMlUItdCuUCF3Hk tvQ5YWVmJCRS3z6VgELxMevwLk9ibK3EL7jzWXa5y0U9LeWiun26Pt6beSQ6MOGlrDGG P692cLpGB8jae+D/wlUmQGCBq44JsqPpMJA4UqzPYvdm30cDHmk9ImhsJzZjHvHNUgwo nPyHQnotsOLx4Msn5/L9n9pREvObHuIB9lNooqH17WOespj+eEr3FPDz4NH+rgptGzcw ReUw== X-Gm-Message-State: AJIora9Wq+Rj8aIes+JptYhrjJiYTgtTxvSJXCsrH5F9ShkMKkfQxiBp 7Cg1TurKERqzYcja/cJkC2IM9B9zLz+uD5AD/9XtgX2EHz6G5xMXO/LfrZddkYD16UcjqXKI5xs Q3MFQgrzWB1vaiJ6oV6UfJ/7FH1aOqUcn6TwqVqjuDcHPEfZH4m/pANIFmvcTx9cp X-Received: by 2002:a05:600c:a02:b0:39d:bdf8:a4f4 with SMTP id z2-20020a05600c0a0200b0039dbdf8a4f4mr16204728wmp.201.1656682511924; Fri, 01 Jul 2022 06:35:11 -0700 (PDT) X-Google-Smtp-Source: AGRyM1uW5xb/8v3u8mspEb7zEmQGbXjOASk0JkNWM/5H3W6YiYDj7NuL0NrPJKydgZN46a9w2lKR4w== X-Received: by 2002:a05:600c:a02:b0:39d:bdf8:a4f4 with SMTP id z2-20020a05600c0a0200b0039dbdf8a4f4mr16204650wmp.201.1656682511017; Fri, 01 Jul 2022 06:35:11 -0700 (PDT) Received: from localhost (net-5-95-131-226.cust.vodafonedsl.it. [5.95.131.226]) by smtp.gmail.com with ESMTPSA id bv13-20020a0560001f0d00b002101ed6e70fsm3781987wrb.37.2022.07.01.06.35.10 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 01 Jul 2022 06:35:10 -0700 (PDT) From: Paolo Valerio To: dev@openvswitch.org Date: Fri, 01 Jul 2022 15:35:09 +0200 Message-ID: <165668250987.1967719.7371616138630033269.stgit@fed.void> In-Reply-To: <165668246947.1967719.4502709837520339365.stgit@fed.void> References: <165668246947.1967719.4502709837520339365.stgit@fed.void> User-Agent: StGit/1.1 MIME-Version: 1.0 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=pvalerio@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Cc: i.maximets@ovn.org Subject: [ovs-dev] [PATCH v4 4/6] conntrack: Split single cmap to multiple buckets. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" The purpose of this commit is to split the current way of storing the conn nodes. Before this patch the nodes were stored into a single cmap using ct->lock to avoid concurrent write access. The idea is to resurrect the old bucket approach already implemented before the introduction of the cmap, but instead of using hmaps, cmaps are used instead. With this commit a single connection can be stored into one or two (at most) CONNTRACK_BUCKETS available based on the outcome of the function hash_to_bucket() on the key. Every bucket has its local lock that needs to be acquired every time a node has to be removed/inserted from/to the cmap. This means that, in case the hash of the CT_DIR_FWD key differs from the one of the CT_DIR_REV, we can end up having the reference of the two key nodes in different buckets, and consequently acquiring two locks (one per cmap). The sweeper task now no longer uses expiration lists, but instead scans the cmaps taking charge of the expired entries. Signed-off-by: Paolo Valerio --- - Changed reschedule time from 90 to 30 seconds. This would reduce the worst case max latency from 90 + time to process to 30 + ttp - Renamed the hash shifting function to a more meaningful hash_to_bucket() - Reduced the number of buckets to 512. 1024 was just set as an example and may be an overkill for the average case. - Dropped (at least for the time being) the zone limit clean up as that part needs more tests. Different approaches might be evaluated (e.g. demanding the zone cleanup to the sweeper) as this has the potential to significantly increase the time spent by the threads processing the packets. Reducing the next wake timeout of the sweeper reduces the need of cleaning the zone. - Dropped lookup_gc as during syn flood test did not prove to be not really effective. It may also introduce additional latencies during the processing of the packet so we may want to drop it at least for the time being as further tests may be needed. --- lib/conntrack-private.h | 34 ++- lib/conntrack-tp.c | 42 ---- lib/conntrack.c | 498 +++++++++++++++++++++++++++++++---------------- tests/system-traffic.at | 5 4 files changed, 357 insertions(+), 222 deletions(-) diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h index a9546ee9c..fd501d574 100644 --- a/lib/conntrack-private.h +++ b/lib/conntrack-private.h @@ -95,6 +95,7 @@ struct alg_exp_node { struct conn_key_node { enum key_dir dir; struct conn_key key; + uint32_t key_hash; struct cmap_node cm_node; }; @@ -102,7 +103,6 @@ struct conn { /* Immutable data. */ struct conn_key_node key_node[CT_DIR_MAX]; struct conn_key parent_key; /* Only used for orig_tuple support. */ - struct ovs_list exp_node; uint16_t nat_action; char *alg; @@ -121,7 +121,9 @@ struct conn { /* Mutable data. */ bool seq_skew_dir; /* TCP sequence skew direction due to NATTing of FTP * control messages; true if reply direction. */ - bool cleaned; /* True if cleaned from expiry lists. */ + atomic_flag cleaned; /* True if the entry was stale and one of the + * cleaner (i.e. packet path or sweeper) took + * charge of it. */ /* Immutable data. */ bool alg_related; /* True if alg data connection. */ @@ -192,10 +194,25 @@ enum ct_timeout { N_CT_TM }; -struct conntrack { - struct ovs_mutex ct_lock; /* Protects 2 following fields. */ +#define CONNTRACK_BUCKETS_SHIFT 9 +#define CONNTRACK_BUCKETS (1 << CONNTRACK_BUCKETS_SHIFT) + +struct ct_bucket { + /* Protects 'conns'. In case of natted conns, there's a high + * chance that the forward and the reverse key stand in different + * buckets. buckets_lock() should be the preferred way to acquire + * these locks (unless otherwise needed), as it deals with the + * acquisition order. */ + struct ovs_mutex lock; + /* Contains the connections in the bucket, indexed by + * 'struct conn_key'. */ struct cmap conns OVS_GUARDED; - struct ovs_list exp_lists[N_CT_TM] OVS_GUARDED; +}; + +struct conntrack { + struct ct_bucket buckets[CONNTRACK_BUCKETS]; + unsigned int next_bucket; + struct ovs_mutex ct_lock; struct cmap zone_limits OVS_GUARDED; struct cmap timeout_policies OVS_GUARDED; uint32_t hash_basis; /* Salt for hashing a connection key. */ @@ -220,9 +237,10 @@ struct conntrack { }; /* Lock acquisition order: - * 1. 'ct_lock' - * 2. 'conn->lock' - * 3. 'resources_lock' + * 1. 'buckets[p1]->lock' + * 2 'buckets[p2]->lock' (with p1 < p2) + * 3. 'conn->lock' + * 4. 'resources_lock' */ extern struct ct_l4_proto ct_proto_tcp; diff --git a/lib/conntrack-tp.c b/lib/conntrack-tp.c index 9ecb06978..117810528 100644 --- a/lib/conntrack-tp.c +++ b/lib/conntrack-tp.c @@ -236,27 +236,6 @@ tm_to_ct_dpif_tp(enum ct_timeout tm) return CT_DPIF_TP_ATTR_MAX; } -static void -conn_update_expiration__(struct conntrack *ct, struct conn *conn, - enum ct_timeout tm, long long now, - uint32_t tp_value) - OVS_REQUIRES(conn->lock) -{ - ovs_mutex_unlock(&conn->lock); - - ovs_mutex_lock(&ct->ct_lock); - ovs_mutex_lock(&conn->lock); - if (!conn->cleaned) { - conn->expiration = now + tp_value * 1000; - ovs_list_remove(&conn->exp_node); - ovs_list_push_back(&ct->exp_lists[tm], &conn->exp_node); - } - ovs_mutex_unlock(&conn->lock); - ovs_mutex_unlock(&ct->ct_lock); - - ovs_mutex_lock(&conn->lock); -} - /* The conn entry lock must be held on entry and exit. */ void conn_update_expiration(struct conntrack *ct, struct conn *conn, @@ -266,42 +245,25 @@ conn_update_expiration(struct conntrack *ct, struct conn *conn, struct timeout_policy *tp; uint32_t val; - ovs_mutex_unlock(&conn->lock); - - ovs_mutex_lock(&ct->ct_lock); - ovs_mutex_lock(&conn->lock); tp = timeout_policy_lookup(ct, conn->tp_id); if (tp) { val = tp->policy.attrs[tm_to_ct_dpif_tp(tm)]; } else { val = ct_dpif_netdev_tp_def[tm_to_ct_dpif_tp(tm)]; } - ovs_mutex_unlock(&conn->lock); - ovs_mutex_unlock(&ct->ct_lock); - ovs_mutex_lock(&conn->lock); VLOG_DBG_RL(&rl, "Update timeout %s zone=%u with policy id=%d " "val=%u sec.", ct_timeout_str[tm], conn->key_node[CT_DIR_FWD].key.zone, conn->tp_id, val); - conn_update_expiration__(ct, conn, tm, now, val); -} - -static void -conn_init_expiration__(struct conntrack *ct, struct conn *conn, - enum ct_timeout tm, long long now, - uint32_t tp_value) -{ - conn->expiration = now + tp_value * 1000; - ovs_list_push_back(&ct->exp_lists[tm], &conn->exp_node); + conn->expiration = now + val * 1000; } /* ct_lock must be held. */ void conn_init_expiration(struct conntrack *ct, struct conn *conn, enum ct_timeout tm, long long now) - OVS_REQUIRES(ct->ct_lock) { struct timeout_policy *tp; uint32_t val; @@ -317,5 +279,5 @@ conn_init_expiration(struct conntrack *ct, struct conn *conn, ct_timeout_str[tm], conn->key_node[CT_DIR_FWD].key.zone, conn->tp_id, val); - conn_init_expiration__(ct, conn, tm, now, val); + conn->expiration = now + val * 1000; } diff --git a/lib/conntrack.c b/lib/conntrack.c index 371a34b89..fb784d066 100644 --- a/lib/conntrack.c +++ b/lib/conntrack.c @@ -44,7 +44,6 @@ VLOG_DEFINE_THIS_MODULE(conntrack); COVERAGE_DEFINE(conntrack_full); -COVERAGE_DEFINE(conntrack_long_cleanup); COVERAGE_DEFINE(conntrack_l3csum_err); COVERAGE_DEFINE(conntrack_l4csum_err); COVERAGE_DEFINE(conntrack_lookup_natted_miss); @@ -85,6 +84,8 @@ struct zone_limit { struct conntrack_zone_limit czl; }; +static unsigned hash_to_bucket(uint32_t hash); +static void conn_clean(struct conntrack *ct, struct conn *conn); static bool conn_key_extract(struct conntrack *, struct dp_packet *, ovs_be16 dl_type, struct conn_lookup_ctx *, uint16_t zone); @@ -109,8 +110,9 @@ static void set_label(struct dp_packet *, struct conn *, static void *clean_thread_main(void *f_); static bool -nat_get_unique_tuple(struct conntrack *ct, struct conn *conn, - const struct nat_action_info_t *nat_info); +nat_get_unique_tuple_lock(struct conntrack *ct, struct conn *conn, + const struct nat_action_info_t *nat_info, + uint32_t *rev_hash); static uint8_t reverse_icmp_type(uint8_t type); @@ -249,16 +251,17 @@ conntrack_init(void) ovs_rwlock_unlock(&ct->resources_lock); ovs_mutex_init_adaptive(&ct->ct_lock); - ovs_mutex_lock(&ct->ct_lock); - cmap_init(&ct->conns); - for (unsigned i = 0; i < ARRAY_SIZE(ct->exp_lists); i++) { - ovs_list_init(&ct->exp_lists[i]); + + ct->next_bucket = 0; + for (unsigned i = 0; i < CONNTRACK_BUCKETS; i++) { + struct ct_bucket *bucket = &ct->buckets[i]; + cmap_init(&bucket->conns); + ovs_mutex_init_adaptive(&bucket->lock); } + cmap_init(&ct->zone_limits); ct->zone_limit_seq = 0; timeout_policy_init(ct); - ovs_mutex_unlock(&ct->ct_lock); - atomic_count_init(&ct->n_conn, 0); atomic_init(&ct->n_conn_limit, DEFAULT_N_CONN_LIMIT); atomic_init(&ct->tcp_seq_chk, true); @@ -410,9 +413,9 @@ zone_limit_delete(struct conntrack *ct, uint16_t zone) } static void -conn_clean(struct conntrack *ct, struct conn *conn) - OVS_REQUIRES(ct->ct_lock) +conn_clean__(struct conntrack *ct, struct conn *conn) { + struct ct_bucket *bucket; struct zone_limit *zl; uint32_t hash; @@ -420,8 +423,9 @@ conn_clean(struct conntrack *ct, struct conn *conn) expectation_clean(ct, &conn->key_node[CT_DIR_FWD].key); } - hash = conn_key_hash(&conn->key_node[CT_DIR_FWD].key, ct->hash_basis); - cmap_remove(&ct->conns, &conn->key_node[CT_DIR_FWD].cm_node, hash); + hash = conn->key_node[CT_DIR_FWD].key_hash; + bucket = &ct->buckets[hash_to_bucket(hash)]; + cmap_remove(&bucket->conns, &conn->key_node[CT_DIR_FWD].cm_node, hash); zl = zone_limit_lookup(ct, conn->admit_zone); if (zl && zl->czl.zone_limit_seq == conn->zone_limit_seq) { @@ -429,12 +433,10 @@ conn_clean(struct conntrack *ct, struct conn *conn) } if (conn->nat_action) { - hash = conn_key_hash(&conn->key_node[CT_DIR_REV].key, - ct->hash_basis); - cmap_remove(&ct->conns, &conn->key_node[CT_DIR_REV].cm_node, hash); + hash = conn->key_node[CT_DIR_REV].key_hash; + bucket = &ct->buckets[hash_to_bucket(hash)]; + cmap_remove(&bucket->conns, &conn->key_node[CT_DIR_REV].cm_node, hash); } - ovs_list_remove(&conn->exp_node); - conn->cleaned = true; ovsrcu_postpone(delete_conn, conn); atomic_count_dec(&ct->n_conn); } @@ -446,22 +448,32 @@ void conntrack_destroy(struct conntrack *ct) { struct conn_key_node *keyn; + struct ct_bucket *bucket; struct conn *conn; + int i; latch_set(&ct->clean_thread_exit); pthread_join(ct->clean_thread, NULL); latch_destroy(&ct->clean_thread_exit); - ovs_mutex_lock(&ct->ct_lock); - CMAP_FOR_EACH (keyn, cm_node, &ct->conns) { - if (keyn->dir != CT_DIR_FWD) { - continue; + for (i = 0; i < CONNTRACK_BUCKETS; i++) { + bucket = &ct->buckets[i]; + CMAP_FOR_EACH (keyn, cm_node, &bucket->conns) { + if (keyn->dir != CT_DIR_FWD) { + continue; + } + conn = CONTAINER_OF(keyn, struct conn, key_node[keyn->dir]); + conn_clean(ct, conn); } - conn = CONTAINER_OF(keyn, struct conn, key_node[keyn->dir]); - conn_clean(ct, conn); } - cmap_destroy(&ct->conns); + for (i = 0; i < CONNTRACK_BUCKETS; i++) { + bucket = &ct->buckets[i]; + ovs_mutex_destroy(&bucket->lock); + cmap_destroy(&ct->buckets[i].conns); + } + + ovs_mutex_lock(&ct->ct_lock); struct zone_limit *zl; CMAP_FOR_EACH (zl, node, &ct->zone_limits) { uint32_t hash = zone_key_hash(zl->czl.zone, ct->hash_basis); @@ -498,45 +510,132 @@ conntrack_destroy(struct conntrack *ct) } +static unsigned hash_to_bucket(uint32_t hash) +{ + return hash >> (32 - CONNTRACK_BUCKETS_SHIFT); +} + static bool -conn_key_lookup(struct conntrack *ct, const struct conn_key *key, - uint32_t hash, long long now, struct conn **conn_out, - bool *reply) +conn_key_lookup__(struct conntrack *ct, unsigned bucket, + const struct conn_key *key, uint32_t hash, + long long now, struct conn **conn_out, + bool *reply) { + struct ct_bucket *ctb = &ct->buckets[bucket]; struct conn_key_node *keyn; - struct conn *conn = NULL; bool found = false; + struct conn *conn; - CMAP_FOR_EACH_WITH_HASH (keyn, cm_node, hash, &ct->conns) { + CMAP_FOR_EACH_WITH_HASH (keyn, cm_node, hash, &ctb->conns) { conn = CONTAINER_OF(keyn, struct conn, key_node[keyn->dir]); + if (conn_expired(conn, now)) { + continue; + } + for (int i = CT_DIR_FWD; i < CT_DIR_MAX; i++) { - if (!conn_key_cmp(&conn->key_node[i].key, key) && - !conn_expired(conn, now)) { + if (!conn_key_cmp(&conn->key_node[i].key, key)) { found = true; if (reply) { *reply = i; } - goto out_found; + + goto conn_out_found; } } } -out_found: - if (found && conn_out) { - *conn_out = conn; - } else if (conn_out) { - *conn_out = NULL; +conn_out_found: + if (conn_out) { + *conn_out = found ? conn : NULL; } return found; } +static bool +conn_key_lookup(struct conntrack *ct, unsigned bucket, + const struct conn_key *key, uint32_t hash, + long long now, struct conn **conn_out, + bool *reply) +{ + return conn_key_lookup__(ct, bucket, key, hash, + now, conn_out, reply); +} + +static void +ct_buckets_unlock__(struct conntrack *ct, uint32_t p1, uint32_t p2) + OVS_NO_THREAD_SAFETY_ANALYSIS +{ + if (p1 < p2) { + ovs_mutex_unlock(&ct->buckets[p2].lock); + ovs_mutex_unlock(&ct->buckets[p1].lock); + } else if (p1 > p2) { + ovs_mutex_unlock(&ct->buckets[p1].lock); + ovs_mutex_unlock(&ct->buckets[p2].lock); + } else { + ovs_mutex_unlock(&ct->buckets[p1].lock); + } +} + +static void +ct_buckets_unlock(struct conntrack *ct, uint32_t h1, uint32_t h2) +{ + unsigned p1 = hash_to_bucket(h1), + p2 = hash_to_bucket(h2); + + ct_buckets_unlock__(ct, p1, p2); +} + +/* Acquires both locks in an ordered way. */ +static void +ct_buckets_lock__(struct conntrack *ct, unsigned p1, unsigned p2) + OVS_NO_THREAD_SAFETY_ANALYSIS +{ + if (p1 < p2) { + ovs_mutex_lock(&ct->buckets[p1].lock); + ovs_mutex_lock(&ct->buckets[p2].lock); + } else if (p1 > p2) { + ovs_mutex_lock(&ct->buckets[p2].lock); + ovs_mutex_lock(&ct->buckets[p1].lock); + } else { + ovs_mutex_lock(&ct->buckets[p1].lock); + } +} + +static void +ct_buckets_lock(struct conntrack *ct, uint32_t h1, uint32_t h2) +{ + unsigned p1 = hash_to_bucket(h1), + p2 = hash_to_bucket(h2); + + ct_buckets_lock__(ct, p1, p2); +} + +static void +conn_clean(struct conntrack *ct, struct conn *conn) +{ + uint32_t h1, h2; + + if (atomic_flag_test_and_set(&conn->cleaned)) { + return; + } + + h1 = conn->key_node[CT_DIR_FWD].key_hash; + h2 = conn->key_node[CT_DIR_REV].key_hash; + ct_buckets_lock(ct, h1, h2); + conn_clean__(ct, conn); + ct_buckets_unlock(ct, h1, h2); +} + static bool conn_lookup(struct conntrack *ct, const struct conn_key *key, long long now, struct conn **conn_out, bool *reply) { uint32_t hash = conn_key_hash(key, ct->hash_basis); - return conn_key_lookup(ct, key, hash, now, conn_out, reply); + unsigned bucket = hash_to_bucket(hash); + + return conn_key_lookup__(ct, bucket, key, hash, + now, conn_out, reply); } static void @@ -944,7 +1043,6 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt, const struct nat_action_info_t *nat_action_info, const char *helper, const struct alg_exp_node *alg_exp, enum ct_alg_ctl_type ct_alg_ctl, uint32_t tp_id) - OVS_REQUIRES(ct->ct_lock) { struct conn *nc = NULL; uint32_t rev_hash = ctx->hash; @@ -961,10 +1059,11 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt, } if (commit) { - struct conn_key_node *fwd_key_node, *rev_key_node; - struct zone_limit *zl = zone_limit_lookup_or_default(ct, ctx->key.zone); + struct conn_key_node *fwd_key_node, *rev_key_node; + bool handle_tuple = false; + if (zl && atomic_count_get(&zl->czl.count) >= zl->czl.limit) { return nc; } @@ -1007,22 +1106,40 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt, nc->nat_action = NAT_ACTION_DST; } } else { - bool nat_res = nat_get_unique_tuple(ct, nc, nat_action_info); - - if (!nat_res) { - goto nat_res_exhaustion; - } + handle_tuple = true; } - - nat_packet(pkt, nc, ctx->icmp_related); - rev_hash = conn_key_hash(&rev_key_node->key, ct->hash_basis); - rev_key_node->dir = CT_DIR_REV; - cmap_insert(&ct->conns, &rev_key_node->cm_node, rev_hash); } ovs_mutex_init_adaptive(&nc->lock); + atomic_flag_clear(&nc->cleaned); fwd_key_node->dir = CT_DIR_FWD; - cmap_insert(&ct->conns, &fwd_key_node->cm_node, ctx->hash); + rev_key_node->dir = CT_DIR_REV; + + if (handle_tuple) { + bool nat_res = nat_get_unique_tuple_lock(ct, nc, nat_action_info, + &rev_hash); + + if (!nat_res) { + goto out_error; + } + } else { + rev_hash = conn_key_hash(&rev_key_node->key, ct->hash_basis); + ct_buckets_lock(ct, ctx->hash, rev_hash); + } + + if (conn_lookup(ct, &ctx->key, now, NULL, NULL)) { + goto out_error_unlock; + } + + fwd_key_node->key_hash = ctx->hash; + cmap_insert(&ct->buckets[hash_to_bucket(ctx->hash)].conns, + &fwd_key_node->cm_node, ctx->hash); + if (nat_action_info) { + rev_key_node->key_hash = rev_hash; + cmap_insert(&ct->buckets[hash_to_bucket(rev_hash)].conns, + &rev_key_node->cm_node, rev_hash); + nat_packet(pkt, nc, ctx->icmp_related); + } atomic_count_inc(&ct->n_conn); ctx->conn = nc; /* For completeness. */ @@ -1033,21 +1150,23 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt, } else { nc->admit_zone = INVALID_ZONE; } + ct_buckets_unlock(ct, ctx->hash, rev_hash); } return nc; +out_error_unlock: + ct_buckets_unlock(ct, ctx->hash, rev_hash); /* This would be a user error or a DOS attack. A user error is prevented * by allocating enough combinations of NAT addresses when combined with * ephemeral ports. A DOS attack should be protected against with * firewall rules or a separate firewall. Also using zone partitioning * can limit DoS impact. */ -nat_res_exhaustion: - ovs_list_remove(&nc->exp_node); +out_error: + ovs_mutex_destroy(&nc->lock); delete_conn_cmn(nc); static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 5); - VLOG_WARN_RL(&rl, "Unable to NAT due to tuple space exhaustion - " - "if DoS attack, use firewalling and/or zone partitioning."); + VLOG_WARN_RL(&rl, "Unable to insert a new connection."); return NULL; } @@ -1082,12 +1201,7 @@ conn_update_state(struct conntrack *ct, struct dp_packet *pkt, pkt->md.ct_state = CS_INVALID; break; case CT_UPDATE_NEW: - ovs_mutex_lock(&ct->ct_lock); - if (conn_lookup(ct, &conn->key_node[CT_DIR_FWD].key, - now, NULL, NULL)) { - conn_clean(ct, conn); - } - ovs_mutex_unlock(&ct->ct_lock); + conn_clean(ct, conn); create_new_conn = true; break; case CT_UPDATE_VALID_NEW: @@ -1253,6 +1367,8 @@ static void initial_conn_lookup(struct conntrack *ct, struct conn_lookup_ctx *ctx, long long now, bool natted) { + unsigned bucket = hash_to_bucket(ctx->hash); + if (natted) { /* If the packet has been already natted (e.g. a previous * action took place), retrieve it performing a lookup of its @@ -1260,7 +1376,8 @@ initial_conn_lookup(struct conntrack *ct, struct conn_lookup_ctx *ctx, conn_key_reverse(&ctx->key); } - conn_key_lookup(ct, &ctx->key, ctx->hash, now, &ctx->conn, &ctx->reply); + conn_key_lookup(ct, bucket, &ctx->key, ctx->hash, + now, &ctx->conn, &ctx->reply); if (natted) { if (OVS_LIKELY(ctx->conn)) { @@ -1287,24 +1404,20 @@ process_one(struct conntrack *ct, struct dp_packet *pkt, ovs_be16 tp_src, ovs_be16 tp_dst, const char *helper, uint32_t tp_id) { + bool create_new_conn = false; + /* Reset ct_state whenever entering a new zone. */ if (pkt->md.ct_state && pkt->md.ct_zone != zone) { pkt->md.ct_state = 0; } - bool create_new_conn = false; initial_conn_lookup(ct, ctx, now, !!(pkt->md.ct_state & (CS_SRC_NAT | CS_DST_NAT))); struct conn *conn = ctx->conn; /* Delete found entry if in wrong direction. 'force' implies commit. */ if (OVS_UNLIKELY(force && ctx->reply && conn)) { - ovs_mutex_lock(&ct->ct_lock); - if (conn_lookup(ct, &conn->key_node[CT_DIR_FWD].key, - now, NULL, NULL)) { - conn_clean(ct, conn); - } - ovs_mutex_unlock(&ct->ct_lock); + conn_clean(ct, conn); conn = NULL; } @@ -1338,7 +1451,6 @@ process_one(struct conntrack *ct, struct dp_packet *pkt, struct alg_exp_node alg_exp_entry; if (OVS_UNLIKELY(create_new_conn)) { - ovs_rwlock_rdlock(&ct->resources_lock); alg_exp = expectation_lookup(&ct->alg_expectations, &ctx->key, ct->hash_basis, @@ -1349,12 +1461,9 @@ process_one(struct conntrack *ct, struct dp_packet *pkt, } ovs_rwlock_unlock(&ct->resources_lock); - ovs_mutex_lock(&ct->ct_lock); - if (!conn_lookup(ct, &ctx->key, now, NULL, NULL)) { - conn = conn_not_found(ct, pkt, ctx, commit, now, nat_action_info, - helper, alg_exp, ct_alg_ctl, tp_id); - } - ovs_mutex_unlock(&ct->ct_lock); + conn = conn_not_found(ct, pkt, ctx, commit, now, + nat_action_info, helper, alg_exp, + ct_alg_ctl, tp_id); } write_ct_md(pkt, zone, conn, &ctx->key, alg_exp); @@ -1467,83 +1576,92 @@ set_label(struct dp_packet *pkt, struct conn *conn, } -/* Delete the expired connections from 'ctb', up to 'limit'. Returns the - * earliest expiration time among the remaining connections in 'ctb'. Returns - * LLONG_MAX if 'ctb' is empty. The return value might be smaller than 'now', - * if 'limit' is reached */ +/* Delete the expired connections from 'bucket', up to 'limit'. + * Returns the earliest expiration time among the remaining + * connections in 'bucket'. Returns LLONG_MAX if 'bucket' is empty. + * The return value might be smaller than 'now', if 'limit' is + * reached. */ static long long -ct_sweep(struct conntrack *ct, long long now, size_t limit) +sweep_bucket(struct conntrack *ct, struct ct_bucket *bucket, + long long now) { + struct conn_key_node *keyn; + unsigned int conn_count = 0; struct conn *conn; - long long min_expiration = LLONG_MAX; - size_t count = 0; + long long expiration; - ovs_mutex_lock(&ct->ct_lock); + CMAP_FOR_EACH (keyn, cm_node, &bucket->conns) { + if (keyn->dir != CT_DIR_FWD) { + continue; + } - for (unsigned i = 0; i < N_CT_TM; i++) { - LIST_FOR_EACH_SAFE (conn, exp_node, &ct->exp_lists[i]) { - ovs_mutex_lock(&conn->lock); - if (now < conn->expiration || count >= limit) { - min_expiration = MIN(min_expiration, conn->expiration); - ovs_mutex_unlock(&conn->lock); - if (count >= limit) { - /* Do not check other lists. */ - COVERAGE_INC(conntrack_long_cleanup); - goto out; - } - break; - } else { - ovs_mutex_unlock(&conn->lock); - conn_clean(ct, conn); - } - count++; + conn = CONTAINER_OF(keyn, struct conn, key_node[keyn->dir]); + ovs_mutex_lock(&conn->lock); + expiration = conn->expiration; + ovs_mutex_unlock(&conn->lock); + + if (now >= expiration) { + conn_clean(ct, conn); } + + conn_count++; } -out: - VLOG_DBG("conntrack cleanup %"PRIuSIZE" entries in %lld msec", count, - time_msec() - now); - ovs_mutex_unlock(&ct->ct_lock); - return min_expiration; + return conn_count; } -/* Cleans up old connection entries from 'ct'. Returns the time when the - * next expiration might happen. The return value might be smaller than - * 'now', meaning that an internal limit has been reached, and some expired - * connections have not been deleted. */ +/* Cleans up old connection entries from 'ct'. Returns the the next + * wake up time. The return value might be smaller than 'now', meaning + * that an internal limit has been reached, that is, the table + * hasn't been entirely scanned. */ static long long conntrack_clean(struct conntrack *ct, long long now) { - unsigned int n_conn_limit; + long long next_wakeup = now + 30 * 1000; + unsigned int n_conn_limit, i, count = 0; + size_t clean_end; + atomic_read_relaxed(&ct->n_conn_limit, &n_conn_limit); - size_t clean_max = n_conn_limit > 10 ? n_conn_limit / 10 : 1; - long long min_exp = ct_sweep(ct, now, clean_max); - long long next_wakeup = MIN(min_exp, now + CT_DPIF_NETDEV_TP_MIN); + clean_end = n_conn_limit / 64; + + for (i = ct->next_bucket; i < CONNTRACK_BUCKETS; i++) { + struct ct_bucket *bucket = &ct->buckets[i]; + + count += sweep_bucket(ct, bucket, now); + + if (count > clean_end) { + next_wakeup = 0; + break; + } + } + + ct->next_bucket = (i < CONNTRACK_BUCKETS) ? i : 0; return next_wakeup; } /* Cleanup: * - * We must call conntrack_clean() periodically. conntrack_clean() return - * value gives an hint on when the next cleanup must be done (either because - * there is an actual connection that expires, or because a new connection - * might be created with the minimum timeout). + * We must call conntrack_clean() periodically. conntrack_clean() + * return value gives an hint on when the next cleanup must be done + * (either because there is still work to do, or because a new + * connection might be created). * * The logic below has two goals: * - * - We want to reduce the number of wakeups and batch connection cleanup - * when the load is not very high. CT_CLEAN_INTERVAL ensures that if we - * are coping with the current cleanup tasks, then we wait at least - * 5 seconds to do further cleanup. + * - When the load is high, we want to avoid to hog the CPU scanning + * all the buckets and their respective CMAPs "at once". For this + * reason, every batch cleanup aims to scan at most n_conn_limit / + * 64 entries (more if the buckets contains many entrie) before + * yielding the CPU. In this case, the next wake up will happen in + * CT_CLEAN_MIN_INTERVAL_MS and the scan will resume starting from + * the first bucket not scanned. * - * - We don't want to keep the map locked too long, as we might prevent - * traffic from flowing. CT_CLEAN_MIN_INTERVAL ensures that if cleanup is - * behind, there is at least some 200ms blocks of time when the map will be - * left alone, so the datapath can operate unhindered. - */ -#define CT_CLEAN_INTERVAL 5000 /* 5 seconds */ -#define CT_CLEAN_MIN_INTERVAL 200 /* 0.2 seconds */ + * - We also don't want to scan the buckets so frequently, as going + * through all the connections, during high loads, may be costly in + * terms of CPU time. In this case the next wake up is set to 90 + * seconds. */ +#define CT_CLEAN_MIN_INTERVAL_MS 100 /* 0.1 seconds */ static void * clean_thread_main(void *f_) @@ -1556,9 +1674,9 @@ clean_thread_main(void *f_) next_wake = conntrack_clean(ct, now); if (next_wake < now) { - poll_timer_wait_until(now + CT_CLEAN_MIN_INTERVAL); + poll_timer_wait_until(now + CT_CLEAN_MIN_INTERVAL_MS); } else { - poll_timer_wait_until(MAX(next_wake, now + CT_CLEAN_INTERVAL)); + poll_timer_wait_until(next_wake); } latch_wait(&ct->clean_thread_exit); poll_block(); @@ -2301,9 +2419,10 @@ store_addr_to_key(union ct_addr *addr, struct conn_key *key, } static bool -nat_get_unique_l4(struct conntrack *ct, struct conn_key *rev_key, - ovs_be16 *port, uint16_t curr, uint16_t min, - uint16_t max) +nat_get_unique_l4(struct conntrack *ct, struct conn_key *fwd_key, + struct conn_key *rev_key, ovs_be16 *port, + uint16_t curr, uint16_t min, + uint16_t max, uint32_t *rev_hash) { static const unsigned int max_attempts = 128; uint16_t range = max - min + 1; @@ -2324,10 +2443,15 @@ another_round: } *port = htons(curr); - if (!conn_lookup(ct, rev_key, time_msec(), - NULL, NULL)) { + uint32_t key_hash = conn_key_hash(fwd_key, ct->hash_basis); + *rev_hash = conn_key_hash(rev_key, ct->hash_basis); + + ct_buckets_lock(ct, key_hash, *rev_hash); + if (!conn_lookup(ct, rev_key, + time_msec(), NULL, NULL)) { return true; } + ct_buckets_unlock(ct, key_hash, *rev_hash); } if (attempts < range && attempts >= 16) { @@ -2361,10 +2485,13 @@ another_round: * tries to find a source port in the ephemeral * range (after testing the port used by the sender). * - * If none can be found, return exhaustion to the caller. */ + * If none can be found, return exhaustion to the caller. + * If successful, this function returns with the locks + * acquired. */ static bool -nat_get_unique_tuple(struct conntrack *ct, struct conn *conn, - const struct nat_action_info_t *nat_info) +nat_get_unique_tuple_lock(struct conntrack *ct, struct conn *conn, + const struct nat_action_info_t *nat_info, + uint32_t *rev_hash) { union ct_addr min_addr = {0}, max_addr = {0}, addr = {0}; struct conn_key *fwd_key = &conn->key_node[CT_DIR_FWD].key, @@ -2397,23 +2524,30 @@ nat_get_unique_tuple(struct conntrack *ct, struct conn *conn, nat_info->nat_action); if (!pat_proto) { + uint32_t key_hash = conn_key_hash(fwd_key, ct->hash_basis); + *rev_hash = conn_key_hash(rev_key, ct->hash_basis); + + ct_buckets_lock(ct, key_hash, *rev_hash); if (!conn_lookup(ct, rev_key, time_msec(), NULL, NULL)) { return true; } + ct_buckets_unlock(ct, key_hash, *rev_hash); return false; } bool found = false; if (nat_info->nat_action & NAT_ACTION_DST_PORT) { - found = nat_get_unique_l4(ct, rev_key, &rev_key->src.port, - curr_dport, min_dport, max_dport); + found = nat_get_unique_l4(ct, fwd_key, rev_key, &rev_key->src.port, + curr_dport, min_dport, max_dport, + rev_hash); } if (!found) { - found = nat_get_unique_l4(ct, rev_key, &rev_key->dst.port, - curr_sport, min_sport, max_sport); + found = nat_get_unique_l4(ct, fwd_key, rev_key, &rev_key->dst.port, + curr_sport, min_sport, max_sport, + rev_hash); } if (found) { @@ -2615,20 +2749,38 @@ conntrack_dump_next(struct conntrack_dump *dump, struct ct_dpif_entry *entry) { struct conntrack *ct = dump->ct; long long now = time_msec(); + struct ct_bucket *bucket; - for (;;) { - struct cmap_node *cm_node = cmap_next_position(&ct->conns, - &dump->cm_pos); - if (!cm_node) { - break; + while (dump->bucket < CONNTRACK_BUCKETS) { + struct cmap_node *cm_node; + bucket = &ct->buckets[dump->bucket]; + + for (;;) { + cm_node = cmap_next_position(&bucket->conns, + &dump->cm_pos); + if (!cm_node) { + break; + } + struct conn_key_node *keyn; + struct conn *conn; + INIT_CONTAINER(keyn, cm_node, cm_node); + conn = CONTAINER_OF(keyn, struct conn, key_node[keyn->dir]); + + if (conn_expired(conn, now)) { + continue; + } + + if ((!dump->filter_zone || keyn->key.zone == dump->zone) && + (keyn->dir == CT_DIR_FWD)) { + conn_to_ct_dpif_entry(conn, entry, now); + break; + } } - struct conn_key_node *keyn; - struct conn *conn; - INIT_CONTAINER(keyn, cm_node, cm_node); - conn = CONTAINER_OF(keyn, struct conn, key_node[keyn->dir]); - if ((!dump->filter_zone || keyn->key.zone == dump->zone) && - (keyn->dir == CT_DIR_FWD)) { - conn_to_ct_dpif_entry(conn, entry, now); + + if (!cm_node) { + memset(&dump->cm_pos, 0, sizeof dump->cm_pos); + dump->bucket++; + } else { return 0; } } @@ -2648,17 +2800,18 @@ conntrack_flush(struct conntrack *ct, const uint16_t *zone) struct conn_key_node *keyn; struct conn *conn; - ovs_mutex_lock(&ct->ct_lock); - CMAP_FOR_EACH (keyn, cm_node, &ct->conns) { - if (keyn->dir != CT_DIR_FWD) { - continue; - } - conn = CONTAINER_OF(keyn, struct conn, key_node[keyn->dir]); - if (!zone || *zone == keyn->key.zone) { - conn_clean(ct, conn); + for (unsigned i = 0; i < CONNTRACK_BUCKETS; i++) { + CMAP_FOR_EACH (keyn, cm_node, &ct->buckets[i].conns) { + if (keyn->dir != CT_DIR_FWD) { + continue; + } + + conn = CONTAINER_OF(keyn, struct conn, key_node[keyn->dir]); + if (!zone || *zone == keyn->key.zone) { + conn_clean(ct, conn); + } } } - ovs_mutex_unlock(&ct->ct_lock); return 0; } @@ -2667,15 +2820,19 @@ int conntrack_flush_tuple(struct conntrack *ct, const struct ct_dpif_tuple *tuple, uint16_t zone) { - int error = 0; struct conn_key key; - struct conn *conn; + struct conn *conn = NULL; + unsigned bucket; + uint32_t hash; + int error = 0; memset(&key, 0, sizeof(key)); tuple_to_conn_key(tuple, zone, &key); - ovs_mutex_lock(&ct->ct_lock); - conn_lookup(ct, &key, time_msec(), &conn, NULL); + hash = conn_key_hash(&key, ct->hash_basis); + bucket = hash_to_bucket(hash); + + conn_key_lookup(ct, bucket, &key, hash, time_msec(), &conn, NULL); if (conn) { conn_clean(ct, conn); } else { @@ -2683,7 +2840,6 @@ conntrack_flush_tuple(struct conntrack *ct, const struct ct_dpif_tuple *tuple, error = ENOENT; } - ovs_mutex_unlock(&ct->ct_lock); return error; } diff --git a/tests/system-traffic.at b/tests/system-traffic.at index 36e10aa4a..90794571a 100644 --- a/tests/system-traffic.at +++ b/tests/system-traffic.at @@ -5131,9 +5131,8 @@ AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2) | sed -e 's/dst= tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=,dport=),reply=(src=10.1.1.2,dst=10.1.1.2XX,sport=,dport=),zone=1,protoinfo=(state=) ]) -OVS_TRAFFIC_VSWITCHD_STOP(["dnl -/Unable to NAT due to tuple space exhaustion - if DoS attack, use firewalling and\/or zone partitioning./d -/Dropped .* log messages in last .* seconds \(most recently, .* seconds ago\) due to excessive rate/d"]) +OVS_TRAFFIC_VSWITCHD_STOP(["/dnl +Unable to insert a new connection./d"]) AT_CLEANUP AT_SETUP([conntrack - more complex SNAT]) From patchwork Fri Jul 1 13:35:16 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Valerio X-Patchwork-Id: 1651206 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=DNoM0w7n; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::137; helo=smtp4.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from smtp4.osuosl.org (smtp4.osuosl.org [IPv6:2605:bc80:3010::137]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4LZGRn18hsz9s09 for ; Fri, 1 Jul 2022 23:35:49 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by smtp4.osuosl.org (Postfix) with ESMTP id E2F96424BC; Fri, 1 Jul 2022 13:35:46 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 smtp4.osuosl.org E2F96424BC Authentication-Results: smtp4.osuosl.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=DNoM0w7n X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp4.osuosl.org ([127.0.0.1]) by localhost (smtp4.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id YsTr-iSH7L45; Fri, 1 Jul 2022 13:35:45 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by smtp4.osuosl.org (Postfix) with ESMTPS id 5226A4245A; Fri, 1 Jul 2022 13:35:44 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 smtp4.osuosl.org 5226A4245A Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 1F129C007B; Fri, 1 Jul 2022 13:35:44 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp4.osuosl.org (smtp4.osuosl.org [IPv6:2605:bc80:3010::137]) by lists.linuxfoundation.org (Postfix) with ESMTP id 2822FC0039 for ; Fri, 1 Jul 2022 13:35:43 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp4.osuosl.org (Postfix) with ESMTP id 97527418CC for ; Fri, 1 Jul 2022 13:35:23 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 smtp4.osuosl.org 97527418CC X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp4.osuosl.org ([127.0.0.1]) by localhost (smtp4.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 7-HJ3hMh1JHZ for ; Fri, 1 Jul 2022 13:35:22 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 DKIM-Filter: OpenDKIM Filter v2.11.0 smtp4.osuosl.org 652964245B Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by smtp4.osuosl.org (Postfix) with ESMTPS id 652964245B for ; Fri, 1 Jul 2022 13:35:22 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1656682521; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=0sDnHve8uaKNQlp3j1PZFyi6rcod+zG6Uh5mugXY1jY=; b=DNoM0w7n53YoYBhMmsOLUNZi0XcRJAhQ3XfezBGM54Y2dGgI9JTQMFFzpuAKxN8eyZr+MX YseYH5PP1D6NVkKg46DgAs3FVKrUoUBUyTSs1v+m4sZZvNE4ScXkYWLWrzucA12UAwthAU +5HCLO5b8fQvdvyETjMLb/tSGA5MXXo= Received: from mail-wm1-f69.google.com (mail-wm1-f69.google.com [209.85.128.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-641-fFQdN7ycN5WIRtZBhBUBzw-1; Fri, 01 Jul 2022 09:35:19 -0400 X-MC-Unique: fFQdN7ycN5WIRtZBhBUBzw-1 Received: by mail-wm1-f69.google.com with SMTP id v184-20020a1cacc1000000b0039c7efa3e95so1038289wme.3 for ; Fri, 01 Jul 2022 06:35:18 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:subject:from:to:cc:date:message-id:in-reply-to :references:user-agent:mime-version:content-transfer-encoding; bh=0sDnHve8uaKNQlp3j1PZFyi6rcod+zG6Uh5mugXY1jY=; b=r/4fwaKY24YPaGUxJqHAzSFv+x0j9qwzzIEgwts9XbSvn0gw5sMR+y7rB5ZBwiPph8 S9RGbftvzCkgAff+eAaLhS6ebepgdkZWLJXFiyXZ8GB6qmqPu/hgOOZX/PN1oqYqEGtA k0l4+LweWyVjjlMMG6B1lZe3YSRBM+0n+74pNC21QcqPTgoW4slMc/W0sgFsmNMUx1aE x+BmHHoGAXtRGDbWydmIupUh+3QiBRJwiPqAkm9RtCF2IeGW5BCxqu3mhdcUErPGCj3u 7fwwMACvIFITo+6ZBjAuF8loi0RIDJDinmLQ068smpNSW+i4om5Y6W9dTTxBPSx+9zTE iRwg== X-Gm-Message-State: AJIora8+VdvIPvFGnF9zcmsGc7z+beqt39fi0S2R8xbWukwU9DF7i9QN rsUagCfu7e778yJhM+BL48CCfFG0J5gD8OGgzYlaQ5RMro7hMJjEIwnvVHFfrrp58bk0DYkHkG3 YM2bUjv7XrvVHRNIahwwLF3CIGI1VZfpHTaFpi6hEPyRIAQQIr3u4mugOMpwcnrpg X-Received: by 2002:a5d:688e:0:b0:21b:9d51:25d2 with SMTP id h14-20020a5d688e000000b0021b9d5125d2mr14145590wru.286.1656682517531; Fri, 01 Jul 2022 06:35:17 -0700 (PDT) X-Google-Smtp-Source: AGRyM1vQyDC0mYh5HfcdhkhJSAOn7cX3izt5QcOePgk/ZI29/KYlyrgchNqntlWJhWPG4O6Wozxv6g== X-Received: by 2002:a5d:688e:0:b0:21b:9d51:25d2 with SMTP id h14-20020a5d688e000000b0021b9d5125d2mr14145552wru.286.1656682517212; Fri, 01 Jul 2022 06:35:17 -0700 (PDT) Received: from localhost (net-5-95-131-226.cust.vodafonedsl.it. [5.95.131.226]) by smtp.gmail.com with ESMTPSA id l1-20020a5d4bc1000000b00219e77e489fsm21865842wrt.17.2022.07.01.06.35.16 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 01 Jul 2022 06:35:16 -0700 (PDT) From: Paolo Valerio To: dev@openvswitch.org Date: Fri, 01 Jul 2022 15:35:16 +0200 Message-ID: <165668251610.1967719.11429037651192747503.stgit@fed.void> In-Reply-To: <165668246947.1967719.4502709837520339365.stgit@fed.void> References: <165668246947.1967719.4502709837520339365.stgit@fed.void> User-Agent: StGit/1.1 MIME-Version: 1.0 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=pvalerio@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Cc: i.maximets@ovn.org Subject: [ovs-dev] [PATCH v4 5/6] conntrack: Use an atomic conn expiration value X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Gaetan Rivet A lock is taken during conn_lookup() to check whether a connection is expired before returning it. This lock can have some contention. Even though this lock ensures a consistent sequence of writes, it does not imply a specific order. A ct_clean thread taking the lock first could read a value that would be updated immediately after by a PMD waiting on the same lock, just as well as the inverse order. As such, the expiration time can be stale anytime it is read. In this context, using an atomic will ensure the same guarantees for either writes or reads, i.e. writes are consistent and reads are not undefined behaviour. Reading an atomic is however less costly than taking and releasing a lock. Signed-off-by: Gaetan Rivet Reviewed-by: Eli Britstein Acked-by: William Tu Signed-off-by: Paolo Valerio --- lib/conntrack-private.h | 2 +- lib/conntrack-tp.c | 2 +- lib/conntrack.c | 27 +++++++++++++++------------ 3 files changed, 17 insertions(+), 14 deletions(-) diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h index fd501d574..d71d4d778 100644 --- a/lib/conntrack-private.h +++ b/lib/conntrack-private.h @@ -110,7 +110,7 @@ struct conn { /* Mutable data. */ struct ovs_mutex lock; /* Guards all mutable fields. */ ovs_u128 label; - long long expiration; + atomic_llong expiration; uint32_t mark; int seq_skew; diff --git a/lib/conntrack-tp.c b/lib/conntrack-tp.c index 117810528..cdb3639de 100644 --- a/lib/conntrack-tp.c +++ b/lib/conntrack-tp.c @@ -257,7 +257,7 @@ conn_update_expiration(struct conntrack *ct, struct conn *conn, ct_timeout_str[tm], conn->key_node[CT_DIR_FWD].key.zone, conn->tp_id, val); - conn->expiration = now + val * 1000; + atomic_store_relaxed(&conn->expiration, now + val * 1000); } /* ct_lock must be held. */ diff --git a/lib/conntrack.c b/lib/conntrack.c index fb784d066..1ad29325c 100644 --- a/lib/conntrack.c +++ b/lib/conntrack.c @@ -101,6 +101,7 @@ static enum ct_update_res conn_update(struct conntrack *ct, struct conn *conn, struct dp_packet *pkt, struct conn_lookup_ctx *ctx, long long now); +static long long int conn_expiration(const struct conn *conn); static bool conn_expired(struct conn *, long long now); static void set_mark(struct dp_packet *, struct conn *, uint32_t val, uint32_t mask); @@ -531,7 +532,6 @@ conn_key_lookup__(struct conntrack *ct, unsigned bucket, if (conn_expired(conn, now)) { continue; } - for (int i = CT_DIR_FWD; i < CT_DIR_MAX; i++) { if (!conn_key_cmp(&conn->key_node[i].key, key)) { found = true; @@ -1004,13 +1004,10 @@ un_nat_packet(struct dp_packet *pkt, const struct conn *conn, static void conn_seq_skew_set(struct conntrack *ct, const struct conn *conn_in, long long now, int seq_skew, bool seq_skew_dir) - OVS_NO_THREAD_SAFETY_ANALYSIS { struct conn *conn; - ovs_mutex_unlock(&conn_in->lock); - conn_lookup(ct, &conn_in->key_node[CT_DIR_FWD].key, now, &conn, NULL); - ovs_mutex_lock(&conn_in->lock); + conn_lookup(ct, &conn_in->key_node[CT_DIR_FWD].key, now, &conn, NULL); if (conn && seq_skew) { conn->seq_skew = seq_skew; conn->seq_skew_dir = seq_skew_dir; @@ -1596,9 +1593,7 @@ sweep_bucket(struct conntrack *ct, struct ct_bucket *bucket, } conn = CONTAINER_OF(keyn, struct conn, key_node[keyn->dir]); - ovs_mutex_lock(&conn->lock); - expiration = conn->expiration; - ovs_mutex_unlock(&conn->lock); + expiration = conn_expiration(conn); if (now >= expiration) { conn_clean(ct, conn); @@ -2569,12 +2564,20 @@ conn_update(struct conntrack *ct, struct conn *conn, struct dp_packet *pkt, return update_res; } +static long long int +conn_expiration(const struct conn *conn) +{ + long long int expiration; + + atomic_read_relaxed(&CONST_CAST(struct conn *, conn)->expiration, + &expiration); + return expiration; +} + static bool conn_expired(struct conn *conn, long long now) { - ovs_mutex_lock(&conn->lock); - bool expired = now >= conn->expiration ? true : false; - ovs_mutex_unlock(&conn->lock); + bool expired = now >= conn_expiration(conn) ? true : false; return expired; } @@ -2706,7 +2709,7 @@ conn_to_ct_dpif_entry(const struct conn *conn, struct ct_dpif_entry *entry, entry->mark = conn->mark; memcpy(&entry->labels, &conn->label, sizeof entry->labels); - long long expiration = conn->expiration - now; + long long expiration = conn_expiration(conn) - now; struct ct_l4_proto *class = l4_protos[key->nw_proto]; if (class->conn_get_protoinfo) { From patchwork Fri Jul 1 13:35:22 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Valerio X-Patchwork-Id: 1651207 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=OOAMpPce; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.137; helo=smtp4.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from smtp4.osuosl.org (smtp4.osuosl.org [140.211.166.137]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4LZGRr6BqMz9s09 for ; Fri, 1 Jul 2022 23:35:52 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by smtp4.osuosl.org (Postfix) with ESMTP id D3825424D5; Fri, 1 Jul 2022 13:35:50 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 smtp4.osuosl.org D3825424D5 Authentication-Results: smtp4.osuosl.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=OOAMpPce X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp4.osuosl.org ([127.0.0.1]) by localhost (smtp4.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id UR0_QLnk2VRS; Fri, 1 Jul 2022 13:35:49 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by smtp4.osuosl.org (Postfix) with ESMTPS id 7328F424C3; Fri, 1 Jul 2022 13:35:47 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 smtp4.osuosl.org 7328F424C3 Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id D1EB4C0039; Fri, 1 Jul 2022 13:35:45 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp1.osuosl.org (smtp1.osuosl.org [140.211.166.138]) by lists.linuxfoundation.org (Postfix) with ESMTP id 0402FC0035 for ; Fri, 1 Jul 2022 13:35:44 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp1.osuosl.org (Postfix) with ESMTP id E4E1784705 for ; Fri, 1 Jul 2022 13:35:30 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 smtp1.osuosl.org E4E1784705 Authentication-Results: smtp1.osuosl.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=OOAMpPce X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp1.osuosl.org ([127.0.0.1]) by localhost (smtp1.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id vvT8GWngB7NQ for ; Fri, 1 Jul 2022 13:35:30 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 DKIM-Filter: OpenDKIM Filter v2.11.0 smtp1.osuosl.org 231E3846B8 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by smtp1.osuosl.org (Postfix) with ESMTPS id 231E3846B8 for ; Fri, 1 Jul 2022 13:35:29 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1656682529; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=qdhbbLhnbeyycVtLXwArg58vpYvcgOMn8NPa1GRlYi4=; b=OOAMpPceY4xnkjHzng9s3mKKqfeT9MaE8yjELvn8H7eyfmzy9+EbErk0stAkovw5SzslNu IPxGVcrfqiHFIPtSPEtE92d1kBNC+pMyD0ALM5i1jcUxsS2Y3iepQ0UPJ+aj48WkLdkKlx W6n2M5z9m0ItHZQ6JTBiKxe4yVREII8= Received: from mail-wr1-f72.google.com (mail-wr1-f72.google.com [209.85.221.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-564-hTFyQPGJPN-YOrJnMfpSnQ-1; Fri, 01 Jul 2022 09:35:25 -0400 X-MC-Unique: hTFyQPGJPN-YOrJnMfpSnQ-1 Received: by mail-wr1-f72.google.com with SMTP id s1-20020a5d69c1000000b0021b9f3abfebso390340wrw.2 for ; Fri, 01 Jul 2022 06:35:25 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:subject:from:to:cc:date:message-id:in-reply-to :references:user-agent:mime-version:content-transfer-encoding; bh=qdhbbLhnbeyycVtLXwArg58vpYvcgOMn8NPa1GRlYi4=; b=Cy8yAS6rgIIwoAUqbwV6OYLmsPaYQBJNXzz+DbRstCQ23FdeKP6cYj8YpZ/J8qjxA7 zcvAyxaFGgrYnkJybq2iQdMB6Iz8E/rMd1z3q5VARgK7Z/Vzmf3QvxLLVi9JPbvZJCuy swczmQSmyGI1QMWAFpUS4NLSKdMRZG08sZMzBbRcQ9JDtGfZl6byl3BDOJyqCj34RIQU Jwo6Ar1MQS6QpKk8X6R9kdA3IO/wSLzyV8y6LSXY2E+JwD1WAhDBgy/zVX3BWb0ND5zW vXcR9JnkdmTFjnCqYczeHnVzCfivw6+X2g/C9Gv9XFTV63q+WLKa6PMLKo46dN6Dj++l PWHA== X-Gm-Message-State: AJIora+EYXSl9LDSBUjg9PnLLh0egyKd1FtsPt8p/VK34Xo+9fkIqyhK Ff/jAaowbLq6G7iXbIrEAfhd9c7Qv2P2qQLqH+gs7oGlmyh9sXwiyxIBUCa/yk0yLXRi5z40FjB iJLdn6OX6o6xE78Dq+XQGR4LXM1P/yhmRxWqz6kuqqhajfS4zcUVX6F8nMgPYmAaE X-Received: by 2002:a1c:4b12:0:b0:3a0:4b1f:e944 with SMTP id y18-20020a1c4b12000000b003a04b1fe944mr16738866wma.166.1656682524314; Fri, 01 Jul 2022 06:35:24 -0700 (PDT) X-Google-Smtp-Source: AGRyM1sw59fts29SuhszWIvxyFTSCtqeLO9LvQjWFgJxnxpK66j5UhvEzPHHxU0LK2QUSuGsOcM1tA== X-Received: by 2002:a1c:4b12:0:b0:3a0:4b1f:e944 with SMTP id y18-20020a1c4b12000000b003a04b1fe944mr16738838wma.166.1656682523892; Fri, 01 Jul 2022 06:35:23 -0700 (PDT) Received: from localhost (net-5-95-131-226.cust.vodafonedsl.it. [5.95.131.226]) by smtp.gmail.com with ESMTPSA id g13-20020adffc8d000000b0021b99efceb6sm22605488wrr.22.2022.07.01.06.35.22 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 01 Jul 2022 06:35:23 -0700 (PDT) From: Paolo Valerio To: dev@openvswitch.org Date: Fri, 01 Jul 2022 15:35:22 +0200 Message-ID: <165668252237.1967719.7852766388871679884.stgit@fed.void> In-Reply-To: <165668246947.1967719.4502709837520339365.stgit@fed.void> References: <165668246947.1967719.4502709837520339365.stgit@fed.void> User-Agent: StGit/1.1 MIME-Version: 1.0 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=pvalerio@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Cc: i.maximets@ovn.org Subject: [ovs-dev] [PATCH v4 6/6] conntrack: Make ovs-appctl dpctl/ct-bkts work with multiple buckets X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" Without this patch "ovs-appctl dpctl/ct-bkts" produces the following output: Total Buckets: 1 Current Connections: 10246 +-----------+-----------------------------------------+ | Buckets | Connections per Buckets | +-----------+-----------------------------------------+ 0.. 7 | 10246 with this patch applied, the output becomes: Total Buckets: 1024 Current Connections: 95956 +-----------+-----------------------------------------+ | Buckets | Connections per Buckets | +-----------+-----------------------------------------+ 0.. 7 | 87 100 90 91 92 92 101 83 8.. 15 | 98 86 80 91 92 93 92 84 16.. 23 | 82 114 103 90 94 80 95 96 ... 1000..1007 | 98 88 106 100 91 99 89 81 1008..1015 | 97 99 93 67 102 97 89 86 1016..1023 | 113 79 97 86 106 93 80 90 Signed-off-by: Paolo Valerio --- lib/conntrack.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/lib/conntrack.c b/lib/conntrack.c index 1ad29325c..00c183c48 100644 --- a/lib/conntrack.c +++ b/lib/conntrack.c @@ -2694,7 +2694,7 @@ tuple_to_conn_key(const struct ct_dpif_tuple *tuple, uint16_t zone, static void conn_to_ct_dpif_entry(const struct conn *conn, struct ct_dpif_entry *entry, - long long now) + long long now, unsigned int bkt) { const struct conn_key *key = &conn->key_node[CT_DIR_FWD].key, *rev_key = &conn->key_node[CT_DIR_REV].key; @@ -2718,6 +2718,7 @@ conn_to_ct_dpif_entry(const struct conn *conn, struct ct_dpif_entry *entry, ovs_mutex_unlock(&conn->lock); entry->timeout = (expiration > 0) ? expiration / 1000 : 0; + entry->bkt = bkt; if (conn->alg) { /* Caller is responsible for freeing. */ @@ -2743,7 +2744,7 @@ conntrack_dump_start(struct conntrack *ct, struct conntrack_dump *dump, } dump->ct = ct; - *ptot_bkts = 1; /* Need to clean up the callers. */ + *ptot_bkts = CONNTRACK_BUCKETS; return 0; } @@ -2775,7 +2776,7 @@ conntrack_dump_next(struct conntrack_dump *dump, struct ct_dpif_entry *entry) if ((!dump->filter_zone || keyn->key.zone == dump->zone) && (keyn->dir == CT_DIR_FWD)) { - conn_to_ct_dpif_entry(conn, entry, now); + conn_to_ct_dpif_entry(conn, entry, now, dump->bucket); break; } }