From patchwork Thu Aug 29 07:00:06 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Eelco Chaudron
X-Patchwork-Id: 1978254
X-Patchwork-Delegate: echaudro@redhat.com
From: Eelco Chaudron
To: dev@openvswitch.org
Date: Thu, 29 Aug 2024 09:00:06 +0200
Subject: [ovs-dev] [PATCH v5] ofproto-dpif-upcall: Avoid stale ukeys leaks.

It is observed in some environments that there are many more ukeys than
actual DP flows.
For example:

    $ ovs-appctl upcall/show
    system@ovs-system:
    flows : (current 7) (avg 6) (max 117) (limit 2125)
    offloaded flows : 525
    dump duration : 1063ms
    ufid enabled : true

    23: (keys 3612)
    24: (keys 3625)
    25: (keys 3485)

The revalidator threads are busy revalidating the stale ukeys, leading to
high CPU usage and long dump durations.

This patch tracks the number of consecutive missed dumps. If four dumps
are missed in a row, it is assumed that the datapath flow no longer
exists, and the ukey can be deleted.

Reported-by: Roi Dayan
Co-authored-by: Han Zhou
Co-authored-by: Roi Dayan
Signed-off-by: Han Zhou
Signed-off-by: Roi Dayan
Signed-off-by: Eelco Chaudron
---
v3: - Rewrote fix to use actual dump state, and added a test case.
v4: - Added reason to end of flow_del_reason.
    - Rather than time based, make it missed dumps based.
    - Changed it from a TC specific test to a generic unit test.
v5: - Fixed spelling mistake.
    - Added coverage counter verification to the unit test.
---
 ofproto/ofproto-dpif-upcall.c                | 18 ++++++++
 tests/ofproto-dpif.at                        | 45 ++++++++++++++++++++
 utilities/usdt-scripts/flow_reval_monitor.py |  4 +-
 3 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index 4d39bc5a7..e7d4c2b2c 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -57,6 +57,7 @@ COVERAGE_DEFINE(dumped_inconsistent_flow);
 COVERAGE_DEFINE(dumped_new_flow);
 COVERAGE_DEFINE(handler_duplicate_upcall);
 COVERAGE_DEFINE(revalidate_missed_dp_flow);
+COVERAGE_DEFINE(revalidate_missing_dp_flow);
 COVERAGE_DEFINE(ukey_dp_change);
 COVERAGE_DEFINE(ukey_invalid_stat_reset);
 COVERAGE_DEFINE(ukey_replace_contention);
@@ -284,6 +285,7 @@ enum flow_del_reason {
     FDR_TOO_EXPENSIVE,      /* Too expensive to revalidate. */
     FDR_UPDATE_FAIL,        /* Datapath update failed. */
     FDR_XLATION_ERROR,      /* Flow translation error. */
+    FDR_FLOW_MISSING_DP,    /* Flow is missing from the datapath. */
 };
 
 /* 'udpif_key's are responsible for tracking the little bit of state udpif
@@ -318,6 +320,7 @@ struct udpif_key {
     uint64_t dump_seq OVS_GUARDED;            /* Tracks udpif->dump_seq. */
     uint64_t reval_seq OVS_GUARDED;           /* Tracks udpif->reval_seq. */
     enum ukey_state state OVS_GUARDED;        /* Tracks ukey lifetime. */
+    uint32_t missed_dumps OVS_GUARDED;        /* Missed consecutive dumps. */
 
     /* 'state' debug information. */
     unsigned int state_thread OVS_GUARDED;    /* Thread that transitions. */
@@ -3040,6 +3043,21 @@ revalidator_sweep__(struct revalidator *revalidator, bool purge)
                     result = revalidate_ukey(udpif, ukey, &stats, &odp_actions,
                                              reval_seq, &recircs, &del_reason);
                 }
+
+                if (ukey->dump_seq != dump_seq) {
+                    ukey->missed_dumps++;
+                    if (ukey->missed_dumps >= 4) {
+                        /* If the flow was not dumped for 4 revalidator rounds,
+                         * we can assume the datapath flow no longer exists
+                         * and the ukey should be deleted. */
+                        COVERAGE_INC(revalidate_missing_dp_flow);
+                        del_reason = FDR_FLOW_MISSING_DP;
+                        result = UKEY_DELETE;
+                    }
+                } else {
+                    ukey->missed_dumps = 0;
+                }
+
                 if (result != UKEY_KEEP) {
                     /* Clears 'recircs' if filled by revalidate_ukey(). */
                     reval_op_init(&ops[n_ops++], result, udpif, ukey, &recircs,
diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
index 42fb66de6..1df944ef8 100644
--- a/tests/ofproto-dpif.at
+++ b/tests/ofproto-dpif.at
@@ -12661,3 +12661,48 @@ AT_CHECK([ovs-appctl revalidator/resume])
 
 OVS_VSWITCHD_STOP
 AT_CLEANUP
+
+AT_SETUP([ofproto-dpif - Cleanup missing datapath flows])
+
+OVS_VSWITCHD_START
+add_of_ports br0 1 2
+
+m4_define([ICMP_PKT], [m4_join([,],
+  [eth(src=50:54:00:00:00:05,dst=50:54:00:00:00:07),eth_type(0x0800)],
+  [ipv4(src=10.10.10.2,dst=10.10.10.1,proto=1,tos=1,ttl=128,frag=no)],
+  [icmp(type=8,code=0)])])
+
+AT_CHECK([ovs-ofctl del-flows br0])
+AT_CHECK([ovs-ofctl add-flow br0 'actions=normal' ])
+
+AT_CHECK([ovs-appctl netdev-dummy/receive p1 'ICMP_PKT'])
+
+AT_CHECK([ovs-appctl dpctl/dump-flows --names | strip_used | strip_stats | dnl
+          strip_duration | strip_dp_hash | sort], [0], [dnl
+flow-dump from the main thread:
+recirc_id(0),in_port(p1),packet_type(ns=0,id=0),eth(src=50:54:00:00:00:05,dst=50:54:00:00:00:07),eth_type(0x0800),ipv4(frag=no), packets:0, bytes:0, used:never, actions:br0,p2
+])
+
+dnl Make sure the ukey exists.
+AT_CHECK([ovs-appctl upcall/show | grep '(keys' | awk '{print $3}' | \
+            grep -q '1)'], [0])
+
+dnl Delete all datapath flows, and make sure they are gone.
+AT_CHECK([ovs-appctl dpctl/del-flows])
+AT_CHECK([ovs-appctl dpctl/dump-flows --names ], [0], [])
+
+dnl Move forward in time and make sure we have at least 4 * 500ms.
+AT_CHECK([ovs-appctl time/warp 3000 300], [0], [ignore])
+
+dnl Make sure no more ukeys exists.
+AT_CHECK([ovs-appctl upcall/show | grep '(keys' | awk '{print $3}' | \
+            grep -qv '0)'], [1])
+
+dnl Verify coverage counter was hit.
+AT_CHECK([ovs-appctl coverage/read-counter revalidate_missing_dp_flow], [0],
+  [dnl
+1
+])
+
+OVS_VSWITCHD_STOP(["/failed to flow_del (No such file or directory)/d"])
+AT_CLEANUP
diff --git a/utilities/usdt-scripts/flow_reval_monitor.py b/utilities/usdt-scripts/flow_reval_monitor.py
index 28479a565..80c9c98bd 100755
--- a/utilities/usdt-scripts/flow_reval_monitor.py
+++ b/utilities/usdt-scripts/flow_reval_monitor.py
@@ -255,6 +255,7 @@ FdrReasons = IntEnum(
         "FDR_TOO_EXPENSIVE",
         "FDR_UPDATE_FAIL",
         "FDR_XLATION_ERROR",
+        "FDR_FLOW_MISSING_DP"
     ],
     start=0,
 )
@@ -270,7 +271,8 @@ FdrReasonStrings = {
     FdrReasons.FDR_PURGE: "User requested flow deletion",
     FdrReasons.FDR_TOO_EXPENSIVE: "Too expensive to revalidate",
     FdrReasons.FDR_UPDATE_FAIL: "Datapath update failed",
-    FdrReasons.FDR_XLATION_ERROR: "Flow translation error"
+    FdrReasons.FDR_XLATION_ERROR: "Flow translation error",
+    FdrReasons.FDR_FLOW_MISSING_DP: "Flow is missing from the datapath"
 }
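
Note (not part of the patch): the missed-dump accounting described in the
commit message can be illustrated with a small standalone model. This is only
a sketch under stated assumptions. The `Ukey` class, the `sweep` function and
the `MISSED_DUMPS_LIMIT` name are hypothetical stand-ins and do not exist in
OVS. The model ignores ukey states, locking and the existing
revalidate_ukey() path, and only mirrors the counter logic that the hunk adds
to revalidator_sweep__().

```python
from dataclasses import dataclass

# Threshold taken from the patch ("missed_dumps >= 4"); the constant name
# itself is hypothetical, the C code uses a literal 4.
MISSED_DUMPS_LIMIT = 4


@dataclass
class Ukey:
    """Hypothetical stand-in for the two udpif_key fields the check reads."""
    dump_seq: int = 0       # Dump round in which the flow was last seen.
    missed_dumps: int = 0   # Consecutive dump rounds without this flow.


def sweep(ukey: Ukey, dump_seq: int) -> bool:
    """Model one sweep pass; return True if the ukey should be deleted."""
    if ukey.dump_seq != dump_seq:
        # The flow did not show up in the current dump round.
        ukey.missed_dumps += 1
        return ukey.missed_dumps >= MISSED_DUMPS_LIMIT
    # The flow was dumped this round, so the streak is broken.
    ukey.missed_dumps = 0
    return False


if __name__ == "__main__":
    ukey = Ukey(dump_seq=1)
    for current_seq in range(1, 6):
        delete = sweep(ukey, current_seq)
        print(current_seq, ukey.missed_dumps, delete)
```

In this model a ukey last seen in round 1 survives rounds 2 to 4 and is
flagged for deletion in round 5, the fourth consecutive miss. Seeing the flow
again in any round resets the counter to zero.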