From patchwork Tue Aug 27 09:00:10 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eelco Chaudron X-Patchwork-Id: 1977207 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=bNM/cw3Q; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::133; helo=smtp2.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=patchwork.ozlabs.org) Received: from smtp2.osuosl.org (smtp2.osuosl.org [IPv6:2605:bc80:3010::133]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4WtM4F6SK0z1yZd for ; Tue, 27 Aug 2024 19:02:05 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by smtp2.osuosl.org (Postfix) with ESMTP id 3E255404DA; Tue, 27 Aug 2024 09:02:04 +0000 (UTC) X-Virus-Scanned: amavis at osuosl.org Received: from smtp2.osuosl.org ([127.0.0.1]) by localhost (smtp2.osuosl.org [127.0.0.1]) (amavis, port 10024) with ESMTP id pG_-xsL_GGLN; Tue, 27 Aug 2024 09:02:02 +0000 (UTC) X-Comment: SPF check N/A for local connections - client-ip=140.211.9.56; helo=lists.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver= DKIM-Filter: OpenDKIM Filter v2.11.0 smtp2.osuosl.org 381C640156 Authentication-Results: smtp2.osuosl.org; dkim=fail reason="signature verification failed" (1024-bit key, unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=bNM/cw3Q Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by smtp2.osuosl.org (Postfix) with ESMTPS id 381C640156; Tue, 27 Aug 2024 09:02:02 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 0BB05C07E7; Tue, 27 Aug 2024 09:02:02 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp3.osuosl.org (smtp3.osuosl.org [IPv6:2605:bc80:3010::136]) by lists.linuxfoundation.org (Postfix) with ESMTP id EA20BC07E6 for ; Tue, 27 Aug 2024 09:02:00 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id CBFAF607E0 for ; Tue, 27 Aug 2024 09:02:00 +0000 (UTC) X-Virus-Scanned: amavis at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavis, port 10024) with ESMTP id T-nqkS2UqAkW for ; Tue, 27 Aug 2024 09:02:00 +0000 (UTC) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=170.10.133.124; helo=us-smtp-delivery-124.mimecast.com; envelope-from=echaudro@redhat.com; receiver= DMARC-Filter: OpenDMARC Filter v1.4.2 smtp3.osuosl.org BD481607D8 Authentication-Results: smtp3.osuosl.org; dmarc=pass (p=none dis=none) header.from=redhat.com DKIM-Filter: OpenDKIM Filter v2.11.0 smtp3.osuosl.org BD481607D8 Authentication-Results: smtp3.osuosl.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=bNM/cw3Q Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by smtp3.osuosl.org (Postfix) with ESMTPS id BD481607D8 for ; Tue, 27 Aug 2024 09:01:59 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1724749318; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=Mv7dB5uFdYcKkxqIMVlPmijFyuNw5BJQPhDIeproLCM=; b=bNM/cw3QtCCf4HoyAHEbOok0NFrdlkzdZolWOVfgZRu5lfWj8+yW96/43GSEXNLRlmTAWa MBtxIpBGVdK/FaGd0q41PsrJ3ylTiyt7KoxQUMknQz9iS7pLa6oBcv5o+Tg7EHazcqS5nq tEl/kBVQk/haLXyaCFExCrOpB4y20uE= Received: from mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-516-ECiILDqGOA6m0b_qBGBozg-1; Tue, 27 Aug 2024 05:01:52 -0400 X-MC-Unique: ECiILDqGOA6m0b_qBGBozg-1 Received: from mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.17]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id D1EAB1955BF1; Tue, 27 Aug 2024 09:01:50 +0000 (UTC) Received: from localhost.localdomain (unknown [10.39.194.72]) by mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 8B61819560A3; Tue, 27 Aug 2024 09:01:49 +0000 (UTC) From: Eelco Chaudron To: dev@openvswitch.org Date: Tue, 27 Aug 2024 11:00:10 +0200 Message-ID: <961883411e034dc905a8e4d2b303230166ac46c5.1724749210.git.echaudro@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.17 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Subject: [ovs-dev] [PATCH v4] ofproto-dpif-upcall: Avoid stale ukeys leaks. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: i.maximets@ovn.org Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" It is observed in some environments that there are much more ukeys than actual DP flows. For example: $ ovs-appctl upcall/show system@ovs-system: flows : (current 7) (avg 6) (max 117) (limit 2125) offloaded flows : 525 dump duration : 1063ms ufid enabled : true 23: (keys 3612) 24: (keys 3625) 25: (keys 3485) The revalidator threads are busy revalidating the stale ukeys leading to high CPU and long dump duration. This patch tracks the number of consecutive missed dumps. If four dumps are missed in a row, it is assumed that the datapath flow no longer exists, and the ukey can be deleted. Reported-by: Roi Dayan Co-authored-by: Han Zhou Co-authored-by: Roi Dayan Signed-off-by: Han Zhou Signed-off-by: Roi Dayan Signed-off-by: Eelco Chaudron Acked-by: Simon Horman --- v3: - Rewrote fix to use actual dump state, and added a tests case. v4: - Added reason to end of flow_del_reason. - Rather than time based, make it missed dumps based. - Changed it from a TC specific test to a generic unit test. --- ofproto/ofproto-dpif-upcall.c | 18 +++++++++ tests/ofproto-dpif.at | 39 ++++++++++++++++++++ utilities/usdt-scripts/flow_reval_monitor.py | 4 +- 3 files changed, 60 insertions(+), 1 deletion(-) diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c index 4d39bc5a7..053cfd863 100644 --- a/ofproto/ofproto-dpif-upcall.c +++ b/ofproto/ofproto-dpif-upcall.c @@ -57,6 +57,7 @@ COVERAGE_DEFINE(dumped_inconsistent_flow); COVERAGE_DEFINE(dumped_new_flow); COVERAGE_DEFINE(handler_duplicate_upcall); COVERAGE_DEFINE(revalidate_missed_dp_flow); +COVERAGE_DEFINE(revalidate_missing_dp_flow); COVERAGE_DEFINE(ukey_dp_change); COVERAGE_DEFINE(ukey_invalid_stat_reset); COVERAGE_DEFINE(ukey_replace_contention); @@ -284,6 +285,7 @@ enum flow_del_reason { FDR_TOO_EXPENSIVE, /* Too expensive to revalidate. */ FDR_UPDATE_FAIL, /* Datapath update failed. */ FDR_XLATION_ERROR, /* Flow translation error. */ + FDR_FLOW_MISSING_DP, /* Flow is missing from the datapath. */ }; /* 'udpif_key's are responsible for tracking the little bit of state udpif @@ -318,6 +320,7 @@ struct udpif_key { uint64_t dump_seq OVS_GUARDED; /* Tracks udpif->dump_seq. */ uint64_t reval_seq OVS_GUARDED; /* Tracks udpif->reval_seq. */ enum ukey_state state OVS_GUARDED; /* Tracks ukey lifetime. */ + uint32_t missed_dumps OVS_GUARDED; /* Missed consecutive dumps. */ /* 'state' debug information. */ unsigned int state_thread OVS_GUARDED; /* Thread that transitions. */ @@ -3040,6 +3043,21 @@ revalidator_sweep__(struct revalidator *revalidator, bool purge) result = revalidate_ukey(udpif, ukey, &stats, &odp_actions, reval_seq, &recircs, &del_reason); } + + if (ukey->dump_seq != dump_seq) { + ukey->missed_dumps++; + if (ukey->missed_dumps >= 4) { + /* If the flow was not dumped for 4 revalidator rounds, + * we can assume the datapath flow now longer exists + * and the ukey should be deleted. */ + COVERAGE_INC(revalidate_missing_dp_flow); + del_reason = FDR_FLOW_MISSING_DP; + result = UKEY_DELETE; + } + } else { + ukey->missed_dumps = 0; + } + if (result != UKEY_KEEP) { /* Clears 'recircs' if filled by revalidate_ukey(). */ reval_op_init(&ops[n_ops++], result, udpif, ukey, &recircs, diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at index 42fb66de6..9702b889a 100644 --- a/tests/ofproto-dpif.at +++ b/tests/ofproto-dpif.at @@ -12661,3 +12661,42 @@ AT_CHECK([ovs-appctl revalidator/resume]) OVS_VSWITCHD_STOP AT_CLEANUP + +AT_SETUP([ofproto-dpif - Cleanup missing datapath flows]) + +OVS_VSWITCHD_START +add_of_ports br0 1 2 + +m4_define([ICMP_PKT], [m4_join([,], + [eth(src=50:54:00:00:00:05,dst=50:54:00:00:00:07),eth_type(0x0800)], + [ipv4(src=10.10.10.2,dst=10.10.10.1,proto=1,tos=1,ttl=128,frag=no)], + [icmp(type=8,code=0)])]) + +AT_CHECK([ovs-ofctl del-flows br0]) +AT_CHECK([ovs-ofctl add-flow br0 'actions=normal' ]) + +AT_CHECK([ovs-appctl netdev-dummy/receive p1 'ICMP_PKT']) + +AT_CHECK([ovs-appctl dpctl/dump-flows --names | strip_used | strip_stats | dnl + strip_duration | strip_dp_hash | sort], [0], [dnl +flow-dump from the main thread: +recirc_id(0),in_port(p1),packet_type(ns=0,id=0),eth(src=50:54:00:00:00:05,dst=50:54:00:00:00:07),eth_type(0x0800),ipv4(frag=no), packets:0, bytes:0, used:never, actions:br0,p2 +]) + +dnl Make sure the ukey exists. +AT_CHECK([ovs-appctl upcall/show | grep '(keys' | awk '{print $3}' | \ + grep -q '1)'], [0]) + +dnl Delete all datapath flows, and make sure they are gone. +AT_CHECK([ovs-appctl dpctl/del-flows]) +AT_CHECK([ovs-appctl dpctl/dump-flows --names ], [0], []) + +dnl Move forward in time and make sure we have at least 4 * 500ms. +AT_CHECK([ovs-appctl time/warp 3000 300], [0], [ignore]) + +dnl Make sure no more ukeys exists. +AT_CHECK([ovs-appctl upcall/show | grep '(keys' | awk '{print $3}' | \ + grep -qv '0)'], [1]) + +OVS_VSWITCHD_STOP(["/failed to flow_del (No such file or directory)/d"]) +AT_CLEANUP diff --git a/utilities/usdt-scripts/flow_reval_monitor.py b/utilities/usdt-scripts/flow_reval_monitor.py index 28479a565..80c9c98bd 100755 --- a/utilities/usdt-scripts/flow_reval_monitor.py +++ b/utilities/usdt-scripts/flow_reval_monitor.py @@ -255,6 +255,7 @@ FdrReasons = IntEnum( "FDR_TOO_EXPENSIVE", "FDR_UPDATE_FAIL", "FDR_XLATION_ERROR", + "FDR_FLOW_MISSING_DP" ], start=0, ) @@ -270,7 +271,8 @@ FdrReasonStrings = { FdrReasons.FDR_PURGE: "User requested flow deletion", FdrReasons.FDR_TOO_EXPENSIVE: "Too expensive to revalidate", FdrReasons.FDR_UPDATE_FAIL: "Datapath update failed", - FdrReasons.FDR_XLATION_ERROR: "Flow translation error" + FdrReasons.FDR_XLATION_ERROR: "Flow translation error", + FdrReasons.FDR_FLOW_MISSING_DP: "Flow is missing from the datapath" }