From patchwork Mon Nov 11 16:30:46 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lorenzo Bianconi X-Patchwork-Id: 2009875 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=O02eQzED; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::138; helo=smtp1.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=patchwork.ozlabs.org) Received: from smtp1.osuosl.org (smtp1.osuosl.org [IPv6:2605:bc80:3010::138]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4XnFRW3dXSz1xty for ; Tue, 12 Nov 2024 03:31:19 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by smtp1.osuosl.org (Postfix) with ESMTP id BD3E480D9E; Mon, 11 Nov 2024 16:31:16 +0000 (UTC) X-Virus-Scanned: amavis at osuosl.org Received: from smtp1.osuosl.org ([127.0.0.1]) by localhost (smtp1.osuosl.org [127.0.0.1]) (amavis, port 10024) with ESMTP id 9j-UPCLzDscA; Mon, 11 Nov 2024 16:31:13 +0000 (UTC) X-Comment: SPF check N/A for local connections - client-ip=140.211.9.56; helo=lists.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver= DKIM-Filter: OpenDKIM Filter v2.11.0 smtp1.osuosl.org 165ED80D61 Authentication-Results: smtp1.osuosl.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=O02eQzED Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by smtp1.osuosl.org (Postfix) with ESMTPS id 165ED80D61; Mon, 11 Nov 2024 16:31:13 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id C18C5C0889; Mon, 11 Nov 2024 16:31:12 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp2.osuosl.org (smtp2.osuosl.org [140.211.166.133]) by lists.linuxfoundation.org (Postfix) with ESMTP id 5EEF6C0889 for ; Mon, 11 Nov 2024 16:31:10 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp2.osuosl.org (Postfix) with ESMTP id 4D45040433 for ; Mon, 11 Nov 2024 16:31:10 +0000 (UTC) X-Virus-Scanned: amavis at osuosl.org Received: from smtp2.osuosl.org ([127.0.0.1]) by localhost (smtp2.osuosl.org [127.0.0.1]) (amavis, port 10024) with ESMTP id z1WFOvmrjA6t for ; Mon, 11 Nov 2024 16:31:07 +0000 (UTC) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=170.10.129.124; helo=us-smtp-delivery-124.mimecast.com; envelope-from=lorenzo.bianconi@redhat.com; receiver= DMARC-Filter: OpenDMARC Filter v1.4.2 smtp2.osuosl.org B60DE40319 Authentication-Results: smtp2.osuosl.org; dmarc=pass (p=none dis=none) header.from=redhat.com DKIM-Filter: OpenDKIM Filter v2.11.0 smtp2.osuosl.org B60DE40319 Authentication-Results: smtp2.osuosl.org; dkim=pass (1024-bit key, unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=O02eQzED Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by smtp2.osuosl.org (Postfix) with ESMTPS id B60DE40319 for ; Mon, 11 Nov 2024 16:31:06 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1731342665; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=nKnGKW/5vUdILxdOPY3ZUZvD+482RfsZ/Efg6H30Y4I=; b=O02eQzEDN6xmrZoCFAqu34RhkoSLKetEj2PMiy+Spb9q0MXtUIXc3oYZ+k0OaeNCbxzQMr /Bd/VVOV+YqLMfFI5tgoNbaurHm0jj6dVBmP/QJCaMhlrZl6gVomLpRmeemCwWIwR8Wql+ OIzOZAkFxL7IGoav4reoSpfIAnJSA0A= Received: from mail-qt1-f197.google.com (mail-qt1-f197.google.com [209.85.160.197]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-375-Uuk37ufsMUS7_pZKs6A-iw-1; Mon, 11 Nov 2024 11:31:03 -0500 X-MC-Unique: Uuk37ufsMUS7_pZKs6A-iw-1 X-Mimecast-MFC-AGG-ID: Uuk37ufsMUS7_pZKs6A-iw Received: by mail-qt1-f197.google.com with SMTP id d75a77b69052e-460dd04feecso94570781cf.0 for ; Mon, 11 Nov 2024 08:31:03 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1731342663; x=1731947463; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=nKnGKW/5vUdILxdOPY3ZUZvD+482RfsZ/Efg6H30Y4I=; b=o/mhm0Ln1o9gzFApIRAu6L0HJ1mmrWnbXlELNQXR8+Yp+Br02nnZ3iKNzYvD9jqrHy ac3K7ByGahi8+UqViFCvrlkpZn/DDD5fSVxESUilyWN9MCTVmrJ9IcIhxcHJjCQMsthf cs9UdffWhaJ4+4gKzXLFLpJHTnM++ql4nze8uOZc/WrHPGAlvMs1/QgMtX0qi1ADPGzO 0pZZU0ZF5gC4hIMfBmSBhwA2za9tccYB/dsByH71MRF9DeFcYTDVyFVYSi04IGUo44JB DcnWCokEpDw5fGGHmya1kaU4iFW2ql0W2juo6G4LAzz5QXQ7PZuidwcSBctUJ0ELUhm7 BBmA== X-Gm-Message-State: AOJu0YzTIlbhvSVgLM0IpJzNToGTEZuv8KOLOOFjBjf9YjcNbSLqS6pN i07b8suMnzf3BrR9Y5e1/EJR9Q8a4F4nxpFsyutBVO+w7LB6yLGqp8hnv+yE9DC/HZFXF1bOPCz jBZkqn49koTSc/sPDMOLjGnD1VZHllccsi0BgDPGHv2nbT7EJ5bu4IVRA49/g8LSW0ZTB8ns/Me dNTSOljL2T7zAXVfpbut1MJQUKNPTgGd42WIDptHEPjJnITIC6gQ== X-Received: by 2002:a05:622a:1808:b0:460:e9ce:7629 with SMTP id d75a77b69052e-4630940eb23mr179680281cf.42.1731342662467; Mon, 11 Nov 2024 08:31:02 -0800 (PST) X-Google-Smtp-Source: AGHT+IEzxeayRpLFQr/es/oa/DZlB9bH5zud09uYwyRBoedlEdb0v2M7hvOpFq5k40m3YAyD1PCt4g== X-Received: by 2002:a05:622a:1808:b0:460:e9ce:7629 with SMTP id d75a77b69052e-4630940eb23mr179679841cf.42.1731342661865; Mon, 11 Nov 2024 08:31:01 -0800 (PST) Received: from localhost (net-93-146-37-148.cust.vodafonedsl.it. [93.146.37.148]) by smtp.gmail.com with ESMTPSA id d75a77b69052e-462ff57e5fasm64841381cf.58.2024.11.11.08.31.01 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 11 Nov 2024 08:31:01 -0800 (PST) From: Lorenzo Bianconi To: ovs-dev@openvswitch.org Date: Mon, 11 Nov 2024 17:30:46 +0100 Message-ID: <6980e3633fe62568c5a4fb08492bc620da10b05d.1731342268.git.lorenzo.bianconi@redhat.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: References: MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: T2IvGVV0t6DGxHukHBnzXu5z-tdh1A5JmivOVhUBvLE_1731342663 X-Mimecast-Originator: redhat.com Subject: [ovs-dev] [RFC v3 ovn 1/4] northd: Introduce ECMP_Nexthop table in SB db. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: dceara@redhat.com Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" Introduce ECMP_Nexthop table in the SB db in order to track active ecmp-symmetric-reply connections and flush stale ones. ECMP_Nexthop table contains ip and mac address for each active nexthop. Signed-off-by: Lorenzo Bianconi --- northd/en-northd.c | 29 ++++++++++++++++++++ northd/en-northd.h | 4 +++ northd/inc-proc-northd.c | 9 ++++++- northd/northd.c | 58 +++++++++++++++++++++++++++++++++++++++- northd/northd.h | 6 +++++ ovn-sb.ovsschema | 17 ++++++++++-- ovn-sb.xml | 31 +++++++++++++++++++++ tests/ovn-northd.at | 33 ++++++++++++++++++----- 8 files changed, 176 insertions(+), 11 deletions(-) diff --git a/northd/en-northd.c b/northd/en-northd.c index 24ed31517..165af44a0 100644 --- a/northd/en-northd.c +++ b/northd/en-northd.c @@ -404,6 +404,23 @@ en_bfd_sync_run(struct engine_node *node, void *data) engine_set_node_state(node, EN_UPDATED); } +void +en_ecmp_nexthop_run(struct engine_node *node, void *data OVS_UNUSED) +{ + const struct engine_context *eng_ctx = engine_get_context(); + struct northd_data *northd_data = engine_get_input_data("northd", node); + struct static_routes_data *static_routes_data = + engine_get_input_data("static_routes", node); + const struct sbrec_ecmp_nexthop_table *sbrec_ecmp_nexthop_table = + EN_OVSDB_GET(engine_get_input("SB_ecmp_nexthop", node)); + + build_ecmp_nexthop_table(eng_ctx->ovnsb_idl_txn, + &northd_data->lr_ports, + &static_routes_data->parsed_routes, + sbrec_ecmp_nexthop_table); + engine_set_node_state(node, EN_UPDATED); +} + void *en_northd_init(struct engine_node *node OVS_UNUSED, struct engine_arg *arg OVS_UNUSED) @@ -454,6 +471,13 @@ void return data; } +void * +en_ecmp_nexthop_init(struct engine_node *node OVS_UNUSED, + struct engine_arg *arg OVS_UNUSED) +{ + return NULL; +} + void en_northd_cleanup(void *data) { @@ -526,3 +550,8 @@ en_bfd_sync_cleanup(void *data) { bfd_sync_destroy(data); } + +void +en_ecmp_nexthop_cleanup(void *data OVS_UNUSED) +{ +} diff --git a/northd/en-northd.h b/northd/en-northd.h index 631a7c17a..2666cc67e 100644 --- a/northd/en-northd.h +++ b/northd/en-northd.h @@ -42,5 +42,9 @@ bool bfd_sync_northd_change_handler(struct engine_node *node, void *data OVS_UNUSED); void en_bfd_sync_run(struct engine_node *node, void *data); void en_bfd_sync_cleanup(void *data OVS_UNUSED); +void en_ecmp_nexthop_run(struct engine_node *node, void *data); +void *en_ecmp_nexthop_init(struct engine_node *node OVS_UNUSED, + struct engine_arg *arg OVS_UNUSED); +void en_ecmp_nexthop_cleanup(void *data); #endif /* EN_NORTHD_H */ diff --git a/northd/inc-proc-northd.c b/northd/inc-proc-northd.c index 8c834facb..8e16fde80 100644 --- a/northd/inc-proc-northd.c +++ b/northd/inc-proc-northd.c @@ -103,7 +103,8 @@ static unixctl_cb_func chassis_features_list; SB_NODE(fdb, "fdb") \ SB_NODE(static_mac_binding, "static_mac_binding") \ SB_NODE(chassis_template_var, "chassis_template_var") \ - SB_NODE(logical_dp_group, "logical_dp_group") + SB_NODE(logical_dp_group, "logical_dp_group") \ + SB_NODE(ecmp_nexthop, "ecmp_nexthop") enum sb_engine_node { #define SB_NODE(NAME, NAME_STR) SB_##NAME, @@ -162,6 +163,7 @@ static ENGINE_NODE(route_policies, "route_policies"); static ENGINE_NODE(static_routes, "static_routes"); static ENGINE_NODE(bfd, "bfd"); static ENGINE_NODE(bfd_sync, "bfd_sync"); +static ENGINE_NODE(ecmp_nexthop, "ecmp_nexthop"); void inc_proc_northd_init(struct ovsdb_idl_loop *nb, struct ovsdb_idl_loop *sb) @@ -264,6 +266,10 @@ void inc_proc_northd_init(struct ovsdb_idl_loop *nb, engine_add_input(&en_bfd_sync, &en_route_policies, NULL); engine_add_input(&en_bfd_sync, &en_northd, bfd_sync_northd_change_handler); + engine_add_input(&en_ecmp_nexthop, &en_sb_ecmp_nexthop, NULL); + engine_add_input(&en_ecmp_nexthop, &en_northd, NULL); + engine_add_input(&en_ecmp_nexthop, &en_static_routes, NULL); + engine_add_input(&en_sync_meters, &en_nb_acl, NULL); engine_add_input(&en_sync_meters, &en_nb_meter, NULL); engine_add_input(&en_sync_meters, &en_sb_meter, NULL); @@ -334,6 +340,7 @@ void inc_proc_northd_init(struct ovsdb_idl_loop *nb, engine_add_input(&en_sync_from_sb, &en_sb_port_binding, NULL); engine_add_input(&en_sync_from_sb, &en_sb_ha_chassis_group, NULL); + engine_add_input(&en_northd_output, &en_ecmp_nexthop, NULL); engine_add_input(&en_northd_output, &en_sync_from_sb, NULL); engine_add_input(&en_northd_output, &en_sync_to_sb, northd_output_sync_to_sb_handler); diff --git a/northd/northd.c b/northd/northd.c index 832ff1758..acaa101ec 100644 --- a/northd/northd.c +++ b/northd/northd.c @@ -10720,6 +10720,60 @@ build_bfd_map(const struct nbrec_bfd_table *nbrec_bfd_table, } } +void +build_ecmp_nexthop_table( + struct ovsdb_idl_txn *ovnsb_txn, + struct hmap *lr_ports, struct hmap *routes, + const struct sbrec_ecmp_nexthop_table *sbrec_ecmp_nexthop_table) +{ + if (!ovnsb_txn) { + return; + } + + struct sset nb_nexthops_sset = SSET_INITIALIZER(&nb_nexthops_sset); + struct sset sb_nexthops_sset = SSET_INITIALIZER(&sb_nexthops_sset); + + const struct sbrec_ecmp_nexthop *sb_ecmp_nexthop; + SBREC_ECMP_NEXTHOP_TABLE_FOR_EACH (sb_ecmp_nexthop, + sbrec_ecmp_nexthop_table) { + sset_add(&sb_nexthops_sset, sb_ecmp_nexthop->nexthop); + } + + struct parsed_route *pr; + HMAP_FOR_EACH (pr, key_node, routes) { + if (!pr->ecmp_symmetric_reply) { + continue; + } + + if (!pr->out_port) { + continue; + } + + struct ovn_port *out_port = ovn_port_find(lr_ports, pr->out_port->key); + if (!out_port || !out_port->sb) { + continue; + } + + const struct nbrec_logical_router_static_route *r = pr->route; + if (!sset_contains(&sb_nexthops_sset, r->nexthop)) { + sb_ecmp_nexthop = sbrec_ecmp_nexthop_insert(ovnsb_txn); + sbrec_ecmp_nexthop_set_nexthop(sb_ecmp_nexthop, r->nexthop); + sbrec_ecmp_nexthop_set_port(sb_ecmp_nexthop, out_port->sb); + } + sset_add(&nb_nexthops_sset, r->nexthop); + } + + SBREC_ECMP_NEXTHOP_TABLE_FOR_EACH_SAFE (sb_ecmp_nexthop, + sbrec_ecmp_nexthop_table) { + if (!sset_contains(&nb_nexthops_sset, sb_ecmp_nexthop->nexthop)) { + sbrec_ecmp_nexthop_delete(sb_ecmp_nexthop); + } + } + + sset_destroy(&nb_nexthops_sset); + sset_destroy(&sb_nexthops_sset); +} + /* Returns a string of the IP address of the router port 'op' that * overlaps with 'ip_s". If one is not found, returns NULL. * @@ -11160,10 +11214,11 @@ parsed_routes_add(struct ovn_datapath *od, const struct hmap *lr_ports, } /* Verify that ip_prefix and nexthop are on the same network. */ + struct ovn_port *out_port = NULL; if (!is_discard_route && !find_static_route_outport(od, lr_ports, route, IN6_IS_ADDR_V4MAPPED(&prefix), - NULL, NULL)) { + NULL, &out_port)) { return; } @@ -11206,6 +11261,7 @@ parsed_routes_add(struct ovn_datapath *od, const struct hmap *lr_ports, new_pr->hash = route_hash(new_pr); new_pr->route = route; new_pr->nbr = od->nbr; + new_pr->out_port = out_port; new_pr->ecmp_symmetric_reply = smap_get_bool(&route->options, "ecmp_symmetric_reply", false); diff --git a/northd/northd.h b/northd/northd.h index c1442ff40..e6338e60b 100644 --- a/northd/northd.h +++ b/northd/northd.h @@ -703,6 +703,7 @@ struct parsed_route { uint32_t route_table_id; uint32_t hash; const struct nbrec_logical_router_static_route *route; + struct ovn_port *out_port; bool ecmp_symmetric_reply; bool is_discard_route; const struct nbrec_logical_router *nbr; @@ -746,6 +747,11 @@ void bfd_destroy(struct bfd_data *); void bfd_sync_init(struct bfd_sync_data *); void bfd_sync_destroy(struct bfd_sync_data *); +void build_ecmp_nexthop_table( + struct ovsdb_idl_txn *ovnsb_txn, + struct hmap *lr_ports, struct hmap *routes, + const struct sbrec_ecmp_nexthop_table *sbrec_ecmp_nexthop_table); + struct lflow_table; struct lr_stateful_tracked_data; struct ls_stateful_tracked_data; diff --git a/ovn-sb.ovsschema b/ovn-sb.ovsschema index 73abf2c8d..864cb0ed6 100644 --- a/ovn-sb.ovsschema +++ b/ovn-sb.ovsschema @@ -1,7 +1,7 @@ { "name": "OVN_Southbound", - "version": "20.37.0", - "cksum": "1950136776 31493", + "version": "20.38.0", + "cksum": "466210938 32119", "tables": { "SB_Global": { "columns": { @@ -610,6 +610,19 @@ "refTable": "Datapath_Binding"}}}}, "indexes": [["logical_port", "ip"]], "isRoot": true}, + "ECMP_Nexthop": { + "columns": { + "nexthop": {"type": "string"}, + "port": {"type": {"key": {"type": "uuid", + "refTable": "Port_Binding", + "refType": "strong"}, + "min": 0, "max": 1}}, + "mac": {"type": "string"}, + "external_ids": { + "type": {"key": "string", "value": "string", + "min": 0, "max": "unlimited"}}}, + "indexes": [["nexthop", "port"]], + "isRoot": true}, "Chassis_Template_Var": { "columns": { "chassis": {"type": "string"}, diff --git a/ovn-sb.xml b/ovn-sb.xml index ea4adc1c3..ea1d484a7 100644 --- a/ovn-sb.xml +++ b/ovn-sb.xml @@ -5217,4 +5217,35 @@ tcp.flags = RST; The set of variable values for a given chassis. + + +

+ Each record in this table represents an active ECMP route committed by + ovn-northd to ovs connection-tracking table. + ECMP_Nexthop table is used by ovn-controller + to track active ct entries and to flush stale ones. +

+ +

+ Nexthop IP address for this ECMP route. Nexthop IP address should + be the IP address of a connected router port or the IP address of + an external device used as nexthop for the given destination. +

+
+ +

+ The reference to table for the port used + to connect to the configured next-hop. +

+
+ +

+ Nexthop mac address. +

+
+ + + See External IDs at the beginning of this document. + +
diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at index 4e5b3b172..15959a972 100644 --- a/tests/ovn-northd.at +++ b/tests/ovn-northd.at @@ -3879,7 +3879,7 @@ wait_row_count bfd 1 logical_port=r0-sw3 detect_mult=5 dst_ip=192.168.3.2 \ check_engine_stats northd norecompute nocompute check_engine_stats bfd recompute nocompute check_engine_stats lflow recompute nocompute -check_engine_stats northd_output norecompute compute +check_engine_stats northd_output recompute nocompute CHECK_NO_CHANGE_AFTER_RECOMPUTE check as northd ovn-appctl -t ovn-northd inc-engine/clear-stats @@ -3894,7 +3894,7 @@ wait_row_count bfd 1 logical_port=r0-sw1 min_rx=1000 min_tx=1000 detect_mult=100 check_engine_stats northd norecompute nocompute check_engine_stats bfd recompute nocompute check_engine_stats lflow recompute nocompute -check_engine_stats northd_output norecompute compute +check_engine_stats northd_output recompute nocompute CHECK_NO_CHANGE_AFTER_RECOMPUTE check as northd ovn-appctl -t ovn-northd inc-engine/clear-stats @@ -3906,7 +3906,7 @@ check_engine_stats northd recompute nocompute check_engine_stats bfd recompute nocompute check_engine_stats static_routes recompute nocompute check_engine_stats lflow recompute nocompute -check_engine_stats northd_output norecompute compute +check_engine_stats northd_output recompute nocompute CHECK_NO_CHANGE_AFTER_RECOMPUTE check as northd ovn-appctl -t ovn-northd inc-engine/clear-stats @@ -3922,7 +3922,7 @@ check_engine_stats northd recompute nocompute check_engine_stats bfd recompute nocompute check_engine_stats static_routes recompute nocompute check_engine_stats lflow recompute nocompute -check_engine_stats northd_output norecompute compute +check_engine_stats northd_output recompute nocompute CHECK_NO_CHANGE_AFTER_RECOMPUTE check as northd ovn-appctl -t ovn-northd inc-engine/clear-stats @@ -3934,7 +3934,7 @@ check_engine_stats northd recompute nocompute check_engine_stats bfd recompute nocompute check_engine_stats route_policies recompute nocompute check_engine_stats lflow recompute nocompute -check_engine_stats northd_output norecompute compute +check_engine_stats northd_output recompute nocompute CHECK_NO_CHANGE_AFTER_RECOMPUTE check as northd ovn-appctl -t ovn-northd inc-engine/clear-stats @@ -3969,7 +3969,7 @@ check_engine_stats northd recompute nocompute check_engine_stats bfd recompute nocompute check_engine_stats static_routes recompute nocompute check_engine_stats lflow recompute nocompute -check_engine_stats northd_output norecompute compute +check_engine_stats northd_output recompute nocompute CHECK_NO_CHANGE_AFTER_RECOMPUTE check as northd ovn-appctl -t ovn-northd inc-engine/clear-stats @@ -3984,7 +3984,7 @@ check_engine_stats northd recompute nocompute check_engine_stats bfd recompute nocompute check_engine_stats route_policies recompute nocompute check_engine_stats lflow recompute nocompute -check_engine_stats northd_output norecompute compute +check_engine_stats northd_output recompute nocompute CHECK_NO_CHANGE_AFTER_RECOMPUTE check as northd ovn-appctl -t ovn-northd inc-engine/clear-stats @@ -6809,6 +6809,10 @@ check ovn-nbctl lsp-set-options public-lr0 router-port=lr0-public # ECMP flows will be added even if there is only one next-hop. check ovn-nbctl --wait=sb --ecmp-symmetric-reply lr-route-add lr0 1.0.0.1 192.168.0.10 +check_row_count ECMP_Nexthop 1 +uuid=$(fetch_column Port_Binding _uuid logical_port=lr0-public) +check_column 192.168.0.10 ECMP_Nexthop nexthop +check_column "$uuid" ECMP_Nexthop port ovn-sbctl dump-flows lr0 > lr0flows @@ -6828,6 +6832,13 @@ AT_CHECK([grep -e "lr_in_ip_routing_ecmp" lr0flows | ovn_strip_lflows], [0], [dn ]) check ovn-nbctl --wait=sb --ecmp-symmetric-reply lr-route-add lr0 1.0.0.1 192.168.0.20 +check_row_count ECMP_Nexthop 2 +AT_CHECK([ovn-sbctl --columns nexthop --bare find ECMP_Nexthop nexthop=192.168.0.10], [0], [dnl +192.168.0.10 +]) +AT_CHECK([ovn-sbctl --columns nexthop --bare find ECMP_Nexthop nexthop=192.168.0.20], [0], [dnl +192.168.0.20 +]) ovn-sbctl dump-flows lr0 > lr0flows AT_CHECK([grep -w "lr_in_ip_routing" lr0flows | ovn_strip_lflows], [0], [dnl @@ -6857,6 +6868,13 @@ AT_CHECK([grep -e "lr_in_arp_resolve.*ecmp" lr0flows | ovn_strip_lflows], [0], [ # add ecmp route with wrong nexthop check ovn-nbctl --wait=sb --ecmp-symmetric-reply lr-route-add lr0 1.0.0.1 192.168.1.20 +check_row_count ECMP_Nexthop 2 +AT_CHECK([ovn-sbctl --columns nexthop --bare find ECMP_Nexthop nexthop=192.168.0.10], [0], [dnl +192.168.0.10 +]) +AT_CHECK([ovn-sbctl --columns nexthop --bare find ECMP_Nexthop nexthop=192.168.0.20], [0], [dnl +192.168.0.20 +]) ovn-sbctl dump-flows lr0 > lr0flows AT_CHECK([grep -w "lr_in_ip_routing" lr0flows | ovn_strip_lflows], [0], [dnl @@ -6876,6 +6894,7 @@ AT_CHECK([grep -e "lr_in_ip_routing_ecmp" lr0flows | sed 's/192\.168\.0\..0/192. check ovn-nbctl lr-route-del lr0 wait_row_count nb:Logical_Router_Static_Route 0 +check_row_count ECMP_Nexthop 0 check ovn-nbctl --wait=sb lr-route-add lr0 1.0.0.0/24 192.168.0.10 ovn-sbctl dump-flows lr0 > lr0flows From patchwork Mon Nov 11 16:30:47 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lorenzo Bianconi X-Patchwork-Id: 2009876 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=aff4krqN; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::137; helo=smtp4.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=patchwork.ozlabs.org) Received: from smtp4.osuosl.org (smtp4.osuosl.org [IPv6:2605:bc80:3010::137]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4XnFRY0vJLz1xty for ; Tue, 12 Nov 2024 03:31:21 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by smtp4.osuosl.org (Postfix) with ESMTP id 142F14067F; Mon, 11 Nov 2024 16:31:18 +0000 (UTC) X-Virus-Scanned: amavis at osuosl.org Received: from smtp4.osuosl.org ([127.0.0.1]) by localhost (smtp4.osuosl.org [127.0.0.1]) (amavis, port 10024) with ESMTP id H-BOG12yAXSo; Mon, 11 Nov 2024 16:31:15 +0000 (UTC) X-Comment: SPF check N/A for local connections - client-ip=140.211.9.56; helo=lists.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver= DKIM-Filter: OpenDKIM Filter v2.11.0 smtp4.osuosl.org CE1564064E Authentication-Results: smtp4.osuosl.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=aff4krqN Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by smtp4.osuosl.org (Postfix) with ESMTPS id CE1564064E; Mon, 11 Nov 2024 16:31:14 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 2AAE9C08A9; Mon, 11 Nov 2024 16:31:14 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp3.osuosl.org (smtp3.osuosl.org [IPv6:2605:bc80:3010::136]) by lists.linuxfoundation.org (Postfix) with ESMTP id 8DE42C087D for ; Mon, 11 Nov 2024 16:31:10 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id 70402605C5 for ; Mon, 11 Nov 2024 16:31:10 +0000 (UTC) X-Virus-Scanned: amavis at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavis, port 10024) with ESMTP id Zeqdd-jFGMRm for ; Mon, 11 Nov 2024 16:31:09 +0000 (UTC) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=170.10.129.124; helo=us-smtp-delivery-124.mimecast.com; envelope-from=lorenzo.bianconi@redhat.com; receiver= DMARC-Filter: OpenDMARC Filter v1.4.2 smtp3.osuosl.org 00BCA6066A Authentication-Results: smtp3.osuosl.org; dmarc=pass (p=none dis=none) header.from=redhat.com DKIM-Filter: OpenDKIM Filter v2.11.0 smtp3.osuosl.org 00BCA6066A Authentication-Results: smtp3.osuosl.org; dkim=pass (1024-bit key, unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=aff4krqN Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by smtp3.osuosl.org (Postfix) with ESMTPS id 00BCA6066A for ; Mon, 11 Nov 2024 16:31:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1731342668; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=N1bvOrbhJNoZTMyfN2TU9O+uI9wrj2ZRYCS3ozfCjuM=; b=aff4krqNSiFntkeiVOCbunyYNhztiasOIUzmcmB7O6zAHIkhKlgr89cUsb6krQf+wg0slK OiP9v4+AMpI2ELuelCWLfgPW9Uzd0a3lrqfPPaGelr+yNVoX/wP2Y0+pPtuhn50iOnvtk7 IbcMvuJj3JFjc74mderUazrTOUEj29I= Received: from mail-oi1-f198.google.com (mail-oi1-f198.google.com [209.85.167.198]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-671-RU8XQTu8PfGQM0eVoyXQUw-1; Mon, 11 Nov 2024 11:31:07 -0500 X-MC-Unique: RU8XQTu8PfGQM0eVoyXQUw-1 X-Mimecast-MFC-AGG-ID: RU8XQTu8PfGQM0eVoyXQUw Received: by mail-oi1-f198.google.com with SMTP id 5614622812f47-3e60dde7edcso2973844b6e.0 for ; Mon, 11 Nov 2024 08:31:06 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1731342666; x=1731947466; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=N1bvOrbhJNoZTMyfN2TU9O+uI9wrj2ZRYCS3ozfCjuM=; b=oLarjvwGdNfEt5vrv+ixcc37b1vWAD1OrxFO+VBZ8jWpLObct5S1/5QXyGXKNTGSva tzkujgbWbAyqDnexZNvC8e1sKEnkDaVa69rjJzuW7/UFbQHIRAqcwj7v6w7S33eDq1z5 +PK/5QH0CINLHHvofvn9XVDG/Pu6TY3toRSzuY2bVn1JKo8Uvm6sspktvUNx4qhk85s2 U1q89bJmbN73Q1LmIr3M9SiTrMDF/WiSB45jtutKno301kEw31wyszlzTO4gZpt5Xq2l zEmfaU90JXdbiX2oX04ttKCaQ4ZEQwSXvF6Rqq0UyMBl9QQi/7M5GyZCZdlMFS3/hV/4 mBUw== X-Gm-Message-State: AOJu0YzYpaqkG4N4oJuE0QwiLfubrOM7QTKigjCWhFI5j4w0wh6Amhno nCfOL0ytbyuSbMQlpGXFLDkFfqgh9UgaR6t7uw2WHU7TArOj6q69/+6mGaAzEnlsrgeHweNQwA7 Y1fxGyh6ezYA7tFKNQ1EaazXev8AkEC9wadTDAk3X4lEDQOCMKCevAwkSa+hKmwyAnG7+DgB8qW eUGCuWLVhlFYkSDfKlsYAonDQkwynibWiZqXnvnQ5MSYO2raavtQ== X-Received: by 2002:a05:6808:1987:b0:3e6:1057:21af with SMTP id 5614622812f47-3e794772f73mr9546452b6e.41.1731342665715; Mon, 11 Nov 2024 08:31:05 -0800 (PST) X-Google-Smtp-Source: AGHT+IFeQuhOSzzaCPGTNtbYPTcisN/gCA4Xz0fFu01SDgiSQ58sZLKFYUqOi4NvId0G3L7GFuZg0A== X-Received: by 2002:a05:6808:1987:b0:3e6:1057:21af with SMTP id 5614622812f47-3e794772f73mr9546388b6e.41.1731342665015; Mon, 11 Nov 2024 08:31:05 -0800 (PST) Received: from localhost (net-93-146-37-148.cust.vodafonedsl.it. [93.146.37.148]) by smtp.gmail.com with ESMTPSA id 6a1803df08f44-6d3961ecc85sm61549536d6.39.2024.11.11.08.31.04 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 11 Nov 2024 08:31:04 -0800 (PST) From: Lorenzo Bianconi To: ovs-dev@openvswitch.org Date: Mon, 11 Nov 2024 17:30:47 +0100 Message-ID: X-Mailer: git-send-email 2.47.0 In-Reply-To: References: MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: Ze9cxZZC3BtiDf6GrJQpkYBE3DWioPBja-9iKz8g8Z8_1731342666 X-Mimecast-Originator: redhat.com Subject: [ovs-dev] [RFC v3 ovn 2/4] pinctrl: Send periodic arp/nd to ecmp next-hops. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: dceara@redhat.com Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" Introduce the capbility to periodically send ARP/ND packets for ECMP nexthops in order to resolve their L2 address. This is a preliminary patch to introduce the capability to flush stale ECMP CT entries. Signed-off-by: Lorenzo Bianconi --- NEWS | 5 + controller/ovn-controller.8.xml | 10 ++ controller/ovn-controller.c | 2 + controller/pinctrl.c | 284 +++++++++++++++++++++++++++++++- controller/pinctrl.h | 2 + 5 files changed, 300 insertions(+), 3 deletions(-) diff --git a/NEWS b/NEWS index da3aba739..1f8f54d5d 100644 --- a/NEWS +++ b/NEWS @@ -4,6 +4,11 @@ Post v24.09.0 hash (with specified hash fields) for ECMP routes while choosing nexthop. - ovn-ic: Add support for route tag to prevent route learning. + - Add "arp-max-timeout-sec" config option to vswitchd external-ids to + cap the time between when ovn-controller sends ARP/ND packets for + ECMP-nexthop. + By default ovn-controller continuously sends ARP/ND packets for + ECMP-nexthop. OVN v24.09.0 - 13 Sep 2024 -------------------------- diff --git a/controller/ovn-controller.8.xml b/controller/ovn-controller.8.xml index aeaa374c1..7f95a9932 100644 --- a/controller/ovn-controller.8.xml +++ b/controller/ovn-controller.8.xml @@ -385,6 +385,16 @@ cap for the exponential backoff used by ovn-controller to send GARPs packets. +
external_ids:arp-nd-max-timeout-sec
+
+ When used, this configuration value specifies the maximum timeout + (in seconds) between two consecutive ARP/ND packets sent by + ovn-controller to resolve ECMP nexthop mac address. + ovn-controller by default continuously sends ARP/ND + packets. Setting external_ids:arp-nd-max-timeout-sec + allows to cap for the exponential backoff used by ovn-controller + to send ARPs/NDs packets. +
external_ids:ovn-bridge-remote

diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c index ab9593850..c93995b94 100644 --- a/controller/ovn-controller.c +++ b/controller/ovn-controller.c @@ -5699,6 +5699,8 @@ main(int argc, char *argv[]) sbrec_mac_binding_table_get( ovnsb_idl_loop.idl), sbrec_bfd_table_get(ovnsb_idl_loop.idl), + sbrec_ecmp_nexthop_table_get( + ovnsb_idl_loop.idl), br_int, chassis, &runtime_data->local_datapaths, &runtime_data->active_tunnels, diff --git a/controller/pinctrl.c b/controller/pinctrl.c index 07cd140e0..d97b10710 100644 --- a/controller/pinctrl.c +++ b/controller/pinctrl.c @@ -170,6 +170,9 @@ static struct seq *pinctrl_main_seq; static long long int garp_rarp_max_timeout = GARP_RARP_DEF_MAX_TIMEOUT; static bool garp_rarp_continuous; +static long long int arp_nd_max_timeout = GARP_RARP_DEF_MAX_TIMEOUT; +static bool arp_nd_continuous; + static void *pinctrl_handler(void *arg); struct pinctrl { @@ -230,13 +233,17 @@ static void run_activated_ports( const struct sbrec_chassis *chassis); static void init_send_garps_rarps(void); +static void init_send_arps_nds(void); static void destroy_send_garps_rarps(void); +static void destroy_send_arps_nds(void); static void send_garp_rarp_wait(long long int send_garp_rarp_time); +static void send_arp_nd_wait(long long int send_arp_nd_time); static void send_garp_rarp_prepare( struct ovsdb_idl_txn *ovnsb_idl_txn, struct ovsdb_idl_index *sbrec_port_binding_by_datapath, struct ovsdb_idl_index *sbrec_port_binding_by_name, struct ovsdb_idl_index *sbrec_mac_binding_by_lport_ip, + const struct sbrec_ecmp_nexthop_table *ecmp_nh_table, const struct ovsrec_bridge *, const struct sbrec_chassis *, const struct hmap *local_datapaths, @@ -246,6 +253,9 @@ static void send_garp_rarp_prepare( static void send_garp_rarp_run(struct rconn *swconn, long long int *send_garp_rarp_time) OVS_REQUIRES(pinctrl_mutex); +static void send_arp_nd_run(struct rconn *swconn, + long long int *send_arp_nd_time) + OVS_REQUIRES(pinctrl_mutex); static void pinctrl_handle_nd_na(struct rconn *swconn, const struct flow *ip_flow, const struct match *md, @@ -555,6 +565,7 @@ pinctrl_init(void) { init_put_mac_bindings(); init_send_garps_rarps(); + init_send_arps_nds(); init_ipv6_ras(); init_ipv6_prefixd(); init_buffered_packets_ctx(); @@ -3997,6 +4008,7 @@ pinctrl_handler(void *arg_) static long long int send_ipv6_ra_time = LLONG_MAX; /* Next GARP/RARP announcement in ms. */ static long long int send_garp_rarp_time = LLONG_MAX; + static long long int send_arp_nd_time = LLONG_MAX; /* Next multicast query (IGMP) in ms. */ static long long int send_mcast_query_time = LLONG_MAX; static long long int svc_monitors_next_run_time = LLONG_MAX; @@ -4034,6 +4046,7 @@ pinctrl_handler(void *arg_) if (may_inject_pkts()) { ovs_mutex_lock(&pinctrl_mutex); send_garp_rarp_run(swconn, &send_garp_rarp_time); + send_arp_nd_run(swconn, &send_arp_nd_time); send_ipv6_ras(swconn, &send_ipv6_ra_time); send_ipv6_prefixd(swconn, &send_prefixd_time); send_mac_binding_buffered_pkts(swconn); @@ -4052,6 +4065,7 @@ pinctrl_handler(void *arg_) rconn_recv_wait(swconn); if (rconn_is_connected(swconn)) { send_garp_rarp_wait(send_garp_rarp_time); + send_arp_nd_wait(send_arp_nd_time); ipv6_ra_wait(send_ipv6_ra_time); ip_mcast_querier_wait(send_mcast_query_time); svc_monitors_wait(svc_monitors_next_run_time); @@ -4148,6 +4162,7 @@ pinctrl_run(struct ovsdb_idl_txn *ovnsb_idl_txn, const struct sbrec_service_monitor_table *svc_mon_table, const struct sbrec_mac_binding_table *mac_binding_table, const struct sbrec_bfd_table *bfd_table, + const struct sbrec_ecmp_nexthop_table *ecmp_nh_table, const struct ovsrec_bridge *br_int, const struct sbrec_chassis *chassis, const struct hmap *local_datapaths, @@ -4164,8 +4179,9 @@ pinctrl_run(struct ovsdb_idl_txn *ovnsb_idl_txn, sbrec_port_binding_by_key, chassis); send_garp_rarp_prepare(ovnsb_idl_txn, sbrec_port_binding_by_datapath, sbrec_port_binding_by_name, - sbrec_mac_binding_by_lport_ip, br_int, chassis, - local_datapaths, active_tunnels, ovs_table); + sbrec_mac_binding_by_lport_ip, ecmp_nh_table, + br_int, chassis, local_datapaths, active_tunnels, + ovs_table); prepare_ipv6_ras(local_active_ports_ras, sbrec_port_binding_by_name); prepare_ipv6_prefixd(ovnsb_idl_txn, sbrec_port_binding_by_name, local_active_ports_ipv6_pd, chassis, @@ -4700,6 +4716,7 @@ pinctrl_destroy(void) latch_destroy(&pinctrl.pinctrl_thread_exit); rconn_destroy(pinctrl.swconn); destroy_send_garps_rarps(); + destroy_send_arps_nds(); destroy_ipv6_ras(); destroy_ipv6_prefixd(); destroy_buffered_packets_ctx(); @@ -5208,6 +5225,150 @@ send_garp_rarp_update(struct ovsdb_idl_txn *ovnsb_idl_txn, } } +struct arp_nd_data { + struct hmap_node hmap_node; + struct eth_addr ea; /* Ethernet address of port. */ + struct in6_addr src_ip; /* IP address of port. */ + struct in6_addr dst_ip; /* Destination IP address */ + long long int announce_time; /* Next announcement in ms. */ + int backoff; /* Backoff timeout for the next + * announcement (in msecs). */ + uint32_t dp_key; /* Datapath used to output this GARP. */ + uint32_t port_key; /* Port to inject the GARP into. */ +}; + +static struct hmap send_arp_nd_data; + +static void +init_send_arps_nds(void) +{ + hmap_init(&send_arp_nd_data); +} + +static void +destroy_send_arps_nds(void) +{ + struct arp_nd_data *e; + HMAP_FOR_EACH_POP (e, hmap_node, &send_arp_nd_data) { + free(e); + } + hmap_destroy(&send_arp_nd_data); +} + +static struct arp_nd_data * +arp_nd_find_data(const struct sbrec_port_binding *pb, + const struct in6_addr *nexthop) +{ + uint32_t hash = 0; + + hash = hash_add_in6_addr(hash, nexthop); + hash = hash_add(hash, pb->datapath->tunnel_key); + hash = hash_add(hash, pb->tunnel_key); + + struct arp_nd_data *e; + HMAP_FOR_EACH_WITH_HASH (e, hmap_node, hash, &send_arp_nd_data) { + if (ipv6_addr_equals(&e->dst_ip, nexthop) && + e->port_key == pb->tunnel_key) { + return e; + } + } + + return NULL; +} + +static bool +arp_nd_find_is_stale(const struct arp_nd_data *e, + const struct sbrec_ecmp_nexthop_table *ecmp_nh_table, + const struct sbrec_chassis *chassis) +{ + const struct sbrec_ecmp_nexthop *sb_ecmp_nexthop; + SBREC_ECMP_NEXTHOP_TABLE_FOR_EACH (sb_ecmp_nexthop, ecmp_nh_table) { + const struct sbrec_port_binding *pb = sb_ecmp_nexthop->port; + if (pb->chassis != chassis) { + continue; + } + + struct lport_addresses laddrs; + if (!extract_ip_addresses(sb_ecmp_nexthop->nexthop, &laddrs)) { + continue; + } + + struct in6_addr dst_ip = laddrs.n_ipv4_addrs + ? in6_addr_mapped_ipv4(laddrs.ipv4_addrs[0].addr) + : laddrs.ipv6_addrs[0].addr; + destroy_lport_addresses(&laddrs); + + if (pb->tunnel_key == e->port_key && + pb->datapath->tunnel_key == e->dp_key && + ipv6_addr_equals(&e->dst_ip, &dst_ip)) { + return false; + } + } + return true; +} + +static struct arp_nd_data * +arp_nd_alloc_data(const struct eth_addr ea, + struct in6_addr src_ip, struct in6_addr dst_ip, + const struct sbrec_port_binding *pb) +{ + struct arp_nd_data *e = xmalloc(sizeof *e); + e->ea = ea; + e->src_ip = src_ip; + e->dst_ip = dst_ip; + e->announce_time = time_msec() + 1000; + e->backoff = 1000; /* msec. */ + e->dp_key = pb->datapath->tunnel_key; + e->port_key = pb->tunnel_key; + + uint32_t hash = 0; + hash = hash_add_in6_addr(hash, &dst_ip); + hash = hash_add(hash, e->dp_key); + hash = hash_add(hash, e->port_key); + hmap_insert(&send_arp_nd_data, &e->hmap_node, hash); + notify_pinctrl_handler(); + + return e; +} + +/* Add or update a vif for which ARPs need to be announced. */ +static void +send_arp_nd_update(const struct sbrec_port_binding *pb, const char *nexthop, + long long int max_arp_timeout, bool continuous_arp_nd) +{ + struct lport_addresses laddrs; + if (!extract_ip_addresses(nexthop, &laddrs)) { + return; + } + + struct in6_addr dst_ip = laddrs.n_ipv4_addrs + ? in6_addr_mapped_ipv4(laddrs.ipv4_addrs[0].addr) + : laddrs.ipv6_addrs[0].addr; + destroy_lport_addresses(&laddrs); + + struct arp_nd_data *e = arp_nd_find_data(pb, &dst_ip); + if (!e) { + if (!extract_lsp_addresses(pb->mac[0], &laddrs)) { + return; + } + + if (laddrs.n_ipv4_addrs) { + arp_nd_alloc_data(laddrs.ea, + in6_addr_mapped_ipv4(laddrs.ipv4_addrs[0].addr), + dst_ip, pb); + } else if (laddrs.n_ipv6_addrs) { + arp_nd_alloc_data(laddrs.ea, laddrs.ipv6_addrs[0].addr, + dst_ip, pb); + } + destroy_lport_addresses(&laddrs); + } else if (max_arp_timeout != arp_nd_max_timeout || + continuous_arp_nd != arp_nd_continuous) { + /* reset backoff */ + e->announce_time = time_msec() + 1000; + e->backoff = 1000; /* msec. */ + } +} + /* Remove a vif from GARP announcements. */ static void send_garp_rarp_delete(const char *lport) @@ -6546,6 +6707,16 @@ send_garp_rarp_wait(long long int send_garp_rarp_time) } } +static void +send_arp_nd_wait(long long int send_arp_nd_time) +{ + /* Set the poll timer for next arp packet only if there is data to + * be sent. */ + if (hmap_count(&send_arp_nd_data)) { + poll_timer_wait_until(send_arp_nd_time); + } +} + /* Called with in the pinctrl_handler thread context. */ static void send_garp_rarp_run(struct rconn *swconn, long long int *send_garp_rarp_time) @@ -6568,6 +6739,84 @@ send_garp_rarp_run(struct rconn *swconn, long long int *send_garp_rarp_time) } } +static long long int +send_arp_nd(struct rconn *swconn, struct arp_nd_data *e, + long long int current_time) + OVS_REQUIRES(pinctrl_mutex) +{ + if (current_time < e->announce_time) { + return e->announce_time; + } + + /* Compose a ARP request packet. */ + uint64_t packet_stub[128 / 8]; + struct dp_packet packet; + dp_packet_use_stub(&packet, packet_stub, sizeof packet_stub); + if (IN6_IS_ADDR_V4MAPPED(&e->src_ip)) { + compose_arp(&packet, ARP_OP_REQUEST, e->ea, eth_addr_zero, + true, in6_addr_get_mapped_ipv4(&e->src_ip), + in6_addr_get_mapped_ipv4(&e->dst_ip)); + } else { + compose_nd_ns(&packet, e->ea, &e->src_ip, &e->dst_ip); + } + + /* Inject ARP request. */ + uint64_t ofpacts_stub[4096 / 8]; + struct ofpbuf ofpacts = OFPBUF_STUB_INITIALIZER(ofpacts_stub); + enum ofp_version version = rconn_get_version(swconn); + put_load(e->dp_key, MFF_LOG_DATAPATH, 0, 64, &ofpacts); + put_load(e->port_key, MFF_LOG_OUTPORT, 0, 32, &ofpacts); + struct ofpact_resubmit *resubmit = ofpact_put_RESUBMIT(&ofpacts); + resubmit->in_port = OFPP_CONTROLLER; + resubmit->table_id = OFTABLE_LOCAL_OUTPUT; + + struct ofputil_packet_out po = { + .packet = dp_packet_data(&packet), + .packet_len = dp_packet_size(&packet), + .buffer_id = UINT32_MAX, + .ofpacts = ofpacts.data, + .ofpacts_len = ofpacts.size, + }; + match_set_in_port(&po.flow_metadata, OFPP_CONTROLLER); + enum ofputil_protocol proto = ofputil_protocol_from_ofp_version(version); + queue_msg(swconn, ofputil_encode_packet_out(&po, proto)); + dp_packet_uninit(&packet); + ofpbuf_uninit(&ofpacts); + + /* Set the next announcement. At most 5 announcements are sent for a + * vif if arp_nd_max_timeout is not specified otherwise cap the max + * timeout to arp_nd_max_timeout. */ + if (arp_nd_continuous || e->backoff < arp_nd_max_timeout) { + e->announce_time = current_time + e->backoff; + } else { + e->announce_time = LLONG_MAX; + } + e->backoff = MIN(arp_nd_max_timeout, e->backoff * 2); + + return e->announce_time; +} + +static void +send_arp_nd_run(struct rconn *swconn, long long int *send_arp_nd_time) + OVS_REQUIRES(pinctrl_mutex) +{ + if (!hmap_count(&send_arp_nd_data)) { + return; + } + + /* Send ARPs, and update the next announcement. */ + long long int current_time = time_msec(); + *send_arp_nd_time = LLONG_MAX; + + struct arp_nd_data *e; + HMAP_FOR_EACH (e, hmap_node, &send_arp_nd_data) { + long long int next_announce = send_arp_nd(swconn, e, current_time); + if (*send_arp_nd_time > next_announce) { + *send_arp_nd_time = next_announce; + } + } +} + /* Called by pinctrl_run(). Runs with in the main ovn-controller * thread context. */ static void @@ -6575,6 +6824,7 @@ send_garp_rarp_prepare(struct ovsdb_idl_txn *ovnsb_idl_txn, struct ovsdb_idl_index *sbrec_port_binding_by_datapath, struct ovsdb_idl_index *sbrec_port_binding_by_name, struct ovsdb_idl_index *sbrec_mac_binding_by_lport_ip, + const struct sbrec_ecmp_nexthop_table *ecmp_nh_table, const struct ovsrec_bridge *br_int, const struct sbrec_chassis *chassis, const struct hmap *local_datapaths, @@ -6587,7 +6837,8 @@ send_garp_rarp_prepare(struct ovsdb_idl_txn *ovnsb_idl_txn, struct sset nat_ip_keys = SSET_INITIALIZER(&nat_ip_keys); struct shash nat_addresses; unsigned long long garp_max_timeout = GARP_RARP_DEF_MAX_TIMEOUT; - bool garp_continuous = false; + unsigned long long max_arp_nd_timeout = GARP_RARP_DEF_MAX_TIMEOUT; + bool garp_continuous = false, continuous_arp_nd = true; const struct ovsrec_open_vswitch *cfg = ovsrec_open_vswitch_table_first(ovs_table); if (cfg) { @@ -6597,6 +6848,11 @@ send_garp_rarp_prepare(struct ovsdb_idl_txn *ovnsb_idl_txn, if (!garp_max_timeout) { garp_max_timeout = GARP_RARP_DEF_MAX_TIMEOUT; } + + max_arp_nd_timeout = smap_get_ullong( + &cfg->external_ids, "arp-nd-max-timeout-sec", + GARP_RARP_DEF_MAX_TIMEOUT / 1000) * 1000; + continuous_arp_nd = !!max_arp_nd_timeout; } shash_init(&nat_addresses); @@ -6610,6 +6866,7 @@ send_garp_rarp_prepare(struct ovsdb_idl_txn *ovnsb_idl_txn, &nat_ip_keys, &local_l3gw_ports, chassis, active_tunnels, &nat_addresses); + /* For deleted ports and deleted nat ips, remove from * send_garp_rarp_data. */ struct shash_node *iter; @@ -6645,6 +6902,24 @@ send_garp_rarp_prepare(struct ovsdb_idl_txn *ovnsb_idl_txn, } } + struct arp_nd_data *e; + const struct sbrec_ecmp_nexthop *sb_ecmp_nexthop; + HMAP_FOR_EACH_SAFE (e, hmap_node, &send_arp_nd_data) { + if (arp_nd_find_is_stale(e, ecmp_nh_table, chassis)) { + hmap_remove(&send_arp_nd_data, &e->hmap_node); + free(e); + notify_pinctrl_handler(); + } + } + + SBREC_ECMP_NEXTHOP_TABLE_FOR_EACH (sb_ecmp_nexthop, ecmp_nh_table) { + const struct sbrec_port_binding *pb = sb_ecmp_nexthop->port; + if (pb && !strcmp(pb->type, "l3gateway") && pb->chassis == chassis) { + send_arp_nd_update(pb, sb_ecmp_nexthop->nexthop, + max_arp_nd_timeout, continuous_arp_nd); + } + } + /* pinctrl_handler thread will send the GARPs. */ sset_destroy(&localnet_vifs); @@ -6662,6 +6937,9 @@ send_garp_rarp_prepare(struct ovsdb_idl_txn *ovnsb_idl_txn, garp_rarp_max_timeout = garp_max_timeout; garp_rarp_continuous = garp_continuous; + + arp_nd_max_timeout = max_arp_nd_timeout; + arp_nd_continuous = continuous_arp_nd; } static bool diff --git a/controller/pinctrl.h b/controller/pinctrl.h index 3462b670c..5c8ea7aea 100644 --- a/controller/pinctrl.h +++ b/controller/pinctrl.h @@ -36,6 +36,7 @@ struct sbrec_dns_table; struct sbrec_controller_event_table; struct sbrec_service_monitor_table; struct sbrec_bfd_table; +struct sbrec_ecmp_nexthop_table; struct sbrec_port_binding; struct sbrec_mac_binding_table; @@ -54,6 +55,7 @@ void pinctrl_run(struct ovsdb_idl_txn *ovnsb_idl_txn, const struct sbrec_service_monitor_table *, const struct sbrec_mac_binding_table *, const struct sbrec_bfd_table *, + const struct sbrec_ecmp_nexthop_table *, const struct ovsrec_bridge *, const struct sbrec_chassis *, const struct hmap *local_datapaths, const struct sset *active_tunnels, From patchwork Mon Nov 11 16:30:48 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lorenzo Bianconi X-Patchwork-Id: 2009877 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=SSk8FEdI; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::133; helo=smtp2.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=patchwork.ozlabs.org) Received: from smtp2.osuosl.org (smtp2.osuosl.org [IPv6:2605:bc80:3010::133]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4XnFRc3hJtz1xty for ; Tue, 12 Nov 2024 03:31:24 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by smtp2.osuosl.org (Postfix) with ESMTP id A25DE405A2; Mon, 11 Nov 2024 16:31:21 +0000 (UTC) X-Virus-Scanned: amavis at osuosl.org Received: from smtp2.osuosl.org ([127.0.0.1]) by localhost (smtp2.osuosl.org [127.0.0.1]) (amavis, port 10024) with ESMTP id jCKNvD7oaaT2; Mon, 11 Nov 2024 16:31:18 +0000 (UTC) X-Comment: SPF check N/A for local connections - client-ip=2605:bc80:3010:104::8cd3:938; helo=lists.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver= DKIM-Filter: OpenDKIM Filter v2.11.0 smtp2.osuosl.org 5CC5740527 Authentication-Results: smtp2.osuosl.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=SSk8FEdI Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [IPv6:2605:bc80:3010:104::8cd3:938]) by smtp2.osuosl.org (Postfix) with ESMTPS id 5CC5740527; Mon, 11 Nov 2024 16:31:16 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 23D77C08BD; Mon, 11 Nov 2024 16:31:16 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp4.osuosl.org (smtp4.osuosl.org [IPv6:2605:bc80:3010::137]) by lists.linuxfoundation.org (Postfix) with ESMTP id 58AA0C0889 for ; Mon, 11 Nov 2024 16:31:12 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp4.osuosl.org (Postfix) with ESMTP id 3902640651 for ; Mon, 11 Nov 2024 16:31:12 +0000 (UTC) X-Virus-Scanned: amavis at osuosl.org Received: from smtp4.osuosl.org ([127.0.0.1]) by localhost (smtp4.osuosl.org [127.0.0.1]) (amavis, port 10024) with ESMTP id L2F1wO1wph0Q for ; Mon, 11 Nov 2024 16:31:11 +0000 (UTC) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=170.10.129.124; helo=us-smtp-delivery-124.mimecast.com; envelope-from=lorenzo.bianconi@redhat.com; receiver= DMARC-Filter: OpenDMARC Filter v1.4.2 smtp4.osuosl.org 9C4ED40631 Authentication-Results: smtp4.osuosl.org; dmarc=pass (p=none dis=none) header.from=redhat.com DKIM-Filter: OpenDKIM Filter v2.11.0 smtp4.osuosl.org 9C4ED40631 Authentication-Results: smtp4.osuosl.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=SSk8FEdI Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by smtp4.osuosl.org (Postfix) with ESMTPS id 9C4ED40631 for ; Mon, 11 Nov 2024 16:31:10 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1731342669; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=VXpihsfq/a9ZwRHMwErwHHMKmThm5+Rso4VqgUJnCqU=; b=SSk8FEdIgq6gMRnplfmT/DXRBw6UXbths1dCSUsTOze5JwHXTtBRCvroj57OKzwZGQhMkl zsPMfZZal4xu91lEPq8Z/hgIrH6rP00KlFnc5p0n/WAHFZGvkqv4atZ3GD62DyQz3hjAf+ koOpYiLhCcagsNk4F5VFN/gAMosFplg= Received: from mail-oi1-f197.google.com (mail-oi1-f197.google.com [209.85.167.197]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-689-zJZvNG53MHORgt4Rvw1qJQ-1; Mon, 11 Nov 2024 11:31:08 -0500 X-MC-Unique: zJZvNG53MHORgt4Rvw1qJQ-1 X-Mimecast-MFC-AGG-ID: zJZvNG53MHORgt4Rvw1qJQ Received: by mail-oi1-f197.google.com with SMTP id 5614622812f47-3e5fef7b126so4805988b6e.0 for ; Mon, 11 Nov 2024 08:31:08 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1731342667; x=1731947467; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=VXpihsfq/a9ZwRHMwErwHHMKmThm5+Rso4VqgUJnCqU=; b=Vo/Jj5164CU7YEpb670pFyx43x28SKcPraIwQUCGpO99CScddnUJOYD5eH4A+ha4wW U54C2FFAznzG7TpuaGANn8gXfRTfI0R5fUyUjf7wCrq13R1eknhsnVGaEpouqbhW4OTc x417iTwP/QUBXClQmEkiW5UDmClTK6sDcb0QyHGxGwcEZivun3qFGqJN1dJtSn/CfCna aSFAYVz6SMcuc22PwpR/7p38tWTeLIvgHI1EaxAaSekhK1WVtSN51sfTbY0JgAkDJ/ER uv2Odey7TT94SR8zf2biAyBe4FG3LNnhsLj/I106muMp6epeBoFH9xg02EB07qHBSx0B kW6w== X-Gm-Message-State: AOJu0YxKM9ZnY0ORcM3dCo/1s15Nea45EZ9ksrGmW8Ie0xzHAYL2YNpb +7ONBX7xfBH51C9fz9lRvwmfu0k284bGcmL52G9xtDaymFK9Zc6GrceHOR4aD3a4ndFr8m3ZTw2 jtCh1+iRnrW7T1OKIj8el7yZsMoVXGZ0tT0KZQ+V+KN/sCfRh0gJ5EYsioqk6tZN6f9g9xtxpDi IFj0cC1QX4mWEgzY14D7SuCmnJOaa/DuWClV9u171kKqD98iIfOA== X-Received: by 2002:a05:6808:1703:b0:3e5:fd08:d020 with SMTP id 5614622812f47-3e79477093amr10704736b6e.42.1731342667374; Mon, 11 Nov 2024 08:31:07 -0800 (PST) X-Google-Smtp-Source: AGHT+IHygkViDUEOu+WCsgodsOoHltcNcf6CZeXape2GELPiCQQlArvl3xb4uQqQvhvFAEpwY9RuNQ== X-Received: by 2002:a05:6808:1703:b0:3e5:fd08:d020 with SMTP id 5614622812f47-3e79477093amr10704685b6e.42.1731342666849; Mon, 11 Nov 2024 08:31:06 -0800 (PST) Received: from localhost (net-93-146-37-148.cust.vodafonedsl.it. [93.146.37.148]) by smtp.gmail.com with ESMTPSA id 6a1803df08f44-6d396643437sm61136786d6.129.2024.11.11.08.31.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 11 Nov 2024 08:31:06 -0800 (PST) From: Lorenzo Bianconi To: ovs-dev@openvswitch.org Date: Mon, 11 Nov 2024 17:30:48 +0100 Message-ID: <9a3d81afee3aca22ee5f1c6dce35e2283ca3f4de.1731342268.git.lorenzo.bianconi@redhat.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: References: MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: -l3CdcvVdZjx1HfNSM_BNPuFDcUo-pNrKsiWGqypcPg_1731342667 X-Mimecast-Originator: redhat.com Subject: [ovs-dev] [RFC v3 ovn 3/4] pinctrl: Update ecmp-nexthop mac resolving L2 address. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: dceara@redhat.com Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" Set the mac address column in the ECMP_Nexthop table updating sb MAC_Binding table. This is a preliminary patch to introduce the capability to flush stale ECMP CT entries. Signed-off-by: Lorenzo Bianconi --- controller/ovn-controller.c | 4 ++ controller/pinctrl.c | 85 ++++++++++++++++++++++++++++++------- controller/pinctrl.h | 1 + 3 files changed, 75 insertions(+), 15 deletions(-) diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c index c93995b94..c319e0d77 100644 --- a/controller/ovn-controller.c +++ b/controller/ovn-controller.c @@ -4963,6 +4963,9 @@ main(int argc, char *argv[]) = ovsdb_idl_index_create2(ovnsb_idl_loop.idl, &sbrec_fdb_col_mac, &sbrec_fdb_col_dp_key); + struct ovsdb_idl_index *sbrec_ecmp_by_nexthop + = ovsdb_idl_index_create1(ovnsb_idl_loop.idl, + &sbrec_ecmp_nexthop_col_nexthop); struct ovsdb_idl_index *sbrec_mac_binding_by_datapath = mac_binding_by_datapath_index_create(ovnsb_idl_loop.idl); struct ovsdb_idl_index *sbrec_static_mac_binding_by_datapath @@ -5691,6 +5694,7 @@ main(int argc, char *argv[]) sbrec_igmp_group, sbrec_ip_multicast, sbrec_fdb_by_dp_key_mac, + sbrec_ecmp_by_nexthop, sbrec_dns_table_get(ovnsb_idl_loop.idl), sbrec_controller_event_table_get( ovnsb_idl_loop.idl), diff --git a/controller/pinctrl.c b/controller/pinctrl.c index d97b10710..a3cfe73bd 100644 --- a/controller/pinctrl.c +++ b/controller/pinctrl.c @@ -207,11 +207,15 @@ static void pinctrl_handle_put_mac_binding(const struct flow *md, OVS_REQUIRES(pinctrl_mutex); static void init_put_mac_bindings(void); static void destroy_put_mac_bindings(void); +static const struct sbrec_ecmp_nexthop *ecmp_nexthop_lookup( + struct ovsdb_idl_index *sbrec_ecmp_by_nexthop, + const char *nexthop); static void run_put_mac_bindings( struct ovsdb_idl_txn *ovnsb_idl_txn, struct ovsdb_idl_index *sbrec_datapath_binding_by_key, struct ovsdb_idl_index *sbrec_port_binding_by_key, - struct ovsdb_idl_index *sbrec_mac_binding_by_lport_ip) + struct ovsdb_idl_index *sbrec_mac_binding_by_lport_ip, + struct ovsdb_idl_index *sbrec_ecmp_by_nexthop) OVS_REQUIRES(pinctrl_mutex); static void wait_put_mac_bindings(struct ovsdb_idl_txn *ovnsb_idl_txn); static void send_mac_binding_buffered_pkts(struct rconn *swconn) @@ -243,6 +247,7 @@ static void send_garp_rarp_prepare( struct ovsdb_idl_index *sbrec_port_binding_by_datapath, struct ovsdb_idl_index *sbrec_port_binding_by_name, struct ovsdb_idl_index *sbrec_mac_binding_by_lport_ip, + struct ovsdb_idl_index *sbrec_ecmp_by_nexthop, const struct sbrec_ecmp_nexthop_table *ecmp_nh_table, const struct ovsrec_bridge *, const struct sbrec_chassis *, @@ -4157,6 +4162,7 @@ pinctrl_run(struct ovsdb_idl_txn *ovnsb_idl_txn, struct ovsdb_idl_index *sbrec_igmp_groups, struct ovsdb_idl_index *sbrec_ip_multicast_opts, struct ovsdb_idl_index *sbrec_fdb_by_dp_key_mac, + struct ovsdb_idl_index *sbrec_ecmp_by_nexthop, const struct sbrec_dns_table *dns_table, const struct sbrec_controller_event_table *ce_table, const struct sbrec_service_monitor_table *svc_mon_table, @@ -4174,12 +4180,14 @@ pinctrl_run(struct ovsdb_idl_txn *ovnsb_idl_txn, ovs_mutex_lock(&pinctrl_mutex); run_put_mac_bindings(ovnsb_idl_txn, sbrec_datapath_binding_by_key, sbrec_port_binding_by_key, - sbrec_mac_binding_by_lport_ip); + sbrec_mac_binding_by_lport_ip, + sbrec_ecmp_by_nexthop); run_put_vport_bindings(ovnsb_idl_txn, sbrec_datapath_binding_by_key, sbrec_port_binding_by_key, chassis); send_garp_rarp_prepare(ovnsb_idl_txn, sbrec_port_binding_by_datapath, sbrec_port_binding_by_name, - sbrec_mac_binding_by_lport_ip, ecmp_nh_table, + sbrec_mac_binding_by_lport_ip, + sbrec_ecmp_by_nexthop, ecmp_nh_table, br_int, chassis, local_datapaths, active_tunnels, ovs_table); prepare_ipv6_ras(local_active_ports_ras, sbrec_port_binding_by_name); @@ -4827,6 +4835,7 @@ send_mac_binding_buffered_pkts(struct rconn *swconn) static void mac_binding_add_to_sb(struct ovsdb_idl_txn *ovnsb_idl_txn, struct ovsdb_idl_index *sbrec_mac_binding_by_lport_ip, + struct ovsdb_idl_index *sbrec_ecmp_by_nexthop, const char *logical_port, const struct sbrec_datapath_binding *dp, struct eth_addr ea, const char *ip, @@ -4858,6 +4867,12 @@ mac_binding_add_to_sb(struct ovsdb_idl_txn *ovnsb_idl_txn, sbrec_mac_binding_set_timestamp(b, time_wall_msec()); } } + + const struct sbrec_ecmp_nexthop *ecmp_nh = + ecmp_nexthop_lookup(sbrec_ecmp_by_nexthop, ip); + if (ecmp_nh) { + sbrec_ecmp_nexthop_set_mac(ecmp_nh, mac_string); + } } /* Simulate the effect of a GARP on local datapaths, i.e., create MAC_Bindings @@ -4866,6 +4881,7 @@ mac_binding_add_to_sb(struct ovsdb_idl_txn *ovnsb_idl_txn, static void send_garp_locally(struct ovsdb_idl_txn *ovnsb_idl_txn, struct ovsdb_idl_index *sbrec_mac_binding_by_lport_ip, + struct ovsdb_idl_index *sbrec_ecmp_by_nexthop, const struct hmap *local_datapaths, const struct sbrec_port_binding *in_pb, struct eth_addr ea, ovs_be32 ip) @@ -4895,17 +4911,33 @@ send_garp_locally(struct ovsdb_idl_txn *ovnsb_idl_txn, ip_format_masked(ip, OVS_BE32_MAX, &ip_s); mac_binding_add_to_sb(ovnsb_idl_txn, sbrec_mac_binding_by_lport_ip, - remote->logical_port, remote->datapath, - ea, ds_cstr(&ip_s), update_only); + sbrec_ecmp_by_nexthop, remote->logical_port, + remote->datapath, ea, ds_cstr(&ip_s), + update_only); ds_destroy(&ip_s); } } +static const struct sbrec_ecmp_nexthop * +ecmp_nexthop_lookup(struct ovsdb_idl_index *sbrec_ecmp_by_nexthop, + const char *nexthop) +{ + struct sbrec_ecmp_nexthop *ecmp_nh = + sbrec_ecmp_nexthop_index_init_row(sbrec_ecmp_by_nexthop); + sbrec_ecmp_nexthop_set_nexthop(ecmp_nh, nexthop); + const struct sbrec_ecmp_nexthop *retval = + sbrec_ecmp_nexthop_index_find(sbrec_ecmp_by_nexthop, ecmp_nh); + sbrec_ecmp_nexthop_index_destroy_row(ecmp_nh); + + return retval; +} + static void run_put_mac_binding(struct ovsdb_idl_txn *ovnsb_idl_txn, struct ovsdb_idl_index *sbrec_datapath_binding_by_key, struct ovsdb_idl_index *sbrec_port_binding_by_key, struct ovsdb_idl_index *sbrec_mac_binding_by_lport_ip, + struct ovsdb_idl_index *sbrec_ecmp_by_nexthop, const struct mac_binding *mb) { /* Convert logical datapath and logical port key into lport. */ @@ -4928,8 +4960,9 @@ run_put_mac_binding(struct ovsdb_idl_txn *ovnsb_idl_txn, struct ds ip_s = DS_EMPTY_INITIALIZER; ipv6_format_mapped(&mb->data.ip, &ip_s); mac_binding_add_to_sb(ovnsb_idl_txn, sbrec_mac_binding_by_lport_ip, - pb->logical_port, pb->datapath, mb->data.mac, - ds_cstr(&ip_s), false); + sbrec_ecmp_by_nexthop, pb->logical_port, + pb->datapath, mb->data.mac, ds_cstr(&ip_s), + false); ds_destroy(&ip_s); } @@ -4939,7 +4972,8 @@ static void run_put_mac_bindings(struct ovsdb_idl_txn *ovnsb_idl_txn, struct ovsdb_idl_index *sbrec_datapath_binding_by_key, struct ovsdb_idl_index *sbrec_port_binding_by_key, - struct ovsdb_idl_index *sbrec_mac_binding_by_lport_ip) + struct ovsdb_idl_index *sbrec_mac_binding_by_lport_ip, + struct ovsdb_idl_index *sbrec_ecmp_by_nexthop) OVS_REQUIRES(pinctrl_mutex) { if (!ovnsb_idl_txn) { @@ -4954,7 +4988,8 @@ run_put_mac_bindings(struct ovsdb_idl_txn *ovnsb_idl_txn, run_put_mac_binding(ovnsb_idl_txn, sbrec_datapath_binding_by_key, sbrec_port_binding_by_key, - sbrec_mac_binding_by_lport_ip, mb); + sbrec_mac_binding_by_lport_ip, + sbrec_ecmp_by_nexthop, mb); mac_binding_remove(&put_mac_bindings, mb); } } @@ -5104,6 +5139,7 @@ add_garp_rarp(const char *name, const struct eth_addr ea, ovs_be32 ip, static void send_garp_rarp_update(struct ovsdb_idl_txn *ovnsb_idl_txn, struct ovsdb_idl_index *sbrec_mac_binding_by_lport_ip, + struct ovsdb_idl_index *sbrec_ecmp_by_nexthop, const struct hmap *local_datapaths, const struct sbrec_port_binding *binding_rec, struct shash *nat_addresses, @@ -5147,8 +5183,9 @@ send_garp_rarp_update(struct ovsdb_idl_txn *ovnsb_idl_txn, binding_rec->tunnel_key); send_garp_locally(ovnsb_idl_txn, sbrec_mac_binding_by_lport_ip, - local_datapaths, binding_rec, laddrs->ea, - laddrs->ipv4_addrs[i].addr); + sbrec_ecmp_by_nexthop, + local_datapaths, binding_rec, + laddrs->ea, laddrs->ipv4_addrs[i].addr); } free(name); @@ -5217,7 +5254,8 @@ send_garp_rarp_update(struct ovsdb_idl_txn *ovnsb_idl_txn, binding_rec->tunnel_key); if (ip) { send_garp_locally(ovnsb_idl_txn, sbrec_mac_binding_by_lport_ip, - local_datapaths, binding_rec, laddrs.ea, ip); + sbrec_ecmp_by_nexthop, local_datapaths, + binding_rec, laddrs.ea, ip); } destroy_lport_addresses(&laddrs); @@ -6824,6 +6862,7 @@ send_garp_rarp_prepare(struct ovsdb_idl_txn *ovnsb_idl_txn, struct ovsdb_idl_index *sbrec_port_binding_by_datapath, struct ovsdb_idl_index *sbrec_port_binding_by_name, struct ovsdb_idl_index *sbrec_mac_binding_by_lport_ip, + struct ovsdb_idl_index *sbrec_ecmp_by_nexthop, const struct sbrec_ecmp_nexthop_table *ecmp_nh_table, const struct ovsrec_bridge *br_int, const struct sbrec_chassis *chassis, @@ -6885,6 +6924,7 @@ send_garp_rarp_prepare(struct ovsdb_idl_txn *ovnsb_idl_txn, if (pb && !smap_get_bool(&pb->options, "disable_garp_rarp", false)) { send_garp_rarp_update(ovnsb_idl_txn, sbrec_mac_binding_by_lport_ip, + sbrec_ecmp_by_nexthop, local_datapaths, pb, &nat_addresses, garp_max_timeout, garp_continuous); } @@ -6897,8 +6937,9 @@ send_garp_rarp_prepare(struct ovsdb_idl_txn *ovnsb_idl_txn, = lport_lookup_by_name(sbrec_port_binding_by_name, gw_port); if (pb && !smap_get_bool(&pb->options, "disable_garp_rarp", false)) { send_garp_rarp_update(ovnsb_idl_txn, sbrec_mac_binding_by_lport_ip, - local_datapaths, pb, &nat_addresses, - garp_max_timeout, garp_continuous); + sbrec_ecmp_by_nexthop, local_datapaths, pb, + &nat_addresses, garp_max_timeout, + garp_continuous); } } @@ -6914,7 +6955,21 @@ send_garp_rarp_prepare(struct ovsdb_idl_txn *ovnsb_idl_txn, SBREC_ECMP_NEXTHOP_TABLE_FOR_EACH (sb_ecmp_nexthop, ecmp_nh_table) { const struct sbrec_port_binding *pb = sb_ecmp_nexthop->port; - if (pb && !strcmp(pb->type, "l3gateway") && pb->chassis == chassis) { + if (!pb || pb->chassis != chassis) { + continue; + } + + /* Update mac binding if not already set. */ + if (!strcmp(sb_ecmp_nexthop->mac, "")) { + const struct sbrec_mac_binding *mb = + mac_binding_lookup(sbrec_mac_binding_by_lport_ip, + pb->logical_port, sb_ecmp_nexthop->nexthop); + if (ovnsb_idl_txn && mb) { + sbrec_ecmp_nexthop_set_mac(sb_ecmp_nexthop, mb->mac); + } + } + + if (!strcmp(pb->type, "l3gateway")) { send_arp_nd_update(pb, sb_ecmp_nexthop->nexthop, max_arp_nd_timeout, continuous_arp_nd); } diff --git a/controller/pinctrl.h b/controller/pinctrl.h index 5c8ea7aea..f1968f31c 100644 --- a/controller/pinctrl.h +++ b/controller/pinctrl.h @@ -50,6 +50,7 @@ void pinctrl_run(struct ovsdb_idl_txn *ovnsb_idl_txn, struct ovsdb_idl_index *sbrec_igmp_groups, struct ovsdb_idl_index *sbrec_ip_multicast_opts, struct ovsdb_idl_index *sbrec_fdb_by_dp_key_mac, + struct ovsdb_idl_index *sbrec_ecmp_by_nexthop, const struct sbrec_dns_table *, const struct sbrec_controller_event_table *, const struct sbrec_service_monitor_table *, From patchwork Mon Nov 11 16:30:49 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lorenzo Bianconi X-Patchwork-Id: 2009878 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=JQ+TJ4AK; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.133; helo=smtp2.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=patchwork.ozlabs.org) Received: from smtp2.osuosl.org (smtp2.osuosl.org [140.211.166.133]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4XnFRr65zcz1xty for ; Tue, 12 Nov 2024 03:31:36 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by smtp2.osuosl.org (Postfix) with ESMTP id 186AA405E0; Mon, 11 Nov 2024 16:31:35 +0000 (UTC) X-Virus-Scanned: amavis at osuosl.org Received: from smtp2.osuosl.org ([127.0.0.1]) by localhost (smtp2.osuosl.org [127.0.0.1]) (amavis, port 10024) with ESMTP id iphagOxd2ryN; Mon, 11 Nov 2024 16:31:25 +0000 (UTC) X-Comment: SPF check N/A for local connections - client-ip=2605:bc80:3010:104::8cd3:938; helo=lists.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver= DKIM-Filter: OpenDKIM Filter v2.11.0 smtp2.osuosl.org 0D678405C1 Authentication-Results: smtp2.osuosl.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=JQ+TJ4AK Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [IPv6:2605:bc80:3010:104::8cd3:938]) by smtp2.osuosl.org (Postfix) with ESMTPS id 0D678405C1; Mon, 11 Nov 2024 16:31:25 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id BC514C0889; Mon, 11 Nov 2024 16:31:24 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp3.osuosl.org (smtp3.osuosl.org [140.211.166.136]) by lists.linuxfoundation.org (Postfix) with ESMTP id 0B39DC087D for ; Mon, 11 Nov 2024 16:31:23 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id 9B19860705 for ; Mon, 11 Nov 2024 16:31:21 +0000 (UTC) X-Virus-Scanned: amavis at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavis, port 10024) with ESMTP id 7FxOBniQwPeC for ; Mon, 11 Nov 2024 16:31:15 +0000 (UTC) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=170.10.129.124; helo=us-smtp-delivery-124.mimecast.com; envelope-from=lorenzo.bianconi@redhat.com; receiver= DMARC-Filter: OpenDMARC Filter v1.4.2 smtp3.osuosl.org A7B94605C5 Authentication-Results: smtp3.osuosl.org; dmarc=pass (p=none dis=none) header.from=redhat.com DKIM-Filter: OpenDKIM Filter v2.11.0 smtp3.osuosl.org A7B94605C5 Authentication-Results: smtp3.osuosl.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=JQ+TJ4AK Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by smtp3.osuosl.org (Postfix) with ESMTPS id A7B94605C5 for ; Mon, 11 Nov 2024 16:31:14 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1731342673; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=zJEOxJx32NzBnxsG7vr/lJrWrInpRIjHEOXCKL8A9Jg=; b=JQ+TJ4AKTp5ljsC9yn2s1qD3H4sNsYiE+5JG9/tGkvy0Opla5hDKegA+aHnJ2KFnYZQ7PD 9wy5l7F12cHu0k4te5FWeG62om0Voun9d2xVcbTPx6VDbCcZTjndcOg1YW8pmf3//UaaV0 wrAjvvSTcI/RHrpH4qo8Ee1g70ORfRo= Received: from mail-qt1-f200.google.com (mail-qt1-f200.google.com [209.85.160.200]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-655-WCCGrZyINb-ojumCF5F-hg-1; Mon, 11 Nov 2024 11:31:11 -0500 X-MC-Unique: WCCGrZyINb-ojumCF5F-hg-1 X-Mimecast-MFC-AGG-ID: WCCGrZyINb-ojumCF5F-hg Received: by mail-qt1-f200.google.com with SMTP id d75a77b69052e-4608f5ec511so76166571cf.3 for ; Mon, 11 Nov 2024 08:31:11 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1731342671; x=1731947471; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=zJEOxJx32NzBnxsG7vr/lJrWrInpRIjHEOXCKL8A9Jg=; b=a1G5gaanBedm8y8XiqqZ0IdY0WIJkUkRgj1UeTjMDI/u87ojKiIFgt+6j9ZeMu1hgs ZUnupKBxudz99IZrHmE2AP+F1dTSLl5xb66yb+qgDaHI5wSoqgAbuOtqG15yqNADPC9E WBVxNMAjUzbd2qn9uijQLjZmMgQmZqybP4DGP/b0LMR+RpjHmLnKw6ObwuZH7475umM5 tYOfhsY+GVWqr2+IKZvBZQu3wZMl3hVgz5MNuw7Hk9vbvurdWI4YLJD8Fc6ivdsY8PNP yqPj8bzj+3VVFtsd9vFt0j/Zwstmp3XP+EmPUO9c98opIuuJg4IWxKlchSvFe83NAtc/ MtKQ== X-Gm-Message-State: AOJu0YwKCPKMC71gmQrrpv+SdrU310M+qE676gTd4YM0Wg01vnYBhpWS xuHOmfFzaeEFVk7TUXARjyx1xM1oeGSSEkfmOHnYQwSO9TSMfvEPANbJmMHJ64HMRAVzTfewZFo f8K5cjGqBp2vf1WBYa1f6IixNf1Td67WNYmh0SAEzmfsjKzX1keLv4TG2rOS1BK3DDH+UmeLjQU BZj6JSAQGt7UVRBJrUPRv0uPmhMDOFJ8clFZqdnGKBv4NdNw33LQ== X-Received: by 2002:ac8:5dc9:0:b0:458:3116:f06d with SMTP id d75a77b69052e-4630933f79bmr191916811cf.22.1731342670104; Mon, 11 Nov 2024 08:31:10 -0800 (PST) X-Google-Smtp-Source: AGHT+IFVjUOCNOFqrLiC/ADyJ3G9JQnOXjQ77iiRA/P2WoXftgM+mXHSboaVEr1ijJEPRRuPPXHBYA== X-Received: by 2002:ac8:5dc9:0:b0:458:3116:f06d with SMTP id d75a77b69052e-4630933f79bmr191915931cf.22.1731342668922; Mon, 11 Nov 2024 08:31:08 -0800 (PST) Received: from localhost (net-93-146-37-148.cust.vodafonedsl.it. [93.146.37.148]) by smtp.gmail.com with ESMTPSA id d75a77b69052e-462ff3ec65asm64726541cf.2.2024.11.11.08.31.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 11 Nov 2024 08:31:08 -0800 (PST) From: Lorenzo Bianconi To: ovs-dev@openvswitch.org Date: Mon, 11 Nov 2024 17:30:49 +0100 Message-ID: <2534dbe699593a86b8aeb553059011fc4c44111f.1731342268.git.lorenzo.bianconi@redhat.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: References: MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: FLvECh3eqtOWhXIK8vwHGBTbJfzlqoLTCLqbj1XIDgk_1731342671 X-Mimecast-Originator: redhat.com Subject: [ovs-dev] [RFC v3 ovn 4/4] ofctrl: Introduce ecmp_nexthop_monitor. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: dceara@redhat.com Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" Introduce ecmp_nexthop_monitor in ovn-controller in order to track and flush ecmp-symmetric reply ct entires when requested by the CMS (e.g removing the related static ecmp routes). CT entries are flushed using the ethernet mac address stored in ct_label. Signed-off-by: Lorenzo Bianconi --- NEWS | 2 + controller/automake.mk | 4 +- controller/ecmp-next-hop-monitor.c | 102 ++++++ controller/ecmp-next-hop-monitor.h | 24 ++ controller/ofctrl.c | 6 + controller/ofctrl.h | 2 + controller/ovn-controller.c | 2 + include/ovn/logical-fields.h | 3 + tests/system-ovn.at | 544 +++++++++++++++++++++++++++++ 9 files changed, 688 insertions(+), 1 deletion(-) create mode 100644 controller/ecmp-next-hop-monitor.c create mode 100644 controller/ecmp-next-hop-monitor.h diff --git a/NEWS b/NEWS index 1f8f54d5d..f46285d32 100644 --- a/NEWS +++ b/NEWS @@ -9,6 +9,8 @@ Post v24.09.0 ECMP-nexthop. By default ovn-controller continuously sends ARP/ND packets for ECMP-nexthop. + - Introduce ovn-controller ECMP_nexthop monitor in order to flush stale ct + entries when related ecmp routes are removed by the CMS. OVN v24.09.0 - 13 Sep 2024 -------------------------- diff --git a/controller/automake.mk b/controller/automake.mk index ed93cfb3c..6e3de5423 100644 --- a/controller/automake.mk +++ b/controller/automake.mk @@ -49,7 +49,9 @@ controller_ovn_controller_SOURCES = \ controller/statctrl.h \ controller/statctrl.c \ controller/ct-zone.h \ - controller/ct-zone.c + controller/ct-zone.c \ + controller/ecmp-next-hop-monitor.h \ + controller/ecmp-next-hop-monitor.c controller_ovn_controller_LDADD = lib/libovn.la $(OVS_LIBDIR)/libopenvswitch.la man_MANS += controller/ovn-controller.8 diff --git a/controller/ecmp-next-hop-monitor.c b/controller/ecmp-next-hop-monitor.c new file mode 100644 index 000000000..7bd14ff3c --- /dev/null +++ b/controller/ecmp-next-hop-monitor.c @@ -0,0 +1,102 @@ +/* Copyright (c) 2024, Red Hat, Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#include +#include "lib/ovn-util.h" +#include "lib/simap.h" +#include "openvswitch/hmap.h" +#include "openvswitch/ofp-ct.h" +#include "openvswitch/rconn.h" +#include "openvswitch/vlog.h" +#include "ovn/logical-fields.h" +#include "ovn-sb-idl.h" +#include "controller/ecmp-next-hop-monitor.h" + +VLOG_DEFINE_THIS_MODULE(ecmp_next_hop_monitor); + +static struct smap ecmp_nexthop; + +void ecmp_nexthop_init(void) +{ + smap_init(&ecmp_nexthop); +} + +void ecmp_nexthop_destroy(void) +{ + smap_destroy(&ecmp_nexthop); +} + +static void +ecmp_nexthop_monitor_flush_ct_entry(const struct rconn *swconn, + const char *mac, struct ovs_list *msgs) +{ + struct eth_addr ea; + if (!ovs_scan(mac, ETH_ADDR_SCAN_FMT, ETH_ADDR_SCAN_ARGS(ea))) { + return; + } + + ovs_u128 mask = { + /* ct_label.ecmp_reply_eth BITS[32-79] */ + .u64.hi = OVN_CT_ECMP_ETH_HIGH, + .u64.lo = OVN_CT_ECMP_ETH_LOW, + }; + + ovs_be32 lo = get_unaligned_be32((void *)&ea.be16[1]); + ovs_u128 nexthop = { + .u64.hi = ntohs(ea.be16[0]), + .u64.lo = (uint64_t) ntohl(lo) << 32, + }; + + struct ofp_ct_match match = { + .labels = nexthop, + .labels_mask = mask, + }; + struct ofpbuf *msg = ofp_ct_match_encode(&match, NULL, + rconn_get_version(swconn)); + ovs_list_push_back(msgs, &msg->list_node); +} + +void +ecmp_nexthop_monitor_run(const struct sbrec_ecmp_nexthop_table *enh_table, + const struct rconn *swconn, struct ovs_list *msgs) +{ + struct sset ecmp_sb_only = SSET_INITIALIZER(&ecmp_sb_only); + struct simap mac_count = SIMAP_INITIALIZER(&mac_count); + + struct smap_node *node; + SMAP_FOR_EACH_SAFE (node, &ecmp_nexthop) { + simap_increase(&mac_count, node->value, 1); + } + + const struct sbrec_ecmp_nexthop *sbrec_ecmp_nexthop; + SBREC_ECMP_NEXTHOP_TABLE_FOR_EACH (sbrec_ecmp_nexthop, enh_table) { + smap_replace(&ecmp_nexthop, sbrec_ecmp_nexthop->nexthop, + sbrec_ecmp_nexthop->mac); + sset_add(&ecmp_sb_only, sbrec_ecmp_nexthop->nexthop); + } + + SMAP_FOR_EACH_SAFE (node, &ecmp_nexthop) { + /* Do not flush CT entries if the share the same mac address. */ + if (!sset_contains(&ecmp_sb_only, node->key)) { + if (simap_get(&mac_count, node->value) == 1) { + ecmp_nexthop_monitor_flush_ct_entry(swconn, node->value, msgs); + } + smap_remove_node(&ecmp_nexthop, node); + } + } + + sset_destroy(&ecmp_sb_only); + simap_destroy(&mac_count); +} diff --git a/controller/ecmp-next-hop-monitor.h b/controller/ecmp-next-hop-monitor.h new file mode 100644 index 000000000..280ec3bed --- /dev/null +++ b/controller/ecmp-next-hop-monitor.h @@ -0,0 +1,24 @@ +/* Copyright (c) 2024, Red Hat, Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#ifndef OVN_CMP_NEXT_HOP_MONITOR_H +#define OVN_CMP_NEXT_HOP_MONITOR_H + +void ecmp_nexthop_init(void); +void ecmp_nexthop_destroy(void); +void ecmp_nexthop_monitor_run(const struct sbrec_ecmp_nexthop_table *enh_table, + const struct rconn *swconn, + struct ovs_list *msgs); +#endif /* OVN_CMP_NEXT_HOP_MONITOR_H */ diff --git a/controller/ofctrl.c b/controller/ofctrl.c index f9387d375..d770b8e98 100644 --- a/controller/ofctrl.c +++ b/controller/ofctrl.c @@ -54,6 +54,7 @@ #include "vswitch-idl.h" #include "ovn-sb-idl.h" #include "ct-zone.h" +#include "ecmp-next-hop-monitor.h" VLOG_DEFINE_THIS_MODULE(ofctrl); @@ -425,6 +426,7 @@ ofctrl_init(struct ovn_extend_table *group_table, tx_counter = rconn_packet_counter_create(); hmap_init(&installed_lflows); hmap_init(&installed_pflows); + ecmp_nexthop_init(); ovs_list_init(&flow_updates); ovn_init_symtab(&symtab); groups = group_table; @@ -877,6 +879,7 @@ ofctrl_destroy(void) expr_symtab_destroy(&symtab); shash_destroy(&symtab); ofctrl_meter_bands_destroy(); + ecmp_nexthop_destroy(); } uint64_t @@ -2664,6 +2667,7 @@ ofctrl_put(struct ovn_desired_flow_table *lflow_table, struct shash *pending_ct_zones, struct hmap *pending_lb_tuples, struct ovsdb_idl_index *sbrec_meter_by_name, + const struct sbrec_ecmp_nexthop_table *enh_table, uint64_t req_cfg, bool lflows_changed, bool pflows_changed) @@ -2704,6 +2708,8 @@ ofctrl_put(struct ovn_desired_flow_table *lflow_table, /* OpenFlow messages to send to the switch to bring it up-to-date. */ struct ovs_list msgs = OVS_LIST_INITIALIZER(&msgs); + ecmp_nexthop_monitor_run(enh_table, swconn, &msgs); + /* Iterate through ct zones that need to be flushed. */ struct shash_node *iter; SHASH_FOR_EACH(iter, pending_ct_zones) { diff --git a/controller/ofctrl.h b/controller/ofctrl.h index 129e3b6ad..33953a8a4 100644 --- a/controller/ofctrl.h +++ b/controller/ofctrl.h @@ -31,6 +31,7 @@ struct ofpbuf; struct ovsrec_bridge; struct ovsrec_open_vswitch_table; struct sbrec_meter_table; +struct sbrec_ecmp_nexthop_table; struct shash; struct ovn_desired_flow_table { @@ -59,6 +60,7 @@ void ofctrl_put(struct ovn_desired_flow_table *lflow_table, struct shash *pending_ct_zones, struct hmap *pending_lb_tuples, struct ovsdb_idl_index *sbrec_meter_by_name, + const struct sbrec_ecmp_nexthop_table *enh_table, uint64_t nb_cfg, bool lflow_changed, bool pflow_changed); diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c index c319e0d77..536fe2b69 100644 --- a/controller/ovn-controller.c +++ b/controller/ovn-controller.c @@ -5778,6 +5778,8 @@ main(int argc, char *argv[]) &ct_zones_data->ctx.pending, &lb_data->removed_tuples, sbrec_meter_by_name, + sbrec_ecmp_nexthop_table_get( + ovnsb_idl_loop.idl), ofctrl_seqno_get_req_cfg(), engine_node_changed(&en_lflow_output), engine_node_changed(&en_pflow_output)); diff --git a/include/ovn/logical-fields.h b/include/ovn/logical-fields.h index d563e044c..a024b0cd3 100644 --- a/include/ovn/logical-fields.h +++ b/include/ovn/logical-fields.h @@ -212,6 +212,9 @@ const struct ovn_field *ovn_field_from_name(const char *name); #define OVN_CT_ECMP_ETH_1ST_BIT 32 #define OVN_CT_ECMP_ETH_END_BIT 79 +#define OVN_CT_ECMP_ETH_LOW (((1ULL << OVN_CT_ECMP_ETH_1ST_BIT) - 1) << 32) +#define OVN_CT_ECMP_ETH_HIGH ((1ULL << (OVN_CT_ECMP_ETH_END_BIT - 63)) - 1) + #define OVN_CT_STR(LABEL_VALUE) OVS_STRINGIZE(LABEL_VALUE) #define OVN_CT_MASKED_STR(LABEL_VALUE) \ OVS_STRINGIZE(LABEL_VALUE) "/" OVS_STRINGIZE(LABEL_VALUE) diff --git a/tests/system-ovn.at b/tests/system-ovn.at index 6dfc3055a..27f872091 100644 --- a/tests/system-ovn.at +++ b/tests/system-ovn.at @@ -14002,3 +14002,547 @@ OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d /.*terminating with signal 15.*/d"]) AT_CLEANUP ]) + +OVN_FOR_EACH_NORTHD([ +AT_SETUP([ECMP Flush CT entries - IPv4]) +AT_KEYWORDS([ecmp]) +ovn_start +OVS_TRAFFIC_VSWITCHD_START() + +ADD_BR([br-int]) +ADD_BR([br-ext]) +ADD_BR([br-ecmp]) + +ovs-ofctl add-flow br-ext action=normal +ovs-ofctl add-flow br-ecmp action=normal +# Set external-ids in br-int needed for ovn-controller +ovs-vsctl \ + -- set Open_vSwitch . external-ids:system-id=hv1 \ + -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \ + -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \ + -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \ + -- set bridge br-int fail-mode=secure other-config:disable-in-band=true + +# Start ovn-controller +start_daemon ovn-controller +ovs-vsctl set Open_vSwitch . external-ids:arp-max-timeout-sec=1 + +check ovn-nbctl lr-add R1 +check ovn-nbctl set logical_router R1 options:chassis=hv1 +check ovn-nbctl lr-add R2 +check ovn-nbctl set logical_router R2 options:chassis=hv1 + +check ovn-nbctl ls-add sw0 +check ovn-nbctl ls-add sw1 +check ovn-nbctl ls-add public + +check ovn-nbctl lrp-add R1 rp-sw0 00:00:01:01:02:03 192.168.1.1/24 +check ovn-nbctl lrp-add R1 rp-public1 00:00:02:01:02:03 172.16.1.1/24 + +check ovn-nbctl lrp-add R2 rp-sw1 00:00:03:01:02:03 192.168.2.1/24 +check ovn-nbctl lrp-add R2 rp-public2 00:00:04:01:02:03 172.16.1.5/24 + +check ovn-nbctl lsp-add sw0 sw0-rp -- set Logical_Switch_Port sw0-rp \ + type=router options:router-port=rp-sw0 \ + -- lsp-set-addresses sw0-rp router + +check ovn-nbctl lsp-add sw1 sw1-rp -- set Logical_Switch_Port sw1-rp \ + type=router options:router-port=rp-sw1 \ + -- lsp-set-addresses sw1-rp router + +check ovn-nbctl lsp-add public public-rp1 -- set Logical_Switch_Port public-rp1 \ + type=router options:router-port=rp-public1 \ + -- lsp-set-addresses public-rp1 router + +check ovn-nbctl lsp-add public public-rp2 -- set Logical_Switch_Port public-rp2 \ + type=router options:router-port=rp-public2 \ + -- lsp-set-addresses public-rp2 router + +ADD_NAMESPACES(alice) +ADD_VETH(alice, alice, br-int, "192.168.1.2/24", "f0:00:00:01:02:03", \ + "192.168.1.1") +check ovn-nbctl lsp-add sw0 alice \ + -- lsp-set-addresses alice "f0:00:00:01:02:03 192.168.1.2" + +ADD_NAMESPACES(peter) +ADD_VETH(peter, peter, br-int, "192.168.2.2/24", "f0:00:02:01:02:03", \ + "192.168.2.1") +check ovn-nbctl lsp-add sw1 peter \ + -- lsp-set-addresses peter "f0:00:02:01:02:03 192.168.2.2" + +check ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=phynet:br-ext +check ovn-nbctl lsp-add public public1 \ + -- lsp-set-addresses public1 unknown \ + -- lsp-set-type public1 localnet \ + -- lsp-set-options public1 network_name=phynet + +ADD_NAMESPACES(ecmp-path0) +ADD_VETH(ecmp-p01, ecmp-path0, br-ext, "172.16.1.2/24", "f0:00:00:01:02:04", "172.16.1.1") +ADD_VETH(ecmp-p02, ecmp-path0, br-ecmp, "172.16.2.2/24", "f0:00:00:01:03:04") + +ADD_NAMESPACES(ecmp-path1) +ADD_VETH(ecmp-p11, ecmp-path1, br-ext, "172.16.1.3/24", "f0:00:00:01:02:05", "172.16.1.1") +ADD_VETH(ecmp-p12, ecmp-path1, br-ecmp, "172.16.2.3/24", "f0:00:00:01:03:05") + +ADD_NAMESPACES(bob) +ADD_VETH(bob, bob, br-ecmp, "172.16.2.10/24", "f0:00:00:01:02:06", "172.16.2.2") + +check ovn-nbctl --ecmp-symmetric-reply lr-route-add R1 172.16.2.0/24 172.16.1.2 +check ovn-nbctl --ecmp-symmetric-reply lr-route-add R1 172.16.2.0/24 172.16.1.3 + +wait_for_ports_up +check ovn-nbctl --wait=hv sync +NETNS_DAEMONIZE([alice], [nc -l -k 80], [alice.pid]) +NETNS_DAEMONIZE([peter], [nc -l -k 80], [peter.pid]) + +NS_CHECK_EXEC([bob], [ping -q -c 3 -i 0.3 -w 2 192.168.1.2 | FORMAT_PING], \ +[0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +NS_CHECK_EXEC([bob], [nc -z 192.168.1.2 80], [0]) + +wait_row_count ECMP_Nexthop 2 +wait_column 'f0:00:00:01:02:04' ECMP_Nexthop mac nexthop='172.16.1.2' +wait_column 'f0:00:00:01:02:05' ECMP_Nexthop mac nexthop='172.16.1.3' + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(172.16.2.10) | \ +sed -e 's/zone=[[0-9]]*/zone=/' | +sed -e 's/mark=[[0-9]]*/mark=/' | sort], [0], [dnl +icmp,orig=(src=172.16.2.10,dst=192.168.1.2,id=,type=8,code=0),reply=(src=192.168.1.2,dst=172.16.2.10,id=,type=0,code=0),zone=,mark=,labels=0xf0000001020400000000 +tcp,orig=(src=172.16.2.10,dst=192.168.1.2,sport=,dport=),reply=(src=192.168.1.2,dst=172.16.2.10,sport=,dport=),zone=,mark=,labels=0xf0000001020400000000,protoinfo=(state=) +]) + +# Change bob default IP address +NS_CHECK_EXEC([bob], [ip route del 0.0.0.0/0 via 172.16.2.2]) +NS_CHECK_EXEC([bob], [ip route add 0.0.0.0/0 via 172.16.2.3]) + +NS_CHECK_EXEC([bob], [ping -q -c 3 -i 0.3 -w 2 192.168.1.2 | FORMAT_PING], \ +[0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +NS_CHECK_EXEC([bob], [nc -z 192.168.1.2 80], [0]) + +wait_row_count ECMP_Nexthop 2 +check_column 'f0:00:00:01:02:04' ECMP_Nexthop mac nexthop='172.16.1.2' +check_column 'f0:00:00:01:02:05' ECMP_Nexthop mac nexthop='172.16.1.3' + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(172.16.2.10) | \ +sed -e 's/zone=[[0-9]]*/zone=/' | +sed -e 's/mark=[[0-9]]*/mark=/' | sort], [0], [dnl +icmp,orig=(src=172.16.2.10,dst=192.168.1.2,id=,type=8,code=0),reply=(src=192.168.1.2,dst=172.16.2.10,id=,type=0,code=0),zone=,mark=,labels=0xf0000001020400000000 +icmp,orig=(src=172.16.2.10,dst=192.168.1.2,id=,type=8,code=0),reply=(src=192.168.1.2,dst=172.16.2.10,id=,type=0,code=0),zone=,mark=,labels=0xf0000001020500000000 +tcp,orig=(src=172.16.2.10,dst=192.168.1.2,sport=,dport=),reply=(src=192.168.1.2,dst=172.16.2.10,sport=,dport=),zone=,mark=,labels=0xf0000001020400000000,protoinfo=(state=) +tcp,orig=(src=172.16.2.10,dst=192.168.1.2,sport=,dport=),reply=(src=192.168.1.2,dst=172.16.2.10,sport=,dport=),zone=,mark=,labels=0xf0000001020500000000,protoinfo=(state=) +]) + +# Remove first ECMP route +check ovn-nbctl lr-route-del R1 172.16.2.0/24 172.16.1.2 +check ovn-nbctl --wait=hv sync +wait_row_count ECMP_Nexthop 1 +check_column 'f0:00:00:01:02:05' ECMP_Nexthop mac nexthop='172.16.1.3' + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(172.16.2.10) | \ +sed -e 's/zone=[[0-9]]*/zone=/' | +sed -e 's/mark=[[0-9]]*/mark=/' | sort], [0], [dnl +icmp,orig=(src=172.16.2.10,dst=192.168.1.2,id=,type=8,code=0),reply=(src=192.168.1.2,dst=172.16.2.10,id=,type=0,code=0),zone=,mark=,labels=0xf0000001020500000000 +tcp,orig=(src=172.16.2.10,dst=192.168.1.2,sport=,dport=),reply=(src=192.168.1.2,dst=172.16.2.10,sport=,dport=),zone=,mark=,labels=0xf0000001020500000000,protoinfo=(state=) +]) + +# Add the route back and verify we do not flush if we have multiple next-hops with the same mac address +check ovn-nbctl --ecmp-symmetric-reply lr-route-add R1 172.16.2.0/24 172.16.1.2 +wait_row_count ECMP_Nexthop 2 +wait_column 'f0:00:00:01:02:04' ECMP_Nexthop mac nexthop='172.16.1.2' +wait_column 'f0:00:00:01:02:05' ECMP_Nexthop mac nexthop='172.16.1.3' + +NS_CHECK_EXEC([ecmp-path0], [ip link set dev ecmp-p01 address f0:00:00:01:02:05]) +wait_column 'f0:00:00:01:02:05' ECMP_Nexthop mac nexthop='172.16.1.2' + +# Change bob default IP address +NS_CHECK_EXEC([bob], [ip route del 0.0.0.0/0 via 172.16.2.3]) +NS_CHECK_EXEC([bob], [ip route add 0.0.0.0/0 via 172.16.2.2]) + +NS_CHECK_EXEC([bob], [ping -q -c 3 -i 0.3 -w 2 192.168.1.2 | FORMAT_PING], \ +[0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +NS_CHECK_EXEC([bob], [nc -z 192.168.1.2 80], [0]) + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(172.16.2.10) | \ +sed -e 's/zone=[[0-9]]*/zone=/' | +sed -e 's/mark=[[0-9]]*/mark=/' | sort], [0], [dnl +icmp,orig=(src=172.16.2.10,dst=192.168.1.2,id=,type=8,code=0),reply=(src=192.168.1.2,dst=172.16.2.10,id=,type=0,code=0),zone=,mark=,labels=0xf0000001020500000000 +tcp,orig=(src=172.16.2.10,dst=192.168.1.2,sport=,dport=),reply=(src=192.168.1.2,dst=172.16.2.10,sport=,dport=),zone=,mark=,labels=0xf0000001020500000000,protoinfo=(state=) +]) + +# Remove first ECMP route +check ovn-nbctl lr-route-del R1 172.16.2.0/24 172.16.1.2 +check ovn-nbctl --wait=hv sync +wait_row_count ECMP_Nexthop 1 + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(172.16.2.10) | \ +sed -e 's/zone=[[0-9]]*/zone=/' | +sed -e 's/mark=[[0-9]]*/mark=/' | sort], [0], [dnl +icmp,orig=(src=172.16.2.10,dst=192.168.1.2,id=,type=8,code=0),reply=(src=192.168.1.2,dst=172.16.2.10,id=,type=0,code=0),zone=,mark=,labels=0xf0000001020500000000 +tcp,orig=(src=172.16.2.10,dst=192.168.1.2,sport=,dport=),reply=(src=192.168.1.2,dst=172.16.2.10,sport=,dport=),zone=,mark=,labels=0xf0000001020500000000,protoinfo=(state=) +]) + +# Remove second ECMP route +check ovn-nbctl lr-route-del R1 +check ovn-nbctl --wait=hv sync +wait_row_count ECMP_Nexthop 0 + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(172.16.2.10) | \ +sed -e 's/zone=[[0-9]]*/zone=/' | +sed -e 's/mark=[[0-9]]*/mark=/' | sort], [0], [dnl +]) + +NS_CHECK_EXEC([ecmp-path0], [ip link set dev ecmp-p01 address f0:00:00:01:02:06]) + +check ovn-nbctl --ecmp-symmetric-reply lr-route-add R1 172.16.2.0/24 172.16.1.2 +check ovn-nbctl --ecmp-symmetric-reply lr-route-add R1 172.16.2.0/24 172.16.1.3 + +check ovn-nbctl --ecmp-symmetric-reply lr-route-add R2 172.16.2.0/24 172.16.1.2 +check ovn-nbctl --ecmp-symmetric-reply lr-route-add R2 172.16.2.0/24 172.16.1.3 + +wait_row_count ECMP_Nexthop 2 +wait_column 'f0:00:00:01:02:06' ECMP_Nexthop mac nexthop='172.16.1.2' +wait_column 'f0:00:00:01:02:05' ECMP_Nexthop mac nexthop='172.16.1.3' + +NS_CHECK_EXEC([ecmp-path0], [ip route add 192.168.2.2/32 via 172.16.1.5]) +NS_CHECK_EXEC([ecmp-path1], [ip route add 192.168.2.2/32 via 172.16.1.5]) + +NS_CHECK_EXEC([bob], [ping -q -c 3 -i 0.3 -w 2 192.168.1.2 | FORMAT_PING], \ +[0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +NS_CHECK_EXEC([bob], [nc -z 192.168.1.2 80], [0]) + +NS_CHECK_EXEC([bob], [ping -q -c 3 -i 0.3 -w 2 192.168.2.2 | FORMAT_PING], \ +[0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +NS_CHECK_EXEC([bob], [nc -z 192.168.2.2 80], [0]) + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(172.16.2.10) | \ +sed -e 's/zone=[[0-9]]*/zone=/' | +sed -e 's/mark=[[0-9]]*/mark=/' | sort], [0], [dnl +icmp,orig=(src=172.16.2.10,dst=192.168.1.2,id=,type=8,code=0),reply=(src=192.168.1.2,dst=172.16.2.10,id=,type=0,code=0),zone=,mark=,labels=0xf0000001020600000000 +icmp,orig=(src=172.16.2.10,dst=192.168.2.2,id=,type=8,code=0),reply=(src=192.168.2.2,dst=172.16.2.10,id=,type=0,code=0),zone=,mark=,labels=0xf0000001020600000000 +tcp,orig=(src=172.16.2.10,dst=192.168.1.2,sport=,dport=),reply=(src=192.168.1.2,dst=172.16.2.10,sport=,dport=),zone=,mark=,labels=0xf0000001020600000000,protoinfo=(state=) +tcp,orig=(src=172.16.2.10,dst=192.168.2.2,sport=,dport=),reply=(src=192.168.2.2,dst=172.16.2.10,sport=,dport=),zone=,mark=,labels=0xf0000001020600000000,protoinfo=(state=) +]) + +check ovn-nbctl lr-route-del R1 +check ovn-nbctl --wait=hv sync +wait_row_count ECMP_Nexthop 2 +wait_column 'f0:00:00:01:02:06' ECMP_Nexthop mac nexthop='172.16.1.2' +wait_column 'f0:00:00:01:02:05' ECMP_Nexthop mac nexthop='172.16.1.3' + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(172.16.2.10) | \ +sed -e 's/zone=[[0-9]]*/zone=/' | +sed -e 's/mark=[[0-9]]*/mark=/' | sort], [0], [dnl +icmp,orig=(src=172.16.2.10,dst=192.168.1.2,id=,type=8,code=0),reply=(src=192.168.1.2,dst=172.16.2.10,id=,type=0,code=0),zone=,mark=,labels=0xf0000001020600000000 +icmp,orig=(src=172.16.2.10,dst=192.168.2.2,id=,type=8,code=0),reply=(src=192.168.2.2,dst=172.16.2.10,id=,type=0,code=0),zone=,mark=,labels=0xf0000001020600000000 +tcp,orig=(src=172.16.2.10,dst=192.168.1.2,sport=,dport=),reply=(src=192.168.1.2,dst=172.16.2.10,sport=,dport=),zone=,mark=,labels=0xf0000001020600000000,protoinfo=(state=) +tcp,orig=(src=172.16.2.10,dst=192.168.2.2,sport=,dport=),reply=(src=192.168.2.2,dst=172.16.2.10,sport=,dport=),zone=,mark=,labels=0xf0000001020600000000,protoinfo=(state=) +]) + +check ovn-nbctl lr-route-del R2 +check ovn-nbctl --wait=hv sync +wait_row_count ECMP_Nexthop 0 +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(172.16.2.10) | \ +sed -e 's/zone=[[0-9]]*/zone=/' | +sed -e 's/mark=[[0-9]]*/mark=/' | sort], [0], [dnl +]) + +OVS_APP_EXIT_AND_WAIT([ovn-controller]) + +as ovn-sb +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) + +as ovn-nb +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) + +as northd +OVS_APP_EXIT_AND_WAIT([ovn-northd]) + +as +OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d +/.*terminating with signal 15.*/d"]) +AT_CLEANUP +]) + +OVN_FOR_EACH_NORTHD([ +AT_SETUP([ECMP Flush CT entries - IPv6]) +AT_KEYWORDS([ecmp]) +ovn_start +OVS_TRAFFIC_VSWITCHD_START() + +ADD_BR([br-int]) +ADD_BR([br-ext]) +ADD_BR([br-ecmp]) + +ovs-ofctl add-flow br-ext action=normal +ovs-ofctl add-flow br-ecmp action=normal +# Set external-ids in br-int needed for ovn-controller +ovs-vsctl \ + -- set Open_vSwitch . external-ids:system-id=hv1 \ + -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \ + -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \ + -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \ + -- set bridge br-int fail-mode=secure other-config:disable-in-band=true + +# Start ovn-controller +start_daemon ovn-controller +ovs-vsctl set Open_vSwitch . external-ids:arp-max-timeout-sec=1 + +check ovn-nbctl lr-add R1 +check ovn-nbctl set logical_router R1 options:chassis=hv1 +check ovn-nbctl lr-add R2 +check ovn-nbctl set logical_router R2 options:chassis=hv1 + +check ovn-nbctl ls-add sw0 +check ovn-nbctl ls-add sw1 +check ovn-nbctl ls-add public + +check ovn-nbctl lrp-add R1 rp-sw0 00:00:01:01:02:03 fd11::1/64 +check ovn-nbctl lrp-add R1 rp-public1 00:00:02:01:02:03 fd12::1/64 + +check ovn-nbctl lrp-add R2 rp-sw1 00:00:03:01:02:03 fd14::1/64 +check ovn-nbctl lrp-add R2 rp-public2 00:00:04:01:02:03 fd12::5/64 + +check ovn-nbctl lsp-add sw0 sw0-rp -- set Logical_Switch_Port sw0-rp \ + type=router options:router-port=rp-sw0 \ + -- lsp-set-addresses sw0-rp router + +check ovn-nbctl lsp-add sw1 sw1-rp -- set Logical_Switch_Port sw1-rp \ + type=router options:router-port=rp-sw1 \ + -- lsp-set-addresses sw1-rp router + +check ovn-nbctl lsp-add public public-rp1 -- set Logical_Switch_Port public-rp1 \ + type=router options:router-port=rp-public1 \ + -- lsp-set-addresses public-rp1 router + +check ovn-nbctl lsp-add public public-rp2 -- set Logical_Switch_Port public-rp2 \ + type=router options:router-port=rp-public2 \ + -- lsp-set-addresses public-rp2 router + +ADD_NAMESPACES(alice) +ADD_VETH(alice, alice, br-int, "fd11::2/64", "f0:00:00:01:02:03", "fd11::1", "nodad") +check ovn-nbctl lsp-add sw0 alice -- lsp-set-addresses alice "f0:00:00:01:02:03 fd11::2" + +ADD_NAMESPACES(peter) +ADD_VETH(peter, peter, br-int, "fd14::2/64", "f0:00:02:01:02:03", "fd14::1", "nodad") +check ovn-nbctl lsp-add sw1 peter -- lsp-set-addresses peter "f0:00:02:01:02:03 fd14::2" + +check ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=phynet:br-ext +check ovn-nbctl lsp-add public public1 \ + -- lsp-set-addresses public1 unknown \ + -- lsp-set-type public1 localnet \ + -- lsp-set-options public1 network_name=phynet + +ADD_NAMESPACES(ecmp-path0) +ADD_VETH(ecmp-p01, ecmp-path0, br-ext, "fd12::2/64", "f0:00:00:01:02:04", "fd12::1", "nodad") +ADD_VETH(ecmp-p02, ecmp-path0, br-ecmp, "fd13::2/64", "f0:00:00:01:03:04") +OVS_WAIT_UNTIL([NS_EXEC([ecmp-path0], [ip a show dev ecmp-p02 | grep "fe80::" | grep -v tentative])]) + +ADD_NAMESPACES(ecmp-path1) +ADD_VETH(ecmp-p11, ecmp-path1, br-ext, "fd12::3/64", "f0:00:00:01:02:05", "fd12::1", "nodad") +ADD_VETH(ecmp-p12, ecmp-path1, br-ecmp, "fd13::3/64", "f0:00:00:01:03:05") +OVS_WAIT_UNTIL([NS_EXEC([ecmp-path1], [ip a show dev ecmp-p12 | grep "fe80::" | grep -v tentative])]) + +ADD_NAMESPACES(bob) +ADD_VETH(bob, bob, br-ecmp, "fd13::a/64", "f0:00:00:01:02:06", "fd13::2", "nodad") + +check ovn-nbctl --ecmp-symmetric-reply lr-route-add R1 fd13::/64 fd12::2 +check ovn-nbctl --ecmp-symmetric-reply lr-route-add R1 fd13::/64 fd12::3 + +NS_CHECK_EXEC([ecmp-path0], [sysctl -w net.ipv6.conf.all.forwarding=1],[0], [dnl +net.ipv6.conf.all.forwarding = 1 +]) +NS_CHECK_EXEC([ecmp-path1], [sysctl -w net.ipv6.conf.all.forwarding=1],[0], [dnl +net.ipv6.conf.all.forwarding = 1 +]) + +ovn-nbctl --wait=hv sync +NETNS_DAEMONIZE([alice], [nc -6 -l -k 80], [alice.pid]) +NETNS_DAEMONIZE([peter], [nc -6 -l -k 80], [peter.pid]) + +NS_CHECK_EXEC([bob], [ping6 -q -c 3 -i 0.3 -w 2 fd11::2 | FORMAT_PING], \ +[0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +NS_CHECK_EXEC([bob], [nc -6 -z fd11::2 80], [0]) + +wait_row_count ECMP_Nexthop 2 +wait_column 'f0:00:00:01:02:04' ECMP_Nexthop mac nexthop='"fd12::2"' +wait_column 'f0:00:00:01:02:05' ECMP_Nexthop mac nexthop='"fd12::3"' + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fd13::a) | \ +sed -e 's/zone=[[0-9]]*/zone=/' | +sed -e 's/mark=[[0-9]]*/mark=/' | sort], [0], [dnl +icmpv6,orig=(src=fd13::a,dst=fd11::2,id=,type=128,code=0),reply=(src=fd11::2,dst=fd13::a,id=,type=129,code=0),zone=,mark=,labels=0xf0000001020400000000 +tcp,orig=(src=fd13::a,dst=fd11::2,sport=,dport=),reply=(src=fd11::2,dst=fd13::a,sport=,dport=),zone=,mark=,labels=0xf0000001020400000000,protoinfo=(state=) +]) + +# Change bob default IP address +NS_CHECK_EXEC([bob], [ip -6 route del ::/0 via fd13::2]) +NS_CHECK_EXEC([bob], [ip -6 route add ::/0 via fd13::3]) + +NS_CHECK_EXEC([bob], [ping -6 -q -c 3 -i 0.3 -w 2 fd11::2 | FORMAT_PING], \ +[0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +NS_CHECK_EXEC([bob], [nc -6 -z fd11::2 80], [0]) + +wait_row_count ECMP_Nexthop 2 +check_column 'f0:00:00:01:02:04' ECMP_Nexthop mac nexthop='"fd12::2"' +check_column 'f0:00:00:01:02:05' ECMP_Nexthop mac nexthop='"fd12::3"' + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fd13::a) | \ +sed -e 's/zone=[[0-9]]*/zone=/' | +sed -e 's/mark=[[0-9]]*/mark=/' | sort], [0], [dnl +icmpv6,orig=(src=fd13::a,dst=fd11::2,id=,type=128,code=0),reply=(src=fd11::2,dst=fd13::a,id=,type=129,code=0),zone=,mark=,labels=0xf0000001020400000000 +icmpv6,orig=(src=fd13::a,dst=fd11::2,id=,type=128,code=0),reply=(src=fd11::2,dst=fd13::a,id=,type=129,code=0),zone=,mark=,labels=0xf0000001020500000000 +tcp,orig=(src=fd13::a,dst=fd11::2,sport=,dport=),reply=(src=fd11::2,dst=fd13::a,sport=,dport=),zone=,mark=,labels=0xf0000001020400000000,protoinfo=(state=) +tcp,orig=(src=fd13::a,dst=fd11::2,sport=,dport=),reply=(src=fd11::2,dst=fd13::a,sport=,dport=),zone=,mark=,labels=0xf0000001020500000000,protoinfo=(state=) +]) + +# Remove first ECMP route +check ovn-nbctl lr-route-del R1 fd13::/64 fd12::2 +check ovn-nbctl --wait=hv sync +wait_row_count ECMP_Nexthop 1 +check_column 'f0:00:00:01:02:05' ECMP_Nexthop mac nexthop='"fd12::3"' + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fd13::a) | \ +sed -e 's/zone=[[0-9]]*/zone=/' | +sed -e 's/mark=[[0-9]]*/mark=/' | sort], [0], [dnl +icmpv6,orig=(src=fd13::a,dst=fd11::2,id=,type=128,code=0),reply=(src=fd11::2,dst=fd13::a,id=,type=129,code=0),zone=,mark=,labels=0xf0000001020500000000 +tcp,orig=(src=fd13::a,dst=fd11::2,sport=,dport=),reply=(src=fd11::2,dst=fd13::a,sport=,dport=),zone=,mark=,labels=0xf0000001020500000000,protoinfo=(state=) +]) + + Add the route back and verify we do not flush if we have multiple next-hops with the same mac address +check ovn-nbctl --ecmp-symmetric-reply lr-route-add R1 fd13::/64 fd12::2 +wait_row_count ECMP_Nexthop 2 +wait_column 'f0:00:00:01:02:04' ECMP_Nexthop mac nexthop='"fd12::2"' +wait_column 'f0:00:00:01:02:05' ECMP_Nexthop mac nexthop='"fd12::3"' +# +NS_CHECK_EXEC([ecmp-path0], [ip link set dev ecmp-p01 address f0:00:00:01:02:05]) +wait_column 'f0:00:00:01:02:05' ECMP_Nexthop mac nexthop='"fd12::2"' + +# Change bob default IP address +NS_CHECK_EXEC([bob], [ip -6 route del ::/0 via fd13::3]) +NS_CHECK_EXEC([bob], [ip -6 route add ::/0 via fd13::2]) + +NS_CHECK_EXEC([bob], [ping -6 -q -c 3 -i 0.3 -w 2 fd11::2 | FORMAT_PING], \ +[0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +NS_CHECK_EXEC([bob], [nc -6 -z fd11::2 80], [0]) + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fd13::a) | \ +sed -e 's/zone=[[0-9]]*/zone=/' | +sed -e 's/mark=[[0-9]]*/mark=/' | sort], [0], [dnl +icmpv6,orig=(src=fd13::a,dst=fd11::2,id=,type=128,code=0),reply=(src=fd11::2,dst=fd13::a,id=,type=129,code=0),zone=,mark=,labels=0xf0000001020500000000 +tcp,orig=(src=fd13::a,dst=fd11::2,sport=,dport=),reply=(src=fd11::2,dst=fd13::a,sport=,dport=),zone=,mark=,labels=0xf0000001020500000000,protoinfo=(state=) +]) + +# Remove first ECMP route +check ovn-nbctl lr-route-del R1 fd13::/64 fd12::2 +check ovn-nbctl --wait=hv sync +wait_row_count ECMP_Nexthop 1 + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fd13::a) | \ +sed -e 's/zone=[[0-9]]*/zone=/' | +sed -e 's/mark=[[0-9]]*/mark=/' | sort], [0], [dnl +icmpv6,orig=(src=fd13::a,dst=fd11::2,id=,type=128,code=0),reply=(src=fd11::2,dst=fd13::a,id=,type=129,code=0),zone=,mark=,labels=0xf0000001020500000000 +tcp,orig=(src=fd13::a,dst=fd11::2,sport=,dport=),reply=(src=fd11::2,dst=fd13::a,sport=,dport=),zone=,mark=,labels=0xf0000001020500000000,protoinfo=(state=) +]) + +# Remove second ECMP route +check ovn-nbctl lr-route-del R1 +check ovn-nbctl --wait=hv sync +wait_row_count ECMP_Nexthop 0 + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fd13::a) | \ +sed -e 's/zone=[[0-9]]*/zone=/' | +sed -e 's/mark=[[0-9]]*/mark=/' | sort], [0], [dnl +]) + +NS_CHECK_EXEC([ecmp-path0], [ip link set dev ecmp-p01 address f0:00:00:01:02:06]) + +check ovn-nbctl --ecmp-symmetric-reply lr-route-add R1 fd13::/64 fd12::2 +check ovn-nbctl --ecmp-symmetric-reply lr-route-add R1 fd13::/64 fd12::3 + +check ovn-nbctl --ecmp-symmetric-reply lr-route-add R2 fd13::/64 fd12::2 +check ovn-nbctl --ecmp-symmetric-reply lr-route-add R2 fd13::/64 fd12::3 + +wait_row_count ECMP_Nexthop 2 +wait_column 'f0:00:00:01:02:06' ECMP_Nexthop mac nexthop='"fd12::2"' +wait_column 'f0:00:00:01:02:05' ECMP_Nexthop mac nexthop='"fd12::3"' + +NS_CHECK_EXEC([ecmp-path0], [ip route add fd14::2/128 via fd12::5]) +NS_CHECK_EXEC([ecmp-path1], [ip route add fd14::2/128 via fd12::5]) + +NS_CHECK_EXEC([bob], [ping -6 -q -c 3 -i 0.3 -w 2 fd11::2 | FORMAT_PING], \ +[0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +NS_CHECK_EXEC([bob], [nc -6 -z fd11::2 80], [0]) + +NS_CHECK_EXEC([bob], [ping -6 -q -c 3 -i 0.3 -w 2 fd14::2 | FORMAT_PING], \ +[0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +NS_CHECK_EXEC([bob], [nc -6 -z fd14::2 80], [0]) + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fd13::a) | \ +sed -e 's/zone=[[0-9]]*/zone=/' | +sed -e 's/mark=[[0-9]]*/mark=/' | sort], [0], [dnl +icmpv6,orig=(src=fd13::a,dst=fd11::2,id=,type=128,code=0),reply=(src=fd11::2,dst=fd13::a,id=,type=129,code=0),zone=,mark=,labels=0xf0000001020600000000 +icmpv6,orig=(src=fd13::a,dst=fd14::2,id=,type=128,code=0),reply=(src=fd14::2,dst=fd13::a,id=,type=129,code=0),zone=,mark=,labels=0xf0000001020600000000 +tcp,orig=(src=fd13::a,dst=fd11::2,sport=,dport=),reply=(src=fd11::2,dst=fd13::a,sport=,dport=),zone=,mark=,labels=0xf0000001020600000000,protoinfo=(state=) +tcp,orig=(src=fd13::a,dst=fd14::2,sport=,dport=),reply=(src=fd14::2,dst=fd13::a,sport=,dport=),zone=,mark=,labels=0xf0000001020600000000,protoinfo=(state=) +]) + +# Remove second ECMP route +check ovn-nbctl lr-route-del R1 +check ovn-nbctl --wait=hv sync +wait_row_count ECMP_Nexthop 2 +wait_column 'f0:00:00:01:02:06' ECMP_Nexthop mac nexthop='"fd12::2"' +wait_column 'f0:00:00:01:02:05' ECMP_Nexthop mac nexthop='"fd12::3"' + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fd13::a) | \ +sed -e 's/zone=[[0-9]]*/zone=/' | +sed -e 's/mark=[[0-9]]*/mark=/' | sort], [0], [dnl +icmpv6,orig=(src=fd13::a,dst=fd11::2,id=,type=128,code=0),reply=(src=fd11::2,dst=fd13::a,id=,type=129,code=0),zone=,mark=,labels=0xf0000001020600000000 +icmpv6,orig=(src=fd13::a,dst=fd14::2,id=,type=128,code=0),reply=(src=fd14::2,dst=fd13::a,id=,type=129,code=0),zone=,mark=,labels=0xf0000001020600000000 +tcp,orig=(src=fd13::a,dst=fd11::2,sport=,dport=),reply=(src=fd11::2,dst=fd13::a,sport=,dport=),zone=,mark=,labels=0xf0000001020600000000,protoinfo=(state=) +tcp,orig=(src=fd13::a,dst=fd14::2,sport=,dport=),reply=(src=fd14::2,dst=fd13::a,sport=,dport=),zone=,mark=,labels=0xf0000001020600000000,protoinfo=(state=) +]) + +check ovn-nbctl lr-route-del R2 +check ovn-nbctl --wait=hv sync +wait_row_count ECMP_Nexthop 0 +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(172.16.2.10) | \ +sed -e 's/zone=[[0-9]]*/zone=/' | +sed -e 's/mark=[[0-9]]*/mark=/' | sort], [0], [dnl +]) + +OVS_APP_EXIT_AND_WAIT([ovn-controller]) + +as ovn-sb +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) + +as ovn-nb +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) + +as northd +OVS_APP_EXIT_AND_WAIT([ovn-northd]) + +as +OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d +/.*terminating with signal 15.*/d"]) +AT_CLEANUP +])