From patchwork Wed Nov 13 11:04:57 2024
X-Patchwork-Submitter: Lorenzo Bianconi
X-Patchwork-Id: 2010708
X-Patchwork-Delegate: dceara@redhat.com
From: Lorenzo Bianconi
To: ovs-dev@openvswitch.org
Date: Wed, 13 Nov 2024 12:04:57 +0100
Message-ID: <78017c6fdb5acfb516405e6486b852f26a7754cd.1731495611.git.lorenzo.bianconi@redhat.com>
Subject: [ovs-dev] [PATCH ovn 1/4] northd: Introduce ECMP_Nexthop table in SB db.
Cc: dceara@redhat.com

Introduce the ECMP_Nexthop table in the SB db in order to track active ecmp-symmetric-reply connections and flush stale ones. The ECMP_Nexthop table contains the IP and MAC address of each active nexthop.
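The way northd keeps the table in sync with the configured routes can be illustrated with a small mark-and-sweep sketch. This is not the code in the patch: build_ecmp_nexthop_table() below works on an hmap and the SB IDL, while this sketch uses a fixed-size array so it stays self-contained. All names in it (nh_row, nh_table, nh_key, nh_sync and the helpers) are hypothetical and exist only for illustration.

```c
/* Sketch only: a plain-array stand-in for the SB ECMP_Nexthop table.
 * Every identifier here is hypothetical; the patch itself uses an hmap
 * and the sbrec_ecmp_nexthop_* IDL helpers instead. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NH_MAX_ROWS 16
#define NH_STR_LEN 64

struct nh_row {
    char nexthop[NH_STR_LEN];   /* Nexthop IP address, as a string. */
    char port[NH_STR_LEN];      /* Logical port name. */
    bool stale;                 /* Set during sync, cleared if still wanted. */
};

struct nh_table {
    struct nh_row rows[NH_MAX_ROWS];
    size_t n_rows;
};

struct nh_key {
    const char *nexthop;
    const char *port;
};

/* Returns the row matching ('nexthop', 'port'), or NULL if there is none. */
static struct nh_row *
nh_find(struct nh_table *t, const char *nexthop, const char *port)
{
    for (size_t i = 0; i < t->n_rows; i++) {
        if (!strcmp(t->rows[i].nexthop, nexthop)
            && !strcmp(t->rows[i].port, port)) {
            return &t->rows[i];
        }
    }
    return NULL;
}

/* Appends a new row.  The caller must leave room in the table. */
static struct nh_row *
nh_insert(struct nh_table *t, const char *nexthop, const char *port)
{
    assert(t->n_rows < NH_MAX_ROWS);
    assert(strlen(nexthop) < NH_STR_LEN && strlen(port) < NH_STR_LEN);

    struct nh_row *r = &t->rows[t->n_rows++];
    strcpy(r->nexthop, nexthop);
    strcpy(r->port, port);
    r->stale = false;
    return r;
}

/* Makes 't' contain exactly the 'wanted' pairs.
 * Returns the number of rows that were removed as stale. */
static size_t
nh_sync(struct nh_table *t, const struct nh_key *wanted, size_t n_wanted)
{
    /* 1. Mark: assume every existing row is stale. */
    for (size_t i = 0; i < t->n_rows; i++) {
        t->rows[i].stale = true;
    }

    /* 2. Each wanted pair either refreshes a row or inserts a new one. */
    for (size_t i = 0; i < n_wanted; i++) {
        struct nh_row *r = nh_find(t, wanted[i].nexthop, wanted[i].port);
        if (!r) {
            r = nh_insert(t, wanted[i].nexthop, wanted[i].port);
        }
        r->stale = false;
    }

    /* 3. Sweep: compact the array, dropping rows that are still stale. */
    size_t kept = 0;
    for (size_t i = 0; i < t->n_rows; i++) {
        if (!t->rows[i].stale) {
            t->rows[kept++] = t->rows[i];
        }
    }
    size_t deleted = t->n_rows - kept;
    t->n_rows = kept;
    return deleted;
}
```

Calling nh_sync() with the (nexthop, port) pairs of the current ecmp-symmetric-reply routes leaves one row per pair, and calling it with an empty list removes every row. The tests in this patch check the same behaviour on the real table with check_row_count ECMP_Nexthop.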
Signed-off-by: Lorenzo Bianconi --- northd/en-northd.c | 29 +++++++++++ northd/en-northd.h | 4 ++ northd/inc-proc-northd.c | 9 +++- northd/northd.c | 104 ++++++++++++++++++++++++++++++++++++++- northd/northd.h | 6 +++ ovn-sb.ovsschema | 17 ++++++- ovn-sb.xml | 31 ++++++++++++ tests/ovn-northd.at | 33 ++++++++++--- 8 files changed, 222 insertions(+), 11 deletions(-) diff --git a/northd/en-northd.c b/northd/en-northd.c index 24ed31517..165af44a0 100644 --- a/northd/en-northd.c +++ b/northd/en-northd.c @@ -404,6 +404,23 @@ en_bfd_sync_run(struct engine_node *node, void *data) engine_set_node_state(node, EN_UPDATED); } +void +en_ecmp_nexthop_run(struct engine_node *node, void *data OVS_UNUSED) +{ + const struct engine_context *eng_ctx = engine_get_context(); + struct northd_data *northd_data = engine_get_input_data("northd", node); + struct static_routes_data *static_routes_data = + engine_get_input_data("static_routes", node); + const struct sbrec_ecmp_nexthop_table *sbrec_ecmp_nexthop_table = + EN_OVSDB_GET(engine_get_input("SB_ecmp_nexthop", node)); + + build_ecmp_nexthop_table(eng_ctx->ovnsb_idl_txn, + &northd_data->lr_ports, + &static_routes_data->parsed_routes, + sbrec_ecmp_nexthop_table); + engine_set_node_state(node, EN_UPDATED); +} + void *en_northd_init(struct engine_node *node OVS_UNUSED, struct engine_arg *arg OVS_UNUSED) @@ -454,6 +471,13 @@ void return data; } +void * +en_ecmp_nexthop_init(struct engine_node *node OVS_UNUSED, + struct engine_arg *arg OVS_UNUSED) +{ + return NULL; +} + void en_northd_cleanup(void *data) { @@ -526,3 +550,8 @@ en_bfd_sync_cleanup(void *data) { bfd_sync_destroy(data); } + +void +en_ecmp_nexthop_cleanup(void *data OVS_UNUSED) +{ +} diff --git a/northd/en-northd.h b/northd/en-northd.h index 631a7c17a..2666cc67e 100644 --- a/northd/en-northd.h +++ b/northd/en-northd.h @@ -42,5 +42,9 @@ bool bfd_sync_northd_change_handler(struct engine_node *node, void *data OVS_UNUSED); void en_bfd_sync_run(struct engine_node *node, void *data); 
void en_bfd_sync_cleanup(void *data OVS_UNUSED); +void en_ecmp_nexthop_run(struct engine_node *node, void *data); +void *en_ecmp_nexthop_init(struct engine_node *node OVS_UNUSED, + struct engine_arg *arg OVS_UNUSED); +void en_ecmp_nexthop_cleanup(void *data); #endif /* EN_NORTHD_H */ diff --git a/northd/inc-proc-northd.c b/northd/inc-proc-northd.c index 8c834facb..8e16fde80 100644 --- a/northd/inc-proc-northd.c +++ b/northd/inc-proc-northd.c @@ -103,7 +103,8 @@ static unixctl_cb_func chassis_features_list; SB_NODE(fdb, "fdb") \ SB_NODE(static_mac_binding, "static_mac_binding") \ SB_NODE(chassis_template_var, "chassis_template_var") \ - SB_NODE(logical_dp_group, "logical_dp_group") + SB_NODE(logical_dp_group, "logical_dp_group") \ + SB_NODE(ecmp_nexthop, "ecmp_nexthop") enum sb_engine_node { #define SB_NODE(NAME, NAME_STR) SB_##NAME, @@ -162,6 +163,7 @@ static ENGINE_NODE(route_policies, "route_policies"); static ENGINE_NODE(static_routes, "static_routes"); static ENGINE_NODE(bfd, "bfd"); static ENGINE_NODE(bfd_sync, "bfd_sync"); +static ENGINE_NODE(ecmp_nexthop, "ecmp_nexthop"); void inc_proc_northd_init(struct ovsdb_idl_loop *nb, struct ovsdb_idl_loop *sb) @@ -264,6 +266,10 @@ void inc_proc_northd_init(struct ovsdb_idl_loop *nb, engine_add_input(&en_bfd_sync, &en_route_policies, NULL); engine_add_input(&en_bfd_sync, &en_northd, bfd_sync_northd_change_handler); + engine_add_input(&en_ecmp_nexthop, &en_sb_ecmp_nexthop, NULL); + engine_add_input(&en_ecmp_nexthop, &en_northd, NULL); + engine_add_input(&en_ecmp_nexthop, &en_static_routes, NULL); + engine_add_input(&en_sync_meters, &en_nb_acl, NULL); engine_add_input(&en_sync_meters, &en_nb_meter, NULL); engine_add_input(&en_sync_meters, &en_sb_meter, NULL); @@ -334,6 +340,7 @@ void inc_proc_northd_init(struct ovsdb_idl_loop *nb, engine_add_input(&en_sync_from_sb, &en_sb_port_binding, NULL); engine_add_input(&en_sync_from_sb, &en_sb_ha_chassis_group, NULL); + engine_add_input(&en_northd_output, &en_ecmp_nexthop, NULL); 
engine_add_input(&en_northd_output, &en_sync_from_sb, NULL); engine_add_input(&en_northd_output, &en_sync_to_sb, northd_output_sync_to_sb_handler); diff --git a/northd/northd.c b/northd/northd.c index 64b2e3859..d54fbf14e 100644 --- a/northd/northd.c +++ b/northd/northd.c @@ -10720,6 +10720,106 @@ build_bfd_map(const struct nbrec_bfd_table *nbrec_bfd_table, } } +struct ecmp_nexthop_data { + struct hmap_node hmap_node; + const struct sbrec_ecmp_nexthop *sb_ecmp_nh; + bool stale; +}; + +static struct ecmp_nexthop_data * +ecmp_nexthop_alloc_entry(const struct sbrec_ecmp_nexthop *sb_ecmp_nh, + struct hmap *map) +{ + struct ecmp_nexthop_data *e = xmalloc(sizeof *e); + e->sb_ecmp_nh = sb_ecmp_nh; + + const char *sb_port = sb_ecmp_nh->port->logical_port; + const char *sb_nexthop = sb_ecmp_nh->nexthop; + + uint32_t hash = hash_string(sb_nexthop, 0); + hash = hash_add(hash, hash_string(sb_port, 0)); + hmap_insert(map, &e->hmap_node, hash); + + return e; +} + +static struct ecmp_nexthop_data * +ecmp_nexthop_find_entry(const char *nexthop, const char *port, + struct hmap *map) +{ + uint32_t hash = hash_string(nexthop, 0); + hash = hash_add(hash, hash_string(port, 0)); + + struct ecmp_nexthop_data *e; + HMAP_FOR_EACH_WITH_HASH (e, hmap_node, hash, map) { + const char *sb_port = e->sb_ecmp_nh->port->logical_port; + const char *sb_nexthop = e->sb_ecmp_nh->nexthop; + if (!strcmp(sb_nexthop, nexthop) && !strcmp(sb_port, port)) { + return e; + } + } + return NULL; +} + +void +build_ecmp_nexthop_table( + struct ovsdb_idl_txn *ovnsb_txn, + const struct hmap *lr_ports, const struct hmap *routes, + const struct sbrec_ecmp_nexthop_table *sbrec_ecmp_nexthop_table) +{ + if (!ovnsb_txn) { + return; + } + + struct hmap sb_nexthops_map = HMAP_INITIALIZER(&sb_nexthops_map); + + const struct sbrec_ecmp_nexthop *sb_ecmp_nexthop; + SBREC_ECMP_NEXTHOP_TABLE_FOR_EACH (sb_ecmp_nexthop, + sbrec_ecmp_nexthop_table) { + struct ecmp_nexthop_data *e = ecmp_nexthop_alloc_entry( + sb_ecmp_nexthop, 
&sb_nexthops_map); + e->stale = true; + } + + struct parsed_route *pr; + HMAP_FOR_EACH (pr, key_node, routes) { + if (!pr->ecmp_symmetric_reply) { + continue; + } + + if (!pr->out_port) { + continue; + } + + struct ovn_port *out_port = ovn_port_find(lr_ports, pr->out_port->key); + if (!out_port || !out_port->sb) { + continue; + } + + const struct nbrec_logical_router_static_route *r = pr->route; + const char *pb_name = out_port->sb->logical_port; + + struct ecmp_nexthop_data *e = ecmp_nexthop_find_entry( + r->nexthop, pb_name, &sb_nexthops_map); + if (!e) { + sb_ecmp_nexthop = sbrec_ecmp_nexthop_insert(ovnsb_txn); + sbrec_ecmp_nexthop_set_nexthop(sb_ecmp_nexthop, r->nexthop); + sbrec_ecmp_nexthop_set_port(sb_ecmp_nexthop, out_port->sb); + e = ecmp_nexthop_alloc_entry(sb_ecmp_nexthop, &sb_nexthops_map); + } + e->stale = false; + } + + struct ecmp_nexthop_data *e; + HMAP_FOR_EACH_POP (e, hmap_node, &sb_nexthops_map) { + if (e->stale) { + sbrec_ecmp_nexthop_delete(e->sb_ecmp_nh); + } + free(e); + } + hmap_destroy(&sb_nexthops_map); +} + /* Returns a string of the IP address of the router port 'op' that * overlaps with 'ip_s". If one is not found, returns NULL. * @@ -11160,10 +11260,11 @@ parsed_routes_add(struct ovn_datapath *od, const struct hmap *lr_ports, } /* Verify that ip_prefix and nexthop are on the same network. 
*/ + struct ovn_port *out_port = NULL; if (!is_discard_route && !find_static_route_outport(od, lr_ports, route, IN6_IS_ADDR_V4MAPPED(&prefix), - NULL, NULL)) { + NULL, &out_port)) { return; } @@ -11206,6 +11307,7 @@ parsed_routes_add(struct ovn_datapath *od, const struct hmap *lr_ports, new_pr->hash = route_hash(new_pr); new_pr->route = route; new_pr->nbr = od->nbr; + new_pr->out_port = out_port; new_pr->ecmp_symmetric_reply = smap_get_bool(&route->options, "ecmp_symmetric_reply", false); diff --git a/northd/northd.h b/northd/northd.h index c1442ff40..3bd2e29e3 100644 --- a/northd/northd.h +++ b/northd/northd.h @@ -703,6 +703,7 @@ struct parsed_route { uint32_t route_table_id; uint32_t hash; const struct nbrec_logical_router_static_route *route; + struct ovn_port *out_port; bool ecmp_symmetric_reply; bool is_discard_route; const struct nbrec_logical_router *nbr; @@ -746,6 +747,11 @@ void bfd_destroy(struct bfd_data *); void bfd_sync_init(struct bfd_sync_data *); void bfd_sync_destroy(struct bfd_sync_data *); +void build_ecmp_nexthop_table( + struct ovsdb_idl_txn *ovnsb_txn, + const struct hmap *lr_ports, const struct hmap *routes, + const struct sbrec_ecmp_nexthop_table *sbrec_ecmp_nexthop_table); + struct lflow_table; struct lr_stateful_tracked_data; struct ls_stateful_tracked_data; diff --git a/ovn-sb.ovsschema b/ovn-sb.ovsschema index 73abf2c8d..864cb0ed6 100644 --- a/ovn-sb.ovsschema +++ b/ovn-sb.ovsschema @@ -1,7 +1,7 @@ { "name": "OVN_Southbound", - "version": "20.37.0", - "cksum": "1950136776 31493", + "version": "20.38.0", + "cksum": "466210938 32119", "tables": { "SB_Global": { "columns": { @@ -610,6 +610,19 @@ "refTable": "Datapath_Binding"}}}}, "indexes": [["logical_port", "ip"]], "isRoot": true}, + "ECMP_Nexthop": { + "columns": { + "nexthop": {"type": "string"}, + "port": {"type": {"key": {"type": "uuid", + "refTable": "Port_Binding", + "refType": "strong"}, + "min": 0, "max": 1}}, + "mac": {"type": "string"}, + "external_ids": { + "type": {"key": 
"string", "value": "string", + "min": 0, "max": "unlimited"}}}, + "indexes": [["nexthop", "port"]], + "isRoot": true}, "Chassis_Template_Var": { "columns": { "chassis": {"type": "string"}, diff --git a/ovn-sb.xml b/ovn-sb.xml index ea4adc1c3..ea1d484a7 100644 --- a/ovn-sb.xml +++ b/ovn-sb.xml @@ -5217,4 +5217,35 @@ tcp.flags = RST; The set of variable values for a given chassis. + + +

+ Each record in this table represents an active ECMP route committed by + ovn-northd to the OVS connection-tracking table. + The ECMP_Nexthop table is used by ovn-controller + to track active CT entries and to flush stale ones. +

+ +

+ Nexthop IP address for this ECMP route. Nexthop IP address should + be the IP address of a connected router port or the IP address of + an external device used as nexthop for the given destination. +

+
+ +

+ The reference to the Port_Binding table for the port used + to connect to the configured next-hop. +

+
+ +

+ Nexthop MAC address. +

+
+ + + See External IDs at the beginning of this document. + +
diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at index 8477e4250..1e01c2614 100644 --- a/tests/ovn-northd.at +++ b/tests/ovn-northd.at @@ -3886,7 +3886,7 @@ wait_row_count bfd 1 logical_port=r0-sw3 detect_mult=5 dst_ip=192.168.3.2 \ check_engine_stats northd norecompute nocompute check_engine_stats bfd recompute nocompute check_engine_stats lflow recompute nocompute -check_engine_stats northd_output norecompute compute +check_engine_stats northd_output recompute nocompute CHECK_NO_CHANGE_AFTER_RECOMPUTE check as northd ovn-appctl -t ovn-northd inc-engine/clear-stats @@ -3901,7 +3901,7 @@ wait_row_count bfd 1 logical_port=r0-sw1 min_rx=1000 min_tx=1000 detect_mult=100 check_engine_stats northd norecompute nocompute check_engine_stats bfd recompute nocompute check_engine_stats lflow recompute nocompute -check_engine_stats northd_output norecompute compute +check_engine_stats northd_output recompute nocompute CHECK_NO_CHANGE_AFTER_RECOMPUTE check as northd ovn-appctl -t ovn-northd inc-engine/clear-stats @@ -3913,7 +3913,7 @@ check_engine_stats northd recompute nocompute check_engine_stats bfd recompute nocompute check_engine_stats static_routes recompute nocompute check_engine_stats lflow recompute nocompute -check_engine_stats northd_output norecompute compute +check_engine_stats northd_output recompute nocompute CHECK_NO_CHANGE_AFTER_RECOMPUTE check as northd ovn-appctl -t ovn-northd inc-engine/clear-stats @@ -3929,7 +3929,7 @@ check_engine_stats northd recompute nocompute check_engine_stats bfd recompute nocompute check_engine_stats static_routes recompute nocompute check_engine_stats lflow recompute nocompute -check_engine_stats northd_output norecompute compute +check_engine_stats northd_output recompute nocompute CHECK_NO_CHANGE_AFTER_RECOMPUTE check as northd ovn-appctl -t ovn-northd inc-engine/clear-stats @@ -3941,7 +3941,7 @@ check_engine_stats northd recompute nocompute check_engine_stats bfd recompute nocompute check_engine_stats route_policies 
recompute nocompute check_engine_stats lflow recompute nocompute -check_engine_stats northd_output norecompute compute +check_engine_stats northd_output recompute nocompute CHECK_NO_CHANGE_AFTER_RECOMPUTE check as northd ovn-appctl -t ovn-northd inc-engine/clear-stats @@ -3976,7 +3976,7 @@ check_engine_stats northd recompute nocompute check_engine_stats bfd recompute nocompute check_engine_stats static_routes recompute nocompute check_engine_stats lflow recompute nocompute -check_engine_stats northd_output norecompute compute +check_engine_stats northd_output recompute nocompute CHECK_NO_CHANGE_AFTER_RECOMPUTE check as northd ovn-appctl -t ovn-northd inc-engine/clear-stats @@ -3991,7 +3991,7 @@ check_engine_stats northd recompute nocompute check_engine_stats bfd recompute nocompute check_engine_stats route_policies recompute nocompute check_engine_stats lflow recompute nocompute -check_engine_stats northd_output norecompute compute +check_engine_stats northd_output recompute nocompute CHECK_NO_CHANGE_AFTER_RECOMPUTE check as northd ovn-appctl -t ovn-northd inc-engine/clear-stats @@ -6816,6 +6816,10 @@ check ovn-nbctl lsp-set-options public-lr0 router-port=lr0-public # ECMP flows will be added even if there is only one next-hop. 
check ovn-nbctl --wait=sb --ecmp-symmetric-reply lr-route-add lr0 1.0.0.1 192.168.0.10 +check_row_count ECMP_Nexthop 1 +uuid=$(fetch_column Port_Binding _uuid logical_port=lr0-public) +check_column 192.168.0.10 ECMP_Nexthop nexthop +check_column "$uuid" ECMP_Nexthop port ovn-sbctl dump-flows lr0 > lr0flows @@ -6835,6 +6839,13 @@ AT_CHECK([grep -e "lr_in_ip_routing_ecmp" lr0flows | ovn_strip_lflows], [0], [dn ]) check ovn-nbctl --wait=sb --ecmp-symmetric-reply lr-route-add lr0 1.0.0.1 192.168.0.20 +check_row_count ECMP_Nexthop 2 +AT_CHECK([ovn-sbctl --columns nexthop --bare find ECMP_Nexthop nexthop=192.168.0.10], [0], [dnl +192.168.0.10 +]) +AT_CHECK([ovn-sbctl --columns nexthop --bare find ECMP_Nexthop nexthop=192.168.0.20], [0], [dnl +192.168.0.20 +]) ovn-sbctl dump-flows lr0 > lr0flows AT_CHECK([grep -w "lr_in_ip_routing" lr0flows | ovn_strip_lflows], [0], [dnl @@ -6864,6 +6875,13 @@ AT_CHECK([grep -e "lr_in_arp_resolve.*ecmp" lr0flows | ovn_strip_lflows], [0], [ # add ecmp route with wrong nexthop check ovn-nbctl --wait=sb --ecmp-symmetric-reply lr-route-add lr0 1.0.0.1 192.168.1.20 +check_row_count ECMP_Nexthop 2 +AT_CHECK([ovn-sbctl --columns nexthop --bare find ECMP_Nexthop nexthop=192.168.0.10], [0], [dnl +192.168.0.10 +]) +AT_CHECK([ovn-sbctl --columns nexthop --bare find ECMP_Nexthop nexthop=192.168.0.20], [0], [dnl +192.168.0.20 +]) ovn-sbctl dump-flows lr0 > lr0flows AT_CHECK([grep -w "lr_in_ip_routing" lr0flows | ovn_strip_lflows], [0], [dnl @@ -6883,6 +6901,7 @@ AT_CHECK([grep -e "lr_in_ip_routing_ecmp" lr0flows | sed 's/192\.168\.0\..0/192. 
check ovn-nbctl lr-route-del lr0 wait_row_count nb:Logical_Router_Static_Route 0 +check_row_count ECMP_Nexthop 0 check ovn-nbctl --wait=sb lr-route-add lr0 1.0.0.0/24 192.168.0.10 ovn-sbctl dump-flows lr0 > lr0flows

From patchwork Wed Nov 13 11:04:58 2024
X-Patchwork-Submitter: Lorenzo Bianconi
X-Patchwork-Id: 2010709
X-Patchwork-Delegate: dceara@redhat.com
From: Lorenzo Bianconi
To: ovs-dev@openvswitch.org
Date: Wed, 13 Nov 2024 12:04:58 +0100
Subject: [ovs-dev] [PATCH ovn 2/4] pinctrl: Send periodic arp/nd to ecmp next-hops.
Cc: dceara@redhat.com

Introduce the capability to periodically send ARP/ND packets for ECMP nexthops in order to resolve their L2 address. This is a preliminary patch to introduce the capability to flush stale ECMP CT entries.
Signed-off-by: Lorenzo Bianconi --- NEWS | 5 + controller/ovn-controller.8.xml | 10 ++ controller/ovn-controller.c | 2 + controller/pinctrl.c | 284 +++++++++++++++++++++++++++++++- controller/pinctrl.h | 2 + 5 files changed, 300 insertions(+), 3 deletions(-) diff --git a/NEWS b/NEWS index da3aba739..1f8f54d5d 100644 --- a/NEWS +++ b/NEWS @@ -4,6 +4,11 @@ Post v24.09.0 hash (with specified hash fields) for ECMP routes while choosing nexthop. - ovn-ic: Add support for route tag to prevent route learning. 
+ - Add "arp-nd-max-timeout-sec" config option to vswitchd external-ids to + cap the time between when ovn-controller sends ARP/ND packets for + ECMP-nexthop. + By default ovn-controller continuously sends ARP/ND packets for + ECMP-nexthop. OVN v24.09.0 - 13 Sep 2024 -------------------------- diff --git a/controller/ovn-controller.8.xml b/controller/ovn-controller.8.xml index aeaa374c1..7f95a9932 100644 --- a/controller/ovn-controller.8.xml +++ b/controller/ovn-controller.8.xml @@ -385,6 +385,16 @@ cap for the exponential backoff used by ovn-controller to send GARPs packets. +
external_ids:arp-nd-max-timeout-sec
+
+ When used, this configuration value specifies the maximum timeout + (in seconds) between two consecutive ARP/ND packets sent by + ovn-controller to resolve the MAC address of an ECMP nexthop. + By default, ovn-controller sends ARP/ND packets continuously. + Setting external_ids:arp-nd-max-timeout-sec caps the + exponential backoff used by ovn-controller to send ARP/ND + packets. +
external_ids:ovn-bridge-remote

diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c index 98b144699..ecfa3b229 100644 --- a/controller/ovn-controller.c +++ b/controller/ovn-controller.c @@ -5743,6 +5743,8 @@ main(int argc, char *argv[]) sbrec_mac_binding_table_get( ovnsb_idl_loop.idl), sbrec_bfd_table_get(ovnsb_idl_loop.idl), + sbrec_ecmp_nexthop_table_get( + ovnsb_idl_loop.idl), br_int, chassis, &runtime_data->local_datapaths, &runtime_data->active_tunnels, diff --git a/controller/pinctrl.c b/controller/pinctrl.c index 3fb7e2fd7..eb6043ef8 100644 --- a/controller/pinctrl.c +++ b/controller/pinctrl.c @@ -164,6 +164,9 @@ static struct seq *pinctrl_main_seq; static long long int garp_rarp_max_timeout = GARP_RARP_DEF_MAX_TIMEOUT; static bool garp_rarp_continuous; +static long long int arp_nd_max_timeout = GARP_RARP_DEF_MAX_TIMEOUT; +static bool arp_nd_continuous; + static void *pinctrl_handler(void *arg); struct pinctrl { @@ -223,13 +226,17 @@ static void run_activated_ports( const struct sbrec_chassis *chassis); static void init_send_garps_rarps(void); +static void init_send_arps_nds(void); static void destroy_send_garps_rarps(void); +static void destroy_send_arps_nds(void); static void send_garp_rarp_wait(long long int send_garp_rarp_time); +static void send_arp_nd_wait(long long int send_arp_nd_time); static void send_garp_rarp_prepare( struct ovsdb_idl_txn *ovnsb_idl_txn, struct ovsdb_idl_index *sbrec_port_binding_by_datapath, struct ovsdb_idl_index *sbrec_port_binding_by_name, struct ovsdb_idl_index *sbrec_mac_binding_by_lport_ip, + const struct sbrec_ecmp_nexthop_table *ecmp_nh_table, const struct ovsrec_bridge *, const struct sbrec_chassis *, const struct hmap *local_datapaths, @@ -239,6 +246,9 @@ static void send_garp_rarp_prepare( static void send_garp_rarp_run(struct rconn *swconn, long long int *send_garp_rarp_time) OVS_REQUIRES(pinctrl_mutex); +static void send_arp_nd_run(struct rconn *swconn, + long long int *send_arp_nd_time) + OVS_REQUIRES(pinctrl_mutex); static 
void pinctrl_handle_nd_na(struct rconn *swconn, const struct flow *ip_flow, const struct match *md, @@ -548,6 +558,7 @@ pinctrl_init(void) { init_put_mac_bindings(); init_send_garps_rarps(); + init_send_arps_nds(); init_ipv6_ras(); init_ipv6_prefixd(); init_buffered_packets_ctx(); @@ -3878,6 +3889,7 @@ pinctrl_handler(void *arg_) static long long int send_ipv6_ra_time = LLONG_MAX; /* Next GARP/RARP announcement in ms. */ static long long int send_garp_rarp_time = LLONG_MAX; + static long long int send_arp_nd_time = LLONG_MAX; /* Next multicast query (IGMP) in ms. */ static long long int send_mcast_query_time = LLONG_MAX; static long long int svc_monitors_next_run_time = LLONG_MAX; @@ -3915,6 +3927,7 @@ pinctrl_handler(void *arg_) if (may_inject_pkts()) { ovs_mutex_lock(&pinctrl_mutex); send_garp_rarp_run(swconn, &send_garp_rarp_time); + send_arp_nd_run(swconn, &send_arp_nd_time); send_ipv6_ras(swconn, &send_ipv6_ra_time); send_ipv6_prefixd(swconn, &send_prefixd_time); send_mac_binding_buffered_pkts(swconn); @@ -3933,6 +3946,7 @@ pinctrl_handler(void *arg_) rconn_recv_wait(swconn); if (rconn_is_connected(swconn)) { send_garp_rarp_wait(send_garp_rarp_time); + send_arp_nd_wait(send_arp_nd_time); ipv6_ra_wait(send_ipv6_ra_time); ip_mcast_querier_wait(send_mcast_query_time); svc_monitors_wait(svc_monitors_next_run_time); @@ -4019,6 +4033,7 @@ pinctrl_run(struct ovsdb_idl_txn *ovnsb_idl_txn, const struct sbrec_service_monitor_table *svc_mon_table, const struct sbrec_mac_binding_table *mac_binding_table, const struct sbrec_bfd_table *bfd_table, + const struct sbrec_ecmp_nexthop_table *ecmp_nh_table, const struct ovsrec_bridge *br_int, const struct sbrec_chassis *chassis, const struct hmap *local_datapaths, @@ -4035,8 +4050,9 @@ pinctrl_run(struct ovsdb_idl_txn *ovnsb_idl_txn, sbrec_port_binding_by_key, chassis); send_garp_rarp_prepare(ovnsb_idl_txn, sbrec_port_binding_by_datapath, sbrec_port_binding_by_name, - sbrec_mac_binding_by_lport_ip, br_int, chassis, - 
local_datapaths, active_tunnels, ovs_table); + sbrec_mac_binding_by_lport_ip, ecmp_nh_table, + br_int, chassis, local_datapaths, active_tunnels, + ovs_table); prepare_ipv6_ras(local_active_ports_ras, sbrec_port_binding_by_name); prepare_ipv6_prefixd(ovnsb_idl_txn, sbrec_port_binding_by_name, local_active_ports_ipv6_pd, chassis, @@ -4570,6 +4586,7 @@ pinctrl_destroy(void) latch_destroy(&pinctrl.pinctrl_thread_exit); rconn_destroy(pinctrl.swconn); destroy_send_garps_rarps(); + destroy_send_arps_nds(); destroy_ipv6_ras(); destroy_ipv6_prefixd(); destroy_buffered_packets_ctx(); @@ -5077,6 +5094,150 @@ send_garp_rarp_update(struct ovsdb_idl_txn *ovnsb_idl_txn, } } +struct arp_nd_data { + struct hmap_node hmap_node; + struct eth_addr ea; /* Ethernet address of port. */ + struct in6_addr src_ip; /* IP address of port. */ + struct in6_addr dst_ip; /* Destination IP address */ + long long int announce_time; /* Next announcement in ms. */ + int backoff; /* Backoff timeout for the next + * announcement (in msecs). */ + uint32_t dp_key; /* Datapath used to output this GARP. */ + uint32_t port_key; /* Port to inject the GARP into. 
*/ +}; + +static struct hmap send_arp_nd_data; + +static void +init_send_arps_nds(void) +{ + hmap_init(&send_arp_nd_data); +} + +static void +destroy_send_arps_nds(void) +{ + struct arp_nd_data *e; + HMAP_FOR_EACH_POP (e, hmap_node, &send_arp_nd_data) { + free(e); + } + hmap_destroy(&send_arp_nd_data); +} + +static struct arp_nd_data * +arp_nd_find_data(const struct sbrec_port_binding *pb, + const struct in6_addr *nexthop) +{ + uint32_t hash = 0; + + hash = hash_add_in6_addr(hash, nexthop); + hash = hash_add(hash, pb->datapath->tunnel_key); + hash = hash_add(hash, pb->tunnel_key); + + struct arp_nd_data *e; + HMAP_FOR_EACH_WITH_HASH (e, hmap_node, hash, &send_arp_nd_data) { + if (ipv6_addr_equals(&e->dst_ip, nexthop) && + e->port_key == pb->tunnel_key) { + return e; + } + } + + return NULL; +} + +static bool +arp_nd_find_is_stale(const struct arp_nd_data *e, + const struct sbrec_ecmp_nexthop_table *ecmp_nh_table, + const struct sbrec_chassis *chassis) +{ + const struct sbrec_ecmp_nexthop *sb_ecmp_nexthop; + SBREC_ECMP_NEXTHOP_TABLE_FOR_EACH (sb_ecmp_nexthop, ecmp_nh_table) { + const struct sbrec_port_binding *pb = sb_ecmp_nexthop->port; + if (pb->chassis != chassis) { + continue; + } + + struct lport_addresses laddrs; + if (!extract_ip_addresses(sb_ecmp_nexthop->nexthop, &laddrs)) { + continue; + } + + struct in6_addr dst_ip = laddrs.n_ipv4_addrs + ? in6_addr_mapped_ipv4(laddrs.ipv4_addrs[0].addr) + : laddrs.ipv6_addrs[0].addr; + destroy_lport_addresses(&laddrs); + + if (pb->tunnel_key == e->port_key && + pb->datapath->tunnel_key == e->dp_key && + ipv6_addr_equals(&e->dst_ip, &dst_ip)) { + return false; + } + } + return true; +} + +static struct arp_nd_data * +arp_nd_alloc_data(const struct eth_addr ea, + struct in6_addr src_ip, struct in6_addr dst_ip, + const struct sbrec_port_binding *pb) +{ + struct arp_nd_data *e = xmalloc(sizeof *e); + e->ea = ea; + e->src_ip = src_ip; + e->dst_ip = dst_ip; + e->announce_time = time_msec() + 1000; + e->backoff = 1000; /* msec. 
*/ + e->dp_key = pb->datapath->tunnel_key; + e->port_key = pb->tunnel_key; + + uint32_t hash = 0; + hash = hash_add_in6_addr(hash, &dst_ip); + hash = hash_add(hash, e->dp_key); + hash = hash_add(hash, e->port_key); + hmap_insert(&send_arp_nd_data, &e->hmap_node, hash); + notify_pinctrl_handler(); + + return e; +} + +/* Add or update a vif for which ARPs need to be announced. */ +static void +send_arp_nd_update(const struct sbrec_port_binding *pb, const char *nexthop, + long long int max_arp_timeout, bool continuous_arp_nd) +{ + struct lport_addresses laddrs; + if (!extract_ip_addresses(nexthop, &laddrs)) { + return; + } + + struct in6_addr dst_ip = laddrs.n_ipv4_addrs + ? in6_addr_mapped_ipv4(laddrs.ipv4_addrs[0].addr) + : laddrs.ipv6_addrs[0].addr; + destroy_lport_addresses(&laddrs); + + struct arp_nd_data *e = arp_nd_find_data(pb, &dst_ip); + if (!e) { + if (!extract_lsp_addresses(pb->mac[0], &laddrs)) { + return; + } + + if (laddrs.n_ipv4_addrs) { + arp_nd_alloc_data(laddrs.ea, + in6_addr_mapped_ipv4(laddrs.ipv4_addrs[0].addr), + dst_ip, pb); + } else if (laddrs.n_ipv6_addrs) { + arp_nd_alloc_data(laddrs.ea, laddrs.ipv6_addrs[0].addr, + dst_ip, pb); + } + destroy_lport_addresses(&laddrs); + } else if (max_arp_timeout != arp_nd_max_timeout || + continuous_arp_nd != arp_nd_continuous) { + /* reset backoff */ + e->announce_time = time_msec() + 1000; + e->backoff = 1000; /* msec. */ + } +} + /* Remove a vif from GARP announcements. */ static void send_garp_rarp_delete(const char *lport) @@ -6415,6 +6576,16 @@ send_garp_rarp_wait(long long int send_garp_rarp_time) } } +static void +send_arp_nd_wait(long long int send_arp_nd_time) +{ + /* Set the poll timer for next arp packet only if there is data to + * be sent. */ + if (hmap_count(&send_arp_nd_data)) { + poll_timer_wait_until(send_arp_nd_time); + } +} + /* Called with in the pinctrl_handler thread context. 
*/ static void send_garp_rarp_run(struct rconn *swconn, long long int *send_garp_rarp_time) @@ -6437,6 +6608,84 @@ send_garp_rarp_run(struct rconn *swconn, long long int *send_garp_rarp_time) } } +static long long int +send_arp_nd(struct rconn *swconn, struct arp_nd_data *e, + long long int current_time) + OVS_REQUIRES(pinctrl_mutex) +{ + if (current_time < e->announce_time) { + return e->announce_time; + } + + /* Compose a ARP request packet. */ + uint64_t packet_stub[128 / 8]; + struct dp_packet packet; + dp_packet_use_stub(&packet, packet_stub, sizeof packet_stub); + if (IN6_IS_ADDR_V4MAPPED(&e->src_ip)) { + compose_arp(&packet, ARP_OP_REQUEST, e->ea, eth_addr_zero, + true, in6_addr_get_mapped_ipv4(&e->src_ip), + in6_addr_get_mapped_ipv4(&e->dst_ip)); + } else { + compose_nd_ns(&packet, e->ea, &e->src_ip, &e->dst_ip); + } + + /* Inject ARP request. */ + uint64_t ofpacts_stub[4096 / 8]; + struct ofpbuf ofpacts = OFPBUF_STUB_INITIALIZER(ofpacts_stub); + enum ofp_version version = rconn_get_version(swconn); + put_load(e->dp_key, MFF_LOG_DATAPATH, 0, 64, &ofpacts); + put_load(e->port_key, MFF_LOG_OUTPORT, 0, 32, &ofpacts); + struct ofpact_resubmit *resubmit = ofpact_put_RESUBMIT(&ofpacts); + resubmit->in_port = OFPP_CONTROLLER; + resubmit->table_id = OFTABLE_LOCAL_OUTPUT; + + struct ofputil_packet_out po = { + .packet = dp_packet_data(&packet), + .packet_len = dp_packet_size(&packet), + .buffer_id = UINT32_MAX, + .ofpacts = ofpacts.data, + .ofpacts_len = ofpacts.size, + }; + match_set_in_port(&po.flow_metadata, OFPP_CONTROLLER); + enum ofputil_protocol proto = ofputil_protocol_from_ofp_version(version); + queue_msg(swconn, ofputil_encode_packet_out(&po, proto)); + dp_packet_uninit(&packet); + ofpbuf_uninit(&ofpacts); + + /* Set the next announcement. At most 5 announcements are sent for a + * vif if arp_nd_max_timeout is not specified otherwise cap the max + * timeout to arp_nd_max_timeout. 
*/ + if (arp_nd_continuous || e->backoff < arp_nd_max_timeout) { + e->announce_time = current_time + e->backoff; + } else { + e->announce_time = LLONG_MAX; + } + e->backoff = MIN(arp_nd_max_timeout, e->backoff * 2); + + return e->announce_time; +} + +static void +send_arp_nd_run(struct rconn *swconn, long long int *send_arp_nd_time) + OVS_REQUIRES(pinctrl_mutex) +{ + if (!hmap_count(&send_arp_nd_data)) { + return; + } + + /* Send ARPs, and update the next announcement. */ + long long int current_time = time_msec(); + *send_arp_nd_time = LLONG_MAX; + + struct arp_nd_data *e; + HMAP_FOR_EACH (e, hmap_node, &send_arp_nd_data) { + long long int next_announce = send_arp_nd(swconn, e, current_time); + if (*send_arp_nd_time > next_announce) { + *send_arp_nd_time = next_announce; + } + } +} + /* Called by pinctrl_run(). Runs with in the main ovn-controller * thread context. */ static void @@ -6444,6 +6693,7 @@ send_garp_rarp_prepare(struct ovsdb_idl_txn *ovnsb_idl_txn, struct ovsdb_idl_index *sbrec_port_binding_by_datapath, struct ovsdb_idl_index *sbrec_port_binding_by_name, struct ovsdb_idl_index *sbrec_mac_binding_by_lport_ip, + const struct sbrec_ecmp_nexthop_table *ecmp_nh_table, const struct ovsrec_bridge *br_int, const struct sbrec_chassis *chassis, const struct hmap *local_datapaths, @@ -6456,7 +6706,8 @@ send_garp_rarp_prepare(struct ovsdb_idl_txn *ovnsb_idl_txn, struct sset nat_ip_keys = SSET_INITIALIZER(&nat_ip_keys); struct shash nat_addresses; unsigned long long garp_max_timeout = GARP_RARP_DEF_MAX_TIMEOUT; - bool garp_continuous = false; + unsigned long long max_arp_nd_timeout = GARP_RARP_DEF_MAX_TIMEOUT; + bool garp_continuous = false, continuous_arp_nd = true; const struct ovsrec_open_vswitch *cfg = ovsrec_open_vswitch_table_first(ovs_table); if (cfg) { @@ -6466,6 +6717,11 @@ send_garp_rarp_prepare(struct ovsdb_idl_txn *ovnsb_idl_txn, if (!garp_max_timeout) { garp_max_timeout = GARP_RARP_DEF_MAX_TIMEOUT; } + + max_arp_nd_timeout = smap_get_ullong( + 
&cfg->external_ids, "arp-nd-max-timeout-sec", + GARP_RARP_DEF_MAX_TIMEOUT / 1000) * 1000; + continuous_arp_nd = !!max_arp_nd_timeout; } shash_init(&nat_addresses); @@ -6479,6 +6735,7 @@ send_garp_rarp_prepare(struct ovsdb_idl_txn *ovnsb_idl_txn, &nat_ip_keys, &local_l3gw_ports, chassis, active_tunnels, &nat_addresses); + /* For deleted ports and deleted nat ips, remove from * send_garp_rarp_data. */ struct shash_node *iter; @@ -6514,6 +6771,24 @@ send_garp_rarp_prepare(struct ovsdb_idl_txn *ovnsb_idl_txn, } } + struct arp_nd_data *e; + const struct sbrec_ecmp_nexthop *sb_ecmp_nexthop; + HMAP_FOR_EACH_SAFE (e, hmap_node, &send_arp_nd_data) { + if (arp_nd_find_is_stale(e, ecmp_nh_table, chassis)) { + hmap_remove(&send_arp_nd_data, &e->hmap_node); + free(e); + notify_pinctrl_handler(); + } + } + + SBREC_ECMP_NEXTHOP_TABLE_FOR_EACH (sb_ecmp_nexthop, ecmp_nh_table) { + const struct sbrec_port_binding *pb = sb_ecmp_nexthop->port; + if (pb && !strcmp(pb->type, "l3gateway") && pb->chassis == chassis) { + send_arp_nd_update(pb, sb_ecmp_nexthop->nexthop, + max_arp_nd_timeout, continuous_arp_nd); + } + } + /* pinctrl_handler thread will send the GARPs. 
*/ sset_destroy(&localnet_vifs); @@ -6531,6 +6806,9 @@ send_garp_rarp_prepare(struct ovsdb_idl_txn *ovnsb_idl_txn, garp_rarp_max_timeout = garp_max_timeout; garp_rarp_continuous = garp_continuous; + + arp_nd_max_timeout = max_arp_nd_timeout; + arp_nd_continuous = continuous_arp_nd; } static bool diff --git a/controller/pinctrl.h b/controller/pinctrl.h index 846afe0a4..8459f4f53 100644 --- a/controller/pinctrl.h +++ b/controller/pinctrl.h @@ -36,6 +36,7 @@ struct sbrec_dns_table; struct sbrec_controller_event_table; struct sbrec_service_monitor_table; struct sbrec_bfd_table; +struct sbrec_ecmp_nexthop_table; struct sbrec_port_binding; struct sbrec_mac_binding_table; @@ -53,6 +54,7 @@ void pinctrl_run(struct ovsdb_idl_txn *ovnsb_idl_txn, const struct sbrec_service_monitor_table *, const struct sbrec_mac_binding_table *, const struct sbrec_bfd_table *, + const struct sbrec_ecmp_nexthop_table *, const struct ovsrec_bridge *, const struct sbrec_chassis *, const struct hmap *local_datapaths, const struct sset *active_tunnels,
From patchwork Wed Nov 13 11:04:59 2024
X-Patchwork-Submitter: Lorenzo Bianconi
X-Patchwork-Id: 2010710
X-Patchwork-Delegate: dceara@redhat.com
From: Lorenzo Bianconi
To: ovs-dev@openvswitch.org
Date: Wed, 13 Nov 2024 12:04:59 +0100
Subject: [ovs-dev] [PATCH ovn 3/4] pinctrl: Update ecmp-nexthop mac resolving L2 address.
Cc: dceara@redhat.com

Set the mac column of the ECMP_Nexthop table whenever the SB MAC_Binding table is updated. This is a preliminary patch to introduce the capability to flush stale ECMP CT entries.

Signed-off-by: Lorenzo Bianconi
---
 controller/ovn-controller.c | 5 +++
 controller/pinctrl.c | 88 ++++++++++++++++++++++++++++++-------
 controller/pinctrl.h | 1 +
 3 files changed, 79 insertions(+), 15 deletions(-)

diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c index ecfa3b229..6cee6450d 100644 --- a/controller/ovn-controller.c +++ b/controller/ovn-controller.c @@ -5002,6 +5002,10 @@ main(int argc, char *argv[]) = ovsdb_idl_index_create2(ovnsb_idl_loop.idl, &sbrec_fdb_col_mac, &sbrec_fdb_col_dp_key); + struct ovsdb_idl_index *sbrec_ecmp_by_nexthop + = ovsdb_idl_index_create2(ovnsb_idl_loop.idl, + &sbrec_ecmp_nexthop_col_nexthop, + &sbrec_ecmp_nexthop_col_port); struct ovsdb_idl_index *sbrec_mac_binding_by_datapath = mac_binding_by_datapath_index_create(ovnsb_idl_loop.idl); struct ovsdb_idl_index *sbrec_static_mac_binding_by_datapath @@ -5736,6 +5740,7 @@ main(int argc, char *argv[]) sbrec_igmp_group, sbrec_ip_multicast, sbrec_fdb_by_dp_key_mac, + sbrec_ecmp_by_nexthop, sbrec_controller_event_table_get( ovnsb_idl_loop.idl), sbrec_service_monitor_table_get( diff --git a/controller/pinctrl.c b/controller/pinctrl.c index eb6043ef8..5a4e94300 100644 --- a/controller/pinctrl.c +++ b/controller/pinctrl.c @@ -200,11 +200,16 @@ static void pinctrl_handle_put_mac_binding(const struct flow *md, OVS_REQUIRES(pinctrl_mutex); static void init_put_mac_bindings(void); static void destroy_put_mac_bindings(void); +static const struct sbrec_ecmp_nexthop *ecmp_nexthop_lookup( + struct ovsdb_idl_index *sbrec_ecmp_by_nexthop, + const char *nexthop, + const
struct sbrec_port_binding *pb); static void run_put_mac_bindings( struct ovsdb_idl_txn *ovnsb_idl_txn, struct ovsdb_idl_index *sbrec_datapath_binding_by_key, struct ovsdb_idl_index *sbrec_port_binding_by_key, - struct ovsdb_idl_index *sbrec_mac_binding_by_lport_ip) + struct ovsdb_idl_index *sbrec_mac_binding_by_lport_ip, + struct ovsdb_idl_index *sbrec_ecmp_by_nexthop) OVS_REQUIRES(pinctrl_mutex); static void wait_put_mac_bindings(struct ovsdb_idl_txn *ovnsb_idl_txn); static void send_mac_binding_buffered_pkts(struct rconn *swconn) @@ -236,6 +241,7 @@ static void send_garp_rarp_prepare( struct ovsdb_idl_index *sbrec_port_binding_by_datapath, struct ovsdb_idl_index *sbrec_port_binding_by_name, struct ovsdb_idl_index *sbrec_mac_binding_by_lport_ip, + struct ovsdb_idl_index *sbrec_ecmp_by_nexthop, const struct sbrec_ecmp_nexthop_table *ecmp_nh_table, const struct ovsrec_bridge *, const struct sbrec_chassis *, @@ -4029,6 +4035,7 @@ pinctrl_run(struct ovsdb_idl_txn *ovnsb_idl_txn, struct ovsdb_idl_index *sbrec_igmp_groups, struct ovsdb_idl_index *sbrec_ip_multicast_opts, struct ovsdb_idl_index *sbrec_fdb_by_dp_key_mac, + struct ovsdb_idl_index *sbrec_ecmp_by_nexthop, const struct sbrec_controller_event_table *ce_table, const struct sbrec_service_monitor_table *svc_mon_table, const struct sbrec_mac_binding_table *mac_binding_table, @@ -4045,12 +4052,14 @@ pinctrl_run(struct ovsdb_idl_txn *ovnsb_idl_txn, ovs_mutex_lock(&pinctrl_mutex); run_put_mac_bindings(ovnsb_idl_txn, sbrec_datapath_binding_by_key, sbrec_port_binding_by_key, - sbrec_mac_binding_by_lport_ip); + sbrec_mac_binding_by_lport_ip, + sbrec_ecmp_by_nexthop); run_put_vport_bindings(ovnsb_idl_txn, sbrec_datapath_binding_by_key, sbrec_port_binding_by_key, chassis); send_garp_rarp_prepare(ovnsb_idl_txn, sbrec_port_binding_by_datapath, sbrec_port_binding_by_name, - sbrec_mac_binding_by_lport_ip, ecmp_nh_table, + sbrec_mac_binding_by_lport_ip, + sbrec_ecmp_by_nexthop, ecmp_nh_table, br_int, chassis, local_datapaths, 
active_tunnels, ovs_table); prepare_ipv6_ras(local_active_ports_ras, sbrec_port_binding_by_name); @@ -4696,8 +4705,10 @@ send_mac_binding_buffered_pkts(struct rconn *swconn) static void mac_binding_add_to_sb(struct ovsdb_idl_txn *ovnsb_idl_txn, struct ovsdb_idl_index *sbrec_mac_binding_by_lport_ip, + struct ovsdb_idl_index *sbrec_ecmp_by_nexthop, const char *logical_port, const struct sbrec_datapath_binding *dp, + const struct sbrec_port_binding *pb, struct eth_addr ea, const char *ip, bool update_only) { @@ -4727,6 +4738,12 @@ mac_binding_add_to_sb(struct ovsdb_idl_txn *ovnsb_idl_txn, sbrec_mac_binding_set_timestamp(b, time_wall_msec()); } } + + const struct sbrec_ecmp_nexthop *ecmp_nh = + ecmp_nexthop_lookup(sbrec_ecmp_by_nexthop, ip, pb); + if (ecmp_nh) { + sbrec_ecmp_nexthop_set_mac(ecmp_nh, mac_string); + } } /* Simulate the effect of a GARP on local datapaths, i.e., create MAC_Bindings @@ -4735,6 +4752,7 @@ mac_binding_add_to_sb(struct ovsdb_idl_txn *ovnsb_idl_txn, static void send_garp_locally(struct ovsdb_idl_txn *ovnsb_idl_txn, struct ovsdb_idl_index *sbrec_mac_binding_by_lport_ip, + struct ovsdb_idl_index *sbrec_ecmp_by_nexthop, const struct hmap *local_datapaths, const struct sbrec_port_binding *in_pb, struct eth_addr ea, ovs_be32 ip) @@ -4764,17 +4782,34 @@ send_garp_locally(struct ovsdb_idl_txn *ovnsb_idl_txn, ip_format_masked(ip, OVS_BE32_MAX, &ip_s); mac_binding_add_to_sb(ovnsb_idl_txn, sbrec_mac_binding_by_lport_ip, - remote->logical_port, remote->datapath, - ea, ds_cstr(&ip_s), update_only); + sbrec_ecmp_by_nexthop, remote->logical_port, + remote->datapath, remote, ea, ds_cstr(&ip_s), + update_only); ds_destroy(&ip_s); } } +static const struct sbrec_ecmp_nexthop * +ecmp_nexthop_lookup(struct ovsdb_idl_index *sbrec_ecmp_by_nexthop, + const char *nexthop, const struct sbrec_port_binding *pb) +{ + struct sbrec_ecmp_nexthop *ecmp_nh = + sbrec_ecmp_nexthop_index_init_row(sbrec_ecmp_by_nexthop); + sbrec_ecmp_nexthop_index_set_nexthop(ecmp_nh, nexthop); + 
sbrec_ecmp_nexthop_index_set_port(ecmp_nh, pb); + const struct sbrec_ecmp_nexthop *retval = + sbrec_ecmp_nexthop_index_find(sbrec_ecmp_by_nexthop, ecmp_nh); + sbrec_ecmp_nexthop_index_destroy_row(ecmp_nh); + + return retval; +} + static void run_put_mac_binding(struct ovsdb_idl_txn *ovnsb_idl_txn, struct ovsdb_idl_index *sbrec_datapath_binding_by_key, struct ovsdb_idl_index *sbrec_port_binding_by_key, struct ovsdb_idl_index *sbrec_mac_binding_by_lport_ip, + struct ovsdb_idl_index *sbrec_ecmp_by_nexthop, const struct mac_binding *mb) { /* Convert logical datapath and logical port key into lport. */ @@ -4797,8 +4832,9 @@ run_put_mac_binding(struct ovsdb_idl_txn *ovnsb_idl_txn, struct ds ip_s = DS_EMPTY_INITIALIZER; ipv6_format_mapped(&mb->data.ip, &ip_s); mac_binding_add_to_sb(ovnsb_idl_txn, sbrec_mac_binding_by_lport_ip, - pb->logical_port, pb->datapath, mb->data.mac, - ds_cstr(&ip_s), false); + sbrec_ecmp_by_nexthop, pb->logical_port, + pb->datapath, pb, mb->data.mac, ds_cstr(&ip_s), + false); ds_destroy(&ip_s); } @@ -4808,7 +4844,8 @@ static void run_put_mac_bindings(struct ovsdb_idl_txn *ovnsb_idl_txn, struct ovsdb_idl_index *sbrec_datapath_binding_by_key, struct ovsdb_idl_index *sbrec_port_binding_by_key, - struct ovsdb_idl_index *sbrec_mac_binding_by_lport_ip) + struct ovsdb_idl_index *sbrec_mac_binding_by_lport_ip, + struct ovsdb_idl_index *sbrec_ecmp_by_nexthop) OVS_REQUIRES(pinctrl_mutex) { if (!ovnsb_idl_txn) { @@ -4823,7 +4860,8 @@ run_put_mac_bindings(struct ovsdb_idl_txn *ovnsb_idl_txn, run_put_mac_binding(ovnsb_idl_txn, sbrec_datapath_binding_by_key, sbrec_port_binding_by_key, - sbrec_mac_binding_by_lport_ip, mb); + sbrec_mac_binding_by_lport_ip, + sbrec_ecmp_by_nexthop, mb); mac_binding_remove(&put_mac_bindings, mb); } } @@ -4973,6 +5011,7 @@ add_garp_rarp(const char *name, const struct eth_addr ea, ovs_be32 ip, static void send_garp_rarp_update(struct ovsdb_idl_txn *ovnsb_idl_txn, struct ovsdb_idl_index *sbrec_mac_binding_by_lport_ip, + struct 
ovsdb_idl_index *sbrec_ecmp_by_nexthop, const struct hmap *local_datapaths, const struct sbrec_port_binding *binding_rec, struct shash *nat_addresses, @@ -5016,8 +5055,9 @@ send_garp_rarp_update(struct ovsdb_idl_txn *ovnsb_idl_txn, binding_rec->tunnel_key); send_garp_locally(ovnsb_idl_txn, sbrec_mac_binding_by_lport_ip, - local_datapaths, binding_rec, laddrs->ea, - laddrs->ipv4_addrs[i].addr); + sbrec_ecmp_by_nexthop, + local_datapaths, binding_rec, + laddrs->ea, laddrs->ipv4_addrs[i].addr); } free(name); @@ -5086,7 +5126,8 @@ send_garp_rarp_update(struct ovsdb_idl_txn *ovnsb_idl_txn, binding_rec->tunnel_key); if (ip) { send_garp_locally(ovnsb_idl_txn, sbrec_mac_binding_by_lport_ip, - local_datapaths, binding_rec, laddrs.ea, ip); + sbrec_ecmp_by_nexthop, local_datapaths, + binding_rec, laddrs.ea, ip); } destroy_lport_addresses(&laddrs); @@ -6693,6 +6734,7 @@ send_garp_rarp_prepare(struct ovsdb_idl_txn *ovnsb_idl_txn, struct ovsdb_idl_index *sbrec_port_binding_by_datapath, struct ovsdb_idl_index *sbrec_port_binding_by_name, struct ovsdb_idl_index *sbrec_mac_binding_by_lport_ip, + struct ovsdb_idl_index *sbrec_ecmp_by_nexthop, const struct sbrec_ecmp_nexthop_table *ecmp_nh_table, const struct ovsrec_bridge *br_int, const struct sbrec_chassis *chassis, @@ -6754,6 +6796,7 @@ send_garp_rarp_prepare(struct ovsdb_idl_txn *ovnsb_idl_txn, if (pb && !smap_get_bool(&pb->options, "disable_garp_rarp", false)) { send_garp_rarp_update(ovnsb_idl_txn, sbrec_mac_binding_by_lport_ip, + sbrec_ecmp_by_nexthop, local_datapaths, pb, &nat_addresses, garp_max_timeout, garp_continuous); } @@ -6766,8 +6809,9 @@ send_garp_rarp_prepare(struct ovsdb_idl_txn *ovnsb_idl_txn, = lport_lookup_by_name(sbrec_port_binding_by_name, gw_port); if (pb && !smap_get_bool(&pb->options, "disable_garp_rarp", false)) { send_garp_rarp_update(ovnsb_idl_txn, sbrec_mac_binding_by_lport_ip, - local_datapaths, pb, &nat_addresses, - garp_max_timeout, garp_continuous); + sbrec_ecmp_by_nexthop, local_datapaths, pb, + 
&nat_addresses, garp_max_timeout, + garp_continuous); } } @@ -6783,7 +6827,21 @@ send_garp_rarp_prepare(struct ovsdb_idl_txn *ovnsb_idl_txn, SBREC_ECMP_NEXTHOP_TABLE_FOR_EACH (sb_ecmp_nexthop, ecmp_nh_table) { const struct sbrec_port_binding *pb = sb_ecmp_nexthop->port; - if (pb && !strcmp(pb->type, "l3gateway") && pb->chassis == chassis) { + if (!pb || pb->chassis != chassis) { + continue; + } + + /* Update mac binding if not already set. */ + if (!strcmp(sb_ecmp_nexthop->mac, "")) { + const struct sbrec_mac_binding *mb = + mac_binding_lookup(sbrec_mac_binding_by_lport_ip, + pb->logical_port, sb_ecmp_nexthop->nexthop); + if (ovnsb_idl_txn && mb) { + sbrec_ecmp_nexthop_set_mac(sb_ecmp_nexthop, mb->mac); + } + } + + if (!strcmp(pb->type, "l3gateway")) { send_arp_nd_update(pb, sb_ecmp_nexthop->nexthop, max_arp_nd_timeout, continuous_arp_nd); } diff --git a/controller/pinctrl.h b/controller/pinctrl.h index 8459f4f53..5fc956620 100644 --- a/controller/pinctrl.h +++ b/controller/pinctrl.h @@ -50,6 +50,7 @@ void pinctrl_run(struct ovsdb_idl_txn *ovnsb_idl_txn, struct ovsdb_idl_index *sbrec_igmp_groups, struct ovsdb_idl_index *sbrec_ip_multicast_opts, struct ovsdb_idl_index *sbrec_fdb_by_dp_key_mac, + struct ovsdb_idl_index *sbrec_ecmp_by_nexthop, const struct sbrec_controller_event_table *, const struct sbrec_service_monitor_table *, const struct sbrec_mac_binding_table *,
From patchwork Wed Nov 13 11:05:00 2024
X-Patchwork-Submitter: Lorenzo Bianconi
X-Patchwork-Id: 2010711
X-Patchwork-Delegate: dceara@redhat.com
Received: from localhost (net-93-146-37-148.cust.vodafonedsl.it.
[93.146.37.148]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-432d54f6fa0sm21124645e9.11.2024.11.13.03.05.09 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 13 Nov 2024 03:05:09 -0800 (PST) From: Lorenzo Bianconi To: ovs-dev@openvswitch.org Date: Wed, 13 Nov 2024 12:05:00 +0100 Message-ID: X-Mailer: git-send-email 2.47.0 In-Reply-To: References: MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: QHOLluMCFDMUGEa_uU3WVNWfd84vjSWXqhjeZuGJnSI_1731495912 X-Mimecast-Originator: redhat.com Subject: [ovs-dev] [PATCH ovn 4/4] ofctrl: Introduce ecmp_nexthop_monitor. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: dceara@redhat.com Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" Introduce ecmp_nexthop_monitor in ovn-controller in order to track and flush ecmp-symmetric reply ct entires when requested by the CMS (e.g removing the related static ecmp routes). CT entries are flushed using the ethernet mac address stored in ct_label. Signed-off-by: Lorenzo Bianconi --- NEWS | 2 + controller/automake.mk | 4 +- controller/ecmp-next-hop-monitor.c | 184 ++++++++++ controller/ecmp-next-hop-monitor.h | 25 ++ controller/ofctrl.c | 7 + controller/ofctrl.h | 3 + controller/ovn-controller.c | 3 + include/ovn/logical-fields.h | 3 + tests/system-ovn.at | 526 +++++++++++++++++++++++++++++ 9 files changed, 756 insertions(+), 1 deletion(-) create mode 100644 controller/ecmp-next-hop-monitor.c create mode 100644 controller/ecmp-next-hop-monitor.h diff --git a/NEWS b/NEWS index 1f8f54d5d..f46285d32 100644 --- a/NEWS +++ b/NEWS @@ -9,6 +9,8 @@ Post v24.09.0 ECMP-nexthop. By default ovn-controller continuously sends ARP/ND packets for ECMP-nexthop. + - Introduce ovn-controller ECMP_nexthop monitor in order to flush stale ct + entries when related ecmp routes are removed by the CMS. 
 
 OVN v24.09.0 - 13 Sep 2024
 --------------------------
diff --git a/controller/automake.mk b/controller/automake.mk
index bb0bf2d33..766e36382 100644
--- a/controller/automake.mk
+++ b/controller/automake.mk
@@ -51,7 +51,9 @@ controller_ovn_controller_SOURCES = \
 	controller/ct-zone.h \
 	controller/ct-zone.c \
 	controller/ovn-dns.c \
-	controller/ovn-dns.h
+	controller/ovn-dns.h \
+	controller/ecmp-next-hop-monitor.h \
+	controller/ecmp-next-hop-monitor.c
 
 controller_ovn_controller_LDADD = lib/libovn.la $(OVS_LIBDIR)/libopenvswitch.la
 man_MANS += controller/ovn-controller.8
diff --git a/controller/ecmp-next-hop-monitor.c b/controller/ecmp-next-hop-monitor.c
new file mode 100644
index 000000000..bafe9750f
--- /dev/null
+++ b/controller/ecmp-next-hop-monitor.c
@@ -0,0 +1,184 @@
+/* Copyright (c) 2024, Red Hat, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include
+#include "ct-zone.h"
+#include "lib/ovn-util.h"
+#include "lib/simap.h"
+#include "openvswitch/hmap.h"
+#include "openvswitch/ofp-ct.h"
+#include "openvswitch/rconn.h"
+#include "openvswitch/vlog.h"
+#include "ovn/logical-fields.h"
+#include "ovn-sb-idl.h"
+#include "controller/ecmp-next-hop-monitor.h"
+
+VLOG_DEFINE_THIS_MODULE(ecmp_next_hop_monitor);
+
+static struct hmap ecmp_nexthop;
+
+struct ecmp_nexthop_data {
+    struct hmap_node hmap_node;
+    uint16_t zone_id;
+    char *nexthop;
+    char *mac;
+};
+
+void ecmp_nexthop_init(void)
+{
+    hmap_init(&ecmp_nexthop);
+}
+
+static void
+ecmp_nexthop_erase_entry(struct ecmp_nexthop_data *e)
+{
+    free(e->nexthop);
+    free(e->mac);
+    free(e);
+}
+
+static void
+ecmp_nexthop_destroy_map(struct hmap *map)
+{
+    struct ecmp_nexthop_data *e;
+    HMAP_FOR_EACH_POP (e, hmap_node, map) {
+        ecmp_nexthop_erase_entry(e);
+    }
+    hmap_destroy(map);
+}
+
+void ecmp_nexthop_destroy(void)
+{
+    ecmp_nexthop_destroy_map(&ecmp_nexthop);
+}
+
+static struct ecmp_nexthop_data *
+ecmp_nexthop_alloc_entry(const char *nexthop, const char *mac,
+                         const uint16_t zone_id, struct hmap *map)
+{
+    struct ecmp_nexthop_data *e = xmalloc(sizeof *e);
+    e->nexthop = xstrdup(nexthop);
+    e->mac = xstrdup(mac);
+    e->zone_id = zone_id;
+
+    uint32_t hash = hash_string(nexthop, 0);
+    hash = hash_add(hash, hash_string(mac, 0));
+    hash = hash_add(hash, zone_id);
+    hmap_insert(map, &e->hmap_node, hash);
+
+    return e;
+}
+
+static struct ecmp_nexthop_data *
+ecmp_nexthop_find_entry(const char *nexthop, const char *mac,
+                        const uint16_t zone_id, struct hmap *map)
+{
+    uint32_t hash = hash_string(nexthop, 0);
+    hash = hash_add(hash, hash_string(mac, 0));
+    hash = hash_add(hash, zone_id);
+
+    struct ecmp_nexthop_data *e;
+    HMAP_FOR_EACH_WITH_HASH (e, hmap_node, hash, map) {
+        if (!strcmp(e->nexthop, nexthop) &&
+            !strcmp(e->mac, mac) && e->zone_id == zone_id) {
+            return e;
+        }
+    }
+    return NULL;
+}
+
+static void
+ecmp_nexthop_monitor_flush_ct_entry(const struct rconn *swconn,
+                                    const char *mac, uint16_t zone_id,
+                                    struct ovs_list *msgs)
+{
+    struct eth_addr ea;
+    if (!ovs_scan(mac, ETH_ADDR_SCAN_FMT, ETH_ADDR_SCAN_ARGS(ea))) {
+        return;
+    }
+
+    ovs_u128 mask = {
+        /* ct_label.ecmp_reply_eth BITS[32-79] */
+        .u64.hi = OVN_CT_ECMP_ETH_HIGH,
+        .u64.lo = OVN_CT_ECMP_ETH_LOW,
+    };
+
+    ovs_be32 lo = get_unaligned_be32((void *)&ea.be16[1]);
+    ovs_u128 nexthop = {
+        .u64.hi = ntohs(ea.be16[0]),
+        .u64.lo = (uint64_t) ntohl(lo) << 32,
+    };
+
+    struct ofp_ct_match match = {
+        .labels = nexthop,
+        .labels_mask = mask,
+    };
+    struct ofpbuf *msg = ofp_ct_match_encode(&match, &zone_id,
+                                             rconn_get_version(swconn));
+    ovs_list_push_back(msgs, &msg->list_node);
+}
+
+void
+ecmp_nexthop_monitor_run(const struct sbrec_ecmp_nexthop_table *enh_table,
+                         const struct shash *current_ct_zones,
+                         const struct rconn *swconn, struct ovs_list *msgs)
+{
+    struct hmap sb_ecmp_nexthop = HMAP_INITIALIZER(&sb_ecmp_nexthop);
+
+    const struct sbrec_ecmp_nexthop *sbrec_ecmp_nexthop;
+    SBREC_ECMP_NEXTHOP_TABLE_FOR_EACH (sbrec_ecmp_nexthop, enh_table) {
+        struct sbrec_port_binding *pb = sbrec_ecmp_nexthop->port;
+        if (!pb) {
+            continue;
+        }
+
+        const char *dp_name = smap_get(&pb->datapath->external_ids, "name");
+        if (!dp_name) {
+            continue;
+        }
+
+        char *name = xasprintf("%s_dnat", dp_name);
+        struct ct_zone *ct_zone = shash_find_data(current_ct_zones, name);
+        free(name);
+
+        if (!ct_zone) {
+            continue;
+        }
+
+        if (!ecmp_nexthop_find_entry(sbrec_ecmp_nexthop->nexthop,
+                                     sbrec_ecmp_nexthop->mac, ct_zone->zone,
+                                     &ecmp_nexthop)) {
+            ecmp_nexthop_alloc_entry(sbrec_ecmp_nexthop->nexthop,
+                                     sbrec_ecmp_nexthop->mac,
+                                     ct_zone->zone, &ecmp_nexthop);
+        }
+        ecmp_nexthop_alloc_entry(sbrec_ecmp_nexthop->nexthop,
+                                 sbrec_ecmp_nexthop->mac, ct_zone->zone,
+                                 &sb_ecmp_nexthop);
+    }
+
+    struct ecmp_nexthop_data *e;
+    HMAP_FOR_EACH_SAFE (e, hmap_node, &ecmp_nexthop) {
+        if (!ecmp_nexthop_find_entry(e->nexthop, e->mac, e->zone_id,
+                                     &sb_ecmp_nexthop)) {
+            ecmp_nexthop_monitor_flush_ct_entry(swconn, e->mac,
+                                                e->zone_id, msgs);
+            hmap_remove(&ecmp_nexthop, &e->hmap_node);
+            ecmp_nexthop_erase_entry(e);
+        }
+    }
+
+    ecmp_nexthop_destroy_map(&sb_ecmp_nexthop);
+}
diff --git a/controller/ecmp-next-hop-monitor.h b/controller/ecmp-next-hop-monitor.h
new file mode 100644
index 000000000..ee8278e3b
--- /dev/null
+++ b/controller/ecmp-next-hop-monitor.h
@@ -0,0 +1,25 @@
+/* Copyright (c) 2024, Red Hat, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef OVN_CMP_NEXT_HOP_MONITOR_H
+#define OVN_CMP_NEXT_HOP_MONITOR_H
+
+void ecmp_nexthop_init(void);
+void ecmp_nexthop_destroy(void);
+void ecmp_nexthop_monitor_run(const struct sbrec_ecmp_nexthop_table *enh_table,
+                              const struct shash *current_ct_zones,
+                              const struct rconn *swconn,
+                              struct ovs_list *msgs);
+#endif /* OVN_CMP_NEXT_HOP_MONITOR_H */
diff --git a/controller/ofctrl.c b/controller/ofctrl.c
index f9387d375..e44da749d 100644
--- a/controller/ofctrl.c
+++ b/controller/ofctrl.c
@@ -54,6 +54,7 @@
 #include "vswitch-idl.h"
 #include "ovn-sb-idl.h"
 #include "ct-zone.h"
+#include "ecmp-next-hop-monitor.h"
 
 VLOG_DEFINE_THIS_MODULE(ofctrl);
 
@@ -425,6 +426,7 @@ ofctrl_init(struct ovn_extend_table *group_table,
     tx_counter = rconn_packet_counter_create();
     hmap_init(&installed_lflows);
     hmap_init(&installed_pflows);
+    ecmp_nexthop_init();
     ovs_list_init(&flow_updates);
     ovn_init_symtab(&symtab);
     groups = group_table;
@@ -877,6 +879,7 @@ ofctrl_destroy(void)
     expr_symtab_destroy(&symtab);
     shash_destroy(&symtab);
     ofctrl_meter_bands_destroy();
+    ecmp_nexthop_destroy();
 }
 
 uint64_t
@@ -2662,8 +2665,10 @@ void
 ofctrl_put(struct ovn_desired_flow_table *lflow_table,
            struct ovn_desired_flow_table *pflow_table,
            struct shash *pending_ct_zones,
+           struct shash *current_ct_zones,
            struct hmap *pending_lb_tuples,
            struct ovsdb_idl_index *sbrec_meter_by_name,
+           const struct sbrec_ecmp_nexthop_table *enh_table,
            uint64_t req_cfg,
            bool lflows_changed,
            bool pflows_changed)
@@ -2704,6 +2709,8 @@ ofctrl_put(struct ovn_desired_flow_table *lflow_table,
     /* OpenFlow messages to send to the switch to bring it up-to-date. */
     struct ovs_list msgs = OVS_LIST_INITIALIZER(&msgs);
 
+    ecmp_nexthop_monitor_run(enh_table, current_ct_zones, swconn, &msgs);
+
     /* Iterate through ct zones that need to be flushed. */
     struct shash_node *iter;
     SHASH_FOR_EACH(iter, pending_ct_zones) {
diff --git a/controller/ofctrl.h b/controller/ofctrl.h
index 129e3b6ad..5735cd553 100644
--- a/controller/ofctrl.h
+++ b/controller/ofctrl.h
@@ -31,6 +31,7 @@ struct ofpbuf;
 struct ovsrec_bridge;
 struct ovsrec_open_vswitch_table;
 struct sbrec_meter_table;
+struct sbrec_ecmp_nexthop_table;
 struct shash;
 
 struct ovn_desired_flow_table {
@@ -57,8 +58,10 @@ enum mf_field_id ofctrl_get_mf_field_id(void);
 void ofctrl_put(struct ovn_desired_flow_table *lflow_table,
                 struct ovn_desired_flow_table *pflow_table,
                 struct shash *pending_ct_zones,
+                struct shash *current_ct_zones,
                 struct hmap *pending_lb_tuples,
                 struct ovsdb_idl_index *sbrec_meter_by_name,
+                const struct sbrec_ecmp_nexthop_table *enh_table,
                 uint64_t nb_cfg,
                 bool lflow_changed,
                 bool pflow_changed);
diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index 6cee6450d..4b05077d3 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -5821,8 +5821,11 @@ main(int argc, char *argv[])
                         ofctrl_put(&lflow_output_data->flow_table,
                                    &pflow_output_data->flow_table,
                                    &ct_zones_data->ctx.pending,
+                                   &ct_zones_data->ctx.current,
                                    &lb_data->removed_tuples,
                                    sbrec_meter_by_name,
+                                   sbrec_ecmp_nexthop_table_get(
+                                       ovnsb_idl_loop.idl),
                                    ofctrl_seqno_get_req_cfg(),
                                    engine_node_changed(&en_lflow_output),
                                    engine_node_changed(&en_pflow_output));
diff --git a/include/ovn/logical-fields.h b/include/ovn/logical-fields.h
index d563e044c..a024b0cd3 100644
--- a/include/ovn/logical-fields.h
+++ b/include/ovn/logical-fields.h
@@ -212,6 +212,9 @@ const struct ovn_field *ovn_field_from_name(const char *name);
 #define OVN_CT_ECMP_ETH_1ST_BIT 32
 #define OVN_CT_ECMP_ETH_END_BIT 79
 
+#define OVN_CT_ECMP_ETH_LOW (((1ULL << OVN_CT_ECMP_ETH_1ST_BIT) - 1) << 32)
+#define OVN_CT_ECMP_ETH_HIGH ((1ULL << (OVN_CT_ECMP_ETH_END_BIT - 63)) - 1)
+
 #define OVN_CT_STR(LABEL_VALUE) OVS_STRINGIZE(LABEL_VALUE)
 #define OVN_CT_MASKED_STR(LABEL_VALUE) \
     OVS_STRINGIZE(LABEL_VALUE) "/" OVS_STRINGIZE(LABEL_VALUE)
diff --git a/tests/system-ovn.at b/tests/system-ovn.at
index 6dfc3055a..e9d15898f 100644
--- a/tests/system-ovn.at
+++ b/tests/system-ovn.at
@@ -14002,3 +14002,529 @@ OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d
 /.*terminating with signal 15.*/d"])
 AT_CLEANUP
 ])
+
+OVN_FOR_EACH_NORTHD([
+AT_SETUP([ECMP Flush CT entries - IPv4])
+AT_KEYWORDS([ecmp])
+ovn_start
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_BR([br-int])
+ADD_BR([br-ext])
+ADD_BR([br-ecmp])
+
+ovs-ofctl add-flow br-ext action=normal
+ovs-ofctl add-flow br-ecmp action=normal
+# Set external-ids in br-int needed for ovn-controller
+ovs-vsctl \
+        -- set Open_vSwitch . external-ids:system-id=hv1 \
+        -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
+        -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \
+        -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \
+        -- set bridge br-int fail-mode=secure other-config:disable-in-band=true
+
+# Start ovn-controller
+start_daemon ovn-controller
+ovs-vsctl set Open_vSwitch . external-ids:arp-max-timeout-sec=1
+
+check ovn-nbctl lr-add R1
+check ovn-nbctl set logical_router R1 options:chassis=hv1
+check ovn-nbctl lr-add R2
+check ovn-nbctl set logical_router R2 options:chassis=hv1
+
+check ovn-nbctl ls-add sw0
+check ovn-nbctl ls-add sw1
+check ovn-nbctl ls-add public
+
+check ovn-nbctl lrp-add R1 rp-sw0 00:00:01:01:02:03 192.168.1.1/24
+check ovn-nbctl lrp-add R1 rp-public1 00:00:02:01:02:03 172.16.1.1/24
+
+check ovn-nbctl lrp-add R2 rp-sw1 00:00:03:01:02:03 192.168.2.1/24
+check ovn-nbctl lrp-add R2 rp-public2 00:00:04:01:02:03 172.16.1.5/24
+
+check ovn-nbctl lsp-add sw0 sw0-rp -- set Logical_Switch_Port sw0-rp \
+    type=router options:router-port=rp-sw0 \
+    -- lsp-set-addresses sw0-rp router
+
+check ovn-nbctl lsp-add sw1 sw1-rp -- set Logical_Switch_Port sw1-rp \
+    type=router options:router-port=rp-sw1 \
+    -- lsp-set-addresses sw1-rp router
+
+check ovn-nbctl lsp-add public public-rp1 -- set Logical_Switch_Port public-rp1 \
+    type=router options:router-port=rp-public1 \
+    -- lsp-set-addresses public-rp1 router
+
+check ovn-nbctl lsp-add public public-rp2 -- set Logical_Switch_Port public-rp2 \
+    type=router options:router-port=rp-public2 \
+    -- lsp-set-addresses public-rp2 router
+
+ADD_NAMESPACES(alice)
+ADD_VETH(alice, alice, br-int, "192.168.1.2/24", "f0:00:00:01:02:03", \
+         "192.168.1.1")
+check ovn-nbctl lsp-add sw0 alice \
+    -- lsp-set-addresses alice "f0:00:00:01:02:03 192.168.1.2"
+
+ADD_NAMESPACES(peter)
+ADD_VETH(peter, peter, br-int, "192.168.2.2/24", "f0:00:02:01:02:03", \
+         "192.168.2.1")
+check ovn-nbctl lsp-add sw1 peter \
+    -- lsp-set-addresses peter "f0:00:02:01:02:03 192.168.2.2"
+
+check ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=phynet:br-ext
+check ovn-nbctl lsp-add public public1 \
+    -- lsp-set-addresses public1 unknown \
+    -- lsp-set-type public1 localnet \
+    -- lsp-set-options public1 network_name=phynet
+
+ADD_NAMESPACES(ecmp-path0)
+ADD_VETH(ecmp-p01, ecmp-path0, br-ext, "172.16.1.2/24", "f0:00:00:01:02:04", "172.16.1.1")
+ADD_VETH(ecmp-p02, ecmp-path0, br-ecmp, "172.16.2.2/24", "f0:00:00:01:03:04")
+
+ADD_NAMESPACES(ecmp-path1)
+ADD_VETH(ecmp-p11, ecmp-path1, br-ext, "172.16.1.3/24", "f0:00:00:01:02:05", "172.16.1.1")
+ADD_VETH(ecmp-p12, ecmp-path1, br-ecmp, "172.16.2.3/24", "f0:00:00:01:03:05")
+
+ADD_NAMESPACES(bob)
+ADD_VETH(bob, bob, br-ecmp, "172.16.2.10/24", "f0:00:00:01:02:06", "172.16.2.2")
+
+check ovn-nbctl --ecmp-symmetric-reply lr-route-add R1 172.16.2.0/24 172.16.1.2
+check ovn-nbctl --ecmp-symmetric-reply lr-route-add R1 172.16.2.0/24 172.16.1.3
+
+wait_for_ports_up
+check ovn-nbctl --wait=hv sync
+NETNS_DAEMONIZE([alice], [nc -l -k 80], [alice.pid])
+NETNS_DAEMONIZE([peter], [nc -l -k 80], [peter.pid])
+
+NS_CHECK_EXEC([bob], [ping -q -c 3 -i 0.3 -w 2 192.168.1.2 | FORMAT_PING], \
+[0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([bob], [nc -z 192.168.1.2 80], [0])
+
+wait_row_count ECMP_Nexthop 2
+wait_column 'f0:00:00:01:02:04' ECMP_Nexthop mac nexthop='172.16.1.2'
+wait_column 'f0:00:00:01:02:05' ECMP_Nexthop mac nexthop='172.16.1.3'
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(172.16.2.10) | \
+sed -e 's/zone=[[0-9]]*/zone=/' |
+sed -e 's/mark=[[0-9]]*/mark=/' | sort], [0], [dnl
+icmp,orig=(src=172.16.2.10,dst=192.168.1.2,id=,type=8,code=0),reply=(src=192.168.1.2,dst=172.16.2.10,id=,type=0,code=0),zone=,mark=,labels=0xf0000001020400000000
+tcp,orig=(src=172.16.2.10,dst=192.168.1.2,sport=,dport=),reply=(src=192.168.1.2,dst=172.16.2.10,sport=,dport=),zone=,mark=,labels=0xf0000001020400000000,protoinfo=(state=)
+])
+
+# Change bob default IP address
+NS_CHECK_EXEC([bob], [ip route del 0.0.0.0/0 via 172.16.2.2])
+NS_CHECK_EXEC([bob], [ip route add 0.0.0.0/0 via 172.16.2.3])
+
+NS_CHECK_EXEC([bob], [ping -q -c 3 -i 0.3 -w 2 192.168.1.2 | FORMAT_PING], \
+[0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([bob], [nc -z 192.168.1.2 80], [0])
+
+wait_row_count ECMP_Nexthop 2
+check_column 'f0:00:00:01:02:04' ECMP_Nexthop mac nexthop='172.16.1.2'
+check_column 'f0:00:00:01:02:05' ECMP_Nexthop mac nexthop='172.16.1.3'
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(172.16.2.10) | \
+sed -e 's/zone=[[0-9]]*/zone=/' |
+sed -e 's/mark=[[0-9]]*/mark=/' | sort], [0], [dnl
+icmp,orig=(src=172.16.2.10,dst=192.168.1.2,id=,type=8,code=0),reply=(src=192.168.1.2,dst=172.16.2.10,id=,type=0,code=0),zone=,mark=,labels=0xf0000001020400000000
+icmp,orig=(src=172.16.2.10,dst=192.168.1.2,id=,type=8,code=0),reply=(src=192.168.1.2,dst=172.16.2.10,id=,type=0,code=0),zone=,mark=,labels=0xf0000001020500000000
+tcp,orig=(src=172.16.2.10,dst=192.168.1.2,sport=,dport=),reply=(src=192.168.1.2,dst=172.16.2.10,sport=,dport=),zone=,mark=,labels=0xf0000001020400000000,protoinfo=(state=)
+tcp,orig=(src=172.16.2.10,dst=192.168.1.2,sport=,dport=),reply=(src=192.168.1.2,dst=172.16.2.10,sport=,dport=),zone=,mark=,labels=0xf0000001020500000000,protoinfo=(state=)
+])
+
+# Remove first ECMP route
+check ovn-nbctl lr-route-del R1 172.16.2.0/24 172.16.1.2
+check ovn-nbctl --wait=hv sync
+wait_row_count ECMP_Nexthop 1
+check_column 'f0:00:00:01:02:05' ECMP_Nexthop mac nexthop='172.16.1.3'
+
+ovn-sbctl list ECMP_Nexthop > /tmp/ecmp-nh
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(172.16.2.10) | \
+sed -e 's/zone=[[0-9]]*/zone=/' |
+sed -e 's/mark=[[0-9]]*/mark=/' | sort], [0], [dnl
+icmp,orig=(src=172.16.2.10,dst=192.168.1.2,id=,type=8,code=0),reply=(src=192.168.1.2,dst=172.16.2.10,id=,type=0,code=0),zone=,mark=,labels=0xf0000001020500000000
+tcp,orig=(src=172.16.2.10,dst=192.168.1.2,sport=,dport=),reply=(src=192.168.1.2,dst=172.16.2.10,sport=,dport=),zone=,mark=,labels=0xf0000001020500000000,protoinfo=(state=)
+])
+
+# Add the route back and verify we do not flush if we have multiple next-hops with the same mac address
+check ovn-nbctl --ecmp-symmetric-reply lr-route-add R1 172.16.2.0/24 172.16.1.2
+wait_row_count ECMP_Nexthop 2
+wait_column 'f0:00:00:01:02:04' ECMP_Nexthop mac nexthop='172.16.1.2'
+wait_column 'f0:00:00:01:02:05' ECMP_Nexthop mac nexthop='172.16.1.3'
+
+NS_CHECK_EXEC([ecmp-path0], [ip link set dev ecmp-p01 address f0:00:00:01:02:05])
+wait_column 'f0:00:00:01:02:05' ECMP_Nexthop mac nexthop='172.16.1.2'
+
+# Change bob default IP address
+NS_CHECK_EXEC([bob], [ip route del 0.0.0.0/0 via 172.16.2.3])
+NS_CHECK_EXEC([bob], [ip route add 0.0.0.0/0 via 172.16.2.2])
+
+NS_CHECK_EXEC([bob], [ping -q -c 3 -i 0.3 -w 2 192.168.1.2 | FORMAT_PING], \
+[0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([bob], [nc -z 192.168.1.2 80], [0])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(172.16.2.10) | \
+sed -e 's/zone=[[0-9]]*/zone=/' |
+sed -e 's/mark=[[0-9]]*/mark=/' | sort], [0], [dnl
+icmp,orig=(src=172.16.2.10,dst=192.168.1.2,id=,type=8,code=0),reply=(src=192.168.1.2,dst=172.16.2.10,id=,type=0,code=0),zone=,mark=,labels=0xf0000001020500000000
+tcp,orig=(src=172.16.2.10,dst=192.168.1.2,sport=,dport=),reply=(src=192.168.1.2,dst=172.16.2.10,sport=,dport=),zone=,mark=,labels=0xf0000001020500000000,protoinfo=(state=)
+])
+
+# Remove first ECMP route
+check ovn-nbctl lr-route-del R1 172.16.2.0/24 172.16.1.2
+check ovn-nbctl --wait=hv sync
+wait_row_count ECMP_Nexthop 1
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(172.16.2.10) | \
+sed -e 's/zone=[[0-9]]*/zone=/' |
+sed -e 's/mark=[[0-9]]*/mark=/' | sort], [0], [dnl
+])
+
+# Remove second ECMP route
+check ovn-nbctl lr-route-del R1
+check ovn-nbctl --wait=hv sync
+wait_row_count ECMP_Nexthop 0
+
+NS_CHECK_EXEC([ecmp-path0], [ip link set dev ecmp-p01 address f0:00:00:01:02:06])
+
+check ovn-nbctl --ecmp-symmetric-reply lr-route-add R1 172.16.2.0/24 172.16.1.2
+check ovn-nbctl --ecmp-symmetric-reply lr-route-add R1 172.16.2.0/24 172.16.1.3
+
+check ovn-nbctl --ecmp-symmetric-reply lr-route-add R2 172.16.2.0/24 172.16.1.2
+check ovn-nbctl --ecmp-symmetric-reply lr-route-add R2 172.16.2.0/24 172.16.1.3
+
+check ovn-nbctl --wait=hv sync
+wait_row_count ECMP_Nexthop 4
+
+NS_CHECK_EXEC([ecmp-path0], [ip route add 192.168.2.2/32 via 172.16.1.5])
+NS_CHECK_EXEC([ecmp-path1], [ip route add 192.168.2.2/32 via 172.16.1.5])
+
+NS_CHECK_EXEC([bob], [ping -q -c 3 -i 0.3 -w 2 192.168.1.2 | FORMAT_PING], \
+[0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([bob], [nc -z 192.168.1.2 80], [0])
+
+NS_CHECK_EXEC([bob], [ping -q -c 3 -i 0.3 -w 2 192.168.2.2 | FORMAT_PING], \
+[0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([bob], [nc -z 192.168.2.2 80], [0])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(172.16.2.10) | \
+sed -e 's/zone=[[0-9]]*/zone=/' |
+sed -e 's/mark=[[0-9]]*/mark=/' | sort], [0], [dnl
+icmp,orig=(src=172.16.2.10,dst=192.168.1.2,id=,type=8,code=0),reply=(src=192.168.1.2,dst=172.16.2.10,id=,type=0,code=0),zone=,mark=,labels=0xf0000001020600000000
+icmp,orig=(src=172.16.2.10,dst=192.168.2.2,id=,type=8,code=0),reply=(src=192.168.2.2,dst=172.16.2.10,id=,type=0,code=0),zone=,mark=,labels=0xf0000001020600000000
+tcp,orig=(src=172.16.2.10,dst=192.168.1.2,sport=,dport=),reply=(src=192.168.1.2,dst=172.16.2.10,sport=,dport=),zone=,mark=,labels=0xf0000001020600000000,protoinfo=(state=)
+tcp,orig=(src=172.16.2.10,dst=192.168.2.2,sport=,dport=),reply=(src=192.168.2.2,dst=172.16.2.10,sport=,dport=),zone=,mark=,labels=0xf0000001020600000000,protoinfo=(state=)
+])
+
+check ovn-nbctl lr-route-del R1
+check ovn-nbctl --wait=hv sync
+wait_row_count ECMP_Nexthop 2
+wait_column 'f0:00:00:01:02:06' ECMP_Nexthop mac nexthop='172.16.1.2'
+wait_column 'f0:00:00:01:02:05' ECMP_Nexthop mac nexthop='172.16.1.3'
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(172.16.2.10) | \
+sed -e 's/zone=[[0-9]]*/zone=/' |
+sed -e 's/mark=[[0-9]]*/mark=/' | sort], [0], [dnl
+icmp,orig=(src=172.16.2.10,dst=192.168.2.2,id=,type=8,code=0),reply=(src=192.168.2.2,dst=172.16.2.10,id=,type=0,code=0),zone=,mark=,labels=0xf0000001020600000000
+tcp,orig=(src=172.16.2.10,dst=192.168.2.2,sport=,dport=),reply=(src=192.168.2.2,dst=172.16.2.10,sport=,dport=),zone=,mark=,labels=0xf0000001020600000000,protoinfo=(state=)
+])
+
+check ovn-nbctl lr-route-del R2
+check ovn-nbctl --wait=hv sync
+wait_row_count ECMP_Nexthop 0
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(172.16.2.10) | \
+sed -e 's/zone=[[0-9]]*/zone=/' |
+sed -e 's/mark=[[0-9]]*/mark=/' | sort], [0], [dnl
+])
+
+OVS_APP_EXIT_AND_WAIT([ovn-controller])
+
+as ovn-sb
+OVS_APP_EXIT_AND_WAIT([ovsdb-server])
+
+as ovn-nb
+OVS_APP_EXIT_AND_WAIT([ovsdb-server])
+
+as northd
+OVS_APP_EXIT_AND_WAIT([ovn-northd])
+
+as
+OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d
+/.*terminating with signal 15.*/d"])
+AT_CLEANUP
+])
+
+OVN_FOR_EACH_NORTHD([
+AT_SETUP([ECMP Flush CT entries - IPv6])
+AT_KEYWORDS([ecmp])
+ovn_start
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_BR([br-int])
+ADD_BR([br-ext])
+ADD_BR([br-ecmp])
+
+ovs-ofctl add-flow br-ext action=normal
+ovs-ofctl add-flow br-ecmp action=normal
+# Set external-ids in br-int needed for ovn-controller
+ovs-vsctl \
+        -- set Open_vSwitch . external-ids:system-id=hv1 \
+        -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
+        -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \
+        -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \
+        -- set bridge br-int fail-mode=secure other-config:disable-in-band=true
+
+# Start ovn-controller
+start_daemon ovn-controller
+ovs-vsctl set Open_vSwitch . external-ids:arp-max-timeout-sec=1
+
+check ovn-nbctl lr-add R1
+check ovn-nbctl set logical_router R1 options:chassis=hv1
+check ovn-nbctl lr-add R2
+check ovn-nbctl set logical_router R2 options:chassis=hv1
+
+check ovn-nbctl ls-add sw0
+check ovn-nbctl ls-add sw1
+check ovn-nbctl ls-add public
+
+check ovn-nbctl lrp-add R1 rp-sw0 00:00:01:01:02:03 fd11::1/64
+check ovn-nbctl lrp-add R1 rp-public1 00:00:02:01:02:03 fd12::1/64
+
+check ovn-nbctl lrp-add R2 rp-sw1 00:00:03:01:02:03 fd14::1/64
+check ovn-nbctl lrp-add R2 rp-public2 00:00:04:01:02:03 fd12::5/64
+
+check ovn-nbctl lsp-add sw0 sw0-rp -- set Logical_Switch_Port sw0-rp \
+    type=router options:router-port=rp-sw0 \
+    -- lsp-set-addresses sw0-rp router
+
+check ovn-nbctl lsp-add sw1 sw1-rp -- set Logical_Switch_Port sw1-rp \
+    type=router options:router-port=rp-sw1 \
+    -- lsp-set-addresses sw1-rp router
+
+check ovn-nbctl lsp-add public public-rp1 -- set Logical_Switch_Port public-rp1 \
+    type=router options:router-port=rp-public1 \
+    -- lsp-set-addresses public-rp1 router
+
+check ovn-nbctl lsp-add public public-rp2 -- set Logical_Switch_Port public-rp2 \
+    type=router options:router-port=rp-public2 \
+    -- lsp-set-addresses public-rp2 router
+
+ADD_NAMESPACES(alice)
+ADD_VETH(alice, alice, br-int, "fd11::2/64", "f0:00:00:01:02:03", "fd11::1", "nodad")
+check ovn-nbctl lsp-add sw0 alice -- lsp-set-addresses alice "f0:00:00:01:02:03 fd11::2"
+
+ADD_NAMESPACES(peter)
+ADD_VETH(peter, peter, br-int, "fd14::2/64", "f0:00:02:01:02:03", "fd14::1", "nodad")
+check ovn-nbctl lsp-add sw1 peter -- lsp-set-addresses peter "f0:00:02:01:02:03 fd14::2"
+
+check ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=phynet:br-ext
+check ovn-nbctl lsp-add public public1 \
+    -- lsp-set-addresses public1 unknown \
+    -- lsp-set-type public1 localnet \
+    -- lsp-set-options public1 network_name=phynet
+
+ADD_NAMESPACES(ecmp-path0)
+ADD_VETH(ecmp-p01, ecmp-path0, br-ext, "fd12::2/64", "f0:00:00:01:02:04", "fd12::1", "nodad")
+ADD_VETH(ecmp-p02, ecmp-path0, br-ecmp, "fd13::2/64", "f0:00:00:01:03:04")
+OVS_WAIT_UNTIL([NS_EXEC([ecmp-path0], [ip a show dev ecmp-p02 | grep "fe80::" | grep -v tentative])])
+
+ADD_NAMESPACES(ecmp-path1)
+ADD_VETH(ecmp-p11, ecmp-path1, br-ext, "fd12::3/64", "f0:00:00:01:02:05", "fd12::1", "nodad")
+ADD_VETH(ecmp-p12, ecmp-path1, br-ecmp, "fd13::3/64", "f0:00:00:01:03:05")
+OVS_WAIT_UNTIL([NS_EXEC([ecmp-path1], [ip a show dev ecmp-p12 | grep "fe80::" | grep -v tentative])])
+
+ADD_NAMESPACES(bob)
+ADD_VETH(bob, bob, br-ecmp, "fd13::a/64", "f0:00:00:01:02:06", "fd13::2", "nodad")
+
+check ovn-nbctl --ecmp-symmetric-reply lr-route-add R1 fd13::/64 fd12::2
+check ovn-nbctl --ecmp-symmetric-reply lr-route-add R1 fd13::/64 fd12::3
+
+NS_CHECK_EXEC([ecmp-path0], [sysctl -w net.ipv6.conf.all.forwarding=1],[0], [dnl
+net.ipv6.conf.all.forwarding = 1
+])
+NS_CHECK_EXEC([ecmp-path1], [sysctl -w net.ipv6.conf.all.forwarding=1],[0], [dnl
+net.ipv6.conf.all.forwarding = 1
+])
+
+ovn-nbctl --wait=hv sync
+NETNS_DAEMONIZE([alice], [nc -6 -l -k 80], [alice.pid])
+NETNS_DAEMONIZE([peter], [nc -6 -l -k 80], [peter.pid])
+
+NS_CHECK_EXEC([bob], [ping6 -q -c 3 -i 0.3 -w 2 fd11::2 | FORMAT_PING], \
+[0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+NS_CHECK_EXEC([bob], [nc -6 -z fd11::2 80], [0])
+
+wait_row_count ECMP_Nexthop 2
+wait_column 'f0:00:00:01:02:04' ECMP_Nexthop mac nexthop='"fd12::2"'
+wait_column 'f0:00:00:01:02:05' ECMP_Nexthop mac nexthop='"fd12::3"'
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fd13::a) | \
+sed -e 's/zone=[[0-9]]*/zone=/' |
+sed -e 's/mark=[[0-9]]*/mark=/' | sort], [0], [dnl
+icmpv6,orig=(src=fd13::a,dst=fd11::2,id=,type=128,code=0),reply=(src=fd11::2,dst=fd13::a,id=,type=129,code=0),zone=,mark=,labels=0xf0000001020400000000 +tcp,orig=(src=fd13::a,dst=fd11::2,sport=,dport=),reply=(src=fd11::2,dst=fd13::a,sport=,dport=),zone=,mark=,labels=0xf0000001020400000000,protoinfo=(state=) +]) + +# Change bob default IP address +NS_CHECK_EXEC([bob], [ip -6 route del ::/0 via fd13::2]) +NS_CHECK_EXEC([bob], [ip -6 route add ::/0 via fd13::3]) + +NS_CHECK_EXEC([bob], [ping -6 -q -c 3 -i 0.3 -w 2 fd11::2 | FORMAT_PING], \ +[0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +NS_CHECK_EXEC([bob], [nc -6 -z fd11::2 80], [0]) + +wait_row_count ECMP_Nexthop 2 +check_column 'f0:00:00:01:02:04' ECMP_Nexthop mac nexthop='"fd12::2"' +check_column 'f0:00:00:01:02:05' ECMP_Nexthop mac nexthop='"fd12::3"' + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fd13::a) | \ +sed -e 's/zone=[[0-9]]*/zone=/' | +sed -e 's/mark=[[0-9]]*/mark=/' | sort], [0], [dnl +icmpv6,orig=(src=fd13::a,dst=fd11::2,id=,type=128,code=0),reply=(src=fd11::2,dst=fd13::a,id=,type=129,code=0),zone=,mark=,labels=0xf0000001020400000000 +icmpv6,orig=(src=fd13::a,dst=fd11::2,id=,type=128,code=0),reply=(src=fd11::2,dst=fd13::a,id=,type=129,code=0),zone=,mark=,labels=0xf0000001020500000000 +tcp,orig=(src=fd13::a,dst=fd11::2,sport=,dport=),reply=(src=fd11::2,dst=fd13::a,sport=,dport=),zone=,mark=,labels=0xf0000001020400000000,protoinfo=(state=) +tcp,orig=(src=fd13::a,dst=fd11::2,sport=,dport=),reply=(src=fd11::2,dst=fd13::a,sport=,dport=),zone=,mark=,labels=0xf0000001020500000000,protoinfo=(state=) +]) + +# Remove first ECMP route +check ovn-nbctl lr-route-del R1 fd13::/64 fd12::2 +check ovn-nbctl --wait=hv sync +wait_row_count ECMP_Nexthop 1 +check_column 'f0:00:00:01:02:05' ECMP_Nexthop mac nexthop='"fd12::3"' + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fd13::a) | \ +sed -e 's/zone=[[0-9]]*/zone=/' | +sed -e 's/mark=[[0-9]]*/mark=/' | sort], [0], [dnl 
+icmpv6,orig=(src=fd13::a,dst=fd11::2,id=,type=128,code=0),reply=(src=fd11::2,dst=fd13::a,id=,type=129,code=0),zone=,mark=,labels=0xf0000001020500000000 +tcp,orig=(src=fd13::a,dst=fd11::2,sport=,dport=),reply=(src=fd11::2,dst=fd13::a,sport=,dport=),zone=,mark=,labels=0xf0000001020500000000,protoinfo=(state=) +]) + + Add the route back and verify we do not flush if we have multiple next-hops with the same mac address +check ovn-nbctl --ecmp-symmetric-reply lr-route-add R1 fd13::/64 fd12::2 +wait_row_count ECMP_Nexthop 2 +wait_column 'f0:00:00:01:02:04' ECMP_Nexthop mac nexthop='"fd12::2"' +wait_column 'f0:00:00:01:02:05' ECMP_Nexthop mac nexthop='"fd12::3"' +# +NS_CHECK_EXEC([ecmp-path0], [ip link set dev ecmp-p01 address f0:00:00:01:02:05]) +wait_column 'f0:00:00:01:02:05' ECMP_Nexthop mac nexthop='"fd12::2"' + +# Change bob default IP address +NS_CHECK_EXEC([bob], [ip -6 route del ::/0 via fd13::3]) +NS_CHECK_EXEC([bob], [ip -6 route add ::/0 via fd13::2]) + +NS_CHECK_EXEC([bob], [ping -6 -q -c 3 -i 0.3 -w 2 fd11::2 | FORMAT_PING], \ +[0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +NS_CHECK_EXEC([bob], [nc -6 -z fd11::2 80], [0]) + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fd13::a) | \ +sed -e 's/zone=[[0-9]]*/zone=/' | +sed -e 's/mark=[[0-9]]*/mark=/' | sort], [0], [dnl +icmpv6,orig=(src=fd13::a,dst=fd11::2,id=,type=128,code=0),reply=(src=fd11::2,dst=fd13::a,id=,type=129,code=0),zone=,mark=,labels=0xf0000001020500000000 +tcp,orig=(src=fd13::a,dst=fd11::2,sport=,dport=),reply=(src=fd11::2,dst=fd13::a,sport=,dport=),zone=,mark=,labels=0xf0000001020500000000,protoinfo=(state=) +]) + +# Remove first ECMP route +check ovn-nbctl lr-route-del R1 fd13::/64 fd12::2 +check ovn-nbctl --wait=hv sync +wait_row_count ECMP_Nexthop 1 + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fd13::a) | \ +sed -e 's/zone=[[0-9]]*/zone=/' | +sed -e 's/mark=[[0-9]]*/mark=/' | sort], [0], [dnl +]) + +# Remove second ECMP route +check ovn-nbctl 
lr-route-del R1 +check ovn-nbctl --wait=hv sync +wait_row_count ECMP_Nexthop 0 + +NS_CHECK_EXEC([ecmp-path0], [ip link set dev ecmp-p01 address f0:00:00:01:02:06]) + +check ovn-nbctl --ecmp-symmetric-reply lr-route-add R1 fd13::/64 fd12::2 +check ovn-nbctl --ecmp-symmetric-reply lr-route-add R1 fd13::/64 fd12::3 + +check ovn-nbctl --ecmp-symmetric-reply lr-route-add R2 fd13::/64 fd12::2 +check ovn-nbctl --ecmp-symmetric-reply lr-route-add R2 fd13::/64 fd12::3 + +check ovn-nbctl --wait=hv sync +wait_row_count ECMP_Nexthop 4 + +NS_CHECK_EXEC([ecmp-path0], [ip route add fd14::2/128 via fd12::5]) +NS_CHECK_EXEC([ecmp-path1], [ip route add fd14::2/128 via fd12::5]) + +NS_CHECK_EXEC([bob], [ping -6 -q -c 3 -i 0.3 -w 2 fd11::2 | FORMAT_PING], \ +[0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +NS_CHECK_EXEC([bob], [nc -6 -z fd11::2 80], [0]) + +NS_CHECK_EXEC([bob], [ping -6 -q -c 3 -i 0.3 -w 2 fd14::2 | FORMAT_PING], \ +[0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +NS_CHECK_EXEC([bob], [nc -6 -z fd14::2 80], [0]) + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fd13::a) | \ +sed -e 's/zone=[[0-9]]*/zone=/' | +sed -e 's/mark=[[0-9]]*/mark=/' | sort], [0], [dnl +icmpv6,orig=(src=fd13::a,dst=fd11::2,id=,type=128,code=0),reply=(src=fd11::2,dst=fd13::a,id=,type=129,code=0),zone=,mark=,labels=0xf0000001020600000000 +icmpv6,orig=(src=fd13::a,dst=fd14::2,id=,type=128,code=0),reply=(src=fd14::2,dst=fd13::a,id=,type=129,code=0),zone=,mark=,labels=0xf0000001020600000000 +tcp,orig=(src=fd13::a,dst=fd11::2,sport=,dport=),reply=(src=fd11::2,dst=fd13::a,sport=,dport=),zone=,mark=,labels=0xf0000001020600000000,protoinfo=(state=) +tcp,orig=(src=fd13::a,dst=fd14::2,sport=,dport=),reply=(src=fd14::2,dst=fd13::a,sport=,dport=),zone=,mark=,labels=0xf0000001020600000000,protoinfo=(state=) +]) + +# Remove second ECMP route +check ovn-nbctl lr-route-del R1 +check ovn-nbctl --wait=hv sync +wait_row_count ECMP_Nexthop 2 +wait_column 
'f0:00:00:01:02:06' ECMP_Nexthop mac nexthop='"fd12::2"' +wait_column 'f0:00:00:01:02:05' ECMP_Nexthop mac nexthop='"fd12::3"' + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fd13::a) | \ +sed -e 's/zone=[[0-9]]*/zone=/' | +sed -e 's/mark=[[0-9]]*/mark=/' | sort], [0], [dnl +icmpv6,orig=(src=fd13::a,dst=fd14::2,id=,type=128,code=0),reply=(src=fd14::2,dst=fd13::a,id=,type=129,code=0),zone=,mark=,labels=0xf0000001020600000000 +tcp,orig=(src=fd13::a,dst=fd14::2,sport=,dport=),reply=(src=fd14::2,dst=fd13::a,sport=,dport=),zone=,mark=,labels=0xf0000001020600000000,protoinfo=(state=) +]) + +check ovn-nbctl lr-route-del R2 +check ovn-nbctl --wait=hv sync +wait_row_count ECMP_Nexthop 0 +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(172.16.2.10) | \ +sed -e 's/zone=[[0-9]]*/zone=/' | +sed -e 's/mark=[[0-9]]*/mark=/' | sort], [0], [dnl +]) + +OVS_APP_EXIT_AND_WAIT([ovn-controller]) + +as ovn-sb +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) + +as ovn-nb +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) + +as northd +OVS_APP_EXIT_AND_WAIT([ovn-northd]) + +as +OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d +/.*terminating with signal 15.*/d"]) +AT_CLEANUP +])