Message ID | 20240806111812.54352-1-martin.kalcok@canonical.com |
---|---|
State | Changes Requested |
Delegated to: | Dumitru Ceara |
Headers | show |
Series | [ovs-dev,v4] northd: BGP port redirecting. | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_Build_and_Test | success | github build: passed |
ovsrobot/github-robot-_ovn-kubernetes | fail | github build: failed |
Hi Martin, Sorry, I was reviewing V3 but I was slow in actually sending out the email. On 8/6/24 13:17, Martin Kalcok wrote: > This change adds a 'bgp-redirect' option to LRP that allows > redirecting of BGP control plane traffic to an arbitrary LSP > in its peer LS. > > The option expects a string with a LSP name. When set, > any traffic entering LS that's destined for any of the > LRP's IP addresses (including IPv6 LLA) is redirected > to the LSP specified in the option's value. > > This enables external BGP daemons to listen on an interface > bound to a LSP and effectively act as if they were listening > on (and speaking from) LRP's IP address. > > Signed-off-by: Martin Kalcok <martin.kalcok@canonical.com> > --- Strictly on this version of the patch, and with the thought in mind that we might want to consider this feature experimental [0] and maybe change it (NB too) in the future, I left a few comments inline. With those addressed I think the patch looks OK to me. [0] https://patchwork.ozlabs.org/project/ovn/patch/20240802155449.3528777-1-mmichels@redhat.com/ Now, for the point when we'd remove the "experimental" tag: In general, it makes me a bit uneasy that we have to share the MAC and IP between the LRP and the VIF of logical switch port that's connected to the same switch as the LRP. I was thinking of alternatives for the future and the only things I could come up with until now are: 1. Create a separate, "disconnected" logical switch: ovn-nbctl ls-add bgp Add the bgp-daemon port to it: ovn-nbctl lsp-add bgp bgp-daemon. Then we don't need "unknown" either, I think. But it's not possible today to move packets from the ingress pipeline of a logical switch ("public" in this test) to the egress pipeline of a different logical switch ("bgp" in my example). It also feels icky to implement that. 2. Add the ability to bind an OVS port directly to a logical router port: then we could do the same type of redirect you do here but instead in the logical router pipeline. The advantage is we don't have to drop any non-bgp traffic in the switch pipeline. The switch keeps functioning as it does today. Maybe an advantage of this second alternative would be that we can easily attach a filtering option to this LRP (e.g., LRP.options:control-traffic-filter) to allow us to be more flexible in what kind of traffic we forward to the actuall routing protocol daemon that runs behind that OVS port - Vladislav also mentioned during the meeting it might be interesting to forward BFD packets to the FRR (or whatever application implements the routing protocol) instance too. But again, if we consider the current feature "experimental", we can spend more time during the next release cycle to figure out what's best. > > v4 of this patch renames the feature from "bgp-mirror" to "bgp-redirect" > as discussed during community meeting. > > northd/northd.c | 108 ++++++++++++++++++++++++++++++++++++++++ > northd/ovn-northd.8.xml | 23 +++++++++ > ovn-nb.xml | 14 ++++++ > tests/ovn-northd.at | 58 +++++++++++++++++++++ > tests/system-ovn.at | 86 ++++++++++++++++++++++++++++++++ > 5 files changed, 289 insertions(+) > > diff --git a/northd/northd.c b/northd/northd.c > index 4353df07d..088104f25 100644 > --- a/northd/northd.c > +++ b/northd/northd.c > @@ -13048,6 +13048,113 @@ build_arp_resolve_flows_for_lrp(struct ovn_port *op, > } > } > > +static void > +build_bgp_redirect_rule__( > + const char *s_addr, const char *bgp_port_name, bool is_ipv6, > + struct ovn_port *ls_peer, struct lflow_table *lflows, > + struct ds *match, struct ds *actions) > +{ > + int ip_ver = is_ipv6 ? 6 : 4; > + /* Redirect packets in the input pipeline destined for LR's IP to > + * the port specified in 'bgp-redirect' option. > + */ > + ds_clear(match); > + ds_clear(actions); > + ds_put_format(match, "ip%d.dst == %s && tcp.dst == 179", ip_ver, s_addr); > + ds_put_format(actions, "outport = \"%s\"; output;", bgp_port_name); > + ovn_lflow_add(lflows, ls_peer->od, S_SWITCH_IN_L2_LKUP, 100, > + ds_cstr(match), > + ds_cstr(actions), > + ls_peer->lflow_ref); > + > + > + /* Drop any traffic originating from 'bgp-redirect' port that does > + * not originate from BGP daemon port. This blocks unnecessary > + * traffic like ARP broadcasts or IPv6 router solicitation packets > + * from the dummy 'bgp-redirect' port. > + */ > + ds_clear(match); > + ds_put_format(match, "inport == \"%s\"", bgp_port_name); > + ovn_lflow_add(lflows, ls_peer->od, S_SWITCH_IN_CHECK_PORT_SEC, 80, > + ds_cstr(match), > + REGBIT_PORT_SEC_DROP " = 1; next;", > + ls_peer->lflow_ref); Not a huge deal but this logical flow gets regenerated multiple times for each bgp_port_name (for each LRP address). In the end we'll only install one in the database so there's no impact on SB. But maybe we should consider moving this part out of this function. > + > + ds_put_format(match, > + " && ip%d.src == %s && tcp.src == 179", > + ip_ver, > + s_addr); > + ovn_lflow_add(lflows, ls_peer->od, S_SWITCH_IN_CHECK_PORT_SEC, 81, > + ds_cstr(match), > + REGBIT_PORT_SEC_DROP " = check_in_port_sec(); next;", > + ls_peer->lflow_ref); In your system test we set 'addresses = "unknown"' for 'bgp_port_name'. And I guess we kind of have to do it like that all the time when using this feature. I'm trying to think of scenarios in which users set LSP.addresses = 'unknown' and LSP.port-security = '<some-addresses>'. I'm not sure I saw those until now but maybe I'm wrong. > +} > + > +static void > +build_lrouter_bgp_redirect( > + struct ovn_port *op, struct lflow_table *lflows, > + struct ds *match, struct ds *actions) > +{ > + /* LRP has to have a peer.*/ > + if (op->peer == NULL) { > + return; > + } > + /* LRP has to have NB record.*/ > + if (op->nbrp == NULL) { > + return; > + } > + > + /* Proceed only for LRPs that have 'bgp-redirect' option set. Value of this > + * option is the name of LSP to which a BGP traffic will be redirected. > + */ > + const char *bgp_port = smap_get(&op->nbrp->options, "bgp-redirect"); > + if (bgp_port == NULL) { > + return; > + } > + > + /* Ensure that LSP, to which the BGP traffic is redirected, exists.*/ > + struct ovn_port *peer_lsp; > + bool bgp_port_exists = false; > + HMAP_FOR_EACH (peer_lsp, dp_node, &op->peer->od->ports) { > + size_t peer_lsp_s = strlen(peer_lsp->key); > + if (peer_lsp_s == strlen(bgp_port) > + && !strncmp(peer_lsp->key, bgp_port, peer_lsp_s)){ > + bgp_port_exists = true; > + break; > + } > + } > + > + if (!bgp_port_exists) { > + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5); > + VLOG_WARN_RL(&rl, "Option 'bgp-redirect' set on Logical Router Port " > + "'%s' refers to non-existent Logical Switch Port. " > + "BGP redirecting won't be configured.", > + op->key); > + return; > + } > + > + if (op->cr_port != NULL) { > + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5); > + VLOG_WARN_RL(&rl, "Option 'bgp-redirect' is not supported on" > + " Distributed Gateway Port '%s'", op->key); > + return; > + } > + > + /* Mirror traffic destined for LRP's IPs and default BGP port > + * to the port defined in 'bgp-redirect' option. > + */ > + for (size_t i = 0; i < op->lrp_networks.n_ipv4_addrs; i++) { > + const char *ip_s = op->lrp_networks.ipv4_addrs[i].addr_s; > + build_bgp_redirect_rule__(ip_s, bgp_port, false, op->peer, lflows, > + match, actions); > + } > + for (size_t i = 0; i < op->lrp_networks.n_ipv6_addrs; i++) { > + const char *ip_s = op->lrp_networks.ipv6_addrs[i].addr_s; > + build_bgp_redirect_rule__(ip_s, bgp_port, true, op->peer, lflows, > + match, actions); > + } > +} > + > /* This function adds ARP resolve flows related to a LSP. */ > static void > build_arp_resolve_flows_for_lsp( > @@ -16003,6 +16110,7 @@ build_lswitch_and_lrouter_iterate_by_lrp(struct ovn_port *op, > lsi->meter_groups, op->lflow_ref); > build_lrouter_icmp_packet_toobig_admin_flows(op, lsi->lflows, &lsi->match, > &lsi->actions, op->lflow_ref); > + build_lrouter_bgp_redirect(op, lsi->lflows, &lsi->match, &lsi->actions); > } > > static void * > diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml > index b06b09ac5..4cbdf8f74 100644 > --- a/northd/ovn-northd.8.xml > +++ b/northd/ovn-northd.8.xml > @@ -284,6 +284,21 @@ > dropped in the next stage. > </li> > > + <li> > + For each logical port that's defined as a target of BGP redirecting > + (via <code>bgp-redirect</code> option set on Logical Router Port), a > + filter is set in place that disallows any traffic entering this port > + that does not originate from Logical Router Port's IPs and default BGP > + port (TCP 179). This filtering is achieved by two rules. First rule has > + priority 81, it matches on <code>inport == <var>BGP_MIRROR_PORT</var> > + && ip.src == <var>LRP_IP</var> && tcp.src == 179</code> > + and allows traffic further with <code>REGBIT_PORT_SEC_DROP" = > + check_in_port_sec(); next;</code>. Second rule with priority 80 matches > + the rest of the traffic from that port and sets > + <code>REGBIT_PORT_SEC_DROP" = 1; next;"</code> so that the packets are > + dropped in the next stage. > + </li> > + > <li> > For each (enabled) vtep logical port, a priority 70 flow is added which > matches on all packets and applies the action > @@ -1949,6 +1964,14 @@ output; > on the logical switch. > </li> > > + <li> > + For any logical port that's defined as a target of BGP redirecting (via > + <code>bgp-redirect</code> option set on Logical Router Port), a > + priority-100 flow is added that redirects traffic for Logical Router > + Port IPs (including IPv6 LLA) and TCP port 179, to the targeted > + logical port. > + </li> > + > <li> > Priority-90 flows for transit switches that forward registered > IP multicast traffic to their corresponding multicast group , which > diff --git a/ovn-nb.xml b/ovn-nb.xml > index 9552534f6..2d2f5c9e0 100644 > --- a/ovn-nb.xml > +++ b/ovn-nb.xml > @@ -3440,6 +3440,20 @@ or > </p> > </column> > > + <column name="options" key="bgp-redirect" type='{"type": "string"}'> > + <p> > + This option expects a name of a Logical Switch Port that's present > + in the peer's Logical Switch. If set, it causes any traffic > + that's destined for Logical Router Port's IP addresses (including > + its IPv6 LLA) and the default BGP port (TCP 179), to be redirected > + to the specified Logical Switch Port. > + > + This allows external BGP daemon to be bound to a port in OVN's > + Logical Switch and act as if it was listening on Logical Router > + Port's IP address. > + </p> > + </column> > + > <column name="options" key="gateway_mtu_bypass"> > <p> > When configured, represents a match expression, in the same > diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at > index a389d1988..df1e924ab 100644 > --- a/tests/ovn-northd.at > +++ b/tests/ovn-northd.at > @@ -12721,3 +12721,61 @@ AT_CHECK([ovn-sbctl dump-flows lr | grep lr_in_dnat | ovn_strip_lflows], [0], [d > > AT_CLEANUP > ]) > + > +OVN_FOR_EACH_NORTHD_NO_HV([ > +AT_SETUP([BGP control plane redirect]) > +ovn_start > + > +check ovn-sbctl chassis-add hv1 geneve 127.0.0.1 > + > +check ovn-nbctl lr-add lr -- \ > + lrp-add lr lr-ls 02:ac:10:01:00:01 172.16.1.1/24 > +check ovn-nbctl --wait=sb set logical_router lr options:chassis=hv1 > + > +check ovn-nbctl ls-add ls -- \ > + lsp-add ls ls-lr -- \ > + lsp-set-type ls-lr router -- \ > + lsp-set-addresses ls-lr router -- \ > + lsp-set-options ls-lr router-port=lr-ls > + > +check ovn-nbctl lsp-add ls lsp-bgp -- \ > + lsp-set-addresses lsp-bgp unknown > + > +# Function that ensures that no BGP redirect ruels are installed. > +check_no_bgp_redirect() { > + AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_l2_lkup | grep "tcp.dst == 179" | wc -l], [0], [0 > +]) > + > + AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_check_port_sec | grep -E "priority=80|priority=81" | wc -l], [0], [0 > +]) > +} > + > +# By default, no rules related to BGP redirect are present > +check_no_bgp_redirect > + > +# Set "lsp-bgp" port as target of BGP control plane redirected traffic > +check ovn-nbctl --wait=sb set logical_router_port lr-ls options:bgp-redirect=lsp-bgp > + > +# Check that BGP control plane traffic is redirected to "lsp-bgp" > +AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_l2_lkup | grep "tcp.dst == 179" | ovn_strip_lflows], [0], [dnl > + table=??(ls_in_l2_lkup ), priority=100 , match=(ip4.dst == 172.16.1.1 && tcp.dst == 179), action=(outport = "lsp-bgp"; output;) > + table=??(ls_in_l2_lkup ), priority=100 , match=(ip6.dst == fe80::ac:10ff:fe01:1 && tcp.dst == 179), action=(outport = "lsp-bgp"; output;) > +]) > + > +# Check that only BGP-related traffic is accepted on "lsp-bgp" port > +AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_check_port_sec | grep -E "priority=80|priority=81" | ovn_strip_lflows], [0], [dnl > + table=??(ls_in_check_port_sec), priority=80 , match=(inport == "lsp-bgp"), action=(reg0[[15]] = 1; next;) > + table=??(ls_in_check_port_sec), priority=81 , match=(inport == "lsp-bgp" && ip4.src == 172.16.1.1 && tcp.src == 179), action=(reg0[[15]] = check_in_port_sec(); next;) > + table=??(ls_in_check_port_sec), priority=81 , match=(inport == "lsp-bgp" && ip6.src == fe80::ac:10ff:fe01:1 && tcp.src == 179), action=(reg0[[15]] = check_in_port_sec(); next;) > +]) > + > +# Remove 'bgp-redirect' option from LRP and check that rules are removed > +check ovn-nbctl --wait=sb remove logical_router_port lr-ls options bgp-redirect > +check_no_bgp_redirect > + > +# Set non-existent LSP as target of 'bgp-redirect' and check that no rules are added > +check ovn-nbctl --wait=sb set logical_router_port lr-ls options:bgp-redirect=lsp-foo > +check_no_bgp_redirect > + > +AT_CLEANUP > +]) > diff --git a/tests/system-ovn.at b/tests/system-ovn.at > index c24ede7c5..c09c1136e 100644 > --- a/tests/system-ovn.at > +++ b/tests/system-ovn.at > @@ -13027,3 +13027,89 @@ OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d > /connection dropped.*/d"]) > AT_CLEANUP > ]) > + > +OVN_FOR_EACH_NORTHD([ > +AT_SETUP([BGP control plane redirect]) Nit: probably best to add AT_SKIP_IF([test $HAVE_NC = no]) here. > +ovn_start > +OVS_TRAFFIC_VSWITCHD_START() > + > +ADD_BR([br-int]) > +ADD_BR([br-ext]) > + > +ovs-ofctl add-flow br-ext action=normal > +# Set external-ids in br-int needed for ovn-controller > +ovs-vsctl \ > + -- set Open_vSwitch . external-ids:system-id=hv1 \ > + -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \ > + -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \ > + -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \ > + -- set bridge br-int fail-mode=secure other-config:disable-in-band=true Most ov*-ctl commands should be prefixed with "check" in this test. > + > +# Start ovn-controller > +start_daemon ovn-controller > + > +ovn-nbctl lr-add R1 \ > + -- set Logical_Router R1 options:chassis=hv1 > + > +ovn-nbctl ls-add public > + > +ovn-nbctl lrp-add R1 rp-public 00:00:02:01:02:03 172.16.1.1/24 > + > +ovn-nbctl lsp-add public public-rp -- set Logical_Switch_Port public-rp \ > + type=router options:router-port=rp-public \ > + -- lsp-set-addresses public-rp router > + > +ovn-nbctl lsp-add public bgp-daemon \ > + -- lsp-set-addresses bgp-daemon unknown > + > +AT_CHECK([ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=phynet:br-ext]) > +ovn-nbctl lsp-add public public1 \ > + -- lsp-set-addresses public1 unknown \ > + -- lsp-set-type public1 localnet \ > + -- lsp-set-options public1 network_name=phynet > + > +ovn-nbctl --wait=hv sync > + > +# Set option that redirects BGP control plane traffic to a LSP "bgp-daemon" > +ovn-nbctl --wait=sb set logical_router_port rp-public options:bgp-redirect=bgp-daemon > + > +# Create "bgp-daemon" interface in a namespace with IP and MAC matching LRP "rp-public" > +ADD_NAMESPACES(bgp-daemon) > +ADD_VETH(bgp-daemon, bgp-daemon, br-int, "172.16.1.1/24", "00:00:02:01:02:03") > + > +ADD_NAMESPACES(ext-foo) > +ADD_VETH(ext-foo, ext-foo, br-ext, "172.16.1.100/24", "00:10:10:01:02:13", \ > + "172.16.1.1") > + > +# Flip the interface down/up to get proper IPv6 LLA > +NS_EXEC([bgp-daemon], [ip link set down bgp-daemon]) > +NS_EXEC([bgp-daemon], [ip link set up bgp-daemon]) > + > +# Wait until IPv6 LLA loses the "tentative" flag otherwise it can't be bound to. > +OVS_WAIT_UNTIL([NS_EXEC([bgp-daemon], [ip a show dev bgp-daemon | grep "fe80::" | grep -v tentative])]) > +OVS_WAIT_UNTIL([NS_EXEC([ext-foo], [ip a show dev ext-foo | grep "fe80::" | grep -v tentative])]) > + > +# Verify that BGP control plane traffic is delivered to the "bgp-daemon" > +# interface on both IPv4 and IPv6 LLA addresses > +NETNS_DAEMONIZE([bgp-daemon], [nc -l -k 172.16.1.1 179], [bgp_v4.pid]) > +NS_CHECK_EXEC([ext-foo], [echo "TCP test" | nc --send-only 172.16.1.1 179]) > + > +NETNS_DAEMONIZE([bgp-daemon], [nc -l -6 -k fe80::200:2ff:fe01:203%bgp-daemon 179], [bgp_v6.pid]) > +NS_CHECK_EXEC([ext-foo], [echo "TCP test" | nc --send-only -6 fe80::200:2ff:fe01:203%ext-foo 179]) > + > +OVS_APP_EXIT_AND_WAIT([ovn-controller]) > + > +as ovn-sb > +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) > + > +as ovn-nb > +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) > + > +as northd > +OVS_APP_EXIT_AND_WAIT([ovn-northd]) > + > +as > +OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d > +/.*terminating with signal 15.*/d"]) > +AT_CLEANUP > +]) Regards, Dumitru
Hi Martin, Dumitru, On 06.08.2024 19:23, Dumitru Ceara wrote: > Hi Martin, > > Sorry, I was reviewing V3 but I was slow in actually sending out the email. > > On 8/6/24 13:17, Martin Kalcok wrote: >> This change adds a 'bgp-redirect' option to LRP that allows >> redirecting of BGP control plane traffic to an arbitrary LSP >> in its peer LS. >> >> The option expects a string with a LSP name. When set, >> any traffic entering LS that's destined for any of the >> LRP's IP addresses (including IPv6 LLA) is redirected >> to the LSP specified in the option's value. >> >> This enables external BGP daemons to listen on an interface >> bound to a LSP and effectively act as if they were listening >> on (and speaking from) LRP's IP address. >> >> Signed-off-by: Martin Kalcok<martin.kalcok@canonical.com> >> --- > Strictly on this version of the patch, and with the thought in mind that > we might want to consider this feature experimental [0] and maybe change > it (NB too) in the future, I left a few comments inline. With those > addressed I think the patch looks OK to me. > > [0] > https://patchwork.ozlabs.org/project/ovn/patch/20240802155449.3528777-1-mmichels@redhat.com/ > > Now, for the point when we'd remove the "experimental" tag: > > In general, it makes me a bit uneasy that we have to share the MAC and > IP between the LRP and the VIF of logical switch port that's connected > to the same switch as the LRP. > > I was thinking of alternatives for the future and the only things I > could come up with until now are: > > 1. Create a separate, "disconnected" logical switch: > ovn-nbctl ls-add bgp > > Add the bgp-daemon port to it: > ovn-nbctl lsp-add bgp bgp-daemon. > > Then we don't need "unknown" either, I think. > > But it's not possible today to move packets from the ingress pipeline of > a logical switch ("public" in this test) to the egress pipeline of a > different logical switch ("bgp" in my example). It also feels icky to > implement that. > > 2. Add the ability to bind an OVS port directly to a logical router port: > > then we could do the same type of redirect you do here but instead in > the logical router pipeline. The advantage is we don't have to drop any > non-bgp traffic in the switch pipeline. The switch keeps functioning as > it does today. > > Maybe an advantage of this second alternative would be that we can > easily attach a filtering option to this LRP (e.g., > LRP.options:control-traffic-filter) to allow us to be more flexible in > what kind of traffic we forward to the actuall routing protocol daemon > that runs behind that OVS port - Vladislav also mentioned during the > meeting it might be interesting to forward BFD packets to the FRR (or > whatever application implements the routing protocol) instance too. @Martin, I'm again kindly asking about possible support for BFD redirect. Is it possible to incorporate these changes as an additional configuration know ("bfd-redirect") into your patch so we can start using this functionality internally and possibly give some feedback (hopefully positive)? It doesn't seem to be a very big change but looks very attractive for us. I can send a separate patch for this on top of your patch, but technically it won't be able to jump into the 24.09 because formally it's already a soft-freeze in progress. As an option I can send a patch, which can be squashed into this one. What do you think? > > But again, if we consider the current feature "experimental", we can > spend more time during the next release cycle to figure out what's best. > >> v4 of this patch renames the feature from "bgp-mirror" to "bgp-redirect" >> as discussed during community meeting. >> >> northd/northd.c | 108 ++++++++++++++++++++++++++++++++++++++++ >> northd/ovn-northd.8.xml | 23 +++++++++ >> ovn-nb.xml | 14 ++++++ >> tests/ovn-northd.at | 58 +++++++++++++++++++++ >> tests/system-ovn.at | 86 ++++++++++++++++++++++++++++++++ >> 5 files changed, 289 insertions(+) >> >> diff --git a/northd/northd.c b/northd/northd.c >> index 4353df07d..088104f25 100644 >> --- a/northd/northd.c >> +++ b/northd/northd.c >> @@ -13048,6 +13048,113 @@ build_arp_resolve_flows_for_lrp(struct ovn_port *op, >> } >> } >> >> +static void >> +build_bgp_redirect_rule__( >> + const char *s_addr, const char *bgp_port_name, bool is_ipv6, >> + struct ovn_port *ls_peer, struct lflow_table *lflows, >> + struct ds *match, struct ds *actions) >> +{ >> + int ip_ver = is_ipv6 ? 6 : 4; >> + /* Redirect packets in the input pipeline destined for LR's IP to >> + * the port specified in 'bgp-redirect' option. >> + */ >> + ds_clear(match); >> + ds_clear(actions); >> + ds_put_format(match, "ip%d.dst == %s && tcp.dst == 179", ip_ver, s_addr); >> + ds_put_format(actions, "outport = \"%s\"; output;", bgp_port_name); >> + ovn_lflow_add(lflows, ls_peer->od, S_SWITCH_IN_L2_LKUP, 100, >> + ds_cstr(match), >> + ds_cstr(actions), >> + ls_peer->lflow_ref); >> + >> + >> + /* Drop any traffic originating from 'bgp-redirect' port that does >> + * not originate from BGP daemon port. This blocks unnecessary >> + * traffic like ARP broadcasts or IPv6 router solicitation packets >> + * from the dummy 'bgp-redirect' port. >> + */ >> + ds_clear(match); >> + ds_put_format(match, "inport == \"%s\"", bgp_port_name); >> + ovn_lflow_add(lflows, ls_peer->od, S_SWITCH_IN_CHECK_PORT_SEC, 80, >> + ds_cstr(match), >> + REGBIT_PORT_SEC_DROP " = 1; next;", >> + ls_peer->lflow_ref); > Not a huge deal but this logical flow gets regenerated multiple times > for each bgp_port_name (for each LRP address). In the end we'll only > install one in the database so there's no impact on SB. But maybe we > should consider moving this part out of this function. > >> + >> + ds_put_format(match, >> + " && ip%d.src == %s && tcp.src == 179", >> + ip_ver, >> + s_addr); >> + ovn_lflow_add(lflows, ls_peer->od, S_SWITCH_IN_CHECK_PORT_SEC, 81, >> + ds_cstr(match), >> + REGBIT_PORT_SEC_DROP " = check_in_port_sec(); next;", >> + ls_peer->lflow_ref); > In your system test we set 'addresses = "unknown"' for 'bgp_port_name'. > And I guess we kind of have to do it like that all the time when using > this feature. I'm trying to think of scenarios in which users set > LSP.addresses = 'unknown' and LSP.port-security = '<some-addresses>'. > I'm not sure I saw those until now but maybe I'm wrong. > >> +} >> + >> +static void >> +build_lrouter_bgp_redirect( >> + struct ovn_port *op, struct lflow_table *lflows, >> + struct ds *match, struct ds *actions) >> +{ >> + /* LRP has to have a peer.*/ >> + if (op->peer == NULL) { >> + return; >> + } >> + /* LRP has to have NB record.*/ >> + if (op->nbrp == NULL) { >> + return; >> + } >> + >> + /* Proceed only for LRPs that have 'bgp-redirect' option set. Value of this >> + * option is the name of LSP to which a BGP traffic will be redirected. >> + */ >> + const char *bgp_port = smap_get(&op->nbrp->options, "bgp-redirect"); >> + if (bgp_port == NULL) { >> + return; >> + } >> + >> + /* Ensure that LSP, to which the BGP traffic is redirected, exists.*/ >> + struct ovn_port *peer_lsp; >> + bool bgp_port_exists = false; >> + HMAP_FOR_EACH (peer_lsp, dp_node, &op->peer->od->ports) { >> + size_t peer_lsp_s = strlen(peer_lsp->key); >> + if (peer_lsp_s == strlen(bgp_port) >> + && !strncmp(peer_lsp->key, bgp_port, peer_lsp_s)){ >> + bgp_port_exists = true; >> + break; >> + } >> + } >> + >> + if (!bgp_port_exists) { >> + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5); >> + VLOG_WARN_RL(&rl, "Option 'bgp-redirect' set on Logical Router Port " >> + "'%s' refers to non-existent Logical Switch Port. " >> + "BGP redirecting won't be configured.", >> + op->key); >> + return; >> + } >> + >> + if (op->cr_port != NULL) { >> + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5); >> + VLOG_WARN_RL(&rl, "Option 'bgp-redirect' is not supported on" >> + " Distributed Gateway Port '%s'", op->key); >> + return; >> + } >> + >> + /* Mirror traffic destined for LRP's IPs and default BGP port >> + * to the port defined in 'bgp-redirect' option. >> + */ >> + for (size_t i = 0; i < op->lrp_networks.n_ipv4_addrs; i++) { >> + const char *ip_s = op->lrp_networks.ipv4_addrs[i].addr_s; >> + build_bgp_redirect_rule__(ip_s, bgp_port, false, op->peer, lflows, >> + match, actions); >> + } >> + for (size_t i = 0; i < op->lrp_networks.n_ipv6_addrs; i++) { >> + const char *ip_s = op->lrp_networks.ipv6_addrs[i].addr_s; >> + build_bgp_redirect_rule__(ip_s, bgp_port, true, op->peer, lflows, >> + match, actions); >> + } >> +} >> + >> /* This function adds ARP resolve flows related to a LSP. */ >> static void >> build_arp_resolve_flows_for_lsp( >> @@ -16003,6 +16110,7 @@ build_lswitch_and_lrouter_iterate_by_lrp(struct ovn_port *op, >> lsi->meter_groups, op->lflow_ref); >> build_lrouter_icmp_packet_toobig_admin_flows(op, lsi->lflows, &lsi->match, >> &lsi->actions, op->lflow_ref); >> + build_lrouter_bgp_redirect(op, lsi->lflows, &lsi->match, &lsi->actions); >> } >> >> static void * >> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml >> index b06b09ac5..4cbdf8f74 100644 >> --- a/northd/ovn-northd.8.xml >> +++ b/northd/ovn-northd.8.xml >> @@ -284,6 +284,21 @@ >> dropped in the next stage. >> </li> >> >> + <li> >> + For each logical port that's defined as a target of BGP redirecting >> + (via <code>bgp-redirect</code> option set on Logical Router Port), a >> + filter is set in place that disallows any traffic entering this port >> + that does not originate from Logical Router Port's IPs and default BGP >> + port (TCP 179). This filtering is achieved by two rules. First rule has >> + priority 81, it matches on <code>inport == <var>BGP_MIRROR_PORT</var> >> + && ip.src == <var>LRP_IP</var> && tcp.src == 179</code> >> + and allows traffic further with <code>REGBIT_PORT_SEC_DROP" = >> + check_in_port_sec(); next;</code>. Second rule with priority 80 matches >> + the rest of the traffic from that port and sets >> + <code>REGBIT_PORT_SEC_DROP" = 1; next;"</code> so that the packets are >> + dropped in the next stage. >> + </li> >> + >> <li> >> For each (enabled) vtep logical port, a priority 70 flow is added which >> matches on all packets and applies the action >> @@ -1949,6 +1964,14 @@ output; >> on the logical switch. >> </li> >> >> + <li> >> + For any logical port that's defined as a target of BGP redirecting (via >> + <code>bgp-redirect</code> option set on Logical Router Port), a >> + priority-100 flow is added that redirects traffic for Logical Router >> + Port IPs (including IPv6 LLA) and TCP port 179, to the targeted >> + logical port. >> + </li> >> + >> <li> >> Priority-90 flows for transit switches that forward registered >> IP multicast traffic to their corresponding multicast group , which >> diff --git a/ovn-nb.xml b/ovn-nb.xml >> index 9552534f6..2d2f5c9e0 100644 >> --- a/ovn-nb.xml >> +++ b/ovn-nb.xml >> @@ -3440,6 +3440,20 @@ or >> </p> >> </column> >> >> + <column name="options" key="bgp-redirect" type='{"type": "string"}'> >> + <p> >> + This option expects a name of a Logical Switch Port that's present >> + in the peer's Logical Switch. If set, it causes any traffic >> + that's destined for Logical Router Port's IP addresses (including >> + its IPv6 LLA) and the default BGP port (TCP 179), to be redirected >> + to the specified Logical Switch Port. >> + >> + This allows external BGP daemon to be bound to a port in OVN's >> + Logical Switch and act as if it was listening on Logical Router >> + Port's IP address. >> + </p> >> + </column> >> + >> <column name="options" key="gateway_mtu_bypass"> >> <p> >> When configured, represents a match expression, in the same >> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at >> index a389d1988..df1e924ab 100644 >> --- a/tests/ovn-northd.at >> +++ b/tests/ovn-northd.at >> @@ -12721,3 +12721,61 @@ AT_CHECK([ovn-sbctl dump-flows lr | grep lr_in_dnat | ovn_strip_lflows], [0], [d >> >> AT_CLEANUP >> ]) >> + >> +OVN_FOR_EACH_NORTHD_NO_HV([ >> +AT_SETUP([BGP control plane redirect]) >> +ovn_start >> + >> +check ovn-sbctl chassis-add hv1 geneve 127.0.0.1 >> + >> +check ovn-nbctl lr-add lr -- \ >> + lrp-add lr lr-ls 02:ac:10:01:00:01 172.16.1.1/24 >> +check ovn-nbctl --wait=sb set logical_router lr options:chassis=hv1 >> + >> +check ovn-nbctl ls-add ls -- \ >> + lsp-add ls ls-lr -- \ >> + lsp-set-type ls-lr router -- \ >> + lsp-set-addresses ls-lr router -- \ >> + lsp-set-options ls-lr router-port=lr-ls >> + >> +check ovn-nbctl lsp-add ls lsp-bgp -- \ >> + lsp-set-addresses lsp-bgp unknown >> + >> +# Function that ensures that no BGP redirect ruels are installed. >> +check_no_bgp_redirect() { >> + AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_l2_lkup | grep "tcp.dst == 179" | wc -l], [0], [0 >> +]) >> + >> + AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_check_port_sec | grep -E "priority=80|priority=81" | wc -l], [0], [0 >> +]) >> +} >> + >> +# By default, no rules related to BGP redirect are present >> +check_no_bgp_redirect >> + >> +# Set "lsp-bgp" port as target of BGP control plane redirected traffic >> +check ovn-nbctl --wait=sb set logical_router_port lr-ls options:bgp-redirect=lsp-bgp >> + >> +# Check that BGP control plane traffic is redirected to "lsp-bgp" >> +AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_l2_lkup | grep "tcp.dst == 179" | ovn_strip_lflows], [0], [dnl >> + table=??(ls_in_l2_lkup ), priority=100 , match=(ip4.dst == 172.16.1.1 && tcp.dst == 179), action=(outport = "lsp-bgp"; output;) >> + table=??(ls_in_l2_lkup ), priority=100 , match=(ip6.dst == fe80::ac:10ff:fe01:1 && tcp.dst == 179), action=(outport = "lsp-bgp"; output;) >> +]) >> + >> +# Check that only BGP-related traffic is accepted on "lsp-bgp" port >> +AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_check_port_sec | grep -E "priority=80|priority=81" | ovn_strip_lflows], [0], [dnl >> + table=??(ls_in_check_port_sec), priority=80 , match=(inport == "lsp-bgp"), action=(reg0[[15]] = 1; next;) >> + table=??(ls_in_check_port_sec), priority=81 , match=(inport == "lsp-bgp" && ip4.src == 172.16.1.1 && tcp.src == 179), action=(reg0[[15]] = check_in_port_sec(); next;) >> + table=??(ls_in_check_port_sec), priority=81 , match=(inport == "lsp-bgp" && ip6.src == fe80::ac:10ff:fe01:1 && tcp.src == 179), action=(reg0[[15]] = check_in_port_sec(); next;) >> +]) >> + >> +# Remove 'bgp-redirect' option from LRP and check that rules are removed >> +check ovn-nbctl --wait=sb remove logical_router_port lr-ls options bgp-redirect >> +check_no_bgp_redirect >> + >> +# Set non-existent LSP as target of 'bgp-redirect' and check that no rules are added >> +check ovn-nbctl --wait=sb set logical_router_port lr-ls options:bgp-redirect=lsp-foo >> +check_no_bgp_redirect >> + >> +AT_CLEANUP >> +]) >> diff --git a/tests/system-ovn.at b/tests/system-ovn.at >> index c24ede7c5..c09c1136e 100644 >> --- a/tests/system-ovn.at >> +++ b/tests/system-ovn.at >> @@ -13027,3 +13027,89 @@ OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d >> /connection dropped.*/d"]) >> AT_CLEANUP >> ]) >> + >> +OVN_FOR_EACH_NORTHD([ >> +AT_SETUP([BGP control plane redirect]) > Nit: probably best to add AT_SKIP_IF([test $HAVE_NC = no]) here. > >> +ovn_start >> +OVS_TRAFFIC_VSWITCHD_START() >> + >> +ADD_BR([br-int]) >> +ADD_BR([br-ext]) >> + >> +ovs-ofctl add-flow br-ext action=normal >> +# Set external-ids in br-int needed for ovn-controller >> +ovs-vsctl \ >> + -- set Open_vSwitch . external-ids:system-id=hv1 \ >> + -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \ >> + -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \ >> + -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \ >> + -- set bridge br-int fail-mode=secure other-config:disable-in-band=true > Most ov*-ctl commands should be prefixed with "check" in this test. > >> + >> +# Start ovn-controller >> +start_daemon ovn-controller >> + >> +ovn-nbctl lr-add R1 \ >> + -- set Logical_Router R1 options:chassis=hv1 >> + >> +ovn-nbctl ls-add public >> + >> +ovn-nbctl lrp-add R1 rp-public 00:00:02:01:02:03 172.16.1.1/24 >> + >> +ovn-nbctl lsp-add public public-rp -- set Logical_Switch_Port public-rp \ >> + type=router options:router-port=rp-public \ >> + -- lsp-set-addresses public-rp router >> + >> +ovn-nbctl lsp-add public bgp-daemon \ >> + -- lsp-set-addresses bgp-daemon unknown >> + >> +AT_CHECK([ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=phynet:br-ext]) >> +ovn-nbctl lsp-add public public1 \ >> + -- lsp-set-addresses public1 unknown \ >> + -- lsp-set-type public1 localnet \ >> + -- lsp-set-options public1 network_name=phynet >> + >> +ovn-nbctl --wait=hv sync >> + >> +# Set option that redirects BGP control plane traffic to a LSP "bgp-daemon" >> +ovn-nbctl --wait=sb set logical_router_port rp-public options:bgp-redirect=bgp-daemon >> + >> +# Create "bgp-daemon" interface in a namespace with IP and MAC matching LRP "rp-public" >> +ADD_NAMESPACES(bgp-daemon) >> +ADD_VETH(bgp-daemon, bgp-daemon, br-int, "172.16.1.1/24", "00:00:02:01:02:03") >> + >> +ADD_NAMESPACES(ext-foo) >> +ADD_VETH(ext-foo, ext-foo, br-ext, "172.16.1.100/24", "00:10:10:01:02:13", \ >> + "172.16.1.1") >> + >> +# Flip the interface down/up to get proper IPv6 LLA >> +NS_EXEC([bgp-daemon], [ip link set down bgp-daemon]) >> +NS_EXEC([bgp-daemon], [ip link set up bgp-daemon]) >> + >> +# Wait until IPv6 LLA loses the "tentative" flag otherwise it can't be bound to. >> +OVS_WAIT_UNTIL([NS_EXEC([bgp-daemon], [ip a show dev bgp-daemon | grep "fe80::" | grep -v tentative])]) >> +OVS_WAIT_UNTIL([NS_EXEC([ext-foo], [ip a show dev ext-foo | grep "fe80::" | grep -v tentative])]) >> + >> +# Verify that BGP control plane traffic is delivered to the "bgp-daemon" >> +# interface on both IPv4 and IPv6 LLA addresses >> +NETNS_DAEMONIZE([bgp-daemon], [nc -l -k 172.16.1.1 179], [bgp_v4.pid]) >> +NS_CHECK_EXEC([ext-foo], [echo "TCP test" | nc --send-only 172.16.1.1 179]) >> + >> +NETNS_DAEMONIZE([bgp-daemon], [nc -l -6 -k fe80::200:2ff:fe01:203%bgp-daemon 179], [bgp_v6.pid]) >> +NS_CHECK_EXEC([ext-foo], [echo "TCP test" | nc --send-only -6 fe80::200:2ff:fe01:203%ext-foo 179]) >> + >> +OVS_APP_EXIT_AND_WAIT([ovn-controller]) >> + >> +as ovn-sb >> +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >> + >> +as ovn-nb >> +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >> + >> +as northd >> +OVS_APP_EXIT_AND_WAIT([ovn-northd]) >> + >> +as >> +OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d >> +/.*terminating with signal 15.*/d"]) >> +AT_CLEANUP >> +]) > Regards, > Dumitru > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
On 8/6/24 19:22, Vladislav Odintsov wrote: > Hi Martin, Dumitru, > Hi Vladislav, > On 06.08.2024 19:23, Dumitru Ceara wrote: >> Hi Martin, >> >> Sorry, I was reviewing V3 but I was slow in actually sending out the email. >> >> On 8/6/24 13:17, Martin Kalcok wrote: >>> This change adds a 'bgp-redirect' option to LRP that allows >>> redirecting of BGP control plane traffic to an arbitrary LSP >>> in its peer LS. >>> >>> The option expects a string with a LSP name. When set, >>> any traffic entering LS that's destined for any of the >>> LRP's IP addresses (including IPv6 LLA) is redirected >>> to the LSP specified in the option's value. >>> >>> This enables external BGP daemons to listen on an interface >>> bound to a LSP and effectively act as if they were listening >>> on (and speaking from) LRP's IP address. >>> >>> Signed-off-by: Martin Kalcok <martin.kalcok@canonical.com> >>> --- >> Strictly on this version of the patch, and with the thought in mind that >> we might want to consider this feature experimental [0] and maybe change >> it (NB too) in the future, I left a few comments inline. With those >> addressed I think the patch looks OK to me. >> >> [0] >> https://patchwork.ozlabs.org/project/ovn/patch/20240802155449.3528777-1-mmichels@redhat.com/ >> >> Now, for the point when we'd remove the "experimental" tag: >> >> In general, it makes me a bit uneasy that we have to share the MAC and >> IP between the LRP and the VIF of logical switch port that's connected >> to the same switch as the LRP. >> >> I was thinking of alternatives for the future and the only things I >> could come up with until now are: >> >> 1. Create a separate, "disconnected" logical switch: >> ovn-nbctl ls-add bgp >> >> Add the bgp-daemon port to it: >> ovn-nbctl lsp-add bgp bgp-daemon. >> >> Then we don't need "unknown" either, I think. >> >> But it's not possible today to move packets from the ing,ress pipeline of >> a logical switch ("public" in this test) to the egress pipeline of a >> different logical switch ("bgp" in my example). It also feels icky to >> implement that. >> >> 2. Add the ability to bind an OVS port directly to a logical router port: >> >> then we could do the same type of redirect you do here but instead in >> the logical router pipeline. The advantage is we don't have to drop any >> non-bgp traffic in the switch pipeline. The switch keeps functioning as >> it does today. >> >> Maybe an advantage of this second alternative would be that we can >> easily attach a filtering option to this LRP (e.g., >> LRP.options:control-traffic-filter) to allow us to be more flexible in >> what kind of traffic we forward to the actuall routing protocol daemon >> that runs behind that OVS port - Vladislav also mentioned during the >> meeting it might be interesting to forward BFD packets to the FRR (or >> whatever application implements the routing protocol) instance too. > > @Martin, I'm again kindly asking about possible support for BFD > redirect. Is it possible to incorporate these changes as an additional > configuration know ("bfd-redirect") into your patch so we can start > using this functionality internally and possibly give some feedback > (hopefully positive)? > > It doesn't seem to be a very big change but looks very attractive for us. > > I can send a separate patch for this on top of your patch, but > technically it won't be able to jump into the 24.09 because formally > it's already a soft-freeze in progress. As an option I can send a patch, > which can be squashed into this one. > > What do you think? I didn't check with other maintainers (added them in CC now), but given that we agreed to try to accept the port redirecting patch as experimental I think it's fine to expand it to allow BFD support too. Posting a follow up patch on top of this one is fine from my perspective. However, what if we change the option to? - NB.Logical_Router_Port.options:routing-protocol-redirect=<LSP-NAME> - NB.Logical_Router_Port.options:routing-protocols=<LIST_OF_PROTOCOLS> E.g.: NB.Logical_Router_Port.options:routing-protocol-redirect="bgp-daemon" NB.Logical_Router_Port.options:routing-protocols="BGP;BFD" Would this have a higher chance of becoming fully supported in the next release? Regards, Dumitru > >> But again, if we consider the current feature "experimental", we can >> spend more time during the next release cycle to figure out what's best. >> >>> v4 of this patch renames the feature from "bgp-mirror" to "bgp-redirect" >>> as discussed during community meeting. >>> >>> northd/northd.c | 108 ++++++++++++++++++++++++++++++++++++++++ >>> northd/ovn-northd.8.xml | 23 +++++++++ >>> ovn-nb.xml | 14 ++++++ >>> tests/ovn-northd.at | 58 +++++++++++++++++++++ >>> tests/system-ovn.at | 86 ++++++++++++++++++++++++++++++++ >>> 5 files changed, 289 insertions(+) >>> >>> diff --git a/northd/northd.c b/northd/northd.c >>> index 4353df07d..088104f25 100644 >>> --- a/northd/northd.c >>> +++ b/northd/northd.c >>> @@ -13048,6 +13048,113 @@ build_arp_resolve_flows_for_lrp(struct ovn_port *op, >>> } >>> } >>> >>> +static void >>> +build_bgp_redirect_rule__( >>> + const char *s_addr, const char *bgp_port_name, bool is_ipv6, >>> + struct ovn_port *ls_peer, struct lflow_table *lflows, >>> + struct ds *match, struct ds *actions) >>> +{ >>> + int ip_ver = is_ipv6 ? 6 : 4; >>> + /* Redirect packets in the input pipeline destined for LR's IP to >>> + * the port specified in 'bgp-redirect' option. >>> + */ >>> + ds_clear(match); >>> + ds_clear(actions); >>> + ds_put_format(match, "ip%d.dst == %s && tcp.dst == 179", ip_ver, s_addr); >>> + ds_put_format(actions, "outport = \"%s\"; output;", bgp_port_name); >>> + ovn_lflow_add(lflows, ls_peer->od, S_SWITCH_IN_L2_LKUP, 100, >>> + ds_cstr(match), >>> + ds_cstr(actions), >>> + ls_peer->lflow_ref); >>> + >>> + >>> + /* Drop any traffic originating from 'bgp-redirect' port that does >>> + * not originate from BGP daemon port. This blocks unnecessary >>> + * traffic like ARP broadcasts or IPv6 router solicitation packets >>> + * from the dummy 'bgp-redirect' port. >>> + */ >>> + ds_clear(match); >>> + ds_put_format(match, "inport == \"%s\"", bgp_port_name); >>> + ovn_lflow_add(lflows, ls_peer->od, S_SWITCH_IN_CHECK_PORT_SEC, 80, >>> + ds_cstr(match), >>> + REGBIT_PORT_SEC_DROP " = 1; next;", >>> + ls_peer->lflow_ref); >> Not a huge deal but this logical flow gets regenerated multiple times >> for each bgp_port_name (for each LRP address). In the end we'll only >> install one in the database so there's no impact on SB. But maybe we >> should consider moving this part out of this function. >> >>> + >>> + ds_put_format(match, >>> + " && ip%d.src == %s && tcp.src == 179", >>> + ip_ver, >>> + s_addr); >>> + ovn_lflow_add(lflows, ls_peer->od, S_SWITCH_IN_CHECK_PORT_SEC, 81, >>> + ds_cstr(match), >>> + REGBIT_PORT_SEC_DROP " = check_in_port_sec(); next;", >>> + ls_peer->lflow_ref); >> In your system test we set 'addresses = "unknown"' for 'bgp_port_name'. >> And I guess we kind of have to do it like that all the time when using >> this feature. I'm trying to think of scenarios in which users set >> LSP.addresses = 'unknown' and LSP.port-security = '<some-addresses>'. >> I'm not sure I saw those until now but maybe I'm wrong. >> >>> +} >>> + >>> +static void >>> +build_lrouter_bgp_redirect( >>> + struct ovn_port *op, struct lflow_table *lflows, >>> + struct ds *match, struct ds *actions) >>> +{ >>> + /* LRP has to have a peer.*/ >>> + if (op->peer == NULL) { >>> + return; >>> + } >>> + /* LRP has to have NB record.*/ >>> + if (op->nbrp == NULL) { >>> + return; >>> + } >>> + >>> + /* Proceed only for LRPs that have 'bgp-redirect' option set. Value of this >>> + * option is the name of LSP to which a BGP traffic will be redirected. >>> + */ >>> + const char *bgp_port = smap_get(&op->nbrp->options, "bgp-redirect"); >>> + if (bgp_port == NULL) { >>> + return; >>> + } >>> + >>> + /* Ensure that LSP, to which the BGP traffic is redirected, exists.*/ >>> + struct ovn_port *peer_lsp; >>> + bool bgp_port_exists = false; >>> + HMAP_FOR_EACH (peer_lsp, dp_node, &op->peer->od->ports) { >>> + size_t peer_lsp_s = strlen(peer_lsp->key); >>> + if (peer_lsp_s == strlen(bgp_port) >>> + && !strncmp(peer_lsp->key, bgp_port, peer_lsp_s)){ >>> + bgp_port_exists = true; >>> + break; >>> + } >>> + } >>> + >>> + if (!bgp_port_exists) { >>> + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5); >>> + VLOG_WARN_RL(&rl, "Option 'bgp-redirect' set on Logical Router Port " >>> + "'%s' refers to non-existent Logical Switch Port. " >>> + "BGP redirecting won't be configured.", >>> + op->key); >>> + return; >>> + } >>> + >>> + if (op->cr_port != NULL) { >>> + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5); >>> + VLOG_WARN_RL(&rl, "Option 'bgp-redirect' is not supported on" >>> + " Distributed Gateway Port '%s'", op->key); >>> + return; >>> + } >>> + >>> + /* Mirror traffic destined for LRP's IPs and default BGP port >>> + * to the port defined in 'bgp-redirect' option. >>> + */ >>> + for (size_t i = 0; i < op->lrp_networks.n_ipv4_addrs; i++) { >>> + const char *ip_s = op->lrp_networks.ipv4_addrs[i].addr_s; >>> + build_bgp_redirect_rule__(ip_s, bgp_port, false, op->peer, lflows, >>> + match, actions); >>> + } >>> + for (size_t i = 0; i < op->lrp_networks.n_ipv6_addrs; i++) { >>> + const char *ip_s = op->lrp_networks.ipv6_addrs[i].addr_s; >>> + build_bgp_redirect_rule__(ip_s, bgp_port, true, op->peer, lflows, >>> + match, actions); >>> + } >>> +} >>> + >>> /* This function adds ARP resolve flows related to a LSP. */ >>> static void >>> build_arp_resolve_flows_for_lsp( >>> @@ -16003,6 +16110,7 @@ build_lswitch_and_lrouter_iterate_by_lrp(struct ovn_port *op, >>> lsi->meter_groups, op->lflow_ref); >>> build_lrouter_icmp_packet_toobig_admin_flows(op, lsi->lflows, &lsi->match, >>> &lsi->actions, op->lflow_ref); >>> + build_lrouter_bgp_redirect(op, lsi->lflows, &lsi->match, &lsi->actions); >>> } >>> >>> static void * >>> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml >>> index b06b09ac5..4cbdf8f74 100644 >>> --- a/northd/ovn-northd.8.xml >>> +++ b/northd/ovn-northd.8.xml >>> @@ -284,6 +284,21 @@ >>> dropped in the next stage. >>> </li> >>> >>> + <li> >>> + For each logical port that's defined as a target of BGP redirecting >>> + (via <code>bgp-redirect</code> option set on Logical Router Port), a >>> + filter is set in place that disallows any traffic entering this port >>> + that does not originate from Logical Router Port's IPs and default BGP >>> + port (TCP 179). This filtering is achieved by two rules. First rule has >>> + priority 81, it matches on <code>inport == <var>BGP_MIRROR_PORT</var> >>> + && ip.src == <var>LRP_IP</var> && tcp.src == 179</code> >>> + and allows traffic further with <code>REGBIT_PORT_SEC_DROP" = >>> + check_in_port_sec(); next;</code>. Second rule with priority 80 matches >>> + the rest of the traffic from that port and sets >>> + <code>REGBIT_PORT_SEC_DROP" = 1; next;"</code> so that the packets are >>> + dropped in the next stage. >>> + </li> >>> + >>> <li> >>> For each (enabled) vtep logical port, a priority 70 flow is added which >>> matches on all packets and applies the action >>> @@ -1949,6 +1964,14 @@ output; >>> on the logical switch. >>> </li> >>> >>> + <li> >>> + For any logical port that's defined as a target of BGP redirecting (via >>> + <code>bgp-redirect</code> option set on Logical Router Port), a >>> + priority-100 flow is added that redirects traffic for Logical Router >>> + Port IPs (including IPv6 LLA) and TCP port 179, to the targeted >>> + logical port. >>> + </li> >>> + >>> <li> >>> Priority-90 flows for transit switches that forward registered >>> IP multicast traffic to their corresponding multicast group , which >>> diff --git a/ovn-nb.xml b/ovn-nb.xml >>> index 9552534f6..2d2f5c9e0 100644 >>> --- a/ovn-nb.xml >>> +++ b/ovn-nb.xml >>> @@ -3440,6 +3440,20 @@ or >>> </p> >>> </column> >>> >>> + <column name="options" key="bgp-redirect" type='{"type": "string"}'> >>> + <p> >>> + This option expects a name of a Logical Switch Port that's present >>> + in the peer's Logical Switch. If set, it causes any traffic >>> + that's destined for Logical Router Port's IP addresses (including >>> + its IPv6 LLA) and the default BGP port (TCP 179), to be redirected >>> + to the specified Logical Switch Port. >>> + >>> + This allows external BGP daemon to be bound to a port in OVN's >>> + Logical Switch and act as if it was listening on Logical Router >>> + Port's IP address. >>> + </p> >>> + </column> >>> + >>> <column name="options" key="gateway_mtu_bypass"> >>> <p> >>> When configured, represents a match expression, in the same >>> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at >>> index a389d1988..df1e924ab 100644 >>> --- a/tests/ovn-northd.at >>> +++ b/tests/ovn-northd.at >>> @@ -12721,3 +12721,61 @@ AT_CHECK([ovn-sbctl dump-flows lr | grep lr_in_dnat | ovn_strip_lflows], [0], [d >>> >>> AT_CLEANUP >>> ]) >>> + >>> +OVN_FOR_EACH_NORTHD_NO_HV([ >>> +AT_SETUP([BGP control plane redirect]) >>> +ovn_start >>> + >>> +check ovn-sbctl chassis-add hv1 geneve 127.0.0.1 >>> + >>> +check ovn-nbctl lr-add lr -- \ >>> + lrp-add lr lr-ls 02:ac:10:01:00:01 172.16.1.1/24 >>> +check ovn-nbctl --wait=sb set logical_router lr options:chassis=hv1 >>> + >>> +check ovn-nbctl ls-add ls -- \ >>> + lsp-add ls ls-lr -- \ >>> + lsp-set-type ls-lr router -- \ >>> + lsp-set-addresses ls-lr router -- \ >>> + lsp-set-options ls-lr router-port=lr-ls >>> + >>> +check ovn-nbctl lsp-add ls lsp-bgp -- \ >>> + lsp-set-addresses lsp-bgp unknown >>> + >>> +# Function that ensures that no BGP redirect ruels are installed. >>> +check_no_bgp_redirect() { >>> + AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_l2_lkup | grep "tcp.dst == 179" | wc -l], [0], [0 >>> +]) >>> + >>> + AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_check_port_sec | grep -E "priority=80|priority=81" | wc -l], [0], [0 >>> +]) >>> +} >>> + >>> +# By default, no rules related to BGP redirect are present >>> +check_no_bgp_redirect >>> + >>> +# Set "lsp-bgp" port as target of BGP control plane redirected traffic >>> +check ovn-nbctl --wait=sb set logical_router_port lr-ls options:bgp-redirect=lsp-bgp >>> + >>> +# Check that BGP control plane traffic is redirected to "lsp-bgp" >>> +AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_l2_lkup | grep "tcp.dst == 179" | ovn_strip_lflows], [0], [dnl >>> + table=??(ls_in_l2_lkup ), priority=100 , match=(ip4.dst == 172.16.1.1 && tcp.dst == 179), action=(outport = "lsp-bgp"; output;) >>> + table=??(ls_in_l2_lkup ), priority=100 , match=(ip6.dst == fe80::ac:10ff:fe01:1 && tcp.dst == 179), action=(outport = "lsp-bgp"; output;) >>> +]) >>> + >>> +# Check that only BGP-related traffic is accepted on "lsp-bgp" port >>> +AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_check_port_sec | grep -E "priority=80|priority=81" | ovn_strip_lflows], [0], [dnl >>> + table=??(ls_in_check_port_sec), priority=80 , match=(inport == "lsp-bgp"), action=(reg0[[15]] = 1; next;) >>> + table=??(ls_in_check_port_sec), priority=81 , match=(inport == "lsp-bgp" && ip4.src == 172.16.1.1 && tcp.src == 179), action=(reg0[[15]] = check_in_port_sec(); next;) >>> + table=??(ls_in_check_port_sec), priority=81 , match=(inport == "lsp-bgp" && ip6.src == fe80::ac:10ff:fe01:1 && tcp.src == 179), action=(reg0[[15]] = check_in_port_sec(); next;) >>> +]) >>> + >>> +# Remove 'bgp-redirect' option from LRP and check that rules are removed >>> +check ovn-nbctl --wait=sb remove logical_router_port lr-ls options bgp-redirect >>> +check_no_bgp_redirect >>> + >>> +# Set non-existent LSP as target of 'bgp-redirect' and check that no rules are added >>> +check ovn-nbctl --wait=sb set logical_router_port lr-ls options:bgp-redirect=lsp-foo >>> +check_no_bgp_redirect >>> + >>> +AT_CLEANUP >>> +]) >>> diff --git a/tests/system-ovn.at b/tests/system-ovn.at >>> index c24ede7c5..c09c1136e 100644 >>> --- a/tests/system-ovn.at >>> +++ b/tests/system-ovn.at >>> @@ -13027,3 +13027,89 @@ OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d >>> /connection dropped.*/d"]) >>> AT_CLEANUP >>> ]) >>> + >>> +OVN_FOR_EACH_NORTHD([ >>> +AT_SETUP([BGP control plane redirect]) >> Nit: probably best to add AT_SKIP_IF([test $HAVE_NC = no]) here. >> >>> +ovn_start >>> +OVS_TRAFFIC_VSWITCHD_START() >>> + >>> +ADD_BR([br-int]) >>> +ADD_BR([br-ext]) >>> + >>> +ovs-ofctl add-flow br-ext action=normal >>> +# Set external-ids in br-int needed for ovn-controller >>> +ovs-vsctl \ >>> + -- set Open_vSwitch . external-ids:system-id=hv1 \ >>> + -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \ >>> + -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \ >>> + -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \ >>> + -- set bridge br-int fail-mode=secure other-config:disable-in-band=true >> Most ov*-ctl commands should be prefixed with "check" in this test. >> >>> + >>> +# Start ovn-controller >>> +start_daemon ovn-controller >>> + >>> +ovn-nbctl lr-add R1 \ >>> + -- set Logical_Router R1 options:chassis=hv1 >>> + >>> +ovn-nbctl ls-add public >>> + >>> +ovn-nbctl lrp-add R1 rp-public 00:00:02:01:02:03 172.16.1.1/24 >>> + >>> +ovn-nbctl lsp-add public public-rp -- set Logical_Switch_Port public-rp \ >>> + type=router options:router-port=rp-public \ >>> + -- lsp-set-addresses public-rp router >>> + >>> +ovn-nbctl lsp-add public bgp-daemon \ >>> + -- lsp-set-addresses bgp-daemon unknown >>> + >>> +AT_CHECK([ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=phynet:br-ext]) >>> +ovn-nbctl lsp-add public public1 \ >>> + -- lsp-set-addresses public1 unknown \ >>> + -- lsp-set-type public1 localnet \ >>> + -- lsp-set-options public1 network_name=phynet >>> + >>> +ovn-nbctl --wait=hv sync >>> + >>> +# Set option that redirects BGP control plane traffic to a LSP "bgp-daemon" >>> +ovn-nbctl --wait=sb set logical_router_port rp-public options:bgp-redirect=bgp-daemon >>> + >>> +# Create "bgp-daemon" interface in a namespace with IP and MAC matching LRP "rp-public" >>> +ADD_NAMESPACES(bgp-daemon) >>> +ADD_VETH(bgp-daemon, bgp-daemon, br-int, "172.16.1.1/24", "00:00:02:01:02:03") >>> + >>> +ADD_NAMESPACES(ext-foo) >>> +ADD_VETH(ext-foo, ext-foo, br-ext, "172.16.1.100/24", "00:10:10:01:02:13", \ >>> + "172.16.1.1") >>> + >>> +# Flip the interface down/up to get proper IPv6 LLA >>> +NS_EXEC([bgp-daemon], [ip link set down bgp-daemon]) >>> +NS_EXEC([bgp-daemon], [ip link set up bgp-daemon]) >>> + >>> +# Wait until IPv6 LLA loses the "tentative" flag otherwise it can't be bound to. >>> +OVS_WAIT_UNTIL([NS_EXEC([bgp-daemon], [ip a show dev bgp-daemon | grep "fe80::" | grep -v tentative])]) >>> +OVS_WAIT_UNTIL([NS_EXEC([ext-foo], [ip a show dev ext-foo | grep "fe80::" | grep -v tentative])]) >>> + >>> +# Verify that BGP control plane traffic is delivered to the "bgp-daemon" >>> +# interface on both IPv4 and IPv6 LLA addresses >>> +NETNS_DAEMONIZE([bgp-daemon], [nc -l -k 172.16.1.1 179], [bgp_v4.pid]) >>> +NS_CHECK_EXEC([ext-foo], [echo "TCP test" | nc --send-only 172.16.1.1 179]) >>> + >>> +NETNS_DAEMONIZE([bgp-daemon], [nc -l -6 -k fe80::200:2ff:fe01:203%bgp-daemon 179], [bgp_v6.pid]) >>> +NS_CHECK_EXEC([ext-foo], [echo "TCP test" | nc --send-only -6 fe80::200:2ff:fe01:203%ext-foo 179]) >>> + >>> +OVS_APP_EXIT_AND_WAIT([ovn-controller]) >>> + >>> +as ovn-sb >>> +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >>> + >>> +as ovn-nb >>> +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >>> + >>> +as northd >>> +OVS_APP_EXIT_AND_WAIT([ovn-northd]) >>> + >>> +as >>> +OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d >>> +/.*terminating with signal 15.*/d"]) >>> +AT_CLEANUP >>> +]) >> Regards, >> Dumitru >> >> _______________________________________________ >> dev mailing list >> dev@openvswitch.org >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Hi Dumitru and Vladislav, Thank you both for the review and the feedback On Wed, 2024-08-07 at 09:17 +0200, Dumitru Ceara wrote: > On 8/6/24 19:22, Vladislav Odintsov wrote: > > Hi Martin, Dumitru, > > > > Hi Vladislav, > > > On 06.08.2024 19:23, Dumitru Ceara wrote: > > > Hi Martin, > > > > > > Sorry, I was reviewing V3 but I was slow in actually sending out > > > the email. > > > > > > On 8/6/24 13:17, Martin Kalcok wrote: > > > > This change adds a 'bgp-redirect' option to LRP that allows > > > > redirecting of BGP control plane traffic to an arbitrary LSP > > > > in its peer LS. > > > > > > > > The option expects a string with a LSP name. When set, > > > > any traffic entering LS that's destined for any of the > > > > LRP's IP addresses (including IPv6 LLA) is redirected > > > > to the LSP specified in the option's value. > > > > > > > > This enables external BGP daemons to listen on an interface > > > > bound to a LSP and effectively act as if they were listening > > > > on (and speaking from) LRP's IP address. > > > > > > > > Signed-off-by: Martin Kalcok <martin.kalcok@canonical.com> > > > > --- > > > Strictly on this version of the patch, and with the thought in > > > mind that > > > we might want to consider this feature experimental [0] and maybe > > > change > > > it (NB too) in the future, I left a few comments inline. With > > > those > > > addressed I think the patch looks OK to me. > > > > > > [0] > > > https://patchwork.ozlabs.org/project/ovn/patch/20240802155449.3528777-1-mmichels@redhat.com/ > > > I'm keeping an eye on this one and I think you made a good point Dumitru about the need for clear way to mark feature as experimental. I'll chime in that patch discussion as well. > > > Now, for the point when we'd remove the "experimental" tag: > > > > > > In general, it makes me a bit uneasy that we have to share the > > > MAC and > > > IP between the LRP and the VIF of logical switch port that's > > > connected > > > to the same switch as the LRP. I think that in one way or another, the requirement for the interface to share IP/MAC with the LRP will remain. The reason for sharing them is to allow operating BGP unnumbered which uses IPv6 LLA instead of pre-configured ASN. > > > > > > I was thinking of alternatives for the future and the only things > > > I > > > could come up with until now are: > > > > > > 1. Create a separate, "disconnected" logical switch: > > > ovn-nbctl ls-add bgp > > > > > > Add the bgp-daemon port to it: > > > ovn-nbctl lsp-add bgp bgp-daemon. > > > > > > Then we don't need "unknown" either, I think. > > > > > > But it's not possible today to move packets from the ing,ress > > > pipeline of > > > a logical switch ("public" in this test) to the egress pipeline > > > of a > > > different logical switch ("bgp" in my example). It also feels > > > icky to > > > implement that. > > > > > > 2. Add the ability to bind an OVS port directly to a logical > > > router port: > > > > > > then we could do the same type of redirect you do here but > > > instead in > > > the logical router pipeline. The advantage is we don't have to > > > drop any > > > non-bgp traffic in the switch pipeline. The switch keeps > > > functioning as > > > it does today. I like the second approach a lot. It would indeed feel more "in place" to have this logic in the LR, instead of LS. I didn't think this would be possible initially. > > > > > > Maybe an advantage of this second alternative would be that we > > > can > > > easily attach a filtering option to this LRP (e.g., > > > LRP.options:control-traffic-filter) to allow us to be more > > > flexible in > > > what kind of traffic we forward to the actuall routing protocol > > > daemon > > > that runs behind that OVS port - Vladislav also mentioned during > > > the > > > meeting it might be interesting to forward BFD packets to the FRR > > > (or > > > whatever application implements the routing protocol) instance > > > too. > > > > @Martin, I'm again kindly asking about possible support for BFD > > redirect. Is it possible to incorporate these changes as an > > additional > > configuration know ("bfd-redirect") into your patch so we can start > > using this functionality internally and possibly give some feedback > > (hopefully positive)? > > > > It doesn't seem to be a very big change but looks very attractive > > for us. > > > > I can send a separate patch for this on top of your patch, but > > technically it won't be able to jump into the 24.09 because > > formally > > it's already a soft-freeze in progress. As an option I can send a > > patch, > > which can be squashed into this one. > > > > What do you think? > > I didn't check with other maintainers (added them in CC now), but > given > that we agreed to try to accept the port redirecting patch as > experimental I think it's fine to expand it to allow BFD support too. > Posting a follow up patch on top of this one is fine from my > perspective. Indeed the change is not that big code-wise, but given that it would require some forethought about the implementation I was bit hesitant to include it in here. I didn't want to rock the boat too much, so to speak. However given that this is indeed going to be marked as experimental and if maintainers are OK with it, I'm happy to post v5 with this change included. Feedback from the real-world usage from Vladislav would be excellent thing to have when we'll move this feature from experimental to stable. > > However, what if we change the option to? > - NB.Logical_Router_Port.options:routing-protocol-redirect=<LSP-NAME> > - NB.Logical_Router_Port.options:routing- > protocols=<LIST_OF_PROTOCOLS> > > E.g.: > NB.Logical_Router_Port.options:routing-protocol-redirect="bgp-daemon" > NB.Logical_Router_Port.options:routing-protocols="BGP;BFD" These seem like a good option to me. > > Would this have a higher chance of becoming fully supported in the > next > release? > > Regards, > Dumitru > > > > > > But again, if we consider the current feature "experimental", we > > > can > > > spend more time during the next release cycle to figure out > > > what's best. > > > > > > > v4 of this patch renames the feature from "bgp-mirror" to "bgp- > > > > redirect" > > > > as discussed during community meeting. > > > > > > > > northd/northd.c | 108 > > > > ++++++++++++++++++++++++++++++++++++++++ > > > > northd/ovn-northd.8.xml | 23 +++++++++ > > > > ovn-nb.xml | 14 ++++++ > > > > tests/ovn-northd.at | 58 +++++++++++++++++++++ > > > > tests/system-ovn.at | 86 ++++++++++++++++++++++++++++++++ > > > > 5 files changed, 289 insertions(+) > > > > > > > > diff --git a/northd/northd.c b/northd/northd.c > > > > index 4353df07d..088104f25 100644 > > > > --- a/northd/northd.c > > > > +++ b/northd/northd.c > > > > @@ -13048,6 +13048,113 @@ > > > > build_arp_resolve_flows_for_lrp(struct ovn_port *op, > > > > } > > > > } > > > > > > > > +static void > > > > +build_bgp_redirect_rule__( > > > > + const char *s_addr, const char *bgp_port_name, bool > > > > is_ipv6, > > > > + struct ovn_port *ls_peer, struct lflow_table *lflows, > > > > + struct ds *match, struct ds *actions) > > > > +{ > > > > + int ip_ver = is_ipv6 ? 6 : 4; > > > > + /* Redirect packets in the input pipeline destined for > > > > LR's IP to > > > > + * the port specified in 'bgp-redirect' option. > > > > + */ > > > > + ds_clear(match); > > > > + ds_clear(actions); > > > > + ds_put_format(match, "ip%d.dst == %s && tcp.dst == 179", > > > > ip_ver, s_addr); > > > > + ds_put_format(actions, "outport = \"%s\"; output;", > > > > bgp_port_name); > > > > + ovn_lflow_add(lflows, ls_peer->od, S_SWITCH_IN_L2_LKUP, > > > > 100, > > > > + ds_cstr(match), > > > > + ds_cstr(actions), > > > > + ls_peer->lflow_ref); > > > > + > > > > + > > > > + /* Drop any traffic originating from 'bgp-redirect' port > > > > that does > > > > + * not originate from BGP daemon port. This blocks > > > > unnecessary > > > > + * traffic like ARP broadcasts or IPv6 router solicitation > > > > packets > > > > + * from the dummy 'bgp-redirect' port. > > > > + */ > > > > + ds_clear(match); > > > > + ds_put_format(match, "inport == \"%s\"", bgp_port_name); > > > > + ovn_lflow_add(lflows, ls_peer->od, > > > > S_SWITCH_IN_CHECK_PORT_SEC, 80, > > > > + ds_cstr(match), > > > > + REGBIT_PORT_SEC_DROP " = 1; next;", > > > > + ls_peer->lflow_ref); > > > Not a huge deal but this logical flow gets regenerated multiple > > > times > > > for each bgp_port_name (for each LRP address). In the end we'll > > > only > > > install one in the database so there's no impact on SB. But > > > maybe we > > > should consider moving this part out of this function. ack > > > > > > > + > > > > + ds_put_format(match, > > > > + " && ip%d.src == %s && tcp.src == 179", > > > > + ip_ver, > > > > + s_addr); > > > > + ovn_lflow_add(lflows, ls_peer->od, > > > > S_SWITCH_IN_CHECK_PORT_SEC, 81, > > > > + ds_cstr(match), > > > > + REGBIT_PORT_SEC_DROP " = > > > > check_in_port_sec(); next;", > > > > + ls_peer->lflow_ref); > > > In your system test we set 'addresses = "unknown"' for > > > 'bgp_port_name'. > > > And I guess we kind of have to do it like that all the time > > > when using > > > this feature. I'm trying to think of scenarios in which users > > > set > > > LSP.addresses = 'unknown' and LSP.port-security = '<some- > > > addresses>'. > > > I'm not sure I saw those until now but maybe I'm wrong. Is the concern here that the "bgp-port" with the "unknown" address is going to receive all the unicast traffic for unknown MAC adresses as well? I initially tried to add deny rules that would block any incoming packets except those that match LRP's IP and BGP port, but it turned out that the negative matches are kinda icky (for a match that has something like `ip4.src != <IP_ADDR>` it generated hundreds of OF rules in the ovs) > > > > > > > +} > > > > + > > > > +static void > > > > +build_lrouter_bgp_redirect( > > > > + struct ovn_port *op, struct lflow_table *lflows, > > > > + struct ds *match, struct ds *actions) > > > > +{ > > > > + /* LRP has to have a peer.*/ > > > > + if (op->peer == NULL) { > > > > + return; > > > > + } > > > > + /* LRP has to have NB record.*/ > > > > + if (op->nbrp == NULL) { > > > > + return; > > > > + } > > > > + > > > > + /* Proceed only for LRPs that have 'bgp-redirect' option > > > > set. Value of this > > > > + * option is the name of LSP to which a BGP traffic will > > > > be redirected. > > > > + */ > > > > + const char *bgp_port = smap_get(&op->nbrp->options, "bgp- > > > > redirect"); > > > > + if (bgp_port == NULL) { > > > > + return; > > > > + } > > > > + > > > > + /* Ensure that LSP, to which the BGP traffic is > > > > redirected, exists.*/ > > > > + struct ovn_port *peer_lsp; > > > > + bool bgp_port_exists = false; > > > > + HMAP_FOR_EACH (peer_lsp, dp_node, &op->peer->od->ports) { > > > > + size_t peer_lsp_s = strlen(peer_lsp->key); > > > > + if (peer_lsp_s == strlen(bgp_port) > > > > + && !strncmp(peer_lsp->key, bgp_port, peer_lsp_s)){ > > > > + bgp_port_exists = true; > > > > + break; > > > > + } > > > > + } > > > > + > > > > + if (!bgp_port_exists) { > > > > + static struct vlog_rate_limit rl = > > > > VLOG_RATE_LIMIT_INIT(1, 5); > > > > + VLOG_WARN_RL(&rl, "Option 'bgp-redirect' set on > > > > Logical Router Port " > > > > + "'%s' refers to non-existent Logical > > > > Switch Port. " > > > > + "BGP redirecting won't be > > > > configured.", > > > > + op->key); > > > > + return; > > > > + } > > > > + > > > > + if (op->cr_port != NULL) { > > > > + static struct vlog_rate_limit rl = > > > > VLOG_RATE_LIMIT_INIT(1, 5); > > > > + VLOG_WARN_RL(&rl, "Option 'bgp-redirect' is not > > > > supported on" > > > > + " Distributed Gateway Port '%s'", > > > > op->key); > > > > + return; > > > > + } > > > > + > > > > + /* Mirror traffic destined for LRP's IPs and default BGP > > > > port > > > > + * to the port defined in 'bgp-redirect' option. > > > > + */ > > > > + for (size_t i = 0; i < op->lrp_networks.n_ipv4_addrs; i++) > > > > { > > > > + const char *ip_s = op- > > > > >lrp_networks.ipv4_addrs[i].addr_s; > > > > + build_bgp_redirect_rule__(ip_s, bgp_port, false, op- > > > > >peer, lflows, > > > > + match, actions); > > > > + } > > > > + for (size_t i = 0; i < op->lrp_networks.n_ipv6_addrs; i++) > > > > { > > > > + const char *ip_s = op- > > > > >lrp_networks.ipv6_addrs[i].addr_s; > > > > + build_bgp_redirect_rule__(ip_s, bgp_port, true, op- > > > > >peer, lflows, > > > > + match, actions); > > > > + } > > > > +} > > > > + > > > > /* This function adds ARP resolve flows related to a LSP. */ > > > > static void > > > > build_arp_resolve_flows_for_lsp( > > > > @@ -16003,6 +16110,7 @@ > > > > build_lswitch_and_lrouter_iterate_by_lrp(struct ovn_port *op, > > > > lsi->meter_groups, op- > > > > >lflow_ref); > > > > build_lrouter_icmp_packet_toobig_admin_flows(op, lsi- > > > > >lflows, &lsi->match, > > > > &lsi- > > > > >actions, op->lflow_ref); > > > > + build_lrouter_bgp_redirect(op, lsi->lflows, &lsi->match, > > > > &lsi->actions); > > > > } > > > > > > > > static void * > > > > diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml > > > > index b06b09ac5..4cbdf8f74 100644 > > > > --- a/northd/ovn-northd.8.xml > > > > +++ b/northd/ovn-northd.8.xml > > > > @@ -284,6 +284,21 @@ > > > > dropped in the next stage. > > > > </li> > > > > > > > > + <li> > > > > + For each logical port that's defined as a target of > > > > BGP redirecting > > > > + (via <code>bgp-redirect</code> option set on Logical > > > > Router Port), a > > > > + filter is set in place that disallows any traffic > > > > entering this port > > > > + that does not originate from Logical Router Port's IPs > > > > and default BGP > > > > + port (TCP 179). This filtering is achieved by two > > > > rules. First rule has > > > > + priority 81, it matches on <code>inport == > > > > <var>BGP_MIRROR_PORT</var> > > > > + && ip.src == <var>LRP_IP</var> && > > > > tcp.src == 179</code> > > > > + and allows traffic further with > > > > <code>REGBIT_PORT_SEC_DROP" = > > > > + check_in_port_sec(); next;</code>. Second rule with > > > > priority 80 matches > > > > + the rest of the traffic from that port and sets > > > > + <code>REGBIT_PORT_SEC_DROP" = 1; next;"</code> so that > > > > the packets are > > > > + dropped in the next stage. > > > > + </li> > > > > + > > > > <li> > > > > For each (enabled) vtep logical port, a priority 70 > > > > flow is added which > > > > matches on all packets and applies the action > > > > @@ -1949,6 +1964,14 @@ output; > > > > on the logical switch. > > > > </li> > > > > > > > > + <li> > > > > + For any logical port that's defined as a target of BGP > > > > redirecting (via > > > > + <code>bgp-redirect</code> option set on Logical Router > > > > Port), a > > > > + priority-100 flow is added that redirects traffic for > > > > Logical Router > > > > + Port IPs (including IPv6 LLA) and TCP port 179, to the > > > > targeted > > > > + logical port. > > > > + </li> > > > > + > > > > <li> > > > > Priority-90 flows for transit switches that forward > > > > registered > > > > IP multicast traffic to their corresponding multicast > > > > group , which > > > > diff --git a/ovn-nb.xml b/ovn-nb.xml > > > > index 9552534f6..2d2f5c9e0 100644 > > > > --- a/ovn-nb.xml > > > > +++ b/ovn-nb.xml > > > > @@ -3440,6 +3440,20 @@ or > > > > </p> > > > > </column> > > > > > > > > + <column name="options" key="bgp-redirect" type='{"type": > > > > "string"}'> > > > > + <p> > > > > + This option expects a name of a Logical Switch Port > > > > that's present > > > > + in the peer's Logical Switch. If set, it causes any > > > > traffic > > > > + that's destined for Logical Router Port's IP > > > > addresses (including > > > > + its IPv6 LLA) and the default BGP port (TCP 179), to > > > > be redirected > > > > + to the specified Logical Switch Port. > > > > + > > > > + This allows external BGP daemon to be bound to a > > > > port in OVN's > > > > + Logical Switch and act as if it was listening on > > > > Logical Router > > > > + Port's IP address. > > > > + </p> > > > > + </column> > > > > + > > > > <column name="options" key="gateway_mtu_bypass"> > > > > <p> > > > > When configured, represents a match expression, in > > > > the same > > > > diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at > > > > index a389d1988..df1e924ab 100644 > > > > --- a/tests/ovn-northd.at > > > > +++ b/tests/ovn-northd.at > > > > @@ -12721,3 +12721,61 @@ AT_CHECK([ovn-sbctl dump-flows lr | > > > > grep lr_in_dnat | ovn_strip_lflows], [0], [d > > > > > > > > AT_CLEANUP > > > > ]) > > > > + > > > > +OVN_FOR_EACH_NORTHD_NO_HV([ > > > > +AT_SETUP([BGP control plane redirect]) > > > > +ovn_start > > > > + > > > > +check ovn-sbctl chassis-add hv1 geneve 127.0.0.1 > > > > + > > > > +check ovn-nbctl lr-add lr -- \ > > > > + lrp-add lr lr-ls 02:ac:10:01:00:01 172.16.1.1/24 > > > > +check ovn-nbctl --wait=sb set logical_router lr > > > > options:chassis=hv1 > > > > + > > > > +check ovn-nbctl ls-add ls -- \ > > > > + lsp-add ls ls-lr -- \ > > > > + lsp-set-type ls-lr router -- \ > > > > + lsp-set-addresses ls-lr router -- \ > > > > + lsp-set-options ls-lr router-port=lr-ls > > > > + > > > > +check ovn-nbctl lsp-add ls lsp-bgp -- \ > > > > + lsp-set-addresses lsp-bgp unknown > > > > + > > > > +# Function that ensures that no BGP redirect ruels are > > > > installed. > > > > +check_no_bgp_redirect() { > > > > + AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_l2_lkup | > > > > grep "tcp.dst == 179" | wc -l], [0], [0 > > > > +]) > > > > + > > > > + AT_CHECK([ovn-sbctl dump-flows ls | grep > > > > ls_in_check_port_sec | grep -E "priority=80|priority=81" | wc - > > > > l], [0], [0 > > > > +]) > > > > +} > > > > + > > > > +# By default, no rules related to BGP redirect are present > > > > +check_no_bgp_redirect > > > > + > > > > +# Set "lsp-bgp" port as target of BGP control plane redirected > > > > traffic > > > > +check ovn-nbctl --wait=sb set logical_router_port lr-ls > > > > options:bgp-redirect=lsp-bgp > > > > + > > > > +# Check that BGP control plane traffic is redirected to "lsp- > > > > bgp" > > > > +AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_l2_lkup | grep > > > > "tcp.dst == 179" | ovn_strip_lflows], [0], [dnl > > > > + table=??(ls_in_l2_lkup ), priority=100 , > > > > match=(ip4.dst == 172.16.1.1 && tcp.dst == 179), > > > > action=(outport = "lsp-bgp"; output;) > > > > + table=??(ls_in_l2_lkup ), priority=100 , > > > > match=(ip6.dst == fe80::ac:10ff:fe01:1 && tcp.dst == 179), > > > > action=(outport = "lsp-bgp"; output;) > > > > +]) > > > > + > > > > +# Check that only BGP-related traffic is accepted on "lsp-bgp" > > > > port > > > > +AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_check_port_sec > > > > | grep -E "priority=80|priority=81" | ovn_strip_lflows], [0], > > > > [dnl > > > > + table=??(ls_in_check_port_sec), priority=80 , > > > > match=(inport == "lsp-bgp"), action=(reg0[[15]] = 1; next;) > > > > + table=??(ls_in_check_port_sec), priority=81 , > > > > match=(inport == "lsp-bgp" && ip4.src == 172.16.1.1 && tcp.src > > > > == 179), action=(reg0[[15]] = check_in_port_sec(); next;) > > > > + table=??(ls_in_check_port_sec), priority=81 , > > > > match=(inport == "lsp-bgp" && ip6.src == fe80::ac:10ff:fe01:1 > > > > && tcp.src == 179), action=(reg0[[15]] = check_in_port_sec(); > > > > next;) > > > > +]) > > > > + > > > > +# Remove 'bgp-redirect' option from LRP and check that rules > > > > are removed > > > > +check ovn-nbctl --wait=sb remove logical_router_port lr-ls > > > > options bgp-redirect > > > > +check_no_bgp_redirect > > > > + > > > > +# Set non-existent LSP as target of 'bgp-redirect' and check > > > > that no rules are added > > > > +check ovn-nbctl --wait=sb set logical_router_port lr-ls > > > > options:bgp-redirect=lsp-foo > > > > +check_no_bgp_redirect > > > > + > > > > +AT_CLEANUP > > > > +]) > > > > diff --git a/tests/system-ovn.at b/tests/system-ovn.at > > > > index c24ede7c5..c09c1136e 100644 > > > > --- a/tests/system-ovn.at > > > > +++ b/tests/system-ovn.at > > > > @@ -13027,3 +13027,89 @@ OVS_TRAFFIC_VSWITCHD_STOP(["/failed to > > > > query port patch-.*/d > > > > /connection dropped.*/d"]) > > > > AT_CLEANUP > > > > ]) > > > > + > > > > +OVN_FOR_EACH_NORTHD([ > > > > +AT_SETUP([BGP control plane redirect]) > > > Nit: probably best to add AT_SKIP_IF([test $HAVE_NC = no]) here. ack > > > > > > > +ovn_start > > > > +OVS_TRAFFIC_VSWITCHD_START() > > > > + > > > > +ADD_BR([br-int]) > > > > +ADD_BR([br-ext]) > > > > + > > > > +ovs-ofctl add-flow br-ext action=normal > > > > +# Set external-ids in br-int needed for ovn-controller > > > > +ovs-vsctl \ > > > > + -- set Open_vSwitch . external-ids:system-id=hv1 \ > > > > + -- set Open_vSwitch . external-ids:ovn- > > > > remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \ > > > > + -- set Open_vSwitch . external-ids:ovn-encap- > > > > type=geneve \ > > > > + -- set Open_vSwitch . external-ids:ovn-encap- > > > > ip=169.0.0.1 \ > > > > + -- set bridge br-int fail-mode=secure other- > > > > config:disable-in-band=true > > > Most ov*-ctl commands should be prefixed with "check" in this > > > test. ack > > > > > > > + > > > > +# Start ovn-controller > > > > +start_daemon ovn-controller > > > > + > > > > +ovn-nbctl lr-add R1 \ > > > > + -- set Logical_Router R1 options:chassis=hv1 > > > > + > > > > +ovn-nbctl ls-add public > > > > + > > > > +ovn-nbctl lrp-add R1 rp-public 00:00:02:01:02:03 172.16.1.1/24 > > > > + > > > > +ovn-nbctl lsp-add public public-rp -- set Logical_Switch_Port > > > > public-rp \ > > > > + type=router options:router-port=rp-public \ > > > > + -- lsp-set-addresses public-rp router > > > > + > > > > +ovn-nbctl lsp-add public bgp-daemon \ > > > > + -- lsp-set-addresses bgp-daemon unknown > > > > + > > > > +AT_CHECK([ovs-vsctl set Open_vSwitch . external-ids:ovn- > > > > bridge-mappings=phynet:br-ext]) > > > > +ovn-nbctl lsp-add public public1 \ > > > > + -- lsp-set-addresses public1 unknown \ > > > > + -- lsp-set-type public1 localnet \ > > > > + -- lsp-set-options public1 network_name=phynet > > > > + > > > > +ovn-nbctl --wait=hv sync > > > > + > > > > +# Set option that redirects BGP control plane traffic to a LSP > > > > "bgp-daemon" > > > > +ovn-nbctl --wait=sb set logical_router_port rp-public > > > > options:bgp-redirect=bgp-daemon > > > > + > > > > +# Create "bgp-daemon" interface in a namespace with IP and MAC > > > > matching LRP "rp-public" > > > > +ADD_NAMESPACES(bgp-daemon) > > > > +ADD_VETH(bgp-daemon, bgp-daemon, br-int, "172.16.1.1/24", > > > > "00:00:02:01:02:03") > > > > + > > > > +ADD_NAMESPACES(ext-foo) > > > > +ADD_VETH(ext-foo, ext-foo, br-ext, "172.16.1.100/24", > > > > "00:10:10:01:02:13", \ > > > > + "172.16.1.1") > > > > + > > > > +# Flip the interface down/up to get proper IPv6 LLA > > > > +NS_EXEC([bgp-daemon], [ip link set down bgp-daemon]) > > > > +NS_EXEC([bgp-daemon], [ip link set up bgp-daemon]) > > > > + > > > > +# Wait until IPv6 LLA loses the "tentative" flag otherwise it > > > > can't be bound to. > > > > +OVS_WAIT_UNTIL([NS_EXEC([bgp-daemon], [ip a show dev bgp- > > > > daemon | grep "fe80::" | grep -v tentative])]) > > > > +OVS_WAIT_UNTIL([NS_EXEC([ext-foo], [ip a show dev ext-foo | > > > > grep "fe80::" | grep -v tentative])]) > > > > + > > > > +# Verify that BGP control plane traffic is delivered to the > > > > "bgp-daemon" > > > > +# interface on both IPv4 and IPv6 LLA addresses > > > > +NETNS_DAEMONIZE([bgp-daemon], [nc -l -k 172.16.1.1 179], > > > > [bgp_v4.pid]) > > > > +NS_CHECK_EXEC([ext-foo], [echo "TCP test" | nc --send-only > > > > 172.16.1.1 179]) > > > > + > > > > +NETNS_DAEMONIZE([bgp-daemon], [nc -l -6 -k > > > > fe80::200:2ff:fe01:203%bgp-daemon 179], [bgp_v6.pid]) > > > > +NS_CHECK_EXEC([ext-foo], [echo "TCP test" | nc --send-only -6 > > > > fe80::200:2ff:fe01:203%ext-foo 179]) > > > > + > > > > +OVS_APP_EXIT_AND_WAIT([ovn-controller]) > > > > + > > > > +as ovn-sb > > > > +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) > > > > + > > > > +as ovn-nb > > > > +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) > > > > + > > > > +as northd > > > > +OVS_APP_EXIT_AND_WAIT([ovn-northd]) > > > > + > > > > +as > > > > +OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d > > > > +/.*terminating with signal 15.*/d"]) > > > > +AT_CLEANUP > > > > +]) > > > Regards, > > > Dumitru > > > > > > _______________________________________________ > > > dev mailing list > > > dev@openvswitch.org > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > Thanks, Martin.
On 8/7/24 11:17, martin.kalcok@canonical.com wrote: > Hi Dumitru and Vladislav, > Thank you both for the review and the feedback > > On Wed, 2024-08-07 at 09:17 +0200, Dumitru Ceara wrote: >> On 8/6/24 19:22, Vladislav Odintsov wrote: >>> Hi Martin, Dumitru, >>> >> >> Hi Vladislav, >> >>> On 06.08.2024 19:23, Dumitru Ceara wrote: >>>> Hi Martin, >>>> >>>> Sorry, I was reviewing V3 but I was slow in actually sending out >>>> the email. >>>> >>>> On 8/6/24 13:17, Martin Kalcok wrote: >>>>> This change adds a 'bgp-redirect' option to LRP that allows >>>>> redirecting of BGP control plane traffic to an arbitrary LSP >>>>> in its peer LS. >>>>> >>>>> The option expects a string with a LSP name. When set, >>>>> any traffic entering LS that's destined for any of the >>>>> LRP's IP addresses (including IPv6 LLA) is redirected >>>>> to the LSP specified in the option's value. >>>>> >>>>> This enables external BGP daemons to listen on an interface >>>>> bound to a LSP and effectively act as if they were listening >>>>> on (and speaking from) LRP's IP address. >>>>> >>>>> Signed-off-by: Martin Kalcok <martin.kalcok@canonical.com> >>>>> --- >>>> Strictly on this version of the patch, and with the thought in >>>> mind that >>>> we might want to consider this feature experimental [0] and maybe >>>> change >>>> it (NB too) in the future, I left a few comments inline. With >>>> those >>>> addressed I think the patch looks OK to me. >>>> >>>> [0] >>>> https://patchwork.ozlabs.org/project/ovn/patch/20240802155449.3528777-1-mmichels@redhat.com/ >>>> > > I'm keeping an eye on this one and I think you made a good point > Dumitru about the need for clear way to mark feature as experimental. > I'll chime in that patch discussion as well. > Thanks! >>>> Now, for the point when we'd remove the "experimental" tag: >>>> >>>> In general, it makes me a bit uneasy that we have to share the >>>> MAC and >>>> IP between the LRP and the VIF of logical switch port that's >>>> connected >>>> to the same switch as the LRP. > > I think that in one way or another, the requirement for the interface > to share IP/MAC with the LRP will remain. The reason for sharing them > is to allow operating BGP unnumbered which uses IPv6 LLA instead of > pre-configured ASN. > I think we don't need it with any of the alternative options below but we can discuss those in detail after branching, in the 25.03 cycle. >>>> >>>> I was thinking of alternatives for the future and the only things >>>> I >>>> could come up with until now are: >>>> >>>> 1. Create a separate, "disconnected" logical switch: >>>> ovn-nbctl ls-add bgp >>>> >>>> Add the bgp-daemon port to it: >>>> ovn-nbctl lsp-add bgp bgp-daemon. >>>> >>>> Then we don't need "unknown" either, I think. >>>> >>>> But it's not possible today to move packets from the ing,ress >>>> pipeline of >>>> a logical switch ("public" in this test) to the egress pipeline >>>> of a >>>> different logical switch ("bgp" in my example). It also feels >>>> icky to >>>> implement that. >>>> >>>> 2. Add the ability to bind an OVS port directly to a logical >>>> router port: >>>> >>>> then we could do the same type of redirect you do here but >>>> instead in >>>> the logical router pipeline. The advantage is we don't have to >>>> drop any >>>> non-bgp traffic in the switch pipeline. The switch keeps >>>> functioning as >>>> it does today. > > I like the second approach a lot. It would indeed feel more "in place" > to have this logic in the LR, instead of LS. I didn't think this would > be possible initially. > Ack, maybe we should come back to this discussion after branching. >>>> >>>> Maybe an advantage of this second alternative would be that we >>>> can >>>> easily attach a filtering option to this LRP (e.g., >>>> LRP.options:control-traffic-filter) to allow us to be more >>>> flexible in >>>> what kind of traffic we forward to the actuall routing protocol >>>> daemon >>>> that runs behind that OVS port - Vladislav also mentioned during >>>> the >>>> meeting it might be interesting to forward BFD packets to the FRR >>>> (or >>>> whatever application implements the routing protocol) instance >>>> too. >>> >>> @Martin, I'm again kindly asking about possible support for BFD >>> redirect. Is it possible to incorporate these changes as an >>> additional >>> configuration know ("bfd-redirect") into your patch so we can start >>> using this functionality internally and possibly give some feedback >>> (hopefully positive)? >>> >>> It doesn't seem to be a very big change but looks very attractive >>> for us. >>> >>> I can send a separate patch for this on top of your patch, but >>> technically it won't be able to jump into the 24.09 because >>> formally >>> it's already a soft-freeze in progress. As an option I can send a >>> patch, >>> which can be squashed into this one. >>> >>> What do you think? >> >> I didn't check with other maintainers (added them in CC now), but >> given >> that we agreed to try to accept the port redirecting patch as >> experimental I think it's fine to expand it to allow BFD support too. >> Posting a follow up patch on top of this one is fine from my >> perspective. > > Indeed the change is not that big code-wise, but given that it would > require some forethought about the implementation I was bit hesitant to > include it in here. I didn't want to rock the boat too much, so to > speak. However given that this is indeed going to be marked as > experimental and if maintainers are OK with it, I'm happy to post v5 > with this change included. Feedback from the real-world usage from > Vladislav would be excellent thing to have when we'll move this feature > from experimental to stable. > >> >> However, what if we change the option to? >> - NB.Logical_Router_Port.options:routing-protocol-redirect=<LSP-NAME> >> - NB.Logical_Router_Port.options:routing- >> protocols=<LIST_OF_PROTOCOLS> >> >> E.g.: >> NB.Logical_Router_Port.options:routing-protocol-redirect="bgp-daemon" >> NB.Logical_Router_Port.options:routing-protocols="BGP;BFD" > > These seem like a good option to me. Martin, if it's not too much work could we try to add this to v5 then? Vladislav, does that work for you too? Regards, Dumitru >> >> Would this have a higher chance of becoming fully supported in the >> next >> release? >> >> Regards, >> Dumitru >> >>> >>>> But again, if we consider the current feature "experimental", we >>>> can >>>> spend more time during the next release cycle to figure out >>>> what's best. >>>> >>>>> v4 of this patch renames the feature from "bgp-mirror" to "bgp- >>>>> redirect" >>>>> as discussed during community meeting. >>>>> >>>>> northd/northd.c | 108 >>>>> ++++++++++++++++++++++++++++++++++++++++ >>>>> northd/ovn-northd.8.xml | 23 +++++++++ >>>>> ovn-nb.xml | 14 ++++++ >>>>> tests/ovn-northd.at | 58 +++++++++++++++++++++ >>>>> tests/system-ovn.at | 86 ++++++++++++++++++++++++++++++++ >>>>> 5 files changed, 289 insertions(+) >>>>> >>>>> diff --git a/northd/northd.c b/northd/northd.c >>>>> index 4353df07d..088104f25 100644 >>>>> --- a/northd/northd.c >>>>> +++ b/northd/northd.c >>>>> @@ -13048,6 +13048,113 @@ >>>>> build_arp_resolve_flows_for_lrp(struct ovn_port *op, >>>>> } >>>>> } >>>>> >>>>> +static void >>>>> +build_bgp_redirect_rule__( >>>>> + const char *s_addr, const char *bgp_port_name, bool >>>>> is_ipv6, >>>>> + struct ovn_port *ls_peer, struct lflow_table *lflows, >>>>> + struct ds *match, struct ds *actions) >>>>> +{ >>>>> + int ip_ver = is_ipv6 ? 6 : 4; >>>>> + /* Redirect packets in the input pipeline destined for >>>>> LR's IP to >>>>> + * the port specified in 'bgp-redirect' option. >>>>> + */ >>>>> + ds_clear(match); >>>>> + ds_clear(actions); >>>>> + ds_put_format(match, "ip%d.dst == %s && tcp.dst == 179", >>>>> ip_ver, s_addr); >>>>> + ds_put_format(actions, "outport = \"%s\"; output;", >>>>> bgp_port_name); >>>>> + ovn_lflow_add(lflows, ls_peer->od, S_SWITCH_IN_L2_LKUP, >>>>> 100, >>>>> + ds_cstr(match), >>>>> + ds_cstr(actions), >>>>> + ls_peer->lflow_ref); >>>>> + >>>>> + >>>>> + /* Drop any traffic originating from 'bgp-redirect' port >>>>> that does >>>>> + * not originate from BGP daemon port. This blocks >>>>> unnecessary >>>>> + * traffic like ARP broadcasts or IPv6 router solicitation >>>>> packets >>>>> + * from the dummy 'bgp-redirect' port. >>>>> + */ >>>>> + ds_clear(match); >>>>> + ds_put_format(match, "inport == \"%s\"", bgp_port_name); >>>>> + ovn_lflow_add(lflows, ls_peer->od, >>>>> S_SWITCH_IN_CHECK_PORT_SEC, 80, >>>>> + ds_cstr(match), >>>>> + REGBIT_PORT_SEC_DROP " = 1; next;", >>>>> + ls_peer->lflow_ref); >>>> Not a huge deal but this logical flow gets regenerated multiple >>>> times >>>> for each bgp_port_name (for each LRP address). In the end we'll >>>> only >>>> install one in the database so there's no impact on SB. But >>>> maybe we >>>> should consider moving this part out of this function. > > ack > >>>> >>>>> + >>>>> + ds_put_format(match, >>>>> + " && ip%d.src == %s && tcp.src == 179", >>>>> + ip_ver, >>>>> + s_addr); >>>>> + ovn_lflow_add(lflows, ls_peer->od, >>>>> S_SWITCH_IN_CHECK_PORT_SEC, 81, >>>>> + ds_cstr(match), >>>>> + REGBIT_PORT_SEC_DROP " = >>>>> check_in_port_sec(); next;", >>>>> + ls_peer->lflow_ref); >>>> In your system test we set 'addresses = "unknown"' for >>>> 'bgp_port_name'. >>>> And I guess we kind of have to do it like that all the time >>>> when using >>>> this feature. I'm trying to think of scenarios in which users >>>> set >>>> LSP.addresses = 'unknown' and LSP.port-security = '<some- >>>> addresses>'. >>>> I'm not sure I saw those until now but maybe I'm wrong. > > Is the concern here that the "bgp-port" with the "unknown" address is > going to receive all the unicast traffic for unknown MAC adresses as > well? I initially tried to add deny rules that would block any incoming > packets except those that match LRP's IP and BGP port, but it turned > out that the negative matches are kinda icky (for a match that has > something like `ip4.src != <IP_ADDR>` it generated hundreds of OF rules > in the ovs) > >>>> >>>>> +} >>>>> + >>>>> +static void >>>>> +build_lrouter_bgp_redirect( >>>>> + struct ovn_port *op, struct lflow_table *lflows, >>>>> + struct ds *match, struct ds *actions) >>>>> +{ >>>>> + /* LRP has to have a peer.*/ >>>>> + if (op->peer == NULL) { >>>>> + return; >>>>> + } >>>>> + /* LRP has to have NB record.*/ >>>>> + if (op->nbrp == NULL) { >>>>> + return; >>>>> + } >>>>> + >>>>> + /* Proceed only for LRPs that have 'bgp-redirect' option >>>>> set. Value of this >>>>> + * option is the name of LSP to which a BGP traffic will >>>>> be redirected. >>>>> + */ >>>>> + const char *bgp_port = smap_get(&op->nbrp->options, "bgp- >>>>> redirect"); >>>>> + if (bgp_port == NULL) { >>>>> + return; >>>>> + } >>>>> + >>>>> + /* Ensure that LSP, to which the BGP traffic is >>>>> redirected, exists.*/ >>>>> + struct ovn_port *peer_lsp; >>>>> + bool bgp_port_exists = false; >>>>> + HMAP_FOR_EACH (peer_lsp, dp_node, &op->peer->od->ports) { >>>>> + size_t peer_lsp_s = strlen(peer_lsp->key); >>>>> + if (peer_lsp_s == strlen(bgp_port) >>>>> + && !strncmp(peer_lsp->key, bgp_port, peer_lsp_s)){ >>>>> + bgp_port_exists = true; >>>>> + break; >>>>> + } >>>>> + } >>>>> + >>>>> + if (!bgp_port_exists) { >>>>> + static struct vlog_rate_limit rl = >>>>> VLOG_RATE_LIMIT_INIT(1, 5); >>>>> + VLOG_WARN_RL(&rl, "Option 'bgp-redirect' set on >>>>> Logical Router Port " >>>>> + "'%s' refers to non-existent Logical >>>>> Switch Port. " >>>>> + "BGP redirecting won't be >>>>> configured.", >>>>> + op->key); >>>>> + return; >>>>> + } >>>>> + >>>>> + if (op->cr_port != NULL) { >>>>> + static struct vlog_rate_limit rl = >>>>> VLOG_RATE_LIMIT_INIT(1, 5); >>>>> + VLOG_WARN_RL(&rl, "Option 'bgp-redirect' is not >>>>> supported on" >>>>> + " Distributed Gateway Port '%s'", >>>>> op->key); >>>>> + return; >>>>> + } >>>>> + >>>>> + /* Mirror traffic destined for LRP's IPs and default BGP >>>>> port >>>>> + * to the port defined in 'bgp-redirect' option. >>>>> + */ >>>>> + for (size_t i = 0; i < op->lrp_networks.n_ipv4_addrs; i++) >>>>> { >>>>> + const char *ip_s = op- >>>>>> lrp_networks.ipv4_addrs[i].addr_s; >>>>> + build_bgp_redirect_rule__(ip_s, bgp_port, false, op- >>>>>> peer, lflows, >>>>> + match, actions); >>>>> + } >>>>> + for (size_t i = 0; i < op->lrp_networks.n_ipv6_addrs; i++) >>>>> { >>>>> + const char *ip_s = op- >>>>>> lrp_networks.ipv6_addrs[i].addr_s; >>>>> + build_bgp_redirect_rule__(ip_s, bgp_port, true, op- >>>>>> peer, lflows, >>>>> + match, actions); >>>>> + } >>>>> +} >>>>> + >>>>> /* This function adds ARP resolve flows related to a LSP. */ >>>>> static void >>>>> build_arp_resolve_flows_for_lsp( >>>>> @@ -16003,6 +16110,7 @@ >>>>> build_lswitch_and_lrouter_iterate_by_lrp(struct ovn_port *op, >>>>> lsi->meter_groups, op- >>>>>> lflow_ref); >>>>> build_lrouter_icmp_packet_toobig_admin_flows(op, lsi- >>>>>> lflows, &lsi->match, >>>>> &lsi- >>>>>> actions, op->lflow_ref); >>>>> + build_lrouter_bgp_redirect(op, lsi->lflows, &lsi->match, >>>>> &lsi->actions); >>>>> } >>>>> >>>>> static void * >>>>> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml >>>>> index b06b09ac5..4cbdf8f74 100644 >>>>> --- a/northd/ovn-northd.8.xml >>>>> +++ b/northd/ovn-northd.8.xml >>>>> @@ -284,6 +284,21 @@ >>>>> dropped in the next stage. >>>>> </li> >>>>> >>>>> + <li> >>>>> + For each logical port that's defined as a target of >>>>> BGP redirecting >>>>> + (via <code>bgp-redirect</code> option set on Logical >>>>> Router Port), a >>>>> + filter is set in place that disallows any traffic >>>>> entering this port >>>>> + that does not originate from Logical Router Port's IPs >>>>> and default BGP >>>>> + port (TCP 179). This filtering is achieved by two >>>>> rules. First rule has >>>>> + priority 81, it matches on <code>inport == >>>>> <var>BGP_MIRROR_PORT</var> >>>>> + && ip.src == <var>LRP_IP</var> && >>>>> tcp.src == 179</code> >>>>> + and allows traffic further with >>>>> <code>REGBIT_PORT_SEC_DROP" = >>>>> + check_in_port_sec(); next;</code>. Second rule with >>>>> priority 80 matches >>>>> + the rest of the traffic from that port and sets >>>>> + <code>REGBIT_PORT_SEC_DROP" = 1; next;"</code> so that >>>>> the packets are >>>>> + dropped in the next stage. >>>>> + </li> >>>>> + >>>>> <li> >>>>> For each (enabled) vtep logical port, a priority 70 >>>>> flow is added which >>>>> matches on all packets and applies the action >>>>> @@ -1949,6 +1964,14 @@ output; >>>>> on the logical switch. >>>>> </li> >>>>> >>>>> + <li> >>>>> + For any logical port that's defined as a target of BGP >>>>> redirecting (via >>>>> + <code>bgp-redirect</code> option set on Logical Router >>>>> Port), a >>>>> + priority-100 flow is added that redirects traffic for >>>>> Logical Router >>>>> + Port IPs (including IPv6 LLA) and TCP port 179, to the >>>>> targeted >>>>> + logical port. >>>>> + </li> >>>>> + >>>>> <li> >>>>> Priority-90 flows for transit switches that forward >>>>> registered >>>>> IP multicast traffic to their corresponding multicast >>>>> group , which >>>>> diff --git a/ovn-nb.xml b/ovn-nb.xml >>>>> index 9552534f6..2d2f5c9e0 100644 >>>>> --- a/ovn-nb.xml >>>>> +++ b/ovn-nb.xml >>>>> @@ -3440,6 +3440,20 @@ or >>>>> </p> >>>>> </column> >>>>> >>>>> + <column name="options" key="bgp-redirect" type='{"type": >>>>> "string"}'> >>>>> + <p> >>>>> + This option expects a name of a Logical Switch Port >>>>> that's present >>>>> + in the peer's Logical Switch. If set, it causes any >>>>> traffic >>>>> + that's destined for Logical Router Port's IP >>>>> addresses (including >>>>> + its IPv6 LLA) and the default BGP port (TCP 179), to >>>>> be redirected >>>>> + to the specified Logical Switch Port. >>>>> + >>>>> + This allows external BGP daemon to be bound to a >>>>> port in OVN's >>>>> + Logical Switch and act as if it was listening on >>>>> Logical Router >>>>> + Port's IP address. >>>>> + </p> >>>>> + </column> >>>>> + >>>>> <column name="options" key="gateway_mtu_bypass"> >>>>> <p> >>>>> When configured, represents a match expression, in >>>>> the same >>>>> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at >>>>> index a389d1988..df1e924ab 100644 >>>>> --- a/tests/ovn-northd.at >>>>> +++ b/tests/ovn-northd.at >>>>> @@ -12721,3 +12721,61 @@ AT_CHECK([ovn-sbctl dump-flows lr | >>>>> grep lr_in_dnat | ovn_strip_lflows], [0], [d >>>>> >>>>> AT_CLEANUP >>>>> ]) >>>>> + >>>>> +OVN_FOR_EACH_NORTHD_NO_HV([ >>>>> +AT_SETUP([BGP control plane redirect]) >>>>> +ovn_start >>>>> + >>>>> +check ovn-sbctl chassis-add hv1 geneve 127.0.0.1 >>>>> + >>>>> +check ovn-nbctl lr-add lr -- \ >>>>> + lrp-add lr lr-ls 02:ac:10:01:00:01 172.16.1.1/24 >>>>> +check ovn-nbctl --wait=sb set logical_router lr >>>>> options:chassis=hv1 >>>>> + >>>>> +check ovn-nbctl ls-add ls -- \ >>>>> + lsp-add ls ls-lr -- \ >>>>> + lsp-set-type ls-lr router -- \ >>>>> + lsp-set-addresses ls-lr router -- \ >>>>> + lsp-set-options ls-lr router-port=lr-ls >>>>> + >>>>> +check ovn-nbctl lsp-add ls lsp-bgp -- \ >>>>> + lsp-set-addresses lsp-bgp unknown >>>>> + >>>>> +# Function that ensures that no BGP redirect ruels are >>>>> installed. >>>>> +check_no_bgp_redirect() { >>>>> + AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_l2_lkup | >>>>> grep "tcp.dst == 179" | wc -l], [0], [0 >>>>> +]) >>>>> + >>>>> + AT_CHECK([ovn-sbctl dump-flows ls | grep >>>>> ls_in_check_port_sec | grep -E "priority=80|priority=81" | wc - >>>>> l], [0], [0 >>>>> +]) >>>>> +} >>>>> + >>>>> +# By default, no rules related to BGP redirect are present >>>>> +check_no_bgp_redirect >>>>> + >>>>> +# Set "lsp-bgp" port as target of BGP control plane redirected >>>>> traffic >>>>> +check ovn-nbctl --wait=sb set logical_router_port lr-ls >>>>> options:bgp-redirect=lsp-bgp >>>>> + >>>>> +# Check that BGP control plane traffic is redirected to "lsp- >>>>> bgp" >>>>> +AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_l2_lkup | grep >>>>> "tcp.dst == 179" | ovn_strip_lflows], [0], [dnl >>>>> + table=??(ls_in_l2_lkup ), priority=100 , >>>>> match=(ip4.dst == 172.16.1.1 && tcp.dst == 179), >>>>> action=(outport = "lsp-bgp"; output;) >>>>> + table=??(ls_in_l2_lkup ), priority=100 , >>>>> match=(ip6.dst == fe80::ac:10ff:fe01:1 && tcp.dst == 179), >>>>> action=(outport = "lsp-bgp"; output;) >>>>> +]) >>>>> + >>>>> +# Check that only BGP-related traffic is accepted on "lsp-bgp" >>>>> port >>>>> +AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_check_port_sec >>>>> | grep -E "priority=80|priority=81" | ovn_strip_lflows], [0], >>>>> [dnl >>>>> + table=??(ls_in_check_port_sec), priority=80 , >>>>> match=(inport == "lsp-bgp"), action=(reg0[[15]] = 1; next;) >>>>> + table=??(ls_in_check_port_sec), priority=81 , >>>>> match=(inport == "lsp-bgp" && ip4.src == 172.16.1.1 && tcp.src >>>>> == 179), action=(reg0[[15]] = check_in_port_sec(); next;) >>>>> + table=??(ls_in_check_port_sec), priority=81 , >>>>> match=(inport == "lsp-bgp" && ip6.src == fe80::ac:10ff:fe01:1 >>>>> && tcp.src == 179), action=(reg0[[15]] = check_in_port_sec(); >>>>> next;) >>>>> +]) >>>>> + >>>>> +# Remove 'bgp-redirect' option from LRP and check that rules >>>>> are removed >>>>> +check ovn-nbctl --wait=sb remove logical_router_port lr-ls >>>>> options bgp-redirect >>>>> +check_no_bgp_redirect >>>>> + >>>>> +# Set non-existent LSP as target of 'bgp-redirect' and check >>>>> that no rules are added >>>>> +check ovn-nbctl --wait=sb set logical_router_port lr-ls >>>>> options:bgp-redirect=lsp-foo >>>>> +check_no_bgp_redirect >>>>> + >>>>> +AT_CLEANUP >>>>> +]) >>>>> diff --git a/tests/system-ovn.at b/tests/system-ovn.at >>>>> index c24ede7c5..c09c1136e 100644 >>>>> --- a/tests/system-ovn.at >>>>> +++ b/tests/system-ovn.at >>>>> @@ -13027,3 +13027,89 @@ OVS_TRAFFIC_VSWITCHD_STOP(["/failed to >>>>> query port patch-.*/d >>>>> /connection dropped.*/d"]) >>>>> AT_CLEANUP >>>>> ]) >>>>> + >>>>> +OVN_FOR_EACH_NORTHD([ >>>>> +AT_SETUP([BGP control plane redirect]) >>>> Nit: probably best to add AT_SKIP_IF([test $HAVE_NC = no]) here. > > ack > >>>> >>>>> +ovn_start >>>>> +OVS_TRAFFIC_VSWITCHD_START() >>>>> + >>>>> +ADD_BR([br-int]) >>>>> +ADD_BR([br-ext]) >>>>> + >>>>> +ovs-ofctl add-flow br-ext action=normal >>>>> +# Set external-ids in br-int needed for ovn-controller >>>>> +ovs-vsctl \ >>>>> + -- set Open_vSwitch . external-ids:system-id=hv1 \ >>>>> + -- set Open_vSwitch . external-ids:ovn- >>>>> remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \ >>>>> + -- set Open_vSwitch . external-ids:ovn-encap- >>>>> type=geneve \ >>>>> + -- set Open_vSwitch . external-ids:ovn-encap- >>>>> ip=169.0.0.1 \ >>>>> + -- set bridge br-int fail-mode=secure other- >>>>> config:disable-in-band=true >>>> Most ov*-ctl commands should be prefixed with "check" in this >>>> test. > > ack > >>>> >>>>> + >>>>> +# Start ovn-controller >>>>> +start_daemon ovn-controller >>>>> + >>>>> +ovn-nbctl lr-add R1 \ >>>>> + -- set Logical_Router R1 options:chassis=hv1 >>>>> + >>>>> +ovn-nbctl ls-add public >>>>> + >>>>> +ovn-nbctl lrp-add R1 rp-public 00:00:02:01:02:03 172.16.1.1/24 >>>>> + >>>>> +ovn-nbctl lsp-add public public-rp -- set Logical_Switch_Port >>>>> public-rp \ >>>>> + type=router options:router-port=rp-public \ >>>>> + -- lsp-set-addresses public-rp router >>>>> + >>>>> +ovn-nbctl lsp-add public bgp-daemon \ >>>>> + -- lsp-set-addresses bgp-daemon unknown >>>>> + >>>>> +AT_CHECK([ovs-vsctl set Open_vSwitch . external-ids:ovn- >>>>> bridge-mappings=phynet:br-ext]) >>>>> +ovn-nbctl lsp-add public public1 \ >>>>> + -- lsp-set-addresses public1 unknown \ >>>>> + -- lsp-set-type public1 localnet \ >>>>> + -- lsp-set-options public1 network_name=phynet >>>>> + >>>>> +ovn-nbctl --wait=hv sync >>>>> + >>>>> +# Set option that redirects BGP control plane traffic to a LSP >>>>> "bgp-daemon" >>>>> +ovn-nbctl --wait=sb set logical_router_port rp-public >>>>> options:bgp-redirect=bgp-daemon >>>>> + >>>>> +# Create "bgp-daemon" interface in a namespace with IP and MAC >>>>> matching LRP "rp-public" >>>>> +ADD_NAMESPACES(bgp-daemon) >>>>> +ADD_VETH(bgp-daemon, bgp-daemon, br-int, "172.16.1.1/24", >>>>> "00:00:02:01:02:03") >>>>> + >>>>> +ADD_NAMESPACES(ext-foo) >>>>> +ADD_VETH(ext-foo, ext-foo, br-ext, "172.16.1.100/24", >>>>> "00:10:10:01:02:13", \ >>>>> + "172.16.1.1") >>>>> + >>>>> +# Flip the interface down/up to get proper IPv6 LLA >>>>> +NS_EXEC([bgp-daemon], [ip link set down bgp-daemon]) >>>>> +NS_EXEC([bgp-daemon], [ip link set up bgp-daemon]) >>>>> + >>>>> +# Wait until IPv6 LLA loses the "tentative" flag otherwise it >>>>> can't be bound to. >>>>> +OVS_WAIT_UNTIL([NS_EXEC([bgp-daemon], [ip a show dev bgp- >>>>> daemon | grep "fe80::" | grep -v tentative])]) >>>>> +OVS_WAIT_UNTIL([NS_EXEC([ext-foo], [ip a show dev ext-foo | >>>>> grep "fe80::" | grep -v tentative])]) >>>>> + >>>>> +# Verify that BGP control plane traffic is delivered to the >>>>> "bgp-daemon" >>>>> +# interface on both IPv4 and IPv6 LLA addresses >>>>> +NETNS_DAEMONIZE([bgp-daemon], [nc -l -k 172.16.1.1 179], >>>>> [bgp_v4.pid]) >>>>> +NS_CHECK_EXEC([ext-foo], [echo "TCP test" | nc --send-only >>>>> 172.16.1.1 179]) >>>>> + >>>>> +NETNS_DAEMONIZE([bgp-daemon], [nc -l -6 -k >>>>> fe80::200:2ff:fe01:203%bgp-daemon 179], [bgp_v6.pid]) >>>>> +NS_CHECK_EXEC([ext-foo], [echo "TCP test" | nc --send-only -6 >>>>> fe80::200:2ff:fe01:203%ext-foo 179]) >>>>> + >>>>> +OVS_APP_EXIT_AND_WAIT([ovn-controller]) >>>>> + >>>>> +as ovn-sb >>>>> +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >>>>> + >>>>> +as ovn-nb >>>>> +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >>>>> + >>>>> +as northd >>>>> +OVS_APP_EXIT_AND_WAIT([ovn-northd]) >>>>> + >>>>> +as >>>>> +OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d >>>>> +/.*terminating with signal 15.*/d"]) >>>>> +AT_CLEANUP >>>>> +]) >>>> Regards, >>>> Dumitru >>>> >>>> _______________________________________________ >>>> dev mailing list >>>> dev@openvswitch.org >>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> > > Thanks, > Martin. >
On 07.08.2024 16:23, Dumitru Ceara wrote: > On 8/7/24 11:17,martin.kalcok@canonical.com wrote: >> Hi Dumitru and Vladislav, >> Thank you both for the review and the feedback >> >> On Wed, 2024-08-07 at 09:17 +0200, Dumitru Ceara wrote: >>> On 8/6/24 19:22, Vladislav Odintsov wrote: >>>> Hi Martin, Dumitru, >>>> >>> Hi Vladislav, >>> >>>> On 06.08.2024 19:23, Dumitru Ceara wrote: >>>>> Hi Martin, >>>>> >>>>> Sorry, I was reviewing V3 but I was slow in actually sending out >>>>> the email. >>>>> >>>>> On 8/6/24 13:17, Martin Kalcok wrote: >>>>>> This change adds a 'bgp-redirect' option to LRP that allows >>>>>> redirecting of BGP control plane traffic to an arbitrary LSP >>>>>> in its peer LS. >>>>>> >>>>>> The option expects a string with a LSP name. When set, >>>>>> any traffic entering LS that's destined for any of the >>>>>> LRP's IP addresses (including IPv6 LLA) is redirected >>>>>> to the LSP specified in the option's value. >>>>>> >>>>>> This enables external BGP daemons to listen on an interface >>>>>> bound to a LSP and effectively act as if they were listening >>>>>> on (and speaking from) LRP's IP address. >>>>>> >>>>>> Signed-off-by: Martin Kalcok<martin.kalcok@canonical.com> >>>>>> --- >>>>> Strictly on this version of the patch, and with the thought in >>>>> mind that >>>>> we might want to consider this feature experimental [0] and maybe >>>>> change >>>>> it (NB too) in the future, I left a few comments inline. With >>>>> those >>>>> addressed I think the patch looks OK to me. >>>>> >>>>> [0] >>>>> https://patchwork.ozlabs.org/project/ovn/patch/20240802155449.3528777-1-mmichels@redhat.com/ >>>>> >> I'm keeping an eye on this one and I think you made a good point >> Dumitru about the need for clear way to mark feature as experimental. >> I'll chime in that patch discussion as well. >> > Thanks! > >>>>> Now, for the point when we'd remove the "experimental" tag: >>>>> >>>>> In general, it makes me a bit uneasy that we have to share the >>>>> MAC and >>>>> IP between the LRP and the VIF of logical switch port that's >>>>> connected >>>>> to the same switch as the LRP. >> I think that in one way or another, the requirement for the interface >> to share IP/MAC with the LRP will remain. The reason for sharing them >> is to allow operating BGP unnumbered which uses IPv6 LLA instead of >> pre-configured ASN. >> > I think we don't need it with any of the alternative options below but > we can discuss those in detail after branching, in the 25.03 cycle. > >>>>> I was thinking of alternatives for the future and the only things >>>>> I >>>>> could come up with until now are: >>>>> >>>>> 1. Create a separate, "disconnected" logical switch: >>>>> ovn-nbctl ls-add bgp >>>>> >>>>> Add the bgp-daemon port to it: >>>>> ovn-nbctl lsp-add bgp bgp-daemon. >>>>> >>>>> Then we don't need "unknown" either, I think. >>>>> >>>>> But it's not possible today to move packets from the ing,ress >>>>> pipeline of >>>>> a logical switch ("public" in this test) to the egress pipeline >>>>> of a >>>>> different logical switch ("bgp" in my example). It also feels >>>>> icky to >>>>> implement that. >>>>> >>>>> 2. Add the ability to bind an OVS port directly to a logical >>>>> router port: >>>>> >>>>> then we could do the same type of redirect you do here but >>>>> instead in >>>>> the logical router pipeline. The advantage is we don't have to >>>>> drop any >>>>> non-bgp traffic in the switch pipeline. The switch keeps >>>>> functioning as >>>>> it does today. >> I like the second approach a lot. It would indeed feel more "in place" >> to have this logic in the LR, instead of LS. I didn't think this would >> be possible initially. >> > Ack, maybe we should come back to this discussion after branching. > >>>>> Maybe an advantage of this second alternative would be that we >>>>> can >>>>> easily attach a filtering option to this LRP (e.g., >>>>> LRP.options:control-traffic-filter) to allow us to be more >>>>> flexible in >>>>> what kind of traffic we forward to the actuall routing protocol >>>>> daemon >>>>> that runs behind that OVS port - Vladislav also mentioned during >>>>> the >>>>> meeting it might be interesting to forward BFD packets to the FRR >>>>> (or >>>>> whatever application implements the routing protocol) instance >>>>> too. >>>> @Martin, I'm again kindly asking about possible support for BFD >>>> redirect. Is it possible to incorporate these changes as an >>>> additional >>>> configuration know ("bfd-redirect") into your patch so we can start >>>> using this functionality internally and possibly give some feedback >>>> (hopefully positive)? >>>> >>>> It doesn't seem to be a very big change but looks very attractive >>>> for us. >>>> >>>> I can send a separate patch for this on top of your patch, but >>>> technically it won't be able to jump into the 24.09 because >>>> formally >>>> it's already a soft-freeze in progress. As an option I can send a >>>> patch, >>>> which can be squashed into this one. >>>> >>>> What do you think? >>> I didn't check with other maintainers (added them in CC now), but >>> given >>> that we agreed to try to accept the port redirecting patch as >>> experimental I think it's fine to expand it to allow BFD support too. >>> Posting a follow up patch on top of this one is fine from my >>> perspective. >> Indeed the change is not that big code-wise, but given that it would >> require some forethought about the implementation I was bit hesitant to >> include it in here. I didn't want to rock the boat too much, so to >> speak. However given that this is indeed going to be marked as >> experimental and if maintainers are OK with it, I'm happy to post v5 >> with this change included. Feedback from the real-world usage from >> Vladislav would be excellent thing to have when we'll move this feature >> from experimental to stable. Thanks for this, Martin! >> >>> However, what if we change the option to? >>> - NB.Logical_Router_Port.options:routing-protocol-redirect=<LSP-NAME> >>> - NB.Logical_Router_Port.options:routing- >>> protocols=<LIST_OF_PROTOCOLS> >>> >>> E.g.: >>> NB.Logical_Router_Port.options:routing-protocol-redirect="bgp-daemon" >>> NB.Logical_Router_Port.options:routing-protocols="BGP;BFD" >> These seem like a good option to me. > Martin, if it's not too much work could we try to add this to v5 then? > Vladislav, does that work for you too? I'd be okay with both versions, but I'm concerned about options naming. Technically BFD is not a "routing" protocol. Will next variants be more accurate? - NB.Logical_Router_Port.options:redirect-protocols-lport=<LSP-NAME> # (or "lsp" or "to"?) - NB.Logical_Router_Port.options:redirect-protocols=<LIST_OF_PROTOCOLS> E.g.: - NB.Logical_Router_Port.options:redirect-protocols-lport="bgp-daemon" - NB.Logical_Router_Port.options:redirect-protocols="BGP;BFD" If needed this can be extended later with "NB.Logical_Router_Port.options:redirect-protocols-match=<MATCH>". For example: "NB.Logical_Router_Port.options:redirect-protocols-match='ip.protocol == 89 || ip.dst == <lsp_ip> && udp.dst == 3784'" (OSPF + BFD). WDYT? > > Regards, > Dumitru > >>> Would this have a higher chance of becoming fully supported in the >>> next >>> release? >>> >>> Regards, >>> Dumitru >>> >>>>> But again, if we consider the current feature "experimental", we >>>>> can >>>>> spend more time during the next release cycle to figure out >>>>> what's best. >>>>> >>>>>> v4 of this patch renames the feature from "bgp-mirror" to "bgp- >>>>>> redirect" >>>>>> as discussed during community meeting. >>>>>> >>>>>> northd/northd.c | 108 >>>>>> ++++++++++++++++++++++++++++++++++++++++ >>>>>> northd/ovn-northd.8.xml | 23 +++++++++ >>>>>> ovn-nb.xml | 14 ++++++ >>>>>> tests/ovn-northd.at | 58 +++++++++++++++++++++ >>>>>> tests/system-ovn.at | 86 ++++++++++++++++++++++++++++++++ >>>>>> 5 files changed, 289 insertions(+) >>>>>> >>>>>> diff --git a/northd/northd.c b/northd/northd.c >>>>>> index 4353df07d..088104f25 100644 >>>>>> --- a/northd/northd.c >>>>>> +++ b/northd/northd.c >>>>>> @@ -13048,6 +13048,113 @@ >>>>>> build_arp_resolve_flows_for_lrp(struct ovn_port *op, >>>>>> } >>>>>> } >>>>>> >>>>>> +static void >>>>>> +build_bgp_redirect_rule__( >>>>>> + const char *s_addr, const char *bgp_port_name, bool >>>>>> is_ipv6, >>>>>> + struct ovn_port *ls_peer, struct lflow_table *lflows, >>>>>> + struct ds *match, struct ds *actions) >>>>>> +{ >>>>>> + int ip_ver = is_ipv6 ? 6 : 4; >>>>>> + /* Redirect packets in the input pipeline destined for >>>>>> LR's IP to >>>>>> + * the port specified in 'bgp-redirect' option. >>>>>> + */ >>>>>> + ds_clear(match); >>>>>> + ds_clear(actions); >>>>>> + ds_put_format(match, "ip%d.dst == %s && tcp.dst == 179", >>>>>> ip_ver, s_addr); >>>>>> + ds_put_format(actions, "outport = \"%s\"; output;", >>>>>> bgp_port_name); >>>>>> + ovn_lflow_add(lflows, ls_peer->od, S_SWITCH_IN_L2_LKUP, >>>>>> 100, >>>>>> + ds_cstr(match), >>>>>> + ds_cstr(actions), >>>>>> + ls_peer->lflow_ref); >>>>>> + >>>>>> + >>>>>> + /* Drop any traffic originating from 'bgp-redirect' port >>>>>> that does >>>>>> + * not originate from BGP daemon port. This blocks >>>>>> unnecessary >>>>>> + * traffic like ARP broadcasts or IPv6 router solicitation >>>>>> packets >>>>>> + * from the dummy 'bgp-redirect' port. >>>>>> + */ >>>>>> + ds_clear(match); >>>>>> + ds_put_format(match, "inport == \"%s\"", bgp_port_name); >>>>>> + ovn_lflow_add(lflows, ls_peer->od, >>>>>> S_SWITCH_IN_CHECK_PORT_SEC, 80, >>>>>> + ds_cstr(match), >>>>>> + REGBIT_PORT_SEC_DROP " = 1; next;", >>>>>> + ls_peer->lflow_ref); >>>>> Not a huge deal but this logical flow gets regenerated multiple >>>>> times >>>>> for each bgp_port_name (for each LRP address). In the end we'll >>>>> only >>>>> install one in the database so there's no impact on SB. But >>>>> maybe we >>>>> should consider moving this part out of this function. >> ack >> >>>>>> + >>>>>> + ds_put_format(match, >>>>>> + " && ip%d.src == %s && tcp.src == 179", >>>>>> + ip_ver, >>>>>> + s_addr); >>>>>> + ovn_lflow_add(lflows, ls_peer->od, >>>>>> S_SWITCH_IN_CHECK_PORT_SEC, 81, >>>>>> + ds_cstr(match), >>>>>> + REGBIT_PORT_SEC_DROP " = >>>>>> check_in_port_sec(); next;", >>>>>> + ls_peer->lflow_ref); >>>>> In your system test we set 'addresses = "unknown"' for >>>>> 'bgp_port_name'. >>>>> And I guess we kind of have to do it like that all the time >>>>> when using >>>>> this feature. I'm trying to think of scenarios in which users >>>>> set >>>>> LSP.addresses = 'unknown' and LSP.port-security = '<some- >>>>> addresses>'. >>>>> I'm not sure I saw those until now but maybe I'm wrong. >> Is the concern here that the "bgp-port" with the "unknown" address is >> going to receive all the unicast traffic for unknown MAC adresses as >> well? I initially tried to add deny rules that would block any incoming >> packets except those that match LRP's IP and BGP port, but it turned >> out that the negative matches are kinda icky (for a match that has >> something like `ip4.src != <IP_ADDR>` it generated hundreds of OF rules >> in the ovs) >> >>>>>> +} >>>>>> + >>>>>> +static void >>>>>> +build_lrouter_bgp_redirect( >>>>>> + struct ovn_port *op, struct lflow_table *lflows, >>>>>> + struct ds *match, struct ds *actions) >>>>>> +{ >>>>>> + /* LRP has to have a peer.*/ >>>>>> + if (op->peer == NULL) { >>>>>> + return; >>>>>> + } >>>>>> + /* LRP has to have NB record.*/ >>>>>> + if (op->nbrp == NULL) { >>>>>> + return; >>>>>> + } >>>>>> + >>>>>> + /* Proceed only for LRPs that have 'bgp-redirect' option >>>>>> set. Value of this >>>>>> + * option is the name of LSP to which a BGP traffic will >>>>>> be redirected. >>>>>> + */ >>>>>> + const char *bgp_port = smap_get(&op->nbrp->options, "bgp- >>>>>> redirect"); >>>>>> + if (bgp_port == NULL) { >>>>>> + return; >>>>>> + } >>>>>> + >>>>>> + /* Ensure that LSP, to which the BGP traffic is >>>>>> redirected, exists.*/ >>>>>> + struct ovn_port *peer_lsp; >>>>>> + bool bgp_port_exists = false; >>>>>> + HMAP_FOR_EACH (peer_lsp, dp_node, &op->peer->od->ports) { >>>>>> + size_t peer_lsp_s = strlen(peer_lsp->key); >>>>>> + if (peer_lsp_s == strlen(bgp_port) >>>>>> + && !strncmp(peer_lsp->key, bgp_port, peer_lsp_s)){ >>>>>> + bgp_port_exists = true; >>>>>> + break; >>>>>> + } >>>>>> + } >>>>>> + >>>>>> + if (!bgp_port_exists) { >>>>>> + static struct vlog_rate_limit rl = >>>>>> VLOG_RATE_LIMIT_INIT(1, 5); >>>>>> + VLOG_WARN_RL(&rl, "Option 'bgp-redirect' set on >>>>>> Logical Router Port " >>>>>> + "'%s' refers to non-existent Logical >>>>>> Switch Port. " >>>>>> + "BGP redirecting won't be >>>>>> configured.", >>>>>> + op->key); >>>>>> + return; >>>>>> + } >>>>>> + >>>>>> + if (op->cr_port != NULL) { >>>>>> + static struct vlog_rate_limit rl = >>>>>> VLOG_RATE_LIMIT_INIT(1, 5); >>>>>> + VLOG_WARN_RL(&rl, "Option 'bgp-redirect' is not >>>>>> supported on" >>>>>> + " Distributed Gateway Port '%s'", >>>>>> op->key); >>>>>> + return; >>>>>> + } >>>>>> + >>>>>> + /* Mirror traffic destined for LRP's IPs and default BGP >>>>>> port >>>>>> + * to the port defined in 'bgp-redirect' option. >>>>>> + */ >>>>>> + for (size_t i = 0; i < op->lrp_networks.n_ipv4_addrs; i++) >>>>>> { >>>>>> + const char *ip_s = op- >>>>>>> lrp_networks.ipv4_addrs[i].addr_s; >>>>>> + build_bgp_redirect_rule__(ip_s, bgp_port, false, op- >>>>>>> peer, lflows, >>>>>> + match, actions); >>>>>> + } >>>>>> + for (size_t i = 0; i < op->lrp_networks.n_ipv6_addrs; i++) >>>>>> { >>>>>> + const char *ip_s = op- >>>>>>> lrp_networks.ipv6_addrs[i].addr_s; >>>>>> + build_bgp_redirect_rule__(ip_s, bgp_port, true, op- >>>>>>> peer, lflows, >>>>>> + match, actions); >>>>>> + } >>>>>> +} >>>>>> + >>>>>> /* This function adds ARP resolve flows related to a LSP. */ >>>>>> static void >>>>>> build_arp_resolve_flows_for_lsp( >>>>>> @@ -16003,6 +16110,7 @@ >>>>>> build_lswitch_and_lrouter_iterate_by_lrp(struct ovn_port *op, >>>>>> lsi->meter_groups, op- >>>>>>> lflow_ref); >>>>>> build_lrouter_icmp_packet_toobig_admin_flows(op, lsi- >>>>>>> lflows, &lsi->match, >>>>>> &lsi- >>>>>>> actions, op->lflow_ref); >>>>>> + build_lrouter_bgp_redirect(op, lsi->lflows, &lsi->match, >>>>>> &lsi->actions); >>>>>> } >>>>>> >>>>>> static void * >>>>>> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml >>>>>> index b06b09ac5..4cbdf8f74 100644 >>>>>> --- a/northd/ovn-northd.8.xml >>>>>> +++ b/northd/ovn-northd.8.xml >>>>>> @@ -284,6 +284,21 @@ >>>>>> dropped in the next stage. >>>>>> </li> >>>>>> >>>>>> + <li> >>>>>> + For each logical port that's defined as a target of >>>>>> BGP redirecting >>>>>> + (via <code>bgp-redirect</code> option set on Logical >>>>>> Router Port), a >>>>>> + filter is set in place that disallows any traffic >>>>>> entering this port >>>>>> + that does not originate from Logical Router Port's IPs >>>>>> and default BGP >>>>>> + port (TCP 179). This filtering is achieved by two >>>>>> rules. First rule has >>>>>> + priority 81, it matches on <code>inport == >>>>>> <var>BGP_MIRROR_PORT</var> >>>>>> + && ip.src == <var>LRP_IP</var> && >>>>>> tcp.src == 179</code> >>>>>> + and allows traffic further with >>>>>> <code>REGBIT_PORT_SEC_DROP" = >>>>>> + check_in_port_sec(); next;</code>. Second rule with >>>>>> priority 80 matches >>>>>> + the rest of the traffic from that port and sets >>>>>> + <code>REGBIT_PORT_SEC_DROP" = 1; next;"</code> so that >>>>>> the packets are >>>>>> + dropped in the next stage. >>>>>> + </li> >>>>>> + >>>>>> <li> >>>>>> For each (enabled) vtep logical port, a priority 70 >>>>>> flow is added which >>>>>> matches on all packets and applies the action >>>>>> @@ -1949,6 +1964,14 @@ output; >>>>>> on the logical switch. >>>>>> </li> >>>>>> >>>>>> + <li> >>>>>> + For any logical port that's defined as a target of BGP >>>>>> redirecting (via >>>>>> + <code>bgp-redirect</code> option set on Logical Router >>>>>> Port), a >>>>>> + priority-100 flow is added that redirects traffic for >>>>>> Logical Router >>>>>> + Port IPs (including IPv6 LLA) and TCP port 179, to the >>>>>> targeted >>>>>> + logical port. >>>>>> + </li> >>>>>> + >>>>>> <li> >>>>>> Priority-90 flows for transit switches that forward >>>>>> registered >>>>>> IP multicast traffic to their corresponding multicast >>>>>> group , which >>>>>> diff --git a/ovn-nb.xml b/ovn-nb.xml >>>>>> index 9552534f6..2d2f5c9e0 100644 >>>>>> --- a/ovn-nb.xml >>>>>> +++ b/ovn-nb.xml >>>>>> @@ -3440,6 +3440,20 @@ or >>>>>> </p> >>>>>> </column> >>>>>> >>>>>> + <column name="options" key="bgp-redirect" type='{"type": >>>>>> "string"}'> >>>>>> + <p> >>>>>> + This option expects a name of a Logical Switch Port >>>>>> that's present >>>>>> + in the peer's Logical Switch. If set, it causes any >>>>>> traffic >>>>>> + that's destined for Logical Router Port's IP >>>>>> addresses (including >>>>>> + its IPv6 LLA) and the default BGP port (TCP 179), to >>>>>> be redirected >>>>>> + to the specified Logical Switch Port. >>>>>> + >>>>>> + This allows external BGP daemon to be bound to a >>>>>> port in OVN's >>>>>> + Logical Switch and act as if it was listening on >>>>>> Logical Router >>>>>> + Port's IP address. >>>>>> + </p> >>>>>> + </column> >>>>>> + >>>>>> <column name="options" key="gateway_mtu_bypass"> >>>>>> <p> >>>>>> When configured, represents a match expression, in >>>>>> the same >>>>>> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at >>>>>> index a389d1988..df1e924ab 100644 >>>>>> --- a/tests/ovn-northd.at >>>>>> +++ b/tests/ovn-northd.at >>>>>> @@ -12721,3 +12721,61 @@ AT_CHECK([ovn-sbctl dump-flows lr | >>>>>> grep lr_in_dnat | ovn_strip_lflows], [0], [d >>>>>> >>>>>> AT_CLEANUP >>>>>> ]) >>>>>> + >>>>>> +OVN_FOR_EACH_NORTHD_NO_HV([ >>>>>> +AT_SETUP([BGP control plane redirect]) >>>>>> +ovn_start >>>>>> + >>>>>> +check ovn-sbctl chassis-add hv1 geneve 127.0.0.1 >>>>>> + >>>>>> +check ovn-nbctl lr-add lr -- \ >>>>>> + lrp-add lr lr-ls 02:ac:10:01:00:01 172.16.1.1/24 >>>>>> +check ovn-nbctl --wait=sb set logical_router lr >>>>>> options:chassis=hv1 >>>>>> + >>>>>> +check ovn-nbctl ls-add ls -- \ >>>>>> + lsp-add ls ls-lr -- \ >>>>>> + lsp-set-type ls-lr router -- \ >>>>>> + lsp-set-addresses ls-lr router -- \ >>>>>> + lsp-set-options ls-lr router-port=lr-ls >>>>>> + >>>>>> +check ovn-nbctl lsp-add ls lsp-bgp -- \ >>>>>> + lsp-set-addresses lsp-bgp unknown >>>>>> + >>>>>> +# Function that ensures that no BGP redirect ruels are >>>>>> installed. >>>>>> +check_no_bgp_redirect() { >>>>>> + AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_l2_lkup | >>>>>> grep "tcp.dst == 179" | wc -l], [0], [0 >>>>>> +]) >>>>>> + >>>>>> + AT_CHECK([ovn-sbctl dump-flows ls | grep >>>>>> ls_in_check_port_sec | grep -E "priority=80|priority=81" | wc - >>>>>> l], [0], [0 >>>>>> +]) >>>>>> +} >>>>>> + >>>>>> +# By default, no rules related to BGP redirect are present >>>>>> +check_no_bgp_redirect >>>>>> + >>>>>> +# Set "lsp-bgp" port as target of BGP control plane redirected >>>>>> traffic >>>>>> +check ovn-nbctl --wait=sb set logical_router_port lr-ls >>>>>> options:bgp-redirect=lsp-bgp >>>>>> + >>>>>> +# Check that BGP control plane traffic is redirected to "lsp- >>>>>> bgp" >>>>>> +AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_l2_lkup | grep >>>>>> "tcp.dst == 179" | ovn_strip_lflows], [0], [dnl >>>>>> + table=??(ls_in_l2_lkup ), priority=100 , >>>>>> match=(ip4.dst == 172.16.1.1 && tcp.dst == 179), >>>>>> action=(outport = "lsp-bgp"; output;) >>>>>> + table=??(ls_in_l2_lkup ), priority=100 , >>>>>> match=(ip6.dst == fe80::ac:10ff:fe01:1 && tcp.dst == 179), >>>>>> action=(outport = "lsp-bgp"; output;) >>>>>> +]) >>>>>> + >>>>>> +# Check that only BGP-related traffic is accepted on "lsp-bgp" >>>>>> port >>>>>> +AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_check_port_sec >>>>>> | grep -E "priority=80|priority=81" | ovn_strip_lflows], [0], >>>>>> [dnl >>>>>> + table=??(ls_in_check_port_sec), priority=80 , >>>>>> match=(inport == "lsp-bgp"), action=(reg0[[15]] = 1; next;) >>>>>> + table=??(ls_in_check_port_sec), priority=81 , >>>>>> match=(inport == "lsp-bgp" && ip4.src == 172.16.1.1 && tcp.src >>>>>> == 179), action=(reg0[[15]] = check_in_port_sec(); next;) >>>>>> + table=??(ls_in_check_port_sec), priority=81 , >>>>>> match=(inport == "lsp-bgp" && ip6.src == fe80::ac:10ff:fe01:1 >>>>>> && tcp.src == 179), action=(reg0[[15]] = check_in_port_sec(); >>>>>> next;) >>>>>> +]) >>>>>> + >>>>>> +# Remove 'bgp-redirect' option from LRP and check that rules >>>>>> are removed >>>>>> +check ovn-nbctl --wait=sb remove logical_router_port lr-ls >>>>>> options bgp-redirect >>>>>> +check_no_bgp_redirect >>>>>> + >>>>>> +# Set non-existent LSP as target of 'bgp-redirect' and check >>>>>> that no rules are added >>>>>> +check ovn-nbctl --wait=sb set logical_router_port lr-ls >>>>>> options:bgp-redirect=lsp-foo >>>>>> +check_no_bgp_redirect >>>>>> + >>>>>> +AT_CLEANUP >>>>>> +]) >>>>>> diff --git a/tests/system-ovn.at b/tests/system-ovn.at >>>>>> index c24ede7c5..c09c1136e 100644 >>>>>> --- a/tests/system-ovn.at >>>>>> +++ b/tests/system-ovn.at >>>>>> @@ -13027,3 +13027,89 @@ OVS_TRAFFIC_VSWITCHD_STOP(["/failed to >>>>>> query port patch-.*/d >>>>>> /connection dropped.*/d"]) >>>>>> AT_CLEANUP >>>>>> ]) >>>>>> + >>>>>> +OVN_FOR_EACH_NORTHD([ >>>>>> +AT_SETUP([BGP control plane redirect]) >>>>> Nit: probably best to add AT_SKIP_IF([test $HAVE_NC = no]) here. >> ack >> >>>>>> +ovn_start >>>>>> +OVS_TRAFFIC_VSWITCHD_START() >>>>>> + >>>>>> +ADD_BR([br-int]) >>>>>> +ADD_BR([br-ext]) >>>>>> + >>>>>> +ovs-ofctl add-flow br-ext action=normal >>>>>> +# Set external-ids in br-int needed for ovn-controller >>>>>> +ovs-vsctl \ >>>>>> + -- set Open_vSwitch . external-ids:system-id=hv1 \ >>>>>> + -- set Open_vSwitch . external-ids:ovn- >>>>>> remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \ >>>>>> + -- set Open_vSwitch . external-ids:ovn-encap- >>>>>> type=geneve \ >>>>>> + -- set Open_vSwitch . external-ids:ovn-encap- >>>>>> ip=169.0.0.1 \ >>>>>> + -- set bridge br-int fail-mode=secure other- >>>>>> config:disable-in-band=true >>>>> Most ov*-ctl commands should be prefixed with "check" in this >>>>> test. >> ack >> >>>>>> + >>>>>> +# Start ovn-controller >>>>>> +start_daemon ovn-controller >>>>>> + >>>>>> +ovn-nbctl lr-add R1 \ >>>>>> + -- set Logical_Router R1 options:chassis=hv1 >>>>>> + >>>>>> +ovn-nbctl ls-add public >>>>>> + >>>>>> +ovn-nbctl lrp-add R1 rp-public 00:00:02:01:02:03 172.16.1.1/24 >>>>>> + >>>>>> +ovn-nbctl lsp-add public public-rp -- set Logical_Switch_Port >>>>>> public-rp \ >>>>>> + type=router options:router-port=rp-public \ >>>>>> + -- lsp-set-addresses public-rp router >>>>>> + >>>>>> +ovn-nbctl lsp-add public bgp-daemon \ >>>>>> + -- lsp-set-addresses bgp-daemon unknown >>>>>> + >>>>>> +AT_CHECK([ovs-vsctl set Open_vSwitch . external-ids:ovn- >>>>>> bridge-mappings=phynet:br-ext]) >>>>>> +ovn-nbctl lsp-add public public1 \ >>>>>> + -- lsp-set-addresses public1 unknown \ >>>>>> + -- lsp-set-type public1 localnet \ >>>>>> + -- lsp-set-options public1 network_name=phynet >>>>>> + >>>>>> +ovn-nbctl --wait=hv sync >>>>>> + >>>>>> +# Set option that redirects BGP control plane traffic to a LSP >>>>>> "bgp-daemon" >>>>>> +ovn-nbctl --wait=sb set logical_router_port rp-public >>>>>> options:bgp-redirect=bgp-daemon >>>>>> + >>>>>> +# Create "bgp-daemon" interface in a namespace with IP and MAC >>>>>> matching LRP "rp-public" >>>>>> +ADD_NAMESPACES(bgp-daemon) >>>>>> +ADD_VETH(bgp-daemon, bgp-daemon, br-int, "172.16.1.1/24", >>>>>> "00:00:02:01:02:03") >>>>>> + >>>>>> +ADD_NAMESPACES(ext-foo) >>>>>> +ADD_VETH(ext-foo, ext-foo, br-ext, "172.16.1.100/24", >>>>>> "00:10:10:01:02:13", \ >>>>>> + "172.16.1.1") >>>>>> + >>>>>> +# Flip the interface down/up to get proper IPv6 LLA >>>>>> +NS_EXEC([bgp-daemon], [ip link set down bgp-daemon]) >>>>>> +NS_EXEC([bgp-daemon], [ip link set up bgp-daemon]) >>>>>> + >>>>>> +# Wait until IPv6 LLA loses the "tentative" flag otherwise it >>>>>> can't be bound to. >>>>>> +OVS_WAIT_UNTIL([NS_EXEC([bgp-daemon], [ip a show dev bgp- >>>>>> daemon | grep "fe80::" | grep -v tentative])]) >>>>>> +OVS_WAIT_UNTIL([NS_EXEC([ext-foo], [ip a show dev ext-foo | >>>>>> grep "fe80::" | grep -v tentative])]) >>>>>> + >>>>>> +# Verify that BGP control plane traffic is delivered to the >>>>>> "bgp-daemon" >>>>>> +# interface on both IPv4 and IPv6 LLA addresses >>>>>> +NETNS_DAEMONIZE([bgp-daemon], [nc -l -k 172.16.1.1 179], >>>>>> [bgp_v4.pid]) >>>>>> +NS_CHECK_EXEC([ext-foo], [echo "TCP test" | nc --send-only >>>>>> 172.16.1.1 179]) >>>>>> + >>>>>> +NETNS_DAEMONIZE([bgp-daemon], [nc -l -6 -k >>>>>> fe80::200:2ff:fe01:203%bgp-daemon 179], [bgp_v6.pid]) >>>>>> +NS_CHECK_EXEC([ext-foo], [echo "TCP test" | nc --send-only -6 >>>>>> fe80::200:2ff:fe01:203%ext-foo 179]) >>>>>> + >>>>>> +OVS_APP_EXIT_AND_WAIT([ovn-controller]) >>>>>> + >>>>>> +as ovn-sb >>>>>> +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >>>>>> + >>>>>> +as ovn-nb >>>>>> +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >>>>>> + >>>>>> +as northd >>>>>> +OVS_APP_EXIT_AND_WAIT([ovn-northd]) >>>>>> + >>>>>> +as >>>>>> +OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d >>>>>> +/.*terminating with signal 15.*/d"]) >>>>>> +AT_CLEANUP >>>>>> +]) >>>>> Regards, >>>>> Dumitru >>>>> >>>>> _______________________________________________ >>>>> dev mailing list >>>>> dev@openvswitch.org >>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> Thanks, >> Martin. >>
Hi all, @Vladislav, you sent one more message earlier but you did not include the mailing list so it fell out of thread, I'll address your question here. You were asking: @Martin, didn't you look for ARP behavior? If we have one LS with three LSPs: - 1 router type LSP with MAC 00:00:00:00:00:01 and IP 169.254.0.1/30 - one "vif" LSP with "unknown" addresses set and configured underlying OVS port with same MAC and IP set - one localnet port for external connectivity. If some party from localnet side sends broadcast ARP request "Who has 169.254.0.1? Answer <...>", will this ARP request got flooded to both LRP and LSP with "unknown" addresses? If yes, will the sender receive 2 ARP replies? Answer: Yes the request will be received on the LSP as well, but only single reply will be received by the requester on the localnet side. We included rules that drop packets not originating from the BGP daemon port, this includes ARP and IPv6 ND. I double-checked now and when external machine sends ARP request for the LRP's IP, only single reply arrives. However, more generally I want to raise an issue that I realized today as I was rewriting this to support BFD. When I was originally implementing and testing this, I failed to account for the fact that the BGP daemon is not just a server, but also a client that needs to connect to its peer. And while the communication works well when the peer is connecting to our (OVN-side) BGP daemon, it fails when our daemon wants to initiate connection to its peer. The reason for this is that our daemon uses randomly selected port number as a client, so the replies from the peer are not destined for port 179 and redirect rules won't pick them up. (I want to add that this is NOT caused by the rules that I mentioned in the reply to Vladislav, I relaxed those to block only outgoing ARP and ND traffic.) I see a possible way forward in making this into a generic redirect feature that, when enabled, redirects all TCP (and UDP?) traffic destined for LRP to the selected LSP. I gave it a quick test and it does solve this issue of our BGP daemon reaching out to its peer. I realize that this is kind of a big change very close to the freeze, however it would solve the discussion around naming of the various LRP options :D I would be interested to hear your thoughts on this. Thanks, Martin. On Wed, 2024-08-07 at 23:26 +0700, Vladislav Odintsov wrote: > > > > > On 07.08.2024 16:23, Dumitru Ceara wrote: > > > > > On 8/7/24 11:17, martin.kalcok@canonical.com wrote: > > > > > > > > Hi Dumitru and Vladislav, > > > Thank you both for the review and the feedback > > > > > > On Wed, 2024-08-07 at 09:17 +0200, Dumitru Ceara wrote: > > > > > > > > > > > On 8/6/24 19:22, Vladislav Odintsov wrote: > > > > > > > > > > > > > > Hi Martin, Dumitru, > > > > > > > > > > > > > > > > > > Hi Vladislav, > > > > > > > > > > > > > > > > > > On 06.08.2024 19:23, Dumitru Ceara wrote: > > > > > > > > > > > > > > > > > Hi Martin, > > > > > > > > > > > > Sorry, I was reviewing V3 but I was slow in actually sending out > > > > > > the email. > > > > > > > > > > > > On 8/6/24 13:17, Martin Kalcok wrote: > > > > > > > > > > > > > > > > > > > > This change adds a 'bgp-redirect' option to LRP that allows > > > > > > > redirecting of BGP control plane traffic to an arbitrary LSP > > > > > > > in its peer LS. > > > > > > > > > > > > > > The option expects a string with a LSP name. When set, > > > > > > > any traffic entering LS that's destined for any of the > > > > > > > LRP's IP addresses (including IPv6 LLA) is redirected > > > > > > > to the LSP specified in the option's value. > > > > > > > > > > > > > > This enables external BGP daemons to listen on an interface > > > > > > > bound to a LSP and effectively act as if they were listening > > > > > > > on (and speaking from) LRP's IP address. > > > > > > > > > > > > > > Signed-off-by: Martin Kalcok <martin.kalcok@canonical.com> > > > > > > > --- > > > > > > > > > > > > > > > > > > > Strictly on this version of the patch, and with the thought in > > > > > > mind that > > > > > > we might want to consider this feature experimental [0] and maybe > > > > > > change > > > > > > it (NB too) in the future, I left a few comments inline. With > > > > > > those > > > > > > addressed I think the patch looks OK to me. > > > > > > > > > > > > [0] > > > > > > https://patchwork.ozlabs.org/project/ovn/patch/20240802155449.3528777-1-mmichels@redhat.com/ > > > > > > > > > > > > > > > > > > > > > > > > > > > I'm keeping an eye on this one and I think you made a good point > > > Dumitru about the need for clear way to mark feature as experimental. > > > I'll chime in that patch discussion as well. > > > > > > > > > > Thanks! > > > > > > > > > > > > > > > > > > > > > > > > > > > > Now, for the point when we'd remove the "experimental" tag: > > > > > > > > > > > > In general, it makes me a bit uneasy that we have to share the > > > > > > MAC and > > > > > > IP between the LRP and the VIF of logical switch port that's > > > > > > connected > > > > > > to the same switch as the LRP. > > > > > > > > > > > > > > > > > > > > > I think that in one way or another, the requirement for the interface > > > to share IP/MAC with the LRP will remain. The reason for sharing them > > > is to allow operating BGP unnumbered which uses IPv6 LLA instead of > > > pre-configured ASN. > > > > > > > > > > I think we don't need it with any of the alternative options below but > > we can discuss those in detail after branching, in the 25.03 cycle. > > > > > > > > > > > > > > > > > > > > > > > > > > > > I was thinking of alternatives for the future and the only things > > > > > > I > > > > > > could come up with until now are: > > > > > > > > > > > > 1. Create a separate, "disconnected" logical switch: > > > > > > ovn-nbctl ls-add bgp > > > > > > > > > > > > Add the bgp-daemon port to it: > > > > > > ovn-nbctl lsp-add bgp bgp-daemon. > > > > > > > > > > > > Then we don't need "unknown" either, I think. > > > > > > > > > > > > But it's not possible today to move packets from the ing,ress > > > > > > pipeline of > > > > > > a logical switch ("public" in this test) to the egress pipeline > > > > > > of a > > > > > > different logical switch ("bgp" in my example). It also feels > > > > > > icky to > > > > > > implement that. > > > > > > > > > > > > 2. Add the ability to bind an OVS port directly to a logical > > > > > > router port: > > > > > > > > > > > > then we could do the same type of redirect you do here but > > > > > > instead in > > > > > > the logical router pipeline. The advantage is we don't have to > > > > > > drop any > > > > > > non-bgp traffic in the switch pipeline. The switch keeps > > > > > > functioning as > > > > > > it does today. > > > > > > > > > > > > > > > > > > > > > I like the second approach a lot. It would indeed feel more "in place" > > > to have this logic in the LR, instead of LS. I didn't think this would > > > be possible initially. > > > > > > > > > > Ack, maybe we should come back to this discussion after branching. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Maybe an advantage of this second alternative would be that we > > > > > > can > > > > > > easily attach a filtering option to this LRP (e.g., > > > > > > LRP.options:control-traffic-filter) to allow us to be more > > > > > > flexible in > > > > > > what kind of traffic we forward to the actuall routing protocol > > > > > > daemon > > > > > > that runs behind that OVS port - Vladislav also mentioned during > > > > > > the > > > > > > meeting it might be interesting to forward BFD packets to the FRR > > > > > > (or > > > > > > whatever application implements the routing protocol) instance > > > > > > too. > > > > > > > > > > > > > > > > @Martin, I'm again kindly asking about possible support for BFD > > > > > redirect. Is it possible to incorporate these changes as an > > > > > additional > > > > > configuration know ("bfd-redirect") into your patch so we can start > > > > > using this functionality internally and possibly give some feedback > > > > > (hopefully positive)? > > > > > > > > > > It doesn't seem to be a very big change but looks very attractive > > > > > for us. > > > > > > > > > > I can send a separate patch for this on top of your patch, but > > > > > technically it won't be able to jump into the 24.09 because > > > > > formally > > > > > it's already a soft-freeze in progress. As an option I can send a > > > > > patch, > > > > > which can be squashed into this one. > > > > > > > > > > What do you think? > > > > > > > > > > > > > I didn't check with other maintainers (added them in CC now), but > > > > given > > > > that we agreed to try to accept the port redirecting patch as > > > > experimental I think it's fine to expand it to allow BFD support too. > > > > Posting a follow up patch on top of this one is fine from my > > > > perspective. > > > > > > > > > > Indeed the change is not that big code-wise, but given that it would > > > require some forethought about the implementation I was bit hesitant to > > > include it in here. I didn't want to rock the boat too much, so to > > > speak. However given that this is indeed going to be marked as > > > experimental and if maintainers are OK with it, I'm happy to post v5 > > > with this change included. Feedback from the real-world usage from > > > Vladislav would be excellent thing to have when we'll move this feature > > > from experimental to stable. > > > > > > Thanks for this, Martin! > > > > > > > > > > > > > > > > > > > > However, what if we change the option to? > > > > - NB.Logical_Router_Port.options:routing-protocol-redirect=<LSP-NAME> > > > > - NB.Logical_Router_Port.options:routing- > > > > protocols=<LIST_OF_PROTOCOLS> > > > > > > > > E.g.: > > > > NB.Logical_Router_Port.options:routing-protocol-redirect="bgp-daemon" > > > > NB.Logical_Router_Port.options:routing-protocols="BGP;BFD" > > > > > > > > > > These seem like a good option to me. > > > > > > > Martin, if it's not too much work could we try to add this to v5 then? > > Vladislav, does that work for you too? > > > > I'd be okay with both versions, but I'm concerned about options > naming. Technically BFD is not a "routing" protocol. > > Will next variants be more accurate? > > - NB.Logical_Router_Port.options:redirect-protocols-lport=<LSP-NAME> # (or "lsp" or "to"?) > - NB.Logical_Router_Port.options:redirect-protocols=<LIST_OF_PROTOCOLS> > > E.g.: > - NB.Logical_Router_Port.options:redirect-protocols-lport="bgp-daemon" > - NB.Logical_Router_Port.options:redirect-protocols="BGP;BFD" > > If needed this can be extended later with "NB.Logical_Router_Port.options:redirect-protocols-match=<MATCH>". For example: "NB.Logical_Router_Port.options:redirect-protocols-match='ip.protocol == 89 || ip.dst == <lsp_ip> && udp.dst == 3784'" (OSPF + BFD). > > WDYT? > > > > > > > > Regards, > > Dumitru > > > > > > > > > > > > > > > Would this have a higher chance of becoming fully supported in the > > > > next > > > > release? > > > > > > > > Regards, > > > > Dumitru > > > > > > > > > > > > > > > > > > > > > > > > > But again, if we consider the current feature "experimental", we > > > > > > can > > > > > > spend more time during the next release cycle to figure out > > > > > > what's best. > > > > > > > > > > > > > > > > > > > > > > > > > > v4 of this patch renames the feature from "bgp-mirror" to "bgp- > > > > > > > redirect" > > > > > > > as discussed during community meeting. > > > > > > > > > > > > > > northd/northd.c | 108 > > > > > > > ++++++++++++++++++++++++++++++++++++++++ > > > > > > > northd/ovn-northd.8.xml | 23 +++++++++ > > > > > > > ovn-nb.xml | 14 ++++++ > > > > > > > tests/ovn-northd.at | 58 +++++++++++++++++++++ > > > > > > > tests/system-ovn.at | 86 ++++++++++++++++++++++++++++++++ > > > > > > > 5 files changed, 289 insertions(+) > > > > > > > > > > > > > > diff --git a/northd/northd.c b/northd/northd.c > > > > > > > index 4353df07d..088104f25 100644 > > > > > > > --- a/northd/northd.c > > > > > > > +++ b/northd/northd.c > > > > > > > @@ -13048,6 +13048,113 @@ > > > > > > > build_arp_resolve_flows_for_lrp(struct ovn_port *op, > > > > > > > } > > > > > > > } > > > > > > > > > > > > > > +static void > > > > > > > +build_bgp_redirect_rule__( > > > > > > > + const char *s_addr, const char *bgp_port_name, bool > > > > > > > is_ipv6, > > > > > > > + struct ovn_port *ls_peer, struct lflow_table *lflows, > > > > > > > + struct ds *match, struct ds *actions) > > > > > > > +{ > > > > > > > + int ip_ver = is_ipv6 ? 6 : 4; > > > > > > > + /* Redirect packets in the input pipeline destined for > > > > > > > LR's IP to > > > > > > > + * the port specified in 'bgp-redirect' option. > > > > > > > + */ > > > > > > > + ds_clear(match); > > > > > > > + ds_clear(actions); > > > > > > > + ds_put_format(match, "ip%d.dst == %s && tcp.dst == 179", > > > > > > > ip_ver, s_addr); > > > > > > > + ds_put_format(actions, "outport = \"%s\"; output;", > > > > > > > bgp_port_name); > > > > > > > + ovn_lflow_add(lflows, ls_peer->od, S_SWITCH_IN_L2_LKUP, > > > > > > > 100, > > > > > > > + ds_cstr(match), > > > > > > > + ds_cstr(actions), > > > > > > > + ls_peer->lflow_ref); > > > > > > > + > > > > > > > + > > > > > > > + /* Drop any traffic originating from 'bgp-redirect' port > > > > > > > that does > > > > > > > + * not originate from BGP daemon port. This blocks > > > > > > > unnecessary > > > > > > > + * traffic like ARP broadcasts or IPv6 router solicitation > > > > > > > packets > > > > > > > + * from the dummy 'bgp-redirect' port. > > > > > > > + */ > > > > > > > + ds_clear(match); > > > > > > > + ds_put_format(match, "inport == \"%s\"", bgp_port_name); > > > > > > > + ovn_lflow_add(lflows, ls_peer->od, > > > > > > > S_SWITCH_IN_CHECK_PORT_SEC, 80, > > > > > > > + ds_cstr(match), > > > > > > > + REGBIT_PORT_SEC_DROP " = 1; next;", > > > > > > > + ls_peer->lflow_ref); > > > > > > > > > > > > > > > > > > > Not a huge deal but this logical flow gets regenerated multiple > > > > > > times > > > > > > for each bgp_port_name (for each LRP address). In the end we'll > > > > > > only > > > > > > install one in the database so there's no impact on SB. But > > > > > > maybe we > > > > > > should consider moving this part out of this function. > > > > > > > > > > > > > > > > > > > > > ack > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > + > > > > > > > + ds_put_format(match, > > > > > > > + " && ip%d.src == %s && tcp.src == 179", > > > > > > > + ip_ver, > > > > > > > + s_addr); > > > > > > > + ovn_lflow_add(lflows, ls_peer->od, > > > > > > > S_SWITCH_IN_CHECK_PORT_SEC, 81, > > > > > > > + ds_cstr(match), > > > > > > > + REGBIT_PORT_SEC_DROP " = > > > > > > > check_in_port_sec(); next;", > > > > > > > + ls_peer->lflow_ref); > > > > > > > > > > > > > > > > > > > In your system test we set 'addresses = "unknown"' for > > > > > > 'bgp_port_name'. > > > > > > And I guess we kind of have to do it like that all the time > > > > > > when using > > > > > > this feature. I'm trying to think of scenarios in which users > > > > > > set > > > > > > LSP.addresses = 'unknown' and LSP.port-security = '<some- > > > > > > addresses>'. > > > > > > I'm not sure I saw those until now but maybe I'm wrong. > > > > > > > > > > > > > > > > > > > > > Is the concern here that the "bgp-port" with the "unknown" address is > > > going to receive all the unicast traffic for unknown MAC adresses as > > > well? I initially tried to add deny rules that would block any incoming > > > packets except those that match LRP's IP and BGP port, but it turned > > > out that the negative matches are kinda icky (for a match that has > > > something like `ip4.src != <IP_ADDR>` it generated hundreds of OF rules > > > in the ovs) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > +} > > > > > > > + > > > > > > > +static void > > > > > > > +build_lrouter_bgp_redirect( > > > > > > > + struct ovn_port *op, struct lflow_table *lflows, > > > > > > > + struct ds *match, struct ds *actions) > > > > > > > +{ > > > > > > > + /* LRP has to have a peer.*/ > > > > > > > + if (op->peer == NULL) { > > > > > > > + return; > > > > > > > + } > > > > > > > + /* LRP has to have NB record.*/ > > > > > > > + if (op->nbrp == NULL) { > > > > > > > + return; > > > > > > > + } > > > > > > > + > > > > > > > + /* Proceed only for LRPs that have 'bgp-redirect' option > > > > > > > set. Value of this > > > > > > > + * option is the name of LSP to which a BGP traffic will > > > > > > > be redirected. > > > > > > > + */ > > > > > > > + const char *bgp_port = smap_get(&op->nbrp->options, "bgp- > > > > > > > redirect"); > > > > > > > + if (bgp_port == NULL) { > > > > > > > + return; > > > > > > > + } > > > > > > > + > > > > > > > + /* Ensure that LSP, to which the BGP traffic is > > > > > > > redirected, exists.*/ > > > > > > > + struct ovn_port *peer_lsp; > > > > > > > + bool bgp_port_exists = false; > > > > > > > + HMAP_FOR_EACH (peer_lsp, dp_node, &op->peer->od->ports) { > > > > > > > + size_t peer_lsp_s = strlen(peer_lsp->key); > > > > > > > + if (peer_lsp_s == strlen(bgp_port) > > > > > > > + && !strncmp(peer_lsp->key, bgp_port, peer_lsp_s)){ > > > > > > > + bgp_port_exists = true; > > > > > > > + break; > > > > > > > + } > > > > > > > + } > > > > > > > + > > > > > > > + if (!bgp_port_exists) { > > > > > > > + static struct vlog_rate_limit rl = > > > > > > > VLOG_RATE_LIMIT_INIT(1, 5); > > > > > > > + VLOG_WARN_RL(&rl, "Option 'bgp-redirect' set on > > > > > > > Logical Router Port " > > > > > > > + "'%s' refers to non-existent Logical > > > > > > > Switch Port. " > > > > > > > + "BGP redirecting won't be > > > > > > > configured.", > > > > > > > + op->key); > > > > > > > + return; > > > > > > > + } > > > > > > > + > > > > > > > + if (op->cr_port != NULL) { > > > > > > > + static struct vlog_rate_limit rl = > > > > > > > VLOG_RATE_LIMIT_INIT(1, 5); > > > > > > > + VLOG_WARN_RL(&rl, "Option 'bgp-redirect' is not > > > > > > > supported on" > > > > > > > + " Distributed Gateway Port '%s'", > > > > > > > op->key); > > > > > > > + return; > > > > > > > + } > > > > > > > + > > > > > > > + /* Mirror traffic destined for LRP's IPs and default BGP > > > > > > > port > > > > > > > + * to the port defined in 'bgp-redirect' option. > > > > > > > + */ > > > > > > > + for (size_t i = 0; i < op->lrp_networks.n_ipv4_addrs; i++) > > > > > > > { > > > > > > > + const char *ip_s = op- > > > > > > > > > > > > > > > > > > > > > > > lrp_networks.ipv4_addrs[i].addr_s; > > > > > > > > > > > > > > > > > > > > > > + build_bgp_redirect_rule__(ip_s, bgp_port, false, op- > > > > > > > > > > > > > > > > > > > > > > > peer, lflows, > > > > > > > > > > > > > > > > > > > > > > + match, actions); > > > > > > > + } > > > > > > > + for (size_t i = 0; i < op->lrp_networks.n_ipv6_addrs; i++) > > > > > > > { > > > > > > > + const char *ip_s = op- > > > > > > > > > > > > > > > > > > > > > > > lrp_networks.ipv6_addrs[i].addr_s; > > > > > > > > > > > > > > > > > > > > > > + build_bgp_redirect_rule__(ip_s, bgp_port, true, op- > > > > > > > > > > > > > > > > > > > > > > > peer, lflows, > > > > > > > > > > > > > > > > > > > > > > + match, actions); > > > > > > > + } > > > > > > > +} > > > > > > > + > > > > > > > /* This function adds ARP resolve flows related to a LSP. */ > > > > > > > static void > > > > > > > build_arp_resolve_flows_for_lsp( > > > > > > > @@ -16003,6 +16110,7 @@ > > > > > > > build_lswitch_and_lrouter_iterate_by_lrp(struct ovn_port *op, > > > > > > > lsi->meter_groups, op- > > > > > > > > > > > > > > > > > > > > > > > lflow_ref); > > > > > > > > > > > > > > > > > > > > > > build_lrouter_icmp_packet_toobig_admin_flows(op, lsi- > > > > > > > > > > > > > > > > > > > > > > > lflows, &lsi->match, > > > > > > > > > > > > > > > > > > > > > > &lsi- > > > > > > > > > > > > > > > > > > > > > > > actions, op->lflow_ref); > > > > > > > > > > > > > > > > > > > > > > + build_lrouter_bgp_redirect(op, lsi->lflows, &lsi->match, > > > > > > > &lsi->actions); > > > > > > > } > > > > > > > > > > > > > > static void * > > > > > > > diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml > > > > > > > index b06b09ac5..4cbdf8f74 100644 > > > > > > > --- a/northd/ovn-northd.8.xml > > > > > > > +++ b/northd/ovn-northd.8.xml > > > > > > > @@ -284,6 +284,21 @@ > > > > > > > dropped in the next stage. > > > > > > > </li> > > > > > > > > > > > > > > + <li> > > > > > > > + For each logical port that's defined as a target of > > > > > > > BGP redirecting > > > > > > > + (via <code>bgp-redirect</code> option set on Logical > > > > > > > Router Port), a > > > > > > > + filter is set in place that disallows any traffic > > > > > > > entering this port > > > > > > > + that does not originate from Logical Router Port's IPs > > > > > > > and default BGP > > > > > > > + port (TCP 179). This filtering is achieved by two > > > > > > > rules. First rule has > > > > > > > + priority 81, it matches on <code>inport == > > > > > > > <var>BGP_MIRROR_PORT</var> > > > > > > > + && ip.src == <var>LRP_IP</var> && > > > > > > > tcp.src == 179</code> > > > > > > > + and allows traffic further with > > > > > > > <code>REGBIT_PORT_SEC_DROP" = > > > > > > > + check_in_port_sec(); next;</code>. Second rule with > > > > > > > priority 80 matches > > > > > > > + the rest of the traffic from that port and sets > > > > > > > + <code>REGBIT_PORT_SEC_DROP" = 1; next;"</code> so that > > > > > > > the packets are > > > > > > > + dropped in the next stage. > > > > > > > + </li> > > > > > > > + > > > > > > > <li> > > > > > > > For each (enabled) vtep logical port, a priority 70 > > > > > > > flow is added which > > > > > > > matches on all packets and applies the action > > > > > > > @@ -1949,6 +1964,14 @@ output; > > > > > > > on the logical switch. > > > > > > > </li> > > > > > > > > > > > > > > + <li> > > > > > > > + For any logical port that's defined as a target of BGP > > > > > > > redirecting (via > > > > > > > + <code>bgp-redirect</code> option set on Logical Router > > > > > > > Port), a > > > > > > > + priority-100 flow is added that redirects traffic for > > > > > > > Logical Router > > > > > > > + Port IPs (including IPv6 LLA) and TCP port 179, to the > > > > > > > targeted > > > > > > > + logical port. > > > > > > > + </li> > > > > > > > + > > > > > > > <li> > > > > > > > Priority-90 flows for transit switches that forward > > > > > > > registered > > > > > > > IP multicast traffic to their corresponding multicast > > > > > > > group , which > > > > > > > diff --git a/ovn-nb.xml b/ovn-nb.xml > > > > > > > index 9552534f6..2d2f5c9e0 100644 > > > > > > > --- a/ovn-nb.xml > > > > > > > +++ b/ovn-nb.xml > > > > > > > @@ -3440,6 +3440,20 @@ or > > > > > > > </p> > > > > > > > </column> > > > > > > > > > > > > > > + <column name="options" key="bgp-redirect" type='{"type": > > > > > > > "string"}'> > > > > > > > + <p> > > > > > > > + This option expects a name of a Logical Switch Port > > > > > > > that's present > > > > > > > + in the peer's Logical Switch. If set, it causes any > > > > > > > traffic > > > > > > > + that's destined for Logical Router Port's IP > > > > > > > addresses (including > > > > > > > + its IPv6 LLA) and the default BGP port (TCP 179), to > > > > > > > be redirected > > > > > > > + to the specified Logical Switch Port. > > > > > > > + > > > > > > > + This allows external BGP daemon to be bound to a > > > > > > > port in OVN's > > > > > > > + Logical Switch and act as if it was listening on > > > > > > > Logical Router > > > > > > > + Port's IP address. > > > > > > > + </p> > > > > > > > + </column> > > > > > > > + > > > > > > > <column name="options" key="gateway_mtu_bypass"> > > > > > > > <p> > > > > > > > When configured, represents a match expression, in > > > > > > > the same > > > > > > > diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at > > > > > > > index a389d1988..df1e924ab 100644 > > > > > > > --- a/tests/ovn-northd.at > > > > > > > +++ b/tests/ovn-northd.at > > > > > > > @@ -12721,3 +12721,61 @@ AT_CHECK([ovn-sbctl dump-flows lr | > > > > > > > grep lr_in_dnat | ovn_strip_lflows], [0], [d > > > > > > > > > > > > > > AT_CLEANUP > > > > > > > ]) > > > > > > > + > > > > > > > +OVN_FOR_EACH_NORTHD_NO_HV([ > > > > > > > +AT_SETUP([BGP control plane redirect]) > > > > > > > +ovn_start > > > > > > > + > > > > > > > +check ovn-sbctl chassis-add hv1 geneve 127.0.0.1 > > > > > > > + > > > > > > > +check ovn-nbctl lr-add lr -- \ > > > > > > > + lrp-add lr lr-ls 02:ac:10:01:00:01 172.16.1.1/24 > > > > > > > +check ovn-nbctl --wait=sb set logical_router lr > > > > > > > options:chassis=hv1 > > > > > > > + > > > > > > > +check ovn-nbctl ls-add ls -- \ > > > > > > > + lsp-add ls ls-lr -- \ > > > > > > > + lsp-set-type ls-lr router -- \ > > > > > > > + lsp-set-addresses ls-lr router -- \ > > > > > > > + lsp-set-options ls-lr router-port=lr-ls > > > > > > > + > > > > > > > +check ovn-nbctl lsp-add ls lsp-bgp -- \ > > > > > > > + lsp-set-addresses lsp-bgp unknown > > > > > > > + > > > > > > > +# Function that ensures that no BGP redirect ruels are > > > > > > > installed. > > > > > > > +check_no_bgp_redirect() { > > > > > > > + AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_l2_lkup | > > > > > > > grep "tcp.dst == 179" | wc -l], [0], [0 > > > > > > > +]) > > > > > > > + > > > > > > > + AT_CHECK([ovn-sbctl dump-flows ls | grep > > > > > > > ls_in_check_port_sec | grep -E "priority=80|priority=81" | wc - > > > > > > > l], [0], [0 > > > > > > > +]) > > > > > > > +} > > > > > > > + > > > > > > > +# By default, no rules related to BGP redirect are present > > > > > > > +check_no_bgp_redirect > > > > > > > + > > > > > > > +# Set "lsp-bgp" port as target of BGP control plane redirected > > > > > > > traffic > > > > > > > +check ovn-nbctl --wait=sb set logical_router_port lr-ls > > > > > > > options:bgp-redirect=lsp-bgp > > > > > > > + > > > > > > > +# Check that BGP control plane traffic is redirected to "lsp- > > > > > > > bgp" > > > > > > > +AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_l2_lkup | grep > > > > > > > "tcp.dst == 179" | ovn_strip_lflows], [0], [dnl > > > > > > > + table=??(ls_in_l2_lkup ), priority=100 , > > > > > > > match=(ip4.dst == 172.16.1.1 && tcp.dst == 179), > > > > > > > action=(outport = "lsp-bgp"; output;) > > > > > > > + table=??(ls_in_l2_lkup ), priority=100 , > > > > > > > match=(ip6.dst == fe80::ac:10ff:fe01:1 && tcp.dst == 179), > > > > > > > action=(outport = "lsp-bgp"; output;) > > > > > > > +]) > > > > > > > + > > > > > > > +# Check that only BGP-related traffic is accepted on "lsp-bgp" > > > > > > > port > > > > > > > +AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_check_port_sec > > > > > > > | grep -E "priority=80|priority=81" | ovn_strip_lflows], [0], > > > > > > > [dnl > > > > > > > + table=??(ls_in_check_port_sec), priority=80 , > > > > > > > match=(inport == "lsp-bgp"), action=(reg0[[15]] = 1; next;) > > > > > > > + table=??(ls_in_check_port_sec), priority=81 , > > > > > > > match=(inport == "lsp-bgp" && ip4.src == 172.16.1.1 && tcp.src > > > > > > > == 179), action=(reg0[[15]] = check_in_port_sec(); next;) > > > > > > > + table=??(ls_in_check_port_sec), priority=81 , > > > > > > > match=(inport == "lsp-bgp" && ip6.src == fe80::ac:10ff:fe01:1 > > > > > > > && tcp.src == 179), action=(reg0[[15]] = check_in_port_sec(); > > > > > > > next;) > > > > > > > +]) > > > > > > > + > > > > > > > +# Remove 'bgp-redirect' option from LRP and check that rules > > > > > > > are removed > > > > > > > +check ovn-nbctl --wait=sb remove logical_router_port lr-ls > > > > > > > options bgp-redirect > > > > > > > +check_no_bgp_redirect > > > > > > > + > > > > > > > +# Set non-existent LSP as target of 'bgp-redirect' and check > > > > > > > that no rules are added > > > > > > > +check ovn-nbctl --wait=sb set logical_router_port lr-ls > > > > > > > options:bgp-redirect=lsp-foo > > > > > > > +check_no_bgp_redirect > > > > > > > + > > > > > > > +AT_CLEANUP > > > > > > > +]) > > > > > > > diff --git a/tests/system-ovn.at b/tests/system-ovn.at > > > > > > > index c24ede7c5..c09c1136e 100644 > > > > > > > --- a/tests/system-ovn.at > > > > > > > +++ b/tests/system-ovn.at > > > > > > > @@ -13027,3 +13027,89 @@ OVS_TRAFFIC_VSWITCHD_STOP(["/failed to > > > > > > > query port patch-.*/d > > > > > > > /connection dropped.*/d"]) > > > > > > > AT_CLEANUP > > > > > > > ]) > > > > > > > + > > > > > > > +OVN_FOR_EACH_NORTHD([ > > > > > > > +AT_SETUP([BGP control plane redirect]) > > > > > > > > > > > > > > > > > > > Nit: probably best to add AT_SKIP_IF([test $HAVE_NC = no]) here. > > > > > > > > > > > > > > > > > > > > > ack > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > +ovn_start > > > > > > > +OVS_TRAFFIC_VSWITCHD_START() > > > > > > > + > > > > > > > +ADD_BR([br-int]) > > > > > > > +ADD_BR([br-ext]) > > > > > > > + > > > > > > > +ovs-ofctl add-flow br-ext action=normal > > > > > > > +# Set external-ids in br-int needed for ovn-controller > > > > > > > +ovs-vsctl \ > > > > > > > + -- set Open_vSwitch . external-ids:system-id=hv1 \ > > > > > > > + -- set Open_vSwitch . external-ids:ovn- > > > > > > > remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \ > > > > > > > + -- set Open_vSwitch . external-ids:ovn-encap- > > > > > > > type=geneve \ > > > > > > > + -- set Open_vSwitch . external-ids:ovn-encap- > > > > > > > ip=169.0.0.1 \ > > > > > > > + -- set bridge br-int fail-mode=secure other- > > > > > > > config:disable-in-band=true > > > > > > > > > > > > > > > > > > > Most ov*-ctl commands should be prefixed with "check" in this > > > > > > test. > > > > > > > > > > > > > > > > > > > > > ack > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > + > > > > > > > +# Start ovn-controller > > > > > > > +start_daemon ovn-controller > > > > > > > + > > > > > > > +ovn-nbctl lr-add R1 \ > > > > > > > + -- set Logical_Router R1 options:chassis=hv1 > > > > > > > + > > > > > > > +ovn-nbctl ls-add public > > > > > > > + > > > > > > > +ovn-nbctl lrp-add R1 rp-public 00:00:02:01:02:03 172.16.1.1/24 > > > > > > > + > > > > > > > +ovn-nbctl lsp-add public public-rp -- set Logical_Switch_Port > > > > > > > public-rp \ > > > > > > > + type=router options:router-port=rp-public \ > > > > > > > + -- lsp-set-addresses public-rp router > > > > > > > + > > > > > > > +ovn-nbctl lsp-add public bgp-daemon \ > > > > > > > + -- lsp-set-addresses bgp-daemon unknown > > > > > > > + > > > > > > > +AT_CHECK([ovs-vsctl set Open_vSwitch . external-ids:ovn- > > > > > > > bridge-mappings=phynet:br-ext]) > > > > > > > +ovn-nbctl lsp-add public public1 \ > > > > > > > + -- lsp-set-addresses public1 unknown \ > > > > > > > + -- lsp-set-type public1 localnet \ > > > > > > > + -- lsp-set-options public1 network_name=phynet > > > > > > > + > > > > > > > +ovn-nbctl --wait=hv sync > > > > > > > + > > > > > > > +# Set option that redirects BGP control plane traffic to a LSP > > > > > > > "bgp-daemon" > > > > > > > +ovn-nbctl --wait=sb set logical_router_port rp-public > > > > > > > options:bgp-redirect=bgp-daemon > > > > > > > + > > > > > > > +# Create "bgp-daemon" interface in a namespace with IP and MAC > > > > > > > matching LRP "rp-public" > > > > > > > +ADD_NAMESPACES(bgp-daemon) > > > > > > > +ADD_VETH(bgp-daemon, bgp-daemon, br-int, "172.16.1.1/24", > > > > > > > "00:00:02:01:02:03") > > > > > > > + > > > > > > > +ADD_NAMESPACES(ext-foo) > > > > > > > +ADD_VETH(ext-foo, ext-foo, br-ext, "172.16.1.100/24", > > > > > > > "00:10:10:01:02:13", \ > > > > > > > + "172.16.1.1") > > > > > > > + > > > > > > > +# Flip the interface down/up to get proper IPv6 LLA > > > > > > > +NS_EXEC([bgp-daemon], [ip link set down bgp-daemon]) > > > > > > > +NS_EXEC([bgp-daemon], [ip link set up bgp-daemon]) > > > > > > > + > > > > > > > +# Wait until IPv6 LLA loses the "tentative" flag otherwise it > > > > > > > can't be bound to. > > > > > > > +OVS_WAIT_UNTIL([NS_EXEC([bgp-daemon], [ip a show dev bgp- > > > > > > > daemon | grep "fe80::" | grep -v tentative])]) > > > > > > > +OVS_WAIT_UNTIL([NS_EXEC([ext-foo], [ip a show dev ext-foo | > > > > > > > grep "fe80::" | grep -v tentative])]) > > > > > > > + > > > > > > > +# Verify that BGP control plane traffic is delivered to the > > > > > > > "bgp-daemon" > > > > > > > +# interface on both IPv4 and IPv6 LLA addresses > > > > > > > +NETNS_DAEMONIZE([bgp-daemon], [nc -l -k 172.16.1.1 179], > > > > > > > [bgp_v4.pid]) > > > > > > > +NS_CHECK_EXEC([ext-foo], [echo "TCP test" | nc --send-only > > > > > > > 172.16.1.1 179]) > > > > > > > + > > > > > > > +NETNS_DAEMONIZE([bgp-daemon], [nc -l -6 -k > > > > > > > fe80::200:2ff:fe01:203%bgp-daemon 179], [bgp_v6.pid]) > > > > > > > +NS_CHECK_EXEC([ext-foo], [echo "TCP test" | nc --send-only -6 > > > > > > > fe80::200:2ff:fe01:203%ext-foo 179]) > > > > > > > + > > > > > > > +OVS_APP_EXIT_AND_WAIT([ovn-controller]) > > > > > > > + > > > > > > > +as ovn-sb > > > > > > > +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) > > > > > > > + > > > > > > > +as ovn-nb > > > > > > > +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) > > > > > > > + > > > > > > > +as northd > > > > > > > +OVS_APP_EXIT_AND_WAIT([ovn-northd]) > > > > > > > + > > > > > > > +as > > > > > > > +OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d > > > > > > > +/.*terminating with signal 15.*/d"]) > > > > > > > +AT_CLEANUP > > > > > > > +]) > > > > > > > > > > > > > > > > > > > Regards, > > > > > > Dumitru > > > > > > > > > > > > _______________________________________________ > > > > > > dev mailing list > > > > > > dev@openvswitch.org > > > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > > > > > > > > > > > > > > > > > > > > Thanks, > > > Martin. > > > > > > > > >
On 8/7/24 20:49, martin.kalcok@canonical.com wrote: > Hi all, > @Vladislav, you sent one more message earlier but you did not include > the mailing list so it fell out of thread, I'll address your question > here. You were asking: > > @Martin, didn't you look for ARP behavior? If we have one LS with three > LSPs: > - 1 router type LSP with MAC 00:00:00:00:00:01 and IP 169.254.0.1/30 > - one "vif" LSP with "unknown" addresses set and configured underlying > OVS port with same MAC and IP set > - one localnet port for external connectivity. > > If some party from localnet side sends broadcast ARP request "Who has > 169.254.0.1? Answer <...>", will this ARP request got flooded to both > LRP and LSP with "unknown" addresses? If yes, will the sender receive 2 > ARP replies? > > > Answer: Yes the request will be received on the LSP as well, but only > single reply will be received by the requester on the localnet side. We > included rules that drop packets not originating from the BGP daemon > port, this includes ARP and IPv6 ND. I double-checked now and when > external machine sends ARP request for the LRP's IP, only single reply > arrives. > > However, more generally I want to raise an issue that I realized today > as I was rewriting this to support BFD. When I was originally > implementing and testing this, I failed to account for the fact that the > BGP daemon is not just a server, but also a client that needs to connect > to its peer. And while the communication works well when the peer is > connecting to our (OVN-side) BGP daemon, it fails when our daemon wants > to initiate connection to its peer. The reason for this is that our > daemon uses randomly selected port number as a client, so the replies > from the peer are not destined for port 179 and redirect rules won't > pick them up. (I want to add that this is NOT caused by the rules that I But isn't the reply coming from TCP.src == 179? > mentioned in the reply to Vladislav, I relaxed those to block only > outgoing ARP and ND traffic.) Doesn't the VIF need to resolve ARP to reach external destinations? > I see a possible way forward in making this into a generic redirect > feature that, when enabled, redirects all TCP (and UDP?) traffic > destined for LRP to the selected LSP. I gave it a quick test and it does > solve this issue of our BGP daemon reaching out to its peer. I realize > that this is kind of a big change very close to the freeze, however it > would solve the discussion around naming of the various LRP options :D > > I would be interested to hear your thoughts on this. What about other types of traffic initiated by the VIF behind the selected LSP? Shouldn't we let the CMS implement the traffic filtering with ACLs then? E.g., # allow all replies to sessions initiated by the VIF from-lport, match: inport==bgp-daemon, action: allow-related, prio 1 # allow incoming BGP connections (if needed) to-lport: match: "tcp.dst == BGP", action: allow, prio 2 # allow BFD (if needed) to-lport: match: "udp.dst == BFD", action: allow, prio 2 # default drop anything else going into the VIF, # replies to VIF-initiated traffic are allowed due to the # from-lport ACL. to-lport: match: "1", action: drop, prio 1 However, while writing this I'm starting to think more and more that "alternative 2" below, allowing the LR port to be (partially) bound to a VIF (with a filter for some of the incoming control traffic - as Vladislav mentioned during the technical meeting and in the other email thread), makes more sense. Like that we _don't_ really need to: - share any MAC/IP address (there's only one LR port with those) - worry about ARPs and ND because if the VIF generates an ARP it looks exactly the same as the one generated by the LRP (we do need to forward all replies to the VIF too though) - worry about blocking any type of egress traffic from the VIF in general TL/DR: if a generic "redirect" option works for now in conjunction with ACLs on the "target" logical switch I think I'm OK with that (if it's "experimental"). But, at least for now, it seems to me the long term solution might be something else in the end. Regards, Dumitru > > Thanks, > Martin. > > On Wed, 2024-08-07 at 23:26 +0700, Vladislav Odintsov wrote: >> >> >> On 07.08.2024 16:23, Dumitru Ceara wrote: >> >>> On 8/7/24 11:17, martin.kalcok@canonical.com wrote: >>> >>>> Hi Dumitru and Vladislav, >>>> Thank you both for the review and the feedback >>>> >>>> On Wed, 2024-08-07 at 09:17 +0200, Dumitru Ceara wrote: >>>> >>>>> On 8/6/24 19:22, Vladislav Odintsov wrote: >>>>> >>>>>> Hi Martin, Dumitru, >>>>>> >>>>> Hi Vladislav, >>>>> >>>>> >>>>>> On 06.08.2024 19:23, Dumitru Ceara wrote: >>>>>> >>>>>>> Hi Martin, >>>>>>> >>>>>>> Sorry, I was reviewing V3 but I was slow in actually sending out >>>>>>> the email. >>>>>>> >>>>>>> On 8/6/24 13:17, Martin Kalcok wrote: >>>>>>> >>>>>>>> This change adds a 'bgp-redirect' option to LRP that allows >>>>>>>> redirecting of BGP control plane traffic to an arbitrary LSP >>>>>>>> in its peer LS. >>>>>>>> >>>>>>>> The option expects a string with a LSP name. When set, >>>>>>>> any traffic entering LS that's destined for any of the >>>>>>>> LRP's IP addresses (including IPv6 LLA) is redirected >>>>>>>> to the LSP specified in the option's value. >>>>>>>> >>>>>>>> This enables external BGP daemons to listen on an interface >>>>>>>> bound to a LSP and effectively act as if they were listening >>>>>>>> on (and speaking from) LRP's IP address. >>>>>>>> >>>>>>>> Signed-off-by: Martin Kalcok <martin.kalcok@canonical.com> >>>>>>>> --- >>>>>>> Strictly on this version of the patch, and with the thought in >>>>>>> mind that >>>>>>> we might want to consider this feature experimental [0] and maybe >>>>>>> change >>>>>>> it (NB too) in the future, I left a few comments inline. With >>>>>>> those >>>>>>> addressed I think the patch looks OK to me. >>>>>>> >>>>>>> [0] >>>>>>> https://patchwork.ozlabs.org/project/ovn/patch/20240802155449.3528777-1-mmichels@redhat.com/ >>>>>>> >>>> I'm keeping an eye on this one and I think you made a good point >>>> Dumitru about the need for clear way to mark feature as experimental. >>>> I'll chime in that patch discussion as well. >>>> >>> Thanks! >>> >>> >>>> >>>>> >>>>>> >>>>>>> Now, for the point when we'd remove the "experimental" tag: >>>>>>> >>>>>>> In general, it makes me a bit uneasy that we have to share the >>>>>>> MAC and >>>>>>> IP between the LRP and the VIF of logical switch port that's >>>>>>> connected >>>>>>> to the same switch as the LRP. >>>> I think that in one way or another, the requirement for the interface >>>> to share IP/MAC with the LRP will remain. The reason for sharing them >>>> is to allow operating BGP unnumbered which uses IPv6 LLA instead of >>>> pre-configured ASN. >>>> >>> I think we don't need it with any of the alternative options below but >>> we can discuss those in detail after branching, in the 25.03 cycle. >>> >>> >>>> >>>>> >>>>>> >>>>>>> I was thinking of alternatives for the future and the only things >>>>>>> I >>>>>>> could come up with until now are: >>>>>>> >>>>>>> 1. Create a separate, "disconnected" logical switch: >>>>>>> ovn-nbctl ls-add bgp >>>>>>> >>>>>>> Add the bgp-daemon port to it: >>>>>>> ovn-nbctl lsp-add bgp bgp-daemon. >>>>>>> >>>>>>> Then we don't need "unknown" either, I think. >>>>>>> >>>>>>> But it's not possible today to move packets from the ing,ress >>>>>>> pipeline of >>>>>>> a logical switch ("public" in this test) to the egress pipeline >>>>>>> of a >>>>>>> different logical switch ("bgp" in my example). It also feels >>>>>>> icky to >>>>>>> implement that. >>>>>>> >>>>>>> 2. Add the ability to bind an OVS port directly to a logical >>>>>>> router port: >>>>>>> >>>>>>> then we could do the same type of redirect you do here but >>>>>>> instead in >>>>>>> the logical router pipeline. The advantage is we don't have to >>>>>>> drop any >>>>>>> non-bgp traffic in the switch pipeline. The switch keeps >>>>>>> functioning as >>>>>>> it does today. >>>> I like the second approach a lot. It would indeed feel more "in place" >>>> to have this logic in the LR, instead of LS. I didn't think this would >>>> be possible initially. >>>> >>> Ack, maybe we should come back to this discussion after branching. >>> >>> >>>> >>>>> >>>>>> >>>>>>> Maybe an advantage of this second alternative would be that we >>>>>>> can >>>>>>> easily attach a filtering option to this LRP (e.g., >>>>>>> LRP.options:control-traffic-filter) to allow us to be more >>>>>>> flexible in >>>>>>> what kind of traffic we forward to the actuall routing protocol >>>>>>> daemon >>>>>>> that runs behind that OVS port - Vladislav also mentioned during >>>>>>> the >>>>>>> meeting it might be interesting to forward BFD packets to the FRR >>>>>>> (or >>>>>>> whatever application implements the routing protocol) instance >>>>>>> too. >>>>>> @Martin, I'm again kindly asking about possible support for BFD >>>>>> redirect. Is it possible to incorporate these changes as an >>>>>> additional >>>>>> configuration know ("bfd-redirect") into your patch so we can start >>>>>> using this functionality internally and possibly give some feedback >>>>>> (hopefully positive)? >>>>>> >>>>>> It doesn't seem to be a very big change but looks very attractive >>>>>> for us. >>>>>> >>>>>> I can send a separate patch for this on top of your patch, but >>>>>> technically it won't be able to jump into the 24.09 because >>>>>> formally >>>>>> it's already a soft-freeze in progress. As an option I can send a >>>>>> patch, >>>>>> which can be squashed into this one. >>>>>> >>>>>> What do you think? >>>>> I didn't check with other maintainers (added them in CC now), but >>>>> given >>>>> that we agreed to try to accept the port redirecting patch as >>>>> experimental I think it's fine to expand it to allow BFD support too. >>>>> Posting a follow up patch on top of this one is fine from my >>>>> perspective. >>>> Indeed the change is not that big code-wise, but given that it would >>>> require some forethought about the implementation I was bit hesitant to >>>> include it in here. I didn't want to rock the boat too much, so to >>>> speak. However given that this is indeed going to be marked as >>>> experimental and if maintainers are OK with it, I'm happy to post v5 >>>> with this change included. Feedback from the real-world usage from >>>> Vladislav would be excellent thing to have when we'll move this feature >>>> from experimental to stable. >> Thanks for this, Martin! >> >>> >>>> >>>> >>>>> However, what if we change the option to? >>>>> - NB.Logical_Router_Port.options:routing-protocol-redirect=<LSP-NAME> >>>>> - NB.Logical_Router_Port.options:routing- >>>>> protocols=<LIST_OF_PROTOCOLS> >>>>> >>>>> E.g.: >>>>> NB.Logical_Router_Port.options:routing-protocol-redirect="bgp-daemon" >>>>> NB.Logical_Router_Port.options:routing-protocols="BGP;BFD" >>>> These seem like a good option to me. >>> Martin, if it's not too much work could we try to add this to v5 then? >>> Vladislav, does that work for you too? >> >> I'd be okay with both versions, but I'm concerned about options >> naming. Technically BFD is not a "routing" protocol. >> >> Will next variants be more accurate? >> >> - NB.Logical_Router_Port.options:redirect-protocols-lport=<LSP-NAME> # (or "lsp" or "to"?) >> - NB.Logical_Router_Port.options:redirect-protocols=<LIST_OF_PROTOCOLS> >> >> E.g.: >> - NB.Logical_Router_Port.options:redirect-protocols-lport="bgp-daemon" >> - NB.Logical_Router_Port.options:redirect-protocols="BGP;BFD" >> >> If needed this can be extended later with "NB.Logical_Router_Port.options:redirect-protocols-match=<MATCH>". For example: "NB.Logical_Router_Port.options:redirect-protocols-match='ip.protocol == 89 || ip.dst == <lsp_ip> && udp.dst == 3784'" (OSPF + BFD). >> >> WDYT? >> >> >>> >>> Regards, >>> Dumitru >>> >>> >>>> >>>>> Would this have a higher chance of becoming fully supported in the >>>>> next >>>>> release? >>>>> >>>>> Regards, >>>>> Dumitru >>>>> >>>>> >>>>>> >>>>>>> But again, if we consider the current feature "experimental", we >>>>>>> can >>>>>>> spend more time during the next release cycle to figure out >>>>>>> what's best. >>>>>>> >>>>>>> >>>>>>>> v4 of this patch renames the feature from "bgp-mirror" to "bgp- >>>>>>>> redirect" >>>>>>>> as discussed during community meeting. >>>>>>>> >>>>>>>> northd/northd.c | 108 >>>>>>>> ++++++++++++++++++++++++++++++++++++++++ >>>>>>>> northd/ovn-northd.8.xml | 23 +++++++++ >>>>>>>> ovn-nb.xml | 14 ++++++ >>>>>>>> tests/ovn-northd.at | 58 +++++++++++++++++++++ >>>>>>>> tests/system-ovn.at | 86 ++++++++++++++++++++++++++++++++ >>>>>>>> 5 files changed, 289 insertions(+) >>>>>>>> >>>>>>>> diff --git a/northd/northd.c b/northd/northd.c >>>>>>>> index 4353df07d..088104f25 100644 >>>>>>>> --- a/northd/northd.c >>>>>>>> +++ b/northd/northd.c >>>>>>>> @@ -13048,6 +13048,113 @@ >>>>>>>> build_arp_resolve_flows_for_lrp(struct ovn_port *op, >>>>>>>> } >>>>>>>> } >>>>>>>> >>>>>>>> +static void >>>>>>>> +build_bgp_redirect_rule__( >>>>>>>> + const char *s_addr, const char *bgp_port_name, bool >>>>>>>> is_ipv6, >>>>>>>> + struct ovn_port *ls_peer, struct lflow_table *lflows, >>>>>>>> + struct ds *match, struct ds *actions) >>>>>>>> +{ >>>>>>>> + int ip_ver = is_ipv6 ? 6 : 4; >>>>>>>> + /* Redirect packets in the input pipeline destined for >>>>>>>> LR's IP to >>>>>>>> + * the port specified in 'bgp-redirect' option. >>>>>>>> + */ >>>>>>>> + ds_clear(match); >>>>>>>> + ds_clear(actions); >>>>>>>> + ds_put_format(match, "ip%d.dst == %s && tcp.dst == 179", >>>>>>>> ip_ver, s_addr); >>>>>>>> + ds_put_format(actions, "outport = \"%s\"; output;", >>>>>>>> bgp_port_name); >>>>>>>> + ovn_lflow_add(lflows, ls_peer->od, S_SWITCH_IN_L2_LKUP, >>>>>>>> 100, >>>>>>>> + ds_cstr(match), >>>>>>>> + ds_cstr(actions), >>>>>>>> + ls_peer->lflow_ref); >>>>>>>> + >>>>>>>> + >>>>>>>> + /* Drop any traffic originating from 'bgp-redirect' port >>>>>>>> that does >>>>>>>> + * not originate from BGP daemon port. This blocks >>>>>>>> unnecessary >>>>>>>> + * traffic like ARP broadcasts or IPv6 router solicitation >>>>>>>> packets >>>>>>>> + * from the dummy 'bgp-redirect' port. >>>>>>>> + */ >>>>>>>> + ds_clear(match); >>>>>>>> + ds_put_format(match, "inport == \"%s\"", bgp_port_name); >>>>>>>> + ovn_lflow_add(lflows, ls_peer->od, >>>>>>>> S_SWITCH_IN_CHECK_PORT_SEC, 80, >>>>>>>> + ds_cstr(match), >>>>>>>> + REGBIT_PORT_SEC_DROP " = 1; next;", >>>>>>>> + ls_peer->lflow_ref); >>>>>>> Not a huge deal but this logical flow gets regenerated multiple >>>>>>> times >>>>>>> for each bgp_port_name (for each LRP address). In the end we'll >>>>>>> only >>>>>>> install one in the database so there's no impact on SB. But >>>>>>> maybe we >>>>>>> should consider moving this part out of this function. >>>> ack >>>> >>>> >>>>> >>>>>> >>>>>>> >>>>>>>> + >>>>>>>> + ds_put_format(match, >>>>>>>> + " && ip%d.src == %s && tcp.src == 179", >>>>>>>> + ip_ver, >>>>>>>> + s_addr); >>>>>>>> + ovn_lflow_add(lflows, ls_peer->od, >>>>>>>> S_SWITCH_IN_CHECK_PORT_SEC, 81, >>>>>>>> + ds_cstr(match), >>>>>>>> + REGBIT_PORT_SEC_DROP " = >>>>>>>> check_in_port_sec(); next;", >>>>>>>> + ls_peer->lflow_ref); >>>>>>> In your system test we set 'addresses = "unknown"' for >>>>>>> 'bgp_port_name'. >>>>>>> And I guess we kind of have to do it like that all the time >>>>>>> when using >>>>>>> this feature. I'm trying to think of scenarios in which users >>>>>>> set >>>>>>> LSP.addresses = 'unknown' and LSP.port-security = '<some- >>>>>>> addresses>'. >>>>>>> I'm not sure I saw those until now but maybe I'm wrong. >>>> Is the concern here that the "bgp-port" with the "unknown" address is >>>> going to receive all the unicast traffic for unknown MAC adresses as >>>> well? I initially tried to add deny rules that would block any incoming >>>> packets except those that match LRP's IP and BGP port, but it turned >>>> out that the negative matches are kinda icky (for a match that has >>>> something like `ip4.src != <IP_ADDR>` it generated hundreds of OF rules >>>> in the ovs) >>>> >>>> >>>>> >>>>>> >>>>>>> >>>>>>>> +} >>>>>>>> + >>>>>>>> +static void >>>>>>>> +build_lrouter_bgp_redirect( >>>>>>>> + struct ovn_port *op, struct lflow_table *lflows, >>>>>>>> + struct ds *match, struct ds *actions) >>>>>>>> +{ >>>>>>>> + /* LRP has to have a peer.*/ >>>>>>>> + if (op->peer == NULL) { >>>>>>>> + return; >>>>>>>> + } >>>>>>>> + /* LRP has to have NB record.*/ >>>>>>>> + if (op->nbrp == NULL) { >>>>>>>> + return; >>>>>>>> + } >>>>>>>> + >>>>>>>> + /* Proceed only for LRPs that have 'bgp-redirect' option >>>>>>>> set. Value of this >>>>>>>> + * option is the name of LSP to which a BGP traffic will >>>>>>>> be redirected. >>>>>>>> + */ >>>>>>>> + const char *bgp_port = smap_get(&op->nbrp->options, "bgp- >>>>>>>> redirect"); >>>>>>>> + if (bgp_port == NULL) { >>>>>>>> + return; >>>>>>>> + } >>>>>>>> + >>>>>>>> + /* Ensure that LSP, to which the BGP traffic is >>>>>>>> redirected, exists.*/ >>>>>>>> + struct ovn_port *peer_lsp; >>>>>>>> + bool bgp_port_exists = false; >>>>>>>> + HMAP_FOR_EACH (peer_lsp, dp_node, &op->peer->od->ports) { >>>>>>>> + size_t peer_lsp_s = strlen(peer_lsp->key); >>>>>>>> + if (peer_lsp_s == strlen(bgp_port) >>>>>>>> + && !strncmp(peer_lsp->key, bgp_port, peer_lsp_s)){ >>>>>>>> + bgp_port_exists = true; >>>>>>>> + break; >>>>>>>> + } >>>>>>>> + } >>>>>>>> + >>>>>>>> + if (!bgp_port_exists) { >>>>>>>> + static struct vlog_rate_limit rl = >>>>>>>> VLOG_RATE_LIMIT_INIT(1, 5); >>>>>>>> + VLOG_WARN_RL(&rl, "Option 'bgp-redirect' set on >>>>>>>> Logical Router Port " >>>>>>>> + "'%s' refers to non-existent Logical >>>>>>>> Switch Port. " >>>>>>>> + "BGP redirecting won't be >>>>>>>> configured.", >>>>>>>> + op->key); >>>>>>>> + return; >>>>>>>> + } >>>>>>>> + >>>>>>>> + if (op->cr_port != NULL) { >>>>>>>> + static struct vlog_rate_limit rl = >>>>>>>> VLOG_RATE_LIMIT_INIT(1, 5); >>>>>>>> + VLOG_WARN_RL(&rl, "Option 'bgp-redirect' is not >>>>>>>> supported on" >>>>>>>> + " Distributed Gateway Port '%s'", >>>>>>>> op->key); >>>>>>>> + return; >>>>>>>> + } >>>>>>>> + >>>>>>>> + /* Mirror traffic destined for LRP's IPs and default BGP >>>>>>>> port >>>>>>>> + * to the port defined in 'bgp-redirect' option. >>>>>>>> + */ >>>>>>>> + for (size_t i = 0; i < op->lrp_networks.n_ipv4_addrs; i++) >>>>>>>> { >>>>>>>> + const char *ip_s = op- >>>>>>>> >>>>>>>>> lrp_networks.ipv4_addrs[i].addr_s; >>>>>>>> + build_bgp_redirect_rule__(ip_s, bgp_port, false, op- >>>>>>>> >>>>>>>>> peer, lflows, >>>>>>>> + match, actions); >>>>>>>> + } >>>>>>>> + for (size_t i = 0; i < op->lrp_networks.n_ipv6_addrs; i++) >>>>>>>> { >>>>>>>> + const char *ip_s = op- >>>>>>>> >>>>>>>>> lrp_networks.ipv6_addrs[i].addr_s; >>>>>>>> + build_bgp_redirect_rule__(ip_s, bgp_port, true, op- >>>>>>>> >>>>>>>>> peer, lflows, >>>>>>>> + match, actions); >>>>>>>> + } >>>>>>>> +} >>>>>>>> + >>>>>>>> /* This function adds ARP resolve flows related to a LSP. */ >>>>>>>> static void >>>>>>>> build_arp_resolve_flows_for_lsp( >>>>>>>> @@ -16003,6 +16110,7 @@ >>>>>>>> build_lswitch_and_lrouter_iterate_by_lrp(struct ovn_port *op, >>>>>>>> lsi->meter_groups, op- >>>>>>>> >>>>>>>>> lflow_ref); >>>>>>>> build_lrouter_icmp_packet_toobig_admin_flows(op, lsi- >>>>>>>> >>>>>>>>> lflows, &lsi->match, >>>>>>>> &lsi- >>>>>>>> >>>>>>>>> actions, op->lflow_ref); >>>>>>>> + build_lrouter_bgp_redirect(op, lsi->lflows, &lsi->match, >>>>>>>> &lsi->actions); >>>>>>>> } >>>>>>>> >>>>>>>> static void * >>>>>>>> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml >>>>>>>> index b06b09ac5..4cbdf8f74 100644 >>>>>>>> --- a/northd/ovn-northd.8.xml >>>>>>>> +++ b/northd/ovn-northd.8.xml >>>>>>>> @@ -284,6 +284,21 @@ >>>>>>>> dropped in the next stage. >>>>>>>> </li> >>>>>>>> >>>>>>>> + <li> >>>>>>>> + For each logical port that's defined as a target of >>>>>>>> BGP redirecting >>>>>>>> + (via <code>bgp-redirect</code> option set on Logical >>>>>>>> Router Port), a >>>>>>>> + filter is set in place that disallows any traffic >>>>>>>> entering this port >>>>>>>> + that does not originate from Logical Router Port's IPs >>>>>>>> and default BGP >>>>>>>> + port (TCP 179). This filtering is achieved by two >>>>>>>> rules. First rule has >>>>>>>> + priority 81, it matches on <code>inport == >>>>>>>> <var>BGP_MIRROR_PORT</var> >>>>>>>> + && ip.src == <var>LRP_IP</var> && >>>>>>>> tcp.src == 179</code> >>>>>>>> + and allows traffic further with >>>>>>>> <code>REGBIT_PORT_SEC_DROP" = >>>>>>>> + check_in_port_sec(); next;</code>. Second rule with >>>>>>>> priority 80 matches >>>>>>>> + the rest of the traffic from that port and sets >>>>>>>> + <code>REGBIT_PORT_SEC_DROP" = 1; next;"</code> so that >>>>>>>> the packets are >>>>>>>> + dropped in the next stage. >>>>>>>> + </li> >>>>>>>> + >>>>>>>> <li> >>>>>>>> For each (enabled) vtep logical port, a priority 70 >>>>>>>> flow is added which >>>>>>>> matches on all packets and applies the action >>>>>>>> @@ -1949,6 +1964,14 @@ output; >>>>>>>> on the logical switch. >>>>>>>> </li> >>>>>>>> >>>>>>>> + <li> >>>>>>>> + For any logical port that's defined as a target of BGP >>>>>>>> redirecting (via >>>>>>>> + <code>bgp-redirect</code> option set on Logical Router >>>>>>>> Port), a >>>>>>>> + priority-100 flow is added that redirects traffic for >>>>>>>> Logical Router >>>>>>>> + Port IPs (including IPv6 LLA) and TCP port 179, to the >>>>>>>> targeted >>>>>>>> + logical port. >>>>>>>> + </li> >>>>>>>> + >>>>>>>> <li> >>>>>>>> Priority-90 flows for transit switches that forward >>>>>>>> registered >>>>>>>> IP multicast traffic to their corresponding multicast >>>>>>>> group , which >>>>>>>> diff --git a/ovn-nb.xml b/ovn-nb.xml >>>>>>>> index 9552534f6..2d2f5c9e0 100644 >>>>>>>> --- a/ovn-nb.xml >>>>>>>> +++ b/ovn-nb.xml >>>>>>>> @@ -3440,6 +3440,20 @@ or >>>>>>>> </p> >>>>>>>> </column> >>>>>>>> >>>>>>>> + <column name="options" key="bgp-redirect" type='{"type": >>>>>>>> "string"}'> >>>>>>>> + <p> >>>>>>>> + This option expects a name of a Logical Switch Port >>>>>>>> that's present >>>>>>>> + in the peer's Logical Switch. If set, it causes any >>>>>>>> traffic >>>>>>>> + that's destined for Logical Router Port's IP >>>>>>>> addresses (including >>>>>>>> + its IPv6 LLA) and the default BGP port (TCP 179), to >>>>>>>> be redirected >>>>>>>> + to the specified Logical Switch Port. >>>>>>>> + >>>>>>>> + This allows external BGP daemon to be bound to a >>>>>>>> port in OVN's >>>>>>>> + Logical Switch and act as if it was listening on >>>>>>>> Logical Router >>>>>>>> + Port's IP address. >>>>>>>> + </p> >>>>>>>> + </column> >>>>>>>> + >>>>>>>> <column name="options" key="gateway_mtu_bypass"> >>>>>>>> <p> >>>>>>>> When configured, represents a match expression, in >>>>>>>> the same >>>>>>>> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at >>>>>>>> index a389d1988..df1e924ab 100644 >>>>>>>> --- a/tests/ovn-northd.at >>>>>>>> +++ b/tests/ovn-northd.at >>>>>>>> @@ -12721,3 +12721,61 @@ AT_CHECK([ovn-sbctl dump-flows lr | >>>>>>>> grep lr_in_dnat | ovn_strip_lflows], [0], [d >>>>>>>> >>>>>>>> AT_CLEANUP >>>>>>>> ]) >>>>>>>> + >>>>>>>> +OVN_FOR_EACH_NORTHD_NO_HV([ >>>>>>>> +AT_SETUP([BGP control plane redirect]) >>>>>>>> +ovn_start >>>>>>>> + >>>>>>>> +check ovn-sbctl chassis-add hv1 geneve 127.0.0.1 >>>>>>>> + >>>>>>>> +check ovn-nbctl lr-add lr -- \ >>>>>>>> + lrp-add lr lr-ls 02:ac:10:01:00:01 172.16.1.1/24 >>>>>>>> +check ovn-nbctl --wait=sb set logical_router lr >>>>>>>> options:chassis=hv1 >>>>>>>> + >>>>>>>> +check ovn-nbctl ls-add ls -- \ >>>>>>>> + lsp-add ls ls-lr -- \ >>>>>>>> + lsp-set-type ls-lr router -- \ >>>>>>>> + lsp-set-addresses ls-lr router -- \ >>>>>>>> + lsp-set-options ls-lr router-port=lr-ls >>>>>>>> + >>>>>>>> +check ovn-nbctl lsp-add ls lsp-bgp -- \ >>>>>>>> + lsp-set-addresses lsp-bgp unknown >>>>>>>> + >>>>>>>> +# Function that ensures that no BGP redirect ruels are >>>>>>>> installed. >>>>>>>> +check_no_bgp_redirect() { >>>>>>>> + AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_l2_lkup | >>>>>>>> grep "tcp.dst == 179" | wc -l], [0], [0 >>>>>>>> +]) >>>>>>>> + >>>>>>>> + AT_CHECK([ovn-sbctl dump-flows ls | grep >>>>>>>> ls_in_check_port_sec | grep -E "priority=80|priority=81" | wc - >>>>>>>> l], [0], [0 >>>>>>>> +]) >>>>>>>> +} >>>>>>>> + >>>>>>>> +# By default, no rules related to BGP redirect are present >>>>>>>> +check_no_bgp_redirect >>>>>>>> + >>>>>>>> +# Set "lsp-bgp" port as target of BGP control plane redirected >>>>>>>> traffic >>>>>>>> +check ovn-nbctl --wait=sb set logical_router_port lr-ls >>>>>>>> options:bgp-redirect=lsp-bgp >>>>>>>> + >>>>>>>> +# Check that BGP control plane traffic is redirected to "lsp- >>>>>>>> bgp" >>>>>>>> +AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_l2_lkup | grep >>>>>>>> "tcp.dst == 179" | ovn_strip_lflows], [0], [dnl >>>>>>>> + table=??(ls_in_l2_lkup ), priority=100 , >>>>>>>> match=(ip4.dst == 172.16.1.1 && tcp.dst == 179), >>>>>>>> action=(outport = "lsp-bgp"; output;) >>>>>>>> + table=??(ls_in_l2_lkup ), priority=100 , >>>>>>>> match=(ip6.dst == fe80::ac:10ff:fe01:1 && tcp.dst == 179), >>>>>>>> action=(outport = "lsp-bgp"; output;) >>>>>>>> +]) >>>>>>>> + >>>>>>>> +# Check that only BGP-related traffic is accepted on "lsp-bgp" >>>>>>>> port >>>>>>>> +AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_check_port_sec >>>>>>>> | grep -E "priority=80|priority=81" | ovn_strip_lflows], [0], >>>>>>>> [dnl >>>>>>>> + table=??(ls_in_check_port_sec), priority=80 , >>>>>>>> match=(inport == "lsp-bgp"), action=(reg0[[15]] = 1; next;) >>>>>>>> + table=??(ls_in_check_port_sec), priority=81 , >>>>>>>> match=(inport == "lsp-bgp" && ip4.src == 172.16.1.1 && tcp.src >>>>>>>> == 179), action=(reg0[[15]] = check_in_port_sec(); next;) >>>>>>>> + table=??(ls_in_check_port_sec), priority=81 , >>>>>>>> match=(inport == "lsp-bgp" && ip6.src == fe80::ac:10ff:fe01:1 >>>>>>>> && tcp.src == 179), action=(reg0[[15]] = check_in_port_sec(); >>>>>>>> next;) >>>>>>>> +]) >>>>>>>> + >>>>>>>> +# Remove 'bgp-redirect' option from LRP and check that rules >>>>>>>> are removed >>>>>>>> +check ovn-nbctl --wait=sb remove logical_router_port lr-ls >>>>>>>> options bgp-redirect >>>>>>>> +check_no_bgp_redirect >>>>>>>> + >>>>>>>> +# Set non-existent LSP as target of 'bgp-redirect' and check >>>>>>>> that no rules are added >>>>>>>> +check ovn-nbctl --wait=sb set logical_router_port lr-ls >>>>>>>> options:bgp-redirect=lsp-foo >>>>>>>> +check_no_bgp_redirect >>>>>>>> + >>>>>>>> +AT_CLEANUP >>>>>>>> +]) >>>>>>>> diff --git a/tests/system-ovn.at b/tests/system-ovn.at >>>>>>>> index c24ede7c5..c09c1136e 100644 >>>>>>>> --- a/tests/system-ovn.at >>>>>>>> +++ b/tests/system-ovn.at >>>>>>>> @@ -13027,3 +13027,89 @@ OVS_TRAFFIC_VSWITCHD_STOP(["/failed to >>>>>>>> query port patch-.*/d >>>>>>>> /connection dropped.*/d"]) >>>>>>>> AT_CLEANUP >>>>>>>> ]) >>>>>>>> + >>>>>>>> +OVN_FOR_EACH_NORTHD([ >>>>>>>> +AT_SETUP([BGP control plane redirect]) >>>>>>> Nit: probably best to add AT_SKIP_IF([test $HAVE_NC = no]) here. >>>> ack >>>> >>>> >>>>> >>>>>> >>>>>>> >>>>>>>> +ovn_start >>>>>>>> +OVS_TRAFFIC_VSWITCHD_START() >>>>>>>> + >>>>>>>> +ADD_BR([br-int]) >>>>>>>> +ADD_BR([br-ext]) >>>>>>>> + >>>>>>>> +ovs-ofctl add-flow br-ext action=normal >>>>>>>> +# Set external-ids in br-int needed for ovn-controller >>>>>>>> +ovs-vsctl \ >>>>>>>> + -- set Open_vSwitch . external-ids:system-id=hv1 \ >>>>>>>> + -- set Open_vSwitch . external-ids:ovn- >>>>>>>> remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \ >>>>>>>> + -- set Open_vSwitch . external-ids:ovn-encap- >>>>>>>> type=geneve \ >>>>>>>> + -- set Open_vSwitch . external-ids:ovn-encap- >>>>>>>> ip=169.0.0.1 \ >>>>>>>> + -- set bridge br-int fail-mode=secure other- >>>>>>>> config:disable-in-band=true >>>>>>> Most ov*-ctl commands should be prefixed with "check" in this >>>>>>> test. >>>> ack >>>> >>>> >>>>> >>>>>> >>>>>>> >>>>>>>> + >>>>>>>> +# Start ovn-controller >>>>>>>> +start_daemon ovn-controller >>>>>>>> + >>>>>>>> +ovn-nbctl lr-add R1 \ >>>>>>>> + -- set Logical_Router R1 options:chassis=hv1 >>>>>>>> + >>>>>>>> +ovn-nbctl ls-add public >>>>>>>> + >>>>>>>> +ovn-nbctl lrp-add R1 rp-public 00:00:02:01:02:03 172.16.1.1/24 >>>>>>>> + >>>>>>>> +ovn-nbctl lsp-add public public-rp -- set Logical_Switch_Port >>>>>>>> public-rp \ >>>>>>>> + type=router options:router-port=rp-public \ >>>>>>>> + -- lsp-set-addresses public-rp router >>>>>>>> + >>>>>>>> +ovn-nbctl lsp-add public bgp-daemon \ >>>>>>>> + -- lsp-set-addresses bgp-daemon unknown >>>>>>>> + >>>>>>>> +AT_CHECK([ovs-vsctl set Open_vSwitch . external-ids:ovn- >>>>>>>> bridge-mappings=phynet:br-ext]) >>>>>>>> +ovn-nbctl lsp-add public public1 \ >>>>>>>> + -- lsp-set-addresses public1 unknown \ >>>>>>>> + -- lsp-set-type public1 localnet \ >>>>>>>> + -- lsp-set-options public1 network_name=phynet >>>>>>>> + >>>>>>>> +ovn-nbctl --wait=hv sync >>>>>>>> + >>>>>>>> +# Set option that redirects BGP control plane traffic to a LSP >>>>>>>> "bgp-daemon" >>>>>>>> +ovn-nbctl --wait=sb set logical_router_port rp-public >>>>>>>> options:bgp-redirect=bgp-daemon >>>>>>>> + >>>>>>>> +# Create "bgp-daemon" interface in a namespace with IP and MAC >>>>>>>> matching LRP "rp-public" >>>>>>>> +ADD_NAMESPACES(bgp-daemon) >>>>>>>> +ADD_VETH(bgp-daemon, bgp-daemon, br-int, "172.16.1.1/24", >>>>>>>> "00:00:02:01:02:03") >>>>>>>> + >>>>>>>> +ADD_NAMESPACES(ext-foo) >>>>>>>> +ADD_VETH(ext-foo, ext-foo, br-ext, "172.16.1.100/24", >>>>>>>> "00:10:10:01:02:13", \ >>>>>>>> + "172.16.1.1") >>>>>>>> + >>>>>>>> +# Flip the interface down/up to get proper IPv6 LLA >>>>>>>> +NS_EXEC([bgp-daemon], [ip link set down bgp-daemon]) >>>>>>>> +NS_EXEC([bgp-daemon], [ip link set up bgp-daemon]) >>>>>>>> + >>>>>>>> +# Wait until IPv6 LLA loses the "tentative" flag otherwise it >>>>>>>> can't be bound to. >>>>>>>> +OVS_WAIT_UNTIL([NS_EXEC([bgp-daemon], [ip a show dev bgp- >>>>>>>> daemon | grep "fe80::" | grep -v tentative])]) >>>>>>>> +OVS_WAIT_UNTIL([NS_EXEC([ext-foo], [ip a show dev ext-foo | >>>>>>>> grep "fe80::" | grep -v tentative])]) >>>>>>>> + >>>>>>>> +# Verify that BGP control plane traffic is delivered to the >>>>>>>> "bgp-daemon" >>>>>>>> +# interface on both IPv4 and IPv6 LLA addresses >>>>>>>> +NETNS_DAEMONIZE([bgp-daemon], [nc -l -k 172.16.1.1 179], >>>>>>>> [bgp_v4.pid]) >>>>>>>> +NS_CHECK_EXEC([ext-foo], [echo "TCP test" | nc --send-only >>>>>>>> 172.16.1.1 179]) >>>>>>>> + >>>>>>>> +NETNS_DAEMONIZE([bgp-daemon], [nc -l -6 -k >>>>>>>> fe80::200:2ff:fe01:203%bgp-daemon 179], [bgp_v6.pid]) >>>>>>>> +NS_CHECK_EXEC([ext-foo], [echo "TCP test" | nc --send-only -6 >>>>>>>> fe80::200:2ff:fe01:203%ext-foo 179]) >>>>>>>> + >>>>>>>> +OVS_APP_EXIT_AND_WAIT([ovn-controller]) >>>>>>>> + >>>>>>>> +as ovn-sb >>>>>>>> +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >>>>>>>> + >>>>>>>> +as ovn-nb >>>>>>>> +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >>>>>>>> + >>>>>>>> +as northd >>>>>>>> +OVS_APP_EXIT_AND_WAIT([ovn-northd]) >>>>>>>> + >>>>>>>> +as >>>>>>>> +OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d >>>>>>>> +/.*terminating with signal 15.*/d"]) >>>>>>>> +AT_CLEANUP >>>>>>>> +]) >>>>>>> Regards, >>>>>>> Dumitru >>>>>>> >>>>>>> _______________________________________________ >>>>>>> dev mailing list >>>>>>> dev@openvswitch.org >>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>>> Thanks, >>>> Martin. >>>> > >
Re-adding the dev list. On 8/7/24 18:12, Vladislav Odintsov wrote: > Hi Dumitru, > Hi Vladislav, > I'd like to add some thoughts to your inputs. > Thanks for that, I added some more of my own below. :) > On 06.08.2024 19:23, Dumitru Ceara wrote: >> Hi Martin, >> >> Sorry, I was reviewing V3 but I was slow in actually sending out the email. >> >> On 8/6/24 13:17, Martin Kalcok wrote: >>> This change adds a 'bgp-redirect' option to LRP that allows >>> redirecting of BGP control plane traffic to an arbitrary LSP >>> in its peer LS. >>> >>> The option expects a string with a LSP name. When set, >>> any traffic entering LS that's destined for any of the >>> LRP's IP addresses (including IPv6 LLA) is redirected >>> to the LSP specified in the option's value. >>> >>> This enables external BGP daemons to listen on an interface >>> bound to a LSP and effectively act as if they were listening >>> on (and speaking from) LRP's IP address. >>> >>> Signed-off-by: Martin Kalcok <martin.kalcok@canonical.com> >>> --- >> Strictly on this version of the patch, and with the thought in mind that >> we might want to consider this feature experimental [0] and maybe change >> it (NB too) in the future, I left a few comments inline. With those >> addressed I think the patch looks OK to me. >> >> [0] >> https://patchwork.ozlabs.org/project/ovn/patch/20240802155449.3528777-1-mmichels@redhat.com/ >> >> Now, for the point when we'd remove the "experimental" tag: >> >> In general, it makes me a bit uneasy that we have to share the MAC and >> IP between the LRP and the VIF of logical switch port that's connected >> to the same switch as the LRP. >> >> I was thinking of alternatives for the future and the only things I >> could come up with until now are: >> >> 1. Create a separate, "disconnected" logical switch: >> ovn-nbctl ls-add bgp >> >> Add the bgp-daemon port to it: >> ovn-nbctl lsp-add bgp bgp-daemon. >> >> Then we don't need "unknown" either, I think. >> >> But it's not possible today to move packets from the ingress pipeline of >> a logical switch ("public" in this test) to the egress pipeline of a >> different logical switch ("bgp" in my example). It also feels icky to >> implement that. >> >> 2. Add the ability to bind an OVS port directly to a logical router port: >> >> then we could do the same type of redirect you do here but instead in >> the logical router pipeline. The advantage is we don't have to drop any >> non-bgp traffic in the switch pipeline. The switch keeps functioning as >> it does today. >> >> Maybe an advantage of this second alternative would be that we can >> easily attach a filtering option to this LRP (e.g., >> LRP.options:control-traffic-filter) to allow us to be more flexible in >> what kind of traffic we forward to the actuall routing protocol daemon >> that runs behind that OVS port - Vladislav also mentioned during the >> meeting it might be interesting to forward BFD packets to the FRR (or >> whatever application implements the routing protocol) instance too. > > The idea to be able to bind LRP to OVS port sounds very interesting to > me. But with a note that it's not a pure "bind", but a partial: as you > wrote with some "filter" applied to pass not all the traffic. The main > idea here is to pass only control plane traffic destined to LRP's As we're discussing on the other thread (Martin pointed it out) we also probably need to involve conntrack and allow replies to any kind of traffic initiated by the entity behind the LRP's VIF (e.g., BGP sessions initiated from the host). > addresses. Other than that seems odd. Transit traffic should be remained > in LR pipeline otherwise it will have no difference with a regular VIF LSP. > Definitely, traffic that needs to be DNATed (DNAT/unSNAT rules or LB rules that will change the destination IP from LRP IP to something else) should not be affected. All other "transit" traffic doesn't have LRP IP, does it? > Having a filter (or match like in ACL or Logical_Router_Policies) looks > more flexible in terms of used protocols. This can coexist with proposal > from current patch, where the flow is not programmable from user. > I think so too. Regards, Dumitru >> But again, if we consider the current feature "experimental", we can >> spend more time during the next release cycle to figure out what's best. >> >>> v4 of this patch renames the feature from "bgp-mirror" to "bgp-redirect" >>> as discussed during community meeting. >>> >>> northd/northd.c | 108 ++++++++++++++++++++++++++++++++++++++++ >>> northd/ovn-northd.8.xml | 23 +++++++++ >>> ovn-nb.xml | 14 ++++++ >>> tests/ovn-northd.at | 58 +++++++++++++++++++++ >>> tests/system-ovn.at | 86 ++++++++++++++++++++++++++++++++ >>> 5 files changed, 289 insertions(+) >>> >>> diff --git a/northd/northd.c b/northd/northd.c >>> index 4353df07d..088104f25 100644 >>> --- a/northd/northd.c >>> +++ b/northd/northd.c >>> @@ -13048,6 +13048,113 @@ build_arp_resolve_flows_for_lrp(struct ovn_port *op, >>> } >>> } >>> >>> +static void >>> +build_bgp_redirect_rule__( >>> + const char *s_addr, const char *bgp_port_name, bool is_ipv6, >>> + struct ovn_port *ls_peer, struct lflow_table *lflows, >>> + struct ds *match, struct ds *actions) >>> +{ >>> + int ip_ver = is_ipv6 ? 6 : 4; >>> + /* Redirect packets in the input pipeline destined for LR's IP to >>> + * the port specified in 'bgp-redirect' option. >>> + */ >>> + ds_clear(match); >>> + ds_clear(actions); >>> + ds_put_format(match, "ip%d.dst == %s && tcp.dst == 179", ip_ver, s_addr); >>> + ds_put_format(actions, "outport = \"%s\"; output;", bgp_port_name); >>> + ovn_lflow_add(lflows, ls_peer->od, S_SWITCH_IN_L2_LKUP, 100, >>> + ds_cstr(match), >>> + ds_cstr(actions), >>> + ls_peer->lflow_ref); >>> + >>> + >>> + /* Drop any traffic originating from 'bgp-redirect' port that does >>> + * not originate from BGP daemon port. This blocks unnecessary >>> + * traffic like ARP broadcasts or IPv6 router solicitation packets >>> + * from the dummy 'bgp-redirect' port. >>> + */ >>> + ds_clear(match); >>> + ds_put_format(match, "inport == \"%s\"", bgp_port_name); >>> + ovn_lflow_add(lflows, ls_peer->od, S_SWITCH_IN_CHECK_PORT_SEC, 80, >>> + ds_cstr(match), >>> + REGBIT_PORT_SEC_DROP " = 1; next;", >>> + ls_peer->lflow_ref); >> Not a huge deal but this logical flow gets regenerated multiple times >> for each bgp_port_name (for each LRP address). In the end we'll only >> install one in the database so there's no impact on SB. But maybe we >> should consider moving this part out of this function. >> >>> + >>> + ds_put_format(match, >>> + " && ip%d.src == %s && tcp.src == 179", >>> + ip_ver, >>> + s_addr); >>> + ovn_lflow_add(lflows, ls_peer->od, S_SWITCH_IN_CHECK_PORT_SEC, 81, >>> + ds_cstr(match), >>> + REGBIT_PORT_SEC_DROP " = check_in_port_sec(); next;", >>> + ls_peer->lflow_ref); >> In your system test we set 'addresses = "unknown"' for 'bgp_port_name'. >> And I guess we kind of have to do it like that all the time when using >> this feature. I'm trying to think of scenarios in which users set >> LSP.addresses = 'unknown' and LSP.port-security = '<some-addresses>'. >> I'm not sure I saw those until now but maybe I'm wrong. > > I'm also interested whether this LSP port with unknown addresses will > get all the broadcast traffic? > > @Martin, didn't you look for ARP behavior? If we have one LS with three > LSPs: > - 1 router type LSP with MAC 00:00:00:00:00:01 and IP 169.254.0.1/30 > - one "vif" LSP with "unknown" addresses set and configured underlying > OVS port with same MAC and IP set > - one localnet port for external connectivity. > > If some party from localnet side sends broadcast ARP request "Who has > 169.254.0.1? Answer <...>", will this ARP request got flooded to both > LRP and LSP with "unknown" addresses? If yes, will the sender receive 2 > ARP replies? > >>> +} >>> + >>> +static void >>> +build_lrouter_bgp_redirect( >>> + struct ovn_port *op, struct lflow_table *lflows, >>> + struct ds *match, struct ds *actions) >>> +{ >>> + /* LRP has to have a peer.*/ >>> + if (op->peer == NULL) { >>> + return; >>> + } >>> + /* LRP has to have NB record.*/ >>> + if (op->nbrp == NULL) { >>> + return; >>> + } >>> + >>> + /* Proceed only for LRPs that have 'bgp-redirect' option set. Value of this >>> + * option is the name of LSP to which a BGP traffic will be redirected. >>> + */ >>> + const char *bgp_port = smap_get(&op->nbrp->options, "bgp-redirect"); >>> + if (bgp_port == NULL) { >>> + return; >>> + } >>> + >>> + /* Ensure that LSP, to which the BGP traffic is redirected, exists.*/ >>> + struct ovn_port *peer_lsp; >>> + bool bgp_port_exists = false; >>> + HMAP_FOR_EACH (peer_lsp, dp_node, &op->peer->od->ports) { >>> + size_t peer_lsp_s = strlen(peer_lsp->key); >>> + if (peer_lsp_s == strlen(bgp_port) >>> + && !strncmp(peer_lsp->key, bgp_port, peer_lsp_s)){ >>> + bgp_port_exists = true; >>> + break; >>> + } >>> + } >>> + >>> + if (!bgp_port_exists) { >>> + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5); >>> + VLOG_WARN_RL(&rl, "Option 'bgp-redirect' set on Logical Router Port " >>> + "'%s' refers to non-existent Logical Switch Port. " >>> + "BGP redirecting won't be configured.", >>> + op->key); >>> + return; >>> + } >>> + >>> + if (op->cr_port != NULL) { >>> + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5); >>> + VLOG_WARN_RL(&rl, "Option 'bgp-redirect' is not supported on" >>> + " Distributed Gateway Port '%s'", op->key); >>> + return; >>> + } >>> + >>> + /* Mirror traffic destined for LRP's IPs and default BGP port >>> + * to the port defined in 'bgp-redirect' option. >>> + */ >>> + for (size_t i = 0; i < op->lrp_networks.n_ipv4_addrs; i++) { >>> + const char *ip_s = op->lrp_networks.ipv4_addrs[i].addr_s; >>> + build_bgp_redirect_rule__(ip_s, bgp_port, false, op->peer, lflows, >>> + match, actions); >>> + } >>> + for (size_t i = 0; i < op->lrp_networks.n_ipv6_addrs; i++) { >>> + const char *ip_s = op->lrp_networks.ipv6_addrs[i].addr_s; >>> + build_bgp_redirect_rule__(ip_s, bgp_port, true, op->peer, lflows, >>> + match, actions); >>> + } >>> +} >>> + >>> /* This function adds ARP resolve flows related to a LSP. */ >>> static void >>> build_arp_resolve_flows_for_lsp( >>> @@ -16003,6 +16110,7 @@ build_lswitch_and_lrouter_iterate_by_lrp(struct ovn_port *op, >>> lsi->meter_groups, op->lflow_ref); >>> build_lrouter_icmp_packet_toobig_admin_flows(op, lsi->lflows, &lsi->match, >>> &lsi->actions, op->lflow_ref); >>> + build_lrouter_bgp_redirect(op, lsi->lflows, &lsi->match, &lsi->actions); >>> } >>> >>> static void * >>> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml >>> index b06b09ac5..4cbdf8f74 100644 >>> --- a/northd/ovn-northd.8.xml >>> +++ b/northd/ovn-northd.8.xml >>> @@ -284,6 +284,21 @@ >>> dropped in the next stage. >>> </li> >>> >>> + <li> >>> + For each logical port that's defined as a target of BGP redirecting >>> + (via <code>bgp-redirect</code> option set on Logical Router Port), a >>> + filter is set in place that disallows any traffic entering this port >>> + that does not originate from Logical Router Port's IPs and default BGP >>> + port (TCP 179). This filtering is achieved by two rules. First rule has >>> + priority 81, it matches on <code>inport == <var>BGP_MIRROR_PORT</var> >>> + && ip.src == <var>LRP_IP</var> && tcp.src == 179</code> >>> + and allows traffic further with <code>REGBIT_PORT_SEC_DROP" = >>> + check_in_port_sec(); next;</code>. Second rule with priority 80 matches >>> + the rest of the traffic from that port and sets >>> + <code>REGBIT_PORT_SEC_DROP" = 1; next;"</code> so that the packets are >>> + dropped in the next stage. >>> + </li> >>> + >>> <li> >>> For each (enabled) vtep logical port, a priority 70 flow is added which >>> matches on all packets and applies the action >>> @@ -1949,6 +1964,14 @@ output; >>> on the logical switch. >>> </li> >>> >>> + <li> >>> + For any logical port that's defined as a target of BGP redirecting (via >>> + <code>bgp-redirect</code> option set on Logical Router Port), a >>> + priority-100 flow is added that redirects traffic for Logical Router >>> + Port IPs (including IPv6 LLA) and TCP port 179, to the targeted >>> + logical port. >>> + </li> >>> + >>> <li> >>> Priority-90 flows for transit switches that forward registered >>> IP multicast traffic to their corresponding multicast group , which >>> diff --git a/ovn-nb.xml b/ovn-nb.xml >>> index 9552534f6..2d2f5c9e0 100644 >>> --- a/ovn-nb.xml >>> +++ b/ovn-nb.xml >>> @@ -3440,6 +3440,20 @@ or >>> </p> >>> </column> >>> >>> + <column name="options" key="bgp-redirect" type='{"type": "string"}'> >>> + <p> >>> + This option expects a name of a Logical Switch Port that's present >>> + in the peer's Logical Switch. If set, it causes any traffic >>> + that's destined for Logical Router Port's IP addresses (including >>> + its IPv6 LLA) and the default BGP port (TCP 179), to be redirected >>> + to the specified Logical Switch Port. >>> + >>> + This allows external BGP daemon to be bound to a port in OVN's >>> + Logical Switch and act as if it was listening on Logical Router >>> + Port's IP address. >>> + </p> >>> + </column> >>> + >>> <column name="options" key="gateway_mtu_bypass"> >>> <p> >>> When configured, represents a match expression, in the same >>> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at >>> index a389d1988..df1e924ab 100644 >>> --- a/tests/ovn-northd.at >>> +++ b/tests/ovn-northd.at >>> @@ -12721,3 +12721,61 @@ AT_CHECK([ovn-sbctl dump-flows lr | grep lr_in_dnat | ovn_strip_lflows], [0], [d >>> >>> AT_CLEANUP >>> ]) >>> + >>> +OVN_FOR_EACH_NORTHD_NO_HV([ >>> +AT_SETUP([BGP control plane redirect]) >>> +ovn_start >>> + >>> +check ovn-sbctl chassis-add hv1 geneve 127.0.0.1 >>> + >>> +check ovn-nbctl lr-add lr -- \ >>> + lrp-add lr lr-ls 02:ac:10:01:00:01 172.16.1.1/24 >>> +check ovn-nbctl --wait=sb set logical_router lr options:chassis=hv1 >>> + >>> +check ovn-nbctl ls-add ls -- \ >>> + lsp-add ls ls-lr -- \ >>> + lsp-set-type ls-lr router -- \ >>> + lsp-set-addresses ls-lr router -- \ >>> + lsp-set-options ls-lr router-port=lr-ls >>> + >>> +check ovn-nbctl lsp-add ls lsp-bgp -- \ >>> + lsp-set-addresses lsp-bgp unknown >>> + >>> +# Function that ensures that no BGP redirect ruels are installed. >>> +check_no_bgp_redirect() { >>> + AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_l2_lkup | grep "tcp.dst == 179" | wc -l], [0], [0 >>> +]) >>> + >>> + AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_check_port_sec | grep -E "priority=80|priority=81" | wc -l], [0], [0 >>> +]) >>> +} >>> + >>> +# By default, no rules related to BGP redirect are present >>> +check_no_bgp_redirect >>> + >>> +# Set "lsp-bgp" port as target of BGP control plane redirected traffic >>> +check ovn-nbctl --wait=sb set logical_router_port lr-ls options:bgp-redirect=lsp-bgp >>> + >>> +# Check that BGP control plane traffic is redirected to "lsp-bgp" >>> +AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_l2_lkup | grep "tcp.dst == 179" | ovn_strip_lflows], [0], [dnl >>> + table=??(ls_in_l2_lkup ), priority=100 , match=(ip4.dst == 172.16.1.1 && tcp.dst == 179), action=(outport = "lsp-bgp"; output;) >>> + table=??(ls_in_l2_lkup ), priority=100 , match=(ip6.dst == fe80::ac:10ff:fe01:1 && tcp.dst == 179), action=(outport = "lsp-bgp"; output;) >>> +]) >>> + >>> +# Check that only BGP-related traffic is accepted on "lsp-bgp" port >>> +AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_check_port_sec | grep -E "priority=80|priority=81" | ovn_strip_lflows], [0], [dnl >>> + table=??(ls_in_check_port_sec), priority=80 , match=(inport == "lsp-bgp"), action=(reg0[[15]] = 1; next;) >>> + table=??(ls_in_check_port_sec), priority=81 , match=(inport == "lsp-bgp" && ip4.src == 172.16.1.1 && tcp.src == 179), action=(reg0[[15]] = check_in_port_sec(); next;) >>> + table=??(ls_in_check_port_sec), priority=81 , match=(inport == "lsp-bgp" && ip6.src == fe80::ac:10ff:fe01:1 && tcp.src == 179), action=(reg0[[15]] = check_in_port_sec(); next;) >>> +]) >>> + >>> +# Remove 'bgp-redirect' option from LRP and check that rules are removed >>> +check ovn-nbctl --wait=sb remove logical_router_port lr-ls options bgp-redirect >>> +check_no_bgp_redirect >>> + >>> +# Set non-existent LSP as target of 'bgp-redirect' and check that no rules are added >>> +check ovn-nbctl --wait=sb set logical_router_port lr-ls options:bgp-redirect=lsp-foo >>> +check_no_bgp_redirect >>> + >>> +AT_CLEANUP >>> +]) >>> diff --git a/tests/system-ovn.at b/tests/system-ovn.at >>> index c24ede7c5..c09c1136e 100644 >>> --- a/tests/system-ovn.at >>> +++ b/tests/system-ovn.at >>> @@ -13027,3 +13027,89 @@ OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d >>> /connection dropped.*/d"]) >>> AT_CLEANUP >>> ]) >>> + >>> +OVN_FOR_EACH_NORTHD([ >>> +AT_SETUP([BGP control plane redirect]) >> Nit: probably best to add AT_SKIP_IF([test $HAVE_NC = no]) here. >> >>> +ovn_start >>> +OVS_TRAFFIC_VSWITCHD_START() >>> + >>> +ADD_BR([br-int]) >>> +ADD_BR([br-ext]) >>> + >>> +ovs-ofctl add-flow br-ext action=normal >>> +# Set external-ids in br-int needed for ovn-controller >>> +ovs-vsctl \ >>> + -- set Open_vSwitch . external-ids:system-id=hv1 \ >>> + -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \ >>> + -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \ >>> + -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \ >>> + -- set bridge br-int fail-mode=secure other-config:disable-in-band=true >> Most ov*-ctl commands should be prefixed with "check" in this test. >> >>> + >>> +# Start ovn-controller >>> +start_daemon ovn-controller >>> + >>> +ovn-nbctl lr-add R1 \ >>> + -- set Logical_Router R1 options:chassis=hv1 >>> + >>> +ovn-nbctl ls-add public >>> + >>> +ovn-nbctl lrp-add R1 rp-public 00:00:02:01:02:03 172.16.1.1/24 >>> + >>> +ovn-nbctl lsp-add public public-rp -- set Logical_Switch_Port public-rp \ >>> + type=router options:router-port=rp-public \ >>> + -- lsp-set-addresses public-rp router >>> + >>> +ovn-nbctl lsp-add public bgp-daemon \ >>> + -- lsp-set-addresses bgp-daemon unknown >>> + >>> +AT_CHECK([ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=phynet:br-ext]) >>> +ovn-nbctl lsp-add public public1 \ >>> + -- lsp-set-addresses public1 unknown \ >>> + -- lsp-set-type public1 localnet \ >>> + -- lsp-set-options public1 network_name=phynet >>> + >>> +ovn-nbctl --wait=hv sync >>> + >>> +# Set option that redirects BGP control plane traffic to a LSP "bgp-daemon" >>> +ovn-nbctl --wait=sb set logical_router_port rp-public options:bgp-redirect=bgp-daemon >>> + >>> +# Create "bgp-daemon" interface in a namespace with IP and MAC matching LRP "rp-public" >>> +ADD_NAMESPACES(bgp-daemon) >>> +ADD_VETH(bgp-daemon, bgp-daemon, br-int, "172.16.1.1/24", "00:00:02:01:02:03") >>> + >>> +ADD_NAMESPACES(ext-foo) >>> +ADD_VETH(ext-foo, ext-foo, br-ext, "172.16.1.100/24", "00:10:10:01:02:13", \ >>> + "172.16.1.1") >>> + >>> +# Flip the interface down/up to get proper IPv6 LLA >>> +NS_EXEC([bgp-daemon], [ip link set down bgp-daemon]) >>> +NS_EXEC([bgp-daemon], [ip link set up bgp-daemon]) >>> + >>> +# Wait until IPv6 LLA loses the "tentative" flag otherwise it can't be bound to. >>> +OVS_WAIT_UNTIL([NS_EXEC([bgp-daemon], [ip a show dev bgp-daemon | grep "fe80::" | grep -v tentative])]) >>> +OVS_WAIT_UNTIL([NS_EXEC([ext-foo], [ip a show dev ext-foo | grep "fe80::" | grep -v tentative])]) >>> + >>> +# Verify that BGP control plane traffic is delivered to the "bgp-daemon" >>> +# interface on both IPv4 and IPv6 LLA addresses >>> +NETNS_DAEMONIZE([bgp-daemon], [nc -l -k 172.16.1.1 179], [bgp_v4.pid]) >>> +NS_CHECK_EXEC([ext-foo], [echo "TCP test" | nc --send-only 172.16.1.1 179]) >>> + >>> +NETNS_DAEMONIZE([bgp-daemon], [nc -l -6 -k fe80::200:2ff:fe01:203%bgp-daemon 179], [bgp_v6.pid]) >>> +NS_CHECK_EXEC([ext-foo], [echo "TCP test" | nc --send-only -6 fe80::200:2ff:fe01:203%ext-foo 179]) >>> + >>> +OVS_APP_EXIT_AND_WAIT([ovn-controller]) >>> + >>> +as ovn-sb >>> +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >>> + >>> +as ovn-nb >>> +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >>> + >>> +as northd >>> +OVS_APP_EXIT_AND_WAIT([ovn-northd]) >>> + >>> +as >>> +OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d >>> +/.*terminating with signal 15.*/d"]) >>> +AT_CLEANUP >>> +]) >> Regards, >> Dumitru >>
On 8/7/24 22:06, Dumitru Ceara wrote: > On 8/7/24 20:49, martin.kalcok@canonical.com wrote: >> Hi all, >> @Vladislav, you sent one more message earlier but you did not include >> the mailing list so it fell out of thread, I'll address your question >> here. You were asking: >> >> @Martin, didn't you look for ARP behavior? If we have one LS with three >> LSPs: >> - 1 router type LSP with MAC 00:00:00:00:00:01 and IP 169.254.0.1/30 >> - one "vif" LSP with "unknown" addresses set and configured underlying >> OVS port with same MAC and IP set >> - one localnet port for external connectivity. >> >> If some party from localnet side sends broadcast ARP request "Who has >> 169.254.0.1? Answer <...>", will this ARP request got flooded to both >> LRP and LSP with "unknown" addresses? If yes, will the sender receive 2 >> ARP replies? >> >> >> Answer: Yes the request will be received on the LSP as well, but only >> single reply will be received by the requester on the localnet side. We >> included rules that drop packets not originating from the BGP daemon >> port, this includes ARP and IPv6 ND. I double-checked now and when >> external machine sends ARP request for the LRP's IP, only single reply >> arrives. >> >> However, more generally I want to raise an issue that I realized today >> as I was rewriting this to support BFD. When I was originally >> implementing and testing this, I failed to account for the fact that the >> BGP daemon is not just a server, but also a client that needs to connect >> to its peer. And while the communication works well when the peer is >> connecting to our (OVN-side) BGP daemon, it fails when our daemon wants >> to initiate connection to its peer. The reason for this is that our >> daemon uses randomly selected port number as a client, so the replies >> from the peer are not destined for port 179 and redirect rules won't >> pick them up. (I want to add that this is NOT caused by the rules that I > > But isn't the reply coming from TCP.src == 179? > >> mentioned in the reply to Vladislav, I relaxed those to block only >> outgoing ARP and ND traffic.) > > Doesn't the VIF need to resolve ARP to reach external destinations? > >> I see a possible way forward in making this into a generic redirect >> feature that, when enabled, redirects all TCP (and UDP?) traffic >> destined for LRP to the selected LSP. I gave it a quick test and it does >> solve this issue of our BGP daemon reaching out to its peer. I realize >> that this is kind of a big change very close to the freeze, however it >> would solve the discussion around naming of the various LRP options :D >> >> I would be interested to hear your thoughts on this. > > What about other types of traffic initiated by the VIF behind the > selected LSP? > > Shouldn't we let the CMS implement the traffic filtering with ACLs then? > > E.g., > > # allow all replies to sessions initiated by the VIF > from-lport, match: inport==bgp-daemon, action: allow-related, prio 1 > > # allow incoming BGP connections (if needed) > to-lport: match: "tcp.dst == BGP", action: allow, prio 2 > # allow BFD (if needed) > to-lport: match: "udp.dst == BFD", action: allow, prio 2 > # default drop anything else going into the VIF, > # replies to VIF-initiated traffic are allowed due to the > # from-lport ACL. > to-lport: match: "1", action: drop, prio 1 I just realized, all 3 "to-lport" ACLs above must also match on "outport == 'bgp-daemon' && ..". I only wanted to restrict traffic going into the target LSP. > > However, while writing this I'm starting to think more and more that > "alternative 2" below, allowing the LR port to be (partially) bound to a > VIF (with a filter for some of the incoming control traffic - as > Vladislav mentioned during the technical meeting and in the other email > thread), makes more sense. > > Like that we _don't_ really need to: > - share any MAC/IP address (there's only one LR port with those) > - worry about ARPs and ND because if the VIF generates an ARP it looks > exactly the same as the one generated by the LRP (we do need to forward > all replies to the VIF too though) > - worry about blocking any type of egress traffic from the VIF in general > > TL/DR: if a generic "redirect" option works for now in conjunction with > ACLs on the "target" logical switch I think I'm OK with that (if it's > "experimental"). But, at least for now, it seems to me the long term > solution might be something else in the end. > > Regards, > Dumitru > >> >> Thanks, >> Martin. >> >> On Wed, 2024-08-07 at 23:26 +0700, Vladislav Odintsov wrote: >>> >>> >>> On 07.08.2024 16:23, Dumitru Ceara wrote: >>> >>>> On 8/7/24 11:17, martin.kalcok@canonical.com wrote: >>>> >>>>> Hi Dumitru and Vladislav, >>>>> Thank you both for the review and the feedback >>>>> >>>>> On Wed, 2024-08-07 at 09:17 +0200, Dumitru Ceara wrote: >>>>> >>>>>> On 8/6/24 19:22, Vladislav Odintsov wrote: >>>>>> >>>>>>> Hi Martin, Dumitru, >>>>>>> >>>>>> Hi Vladislav, >>>>>> >>>>>> >>>>>>> On 06.08.2024 19:23, Dumitru Ceara wrote: >>>>>>> >>>>>>>> Hi Martin, >>>>>>>> >>>>>>>> Sorry, I was reviewing V3 but I was slow in actually sending out >>>>>>>> the email. >>>>>>>> >>>>>>>> On 8/6/24 13:17, Martin Kalcok wrote: >>>>>>>> >>>>>>>>> This change adds a 'bgp-redirect' option to LRP that allows >>>>>>>>> redirecting of BGP control plane traffic to an arbitrary LSP >>>>>>>>> in its peer LS. >>>>>>>>> >>>>>>>>> The option expects a string with a LSP name. When set, >>>>>>>>> any traffic entering LS that's destined for any of the >>>>>>>>> LRP's IP addresses (including IPv6 LLA) is redirected >>>>>>>>> to the LSP specified in the option's value. >>>>>>>>> >>>>>>>>> This enables external BGP daemons to listen on an interface >>>>>>>>> bound to a LSP and effectively act as if they were listening >>>>>>>>> on (and speaking from) LRP's IP address. >>>>>>>>> >>>>>>>>> Signed-off-by: Martin Kalcok <martin.kalcok@canonical.com> >>>>>>>>> --- >>>>>>>> Strictly on this version of the patch, and with the thought in >>>>>>>> mind that >>>>>>>> we might want to consider this feature experimental [0] and maybe >>>>>>>> change >>>>>>>> it (NB too) in the future, I left a few comments inline. With >>>>>>>> those >>>>>>>> addressed I think the patch looks OK to me. >>>>>>>> >>>>>>>> [0] >>>>>>>> https://patchwork.ozlabs.org/project/ovn/patch/20240802155449.3528777-1-mmichels@redhat.com/ >>>>>>>> >>>>> I'm keeping an eye on this one and I think you made a good point >>>>> Dumitru about the need for clear way to mark feature as experimental. >>>>> I'll chime in that patch discussion as well. >>>>> >>>> Thanks! >>>> >>>> >>>>> >>>>>> >>>>>>> >>>>>>>> Now, for the point when we'd remove the "experimental" tag: >>>>>>>> >>>>>>>> In general, it makes me a bit uneasy that we have to share the >>>>>>>> MAC and >>>>>>>> IP between the LRP and the VIF of logical switch port that's >>>>>>>> connected >>>>>>>> to the same switch as the LRP. >>>>> I think that in one way or another, the requirement for the interface >>>>> to share IP/MAC with the LRP will remain. The reason for sharing them >>>>> is to allow operating BGP unnumbered which uses IPv6 LLA instead of >>>>> pre-configured ASN. >>>>> >>>> I think we don't need it with any of the alternative options below but >>>> we can discuss those in detail after branching, in the 25.03 cycle. >>>> >>>> >>>>> >>>>>> >>>>>>> >>>>>>>> I was thinking of alternatives for the future and the only things >>>>>>>> I >>>>>>>> could come up with until now are: >>>>>>>> >>>>>>>> 1. Create a separate, "disconnected" logical switch: >>>>>>>> ovn-nbctl ls-add bgp >>>>>>>> >>>>>>>> Add the bgp-daemon port to it: >>>>>>>> ovn-nbctl lsp-add bgp bgp-daemon. >>>>>>>> >>>>>>>> Then we don't need "unknown" either, I think. >>>>>>>> >>>>>>>> But it's not possible today to move packets from the ing,ress >>>>>>>> pipeline of >>>>>>>> a logical switch ("public" in this test) to the egress pipeline >>>>>>>> of a >>>>>>>> different logical switch ("bgp" in my example). It also feels >>>>>>>> icky to >>>>>>>> implement that. >>>>>>>> >>>>>>>> 2. Add the ability to bind an OVS port directly to a logical >>>>>>>> router port: >>>>>>>> >>>>>>>> then we could do the same type of redirect you do here but >>>>>>>> instead in >>>>>>>> the logical router pipeline. The advantage is we don't have to >>>>>>>> drop any >>>>>>>> non-bgp traffic in the switch pipeline. The switch keeps >>>>>>>> functioning as >>>>>>>> it does today. >>>>> I like the second approach a lot. It would indeed feel more "in place" >>>>> to have this logic in the LR, instead of LS. I didn't think this would >>>>> be possible initially. >>>>> >>>> Ack, maybe we should come back to this discussion after branching. >>>> >>>> >>>>> >>>>>> >>>>>>> >>>>>>>> Maybe an advantage of this second alternative would be that we >>>>>>>> can >>>>>>>> easily attach a filtering option to this LRP (e.g., >>>>>>>> LRP.options:control-traffic-filter) to allow us to be more >>>>>>>> flexible in >>>>>>>> what kind of traffic we forward to the actuall routing protocol >>>>>>>> daemon >>>>>>>> that runs behind that OVS port - Vladislav also mentioned during >>>>>>>> the >>>>>>>> meeting it might be interesting to forward BFD packets to the FRR >>>>>>>> (or >>>>>>>> whatever application implements the routing protocol) instance >>>>>>>> too. >>>>>>> @Martin, I'm again kindly asking about possible support for BFD >>>>>>> redirect. Is it possible to incorporate these changes as an >>>>>>> additional >>>>>>> configuration know ("bfd-redirect") into your patch so we can start >>>>>>> using this functionality internally and possibly give some feedback >>>>>>> (hopefully positive)? >>>>>>> >>>>>>> It doesn't seem to be a very big change but looks very attractive >>>>>>> for us. >>>>>>> >>>>>>> I can send a separate patch for this on top of your patch, but >>>>>>> technically it won't be able to jump into the 24.09 because >>>>>>> formally >>>>>>> it's already a soft-freeze in progress. As an option I can send a >>>>>>> patch, >>>>>>> which can be squashed into this one. >>>>>>> >>>>>>> What do you think? >>>>>> I didn't check with other maintainers (added them in CC now), but >>>>>> given >>>>>> that we agreed to try to accept the port redirecting patch as >>>>>> experimental I think it's fine to expand it to allow BFD support too. >>>>>> Posting a follow up patch on top of this one is fine from my >>>>>> perspective. >>>>> Indeed the change is not that big code-wise, but given that it would >>>>> require some forethought about the implementation I was bit hesitant to >>>>> include it in here. I didn't want to rock the boat too much, so to >>>>> speak. However given that this is indeed going to be marked as >>>>> experimental and if maintainers are OK with it, I'm happy to post v5 >>>>> with this change included. Feedback from the real-world usage from >>>>> Vladislav would be excellent thing to have when we'll move this feature >>>>> from experimental to stable. >>> Thanks for this, Martin! >>> >>>> >>>>> >>>>> >>>>>> However, what if we change the option to? >>>>>> - NB.Logical_Router_Port.options:routing-protocol-redirect=<LSP-NAME> >>>>>> - NB.Logical_Router_Port.options:routing- >>>>>> protocols=<LIST_OF_PROTOCOLS> >>>>>> >>>>>> E.g.: >>>>>> NB.Logical_Router_Port.options:routing-protocol-redirect="bgp-daemon" >>>>>> NB.Logical_Router_Port.options:routing-protocols="BGP;BFD" >>>>> These seem like a good option to me. >>>> Martin, if it's not too much work could we try to add this to v5 then? >>>> Vladislav, does that work for you too? >>> >>> I'd be okay with both versions, but I'm concerned about options >>> naming. Technically BFD is not a "routing" protocol. >>> >>> Will next variants be more accurate? >>> >>> - NB.Logical_Router_Port.options:redirect-protocols-lport=<LSP-NAME> # (or "lsp" or "to"?) >>> - NB.Logical_Router_Port.options:redirect-protocols=<LIST_OF_PROTOCOLS> >>> >>> E.g.: >>> - NB.Logical_Router_Port.options:redirect-protocols-lport="bgp-daemon" >>> - NB.Logical_Router_Port.options:redirect-protocols="BGP;BFD" >>> >>> If needed this can be extended later with "NB.Logical_Router_Port.options:redirect-protocols-match=<MATCH>". For example: "NB.Logical_Router_Port.options:redirect-protocols-match='ip.protocol == 89 || ip.dst == <lsp_ip> && udp.dst == 3784'" (OSPF + BFD). >>> >>> WDYT? >>> >>> >>>> >>>> Regards, >>>> Dumitru >>>> >>>> >>>>> >>>>>> Would this have a higher chance of becoming fully supported in the >>>>>> next >>>>>> release? >>>>>> >>>>>> Regards, >>>>>> Dumitru >>>>>> >>>>>> >>>>>>> >>>>>>>> But again, if we consider the current feature "experimental", we >>>>>>>> can >>>>>>>> spend more time during the next release cycle to figure out >>>>>>>> what's best. >>>>>>>> >>>>>>>> >>>>>>>>> v4 of this patch renames the feature from "bgp-mirror" to "bgp- >>>>>>>>> redirect" >>>>>>>>> as discussed during community meeting. >>>>>>>>> >>>>>>>>> northd/northd.c | 108 >>>>>>>>> ++++++++++++++++++++++++++++++++++++++++ >>>>>>>>> northd/ovn-northd.8.xml | 23 +++++++++ >>>>>>>>> ovn-nb.xml | 14 ++++++ >>>>>>>>> tests/ovn-northd.at | 58 +++++++++++++++++++++ >>>>>>>>> tests/system-ovn.at | 86 ++++++++++++++++++++++++++++++++ >>>>>>>>> 5 files changed, 289 insertions(+) >>>>>>>>> >>>>>>>>> diff --git a/northd/northd.c b/northd/northd.c >>>>>>>>> index 4353df07d..088104f25 100644 >>>>>>>>> --- a/northd/northd.c >>>>>>>>> +++ b/northd/northd.c >>>>>>>>> @@ -13048,6 +13048,113 @@ >>>>>>>>> build_arp_resolve_flows_for_lrp(struct ovn_port *op, >>>>>>>>> } >>>>>>>>> } >>>>>>>>> >>>>>>>>> +static void >>>>>>>>> +build_bgp_redirect_rule__( >>>>>>>>> + const char *s_addr, const char *bgp_port_name, bool >>>>>>>>> is_ipv6, >>>>>>>>> + struct ovn_port *ls_peer, struct lflow_table *lflows, >>>>>>>>> + struct ds *match, struct ds *actions) >>>>>>>>> +{ >>>>>>>>> + int ip_ver = is_ipv6 ? 6 : 4; >>>>>>>>> + /* Redirect packets in the input pipeline destined for >>>>>>>>> LR's IP to >>>>>>>>> + * the port specified in 'bgp-redirect' option. >>>>>>>>> + */ >>>>>>>>> + ds_clear(match); >>>>>>>>> + ds_clear(actions); >>>>>>>>> + ds_put_format(match, "ip%d.dst == %s && tcp.dst == 179", >>>>>>>>> ip_ver, s_addr); >>>>>>>>> + ds_put_format(actions, "outport = \"%s\"; output;", >>>>>>>>> bgp_port_name); >>>>>>>>> + ovn_lflow_add(lflows, ls_peer->od, S_SWITCH_IN_L2_LKUP, >>>>>>>>> 100, >>>>>>>>> + ds_cstr(match), >>>>>>>>> + ds_cstr(actions), >>>>>>>>> + ls_peer->lflow_ref); >>>>>>>>> + >>>>>>>>> + >>>>>>>>> + /* Drop any traffic originating from 'bgp-redirect' port >>>>>>>>> that does >>>>>>>>> + * not originate from BGP daemon port. This blocks >>>>>>>>> unnecessary >>>>>>>>> + * traffic like ARP broadcasts or IPv6 router solicitation >>>>>>>>> packets >>>>>>>>> + * from the dummy 'bgp-redirect' port. >>>>>>>>> + */ >>>>>>>>> + ds_clear(match); >>>>>>>>> + ds_put_format(match, "inport == \"%s\"", bgp_port_name); >>>>>>>>> + ovn_lflow_add(lflows, ls_peer->od, >>>>>>>>> S_SWITCH_IN_CHECK_PORT_SEC, 80, >>>>>>>>> + ds_cstr(match), >>>>>>>>> + REGBIT_PORT_SEC_DROP " = 1; next;", >>>>>>>>> + ls_peer->lflow_ref); >>>>>>>> Not a huge deal but this logical flow gets regenerated multiple >>>>>>>> times >>>>>>>> for each bgp_port_name (for each LRP address). In the end we'll >>>>>>>> only >>>>>>>> install one in the database so there's no impact on SB. But >>>>>>>> maybe we >>>>>>>> should consider moving this part out of this function. >>>>> ack >>>>> >>>>> >>>>>> >>>>>>> >>>>>>>> >>>>>>>>> + >>>>>>>>> + ds_put_format(match, >>>>>>>>> + " && ip%d.src == %s && tcp.src == 179", >>>>>>>>> + ip_ver, >>>>>>>>> + s_addr); >>>>>>>>> + ovn_lflow_add(lflows, ls_peer->od, >>>>>>>>> S_SWITCH_IN_CHECK_PORT_SEC, 81, >>>>>>>>> + ds_cstr(match), >>>>>>>>> + REGBIT_PORT_SEC_DROP " = >>>>>>>>> check_in_port_sec(); next;", >>>>>>>>> + ls_peer->lflow_ref); >>>>>>>> In your system test we set 'addresses = "unknown"' for >>>>>>>> 'bgp_port_name'. >>>>>>>> And I guess we kind of have to do it like that all the time >>>>>>>> when using >>>>>>>> this feature. I'm trying to think of scenarios in which users >>>>>>>> set >>>>>>>> LSP.addresses = 'unknown' and LSP.port-security = '<some- >>>>>>>> addresses>'. >>>>>>>> I'm not sure I saw those until now but maybe I'm wrong. >>>>> Is the concern here that the "bgp-port" with the "unknown" address is >>>>> going to receive all the unicast traffic for unknown MAC adresses as >>>>> well? I initially tried to add deny rules that would block any incoming >>>>> packets except those that match LRP's IP and BGP port, but it turned >>>>> out that the negative matches are kinda icky (for a match that has >>>>> something like `ip4.src != <IP_ADDR>` it generated hundreds of OF rules >>>>> in the ovs) >>>>> >>>>> >>>>>> >>>>>>> >>>>>>>> >>>>>>>>> +} >>>>>>>>> + >>>>>>>>> +static void >>>>>>>>> +build_lrouter_bgp_redirect( >>>>>>>>> + struct ovn_port *op, struct lflow_table *lflows, >>>>>>>>> + struct ds *match, struct ds *actions) >>>>>>>>> +{ >>>>>>>>> + /* LRP has to have a peer.*/ >>>>>>>>> + if (op->peer == NULL) { >>>>>>>>> + return; >>>>>>>>> + } >>>>>>>>> + /* LRP has to have NB record.*/ >>>>>>>>> + if (op->nbrp == NULL) { >>>>>>>>> + return; >>>>>>>>> + } >>>>>>>>> + >>>>>>>>> + /* Proceed only for LRPs that have 'bgp-redirect' option >>>>>>>>> set. Value of this >>>>>>>>> + * option is the name of LSP to which a BGP traffic will >>>>>>>>> be redirected. >>>>>>>>> + */ >>>>>>>>> + const char *bgp_port = smap_get(&op->nbrp->options, "bgp- >>>>>>>>> redirect"); >>>>>>>>> + if (bgp_port == NULL) { >>>>>>>>> + return; >>>>>>>>> + } >>>>>>>>> + >>>>>>>>> + /* Ensure that LSP, to which the BGP traffic is >>>>>>>>> redirected, exists.*/ >>>>>>>>> + struct ovn_port *peer_lsp; >>>>>>>>> + bool bgp_port_exists = false; >>>>>>>>> + HMAP_FOR_EACH (peer_lsp, dp_node, &op->peer->od->ports) { >>>>>>>>> + size_t peer_lsp_s = strlen(peer_lsp->key); >>>>>>>>> + if (peer_lsp_s == strlen(bgp_port) >>>>>>>>> + && !strncmp(peer_lsp->key, bgp_port, peer_lsp_s)){ >>>>>>>>> + bgp_port_exists = true; >>>>>>>>> + break; >>>>>>>>> + } >>>>>>>>> + } >>>>>>>>> + >>>>>>>>> + if (!bgp_port_exists) { >>>>>>>>> + static struct vlog_rate_limit rl = >>>>>>>>> VLOG_RATE_LIMIT_INIT(1, 5); >>>>>>>>> + VLOG_WARN_RL(&rl, "Option 'bgp-redirect' set on >>>>>>>>> Logical Router Port " >>>>>>>>> + "'%s' refers to non-existent Logical >>>>>>>>> Switch Port. " >>>>>>>>> + "BGP redirecting won't be >>>>>>>>> configured.", >>>>>>>>> + op->key); >>>>>>>>> + return; >>>>>>>>> + } >>>>>>>>> + >>>>>>>>> + if (op->cr_port != NULL) { >>>>>>>>> + static struct vlog_rate_limit rl = >>>>>>>>> VLOG_RATE_LIMIT_INIT(1, 5); >>>>>>>>> + VLOG_WARN_RL(&rl, "Option 'bgp-redirect' is not >>>>>>>>> supported on" >>>>>>>>> + " Distributed Gateway Port '%s'", >>>>>>>>> op->key); >>>>>>>>> + return; >>>>>>>>> + } >>>>>>>>> + >>>>>>>>> + /* Mirror traffic destined for LRP's IPs and default BGP >>>>>>>>> port >>>>>>>>> + * to the port defined in 'bgp-redirect' option. >>>>>>>>> + */ >>>>>>>>> + for (size_t i = 0; i < op->lrp_networks.n_ipv4_addrs; i++) >>>>>>>>> { >>>>>>>>> + const char *ip_s = op- >>>>>>>>> >>>>>>>>>> lrp_networks.ipv4_addrs[i].addr_s; >>>>>>>>> + build_bgp_redirect_rule__(ip_s, bgp_port, false, op- >>>>>>>>> >>>>>>>>>> peer, lflows, >>>>>>>>> + match, actions); >>>>>>>>> + } >>>>>>>>> + for (size_t i = 0; i < op->lrp_networks.n_ipv6_addrs; i++) >>>>>>>>> { >>>>>>>>> + const char *ip_s = op- >>>>>>>>> >>>>>>>>>> lrp_networks.ipv6_addrs[i].addr_s; >>>>>>>>> + build_bgp_redirect_rule__(ip_s, bgp_port, true, op- >>>>>>>>> >>>>>>>>>> peer, lflows, >>>>>>>>> + match, actions); >>>>>>>>> + } >>>>>>>>> +} >>>>>>>>> + >>>>>>>>> /* This function adds ARP resolve flows related to a LSP. */ >>>>>>>>> static void >>>>>>>>> build_arp_resolve_flows_for_lsp( >>>>>>>>> @@ -16003,6 +16110,7 @@ >>>>>>>>> build_lswitch_and_lrouter_iterate_by_lrp(struct ovn_port *op, >>>>>>>>> lsi->meter_groups, op- >>>>>>>>> >>>>>>>>>> lflow_ref); >>>>>>>>> build_lrouter_icmp_packet_toobig_admin_flows(op, lsi- >>>>>>>>> >>>>>>>>>> lflows, &lsi->match, >>>>>>>>> &lsi- >>>>>>>>> >>>>>>>>>> actions, op->lflow_ref); >>>>>>>>> + build_lrouter_bgp_redirect(op, lsi->lflows, &lsi->match, >>>>>>>>> &lsi->actions); >>>>>>>>> } >>>>>>>>> >>>>>>>>> static void * >>>>>>>>> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml >>>>>>>>> index b06b09ac5..4cbdf8f74 100644 >>>>>>>>> --- a/northd/ovn-northd.8.xml >>>>>>>>> +++ b/northd/ovn-northd.8.xml >>>>>>>>> @@ -284,6 +284,21 @@ >>>>>>>>> dropped in the next stage. >>>>>>>>> </li> >>>>>>>>> >>>>>>>>> + <li> >>>>>>>>> + For each logical port that's defined as a target of >>>>>>>>> BGP redirecting >>>>>>>>> + (via <code>bgp-redirect</code> option set on Logical >>>>>>>>> Router Port), a >>>>>>>>> + filter is set in place that disallows any traffic >>>>>>>>> entering this port >>>>>>>>> + that does not originate from Logical Router Port's IPs >>>>>>>>> and default BGP >>>>>>>>> + port (TCP 179). This filtering is achieved by two >>>>>>>>> rules. First rule has >>>>>>>>> + priority 81, it matches on <code>inport == >>>>>>>>> <var>BGP_MIRROR_PORT</var> >>>>>>>>> + && ip.src == <var>LRP_IP</var> && >>>>>>>>> tcp.src == 179</code> >>>>>>>>> + and allows traffic further with >>>>>>>>> <code>REGBIT_PORT_SEC_DROP" = >>>>>>>>> + check_in_port_sec(); next;</code>. Second rule with >>>>>>>>> priority 80 matches >>>>>>>>> + the rest of the traffic from that port and sets >>>>>>>>> + <code>REGBIT_PORT_SEC_DROP" = 1; next;"</code> so that >>>>>>>>> the packets are >>>>>>>>> + dropped in the next stage. >>>>>>>>> + </li> >>>>>>>>> + >>>>>>>>> <li> >>>>>>>>> For each (enabled) vtep logical port, a priority 70 >>>>>>>>> flow is added which >>>>>>>>> matches on all packets and applies the action >>>>>>>>> @@ -1949,6 +1964,14 @@ output; >>>>>>>>> on the logical switch. >>>>>>>>> </li> >>>>>>>>> >>>>>>>>> + <li> >>>>>>>>> + For any logical port that's defined as a target of BGP >>>>>>>>> redirecting (via >>>>>>>>> + <code>bgp-redirect</code> option set on Logical Router >>>>>>>>> Port), a >>>>>>>>> + priority-100 flow is added that redirects traffic for >>>>>>>>> Logical Router >>>>>>>>> + Port IPs (including IPv6 LLA) and TCP port 179, to the >>>>>>>>> targeted >>>>>>>>> + logical port. >>>>>>>>> + </li> >>>>>>>>> + >>>>>>>>> <li> >>>>>>>>> Priority-90 flows for transit switches that forward >>>>>>>>> registered >>>>>>>>> IP multicast traffic to their corresponding multicast >>>>>>>>> group , which >>>>>>>>> diff --git a/ovn-nb.xml b/ovn-nb.xml >>>>>>>>> index 9552534f6..2d2f5c9e0 100644 >>>>>>>>> --- a/ovn-nb.xml >>>>>>>>> +++ b/ovn-nb.xml >>>>>>>>> @@ -3440,6 +3440,20 @@ or >>>>>>>>> </p> >>>>>>>>> </column> >>>>>>>>> >>>>>>>>> + <column name="options" key="bgp-redirect" type='{"type": >>>>>>>>> "string"}'> >>>>>>>>> + <p> >>>>>>>>> + This option expects a name of a Logical Switch Port >>>>>>>>> that's present >>>>>>>>> + in the peer's Logical Switch. If set, it causes any >>>>>>>>> traffic >>>>>>>>> + that's destined for Logical Router Port's IP >>>>>>>>> addresses (including >>>>>>>>> + its IPv6 LLA) and the default BGP port (TCP 179), to >>>>>>>>> be redirected >>>>>>>>> + to the specified Logical Switch Port. >>>>>>>>> + >>>>>>>>> + This allows external BGP daemon to be bound to a >>>>>>>>> port in OVN's >>>>>>>>> + Logical Switch and act as if it was listening on >>>>>>>>> Logical Router >>>>>>>>> + Port's IP address. >>>>>>>>> + </p> >>>>>>>>> + </column> >>>>>>>>> + >>>>>>>>> <column name="options" key="gateway_mtu_bypass"> >>>>>>>>> <p> >>>>>>>>> When configured, represents a match expression, in >>>>>>>>> the same >>>>>>>>> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at >>>>>>>>> index a389d1988..df1e924ab 100644 >>>>>>>>> --- a/tests/ovn-northd.at >>>>>>>>> +++ b/tests/ovn-northd.at >>>>>>>>> @@ -12721,3 +12721,61 @@ AT_CHECK([ovn-sbctl dump-flows lr | >>>>>>>>> grep lr_in_dnat | ovn_strip_lflows], [0], [d >>>>>>>>> >>>>>>>>> AT_CLEANUP >>>>>>>>> ]) >>>>>>>>> + >>>>>>>>> +OVN_FOR_EACH_NORTHD_NO_HV([ >>>>>>>>> +AT_SETUP([BGP control plane redirect]) >>>>>>>>> +ovn_start >>>>>>>>> + >>>>>>>>> +check ovn-sbctl chassis-add hv1 geneve 127.0.0.1 >>>>>>>>> + >>>>>>>>> +check ovn-nbctl lr-add lr -- \ >>>>>>>>> + lrp-add lr lr-ls 02:ac:10:01:00:01 172.16.1.1/24 >>>>>>>>> +check ovn-nbctl --wait=sb set logical_router lr >>>>>>>>> options:chassis=hv1 >>>>>>>>> + >>>>>>>>> +check ovn-nbctl ls-add ls -- \ >>>>>>>>> + lsp-add ls ls-lr -- \ >>>>>>>>> + lsp-set-type ls-lr router -- \ >>>>>>>>> + lsp-set-addresses ls-lr router -- \ >>>>>>>>> + lsp-set-options ls-lr router-port=lr-ls >>>>>>>>> + >>>>>>>>> +check ovn-nbctl lsp-add ls lsp-bgp -- \ >>>>>>>>> + lsp-set-addresses lsp-bgp unknown >>>>>>>>> + >>>>>>>>> +# Function that ensures that no BGP redirect ruels are >>>>>>>>> installed. >>>>>>>>> +check_no_bgp_redirect() { >>>>>>>>> + AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_l2_lkup | >>>>>>>>> grep "tcp.dst == 179" | wc -l], [0], [0 >>>>>>>>> +]) >>>>>>>>> + >>>>>>>>> + AT_CHECK([ovn-sbctl dump-flows ls | grep >>>>>>>>> ls_in_check_port_sec | grep -E "priority=80|priority=81" | wc - >>>>>>>>> l], [0], [0 >>>>>>>>> +]) >>>>>>>>> +} >>>>>>>>> + >>>>>>>>> +# By default, no rules related to BGP redirect are present >>>>>>>>> +check_no_bgp_redirect >>>>>>>>> + >>>>>>>>> +# Set "lsp-bgp" port as target of BGP control plane redirected >>>>>>>>> traffic >>>>>>>>> +check ovn-nbctl --wait=sb set logical_router_port lr-ls >>>>>>>>> options:bgp-redirect=lsp-bgp >>>>>>>>> + >>>>>>>>> +# Check that BGP control plane traffic is redirected to "lsp- >>>>>>>>> bgp" >>>>>>>>> +AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_l2_lkup | grep >>>>>>>>> "tcp.dst == 179" | ovn_strip_lflows], [0], [dnl >>>>>>>>> + table=??(ls_in_l2_lkup ), priority=100 , >>>>>>>>> match=(ip4.dst == 172.16.1.1 && tcp.dst == 179), >>>>>>>>> action=(outport = "lsp-bgp"; output;) >>>>>>>>> + table=??(ls_in_l2_lkup ), priority=100 , >>>>>>>>> match=(ip6.dst == fe80::ac:10ff:fe01:1 && tcp.dst == 179), >>>>>>>>> action=(outport = "lsp-bgp"; output;) >>>>>>>>> +]) >>>>>>>>> + >>>>>>>>> +# Check that only BGP-related traffic is accepted on "lsp-bgp" >>>>>>>>> port >>>>>>>>> +AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_check_port_sec >>>>>>>>> | grep -E "priority=80|priority=81" | ovn_strip_lflows], [0], >>>>>>>>> [dnl >>>>>>>>> + table=??(ls_in_check_port_sec), priority=80 , >>>>>>>>> match=(inport == "lsp-bgp"), action=(reg0[[15]] = 1; next;) >>>>>>>>> + table=??(ls_in_check_port_sec), priority=81 , >>>>>>>>> match=(inport == "lsp-bgp" && ip4.src == 172.16.1.1 && tcp.src >>>>>>>>> == 179), action=(reg0[[15]] = check_in_port_sec(); next;) >>>>>>>>> + table=??(ls_in_check_port_sec), priority=81 , >>>>>>>>> match=(inport == "lsp-bgp" && ip6.src == fe80::ac:10ff:fe01:1 >>>>>>>>> && tcp.src == 179), action=(reg0[[15]] = check_in_port_sec(); >>>>>>>>> next;) >>>>>>>>> +]) >>>>>>>>> + >>>>>>>>> +# Remove 'bgp-redirect' option from LRP and check that rules >>>>>>>>> are removed >>>>>>>>> +check ovn-nbctl --wait=sb remove logical_router_port lr-ls >>>>>>>>> options bgp-redirect >>>>>>>>> +check_no_bgp_redirect >>>>>>>>> + >>>>>>>>> +# Set non-existent LSP as target of 'bgp-redirect' and check >>>>>>>>> that no rules are added >>>>>>>>> +check ovn-nbctl --wait=sb set logical_router_port lr-ls >>>>>>>>> options:bgp-redirect=lsp-foo >>>>>>>>> +check_no_bgp_redirect >>>>>>>>> + >>>>>>>>> +AT_CLEANUP >>>>>>>>> +]) >>>>>>>>> diff --git a/tests/system-ovn.at b/tests/system-ovn.at >>>>>>>>> index c24ede7c5..c09c1136e 100644 >>>>>>>>> --- a/tests/system-ovn.at >>>>>>>>> +++ b/tests/system-ovn.at >>>>>>>>> @@ -13027,3 +13027,89 @@ OVS_TRAFFIC_VSWITCHD_STOP(["/failed to >>>>>>>>> query port patch-.*/d >>>>>>>>> /connection dropped.*/d"]) >>>>>>>>> AT_CLEANUP >>>>>>>>> ]) >>>>>>>>> + >>>>>>>>> +OVN_FOR_EACH_NORTHD([ >>>>>>>>> +AT_SETUP([BGP control plane redirect]) >>>>>>>> Nit: probably best to add AT_SKIP_IF([test $HAVE_NC = no]) here. >>>>> ack >>>>> >>>>> >>>>>> >>>>>>> >>>>>>>> >>>>>>>>> +ovn_start >>>>>>>>> +OVS_TRAFFIC_VSWITCHD_START() >>>>>>>>> + >>>>>>>>> +ADD_BR([br-int]) >>>>>>>>> +ADD_BR([br-ext]) >>>>>>>>> + >>>>>>>>> +ovs-ofctl add-flow br-ext action=normal >>>>>>>>> +# Set external-ids in br-int needed for ovn-controller >>>>>>>>> +ovs-vsctl \ >>>>>>>>> + -- set Open_vSwitch . external-ids:system-id=hv1 \ >>>>>>>>> + -- set Open_vSwitch . external-ids:ovn- >>>>>>>>> remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \ >>>>>>>>> + -- set Open_vSwitch . external-ids:ovn-encap- >>>>>>>>> type=geneve \ >>>>>>>>> + -- set Open_vSwitch . external-ids:ovn-encap- >>>>>>>>> ip=169.0.0.1 \ >>>>>>>>> + -- set bridge br-int fail-mode=secure other- >>>>>>>>> config:disable-in-band=true >>>>>>>> Most ov*-ctl commands should be prefixed with "check" in this >>>>>>>> test. >>>>> ack >>>>> >>>>> >>>>>> >>>>>>> >>>>>>>> >>>>>>>>> + >>>>>>>>> +# Start ovn-controller >>>>>>>>> +start_daemon ovn-controller >>>>>>>>> + >>>>>>>>> +ovn-nbctl lr-add R1 \ >>>>>>>>> + -- set Logical_Router R1 options:chassis=hv1 >>>>>>>>> + >>>>>>>>> +ovn-nbctl ls-add public >>>>>>>>> + >>>>>>>>> +ovn-nbctl lrp-add R1 rp-public 00:00:02:01:02:03 172.16.1.1/24 >>>>>>>>> + >>>>>>>>> +ovn-nbctl lsp-add public public-rp -- set Logical_Switch_Port >>>>>>>>> public-rp \ >>>>>>>>> + type=router options:router-port=rp-public \ >>>>>>>>> + -- lsp-set-addresses public-rp router >>>>>>>>> + >>>>>>>>> +ovn-nbctl lsp-add public bgp-daemon \ >>>>>>>>> + -- lsp-set-addresses bgp-daemon unknown >>>>>>>>> + >>>>>>>>> +AT_CHECK([ovs-vsctl set Open_vSwitch . external-ids:ovn- >>>>>>>>> bridge-mappings=phynet:br-ext]) >>>>>>>>> +ovn-nbctl lsp-add public public1 \ >>>>>>>>> + -- lsp-set-addresses public1 unknown \ >>>>>>>>> + -- lsp-set-type public1 localnet \ >>>>>>>>> + -- lsp-set-options public1 network_name=phynet >>>>>>>>> + >>>>>>>>> +ovn-nbctl --wait=hv sync >>>>>>>>> + >>>>>>>>> +# Set option that redirects BGP control plane traffic to a LSP >>>>>>>>> "bgp-daemon" >>>>>>>>> +ovn-nbctl --wait=sb set logical_router_port rp-public >>>>>>>>> options:bgp-redirect=bgp-daemon >>>>>>>>> + >>>>>>>>> +# Create "bgp-daemon" interface in a namespace with IP and MAC >>>>>>>>> matching LRP "rp-public" >>>>>>>>> +ADD_NAMESPACES(bgp-daemon) >>>>>>>>> +ADD_VETH(bgp-daemon, bgp-daemon, br-int, "172.16.1.1/24", >>>>>>>>> "00:00:02:01:02:03") >>>>>>>>> + >>>>>>>>> +ADD_NAMESPACES(ext-foo) >>>>>>>>> +ADD_VETH(ext-foo, ext-foo, br-ext, "172.16.1.100/24", >>>>>>>>> "00:10:10:01:02:13", \ >>>>>>>>> + "172.16.1.1") >>>>>>>>> + >>>>>>>>> +# Flip the interface down/up to get proper IPv6 LLA >>>>>>>>> +NS_EXEC([bgp-daemon], [ip link set down bgp-daemon]) >>>>>>>>> +NS_EXEC([bgp-daemon], [ip link set up bgp-daemon]) >>>>>>>>> + >>>>>>>>> +# Wait until IPv6 LLA loses the "tentative" flag otherwise it >>>>>>>>> can't be bound to. >>>>>>>>> +OVS_WAIT_UNTIL([NS_EXEC([bgp-daemon], [ip a show dev bgp- >>>>>>>>> daemon | grep "fe80::" | grep -v tentative])]) >>>>>>>>> +OVS_WAIT_UNTIL([NS_EXEC([ext-foo], [ip a show dev ext-foo | >>>>>>>>> grep "fe80::" | grep -v tentative])]) >>>>>>>>> + >>>>>>>>> +# Verify that BGP control plane traffic is delivered to the >>>>>>>>> "bgp-daemon" >>>>>>>>> +# interface on both IPv4 and IPv6 LLA addresses >>>>>>>>> +NETNS_DAEMONIZE([bgp-daemon], [nc -l -k 172.16.1.1 179], >>>>>>>>> [bgp_v4.pid]) >>>>>>>>> +NS_CHECK_EXEC([ext-foo], [echo "TCP test" | nc --send-only >>>>>>>>> 172.16.1.1 179]) >>>>>>>>> + >>>>>>>>> +NETNS_DAEMONIZE([bgp-daemon], [nc -l -6 -k >>>>>>>>> fe80::200:2ff:fe01:203%bgp-daemon 179], [bgp_v6.pid]) >>>>>>>>> +NS_CHECK_EXEC([ext-foo], [echo "TCP test" | nc --send-only -6 >>>>>>>>> fe80::200:2ff:fe01:203%ext-foo 179]) >>>>>>>>> + >>>>>>>>> +OVS_APP_EXIT_AND_WAIT([ovn-controller]) >>>>>>>>> + >>>>>>>>> +as ovn-sb >>>>>>>>> +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >>>>>>>>> + >>>>>>>>> +as ovn-nb >>>>>>>>> +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >>>>>>>>> + >>>>>>>>> +as northd >>>>>>>>> +OVS_APP_EXIT_AND_WAIT([ovn-northd]) >>>>>>>>> + >>>>>>>>> +as >>>>>>>>> +OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d >>>>>>>>> +/.*terminating with signal 15.*/d"]) >>>>>>>>> +AT_CLEANUP >>>>>>>>> +]) >>>>>>>> Regards, >>>>>>>> Dumitru >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> dev mailing list >>>>>>>> dev@openvswitch.org >>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>>>> Thanks, >>>>> Martin. >>>>> >> >>
On 08.08.2024 03:16, Dumitru Ceara wrote: > Re-adding the dev list. > > On 8/7/24 18:12, Vladislav Odintsov wrote: >> Hi Dumitru, >> > Hi Vladislav, > >> I'd like to add some thoughts to your inputs. >> > Thanks for that, I added some more of my own below. :) > >> On 06.08.2024 19:23, Dumitru Ceara wrote: >>> Hi Martin, >>> >>> Sorry, I was reviewing V3 but I was slow in actually sending out the email. >>> >>> On 8/6/24 13:17, Martin Kalcok wrote: >>>> This change adds a 'bgp-redirect' option to LRP that allows >>>> redirecting of BGP control plane traffic to an arbitrary LSP >>>> in its peer LS. >>>> >>>> The option expects a string with a LSP name. When set, >>>> any traffic entering LS that's destined for any of the >>>> LRP's IP addresses (including IPv6 LLA) is redirected >>>> to the LSP specified in the option's value. >>>> >>>> This enables external BGP daemons to listen on an interface >>>> bound to a LSP and effectively act as if they were listening >>>> on (and speaking from) LRP's IP address. >>>> >>>> Signed-off-by: Martin Kalcok<martin.kalcok@canonical.com> >>>> --- >>> Strictly on this version of the patch, and with the thought in mind that >>> we might want to consider this feature experimental [0] and maybe change >>> it (NB too) in the future, I left a few comments inline. With those >>> addressed I think the patch looks OK to me. >>> >>> [0] >>> https://patchwork.ozlabs.org/project/ovn/patch/20240802155449.3528777-1-mmichels@redhat.com/ >>> >>> Now, for the point when we'd remove the "experimental" tag: >>> >>> In general, it makes me a bit uneasy that we have to share the MAC and >>> IP between the LRP and the VIF of logical switch port that's connected >>> to the same switch as the LRP. >>> >>> I was thinking of alternatives for the future and the only things I >>> could come up with until now are: >>> >>> 1. Create a separate, "disconnected" logical switch: >>> ovn-nbctl ls-add bgp >>> >>> Add the bgp-daemon port to it: >>> ovn-nbctl lsp-add bgp bgp-daemon. >>> >>> Then we don't need "unknown" either, I think. >>> >>> But it's not possible today to move packets from the ingress pipeline of >>> a logical switch ("public" in this test) to the egress pipeline of a >>> different logical switch ("bgp" in my example). It also feels icky to >>> implement that. >>> >>> 2. Add the ability to bind an OVS port directly to a logical router port: >>> >>> then we could do the same type of redirect you do here but instead in >>> the logical router pipeline. The advantage is we don't have to drop any >>> non-bgp traffic in the switch pipeline. The switch keeps functioning as >>> it does today. >>> >>> Maybe an advantage of this second alternative would be that we can >>> easily attach a filtering option to this LRP (e.g., >>> LRP.options:control-traffic-filter) to allow us to be more flexible in >>> what kind of traffic we forward to the actuall routing protocol daemon >>> that runs behind that OVS port - Vladislav also mentioned during the >>> meeting it might be interesting to forward BFD packets to the FRR (or >>> whatever application implements the routing protocol) instance too. >> The idea to be able to bind LRP to OVS port sounds very interesting to >> me. But with a note that it's not a pure "bind", but a partial: as you >> wrote with some "filter" applied to pass not all the traffic. The main >> idea here is to pass only control plane traffic destined to LRP's > As we're discussing on the other thread (Martin pointed it out) we also > probably need to involve conntrack and allow replies to any kind of > traffic initiated by the entity behind the LRP's VIF (e.g., BGP sessions > initiated from the host). > >> addresses. Other than that seems odd. Transit traffic should be remained >> in LR pipeline otherwise it will have no difference with a regular VIF LSP. >> > Definitely, traffic that needs to be DNATed (DNAT/unSNAT rules or LB > rules that will change the destination IP from LRP IP to something else) > should not be affected. All other "transit" traffic doesn't have LRP > IP, does it? You're right. @Dumitru, @Martin, what if we just redirect all traffic destined to LRP IPs to redirect port? Is there any drawbacks? For security it is possible to optionally use ACLs with or without conntrack. It's up to user/CMS. This seems quite simple in the code and very flexible. So no additional flows seems to be needed in future to support any other routing protocols (or for another possible non-routing usecases). > >> Having a filter (or match like in ACL or Logical_Router_Policies) looks >> more flexible in terms of used protocols. This can coexist with proposal >> from current patch, where the flow is not programmable from user. >> > I think so too. > > Regards, > Dumitru > >>> But again, if we consider the current feature "experimental", we can >>> spend more time during the next release cycle to figure out what's best. >>> >>>> v4 of this patch renames the feature from "bgp-mirror" to "bgp-redirect" >>>> as discussed during community meeting. >>>> >>>> northd/northd.c | 108 ++++++++++++++++++++++++++++++++++++++++ >>>> northd/ovn-northd.8.xml | 23 +++++++++ >>>> ovn-nb.xml | 14 ++++++ >>>> tests/ovn-northd.at | 58 +++++++++++++++++++++ >>>> tests/system-ovn.at | 86 ++++++++++++++++++++++++++++++++ >>>> 5 files changed, 289 insertions(+) >>>> >>>> diff --git a/northd/northd.c b/northd/northd.c >>>> index 4353df07d..088104f25 100644 >>>> --- a/northd/northd.c >>>> +++ b/northd/northd.c >>>> @@ -13048,6 +13048,113 @@ build_arp_resolve_flows_for_lrp(struct ovn_port *op, >>>> } >>>> } >>>> >>>> +static void >>>> +build_bgp_redirect_rule__( >>>> + const char *s_addr, const char *bgp_port_name, bool is_ipv6, >>>> + struct ovn_port *ls_peer, struct lflow_table *lflows, >>>> + struct ds *match, struct ds *actions) >>>> +{ >>>> + int ip_ver = is_ipv6 ? 6 : 4; >>>> + /* Redirect packets in the input pipeline destined for LR's IP to >>>> + * the port specified in 'bgp-redirect' option. >>>> + */ >>>> + ds_clear(match); >>>> + ds_clear(actions); >>>> + ds_put_format(match, "ip%d.dst == %s && tcp.dst == 179", ip_ver, s_addr); >>>> + ds_put_format(actions, "outport = \"%s\"; output;", bgp_port_name); >>>> + ovn_lflow_add(lflows, ls_peer->od, S_SWITCH_IN_L2_LKUP, 100, >>>> + ds_cstr(match), >>>> + ds_cstr(actions), >>>> + ls_peer->lflow_ref); >>>> + >>>> + >>>> + /* Drop any traffic originating from 'bgp-redirect' port that does >>>> + * not originate from BGP daemon port. This blocks unnecessary >>>> + * traffic like ARP broadcasts or IPv6 router solicitation packets >>>> + * from the dummy 'bgp-redirect' port. >>>> + */ >>>> + ds_clear(match); >>>> + ds_put_format(match, "inport == \"%s\"", bgp_port_name); >>>> + ovn_lflow_add(lflows, ls_peer->od, S_SWITCH_IN_CHECK_PORT_SEC, 80, >>>> + ds_cstr(match), >>>> + REGBIT_PORT_SEC_DROP " = 1; next;", >>>> + ls_peer->lflow_ref); >>> Not a huge deal but this logical flow gets regenerated multiple times >>> for each bgp_port_name (for each LRP address). In the end we'll only >>> install one in the database so there's no impact on SB. But maybe we >>> should consider moving this part out of this function. >>> >>>> + >>>> + ds_put_format(match, >>>> + " && ip%d.src == %s && tcp.src == 179", >>>> + ip_ver, >>>> + s_addr); >>>> + ovn_lflow_add(lflows, ls_peer->od, S_SWITCH_IN_CHECK_PORT_SEC, 81, >>>> + ds_cstr(match), >>>> + REGBIT_PORT_SEC_DROP " = check_in_port_sec(); next;", >>>> + ls_peer->lflow_ref); >>> In your system test we set 'addresses = "unknown"' for 'bgp_port_name'. >>> And I guess we kind of have to do it like that all the time when using >>> this feature. I'm trying to think of scenarios in which users set >>> LSP.addresses = 'unknown' and LSP.port-security = '<some-addresses>'. >>> I'm not sure I saw those until now but maybe I'm wrong. >> I'm also interested whether this LSP port with unknown addresses will >> get all the broadcast traffic? >> >> @Martin, didn't you look for ARP behavior? If we have one LS with three >> LSPs: >> - 1 router type LSP with MAC 00:00:00:00:00:01 and IP 169.254.0.1/30 >> - one "vif" LSP with "unknown" addresses set and configured underlying >> OVS port with same MAC and IP set >> - one localnet port for external connectivity. >> >> If some party from localnet side sends broadcast ARP request "Who has >> 169.254.0.1? Answer <...>", will this ARP request got flooded to both >> LRP and LSP with "unknown" addresses? If yes, will the sender receive 2 >> ARP replies? >> >>>> +} >>>> + >>>> +static void >>>> +build_lrouter_bgp_redirect( >>>> + struct ovn_port *op, struct lflow_table *lflows, >>>> + struct ds *match, struct ds *actions) >>>> +{ >>>> + /* LRP has to have a peer.*/ >>>> + if (op->peer == NULL) { >>>> + return; >>>> + } >>>> + /* LRP has to have NB record.*/ >>>> + if (op->nbrp == NULL) { >>>> + return; >>>> + } >>>> + >>>> + /* Proceed only for LRPs that have 'bgp-redirect' option set. Value of this >>>> + * option is the name of LSP to which a BGP traffic will be redirected. >>>> + */ >>>> + const char *bgp_port = smap_get(&op->nbrp->options, "bgp-redirect"); >>>> + if (bgp_port == NULL) { >>>> + return; >>>> + } >>>> + >>>> + /* Ensure that LSP, to which the BGP traffic is redirected, exists.*/ >>>> + struct ovn_port *peer_lsp; >>>> + bool bgp_port_exists = false; >>>> + HMAP_FOR_EACH (peer_lsp, dp_node, &op->peer->od->ports) { >>>> + size_t peer_lsp_s = strlen(peer_lsp->key); >>>> + if (peer_lsp_s == strlen(bgp_port) >>>> + && !strncmp(peer_lsp->key, bgp_port, peer_lsp_s)){ >>>> + bgp_port_exists = true; >>>> + break; >>>> + } >>>> + } >>>> + >>>> + if (!bgp_port_exists) { >>>> + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5); >>>> + VLOG_WARN_RL(&rl, "Option 'bgp-redirect' set on Logical Router Port " >>>> + "'%s' refers to non-existent Logical Switch Port. " >>>> + "BGP redirecting won't be configured.", >>>> + op->key); >>>> + return; >>>> + } >>>> + >>>> + if (op->cr_port != NULL) { >>>> + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5); >>>> + VLOG_WARN_RL(&rl, "Option 'bgp-redirect' is not supported on" >>>> + " Distributed Gateway Port '%s'", op->key); >>>> + return; >>>> + } >>>> + >>>> + /* Mirror traffic destined for LRP's IPs and default BGP port >>>> + * to the port defined in 'bgp-redirect' option. >>>> + */ >>>> + for (size_t i = 0; i < op->lrp_networks.n_ipv4_addrs; i++) { >>>> + const char *ip_s = op->lrp_networks.ipv4_addrs[i].addr_s; >>>> + build_bgp_redirect_rule__(ip_s, bgp_port, false, op->peer, lflows, >>>> + match, actions); >>>> + } >>>> + for (size_t i = 0; i < op->lrp_networks.n_ipv6_addrs; i++) { >>>> + const char *ip_s = op->lrp_networks.ipv6_addrs[i].addr_s; >>>> + build_bgp_redirect_rule__(ip_s, bgp_port, true, op->peer, lflows, >>>> + match, actions); >>>> + } >>>> +} >>>> + >>>> /* This function adds ARP resolve flows related to a LSP. */ >>>> static void >>>> build_arp_resolve_flows_for_lsp( >>>> @@ -16003,6 +16110,7 @@ build_lswitch_and_lrouter_iterate_by_lrp(struct ovn_port *op, >>>> lsi->meter_groups, op->lflow_ref); >>>> build_lrouter_icmp_packet_toobig_admin_flows(op, lsi->lflows, &lsi->match, >>>> &lsi->actions, op->lflow_ref); >>>> + build_lrouter_bgp_redirect(op, lsi->lflows, &lsi->match, &lsi->actions); >>>> } >>>> >>>> static void * >>>> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml >>>> index b06b09ac5..4cbdf8f74 100644 >>>> --- a/northd/ovn-northd.8.xml >>>> +++ b/northd/ovn-northd.8.xml >>>> @@ -284,6 +284,21 @@ >>>> dropped in the next stage. >>>> </li> >>>> >>>> + <li> >>>> + For each logical port that's defined as a target of BGP redirecting >>>> + (via <code>bgp-redirect</code> option set on Logical Router Port), a >>>> + filter is set in place that disallows any traffic entering this port >>>> + that does not originate from Logical Router Port's IPs and default BGP >>>> + port (TCP 179). This filtering is achieved by two rules. First rule has >>>> + priority 81, it matches on <code>inport == <var>BGP_MIRROR_PORT</var> >>>> + && ip.src == <var>LRP_IP</var> && tcp.src == 179</code> >>>> + and allows traffic further with <code>REGBIT_PORT_SEC_DROP" = >>>> + check_in_port_sec(); next;</code>. Second rule with priority 80 matches >>>> + the rest of the traffic from that port and sets >>>> + <code>REGBIT_PORT_SEC_DROP" = 1; next;"</code> so that the packets are >>>> + dropped in the next stage. >>>> + </li> >>>> + >>>> <li> >>>> For each (enabled) vtep logical port, a priority 70 flow is added which >>>> matches on all packets and applies the action >>>> @@ -1949,6 +1964,14 @@ output; >>>> on the logical switch. >>>> </li> >>>> >>>> + <li> >>>> + For any logical port that's defined as a target of BGP redirecting (via >>>> + <code>bgp-redirect</code> option set on Logical Router Port), a >>>> + priority-100 flow is added that redirects traffic for Logical Router >>>> + Port IPs (including IPv6 LLA) and TCP port 179, to the targeted >>>> + logical port. >>>> + </li> >>>> + >>>> <li> >>>> Priority-90 flows for transit switches that forward registered >>>> IP multicast traffic to their corresponding multicast group , which >>>> diff --git a/ovn-nb.xml b/ovn-nb.xml >>>> index 9552534f6..2d2f5c9e0 100644 >>>> --- a/ovn-nb.xml >>>> +++ b/ovn-nb.xml >>>> @@ -3440,6 +3440,20 @@ or >>>> </p> >>>> </column> >>>> >>>> + <column name="options" key="bgp-redirect" type='{"type": "string"}'> >>>> + <p> >>>> + This option expects a name of a Logical Switch Port that's present >>>> + in the peer's Logical Switch. If set, it causes any traffic >>>> + that's destined for Logical Router Port's IP addresses (including >>>> + its IPv6 LLA) and the default BGP port (TCP 179), to be redirected >>>> + to the specified Logical Switch Port. >>>> + >>>> + This allows external BGP daemon to be bound to a port in OVN's >>>> + Logical Switch and act as if it was listening on Logical Router >>>> + Port's IP address. >>>> + </p> >>>> + </column> >>>> + >>>> <column name="options" key="gateway_mtu_bypass"> >>>> <p> >>>> When configured, represents a match expression, in the same >>>> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at >>>> index a389d1988..df1e924ab 100644 >>>> --- a/tests/ovn-northd.at >>>> +++ b/tests/ovn-northd.at >>>> @@ -12721,3 +12721,61 @@ AT_CHECK([ovn-sbctl dump-flows lr | grep lr_in_dnat | ovn_strip_lflows], [0], [d >>>> >>>> AT_CLEANUP >>>> ]) >>>> + >>>> +OVN_FOR_EACH_NORTHD_NO_HV([ >>>> +AT_SETUP([BGP control plane redirect]) >>>> +ovn_start >>>> + >>>> +check ovn-sbctl chassis-add hv1 geneve 127.0.0.1 >>>> + >>>> +check ovn-nbctl lr-add lr -- \ >>>> + lrp-add lr lr-ls 02:ac:10:01:00:01 172.16.1.1/24 >>>> +check ovn-nbctl --wait=sb set logical_router lr options:chassis=hv1 >>>> + >>>> +check ovn-nbctl ls-add ls -- \ >>>> + lsp-add ls ls-lr -- \ >>>> + lsp-set-type ls-lr router -- \ >>>> + lsp-set-addresses ls-lr router -- \ >>>> + lsp-set-options ls-lr router-port=lr-ls >>>> + >>>> +check ovn-nbctl lsp-add ls lsp-bgp -- \ >>>> + lsp-set-addresses lsp-bgp unknown >>>> + >>>> +# Function that ensures that no BGP redirect ruels are installed. >>>> +check_no_bgp_redirect() { >>>> + AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_l2_lkup | grep "tcp.dst == 179" | wc -l], [0], [0 >>>> +]) >>>> + >>>> + AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_check_port_sec | grep -E "priority=80|priority=81" | wc -l], [0], [0 >>>> +]) >>>> +} >>>> + >>>> +# By default, no rules related to BGP redirect are present >>>> +check_no_bgp_redirect >>>> + >>>> +# Set "lsp-bgp" port as target of BGP control plane redirected traffic >>>> +check ovn-nbctl --wait=sb set logical_router_port lr-ls options:bgp-redirect=lsp-bgp >>>> + >>>> +# Check that BGP control plane traffic is redirected to "lsp-bgp" >>>> +AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_l2_lkup | grep "tcp.dst == 179" | ovn_strip_lflows], [0], [dnl >>>> + table=??(ls_in_l2_lkup ), priority=100 , match=(ip4.dst == 172.16.1.1 && tcp.dst == 179), action=(outport = "lsp-bgp"; output;) >>>> + table=??(ls_in_l2_lkup ), priority=100 , match=(ip6.dst == fe80::ac:10ff:fe01:1 && tcp.dst == 179), action=(outport = "lsp-bgp"; output;) >>>> +]) >>>> + >>>> +# Check that only BGP-related traffic is accepted on "lsp-bgp" port >>>> +AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_check_port_sec | grep -E "priority=80|priority=81" | ovn_strip_lflows], [0], [dnl >>>> + table=??(ls_in_check_port_sec), priority=80 , match=(inport == "lsp-bgp"), action=(reg0[[15]] = 1; next;) >>>> + table=??(ls_in_check_port_sec), priority=81 , match=(inport == "lsp-bgp" && ip4.src == 172.16.1.1 && tcp.src == 179), action=(reg0[[15]] = check_in_port_sec(); next;) >>>> + table=??(ls_in_check_port_sec), priority=81 , match=(inport == "lsp-bgp" && ip6.src == fe80::ac:10ff:fe01:1 && tcp.src == 179), action=(reg0[[15]] = check_in_port_sec(); next;) >>>> +]) >>>> + >>>> +# Remove 'bgp-redirect' option from LRP and check that rules are removed >>>> +check ovn-nbctl --wait=sb remove logical_router_port lr-ls options bgp-redirect >>>> +check_no_bgp_redirect >>>> + >>>> +# Set non-existent LSP as target of 'bgp-redirect' and check that no rules are added >>>> +check ovn-nbctl --wait=sb set logical_router_port lr-ls options:bgp-redirect=lsp-foo >>>> +check_no_bgp_redirect >>>> + >>>> +AT_CLEANUP >>>> +]) >>>> diff --git a/tests/system-ovn.at b/tests/system-ovn.at >>>> index c24ede7c5..c09c1136e 100644 >>>> --- a/tests/system-ovn.at >>>> +++ b/tests/system-ovn.at >>>> @@ -13027,3 +13027,89 @@ OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d >>>> /connection dropped.*/d"]) >>>> AT_CLEANUP >>>> ]) >>>> + >>>> +OVN_FOR_EACH_NORTHD([ >>>> +AT_SETUP([BGP control plane redirect]) >>> Nit: probably best to add AT_SKIP_IF([test $HAVE_NC = no]) here. >>> >>>> +ovn_start >>>> +OVS_TRAFFIC_VSWITCHD_START() >>>> + >>>> +ADD_BR([br-int]) >>>> +ADD_BR([br-ext]) >>>> + >>>> +ovs-ofctl add-flow br-ext action=normal >>>> +# Set external-ids in br-int needed for ovn-controller >>>> +ovs-vsctl \ >>>> + -- set Open_vSwitch . external-ids:system-id=hv1 \ >>>> + -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \ >>>> + -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \ >>>> + -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \ >>>> + -- set bridge br-int fail-mode=secure other-config:disable-in-band=true >>> Most ov*-ctl commands should be prefixed with "check" in this test. >>> >>>> + >>>> +# Start ovn-controller >>>> +start_daemon ovn-controller >>>> + >>>> +ovn-nbctl lr-add R1 \ >>>> + -- set Logical_Router R1 options:chassis=hv1 >>>> + >>>> +ovn-nbctl ls-add public >>>> + >>>> +ovn-nbctl lrp-add R1 rp-public 00:00:02:01:02:03 172.16.1.1/24 >>>> + >>>> +ovn-nbctl lsp-add public public-rp -- set Logical_Switch_Port public-rp \ >>>> + type=router options:router-port=rp-public \ >>>> + -- lsp-set-addresses public-rp router >>>> + >>>> +ovn-nbctl lsp-add public bgp-daemon \ >>>> + -- lsp-set-addresses bgp-daemon unknown >>>> + >>>> +AT_CHECK([ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=phynet:br-ext]) >>>> +ovn-nbctl lsp-add public public1 \ >>>> + -- lsp-set-addresses public1 unknown \ >>>> + -- lsp-set-type public1 localnet \ >>>> + -- lsp-set-options public1 network_name=phynet >>>> + >>>> +ovn-nbctl --wait=hv sync >>>> + >>>> +# Set option that redirects BGP control plane traffic to a LSP "bgp-daemon" >>>> +ovn-nbctl --wait=sb set logical_router_port rp-public options:bgp-redirect=bgp-daemon >>>> + >>>> +# Create "bgp-daemon" interface in a namespace with IP and MAC matching LRP "rp-public" >>>> +ADD_NAMESPACES(bgp-daemon) >>>> +ADD_VETH(bgp-daemon, bgp-daemon, br-int, "172.16.1.1/24", "00:00:02:01:02:03") >>>> + >>>> +ADD_NAMESPACES(ext-foo) >>>> +ADD_VETH(ext-foo, ext-foo, br-ext, "172.16.1.100/24", "00:10:10:01:02:13", \ >>>> + "172.16.1.1") >>>> + >>>> +# Flip the interface down/up to get proper IPv6 LLA >>>> +NS_EXEC([bgp-daemon], [ip link set down bgp-daemon]) >>>> +NS_EXEC([bgp-daemon], [ip link set up bgp-daemon]) >>>> + >>>> +# Wait until IPv6 LLA loses the "tentative" flag otherwise it can't be bound to. >>>> +OVS_WAIT_UNTIL([NS_EXEC([bgp-daemon], [ip a show dev bgp-daemon | grep "fe80::" | grep -v tentative])]) >>>> +OVS_WAIT_UNTIL([NS_EXEC([ext-foo], [ip a show dev ext-foo | grep "fe80::" | grep -v tentative])]) >>>> + >>>> +# Verify that BGP control plane traffic is delivered to the "bgp-daemon" >>>> +# interface on both IPv4 and IPv6 LLA addresses >>>> +NETNS_DAEMONIZE([bgp-daemon], [nc -l -k 172.16.1.1 179], [bgp_v4.pid]) >>>> +NS_CHECK_EXEC([ext-foo], [echo "TCP test" | nc --send-only 172.16.1.1 179]) >>>> + >>>> +NETNS_DAEMONIZE([bgp-daemon], [nc -l -6 -k fe80::200:2ff:fe01:203%bgp-daemon 179], [bgp_v6.pid]) >>>> +NS_CHECK_EXEC([ext-foo], [echo "TCP test" | nc --send-only -6 fe80::200:2ff:fe01:203%ext-foo 179]) >>>> + >>>> +OVS_APP_EXIT_AND_WAIT([ovn-controller]) >>>> + >>>> +as ovn-sb >>>> +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >>>> + >>>> +as ovn-nb >>>> +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >>>> + >>>> +as northd >>>> +OVS_APP_EXIT_AND_WAIT([ovn-northd]) >>>> + >>>> +as >>>> +OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d >>>> +/.*terminating with signal 15.*/d"]) >>>> +AT_CLEANUP >>>> +]) >>> Regards, >>> Dumitru >>> > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
On 8/8/24 10:42, Vladislav Odintsov wrote: > > On 08.08.2024 03:16, Dumitru Ceara wrote: >> Re-adding the dev list. >> >> On 8/7/24 18:12, Vladislav Odintsov wrote: >>> Hi Dumitru, >>> >> Hi Vladislav, >> >>> I'd like to add some thoughts to your inputs. >>> >> Thanks for that, I added some more of my own below. :) >> >>> On 06.08.2024 19:23, Dumitru Ceara wrote: >>>> Hi Martin, >>>> >>>> Sorry, I was reviewing V3 but I was slow in actually sending out the email. >>>> >>>> On 8/6/24 13:17, Martin Kalcok wrote: >>>>> This change adds a 'bgp-redirect' option to LRP that allows >>>>> redirecting of BGP control plane traffic to an arbitrary LSP >>>>> in its peer LS. >>>>> >>>>> The option expects a string with a LSP name. When set, >>>>> any traffic entering LS that's destined for any of the >>>>> LRP's IP addresses (including IPv6 LLA) is redirected >>>>> to the LSP specified in the option's value. >>>>> >>>>> This enables external BGP daemons to listen on an interface >>>>> bound to a LSP and effectively act as if they were listening >>>>> on (and speaking from) LRP's IP address. >>>>> >>>>> Signed-off-by: Martin Kalcok <martin.kalcok@canonical.com> >>>>> --- >>>> Strictly on this version of the patch, and with the thought in mind that >>>> we might want to consider this feature experimental [0] and maybe change >>>> it (NB too) in the future, I left a few comments inline. With those >>>> addressed I think the patch looks OK to me. >>>> >>>> [0] >>>> https://patchwork.ozlabs.org/project/ovn/patch/20240802155449.3528777-1-mmichels@redhat.com/ >>>> >>>> Now, for the point when we'd remove the "experimental" tag: >>>> >>>> In general, it makes me a bit uneasy that we have to share the MAC and >>>> IP between the LRP and the VIF of logical switch port that's connected >>>> to the same switch as the LRP. >>>> >>>> I was thinking of alternatives for the future and the only things I >>>> could come up with until now are: >>>> >>>> 1. Create a separate, "disconnected" logical switch: >>>> ovn-nbctl ls-add bgp >>>> >>>> Add the bgp-daemon port to it: >>>> ovn-nbctl lsp-add bgp bgp-daemon. >>>> >>>> Then we don't need "unknown" either, I think. >>>> >>>> But it's not possible today to move packets from the ingress pipeline of >>>> a logical switch ("public" in this test) to the egress pipeline of a >>>> different logical switch ("bgp" in my example). It also feels icky to >>>> implement that. >>>> >>>> 2. Add the ability to bind an OVS port directly to a logical router port: >>>> >>>> then we could do the same type of redirect you do here but instead in >>>> the logical router pipeline. The advantage is we don't have to drop any >>>> non-bgp traffic in the switch pipeline. The switch keeps functioning as >>>> it does today. >>>> >>>> Maybe an advantage of this second alternative would be that we can >>>> easily attach a filtering option to this LRP (e.g., >>>> LRP.options:control-traffic-filter) to allow us to be more flexible in >>>> what kind of traffic we forward to the actuall routing protocol daemon >>>> that runs behind that OVS port - Vladislav also mentioned during the >>>> meeting it might be interesting to forward BFD packets to the FRR (or >>>> whatever application implements the routing protocol) instance too. >>> The idea to be able to bind LRP to OVS port sounds very interesting to >>> me. But with a note that it's not a pure "bind", but a partial: as you >>> wrote with some "filter" applied to pass not all the traffic. The main >>> idea here is to pass only control plane traffic destined to LRP's >> As we're discussing on the other thread (Martin pointed it out) we also >> probably need to involve conntrack and allow replies to any kind of >> traffic initiated by the entity behind the LRP's VIF (e.g., BGP sessions >> initiated from the host). >> >>> addresses. Other than that seems odd. Transit traffic should be remained >>> in LR pipeline otherwise it will have no difference with a regular VIF LSP. >>> >> Definitely, traffic that needs to be DNATed (DNAT/unSNAT rules or LB >> rules that will change the destination IP from LRP IP to something else) >> should not be affected. All other "transit" traffic doesn't have LRP >> IP, does it? > > You're right. > > @Dumitru, @Martin, what if we just redirect all traffic destined to LRP > IPs to redirect port? Is there any drawbacks? > For security it is possible to optionally use ACLs with or without > conntrack. It's up to user/CMS. > > This seems quite simple in the code and very flexible. So no additional > flows seems to be needed in future to support any other routing > protocols (or for another possible non-routing usecases). > Won't this break all "transit" traffic that has destination IP the LRP IP (DNAT, LB), etc? >>> Having a filter (or match like in ACL or Logical_Router_Policies) looks >>> more flexible in terms of used protocols. This can coexist with proposal >>> from current patch, where the flow is not programmable from user. >>> >> I think so too. >> Regards, >> Dumitru >> >>>> But again, if we consider the current feature "experimental", we can >>>> spend more time during the next release cycle to figure out what's best. >>>> >>>>> v4 of this patch renames the feature from "bgp-mirror" to "bgp-redirect" >>>>> as discussed during community meeting. >>>>> >>>>> northd/northd.c | 108 ++++++++++++++++++++++++++++++++++++++++ >>>>> northd/ovn-northd.8.xml | 23 +++++++++ >>>>> ovn-nb.xml | 14 ++++++ >>>>> tests/ovn-northd.at | 58 +++++++++++++++++++++ >>>>> tests/system-ovn.at | 86 ++++++++++++++++++++++++++++++++ >>>>> 5 files changed, 289 insertions(+) >>>>> >>>>> diff --git a/northd/northd.c b/northd/northd.c >>>>> index 4353df07d..088104f25 100644 >>>>> --- a/northd/northd.c >>>>> +++ b/northd/northd.c >>>>> @@ -13048,6 +13048,113 @@ build_arp_resolve_flows_for_lrp(struct ovn_port *op, >>>>> } >>>>> } >>>>> >>>>> +static void >>>>> +build_bgp_redirect_rule__( >>>>> + const char *s_addr, const char *bgp_port_name, bool is_ipv6, >>>>> + struct ovn_port *ls_peer, struct lflow_table *lflows, >>>>> + struct ds *match, struct ds *actions) >>>>> +{ >>>>> + int ip_ver = is_ipv6 ? 6 : 4; >>>>> + /* Redirect packets in the input pipeline destined for LR's IP to >>>>> + * the port specified in 'bgp-redirect' option. >>>>> + */ >>>>> + ds_clear(match); >>>>> + ds_clear(actions); >>>>> + ds_put_format(match, "ip%d.dst == %s && tcp.dst == 179", ip_ver, s_addr); >>>>> + ds_put_format(actions, "outport = \"%s\"; output;", bgp_port_name); >>>>> + ovn_lflow_add(lflows, ls_peer->od, S_SWITCH_IN_L2_LKUP, 100, >>>>> + ds_cstr(match), >>>>> + ds_cstr(actions), >>>>> + ls_peer->lflow_ref); >>>>> + >>>>> + >>>>> + /* Drop any traffic originating from 'bgp-redirect' port that does >>>>> + * not originate from BGP daemon port. This blocks unnecessary >>>>> + * traffic like ARP broadcasts or IPv6 router solicitation packets >>>>> + * from the dummy 'bgp-redirect' port. >>>>> + */ >>>>> + ds_clear(match); >>>>> + ds_put_format(match, "inport == \"%s\"", bgp_port_name); >>>>> + ovn_lflow_add(lflows, ls_peer->od, S_SWITCH_IN_CHECK_PORT_SEC, 80, >>>>> + ds_cstr(match), >>>>> + REGBIT_PORT_SEC_DROP " = 1; next;", >>>>> + ls_peer->lflow_ref); >>>> Not a huge deal but this logical flow gets regenerated multiple times >>>> for each bgp_port_name (for each LRP address). In the end we'll only >>>> install one in the database so there's no impact on SB. But maybe we >>>> should consider moving this part out of this function. >>>> >>>>> + >>>>> + ds_put_format(match, >>>>> + " && ip%d.src == %s && tcp.src == 179", >>>>> + ip_ver, >>>>> + s_addr); >>>>> + ovn_lflow_add(lflows, ls_peer->od, S_SWITCH_IN_CHECK_PORT_SEC, 81, >>>>> + ds_cstr(match), >>>>> + REGBIT_PORT_SEC_DROP " = check_in_port_sec(); next;", >>>>> + ls_peer->lflow_ref); >>>> In your system test we set 'addresses = "unknown"' for 'bgp_port_name'. >>>> And I guess we kind of have to do it like that all the time when using >>>> this feature. I'm trying to think of scenarios in which users set >>>> LSP.addresses = 'unknown' and LSP.port-security = '<some-addresses>'. >>>> I'm not sure I saw those until now but maybe I'm wrong. >>> I'm also interested whether this LSP port with unknown addresses will >>> get all the broadcast traffic? >>> >>> @Martin, didn't you look for ARP behavior? If we have one LS with three >>> LSPs: >>> - 1 router type LSP with MAC 00:00:00:00:00:01 and IP 169.254.0.1/30 >>> - one "vif" LSP with "unknown" addresses set and configured underlying >>> OVS port with same MAC and IP set >>> - one localnet port for external connectivity. >>> >>> If some party from localnet side sends broadcast ARP request "Who has >>> 169.254.0.1? Answer <...>", will this ARP request got flooded to both >>> LRP and LSP with "unknown" addresses? If yes, will the sender receive 2 >>> ARP replies? >>> >>>>> +} >>>>> + >>>>> +static void >>>>> +build_lrouter_bgp_redirect( >>>>> + struct ovn_port *op, struct lflow_table *lflows, >>>>> + struct ds *match, struct ds *actions) >>>>> +{ >>>>> + /* LRP has to have a peer.*/ >>>>> + if (op->peer == NULL) { >>>>> + return; >>>>> + } >>>>> + /* LRP has to have NB record.*/ >>>>> + if (op->nbrp == NULL) { >>>>> + return; >>>>> + } >>>>> + >>>>> + /* Proceed only for LRPs that have 'bgp-redirect' option set. Value of this >>>>> + * option is the name of LSP to which a BGP traffic will be redirected. >>>>> + */ >>>>> + const char *bgp_port = smap_get(&op->nbrp->options, "bgp-redirect"); >>>>> + if (bgp_port == NULL) { >>>>> + return; >>>>> + } >>>>> + >>>>> + /* Ensure that LSP, to which the BGP traffic is redirected, exists.*/ >>>>> + struct ovn_port *peer_lsp; >>>>> + bool bgp_port_exists = false; >>>>> + HMAP_FOR_EACH (peer_lsp, dp_node, &op->peer->od->ports) { >>>>> + size_t peer_lsp_s = strlen(peer_lsp->key); >>>>> + if (peer_lsp_s == strlen(bgp_port) >>>>> + && !strncmp(peer_lsp->key, bgp_port, peer_lsp_s)){ >>>>> + bgp_port_exists = true; >>>>> + break; >>>>> + } >>>>> + } >>>>> + >>>>> + if (!bgp_port_exists) { >>>>> + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5); >>>>> + VLOG_WARN_RL(&rl, "Option 'bgp-redirect' set on Logical Router Port " >>>>> + "'%s' refers to non-existent Logical Switch Port. " >>>>> + "BGP redirecting won't be configured.", >>>>> + op->key); >>>>> + return; >>>>> + } >>>>> + >>>>> + if (op->cr_port != NULL) { >>>>> + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5); >>>>> + VLOG_WARN_RL(&rl, "Option 'bgp-redirect' is not supported on" >>>>> + " Distributed Gateway Port '%s'", op->key); >>>>> + return; >>>>> + } >>>>> + >>>>> + /* Mirror traffic destined for LRP's IPs and default BGP port >>>>> + * to the port defined in 'bgp-redirect' option. >>>>> + */ >>>>> + for (size_t i = 0; i < op->lrp_networks.n_ipv4_addrs; i++) { >>>>> + const char *ip_s = op->lrp_networks.ipv4_addrs[i].addr_s; >>>>> + build_bgp_redirect_rule__(ip_s, bgp_port, false, op->peer, lflows, >>>>> + match, actions); >>>>> + } >>>>> + for (size_t i = 0; i < op->lrp_networks.n_ipv6_addrs; i++) { >>>>> + const char *ip_s = op->lrp_networks.ipv6_addrs[i].addr_s; >>>>> + build_bgp_redirect_rule__(ip_s, bgp_port, true, op->peer, lflows, >>>>> + match, actions); >>>>> + } >>>>> +} >>>>> + >>>>> /* This function adds ARP resolve flows related to a LSP. */ >>>>> static void >>>>> build_arp_resolve_flows_for_lsp( >>>>> @@ -16003,6 +16110,7 @@ build_lswitch_and_lrouter_iterate_by_lrp(struct ovn_port *op, >>>>> lsi->meter_groups, op->lflow_ref); >>>>> build_lrouter_icmp_packet_toobig_admin_flows(op, lsi->lflows, &lsi->match, >>>>> &lsi->actions, op->lflow_ref); >>>>> + build_lrouter_bgp_redirect(op, lsi->lflows, &lsi->match, &lsi->actions); >>>>> } >>>>> >>>>> static void * >>>>> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml >>>>> index b06b09ac5..4cbdf8f74 100644 >>>>> --- a/northd/ovn-northd.8.xml >>>>> +++ b/northd/ovn-northd.8.xml >>>>> @@ -284,6 +284,21 @@ >>>>> dropped in the next stage. >>>>> </li> >>>>> >>>>> + <li> >>>>> + For each logical port that's defined as a target of BGP redirecting >>>>> + (via <code>bgp-redirect</code> option set on Logical Router Port), a >>>>> + filter is set in place that disallows any traffic entering this port >>>>> + that does not originate from Logical Router Port's IPs and default BGP >>>>> + port (TCP 179). This filtering is achieved by two rules. First rule has >>>>> + priority 81, it matches on <code>inport == <var>BGP_MIRROR_PORT</var> >>>>> + && ip.src == <var>LRP_IP</var> && tcp.src == 179</code> >>>>> + and allows traffic further with <code>REGBIT_PORT_SEC_DROP" = >>>>> + check_in_port_sec(); next;</code>. Second rule with priority 80 matches >>>>> + the rest of the traffic from that port and sets >>>>> + <code>REGBIT_PORT_SEC_DROP" = 1; next;"</code> so that the packets are >>>>> + dropped in the next stage. >>>>> + </li> >>>>> + >>>>> <li> >>>>> For each (enabled) vtep logical port, a priority 70 flow is added which >>>>> matches on all packets and applies the action >>>>> @@ -1949,6 +1964,14 @@ output; >>>>> on the logical switch. >>>>> </li> >>>>> >>>>> + <li> >>>>> + For any logical port that's defined as a target of BGP redirecting (via >>>>> + <code>bgp-redirect</code> option set on Logical Router Port), a >>>>> + priority-100 flow is added that redirects traffic for Logical Router >>>>> + Port IPs (including IPv6 LLA) and TCP port 179, to the targeted >>>>> + logical port. >>>>> + </li> >>>>> + >>>>> <li> >>>>> Priority-90 flows for transit switches that forward registered >>>>> IP multicast traffic to their corresponding multicast group , which >>>>> diff --git a/ovn-nb.xml b/ovn-nb.xml >>>>> index 9552534f6..2d2f5c9e0 100644 >>>>> --- a/ovn-nb.xml >>>>> +++ b/ovn-nb.xml >>>>> @@ -3440,6 +3440,20 @@ or >>>>> </p> >>>>> </column> >>>>> >>>>> + <column name="options" key="bgp-redirect" type='{"type": "string"}'> >>>>> + <p> >>>>> + This option expects a name of a Logical Switch Port that's present >>>>> + in the peer's Logical Switch. If set, it causes any traffic >>>>> + that's destined for Logical Router Port's IP addresses (including >>>>> + its IPv6 LLA) and the default BGP port (TCP 179), to be redirected >>>>> + to the specified Logical Switch Port. >>>>> + >>>>> + This allows external BGP daemon to be bound to a port in OVN's >>>>> + Logical Switch and act as if it was listening on Logical Router >>>>> + Port's IP address. >>>>> + </p> >>>>> + </column> >>>>> + >>>>> <column name="options" key="gateway_mtu_bypass"> >>>>> <p> >>>>> When configured, represents a match expression, in the same >>>>> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at >>>>> index a389d1988..df1e924ab 100644 >>>>> --- a/tests/ovn-northd.at >>>>> +++ b/tests/ovn-northd.at >>>>> @@ -12721,3 +12721,61 @@ AT_CHECK([ovn-sbctl dump-flows lr | grep lr_in_dnat | ovn_strip_lflows], [0], [d >>>>> >>>>> AT_CLEANUP >>>>> ]) >>>>> + >>>>> +OVN_FOR_EACH_NORTHD_NO_HV([ >>>>> +AT_SETUP([BGP control plane redirect]) >>>>> +ovn_start >>>>> + >>>>> +check ovn-sbctl chassis-add hv1 geneve 127.0.0.1 >>>>> + >>>>> +check ovn-nbctl lr-add lr -- \ >>>>> + lrp-add lr lr-ls 02:ac:10:01:00:01 172.16.1.1/24 >>>>> +check ovn-nbctl --wait=sb set logical_router lr options:chassis=hv1 >>>>> + >>>>> +check ovn-nbctl ls-add ls -- \ >>>>> + lsp-add ls ls-lr -- \ >>>>> + lsp-set-type ls-lr router -- \ >>>>> + lsp-set-addresses ls-lr router -- \ >>>>> + lsp-set-options ls-lr router-port=lr-ls >>>>> + >>>>> +check ovn-nbctl lsp-add ls lsp-bgp -- \ >>>>> + lsp-set-addresses lsp-bgp unknown >>>>> + >>>>> +# Function that ensures that no BGP redirect ruels are installed. >>>>> +check_no_bgp_redirect() { >>>>> + AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_l2_lkup | grep "tcp.dst == 179" | wc -l], [0], [0 >>>>> +]) >>>>> + >>>>> + AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_check_port_sec | grep -E "priority=80|priority=81" | wc -l], [0], [0 >>>>> +]) >>>>> +} >>>>> + >>>>> +# By default, no rules related to BGP redirect are present >>>>> +check_no_bgp_redirect >>>>> + >>>>> +# Set "lsp-bgp" port as target of BGP control plane redirected traffic >>>>> +check ovn-nbctl --wait=sb set logical_router_port lr-ls options:bgp-redirect=lsp-bgp >>>>> + >>>>> +# Check that BGP control plane traffic is redirected to "lsp-bgp" >>>>> +AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_l2_lkup | grep "tcp.dst == 179" | ovn_strip_lflows], [0], [dnl >>>>> + table=??(ls_in_l2_lkup ), priority=100 , match=(ip4.dst == 172.16.1.1 && tcp.dst == 179), action=(outport = "lsp-bgp"; output;) >>>>> + table=??(ls_in_l2_lkup ), priority=100 , match=(ip6.dst == fe80::ac:10ff:fe01:1 && tcp.dst == 179), action=(outport = "lsp-bgp"; output;) >>>>> +]) >>>>> + >>>>> +# Check that only BGP-related traffic is accepted on "lsp-bgp" port >>>>> +AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_check_port_sec | grep -E "priority=80|priority=81" | ovn_strip_lflows], [0], [dnl >>>>> + table=??(ls_in_check_port_sec), priority=80 , match=(inport == "lsp-bgp"), action=(reg0[[15]] = 1; next;) >>>>> + table=??(ls_in_check_port_sec), priority=81 , match=(inport == "lsp-bgp" && ip4.src == 172.16.1.1 && tcp.src == 179), action=(reg0[[15]] = check_in_port_sec(); next;) >>>>> + table=??(ls_in_check_port_sec), priority=81 , match=(inport == "lsp-bgp" && ip6.src == fe80::ac:10ff:fe01:1 && tcp.src == 179), action=(reg0[[15]] = check_in_port_sec(); next;) >>>>> +]) >>>>> + >>>>> +# Remove 'bgp-redirect' option from LRP and check that rules are removed >>>>> +check ovn-nbctl --wait=sb remove logical_router_port lr-ls options bgp-redirect >>>>> +check_no_bgp_redirect >>>>> + >>>>> +# Set non-existent LSP as target of 'bgp-redirect' and check that no rules are added >>>>> +check ovn-nbctl --wait=sb set logical_router_port lr-ls options:bgp-redirect=lsp-foo >>>>> +check_no_bgp_redirect >>>>> + >>>>> +AT_CLEANUP >>>>> +]) >>>>> diff --git a/tests/system-ovn.at b/tests/system-ovn.at >>>>> index c24ede7c5..c09c1136e 100644 >>>>> --- a/tests/system-ovn.at >>>>> +++ b/tests/system-ovn.at >>>>> @@ -13027,3 +13027,89 @@ OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d >>>>> /connection dropped.*/d"]) >>>>> AT_CLEANUP >>>>> ]) >>>>> + >>>>> +OVN_FOR_EACH_NORTHD([ >>>>> +AT_SETUP([BGP control plane redirect]) >>>> Nit: probably best to add AT_SKIP_IF([test $HAVE_NC = no]) here. >>>> >>>>> +ovn_start >>>>> +OVS_TRAFFIC_VSWITCHD_START() >>>>> + >>>>> +ADD_BR([br-int]) >>>>> +ADD_BR([br-ext]) >>>>> + >>>>> +ovs-ofctl add-flow br-ext action=normal >>>>> +# Set external-ids in br-int needed for ovn-controller >>>>> +ovs-vsctl \ >>>>> + -- set Open_vSwitch . external-ids:system-id=hv1 \ >>>>> + -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \ >>>>> + -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \ >>>>> + -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \ >>>>> + -- set bridge br-int fail-mode=secure other-config:disable-in-band=true >>>> Most ov*-ctl commands should be prefixed with "check" in this test. >>>> >>>>> + >>>>> +# Start ovn-controller >>>>> +start_daemon ovn-controller >>>>> + >>>>> +ovn-nbctl lr-add R1 \ >>>>> + -- set Logical_Router R1 options:chassis=hv1 >>>>> + >>>>> +ovn-nbctl ls-add public >>>>> + >>>>> +ovn-nbctl lrp-add R1 rp-public 00:00:02:01:02:03 172.16.1.1/24 >>>>> + >>>>> +ovn-nbctl lsp-add public public-rp -- set Logical_Switch_Port public-rp \ >>>>> + type=router options:router-port=rp-public \ >>>>> + -- lsp-set-addresses public-rp router >>>>> + >>>>> +ovn-nbctl lsp-add public bgp-daemon \ >>>>> + -- lsp-set-addresses bgp-daemon unknown >>>>> + >>>>> +AT_CHECK([ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=phynet:br-ext]) >>>>> +ovn-nbctl lsp-add public public1 \ >>>>> + -- lsp-set-addresses public1 unknown \ >>>>> + -- lsp-set-type public1 localnet \ >>>>> + -- lsp-set-options public1 network_name=phynet >>>>> + >>>>> +ovn-nbctl --wait=hv sync >>>>> + >>>>> +# Set option that redirects BGP control plane traffic to a LSP "bgp-daemon" >>>>> +ovn-nbctl --wait=sb set logical_router_port rp-public options:bgp-redirect=bgp-daemon >>>>> + >>>>> +# Create "bgp-daemon" interface in a namespace with IP and MAC matching LRP "rp-public" >>>>> +ADD_NAMESPACES(bgp-daemon) >>>>> +ADD_VETH(bgp-daemon, bgp-daemon, br-int, "172.16.1.1/24", "00:00:02:01:02:03") >>>>> + >>>>> +ADD_NAMESPACES(ext-foo) >>>>> +ADD_VETH(ext-foo, ext-foo, br-ext, "172.16.1.100/24", "00:10:10:01:02:13", \ >>>>> + "172.16.1.1") >>>>> + >>>>> +# Flip the interface down/up to get proper IPv6 LLA >>>>> +NS_EXEC([bgp-daemon], [ip link set down bgp-daemon]) >>>>> +NS_EXEC([bgp-daemon], [ip link set up bgp-daemon]) >>>>> + >>>>> +# Wait until IPv6 LLA loses the "tentative" flag otherwise it can't be bound to. >>>>> +OVS_WAIT_UNTIL([NS_EXEC([bgp-daemon], [ip a show dev bgp-daemon | grep "fe80::" | grep -v tentative])]) >>>>> +OVS_WAIT_UNTIL([NS_EXEC([ext-foo], [ip a show dev ext-foo | grep "fe80::" | grep -v tentative])]) >>>>> + >>>>> +# Verify that BGP control plane traffic is delivered to the "bgp-daemon" >>>>> +# interface on both IPv4 and IPv6 LLA addresses >>>>> +NETNS_DAEMONIZE([bgp-daemon], [nc -l -k 172.16.1.1 179], [bgp_v4.pid]) >>>>> +NS_CHECK_EXEC([ext-foo], [echo "TCP test" | nc --send-only 172.16.1.1 179]) >>>>> + >>>>> +NETNS_DAEMONIZE([bgp-daemon], [nc -l -6 -k fe80::200:2ff:fe01:203%bgp-daemon 179], [bgp_v6.pid]) >>>>> +NS_CHECK_EXEC([ext-foo], [echo "TCP test" | nc --send-only -6 fe80::200:2ff:fe01:203%ext-foo 179]) >>>>> + >>>>> +OVS_APP_EXIT_AND_WAIT([ovn-controller]) >>>>> + >>>>> +as ovn-sb >>>>> +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >>>>> + >>>>> +as ovn-nb >>>>> +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >>>>> + >>>>> +as northd >>>>> +OVS_APP_EXIT_AND_WAIT([ovn-northd]) >>>>> + >>>>> +as >>>>> +OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d >>>>> +/.*terminating with signal 15.*/d"]) >>>>> +AT_CLEANUP >>>>> +]) >>>> Regards, >>>> Dumitru >>>> >> _______________________________________________ >> dev mailing list >> dev@openvswitch.org >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
On Thu, 2024-08-08 at 10:46 +0200, Dumitru Ceara wrote: > On 8/8/24 10:42, Vladislav Odintsov wrote: > > > > On 08.08.2024 03:16, Dumitru Ceara wrote: > > > Re-adding the dev list. > > > > > > On 8/7/24 18:12, Vladislav Odintsov wrote: > > > > Hi Dumitru, > > > > > > > Hi Vladislav, > > > > > > > I'd like to add some thoughts to your inputs. > > > > > > > Thanks for that, I added some more of my own below. :) > > > > > > > On 06.08.2024 19:23, Dumitru Ceara wrote: > > > > > Hi Martin, > > > > > > > > > > Sorry, I was reviewing V3 but I was slow in actually sending > > > > > out the email. > > > > > > > > > > On 8/6/24 13:17, Martin Kalcok wrote: > > > > > > This change adds a 'bgp-redirect' option to LRP that allows > > > > > > redirecting of BGP control plane traffic to an arbitrary > > > > > > LSP > > > > > > in its peer LS. > > > > > > > > > > > > The option expects a string with a LSP name. When set, > > > > > > any traffic entering LS that's destined for any of the > > > > > > LRP's IP addresses (including IPv6 LLA) is redirected > > > > > > to the LSP specified in the option's value. > > > > > > > > > > > > This enables external BGP daemons to listen on an interface > > > > > > bound to a LSP and effectively act as if they were > > > > > > listening > > > > > > on (and speaking from) LRP's IP address. > > > > > > > > > > > > Signed-off-by: Martin Kalcok <martin.kalcok@canonical.com> > > > > > > --- > > > > > Strictly on this version of the patch, and with the thought > > > > > in mind that > > > > > we might want to consider this feature experimental [0] and > > > > > maybe change > > > > > it (NB too) in the future, I left a few comments inline. > > > > > With those > > > > > addressed I think the patch looks OK to me. > > > > > > > > > > [0] > > > > > https://patchwork.ozlabs.org/project/ovn/patch/20240802155449.3528777-1-mmichels@redhat.com/ > > > > > > > > > > Now, for the point when we'd remove the "experimental" tag: > > > > > > > > > > In general, it makes me a bit uneasy that we have to share > > > > > the MAC and > > > > > IP between the LRP and the VIF of logical switch port that's > > > > > connected > > > > > to the same switch as the LRP. > > > > > > > > > > I was thinking of alternatives for the future and the only > > > > > things I > > > > > could come up with until now are: > > > > > > > > > > 1. Create a separate, "disconnected" logical switch: > > > > > ovn-nbctl ls-add bgp > > > > > > > > > > Add the bgp-daemon port to it: > > > > > ovn-nbctl lsp-add bgp bgp-daemon. > > > > > > > > > > Then we don't need "unknown" either, I think. > > > > > > > > > > But it's not possible today to move packets from the ingress > > > > > pipeline of > > > > > a logical switch ("public" in this test) to the egress > > > > > pipeline of a > > > > > different logical switch ("bgp" in my example). It also > > > > > feels icky to > > > > > implement that. > > > > > > > > > > 2. Add the ability to bind an OVS port directly to a logical > > > > > router port: > > > > > > > > > > then we could do the same type of redirect you do here but > > > > > instead in > > > > > the logical router pipeline. The advantage is we don't have > > > > > to drop any > > > > > non-bgp traffic in the switch pipeline. The switch keeps > > > > > functioning as > > > > > it does today. > > > > > > > > > > Maybe an advantage of this second alternative would be that > > > > > we can > > > > > easily attach a filtering option to this LRP (e.g., > > > > > LRP.options:control-traffic-filter) to allow us to be more > > > > > flexible in > > > > > what kind of traffic we forward to the actuall routing > > > > > protocol daemon > > > > > that runs behind that OVS port - Vladislav also mentioned > > > > > during the > > > > > meeting it might be interesting to forward BFD packets to the > > > > > FRR (or > > > > > whatever application implements the routing protocol) > > > > > instance too. > > > > The idea to be able to bind LRP to OVS port sounds very > > > > interesting to > > > > me. But with a note that it's not a pure "bind", but a partial: > > > > as you > > > > wrote with some "filter" applied to pass not all the traffic. > > > > The main > > > > idea here is to pass only control plane traffic destined to > > > > LRP's > > > As we're discussing on the other thread (Martin pointed it out) > > > we also > > > probably need to involve conntrack and allow replies to any kind > > > of > > > traffic initiated by the entity behind the LRP's VIF (e.g., BGP > > > sessions > > > initiated from the host). > > > > > > > addresses. Other than that seems odd. Transit traffic should be > > > > remained > > > > in LR pipeline otherwise it will have no difference with a > > > > regular VIF LSP. > > > > > > > Definitely, traffic that needs to be DNATed (DNAT/unSNAT rules or > > > LB > > > rules that will change the destination IP from LRP IP to > > > something else) > > > should not be affected. All other "transit" traffic doesn't have > > > LRP > > > IP, does it? > > > > You're right. > > > > @Dumitru, @Martin, what if we just redirect all traffic destined to > > LRP > > IPs to redirect port? Is there any drawbacks? > > For security it is possible to optionally use ACLs with or without > > conntrack. It's up to user/CMS. > > > > This seems quite simple in the code and very flexible. So no > > additional > > flows seems to be needed in future to support any other routing > > protocols (or for another possible non-routing usecases). > > > > Won't this break all "transit" traffic that has destination IP the > LRP > IP (DNAT, LB), etc? > I'm afraid so. I guess I was just tired yesterday when I made that proposal for general redirect. Given that the redirect is implemented in the LS pipeline, it would eat up all the traffic for LRP's IP before the DNAT/unSNAT occurs in the LR pipeline. I'll give it a quick test in case I'm wrong here again. If that's the case, I'll try the conntrack approach and update you all asap. Martin. > > > > Having a filter (or match like in ACL or > > > > Logical_Router_Policies) looks > > > > more flexible in terms of used protocols. This can coexist with > > > > proposal > > > > from current patch, where the flow is not programmable from > > > > user. > > > > > > > I think so too. > > > Regards, > > > Dumitru > > > > > > > > But again, if we consider the current feature "experimental", > > > > > we can > > > > > spend more time during the next release cycle to figure out > > > > > what's best. > > > > > > > > > > > v4 of this patch renames the feature from "bgp-mirror" to > > > > > > "bgp-redirect" > > > > > > as discussed during community meeting. > > > > > > > > > > > > northd/northd.c | 108 > > > > > > ++++++++++++++++++++++++++++++++++++++++ > > > > > > northd/ovn-northd.8.xml | 23 +++++++++ > > > > > > ovn-nb.xml | 14 ++++++ > > > > > > tests/ovn-northd.at | 58 +++++++++++++++++++++ > > > > > > tests/system-ovn.at | 86 > > > > > > ++++++++++++++++++++++++++++++++ > > > > > > 5 files changed, 289 insertions(+) > > > > > > > > > > > > diff --git a/northd/northd.c b/northd/northd.c > > > > > > index 4353df07d..088104f25 100644 > > > > > > --- a/northd/northd.c > > > > > > +++ b/northd/northd.c > > > > > > @@ -13048,6 +13048,113 @@ > > > > > > build_arp_resolve_flows_for_lrp(struct ovn_port *op, > > > > > > } > > > > > > } > > > > > > > > > > > > +static void > > > > > > +build_bgp_redirect_rule__( > > > > > > + const char *s_addr, const char *bgp_port_name, > > > > > > bool is_ipv6, > > > > > > + struct ovn_port *ls_peer, struct lflow_table > > > > > > *lflows, > > > > > > + struct ds *match, struct ds *actions) > > > > > > +{ > > > > > > + int ip_ver = is_ipv6 ? 6 : 4; > > > > > > + /* Redirect packets in the input pipeline destined for > > > > > > LR's IP to > > > > > > + * the port specified in 'bgp-redirect' option. > > > > > > + */ > > > > > > + ds_clear(match); > > > > > > + ds_clear(actions); > > > > > > + ds_put_format(match, "ip%d.dst == %s && tcp.dst == > > > > > > 179", ip_ver, s_addr); > > > > > > + ds_put_format(actions, "outport = \"%s\"; output;", > > > > > > bgp_port_name); > > > > > > + ovn_lflow_add(lflows, ls_peer->od, > > > > > > S_SWITCH_IN_L2_LKUP, 100, > > > > > > + ds_cstr(match), > > > > > > + ds_cstr(actions), > > > > > > + ls_peer->lflow_ref); > > > > > > + > > > > > > + > > > > > > + /* Drop any traffic originating from 'bgp-redirect' > > > > > > port that does > > > > > > + * not originate from BGP daemon port. This blocks > > > > > > unnecessary > > > > > > + * traffic like ARP broadcasts or IPv6 router > > > > > > solicitation packets > > > > > > + * from the dummy 'bgp-redirect' port. > > > > > > + */ > > > > > > + ds_clear(match); > > > > > > + ds_put_format(match, "inport == \"%s\"", > > > > > > bgp_port_name); > > > > > > + ovn_lflow_add(lflows, ls_peer->od, > > > > > > S_SWITCH_IN_CHECK_PORT_SEC, 80, > > > > > > + ds_cstr(match), > > > > > > + REGBIT_PORT_SEC_DROP " = 1; next;", > > > > > > + ls_peer->lflow_ref); > > > > > Not a huge deal but this logical flow gets regenerated > > > > > multiple times > > > > > for each bgp_port_name (for each LRP address). In the end > > > > > we'll only > > > > > install one in the database so there's no impact on SB. But > > > > > maybe we > > > > > should consider moving this part out of this function. > > > > > > > > > > > + > > > > > > + ds_put_format(match, > > > > > > + " && ip%d.src == %s && tcp.src == 179", > > > > > > + ip_ver, > > > > > > + s_addr); > > > > > > + ovn_lflow_add(lflows, ls_peer->od, > > > > > > S_SWITCH_IN_CHECK_PORT_SEC, 81, > > > > > > + ds_cstr(match), > > > > > > + REGBIT_PORT_SEC_DROP " = > > > > > > check_in_port_sec(); next;", > > > > > > + ls_peer->lflow_ref); > > > > > In your system test we set 'addresses = "unknown"' for > > > > > 'bgp_port_name'. > > > > > And I guess we kind of have to do it like that all the time > > > > > when using > > > > > this feature. I'm trying to think of scenarios in which > > > > > users set > > > > > LSP.addresses = 'unknown' and LSP.port-security = '<some- > > > > > addresses>'. > > > > > I'm not sure I saw those until now but maybe I'm wrong. > > > > I'm also interested whether this LSP port with unknown > > > > addresses will > > > > get all the broadcast traffic? > > > > > > > > @Martin, didn't you look for ARP behavior? If we have one LS > > > > with three > > > > LSPs: > > > > - 1 router type LSP with MAC 00:00:00:00:00:01 and IP > > > > 169.254.0.1/30 > > > > - one "vif" LSP with "unknown" addresses set and configured > > > > underlying > > > > OVS port with same MAC and IP set > > > > - one localnet port for external connectivity. > > > > > > > > If some party from localnet side sends broadcast ARP request > > > > "Who has > > > > 169.254.0.1? Answer <...>", will this ARP request got flooded > > > > to both > > > > LRP and LSP with "unknown" addresses? If yes, will the sender > > > > receive 2 > > > > ARP replies? > > > > > > > > > > +} > > > > > > + > > > > > > +static void > > > > > > +build_lrouter_bgp_redirect( > > > > > > + struct ovn_port *op, struct lflow_table *lflows, > > > > > > + struct ds *match, struct ds *actions) > > > > > > +{ > > > > > > + /* LRP has to have a peer.*/ > > > > > > + if (op->peer == NULL) { > > > > > > + return; > > > > > > + } > > > > > > + /* LRP has to have NB record.*/ > > > > > > + if (op->nbrp == NULL) { > > > > > > + return; > > > > > > + } > > > > > > + > > > > > > + /* Proceed only for LRPs that have 'bgp-redirect' > > > > > > option set. Value of this > > > > > > + * option is the name of LSP to which a BGP traffic > > > > > > will be redirected. > > > > > > + */ > > > > > > + const char *bgp_port = smap_get(&op->nbrp->options, > > > > > > "bgp-redirect"); > > > > > > + if (bgp_port == NULL) { > > > > > > + return; > > > > > > + } > > > > > > + > > > > > > + /* Ensure that LSP, to which the BGP traffic is > > > > > > redirected, exists.*/ > > > > > > + struct ovn_port *peer_lsp; > > > > > > + bool bgp_port_exists = false; > > > > > > + HMAP_FOR_EACH (peer_lsp, dp_node, &op->peer->od- > > > > > > >ports) { > > > > > > + size_t peer_lsp_s = strlen(peer_lsp->key); > > > > > > + if (peer_lsp_s == strlen(bgp_port) > > > > > > + && !strncmp(peer_lsp->key, bgp_port, > > > > > > peer_lsp_s)){ > > > > > > + bgp_port_exists = true; > > > > > > + break; > > > > > > + } > > > > > > + } > > > > > > + > > > > > > + if (!bgp_port_exists) { > > > > > > + static struct vlog_rate_limit rl = > > > > > > VLOG_RATE_LIMIT_INIT(1, 5); > > > > > > + VLOG_WARN_RL(&rl, "Option 'bgp-redirect' set on > > > > > > Logical Router Port " > > > > > > + "'%s' refers to non-existent > > > > > > Logical Switch Port. " > > > > > > + "BGP redirecting won't be > > > > > > configured.", > > > > > > + op->key); > > > > > > + return; > > > > > > + } > > > > > > + > > > > > > + if (op->cr_port != NULL) { > > > > > > + static struct vlog_rate_limit rl = > > > > > > VLOG_RATE_LIMIT_INIT(1, 5); > > > > > > + VLOG_WARN_RL(&rl, "Option 'bgp-redirect' is not > > > > > > supported on" > > > > > > + " Distributed Gateway Port > > > > > > '%s'", op->key); > > > > > > + return; > > > > > > + } > > > > > > + > > > > > > + /* Mirror traffic destined for LRP's IPs and default > > > > > > BGP port > > > > > > + * to the port defined in 'bgp-redirect' option. > > > > > > + */ > > > > > > + for (size_t i = 0; i < op->lrp_networks.n_ipv4_addrs; > > > > > > i++) { > > > > > > + const char *ip_s = op- > > > > > > >lrp_networks.ipv4_addrs[i].addr_s; > > > > > > + build_bgp_redirect_rule__(ip_s, bgp_port, false, > > > > > > op->peer, lflows, > > > > > > + match, actions); > > > > > > + } > > > > > > + for (size_t i = 0; i < op->lrp_networks.n_ipv6_addrs; > > > > > > i++) { > > > > > > + const char *ip_s = op- > > > > > > >lrp_networks.ipv6_addrs[i].addr_s; > > > > > > + build_bgp_redirect_rule__(ip_s, bgp_port, true, > > > > > > op->peer, lflows, > > > > > > + match, actions); > > > > > > + } > > > > > > +} > > > > > > + > > > > > > /* This function adds ARP resolve flows related to a LSP. > > > > > > */ > > > > > > static void > > > > > > build_arp_resolve_flows_for_lsp( > > > > > > @@ -16003,6 +16110,7 @@ > > > > > > build_lswitch_and_lrouter_iterate_by_lrp(struct ovn_port > > > > > > *op, > > > > > > lsi->meter_groups, op- > > > > > > >lflow_ref); > > > > > > build_lrouter_icmp_packet_toobig_admin_flows(op, lsi- > > > > > > >lflows, &lsi->match, > > > > > > &lsi- > > > > > > >actions, op->lflow_ref); > > > > > > + build_lrouter_bgp_redirect(op, lsi->lflows, &lsi- > > > > > > >match, &lsi->actions); > > > > > > } > > > > > > > > > > > > static void * > > > > > > diff --git a/northd/ovn-northd.8.xml b/northd/ovn- > > > > > > northd.8.xml > > > > > > index b06b09ac5..4cbdf8f74 100644 > > > > > > --- a/northd/ovn-northd.8.xml > > > > > > +++ b/northd/ovn-northd.8.xml > > > > > > @@ -284,6 +284,21 @@ > > > > > > dropped in the next stage. > > > > > > </li> > > > > > > > > > > > > + <li> > > > > > > + For each logical port that's defined as a target > > > > > > of BGP redirecting > > > > > > + (via <code>bgp-redirect</code> option set on > > > > > > Logical Router Port), a > > > > > > + filter is set in place that disallows any traffic > > > > > > entering this port > > > > > > + that does not originate from Logical Router Port's > > > > > > IPs and default BGP > > > > > > + port (TCP 179). This filtering is achieved by two > > > > > > rules. First rule has > > > > > > + priority 81, it matches on <code>inport == > > > > > > <var>BGP_MIRROR_PORT</var> > > > > > > + && ip.src == <var>LRP_IP</var> && > > > > > > tcp.src == 179</code> > > > > > > + and allows traffic further with > > > > > > <code>REGBIT_PORT_SEC_DROP" = > > > > > > + check_in_port_sec(); next;</code>. Second rule > > > > > > with priority 80 matches > > > > > > + the rest of the traffic from that port and sets > > > > > > + <code>REGBIT_PORT_SEC_DROP" = 1; next;"</code> so > > > > > > that the packets are > > > > > > + dropped in the next stage. > > > > > > + </li> > > > > > > + > > > > > > <li> > > > > > > For each (enabled) vtep logical port, a priority > > > > > > 70 flow is added which > > > > > > matches on all packets and applies the action > > > > > > @@ -1949,6 +1964,14 @@ output; > > > > > > on the logical switch. > > > > > > </li> > > > > > > > > > > > > + <li> > > > > > > + For any logical port that's defined as a target of > > > > > > BGP redirecting (via > > > > > > + <code>bgp-redirect</code> option set on Logical > > > > > > Router Port), a > > > > > > + priority-100 flow is added that redirects traffic > > > > > > for Logical Router > > > > > > + Port IPs (including IPv6 LLA) and TCP port 179, to > > > > > > the targeted > > > > > > + logical port. > > > > > > + </li> > > > > > > + > > > > > > <li> > > > > > > Priority-90 flows for transit switches that > > > > > > forward registered > > > > > > IP multicast traffic to their corresponding > > > > > > multicast group , which > > > > > > diff --git a/ovn-nb.xml b/ovn-nb.xml > > > > > > index 9552534f6..2d2f5c9e0 100644 > > > > > > --- a/ovn-nb.xml > > > > > > +++ b/ovn-nb.xml > > > > > > @@ -3440,6 +3440,20 @@ or > > > > > > </p> > > > > > > </column> > > > > > > > > > > > > + <column name="options" key="bgp-redirect" > > > > > > type='{"type": "string"}'> > > > > > > + <p> > > > > > > + This option expects a name of a Logical Switch > > > > > > Port that's present > > > > > > + in the peer's Logical Switch. If set, it causes > > > > > > any traffic > > > > > > + that's destined for Logical Router Port's IP > > > > > > addresses (including > > > > > > + its IPv6 LLA) and the default BGP port (TCP > > > > > > 179), to be redirected > > > > > > + to the specified Logical Switch Port. > > > > > > + > > > > > > + This allows external BGP daemon to be bound to a > > > > > > port in OVN's > > > > > > + Logical Switch and act as if it was listening on > > > > > > Logical Router > > > > > > + Port's IP address. > > > > > > + </p> > > > > > > + </column> > > > > > > + > > > > > > <column name="options" key="gateway_mtu_bypass"> > > > > > > <p> > > > > > > When configured, represents a match expression, > > > > > > in the same > > > > > > diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at > > > > > > index a389d1988..df1e924ab 100644 > > > > > > --- a/tests/ovn-northd.at > > > > > > +++ b/tests/ovn-northd.at > > > > > > @@ -12721,3 +12721,61 @@ AT_CHECK([ovn-sbctl dump-flows lr > > > > > > | grep lr_in_dnat | ovn_strip_lflows], [0], [d > > > > > > > > > > > > AT_CLEANUP > > > > > > ]) > > > > > > + > > > > > > +OVN_FOR_EACH_NORTHD_NO_HV([ > > > > > > +AT_SETUP([BGP control plane redirect]) > > > > > > +ovn_start > > > > > > + > > > > > > +check ovn-sbctl chassis-add hv1 geneve 127.0.0.1 > > > > > > + > > > > > > +check ovn-nbctl lr-add lr -- \ > > > > > > + lrp-add lr lr-ls 02:ac:10:01:00:01 172.16.1.1/24 > > > > > > +check ovn-nbctl --wait=sb set logical_router lr > > > > > > options:chassis=hv1 > > > > > > + > > > > > > +check ovn-nbctl ls-add ls -- \ > > > > > > + lsp-add ls ls-lr -- \ > > > > > > + lsp-set-type ls-lr router -- \ > > > > > > + lsp-set-addresses ls-lr router -- \ > > > > > > + lsp-set-options ls-lr router-port=lr-ls > > > > > > + > > > > > > +check ovn-nbctl lsp-add ls lsp-bgp -- \ > > > > > > + lsp-set-addresses lsp-bgp unknown > > > > > > + > > > > > > +# Function that ensures that no BGP redirect ruels are > > > > > > installed. > > > > > > +check_no_bgp_redirect() { > > > > > > + AT_CHECK([ovn-sbctl dump-flows ls | grep > > > > > > ls_in_l2_lkup | grep "tcp.dst == 179" | wc -l], [0], [0 > > > > > > +]) > > > > > > + > > > > > > + AT_CHECK([ovn-sbctl dump-flows ls | grep > > > > > > ls_in_check_port_sec | grep -E "priority=80|priority=81" | > > > > > > wc -l], [0], [0 > > > > > > +]) > > > > > > +} > > > > > > + > > > > > > +# By default, no rules related to BGP redirect are present > > > > > > +check_no_bgp_redirect > > > > > > + > > > > > > +# Set "lsp-bgp" port as target of BGP control plane > > > > > > redirected traffic > > > > > > +check ovn-nbctl --wait=sb set logical_router_port lr-ls > > > > > > options:bgp-redirect=lsp-bgp > > > > > > + > > > > > > +# Check that BGP control plane traffic is redirected to > > > > > > "lsp-bgp" > > > > > > +AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_l2_lkup | > > > > > > grep "tcp.dst == 179" | ovn_strip_lflows], [0], [dnl > > > > > > + table=??(ls_in_l2_lkup ), priority=100 , > > > > > > match=(ip4.dst == 172.16.1.1 && tcp.dst == 179), > > > > > > action=(outport = "lsp-bgp"; output;) > > > > > > + table=??(ls_in_l2_lkup ), priority=100 , > > > > > > match=(ip6.dst == fe80::ac:10ff:fe01:1 && tcp.dst == 179), > > > > > > action=(outport = "lsp-bgp"; output;) > > > > > > +]) > > > > > > + > > > > > > +# Check that only BGP-related traffic is accepted on "lsp- > > > > > > bgp" port > > > > > > +AT_CHECK([ovn-sbctl dump-flows ls | grep > > > > > > ls_in_check_port_sec | grep -E "priority=80|priority=81" | > > > > > > ovn_strip_lflows], [0], [dnl > > > > > > + table=??(ls_in_check_port_sec), priority=80 , > > > > > > match=(inport == "lsp-bgp"), action=(reg0[[15]] = 1; next;) > > > > > > + table=??(ls_in_check_port_sec), priority=81 , > > > > > > match=(inport == "lsp-bgp" && ip4.src == 172.16.1.1 && > > > > > > tcp.src == 179), action=(reg0[[15]] = check_in_port_sec(); > > > > > > next;) > > > > > > + table=??(ls_in_check_port_sec), priority=81 , > > > > > > match=(inport == "lsp-bgp" && ip6.src == > > > > > > fe80::ac:10ff:fe01:1 && tcp.src == 179), action=(reg0[[15]] > > > > > > = check_in_port_sec(); next;) > > > > > > +]) > > > > > > + > > > > > > +# Remove 'bgp-redirect' option from LRP and check that > > > > > > rules are removed > > > > > > +check ovn-nbctl --wait=sb remove logical_router_port lr-ls > > > > > > options bgp-redirect > > > > > > +check_no_bgp_redirect > > > > > > + > > > > > > +# Set non-existent LSP as target of 'bgp-redirect' and > > > > > > check that no rules are added > > > > > > +check ovn-nbctl --wait=sb set logical_router_port lr-ls > > > > > > options:bgp-redirect=lsp-foo > > > > > > +check_no_bgp_redirect > > > > > > + > > > > > > +AT_CLEANUP > > > > > > +]) > > > > > > diff --git a/tests/system-ovn.at b/tests/system-ovn.at > > > > > > index c24ede7c5..c09c1136e 100644 > > > > > > --- a/tests/system-ovn.at > > > > > > +++ b/tests/system-ovn.at > > > > > > @@ -13027,3 +13027,89 @@ > > > > > > OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch- > > > > > > .*/d > > > > > > /connection dropped.*/d"]) > > > > > > AT_CLEANUP > > > > > > ]) > > > > > > + > > > > > > +OVN_FOR_EACH_NORTHD([ > > > > > > +AT_SETUP([BGP control plane redirect]) > > > > > Nit: probably best to add AT_SKIP_IF([test $HAVE_NC = no]) > > > > > here. > > > > > > > > > > > +ovn_start > > > > > > +OVS_TRAFFIC_VSWITCHD_START() > > > > > > + > > > > > > +ADD_BR([br-int]) > > > > > > +ADD_BR([br-ext]) > > > > > > + > > > > > > +ovs-ofctl add-flow br-ext action=normal > > > > > > +# Set external-ids in br-int needed for ovn-controller > > > > > > +ovs-vsctl \ > > > > > > + -- set Open_vSwitch . external-ids:system-id=hv1 \ > > > > > > + -- set Open_vSwitch . external-ids:ovn- > > > > > > remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \ > > > > > > + -- set Open_vSwitch . external-ids:ovn-encap- > > > > > > type=geneve \ > > > > > > + -- set Open_vSwitch . external-ids:ovn-encap- > > > > > > ip=169.0.0.1 \ > > > > > > + -- set bridge br-int fail-mode=secure other- > > > > > > config:disable-in-band=true > > > > > Most ov*-ctl commands should be prefixed with "check" in this > > > > > test. > > > > > > > > > > > + > > > > > > +# Start ovn-controller > > > > > > +start_daemon ovn-controller > > > > > > + > > > > > > +ovn-nbctl lr-add R1 \ > > > > > > + -- set Logical_Router R1 options:chassis=hv1 > > > > > > + > > > > > > +ovn-nbctl ls-add public > > > > > > + > > > > > > +ovn-nbctl lrp-add R1 rp-public 00:00:02:01:02:03 > > > > > > 172.16.1.1/24 > > > > > > + > > > > > > +ovn-nbctl lsp-add public public-rp -- set > > > > > > Logical_Switch_Port public-rp \ > > > > > > + type=router options:router-port=rp-public \ > > > > > > + -- lsp-set-addresses public-rp router > > > > > > + > > > > > > +ovn-nbctl lsp-add public bgp-daemon \ > > > > > > + -- lsp-set-addresses bgp-daemon unknown > > > > > > + > > > > > > +AT_CHECK([ovs-vsctl set Open_vSwitch . external-ids:ovn- > > > > > > bridge-mappings=phynet:br-ext]) > > > > > > +ovn-nbctl lsp-add public public1 \ > > > > > > + -- lsp-set-addresses public1 unknown \ > > > > > > + -- lsp-set-type public1 localnet \ > > > > > > + -- lsp-set-options public1 network_name=phynet > > > > > > + > > > > > > +ovn-nbctl --wait=hv sync > > > > > > + > > > > > > +# Set option that redirects BGP control plane traffic to a > > > > > > LSP "bgp-daemon" > > > > > > +ovn-nbctl --wait=sb set logical_router_port rp-public > > > > > > options:bgp-redirect=bgp-daemon > > > > > > + > > > > > > +# Create "bgp-daemon" interface in a namespace with IP and > > > > > > MAC matching LRP "rp-public" > > > > > > +ADD_NAMESPACES(bgp-daemon) > > > > > > +ADD_VETH(bgp-daemon, bgp-daemon, br-int, "172.16.1.1/24", > > > > > > "00:00:02:01:02:03") > > > > > > + > > > > > > +ADD_NAMESPACES(ext-foo) > > > > > > +ADD_VETH(ext-foo, ext-foo, br-ext, "172.16.1.100/24", > > > > > > "00:10:10:01:02:13", \ > > > > > > + "172.16.1.1") > > > > > > + > > > > > > +# Flip the interface down/up to get proper IPv6 LLA > > > > > > +NS_EXEC([bgp-daemon], [ip link set down bgp-daemon]) > > > > > > +NS_EXEC([bgp-daemon], [ip link set up bgp-daemon]) > > > > > > + > > > > > > +# Wait until IPv6 LLA loses the "tentative" flag otherwise > > > > > > it can't be bound to. > > > > > > +OVS_WAIT_UNTIL([NS_EXEC([bgp-daemon], [ip a show dev bgp- > > > > > > daemon | grep "fe80::" | grep -v tentative])]) > > > > > > +OVS_WAIT_UNTIL([NS_EXEC([ext-foo], [ip a show dev ext-foo > > > > > > | grep "fe80::" | grep -v tentative])]) > > > > > > + > > > > > > +# Verify that BGP control plane traffic is delivered to > > > > > > the "bgp-daemon" > > > > > > +# interface on both IPv4 and IPv6 LLA addresses > > > > > > +NETNS_DAEMONIZE([bgp-daemon], [nc -l -k 172.16.1.1 179], > > > > > > [bgp_v4.pid]) > > > > > > +NS_CHECK_EXEC([ext-foo], [echo "TCP test" | nc --send-only > > > > > > 172.16.1.1 179]) > > > > > > + > > > > > > +NETNS_DAEMONIZE([bgp-daemon], [nc -l -6 -k > > > > > > fe80::200:2ff:fe01:203%bgp-daemon 179], [bgp_v6.pid]) > > > > > > +NS_CHECK_EXEC([ext-foo], [echo "TCP test" | nc --send-only > > > > > > -6 fe80::200:2ff:fe01:203%ext-foo 179]) > > > > > > + > > > > > > +OVS_APP_EXIT_AND_WAIT([ovn-controller]) > > > > > > + > > > > > > +as ovn-sb > > > > > > +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) > > > > > > + > > > > > > +as ovn-nb > > > > > > +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) > > > > > > + > > > > > > +as northd > > > > > > +OVS_APP_EXIT_AND_WAIT([ovn-northd]) > > > > > > + > > > > > > +as > > > > > > +OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d > > > > > > +/.*terminating with signal 15.*/d"]) > > > > > > +AT_CLEANUP > > > > > > +]) > > > > > Regards, > > > > > Dumitru > > > > > > > > _______________________________________________ > > > dev mailing list > > > dev@openvswitch.org > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >
On 08.08.2024 15:51, martin.kalcok@canonical.com wrote: > On Thu, 2024-08-08 at 10:46 +0200, Dumitru Ceara wrote: >> On 8/8/24 10:42, Vladislav Odintsov wrote: >>> On 08.08.2024 03:16, Dumitru Ceara wrote: >>>> Re-adding the dev list. >>>> >>>> On 8/7/24 18:12, Vladislav Odintsov wrote: >>>>> Hi Dumitru, >>>>> >>>> Hi Vladislav, >>>> >>>>> I'd like to add some thoughts to your inputs. >>>>> >>>> Thanks for that, I added some more of my own below. :) >>>> >>>>> On 06.08.2024 19:23, Dumitru Ceara wrote: >>>>>> Hi Martin, >>>>>> >>>>>> Sorry, I was reviewing V3 but I was slow in actually sending >>>>>> out the email. >>>>>> >>>>>> On 8/6/24 13:17, Martin Kalcok wrote: >>>>>>> This change adds a 'bgp-redirect' option to LRP that allows >>>>>>> redirecting of BGP control plane traffic to an arbitrary >>>>>>> LSP >>>>>>> in its peer LS. >>>>>>> >>>>>>> The option expects a string with a LSP name. When set, >>>>>>> any traffic entering LS that's destined for any of the >>>>>>> LRP's IP addresses (including IPv6 LLA) is redirected >>>>>>> to the LSP specified in the option's value. >>>>>>> >>>>>>> This enables external BGP daemons to listen on an interface >>>>>>> bound to a LSP and effectively act as if they were >>>>>>> listening >>>>>>> on (and speaking from) LRP's IP address. >>>>>>> >>>>>>> Signed-off-by: Martin Kalcok<martin.kalcok@canonical.com> >>>>>>> --- >>>>>> Strictly on this version of the patch, and with the thought >>>>>> in mind that >>>>>> we might want to consider this feature experimental [0] and >>>>>> maybe change >>>>>> it (NB too) in the future, I left a few comments inline. >>>>>> With those >>>>>> addressed I think the patch looks OK to me. >>>>>> >>>>>> [0] >>>>>> https://patchwork.ozlabs.org/project/ovn/patch/20240802155449.3528777-1-mmichels@redhat.com/ >>>>>> >>>>>> Now, for the point when we'd remove the "experimental" tag: >>>>>> >>>>>> In general, it makes me a bit uneasy that we have to share >>>>>> the MAC and >>>>>> IP between the LRP and the VIF of logical switch port that's >>>>>> connected >>>>>> to the same switch as the LRP. >>>>>> >>>>>> I was thinking of alternatives for the future and the only >>>>>> things I >>>>>> could come up with until now are: >>>>>> >>>>>> 1. Create a separate, "disconnected" logical switch: >>>>>> ovn-nbctl ls-add bgp >>>>>> >>>>>> Add the bgp-daemon port to it: >>>>>> ovn-nbctl lsp-add bgp bgp-daemon. >>>>>> >>>>>> Then we don't need "unknown" either, I think. >>>>>> >>>>>> But it's not possible today to move packets from the ingress >>>>>> pipeline of >>>>>> a logical switch ("public" in this test) to the egress >>>>>> pipeline of a >>>>>> different logical switch ("bgp" in my example). It also >>>>>> feels icky to >>>>>> implement that. >>>>>> >>>>>> 2. Add the ability to bind an OVS port directly to a logical >>>>>> router port: >>>>>> >>>>>> then we could do the same type of redirect you do here but >>>>>> instead in >>>>>> the logical router pipeline. The advantage is we don't have >>>>>> to drop any >>>>>> non-bgp traffic in the switch pipeline. The switch keeps >>>>>> functioning as >>>>>> it does today. >>>>>> >>>>>> Maybe an advantage of this second alternative would be that >>>>>> we can >>>>>> easily attach a filtering option to this LRP (e.g., >>>>>> LRP.options:control-traffic-filter) to allow us to be more >>>>>> flexible in >>>>>> what kind of traffic we forward to the actuall routing >>>>>> protocol daemon >>>>>> that runs behind that OVS port - Vladislav also mentioned >>>>>> during the >>>>>> meeting it might be interesting to forward BFD packets to the >>>>>> FRR (or >>>>>> whatever application implements the routing protocol) >>>>>> instance too. >>>>> The idea to be able to bind LRP to OVS port sounds very >>>>> interesting to >>>>> me. But with a note that it's not a pure "bind", but a partial: >>>>> as you >>>>> wrote with some "filter" applied to pass not all the traffic. >>>>> The main >>>>> idea here is to pass only control plane traffic destined to >>>>> LRP's >>>> As we're discussing on the other thread (Martin pointed it out) >>>> we also >>>> probably need to involve conntrack and allow replies to any kind >>>> of >>>> traffic initiated by the entity behind the LRP's VIF (e.g., BGP >>>> sessions >>>> initiated from the host). >>>> >>>>> addresses. Other than that seems odd. Transit traffic should be >>>>> remained >>>>> in LR pipeline otherwise it will have no difference with a >>>>> regular VIF LSP. >>>>> >>>> Definitely, traffic that needs to be DNATed (DNAT/unSNAT rules or >>>> LB >>>> rules that will change the destination IP from LRP IP to >>>> something else) >>>> should not be affected. All other "transit" traffic doesn't have >>>> LRP >>>> IP, does it? >>> You're right. >>> >>> @Dumitru, @Martin, what if we just redirect all traffic destined to >>> LRP >>> IPs to redirect port? Is there any drawbacks? >>> For security it is possible to optionally use ACLs with or without >>> conntrack. It's up to user/CMS. >>> >>> This seems quite simple in the code and very flexible. So no >>> additional >>> flows seems to be needed in future to support any other routing >>> protocols (or for another possible non-routing usecases). >>> >> Won't this break all "transit" traffic that has destination IP the >> LRP >> IP (DNAT, LB), etc? >> > I'm afraid so. I guess I was just tired yesterday when I made that > proposal for general redirect. Given that the redirect is implemented > in the LS pipeline, it would eat up all the traffic for LRP's IP before > the DNAT/unSNAT occurs in the LR pipeline. I'll give it a quick test in > case I'm wrong here again. Oops, this will definitely break that cases. For some reason I thought only about routing cases. > > If that's the case, I'll try the conntrack approach and update you all > asap. > > Martin. > >>>>> Having a filter (or match like in ACL or >>>>> Logical_Router_Policies) looks >>>>> more flexible in terms of used protocols. This can coexist with >>>>> proposal >>>>> from current patch, where the flow is not programmable from >>>>> user. >>>>> >>>> I think so too. >>>> Regards, >>>> Dumitru >>>> >>>>>> But again, if we consider the current feature "experimental", >>>>>> we can >>>>>> spend more time during the next release cycle to figure out >>>>>> what's best. >>>>>> >>>>>>> v4 of this patch renames the feature from "bgp-mirror" to >>>>>>> "bgp-redirect" >>>>>>> as discussed during community meeting. >>>>>>> >>>>>>> northd/northd.c | 108 >>>>>>> ++++++++++++++++++++++++++++++++++++++++ >>>>>>> northd/ovn-northd.8.xml | 23 +++++++++ >>>>>>> ovn-nb.xml | 14 ++++++ >>>>>>> tests/ovn-northd.at | 58 +++++++++++++++++++++ >>>>>>> tests/system-ovn.at | 86 >>>>>>> ++++++++++++++++++++++++++++++++ >>>>>>> 5 files changed, 289 insertions(+) >>>>>>> >>>>>>> diff --git a/northd/northd.c b/northd/northd.c >>>>>>> index 4353df07d..088104f25 100644 >>>>>>> --- a/northd/northd.c >>>>>>> +++ b/northd/northd.c >>>>>>> @@ -13048,6 +13048,113 @@ >>>>>>> build_arp_resolve_flows_for_lrp(struct ovn_port *op, >>>>>>> } >>>>>>> } >>>>>>> >>>>>>> +static void >>>>>>> +build_bgp_redirect_rule__( >>>>>>> + const char *s_addr, const char *bgp_port_name, >>>>>>> bool is_ipv6, >>>>>>> + struct ovn_port *ls_peer, struct lflow_table >>>>>>> *lflows, >>>>>>> + struct ds *match, struct ds *actions) >>>>>>> +{ >>>>>>> + int ip_ver = is_ipv6 ? 6 : 4; >>>>>>> + /* Redirect packets in the input pipeline destined for >>>>>>> LR's IP to >>>>>>> + * the port specified in 'bgp-redirect' option. >>>>>>> + */ >>>>>>> + ds_clear(match); >>>>>>> + ds_clear(actions); >>>>>>> + ds_put_format(match, "ip%d.dst == %s && tcp.dst == >>>>>>> 179", ip_ver, s_addr); >>>>>>> + ds_put_format(actions, "outport = \"%s\"; output;", >>>>>>> bgp_port_name); >>>>>>> + ovn_lflow_add(lflows, ls_peer->od, >>>>>>> S_SWITCH_IN_L2_LKUP, 100, >>>>>>> + ds_cstr(match), >>>>>>> + ds_cstr(actions), >>>>>>> + ls_peer->lflow_ref); >>>>>>> + >>>>>>> + >>>>>>> + /* Drop any traffic originating from 'bgp-redirect' >>>>>>> port that does >>>>>>> + * not originate from BGP daemon port. This blocks >>>>>>> unnecessary >>>>>>> + * traffic like ARP broadcasts or IPv6 router >>>>>>> solicitation packets >>>>>>> + * from the dummy 'bgp-redirect' port. >>>>>>> + */ >>>>>>> + ds_clear(match); >>>>>>> + ds_put_format(match, "inport == \"%s\"", >>>>>>> bgp_port_name); >>>>>>> + ovn_lflow_add(lflows, ls_peer->od, >>>>>>> S_SWITCH_IN_CHECK_PORT_SEC, 80, >>>>>>> + ds_cstr(match), >>>>>>> + REGBIT_PORT_SEC_DROP " = 1; next;", >>>>>>> + ls_peer->lflow_ref); >>>>>> Not a huge deal but this logical flow gets regenerated >>>>>> multiple times >>>>>> for each bgp_port_name (for each LRP address). In the end >>>>>> we'll only >>>>>> install one in the database so there's no impact on SB. But >>>>>> maybe we >>>>>> should consider moving this part out of this function. >>>>>> >>>>>>> + >>>>>>> + ds_put_format(match, >>>>>>> + " && ip%d.src == %s && tcp.src == 179", >>>>>>> + ip_ver, >>>>>>> + s_addr); >>>>>>> + ovn_lflow_add(lflows, ls_peer->od, >>>>>>> S_SWITCH_IN_CHECK_PORT_SEC, 81, >>>>>>> + ds_cstr(match), >>>>>>> + REGBIT_PORT_SEC_DROP " = >>>>>>> check_in_port_sec(); next;", >>>>>>> + ls_peer->lflow_ref); >>>>>> In your system test we set 'addresses = "unknown"' for >>>>>> 'bgp_port_name'. >>>>>> And I guess we kind of have to do it like that all the time >>>>>> when using >>>>>> this feature. I'm trying to think of scenarios in which >>>>>> users set >>>>>> LSP.addresses = 'unknown' and LSP.port-security = '<some- >>>>>> addresses>'. >>>>>> I'm not sure I saw those until now but maybe I'm wrong. >>>>> I'm also interested whether this LSP port with unknown >>>>> addresses will >>>>> get all the broadcast traffic? >>>>> >>>>> @Martin, didn't you look for ARP behavior? If we have one LS >>>>> with three >>>>> LSPs: >>>>> - 1 router type LSP with MAC 00:00:00:00:00:01 and IP >>>>> 169.254.0.1/30 >>>>> - one "vif" LSP with "unknown" addresses set and configured >>>>> underlying >>>>> OVS port with same MAC and IP set >>>>> - one localnet port for external connectivity. >>>>> >>>>> If some party from localnet side sends broadcast ARP request >>>>> "Who has >>>>> 169.254.0.1? Answer <...>", will this ARP request got flooded >>>>> to both >>>>> LRP and LSP with "unknown" addresses? If yes, will the sender >>>>> receive 2 >>>>> ARP replies? >>>>> >>>>>>> +} >>>>>>> + >>>>>>> +static void >>>>>>> +build_lrouter_bgp_redirect( >>>>>>> + struct ovn_port *op, struct lflow_table *lflows, >>>>>>> + struct ds *match, struct ds *actions) >>>>>>> +{ >>>>>>> + /* LRP has to have a peer.*/ >>>>>>> + if (op->peer == NULL) { >>>>>>> + return; >>>>>>> + } >>>>>>> + /* LRP has to have NB record.*/ >>>>>>> + if (op->nbrp == NULL) { >>>>>>> + return; >>>>>>> + } >>>>>>> + >>>>>>> + /* Proceed only for LRPs that have 'bgp-redirect' >>>>>>> option set. Value of this >>>>>>> + * option is the name of LSP to which a BGP traffic >>>>>>> will be redirected. >>>>>>> + */ >>>>>>> + const char *bgp_port = smap_get(&op->nbrp->options, >>>>>>> "bgp-redirect"); >>>>>>> + if (bgp_port == NULL) { >>>>>>> + return; >>>>>>> + } >>>>>>> + >>>>>>> + /* Ensure that LSP, to which the BGP traffic is >>>>>>> redirected, exists.*/ >>>>>>> + struct ovn_port *peer_lsp; >>>>>>> + bool bgp_port_exists = false; >>>>>>> + HMAP_FOR_EACH (peer_lsp, dp_node, &op->peer->od- >>>>>>>> ports) { >>>>>>> + size_t peer_lsp_s = strlen(peer_lsp->key); >>>>>>> + if (peer_lsp_s == strlen(bgp_port) >>>>>>> + && !strncmp(peer_lsp->key, bgp_port, >>>>>>> peer_lsp_s)){ >>>>>>> + bgp_port_exists = true; >>>>>>> + break; >>>>>>> + } >>>>>>> + } >>>>>>> + >>>>>>> + if (!bgp_port_exists) { >>>>>>> + static struct vlog_rate_limit rl = >>>>>>> VLOG_RATE_LIMIT_INIT(1, 5); >>>>>>> + VLOG_WARN_RL(&rl, "Option 'bgp-redirect' set on >>>>>>> Logical Router Port " >>>>>>> + "'%s' refers to non-existent >>>>>>> Logical Switch Port. " >>>>>>> + "BGP redirecting won't be >>>>>>> configured.", >>>>>>> + op->key); >>>>>>> + return; >>>>>>> + } >>>>>>> + >>>>>>> + if (op->cr_port != NULL) { >>>>>>> + static struct vlog_rate_limit rl = >>>>>>> VLOG_RATE_LIMIT_INIT(1, 5); >>>>>>> + VLOG_WARN_RL(&rl, "Option 'bgp-redirect' is not >>>>>>> supported on" >>>>>>> + " Distributed Gateway Port >>>>>>> '%s'", op->key); >>>>>>> + return; >>>>>>> + } >>>>>>> + >>>>>>> + /* Mirror traffic destined for LRP's IPs and default >>>>>>> BGP port >>>>>>> + * to the port defined in 'bgp-redirect' option. >>>>>>> + */ >>>>>>> + for (size_t i = 0; i < op->lrp_networks.n_ipv4_addrs; >>>>>>> i++) { >>>>>>> + const char *ip_s = op- >>>>>>>> lrp_networks.ipv4_addrs[i].addr_s; >>>>>>> + build_bgp_redirect_rule__(ip_s, bgp_port, false, >>>>>>> op->peer, lflows, >>>>>>> + match, actions); >>>>>>> + } I'm thinking more and more that having a "match" expression in a "NB.Logical_Router_Port.options:redirect-match" option it a good choice. LRP can have multiple IP addresses and enabling redirect for all of them could be implicit for the user and may contain security concerns. As an example: LR can have configured LB with tcp/179. If user configured peering IP-address (which can be in a link-local address range) and also added an IP address for Load Balancer in a `networks` list of a LRP. Enabling BGP redirect for all IPs of LRP will break load balancing for tcp/179. While it is not needed for load balancers to assign their VIP to networks, there is no validations, which prohibit doing so. It is still possible and can be difficult to catch such behavior for redirected traffic with Load Balancers. >>>>>>> + for (size_t i = 0; i < op->lrp_networks.n_ipv6_addrs; >>>>>>> i++) { >>>>>>> + const char *ip_s = op- >>>>>>>> lrp_networks.ipv6_addrs[i].addr_s; >>>>>>> + build_bgp_redirect_rule__(ip_s, bgp_port, true, >>>>>>> op->peer, lflows, >>>>>>> + match, actions); >>>>>>> + } >>>>>>> +} >>>>>>> + >>>>>>> /* This function adds ARP resolve flows related to a LSP. >>>>>>> */ >>>>>>> static void >>>>>>> build_arp_resolve_flows_for_lsp( >>>>>>> @@ -16003,6 +16110,7 @@ >>>>>>> build_lswitch_and_lrouter_iterate_by_lrp(struct ovn_port >>>>>>> *op, >>>>>>> lsi->meter_groups, op- >>>>>>>> lflow_ref); >>>>>>> build_lrouter_icmp_packet_toobig_admin_flows(op, lsi- >>>>>>>> lflows, &lsi->match, >>>>>>> &lsi- >>>>>>>> actions, op->lflow_ref); >>>>>>> + build_lrouter_bgp_redirect(op, lsi->lflows, &lsi- >>>>>>>> match, &lsi->actions); >>>>>>> } >>>>>>> >>>>>>> static void * >>>>>>> diff --git a/northd/ovn-northd.8.xml b/northd/ovn- >>>>>>> northd.8.xml >>>>>>> index b06b09ac5..4cbdf8f74 100644 >>>>>>> --- a/northd/ovn-northd.8.xml >>>>>>> +++ b/northd/ovn-northd.8.xml >>>>>>> @@ -284,6 +284,21 @@ >>>>>>> dropped in the next stage. >>>>>>> </li> >>>>>>> >>>>>>> + <li> >>>>>>> + For each logical port that's defined as a target >>>>>>> of BGP redirecting >>>>>>> + (via <code>bgp-redirect</code> option set on >>>>>>> Logical Router Port), a >>>>>>> + filter is set in place that disallows any traffic >>>>>>> entering this port >>>>>>> + that does not originate from Logical Router Port's >>>>>>> IPs and default BGP >>>>>>> + port (TCP 179). This filtering is achieved by two >>>>>>> rules. First rule has >>>>>>> + priority 81, it matches on <code>inport == >>>>>>> <var>BGP_MIRROR_PORT</var> >>>>>>> + && ip.src == <var>LRP_IP</var> && >>>>>>> tcp.src == 179</code> >>>>>>> + and allows traffic further with >>>>>>> <code>REGBIT_PORT_SEC_DROP" = >>>>>>> + check_in_port_sec(); next;</code>. Second rule >>>>>>> with priority 80 matches >>>>>>> + the rest of the traffic from that port and sets >>>>>>> + <code>REGBIT_PORT_SEC_DROP" = 1; next;"</code> so >>>>>>> that the packets are >>>>>>> + dropped in the next stage. >>>>>>> + </li> >>>>>>> + >>>>>>> <li> >>>>>>> For each (enabled) vtep logical port, a priority >>>>>>> 70 flow is added which >>>>>>> matches on all packets and applies the action >>>>>>> @@ -1949,6 +1964,14 @@ output; >>>>>>> on the logical switch. >>>>>>> </li> >>>>>>> >>>>>>> + <li> >>>>>>> + For any logical port that's defined as a target of >>>>>>> BGP redirecting (via >>>>>>> + <code>bgp-redirect</code> option set on Logical >>>>>>> Router Port), a >>>>>>> + priority-100 flow is added that redirects traffic >>>>>>> for Logical Router >>>>>>> + Port IPs (including IPv6 LLA) and TCP port 179, to >>>>>>> the targeted >>>>>>> + logical port. >>>>>>> + </li> >>>>>>> + >>>>>>> <li> >>>>>>> Priority-90 flows for transit switches that >>>>>>> forward registered >>>>>>> IP multicast traffic to their corresponding >>>>>>> multicast group , which >>>>>>> diff --git a/ovn-nb.xml b/ovn-nb.xml >>>>>>> index 9552534f6..2d2f5c9e0 100644 >>>>>>> --- a/ovn-nb.xml >>>>>>> +++ b/ovn-nb.xml >>>>>>> @@ -3440,6 +3440,20 @@ or >>>>>>> </p> >>>>>>> </column> >>>>>>> >>>>>>> + <column name="options" key="bgp-redirect" >>>>>>> type='{"type": "string"}'> >>>>>>> + <p> >>>>>>> + This option expects a name of a Logical Switch >>>>>>> Port that's present >>>>>>> + in the peer's Logical Switch. If set, it causes >>>>>>> any traffic >>>>>>> + that's destined for Logical Router Port's IP >>>>>>> addresses (including >>>>>>> + its IPv6 LLA) and the default BGP port (TCP >>>>>>> 179), to be redirected >>>>>>> + to the specified Logical Switch Port. >>>>>>> + >>>>>>> + This allows external BGP daemon to be bound to a >>>>>>> port in OVN's >>>>>>> + Logical Switch and act as if it was listening on >>>>>>> Logical Router >>>>>>> + Port's IP address. >>>>>>> + </p> >>>>>>> + </column> >>>>>>> + >>>>>>> <column name="options" key="gateway_mtu_bypass"> >>>>>>> <p> >>>>>>> When configured, represents a match expression, >>>>>>> in the same >>>>>>> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at >>>>>>> index a389d1988..df1e924ab 100644 >>>>>>> --- a/tests/ovn-northd.at >>>>>>> +++ b/tests/ovn-northd.at >>>>>>> @@ -12721,3 +12721,61 @@ AT_CHECK([ovn-sbctl dump-flows lr >>>>>>> | grep lr_in_dnat | ovn_strip_lflows], [0], [d >>>>>>> >>>>>>> AT_CLEANUP >>>>>>> ]) >>>>>>> + >>>>>>> +OVN_FOR_EACH_NORTHD_NO_HV([ >>>>>>> +AT_SETUP([BGP control plane redirect]) >>>>>>> +ovn_start >>>>>>> + >>>>>>> +check ovn-sbctl chassis-add hv1 geneve 127.0.0.1 >>>>>>> + >>>>>>> +check ovn-nbctl lr-add lr -- \ >>>>>>> + lrp-add lr lr-ls 02:ac:10:01:00:01 172.16.1.1/24 >>>>>>> +check ovn-nbctl --wait=sb set logical_router lr >>>>>>> options:chassis=hv1 >>>>>>> + >>>>>>> +check ovn-nbctl ls-add ls -- \ >>>>>>> + lsp-add ls ls-lr -- \ >>>>>>> + lsp-set-type ls-lr router -- \ >>>>>>> + lsp-set-addresses ls-lr router -- \ >>>>>>> + lsp-set-options ls-lr router-port=lr-ls >>>>>>> + >>>>>>> +check ovn-nbctl lsp-add ls lsp-bgp -- \ >>>>>>> + lsp-set-addresses lsp-bgp unknown >>>>>>> + >>>>>>> +# Function that ensures that no BGP redirect ruels are >>>>>>> installed. >>>>>>> +check_no_bgp_redirect() { >>>>>>> + AT_CHECK([ovn-sbctl dump-flows ls | grep >>>>>>> ls_in_l2_lkup | grep "tcp.dst == 179" | wc -l], [0], [0 >>>>>>> +]) >>>>>>> + >>>>>>> + AT_CHECK([ovn-sbctl dump-flows ls | grep >>>>>>> ls_in_check_port_sec | grep -E "priority=80|priority=81" | >>>>>>> wc -l], [0], [0 >>>>>>> +]) >>>>>>> +} >>>>>>> + >>>>>>> +# By default, no rules related to BGP redirect are present >>>>>>> +check_no_bgp_redirect >>>>>>> + >>>>>>> +# Set "lsp-bgp" port as target of BGP control plane >>>>>>> redirected traffic >>>>>>> +check ovn-nbctl --wait=sb set logical_router_port lr-ls >>>>>>> options:bgp-redirect=lsp-bgp >>>>>>> + >>>>>>> +# Check that BGP control plane traffic is redirected to >>>>>>> "lsp-bgp" >>>>>>> +AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_l2_lkup | >>>>>>> grep "tcp.dst == 179" | ovn_strip_lflows], [0], [dnl >>>>>>> + table=??(ls_in_l2_lkup ), priority=100 , >>>>>>> match=(ip4.dst == 172.16.1.1 && tcp.dst == 179), >>>>>>> action=(outport = "lsp-bgp"; output;) >>>>>>> + table=??(ls_in_l2_lkup ), priority=100 , >>>>>>> match=(ip6.dst == fe80::ac:10ff:fe01:1 && tcp.dst == 179), >>>>>>> action=(outport = "lsp-bgp"; output;) >>>>>>> +]) >>>>>>> + >>>>>>> +# Check that only BGP-related traffic is accepted on "lsp- >>>>>>> bgp" port >>>>>>> +AT_CHECK([ovn-sbctl dump-flows ls | grep >>>>>>> ls_in_check_port_sec | grep -E "priority=80|priority=81" | >>>>>>> ovn_strip_lflows], [0], [dnl >>>>>>> + table=??(ls_in_check_port_sec), priority=80 , >>>>>>> match=(inport == "lsp-bgp"), action=(reg0[[15]] = 1; next;) >>>>>>> + table=??(ls_in_check_port_sec), priority=81 , >>>>>>> match=(inport == "lsp-bgp" && ip4.src == 172.16.1.1 && >>>>>>> tcp.src == 179), action=(reg0[[15]] = check_in_port_sec(); >>>>>>> next;) >>>>>>> + table=??(ls_in_check_port_sec), priority=81 , >>>>>>> match=(inport == "lsp-bgp" && ip6.src == >>>>>>> fe80::ac:10ff:fe01:1 && tcp.src == 179), action=(reg0[[15]] >>>>>>> = check_in_port_sec(); next;) >>>>>>> +]) >>>>>>> + >>>>>>> +# Remove 'bgp-redirect' option from LRP and check that >>>>>>> rules are removed >>>>>>> +check ovn-nbctl --wait=sb remove logical_router_port lr-ls >>>>>>> options bgp-redirect >>>>>>> +check_no_bgp_redirect >>>>>>> + >>>>>>> +# Set non-existent LSP as target of 'bgp-redirect' and >>>>>>> check that no rules are added >>>>>>> +check ovn-nbctl --wait=sb set logical_router_port lr-ls >>>>>>> options:bgp-redirect=lsp-foo >>>>>>> +check_no_bgp_redirect >>>>>>> + >>>>>>> +AT_CLEANUP >>>>>>> +]) >>>>>>> diff --git a/tests/system-ovn.at b/tests/system-ovn.at >>>>>>> index c24ede7c5..c09c1136e 100644 >>>>>>> --- a/tests/system-ovn.at >>>>>>> +++ b/tests/system-ovn.at >>>>>>> @@ -13027,3 +13027,89 @@ >>>>>>> OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch- >>>>>>> .*/d >>>>>>> /connection dropped.*/d"]) >>>>>>> AT_CLEANUP >>>>>>> ]) >>>>>>> + >>>>>>> +OVN_FOR_EACH_NORTHD([ >>>>>>> +AT_SETUP([BGP control plane redirect]) >>>>>> Nit: probably best to add AT_SKIP_IF([test $HAVE_NC = no]) >>>>>> here. >>>>>> >>>>>>> +ovn_start >>>>>>> +OVS_TRAFFIC_VSWITCHD_START() >>>>>>> + >>>>>>> +ADD_BR([br-int]) >>>>>>> +ADD_BR([br-ext]) >>>>>>> + >>>>>>> +ovs-ofctl add-flow br-ext action=normal >>>>>>> +# Set external-ids in br-int needed for ovn-controller >>>>>>> +ovs-vsctl \ >>>>>>> + -- set Open_vSwitch . external-ids:system-id=hv1 \ >>>>>>> + -- set Open_vSwitch . external-ids:ovn- >>>>>>> remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \ >>>>>>> + -- set Open_vSwitch . external-ids:ovn-encap- >>>>>>> type=geneve \ >>>>>>> + -- set Open_vSwitch . external-ids:ovn-encap- >>>>>>> ip=169.0.0.1 \ >>>>>>> + -- set bridge br-int fail-mode=secure other- >>>>>>> config:disable-in-band=true >>>>>> Most ov*-ctl commands should be prefixed with "check" in this >>>>>> test. >>>>>> >>>>>>> + >>>>>>> +# Start ovn-controller >>>>>>> +start_daemon ovn-controller >>>>>>> + >>>>>>> +ovn-nbctl lr-add R1 \ >>>>>>> + -- set Logical_Router R1 options:chassis=hv1 >>>>>>> + >>>>>>> +ovn-nbctl ls-add public >>>>>>> + >>>>>>> +ovn-nbctl lrp-add R1 rp-public 00:00:02:01:02:03 >>>>>>> 172.16.1.1/24 >>>>>>> + >>>>>>> +ovn-nbctl lsp-add public public-rp -- set >>>>>>> Logical_Switch_Port public-rp \ >>>>>>> + type=router options:router-port=rp-public \ >>>>>>> + -- lsp-set-addresses public-rp router >>>>>>> + >>>>>>> +ovn-nbctl lsp-add public bgp-daemon \ >>>>>>> + -- lsp-set-addresses bgp-daemon unknown >>>>>>> + >>>>>>> +AT_CHECK([ovs-vsctl set Open_vSwitch . external-ids:ovn- >>>>>>> bridge-mappings=phynet:br-ext]) >>>>>>> +ovn-nbctl lsp-add public public1 \ >>>>>>> + -- lsp-set-addresses public1 unknown \ >>>>>>> + -- lsp-set-type public1 localnet \ >>>>>>> + -- lsp-set-options public1 network_name=phynet >>>>>>> + >>>>>>> +ovn-nbctl --wait=hv sync >>>>>>> + >>>>>>> +# Set option that redirects BGP control plane traffic to a >>>>>>> LSP "bgp-daemon" >>>>>>> +ovn-nbctl --wait=sb set logical_router_port rp-public >>>>>>> options:bgp-redirect=bgp-daemon >>>>>>> + >>>>>>> +# Create "bgp-daemon" interface in a namespace with IP and >>>>>>> MAC matching LRP "rp-public" >>>>>>> +ADD_NAMESPACES(bgp-daemon) >>>>>>> +ADD_VETH(bgp-daemon, bgp-daemon, br-int, "172.16.1.1/24", >>>>>>> "00:00:02:01:02:03") >>>>>>> + >>>>>>> +ADD_NAMESPACES(ext-foo) >>>>>>> +ADD_VETH(ext-foo, ext-foo, br-ext, "172.16.1.100/24", >>>>>>> "00:10:10:01:02:13", \ >>>>>>> + "172.16.1.1") >>>>>>> + >>>>>>> +# Flip the interface down/up to get proper IPv6 LLA >>>>>>> +NS_EXEC([bgp-daemon], [ip link set down bgp-daemon]) >>>>>>> +NS_EXEC([bgp-daemon], [ip link set up bgp-daemon]) >>>>>>> + >>>>>>> +# Wait until IPv6 LLA loses the "tentative" flag otherwise >>>>>>> it can't be bound to. >>>>>>> +OVS_WAIT_UNTIL([NS_EXEC([bgp-daemon], [ip a show dev bgp- >>>>>>> daemon | grep "fe80::" | grep -v tentative])]) >>>>>>> +OVS_WAIT_UNTIL([NS_EXEC([ext-foo], [ip a show dev ext-foo >>>>>>> | grep "fe80::" | grep -v tentative])]) >>>>>>> + >>>>>>> +# Verify that BGP control plane traffic is delivered to >>>>>>> the "bgp-daemon" >>>>>>> +# interface on both IPv4 and IPv6 LLA addresses >>>>>>> +NETNS_DAEMONIZE([bgp-daemon], [nc -l -k 172.16.1.1 179], >>>>>>> [bgp_v4.pid]) >>>>>>> +NS_CHECK_EXEC([ext-foo], [echo "TCP test" | nc --send-only >>>>>>> 172.16.1.1 179]) >>>>>>> + >>>>>>> +NETNS_DAEMONIZE([bgp-daemon], [nc -l -6 -k >>>>>>> fe80::200:2ff:fe01:203%bgp-daemon 179], [bgp_v6.pid]) >>>>>>> +NS_CHECK_EXEC([ext-foo], [echo "TCP test" | nc --send-only >>>>>>> -6 fe80::200:2ff:fe01:203%ext-foo 179]) >>>>>>> + >>>>>>> +OVS_APP_EXIT_AND_WAIT([ovn-controller]) >>>>>>> + >>>>>>> +as ovn-sb >>>>>>> +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >>>>>>> + >>>>>>> +as ovn-nb >>>>>>> +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >>>>>>> + >>>>>>> +as northd >>>>>>> +OVS_APP_EXIT_AND_WAIT([ovn-northd]) >>>>>>> + >>>>>>> +as >>>>>>> +OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d >>>>>>> +/.*terminating with signal 15.*/d"]) >>>>>>> +AT_CLEANUP >>>>>>> +]) >>>>>> Regards, >>>>>> Dumitru >>>>>> >>>> _______________________________________________ >>>> dev mailing list >>>> dev@openvswitch.org >>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
On 8/8/24 11:14, Vladislav Odintsov wrote: > > On 08.08.2024 15:51, martin.kalcok@canonical.com wrote: >> On Thu, 2024-08-08 at 10:46 +0200, Dumitru Ceara wrote: >>> On 8/8/24 10:42, Vladislav Odintsov wrote: >>>> On 08.08.2024 03:16, Dumitru Ceara wrote: >>>>> Re-adding the dev list. >>>>> >>>>> On 8/7/24 18:12, Vladislav Odintsov wrote: >>>>>> Hi Dumitru, >>>>>> >>>>> Hi Vladislav, >>>>> >>>>>> I'd like to add some thoughts to your inputs. >>>>>> >>>>> Thanks for that, I added some more of my own below. :) >>>>> >>>>>> On 06.08.2024 19:23, Dumitru Ceara wrote: >>>>>>> Hi Martin, >>>>>>> >>>>>>> Sorry, I was reviewing V3 but I was slow in actually sending >>>>>>> out the email. >>>>>>> >>>>>>> On 8/6/24 13:17, Martin Kalcok wrote: >>>>>>>> This change adds a 'bgp-redirect' option to LRP that allows >>>>>>>> redirecting of BGP control plane traffic to an arbitrary >>>>>>>> LSP >>>>>>>> in its peer LS. >>>>>>>> >>>>>>>> The option expects a string with a LSP name. When set, >>>>>>>> any traffic entering LS that's destined for any of the >>>>>>>> LRP's IP addresses (including IPv6 LLA) is redirected >>>>>>>> to the LSP specified in the option's value. >>>>>>>> >>>>>>>> This enables external BGP daemons to listen on an interface >>>>>>>> bound to a LSP and effectively act as if they were >>>>>>>> listening >>>>>>>> on (and speaking from) LRP's IP address. >>>>>>>> >>>>>>>> Signed-off-by: Martin Kalcok <martin.kalcok@canonical.com> >>>>>>>> --- >>>>>>> Strictly on this version of the patch, and with the thought >>>>>>> in mind that >>>>>>> we might want to consider this feature experimental [0] and >>>>>>> maybe change >>>>>>> it (NB too) in the future, I left a few comments inline. >>>>>>> With those >>>>>>> addressed I think the patch looks OK to me. >>>>>>> >>>>>>> [0] >>>>>>> https://patchwork.ozlabs.org/project/ovn/patch/20240802155449.3528777-1-mmichels@redhat.com/ >>>>>>> >>>>>>> Now, for the point when we'd remove the "experimental" tag: >>>>>>> >>>>>>> In general, it makes me a bit uneasy that we have to share >>>>>>> the MAC and >>>>>>> IP between the LRP and the VIF of logical switch port that's >>>>>>> connected >>>>>>> to the same switch as the LRP. >>>>>>> >>>>>>> I was thinking of alternatives for the future and the only >>>>>>> things I >>>>>>> could come up with until now are: >>>>>>> >>>>>>> 1. Create a separate, "disconnected" logical switch: >>>>>>> ovn-nbctl ls-add bgp >>>>>>> >>>>>>> Add the bgp-daemon port to it: >>>>>>> ovn-nbctl lsp-add bgp bgp-daemon. >>>>>>> >>>>>>> Then we don't need "unknown" either, I think. >>>>>>> >>>>>>> But it's not possible today to move packets from the ingress >>>>>>> pipeline of >>>>>>> a logical switch ("public" in this test) to the egress >>>>>>> pipeline of a >>>>>>> different logical switch ("bgp" in my example). It also >>>>>>> feels icky to >>>>>>> implement that. >>>>>>> >>>>>>> 2. Add the ability to bind an OVS port directly to a logical >>>>>>> router port: >>>>>>> >>>>>>> then we could do the same type of redirect you do here but >>>>>>> instead in >>>>>>> the logical router pipeline. The advantage is we don't have >>>>>>> to drop any >>>>>>> non-bgp traffic in the switch pipeline. The switch keeps >>>>>>> functioning as >>>>>>> it does today. >>>>>>> >>>>>>> Maybe an advantage of this second alternative would be that >>>>>>> we can >>>>>>> easily attach a filtering option to this LRP (e.g., >>>>>>> LRP.options:control-traffic-filter) to allow us to be more >>>>>>> flexible in >>>>>>> what kind of traffic we forward to the actuall routing >>>>>>> protocol daemon >>>>>>> that runs behind that OVS port - Vladislav also mentioned >>>>>>> during the >>>>>>> meeting it might be interesting to forward BFD packets to the >>>>>>> FRR (or >>>>>>> whatever application implements the routing protocol) >>>>>>> instance too. >>>>>> The idea to be able to bind LRP to OVS port sounds very >>>>>> interesting to >>>>>> me. But with a note that it's not a pure "bind", but a partial: >>>>>> as you >>>>>> wrote with some "filter" applied to pass not all the traffic. >>>>>> The main >>>>>> idea here is to pass only control plane traffic destined to >>>>>> LRP's >>>>> As we're discussing on the other thread (Martin pointed it out) >>>>> we also >>>>> probably need to involve conntrack and allow replies to any kind >>>>> of >>>>> traffic initiated by the entity behind the LRP's VIF (e.g., BGP >>>>> sessions >>>>> initiated from the host). >>>>> >>>>>> addresses. Other than that seems odd. Transit traffic should be >>>>>> remained >>>>>> in LR pipeline otherwise it will have no difference with a >>>>>> regular VIF LSP. >>>>>> >>>>> Definitely, traffic that needs to be DNATed (DNAT/unSNAT rules or >>>>> LB >>>>> rules that will change the destination IP from LRP IP to >>>>> something else) >>>>> should not be affected. All other "transit" traffic doesn't have >>>>> LRP >>>>> IP, does it? >>>> You're right. >>>> >>>> @Dumitru, @Martin, what if we just redirect all traffic destined to >>>> LRP >>>> IPs to redirect port? Is there any drawbacks? >>>> For security it is possible to optionally use ACLs with or without >>>> conntrack. It's up to user/CMS. >>>> >>>> This seems quite simple in the code and very flexible. So no >>>> additional >>>> flows seems to be needed in future to support any other routing >>>> protocols (or for another possible non-routing usecases). >>>> >>> Won't this break all "transit" traffic that has destination IP the >>> LRP >>> IP (DNAT, LB), etc? >>> >> I'm afraid so. I guess I was just tired yesterday when I made that >> proposal for general redirect. Given that the redirect is implemented >> in the LS pipeline, it would eat up all the traffic for LRP's IP before >> the DNAT/unSNAT occurs in the LR pipeline. I'll give it a quick test in >> case I'm wrong here again. > Oops, this will definitely break that cases. For some reason I thought > only about routing cases. > >> If that's the case, I'll try the conntrack approach and update you all >> asap. >> >> Martin. >> >>>>>> Having a filter (or match like in ACL or >>>>>> Logical_Router_Policies) looks >>>>>> more flexible in terms of used protocols. This can coexist with >>>>>> proposal >>>>>> from current patch, where the flow is not programmable from >>>>>> user. >>>>>> >>>>> I think so too. >>>>> Regards, >>>>> Dumitru >>>>> >>>>>>> But again, if we consider the current feature "experimental", >>>>>>> we can >>>>>>> spend more time during the next release cycle to figure out >>>>>>> what's best. >>>>>>> >>>>>>>> v4 of this patch renames the feature from "bgp-mirror" to >>>>>>>> "bgp-redirect" >>>>>>>> as discussed during community meeting. >>>>>>>> >>>>>>>> northd/northd.c | 108 >>>>>>>> ++++++++++++++++++++++++++++++++++++++++ >>>>>>>> northd/ovn-northd.8.xml | 23 +++++++++ >>>>>>>> ovn-nb.xml | 14 ++++++ >>>>>>>> tests/ovn-northd.at | 58 +++++++++++++++++++++ >>>>>>>> tests/system-ovn.at | 86 >>>>>>>> ++++++++++++++++++++++++++++++++ >>>>>>>> 5 files changed, 289 insertions(+) >>>>>>>> >>>>>>>> diff --git a/northd/northd.c b/northd/northd.c >>>>>>>> index 4353df07d..088104f25 100644 >>>>>>>> --- a/northd/northd.c >>>>>>>> +++ b/northd/northd.c >>>>>>>> @@ -13048,6 +13048,113 @@ >>>>>>>> build_arp_resolve_flows_for_lrp(struct ovn_port *op, >>>>>>>> } >>>>>>>> } >>>>>>>> >>>>>>>> +static void >>>>>>>> +build_bgp_redirect_rule__( >>>>>>>> + const char *s_addr, const char *bgp_port_name, >>>>>>>> bool is_ipv6, >>>>>>>> + struct ovn_port *ls_peer, struct lflow_table >>>>>>>> *lflows, >>>>>>>> + struct ds *match, struct ds *actions) >>>>>>>> +{ >>>>>>>> + int ip_ver = is_ipv6 ? 6 : 4; >>>>>>>> + /* Redirect packets in the input pipeline destined for >>>>>>>> LR's IP to >>>>>>>> + * the port specified in 'bgp-redirect' option. >>>>>>>> + */ >>>>>>>> + ds_clear(match); >>>>>>>> + ds_clear(actions); >>>>>>>> + ds_put_format(match, "ip%d.dst == %s && tcp.dst == >>>>>>>> 179", ip_ver, s_addr); >>>>>>>> + ds_put_format(actions, "outport = \"%s\"; output;", >>>>>>>> bgp_port_name); >>>>>>>> + ovn_lflow_add(lflows, ls_peer->od, >>>>>>>> S_SWITCH_IN_L2_LKUP, 100, >>>>>>>> + ds_cstr(match), >>>>>>>> + ds_cstr(actions), >>>>>>>> + ls_peer->lflow_ref); >>>>>>>> + >>>>>>>> + >>>>>>>> + /* Drop any traffic originating from 'bgp-redirect' >>>>>>>> port that does >>>>>>>> + * not originate from BGP daemon port. This blocks >>>>>>>> unnecessary >>>>>>>> + * traffic like ARP broadcasts or IPv6 router >>>>>>>> solicitation packets >>>>>>>> + * from the dummy 'bgp-redirect' port. >>>>>>>> + */ >>>>>>>> + ds_clear(match); >>>>>>>> + ds_put_format(match, "inport == \"%s\"", >>>>>>>> bgp_port_name); >>>>>>>> + ovn_lflow_add(lflows, ls_peer->od, >>>>>>>> S_SWITCH_IN_CHECK_PORT_SEC, 80, >>>>>>>> + ds_cstr(match), >>>>>>>> + REGBIT_PORT_SEC_DROP " = 1; next;", >>>>>>>> + ls_peer->lflow_ref); >>>>>>> Not a huge deal but this logical flow gets regenerated >>>>>>> multiple times >>>>>>> for each bgp_port_name (for each LRP address). In the end >>>>>>> we'll only >>>>>>> install one in the database so there's no impact on SB. But >>>>>>> maybe we >>>>>>> should consider moving this part out of this function. >>>>>>> >>>>>>>> + >>>>>>>> + ds_put_format(match, >>>>>>>> + " && ip%d.src == %s && tcp.src == 179", >>>>>>>> + ip_ver, >>>>>>>> + s_addr); >>>>>>>> + ovn_lflow_add(lflows, ls_peer->od, >>>>>>>> S_SWITCH_IN_CHECK_PORT_SEC, 81, >>>>>>>> + ds_cstr(match), >>>>>>>> + REGBIT_PORT_SEC_DROP " = >>>>>>>> check_in_port_sec(); next;", >>>>>>>> + ls_peer->lflow_ref); >>>>>>> In your system test we set 'addresses = "unknown"' for >>>>>>> 'bgp_port_name'. >>>>>>> And I guess we kind of have to do it like that all the time >>>>>>> when using >>>>>>> this feature. I'm trying to think of scenarios in which >>>>>>> users set >>>>>>> LSP.addresses = 'unknown' and LSP.port-security = '<some- >>>>>>> addresses>'. >>>>>>> I'm not sure I saw those until now but maybe I'm wrong. >>>>>> I'm also interested whether this LSP port with unknown >>>>>> addresses will >>>>>> get all the broadcast traffic? >>>>>> >>>>>> @Martin, didn't you look for ARP behavior? If we have one LS >>>>>> with three >>>>>> LSPs: >>>>>> - 1 router type LSP with MAC 00:00:00:00:00:01 and IP >>>>>> 169.254.0.1/30 >>>>>> - one "vif" LSP with "unknown" addresses set and configured >>>>>> underlying >>>>>> OVS port with same MAC and IP set >>>>>> - one localnet port for external connectivity. >>>>>> >>>>>> If some party from localnet side sends broadcast ARP request >>>>>> "Who has >>>>>> 169.254.0.1? Answer <...>", will this ARP request got flooded >>>>>> to both >>>>>> LRP and LSP with "unknown" addresses? If yes, will the sender >>>>>> receive 2 >>>>>> ARP replies? >>>>>> >>>>>>>> +} >>>>>>>> + >>>>>>>> +static void >>>>>>>> +build_lrouter_bgp_redirect( >>>>>>>> + struct ovn_port *op, struct lflow_table *lflows, >>>>>>>> + struct ds *match, struct ds *actions) >>>>>>>> +{ >>>>>>>> + /* LRP has to have a peer.*/ >>>>>>>> + if (op->peer == NULL) { >>>>>>>> + return; >>>>>>>> + } >>>>>>>> + /* LRP has to have NB record.*/ >>>>>>>> + if (op->nbrp == NULL) { >>>>>>>> + return; >>>>>>>> + } >>>>>>>> + >>>>>>>> + /* Proceed only for LRPs that have 'bgp-redirect' >>>>>>>> option set. Value of this >>>>>>>> + * option is the name of LSP to which a BGP traffic >>>>>>>> will be redirected. >>>>>>>> + */ >>>>>>>> + const char *bgp_port = smap_get(&op->nbrp->options, >>>>>>>> "bgp-redirect"); >>>>>>>> + if (bgp_port == NULL) { >>>>>>>> + return; >>>>>>>> + } >>>>>>>> + >>>>>>>> + /* Ensure that LSP, to which the BGP traffic is >>>>>>>> redirected, exists.*/ >>>>>>>> + struct ovn_port *peer_lsp; >>>>>>>> + bool bgp_port_exists = false; >>>>>>>> + HMAP_FOR_EACH (peer_lsp, dp_node, &op->peer->od- >>>>>>>>> ports) { >>>>>>>> + size_t peer_lsp_s = strlen(peer_lsp->key); >>>>>>>> + if (peer_lsp_s == strlen(bgp_port) >>>>>>>> + && !strncmp(peer_lsp->key, bgp_port, >>>>>>>> peer_lsp_s)){ >>>>>>>> + bgp_port_exists = true; >>>>>>>> + break; >>>>>>>> + } >>>>>>>> + } >>>>>>>> + >>>>>>>> + if (!bgp_port_exists) { >>>>>>>> + static struct vlog_rate_limit rl = >>>>>>>> VLOG_RATE_LIMIT_INIT(1, 5); >>>>>>>> + VLOG_WARN_RL(&rl, "Option 'bgp-redirect' set on >>>>>>>> Logical Router Port " >>>>>>>> + "'%s' refers to non-existent >>>>>>>> Logical Switch Port. " >>>>>>>> + "BGP redirecting won't be >>>>>>>> configured.", >>>>>>>> + op->key); >>>>>>>> + return; >>>>>>>> + } >>>>>>>> + >>>>>>>> + if (op->cr_port != NULL) { >>>>>>>> + static struct vlog_rate_limit rl = >>>>>>>> VLOG_RATE_LIMIT_INIT(1, 5); >>>>>>>> + VLOG_WARN_RL(&rl, "Option 'bgp-redirect' is not >>>>>>>> supported on" >>>>>>>> + " Distributed Gateway Port >>>>>>>> '%s'", op->key); >>>>>>>> + return; >>>>>>>> + } >>>>>>>> + >>>>>>>> + /* Mirror traffic destined for LRP's IPs and default >>>>>>>> BGP port >>>>>>>> + * to the port defined in 'bgp-redirect' option. >>>>>>>> + */ >>>>>>>> + for (size_t i = 0; i < op->lrp_networks.n_ipv4_addrs; >>>>>>>> i++) { >>>>>>>> + const char *ip_s = op- >>>>>>>>> lrp_networks.ipv4_addrs[i].addr_s; >>>>>>>> + build_bgp_redirect_rule__(ip_s, bgp_port, false, >>>>>>>> op->peer, lflows, >>>>>>>> + match, actions); >>>>>>>> + } > > I'm thinking more and more that having a "match" expression in a > "NB.Logical_Router_Port.options:redirect-match" option it a good choice. > Would the match be enforced only for redirected traffic (from outside to bgp target LSP)? > LRP can have multiple IP addresses and enabling redirect for all of them > could be implicit for the user and may contain security concerns. As an > example: LR can have configured LB with tcp/179. If user configured > peering IP-address (which can be in a link-local address range) and also > added an IP address for Load Balancer in a `networks` list of a LRP. > Enabling BGP redirect for all IPs of LRP will break load balancing for > tcp/179. While it is not needed for load balancers to assign their VIP > to networks, there is no validations, which prohibit doing so. It is I think this is actually something that ovn-kubernetes does in their deployments for cluster-IP services. And I suspect that today there's nothing preventing users from creating cluster-IP services for TCP/179. > still possible and can be difficult to catch such behavior for > redirected traffic with Load Balancers. > >>>>>>>> + for (size_t i = 0; i < op->lrp_networks.n_ipv6_addrs; >>>>>>>> i++) { >>>>>>>> + const char *ip_s = op- >>>>>>>>> lrp_networks.ipv6_addrs[i].addr_s; >>>>>>>> + build_bgp_redirect_rule__(ip_s, bgp_port, true, >>>>>>>> op->peer, lflows, >>>>>>>> + match, actions); >>>>>>>> + } >>>>>>>> +} >>>>>>>> + >>>>>>>> /* This function adds ARP resolve flows related to a LSP. >>>>>>>> */ >>>>>>>> static void >>>>>>>> build_arp_resolve_flows_for_lsp( >>>>>>>> @@ -16003,6 +16110,7 @@ >>>>>>>> build_lswitch_and_lrouter_iterate_by_lrp(struct ovn_port >>>>>>>> *op, >>>>>>>> lsi->meter_groups, op- >>>>>>>>> lflow_ref); >>>>>>>> build_lrouter_icmp_packet_toobig_admin_flows(op, lsi- >>>>>>>>> lflows, &lsi->match, >>>>>>>> &lsi- >>>>>>>>> actions, op->lflow_ref); >>>>>>>> + build_lrouter_bgp_redirect(op, lsi->lflows, &lsi- >>>>>>>>> match, &lsi->actions); >>>>>>>> } >>>>>>>> >>>>>>>> static void * >>>>>>>> diff --git a/northd/ovn-northd.8.xml b/northd/ovn- >>>>>>>> northd.8.xml >>>>>>>> index b06b09ac5..4cbdf8f74 100644 >>>>>>>> --- a/northd/ovn-northd.8.xml >>>>>>>> +++ b/northd/ovn-northd.8.xml >>>>>>>> @@ -284,6 +284,21 @@ >>>>>>>> dropped in the next stage. >>>>>>>> </li> >>>>>>>> >>>>>>>> + <li> >>>>>>>> + For each logical port that's defined as a target >>>>>>>> of BGP redirecting >>>>>>>> + (via <code>bgp-redirect</code> option set on >>>>>>>> Logical Router Port), a >>>>>>>> + filter is set in place that disallows any traffic >>>>>>>> entering this port >>>>>>>> + that does not originate from Logical Router Port's >>>>>>>> IPs and default BGP >>>>>>>> + port (TCP 179). This filtering is achieved by two >>>>>>>> rules. First rule has >>>>>>>> + priority 81, it matches on <code>inport == >>>>>>>> <var>BGP_MIRROR_PORT</var> >>>>>>>> + && ip.src == <var>LRP_IP</var> && >>>>>>>> tcp.src == 179</code> >>>>>>>> + and allows traffic further with >>>>>>>> <code>REGBIT_PORT_SEC_DROP" = >>>>>>>> + check_in_port_sec(); next;</code>. Second rule >>>>>>>> with priority 80 matches >>>>>>>> + the rest of the traffic from that port and sets >>>>>>>> + <code>REGBIT_PORT_SEC_DROP" = 1; next;"</code> so >>>>>>>> that the packets are >>>>>>>> + dropped in the next stage. >>>>>>>> + </li> >>>>>>>> + >>>>>>>> <li> >>>>>>>> For each (enabled) vtep logical port, a priority >>>>>>>> 70 flow is added which >>>>>>>> matches on all packets and applies the action >>>>>>>> @@ -1949,6 +1964,14 @@ output; >>>>>>>> on the logical switch. >>>>>>>> </li> >>>>>>>> >>>>>>>> + <li> >>>>>>>> + For any logical port that's defined as a target of >>>>>>>> BGP redirecting (via >>>>>>>> + <code>bgp-redirect</code> option set on Logical >>>>>>>> Router Port), a >>>>>>>> + priority-100 flow is added that redirects traffic >>>>>>>> for Logical Router >>>>>>>> + Port IPs (including IPv6 LLA) and TCP port 179, to >>>>>>>> the targeted >>>>>>>> + logical port. >>>>>>>> + </li> >>>>>>>> + >>>>>>>> <li> >>>>>>>> Priority-90 flows for transit switches that >>>>>>>> forward registered >>>>>>>> IP multicast traffic to their corresponding >>>>>>>> multicast group , which >>>>>>>> diff --git a/ovn-nb.xml b/ovn-nb.xml >>>>>>>> index 9552534f6..2d2f5c9e0 100644 >>>>>>>> --- a/ovn-nb.xml >>>>>>>> +++ b/ovn-nb.xml >>>>>>>> @@ -3440,6 +3440,20 @@ or >>>>>>>> </p> >>>>>>>> </column> >>>>>>>> >>>>>>>> + <column name="options" key="bgp-redirect" >>>>>>>> type='{"type": "string"}'> >>>>>>>> + <p> >>>>>>>> + This option expects a name of a Logical Switch >>>>>>>> Port that's present >>>>>>>> + in the peer's Logical Switch. If set, it causes >>>>>>>> any traffic >>>>>>>> + that's destined for Logical Router Port's IP >>>>>>>> addresses (including >>>>>>>> + its IPv6 LLA) and the default BGP port (TCP >>>>>>>> 179), to be redirected >>>>>>>> + to the specified Logical Switch Port. >>>>>>>> + >>>>>>>> + This allows external BGP daemon to be bound to a >>>>>>>> port in OVN's >>>>>>>> + Logical Switch and act as if it was listening on >>>>>>>> Logical Router >>>>>>>> + Port's IP address. >>>>>>>> + </p> >>>>>>>> + </column> >>>>>>>> + >>>>>>>> <column name="options" key="gateway_mtu_bypass"> >>>>>>>> <p> >>>>>>>> When configured, represents a match expression, >>>>>>>> in the same >>>>>>>> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at >>>>>>>> index a389d1988..df1e924ab 100644 >>>>>>>> --- a/tests/ovn-northd.at >>>>>>>> +++ b/tests/ovn-northd.at >>>>>>>> @@ -12721,3 +12721,61 @@ AT_CHECK([ovn-sbctl dump-flows lr >>>>>>>> | grep lr_in_dnat | ovn_strip_lflows], [0], [d >>>>>>>> >>>>>>>> AT_CLEANUP >>>>>>>> ]) >>>>>>>> + >>>>>>>> +OVN_FOR_EACH_NORTHD_NO_HV([ >>>>>>>> +AT_SETUP([BGP control plane redirect]) >>>>>>>> +ovn_start >>>>>>>> + >>>>>>>> +check ovn-sbctl chassis-add hv1 geneve 127.0.0.1 >>>>>>>> + >>>>>>>> +check ovn-nbctl lr-add lr -- \ >>>>>>>> + lrp-add lr lr-ls 02:ac:10:01:00:01 172.16.1.1/24 >>>>>>>> +check ovn-nbctl --wait=sb set logical_router lr >>>>>>>> options:chassis=hv1 >>>>>>>> + >>>>>>>> +check ovn-nbctl ls-add ls -- \ >>>>>>>> + lsp-add ls ls-lr -- \ >>>>>>>> + lsp-set-type ls-lr router -- \ >>>>>>>> + lsp-set-addresses ls-lr router -- \ >>>>>>>> + lsp-set-options ls-lr router-port=lr-ls >>>>>>>> + >>>>>>>> +check ovn-nbctl lsp-add ls lsp-bgp -- \ >>>>>>>> + lsp-set-addresses lsp-bgp unknown >>>>>>>> + >>>>>>>> +# Function that ensures that no BGP redirect ruels are >>>>>>>> installed. >>>>>>>> +check_no_bgp_redirect() { >>>>>>>> + AT_CHECK([ovn-sbctl dump-flows ls | grep >>>>>>>> ls_in_l2_lkup | grep "tcp.dst == 179" | wc -l], [0], [0 >>>>>>>> +]) >>>>>>>> + >>>>>>>> + AT_CHECK([ovn-sbctl dump-flows ls | grep >>>>>>>> ls_in_check_port_sec | grep -E "priority=80|priority=81" | >>>>>>>> wc -l], [0], [0 >>>>>>>> +]) >>>>>>>> +} >>>>>>>> + >>>>>>>> +# By default, no rules related to BGP redirect are present >>>>>>>> +check_no_bgp_redirect >>>>>>>> + >>>>>>>> +# Set "lsp-bgp" port as target of BGP control plane >>>>>>>> redirected traffic >>>>>>>> +check ovn-nbctl --wait=sb set logical_router_port lr-ls >>>>>>>> options:bgp-redirect=lsp-bgp >>>>>>>> + >>>>>>>> +# Check that BGP control plane traffic is redirected to >>>>>>>> "lsp-bgp" >>>>>>>> +AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_l2_lkup | >>>>>>>> grep "tcp.dst == 179" | ovn_strip_lflows], [0], [dnl >>>>>>>> + table=??(ls_in_l2_lkup ), priority=100 , >>>>>>>> match=(ip4.dst == 172.16.1.1 && tcp.dst == 179), >>>>>>>> action=(outport = "lsp-bgp"; output;) >>>>>>>> + table=??(ls_in_l2_lkup ), priority=100 , >>>>>>>> match=(ip6.dst == fe80::ac:10ff:fe01:1 && tcp.dst == 179), >>>>>>>> action=(outport = "lsp-bgp"; output;) >>>>>>>> +]) >>>>>>>> + >>>>>>>> +# Check that only BGP-related traffic is accepted on "lsp- >>>>>>>> bgp" port >>>>>>>> +AT_CHECK([ovn-sbctl dump-flows ls | grep >>>>>>>> ls_in_check_port_sec | grep -E "priority=80|priority=81" | >>>>>>>> ovn_strip_lflows], [0], [dnl >>>>>>>> + table=??(ls_in_check_port_sec), priority=80 , >>>>>>>> match=(inport == "lsp-bgp"), action=(reg0[[15]] = 1; next;) >>>>>>>> + table=??(ls_in_check_port_sec), priority=81 , >>>>>>>> match=(inport == "lsp-bgp" && ip4.src == 172.16.1.1 && >>>>>>>> tcp.src == 179), action=(reg0[[15]] = check_in_port_sec(); >>>>>>>> next;) >>>>>>>> + table=??(ls_in_check_port_sec), priority=81 , >>>>>>>> match=(inport == "lsp-bgp" && ip6.src == >>>>>>>> fe80::ac:10ff:fe01:1 && tcp.src == 179), action=(reg0[[15]] >>>>>>>> = check_in_port_sec(); next;) >>>>>>>> +]) >>>>>>>> + >>>>>>>> +# Remove 'bgp-redirect' option from LRP and check that >>>>>>>> rules are removed >>>>>>>> +check ovn-nbctl --wait=sb remove logical_router_port lr-ls >>>>>>>> options bgp-redirect >>>>>>>> +check_no_bgp_redirect >>>>>>>> + >>>>>>>> +# Set non-existent LSP as target of 'bgp-redirect' and >>>>>>>> check that no rules are added >>>>>>>> +check ovn-nbctl --wait=sb set logical_router_port lr-ls >>>>>>>> options:bgp-redirect=lsp-foo >>>>>>>> +check_no_bgp_redirect >>>>>>>> + >>>>>>>> +AT_CLEANUP >>>>>>>> +]) >>>>>>>> diff --git a/tests/system-ovn.at b/tests/system-ovn.at >>>>>>>> index c24ede7c5..c09c1136e 100644 >>>>>>>> --- a/tests/system-ovn.at >>>>>>>> +++ b/tests/system-ovn.at >>>>>>>> @@ -13027,3 +13027,89 @@ >>>>>>>> OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch- >>>>>>>> .*/d >>>>>>>> /connection dropped.*/d"]) >>>>>>>> AT_CLEANUP >>>>>>>> ]) >>>>>>>> + >>>>>>>> +OVN_FOR_EACH_NORTHD([ >>>>>>>> +AT_SETUP([BGP control plane redirect]) >>>>>>> Nit: probably best to add AT_SKIP_IF([test $HAVE_NC = no]) >>>>>>> here. >>>>>>> >>>>>>>> +ovn_start >>>>>>>> +OVS_TRAFFIC_VSWITCHD_START() >>>>>>>> + >>>>>>>> +ADD_BR([br-int]) >>>>>>>> +ADD_BR([br-ext]) >>>>>>>> + >>>>>>>> +ovs-ofctl add-flow br-ext action=normal >>>>>>>> +# Set external-ids in br-int needed for ovn-controller >>>>>>>> +ovs-vsctl \ >>>>>>>> + -- set Open_vSwitch . external-ids:system-id=hv1 \ >>>>>>>> + -- set Open_vSwitch . external-ids:ovn- >>>>>>>> remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \ >>>>>>>> + -- set Open_vSwitch . external-ids:ovn-encap- >>>>>>>> type=geneve \ >>>>>>>> + -- set Open_vSwitch . external-ids:ovn-encap- >>>>>>>> ip=169.0.0.1 \ >>>>>>>> + -- set bridge br-int fail-mode=secure other- >>>>>>>> config:disable-in-band=true >>>>>>> Most ov*-ctl commands should be prefixed with "check" in this >>>>>>> test. >>>>>>> >>>>>>>> + >>>>>>>> +# Start ovn-controller >>>>>>>> +start_daemon ovn-controller >>>>>>>> + >>>>>>>> +ovn-nbctl lr-add R1 \ >>>>>>>> + -- set Logical_Router R1 options:chassis=hv1 >>>>>>>> + >>>>>>>> +ovn-nbctl ls-add public >>>>>>>> + >>>>>>>> +ovn-nbctl lrp-add R1 rp-public 00:00:02:01:02:03 >>>>>>>> 172.16.1.1/24 >>>>>>>> + >>>>>>>> +ovn-nbctl lsp-add public public-rp -- set >>>>>>>> Logical_Switch_Port public-rp \ >>>>>>>> + type=router options:router-port=rp-public \ >>>>>>>> + -- lsp-set-addresses public-rp router >>>>>>>> + >>>>>>>> +ovn-nbctl lsp-add public bgp-daemon \ >>>>>>>> + -- lsp-set-addresses bgp-daemon unknown >>>>>>>> + >>>>>>>> +AT_CHECK([ovs-vsctl set Open_vSwitch . external-ids:ovn- >>>>>>>> bridge-mappings=phynet:br-ext]) >>>>>>>> +ovn-nbctl lsp-add public public1 \ >>>>>>>> + -- lsp-set-addresses public1 unknown \ >>>>>>>> + -- lsp-set-type public1 localnet \ >>>>>>>> + -- lsp-set-options public1 network_name=phynet >>>>>>>> + >>>>>>>> +ovn-nbctl --wait=hv sync >>>>>>>> + >>>>>>>> +# Set option that redirects BGP control plane traffic to a >>>>>>>> LSP "bgp-daemon" >>>>>>>> +ovn-nbctl --wait=sb set logical_router_port rp-public >>>>>>>> options:bgp-redirect=bgp-daemon >>>>>>>> + >>>>>>>> +# Create "bgp-daemon" interface in a namespace with IP and >>>>>>>> MAC matching LRP "rp-public" >>>>>>>> +ADD_NAMESPACES(bgp-daemon) >>>>>>>> +ADD_VETH(bgp-daemon, bgp-daemon, br-int, "172.16.1.1/24", >>>>>>>> "00:00:02:01:02:03") >>>>>>>> + >>>>>>>> +ADD_NAMESPACES(ext-foo) >>>>>>>> +ADD_VETH(ext-foo, ext-foo, br-ext, "172.16.1.100/24", >>>>>>>> "00:10:10:01:02:13", \ >>>>>>>> + "172.16.1.1") >>>>>>>> + >>>>>>>> +# Flip the interface down/up to get proper IPv6 LLA >>>>>>>> +NS_EXEC([bgp-daemon], [ip link set down bgp-daemon]) >>>>>>>> +NS_EXEC([bgp-daemon], [ip link set up bgp-daemon]) >>>>>>>> + >>>>>>>> +# Wait until IPv6 LLA loses the "tentative" flag otherwise >>>>>>>> it can't be bound to. >>>>>>>> +OVS_WAIT_UNTIL([NS_EXEC([bgp-daemon], [ip a show dev bgp- >>>>>>>> daemon | grep "fe80::" | grep -v tentative])]) >>>>>>>> +OVS_WAIT_UNTIL([NS_EXEC([ext-foo], [ip a show dev ext-foo >>>>>>>> | grep "fe80::" | grep -v tentative])]) >>>>>>>> + >>>>>>>> +# Verify that BGP control plane traffic is delivered to >>>>>>>> the "bgp-daemon" >>>>>>>> +# interface on both IPv4 and IPv6 LLA addresses >>>>>>>> +NETNS_DAEMONIZE([bgp-daemon], [nc -l -k 172.16.1.1 179], >>>>>>>> [bgp_v4.pid]) >>>>>>>> +NS_CHECK_EXEC([ext-foo], [echo "TCP test" | nc --send-only >>>>>>>> 172.16.1.1 179]) >>>>>>>> + >>>>>>>> +NETNS_DAEMONIZE([bgp-daemon], [nc -l -6 -k >>>>>>>> fe80::200:2ff:fe01:203%bgp-daemon 179], [bgp_v6.pid]) >>>>>>>> +NS_CHECK_EXEC([ext-foo], [echo "TCP test" | nc --send-only >>>>>>>> -6 fe80::200:2ff:fe01:203%ext-foo 179]) >>>>>>>> + >>>>>>>> +OVS_APP_EXIT_AND_WAIT([ovn-controller]) >>>>>>>> + >>>>>>>> +as ovn-sb >>>>>>>> +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >>>>>>>> + >>>>>>>> +as ovn-nb >>>>>>>> +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >>>>>>>> + >>>>>>>> +as northd >>>>>>>> +OVS_APP_EXIT_AND_WAIT([ovn-northd]) >>>>>>>> + >>>>>>>> +as >>>>>>>> +OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d >>>>>>>> +/.*terminating with signal 15.*/d"]) >>>>>>>> +AT_CLEANUP >>>>>>>> +]) >>>>>>> Regards, >>>>>>> Dumitru >>>>>>> >>>>> _______________________________________________ >>>>> dev mailing list >>>>> dev@openvswitch.org >>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> _______________________________________________ >> dev mailing list >> dev@openvswitch.org >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
On 08.08.2024 18:45, Dumitru Ceara wrote: > On 8/8/24 11:14, Vladislav Odintsov wrote: >> On 08.08.2024 15:51,martin.kalcok@canonical.com wrote: >>> On Thu, 2024-08-08 at 10:46 +0200, Dumitru Ceara wrote: >>>> On 8/8/24 10:42, Vladislav Odintsov wrote: >>>>> On 08.08.2024 03:16, Dumitru Ceara wrote: >>>>>> Re-adding the dev list. >>>>>> >>>>>> On 8/7/24 18:12, Vladislav Odintsov wrote: >>>>>>> Hi Dumitru, >>>>>>> >>>>>> Hi Vladislav, >>>>>> >>>>>>> I'd like to add some thoughts to your inputs. >>>>>>> >>>>>> Thanks for that, I added some more of my own below. :) >>>>>> >>>>>>> On 06.08.2024 19:23, Dumitru Ceara wrote: >>>>>>>> Hi Martin, >>>>>>>> >>>>>>>> Sorry, I was reviewing V3 but I was slow in actually sending >>>>>>>> out the email. >>>>>>>> >>>>>>>> On 8/6/24 13:17, Martin Kalcok wrote: >>>>>>>>> This change adds a 'bgp-redirect' option to LRP that allows >>>>>>>>> redirecting of BGP control plane traffic to an arbitrary >>>>>>>>> LSP >>>>>>>>> in its peer LS. >>>>>>>>> >>>>>>>>> The option expects a string with a LSP name. When set, >>>>>>>>> any traffic entering LS that's destined for any of the >>>>>>>>> LRP's IP addresses (including IPv6 LLA) is redirected >>>>>>>>> to the LSP specified in the option's value. >>>>>>>>> >>>>>>>>> This enables external BGP daemons to listen on an interface >>>>>>>>> bound to a LSP and effectively act as if they were >>>>>>>>> listening >>>>>>>>> on (and speaking from) LRP's IP address. >>>>>>>>> >>>>>>>>> Signed-off-by: Martin Kalcok<martin.kalcok@canonical.com> >>>>>>>>> --- >>>>>>>> Strictly on this version of the patch, and with the thought >>>>>>>> in mind that >>>>>>>> we might want to consider this feature experimental [0] and >>>>>>>> maybe change >>>>>>>> it (NB too) in the future, I left a few comments inline. >>>>>>>> With those >>>>>>>> addressed I think the patch looks OK to me. >>>>>>>> >>>>>>>> [0] >>>>>>>> https://patchwork.ozlabs.org/project/ovn/patch/20240802155449.3528777-1-mmichels@redhat.com/ >>>>>>>> >>>>>>>> Now, for the point when we'd remove the "experimental" tag: >>>>>>>> >>>>>>>> In general, it makes me a bit uneasy that we have to share >>>>>>>> the MAC and >>>>>>>> IP between the LRP and the VIF of logical switch port that's >>>>>>>> connected >>>>>>>> to the same switch as the LRP. >>>>>>>> >>>>>>>> I was thinking of alternatives for the future and the only >>>>>>>> things I >>>>>>>> could come up with until now are: >>>>>>>> >>>>>>>> 1. Create a separate, "disconnected" logical switch: >>>>>>>> ovn-nbctl ls-add bgp >>>>>>>> >>>>>>>> Add the bgp-daemon port to it: >>>>>>>> ovn-nbctl lsp-add bgp bgp-daemon. >>>>>>>> >>>>>>>> Then we don't need "unknown" either, I think. >>>>>>>> >>>>>>>> But it's not possible today to move packets from the ingress >>>>>>>> pipeline of >>>>>>>> a logical switch ("public" in this test) to the egress >>>>>>>> pipeline of a >>>>>>>> different logical switch ("bgp" in my example). It also >>>>>>>> feels icky to >>>>>>>> implement that. >>>>>>>> >>>>>>>> 2. Add the ability to bind an OVS port directly to a logical >>>>>>>> router port: >>>>>>>> >>>>>>>> then we could do the same type of redirect you do here but >>>>>>>> instead in >>>>>>>> the logical router pipeline. The advantage is we don't have >>>>>>>> to drop any >>>>>>>> non-bgp traffic in the switch pipeline. The switch keeps >>>>>>>> functioning as >>>>>>>> it does today. >>>>>>>> >>>>>>>> Maybe an advantage of this second alternative would be that >>>>>>>> we can >>>>>>>> easily attach a filtering option to this LRP (e.g., >>>>>>>> LRP.options:control-traffic-filter) to allow us to be more >>>>>>>> flexible in >>>>>>>> what kind of traffic we forward to the actuall routing >>>>>>>> protocol daemon >>>>>>>> that runs behind that OVS port - Vladislav also mentioned >>>>>>>> during the >>>>>>>> meeting it might be interesting to forward BFD packets to the >>>>>>>> FRR (or >>>>>>>> whatever application implements the routing protocol) >>>>>>>> instance too. >>>>>>> The idea to be able to bind LRP to OVS port sounds very >>>>>>> interesting to >>>>>>> me. But with a note that it's not a pure "bind", but a partial: >>>>>>> as you >>>>>>> wrote with some "filter" applied to pass not all the traffic. >>>>>>> The main >>>>>>> idea here is to pass only control plane traffic destined to >>>>>>> LRP's >>>>>> As we're discussing on the other thread (Martin pointed it out) >>>>>> we also >>>>>> probably need to involve conntrack and allow replies to any kind >>>>>> of >>>>>> traffic initiated by the entity behind the LRP's VIF (e.g., BGP >>>>>> sessions >>>>>> initiated from the host). >>>>>> >>>>>>> addresses. Other than that seems odd. Transit traffic should be >>>>>>> remained >>>>>>> in LR pipeline otherwise it will have no difference with a >>>>>>> regular VIF LSP. >>>>>>> >>>>>> Definitely, traffic that needs to be DNATed (DNAT/unSNAT rules or >>>>>> LB >>>>>> rules that will change the destination IP from LRP IP to >>>>>> something else) >>>>>> should not be affected. All other "transit" traffic doesn't have >>>>>> LRP >>>>>> IP, does it? >>>>> You're right. >>>>> >>>>> @Dumitru, @Martin, what if we just redirect all traffic destined to >>>>> LRP >>>>> IPs to redirect port? Is there any drawbacks? >>>>> For security it is possible to optionally use ACLs with or without >>>>> conntrack. It's up to user/CMS. >>>>> >>>>> This seems quite simple in the code and very flexible. So no >>>>> additional >>>>> flows seems to be needed in future to support any other routing >>>>> protocols (or for another possible non-routing usecases). >>>>> >>>> Won't this break all "transit" traffic that has destination IP the >>>> LRP >>>> IP (DNAT, LB), etc? >>>> >>> I'm afraid so. I guess I was just tired yesterday when I made that >>> proposal for general redirect. Given that the redirect is implemented >>> in the LS pipeline, it would eat up all the traffic for LRP's IP before >>> the DNAT/unSNAT occurs in the LR pipeline. I'll give it a quick test in >>> case I'm wrong here again. >> Oops, this will definitely break that cases. For some reason I thought >> only about routing cases. >> >>> If that's the case, I'll try the conntrack approach and update you all >>> asap. >>> >>> Martin. >>> >>>>>>> Having a filter (or match like in ACL or >>>>>>> Logical_Router_Policies) looks >>>>>>> more flexible in terms of used protocols. This can coexist with >>>>>>> proposal >>>>>>> from current patch, where the flow is not programmable from >>>>>>> user. >>>>>>> >>>>>> I think so too. >>>>>> Regards, >>>>>> Dumitru >>>>>> >>>>>>>> But again, if we consider the current feature "experimental", >>>>>>>> we can >>>>>>>> spend more time during the next release cycle to figure out >>>>>>>> what's best. >>>>>>>> >>>>>>>>> v4 of this patch renames the feature from "bgp-mirror" to >>>>>>>>> "bgp-redirect" >>>>>>>>> as discussed during community meeting. >>>>>>>>> >>>>>>>>> northd/northd.c | 108 >>>>>>>>> ++++++++++++++++++++++++++++++++++++++++ >>>>>>>>> northd/ovn-northd.8.xml | 23 +++++++++ >>>>>>>>> ovn-nb.xml | 14 ++++++ >>>>>>>>> tests/ovn-northd.at | 58 +++++++++++++++++++++ >>>>>>>>> tests/system-ovn.at | 86 >>>>>>>>> ++++++++++++++++++++++++++++++++ >>>>>>>>> 5 files changed, 289 insertions(+) >>>>>>>>> >>>>>>>>> diff --git a/northd/northd.c b/northd/northd.c >>>>>>>>> index 4353df07d..088104f25 100644 >>>>>>>>> --- a/northd/northd.c >>>>>>>>> +++ b/northd/northd.c >>>>>>>>> @@ -13048,6 +13048,113 @@ >>>>>>>>> build_arp_resolve_flows_for_lrp(struct ovn_port *op, >>>>>>>>> } >>>>>>>>> } >>>>>>>>> >>>>>>>>> +static void >>>>>>>>> +build_bgp_redirect_rule__( >>>>>>>>> + const char *s_addr, const char *bgp_port_name, >>>>>>>>> bool is_ipv6, >>>>>>>>> + struct ovn_port *ls_peer, struct lflow_table >>>>>>>>> *lflows, >>>>>>>>> + struct ds *match, struct ds *actions) >>>>>>>>> +{ >>>>>>>>> + int ip_ver = is_ipv6 ? 6 : 4; >>>>>>>>> + /* Redirect packets in the input pipeline destined for >>>>>>>>> LR's IP to >>>>>>>>> + * the port specified in 'bgp-redirect' option. >>>>>>>>> + */ >>>>>>>>> + ds_clear(match); >>>>>>>>> + ds_clear(actions); >>>>>>>>> + ds_put_format(match, "ip%d.dst == %s && tcp.dst == >>>>>>>>> 179", ip_ver, s_addr); >>>>>>>>> + ds_put_format(actions, "outport = \"%s\"; output;", >>>>>>>>> bgp_port_name); >>>>>>>>> + ovn_lflow_add(lflows, ls_peer->od, >>>>>>>>> S_SWITCH_IN_L2_LKUP, 100, >>>>>>>>> + ds_cstr(match), >>>>>>>>> + ds_cstr(actions), >>>>>>>>> + ls_peer->lflow_ref); >>>>>>>>> + >>>>>>>>> + >>>>>>>>> + /* Drop any traffic originating from 'bgp-redirect' >>>>>>>>> port that does >>>>>>>>> + * not originate from BGP daemon port. This blocks >>>>>>>>> unnecessary >>>>>>>>> + * traffic like ARP broadcasts or IPv6 router >>>>>>>>> solicitation packets >>>>>>>>> + * from the dummy 'bgp-redirect' port. >>>>>>>>> + */ >>>>>>>>> + ds_clear(match); >>>>>>>>> + ds_put_format(match, "inport == \"%s\"", >>>>>>>>> bgp_port_name); >>>>>>>>> + ovn_lflow_add(lflows, ls_peer->od, >>>>>>>>> S_SWITCH_IN_CHECK_PORT_SEC, 80, >>>>>>>>> + ds_cstr(match), >>>>>>>>> + REGBIT_PORT_SEC_DROP " = 1; next;", >>>>>>>>> + ls_peer->lflow_ref); >>>>>>>> Not a huge deal but this logical flow gets regenerated >>>>>>>> multiple times >>>>>>>> for each bgp_port_name (for each LRP address). In the end >>>>>>>> we'll only >>>>>>>> install one in the database so there's no impact on SB. But >>>>>>>> maybe we >>>>>>>> should consider moving this part out of this function. >>>>>>>> >>>>>>>>> + >>>>>>>>> + ds_put_format(match, >>>>>>>>> + " && ip%d.src == %s && tcp.src == 179", >>>>>>>>> + ip_ver, >>>>>>>>> + s_addr); >>>>>>>>> + ovn_lflow_add(lflows, ls_peer->od, >>>>>>>>> S_SWITCH_IN_CHECK_PORT_SEC, 81, >>>>>>>>> + ds_cstr(match), >>>>>>>>> + REGBIT_PORT_SEC_DROP " = >>>>>>>>> check_in_port_sec(); next;", >>>>>>>>> + ls_peer->lflow_ref); >>>>>>>> In your system test we set 'addresses = "unknown"' for >>>>>>>> 'bgp_port_name'. >>>>>>>> And I guess we kind of have to do it like that all the time >>>>>>>> when using >>>>>>>> this feature. I'm trying to think of scenarios in which >>>>>>>> users set >>>>>>>> LSP.addresses = 'unknown' and LSP.port-security = '<some- >>>>>>>> addresses>'. >>>>>>>> I'm not sure I saw those until now but maybe I'm wrong. >>>>>>> I'm also interested whether this LSP port with unknown >>>>>>> addresses will >>>>>>> get all the broadcast traffic? >>>>>>> >>>>>>> @Martin, didn't you look for ARP behavior? If we have one LS >>>>>>> with three >>>>>>> LSPs: >>>>>>> - 1 router type LSP with MAC 00:00:00:00:00:01 and IP >>>>>>> 169.254.0.1/30 >>>>>>> - one "vif" LSP with "unknown" addresses set and configured >>>>>>> underlying >>>>>>> OVS port with same MAC and IP set >>>>>>> - one localnet port for external connectivity. >>>>>>> >>>>>>> If some party from localnet side sends broadcast ARP request >>>>>>> "Who has >>>>>>> 169.254.0.1? Answer <...>", will this ARP request got flooded >>>>>>> to both >>>>>>> LRP and LSP with "unknown" addresses? If yes, will the sender >>>>>>> receive 2 >>>>>>> ARP replies? >>>>>>> >>>>>>>>> +} >>>>>>>>> + >>>>>>>>> +static void >>>>>>>>> +build_lrouter_bgp_redirect( >>>>>>>>> + struct ovn_port *op, struct lflow_table *lflows, >>>>>>>>> + struct ds *match, struct ds *actions) >>>>>>>>> +{ >>>>>>>>> + /* LRP has to have a peer.*/ >>>>>>>>> + if (op->peer == NULL) { >>>>>>>>> + return; >>>>>>>>> + } >>>>>>>>> + /* LRP has to have NB record.*/ >>>>>>>>> + if (op->nbrp == NULL) { >>>>>>>>> + return; >>>>>>>>> + } >>>>>>>>> + >>>>>>>>> + /* Proceed only for LRPs that have 'bgp-redirect' >>>>>>>>> option set. Value of this >>>>>>>>> + * option is the name of LSP to which a BGP traffic >>>>>>>>> will be redirected. >>>>>>>>> + */ >>>>>>>>> + const char *bgp_port = smap_get(&op->nbrp->options, >>>>>>>>> "bgp-redirect"); >>>>>>>>> + if (bgp_port == NULL) { >>>>>>>>> + return; >>>>>>>>> + } >>>>>>>>> + >>>>>>>>> + /* Ensure that LSP, to which the BGP traffic is >>>>>>>>> redirected, exists.*/ >>>>>>>>> + struct ovn_port *peer_lsp; >>>>>>>>> + bool bgp_port_exists = false; >>>>>>>>> + HMAP_FOR_EACH (peer_lsp, dp_node, &op->peer->od- >>>>>>>>>> ports) { >>>>>>>>> + size_t peer_lsp_s = strlen(peer_lsp->key); >>>>>>>>> + if (peer_lsp_s == strlen(bgp_port) >>>>>>>>> + && !strncmp(peer_lsp->key, bgp_port, >>>>>>>>> peer_lsp_s)){ >>>>>>>>> + bgp_port_exists = true; >>>>>>>>> + break; >>>>>>>>> + } >>>>>>>>> + } >>>>>>>>> + >>>>>>>>> + if (!bgp_port_exists) { >>>>>>>>> + static struct vlog_rate_limit rl = >>>>>>>>> VLOG_RATE_LIMIT_INIT(1, 5); >>>>>>>>> + VLOG_WARN_RL(&rl, "Option 'bgp-redirect' set on >>>>>>>>> Logical Router Port " >>>>>>>>> + "'%s' refers to non-existent >>>>>>>>> Logical Switch Port. " >>>>>>>>> + "BGP redirecting won't be >>>>>>>>> configured.", >>>>>>>>> + op->key); >>>>>>>>> + return; >>>>>>>>> + } >>>>>>>>> + >>>>>>>>> + if (op->cr_port != NULL) { >>>>>>>>> + static struct vlog_rate_limit rl = >>>>>>>>> VLOG_RATE_LIMIT_INIT(1, 5); >>>>>>>>> + VLOG_WARN_RL(&rl, "Option 'bgp-redirect' is not >>>>>>>>> supported on" >>>>>>>>> + " Distributed Gateway Port >>>>>>>>> '%s'", op->key); >>>>>>>>> + return; >>>>>>>>> + } >>>>>>>>> + >>>>>>>>> + /* Mirror traffic destined for LRP's IPs and default >>>>>>>>> BGP port >>>>>>>>> + * to the port defined in 'bgp-redirect' option. >>>>>>>>> + */ >>>>>>>>> + for (size_t i = 0; i < op->lrp_networks.n_ipv4_addrs; >>>>>>>>> i++) { >>>>>>>>> + const char *ip_s = op- >>>>>>>>>> lrp_networks.ipv4_addrs[i].addr_s; >>>>>>>>> + build_bgp_redirect_rule__(ip_s, bgp_port, false, >>>>>>>>> op->peer, lflows, >>>>>>>>> + match, actions); >>>>>>>>> + } >> I'm thinking more and more that having a "match" expression in a >> "NB.Logical_Router_Port.options:redirect-match" option it a good choice. >> > Would the match be enforced only for redirected traffic (from outside to > bgp target LSP)? I guess I didn't get your question, sorry. > >> LRP can have multiple IP addresses and enabling redirect for all of them >> could be implicit for the user and may contain security concerns. As an >> example: LR can have configured LB with tcp/179. If user configured >> peering IP-address (which can be in a link-local address range) and also >> added an IP address for Load Balancer in a `networks` list of a LRP. >> Enabling BGP redirect for all IPs of LRP will break load balancing for >> tcp/179. While it is not needed for load balancers to assign their VIP >> to networks, there is no validations, which prohibit doing so. It is > I think this is actually something that ovn-kubernetes does in their > deployments for cluster-IP services. And I suspect that today there's > nothing preventing users from creating cluster-IP services for TCP/179. So, this proofs that this is a possible problem if enabled redirect and if load balancer's IP is configured to be in networks list. In any case this doesn't prevent from current implementation to merged as "experimental". I'm just started discussion about a GA solution. > >> still possible and can be difficult to catch such behavior for >> redirected traffic with Load Balancers. >> >>>>>>>>> + for (size_t i = 0; i < op->lrp_networks.n_ipv6_addrs; >>>>>>>>> i++) { >>>>>>>>> + const char *ip_s = op- >>>>>>>>>> lrp_networks.ipv6_addrs[i].addr_s; >>>>>>>>> + build_bgp_redirect_rule__(ip_s, bgp_port, true, >>>>>>>>> op->peer, lflows, >>>>>>>>> + match, actions); >>>>>>>>> + } >>>>>>>>> +} >>>>>>>>> + >>>>>>>>> /* This function adds ARP resolve flows related to a LSP. >>>>>>>>> */ >>>>>>>>> static void >>>>>>>>> build_arp_resolve_flows_for_lsp( >>>>>>>>> @@ -16003,6 +16110,7 @@ >>>>>>>>> build_lswitch_and_lrouter_iterate_by_lrp(struct ovn_port >>>>>>>>> *op, >>>>>>>>> lsi->meter_groups, op- >>>>>>>>>> lflow_ref); >>>>>>>>> build_lrouter_icmp_packet_toobig_admin_flows(op, lsi- >>>>>>>>>> lflows, &lsi->match, >>>>>>>>> &lsi- >>>>>>>>>> actions, op->lflow_ref); >>>>>>>>> + build_lrouter_bgp_redirect(op, lsi->lflows, &lsi- >>>>>>>>>> match, &lsi->actions); >>>>>>>>> } >>>>>>>>> >>>>>>>>> static void * >>>>>>>>> diff --git a/northd/ovn-northd.8.xml b/northd/ovn- >>>>>>>>> northd.8.xml >>>>>>>>> index b06b09ac5..4cbdf8f74 100644 >>>>>>>>> --- a/northd/ovn-northd.8.xml >>>>>>>>> +++ b/northd/ovn-northd.8.xml >>>>>>>>> @@ -284,6 +284,21 @@ >>>>>>>>> dropped in the next stage. >>>>>>>>> </li> >>>>>>>>> >>>>>>>>> + <li> >>>>>>>>> + For each logical port that's defined as a target >>>>>>>>> of BGP redirecting >>>>>>>>> + (via <code>bgp-redirect</code> option set on >>>>>>>>> Logical Router Port), a >>>>>>>>> + filter is set in place that disallows any traffic >>>>>>>>> entering this port >>>>>>>>> + that does not originate from Logical Router Port's >>>>>>>>> IPs and default BGP >>>>>>>>> + port (TCP 179). This filtering is achieved by two >>>>>>>>> rules. First rule has >>>>>>>>> + priority 81, it matches on <code>inport == >>>>>>>>> <var>BGP_MIRROR_PORT</var> >>>>>>>>> + && ip.src == <var>LRP_IP</var> && >>>>>>>>> tcp.src == 179</code> >>>>>>>>> + and allows traffic further with >>>>>>>>> <code>REGBIT_PORT_SEC_DROP" = >>>>>>>>> + check_in_port_sec(); next;</code>. Second rule >>>>>>>>> with priority 80 matches >>>>>>>>> + the rest of the traffic from that port and sets >>>>>>>>> + <code>REGBIT_PORT_SEC_DROP" = 1; next;"</code> so >>>>>>>>> that the packets are >>>>>>>>> + dropped in the next stage. >>>>>>>>> + </li> >>>>>>>>> + >>>>>>>>> <li> >>>>>>>>> For each (enabled) vtep logical port, a priority >>>>>>>>> 70 flow is added which >>>>>>>>> matches on all packets and applies the action >>>>>>>>> @@ -1949,6 +1964,14 @@ output; >>>>>>>>> on the logical switch. >>>>>>>>> </li> >>>>>>>>> >>>>>>>>> + <li> >>>>>>>>> + For any logical port that's defined as a target of >>>>>>>>> BGP redirecting (via >>>>>>>>> + <code>bgp-redirect</code> option set on Logical >>>>>>>>> Router Port), a >>>>>>>>> + priority-100 flow is added that redirects traffic >>>>>>>>> for Logical Router >>>>>>>>> + Port IPs (including IPv6 LLA) and TCP port 179, to >>>>>>>>> the targeted >>>>>>>>> + logical port. >>>>>>>>> + </li> >>>>>>>>> + >>>>>>>>> <li> >>>>>>>>> Priority-90 flows for transit switches that >>>>>>>>> forward registered >>>>>>>>> IP multicast traffic to their corresponding >>>>>>>>> multicast group , which >>>>>>>>> diff --git a/ovn-nb.xml b/ovn-nb.xml >>>>>>>>> index 9552534f6..2d2f5c9e0 100644 >>>>>>>>> --- a/ovn-nb.xml >>>>>>>>> +++ b/ovn-nb.xml >>>>>>>>> @@ -3440,6 +3440,20 @@ or >>>>>>>>> </p> >>>>>>>>> </column> >>>>>>>>> >>>>>>>>> + <column name="options" key="bgp-redirect" >>>>>>>>> type='{"type": "string"}'> >>>>>>>>> + <p> >>>>>>>>> + This option expects a name of a Logical Switch >>>>>>>>> Port that's present >>>>>>>>> + in the peer's Logical Switch. If set, it causes >>>>>>>>> any traffic >>>>>>>>> + that's destined for Logical Router Port's IP >>>>>>>>> addresses (including >>>>>>>>> + its IPv6 LLA) and the default BGP port (TCP >>>>>>>>> 179), to be redirected >>>>>>>>> + to the specified Logical Switch Port. >>>>>>>>> + >>>>>>>>> + This allows external BGP daemon to be bound to a >>>>>>>>> port in OVN's >>>>>>>>> + Logical Switch and act as if it was listening on >>>>>>>>> Logical Router >>>>>>>>> + Port's IP address. >>>>>>>>> + </p> >>>>>>>>> + </column> >>>>>>>>> + >>>>>>>>> <column name="options" key="gateway_mtu_bypass"> >>>>>>>>> <p> >>>>>>>>> When configured, represents a match expression, >>>>>>>>> in the same >>>>>>>>> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at >>>>>>>>> index a389d1988..df1e924ab 100644 >>>>>>>>> --- a/tests/ovn-northd.at >>>>>>>>> +++ b/tests/ovn-northd.at >>>>>>>>> @@ -12721,3 +12721,61 @@ AT_CHECK([ovn-sbctl dump-flows lr >>>>>>>>> | grep lr_in_dnat | ovn_strip_lflows], [0], [d >>>>>>>>> >>>>>>>>> AT_CLEANUP >>>>>>>>> ]) >>>>>>>>> + >>>>>>>>> +OVN_FOR_EACH_NORTHD_NO_HV([ >>>>>>>>> +AT_SETUP([BGP control plane redirect]) >>>>>>>>> +ovn_start >>>>>>>>> + >>>>>>>>> +check ovn-sbctl chassis-add hv1 geneve 127.0.0.1 >>>>>>>>> + >>>>>>>>> +check ovn-nbctl lr-add lr -- \ >>>>>>>>> + lrp-add lr lr-ls 02:ac:10:01:00:01 172.16.1.1/24 >>>>>>>>> +check ovn-nbctl --wait=sb set logical_router lr >>>>>>>>> options:chassis=hv1 >>>>>>>>> + >>>>>>>>> +check ovn-nbctl ls-add ls -- \ >>>>>>>>> + lsp-add ls ls-lr -- \ >>>>>>>>> + lsp-set-type ls-lr router -- \ >>>>>>>>> + lsp-set-addresses ls-lr router -- \ >>>>>>>>> + lsp-set-options ls-lr router-port=lr-ls >>>>>>>>> + >>>>>>>>> +check ovn-nbctl lsp-add ls lsp-bgp -- \ >>>>>>>>> + lsp-set-addresses lsp-bgp unknown >>>>>>>>> + >>>>>>>>> +# Function that ensures that no BGP redirect ruels are >>>>>>>>> installed. >>>>>>>>> +check_no_bgp_redirect() { >>>>>>>>> + AT_CHECK([ovn-sbctl dump-flows ls | grep >>>>>>>>> ls_in_l2_lkup | grep "tcp.dst == 179" | wc -l], [0], [0 >>>>>>>>> +]) >>>>>>>>> + >>>>>>>>> + AT_CHECK([ovn-sbctl dump-flows ls | grep >>>>>>>>> ls_in_check_port_sec | grep -E "priority=80|priority=81" | >>>>>>>>> wc -l], [0], [0 >>>>>>>>> +]) >>>>>>>>> +} >>>>>>>>> + >>>>>>>>> +# By default, no rules related to BGP redirect are present >>>>>>>>> +check_no_bgp_redirect >>>>>>>>> + >>>>>>>>> +# Set "lsp-bgp" port as target of BGP control plane >>>>>>>>> redirected traffic >>>>>>>>> +check ovn-nbctl --wait=sb set logical_router_port lr-ls >>>>>>>>> options:bgp-redirect=lsp-bgp >>>>>>>>> + >>>>>>>>> +# Check that BGP control plane traffic is redirected to >>>>>>>>> "lsp-bgp" >>>>>>>>> +AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_l2_lkup | >>>>>>>>> grep "tcp.dst == 179" | ovn_strip_lflows], [0], [dnl >>>>>>>>> + table=??(ls_in_l2_lkup ), priority=100 , >>>>>>>>> match=(ip4.dst == 172.16.1.1 && tcp.dst == 179), >>>>>>>>> action=(outport = "lsp-bgp"; output;) >>>>>>>>> + table=??(ls_in_l2_lkup ), priority=100 , >>>>>>>>> match=(ip6.dst == fe80::ac:10ff:fe01:1 && tcp.dst == 179), >>>>>>>>> action=(outport = "lsp-bgp"; output;) >>>>>>>>> +]) >>>>>>>>> + >>>>>>>>> +# Check that only BGP-related traffic is accepted on "lsp- >>>>>>>>> bgp" port >>>>>>>>> +AT_CHECK([ovn-sbctl dump-flows ls | grep >>>>>>>>> ls_in_check_port_sec | grep -E "priority=80|priority=81" | >>>>>>>>> ovn_strip_lflows], [0], [dnl >>>>>>>>> + table=??(ls_in_check_port_sec), priority=80 , >>>>>>>>> match=(inport == "lsp-bgp"), action=(reg0[[15]] = 1; next;) >>>>>>>>> + table=??(ls_in_check_port_sec), priority=81 , >>>>>>>>> match=(inport == "lsp-bgp" && ip4.src == 172.16.1.1 && >>>>>>>>> tcp.src == 179), action=(reg0[[15]] = check_in_port_sec(); >>>>>>>>> next;) >>>>>>>>> + table=??(ls_in_check_port_sec), priority=81 , >>>>>>>>> match=(inport == "lsp-bgp" && ip6.src == >>>>>>>>> fe80::ac:10ff:fe01:1 && tcp.src == 179), action=(reg0[[15]] >>>>>>>>> = check_in_port_sec(); next;) >>>>>>>>> +]) >>>>>>>>> + >>>>>>>>> +# Remove 'bgp-redirect' option from LRP and check that >>>>>>>>> rules are removed >>>>>>>>> +check ovn-nbctl --wait=sb remove logical_router_port lr-ls >>>>>>>>> options bgp-redirect >>>>>>>>> +check_no_bgp_redirect >>>>>>>>> + >>>>>>>>> +# Set non-existent LSP as target of 'bgp-redirect' and >>>>>>>>> check that no rules are added >>>>>>>>> +check ovn-nbctl --wait=sb set logical_router_port lr-ls >>>>>>>>> options:bgp-redirect=lsp-foo >>>>>>>>> +check_no_bgp_redirect >>>>>>>>> + >>>>>>>>> +AT_CLEANUP >>>>>>>>> +]) >>>>>>>>> diff --git a/tests/system-ovn.at b/tests/system-ovn.at >>>>>>>>> index c24ede7c5..c09c1136e 100644 >>>>>>>>> --- a/tests/system-ovn.at >>>>>>>>> +++ b/tests/system-ovn.at >>>>>>>>> @@ -13027,3 +13027,89 @@ >>>>>>>>> OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch- >>>>>>>>> .*/d >>>>>>>>> /connection dropped.*/d"]) >>>>>>>>> AT_CLEANUP >>>>>>>>> ]) >>>>>>>>> + >>>>>>>>> +OVN_FOR_EACH_NORTHD([ >>>>>>>>> +AT_SETUP([BGP control plane redirect]) >>>>>>>> Nit: probably best to add AT_SKIP_IF([test $HAVE_NC = no]) >>>>>>>> here. >>>>>>>> >>>>>>>>> +ovn_start >>>>>>>>> +OVS_TRAFFIC_VSWITCHD_START() >>>>>>>>> + >>>>>>>>> +ADD_BR([br-int]) >>>>>>>>> +ADD_BR([br-ext]) >>>>>>>>> + >>>>>>>>> +ovs-ofctl add-flow br-ext action=normal >>>>>>>>> +# Set external-ids in br-int needed for ovn-controller >>>>>>>>> +ovs-vsctl \ >>>>>>>>> + -- set Open_vSwitch . external-ids:system-id=hv1 \ >>>>>>>>> + -- set Open_vSwitch . external-ids:ovn- >>>>>>>>> remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \ >>>>>>>>> + -- set Open_vSwitch . external-ids:ovn-encap- >>>>>>>>> type=geneve \ >>>>>>>>> + -- set Open_vSwitch . external-ids:ovn-encap- >>>>>>>>> ip=169.0.0.1 \ >>>>>>>>> + -- set bridge br-int fail-mode=secure other- >>>>>>>>> config:disable-in-band=true >>>>>>>> Most ov*-ctl commands should be prefixed with "check" in this >>>>>>>> test. >>>>>>>> >>>>>>>>> + >>>>>>>>> +# Start ovn-controller >>>>>>>>> +start_daemon ovn-controller >>>>>>>>> + >>>>>>>>> +ovn-nbctl lr-add R1 \ >>>>>>>>> + -- set Logical_Router R1 options:chassis=hv1 >>>>>>>>> + >>>>>>>>> +ovn-nbctl ls-add public >>>>>>>>> + >>>>>>>>> +ovn-nbctl lrp-add R1 rp-public 00:00:02:01:02:03 >>>>>>>>> 172.16.1.1/24 >>>>>>>>> + >>>>>>>>> +ovn-nbctl lsp-add public public-rp -- set >>>>>>>>> Logical_Switch_Port public-rp \ >>>>>>>>> + type=router options:router-port=rp-public \ >>>>>>>>> + -- lsp-set-addresses public-rp router >>>>>>>>> + >>>>>>>>> +ovn-nbctl lsp-add public bgp-daemon \ >>>>>>>>> + -- lsp-set-addresses bgp-daemon unknown >>>>>>>>> + >>>>>>>>> +AT_CHECK([ovs-vsctl set Open_vSwitch . external-ids:ovn- >>>>>>>>> bridge-mappings=phynet:br-ext]) >>>>>>>>> +ovn-nbctl lsp-add public public1 \ >>>>>>>>> + -- lsp-set-addresses public1 unknown \ >>>>>>>>> + -- lsp-set-type public1 localnet \ >>>>>>>>> + -- lsp-set-options public1 network_name=phynet >>>>>>>>> + >>>>>>>>> +ovn-nbctl --wait=hv sync >>>>>>>>> + >>>>>>>>> +# Set option that redirects BGP control plane traffic to a >>>>>>>>> LSP "bgp-daemon" >>>>>>>>> +ovn-nbctl --wait=sb set logical_router_port rp-public >>>>>>>>> options:bgp-redirect=bgp-daemon >>>>>>>>> + >>>>>>>>> +# Create "bgp-daemon" interface in a namespace with IP and >>>>>>>>> MAC matching LRP "rp-public" >>>>>>>>> +ADD_NAMESPACES(bgp-daemon) >>>>>>>>> +ADD_VETH(bgp-daemon, bgp-daemon, br-int, "172.16.1.1/24", >>>>>>>>> "00:00:02:01:02:03") >>>>>>>>> + >>>>>>>>> +ADD_NAMESPACES(ext-foo) >>>>>>>>> +ADD_VETH(ext-foo, ext-foo, br-ext, "172.16.1.100/24", >>>>>>>>> "00:10:10:01:02:13", \ >>>>>>>>> + "172.16.1.1") >>>>>>>>> + >>>>>>>>> +# Flip the interface down/up to get proper IPv6 LLA >>>>>>>>> +NS_EXEC([bgp-daemon], [ip link set down bgp-daemon]) >>>>>>>>> +NS_EXEC([bgp-daemon], [ip link set up bgp-daemon]) >>>>>>>>> + >>>>>>>>> +# Wait until IPv6 LLA loses the "tentative" flag otherwise >>>>>>>>> it can't be bound to. >>>>>>>>> +OVS_WAIT_UNTIL([NS_EXEC([bgp-daemon], [ip a show dev bgp- >>>>>>>>> daemon | grep "fe80::" | grep -v tentative])]) >>>>>>>>> +OVS_WAIT_UNTIL([NS_EXEC([ext-foo], [ip a show dev ext-foo >>>>>>>>> | grep "fe80::" | grep -v tentative])]) >>>>>>>>> + >>>>>>>>> +# Verify that BGP control plane traffic is delivered to >>>>>>>>> the "bgp-daemon" >>>>>>>>> +# interface on both IPv4 and IPv6 LLA addresses >>>>>>>>> +NETNS_DAEMONIZE([bgp-daemon], [nc -l -k 172.16.1.1 179], >>>>>>>>> [bgp_v4.pid]) >>>>>>>>> +NS_CHECK_EXEC([ext-foo], [echo "TCP test" | nc --send-only >>>>>>>>> 172.16.1.1 179]) >>>>>>>>> + >>>>>>>>> +NETNS_DAEMONIZE([bgp-daemon], [nc -l -6 -k >>>>>>>>> fe80::200:2ff:fe01:203%bgp-daemon 179], [bgp_v6.pid]) >>>>>>>>> +NS_CHECK_EXEC([ext-foo], [echo "TCP test" | nc --send-only >>>>>>>>> -6 fe80::200:2ff:fe01:203%ext-foo 179]) >>>>>>>>> + >>>>>>>>> +OVS_APP_EXIT_AND_WAIT([ovn-controller]) >>>>>>>>> + >>>>>>>>> +as ovn-sb >>>>>>>>> +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >>>>>>>>> + >>>>>>>>> +as ovn-nb >>>>>>>>> +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >>>>>>>>> + >>>>>>>>> +as northd >>>>>>>>> +OVS_APP_EXIT_AND_WAIT([ovn-northd]) >>>>>>>>> + >>>>>>>>> +as >>>>>>>>> +OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d >>>>>>>>> +/.*terminating with signal 15.*/d"]) >>>>>>>>> +AT_CLEANUP >>>>>>>>> +]) >>>>>>>> Regards, >>>>>>>> Dumitru >>>>>>>> >>>>>> _______________________________________________ >>>>>> dev mailing list >>>>>> dev@openvswitch.org >>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>> _______________________________________________ >>> dev mailing list >>> dev@openvswitch.org >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
On 8/8/24 13:51, Vladislav Odintsov wrote: > > On 08.08.2024 18:45, Dumitru Ceara wrote: >> On 8/8/24 11:14, Vladislav Odintsov wrote: >>> On 08.08.2024 15:51, martin.kalcok@canonical.com wrote: >>>> On Thu, 2024-08-08 at 10:46 +0200, Dumitru Ceara wrote: >>>>> On 8/8/24 10:42, Vladislav Odintsov wrote: >>>>>> On 08.08.2024 03:16, Dumitru Ceara wrote: >>>>>>> Re-adding the dev list. >>>>>>> >>>>>>> On 8/7/24 18:12, Vladislav Odintsov wrote: >>>>>>>> Hi Dumitru, >>>>>>>> >>>>>>> Hi Vladislav, >>>>>>> >>>>>>>> I'd like to add some thoughts to your inputs. >>>>>>>> >>>>>>> Thanks for that, I added some more of my own below. :) >>>>>>> >>>>>>>> On 06.08.2024 19:23, Dumitru Ceara wrote: >>>>>>>>> Hi Martin, >>>>>>>>> >>>>>>>>> Sorry, I was reviewing V3 but I was slow in actually sending >>>>>>>>> out the email. >>>>>>>>> >>>>>>>>> On 8/6/24 13:17, Martin Kalcok wrote: >>>>>>>>>> This change adds a 'bgp-redirect' option to LRP that allows >>>>>>>>>> redirecting of BGP control plane traffic to an arbitrary >>>>>>>>>> LSP >>>>>>>>>> in its peer LS. >>>>>>>>>> >>>>>>>>>> The option expects a string with a LSP name. When set, >>>>>>>>>> any traffic entering LS that's destined for any of the >>>>>>>>>> LRP's IP addresses (including IPv6 LLA) is redirected >>>>>>>>>> to the LSP specified in the option's value. >>>>>>>>>> >>>>>>>>>> This enables external BGP daemons to listen on an interface >>>>>>>>>> bound to a LSP and effectively act as if they were >>>>>>>>>> listening >>>>>>>>>> on (and speaking from) LRP's IP address. >>>>>>>>>> >>>>>>>>>> Signed-off-by: Martin Kalcok <martin.kalcok@canonical.com> >>>>>>>>>> --- >>>>>>>>> Strictly on this version of the patch, and with the thought >>>>>>>>> in mind that >>>>>>>>> we might want to consider this feature experimental [0] and >>>>>>>>> maybe change >>>>>>>>> it (NB too) in the future, I left a few comments inline. >>>>>>>>> With those >>>>>>>>> addressed I think the patch looks OK to me. >>>>>>>>> >>>>>>>>> [0] >>>>>>>>> https://patchwork.ozlabs.org/project/ovn/patch/20240802155449.3528777-1-mmichels@redhat.com/ >>>>>>>>> >>>>>>>>> Now, for the point when we'd remove the "experimental" tag: >>>>>>>>> >>>>>>>>> In general, it makes me a bit uneasy that we have to share >>>>>>>>> the MAC and >>>>>>>>> IP between the LRP and the VIF of logical switch port that's >>>>>>>>> connected >>>>>>>>> to the same switch as the LRP. >>>>>>>>> >>>>>>>>> I was thinking of alternatives for the future and the only >>>>>>>>> things I >>>>>>>>> could come up with until now are: >>>>>>>>> >>>>>>>>> 1. Create a separate, "disconnected" logical switch: >>>>>>>>> ovn-nbctl ls-add bgp >>>>>>>>> >>>>>>>>> Add the bgp-daemon port to it: >>>>>>>>> ovn-nbctl lsp-add bgp bgp-daemon. >>>>>>>>> >>>>>>>>> Then we don't need "unknown" either, I think. >>>>>>>>> >>>>>>>>> But it's not possible today to move packets from the ingress >>>>>>>>> pipeline of >>>>>>>>> a logical switch ("public" in this test) to the egress >>>>>>>>> pipeline of a >>>>>>>>> different logical switch ("bgp" in my example). It also >>>>>>>>> feels icky to >>>>>>>>> implement that. >>>>>>>>> >>>>>>>>> 2. Add the ability to bind an OVS port directly to a logical >>>>>>>>> router port: >>>>>>>>> >>>>>>>>> then we could do the same type of redirect you do here but >>>>>>>>> instead in >>>>>>>>> the logical router pipeline. The advantage is we don't have >>>>>>>>> to drop any >>>>>>>>> non-bgp traffic in the switch pipeline. The switch keeps >>>>>>>>> functioning as >>>>>>>>> it does today. >>>>>>>>> >>>>>>>>> Maybe an advantage of this second alternative would be that >>>>>>>>> we can >>>>>>>>> easily attach a filtering option to this LRP (e.g., >>>>>>>>> LRP.options:control-traffic-filter) to allow us to be more >>>>>>>>> flexible in >>>>>>>>> what kind of traffic we forward to the actuall routing >>>>>>>>> protocol daemon >>>>>>>>> that runs behind that OVS port - Vladislav also mentioned >>>>>>>>> during the >>>>>>>>> meeting it might be interesting to forward BFD packets to the >>>>>>>>> FRR (or >>>>>>>>> whatever application implements the routing protocol) >>>>>>>>> instance too. >>>>>>>> The idea to be able to bind LRP to OVS port sounds very >>>>>>>> interesting to >>>>>>>> me. But with a note that it's not a pure "bind", but a partial: >>>>>>>> as you >>>>>>>> wrote with some "filter" applied to pass not all the traffic. >>>>>>>> The main >>>>>>>> idea here is to pass only control plane traffic destined to >>>>>>>> LRP's >>>>>>> As we're discussing on the other thread (Martin pointed it out) >>>>>>> we also >>>>>>> probably need to involve conntrack and allow replies to any kind >>>>>>> of >>>>>>> traffic initiated by the entity behind the LRP's VIF (e.g., BGP >>>>>>> sessions >>>>>>> initiated from the host). >>>>>>> >>>>>>>> addresses. Other than that seems odd. Transit traffic should be >>>>>>>> remained >>>>>>>> in LR pipeline otherwise it will have no difference with a >>>>>>>> regular VIF LSP. >>>>>>>> >>>>>>> Definitely, traffic that needs to be DNATed (DNAT/unSNAT rules or >>>>>>> LB >>>>>>> rules that will change the destination IP from LRP IP to >>>>>>> something else) >>>>>>> should not be affected. All other "transit" traffic doesn't have >>>>>>> LRP >>>>>>> IP, does it? >>>>>> You're right. >>>>>> >>>>>> @Dumitru, @Martin, what if we just redirect all traffic destined to >>>>>> LRP >>>>>> IPs to redirect port? Is there any drawbacks? >>>>>> For security it is possible to optionally use ACLs with or without >>>>>> conntrack. It's up to user/CMS. >>>>>> >>>>>> This seems quite simple in the code and very flexible. So no >>>>>> additional >>>>>> flows seems to be needed in future to support any other routing >>>>>> protocols (or for another possible non-routing usecases). >>>>>> >>>>> Won't this break all "transit" traffic that has destination IP the >>>>> LRP >>>>> IP (DNAT, LB), etc? >>>>> >>>> I'm afraid so. I guess I was just tired yesterday when I made that >>>> proposal for general redirect. Given that the redirect is implemented >>>> in the LS pipeline, it would eat up all the traffic for LRP's IP before >>>> the DNAT/unSNAT occurs in the LR pipeline. I'll give it a quick test in >>>> case I'm wrong here again. >>> Oops, this will definitely break that cases. For some reason I thought >>> only about routing cases. >>> >>>> If that's the case, I'll try the conntrack approach and update you all >>>> asap. >>>> >>>> Martin. >>>> >>>>>>>> Having a filter (or match like in ACL or >>>>>>>> Logical_Router_Policies) looks >>>>>>>> more flexible in terms of used protocols. This can coexist with >>>>>>>> proposal >>>>>>>> from current patch, where the flow is not programmable from >>>>>>>> user. >>>>>>>> >>>>>>> I think so too. >>>>>>> Regards, >>>>>>> Dumitru >>>>>>> >>>>>>>>> But again, if we consider the current feature "experimental", >>>>>>>>> we can >>>>>>>>> spend more time during the next release cycle to figure out >>>>>>>>> what's best. >>>>>>>>> >>>>>>>>>> v4 of this patch renames the feature from "bgp-mirror" to >>>>>>>>>> "bgp-redirect" >>>>>>>>>> as discussed during community meeting. >>>>>>>>>> >>>>>>>>>> northd/northd.c | 108 >>>>>>>>>> ++++++++++++++++++++++++++++++++++++++++ >>>>>>>>>> northd/ovn-northd.8.xml | 23 +++++++++ >>>>>>>>>> ovn-nb.xml | 14 ++++++ >>>>>>>>>> tests/ovn-northd.at | 58 +++++++++++++++++++++ >>>>>>>>>> tests/system-ovn.at | 86 >>>>>>>>>> ++++++++++++++++++++++++++++++++ >>>>>>>>>> 5 files changed, 289 insertions(+) >>>>>>>>>> >>>>>>>>>> diff --git a/northd/northd.c b/northd/northd.c >>>>>>>>>> index 4353df07d..088104f25 100644 >>>>>>>>>> --- a/northd/northd.c >>>>>>>>>> +++ b/northd/northd.c >>>>>>>>>> @@ -13048,6 +13048,113 @@ >>>>>>>>>> build_arp_resolve_flows_for_lrp(struct ovn_port *op, >>>>>>>>>> } >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> +static void >>>>>>>>>> +build_bgp_redirect_rule__( >>>>>>>>>> + const char *s_addr, const char *bgp_port_name, >>>>>>>>>> bool is_ipv6, >>>>>>>>>> + struct ovn_port *ls_peer, struct lflow_table >>>>>>>>>> *lflows, >>>>>>>>>> + struct ds *match, struct ds *actions) >>>>>>>>>> +{ >>>>>>>>>> + int ip_ver = is_ipv6 ? 6 : 4; >>>>>>>>>> + /* Redirect packets in the input pipeline destined for >>>>>>>>>> LR's IP to >>>>>>>>>> + * the port specified in 'bgp-redirect' option. >>>>>>>>>> + */ >>>>>>>>>> + ds_clear(match); >>>>>>>>>> + ds_clear(actions); >>>>>>>>>> + ds_put_format(match, "ip%d.dst == %s && tcp.dst == >>>>>>>>>> 179", ip_ver, s_addr); >>>>>>>>>> + ds_put_format(actions, "outport = \"%s\"; output;", >>>>>>>>>> bgp_port_name); >>>>>>>>>> + ovn_lflow_add(lflows, ls_peer->od, >>>>>>>>>> S_SWITCH_IN_L2_LKUP, 100, >>>>>>>>>> + ds_cstr(match), >>>>>>>>>> + ds_cstr(actions), >>>>>>>>>> + ls_peer->lflow_ref); >>>>>>>>>> + >>>>>>>>>> + >>>>>>>>>> + /* Drop any traffic originating from 'bgp-redirect' >>>>>>>>>> port that does >>>>>>>>>> + * not originate from BGP daemon port. This blocks >>>>>>>>>> unnecessary >>>>>>>>>> + * traffic like ARP broadcasts or IPv6 router >>>>>>>>>> solicitation packets >>>>>>>>>> + * from the dummy 'bgp-redirect' port. >>>>>>>>>> + */ >>>>>>>>>> + ds_clear(match); >>>>>>>>>> + ds_put_format(match, "inport == \"%s\"", >>>>>>>>>> bgp_port_name); >>>>>>>>>> + ovn_lflow_add(lflows, ls_peer->od, >>>>>>>>>> S_SWITCH_IN_CHECK_PORT_SEC, 80, >>>>>>>>>> + ds_cstr(match), >>>>>>>>>> + REGBIT_PORT_SEC_DROP " = 1; next;", >>>>>>>>>> + ls_peer->lflow_ref); >>>>>>>>> Not a huge deal but this logical flow gets regenerated >>>>>>>>> multiple times >>>>>>>>> for each bgp_port_name (for each LRP address). In the end >>>>>>>>> we'll only >>>>>>>>> install one in the database so there's no impact on SB. But >>>>>>>>> maybe we >>>>>>>>> should consider moving this part out of this function. >>>>>>>>> >>>>>>>>>> + >>>>>>>>>> + ds_put_format(match, >>>>>>>>>> + " && ip%d.src == %s && tcp.src == 179", >>>>>>>>>> + ip_ver, >>>>>>>>>> + s_addr); >>>>>>>>>> + ovn_lflow_add(lflows, ls_peer->od, >>>>>>>>>> S_SWITCH_IN_CHECK_PORT_SEC, 81, >>>>>>>>>> + ds_cstr(match), >>>>>>>>>> + REGBIT_PORT_SEC_DROP " = >>>>>>>>>> check_in_port_sec(); next;", >>>>>>>>>> + ls_peer->lflow_ref); >>>>>>>>> In your system test we set 'addresses = "unknown"' for >>>>>>>>> 'bgp_port_name'. >>>>>>>>> And I guess we kind of have to do it like that all the time >>>>>>>>> when using >>>>>>>>> this feature. I'm trying to think of scenarios in which >>>>>>>>> users set >>>>>>>>> LSP.addresses = 'unknown' and LSP.port-security = '<some- >>>>>>>>> addresses>'. >>>>>>>>> I'm not sure I saw those until now but maybe I'm wrong. >>>>>>>> I'm also interested whether this LSP port with unknown >>>>>>>> addresses will >>>>>>>> get all the broadcast traffic? >>>>>>>> >>>>>>>> @Martin, didn't you look for ARP behavior? If we have one LS >>>>>>>> with three >>>>>>>> LSPs: >>>>>>>> - 1 router type LSP with MAC 00:00:00:00:00:01 and IP >>>>>>>> 169.254.0.1/30 >>>>>>>> - one "vif" LSP with "unknown" addresses set and configured >>>>>>>> underlying >>>>>>>> OVS port with same MAC and IP set >>>>>>>> - one localnet port for external connectivity. >>>>>>>> >>>>>>>> If some party from localnet side sends broadcast ARP request >>>>>>>> "Who has >>>>>>>> 169.254.0.1? Answer <...>", will this ARP request got flooded >>>>>>>> to both >>>>>>>> LRP and LSP with "unknown" addresses? If yes, will the sender >>>>>>>> receive 2 >>>>>>>> ARP replies? >>>>>>>> >>>>>>>>>> +} >>>>>>>>>> + >>>>>>>>>> +static void >>>>>>>>>> +build_lrouter_bgp_redirect( >>>>>>>>>> + struct ovn_port *op, struct lflow_table *lflows, >>>>>>>>>> + struct ds *match, struct ds *actions) >>>>>>>>>> +{ >>>>>>>>>> + /* LRP has to have a peer.*/ >>>>>>>>>> + if (op->peer == NULL) { >>>>>>>>>> + return; >>>>>>>>>> + } >>>>>>>>>> + /* LRP has to have NB record.*/ >>>>>>>>>> + if (op->nbrp == NULL) { >>>>>>>>>> + return; >>>>>>>>>> + } >>>>>>>>>> + >>>>>>>>>> + /* Proceed only for LRPs that have 'bgp-redirect' >>>>>>>>>> option set. Value of this >>>>>>>>>> + * option is the name of LSP to which a BGP traffic >>>>>>>>>> will be redirected. >>>>>>>>>> + */ >>>>>>>>>> + const char *bgp_port = smap_get(&op->nbrp->options, >>>>>>>>>> "bgp-redirect"); >>>>>>>>>> + if (bgp_port == NULL) { >>>>>>>>>> + return; >>>>>>>>>> + } >>>>>>>>>> + >>>>>>>>>> + /* Ensure that LSP, to which the BGP traffic is >>>>>>>>>> redirected, exists.*/ >>>>>>>>>> + struct ovn_port *peer_lsp; >>>>>>>>>> + bool bgp_port_exists = false; >>>>>>>>>> + HMAP_FOR_EACH (peer_lsp, dp_node, &op->peer->od- >>>>>>>>>>> ports) { >>>>>>>>>> + size_t peer_lsp_s = strlen(peer_lsp->key); >>>>>>>>>> + if (peer_lsp_s == strlen(bgp_port) >>>>>>>>>> + && !strncmp(peer_lsp->key, bgp_port, >>>>>>>>>> peer_lsp_s)){ >>>>>>>>>> + bgp_port_exists = true; >>>>>>>>>> + break; >>>>>>>>>> + } >>>>>>>>>> + } >>>>>>>>>> + >>>>>>>>>> + if (!bgp_port_exists) { >>>>>>>>>> + static struct vlog_rate_limit rl = >>>>>>>>>> VLOG_RATE_LIMIT_INIT(1, 5); >>>>>>>>>> + VLOG_WARN_RL(&rl, "Option 'bgp-redirect' set on >>>>>>>>>> Logical Router Port " >>>>>>>>>> + "'%s' refers to non-existent >>>>>>>>>> Logical Switch Port. " >>>>>>>>>> + "BGP redirecting won't be >>>>>>>>>> configured.", >>>>>>>>>> + op->key); >>>>>>>>>> + return; >>>>>>>>>> + } >>>>>>>>>> + >>>>>>>>>> + if (op->cr_port != NULL) { >>>>>>>>>> + static struct vlog_rate_limit rl = >>>>>>>>>> VLOG_RATE_LIMIT_INIT(1, 5); >>>>>>>>>> + VLOG_WARN_RL(&rl, "Option 'bgp-redirect' is not >>>>>>>>>> supported on" >>>>>>>>>> + " Distributed Gateway Port >>>>>>>>>> '%s'", op->key); >>>>>>>>>> + return; >>>>>>>>>> + } >>>>>>>>>> + >>>>>>>>>> + /* Mirror traffic destined for LRP's IPs and default >>>>>>>>>> BGP port >>>>>>>>>> + * to the port defined in 'bgp-redirect' option. >>>>>>>>>> + */ >>>>>>>>>> + for (size_t i = 0; i < op->lrp_networks.n_ipv4_addrs; >>>>>>>>>> i++) { >>>>>>>>>> + const char *ip_s = op- >>>>>>>>>>> lrp_networks.ipv4_addrs[i].addr_s; >>>>>>>>>> + build_bgp_redirect_rule__(ip_s, bgp_port, false, >>>>>>>>>> op->peer, lflows, >>>>>>>>>> + match, actions); >>>>>>>>>> + } >>> I'm thinking more and more that having a "match" expression in a >>> "NB.Logical_Router_Port.options:redirect-match" option it a good choice. >>> >> Would the match be enforced only for redirected traffic (from outside to >> bgp target LSP)? > I guess I didn't get your question, sorry. I was thinking of the case Martin brought up yesterday, with the session initiated from inside. If we have the following config (not sure if the option names will change but I hope it's clear): LRP { .name = lrp .networks = [{00:00:00:00:00:01, 1.1.1.1/24}] .options = { .redirect-port = 'bgp-daemon' .redirect-match = 'tcp.dst == 179 || tcp.src == 179' } } Then, in the switch pipeline in the L2_LOOKUP stage, we'll only force the outport to be 'bgp-daemon' if "destination IP == 1.1.1.1 && 'redirect-match'" is true. So, only traffic coming from "outside" towards the router IP and that also matches the filter. No additional restrictions. Will this be enough though? What if 'bgp-daemon' ARPs for 1.1.1.2 (an external host)? How does that work? ARP reply is unicast and will have dmac that of the LRP. We could include ARP and ND in the match too but then what about ARP replies that actually should go to the router port? I'm actually thinking more and more that we shouldn't redirect any traffic like this, we should just bind the VIF directly to the LRP. >>> LRP can have multiple IP addresses and enabling redirect for all of them >>> could be implicit for the user and may contain security concerns. As an >>> example: LR can have configured LB with tcp/179. If user configured >>> peering IP-address (which can be in a link-local address range) and also >>> added an IP address for Load Balancer in a `networks` list of a LRP. >>> Enabling BGP redirect for all IPs of LRP will break load balancing for >>> tcp/179. While it is not needed for load balancers to assign their VIP >>> to networks, there is no validations, which prohibit doing so. It is >> I think this is actually something that ovn-kubernetes does in their >> deployments for cluster-IP services. And I suspect that today there's >> nothing preventing users from creating cluster-IP services for TCP/179. > > So, this proofs that this is a possible problem if enabled redirect and > if load balancer's IP is configured to be in networks list. > > In any case this doesn't prevent from current implementation to merged > as "experimental". I'm just started discussion about a GA solution. > >>> still possible and can be difficult to catch such behavior for >>> redirected traffic with Load Balancers. >>> >>>>>>>>>> + for (size_t i = 0; i < op->lrp_networks.n_ipv6_addrs; >>>>>>>>>> i++) { >>>>>>>>>> + const char *ip_s = op- >>>>>>>>>>> lrp_networks.ipv6_addrs[i].addr_s; >>>>>>>>>> + build_bgp_redirect_rule__(ip_s, bgp_port, true, >>>>>>>>>> op->peer, lflows, >>>>>>>>>> + match, actions); >>>>>>>>>> + } >>>>>>>>>> +} >>>>>>>>>> + >>>>>>>>>> /* This function adds ARP resolve flows related to a LSP. >>>>>>>>>> */ >>>>>>>>>> static void >>>>>>>>>> build_arp_resolve_flows_for_lsp( >>>>>>>>>> @@ -16003,6 +16110,7 @@ >>>>>>>>>> build_lswitch_and_lrouter_iterate_by_lrp(struct ovn_port >>>>>>>>>> *op, >>>>>>>>>> lsi->meter_groups, op- >>>>>>>>>>> lflow_ref); >>>>>>>>>> build_lrouter_icmp_packet_toobig_admin_flows(op, lsi- >>>>>>>>>>> lflows, &lsi->match, >>>>>>>>>> &lsi- >>>>>>>>>>> actions, op->lflow_ref); >>>>>>>>>> + build_lrouter_bgp_redirect(op, lsi->lflows, &lsi- >>>>>>>>>>> match, &lsi->actions); >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> static void * >>>>>>>>>> diff --git a/northd/ovn-northd.8.xml b/northd/ovn- >>>>>>>>>> northd.8.xml >>>>>>>>>> index b06b09ac5..4cbdf8f74 100644 >>>>>>>>>> --- a/northd/ovn-northd.8.xml >>>>>>>>>> +++ b/northd/ovn-northd.8.xml >>>>>>>>>> @@ -284,6 +284,21 @@ >>>>>>>>>> dropped in the next stage. >>>>>>>>>> </li> >>>>>>>>>> >>>>>>>>>> + <li> >>>>>>>>>> + For each logical port that's defined as a target >>>>>>>>>> of BGP redirecting >>>>>>>>>> + (via <code>bgp-redirect</code> option set on >>>>>>>>>> Logical Router Port), a >>>>>>>>>> + filter is set in place that disallows any traffic >>>>>>>>>> entering this port >>>>>>>>>> + that does not originate from Logical Router Port's >>>>>>>>>> IPs and default BGP >>>>>>>>>> + port (TCP 179). This filtering is achieved by two >>>>>>>>>> rules. First rule has >>>>>>>>>> + priority 81, it matches on <code>inport == >>>>>>>>>> <var>BGP_MIRROR_PORT</var> >>>>>>>>>> + && ip.src == <var>LRP_IP</var> && >>>>>>>>>> tcp.src == 179</code> >>>>>>>>>> + and allows traffic further with >>>>>>>>>> <code>REGBIT_PORT_SEC_DROP" = >>>>>>>>>> + check_in_port_sec(); next;</code>. Second rule >>>>>>>>>> with priority 80 matches >>>>>>>>>> + the rest of the traffic from that port and sets >>>>>>>>>> + <code>REGBIT_PORT_SEC_DROP" = 1; next;"</code> so >>>>>>>>>> that the packets are >>>>>>>>>> + dropped in the next stage. >>>>>>>>>> + </li> >>>>>>>>>> + >>>>>>>>>> <li> >>>>>>>>>> For each (enabled) vtep logical port, a priority >>>>>>>>>> 70 flow is added which >>>>>>>>>> matches on all packets and applies the action >>>>>>>>>> @@ -1949,6 +1964,14 @@ output; >>>>>>>>>> on the logical switch. >>>>>>>>>> </li> >>>>>>>>>> >>>>>>>>>> + <li> >>>>>>>>>> + For any logical port that's defined as a target of >>>>>>>>>> BGP redirecting (via >>>>>>>>>> + <code>bgp-redirect</code> option set on Logical >>>>>>>>>> Router Port), a >>>>>>>>>> + priority-100 flow is added that redirects traffic >>>>>>>>>> for Logical Router >>>>>>>>>> + Port IPs (including IPv6 LLA) and TCP port 179, to >>>>>>>>>> the targeted >>>>>>>>>> + logical port. >>>>>>>>>> + </li> >>>>>>>>>> + >>>>>>>>>> <li> >>>>>>>>>> Priority-90 flows for transit switches that >>>>>>>>>> forward registered >>>>>>>>>> IP multicast traffic to their corresponding >>>>>>>>>> multicast group , which >>>>>>>>>> diff --git a/ovn-nb.xml b/ovn-nb.xml >>>>>>>>>> index 9552534f6..2d2f5c9e0 100644 >>>>>>>>>> --- a/ovn-nb.xml >>>>>>>>>> +++ b/ovn-nb.xml >>>>>>>>>> @@ -3440,6 +3440,20 @@ or >>>>>>>>>> </p> >>>>>>>>>> </column> >>>>>>>>>> >>>>>>>>>> + <column name="options" key="bgp-redirect" >>>>>>>>>> type='{"type": "string"}'> >>>>>>>>>> + <p> >>>>>>>>>> + This option expects a name of a Logical Switch >>>>>>>>>> Port that's present >>>>>>>>>> + in the peer's Logical Switch. If set, it causes >>>>>>>>>> any traffic >>>>>>>>>> + that's destined for Logical Router Port's IP >>>>>>>>>> addresses (including >>>>>>>>>> + its IPv6 LLA) and the default BGP port (TCP >>>>>>>>>> 179), to be redirected >>>>>>>>>> + to the specified Logical Switch Port. >>>>>>>>>> + >>>>>>>>>> + This allows external BGP daemon to be bound to a >>>>>>>>>> port in OVN's >>>>>>>>>> + Logical Switch and act as if it was listening on >>>>>>>>>> Logical Router >>>>>>>>>> + Port's IP address. >>>>>>>>>> + </p> >>>>>>>>>> + </column> >>>>>>>>>> + >>>>>>>>>> <column name="options" key="gateway_mtu_bypass"> >>>>>>>>>> <p> >>>>>>>>>> When configured, represents a match expression, >>>>>>>>>> in the same >>>>>>>>>> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at >>>>>>>>>> index a389d1988..df1e924ab 100644 >>>>>>>>>> --- a/tests/ovn-northd.at >>>>>>>>>> +++ b/tests/ovn-northd.at >>>>>>>>>> @@ -12721,3 +12721,61 @@ AT_CHECK([ovn-sbctl dump-flows lr >>>>>>>>>> | grep lr_in_dnat | ovn_strip_lflows], [0], [d >>>>>>>>>> >>>>>>>>>> AT_CLEANUP >>>>>>>>>> ]) >>>>>>>>>> + >>>>>>>>>> +OVN_FOR_EACH_NORTHD_NO_HV([ >>>>>>>>>> +AT_SETUP([BGP control plane redirect]) >>>>>>>>>> +ovn_start >>>>>>>>>> + >>>>>>>>>> +check ovn-sbctl chassis-add hv1 geneve 127.0.0.1 >>>>>>>>>> + >>>>>>>>>> +check ovn-nbctl lr-add lr -- \ >>>>>>>>>> + lrp-add lr lr-ls 02:ac:10:01:00:01 172.16.1.1/24 >>>>>>>>>> +check ovn-nbctl --wait=sb set logical_router lr >>>>>>>>>> options:chassis=hv1 >>>>>>>>>> + >>>>>>>>>> +check ovn-nbctl ls-add ls -- \ >>>>>>>>>> + lsp-add ls ls-lr -- \ >>>>>>>>>> + lsp-set-type ls-lr router -- \ >>>>>>>>>> + lsp-set-addresses ls-lr router -- \ >>>>>>>>>> + lsp-set-options ls-lr router-port=lr-ls >>>>>>>>>> + >>>>>>>>>> +check ovn-nbctl lsp-add ls lsp-bgp -- \ >>>>>>>>>> + lsp-set-addresses lsp-bgp unknown >>>>>>>>>> + >>>>>>>>>> +# Function that ensures that no BGP redirect ruels are >>>>>>>>>> installed. >>>>>>>>>> +check_no_bgp_redirect() { >>>>>>>>>> + AT_CHECK([ovn-sbctl dump-flows ls | grep >>>>>>>>>> ls_in_l2_lkup | grep "tcp.dst == 179" | wc -l], [0], [0 >>>>>>>>>> +]) >>>>>>>>>> + >>>>>>>>>> + AT_CHECK([ovn-sbctl dump-flows ls | grep >>>>>>>>>> ls_in_check_port_sec | grep -E "priority=80|priority=81" | >>>>>>>>>> wc -l], [0], [0 >>>>>>>>>> +]) >>>>>>>>>> +} >>>>>>>>>> + >>>>>>>>>> +# By default, no rules related to BGP redirect are present >>>>>>>>>> +check_no_bgp_redirect >>>>>>>>>> + >>>>>>>>>> +# Set "lsp-bgp" port as target of BGP control plane >>>>>>>>>> redirected traffic >>>>>>>>>> +check ovn-nbctl --wait=sb set logical_router_port lr-ls >>>>>>>>>> options:bgp-redirect=lsp-bgp >>>>>>>>>> + >>>>>>>>>> +# Check that BGP control plane traffic is redirected to >>>>>>>>>> "lsp-bgp" >>>>>>>>>> +AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_l2_lkup | >>>>>>>>>> grep "tcp.dst == 179" | ovn_strip_lflows], [0], [dnl >>>>>>>>>> + table=??(ls_in_l2_lkup ), priority=100 , >>>>>>>>>> match=(ip4.dst == 172.16.1.1 && tcp.dst == 179), >>>>>>>>>> action=(outport = "lsp-bgp"; output;) >>>>>>>>>> + table=??(ls_in_l2_lkup ), priority=100 , >>>>>>>>>> match=(ip6.dst == fe80::ac:10ff:fe01:1 && tcp.dst == 179), >>>>>>>>>> action=(outport = "lsp-bgp"; output;) >>>>>>>>>> +]) >>>>>>>>>> + >>>>>>>>>> +# Check that only BGP-related traffic is accepted on "lsp- >>>>>>>>>> bgp" port >>>>>>>>>> +AT_CHECK([ovn-sbctl dump-flows ls | grep >>>>>>>>>> ls_in_check_port_sec | grep -E "priority=80|priority=81" | >>>>>>>>>> ovn_strip_lflows], [0], [dnl >>>>>>>>>> + table=??(ls_in_check_port_sec), priority=80 , >>>>>>>>>> match=(inport == "lsp-bgp"), action=(reg0[[15]] = 1; next;) >>>>>>>>>> + table=??(ls_in_check_port_sec), priority=81 , >>>>>>>>>> match=(inport == "lsp-bgp" && ip4.src == 172.16.1.1 && >>>>>>>>>> tcp.src == 179), action=(reg0[[15]] = check_in_port_sec(); >>>>>>>>>> next;) >>>>>>>>>> + table=??(ls_in_check_port_sec), priority=81 , >>>>>>>>>> match=(inport == "lsp-bgp" && ip6.src == >>>>>>>>>> fe80::ac:10ff:fe01:1 && tcp.src == 179), action=(reg0[[15]] >>>>>>>>>> = check_in_port_sec(); next;) >>>>>>>>>> +]) >>>>>>>>>> + >>>>>>>>>> +# Remove 'bgp-redirect' option from LRP and check that >>>>>>>>>> rules are removed >>>>>>>>>> +check ovn-nbctl --wait=sb remove logical_router_port lr-ls >>>>>>>>>> options bgp-redirect >>>>>>>>>> +check_no_bgp_redirect >>>>>>>>>> + >>>>>>>>>> +# Set non-existent LSP as target of 'bgp-redirect' and >>>>>>>>>> check that no rules are added >>>>>>>>>> +check ovn-nbctl --wait=sb set logical_router_port lr-ls >>>>>>>>>> options:bgp-redirect=lsp-foo >>>>>>>>>> +check_no_bgp_redirect >>>>>>>>>> + >>>>>>>>>> +AT_CLEANUP >>>>>>>>>> +]) >>>>>>>>>> diff --git a/tests/system-ovn.at b/tests/system-ovn.at >>>>>>>>>> index c24ede7c5..c09c1136e 100644 >>>>>>>>>> --- a/tests/system-ovn.at >>>>>>>>>> +++ b/tests/system-ovn.at >>>>>>>>>> @@ -13027,3 +13027,89 @@ >>>>>>>>>> OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch- >>>>>>>>>> .*/d >>>>>>>>>> /connection dropped.*/d"]) >>>>>>>>>> AT_CLEANUP >>>>>>>>>> ]) >>>>>>>>>> + >>>>>>>>>> +OVN_FOR_EACH_NORTHD([ >>>>>>>>>> +AT_SETUP([BGP control plane redirect]) >>>>>>>>> Nit: probably best to add AT_SKIP_IF([test $HAVE_NC = no]) >>>>>>>>> here. >>>>>>>>> >>>>>>>>>> +ovn_start >>>>>>>>>> +OVS_TRAFFIC_VSWITCHD_START() >>>>>>>>>> + >>>>>>>>>> +ADD_BR([br-int]) >>>>>>>>>> +ADD_BR([br-ext]) >>>>>>>>>> + >>>>>>>>>> +ovs-ofctl add-flow br-ext action=normal >>>>>>>>>> +# Set external-ids in br-int needed for ovn-controller >>>>>>>>>> +ovs-vsctl \ >>>>>>>>>> + -- set Open_vSwitch . external-ids:system-id=hv1 \ >>>>>>>>>> + -- set Open_vSwitch . external-ids:ovn- >>>>>>>>>> remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \ >>>>>>>>>> + -- set Open_vSwitch . external-ids:ovn-encap- >>>>>>>>>> type=geneve \ >>>>>>>>>> + -- set Open_vSwitch . external-ids:ovn-encap- >>>>>>>>>> ip=169.0.0.1 \ >>>>>>>>>> + -- set bridge br-int fail-mode=secure other- >>>>>>>>>> config:disable-in-band=true >>>>>>>>> Most ov*-ctl commands should be prefixed with "check" in this >>>>>>>>> test. >>>>>>>>> >>>>>>>>>> + >>>>>>>>>> +# Start ovn-controller >>>>>>>>>> +start_daemon ovn-controller >>>>>>>>>> + >>>>>>>>>> +ovn-nbctl lr-add R1 \ >>>>>>>>>> + -- set Logical_Router R1 options:chassis=hv1 >>>>>>>>>> + >>>>>>>>>> +ovn-nbctl ls-add public >>>>>>>>>> + >>>>>>>>>> +ovn-nbctl lrp-add R1 rp-public 00:00:02:01:02:03 >>>>>>>>>> 172.16.1.1/24 >>>>>>>>>> + >>>>>>>>>> +ovn-nbctl lsp-add public public-rp -- set >>>>>>>>>> Logical_Switch_Port public-rp \ >>>>>>>>>> + type=router options:router-port=rp-public \ >>>>>>>>>> + -- lsp-set-addresses public-rp router >>>>>>>>>> + >>>>>>>>>> +ovn-nbctl lsp-add public bgp-daemon \ >>>>>>>>>> + -- lsp-set-addresses bgp-daemon unknown >>>>>>>>>> + >>>>>>>>>> +AT_CHECK([ovs-vsctl set Open_vSwitch . external-ids:ovn- >>>>>>>>>> bridge-mappings=phynet:br-ext]) >>>>>>>>>> +ovn-nbctl lsp-add public public1 \ >>>>>>>>>> + -- lsp-set-addresses public1 unknown \ >>>>>>>>>> + -- lsp-set-type public1 localnet \ >>>>>>>>>> + -- lsp-set-options public1 network_name=phynet >>>>>>>>>> + >>>>>>>>>> +ovn-nbctl --wait=hv sync >>>>>>>>>> + >>>>>>>>>> +# Set option that redirects BGP control plane traffic to a >>>>>>>>>> LSP "bgp-daemon" >>>>>>>>>> +ovn-nbctl --wait=sb set logical_router_port rp-public >>>>>>>>>> options:bgp-redirect=bgp-daemon >>>>>>>>>> + >>>>>>>>>> +# Create "bgp-daemon" interface in a namespace with IP and >>>>>>>>>> MAC matching LRP "rp-public" >>>>>>>>>> +ADD_NAMESPACES(bgp-daemon) >>>>>>>>>> +ADD_VETH(bgp-daemon, bgp-daemon, br-int, "172.16.1.1/24", >>>>>>>>>> "00:00:02:01:02:03") >>>>>>>>>> + >>>>>>>>>> +ADD_NAMESPACES(ext-foo) >>>>>>>>>> +ADD_VETH(ext-foo, ext-foo, br-ext, "172.16.1.100/24", >>>>>>>>>> "00:10:10:01:02:13", \ >>>>>>>>>> + "172.16.1.1") >>>>>>>>>> + >>>>>>>>>> +# Flip the interface down/up to get proper IPv6 LLA >>>>>>>>>> +NS_EXEC([bgp-daemon], [ip link set down bgp-daemon]) >>>>>>>>>> +NS_EXEC([bgp-daemon], [ip link set up bgp-daemon]) >>>>>>>>>> + >>>>>>>>>> +# Wait until IPv6 LLA loses the "tentative" flag otherwise >>>>>>>>>> it can't be bound to. >>>>>>>>>> +OVS_WAIT_UNTIL([NS_EXEC([bgp-daemon], [ip a show dev bgp- >>>>>>>>>> daemon | grep "fe80::" | grep -v tentative])]) >>>>>>>>>> +OVS_WAIT_UNTIL([NS_EXEC([ext-foo], [ip a show dev ext-foo >>>>>>>>>> | grep "fe80::" | grep -v tentative])]) >>>>>>>>>> + >>>>>>>>>> +# Verify that BGP control plane traffic is delivered to >>>>>>>>>> the "bgp-daemon" >>>>>>>>>> +# interface on both IPv4 and IPv6 LLA addresses >>>>>>>>>> +NETNS_DAEMONIZE([bgp-daemon], [nc -l -k 172.16.1.1 179], >>>>>>>>>> [bgp_v4.pid]) >>>>>>>>>> +NS_CHECK_EXEC([ext-foo], [echo "TCP test" | nc --send-only >>>>>>>>>> 172.16.1.1 179]) >>>>>>>>>> + >>>>>>>>>> +NETNS_DAEMONIZE([bgp-daemon], [nc -l -6 -k >>>>>>>>>> fe80::200:2ff:fe01:203%bgp-daemon 179], [bgp_v6.pid]) >>>>>>>>>> +NS_CHECK_EXEC([ext-foo], [echo "TCP test" | nc --send-only >>>>>>>>>> -6 fe80::200:2ff:fe01:203%ext-foo 179]) >>>>>>>>>> + >>>>>>>>>> +OVS_APP_EXIT_AND_WAIT([ovn-controller]) >>>>>>>>>> + >>>>>>>>>> +as ovn-sb >>>>>>>>>> +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >>>>>>>>>> + >>>>>>>>>> +as ovn-nb >>>>>>>>>> +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >>>>>>>>>> + >>>>>>>>>> +as northd >>>>>>>>>> +OVS_APP_EXIT_AND_WAIT([ovn-northd]) >>>>>>>>>> + >>>>>>>>>> +as >>>>>>>>>> +OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d >>>>>>>>>> +/.*terminating with signal 15.*/d"]) >>>>>>>>>> +AT_CLEANUP >>>>>>>>>> +]) >>>>>>>>> Regards, >>>>>>>>> Dumitru >>>>>>>>> >>>>>>> _______________________________________________ >>>>>>> dev mailing list >>>>>>> dev@openvswitch.org >>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>>> _______________________________________________ >>>> dev mailing list >>>> dev@openvswitch.org >>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> _______________________________________________ >> dev mailing list >> dev@openvswitch.org >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
On 8/8/24 14:04, Dumitru Ceara wrote: > On 8/8/24 13:51, Vladislav Odintsov wrote: >> >> On 08.08.2024 18:45, Dumitru Ceara wrote: >>> On 8/8/24 11:14, Vladislav Odintsov wrote: >>>> On 08.08.2024 15:51, martin.kalcok@canonical.com wrote: >>>>> On Thu, 2024-08-08 at 10:46 +0200, Dumitru Ceara wrote: >>>>>> On 8/8/24 10:42, Vladislav Odintsov wrote: >>>>>>> On 08.08.2024 03:16, Dumitru Ceara wrote: >>>>>>>> Re-adding the dev list. >>>>>>>> >>>>>>>> On 8/7/24 18:12, Vladislav Odintsov wrote: >>>>>>>>> Hi Dumitru, >>>>>>>>> >>>>>>>> Hi Vladislav, >>>>>>>> >>>>>>>>> I'd like to add some thoughts to your inputs. >>>>>>>>> >>>>>>>> Thanks for that, I added some more of my own below. :) >>>>>>>> >>>>>>>>> On 06.08.2024 19:23, Dumitru Ceara wrote: >>>>>>>>>> Hi Martin, >>>>>>>>>> >>>>>>>>>> Sorry, I was reviewing V3 but I was slow in actually sending >>>>>>>>>> out the email. >>>>>>>>>> >>>>>>>>>> On 8/6/24 13:17, Martin Kalcok wrote: >>>>>>>>>>> This change adds a 'bgp-redirect' option to LRP that allows >>>>>>>>>>> redirecting of BGP control plane traffic to an arbitrary >>>>>>>>>>> LSP >>>>>>>>>>> in its peer LS. >>>>>>>>>>> >>>>>>>>>>> The option expects a string with a LSP name. When set, >>>>>>>>>>> any traffic entering LS that's destined for any of the >>>>>>>>>>> LRP's IP addresses (including IPv6 LLA) is redirected >>>>>>>>>>> to the LSP specified in the option's value. >>>>>>>>>>> >>>>>>>>>>> This enables external BGP daemons to listen on an interface >>>>>>>>>>> bound to a LSP and effectively act as if they were >>>>>>>>>>> listening >>>>>>>>>>> on (and speaking from) LRP's IP address. >>>>>>>>>>> >>>>>>>>>>> Signed-off-by: Martin Kalcok <martin.kalcok@canonical.com> >>>>>>>>>>> --- >>>>>>>>>> Strictly on this version of the patch, and with the thought >>>>>>>>>> in mind that >>>>>>>>>> we might want to consider this feature experimental [0] and >>>>>>>>>> maybe change >>>>>>>>>> it (NB too) in the future, I left a few comments inline. >>>>>>>>>> With those >>>>>>>>>> addressed I think the patch looks OK to me. >>>>>>>>>> >>>>>>>>>> [0] >>>>>>>>>> https://patchwork.ozlabs.org/project/ovn/patch/20240802155449.3528777-1-mmichels@redhat.com/ >>>>>>>>>> >>>>>>>>>> Now, for the point when we'd remove the "experimental" tag: >>>>>>>>>> >>>>>>>>>> In general, it makes me a bit uneasy that we have to share >>>>>>>>>> the MAC and >>>>>>>>>> IP between the LRP and the VIF of logical switch port that's >>>>>>>>>> connected >>>>>>>>>> to the same switch as the LRP. >>>>>>>>>> >>>>>>>>>> I was thinking of alternatives for the future and the only >>>>>>>>>> things I >>>>>>>>>> could come up with until now are: >>>>>>>>>> >>>>>>>>>> 1. Create a separate, "disconnected" logical switch: >>>>>>>>>> ovn-nbctl ls-add bgp >>>>>>>>>> >>>>>>>>>> Add the bgp-daemon port to it: >>>>>>>>>> ovn-nbctl lsp-add bgp bgp-daemon. >>>>>>>>>> >>>>>>>>>> Then we don't need "unknown" either, I think. >>>>>>>>>> >>>>>>>>>> But it's not possible today to move packets from the ingress >>>>>>>>>> pipeline of >>>>>>>>>> a logical switch ("public" in this test) to the egress >>>>>>>>>> pipeline of a >>>>>>>>>> different logical switch ("bgp" in my example). It also >>>>>>>>>> feels icky to >>>>>>>>>> implement that. >>>>>>>>>> >>>>>>>>>> 2. Add the ability to bind an OVS port directly to a logical >>>>>>>>>> router port: >>>>>>>>>> >>>>>>>>>> then we could do the same type of redirect you do here but >>>>>>>>>> instead in >>>>>>>>>> the logical router pipeline. The advantage is we don't have >>>>>>>>>> to drop any >>>>>>>>>> non-bgp traffic in the switch pipeline. The switch keeps >>>>>>>>>> functioning as >>>>>>>>>> it does today. >>>>>>>>>> >>>>>>>>>> Maybe an advantage of this second alternative would be that >>>>>>>>>> we can >>>>>>>>>> easily attach a filtering option to this LRP (e.g., >>>>>>>>>> LRP.options:control-traffic-filter) to allow us to be more >>>>>>>>>> flexible in >>>>>>>>>> what kind of traffic we forward to the actuall routing >>>>>>>>>> protocol daemon >>>>>>>>>> that runs behind that OVS port - Vladislav also mentioned >>>>>>>>>> during the >>>>>>>>>> meeting it might be interesting to forward BFD packets to the >>>>>>>>>> FRR (or >>>>>>>>>> whatever application implements the routing protocol) >>>>>>>>>> instance too. >>>>>>>>> The idea to be able to bind LRP to OVS port sounds very >>>>>>>>> interesting to >>>>>>>>> me. But with a note that it's not a pure "bind", but a partial: >>>>>>>>> as you >>>>>>>>> wrote with some "filter" applied to pass not all the traffic. >>>>>>>>> The main >>>>>>>>> idea here is to pass only control plane traffic destined to >>>>>>>>> LRP's >>>>>>>> As we're discussing on the other thread (Martin pointed it out) >>>>>>>> we also >>>>>>>> probably need to involve conntrack and allow replies to any kind >>>>>>>> of >>>>>>>> traffic initiated by the entity behind the LRP's VIF (e.g., BGP >>>>>>>> sessions >>>>>>>> initiated from the host). >>>>>>>> >>>>>>>>> addresses. Other than that seems odd. Transit traffic should be >>>>>>>>> remained >>>>>>>>> in LR pipeline otherwise it will have no difference with a >>>>>>>>> regular VIF LSP. >>>>>>>>> >>>>>>>> Definitely, traffic that needs to be DNATed (DNAT/unSNAT rules or >>>>>>>> LB >>>>>>>> rules that will change the destination IP from LRP IP to >>>>>>>> something else) >>>>>>>> should not be affected. All other "transit" traffic doesn't have >>>>>>>> LRP >>>>>>>> IP, does it? >>>>>>> You're right. >>>>>>> >>>>>>> @Dumitru, @Martin, what if we just redirect all traffic destined to >>>>>>> LRP >>>>>>> IPs to redirect port? Is there any drawbacks? >>>>>>> For security it is possible to optionally use ACLs with or without >>>>>>> conntrack. It's up to user/CMS. >>>>>>> >>>>>>> This seems quite simple in the code and very flexible. So no >>>>>>> additional >>>>>>> flows seems to be needed in future to support any other routing >>>>>>> protocols (or for another possible non-routing usecases). >>>>>>> >>>>>> Won't this break all "transit" traffic that has destination IP the >>>>>> LRP >>>>>> IP (DNAT, LB), etc? >>>>>> >>>>> I'm afraid so. I guess I was just tired yesterday when I made that >>>>> proposal for general redirect. Given that the redirect is implemented >>>>> in the LS pipeline, it would eat up all the traffic for LRP's IP before >>>>> the DNAT/unSNAT occurs in the LR pipeline. I'll give it a quick test in >>>>> case I'm wrong here again. >>>> Oops, this will definitely break that cases. For some reason I thought >>>> only about routing cases. >>>> >>>>> If that's the case, I'll try the conntrack approach and update you all >>>>> asap. >>>>> >>>>> Martin. >>>>> >>>>>>>>> Having a filter (or match like in ACL or >>>>>>>>> Logical_Router_Policies) looks >>>>>>>>> more flexible in terms of used protocols. This can coexist with >>>>>>>>> proposal >>>>>>>>> from current patch, where the flow is not programmable from >>>>>>>>> user. >>>>>>>>> >>>>>>>> I think so too. >>>>>>>> Regards, >>>>>>>> Dumitru >>>>>>>> >>>>>>>>>> But again, if we consider the current feature "experimental", >>>>>>>>>> we can >>>>>>>>>> spend more time during the next release cycle to figure out >>>>>>>>>> what's best. >>>>>>>>>> >>>>>>>>>>> v4 of this patch renames the feature from "bgp-mirror" to >>>>>>>>>>> "bgp-redirect" >>>>>>>>>>> as discussed during community meeting. >>>>>>>>>>> >>>>>>>>>>> northd/northd.c | 108 >>>>>>>>>>> ++++++++++++++++++++++++++++++++++++++++ >>>>>>>>>>> northd/ovn-northd.8.xml | 23 +++++++++ >>>>>>>>>>> ovn-nb.xml | 14 ++++++ >>>>>>>>>>> tests/ovn-northd.at | 58 +++++++++++++++++++++ >>>>>>>>>>> tests/system-ovn.at | 86 >>>>>>>>>>> ++++++++++++++++++++++++++++++++ >>>>>>>>>>> 5 files changed, 289 insertions(+) >>>>>>>>>>> >>>>>>>>>>> diff --git a/northd/northd.c b/northd/northd.c >>>>>>>>>>> index 4353df07d..088104f25 100644 >>>>>>>>>>> --- a/northd/northd.c >>>>>>>>>>> +++ b/northd/northd.c >>>>>>>>>>> @@ -13048,6 +13048,113 @@ >>>>>>>>>>> build_arp_resolve_flows_for_lrp(struct ovn_port *op, >>>>>>>>>>> } >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> +static void >>>>>>>>>>> +build_bgp_redirect_rule__( >>>>>>>>>>> + const char *s_addr, const char *bgp_port_name, >>>>>>>>>>> bool is_ipv6, >>>>>>>>>>> + struct ovn_port *ls_peer, struct lflow_table >>>>>>>>>>> *lflows, >>>>>>>>>>> + struct ds *match, struct ds *actions) >>>>>>>>>>> +{ >>>>>>>>>>> + int ip_ver = is_ipv6 ? 6 : 4; >>>>>>>>>>> + /* Redirect packets in the input pipeline destined for >>>>>>>>>>> LR's IP to >>>>>>>>>>> + * the port specified in 'bgp-redirect' option. >>>>>>>>>>> + */ >>>>>>>>>>> + ds_clear(match); >>>>>>>>>>> + ds_clear(actions); >>>>>>>>>>> + ds_put_format(match, "ip%d.dst == %s && tcp.dst == >>>>>>>>>>> 179", ip_ver, s_addr); >>>>>>>>>>> + ds_put_format(actions, "outport = \"%s\"; output;", >>>>>>>>>>> bgp_port_name); >>>>>>>>>>> + ovn_lflow_add(lflows, ls_peer->od, >>>>>>>>>>> S_SWITCH_IN_L2_LKUP, 100, >>>>>>>>>>> + ds_cstr(match), >>>>>>>>>>> + ds_cstr(actions), >>>>>>>>>>> + ls_peer->lflow_ref); >>>>>>>>>>> + >>>>>>>>>>> + >>>>>>>>>>> + /* Drop any traffic originating from 'bgp-redirect' >>>>>>>>>>> port that does >>>>>>>>>>> + * not originate from BGP daemon port. This blocks >>>>>>>>>>> unnecessary >>>>>>>>>>> + * traffic like ARP broadcasts or IPv6 router >>>>>>>>>>> solicitation packets >>>>>>>>>>> + * from the dummy 'bgp-redirect' port. >>>>>>>>>>> + */ >>>>>>>>>>> + ds_clear(match); >>>>>>>>>>> + ds_put_format(match, "inport == \"%s\"", >>>>>>>>>>> bgp_port_name); >>>>>>>>>>> + ovn_lflow_add(lflows, ls_peer->od, >>>>>>>>>>> S_SWITCH_IN_CHECK_PORT_SEC, 80, >>>>>>>>>>> + ds_cstr(match), >>>>>>>>>>> + REGBIT_PORT_SEC_DROP " = 1; next;", >>>>>>>>>>> + ls_peer->lflow_ref); >>>>>>>>>> Not a huge deal but this logical flow gets regenerated >>>>>>>>>> multiple times >>>>>>>>>> for each bgp_port_name (for each LRP address). In the end >>>>>>>>>> we'll only >>>>>>>>>> install one in the database so there's no impact on SB. But >>>>>>>>>> maybe we >>>>>>>>>> should consider moving this part out of this function. >>>>>>>>>> >>>>>>>>>>> + >>>>>>>>>>> + ds_put_format(match, >>>>>>>>>>> + " && ip%d.src == %s && tcp.src == 179", >>>>>>>>>>> + ip_ver, >>>>>>>>>>> + s_addr); >>>>>>>>>>> + ovn_lflow_add(lflows, ls_peer->od, >>>>>>>>>>> S_SWITCH_IN_CHECK_PORT_SEC, 81, >>>>>>>>>>> + ds_cstr(match), >>>>>>>>>>> + REGBIT_PORT_SEC_DROP " = >>>>>>>>>>> check_in_port_sec(); next;", >>>>>>>>>>> + ls_peer->lflow_ref); >>>>>>>>>> In your system test we set 'addresses = "unknown"' for >>>>>>>>>> 'bgp_port_name'. >>>>>>>>>> And I guess we kind of have to do it like that all the time >>>>>>>>>> when using >>>>>>>>>> this feature. I'm trying to think of scenarios in which >>>>>>>>>> users set >>>>>>>>>> LSP.addresses = 'unknown' and LSP.port-security = '<some- >>>>>>>>>> addresses>'. >>>>>>>>>> I'm not sure I saw those until now but maybe I'm wrong. >>>>>>>>> I'm also interested whether this LSP port with unknown >>>>>>>>> addresses will >>>>>>>>> get all the broadcast traffic? >>>>>>>>> >>>>>>>>> @Martin, didn't you look for ARP behavior? If we have one LS >>>>>>>>> with three >>>>>>>>> LSPs: >>>>>>>>> - 1 router type LSP with MAC 00:00:00:00:00:01 and IP >>>>>>>>> 169.254.0.1/30 >>>>>>>>> - one "vif" LSP with "unknown" addresses set and configured >>>>>>>>> underlying >>>>>>>>> OVS port with same MAC and IP set >>>>>>>>> - one localnet port for external connectivity. >>>>>>>>> >>>>>>>>> If some party from localnet side sends broadcast ARP request >>>>>>>>> "Who has >>>>>>>>> 169.254.0.1? Answer <...>", will this ARP request got flooded >>>>>>>>> to both >>>>>>>>> LRP and LSP with "unknown" addresses? If yes, will the sender >>>>>>>>> receive 2 >>>>>>>>> ARP replies? >>>>>>>>> >>>>>>>>>>> +} >>>>>>>>>>> + >>>>>>>>>>> +static void >>>>>>>>>>> +build_lrouter_bgp_redirect( >>>>>>>>>>> + struct ovn_port *op, struct lflow_table *lflows, >>>>>>>>>>> + struct ds *match, struct ds *actions) >>>>>>>>>>> +{ >>>>>>>>>>> + /* LRP has to have a peer.*/ >>>>>>>>>>> + if (op->peer == NULL) { >>>>>>>>>>> + return; >>>>>>>>>>> + } >>>>>>>>>>> + /* LRP has to have NB record.*/ >>>>>>>>>>> + if (op->nbrp == NULL) { >>>>>>>>>>> + return; >>>>>>>>>>> + } >>>>>>>>>>> + >>>>>>>>>>> + /* Proceed only for LRPs that have 'bgp-redirect' >>>>>>>>>>> option set. Value of this >>>>>>>>>>> + * option is the name of LSP to which a BGP traffic >>>>>>>>>>> will be redirected. >>>>>>>>>>> + */ >>>>>>>>>>> + const char *bgp_port = smap_get(&op->nbrp->options, >>>>>>>>>>> "bgp-redirect"); >>>>>>>>>>> + if (bgp_port == NULL) { >>>>>>>>>>> + return; >>>>>>>>>>> + } >>>>>>>>>>> + >>>>>>>>>>> + /* Ensure that LSP, to which the BGP traffic is >>>>>>>>>>> redirected, exists.*/ >>>>>>>>>>> + struct ovn_port *peer_lsp; >>>>>>>>>>> + bool bgp_port_exists = false; >>>>>>>>>>> + HMAP_FOR_EACH (peer_lsp, dp_node, &op->peer->od- >>>>>>>>>>>> ports) { >>>>>>>>>>> + size_t peer_lsp_s = strlen(peer_lsp->key); >>>>>>>>>>> + if (peer_lsp_s == strlen(bgp_port) >>>>>>>>>>> + && !strncmp(peer_lsp->key, bgp_port, >>>>>>>>>>> peer_lsp_s)){ >>>>>>>>>>> + bgp_port_exists = true; >>>>>>>>>>> + break; >>>>>>>>>>> + } >>>>>>>>>>> + } >>>>>>>>>>> + >>>>>>>>>>> + if (!bgp_port_exists) { >>>>>>>>>>> + static struct vlog_rate_limit rl = >>>>>>>>>>> VLOG_RATE_LIMIT_INIT(1, 5); >>>>>>>>>>> + VLOG_WARN_RL(&rl, "Option 'bgp-redirect' set on >>>>>>>>>>> Logical Router Port " >>>>>>>>>>> + "'%s' refers to non-existent >>>>>>>>>>> Logical Switch Port. " >>>>>>>>>>> + "BGP redirecting won't be >>>>>>>>>>> configured.", >>>>>>>>>>> + op->key); >>>>>>>>>>> + return; >>>>>>>>>>> + } >>>>>>>>>>> + >>>>>>>>>>> + if (op->cr_port != NULL) { >>>>>>>>>>> + static struct vlog_rate_limit rl = >>>>>>>>>>> VLOG_RATE_LIMIT_INIT(1, 5); >>>>>>>>>>> + VLOG_WARN_RL(&rl, "Option 'bgp-redirect' is not >>>>>>>>>>> supported on" >>>>>>>>>>> + " Distributed Gateway Port >>>>>>>>>>> '%s'", op->key); >>>>>>>>>>> + return; >>>>>>>>>>> + } >>>>>>>>>>> + >>>>>>>>>>> + /* Mirror traffic destined for LRP's IPs and default >>>>>>>>>>> BGP port >>>>>>>>>>> + * to the port defined in 'bgp-redirect' option. >>>>>>>>>>> + */ >>>>>>>>>>> + for (size_t i = 0; i < op->lrp_networks.n_ipv4_addrs; >>>>>>>>>>> i++) { >>>>>>>>>>> + const char *ip_s = op- >>>>>>>>>>>> lrp_networks.ipv4_addrs[i].addr_s; >>>>>>>>>>> + build_bgp_redirect_rule__(ip_s, bgp_port, false, >>>>>>>>>>> op->peer, lflows, >>>>>>>>>>> + match, actions); >>>>>>>>>>> + } >>>> I'm thinking more and more that having a "match" expression in a >>>> "NB.Logical_Router_Port.options:redirect-match" option it a good choice. >>>> >>> Would the match be enforced only for redirected traffic (from outside to >>> bgp target LSP)? >> I guess I didn't get your question, sorry. > > I was thinking of the case Martin brought up yesterday, with the session > initiated from inside. > > If we have the following config (not sure if the option names will > change but I hope it's clear): > > LRP { > .name = lrp > .networks = [{00:00:00:00:00:01, 1.1.1.1/24}] > .options = { > .redirect-port = 'bgp-daemon' > .redirect-match = 'tcp.dst == 179 || tcp.src == 179' > } > } > > Then, in the switch pipeline in the L2_LOOKUP stage, we'll only force > the outport to be 'bgp-daemon' if "destination IP == 1.1.1.1 && > 'redirect-match'" is true. So, only traffic coming from "outside" > towards the router IP and that also matches the filter. > > No additional restrictions. > > Will this be enough though? > > What if 'bgp-daemon' ARPs for 1.1.1.2 (an external host)? How does that > work? ARP reply is unicast and will have dmac that of the LRP. > > We could include ARP and ND in the match too but then what about ARP > replies that actually should go to the router port? > Also, what about other control traffic, e.g.: a. ICMP replies from external hosts for ICMP echo initiated by bgp-daemon vs b. ICMP replies from external hosts for ICMP echo initiated by something else (a different VIF) behind the LR that got SNAT-ed to the LRP IP. > I'm actually thinking more and more that we shouldn't redirect any > traffic like this, we should just bind the VIF directly to the LRP. > >>>> LRP can have multiple IP addresses and enabling redirect for all of them >>>> could be implicit for the user and may contain security concerns. As an >>>> example: LR can have configured LB with tcp/179. If user configured >>>> peering IP-address (which can be in a link-local address range) and also >>>> added an IP address for Load Balancer in a `networks` list of a LRP. >>>> Enabling BGP redirect for all IPs of LRP will break load balancing for >>>> tcp/179. While it is not needed for load balancers to assign their VIP >>>> to networks, there is no validations, which prohibit doing so. It is >>> I think this is actually something that ovn-kubernetes does in their >>> deployments for cluster-IP services. And I suspect that today there's >>> nothing preventing users from creating cluster-IP services for TCP/179. >> >> So, this proofs that this is a possible problem if enabled redirect and >> if load balancer's IP is configured to be in networks list. >> >> In any case this doesn't prevent from current implementation to merged >> as "experimental". I'm just started discussion about a GA solution. >> >>>> still possible and can be difficult to catch such behavior for >>>> redirected traffic with Load Balancers. >>>> >>>>>>>>>>> + for (size_t i = 0; i < op->lrp_networks.n_ipv6_addrs; >>>>>>>>>>> i++) { >>>>>>>>>>> + const char *ip_s = op- >>>>>>>>>>>> lrp_networks.ipv6_addrs[i].addr_s; >>>>>>>>>>> + build_bgp_redirect_rule__(ip_s, bgp_port, true, >>>>>>>>>>> op->peer, lflows, >>>>>>>>>>> + match, actions); >>>>>>>>>>> + } >>>>>>>>>>> +} >>>>>>>>>>> + >>>>>>>>>>> /* This function adds ARP resolve flows related to a LSP. >>>>>>>>>>> */ >>>>>>>>>>> static void >>>>>>>>>>> build_arp_resolve_flows_for_lsp( >>>>>>>>>>> @@ -16003,6 +16110,7 @@ >>>>>>>>>>> build_lswitch_and_lrouter_iterate_by_lrp(struct ovn_port >>>>>>>>>>> *op, >>>>>>>>>>> lsi->meter_groups, op- >>>>>>>>>>>> lflow_ref); >>>>>>>>>>> build_lrouter_icmp_packet_toobig_admin_flows(op, lsi- >>>>>>>>>>>> lflows, &lsi->match, >>>>>>>>>>> &lsi- >>>>>>>>>>>> actions, op->lflow_ref); >>>>>>>>>>> + build_lrouter_bgp_redirect(op, lsi->lflows, &lsi- >>>>>>>>>>>> match, &lsi->actions); >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> static void * >>>>>>>>>>> diff --git a/northd/ovn-northd.8.xml b/northd/ovn- >>>>>>>>>>> northd.8.xml >>>>>>>>>>> index b06b09ac5..4cbdf8f74 100644 >>>>>>>>>>> --- a/northd/ovn-northd.8.xml >>>>>>>>>>> +++ b/northd/ovn-northd.8.xml >>>>>>>>>>> @@ -284,6 +284,21 @@ >>>>>>>>>>> dropped in the next stage. >>>>>>>>>>> </li> >>>>>>>>>>> >>>>>>>>>>> + <li> >>>>>>>>>>> + For each logical port that's defined as a target >>>>>>>>>>> of BGP redirecting >>>>>>>>>>> + (via <code>bgp-redirect</code> option set on >>>>>>>>>>> Logical Router Port), a >>>>>>>>>>> + filter is set in place that disallows any traffic >>>>>>>>>>> entering this port >>>>>>>>>>> + that does not originate from Logical Router Port's >>>>>>>>>>> IPs and default BGP >>>>>>>>>>> + port (TCP 179). This filtering is achieved by two >>>>>>>>>>> rules. First rule has >>>>>>>>>>> + priority 81, it matches on <code>inport == >>>>>>>>>>> <var>BGP_MIRROR_PORT</var> >>>>>>>>>>> + && ip.src == <var>LRP_IP</var> && >>>>>>>>>>> tcp.src == 179</code> >>>>>>>>>>> + and allows traffic further with >>>>>>>>>>> <code>REGBIT_PORT_SEC_DROP" = >>>>>>>>>>> + check_in_port_sec(); next;</code>. Second rule >>>>>>>>>>> with priority 80 matches >>>>>>>>>>> + the rest of the traffic from that port and sets >>>>>>>>>>> + <code>REGBIT_PORT_SEC_DROP" = 1; next;"</code> so >>>>>>>>>>> that the packets are >>>>>>>>>>> + dropped in the next stage. >>>>>>>>>>> + </li> >>>>>>>>>>> + >>>>>>>>>>> <li> >>>>>>>>>>> For each (enabled) vtep logical port, a priority >>>>>>>>>>> 70 flow is added which >>>>>>>>>>> matches on all packets and applies the action >>>>>>>>>>> @@ -1949,6 +1964,14 @@ output; >>>>>>>>>>> on the logical switch. >>>>>>>>>>> </li> >>>>>>>>>>> >>>>>>>>>>> + <li> >>>>>>>>>>> + For any logical port that's defined as a target of >>>>>>>>>>> BGP redirecting (via >>>>>>>>>>> + <code>bgp-redirect</code> option set on Logical >>>>>>>>>>> Router Port), a >>>>>>>>>>> + priority-100 flow is added that redirects traffic >>>>>>>>>>> for Logical Router >>>>>>>>>>> + Port IPs (including IPv6 LLA) and TCP port 179, to >>>>>>>>>>> the targeted >>>>>>>>>>> + logical port. >>>>>>>>>>> + </li> >>>>>>>>>>> + >>>>>>>>>>> <li> >>>>>>>>>>> Priority-90 flows for transit switches that >>>>>>>>>>> forward registered >>>>>>>>>>> IP multicast traffic to their corresponding >>>>>>>>>>> multicast group , which >>>>>>>>>>> diff --git a/ovn-nb.xml b/ovn-nb.xml >>>>>>>>>>> index 9552534f6..2d2f5c9e0 100644 >>>>>>>>>>> --- a/ovn-nb.xml >>>>>>>>>>> +++ b/ovn-nb.xml >>>>>>>>>>> @@ -3440,6 +3440,20 @@ or >>>>>>>>>>> </p> >>>>>>>>>>> </column> >>>>>>>>>>> >>>>>>>>>>> + <column name="options" key="bgp-redirect" >>>>>>>>>>> type='{"type": "string"}'> >>>>>>>>>>> + <p> >>>>>>>>>>> + This option expects a name of a Logical Switch >>>>>>>>>>> Port that's present >>>>>>>>>>> + in the peer's Logical Switch. If set, it causes >>>>>>>>>>> any traffic >>>>>>>>>>> + that's destined for Logical Router Port's IP >>>>>>>>>>> addresses (including >>>>>>>>>>> + its IPv6 LLA) and the default BGP port (TCP >>>>>>>>>>> 179), to be redirected >>>>>>>>>>> + to the specified Logical Switch Port. >>>>>>>>>>> + >>>>>>>>>>> + This allows external BGP daemon to be bound to a >>>>>>>>>>> port in OVN's >>>>>>>>>>> + Logical Switch and act as if it was listening on >>>>>>>>>>> Logical Router >>>>>>>>>>> + Port's IP address. >>>>>>>>>>> + </p> >>>>>>>>>>> + </column> >>>>>>>>>>> + >>>>>>>>>>> <column name="options" key="gateway_mtu_bypass"> >>>>>>>>>>> <p> >>>>>>>>>>> When configured, represents a match expression, >>>>>>>>>>> in the same >>>>>>>>>>> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at >>>>>>>>>>> index a389d1988..df1e924ab 100644 >>>>>>>>>>> --- a/tests/ovn-northd.at >>>>>>>>>>> +++ b/tests/ovn-northd.at >>>>>>>>>>> @@ -12721,3 +12721,61 @@ AT_CHECK([ovn-sbctl dump-flows lr >>>>>>>>>>> | grep lr_in_dnat | ovn_strip_lflows], [0], [d >>>>>>>>>>> >>>>>>>>>>> AT_CLEANUP >>>>>>>>>>> ]) >>>>>>>>>>> + >>>>>>>>>>> +OVN_FOR_EACH_NORTHD_NO_HV([ >>>>>>>>>>> +AT_SETUP([BGP control plane redirect]) >>>>>>>>>>> +ovn_start >>>>>>>>>>> + >>>>>>>>>>> +check ovn-sbctl chassis-add hv1 geneve 127.0.0.1 >>>>>>>>>>> + >>>>>>>>>>> +check ovn-nbctl lr-add lr -- \ >>>>>>>>>>> + lrp-add lr lr-ls 02:ac:10:01:00:01 172.16.1.1/24 >>>>>>>>>>> +check ovn-nbctl --wait=sb set logical_router lr >>>>>>>>>>> options:chassis=hv1 >>>>>>>>>>> + >>>>>>>>>>> +check ovn-nbctl ls-add ls -- \ >>>>>>>>>>> + lsp-add ls ls-lr -- \ >>>>>>>>>>> + lsp-set-type ls-lr router -- \ >>>>>>>>>>> + lsp-set-addresses ls-lr router -- \ >>>>>>>>>>> + lsp-set-options ls-lr router-port=lr-ls >>>>>>>>>>> + >>>>>>>>>>> +check ovn-nbctl lsp-add ls lsp-bgp -- \ >>>>>>>>>>> + lsp-set-addresses lsp-bgp unknown >>>>>>>>>>> + >>>>>>>>>>> +# Function that ensures that no BGP redirect ruels are >>>>>>>>>>> installed. >>>>>>>>>>> +check_no_bgp_redirect() { >>>>>>>>>>> + AT_CHECK([ovn-sbctl dump-flows ls | grep >>>>>>>>>>> ls_in_l2_lkup | grep "tcp.dst == 179" | wc -l], [0], [0 >>>>>>>>>>> +]) >>>>>>>>>>> + >>>>>>>>>>> + AT_CHECK([ovn-sbctl dump-flows ls | grep >>>>>>>>>>> ls_in_check_port_sec | grep -E "priority=80|priority=81" | >>>>>>>>>>> wc -l], [0], [0 >>>>>>>>>>> +]) >>>>>>>>>>> +} >>>>>>>>>>> + >>>>>>>>>>> +# By default, no rules related to BGP redirect are present >>>>>>>>>>> +check_no_bgp_redirect >>>>>>>>>>> + >>>>>>>>>>> +# Set "lsp-bgp" port as target of BGP control plane >>>>>>>>>>> redirected traffic >>>>>>>>>>> +check ovn-nbctl --wait=sb set logical_router_port lr-ls >>>>>>>>>>> options:bgp-redirect=lsp-bgp >>>>>>>>>>> + >>>>>>>>>>> +# Check that BGP control plane traffic is redirected to >>>>>>>>>>> "lsp-bgp" >>>>>>>>>>> +AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_l2_lkup | >>>>>>>>>>> grep "tcp.dst == 179" | ovn_strip_lflows], [0], [dnl >>>>>>>>>>> + table=??(ls_in_l2_lkup ), priority=100 , >>>>>>>>>>> match=(ip4.dst == 172.16.1.1 && tcp.dst == 179), >>>>>>>>>>> action=(outport = "lsp-bgp"; output;) >>>>>>>>>>> + table=??(ls_in_l2_lkup ), priority=100 , >>>>>>>>>>> match=(ip6.dst == fe80::ac:10ff:fe01:1 && tcp.dst == 179), >>>>>>>>>>> action=(outport = "lsp-bgp"; output;) >>>>>>>>>>> +]) >>>>>>>>>>> + >>>>>>>>>>> +# Check that only BGP-related traffic is accepted on "lsp- >>>>>>>>>>> bgp" port >>>>>>>>>>> +AT_CHECK([ovn-sbctl dump-flows ls | grep >>>>>>>>>>> ls_in_check_port_sec | grep -E "priority=80|priority=81" | >>>>>>>>>>> ovn_strip_lflows], [0], [dnl >>>>>>>>>>> + table=??(ls_in_check_port_sec), priority=80 , >>>>>>>>>>> match=(inport == "lsp-bgp"), action=(reg0[[15]] = 1; next;) >>>>>>>>>>> + table=??(ls_in_check_port_sec), priority=81 , >>>>>>>>>>> match=(inport == "lsp-bgp" && ip4.src == 172.16.1.1 && >>>>>>>>>>> tcp.src == 179), action=(reg0[[15]] = check_in_port_sec(); >>>>>>>>>>> next;) >>>>>>>>>>> + table=??(ls_in_check_port_sec), priority=81 , >>>>>>>>>>> match=(inport == "lsp-bgp" && ip6.src == >>>>>>>>>>> fe80::ac:10ff:fe01:1 && tcp.src == 179), action=(reg0[[15]] >>>>>>>>>>> = check_in_port_sec(); next;) >>>>>>>>>>> +]) >>>>>>>>>>> + >>>>>>>>>>> +# Remove 'bgp-redirect' option from LRP and check that >>>>>>>>>>> rules are removed >>>>>>>>>>> +check ovn-nbctl --wait=sb remove logical_router_port lr-ls >>>>>>>>>>> options bgp-redirect >>>>>>>>>>> +check_no_bgp_redirect >>>>>>>>>>> + >>>>>>>>>>> +# Set non-existent LSP as target of 'bgp-redirect' and >>>>>>>>>>> check that no rules are added >>>>>>>>>>> +check ovn-nbctl --wait=sb set logical_router_port lr-ls >>>>>>>>>>> options:bgp-redirect=lsp-foo >>>>>>>>>>> +check_no_bgp_redirect >>>>>>>>>>> + >>>>>>>>>>> +AT_CLEANUP >>>>>>>>>>> +]) >>>>>>>>>>> diff --git a/tests/system-ovn.at b/tests/system-ovn.at >>>>>>>>>>> index c24ede7c5..c09c1136e 100644 >>>>>>>>>>> --- a/tests/system-ovn.at >>>>>>>>>>> +++ b/tests/system-ovn.at >>>>>>>>>>> @@ -13027,3 +13027,89 @@ >>>>>>>>>>> OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch- >>>>>>>>>>> .*/d >>>>>>>>>>> /connection dropped.*/d"]) >>>>>>>>>>> AT_CLEANUP >>>>>>>>>>> ]) >>>>>>>>>>> + >>>>>>>>>>> +OVN_FOR_EACH_NORTHD([ >>>>>>>>>>> +AT_SETUP([BGP control plane redirect]) >>>>>>>>>> Nit: probably best to add AT_SKIP_IF([test $HAVE_NC = no]) >>>>>>>>>> here. >>>>>>>>>> >>>>>>>>>>> +ovn_start >>>>>>>>>>> +OVS_TRAFFIC_VSWITCHD_START() >>>>>>>>>>> + >>>>>>>>>>> +ADD_BR([br-int]) >>>>>>>>>>> +ADD_BR([br-ext]) >>>>>>>>>>> + >>>>>>>>>>> +ovs-ofctl add-flow br-ext action=normal >>>>>>>>>>> +# Set external-ids in br-int needed for ovn-controller >>>>>>>>>>> +ovs-vsctl \ >>>>>>>>>>> + -- set Open_vSwitch . external-ids:system-id=hv1 \ >>>>>>>>>>> + -- set Open_vSwitch . external-ids:ovn- >>>>>>>>>>> remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \ >>>>>>>>>>> + -- set Open_vSwitch . external-ids:ovn-encap- >>>>>>>>>>> type=geneve \ >>>>>>>>>>> + -- set Open_vSwitch . external-ids:ovn-encap- >>>>>>>>>>> ip=169.0.0.1 \ >>>>>>>>>>> + -- set bridge br-int fail-mode=secure other- >>>>>>>>>>> config:disable-in-band=true >>>>>>>>>> Most ov*-ctl commands should be prefixed with "check" in this >>>>>>>>>> test. >>>>>>>>>> >>>>>>>>>>> + >>>>>>>>>>> +# Start ovn-controller >>>>>>>>>>> +start_daemon ovn-controller >>>>>>>>>>> + >>>>>>>>>>> +ovn-nbctl lr-add R1 \ >>>>>>>>>>> + -- set Logical_Router R1 options:chassis=hv1 >>>>>>>>>>> + >>>>>>>>>>> +ovn-nbctl ls-add public >>>>>>>>>>> + >>>>>>>>>>> +ovn-nbctl lrp-add R1 rp-public 00:00:02:01:02:03 >>>>>>>>>>> 172.16.1.1/24 >>>>>>>>>>> + >>>>>>>>>>> +ovn-nbctl lsp-add public public-rp -- set >>>>>>>>>>> Logical_Switch_Port public-rp \ >>>>>>>>>>> + type=router options:router-port=rp-public \ >>>>>>>>>>> + -- lsp-set-addresses public-rp router >>>>>>>>>>> + >>>>>>>>>>> +ovn-nbctl lsp-add public bgp-daemon \ >>>>>>>>>>> + -- lsp-set-addresses bgp-daemon unknown >>>>>>>>>>> + >>>>>>>>>>> +AT_CHECK([ovs-vsctl set Open_vSwitch . external-ids:ovn- >>>>>>>>>>> bridge-mappings=phynet:br-ext]) >>>>>>>>>>> +ovn-nbctl lsp-add public public1 \ >>>>>>>>>>> + -- lsp-set-addresses public1 unknown \ >>>>>>>>>>> + -- lsp-set-type public1 localnet \ >>>>>>>>>>> + -- lsp-set-options public1 network_name=phynet >>>>>>>>>>> + >>>>>>>>>>> +ovn-nbctl --wait=hv sync >>>>>>>>>>> + >>>>>>>>>>> +# Set option that redirects BGP control plane traffic to a >>>>>>>>>>> LSP "bgp-daemon" >>>>>>>>>>> +ovn-nbctl --wait=sb set logical_router_port rp-public >>>>>>>>>>> options:bgp-redirect=bgp-daemon >>>>>>>>>>> + >>>>>>>>>>> +# Create "bgp-daemon" interface in a namespace with IP and >>>>>>>>>>> MAC matching LRP "rp-public" >>>>>>>>>>> +ADD_NAMESPACES(bgp-daemon) >>>>>>>>>>> +ADD_VETH(bgp-daemon, bgp-daemon, br-int, "172.16.1.1/24", >>>>>>>>>>> "00:00:02:01:02:03") >>>>>>>>>>> + >>>>>>>>>>> +ADD_NAMESPACES(ext-foo) >>>>>>>>>>> +ADD_VETH(ext-foo, ext-foo, br-ext, "172.16.1.100/24", >>>>>>>>>>> "00:10:10:01:02:13", \ >>>>>>>>>>> + "172.16.1.1") >>>>>>>>>>> + >>>>>>>>>>> +# Flip the interface down/up to get proper IPv6 LLA >>>>>>>>>>> +NS_EXEC([bgp-daemon], [ip link set down bgp-daemon]) >>>>>>>>>>> +NS_EXEC([bgp-daemon], [ip link set up bgp-daemon]) >>>>>>>>>>> + >>>>>>>>>>> +# Wait until IPv6 LLA loses the "tentative" flag otherwise >>>>>>>>>>> it can't be bound to. >>>>>>>>>>> +OVS_WAIT_UNTIL([NS_EXEC([bgp-daemon], [ip a show dev bgp- >>>>>>>>>>> daemon | grep "fe80::" | grep -v tentative])]) >>>>>>>>>>> +OVS_WAIT_UNTIL([NS_EXEC([ext-foo], [ip a show dev ext-foo >>>>>>>>>>> | grep "fe80::" | grep -v tentative])]) >>>>>>>>>>> + >>>>>>>>>>> +# Verify that BGP control plane traffic is delivered to >>>>>>>>>>> the "bgp-daemon" >>>>>>>>>>> +# interface on both IPv4 and IPv6 LLA addresses >>>>>>>>>>> +NETNS_DAEMONIZE([bgp-daemon], [nc -l -k 172.16.1.1 179], >>>>>>>>>>> [bgp_v4.pid]) >>>>>>>>>>> +NS_CHECK_EXEC([ext-foo], [echo "TCP test" | nc --send-only >>>>>>>>>>> 172.16.1.1 179]) >>>>>>>>>>> + >>>>>>>>>>> +NETNS_DAEMONIZE([bgp-daemon], [nc -l -6 -k >>>>>>>>>>> fe80::200:2ff:fe01:203%bgp-daemon 179], [bgp_v6.pid]) >>>>>>>>>>> +NS_CHECK_EXEC([ext-foo], [echo "TCP test" | nc --send-only >>>>>>>>>>> -6 fe80::200:2ff:fe01:203%ext-foo 179]) >>>>>>>>>>> + >>>>>>>>>>> +OVS_APP_EXIT_AND_WAIT([ovn-controller]) >>>>>>>>>>> + >>>>>>>>>>> +as ovn-sb >>>>>>>>>>> +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >>>>>>>>>>> + >>>>>>>>>>> +as ovn-nb >>>>>>>>>>> +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >>>>>>>>>>> + >>>>>>>>>>> +as northd >>>>>>>>>>> +OVS_APP_EXIT_AND_WAIT([ovn-northd]) >>>>>>>>>>> + >>>>>>>>>>> +as >>>>>>>>>>> +OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d >>>>>>>>>>> +/.*terminating with signal 15.*/d"]) >>>>>>>>>>> +AT_CLEANUP >>>>>>>>>>> +]) >>>>>>>>>> Regards, >>>>>>>>>> Dumitru >>>>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> dev mailing list >>>>>>>> dev@openvswitch.org >>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>>>> _______________________________________________ >>>>> dev mailing list >>>>> dev@openvswitch.org >>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>> _______________________________________________ >>> dev mailing list >>> dev@openvswitch.org >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
On Thu, 2024-08-08 at 14:18 +0200, Dumitru Ceara wrote: > On 8/8/24 14:04, Dumitru Ceara wrote: > > On 8/8/24 13:51, Vladislav Odintsov wrote: > > > > > > On 08.08.2024 18:45, Dumitru Ceara wrote: > > > > On 8/8/24 11:14, Vladislav Odintsov wrote: > > > > > On 08.08.2024 15:51, martin.kalcok@canonical.com wrote: > > > > > > On Thu, 2024-08-08 at 10:46 +0200, Dumitru Ceara wrote: > > > > > > > On 8/8/24 10:42, Vladislav Odintsov wrote: > > > > > > > > On 08.08.2024 03:16, Dumitru Ceara wrote: > > > > > > > > > Re-adding the dev list. > > > > > > > > > > > > > > > > > > On 8/7/24 18:12, Vladislav Odintsov wrote: > > > > > > > > > > Hi Dumitru, > > > > > > > > > > > > > > > > > > > Hi Vladislav, > > > > > > > > > > > > > > > > > > > I'd like to add some thoughts to your inputs. > > > > > > > > > > > > > > > > > > > Thanks for that, I added some more of my own below. > > > > > > > > > :) > > > > > > > > > > > > > > > > > > > On 06.08.2024 19:23, Dumitru Ceara wrote: > > > > > > > > > > > Hi Martin, > > > > > > > > > > > > > > > > > > > > > > Sorry, I was reviewing V3 but I was slow in > > > > > > > > > > > actually sending > > > > > > > > > > > out the email. > > > > > > > > > > > > > > > > > > > > > > On 8/6/24 13:17, Martin Kalcok wrote: > > > > > > > > > > > > This change adds a 'bgp-redirect' option to LRP > > > > > > > > > > > > that allows > > > > > > > > > > > > redirecting of BGP control plane traffic to an > > > > > > > > > > > > arbitrary > > > > > > > > > > > > LSP > > > > > > > > > > > > in its peer LS. > > > > > > > > > > > > > > > > > > > > > > > > The option expects a string with a LSP name. > > > > > > > > > > > > When set, > > > > > > > > > > > > any traffic entering LS that's destined for any > > > > > > > > > > > > of the > > > > > > > > > > > > LRP's IP addresses (including IPv6 LLA) is > > > > > > > > > > > > redirected > > > > > > > > > > > > to the LSP specified in the option's value. > > > > > > > > > > > > > > > > > > > > > > > > This enables external BGP daemons to listen on > > > > > > > > > > > > an interface > > > > > > > > > > > > bound to a LSP and effectively act as if they > > > > > > > > > > > > were > > > > > > > > > > > > listening > > > > > > > > > > > > on (and speaking from) LRP's IP address. > > > > > > > > > > > > > > > > > > > > > > > > Signed-off-by: Martin Kalcok > > > > > > > > > > > > <martin.kalcok@canonical.com> > > > > > > > > > > > > --- > > > > > > > > > > > Strictly on this version of the patch, and with > > > > > > > > > > > the thought > > > > > > > > > > > in mind that > > > > > > > > > > > we might want to consider this feature > > > > > > > > > > > experimental [0] and > > > > > > > > > > > maybe change > > > > > > > > > > > it (NB too) in the future, I left a few comments > > > > > > > > > > > inline. > > > > > > > > > > > With those > > > > > > > > > > > addressed I think the patch looks OK to me. > > > > > > > > > > > > > > > > > > > > > > [0] > > > > > > > > > > > https://patchwork.ozlabs.org/project/ovn/patch/20240802155449.3528777-1-mmichels@redhat.com/ > > > > > > > > > > > > > > > > > > > > > > Now, for the point when we'd remove the > > > > > > > > > > > "experimental" tag: > > > > > > > > > > > > > > > > > > > > > > In general, it makes me a bit uneasy that we have > > > > > > > > > > > to share > > > > > > > > > > > the MAC and > > > > > > > > > > > IP between the LRP and the VIF of logical switch > > > > > > > > > > > port that's > > > > > > > > > > > connected > > > > > > > > > > > to the same switch as the LRP. > > > > > > > > > > > > > > > > > > > > > > I was thinking of alternatives for the future and > > > > > > > > > > > the only > > > > > > > > > > > things I > > > > > > > > > > > could come up with until now are: > > > > > > > > > > > > > > > > > > > > > > 1. Create a separate, "disconnected" logical > > > > > > > > > > > switch: > > > > > > > > > > > ovn-nbctl ls-add bgp > > > > > > > > > > > > > > > > > > > > > > Add the bgp-daemon port to it: > > > > > > > > > > > ovn-nbctl lsp-add bgp bgp-daemon. > > > > > > > > > > > > > > > > > > > > > > Then we don't need "unknown" either, I think. > > > > > > > > > > > > > > > > > > > > > > But it's not possible today to move packets from > > > > > > > > > > > the ingress > > > > > > > > > > > pipeline of > > > > > > > > > > > a logical switch ("public" in this test) to the > > > > > > > > > > > egress > > > > > > > > > > > pipeline of a > > > > > > > > > > > different logical switch ("bgp" in my example). > > > > > > > > > > > It also > > > > > > > > > > > feels icky to > > > > > > > > > > > implement that. > > > > > > > > > > > > > > > > > > > > > > 2. Add the ability to bind an OVS port directly > > > > > > > > > > > to a logical > > > > > > > > > > > router port: > > > > > > > > > > > > > > > > > > > > > > then we could do the same type of redirect you do > > > > > > > > > > > here but > > > > > > > > > > > instead in > > > > > > > > > > > the logical router pipeline. The advantage is we > > > > > > > > > > > don't have > > > > > > > > > > > to drop any > > > > > > > > > > > non-bgp traffic in the switch pipeline. The > > > > > > > > > > > switch keeps > > > > > > > > > > > functioning as > > > > > > > > > > > it does today. > > > > > > > > > > > > > > > > > > > > > > Maybe an advantage of this second alternative > > > > > > > > > > > would be that > > > > > > > > > > > we can > > > > > > > > > > > easily attach a filtering option to this LRP > > > > > > > > > > > (e.g., > > > > > > > > > > > LRP.options:control-traffic-filter) to allow us > > > > > > > > > > > to be more > > > > > > > > > > > flexible in > > > > > > > > > > > what kind of traffic we forward to the actuall > > > > > > > > > > > routing > > > > > > > > > > > protocol daemon > > > > > > > > > > > that runs behind that OVS port - Vladislav also > > > > > > > > > > > mentioned > > > > > > > > > > > during the > > > > > > > > > > > meeting it might be interesting to forward BFD > > > > > > > > > > > packets to the > > > > > > > > > > > FRR (or > > > > > > > > > > > whatever application implements the routing > > > > > > > > > > > protocol) > > > > > > > > > > > instance too. > > > > > > > > > > The idea to be able to bind LRP to OVS port sounds > > > > > > > > > > very > > > > > > > > > > interesting to > > > > > > > > > > me. But with a note that it's not a pure "bind", > > > > > > > > > > but a partial: > > > > > > > > > > as you > > > > > > > > > > wrote with some "filter" applied to pass not all > > > > > > > > > > the traffic. > > > > > > > > > > The main > > > > > > > > > > idea here is to pass only control plane traffic > > > > > > > > > > destined to > > > > > > > > > > LRP's > > > > > > > > > As we're discussing on the other thread (Martin > > > > > > > > > pointed it out) > > > > > > > > > we also > > > > > > > > > probably need to involve conntrack and allow replies > > > > > > > > > to any kind > > > > > > > > > of > > > > > > > > > traffic initiated by the entity behind the LRP's VIF > > > > > > > > > (e.g., BGP > > > > > > > > > sessions > > > > > > > > > initiated from the host). > > > > > > > > > > > > > > > > > > > addresses. Other than that seems odd. Transit > > > > > > > > > > traffic should be > > > > > > > > > > remained > > > > > > > > > > in LR pipeline otherwise it will have no difference > > > > > > > > > > with a > > > > > > > > > > regular VIF LSP. > > > > > > > > > > > > > > > > > > > Definitely, traffic that needs to be DNATed > > > > > > > > > (DNAT/unSNAT rules or > > > > > > > > > LB > > > > > > > > > rules that will change the destination IP from LRP IP > > > > > > > > > to > > > > > > > > > something else) > > > > > > > > > should not be affected. All other "transit" traffic > > > > > > > > > doesn't have > > > > > > > > > LRP > > > > > > > > > IP, does it? > > > > > > > > You're right. > > > > > > > > > > > > > > > > @Dumitru, @Martin, what if we just redirect all traffic > > > > > > > > destined to > > > > > > > > LRP > > > > > > > > IPs to redirect port? Is there any drawbacks? > > > > > > > > For security it is possible to optionally use ACLs with > > > > > > > > or without > > > > > > > > conntrack. It's up to user/CMS. > > > > > > > > > > > > > > > > This seems quite simple in the code and very flexible. > > > > > > > > So no > > > > > > > > additional > > > > > > > > flows seems to be needed in future to support any other > > > > > > > > routing > > > > > > > > protocols (or for another possible non-routing > > > > > > > > usecases). > > > > > > > > > > > > > > > Won't this break all "transit" traffic that has > > > > > > > destination IP the > > > > > > > LRP > > > > > > > IP (DNAT, LB), etc? > > > > > > > > > > > > > I'm afraid so. I guess I was just tired yesterday when I > > > > > > made that > > > > > > proposal for general redirect. Given that the redirect is > > > > > > implemented > > > > > > in the LS pipeline, it would eat up all the traffic for > > > > > > LRP's IP before > > > > > > the DNAT/unSNAT occurs in the LR pipeline. I'll give it a > > > > > > quick test in > > > > > > case I'm wrong here again. > > > > > Oops, this will definitely break that cases. For some reason > > > > > I thought > > > > > only about routing cases. > > > > > > > > > > > If that's the case, I'll try the conntrack approach and > > > > > > update you all > > > > > > asap. > > > > > > > > > > > > Martin. I tried approach of redirecting traffic that matches: (ip.dst == <LRP_IP> && tcp.dst == <BGP>) || (ip.dst == <LRP_IP> && tcp.src == <BGP>) This seems to solve bidirectional BGP sessions without the need for redirecting all of the traffic, or involving conntrack. Though I see some initial lag when session is initiated from our BGP daemon. I suspect that this is something to do with ARP/ND resolution. I'm looking into it. But to me this seems like a relatively easy to implement and functioning approach (provided I find the reason for the initial lag). > > > > > > > > > > > > > > > > Having a filter (or match like in ACL or > > > > > > > > > > Logical_Router_Policies) looks > > > > > > > > > > more flexible in terms of used protocols. This can > > > > > > > > > > coexist with > > > > > > > > > > proposal > > > > > > > > > > from current patch, where the flow is not > > > > > > > > > > programmable from > > > > > > > > > > user. > > > > > > > > > > > > > > > > > > > I think so too. > > > > > > > > > Regards, > > > > > > > > > Dumitru > > > > > > > > > > > > > > > > > > > > But again, if we consider the current feature > > > > > > > > > > > "experimental", > > > > > > > > > > > we can > > > > > > > > > > > spend more time during the next release cycle to > > > > > > > > > > > figure out > > > > > > > > > > > what's best. > > > > > > > > > > > > > > > > > > > > > > > v4 of this patch renames the feature from "bgp- > > > > > > > > > > > > mirror" to > > > > > > > > > > > > "bgp-redirect" > > > > > > > > > > > > as discussed during community meeting. > > > > > > > > > > > > > > > > > > > > > > > > northd/northd.c | 108 > > > > > > > > > > > > ++++++++++++++++++++++++++++++++++++++++ > > > > > > > > > > > > northd/ovn-northd.8.xml | 23 +++++++++ > > > > > > > > > > > > ovn-nb.xml | 14 ++++++ > > > > > > > > > > > > tests/ovn-northd.at | 58 > > > > > > > > > > > > +++++++++++++++++++++ > > > > > > > > > > > > tests/system-ovn.at | 86 > > > > > > > > > > > > ++++++++++++++++++++++++++++++++ > > > > > > > > > > > > 5 files changed, 289 insertions(+) > > > > > > > > > > > > > > > > > > > > > > > > diff --git a/northd/northd.c b/northd/northd.c > > > > > > > > > > > > index 4353df07d..088104f25 100644 > > > > > > > > > > > > --- a/northd/northd.c > > > > > > > > > > > > +++ b/northd/northd.c > > > > > > > > > > > > @@ -13048,6 +13048,113 @@ > > > > > > > > > > > > build_arp_resolve_flows_for_lrp(struct ovn_port > > > > > > > > > > > > *op, > > > > > > > > > > > > } > > > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > > > +static void > > > > > > > > > > > > +build_bgp_redirect_rule__( > > > > > > > > > > > > + const char *s_addr, const char > > > > > > > > > > > > *bgp_port_name, > > > > > > > > > > > > bool is_ipv6, > > > > > > > > > > > > + struct ovn_port *ls_peer, struct > > > > > > > > > > > > lflow_table > > > > > > > > > > > > *lflows, > > > > > > > > > > > > + struct ds *match, struct ds *actions) > > > > > > > > > > > > +{ > > > > > > > > > > > > + int ip_ver = is_ipv6 ? 6 : 4; > > > > > > > > > > > > + /* Redirect packets in the input pipeline > > > > > > > > > > > > destined for > > > > > > > > > > > > LR's IP to > > > > > > > > > > > > + * the port specified in 'bgp-redirect' > > > > > > > > > > > > option. > > > > > > > > > > > > + */ > > > > > > > > > > > > + ds_clear(match); > > > > > > > > > > > > + ds_clear(actions); > > > > > > > > > > > > + ds_put_format(match, "ip%d.dst == %s && > > > > > > > > > > > > tcp.dst == > > > > > > > > > > > > 179", ip_ver, s_addr); > > > > > > > > > > > > + ds_put_format(actions, "outport = \"%s\"; > > > > > > > > > > > > output;", > > > > > > > > > > > > bgp_port_name); > > > > > > > > > > > > + ovn_lflow_add(lflows, ls_peer->od, > > > > > > > > > > > > S_SWITCH_IN_L2_LKUP, 100, > > > > > > > > > > > > + ds_cstr(match), > > > > > > > > > > > > + ds_cstr(actions), > > > > > > > > > > > > + ls_peer->lflow_ref); > > > > > > > > > > > > + > > > > > > > > > > > > + > > > > > > > > > > > > + /* Drop any traffic originating from 'bgp- > > > > > > > > > > > > redirect' > > > > > > > > > > > > port that does > > > > > > > > > > > > + * not originate from BGP daemon port. > > > > > > > > > > > > This blocks > > > > > > > > > > > > unnecessary > > > > > > > > > > > > + * traffic like ARP broadcasts or IPv6 > > > > > > > > > > > > router > > > > > > > > > > > > solicitation packets > > > > > > > > > > > > + * from the dummy 'bgp-redirect' port. > > > > > > > > > > > > + */ > > > > > > > > > > > > + ds_clear(match); > > > > > > > > > > > > + ds_put_format(match, "inport == \"%s\"", > > > > > > > > > > > > bgp_port_name); > > > > > > > > > > > > + ovn_lflow_add(lflows, ls_peer->od, > > > > > > > > > > > > S_SWITCH_IN_CHECK_PORT_SEC, 80, > > > > > > > > > > > > + ds_cstr(match), > > > > > > > > > > > > + REGBIT_PORT_SEC_DROP " = 1; > > > > > > > > > > > > next;", > > > > > > > > > > > > + ls_peer->lflow_ref); > > > > > > > > > > > Not a huge deal but this logical flow gets > > > > > > > > > > > regenerated > > > > > > > > > > > multiple times > > > > > > > > > > > for each bgp_port_name (for each LRP address). > > > > > > > > > > > In the end > > > > > > > > > > > we'll only > > > > > > > > > > > install one in the database so there's no impact > > > > > > > > > > > on SB. But > > > > > > > > > > > maybe we > > > > > > > > > > > should consider moving this part out of this > > > > > > > > > > > function. > > > > > > > > > > > > > > > > > > > > > > > + > > > > > > > > > > > > + ds_put_format(match, > > > > > > > > > > > > + " && ip%d.src == %s && > > > > > > > > > > > > tcp.src == 179", > > > > > > > > > > > > + ip_ver, > > > > > > > > > > > > + s_addr); > > > > > > > > > > > > + ovn_lflow_add(lflows, ls_peer->od, > > > > > > > > > > > > S_SWITCH_IN_CHECK_PORT_SEC, 81, > > > > > > > > > > > > + ds_cstr(match), > > > > > > > > > > > > + REGBIT_PORT_SEC_DROP " = > > > > > > > > > > > > check_in_port_sec(); next;", > > > > > > > > > > > > + ls_peer->lflow_ref); > > > > > > > > > > > In your system test we set 'addresses = > > > > > > > > > > > "unknown"' for > > > > > > > > > > > 'bgp_port_name'. > > > > > > > > > > > And I guess we kind of have to do it like that > > > > > > > > > > > all the time > > > > > > > > > > > when using > > > > > > > > > > > this feature. I'm trying to think of scenarios > > > > > > > > > > > in which > > > > > > > > > > > users set > > > > > > > > > > > LSP.addresses = 'unknown' and LSP.port-security = > > > > > > > > > > > '<some- > > > > > > > > > > > addresses>'. > > > > > > > > > > > I'm not sure I saw those until now but maybe I'm > > > > > > > > > > > wrong. > > > > > > > > > > I'm also interested whether this LSP port with > > > > > > > > > > unknown > > > > > > > > > > addresses will > > > > > > > > > > get all the broadcast traffic? > > > > > > > > > > > > > > > > > > > > @Martin, didn't you look for ARP behavior? If we > > > > > > > > > > have one LS > > > > > > > > > > with three > > > > > > > > > > LSPs: > > > > > > > > > > - 1 router type LSP with MAC 00:00:00:00:00:01 and > > > > > > > > > > IP > > > > > > > > > > 169.254.0.1/30 > > > > > > > > > > - one "vif" LSP with "unknown" addresses set and > > > > > > > > > > configured > > > > > > > > > > underlying > > > > > > > > > > OVS port with same MAC and IP set > > > > > > > > > > - one localnet port for external connectivity. > > > > > > > > > > > > > > > > > > > > If some party from localnet side sends broadcast > > > > > > > > > > ARP request > > > > > > > > > > "Who has > > > > > > > > > > 169.254.0.1? Answer <...>", will this ARP request > > > > > > > > > > got flooded > > > > > > > > > > to both > > > > > > > > > > LRP and LSP with "unknown" addresses? If yes, will > > > > > > > > > > the sender > > > > > > > > > > receive 2 > > > > > > > > > > ARP replies? > > > > > > > > > > > > > > > > > > > > > > +} > > > > > > > > > > > > + > > > > > > > > > > > > +static void > > > > > > > > > > > > +build_lrouter_bgp_redirect( > > > > > > > > > > > > + struct ovn_port *op, struct > > > > > > > > > > > > lflow_table *lflows, > > > > > > > > > > > > + struct ds *match, struct ds *actions) > > > > > > > > > > > > +{ > > > > > > > > > > > > + /* LRP has to have a peer.*/ > > > > > > > > > > > > + if (op->peer == NULL) { > > > > > > > > > > > > + return; > > > > > > > > > > > > + } > > > > > > > > > > > > + /* LRP has to have NB record.*/ > > > > > > > > > > > > + if (op->nbrp == NULL) { > > > > > > > > > > > > + return; > > > > > > > > > > > > + } > > > > > > > > > > > > + > > > > > > > > > > > > + /* Proceed only for LRPs that have 'bgp- > > > > > > > > > > > > redirect' > > > > > > > > > > > > option set. Value of this > > > > > > > > > > > > + * option is the name of LSP to which a > > > > > > > > > > > > BGP traffic > > > > > > > > > > > > will be redirected. > > > > > > > > > > > > + */ > > > > > > > > > > > > + const char *bgp_port = smap_get(&op->nbrp- > > > > > > > > > > > > >options, > > > > > > > > > > > > "bgp-redirect"); > > > > > > > > > > > > + if (bgp_port == NULL) { > > > > > > > > > > > > + return; > > > > > > > > > > > > + } > > > > > > > > > > > > + > > > > > > > > > > > > + /* Ensure that LSP, to which the BGP > > > > > > > > > > > > traffic is > > > > > > > > > > > > redirected, exists.*/ > > > > > > > > > > > > + struct ovn_port *peer_lsp; > > > > > > > > > > > > + bool bgp_port_exists = false; > > > > > > > > > > > > + HMAP_FOR_EACH (peer_lsp, dp_node, &op- > > > > > > > > > > > > >peer->od- > > > > > > > > > > > > > ports) { > > > > > > > > > > > > + size_t peer_lsp_s = strlen(peer_lsp- > > > > > > > > > > > > >key); > > > > > > > > > > > > + if (peer_lsp_s == strlen(bgp_port) > > > > > > > > > > > > + && !strncmp(peer_lsp->key, > > > > > > > > > > > > bgp_port, > > > > > > > > > > > > peer_lsp_s)){ > > > > > > > > > > > > + bgp_port_exists = true; > > > > > > > > > > > > + break; > > > > > > > > > > > > + } > > > > > > > > > > > > + } > > > > > > > > > > > > + > > > > > > > > > > > > + if (!bgp_port_exists) { > > > > > > > > > > > > + static struct vlog_rate_limit rl = > > > > > > > > > > > > VLOG_RATE_LIMIT_INIT(1, 5); > > > > > > > > > > > > + VLOG_WARN_RL(&rl, "Option 'bgp- > > > > > > > > > > > > redirect' set on > > > > > > > > > > > > Logical Router Port " > > > > > > > > > > > > + "'%s' refers to non- > > > > > > > > > > > > existent > > > > > > > > > > > > Logical Switch Port. " > > > > > > > > > > > > + "BGP redirecting > > > > > > > > > > > > won't be > > > > > > > > > > > > configured.", > > > > > > > > > > > > + op->key); > > > > > > > > > > > > + return; > > > > > > > > > > > > + } > > > > > > > > > > > > + > > > > > > > > > > > > + if (op->cr_port != NULL) { > > > > > > > > > > > > + static struct vlog_rate_limit rl = > > > > > > > > > > > > VLOG_RATE_LIMIT_INIT(1, 5); > > > > > > > > > > > > + VLOG_WARN_RL(&rl, "Option 'bgp- > > > > > > > > > > > > redirect' is not > > > > > > > > > > > > supported on" > > > > > > > > > > > > + " Distributed > > > > > > > > > > > > Gateway Port > > > > > > > > > > > > '%s'", op->key); > > > > > > > > > > > > + return; > > > > > > > > > > > > + } > > > > > > > > > > > > + > > > > > > > > > > > > + /* Mirror traffic destined for LRP's IPs > > > > > > > > > > > > and default > > > > > > > > > > > > BGP port > > > > > > > > > > > > + * to the port defined in 'bgp-redirect' > > > > > > > > > > > > option. > > > > > > > > > > > > + */ > > > > > > > > > > > > + for (size_t i = 0; i < op- > > > > > > > > > > > > >lrp_networks.n_ipv4_addrs; > > > > > > > > > > > > i++) { > > > > > > > > > > > > + const char *ip_s = op- > > > > > > > > > > > > > lrp_networks.ipv4_addrs[i].addr_s; > > > > > > > > > > > > + build_bgp_redirect_rule__(ip_s, > > > > > > > > > > > > bgp_port, false, > > > > > > > > > > > > op->peer, lflows, > > > > > > > > > > > > + match, > > > > > > > > > > > > actions); > > > > > > > > > > > > + } > > > > > I'm thinking more and more that having a "match" expression > > > > > in a > > > > > "NB.Logical_Router_Port.options:redirect-match" option it a > > > > > good choice. > > > > > > > > > Would the match be enforced only for redirected traffic (from > > > > outside to > > > > bgp target LSP)? > > > I guess I didn't get your question, sorry. > > > > I was thinking of the case Martin brought up yesterday, with the > > session > > initiated from inside. > > > > If we have the following config (not sure if the option names will > > change but I hope it's clear): > > > > LRP { > > .name = lrp > > .networks = [{00:00:00:00:00:01, 1.1.1.1/24}] > > .options = { > > .redirect-port = 'bgp-daemon' > > .redirect-match = 'tcp.dst == 179 || tcp.src == 179' > > } > > } > > > > Then, in the switch pipeline in the L2_LOOKUP stage, we'll only > > force > > the outport to be 'bgp-daemon' if "destination IP == 1.1.1.1 && > > 'redirect-match'" is true. So, only traffic coming from "outside" > > towards the router IP and that also matches the filter. > > > > No additional restrictions. > > > > Will this be enough though? > > > > What if 'bgp-daemon' ARPs for 1.1.1.2 (an external host)? How does > > that > > work? ARP reply is unicast and will have dmac that of the LRP. > > > > We could include ARP and ND in the match too but then what about > > ARP > > replies that actually should go to the router port? Is there a way to deliver the ARP replies to both ports? iirc, for example, ct_commit forks pipelines. (I'm not saying to use ct_commit, but apply similar approach??) > > > > Also, what about other control traffic, e.g.: > > a. ICMP replies from external hosts for ICMP echo initiated by bgp- > daemon > > vs > > b. ICMP replies from external hosts for ICMP echo initiated by > something > else (a different VIF) behind the LR that got SNAT-ed to the LRP IP. > > > I'm actually thinking more and more that we shouldn't redirect any > > traffic like this, we should just bind the VIF directly to the LRP. > > > > > > > LRP can have multiple IP addresses and enabling redirect for > > > > > all of them > > > > > could be implicit for the user and may contain security > > > > > concerns. As an > > > > > example: LR can have configured LB with tcp/179. If user > > > > > configured > > > > > peering IP-address (which can be in a link-local address > > > > > range) and also > > > > > added an IP address for Load Balancer in a `networks` list of > > > > > a LRP. > > > > > Enabling BGP redirect for all IPs of LRP will break load > > > > > balancing for > > > > > tcp/179. While it is not needed for load balancers to assign > > > > > their VIP > > > > > to networks, there is no validations, which prohibit doing > > > > > so. It is > > > > I think this is actually something that ovn-kubernetes does in > > > > their > > > > deployments for cluster-IP services. And I suspect that today > > > > there's > > > > nothing preventing users from creating cluster-IP services for > > > > TCP/179. Would it be useful to redirect only traffic for LRP's IPv6 LLA? @Vladislav, in your setups, do you use ipv4 or ipv6 LLAs for setting up BGP peering? > > > > > > So, this proofs that this is a possible problem if enabled > > > redirect and > > > if load balancer's IP is configured to be in networks list. > > > > > > In any case this doesn't prevent from current implementation to > > > merged > > > as "experimental". I'm just started discussion about a GA > > > solution. > > > > > > > > still possible and can be difficult to catch such behavior > > > > > for > > > > > redirected traffic with Load Balancers. > > > > > > > > > > > > > > > > > + for (size_t i = 0; i < op- > > > > > > > > > > > > >lrp_networks.n_ipv6_addrs; > > > > > > > > > > > > i++) { > > > > > > > > > > > > + const char *ip_s = op- > > > > > > > > > > > > > lrp_networks.ipv6_addrs[i].addr_s; > > > > > > > > > > > > + build_bgp_redirect_rule__(ip_s, > > > > > > > > > > > > bgp_port, true, > > > > > > > > > > > > op->peer, lflows, > > > > > > > > > > > > + match, > > > > > > > > > > > > actions); > > > > > > > > > > > > + } > > > > > > > > > > > > +} > > > > > > > > > > > > + > > > > > > > > > > > > /* This function adds ARP resolve flows > > > > > > > > > > > > related to a LSP. > > > > > > > > > > > > */ > > > > > > > > > > > > static void > > > > > > > > > > > > build_arp_resolve_flows_for_lsp( > > > > > > > > > > > > @@ -16003,6 +16110,7 @@ > > > > > > > > > > > > build_lswitch_and_lrouter_iterate_by_lrp(struct > > > > > > > > > > > > ovn_port > > > > > > > > > > > > *op, > > > > > > > > > > > > lsi- > > > > > > > > > > > > >meter_groups, op- > > > > > > > > > > > > > lflow_ref); > > > > > > > > > > > > > > > > > > > > > > > > build_lrouter_icmp_packet_toobig_admin_flows(op > > > > > > > > > > > > , lsi- > > > > > > > > > > > > > lflows, &lsi->match, > > > > > > > > > > > > > > > > > > > > > > > > &lsi- > > > > > > > > > > > > > actions, op->lflow_ref); > > > > > > > > > > > > + build_lrouter_bgp_redirect(op, lsi- > > > > > > > > > > > > >lflows, &lsi- > > > > > > > > > > > > > match, &lsi->actions); > > > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > > > static void * > > > > > > > > > > > > diff --git a/northd/ovn-northd.8.xml > > > > > > > > > > > > b/northd/ovn- > > > > > > > > > > > > northd.8.xml > > > > > > > > > > > > index b06b09ac5..4cbdf8f74 100644 > > > > > > > > > > > > --- a/northd/ovn-northd.8.xml > > > > > > > > > > > > +++ b/northd/ovn-northd.8.xml > > > > > > > > > > > > @@ -284,6 +284,21 @@ > > > > > > > > > > > > dropped in the next stage. > > > > > > > > > > > > </li> > > > > > > > > > > > > > > > > > > > > > > > > + <li> > > > > > > > > > > > > + For each logical port that's defined > > > > > > > > > > > > as a target > > > > > > > > > > > > of BGP redirecting > > > > > > > > > > > > + (via <code>bgp-redirect</code> option > > > > > > > > > > > > set on > > > > > > > > > > > > Logical Router Port), a > > > > > > > > > > > > + filter is set in place that disallows > > > > > > > > > > > > any traffic > > > > > > > > > > > > entering this port > > > > > > > > > > > > + that does not originate from Logical > > > > > > > > > > > > Router Port's > > > > > > > > > > > > IPs and default BGP > > > > > > > > > > > > + port (TCP 179). This filtering is > > > > > > > > > > > > achieved by two > > > > > > > > > > > > rules. First rule has > > > > > > > > > > > > + priority 81, it matches on > > > > > > > > > > > > <code>inport == > > > > > > > > > > > > <var>BGP_MIRROR_PORT</var> > > > > > > > > > > > > + && ip.src == <var>LRP_IP</var> > > > > > > > > > > > > && > > > > > > > > > > > > tcp.src == 179</code> > > > > > > > > > > > > + and allows traffic further with > > > > > > > > > > > > <code>REGBIT_PORT_SEC_DROP" = > > > > > > > > > > > > + check_in_port_sec(); next;</code>. > > > > > > > > > > > > Second rule > > > > > > > > > > > > with priority 80 matches > > > > > > > > > > > > + the rest of the traffic from that port > > > > > > > > > > > > and sets > > > > > > > > > > > > + <code>REGBIT_PORT_SEC_DROP" = 1; > > > > > > > > > > > > next;"</code> so > > > > > > > > > > > > that the packets are > > > > > > > > > > > > + dropped in the next stage. > > > > > > > > > > > > + </li> > > > > > > > > > > > > + > > > > > > > > > > > > <li> > > > > > > > > > > > > For each (enabled) vtep logical port, > > > > > > > > > > > > a priority > > > > > > > > > > > > 70 flow is added which > > > > > > > > > > > > matches on all packets and applies the > > > > > > > > > > > > action > > > > > > > > > > > > @@ -1949,6 +1964,14 @@ output; > > > > > > > > > > > > on the logical switch. > > > > > > > > > > > > </li> > > > > > > > > > > > > > > > > > > > > > > > > + <li> > > > > > > > > > > > > + For any logical port that's defined as > > > > > > > > > > > > a target of > > > > > > > > > > > > BGP redirecting (via > > > > > > > > > > > > + <code>bgp-redirect</code> option set > > > > > > > > > > > > on Logical > > > > > > > > > > > > Router Port), a > > > > > > > > > > > > + priority-100 flow is added that > > > > > > > > > > > > redirects traffic > > > > > > > > > > > > for Logical Router > > > > > > > > > > > > + Port IPs (including IPv6 LLA) and TCP > > > > > > > > > > > > port 179, to > > > > > > > > > > > > the targeted > > > > > > > > > > > > + logical port. > > > > > > > > > > > > + </li> > > > > > > > > > > > > + > > > > > > > > > > > > <li> > > > > > > > > > > > > Priority-90 flows for transit switches > > > > > > > > > > > > that > > > > > > > > > > > > forward registered > > > > > > > > > > > > IP multicast traffic to their > > > > > > > > > > > > corresponding > > > > > > > > > > > > multicast group , which > > > > > > > > > > > > diff --git a/ovn-nb.xml b/ovn-nb.xml > > > > > > > > > > > > index 9552534f6..2d2f5c9e0 100644 > > > > > > > > > > > > --- a/ovn-nb.xml > > > > > > > > > > > > +++ b/ovn-nb.xml > > > > > > > > > > > > @@ -3440,6 +3440,20 @@ or > > > > > > > > > > > > </p> > > > > > > > > > > > > </column> > > > > > > > > > > > > > > > > > > > > > > > > + <column name="options" key="bgp- > > > > > > > > > > > > redirect" > > > > > > > > > > > > type='{"type": "string"}'> > > > > > > > > > > > > + <p> > > > > > > > > > > > > + This option expects a name of a > > > > > > > > > > > > Logical Switch > > > > > > > > > > > > Port that's present > > > > > > > > > > > > + in the peer's Logical Switch. If > > > > > > > > > > > > set, it causes > > > > > > > > > > > > any traffic > > > > > > > > > > > > + that's destined for Logical Router > > > > > > > > > > > > Port's IP > > > > > > > > > > > > addresses (including > > > > > > > > > > > > + its IPv6 LLA) and the default BGP > > > > > > > > > > > > port (TCP > > > > > > > > > > > > 179), to be redirected > > > > > > > > > > > > + to the specified Logical Switch > > > > > > > > > > > > Port. > > > > > > > > > > > > + > > > > > > > > > > > > + This allows external BGP daemon to > > > > > > > > > > > > be bound to a > > > > > > > > > > > > port in OVN's > > > > > > > > > > > > + Logical Switch and act as if it was > > > > > > > > > > > > listening on > > > > > > > > > > > > Logical Router > > > > > > > > > > > > + Port's IP address. > > > > > > > > > > > > + </p> > > > > > > > > > > > > + </column> > > > > > > > > > > > > + > > > > > > > > > > > > <column name="options" > > > > > > > > > > > > key="gateway_mtu_bypass"> > > > > > > > > > > > > <p> > > > > > > > > > > > > When configured, represents a match > > > > > > > > > > > > expression, > > > > > > > > > > > > in the same > > > > > > > > > > > > diff --git a/tests/ovn-northd.at b/tests/ovn- > > > > > > > > > > > > northd.at > > > > > > > > > > > > index a389d1988..df1e924ab 100644 > > > > > > > > > > > > --- a/tests/ovn-northd.at > > > > > > > > > > > > +++ b/tests/ovn-northd.at > > > > > > > > > > > > @@ -12721,3 +12721,61 @@ AT_CHECK([ovn-sbctl > > > > > > > > > > > > dump-flows lr > > > > > > > > > > > > > grep lr_in_dnat | ovn_strip_lflows], [0], [d > > > > > > > > > > > > > > > > > > > > > > > > AT_CLEANUP > > > > > > > > > > > > ]) > > > > > > > > > > > > + > > > > > > > > > > > > +OVN_FOR_EACH_NORTHD_NO_HV([ > > > > > > > > > > > > +AT_SETUP([BGP control plane redirect]) > > > > > > > > > > > > +ovn_start > > > > > > > > > > > > + > > > > > > > > > > > > +check ovn-sbctl chassis-add hv1 geneve > > > > > > > > > > > > 127.0.0.1 > > > > > > > > > > > > + > > > > > > > > > > > > +check ovn-nbctl lr-add lr -- \ > > > > > > > > > > > > + lrp-add lr lr-ls 02:ac:10:01:00:01 > > > > > > > > > > > > 172.16.1.1/24 > > > > > > > > > > > > +check ovn-nbctl --wait=sb set logical_router > > > > > > > > > > > > lr > > > > > > > > > > > > options:chassis=hv1 > > > > > > > > > > > > + > > > > > > > > > > > > +check ovn-nbctl ls-add ls -- \ > > > > > > > > > > > > + lsp-add ls ls-lr -- \ > > > > > > > > > > > > + lsp-set-type ls-lr router -- \ > > > > > > > > > > > > + lsp-set-addresses ls-lr router -- \ > > > > > > > > > > > > + lsp-set-options ls-lr router-port=lr-ls > > > > > > > > > > > > + > > > > > > > > > > > > +check ovn-nbctl lsp-add ls lsp-bgp -- \ > > > > > > > > > > > > + lsp-set-addresses lsp-bgp unknown > > > > > > > > > > > > + > > > > > > > > > > > > +# Function that ensures that no BGP redirect > > > > > > > > > > > > ruels are > > > > > > > > > > > > installed. > > > > > > > > > > > > +check_no_bgp_redirect() { > > > > > > > > > > > > + AT_CHECK([ovn-sbctl dump-flows ls | grep > > > > > > > > > > > > ls_in_l2_lkup | grep "tcp.dst == 179" | wc - > > > > > > > > > > > > l], [0], [0 > > > > > > > > > > > > +]) > > > > > > > > > > > > + > > > > > > > > > > > > + AT_CHECK([ovn-sbctl dump-flows ls | grep > > > > > > > > > > > > ls_in_check_port_sec | grep -E > > > > > > > > > > > > "priority=80|priority=81" | > > > > > > > > > > > > wc -l], [0], [0 > > > > > > > > > > > > +]) > > > > > > > > > > > > +} > > > > > > > > > > > > + > > > > > > > > > > > > +# By default, no rules related to BGP redirect > > > > > > > > > > > > are present > > > > > > > > > > > > +check_no_bgp_redirect > > > > > > > > > > > > + > > > > > > > > > > > > +# Set "lsp-bgp" port as target of BGP control > > > > > > > > > > > > plane > > > > > > > > > > > > redirected traffic > > > > > > > > > > > > +check ovn-nbctl --wait=sb set > > > > > > > > > > > > logical_router_port lr-ls > > > > > > > > > > > > options:bgp-redirect=lsp-bgp > > > > > > > > > > > > + > > > > > > > > > > > > +# Check that BGP control plane traffic is > > > > > > > > > > > > redirected to > > > > > > > > > > > > "lsp-bgp" > > > > > > > > > > > > +AT_CHECK([ovn-sbctl dump-flows ls | grep > > > > > > > > > > > > ls_in_l2_lkup | > > > > > > > > > > > > grep "tcp.dst == 179" | ovn_strip_lflows], [0], > > > > > > > > > > > > [dnl > > > > > > > > > > > > + table=??(ls_in_l2_lkup ), priority=100 > > > > > > > > > > > > , > > > > > > > > > > > > match=(ip4.dst == 172.16.1.1 && tcp.dst == > > > > > > > > > > > > 179), > > > > > > > > > > > > action=(outport = "lsp-bgp"; output;) > > > > > > > > > > > > + table=??(ls_in_l2_lkup ), priority=100 > > > > > > > > > > > > , > > > > > > > > > > > > match=(ip6.dst == fe80::ac:10ff:fe01:1 && > > > > > > > > > > > > tcp.dst == 179), > > > > > > > > > > > > action=(outport = "lsp-bgp"; output;) > > > > > > > > > > > > +]) > > > > > > > > > > > > + > > > > > > > > > > > > +# Check that only BGP-related traffic is > > > > > > > > > > > > accepted on "lsp- > > > > > > > > > > > > bgp" port > > > > > > > > > > > > +AT_CHECK([ovn-sbctl dump-flows ls | grep > > > > > > > > > > > > ls_in_check_port_sec | grep -E > > > > > > > > > > > > "priority=80|priority=81" | > > > > > > > > > > > > ovn_strip_lflows], [0], [dnl > > > > > > > > > > > > + table=??(ls_in_check_port_sec), > > > > > > > > > > > > priority=80 , > > > > > > > > > > > > match=(inport == "lsp-bgp"), action=(reg0[[15]] > > > > > > > > > > > > = 1; next;) > > > > > > > > > > > > + table=??(ls_in_check_port_sec), > > > > > > > > > > > > priority=81 , > > > > > > > > > > > > match=(inport == "lsp-bgp" && ip4.src == > > > > > > > > > > > > 172.16.1.1 && > > > > > > > > > > > > tcp.src == 179), action=(reg0[[15]] = > > > > > > > > > > > > check_in_port_sec(); > > > > > > > > > > > > next;) > > > > > > > > > > > > + table=??(ls_in_check_port_sec), > > > > > > > > > > > > priority=81 , > > > > > > > > > > > > match=(inport == "lsp-bgp" && ip6.src == > > > > > > > > > > > > fe80::ac:10ff:fe01:1 && tcp.src == 179), > > > > > > > > > > > > action=(reg0[[15]] > > > > > > > > > > > > = check_in_port_sec(); next;) > > > > > > > > > > > > +]) > > > > > > > > > > > > + > > > > > > > > > > > > +# Remove 'bgp-redirect' option from LRP and > > > > > > > > > > > > check that > > > > > > > > > > > > rules are removed > > > > > > > > > > > > +check ovn-nbctl --wait=sb remove > > > > > > > > > > > > logical_router_port lr-ls > > > > > > > > > > > > options bgp-redirect > > > > > > > > > > > > +check_no_bgp_redirect > > > > > > > > > > > > + > > > > > > > > > > > > +# Set non-existent LSP as target of 'bgp- > > > > > > > > > > > > redirect' and > > > > > > > > > > > > check that no rules are added > > > > > > > > > > > > +check ovn-nbctl --wait=sb set > > > > > > > > > > > > logical_router_port lr-ls > > > > > > > > > > > > options:bgp-redirect=lsp-foo > > > > > > > > > > > > +check_no_bgp_redirect > > > > > > > > > > > > + > > > > > > > > > > > > +AT_CLEANUP > > > > > > > > > > > > +]) > > > > > > > > > > > > diff --git a/tests/system-ovn.at > > > > > > > > > > > > b/tests/system-ovn.at > > > > > > > > > > > > index c24ede7c5..c09c1136e 100644 > > > > > > > > > > > > --- a/tests/system-ovn.at > > > > > > > > > > > > +++ b/tests/system-ovn.at > > > > > > > > > > > > @@ -13027,3 +13027,89 @@ > > > > > > > > > > > > OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query > > > > > > > > > > > > port patch- > > > > > > > > > > > > .*/d > > > > > > > > > > > > /connection dropped.*/d"]) > > > > > > > > > > > > AT_CLEANUP > > > > > > > > > > > > ]) > > > > > > > > > > > > + > > > > > > > > > > > > +OVN_FOR_EACH_NORTHD([ > > > > > > > > > > > > +AT_SETUP([BGP control plane redirect]) > > > > > > > > > > > Nit: probably best to add AT_SKIP_IF([test > > > > > > > > > > > $HAVE_NC = no]) > > > > > > > > > > > here. > > > > > > > > > > > > > > > > > > > > > > > +ovn_start > > > > > > > > > > > > +OVS_TRAFFIC_VSWITCHD_START() > > > > > > > > > > > > + > > > > > > > > > > > > +ADD_BR([br-int]) > > > > > > > > > > > > +ADD_BR([br-ext]) > > > > > > > > > > > > + > > > > > > > > > > > > +ovs-ofctl add-flow br-ext action=normal > > > > > > > > > > > > +# Set external-ids in br-int needed for ovn- > > > > > > > > > > > > controller > > > > > > > > > > > > +ovs-vsctl \ > > > > > > > > > > > > + -- set Open_vSwitch . external- > > > > > > > > > > > > ids:system-id=hv1 \ > > > > > > > > > > > > + -- set Open_vSwitch . external- > > > > > > > > > > > > ids:ovn- > > > > > > > > > > > > remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \ > > > > > > > > > > > > + -- set Open_vSwitch . external- > > > > > > > > > > > > ids:ovn-encap- > > > > > > > > > > > > type=geneve \ > > > > > > > > > > > > + -- set Open_vSwitch . external- > > > > > > > > > > > > ids:ovn-encap- > > > > > > > > > > > > ip=169.0.0.1 \ > > > > > > > > > > > > + -- set bridge br-int fail-mode=secure > > > > > > > > > > > > other- > > > > > > > > > > > > config:disable-in-band=true > > > > > > > > > > > Most ov*-ctl commands should be prefixed with > > > > > > > > > > > "check" in this > > > > > > > > > > > test. > > > > > > > > > > > > > > > > > > > > > > > + > > > > > > > > > > > > +# Start ovn-controller > > > > > > > > > > > > +start_daemon ovn-controller > > > > > > > > > > > > + > > > > > > > > > > > > +ovn-nbctl lr-add R1 \ > > > > > > > > > > > > + -- set Logical_Router R1 > > > > > > > > > > > > options:chassis=hv1 > > > > > > > > > > > > + > > > > > > > > > > > > +ovn-nbctl ls-add public > > > > > > > > > > > > + > > > > > > > > > > > > +ovn-nbctl lrp-add R1 rp-public > > > > > > > > > > > > 00:00:02:01:02:03 > > > > > > > > > > > > 172.16.1.1/24 > > > > > > > > > > > > + > > > > > > > > > > > > +ovn-nbctl lsp-add public public-rp -- set > > > > > > > > > > > > Logical_Switch_Port public-rp \ > > > > > > > > > > > > + type=router options:router-port=rp-public > > > > > > > > > > > > \ > > > > > > > > > > > > + -- lsp-set-addresses public-rp router > > > > > > > > > > > > + > > > > > > > > > > > > +ovn-nbctl lsp-add public bgp-daemon \ > > > > > > > > > > > > + -- lsp-set-addresses bgp-daemon unknown > > > > > > > > > > > > + > > > > > > > > > > > > +AT_CHECK([ovs-vsctl set Open_vSwitch . > > > > > > > > > > > > external-ids:ovn- > > > > > > > > > > > > bridge-mappings=phynet:br-ext]) > > > > > > > > > > > > +ovn-nbctl lsp-add public public1 \ > > > > > > > > > > > > + -- lsp-set-addresses public1 unknown \ > > > > > > > > > > > > + -- lsp-set-type public1 localnet \ > > > > > > > > > > > > + -- lsp-set-options public1 > > > > > > > > > > > > network_name=phynet > > > > > > > > > > > > + > > > > > > > > > > > > +ovn-nbctl --wait=hv sync > > > > > > > > > > > > + > > > > > > > > > > > > +# Set option that redirects BGP control plane > > > > > > > > > > > > traffic to a > > > > > > > > > > > > LSP "bgp-daemon" > > > > > > > > > > > > +ovn-nbctl --wait=sb set logical_router_port > > > > > > > > > > > > rp-public > > > > > > > > > > > > options:bgp-redirect=bgp-daemon > > > > > > > > > > > > + > > > > > > > > > > > > +# Create "bgp-daemon" interface in a namespace > > > > > > > > > > > > with IP and > > > > > > > > > > > > MAC matching LRP "rp-public" > > > > > > > > > > > > +ADD_NAMESPACES(bgp-daemon) > > > > > > > > > > > > +ADD_VETH(bgp-daemon, bgp-daemon, br-int, > > > > > > > > > > > > "172.16.1.1/24", > > > > > > > > > > > > "00:00:02:01:02:03") > > > > > > > > > > > > + > > > > > > > > > > > > +ADD_NAMESPACES(ext-foo) > > > > > > > > > > > > +ADD_VETH(ext-foo, ext-foo, br-ext, > > > > > > > > > > > > "172.16.1.100/24", > > > > > > > > > > > > "00:10:10:01:02:13", \ > > > > > > > > > > > > + "172.16.1.1") > > > > > > > > > > > > + > > > > > > > > > > > > +# Flip the interface down/up to get proper > > > > > > > > > > > > IPv6 LLA > > > > > > > > > > > > +NS_EXEC([bgp-daemon], [ip link set down bgp- > > > > > > > > > > > > daemon]) > > > > > > > > > > > > +NS_EXEC([bgp-daemon], [ip link set up bgp- > > > > > > > > > > > > daemon]) > > > > > > > > > > > > + > > > > > > > > > > > > +# Wait until IPv6 LLA loses the "tentative" > > > > > > > > > > > > flag otherwise > > > > > > > > > > > > it can't be bound to. > > > > > > > > > > > > +OVS_WAIT_UNTIL([NS_EXEC([bgp-daemon], [ip a > > > > > > > > > > > > show dev bgp- > > > > > > > > > > > > daemon | grep "fe80::" | grep -v tentative])]) > > > > > > > > > > > > +OVS_WAIT_UNTIL([NS_EXEC([ext-foo], [ip a show > > > > > > > > > > > > dev ext-foo > > > > > > > > > > > > > grep "fe80::" | grep -v tentative])]) > > > > > > > > > > > > + > > > > > > > > > > > > +# Verify that BGP control plane traffic is > > > > > > > > > > > > delivered to > > > > > > > > > > > > the "bgp-daemon" > > > > > > > > > > > > +# interface on both IPv4 and IPv6 LLA > > > > > > > > > > > > addresses > > > > > > > > > > > > +NETNS_DAEMONIZE([bgp-daemon], [nc -l -k > > > > > > > > > > > > 172.16.1.1 179], > > > > > > > > > > > > [bgp_v4.pid]) > > > > > > > > > > > > +NS_CHECK_EXEC([ext-foo], [echo "TCP test" | nc > > > > > > > > > > > > --send-only > > > > > > > > > > > > 172.16.1.1 179]) > > > > > > > > > > > > + > > > > > > > > > > > > +NETNS_DAEMONIZE([bgp-daemon], [nc -l -6 -k > > > > > > > > > > > > fe80::200:2ff:fe01:203%bgp-daemon 179], > > > > > > > > > > > > [bgp_v6.pid]) > > > > > > > > > > > > +NS_CHECK_EXEC([ext-foo], [echo "TCP test" | nc > > > > > > > > > > > > --send-only > > > > > > > > > > > > -6 fe80::200:2ff:fe01:203%ext-foo 179]) > > > > > > > > > > > > + > > > > > > > > > > > > +OVS_APP_EXIT_AND_WAIT([ovn-controller]) > > > > > > > > > > > > + > > > > > > > > > > > > +as ovn-sb > > > > > > > > > > > > +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) > > > > > > > > > > > > + > > > > > > > > > > > > +as ovn-nb > > > > > > > > > > > > +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) > > > > > > > > > > > > + > > > > > > > > > > > > +as northd > > > > > > > > > > > > +OVS_APP_EXIT_AND_WAIT([ovn-northd]) > > > > > > > > > > > > + > > > > > > > > > > > > +as > > > > > > > > > > > > +OVS_TRAFFIC_VSWITCHD_STOP(["/.*error > > > > > > > > > > > > receiving.*/d > > > > > > > > > > > > +/.*terminating with signal 15.*/d"]) > > > > > > > > > > > > +AT_CLEANUP > > > > > > > > > > > > +]) > > > > > > > > > > > Regards, > > > > > > > > > > > Dumitru > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > > > dev mailing list > > > > > > > > > dev@openvswitch.org > > > > > > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > > > > > _______________________________________________ > > > > > > dev mailing list > > > > > > dev@openvswitch.org > > > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > > > _______________________________________________ > > > > dev mailing list > > > > dev@openvswitch.org > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >
On 8/8/24 14:31, martin.kalcok@canonical.com wrote: > On Thu, 2024-08-08 at 14:18 +0200, Dumitru Ceara wrote: >> On 8/8/24 14:04, Dumitru Ceara wrote: >>> On 8/8/24 13:51, Vladislav Odintsov wrote: >>>> >>>> On 08.08.2024 18:45, Dumitru Ceara wrote: >>>>> On 8/8/24 11:14, Vladislav Odintsov wrote: >>>>>> On 08.08.2024 15:51, martin.kalcok@canonical.com wrote: >>>>>>> On Thu, 2024-08-08 at 10:46 +0200, Dumitru Ceara wrote: >>>>>>>> On 8/8/24 10:42, Vladislav Odintsov wrote: >>>>>>>>> On 08.08.2024 03:16, Dumitru Ceara wrote: >>>>>>>>>> Re-adding the dev list. >>>>>>>>>> >>>>>>>>>> On 8/7/24 18:12, Vladislav Odintsov wrote: >>>>>>>>>>> Hi Dumitru, >>>>>>>>>>> >>>>>>>>>> Hi Vladislav, >>>>>>>>>> >>>>>>>>>>> I'd like to add some thoughts to your inputs. >>>>>>>>>>> >>>>>>>>>> Thanks for that, I added some more of my own below. >>>>>>>>>> :) >>>>>>>>>> >>>>>>>>>>> On 06.08.2024 19:23, Dumitru Ceara wrote: >>>>>>>>>>>> Hi Martin, >>>>>>>>>>>> >>>>>>>>>>>> Sorry, I was reviewing V3 but I was slow in >>>>>>>>>>>> actually sending >>>>>>>>>>>> out the email. >>>>>>>>>>>> >>>>>>>>>>>> On 8/6/24 13:17, Martin Kalcok wrote: >>>>>>>>>>>>> This change adds a 'bgp-redirect' option to LRP >>>>>>>>>>>>> that allows >>>>>>>>>>>>> redirecting of BGP control plane traffic to an >>>>>>>>>>>>> arbitrary >>>>>>>>>>>>> LSP >>>>>>>>>>>>> in its peer LS. >>>>>>>>>>>>> >>>>>>>>>>>>> The option expects a string with a LSP name. >>>>>>>>>>>>> When set, >>>>>>>>>>>>> any traffic entering LS that's destined for any >>>>>>>>>>>>> of the >>>>>>>>>>>>> LRP's IP addresses (including IPv6 LLA) is >>>>>>>>>>>>> redirected >>>>>>>>>>>>> to the LSP specified in the option's value. >>>>>>>>>>>>> >>>>>>>>>>>>> This enables external BGP daemons to listen on >>>>>>>>>>>>> an interface >>>>>>>>>>>>> bound to a LSP and effectively act as if they >>>>>>>>>>>>> were >>>>>>>>>>>>> listening >>>>>>>>>>>>> on (and speaking from) LRP's IP address. >>>>>>>>>>>>> >>>>>>>>>>>>> Signed-off-by: Martin Kalcok >>>>>>>>>>>>> <martin.kalcok@canonical.com> >>>>>>>>>>>>> --- >>>>>>>>>>>> Strictly on this version of the patch, and with >>>>>>>>>>>> the thought >>>>>>>>>>>> in mind that >>>>>>>>>>>> we might want to consider this feature >>>>>>>>>>>> experimental [0] and >>>>>>>>>>>> maybe change >>>>>>>>>>>> it (NB too) in the future, I left a few comments >>>>>>>>>>>> inline. >>>>>>>>>>>> With those >>>>>>>>>>>> addressed I think the patch looks OK to me. >>>>>>>>>>>> >>>>>>>>>>>> [0] >>>>>>>>>>>> https://patchwork.ozlabs.org/project/ovn/patch/20240802155449.3528777-1-mmichels@redhat.com/ >>>>>>>>>>>> >>>>>>>>>>>> Now, for the point when we'd remove the >>>>>>>>>>>> "experimental" tag: >>>>>>>>>>>> >>>>>>>>>>>> In general, it makes me a bit uneasy that we have >>>>>>>>>>>> to share >>>>>>>>>>>> the MAC and >>>>>>>>>>>> IP between the LRP and the VIF of logical switch >>>>>>>>>>>> port that's >>>>>>>>>>>> connected >>>>>>>>>>>> to the same switch as the LRP. >>>>>>>>>>>> >>>>>>>>>>>> I was thinking of alternatives for the future and >>>>>>>>>>>> the only >>>>>>>>>>>> things I >>>>>>>>>>>> could come up with until now are: >>>>>>>>>>>> >>>>>>>>>>>> 1. Create a separate, "disconnected" logical >>>>>>>>>>>> switch: >>>>>>>>>>>> ovn-nbctl ls-add bgp >>>>>>>>>>>> >>>>>>>>>>>> Add the bgp-daemon port to it: >>>>>>>>>>>> ovn-nbctl lsp-add bgp bgp-daemon. >>>>>>>>>>>> >>>>>>>>>>>> Then we don't need "unknown" either, I think. >>>>>>>>>>>> >>>>>>>>>>>> But it's not possible today to move packets from >>>>>>>>>>>> the ingress >>>>>>>>>>>> pipeline of >>>>>>>>>>>> a logical switch ("public" in this test) to the >>>>>>>>>>>> egress >>>>>>>>>>>> pipeline of a >>>>>>>>>>>> different logical switch ("bgp" in my example). >>>>>>>>>>>> It also >>>>>>>>>>>> feels icky to >>>>>>>>>>>> implement that. >>>>>>>>>>>> >>>>>>>>>>>> 2. Add the ability to bind an OVS port directly >>>>>>>>>>>> to a logical >>>>>>>>>>>> router port: >>>>>>>>>>>> >>>>>>>>>>>> then we could do the same type of redirect you do >>>>>>>>>>>> here but >>>>>>>>>>>> instead in >>>>>>>>>>>> the logical router pipeline. The advantage is we >>>>>>>>>>>> don't have >>>>>>>>>>>> to drop any >>>>>>>>>>>> non-bgp traffic in the switch pipeline. The >>>>>>>>>>>> switch keeps >>>>>>>>>>>> functioning as >>>>>>>>>>>> it does today. >>>>>>>>>>>> >>>>>>>>>>>> Maybe an advantage of this second alternative >>>>>>>>>>>> would be that >>>>>>>>>>>> we can >>>>>>>>>>>> easily attach a filtering option to this LRP >>>>>>>>>>>> (e.g., >>>>>>>>>>>> LRP.options:control-traffic-filter) to allow us >>>>>>>>>>>> to be more >>>>>>>>>>>> flexible in >>>>>>>>>>>> what kind of traffic we forward to the actuall >>>>>>>>>>>> routing >>>>>>>>>>>> protocol daemon >>>>>>>>>>>> that runs behind that OVS port - Vladislav also >>>>>>>>>>>> mentioned >>>>>>>>>>>> during the >>>>>>>>>>>> meeting it might be interesting to forward BFD >>>>>>>>>>>> packets to the >>>>>>>>>>>> FRR (or >>>>>>>>>>>> whatever application implements the routing >>>>>>>>>>>> protocol) >>>>>>>>>>>> instance too. >>>>>>>>>>> The idea to be able to bind LRP to OVS port sounds >>>>>>>>>>> very >>>>>>>>>>> interesting to >>>>>>>>>>> me. But with a note that it's not a pure "bind", >>>>>>>>>>> but a partial: >>>>>>>>>>> as you >>>>>>>>>>> wrote with some "filter" applied to pass not all >>>>>>>>>>> the traffic. >>>>>>>>>>> The main >>>>>>>>>>> idea here is to pass only control plane traffic >>>>>>>>>>> destined to >>>>>>>>>>> LRP's >>>>>>>>>> As we're discussing on the other thread (Martin >>>>>>>>>> pointed it out) >>>>>>>>>> we also >>>>>>>>>> probably need to involve conntrack and allow replies >>>>>>>>>> to any kind >>>>>>>>>> of >>>>>>>>>> traffic initiated by the entity behind the LRP's VIF >>>>>>>>>> (e.g., BGP >>>>>>>>>> sessions >>>>>>>>>> initiated from the host). >>>>>>>>>> >>>>>>>>>>> addresses. Other than that seems odd. Transit >>>>>>>>>>> traffic should be >>>>>>>>>>> remained >>>>>>>>>>> in LR pipeline otherwise it will have no difference >>>>>>>>>>> with a >>>>>>>>>>> regular VIF LSP. >>>>>>>>>>> >>>>>>>>>> Definitely, traffic that needs to be DNATed >>>>>>>>>> (DNAT/unSNAT rules or >>>>>>>>>> LB >>>>>>>>>> rules that will change the destination IP from LRP IP >>>>>>>>>> to >>>>>>>>>> something else) >>>>>>>>>> should not be affected. All other "transit" traffic >>>>>>>>>> doesn't have >>>>>>>>>> LRP >>>>>>>>>> IP, does it? >>>>>>>>> You're right. >>>>>>>>> >>>>>>>>> @Dumitru, @Martin, what if we just redirect all traffic >>>>>>>>> destined to >>>>>>>>> LRP >>>>>>>>> IPs to redirect port? Is there any drawbacks? >>>>>>>>> For security it is possible to optionally use ACLs with >>>>>>>>> or without >>>>>>>>> conntrack. It's up to user/CMS. >>>>>>>>> >>>>>>>>> This seems quite simple in the code and very flexible. >>>>>>>>> So no >>>>>>>>> additional >>>>>>>>> flows seems to be needed in future to support any other >>>>>>>>> routing >>>>>>>>> protocols (or for another possible non-routing >>>>>>>>> usecases). >>>>>>>>> >>>>>>>> Won't this break all "transit" traffic that has >>>>>>>> destination IP the >>>>>>>> LRP >>>>>>>> IP (DNAT, LB), etc? >>>>>>>> >>>>>>> I'm afraid so. I guess I was just tired yesterday when I >>>>>>> made that >>>>>>> proposal for general redirect. Given that the redirect is >>>>>>> implemented >>>>>>> in the LS pipeline, it would eat up all the traffic for >>>>>>> LRP's IP before >>>>>>> the DNAT/unSNAT occurs in the LR pipeline. I'll give it a >>>>>>> quick test in >>>>>>> case I'm wrong here again. >>>>>> Oops, this will definitely break that cases. For some reason >>>>>> I thought >>>>>> only about routing cases. >>>>>> >>>>>>> If that's the case, I'll try the conntrack approach and >>>>>>> update you all >>>>>>> asap. >>>>>>> >>>>>>> Martin. > > I tried approach of redirecting traffic that matches: > > (ip.dst == <LRP_IP> && tcp.dst == <BGP>) || (ip.dst == <LRP_IP> && > tcp.src == <BGP>) > > This seems to solve bidirectional BGP sessions without the need for > redirecting all of the traffic, or involving conntrack. Though I see > some initial lag when session is initiated from our BGP daemon. I > suspect that this is something to do with ARP/ND resolution. I'm > looking into it. But to me this seems like a relatively easy to > implement and functioning approach (provided I find the reason for the > initial lag). > >>>>>>> >>>>>>>>>>> Having a filter (or match like in ACL or >>>>>>>>>>> Logical_Router_Policies) looks >>>>>>>>>>> more flexible in terms of used protocols. This can >>>>>>>>>>> coexist with >>>>>>>>>>> proposal >>>>>>>>>>> from current patch, where the flow is not >>>>>>>>>>> programmable from >>>>>>>>>>> user. >>>>>>>>>>> >>>>>>>>>> I think so too. >>>>>>>>>> Regards, >>>>>>>>>> Dumitru >>>>>>>>>> >>>>>>>>>>>> But again, if we consider the current feature >>>>>>>>>>>> "experimental", >>>>>>>>>>>> we can >>>>>>>>>>>> spend more time during the next release cycle to >>>>>>>>>>>> figure out >>>>>>>>>>>> what's best. >>>>>>>>>>>> >>>>>>>>>>>>> v4 of this patch renames the feature from "bgp- >>>>>>>>>>>>> mirror" to >>>>>>>>>>>>> "bgp-redirect" >>>>>>>>>>>>> as discussed during community meeting. >>>>>>>>>>>>> >>>>>>>>>>>>> northd/northd.c | 108 >>>>>>>>>>>>> ++++++++++++++++++++++++++++++++++++++++ >>>>>>>>>>>>> northd/ovn-northd.8.xml | 23 +++++++++ >>>>>>>>>>>>> ovn-nb.xml | 14 ++++++ >>>>>>>>>>>>> tests/ovn-northd.at | 58 >>>>>>>>>>>>> +++++++++++++++++++++ >>>>>>>>>>>>> tests/system-ovn.at | 86 >>>>>>>>>>>>> ++++++++++++++++++++++++++++++++ >>>>>>>>>>>>> 5 files changed, 289 insertions(+) >>>>>>>>>>>>> >>>>>>>>>>>>> diff --git a/northd/northd.c b/northd/northd.c >>>>>>>>>>>>> index 4353df07d..088104f25 100644 >>>>>>>>>>>>> --- a/northd/northd.c >>>>>>>>>>>>> +++ b/northd/northd.c >>>>>>>>>>>>> @@ -13048,6 +13048,113 @@ >>>>>>>>>>>>> build_arp_resolve_flows_for_lrp(struct ovn_port >>>>>>>>>>>>> *op, >>>>>>>>>>>>> } >>>>>>>>>>>>> } >>>>>>>>>>>>> >>>>>>>>>>>>> +static void >>>>>>>>>>>>> +build_bgp_redirect_rule__( >>>>>>>>>>>>> + const char *s_addr, const char >>>>>>>>>>>>> *bgp_port_name, >>>>>>>>>>>>> bool is_ipv6, >>>>>>>>>>>>> + struct ovn_port *ls_peer, struct >>>>>>>>>>>>> lflow_table >>>>>>>>>>>>> *lflows, >>>>>>>>>>>>> + struct ds *match, struct ds *actions) >>>>>>>>>>>>> +{ >>>>>>>>>>>>> + int ip_ver = is_ipv6 ? 6 : 4; >>>>>>>>>>>>> + /* Redirect packets in the input pipeline >>>>>>>>>>>>> destined for >>>>>>>>>>>>> LR's IP to >>>>>>>>>>>>> + * the port specified in 'bgp-redirect' >>>>>>>>>>>>> option. >>>>>>>>>>>>> + */ >>>>>>>>>>>>> + ds_clear(match); >>>>>>>>>>>>> + ds_clear(actions); >>>>>>>>>>>>> + ds_put_format(match, "ip%d.dst == %s && >>>>>>>>>>>>> tcp.dst == >>>>>>>>>>>>> 179", ip_ver, s_addr); >>>>>>>>>>>>> + ds_put_format(actions, "outport = \"%s\"; >>>>>>>>>>>>> output;", >>>>>>>>>>>>> bgp_port_name); >>>>>>>>>>>>> + ovn_lflow_add(lflows, ls_peer->od, >>>>>>>>>>>>> S_SWITCH_IN_L2_LKUP, 100, >>>>>>>>>>>>> + ds_cstr(match), >>>>>>>>>>>>> + ds_cstr(actions), >>>>>>>>>>>>> + ls_peer->lflow_ref); >>>>>>>>>>>>> + >>>>>>>>>>>>> + >>>>>>>>>>>>> + /* Drop any traffic originating from 'bgp- >>>>>>>>>>>>> redirect' >>>>>>>>>>>>> port that does >>>>>>>>>>>>> + * not originate from BGP daemon port. >>>>>>>>>>>>> This blocks >>>>>>>>>>>>> unnecessary >>>>>>>>>>>>> + * traffic like ARP broadcasts or IPv6 >>>>>>>>>>>>> router >>>>>>>>>>>>> solicitation packets >>>>>>>>>>>>> + * from the dummy 'bgp-redirect' port. >>>>>>>>>>>>> + */ >>>>>>>>>>>>> + ds_clear(match); >>>>>>>>>>>>> + ds_put_format(match, "inport == \"%s\"", >>>>>>>>>>>>> bgp_port_name); >>>>>>>>>>>>> + ovn_lflow_add(lflows, ls_peer->od, >>>>>>>>>>>>> S_SWITCH_IN_CHECK_PORT_SEC, 80, >>>>>>>>>>>>> + ds_cstr(match), >>>>>>>>>>>>> + REGBIT_PORT_SEC_DROP " = 1; >>>>>>>>>>>>> next;", >>>>>>>>>>>>> + ls_peer->lflow_ref); >>>>>>>>>>>> Not a huge deal but this logical flow gets >>>>>>>>>>>> regenerated >>>>>>>>>>>> multiple times >>>>>>>>>>>> for each bgp_port_name (for each LRP address). >>>>>>>>>>>> In the end >>>>>>>>>>>> we'll only >>>>>>>>>>>> install one in the database so there's no impact >>>>>>>>>>>> on SB. But >>>>>>>>>>>> maybe we >>>>>>>>>>>> should consider moving this part out of this >>>>>>>>>>>> function. >>>>>>>>>>>> >>>>>>>>>>>>> + >>>>>>>>>>>>> + ds_put_format(match, >>>>>>>>>>>>> + " && ip%d.src == %s && >>>>>>>>>>>>> tcp.src == 179", >>>>>>>>>>>>> + ip_ver, >>>>>>>>>>>>> + s_addr); >>>>>>>>>>>>> + ovn_lflow_add(lflows, ls_peer->od, >>>>>>>>>>>>> S_SWITCH_IN_CHECK_PORT_SEC, 81, >>>>>>>>>>>>> + ds_cstr(match), >>>>>>>>>>>>> + REGBIT_PORT_SEC_DROP " = >>>>>>>>>>>>> check_in_port_sec(); next;", >>>>>>>>>>>>> + ls_peer->lflow_ref); >>>>>>>>>>>> In your system test we set 'addresses = >>>>>>>>>>>> "unknown"' for >>>>>>>>>>>> 'bgp_port_name'. >>>>>>>>>>>> And I guess we kind of have to do it like that >>>>>>>>>>>> all the time >>>>>>>>>>>> when using >>>>>>>>>>>> this feature. I'm trying to think of scenarios >>>>>>>>>>>> in which >>>>>>>>>>>> users set >>>>>>>>>>>> LSP.addresses = 'unknown' and LSP.port-security = >>>>>>>>>>>> '<some- >>>>>>>>>>>> addresses>'. >>>>>>>>>>>> I'm not sure I saw those until now but maybe I'm >>>>>>>>>>>> wrong. >>>>>>>>>>> I'm also interested whether this LSP port with >>>>>>>>>>> unknown >>>>>>>>>>> addresses will >>>>>>>>>>> get all the broadcast traffic? >>>>>>>>>>> >>>>>>>>>>> @Martin, didn't you look for ARP behavior? If we >>>>>>>>>>> have one LS >>>>>>>>>>> with three >>>>>>>>>>> LSPs: >>>>>>>>>>> - 1 router type LSP with MAC 00:00:00:00:00:01 and >>>>>>>>>>> IP >>>>>>>>>>> 169.254.0.1/30 >>>>>>>>>>> - one "vif" LSP with "unknown" addresses set and >>>>>>>>>>> configured >>>>>>>>>>> underlying >>>>>>>>>>> OVS port with same MAC and IP set >>>>>>>>>>> - one localnet port for external connectivity. >>>>>>>>>>> >>>>>>>>>>> If some party from localnet side sends broadcast >>>>>>>>>>> ARP request >>>>>>>>>>> "Who has >>>>>>>>>>> 169.254.0.1? Answer <...>", will this ARP request >>>>>>>>>>> got flooded >>>>>>>>>>> to both >>>>>>>>>>> LRP and LSP with "unknown" addresses? If yes, will >>>>>>>>>>> the sender >>>>>>>>>>> receive 2 >>>>>>>>>>> ARP replies? >>>>>>>>>>> >>>>>>>>>>>>> +} >>>>>>>>>>>>> + >>>>>>>>>>>>> +static void >>>>>>>>>>>>> +build_lrouter_bgp_redirect( >>>>>>>>>>>>> + struct ovn_port *op, struct >>>>>>>>>>>>> lflow_table *lflows, >>>>>>>>>>>>> + struct ds *match, struct ds *actions) >>>>>>>>>>>>> +{ >>>>>>>>>>>>> + /* LRP has to have a peer.*/ >>>>>>>>>>>>> + if (op->peer == NULL) { >>>>>>>>>>>>> + return; >>>>>>>>>>>>> + } >>>>>>>>>>>>> + /* LRP has to have NB record.*/ >>>>>>>>>>>>> + if (op->nbrp == NULL) { >>>>>>>>>>>>> + return; >>>>>>>>>>>>> + } >>>>>>>>>>>>> + >>>>>>>>>>>>> + /* Proceed only for LRPs that have 'bgp- >>>>>>>>>>>>> redirect' >>>>>>>>>>>>> option set. Value of this >>>>>>>>>>>>> + * option is the name of LSP to which a >>>>>>>>>>>>> BGP traffic >>>>>>>>>>>>> will be redirected. >>>>>>>>>>>>> + */ >>>>>>>>>>>>> + const char *bgp_port = smap_get(&op->nbrp- >>>>>>>>>>>>>> options, >>>>>>>>>>>>> "bgp-redirect"); >>>>>>>>>>>>> + if (bgp_port == NULL) { >>>>>>>>>>>>> + return; >>>>>>>>>>>>> + } >>>>>>>>>>>>> + >>>>>>>>>>>>> + /* Ensure that LSP, to which the BGP >>>>>>>>>>>>> traffic is >>>>>>>>>>>>> redirected, exists.*/ >>>>>>>>>>>>> + struct ovn_port *peer_lsp; >>>>>>>>>>>>> + bool bgp_port_exists = false; >>>>>>>>>>>>> + HMAP_FOR_EACH (peer_lsp, dp_node, &op- >>>>>>>>>>>>>> peer->od- >>>>>>>>>>>>>> ports) { >>>>>>>>>>>>> + size_t peer_lsp_s = strlen(peer_lsp- >>>>>>>>>>>>>> key); >>>>>>>>>>>>> + if (peer_lsp_s == strlen(bgp_port) >>>>>>>>>>>>> + && !strncmp(peer_lsp->key, >>>>>>>>>>>>> bgp_port, >>>>>>>>>>>>> peer_lsp_s)){ >>>>>>>>>>>>> + bgp_port_exists = true; >>>>>>>>>>>>> + break; >>>>>>>>>>>>> + } >>>>>>>>>>>>> + } >>>>>>>>>>>>> + >>>>>>>>>>>>> + if (!bgp_port_exists) { >>>>>>>>>>>>> + static struct vlog_rate_limit rl = >>>>>>>>>>>>> VLOG_RATE_LIMIT_INIT(1, 5); >>>>>>>>>>>>> + VLOG_WARN_RL(&rl, "Option 'bgp- >>>>>>>>>>>>> redirect' set on >>>>>>>>>>>>> Logical Router Port " >>>>>>>>>>>>> + "'%s' refers to non- >>>>>>>>>>>>> existent >>>>>>>>>>>>> Logical Switch Port. " >>>>>>>>>>>>> + "BGP redirecting >>>>>>>>>>>>> won't be >>>>>>>>>>>>> configured.", >>>>>>>>>>>>> + op->key); >>>>>>>>>>>>> + return; >>>>>>>>>>>>> + } >>>>>>>>>>>>> + >>>>>>>>>>>>> + if (op->cr_port != NULL) { >>>>>>>>>>>>> + static struct vlog_rate_limit rl = >>>>>>>>>>>>> VLOG_RATE_LIMIT_INIT(1, 5); >>>>>>>>>>>>> + VLOG_WARN_RL(&rl, "Option 'bgp- >>>>>>>>>>>>> redirect' is not >>>>>>>>>>>>> supported on" >>>>>>>>>>>>> + " Distributed >>>>>>>>>>>>> Gateway Port >>>>>>>>>>>>> '%s'", op->key); >>>>>>>>>>>>> + return; >>>>>>>>>>>>> + } >>>>>>>>>>>>> + >>>>>>>>>>>>> + /* Mirror traffic destined for LRP's IPs >>>>>>>>>>>>> and default >>>>>>>>>>>>> BGP port >>>>>>>>>>>>> + * to the port defined in 'bgp-redirect' >>>>>>>>>>>>> option. >>>>>>>>>>>>> + */ >>>>>>>>>>>>> + for (size_t i = 0; i < op- >>>>>>>>>>>>>> lrp_networks.n_ipv4_addrs; >>>>>>>>>>>>> i++) { >>>>>>>>>>>>> + const char *ip_s = op- >>>>>>>>>>>>>> lrp_networks.ipv4_addrs[i].addr_s; >>>>>>>>>>>>> + build_bgp_redirect_rule__(ip_s, >>>>>>>>>>>>> bgp_port, false, >>>>>>>>>>>>> op->peer, lflows, >>>>>>>>>>>>> + match, >>>>>>>>>>>>> actions); >>>>>>>>>>>>> + } >>>>>> I'm thinking more and more that having a "match" expression >>>>>> in a >>>>>> "NB.Logical_Router_Port.options:redirect-match" option it a >>>>>> good choice. >>>>>> >>>>> Would the match be enforced only for redirected traffic (from >>>>> outside to >>>>> bgp target LSP)? >>>> I guess I didn't get your question, sorry. >>> >>> I was thinking of the case Martin brought up yesterday, with the >>> session >>> initiated from inside. >>> >>> If we have the following config (not sure if the option names will >>> change but I hope it's clear): >>> >>> LRP { >>> .name = lrp >>> .networks = [{00:00:00:00:00:01, 1.1.1.1/24}] >>> .options = { >>> .redirect-port = 'bgp-daemon' >>> .redirect-match = 'tcp.dst == 179 || tcp.src == 179' >>> } >>> } >>> >>> Then, in the switch pipeline in the L2_LOOKUP stage, we'll only >>> force >>> the outport to be 'bgp-daemon' if "destination IP == 1.1.1.1 && >>> 'redirect-match'" is true. So, only traffic coming from "outside" >>> towards the router IP and that also matches the filter. >>> >>> No additional restrictions. >>> >>> Will this be enough though? >>> >>> What if 'bgp-daemon' ARPs for 1.1.1.2 (an external host)? How does >>> that >>> work? ARP reply is unicast and will have dmac that of the LRP. >>> >>> We could include ARP and ND in the match too but then what about >>> ARP >>> replies that actually should go to the router port? > > Is there a way to deliver the ARP replies to both ports? iirc, for > example, ct_commit forks pipelines. (I'm not saying to use ct_commit, > but apply similar approach??) ct_commit doesn't really fork anything IIUC. I remember the docs say that but it's not really what's happening. We have "clone()" but blindly duplicating all control traffic is not really enough I think. But please check the ICMP example below. It's not only ARP/ND we need to handle. We need a way to differentiate between "replies to traffic originated by the LRP" and "replies to traffic originated by the LSP we redirect traffic to".. so maybe some sort of conntrack based forwarding (which we don't have in ovn today). >>> >> >> Also, what about other control traffic, e.g.: >> >> a. ICMP replies from external hosts for ICMP echo initiated by bgp- >> daemon >> >> vs >> >> b. ICMP replies from external hosts for ICMP echo initiated by >> something >> else (a different VIF) behind the LR that got SNAT-ed to the LRP IP. >> >>> I'm actually thinking more and more that we shouldn't redirect any >>> traffic like this, we should just bind the VIF directly to the LRP. >>> >>>>>> LRP can have multiple IP addresses and enabling redirect for >>>>>> all of them >>>>>> could be implicit for the user and may contain security >>>>>> concerns. As an >>>>>> example: LR can have configured LB with tcp/179. If user >>>>>> configured >>>>>> peering IP-address (which can be in a link-local address >>>>>> range) and also >>>>>> added an IP address for Load Balancer in a `networks` list of >>>>>> a LRP. >>>>>> Enabling BGP redirect for all IPs of LRP will break load >>>>>> balancing for >>>>>> tcp/179. While it is not needed for load balancers to assign >>>>>> their VIP >>>>>> to networks, there is no validations, which prohibit doing >>>>>> so. It is >>>>> I think this is actually something that ovn-kubernetes does in >>>>> their >>>>> deployments for cluster-IP services. And I suspect that today >>>>> there's >>>>> nothing preventing users from creating cluster-IP services for >>>>> TCP/179. > > Would it be useful to redirect only traffic for LRP's IPv6 LLA? > @Vladislav, in your setups, do you use ipv4 or ipv6 LLAs for setting up > BGP peering? > I'll let Vladislav comment on this. I do think IPv4 might be a requirement on the long term for us downstream though. >>>> >>>> So, this proofs that this is a possible problem if enabled >>>> redirect and >>>> if load balancer's IP is configured to be in networks list. >>>> >>>> In any case this doesn't prevent from current implementation to >>>> merged >>>> as "experimental". I'm just started discussion about a GA >>>> solution. >>>> >>>>>> still possible and can be difficult to catch such behavior >>>>>> for >>>>>> redirected traffic with Load Balancers. >>>>>> >>>>>>>>>>>>> + for (size_t i = 0; i < op- >>>>>>>>>>>>>> lrp_networks.n_ipv6_addrs; >>>>>>>>>>>>> i++) { >>>>>>>>>>>>> + const char *ip_s = op- >>>>>>>>>>>>>> lrp_networks.ipv6_addrs[i].addr_s; >>>>>>>>>>>>> + build_bgp_redirect_rule__(ip_s, >>>>>>>>>>>>> bgp_port, true, >>>>>>>>>>>>> op->peer, lflows, >>>>>>>>>>>>> + match, >>>>>>>>>>>>> actions); >>>>>>>>>>>>> + } >>>>>>>>>>>>> +} >>>>>>>>>>>>> + >>>>>>>>>>>>> /* This function adds ARP resolve flows >>>>>>>>>>>>> related to a LSP. >>>>>>>>>>>>> */ >>>>>>>>>>>>> static void >>>>>>>>>>>>> build_arp_resolve_flows_for_lsp( >>>>>>>>>>>>> @@ -16003,6 +16110,7 @@ >>>>>>>>>>>>> build_lswitch_and_lrouter_iterate_by_lrp(struct >>>>>>>>>>>>> ovn_port >>>>>>>>>>>>> *op, >>>>>>>>>>>>> lsi- >>>>>>>>>>>>>> meter_groups, op- >>>>>>>>>>>>>> lflow_ref); >>>>>>>>>>>>> >>>>>>>>>>>>> build_lrouter_icmp_packet_toobig_admin_flows(op >>>>>>>>>>>>> , lsi- >>>>>>>>>>>>>> lflows, &lsi->match, >>>>>>>>>>>>> >>>>>>>>>>>>> &lsi- >>>>>>>>>>>>>> actions, op->lflow_ref); >>>>>>>>>>>>> + build_lrouter_bgp_redirect(op, lsi- >>>>>>>>>>>>>> lflows, &lsi- >>>>>>>>>>>>>> match, &lsi->actions); >>>>>>>>>>>>> } >>>>>>>>>>>>> >>>>>>>>>>>>> static void * >>>>>>>>>>>>> diff --git a/northd/ovn-northd.8.xml >>>>>>>>>>>>> b/northd/ovn- >>>>>>>>>>>>> northd.8.xml >>>>>>>>>>>>> index b06b09ac5..4cbdf8f74 100644 >>>>>>>>>>>>> --- a/northd/ovn-northd.8.xml >>>>>>>>>>>>> +++ b/northd/ovn-northd.8.xml >>>>>>>>>>>>> @@ -284,6 +284,21 @@ >>>>>>>>>>>>> dropped in the next stage. >>>>>>>>>>>>> </li> >>>>>>>>>>>>> >>>>>>>>>>>>> + <li> >>>>>>>>>>>>> + For each logical port that's defined >>>>>>>>>>>>> as a target >>>>>>>>>>>>> of BGP redirecting >>>>>>>>>>>>> + (via <code>bgp-redirect</code> option >>>>>>>>>>>>> set on >>>>>>>>>>>>> Logical Router Port), a >>>>>>>>>>>>> + filter is set in place that disallows >>>>>>>>>>>>> any traffic >>>>>>>>>>>>> entering this port >>>>>>>>>>>>> + that does not originate from Logical >>>>>>>>>>>>> Router Port's >>>>>>>>>>>>> IPs and default BGP >>>>>>>>>>>>> + port (TCP 179). This filtering is >>>>>>>>>>>>> achieved by two >>>>>>>>>>>>> rules. First rule has >>>>>>>>>>>>> + priority 81, it matches on >>>>>>>>>>>>> <code>inport == >>>>>>>>>>>>> <var>BGP_MIRROR_PORT</var> >>>>>>>>>>>>> + && ip.src == <var>LRP_IP</var> >>>>>>>>>>>>> && >>>>>>>>>>>>> tcp.src == 179</code> >>>>>>>>>>>>> + and allows traffic further with >>>>>>>>>>>>> <code>REGBIT_PORT_SEC_DROP" = >>>>>>>>>>>>> + check_in_port_sec(); next;</code>. >>>>>>>>>>>>> Second rule >>>>>>>>>>>>> with priority 80 matches >>>>>>>>>>>>> + the rest of the traffic from that port >>>>>>>>>>>>> and sets >>>>>>>>>>>>> + <code>REGBIT_PORT_SEC_DROP" = 1; >>>>>>>>>>>>> next;"</code> so >>>>>>>>>>>>> that the packets are >>>>>>>>>>>>> + dropped in the next stage. >>>>>>>>>>>>> + </li> >>>>>>>>>>>>> + >>>>>>>>>>>>> <li> >>>>>>>>>>>>> For each (enabled) vtep logical port, >>>>>>>>>>>>> a priority >>>>>>>>>>>>> 70 flow is added which >>>>>>>>>>>>> matches on all packets and applies the >>>>>>>>>>>>> action >>>>>>>>>>>>> @@ -1949,6 +1964,14 @@ output; >>>>>>>>>>>>> on the logical switch. >>>>>>>>>>>>> </li> >>>>>>>>>>>>> >>>>>>>>>>>>> + <li> >>>>>>>>>>>>> + For any logical port that's defined as >>>>>>>>>>>>> a target of >>>>>>>>>>>>> BGP redirecting (via >>>>>>>>>>>>> + <code>bgp-redirect</code> option set >>>>>>>>>>>>> on Logical >>>>>>>>>>>>> Router Port), a >>>>>>>>>>>>> + priority-100 flow is added that >>>>>>>>>>>>> redirects traffic >>>>>>>>>>>>> for Logical Router >>>>>>>>>>>>> + Port IPs (including IPv6 LLA) and TCP >>>>>>>>>>>>> port 179, to >>>>>>>>>>>>> the targeted >>>>>>>>>>>>> + logical port. >>>>>>>>>>>>> + </li> >>>>>>>>>>>>> + >>>>>>>>>>>>> <li> >>>>>>>>>>>>> Priority-90 flows for transit switches >>>>>>>>>>>>> that >>>>>>>>>>>>> forward registered >>>>>>>>>>>>> IP multicast traffic to their >>>>>>>>>>>>> corresponding >>>>>>>>>>>>> multicast group , which >>>>>>>>>>>>> diff --git a/ovn-nb.xml b/ovn-nb.xml >>>>>>>>>>>>> index 9552534f6..2d2f5c9e0 100644 >>>>>>>>>>>>> --- a/ovn-nb.xml >>>>>>>>>>>>> +++ b/ovn-nb.xml >>>>>>>>>>>>> @@ -3440,6 +3440,20 @@ or >>>>>>>>>>>>> </p> >>>>>>>>>>>>> </column> >>>>>>>>>>>>> >>>>>>>>>>>>> + <column name="options" key="bgp- >>>>>>>>>>>>> redirect" >>>>>>>>>>>>> type='{"type": "string"}'> >>>>>>>>>>>>> + <p> >>>>>>>>>>>>> + This option expects a name of a >>>>>>>>>>>>> Logical Switch >>>>>>>>>>>>> Port that's present >>>>>>>>>>>>> + in the peer's Logical Switch. If >>>>>>>>>>>>> set, it causes >>>>>>>>>>>>> any traffic >>>>>>>>>>>>> + that's destined for Logical Router >>>>>>>>>>>>> Port's IP >>>>>>>>>>>>> addresses (including >>>>>>>>>>>>> + its IPv6 LLA) and the default BGP >>>>>>>>>>>>> port (TCP >>>>>>>>>>>>> 179), to be redirected >>>>>>>>>>>>> + to the specified Logical Switch >>>>>>>>>>>>> Port. >>>>>>>>>>>>> + >>>>>>>>>>>>> + This allows external BGP daemon to >>>>>>>>>>>>> be bound to a >>>>>>>>>>>>> port in OVN's >>>>>>>>>>>>> + Logical Switch and act as if it was >>>>>>>>>>>>> listening on >>>>>>>>>>>>> Logical Router >>>>>>>>>>>>> + Port's IP address. >>>>>>>>>>>>> + </p> >>>>>>>>>>>>> + </column> >>>>>>>>>>>>> + >>>>>>>>>>>>> <column name="options" >>>>>>>>>>>>> key="gateway_mtu_bypass"> >>>>>>>>>>>>> <p> >>>>>>>>>>>>> When configured, represents a match >>>>>>>>>>>>> expression, >>>>>>>>>>>>> in the same >>>>>>>>>>>>> diff --git a/tests/ovn-northd.at b/tests/ovn- >>>>>>>>>>>>> northd.at >>>>>>>>>>>>> index a389d1988..df1e924ab 100644 >>>>>>>>>>>>> --- a/tests/ovn-northd.at >>>>>>>>>>>>> +++ b/tests/ovn-northd.at >>>>>>>>>>>>> @@ -12721,3 +12721,61 @@ AT_CHECK([ovn-sbctl >>>>>>>>>>>>> dump-flows lr >>>>>>>>>>>>>> grep lr_in_dnat | ovn_strip_lflows], [0], [d >>>>>>>>>>>>> >>>>>>>>>>>>> AT_CLEANUP >>>>>>>>>>>>> ]) >>>>>>>>>>>>> + >>>>>>>>>>>>> +OVN_FOR_EACH_NORTHD_NO_HV([ >>>>>>>>>>>>> +AT_SETUP([BGP control plane redirect]) >>>>>>>>>>>>> +ovn_start >>>>>>>>>>>>> + >>>>>>>>>>>>> +check ovn-sbctl chassis-add hv1 geneve >>>>>>>>>>>>> 127.0.0.1 >>>>>>>>>>>>> + >>>>>>>>>>>>> +check ovn-nbctl lr-add lr -- \ >>>>>>>>>>>>> + lrp-add lr lr-ls 02:ac:10:01:00:01 >>>>>>>>>>>>> 172.16.1.1/24 >>>>>>>>>>>>> +check ovn-nbctl --wait=sb set logical_router >>>>>>>>>>>>> lr >>>>>>>>>>>>> options:chassis=hv1 >>>>>>>>>>>>> + >>>>>>>>>>>>> +check ovn-nbctl ls-add ls -- \ >>>>>>>>>>>>> + lsp-add ls ls-lr -- \ >>>>>>>>>>>>> + lsp-set-type ls-lr router -- \ >>>>>>>>>>>>> + lsp-set-addresses ls-lr router -- \ >>>>>>>>>>>>> + lsp-set-options ls-lr router-port=lr-ls >>>>>>>>>>>>> + >>>>>>>>>>>>> +check ovn-nbctl lsp-add ls lsp-bgp -- \ >>>>>>>>>>>>> + lsp-set-addresses lsp-bgp unknown >>>>>>>>>>>>> + >>>>>>>>>>>>> +# Function that ensures that no BGP redirect >>>>>>>>>>>>> ruels are >>>>>>>>>>>>> installed. >>>>>>>>>>>>> +check_no_bgp_redirect() { >>>>>>>>>>>>> + AT_CHECK([ovn-sbctl dump-flows ls | grep >>>>>>>>>>>>> ls_in_l2_lkup | grep "tcp.dst == 179" | wc - >>>>>>>>>>>>> l], [0], [0 >>>>>>>>>>>>> +]) >>>>>>>>>>>>> + >>>>>>>>>>>>> + AT_CHECK([ovn-sbctl dump-flows ls | grep >>>>>>>>>>>>> ls_in_check_port_sec | grep -E >>>>>>>>>>>>> "priority=80|priority=81" | >>>>>>>>>>>>> wc -l], [0], [0 >>>>>>>>>>>>> +]) >>>>>>>>>>>>> +} >>>>>>>>>>>>> + >>>>>>>>>>>>> +# By default, no rules related to BGP redirect >>>>>>>>>>>>> are present >>>>>>>>>>>>> +check_no_bgp_redirect >>>>>>>>>>>>> + >>>>>>>>>>>>> +# Set "lsp-bgp" port as target of BGP control >>>>>>>>>>>>> plane >>>>>>>>>>>>> redirected traffic >>>>>>>>>>>>> +check ovn-nbctl --wait=sb set >>>>>>>>>>>>> logical_router_port lr-ls >>>>>>>>>>>>> options:bgp-redirect=lsp-bgp >>>>>>>>>>>>> + >>>>>>>>>>>>> +# Check that BGP control plane traffic is >>>>>>>>>>>>> redirected to >>>>>>>>>>>>> "lsp-bgp" >>>>>>>>>>>>> +AT_CHECK([ovn-sbctl dump-flows ls | grep >>>>>>>>>>>>> ls_in_l2_lkup | >>>>>>>>>>>>> grep "tcp.dst == 179" | ovn_strip_lflows], [0], >>>>>>>>>>>>> [dnl >>>>>>>>>>>>> + table=??(ls_in_l2_lkup ), priority=100 >>>>>>>>>>>>> , >>>>>>>>>>>>> match=(ip4.dst == 172.16.1.1 && tcp.dst == >>>>>>>>>>>>> 179), >>>>>>>>>>>>> action=(outport = "lsp-bgp"; output;) >>>>>>>>>>>>> + table=??(ls_in_l2_lkup ), priority=100 >>>>>>>>>>>>> , >>>>>>>>>>>>> match=(ip6.dst == fe80::ac:10ff:fe01:1 && >>>>>>>>>>>>> tcp.dst == 179), >>>>>>>>>>>>> action=(outport = "lsp-bgp"; output;) >>>>>>>>>>>>> +]) >>>>>>>>>>>>> + >>>>>>>>>>>>> +# Check that only BGP-related traffic is >>>>>>>>>>>>> accepted on "lsp- >>>>>>>>>>>>> bgp" port >>>>>>>>>>>>> +AT_CHECK([ovn-sbctl dump-flows ls | grep >>>>>>>>>>>>> ls_in_check_port_sec | grep -E >>>>>>>>>>>>> "priority=80|priority=81" | >>>>>>>>>>>>> ovn_strip_lflows], [0], [dnl >>>>>>>>>>>>> + table=??(ls_in_check_port_sec), >>>>>>>>>>>>> priority=80 , >>>>>>>>>>>>> match=(inport == "lsp-bgp"), action=(reg0[[15]] >>>>>>>>>>>>> = 1; next;) >>>>>>>>>>>>> + table=??(ls_in_check_port_sec), >>>>>>>>>>>>> priority=81 , >>>>>>>>>>>>> match=(inport == "lsp-bgp" && ip4.src == >>>>>>>>>>>>> 172.16.1.1 && >>>>>>>>>>>>> tcp.src == 179), action=(reg0[[15]] = >>>>>>>>>>>>> check_in_port_sec(); >>>>>>>>>>>>> next;) >>>>>>>>>>>>> + table=??(ls_in_check_port_sec), >>>>>>>>>>>>> priority=81 , >>>>>>>>>>>>> match=(inport == "lsp-bgp" && ip6.src == >>>>>>>>>>>>> fe80::ac:10ff:fe01:1 && tcp.src == 179), >>>>>>>>>>>>> action=(reg0[[15]] >>>>>>>>>>>>> = check_in_port_sec(); next;) >>>>>>>>>>>>> +]) >>>>>>>>>>>>> + >>>>>>>>>>>>> +# Remove 'bgp-redirect' option from LRP and >>>>>>>>>>>>> check that >>>>>>>>>>>>> rules are removed >>>>>>>>>>>>> +check ovn-nbctl --wait=sb remove >>>>>>>>>>>>> logical_router_port lr-ls >>>>>>>>>>>>> options bgp-redirect >>>>>>>>>>>>> +check_no_bgp_redirect >>>>>>>>>>>>> + >>>>>>>>>>>>> +# Set non-existent LSP as target of 'bgp- >>>>>>>>>>>>> redirect' and >>>>>>>>>>>>> check that no rules are added >>>>>>>>>>>>> +check ovn-nbctl --wait=sb set >>>>>>>>>>>>> logical_router_port lr-ls >>>>>>>>>>>>> options:bgp-redirect=lsp-foo >>>>>>>>>>>>> +check_no_bgp_redirect >>>>>>>>>>>>> + >>>>>>>>>>>>> +AT_CLEANUP >>>>>>>>>>>>> +]) >>>>>>>>>>>>> diff --git a/tests/system-ovn.at >>>>>>>>>>>>> b/tests/system-ovn.at >>>>>>>>>>>>> index c24ede7c5..c09c1136e 100644 >>>>>>>>>>>>> --- a/tests/system-ovn.at >>>>>>>>>>>>> +++ b/tests/system-ovn.at >>>>>>>>>>>>> @@ -13027,3 +13027,89 @@ >>>>>>>>>>>>> OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query >>>>>>>>>>>>> port patch- >>>>>>>>>>>>> .*/d >>>>>>>>>>>>> /connection dropped.*/d"]) >>>>>>>>>>>>> AT_CLEANUP >>>>>>>>>>>>> ]) >>>>>>>>>>>>> + >>>>>>>>>>>>> +OVN_FOR_EACH_NORTHD([ >>>>>>>>>>>>> +AT_SETUP([BGP control plane redirect]) >>>>>>>>>>>> Nit: probably best to add AT_SKIP_IF([test >>>>>>>>>>>> $HAVE_NC = no]) >>>>>>>>>>>> here. >>>>>>>>>>>> >>>>>>>>>>>>> +ovn_start >>>>>>>>>>>>> +OVS_TRAFFIC_VSWITCHD_START() >>>>>>>>>>>>> + >>>>>>>>>>>>> +ADD_BR([br-int]) >>>>>>>>>>>>> +ADD_BR([br-ext]) >>>>>>>>>>>>> + >>>>>>>>>>>>> +ovs-ofctl add-flow br-ext action=normal >>>>>>>>>>>>> +# Set external-ids in br-int needed for ovn- >>>>>>>>>>>>> controller >>>>>>>>>>>>> +ovs-vsctl \ >>>>>>>>>>>>> + -- set Open_vSwitch . external- >>>>>>>>>>>>> ids:system-id=hv1 \ >>>>>>>>>>>>> + -- set Open_vSwitch . external- >>>>>>>>>>>>> ids:ovn- >>>>>>>>>>>>> remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \ >>>>>>>>>>>>> + -- set Open_vSwitch . external- >>>>>>>>>>>>> ids:ovn-encap- >>>>>>>>>>>>> type=geneve \ >>>>>>>>>>>>> + -- set Open_vSwitch . external- >>>>>>>>>>>>> ids:ovn-encap- >>>>>>>>>>>>> ip=169.0.0.1 \ >>>>>>>>>>>>> + -- set bridge br-int fail-mode=secure >>>>>>>>>>>>> other- >>>>>>>>>>>>> config:disable-in-band=true >>>>>>>>>>>> Most ov*-ctl commands should be prefixed with >>>>>>>>>>>> "check" in this >>>>>>>>>>>> test. >>>>>>>>>>>> >>>>>>>>>>>>> + >>>>>>>>>>>>> +# Start ovn-controller >>>>>>>>>>>>> +start_daemon ovn-controller >>>>>>>>>>>>> + >>>>>>>>>>>>> +ovn-nbctl lr-add R1 \ >>>>>>>>>>>>> + -- set Logical_Router R1 >>>>>>>>>>>>> options:chassis=hv1 >>>>>>>>>>>>> + >>>>>>>>>>>>> +ovn-nbctl ls-add public >>>>>>>>>>>>> + >>>>>>>>>>>>> +ovn-nbctl lrp-add R1 rp-public >>>>>>>>>>>>> 00:00:02:01:02:03 >>>>>>>>>>>>> 172.16.1.1/24 >>>>>>>>>>>>> + >>>>>>>>>>>>> +ovn-nbctl lsp-add public public-rp -- set >>>>>>>>>>>>> Logical_Switch_Port public-rp \ >>>>>>>>>>>>> + type=router options:router-port=rp-public >>>>>>>>>>>>> \ >>>>>>>>>>>>> + -- lsp-set-addresses public-rp router >>>>>>>>>>>>> + >>>>>>>>>>>>> +ovn-nbctl lsp-add public bgp-daemon \ >>>>>>>>>>>>> + -- lsp-set-addresses bgp-daemon unknown >>>>>>>>>>>>> + >>>>>>>>>>>>> +AT_CHECK([ovs-vsctl set Open_vSwitch . >>>>>>>>>>>>> external-ids:ovn- >>>>>>>>>>>>> bridge-mappings=phynet:br-ext]) >>>>>>>>>>>>> +ovn-nbctl lsp-add public public1 \ >>>>>>>>>>>>> + -- lsp-set-addresses public1 unknown \ >>>>>>>>>>>>> + -- lsp-set-type public1 localnet \ >>>>>>>>>>>>> + -- lsp-set-options public1 >>>>>>>>>>>>> network_name=phynet >>>>>>>>>>>>> + >>>>>>>>>>>>> +ovn-nbctl --wait=hv sync >>>>>>>>>>>>> + >>>>>>>>>>>>> +# Set option that redirects BGP control plane >>>>>>>>>>>>> traffic to a >>>>>>>>>>>>> LSP "bgp-daemon" >>>>>>>>>>>>> +ovn-nbctl --wait=sb set logical_router_port >>>>>>>>>>>>> rp-public >>>>>>>>>>>>> options:bgp-redirect=bgp-daemon >>>>>>>>>>>>> + >>>>>>>>>>>>> +# Create "bgp-daemon" interface in a namespace >>>>>>>>>>>>> with IP and >>>>>>>>>>>>> MAC matching LRP "rp-public" >>>>>>>>>>>>> +ADD_NAMESPACES(bgp-daemon) >>>>>>>>>>>>> +ADD_VETH(bgp-daemon, bgp-daemon, br-int, >>>>>>>>>>>>> "172.16.1.1/24", >>>>>>>>>>>>> "00:00:02:01:02:03") >>>>>>>>>>>>> + >>>>>>>>>>>>> +ADD_NAMESPACES(ext-foo) >>>>>>>>>>>>> +ADD_VETH(ext-foo, ext-foo, br-ext, >>>>>>>>>>>>> "172.16.1.100/24", >>>>>>>>>>>>> "00:10:10:01:02:13", \ >>>>>>>>>>>>> + "172.16.1.1") >>>>>>>>>>>>> + >>>>>>>>>>>>> +# Flip the interface down/up to get proper >>>>>>>>>>>>> IPv6 LLA >>>>>>>>>>>>> +NS_EXEC([bgp-daemon], [ip link set down bgp- >>>>>>>>>>>>> daemon]) >>>>>>>>>>>>> +NS_EXEC([bgp-daemon], [ip link set up bgp- >>>>>>>>>>>>> daemon]) >>>>>>>>>>>>> + >>>>>>>>>>>>> +# Wait until IPv6 LLA loses the "tentative" >>>>>>>>>>>>> flag otherwise >>>>>>>>>>>>> it can't be bound to. >>>>>>>>>>>>> +OVS_WAIT_UNTIL([NS_EXEC([bgp-daemon], [ip a >>>>>>>>>>>>> show dev bgp- >>>>>>>>>>>>> daemon | grep "fe80::" | grep -v tentative])]) >>>>>>>>>>>>> +OVS_WAIT_UNTIL([NS_EXEC([ext-foo], [ip a show >>>>>>>>>>>>> dev ext-foo >>>>>>>>>>>>>> grep "fe80::" | grep -v tentative])]) >>>>>>>>>>>>> + >>>>>>>>>>>>> +# Verify that BGP control plane traffic is >>>>>>>>>>>>> delivered to >>>>>>>>>>>>> the "bgp-daemon" >>>>>>>>>>>>> +# interface on both IPv4 and IPv6 LLA >>>>>>>>>>>>> addresses >>>>>>>>>>>>> +NETNS_DAEMONIZE([bgp-daemon], [nc -l -k >>>>>>>>>>>>> 172.16.1.1 179], >>>>>>>>>>>>> [bgp_v4.pid]) >>>>>>>>>>>>> +NS_CHECK_EXEC([ext-foo], [echo "TCP test" | nc >>>>>>>>>>>>> --send-only >>>>>>>>>>>>> 172.16.1.1 179]) >>>>>>>>>>>>> + >>>>>>>>>>>>> +NETNS_DAEMONIZE([bgp-daemon], [nc -l -6 -k >>>>>>>>>>>>> fe80::200:2ff:fe01:203%bgp-daemon 179], >>>>>>>>>>>>> [bgp_v6.pid]) >>>>>>>>>>>>> +NS_CHECK_EXEC([ext-foo], [echo "TCP test" | nc >>>>>>>>>>>>> --send-only >>>>>>>>>>>>> -6 fe80::200:2ff:fe01:203%ext-foo 179]) >>>>>>>>>>>>> + >>>>>>>>>>>>> +OVS_APP_EXIT_AND_WAIT([ovn-controller]) >>>>>>>>>>>>> + >>>>>>>>>>>>> +as ovn-sb >>>>>>>>>>>>> +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >>>>>>>>>>>>> + >>>>>>>>>>>>> +as ovn-nb >>>>>>>>>>>>> +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >>>>>>>>>>>>> + >>>>>>>>>>>>> +as northd >>>>>>>>>>>>> +OVS_APP_EXIT_AND_WAIT([ovn-northd]) >>>>>>>>>>>>> + >>>>>>>>>>>>> +as >>>>>>>>>>>>> +OVS_TRAFFIC_VSWITCHD_STOP(["/.*error >>>>>>>>>>>>> receiving.*/d >>>>>>>>>>>>> +/.*terminating with signal 15.*/d"]) >>>>>>>>>>>>> +AT_CLEANUP >>>>>>>>>>>>> +]) >>>>>>>>>>>> Regards, >>>>>>>>>>>> Dumitru >>>>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> dev mailing list >>>>>>>>>> dev@openvswitch.org >>>>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>>>>>> _______________________________________________ >>>>>>> dev mailing list >>>>>>> dev@openvswitch.org >>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>>>> _______________________________________________ >>>>> dev mailing list >>>>> dev@openvswitch.org >>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> >
With OVN we do use IPv4 LLA for now. IPv6 is in future plans. On 08.08.2024 19:31, martin.kalcok@canonical.com wrote: > Would it be useful to redirect only traffic for LRP's IPv6 LLA? > @Vladislav, in your setups, do you use ipv4 or ipv6 LLAs for setting up > BGP peering?
On 8/8/24 14:42, Dumitru Ceara wrote: >> Would it be useful to redirect only traffic for LRP's IPv6 LLA? >> @Vladislav, in your setups, do you use ipv4 or ipv6 LLAs for setting up >> BGP peering? >> > I'll let Vladislav comment on this. I do think IPv4 might be a > requirement on the long term for us downstream though. An addendum here, I meant non-LLA IPv4 (globally routable but maybe also private).
Thank you for the confirmation. As for the context of the current "experimental" implementation. Would you say that it's OK if we then redirect all LRP IP's (as currently proposed) and properly document that this feature causes traffic to all LRP IPs to be redirected. Therefore it's incompatible with, for example, creating load balancer on LRP with port 179? Martin. On Thu, 2024-08-08 at 15:29 +0200, Dumitru Ceara wrote: > On 8/8/24 14:42, Dumitru Ceara wrote: > > > Would it be useful to redirect only traffic for LRP's IPv6 LLA? > > > @Vladislav, in your setups, do you use ipv4 or ipv6 LLAs for > > > setting up > > > BGP peering? > > > > > I'll let Vladislav comment on this. I do think IPv4 might be a > > requirement on the long term for us downstream though. > > An addendum here, I meant non-LLA IPv4 (globally routable but maybe > also > private). >
On 8/8/24 15:37, martin.kalcok@canonical.com wrote: > Thank you for the confirmation. > > As for the context of the current "experimental" implementation. Would > you say that it's OK if we then redirect all LRP IP's (as currently > proposed) and properly document that this feature causes traffic to all > LRP IPs to be redirected. Therefore it's incompatible with, for > example, creating load balancer on LRP with port 179? > It's actually incompatible with any load balancer to be created that has VIP=<any LRP IP>. Same for NATs. What if we combine it with a filter like Vladislav suggested? But not match on any IPs explicitly. Then the user could just configure multiple IPs (e.g., IP1, IP2) on the LRP and decide which of them should be redirected by adding a filter on "ip.dest == IP2" (for example). That would allow you to: a. redirect all traffic (no filter or filter == "1") b. redirect traffic for select IPs (filter == "ip.dest == X") c. do more complex things (responsibility of the user if things break?) What do you think? > Martin. > > On Thu, 2024-08-08 at 15:29 +0200, Dumitru Ceara wrote: >> On 8/8/24 14:42, Dumitru Ceara wrote: >>>> Would it be useful to redirect only traffic for LRP's IPv6 LLA? >>>> @Vladislav, in your setups, do you use ipv4 or ipv6 LLAs for >>>> setting up >>>> BGP peering? >>>> >>> I'll let Vladislav comment on this. I do think IPv4 might be a >>> requirement on the long term for us downstream though. >> >> An addendum here, I meant non-LLA IPv4 (globally routable but maybe >> also >> private). >> >
On Thu, 2024-08-08 at 16:56 +0200, Dumitru Ceara wrote: > On 8/8/24 15:37, martin.kalcok@canonical.com wrote: > > Thank you for the confirmation. > > > > As for the context of the current "experimental" implementation. > > Would > > you say that it's OK if we then redirect all LRP IP's (as currently > > proposed) and properly document that this feature causes traffic to > > all > > LRP IPs to be redirected. Therefore it's incompatible with, for > > example, creating load balancer on LRP with port 179? > > > > It's actually incompatible with any load balancer to be created that > has > VIP=<any LRP IP>. Same for NATs. Sorry if I wasn't clear, but I was referring to an implementation that would redirect only traffic that matches: * ip.dst == LRP_IP && tcp.dst == 179 (Traffic from peer to our daemon) * ip.dst == LRP_IP && tcp.src == 179 (Replies to our daemons requests) These are static rules without any conntrack involvement. I have it deployed right now and I see NATed traffic from the hosts behind NAT working fine. (above I'm using port 179 as example, same is true for the BFD port) > > What if we combine it with a filter like Vladislav suggested? But > not > match on any IPs explicitly. Then the user could just configure > multiple IPs (e.g., IP1, IP2) on the LRP and decide which of them > should > be redirected by adding a filter on "ip.dest == IP2" (for example). > > That would allow you to: > a. redirect all traffic (no filter or filter == "1") > b. redirect traffic for select IPs (filter == "ip.dest == X") > c. do more complex things (responsibility of the user if things > break?) > > What do you think? To me personally, the option that lets user directly inject rules into OVN seems bit fragile and prone to errors/unexpected behavior. However that's not to say that I wouldn't implement it this way if maintainers are OK with this approach. I realize that we are very close to the freeze, but part of me still hopes that this feature will make it in :D. Properly implementing free- form rules via option is going to take a lot of time (implementation/input sanitation/testing/documentation). Perhaps we could open it up for the discussion in the further cycle development? I have an implementation based on your original feedback (i.e. with 'routing-protocol-redirect' and 'routing-protocols' options) that: * Allows redirecting BGP (179) and BFD (3784) * works for BGP in both directions * does not produce duplicate ARP replies * doesn't seem to break regular SNATed traffic. I plan to post it (hopefully today), after updating docs and tests, for consideration. Would that be OK, or are there any other issues that would block the first experimental implementation? Martin. > > > Martin. > > > > On Thu, 2024-08-08 at 15:29 +0200, Dumitru Ceara wrote: > > > On 8/8/24 14:42, Dumitru Ceara wrote: > > > > > Would it be useful to redirect only traffic for LRP's IPv6 > > > > > LLA? > > > > > @Vladislav, in your setups, do you use ipv4 or ipv6 LLAs for > > > > > setting up > > > > > BGP peering? > > > > > > > > > I'll let Vladislav comment on this. I do think IPv4 might be a > > > > requirement on the long term for us downstream though. > > > > > > An addendum here, I meant non-LLA IPv4 (globally routable but > > > maybe > > > also > > > private). > > > > > >
On 8/8/24 17:56, martin.kalcok@canonical.com wrote: > On Thu, 2024-08-08 at 16:56 +0200, Dumitru Ceara wrote: >> On 8/8/24 15:37, martin.kalcok@canonical.com wrote: >>> Thank you for the confirmation. >>> >>> As for the context of the current "experimental" implementation. >>> Would >>> you say that it's OK if we then redirect all LRP IP's (as currently >>> proposed) and properly document that this feature causes traffic to >>> all >>> LRP IPs to be redirected. Therefore it's incompatible with, for >>> example, creating load balancer on LRP with port 179? >>> >> >> It's actually incompatible with any load balancer to be created that >> has >> VIP=<any LRP IP>. Same for NATs. > > Sorry if I wasn't clear, but I was referring to an implementation that > would redirect only traffic that matches: > * ip.dst == LRP_IP && tcp.dst == 179 (Traffic from peer to our daemon) > * ip.dst == LRP_IP && tcp.src == 179 (Replies to our daemons requests) > > These are static rules without any conntrack involvement. I have it > deployed right now and I see NATed traffic from the hosts behind NAT > working fine. > (above I'm using port 179 as example, same is true for the BFD port) > >> >> What if we combine it with a filter like Vladislav suggested? But >> not >> match on any IPs explicitly. Then the user could just configure >> multiple IPs (e.g., IP1, IP2) on the LRP and decide which of them >> should >> be redirected by adding a filter on "ip.dest == IP2" (for example). >> >> That would allow you to: >> a. redirect all traffic (no filter or filter == "1") >> b. redirect traffic for select IPs (filter == "ip.dest == X") >> c. do more complex things (responsibility of the user if things >> break?) >> >> What do you think? > > To me personally, the option that lets user directly inject rules into > OVN seems bit fragile and prone to errors/unexpected behavior. However > that's not to say that I wouldn't implement it this way if maintainers > are OK with this approach. > I realize that we are very close to the freeze, but part of me still > hopes that this feature will make it in :D. Properly implementing free- > form rules via option is going to take a lot of time > (implementation/input sanitation/testing/documentation). Perhaps we > could open it up for the discussion in the further cycle development? > OK, you're right it's not something we can do now. > I have an implementation based on your original feedback (i.e. with > 'routing-protocol-redirect' and 'routing-protocols' options) that: > * Allows redirecting BGP (179) and BFD (3784) > * works for BGP in both directions > * does not produce duplicate ARP replies I still don't understand how the 'bgp-daemon' VIF resolves the mac address of its BGP peer. ARP replies are unicast and destination mac is the 'bgp-daemon' MAC address which happens to be the LRP MAC address too. But maybe I can understand it better from the system test.. > * doesn't seem to break regular SNATed traffic. > > I plan to post it (hopefully today), after updating docs and tests, for > consideration. Would that be OK, or are there any other issues that > would block the first experimental implementation? > If it works like this it's probably fine. Thanks for the hard work on this! Regards, Dumitru > Martin. > >> >>> Martin. >>> >>> On Thu, 2024-08-08 at 15:29 +0200, Dumitru Ceara wrote: >>>> On 8/8/24 14:42, Dumitru Ceara wrote: >>>>>> Would it be useful to redirect only traffic for LRP's IPv6 >>>>>> LLA? >>>>>> @Vladislav, in your setups, do you use ipv4 or ipv6 LLAs for >>>>>> setting up >>>>>> BGP peering? >>>>>> >>>>> I'll let Vladislav comment on this. I do think IPv4 might be a >>>>> requirement on the long term for us downstream though. >>>> >>>> An addendum here, I meant non-LLA IPv4 (globally routable but >>>> maybe >>>> also >>>> private). >>>> >>> >> >
On 8/8/24 18:13, Dumitru Ceara wrote: > On 8/8/24 17:56, martin.kalcok@canonical.com wrote: >> On Thu, 2024-08-08 at 16:56 +0200, Dumitru Ceara wrote: >>> On 8/8/24 15:37, martin.kalcok@canonical.com wrote: >>>> Thank you for the confirmation. >>>> >>>> As for the context of the current "experimental" implementation. >>>> Would >>>> you say that it's OK if we then redirect all LRP IP's (as currently >>>> proposed) and properly document that this feature causes traffic to >>>> all >>>> LRP IPs to be redirected. Therefore it's incompatible with, for >>>> example, creating load balancer on LRP with port 179? >>>> >>> >>> It's actually incompatible with any load balancer to be created that >>> has >>> VIP=<any LRP IP>. Same for NATs. >> >> Sorry if I wasn't clear, but I was referring to an implementation that >> would redirect only traffic that matches: >> * ip.dst == LRP_IP && tcp.dst == 179 (Traffic from peer to our daemon) >> * ip.dst == LRP_IP && tcp.src == 179 (Replies to our daemons requests) >> >> These are static rules without any conntrack involvement. I have it >> deployed right now and I see NATed traffic from the hosts behind NAT >> working fine. >> (above I'm using port 179 as example, same is true for the BFD port) >> >>> >>> What if we combine it with a filter like Vladislav suggested? But >>> not >>> match on any IPs explicitly. Then the user could just configure >>> multiple IPs (e.g., IP1, IP2) on the LRP and decide which of them >>> should >>> be redirected by adding a filter on "ip.dest == IP2" (for example). >>> >>> That would allow you to: >>> a. redirect all traffic (no filter or filter == "1") >>> b. redirect traffic for select IPs (filter == "ip.dest == X") >>> c. do more complex things (responsibility of the user if things >>> break?) >>> >>> What do you think? >> >> To me personally, the option that lets user directly inject rules into >> OVN seems bit fragile and prone to errors/unexpected behavior. However >> that's not to say that I wouldn't implement it this way if maintainers >> are OK with this approach. >> I realize that we are very close to the freeze, but part of me still >> hopes that this feature will make it in :D. Properly implementing free- >> form rules via option is going to take a lot of time >> (implementation/input sanitation/testing/documentation). Perhaps we >> could open it up for the discussion in the further cycle development? >> > > OK, you're right it's not something we can do now. > >> I have an implementation based on your original feedback (i.e. with >> 'routing-protocol-redirect' and 'routing-protocols' options) that: >> * Allows redirecting BGP (179) and BFD (3784) >> * works for BGP in both directions >> * does not produce duplicate ARP replies > > I still don't understand how the 'bgp-daemon' VIF resolves the mac > address of its BGP peer. ARP replies are unicast and destination mac is > the 'bgp-daemon' MAC address which happens to be the LRP MAC address too. > > But maybe I can understand it better from the system test.. > >> * doesn't seem to break regular SNATed traffic. >> >> I plan to post it (hopefully today), after updating docs and tests, for >> consideration. Would that be OK, or are there any other issues that >> would block the first experimental implementation? >> > > If it works like this it's probably fine. > Thanks for the hard work on this! > I forgot to mention, it would be great if you could already mark it "experimental" in the ovn-nb documentation. Thanks! > Regards, > Dumitru > >> Martin. >> >>> >>>> Martin. >>>> >>>> On Thu, 2024-08-08 at 15:29 +0200, Dumitru Ceara wrote: >>>>> On 8/8/24 14:42, Dumitru Ceara wrote: >>>>>>> Would it be useful to redirect only traffic for LRP's IPv6 >>>>>>> LLA? >>>>>>> @Vladislav, in your setups, do you use ipv4 or ipv6 LLAs for >>>>>>> setting up >>>>>>> BGP peering? >>>>>>> >>>>>> I'll let Vladislav comment on this. I do think IPv4 might be a >>>>>> requirement on the long term for us downstream though. >>>>> >>>>> An addendum here, I meant non-LLA IPv4 (globally routable but >>>>> maybe >>>>> also >>>>> private). >>>>> >>>> >>> >>
diff --git a/northd/northd.c b/northd/northd.c index 4353df07d..088104f25 100644 --- a/northd/northd.c +++ b/northd/northd.c @@ -13048,6 +13048,113 @@ build_arp_resolve_flows_for_lrp(struct ovn_port *op, } } +static void +build_bgp_redirect_rule__( + const char *s_addr, const char *bgp_port_name, bool is_ipv6, + struct ovn_port *ls_peer, struct lflow_table *lflows, + struct ds *match, struct ds *actions) +{ + int ip_ver = is_ipv6 ? 6 : 4; + /* Redirect packets in the input pipeline destined for LR's IP to + * the port specified in 'bgp-redirect' option. + */ + ds_clear(match); + ds_clear(actions); + ds_put_format(match, "ip%d.dst == %s && tcp.dst == 179", ip_ver, s_addr); + ds_put_format(actions, "outport = \"%s\"; output;", bgp_port_name); + ovn_lflow_add(lflows, ls_peer->od, S_SWITCH_IN_L2_LKUP, 100, + ds_cstr(match), + ds_cstr(actions), + ls_peer->lflow_ref); + + + /* Drop any traffic originating from 'bgp-redirect' port that does + * not originate from BGP daemon port. This blocks unnecessary + * traffic like ARP broadcasts or IPv6 router solicitation packets + * from the dummy 'bgp-redirect' port. + */ + ds_clear(match); + ds_put_format(match, "inport == \"%s\"", bgp_port_name); + ovn_lflow_add(lflows, ls_peer->od, S_SWITCH_IN_CHECK_PORT_SEC, 80, + ds_cstr(match), + REGBIT_PORT_SEC_DROP " = 1; next;", + ls_peer->lflow_ref); + + ds_put_format(match, + " && ip%d.src == %s && tcp.src == 179", + ip_ver, + s_addr); + ovn_lflow_add(lflows, ls_peer->od, S_SWITCH_IN_CHECK_PORT_SEC, 81, + ds_cstr(match), + REGBIT_PORT_SEC_DROP " = check_in_port_sec(); next;", + ls_peer->lflow_ref); +} + +static void +build_lrouter_bgp_redirect( + struct ovn_port *op, struct lflow_table *lflows, + struct ds *match, struct ds *actions) +{ + /* LRP has to have a peer.*/ + if (op->peer == NULL) { + return; + } + /* LRP has to have NB record.*/ + if (op->nbrp == NULL) { + return; + } + + /* Proceed only for LRPs that have 'bgp-redirect' option set. Value of this + * option is the name of LSP to which a BGP traffic will be redirected. + */ + const char *bgp_port = smap_get(&op->nbrp->options, "bgp-redirect"); + if (bgp_port == NULL) { + return; + } + + /* Ensure that LSP, to which the BGP traffic is redirected, exists.*/ + struct ovn_port *peer_lsp; + bool bgp_port_exists = false; + HMAP_FOR_EACH (peer_lsp, dp_node, &op->peer->od->ports) { + size_t peer_lsp_s = strlen(peer_lsp->key); + if (peer_lsp_s == strlen(bgp_port) + && !strncmp(peer_lsp->key, bgp_port, peer_lsp_s)){ + bgp_port_exists = true; + break; + } + } + + if (!bgp_port_exists) { + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5); + VLOG_WARN_RL(&rl, "Option 'bgp-redirect' set on Logical Router Port " + "'%s' refers to non-existent Logical Switch Port. " + "BGP redirecting won't be configured.", + op->key); + return; + } + + if (op->cr_port != NULL) { + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5); + VLOG_WARN_RL(&rl, "Option 'bgp-redirect' is not supported on" + " Distributed Gateway Port '%s'", op->key); + return; + } + + /* Mirror traffic destined for LRP's IPs and default BGP port + * to the port defined in 'bgp-redirect' option. + */ + for (size_t i = 0; i < op->lrp_networks.n_ipv4_addrs; i++) { + const char *ip_s = op->lrp_networks.ipv4_addrs[i].addr_s; + build_bgp_redirect_rule__(ip_s, bgp_port, false, op->peer, lflows, + match, actions); + } + for (size_t i = 0; i < op->lrp_networks.n_ipv6_addrs; i++) { + const char *ip_s = op->lrp_networks.ipv6_addrs[i].addr_s; + build_bgp_redirect_rule__(ip_s, bgp_port, true, op->peer, lflows, + match, actions); + } +} + /* This function adds ARP resolve flows related to a LSP. */ static void build_arp_resolve_flows_for_lsp( @@ -16003,6 +16110,7 @@ build_lswitch_and_lrouter_iterate_by_lrp(struct ovn_port *op, lsi->meter_groups, op->lflow_ref); build_lrouter_icmp_packet_toobig_admin_flows(op, lsi->lflows, &lsi->match, &lsi->actions, op->lflow_ref); + build_lrouter_bgp_redirect(op, lsi->lflows, &lsi->match, &lsi->actions); } static void * diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml index b06b09ac5..4cbdf8f74 100644 --- a/northd/ovn-northd.8.xml +++ b/northd/ovn-northd.8.xml @@ -284,6 +284,21 @@ dropped in the next stage. </li> + <li> + For each logical port that's defined as a target of BGP redirecting + (via <code>bgp-redirect</code> option set on Logical Router Port), a + filter is set in place that disallows any traffic entering this port + that does not originate from Logical Router Port's IPs and default BGP + port (TCP 179). This filtering is achieved by two rules. First rule has + priority 81, it matches on <code>inport == <var>BGP_MIRROR_PORT</var> + && ip.src == <var>LRP_IP</var> && tcp.src == 179</code> + and allows traffic further with <code>REGBIT_PORT_SEC_DROP" = + check_in_port_sec(); next;</code>. Second rule with priority 80 matches + the rest of the traffic from that port and sets + <code>REGBIT_PORT_SEC_DROP" = 1; next;"</code> so that the packets are + dropped in the next stage. + </li> + <li> For each (enabled) vtep logical port, a priority 70 flow is added which matches on all packets and applies the action @@ -1949,6 +1964,14 @@ output; on the logical switch. </li> + <li> + For any logical port that's defined as a target of BGP redirecting (via + <code>bgp-redirect</code> option set on Logical Router Port), a + priority-100 flow is added that redirects traffic for Logical Router + Port IPs (including IPv6 LLA) and TCP port 179, to the targeted + logical port. + </li> + <li> Priority-90 flows for transit switches that forward registered IP multicast traffic to their corresponding multicast group , which diff --git a/ovn-nb.xml b/ovn-nb.xml index 9552534f6..2d2f5c9e0 100644 --- a/ovn-nb.xml +++ b/ovn-nb.xml @@ -3440,6 +3440,20 @@ or </p> </column> + <column name="options" key="bgp-redirect" type='{"type": "string"}'> + <p> + This option expects a name of a Logical Switch Port that's present + in the peer's Logical Switch. If set, it causes any traffic + that's destined for Logical Router Port's IP addresses (including + its IPv6 LLA) and the default BGP port (TCP 179), to be redirected + to the specified Logical Switch Port. + + This allows external BGP daemon to be bound to a port in OVN's + Logical Switch and act as if it was listening on Logical Router + Port's IP address. + </p> + </column> + <column name="options" key="gateway_mtu_bypass"> <p> When configured, represents a match expression, in the same diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at index a389d1988..df1e924ab 100644 --- a/tests/ovn-northd.at +++ b/tests/ovn-northd.at @@ -12721,3 +12721,61 @@ AT_CHECK([ovn-sbctl dump-flows lr | grep lr_in_dnat | ovn_strip_lflows], [0], [d AT_CLEANUP ]) + +OVN_FOR_EACH_NORTHD_NO_HV([ +AT_SETUP([BGP control plane redirect]) +ovn_start + +check ovn-sbctl chassis-add hv1 geneve 127.0.0.1 + +check ovn-nbctl lr-add lr -- \ + lrp-add lr lr-ls 02:ac:10:01:00:01 172.16.1.1/24 +check ovn-nbctl --wait=sb set logical_router lr options:chassis=hv1 + +check ovn-nbctl ls-add ls -- \ + lsp-add ls ls-lr -- \ + lsp-set-type ls-lr router -- \ + lsp-set-addresses ls-lr router -- \ + lsp-set-options ls-lr router-port=lr-ls + +check ovn-nbctl lsp-add ls lsp-bgp -- \ + lsp-set-addresses lsp-bgp unknown + +# Function that ensures that no BGP redirect ruels are installed. +check_no_bgp_redirect() { + AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_l2_lkup | grep "tcp.dst == 179" | wc -l], [0], [0 +]) + + AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_check_port_sec | grep -E "priority=80|priority=81" | wc -l], [0], [0 +]) +} + +# By default, no rules related to BGP redirect are present +check_no_bgp_redirect + +# Set "lsp-bgp" port as target of BGP control plane redirected traffic +check ovn-nbctl --wait=sb set logical_router_port lr-ls options:bgp-redirect=lsp-bgp + +# Check that BGP control plane traffic is redirected to "lsp-bgp" +AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_l2_lkup | grep "tcp.dst == 179" | ovn_strip_lflows], [0], [dnl + table=??(ls_in_l2_lkup ), priority=100 , match=(ip4.dst == 172.16.1.1 && tcp.dst == 179), action=(outport = "lsp-bgp"; output;) + table=??(ls_in_l2_lkup ), priority=100 , match=(ip6.dst == fe80::ac:10ff:fe01:1 && tcp.dst == 179), action=(outport = "lsp-bgp"; output;) +]) + +# Check that only BGP-related traffic is accepted on "lsp-bgp" port +AT_CHECK([ovn-sbctl dump-flows ls | grep ls_in_check_port_sec | grep -E "priority=80|priority=81" | ovn_strip_lflows], [0], [dnl + table=??(ls_in_check_port_sec), priority=80 , match=(inport == "lsp-bgp"), action=(reg0[[15]] = 1; next;) + table=??(ls_in_check_port_sec), priority=81 , match=(inport == "lsp-bgp" && ip4.src == 172.16.1.1 && tcp.src == 179), action=(reg0[[15]] = check_in_port_sec(); next;) + table=??(ls_in_check_port_sec), priority=81 , match=(inport == "lsp-bgp" && ip6.src == fe80::ac:10ff:fe01:1 && tcp.src == 179), action=(reg0[[15]] = check_in_port_sec(); next;) +]) + +# Remove 'bgp-redirect' option from LRP and check that rules are removed +check ovn-nbctl --wait=sb remove logical_router_port lr-ls options bgp-redirect +check_no_bgp_redirect + +# Set non-existent LSP as target of 'bgp-redirect' and check that no rules are added +check ovn-nbctl --wait=sb set logical_router_port lr-ls options:bgp-redirect=lsp-foo +check_no_bgp_redirect + +AT_CLEANUP +]) diff --git a/tests/system-ovn.at b/tests/system-ovn.at index c24ede7c5..c09c1136e 100644 --- a/tests/system-ovn.at +++ b/tests/system-ovn.at @@ -13027,3 +13027,89 @@ OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d /connection dropped.*/d"]) AT_CLEANUP ]) + +OVN_FOR_EACH_NORTHD([ +AT_SETUP([BGP control plane redirect]) +ovn_start +OVS_TRAFFIC_VSWITCHD_START() + +ADD_BR([br-int]) +ADD_BR([br-ext]) + +ovs-ofctl add-flow br-ext action=normal +# Set external-ids in br-int needed for ovn-controller +ovs-vsctl \ + -- set Open_vSwitch . external-ids:system-id=hv1 \ + -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \ + -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \ + -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \ + -- set bridge br-int fail-mode=secure other-config:disable-in-band=true + +# Start ovn-controller +start_daemon ovn-controller + +ovn-nbctl lr-add R1 \ + -- set Logical_Router R1 options:chassis=hv1 + +ovn-nbctl ls-add public + +ovn-nbctl lrp-add R1 rp-public 00:00:02:01:02:03 172.16.1.1/24 + +ovn-nbctl lsp-add public public-rp -- set Logical_Switch_Port public-rp \ + type=router options:router-port=rp-public \ + -- lsp-set-addresses public-rp router + +ovn-nbctl lsp-add public bgp-daemon \ + -- lsp-set-addresses bgp-daemon unknown + +AT_CHECK([ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=phynet:br-ext]) +ovn-nbctl lsp-add public public1 \ + -- lsp-set-addresses public1 unknown \ + -- lsp-set-type public1 localnet \ + -- lsp-set-options public1 network_name=phynet + +ovn-nbctl --wait=hv sync + +# Set option that redirects BGP control plane traffic to a LSP "bgp-daemon" +ovn-nbctl --wait=sb set logical_router_port rp-public options:bgp-redirect=bgp-daemon + +# Create "bgp-daemon" interface in a namespace with IP and MAC matching LRP "rp-public" +ADD_NAMESPACES(bgp-daemon) +ADD_VETH(bgp-daemon, bgp-daemon, br-int, "172.16.1.1/24", "00:00:02:01:02:03") + +ADD_NAMESPACES(ext-foo) +ADD_VETH(ext-foo, ext-foo, br-ext, "172.16.1.100/24", "00:10:10:01:02:13", \ + "172.16.1.1") + +# Flip the interface down/up to get proper IPv6 LLA +NS_EXEC([bgp-daemon], [ip link set down bgp-daemon]) +NS_EXEC([bgp-daemon], [ip link set up bgp-daemon]) + +# Wait until IPv6 LLA loses the "tentative" flag otherwise it can't be bound to. +OVS_WAIT_UNTIL([NS_EXEC([bgp-daemon], [ip a show dev bgp-daemon | grep "fe80::" | grep -v tentative])]) +OVS_WAIT_UNTIL([NS_EXEC([ext-foo], [ip a show dev ext-foo | grep "fe80::" | grep -v tentative])]) + +# Verify that BGP control plane traffic is delivered to the "bgp-daemon" +# interface on both IPv4 and IPv6 LLA addresses +NETNS_DAEMONIZE([bgp-daemon], [nc -l -k 172.16.1.1 179], [bgp_v4.pid]) +NS_CHECK_EXEC([ext-foo], [echo "TCP test" | nc --send-only 172.16.1.1 179]) + +NETNS_DAEMONIZE([bgp-daemon], [nc -l -6 -k fe80::200:2ff:fe01:203%bgp-daemon 179], [bgp_v6.pid]) +NS_CHECK_EXEC([ext-foo], [echo "TCP test" | nc --send-only -6 fe80::200:2ff:fe01:203%ext-foo 179]) + +OVS_APP_EXIT_AND_WAIT([ovn-controller]) + +as ovn-sb +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) + +as ovn-nb +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) + +as northd +OVS_APP_EXIT_AND_WAIT([ovn-northd]) + +as +OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d +/.*terminating with signal 15.*/d"]) +AT_CLEANUP +])
This change adds a 'bgp-redirect' option to LRP that allows redirecting of BGP control plane traffic to an arbitrary LSP in its peer LS. The option expects a string with a LSP name. When set, any traffic entering LS that's destined for any of the LRP's IP addresses (including IPv6 LLA) is redirected to the LSP specified in the option's value. This enables external BGP daemons to listen on an interface bound to a LSP and effectively act as if they were listening on (and speaking from) LRP's IP address. Signed-off-by: Martin Kalcok <martin.kalcok@canonical.com> --- v4 of this patch renames the feature from "bgp-mirror" to "bgp-redirect" as discussed during community meeting. northd/northd.c | 108 ++++++++++++++++++++++++++++++++++++++++ northd/ovn-northd.8.xml | 23 +++++++++ ovn-nb.xml | 14 ++++++ tests/ovn-northd.at | 58 +++++++++++++++++++++ tests/system-ovn.at | 86 ++++++++++++++++++++++++++++++++ 5 files changed, 289 insertions(+)