Message ID | 20241127061406.3164762-1-hzhou@ovn.org |
---|---|
State | New |
Headers | show |
Series | [ovs-dev] northd: Use lower priority for all src routes. | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_Build_and_Test | success | github build: passed |
ovsrobot/github-robot-_ovn-kubernetes | success | github build: passed |
Hi Han, On 11/27/24 7:14 AM, Han Zhou wrote: > The current implementation prefers dst routes over src routes only if > they have the same prefix length. This is not very useful for real world > use cases, because it doesn't make much sense to compare the prefix > length between different fields of the IP header (dst IP v.s. src IP). > The prefix length should make sense only when comparing for the same > field, either dst or src. > > This patch changes the behavior by always prefering dst routes over src > routes, regardless of the prefix length, and comparing prefix length > only within the same type of routes. This is a significant change in behavior and will potentially break existing deployments. In my opinion this is also not really a bug fix. The old behavior was properly documented and the implementation worked as documented. I don't think we can just change preference like that. I'm afraid that if we want to support this we need a way to opt-in through configuration. > > E.g., there are two routes: > > 10.0.0.0/8 nexthop A dst-ip > 192.168.11.0/24 nexthop B src-ip > > For a packet 192.168.11.123 -> 10.1.2.3: > > Before: nexthop will be B > > After: nexthop will be A > We already support logical router policies that can achieve the same thing. The user could just add a policy like: ip.dst == 10.0.0.0/8 action: "reroute A" I understand policies might be tricky to get right but, on the other hand, changing existing behavior (as mentioned above) seems very risky to me. I'm not sure about other CMS but ovn-kubernetes uses a mix of src-ip/dst-ip routes. In our ovn-org/ovn CI we only run a subset of the ovn-kubernetes CI jobs so there's a chance we don't cover all their use cases. Would it be possible to do a full CI test run with ovn-kubernetes master branch and an OVN version that includes your patch. I guess that would require a small adjustment to the ovn-kubernetes code to use the Dockerfile.fedora.dev file [0][1] (pointing to your ovn branch) in CI and opening a PR against their repo. Regards, Dumitru [0] https://github.com/ovn-kubernetes/ovn-kubernetes/blob/master/dist/images/Dockerfile.fedora.dev [1] https://github.com/ovn-kubernetes/ovn-kubernetes/blob/9cf5a7707ca278b96fed631a8780fbf3ff7e32bb/.github/workflows/test.yml#L212 > Reported-by: Girish Moodalbail <gmoodalbail@gmail.com> > Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-May/050049.html > Signed-off-by: Han Zhou <hzhou@ovn.org> > --- > NEWS | 2 ++ > northd/northd.c | 12 ++++++---- > ovn-nb.xml | 6 ++--- > tests/ovn-northd.at | 53 +++++++++++++++++++++++---------------------- > 4 files changed, 40 insertions(+), 33 deletions(-) > > diff --git a/NEWS b/NEWS > index da3aba739ced..74ee23b6473b 100644 > --- a/NEWS > +++ b/NEWS > @@ -4,6 +4,8 @@ Post v24.09.0 > hash (with specified hash fields) for ECMP routes > while choosing nexthop. > - ovn-ic: Add support for route tag to prevent route learning. > + - Adjusted priorities of src-ip based static routes to be lower than other > + types of routes regardless of prefix length. > > OVN v24.09.0 - 13 Sep 2024 > -------------------------- > diff --git a/northd/northd.c b/northd/northd.c > index 2aa6c0958db4..039c20bda497 100644 > --- a/northd/northd.c > +++ b/northd/northd.c > @@ -303,9 +303,9 @@ BUILD_ASSERT_DECL(ACL_OBS_STAGE_MAX < (1 << 2)); > * same ip_prefix values: > * - connected route overrides static one; > * - static route overrides src-ip route. */ > -#define ROUTE_PRIO_OFFSET_MULTIPLIER 5 > -#define ROUTE_PRIO_OFFSET_STATIC 2 > -#define ROUTE_PRIO_OFFSET_CONNECTED 4 > +#define ROUTE_PRIO_OFFSET_MULTIPLIER 3 > +#define ROUTE_PRIO_OFFSET_STATIC 0 > +#define ROUTE_PRIO_OFFSET_CONNECTED 2 > > /* Returns the type of the datapath to which a flow with the given 'stage' may > * be added. */ > @@ -11443,6 +11443,7 @@ build_route_match(const struct ovn_port *op_inport, uint32_t rtb_id, > bool has_protocol_match) > { > const char *dir; > + int base = 0; > /* The priority here is calculated to implement longest-prefix-match > * routing. */ > if (is_src_route) { > @@ -11450,6 +11451,9 @@ build_route_match(const struct ovn_port *op_inport, uint32_t rtb_id, > ofs = 0; > } else { > dir = "dst"; > + /* dst routes have higher priority than all src routes regardless of > + * prefix length. */ > + base = (128 + 1) * ROUTE_PRIO_OFFSET_MULTIPLIER; > } > > if (op_inport) { > @@ -11462,7 +11466,7 @@ build_route_match(const struct ovn_port *op_inport, uint32_t rtb_id, > if (has_protocol_match) { > ofs += 1; > } > - *priority = (plen * ROUTE_PRIO_OFFSET_MULTIPLIER) + ofs; > + *priority = base + (plen * ROUTE_PRIO_OFFSET_MULTIPLIER) + ofs; > > ds_put_format(match, "ip%s.%s == %s/%d", is_ipv4 ? "4" : "6", dir, > network_s, plen); > diff --git a/ovn-nb.xml b/ovn-nb.xml > index 5114bbc2eb9b..ad774cd4ba72 100644 > --- a/ovn-nb.xml > +++ b/ovn-nb.xml > @@ -3782,9 +3782,9 @@ or > </p> > > <p> > - When multiple routes match a packet, the longest-prefix match is chosen. > - For a given prefix length, a <code>dst-ip</code> route is preferred over > - a <code>src-ip</code> route. > + When multiple routes match a packet, a <code>dst-ip</code> route is > + preferred over a <code>src-ip</code> route. Among the same type of > + routes, the longest-prefix match is chosen. > </p> > > <p> > diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at > index e3b7b0cb5f49..73905adccdd0 100644 > --- a/tests/ovn-northd.at > +++ b/tests/ovn-northd.at > @@ -6823,9 +6823,9 @@ AT_CHECK([grep -w "lr_in_ip_routing" lr0flows | ovn_strip_lflows], [0], [dnl > table=??(lr_in_ip_routing ), priority=0 , match=(1), action=(drop;) > table=??(lr_in_ip_routing ), priority=10300, match=(ct_mark.ecmp_reply_port == 1 && reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; eth.src = 00:00:20:20:12:13; reg1 = 192.168.0.1; outport = "lr0-public"; next;) > table=??(lr_in_ip_routing ), priority=10550, match=(nd_rs || nd_ra), action=(drop;) > - table=??(lr_in_ip_routing ), priority=124 , match=(ip4.dst == 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) > - table=??(lr_in_ip_routing ), priority=162 , match=(reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = 1; next;) > - table=??(lr_in_ip_routing ), priority=324 , match=(inport == "lr0-public" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:20ff:fe20:1213; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) > + table=??(lr_in_ip_routing ), priority=461 , match=(ip4.dst == 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) > + table=??(lr_in_ip_routing ), priority=483 , match=(reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = 1; next;) > + table=??(lr_in_ip_routing ), priority=581 , match=(inport == "lr0-public" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:20ff:fe20:1213; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) > ]) > > AT_CHECK([grep -e "lr_in_ip_routing_ecmp" lr0flows | ovn_strip_lflows], [0], [dnl > @@ -6841,9 +6841,9 @@ AT_CHECK([grep -w "lr_in_ip_routing" lr0flows | ovn_strip_lflows], [0], [dnl > table=??(lr_in_ip_routing ), priority=0 , match=(1), action=(drop;) > table=??(lr_in_ip_routing ), priority=10300, match=(ct_mark.ecmp_reply_port == 1 && reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; eth.src = 00:00:20:20:12:13; reg1 = 192.168.0.1; outport = "lr0-public"; next;) > table=??(lr_in_ip_routing ), priority=10550, match=(nd_rs || nd_ra), action=(drop;) > - table=??(lr_in_ip_routing ), priority=124 , match=(ip4.dst == 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) > - table=??(lr_in_ip_routing ), priority=162 , match=(reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(1, 2);) > - table=??(lr_in_ip_routing ), priority=324 , match=(inport == "lr0-public" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:20ff:fe20:1213; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) > + table=??(lr_in_ip_routing ), priority=461 , match=(ip4.dst == 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) > + table=??(lr_in_ip_routing ), priority=483 , match=(reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(1, 2);) > + table=??(lr_in_ip_routing ), priority=581 , match=(inport == "lr0-public" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:20ff:fe20:1213; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) > ]) > AT_CHECK([grep -e "lr_in_ip_routing_ecmp" lr0flows | sed 's/192\.168\.0\..0/192.168.0.??/' | ovn_strip_lflows], [0], [dnl > table=??(lr_in_ip_routing_ecmp), priority=0 , match=(1), action=(drop;) > @@ -6870,9 +6870,9 @@ AT_CHECK([grep -w "lr_in_ip_routing" lr0flows | ovn_strip_lflows], [0], [dnl > table=??(lr_in_ip_routing ), priority=0 , match=(1), action=(drop;) > table=??(lr_in_ip_routing ), priority=10300, match=(ct_mark.ecmp_reply_port == 1 && reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; eth.src = 00:00:20:20:12:13; reg1 = 192.168.0.1; outport = "lr0-public"; next;) > table=??(lr_in_ip_routing ), priority=10550, match=(nd_rs || nd_ra), action=(drop;) > - table=??(lr_in_ip_routing ), priority=124 , match=(ip4.dst == 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) > - table=??(lr_in_ip_routing ), priority=162 , match=(reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(1, 2);) > - table=??(lr_in_ip_routing ), priority=324 , match=(inport == "lr0-public" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:20ff:fe20:1213; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) > + table=??(lr_in_ip_routing ), priority=461 , match=(ip4.dst == 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) > + table=??(lr_in_ip_routing ), priority=483 , match=(reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(1, 2);) > + table=??(lr_in_ip_routing ), priority=581 , match=(inport == "lr0-public" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:20ff:fe20:1213; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) > ]) > AT_CHECK([grep -e "lr_in_ip_routing_ecmp" lr0flows | sed 's/192\.168\.0\..0/192.168.0.??/' | ovn_strip_lflows], [0], [dnl > table=??(lr_in_ip_routing_ecmp), priority=0 , match=(1), action=(drop;) > @@ -6888,14 +6888,14 @@ check ovn-nbctl --wait=sb lr-route-add lr0 1.0.0.0/24 192.168.0.10 > ovn-sbctl dump-flows lr0 > lr0flows > > AT_CHECK([grep -e "lr_in_ip_routing.*192.168.0.10" lr0flows | ovn_strip_lflows], [0], [dnl > - table=??(lr_in_ip_routing ), priority=122 , match=(reg7 == 0 && ip4.dst == 1.0.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = 192.168.0.10; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) > + table=??(lr_in_ip_routing ), priority=459 , match=(reg7 == 0 && ip4.dst == 1.0.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = 192.168.0.10; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) > ]) > > check ovn-nbctl --wait=sb lr-route-add lr0 2.0.0.0/24 lr0-public > > ovn-sbctl dump-flows lr0 > lr0flows > AT_CHECK([grep -e "lr_in_ip_routing.*2.0.0.0" lr0flows | ovn_strip_lflows], [0], [dnl > - table=??(lr_in_ip_routing ), priority=122 , match=(reg7 == 0 && ip4.dst == 2.0.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) > + table=??(lr_in_ip_routing ), priority=459 , match=(reg7 == 0 && ip4.dst == 2.0.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) > ]) > > check ovn-nbctl lr-route-add lr0 3.3.0.0/16 192.168.0.11 > @@ -6910,7 +6910,7 @@ check ovn-nbctl set logical_router_static_route $route2_uuid selection_fields="i > check ovn-nbctl --wait=sb sync > ovn-sbctl dump-flows lr0 > lr0flows > AT_CHECK([grep -e "(lr_in_ip_routing ).*3.3.0.0" lr0flows | sed 's/table=../table=??/' | sort], [0], [dnl > - table=??(lr_in_ip_routing ), priority=82 , match=(reg7 == 0 && ip4.dst == 3.3.0.0/16), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(values=(1, 2); hash_fields="ip_dst,ip_proto,ip_src");) > + table=??(lr_in_ip_routing ), priority=435 , match=(reg7 == 0 && ip4.dst == 3.3.0.0/16), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(values=(1, 2); hash_fields="ip_dst,ip_proto,ip_src");) > ]) > > check ovn-nbctl set logical_router_static_route $route1_uuid selection_fields="ip_src,ip_dst,tp_src,tp_dst" > @@ -6919,10 +6919,10 @@ check ovn-nbctl set logical_router_static_route $route2_uuid selection_fields="i > check ovn-nbctl --wait=sb sync > ovn-sbctl dump-flows lr0 > lr0flows > AT_CHECK([grep -e "(lr_in_ip_routing ).*3.3.0.0" lr0flows | sed 's/table=../table=??/' | sort], [0], [dnl > - table=??(lr_in_ip_routing ), priority=82 , match=(reg7 == 0 && ip4.dst == 3.3.0.0/16), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(values=(1, 2); hash_fields="ip_dst,ip_proto,ip_src");) > - table=??(lr_in_ip_routing ), priority=83 , match=(reg7 == 0 && ip4.dst == 3.3.0.0/16 && sctp), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(values=(1, 2); hash_fields="ip_dst,ip_proto,ip_src,sctp_dst,sctp_src");) > - table=??(lr_in_ip_routing ), priority=83 , match=(reg7 == 0 && ip4.dst == 3.3.0.0/16 && tcp), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(values=(1, 2); hash_fields="ip_dst,ip_proto,ip_src,tcp_dst,tcp_src");) > - table=??(lr_in_ip_routing ), priority=83 , match=(reg7 == 0 && ip4.dst == 3.3.0.0/16 && udp), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(values=(1, 2); hash_fields="ip_dst,ip_proto,ip_src,udp_dst,udp_src");) > + table=??(lr_in_ip_routing ), priority=435 , match=(reg7 == 0 && ip4.dst == 3.3.0.0/16), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(values=(1, 2); hash_fields="ip_dst,ip_proto,ip_src");) > + table=??(lr_in_ip_routing ), priority=436 , match=(reg7 == 0 && ip4.dst == 3.3.0.0/16 && sctp), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(values=(1, 2); hash_fields="ip_dst,ip_proto,ip_src,sctp_dst,sctp_src");) > + table=??(lr_in_ip_routing ), priority=436 , match=(reg7 == 0 && ip4.dst == 3.3.0.0/16 && tcp), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(values=(1, 2); hash_fields="ip_dst,ip_proto,ip_src,tcp_dst,tcp_src");) > + table=??(lr_in_ip_routing ), priority=436 , match=(reg7 == 0 && ip4.dst == 3.3.0.0/16 && udp), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(values=(1, 2); hash_fields="ip_dst,ip_proto,ip_src,udp_dst,udp_src");) > ]) > > AT_CLEANUP > @@ -7346,18 +7346,19 @@ AT_CHECK([grep "lr_in_ip_routing_pre" lr0flows | ovn_strip_lflows], [0], [dnl > grep -e "(lr_in_ip_routing ).*outport" lr0flows > > AT_CHECK([grep -e "(lr_in_ip_routing ).*outport" lr0flows | ovn_strip_lflows], [0], [dnl > - table=??(lr_in_ip_routing ), priority=122 , match=(reg7 == 1 && ip4.dst == 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = 192.168.1.10; reg1 = 192.168.1.1; eth.src = 00:00:00:00:01:01; outport = "lrp1"; flags.loopback = 1; next;) > - table=??(lr_in_ip_routing ), priority=124 , match=(ip4.dst == 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.0.1; eth.src = 00:00:00:00:00:01; outport = "lrp0"; flags.loopback = 1; next;) > - table=??(lr_in_ip_routing ), priority=124 , match=(ip4.dst == 192.168.1.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.1.1; eth.src = 00:00:00:00:01:01; outport = "lrp1"; flags.loopback = 1; next;) > - table=??(lr_in_ip_routing ), priority=124 , match=(ip4.dst == 192.168.2.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.2.1; eth.src = 00:00:00:00:02:01; outport = "lrp2"; flags.loopback = 1; next;) > - table=??(lr_in_ip_routing ), priority=162 , match=(reg7 == 2 && ip4.dst == 1.1.1.1/32), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = 192.168.0.20; reg1 = 192.168.0.1; eth.src = 00:00:00:00:00:01; outport = "lrp0"; flags.loopback = 1; next;) > - table=??(lr_in_ip_routing ), priority=2 , match=(reg7 == 0 && ip4.dst == 0.0.0.0/0), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = 192.168.0.10; reg1 = 192.168.0.1; eth.src = 00:00:00:00:00:01; outport = "lrp0"; flags.loopback = 1; next;) > - table=??(lr_in_ip_routing ), priority=2 , match=(reg7 == 2 && ip4.dst == 0.0.0.0/0), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = 192.168.0.10; reg1 = 192.168.0.1; eth.src = 00:00:00:00:00:01; outport = "lrp0"; flags.loopback = 1; next;) > - table=??(lr_in_ip_routing ), priority=324 , match=(inport == "lrp0" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:ff:fe00:1; eth.src = 00:00:00:00:00:01; outport = "lrp0"; flags.loopback = 1; next;) > - table=??(lr_in_ip_routing ), priority=324 , match=(inport == "lrp1" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:ff:fe00:101; eth.src = 00:00:00:00:01:01; outport = "lrp1"; flags.loopback = 1; next;) > - table=??(lr_in_ip_routing ), priority=324 , match=(inport == "lrp2" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:ff:fe00:201; eth.src = 00:00:00:00:02:01; outport = "lrp2"; flags.loopback = 1; next;) > + table=??(lr_in_ip_routing ), priority=387 , match=(reg7 == 0 && ip4.dst == 0.0.0.0/0), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = 192.168.0.10; reg1 = 192.168.0.1; eth.src = 00:00:00:00:00:01; outport = "lrp0"; flags.loopback = 1; next;) > + table=??(lr_in_ip_routing ), priority=387 , match=(reg7 == 2 && ip4.dst == 0.0.0.0/0), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = 192.168.0.10; reg1 = 192.168.0.1; eth.src = 00:00:00:00:00:01; outport = "lrp0"; flags.loopback = 1; next;) > + table=??(lr_in_ip_routing ), priority=459 , match=(reg7 == 1 && ip4.dst == 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = 192.168.1.10; reg1 = 192.168.1.1; eth.src = 00:00:00:00:01:01; outport = "lrp1"; flags.loopback = 1; next;) > + table=??(lr_in_ip_routing ), priority=461 , match=(ip4.dst == 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.0.1; eth.src = 00:00:00:00:00:01; outport = "lrp0"; flags.loopback = 1; next;) > + table=??(lr_in_ip_routing ), priority=461 , match=(ip4.dst == 192.168.1.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.1.1; eth.src = 00:00:00:00:01:01; outport = "lrp1"; flags.loopback = 1; next;) > + table=??(lr_in_ip_routing ), priority=461 , match=(ip4.dst == 192.168.2.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.2.1; eth.src = 00:00:00:00:02:01; outport = "lrp2"; flags.loopback = 1; next;) > + table=??(lr_in_ip_routing ), priority=483 , match=(reg7 == 2 && ip4.dst == 1.1.1.1/32), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = 192.168.0.20; reg1 = 192.168.0.1; eth.src = 00:00:00:00:00:01; outport = "lrp0"; flags.loopback = 1; next;) > + table=??(lr_in_ip_routing ), priority=581 , match=(inport == "lrp0" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:ff:fe00:1; eth.src = 00:00:00:00:00:01; outport = "lrp0"; flags.loopback = 1; next;) > + table=??(lr_in_ip_routing ), priority=581 , match=(inport == "lrp1" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:ff:fe00:101; eth.src = 00:00:00:00:01:01; outport = "lrp1"; flags.loopback = 1; next;) > + table=??(lr_in_ip_routing ), priority=581 , match=(inport == "lrp2" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:ff:fe00:201; eth.src = 00:00:00:00:02:01; outport = "lrp2"; flags.loopback = 1; next;) > ]) > > + > AT_CLEANUP > ]) >
On Wed, Nov 27, 2024 at 3:27 AM Dumitru Ceara <dceara@redhat.com> wrote: > > Hi Han, > > On 11/27/24 7:14 AM, Han Zhou wrote: > > The current implementation prefers dst routes over src routes only if > > they have the same prefix length. This is not very useful for real world > > use cases, because it doesn't make much sense to compare the prefix > > length between different fields of the IP header (dst IP v.s. src IP). > > The prefix length should make sense only when comparing for the same > > field, either dst or src. > > > > This patch changes the behavior by always prefering dst routes over src > > routes, regardless of the prefix length, and comparing prefix length > > only within the same type of routes. > > This is a significant change in behavior and will potentially break > existing deployments. In my opinion this is also not really a bug fix. > The old behavior was properly documented and the implementation worked > as documented. > > I don't think we can just change preference like that. I'm afraid that > if we want to support this we need a way to opt-in through configuration. > Thanks Dumitru for the review and feedback. I do agree it is a significant behavior change. However, I was in doubt it would break any existing deployments which was why I didn't add any configuration to control the behavior, just to avoid unnecessary knobs. It may not be a strictly bug fix, but the original behavior to me is more like a bug in the design. I can't imagine any real world use case that would rely on the old behavior but be broken by the new behavior, and also never heard of any physical routers implemented this way. > > > > E.g., there are two routes: > > > > 10.0.0.0/8 nexthop A dst-ip > > 192.168.11.0/24 nexthop B src-ip > > > > For a packet 192.168.11.123 -> 10.1.2.3: > > > > Before: nexthop will be B > > > > After: nexthop will be A > > > > We already support logical router policies that can achieve the same > thing. The user could just add a policy like: > > ip.dst == 10.0.0.0/8 action: "reroute A" > Yes, I understand that we have policy route that support the desired behavior, which was also discussed in the "reported-at", for the ovn-k8s, but it ended up with still the src-ip static routes as the first stage and then some more complex policy route for more complex routing decisions. Now we are facing the same problem again - a need for variant prefix length subnets for different ovn-k8s nodes. There are multiple alternatives to solve the problem but it seems to be the most simple and straightforward to "fix" the src-ip route behavior. > I understand policies might be tricky to get right but, on the other > hand, changing existing behavior (as mentioned above) seems very risky > to me. > > I'm not sure about other CMS but ovn-kubernetes uses a mix of > src-ip/dst-ip routes. In our ovn-org/ovn CI we only run a subset of the > ovn-kubernetes CI jobs so there's a chance we don't cover all their use > cases. Would it be possible to do a full CI test run with > ovn-kubernetes master branch and an OVN version that includes your patch. > In fact we already ran full ovn-k8s test cases in our downstream CI for this same change, which worked pretty well and nothing is broken, and of course the variant length subnet problem was fixed without any changes required in ovn-k8s. So, it would be great if you could give another thought about this, even better if you have access to broader customer use cases regarding src-ip routes, and see if there is a real risk of breaking existing use cases. If you still believe it is a high risk, we can definitely add a knob and keep the default behavior unchanged. Best, Han > I guess that would require a small adjustment to the ovn-kubernetes code > to use the Dockerfile.fedora.dev file [0][1] (pointing to your ovn > branch) in CI and opening a PR against their repo. > > Regards, > Dumitru > > [0] > https://github.com/ovn-kubernetes/ovn-kubernetes/blob/master/dist/images/Dockerfile.fedora.dev > [1] > https://github.com/ovn-kubernetes/ovn-kubernetes/blob/9cf5a7707ca278b96fed631a8780fbf3ff7e32bb/.github/workflows/test.yml#L212 > > > Reported-by: Girish Moodalbail <gmoodalbail@gmail.com> > > Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-May/050049.html > > Signed-off-by: Han Zhou <hzhou@ovn.org> > > --- > > NEWS | 2 ++ > > northd/northd.c | 12 ++++++---- > > ovn-nb.xml | 6 ++--- > > tests/ovn-northd.at | 53 +++++++++++++++++++++++---------------------- > > 4 files changed, 40 insertions(+), 33 deletions(-) > > > > diff --git a/NEWS b/NEWS > > index da3aba739ced..74ee23b6473b 100644 > > --- a/NEWS > > +++ b/NEWS > > @@ -4,6 +4,8 @@ Post v24.09.0 > > hash (with specified hash fields) for ECMP routes > > while choosing nexthop. > > - ovn-ic: Add support for route tag to prevent route learning. > > + - Adjusted priorities of src-ip based static routes to be lower than other > > + types of routes regardless of prefix length. > > > > OVN v24.09.0 - 13 Sep 2024 > > -------------------------- > > diff --git a/northd/northd.c b/northd/northd.c > > index 2aa6c0958db4..039c20bda497 100644 > > --- a/northd/northd.c > > +++ b/northd/northd.c > > @@ -303,9 +303,9 @@ BUILD_ASSERT_DECL(ACL_OBS_STAGE_MAX < (1 << 2)); > > * same ip_prefix values: > > * - connected route overrides static one; > > * - static route overrides src-ip route. */ > > -#define ROUTE_PRIO_OFFSET_MULTIPLIER 5 > > -#define ROUTE_PRIO_OFFSET_STATIC 2 > > -#define ROUTE_PRIO_OFFSET_CONNECTED 4 > > +#define ROUTE_PRIO_OFFSET_MULTIPLIER 3 > > +#define ROUTE_PRIO_OFFSET_STATIC 0 > > +#define ROUTE_PRIO_OFFSET_CONNECTED 2 > > > > /* Returns the type of the datapath to which a flow with the given 'stage' may > > * be added. */ > > @@ -11443,6 +11443,7 @@ build_route_match(const struct ovn_port *op_inport, uint32_t rtb_id, > > bool has_protocol_match) > > { > > const char *dir; > > + int base = 0; > > /* The priority here is calculated to implement longest-prefix-match > > * routing. */ > > if (is_src_route) { > > @@ -11450,6 +11451,9 @@ build_route_match(const struct ovn_port *op_inport, uint32_t rtb_id, > > ofs = 0; > > } else { > > dir = "dst"; > > + /* dst routes have higher priority than all src routes regardless of > > + * prefix length. */ > > + base = (128 + 1) * ROUTE_PRIO_OFFSET_MULTIPLIER; > > } > > > > if (op_inport) { > > @@ -11462,7 +11466,7 @@ build_route_match(const struct ovn_port *op_inport, uint32_t rtb_id, > > if (has_protocol_match) { > > ofs += 1; > > } > > - *priority = (plen * ROUTE_PRIO_OFFSET_MULTIPLIER) + ofs; > > + *priority = base + (plen * ROUTE_PRIO_OFFSET_MULTIPLIER) + ofs; > > > > ds_put_format(match, "ip%s.%s == %s/%d", is_ipv4 ? "4" : "6", dir, > > network_s, plen); > > diff --git a/ovn-nb.xml b/ovn-nb.xml > > index 5114bbc2eb9b..ad774cd4ba72 100644 > > --- a/ovn-nb.xml > > +++ b/ovn-nb.xml > > @@ -3782,9 +3782,9 @@ or > > </p> > > > > <p> > > - When multiple routes match a packet, the longest-prefix match is chosen. > > - For a given prefix length, a <code>dst-ip</code> route is preferred over > > - a <code>src-ip</code> route. > > + When multiple routes match a packet, a <code>dst-ip</code> route is > > + preferred over a <code>src-ip</code> route. Among the same type of > > + routes, the longest-prefix match is chosen. > > </p> > > > > <p> > > diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at > > index e3b7b0cb5f49..73905adccdd0 100644 > > --- a/tests/ovn-northd.at > > +++ b/tests/ovn-northd.at > > @@ -6823,9 +6823,9 @@ AT_CHECK([grep -w "lr_in_ip_routing" lr0flows | ovn_strip_lflows], [0], [dnl > > table=??(lr_in_ip_routing ), priority=0 , match=(1), action=(drop;) > > table=??(lr_in_ip_routing ), priority=10300, match=(ct_mark.ecmp_reply_port == 1 && reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; eth.src = 00:00:20:20:12:13; reg1 = 192.168.0.1; outport = "lr0-public"; next;) > > table=??(lr_in_ip_routing ), priority=10550, match=(nd_rs || nd_ra), action=(drop;) > > - table=??(lr_in_ip_routing ), priority=124 , match=(ip4.dst == 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) > > - table=??(lr_in_ip_routing ), priority=162 , match=(reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = 1; next;) > > - table=??(lr_in_ip_routing ), priority=324 , match=(inport == "lr0-public" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:20ff:fe20:1213; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) > > + table=??(lr_in_ip_routing ), priority=461 , match=(ip4.dst == 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) > > + table=??(lr_in_ip_routing ), priority=483 , match=(reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = 1; next;) > > + table=??(lr_in_ip_routing ), priority=581 , match=(inport == "lr0-public" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:20ff:fe20:1213; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) > > ]) > > > > AT_CHECK([grep -e "lr_in_ip_routing_ecmp" lr0flows | ovn_strip_lflows], [0], [dnl > > @@ -6841,9 +6841,9 @@ AT_CHECK([grep -w "lr_in_ip_routing" lr0flows | ovn_strip_lflows], [0], [dnl > > table=??(lr_in_ip_routing ), priority=0 , match=(1), action=(drop;) > > table=??(lr_in_ip_routing ), priority=10300, match=(ct_mark.ecmp_reply_port == 1 && reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; eth.src = 00:00:20:20:12:13; reg1 = 192.168.0.1; outport = "lr0-public"; next;) > > table=??(lr_in_ip_routing ), priority=10550, match=(nd_rs || nd_ra), action=(drop;) > > - table=??(lr_in_ip_routing ), priority=124 , match=(ip4.dst == 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) > > - table=??(lr_in_ip_routing ), priority=162 , match=(reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(1, 2);) > > - table=??(lr_in_ip_routing ), priority=324 , match=(inport == "lr0-public" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:20ff:fe20:1213; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) > > + table=??(lr_in_ip_routing ), priority=461 , match=(ip4.dst == 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) > > + table=??(lr_in_ip_routing ), priority=483 , match=(reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(1, 2);) > > + table=??(lr_in_ip_routing ), priority=581 , match=(inport == "lr0-public" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:20ff:fe20:1213; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) > > ]) > > AT_CHECK([grep -e "lr_in_ip_routing_ecmp" lr0flows | sed 's/192\.168\.0\..0/192.168.0.??/' | ovn_strip_lflows], [0], [dnl > > table=??(lr_in_ip_routing_ecmp), priority=0 , match=(1), action=(drop;) > > @@ -6870,9 +6870,9 @@ AT_CHECK([grep -w "lr_in_ip_routing" lr0flows | ovn_strip_lflows], [0], [dnl > > table=??(lr_in_ip_routing ), priority=0 , match=(1), action=(drop;) > > table=??(lr_in_ip_routing ), priority=10300, match=(ct_mark.ecmp_reply_port == 1 && reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; eth.src = 00:00:20:20:12:13; reg1 = 192.168.0.1; outport = "lr0-public"; next;) > > table=??(lr_in_ip_routing ), priority=10550, match=(nd_rs || nd_ra), action=(drop;) > > - table=??(lr_in_ip_routing ), priority=124 , match=(ip4.dst == 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) > > - table=??(lr_in_ip_routing ), priority=162 , match=(reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(1, 2);) > > - table=??(lr_in_ip_routing ), priority=324 , match=(inport == "lr0-public" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:20ff:fe20:1213; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) > > + table=??(lr_in_ip_routing ), priority=461 , match=(ip4.dst == 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) > > + table=??(lr_in_ip_routing ), priority=483 , match=(reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(1, 2);) > > + table=??(lr_in_ip_routing ), priority=581 , match=(inport == "lr0-public" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:20ff:fe20:1213; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) > > ]) > > AT_CHECK([grep -e "lr_in_ip_routing_ecmp" lr0flows | sed 's/192\.168\.0\..0/192.168.0.??/' | ovn_strip_lflows], [0], [dnl > > table=??(lr_in_ip_routing_ecmp), priority=0 , match=(1), action=(drop;) > > @@ -6888,14 +6888,14 @@ check ovn-nbctl --wait=sb lr-route-add lr0 1.0.0.0/24 192.168.0.10 > > ovn-sbctl dump-flows lr0 > lr0flows > > > > AT_CHECK([grep -e "lr_in_ip_routing.*192.168.0.10" lr0flows | ovn_strip_lflows], [0], [dnl > > - table=??(lr_in_ip_routing ), priority=122 , match=(reg7 == 0 && ip4.dst == 1.0.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = 192.168.0.10; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) > > + table=??(lr_in_ip_routing ), priority=459 , match=(reg7 == 0 && ip4.dst == 1.0.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = 192.168.0.10; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) > > ]) > > > > check ovn-nbctl --wait=sb lr-route-add lr0 2.0.0.0/24 lr0-public > > > > ovn-sbctl dump-flows lr0 > lr0flows > > AT_CHECK([grep -e "lr_in_ip_routing.*2.0.0.0" lr0flows | ovn_strip_lflows], [0], [dnl > > - table=??(lr_in_ip_routing ), priority=122 , match=(reg7 == 0 && ip4.dst == 2.0.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) > > + table=??(lr_in_ip_routing ), priority=459 , match=(reg7 == 0 && ip4.dst == 2.0.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) > > ]) > > > > check ovn-nbctl lr-route-add lr0 3.3.0.0/16 192.168.0.11 > > @@ -6910,7 +6910,7 @@ check ovn-nbctl set logical_router_static_route $route2_uuid selection_fields="i > > check ovn-nbctl --wait=sb sync > > ovn-sbctl dump-flows lr0 > lr0flows > > AT_CHECK([grep -e "(lr_in_ip_routing ).*3.3.0.0" lr0flows | sed 's/table=../table=??/' | sort], [0], [dnl > > - table=??(lr_in_ip_routing ), priority=82 , match=(reg7 == 0 && ip4.dst == 3.3.0.0/16), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(values=(1, 2); hash_fields="ip_dst,ip_proto,ip_src");) > > + table=??(lr_in_ip_routing ), priority=435 , match=(reg7 == 0 && ip4.dst == 3.3.0.0/16), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(values=(1, 2); hash_fields="ip_dst,ip_proto,ip_src");) > > ]) > > > > check ovn-nbctl set logical_router_static_route $route1_uuid selection_fields="ip_src,ip_dst,tp_src,tp_dst" > > @@ -6919,10 +6919,10 @@ check ovn-nbctl set logical_router_static_route $route2_uuid selection_fields="i > > check ovn-nbctl --wait=sb sync > > ovn-sbctl dump-flows lr0 > lr0flows > > AT_CHECK([grep -e "(lr_in_ip_routing ).*3.3.0.0" lr0flows | sed 's/table=../table=??/' | sort], [0], [dnl > > - table=??(lr_in_ip_routing ), priority=82 , match=(reg7 == 0 && ip4.dst == 3.3.0.0/16), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(values=(1, 2); hash_fields="ip_dst,ip_proto,ip_src");) > > - table=??(lr_in_ip_routing ), priority=83 , match=(reg7 == 0 && ip4.dst == 3.3.0.0/16 && sctp), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(values=(1, 2); hash_fields="ip_dst,ip_proto,ip_src,sctp_dst,sctp_src");) > > - table=??(lr_in_ip_routing ), priority=83 , match=(reg7 == 0 && ip4.dst == 3.3.0.0/16 && tcp), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(values=(1, 2); hash_fields="ip_dst,ip_proto,ip_src,tcp_dst,tcp_src");) > > - table=??(lr_in_ip_routing ), priority=83 , match=(reg7 == 0 && ip4.dst == 3.3.0.0/16 && udp), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(values=(1, 2); hash_fields="ip_dst,ip_proto,ip_src,udp_dst,udp_src");) > > + table=??(lr_in_ip_routing ), priority=435 , match=(reg7 == 0 && ip4.dst == 3.3.0.0/16), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(values=(1, 2); hash_fields="ip_dst,ip_proto,ip_src");) > > + table=??(lr_in_ip_routing ), priority=436 , match=(reg7 == 0 && ip4.dst == 3.3.0.0/16 && sctp), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(values=(1, 2); hash_fields="ip_dst,ip_proto,ip_src,sctp_dst,sctp_src");) > > + table=??(lr_in_ip_routing ), priority=436 , match=(reg7 == 0 && ip4.dst == 3.3.0.0/16 && tcp), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(values=(1, 2); hash_fields="ip_dst,ip_proto,ip_src,tcp_dst,tcp_src");) > > + table=??(lr_in_ip_routing ), priority=436 , match=(reg7 == 0 && ip4.dst == 3.3.0.0/16 && udp), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(values=(1, 2); hash_fields="ip_dst,ip_proto,ip_src,udp_dst,udp_src");) > > ]) > > > > AT_CLEANUP > > @@ -7346,18 +7346,19 @@ AT_CHECK([grep "lr_in_ip_routing_pre" lr0flows | ovn_strip_lflows], [0], [dnl > > grep -e "(lr_in_ip_routing ).*outport" lr0flows > > > > AT_CHECK([grep -e "(lr_in_ip_routing ).*outport" lr0flows | ovn_strip_lflows], [0], [dnl > > - table=??(lr_in_ip_routing ), priority=122 , match=(reg7 == 1 && ip4.dst == 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = 192.168.1.10; reg1 = 192.168.1.1; eth.src = 00:00:00:00:01:01; outport = "lrp1"; flags.loopback = 1; next;) > > - table=??(lr_in_ip_routing ), priority=124 , match=(ip4.dst == 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.0.1; eth.src = 00:00:00:00:00:01; outport = "lrp0"; flags.loopback = 1; next;) > > - table=??(lr_in_ip_routing ), priority=124 , match=(ip4.dst == 192.168.1.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.1.1; eth.src = 00:00:00:00:01:01; outport = "lrp1"; flags.loopback = 1; next;) > > - table=??(lr_in_ip_routing ), priority=124 , match=(ip4.dst == 192.168.2.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.2.1; eth.src = 00:00:00:00:02:01; outport = "lrp2"; flags.loopback = 1; next;) > > - table=??(lr_in_ip_routing ), priority=162 , match=(reg7 == 2 && ip4.dst == 1.1.1.1/32), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = 192.168.0.20; reg1 = 192.168.0.1; eth.src = 00:00:00:00:00:01; outport = "lrp0"; flags.loopback = 1; next;) > > - table=??(lr_in_ip_routing ), priority=2 , match=(reg7 == 0 && ip4.dst == 0.0.0.0/0), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = 192.168.0.10; reg1 = 192.168.0.1; eth.src = 00:00:00:00:00:01; outport = "lrp0"; flags.loopback = 1; next;) > > - table=??(lr_in_ip_routing ), priority=2 , match=(reg7 == 2 && ip4.dst == 0.0.0.0/0), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = 192.168.0.10; reg1 = 192.168.0.1; eth.src = 00:00:00:00:00:01; outport = "lrp0"; flags.loopback = 1; next;) > > - table=??(lr_in_ip_routing ), priority=324 , match=(inport == "lrp0" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:ff:fe00:1; eth.src = 00:00:00:00:00:01; outport = "lrp0"; flags.loopback = 1; next;) > > - table=??(lr_in_ip_routing ), priority=324 , match=(inport == "lrp1" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:ff:fe00:101; eth.src = 00:00:00:00:01:01; outport = "lrp1"; flags.loopback = 1; next;) > > - table=??(lr_in_ip_routing ), priority=324 , match=(inport == "lrp2" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:ff:fe00:201; eth.src = 00:00:00:00:02:01; outport = "lrp2"; flags.loopback = 1; next;) > > + table=??(lr_in_ip_routing ), priority=387 , match=(reg7 == 0 && ip4.dst == 0.0.0.0/0), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = 192.168.0.10; reg1 = 192.168.0.1; eth.src = 00:00:00:00:00:01; outport = "lrp0"; flags.loopback = 1; next;) > > + table=??(lr_in_ip_routing ), priority=387 , match=(reg7 == 2 && ip4.dst == 0.0.0.0/0), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = 192.168.0.10; reg1 = 192.168.0.1; eth.src = 00:00:00:00:00:01; outport = "lrp0"; flags.loopback = 1; next;) > > + table=??(lr_in_ip_routing ), priority=459 , match=(reg7 == 1 && ip4.dst == 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = 192.168.1.10; reg1 = 192.168.1.1; eth.src = 00:00:00:00:01:01; outport = "lrp1"; flags.loopback = 1; next;) > > + table=??(lr_in_ip_routing ), priority=461 , match=(ip4.dst == 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.0.1; eth.src = 00:00:00:00:00:01; outport = "lrp0"; flags.loopback = 1; next;) > > + table=??(lr_in_ip_routing ), priority=461 , match=(ip4.dst == 192.168.1.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.1.1; eth.src = 00:00:00:00:01:01; outport = "lrp1"; flags.loopback = 1; next;) > > + table=??(lr_in_ip_routing ), priority=461 , match=(ip4.dst == 192.168.2.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.2.1; eth.src = 00:00:00:00:02:01; outport = "lrp2"; flags.loopback = 1; next;) > > + table=??(lr_in_ip_routing ), priority=483 , match=(reg7 == 2 && ip4.dst == 1.1.1.1/32), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = 192.168.0.20; reg1 = 192.168.0.1; eth.src = 00:00:00:00:00:01; outport = "lrp0"; flags.loopback = 1; next;) > > + table=??(lr_in_ip_routing ), priority=581 , match=(inport == "lrp0" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:ff:fe00:1; eth.src = 00:00:00:00:00:01; outport = "lrp0"; flags.loopback = 1; next;) > > + table=??(lr_in_ip_routing ), priority=581 , match=(inport == "lrp1" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:ff:fe00:101; eth.src = 00:00:00:00:01:01; outport = "lrp1"; flags.loopback = 1; next;) > > + table=??(lr_in_ip_routing ), priority=581 , match=(inport == "lrp2" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:ff:fe00:201; eth.src = 00:00:00:00:02:01; outport = "lrp2"; flags.loopback = 1; next;) > > ]) > > > > + > > AT_CLEANUP > > ]) > >
On 11/28/24 12:24 AM, Han Zhou wrote: > On Wed, Nov 27, 2024 at 3:27 AM Dumitru Ceara <dceara@redhat.com> wrote: >> >> Hi Han, >> >> On 11/27/24 7:14 AM, Han Zhou wrote: >>> The current implementation prefers dst routes over src routes only if >>> they have the same prefix length. This is not very useful for real world >>> use cases, because it doesn't make much sense to compare the prefix >>> length between different fields of the IP header (dst IP v.s. src IP). >>> The prefix length should make sense only when comparing for the same >>> field, either dst or src. >>> >>> This patch changes the behavior by always prefering dst routes over src >>> routes, regardless of the prefix length, and comparing prefix length >>> only within the same type of routes. >> >> This is a significant change in behavior and will potentially break >> existing deployments. In my opinion this is also not really a bug fix. >> The old behavior was properly documented and the implementation worked >> as documented. >> >> I don't think we can just change preference like that. I'm afraid that >> if we want to support this we need a way to opt-in through configuration. >> > > Thanks Dumitru for the review and feedback. I do agree it is a significant > behavior change. However, I was in doubt it would break any existing > deployments which was why I didn't add any configuration to control the > behavior, just to avoid unnecessary knobs. It may not be a strictly bug I agree, I usually don't like adding more config knobs because it complicates the test matrix and makes maintenance hard. Maybe we should make this a full fledged feature instead? What if we add a new NB table, Logical_Router_Static_Source_Route (or a better name), where we can add all source routes that need to have lower priority than destination routes? If we go that way, we'd have to add a new stage in the pipeline, before the current routing stage, in which we evaluate these new source routes. Like that all destination routes will implicitly have higher priority. I know it's less than ideal to go this way too but the original behavior is present in OVN for a long time, since 2016: https://github.com/ovn-org/ovn/commit/4b1a0567 > fix, but the original behavior to me is more like a bug in the design. I > can't imagine any real world use case that would rely on the old behavior > but be broken by the new behavior, and also never heard of any physical > routers implemented this way. > >>> >>> E.g., there are two routes: >>> >>> 10.0.0.0/8 nexthop A dst-ip >>> 192.168.11.0/24 nexthop B src-ip >>> >>> For a packet 192.168.11.123 -> 10.1.2.3: >>> >>> Before: nexthop will be B >>> >>> After: nexthop will be A >>> >> >> We already support logical router policies that can achieve the same >> thing. The user could just add a policy like: >> >> ip.dst == 10.0.0.0/8 action: "reroute A" >> > > Yes, I understand that we have policy route that support the desired > behavior, which was also discussed in the "reported-at", for the ovn-k8s, > but it ended up with still the src-ip static routes as the first stage and > then some more complex policy route for more complex routing decisions. Now > we are facing the same problem again - a need for variant prefix length > subnets for different ovn-k8s nodes. There are multiple alternatives to > solve the problem but it seems to be the most simple and straightforward to > "fix" the src-ip route behavior. > >> I understand policies might be tricky to get right but, on the other >> hand, changing existing behavior (as mentioned above) seems very risky >> to me. >> >> I'm not sure about other CMS but ovn-kubernetes uses a mix of >> src-ip/dst-ip routes. In our ovn-org/ovn CI we only run a subset of the >> ovn-kubernetes CI jobs so there's a chance we don't cover all their use >> cases. Would it be possible to do a full CI test run with >> ovn-kubernetes master branch and an OVN version that includes your patch. >> > > In fact we already ran full ovn-k8s test cases in our downstream CI for > this same change, which worked pretty well and nothing is broken, and of > course the variant length subnet problem was fixed without any changes > required in ovn-k8s. > I see, OK. > So, it would be great if you could give another thought about this, even > better if you have access to broader customer use cases regarding src-ip > routes, and see if there is a real risk of breaking existing use cases. If At least inside Red Hat I'm not aware of other uses of src-ip routes than the ones in ovn-kubernetes. I'm adding some more folks from the CMS-side (OpenShift, OpenStack) to the thread. Daniel, Jakub, Enrique, do you think changing the default behavior of the src-ip logical static routes (making them less preferred than any dest-ip static route) would affect OpenStack, OpenShift, CNV? > you still believe it is a high risk, we can definitely add a knob and keep > the default behavior unchanged. > My concern generic, for all CMS that might exist out there. I had a quick look at kube-ovn, microovn, lxd and couldn't see any use of src-ip routes. But that doesn't mean that there are no users out there that rely on the current behavior. Vladislav, are you guys using src-ip routes? If so, will this change break anything in your use case? Thanks, Dumitru > Best, > Han > >> I guess that would require a small adjustment to the ovn-kubernetes code >> to use the Dockerfile.fedora.dev file [0][1] (pointing to your ovn >> branch) in CI and opening a PR against their repo. >> >> Regards, >> Dumitru >> >> [0] >> > https://github.com/ovn-kubernetes/ovn-kubernetes/blob/master/dist/images/Dockerfile.fedora.dev >> [1] >> > https://github.com/ovn-kubernetes/ovn-kubernetes/blob/9cf5a7707ca278b96fed631a8780fbf3ff7e32bb/.github/workflows/test.yml#L212 >> >>> Reported-by: Girish Moodalbail <gmoodalbail@gmail.com> >>> Reported-at: > https://mail.openvswitch.org/pipermail/ovs-discuss/2020-May/050049.html >>> Signed-off-by: Han Zhou <hzhou@ovn.org> >>> --- >>> NEWS | 2 ++ >>> northd/northd.c | 12 ++++++---- >>> ovn-nb.xml | 6 ++--- >>> tests/ovn-northd.at | 53 +++++++++++++++++++++++---------------------- >>> 4 files changed, 40 insertions(+), 33 deletions(-) >>> >>> diff --git a/NEWS b/NEWS >>> index da3aba739ced..74ee23b6473b 100644 >>> --- a/NEWS >>> +++ b/NEWS >>> @@ -4,6 +4,8 @@ Post v24.09.0 >>> hash (with specified hash fields) for ECMP routes >>> while choosing nexthop. >>> - ovn-ic: Add support for route tag to prevent route learning. >>> + - Adjusted priorities of src-ip based static routes to be lower > than other >>> + types of routes regardless of prefix length. >>> >>> OVN v24.09.0 - 13 Sep 2024 >>> -------------------------- >>> diff --git a/northd/northd.c b/northd/northd.c >>> index 2aa6c0958db4..039c20bda497 100644 >>> --- a/northd/northd.c >>> +++ b/northd/northd.c >>> @@ -303,9 +303,9 @@ BUILD_ASSERT_DECL(ACL_OBS_STAGE_MAX < (1 << 2)); >>> * same ip_prefix values: >>> * - connected route overrides static one; >>> * - static route overrides src-ip route. */ >>> -#define ROUTE_PRIO_OFFSET_MULTIPLIER 5 >>> -#define ROUTE_PRIO_OFFSET_STATIC 2 >>> -#define ROUTE_PRIO_OFFSET_CONNECTED 4 >>> +#define ROUTE_PRIO_OFFSET_MULTIPLIER 3 >>> +#define ROUTE_PRIO_OFFSET_STATIC 0 >>> +#define ROUTE_PRIO_OFFSET_CONNECTED 2 >>> >>> /* Returns the type of the datapath to which a flow with the given > 'stage' may >>> * be added. */ >>> @@ -11443,6 +11443,7 @@ build_route_match(const struct ovn_port > *op_inport, uint32_t rtb_id, >>> bool has_protocol_match) >>> { >>> const char *dir; >>> + int base = 0; >>> /* The priority here is calculated to implement > longest-prefix-match >>> * routing. */ >>> if (is_src_route) { >>> @@ -11450,6 +11451,9 @@ build_route_match(const struct ovn_port > *op_inport, uint32_t rtb_id, >>> ofs = 0; >>> } else { >>> dir = "dst"; >>> + /* dst routes have higher priority than all src routes > regardless of >>> + * prefix length. */ >>> + base = (128 + 1) * ROUTE_PRIO_OFFSET_MULTIPLIER; >>> } >>> >>> if (op_inport) { >>> @@ -11462,7 +11466,7 @@ build_route_match(const struct ovn_port > *op_inport, uint32_t rtb_id, >>> if (has_protocol_match) { >>> ofs += 1; >>> } >>> - *priority = (plen * ROUTE_PRIO_OFFSET_MULTIPLIER) + ofs; >>> + *priority = base + (plen * ROUTE_PRIO_OFFSET_MULTIPLIER) + ofs; >>> >>> ds_put_format(match, "ip%s.%s == %s/%d", is_ipv4 ? "4" : "6", dir, >>> network_s, plen); >>> diff --git a/ovn-nb.xml b/ovn-nb.xml >>> index 5114bbc2eb9b..ad774cd4ba72 100644 >>> --- a/ovn-nb.xml >>> +++ b/ovn-nb.xml >>> @@ -3782,9 +3782,9 @@ or >>> </p> >>> >>> <p> >>> - When multiple routes match a packet, the longest-prefix match is > chosen. >>> - For a given prefix length, a <code>dst-ip</code> route is > preferred over >>> - a <code>src-ip</code> route. >>> + When multiple routes match a packet, a <code>dst-ip</code> route > is >>> + preferred over a <code>src-ip</code> route. Among the same type > of >>> + routes, the longest-prefix match is chosen. >>> </p> >>> >>> <p> >>> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at >>> index e3b7b0cb5f49..73905adccdd0 100644 >>> --- a/tests/ovn-northd.at >>> +++ b/tests/ovn-northd.at >>> @@ -6823,9 +6823,9 @@ AT_CHECK([grep -w "lr_in_ip_routing" lr0flows | > ovn_strip_lflows], [0], [dnl >>> table=??(lr_in_ip_routing ), priority=0 , match=(1), > action=(drop;) >>> table=??(lr_in_ip_routing ), priority=10300, > match=(ct_mark.ecmp_reply_port == 1 && reg7 == 0 && ip4.dst == 1.0.0.1/32), > action=(ip.ttl--; flags.loopback = 1; eth.src = 00:00:20:20:12:13; reg1 = > 192.168.0.1; outport = "lr0-public"; next;) >>> table=??(lr_in_ip_routing ), priority=10550, match=(nd_rs || > nd_ra), action=(drop;) >>> - table=??(lr_in_ip_routing ), priority=124 , match=(ip4.dst == > 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 > = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; > flags.loopback = 1; next;) >>> - table=??(lr_in_ip_routing ), priority=162 , match=(reg7 == 0 && > ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] > = 1; reg8[[16..31]] = 1; next;) >>> - table=??(lr_in_ip_routing ), priority=324 , match=(inport == > "lr0-public" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; > xxreg0 = ip6.dst; xxreg1 = fe80::200:20ff:fe20:1213; eth.src = > 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) >>> + table=??(lr_in_ip_routing ), priority=461 , match=(ip4.dst == > 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 > = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; > flags.loopback = 1; next;) >>> + table=??(lr_in_ip_routing ), priority=483 , match=(reg7 == 0 && > ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] > = 1; reg8[[16..31]] = 1; next;) >>> + table=??(lr_in_ip_routing ), priority=581 , match=(inport == > "lr0-public" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; > xxreg0 = ip6.dst; xxreg1 = fe80::200:20ff:fe20:1213; eth.src = > 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) >>> ]) >>> >>> AT_CHECK([grep -e "lr_in_ip_routing_ecmp" lr0flows | > ovn_strip_lflows], [0], [dnl >>> @@ -6841,9 +6841,9 @@ AT_CHECK([grep -w "lr_in_ip_routing" lr0flows | > ovn_strip_lflows], [0], [dnl >>> table=??(lr_in_ip_routing ), priority=0 , match=(1), > action=(drop;) >>> table=??(lr_in_ip_routing ), priority=10300, > match=(ct_mark.ecmp_reply_port == 1 && reg7 == 0 && ip4.dst == 1.0.0.1/32), > action=(ip.ttl--; flags.loopback = 1; eth.src = 00:00:20:20:12:13; reg1 = > 192.168.0.1; outport = "lr0-public"; next;) >>> table=??(lr_in_ip_routing ), priority=10550, match=(nd_rs || > nd_ra), action=(drop;) >>> - table=??(lr_in_ip_routing ), priority=124 , match=(ip4.dst == > 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 > = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; > flags.loopback = 1; next;) >>> - table=??(lr_in_ip_routing ), priority=162 , match=(reg7 == 0 && > ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] > = 1; reg8[[16..31]] = select(1, 2);) >>> - table=??(lr_in_ip_routing ), priority=324 , match=(inport == > "lr0-public" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; > xxreg0 = ip6.dst; xxreg1 = fe80::200:20ff:fe20:1213; eth.src = > 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) >>> + table=??(lr_in_ip_routing ), priority=461 , match=(ip4.dst == > 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 > = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; > flags.loopback = 1; next;) >>> + table=??(lr_in_ip_routing ), priority=483 , match=(reg7 == 0 && > ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] > = 1; reg8[[16..31]] = select(1, 2);) >>> + table=??(lr_in_ip_routing ), priority=581 , match=(inport == > "lr0-public" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; > xxreg0 = ip6.dst; xxreg1 = fe80::200:20ff:fe20:1213; eth.src = > 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) >>> ]) >>> AT_CHECK([grep -e "lr_in_ip_routing_ecmp" lr0flows | sed > 's/192\.168\.0\..0/192.168.0.??/' | ovn_strip_lflows], [0], [dnl >>> table=??(lr_in_ip_routing_ecmp), priority=0 , match=(1), > action=(drop;) >>> @@ -6870,9 +6870,9 @@ AT_CHECK([grep -w "lr_in_ip_routing" lr0flows | > ovn_strip_lflows], [0], [dnl >>> table=??(lr_in_ip_routing ), priority=0 , match=(1), > action=(drop;) >>> table=??(lr_in_ip_routing ), priority=10300, > match=(ct_mark.ecmp_reply_port == 1 && reg7 == 0 && ip4.dst == 1.0.0.1/32), > action=(ip.ttl--; flags.loopback = 1; eth.src = 00:00:20:20:12:13; reg1 = > 192.168.0.1; outport = "lr0-public"; next;) >>> table=??(lr_in_ip_routing ), priority=10550, match=(nd_rs || > nd_ra), action=(drop;) >>> - table=??(lr_in_ip_routing ), priority=124 , match=(ip4.dst == > 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 > = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; > flags.loopback = 1; next;) >>> - table=??(lr_in_ip_routing ), priority=162 , match=(reg7 == 0 && > ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] > = 1; reg8[[16..31]] = select(1, 2);) >>> - table=??(lr_in_ip_routing ), priority=324 , match=(inport == > "lr0-public" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; > xxreg0 = ip6.dst; xxreg1 = fe80::200:20ff:fe20:1213; eth.src = > 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) >>> + table=??(lr_in_ip_routing ), priority=461 , match=(ip4.dst == > 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 > = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; > flags.loopback = 1; next;) >>> + table=??(lr_in_ip_routing ), priority=483 , match=(reg7 == 0 && > ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] > = 1; reg8[[16..31]] = select(1, 2);) >>> + table=??(lr_in_ip_routing ), priority=581 , match=(inport == > "lr0-public" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; > xxreg0 = ip6.dst; xxreg1 = fe80::200:20ff:fe20:1213; eth.src = > 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) >>> ]) >>> AT_CHECK([grep -e "lr_in_ip_routing_ecmp" lr0flows | sed > 's/192\.168\.0\..0/192.168.0.??/' | ovn_strip_lflows], [0], [dnl >>> table=??(lr_in_ip_routing_ecmp), priority=0 , match=(1), > action=(drop;) >>> @@ -6888,14 +6888,14 @@ check ovn-nbctl --wait=sb lr-route-add lr0 > 1.0.0.0/24 192.168.0.10 >>> ovn-sbctl dump-flows lr0 > lr0flows >>> >>> AT_CHECK([grep -e "lr_in_ip_routing.*192.168.0.10" lr0flows | > ovn_strip_lflows], [0], [dnl >>> - table=??(lr_in_ip_routing ), priority=122 , match=(reg7 == 0 && > ip4.dst == 1.0.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = > 192.168.0.10; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = > "lr0-public"; flags.loopback = 1; next;) >>> + table=??(lr_in_ip_routing ), priority=459 , match=(reg7 == 0 && > ip4.dst == 1.0.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = > 192.168.0.10; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = > "lr0-public"; flags.loopback = 1; next;) >>> ]) >>> >>> check ovn-nbctl --wait=sb lr-route-add lr0 2.0.0.0/24 lr0-public >>> >>> ovn-sbctl dump-flows lr0 > lr0flows >>> AT_CHECK([grep -e "lr_in_ip_routing.*2.0.0.0" lr0flows | > ovn_strip_lflows], [0], [dnl >>> - table=??(lr_in_ip_routing ), priority=122 , match=(reg7 == 0 && > ip4.dst == 2.0.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = > ip4.dst; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = > "lr0-public"; flags.loopback = 1; next;) >>> + table=??(lr_in_ip_routing ), priority=459 , match=(reg7 == 0 && > ip4.dst == 2.0.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = > ip4.dst; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = > "lr0-public"; flags.loopback = 1; next;) >>> ]) >>> >>> check ovn-nbctl lr-route-add lr0 3.3.0.0/16 192.168.0.11 >>> @@ -6910,7 +6910,7 @@ check ovn-nbctl set logical_router_static_route > $route2_uuid selection_fields="i >>> check ovn-nbctl --wait=sb sync >>> ovn-sbctl dump-flows lr0 > lr0flows >>> AT_CHECK([grep -e "(lr_in_ip_routing ).*3.3.0.0" lr0flows | sed > 's/table=../table=??/' | sort], [0], [dnl >>> - table=??(lr_in_ip_routing ), priority=82 , match=(reg7 == 0 && > ip4.dst == 3.3.0.0/16), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] > = 1; reg8[[16..31]] = select(values=(1, 2); > hash_fields="ip_dst,ip_proto,ip_src");) >>> + table=??(lr_in_ip_routing ), priority=435 , match=(reg7 == 0 && > ip4.dst == 3.3.0.0/16), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] > = 1; reg8[[16..31]] = select(values=(1, 2); > hash_fields="ip_dst,ip_proto,ip_src");) >>> ]) >>> >>> check ovn-nbctl set logical_router_static_route $route1_uuid > selection_fields="ip_src,ip_dst,tp_src,tp_dst" >>> @@ -6919,10 +6919,10 @@ check ovn-nbctl set logical_router_static_route > $route2_uuid selection_fields="i >>> check ovn-nbctl --wait=sb sync >>> ovn-sbctl dump-flows lr0 > lr0flows >>> AT_CHECK([grep -e "(lr_in_ip_routing ).*3.3.0.0" lr0flows | sed > 's/table=../table=??/' | sort], [0], [dnl >>> - table=??(lr_in_ip_routing ), priority=82 , match=(reg7 == 0 && > ip4.dst == 3.3.0.0/16), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] > = 1; reg8[[16..31]] = select(values=(1, 2); > hash_fields="ip_dst,ip_proto,ip_src");) >>> - table=??(lr_in_ip_routing ), priority=83 , match=(reg7 == 0 && > ip4.dst == 3.3.0.0/16 && sctp), action=(ip.ttl--; flags.loopback = 1; > reg8[[0..15]] = 1; reg8[[16..31]] = select(values=(1, 2); > hash_fields="ip_dst,ip_proto,ip_src,sctp_dst,sctp_src");) >>> - table=??(lr_in_ip_routing ), priority=83 , match=(reg7 == 0 && > ip4.dst == 3.3.0.0/16 && tcp), action=(ip.ttl--; flags.loopback = 1; > reg8[[0..15]] = 1; reg8[[16..31]] = select(values=(1, 2); > hash_fields="ip_dst,ip_proto,ip_src,tcp_dst,tcp_src");) >>> - table=??(lr_in_ip_routing ), priority=83 , match=(reg7 == 0 && > ip4.dst == 3.3.0.0/16 && udp), action=(ip.ttl--; flags.loopback = 1; > reg8[[0..15]] = 1; reg8[[16..31]] = select(values=(1, 2); > hash_fields="ip_dst,ip_proto,ip_src,udp_dst,udp_src");) >>> + table=??(lr_in_ip_routing ), priority=435 , match=(reg7 == 0 && > ip4.dst == 3.3.0.0/16), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] > = 1; reg8[[16..31]] = select(values=(1, 2); > hash_fields="ip_dst,ip_proto,ip_src");) >>> + table=??(lr_in_ip_routing ), priority=436 , match=(reg7 == 0 && > ip4.dst == 3.3.0.0/16 && sctp), action=(ip.ttl--; flags.loopback = 1; > reg8[[0..15]] = 1; reg8[[16..31]] = select(values=(1, 2); > hash_fields="ip_dst,ip_proto,ip_src,sctp_dst,sctp_src");) >>> + table=??(lr_in_ip_routing ), priority=436 , match=(reg7 == 0 && > ip4.dst == 3.3.0.0/16 && tcp), action=(ip.ttl--; flags.loopback = 1; > reg8[[0..15]] = 1; reg8[[16..31]] = select(values=(1, 2); > hash_fields="ip_dst,ip_proto,ip_src,tcp_dst,tcp_src");) >>> + table=??(lr_in_ip_routing ), priority=436 , match=(reg7 == 0 && > ip4.dst == 3.3.0.0/16 && udp), action=(ip.ttl--; flags.loopback = 1; > reg8[[0..15]] = 1; reg8[[16..31]] = select(values=(1, 2); > hash_fields="ip_dst,ip_proto,ip_src,udp_dst,udp_src");) >>> ]) >>> >>> AT_CLEANUP >>> @@ -7346,18 +7346,19 @@ AT_CHECK([grep "lr_in_ip_routing_pre" lr0flows > | ovn_strip_lflows], [0], [dnl >>> grep -e "(lr_in_ip_routing ).*outport" lr0flows >>> >>> AT_CHECK([grep -e "(lr_in_ip_routing ).*outport" lr0flows | > ovn_strip_lflows], [0], [dnl >>> - table=??(lr_in_ip_routing ), priority=122 , match=(reg7 == 1 && > ip4.dst == 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = > 192.168.1.10; reg1 = 192.168.1.1; eth.src = 00:00:00:00:01:01; outport = > "lrp1"; flags.loopback = 1; next;) >>> - table=??(lr_in_ip_routing ), priority=124 , match=(ip4.dst == > 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 > = 192.168.0.1; eth.src = 00:00:00:00:00:01; outport = "lrp0"; > flags.loopback = 1; next;) >>> - table=??(lr_in_ip_routing ), priority=124 , match=(ip4.dst == > 192.168.1.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 > = 192.168.1.1; eth.src = 00:00:00:00:01:01; outport = "lrp1"; > flags.loopback = 1; next;) >>> - table=??(lr_in_ip_routing ), priority=124 , match=(ip4.dst == > 192.168.2.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 > = 192.168.2.1; eth.src = 00:00:00:00:02:01; outport = "lrp2"; > flags.loopback = 1; next;) >>> - table=??(lr_in_ip_routing ), priority=162 , match=(reg7 == 2 && > ip4.dst == 1.1.1.1/32), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = > 192.168.0.20; reg1 = 192.168.0.1; eth.src = 00:00:00:00:00:01; outport = > "lrp0"; flags.loopback = 1; next;) >>> - table=??(lr_in_ip_routing ), priority=2 , match=(reg7 == 0 && > ip4.dst == 0.0.0.0/0), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = > 192.168.0.10; reg1 = 192.168.0.1; eth.src = 00:00:00:00:00:01; outport = > "lrp0"; flags.loopback = 1; next;) >>> - table=??(lr_in_ip_routing ), priority=2 , match=(reg7 == 2 && > ip4.dst == 0.0.0.0/0), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = > 192.168.0.10; reg1 = 192.168.0.1; eth.src = 00:00:00:00:00:01; outport = > "lrp0"; flags.loopback = 1; next;) >>> - table=??(lr_in_ip_routing ), priority=324 , match=(inport == > "lrp0" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; > xxreg0 = ip6.dst; xxreg1 = fe80::200:ff:fe00:1; eth.src = > 00:00:00:00:00:01; outport = "lrp0"; flags.loopback = 1; next;) >>> - table=??(lr_in_ip_routing ), priority=324 , match=(inport == > "lrp1" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; > xxreg0 = ip6.dst; xxreg1 = fe80::200:ff:fe00:101; eth.src = > 00:00:00:00:01:01; outport = "lrp1"; flags.loopback = 1; next;) >>> - table=??(lr_in_ip_routing ), priority=324 , match=(inport == > "lrp2" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; > xxreg0 = ip6.dst; xxreg1 = fe80::200:ff:fe00:201; eth.src = > 00:00:00:00:02:01; outport = "lrp2"; flags.loopback = 1; next;) >>> + table=??(lr_in_ip_routing ), priority=387 , match=(reg7 == 0 && > ip4.dst == 0.0.0.0/0), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = > 192.168.0.10; reg1 = 192.168.0.1; eth.src = 00:00:00:00:00:01; outport = > "lrp0"; flags.loopback = 1; next;) >>> + table=??(lr_in_ip_routing ), priority=387 , match=(reg7 == 2 && > ip4.dst == 0.0.0.0/0), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = > 192.168.0.10; reg1 = 192.168.0.1; eth.src = 00:00:00:00:00:01; outport = > "lrp0"; flags.loopback = 1; next;) >>> + table=??(lr_in_ip_routing ), priority=459 , match=(reg7 == 1 && > ip4.dst == 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = > 192.168.1.10; reg1 = 192.168.1.1; eth.src = 00:00:00:00:01:01; outport = > "lrp1"; flags.loopback = 1; next;) >>> + table=??(lr_in_ip_routing ), priority=461 , match=(ip4.dst == > 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 > = 192.168.0.1; eth.src = 00:00:00:00:00:01; outport = "lrp0"; > flags.loopback = 1; next;) >>> + table=??(lr_in_ip_routing ), priority=461 , match=(ip4.dst == > 192.168.1.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 > = 192.168.1.1; eth.src = 00:00:00:00:01:01; outport = "lrp1"; > flags.loopback = 1; next;) >>> + table=??(lr_in_ip_routing ), priority=461 , match=(ip4.dst == > 192.168.2.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 > = 192.168.2.1; eth.src = 00:00:00:00:02:01; outport = "lrp2"; > flags.loopback = 1; next;) >>> + table=??(lr_in_ip_routing ), priority=483 , match=(reg7 == 2 && > ip4.dst == 1.1.1.1/32), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = > 192.168.0.20; reg1 = 192.168.0.1; eth.src = 00:00:00:00:00:01; outport = > "lrp0"; flags.loopback = 1; next;) >>> + table=??(lr_in_ip_routing ), priority=581 , match=(inport == > "lrp0" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; > xxreg0 = ip6.dst; xxreg1 = fe80::200:ff:fe00:1; eth.src = > 00:00:00:00:00:01; outport = "lrp0"; flags.loopback = 1; next;) >>> + table=??(lr_in_ip_routing ), priority=581 , match=(inport == > "lrp1" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; > xxreg0 = ip6.dst; xxreg1 = fe80::200:ff:fe00:101; eth.src = > 00:00:00:00:01:01; outport = "lrp1"; flags.loopback = 1; next;) >>> + table=??(lr_in_ip_routing ), priority=581 , match=(inport == > "lrp2" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; > xxreg0 = ip6.dst; xxreg1 = fe80::200:ff:fe00:201; eth.src = > 00:00:00:00:02:01; outport = "lrp2"; flags.loopback = 1; next;) >>> ]) >>> >>> + >>> AT_CLEANUP >>> ]) >>> >
On 11/28/24 9:48 AM, Dumitru Ceara wrote: > On 11/28/24 12:24 AM, Han Zhou wrote: >> On Wed, Nov 27, 2024 at 3:27 AM Dumitru Ceara <dceara@redhat.com> wrote: >>> >>> Hi Han, >>> >>> On 11/27/24 7:14 AM, Han Zhou wrote: >>>> The current implementation prefers dst routes over src routes only if >>>> they have the same prefix length. This is not very useful for real world >>>> use cases, because it doesn't make much sense to compare the prefix >>>> length between different fields of the IP header (dst IP v.s. src IP). >>>> The prefix length should make sense only when comparing for the same >>>> field, either dst or src. >>>> >>>> This patch changes the behavior by always prefering dst routes over src >>>> routes, regardless of the prefix length, and comparing prefix length >>>> only within the same type of routes. >>> >>> This is a significant change in behavior and will potentially break >>> existing deployments. In my opinion this is also not really a bug fix. >>> The old behavior was properly documented and the implementation worked >>> as documented. >>> >>> I don't think we can just change preference like that. I'm afraid that >>> if we want to support this we need a way to opt-in through configuration. >>> >> >> Thanks Dumitru for the review and feedback. I do agree it is a significant >> behavior change. However, I was in doubt it would break any existing >> deployments which was why I didn't add any configuration to control the >> behavior, just to avoid unnecessary knobs. It may not be a strictly bug > > I agree, I usually don't like adding more config knobs because it > complicates the test matrix and makes maintenance hard. Maybe we should > make this a full fledged feature instead? What if we add a new NB > table, Logical_Router_Static_Source_Route (or a better name), where we > can add all source routes that need to have lower priority than > destination routes? > > If we go that way, we'd have to add a new stage in the pipeline, before > the current routing stage, in which we evaluate these new source routes. > Like that all destination routes will implicitly have higher priority. > > I know it's less than ideal to go this way too but the original behavior > is present in OVN for a long time, since 2016: > > https://github.com/ovn-org/ovn/commit/4b1a0567 > >> fix, but the original behavior to me is more like a bug in the design. I >> can't imagine any real world use case that would rely on the old behavior >> but be broken by the new behavior, and also never heard of any physical >> routers implemented this way. >> >>>> >>>> E.g., there are two routes: >>>> >>>> 10.0.0.0/8 nexthop A dst-ip >>>> 192.168.11.0/24 nexthop B src-ip >>>> >>>> For a packet 192.168.11.123 -> 10.1.2.3: >>>> >>>> Before: nexthop will be B >>>> >>>> After: nexthop will be A >>>> >>> >>> We already support logical router policies that can achieve the same >>> thing. The user could just add a policy like: >>> >>> ip.dst == 10.0.0.0/8 action: "reroute A" >>> >> >> Yes, I understand that we have policy route that support the desired >> behavior, which was also discussed in the "reported-at", for the ovn-k8s, >> but it ended up with still the src-ip static routes as the first stage and >> then some more complex policy route for more complex routing decisions. Now >> we are facing the same problem again - a need for variant prefix length >> subnets for different ovn-k8s nodes. There are multiple alternatives to >> solve the problem but it seems to be the most simple and straightforward to >> "fix" the src-ip route behavior. >> >>> I understand policies might be tricky to get right but, on the other >>> hand, changing existing behavior (as mentioned above) seems very risky >>> to me. >>> >>> I'm not sure about other CMS but ovn-kubernetes uses a mix of >>> src-ip/dst-ip routes. In our ovn-org/ovn CI we only run a subset of the >>> ovn-kubernetes CI jobs so there's a chance we don't cover all their use >>> cases. Would it be possible to do a full CI test run with >>> ovn-kubernetes master branch and an OVN version that includes your patch. >>> >> >> In fact we already ran full ovn-k8s test cases in our downstream CI for >> this same change, which worked pretty well and nothing is broken, and of >> course the variant length subnet problem was fixed without any changes >> required in ovn-k8s. >> > > I see, OK. > >> So, it would be great if you could give another thought about this, even >> better if you have access to broader customer use cases regarding src-ip >> routes, and see if there is a real risk of breaking existing use cases. If > > At least inside Red Hat I'm not aware of other uses of src-ip routes > than the ones in ovn-kubernetes. I'm adding some more folks from the > CMS-side (OpenShift, OpenStack) to the thread. > > Daniel, Jakub, Enrique, do you think changing the default behavior of > the src-ip logical static routes (making them less preferred than any > dest-ip static route) would affect OpenStack, OpenShift, CNV? > >> you still believe it is a high risk, we can definitely add a knob and keep >> the default behavior unchanged. >> > > My concern generic, for all CMS that might exist out there. I had a > quick look at kube-ovn, microovn, lxd and couldn't see any use of src-ip > routes. > > But that doesn't mean that there are no users out there that rely on the > current behavior. > > Vladislav, are you guys using src-ip routes? If so, will this change > break anything in your use case? > Assuming silence means people are OK with this behavior change: Acked-by: Dumitru Ceara <dceara@redhat.com> Thanks, Dumitru > Thanks, > Dumitru > >> Best, >> Han >> >>> I guess that would require a small adjustment to the ovn-kubernetes code >>> to use the Dockerfile.fedora.dev file [0][1] (pointing to your ovn >>> branch) in CI and opening a PR against their repo. >>> >>> Regards, >>> Dumitru >>> >>> [0] >>> >> https://github.com/ovn-kubernetes/ovn-kubernetes/blob/master/dist/images/Dockerfile.fedora.dev >>> [1] >>> >> https://github.com/ovn-kubernetes/ovn-kubernetes/blob/9cf5a7707ca278b96fed631a8780fbf3ff7e32bb/.github/workflows/test.yml#L212 >>> >>>> Reported-by: Girish Moodalbail <gmoodalbail@gmail.com> >>>> Reported-at: >> https://mail.openvswitch.org/pipermail/ovs-discuss/2020-May/050049.html >>>> Signed-off-by: Han Zhou <hzhou@ovn.org> >>>> --- >>>> NEWS | 2 ++ >>>> northd/northd.c | 12 ++++++---- >>>> ovn-nb.xml | 6 ++--- >>>> tests/ovn-northd.at | 53 +++++++++++++++++++++++---------------------- >>>> 4 files changed, 40 insertions(+), 33 deletions(-) >>>> >>>> diff --git a/NEWS b/NEWS >>>> index da3aba739ced..74ee23b6473b 100644 >>>> --- a/NEWS >>>> +++ b/NEWS >>>> @@ -4,6 +4,8 @@ Post v24.09.0 >>>> hash (with specified hash fields) for ECMP routes >>>> while choosing nexthop. >>>> - ovn-ic: Add support for route tag to prevent route learning. >>>> + - Adjusted priorities of src-ip based static routes to be lower >> than other >>>> + types of routes regardless of prefix length. >>>> >>>> OVN v24.09.0 - 13 Sep 2024 >>>> -------------------------- >>>> diff --git a/northd/northd.c b/northd/northd.c >>>> index 2aa6c0958db4..039c20bda497 100644 >>>> --- a/northd/northd.c >>>> +++ b/northd/northd.c >>>> @@ -303,9 +303,9 @@ BUILD_ASSERT_DECL(ACL_OBS_STAGE_MAX < (1 << 2)); >>>> * same ip_prefix values: >>>> * - connected route overrides static one; >>>> * - static route overrides src-ip route. */ >>>> -#define ROUTE_PRIO_OFFSET_MULTIPLIER 5 >>>> -#define ROUTE_PRIO_OFFSET_STATIC 2 >>>> -#define ROUTE_PRIO_OFFSET_CONNECTED 4 >>>> +#define ROUTE_PRIO_OFFSET_MULTIPLIER 3 >>>> +#define ROUTE_PRIO_OFFSET_STATIC 0 >>>> +#define ROUTE_PRIO_OFFSET_CONNECTED 2 >>>> >>>> /* Returns the type of the datapath to which a flow with the given >> 'stage' may >>>> * be added. */ >>>> @@ -11443,6 +11443,7 @@ build_route_match(const struct ovn_port >> *op_inport, uint32_t rtb_id, >>>> bool has_protocol_match) >>>> { >>>> const char *dir; >>>> + int base = 0; >>>> /* The priority here is calculated to implement >> longest-prefix-match >>>> * routing. */ >>>> if (is_src_route) { >>>> @@ -11450,6 +11451,9 @@ build_route_match(const struct ovn_port >> *op_inport, uint32_t rtb_id, >>>> ofs = 0; >>>> } else { >>>> dir = "dst"; >>>> + /* dst routes have higher priority than all src routes >> regardless of >>>> + * prefix length. */ >>>> + base = (128 + 1) * ROUTE_PRIO_OFFSET_MULTIPLIER; >>>> } >>>> >>>> if (op_inport) { >>>> @@ -11462,7 +11466,7 @@ build_route_match(const struct ovn_port >> *op_inport, uint32_t rtb_id, >>>> if (has_protocol_match) { >>>> ofs += 1; >>>> } >>>> - *priority = (plen * ROUTE_PRIO_OFFSET_MULTIPLIER) + ofs; >>>> + *priority = base + (plen * ROUTE_PRIO_OFFSET_MULTIPLIER) + ofs; >>>> >>>> ds_put_format(match, "ip%s.%s == %s/%d", is_ipv4 ? "4" : "6", dir, >>>> network_s, plen); >>>> diff --git a/ovn-nb.xml b/ovn-nb.xml >>>> index 5114bbc2eb9b..ad774cd4ba72 100644 >>>> --- a/ovn-nb.xml >>>> +++ b/ovn-nb.xml >>>> @@ -3782,9 +3782,9 @@ or >>>> </p> >>>> >>>> <p> >>>> - When multiple routes match a packet, the longest-prefix match is >> chosen. >>>> - For a given prefix length, a <code>dst-ip</code> route is >> preferred over >>>> - a <code>src-ip</code> route. >>>> + When multiple routes match a packet, a <code>dst-ip</code> route >> is >>>> + preferred over a <code>src-ip</code> route. Among the same type >> of >>>> + routes, the longest-prefix match is chosen. >>>> </p> >>>> >>>> <p> >>>> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at >>>> index e3b7b0cb5f49..73905adccdd0 100644 >>>> --- a/tests/ovn-northd.at >>>> +++ b/tests/ovn-northd.at >>>> @@ -6823,9 +6823,9 @@ AT_CHECK([grep -w "lr_in_ip_routing" lr0flows | >> ovn_strip_lflows], [0], [dnl >>>> table=??(lr_in_ip_routing ), priority=0 , match=(1), >> action=(drop;) >>>> table=??(lr_in_ip_routing ), priority=10300, >> match=(ct_mark.ecmp_reply_port == 1 && reg7 == 0 && ip4.dst == 1.0.0.1/32), >> action=(ip.ttl--; flags.loopback = 1; eth.src = 00:00:20:20:12:13; reg1 = >> 192.168.0.1; outport = "lr0-public"; next;) >>>> table=??(lr_in_ip_routing ), priority=10550, match=(nd_rs || >> nd_ra), action=(drop;) >>>> - table=??(lr_in_ip_routing ), priority=124 , match=(ip4.dst == >> 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 >> = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; >> flags.loopback = 1; next;) >>>> - table=??(lr_in_ip_routing ), priority=162 , match=(reg7 == 0 && >> ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] >> = 1; reg8[[16..31]] = 1; next;) >>>> - table=??(lr_in_ip_routing ), priority=324 , match=(inport == >> "lr0-public" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; >> xxreg0 = ip6.dst; xxreg1 = fe80::200:20ff:fe20:1213; eth.src = >> 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) >>>> + table=??(lr_in_ip_routing ), priority=461 , match=(ip4.dst == >> 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 >> = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; >> flags.loopback = 1; next;) >>>> + table=??(lr_in_ip_routing ), priority=483 , match=(reg7 == 0 && >> ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] >> = 1; reg8[[16..31]] = 1; next;) >>>> + table=??(lr_in_ip_routing ), priority=581 , match=(inport == >> "lr0-public" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; >> xxreg0 = ip6.dst; xxreg1 = fe80::200:20ff:fe20:1213; eth.src = >> 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) >>>> ]) >>>> >>>> AT_CHECK([grep -e "lr_in_ip_routing_ecmp" lr0flows | >> ovn_strip_lflows], [0], [dnl >>>> @@ -6841,9 +6841,9 @@ AT_CHECK([grep -w "lr_in_ip_routing" lr0flows | >> ovn_strip_lflows], [0], [dnl >>>> table=??(lr_in_ip_routing ), priority=0 , match=(1), >> action=(drop;) >>>> table=??(lr_in_ip_routing ), priority=10300, >> match=(ct_mark.ecmp_reply_port == 1 && reg7 == 0 && ip4.dst == 1.0.0.1/32), >> action=(ip.ttl--; flags.loopback = 1; eth.src = 00:00:20:20:12:13; reg1 = >> 192.168.0.1; outport = "lr0-public"; next;) >>>> table=??(lr_in_ip_routing ), priority=10550, match=(nd_rs || >> nd_ra), action=(drop;) >>>> - table=??(lr_in_ip_routing ), priority=124 , match=(ip4.dst == >> 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 >> = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; >> flags.loopback = 1; next;) >>>> - table=??(lr_in_ip_routing ), priority=162 , match=(reg7 == 0 && >> ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] >> = 1; reg8[[16..31]] = select(1, 2);) >>>> - table=??(lr_in_ip_routing ), priority=324 , match=(inport == >> "lr0-public" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; >> xxreg0 = ip6.dst; xxreg1 = fe80::200:20ff:fe20:1213; eth.src = >> 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) >>>> + table=??(lr_in_ip_routing ), priority=461 , match=(ip4.dst == >> 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 >> = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; >> flags.loopback = 1; next;) >>>> + table=??(lr_in_ip_routing ), priority=483 , match=(reg7 == 0 && >> ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] >> = 1; reg8[[16..31]] = select(1, 2);) >>>> + table=??(lr_in_ip_routing ), priority=581 , match=(inport == >> "lr0-public" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; >> xxreg0 = ip6.dst; xxreg1 = fe80::200:20ff:fe20:1213; eth.src = >> 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) >>>> ]) >>>> AT_CHECK([grep -e "lr_in_ip_routing_ecmp" lr0flows | sed >> 's/192\.168\.0\..0/192.168.0.??/' | ovn_strip_lflows], [0], [dnl >>>> table=??(lr_in_ip_routing_ecmp), priority=0 , match=(1), >> action=(drop;) >>>> @@ -6870,9 +6870,9 @@ AT_CHECK([grep -w "lr_in_ip_routing" lr0flows | >> ovn_strip_lflows], [0], [dnl >>>> table=??(lr_in_ip_routing ), priority=0 , match=(1), >> action=(drop;) >>>> table=??(lr_in_ip_routing ), priority=10300, >> match=(ct_mark.ecmp_reply_port == 1 && reg7 == 0 && ip4.dst == 1.0.0.1/32), >> action=(ip.ttl--; flags.loopback = 1; eth.src = 00:00:20:20:12:13; reg1 = >> 192.168.0.1; outport = "lr0-public"; next;) >>>> table=??(lr_in_ip_routing ), priority=10550, match=(nd_rs || >> nd_ra), action=(drop;) >>>> - table=??(lr_in_ip_routing ), priority=124 , match=(ip4.dst == >> 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 >> = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; >> flags.loopback = 1; next;) >>>> - table=??(lr_in_ip_routing ), priority=162 , match=(reg7 == 0 && >> ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] >> = 1; reg8[[16..31]] = select(1, 2);) >>>> - table=??(lr_in_ip_routing ), priority=324 , match=(inport == >> "lr0-public" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; >> xxreg0 = ip6.dst; xxreg1 = fe80::200:20ff:fe20:1213; eth.src = >> 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) >>>> + table=??(lr_in_ip_routing ), priority=461 , match=(ip4.dst == >> 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 >> = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; >> flags.loopback = 1; next;) >>>> + table=??(lr_in_ip_routing ), priority=483 , match=(reg7 == 0 && >> ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] >> = 1; reg8[[16..31]] = select(1, 2);) >>>> + table=??(lr_in_ip_routing ), priority=581 , match=(inport == >> "lr0-public" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; >> xxreg0 = ip6.dst; xxreg1 = fe80::200:20ff:fe20:1213; eth.src = >> 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) >>>> ]) >>>> AT_CHECK([grep -e "lr_in_ip_routing_ecmp" lr0flows | sed >> 's/192\.168\.0\..0/192.168.0.??/' | ovn_strip_lflows], [0], [dnl >>>> table=??(lr_in_ip_routing_ecmp), priority=0 , match=(1), >> action=(drop;) >>>> @@ -6888,14 +6888,14 @@ check ovn-nbctl --wait=sb lr-route-add lr0 >> 1.0.0.0/24 192.168.0.10 >>>> ovn-sbctl dump-flows lr0 > lr0flows >>>> >>>> AT_CHECK([grep -e "lr_in_ip_routing.*192.168.0.10" lr0flows | >> ovn_strip_lflows], [0], [dnl >>>> - table=??(lr_in_ip_routing ), priority=122 , match=(reg7 == 0 && >> ip4.dst == 1.0.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = >> 192.168.0.10; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = >> "lr0-public"; flags.loopback = 1; next;) >>>> + table=??(lr_in_ip_routing ), priority=459 , match=(reg7 == 0 && >> ip4.dst == 1.0.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = >> 192.168.0.10; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = >> "lr0-public"; flags.loopback = 1; next;) >>>> ]) >>>> >>>> check ovn-nbctl --wait=sb lr-route-add lr0 2.0.0.0/24 lr0-public >>>> >>>> ovn-sbctl dump-flows lr0 > lr0flows >>>> AT_CHECK([grep -e "lr_in_ip_routing.*2.0.0.0" lr0flows | >> ovn_strip_lflows], [0], [dnl >>>> - table=??(lr_in_ip_routing ), priority=122 , match=(reg7 == 0 && >> ip4.dst == 2.0.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = >> ip4.dst; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = >> "lr0-public"; flags.loopback = 1; next;) >>>> + table=??(lr_in_ip_routing ), priority=459 , match=(reg7 == 0 && >> ip4.dst == 2.0.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = >> ip4.dst; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = >> "lr0-public"; flags.loopback = 1; next;) >>>> ]) >>>> >>>> check ovn-nbctl lr-route-add lr0 3.3.0.0/16 192.168.0.11 >>>> @@ -6910,7 +6910,7 @@ check ovn-nbctl set logical_router_static_route >> $route2_uuid selection_fields="i >>>> check ovn-nbctl --wait=sb sync >>>> ovn-sbctl dump-flows lr0 > lr0flows >>>> AT_CHECK([grep -e "(lr_in_ip_routing ).*3.3.0.0" lr0flows | sed >> 's/table=../table=??/' | sort], [0], [dnl >>>> - table=??(lr_in_ip_routing ), priority=82 , match=(reg7 == 0 && >> ip4.dst == 3.3.0.0/16), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] >> = 1; reg8[[16..31]] = select(values=(1, 2); >> hash_fields="ip_dst,ip_proto,ip_src");) >>>> + table=??(lr_in_ip_routing ), priority=435 , match=(reg7 == 0 && >> ip4.dst == 3.3.0.0/16), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] >> = 1; reg8[[16..31]] = select(values=(1, 2); >> hash_fields="ip_dst,ip_proto,ip_src");) >>>> ]) >>>> >>>> check ovn-nbctl set logical_router_static_route $route1_uuid >> selection_fields="ip_src,ip_dst,tp_src,tp_dst" >>>> @@ -6919,10 +6919,10 @@ check ovn-nbctl set logical_router_static_route >> $route2_uuid selection_fields="i >>>> check ovn-nbctl --wait=sb sync >>>> ovn-sbctl dump-flows lr0 > lr0flows >>>> AT_CHECK([grep -e "(lr_in_ip_routing ).*3.3.0.0" lr0flows | sed >> 's/table=../table=??/' | sort], [0], [dnl >>>> - table=??(lr_in_ip_routing ), priority=82 , match=(reg7 == 0 && >> ip4.dst == 3.3.0.0/16), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] >> = 1; reg8[[16..31]] = select(values=(1, 2); >> hash_fields="ip_dst,ip_proto,ip_src");) >>>> - table=??(lr_in_ip_routing ), priority=83 , match=(reg7 == 0 && >> ip4.dst == 3.3.0.0/16 && sctp), action=(ip.ttl--; flags.loopback = 1; >> reg8[[0..15]] = 1; reg8[[16..31]] = select(values=(1, 2); >> hash_fields="ip_dst,ip_proto,ip_src,sctp_dst,sctp_src");) >>>> - table=??(lr_in_ip_routing ), priority=83 , match=(reg7 == 0 && >> ip4.dst == 3.3.0.0/16 && tcp), action=(ip.ttl--; flags.loopback = 1; >> reg8[[0..15]] = 1; reg8[[16..31]] = select(values=(1, 2); >> hash_fields="ip_dst,ip_proto,ip_src,tcp_dst,tcp_src");) >>>> - table=??(lr_in_ip_routing ), priority=83 , match=(reg7 == 0 && >> ip4.dst == 3.3.0.0/16 && udp), action=(ip.ttl--; flags.loopback = 1; >> reg8[[0..15]] = 1; reg8[[16..31]] = select(values=(1, 2); >> hash_fields="ip_dst,ip_proto,ip_src,udp_dst,udp_src");) >>>> + table=??(lr_in_ip_routing ), priority=435 , match=(reg7 == 0 && >> ip4.dst == 3.3.0.0/16), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] >> = 1; reg8[[16..31]] = select(values=(1, 2); >> hash_fields="ip_dst,ip_proto,ip_src");) >>>> + table=??(lr_in_ip_routing ), priority=436 , match=(reg7 == 0 && >> ip4.dst == 3.3.0.0/16 && sctp), action=(ip.ttl--; flags.loopback = 1; >> reg8[[0..15]] = 1; reg8[[16..31]] = select(values=(1, 2); >> hash_fields="ip_dst,ip_proto,ip_src,sctp_dst,sctp_src");) >>>> + table=??(lr_in_ip_routing ), priority=436 , match=(reg7 == 0 && >> ip4.dst == 3.3.0.0/16 && tcp), action=(ip.ttl--; flags.loopback = 1; >> reg8[[0..15]] = 1; reg8[[16..31]] = select(values=(1, 2); >> hash_fields="ip_dst,ip_proto,ip_src,tcp_dst,tcp_src");) >>>> + table=??(lr_in_ip_routing ), priority=436 , match=(reg7 == 0 && >> ip4.dst == 3.3.0.0/16 && udp), action=(ip.ttl--; flags.loopback = 1; >> reg8[[0..15]] = 1; reg8[[16..31]] = select(values=(1, 2); >> hash_fields="ip_dst,ip_proto,ip_src,udp_dst,udp_src");) >>>> ]) >>>> >>>> AT_CLEANUP >>>> @@ -7346,18 +7346,19 @@ AT_CHECK([grep "lr_in_ip_routing_pre" lr0flows >> | ovn_strip_lflows], [0], [dnl >>>> grep -e "(lr_in_ip_routing ).*outport" lr0flows >>>> >>>> AT_CHECK([grep -e "(lr_in_ip_routing ).*outport" lr0flows | >> ovn_strip_lflows], [0], [dnl >>>> - table=??(lr_in_ip_routing ), priority=122 , match=(reg7 == 1 && >> ip4.dst == 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = >> 192.168.1.10; reg1 = 192.168.1.1; eth.src = 00:00:00:00:01:01; outport = >> "lrp1"; flags.loopback = 1; next;) >>>> - table=??(lr_in_ip_routing ), priority=124 , match=(ip4.dst == >> 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 >> = 192.168.0.1; eth.src = 00:00:00:00:00:01; outport = "lrp0"; >> flags.loopback = 1; next;) >>>> - table=??(lr_in_ip_routing ), priority=124 , match=(ip4.dst == >> 192.168.1.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 >> = 192.168.1.1; eth.src = 00:00:00:00:01:01; outport = "lrp1"; >> flags.loopback = 1; next;) >>>> - table=??(lr_in_ip_routing ), priority=124 , match=(ip4.dst == >> 192.168.2.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 >> = 192.168.2.1; eth.src = 00:00:00:00:02:01; outport = "lrp2"; >> flags.loopback = 1; next;) >>>> - table=??(lr_in_ip_routing ), priority=162 , match=(reg7 == 2 && >> ip4.dst == 1.1.1.1/32), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = >> 192.168.0.20; reg1 = 192.168.0.1; eth.src = 00:00:00:00:00:01; outport = >> "lrp0"; flags.loopback = 1; next;) >>>> - table=??(lr_in_ip_routing ), priority=2 , match=(reg7 == 0 && >> ip4.dst == 0.0.0.0/0), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = >> 192.168.0.10; reg1 = 192.168.0.1; eth.src = 00:00:00:00:00:01; outport = >> "lrp0"; flags.loopback = 1; next;) >>>> - table=??(lr_in_ip_routing ), priority=2 , match=(reg7 == 2 && >> ip4.dst == 0.0.0.0/0), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = >> 192.168.0.10; reg1 = 192.168.0.1; eth.src = 00:00:00:00:00:01; outport = >> "lrp0"; flags.loopback = 1; next;) >>>> - table=??(lr_in_ip_routing ), priority=324 , match=(inport == >> "lrp0" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; >> xxreg0 = ip6.dst; xxreg1 = fe80::200:ff:fe00:1; eth.src = >> 00:00:00:00:00:01; outport = "lrp0"; flags.loopback = 1; next;) >>>> - table=??(lr_in_ip_routing ), priority=324 , match=(inport == >> "lrp1" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; >> xxreg0 = ip6.dst; xxreg1 = fe80::200:ff:fe00:101; eth.src = >> 00:00:00:00:01:01; outport = "lrp1"; flags.loopback = 1; next;) >>>> - table=??(lr_in_ip_routing ), priority=324 , match=(inport == >> "lrp2" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; >> xxreg0 = ip6.dst; xxreg1 = fe80::200:ff:fe00:201; eth.src = >> 00:00:00:00:02:01; outport = "lrp2"; flags.loopback = 1; next;) >>>> + table=??(lr_in_ip_routing ), priority=387 , match=(reg7 == 0 && >> ip4.dst == 0.0.0.0/0), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = >> 192.168.0.10; reg1 = 192.168.0.1; eth.src = 00:00:00:00:00:01; outport = >> "lrp0"; flags.loopback = 1; next;) >>>> + table=??(lr_in_ip_routing ), priority=387 , match=(reg7 == 2 && >> ip4.dst == 0.0.0.0/0), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = >> 192.168.0.10; reg1 = 192.168.0.1; eth.src = 00:00:00:00:00:01; outport = >> "lrp0"; flags.loopback = 1; next;) >>>> + table=??(lr_in_ip_routing ), priority=459 , match=(reg7 == 1 && >> ip4.dst == 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = >> 192.168.1.10; reg1 = 192.168.1.1; eth.src = 00:00:00:00:01:01; outport = >> "lrp1"; flags.loopback = 1; next;) >>>> + table=??(lr_in_ip_routing ), priority=461 , match=(ip4.dst == >> 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 >> = 192.168.0.1; eth.src = 00:00:00:00:00:01; outport = "lrp0"; >> flags.loopback = 1; next;) >>>> + table=??(lr_in_ip_routing ), priority=461 , match=(ip4.dst == >> 192.168.1.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 >> = 192.168.1.1; eth.src = 00:00:00:00:01:01; outport = "lrp1"; >> flags.loopback = 1; next;) >>>> + table=??(lr_in_ip_routing ), priority=461 , match=(ip4.dst == >> 192.168.2.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 >> = 192.168.2.1; eth.src = 00:00:00:00:02:01; outport = "lrp2"; >> flags.loopback = 1; next;) >>>> + table=??(lr_in_ip_routing ), priority=483 , match=(reg7 == 2 && >> ip4.dst == 1.1.1.1/32), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = >> 192.168.0.20; reg1 = 192.168.0.1; eth.src = 00:00:00:00:00:01; outport = >> "lrp0"; flags.loopback = 1; next;) >>>> + table=??(lr_in_ip_routing ), priority=581 , match=(inport == >> "lrp0" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; >> xxreg0 = ip6.dst; xxreg1 = fe80::200:ff:fe00:1; eth.src = >> 00:00:00:00:00:01; outport = "lrp0"; flags.loopback = 1; next;) >>>> + table=??(lr_in_ip_routing ), priority=581 , match=(inport == >> "lrp1" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; >> xxreg0 = ip6.dst; xxreg1 = fe80::200:ff:fe00:101; eth.src = >> 00:00:00:00:01:01; outport = "lrp1"; flags.loopback = 1; next;) >>>> + table=??(lr_in_ip_routing ), priority=581 , match=(inport == >> "lrp2" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; >> xxreg0 = ip6.dst; xxreg1 = fe80::200:ff:fe00:201; eth.src = >> 00:00:00:00:02:01; outport = "lrp2"; flags.loopback = 1; next;) >>>> ]) >>>> >>>> + >>>> AT_CLEANUP >>>> ]) >>>> >>
diff --git a/NEWS b/NEWS index da3aba739ced..74ee23b6473b 100644 --- a/NEWS +++ b/NEWS @@ -4,6 +4,8 @@ Post v24.09.0 hash (with specified hash fields) for ECMP routes while choosing nexthop. - ovn-ic: Add support for route tag to prevent route learning. + - Adjusted priorities of src-ip based static routes to be lower than other + types of routes regardless of prefix length. OVN v24.09.0 - 13 Sep 2024 -------------------------- diff --git a/northd/northd.c b/northd/northd.c index 2aa6c0958db4..039c20bda497 100644 --- a/northd/northd.c +++ b/northd/northd.c @@ -303,9 +303,9 @@ BUILD_ASSERT_DECL(ACL_OBS_STAGE_MAX < (1 << 2)); * same ip_prefix values: * - connected route overrides static one; * - static route overrides src-ip route. */ -#define ROUTE_PRIO_OFFSET_MULTIPLIER 5 -#define ROUTE_PRIO_OFFSET_STATIC 2 -#define ROUTE_PRIO_OFFSET_CONNECTED 4 +#define ROUTE_PRIO_OFFSET_MULTIPLIER 3 +#define ROUTE_PRIO_OFFSET_STATIC 0 +#define ROUTE_PRIO_OFFSET_CONNECTED 2 /* Returns the type of the datapath to which a flow with the given 'stage' may * be added. */ @@ -11443,6 +11443,7 @@ build_route_match(const struct ovn_port *op_inport, uint32_t rtb_id, bool has_protocol_match) { const char *dir; + int base = 0; /* The priority here is calculated to implement longest-prefix-match * routing. */ if (is_src_route) { @@ -11450,6 +11451,9 @@ build_route_match(const struct ovn_port *op_inport, uint32_t rtb_id, ofs = 0; } else { dir = "dst"; + /* dst routes have higher priority than all src routes regardless of + * prefix length. */ + base = (128 + 1) * ROUTE_PRIO_OFFSET_MULTIPLIER; } if (op_inport) { @@ -11462,7 +11466,7 @@ build_route_match(const struct ovn_port *op_inport, uint32_t rtb_id, if (has_protocol_match) { ofs += 1; } - *priority = (plen * ROUTE_PRIO_OFFSET_MULTIPLIER) + ofs; + *priority = base + (plen * ROUTE_PRIO_OFFSET_MULTIPLIER) + ofs; ds_put_format(match, "ip%s.%s == %s/%d", is_ipv4 ? "4" : "6", dir, network_s, plen); diff --git a/ovn-nb.xml b/ovn-nb.xml index 5114bbc2eb9b..ad774cd4ba72 100644 --- a/ovn-nb.xml +++ b/ovn-nb.xml @@ -3782,9 +3782,9 @@ or </p> <p> - When multiple routes match a packet, the longest-prefix match is chosen. - For a given prefix length, a <code>dst-ip</code> route is preferred over - a <code>src-ip</code> route. + When multiple routes match a packet, a <code>dst-ip</code> route is + preferred over a <code>src-ip</code> route. Among the same type of + routes, the longest-prefix match is chosen. </p> <p> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at index e3b7b0cb5f49..73905adccdd0 100644 --- a/tests/ovn-northd.at +++ b/tests/ovn-northd.at @@ -6823,9 +6823,9 @@ AT_CHECK([grep -w "lr_in_ip_routing" lr0flows | ovn_strip_lflows], [0], [dnl table=??(lr_in_ip_routing ), priority=0 , match=(1), action=(drop;) table=??(lr_in_ip_routing ), priority=10300, match=(ct_mark.ecmp_reply_port == 1 && reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; eth.src = 00:00:20:20:12:13; reg1 = 192.168.0.1; outport = "lr0-public"; next;) table=??(lr_in_ip_routing ), priority=10550, match=(nd_rs || nd_ra), action=(drop;) - table=??(lr_in_ip_routing ), priority=124 , match=(ip4.dst == 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) - table=??(lr_in_ip_routing ), priority=162 , match=(reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = 1; next;) - table=??(lr_in_ip_routing ), priority=324 , match=(inport == "lr0-public" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:20ff:fe20:1213; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) + table=??(lr_in_ip_routing ), priority=461 , match=(ip4.dst == 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) + table=??(lr_in_ip_routing ), priority=483 , match=(reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = 1; next;) + table=??(lr_in_ip_routing ), priority=581 , match=(inport == "lr0-public" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:20ff:fe20:1213; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) ]) AT_CHECK([grep -e "lr_in_ip_routing_ecmp" lr0flows | ovn_strip_lflows], [0], [dnl @@ -6841,9 +6841,9 @@ AT_CHECK([grep -w "lr_in_ip_routing" lr0flows | ovn_strip_lflows], [0], [dnl table=??(lr_in_ip_routing ), priority=0 , match=(1), action=(drop;) table=??(lr_in_ip_routing ), priority=10300, match=(ct_mark.ecmp_reply_port == 1 && reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; eth.src = 00:00:20:20:12:13; reg1 = 192.168.0.1; outport = "lr0-public"; next;) table=??(lr_in_ip_routing ), priority=10550, match=(nd_rs || nd_ra), action=(drop;) - table=??(lr_in_ip_routing ), priority=124 , match=(ip4.dst == 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) - table=??(lr_in_ip_routing ), priority=162 , match=(reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(1, 2);) - table=??(lr_in_ip_routing ), priority=324 , match=(inport == "lr0-public" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:20ff:fe20:1213; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) + table=??(lr_in_ip_routing ), priority=461 , match=(ip4.dst == 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) + table=??(lr_in_ip_routing ), priority=483 , match=(reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(1, 2);) + table=??(lr_in_ip_routing ), priority=581 , match=(inport == "lr0-public" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:20ff:fe20:1213; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) ]) AT_CHECK([grep -e "lr_in_ip_routing_ecmp" lr0flows | sed 's/192\.168\.0\..0/192.168.0.??/' | ovn_strip_lflows], [0], [dnl table=??(lr_in_ip_routing_ecmp), priority=0 , match=(1), action=(drop;) @@ -6870,9 +6870,9 @@ AT_CHECK([grep -w "lr_in_ip_routing" lr0flows | ovn_strip_lflows], [0], [dnl table=??(lr_in_ip_routing ), priority=0 , match=(1), action=(drop;) table=??(lr_in_ip_routing ), priority=10300, match=(ct_mark.ecmp_reply_port == 1 && reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; eth.src = 00:00:20:20:12:13; reg1 = 192.168.0.1; outport = "lr0-public"; next;) table=??(lr_in_ip_routing ), priority=10550, match=(nd_rs || nd_ra), action=(drop;) - table=??(lr_in_ip_routing ), priority=124 , match=(ip4.dst == 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) - table=??(lr_in_ip_routing ), priority=162 , match=(reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(1, 2);) - table=??(lr_in_ip_routing ), priority=324 , match=(inport == "lr0-public" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:20ff:fe20:1213; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) + table=??(lr_in_ip_routing ), priority=461 , match=(ip4.dst == 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) + table=??(lr_in_ip_routing ), priority=483 , match=(reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(1, 2);) + table=??(lr_in_ip_routing ), priority=581 , match=(inport == "lr0-public" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:20ff:fe20:1213; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) ]) AT_CHECK([grep -e "lr_in_ip_routing_ecmp" lr0flows | sed 's/192\.168\.0\..0/192.168.0.??/' | ovn_strip_lflows], [0], [dnl table=??(lr_in_ip_routing_ecmp), priority=0 , match=(1), action=(drop;) @@ -6888,14 +6888,14 @@ check ovn-nbctl --wait=sb lr-route-add lr0 1.0.0.0/24 192.168.0.10 ovn-sbctl dump-flows lr0 > lr0flows AT_CHECK([grep -e "lr_in_ip_routing.*192.168.0.10" lr0flows | ovn_strip_lflows], [0], [dnl - table=??(lr_in_ip_routing ), priority=122 , match=(reg7 == 0 && ip4.dst == 1.0.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = 192.168.0.10; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) + table=??(lr_in_ip_routing ), priority=459 , match=(reg7 == 0 && ip4.dst == 1.0.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = 192.168.0.10; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) ]) check ovn-nbctl --wait=sb lr-route-add lr0 2.0.0.0/24 lr0-public ovn-sbctl dump-flows lr0 > lr0flows AT_CHECK([grep -e "lr_in_ip_routing.*2.0.0.0" lr0flows | ovn_strip_lflows], [0], [dnl - table=??(lr_in_ip_routing ), priority=122 , match=(reg7 == 0 && ip4.dst == 2.0.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) + table=??(lr_in_ip_routing ), priority=459 , match=(reg7 == 0 && ip4.dst == 2.0.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;) ]) check ovn-nbctl lr-route-add lr0 3.3.0.0/16 192.168.0.11 @@ -6910,7 +6910,7 @@ check ovn-nbctl set logical_router_static_route $route2_uuid selection_fields="i check ovn-nbctl --wait=sb sync ovn-sbctl dump-flows lr0 > lr0flows AT_CHECK([grep -e "(lr_in_ip_routing ).*3.3.0.0" lr0flows | sed 's/table=../table=??/' | sort], [0], [dnl - table=??(lr_in_ip_routing ), priority=82 , match=(reg7 == 0 && ip4.dst == 3.3.0.0/16), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(values=(1, 2); hash_fields="ip_dst,ip_proto,ip_src");) + table=??(lr_in_ip_routing ), priority=435 , match=(reg7 == 0 && ip4.dst == 3.3.0.0/16), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(values=(1, 2); hash_fields="ip_dst,ip_proto,ip_src");) ]) check ovn-nbctl set logical_router_static_route $route1_uuid selection_fields="ip_src,ip_dst,tp_src,tp_dst" @@ -6919,10 +6919,10 @@ check ovn-nbctl set logical_router_static_route $route2_uuid selection_fields="i check ovn-nbctl --wait=sb sync ovn-sbctl dump-flows lr0 > lr0flows AT_CHECK([grep -e "(lr_in_ip_routing ).*3.3.0.0" lr0flows | sed 's/table=../table=??/' | sort], [0], [dnl - table=??(lr_in_ip_routing ), priority=82 , match=(reg7 == 0 && ip4.dst == 3.3.0.0/16), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(values=(1, 2); hash_fields="ip_dst,ip_proto,ip_src");) - table=??(lr_in_ip_routing ), priority=83 , match=(reg7 == 0 && ip4.dst == 3.3.0.0/16 && sctp), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(values=(1, 2); hash_fields="ip_dst,ip_proto,ip_src,sctp_dst,sctp_src");) - table=??(lr_in_ip_routing ), priority=83 , match=(reg7 == 0 && ip4.dst == 3.3.0.0/16 && tcp), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(values=(1, 2); hash_fields="ip_dst,ip_proto,ip_src,tcp_dst,tcp_src");) - table=??(lr_in_ip_routing ), priority=83 , match=(reg7 == 0 && ip4.dst == 3.3.0.0/16 && udp), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(values=(1, 2); hash_fields="ip_dst,ip_proto,ip_src,udp_dst,udp_src");) + table=??(lr_in_ip_routing ), priority=435 , match=(reg7 == 0 && ip4.dst == 3.3.0.0/16), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(values=(1, 2); hash_fields="ip_dst,ip_proto,ip_src");) + table=??(lr_in_ip_routing ), priority=436 , match=(reg7 == 0 && ip4.dst == 3.3.0.0/16 && sctp), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(values=(1, 2); hash_fields="ip_dst,ip_proto,ip_src,sctp_dst,sctp_src");) + table=??(lr_in_ip_routing ), priority=436 , match=(reg7 == 0 && ip4.dst == 3.3.0.0/16 && tcp), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(values=(1, 2); hash_fields="ip_dst,ip_proto,ip_src,tcp_dst,tcp_src");) + table=??(lr_in_ip_routing ), priority=436 , match=(reg7 == 0 && ip4.dst == 3.3.0.0/16 && udp), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = select(values=(1, 2); hash_fields="ip_dst,ip_proto,ip_src,udp_dst,udp_src");) ]) AT_CLEANUP @@ -7346,18 +7346,19 @@ AT_CHECK([grep "lr_in_ip_routing_pre" lr0flows | ovn_strip_lflows], [0], [dnl grep -e "(lr_in_ip_routing ).*outport" lr0flows AT_CHECK([grep -e "(lr_in_ip_routing ).*outport" lr0flows | ovn_strip_lflows], [0], [dnl - table=??(lr_in_ip_routing ), priority=122 , match=(reg7 == 1 && ip4.dst == 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = 192.168.1.10; reg1 = 192.168.1.1; eth.src = 00:00:00:00:01:01; outport = "lrp1"; flags.loopback = 1; next;) - table=??(lr_in_ip_routing ), priority=124 , match=(ip4.dst == 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.0.1; eth.src = 00:00:00:00:00:01; outport = "lrp0"; flags.loopback = 1; next;) - table=??(lr_in_ip_routing ), priority=124 , match=(ip4.dst == 192.168.1.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.1.1; eth.src = 00:00:00:00:01:01; outport = "lrp1"; flags.loopback = 1; next;) - table=??(lr_in_ip_routing ), priority=124 , match=(ip4.dst == 192.168.2.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.2.1; eth.src = 00:00:00:00:02:01; outport = "lrp2"; flags.loopback = 1; next;) - table=??(lr_in_ip_routing ), priority=162 , match=(reg7 == 2 && ip4.dst == 1.1.1.1/32), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = 192.168.0.20; reg1 = 192.168.0.1; eth.src = 00:00:00:00:00:01; outport = "lrp0"; flags.loopback = 1; next;) - table=??(lr_in_ip_routing ), priority=2 , match=(reg7 == 0 && ip4.dst == 0.0.0.0/0), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = 192.168.0.10; reg1 = 192.168.0.1; eth.src = 00:00:00:00:00:01; outport = "lrp0"; flags.loopback = 1; next;) - table=??(lr_in_ip_routing ), priority=2 , match=(reg7 == 2 && ip4.dst == 0.0.0.0/0), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = 192.168.0.10; reg1 = 192.168.0.1; eth.src = 00:00:00:00:00:01; outport = "lrp0"; flags.loopback = 1; next;) - table=??(lr_in_ip_routing ), priority=324 , match=(inport == "lrp0" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:ff:fe00:1; eth.src = 00:00:00:00:00:01; outport = "lrp0"; flags.loopback = 1; next;) - table=??(lr_in_ip_routing ), priority=324 , match=(inport == "lrp1" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:ff:fe00:101; eth.src = 00:00:00:00:01:01; outport = "lrp1"; flags.loopback = 1; next;) - table=??(lr_in_ip_routing ), priority=324 , match=(inport == "lrp2" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:ff:fe00:201; eth.src = 00:00:00:00:02:01; outport = "lrp2"; flags.loopback = 1; next;) + table=??(lr_in_ip_routing ), priority=387 , match=(reg7 == 0 && ip4.dst == 0.0.0.0/0), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = 192.168.0.10; reg1 = 192.168.0.1; eth.src = 00:00:00:00:00:01; outport = "lrp0"; flags.loopback = 1; next;) + table=??(lr_in_ip_routing ), priority=387 , match=(reg7 == 2 && ip4.dst == 0.0.0.0/0), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = 192.168.0.10; reg1 = 192.168.0.1; eth.src = 00:00:00:00:00:01; outport = "lrp0"; flags.loopback = 1; next;) + table=??(lr_in_ip_routing ), priority=459 , match=(reg7 == 1 && ip4.dst == 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = 192.168.1.10; reg1 = 192.168.1.1; eth.src = 00:00:00:00:01:01; outport = "lrp1"; flags.loopback = 1; next;) + table=??(lr_in_ip_routing ), priority=461 , match=(ip4.dst == 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.0.1; eth.src = 00:00:00:00:00:01; outport = "lrp0"; flags.loopback = 1; next;) + table=??(lr_in_ip_routing ), priority=461 , match=(ip4.dst == 192.168.1.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.1.1; eth.src = 00:00:00:00:01:01; outport = "lrp1"; flags.loopback = 1; next;) + table=??(lr_in_ip_routing ), priority=461 , match=(ip4.dst == 192.168.2.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.2.1; eth.src = 00:00:00:00:02:01; outport = "lrp2"; flags.loopback = 1; next;) + table=??(lr_in_ip_routing ), priority=483 , match=(reg7 == 2 && ip4.dst == 1.1.1.1/32), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = 192.168.0.20; reg1 = 192.168.0.1; eth.src = 00:00:00:00:00:01; outport = "lrp0"; flags.loopback = 1; next;) + table=??(lr_in_ip_routing ), priority=581 , match=(inport == "lrp0" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:ff:fe00:1; eth.src = 00:00:00:00:00:01; outport = "lrp0"; flags.loopback = 1; next;) + table=??(lr_in_ip_routing ), priority=581 , match=(inport == "lrp1" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:ff:fe00:101; eth.src = 00:00:00:00:01:01; outport = "lrp1"; flags.loopback = 1; next;) + table=??(lr_in_ip_routing ), priority=581 , match=(inport == "lrp2" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:ff:fe00:201; eth.src = 00:00:00:00:02:01; outport = "lrp2"; flags.loopback = 1; next;) ]) + AT_CLEANUP ])
The current implementation prefers dst routes over src routes only if they have the same prefix length. This is not very useful for real world use cases, because it doesn't make much sense to compare the prefix length between different fields of the IP header (dst IP v.s. src IP). The prefix length should make sense only when comparing for the same field, either dst or src. This patch changes the behavior by always prefering dst routes over src routes, regardless of the prefix length, and comparing prefix length only within the same type of routes. E.g., there are two routes: 10.0.0.0/8 nexthop A dst-ip 192.168.11.0/24 nexthop B src-ip For a packet 192.168.11.123 -> 10.1.2.3: Before: nexthop will be B After: nexthop will be A Reported-by: Girish Moodalbail <gmoodalbail@gmail.com> Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-May/050049.html Signed-off-by: Han Zhou <hzhou@ovn.org> --- NEWS | 2 ++ northd/northd.c | 12 ++++++---- ovn-nb.xml | 6 ++--- tests/ovn-northd.at | 53 +++++++++++++++++++++++---------------------- 4 files changed, 40 insertions(+), 33 deletions(-)