diff mbox series

[ovs-dev] northd: Respect --ecmp-symmetric-reply for single routes.

Message ID 20240917170010.798375-1-rosemarie@redhat.com
State Changes Requested
Headers show
Series [ovs-dev] northd: Respect --ecmp-symmetric-reply for single routes. | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/github-robot-_ovn-kubernetes fail github build: failed
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/github-robot-_ovn-kubernetes fail github build: failed

Commit Message

Rosemarie O'Riorden Sept. 17, 2024, 5 p.m. UTC
When preparing to build ECMP and static route flows, routes are sorted
into unique routes (meaning they are not part of a group) or they are
added to EMCP groups. Then, ECMP route flows are built out of the
groups, and static route flows are built out of the unique routes.
However, 'unique routes' include ones that use the
--ecmp-symmetric-reply flag, meaning that they may not be added to an
ECMP group, and thus ECMP symmetric reply would not be used for those
flows.

For example, if one route is added and traffic is started, and then
later another route is added, the already flowing traffic might be
rerouted since it wasn't conntracked initially. This could break
symmetric reply with traffic using a different next-hop than before.

This change makes it so that when the --ecmp-symmetric-reply flag is
used, even for unique routes, an ECMP group is created which they are
added to. Thus they are added to the ECMP route flow, rather than
static. This allows ECMP groups to persist even when there is only one
route.

Edited documentation to support this change.
Also updated incorrect actions in documentation.

Fixes: 4fdca656857d ("Add ECMP symmetric replies.")
Reported-at: https://issues.redhat.com/browse/FDP-786
Signed-off-by: Rosemarie O'Riorden <rosemarie@redhat.com>
---
 northd/northd.c         | 33 ++++++++++++++++++++++-----------
 northd/ovn-northd.8.xml | 13 ++++++++++++-
 tests/ovn-northd.at     |  5 ++++-
 3 files changed, 38 insertions(+), 13 deletions(-)

Comments

Rosemarie O'Riorden Sept. 17, 2024, 7:50 p.m. UTC | #1
Recheck-request: github-robot-_ovn-kubernetes

On 9/17/24 1:00 PM, Rosemarie O'Riorden wrote:
> When preparing to build ECMP and static route flows, routes are sorted
> into unique routes (meaning they are not part of a group) or they are
> added to EMCP groups. Then, ECMP route flows are built out of the
> groups, and static route flows are built out of the unique routes.
> However, 'unique routes' include ones that use the
> --ecmp-symmetric-reply flag, meaning that they may not be added to an
> ECMP group, and thus ECMP symmetric reply would not be used for those
> flows.
> 
> For example, if one route is added and traffic is started, and then
> later another route is added, the already flowing traffic might be
> rerouted since it wasn't conntracked initially. This could break
> symmetric reply with traffic using a different next-hop than before.
> 
> This change makes it so that when the --ecmp-symmetric-reply flag is
> used, even for unique routes, an ECMP group is created which they are
> added to. Thus they are added to the ECMP route flow, rather than
> static. This allows ECMP groups to persist even when there is only one
> route.
> 
> Edited documentation to support this change.
> Also updated incorrect actions in documentation.
> 
> Fixes: 4fdca656857d ("Add ECMP symmetric replies.")
> Reported-at: https://issues.redhat.com/browse/FDP-786
> Signed-off-by: Rosemarie O'Riorden <rosemarie@redhat.com>
> ---
>  northd/northd.c         | 33 ++++++++++++++++++++++-----------
>  northd/ovn-northd.8.xml | 13 ++++++++++++-
>  tests/ovn-northd.at     |  5 ++++-
>  3 files changed, 38 insertions(+), 13 deletions(-)
> 
> diff --git a/northd/northd.c b/northd/northd.c
> index 983c464eb..8ae3a75bd 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -11567,21 +11567,27 @@ build_ecmp_route_flow(struct lflow_table *lflows, struct ovn_datapath *od,
>  
>      struct ds actions = DS_EMPTY_INITIALIZER;
>      ds_put_format(&actions, "ip.ttl--; flags.loopback = 1; %s = %"PRIu16
> -                  "; %s = select(", REG_ECMP_GROUP_ID, eg->id,
> -                  REG_ECMP_MEMBER_ID);
> +                  "; %s = ", REG_ECMP_GROUP_ID, eg->id, REG_ECMP_MEMBER_ID);
>  
> -    bool is_first = true;
> -    LIST_FOR_EACH (er, list_node, &eg->route_list) {
> -        if (is_first) {
> -            is_first = false;
> -        } else {
> -            ds_put_cstr(&actions, ", ");
> +    if (!ovs_list_is_singleton(&eg->route_list)) {
> +        bool is_first = true;
> +
> +        ds_put_cstr(&actions, "select(");
> +        LIST_FOR_EACH (er, list_node, &eg->route_list) {
> +            if (is_first) {
> +                is_first = false;
> +            } else {
> +                ds_put_cstr(&actions, ", ");
> +            }
> +            ds_put_format(&actions, "%"PRIu16, er->id);
>          }
> -        ds_put_format(&actions, "%"PRIu16, er->id);
> +        ds_put_cstr(&actions, ");");
> +    } else {
> +        er = CONTAINER_OF(ovs_list_front(&eg->route_list),
> +                          struct ecmp_route_list_node, list_node);
> +        ds_put_format(&actions, "%"PRIu16"; next;", er->id);
>      }
>  
> -    ds_put_cstr(&actions, ");");
> -
>      ovn_lflow_add(lflows, od, S_ROUTER_IN_IP_ROUTING, priority,
>                    ds_cstr(&route_match), ds_cstr(&actions),
>                    lflow_ref);
> @@ -13543,6 +13549,11 @@ build_static_route_flows_for_lrouter(
>                  if (group) {
>                      ecmp_groups_add_route(group, route);
>                  }
> +            } else if (route->ecmp_symmetric_reply) {
> +                /* Traffic for symmetric reply routes has to be conntracked
> +                 * even if there is only one next-hop, in case another next-hop
> +                 * is added later. */
> +                ecmp_groups_add(&ecmp_groups, route);
>              } else {
>                  unique_routes_add(&unique_routes, route);
>              }
> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
> index ede38882a..ef5cd0c8c 100644
> --- a/northd/ovn-northd.8.xml
> +++ b/northd/ovn-northd.8.xml
> @@ -4328,7 +4328,18 @@ next;
>  ip.ttl--;
>  flags.loopback = 1;
>  reg8[0..15] = <var>GID</var>;
> -select(reg8[16..31], <var>MID1</var>, <var>MID2</var>, ...);
> +reg8[16..31] = select(<var>MID1</var>, <var>MID2</var>, ...);
> +        </pre>
> +        <p>
> +          However, when there is only one route in an ECMP group, group actions
> +          will be:
> +        </p>
> +
> +        <pre>
> +ip.ttl--;
> +flags.loopback = 1;
> +reg8[0..15] = <var>GID</var>;
> +reg8[16..31] = <var>MID1</var>);
>          </pre>
>        </li>
>  
> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> index dcc3dbbc3..d459c23c0 100644
> --- a/tests/ovn-northd.at
> +++ b/tests/ovn-northd.at
> @@ -6799,20 +6799,23 @@ check ovn-nbctl lsp-set-type public-lr0 router
>  check ovn-nbctl lsp-set-addresses public-lr0 router
>  check ovn-nbctl lsp-set-options public-lr0 router-port=lr0-public
>  
> +# ECMP flows will be added even if there is only one next-hop.
>  check ovn-nbctl --wait=sb --ecmp-symmetric-reply lr-route-add lr0 1.0.0.1 192.168.0.10
>  
>  ovn-sbctl dump-flows lr0 > lr0flows
>  
>  AT_CHECK([grep -w "lr_in_ip_routing" lr0flows | ovn_strip_lflows], [0], [dnl
>    table=??(lr_in_ip_routing   ), priority=0    , match=(1), action=(drop;)
> +  table=??(lr_in_ip_routing   ), priority=10300, match=(ct_mark.ecmp_reply_port == 1 && reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; eth.src = 00:00:20:20:12:13; reg1 = 192.168.0.1; outport = "lr0-public"; next;)
>    table=??(lr_in_ip_routing   ), priority=10550, match=(nd_rs || nd_ra), action=(drop;)
>    table=??(lr_in_ip_routing   ), priority=194  , match=(inport == "lr0-public" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:20ff:fe20:1213; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;)
>    table=??(lr_in_ip_routing   ), priority=74   , match=(ip4.dst == 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;)
> -  table=??(lr_in_ip_routing   ), priority=97   , match=(reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = 192.168.0.10; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;)
> +  table=??(lr_in_ip_routing   ), priority=97   , match=(reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = 1; next;)
>  ])
>  
>  AT_CHECK([grep -e "lr_in_ip_routing_ecmp" lr0flows | ovn_strip_lflows], [0], [dnl
>    table=??(lr_in_ip_routing_ecmp), priority=0    , match=(1), action=(drop;)
> +  table=??(lr_in_ip_routing_ecmp), priority=100  , match=(reg8[[0..15]] == 1 && reg8[[16..31]] == 1), action=(reg0 = 192.168.0.10; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; next;)
>    table=??(lr_in_ip_routing_ecmp), priority=150  , match=(reg8[[0..15]] == 0), action=(next;)
>  ])
>
Dumitru Ceara Sept. 18, 2024, 10:06 a.m. UTC | #2
On 9/17/24 19:00, Rosemarie O'Riorden wrote:
> When preparing to build ECMP and static route flows, routes are sorted
> into unique routes (meaning they are not part of a group) or they are
> added to EMCP groups. Then, ECMP route flows are built out of the
> groups, and static route flows are built out of the unique routes.
> However, 'unique routes' include ones that use the
> --ecmp-symmetric-reply flag, meaning that they may not be added to an
> ECMP group, and thus ECMP symmetric reply would not be used for those
> flows.
> 
> For example, if one route is added and traffic is started, and then
> later another route is added, the already flowing traffic might be
> rerouted since it wasn't conntracked initially. This could break
> symmetric reply with traffic using a different next-hop than before.
> 
> This change makes it so that when the --ecmp-symmetric-reply flag is
> used, even for unique routes, an ECMP group is created which they are
> added to. Thus they are added to the ECMP route flow, rather than
> static. This allows ECMP groups to persist even when there is only one
> route.
> 
> Edited documentation to support this change.
> Also updated incorrect actions in documentation.
> 
> Fixes: 4fdca656857d ("Add ECMP symmetric replies.")
> Reported-at: https://issues.redhat.com/browse/FDP-786
> Signed-off-by: Rosemarie O'Riorden <rosemarie@redhat.com>
> ---

Hi Rosemarie,

Thanks for the fix!

>  northd/northd.c         | 33 ++++++++++++++++++++++-----------
>  northd/ovn-northd.8.xml | 13 ++++++++++++-
>  tests/ovn-northd.at     |  5 ++++-
>  3 files changed, 38 insertions(+), 13 deletions(-)
> 
> diff --git a/northd/northd.c b/northd/northd.c
> index 983c464eb..8ae3a75bd 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -11567,21 +11567,27 @@ build_ecmp_route_flow(struct lflow_table *lflows, struct ovn_datapath *od,
>  
>      struct ds actions = DS_EMPTY_INITIALIZER;
>      ds_put_format(&actions, "ip.ttl--; flags.loopback = 1; %s = %"PRIu16
> -                  "; %s = select(", REG_ECMP_GROUP_ID, eg->id,
> -                  REG_ECMP_MEMBER_ID);
> +                  "; %s = ", REG_ECMP_GROUP_ID, eg->id, REG_ECMP_MEMBER_ID);
>  
> -    bool is_first = true;
> -    LIST_FOR_EACH (er, list_node, &eg->route_list) {
> -        if (is_first) {
> -            is_first = false;
> -        } else {
> -            ds_put_cstr(&actions, ", ");
> +    if (!ovs_list_is_singleton(&eg->route_list)) {
> +        bool is_first = true;
> +
> +        ds_put_cstr(&actions, "select(");
> +        LIST_FOR_EACH (er, list_node, &eg->route_list) {
> +            if (is_first) {
> +                is_first = false;
> +            } else {
> +                ds_put_cstr(&actions, ", ");
> +            }
> +            ds_put_format(&actions, "%"PRIu16, er->id);
>          }
> -        ds_put_format(&actions, "%"PRIu16, er->id);
> +        ds_put_cstr(&actions, ");");
> +    } else {
> +        er = CONTAINER_OF(ovs_list_front(&eg->route_list),
> +                          struct ecmp_route_list_node, list_node);
> +        ds_put_format(&actions, "%"PRIu16"; next;", er->id);
>      }
>  
> -    ds_put_cstr(&actions, ");");
> -

Isn't it simpler to just allow "select(<single-value>)" logical actions?
 It doesn't feel too wrong to allow select to select a value from a set
that only contains a single value.

As far as I can tell, from a performance perspective this doesn't make
much difference.  Also, in general, we expect "ecmp routes with a single
path" to be a transient state, e.g., due to maintenance some paths are
brought down for a finite duration of time.

We do have to remove the restriction in `parse_select_action()` to allow
ovn-controller to parse actions of form "select(<single-value>)".

Then we don't need this special treatment for the single next hop case here.

>      ovn_lflow_add(lflows, od, S_ROUTER_IN_IP_ROUTING, priority,
>                    ds_cstr(&route_match), ds_cstr(&actions),
>                    lflow_ref);
> @@ -13543,6 +13549,11 @@ build_static_route_flows_for_lrouter(
>                  if (group) {
>                      ecmp_groups_add_route(group, route);
>                  }
> +            } else if (route->ecmp_symmetric_reply) {
> +                /* Traffic for symmetric reply routes has to be conntracked
> +                 * even if there is only one next-hop, in case another next-hop
> +                 * is added later. */
> +                ecmp_groups_add(&ecmp_groups, route);

I think we can simplify the fix to only this part if we adapt the
"select()" action as described above.

>              } else {
>                  unique_routes_add(&unique_routes, route);
>              }
> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
> index ede38882a..ef5cd0c8c 100644
> --- a/northd/ovn-northd.8.xml
> +++ b/northd/ovn-northd.8.xml
> @@ -4328,7 +4328,18 @@ next;
>  ip.ttl--;
>  flags.loopback = 1;
>  reg8[0..15] = <var>GID</var>;
> -select(reg8[16..31], <var>MID1</var>, <var>MID2</var>, ...);
> +reg8[16..31] = select(<var>MID1</var>, <var>MID2</var>, ...);
> +        </pre>
> +        <p>
> +          However, when there is only one route in an ECMP group, group actions
> +          will be:
> +        </p>
> +
> +        <pre>
> +ip.ttl--;
> +flags.loopback = 1;
> +reg8[0..15] = <var>GID</var>;
> +reg8[16..31] = <var>MID1</var>);
>          </pre>
>        </li>
>  
> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> index dcc3dbbc3..d459c23c0 100644
> --- a/tests/ovn-northd.at
> +++ b/tests/ovn-northd.at
> @@ -6799,20 +6799,23 @@ check ovn-nbctl lsp-set-type public-lr0 router
>  check ovn-nbctl lsp-set-addresses public-lr0 router
>  check ovn-nbctl lsp-set-options public-lr0 router-port=lr0-public
>  
> +# ECMP flows will be added even if there is only one next-hop.
>  check ovn-nbctl --wait=sb --ecmp-symmetric-reply lr-route-add lr0 1.0.0.1 192.168.0.10
>  
>  ovn-sbctl dump-flows lr0 > lr0flows
>  
>  AT_CHECK([grep -w "lr_in_ip_routing" lr0flows | ovn_strip_lflows], [0], [dnl
>    table=??(lr_in_ip_routing   ), priority=0    , match=(1), action=(drop;)
> +  table=??(lr_in_ip_routing   ), priority=10300, match=(ct_mark.ecmp_reply_port == 1 && reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; eth.src = 00:00:20:20:12:13; reg1 = 192.168.0.1; outport = "lr0-public"; next;)
>    table=??(lr_in_ip_routing   ), priority=10550, match=(nd_rs || nd_ra), action=(drop;)
>    table=??(lr_in_ip_routing   ), priority=194  , match=(inport == "lr0-public" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:20ff:fe20:1213; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;)
>    table=??(lr_in_ip_routing   ), priority=74   , match=(ip4.dst == 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;)
> -  table=??(lr_in_ip_routing   ), priority=97   , match=(reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = 192.168.0.10; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;)
> +  table=??(lr_in_ip_routing   ), priority=97   , match=(reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = 1; next;)
>  ])
>  
>  AT_CHECK([grep -e "lr_in_ip_routing_ecmp" lr0flows | ovn_strip_lflows], [0], [dnl
>    table=??(lr_in_ip_routing_ecmp), priority=0    , match=(1), action=(drop;)
> +  table=??(lr_in_ip_routing_ecmp), priority=100  , match=(reg8[[0..15]] == 1 && reg8[[16..31]] == 1), action=(reg0 = 192.168.0.10; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; next;)
>    table=??(lr_in_ip_routing_ecmp), priority=150  , match=(reg8[[0..15]] == 0), action=(next;)
>  ])
>  

Thanks for updating the test but would it also be possible to update the
"ECMP symmetric reply" and "ECMP IPv6 symmetric reply" system tests in
system-ovn.at?

We could try to ensure that when we delete one of the two equal cost
paths traffic still flows and conntrack label (and mark?) are correctly set.

What do you think?

Thanks,
Dumitru
Ilya Maximets Sept. 18, 2024, 10:26 a.m. UTC | #3
On 9/18/24 12:06, Dumitru Ceara wrote:
> On 9/17/24 19:00, Rosemarie O'Riorden wrote:
>> When preparing to build ECMP and static route flows, routes are sorted
>> into unique routes (meaning they are not part of a group) or they are
>> added to EMCP groups. Then, ECMP route flows are built out of the
>> groups, and static route flows are built out of the unique routes.
>> However, 'unique routes' include ones that use the
>> --ecmp-symmetric-reply flag, meaning that they may not be added to an
>> ECMP group, and thus ECMP symmetric reply would not be used for those
>> flows.
>>
>> For example, if one route is added and traffic is started, and then
>> later another route is added, the already flowing traffic might be
>> rerouted since it wasn't conntracked initially. This could break
>> symmetric reply with traffic using a different next-hop than before.
>>
>> This change makes it so that when the --ecmp-symmetric-reply flag is
>> used, even for unique routes, an ECMP group is created which they are
>> added to. Thus they are added to the ECMP route flow, rather than
>> static. This allows ECMP groups to persist even when there is only one
>> route.
>>
>> Edited documentation to support this change.
>> Also updated incorrect actions in documentation.
>>
>> Fixes: 4fdca656857d ("Add ECMP symmetric replies.")
>> Reported-at: https://issues.redhat.com/browse/FDP-786
>> Signed-off-by: Rosemarie O'Riorden <rosemarie@redhat.com>
>> ---
> 
> Hi Rosemarie,
> 
> Thanks for the fix!
> 
>>  northd/northd.c         | 33 ++++++++++++++++++++++-----------
>>  northd/ovn-northd.8.xml | 13 ++++++++++++-
>>  tests/ovn-northd.at     |  5 ++++-
>>  3 files changed, 38 insertions(+), 13 deletions(-)
>>
>> diff --git a/northd/northd.c b/northd/northd.c
>> index 983c464eb..8ae3a75bd 100644
>> --- a/northd/northd.c
>> +++ b/northd/northd.c
>> @@ -11567,21 +11567,27 @@ build_ecmp_route_flow(struct lflow_table *lflows, struct ovn_datapath *od,
>>  
>>      struct ds actions = DS_EMPTY_INITIALIZER;
>>      ds_put_format(&actions, "ip.ttl--; flags.loopback = 1; %s = %"PRIu16
>> -                  "; %s = select(", REG_ECMP_GROUP_ID, eg->id,
>> -                  REG_ECMP_MEMBER_ID);
>> +                  "; %s = ", REG_ECMP_GROUP_ID, eg->id, REG_ECMP_MEMBER_ID);
>>  
>> -    bool is_first = true;
>> -    LIST_FOR_EACH (er, list_node, &eg->route_list) {
>> -        if (is_first) {
>> -            is_first = false;
>> -        } else {
>> -            ds_put_cstr(&actions, ", ");
>> +    if (!ovs_list_is_singleton(&eg->route_list)) {
>> +        bool is_first = true;
>> +
>> +        ds_put_cstr(&actions, "select(");
>> +        LIST_FOR_EACH (er, list_node, &eg->route_list) {
>> +            if (is_first) {
>> +                is_first = false;
>> +            } else {
>> +                ds_put_cstr(&actions, ", ");
>> +            }
>> +            ds_put_format(&actions, "%"PRIu16, er->id);
>>          }
>> -        ds_put_format(&actions, "%"PRIu16, er->id);
>> +        ds_put_cstr(&actions, ");");
>> +    } else {
>> +        er = CONTAINER_OF(ovs_list_front(&eg->route_list),
>> +                          struct ecmp_route_list_node, list_node);
>> +        ds_put_format(&actions, "%"PRIu16"; next;", er->id);
>>      }
>>  
>> -    ds_put_cstr(&actions, ");");
>> -
> 
> Isn't it simpler to just allow "select(<single-value>)" logical actions?
>  It doesn't feel too wrong to allow select to select a value from a set
> that only contains a single value.
> 
> As far as I can tell, from a performance perspective this doesn't make
> much difference.  Also, in general, we expect "ecmp routes with a single
> path" to be a transient state, e.g., due to maintenance some paths are
> brought down for a finite duration of time.
> 
> We do have to remove the restriction in `parse_select_action()` to allow
> ovn-controller to parse actions of form "select(<single-value>)".
> 
> Then we don't need this special treatment for the single next hop case here.

Wouldn't this require even more special treatment in a form of a
feature flag that ovn-controller reports so we don't generate actions
it doesn't understand?

Might be painful, especially if we're going to backport this fix
to stable releases.

Best regards, Ilya Maximets.
Ilya Maximets Sept. 18, 2024, 10:31 a.m. UTC | #4
On 9/18/24 12:26, Ilya Maximets wrote:
> On 9/18/24 12:06, Dumitru Ceara wrote:
>> On 9/17/24 19:00, Rosemarie O'Riorden wrote:
>>> When preparing to build ECMP and static route flows, routes are sorted
>>> into unique routes (meaning they are not part of a group) or they are
>>> added to EMCP groups. Then, ECMP route flows are built out of the
>>> groups, and static route flows are built out of the unique routes.
>>> However, 'unique routes' include ones that use the
>>> --ecmp-symmetric-reply flag, meaning that they may not be added to an
>>> ECMP group, and thus ECMP symmetric reply would not be used for those
>>> flows.
>>>
>>> For example, if one route is added and traffic is started, and then
>>> later another route is added, the already flowing traffic might be
>>> rerouted since it wasn't conntracked initially. This could break
>>> symmetric reply with traffic using a different next-hop than before.
>>>
>>> This change makes it so that when the --ecmp-symmetric-reply flag is
>>> used, even for unique routes, an ECMP group is created which they are
>>> added to. Thus they are added to the ECMP route flow, rather than
>>> static. This allows ECMP groups to persist even when there is only one
>>> route.
>>>
>>> Edited documentation to support this change.
>>> Also updated incorrect actions in documentation.
>>>
>>> Fixes: 4fdca656857d ("Add ECMP symmetric replies.")
>>> Reported-at: https://issues.redhat.com/browse/FDP-786
>>> Signed-off-by: Rosemarie O'Riorden <rosemarie@redhat.com>
>>> ---
>>
>> Hi Rosemarie,
>>
>> Thanks for the fix!
>>
>>>  northd/northd.c         | 33 ++++++++++++++++++++++-----------
>>>  northd/ovn-northd.8.xml | 13 ++++++++++++-
>>>  tests/ovn-northd.at     |  5 ++++-
>>>  3 files changed, 38 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/northd/northd.c b/northd/northd.c
>>> index 983c464eb..8ae3a75bd 100644
>>> --- a/northd/northd.c
>>> +++ b/northd/northd.c
>>> @@ -11567,21 +11567,27 @@ build_ecmp_route_flow(struct lflow_table *lflows, struct ovn_datapath *od,
>>>  
>>>      struct ds actions = DS_EMPTY_INITIALIZER;
>>>      ds_put_format(&actions, "ip.ttl--; flags.loopback = 1; %s = %"PRIu16
>>> -                  "; %s = select(", REG_ECMP_GROUP_ID, eg->id,
>>> -                  REG_ECMP_MEMBER_ID);
>>> +                  "; %s = ", REG_ECMP_GROUP_ID, eg->id, REG_ECMP_MEMBER_ID);
>>>  
>>> -    bool is_first = true;
>>> -    LIST_FOR_EACH (er, list_node, &eg->route_list) {
>>> -        if (is_first) {
>>> -            is_first = false;
>>> -        } else {
>>> -            ds_put_cstr(&actions, ", ");
>>> +    if (!ovs_list_is_singleton(&eg->route_list)) {
>>> +        bool is_first = true;
>>> +
>>> +        ds_put_cstr(&actions, "select(");
>>> +        LIST_FOR_EACH (er, list_node, &eg->route_list) {
>>> +            if (is_first) {
>>> +                is_first = false;
>>> +            } else {
>>> +                ds_put_cstr(&actions, ", ");
>>> +            }
>>> +            ds_put_format(&actions, "%"PRIu16, er->id);
>>>          }
>>> -        ds_put_format(&actions, "%"PRIu16, er->id);
>>> +        ds_put_cstr(&actions, ");");
>>> +    } else {
>>> +        er = CONTAINER_OF(ovs_list_front(&eg->route_list),
>>> +                          struct ecmp_route_list_node, list_node);
>>> +        ds_put_format(&actions, "%"PRIu16"; next;", er->id);
>>>      }
>>>  
>>> -    ds_put_cstr(&actions, ");");
>>> -
>>
>> Isn't it simpler to just allow "select(<single-value>)" logical actions?
>>  It doesn't feel too wrong to allow select to select a value from a set
>> that only contains a single value.
>>
>> As far as I can tell, from a performance perspective this doesn't make
>> much difference.  Also, in general, we expect "ecmp routes with a single
>> path" to be a transient state, e.g., due to maintenance some paths are
>> brought down for a finite duration of time.
>>
>> We do have to remove the restriction in `parse_select_action()` to allow
>> ovn-controller to parse actions of form "select(<single-value>)".
>>
>> Then we don't need this special treatment for the single next hop case here.
> 
> Wouldn't this require even more special treatment in a form of a
> feature flag that ovn-controller reports so we don't generate actions
> it doesn't understand?
> 
> Might be painful, especially if we're going to backport this fix
> to stable releases.

Also, there is already a similar code in northd for router policies.  So,
it would make more sense to change both at once outside of a bug fix.

> 
> Best regards, Ilya Maximets.
Dumitru Ceara Sept. 18, 2024, 10:39 a.m. UTC | #5
On 9/18/24 12:31, Ilya Maximets wrote:
> On 9/18/24 12:26, Ilya Maximets wrote:
>> On 9/18/24 12:06, Dumitru Ceara wrote:
>>> On 9/17/24 19:00, Rosemarie O'Riorden wrote:
>>>> When preparing to build ECMP and static route flows, routes are sorted
>>>> into unique routes (meaning they are not part of a group) or they are
>>>> added to EMCP groups. Then, ECMP route flows are built out of the
>>>> groups, and static route flows are built out of the unique routes.
>>>> However, 'unique routes' include ones that use the
>>>> --ecmp-symmetric-reply flag, meaning that they may not be added to an
>>>> ECMP group, and thus ECMP symmetric reply would not be used for those
>>>> flows.
>>>>
>>>> For example, if one route is added and traffic is started, and then
>>>> later another route is added, the already flowing traffic might be
>>>> rerouted since it wasn't conntracked initially. This could break
>>>> symmetric reply with traffic using a different next-hop than before.
>>>>
>>>> This change makes it so that when the --ecmp-symmetric-reply flag is
>>>> used, even for unique routes, an ECMP group is created which they are
>>>> added to. Thus they are added to the ECMP route flow, rather than
>>>> static. This allows ECMP groups to persist even when there is only one
>>>> route.
>>>>
>>>> Edited documentation to support this change.
>>>> Also updated incorrect actions in documentation.
>>>>
>>>> Fixes: 4fdca656857d ("Add ECMP symmetric replies.")
>>>> Reported-at: https://issues.redhat.com/browse/FDP-786
>>>> Signed-off-by: Rosemarie O'Riorden <rosemarie@redhat.com>
>>>> ---
>>>
>>> Hi Rosemarie,
>>>
>>> Thanks for the fix!
>>>
>>>>  northd/northd.c         | 33 ++++++++++++++++++++++-----------
>>>>  northd/ovn-northd.8.xml | 13 ++++++++++++-
>>>>  tests/ovn-northd.at     |  5 ++++-
>>>>  3 files changed, 38 insertions(+), 13 deletions(-)
>>>>
>>>> diff --git a/northd/northd.c b/northd/northd.c
>>>> index 983c464eb..8ae3a75bd 100644
>>>> --- a/northd/northd.c
>>>> +++ b/northd/northd.c
>>>> @@ -11567,21 +11567,27 @@ build_ecmp_route_flow(struct lflow_table *lflows, struct ovn_datapath *od,
>>>>  
>>>>      struct ds actions = DS_EMPTY_INITIALIZER;
>>>>      ds_put_format(&actions, "ip.ttl--; flags.loopback = 1; %s = %"PRIu16
>>>> -                  "; %s = select(", REG_ECMP_GROUP_ID, eg->id,
>>>> -                  REG_ECMP_MEMBER_ID);
>>>> +                  "; %s = ", REG_ECMP_GROUP_ID, eg->id, REG_ECMP_MEMBER_ID);
>>>>  
>>>> -    bool is_first = true;
>>>> -    LIST_FOR_EACH (er, list_node, &eg->route_list) {
>>>> -        if (is_first) {
>>>> -            is_first = false;
>>>> -        } else {
>>>> -            ds_put_cstr(&actions, ", ");
>>>> +    if (!ovs_list_is_singleton(&eg->route_list)) {
>>>> +        bool is_first = true;
>>>> +
>>>> +        ds_put_cstr(&actions, "select(");
>>>> +        LIST_FOR_EACH (er, list_node, &eg->route_list) {
>>>> +            if (is_first) {
>>>> +                is_first = false;
>>>> +            } else {
>>>> +                ds_put_cstr(&actions, ", ");
>>>> +            }
>>>> +            ds_put_format(&actions, "%"PRIu16, er->id);
>>>>          }
>>>> -        ds_put_format(&actions, "%"PRIu16, er->id);
>>>> +        ds_put_cstr(&actions, ");");
>>>> +    } else {
>>>> +        er = CONTAINER_OF(ovs_list_front(&eg->route_list),
>>>> +                          struct ecmp_route_list_node, list_node);
>>>> +        ds_put_format(&actions, "%"PRIu16"; next;", er->id);
>>>>      }
>>>>  
>>>> -    ds_put_cstr(&actions, ");");
>>>> -
>>>
>>> Isn't it simpler to just allow "select(<single-value>)" logical actions?
>>>  It doesn't feel too wrong to allow select to select a value from a set
>>> that only contains a single value.
>>>
>>> As far as I can tell, from a performance perspective this doesn't make
>>> much difference.  Also, in general, we expect "ecmp routes with a single
>>> path" to be a transient state, e.g., due to maintenance some paths are
>>> brought down for a finite duration of time.
>>>
>>> We do have to remove the restriction in `parse_select_action()` to allow
>>> ovn-controller to parse actions of form "select(<single-value>)".
>>>
>>> Then we don't need this special treatment for the single next hop case here.
>>
>> Wouldn't this require even more special treatment in a form of a
>> feature flag that ovn-controller reports so we don't generate actions
>> it doesn't understand?
>>
>> Might be painful, especially if we're going to backport this fix
>> to stable releases.

Good point.  Although our official upgrade policy still states that
ovn-controller should be upgraded first or version pinning should be
enabled (in this case ovn-controller will not try to parse the new SB
flow until it is restarted).

https://docs.ovn.org/en/latest/intro/install/ovn-upgrades.html#upgrade-procedures

This bug report came from OpenShift and there, as far as I know, there's
a guarantee that the ovn-controller version will always be >= to the
ovn-northd version.  To my knowledge they're the only users of
ecmp-symmetric-reply but of course we can't know that 100% sure.

> 
> Also, there is already a similar code in northd for router policies.  So,
> it would make more sense to change both at once outside of a bug fix.
> 

It's a bit different there as there's no concept of "equal cost" policy
so we can't really know if the intention is to add more paths in the
future or not.  While with routes, IMO, it doesn't make sense to have a
"single path equal cost multi-path" route - except when that state is
transient.

But given that we have this similar code for policies, we might as well
do it for routes too as Rosemarie's patch does.

I'd still prefer to have system tests to cover the single path case too
though.

Regards,
Dumitru
Ilya Maximets Sept. 18, 2024, 11:14 a.m. UTC | #6
On 9/18/24 12:39, Dumitru Ceara wrote:
> On 9/18/24 12:31, Ilya Maximets wrote:
>> On 9/18/24 12:26, Ilya Maximets wrote:
>>> On 9/18/24 12:06, Dumitru Ceara wrote:
>>>> On 9/17/24 19:00, Rosemarie O'Riorden wrote:
>>>>> When preparing to build ECMP and static route flows, routes are sorted
>>>>> into unique routes (meaning they are not part of a group) or they are
>>>>> added to EMCP groups. Then, ECMP route flows are built out of the
>>>>> groups, and static route flows are built out of the unique routes.
>>>>> However, 'unique routes' include ones that use the
>>>>> --ecmp-symmetric-reply flag, meaning that they may not be added to an
>>>>> ECMP group, and thus ECMP symmetric reply would not be used for those
>>>>> flows.
>>>>>
>>>>> For example, if one route is added and traffic is started, and then
>>>>> later another route is added, the already flowing traffic might be
>>>>> rerouted since it wasn't conntracked initially. This could break
>>>>> symmetric reply with traffic using a different next-hop than before.
>>>>>
>>>>> This change makes it so that when the --ecmp-symmetric-reply flag is
>>>>> used, even for unique routes, an ECMP group is created which they are
>>>>> added to. Thus they are added to the ECMP route flow, rather than
>>>>> static. This allows ECMP groups to persist even when there is only one
>>>>> route.
>>>>>
>>>>> Edited documentation to support this change.
>>>>> Also updated incorrect actions in documentation.
>>>>>
>>>>> Fixes: 4fdca656857d ("Add ECMP symmetric replies.")
>>>>> Reported-at: https://issues.redhat.com/browse/FDP-786
>>>>> Signed-off-by: Rosemarie O'Riorden <rosemarie@redhat.com>
>>>>> ---
>>>>
>>>> Hi Rosemarie,
>>>>
>>>> Thanks for the fix!
>>>>
>>>>>  northd/northd.c         | 33 ++++++++++++++++++++++-----------
>>>>>  northd/ovn-northd.8.xml | 13 ++++++++++++-
>>>>>  tests/ovn-northd.at     |  5 ++++-
>>>>>  3 files changed, 38 insertions(+), 13 deletions(-)
>>>>>
>>>>> diff --git a/northd/northd.c b/northd/northd.c
>>>>> index 983c464eb..8ae3a75bd 100644
>>>>> --- a/northd/northd.c
>>>>> +++ b/northd/northd.c
>>>>> @@ -11567,21 +11567,27 @@ build_ecmp_route_flow(struct lflow_table *lflows, struct ovn_datapath *od,
>>>>>  
>>>>>      struct ds actions = DS_EMPTY_INITIALIZER;
>>>>>      ds_put_format(&actions, "ip.ttl--; flags.loopback = 1; %s = %"PRIu16
>>>>> -                  "; %s = select(", REG_ECMP_GROUP_ID, eg->id,
>>>>> -                  REG_ECMP_MEMBER_ID);
>>>>> +                  "; %s = ", REG_ECMP_GROUP_ID, eg->id, REG_ECMP_MEMBER_ID);
>>>>>  
>>>>> -    bool is_first = true;
>>>>> -    LIST_FOR_EACH (er, list_node, &eg->route_list) {
>>>>> -        if (is_first) {
>>>>> -            is_first = false;
>>>>> -        } else {
>>>>> -            ds_put_cstr(&actions, ", ");
>>>>> +    if (!ovs_list_is_singleton(&eg->route_list)) {
>>>>> +        bool is_first = true;
>>>>> +
>>>>> +        ds_put_cstr(&actions, "select(");
>>>>> +        LIST_FOR_EACH (er, list_node, &eg->route_list) {
>>>>> +            if (is_first) {
>>>>> +                is_first = false;
>>>>> +            } else {
>>>>> +                ds_put_cstr(&actions, ", ");
>>>>> +            }
>>>>> +            ds_put_format(&actions, "%"PRIu16, er->id);
>>>>>          }
>>>>> -        ds_put_format(&actions, "%"PRIu16, er->id);
>>>>> +        ds_put_cstr(&actions, ");");
>>>>> +    } else {
>>>>> +        er = CONTAINER_OF(ovs_list_front(&eg->route_list),
>>>>> +                          struct ecmp_route_list_node, list_node);
>>>>> +        ds_put_format(&actions, "%"PRIu16"; next;", er->id);
>>>>>      }
>>>>>  
>>>>> -    ds_put_cstr(&actions, ");");
>>>>> -
>>>>
>>>> Isn't it simpler to just allow "select(<single-value>)" logical actions?
>>>>  It doesn't feel too wrong to allow select to select a value from a set
>>>> that only contains a single value.
>>>>
>>>> As far as I can tell, from a performance perspective this doesn't make
>>>> much difference.  Also, in general, we expect "ecmp routes with a single
>>>> path" to be a transient state, e.g., due to maintenance some paths are
>>>> brought down for a finite duration of time.
>>>>
>>>> We do have to remove the restriction in `parse_select_action()` to allow
>>>> ovn-controller to parse actions of form "select(<single-value>)".
>>>>
>>>> Then we don't need this special treatment for the single next hop case here.
>>>
>>> Wouldn't this require even more special treatment in a form of a
>>> feature flag that ovn-controller reports so we don't generate actions
>>> it doesn't understand?
>>>
>>> Might be painful, especially if we're going to backport this fix
>>> to stable releases.
> 
> Good point.  Although our official upgrade policy still states that
> ovn-controller should be upgraded first or version pinning should be
> enabled (in this case ovn-controller will not try to parse the new SB
> flow until it is restarted).
> 
> https://docs.ovn.org/en/latest/intro/install/ovn-upgrades.html#upgrade-procedures

Sure, but shouldn't we also remove some of the chassis feature flags then?
The ones that rely only on ovn-controller version and not anything external
like OVS or kernel.  Some of them were added very recently with the
justification of the opposite upgrade order.  See this commit for example:
  d9c97878eb23 ("actions: New action ct_commit_to_zone.")

If the upgrade order is not enforced for new features, there is no point
doing that for bug fix patches.

Also, the upgrade order only applies to major upgrades AFAIK.  After
this fix is backported, users are free to perform minor package updates
in any order and they'll hit the issue.

> 
> This bug report came from OpenShift and there, as far as I know, there's
> a guarantee that the ovn-controller version will always be >= to the
> ovn-northd version.  To my knowledge they're the only users of
> ecmp-symmetric-reply but of course we can't know that 100% sure.

At least kube-ovn supports ecmp with symmetric reply configuration.
So, OpenShift is unlikely to be the only user.

> 
>>
>> Also, there is already a similar code in northd for router policies.  So,
>> it would make more sense to change both at once outside of a bug fix.
>>
> 
> It's a bit different there as there's no concept of "equal cost" policy
> so we can't really know if the intention is to add more paths in the
> future or not.  While with routes, IMO, it doesn't make sense to have a
> "single path equal cost multi-path" route - except when that state is
> transient.

I can see someone adding this flag to the route just in case someday
they'll need to add another one.  Totally valid use case IMO, and the
configuration will not be transient.

> 
> But given that we have this similar code for policies, we might as well
> do it for routes too as Rosemarie's patch does.

OK.

> 
> I'd still prefer to have system tests to cover the single path case too
> though.

Sure.

Best regards, Ilya Maximets.
Dumitru Ceara Sept. 18, 2024, 11:38 a.m. UTC | #7
On 9/18/24 13:14, Ilya Maximets wrote:
> On 9/18/24 12:39, Dumitru Ceara wrote:
>> On 9/18/24 12:31, Ilya Maximets wrote:
>>> On 9/18/24 12:26, Ilya Maximets wrote:
>>>> On 9/18/24 12:06, Dumitru Ceara wrote:
>>>>> On 9/17/24 19:00, Rosemarie O'Riorden wrote:
>>>>>> When preparing to build ECMP and static route flows, routes are sorted
>>>>>> into unique routes (meaning they are not part of a group) or they are
>>>>>> added to EMCP groups. Then, ECMP route flows are built out of the
>>>>>> groups, and static route flows are built out of the unique routes.
>>>>>> However, 'unique routes' include ones that use the
>>>>>> --ecmp-symmetric-reply flag, meaning that they may not be added to an
>>>>>> ECMP group, and thus ECMP symmetric reply would not be used for those
>>>>>> flows.
>>>>>>
>>>>>> For example, if one route is added and traffic is started, and then
>>>>>> later another route is added, the already flowing traffic might be
>>>>>> rerouted since it wasn't conntracked initially. This could break
>>>>>> symmetric reply with traffic using a different next-hop than before.
>>>>>>
>>>>>> This change makes it so that when the --ecmp-symmetric-reply flag is
>>>>>> used, even for unique routes, an ECMP group is created which they are
>>>>>> added to. Thus they are added to the ECMP route flow, rather than
>>>>>> static. This allows ECMP groups to persist even when there is only one
>>>>>> route.
>>>>>>
>>>>>> Edited documentation to support this change.
>>>>>> Also updated incorrect actions in documentation.
>>>>>>
>>>>>> Fixes: 4fdca656857d ("Add ECMP symmetric replies.")
>>>>>> Reported-at: https://issues.redhat.com/browse/FDP-786
>>>>>> Signed-off-by: Rosemarie O'Riorden <rosemarie@redhat.com>
>>>>>> ---
>>>>>
>>>>> Hi Rosemarie,
>>>>>
>>>>> Thanks for the fix!
>>>>>
>>>>>>  northd/northd.c         | 33 ++++++++++++++++++++++-----------
>>>>>>  northd/ovn-northd.8.xml | 13 ++++++++++++-
>>>>>>  tests/ovn-northd.at     |  5 ++++-
>>>>>>  3 files changed, 38 insertions(+), 13 deletions(-)
>>>>>>
>>>>>> diff --git a/northd/northd.c b/northd/northd.c
>>>>>> index 983c464eb..8ae3a75bd 100644
>>>>>> --- a/northd/northd.c
>>>>>> +++ b/northd/northd.c
>>>>>> @@ -11567,21 +11567,27 @@ build_ecmp_route_flow(struct lflow_table *lflows, struct ovn_datapath *od,
>>>>>>  
>>>>>>      struct ds actions = DS_EMPTY_INITIALIZER;
>>>>>>      ds_put_format(&actions, "ip.ttl--; flags.loopback = 1; %s = %"PRIu16
>>>>>> -                  "; %s = select(", REG_ECMP_GROUP_ID, eg->id,
>>>>>> -                  REG_ECMP_MEMBER_ID);
>>>>>> +                  "; %s = ", REG_ECMP_GROUP_ID, eg->id, REG_ECMP_MEMBER_ID);
>>>>>>  
>>>>>> -    bool is_first = true;
>>>>>> -    LIST_FOR_EACH (er, list_node, &eg->route_list) {
>>>>>> -        if (is_first) {
>>>>>> -            is_first = false;
>>>>>> -        } else {
>>>>>> -            ds_put_cstr(&actions, ", ");
>>>>>> +    if (!ovs_list_is_singleton(&eg->route_list)) {
>>>>>> +        bool is_first = true;
>>>>>> +
>>>>>> +        ds_put_cstr(&actions, "select(");
>>>>>> +        LIST_FOR_EACH (er, list_node, &eg->route_list) {
>>>>>> +            if (is_first) {
>>>>>> +                is_first = false;
>>>>>> +            } else {
>>>>>> +                ds_put_cstr(&actions, ", ");
>>>>>> +            }
>>>>>> +            ds_put_format(&actions, "%"PRIu16, er->id);
>>>>>>          }
>>>>>> -        ds_put_format(&actions, "%"PRIu16, er->id);
>>>>>> +        ds_put_cstr(&actions, ");");
>>>>>> +    } else {
>>>>>> +        er = CONTAINER_OF(ovs_list_front(&eg->route_list),
>>>>>> +                          struct ecmp_route_list_node, list_node);
>>>>>> +        ds_put_format(&actions, "%"PRIu16"; next;", er->id);
>>>>>>      }
>>>>>>  
>>>>>> -    ds_put_cstr(&actions, ");");
>>>>>> -
>>>>>
>>>>> Isn't it simpler to just allow "select(<single-value>)" logical actions?
>>>>>  It doesn't feel too wrong to allow select to select a value from a set
>>>>> that only contains a single value.
>>>>>
>>>>> As far as I can tell, from a performance perspective this doesn't make
>>>>> much difference.  Also, in general, we expect "ecmp routes with a single
>>>>> path" to be a transient state, e.g., due to maintenance some paths are
>>>>> brought down for a finite duration of time.
>>>>>
>>>>> We do have to remove the restriction in `parse_select_action()` to allow
>>>>> ovn-controller to parse actions of form "select(<single-value>)".
>>>>>
>>>>> Then we don't need this special treatment for the single next hop case here.
>>>>
>>>> Wouldn't this require even more special treatment in a form of a
>>>> feature flag that ovn-controller reports so we don't generate actions
>>>> it doesn't understand?
>>>>
>>>> Might be painful, especially if we're going to backport this fix
>>>> to stable releases.
>>
>> Good point.  Although our official upgrade policy still states that
>> ovn-controller should be upgraded first or version pinning should be
>> enabled (in this case ovn-controller will not try to parse the new SB
>> flow until it is restarted).
>>
>> https://docs.ovn.org/en/latest/intro/install/ovn-upgrades.html#upgrade-procedures
> 
> Sure, but shouldn't we also remove some of the chassis feature flags then?
> The ones that rely only on ovn-controller version and not anything external
> like OVS or kernel.  Some of them were added very recently with the
> justification of the opposite upgrade order.  See this commit for example:
>   d9c97878eb23 ("actions: New action ct_commit_to_zone.")
> 
> If the upgrade order is not enforced for new features, there is no point
> doing that for bug fix patches.
> 
> Also, the upgrade order only applies to major upgrades AFAIK.  After
> this fix is backported, users are free to perform minor package updates
> in any order and they'll hit the issue.
> 

Yeah, it's still not clear to me I guess what we should allow users to
do and what not.  But the safest option is to use a feature flag in such
cases.  We should revisit all this at some point in the future, I think.

>>
>> This bug report came from OpenShift and there, as far as I know, there's
>> a guarantee that the ovn-controller version will always be >= to the
>> ovn-northd version.  To my knowledge they're the only users of
>> ecmp-symmetric-reply but of course we can't know that 100% sure.
> 
> At least kube-ovn supports ecmp with symmetric reply configuration.
> So, OpenShift is unlikely to be the only user.
> 

Ah, ok, good to know.

>>
>>>
>>> Also, there is already a similar code in northd for router policies.  So,
>>> it would make more sense to change both at once outside of a bug fix.
>>>
>>
>> It's a bit different there as there's no concept of "equal cost" policy
>> so we can't really know if the intention is to add more paths in the
>> future or not.  While with routes, IMO, it doesn't make sense to have a
>> "single path equal cost multi-path" route - except when that state is
>> transient.
> 
> I can see someone adding this flag to the route just in case someday
> they'll need to add another one.  Totally valid use case IMO, and the
> configuration will not be transient.
> 

I'm not sure I agree with the validity of that use case (in the sense
that to me that still sounds like "transient").  But yes, we can handle
it so we should probably just do that.

>>
>> But given that we have this similar code for policies, we might as well
>> do it for routes too as Rosemarie's patch does.
> 
> OK.
> 
>>
>> I'd still prefer to have system tests to cover the single path case too
>> though.
> 
> Sure.
> 
> Best regards, Ilya Maximets.
> 

Thanks,
Dumitru
diff mbox series

Patch

diff --git a/northd/northd.c b/northd/northd.c
index 983c464eb..8ae3a75bd 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -11567,21 +11567,27 @@  build_ecmp_route_flow(struct lflow_table *lflows, struct ovn_datapath *od,
 
     struct ds actions = DS_EMPTY_INITIALIZER;
     ds_put_format(&actions, "ip.ttl--; flags.loopback = 1; %s = %"PRIu16
-                  "; %s = select(", REG_ECMP_GROUP_ID, eg->id,
-                  REG_ECMP_MEMBER_ID);
+                  "; %s = ", REG_ECMP_GROUP_ID, eg->id, REG_ECMP_MEMBER_ID);
 
-    bool is_first = true;
-    LIST_FOR_EACH (er, list_node, &eg->route_list) {
-        if (is_first) {
-            is_first = false;
-        } else {
-            ds_put_cstr(&actions, ", ");
+    if (!ovs_list_is_singleton(&eg->route_list)) {
+        bool is_first = true;
+
+        ds_put_cstr(&actions, "select(");
+        LIST_FOR_EACH (er, list_node, &eg->route_list) {
+            if (is_first) {
+                is_first = false;
+            } else {
+                ds_put_cstr(&actions, ", ");
+            }
+            ds_put_format(&actions, "%"PRIu16, er->id);
         }
-        ds_put_format(&actions, "%"PRIu16, er->id);
+        ds_put_cstr(&actions, ");");
+    } else {
+        er = CONTAINER_OF(ovs_list_front(&eg->route_list),
+                          struct ecmp_route_list_node, list_node);
+        ds_put_format(&actions, "%"PRIu16"; next;", er->id);
     }
 
-    ds_put_cstr(&actions, ");");
-
     ovn_lflow_add(lflows, od, S_ROUTER_IN_IP_ROUTING, priority,
                   ds_cstr(&route_match), ds_cstr(&actions),
                   lflow_ref);
@@ -13543,6 +13549,11 @@  build_static_route_flows_for_lrouter(
                 if (group) {
                     ecmp_groups_add_route(group, route);
                 }
+            } else if (route->ecmp_symmetric_reply) {
+                /* Traffic for symmetric reply routes has to be conntracked
+                 * even if there is only one next-hop, in case another next-hop
+                 * is added later. */
+                ecmp_groups_add(&ecmp_groups, route);
             } else {
                 unique_routes_add(&unique_routes, route);
             }
diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
index ede38882a..ef5cd0c8c 100644
--- a/northd/ovn-northd.8.xml
+++ b/northd/ovn-northd.8.xml
@@ -4328,7 +4328,18 @@  next;
 ip.ttl--;
 flags.loopback = 1;
 reg8[0..15] = <var>GID</var>;
-select(reg8[16..31], <var>MID1</var>, <var>MID2</var>, ...);
+reg8[16..31] = select(<var>MID1</var>, <var>MID2</var>, ...);
+        </pre>
+        <p>
+          However, when there is only one route in an ECMP group, group actions
+          will be:
+        </p>
+
+        <pre>
+ip.ttl--;
+flags.loopback = 1;
+reg8[0..15] = <var>GID</var>;
+reg8[16..31] = <var>MID1</var>);
         </pre>
       </li>
 
diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
index dcc3dbbc3..d459c23c0 100644
--- a/tests/ovn-northd.at
+++ b/tests/ovn-northd.at
@@ -6799,20 +6799,23 @@  check ovn-nbctl lsp-set-type public-lr0 router
 check ovn-nbctl lsp-set-addresses public-lr0 router
 check ovn-nbctl lsp-set-options public-lr0 router-port=lr0-public
 
+# ECMP flows will be added even if there is only one next-hop.
 check ovn-nbctl --wait=sb --ecmp-symmetric-reply lr-route-add lr0 1.0.0.1 192.168.0.10
 
 ovn-sbctl dump-flows lr0 > lr0flows
 
 AT_CHECK([grep -w "lr_in_ip_routing" lr0flows | ovn_strip_lflows], [0], [dnl
   table=??(lr_in_ip_routing   ), priority=0    , match=(1), action=(drop;)
+  table=??(lr_in_ip_routing   ), priority=10300, match=(ct_mark.ecmp_reply_port == 1 && reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; eth.src = 00:00:20:20:12:13; reg1 = 192.168.0.1; outport = "lr0-public"; next;)
   table=??(lr_in_ip_routing   ), priority=10550, match=(nd_rs || nd_ra), action=(drop;)
   table=??(lr_in_ip_routing   ), priority=194  , match=(inport == "lr0-public" && ip6.dst == fe80::/64), action=(ip.ttl--; reg8[[0..15]] = 0; xxreg0 = ip6.dst; xxreg1 = fe80::200:20ff:fe20:1213; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;)
   table=??(lr_in_ip_routing   ), priority=74   , match=(ip4.dst == 192.168.0.0/24), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = ip4.dst; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;)
-  table=??(lr_in_ip_routing   ), priority=97   , match=(reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; reg8[[0..15]] = 0; reg0 = 192.168.0.10; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;)
+  table=??(lr_in_ip_routing   ), priority=97   , match=(reg7 == 0 && ip4.dst == 1.0.0.1/32), action=(ip.ttl--; flags.loopback = 1; reg8[[0..15]] = 1; reg8[[16..31]] = 1; next;)
 ])
 
 AT_CHECK([grep -e "lr_in_ip_routing_ecmp" lr0flows | ovn_strip_lflows], [0], [dnl
   table=??(lr_in_ip_routing_ecmp), priority=0    , match=(1), action=(drop;)
+  table=??(lr_in_ip_routing_ecmp), priority=100  , match=(reg8[[0..15]] == 1 && reg8[[16..31]] == 1), action=(reg0 = 192.168.0.10; reg1 = 192.168.0.1; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; next;)
   table=??(lr_in_ip_routing_ecmp), priority=150  , match=(reg8[[0..15]] == 0), action=(next;)
 ])