
[ovs-dev] ovn-northd: Don't generate identical flows for same LBs with different protocol.

Message ID 20210827152452.1949113-1-i.maximets@ovn.org
State Not Applicable
Series [ovs-dev] ovn-northd: Don't generate identical flows for same LBs with different protocol.

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test fail github build: failed
ovsrobot/github-robot-_ovn-kubernetes fail github build: failed

Commit Message

Ilya Maximets Aug. 27, 2021, 3:24 p.m. UTC
It's common for CMS (e.g., ovn-kubernetes) to create 3 identical load
balancers, one per protocol (tcp, udp, sctp).  However, if VIPs of
these load balancers have no ports specified (!vip_port), northd will
generate identical logical flows for them.  2 of 3 such flows will be
just discarded, so it's better to not build them form the beginning.

For example, in an ovn-heater's 120 node density-heavy scenario we
have 3 load balancers with 15K VIPs in each.  One for tcp, one for
udp and one for sctp.  In this case, ovn-northd generates 26M of
logical flows in total.  ~7.5M of them are flows for a single load
balancer.  2 * 7.5M = 15M are identical to the first 7.5M and just
discarded.

Let's find all these identical load balancers and skip them while
building logical flows.  With this change, 15M of redundant logical
flows are not generated, saving ~1.5 seconds of CPU time per run.

The comparison function and the loop look heavy, but in testing they
take only a few milliseconds on these large load balancers.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
---

The better option might be to allow multiple protocols to be configured
per load balancer, but this will require big API/schema/etc changes
and may require a redesign of certain northd/ovn-controller logic.

 lib/lb.c            | 56 +++++++++++++++++++++++++++++++++++++++++++++
 lib/lb.h            |  4 ++++
 northd/ovn-northd.c | 35 ++++++++++++++++++++++++++++
 3 files changed, 95 insertions(+)

Comments

Mark Gray Aug. 27, 2021, 5:05 p.m. UTC | #1
On 27/08/2021 16:24, Ilya Maximets wrote:
> It's common for CMS (e.g., ovn-kubernetes) to create 3 identical load
> balancers, one per protocol (tcp, udp, sctp).  However, if VIPs of
> these load balancers have no ports specified (!vip_port), northd will
> generate identical logical flows for them.  2 of 3 such flows will be
> just discarded, so it's better to not build them form the beginning.

s/form/from

> 
> For example, in an ovn-heater's 120 node density-heavy scenario we
> have 3 load balancers with 15K VIPs in each.  One for tcp, one for
> udp and one for sctp.  In this case, ovn-northd generates 26M of
> logical flows in total.  ~7.5M of them are flows for a single load
> balancer.  2 * 7.5M = 15M are identical to the first 7.5M and just
> discarded.
> 
> Let's find all these identical load balancers and skip them while
> building logical flows.  With this change, 15M of redundant logical
> flows are not generated, saving ~1.5 seconds of CPU time per run.
> 
> The comparison function and the loop look heavy, but in testing they
> take only a few milliseconds on these large load balancers.
> 
> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

I see you haven't changed any tests? I am guessing because this doesn't
change the logical flows .. which is surprising?

Also, as an FYI, I was unable to create load balancers like this using
nbctl directly. It fails on creation of the load balancer. However, you
can modify the NBDB after creation. So I presume this is an allowed
configuration. However, it is not very well specified.

`$ ovn-nbctl lb-add lb0 11.0.0.200 192.168.0.2 tcp
ovn-nbctl: Protocol is unnecessary when no port of vip is given.`
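
(Setting the protocol on an existing row does go through, though, e.g.
`ovn-nbctl set Load_Balancer lb0 protocol=tcp`, as used in the
reproducer further below.)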

> ---
> 
> The better option might be to allow multiple protocols to be configured
> per load balancer, but this will require big API/schema/etc changes
> and may require a redesign of certain northd/ovn-controller logic.
> 
>  lib/lb.c            | 56 +++++++++++++++++++++++++++++++++++++++++++++
>  lib/lb.h            |  4 ++++
>  northd/ovn-northd.c | 35 ++++++++++++++++++++++++++++
>  3 files changed, 95 insertions(+)
> 
> diff --git a/lib/lb.c b/lib/lb.c
> index 7b0ed1abe..fb12c3457 100644
> --- a/lib/lb.c
> +++ b/lib/lb.c
> @@ -168,6 +168,7 @@ ovn_northd_lb_create(const struct nbrec_load_balancer *nbrec_lb)
>      bool is_sctp = nullable_string_is_equal(nbrec_lb->protocol, "sctp");
>      struct ovn_northd_lb *lb = xzalloc(sizeof *lb);
>  
> +    lb->skip_lflow_build = false;
>      lb->nlb = nbrec_lb;
>      lb->proto = is_udp ? "udp" : is_sctp ? "sctp" : "tcp";
>      lb->n_vips = smap_count(&nbrec_lb->vips);
> @@ -238,6 +239,61 @@ ovn_northd_lb_find(struct hmap *lbs, const struct uuid *uuid)
>      return NULL;
>  }
>  
> +/* Compares two load balancers for equality ignoring the 'protocol'. */
> +bool
> +ovn_northd_lb_equal_except_for_proto(const struct ovn_northd_lb *lb1,
> +                                     const struct ovn_northd_lb *lb2)
> +{

I'd have a bit of a concern about maintaining this. For example, if we
add more fields, we would have to remember to update this but I can't
think of a better way of doing it.

> +    /* It's much more convenient to compare Northbound DB configuration. */
> +    const struct nbrec_load_balancer *lb_a = lb1->nlb;
> +    const struct nbrec_load_balancer *lb_b = lb2->nlb;
> +
> +    if (!smap_equal(&lb_a->external_ids, &lb_b->external_ids)) {
> +        return false;
> +    }
> +    if (lb_a->n_health_check != lb_b->n_health_check) {
> +        return false;
> +    }
> +    if (lb_a->n_health_check
> +        && memcmp(lb_a->health_check, lb_b->health_check,
> +                  lb_a->n_health_check * sizeof *lb_a->health_check)) {
> +        return false;
> +    }
> +    if (!smap_equal(&lb_a->ip_port_mappings, &lb_b->ip_port_mappings)) {
> +        return false;
> +    }
> +    if (!smap_equal(&lb_a->options, &lb_b->options)) {
> +        return false;
> +    }
> +    if (lb_a->n_selection_fields != lb_b->n_selection_fields) {
> +        return false;
> +    }
> +    if (lb_a->n_selection_fields &&
> +        memcmp(lb_a->selection_fields, lb_b->selection_fields,
> +               lb_a->n_selection_fields * sizeof *lb_a->selection_fields)) {
> +        return false;
> +    }
> +    if (!smap_equal(&lb_a->vips, &lb_b->vips)) {
> +        return false;
> +    }
> +
> +    /* Below fields are not easily accessible from the Nb DB entry, so
> +     * comparing parsed versions. */
> +    if (lb1->n_nb_ls != lb2->n_nb_ls || lb1->n_nb_lr != lb2->n_nb_lr) {
> +        return false;
> +    }
> +    if (lb1->n_nb_ls
> +        && memcmp(lb1->nb_ls, lb2->nb_ls, lb1->n_nb_ls * sizeof *lb1->nb_ls)) {
> +        return false;
> +    }
> +    if (lb1->n_nb_lr
> +        && memcmp(lb1->nb_lr, lb2->nb_lr, lb1->n_nb_lr * sizeof *lb1->nb_lr)) {
> +        return false;
> +    }
> +
> +    return true;
> +}
> +
>  void
>  ovn_northd_lb_add_lr(struct ovn_northd_lb *lb, struct ovn_datapath *od)
>  {
> diff --git a/lib/lb.h b/lib/lb.h
> index 832ed31fb..8e92338a3 100644
> --- a/lib/lb.h
> +++ b/lib/lb.h
> @@ -50,6 +50,8 @@ struct ovn_northd_lb {
>      size_t n_nb_lr;
>      size_t n_allocated_nb_lr;
>      struct ovn_datapath **nb_lr;
> +
> +    bool skip_lflow_build;
>  };
>  
>  struct ovn_lb_vip {
> @@ -87,6 +89,8 @@ struct ovn_northd_lb_backend {
>  
>  struct ovn_northd_lb *ovn_northd_lb_create(const struct nbrec_load_balancer *);
>  struct ovn_northd_lb * ovn_northd_lb_find(struct hmap *, const struct uuid *);
> +bool ovn_northd_lb_equal_except_for_proto(const struct ovn_northd_lb *,
> +                                          const struct ovn_northd_lb *);
>  void ovn_northd_lb_destroy(struct ovn_northd_lb *);
>  void
>  ovn_northd_lb_add_lr(struct ovn_northd_lb *lb, struct ovn_datapath *od);
> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> index af413aba4..d1efc28f4 100644
> --- a/northd/ovn-northd.c
> +++ b/northd/ovn-northd.c
> @@ -3732,6 +3732,35 @@ build_ovn_lbs(struct northd_context *ctx, struct hmap *datapaths,
>              sbrec_datapath_binding_set_load_balancers(od->sb, NULL, 0);
>          }
>      }
> +
> +    /* Northd doesn't use the protocol if VIPs have no ports specified.
> +     * It's also common that a user configures several identical load
> +     * balancers, one per protocol.  We need to find them and use only one
> +     * for logical flow construction.  Logical flows will be identical,
> +     * so it's faster to just not build them. */
> +    struct ovn_northd_lb *lb2;
> +    HMAP_FOR_EACH (lb, hmap_node, lbs) {
> +        if (lb->skip_lflow_build) {
> +            continue;
> +        }
> +        for (size_t i = 0; i < lb->n_vips; i++) {
> +            if (lb->vips[i].vip_port) {
> +                goto next;
> +            }
> +        }
> +        HMAP_FOR_EACH (lb2, hmap_node, lbs) {
> +            if (lb2 == lb || lb2->skip_lflow_build) {
> +                continue;
> +            }
> +            if (ovn_northd_lb_equal_except_for_proto(lb, lb2)) {
> +                /* Load balancer still referenced from logical switch or
> +                 * router, so we can't destroy it here. */
> +                lb2->skip_lflow_build = true;
> +            }
> +        }

I am not sure how this would scale in the case in which there were lots
of load-balancers. OVN-K is changing to a model in which there are many
(15k?) loadbalancers. This would increase this inner loop to 15k x 15k
(225M) iterations.

> +next:;
> +    }
> +
>  }
>  
>  static void
> @@ -12772,6 +12801,9 @@ build_lflows_thread(void *arg)
>                      if (stop_parallel_processing()) {
>                          return NULL;
>                      }
> +                    if (lb->skip_lflow_build) {
> +                        continue;
> +                    }
>                      build_lswitch_arp_nd_service_monitor(lb, lsi->lflows,
>                                                           &lsi->match,
>                                                           &lsi->actions);
> @@ -12943,6 +12975,9 @@ build_lswitch_and_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
>              build_lswitch_and_lrouter_iterate_by_op(op, &lsi);
>          }
>          HMAP_FOR_EACH (lb, hmap_node, lbs) {
> +            if (lb->skip_lflow_build) {
> +                continue;
> +            }
>              build_lswitch_arp_nd_service_monitor(lb, lsi.lflows,
>                                                   &lsi.actions,
>                                                   &lsi.match);
> 

I have found a configuration that causes undefined behaviour. It is
the same without your patch, but it is relevant here. If you define
two load balancers with the protocol specified (but without the port)
and attach different attributes to each, it can cause conflicting
logical flows. Consider the following reproducer (it can be run in the
sandbox environment):

`
# Create the first logical switch with one port
ovn-nbctl ls-add sw0
ovn-nbctl lsp-add sw0 sw0-port1
ovn-nbctl lsp-set-addresses sw0-port1 "50:54:00:00:00:01 192.168.0.2"

# Create the second logical switch with one port
ovn-nbctl ls-add sw1
ovn-nbctl lsp-add sw1 sw1-port1
ovn-nbctl lsp-set-addresses sw1-port1 "50:54:00:00:00:03 11.0.0.2"

# Create a logical router and attach both logical switches
ovn-nbctl lr-add lr0
ovn-nbctl lrp-add lr0 lrp0 00:00:00:00:ff:01 192.168.0.1/24
ovn-nbctl lsp-add sw0 lrp0-attachment
ovn-nbctl lsp-set-type lrp0-attachment router
ovn-nbctl lsp-set-addresses lrp0-attachment 00:00:00:00:ff:01
ovn-nbctl lsp-set-options lrp0-attachment router-port=lrp0
ovn-nbctl lrp-add lr0 lrp1 00:00:00:00:ff:02 11.0.0.1/24
ovn-nbctl lsp-add sw1 lrp1-attachment
ovn-nbctl lsp-set-type lrp1-attachment router
ovn-nbctl lsp-set-addresses lrp1-attachment 00:00:00:00:ff:02
ovn-nbctl lsp-set-options lrp1-attachment router-port=lrp1

ovs-vsctl add-port br-int p1 -- \
    set Interface p1 external_ids:iface-id=sw0-port1
ovs-vsctl add-port br-int p2 -- \
    set Interface p2 external_ids:iface-id=sw1-port1

ovn-nbctl set Logical_Router lr0 options:chassis=hv1
ovn-nbctl lb-add lb0 11.0.0.200 192.168.0.2
ovn-nbctl lb-add lb1 11.0.0.200 192.168.0.2
ovn-nbctl set Load_Balancer lb0 protocol=tcp
ovn-nbctl set Load_Balancer lb0 options=skip_snat=true
ovn-nbctl set Load_Balancer lb1 protocol=udp
ovn-nbctl lr-lb-add lr0 lb0
ovn-nbctl lr-lb-add lr0 lb1
`

Your code gives the following flows:
$ ovn-sbctl dump-flows | grep lr_in_dnat
  table=6 (lr_in_dnat         ), priority=110  , match=(ct.est && ip4 &&
reg0 == 11.0.0.200 && ct_label.natted == 1),
action=(flags.skip_snat_for_lb = 1; next;)
  table=6 (lr_in_dnat         ), priority=110  , match=(ct.est && ip4 &&
reg0 == 11.0.0.200 && ct_label.natted == 1), action=(next;)
  table=6 (lr_in_dnat         ), priority=110  , match=(ct.new && ip4 &&
reg0 == 11.0.0.200), action=(ct_lb(backends=192.168.0.2);)
  table=6 (lr_in_dnat         ), priority=110  , match=(ct.new && ip4 &&
reg0 == 11.0.0.200), action=(flags.skip_snat_for_lb = 1;
ct_lb(backends=192.168.0.2);)
  table=6 (lr_in_dnat         ), priority=0    , match=(1), action=(next;)

These should produce different flows for each load balancer, which means
we need to specify the protocol as a match field. I think this prevents
your optimization?

For reference, I raised a bugzilla bug for this and it is similar to one
I fixed recently.

https://bugzilla.redhat.com/show_bug.cgi?id=1998592
Ilya Maximets Aug. 27, 2021, 6:03 p.m. UTC | #2
On 8/27/21 7:05 PM, Mark Gray wrote:
> On 27/08/2021 16:24, Ilya Maximets wrote:
>> It's common for CMS (e.g., ovn-kubernetes) to create 3 identical load
>> balancers, one per protocol (tcp, udp, sctp).  However, if VIPs of
>> these load balancers have no ports specified (!vip_port), northd will
>> generate identical logical flows for them.  2 of 3 such flows will be
>> just discarded, so it's better to not build them form the beginning.
> 
> s/form/from
> 
>>
>> For example, in an ovn-heater's 120 node density-heavy scenario we
>> have 3 load balancers with 15K VIPs in each.  One for tcp, one for
>> udp and one for sctp.  In this case, ovn-northd generates 26M of
>> logical flows in total.  ~7.5M of them are flows for a single load
>> balancer.  2 * 7.5M = 15M are identical to the first 7.5M and just
>> discarded.
>>
>> Let's find all these identical load balancers and skip them while
>> building logical flows.  With this change, 15M of redundant logical
>> flows are not generated, saving ~1.5 seconds of CPU time per run.
>>
>> The comparison function and the loop look heavy, but in testing they
>> take only a few milliseconds on these large load balancers.
>>
>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
> 
> I see you haven't changed any tests? I am guessing because this doesn't
> change the logical flows .. which is surprising?

This is not surprising. :)
This patch only avoids generating duplicate flows that would never
end up in the Southbound anyway.  So, the total number and content of
logical flows stay exactly the same.  Hence, it's hard to test.  It's
a pure performance fix.

> 
> Also, as an FYI, I was unable to create load balancers like this using
> nbctl directly. It fails on creation of the load balancer. However, you
> can modify the NBDB after creation. So I presume this is an allowed
> configuration. However, it is not very well specified.

Yeah.  nbctl seems inconsistent.  However, ovn-k8s and other CMSs are
not using it right now and communicate directly with a database, so it's
not an issue for them.

> 
> `$ ovn-nbctl lb-add lb0 11.0.0.200 192.168.0.2 tcp
> ovn-nbctl: Protocol is unnecessary when no port of vip is given.`
> 
>> ---
>>
>> The better option might be to allow multiple protocols to be configured
>> per load balancer, but this will require big API/schema/etc changes
>> and may require a redesign of certain northd/ovn-controller logic.
>>
>>  lib/lb.c            | 56 +++++++++++++++++++++++++++++++++++++++++++++
>>  lib/lb.h            |  4 ++++
>>  northd/ovn-northd.c | 35 ++++++++++++++++++++++++++++
>>  3 files changed, 95 insertions(+)
>>
>> diff --git a/lib/lb.c b/lib/lb.c
>> index 7b0ed1abe..fb12c3457 100644
>> --- a/lib/lb.c
>> +++ b/lib/lb.c
>> @@ -168,6 +168,7 @@ ovn_northd_lb_create(const struct nbrec_load_balancer *nbrec_lb)
>>      bool is_sctp = nullable_string_is_equal(nbrec_lb->protocol, "sctp");
>>      struct ovn_northd_lb *lb = xzalloc(sizeof *lb);
>>  
>> +    lb->skip_lflow_build = false;
>>      lb->nlb = nbrec_lb;
>>      lb->proto = is_udp ? "udp" : is_sctp ? "sctp" : "tcp";
>>      lb->n_vips = smap_count(&nbrec_lb->vips);
>> @@ -238,6 +239,61 @@ ovn_northd_lb_find(struct hmap *lbs, const struct uuid *uuid)
>>      return NULL;
>>  }
>>  
>> +/* Compares two load balancers for equality ignoring the 'protocol'. */
>> +bool
>> +ovn_northd_lb_equal_except_for_proto(const struct ovn_northd_lb *lb1,
>> +                                     const struct ovn_northd_lb *lb2)
>> +{
> 
> I'd have a bit of a concern about maintaining this. For example, if we
> add more fields, we would have to remember to update this but I can't
> think of a better way of doing it.

I can add a comment to 'struct ovn_northd_lb' telling developers that
any field change should be reflected in this function.
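
Something like the following could work, as a rough sketch (the comment
text is hypothetical and the field list is abridged from the lb.h hunk
quoted above, so this is illustrative only, not part of the posted
patch):

`
struct ovn_northd_lb {
    /* NOTE: When adding a field here, check whether
     * ovn_northd_lb_equal_except_for_proto() in lib/lb.c has to
     * compare it.  A field missed there can make two different load
     * balancers look identical, so their logical flows would be
     * wrongly skipped. */
    const struct nbrec_load_balancer *nlb;  /* Northbound DB row. */
    const char *proto;                      /* "tcp", "udp" or "sctp". */

    size_t n_nb_ls;
    struct ovn_datapath **nb_ls;   /* Switches referencing this LB. */

    size_t n_nb_lr;
    size_t n_allocated_nb_lr;
    struct ovn_datapath **nb_lr;   /* Routers referencing this LB. */

    bool skip_lflow_build;         /* Set on detected duplicates. */
};
`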

> 
>> +    /* It's much more convenient to compare Northbound DB configuration. */
>> +    const struct nbrec_load_balancer *lb_a = lb1->nlb;
>> +    const struct nbrec_load_balancer *lb_b = lb2->nlb;
>> +
>> +    if (!smap_equal(&lb_a->external_ids, &lb_b->external_ids)) {
>> +        return false;
>> +    }
>> +    if (lb_a->n_health_check != lb_b->n_health_check) {
>> +        return false;
>> +    }
>> +    if (lb_a->n_health_check
>> +        && memcmp(lb_a->health_check, lb_b->health_check,
>> +                  lb_a->n_health_check * sizeof *lb_a->health_check)) {
>> +        return false;
>> +    }
>> +    if (!smap_equal(&lb_a->ip_port_mappings, &lb_b->ip_port_mappings)) {
>> +        return false;
>> +    }
>> +    if (!smap_equal(&lb_a->options, &lb_b->options)) {
>> +        return false;
>> +    }
>> +    if (lb_a->n_selection_fields != lb_b->n_selection_fields) {
>> +        return false;
>> +    }
>> +    if (lb_a->n_selection_fields &&
>> +        memcmp(lb_a->selection_fields, lb_b->selection_fields,
>> +               lb_a->n_selection_fields * sizeof *lb_a->selection_fields)) {
>> +        return false;
>> +    }
>> +    if (!smap_equal(&lb_a->vips, &lb_b->vips)) {
>> +        return false;
>> +    }
>> +
>> +    /* Below fields are not easily accessible from the Nb DB entry, so
>> +     * comparing parsed versions. */
>> +    if (lb1->n_nb_ls != lb2->n_nb_ls || lb1->n_nb_lr != lb2->n_nb_lr) {
>> +        return false;
>> +    }
>> +    if (lb1->n_nb_ls
>> +        && memcmp(lb1->nb_ls, lb2->nb_ls, lb1->n_nb_ls * sizeof *lb1->nb_ls)) {
>> +        return false;
>> +    }
>> +    if (lb1->n_nb_lr
>> +        && memcmp(lb1->nb_lr, lb2->nb_lr, lb1->n_nb_lr * sizeof *lb1->nb_lr)) {
>> +        return false;
>> +    }
>> +
>> +    return true;
>> +}
>> +
>>  void
>>  ovn_northd_lb_add_lr(struct ovn_northd_lb *lb, struct ovn_datapath *od)
>>  {
>> diff --git a/lib/lb.h b/lib/lb.h
>> index 832ed31fb..8e92338a3 100644
>> --- a/lib/lb.h
>> +++ b/lib/lb.h
>> @@ -50,6 +50,8 @@ struct ovn_northd_lb {
>>      size_t n_nb_lr;
>>      size_t n_allocated_nb_lr;
>>      struct ovn_datapath **nb_lr;
>> +
>> +    bool skip_lflow_build;
>>  };
>>  
>>  struct ovn_lb_vip {
>> @@ -87,6 +89,8 @@ struct ovn_northd_lb_backend {
>>  
>>  struct ovn_northd_lb *ovn_northd_lb_create(const struct nbrec_load_balancer *);
>>  struct ovn_northd_lb * ovn_northd_lb_find(struct hmap *, const struct uuid *);
>> +bool ovn_northd_lb_equal_except_for_proto(const struct ovn_northd_lb *,
>> +                                          const struct ovn_northd_lb *);
>>  void ovn_northd_lb_destroy(struct ovn_northd_lb *);
>>  void
>>  ovn_northd_lb_add_lr(struct ovn_northd_lb *lb, struct ovn_datapath *od);
>> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
>> index af413aba4..d1efc28f4 100644
>> --- a/northd/ovn-northd.c
>> +++ b/northd/ovn-northd.c
>> @@ -3732,6 +3732,35 @@ build_ovn_lbs(struct northd_context *ctx, struct hmap *datapaths,
>>              sbrec_datapath_binding_set_load_balancers(od->sb, NULL, 0);
>>          }
>>      }
>> +
>> +    /* Northd doesn't use the protocol if VIPs have no ports specified.
>> +     * It's also common that a user configures several identical load
>> +     * balancers, one per protocol.  We need to find them and use only one
>> +     * for logical flow construction.  Logical flows will be identical,
>> +     * so it's faster to just not build them. */
>> +    struct ovn_northd_lb *lb2;
>> +    HMAP_FOR_EACH (lb, hmap_node, lbs) {
>> +        if (lb->skip_lflow_build) {
>> +            continue;
>> +        }
>> +        for (size_t i = 0; i < lb->n_vips; i++) {
>> +            if (lb->vips[i].vip_port) {
>> +                goto next;
>> +            }
>> +        }
>> +        HMAP_FOR_EACH (lb2, hmap_node, lbs) {
>> +            if (lb2 == lb || lb2->skip_lflow_build) {
>> +                continue;
>> +            }
>> +            if (ovn_northd_lb_equal_except_for_proto(lb, lb2)) {
>> +                /* Load balancer still referenced from logical switch or
>> +                 * router, so we can't destroy it here. */
>> +                lb2->skip_lflow_build = true;
>> +            }
>> +        }
> 
> I am not sure how this would scale in the case in which there were lots
> of load-balancers. OVN-K is changing to a model in which there are many
> (15k?) loadbalancers. This would increase this inner loop to 15k x 15k
> (225M) iterations.

I'll try to test that.  We can try to hash the load balancer and make
this part semi-linear.  The code will be a bit more complex.
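
For the record, a rough sketch of that semi-linear variant (hypothetical
code, not part of this patch; it assumes the usual OVS helpers from
lib/hash.h, lib/hmap.h and lib/util.h, and the names
ovn_northd_lb_hash_except_proto() and 'struct lb_dedup_node' are made
up for illustration):

`
/* Order-independent hash over the protocol-agnostic parts of the LB.
 * Per-entry hashes are combined with XOR so that smap iteration order
 * doesn't matter. */
static uint32_t
ovn_northd_lb_hash_except_proto(const struct ovn_northd_lb *lb)
{
    uint32_t hash = 0;
    const struct smap_node *node;

    SMAP_FOR_EACH (node, &lb->nlb->vips) {
        hash ^= hash_string(node->value, hash_string(node->key, 0));
    }
    for (size_t i = 0; i < lb->n_nb_ls; i++) {
        hash ^= hash_pointer(lb->nb_ls[i], 0);
    }
    for (size_t i = 0; i < lb->n_nb_lr; i++) {
        hash ^= hash_pointer(lb->nb_lr[i], 0);
    }
    return hash;
}

struct lb_dedup_node {
    struct hmap_node node;
    struct ovn_northd_lb *lb;
};

/* Replaces the quadratic loop in build_ovn_lbs(): bucket candidates by
 * hash and run the full comparison only within a bucket. */
static void
mark_duplicate_lbs(struct hmap *lbs)
{
    struct hmap buckets = HMAP_INITIALIZER(&buckets);
    struct lb_dedup_node *dn;
    struct ovn_northd_lb *lb;

    HMAP_FOR_EACH (lb, hmap_node, lbs) {
        bool has_vip_port = false;
        for (size_t i = 0; i < lb->n_vips; i++) {
            if (lb->vips[i].vip_port) {
                has_vip_port = true;
                break;
            }
        }
        if (has_vip_port) {
            /* Protocol matters for this LB, not a candidate. */
            continue;
        }

        uint32_t hash = ovn_northd_lb_hash_except_proto(lb);
        bool duplicate = false;

        HMAP_FOR_EACH_WITH_HASH (dn, node, hash, &buckets) {
            if (ovn_northd_lb_equal_except_for_proto(dn->lb, lb)) {
                /* Keep the first LB of the group, skip the rest. */
                lb->skip_lflow_build = true;
                duplicate = true;
                break;
            }
        }
        if (!duplicate) {
            dn = xmalloc(sizeof *dn);
            dn->lb = lb;
            hmap_insert(&buckets, &dn->node, hash);
        }
    }

    HMAP_FOR_EACH_POP (dn, node, &buckets) {
        free(dn);
    }
    hmap_destroy(&buckets);
}
`

The expected cost is one hash per candidate LB plus full comparisons
only within hash buckets, so roughly linear when duplicates are rare.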

> 
>> +next:;
>> +    }
>> +
>>  }
>>  
>>  static void
>> @@ -12772,6 +12801,9 @@ build_lflows_thread(void *arg)
>>                      if (stop_parallel_processing()) {
>>                          return NULL;
>>                      }
>> +                    if (lb->skip_lflow_build) {
>> +                        continue;
>> +                    }
>>                      build_lswitch_arp_nd_service_monitor(lb, lsi->lflows,
>>                                                           &lsi->match,
>>                                                           &lsi->actions);
>> @@ -12943,6 +12975,9 @@ build_lswitch_and_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
>>              build_lswitch_and_lrouter_iterate_by_op(op, &lsi);
>>          }
>>          HMAP_FOR_EACH (lb, hmap_node, lbs) {
>> +            if (lb->skip_lflow_build) {
>> +                continue;
>> +            }
>>              build_lswitch_arp_nd_service_monitor(lb, lsi.lflows,
>>                                                   &lsi.actions,
>>                                                   &lsi.match);
>>
> 
> I have found a configuration that causes undefined behaviour. It is
> the same without your patch, but it is relevant here. If you define
> two load balancers with the protocol specified (but without the port)
> and attach different attributes to each, it can cause conflicting
> logical flows. Consider the following reproducer (it can be run in the
> sandbox environment):
> 
> `
> # Create the first logical switch with one port
> ovn-nbctl ls-add sw0
> ovn-nbctl lsp-add sw0 sw0-port1
> ovn-nbctl lsp-set-addresses sw0-port1 "50:54:00:00:00:01 192.168.0.2"
> 
> # Create the second logical switch with one port
> ovn-nbctl ls-add sw1
> ovn-nbctl lsp-add sw1 sw1-port1
> ovn-nbctl lsp-set-addresses sw1-port1 "50:54:00:00:00:03 11.0.0.2"
> 
> # Create a logical router and attach both logical switches
> ovn-nbctl lr-add lr0
> ovn-nbctl lrp-add lr0 lrp0 00:00:00:00:ff:01 192.168.0.1/24
> ovn-nbctl lsp-add sw0 lrp0-attachment
> ovn-nbctl lsp-set-type lrp0-attachment router
> ovn-nbctl lsp-set-addresses lrp0-attachment 00:00:00:00:ff:01
> ovn-nbctl lsp-set-options lrp0-attachment router-port=lrp0
> ovn-nbctl lrp-add lr0 lrp1 00:00:00:00:ff:02 11.0.0.1/24
> ovn-nbctl lsp-add sw1 lrp1-attachment
> ovn-nbctl lsp-set-type lrp1-attachment router
> ovn-nbctl lsp-set-addresses lrp1-attachment 00:00:00:00:ff:02
> ovn-nbctl lsp-set-options lrp1-attachment router-port=lrp1
> 
> ovs-vsctl add-port br-int p1 -- \
>     set Interface p1 external_ids:iface-id=sw0-port1
> ovs-vsctl add-port br-int p2 -- \
>     set Interface p2 external_ids:iface-id=sw1-port1
> 
> ovn-nbctl set Logical_Router lr0 options:chassis=hv1
> ovn-nbctl lb-add lb0 11.0.0.200 192.168.0.2
> ovn-nbctl lb-add lb1 11.0.0.200 192.168.0.2
> ovn-nbctl set Load_Balancer lb0 protocol=tcp
> ovn-nbctl set Load_Balancer lb0 options=skip_snat=true
> ovn-nbctl set Load_Balancer lb1 protocol=udp
> ovn-nbctl lr-lb-add lr0 lb0
> ovn-nbctl lr-lb-add lr0 lb1
> `
> 
> Your code gives the following flows:
> $ ovn-sbctl dump-flows | grep lr_in_dnat
>   table=6 (lr_in_dnat         ), priority=110  , match=(ct.est && ip4 &&
> reg0 == 11.0.0.200 && ct_label.natted == 1),
> action=(flags.skip_snat_for_lb = 1; next;)
>   table=6 (lr_in_dnat         ), priority=110  , match=(ct.est && ip4 &&
> reg0 == 11.0.0.200 && ct_label.natted == 1), action=(next;)
>   table=6 (lr_in_dnat         ), priority=110  , match=(ct.new && ip4 &&
> reg0 == 11.0.0.200), action=(ct_lb(backends=192.168.0.2);)
>   table=6 (lr_in_dnat         ), priority=110  , match=(ct.new && ip4 &&
> reg0 == 11.0.0.200), action=(flags.skip_snat_for_lb = 1;
> ct_lb(backends=192.168.0.2);)
>   table=6 (lr_in_dnat         ), priority=0    , match=(1), action=(next;)
> 
> These should produce different flows for each load balancer, which means
> we need to specify the protocol as a match field. I think this prevents
> your optimization?

Since 'options' are different, the optimization will not be applied.
ovn_northd_lb_equal_except_for_proto() compares them.

But if the options are the same, then you don't need a match on the
protocol.

> 
> For reference, I raised a bugzilla bug for this and it is similar to one
> I fixed recently.
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1998592
> 

The problem is that the ability to implement the optimization depends
on how this particular bug will be fixed in OVN.  If it is fixed by
blindly adding protocol matches everywhere, then, obviously, the
optimization will not be possible, as logical flows will never be
identical anymore.

On the other hand, if it is fixed this way, the number of logical flows
in the database will explode from 9.9M to 26M, i.e. 2.5x, and that
doesn't sound like a good thing to have at that scale.

So, I'd say that whoever works on fixing the bug should try to avoid
protocol matches as much as possible.  This will save a lot of memory
and processing time in all OVN components.  And it will also allow the
optimization implemented in this patch.

On a side note, all these force_snat and skip_snat options look like
dirty workarounds for some other problems.  And they are too low level
for OVN and, therefore, IMHO, should not exist or be exposed to the
end user (CMS).

Best regards, Ilya Maximets.
Ilya Maximets Aug. 27, 2021, 7:24 p.m. UTC | #3
On 8/27/21 8:03 PM, Ilya Maximets wrote:
> On 8/27/21 7:05 PM, Mark Gray wrote:
>> On 27/08/2021 16:24, Ilya Maximets wrote:
>>> It's common for CMS (e.g., ovn-kubernetes) to create 3 identical load
>>> balancers, one per protocol (tcp, udp, sctp).  However, if VIPs of
>>> these load balancers have no ports specified (!vip_port), northd will
>>> generate identical logical flows for them.  2 of 3 such flows will be
>>> just discarded, so it's better to not build them form the beginning.
>>
>> s/form/from
>>
>>>
>>> For example, in an ovn-heater's 120 node density-heavy scenario we
>>> have 3 load balancers with 15K VIPs in each.  One for tcp, one for
>>> udp and one for sctp.  In this case, ovn-northd generates 26M of
>>> logical flows in total.  ~7.5M of them are flows for a single load
>>> balancer.  2 * 7.5M = 15M are identical to the first 7.5M and just
>>> discarded.
>>>
>>> Let's find all these identical load balancers and skip them while
>>> building logical flows.  With this change, 15M of redundant logical
>>> flows are not generated, saving ~1.5 seconds of CPU time per run.
>>>
>>> The comparison function and the loop look heavy, but in testing they
>>> take only a few milliseconds on these large load balancers.
>>>
>>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>>
>> I see you haven't changed any tests? I am guessing because this doesn't
>> change the logical flows .. which is surprising?
> 
> This is not surprising. :)
> This patch only avoids generating duplicate flows that would never
> end up in the Southbound anyway.  So, the total number and content of
> logical flows stay exactly the same.  Hence, it's hard to test.  It's
> a pure performance fix.
> 
>>
>> Also, as an FYI, I was unable to create load balancers like this using
>> nbctl directly. It fails on creation of the load balancer. However, you
>> can modify the NBDB after creation. So I presume this is an allowed
>> configuration. However, it is not very well specified.
> 
> Yeah.  nbctl seems inconsistent.  However, ovn-k8s and other CMSs are
> not using it right now and communicate directly with a database, so it's
> not an issue for them.
> 
>>
>> `$ ovn-nbctl lb-add lb0 11.0.0.200 192.168.0.2 tcp
>> ovn-nbctl: Protocol is unnecessary when no port of vip is given.`
>>
>>> ---
>>>
>>> The better option might be to allow multiple protocols to be configured
>>> per load balancer, but this will require big API/schema/etc changes
>>> and may require a redesign of certain northd/ovn-controller logic.
>>>
>>>  lib/lb.c            | 56 +++++++++++++++++++++++++++++++++++++++++++++
>>>  lib/lb.h            |  4 ++++
>>>  northd/ovn-northd.c | 35 ++++++++++++++++++++++++++++
>>>  3 files changed, 95 insertions(+)
>>>
>>> diff --git a/lib/lb.c b/lib/lb.c
>>> index 7b0ed1abe..fb12c3457 100644
>>> --- a/lib/lb.c
>>> +++ b/lib/lb.c
>>> @@ -168,6 +168,7 @@ ovn_northd_lb_create(const struct nbrec_load_balancer *nbrec_lb)
>>>      bool is_sctp = nullable_string_is_equal(nbrec_lb->protocol, "sctp");
>>>      struct ovn_northd_lb *lb = xzalloc(sizeof *lb);
>>>  
>>> +    lb->skip_lflow_build = false;
>>>      lb->nlb = nbrec_lb;
>>>      lb->proto = is_udp ? "udp" : is_sctp ? "sctp" : "tcp";
>>>      lb->n_vips = smap_count(&nbrec_lb->vips);
>>> @@ -238,6 +239,61 @@ ovn_northd_lb_find(struct hmap *lbs, const struct uuid *uuid)
>>>      return NULL;
>>>  }
>>>  
>>> +/* Compares two load balancers for equality ignoring the 'protocol'. */
>>> +bool
>>> +ovn_northd_lb_equal_except_for_proto(const struct ovn_northd_lb *lb1,
>>> +                                     const struct ovn_northd_lb *lb2)
>>> +{
>>
>> I'd have a bit of a concern about maintaining this. For example, if we
>> add more fields, we would have to remember to update this but I can't
>> think of a better way of doing it.
> 
> I can add a comment to 'struct ovn_northd_lb' telling developers that
> any field change should be reflected in this function.
> 
>>
>>> +    /* It's much more convenient to compare Northbound DB configuration. */
>>> +    const struct nbrec_load_balancer *lb_a = lb1->nlb;
>>> +    const struct nbrec_load_balancer *lb_b = lb2->nlb;
>>> +
>>> +    if (!smap_equal(&lb_a->external_ids, &lb_b->external_ids)) {
>>> +        return false;
>>> +    }
>>> +    if (lb_a->n_health_check != lb_b->n_health_check) {
>>> +        return false;
>>> +    }
>>> +    if (lb_a->n_health_check
>>> +        && memcmp(lb_a->health_check, lb_b->health_check,
>>> +                  lb_a->n_health_check * sizeof *lb_a->health_check)) {
>>> +        return false;
>>> +    }
>>> +    if (!smap_equal(&lb_a->ip_port_mappings, &lb_b->ip_port_mappings)) {
>>> +        return false;
>>> +    }
>>> +    if (!smap_equal(&lb_a->options, &lb_b->options)) {
>>> +        return false;
>>> +    }
>>> +    if (lb_a->n_selection_fields != lb_b->n_selection_fields) {
>>> +        return false;
>>> +    }
>>> +    if (lb_a->n_selection_fields &&
>>> +        memcmp(lb_a->selection_fields, lb_b->selection_fields,
>>> +               lb_a->n_selection_fields * sizeof *lb_a->selection_fields)) {
>>> +        return false;
>>> +    }
>>> +    if (!smap_equal(&lb_a->vips, &lb_b->vips)) {
>>> +        return false;
>>> +    }
>>> +
>>> +    /* Below fields are not easily accessible from the Nb DB entry, so
>>> +     * comparing parsed versions. */
>>> +    if (lb1->n_nb_ls != lb2->n_nb_ls || lb1->n_nb_lr != lb2->n_nb_lr) {
>>> +        return false;
>>> +    }
>>> +    if (lb1->n_nb_ls
>>> +        && memcmp(lb1->nb_ls, lb2->nb_ls, lb1->n_nb_ls * sizeof *lb1->nb_ls)) {
>>> +        return false;
>>> +    }
>>> +    if (lb1->n_nb_lr
>>> +        && memcmp(lb1->nb_lr, lb2->nb_lr, lb1->n_nb_lr * sizeof *lb1->nb_lr)) {
>>> +        return false;
>>> +    }
>>> +
>>> +    return true;
>>> +}
>>> +
>>>  void
>>>  ovn_northd_lb_add_lr(struct ovn_northd_lb *lb, struct ovn_datapath *od)
>>>  {
>>> diff --git a/lib/lb.h b/lib/lb.h
>>> index 832ed31fb..8e92338a3 100644
>>> --- a/lib/lb.h
>>> +++ b/lib/lb.h
>>> @@ -50,6 +50,8 @@ struct ovn_northd_lb {
>>>      size_t n_nb_lr;
>>>      size_t n_allocated_nb_lr;
>>>      struct ovn_datapath **nb_lr;
>>> +
>>> +    bool skip_lflow_build;
>>>  };
>>>  
>>>  struct ovn_lb_vip {
>>> @@ -87,6 +89,8 @@ struct ovn_northd_lb_backend {
>>>  
>>>  struct ovn_northd_lb *ovn_northd_lb_create(const struct nbrec_load_balancer *);
>>>  struct ovn_northd_lb * ovn_northd_lb_find(struct hmap *, const struct uuid *);
>>> +bool ovn_northd_lb_equal_except_for_proto(const struct ovn_northd_lb *,
>>> +                                          const struct ovn_northd_lb *);
>>>  void ovn_northd_lb_destroy(struct ovn_northd_lb *);
>>>  void
>>>  ovn_northd_lb_add_lr(struct ovn_northd_lb *lb, struct ovn_datapath *od);
>>> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
>>> index af413aba4..d1efc28f4 100644
>>> --- a/northd/ovn-northd.c
>>> +++ b/northd/ovn-northd.c
>>> @@ -3732,6 +3732,35 @@ build_ovn_lbs(struct northd_context *ctx, struct hmap *datapaths,
>>>              sbrec_datapath_binding_set_load_balancers(od->sb, NULL, 0);
>>>          }
>>>      }
>>> +
>>> +    /* Northd doesn't use the protocol if VIPs have no ports specified.
>>> +     * It's also common that a user configures several identical load
>>> +     * balancers, one per protocol.  We need to find them and use only one
>>> +     * for logical flow construction.  Logical flows will be identical,
>>> +     * so it's faster to just not build them. */
>>> +    struct ovn_northd_lb *lb2;
>>> +    HMAP_FOR_EACH (lb, hmap_node, lbs) {
>>> +        if (lb->skip_lflow_build) {
>>> +            continue;
>>> +        }
>>> +        for (size_t i = 0; i < lb->n_vips; i++) {
>>> +            if (lb->vips[i].vip_port) {
>>> +                goto next;
>>> +            }
>>> +        }
>>> +        HMAP_FOR_EACH (lb2, hmap_node, lbs) {
>>> +            if (lb2 == lb || lb2->skip_lflow_build) {
>>> +                continue;
>>> +            }
>>> +            if (ovn_northd_lb_equal_except_for_proto(lb, lb2)) {
>>> +                /* Load balancer still referenced from logical switch or
>>> +                 * router, so we can't destroy it here. */
>>> +                lb2->skip_lflow_build = true;
>>> +            }
>>> +        }
>>
>> I am not sure how this would scale in the case in which there were lots
>> of load-balancers. OVN-K is changing to a model in which there are many
>> (15k?) loadbalancers. This would increase this inner loop to 15k x 15k
>> (225M) iterations.
> 
> I'll try to test that.  We can try to hash the load balancer and make
> this part semi-linear.  The code will be a bit more complex.
> 
>>
>>> +next:;
>>> +    }
>>> +
>>>  }
>>>  
>>>  static void
>>> @@ -12772,6 +12801,9 @@ build_lflows_thread(void *arg)
>>>                      if (stop_parallel_processing()) {
>>>                          return NULL;
>>>                      }
>>> +                    if (lb->skip_lflow_build) {
>>> +                        continue;
>>> +                    }
>>>                      build_lswitch_arp_nd_service_monitor(lb, lsi->lflows,
>>>                                                           &lsi->match,
>>>                                                           &lsi->actions);
>>> @@ -12943,6 +12975,9 @@ build_lswitch_and_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
>>>              build_lswitch_and_lrouter_iterate_by_op(op, &lsi);
>>>          }
>>>          HMAP_FOR_EACH (lb, hmap_node, lbs) {
>>> +            if (lb->skip_lflow_build) {
>>> +                continue;
>>> +            }
>>>              build_lswitch_arp_nd_service_monitor(lb, lsi.lflows,
>>>                                                   &lsi.actions,
>>>                                                   &lsi.match);
>>>
>>
>> I have found a configuration that causes undefined behaviour. It is
>> the same without your patch, but it is relevant here. If you define
>> two load balancers with the protocol specified (but without the port)
>> and attach different attributes to each, it can cause conflicting
>> logical flows. Consider the following reproducer (it can be run in the
>> sandbox environment):
>>
>> `
>> # Create the first logical switch with one port
>> ovn-nbctl ls-add sw0
>> ovn-nbctl lsp-add sw0 sw0-port1
>> ovn-nbctl lsp-set-addresses sw0-port1 "50:54:00:00:00:01 192.168.0.2"
>>
>> # Create the second logical switch with one port
>> ovn-nbctl ls-add sw1
>> ovn-nbctl lsp-add sw1 sw1-port1
>> ovn-nbctl lsp-set-addresses sw1-port1 "50:54:00:00:00:03 11.0.0.2"
>>
>> # Create a logical router and attach both logical switches
>> ovn-nbctl lr-add lr0
>> ovn-nbctl lrp-add lr0 lrp0 00:00:00:00:ff:01 192.168.0.1/24
>> ovn-nbctl lsp-add sw0 lrp0-attachment
>> ovn-nbctl lsp-set-type lrp0-attachment router
>> ovn-nbctl lsp-set-addresses lrp0-attachment 00:00:00:00:ff:01
>> ovn-nbctl lsp-set-options lrp0-attachment router-port=lrp0
>> ovn-nbctl lrp-add lr0 lrp1 00:00:00:00:ff:02 11.0.0.1/24
>> ovn-nbctl lsp-add sw1 lrp1-attachment
>> ovn-nbctl lsp-set-type lrp1-attachment router
>> ovn-nbctl lsp-set-addresses lrp1-attachment 00:00:00:00:ff:02
>> ovn-nbctl lsp-set-options lrp1-attachment router-port=lrp1
>>
>> ovs-vsctl add-port br-int p1 -- \
>>     set Interface p1 external_ids:iface-id=sw0-port1
>> ovs-vsctl add-port br-int p2 -- \
>>     set Interface p2 external_ids:iface-id=sw1-port1
>>
>> ovn-nbctl set Logical_Router lr0 options:chassis=hv1
>> ovn-nbctl lb-add lb0 11.0.0.200 192.168.0.2
>> ovn-nbctl lb-add lb1 11.0.0.200 192.168.0.2
>> ovn-nbctl set Load_Balancer lb0 protocol=tcp
>> ovn-nbctl set Load_Balancer lb0 options=skip_snat=true
>> ovn-nbctl set Load_Balancer lb1 protocol=udp
>> ovn-nbctl lr-lb-add lr0 lb0
>> ovn-nbctl lr-lb-add lr0 lb1
>> `
>>
>> Your code gives the following flows:
>> $ ovn-sbctl dump-flows | grep lr_in_dnat
>>   table=6 (lr_in_dnat         ), priority=110  , match=(ct.est && ip4 &&
>> reg0 == 11.0.0.200 && ct_label.natted == 1),
>> action=(flags.skip_snat_for_lb = 1; next;)
>>   table=6 (lr_in_dnat         ), priority=110  , match=(ct.est && ip4 &&
>> reg0 == 11.0.0.200 && ct_label.natted == 1), action=(next;)
>>   table=6 (lr_in_dnat         ), priority=110  , match=(ct.new && ip4 &&
>> reg0 == 11.0.0.200), action=(ct_lb(backends=192.168.0.2);)
>>   table=6 (lr_in_dnat         ), priority=110  , match=(ct.new && ip4 &&
>> reg0 == 11.0.0.200), action=(flags.skip_snat_for_lb = 1;
>> ct_lb(backends=192.168.0.2);)
>>   table=6 (lr_in_dnat         ), priority=0    , match=(1), action=(next;)
>>
>> These should produce different flows for each load balancer, which means
>> we need to specify the protocol as a match field. I think this prevents
>> your optimization?
> 
> Since 'options' are different, the optimization will not be applied.
> ovn_northd_lb_equal_except_for_proto() compares them.
> 
> But if the options are the same, then you don't need a match on the
> protocol.
> 
>>
>> For reference, I raised a bugzilla bug for this and it is similar to one
>> I fixed recently.
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1998592
>>
> 
> The problem is that the ability to implement the optimization depends
> on how this particular bug will be fixed in OVN.  If it is fixed by
> blindly adding protocol matches everywhere, then, obviously, the
> optimization will not be possible, as logical flows will never be
> identical anymore.
> 
> On the other hand, if it is fixed this way, the number of logical flows
> in the database will explode from 9.9M to 26M, i.e. 2.5x, and that
> doesn't sound like a good thing to have at that scale.

Hmm.  I guess it's not that bad, because with dp-groups they will be
grouped, but I'd still expect an increase of 100-200K logical flows,
which is a substantial addition.  ovn-controller will see a harder hit,
since it will unfold dp-groups and will get all 15M extra flows to
process.

> 
> So, I'd say that whoever works on fixing the bug should try to avoid
> protocol matches as much as possible.  This will save a lot of memory
> and processing time in all OVN components.  And it will also allow the
> optimization implemented in this patch.
> 
> On a side note, all these force_snat and skip_snat options look like
> dirty workarounds for some other problems.  And they are too low level
> for OVN and, therefore, IMHO, should not exist or be exposed to the
> end user (CMS).
> 
> Best regards, Ilya Maximets.
>
Mark Gray Aug. 30, 2021, 9:03 a.m. UTC | #4
On 27/08/2021 19:03, Ilya Maximets wrote:
> On 8/27/21 7:05 PM, Mark Gray wrote:
>> On 27/08/2021 16:24, Ilya Maximets wrote:
>>> It's common for CMS (e.g., ovn-kubernetes) to create 3 identical load
>>> balancers, one per protocol (tcp, udp, sctp).  However, if VIPs of
>>> these load balancers have no ports specified (!vip_port), northd will
>>> generate identical logical flows for them.  2 of 3 such flows will be
>>> just discarded, so it's better to not build them form the beginning.
>>
>> s/form/from
>>
>>>
>>> For example, in an ovn-heater's 120 node density-heavy scenario we
>>> have 3 load balancers with 15K VIPs in each.  One for tcp, one for
>>> udp and one for sctp.  In this case, ovn-northd generates 26M of
>>> logical flows in total.  ~7.5M of them are flows for a single load
>>> balancer.  2 * 7.5M = 15M are identical to the first 7.5M and just
>>> discarded.
>>>
>>> Let's find all these identical load balancers and skip them while
>>> building logical flows.  With this change, 15M of redundant logical
>>> flows are not generated, saving ~1.5 seconds of CPU time per run.
>>>
>>> The comparison function and the loop look heavy, but in testing they
>>> take only a few milliseconds on these large load balancers.
>>>
>>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>>
>> I see you haven't changed any tests? I am guessing because this doesn't
>> change the logical flows .. which is surprising?
> 
> This is not surprising. :)
> This patch only avoids generating duplicate flows that would never
> end up in the Southbound anyway.  So, the total number and content of
> logical flows stay exactly the same.  Hence, it's hard to test.  It's
> a pure performance fix.

For me, I was surprised that we didn't have a test for the kind of
issue we see below. I was expecting something like that to fail.

> 
>>
>> Also, as an FYI, I was unable to create load balancers like this using
>> nbctl directly. It fails on creation of the load balancer. However, you
>> can modify the NBDB after creation. So I presume this is an allowed
>> configuration. However, it is not very well specified.
> 
> Yeah.  nbctl seems inconsistent.  However, ovn-k8s and other CMSs are
> not using it right now and communicate directly with a database, so it's
> not an issue for them.
> 
>>
>> `$ ovn-nbctl lb-add lb0 11.0.0.200 192.168.0.2 tcp
>> ovn-nbctl: Protocol is unnecessary when no port of vip is given.`
>>
>>> ---
>>>
>>> The better option might be to allow multiple protocols to be configured
>>> per load balancer, but this will require big API/schema/etc changes
>>> and may require a redesign of certain northd/ovn-controller logic.
>>>
>>>  lib/lb.c            | 56 +++++++++++++++++++++++++++++++++++++++++++++
>>>  lib/lb.h            |  4 ++++
>>>  northd/ovn-northd.c | 35 ++++++++++++++++++++++++++++
>>>  3 files changed, 95 insertions(+)
>>>
>>> diff --git a/lib/lb.c b/lib/lb.c
>>> index 7b0ed1abe..fb12c3457 100644
>>> --- a/lib/lb.c
>>> +++ b/lib/lb.c
>>> @@ -168,6 +168,7 @@ ovn_northd_lb_create(const struct nbrec_load_balancer *nbrec_lb)
>>>      bool is_sctp = nullable_string_is_equal(nbrec_lb->protocol, "sctp");
>>>      struct ovn_northd_lb *lb = xzalloc(sizeof *lb);
>>>  
>>> +    lb->skip_lflow_build = false;
>>>      lb->nlb = nbrec_lb;
>>>      lb->proto = is_udp ? "udp" : is_sctp ? "sctp" : "tcp";
>>>      lb->n_vips = smap_count(&nbrec_lb->vips);
>>> @@ -238,6 +239,61 @@ ovn_northd_lb_find(struct hmap *lbs, const struct uuid *uuid)
>>>      return NULL;
>>>  }
>>>  
>>> +/* Compares two load balancers for equality ignoring the 'protocol'. */
>>> +bool
>>> +ovn_northd_lb_equal_except_for_proto(const struct ovn_northd_lb *lb1,
>>> +                                     const struct ovn_northd_lb *lb2)
>>> +{
>>
>> I'd have a bit of a concern about maintaining this. For example, if we
>> add more fields, we would have to remember to update this but I can't
>> think of a better way of doing it.
> 
> I can add a comment to 'struct ovn_northd_lb' telling developers that
> any field change should be reflected in this function.
> 
>>
>>> +    /* It's much more convenient to compare Northbound DB configuration. */
>>> +    const struct nbrec_load_balancer *lb_a = lb1->nlb;
>>> +    const struct nbrec_load_balancer *lb_b = lb2->nlb;
>>> +
>>> +    if (!smap_equal(&lb_a->external_ids, &lb_b->external_ids)) {
>>> +        return false;
>>> +    }
>>> +    if (lb_a->n_health_check != lb_b->n_health_check) {
>>> +        return false;
>>> +    }
>>> +    if (lb_a->n_health_check
>>> +        && memcmp(lb_a->health_check, lb_b->health_check,
>>> +                  lb_a->n_health_check * sizeof *lb_a->health_check)) {
>>> +        return false;
>>> +    }
>>> +    if (!smap_equal(&lb_a->ip_port_mappings, &lb_b->ip_port_mappings)) {
>>> +        return false;
>>> +    }
>>> +    if (!smap_equal(&lb_a->options, &lb_b->options)) {
>>> +        return false;
>>> +    }
>>> +    if (lb_a->n_selection_fields != lb_b->n_selection_fields) {
>>> +        return false;
>>> +    }
>>> +    if (lb_a->n_selection_fields &&
>>> +        memcmp(lb_a->selection_fields, lb_b->selection_fields,
>>> +               lb_a->n_selection_fields * sizeof *lb_a->selection_fields)) {
>>> +        return false;
>>> +    }
>>> +    if (!smap_equal(&lb_a->vips, &lb_b->vips)) {
>>> +        return false;
>>> +    }
>>> +
>>> +    /* Below fields are not easily accessible from the Nb DB entry, so
>>> +     * comparing parsed versions. */
>>> +    if (lb1->n_nb_ls != lb2->n_nb_ls || lb1->n_nb_lr != lb2->n_nb_lr) {
>>> +        return false;
>>> +    }
>>> +    if (lb1->n_nb_ls
>>> +        && memcmp(lb1->nb_ls, lb2->nb_ls, lb1->n_nb_ls * sizeof *lb1->nb_ls)) {
>>> +        return false;
>>> +    }
>>> +    if (lb1->n_nb_lr
>>> +        && memcmp(lb1->nb_lr, lb2->nb_lr, lb1->n_nb_lr * sizeof *lb1->nb_lr)) {
>>> +        return false;
>>> +    }
>>> +
>>> +    return true;
>>> +}
>>> +
>>>  void
>>>  ovn_northd_lb_add_lr(struct ovn_northd_lb *lb, struct ovn_datapath *od)
>>>  {
>>> diff --git a/lib/lb.h b/lib/lb.h
>>> index 832ed31fb..8e92338a3 100644
>>> --- a/lib/lb.h
>>> +++ b/lib/lb.h
>>> @@ -50,6 +50,8 @@ struct ovn_northd_lb {
>>>      size_t n_nb_lr;
>>>      size_t n_allocated_nb_lr;
>>>      struct ovn_datapath **nb_lr;
>>> +
>>> +    bool skip_lflow_build;
>>>  };
>>>  
>>>  struct ovn_lb_vip {
>>> @@ -87,6 +89,8 @@ struct ovn_northd_lb_backend {
>>>  
>>>  struct ovn_northd_lb *ovn_northd_lb_create(const struct nbrec_load_balancer *);
>>>  struct ovn_northd_lb * ovn_northd_lb_find(struct hmap *, const struct uuid *);
>>> +bool ovn_northd_lb_equal_except_for_proto(const struct ovn_northd_lb *,
>>> +                                          const struct ovn_northd_lb *);
>>>  void ovn_northd_lb_destroy(struct ovn_northd_lb *);
>>>  void
>>>  ovn_northd_lb_add_lr(struct ovn_northd_lb *lb, struct ovn_datapath *od);
>>> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
>>> index af413aba4..d1efc28f4 100644
>>> --- a/northd/ovn-northd.c
>>> +++ b/northd/ovn-northd.c
>>> @@ -3732,6 +3732,35 @@ build_ovn_lbs(struct northd_context *ctx, struct hmap *datapaths,
>>>              sbrec_datapath_binding_set_load_balancers(od->sb, NULL, 0);
>>>          }
>>>      }
>>> +
>>> +    /* Northd doesn't use the protocol if VIPs have no ports specified.
>>> +     * It's also common that a user configures several identical load
>>> +     * balancers, one per protocol.  We need to find them and use only one
>>> +     * for logical flow construction.  Logical flows will be identical,
>>> +     * so it's faster to just not build them. */
>>> +    struct ovn_northd_lb *lb2;
>>> +    HMAP_FOR_EACH (lb, hmap_node, lbs) {
>>> +        if (lb->skip_lflow_build) {
>>> +            continue;
>>> +        }
>>> +        for (size_t i = 0; i < lb->n_vips; i++) {
>>> +            if (lb->vips[i].vip_port) {
>>> +                goto next;
>>> +            }
>>> +        }
>>> +        HMAP_FOR_EACH (lb2, hmap_node, lbs) {
>>> +            if (lb2 == lb || lb2->skip_lflow_build) {
>>> +                continue;
>>> +            }
>>> +            if (ovn_northd_lb_equal_except_for_proto(lb, lb2)) {
>>> +                /* Load balancer still referenced from logical switch or
>>> +                 * router, so we can't destroy it here. */
>>> +                lb2->skip_lflow_build = true;
>>> +            }
>>> +        }
>>
>> I am not sure how this would scale in the case in which there were lots
>> of load-balancers. OVN-K is changing to a model in which there are many
>> (15k?) loadbalancers. This would increase this inner loop to 15k x 15k
>> (225M) iterations.
> 
> I'll try to test that.  We can try to hash the load balancer and make
> this part semi-linear.  The code will be a bit more complex.
> 
>>
>>> +next:;
>>> +    }
>>> +
>>>  }
>>>  
>>>  static void
>>> @@ -12772,6 +12801,9 @@ build_lflows_thread(void *arg)
>>>                      if (stop_parallel_processing()) {
>>>                          return NULL;
>>>                      }
>>> +                    if (lb->skip_lflow_build) {
>>> +                        continue;
>>> +                    }
>>>                      build_lswitch_arp_nd_service_monitor(lb, lsi->lflows,
>>>                                                           &lsi->match,
>>>                                                           &lsi->actions);
>>> @@ -12943,6 +12975,9 @@ build_lswitch_and_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
>>>              build_lswitch_and_lrouter_iterate_by_op(op, &lsi);
>>>          }
>>>          HMAP_FOR_EACH (lb, hmap_node, lbs) {
>>> +            if (lb->skip_lflow_build) {
>>> +                continue;
>>> +            }
>>>              build_lswitch_arp_nd_service_monitor(lb, lsi.lflows,
>>>                                                   &lsi.actions,
>>>                                                   &lsi.match);
>>>
>>
>> I have found a configuration that causes undefined behaviour. It is
>> the same without your patch, but it is relevant here. If you define
>> two load balancers with the protocol specified (but without the port)
>> and attach different attributes to each, it can cause conflicting
>> logical flows. Consider the following reproducer (it can be run in the
>> sandbox environment):
>>
>> `
>> # Create the first logical switch with one port
>> ovn-nbctl ls-add sw0
>> ovn-nbctl lsp-add sw0 sw0-port1
>> ovn-nbctl lsp-set-addresses sw0-port1 "50:54:00:00:00:01 192.168.0.2"
>>
>> # Create the second logical switch with one port
>> ovn-nbctl ls-add sw1
>> ovn-nbctl lsp-add sw1 sw1-port1
>> ovn-nbctl lsp-set-addresses sw1-port1 "50:54:00:00:00:03 11.0.0.2"
>>
>> # Create a logical router and attach both logical switches
>> ovn-nbctl lr-add lr0
>> ovn-nbctl lrp-add lr0 lrp0 00:00:00:00:ff:01 192.168.0.1/24
>> ovn-nbctl lsp-add sw0 lrp0-attachment
>> ovn-nbctl lsp-set-type lrp0-attachment router
>> ovn-nbctl lsp-set-addresses lrp0-attachment 00:00:00:00:ff:01
>> ovn-nbctl lsp-set-options lrp0-attachment router-port=lrp0
>> ovn-nbctl lrp-add lr0 lrp1 00:00:00:00:ff:02 11.0.0.1/24
>> ovn-nbctl lsp-add sw1 lrp1-attachment
>> ovn-nbctl lsp-set-type lrp1-attachment router
>> ovn-nbctl lsp-set-addresses lrp1-attachment 00:00:00:00:ff:02
>> ovn-nbctl lsp-set-options lrp1-attachment router-port=lrp1
>>
>> ovs-vsctl add-port br-int p1 -- \
>>     set Interface p1 external_ids:iface-id=sw0-port1
>> ovs-vsctl add-port br-int p2 -- \
>>     set Interface p2 external_ids:iface-id=sw1-port1
>>
>> ovn-nbctl set Logical_Router lr0 options:chassis=hv1
>> ovn-nbctl lb-add lb0 11.0.0.200 192.168.0.2
>> ovn-nbctl lb-add lb1 11.0.0.200 192.168.0.2
>> ovn-nbctl set Load_Balancer lb0 protocol=tcp
>> ovn-nbctl set Load_Balancer lb0 options=skip_snat=true
>> ovn-nbctl set Load_Balancer lb1 protocol=udp
>> ovn-nbctl lr-lb-add lr0 lb0
>> ovn-nbctl lr-lb-add lr0 lb1
>> `
>>
>> Your code gives the following flows:
>> $ ovn-sbctl dump-flows | grep lr_in_dnat
>>   table=6 (lr_in_dnat         ), priority=110  , match=(ct.est && ip4 &&
>> reg0 == 11.0.0.200 && ct_label.natted == 1),
>> action=(flags.skip_snat_for_lb = 1; next;)
>>   table=6 (lr_in_dnat         ), priority=110  , match=(ct.est && ip4 &&
>> reg0 == 11.0.0.200 && ct_label.natted == 1), action=(next;)
>>   table=6 (lr_in_dnat         ), priority=110  , match=(ct.new && ip4 &&
>> reg0 == 11.0.0.200), action=(ct_lb(backends=192.168.0.2);)
>>   table=6 (lr_in_dnat         ), priority=110  , match=(ct.new && ip4 &&
>> reg0 == 11.0.0.200), action=(flags.skip_snat_for_lb = 1;
>> ct_lb(backends=192.168.0.2);)
>>   table=6 (lr_in_dnat         ), priority=0    , match=(1), action=(next;)
>>
>> These should produce different flows for each load balancer, which means
>> we need to specify the protocol as a match field. I think this prevents
>> your optimization?
> 
> Since 'options' are different, the optimization will not be applied.
> ovn_northd_lb_equal_except_for_proto() compares them.
> 
> But if the options are the same, then you don't need a match on the
> protocol.
> 
>>
>> For reference, I raised a bugzilla bug for this and it is similar to one
>> I fixed recently.
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1998592
>>
> 
> The problem is that the ability to implement the optimization depends
> on how this particular bug will be fixed in OVN.  If it is fixed by
> blindly adding protocol matches everywhere, then, obviously, the
> optimization will not be possible, as logical flows will never be
> identical anymore.
> 
> On the other hand, if it is fixed this way, the number of logical flows
> in the database will explode from 9.9M to 26M, i.e. 2.5x, and that
> doesn't sound like a good thing to have at that scale.
> 
> So, I'd say that whoever works on fixing the bug should try to avoid
> protocol matches as much as possible.  This will save a lot of memory
> and processing time in all OVN components.  And it will also allow the
> optimization implemented in this patch.
> 
> On a side note, all these force_snat and skip_snat options look like
> dirty workarounds for some other problems.  And they are too low level
> for OVN and, therefore, IMHO, should not exist or be exposed to the
> end user (CMS).

Yeah, I agree, but they are part of the interface now and seem to be
used, which, unfortunately, is a bit of a problem. I am not really sure
of their origin, TBH.
> 
> Best regards, Ilya Maximets.
>
Dumitru Ceara Sept. 1, 2021, 12:42 p.m. UTC | #5
Hi Ilya,

On 8/27/21 5:24 PM, Ilya Maximets wrote:
> It's common for CMS (e.g., ovn-kubernetes) to create 3 identical load
> balancers, one per protocol (tcp, udp, sctp).  However, if VIPs of
> these load balancers have no ports specified (!vip_port), northd will
> generate identical logical flows for them.  2 of 3 such flows will be
> just discarded, so it's better to not build them form the beginning.

I don't think this is accurate; AFAIK ovn-kubernetes will not configure
load balancer VIPs without specifying the port.

> 
> For example, in an ovn-heater's 120 node density-heavy scenario we
> have 3 load balancers with 15K VIPs in each.  One for tcp, one for
> udp and one for sctp.  In this case, ovn-northd generates 26M of
> logical flows in total.  ~7.5M of them are flows for a single load
> balancer.  2 * 7.5M = 15M are identical to the first 7.5M and just
> discarded.

The way ovn-heater was configuring VIPs was wrong.  I opened a PR to fix
that:

https://github.com/dceara/ovn-heater/pull/75

> 
> Let's find all these identical load balancers and skip while building
> logical flows.  With this change, 15M of redundant logical flows are
> not generated saving ~1.5 seconds of the CPU time per run.

In conclusion I'm not so sure the impact will be as noticeable in a real
ovn-kubernetes deployment.

> 
> Comparison function and the loop looks heavy, but in testing it takes
> only a few milliseconds on these large load balancers.
> 
> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
> ---

Regards,
Dumitru
Han Zhou Sept. 16, 2021, 5:28 a.m. UTC | #6
On Wed, Sep 1, 2021 at 5:42 AM Dumitru Ceara <dceara@redhat.com> wrote:
>
> Hi Ilya,
>
> On 8/27/21 5:24 PM, Ilya Maximets wrote:
> > It's common for CMS (e.g., ovn-kubernetes) to create 3 identical load
> > balancers, one per protocol (tcp, udp, sctp).  However, if VIPs of
> > these load balancers has no ports specified (!vip_port), northd will
> > generate identical logical flows for them.  2 of 3 such flows will be
> > just discarded, so it's better to not build them form the beginning.
>
> I don't think this is accurate; AFAIK ovn-kubernetes will not configure
> load balancer VIPs without specifying the port.
>
> >
> > For example, in an ovn-heater's 120 node density-heavy scenario we
> > have 3 load balancers with 15K VIPs in each.  One for tcp, one for
> > udp and one for sctp.  In this case, ovn-northd generates 26M of
> > logical flows in total.  ~7.5M of them are flows for a single load
> > balancer.  2 * 7.5M = 15M are identical to the first 7.5M and just
> > discarded.
>
> The way ovn-heater was configuring VIPs was wrong.  I opened a PR to fix
> that:
>
> https://github.com/dceara/ovn-heater/pull/75
>
> >
> > Let's find all these identical load balancers and skip while building
> > logical flows.  With this change, 15M of redundant logical flows are
> > not generated saving ~1.5 seconds of the CPU time per run.
>
> In conclusion I'm not so sure the impact will be as noticeable in a real
> ovn-kubernetes deployment.
>

I agree with Dumitru that maybe it's not worth optimizing OVN for
unrealistic use cases. If someone configures LBs that way (with L4 protocol
specified but no L4 ports), shall we consider it a misconfiguration and let
the user be responsible for the unnecessary overhead? We could add
documentation advising users not to do such things.
On the other hand, if users do have ports added (i.e. correcting the
misconfiguration), this change would not help anyway.

> >
> > Comparison function and the loop looks heavy, but in testing it takes
> > only a few milliseconds on these large load balancers.

The O(n^2) loop does look heavy. Would it be slow when there are a large
number of LBs (each of which may have a small number of VIPs)?

Thanks,
Han

> >
> > Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
> > ---
>
> Regards,
> Dumitru
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Ilya Maximets Sept. 16, 2021, 11:35 a.m. UTC | #7
On 9/16/21 07:28, Han Zhou wrote:
> 
> 
> On Wed, Sep 1, 2021 at 5:42 AM Dumitru Ceara <dceara@redhat.com> wrote:
>>
>> Hi Ilya,
>>
>> On 8/27/21 5:24 PM, Ilya Maximets wrote:
>> > It's common for CMS (e.g., ovn-kubernetes) to create 3 identical load
>> > balancers, one per protocol (tcp, udp, sctp).  However, if VIPs of
>> > these load balancers has no ports specified (!vip_port), northd will
>> > generate identical logical flows for them.  2 of 3 such flows will be
>> > just discarded, so it's better to not build them form the beginning.
>>
>> I don't think this is accurate; AFAIK ovn-kubernetes will not configure
>> load balancer VIPs without specifying the port.
>>
>> >
>> > For example, in an ovn-heater's 120 node density-heavy scenario we
>> > have 3 load balancers with 15K VIPs in each.  One for tcp, one for
>> > udp and one for sctp.  In this case, ovn-northd generates 26M of
>> > logical flows in total.  ~7.5M of them are flows for a single load
>> > balancer.  2 * 7.5M = 15M are identical to the first 7.5M and just
>> > discarded.
>>
>> The way ovn-heater was configuring VIPs was wrong.  I opened a PR to fix
>> that:
>>
>> https://github.com/dceara/ovn-heater/pull/75
>>
>> >
>> > Let's find all these identical load balancers and skip while building
>> > logical flows.  With this change, 15M of redundant logical flows are
>> > not generated saving ~1.5 seconds of the CPU time per run.
>>
>> In conclusion I'm not so sure the impact will be as noticeable in a real
>> ovn-kubernetes deployment.
>>
> 
> I agree with Dumitru that maybe it's not worth optimizing OVN for unrealistic use cases. If someone configures LBs that way (with L4 protocol specified but no L4 ports), shall we consider it a misconfiguration and let the user be responsible for the unnecessary overhead? We could add documentation advising users not to do such things.
> On the other hand, if users do have ports added (i.e. correcting the misconfiguration), this change would not help anyway.

Yes, I agree.  It was an attempt to optimize what I saw in the
ovn-heater test.  But, yes, such a configuration seems to be far
from reality.  ovn-heater also configures ports for LBs now.
So, let's drop this patch.

> 
>> >
>> > Comparison function and the loop looks heavy, but in testing it takes
>> > only a few milliseconds on these large load balancers.
> 
> The O(n^2) loop does look heavy. Would it be slow when there are a large number of LBs (each of which may have a small number of VIPs)?

I had an idea of how to reduce the computational complexity of this
loop, but it seems pointless to actually implement it, since this
patch doesn't really cover real-world scenarios and will not be
accepted.
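
For the record, the rough idea (untested; lb_hash_except_proto() and
the 'dedup_node' field are hypothetical, not existing code) was to
bucket load balancers by a hash over the same protocol-independent
fields that ovn_northd_lb_equal_except_for_proto() compares, so the
full comparison only runs between LBs in the same bucket:

    struct hmap buckets = HMAP_INITIALIZER(&buckets);

    HMAP_FOR_EACH (lb, hmap_node, lbs) {
        /* The "all VIPs are portless" check from build_ovn_lbs()
         * is omitted here for brevity. */
        uint32_t hash = lb_hash_except_proto(lb);
        struct ovn_northd_lb *cand;
        bool dup = false;

        /* 'dedup_node' would be a second hmap_node in struct
         * ovn_northd_lb, since 'hmap_node' is already used by
         * the 'lbs' map. */
        HMAP_FOR_EACH_WITH_HASH (cand, dedup_node, hash, &buckets) {
            if (ovn_northd_lb_equal_except_for_proto(lb, cand)) {
                /* Keep 'cand'; skip building lflows for 'lb'. */
                lb->skip_lflow_build = true;
                dup = true;
                break;
            }
        }
        if (!dup) {
            hmap_insert(&buckets, &lb->dedup_node, hash);
        }
    }
    hmap_destroy(&buckets);

That would make the scan roughly linear in the number of load
balancers, at the cost of hashing each of them once.  But, as said,
not worth pursuing for this patch.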

Best regards, Ilya Maximets.

> 
> Thanks,
> Han
> 
>> >
>> > Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>> > ---
>>
>> Regards,
>> Dumitru
diff mbox series

Patch

diff --git a/lib/lb.c b/lib/lb.c
index 7b0ed1abe..fb12c3457 100644
--- a/lib/lb.c
+++ b/lib/lb.c
@@ -168,6 +168,7 @@  ovn_northd_lb_create(const struct nbrec_load_balancer *nbrec_lb)
     bool is_sctp = nullable_string_is_equal(nbrec_lb->protocol, "sctp");
     struct ovn_northd_lb *lb = xzalloc(sizeof *lb);
 
+    lb->skip_lflow_build = false;
     lb->nlb = nbrec_lb;
     lb->proto = is_udp ? "udp" : is_sctp ? "sctp" : "tcp";
     lb->n_vips = smap_count(&nbrec_lb->vips);
@@ -238,6 +239,61 @@  ovn_northd_lb_find(struct hmap *lbs, const struct uuid *uuid)
     return NULL;
 }
 
+/* Compares two load balancers for equality ignoring the 'protocol'. */
+bool
+ovn_northd_lb_equal_except_for_proto(const struct ovn_northd_lb *lb1,
+                                     const struct ovn_northd_lb *lb2)
+{
+    /* It's much more convenient to compare Northbound DB configuration. */
+    const struct nbrec_load_balancer *lb_a = lb1->nlb;
+    const struct nbrec_load_balancer *lb_b = lb2->nlb;
+
+    if (!smap_equal(&lb_a->external_ids, &lb_b->external_ids)) {
+        return false;
+    }
+    if (lb_a->n_health_check != lb_b->n_health_check) {
+        return false;
+    }
+    if (lb_a->n_health_check
+        && memcmp(lb_a->health_check, lb_b->health_check,
+                  lb_a->n_health_check * sizeof *lb_a->health_check)) {
+        return false;
+    }
+    if (!smap_equal(&lb_a->ip_port_mappings, &lb_b->ip_port_mappings)) {
+        return false;
+    }
+    if (!smap_equal(&lb_a->options, &lb_b->options)) {
+        return false;
+    }
+    if (lb_a->n_selection_fields != lb_b->n_selection_fields) {
+        return false;
+    }
+    if (lb_a->n_selection_fields &&
+        memcmp(lb_a->selection_fields, lb_b->selection_fields,
+               lb_a->n_selection_fields * sizeof *lb_a->selection_fields)) {
+        return false;
+    }
+    if (!smap_equal(&lb_a->vips, &lb_b->vips)) {
+        return false;
+    }
+
+    /* The fields below are not easily accessible from the NB DB entry, so
+     * compare the parsed versions instead. */
+    if (lb1->n_nb_ls != lb2->n_nb_ls || lb1->n_nb_lr != lb2->n_nb_lr) {
+        return false;
+    }
+    if (lb1->n_nb_ls
+        && memcmp(lb1->nb_ls, lb2->nb_ls, lb1->n_nb_ls * sizeof *lb1->nb_ls)) {
+        return false;
+    }
+    if (lb1->n_nb_lr
+        && memcmp(lb1->nb_lr, lb1->nb_lr, lb1->n_nb_lr * sizeof *lb1->nb_lr)) {
+        return false;
+    }
+
+    return true;
+}
+
 void
 ovn_northd_lb_add_lr(struct ovn_northd_lb *lb, struct ovn_datapath *od)
 {
diff --git a/lib/lb.h b/lib/lb.h
index 832ed31fb..8e92338a3 100644
--- a/lib/lb.h
+++ b/lib/lb.h
@@ -50,6 +50,8 @@  struct ovn_northd_lb {
     size_t n_nb_lr;
     size_t n_allocated_nb_lr;
     struct ovn_datapath **nb_lr;
+
+    bool skip_lflow_build;
 };
 
 struct ovn_lb_vip {
@@ -87,6 +89,8 @@  struct ovn_northd_lb_backend {
 
 struct ovn_northd_lb *ovn_northd_lb_create(const struct nbrec_load_balancer *);
 struct ovn_northd_lb * ovn_northd_lb_find(struct hmap *, const struct uuid *);
+bool ovn_northd_lb_equal_except_for_proto(const struct ovn_northd_lb *,
+                                          const struct ovn_northd_lb *);
 void ovn_northd_lb_destroy(struct ovn_northd_lb *);
 void
 ovn_northd_lb_add_lr(struct ovn_northd_lb *lb, struct ovn_datapath *od);
diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index af413aba4..d1efc28f4 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -3732,6 +3732,35 @@  build_ovn_lbs(struct northd_context *ctx, struct hmap *datapaths,
             sbrec_datapath_binding_set_load_balancers(od->sb, NULL, 0);
         }
     }
+
+    /* Northd doesn't use the protocol when a VIP has no port specified.
+     * And it's common for a user to configure several identical load
+     * balancers, one per protocol.  We need to find them and use only one
+     * for logical flow construction.  The logical flows would be identical,
+     * so it's faster to just not build the duplicates. */
+    struct ovn_northd_lb *lb2;
+    HMAP_FOR_EACH (lb, hmap_node, lbs) {
+        if (lb->skip_lflow_build) {
+            continue;
+        }
+        for (size_t i = 0; i < lb->n_vips; i++) {
+            if (lb->vips[i].vip_port) {
+                goto next;
+            }
+        }
+        HMAP_FOR_EACH (lb2, hmap_node, lbs) {
+            if (lb2 == lb || lb2->skip_lflow_build) {
+                continue;
+            }
+            if (ovn_northd_lb_equal_except_for_proto(lb, lb2)) {
+                /* Load balancer still referenced from logical switch or
+                 * router, so we can't destroy it here. */
+                lb2->skip_lflow_build = true;
+            }
+        }
+next:;
+    }
+
 }
 
 static void
@@ -12772,6 +12801,9 @@  build_lflows_thread(void *arg)
                     if (stop_parallel_processing()) {
                         return NULL;
                     }
+                    if (lb->skip_lflow_build) {
+                        continue;
+                    }
                     build_lswitch_arp_nd_service_monitor(lb, lsi->lflows,
                                                          &lsi->match,
                                                          &lsi->actions);
@@ -12943,6 +12975,9 @@  build_lswitch_and_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
             build_lswitch_and_lrouter_iterate_by_op(op, &lsi);
         }
         HMAP_FOR_EACH (lb, hmap_node, lbs) {
+            if (lb->skip_lflow_build) {
+                continue;
+            }
             build_lswitch_arp_nd_service_monitor(lb, lsi.lflows,
                                                  &lsi.actions,
                                                  &lsi.match);