Message ID | 20210827152452.1949113-1-i.maximets@ovn.org
---|---
State | Not Applicable
Series | [ovs-dev] ovn-northd: Don't generate identical flows for same LBs with different protocol.

Context | Check | Description
---|---|---
ovsrobot/apply-robot | success | apply and check: success
ovsrobot/github-robot-_Build_and_Test | fail | github build: failed
ovsrobot/github-robot-_ovn-kubernetes | fail | github build: failed
On 27/08/2021 16:24, Ilya Maximets wrote: > It's common for CMS (e.g., ovn-kubernetes) to create 3 identical load > balancers, one per protocol (tcp, udp, sctp). However, if VIPs of > these load balancers has no ports specified (!vip_port), northd will > generate identical logical flows for them. 2 of 3 such flows will be > just discarded, so it's better to not build them form the beginning. s/form/from > > For example, in an ovn-heater's 120 node density-heavy scenario we > have 3 load balancers with 15K VIPs in each. One for tcp, one for > udp and one for sctp. In this case, ovn-northd generates 26M of > logical flows in total. ~7.5M of them are flows for a single load > balancer. 2 * 7.5M = 15M are identical to the first 7.5M and just > discarded. > > Let's find all these identical load balancers and skip while building > logical flows. With this change, 15M of redundant logical flows are > not generated saving ~1.5 seconds of the CPU time per run. > > Comparison function and the loop looks heavy, but in testing it takes > only a few milliseconds on these large load balancers. > > Signed-off-by: Ilya Maximets <i.maximets@ovn.org> I see you haven't changed any tests? I am guessing because this doesn't change the logical flows .. which is surprising? Also, as an FYI, I was unable to create load balancers like this using nbctl directly. It fails on creation of the load balancer. However, you can modify the NBDB after creation. So I presume this is an allowed configuration. However, it is not very well specified. `$ ovn-nbctl lb-add lb0 11.0.0.200 192.168.0.2 tcp ovn-nbctl: Protocol is unnecessary when no port of vip is given.` > --- > > The better option might be to allow multiple protocols being configured > per load balancer, but this will require big API/schema/etc changes > and may reuire re-desing of certain northd/ovn-controller logic. > > lib/lb.c | 56 +++++++++++++++++++++++++++++++++++++++++++++ > lib/lb.h | 4 ++++ > northd/ovn-northd.c | 35 ++++++++++++++++++++++++++++ > 3 files changed, 95 insertions(+) > > diff --git a/lib/lb.c b/lib/lb.c > index 7b0ed1abe..fb12c3457 100644 > --- a/lib/lb.c > +++ b/lib/lb.c > @@ -168,6 +168,7 @@ ovn_northd_lb_create(const struct nbrec_load_balancer *nbrec_lb) > bool is_sctp = nullable_string_is_equal(nbrec_lb->protocol, "sctp"); > struct ovn_northd_lb *lb = xzalloc(sizeof *lb); > > + lb->skip_lflow_build = false; > lb->nlb = nbrec_lb; > lb->proto = is_udp ? "udp" : is_sctp ? "sctp" : "tcp"; > lb->n_vips = smap_count(&nbrec_lb->vips); > @@ -238,6 +239,61 @@ ovn_northd_lb_find(struct hmap *lbs, const struct uuid *uuid) > return NULL; > } > > +/* Compares two load balancers for equality ignoring the 'protocol'. */ > +bool > +ovn_northd_lb_equal_except_for_proto(const struct ovn_northd_lb *lb1, > + const struct ovn_northd_lb *lb2) > +{ I'd have a bit of a concern about maintaining this. For example, if we add more fields, we would have to remember to update this but I can't think of a better way of doing it. > + /* It's much more convenient to compare Northbound DB configuration. 
*/ > + const struct nbrec_load_balancer *lb_a = lb1->nlb; > + const struct nbrec_load_balancer *lb_b = lb2->nlb; > + > + if (!smap_equal(&lb_a->external_ids, &lb_b->external_ids)) { > + return false; > + } > + if (lb_a->n_health_check != lb_b->n_health_check) { > + return false; > + } > + if (lb_a->n_health_check > + && !memcmp(lb_a->health_check, lb_b->health_check, > + lb_a->n_health_check * sizeof *lb_a->health_check)) { > + return false; > + } > + if (!smap_equal(&lb_a->ip_port_mappings, &lb_b->ip_port_mappings)) { > + return false; > + } > + if (!smap_equal(&lb_a->options, &lb_b->options)) { > + return false; > + } > + if (lb_a->n_selection_fields != lb_b->n_selection_fields) { > + return false; > + } > + if (lb_a->n_selection_fields && > + memcmp(lb_a->selection_fields, lb_b->selection_fields, > + lb_a->n_selection_fields * sizeof *lb_a->selection_fields)) { > + return false; > + } > + if (!smap_equal(&lb_a->vips, &lb_b->vips)) { > + return false; > + } > + > + /* Below fields are not easily accessible from the Nb DB entry, so > + * comparing parsed versions. */ > + if (lb1->n_nb_ls != lb2->n_nb_ls || lb1->n_nb_lr != lb2->n_nb_lr) { > + return false; > + } > + if (lb1->n_nb_ls > + && memcmp(lb1->nb_ls, lb1->nb_ls, lb1->n_nb_ls * sizeof *lb1->nb_ls)) { > + return false; > + } > + if (lb1->n_nb_lr > + && memcmp(lb1->nb_lr, lb1->nb_lr, lb1->n_nb_lr * sizeof *lb1->nb_lr)) { > + return false; > + } > + > + return true; > +} > + > void > ovn_northd_lb_add_lr(struct ovn_northd_lb *lb, struct ovn_datapath *od) > { > diff --git a/lib/lb.h b/lib/lb.h > index 832ed31fb..8e92338a3 100644 > --- a/lib/lb.h > +++ b/lib/lb.h > @@ -50,6 +50,8 @@ struct ovn_northd_lb { > size_t n_nb_lr; > size_t n_allocated_nb_lr; > struct ovn_datapath **nb_lr; > + > + bool skip_lflow_build; > }; > > struct ovn_lb_vip { > @@ -87,6 +89,8 @@ struct ovn_northd_lb_backend { > > struct ovn_northd_lb *ovn_northd_lb_create(const struct nbrec_load_balancer *); > struct ovn_northd_lb * ovn_northd_lb_find(struct hmap *, const struct uuid *); > +bool ovn_northd_lb_equal_except_for_proto(const struct ovn_northd_lb *, > + const struct ovn_northd_lb *); > void ovn_northd_lb_destroy(struct ovn_northd_lb *); > void > ovn_northd_lb_add_lr(struct ovn_northd_lb *lb, struct ovn_datapath *od); > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c > index af413aba4..d1efc28f4 100644 > --- a/northd/ovn-northd.c > +++ b/northd/ovn-northd.c > @@ -3732,6 +3732,35 @@ build_ovn_lbs(struct northd_context *ctx, struct hmap *datapaths, > sbrec_datapath_binding_set_load_balancers(od->sb, NULL, 0); > } > } > + > + /* Northd doesn't use protocol in case vips has no ports specified. > + * And it's also common that user configures several identical load > + * balancers, one per protocol. We need to find them and use only one > + * for logical flow construction. Logical flows will be identical, > + * so it's faster to just not build them. */ > + struct ovn_northd_lb *lb2; > + HMAP_FOR_EACH (lb, hmap_node, lbs) { > + if (lb->skip_lflow_build) { > + continue; > + } > + for (size_t i = 0; i < lb->n_vips; i++) { > + if (lb->vips[i].vip_port) { > + goto next; > + } > + } > + HMAP_FOR_EACH (lb2, hmap_node, lbs) { > + if (lb2 == lb || lb2->skip_lflow_build) { > + continue; > + } > + if (ovn_northd_lb_equal_except_for_proto(lb, lb2)) { > + /* Load balancer still referenced from logical switch or > + * router, so we can't destroy it here. 
*/ > + lb2->skip_lflow_build = true; > + } > + } I am not sure how this would scale in the case in which there are lots of load balancers. OVN-K is changing to a model in which there are many (15k?) load balancers. This would increase this inner loop to 15k x 15k (225M) iterations. > +next:; > + } > + } > > static void > @@ -12772,6 +12801,9 @@ build_lflows_thread(void *arg) > if (stop_parallel_processing()) { > return NULL; > } > + if (lb->skip_lflow_build) { > + continue; > + } > build_lswitch_arp_nd_service_monitor(lb, lsi->lflows, > &lsi->match, > &lsi->actions); > @@ -12943,6 +12975,9 @@ build_lswitch_and_lrouter_flows(struct hmap *datapaths, struct hmap *ports, > build_lswitch_and_lrouter_iterate_by_op(op, &lsi); > } > HMAP_FOR_EACH (lb, hmap_node, lbs) { > + if (lb->skip_lflow_build) { > + continue; > + } > build_lswitch_arp_nd_service_monitor(lb, lsi.lflows, > &lsi.actions, > &lsi.match); > I have found a configuration that causes undefined behaviour. However, it is the same without your patch, but it is relevant. If you define two load balancers with the protocol specified (but without the port) and attach different attributes to each, it can cause conflicting logical flows. Consider the following reproducer (can be run in a sandbox environment): ` # Create the first logical switch with one port ovn-nbctl ls-add sw0 ovn-nbctl lsp-add sw0 sw0-port1 ovn-nbctl lsp-set-addresses sw0-port1 "50:54:00:00:00:01 192.168.0.2" # Create the second logical switch with one port ovn-nbctl ls-add sw1 ovn-nbctl lsp-add sw1 sw1-port1 ovn-nbctl lsp-set-addresses sw1-port1 "50:54:00:00:00:03 11.0.0.2" # Create a logical router and attach both logical switches ovn-nbctl lr-add lr0 ovn-nbctl lrp-add lr0 lrp0 00:00:00:00:ff:01 192.168.0.1/24 ovn-nbctl lsp-add sw0 lrp0-attachment ovn-nbctl lsp-set-type lrp0-attachment router ovn-nbctl lsp-set-addresses lrp0-attachment 00:00:00:00:ff:01 ovn-nbctl lsp-set-options lrp0-attachment router-port=lrp0 ovn-nbctl lrp-add lr0 lrp1 00:00:00:00:ff:02 11.0.0.1/24 ovn-nbctl lsp-add sw1 lrp1-attachment ovn-nbctl lsp-set-type lrp1-attachment router ovn-nbctl lsp-set-addresses lrp1-attachment 00:00:00:00:ff:02 ovn-nbctl lsp-set-options lrp1-attachment router-port=lrp1 ovs-vsctl add-port br-int p1 -- \ set Interface p1 external_ids:iface-id=sw0-port1 ovs-vsctl add-port br-int p2 -- \ set Interface p2 external_ids:iface-id=sw1-port1 ovn-nbctl set Logical_Router lr0 options:chassis=hv1 ovn-nbctl lb-add lb0 11.0.0.200 192.168.0.2 ovn-nbctl lb-add lb1 11.0.0.200 192.168.0.2 ovn-nbctl set Load_Balancer lb0 protocol=tcp ovn-nbctl set Load_Balancer lb0 options=skip_snat=true ovn-nbctl set Load_Balancer lb1 protocol=udp ovn-nbctl lr-lb-add lr0 lb0 ovn-nbctl lr-lb-add lr0 lb1 ` Your code gives the following flows: $ ovn-sbctl dump-flows | grep lr_in_dnat table=6 (lr_in_dnat ), priority=110 , match=(ct.est && ip4 && reg0 == 11.0.0.200 && ct_label.natted == 1), action=(flags.skip_snat_for_lb = 1; next;) table=6 (lr_in_dnat ), priority=110 , match=(ct.est && ip4 && reg0 == 11.0.0.200 && ct_label.natted == 1), action=(next;) table=6 (lr_in_dnat ), priority=110 , match=(ct.new && ip4 && reg0 == 11.0.0.200), action=(ct_lb(backends=192.168.0.2);) table=6 (lr_in_dnat ), priority=110 , match=(ct.new && ip4 && reg0 == 11.0.0.200), action=(flags.skip_snat_for_lb = 1; ct_lb(backends=192.168.0.2);) table=6 (lr_in_dnat ), priority=0 , match=(1), action=(next;) These should produce different flows for each load balancer, which means we need to specify the protocol as a match field.
I think this prevents your optimization? For reference, I raised a bugzilla bug for this and it is similar to one I fixed recently. https://bugzilla.redhat.com/show_bug.cgi?id=1998592
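For illustration, the disambiguation suggested above would qualify each of the conflicting lr_in_dnat matches with the VIP's L4 protocol, roughly along these lines (a hand-written sketch of the desired result, not output from any patched build):

table=6 (lr_in_dnat ), priority=110 , match=(ct.new && ip4 && reg0 == 11.0.0.200 && tcp), action=(flags.skip_snat_for_lb = 1; ct_lb(backends=192.168.0.2);)
table=6 (lr_in_dnat ), priority=110 , match=(ct.new && ip4 && reg0 == 11.0.0.200 && udp), action=(ct_lb(backends=192.168.0.2);)

With the protocol in the match, the two load balancers would no longer produce conflicting flows, but the flows would also no longer be identical across protocols, which is the tension discussed in the replies below.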
On 8/27/21 7:05 PM, Mark Gray wrote: > On 27/08/2021 16:24, Ilya Maximets wrote: >> It's common for CMS (e.g., ovn-kubernetes) to create 3 identical load >> balancers, one per protocol (tcp, udp, sctp). However, if VIPs of >> these load balancers has no ports specified (!vip_port), northd will >> generate identical logical flows for them. 2 of 3 such flows will be >> just discarded, so it's better to not build them form the beginning. > > s/form/from > >> >> For example, in an ovn-heater's 120 node density-heavy scenario we >> have 3 load balancers with 15K VIPs in each. One for tcp, one for >> udp and one for sctp. In this case, ovn-northd generates 26M of >> logical flows in total. ~7.5M of them are flows for a single load >> balancer. 2 * 7.5M = 15M are identical to the first 7.5M and just >> discarded. >> >> Let's find all these identical load balancers and skip while building >> logical flows. With this change, 15M of redundant logical flows are >> not generated saving ~1.5 seconds of the CPU time per run. >> >> Comparison function and the loop looks heavy, but in testing it takes >> only a few milliseconds on these large load balancers. >> >> Signed-off-by: Ilya Maximets <i.maximets@ovn.org> > > I see you haven't changed any tests? I am guessing because this doesn't > change the logical flows .. which is surprising? This is not surprising. :) This patch only allows to not generate duplicate flows that will never end up in a Southbound anyway. So, total number and look of logical flows is exactly the same. Hence, it's hard to test. It's a pure performance fix. > > Also, as an FYI, I was unable to create load balancers like this using > nbctl directly. It fails on creation of the load balancer. However, you > can modify the NBDB after creation. So I presume this is an allowed > configuration. However, it is not very well specified. Yeah. nbctl seems inconsistent. However, ovn-k8s and other CMSs are not using it right now and communicate directly with a dtabase, so it's not an issue for them. > > `$ ovn-nbctl lb-add lb0 11.0.0.200 192.168.0.2 tcp > ovn-nbctl: Protocol is unnecessary when no port of vip is given.` > >> --- >> >> The better option might be to allow multiple protocols being configured >> per load balancer, but this will require big API/schema/etc changes >> and may reuire re-desing of certain northd/ovn-controller logic. >> >> lib/lb.c | 56 +++++++++++++++++++++++++++++++++++++++++++++ >> lib/lb.h | 4 ++++ >> northd/ovn-northd.c | 35 ++++++++++++++++++++++++++++ >> 3 files changed, 95 insertions(+) >> >> diff --git a/lib/lb.c b/lib/lb.c >> index 7b0ed1abe..fb12c3457 100644 >> --- a/lib/lb.c >> +++ b/lib/lb.c >> @@ -168,6 +168,7 @@ ovn_northd_lb_create(const struct nbrec_load_balancer *nbrec_lb) >> bool is_sctp = nullable_string_is_equal(nbrec_lb->protocol, "sctp"); >> struct ovn_northd_lb *lb = xzalloc(sizeof *lb); >> >> + lb->skip_lflow_build = false; >> lb->nlb = nbrec_lb; >> lb->proto = is_udp ? "udp" : is_sctp ? "sctp" : "tcp"; >> lb->n_vips = smap_count(&nbrec_lb->vips); >> @@ -238,6 +239,61 @@ ovn_northd_lb_find(struct hmap *lbs, const struct uuid *uuid) >> return NULL; >> } >> >> +/* Compares two load balancers for equality ignoring the 'protocol'. */ >> +bool >> +ovn_northd_lb_equal_except_for_proto(const struct ovn_northd_lb *lb1, >> + const struct ovn_northd_lb *lb2) >> +{ > > I'd have a bit of a concern about maintaining this. For example, if we > add more fields, we would have to remember to update this but I can't > think of a better way of doing it. 
I can add a comment to the 'struct ovn_northd_lb' for developers that any change should be reflected in this function. > >> + /* It's much more convenient to compare Northbound DB configuration. */ >> + const struct nbrec_load_balancer *lb_a = lb1->nlb; >> + const struct nbrec_load_balancer *lb_b = lb2->nlb; >> + >> + if (!smap_equal(&lb_a->external_ids, &lb_b->external_ids)) { >> + return false; >> + } >> + if (lb_a->n_health_check != lb_b->n_health_check) { >> + return false; >> + } >> + if (lb_a->n_health_check >> + && !memcmp(lb_a->health_check, lb_b->health_check, >> + lb_a->n_health_check * sizeof *lb_a->health_check)) { >> + return false; >> + } >> + if (!smap_equal(&lb_a->ip_port_mappings, &lb_b->ip_port_mappings)) { >> + return false; >> + } >> + if (!smap_equal(&lb_a->options, &lb_b->options)) { >> + return false; >> + } >> + if (lb_a->n_selection_fields != lb_b->n_selection_fields) { >> + return false; >> + } >> + if (lb_a->n_selection_fields && >> + memcmp(lb_a->selection_fields, lb_b->selection_fields, >> + lb_a->n_selection_fields * sizeof *lb_a->selection_fields)) { >> + return false; >> + } >> + if (!smap_equal(&lb_a->vips, &lb_b->vips)) { >> + return false; >> + } >> + >> + /* Below fields are not easily accessible from the Nb DB entry, so >> + * comparing parsed versions. */ >> + if (lb1->n_nb_ls != lb2->n_nb_ls || lb1->n_nb_lr != lb2->n_nb_lr) { >> + return false; >> + } >> + if (lb1->n_nb_ls >> + && memcmp(lb1->nb_ls, lb1->nb_ls, lb1->n_nb_ls * sizeof *lb1->nb_ls)) { >> + return false; >> + } >> + if (lb1->n_nb_lr >> + && memcmp(lb1->nb_lr, lb1->nb_lr, lb1->n_nb_lr * sizeof *lb1->nb_lr)) { >> + return false; >> + } >> + >> + return true; >> +} >> + >> void >> ovn_northd_lb_add_lr(struct ovn_northd_lb *lb, struct ovn_datapath *od) >> { >> diff --git a/lib/lb.h b/lib/lb.h >> index 832ed31fb..8e92338a3 100644 >> --- a/lib/lb.h >> +++ b/lib/lb.h >> @@ -50,6 +50,8 @@ struct ovn_northd_lb { >> size_t n_nb_lr; >> size_t n_allocated_nb_lr; >> struct ovn_datapath **nb_lr; >> + >> + bool skip_lflow_build; >> }; >> >> struct ovn_lb_vip { >> @@ -87,6 +89,8 @@ struct ovn_northd_lb_backend { >> >> struct ovn_northd_lb *ovn_northd_lb_create(const struct nbrec_load_balancer *); >> struct ovn_northd_lb * ovn_northd_lb_find(struct hmap *, const struct uuid *); >> +bool ovn_northd_lb_equal_except_for_proto(const struct ovn_northd_lb *, >> + const struct ovn_northd_lb *); >> void ovn_northd_lb_destroy(struct ovn_northd_lb *); >> void >> ovn_northd_lb_add_lr(struct ovn_northd_lb *lb, struct ovn_datapath *od); >> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c >> index af413aba4..d1efc28f4 100644 >> --- a/northd/ovn-northd.c >> +++ b/northd/ovn-northd.c >> @@ -3732,6 +3732,35 @@ build_ovn_lbs(struct northd_context *ctx, struct hmap *datapaths, >> sbrec_datapath_binding_set_load_balancers(od->sb, NULL, 0); >> } >> } >> + >> + /* Northd doesn't use protocol in case vips has no ports specified. >> + * And it's also common that user configures several identical load >> + * balancers, one per protocol. We need to find them and use only one >> + * for logical flow construction. Logical flows will be identical, >> + * so it's faster to just not build them. 
*/ >> + struct ovn_northd_lb *lb2; >> + HMAP_FOR_EACH (lb, hmap_node, lbs) { >> + if (lb->skip_lflow_build) { >> + continue; >> + } >> + for (size_t i = 0; i < lb->n_vips; i++) { >> + if (lb->vips[i].vip_port) { >> + goto next; >> + } >> + } >> + HMAP_FOR_EACH (lb2, hmap_node, lbs) { >> + if (lb2 == lb || lb2->skip_lflow_build) { >> + continue; >> + } >> + if (ovn_northd_lb_equal_except_for_proto(lb, lb2)) { >> + /* Load balancer still referenced from logical switch or >> + * router, so we can't destroy it here. */ >> + lb2->skip_lflow_build = true; >> + } >> + } > > I am not sure how this would scale in the case in which there were lots > of load-balancers. OVN-K is changing to a model in which there are many > (15k?) loadbalancers. This would increase this inner loop to 15k x 15k > (225M) iterations. I'll try to test that. We can try to hash the load balancer and make this part semi-linear. The code will be a bit more complex. > >> +next:; >> + } >> + >> } >> >> static void >> @@ -12772,6 +12801,9 @@ build_lflows_thread(void *arg) >> if (stop_parallel_processing()) { >> return NULL; >> } >> + if (lb->skip_lflow_build) { >> + continue; >> + } >> build_lswitch_arp_nd_service_monitor(lb, lsi->lflows, >> &lsi->match, >> &lsi->actions); >> @@ -12943,6 +12975,9 @@ build_lswitch_and_lrouter_flows(struct hmap *datapaths, struct hmap *ports, >> build_lswitch_and_lrouter_iterate_by_op(op, &lsi); >> } >> HMAP_FOR_EACH (lb, hmap_node, lbs) { >> + if (lb->skip_lflow_build) { >> + continue; >> + } >> build_lswitch_arp_nd_service_monitor(lb, lsi.lflows, >> &lsi.actions, >> &lsi.match); >> > > I have found a configuration that causes undefined behaviour. However, > it is the same as the without your patch but it is relevant. If you > define two load balancers with the protocol specified (but without the > port) and attach different attributes to each. It can cause conflicting > logical flows. 
Consider the following reproducer (can be run in sandbox > environment): > > ` > # Create the first logical switch with one port > ovn-nbctl ls-add sw0 > ovn-nbctl lsp-add sw0 sw0-port1 > ovn-nbctl lsp-set-addresses sw0-port1 "50:54:00:00:00:01 192.168.0.2" > > # Create the second logical switch with one port > ovn-nbctl ls-add sw1 > ovn-nbctl lsp-add sw1 sw1-port1 > ovn-nbctl lsp-set-addresses sw1-port1 "50:54:00:00:00:03 11.0.0.2" > > # Create a logical router and attach both logical switches > ovn-nbctl lr-add lr0 > ovn-nbctl lrp-add lr0 lrp0 00:00:00:00:ff:01 192.168.0.1/24 > ovn-nbctl lsp-add sw0 lrp0-attachment > ovn-nbctl lsp-set-type lrp0-attachment router > ovn-nbctl lsp-set-addresses lrp0-attachment 00:00:00:00:ff:01 > ovn-nbctl lsp-set-options lrp0-attachment router-port=lrp0 > ovn-nbctl lrp-add lr0 lrp1 00:00:00:00:ff:02 11.0.0.1/24 > ovn-nbctl lsp-add sw1 lrp1-attachment > ovn-nbctl lsp-set-type lrp1-attachment router > ovn-nbctl lsp-set-addresses lrp1-attachment 00:00:00:00:ff:02 > ovn-nbctl lsp-set-options lrp1-attachment router-port=lrp1 > > ovs-vsctl add-port br-int p1 -- \ > set Interface p1 external_ids:iface-id=sw0-port1 > ovs-vsctl add-port br-int p2 -- \ > set Interface p2 external_ids:iface-id=sw1-port1 > > ovn-nbctl set Logical_Router lr0 options:chassis=hv1 > ovn-nbctl lb-add lb0 11.0.0.200 192.168.0.2 > ovn-nbctl lb-add lb1 11.0.0.200 192.168.0.2 > ovn-nbctl set Load_Balancer lb0 protocol=tcp > ovn-nbctl set Load_Balancer lb0 options=skip_snat=true > ovn-nbctl set Load_Balancer lb1 protocol=udp > ovn-nbctl lr-lb-add lr0 lb0 > ovn-nbctl lr-lb-add lr0 lb1 > ` > > Your code gives the following flows: > $ ovn-sbctl dump-flows | grep lr_in_dnat > table=6 (lr_in_dnat ), priority=110 , match=(ct.est && ip4 && > reg0 == 11.0.0.200 && ct_label.natted == 1), > action=(flags.skip_snat_for_lb = 1; next;) > table=6 (lr_in_dnat ), priority=110 , match=(ct.est && ip4 && > reg0 == 11.0.0.200 && ct_label.natted == 1), action=(next;) > table=6 (lr_in_dnat ), priority=110 , match=(ct.new && ip4 && > reg0 == 11.0.0.200), action=(ct_lb(backends=192.168.0.2);) > table=6 (lr_in_dnat ), priority=110 , match=(ct.new && ip4 && > reg0 == 11.0.0.200), action=(flags.skip_snat_for_lb = 1; > ct_lb(backends=192.168.0.2);) > table=6 (lr_in_dnat ), priority=0 , match=(1), action=(next;) > > These should produce different flows for each load balancer which means > we need to specify protocol as a match field. I think this prevents your > optimization? Since 'options' are different, optimization will not be applied. ovn_northd_lb_equal_except_for_proto() compares them. But if options are the same, than you don't need a match on a protocol. > > For reference, I raised a bugzilla bug for this and it is similar to one > I fixed recently. > > https://bugzilla.redhat.com/show_bug.cgi?id=1998592 > The problem is that ability to implement the optimization depends on how this particular bug will be fixed in OVN. If it will be fixed by blindly adding protocol matches everywhere, than, obviously, optimization will not be possible as logical flows will never be identical anymore. On the other hand, if it will be fixed this way, number of logical flows in a database will explode from 9.9M to 26M flows, i.e. 2.5x and it doesn't sound like a good thing to have at that scale. So, I'd say that whoever will work on fixing a bug should try to avoid having protocol matches as much as possible. This will save a lot of memory and processing time in all OVN components. 
And it will also allow the optimization implemented in this patch. On a side note, all these force_snat and skip_snat options look like dirty workarounds for some other problems. And they are too low level for OVN, therefore, IMHO, should not exist or be exposed to the end user (CMS). Best regards, Ilya Maximets.
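Ilya's suggestion above to hash the load balancers and make the duplicate search semi-linear could look roughly like the following standalone sketch. The types and helpers here (struct lb, lb_hash_no_proto(), lb_equal_except_for_proto()) are simplified stand-ins rather than OVN's real structures and APIs: only the protocol-independent configuration feeds the hash, so load balancers that differ only in protocol land next to each other after sorting, and only those neighbors need the full comparison.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for 'struct ovn_northd_lb': the whole
 * protocol-independent configuration is modeled as one string. */
struct lb {
    const char *config;    /* Everything except the protocol. */
    const char *proto;     /* Excluded from hash and comparison. */
    bool skip_lflow_build;
};

/* FNV-1a hash over the protocol-independent configuration only, so
 * load balancers that differ only in protocol hash identically. */
static uint32_t
lb_hash_no_proto(const struct lb *lb)
{
    uint32_t hash = 2166136261u;
    for (const char *p = lb->config; *p; p++) {
        hash = (hash ^ (uint8_t) *p) * 16777619u;
    }
    return hash;
}

static bool
lb_equal_except_for_proto(const struct lb *a, const struct lb *b)
{
    return !strcmp(a->config, b->config);
}

struct keyed_lb {
    uint32_t hash;
    struct lb *lb;
};

static int
compare_keyed_lb(const void *a_, const void *b_)
{
    const struct keyed_lb *a = a_;
    const struct keyed_lb *b = b_;
    return a->hash < b->hash ? -1 : a->hash > b->hash;
}

int
main(void)
{
    struct lb lbs[] = {
        { "vips=11.0.0.200=192.168.0.2", "tcp",  false },
        { "vips=11.0.0.200=192.168.0.2", "udp",  false },
        { "vips=11.0.0.200=192.168.0.2", "sctp", false },
        { "vips=11.0.0.201=192.168.0.3", "tcp",  false },
    };
    size_t n = sizeof lbs / sizeof *lbs;
    struct keyed_lb keyed[sizeof lbs / sizeof *lbs];

    /* Sorting by hash makes duplicates adjacent, so the expensive full
     * comparison runs only on neighbors with equal hashes instead of
     * on all n^2 pairs.  (A real implementation would also have to
     * handle unequal entries that collide on the hash.) */
    for (size_t i = 0; i < n; i++) {
        keyed[i] = (struct keyed_lb) { lb_hash_no_proto(&lbs[i]), &lbs[i] };
    }
    qsort(keyed, n, sizeof *keyed, compare_keyed_lb);

    for (size_t i = 1; i < n; i++) {
        if (keyed[i].hash == keyed[i - 1].hash
            && lb_equal_except_for_proto(keyed[i].lb, keyed[i - 1].lb)) {
            keyed[i].lb->skip_lflow_build = true;
        }
    }

    for (size_t i = 0; i < n; i++) {
        printf("%s (%s): skip_lflow_build=%s\n", lbs[i].config,
               lbs[i].proto, lbs[i].skip_lflow_build ? "true" : "false");
    }
    return 0;
}

Sorting costs O(n log n) instead of the O(n^2) pairwise scan, at the price of slightly more code and of having to deal with accidental hash collisions between unequal load balancers.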
On 8/27/21 8:03 PM, Ilya Maximets wrote: > On 8/27/21 7:05 PM, Mark Gray wrote: >> On 27/08/2021 16:24, Ilya Maximets wrote: >>> It's common for CMS (e.g., ovn-kubernetes) to create 3 identical load >>> balancers, one per protocol (tcp, udp, sctp). However, if VIPs of >>> these load balancers has no ports specified (!vip_port), northd will >>> generate identical logical flows for them. 2 of 3 such flows will be >>> just discarded, so it's better to not build them form the beginning. >> >> s/form/from >> >>> >>> For example, in an ovn-heater's 120 node density-heavy scenario we >>> have 3 load balancers with 15K VIPs in each. One for tcp, one for >>> udp and one for sctp. In this case, ovn-northd generates 26M of >>> logical flows in total. ~7.5M of them are flows for a single load >>> balancer. 2 * 7.5M = 15M are identical to the first 7.5M and just >>> discarded. >>> >>> Let's find all these identical load balancers and skip while building >>> logical flows. With this change, 15M of redundant logical flows are >>> not generated saving ~1.5 seconds of the CPU time per run. >>> >>> Comparison function and the loop looks heavy, but in testing it takes >>> only a few milliseconds on these large load balancers. >>> >>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org> >> >> I see you haven't changed any tests? I am guessing because this doesn't >> change the logical flows .. which is surprising? > > This is not surprising. :) > This patch only allows to not generate duplicate flows that will never > end up in a Southbound anyway. So, total number and look of logical > flows is exactly the same. Hence, it's hard to test. It's a pure > performance fix. > >> >> Also, as an FYI, I was unable to create load balancers like this using >> nbctl directly. It fails on creation of the load balancer. However, you >> can modify the NBDB after creation. So I presume this is an allowed >> configuration. However, it is not very well specified. > > Yeah. nbctl seems inconsistent. However, ovn-k8s and other CMSs are > not using it right now and communicate directly with a dtabase, so it's > not an issue for them. > >> >> `$ ovn-nbctl lb-add lb0 11.0.0.200 192.168.0.2 tcp >> ovn-nbctl: Protocol is unnecessary when no port of vip is given.` >> >>> --- >>> >>> The better option might be to allow multiple protocols being configured >>> per load balancer, but this will require big API/schema/etc changes >>> and may reuire re-desing of certain northd/ovn-controller logic. >>> >>> lib/lb.c | 56 +++++++++++++++++++++++++++++++++++++++++++++ >>> lib/lb.h | 4 ++++ >>> northd/ovn-northd.c | 35 ++++++++++++++++++++++++++++ >>> 3 files changed, 95 insertions(+) >>> >>> diff --git a/lib/lb.c b/lib/lb.c >>> index 7b0ed1abe..fb12c3457 100644 >>> --- a/lib/lb.c >>> +++ b/lib/lb.c >>> @@ -168,6 +168,7 @@ ovn_northd_lb_create(const struct nbrec_load_balancer *nbrec_lb) >>> bool is_sctp = nullable_string_is_equal(nbrec_lb->protocol, "sctp"); >>> struct ovn_northd_lb *lb = xzalloc(sizeof *lb); >>> >>> + lb->skip_lflow_build = false; >>> lb->nlb = nbrec_lb; >>> lb->proto = is_udp ? "udp" : is_sctp ? "sctp" : "tcp"; >>> lb->n_vips = smap_count(&nbrec_lb->vips); >>> @@ -238,6 +239,61 @@ ovn_northd_lb_find(struct hmap *lbs, const struct uuid *uuid) >>> return NULL; >>> } >>> >>> +/* Compares two load balancers for equality ignoring the 'protocol'. */ >>> +bool >>> +ovn_northd_lb_equal_except_for_proto(const struct ovn_northd_lb *lb1, >>> + const struct ovn_northd_lb *lb2) >>> +{ >> >> I'd have a bit of a concern about maintaining this. 
For example, if we >> add more fields, we would have to remember to update this but I can't >> think of a better way of doing it. > > I can add a comment to the 'struct ovn_northd_lb' for developers that > any change should be reflected in this function. > >> >>> + /* It's much more convenient to compare Northbound DB configuration. */ >>> + const struct nbrec_load_balancer *lb_a = lb1->nlb; >>> + const struct nbrec_load_balancer *lb_b = lb2->nlb; >>> + >>> + if (!smap_equal(&lb_a->external_ids, &lb_b->external_ids)) { >>> + return false; >>> + } >>> + if (lb_a->n_health_check != lb_b->n_health_check) { >>> + return false; >>> + } >>> + if (lb_a->n_health_check >>> + && !memcmp(lb_a->health_check, lb_b->health_check, >>> + lb_a->n_health_check * sizeof *lb_a->health_check)) { >>> + return false; >>> + } >>> + if (!smap_equal(&lb_a->ip_port_mappings, &lb_b->ip_port_mappings)) { >>> + return false; >>> + } >>> + if (!smap_equal(&lb_a->options, &lb_b->options)) { >>> + return false; >>> + } >>> + if (lb_a->n_selection_fields != lb_b->n_selection_fields) { >>> + return false; >>> + } >>> + if (lb_a->n_selection_fields && >>> + memcmp(lb_a->selection_fields, lb_b->selection_fields, >>> + lb_a->n_selection_fields * sizeof *lb_a->selection_fields)) { >>> + return false; >>> + } >>> + if (!smap_equal(&lb_a->vips, &lb_b->vips)) { >>> + return false; >>> + } >>> + >>> + /* Below fields are not easily accessible from the Nb DB entry, so >>> + * comparing parsed versions. */ >>> + if (lb1->n_nb_ls != lb2->n_nb_ls || lb1->n_nb_lr != lb2->n_nb_lr) { >>> + return false; >>> + } >>> + if (lb1->n_nb_ls >>> + && memcmp(lb1->nb_ls, lb1->nb_ls, lb1->n_nb_ls * sizeof *lb1->nb_ls)) { >>> + return false; >>> + } >>> + if (lb1->n_nb_lr >>> + && memcmp(lb1->nb_lr, lb1->nb_lr, lb1->n_nb_lr * sizeof *lb1->nb_lr)) { >>> + return false; >>> + } >>> + >>> + return true; >>> +} >>> + >>> void >>> ovn_northd_lb_add_lr(struct ovn_northd_lb *lb, struct ovn_datapath *od) >>> { >>> diff --git a/lib/lb.h b/lib/lb.h >>> index 832ed31fb..8e92338a3 100644 >>> --- a/lib/lb.h >>> +++ b/lib/lb.h >>> @@ -50,6 +50,8 @@ struct ovn_northd_lb { >>> size_t n_nb_lr; >>> size_t n_allocated_nb_lr; >>> struct ovn_datapath **nb_lr; >>> + >>> + bool skip_lflow_build; >>> }; >>> >>> struct ovn_lb_vip { >>> @@ -87,6 +89,8 @@ struct ovn_northd_lb_backend { >>> >>> struct ovn_northd_lb *ovn_northd_lb_create(const struct nbrec_load_balancer *); >>> struct ovn_northd_lb * ovn_northd_lb_find(struct hmap *, const struct uuid *); >>> +bool ovn_northd_lb_equal_except_for_proto(const struct ovn_northd_lb *, >>> + const struct ovn_northd_lb *); >>> void ovn_northd_lb_destroy(struct ovn_northd_lb *); >>> void >>> ovn_northd_lb_add_lr(struct ovn_northd_lb *lb, struct ovn_datapath *od); >>> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c >>> index af413aba4..d1efc28f4 100644 >>> --- a/northd/ovn-northd.c >>> +++ b/northd/ovn-northd.c >>> @@ -3732,6 +3732,35 @@ build_ovn_lbs(struct northd_context *ctx, struct hmap *datapaths, >>> sbrec_datapath_binding_set_load_balancers(od->sb, NULL, 0); >>> } >>> } >>> + >>> + /* Northd doesn't use protocol in case vips has no ports specified. >>> + * And it's also common that user configures several identical load >>> + * balancers, one per protocol. We need to find them and use only one >>> + * for logical flow construction. Logical flows will be identical, >>> + * so it's faster to just not build them. 
*/ >>> + struct ovn_northd_lb *lb2; >>> + HMAP_FOR_EACH (lb, hmap_node, lbs) { >>> + if (lb->skip_lflow_build) { >>> + continue; >>> + } >>> + for (size_t i = 0; i < lb->n_vips; i++) { >>> + if (lb->vips[i].vip_port) { >>> + goto next; >>> + } >>> + } >>> + HMAP_FOR_EACH (lb2, hmap_node, lbs) { >>> + if (lb2 == lb || lb2->skip_lflow_build) { >>> + continue; >>> + } >>> + if (ovn_northd_lb_equal_except_for_proto(lb, lb2)) { >>> + /* Load balancer still referenced from logical switch or >>> + * router, so we can't destroy it here. */ >>> + lb2->skip_lflow_build = true; >>> + } >>> + } >> >> I am not sure how this would scale in the case in which there were lots >> of load-balancers. OVN-K is changing to a model in which there are many >> (15k?) loadbalancers. This would increase this inner loop to 15k x 15k >> (225M) iterations. > > I'll try to test that. We can try to hash the load balancer and make > this part semi-linear. The code will be a bit more complex. > >> >>> +next:; >>> + } >>> + >>> } >>> >>> static void >>> @@ -12772,6 +12801,9 @@ build_lflows_thread(void *arg) >>> if (stop_parallel_processing()) { >>> return NULL; >>> } >>> + if (lb->skip_lflow_build) { >>> + continue; >>> + } >>> build_lswitch_arp_nd_service_monitor(lb, lsi->lflows, >>> &lsi->match, >>> &lsi->actions); >>> @@ -12943,6 +12975,9 @@ build_lswitch_and_lrouter_flows(struct hmap *datapaths, struct hmap *ports, >>> build_lswitch_and_lrouter_iterate_by_op(op, &lsi); >>> } >>> HMAP_FOR_EACH (lb, hmap_node, lbs) { >>> + if (lb->skip_lflow_build) { >>> + continue; >>> + } >>> build_lswitch_arp_nd_service_monitor(lb, lsi.lflows, >>> &lsi.actions, >>> &lsi.match); >>> >> >> I have found a configuration that causes undefined behaviour. However, >> it is the same as the without your patch but it is relevant. If you >> define two load balancers with the protocol specified (but without the >> port) and attach different attributes to each. It can cause conflicting >> logical flows. 
Consider the following reproducer (can be run in sandbox >> environment): >> >> ` >> # Create the first logical switch with one port >> ovn-nbctl ls-add sw0 >> ovn-nbctl lsp-add sw0 sw0-port1 >> ovn-nbctl lsp-set-addresses sw0-port1 "50:54:00:00:00:01 192.168.0.2" >> >> # Create the second logical switch with one port >> ovn-nbctl ls-add sw1 >> ovn-nbctl lsp-add sw1 sw1-port1 >> ovn-nbctl lsp-set-addresses sw1-port1 "50:54:00:00:00:03 11.0.0.2" >> >> # Create a logical router and attach both logical switches >> ovn-nbctl lr-add lr0 >> ovn-nbctl lrp-add lr0 lrp0 00:00:00:00:ff:01 192.168.0.1/24 >> ovn-nbctl lsp-add sw0 lrp0-attachment >> ovn-nbctl lsp-set-type lrp0-attachment router >> ovn-nbctl lsp-set-addresses lrp0-attachment 00:00:00:00:ff:01 >> ovn-nbctl lsp-set-options lrp0-attachment router-port=lrp0 >> ovn-nbctl lrp-add lr0 lrp1 00:00:00:00:ff:02 11.0.0.1/24 >> ovn-nbctl lsp-add sw1 lrp1-attachment >> ovn-nbctl lsp-set-type lrp1-attachment router >> ovn-nbctl lsp-set-addresses lrp1-attachment 00:00:00:00:ff:02 >> ovn-nbctl lsp-set-options lrp1-attachment router-port=lrp1 >> >> ovs-vsctl add-port br-int p1 -- \ >> set Interface p1 external_ids:iface-id=sw0-port1 >> ovs-vsctl add-port br-int p2 -- \ >> set Interface p2 external_ids:iface-id=sw1-port1 >> >> ovn-nbctl set Logical_Router lr0 options:chassis=hv1 >> ovn-nbctl lb-add lb0 11.0.0.200 192.168.0.2 >> ovn-nbctl lb-add lb1 11.0.0.200 192.168.0.2 >> ovn-nbctl set Load_Balancer lb0 protocol=tcp >> ovn-nbctl set Load_Balancer lb0 options=skip_snat=true >> ovn-nbctl set Load_Balancer lb1 protocol=udp >> ovn-nbctl lr-lb-add lr0 lb0 >> ovn-nbctl lr-lb-add lr0 lb1 >> ` >> >> Your code gives the following flows: >> $ ovn-sbctl dump-flows | grep lr_in_dnat >> table=6 (lr_in_dnat ), priority=110 , match=(ct.est && ip4 && >> reg0 == 11.0.0.200 && ct_label.natted == 1), >> action=(flags.skip_snat_for_lb = 1; next;) >> table=6 (lr_in_dnat ), priority=110 , match=(ct.est && ip4 && >> reg0 == 11.0.0.200 && ct_label.natted == 1), action=(next;) >> table=6 (lr_in_dnat ), priority=110 , match=(ct.new && ip4 && >> reg0 == 11.0.0.200), action=(ct_lb(backends=192.168.0.2);) >> table=6 (lr_in_dnat ), priority=110 , match=(ct.new && ip4 && >> reg0 == 11.0.0.200), action=(flags.skip_snat_for_lb = 1; >> ct_lb(backends=192.168.0.2);) >> table=6 (lr_in_dnat ), priority=0 , match=(1), action=(next;) >> >> These should produce different flows for each load balancer which means >> we need to specify protocol as a match field. I think this prevents your >> optimization? > > Since 'options' are different, optimization will not be applied. > ovn_northd_lb_equal_except_for_proto() compares them. > > But if options are the same, than you don't need a match on a protocol. > >> >> For reference, I raised a bugzilla bug for this and it is similar to one >> I fixed recently. >> >> https://bugzilla.redhat.com/show_bug.cgi?id=1998592 >> > > The problem is that ability to implement the optimization depends on > how this particular bug will be fixed in OVN. If it will be fixed by > blindly adding protocol matches everywhere, than, obviously, optimization > will not be possible as logical flows will never be identical anymore. > > On the other hand, if it will be fixed this way, number of logical flows > in a database will explode from 9.9M to 26M flows, i.e. 2.5x and it doesn't > sound like a good thing to have at that scale. Hmm. 
I guess it's not that bad, because with dp-groups they will be grouped, but I'd still expect an increase of 100-200K logical flows, which is a substantial addition. ovn-controller will see a harder hit, since it will unfold dp-groups and will get all 15M extra flows to process. > > So, I'd say that whoever will work on fixing a bug should try to avoid > having protocol matches as much as possible. This will save a lot of memory > and processing time in all OVN components. And it will also allow the optimization > implemented in this patch. > > On a side note, all these force_snat and skip_snat options look like dirty > workarounds for some other problems. And they are too low level for > OVN, therefore, IMHO, should not exist or be exposed to the end user (CMS). > > Best regards, Ilya Maximets. >
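To make the dp-groups point concrete: with datapath groups, a single Southbound Logical_Flow row can be shared by many datapaths through its logical_dp_group column, while each ovn-controller expands the group back into per-datapath flows locally. Schematically (illustrative rows, not real database output):

Logical_Flow: match=..., actions=..., logical_dp_group=<G>
Logical_DP_Group <G>: datapaths=[dp1, dp2, ..., dp120]

So the database stores one row, but every ovn-controller still processes it once per member datapath, which is why the unfolded extra flows hit ovn-controller harder than the database.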
On 27/08/2021 19:03, Ilya Maximets wrote: > On 8/27/21 7:05 PM, Mark Gray wrote: >> On 27/08/2021 16:24, Ilya Maximets wrote: >>> It's common for CMS (e.g., ovn-kubernetes) to create 3 identical load >>> balancers, one per protocol (tcp, udp, sctp). However, if VIPs of >>> these load balancers has no ports specified (!vip_port), northd will >>> generate identical logical flows for them. 2 of 3 such flows will be >>> just discarded, so it's better to not build them form the beginning. >> >> s/form/from >> >>> >>> For example, in an ovn-heater's 120 node density-heavy scenario we >>> have 3 load balancers with 15K VIPs in each. One for tcp, one for >>> udp and one for sctp. In this case, ovn-northd generates 26M of >>> logical flows in total. ~7.5M of them are flows for a single load >>> balancer. 2 * 7.5M = 15M are identical to the first 7.5M and just >>> discarded. >>> >>> Let's find all these identical load balancers and skip while building >>> logical flows. With this change, 15M of redundant logical flows are >>> not generated saving ~1.5 seconds of the CPU time per run. >>> >>> Comparison function and the loop looks heavy, but in testing it takes >>> only a few milliseconds on these large load balancers. >>> >>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org> >> >> I see you haven't changed any tests? I am guessing because this doesn't >> change the logical flows .. which is surprising? > > This is not surprising. :) > This patch only allows to not generate duplicate flows that will never > end up in a Southbound anyway. So, total number and look of logical > flows is exactly the same. Hence, it's hard to test. It's a pure > performance fix. For me, I was surprised that we didn't have a test for the kind of issue that we had below. I was expecting something like that to fail. > >> >> Also, as an FYI, I was unable to create load balancers like this using >> nbctl directly. It fails on creation of the load balancer. However, you >> can modify the NBDB after creation. So I presume this is an allowed >> configuration. However, it is not very well specified. > > Yeah. nbctl seems inconsistent. However, ovn-k8s and other CMSs are > not using it right now and communicate directly with a dtabase, so it's > not an issue for them. > >> >> `$ ovn-nbctl lb-add lb0 11.0.0.200 192.168.0.2 tcp >> ovn-nbctl: Protocol is unnecessary when no port of vip is given.` >> >>> --- >>> >>> The better option might be to allow multiple protocols being configured >>> per load balancer, but this will require big API/schema/etc changes >>> and may reuire re-desing of certain northd/ovn-controller logic. >>> >>> lib/lb.c | 56 +++++++++++++++++++++++++++++++++++++++++++++ >>> lib/lb.h | 4 ++++ >>> northd/ovn-northd.c | 35 ++++++++++++++++++++++++++++ >>> 3 files changed, 95 insertions(+) >>> >>> diff --git a/lib/lb.c b/lib/lb.c >>> index 7b0ed1abe..fb12c3457 100644 >>> --- a/lib/lb.c >>> +++ b/lib/lb.c >>> @@ -168,6 +168,7 @@ ovn_northd_lb_create(const struct nbrec_load_balancer *nbrec_lb) >>> bool is_sctp = nullable_string_is_equal(nbrec_lb->protocol, "sctp"); >>> struct ovn_northd_lb *lb = xzalloc(sizeof *lb); >>> >>> + lb->skip_lflow_build = false; >>> lb->nlb = nbrec_lb; >>> lb->proto = is_udp ? "udp" : is_sctp ? "sctp" : "tcp"; >>> lb->n_vips = smap_count(&nbrec_lb->vips); >>> @@ -238,6 +239,61 @@ ovn_northd_lb_find(struct hmap *lbs, const struct uuid *uuid) >>> return NULL; >>> } >>> >>> +/* Compares two load balancers for equality ignoring the 'protocol'. 
*/ >>> +bool >>> +ovn_northd_lb_equal_except_for_proto(const struct ovn_northd_lb *lb1, >>> + const struct ovn_northd_lb *lb2) >>> +{ >> >> I'd have a bit of a concern about maintaining this. For example, if we >> add more fields, we would have to remember to update this but I can't >> think of a better way of doing it. > > I can add a comment to the 'struct ovn_northd_lb' for developers that > any change should be reflected in this function. > >> >>> + /* It's much more convenient to compare Northbound DB configuration. */ >>> + const struct nbrec_load_balancer *lb_a = lb1->nlb; >>> + const struct nbrec_load_balancer *lb_b = lb2->nlb; >>> + >>> + if (!smap_equal(&lb_a->external_ids, &lb_b->external_ids)) { >>> + return false; >>> + } >>> + if (lb_a->n_health_check != lb_b->n_health_check) { >>> + return false; >>> + } >>> + if (lb_a->n_health_check >>> + && !memcmp(lb_a->health_check, lb_b->health_check, >>> + lb_a->n_health_check * sizeof *lb_a->health_check)) { >>> + return false; >>> + } >>> + if (!smap_equal(&lb_a->ip_port_mappings, &lb_b->ip_port_mappings)) { >>> + return false; >>> + } >>> + if (!smap_equal(&lb_a->options, &lb_b->options)) { >>> + return false; >>> + } >>> + if (lb_a->n_selection_fields != lb_b->n_selection_fields) { >>> + return false; >>> + } >>> + if (lb_a->n_selection_fields && >>> + memcmp(lb_a->selection_fields, lb_b->selection_fields, >>> + lb_a->n_selection_fields * sizeof *lb_a->selection_fields)) { >>> + return false; >>> + } >>> + if (!smap_equal(&lb_a->vips, &lb_b->vips)) { >>> + return false; >>> + } >>> + >>> + /* Below fields are not easily accessible from the Nb DB entry, so >>> + * comparing parsed versions. */ >>> + if (lb1->n_nb_ls != lb2->n_nb_ls || lb1->n_nb_lr != lb2->n_nb_lr) { >>> + return false; >>> + } >>> + if (lb1->n_nb_ls >>> + && memcmp(lb1->nb_ls, lb1->nb_ls, lb1->n_nb_ls * sizeof *lb1->nb_ls)) { >>> + return false; >>> + } >>> + if (lb1->n_nb_lr >>> + && memcmp(lb1->nb_lr, lb1->nb_lr, lb1->n_nb_lr * sizeof *lb1->nb_lr)) { >>> + return false; >>> + } >>> + >>> + return true; >>> +} >>> + >>> void >>> ovn_northd_lb_add_lr(struct ovn_northd_lb *lb, struct ovn_datapath *od) >>> { >>> diff --git a/lib/lb.h b/lib/lb.h >>> index 832ed31fb..8e92338a3 100644 >>> --- a/lib/lb.h >>> +++ b/lib/lb.h >>> @@ -50,6 +50,8 @@ struct ovn_northd_lb { >>> size_t n_nb_lr; >>> size_t n_allocated_nb_lr; >>> struct ovn_datapath **nb_lr; >>> + >>> + bool skip_lflow_build; >>> }; >>> >>> struct ovn_lb_vip { >>> @@ -87,6 +89,8 @@ struct ovn_northd_lb_backend { >>> >>> struct ovn_northd_lb *ovn_northd_lb_create(const struct nbrec_load_balancer *); >>> struct ovn_northd_lb * ovn_northd_lb_find(struct hmap *, const struct uuid *); >>> +bool ovn_northd_lb_equal_except_for_proto(const struct ovn_northd_lb *, >>> + const struct ovn_northd_lb *); >>> void ovn_northd_lb_destroy(struct ovn_northd_lb *); >>> void >>> ovn_northd_lb_add_lr(struct ovn_northd_lb *lb, struct ovn_datapath *od); >>> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c >>> index af413aba4..d1efc28f4 100644 >>> --- a/northd/ovn-northd.c >>> +++ b/northd/ovn-northd.c >>> @@ -3732,6 +3732,35 @@ build_ovn_lbs(struct northd_context *ctx, struct hmap *datapaths, >>> sbrec_datapath_binding_set_load_balancers(od->sb, NULL, 0); >>> } >>> } >>> + >>> + /* Northd doesn't use protocol in case vips has no ports specified. >>> + * And it's also common that user configures several identical load >>> + * balancers, one per protocol. We need to find them and use only one >>> + * for logical flow construction. 
Logical flows will be identical, >>> + * so it's faster to just not build them. */ >>> + struct ovn_northd_lb *lb2; >>> + HMAP_FOR_EACH (lb, hmap_node, lbs) { >>> + if (lb->skip_lflow_build) { >>> + continue; >>> + } >>> + for (size_t i = 0; i < lb->n_vips; i++) { >>> + if (lb->vips[i].vip_port) { >>> + goto next; >>> + } >>> + } >>> + HMAP_FOR_EACH (lb2, hmap_node, lbs) { >>> + if (lb2 == lb || lb2->skip_lflow_build) { >>> + continue; >>> + } >>> + if (ovn_northd_lb_equal_except_for_proto(lb, lb2)) { >>> + /* Load balancer still referenced from logical switch or >>> + * router, so we can't destroy it here. */ >>> + lb2->skip_lflow_build = true; >>> + } >>> + } >> >> I am not sure how this would scale in the case in which there were lots >> of load-balancers. OVN-K is changing to a model in which there are many >> (15k?) loadbalancers. This would increase this inner loop to 15k x 15k >> (225M) iterations. > > I'll try to test that. We can try to hash the load balancer and make > this part semi-linear. The code will be a bit more complex. > >> >>> +next:; >>> + } >>> + >>> } >>> >>> static void >>> @@ -12772,6 +12801,9 @@ build_lflows_thread(void *arg) >>> if (stop_parallel_processing()) { >>> return NULL; >>> } >>> + if (lb->skip_lflow_build) { >>> + continue; >>> + } >>> build_lswitch_arp_nd_service_monitor(lb, lsi->lflows, >>> &lsi->match, >>> &lsi->actions); >>> @@ -12943,6 +12975,9 @@ build_lswitch_and_lrouter_flows(struct hmap *datapaths, struct hmap *ports, >>> build_lswitch_and_lrouter_iterate_by_op(op, &lsi); >>> } >>> HMAP_FOR_EACH (lb, hmap_node, lbs) { >>> + if (lb->skip_lflow_build) { >>> + continue; >>> + } >>> build_lswitch_arp_nd_service_monitor(lb, lsi.lflows, >>> &lsi.actions, >>> &lsi.match); >>> >> >> I have found a configuration that causes undefined behaviour. However, >> it is the same as the without your patch but it is relevant. If you >> define two load balancers with the protocol specified (but without the >> port) and attach different attributes to each. It can cause conflicting >> logical flows. 
Consider the following reproducer (can be run in sandbox >> environment): >> >> ` >> # Create the first logical switch with one port >> ovn-nbctl ls-add sw0 >> ovn-nbctl lsp-add sw0 sw0-port1 >> ovn-nbctl lsp-set-addresses sw0-port1 "50:54:00:00:00:01 192.168.0.2" >> >> # Create the second logical switch with one port >> ovn-nbctl ls-add sw1 >> ovn-nbctl lsp-add sw1 sw1-port1 >> ovn-nbctl lsp-set-addresses sw1-port1 "50:54:00:00:00:03 11.0.0.2" >> >> # Create a logical router and attach both logical switches >> ovn-nbctl lr-add lr0 >> ovn-nbctl lrp-add lr0 lrp0 00:00:00:00:ff:01 192.168.0.1/24 >> ovn-nbctl lsp-add sw0 lrp0-attachment >> ovn-nbctl lsp-set-type lrp0-attachment router >> ovn-nbctl lsp-set-addresses lrp0-attachment 00:00:00:00:ff:01 >> ovn-nbctl lsp-set-options lrp0-attachment router-port=lrp0 >> ovn-nbctl lrp-add lr0 lrp1 00:00:00:00:ff:02 11.0.0.1/24 >> ovn-nbctl lsp-add sw1 lrp1-attachment >> ovn-nbctl lsp-set-type lrp1-attachment router >> ovn-nbctl lsp-set-addresses lrp1-attachment 00:00:00:00:ff:02 >> ovn-nbctl lsp-set-options lrp1-attachment router-port=lrp1 >> >> ovs-vsctl add-port br-int p1 -- \ >> set Interface p1 external_ids:iface-id=sw0-port1 >> ovs-vsctl add-port br-int p2 -- \ >> set Interface p2 external_ids:iface-id=sw1-port1 >> >> ovn-nbctl set Logical_Router lr0 options:chassis=hv1 >> ovn-nbctl lb-add lb0 11.0.0.200 192.168.0.2 >> ovn-nbctl lb-add lb1 11.0.0.200 192.168.0.2 >> ovn-nbctl set Load_Balancer lb0 protocol=tcp >> ovn-nbctl set Load_Balancer lb0 options=skip_snat=true >> ovn-nbctl set Load_Balancer lb1 protocol=udp >> ovn-nbctl lr-lb-add lr0 lb0 >> ovn-nbctl lr-lb-add lr0 lb1 >> ` >> >> Your code gives the following flows: >> $ ovn-sbctl dump-flows | grep lr_in_dnat >> table=6 (lr_in_dnat ), priority=110 , match=(ct.est && ip4 && >> reg0 == 11.0.0.200 && ct_label.natted == 1), >> action=(flags.skip_snat_for_lb = 1; next;) >> table=6 (lr_in_dnat ), priority=110 , match=(ct.est && ip4 && >> reg0 == 11.0.0.200 && ct_label.natted == 1), action=(next;) >> table=6 (lr_in_dnat ), priority=110 , match=(ct.new && ip4 && >> reg0 == 11.0.0.200), action=(ct_lb(backends=192.168.0.2);) >> table=6 (lr_in_dnat ), priority=110 , match=(ct.new && ip4 && >> reg0 == 11.0.0.200), action=(flags.skip_snat_for_lb = 1; >> ct_lb(backends=192.168.0.2);) >> table=6 (lr_in_dnat ), priority=0 , match=(1), action=(next;) >> >> These should produce different flows for each load balancer which means >> we need to specify protocol as a match field. I think this prevents your >> optimization? > > Since 'options' are different, optimization will not be applied. > ovn_northd_lb_equal_except_for_proto() compares them. > > But if options are the same, than you don't need a match on a protocol. > >> >> For reference, I raised a bugzilla bug for this and it is similar to one >> I fixed recently. >> >> https://bugzilla.redhat.com/show_bug.cgi?id=1998592 >> > > The problem is that ability to implement the optimization depends on > how this particular bug will be fixed in OVN. If it will be fixed by > blindly adding protocol matches everywhere, than, obviously, optimization > will not be possible as logical flows will never be identical anymore. > > On the other hand, if it will be fixed this way, number of logical flows > in a database will explode from 9.9M to 26M flows, i.e. 2.5x and it doesn't > sound like a good thing to have at that scale. > > So, I'd say that whoever will work on fixing a bug should try to avoid > having protocol matches as much as possible. 
This will save a lot of memory > and processing time in all OVN components. And it will also allow the optimization > implemented in this patch. > > On a side note, all these force_snat and skip_snat options look like dirty > workarounds for some other problems. And they are too low level for > OVN, therefore, IMHO, should not exist or be exposed to the end user (CMS). Yeah, I agree, but they are part of the interface now and seem to be used, which, unfortunately, is a bit of a problem. I am not really sure of their origin, TBH. > > Best regards, Ilya Maximets. >
Hi Ilya, On 8/27/21 5:24 PM, Ilya Maximets wrote: > It's common for CMS (e.g., ovn-kubernetes) to create 3 identical load > balancers, one per protocol (tcp, udp, sctp). However, if VIPs of > these load balancers has no ports specified (!vip_port), northd will > generate identical logical flows for them. 2 of 3 such flows will be > just discarded, so it's better to not build them form the beginning. I don't think this is accurate; AFAIK ovn-kubernetes will not configure load balancer VIPs without specifying the port. > > For example, in an ovn-heater's 120 node density-heavy scenario we > have 3 load balancers with 15K VIPs in each. One for tcp, one for > udp and one for sctp. In this case, ovn-northd generates 26M of > logical flows in total. ~7.5M of them are flows for a single load > balancer. 2 * 7.5M = 15M are identical to the first 7.5M and just > discarded. The way ovn-heater was configuring VIPs was wrong. I opened a PR to fix that: https://github.com/dceara/ovn-heater/pull/75 > > Let's find all these identical load balancers and skip while building > logical flows. With this change, 15M of redundant logical flows are > not generated saving ~1.5 seconds of the CPU time per run. In conclusion I'm not so sure the impact will be as noticeable in a real ovn-kubernetes deployment. > > Comparison function and the loop looks heavy, but in testing it takes > only a few milliseconds on these large load balancers. > > Signed-off-by: Ilya Maximets <i.maximets@ovn.org> > --- Regards, Dumitru
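For comparison, a port-carrying VIP of the kind ovn-kubernetes actually creates would be configured like this (an illustrative command with made-up addresses; with an L4 port present, the protocol is meaningful and the per-protocol flows can never be identical): `$ ovn-nbctl lb-add lb0 11.0.0.200:8080 192.168.0.2:80,192.168.0.3:80 tcp`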
On Wed, Sep 1, 2021 at 5:42 AM Dumitru Ceara <dceara@redhat.com> wrote: > > Hi Ilya, > > On 8/27/21 5:24 PM, Ilya Maximets wrote: > > It's common for CMS (e.g., ovn-kubernetes) to create 3 identical load > > balancers, one per protocol (tcp, udp, sctp). However, if VIPs of > > these load balancers has no ports specified (!vip_port), northd will > > generate identical logical flows for them. 2 of 3 such flows will be > > just discarded, so it's better to not build them form the beginning. > > I don't think this is accurate; AFAIK ovn-kubernetes will not configure > load balancer VIPs without specifying the port. > > > > > For example, in an ovn-heater's 120 node density-heavy scenario we > > have 3 load balancers with 15K VIPs in each. One for tcp, one for > > udp and one for sctp. In this case, ovn-northd generates 26M of > > logical flows in total. ~7.5M of them are flows for a single load > > balancer. 2 * 7.5M = 15M are identical to the first 7.5M and just > > discarded. > > The way ovn-heater was configuring VIPs was wrong. I opened a PR to fix > that: > > https://github.com/dceara/ovn-heater/pull/75 > > > > > Let's find all these identical load balancers and skip while building > > logical flows. With this change, 15M of redundant logical flows are > > not generated saving ~1.5 seconds of the CPU time per run. > > In conclusion I'm not so sure the impact will be as noticeable in a real > ovn-kubernetes deployment. > I agree with Dumitru that maybe it's not worth optimizing OVN for unrealistic use cases. If someone configures LBs that way (with L4 protocol specified but no L4 ports), shall we consider it a misconfiguration and let the user be responsible for the unnecessary overhead? We could add documentation advising users not to do such things. On the other hand, if users do have ports added (i.e. correcting the misconfiguration), this change would not help anyway. > > > > Comparison function and the loop looks heavy, but in testing it takes > > only a few milliseconds on these large load balancers. The O(n^2) loop does look heavy. Would it be slow when there are a large number of LBs (each may have a small number of VIPs)? Thanks, Han > > > > Signed-off-by: Ilya Maximets <i.maximets@ovn.org> > > --- > > Regards, > Dumitru
On 9/16/21 07:28, Han Zhou wrote: > > > On Wed, Sep 1, 2021 at 5:42 AM Dumitru Ceara <dceara@redhat.com> wrote: >> >> Hi Ilya, >> >> On 8/27/21 5:24 PM, Ilya Maximets wrote: >> > It's common for CMS (e.g., ovn-kubernetes) to create 3 identical load >> > balancers, one per protocol (tcp, udp, sctp). However, if VIPs of >> > these load balancers has no ports specified (!vip_port), northd will >> > generate identical logical flows for them. 2 of 3 such flows will be >> > just discarded, so it's better to not build them form the beginning. >> >> I don't think this is accurate; AFAIK ovn-kubernetes will not configure >> load balancer VIPs without specifying the port. >> >> > >> > For example, in an ovn-heater's 120 node density-heavy scenario we >> > have 3 load balancers with 15K VIPs in each. One for tcp, one for >> > udp and one for sctp. In this case, ovn-northd generates 26M of >> > logical flows in total. ~7.5M of them are flows for a single load >> > balancer. 2 * 7.5M = 15M are identical to the first 7.5M and just >> > discarded. >> >> The way ovn-heater was configuring VIPs was wrong. I opened a PR to fix >> that: >> >> https://github.com/dceara/ovn-heater/pull/75 >> >> > >> > Let's find all these identical load balancers and skip while building >> > logical flows. With this change, 15M of redundant logical flows are >> > not generated saving ~1.5 seconds of the CPU time per run. >> >> In conclusion I'm not so sure the impact will be as noticeable in a real >> ovn-kubernetes deployment. >> > > I agree with Dumitru that maybe it's not worth optimizing OVN for unrealistic use cases. If someone configures LBs that way (with L4 protocol specified but no L4 ports), shall we consider it a misconfiguration and let the user be responsible for the unnecessary overhead? We could add documentation advising users not to do such things. > On the other hand, if users do have ports added (i.e. correcting the misconfiguration), this change would not help anyway. Yes, I agree. It was an attempt to optimize what I saw in the ovn-heater test. But, yes, such a configuration seems to be far from reality. ovn-heater also configures ports for LBs now. So, let's drop this patch. > >> > >> > Comparison function and the loop looks heavy, but in testing it takes >> > only a few milliseconds on these large load balancers. > > The O(n^2) loop does look heavy. Would it be slow when there are a large number of LBs (each may have a small number of VIPs)? I had an idea of how to reduce the computational complexity of this loop, but it seems pointless to actually implement, since this patch doesn't really cover real-world scenarios and will not be accepted. Best regards, Ilya Maximets. > > Thanks, > Han > >> > >> > Signed-off-by: Ilya Maximets <i.maximets@ovn.org> >> > --- >> >> Regards, >> Dumitru
diff --git a/lib/lb.c b/lib/lb.c index 7b0ed1abe..fb12c3457 100644 --- a/lib/lb.c +++ b/lib/lb.c @@ -168,6 +168,7 @@ ovn_northd_lb_create(const struct nbrec_load_balancer *nbrec_lb) bool is_sctp = nullable_string_is_equal(nbrec_lb->protocol, "sctp"); struct ovn_northd_lb *lb = xzalloc(sizeof *lb); + lb->skip_lflow_build = false; lb->nlb = nbrec_lb; lb->proto = is_udp ? "udp" : is_sctp ? "sctp" : "tcp"; lb->n_vips = smap_count(&nbrec_lb->vips); @@ -238,6 +239,61 @@ ovn_northd_lb_find(struct hmap *lbs, const struct uuid *uuid) return NULL; } +/* Compares two load balancers for equality ignoring the 'protocol'. */ +bool +ovn_northd_lb_equal_except_for_proto(const struct ovn_northd_lb *lb1, + const struct ovn_northd_lb *lb2) +{ + /* It's much more convenient to compare Northbound DB configuration. */ + const struct nbrec_load_balancer *lb_a = lb1->nlb; + const struct nbrec_load_balancer *lb_b = lb2->nlb; + + if (!smap_equal(&lb_a->external_ids, &lb_b->external_ids)) { + return false; + } + if (lb_a->n_health_check != lb_b->n_health_check) { + return false; + } + if (lb_a->n_health_check + && !memcmp(lb_a->health_check, lb_b->health_check, + lb_a->n_health_check * sizeof *lb_a->health_check)) { + return false; + } + if (!smap_equal(&lb_a->ip_port_mappings, &lb_b->ip_port_mappings)) { + return false; + } + if (!smap_equal(&lb_a->options, &lb_b->options)) { + return false; + } + if (lb_a->n_selection_fields != lb_b->n_selection_fields) { + return false; + } + if (lb_a->n_selection_fields && + memcmp(lb_a->selection_fields, lb_b->selection_fields, + lb_a->n_selection_fields * sizeof *lb_a->selection_fields)) { + return false; + } + if (!smap_equal(&lb_a->vips, &lb_b->vips)) { + return false; + } + + /* Below fields are not easily accessible from the Nb DB entry, so + * comparing parsed versions. */ + if (lb1->n_nb_ls != lb2->n_nb_ls || lb1->n_nb_lr != lb2->n_nb_lr) { + return false; + } + if (lb1->n_nb_ls + && memcmp(lb1->nb_ls, lb1->nb_ls, lb1->n_nb_ls * sizeof *lb1->nb_ls)) { + return false; + } + if (lb1->n_nb_lr + && memcmp(lb1->nb_lr, lb1->nb_lr, lb1->n_nb_lr * sizeof *lb1->nb_lr)) { + return false; + } + + return true; +} + void ovn_northd_lb_add_lr(struct ovn_northd_lb *lb, struct ovn_datapath *od) { diff --git a/lib/lb.h b/lib/lb.h index 832ed31fb..8e92338a3 100644 --- a/lib/lb.h +++ b/lib/lb.h @@ -50,6 +50,8 @@ struct ovn_northd_lb { size_t n_nb_lr; size_t n_allocated_nb_lr; struct ovn_datapath **nb_lr; + + bool skip_lflow_build; }; struct ovn_lb_vip { @@ -87,6 +89,8 @@ struct ovn_northd_lb_backend { struct ovn_northd_lb *ovn_northd_lb_create(const struct nbrec_load_balancer *); struct ovn_northd_lb * ovn_northd_lb_find(struct hmap *, const struct uuid *); +bool ovn_northd_lb_equal_except_for_proto(const struct ovn_northd_lb *, + const struct ovn_northd_lb *); void ovn_northd_lb_destroy(struct ovn_northd_lb *); void ovn_northd_lb_add_lr(struct ovn_northd_lb *lb, struct ovn_datapath *od); diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c index af413aba4..d1efc28f4 100644 --- a/northd/ovn-northd.c +++ b/northd/ovn-northd.c @@ -3732,6 +3732,35 @@ build_ovn_lbs(struct northd_context *ctx, struct hmap *datapaths, sbrec_datapath_binding_set_load_balancers(od->sb, NULL, 0); } } + + /* Northd doesn't use protocol in case vips has no ports specified. + * And it's also common that user configures several identical load + * balancers, one per protocol. We need to find them and use only one + * for logical flow construction. 
Logical flows will be identical, + * so it's faster to just not build them. */ + struct ovn_northd_lb *lb2; + HMAP_FOR_EACH (lb, hmap_node, lbs) { + if (lb->skip_lflow_build) { + continue; + } + for (size_t i = 0; i < lb->n_vips; i++) { + if (lb->vips[i].vip_port) { + goto next; + } + } + HMAP_FOR_EACH (lb2, hmap_node, lbs) { + if (lb2 == lb || lb2->skip_lflow_build) { + continue; + } + if (ovn_northd_lb_equal_except_for_proto(lb, lb2)) { + /* Load balancer still referenced from logical switch or + * router, so we can't destroy it here. */ + lb2->skip_lflow_build = true; + } + } +next:; + } + } static void @@ -12772,6 +12801,9 @@ build_lflows_thread(void *arg) if (stop_parallel_processing()) { return NULL; } + if (lb->skip_lflow_build) { + continue; + } build_lswitch_arp_nd_service_monitor(lb, lsi->lflows, &lsi->match, &lsi->actions); @@ -12943,6 +12975,9 @@ build_lswitch_and_lrouter_flows(struct hmap *datapaths, struct hmap *ports, build_lswitch_and_lrouter_iterate_by_op(op, &lsi); } HMAP_FOR_EACH (lb, hmap_node, lbs) { + if (lb->skip_lflow_build) { + continue; + } build_lswitch_arp_nd_service_monitor(lb, lsi.lflows, &lsi.actions, &lsi.match);
It's common for CMS (e.g., ovn-kubernetes) to create 3 identical load balancers, one per protocol (tcp, udp, sctp). However, if VIPs of these load balancers have no ports specified (!vip_port), northd will generate identical logical flows for them. 2 of 3 such flows will be just discarded, so it's better to not build them from the beginning. For example, in an ovn-heater's 120 node density-heavy scenario we have 3 load balancers with 15K VIPs in each. One for tcp, one for udp and one for sctp. In this case, ovn-northd generates 26M logical flows in total. ~7.5M of them are flows for a single load balancer. 2 * 7.5M = 15M are identical to the first 7.5M and just discarded. Let's find all these identical load balancers and skip them while building logical flows. With this change, 15M redundant logical flows are not generated, saving ~1.5 seconds of CPU time per run. The comparison function and the loop look heavy, but in testing they take only a few milliseconds on these large load balancers. Signed-off-by: Ilya Maximets <i.maximets@ovn.org> --- A better option might be to allow multiple protocols to be configured per load balancer, but this would require big API/schema/etc changes and may require re-design of certain northd/ovn-controller logic. lib/lb.c | 56 +++++++++++++++++++++++++++++++++++++++++++++ lib/lb.h | 4 ++++ northd/ovn-northd.c | 35 ++++++++++++++++++++++++++++ 3 files changed, 95 insertions(+)