Message ID: 20231026181224.3366779-1-numans@ovn.org
Series: northd lflow incremental processing
On Thu, Oct 26, 2023 at 11:12 AM <numans@ovn.org> wrote:
>
> From: Numan Siddique <numans@ovn.org>
>
> This patch series adds incremental processing in the lflow engine
> node to handle changes to northd and other engine nodes.  Changes
> related to load balancers and NAT are mainly handled in this patch
> series.
>
> This patch series can also be found here -
> https://github.com/numansiddique/ovn/tree/northd_lflow_ip/v1

Thanks Numan for the great improvement!

I spent some time these days reviewing the series and am now at patch
10.  I need to take a break, so I am sending the comments I have so
far.

I also did scale testing for northd with
https://github.com/hzhou8/ovn-test-script for a 500 chassis, 50 LSP /
chassis setup, and noticed that the recompute latency has increased
20% after the series.  In general some performance penalty is expected
for the recompute case because of the new indexes created for I-P.
However, patch 10 "northd: Refactor lflow management into a separate
module." alone introduces a 10% latency increase, which warrants more
investigation.

> Prior to this patch series, most of the changes to northd engine
> resulted in full recomputation of logical flows.  This series
> aims to improve the performance of ovn-northd by adding the I-P
> support.

I'd like to clarify "most of the changes" a little.  I think we should
focus on the most impactful changes that happen in a cloud
environment.  I don't think it is realistic to handle "most of the
changes" incrementally because it is too complex (even with this
series we still handle only a small part of the DB schema
incrementally), but it is realistic to handle the most impactful
changes, i.e. the most frequent changes in a cloud, incrementally.
Before this series, we could handle regular VIF changes and some of
the related address-set and port-group updates incrementally.  Those
are the most common changes related to pod/VM changes in a cloud.
I believe the next goal is LB changes related to pods/VMs (i.e. the LB
backends), which I believe is the main goal of this series.  Is my
understanding correct?

While reviewing the patches, I sometimes felt a little lost because it
is hard to correlate some of the changes with the above goal.  I
believe there is a reason for all the changes, but I am not sure they
are all directly related to the goal of LB-backend I-P.  I understand
that there are other aspects of LB that can be incrementally
processed, but if the changes necessary for this goal could be largely
reduced, it would help the review, and we might be able to merge them
sooner and add more, less impactful, I-P later, step by step.  I guess
I will have a clearer picture when I finish the rest of the patches,
but it would be helpful if you could add more guidance in the cover
letter.

Thanks,
Han

> In order to add this support, some of the northd engine
> node data (from struct ovn_datapath) is split and moved over to
> new engine nodes - mainly related to load balancers, NAT and ACLs.
>
> Below are the scale testing results done with these patches applied
> using ovn-heater.  The test ran the scenario -
> ocp-500-density-heavy.yml [1].
>
> With all the lflow I-P patches applied, the results are:
>
> ----------------------------------------------------------------------------------------------------------------
>                       Min (s)   Median (s)  90%ile (s)  99%ile (s)  Max (s)   Mean (s)  Total (s)   Count  Failed
> ----------------------------------------------------------------------------------------------------------------
> Iteration Total       0.136883  1.129016    1.192001    1.204167    1.212728  0.665017  83.127099   125    0
> Namespace.add_ports   0.005216  0.005736    0.007034    0.015486    0.018978  0.006211  0.776373    125    0
> WorkerNode.bind_port  0.035030  0.046082    0.052469    0.058293    0.060311  0.045973  11.493259   250    0
> WorkerNode.ping_port  0.005057  0.006727    1.047692    1.069253    1.071336  0.266896  66.724094   250    0
> ----------------------------------------------------------------------------------------------------------------
>
> The results with the present main are:
>
> ----------------------------------------------------------------------------------------------------------------
>                       Min (s)   Median (s)  90%ile (s)  99%ile (s)  Max (s)   Mean (s)  Total (s)   Count  Failed
> ----------------------------------------------------------------------------------------------------------------
> Iteration Total       0.135491  2.223805    3.311270    3.339078    3.345346  1.729172  216.146495  125    0
> Namespace.add_ports   0.005380  0.005744    0.006819    0.018773    0.020800  0.006292  0.786532    125    0
> WorkerNode.bind_port  0.034179  0.046055    0.053488    0.058801    0.071043  0.046117  11.529311   250    0
> WorkerNode.ping_port  0.004956  0.006952    3.086952    3.191743    3.192807  0.791544  197.886026  250    0
> ----------------------------------------------------------------------------------------------------------------
>
> [1] -
> https://github.com/ovn-org/ovn-heater/blob/main/test-scenarios/ocp-500-density-heavy.yml
>
> v1 -> v2
> --------
>   * Now also maintaining array indexes for the ls_lbacls, lr_nat and
>     lr_lb_nat_data tables (similar to ovn_datapaths->array) to make
>     lookups efficient.  The same ovn_datapath->index is reused.
>
>   * Made some significant changes to 'struct lflow_ref' in
>     lflow-mgr.c.  In v2 we don't use objdep_mgr to maintain the
>     resource-to-lflow references.  Instead we maintain the
>     'struct lflow' pointer.  With this we don't need to maintain an
>     additional hmap of lflows.
>
> Numan Siddique (18):
>   northd: Refactor the northd change tracking.
>   northd: Track ovn_datapaths in northd engine track data.
>   tests: Add a couple of tests in ovn-northd for I-P.
>   northd: Move router ports SB PB options sync to sync_to_sb_pb node.
>   northd: Add a new engine 'lr-nat' to manage lr NAT data.
>   northd: Add a new engine 'lr-lb-nat-data' to manage routers' lb data.
>   northd: Generate logical router's LB and NAT flows using lr_lbnat_data.
>   northd: Don't commit dhcp response flows in the conntrack.
>   northd: Add a new node ls_lbacls.
>   northd: Refactor lflow management into a separate module.
>   northd: Use lflow_ref when adding all logical flows.
>   northd: Move ovn_lb_datapaths from lib to northd module.
>   northd: Handle lb changes in lflow engine.
>   northd: Add lr_lb_nat_data handler for lflow engine node.
>   northd: Add ls_lbacls handler for lflow engine node.
>   northd: Add I-P for NB_Global and SB_Global.
>   northd: Add a noop handler for northd SB mac binding.
>   northd: Add northd change handler for sync_to_sb_lb node.
>
>  lib/lb.c                   |   96 -
>  lib/lb.h                   |   57 -
>  lib/ovn-util.c             |   17 +-
>  lib/ovn-util.h             |    2 +-
>  lib/stopwatch-names.h      |    3 +
>  northd/aging.c             |   21 +-
>  northd/automake.mk         |   12 +-
>  northd/en-global-config.c  |  588 ++++
>  northd/en-global-config.h  |   65 +
>  northd/en-lflow.c          |  123 +-
>  northd/en-lflow.h          |    8 +
>  northd/en-lr-lb-nat-data.c |  685 ++++
>  northd/en-lr-lb-nat-data.h |  125 +
>  northd/en-lr-nat.c         |  498 +++
>  northd/en-lr-nat.h         |  137 +
>  northd/en-ls-lb-acls.c     |  530 +++
>  northd/en-ls-lb-acls.h     |  111 +
>  northd/en-northd.c         |   67 +-
>  northd/en-northd.h         |    2 +-
>  northd/en-port-group.h     |    3 +
>  northd/en-sync-sb.c        |  513 ++-
>  northd/inc-proc-northd.c   |   79 +-
>  northd/lflow-mgr.c         | 1078 ++++++
>  northd/lflow-mgr.h         |  192 ++
>  northd/northd.c            | 6485 ++++++++++++++++--------------------
>  northd/northd.h            |  551 ++-
>  northd/ovn-northd.c        |    4 +
>  tests/ovn-northd.at        |  858 ++++-
>  28 files changed, 8827 insertions(+), 4083 deletions(-)
>  create mode 100644 northd/en-global-config.c
>  create mode 100644 northd/en-global-config.h
>  create mode 100644 northd/en-lr-lb-nat-data.c
>  create mode 100644 northd/en-lr-lb-nat-data.h
>  create mode 100644 northd/en-lr-nat.c
>  create mode 100644 northd/en-lr-nat.h
>  create mode 100644 northd/en-ls-lb-acls.c
>  create mode 100644 northd/en-ls-lb-acls.h
>  create mode 100644 northd/lflow-mgr.c
>  create mode 100644 northd/lflow-mgr.h
>
> --
> 2.41.0
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
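The lflow reference scheme described in the v1 -> v2 notes above (keeping 'struct lflow' pointers on the resource instead of going through objdep_mgr and an extra hmap) can be sketched roughly as follows.  This is a minimal illustration only: the struct fields and function names are invented for the example and are not the actual OVN definitions.

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative sketch: a resource (e.g. a load balancer) keeps direct
 * pointers to the lflows it produced, linked through the flows
 * themselves, so an incremental update can delete exactly those flows
 * without maintaining a separate lookup hmap. */
struct lflow {
    int match_id;             /* stand-in for the real match/actions */
    struct lflow *ref_next;   /* linked via the owning lflow_ref */
};

struct lflow_ref {
    struct lflow *head;       /* lflows generated for this resource */
    size_t n;
};

static void
lflow_ref_add(struct lflow_ref *ref, struct lflow *lf)
{
    lf->ref_next = ref->head;
    ref->head = lf;
    ref->n++;
}

/* On a resource change, drop (and free) all flows it generated;
 * returns how many flows were cleared. */
static size_t
lflow_ref_clear(struct lflow_ref *ref)
{
    size_t cleared = 0;
    for (struct lflow *lf = ref->head; lf; ) {
        struct lflow *next = lf->ref_next;
        free(lf);
        lf = next;
        cleared++;
    }
    ref->head = NULL;
    ref->n = 0;
    return cleared;
}
```

The point of the v2 change, as described above, is that the reference bookkeeping lives in the flow objects themselves, avoiding the additional hmap of lflows that the objdep_mgr approach needed.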
On Wed, Nov 15, 2023 at 2:59 AM Han Zhou <hzhou@ovn.org> wrote:
>
> On Thu, Oct 26, 2023 at 11:12 AM <numans@ovn.org> wrote:
> >
> > From: Numan Siddique <numans@ovn.org>
> >
> > This patch series adds incremental processing in the lflow engine
> > node to handle changes to northd and other engine nodes.
> > Changes related to load balancers and NAT are mainly handled in
> > this patch series.
> >
> > This patch series can also be found here -
> > https://github.com/numansiddique/ovn/tree/northd_lflow_ip/v1
>
> Thanks Numan for the great improvement!

Hi Han,

Thanks for the review comments.  I understand it is hard to review so
many patches, especially related to I-P.  I appreciate the time spent
on it and the feedback.

> I spent some time these days to review the series and now at patch
> 10.  I need to take a break and so I just sent the comments I had so
> far.
>
> I also did scale testing for northd with
> https://github.com/hzhou8/ovn-test-script for a 500 chassis, 50 lsp /
> chassis setup, and noticed that the recompute latency has increased
> 20% after the series.  I think in general it is expected to have some
> performance penalty for the recompute case because of the new indexes
> created for I-P.  However, the patch 10 "northd: Refactor lflow
> management into a separate module." alone introduces a 10% latency
> increase, which necessitates more investigation.

Before sending out the series I did some testing on recomputes with a
large OVN NB DB and SB DB (from a 500 node ovn-heater density-heavy
run), so I'm aware of the increase in recompute time, and I did some
more testing today as well.

In my testing, the increase in latency is due to the new engine nodes
added (lr_lbnat mainly) and due to the housekeeping of the lflow
references.  I do not see any increase due to the new lflow-mgr.c
added in patch 10.  I compared patch 9 and patch 10 separately and
there is no difference in lflow engine node recompute time between
them.
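For readers unfamiliar with northd's incremental-processing engine discussed throughout this thread, the basic contract can be sketched as below.  This is a hypothetical miniature, not the real inc-proc-eng API: each engine node has a full recompute path, and may register per-input change handlers; a handler that cannot process a change incrementally reports failure, forcing a full recompute.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Counters for how often a node fell back to full recompute vs.
 * handled an input change incrementally. */
struct engine_stats {
    int recomputes;
    int incremental;
};

/* A change handler returns true when the change was handled
 * incrementally, false to request a full recompute. */
typedef bool (*change_handler)(int change, void *data);

static void
engine_run(struct engine_stats *stats, change_handler handler,
           int change, void *data)
{
    if (handler && handler(change, data)) {
        stats->incremental++;      /* handled in place */
    } else {
        stats->recomputes++;       /* fall back to full recompute */
    }
}

/* Example handler: only "backend update" changes (encoded here as 1)
 * are handled incrementally; anything else falls back. */
static bool
lb_backend_handler(int change, void *data)
{
    (void) data;
    return change == 1;
}
```

The scale-test argument made later in this mail follows from this shape: a modest recompute-time regression is acceptable if the change handlers fire for the overwhelming majority of real-world updates, so recomputes become rare.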
Below are the results, with the time taken by the relevant engine
nodes (in msec) for a recompute, for some of the individual patches in
the series.

The sample OVN DBs have:

  ------------------------------
  Resource              Total
  ------------------------------
  Logical switches       1001
  Logical routers         501
  Router ports           1501
  Switch ports          29501
  Load balancers        35253
  Load balancer groups      1
  SB logical flows     268349
  SB DB groups            509
  ------------------------------

There is one load balancer group which has all the load balancers and
it is associated with all the logical switches and routers.

Below is the time taken by each engine node in msec:

  ------------------------------------------------------------
  Engine nodes     | lb_data | northd | lr_lbnat | lflow |
  ------------------------------------------------------------
  ovn-northd-main  |   358   |  2455  |    x     |  2082 |
  ovn-northd-p1    |   373   |  2476  |    x     |  2170 |
  ovn-northd-p5    |   367   |  2413  |    x     |  2195 |
  ovn-northd-p6    |   354   |   688  |   1815   |  2442 |
  ovn-northd-p7    |   357   |   683  |   1778   |  2806 |
  ovn-northd-p9    |   352   |   682  |   1781   |  2781 |
  ovn-northd-p10   |   365   |   838  |   1707   |  2812 |
  ovn-northd-p13   |   362   |  1066  |   1882   |  2917 |
  ovn-northd-p15   |   359   |  1046  |   1688   |  2907 |
  ovn-northd-p18   |   379   |  1066  |   1729   |  2886 |
  ------------------------------------------------------------

ovn-northd-p1 means ovn-northd with patch 1 of this series applied,
ovn-northd-p2 means patch 2 of this series, and so on.

Patch 6 adds a new engine node, lr_lbnat, and that's why the northd
engine node time drops from ~2413 msec to 688 msec.

With the first 10 patches, the northd engine node's overall time
increases by about 200 msec compared to "main", and the lflow engine
node's time increases by around 800 msec.  The point of this data is
to show that the overall increase is mainly due to bookkeeping and the
new engine nodes.  I tried to optimise further, but I couldn't.

IMO this 20% overall increase should be fine for the following
reasons:
  - The scale tests with ovn-heater show a significant decrease in
    ovn-northd CPU usage and in the overall test duration, which means
    very few recomputes were triggered.
  - With the OCP/K8s kube-burner tests, there were 0 recomputes in the
    northd and lflow engine nodes.

> > Prior to this patch series, most of the changes to northd engine
> > resulted in full recomputation of logical flows.  This series
> > aims to improve the performance of ovn-northd by adding the I-P
> > support.
>
> I'd like to clarify "most of the changes" a little.  I think we
> should focus on the most impactful changes that happen in a cloud
> environment.  I don't think it is realistic to achieve "most of the
> changes" in I-P because it is too complex (even with this series we
> still handle a very small part of the DB schema incrementally), but
> it may be realistic to handle the most impactful changes, which are
> the most frequent changes in the cloud, incrementally.  Before this
> series, we could handle regular VIF changes and related address-sets,
> port-groups (some of) updates incrementally.  Those are the most
> common changes related to pod/VM changes in cloud.
> I believe the next goal is for LB changes related to pods/VMs (i.e.
> the LB backends), which I believe is the main goal of this series.
> Is my understanding correct?

That is correct.  LB change handling is very important in my opinion
when OVN is used with ovn-kubernetes, because all the k8s services are
mapped to OVN LBs.

> While reviewing the patches, I'd say sometimes I feel a little lost
> because it is hard to correlate some of the changes with the above
> goal.

I understand that.  Maybe I've done a bad job of conveying why the
changes are required for LBs.  Let me try to clarify a bit here.  I'll
incorporate and add more details in the patch commits and cover letter
in the next series.

This series mainly targets handling lflow generation for load
balancer changes.  As you know, we use conntrack in OVN for:
  - load balancers (in both logical switches and routers),
  - ACLs in logical switches,
  - NAT in logical routers.

And this makes it complicated to decouple logical flow generation for
LBs from ACLs and NATs.  That is the reason I had to split the northd
node data related to load balancers, ACLs and NATs into separate
engine nodes.

To give an example, we generate certain logical flows for a logical
switch by checking "if (od->has_lb_vip && od->has_stateful_acl)" (see
build_acl_hints()).  For logical routers, a NAT can be an LB VIP too,
which makes it difficult to decouple LB and NAT.  We could fall back
to recompute for NAT changes, but we would still need to group logical
flows related to LB and NAT.  That's why I added the lr_lbnat_data
engine node in patch 6.

> I believe there is a reason for all the changes but I am not sure if
> it is directly related to the goal of LB backend related I-P.  I
> understand that there are other aspects of LB that can be
> incrementally processed.  But I'd recommend if changes necessary for
> this goal can be largely reduced it would help the review and we
> might be able to merge them sooner and add more but less impactful
> I-P later step by step.
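The coupling described above can be illustrated with a toy example: whether a logical switch needs the ACL-hint flows depends on both its LB state and its ACL state, so the two cannot be recomputed in isolation.  The struct and function below are simplified stand-ins invented for this sketch; only the shape of the "od->has_lb_vip && od->has_stateful_acl" condition is taken from the mail.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the per-datapath state: one flag owned by
 * the LB-related data, one owned by the ACL-related data. */
struct ls_state {
    bool has_lb_vip;        /* any load balancer VIP on this switch? */
    bool has_stateful_acl;  /* any stateful (allow-related) ACL? */
};

/* Mirrors the shape of the check quoted above: a flow decision that
 * reads both LB and ACL state, which is why an LB-only change can
 * still invalidate ACL-hint flows, and vice versa. */
static bool
needs_acl_hint_flows(const struct ls_state *od)
{
    return od->has_lb_vip && od->has_stateful_acl;
}
```

This is why the series splits LB, ACL, and NAT data into separate engine nodes: a change handler for one input has to know exactly which cross-cutting flow decisions to re-evaluate.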
> I guess I will have a clearer picture when I finish the rest of the
> patches, but it would be helpful if you could add more guidance in
> this cover letter.

I'll definitely add more details in the cover letter.

Apart from the LB related I-P, this series only adds two more
additional I-P patches: one for NB_Global changes and the other for SB
mac binding changes.  ovn-kubernetes periodically writes a timestamp
into the NB_Global.options column for its internal health checks, and
this results in unnecessary recomputes.  After running the kube-burner
density-cni tests, I figured out the most common changes done in the
NB DB and added the I-P for those.  With all these patches applied,
there are no recomputes in either the northd or the lflow engine nodes
during the test run.  Not having the I-P for NB_Global, for example,
was resulting in significant ovn-northd CPU usage.

Thanks again for the reviews.

Numan

> Thanks,
> Han
>
> > In order to add this support, some of the northd engine
> > node data (from struct ovn_datapath) is split and moved over to
> > new engine nodes - mainly related to load balancers, NAT and ACLs.
> >
> > Below are the scale testing results done with these patches applied
> > using ovn-heater.  The test ran the scenario -
> > ocp-500-density-heavy.yml [1].
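The NB_Global I-P idea described above — ignoring a periodically bumped health-check timestamp instead of recomputing — amounts to a change handler that filters on which part of the row changed.  The sketch below is hypothetical: the option names and the function are invented for illustration and are not the actual OVN or ovn-kubernetes identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical filter: given the name of an NB_Global option that
 * changed, decide whether the change can affect generated flows.
 * Options with no flow impact (e.g. a liveness timestamp that an
 * orchestrator rewrites periodically) are treated as no-ops; anything
 * unrecognized conservatively triggers real processing. */
static bool
nb_global_change_is_relevant(const char *changed_option)
{
    /* Illustrative names only. */
    static const char *ignored[] = { "health_check_timestamp",
                                     "probe_status" };
    for (size_t i = 0; i < sizeof ignored / sizeof ignored[0]; i++) {
        if (!strcmp(changed_option, ignored[i])) {
            return false;   /* no flow impact: handle as a no-op */
        }
    }
    return true;            /* unknown option: be safe, process it */
}
```

The design choice is the conservative default: only changes positively known to be irrelevant are skipped, so the handler can never silently drop a flow-affecting update.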
> > With all the lflow I-P patches applied, the results are:
[... quoted scale-test tables, v1 -> v2 notes, patch list and diffstat
snipped; identical to the cover letter quoted earlier in the thread ...]
> > --
> > 2.41.0
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
On Wed, Nov 15, 2023 at 7:32 PM Numan Siddique <numans@ovn.org> wrote:
>
> On Wed, Nov 15, 2023 at 2:59 AM Han Zhou <hzhou@ovn.org> wrote:
[... earlier quoted exchange snipped ...]
> In my testing, I can see that the increase in latency is due to the
> new engine nodes added (lr_lbnat mainly) and due to housekeeping of
> the lflow references.
> I do not see any increase due to the new lflow-mgr.c added in patch
> 10.  I compared patch 9 and patch 10 separately and there is no
> difference in lflow engine node recompute time between patch 9 and
> 10.

My results were different.  My test profile simulates the ovn-k8s
topology (central mode, not IC) with 500 nodes, 50 LSP/node, no LBs,
and a small number of ACLs and PGs (
https://github.com/hzhou8/ovn-test-script/blob/main/args.500ch_50lsp_10000pg
).  The test results for ovn-northd recompute time are:

    main: 1118.3
    p9:   1129.5
    p10:  1243.4  ==> 10% more than p9
    p18:  1357.6

I am not sure if the p10 increase is related to the hash change or
something else.

> Below are the results with the time taken for the mentioned engine
> nodes in msecs for a recompute for some of the individual patches in
> the series.
[... quoted DB resource summary snipped; see the table earlier in the
thread ...]
[... quoted per-patch timing table and recompute discussion snipped;
see the same table earlier in the thread ...]

> > I believe the next goal is for LB changes related to pod/VMs (i.e.
> > the LB backends), which I believe is the main goal of this series.
> > Is my understanding correct?
>
> That is correct.  LB change handling is very important in my opinion
> when used with ovn-kubernetes because all the k8s services are
> mapped to OVN LBs.
>
> > While reviewing the patches, I'd say sometimes I feel a little lost
> > because it is hard to correlate some of the changes with the above
> > goal.
>
> I understand that.  Maybe I've done a bad job in conveying why the
> changes are required for LBs.  Let me try to clarify a bit here.
> I'll incorporate and add more details in the patch commits and cover
> letter in the next series.
>
> This series mainly targets handling lflow generation for load
> balancer changes.  As you know we use conntrack in OVN for
>   - Load balancers (both in logical switch and router)
>   - ACLs in logical switch
>   - NAT in logical routers.
>
> And this makes it complicated to decouple logical flow generation for
> LBs from ACLs and NATs.  That is the reason I had to split northd
> node data related to load balancers, ACLs and NATs into separate
> nodes.
>
> To give an example, we generate certain logical flows for a logical
> switch by checking "if (od->has_lb_vip && od->has_stateful_acl)" (see
> build_acl_hints()).  For logical routers, a NAT can be an LB VIP too.
> This makes it difficult to decouple LB and NAT.  We could fall back
> to recompute for NAT changes, but we still need to group logical
> flows related to LB and NAT.  And that's why I added the
> lr_lbnat_data engine node in patch 6.

Thanks for explaining.  However, I wonder if these are related to LB
backend updates.  I agree that service creation/deletion may also be
important to handle incrementally, but they are less critical (less
frequent) than the backend updates, which are directly related to
pod/VM creation/deletion.  The has_xxx and NAT related handling are
more related to service creation/deletion rather than LB backend
updates, right?

Thanks,
Han

> > I believe there is a reason for all the changes but I am not sure
> > if it is directly related to the goal of LB backend related I-P.  I
> > understand that there are other aspects of LB that can be
> > incrementally processed.  But I'd recommend if changes necessary
> > for this goal can be largely reduced it would help the review and
> > we might be able to merge them sooner and add more but less
> > impactful I-P later step by step.
I guess I will have a clearer picture > > when I finish the rest of the patches, but it would be helpful if you could > > add more guidance in this cover letter. > > I'll definitely add more details in the cover letter. > > Apart from the LB related I-P, this series only adds 2 more additional > patches for I-P. > One is for the NB_Global changes and the other for SB mac binding changes. > ovn-kubernetes periodically writes in the NB_Global.options column > with a timestamp for its internal health checks > and this results in unnecessary recomputes. After running the > kube-burner density cni tests, I figured out the most common > changes done in the NB DB and added the I-P for that. With all these > patches applied, there are no recomputes in both northd and lflow > engine nodes > during the test duration. Not having the I-P for NB Global for > example was resulting in significant ovn-northd CPU usage. > > Thanks again for the reviews. > > Numan > > > > > Thanks, > > Han > > > > > In order to add this support, some of the northd engine > > > node data (from struct ovn_datapath) is split and moved over to > > > new engine nodes - mainly related to load balancers, NAT and ACLs. > > > > > > Below are the scale testing results done with these patches applied > > > using ovn-heater. The test ran the scenario - > > > ocp-500-density-heavy.yml [1]. 
> > >
> > > With all the lflow I-P patches applied, the results are:
> > >
> > > ----------------------------------------------------------------------------------------------------------------------
> > >                        Min (s)   Median (s)  90%ile (s)  99%ile (s)  Max (s)    Mean (s)   Total (s)    Count  Failed
> > > ----------------------------------------------------------------------------------------------------------------------
> > > Iteration Total        0.136883  1.129016    1.192001    1.204167    1.212728   0.665017   83.127099    125    0
> > > Namespace.add_ports    0.005216  0.005736    0.007034    0.015486    0.018978   0.006211   0.776373     125    0
> > > WorkerNode.bind_port   0.035030  0.046082    0.052469    0.058293    0.060311   0.045973   11.493259    250    0
> > > WorkerNode.ping_port   0.005057  0.006727    1.047692    1.069253    1.071336   0.266896   66.724094    250    0
> > > ----------------------------------------------------------------------------------------------------------------------
> > >
> > > The results with the present main are:
> > >
> > > ----------------------------------------------------------------------------------------------------------------------
> > >                        Min (s)   Median (s)  90%ile (s)  99%ile (s)  Max (s)    Mean (s)   Total (s)    Count  Failed
> > > ----------------------------------------------------------------------------------------------------------------------
> > > Iteration Total        0.135491  2.223805    3.311270    3.339078    3.345346   1.729172   216.146495   125    0
> > > Namespace.add_ports    0.005380  0.005744    0.006819    0.018773    0.020800   0.006292   0.786532     125    0
> > > WorkerNode.bind_port   0.034179  0.046055    0.053488    0.058801    0.071043   0.046117   11.529311    250    0
> > > WorkerNode.ping_port   0.004956  0.006952    3.086952    3.191743    3.192807   0.791544   197.886026   250    0
> > > ----------------------------------------------------------------------------------------------------------------------
> > >
> > > [1] - https://github.com/ovn-org/ovn-heater/blob/main/test-scenarios/ocp-500-density-heavy.yml
> > >
> > > v1 -> v2
> > > --------
> > >   * Now also maintaining array indexes for ls_lbacls, lr_nat and
> > >     lr_lb_nat_data tables (similar to ovn_datapaths->array) to
> > >     make the lookup efficient. The same ovn_datapath->index
> > >     is reused.
> > >
> > >   * Made some significant changes to 'struct lflow_ref' in lflow-mgr.c.
> > >     In v2 we don't use objdep_mgr to maintain the resource to lflow
> > >     references. Instead we maintain the 'struct lflow' pointer.
> > >     With this we don't need to maintain an additional hmap of lflows.
> > >
> > > Numan Siddique (18):
> > >   northd: Refactor the northd change tracking.
> > >   northd: Track ovn_datapaths in northd engine track data.
> > >   tests: Add a couple of tests in ovn-northd for I-P.
> > >   northd: Move router ports SB PB options sync to sync_to_sb_pb node.
> > >   northd: Add a new engine 'lr-nat' to manage lr NAT data.
> > >   northd: Add a new engine 'lr-lb-nat-data' to manage routers' lb data.
> > >   northd: Generate logical router's LB and NAT flows using lr_lbnat_data.
> > >   northd: Don't commit dhcp response flows in the conntrack.
> > >   northd: Add a new node ls_lbacls.
> > >   northd: Refactor lflow management into a separate module.
> > >   northd: Use lflow_ref when adding all logical flows.
> > >   northd: Move ovn_lb_datapaths from lib to northd module.
> > >   northd: Handle lb changes in lflow engine.
> > >   northd: Add lr_lb_nat_data handler for lflow engine node.
> > >   northd: Add ls_lbacls handler for lflow engine node.
> > >   northd: Add I-P for NB_Global and SB_Global.
> > >   northd: Add a noop handler for northd SB mac binding.
> > >   northd: Add northd change handler for sync_to_sb_lb node.
> > >
> > >  lib/lb.c                   |   96 -
> > >  lib/lb.h                   |   57 -
> > >  lib/ovn-util.c             |   17 +-
> > >  lib/ovn-util.h             |    2 +-
> > >  lib/stopwatch-names.h      |    3 +
> > >  northd/aging.c             |   21 +-
> > >  northd/automake.mk         |   12 +-
> > >  northd/en-global-config.c  |  588 ++++
> > >  northd/en-global-config.h  |   65 +
> > >  northd/en-lflow.c          |  123 +-
> > >  northd/en-lflow.h          |    8 +
> > >  northd/en-lr-lb-nat-data.c |  685 ++++
> > >  northd/en-lr-lb-nat-data.h |  125 +
> > >  northd/en-lr-nat.c         |  498 +++
> > >  northd/en-lr-nat.h         |  137 +
> > >  northd/en-ls-lb-acls.c     |  530 +++
> > >  northd/en-ls-lb-acls.h     |  111 +
> > >  northd/en-northd.c         |   67 +-
> > >  northd/en-northd.h         |    2 +-
> > >  northd/en-port-group.h     |    3 +
> > >  northd/en-sync-sb.c        |  513 ++-
> > >  northd/inc-proc-northd.c   |   79 +-
> > >  northd/lflow-mgr.c         | 1078 ++++++
> > >  northd/lflow-mgr.h         |  192 ++
> > >  northd/northd.c            | 6485 ++++++++++++++++--------------------
> > >  northd/northd.h            |  551 ++-
> > >  northd/ovn-northd.c        |    4 +
> > >  tests/ovn-northd.at        |  858 ++++-
> > >  28 files changed, 8827 insertions(+), 4083 deletions(-)
> > >  create mode 100644 northd/en-global-config.c
> > >  create mode 100644 northd/en-global-config.h
> > >  create mode 100644 northd/en-lr-lb-nat-data.c
> > >  create mode 100644 northd/en-lr-lb-nat-data.h
> > >  create mode 100644 northd/en-lr-nat.c
> > >  create mode 100644 northd/en-lr-nat.h
> > >  create mode 100644 northd/en-ls-lb-acls.c
> > >  create mode 100644 northd/en-ls-lb-acls.h
> > >  create mode 100644 northd/lflow-mgr.c
> > >  create mode 100644 northd/lflow-mgr.h
> > >
> > > --
> > > 2.41.0
> > >
> > > _______________________________________________
> > > dev mailing list
> > > dev@openvswitch.org
> > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
On Thu, Nov 16, 2023 at 2:54 PM Han Zhou <hzhou@ovn.org> wrote: > > On Wed, Nov 15, 2023 at 7:32 PM Numan Siddique <numans@ovn.org> wrote: > > > > On Wed, Nov 15, 2023 at 2:59 AM Han Zhou <hzhou@ovn.org> wrote: > > > > > > On Thu, Oct 26, 2023 at 11:12 AM <numans@ovn.org> wrote: > > > > > > > > From: Numan Siddique <numans@ovn.org> > > > > > > > > This patch series adds incremental processing in the lflow engine > > > > node to handle changes to northd and other engine nodes. > > > > Changed related to load balancers and NAT are mainly handled in > > > > this patch series. > > > > > > > > This patch series can also be found here - > > > https://github.com/numansiddique/ovn/tree/northd_lflow_ip/v1 > > > > > > Thanks Numan for the great improvement! > > > > Hi Han, > > > > Thanks for the review comments. I understand it is hard to review > > somany patches, especially related to I-P. > > I appreciate the time spent on it and the feedback. > > > > > > > > I spent some time these days to review the series and now at patch 10. I > > > need to take a break and so I just sent the comments I had so far. > > > > > > I also did scale testing for northd with > > > https://github.com/hzhou8/ovn-test-script for a 500 chassis, 50 lsp / > > > chassis setup, and noticed that the recompute latency has increased 20% > > > after the series. I think in general it is expected to have some > > > performance penalty for the recompute case because of the new indexes > > > created for I-P. However, the patch 10 "northd: Refactor lflow > management > > > into a separate module." alone introduces a 10% latency increase, which > > > necessitates more investigation. > > > > Before sending out the series I did some testing on recomputes with a > > large OVN NB DB and SB DB > > (from a 500 node ovn-heater density heavy run). > > I'm aware of the increase in recomputes. And I did some more testing > > today as well. 
> >
> > In my testing, I can see that the increase in latency is due to the
> > new engine nodes added (lr_lbnat mainly) and due to housekeeping of
> > the lflow references. I do not see any increase due to the new
> > lflow-mgr.c added in patch 10.
> >
> > I compared patch 9 and patch 10 separately and there is no difference
> > in lflow engine node recompute time between patch 9 and 10.
> >
> My results were different. My test profile simulates the ovn-k8s topology
> (central mode, not IC) with 500 nodes, 50 LSP/node, with no LBs, and a
> small number of ACLs and PGs:
> https://github.com/hzhou8/ovn-test-script/blob/main/args.500ch_50lsp_10000pg
>
> The test results for ovn-northd recompute time are:
>
>   main: 1118.3
>   p9:   1129.5
>   p10:  1243.4 ==> 10% more than p9
>   p18:  1357.6
>
> I am not sure if the p10 increase is related to the hash change or
> something else.
>
> > Below are the results with the time taken for the mentioned engine
> > nodes in msecs for a recompute for some of the individual patches in
> > the series.
> >
> > The sample OVN DBs have:
> >
> > ---------------------------------
> > Resource              | Total
> > ---------------------------------
> > Logical switches      | 1001
> > Logical routers       | 501
> > Router ports          | 1501
> > Switch ports          | 29501
> > Load balancers        | 35253
> > Load balancer groups  | 1
> > SB Logical flows      | 268349
> > SB DB groups          | 509
> > ---------------------------------
> >
> > There is one load balancer group which has all the load balancers and
> > it is associated with all the logical switches and routers.
> >
> > Below is the time taken for each engine node in msec:
> >
> > -----------------------------------------------------------------
> > Engine nodes     | lb_data | northd | lr_lbnat | lflow |
> > -----------------------------------------------------------------
> > ovn-northd-main  |   358   |  2455  |    x     |  2082 |
> > ovn-northd-p1    |   373   |  2476  |    x     |  2170 |
> > ovn-northd-p5    |   367   |  2413  |    x     |  2195 |
> > ovn-northd-p6    |   354   |   688  |  1815    |  2442 |
> > ovn-northd-p7    |   357   |   683  |  1778    |  2806 |
> > ovn-northd-p9    |   352   |   682  |  1781    |  2781 |
> > ovn-northd-p10   |   365   |   838  |  1707    |  2812 |
> > ovn-northd-p13   |   362   |  1066  |  1882    |  2917 |
> > ovn-northd-p15   |   359   |  1046  |  1688    |  2907 |
> > ovn-northd-p18   |   379   |  1066  |  1729    |  2886 |
> > -----------------------------------------------------------------
> >
> > ovn-northd-p1 means ovn-northd with patch 1 of this series applied,
> > ovn-northd-p2 with patch 2, and so on.
> >
> > Patch 6 adds a new engine node lr_lbnat and that's why the northd
> > engine node time gets reduced from ~2413 msec to 688.
> >
> > With the first 10 patches, northd engine overall time increases by
> > ~200 msec compared to "main" and lflow engine node time increases by
> > around 800 msec.
> > The point of this data is to show that the overall increase is mainly > > due to bookkeeping and new engine nodes. I tried to optimise but I > > couldn't. > > > > > > IIMO this 20% overall increase should be fine for the following reasons > > - The scale tests with ovn-heater show significant decrease in > > ovn-northd CPU usage and the decrease in the overall test duration. > > - This means there were very few recomputes triggered. > > - With the OCP/K8s kube-burner tests, there were 0 recomputes > > in the northd and lflow engine nodes. > > > > > > > > > > > > > > > > > Prior to this patch series, most of the changes to northd engine > > > > resulted in full recomputation of logical flows. This series > > > > aims to improve the performance of ovn-northd by adding the I-P > > > > support. > > > > > > I'd like to clarify "most of the changes" a little. I think we should > focus > > > on the most impactful changes that happen in a cloud environment. I > don't > > > think it is realistic to achieve "most of the changes" in I-P because > it is > > > too complex (even with this series we still handle a very small part of > the > > > DB schema incrementally), but it may be realistic to handle the most > > > impactful changes, which are the most frequent changes in the cloud, > > > incrementally. Before this series, we could handle regular VIF changes > and > > > related address-sets, port-groups (some of) updates incrementally. Those > > > are the most common changes related to pod/VM changes in cloud. I > believe > > > the next goal is for LB changes related to pod/VMs (i.e. the LB > backends), > > > which I believe is the main goal of this series. Is my understanding > > > correct? > > > > > That is correct. LB change handling is very important in my opinion > > when used with ovn-kubernetes because all the k8s services > > are mapped to OVN LBs. 
> > > > > > > While reviewing the patches, I'd say sometimes I feel a little lost > because > > > it is hard to correlate some of the changes with the above goal. > > > > I understand that. Maybe I've done a bad job in conveying why the > > changes are required for LBs. > > > > Let me try to clarify a bit here. I'll incorporate and add more > > details in the patch commits and cover letter in the next series. > > > > This series mainly targets handling lflow generation for load balancer > changes. > > As you know we use conntrack in OVN for > > - Load balancers (both in logical switch and router) > > - ACLs in logical switch > > - NAT in load balancers. > > > > And this makes it complicated to decouple logical flow generation for > > LBs from ACLs and NATs. That is the reason I had to split northd node > > data related to load balancers, ACLs and NATs into separate nodes. > > > > To give an example, we generate certain logical flows for a logical > > switch by checking " if (od->has_lb_vip && od->has_stateful_acl)" (See > > build_acl_hints()). > > For logical routers, a NAT can be a LB VIP too. This becomes > > difficult to decouple LB and NAT. We could fall back to recompute > > for NAT changes. But we still need to group logical flows related to > > LB and NAT. And that's why I added lr_lbnat_data engine node in patch > > 6. > > > > Thanks for explaining. However, I wonder if these are related to LB backend > updates. I agree that service creation/deletion may also be important to be > handled incrementally, but they are less critical (less frequent) than the > backend updates which are directly related to pod/vm creation/deletion. From what I understand, at least for OCP use cases they seem to be critical. Also it depends on the use case I suppose. The > has_xxx and NAT related handling are more related to the service > creation/deletion rather than LB backend updates, right? I think so. 
Numan > > Thanks, > Han > > > I believe > > > there is a reason for all the changes but I am not sure if it is > directly > > > related to the goal of LB backend related I-P. I understand that there > are > > > other aspects of LB that can be incrementally processed. But I'd > recommend > > > if changes necessary for this goal can be largely reduced it would help > the > > > review and we might be able to merge them sooner and add more but less > > > impactful I-P later step by step. I guess I will have a clearer picture > > > when I finish the rest of the patches, but it would be helpful if you > could > > > add more guidance in this cover letter. > > > > I'll definitely add more details in the cover letter. > > > > Apart from the LB related I-P, this series only adds 2 more additional > > patches for I-P. > > One is for the NB_Global changes and the other for SB mac binding changes. > > ovn-kubernetes periodically writes in the NB_Global.options column > > with a timestamp for its internal health checks > > and this results in unnecessary recomputes. After running the > > kube-burner density cni tests, I figured out the most common > > changes done in the NB DB and added the I-P for that. With all these > > patches applied, there are no recomputes in both northd and lflow > > engine nodes > > during the test duration. Not having the I-P for NB Global for > > example was resulting in significant ovn-northd CPU usage. > > > > Thanks again for the reviews. > > > > Numan > > > > > > > > Thanks, > > > Han > > > > > > > In order to add this support, some of the northd engine > > > > node data (from struct ovn_datapath) is split and moved over to > > > > new engine nodes - mainly related to load balancers, NAT and ACLs. > > > > > > > > Below are the scale testing results done with these patches applied > > > > using ovn-heater. The test ran the scenario - > > > > ocp-500-density-heavy.yml [1]. 
> > > > > > > > With all the lflow I-P patches applied, the resuts are: > > > > > > > > > > > > ------------------------------------------------------------------------------------------------------------------------------------------------------- > > > > Min (s) Median (s) 90%ile (s) > > > 99%ile (s) Max (s) Mean (s) Total (s) Count > > > Failed > > > > > > > > ------------------------------------------------------------------------------------------------------------------------------------------------------- > > > > Iteration Total 0.136883 1.129016 1.192001 > > > 1.204167 1.212728 0.665017 83.127099 125 > 0 > > > > Namespace.add_ports 0.005216 0.005736 0.007034 > > > 0.015486 0.018978 0.006211 0.776373 125 > 0 > > > > WorkerNode.bind_port 0.035030 0.046082 0.052469 > > > 0.058293 0.060311 0.045973 11.493259 250 > 0 > > > > WorkerNode.ping_port 0.005057 0.006727 1.047692 > > > 1.069253 1.071336 0.266896 66.724094 250 > 0 > > > > > > > > ------------------------------------------------------------------------------------------------------------------------------------------------------- > > > > > > > > The results with the present main are: > > > > > > > > > > > > ------------------------------------------------------------------------------------------------------------------------------------------------------- > > > > Min (s) Median (s) 90%ile (s) > > > 99%ile (s) Max (s) Mean (s) Total (s) Count > > > Failed > > > > > > > > ------------------------------------------------------------------------------------------------------------------------------------------------------- > > > > Iteration Total 0.135491 2.223805 3.311270 > > > 3.339078 3.345346 1.729172 216.146495 125 > 0 > > > > Namespace.add_ports 0.005380 0.005744 0.006819 > > > 0.018773 0.020800 0.006292 0.786532 125 > 0 > > > > WorkerNode.bind_port 0.034179 0.046055 0.053488 > > > 0.058801 0.071043 0.046117 11.529311 250 > 0 > > > > WorkerNode.ping_port 0.004956 0.006952 3.086952 > > > 3.191743 
3.192807 0.791544 197.886026 250 > 0 > > > > > > > > ------------------------------------------------------------------------------------------------------------------------------------------------------- > > > > > > > > [1] - > > > > https://github.com/ovn-org/ovn-heater/blob/main/test-scenarios/ocp-500-density-heavy.yml > > > > > > > > v1 -> v2 > > > > -------- > > > > * Now also maintaing array indexes for ls_lbacls, lr_nat and > > > > lr_lb_nat_data tables (similar to ovn_datapaths->array) to > > > > make the lookup effecient. The same ovn_datapath->index > > > > is reused. > > > > > > > > * Made some signficant changes to 'struct lflow_ref' in > lflow-mgr.c. > > > > In v2 we don't use objdep_mgr to maintain the resource to lflow > > > > references. Instead we maintain the 'struct lflow' pointer. > > > > With this we don't need to maintain additional hmap of lflows. > > > > > > > > > > > > Numan Siddique (18): > > > > northd: Refactor the northd change tracking. > > > > northd: Track ovn_datapaths in northd engine track data. > > > > tests: Add a couple of tests in ovn-northd for I-P. > > > > northd: Move router ports SB PB options sync to sync_to_sb_pb node. > > > > northd: Add a new engine 'lr-nat' to manage lr NAT data. > > > > northd: Add a new engine 'lr-lb-nat-data' to manage routers' lb > data. > > > > northd: Generate logical router's LB and NAT flows using > > > > lr_lbnat_data. > > > > northd: Don't commit dhcp response flows in the conntrack. > > > > northd: Add a new node ls_lbacls. > > > > northd: Refactor lflow management into a separate module. > > > > northd: Use lflow_ref when adding all logical flows. > > > > northd: Move ovn_lb_datapaths from lib to northd module. > > > > northd: Handle lb changes in lflow engine. > > > > northd: Add lr_lb_nat_data handler for lflow engine node. > > > > northd: Add ls_lbacls handler for lflow engine node. > > > > northd: Add I-P for NB_Global and SB_Global. 
> > > > northd: Add a noop handler for northd SB mac binding. > > > > northd: Add northd change handler for sync_to_sb_lb node. > > > > > > > > lib/lb.c | 96 - > > > > lib/lb.h | 57 - > > > > lib/ovn-util.c | 17 +- > > > > lib/ovn-util.h | 2 +- > > > > lib/stopwatch-names.h | 3 + > > > > northd/aging.c | 21 +- > > > > northd/automake.mk | 12 +- > > > > northd/en-global-config.c | 588 ++++ > > > > northd/en-global-config.h | 65 + > > > > northd/en-lflow.c | 123 +- > > > > northd/en-lflow.h | 8 + > > > > northd/en-lr-lb-nat-data.c | 685 ++++ > > > > northd/en-lr-lb-nat-data.h | 125 + > > > > northd/en-lr-nat.c | 498 +++ > > > > northd/en-lr-nat.h | 137 + > > > > northd/en-ls-lb-acls.c | 530 +++ > > > > northd/en-ls-lb-acls.h | 111 + > > > > northd/en-northd.c | 67 +- > > > > northd/en-northd.h | 2 +- > > > > northd/en-port-group.h | 3 + > > > > northd/en-sync-sb.c | 513 ++- > > > > northd/inc-proc-northd.c | 79 +- > > > > northd/lflow-mgr.c | 1078 ++++++ > > > > northd/lflow-mgr.h | 192 ++ > > > > northd/northd.c | 6485 > ++++++++++++++++-------------------- > > > > northd/northd.h | 551 ++- > > > > northd/ovn-northd.c | 4 + > > > > tests/ovn-northd.at | 858 ++++- > > > > 28 files changed, 8827 insertions(+), 4083 deletions(-) > > > > create mode 100644 northd/en-global-config.c > > > > create mode 100644 northd/en-global-config.h > > > > create mode 100644 northd/en-lr-lb-nat-data.c > > > > create mode 100644 northd/en-lr-lb-nat-data.h > > > > create mode 100644 northd/en-lr-nat.c > > > > create mode 100644 northd/en-lr-nat.h > > > > create mode 100644 northd/en-ls-lb-acls.c > > > > create mode 100644 northd/en-ls-lb-acls.h > > > > create mode 100644 northd/lflow-mgr.c > > > > create mode 100644 northd/lflow-mgr.h > > > > > > > > -- > > > > 2.41.0 > > > > > > > > _______________________________________________ > > > > dev mailing list > > > > dev@openvswitch.org > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > > 
_______________________________________________ > > > dev mailing list > > > dev@openvswitch.org > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
On 11/16/23 22:05, Numan Siddique wrote: > On Thu, Nov 16, 2023 at 2:54 PM Han Zhou <hzhou@ovn.org> wrote: >> >> On Wed, Nov 15, 2023 at 7:32 PM Numan Siddique <numans@ovn.org> wrote: >>> >>> On Wed, Nov 15, 2023 at 2:59 AM Han Zhou <hzhou@ovn.org> wrote: >>>> >>>> On Thu, Oct 26, 2023 at 11:12 AM <numans@ovn.org> wrote: >>>>> >>>>> From: Numan Siddique <numans@ovn.org> >>>>> >>>>> This patch series adds incremental processing in the lflow engine >>>>> node to handle changes to northd and other engine nodes. >>>>> Changed related to load balancers and NAT are mainly handled in >>>>> this patch series. >>>>> >>>>> This patch series can also be found here - >>>> https://github.com/numansiddique/ovn/tree/northd_lflow_ip/v1 >>>> >>>> Thanks Numan for the great improvement! >>> >>> Hi Han, >>> >>> Thanks for the review comments. I understand it is hard to review >>> somany patches, especially related to I-P. >>> I appreciate the time spent on it and the feedback. >>> >>>> >>>> I spent some time these days to review the series and now at patch 10. I >>>> need to take a break and so I just sent the comments I had so far. >>>> >>>> I also did scale testing for northd with >>>> https://github.com/hzhou8/ovn-test-script for a 500 chassis, 50 lsp / >>>> chassis setup, and noticed that the recompute latency has increased 20% >>>> after the series. I think in general it is expected to have some >>>> performance penalty for the recompute case because of the new indexes >>>> created for I-P. However, the patch 10 "northd: Refactor lflow >> management >>>> into a separate module." alone introduces a 10% latency increase, which >>>> necessitates more investigation. >>> >>> Before sending out the series I did some testing on recomputes with a >>> large OVN NB DB and SB DB >>> (from a 500 node ovn-heater density heavy run). >>> I'm aware of the increase in recomputes. And I did some more testing >>> today as well. 
>>> >>> In my testing, I can see that the increase in latency is due to the >>> new engine nodes added (lr_lbnat mainly) >>> and due to housekeeping of the lflow references. I do not see any >>> increase due to the new lflow-mgr.c added in patch 10. >>> >>> I compared patch 9 and patch 10 separately and there is no difference >>> in lflow engine node recompute time between patch 9 and 10. >>> >> >> My results were different. My test profile simulates the ovn-k8s topology >> (central mode, not IC) with 500 nodes, 50 LSP/node, with no LB, small >> amount of ACLs and PGs. >> ( >> https://github.com/hzhou8/ovn-test-script/blob/main/args.500ch_50lsp_10000pg >> ) >> >> The test results for ovn-northd recompute time are: >> main: 1118.3 >> p9: 1129.5 >> p10: 1243.4 ==> 10% more than p9 >> p18: 1357.6 >> >> I am not sure if the p10 increase is related to the hash change or >> something else. >> I didn't review p10 in detail yet but I did run some tests and (with gcc and CFLAGS="-O2 -fno-omit-frame-pointer -fno-common -g") I see no significant difference between p9 and p10 when running Han's scale test profile. Then I also tried the same thing using the 500 nodes + 50 LSP/node perf test we already have in the repo: https://github.com/ovn-org/ovn/blob/5ef2a08ade01d698f84e197987ea679d8978d2b9/tests/perf-northd.at#L185 Again, I didn't see any significant change between p9 and p10 on my laptop. I wonder what's different in our setups. Kind of related, I posted a series to improve the in-tree perf testing and allow us to also gather recompute stats (build the DB, reset stopwatches and trigger a recompute, then collect stopwatches): https://patchwork.ozlabs.org/project/ovn/list/?series=383727&state=* Would it be an idea to try to merge both scenarios you guys used into something that's defined as a new test in-tree? Like that it's easier for anyone to just run the same test. 
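The "reset stopwatches, trigger a recompute, collect stopwatches" loop described above can be sketched with ovn-northd's existing appctl hooks. This is a sketch, not part of the proposed test: it assumes a running ovn-northd whose control socket ovn-appctl can locate, and the stopwatch name "build_lflows" is taken from lib/stopwatch-names.h (available names can vary by version).

```shell
# Reset all stopwatches so only the upcoming run is measured.
ovn-appctl -t ovn-northd stopwatch/reset

# Force a full recompute of the incremental processing engine.
ovn-appctl -t ovn-northd inc-engine/recompute

# Collect the lflow-generation stopwatch (or run "stopwatch/show"
# with no argument to dump every stopwatch).
ovn-appctl -t ovn-northd stopwatch/show build_lflows
```

Against a DB snapshot loaded before the run, this gives per-stage recompute timings that should be directly comparable across main, p9, p10 and p18 builds.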
Thanks, Dumitru >>> Below are the results with the time taken for the mentioned engine >>> nodes in msecs for a recompute for some of the individual patches in >>> the series. >>> >>> >>> The sample OVN DBs have >>> >>> -------------------------------- >>> Resource Total >>> ------------------------------- >>> Logical switches 1001 >>> ---------------------------- >>> Logical routers 501 >>> ---------------------------- >>> Router ports 1501 >>> ---------------------------- >>> Switch ports 29501 >>> ---------------------------- >>> Load balancers 35253 >>> ---------------------------- >>> Load balancer group 1 >>> ---------------------------- >>> SB Logical flows 268349 >>> ---------------------------- >>> SB DB groups 509 >>> ---------------------------- >>> >>> There is one load balancer group which has all the load balancers and >>> it is associated with all the logical switches and routers. >>> >>> Below is the time taken for each engine node in msec >>> >>> --------------------------------------------------------------------- >>> Engine nodes | lb_data | northd | lr_lbnat | lflow | >>> --------------------------------------------------------------------- >>> ovn-northd-main | 358 | 2455 | x | 2082 | >>> --------------------------------------------------------------------- >>> ovn-northd-p1 | 373 | 2476 | x | 2170 | >>> --------------------------------------------------------------------- >>> ovn-northd-p5 | 367 | 2413 | x | 2195 | >>> --------------------------------------------------------------------- >>> ovn-northd-p6 | 354 | 688 | 1815 | 2442 | >>> --------------------------------------------------------------------- >>> ovn-northd-p7 | 357 | 683 | 1778 | 2806 | >>> --------------------------------------------------------------------- >>> ovn-northd-p9 | 352 | 682 | 1781 | 2781 | >>> --------------------------------------------------------------------- >>> ovn-northd-p10 | 365 | 838 | 1707 | 2812 | >>> 
--------------------------------------------------------------------- >>> ovn-northd-p13 | 362 | 1066 | 1882 | 2917 | >>> --------------------------------------------------------------------- >>> ovn-northd-p15 | 359 | 1046 | 1688 | 2907 | >>> --------------------------------------------------------------------- >>> ovn-northd-p18 | 379 | 1066 | 1729 | 2886 | >>> --------------------------------------------------------------------- >>> >>> ovn-northd-p1 means ovn-northd with the patch 1 of this series, >>> ovn-northd-p2 - patch 2 of this series and so on. >>> >>> Patch 6 adds a new engine node lr_lbnat and that's why the northd >>> engine node time gets reduced from ~2413 msec to 688. >>> >>> With the first 10 patches, northd engine over all time increases to >>> 200msec compared to "main '' and lflow engine node time increases to >>> around 800 msec. >>> The point of this data is to show that the overall increase is mainly >>> due to bookkeeping and new engine nodes. I tried to optimise but I >>> couldn't. >>> >>> >>> IIMO this 20% overall increase should be fine for the following reasons >>> - The scale tests with ovn-heater show significant decrease in >>> ovn-northd CPU usage and the decrease in the overall test duration. >>> - This means there were very few recomputes triggered. >>> - With the OCP/K8s kube-burner tests, there were 0 recomputes >>> in the northd and lflow engine nodes. >>> >>> >>> >>>> >>>>> >>>>> Prior to this patch series, most of the changes to northd engine >>>>> resulted in full recomputation of logical flows. This series >>>>> aims to improve the performance of ovn-northd by adding the I-P >>>>> support. >>>> >>>> I'd like to clarify "most of the changes" a little. I think we should >> focus >>>> on the most impactful changes that happen in a cloud environment. 
I >> don't >>>> think it is realistic to achieve "most of the changes" in I-P because >> it is >>>> too complex (even with this series we still handle a very small part of >> the >>>> DB schema incrementally), but it may be realistic to handle the most >>>> impactful changes, which are the most frequent changes in the cloud, >>>> incrementally. Before this series, we could handle regular VIF changes >> and >>>> related address-sets, port-groups (some of) updates incrementally. Those >>>> are the most common changes related to pod/VM changes in cloud. I >> believe >>>> the next goal is for LB changes related to pod/VMs (i.e. the LB >> backends), >>>> which I believe is the main goal of this series. Is my understanding >>>> correct? >>>> >>> That is correct. LB change handling is very important in my opinion >>> when used with ovn-kubernetes because all the k8s services >>> are mapped to OVN LBs. >>> >>> >>>> While reviewing the patches, I'd say sometimes I feel a little lost >> because >>>> it is hard to correlate some of the changes with the above goal. >>> >>> I understand that. Maybe I've done a bad job in conveying why the >>> changes are required for LBs. >>> >>> Let me try to clarify a bit here. I'll incorporate and add more >>> details in the patch commits and cover letter in the next series. >>> >>> This series mainly targets handling lflow generation for load balancer >> changes. >>> As you know we use conntrack in OVN for >>> - Load balancers (both in logical switch and router) >>> - ACLs in logical switch >>> - NAT in load balancers. >>> >>> And this makes it complicated to decouple logical flow generation for >>> LBs from ACLs and NATs. That is the reason I had to split northd node >>> data related to load balancers, ACLs and NATs into separate nodes. >>> >>> To give an example, we generate certain logical flows for a logical >>> switch by checking " if (od->has_lb_vip && od->has_stateful_acl)" (See >>> build_acl_hints()). 
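[Editor's note: the coupling described above — some logical-switch flows exist only when both flags are set — can be shown with a tiny, purely illustrative sketch. This is not OVN code; the function shape mirrors the quoted `od->has_lb_vip && od->has_stateful_acl` check, but the flow names and contents are made up.]

```python
# Illustrative only: mimics the shape of the quoted check, not real OVN
# logic. Some logical-switch flows are gated on *both* the load-balancer
# state and the stateful-ACL state of the datapath.

def build_acl_hint_flows(od):
    """Return the 'hint' flows for a logical switch datapath `od`."""
    flows = set()
    if od["has_lb_vip"] and od["has_stateful_acl"]:
        # A flow that only exists when both features are enabled.
        flows.add(("ls_in_acl_hint", "ct.new && ip", "hint_allow"))
    return flows

od = {"has_lb_vip": True, "has_stateful_acl": False}
assert build_acl_hint_flows(od) == set()

# Flipping either flag changes this flow group, so an incremental
# handler for LB changes must also know the ACL state (and vice versa).
od["has_stateful_acl"] = True
assert len(build_acl_hint_flows(od)) == 1
```

Because one flow group depends on several inputs, tracking those inputs in separate engine nodes is what lets a change to one of them invalidate only the affected flows.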
>>> For logical routers, a NAT can be a LB VIP too. This becomes >>> difficult to decouple LB and NAT. We could fall back to recompute >>> for NAT changes. But we still need to group logical flows related to >>> LB and NAT. And that's why I added lr_lbnat_data engine node in patch >>> 6. >>> >> >> Thanks for explaining. However, I wonder if these are related to LB backend >> updates. I agree that service creation/deletion may also be important to be >> handled incrementally, but they are less critical (less frequent) than the >> backend updates which are directly related to pod/vm creation/deletion. > > From what I understand, at least for OCP use cases they seem to be critical. > Also it depends on the use case I suppose. > > The >> has_xxx and NAT related handling are more related to the service >> creation/deletion rather than LB backend updates, right? > > I think so. > > Numan > >> >> Thanks, >> Han >> >>> I believe >>>> there is a reason for all the changes but I am not sure if it is >> directly >>>> related to the goal of LB backend related I-P. I understand that there >> are >>>> other aspects of LB that can be incrementally processed. But I'd >> recommend >>>> if changes necessary for this goal can be largely reduced it would help >> the >>>> review and we might be able to merge them sooner and add more but less >>>> impactful I-P later step by step. I guess I will have a clearer picture >>>> when I finish the rest of the patches, but it would be helpful if you >> could >>>> add more guidance in this cover letter. >>> >>> I'll definitely add more details in the cover letter. >>> >>> Apart from the LB related I-P, this series only adds 2 more additional >>> patches for I-P. >>> One is for the NB_Global changes and the other for SB mac binding changes. >>> ovn-kubernetes periodically writes in the NB_Global.options column >>> with a timestamp for its internal health checks >>> and this results in unnecessary recomputes. 
After running the
>>> kube-burner density cni tests, I figured out the most common
>>> changes done in the NB DB and added the I-P for that. With all these
>>> patches applied, there are no recomputes in the northd and lflow
>>> engine nodes during the test duration. Not having the I-P for
>>> NB_Global, for example, was resulting in significant ovn-northd CPU
>>> usage.
>>>
>>> Thanks again for the reviews.
>>>
>>> Numan
>>>
>>>>
>>>> Thanks,
>>>> Han
>>>>
>>>>> In order to add this support, some of the northd engine
>>>>> node data (from struct ovn_datapath) is split and moved over to
>>>>> new engine nodes - mainly related to load balancers, NAT and ACLs.
>>>>>
>>>>> Below are the scale testing results done with these patches applied
>>>>> using ovn-heater. The test ran the scenario
>>>>> ocp-500-density-heavy.yml [1].
>>>>>
>>>>> With all the lflow I-P patches applied, the results are:
>>>>>
>>>>> ---------------------------------------------------------------------------------------------------------------------
>>>>>                       Min (s)   Median (s)  90%ile (s)  99%ile (s)  Max (s)   Mean (s)   Total (s)   Count  Failed
>>>>> ---------------------------------------------------------------------------------------------------------------------
>>>>> Iteration Total       0.136883  1.129016    1.192001    1.204167    1.212728  0.665017   83.127099   125    0
>>>>> Namespace.add_ports   0.005216  0.005736    0.007034    0.015486    0.018978  0.006211    0.776373   125    0
>>>>> WorkerNode.bind_port  0.035030  0.046082    0.052469    0.058293    0.060311  0.045973   11.493259   250    0
>>>>> WorkerNode.ping_port  0.005057  0.006727    1.047692    1.069253    1.071336  0.266896   66.724094   250    0
>>>>> ---------------------------------------------------------------------------------------------------------------------
>>>>>
>>>>> The results with the present main are:
>>>>>
>>>>> ---------------------------------------------------------------------------------------------------------------------
>>>>>                       Min (s)   Median (s)  90%ile (s)  99%ile (s)  Max (s)   Mean (s)   Total (s)   Count  Failed
>>>>> ---------------------------------------------------------------------------------------------------------------------
>>>>> Iteration Total       0.135491  2.223805    3.311270    3.339078    3.345346  1.729172  216.146495   125    0
>>>>> Namespace.add_ports   0.005380  0.005744    0.006819    0.018773    0.020800  0.006292    0.786532   125    0
>>>>> WorkerNode.bind_port  0.034179  0.046055    0.053488    0.058801    0.071043  0.046117   11.529311   250    0
>>>>> WorkerNode.ping_port  0.004956  0.006952    3.086952    3.191743    3.192807  0.791544  197.886026   250    0
>>>>> ---------------------------------------------------------------------------------------------------------------------
>>>>>
>>>>> [1] - https://github.com/ovn-org/ovn-heater/blob/main/test-scenarios/ocp-500-density-heavy.yml
>>>>>
>>>>> v1 -> v2
>>>>> --------
>>>>>   * Now also maintaining array indexes for ls_lbacls, lr_nat and
>>>>>     lr_lb_nat_data tables (similar to ovn_datapaths->array) to
>>>>>     make the lookup efficient. The same ovn_datapath->index
>>>>>     is reused.
>>>>>
>>>>>   * Made some significant changes to 'struct lflow_ref' in lflow-mgr.c.
>>>>>     In v2 we don't use objdep_mgr to maintain the resource to lflow
>>>>>     references. Instead we maintain the 'struct lflow' pointer.
>>>>>     With this we don't need to maintain an additional hmap of lflows.
>>>>>
>>>>> Numan Siddique (18):
>>>>>   northd: Refactor the northd change tracking.
>>>>>   northd: Track ovn_datapaths in northd engine track data.
>>>>>   tests: Add a couple of tests in ovn-northd for I-P.
>>>>>   northd: Move router ports SB PB options sync to sync_to_sb_pb node.
>>>>>   northd: Add a new engine 'lr-nat' to manage lr NAT data.
>>>>> northd: Add a new engine 'lr-lb-nat-data' to manage routers' lb >> data. >>>>> northd: Generate logical router's LB and NAT flows using >>>>> lr_lbnat_data. >>>>> northd: Don't commit dhcp response flows in the conntrack. >>>>> northd: Add a new node ls_lbacls. >>>>> northd: Refactor lflow management into a separate module. >>>>> northd: Use lflow_ref when adding all logical flows. >>>>> northd: Move ovn_lb_datapaths from lib to northd module. >>>>> northd: Handle lb changes in lflow engine. >>>>> northd: Add lr_lb_nat_data handler for lflow engine node. >>>>> northd: Add ls_lbacls handler for lflow engine node. >>>>> northd: Add I-P for NB_Global and SB_Global. >>>>> northd: Add a noop handler for northd SB mac binding. >>>>> northd: Add northd change handler for sync_to_sb_lb node. >>>>> >>>>> lib/lb.c | 96 - >>>>> lib/lb.h | 57 - >>>>> lib/ovn-util.c | 17 +- >>>>> lib/ovn-util.h | 2 +- >>>>> lib/stopwatch-names.h | 3 + >>>>> northd/aging.c | 21 +- >>>>> northd/automake.mk | 12 +- >>>>> northd/en-global-config.c | 588 ++++ >>>>> northd/en-global-config.h | 65 + >>>>> northd/en-lflow.c | 123 +- >>>>> northd/en-lflow.h | 8 + >>>>> northd/en-lr-lb-nat-data.c | 685 ++++ >>>>> northd/en-lr-lb-nat-data.h | 125 + >>>>> northd/en-lr-nat.c | 498 +++ >>>>> northd/en-lr-nat.h | 137 + >>>>> northd/en-ls-lb-acls.c | 530 +++ >>>>> northd/en-ls-lb-acls.h | 111 + >>>>> northd/en-northd.c | 67 +- >>>>> northd/en-northd.h | 2 +- >>>>> northd/en-port-group.h | 3 + >>>>> northd/en-sync-sb.c | 513 ++- >>>>> northd/inc-proc-northd.c | 79 +- >>>>> northd/lflow-mgr.c | 1078 ++++++ >>>>> northd/lflow-mgr.h | 192 ++ >>>>> northd/northd.c | 6485 >> ++++++++++++++++-------------------- >>>>> northd/northd.h | 551 ++- >>>>> northd/ovn-northd.c | 4 + >>>>> tests/ovn-northd.at | 858 ++++- >>>>> 28 files changed, 8827 insertions(+), 4083 deletions(-) >>>>> create mode 100644 northd/en-global-config.c >>>>> create mode 100644 northd/en-global-config.h >>>>> create mode 100644 
northd/en-lr-lb-nat-data.c >>>>> create mode 100644 northd/en-lr-lb-nat-data.h >>>>> create mode 100644 northd/en-lr-nat.c >>>>> create mode 100644 northd/en-lr-nat.h >>>>> create mode 100644 northd/en-ls-lb-acls.c >>>>> create mode 100644 northd/en-ls-lb-acls.h >>>>> create mode 100644 northd/lflow-mgr.c >>>>> create mode 100644 northd/lflow-mgr.h >>>>> >>>>> -- >>>>> 2.41.0 >>>>> >>>>> _______________________________________________ >>>>> dev mailing list >>>>> dev@openvswitch.org >>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>>> _______________________________________________ >>>> dev mailing list >>>> dev@openvswitch.org >>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> _______________________________________________ >> dev mailing list >> dev@openvswitch.org >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
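[Editor's note: much of the recompute overhead discussed in this thread comes from tracking which object produced which logical flows (the 'struct lflow_ref' bookkeeping). The sketch below is a hypothetical illustration of that trade-off in Python — the class and method names are made up and this is not OVN's inc-proc engine or lflow-mgr API.]

```python
# Hypothetical sketch of the pattern being measured: recording an
# owner -> flows reference set makes incremental updates cheap, at the
# price of extra bookkeeping work during every full recompute.

class LflowTable:
    def __init__(self):
        self.flows = {}   # flow_id -> flow body
        self.refs = {}    # owner -> set of flow_ids (the "lflow_ref")

    def add_flow(self, owner, flow_id, body):
        self.flows[flow_id] = body
        # This extra indexing is the recompute-time cost of I-P support.
        self.refs.setdefault(owner, set()).add(flow_id)

    def recompute(self, generators):
        """Full recompute: rebuild every owner's flows from scratch."""
        self.flows.clear()
        self.refs.clear()
        for owner, gen in generators.items():
            for fid, body in gen():
                self.add_flow(owner, fid, body)

    def handle_owner_change(self, owner, gen):
        """Incremental path: touch only the changed owner's flows."""
        for fid in self.refs.pop(owner, set()):
            del self.flows[fid]
        for fid, body in gen():
            self.add_flow(owner, fid, body)

table = LflowTable()
table.recompute({"lb1": lambda: [("lb1-f1", "vip 10.0.0.1")]})
# A load-balancer update now replaces only lb1's flows, no recompute.
table.handle_owner_change("lb1", lambda: [("lb1-f1", "vip 10.0.0.2")])
assert table.flows["lb1-f1"] == "vip 10.0.0.2"
```

This is why a small constant slowdown of the (now rare) full recompute can be a good trade for handling the frequent changes — LB backends in particular — without recomputing at all.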
On Fri, Nov 24, 2023 at 10:38 AM Dumitru Ceara <dceara@redhat.com> wrote:
>
> On 11/16/23 22:05, Numan Siddique wrote:
> > On Thu, Nov 16, 2023 at 2:54 PM Han Zhou <hzhou@ovn.org> wrote:
> >>
> >> On Wed, Nov 15, 2023 at 7:32 PM Numan Siddique <numans@ovn.org> wrote:
> >>>
> >>> On Wed, Nov 15, 2023 at 2:59 AM Han Zhou <hzhou@ovn.org> wrote:
> >>>>
> >>>> On Thu, Oct 26, 2023 at 11:12 AM <numans@ovn.org> wrote:
> >>>>>
> >>>>> From: Numan Siddique <numans@ovn.org>
> >>>>>
> >>>>> This patch series adds incremental processing in the lflow engine
> >>>>> node to handle changes to northd and other engine nodes.
> >>>>> Changes related to load balancers and NAT are mainly handled in
> >>>>> this patch series.
> >>>>>
> >>>>> This patch series can also be found here -
> >>>> https://github.com/numansiddique/ovn/tree/northd_lflow_ip/v1
> >>>>
> >>>> Thanks Numan for the great improvement!
> >>>
> >>> Hi Han,
> >>>
> >>> Thanks for the review comments. I understand it is hard to review
> >>> so many patches, especially those related to I-P.
> >>> I appreciate the time spent on it and the feedback.
> >>>
> >>>>
> >>>> I spent some time these days to review the series and now at patch 10. I
> >>>> need to take a break and so I just sent the comments I had so far.
> >>>>
> >>>> I also did scale testing for northd with
> >>>> https://github.com/hzhou8/ovn-test-script for a 500 chassis, 50 lsp /
> >>>> chassis setup, and noticed that the recompute latency has increased 20%
> >>>> after the series. I think in general it is expected to have some
> >>>> performance penalty for the recompute case because of the new indexes
> >>>> created for I-P. However, the patch 10 "northd: Refactor lflow management
> >>>> into a separate module." alone introduces a 10% latency increase, which
> >>>> necessitates more investigation.
> >>> > >>> Before sending out the series I did some testing on recomputes with a > >>> large OVN NB DB and SB DB > >>> (from a 500 node ovn-heater density heavy run). > >>> I'm aware of the increase in recomputes. And I did some more testing > >>> today as well. > >>> > >>> In my testing, I can see that the increase in latency is due to the > >>> new engine nodes added (lr_lbnat mainly) > >>> and due to housekeeping of the lflow references. I do not see any > >>> increase due to the new lflow-mgr.c added in patch 10. > >>> > >>> I compared patch 9 and patch 10 separately and there is no difference > >>> in lflow engine node recompute time between patch 9 and 10. > >>> > >> > >> My results were different. My test profile simulates the ovn-k8s topology > >> (central mode, not IC) with 500 nodes, 50 LSP/node, with no LB, small > >> amount of ACLs and PGs. > >> ( > >> https://github.com/hzhou8/ovn-test-script/blob/main/args.500ch_50lsp_10000pg > >> ) > >> > >> The test results for ovn-northd recompute time are: > >> main: 1118.3 > >> p9: 1129.5 > >> p10: 1243.4 ==> 10% more than p9 > >> p18: 1357.6 > >> > >> I am not sure if the p10 increase is related to the hash change or > >> something else. > >> > > I didn't review p10 in detail yet but I did run some tests and (with gcc > and CFLAGS="-O2 -fno-omit-frame-pointer -fno-common -g") I see no > significant difference between p9 and p10 when running Han's scale test > profile. > > Then I also tried the same thing using the 500 nodes + 50 LSP/node perf > test we already have in the repo: > > https://github.com/ovn-org/ovn/blob/5ef2a08ade01d698f84e197987ea679d8978d2b9/tests/perf-northd.at#L185 > > Again, I didn't see any significant change between p9 and p10 on my > laptop. I wonder what's different in our setups. 
> > Kind of related, I posted a series to improve the in-tree perf testing > and allow us to also gather recompute stats (build the DB, reset > stopwatches and trigger a recompute, then collect stopwatches): > > https://patchwork.ozlabs.org/project/ovn/list/?series=383727&state=* > > Would it be an idea to try to merge both scenarios you guys used into > something that's defined as a new test in-tree? Like that it's easier > for anyone to just run the same test. That sounds good to me. I'll take a look at your patches soon. Thanks for testing them out. Can you also please compare OVN main vs p10 of this series ? Thanks Numan > > Thanks, > Dumitru > > >>> Below are the results with the time taken for the mentioned engine > >>> nodes in msecs for a recompute for some of the individual patches in > >>> the series. > >>> > >>> > >>> The sample OVN DBs have > >>> > >>> -------------------------------- > >>> Resource Total > >>> ------------------------------- > >>> Logical switches 1001 > >>> ---------------------------- > >>> Logical routers 501 > >>> ---------------------------- > >>> Router ports 1501 > >>> ---------------------------- > >>> Switch ports 29501 > >>> ---------------------------- > >>> Load balancers 35253 > >>> ---------------------------- > >>> Load balancer group 1 > >>> ---------------------------- > >>> SB Logical flows 268349 > >>> ---------------------------- > >>> SB DB groups 509 > >>> ---------------------------- > >>> > >>> There is one load balancer group which has all the load balancers and > >>> it is associated with all the logical switches and routers. 
> >>> > >>> Below is the time taken for each engine node in msec > >>> > >>> --------------------------------------------------------------------- > >>> Engine nodes | lb_data | northd | lr_lbnat | lflow | > >>> --------------------------------------------------------------------- > >>> ovn-northd-main | 358 | 2455 | x | 2082 | > >>> --------------------------------------------------------------------- > >>> ovn-northd-p1 | 373 | 2476 | x | 2170 | > >>> --------------------------------------------------------------------- > >>> ovn-northd-p5 | 367 | 2413 | x | 2195 | > >>> --------------------------------------------------------------------- > >>> ovn-northd-p6 | 354 | 688 | 1815 | 2442 | > >>> --------------------------------------------------------------------- > >>> ovn-northd-p7 | 357 | 683 | 1778 | 2806 | > >>> --------------------------------------------------------------------- > >>> ovn-northd-p9 | 352 | 682 | 1781 | 2781 | > >>> --------------------------------------------------------------------- > >>> ovn-northd-p10 | 365 | 838 | 1707 | 2812 | > >>> --------------------------------------------------------------------- > >>> ovn-northd-p13 | 362 | 1066 | 1882 | 2917 | > >>> --------------------------------------------------------------------- > >>> ovn-northd-p15 | 359 | 1046 | 1688 | 2907 | > >>> --------------------------------------------------------------------- > >>> ovn-northd-p18 | 379 | 1066 | 1729 | 2886 | > >>> --------------------------------------------------------------------- > >>> > >>> ovn-northd-p1 means ovn-northd with the patch 1 of this series, > >>> ovn-northd-p2 - patch 2 of this series and so on. > >>> > >>> Patch 6 adds a new engine node lr_lbnat and that's why the northd > >>> engine node time gets reduced from ~2413 msec to 688. > >>> > >>> With the first 10 patches, northd engine over all time increases to > >>> 200msec compared to "main '' and lflow engine node time increases to > >>> around 800 msec. 
> >>> The point of this data is to show that the overall increase is mainly > >>> due to bookkeeping and new engine nodes. I tried to optimise but I > >>> couldn't. > >>> > >>> > >>> IIMO this 20% overall increase should be fine for the following reasons > >>> - The scale tests with ovn-heater show significant decrease in > >>> ovn-northd CPU usage and the decrease in the overall test duration. > >>> - This means there were very few recomputes triggered. > >>> - With the OCP/K8s kube-burner tests, there were 0 recomputes > >>> in the northd and lflow engine nodes. > >>> > >>> > >>> > >>>> > >>>>> > >>>>> Prior to this patch series, most of the changes to northd engine > >>>>> resulted in full recomputation of logical flows. This series > >>>>> aims to improve the performance of ovn-northd by adding the I-P > >>>>> support. > >>>> > >>>> I'd like to clarify "most of the changes" a little. I think we should > >> focus > >>>> on the most impactful changes that happen in a cloud environment. I > >> don't > >>>> think it is realistic to achieve "most of the changes" in I-P because > >> it is > >>>> too complex (even with this series we still handle a very small part of > >> the > >>>> DB schema incrementally), but it may be realistic to handle the most > >>>> impactful changes, which are the most frequent changes in the cloud, > >>>> incrementally. Before this series, we could handle regular VIF changes > >> and > >>>> related address-sets, port-groups (some of) updates incrementally. Those > >>>> are the most common changes related to pod/VM changes in cloud. I > >> believe > >>>> the next goal is for LB changes related to pod/VMs (i.e. the LB > >> backends), > >>>> which I believe is the main goal of this series. Is my understanding > >>>> correct? > >>>> > >>> That is correct. LB change handling is very important in my opinion > >>> when used with ovn-kubernetes because all the k8s services > >>> are mapped to OVN LBs. 
> >>> > >>> > >>>> While reviewing the patches, I'd say sometimes I feel a little lost > >> because > >>>> it is hard to correlate some of the changes with the above goal. > >>> > >>> I understand that. Maybe I've done a bad job in conveying why the > >>> changes are required for LBs. > >>> > >>> Let me try to clarify a bit here. I'll incorporate and add more > >>> details in the patch commits and cover letter in the next series. > >>> > >>> This series mainly targets handling lflow generation for load balancer > >> changes. > >>> As you know we use conntrack in OVN for > >>> - Load balancers (both in logical switch and router) > >>> - ACLs in logical switch > >>> - NAT in load balancers. > >>> > >>> And this makes it complicated to decouple logical flow generation for > >>> LBs from ACLs and NATs. That is the reason I had to split northd node > >>> data related to load balancers, ACLs and NATs into separate nodes. > >>> > >>> To give an example, we generate certain logical flows for a logical > >>> switch by checking " if (od->has_lb_vip && od->has_stateful_acl)" (See > >>> build_acl_hints()). > >>> For logical routers, a NAT can be a LB VIP too. This becomes > >>> difficult to decouple LB and NAT. We could fall back to recompute > >>> for NAT changes. But we still need to group logical flows related to > >>> LB and NAT. And that's why I added lr_lbnat_data engine node in patch > >>> 6. > >>> > >> > >> Thanks for explaining. However, I wonder if these are related to LB backend > >> updates. I agree that service creation/deletion may also be important to be > >> handled incrementally, but they are less critical (less frequent) than the > >> backend updates which are directly related to pod/vm creation/deletion. > > > > From what I understand, at least for OCP use cases they seem to be critical. > > Also it depends on the use case I suppose. 
> > > > The > >> has_xxx and NAT related handling are more related to the service > >> creation/deletion rather than LB backend updates, right? > > > > I think so. > > > > Numan > > > >> > >> Thanks, > >> Han > >> > >>> I believe > >>>> there is a reason for all the changes but I am not sure if it is > >> directly > >>>> related to the goal of LB backend related I-P. I understand that there > >> are > >>>> other aspects of LB that can be incrementally processed. But I'd > >> recommend > >>>> if changes necessary for this goal can be largely reduced it would help > >> the > >>>> review and we might be able to merge them sooner and add more but less > >>>> impactful I-P later step by step. I guess I will have a clearer picture > >>>> when I finish the rest of the patches, but it would be helpful if you > >> could > >>>> add more guidance in this cover letter. > >>> > >>> I'll definitely add more details in the cover letter. > >>> > >>> Apart from the LB related I-P, this series only adds 2 more additional > >>> patches for I-P. > >>> One is for the NB_Global changes and the other for SB mac binding changes. > >>> ovn-kubernetes periodically writes in the NB_Global.options column > >>> with a timestamp for its internal health checks > >>> and this results in unnecessary recomputes. After running the > >>> kube-burner density cni tests, I figured out the most common > >>> changes done in the NB DB and added the I-P for that. With all these > >>> patches applied, there are no recomputes in both northd and lflow > >>> engine nodes > >>> during the test duration. Not having the I-P for NB Global for > >>> example was resulting in significant ovn-northd CPU usage. > >>> > >>> Thanks again for the reviews. 
> >>> > >>> Numan > >>> > >>>> > >>>> Thanks, > >>>> Han > >>>> > >>>>> In order to add this support, some of the northd engine > >>>>> node data (from struct ovn_datapath) is split and moved over to > >>>>> new engine nodes - mainly related to load balancers, NAT and ACLs. > >>>>> > >>>>> Below are the scale testing results done with these patches applied > >>>>> using ovn-heater. The test ran the scenario - > >>>>> ocp-500-density-heavy.yml [1]. > >>>>> > >>>>> With all the lflow I-P patches applied, the resuts are: > >>>>> > >>>>> > >>>> > >> ------------------------------------------------------------------------------------------------------------------------------------------------------- > >>>>> Min (s) Median (s) 90%ile (s) > >>>> 99%ile (s) Max (s) Mean (s) Total (s) Count > >>>> Failed > >>>>> > >>>> > >> ------------------------------------------------------------------------------------------------------------------------------------------------------- > >>>>> Iteration Total 0.136883 1.129016 1.192001 > >>>> 1.204167 1.212728 0.665017 83.127099 125 > >> 0 > >>>>> Namespace.add_ports 0.005216 0.005736 0.007034 > >>>> 0.015486 0.018978 0.006211 0.776373 125 > >> 0 > >>>>> WorkerNode.bind_port 0.035030 0.046082 0.052469 > >>>> 0.058293 0.060311 0.045973 11.493259 250 > >> 0 > >>>>> WorkerNode.ping_port 0.005057 0.006727 1.047692 > >>>> 1.069253 1.071336 0.266896 66.724094 250 > >> 0 > >>>>> > >>>> > >> ------------------------------------------------------------------------------------------------------------------------------------------------------- > >>>>> > >>>>> The results with the present main are: > >>>>> > >>>>> > >>>> > >> ------------------------------------------------------------------------------------------------------------------------------------------------------- > >>>>> Min (s) Median (s) 90%ile (s) > >>>> 99%ile (s) Max (s) Mean (s) Total (s) Count > >>>> Failed > >>>>> > >>>> > >> 
------------------------------------------------------------------------------------------------------------------------------------------------------- > >>>>> Iteration Total 0.135491 2.223805 3.311270 > >>>> 3.339078 3.345346 1.729172 216.146495 125 > >> 0 > >>>>> Namespace.add_ports 0.005380 0.005744 0.006819 > >>>> 0.018773 0.020800 0.006292 0.786532 125 > >> 0 > >>>>> WorkerNode.bind_port 0.034179 0.046055 0.053488 > >>>> 0.058801 0.071043 0.046117 11.529311 250 > >> 0 > >>>>> WorkerNode.ping_port 0.004956 0.006952 3.086952 > >>>> 3.191743 3.192807 0.791544 197.886026 250 > >> 0 > >>>>> > >>>> > >> ------------------------------------------------------------------------------------------------------------------------------------------------------- > >>>>> > >>>>> [1] - > >>>> > >> https://github.com/ovn-org/ovn-heater/blob/main/test-scenarios/ocp-500-density-heavy.yml > >>>>> > >>>>> v1 -> v2 > >>>>> -------- > >>>>> * Now also maintaing array indexes for ls_lbacls, lr_nat and > >>>>> lr_lb_nat_data tables (similar to ovn_datapaths->array) to > >>>>> make the lookup effecient. The same ovn_datapath->index > >>>>> is reused. > >>>>> > >>>>> * Made some signficant changes to 'struct lflow_ref' in > >> lflow-mgr.c. > >>>>> In v2 we don't use objdep_mgr to maintain the resource to lflow > >>>>> references. Instead we maintain the 'struct lflow' pointer. > >>>>> With this we don't need to maintain additional hmap of lflows. > >>>>> > >>>>> > >>>>> Numan Siddique (18): > >>>>> northd: Refactor the northd change tracking. > >>>>> northd: Track ovn_datapaths in northd engine track data. > >>>>> tests: Add a couple of tests in ovn-northd for I-P. > >>>>> northd: Move router ports SB PB options sync to sync_to_sb_pb node. > >>>>> northd: Add a new engine 'lr-nat' to manage lr NAT data. > >>>>> northd: Add a new engine 'lr-lb-nat-data' to manage routers' lb > >> data. > >>>>> northd: Generate logical router's LB and NAT flows using > >>>>> lr_lbnat_data. 
> >>>>> northd: Don't commit dhcp response flows in the conntrack. > >>>>> northd: Add a new node ls_lbacls. > >>>>> northd: Refactor lflow management into a separate module. > >>>>> northd: Use lflow_ref when adding all logical flows. > >>>>> northd: Move ovn_lb_datapaths from lib to northd module. > >>>>> northd: Handle lb changes in lflow engine. > >>>>> northd: Add lr_lb_nat_data handler for lflow engine node. > >>>>> northd: Add ls_lbacls handler for lflow engine node. > >>>>> northd: Add I-P for NB_Global and SB_Global. > >>>>> northd: Add a noop handler for northd SB mac binding. > >>>>> northd: Add northd change handler for sync_to_sb_lb node. > >>>>> > >>>>> lib/lb.c | 96 - > >>>>> lib/lb.h | 57 - > >>>>> lib/ovn-util.c | 17 +- > >>>>> lib/ovn-util.h | 2 +- > >>>>> lib/stopwatch-names.h | 3 + > >>>>> northd/aging.c | 21 +- > >>>>> northd/automake.mk | 12 +- > >>>>> northd/en-global-config.c | 588 ++++ > >>>>> northd/en-global-config.h | 65 + > >>>>> northd/en-lflow.c | 123 +- > >>>>> northd/en-lflow.h | 8 + > >>>>> northd/en-lr-lb-nat-data.c | 685 ++++ > >>>>> northd/en-lr-lb-nat-data.h | 125 + > >>>>> northd/en-lr-nat.c | 498 +++ > >>>>> northd/en-lr-nat.h | 137 + > >>>>> northd/en-ls-lb-acls.c | 530 +++ > >>>>> northd/en-ls-lb-acls.h | 111 + > >>>>> northd/en-northd.c | 67 +- > >>>>> northd/en-northd.h | 2 +- > >>>>> northd/en-port-group.h | 3 + > >>>>> northd/en-sync-sb.c | 513 ++- > >>>>> northd/inc-proc-northd.c | 79 +- > >>>>> northd/lflow-mgr.c | 1078 ++++++ > >>>>> northd/lflow-mgr.h | 192 ++ > >>>>> northd/northd.c | 6485 > >> ++++++++++++++++-------------------- > >>>>> northd/northd.h | 551 ++- > >>>>> northd/ovn-northd.c | 4 + > >>>>> tests/ovn-northd.at | 858 ++++- > >>>>> 28 files changed, 8827 insertions(+), 4083 deletions(-) > >>>>> create mode 100644 northd/en-global-config.c > >>>>> create mode 100644 northd/en-global-config.h > >>>>> create mode 100644 northd/en-lr-lb-nat-data.c > >>>>> create mode 100644 northd/en-lr-lb-nat-data.h > 
>>>>> create mode 100644 northd/en-lr-nat.c > >>>>> create mode 100644 northd/en-lr-nat.h > >>>>> create mode 100644 northd/en-ls-lb-acls.c > >>>>> create mode 100644 northd/en-ls-lb-acls.h > >>>>> create mode 100644 northd/lflow-mgr.c > >>>>> create mode 100644 northd/lflow-mgr.h > >>>>> > >>>>> -- > >>>>> 2.41.0 > >>>>> > >>>>> _______________________________________________ > >>>>> dev mailing list > >>>>> dev@openvswitch.org > >>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev > >>>> _______________________________________________ > >>>> dev mailing list > >>>> dev@openvswitch.org > >>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev > >> _______________________________________________ > >> dev mailing list > >> dev@openvswitch.org > >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > _______________________________________________ > > dev mailing list > > dev@openvswitch.org > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
On 11/24/23 16:41, Numan Siddique wrote: > On Fri, Nov 24, 2023 at 10:38 AM Dumitru Ceara <dceara@redhat.com> wrote: >> >> On 11/16/23 22:05, Numan Siddique wrote: >>> On Thu, Nov 16, 2023 at 2:54 PM Han Zhou <hzhou@ovn.org> wrote: >>>> >>>> On Wed, Nov 15, 2023 at 7:32 PM Numan Siddique <numans@ovn.org> wrote: >>>>> >>>>> On Wed, Nov 15, 2023 at 2:59 AM Han Zhou <hzhou@ovn.org> wrote: >>>>>> >>>>>> On Thu, Oct 26, 2023 at 11:12 AM <numans@ovn.org> wrote: >>>>>>> >>>>>>> From: Numan Siddique <numans@ovn.org> >>>>>>> >>>>>>> This patch series adds incremental processing in the lflow engine >>>>>>> node to handle changes to northd and other engine nodes. >>>>>>> Changed related to load balancers and NAT are mainly handled in >>>>>>> this patch series. >>>>>>> >>>>>>> This patch series can also be found here - >>>>>> https://github.com/numansiddique/ovn/tree/northd_lflow_ip/v1 >>>>>> >>>>>> Thanks Numan for the great improvement! >>>>> >>>>> Hi Han, >>>>> >>>>> Thanks for the review comments. I understand it is hard to review >>>>> somany patches, especially related to I-P. >>>>> I appreciate the time spent on it and the feedback. >>>>> >>>>>> >>>>>> I spent some time these days to review the series and now at patch 10. I >>>>>> need to take a break and so I just sent the comments I had so far. >>>>>> >>>>>> I also did scale testing for northd with >>>>>> https://github.com/hzhou8/ovn-test-script for a 500 chassis, 50 lsp / >>>>>> chassis setup, and noticed that the recompute latency has increased 20% >>>>>> after the series. I think in general it is expected to have some >>>>>> performance penalty for the recompute case because of the new indexes >>>>>> created for I-P. However, the patch 10 "northd: Refactor lflow >>>> management >>>>>> into a separate module." alone introduces a 10% latency increase, which >>>>>> necessitates more investigation. 
>>>>>
>>>>> Before sending out the series I did some testing on recomputes with a
>>>>> large OVN NB DB and SB DB
>>>>> (from a 500 node ovn-heater density heavy run).
>>>>> I'm aware of the increase in recomputes. And I did some more testing
>>>>> today as well.
>>>>>
>>>>> In my testing, I can see that the increase in latency is due to the
>>>>> new engine nodes added (lr_lbnat mainly)
>>>>> and due to housekeeping of the lflow references. I do not see any
>>>>> increase due to the new lflow-mgr.c added in patch 10.
>>>>>
>>>>> I compared patch 9 and patch 10 separately and there is no difference
>>>>> in lflow engine node recompute time between patch 9 and 10.
>>>>>
>>>>
>>>> My results were different. My test profile simulates the ovn-k8s topology
>>>> (central mode, not IC) with 500 nodes, 50 LSP/node, with no LB, and a
>>>> small number of ACLs and PGs.
>>>> (https://github.com/hzhou8/ovn-test-script/blob/main/args.500ch_50lsp_10000pg)
>>>>
>>>> The test results for ovn-northd recompute time are:
>>>> main: 1118.3
>>>> p9:   1129.5
>>>> p10:  1243.4 ==> 10% more than p9
>>>> p18:  1357.6
>>>>
>>>> I am not sure if the p10 increase is related to the hash change or
>>>> something else.
>>>>
>>
>> I didn't review p10 in detail yet but I did run some tests and (with gcc
>> and CFLAGS="-O2 -fno-omit-frame-pointer -fno-common -g") I see no
>> significant difference between p9 and p10 when running Han's scale test
>> profile.
>>
>> Then I also tried the same thing using the 500 nodes + 50 LSP/node perf
>> test we already have in the repo:
>>
>> https://github.com/ovn-org/ovn/blob/5ef2a08ade01d698f84e197987ea679d8978d2b9/tests/perf-northd.at#L185
>>
>> Again, I didn't see any significant change between p9 and p10 on my
>> laptop. I wonder what's different in our setups.
>>
>> Kind of related, I posted a series to improve the in-tree perf testing
>> and allow us to also gather recompute stats (build the DB, reset
>> stopwatches and trigger a recompute, then collect stopwatches):
>>
>> https://patchwork.ozlabs.org/project/ovn/list/?series=383727&state=*
>>
>> Would it be an idea to try to merge both scenarios you guys used into
>> something that's defined as a new test in-tree? Like that it's easier
>> for anyone to just run the same test.
>
> That sounds good to me. I'll take a look at your patches soon.
>
> Thanks for testing them out. Can you also please compare OVN main vs
> p10 of this series?
>

Sorry for the delay in replying; I ran some more tests both with Han's
benchmark and also with the in-tree check-perf 500 node + 50 LSP/node
test. I did this with debug logs disabled and ran them multiple times
to collect some more relevant averages (the results did seem more
stable).

This is what I measured (where "pX" == "patch X of this series" and
values are averages in milliseconds):

-------------------------------------------------------------------------------
| Benchmark | Recompute "main" | Recompute p9 | Recompute p10 | Recompute p16 |
-------------------------------------------------------------------------------
| Han's     | 1552             | 1560         | 1676          | 1704          |
-------------------------------------------------------------------------------
| in-tree   | 1146             | 1250         | 1281          | 1337          |
-------------------------------------------------------------------------------

So, in both cases, I see some increase in the time it takes to run full
recomputes. However, IMO, a hit of 10-15% in the time it takes to
recompute might be acceptable as long as the most common cases that
used to trigger recomputes are now handled incrementally.

Regards,
Dumitru

> Thanks
> Numan
>
>> Thanks,
>> Dumitru
>>
>>>>> Below are the results with the time taken for the mentioned engine
>>>>> nodes in msecs for a recompute for some of the individual patches in
>>>>> the series.
>>>>>
>>>>> The sample OVN DBs have
>>>>>
>>>>> -------------------------------
>>>>> Resource               Total
>>>>> -------------------------------
>>>>> Logical switches       1001
>>>>> -------------------------------
>>>>> Logical routers        501
>>>>> -------------------------------
>>>>> Router ports           1501
>>>>> -------------------------------
>>>>> Switch ports           29501
>>>>> -------------------------------
>>>>> Load balancers         35253
>>>>> -------------------------------
>>>>> Load balancer groups   1
>>>>> -------------------------------
>>>>> SB Logical flows       268349
>>>>> -------------------------------
>>>>> SB DB groups           509
>>>>> -------------------------------
>>>>>
>>>>> There is one load balancer group which has all the load balancers and
>>>>> it is associated with all the logical switches and routers.
>>>>>
>>>>> Below is the time taken for each engine node in msec:
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> Engine nodes     | lb_data | northd | lr_lbnat | lflow |
>>>>> ---------------------------------------------------------------------
>>>>> ovn-northd-main  |   358   |  2455  |    x     |  2082 |
>>>>> ---------------------------------------------------------------------
>>>>> ovn-northd-p1    |   373   |  2476  |    x     |  2170 |
>>>>> ---------------------------------------------------------------------
>>>>> ovn-northd-p5    |   367   |  2413  |    x     |  2195 |
>>>>> ---------------------------------------------------------------------
>>>>> ovn-northd-p6    |   354   |   688  |   1815   |  2442 |
>>>>> ---------------------------------------------------------------------
>>>>> ovn-northd-p7    |   357   |   683  |   1778   |  2806 |
>>>>> ---------------------------------------------------------------------
>>>>> ovn-northd-p9    |   352   |   682  |   1781   |  2781 |
>>>>> ---------------------------------------------------------------------
>>>>> ovn-northd-p10   |   365   |   838  |   1707   |  2812 |
>>>>> ---------------------------------------------------------------------
>>>>> ovn-northd-p13   |   362   |  1066  |   1882   |  2917 |
>>>>> ---------------------------------------------------------------------
>>>>> ovn-northd-p15   |   359   |  1046  |   1688   |  2907 |
>>>>> ---------------------------------------------------------------------
>>>>> ovn-northd-p18   |   379   |  1066  |   1729   |  2886 |
>>>>> ---------------------------------------------------------------------
>>>>>
>>>>> ovn-northd-p1 means ovn-northd with patch 1 of this series,
>>>>> ovn-northd-p2 - patch 2 of this series, and so on.
>>>>>
>>>>> Patch 6 adds a new engine node lr_lbnat and that's why the northd
>>>>> engine node time gets reduced from ~2413 msec to 688.
>>>>>
>>>>> With the first 10 patches, northd engine overall time increases by
>>>>> ~200 msec compared to "main" and lflow engine node time increases by
>>>>> around 800 msec.
>>>>> The point of this data is to show that the overall increase is mainly
>>>>> due to bookkeeping and new engine nodes. I tried to optimise but I
>>>>> couldn't.
>>>>>
>>>>> IMO this 20% overall increase should be fine for the following reasons:
>>>>>   - The scale tests with ovn-heater show a significant decrease in
>>>>>     ovn-northd CPU usage and in the overall test duration.
>>>>>   - This means there were very few recomputes triggered.
>>>>>   - With the OCP/K8s kube-burner tests, there were 0 recomputes
>>>>>     in the northd and lflow engine nodes.
>>>>>
>>>>>>
>>>>>>> Prior to this patch series, most of the changes to the northd engine
>>>>>>> resulted in full recomputation of logical flows. This series
>>>>>>> aims to improve the performance of ovn-northd by adding the I-P
>>>>>>> support.
>>>>>>
>>>>>> I'd like to clarify "most of the changes" a little. I think we should
>>>>>> focus on the most impactful changes that happen in a cloud
>>>>>> environment. I don't think it is realistic to achieve "most of the
>>>>>> changes" in I-P because it is too complex (even with this series we
>>>>>> still handle a very small part of the DB schema incrementally), but
>>>>>> it may be realistic to handle the most impactful changes, which are
>>>>>> the most frequent changes in the cloud, incrementally. Before this
>>>>>> series, we could handle regular VIF changes and some of the related
>>>>>> address-set and port-group updates incrementally. Those are the most
>>>>>> common changes related to pod/VM changes in the cloud. I believe the
>>>>>> next goal is for LB changes related to pods/VMs (i.e. the LB
>>>>>> backends), which I believe is the main goal of this series. Is my
>>>>>> understanding correct?
>>>>>>
>>>>> That is correct. LB change handling is very important in my opinion
>>>>> when used with ovn-kubernetes because all the k8s services
>>>>> are mapped to OVN LBs.
>>>>>
>>>>>> While reviewing the patches, I'd say sometimes I feel a little lost
>>>>>> because it is hard to correlate some of the changes with the above
>>>>>> goal.
>>>>>
>>>>> I understand that. Maybe I've done a bad job in conveying why the
>>>>> changes are required for LBs.
>>>>>
>>>>> Let me try to clarify a bit here. I'll incorporate and add more
>>>>> details in the patch commits and cover letter in the next series.
>>>>>
>>>>> This series mainly targets handling lflow generation for load
>>>>> balancer changes. As you know, we use conntrack in OVN for
>>>>>   - Load balancers (both in logical switches and routers)
>>>>>   - ACLs in logical switches
>>>>>   - NAT in logical routers.
>>>>>
>>>>> And this makes it complicated to decouple logical flow generation for
>>>>> LBs from ACLs and NATs. That is the reason I had to split the northd
>>>>> node data related to load balancers, ACLs and NATs into separate
>>>>> nodes.
>>>>>
>>>>> To give an example, we generate certain logical flows for a logical
>>>>> switch by checking "if (od->has_lb_vip && od->has_stateful_acl)" (see
>>>>> build_acl_hints()).
>>>>> For logical routers, a NAT can be an LB VIP too. This makes it
>>>>> difficult to decouple LB from NAT. We could fall back to recompute
>>>>> for NAT changes. But we still need to group logical flows related to
>>>>> LB and NAT. And that's why I added the lr_lbnat_data engine node in
>>>>> patch 6.
>>>>>
>>>>
>>>> Thanks for explaining. However, I wonder if these are related to LB
>>>> backend updates. I agree that service creation/deletion may also be
>>>> important to handle incrementally, but they are less critical (less
>>>> frequent) than the backend updates, which are directly related to
>>>> pod/VM creation/deletion.
>>>
>>> From what I understand, at least for OCP use cases they seem to be
>>> critical. Also it depends on the use case, I suppose.
>>>
>>>> The has_xxx and NAT related handling are more related to service
>>>> creation/deletion rather than LB backend updates, right?
>>>
>>> I think so.
>>>
>>> Numan
>>>
>>>> Thanks,
>>>> Han
>>>>
>>>>>> I believe there is a reason for all the changes but I am not sure if
>>>>>> it is directly related to the goal of LB backend related I-P. I
>>>>>> understand that there are other aspects of LB that can be
>>>>>> incrementally processed. But I'd recommend reducing the changes to
>>>>>> what is necessary for this goal; it would help the review and we
>>>>>> might be able to merge them sooner and add more, but less impactful,
>>>>>> I-P later step by step. I guess I will have a clearer picture when I
>>>>>> finish the rest of the patches, but it would be helpful if you could
>>>>>> add more guidance in this cover letter.
>>>>>
>>>>> I'll definitely add more details in the cover letter.
>>>>>
>>>>> Apart from the LB related I-P, this series only adds two additional
>>>>> patches for I-P.
>>>>> One is for the NB_Global changes and the other for SB mac binding
>>>>> changes. ovn-kubernetes periodically writes to the NB_Global.options
>>>>> column with a timestamp for its internal health checks,
>>>>> and this results in unnecessary recomputes. After running the
>>>>> kube-burner density cni tests, I figured out the most common
>>>>> changes done in the NB DB and added the I-P for those. With all these
>>>>> patches applied, there are no recomputes in either the northd or
>>>>> lflow engine nodes during the test duration. Not having the I-P for
>>>>> NB_Global, for example, was resulting in significant ovn-northd CPU
>>>>> usage.
>>>>>
>>>>> Thanks again for the reviews.
>>>>>
>>>>> Numan
>>>>>
>>>>>> Thanks,
>>>>>> Han
>>>>>>
>>>>>>> [...]
>>
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
From: Numan Siddique <numans@ovn.org>

This patch series adds incremental processing in the lflow engine
node to handle changes to northd and other engine nodes.
Changes related to load balancers and NAT are mainly handled in
this patch series.

This patch series can also be found here -
https://github.com/numansiddique/ovn/tree/northd_lflow_ip/v1

Prior to this patch series, most of the changes to the northd engine
resulted in full recomputation of logical flows. This series
aims to improve the performance of ovn-northd by adding the I-P
support. In order to add this support, some of the northd engine
node data (from struct ovn_datapath) is split and moved over to
new engine nodes - mainly related to load balancers, NAT and ACLs.

Below are the scale testing results done with these patches applied
using ovn-heater. The test ran the scenario -
ocp-500-density-heavy.yml [1].

With all the lflow I-P patches applied, the results are:

-------------------------------------------------------------------------------------------------------------------
                      Min (s)   Median (s)  90%ile (s)  99%ile (s)  Max (s)   Mean (s)  Total (s)   Count  Failed
-------------------------------------------------------------------------------------------------------------------
Iteration Total       0.136883  1.129016    1.192001    1.204167    1.212728  0.665017  83.127099   125    0
Namespace.add_ports   0.005216  0.005736    0.007034    0.015486    0.018978  0.006211  0.776373    125    0
WorkerNode.bind_port  0.035030  0.046082    0.052469    0.058293    0.060311  0.045973  11.493259   250    0
WorkerNode.ping_port  0.005057  0.006727    1.047692    1.069253    1.071336  0.266896  66.724094   250    0
-------------------------------------------------------------------------------------------------------------------

The results with the present main are:
-------------------------------------------------------------------------------------------------------------------
                      Min (s)   Median (s)  90%ile (s)  99%ile (s)  Max (s)   Mean (s)  Total (s)    Count  Failed
-------------------------------------------------------------------------------------------------------------------
Iteration Total       0.135491  2.223805    3.311270    3.339078    3.345346  1.729172  216.146495   125    0
Namespace.add_ports   0.005380  0.005744    0.006819    0.018773    0.020800  0.006292  0.786532     125    0
WorkerNode.bind_port  0.034179  0.046055    0.053488    0.058801    0.071043  0.046117  11.529311    250    0
WorkerNode.ping_port  0.004956  0.006952    3.086952    3.191743    3.192807  0.791544  197.886026   250    0
-------------------------------------------------------------------------------------------------------------------

[1] - https://github.com/ovn-org/ovn-heater/blob/main/test-scenarios/ocp-500-density-heavy.yml

v1 -> v2
--------
  * Now also maintaining array indexes for ls_lbacls, lr_nat and
    lr_lb_nat_data tables (similar to ovn_datapaths->array) to
    make the lookup efficient. The same ovn_datapath->index
    is reused.

  * Made some significant changes to 'struct lflow_ref' in lflow-mgr.c.
    In v2 we don't use objdep_mgr to maintain the resource to lflow
    references. Instead we maintain the 'struct lflow' pointer.
    With this we don't need to maintain an additional hmap of lflows.

Numan Siddique (18):
  northd: Refactor the northd change tracking.
  northd: Track ovn_datapaths in northd engine track data.
  tests: Add a couple of tests in ovn-northd for I-P.
  northd: Move router ports SB PB options sync to sync_to_sb_pb node.
  northd: Add a new engine 'lr-nat' to manage lr NAT data.
  northd: Add a new engine 'lr-lb-nat-data' to manage routers' lb data.
  northd: Generate logical router's LB and NAT flows using lr_lbnat_data.
  northd: Don't commit dhcp response flows in the conntrack.
  northd: Add a new node ls_lbacls.
  northd: Refactor lflow management into a separate module.
  northd: Use lflow_ref when adding all logical flows.
  northd: Move ovn_lb_datapaths from lib to northd module.
  northd: Handle lb changes in lflow engine.
  northd: Add lr_lb_nat_data handler for lflow engine node.
  northd: Add ls_lbacls handler for lflow engine node.
  northd: Add I-P for NB_Global and SB_Global.
  northd: Add a noop handler for northd SB mac binding.
  northd: Add northd change handler for sync_to_sb_lb node.

 lib/lb.c                   |   96 -
 lib/lb.h                   |   57 -
 lib/ovn-util.c             |   17 +-
 lib/ovn-util.h             |    2 +-
 lib/stopwatch-names.h      |    3 +
 northd/aging.c             |   21 +-
 northd/automake.mk         |   12 +-
 northd/en-global-config.c  |  588 ++++
 northd/en-global-config.h  |   65 +
 northd/en-lflow.c          |  123 +-
 northd/en-lflow.h          |    8 +
 northd/en-lr-lb-nat-data.c |  685 ++++
 northd/en-lr-lb-nat-data.h |  125 +
 northd/en-lr-nat.c         |  498 +++
 northd/en-lr-nat.h         |  137 +
 northd/en-ls-lb-acls.c     |  530 +++
 northd/en-ls-lb-acls.h     |  111 +
 northd/en-northd.c         |   67 +-
 northd/en-northd.h         |    2 +-
 northd/en-port-group.h     |    3 +
 northd/en-sync-sb.c        |  513 ++-
 northd/inc-proc-northd.c   |   79 +-
 northd/lflow-mgr.c         | 1078 ++++++
 northd/lflow-mgr.h         |  192 ++
 northd/northd.c            | 6485 ++++++++++++++++--------------------
 northd/northd.h            |  551 ++-
 northd/ovn-northd.c        |    4 +
 tests/ovn-northd.at        |  858 ++++-
 28 files changed, 8827 insertions(+), 4083 deletions(-)
 create mode 100644 northd/en-global-config.c
 create mode 100644 northd/en-global-config.h
 create mode 100644 northd/en-lr-lb-nat-data.c
 create mode 100644 northd/en-lr-lb-nat-data.h
 create mode 100644 northd/en-lr-nat.c
 create mode 100644 northd/en-lr-nat.h
 create mode 100644 northd/en-ls-lb-acls.c
 create mode 100644 northd/en-ls-lb-acls.h
 create mode 100644 northd/lflow-mgr.c
 create mode 100644 northd/lflow-mgr.h