[ovs-dev,v3,0/4] northd: Optimize preparation of load balancers.

Message ID 20220909213223.824013-1-i.maximets@ovn.org

Message

Ilya Maximets Sept. 9, 2022, 9:32 p.m. UTC
Re-compute of the 'northd' node in ovn-northd may take almost half of the
total processing time when a large number of load balancers is applied
to multiple switches/routers, or when huge load balancer groups are
applied to them.  The latter is a common case for ovn-kubernetes
clusters.

This patch set is the result of profiling ovn-northd in the ovn-heater
density-heavy scenario with 500 fake nodes, which is supposed to resemble
high-scale ovn-kubernetes setups.

There are no functional changes, only mechanical optimizations that
achieve exactly the same result by doing less work.

In total, these patches speed up ovn-northd in the aforementioned
scenario by about 40%.  For example, the average northd poll interval
went down from 19.7 seconds to 10.2 seconds, and the maximum poll
interval went down from 31.7 to 14.9 seconds.


Version 2:
 - Moved LB-specific structures and functions to lib/lb.[c,h].
 - 'ods' array in struct ovn_lb_group split in two: ls and lr.
 - Added missing handling of 'skip_snat' and 'event' options.
 - Minor re-base/re-factor.
 - Added 'Acked-by' from Dumitru to patches 1 and 4.

Version 3:
 - Code to manage struct ovn_lb_group split into separate functions
   in lib/lb.[c,h]:
     * ovn_lb_group_create()
     * ovn_lb_group_destroy()
     * ovn_lb_group_add_ls/lr()
 - Added 'Acked-by' from Dumitru to remaining patches.


Ilya Maximets (4):
  northd: Optimize load balancer lookups for groups.
  northd: Add datapaths to load balancers in bulk.
  northd: Retrieve load balancer options only once.
  northd: Re-use IP sets created for load balancer groups.

 lib/lb.c           | 126 +++++++++++++++++++++++--
 lib/lb.h           |  75 +++++++++++++--
 northd/en-northd.c |   2 +
 northd/northd.c    | 229 ++++++++++++++++++++++++++-------------------
 northd/northd.h    |  16 ++--
 5 files changed, 328 insertions(+), 120 deletions(-)
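
As a rough illustration of the idea behind the third patch ("Retrieve
load balancer options only once"), the sketch below parses options a
single time when a record is built and then reads cached fields inside
the per-datapath loop.  It is not the code from this series: struct kv,
struct lb_sketch and all function names are hypothetical stand-ins, and
the meaning given to the flags is invented for the example.  Only the
option names 'skip_snat' and 'event' come from the changelog above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a key/value options column. */
struct kv {
    const char *key;
    const char *value;
};

/* Linear lookup.  Cheap once, costly when repeated for every datapath. */
static bool
opt_get_bool(const struct kv *opts, size_t n, const char *key, bool def)
{
    for (size_t i = 0; i < n; i++) {
        if (!strcmp(opts[i].key, key)) {
            return !strcmp(opts[i].value, "true");
        }
    }
    return def;
}

/* Hypothetical load balancer record with options cached as plain fields. */
struct lb_sketch {
    bool skip_snat;
    bool event;
};

/* Parse the options once, at the time the record is created. */
static struct lb_sketch
lb_sketch_init(const struct kv *opts, size_t n)
{
    struct lb_sketch lb = {
        .skip_snat = opt_get_bool(opts, n, "skip_snat", false),
        .event = opt_get_bool(opts, n, "event", false),
    };
    return lb;
}

/* Per-datapath loop: reads the cached flag, performs no option lookups. */
static size_t
count_skip_snat(const struct lb_sketch *lb, size_t n_datapaths)
{
    size_t count = 0;

    for (size_t i = 0; i < n_datapaths; i++) {
        if (lb->skip_snat) {
            count++;
        }
    }
    return count;
}
```

With this layout the number of string comparisons depends only on the
number of options, not on the number of datapaths the load balancer is
applied to.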

Comments

Dumitru Ceara Sept. 12, 2022, 10:25 a.m. UTC | #1
I had another look at this version and the patches look good to go to
me; thanks!
Han Zhou Sept. 13, 2022, 5:18 p.m. UTC | #2
Thanks Ilya and Dumitru.
Now that this series is merged to main, shall we discuss/vote on whether
it should be backported? Dumitru has proposed backporting it down to the
LTS branch-22.03.
Although it is not a bug fix, it seems to be important for large-scale
environments with heavy LB usage. Any objections?

Thanks,
Han
Numan Siddique Sept. 15, 2022, 10:09 p.m. UTC | #3
No objections from my side.

Thanks
Numan

Mark Michelson Sept. 16, 2022, 1:54 p.m. UTC | #4
No objections here either. I attempted to start this. I was able to
create the 22.09 and 22.06 backports fairly easily. I will push those
very soon. Applying the patches to branch-22.03 in particular results in
many conflicts, so Ilya, I'll leave it to you to create the backport
patch. Once it's up, I'll give it a review and merge it as well.
