[ovs-dev] socket: Increase listen backlog to 128 everywhere.

Message ID 20240531194008.3171150-1-haleyb.dev@gmail.com
State Deferred
Delegated to: Ilya Maximets
Series [ovs-dev] socket: Increase listen backlog to 128 everywhere.

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/intel-ovs-compilation success test: success

Commit Message

Brian Haley May 31, 2024, 7:40 p.m. UTC
An earlier patch [1] increased the size of the listen
backlog to 64. While that was a huge improvement over
10, further testing in large deployments showed 128
was even better.

Looking at 'ss -lmt' output over more than one week for
port 6641, captured across three different controllers,
the average was:

    listen(s, 10) : 1213 drops/day
    listen(s, 64) : 791 drops/day
    listen(s, 128): 657 drops/day

Looking at 'netstat -s | egrep -i 'listen|drop|SYN_RECV''
output over one week for port 6641, again captured across
three different controllers, the average was:

    listen(s, 10) : 741 drops/day
    listen(s, 64) : 200 drops/day
    listen(s, 128): 22 drops/day

While having this value configurable would be the
best solution, changing to 128 is a quick fix that
should be good for all deployments. A link to the
original discussion is at [2].

[1] https://github.com/openvswitch/ovs/commit/2b7efee031c3a2205ad2ee999275893edd083c1c
[2] https://github.com/openvswitch/ovs/commit/2b7efee031c3a2205ad2ee999275893edd083c1c

Signed-off-by: Brian Haley <haleyb.dev@gmail.com>
---
 lib/socket-util.c    | 2 +-
 python/ovs/stream.py | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
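
For context: the backlog argument to listen() caps how many established
but not-yet-accepted connections the kernel queues for a socket; once that
queue is full, further connection attempts are typically dropped, which is
what the 'ss' and 'netstat' counters above measure. A minimal Python sketch
of the call being changed (the address and port are illustrative):

    import socket

    BACKLOG = 128  # value proposed by this patch; previously 64, originally 10

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("127.0.0.1", 6641))  # 6641 is the OVN NB port discussed here
    sock.listen(BACKLOG)            # the call this patch changes to 128

Note that on Linux the effective value is additionally capped by the
net.core.somaxconn sysctl (historically 128 by default, 4096 since kernel
5.4), so an application backlog above that cap is silently truncated.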

Comments

Simon Horman June 4, 2024, 9:05 a.m. UTC | #1
+ Ihar

On Fri, May 31, 2024 at 03:40:08PM -0400, Brian Haley wrote:
> An earlier patch [1] increased the size of the listen
> backlog to 64. While that was a huge improvement over
> 10, further testing in large deployments showed 128
> was even better.

nit: I would slightly prefer if a commit was referenced like this:

  commit 2b7efee031c3 ("socket: Increase listen backlog to 64 everywhere.")

> Looking at 'ss -lmt' output over more than one week for
> port 6641, captured across three different controllers,
> the average was:
> 
>     listen(s, 10) : 1213 drops/day
>     listen(s, 64) : 791 drops/day
>     listen(s, 128): 657 drops/day
> 
> Looking at 'netstat -s | egrep -i 'listen|drop|SYN_RECV''
> output over one week for port 6641, again captured across
> three different controllers, the average was:
> 
>     listen(s, 10) : 741 drops/day
>     listen(s, 64) : 200 drops/day
>     listen(s, 128): 22 drops/day
> 
> While having this value configurable would be the
> best solution, changing to 128 is a quick fix that
> should be good for all deployments. A link to the
> original discussion is at [2].
> 
> [1] https://github.com/openvswitch/ovs/commit/2b7efee031c3a2205ad2ee999275893edd083c1c
> [2] https://github.com/openvswitch/ovs/commit/2b7efee031c3a2205ad2ee999275893edd083c1c

nit: These two references are the same?

> Signed-off-by: Brian Haley <haleyb.dev@gmail.com>

I'd value input on this from Ihar (CCed) who worked on the cited commit.

> ---
>  lib/socket-util.c    | 2 +-
>  python/ovs/stream.py | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/socket-util.c b/lib/socket-util.c
> index c569b7d16..552310266 100644
> --- a/lib/socket-util.c
> +++ b/lib/socket-util.c
> @@ -769,7 +769,7 @@ inet_open_passive(int style, const char *target, int default_port,
>      }
>  
>      /* Listen. */
> -    if (style == SOCK_STREAM && listen(fd, 64) < 0) {
> +    if (style == SOCK_STREAM && listen(fd, 128) < 0) {
>          error = sock_errno();
>          VLOG_ERR("%s: listen: %s", target, sock_strerror(error));
>          goto error;
> diff --git a/python/ovs/stream.py b/python/ovs/stream.py
> index dbb6b2e1f..874fe0bd5 100644
> --- a/python/ovs/stream.py
> +++ b/python/ovs/stream.py
> @@ -620,7 +620,7 @@ class PassiveStream(object):
>              raise Exception('Unknown connection string')
>  
>          try:
> -            sock.listen(64)
> +            sock.listen(128)
>          except socket.error as e:
>              vlog.err("%s: listen: %s" % (name, os.strerror(e.error)))
>              sock.close()
> -- 
> 2.34.1
> 
Alin Gabriel Serdean June 4, 2024, 9:54 a.m. UTC | #2
Does it make sense to make this configurable via automake in order to avoid
future patches if we need to bump the value further?
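
As a purely hypothetical sketch of such a knob, resolved at run time rather
than at build time (the OVS_LISTEN_BACKLOG variable is invented here for
illustration; neither OVS nor its build system defines it):

    import os
    import socket

    DEFAULT_BACKLOG = 128

    def listen_backlog():
        # OVS_LISTEN_BACKLOG is a hypothetical override, not a real OVS knob.
        try:
            return int(os.environ["OVS_LISTEN_BACKLOG"])
        except (KeyError, ValueError):
            return DEFAULT_BACKLOG

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", 0))  # ephemeral port, illustration only
    sock.listen(listen_backlog())

An automake-time option, as suggested above, would instead bake the chosen
value in at compile time.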

Ilya Maximets June 4, 2024, 10:27 a.m. UTC | #3
On 6/4/24 11:54, Alin Serdean wrote:
> Does it make sense to make this configurable via automake in order to avoid
> future patches if we need to bump the value further?

I'm still not convinced this is an issue worth fixing.

The original discussion is here:
  https://mail.openvswitch.org/pipermail/ovs-discuss/2024-April/053058.html

It's not clear if these reconnections actually cause any harm or are
even helpful in getting to connect to a less busy server in a general
case.

The original number 10 was clearly way too low for any reasonable workload.
But even with that we lived for a very long time with very large clusters
without any issues.

The main problem in the original thread was that a lot of neutron clients
are having leader-only connections to the database for seemingly no reason.
That results in unnecessary mass re-connection on leadership change.
So, I'd prefer this fixed in OpenStack instead.

Best regards, Ilya Maximets.

Brian Haley June 5, 2024, 3:17 p.m. UTC | #4
Hi Ilya,

On 6/4/24 6:27 AM, Ilya Maximets wrote:
> On 6/4/24 11:54, Alin Serdean wrote:
>> Does it make sense to make this configurable via automake in order to avoid
>> future patches if we need to bump the value further?
> 
> I'm still not convinced this is an issue worth fixing.
> 
> The original discussion is here:
>    https://mail.openvswitch.org/pipermail/ovs-discuss/2024-April/053058.html
> 
> It's not clear if these reconnections actually cause any harm or are
> even helpful in getting to connect to a less busy server in a general
> case.

To me, not having to reconnect is a win, especially given the number of 
client threads here. And 128 is still not that much for a server.

> The original number 10 was clearly way too low for any reasonable workload.
> But even with that we lived for a very long time with very large clusters
> without any issues.

It could have also been that no one ever noticed; I know I didn't until 
digging deeper into this.

> The main problem in the original thread was that a lot of neutron clients
> are having leader-only connections to the database for seemingly no reason.
> That results in unnecessary mass re-connection on leadership change.
> So, I'd prefer this fixed in OpenStack instead.

But isn't having many leader-only connections a perfectly valid config 
when most if not all are writers? And if a small increase in the backlog 
helps I don't see any reason not to do it. Fixing this in OpenStack would 
involve one of two options from what I know: 1) use a proxy; 2) just 
connect regardless of leadership status. I'm not sure #2 won't cause 
some other problems, as the secondaries would have to forward things to 
the leader, correct?

The best solution would be to have this configurable if you see that as 
a viable option.

-Brian

Ihar Hrachyshka June 6, 2024, 3:12 p.m. UTC | #5
On Wed, Jun 5, 2024 at 11:17 AM Brian Haley <haleyb.dev@gmail.com> wrote:

> Hi Ilya,
>
> On 6/4/24 6:27 AM, Ilya Maximets wrote:
> > On 6/4/24 11:54, Alin Serdean wrote:
> >> Does it make sense to make this configurable via automake in order to
> avoid
> >> future patches if we need to bump the value further?
> >
> > I'm still not convinced this is an issue worth fixing.
> >
> > The original discussion is here:
> >
> https://mail.openvswitch.org/pipermail/ovs-discuss/2024-April/053058.html
> >
> > It's not clear if these reconnections actually cause any harm or are
> > even helpful in getting to connect to a less busy server in a general
> > case.
>
> To me, not having to reconnect is a win, especially given the number of
> client threads here. And 128 is still not that much for a server.
>
> > The original number 10 was clearly way too low for any reasonable
> workload.
> > But even with that we lived for a very long time with very large clusters
> > without any issues.
>
>> It could have also been that no one ever noticed; I know I didn't until
>> digging deeper into this.
>
> > The main problem in the original thread was that a lot of neutron clients
> > are having leader-only connections to the database for seemingly no
> reason.
>

They write, so AFAIU they need a leader (even if they connect to a
secondary, writes will be proxied), correct?

There may be room for some OpenStack optimizations (e.g. consolidating all
neutron agents into one that would maintain a single connection per
chassis), but ultimately it's still O(n), where n = number of chassis in
the cluster; and we have to have at least 2 connections per chassis (1
ovn-controller and at least 1 neutron agent, for metadata). But I would not
overestimate the number of settings where the number of connections from
neutron agents can be reduced. This is only an option where multiple
neutron agents are running on the same chassis. Usually, we run
ovn-controller and ovn-metadata-agent. We may also run ovn-agent (for qos)
or bgp-agent (for BGP), but these are not deployed in all environments / on
all nodes.

Apart from that, deploying a proxy per node would make it possible to
coalesce the number of connections to 1 per chassis. One can go further and
build a tree of proxies to keep the number of connections to the leader at
bay (is it O(log n)?). This is not a trivial change to the architecture.


> > That results in unnecessary mass re-connection on leadership change.
> > So, I'd prefer this fixed in OpenStack instead.
>
> But isn't having many leader-only connections a perfectly valid config
> when most if not all are writers? And if a small increase in the backlog
> helps I don't see any reason not to do it. Fixing this in OpenStack would
> involve one of two options from what I know: 1) use a proxy; 2) just
> connect regardless of leadership status. I'm not sure #2 won't cause
> some other problems, as the secondaries would have to forward things to
> the leader, correct?
>
> The best solution would be to have this configurable if you see that as
> a viable option.
>
> -Brian
>
> > Best regards, Ilya Maximets.
> >
> >>
> >> On Tue, Jun 4, 2024 at 11:05 AM Simon Horman <horms@ovn.org> wrote:
> >>
> >>> + Ihar
> >>>
> >>> On Fri, May 31, 2024 at 03:40:08PM -0400, Brian Haley wrote:
> >>>> An earlier patch [1] increased the size of the listen
> >>>> backlog to 64. While that was a huge improvement over
> >>>> 10, further testing in large deployments showed 128
> >>>> was even better.
> >>>
>

What I lack in this commit message is a definition of "better". I see
stats of drops using ss and netstat, but it's not clear what the
user-visible effect of this is. Do you have a bug report to link to, with
some symptoms described?


> >>> nit: I would slightly prefer if a commit was referenced like this:
> >>>
> >>>    commit 2b7efee031c3 ("socket: Increase listen backlog to 64
> everywhere.")
> >>>
> >>>> Looking at 'ss -lmt' output over more than one week for
> >>>> port 6641, captured across three different controllers,
> >>>> the average was:
> >>>>
> >>>>      listen(s, 10) : 1213 drops/day
> >>>>      listen(s, 64) : 791 drops/day
> >>>>      listen(s, 128): 657 drops/day
> >>>>
> >>>> Looking at 'netstat -s | egrep -i 'listen|drop|SYN_RECV''
> >>>> output over one week for port 6641, again captured across
> >>>> three different controllers, the average was:
> >>>>
> >>>>      listen(s, 10) : 741 drops/day
> >>>>      listen(s, 64) : 200 drops/day
> >>>>      listen(s, 128): 22 drops/day
>

So why not 256? 512? 1024? It's always an issue when a single value is
picked for all environments, isn't it?

The suggestion to have it configurable seems to me to be the way to go.


Ihar Hrachyshka June 6, 2024, 4:02 p.m. UTC | #6
On Thu, Jun 6, 2024 at 11:12 AM Ihar Hrachyshka <ihrachys@redhat.com> wrote:

> There may be room for some OpenStack optimizations (e.g. consolidating all
> neutron agents into one that would maintain a single connection per
> chassis), but ultimately it's still O(n), where n = number of chassis in
> the cluster; and we have to have at least 2 connections per chassis (1
> ovn-controller and at least 1 neutron agent, for metadata). But I would not
> overestimate the number of settings where the number of connections from
> neutron agents can be reduced. This is only an option where multiple
> neutron agents are running on the same chassis. Usually, we run
> ovn-controller and ovn-metadata-agent. We may also run ovn-agent (for qos)
> or bgp-agent (for BGP), but these are not deployed in all environments / on
> all nodes.
>
> Apart from that, deploying a proxy per node would make it possible to
> coalesce the number of connections to 1 per chassis. One can go further and
> build a tree of proxies to keep the number of connections to the leader at
> bay (is it O(log n)?). This is not a trivial change to the architecture.
>

OK disregard this (somewhat): Brian experiences issues with NB, not SB. NB
connections come from neutron-server workers (not agents). In this case,
OpenStack would have to run its own "proxy", per node - for all workers to
share, or an ovsdb proxy.

Regarding the switch from leader-only to any member for NB connections by
workers: is the argument to do it about the fact that the number of
secondaries is a lot lower than the number of neutron-server workers, so
the load on a leader is also lower? (2 or 4 connections from secondaries
vs. N*W connections, where N is number of API controllers and W is the
number of workers per neutron-server)

What would be the drawbacks we should consider? I can imagine the load on
secondary members will increase (but the load on the leader will decrease).
Plus probably some minor lag in request processing because of proxying
through secondaries (for most - but not all - neutron-server workers).
Anything else?


Ilya Maximets Aug. 2, 2024, 11:59 a.m. UTC | #7
On 6/6/24 18:02, Ihar Hrachyshka wrote:
> On Thu, Jun 6, 2024 at 11:12 AM Ihar Hrachyshka <ihrachys@redhat.com> wrote:
> 
>     They write, so AFAIU they need a leader (even if they connect to a
>     secondary, writes will be proxied), correct?

They will be proxied; however, even when you send transactions through
the leader, it will not reply until the transaction is replicated to
the majority of followers.  So, there is not much difference in latency.

Only the "heavy" writers benefit from connecting to a leader, because
if you have multiple clients with constant stream of transactions from
all of them, followers will need to re-try due to prerequisite changes
once the other server commits a different transaction.  But AFAIU since
you have many clients, probably none of them are "heavy" writers.

> OK disregard this (somewhat): Brian experiences issues with NB, not SB. NB connections
> come from neutron-server workers (not agents). In this case, OpenStack would have to
> run its own "proxy", per node - for all workers to share, or an ovsdb proxy.
> 
> Regarding the switch from leader-only to any member for NB connections by workers: is
> the argument to do it about the fact that the number of secondaries is a lot lower than
> the number of neutron-server workers, so the load on a leader is also lower? (2 or 4
> connections from secondaries vs. N*W connections, where N is number of API controllers
> and W is the number of workers per neutron-server)
> 
> What would be the drawbacks we should consider? I can imagine the load on secondary
> members will increase (but the load on the leader will decrease). Plus probably some
> minor lag in request processing because of proxying through secondaries (for most - but
> not all - neutron-server workers). Anything else?

Lag will likely not be noticeable, since, as I mentioned above, the leader waits
for followers to acknowledge every transaction anyway before it is considered
committed.

There is some extra computational cost on a follower if there are many simultaneous
transactions executed through different servers.  Only one of the servers will be
able to commit its transaction; the others will receive a prerequisite mismatch error,
because committing any transaction changes prerequisites.  However, this error is
transient and not returned to the client.  Instead, the servers that received this
error from the leader will re-try their transactions until they succeed.  This takes
some CPU resources.  However, as I said before, this is only a problem if you have
many write-heavy clients sending a constant stream of transactions without any
breathing room.

So, I'd recommend trying this out.
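
As a rough sketch of what trying this out could look like on the client
side, assuming the leader_only flag accepted by recent python-ovs Idl
versions (the schema path and remote below are placeholders):

    import ovs.db.idl

    SCHEMA = "/usr/share/ovn/ovn-nb.ovsschema"  # placeholder path
    REMOTE = "tcp:127.0.0.1:6641"               # placeholder remote

    helper = ovs.db.idl.SchemaHelper(SCHEMA)
    helper.register_all()

    # leader_only=False keeps the client attached to whichever cluster
    # member it reached; writes are proxied to the leader, so mass
    # re-connection on leadership change is avoided.
    idl = ovs.db.idl.Idl(REMOTE, helper, leader_only=False)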

Brian Haley Aug. 8, 2024, 5:55 p.m. UTC | #8
Hi Ilya and others,

Thanks for responding, comments below.

On 8/2/24 7:59 AM, Ilya Maximets wrote:

An update on this since maybe it will make people re-think increasing 
the socket backlog to 128.

After running for 50+ days (now closer to 75) in production with a 128 
backlog, my customer has seen no increase in listen queue overflow or 
"SYN to LISTEN drops" statistics on the OVN units. And the leader 
elections/changes have dropped to about once an hour. At 64 they are 
still seen. To me that shows it can help in larger deployments, even if 
seeing them is more an annoyance than anything (they think it's more).

>>      They write, so AFAIU they need a leader (even if they connect to a
>>      secondary, writes will be proxied), correct?
> 
> They will be proxied; however, even when you send transactions through
> the leader, it will not reply until the transaction is replicated to
> the majority of followers.  So, there is not much difference in latency.
> 
> Only the "heavy" writers benefit from connecting to a leader, because
> if you have multiple clients with constant stream of transactions from
> all of them, followers will need to re-try due to prerequisite changes
> once the other server commits a different transaction.  But AFAIU since
> you have many clients, probably none of them are "heavy" writers.

I can't say I have an exact answer for the "heavy" writers question, 
just that some of our customer clouds are very heavily used, i.e. it's a 
never-ending cycle of building and tearing-down. Think of a CI pipeline 
that's run on every patch, to infinity. It's not "light" and never in 
what I'd call a stable state for very long.

I will say I'm interested in exploring a "leader-only=false" 
deployment; I'll try to loop in some other Canonical people to see what 
they think of that. Ihar had sent a test patch at [0]; not sure if it 
was linked here before.

Thanks,

-Brian

[0] https://review.opendev.org/c/openstack/neutron/+/921461

>>      There may be place for some OpenStack optimizations (e.g. consolidating all
>> neutron agents into one that would maintain a single connection per chassis), but
>> ultimately, it's still O(n) where n = number of chassis in the cluster; and we have
>> to have at least 2 connections per chassis (1 ovn-controller and at least 1 neutron
>> agent, for metadata). But I would not overestimate the amount of settings where the
>> number of connections from neutron agents can be reduced. This is only an option
>> where multiple neutron agents are running on the same chassis. Usually, we run
>> ovn-controller and ovn-metadata-agent. We may also run ovn-agent (for qos) or
>> bgp-agent (for BGP) but these are not deployed in all environments / on all nodes.
>>
>>      Apart from it, deploying a proxy per node will allow to coalesce the number of
>> connections to 1 per chassis. One can go further and build a tree of proxies to keep
>> the number of connections to the leader at bay (is it O(log n)?) This is not a trivial
>> change to architecture.
>>
>>
>> OK disregard this (somewhat): Brian experiences issues with NB, not SB. NB connections
>> come from neutron-server workers (not agents). In this case, OpenStack would have to
>> run its own "proxy", per node - for all workers to share, or an ovsdb proxy.
>>
>> Regarding the switch from leader-only to any member for NB connections by workers: is
>> the argument to do it about the fact that the number of secondaries is a lot lower than
>> the number of neutron-server workers, so the load on a leader is also lower? (2 or 4
>> connections from secondaries vs. N*W connections, where N is number of API controllers
>> and W is the number of workers per neutron-server)
>>
>> What would be the drawbacks we should consider? I can imagine the load on secondary
>> members will increase (but the load on the leader will decrease). Plus probably some
>> minor lag in request processing because of proxying through secondaries (for most - but
>> not all - neutron-server workers). Anything else?
> 
> Lag will likely not be noticeable, since, as I mentioned above, leader waits for
> followers to acknowledge every transaction anyway before it is considered committed.
> 
> There is some extra computational cost on a follower if there are many simultaneous
> transactions executed through different servers.  Only one of the servers will be
> able to commit the transaction, others will receive a prerequisite mismatch error,
> because committing any transaction changes prerequisites.  However, this error is
> transient and not returned to the client.  Instead, the servers that received this
> error from the leader will re-try their transactions until they succeed.  This takes
> some CPU resources.  However, as I said before, this is only a problem if you have
> many write-heavy clients sending constant stream of transactions without any
> breathing room.
> 
> So, I'd recommend to try this out.
> 
>>   
>>
>>       
>>
>>          > That results in unnecessary mass re-connection on leadership change.
>>          > So, I'd prefer this fixed in OpenStack instead.
>>
>>          But isn't having many leader-only connections a perfectly valid config
>>          when most if not all are writers? And if a small increase in the backlog
>>          helps I don't see any reason not to do it. Fixing in Openstack would
>>          involve one of two options from what I know - 1) Use a proxy; 2) Just
>>          connect irregardless of leadership status. I'm not sure #2 won't cause
>>          some other problems as the secondaries would have to forward things to
>>          the leader, correct?
>>
>>          The best solution would be to have this configurable if you see that as
>>          a viable option.
>>
>>          -Brian
>>
>>          > Best regards, Ilya Maximets.
>>          >
>>          >>
>>          >> On Tue, Jun 4, 2024 at 11:05 AM Simon Horman <horms@ovn.org <mailto:horms@ovn.org>> wrote:
>>          >>
>>          >>> + Ihar
>>          >>>
>>          >>> On Fri, May 31, 2024 at 03:40:08PM -0400, Brian Haley wrote:
>>          >>>> An earlier patch [1] increased the size of the listen
>>          >>>> backlog to 64. While that was a huge improvement over
>>          >>>> 10, further testing in large deployments showed 128
>>          >>>> was even better.
>>          >>>
>>
>>
>>      What I lack in this commit message is the definition of "better". I see stats of drops using ss and netstat, but it's not clear what the user's visible effect of this is. Do you have a bug report to link to, with some symptoms described?
>>       
>>
>>          >>> nit: I would slightly prefer if a commit was referenced like this:
>>          >>>
>>          >>>    commit 2b7efee031c3 ("socket: Increase listen backlog to 64 everywhere.")
>>          >>>
>>          >>>> Looking at 'ss -lmt' output over more than one week for
>>          >>>> port 6641, captured across three different controllers,
>>          >>>> the average was:
>>          >>>>
>>          >>>>      listen(s, 10) : 1213 drops/day
>>          >>>>      listen(s, 64) : 791 drops/day
>>          >>>>      listen(s, 128): 657 drops/day
>>          >>>>
>>          >>>> Looking at 'netstat -s | egrep -i 'listen|drop|SYN_RECV''
>>          >>>> output over one week for port 6641, again captured across
>>          >>>> three different controllers, the average was:
>>          >>>>
>>          >>>>      listen(s, 10) : 741 drops/day
>>          >>>>      listen(s, 64) : 200 drops/day
>>          >>>>      listen(s, 128): 22 drops/day
>>
>>
>>      So why not 256? 512? 1024? It's always an issue when a single value is picked for all environments, isn't it.
>>
>>      The suggestion to have it configurable seems to me to be the way to go.
>>       
>>
>>          >>>>
>>          >>>> While having this value configurable would be the
>>          >>>> best solution, changing to 128 is a quick fix that
>>          >>>> should be good for all deployments. A link to the
>>          >>>> original discussion is at [2].
>>          >>>>
>>          >>>> [1]
>>          >>> https://github.com/openvswitch/ovs/commit/2b7efee031c3a2205ad2ee999275893edd083c1c <https://github.com/openvswitch/ovs/commit/2b7efee031c3a2205ad2ee999275893edd083c1c>
>>          >>>> [2]
>>          >>> https://github.com/openvswitch/ovs/commit/2b7efee031c3a2205ad2ee999275893edd083c1c <https://github.com/openvswitch/ovs/commit/2b7efee031c3a2205ad2ee999275893edd083c1c>
>>          >>>
>>          >>> nit: These two references are the same?
>>          >>>
>>          >>>> Signed-off-by: Brian Haley <haleyb.dev@gmail.com <mailto:haleyb.dev@gmail.com>>
>>          >>>
>>          >>> I'd value input on this from Ihar (CCed) who worked on the cited commit.
>>          >>>
>>          >>>> ---
>>          >>>>   lib/socket-util.c    | 2 +-
>>          >>>>   python/ovs/stream.py | 2 +-
>>          >>>>   2 files changed, 2 insertions(+), 2 deletions(-)
>>          >>>>
>>          >>>> diff --git a/lib/socket-util.c b/lib/socket-util.c
>>          >>>> index c569b7d16..552310266 100644
>>          >>>> --- a/lib/socket-util.c
>>          >>>> +++ b/lib/socket-util.c
>>          >>>> @@ -769,7 +769,7 @@ inet_open_passive(int style, const char *target, int
>>          >>> default_port,
>>          >>>>       }
>>          >>>>
>>          >>>>       /* Listen. */
>>          >>>> -    if (style == SOCK_STREAM && listen(fd, 64) < 0) {
>>          >>>> +    if (style == SOCK_STREAM && listen(fd, 128) < 0) {
>>          >>>>           error = sock_errno();
>>          >>>>           VLOG_ERR("%s: listen: %s", target, sock_strerror(error));
>>          >>>>           goto error;
>>          >>>> diff --git a/python/ovs/stream.py b/python/ovs/stream.py
>>          >>>> index dbb6b2e1f..874fe0bd5 100644
>>          >>>> --- a/python/ovs/stream.py
>>          >>>> +++ b/python/ovs/stream.py
>>          >>>> @@ -620,7 +620,7 @@ class PassiveStream(object):
>>          >>>>               raise Exception('Unknown connection string')
>>          >>>>
>>          >>>>           try:
>>          >>>> -            sock.listen(64)
>>          >>>> +            sock.listen(128)
>>          >>>>           except socket.error as e:
>>          >>>>               vlog.err("%s: listen: %s" % (name, os.strerror(e.error)))
>>          >>>>               sock.close()
>>          >>>> --
>>          >>>> 2.34.1
>>          >>>>
Ihar Hrachyshka Aug. 8, 2024, 8:12 p.m. UTC | #9
On Thu, Aug 8, 2024 at 1:55 PM Brian Haley <haleyb.dev@gmail.com> wrote:

> Hi Ilya and others,
>
> Thanks for responding, comments below.
>
> On 8/2/24 7:59 AM, Ilya Maximets wrote:
> > On 6/6/24 18:02, Ihar Hrachyshka wrote:
> >> On Thu, Jun 6, 2024 at 11:12 AM Ihar Hrachyshka <ihrachys@redhat.com> wrote:
> >>
> >>      On Wed, Jun 5, 2024 at 11:17 AM Brian Haley <haleyb.dev@gmail.com> wrote:
> >>
> >>          Hi Ilya,
> >>
> >>          On 6/4/24 6:27 AM, Ilya Maximets wrote:
> >>          > On 6/4/24 11:54, Alin Serdean wrote:
> >>          >> Does it make sense to make this configurable via automake in
> >>          >> order to avoid future patches if we need to bump the value further?
> >>          >
> >>          > I'm still not convinced this is an issue worth fixing.
> >>          >
> >>          > The original discussion is here:
> >>          > https://mail.openvswitch.org/pipermail/ovs-discuss/2024-April/053058.html
> >>          >
> >>          > It's not clear if these reconnections actually cause any harm
> >>          > or are even helpful in getting to connect to a less busy server
> >>          > in a general case.
> >>
> >>          To me, not having to reconnect is a win, especially given the
> >>          number of client threads here. And 128 is still not that much
> >>          for a server.
>
> An update on this that may make people re-think increasing the socket
> backlog to 128.
>
> After running in production with a 128 backlog for 50+ days (now closer
> to 75), my customer has seen no increase in the listen queue overflow
> or "SYN to LISTEN drops" statistics on the OVN units, and leader
> elections/changes have dropped to about once an hour. At 64 they are
> still seen. To me that shows it can help in larger deployments, even if
> the drops are more of an annoyance than anything (the customer thinks
> they're more than that).
>

Thanks for the info.

I think it's very reasonable to want to be able to tune it, especially
because the backlog behaves very differently depending on the kernel.
Linux, OpenBSD, FreeBSD, and Darwin each have their own math for
calculating the effective queue size. E.g. Linux does backlog+1; OpenBSD
uses 3*backlog (but only while memory pressure for mbufs is below 95%);
Darwin uses either 1.5*backlog or just backlog, depending on the
soqlimitcompat sysctl setting; and so on. One size probably won't ever
fit all. (I know that we don't target all of these kernels as supported
platforms, but we do support at least some of them.)
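
As a rough illustration of the math above (a sketch only; real kernels
apply further caps, e.g. net.core.somaxconn on Linux, and the details
shift between versions):

    # Effective accept-queue size for a requested backlog, per the
    # kernel behaviors summarized above.
    def effective_queue(requested, platform):
        if platform == 'linux':        # kernel effectively allows backlog + 1
            return requested + 1
        if platform == 'openbsd':      # 3 * backlog while mbuf pressure < 95%
            return 3 * requested
        if platform == 'darwin':       # 1.5 * backlog, or plain backlog if
            return requested * 3 // 2  # the soqlimitcompat sysctl is set
        return requested

    for p in ('linux', 'openbsd', 'darwin'):
        print(p, effective_queue(128, p))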


>
> >>          > The original number 10 was clearly way too low for any
> >>          > reasonable workload.  But even with that we lived for a very
> >>          > long time with very large clusters without any issues.
> >>
> >>          It could have also been that no one ever noticed; I know I
> >>          didn't until digging deeper into this.
> >>
> >>          > The main problem in the original thread was that a lot of
> >>          > neutron clients are having leader-only connections to the
> >>          > database for seemingly no reason.
> >>
> >>
> >>      They write, so AFAIU they need a leader (even if they connect to a
> >>      secondary, writes will be proxied), correct?
> >
> > They will be proxied; however, if you're sending transactions through
> > the leader, it will not reply until the transaction is replicated to
> > the majority of followers.  So, there is not much difference in latency.
> >
> > Only the "heavy" writers benefit from connecting to a leader, because
> > if you have multiple clients with a constant stream of transactions from
> > all of them, followers will need to re-try due to prerequisite changes
> > once the other server commits a different transaction.  But AFAIU since
> > you have many clients, probably none of them are "heavy" writers.
>
> I can't say I have an exact answer to the "heavy" writers question,
> just that some of our customer clouds are very heavily used, i.e. it's
> a never-ending cycle of building and tearing down. Think of a CI
> pipeline that runs on every patch, to infinity. It's not "light" and is
> never in what I'd call a stable state for very long.
>
> I will say I'm interested in exploring a "leader-only=false"
> deployment; I'll try to loop in some other Canonical people to see what
> they think of that. Ihar had sent a test patch at [0]; I'm not sure if
> it was linked here before.
>

Thanks for the link for reference.

This patch has had only very limited functional testing (in the upstream
gate, where we never run more than one node in an ovsdb cluster, and
internally on a limited number of nodes, e.g. with just 3 controllers). A
perf team that I work with internally has promised to run a set of scale
tests using a patched neutron with this included. I will report back how
it goes once I have data. Obviously, it would help if some other groups
did their own testing and reported back.
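
For reference, at the python-ovs IDL level the behavior being tested
boils down to a single flag. A minimal sketch, assuming the leader_only
keyword available in recent python-ovs (the remotes and schema path
below are hypothetical):

    import ovs.db.idl

    # Hypothetical NB cluster remotes; real deployments list every member.
    remote = 'tcp:10.0.0.1:6641,tcp:10.0.0.2:6641,tcp:10.0.0.3:6641'

    helper = ovs.db.idl.SchemaHelper(
        location='/usr/share/ovn/ovn-nb.ovsschema')
    helper.register_all()

    # leader_only=False: stay connected to whichever cluster member
    # answers, instead of reconnecting until the leader is found;
    # writes are proxied to the leader by the followers.
    idl = ovs.db.idl.Idl(remote, helper, leader_only=False)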


>
> Thanks,
>
> -Brian
>
> [0] https://review.opendev.org/c/openstack/neutron/+/921461
>
> >>      There may be a place for some OpenStack optimizations (e.g.
> >> consolidating all neutron agents into one that would maintain a single
> >> connection per chassis), but ultimately, it's still O(n) where n =
> >> number of chassis in the cluster; and we have to have at least 2
> >> connections per chassis (1 ovn-controller and at least 1 neutron agent,
> >> for metadata). But I would not overestimate the number of settings where
> >> the number of connections from neutron agents can be reduced. This is
> >> only an option where multiple neutron agents are running on the same
> >> chassis. Usually, we run ovn-controller and ovn-metadata-agent. We may
> >> also run ovn-agent (for qos) or bgp-agent (for BGP), but these are not
> >> deployed in all environments / on all nodes.
> >>
> >>      Apart from that, deploying a proxy per node would allow coalescing
> >> the number of connections to 1 per chassis. One can go further and build
> >> a tree of proxies to keep the number of connections to the leader at bay
> >> (is it O(log n)?). This is not a trivial change to the architecture.
> >>
> >>
> >> OK, disregard this (somewhat): Brian experiences issues with NB, not
> >> SB. NB connections come from neutron-server workers (not agents). In
> >> this case, OpenStack would have to run its own "proxy" per node - for
> >> all workers to share - or an ovsdb proxy.
> >>
> >> Regarding the switch from leader-only to any member for NB connections
> >> by workers: is the argument for doing it that the number of secondaries
> >> is a lot lower than the number of neutron-server workers, so the load
> >> on the leader is also lower? (2 or 4 connections from secondaries vs.
> >> N*W connections, where N is the number of API controllers and W is the
> >> number of workers per neutron-server.)
> >>
> >> What would be the drawbacks we should consider? I can imagine the load
> >> on secondary members will increase (but the load on the leader will
> >> decrease). Plus probably some minor lag in request processing because
> >> of proxying through secondaries (for most - but not all -
> >> neutron-server workers). Anything else?
> >
> > Lag will likely not be noticeable, since, as I mentioned above, the
> > leader waits for followers to acknowledge every transaction anyway
> > before it is considered committed.
> >
> > There is some extra computational cost on a follower if there are many
> > simultaneous transactions executed through different servers.  Only one
> > of the servers will be able to commit the transaction; the others will
> > receive a prerequisite mismatch error, because committing any
> > transaction changes the prerequisites.  However, this error is transient
> > and not returned to the client.  Instead, the servers that received this
> > error from the leader will re-try their transactions until they succeed.
> > This takes some CPU resources.  However, as I said before, this is only
> > a problem if you have many write-heavy clients sending a constant stream
> > of transactions without any breathing room.
> >
> > So, I'd recommend trying this out.
> >
> >>
> >>
> >>
> >>
> >>          > That results in unnecessary mass re-connection on leadership
> >>          > change.  So, I'd prefer this fixed in OpenStack instead.
> >>
> >>          But isn't having many leader-only connections a perfectly
> >>          valid config when most if not all are writers? And if a small
> >>          increase in the backlog helps, I don't see any reason not to
> >>          do it. Fixing it in OpenStack would involve one of two options
> >>          from what I know - 1) use a proxy; 2) just connect regardless
> >>          of leadership status. I'm not sure #2 won't cause some other
> >>          problems, as the secondaries would have to forward things to
> >>          the leader, correct?
> >>
> >>          The best solution would be to have this configurable, if you
> >>          see that as a viable option.
> >>
> >>          -Brian
> >>
> >>          > Best regards, Ilya Maximets.
> >>          >
> >>          >>
> >>          >> On Tue, Jun 4, 2024 at 11:05 AM Simon Horman <horms@ovn.org> wrote:
> >>          >>
> >>          >>> + Ihar
> >>          >>>
> >>          >>> On Fri, May 31, 2024 at 03:40:08PM -0400, Brian Haley wrote:
> >>          >>>> An earlier patch [1] increased the size of the listen
> >>          >>>> backlog to 64. While that was a huge improvement over
> >>          >>>> 10, further testing in large deployments showed 128
> >>          >>>> was even better.
> >>          >>>
> >>
> >>
> >>      What I lack in this commit message is the definition of "better".
> >>      I see stats of drops using ss and netstat, but it's not clear what
> >>      the user-visible effect of this is. Do you have a bug report to
> >>      link to, with some symptoms described?
> >>
> >>
> >>          >>> nit: I would slightly prefer if a commit was referenced like this:
> >>          >>>
> >>          >>>    commit 2b7efee031c3 ("socket: Increase listen backlog to 64 everywhere.")
> >>          >>>
> >>          >>>> Looking at 'ss -lmt' output over more than one week for
> >>          >>>> port 6641, captured across three different controllers,
> >>          >>>> the average was:
> >>          >>>>
> >>          >>>>      listen(s, 10) : 1213 drops/day
> >>          >>>>      listen(s, 64) : 791 drops/day
> >>          >>>>      listen(s, 128): 657 drops/day
> >>          >>>>
> >>          >>>> Looking at 'netstat -s | egrep -i 'listen|drop|SYN_RECV''
> >>          >>>> output over one week for port 6641, again captured across
> >>          >>>> three different controllers, the average was:
> >>          >>>>
> >>          >>>>      listen(s, 10) : 741 drops/day
> >>          >>>>      listen(s, 64) : 200 drops/day
> >>          >>>>      listen(s, 128): 22 drops/day
> >>
> >>
> >>      So why not 256? 512? 1024? It's always an issue when a single
> >>      value is picked for all environments, isn't it?
> >>
> >>      The suggestion to have it configurable seems to me to be the way
> >>      to go.
> >>
> >>
> >>          >>>>
> >>          >>>> While having this value configurable would be the
> >>          >>>> best solution, changing to 128 is a quick fix that
> >>          >>>> should be good for all deployments. A link to the
> >>          >>>> original discussion is at [2].
> >>          >>>>
> >>          >>>> [1] https://github.com/openvswitch/ovs/commit/2b7efee031c3a2205ad2ee999275893edd083c1c
> >>          >>>> [2] https://github.com/openvswitch/ovs/commit/2b7efee031c3a2205ad2ee999275893edd083c1c
> >>          >>>
> >>          >>> nit: These two references are the same?
> >>          >>>
> >>          >>>> Signed-off-by: Brian Haley <haleyb.dev@gmail.com>
> >>          >>>
> >>          >>> I'd value input on this from Ihar (CCed) who worked on the cited commit.
> >>          >>>
> >>          >>>> ---
> >>          >>>>   lib/socket-util.c    | 2 +-
> >>          >>>>   python/ovs/stream.py | 2 +-
> >>          >>>>   2 files changed, 2 insertions(+), 2 deletions(-)
> >>          >>>>
> >>          >>>> diff --git a/lib/socket-util.c b/lib/socket-util.c
> >>          >>>> index c569b7d16..552310266 100644
> >>          >>>> --- a/lib/socket-util.c
> >>          >>>> +++ b/lib/socket-util.c
> >>          >>>> @@ -769,7 +769,7 @@ inet_open_passive(int style, const char *target, int default_port,
> >>          >>>>       }
> >>          >>>>
> >>          >>>>       /* Listen. */
> >>          >>>> -    if (style == SOCK_STREAM && listen(fd, 64) < 0) {
> >>          >>>> +    if (style == SOCK_STREAM && listen(fd, 128) < 0) {
> >>          >>>>           error = sock_errno();
> >>          >>>>           VLOG_ERR("%s: listen: %s", target, sock_strerror(error));
> >>          >>>>           goto error;
> >>          >>>> diff --git a/python/ovs/stream.py b/python/ovs/stream.py
> >>          >>>> index dbb6b2e1f..874fe0bd5 100644
> >>          >>>> --- a/python/ovs/stream.py
> >>          >>>> +++ b/python/ovs/stream.py
> >>          >>>> @@ -620,7 +620,7 @@ class PassiveStream(object):
> >>          >>>>               raise Exception('Unknown connection string')
> >>          >>>>
> >>          >>>>           try:
> >>          >>>> -            sock.listen(64)
> >>          >>>> +            sock.listen(128)
> >>          >>>>           except socket.error as e:
> >>          >>>>               vlog.err("%s: listen: %s" % (name, os.strerror(e.error)))
> >>          >>>>               sock.close()
> >>          >>>> --
> >>          >>>> 2.34.1
> >>          >>>>
diff mbox series

Patch

diff --git a/lib/socket-util.c b/lib/socket-util.c
index c569b7d16..552310266 100644
--- a/lib/socket-util.c
+++ b/lib/socket-util.c
@@ -769,7 +769,7 @@  inet_open_passive(int style, const char *target, int default_port,
     }
 
     /* Listen. */
-    if (style == SOCK_STREAM && listen(fd, 64) < 0) {
+    if (style == SOCK_STREAM && listen(fd, 128) < 0) {
         error = sock_errno();
         VLOG_ERR("%s: listen: %s", target, sock_strerror(error));
         goto error;
diff --git a/python/ovs/stream.py b/python/ovs/stream.py
index dbb6b2e1f..874fe0bd5 100644
--- a/python/ovs/stream.py
+++ b/python/ovs/stream.py
@@ -620,7 +620,7 @@  class PassiveStream(object):
             raise Exception('Unknown connection string')
 
         try:
-            sock.listen(64)
+            sock.listen(128)
         except socket.error as e:
             vlog.err("%s: listen: %s" % (name, os.strerror(e.error)))
             sock.close()
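
Several replies above suggest making the backlog configurable rather
than hard-coding another value. A minimal sketch of what that could
look like on the Python side, assuming a hypothetical
OVS_LISTEN_BACKLOG environment variable (not part of this patch):

    import os
    import socket

    # Hypothetical knob: let deployments override the listen backlog
    # without rebuilding, defaulting to the 128 proposed here.
    def listen_backlog(default=128):
        try:
            return int(os.environ.get('OVS_LISTEN_BACKLOG', default))
        except ValueError:
            return default

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(('127.0.0.1', 6641))   # example address/port only
    sock.listen(listen_backlog())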