diff mbox series

[ovs-dev] utilities: disable OVSDB inactivity probes for non-daemon ovn-nbctl

Message ID 20230323192520.3903330-1-odivlad@gmail.com
State Changes Requested
Headers show
Series [ovs-dev] utilities: disable OVSDB inactivity probes for non-daemon ovn-nbctl | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/github-robot-_ovn-kubernetes success github build: passed

Commit Message

Vladislav Odintsov March 23, 2023, 7:25 p.m. UTC
For large OVN_Southbound DBs defatult interval of 5000 ms could be not
sufficient.  This patch disables OVSDB inactivity probes for ovn-*ctl running
in non-daemon mode.

Signed-off-by: Vladislav Odintsov <odivlad@gmail.com>
---
 utilities/ovn-dbctl.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Ales Musil March 31, 2023, 9:43 a.m. UTC | #1
On Thu, Mar 23, 2023 at 8:25 PM Vladislav Odintsov <odivlad@gmail.com>
wrote:

> For large OVN_Southbound DBs defatult interval of 5000 ms could be not
> sufficient.  This patch disables OVSDB inactivity probes for ovn-*ctl
> running
> in non-daemon mode.
>
> Signed-off-by: Vladislav Odintsov <odivlad@gmail.com>
> ---
>  utilities/ovn-dbctl.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/utilities/ovn-dbctl.c b/utilities/ovn-dbctl.c
> index 369a6a663..4307a5cae 100644
> --- a/utilities/ovn-dbctl.c
> +++ b/utilities/ovn-dbctl.c
> @@ -208,6 +208,9 @@ ovn_dbctl_main(int argc, char *argv[],
>      if (daemon_mode) {
>          server_loop(dbctl_options, idl, argc, argv_);
>      } else {
> +        /* Disable OVSDB probe interval for non-daemon mode. */
> +        ovsdb_idl_set_probe_interval(idl, 0);
> +
>          struct ctl_command *commands;
>          size_t n_commands;
>          char *error;
> --
> 2.36.1
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
>
Looks good to me, thanks.

Reviewed-by: Ales Musil <amusil@redhat.com>
Dumitru Ceara March 31, 2023, 12:01 p.m. UTC | #2
On 3/31/23 11:43, Ales Musil wrote:
> On Thu, Mar 23, 2023 at 8:25 PM Vladislav Odintsov <odivlad@gmail.com>
> wrote:
> 
>> For large OVN_Southbound DBs defatult interval of 5000 ms could be not
>> sufficient.  This patch disables OVSDB inactivity probes for ovn-*ctl
>> running
>> in non-daemon mode.
>>
>> Signed-off-by: Vladislav Odintsov <odivlad@gmail.com>
>> ---
>>  utilities/ovn-dbctl.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/utilities/ovn-dbctl.c b/utilities/ovn-dbctl.c
>> index 369a6a663..4307a5cae 100644
>> --- a/utilities/ovn-dbctl.c
>> +++ b/utilities/ovn-dbctl.c
>> @@ -208,6 +208,9 @@ ovn_dbctl_main(int argc, char *argv[],
>>      if (daemon_mode) {
>>          server_loop(dbctl_options, idl, argc, argv_);
>>      } else {
>> +        /* Disable OVSDB probe interval for non-daemon mode. */
>> +        ovsdb_idl_set_probe_interval(idl, 0);

I think I'd avoid using the idl function directly and call instead:

set_idl_probe_interval(idl, 0);

Just to keep it aligned with all other uses in OVN.  I can patch that at
apply time if it looks OK to you.

>> +
>>          struct ctl_command *commands;
>>          size_t n_commands;
>>          char *error;
>> --
>> 2.36.1
>>
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
>>
> Looks good to me, thanks.
> 
> Reviewed-by: Ales Musil <amusil@redhat.com>
> 

Vladislav, Ales, I was thinking of backporting this to stable branches
too, what do you think?

Thanks,
Dumitru
Vladislav Odintsov March 31, 2023, 12:46 p.m. UTC | #3
Hi Dumitru,

> On 31 Mar 2023, at 15:01, Dumitru Ceara <dceara@redhat.com> wrote:
> 
> On 3/31/23 11:43, Ales Musil wrote:
>> On Thu, Mar 23, 2023 at 8:25 PM Vladislav Odintsov <odivlad@gmail.com>
>> wrote:
>> 
>>> For large OVN_Southbound DBs defatult interval of 5000 ms could be not
>>> sufficient.  This patch disables OVSDB inactivity probes for ovn-*ctl
>>> running
>>> in non-daemon mode.
>>> 
>>> Signed-off-by: Vladislav Odintsov <odivlad@gmail.com>
>>> ---
>>> utilities/ovn-dbctl.c | 3 +++
>>> 1 file changed, 3 insertions(+)
>>> 
>>> diff --git a/utilities/ovn-dbctl.c b/utilities/ovn-dbctl.c
>>> index 369a6a663..4307a5cae 100644
>>> --- a/utilities/ovn-dbctl.c
>>> +++ b/utilities/ovn-dbctl.c
>>> @@ -208,6 +208,9 @@ ovn_dbctl_main(int argc, char *argv[],
>>>     if (daemon_mode) {
>>>         server_loop(dbctl_options, idl, argc, argv_);
>>>     } else {
>>> +        /* Disable OVSDB probe interval for non-daemon mode. */
>>> +        ovsdb_idl_set_probe_interval(idl, 0);
> 
> I think I'd avoid using the idl function directly and call instead:
> 
> set_idl_probe_interval(idl, 0);
> 
> Just to keep it aligned with all other uses in OVN.  I can patch that at
> apply time if it looks OK to you.

I’ve got no objections here.
Small nit: set_idl_probe_interval function needs also a remote. Like this:

set_idl_probe_interval(idl, db, 0);

Also, please correct typo in commit message: defatult -> default.

> 
>>> +
>>>         struct ctl_command *commands;
>>>         size_t n_commands;
>>>         char *error;
>>> --
>>> 2.36.1
>>> 
>>> _______________________________________________
>>> dev mailing list
>>> dev@openvswitch.org
>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>> 
>>> 
>> Looks good to me, thanks.
>> 
>> Reviewed-by: Ales Musil <amusil@redhat.com>
>> 
> 
> Vladislav, Ales, I was thinking of backporting this to stable branches
> too, what do you think?
> 
> Thanks,
> Dumitru


Regards,
Vladislav Odintsov
Dumitru Ceara March 31, 2023, 12:55 p.m. UTC | #4
On 3/31/23 14:46, Vladislav Odintsov wrote:
> Hi Dumitru,
> 
>> On 31 Mar 2023, at 15:01, Dumitru Ceara <dceara@redhat.com> wrote:
>>
>> On 3/31/23 11:43, Ales Musil wrote:
>>> On Thu, Mar 23, 2023 at 8:25 PM Vladislav Odintsov <odivlad@gmail.com>
>>> wrote:
>>>
>>>> For large OVN_Southbound DBs defatult interval of 5000 ms could be not
>>>> sufficient.  This patch disables OVSDB inactivity probes for ovn-*ctl
>>>> running
>>>> in non-daemon mode.
>>>>
>>>> Signed-off-by: Vladislav Odintsov <odivlad@gmail.com>
>>>> ---
>>>> utilities/ovn-dbctl.c | 3 +++
>>>> 1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/utilities/ovn-dbctl.c b/utilities/ovn-dbctl.c
>>>> index 369a6a663..4307a5cae 100644
>>>> --- a/utilities/ovn-dbctl.c
>>>> +++ b/utilities/ovn-dbctl.c
>>>> @@ -208,6 +208,9 @@ ovn_dbctl_main(int argc, char *argv[],
>>>>     if (daemon_mode) {
>>>>         server_loop(dbctl_options, idl, argc, argv_);
>>>>     } else {
>>>> +        /* Disable OVSDB probe interval for non-daemon mode. */
>>>> +        ovsdb_idl_set_probe_interval(idl, 0);
>>
>> I think I'd avoid using the idl function directly and call instead:
>>
>> set_idl_probe_interval(idl, 0);
>>
>> Just to keep it aligned with all other uses in OVN.  I can patch that at
>> apply time if it looks OK to you.
> 
> I’ve got no objections here.
> Small nit: set_idl_probe_interval function needs also a remote. Like this:
> 
> set_idl_probe_interval(idl, db, 0);
> 
> Also, please correct typo in commit message: defatult -> default.
> 

In light of the ovs-discuss thread [0] is it maybe better to just set
this probe interval to a very high value instead?  That's for the case
when ovn-nbctl/sbctl daemon <-> ovsdb-server connection dies because of
for example cable being unplugged somewhere on the way between the two.

[0]
https://mail.openvswitch.org/pipermail/ovs-discuss/2023-March/052324.html

>>
>>>> +
>>>>         struct ctl_command *commands;
>>>>         size_t n_commands;
>>>>         char *error;
>>>> --
>>>> 2.36.1
>>>>
>>>> _______________________________________________
>>>> dev mailing list
>>>> dev@openvswitch.org
>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>>
>>>>
>>> Looks good to me, thanks.
>>>
>>> Reviewed-by: Ales Musil <amusil@redhat.com>
>>>
>>
>> Vladislav, Ales, I was thinking of backporting this to stable branches
>> too, what do you think?
>>
>> Thanks,
>> Dumitru
> 
> 
> Regards,
> Vladislav Odintsov
> 
>
Vladislav Odintsov March 31, 2023, 2:51 p.m. UTC | #5
As I understood from Ilya, in case of one-command run of ovn-sbctl (non-daemon mode), it doesn’t make sense to have client -> server inactivity probes. If cable is unplugged, will just hit tcp session timeout, IIUC.
Please correct me if I’m wrong.

> On 31 Mar 2023, at 15:55, Dumitru Ceara <dceara@redhat.com> wrote:
> 
> On 3/31/23 14:46, Vladislav Odintsov wrote:
>> Hi Dumitru,
>> 
>>> On 31 Mar 2023, at 15:01, Dumitru Ceara <dceara@redhat.com> wrote:
>>> 
>>> On 3/31/23 11:43, Ales Musil wrote:
>>>> On Thu, Mar 23, 2023 at 8:25 PM Vladislav Odintsov <odivlad@gmail.com>
>>>> wrote:
>>>> 
>>>>> For large OVN_Southbound DBs defatult interval of 5000 ms could be not
>>>>> sufficient.  This patch disables OVSDB inactivity probes for ovn-*ctl
>>>>> running
>>>>> in non-daemon mode.
>>>>> 
>>>>> Signed-off-by: Vladislav Odintsov <odivlad@gmail.com>
>>>>> ---
>>>>> utilities/ovn-dbctl.c | 3 +++
>>>>> 1 file changed, 3 insertions(+)
>>>>> 
>>>>> diff --git a/utilities/ovn-dbctl.c b/utilities/ovn-dbctl.c
>>>>> index 369a6a663..4307a5cae 100644
>>>>> --- a/utilities/ovn-dbctl.c
>>>>> +++ b/utilities/ovn-dbctl.c
>>>>> @@ -208,6 +208,9 @@ ovn_dbctl_main(int argc, char *argv[],
>>>>>    if (daemon_mode) {
>>>>>        server_loop(dbctl_options, idl, argc, argv_);
>>>>>    } else {
>>>>> +        /* Disable OVSDB probe interval for non-daemon mode. */
>>>>> +        ovsdb_idl_set_probe_interval(idl, 0);
>>> 
>>> I think I'd avoid using the idl function directly and call instead:
>>> 
>>> set_idl_probe_interval(idl, 0);
>>> 
>>> Just to keep it aligned with all other uses in OVN.  I can patch that at
>>> apply time if it looks OK to you.
>> 
>> I’ve got no objections here.
>> Small nit: set_idl_probe_interval function needs also a remote. Like this:
>> 
>> set_idl_probe_interval(idl, db, 0);
>> 
>> Also, please correct typo in commit message: defatult -> default.
>> 
> 
> In light of the ovs-discuss thread [0] is it maybe better to just set
> this probe interval to a very high value instead?  That's for the case
> when ovn-nbctl/sbctl daemon <-> ovsdb-server connection dies because of
> for example cable being unplugged somewhere on the way between the two.
> 
> [0]
> https://mail.openvswitch.org/pipermail/ovs-discuss/2023-March/052324.html
> 
>>> 
>>>>> +
>>>>>        struct ctl_command *commands;
>>>>>        size_t n_commands;
>>>>>        char *error;
>>>>> --
>>>>> 2.36.1
>>>>> 
>>>>> _______________________________________________
>>>>> dev mailing list
>>>>> dev@openvswitch.org
>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>>> 
>>>>> 
>>>> Looks good to me, thanks.
>>>> 
>>>> Reviewed-by: Ales Musil <amusil@redhat.com>
>>>> 
>>> 
>>> Vladislav, Ales, I was thinking of backporting this to stable branches
>>> too, what do you think?
>>> 
>>> Thanks,
>>> Dumitru
>> 
>> 
>> Regards,
>> Vladislav Odintsov
>> 
>> 
> 
> _______________________________________________
> dev mailing list
> dev@openvswitch.org <mailto:dev@openvswitch.org>
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Regards,
Vladislav Odintsov
Dumitru Ceara March 31, 2023, 3:17 p.m. UTC | #6
On 3/31/23 16:51, Vladislav Odintsov wrote:
> As I understood from Ilya, in case of one-command run of ovn-sbctl (non-daemon mode), it doesn’t make sense to have client -> server inactivity probes. If cable is unplugged, will just hit tcp session timeout, IIUC.
> Please correct me if I’m wrong.
> 

You're right but the default timeouts are quite large for TCP sessions:

With keepalives default:
tcp_keepalive_time - INTEGER
	How often TCP sends out keepalive messages when keepalive is enabled.
	Default: 2hours.

tcp_keepalive_probes - INTEGER
	How many keepalive probes TCP sends out, until it decides that the
	connection is broken. Default value: 9.

tcp_keepalive_intvl - INTEGER
	How frequently the probes are send out. Multiplied by
	tcp_keepalive_probes it is time to kill not responding connection,
	after probes started. Default value: 75sec i.e. connection
	will be aborted after ~11 minutes of retries.

DB clients don't even enable tcp keepalives IIRC.

So then we depend retransmits timing out:
tcp_retries2 - INTEGER
	This value influences the timeout of an alive TCP connection,
	when RTO retransmissions remain unacknowledged.
	Given a value of N, a hypothetical TCP connection following
	exponential backoff with an initial RTO of TCP_RTO_MIN would
	retransmit N times before killing the connection at the (N+1)th RTO.

	The default value of 15 yields a hypothetical timeout of 924.6
	seconds and is a lower bound for the effective timeout.
	TCP will effectively time out at the first RTO which exceeds the
	hypothetical timeout.

	RFC 1122 recommends at least 100 seconds for the timeout,
	which corresponds to a value of at least 8.

Isn't it possible that the connection suddenly goes down in between a
transaction request and its reply but with all TCP segments being
ACKed?

>> On 31 Mar 2023, at 15:55, Dumitru Ceara <dceara@redhat.com> wrote:
>>
>> On 3/31/23 14:46, Vladislav Odintsov wrote:
>>> Hi Dumitru,
>>>
>>>> On 31 Mar 2023, at 15:01, Dumitru Ceara <dceara@redhat.com> wrote:
>>>>
>>>> On 3/31/23 11:43, Ales Musil wrote:
>>>>> On Thu, Mar 23, 2023 at 8:25 PM Vladislav Odintsov <odivlad@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> For large OVN_Southbound DBs defatult interval of 5000 ms could be not
>>>>>> sufficient.  This patch disables OVSDB inactivity probes for ovn-*ctl
>>>>>> running
>>>>>> in non-daemon mode.
>>>>>>
>>>>>> Signed-off-by: Vladislav Odintsov <odivlad@gmail.com>
>>>>>> ---
>>>>>> utilities/ovn-dbctl.c | 3 +++
>>>>>> 1 file changed, 3 insertions(+)
>>>>>>
>>>>>> diff --git a/utilities/ovn-dbctl.c b/utilities/ovn-dbctl.c
>>>>>> index 369a6a663..4307a5cae 100644
>>>>>> --- a/utilities/ovn-dbctl.c
>>>>>> +++ b/utilities/ovn-dbctl.c
>>>>>> @@ -208,6 +208,9 @@ ovn_dbctl_main(int argc, char *argv[],
>>>>>>    if (daemon_mode) {
>>>>>>        server_loop(dbctl_options, idl, argc, argv_);
>>>>>>    } else {
>>>>>> +        /* Disable OVSDB probe interval for non-daemon mode. */
>>>>>> +        ovsdb_idl_set_probe_interval(idl, 0);
>>>>
>>>> I think I'd avoid using the idl function directly and call instead:
>>>>
>>>> set_idl_probe_interval(idl, 0);
>>>>
>>>> Just to keep it aligned with all other uses in OVN.  I can patch that at
>>>> apply time if it looks OK to you.
>>>
>>> I’ve got no objections here.
>>> Small nit: set_idl_probe_interval function needs also a remote. Like this:
>>>
>>> set_idl_probe_interval(idl, db, 0);
>>>
>>> Also, please correct typo in commit message: defatult -> default.
>>>
>>
>> In light of the ovs-discuss thread [0] is it maybe better to just set
>> this probe interval to a very high value instead?  That's for the case
>> when ovn-nbctl/sbctl daemon <-> ovsdb-server connection dies because of
>> for example cable being unplugged somewhere on the way between the two.
>>
>> [0]
>> https://mail.openvswitch.org/pipermail/ovs-discuss/2023-March/052324.html
>>
>>>>
>>>>>> +
>>>>>>        struct ctl_command *commands;
>>>>>>        size_t n_commands;
>>>>>>        char *error;
>>>>>> --
>>>>>> 2.36.1
>>>>>>
>>>>>> _______________________________________________
>>>>>> dev mailing list
>>>>>> dev@openvswitch.org
>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>>>>
>>>>>>
>>>>> Looks good to me, thanks.
>>>>>
>>>>> Reviewed-by: Ales Musil <amusil@redhat.com>
>>>>>
>>>>
>>>> Vladislav, Ales, I was thinking of backporting this to stable branches
>>>> too, what do you think?
>>>>
>>>> Thanks,
>>>> Dumitru
>>>
>>>
>>> Regards,
>>> Vladislav Odintsov
>>>
>>>
>>
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org <mailto:dev@openvswitch.org>
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> 
> 
> Regards,
> Vladislav Odintsov
> 
>
Ilya Maximets April 3, 2023, 10:08 a.m. UTC | #7
On 3/31/23 17:17, Dumitru Ceara wrote:
> On 3/31/23 16:51, Vladislav Odintsov wrote:
>> As I understood from Ilya, in case of one-command run of ovn-sbctl (non-daemon mode), it doesn’t make sense to have client -> server inactivity probes. If cable is unplugged, will just hit tcp session timeout, IIUC.
>> Please correct me if I’m wrong.
>>
> 
> You're right but the default timeouts are quite large for TCP sessions:
> 
> With keepalives default:
> tcp_keepalive_time - INTEGER
> 	How often TCP sends out keepalive messages when keepalive is enabled.
> 	Default: 2hours.
> 
> tcp_keepalive_probes - INTEGER
> 	How many keepalive probes TCP sends out, until it decides that the
> 	connection is broken. Default value: 9.
> 
> tcp_keepalive_intvl - INTEGER
> 	How frequently the probes are send out. Multiplied by
> 	tcp_keepalive_probes it is time to kill not responding connection,
> 	after probes started. Default value: 75sec i.e. connection
> 	will be aborted after ~11 minutes of retries.
> 
> DB clients don't even enable tcp keepalives IIRC.
> 
> So then we depend retransmits timing out:
> tcp_retries2 - INTEGER
> 	This value influences the timeout of an alive TCP connection,
> 	when RTO retransmissions remain unacknowledged.
> 	Given a value of N, a hypothetical TCP connection following
> 	exponential backoff with an initial RTO of TCP_RTO_MIN would
> 	retransmit N times before killing the connection at the (N+1)th RTO.
> 
> 	The default value of 15 yields a hypothetical timeout of 924.6
> 	seconds and is a lower bound for the effective timeout.
> 	TCP will effectively time out at the first RTO which exceeds the
> 	hypothetical timeout.
> 
> 	RFC 1122 recommends at least 100 seconds for the timeout,
> 	which corresponds to a value of at least 8.
> 
> Isn't it possible that the connection suddenly goes down in between a
> transaction request and its reply but with all TCP segments being
> ACKed?

Right.  If the client already sent the request and only waits for a reply,
the retransmission timeout will not have any effect.  An idle TCP session
may stay open practically forever, because it is designed to survive network
disruptions.  There is no such thing as a TCP timeout in a general case.

> 
>>> On 31 Mar 2023, at 15:55, Dumitru Ceara <dceara@redhat.com> wrote:
>>>
>>> On 3/31/23 14:46, Vladislav Odintsov wrote:
>>>> Hi Dumitru,
>>>>
>>>>> On 31 Mar 2023, at 15:01, Dumitru Ceara <dceara@redhat.com> wrote:
>>>>>
>>>>> On 3/31/23 11:43, Ales Musil wrote:
>>>>>> On Thu, Mar 23, 2023 at 8:25 PM Vladislav Odintsov <odivlad@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> For large OVN_Southbound DBs defatult interval of 5000 ms could be not
>>>>>>> sufficient.  This patch disables OVSDB inactivity probes for ovn-*ctl
>>>>>>> running
>>>>>>> in non-daemon mode.
>>>>>>>
>>>>>>> Signed-off-by: Vladislav Odintsov <odivlad@gmail.com>
>>>>>>> ---
>>>>>>> utilities/ovn-dbctl.c | 3 +++
>>>>>>> 1 file changed, 3 insertions(+)
>>>>>>>
>>>>>>> diff --git a/utilities/ovn-dbctl.c b/utilities/ovn-dbctl.c
>>>>>>> index 369a6a663..4307a5cae 100644
>>>>>>> --- a/utilities/ovn-dbctl.c
>>>>>>> +++ b/utilities/ovn-dbctl.c
>>>>>>> @@ -208,6 +208,9 @@ ovn_dbctl_main(int argc, char *argv[],
>>>>>>>    if (daemon_mode) {
>>>>>>>        server_loop(dbctl_options, idl, argc, argv_);
>>>>>>>    } else {
>>>>>>> +        /* Disable OVSDB probe interval for non-daemon mode. */
>>>>>>> +        ovsdb_idl_set_probe_interval(idl, 0);
>>>>>
>>>>> I think I'd avoid using the idl function directly and call instead:
>>>>>
>>>>> set_idl_probe_interval(idl, 0);
>>>>>
>>>>> Just to keep it aligned with all other uses in OVN.  I can patch that at
>>>>> apply time if it looks OK to you.
>>>>
>>>> I’ve got no objections here.
>>>> Small nit: set_idl_probe_interval function needs also a remote. Like this:
>>>>
>>>> set_idl_probe_interval(idl, db, 0);
>>>>
>>>> Also, please correct typo in commit message: defatult -> default.
>>>>
>>>
>>> In light of the ovs-discuss thread [0] is it maybe better to just set
>>> this probe interval to a very high value instead?  That's for the case
>>> when ovn-nbctl/sbctl daemon <-> ovsdb-server connection dies because of
>>> for example cable being unplugged somewhere on the way between the two.
>>>
>>> [0]
>>> https://mail.openvswitch.org/pipermail/ovs-discuss/2023-March/052324.html
>>>
>>>>>
>>>>>>> +
>>>>>>>        struct ctl_command *commands;
>>>>>>>        size_t n_commands;
>>>>>>>        char *error;
>>>>>>> --
>>>>>>> 2.36.1
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> dev mailing list
>>>>>>> dev@openvswitch.org
>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>>>>>
>>>>>>>
>>>>>> Looks good to me, thanks.
>>>>>>
>>>>>> Reviewed-by: Ales Musil <amusil@redhat.com>
>>>>>>
>>>>>
>>>>> Vladislav, Ales, I was thinking of backporting this to stable branches
>>>>> too, what do you think?
>>>>>
>>>>> Thanks,
>>>>> Dumitru
>>>>
>>>>
>>>> Regards,
>>>> Vladislav Odintsov
>>>>
>>>>
>>>
>>> _______________________________________________
>>> dev mailing list
>>> dev@openvswitch.org <mailto:dev@openvswitch.org>
>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
>>
>> Regards,
>> Vladislav Odintsov
>>
>>
> 
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Dumitru Ceara April 13, 2023, 8:55 a.m. UTC | #8
Hi Vladislav,

On 4/3/23 12:08, Ilya Maximets wrote:
> On 3/31/23 17:17, Dumitru Ceara wrote:
>> On 3/31/23 16:51, Vladislav Odintsov wrote:
>>> As I understood from Ilya, in case of one-command run of ovn-sbctl (non-daemon mode), it doesn’t make sense to have client -> server inactivity probes. If cable is unplugged, will just hit tcp session timeout, IIUC.
>>> Please correct me if I’m wrong.
>>>
>>
>> You're right but the default timeouts are quite large for TCP sessions:
>>
>> With keepalives default:
>> tcp_keepalive_time - INTEGER
>> 	How often TCP sends out keepalive messages when keepalive is enabled.
>> 	Default: 2hours.
>>
>> tcp_keepalive_probes - INTEGER
>> 	How many keepalive probes TCP sends out, until it decides that the
>> 	connection is broken. Default value: 9.
>>
>> tcp_keepalive_intvl - INTEGER
>> 	How frequently the probes are send out. Multiplied by
>> 	tcp_keepalive_probes it is time to kill not responding connection,
>> 	after probes started. Default value: 75sec i.e. connection
>> 	will be aborted after ~11 minutes of retries.
>>
>> DB clients don't even enable tcp keepalives IIRC.
>>
>> So then we depend retransmits timing out:
>> tcp_retries2 - INTEGER
>> 	This value influences the timeout of an alive TCP connection,
>> 	when RTO retransmissions remain unacknowledged.
>> 	Given a value of N, a hypothetical TCP connection following
>> 	exponential backoff with an initial RTO of TCP_RTO_MIN would
>> 	retransmit N times before killing the connection at the (N+1)th RTO.
>>
>> 	The default value of 15 yields a hypothetical timeout of 924.6
>> 	seconds and is a lower bound for the effective timeout.
>> 	TCP will effectively time out at the first RTO which exceeds the
>> 	hypothetical timeout.
>>
>> 	RFC 1122 recommends at least 100 seconds for the timeout,
>> 	which corresponds to a value of at least 8.
>>
>> Isn't it possible that the connection suddenly goes down in between a
>> transaction request and its reply but with all TCP segments being
>> ACKed?
> 
> Right.  If the client already sent the request and only waits for a reply,
> the retransmission timeout will not have any effect.  An idle TCP session
> may stay open practically forever, because it is designed to survive network
> disruptions.  There is no such thing as a TCP timeout in a general case.
> 

I was wondering if you have more thoughts on this?  Should we go for the
approach where OVN sets the probe interval to a "reasonably large"
value by default?  For example, using 30s would mean failure after ~60
seconds of inactivity.

Thanks,
Dumitru

>>
>>>> On 31 Mar 2023, at 15:55, Dumitru Ceara <dceara@redhat.com> wrote:
>>>>
>>>> On 3/31/23 14:46, Vladislav Odintsov wrote:
>>>>> Hi Dumitru,
>>>>>
>>>>>> On 31 Mar 2023, at 15:01, Dumitru Ceara <dceara@redhat.com> wrote:
>>>>>>
>>>>>> On 3/31/23 11:43, Ales Musil wrote:
>>>>>>> On Thu, Mar 23, 2023 at 8:25 PM Vladislav Odintsov <odivlad@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> For large OVN_Southbound DBs defatult interval of 5000 ms could be not
>>>>>>>> sufficient.  This patch disables OVSDB inactivity probes for ovn-*ctl
>>>>>>>> running
>>>>>>>> in non-daemon mode.
>>>>>>>>
>>>>>>>> Signed-off-by: Vladislav Odintsov <odivlad@gmail.com>
>>>>>>>> ---
>>>>>>>> utilities/ovn-dbctl.c | 3 +++
>>>>>>>> 1 file changed, 3 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/utilities/ovn-dbctl.c b/utilities/ovn-dbctl.c
>>>>>>>> index 369a6a663..4307a5cae 100644
>>>>>>>> --- a/utilities/ovn-dbctl.c
>>>>>>>> +++ b/utilities/ovn-dbctl.c
>>>>>>>> @@ -208,6 +208,9 @@ ovn_dbctl_main(int argc, char *argv[],
>>>>>>>>    if (daemon_mode) {
>>>>>>>>        server_loop(dbctl_options, idl, argc, argv_);
>>>>>>>>    } else {
>>>>>>>> +        /* Disable OVSDB probe interval for non-daemon mode. */
>>>>>>>> +        ovsdb_idl_set_probe_interval(idl, 0);
>>>>>>
>>>>>> I think I'd avoid using the idl function directly and call instead:
>>>>>>
>>>>>> set_idl_probe_interval(idl, 0);
>>>>>>
>>>>>> Just to keep it aligned with all other uses in OVN.  I can patch that at
>>>>>> apply time if it looks OK to you.
>>>>>
>>>>> I’ve got no objections here.
>>>>> Small nit: set_idl_probe_interval function needs also a remote. Like this:
>>>>>
>>>>> set_idl_probe_interval(idl, db, 0);
>>>>>
>>>>> Also, please correct typo in commit message: defatult -> default.
>>>>>
>>>>
>>>> In light of the ovs-discuss thread [0] is it maybe better to just set
>>>> this probe interval to a very high value instead?  That's for the case
>>>> when ovn-nbctl/sbctl daemon <-> ovsdb-server connection dies because of
>>>> for example cable being unplugged somewhere on the way between the two.
>>>>
>>>> [0]
>>>> https://mail.openvswitch.org/pipermail/ovs-discuss/2023-March/052324.html
>>>>
>>>>>>
>>>>>>>> +
>>>>>>>>        struct ctl_command *commands;
>>>>>>>>        size_t n_commands;
>>>>>>>>        char *error;
>>>>>>>> --
>>>>>>>> 2.36.1
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> dev mailing list
>>>>>>>> dev@openvswitch.org
>>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>>>>>>
>>>>>>>>
>>>>>>> Looks good to me, thanks.
>>>>>>>
>>>>>>> Reviewed-by: Ales Musil <amusil@redhat.com>
>>>>>>>
>>>>>>
>>>>>> Vladislav, Ales, I was thinking of backporting this to stable branches
>>>>>> too, what do you think?
>>>>>>
>>>>>> Thanks,
>>>>>> Dumitru
>>>>>
>>>>>
>>>>> Regards,
>>>>> Vladislav Odintsov
>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> dev mailing list
>>>> dev@openvswitch.org <mailto:dev@openvswitch.org>
>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>
>>>
>>> Regards,
>>> Vladislav Odintsov
>>>
>>>
>>
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Vladislav Odintsov April 14, 2023, 1:37 p.m. UTC | #9
Hi Dumitru,

> On 13 Apr 2023, at 11:55, Dumitru Ceara <dceara@redhat.com> wrote:
> 
> Hi Vladislav,
> 
> On 4/3/23 12:08, Ilya Maximets wrote:
>> On 3/31/23 17:17, Dumitru Ceara wrote:
>>> On 3/31/23 16:51, Vladislav Odintsov wrote:
>>>> As I understood from Ilya, in case of one-command run of ovn-sbctl (non-daemon mode), it doesn’t make sense to have client -> server inactivity probes. If cable is unplugged, will just hit tcp session timeout, IIUC.
>>>> Please correct me if I’m wrong.
>>>> 
>>> 
>>> You're right but the default timeouts are quite large for TCP sessions:
>>> 
>>> With keepalives default:
>>> tcp_keepalive_time - INTEGER
>>> 	How often TCP sends out keepalive messages when keepalive is enabled.
>>> 	Default: 2hours.
>>> 
>>> tcp_keepalive_probes - INTEGER
>>> 	How many keepalive probes TCP sends out, until it decides that the
>>> 	connection is broken. Default value: 9.
>>> 
>>> tcp_keepalive_intvl - INTEGER
>>> 	How frequently the probes are send out. Multiplied by
>>> 	tcp_keepalive_probes it is time to kill not responding connection,
>>> 	after probes started. Default value: 75sec i.e. connection
>>> 	will be aborted after ~11 minutes of retries.
>>> 
>>> DB clients don't even enable tcp keepalives IIRC.
>>> 
>>> So then we depend retransmits timing out:
>>> tcp_retries2 - INTEGER
>>> 	This value influences the timeout of an alive TCP connection,
>>> 	when RTO retransmissions remain unacknowledged.
>>> 	Given a value of N, a hypothetical TCP connection following
>>> 	exponential backoff with an initial RTO of TCP_RTO_MIN would
>>> 	retransmit N times before killing the connection at the (N+1)th RTO.
>>> 
>>> 	The default value of 15 yields a hypothetical timeout of 924.6
>>> 	seconds and is a lower bound for the effective timeout.
>>> 	TCP will effectively time out at the first RTO which exceeds the
>>> 	hypothetical timeout.
>>> 
>>> 	RFC 1122 recommends at least 100 seconds for the timeout,
>>> 	which corresponds to a value of at least 8.
>>> 
>>> Isn't it possible that the connection suddenly goes down in between a
>>> transaction request and its reply but with all TCP segments being
>>> ACKed?
>> 
>> Right.  If the client already sent the request and only waits for a reply,
>> the retransmission timeout will not have any effect.  An idle TCP session
>> may stay open practically forever, because it is designed to survive network
>> disruptions.  There is no such thing as a TCP timeout in a general case.
>> 
> 
> I was wondering if you have more thoughts on this?  Should we go for the
> approach where OVN sets the probe interval to a "reasonably large"
> value by default?  For example, using 30s would mean failure after ~60
> seconds of inactivity.

Actually I’m okay with your proposal to set a reasonable high inactivity probe (120/180 seconds?).
Maybe it should be not only for non-daemon operation of dbctl but also for all modes?

Alternatively, I thought it could be useful to add a new command line option like --inactivity-probe=N for the utilities to allow the user to choose and set interval which is needed in each case.

What do you think about it?

> 
> Thanks,
> Dumitru
> 
>>> 
>>>>> On 31 Mar 2023, at 15:55, Dumitru Ceara <dceara@redhat.com> wrote:
>>>>> 
>>>>> On 3/31/23 14:46, Vladislav Odintsov wrote:
>>>>>> Hi Dumitru,
>>>>>> 
>>>>>>> On 31 Mar 2023, at 15:01, Dumitru Ceara <dceara@redhat.com> wrote:
>>>>>>> 
>>>>>>> On 3/31/23 11:43, Ales Musil wrote:
>>>>>>>> On Thu, Mar 23, 2023 at 8:25 PM Vladislav Odintsov <odivlad@gmail.com>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> For large OVN_Southbound DBs defatult interval of 5000 ms could be not
>>>>>>>>> sufficient.  This patch disables OVSDB inactivity probes for ovn-*ctl
>>>>>>>>> running
>>>>>>>>> in non-daemon mode.
>>>>>>>>> 
>>>>>>>>> Signed-off-by: Vladislav Odintsov <odivlad@gmail.com>
>>>>>>>>> ---
>>>>>>>>> utilities/ovn-dbctl.c | 3 +++
>>>>>>>>> 1 file changed, 3 insertions(+)
>>>>>>>>> 
>>>>>>>>> diff --git a/utilities/ovn-dbctl.c b/utilities/ovn-dbctl.c
>>>>>>>>> index 369a6a663..4307a5cae 100644
>>>>>>>>> --- a/utilities/ovn-dbctl.c
>>>>>>>>> +++ b/utilities/ovn-dbctl.c
>>>>>>>>> @@ -208,6 +208,9 @@ ovn_dbctl_main(int argc, char *argv[],
>>>>>>>>>   if (daemon_mode) {
>>>>>>>>>       server_loop(dbctl_options, idl, argc, argv_);
>>>>>>>>>   } else {
>>>>>>>>> +        /* Disable OVSDB probe interval for non-daemon mode. */
>>>>>>>>> +        ovsdb_idl_set_probe_interval(idl, 0);
>>>>>>> 
>>>>>>> I think I'd avoid using the idl function directly and call instead:
>>>>>>> 
>>>>>>> set_idl_probe_interval(idl, 0);
>>>>>>> 
>>>>>>> Just to keep it aligned with all other uses in OVN.  I can patch that at
>>>>>>> apply time if it looks OK to you.
>>>>>> 
>>>>>> I’ve got no objections here.
>>>>>> Small nit: set_idl_probe_interval function needs also a remote. Like this:
>>>>>> 
>>>>>> set_idl_probe_interval(idl, db, 0);
>>>>>> 
>>>>>> Also, please correct typo in commit message: defatult -> default.
>>>>>> 
>>>>> 
>>>>> In light of the ovs-discuss thread [0] is it maybe better to just set
>>>>> this probe interval to a very high value instead?  That's for the case
>>>>> when ovn-nbctl/sbctl daemon <-> ovsdb-server connection dies because of
>>>>> for example cable being unplugged somewhere on the way between the two.
>>>>> 
>>>>> [0]
>>>>> https://mail.openvswitch.org/pipermail/ovs-discuss/2023-March/052324.html
>>>>> 
>>>>>>> 
>>>>>>>>> +
>>>>>>>>>       struct ctl_command *commands;
>>>>>>>>>       size_t n_commands;
>>>>>>>>>       char *error;
>>>>>>>>> --
>>>>>>>>> 2.36.1
>>>>>>>>> 
>>>>>>>>> _______________________________________________
>>>>>>>>> dev mailing list
>>>>>>>>> dev@openvswitch.org
>>>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> Looks good to me, thanks.
>>>>>>>> 
>>>>>>>> Reviewed-by: Ales Musil <amusil@redhat.com>
>>>>>>>> 
>>>>>>> 
>>>>>>> Vladislav, Ales, I was thinking of backporting this to stable branches
>>>>>>> too, what do you think?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Dumitru
>>>>>> 
>>>>>> 
>>>>>> Regards,
>>>>>> Vladislav Odintsov
>>>>>> 
>>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> dev mailing list
>>>>> dev@openvswitch.org <mailto:dev@openvswitch.org>
>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>> 
>>>> 
>>>> Regards,
>>>> Vladislav Odintsov
>>>> 
>>>> 
>>> 
>>> _______________________________________________
>>> dev mailing list
>>> dev@openvswitch.org
>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>> 
> 


Regards,
Vladislav Odintsov
Dumitru Ceara April 18, 2023, 1:59 p.m. UTC | #10
On 4/14/23 15:37, Vladislav Odintsov wrote:
> Hi Dumitru,
> 
>> On 13 Apr 2023, at 11:55, Dumitru Ceara <dceara@redhat.com> wrote:
>>
>> Hi Vladislav,
>>
>> On 4/3/23 12:08, Ilya Maximets wrote:
>>> On 3/31/23 17:17, Dumitru Ceara wrote:
>>>> On 3/31/23 16:51, Vladislav Odintsov wrote:
>>>>> As I understood from Ilya, in case of one-command run of ovn-sbctl
>>>>> (non-daemon mode), it doesn’t make sense to have client -> server
>>>>> inactivity probes. If cable is unplugged, will just hit tcp session
>>>>> timeout, IIUC.
>>>>> Please correct me if I’m wrong.
>>>>>
>>>>
>>>> You're right but the default timeouts are quite large for TCP sessions:
>>>>
>>>> With keepalives default:
>>>> tcp_keepalive_time - INTEGER
>>>> How often TCP sends out keepalive messages when keepalive is enabled.
>>>> Default: 2hours.
>>>>
>>>> tcp_keepalive_probes - INTEGER
>>>> How many keepalive probes TCP sends out, until it decides that the
>>>> connection is broken. Default value: 9.
>>>>
>>>> tcp_keepalive_intvl - INTEGER
>>>> How frequently the probes are send out. Multiplied by
>>>> tcp_keepalive_probes it is time to kill not responding connection,
>>>> after probes started. Default value: 75sec i.e. connection
>>>> will be aborted after ~11 minutes of retries.
>>>>
>>>> DB clients don't even enable tcp keepalives IIRC.
>>>>
>>>> So then we depend retransmits timing out:
>>>> tcp_retries2 - INTEGER
>>>> This value influences the timeout of an alive TCP connection,
>>>> when RTO retransmissions remain unacknowledged.
>>>> Given a value of N, a hypothetical TCP connection following
>>>> exponential backoff with an initial RTO of TCP_RTO_MIN would
>>>> retransmit N times before killing the connection at the (N+1)th RTO.
>>>>
>>>> The default value of 15 yields a hypothetical timeout of 924.6
>>>> seconds and is a lower bound for the effective timeout.
>>>> TCP will effectively time out at the first RTO which exceeds the
>>>> hypothetical timeout.
>>>>
>>>> RFC 1122 recommends at least 100 seconds for the timeout,
>>>> which corresponds to a value of at least 8.
>>>>
>>>> Isn't it possible that the connection suddenly goes down in between a
>>>> transaction request and its reply but with all TCP segments being
>>>> ACKed?
>>>
>>> Right.  If the client already sent the request and only waits for a
>>> reply,
>>> the retransmission timeout will not have any effect.  An idle TCP session
>>> may stay open practically forever, because it is designed to survive
>>> network
>>> disruptions.  There is no such thing as a TCP timeout in a general case.
>>>
>>
>> I was wondering if you have more thoughts on this?  Should we go for the
>> approach where OVN sets the probe interval to a "reasonably large"
>> value by default?  For example, using 30s would mean failure after ~60
>> seconds of inactivity.
> 
> Actually I’m okay with your proposal to set a reasonable high inactivity
> probe (120/180 seconds?).
> Maybe it should be not only for non-daemon operation of dbctl but also
> for all modes?
> 

It's probably a good idea, yes.

> Alternatively, I thought it could be useful to add a new command line
> option like --inactivity-probe=N for the utilities to allow the user to
> choose and set interval which is needed in each case.
> 
> What do you think about it?
> 

Northd uses NB_Global.options:northd_probe_interval as inactivity
interval.  Can we do the same thing and use as default the high value
until the first successful connection to the NB?

>>
>> Thanks,
>> Dumitru
>>
>>>>
>>>>>> On 31 Mar 2023, at 15:55, Dumitru Ceara <dceara@redhat.com> wrote:
>>>>>>
>>>>>> On 3/31/23 14:46, Vladislav Odintsov wrote:
>>>>>>> Hi Dumitru,
>>>>>>>
>>>>>>>> On 31 Mar 2023, at 15:01, Dumitru Ceara <dceara@redhat.com> wrote:
>>>>>>>>
>>>>>>>> On 3/31/23 11:43, Ales Musil wrote:
>>>>>>>>> On Thu, Mar 23, 2023 at 8:25 PM Vladislav Odintsov
>>>>>>>>> <odivlad@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> For large OVN_Southbound DBs defatult interval of 5000 ms
>>>>>>>>>> could be not
>>>>>>>>>> sufficient.  This patch disables OVSDB inactivity probes for
>>>>>>>>>> ovn-*ctl
>>>>>>>>>> running
>>>>>>>>>> in non-daemon mode.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Vladislav Odintsov <odivlad@gmail.com>
>>>>>>>>>> ---
>>>>>>>>>> utilities/ovn-dbctl.c | 3 +++
>>>>>>>>>> 1 file changed, 3 insertions(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/utilities/ovn-dbctl.c b/utilities/ovn-dbctl.c
>>>>>>>>>> index 369a6a663..4307a5cae 100644
>>>>>>>>>> --- a/utilities/ovn-dbctl.c
>>>>>>>>>> +++ b/utilities/ovn-dbctl.c
>>>>>>>>>> @@ -208,6 +208,9 @@ ovn_dbctl_main(int argc, char *argv[],
>>>>>>>>>>   if (daemon_mode) {
>>>>>>>>>>       server_loop(dbctl_options, idl, argc, argv_);
>>>>>>>>>>   } else {
>>>>>>>>>> +        /* Disable OVSDB probe interval for non-daemon mode. */
>>>>>>>>>> +        ovsdb_idl_set_probe_interval(idl, 0);
>>>>>>>>
>>>>>>>> I think I'd avoid using the idl function directly and call instead:
>>>>>>>>
>>>>>>>> set_idl_probe_interval(idl, 0);
>>>>>>>>
>>>>>>>> Just to keep it aligned with all other uses in OVN.  I can patch
>>>>>>>> that at
>>>>>>>> apply time if it looks OK to you.
>>>>>>>
>>>>>>> I’ve got no objections here.
>>>>>>> Small nit: set_idl_probe_interval function needs also a remote.
>>>>>>> Like this:
>>>>>>>
>>>>>>> set_idl_probe_interval(idl, db, 0);
>>>>>>>
>>>>>>> Also, please correct typo in commit message: defatult -> default.
>>>>>>>
>>>>>>
>>>>>> In light of the ovs-discuss thread [0] is it maybe better to just set
>>>>>> this probe interval to a very high value instead?  That's for the case
>>>>>> when ovn-nbctl/sbctl daemon <-> ovsdb-server connection dies
>>>>>> because of
>>>>>> for example cable being unplugged somewhere on the way between the
>>>>>> two.
>>>>>>
>>>>>> [0]
>>>>>> https://mail.openvswitch.org/pipermail/ovs-discuss/2023-March/052324.html
>>>>>>
>>>>>>>>
>>>>>>>>>> +
>>>>>>>>>>       struct ctl_command *commands;
>>>>>>>>>>       size_t n_commands;
>>>>>>>>>>       char *error;
>>>>>>>>>> --
>>>>>>>>>> 2.36.1
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> dev mailing list
>>>>>>>>>> dev@openvswitch.org
>>>>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> Looks good to me, thanks.
>>>>>>>>>
>>>>>>>>> Reviewed-by: Ales Musil <amusil@redhat.com>
>>>>>>>>>
>>>>>>>>
>>>>>>>> Vladislav, Ales, I was thinking of backporting this to stable
>>>>>>>> branches
>>>>>>>> too, what do you think?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Dumitru
>>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>> Vladislav Odintsov
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> dev mailing list
>>>>>> dev@openvswitch.org <mailto:dev@openvswitch.org>
>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>>>
>>>>>
>>>>> Regards,
>>>>> Vladislav Odintsov
>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> dev mailing list
>>>> dev@openvswitch.org
>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>
>>
> 
> 
> Regards,
> Vladislav Odintsov
>
Vladislav Odintsov April 18, 2023, 3:31 p.m. UTC | #11
> On 18 Apr 2023, at 16:59, Dumitru Ceara <dceara@redhat.com> wrote:
> 
> On 4/14/23 15:37, Vladislav Odintsov wrote:
>> Hi Dumitru,
>> 
>>> On 13 Apr 2023, at 11:55, Dumitru Ceara <dceara@redhat.com> wrote:
>>> 
>>> Hi Vladislav,
>>> 
>>> On 4/3/23 12:08, Ilya Maximets wrote:
>>>> On 3/31/23 17:17, Dumitru Ceara wrote:
>>>>> On 3/31/23 16:51, Vladislav Odintsov wrote:
>>>>>> As I understood from Ilya, in case of one-command run of ovn-sbctl
>>>>>> (non-daemon mode), it doesn’t make sense to have client -> server
>>>>>> inactivity probes. If cable is unplugged, will just hit tcp session
>>>>>> timeout, IIUC.
>>>>>> Please correct me if I’m wrong.
>>>>>> 
>>>>> 
>>>>> You're right but the default timeouts are quite large for TCP sessions:
>>>>> 
>>>>> With keepalives default:
>>>>> tcp_keepalive_time - INTEGER
>>>>> How often TCP sends out keepalive messages when keepalive is enabled.
>>>>> Default: 2hours.
>>>>> 
>>>>> tcp_keepalive_probes - INTEGER
>>>>> How many keepalive probes TCP sends out, until it decides that the
>>>>> connection is broken. Default value: 9.
>>>>> 
>>>>> tcp_keepalive_intvl - INTEGER
>>>>> How frequently the probes are send out. Multiplied by
>>>>> tcp_keepalive_probes it is time to kill not responding connection,
>>>>> after probes started. Default value: 75sec i.e. connection
>>>>> will be aborted after ~11 minutes of retries.
>>>>> 
>>>>> DB clients don't even enable tcp keepalives IIRC.
>>>>> 
>>>>> So then we depend retransmits timing out:
>>>>> tcp_retries2 - INTEGER
>>>>> This value influences the timeout of an alive TCP connection,
>>>>> when RTO retransmissions remain unacknowledged.
>>>>> Given a value of N, a hypothetical TCP connection following
>>>>> exponential backoff with an initial RTO of TCP_RTO_MIN would
>>>>> retransmit N times before killing the connection at the (N+1)th RTO.
>>>>> 
>>>>> The default value of 15 yields a hypothetical timeout of 924.6
>>>>> seconds and is a lower bound for the effective timeout.
>>>>> TCP will effectively time out at the first RTO which exceeds the
>>>>> hypothetical timeout.
>>>>> 
>>>>> RFC 1122 recommends at least 100 seconds for the timeout,
>>>>> which corresponds to a value of at least 8.
>>>>> 
>>>>> Isn't it possible that the connection suddenly goes down in between a
>>>>> transaction request and its reply but with all TCP segments being
>>>>> ACKed?
>>>> 
>>>> Right.  If the client already sent the request and only waits for a
>>>> reply,
>>>> the retransmission timeout will not have any effect.  An idle TCP session
>>>> may stay open practically forever, because it is designed to survive
>>>> network
>>>> disruptions.  There is no such thing as a TCP timeout in a general case.
>>>> 
>>> 
>>> I was wondering if you have more thoughts on this?  Should we go for the
>>> approach where OVN sets the probe interval to a "reasonably large"
>>> value by default?  For example, using 30s would mean failure after ~60
>>> seconds of inactivity.
>> 
>> Actually I’m okay with your proposal to set a reasonable high inactivity
>> probe (120/180 seconds?).
>> Maybe it should be not only for non-daemon operation of dbctl but also
>> for all modes?
>> 
> 
> It's probably a good idea, yes.
> 
>> Alternatively, I thought it could be useful to add a new command line
>> option like --inactivity-probe=N for the utilities to allow the user to
>> choose and set interval which is needed in each case.
>> 
>> What do you think about it?
>> 
> 
> Northd uses NB_Global.options:northd_probe_interval as inactivity
> interval.  Can we do the same thing and use as default the high value
> until the first successful connection to the NB?

Good idea, I’ll dig into this next week, thanks.

> 
>>> 
>>> Thanks,
>>> Dumitru
>>> 
>>>>> 
>>>>>>> On 31 Mar 2023, at 15:55, Dumitru Ceara <dceara@redhat.com> wrote:
>>>>>>> 
>>>>>>> On 3/31/23 14:46, Vladislav Odintsov wrote:
>>>>>>>> Hi Dumitru,
>>>>>>>> 
>>>>>>>>> On 31 Mar 2023, at 15:01, Dumitru Ceara <dceara@redhat.com> wrote:
>>>>>>>>> 
>>>>>>>>> On 3/31/23 11:43, Ales Musil wrote:
>>>>>>>>>> On Thu, Mar 23, 2023 at 8:25 PM Vladislav Odintsov
>>>>>>>>>> <odivlad@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> For large OVN_Southbound DBs defatult interval of 5000 ms
>>>>>>>>>>> could be not
>>>>>>>>>>> sufficient.  This patch disables OVSDB inactivity probes for
>>>>>>>>>>> ovn-*ctl
>>>>>>>>>>> running
>>>>>>>>>>> in non-daemon mode.
>>>>>>>>>>> 
>>>>>>>>>>> Signed-off-by: Vladislav Odintsov <odivlad@gmail.com>
>>>>>>>>>>> ---
>>>>>>>>>>> utilities/ovn-dbctl.c | 3 +++
>>>>>>>>>>> 1 file changed, 3 insertions(+)
>>>>>>>>>>> 
>>>>>>>>>>> diff --git a/utilities/ovn-dbctl.c b/utilities/ovn-dbctl.c
>>>>>>>>>>> index 369a6a663..4307a5cae 100644
>>>>>>>>>>> --- a/utilities/ovn-dbctl.c
>>>>>>>>>>> +++ b/utilities/ovn-dbctl.c
>>>>>>>>>>> @@ -208,6 +208,9 @@ ovn_dbctl_main(int argc, char *argv[],
>>>>>>>>>>>   if (daemon_mode) {
>>>>>>>>>>>       server_loop(dbctl_options, idl, argc, argv_);
>>>>>>>>>>>   } else {
>>>>>>>>>>> +        /* Disable OVSDB probe interval for non-daemon mode. */
>>>>>>>>>>> +        ovsdb_idl_set_probe_interval(idl, 0);
>>>>>>>>> 
>>>>>>>>> I think I'd avoid using the idl function directly and call instead:
>>>>>>>>> 
>>>>>>>>> set_idl_probe_interval(idl, 0);
>>>>>>>>> 
>>>>>>>>> Just to keep it aligned with all other uses in OVN.  I can patch
>>>>>>>>> that at
>>>>>>>>> apply time if it looks OK to you.
>>>>>>>> 
>>>>>>>> I’ve got no objections here.
>>>>>>>> Small nit: set_idl_probe_interval function needs also a remote.
>>>>>>>> Like this:
>>>>>>>> 
>>>>>>>> set_idl_probe_interval(idl, db, 0);
>>>>>>>> 
>>>>>>>> Also, please correct typo in commit message: defatult -> default.
>>>>>>>> 
>>>>>>> 
>>>>>>> In light of the ovs-discuss thread [0] is it maybe better to just set
>>>>>>> this probe interval to a very high value instead?  That's for the case
>>>>>>> when ovn-nbctl/sbctl daemon <-> ovsdb-server connection dies
>>>>>>> because of
>>>>>>> for example cable being unplugged somewhere on the way between the
>>>>>>> two.
>>>>>>> 
>>>>>>> [0]
>>>>>>> https://mail.openvswitch.org/pipermail/ovs-discuss/2023-March/052324.html
>>>>>>> 
>>>>>>>>> 
>>>>>>>>>>> +
>>>>>>>>>>>       struct ctl_command *commands;
>>>>>>>>>>>       size_t n_commands;
>>>>>>>>>>>       char *error;
>>>>>>>>>>> --
>>>>>>>>>>> 2.36.1
>>>>>>>>>>> 
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> dev mailing list
>>>>>>>>>>> dev@openvswitch.org
>>>>>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> Looks good to me, thanks.
>>>>>>>>>> 
>>>>>>>>>> Reviewed-by: Ales Musil <amusil@redhat.com>
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Vladislav, Ales, I was thinking of backporting this to stable
>>>>>>>>> branches
>>>>>>>>> too, what do you think?
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Dumitru
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Regards,
>>>>>>>> Vladislav Odintsov
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> _______________________________________________
>>>>>>> dev mailing list
>>>>>>> dev@openvswitch.org <mailto:dev@openvswitch.org>
>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>>>> 
>>>>>> 
>>>>>> Regards,
>>>>>> Vladislav Odintsov
>>>>>> 
>>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> dev mailing list
>>>>> dev@openvswitch.org
>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>> 
>>> 
>> 
>> 
>> Regards,
>> Vladislav Odintsov


Regards,
Vladislav Odintsov
Vladislav Odintsov May 3, 2023, 12:57 a.m. UTC | #12
Hi Dumitru,

I’m a bit concerned about our last conversation’s idea how to set inactivity probe.
The point is that in general case dbctl module can be used by multiple existing and possible future ovn-*ctl utilities.
So, to configure a basic (we may call it a "startup" idl inactivity probe), which is hardcoded, let’s say to 120000 ms, we should do this inside dbctl.c file.
But if we’re talking about to read the contents of DB and try to find inactivity probe configuration there - such code must be in particular utility: ovn-sbctl/ovn-nbctl/etc, right? And it seems to me that the second configuration attempt of inactivity probe makes sense only in case of daemon-mode.
So it’s needed after initial load of DB contents to read appropriate DB and reconfigure interval.

I’ve posted v2 [1], please take a look on it. Let me know if you have any thoughts/concern.

1: https://patchwork.ozlabs.org/project/ovn/patch/20230503005531.4005891-1-odivlad@gmail.com/

> On 18 Apr 2023, at 18:31, Vladislav Odintsov <odivlad@gmail.com> wrote:
> 
> 
> 
>> On 18 Apr 2023, at 16:59, Dumitru Ceara <dceara@redhat.com> wrote:
>> 
>> On 4/14/23 15:37, Vladislav Odintsov wrote:
>>> Hi Dumitru,
>>> 
>>>> On 13 Apr 2023, at 11:55, Dumitru Ceara <dceara@redhat.com> wrote:
>>>> 
>>>> Hi Vladislav,
>>>> 
>>>> On 4/3/23 12:08, Ilya Maximets wrote:
>>>>> On 3/31/23 17:17, Dumitru Ceara wrote:
>>>>>> On 3/31/23 16:51, Vladislav Odintsov wrote:
>>>>>>> As I understood from Ilya, in case of one-command run of ovn-sbctl
>>>>>>> (non-daemon mode), it doesn’t make sense to have client -> server
>>>>>>> inactivity probes. If cable is unplugged, will just hit tcp session
>>>>>>> timeout, IIUC.
>>>>>>> Please correct me if I’m wrong.
>>>>>>> 
>>>>>> 
>>>>>> You're right but the default timeouts are quite large for TCP sessions:
>>>>>> 
>>>>>> With keepalives default:
>>>>>> tcp_keepalive_time - INTEGER
>>>>>> How often TCP sends out keepalive messages when keepalive is enabled.
>>>>>> Default: 2hours.
>>>>>> 
>>>>>> tcp_keepalive_probes - INTEGER
>>>>>> How many keepalive probes TCP sends out, until it decides that the
>>>>>> connection is broken. Default value: 9.
>>>>>> 
>>>>>> tcp_keepalive_intvl - INTEGER
>>>>>> How frequently the probes are send out. Multiplied by
>>>>>> tcp_keepalive_probes it is time to kill not responding connection,
>>>>>> after probes started. Default value: 75sec i.e. connection
>>>>>> will be aborted after ~11 minutes of retries.
>>>>>> 
>>>>>> DB clients don't even enable tcp keepalives IIRC.
>>>>>> 
>>>>>> So then we depend retransmits timing out:
>>>>>> tcp_retries2 - INTEGER
>>>>>> This value influences the timeout of an alive TCP connection,
>>>>>> when RTO retransmissions remain unacknowledged.
>>>>>> Given a value of N, a hypothetical TCP connection following
>>>>>> exponential backoff with an initial RTO of TCP_RTO_MIN would
>>>>>> retransmit N times before killing the connection at the (N+1)th RTO.
>>>>>> 
>>>>>> The default value of 15 yields a hypothetical timeout of 924.6
>>>>>> seconds and is a lower bound for the effective timeout.
>>>>>> TCP will effectively time out at the first RTO which exceeds the
>>>>>> hypothetical timeout.
>>>>>> 
>>>>>> RFC 1122 recommends at least 100 seconds for the timeout,
>>>>>> which corresponds to a value of at least 8.
>>>>>> 
>>>>>> Isn't it possible that the connection suddenly goes down in between a
>>>>>> transaction request and its reply but with all TCP segments being
>>>>>> ACKed?
>>>>> 
>>>>> Right.  If the client already sent the request and only waits for a
>>>>> reply,
>>>>> the retransmission timeout will not have any effect.  An idle TCP session
>>>>> may stay open practically forever, because it is designed to survive
>>>>> network
>>>>> disruptions.  There is no such thing as a TCP timeout in a general case.
>>>>> 
>>>> 
>>>> I was wondering if you have more thoughts on this?  Should we go for the
>>>> approach where OVN sets the probe interval to a "reasonably large"
>>>> value by default?  For example, using 30s would mean failure after ~60
>>>> seconds of inactivity.
>>> 
>>> Actually I’m okay with your proposal to set a reasonable high inactivity
>>> probe (120/180 seconds?).
>>> Maybe it should be not only for non-daemon operation of dbctl but also
>>> for all modes?
>>> 
>> 
>> It's probably a good idea, yes.
>> 
>>> Alternatively, I thought it could be useful to add a new command line
>>> option like --inactivity-probe=N for the utilities to allow the user to
>>> choose and set interval which is needed in each case.
>>> 
>>> What do you think about it?
>>> 
>> 
>> Northd uses NB_Global.options:northd_probe_interval as inactivity
>> interval.  Can we do the same thing and use as default the high value
>> until the first successful connection to the NB?
> 
> Good idea, I’ll dig into this next week, thanks.
> 
>> 
>>>> 
>>>> Thanks,
>>>> Dumitru
>>>> 
>>>>>> 
>>>>>>>> On 31 Mar 2023, at 15:55, Dumitru Ceara <dceara@redhat.com> wrote:
>>>>>>>> 
>>>>>>>> On 3/31/23 14:46, Vladislav Odintsov wrote:
>>>>>>>>> Hi Dumitru,
>>>>>>>>> 
>>>>>>>>>> On 31 Mar 2023, at 15:01, Dumitru Ceara <dceara@redhat.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> On 3/31/23 11:43, Ales Musil wrote:
>>>>>>>>>>> On Thu, Mar 23, 2023 at 8:25 PM Vladislav Odintsov
>>>>>>>>>>> <odivlad@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> For large OVN_Southbound DBs defatult interval of 5000 ms
>>>>>>>>>>>> could be not
>>>>>>>>>>>> sufficient.  This patch disables OVSDB inactivity probes for
>>>>>>>>>>>> ovn-*ctl
>>>>>>>>>>>> running
>>>>>>>>>>>> in non-daemon mode.
>>>>>>>>>>>> 
>>>>>>>>>>>> Signed-off-by: Vladislav Odintsov <odivlad@gmail.com>
>>>>>>>>>>>> ---
>>>>>>>>>>>> utilities/ovn-dbctl.c | 3 +++
>>>>>>>>>>>> 1 file changed, 3 insertions(+)
>>>>>>>>>>>> 
>>>>>>>>>>>> diff --git a/utilities/ovn-dbctl.c b/utilities/ovn-dbctl.c
>>>>>>>>>>>> index 369a6a663..4307a5cae 100644
>>>>>>>>>>>> --- a/utilities/ovn-dbctl.c
>>>>>>>>>>>> +++ b/utilities/ovn-dbctl.c
>>>>>>>>>>>> @@ -208,6 +208,9 @@ ovn_dbctl_main(int argc, char *argv[],
>>>>>>>>>>>>  if (daemon_mode) {
>>>>>>>>>>>>      server_loop(dbctl_options, idl, argc, argv_);
>>>>>>>>>>>>  } else {
>>>>>>>>>>>> +        /* Disable OVSDB probe interval for non-daemon mode. */
>>>>>>>>>>>> +        ovsdb_idl_set_probe_interval(idl, 0);
>>>>>>>>>> 
>>>>>>>>>> I think I'd avoid using the idl function directly and call instead:
>>>>>>>>>> 
>>>>>>>>>> set_idl_probe_interval(idl, 0);
>>>>>>>>>> 
>>>>>>>>>> Just to keep it aligned with all other uses in OVN.  I can patch
>>>>>>>>>> that at
>>>>>>>>>> apply time if it looks OK to you.
>>>>>>>>> 
>>>>>>>>> I’ve got no objections here.
>>>>>>>>> Small nit: set_idl_probe_interval function needs also a remote.
>>>>>>>>> Like this:
>>>>>>>>> 
>>>>>>>>> set_idl_probe_interval(idl, db, 0);
>>>>>>>>> 
>>>>>>>>> Also, please correct typo in commit message: defatult -> default.
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> In light of the ovs-discuss thread [0] is it maybe better to just set
>>>>>>>> this probe interval to a very high value instead?  That's for the case
>>>>>>>> when ovn-nbctl/sbctl daemon <-> ovsdb-server connection dies
>>>>>>>> because of
>>>>>>>> for example cable being unplugged somewhere on the way between the
>>>>>>>> two.
>>>>>>>> 
>>>>>>>> [0]
>>>>>>>> https://mail.openvswitch.org/pipermail/ovs-discuss/2023-March/052324.html
>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>>> +
>>>>>>>>>>>>      struct ctl_command *commands;
>>>>>>>>>>>>      size_t n_commands;
>>>>>>>>>>>>      char *error;
>>>>>>>>>>>> --
>>>>>>>>>>>> 2.36.1
>>>>>>>>>>>> 
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> dev mailing list
>>>>>>>>>>>> dev@openvswitch.org
>>>>>>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> Looks good to me, thanks.
>>>>>>>>>>> 
>>>>>>>>>>> Reviewed-by: Ales Musil <amusil@redhat.com>
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Vladislav, Ales, I was thinking of backporting this to stable
>>>>>>>>>> branches
>>>>>>>>>> too, what do you think?
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> Dumitru
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Regards,
>>>>>>>>> Vladislav Odintsov
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> _______________________________________________
>>>>>>>> dev mailing list
>>>>>>>> dev@openvswitch.org <mailto:dev@openvswitch.org>
>>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>>>>> 
>>>>>>> 
>>>>>>> Regards,
>>>>>>> Vladislav Odintsov
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> _______________________________________________
>>>>>> dev mailing list
>>>>>> dev@openvswitch.org
>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>>> 
>>>> 
>>> 
>>> 
>>> Regards,
>>> Vladislav Odintsov
> 
> 
> Regards,
> Vladislav Odintsov
> 
> _______________________________________________
> dev mailing list
> dev@openvswitch.org <mailto:dev@openvswitch.org>
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Regards,
Vladislav Odintsov
diff mbox series

Patch

diff --git a/utilities/ovn-dbctl.c b/utilities/ovn-dbctl.c
index 369a6a663..4307a5cae 100644
--- a/utilities/ovn-dbctl.c
+++ b/utilities/ovn-dbctl.c
@@ -208,6 +208,9 @@  ovn_dbctl_main(int argc, char *argv[],
     if (daemon_mode) {
         server_loop(dbctl_options, idl, argc, argv_);
     } else {
+        /* Disable OVSDB probe interval for non-daemon mode. */
+        ovsdb_idl_set_probe_interval(idl, 0);
+
         struct ctl_command *commands;
         size_t n_commands;
         char *error;