Message ID | 20230323192520.3903330-1-odivlad@gmail.com |
---|---|
State | Changes Requested |
Headers | show |
Series | [ovs-dev] utilities: disable OVSDB inactivity probes for non-daemon ovn-nbctl | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_Build_and_Test | success | github build: passed |
ovsrobot/github-robot-_ovn-kubernetes | success | github build: passed |
On Thu, Mar 23, 2023 at 8:25 PM Vladislav Odintsov <odivlad@gmail.com> wrote: > For large OVN_Southbound DBs defatult interval of 5000 ms could be not > sufficient. This patch disables OVSDB inactivity probes for ovn-*ctl > running > in non-daemon mode. > > Signed-off-by: Vladislav Odintsov <odivlad@gmail.com> > --- > utilities/ovn-dbctl.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/utilities/ovn-dbctl.c b/utilities/ovn-dbctl.c > index 369a6a663..4307a5cae 100644 > --- a/utilities/ovn-dbctl.c > +++ b/utilities/ovn-dbctl.c > @@ -208,6 +208,9 @@ ovn_dbctl_main(int argc, char *argv[], > if (daemon_mode) { > server_loop(dbctl_options, idl, argc, argv_); > } else { > + /* Disable OVSDB probe interval for non-daemon mode. */ > + ovsdb_idl_set_probe_interval(idl, 0); > + > struct ctl_command *commands; > size_t n_commands; > char *error; > -- > 2.36.1 > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > Looks good to me, thanks. Reviewed-by: Ales Musil <amusil@redhat.com>
On 3/31/23 11:43, Ales Musil wrote: > On Thu, Mar 23, 2023 at 8:25 PM Vladislav Odintsov <odivlad@gmail.com> > wrote: > >> For large OVN_Southbound DBs defatult interval of 5000 ms could be not >> sufficient. This patch disables OVSDB inactivity probes for ovn-*ctl >> running >> in non-daemon mode. >> >> Signed-off-by: Vladislav Odintsov <odivlad@gmail.com> >> --- >> utilities/ovn-dbctl.c | 3 +++ >> 1 file changed, 3 insertions(+) >> >> diff --git a/utilities/ovn-dbctl.c b/utilities/ovn-dbctl.c >> index 369a6a663..4307a5cae 100644 >> --- a/utilities/ovn-dbctl.c >> +++ b/utilities/ovn-dbctl.c >> @@ -208,6 +208,9 @@ ovn_dbctl_main(int argc, char *argv[], >> if (daemon_mode) { >> server_loop(dbctl_options, idl, argc, argv_); >> } else { >> + /* Disable OVSDB probe interval for non-daemon mode. */ >> + ovsdb_idl_set_probe_interval(idl, 0); I think I'd avoid using the idl function directly and call instead: set_idl_probe_interval(idl, 0); Just to keep it aligned with all other uses in OVN. I can patch that at apply time if it looks OK to you. >> + >> struct ctl_command *commands; >> size_t n_commands; >> char *error; >> -- >> 2.36.1 >> >> _______________________________________________ >> dev mailing list >> dev@openvswitch.org >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> >> > Looks good to me, thanks. > > Reviewed-by: Ales Musil <amusil@redhat.com> > Vladislav, Ales, I was thinking of backporting this to stable branches too, what do you think? Thanks, Dumitru
Hi Dumitru, > On 31 Mar 2023, at 15:01, Dumitru Ceara <dceara@redhat.com> wrote: > > On 3/31/23 11:43, Ales Musil wrote: >> On Thu, Mar 23, 2023 at 8:25 PM Vladislav Odintsov <odivlad@gmail.com> >> wrote: >> >>> For large OVN_Southbound DBs defatult interval of 5000 ms could be not >>> sufficient. This patch disables OVSDB inactivity probes for ovn-*ctl >>> running >>> in non-daemon mode. >>> >>> Signed-off-by: Vladislav Odintsov <odivlad@gmail.com> >>> --- >>> utilities/ovn-dbctl.c | 3 +++ >>> 1 file changed, 3 insertions(+) >>> >>> diff --git a/utilities/ovn-dbctl.c b/utilities/ovn-dbctl.c >>> index 369a6a663..4307a5cae 100644 >>> --- a/utilities/ovn-dbctl.c >>> +++ b/utilities/ovn-dbctl.c >>> @@ -208,6 +208,9 @@ ovn_dbctl_main(int argc, char *argv[], >>> if (daemon_mode) { >>> server_loop(dbctl_options, idl, argc, argv_); >>> } else { >>> + /* Disable OVSDB probe interval for non-daemon mode. */ >>> + ovsdb_idl_set_probe_interval(idl, 0); > > I think I'd avoid using the idl function directly and call instead: > > set_idl_probe_interval(idl, 0); > > Just to keep it aligned with all other uses in OVN. I can patch that at > apply time if it looks OK to you. I’ve got no objections here. Small nit: set_idl_probe_interval function needs also a remote. Like this: set_idl_probe_interval(idl, db, 0); Also, please correct typo in commit message: defatult -> default. > >>> + >>> struct ctl_command *commands; >>> size_t n_commands; >>> char *error; >>> -- >>> 2.36.1 >>> >>> _______________________________________________ >>> dev mailing list >>> dev@openvswitch.org >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>> >>> >> Looks good to me, thanks. >> >> Reviewed-by: Ales Musil <amusil@redhat.com> >> > > Vladislav, Ales, I was thinking of backporting this to stable branches > too, what do you think? > > Thanks, > Dumitru Regards, Vladislav Odintsov
On 3/31/23 14:46, Vladislav Odintsov wrote: > Hi Dumitru, > >> On 31 Mar 2023, at 15:01, Dumitru Ceara <dceara@redhat.com> wrote: >> >> On 3/31/23 11:43, Ales Musil wrote: >>> On Thu, Mar 23, 2023 at 8:25 PM Vladislav Odintsov <odivlad@gmail.com> >>> wrote: >>> >>>> For large OVN_Southbound DBs defatult interval of 5000 ms could be not >>>> sufficient. This patch disables OVSDB inactivity probes for ovn-*ctl >>>> running >>>> in non-daemon mode. >>>> >>>> Signed-off-by: Vladislav Odintsov <odivlad@gmail.com> >>>> --- >>>> utilities/ovn-dbctl.c | 3 +++ >>>> 1 file changed, 3 insertions(+) >>>> >>>> diff --git a/utilities/ovn-dbctl.c b/utilities/ovn-dbctl.c >>>> index 369a6a663..4307a5cae 100644 >>>> --- a/utilities/ovn-dbctl.c >>>> +++ b/utilities/ovn-dbctl.c >>>> @@ -208,6 +208,9 @@ ovn_dbctl_main(int argc, char *argv[], >>>> if (daemon_mode) { >>>> server_loop(dbctl_options, idl, argc, argv_); >>>> } else { >>>> + /* Disable OVSDB probe interval for non-daemon mode. */ >>>> + ovsdb_idl_set_probe_interval(idl, 0); >> >> I think I'd avoid using the idl function directly and call instead: >> >> set_idl_probe_interval(idl, 0); >> >> Just to keep it aligned with all other uses in OVN. I can patch that at >> apply time if it looks OK to you. > > I’ve got no objections here. > Small nit: set_idl_probe_interval function needs also a remote. Like this: > > set_idl_probe_interval(idl, db, 0); > > Also, please correct typo in commit message: defatult -> default. > In light of the ovs-discuss thread [0] is it maybe better to just set this probe interval to a very high value instead? That's for the case when ovn-nbctl/sbctl daemon <-> ovsdb-server connection dies because of for example cable being unplugged somewhere on the way between the two. [0] https://mail.openvswitch.org/pipermail/ovs-discuss/2023-March/052324.html >> >>>> + >>>> struct ctl_command *commands; >>>> size_t n_commands; >>>> char *error; >>>> -- >>>> 2.36.1 >>>> >>>> _______________________________________________ >>>> dev mailing list >>>> dev@openvswitch.org >>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>>> >>>> >>> Looks good to me, thanks. >>> >>> Reviewed-by: Ales Musil <amusil@redhat.com> >>> >> >> Vladislav, Ales, I was thinking of backporting this to stable branches >> too, what do you think? >> >> Thanks, >> Dumitru > > > Regards, > Vladislav Odintsov > >
As I understood from Ilya, in case of one-command run of ovn-sbctl (non-daemon mode), it doesn’t make sense to have client -> server inactivity probes. If cable is unplugged, will just hit tcp session timeout, IIUC. Please correct me if I’m wrong. > On 31 Mar 2023, at 15:55, Dumitru Ceara <dceara@redhat.com> wrote: > > On 3/31/23 14:46, Vladislav Odintsov wrote: >> Hi Dumitru, >> >>> On 31 Mar 2023, at 15:01, Dumitru Ceara <dceara@redhat.com> wrote: >>> >>> On 3/31/23 11:43, Ales Musil wrote: >>>> On Thu, Mar 23, 2023 at 8:25 PM Vladislav Odintsov <odivlad@gmail.com> >>>> wrote: >>>> >>>>> For large OVN_Southbound DBs defatult interval of 5000 ms could be not >>>>> sufficient. This patch disables OVSDB inactivity probes for ovn-*ctl >>>>> running >>>>> in non-daemon mode. >>>>> >>>>> Signed-off-by: Vladislav Odintsov <odivlad@gmail.com> >>>>> --- >>>>> utilities/ovn-dbctl.c | 3 +++ >>>>> 1 file changed, 3 insertions(+) >>>>> >>>>> diff --git a/utilities/ovn-dbctl.c b/utilities/ovn-dbctl.c >>>>> index 369a6a663..4307a5cae 100644 >>>>> --- a/utilities/ovn-dbctl.c >>>>> +++ b/utilities/ovn-dbctl.c >>>>> @@ -208,6 +208,9 @@ ovn_dbctl_main(int argc, char *argv[], >>>>> if (daemon_mode) { >>>>> server_loop(dbctl_options, idl, argc, argv_); >>>>> } else { >>>>> + /* Disable OVSDB probe interval for non-daemon mode. */ >>>>> + ovsdb_idl_set_probe_interval(idl, 0); >>> >>> I think I'd avoid using the idl function directly and call instead: >>> >>> set_idl_probe_interval(idl, 0); >>> >>> Just to keep it aligned with all other uses in OVN. I can patch that at >>> apply time if it looks OK to you. >> >> I’ve got no objections here. >> Small nit: set_idl_probe_interval function needs also a remote. Like this: >> >> set_idl_probe_interval(idl, db, 0); >> >> Also, please correct typo in commit message: defatult -> default. >> > > In light of the ovs-discuss thread [0] is it maybe better to just set > this probe interval to a very high value instead? That's for the case > when ovn-nbctl/sbctl daemon <-> ovsdb-server connection dies because of > for example cable being unplugged somewhere on the way between the two. > > [0] > https://mail.openvswitch.org/pipermail/ovs-discuss/2023-March/052324.html > >>> >>>>> + >>>>> struct ctl_command *commands; >>>>> size_t n_commands; >>>>> char *error; >>>>> -- >>>>> 2.36.1 >>>>> >>>>> _______________________________________________ >>>>> dev mailing list >>>>> dev@openvswitch.org >>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>>>> >>>>> >>>> Looks good to me, thanks. >>>> >>>> Reviewed-by: Ales Musil <amusil@redhat.com> >>>> >>> >>> Vladislav, Ales, I was thinking of backporting this to stable branches >>> too, what do you think? >>> >>> Thanks, >>> Dumitru >> >> >> Regards, >> Vladislav Odintsov >> >> > > _______________________________________________ > dev mailing list > dev@openvswitch.org <mailto:dev@openvswitch.org> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev Regards, Vladislav Odintsov
On 3/31/23 16:51, Vladislav Odintsov wrote: > As I understood from Ilya, in case of one-command run of ovn-sbctl (non-daemon mode), it doesn’t make sense to have client -> server inactivity probes. If cable is unplugged, will just hit tcp session timeout, IIUC. > Please correct me if I’m wrong. > You're right but the default timeouts are quite large for TCP sessions: With keepalives default: tcp_keepalive_time - INTEGER How often TCP sends out keepalive messages when keepalive is enabled. Default: 2hours. tcp_keepalive_probes - INTEGER How many keepalive probes TCP sends out, until it decides that the connection is broken. Default value: 9. tcp_keepalive_intvl - INTEGER How frequently the probes are send out. Multiplied by tcp_keepalive_probes it is time to kill not responding connection, after probes started. Default value: 75sec i.e. connection will be aborted after ~11 minutes of retries. DB clients don't even enable tcp keepalives IIRC. So then we depend retransmits timing out: tcp_retries2 - INTEGER This value influences the timeout of an alive TCP connection, when RTO retransmissions remain unacknowledged. Given a value of N, a hypothetical TCP connection following exponential backoff with an initial RTO of TCP_RTO_MIN would retransmit N times before killing the connection at the (N+1)th RTO. The default value of 15 yields a hypothetical timeout of 924.6 seconds and is a lower bound for the effective timeout. TCP will effectively time out at the first RTO which exceeds the hypothetical timeout. RFC 1122 recommends at least 100 seconds for the timeout, which corresponds to a value of at least 8. Isn't it possible that the connection suddenly goes down in between a transaction request and its reply but with all TCP segments being ACKed? >> On 31 Mar 2023, at 15:55, Dumitru Ceara <dceara@redhat.com> wrote: >> >> On 3/31/23 14:46, Vladislav Odintsov wrote: >>> Hi Dumitru, >>> >>>> On 31 Mar 2023, at 15:01, Dumitru Ceara <dceara@redhat.com> wrote: >>>> >>>> On 3/31/23 11:43, Ales Musil wrote: >>>>> On Thu, Mar 23, 2023 at 8:25 PM Vladislav Odintsov <odivlad@gmail.com> >>>>> wrote: >>>>> >>>>>> For large OVN_Southbound DBs defatult interval of 5000 ms could be not >>>>>> sufficient. This patch disables OVSDB inactivity probes for ovn-*ctl >>>>>> running >>>>>> in non-daemon mode. >>>>>> >>>>>> Signed-off-by: Vladislav Odintsov <odivlad@gmail.com> >>>>>> --- >>>>>> utilities/ovn-dbctl.c | 3 +++ >>>>>> 1 file changed, 3 insertions(+) >>>>>> >>>>>> diff --git a/utilities/ovn-dbctl.c b/utilities/ovn-dbctl.c >>>>>> index 369a6a663..4307a5cae 100644 >>>>>> --- a/utilities/ovn-dbctl.c >>>>>> +++ b/utilities/ovn-dbctl.c >>>>>> @@ -208,6 +208,9 @@ ovn_dbctl_main(int argc, char *argv[], >>>>>> if (daemon_mode) { >>>>>> server_loop(dbctl_options, idl, argc, argv_); >>>>>> } else { >>>>>> + /* Disable OVSDB probe interval for non-daemon mode. */ >>>>>> + ovsdb_idl_set_probe_interval(idl, 0); >>>> >>>> I think I'd avoid using the idl function directly and call instead: >>>> >>>> set_idl_probe_interval(idl, 0); >>>> >>>> Just to keep it aligned with all other uses in OVN. I can patch that at >>>> apply time if it looks OK to you. >>> >>> I’ve got no objections here. >>> Small nit: set_idl_probe_interval function needs also a remote. Like this: >>> >>> set_idl_probe_interval(idl, db, 0); >>> >>> Also, please correct typo in commit message: defatult -> default. >>> >> >> In light of the ovs-discuss thread [0] is it maybe better to just set >> this probe interval to a very high value instead? That's for the case >> when ovn-nbctl/sbctl daemon <-> ovsdb-server connection dies because of >> for example cable being unplugged somewhere on the way between the two. >> >> [0] >> https://mail.openvswitch.org/pipermail/ovs-discuss/2023-March/052324.html >> >>>> >>>>>> + >>>>>> struct ctl_command *commands; >>>>>> size_t n_commands; >>>>>> char *error; >>>>>> -- >>>>>> 2.36.1 >>>>>> >>>>>> _______________________________________________ >>>>>> dev mailing list >>>>>> dev@openvswitch.org >>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>>>>> >>>>>> >>>>> Looks good to me, thanks. >>>>> >>>>> Reviewed-by: Ales Musil <amusil@redhat.com> >>>>> >>>> >>>> Vladislav, Ales, I was thinking of backporting this to stable branches >>>> too, what do you think? >>>> >>>> Thanks, >>>> Dumitru >>> >>> >>> Regards, >>> Vladislav Odintsov >>> >>> >> >> _______________________________________________ >> dev mailing list >> dev@openvswitch.org <mailto:dev@openvswitch.org> >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > > Regards, > Vladislav Odintsov > >
On 3/31/23 17:17, Dumitru Ceara wrote: > On 3/31/23 16:51, Vladislav Odintsov wrote: >> As I understood from Ilya, in case of one-command run of ovn-sbctl (non-daemon mode), it doesn’t make sense to have client -> server inactivity probes. If cable is unplugged, will just hit tcp session timeout, IIUC. >> Please correct me if I’m wrong. >> > > You're right but the default timeouts are quite large for TCP sessions: > > With keepalives default: > tcp_keepalive_time - INTEGER > How often TCP sends out keepalive messages when keepalive is enabled. > Default: 2hours. > > tcp_keepalive_probes - INTEGER > How many keepalive probes TCP sends out, until it decides that the > connection is broken. Default value: 9. > > tcp_keepalive_intvl - INTEGER > How frequently the probes are send out. Multiplied by > tcp_keepalive_probes it is time to kill not responding connection, > after probes started. Default value: 75sec i.e. connection > will be aborted after ~11 minutes of retries. > > DB clients don't even enable tcp keepalives IIRC. > > So then we depend retransmits timing out: > tcp_retries2 - INTEGER > This value influences the timeout of an alive TCP connection, > when RTO retransmissions remain unacknowledged. > Given a value of N, a hypothetical TCP connection following > exponential backoff with an initial RTO of TCP_RTO_MIN would > retransmit N times before killing the connection at the (N+1)th RTO. > > The default value of 15 yields a hypothetical timeout of 924.6 > seconds and is a lower bound for the effective timeout. > TCP will effectively time out at the first RTO which exceeds the > hypothetical timeout. > > RFC 1122 recommends at least 100 seconds for the timeout, > which corresponds to a value of at least 8. > > Isn't it possible that the connection suddenly goes down in between a > transaction request and its reply but with all TCP segments being > ACKed? Right. If the client already sent the request and only waits for a reply, the retransmission timeout will not have any effect. An idle TCP session may stay open practically forever, because it is designed to survive network disruptions. There is no such thing as a TCP timeout in a general case. > >>> On 31 Mar 2023, at 15:55, Dumitru Ceara <dceara@redhat.com> wrote: >>> >>> On 3/31/23 14:46, Vladislav Odintsov wrote: >>>> Hi Dumitru, >>>> >>>>> On 31 Mar 2023, at 15:01, Dumitru Ceara <dceara@redhat.com> wrote: >>>>> >>>>> On 3/31/23 11:43, Ales Musil wrote: >>>>>> On Thu, Mar 23, 2023 at 8:25 PM Vladislav Odintsov <odivlad@gmail.com> >>>>>> wrote: >>>>>> >>>>>>> For large OVN_Southbound DBs defatult interval of 5000 ms could be not >>>>>>> sufficient. This patch disables OVSDB inactivity probes for ovn-*ctl >>>>>>> running >>>>>>> in non-daemon mode. >>>>>>> >>>>>>> Signed-off-by: Vladislav Odintsov <odivlad@gmail.com> >>>>>>> --- >>>>>>> utilities/ovn-dbctl.c | 3 +++ >>>>>>> 1 file changed, 3 insertions(+) >>>>>>> >>>>>>> diff --git a/utilities/ovn-dbctl.c b/utilities/ovn-dbctl.c >>>>>>> index 369a6a663..4307a5cae 100644 >>>>>>> --- a/utilities/ovn-dbctl.c >>>>>>> +++ b/utilities/ovn-dbctl.c >>>>>>> @@ -208,6 +208,9 @@ ovn_dbctl_main(int argc, char *argv[], >>>>>>> if (daemon_mode) { >>>>>>> server_loop(dbctl_options, idl, argc, argv_); >>>>>>> } else { >>>>>>> + /* Disable OVSDB probe interval for non-daemon mode. */ >>>>>>> + ovsdb_idl_set_probe_interval(idl, 0); >>>>> >>>>> I think I'd avoid using the idl function directly and call instead: >>>>> >>>>> set_idl_probe_interval(idl, 0); >>>>> >>>>> Just to keep it aligned with all other uses in OVN. I can patch that at >>>>> apply time if it looks OK to you. >>>> >>>> I’ve got no objections here. >>>> Small nit: set_idl_probe_interval function needs also a remote. Like this: >>>> >>>> set_idl_probe_interval(idl, db, 0); >>>> >>>> Also, please correct typo in commit message: defatult -> default. >>>> >>> >>> In light of the ovs-discuss thread [0] is it maybe better to just set >>> this probe interval to a very high value instead? That's for the case >>> when ovn-nbctl/sbctl daemon <-> ovsdb-server connection dies because of >>> for example cable being unplugged somewhere on the way between the two. >>> >>> [0] >>> https://mail.openvswitch.org/pipermail/ovs-discuss/2023-March/052324.html >>> >>>>> >>>>>>> + >>>>>>> struct ctl_command *commands; >>>>>>> size_t n_commands; >>>>>>> char *error; >>>>>>> -- >>>>>>> 2.36.1 >>>>>>> >>>>>>> _______________________________________________ >>>>>>> dev mailing list >>>>>>> dev@openvswitch.org >>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>>>>>> >>>>>>> >>>>>> Looks good to me, thanks. >>>>>> >>>>>> Reviewed-by: Ales Musil <amusil@redhat.com> >>>>>> >>>>> >>>>> Vladislav, Ales, I was thinking of backporting this to stable branches >>>>> too, what do you think? >>>>> >>>>> Thanks, >>>>> Dumitru >>>> >>>> >>>> Regards, >>>> Vladislav Odintsov >>>> >>>> >>> >>> _______________________________________________ >>> dev mailing list >>> dev@openvswitch.org <mailto:dev@openvswitch.org> >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> >> >> Regards, >> Vladislav Odintsov >> >> > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Hi Vladislav, On 4/3/23 12:08, Ilya Maximets wrote: > On 3/31/23 17:17, Dumitru Ceara wrote: >> On 3/31/23 16:51, Vladislav Odintsov wrote: >>> As I understood from Ilya, in case of one-command run of ovn-sbctl (non-daemon mode), it doesn’t make sense to have client -> server inactivity probes. If cable is unplugged, will just hit tcp session timeout, IIUC. >>> Please correct me if I’m wrong. >>> >> >> You're right but the default timeouts are quite large for TCP sessions: >> >> With keepalives default: >> tcp_keepalive_time - INTEGER >> How often TCP sends out keepalive messages when keepalive is enabled. >> Default: 2hours. >> >> tcp_keepalive_probes - INTEGER >> How many keepalive probes TCP sends out, until it decides that the >> connection is broken. Default value: 9. >> >> tcp_keepalive_intvl - INTEGER >> How frequently the probes are send out. Multiplied by >> tcp_keepalive_probes it is time to kill not responding connection, >> after probes started. Default value: 75sec i.e. connection >> will be aborted after ~11 minutes of retries. >> >> DB clients don't even enable tcp keepalives IIRC. >> >> So then we depend retransmits timing out: >> tcp_retries2 - INTEGER >> This value influences the timeout of an alive TCP connection, >> when RTO retransmissions remain unacknowledged. >> Given a value of N, a hypothetical TCP connection following >> exponential backoff with an initial RTO of TCP_RTO_MIN would >> retransmit N times before killing the connection at the (N+1)th RTO. >> >> The default value of 15 yields a hypothetical timeout of 924.6 >> seconds and is a lower bound for the effective timeout. >> TCP will effectively time out at the first RTO which exceeds the >> hypothetical timeout. >> >> RFC 1122 recommends at least 100 seconds for the timeout, >> which corresponds to a value of at least 8. >> >> Isn't it possible that the connection suddenly goes down in between a >> transaction request and its reply but with all TCP segments being >> ACKed? > > Right. If the client already sent the request and only waits for a reply, > the retransmission timeout will not have any effect. An idle TCP session > may stay open practically forever, because it is designed to survive network > disruptions. There is no such thing as a TCP timeout in a general case. > I was wondering if you have more thoughts on this? Should we go for the approach where OVN sets the probe interval to a "reasonably large" value by default? For example, using 30s would mean failure after ~60 seconds of inactivity. Thanks, Dumitru >> >>>> On 31 Mar 2023, at 15:55, Dumitru Ceara <dceara@redhat.com> wrote: >>>> >>>> On 3/31/23 14:46, Vladislav Odintsov wrote: >>>>> Hi Dumitru, >>>>> >>>>>> On 31 Mar 2023, at 15:01, Dumitru Ceara <dceara@redhat.com> wrote: >>>>>> >>>>>> On 3/31/23 11:43, Ales Musil wrote: >>>>>>> On Thu, Mar 23, 2023 at 8:25 PM Vladislav Odintsov <odivlad@gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>>> For large OVN_Southbound DBs defatult interval of 5000 ms could be not >>>>>>>> sufficient. This patch disables OVSDB inactivity probes for ovn-*ctl >>>>>>>> running >>>>>>>> in non-daemon mode. >>>>>>>> >>>>>>>> Signed-off-by: Vladislav Odintsov <odivlad@gmail.com> >>>>>>>> --- >>>>>>>> utilities/ovn-dbctl.c | 3 +++ >>>>>>>> 1 file changed, 3 insertions(+) >>>>>>>> >>>>>>>> diff --git a/utilities/ovn-dbctl.c b/utilities/ovn-dbctl.c >>>>>>>> index 369a6a663..4307a5cae 100644 >>>>>>>> --- a/utilities/ovn-dbctl.c >>>>>>>> +++ b/utilities/ovn-dbctl.c >>>>>>>> @@ -208,6 +208,9 @@ ovn_dbctl_main(int argc, char *argv[], >>>>>>>> if (daemon_mode) { >>>>>>>> server_loop(dbctl_options, idl, argc, argv_); >>>>>>>> } else { >>>>>>>> + /* Disable OVSDB probe interval for non-daemon mode. */ >>>>>>>> + ovsdb_idl_set_probe_interval(idl, 0); >>>>>> >>>>>> I think I'd avoid using the idl function directly and call instead: >>>>>> >>>>>> set_idl_probe_interval(idl, 0); >>>>>> >>>>>> Just to keep it aligned with all other uses in OVN. I can patch that at >>>>>> apply time if it looks OK to you. >>>>> >>>>> I’ve got no objections here. >>>>> Small nit: set_idl_probe_interval function needs also a remote. Like this: >>>>> >>>>> set_idl_probe_interval(idl, db, 0); >>>>> >>>>> Also, please correct typo in commit message: defatult -> default. >>>>> >>>> >>>> In light of the ovs-discuss thread [0] is it maybe better to just set >>>> this probe interval to a very high value instead? That's for the case >>>> when ovn-nbctl/sbctl daemon <-> ovsdb-server connection dies because of >>>> for example cable being unplugged somewhere on the way between the two. >>>> >>>> [0] >>>> https://mail.openvswitch.org/pipermail/ovs-discuss/2023-March/052324.html >>>> >>>>>> >>>>>>>> + >>>>>>>> struct ctl_command *commands; >>>>>>>> size_t n_commands; >>>>>>>> char *error; >>>>>>>> -- >>>>>>>> 2.36.1 >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> dev mailing list >>>>>>>> dev@openvswitch.org >>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>>>>>>> >>>>>>>> >>>>>>> Looks good to me, thanks. >>>>>>> >>>>>>> Reviewed-by: Ales Musil <amusil@redhat.com> >>>>>>> >>>>>> >>>>>> Vladislav, Ales, I was thinking of backporting this to stable branches >>>>>> too, what do you think? >>>>>> >>>>>> Thanks, >>>>>> Dumitru >>>>> >>>>> >>>>> Regards, >>>>> Vladislav Odintsov >>>>> >>>>> >>>> >>>> _______________________________________________ >>>> dev mailing list >>>> dev@openvswitch.org <mailto:dev@openvswitch.org> >>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>> >>> >>> Regards, >>> Vladislav Odintsov >>> >>> >> >> _______________________________________________ >> dev mailing list >> dev@openvswitch.org >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >
Hi Dumitru, > On 13 Apr 2023, at 11:55, Dumitru Ceara <dceara@redhat.com> wrote: > > Hi Vladislav, > > On 4/3/23 12:08, Ilya Maximets wrote: >> On 3/31/23 17:17, Dumitru Ceara wrote: >>> On 3/31/23 16:51, Vladislav Odintsov wrote: >>>> As I understood from Ilya, in case of one-command run of ovn-sbctl (non-daemon mode), it doesn’t make sense to have client -> server inactivity probes. If cable is unplugged, will just hit tcp session timeout, IIUC. >>>> Please correct me if I’m wrong. >>>> >>> >>> You're right but the default timeouts are quite large for TCP sessions: >>> >>> With keepalives default: >>> tcp_keepalive_time - INTEGER >>> How often TCP sends out keepalive messages when keepalive is enabled. >>> Default: 2hours. >>> >>> tcp_keepalive_probes - INTEGER >>> How many keepalive probes TCP sends out, until it decides that the >>> connection is broken. Default value: 9. >>> >>> tcp_keepalive_intvl - INTEGER >>> How frequently the probes are send out. Multiplied by >>> tcp_keepalive_probes it is time to kill not responding connection, >>> after probes started. Default value: 75sec i.e. connection >>> will be aborted after ~11 minutes of retries. >>> >>> DB clients don't even enable tcp keepalives IIRC. >>> >>> So then we depend retransmits timing out: >>> tcp_retries2 - INTEGER >>> This value influences the timeout of an alive TCP connection, >>> when RTO retransmissions remain unacknowledged. >>> Given a value of N, a hypothetical TCP connection following >>> exponential backoff with an initial RTO of TCP_RTO_MIN would >>> retransmit N times before killing the connection at the (N+1)th RTO. >>> >>> The default value of 15 yields a hypothetical timeout of 924.6 >>> seconds and is a lower bound for the effective timeout. >>> TCP will effectively time out at the first RTO which exceeds the >>> hypothetical timeout. >>> >>> RFC 1122 recommends at least 100 seconds for the timeout, >>> which corresponds to a value of at least 8. >>> >>> Isn't it possible that the connection suddenly goes down in between a >>> transaction request and its reply but with all TCP segments being >>> ACKed? >> >> Right. If the client already sent the request and only waits for a reply, >> the retransmission timeout will not have any effect. An idle TCP session >> may stay open practically forever, because it is designed to survive network >> disruptions. There is no such thing as a TCP timeout in a general case. >> > > I was wondering if you have more thoughts on this? Should we go for the > approach where OVN sets the probe interval to a "reasonably large" > value by default? For example, using 30s would mean failure after ~60 > seconds of inactivity. Actually I’m okay with your proposal to set a reasonable high inactivity probe (120/180 seconds?). Maybe it should be not only for non-daemon operation of dbctl but also for all modes? Alternatively, I thought it could be useful to add a new command line option like --inactivity-probe=N for the utilities to allow the user to choose and set interval which is needed in each case. What do you think about it? > > Thanks, > Dumitru > >>> >>>>> On 31 Mar 2023, at 15:55, Dumitru Ceara <dceara@redhat.com> wrote: >>>>> >>>>> On 3/31/23 14:46, Vladislav Odintsov wrote: >>>>>> Hi Dumitru, >>>>>> >>>>>>> On 31 Mar 2023, at 15:01, Dumitru Ceara <dceara@redhat.com> wrote: >>>>>>> >>>>>>> On 3/31/23 11:43, Ales Musil wrote: >>>>>>>> On Thu, Mar 23, 2023 at 8:25 PM Vladislav Odintsov <odivlad@gmail.com> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> For large OVN_Southbound DBs defatult interval of 5000 ms could be not >>>>>>>>> sufficient. This patch disables OVSDB inactivity probes for ovn-*ctl >>>>>>>>> running >>>>>>>>> in non-daemon mode. >>>>>>>>> >>>>>>>>> Signed-off-by: Vladislav Odintsov <odivlad@gmail.com> >>>>>>>>> --- >>>>>>>>> utilities/ovn-dbctl.c | 3 +++ >>>>>>>>> 1 file changed, 3 insertions(+) >>>>>>>>> >>>>>>>>> diff --git a/utilities/ovn-dbctl.c b/utilities/ovn-dbctl.c >>>>>>>>> index 369a6a663..4307a5cae 100644 >>>>>>>>> --- a/utilities/ovn-dbctl.c >>>>>>>>> +++ b/utilities/ovn-dbctl.c >>>>>>>>> @@ -208,6 +208,9 @@ ovn_dbctl_main(int argc, char *argv[], >>>>>>>>> if (daemon_mode) { >>>>>>>>> server_loop(dbctl_options, idl, argc, argv_); >>>>>>>>> } else { >>>>>>>>> + /* Disable OVSDB probe interval for non-daemon mode. */ >>>>>>>>> + ovsdb_idl_set_probe_interval(idl, 0); >>>>>>> >>>>>>> I think I'd avoid using the idl function directly and call instead: >>>>>>> >>>>>>> set_idl_probe_interval(idl, 0); >>>>>>> >>>>>>> Just to keep it aligned with all other uses in OVN. I can patch that at >>>>>>> apply time if it looks OK to you. >>>>>> >>>>>> I’ve got no objections here. >>>>>> Small nit: set_idl_probe_interval function needs also a remote. Like this: >>>>>> >>>>>> set_idl_probe_interval(idl, db, 0); >>>>>> >>>>>> Also, please correct typo in commit message: defatult -> default. >>>>>> >>>>> >>>>> In light of the ovs-discuss thread [0] is it maybe better to just set >>>>> this probe interval to a very high value instead? That's for the case >>>>> when ovn-nbctl/sbctl daemon <-> ovsdb-server connection dies because of >>>>> for example cable being unplugged somewhere on the way between the two. >>>>> >>>>> [0] >>>>> https://mail.openvswitch.org/pipermail/ovs-discuss/2023-March/052324.html >>>>> >>>>>>> >>>>>>>>> + >>>>>>>>> struct ctl_command *commands; >>>>>>>>> size_t n_commands; >>>>>>>>> char *error; >>>>>>>>> -- >>>>>>>>> 2.36.1 >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> dev mailing list >>>>>>>>> dev@openvswitch.org >>>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>>>>>>>> >>>>>>>>> >>>>>>>> Looks good to me, thanks. >>>>>>>> >>>>>>>> Reviewed-by: Ales Musil <amusil@redhat.com> >>>>>>>> >>>>>>> >>>>>>> Vladislav, Ales, I was thinking of backporting this to stable branches >>>>>>> too, what do you think? >>>>>>> >>>>>>> Thanks, >>>>>>> Dumitru >>>>>> >>>>>> >>>>>> Regards, >>>>>> Vladislav Odintsov >>>>>> >>>>>> >>>>> >>>>> _______________________________________________ >>>>> dev mailing list >>>>> dev@openvswitch.org <mailto:dev@openvswitch.org> >>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>>> >>>> >>>> Regards, >>>> Vladislav Odintsov >>>> >>>> >>> >>> _______________________________________________ >>> dev mailing list >>> dev@openvswitch.org >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> > Regards, Vladislav Odintsov
On 4/14/23 15:37, Vladislav Odintsov wrote: > Hi Dumitru, > >> On 13 Apr 2023, at 11:55, Dumitru Ceara <dceara@redhat.com> wrote: >> >> Hi Vladislav, >> >> On 4/3/23 12:08, Ilya Maximets wrote: >>> On 3/31/23 17:17, Dumitru Ceara wrote: >>>> On 3/31/23 16:51, Vladislav Odintsov wrote: >>>>> As I understood from Ilya, in case of one-command run of ovn-sbctl >>>>> (non-daemon mode), it doesn’t make sense to have client -> server >>>>> inactivity probes. If cable is unplugged, will just hit tcp session >>>>> timeout, IIUC. >>>>> Please correct me if I’m wrong. >>>>> >>>> >>>> You're right but the default timeouts are quite large for TCP sessions: >>>> >>>> With keepalives default: >>>> tcp_keepalive_time - INTEGER >>>> How often TCP sends out keepalive messages when keepalive is enabled. >>>> Default: 2hours. >>>> >>>> tcp_keepalive_probes - INTEGER >>>> How many keepalive probes TCP sends out, until it decides that the >>>> connection is broken. Default value: 9. >>>> >>>> tcp_keepalive_intvl - INTEGER >>>> How frequently the probes are send out. Multiplied by >>>> tcp_keepalive_probes it is time to kill not responding connection, >>>> after probes started. Default value: 75sec i.e. connection >>>> will be aborted after ~11 minutes of retries. >>>> >>>> DB clients don't even enable tcp keepalives IIRC. >>>> >>>> So then we depend retransmits timing out: >>>> tcp_retries2 - INTEGER >>>> This value influences the timeout of an alive TCP connection, >>>> when RTO retransmissions remain unacknowledged. >>>> Given a value of N, a hypothetical TCP connection following >>>> exponential backoff with an initial RTO of TCP_RTO_MIN would >>>> retransmit N times before killing the connection at the (N+1)th RTO. >>>> >>>> The default value of 15 yields a hypothetical timeout of 924.6 >>>> seconds and is a lower bound for the effective timeout. >>>> TCP will effectively time out at the first RTO which exceeds the >>>> hypothetical timeout. >>>> >>>> RFC 1122 recommends at least 100 seconds for the timeout, >>>> which corresponds to a value of at least 8. >>>> >>>> Isn't it possible that the connection suddenly goes down in between a >>>> transaction request and its reply but with all TCP segments being >>>> ACKed? >>> >>> Right. If the client already sent the request and only waits for a >>> reply, >>> the retransmission timeout will not have any effect. An idle TCP session >>> may stay open practically forever, because it is designed to survive >>> network >>> disruptions. There is no such thing as a TCP timeout in a general case. >>> >> >> I was wondering if you have more thoughts on this? Should we go for the >> approach where OVN sets the probe interval to a "reasonably large" >> value by default? For example, using 30s would mean failure after ~60 >> seconds of inactivity. > > Actually I’m okay with your proposal to set a reasonable high inactivity > probe (120/180 seconds?). > Maybe it should be not only for non-daemon operation of dbctl but also > for all modes? > It's probably a good idea, yes. > Alternatively, I thought it could be useful to add a new command line > option like --inactivity-probe=N for the utilities to allow the user to > choose and set interval which is needed in each case. > > What do you think about it? > Northd uses NB_Global.options:northd_probe_interval as inactivity interval. Can we do the same thing and use as default the high value until the first successful connection to the NB? >> >> Thanks, >> Dumitru >> >>>> >>>>>> On 31 Mar 2023, at 15:55, Dumitru Ceara <dceara@redhat.com> wrote: >>>>>> >>>>>> On 3/31/23 14:46, Vladislav Odintsov wrote: >>>>>>> Hi Dumitru, >>>>>>> >>>>>>>> On 31 Mar 2023, at 15:01, Dumitru Ceara <dceara@redhat.com> wrote: >>>>>>>> >>>>>>>> On 3/31/23 11:43, Ales Musil wrote: >>>>>>>>> On Thu, Mar 23, 2023 at 8:25 PM Vladislav Odintsov >>>>>>>>> <odivlad@gmail.com> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> For large OVN_Southbound DBs defatult interval of 5000 ms >>>>>>>>>> could be not >>>>>>>>>> sufficient. This patch disables OVSDB inactivity probes for >>>>>>>>>> ovn-*ctl >>>>>>>>>> running >>>>>>>>>> in non-daemon mode. >>>>>>>>>> >>>>>>>>>> Signed-off-by: Vladislav Odintsov <odivlad@gmail.com> >>>>>>>>>> --- >>>>>>>>>> utilities/ovn-dbctl.c | 3 +++ >>>>>>>>>> 1 file changed, 3 insertions(+) >>>>>>>>>> >>>>>>>>>> diff --git a/utilities/ovn-dbctl.c b/utilities/ovn-dbctl.c >>>>>>>>>> index 369a6a663..4307a5cae 100644 >>>>>>>>>> --- a/utilities/ovn-dbctl.c >>>>>>>>>> +++ b/utilities/ovn-dbctl.c >>>>>>>>>> @@ -208,6 +208,9 @@ ovn_dbctl_main(int argc, char *argv[], >>>>>>>>>> if (daemon_mode) { >>>>>>>>>> server_loop(dbctl_options, idl, argc, argv_); >>>>>>>>>> } else { >>>>>>>>>> + /* Disable OVSDB probe interval for non-daemon mode. */ >>>>>>>>>> + ovsdb_idl_set_probe_interval(idl, 0); >>>>>>>> >>>>>>>> I think I'd avoid using the idl function directly and call instead: >>>>>>>> >>>>>>>> set_idl_probe_interval(idl, 0); >>>>>>>> >>>>>>>> Just to keep it aligned with all other uses in OVN. I can patch >>>>>>>> that at >>>>>>>> apply time if it looks OK to you. >>>>>>> >>>>>>> I’ve got no objections here. >>>>>>> Small nit: set_idl_probe_interval function needs also a remote. >>>>>>> Like this: >>>>>>> >>>>>>> set_idl_probe_interval(idl, db, 0); >>>>>>> >>>>>>> Also, please correct typo in commit message: defatult -> default. >>>>>>> >>>>>> >>>>>> In light of the ovs-discuss thread [0] is it maybe better to just set >>>>>> this probe interval to a very high value instead? That's for the case >>>>>> when ovn-nbctl/sbctl daemon <-> ovsdb-server connection dies >>>>>> because of >>>>>> for example cable being unplugged somewhere on the way between the >>>>>> two. >>>>>> >>>>>> [0] >>>>>> https://mail.openvswitch.org/pipermail/ovs-discuss/2023-March/052324.html >>>>>> >>>>>>>> >>>>>>>>>> + >>>>>>>>>> struct ctl_command *commands; >>>>>>>>>> size_t n_commands; >>>>>>>>>> char *error; >>>>>>>>>> -- >>>>>>>>>> 2.36.1 >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> dev mailing list >>>>>>>>>> dev@openvswitch.org >>>>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>>>>>>>>> >>>>>>>>>> >>>>>>>>> Looks good to me, thanks. >>>>>>>>> >>>>>>>>> Reviewed-by: Ales Musil <amusil@redhat.com> >>>>>>>>> >>>>>>>> >>>>>>>> Vladislav, Ales, I was thinking of backporting this to stable >>>>>>>> branches >>>>>>>> too, what do you think? >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Dumitru >>>>>>> >>>>>>> >>>>>>> Regards, >>>>>>> Vladislav Odintsov >>>>>>> >>>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> dev mailing list >>>>>> dev@openvswitch.org <mailto:dev@openvswitch.org> >>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>>>> >>>>> >>>>> Regards, >>>>> Vladislav Odintsov >>>>> >>>>> >>>> >>>> _______________________________________________ >>>> dev mailing list >>>> dev@openvswitch.org >>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>> >> > > > Regards, > Vladislav Odintsov >
> On 18 Apr 2023, at 16:59, Dumitru Ceara <dceara@redhat.com> wrote: > > On 4/14/23 15:37, Vladislav Odintsov wrote: >> Hi Dumitru, >> >>> On 13 Apr 2023, at 11:55, Dumitru Ceara <dceara@redhat.com> wrote: >>> >>> Hi Vladislav, >>> >>> On 4/3/23 12:08, Ilya Maximets wrote: >>>> On 3/31/23 17:17, Dumitru Ceara wrote: >>>>> On 3/31/23 16:51, Vladislav Odintsov wrote: >>>>>> As I understood from Ilya, in case of one-command run of ovn-sbctl >>>>>> (non-daemon mode), it doesn’t make sense to have client -> server >>>>>> inactivity probes. If cable is unplugged, will just hit tcp session >>>>>> timeout, IIUC. >>>>>> Please correct me if I’m wrong. >>>>>> >>>>> >>>>> You're right but the default timeouts are quite large for TCP sessions: >>>>> >>>>> With keepalives default: >>>>> tcp_keepalive_time - INTEGER >>>>> How often TCP sends out keepalive messages when keepalive is enabled. >>>>> Default: 2hours. >>>>> >>>>> tcp_keepalive_probes - INTEGER >>>>> How many keepalive probes TCP sends out, until it decides that the >>>>> connection is broken. Default value: 9. >>>>> >>>>> tcp_keepalive_intvl - INTEGER >>>>> How frequently the probes are send out. Multiplied by >>>>> tcp_keepalive_probes it is time to kill not responding connection, >>>>> after probes started. Default value: 75sec i.e. connection >>>>> will be aborted after ~11 minutes of retries. >>>>> >>>>> DB clients don't even enable tcp keepalives IIRC. >>>>> >>>>> So then we depend retransmits timing out: >>>>> tcp_retries2 - INTEGER >>>>> This value influences the timeout of an alive TCP connection, >>>>> when RTO retransmissions remain unacknowledged. >>>>> Given a value of N, a hypothetical TCP connection following >>>>> exponential backoff with an initial RTO of TCP_RTO_MIN would >>>>> retransmit N times before killing the connection at the (N+1)th RTO. >>>>> >>>>> The default value of 15 yields a hypothetical timeout of 924.6 >>>>> seconds and is a lower bound for the effective timeout. >>>>> TCP will effectively time out at the first RTO which exceeds the >>>>> hypothetical timeout. >>>>> >>>>> RFC 1122 recommends at least 100 seconds for the timeout, >>>>> which corresponds to a value of at least 8. >>>>> >>>>> Isn't it possible that the connection suddenly goes down in between a >>>>> transaction request and its reply but with all TCP segments being >>>>> ACKed? >>>> >>>> Right. If the client already sent the request and only waits for a >>>> reply, >>>> the retransmission timeout will not have any effect. An idle TCP session >>>> may stay open practically forever, because it is designed to survive >>>> network >>>> disruptions. There is no such thing as a TCP timeout in a general case. >>>> >>> >>> I was wondering if you have more thoughts on this? Should we go for the >>> approach where OVN sets the probe interval to a "reasonably large" >>> value by default? For example, using 30s would mean failure after ~60 >>> seconds of inactivity. >> >> Actually I’m okay with your proposal to set a reasonable high inactivity >> probe (120/180 seconds?). >> Maybe it should be not only for non-daemon operation of dbctl but also >> for all modes? >> > > It's probably a good idea, yes. > >> Alternatively, I thought it could be useful to add a new command line >> option like --inactivity-probe=N for the utilities to allow the user to >> choose and set interval which is needed in each case. >> >> What do you think about it? >> > > Northd uses NB_Global.options:northd_probe_interval as inactivity > interval. Can we do the same thing and use as default the high value > until the first successful connection to the NB? Good idea, I’ll dig into this next week, thanks. > >>> >>> Thanks, >>> Dumitru >>> >>>>> >>>>>>> On 31 Mar 2023, at 15:55, Dumitru Ceara <dceara@redhat.com> wrote: >>>>>>> >>>>>>> On 3/31/23 14:46, Vladislav Odintsov wrote: >>>>>>>> Hi Dumitru, >>>>>>>> >>>>>>>>> On 31 Mar 2023, at 15:01, Dumitru Ceara <dceara@redhat.com> wrote: >>>>>>>>> >>>>>>>>> On 3/31/23 11:43, Ales Musil wrote: >>>>>>>>>> On Thu, Mar 23, 2023 at 8:25 PM Vladislav Odintsov >>>>>>>>>> <odivlad@gmail.com> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> For large OVN_Southbound DBs defatult interval of 5000 ms >>>>>>>>>>> could be not >>>>>>>>>>> sufficient. This patch disables OVSDB inactivity probes for >>>>>>>>>>> ovn-*ctl >>>>>>>>>>> running >>>>>>>>>>> in non-daemon mode. >>>>>>>>>>> >>>>>>>>>>> Signed-off-by: Vladislav Odintsov <odivlad@gmail.com> >>>>>>>>>>> --- >>>>>>>>>>> utilities/ovn-dbctl.c | 3 +++ >>>>>>>>>>> 1 file changed, 3 insertions(+) >>>>>>>>>>> >>>>>>>>>>> diff --git a/utilities/ovn-dbctl.c b/utilities/ovn-dbctl.c >>>>>>>>>>> index 369a6a663..4307a5cae 100644 >>>>>>>>>>> --- a/utilities/ovn-dbctl.c >>>>>>>>>>> +++ b/utilities/ovn-dbctl.c >>>>>>>>>>> @@ -208,6 +208,9 @@ ovn_dbctl_main(int argc, char *argv[], >>>>>>>>>>> if (daemon_mode) { >>>>>>>>>>> server_loop(dbctl_options, idl, argc, argv_); >>>>>>>>>>> } else { >>>>>>>>>>> + /* Disable OVSDB probe interval for non-daemon mode. */ >>>>>>>>>>> + ovsdb_idl_set_probe_interval(idl, 0); >>>>>>>>> >>>>>>>>> I think I'd avoid using the idl function directly and call instead: >>>>>>>>> >>>>>>>>> set_idl_probe_interval(idl, 0); >>>>>>>>> >>>>>>>>> Just to keep it aligned with all other uses in OVN. I can patch >>>>>>>>> that at >>>>>>>>> apply time if it looks OK to you. >>>>>>>> >>>>>>>> I’ve got no objections here. >>>>>>>> Small nit: set_idl_probe_interval function needs also a remote. >>>>>>>> Like this: >>>>>>>> >>>>>>>> set_idl_probe_interval(idl, db, 0); >>>>>>>> >>>>>>>> Also, please correct typo in commit message: defatult -> default. >>>>>>>> >>>>>>> >>>>>>> In light of the ovs-discuss thread [0] is it maybe better to just set >>>>>>> this probe interval to a very high value instead? That's for the case >>>>>>> when ovn-nbctl/sbctl daemon <-> ovsdb-server connection dies >>>>>>> because of >>>>>>> for example cable being unplugged somewhere on the way between the >>>>>>> two. >>>>>>> >>>>>>> [0] >>>>>>> https://mail.openvswitch.org/pipermail/ovs-discuss/2023-March/052324.html >>>>>>> >>>>>>>>> >>>>>>>>>>> + >>>>>>>>>>> struct ctl_command *commands; >>>>>>>>>>> size_t n_commands; >>>>>>>>>>> char *error; >>>>>>>>>>> -- >>>>>>>>>>> 2.36.1 >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> dev mailing list >>>>>>>>>>> dev@openvswitch.org >>>>>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> Looks good to me, thanks. >>>>>>>>>> >>>>>>>>>> Reviewed-by: Ales Musil <amusil@redhat.com> >>>>>>>>>> >>>>>>>>> >>>>>>>>> Vladislav, Ales, I was thinking of backporting this to stable >>>>>>>>> branches >>>>>>>>> too, what do you think? >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Dumitru >>>>>>>> >>>>>>>> >>>>>>>> Regards, >>>>>>>> Vladislav Odintsov >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> dev mailing list >>>>>>> dev@openvswitch.org <mailto:dev@openvswitch.org> >>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>>>>> >>>>>> >>>>>> Regards, >>>>>> Vladislav Odintsov >>>>>> >>>>>> >>>>> >>>>> _______________________________________________ >>>>> dev mailing list >>>>> dev@openvswitch.org >>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>>> >>> >> >> >> Regards, >> Vladislav Odintsov Regards, Vladislav Odintsov
Hi Dumitru, I’m a bit concerned about our last conversation’s idea how to set inactivity probe. The point is that in general case dbctl module can be used by multiple existing and possible future ovn-*ctl utilities. So, to configure a basic (we may call it a "startup" idl inactivity probe), which is hardcoded, let’s say to 120000 ms, we should do this inside dbctl.c file. But if we’re talking about to read the contents of DB and try to find inactivity probe configuration there - such code must be in particular utility: ovn-sbctl/ovn-nbctl/etc, right? And it seems to me that the second configuration attempt of inactivity probe makes sense only in case of daemon-mode. So it’s needed after initial load of DB contents to read appropriate DB and reconfigure interval. I’ve posted v2 [1], please take a look on it. Let me know if you have any thoughts/concern. 1: https://patchwork.ozlabs.org/project/ovn/patch/20230503005531.4005891-1-odivlad@gmail.com/ > On 18 Apr 2023, at 18:31, Vladislav Odintsov <odivlad@gmail.com> wrote: > > > >> On 18 Apr 2023, at 16:59, Dumitru Ceara <dceara@redhat.com> wrote: >> >> On 4/14/23 15:37, Vladislav Odintsov wrote: >>> Hi Dumitru, >>> >>>> On 13 Apr 2023, at 11:55, Dumitru Ceara <dceara@redhat.com> wrote: >>>> >>>> Hi Vladislav, >>>> >>>> On 4/3/23 12:08, Ilya Maximets wrote: >>>>> On 3/31/23 17:17, Dumitru Ceara wrote: >>>>>> On 3/31/23 16:51, Vladislav Odintsov wrote: >>>>>>> As I understood from Ilya, in case of one-command run of ovn-sbctl >>>>>>> (non-daemon mode), it doesn’t make sense to have client -> server >>>>>>> inactivity probes. If cable is unplugged, will just hit tcp session >>>>>>> timeout, IIUC. >>>>>>> Please correct me if I’m wrong. >>>>>>> >>>>>> >>>>>> You're right but the default timeouts are quite large for TCP sessions: >>>>>> >>>>>> With keepalives default: >>>>>> tcp_keepalive_time - INTEGER >>>>>> How often TCP sends out keepalive messages when keepalive is enabled. >>>>>> Default: 2hours. >>>>>> >>>>>> tcp_keepalive_probes - INTEGER >>>>>> How many keepalive probes TCP sends out, until it decides that the >>>>>> connection is broken. Default value: 9. >>>>>> >>>>>> tcp_keepalive_intvl - INTEGER >>>>>> How frequently the probes are send out. Multiplied by >>>>>> tcp_keepalive_probes it is time to kill not responding connection, >>>>>> after probes started. Default value: 75sec i.e. connection >>>>>> will be aborted after ~11 minutes of retries. >>>>>> >>>>>> DB clients don't even enable tcp keepalives IIRC. >>>>>> >>>>>> So then we depend retransmits timing out: >>>>>> tcp_retries2 - INTEGER >>>>>> This value influences the timeout of an alive TCP connection, >>>>>> when RTO retransmissions remain unacknowledged. >>>>>> Given a value of N, a hypothetical TCP connection following >>>>>> exponential backoff with an initial RTO of TCP_RTO_MIN would >>>>>> retransmit N times before killing the connection at the (N+1)th RTO. >>>>>> >>>>>> The default value of 15 yields a hypothetical timeout of 924.6 >>>>>> seconds and is a lower bound for the effective timeout. >>>>>> TCP will effectively time out at the first RTO which exceeds the >>>>>> hypothetical timeout. >>>>>> >>>>>> RFC 1122 recommends at least 100 seconds for the timeout, >>>>>> which corresponds to a value of at least 8. >>>>>> >>>>>> Isn't it possible that the connection suddenly goes down in between a >>>>>> transaction request and its reply but with all TCP segments being >>>>>> ACKed? >>>>> >>>>> Right. If the client already sent the request and only waits for a >>>>> reply, >>>>> the retransmission timeout will not have any effect. An idle TCP session >>>>> may stay open practically forever, because it is designed to survive >>>>> network >>>>> disruptions. There is no such thing as a TCP timeout in a general case. >>>>> >>>> >>>> I was wondering if you have more thoughts on this? Should we go for the >>>> approach where OVN sets the probe interval to a "reasonably large" >>>> value by default? For example, using 30s would mean failure after ~60 >>>> seconds of inactivity. >>> >>> Actually I’m okay with your proposal to set a reasonable high inactivity >>> probe (120/180 seconds?). >>> Maybe it should be not only for non-daemon operation of dbctl but also >>> for all modes? >>> >> >> It's probably a good idea, yes. >> >>> Alternatively, I thought it could be useful to add a new command line >>> option like --inactivity-probe=N for the utilities to allow the user to >>> choose and set interval which is needed in each case. >>> >>> What do you think about it? >>> >> >> Northd uses NB_Global.options:northd_probe_interval as inactivity >> interval. Can we do the same thing and use as default the high value >> until the first successful connection to the NB? > > Good idea, I’ll dig into this next week, thanks. > >> >>>> >>>> Thanks, >>>> Dumitru >>>> >>>>>> >>>>>>>> On 31 Mar 2023, at 15:55, Dumitru Ceara <dceara@redhat.com> wrote: >>>>>>>> >>>>>>>> On 3/31/23 14:46, Vladislav Odintsov wrote: >>>>>>>>> Hi Dumitru, >>>>>>>>> >>>>>>>>>> On 31 Mar 2023, at 15:01, Dumitru Ceara <dceara@redhat.com> wrote: >>>>>>>>>> >>>>>>>>>> On 3/31/23 11:43, Ales Musil wrote: >>>>>>>>>>> On Thu, Mar 23, 2023 at 8:25 PM Vladislav Odintsov >>>>>>>>>>> <odivlad@gmail.com> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> For large OVN_Southbound DBs defatult interval of 5000 ms >>>>>>>>>>>> could be not >>>>>>>>>>>> sufficient. This patch disables OVSDB inactivity probes for >>>>>>>>>>>> ovn-*ctl >>>>>>>>>>>> running >>>>>>>>>>>> in non-daemon mode. >>>>>>>>>>>> >>>>>>>>>>>> Signed-off-by: Vladislav Odintsov <odivlad@gmail.com> >>>>>>>>>>>> --- >>>>>>>>>>>> utilities/ovn-dbctl.c | 3 +++ >>>>>>>>>>>> 1 file changed, 3 insertions(+) >>>>>>>>>>>> >>>>>>>>>>>> diff --git a/utilities/ovn-dbctl.c b/utilities/ovn-dbctl.c >>>>>>>>>>>> index 369a6a663..4307a5cae 100644 >>>>>>>>>>>> --- a/utilities/ovn-dbctl.c >>>>>>>>>>>> +++ b/utilities/ovn-dbctl.c >>>>>>>>>>>> @@ -208,6 +208,9 @@ ovn_dbctl_main(int argc, char *argv[], >>>>>>>>>>>> if (daemon_mode) { >>>>>>>>>>>> server_loop(dbctl_options, idl, argc, argv_); >>>>>>>>>>>> } else { >>>>>>>>>>>> + /* Disable OVSDB probe interval for non-daemon mode. */ >>>>>>>>>>>> + ovsdb_idl_set_probe_interval(idl, 0); >>>>>>>>>> >>>>>>>>>> I think I'd avoid using the idl function directly and call instead: >>>>>>>>>> >>>>>>>>>> set_idl_probe_interval(idl, 0); >>>>>>>>>> >>>>>>>>>> Just to keep it aligned with all other uses in OVN. I can patch >>>>>>>>>> that at >>>>>>>>>> apply time if it looks OK to you. >>>>>>>>> >>>>>>>>> I’ve got no objections here. >>>>>>>>> Small nit: set_idl_probe_interval function needs also a remote. >>>>>>>>> Like this: >>>>>>>>> >>>>>>>>> set_idl_probe_interval(idl, db, 0); >>>>>>>>> >>>>>>>>> Also, please correct typo in commit message: defatult -> default. >>>>>>>>> >>>>>>>> >>>>>>>> In light of the ovs-discuss thread [0] is it maybe better to just set >>>>>>>> this probe interval to a very high value instead? That's for the case >>>>>>>> when ovn-nbctl/sbctl daemon <-> ovsdb-server connection dies >>>>>>>> because of >>>>>>>> for example cable being unplugged somewhere on the way between the >>>>>>>> two. >>>>>>>> >>>>>>>> [0] >>>>>>>> https://mail.openvswitch.org/pipermail/ovs-discuss/2023-March/052324.html >>>>>>>> >>>>>>>>>> >>>>>>>>>>>> + >>>>>>>>>>>> struct ctl_command *commands; >>>>>>>>>>>> size_t n_commands; >>>>>>>>>>>> char *error; >>>>>>>>>>>> -- >>>>>>>>>>>> 2.36.1 >>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> dev mailing list >>>>>>>>>>>> dev@openvswitch.org >>>>>>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> Looks good to me, thanks. >>>>>>>>>>> >>>>>>>>>>> Reviewed-by: Ales Musil <amusil@redhat.com> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Vladislav, Ales, I was thinking of backporting this to stable >>>>>>>>>> branches >>>>>>>>>> too, what do you think? >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Dumitru >>>>>>>>> >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Vladislav Odintsov >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> dev mailing list >>>>>>>> dev@openvswitch.org <mailto:dev@openvswitch.org> >>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>>>>>> >>>>>>> >>>>>>> Regards, >>>>>>> Vladislav Odintsov >>>>>>> >>>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> dev mailing list >>>>>> dev@openvswitch.org >>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>>>> >>>> >>> >>> >>> Regards, >>> Vladislav Odintsov > > > Regards, > Vladislav Odintsov > > _______________________________________________ > dev mailing list > dev@openvswitch.org <mailto:dev@openvswitch.org> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev Regards, Vladislav Odintsov
diff --git a/utilities/ovn-dbctl.c b/utilities/ovn-dbctl.c index 369a6a663..4307a5cae 100644 --- a/utilities/ovn-dbctl.c +++ b/utilities/ovn-dbctl.c @@ -208,6 +208,9 @@ ovn_dbctl_main(int argc, char *argv[], if (daemon_mode) { server_loop(dbctl_options, idl, argc, argv_); } else { + /* Disable OVSDB probe interval for non-daemon mode. */ + ovsdb_idl_set_probe_interval(idl, 0); + struct ctl_command *commands; size_t n_commands; char *error;
For large OVN_Southbound DBs defatult interval of 5000 ms could be not sufficient. This patch disables OVSDB inactivity probes for ovn-*ctl running in non-daemon mode. Signed-off-by: Vladislav Odintsov <odivlad@gmail.com> --- utilities/ovn-dbctl.c | 3 +++ 1 file changed, 3 insertions(+)