diff mbox series

[ovs-dev,v2] tests: Remove broken "feature inactivity probe" test.

Message ID 20231116135410.1259216-1-dceara@redhat.com
State Accepted
Headers show
Series [ovs-dev,v2] tests: Remove broken "feature inactivity probe" test. | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_ovn-kubernetes success github build: passed
ovsrobot/github-robot-_Build_and_Test fail github build: failed

Commit Message

Dumitru Ceara Nov. 16, 2023, 1:54 p.m. UTC
The test incorrectly assumed that "ovn-nbctl --wait=hv sync" will always
send an OpenFlow barrier to ovs-vswitchd.  That doesn't happen unless
there are other OpenFlow (rule or group) changes that need to be
programmed in the datapath.

An initial attempt to fix this [0] uncovered the fact that there's no
easy way to cover all possible scenarios [1].  It seems that the effort
to fix the test is not justified for the case it tests for (that
inactivity probes are handled by the features module - that is quite
obviously handled in the code).  It's therefore more reasonable to just
skip the test (to avoid noise in CI).

Spotted in CI, mostly on oversubscribed systems.  A log sample that
shows that ovn-controller didn't generate any barrier request for the
nbctl sync request:

  2023-11-15T12:12:22.937Z|00084|vconn|DBG|unix#4: received: OFPT_BARRIER_REQUEST (OF1.5) (xid=0x13):
  2023-11-15T12:12:22.937Z|00085|vconn|DBG|unix#4: sent (Success): OFPT_BARRIER_REPLY (OF1.5) (xid=0x13):
  ...
  2023-11-15T12:12:23.032Z|00090|unixctl|DBG|received request time/warp["60000"], id=0
  2023-11-15T12:12:23.032Z|00091|unixctl|DBG|replying with success, id=0: "warped"
  2023-11-15T12:12:23.042Z|00094|vconn|DBG|unix#3: sent (Success): OFPT_ECHO_REQUEST (OF1.5) (xid=0x0): 0 bytes of payload
  2023-11-15T12:12:23.042Z|00095|vconn|DBG|unix#4: sent (Success): OFPT_ECHO_REQUEST (OF1.5) (xid=0x0): 0 bytes of payload
  2023-11-15T12:12:23.042Z|00097|vconn|DBG|unix#5: sent (Success): OFPT_ECHO_REQUEST (OF1.5) (xid=0x0): 0 bytes of payload
  2023-11-15T12:12:23.042Z|00098|unixctl|DBG|received request time/warp["60000"], id=0
  2023-11-15T12:12:23.042Z|00099|unixctl|DBG|replying with success, id=0: "warped"
  2023-11-15T12:12:23.052Z|00100|rconn|ERR|br-int<->unix#3: no response to inactivity probe after 60 seconds, disconnecting
  2023-11-15T12:12:23.052Z|00101|rconn|ERR|br-int<->unix#4: no response to inactivity probe after 60 seconds, disconnecting
  2023-11-15T12:12:23.052Z|00102|rconn|ERR|br-int<->unix#5: no response to inactivity probe after 60 seconds, disconnecting
  2023-11-15T12:12:23.052Z|00103|unixctl|DBG|received request time/warp["60000"], id=0

[0] https://mail.openvswitch.org/pipermail/ovs-dev/2023-November/409473.html
[1] https://mail.openvswitch.org/pipermail/ovs-dev/2023-November/409530.html

Fixes: ff00142808dc ("controller: Fixed ovs/ovn(features) connection lost when running more than 120 seconds")
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
---
V2:
- removed the test instead of trying to fix it (after discussion with
Xavier).
---
 tests/ovn.at | 31 -------------------------------
 1 file changed, 31 deletions(-)

Comments

Xavier Simonart Nov. 16, 2023, 3:21 p.m. UTC | #1
Hi Dumitru

Looks good to me, thanks.
Acked-by: Xavier Simonart <xsimonar@redhat.com>

Thanks
Xavier

On Thu, Nov 16, 2023 at 2:54 PM Dumitru Ceara <dceara@redhat.com> wrote:
>
> The test incorrectly assumed that "ovn-nbctl --wait=hv sync" will always
> send an OpenFlow barrier to ovs-vswitchd.  That doesn't happen unless
> there are other OpenFlow (rule or group) changes that need to be
> programmed in the datapath.
>
> An initial attempt to fix this [0] uncovered the fact that there's no
> easy way to cover all possible scenarios [1].  It seems that the effort
> to fix the test is not justified for the case it tests for (that
> inactivity probes are handled by the features module - that is quite
> obviously handled in the code).  It's therefore more reasonable to just
> skip the test (to avoid noise in CI).
>
> Spotted in CI, mostly on oversubscribed systems.  A log sample that
> shows that ovn-controller didn't generate any barrier request for the
> nbctl sync request:
>
>   2023-11-15T12:12:22.937Z|00084|vconn|DBG|unix#4: received: OFPT_BARRIER_REQUEST (OF1.5) (xid=0x13):
>   2023-11-15T12:12:22.937Z|00085|vconn|DBG|unix#4: sent (Success): OFPT_BARRIER_REPLY (OF1.5) (xid=0x13):
>   ...
>   2023-11-15T12:12:23.032Z|00090|unixctl|DBG|received request time/warp["60000"], id=0
>   2023-11-15T12:12:23.032Z|00091|unixctl|DBG|replying with success, id=0: "warped"
>   2023-11-15T12:12:23.042Z|00094|vconn|DBG|unix#3: sent (Success): OFPT_ECHO_REQUEST (OF1.5) (xid=0x0): 0 bytes of payload
>   2023-11-15T12:12:23.042Z|00095|vconn|DBG|unix#4: sent (Success): OFPT_ECHO_REQUEST (OF1.5) (xid=0x0): 0 bytes of payload
>   2023-11-15T12:12:23.042Z|00097|vconn|DBG|unix#5: sent (Success): OFPT_ECHO_REQUEST (OF1.5) (xid=0x0): 0 bytes of payload
>   2023-11-15T12:12:23.042Z|00098|unixctl|DBG|received request time/warp["60000"], id=0
>   2023-11-15T12:12:23.042Z|00099|unixctl|DBG|replying with success, id=0: "warped"
>   2023-11-15T12:12:23.052Z|00100|rconn|ERR|br-int<->unix#3: no response to inactivity probe after 60 seconds, disconnecting
>   2023-11-15T12:12:23.052Z|00101|rconn|ERR|br-int<->unix#4: no response to inactivity probe after 60 seconds, disconnecting
>   2023-11-15T12:12:23.052Z|00102|rconn|ERR|br-int<->unix#5: no response to inactivity probe after 60 seconds, disconnecting
>   2023-11-15T12:12:23.052Z|00103|unixctl|DBG|received request time/warp["60000"], id=0
>
> [0] https://mail.openvswitch.org/pipermail/ovs-dev/2023-November/409473.html
> [1] https://mail.openvswitch.org/pipermail/ovs-dev/2023-November/409530.html
>
> Fixes: ff00142808dc ("controller: Fixed ovs/ovn(features) connection lost when running more than 120 seconds")
> Signed-off-by: Dumitru Ceara <dceara@redhat.com>
> ---
> V2:
> - removed the test instead of trying to fix it (after discussion with
> Xavier).
> ---
>  tests/ovn.at | 31 -------------------------------
>  1 file changed, 31 deletions(-)
>
> diff --git a/tests/ovn.at b/tests/ovn.at
> index b8c61f87fb..198387d93e 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -35342,37 +35342,6 @@ OVN_CLEANUP([hv1],[hv2])
>  AT_CLEANUP
>  ])
>
> -OVN_FOR_EACH_NORTHD([
> -AT_SETUP([feature inactivity probe])
> -ovn_start
> -net_add n1
> -
> -sim_add hv1
> -as hv1
> -check ovs-vsctl add-br br-phys
> -ovn_attach n1 br-phys 192.168.0.1
> -
> -dnl Ensure that there are 4 openflow connections.
> -OVS_WAIT_UNTIL([test "$(grep -c 'negotiated OpenFlow version' hv1/ovs-vswitchd.log)" -eq "4"])
> -
> -dnl "Wait" 3 times 60 seconds and ensure ovn-controller writes to the
> -dnl openflow connections in the meantime.  This should allow ovs-vswitchd
> -dnl to probe the openflow connections at least twice.
> -
> -as hv1 ovs-appctl time/warp 60000
> -check ovn-nbctl --wait=hv sync
> -
> -as hv1 ovs-appctl time/warp 60000
> -check ovn-nbctl --wait=hv sync
> -
> -as hv1 ovs-appctl time/warp 60000
> -check ovn-nbctl --wait=hv sync
> -
> -AT_CHECK([test -z "`grep disconnecting hv1/ovs-vswitchd.log`"])
> -OVN_CLEANUP([hv1])
> -AT_CLEANUP
> -])
> -
>  OVN_FOR_EACH_NORTHD([
>  AT_SETUP([Logical flows with Chassis_Template_Var references])
>  AT_KEYWORDS([templates])
> --
> 2.39.3
>
Dumitru Ceara Nov. 16, 2023, 4:31 p.m. UTC | #2
On 11/16/23 16:21, Xavier Simonart wrote:
> Hi Dumitru
> 
> Looks good to me, thanks.
> Acked-by: Xavier Simonart <xsimonar@redhat.com>
> 

Thanks!  Applied and backported down to 22.03.  One flaky test less to
worry about! :)

> Thanks
> Xavier
> 
> On Thu, Nov 16, 2023 at 2:54 PM Dumitru Ceara <dceara@redhat.com> wrote:
>>
>> The test incorrectly assumed that "ovn-nbctl --wait=hv sync" will always
>> send an OpenFlow barrier to ovs-vswitchd.  That doesn't happen unless
>> there are other OpenFlow (rule or group) changes that need to be
>> programmed in the datapath.
>>
>> An initial attempt to fix this [0] uncovered the fact that there's no
>> easy way to cover all possible scenarios [1].  It seems that the effort
>> to fix the test is not justified for the case it tests for (that
>> inactivity probes are handled by the features module - that is quite
>> obviously handled in the code).  It's therefore more reasonable to just
>> skip the test (to avoid noise in CI).
>>
>> Spotted in CI, mostly on oversubscribed systems.  A log sample that
>> shows that ovn-controller didn't generate any barrier request for the
>> nbctl sync request:
>>
>>   2023-11-15T12:12:22.937Z|00084|vconn|DBG|unix#4: received: OFPT_BARRIER_REQUEST (OF1.5) (xid=0x13):
>>   2023-11-15T12:12:22.937Z|00085|vconn|DBG|unix#4: sent (Success): OFPT_BARRIER_REPLY (OF1.5) (xid=0x13):
>>   ...
>>   2023-11-15T12:12:23.032Z|00090|unixctl|DBG|received request time/warp["60000"], id=0
>>   2023-11-15T12:12:23.032Z|00091|unixctl|DBG|replying with success, id=0: "warped"
>>   2023-11-15T12:12:23.042Z|00094|vconn|DBG|unix#3: sent (Success): OFPT_ECHO_REQUEST (OF1.5) (xid=0x0): 0 bytes of payload
>>   2023-11-15T12:12:23.042Z|00095|vconn|DBG|unix#4: sent (Success): OFPT_ECHO_REQUEST (OF1.5) (xid=0x0): 0 bytes of payload
>>   2023-11-15T12:12:23.042Z|00097|vconn|DBG|unix#5: sent (Success): OFPT_ECHO_REQUEST (OF1.5) (xid=0x0): 0 bytes of payload
>>   2023-11-15T12:12:23.042Z|00098|unixctl|DBG|received request time/warp["60000"], id=0
>>   2023-11-15T12:12:23.042Z|00099|unixctl|DBG|replying with success, id=0: "warped"
>>   2023-11-15T12:12:23.052Z|00100|rconn|ERR|br-int<->unix#3: no response to inactivity probe after 60 seconds, disconnecting
>>   2023-11-15T12:12:23.052Z|00101|rconn|ERR|br-int<->unix#4: no response to inactivity probe after 60 seconds, disconnecting
>>   2023-11-15T12:12:23.052Z|00102|rconn|ERR|br-int<->unix#5: no response to inactivity probe after 60 seconds, disconnecting
>>   2023-11-15T12:12:23.052Z|00103|unixctl|DBG|received request time/warp["60000"], id=0
>>
>> [0] https://mail.openvswitch.org/pipermail/ovs-dev/2023-November/409473.html
>> [1] https://mail.openvswitch.org/pipermail/ovs-dev/2023-November/409530.html
>>
>> Fixes: ff00142808dc ("controller: Fixed ovs/ovn(features) connection lost when running more than 120 seconds")
>> Signed-off-by: Dumitru Ceara <dceara@redhat.com>
>> ---
>> V2:
>> - removed the test instead of trying to fix it (after discussion with
>> Xavier).
>> ---
>>  tests/ovn.at | 31 -------------------------------
>>  1 file changed, 31 deletions(-)
>>
>> diff --git a/tests/ovn.at b/tests/ovn.at
>> index b8c61f87fb..198387d93e 100644
>> --- a/tests/ovn.at
>> +++ b/tests/ovn.at
>> @@ -35342,37 +35342,6 @@ OVN_CLEANUP([hv1],[hv2])
>>  AT_CLEANUP
>>  ])
>>
>> -OVN_FOR_EACH_NORTHD([
>> -AT_SETUP([feature inactivity probe])
>> -ovn_start
>> -net_add n1
>> -
>> -sim_add hv1
>> -as hv1
>> -check ovs-vsctl add-br br-phys
>> -ovn_attach n1 br-phys 192.168.0.1
>> -
>> -dnl Ensure that there are 4 openflow connections.
>> -OVS_WAIT_UNTIL([test "$(grep -c 'negotiated OpenFlow version' hv1/ovs-vswitchd.log)" -eq "4"])
>> -
>> -dnl "Wait" 3 times 60 seconds and ensure ovn-controller writes to the
>> -dnl openflow connections in the meantime.  This should allow ovs-vswitchd
>> -dnl to probe the openflow connections at least twice.
>> -
>> -as hv1 ovs-appctl time/warp 60000
>> -check ovn-nbctl --wait=hv sync
>> -
>> -as hv1 ovs-appctl time/warp 60000
>> -check ovn-nbctl --wait=hv sync
>> -
>> -as hv1 ovs-appctl time/warp 60000
>> -check ovn-nbctl --wait=hv sync
>> -
>> -AT_CHECK([test -z "`grep disconnecting hv1/ovs-vswitchd.log`"])
>> -OVN_CLEANUP([hv1])
>> -AT_CLEANUP
>> -])
>> -
>>  OVN_FOR_EACH_NORTHD([
>>  AT_SETUP([Logical flows with Chassis_Template_Var references])
>>  AT_KEYWORDS([templates])
>> --
>> 2.39.3
>>
>
diff mbox series

Patch

diff --git a/tests/ovn.at b/tests/ovn.at
index b8c61f87fb..198387d93e 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -35342,37 +35342,6 @@  OVN_CLEANUP([hv1],[hv2])
 AT_CLEANUP
 ])
 
-OVN_FOR_EACH_NORTHD([
-AT_SETUP([feature inactivity probe])
-ovn_start
-net_add n1
-
-sim_add hv1
-as hv1
-check ovs-vsctl add-br br-phys
-ovn_attach n1 br-phys 192.168.0.1
-
-dnl Ensure that there are 4 openflow connections.
-OVS_WAIT_UNTIL([test "$(grep -c 'negotiated OpenFlow version' hv1/ovs-vswitchd.log)" -eq "4"])
-
-dnl "Wait" 3 times 60 seconds and ensure ovn-controller writes to the
-dnl openflow connections in the meantime.  This should allow ovs-vswitchd
-dnl to probe the openflow connections at least twice.
-
-as hv1 ovs-appctl time/warp 60000
-check ovn-nbctl --wait=hv sync
-
-as hv1 ovs-appctl time/warp 60000
-check ovn-nbctl --wait=hv sync
-
-as hv1 ovs-appctl time/warp 60000
-check ovn-nbctl --wait=hv sync
-
-AT_CHECK([test -z "`grep disconnecting hv1/ovs-vswitchd.log`"])
-OVN_CLEANUP([hv1])
-AT_CLEANUP
-])
-
 OVN_FOR_EACH_NORTHD([
 AT_SETUP([Logical flows with Chassis_Template_Var references])
 AT_KEYWORDS([templates])