Message ID | 20211001132624.3248475-1-xsimonar@redhat.com |
---|---|
State | Superseded |
Headers | show |
Series | [ovs-dev] ovn.at: Fix flaky tests "VLAN transparency, passthru=true, multiple hosts" | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_Build_and_Test | success | github build: passed |
ovsrobot/github-robot-_ovn-kubernetes | fail | github build: failed |
On 01/10/2021 14:26, Xavier Simonart wrote: > Tests were waiting for ports to be reported up before sending packets. > However, waiting for both ports to be up is not enough to guarantee > that all flows are installed for both ports. > We now wait for last flows (implementing switching to localnet port) > are installed. > Following tests were are fixed: > - VLAN transparency, passthru=true, multiple hosts > - VLAN transparency, passthru=true, multiple hosts, custom ethtype > - VLAN transparency, passthru=true, multiple hosts, flat/untagged > > Signed-off-by: Xavier Simonart <xsimonar@redhat.com> > --- Looks good other than the comment I have in: (ovn.at: Fix flaky test "controller I-P handling with monitoring disabled") https://mail.openvswitch.org/pipermail/ovs-dev/2021-October/388223.html > tests/ovn.at | 21 ++++++++++++++++++++- > 1 file changed, 20 insertions(+), 1 deletion(-) > > diff --git a/tests/ovn.at b/tests/ovn.at > index fc8f31d06..cc940701f 100644 > --- a/tests/ovn.at > +++ b/tests/ovn.at > @@ -3413,6 +3413,14 @@ for i in 1 2; do > options:rxq_pcap=vif$i-rx.pcap \ > ofport-request=$i > OVS_WAIT_UNTIL([test x`ovn-nbctl lsp-get-up lsp$i` = xup]) > + # Waiting for both ports to be up is not enough to guarantee that all flows are installed. > + # Wait for flows implementing switching to localnet port. > + OVS_WAIT_UNTIL([ > + test $(as hv-$i ovs-ofctl dump-flows br-int | \ > + grep "table=38" | \ > + grep "resubmit(,38)" -c) -eq 1 > + ]) > + > done > > test_packet() { > @@ -3478,8 +3486,13 @@ for i in 1 2; do > options:rxq_pcap=vif$i-rx.pcap \ > ofport-request=$i > wait_for_ports_up lsp$i > -done > + OVS_WAIT_UNTIL([ > + test $(ovs-ofctl dump-flows br-int | \ > + grep "table=38" | \ > + grep "resubmit(,38)" -c) -eq 1 > + ]) > > +done > # create taps on fabric to check vlan encapsulation there > for i in 1 2; do > as hv-$i > @@ -3561,6 +3574,12 @@ for i in 1 2; do > options:rxq_pcap=vif$i-rx.pcap \ > ofport-request=$i > OVS_WAIT_UNTIL([test x`ovn-nbctl lsp-get-up lsp$i` = xup]) > + OVS_WAIT_UNTIL([ > + test $(ovs-ofctl dump-flows br-int | \ > + grep "table=38" | \ > + grep "resubmit(,38)" -c) -eq 1 > + ]) > + > done > > for i in 1 2; do >
*blows the dust off* Sorry this one has sat for so long. See my comments below. On 10/1/21 11:12, Mark Gray wrote: > On 01/10/2021 14:26, Xavier Simonart wrote: >> Tests were waiting for ports to be reported up before sending packets. >> However, waiting for both ports to be up is not enough to guarantee >> that all flows are installed for both ports. >> We now wait for last flows (implementing switching to localnet port) >> are installed. >> Following tests were are fixed: >> - VLAN transparency, passthru=true, multiple hosts >> - VLAN transparency, passthru=true, multiple hosts, custom ethtype >> - VLAN transparency, passthru=true, multiple hosts, flat/untagged >> >> Signed-off-by: Xavier Simonart <xsimonar@redhat.com> >> --- > Looks good other than the comment I have in: > (ovn.at: Fix flaky test "controller I-P handling with monitoring disabled") > > https://mail.openvswitch.org/pipermail/ovs-dev/2021-October/388223.html > >> tests/ovn.at | 21 ++++++++++++++++++++- >> 1 file changed, 20 insertions(+), 1 deletion(-) >> >> diff --git a/tests/ovn.at b/tests/ovn.at >> index fc8f31d06..cc940701f 100644 >> --- a/tests/ovn.at >> +++ b/tests/ovn.at >> @@ -3413,6 +3413,14 @@ for i in 1 2; do >> options:rxq_pcap=vif$i-rx.pcap \ >> ofport-request=$i >> OVS_WAIT_UNTIL([test x`ovn-nbctl lsp-get-up lsp$i` = xup]) >> + # Waiting for both ports to be up is not enough to guarantee that all flows are installed. >> + # Wait for flows implementing switching to localnet port. >> + OVS_WAIT_UNTIL([ >> + test $(as hv-$i ovs-ofctl dump-flows br-int | \ >> + grep "table=38" | \ >> + grep "resubmit(,38)" -c) -eq 1 >> + ]) >> + The concern Mark Gray raised is valid, since a change in table numbers would require updates to this test in order to continue to pass. We actually have a more reliable method of testing if a port's OF flows have been installed. When a port's associated OF flows are installed, we set external_ids:ovn-installed=true on the Port_Binding in the southbound database. I think the following should work: wait_row_count Port_Binding 1 logical_port=lsp$i external_ids:ovn-installed=true This will wait until there is 1 row in the Port_Binding table whose logical_port is set to lsp$i and whose external_ids:ovn-installed is set to true. >> done >> >> test_packet() { >> @@ -3478,8 +3486,13 @@ for i in 1 2; do >> options:rxq_pcap=vif$i-rx.pcap \ >> ofport-request=$i >> wait_for_ports_up lsp$i >> -done >> + OVS_WAIT_UNTIL([ >> + test $(ovs-ofctl dump-flows br-int | \ >> + grep "table=38" | \ >> + grep "resubmit(,38)" -c) -eq 1 >> + ]) >> >> +done >> # create taps on fabric to check vlan encapsulation there >> for i in 1 2; do >> as hv-$i >> @@ -3561,6 +3574,12 @@ for i in 1 2; do >> options:rxq_pcap=vif$i-rx.pcap \ >> ofport-request=$i >> OVS_WAIT_UNTIL([test x`ovn-nbctl lsp-get-up lsp$i` = xup]) >> + OVS_WAIT_UNTIL([ >> + test $(ovs-ofctl dump-flows br-int | \ >> + grep "table=38" | \ >> + grep "resubmit(,38)" -c) -eq 1 >> + ]) >> + >> done >> >> for i in 1 2; do >> > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >
Hi Mark Thanks for looking into this. I agreed that the solution is not optimal as it would require updates if table numbers are changing, but it was the best I had found at that time ... For the solution you propose, do you mean looking at ovn-installed in *ovs* - I do not think that there is an ovn-installed in sbdb ? There is a "Port up" in sbdb and "ovn-installed" in ovs. But both might be written quite before all flows are properly set up. I did raise the question <https://www.mail-archive.com/ovs-dev@openvswitch.org/msg59901.html> some time ago about the potential solutions (fix ovn, fix test cases, ...) and consensus seemed to be to try fixing the test case. In addition to the case reported upstream, patch port is installed later in this test case, and flows for the patch port even later.... so more reasons for the test to fail. However, I now think that, for this test case, we can do better than checking table numbers: - (1) check that the patch port is created in OVS. - (2) wait for flows containing output:"patch_port_id" in Openflow - as those flows are installed after the patch port has been created. Note that checking whether patch port is created (1) is not enough - it is decreasing the test failure rate, but there are still failures. What do you think? Thanks Xvaier On Tue, Feb 1, 2022 at 3:51 PM Mark Michelson <mmichels@redhat.com> wrote: > *blows the dust off* > > Sorry this one has sat for so long. See my comments below. > > On 10/1/21 11:12, Mark Gray wrote: > > On 01/10/2021 14:26, Xavier Simonart wrote: > >> Tests were waiting for ports to be reported up before sending packets. > >> However, waiting for both ports to be up is not enough to guarantee > >> that all flows are installed for both ports. > >> We now wait for last flows (implementing switching to localnet port) > >> are installed. > >> Following tests were are fixed: > >> - VLAN transparency, passthru=true, multiple hosts > >> - VLAN transparency, passthru=true, multiple hosts, custom ethtype > >> - VLAN transparency, passthru=true, multiple hosts, flat/untagged > >> > >> Signed-off-by: Xavier Simonart <xsimonar@redhat.com> > >> --- > > Looks good other than the comment I have in: > > (ovn.at: Fix flaky test "controller I-P handling with monitoring > disabled") > > > > https://mail.openvswitch.org/pipermail/ovs-dev/2021-October/388223.html > > > >> tests/ovn.at | 21 ++++++++++++++++++++- > >> 1 file changed, 20 insertions(+), 1 deletion(-) > >> > >> diff --git a/tests/ovn.at b/tests/ovn.at > >> index fc8f31d06..cc940701f 100644 > >> --- a/tests/ovn.at > >> +++ b/tests/ovn.at > >> @@ -3413,6 +3413,14 @@ for i in 1 2; do > >> options:rxq_pcap=vif$i-rx.pcap \ > >> ofport-request=$i > >> OVS_WAIT_UNTIL([test x`ovn-nbctl lsp-get-up lsp$i` = xup]) > >> + # Waiting for both ports to be up is not enough to guarantee that > all flows are installed. > >> + # Wait for flows implementing switching to localnet port. > >> + OVS_WAIT_UNTIL([ > >> + test $(as hv-$i ovs-ofctl dump-flows br-int | \ > >> + grep "table=38" | \ > >> + grep "resubmit(,38)" -c) -eq 1 > >> + ]) > >> + > > The concern Mark Gray raised is valid, since a change in table numbers > would require updates to this test in order to continue to pass. > > We actually have a more reliable method of testing if a port's OF flows > have been installed. When a port's associated OF flows are installed, we > set external_ids:ovn-installed=true on the Port_Binding in the > southbound database. > > I think the following should work: > > wait_row_count Port_Binding 1 logical_port=lsp$i > external_ids:ovn-installed=true > > This will wait until there is 1 row in the Port_Binding table whose > logical_port is set to lsp$i and whose external_ids:ovn-installed is set > to true. > > >> done > >> > >> test_packet() { > >> @@ -3478,8 +3486,13 @@ for i in 1 2; do > >> options:rxq_pcap=vif$i-rx.pcap \ > >> ofport-request=$i > >> wait_for_ports_up lsp$i > >> -done > >> + OVS_WAIT_UNTIL([ > >> + test $(ovs-ofctl dump-flows br-int | \ > >> + grep "table=38" | \ > >> + grep "resubmit(,38)" -c) -eq 1 > >> + ]) > >> > >> +done > >> # create taps on fabric to check vlan encapsulation there > >> for i in 1 2; do > >> as hv-$i > >> @@ -3561,6 +3574,12 @@ for i in 1 2; do > >> options:rxq_pcap=vif$i-rx.pcap \ > >> ofport-request=$i > >> OVS_WAIT_UNTIL([test x`ovn-nbctl lsp-get-up lsp$i` = xup]) > >> + OVS_WAIT_UNTIL([ > >> + test $(ovs-ofctl dump-flows br-int | \ > >> + grep "table=38" | \ > >> + grep "resubmit(,38)" -c) -eq 1 > >> + ]) > >> + > >> done > >> > >> for i in 1 2; do > >> > > > > _______________________________________________ > > dev mailing list > > dev@openvswitch.org > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > > >
On 2/3/22 06:21, Xavier Simonart wrote: > Hi Mark > > Thanks for looking into this. > > I agreed that the solution is not optimal as it would require updates if > table numbers are changing, but it was the best I had found at that time ... > > For the solution you propose, do you mean looking at ovn-installed in > _ovs_ - I do not think that there is an ovn-installed in sbdb ? Sorry, you are correct, it's in the OVS Interface table that corresponds with the SB port binding. Unfortunately, wait_row_count (and other functions/macros in ovn-macros.at) do not work on the OVS database. So to get the same functionality, you would need to copy the logic of wait_row_count to determine when an OVS Interface has external_ids:iface-id set to the logical port name and external_ids:ovn-installed set to true...or make the ovn-macros work with the OVS database :) > There is a "Port up" in sbdb and "ovn-installed" in ovs. But both might > be written quite before all flows are properly set up. > > I did raise thequestion > <https://www.mail-archive.com/ovs-dev@openvswitch.org/msg59901.html> > some time ago about the potential solutions (fix ovn, fix test cases, > ...) and consensus seemed to be to try fixing the test case. > In addition to the case reported upstream, patch port is installed later > in this test case, and flows for the patch port even later.... so more > reasons for the test to fail. > > However, I now think that, for this test case, we can do better than > checking table numbers: > - (1) check that the patch port is created in OVS. > - (2) wait for flows containing output:"patch_port_id" in Openflow - as > those flows are installed after the patch port has been created. > Note that checking whether patch port is created (1) is not enough - it > is decreasing the test failure rate, but there are still failures. > What do you think? Yes, I think this can work, too. Checking "output" actions should be reliable even in the face of changing table numbers. Feel free to give it a try if you think it will be easier or more reliable than using ovn-installed. > > Thanks > Xvaier > > > On Tue, Feb 1, 2022 at 3:51 PM Mark Michelson <mmichels@redhat.com > <mailto:mmichels@redhat.com>> wrote: > > *blows the dust off* > > Sorry this one has sat for so long. See my comments below. > > On 10/1/21 11:12, Mark Gray wrote: > > On 01/10/2021 14:26, Xavier Simonart wrote: > >> Tests were waiting for ports to be reported up before sending > packets. > >> However, waiting for both ports to be up is not enough to guarantee > >> that all flows are installed for both ports. > >> We now wait for last flows (implementing switching to localnet port) > >> are installed. > >> Following tests were are fixed: > >> - VLAN transparency, passthru=true, multiple hosts > >> - VLAN transparency, passthru=true, multiple hosts, custom ethtype > >> - VLAN transparency, passthru=true, multiple hosts, flat/untagged > >> > >> Signed-off-by: Xavier Simonart <xsimonar@redhat.com > <mailto:xsimonar@redhat.com>> > >> --- > > Looks good other than the comment I have in: > > (ovn.at <http://ovn.at>: Fix flaky test "controller I-P handling > with monitoring disabled") > > > > > https://mail.openvswitch.org/pipermail/ovs-dev/2021-October/388223.html > <https://mail.openvswitch.org/pipermail/ovs-dev/2021-October/388223.html> > > > >> tests/ovn.at <http://ovn.at> | 21 ++++++++++++++++++++- > >> 1 file changed, 20 insertions(+), 1 deletion(-) > >> > >> diff --git a/tests/ovn.at <http://ovn.at> b/tests/ovn.at > <http://ovn.at> > >> index fc8f31d06..cc940701f 100644 > >> --- a/tests/ovn.at <http://ovn.at> > >> +++ b/tests/ovn.at <http://ovn.at> > >> @@ -3413,6 +3413,14 @@ for i in 1 2; do > >> options:rxq_pcap=vif$i-rx.pcap \ > >> ofport-request=$i > >> OVS_WAIT_UNTIL([test x`ovn-nbctl lsp-get-up lsp$i` = xup]) > >> + # Waiting for both ports to be up is not enough to > guarantee that all flows are installed. > >> + # Wait for flows implementing switching to localnet port. > >> + OVS_WAIT_UNTIL([ > >> + test $(as hv-$i ovs-ofctl dump-flows br-int | \ > >> + grep "table=38" | \ > >> + grep "resubmit(,38)" -c) -eq 1 > >> + ]) > >> + > > The concern Mark Gray raised is valid, since a change in table numbers > would require updates to this test in order to continue to pass. > > We actually have a more reliable method of testing if a port's OF flows > have been installed. When a port's associated OF flows are > installed, we > set external_ids:ovn-installed=true on the Port_Binding in the > southbound database. > > I think the following should work: > > wait_row_count Port_Binding 1 logical_port=lsp$i > external_ids:ovn-installed=true > > This will wait until there is 1 row in the Port_Binding table whose > logical_port is set to lsp$i and whose external_ids:ovn-installed is > set > to true. > > >> done > >> > >> test_packet() { > >> @@ -3478,8 +3486,13 @@ for i in 1 2; do > >> options:rxq_pcap=vif$i-rx.pcap \ > >> ofport-request=$i > >> wait_for_ports_up lsp$i > >> -done > >> + OVS_WAIT_UNTIL([ > >> + test $(ovs-ofctl dump-flows br-int | \ > >> + grep "table=38" | \ > >> + grep "resubmit(,38)" -c) -eq 1 > >> + ]) > >> > >> +done > >> # create taps on fabric to check vlan encapsulation there > >> for i in 1 2; do > >> as hv-$i > >> @@ -3561,6 +3574,12 @@ for i in 1 2; do > >> options:rxq_pcap=vif$i-rx.pcap \ > >> ofport-request=$i > >> OVS_WAIT_UNTIL([test x`ovn-nbctl lsp-get-up lsp$i` = xup]) > >> + OVS_WAIT_UNTIL([ > >> + test $(ovs-ofctl dump-flows br-int | \ > >> + grep "table=38" | \ > >> + grep "resubmit(,38)" -c) -eq 1 > >> + ]) > >> + > >> done > >> > >> for i in 1 2; do > >> > > > > _______________________________________________ > > dev mailing list > > dev@openvswitch.org <mailto:dev@openvswitch.org> > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > <https://mail.openvswitch.org/mailman/listinfo/ovs-dev> > > >
diff --git a/tests/ovn.at b/tests/ovn.at index fc8f31d06..cc940701f 100644 --- a/tests/ovn.at +++ b/tests/ovn.at @@ -3413,6 +3413,14 @@ for i in 1 2; do options:rxq_pcap=vif$i-rx.pcap \ ofport-request=$i OVS_WAIT_UNTIL([test x`ovn-nbctl lsp-get-up lsp$i` = xup]) + # Waiting for both ports to be up is not enough to guarantee that all flows are installed. + # Wait for flows implementing switching to localnet port. + OVS_WAIT_UNTIL([ + test $(as hv-$i ovs-ofctl dump-flows br-int | \ + grep "table=38" | \ + grep "resubmit(,38)" -c) -eq 1 + ]) + done test_packet() { @@ -3478,8 +3486,13 @@ for i in 1 2; do options:rxq_pcap=vif$i-rx.pcap \ ofport-request=$i wait_for_ports_up lsp$i -done + OVS_WAIT_UNTIL([ + test $(ovs-ofctl dump-flows br-int | \ + grep "table=38" | \ + grep "resubmit(,38)" -c) -eq 1 + ]) +done # create taps on fabric to check vlan encapsulation there for i in 1 2; do as hv-$i @@ -3561,6 +3574,12 @@ for i in 1 2; do options:rxq_pcap=vif$i-rx.pcap \ ofport-request=$i OVS_WAIT_UNTIL([test x`ovn-nbctl lsp-get-up lsp$i` = xup]) + OVS_WAIT_UNTIL([ + test $(ovs-ofctl dump-flows br-int | \ + grep "table=38" | \ + grep "resubmit(,38)" -c) -eq 1 + ]) + done for i in 1 2; do
Tests were waiting for ports to be reported up before sending packets. However, waiting for both ports to be up is not enough to guarantee that all flows are installed for both ports. We now wait for last flows (implementing switching to localnet port) are installed. Following tests were are fixed: - VLAN transparency, passthru=true, multiple hosts - VLAN transparency, passthru=true, multiple hosts, custom ethtype - VLAN transparency, passthru=true, multiple hosts, flat/untagged Signed-off-by: Xavier Simonart <xsimonar@redhat.com> --- tests/ovn.at | 21 ++++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-)