Message ID | 3957de4e1a47075fd8251156c01ec06cad9ba60a.1702659678.git.lorenzo.bianconi@redhat.com |
---|---|
State | Changes Requested |
Headers | show |
Series | [ovs-dev] test: add dedicated test for garp-max-timeout | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_Build_and_Test | success | github build: passed |
ovsrobot/github-robot-_ovn-kubernetes | success | github build: passed |
On 12/15/23 18:01, Lorenzo Bianconi wrote: > Introduce a dedicated test for garp-max-timeout knob > For context of the review, I had asked for this "standalone" test for garp-max-timeout so that we can backport the following patch to 21.12: https://mail.openvswitch.org/pipermail/ovs-dev/2023-December/410247.html > Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> > --- > tests/ovn.at | 82 ++++++++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 82 insertions(+) > > diff --git a/tests/ovn.at b/tests/ovn.at > index 918f97a9e..25b101a7a 100644 > --- a/tests/ovn.at > +++ b/tests/ovn.at > @@ -37464,3 +37464,85 @@ wait_for_ports_up > OVN_CLEANUP([hv1]) > AT_CLEANUP > ]) > + > +OVN_FOR_EACH_NORTHD([ > +AT_SETUP([gratuitous arp max timeout]) > +AT_KEYWORDS([garp-timeout]) > +AT_SKIP_IF([test $HAVE_TCPDUMP = no]) Nit: We don't really need tcpdump, we could use the ovs-pcap utility directly. But it's not mandatory I guess, we already have a bunch of tests using tcpdump. > +ovn_start > + > +ovn-nbctl ls-add ls0 > +ovn-nbctl lr-add lr0 > +ovn-nbctl lrp-add lr0 lr0-ls0 f0:00:00:00:00:01 192.168.0.1/24 > +ovn-nbctl lsp-add ls0 ls0-lr0 -- set Logical_Switch_Port ls0-lr0 \ > + type=router options:router-port=lr0-ls0 addresses='"f0:00:00:00:00:01"' > + > +ovn-nbctl lsp-add ls0 ln_port > +ovn-nbctl lsp-set-addresses ln_port unknown > +ovn-nbctl lsp-set-type ln_port localnet > +ovn-nbctl --wait=hv lsp-set-options ln_port network_name=physnet1 Missing "check .." for all these. > + > +# Prepare packets > +touch empty_expected > +echo "fffffffffffff0000000000108060001080006040001f00000000001c0a80001000000000000c0a80001" > arp_expected Please use fmt_pkt instead. > + > +net_add n1 > +sim_add hv1 > +as hv1 > +ovs-vsctl \ > + -- add-br br-phys \ > + -- add-br br-eth0 > + > +ovn_attach n1 br-phys 192.168.0.10 > + > +AT_CHECK([ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=physnet1:br-eth0]) > +AT_CHECK([ovs-vsctl add-port br-eth0 snoopvif -- set Interface snoopvif options:tx_pcap=hv1/snoopvif-tx.pcap options:rxq_pcap=hv1/snoopvif-rx.pcap]) > + > +ovn-sbctl dump-flows > sbflows > +AT_CAPTURE_FILE([sbflows]) > + Why do we need this? > +# Wait until the patch ports are created in hv1 to connect br-int to br-eth0 > +AT_CHECK([ovn-nbctl set logical_router lr0 options:chassis=hv1]) > +OVN_WAIT_PATCH_PORT_FLOWS(["ln_port"], ["hv1"]) Using OVN_WAIT_PATCH_PORT_FLOWS makes the patch harder to backport to 21.12. Do we really need this? > +# Temporarily remove lr0 chassis > +# Wait for hv confirmation to make sure chassis is removed before we reset pcap > +# Otherwise a garp might be sent after pcap have been reset but before chassis is removed > +AT_CHECK([ovn-nbctl --wait=hv remove logical_router lr0 options chassis]) > + Why do we need to remove the "chassis" option on lr0? > +as hv1 reset_pcap_file snoopvif hv1/snoopvif > +hv1_uuid=$(ovn-sbctl --bare --columns _uuid list chassis hv1) > +AT_CHECK([ovn-nbctl set logical_router lr0 options:chassis=hv1]) > +OVS_WAIT_UNTIL([ > + ls0_lr0=$(ovn-sbctl --bare --columns chassis list port_binding ls0-lr0) > + test "$ls0_lr0" = $hv1_uuid > +]) > + > +OVN_CHECK_PACKETS_CONTAIN([hv1/snoopvif-tx.pcap], [arp_expected]) This checks that one ARP packet is there, why? The test is about garp-max-timeout. > +# Temporarily remove lr0 chassis > +AT_CHECK([ovn-nbctl --wait=hv remove logical_router lr0 options chassis]) > + Why is this needed? > +as hv1 reset_pcap_file snoopvif hv1/snoopvif > +# set garp max timeout to 2s > +AT_CHECK([as hv1 ovs-vsctl set Open_vSwitch . external-ids:garp-max-timeout-sec=2]) > +AT_CHECK([ovn-nbctl --wait=hv set logical_router lr0 options:chassis=hv1]) > + > +OVS_WAIT_UNTIL([ > +n_arp=$(tcpdump -c 10 -ner hv1/snoopvif-tx.pcap arp | wc -l) > +test "$n_arp" = 10 > +]) This doesn't actually test the timeout value. It will wait until 10 packets are received; they could be received at any intervals and the test will pass as long as it doesn't take more than OVS_TIMEOUT to receive all 10 packets. > + > +AT_CHECK([ovn-nbctl --wait=hv remove logical_router lr0 options chassis]) > +as hv1 reset_pcap_file snoopvif hv1/snoopvif > +# set garp max timeout to 2s > +AT_CHECK([as hv1 ovs-vsctl set Open_vSwitch . external-ids:garp-max-timeout-sec=1]) The comment says "2s" but we set the value to 1 second. > +AT_CHECK([ovn-nbctl --wait=hv set logical_router lr0 options:chassis=hv1]) I'm still confused about why we keep changing the options:chassis value. > + > +OVS_WAIT_UNTIL([ > +n_arp=$(tcpdump -c 20 -ner hv1/snoopvif-tx.pcap arp | wc -l) > +test "$n_arp" = 20 > +]) > + Same comment as above about not really testing the timeout. > +OVN_CLEANUP([hv1]) > + > +AT_CLEANUP > +]) I'm not sure whether we can do better than: clear pcap file, configure "x seconds" timeout, sleep for "x * (y + 1)" seconds, check that the the pcap contains at least y arp packets. But that might be enough. We could also tag the test as "unstable" (TAG_UNSTABLE) pro-actively so that it gets rechecked on failure in GitHub actions. What do you think? Thanks, Dumitru
> On 12/15/23 18:01, Lorenzo Bianconi wrote: > > Introduce a dedicated test for garp-max-timeout knob > > > > For context of the review, I had asked for this "standalone" test for > garp-max-timeout so that we can backport the following patch to 21.12: > > https://mail.openvswitch.org/pipermail/ovs-dev/2023-December/410247.html Hi Dumitru, thx for thre review > > > Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> > > --- > > tests/ovn.at | 82 ++++++++++++++++++++++++++++++++++++++++++++++++++++ > > 1 file changed, 82 insertions(+) > > > > diff --git a/tests/ovn.at b/tests/ovn.at > > index 918f97a9e..25b101a7a 100644 > > --- a/tests/ovn.at > > +++ b/tests/ovn.at > > @@ -37464,3 +37464,85 @@ wait_for_ports_up > > OVN_CLEANUP([hv1]) > > AT_CLEANUP > > ]) > > + > > +OVN_FOR_EACH_NORTHD([ > > +AT_SETUP([gratuitous arp max timeout]) > > +AT_KEYWORDS([garp-timeout]) > > +AT_SKIP_IF([test $HAVE_TCPDUMP = no]) > > Nit: We don't really need tcpdump, we could use the ovs-pcap utility > directly. But it's not mandatory I guess, we already have a bunch of > tests using tcpdump. > > > +ovn_start > > + > > +ovn-nbctl ls-add ls0 > > +ovn-nbctl lr-add lr0 > > +ovn-nbctl lrp-add lr0 lr0-ls0 f0:00:00:00:00:01 192.168.0.1/24 > > +ovn-nbctl lsp-add ls0 ls0-lr0 -- set Logical_Switch_Port ls0-lr0 \ > > + type=router options:router-port=lr0-ls0 addresses='"f0:00:00:00:00:01"' > > + > > +ovn-nbctl lsp-add ls0 ln_port > > +ovn-nbctl lsp-set-addresses ln_port unknown > > +ovn-nbctl lsp-set-type ln_port localnet > > +ovn-nbctl --wait=hv lsp-set-options ln_port network_name=physnet1 > > Missing "check .." for all these. ack, I will fix it > > > + > > +# Prepare packets > > +touch empty_expected > > +echo "fffffffffffff0000000000108060001080006040001f00000000001c0a80001000000000000c0a80001" > arp_expected > > Please use fmt_pkt instead. ack, I will fix it > > > + > > +net_add n1 > > +sim_add hv1 > > +as hv1 > > +ovs-vsctl \ > > + -- add-br br-phys \ > > + -- add-br br-eth0 > > + > > +ovn_attach n1 br-phys 192.168.0.10 > > + > > +AT_CHECK([ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=physnet1:br-eth0]) > > +AT_CHECK([ovs-vsctl add-port br-eth0 snoopvif -- set Interface snoopvif options:tx_pcap=hv1/snoopvif-tx.pcap options:rxq_pcap=hv1/snoopvif-rx.pcap]) > > + > > +ovn-sbctl dump-flows > sbflows > > +AT_CAPTURE_FILE([sbflows]) > > + > > Why do we need this? I will get rid of it > > > +# Wait until the patch ports are created in hv1 to connect br-int to br-eth0 > > +AT_CHECK([ovn-nbctl set logical_router lr0 options:chassis=hv1]) > > +OVN_WAIT_PATCH_PORT_FLOWS(["ln_port"], ["hv1"]) > > Using OVN_WAIT_PATCH_PORT_FLOWS makes the patch harder to backport to > 21.12. Do we really need this? I will get rid of it > > > +# Temporarily remove lr0 chassis > > +# Wait for hv confirmation to make sure chassis is removed before we reset pcap > > +# Otherwise a garp might be sent after pcap have been reset but before chassis is removed > > +AT_CHECK([ovn-nbctl --wait=hv remove logical_router lr0 options chassis]) > > + > > Why do we need to remove the "chassis" option on lr0? it is to avoid we receive garp packet during the configuration > > > +as hv1 reset_pcap_file snoopvif hv1/snoopvif > > +hv1_uuid=$(ovn-sbctl --bare --columns _uuid list chassis hv1) > > +AT_CHECK([ovn-nbctl set logical_router lr0 options:chassis=hv1]) > > +OVS_WAIT_UNTIL([ > > + ls0_lr0=$(ovn-sbctl --bare --columns chassis list port_binding ls0-lr0) > > + test "$ls0_lr0" = $hv1_uuid > > +]) > > + > > +OVN_CHECK_PACKETS_CONTAIN([hv1/snoopvif-tx.pcap], [arp_expected]) > > This checks that one ARP packet is there, why? The test is about > garp-max-timeout. > > > +# Temporarily remove lr0 chassis > > +AT_CHECK([ovn-nbctl --wait=hv remove logical_router lr0 options chassis]) > > + > > Why is this needed? I will get rid of it > > > +as hv1 reset_pcap_file snoopvif hv1/snoopvif > > +# set garp max timeout to 2s > > +AT_CHECK([as hv1 ovs-vsctl set Open_vSwitch . external-ids:garp-max-timeout-sec=2]) > > +AT_CHECK([ovn-nbctl --wait=hv set logical_router lr0 options:chassis=hv1]) > > + > > +OVS_WAIT_UNTIL([ > > +n_arp=$(tcpdump -c 10 -ner hv1/snoopvif-tx.pcap arp | wc -l) > > +test "$n_arp" = 10 > > +]) > > This doesn't actually test the timeout value. It will wait until 10 > packets are received; they could be received at any intervals and the > test will pass as long as it doesn't take more than OVS_TIMEOUT to > receive all 10 packets. > > > + > > +AT_CHECK([ovn-nbctl --wait=hv remove logical_router lr0 options chassis]) > > +as hv1 reset_pcap_file snoopvif hv1/snoopvif > > +# set garp max timeout to 2s > > +AT_CHECK([as hv1 ovs-vsctl set Open_vSwitch . external-ids:garp-max-timeout-sec=1]) > > The comment says "2s" but we set the value to 1 second. ack, I will fix it > > > +AT_CHECK([ovn-nbctl --wait=hv set logical_router lr0 options:chassis=hv1]) > > I'm still confused about why we keep changing the options:chassis value. same as above > > > + > > +OVS_WAIT_UNTIL([ > > +n_arp=$(tcpdump -c 20 -ner hv1/snoopvif-tx.pcap arp | wc -l) > > +test "$n_arp" = 20 > > +]) > > + > > Same comment as above about not really testing the timeout. > > > +OVN_CLEANUP([hv1]) > > + > > +AT_CLEANUP > > +]) > > I'm not sure whether we can do better than: clear pcap file, configure > "x seconds" timeout, sleep for "x * (y + 1)" seconds, check that the the > pcap contains at least y arp packets. > > But that might be enough. We could also tag the test as "unstable" > (TAG_UNSTABLE) pro-actively so that it gets rechecked on failure in > GitHub actions. > > What do you think? ack, I will rework the test according to your comments. Regards, Lorenzo > > Thanks, > Dumitru > >
diff --git a/tests/ovn.at b/tests/ovn.at index 918f97a9e..25b101a7a 100644 --- a/tests/ovn.at +++ b/tests/ovn.at @@ -37464,3 +37464,85 @@ wait_for_ports_up OVN_CLEANUP([hv1]) AT_CLEANUP ]) + +OVN_FOR_EACH_NORTHD([ +AT_SETUP([gratuitous arp max timeout]) +AT_KEYWORDS([garp-timeout]) +AT_SKIP_IF([test $HAVE_TCPDUMP = no]) +ovn_start + +ovn-nbctl ls-add ls0 +ovn-nbctl lr-add lr0 +ovn-nbctl lrp-add lr0 lr0-ls0 f0:00:00:00:00:01 192.168.0.1/24 +ovn-nbctl lsp-add ls0 ls0-lr0 -- set Logical_Switch_Port ls0-lr0 \ + type=router options:router-port=lr0-ls0 addresses='"f0:00:00:00:00:01"' + +ovn-nbctl lsp-add ls0 ln_port +ovn-nbctl lsp-set-addresses ln_port unknown +ovn-nbctl lsp-set-type ln_port localnet +ovn-nbctl --wait=hv lsp-set-options ln_port network_name=physnet1 + +# Prepare packets +touch empty_expected +echo "fffffffffffff0000000000108060001080006040001f00000000001c0a80001000000000000c0a80001" > arp_expected + +net_add n1 +sim_add hv1 +as hv1 +ovs-vsctl \ + -- add-br br-phys \ + -- add-br br-eth0 + +ovn_attach n1 br-phys 192.168.0.10 + +AT_CHECK([ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=physnet1:br-eth0]) +AT_CHECK([ovs-vsctl add-port br-eth0 snoopvif -- set Interface snoopvif options:tx_pcap=hv1/snoopvif-tx.pcap options:rxq_pcap=hv1/snoopvif-rx.pcap]) + +ovn-sbctl dump-flows > sbflows +AT_CAPTURE_FILE([sbflows]) + +# Wait until the patch ports are created in hv1 to connect br-int to br-eth0 +AT_CHECK([ovn-nbctl set logical_router lr0 options:chassis=hv1]) +OVN_WAIT_PATCH_PORT_FLOWS(["ln_port"], ["hv1"]) +# Temporarily remove lr0 chassis +# Wait for hv confirmation to make sure chassis is removed before we reset pcap +# Otherwise a garp might be sent after pcap have been reset but before chassis is removed +AT_CHECK([ovn-nbctl --wait=hv remove logical_router lr0 options chassis]) + +as hv1 reset_pcap_file snoopvif hv1/snoopvif +hv1_uuid=$(ovn-sbctl --bare --columns _uuid list chassis hv1) +AT_CHECK([ovn-nbctl set logical_router lr0 options:chassis=hv1]) +OVS_WAIT_UNTIL([ + ls0_lr0=$(ovn-sbctl --bare --columns chassis list port_binding ls0-lr0) + test "$ls0_lr0" = $hv1_uuid +]) + +OVN_CHECK_PACKETS_CONTAIN([hv1/snoopvif-tx.pcap], [arp_expected]) +# Temporarily remove lr0 chassis +AT_CHECK([ovn-nbctl --wait=hv remove logical_router lr0 options chassis]) + +as hv1 reset_pcap_file snoopvif hv1/snoopvif +# set garp max timeout to 2s +AT_CHECK([as hv1 ovs-vsctl set Open_vSwitch . external-ids:garp-max-timeout-sec=2]) +AT_CHECK([ovn-nbctl --wait=hv set logical_router lr0 options:chassis=hv1]) + +OVS_WAIT_UNTIL([ +n_arp=$(tcpdump -c 10 -ner hv1/snoopvif-tx.pcap arp | wc -l) +test "$n_arp" = 10 +]) + +AT_CHECK([ovn-nbctl --wait=hv remove logical_router lr0 options chassis]) +as hv1 reset_pcap_file snoopvif hv1/snoopvif +# set garp max timeout to 2s +AT_CHECK([as hv1 ovs-vsctl set Open_vSwitch . external-ids:garp-max-timeout-sec=1]) +AT_CHECK([ovn-nbctl --wait=hv set logical_router lr0 options:chassis=hv1]) + +OVS_WAIT_UNTIL([ +n_arp=$(tcpdump -c 20 -ner hv1/snoopvif-tx.pcap arp | wc -l) +test "$n_arp" = 20 +]) + +OVN_CLEANUP([hv1]) + +AT_CLEANUP +])
Introduce a dedicated test for garp-max-timeout knob Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> --- tests/ovn.at | 82 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 82 insertions(+)