diff mbox series

[ovs-dev] test: add dedicated test for garp-max-timeout

Message ID 3957de4e1a47075fd8251156c01ec06cad9ba60a.1702659678.git.lorenzo.bianconi@redhat.com
State Changes Requested
Headers show
Series [ovs-dev] test: add dedicated test for garp-max-timeout | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/github-robot-_ovn-kubernetes success github build: passed

Commit Message

Lorenzo Bianconi Dec. 15, 2023, 5:01 p.m. UTC
Introduce a dedicated test for garp-max-timeout knob

Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
---
 tests/ovn.at | 82 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 82 insertions(+)

Comments

Dumitru Ceara Dec. 18, 2023, 11:17 a.m. UTC | #1
On 12/15/23 18:01, Lorenzo Bianconi wrote:
> Introduce a dedicated test for garp-max-timeout knob
> 

For context of the review, I had asked for this "standalone" test for
garp-max-timeout so that we can backport the following patch to 21.12:

https://mail.openvswitch.org/pipermail/ovs-dev/2023-December/410247.html

> Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
> ---
>  tests/ovn.at | 82 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 82 insertions(+)
> 
> diff --git a/tests/ovn.at b/tests/ovn.at
> index 918f97a9e..25b101a7a 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -37464,3 +37464,85 @@ wait_for_ports_up
>  OVN_CLEANUP([hv1])
>  AT_CLEANUP
>  ])
> +
> +OVN_FOR_EACH_NORTHD([
> +AT_SETUP([gratuitous arp max timeout])
> +AT_KEYWORDS([garp-timeout])
> +AT_SKIP_IF([test $HAVE_TCPDUMP = no])

Nit: We don't really need tcpdump, we could use the ovs-pcap utility
directly.  But it's not mandatory I guess, we already have a bunch of
tests using tcpdump.

> +ovn_start
> +
> +ovn-nbctl ls-add ls0
> +ovn-nbctl lr-add lr0
> +ovn-nbctl lrp-add lr0 lr0-ls0 f0:00:00:00:00:01 192.168.0.1/24
> +ovn-nbctl lsp-add ls0 ls0-lr0 -- set Logical_Switch_Port ls0-lr0 \
> +    type=router options:router-port=lr0-ls0 addresses='"f0:00:00:00:00:01"'
> +
> +ovn-nbctl lsp-add ls0 ln_port
> +ovn-nbctl lsp-set-addresses ln_port unknown
> +ovn-nbctl lsp-set-type ln_port localnet
> +ovn-nbctl --wait=hv lsp-set-options ln_port network_name=physnet1

Missing "check .." for all these.

> +
> +# Prepare packets
> +touch empty_expected
> +echo "fffffffffffff0000000000108060001080006040001f00000000001c0a80001000000000000c0a80001" > arp_expected

Please use fmt_pkt instead.

> +
> +net_add n1
> +sim_add hv1
> +as hv1
> +ovs-vsctl \
> +    -- add-br br-phys \
> +    -- add-br br-eth0
> +
> +ovn_attach n1 br-phys 192.168.0.10
> +
> +AT_CHECK([ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=physnet1:br-eth0])
> +AT_CHECK([ovs-vsctl add-port br-eth0 snoopvif -- set Interface snoopvif options:tx_pcap=hv1/snoopvif-tx.pcap options:rxq_pcap=hv1/snoopvif-rx.pcap])
> +
> +ovn-sbctl dump-flows > sbflows
> +AT_CAPTURE_FILE([sbflows])
> +

Why do we need this?

> +# Wait until the patch ports are created in hv1 to connect br-int to br-eth0
> +AT_CHECK([ovn-nbctl set logical_router lr0 options:chassis=hv1])
> +OVN_WAIT_PATCH_PORT_FLOWS(["ln_port"], ["hv1"])

Using OVN_WAIT_PATCH_PORT_FLOWS makes the patch harder to backport to
21.12.  Do we really need this?

> +# Temporarily remove lr0 chassis
> +# Wait for hv confirmation to make sure chassis is removed before we reset pcap
> +# Otherwise a garp might be sent after pcap have been reset but before chassis is removed
> +AT_CHECK([ovn-nbctl --wait=hv remove logical_router lr0 options chassis])
> +

Why do we need to remove the "chassis" option on lr0?

> +as hv1 reset_pcap_file snoopvif hv1/snoopvif
> +hv1_uuid=$(ovn-sbctl --bare --columns _uuid list chassis hv1)
> +AT_CHECK([ovn-nbctl set logical_router lr0 options:chassis=hv1])
> +OVS_WAIT_UNTIL([
> +    ls0_lr0=$(ovn-sbctl --bare --columns chassis list port_binding ls0-lr0)
> +    test "$ls0_lr0" = $hv1_uuid
> +])
> +
> +OVN_CHECK_PACKETS_CONTAIN([hv1/snoopvif-tx.pcap], [arp_expected])

This checks that one ARP packet is there, why?  The test is about
garp-max-timeout.

> +# Temporarily remove lr0 chassis
> +AT_CHECK([ovn-nbctl --wait=hv remove logical_router lr0 options chassis])
> +

Why is this needed?

> +as hv1 reset_pcap_file snoopvif hv1/snoopvif
> +# set garp max timeout to 2s
> +AT_CHECK([as hv1 ovs-vsctl set Open_vSwitch . external-ids:garp-max-timeout-sec=2])
> +AT_CHECK([ovn-nbctl --wait=hv set logical_router lr0 options:chassis=hv1])
> +
> +OVS_WAIT_UNTIL([
> +n_arp=$(tcpdump -c 10 -ner hv1/snoopvif-tx.pcap arp | wc -l)
> +test "$n_arp" = 10
> +])

This doesn't actually test the timeout value.  It will wait until 10
packets are received; they could be received at any intervals and the
test will pass as long as it doesn't take more than OVS_TIMEOUT to
receive all 10 packets.

> +
> +AT_CHECK([ovn-nbctl --wait=hv remove logical_router lr0 options chassis])
> +as hv1 reset_pcap_file snoopvif hv1/snoopvif
> +# set garp max timeout to 2s
> +AT_CHECK([as hv1 ovs-vsctl set Open_vSwitch . external-ids:garp-max-timeout-sec=1])

The comment says "2s" but we set the value to 1 second.

> +AT_CHECK([ovn-nbctl --wait=hv set logical_router lr0 options:chassis=hv1])

I'm still confused about why we keep changing the options:chassis value.

> +
> +OVS_WAIT_UNTIL([
> +n_arp=$(tcpdump -c 20 -ner hv1/snoopvif-tx.pcap arp | wc -l)
> +test "$n_arp" = 20
> +])
> +

Same comment as above about not really testing the timeout.

> +OVN_CLEANUP([hv1])
> +
> +AT_CLEANUP
> +])

I'm not sure whether we can do better than: clear pcap file, configure
"x seconds" timeout, sleep for "x * (y + 1)" seconds, check that the the
pcap contains at least y arp packets.

But that might be enough.  We could also tag the test as "unstable"
(TAG_UNSTABLE) pro-actively so that it gets rechecked on failure in
GitHub actions.

What do you think?

Thanks,
Dumitru
Lorenzo Bianconi Dec. 22, 2023, 3:46 p.m. UTC | #2
> On 12/15/23 18:01, Lorenzo Bianconi wrote:
> > Introduce a dedicated test for garp-max-timeout knob
> > 
> 
> For context of the review, I had asked for this "standalone" test for
> garp-max-timeout so that we can backport the following patch to 21.12:
> 
> https://mail.openvswitch.org/pipermail/ovs-dev/2023-December/410247.html

Hi Dumitru,

thx for thre review

> 
> > Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
> > ---
> >  tests/ovn.at | 82 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 82 insertions(+)
> > 
> > diff --git a/tests/ovn.at b/tests/ovn.at
> > index 918f97a9e..25b101a7a 100644
> > --- a/tests/ovn.at
> > +++ b/tests/ovn.at
> > @@ -37464,3 +37464,85 @@ wait_for_ports_up
> >  OVN_CLEANUP([hv1])
> >  AT_CLEANUP
> >  ])
> > +
> > +OVN_FOR_EACH_NORTHD([
> > +AT_SETUP([gratuitous arp max timeout])
> > +AT_KEYWORDS([garp-timeout])
> > +AT_SKIP_IF([test $HAVE_TCPDUMP = no])
> 
> Nit: We don't really need tcpdump, we could use the ovs-pcap utility
> directly.  But it's not mandatory I guess, we already have a bunch of
> tests using tcpdump.
> 
> > +ovn_start
> > +
> > +ovn-nbctl ls-add ls0
> > +ovn-nbctl lr-add lr0
> > +ovn-nbctl lrp-add lr0 lr0-ls0 f0:00:00:00:00:01 192.168.0.1/24
> > +ovn-nbctl lsp-add ls0 ls0-lr0 -- set Logical_Switch_Port ls0-lr0 \
> > +    type=router options:router-port=lr0-ls0 addresses='"f0:00:00:00:00:01"'
> > +
> > +ovn-nbctl lsp-add ls0 ln_port
> > +ovn-nbctl lsp-set-addresses ln_port unknown
> > +ovn-nbctl lsp-set-type ln_port localnet
> > +ovn-nbctl --wait=hv lsp-set-options ln_port network_name=physnet1
> 
> Missing "check .." for all these.

ack, I will fix it

> 
> > +
> > +# Prepare packets
> > +touch empty_expected
> > +echo "fffffffffffff0000000000108060001080006040001f00000000001c0a80001000000000000c0a80001" > arp_expected
> 
> Please use fmt_pkt instead.

ack, I will fix it

> 
> > +
> > +net_add n1
> > +sim_add hv1
> > +as hv1
> > +ovs-vsctl \
> > +    -- add-br br-phys \
> > +    -- add-br br-eth0
> > +
> > +ovn_attach n1 br-phys 192.168.0.10
> > +
> > +AT_CHECK([ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=physnet1:br-eth0])
> > +AT_CHECK([ovs-vsctl add-port br-eth0 snoopvif -- set Interface snoopvif options:tx_pcap=hv1/snoopvif-tx.pcap options:rxq_pcap=hv1/snoopvif-rx.pcap])
> > +
> > +ovn-sbctl dump-flows > sbflows
> > +AT_CAPTURE_FILE([sbflows])
> > +
> 
> Why do we need this?

I will get rid of it

> 
> > +# Wait until the patch ports are created in hv1 to connect br-int to br-eth0
> > +AT_CHECK([ovn-nbctl set logical_router lr0 options:chassis=hv1])
> > +OVN_WAIT_PATCH_PORT_FLOWS(["ln_port"], ["hv1"])
> 
> Using OVN_WAIT_PATCH_PORT_FLOWS makes the patch harder to backport to
> 21.12.  Do we really need this?

I will get rid of it

> 
> > +# Temporarily remove lr0 chassis
> > +# Wait for hv confirmation to make sure chassis is removed before we reset pcap
> > +# Otherwise a garp might be sent after pcap have been reset but before chassis is removed
> > +AT_CHECK([ovn-nbctl --wait=hv remove logical_router lr0 options chassis])
> > +
> 
> Why do we need to remove the "chassis" option on lr0?

it is to avoid we receive garp packet during the configuration

> 
> > +as hv1 reset_pcap_file snoopvif hv1/snoopvif
> > +hv1_uuid=$(ovn-sbctl --bare --columns _uuid list chassis hv1)
> > +AT_CHECK([ovn-nbctl set logical_router lr0 options:chassis=hv1])
> > +OVS_WAIT_UNTIL([
> > +    ls0_lr0=$(ovn-sbctl --bare --columns chassis list port_binding ls0-lr0)
> > +    test "$ls0_lr0" = $hv1_uuid
> > +])
> > +
> > +OVN_CHECK_PACKETS_CONTAIN([hv1/snoopvif-tx.pcap], [arp_expected])
> 
> This checks that one ARP packet is there, why?  The test is about
> garp-max-timeout.
> 
> > +# Temporarily remove lr0 chassis
> > +AT_CHECK([ovn-nbctl --wait=hv remove logical_router lr0 options chassis])
> > +
> 
> Why is this needed?

I will get rid of it

> 
> > +as hv1 reset_pcap_file snoopvif hv1/snoopvif
> > +# set garp max timeout to 2s
> > +AT_CHECK([as hv1 ovs-vsctl set Open_vSwitch . external-ids:garp-max-timeout-sec=2])
> > +AT_CHECK([ovn-nbctl --wait=hv set logical_router lr0 options:chassis=hv1])
> > +
> > +OVS_WAIT_UNTIL([
> > +n_arp=$(tcpdump -c 10 -ner hv1/snoopvif-tx.pcap arp | wc -l)
> > +test "$n_arp" = 10
> > +])
> 
> This doesn't actually test the timeout value.  It will wait until 10
> packets are received; they could be received at any intervals and the
> test will pass as long as it doesn't take more than OVS_TIMEOUT to
> receive all 10 packets.
> 
> > +
> > +AT_CHECK([ovn-nbctl --wait=hv remove logical_router lr0 options chassis])
> > +as hv1 reset_pcap_file snoopvif hv1/snoopvif
> > +# set garp max timeout to 2s
> > +AT_CHECK([as hv1 ovs-vsctl set Open_vSwitch . external-ids:garp-max-timeout-sec=1])
> 
> The comment says "2s" but we set the value to 1 second.

ack, I will fix it

> 
> > +AT_CHECK([ovn-nbctl --wait=hv set logical_router lr0 options:chassis=hv1])
> 
> I'm still confused about why we keep changing the options:chassis value.

same as above

> 
> > +
> > +OVS_WAIT_UNTIL([
> > +n_arp=$(tcpdump -c 20 -ner hv1/snoopvif-tx.pcap arp | wc -l)
> > +test "$n_arp" = 20
> > +])
> > +
> 
> Same comment as above about not really testing the timeout.
> 
> > +OVN_CLEANUP([hv1])
> > +
> > +AT_CLEANUP
> > +])
> 
> I'm not sure whether we can do better than: clear pcap file, configure
> "x seconds" timeout, sleep for "x * (y + 1)" seconds, check that the the
> pcap contains at least y arp packets.
> 
> But that might be enough.  We could also tag the test as "unstable"
> (TAG_UNSTABLE) pro-actively so that it gets rechecked on failure in
> GitHub actions.
> 
> What do you think?

ack, I will rework the test according to your comments.

Regards,
Lorenzo

> 
> Thanks,
> Dumitru
> 
>
diff mbox series

Patch

diff --git a/tests/ovn.at b/tests/ovn.at
index 918f97a9e..25b101a7a 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -37464,3 +37464,85 @@  wait_for_ports_up
 OVN_CLEANUP([hv1])
 AT_CLEANUP
 ])
+
+OVN_FOR_EACH_NORTHD([
+AT_SETUP([gratuitous arp max timeout])
+AT_KEYWORDS([garp-timeout])
+AT_SKIP_IF([test $HAVE_TCPDUMP = no])
+ovn_start
+
+ovn-nbctl ls-add ls0
+ovn-nbctl lr-add lr0
+ovn-nbctl lrp-add lr0 lr0-ls0 f0:00:00:00:00:01 192.168.0.1/24
+ovn-nbctl lsp-add ls0 ls0-lr0 -- set Logical_Switch_Port ls0-lr0 \
+    type=router options:router-port=lr0-ls0 addresses='"f0:00:00:00:00:01"'
+
+ovn-nbctl lsp-add ls0 ln_port
+ovn-nbctl lsp-set-addresses ln_port unknown
+ovn-nbctl lsp-set-type ln_port localnet
+ovn-nbctl --wait=hv lsp-set-options ln_port network_name=physnet1
+
+# Prepare packets
+touch empty_expected
+echo "fffffffffffff0000000000108060001080006040001f00000000001c0a80001000000000000c0a80001" > arp_expected
+
+net_add n1
+sim_add hv1
+as hv1
+ovs-vsctl \
+    -- add-br br-phys \
+    -- add-br br-eth0
+
+ovn_attach n1 br-phys 192.168.0.10
+
+AT_CHECK([ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=physnet1:br-eth0])
+AT_CHECK([ovs-vsctl add-port br-eth0 snoopvif -- set Interface snoopvif options:tx_pcap=hv1/snoopvif-tx.pcap options:rxq_pcap=hv1/snoopvif-rx.pcap])
+
+ovn-sbctl dump-flows > sbflows
+AT_CAPTURE_FILE([sbflows])
+
+# Wait until the patch ports are created in hv1 to connect br-int to br-eth0
+AT_CHECK([ovn-nbctl set logical_router lr0 options:chassis=hv1])
+OVN_WAIT_PATCH_PORT_FLOWS(["ln_port"], ["hv1"])
+# Temporarily remove lr0 chassis
+# Wait for hv confirmation to make sure chassis is removed before we reset pcap
+# Otherwise a garp might be sent after pcap have been reset but before chassis is removed
+AT_CHECK([ovn-nbctl --wait=hv remove logical_router lr0 options chassis])
+
+as hv1 reset_pcap_file snoopvif hv1/snoopvif
+hv1_uuid=$(ovn-sbctl --bare --columns _uuid list chassis hv1)
+AT_CHECK([ovn-nbctl set logical_router lr0 options:chassis=hv1])
+OVS_WAIT_UNTIL([
+    ls0_lr0=$(ovn-sbctl --bare --columns chassis list port_binding ls0-lr0)
+    test "$ls0_lr0" = $hv1_uuid
+])
+
+OVN_CHECK_PACKETS_CONTAIN([hv1/snoopvif-tx.pcap], [arp_expected])
+# Temporarily remove lr0 chassis
+AT_CHECK([ovn-nbctl --wait=hv remove logical_router lr0 options chassis])
+
+as hv1 reset_pcap_file snoopvif hv1/snoopvif
+# set garp max timeout to 2s
+AT_CHECK([as hv1 ovs-vsctl set Open_vSwitch . external-ids:garp-max-timeout-sec=2])
+AT_CHECK([ovn-nbctl --wait=hv set logical_router lr0 options:chassis=hv1])
+
+OVS_WAIT_UNTIL([
+n_arp=$(tcpdump -c 10 -ner hv1/snoopvif-tx.pcap arp | wc -l)
+test "$n_arp" = 10
+])
+
+AT_CHECK([ovn-nbctl --wait=hv remove logical_router lr0 options chassis])
+as hv1 reset_pcap_file snoopvif hv1/snoopvif
+# set garp max timeout to 2s
+AT_CHECK([as hv1 ovs-vsctl set Open_vSwitch . external-ids:garp-max-timeout-sec=1])
+AT_CHECK([ovn-nbctl --wait=hv set logical_router lr0 options:chassis=hv1])
+
+OVS_WAIT_UNTIL([
+n_arp=$(tcpdump -c 20 -ner hv1/snoopvif-tx.pcap arp | wc -l)
+test "$n_arp" = 20
+])
+
+OVN_CLEANUP([hv1])
+
+AT_CLEANUP
+])