diff mbox series

[ovs-dev,v4] Make tunnel ids exhaustion test trigger the problem.

Message ID 20240405161437.631395-1-ihrachys@redhat.com
State Superseded
Headers show
Series [ovs-dev,v4] Make tunnel ids exhaustion test trigger the problem. | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_ovn-kubernetes success github build: passed
ovsrobot/github-robot-_Build_and_Test fail github build: failed

Commit Message

Ihar Hrachyshka April 5, 2024, 4:14 p.m. UTC
The original version of the scenario passed with or without the fix.
This is because all LSs were processed in one go, so the allocate
function was never entered with *hint==0.

Also, added another scenario that will check behavior when *hint is out
of [min;max] bounds but > max (this happens in an obscure scenario where
a vxlan chassis is added to the cluster mid-light, forcing northd to
reduce its effective max value for tunnel ids; which may become lower
than the current *hint for ports.)

Fixes: a1f165a7b807 ("northd: fix infinite loop in ovn_allocate_tnlid()")
Co-Authored-By: Vladislav Odintsov <odivlad@gmail.com>
Signed-off-by: Vladislav Odintsov <odivlad@gmail.com>
Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>
---
v1: initial version.
v2: cover both cases of hint = 0 and hint > max.
v3: reduce the number of ports to create in the hint > max scenario needed to trigger the problem.
v4: remove spurious lib/ovn-util.c change.
---
 tests/ovn-northd.at | 43 ++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 40 insertions(+), 3 deletions(-)

Comments

Vladislav Odintsov April 6, 2024, 6:30 a.m. UTC | #1
Hi Ihar,

Thanks for cooperation and enhancements in the testcases!
The patch looks good to me.

> On 5 Apr 2024, at 19:14, Ihar Hrachyshka <ihrachys@redhat.com> wrote:
> 
> The original version of the scenario passed with or without the fix.
> This is because all LSs were processed in one go, so the allocate
> function was never entered with *hint==0.
> 
> Also, added another scenario that will check behavior when *hint is out
> of [min;max] bounds but > max (this happens in an obscure scenario where
> a vxlan chassis is added to the cluster mid-light, forcing northd to
> reduce its effective max value for tunnel ids; which may become lower
> than the current *hint for ports.)
> 
> Fixes: a1f165a7b807 ("northd: fix infinite loop in ovn_allocate_tnlid()")
> Co-Authored-By: Vladislav Odintsov <odivlad@gmail.com>
> Signed-off-by: Vladislav Odintsov <odivlad@gmail.com>
> Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>
> ---
> v1: initial version.
> v2: cover both cases of hint = 0 and hint > max.
> v3: reduce the number of ports to create in the hint > max scenario needed to trigger the problem.
> v4: remove spurious lib/ovn-util.c change.
> ---
> tests/ovn-northd.at | 43 ++++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 40 insertions(+), 3 deletions(-)
> 
> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> index be006fb32..1a4e7274d 100644
> --- a/tests/ovn-northd.at
> +++ b/tests/ovn-northd.at
> @@ -2823,7 +2823,7 @@ AT_CLEANUP
> ])
> 
> OVN_FOR_EACH_NORTHD_NO_HV([
> -AT_SETUP([check tunnel ids exhaustion])
> +AT_SETUP([check datapath tunnel ids exhaustion])
> ovn_start
> 
> # Create a fake chassis with vxlan encap to lower MAX DP tunnel key to 2^12
> @@ -2833,13 +2833,18 @@ ovn-sbctl \
> 
> cmd="ovn-nbctl --wait=sb"
> 
> -for i in {1..4097}; do
> +for i in {1..4095}; do
>     cmd="${cmd} -- ls-add lsw-${i}"
> done
> 
> eval $cmd
> 
> -check_row_count nb:Logical_Switch 4097
> +check_row_count nb:Logical_Switch 4095
> +wait_row_count sb:Datapath_Binding 4095
> +
> +ovn-nbctl ls-add lsw-exhausted
> +
> +check_row_count nb:Logical_Switch 4096
> wait_row_count sb:Datapath_Binding 4095
> 
> OVS_WAIT_UNTIL([grep "all datapath tunnel ids exhausted" northd/ovn-northd.log])
> @@ -2847,6 +2852,38 @@ OVS_WAIT_UNTIL([grep "all datapath tunnel ids exhausted" northd/ovn-northd.log])
> AT_CLEANUP
> ])
> 
> +OVN_FOR_EACH_NORTHD_NO_HV([
> +AT_SETUP([check port tunnel ids exhaustion; vxlan chassis pops up midflight])
> +ovn_start
> +
> +cmd="ovn-nbctl --wait=sb"
> +
> +cmd="${cmd} -- ls-add lsw"
> +for i in {1..2048}; do
> +    cmd="${cmd} -- lsp-add lsw lsp-${i}"
> +done
> +
> +eval $cmd
> +
> +check_row_count nb:Logical_Switch_Port 2048
> +wait_row_count sb:Port_Binding 2048
> +
> +# Now create a fake chassis with vxlan encap to lower MAX port tunnel key to 2^11
> +ovn-sbctl \
> +    --id=@e create encap chassis_name=hv1 ip="192.168.0.1" type="vxlan" \
> +    -- --id=@c create chassis name=hv1 encaps=@e
> +
> +ovn-nbctl lsp-add lsw lsp-exhausted
> +
> +check_row_count nb:Logical_Switch_Port 2049
> +wait_row_count sb:Port_Binding 2048
> +
> +OVS_WAIT_UNTIL([grep "all port tunnel ids exhausted" northd/ovn-northd.log])
> +
> +AT_CLEANUP
> +])
> +
> +
> 
> OVN_FOR_EACH_NORTHD_NO_HV([
> AT_SETUP([Logical Flow Datapath Groups])
> -- 
> 2.41.0
> 
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Regards,
Vladislav Odintsov
Ihar Hrachyshka April 8, 2024, 1:49 p.m. UTC | #2
Note to reviewers: looks like the port tunnel id test case revealed a
number of undefined behaviors and leaks (?) in northd; I am working on
addressing these. Before that, we should not merge this patch as-is. (Or at
least we should not merge the part with the port case; I believe the
network case is fine.)

On Sat, Apr 6, 2024 at 2:30 AM Vladislav Odintsov <odivlad@gmail.com> wrote:

> Hi Ihar,
>
> Thanks for cooperation and enhancements in the testcases!
> The patch looks good to me.
>
> On 5 Apr 2024, at 19:14, Ihar Hrachyshka <ihrachys@redhat.com> wrote:
>
> The original version of the scenario passed with or without the fix.
> This is because all LSs were processed in one go, so the allocate
> function was never entered with *hint==0.
>
> Also, added another scenario that will check behavior when *hint is out
> of [min;max] bounds but > max (this happens in an obscure scenario where
> a vxlan chassis is added to the cluster mid-light, forcing northd to
> reduce its effective max value for tunnel ids; which may become lower
> than the current *hint for ports.)
>
> Fixes: a1f165a7b807 ("northd: fix infinite loop in ovn_allocate_tnlid()")
> Co-Authored-By: Vladislav Odintsov <odivlad@gmail.com>
> Signed-off-by: Vladislav Odintsov <odivlad@gmail.com>
> Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>
> ---
> v1: initial version.
> v2: cover both cases of hint = 0 and hint > max.
> v3: reduce the number of ports to create in the hint > max scenario needed
> to trigger the problem.
> v4: remove spurious lib/ovn-util.c change.
> ---
> tests/ovn-northd.at | 43 ++++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 40 insertions(+), 3 deletions(-)
>
> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> index be006fb32..1a4e7274d 100644
> --- a/tests/ovn-northd.at
> +++ b/tests/ovn-northd.at
> @@ -2823,7 +2823,7 @@ AT_CLEANUP
> ])
>
> OVN_FOR_EACH_NORTHD_NO_HV([
> -AT_SETUP([check tunnel ids exhaustion])
> +AT_SETUP([check datapath tunnel ids exhaustion])
> ovn_start
>
> # Create a fake chassis with vxlan encap to lower MAX DP tunnel key to 2^12
> @@ -2833,13 +2833,18 @@ ovn-sbctl \
>
> cmd="ovn-nbctl --wait=sb"
>
> -for i in {1..4097}; do
> +for i in {1..4095}; do
>     cmd="${cmd} -- ls-add lsw-${i}"
> done
>
> eval $cmd
>
> -check_row_count nb:Logical_Switch 4097
> +check_row_count nb:Logical_Switch 4095
> +wait_row_count sb:Datapath_Binding 4095
> +
> +ovn-nbctl ls-add lsw-exhausted
> +
> +check_row_count nb:Logical_Switch 4096
> wait_row_count sb:Datapath_Binding 4095
>
> OVS_WAIT_UNTIL([grep "all datapath tunnel ids exhausted"
> northd/ovn-northd.log])
> @@ -2847,6 +2852,38 @@ OVS_WAIT_UNTIL([grep "all datapath tunnel ids
> exhausted" northd/ovn-northd.log])
> AT_CLEANUP
> ])
>
> +OVN_FOR_EACH_NORTHD_NO_HV([
> +AT_SETUP([check port tunnel ids exhaustion; vxlan chassis pops up
> midflight])
> +ovn_start
> +
> +cmd="ovn-nbctl --wait=sb"
> +
> +cmd="${cmd} -- ls-add lsw"
> +for i in {1..2048}; do
> +    cmd="${cmd} -- lsp-add lsw lsp-${i}"
> +done
> +
> +eval $cmd
> +
> +check_row_count nb:Logical_Switch_Port 2048
> +wait_row_count sb:Port_Binding 2048
> +
> +# Now create a fake chassis with vxlan encap to lower MAX port tunnel key
> to 2^11
> +ovn-sbctl \
> +    --id=@e create encap chassis_name=hv1 ip="192.168.0.1" type="vxlan" \
> +    -- --id=@c create chassis name=hv1 encaps=@e
> +
> +ovn-nbctl lsp-add lsw lsp-exhausted
> +
> +check_row_count nb:Logical_Switch_Port 2049
> +wait_row_count sb:Port_Binding 2048
> +
> +OVS_WAIT_UNTIL([grep "all port tunnel ids exhausted"
> northd/ovn-northd.log])
> +
> +AT_CLEANUP
> +])
> +
> +
>
> OVN_FOR_EACH_NORTHD_NO_HV([
> AT_SETUP([Logical Flow Datapath Groups])
> --
> 2.41.0
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
>
>
> Regards,
> Vladislav Odintsov
>
>
Ihar Hrachyshka April 11, 2024, 11:02 p.m. UTC | #3
Series with this patch that also fixes ubsan failure here:
https://patchwork.ozlabs.org/project/ovn/list/?series=402694

On Mon, Apr 8, 2024 at 9:49 AM Ihar Hrachyshka <ihrachys@redhat.com> wrote:

> Note to reviewers: looks like the port tunnel id test case revealed a
> number of undefined behaviors and leaks (?) in northd; I am working on
> addressing these. Before that, we should not merge this patch as-is. (Or at
> least we should not merge the part with the port case; I believe the
> network case is fine.)
>
> On Sat, Apr 6, 2024 at 2:30 AM Vladislav Odintsov <odivlad@gmail.com>
> wrote:
>
>> Hi Ihar,
>>
>> Thanks for cooperation and enhancements in the testcases!
>> The patch looks good to me.
>>
>> On 5 Apr 2024, at 19:14, Ihar Hrachyshka <ihrachys@redhat.com> wrote:
>>
>> The original version of the scenario passed with or without the fix.
>> This is because all LSs were processed in one go, so the allocate
>> function was never entered with *hint==0.
>>
>> Also, added another scenario that will check behavior when *hint is out
>> of [min;max] bounds but > max (this happens in an obscure scenario where
>> a vxlan chassis is added to the cluster mid-light, forcing northd to
>> reduce its effective max value for tunnel ids; which may become lower
>> than the current *hint for ports.)
>>
>> Fixes: a1f165a7b807 ("northd: fix infinite loop in ovn_allocate_tnlid()")
>> Co-Authored-By: Vladislav Odintsov <odivlad@gmail.com>
>> Signed-off-by: Vladislav Odintsov <odivlad@gmail.com>
>> Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>
>> ---
>> v1: initial version.
>> v2: cover both cases of hint = 0 and hint > max.
>> v3: reduce the number of ports to create in the hint > max scenario
>> needed to trigger the problem.
>> v4: remove spurious lib/ovn-util.c change.
>> ---
>> tests/ovn-northd.at | 43 ++++++++++++++++++++++++++++++++++++++++---
>> 1 file changed, 40 insertions(+), 3 deletions(-)
>>
>> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
>> index be006fb32..1a4e7274d 100644
>> --- a/tests/ovn-northd.at
>> +++ b/tests/ovn-northd.at
>> @@ -2823,7 +2823,7 @@ AT_CLEANUP
>> ])
>>
>> OVN_FOR_EACH_NORTHD_NO_HV([
>> -AT_SETUP([check tunnel ids exhaustion])
>> +AT_SETUP([check datapath tunnel ids exhaustion])
>> ovn_start
>>
>> # Create a fake chassis with vxlan encap to lower MAX DP tunnel key to
>> 2^12
>> @@ -2833,13 +2833,18 @@ ovn-sbctl \
>>
>> cmd="ovn-nbctl --wait=sb"
>>
>> -for i in {1..4097}; do
>> +for i in {1..4095}; do
>>     cmd="${cmd} -- ls-add lsw-${i}"
>> done
>>
>> eval $cmd
>>
>> -check_row_count nb:Logical_Switch 4097
>> +check_row_count nb:Logical_Switch 4095
>> +wait_row_count sb:Datapath_Binding 4095
>> +
>> +ovn-nbctl ls-add lsw-exhausted
>> +
>> +check_row_count nb:Logical_Switch 4096
>> wait_row_count sb:Datapath_Binding 4095
>>
>> OVS_WAIT_UNTIL([grep "all datapath tunnel ids exhausted"
>> northd/ovn-northd.log])
>> @@ -2847,6 +2852,38 @@ OVS_WAIT_UNTIL([grep "all datapath tunnel ids
>> exhausted" northd/ovn-northd.log])
>> AT_CLEANUP
>> ])
>>
>> +OVN_FOR_EACH_NORTHD_NO_HV([
>> +AT_SETUP([check port tunnel ids exhaustion; vxlan chassis pops up
>> midflight])
>> +ovn_start
>> +
>> +cmd="ovn-nbctl --wait=sb"
>> +
>> +cmd="${cmd} -- ls-add lsw"
>> +for i in {1..2048}; do
>> +    cmd="${cmd} -- lsp-add lsw lsp-${i}"
>> +done
>> +
>> +eval $cmd
>> +
>> +check_row_count nb:Logical_Switch_Port 2048
>> +wait_row_count sb:Port_Binding 2048
>> +
>> +# Now create a fake chassis with vxlan encap to lower MAX port tunnel
>> key to 2^11
>> +ovn-sbctl \
>> +    --id=@e create encap chassis_name=hv1 ip="192.168.0.1" type="vxlan" \
>> +    -- --id=@c create chassis name=hv1 encaps=@e
>> +
>> +ovn-nbctl lsp-add lsw lsp-exhausted
>> +
>> +check_row_count nb:Logical_Switch_Port 2049
>> +wait_row_count sb:Port_Binding 2048
>> +
>> +OVS_WAIT_UNTIL([grep "all port tunnel ids exhausted"
>> northd/ovn-northd.log])
>> +
>> +AT_CLEANUP
>> +])
>> +
>> +
>>
>> OVN_FOR_EACH_NORTHD_NO_HV([
>> AT_SETUP([Logical Flow Datapath Groups])
>> --
>> 2.41.0
>>
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
>>
>>
>> Regards,
>> Vladislav Odintsov
>>
>>
diff mbox series

Patch

diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
index be006fb32..1a4e7274d 100644
--- a/tests/ovn-northd.at
+++ b/tests/ovn-northd.at
@@ -2823,7 +2823,7 @@  AT_CLEANUP
 ])
 
 OVN_FOR_EACH_NORTHD_NO_HV([
-AT_SETUP([check tunnel ids exhaustion])
+AT_SETUP([check datapath tunnel ids exhaustion])
 ovn_start
 
 # Create a fake chassis with vxlan encap to lower MAX DP tunnel key to 2^12
@@ -2833,13 +2833,18 @@  ovn-sbctl \
 
 cmd="ovn-nbctl --wait=sb"
 
-for i in {1..4097}; do
+for i in {1..4095}; do
     cmd="${cmd} -- ls-add lsw-${i}"
 done
 
 eval $cmd
 
-check_row_count nb:Logical_Switch 4097
+check_row_count nb:Logical_Switch 4095
+wait_row_count sb:Datapath_Binding 4095
+
+ovn-nbctl ls-add lsw-exhausted
+
+check_row_count nb:Logical_Switch 4096
 wait_row_count sb:Datapath_Binding 4095
 
 OVS_WAIT_UNTIL([grep "all datapath tunnel ids exhausted" northd/ovn-northd.log])
@@ -2847,6 +2852,38 @@  OVS_WAIT_UNTIL([grep "all datapath tunnel ids exhausted" northd/ovn-northd.log])
 AT_CLEANUP
 ])
 
+OVN_FOR_EACH_NORTHD_NO_HV([
+AT_SETUP([check port tunnel ids exhaustion; vxlan chassis pops up midflight])
+ovn_start
+
+cmd="ovn-nbctl --wait=sb"
+
+cmd="${cmd} -- ls-add lsw"
+for i in {1..2048}; do
+    cmd="${cmd} -- lsp-add lsw lsp-${i}"
+done
+
+eval $cmd
+
+check_row_count nb:Logical_Switch_Port 2048
+wait_row_count sb:Port_Binding 2048
+
+# Now create a fake chassis with vxlan encap to lower MAX port tunnel key to 2^11
+ovn-sbctl \
+    --id=@e create encap chassis_name=hv1 ip="192.168.0.1" type="vxlan" \
+    -- --id=@c create chassis name=hv1 encaps=@e
+
+ovn-nbctl lsp-add lsw lsp-exhausted
+
+check_row_count nb:Logical_Switch_Port 2049
+wait_row_count sb:Port_Binding 2048
+
+OVS_WAIT_UNTIL([grep "all port tunnel ids exhausted" northd/ovn-northd.log])
+
+AT_CLEANUP
+])
+
+
 
 OVN_FOR_EACH_NORTHD_NO_HV([
 AT_SETUP([Logical Flow Datapath Groups])