diff mbox series

[ovs-dev,v2] dpif-netdev: Increase MAX_RECIRC_DEPTH to 8.

Message ID 20240126132451.25157-1-jmeng@redhat.com
State Accepted
Commit 5df46a44e8755d065b7247a0e7d2c90e9781219f
Delegated to: Simon Horman
Headers show
Series [ovs-dev,v2] dpif-netdev: Increase MAX_RECIRC_DEPTH to 8. | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/intel-ovs-compilation success test: success

Commit Message

Jakob Meng Jan. 26, 2024, 1:24 p.m. UTC
From: Jakob Meng <code@jakobmeng.de>

In a scenario where OVN does load balancing and then SNAT with a OVS
userspace datapath [0], the recirc_depth may be greater than 6. In
that case, ovs-vswitchd might drop packets and raise warnings:

  dpif_netdev|WARN|Packet dropped. Max recirculation depth exceeded.

Increasing MAX_RECIRC_DEPTH to 8 solves this issue.

[0] https://github.com/ovn-org/ovn/blob/dd5cd73e3df1bfb1a215cb45d1e2e03eff1d049a/tests/system-ovn-kmod.at#L740

Reported-at: https://issues.redhat.com/browse/FDP-251
Signed-off-by: Jakob Meng <code@jakobmeng.de>
---
 lib/dpif-netdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Simon Horman Jan. 30, 2024, 9:44 a.m. UTC | #1
On Fri, Jan 26, 2024 at 02:24:51PM +0100, jmeng@redhat.com wrote:
> From: Jakob Meng <code@jakobmeng.de>
> 
> In a scenario where OVN does load balancing and then SNAT with a OVS
> userspace datapath [0], the recirc_depth may be greater than 6. In
> that case, ovs-vswitchd might drop packets and raise warnings:
> 
>   dpif_netdev|WARN|Packet dropped. Max recirculation depth exceeded.
> 
> Increasing MAX_RECIRC_DEPTH to 8 solves this issue.
> 
> [0] https://github.com/ovn-org/ovn/blob/dd5cd73e3df1bfb1a215cb45d1e2e03eff1d049a/tests/system-ovn-kmod.at#L740
> 
> Reported-at: https://issues.redhat.com/browse/FDP-251
> Signed-off-by: Jakob Meng <code@jakobmeng.de>

Hi Jakob,

I'm unsure what the considerations were when setting this limit,
but I do note that it was increased once before, from 5 to 6,
for an OVN use-case [1]. So this approach seems reasonable to me.

Acked-by: Simon Horman <horms@ovn.org>

[1] dpif-netdev: Set MAX_RECIRC_DEPTH to 6.
    https://github.com/openvswitch/ovs/commit/3f9d3836d63a
Jakob Meng Feb. 15, 2024, 8:16 a.m. UTC | #2
On 30.01.24 10:44, Simon Horman wrote:
> On Fri, Jan 26, 2024 at 02:24:51PM +0100, jmeng@redhat.com wrote:
>> From: Jakob Meng <code@jakobmeng.de>
>>
>> In a scenario where OVN does load balancing and then SNAT with a OVS
>> userspace datapath [0], the recirc_depth may be greater than 6. In
>> that case, ovs-vswitchd might drop packets and raise warnings:
>>
>>   dpif_netdev|WARN|Packet dropped. Max recirculation depth exceeded.
>>
>> Increasing MAX_RECIRC_DEPTH to 8 solves this issue.
>>
>> [0] https://github.com/ovn-org/ovn/blob/dd5cd73e3df1bfb1a215cb45d1e2e03eff1d049a/tests/system-ovn-kmod.at#L740
>>
>> Reported-at: https://issues.redhat.com/browse/FDP-251
>> Signed-off-by: Jakob Meng <code@jakobmeng.de>
> Hi Jakob,
>
> I'm unsure what the considerations were when setting this limit,
> but I do note that it was increased once before, from 5 to 6,
> for an OVN use-case [1]. So this approach seems reasonable to me.
>
> Acked-by: Simon Horman <horms@ovn.org>
>
> [1] dpif-netdev: Set MAX_RECIRC_DEPTH to 6.
>     https://github.com/openvswitch/ovs/commit/3f9d3836d63a
>

Hi Ilya,
can we include this patch in 3.3, please?

It is required for more complex setups like OKD/OpenShift with userspace datapaths.

Best,
Jakob
Simon Horman Feb. 15, 2024, 1:23 p.m. UTC | #3
On Thu, Feb 15, 2024 at 09:16:38AM +0100, Jakob Meng wrote:
> On 30.01.24 10:44, Simon Horman wrote:
> > On Fri, Jan 26, 2024 at 02:24:51PM +0100, jmeng@redhat.com wrote:
> >> From: Jakob Meng <code@jakobmeng.de>
> >>
> >> In a scenario where OVN does load balancing and then SNAT with a OVS
> >> userspace datapath [0], the recirc_depth may be greater than 6. In
> >> that case, ovs-vswitchd might drop packets and raise warnings:
> >>
> >>   dpif_netdev|WARN|Packet dropped. Max recirculation depth exceeded.
> >>
> >> Increasing MAX_RECIRC_DEPTH to 8 solves this issue.
> >>
> >> [0] https://github.com/ovn-org/ovn/blob/dd5cd73e3df1bfb1a215cb45d1e2e03eff1d049a/tests/system-ovn-kmod.at#L740
> >>
> >> Reported-at: https://issues.redhat.com/browse/FDP-251
> >> Signed-off-by: Jakob Meng <code@jakobmeng.de>
> > Hi Jakob,
> >
> > I'm unsure what the considerations were when setting this limit,
> > but I do note that it was increased once before, from 5 to 6,
> > for an OVN use-case [1]. So this approach seems reasonable to me.
> >
> > Acked-by: Simon Horman <horms@ovn.org>
> >
> > [1] dpif-netdev: Set MAX_RECIRC_DEPTH to 6.
> >     https://github.com/openvswitch/ovs/commit/3f9d3836d63a
> >
> 
> Hi Ilya,
> can we include this patch in 3.3, please?
> 
> It is required for more complex setups like OKD/OpenShift with userspace datapaths.

In the hope of moving this forwards:

My feeling is that while on the one hand this is a wack-a-mole approach it
does address a real problem in a well understood way, thus moving
things in a positive direction.

As for downside, there is a risk that this will blow the stack anyway
or somehow cause resource exhaustion that didn't occur before. I didn't
exercise the patch. But I do suspect that risk is minimal. And if it
surfaces we will have to search for a better solution.

But we don't have a better solution at this time - or at least I don't.
So, other than what I assume is a small risk of regression, do we really
lose anything by moving forward with this simple change at this time?
Ilya Maximets Feb. 15, 2024, 11:03 p.m. UTC | #4
On 2/15/24 14:23, Simon Horman wrote:
> On Thu, Feb 15, 2024 at 09:16:38AM +0100, Jakob Meng wrote:
>> On 30.01.24 10:44, Simon Horman wrote:
>>> On Fri, Jan 26, 2024 at 02:24:51PM +0100, jmeng@redhat.com wrote:
>>>> From: Jakob Meng <code@jakobmeng.de>
>>>>
>>>> In a scenario where OVN does load balancing and then SNAT with a OVS
>>>> userspace datapath [0], the recirc_depth may be greater than 6. In
>>>> that case, ovs-vswitchd might drop packets and raise warnings:
>>>>
>>>>   dpif_netdev|WARN|Packet dropped. Max recirculation depth exceeded.
>>>>
>>>> Increasing MAX_RECIRC_DEPTH to 8 solves this issue.
>>>>
>>>> [0] https://github.com/ovn-org/ovn/blob/dd5cd73e3df1bfb1a215cb45d1e2e03eff1d049a/tests/system-ovn-kmod.at#L740
>>>>
>>>> Reported-at: https://issues.redhat.com/browse/FDP-251
>>>> Signed-off-by: Jakob Meng <code@jakobmeng.de>
>>> Hi Jakob,
>>>
>>> I'm unsure what the considerations were when setting this limit,
>>> but I do note that it was increased once before, from 5 to 6,
>>> for an OVN use-case [1]. So this approach seems reasonable to me.
>>>
>>> Acked-by: Simon Horman <horms@ovn.org>
>>>
>>> [1] dpif-netdev: Set MAX_RECIRC_DEPTH to 6.
>>>     https://github.com/openvswitch/ovs/commit/3f9d3836d63a
>>>
>>
>> Hi Ilya,
>> can we include this patch in 3.3, please?
>>
>> It is required for more complex setups like OKD/OpenShift with userspace datapaths.
> 
> In the hope of moving this forwards:
> 
> My feeling is that while on the one hand this is a wack-a-mole approach it
> does address a real problem in a well understood way, thus moving
> things in a positive direction.
> 
> As for downside, there is a risk that this will blow the stack anyway
> or somehow cause resource exhaustion that didn't occur before. I didn't
> exercise the patch. But I do suspect that risk is minimal. And if it
> surfaces we will have to search for a better solution.
> 
> But we don't have a better solution at this time - or at least I don't.
> So, other than what I assume is a small risk of regression, do we really
> lose anything by moving forward with this simple change at this time?

I think, it's fine for now.  The extra 2 recirculations are very unlikely
to blow up the stack, especially since the default config is to have 2 MB
of pre-allocated and mlocked stack.

So, applied for now.  Also applied to 3.3 as I don't think this change
carries any risk at this time and it will make life of OVN developers/users
much easier.

But we still need to think of a better way of solving this issue for the
future.

Best regards, Ilya Maximets.
Jakob Meng Feb. 16, 2024, 8:35 a.m. UTC | #5
On 16.02.24 00:03, Ilya Maximets wrote:
> On 2/15/24 14:23, Simon Horman wrote:
>> On Thu, Feb 15, 2024 at 09:16:38AM +0100, Jakob Meng wrote:
>>> On 30.01.24 10:44, Simon Horman wrote:
>>>> On Fri, Jan 26, 2024 at 02:24:51PM +0100, jmeng@redhat.com wrote:
>>>>> From: Jakob Meng <code@jakobmeng.de>
>>>>>
>>>>> In a scenario where OVN does load balancing and then SNAT with a OVS
>>>>> userspace datapath [0], the recirc_depth may be greater than 6. In
>>>>> that case, ovs-vswitchd might drop packets and raise warnings:
>>>>>
>>>>>   dpif_netdev|WARN|Packet dropped. Max recirculation depth exceeded.
>>>>>
>>>>> Increasing MAX_RECIRC_DEPTH to 8 solves this issue.
>>>>>
>>>>> [0] https://github.com/ovn-org/ovn/blob/dd5cd73e3df1bfb1a215cb45d1e2e03eff1d049a/tests/system-ovn-kmod.at#L740
>>>>>
>>>>> Reported-at: https://issues.redhat.com/browse/FDP-251
>>>>> Signed-off-by: Jakob Meng <code@jakobmeng.de>
>>>> Hi Jakob,
>>>>
>>>> I'm unsure what the considerations were when setting this limit,
>>>> but I do note that it was increased once before, from 5 to 6,
>>>> for an OVN use-case [1]. So this approach seems reasonable to me.
>>>>
>>>> Acked-by: Simon Horman <horms@ovn.org>
>>>>
>>>> [1] dpif-netdev: Set MAX_RECIRC_DEPTH to 6.
>>>>     https://github.com/openvswitch/ovs/commit/3f9d3836d63a
>>>>
>>> Hi Ilya,
>>> can we include this patch in 3.3, please?
>>>
>>> It is required for more complex setups like OKD/OpenShift with userspace datapaths.
>> In the hope of moving this forwards:
>>
>> My feeling is that while on the one hand this is a wack-a-mole approach it
>> does address a real problem in a well understood way, thus moving
>> things in a positive direction.
>>
>> As for downside, there is a risk that this will blow the stack anyway
>> or somehow cause resource exhaustion that didn't occur before. I didn't
>> exercise the patch. But I do suspect that risk is minimal. And if it
>> surfaces we will have to search for a better solution.
>>
>> But we don't have a better solution at this time - or at least I don't.
>> So, other than what I assume is a small risk of regression, do we really
>> lose anything by moving forward with this simple change at this time?
> I think, it's fine for now.  The extra 2 recirculations are very unlikely
> to blow up the stack, especially since the default config is to have 2 MB
> of pre-allocated and mlocked stack.
>
> So, applied for now.  Also applied to 3.3 as I don't think this change
> carries any risk at this time and it will make life of OVN developers/users
> much easier.
>
> But we still need to think of a better way of solving this issue for the
> future.
>
> Best regards, Ilya Maximets.
>

Thank you, Ilya and Simon! I have updated the ticket [0] accordingly.

[0] https://issues.redhat.com/browse/FDP-251
diff mbox series

Patch

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index c1981137f..46e24d204 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -99,7 +99,7 @@  VLOG_DEFINE_THIS_MODULE(dpif_netdev);
 
 #define FLOW_DUMP_MAX_BATCH 50
 /* Use per thread recirc_depth to prevent recirculation loop. */
-#define MAX_RECIRC_DEPTH 6
+#define MAX_RECIRC_DEPTH 8
 DEFINE_STATIC_PER_THREAD_DATA(uint32_t, recirc_depth, 0)
 
 /* Use instant packet send by default. */