Message ID | 20240126132451.25157-1-jmeng@redhat.com |
---|---|
State | Accepted |
Commit | 5df46a44e8755d065b7247a0e7d2c90e9781219f |
Delegated to: | Simon Horman |
Headers | show |
Series | [ovs-dev,v2] dpif-netdev: Increase MAX_RECIRC_DEPTH to 8. | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_Build_and_Test | success | github build: passed |
ovsrobot/intel-ovs-compilation | success | test: success |
On Fri, Jan 26, 2024 at 02:24:51PM +0100, jmeng@redhat.com wrote: > From: Jakob Meng <code@jakobmeng.de> > > In a scenario where OVN does load balancing and then SNAT with a OVS > userspace datapath [0], the recirc_depth may be greater than 6. In > that case, ovs-vswitchd might drop packets and raise warnings: > > dpif_netdev|WARN|Packet dropped. Max recirculation depth exceeded. > > Increasing MAX_RECIRC_DEPTH to 8 solves this issue. > > [0] https://github.com/ovn-org/ovn/blob/dd5cd73e3df1bfb1a215cb45d1e2e03eff1d049a/tests/system-ovn-kmod.at#L740 > > Reported-at: https://issues.redhat.com/browse/FDP-251 > Signed-off-by: Jakob Meng <code@jakobmeng.de> Hi Jakob, I'm unsure what the considerations were when setting this limit, but I do note that it was increased once before, from 5 to 6, for an OVN use-case [1]. So this approach seems reasonable to me. Acked-by: Simon Horman <horms@ovn.org> [1] dpif-netdev: Set MAX_RECIRC_DEPTH to 6. https://github.com/openvswitch/ovs/commit/3f9d3836d63a
On 30.01.24 10:44, Simon Horman wrote: > On Fri, Jan 26, 2024 at 02:24:51PM +0100, jmeng@redhat.com wrote: >> From: Jakob Meng <code@jakobmeng.de> >> >> In a scenario where OVN does load balancing and then SNAT with a OVS >> userspace datapath [0], the recirc_depth may be greater than 6. In >> that case, ovs-vswitchd might drop packets and raise warnings: >> >> dpif_netdev|WARN|Packet dropped. Max recirculation depth exceeded. >> >> Increasing MAX_RECIRC_DEPTH to 8 solves this issue. >> >> [0] https://github.com/ovn-org/ovn/blob/dd5cd73e3df1bfb1a215cb45d1e2e03eff1d049a/tests/system-ovn-kmod.at#L740 >> >> Reported-at: https://issues.redhat.com/browse/FDP-251 >> Signed-off-by: Jakob Meng <code@jakobmeng.de> > Hi Jakob, > > I'm unsure what the considerations were when setting this limit, > but I do note that it was increased once before, from 5 to 6, > for an OVN use-case [1]. So this approach seems reasonable to me. > > Acked-by: Simon Horman <horms@ovn.org> > > [1] dpif-netdev: Set MAX_RECIRC_DEPTH to 6. > https://github.com/openvswitch/ovs/commit/3f9d3836d63a > Hi Ilya, can we include this patch in 3.3, please? It is required for more complex setups like OKD/OpenShift with userspace datapaths. Best, Jakob
On Thu, Feb 15, 2024 at 09:16:38AM +0100, Jakob Meng wrote: > On 30.01.24 10:44, Simon Horman wrote: > > On Fri, Jan 26, 2024 at 02:24:51PM +0100, jmeng@redhat.com wrote: > >> From: Jakob Meng <code@jakobmeng.de> > >> > >> In a scenario where OVN does load balancing and then SNAT with a OVS > >> userspace datapath [0], the recirc_depth may be greater than 6. In > >> that case, ovs-vswitchd might drop packets and raise warnings: > >> > >> dpif_netdev|WARN|Packet dropped. Max recirculation depth exceeded. > >> > >> Increasing MAX_RECIRC_DEPTH to 8 solves this issue. > >> > >> [0] https://github.com/ovn-org/ovn/blob/dd5cd73e3df1bfb1a215cb45d1e2e03eff1d049a/tests/system-ovn-kmod.at#L740 > >> > >> Reported-at: https://issues.redhat.com/browse/FDP-251 > >> Signed-off-by: Jakob Meng <code@jakobmeng.de> > > Hi Jakob, > > > > I'm unsure what the considerations were when setting this limit, > > but I do note that it was increased once before, from 5 to 6, > > for an OVN use-case [1]. So this approach seems reasonable to me. > > > > Acked-by: Simon Horman <horms@ovn.org> > > > > [1] dpif-netdev: Set MAX_RECIRC_DEPTH to 6. > > https://github.com/openvswitch/ovs/commit/3f9d3836d63a > > > > Hi Ilya, > can we include this patch in 3.3, please? > > It is required for more complex setups like OKD/OpenShift with userspace datapaths. In the hope of moving this forwards: My feeling is that while on the one hand this is a wack-a-mole approach it does address a real problem in a well understood way, thus moving things in a positive direction. As for downside, there is a risk that this will blow the stack anyway or somehow cause resource exhaustion that didn't occur before. I didn't exercise the patch. But I do suspect that risk is minimal. And if it surfaces we will have to search for a better solution. But we don't have a better solution at this time - or at least I don't. So, other than what I assume is a small risk of regression, do we really lose anything by moving forward with this simple change at this time?
On 2/15/24 14:23, Simon Horman wrote: > On Thu, Feb 15, 2024 at 09:16:38AM +0100, Jakob Meng wrote: >> On 30.01.24 10:44, Simon Horman wrote: >>> On Fri, Jan 26, 2024 at 02:24:51PM +0100, jmeng@redhat.com wrote: >>>> From: Jakob Meng <code@jakobmeng.de> >>>> >>>> In a scenario where OVN does load balancing and then SNAT with a OVS >>>> userspace datapath [0], the recirc_depth may be greater than 6. In >>>> that case, ovs-vswitchd might drop packets and raise warnings: >>>> >>>> dpif_netdev|WARN|Packet dropped. Max recirculation depth exceeded. >>>> >>>> Increasing MAX_RECIRC_DEPTH to 8 solves this issue. >>>> >>>> [0] https://github.com/ovn-org/ovn/blob/dd5cd73e3df1bfb1a215cb45d1e2e03eff1d049a/tests/system-ovn-kmod.at#L740 >>>> >>>> Reported-at: https://issues.redhat.com/browse/FDP-251 >>>> Signed-off-by: Jakob Meng <code@jakobmeng.de> >>> Hi Jakob, >>> >>> I'm unsure what the considerations were when setting this limit, >>> but I do note that it was increased once before, from 5 to 6, >>> for an OVN use-case [1]. So this approach seems reasonable to me. >>> >>> Acked-by: Simon Horman <horms@ovn.org> >>> >>> [1] dpif-netdev: Set MAX_RECIRC_DEPTH to 6. >>> https://github.com/openvswitch/ovs/commit/3f9d3836d63a >>> >> >> Hi Ilya, >> can we include this patch in 3.3, please? >> >> It is required for more complex setups like OKD/OpenShift with userspace datapaths. > > In the hope of moving this forwards: > > My feeling is that while on the one hand this is a wack-a-mole approach it > does address a real problem in a well understood way, thus moving > things in a positive direction. > > As for downside, there is a risk that this will blow the stack anyway > or somehow cause resource exhaustion that didn't occur before. I didn't > exercise the patch. But I do suspect that risk is minimal. And if it > surfaces we will have to search for a better solution. > > But we don't have a better solution at this time - or at least I don't. > So, other than what I assume is a small risk of regression, do we really > lose anything by moving forward with this simple change at this time? I think, it's fine for now. The extra 2 recirculations are very unlikely to blow up the stack, especially since the default config is to have 2 MB of pre-allocated and mlocked stack. So, applied for now. Also applied to 3.3 as I don't think this change carries any risk at this time and it will make life of OVN developers/users much easier. But we still need to think of a better way of solving this issue for the future. Best regards, Ilya Maximets.
On 16.02.24 00:03, Ilya Maximets wrote: > On 2/15/24 14:23, Simon Horman wrote: >> On Thu, Feb 15, 2024 at 09:16:38AM +0100, Jakob Meng wrote: >>> On 30.01.24 10:44, Simon Horman wrote: >>>> On Fri, Jan 26, 2024 at 02:24:51PM +0100, jmeng@redhat.com wrote: >>>>> From: Jakob Meng <code@jakobmeng.de> >>>>> >>>>> In a scenario where OVN does load balancing and then SNAT with a OVS >>>>> userspace datapath [0], the recirc_depth may be greater than 6. In >>>>> that case, ovs-vswitchd might drop packets and raise warnings: >>>>> >>>>> dpif_netdev|WARN|Packet dropped. Max recirculation depth exceeded. >>>>> >>>>> Increasing MAX_RECIRC_DEPTH to 8 solves this issue. >>>>> >>>>> [0] https://github.com/ovn-org/ovn/blob/dd5cd73e3df1bfb1a215cb45d1e2e03eff1d049a/tests/system-ovn-kmod.at#L740 >>>>> >>>>> Reported-at: https://issues.redhat.com/browse/FDP-251 >>>>> Signed-off-by: Jakob Meng <code@jakobmeng.de> >>>> Hi Jakob, >>>> >>>> I'm unsure what the considerations were when setting this limit, >>>> but I do note that it was increased once before, from 5 to 6, >>>> for an OVN use-case [1]. So this approach seems reasonable to me. >>>> >>>> Acked-by: Simon Horman <horms@ovn.org> >>>> >>>> [1] dpif-netdev: Set MAX_RECIRC_DEPTH to 6. >>>> https://github.com/openvswitch/ovs/commit/3f9d3836d63a >>>> >>> Hi Ilya, >>> can we include this patch in 3.3, please? >>> >>> It is required for more complex setups like OKD/OpenShift with userspace datapaths. >> In the hope of moving this forwards: >> >> My feeling is that while on the one hand this is a wack-a-mole approach it >> does address a real problem in a well understood way, thus moving >> things in a positive direction. >> >> As for downside, there is a risk that this will blow the stack anyway >> or somehow cause resource exhaustion that didn't occur before. I didn't >> exercise the patch. But I do suspect that risk is minimal. And if it >> surfaces we will have to search for a better solution. >> >> But we don't have a better solution at this time - or at least I don't. >> So, other than what I assume is a small risk of regression, do we really >> lose anything by moving forward with this simple change at this time? > I think, it's fine for now. The extra 2 recirculations are very unlikely > to blow up the stack, especially since the default config is to have 2 MB > of pre-allocated and mlocked stack. > > So, applied for now. Also applied to 3.3 as I don't think this change > carries any risk at this time and it will make life of OVN developers/users > much easier. > > But we still need to think of a better way of solving this issue for the > future. > > Best regards, Ilya Maximets. > Thank you, Ilya and Simon! I have updated the ticket [0] accordingly. [0] https://issues.redhat.com/browse/FDP-251
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index c1981137f..46e24d204 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -99,7 +99,7 @@ VLOG_DEFINE_THIS_MODULE(dpif_netdev); #define FLOW_DUMP_MAX_BATCH 50 /* Use per thread recirc_depth to prevent recirculation loop. */ -#define MAX_RECIRC_DEPTH 6 +#define MAX_RECIRC_DEPTH 8 DEFINE_STATIC_PER_THREAD_DATA(uint32_t, recirc_depth, 0) /* Use instant packet send by default. */