diff mbox series

[nf] netfilter: restore default behavior for nf_conntrack_events

Message ID 20240604135438.2613064-1-nicolas.dichtel@6wind.com
State New
Headers show
Series [nf] netfilter: restore default behavior for nf_conntrack_events | expand

Commit Message

Nicolas Dichtel June 4, 2024, 1:54 p.m. UTC
Since the below commit, there are regressions for legacy setups:
1/ conntracks are created while there are no listener
2/ a listener starts and dumps all conntracks to get the current state
3/ conntracks deleted before the listener has started are not advertised

This is problematic in containers, where conntracks could be created early.
This sysctl is part of unsafe sysctl and could not be changed easily in
some environments.

Let's switch back to the legacy behavior.

CC: stable@vger.kernel.org
Fixes: 90d1daa45849 ("netfilter: conntrack: add nf_conntrack_events autodetect mode")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 Documentation/networking/nf_conntrack-sysctl.rst | 10 ++++++----
 net/netfilter/nf_conntrack_ecache.c              |  2 +-
 2 files changed, 7 insertions(+), 5 deletions(-)

Comments

Florian Westphal June 5, 2024, 8:55 a.m. UTC | #1
Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
> Since the below commit, there are regressions for legacy setups:
> 1/ conntracks are created while there are no listener
> 2/ a listener starts and dumps all conntracks to get the current state
> 3/ conntracks deleted before the listener has started are not advertised
> 
> This is problematic in containers, where conntracks could be created early.
> This sysctl is part of unsafe sysctl and could not be changed easily in
> some environments.
> 
> Let's switch back to the legacy behavior.

:-(

Would it be possible to resolve this for containers by setting
the container default to 1 if init_net had it changed to 1 at netns
creation time?
Nicolas Dichtel June 5, 2024, 9:09 a.m. UTC | #2
Le 05/06/2024 à 10:55, Florian Westphal a écrit :
> Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
>> Since the below commit, there are regressions for legacy setups:
>> 1/ conntracks are created while there are no listener
>> 2/ a listener starts and dumps all conntracks to get the current state
>> 3/ conntracks deleted before the listener has started are not advertised
>>
>> This is problematic in containers, where conntracks could be created early.
>> This sysctl is part of unsafe sysctl and could not be changed easily in
>> some environments.
>>
>> Let's switch back to the legacy behavior.
> 
> :-(
> 
> Would it be possible to resolve this for containers by setting
> the container default to 1 if init_net had it changed to 1 at netns
> creation time?

When we have access to the host, it is possible to allow the configuration of
this (unsafe) sysctl for the pod. But there are cases where we don't have access
to the host.

https://docs.openshift.com/container-platform/4.9/nodes/containers/nodes-containers-sysctls.html#nodes-containers-sysctls-unsafe_nodes-containers-using


Regards,
Nicolas
Pablo Neira Ayuso June 5, 2024, 6:41 p.m. UTC | #3
On Wed, Jun 05, 2024 at 11:09:31AM +0200, Nicolas Dichtel wrote:
> Le 05/06/2024 à 10:55, Florian Westphal a écrit :
> > Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
> >> Since the below commit, there are regressions for legacy setups:
> >> 1/ conntracks are created while there are no listener
> >> 2/ a listener starts and dumps all conntracks to get the current state
> >> 3/ conntracks deleted before the listener has started are not advertised
> >>
> >> This is problematic in containers, where conntracks could be created early.
> >> This sysctl is part of unsafe sysctl and could not be changed easily in
> >> some environments.
> >>
> >> Let's switch back to the legacy behavior.
> > 
> > :-(
> > 
> > Would it be possible to resolve this for containers by setting
> > the container default to 1 if init_net had it changed to 1 at netns
> > creation time?
> 
> When we have access to the host, it is possible to allow the configuration of
> this (unsafe) sysctl for the pod. But there are cases where we don't have access
> to the host.
> 
> https://docs.openshift.com/container-platform/4.9/nodes/containers/nodes-containers-sysctls.html#nodes-containers-sysctls-unsafe_nodes-containers-using

conntrack is enabled on-demand by the ruleset these days, such monitor
process could be created _before_ the ruleset is loaded?
Nicolas Dichtel June 6, 2024, 8:50 a.m. UTC | #4
Le 05/06/2024 à 20:41, Pablo Neira Ayuso a écrit :
> On Wed, Jun 05, 2024 at 11:09:31AM +0200, Nicolas Dichtel wrote:
>> Le 05/06/2024 à 10:55, Florian Westphal a écrit :
>>> Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
>>>> Since the below commit, there are regressions for legacy setups:
>>>> 1/ conntracks are created while there are no listener
>>>> 2/ a listener starts and dumps all conntracks to get the current state
>>>> 3/ conntracks deleted before the listener has started are not advertised
>>>>
>>>> This is problematic in containers, where conntracks could be created early.
>>>> This sysctl is part of unsafe sysctl and could not be changed easily in
>>>> some environments.
>>>>
>>>> Let's switch back to the legacy behavior.
>>>
>>> :-(
>>>
>>> Would it be possible to resolve this for containers by setting
>>> the container default to 1 if init_net had it changed to 1 at netns
>>> creation time?
>>
>> When we have access to the host, it is possible to allow the configuration of
>> this (unsafe) sysctl for the pod. But there are cases where we don't have access
>> to the host.
>>
>> https://docs.openshift.com/container-platform/4.9/nodes/containers/nodes-containers-sysctls.html#nodes-containers-sysctls-unsafe_nodes-containers-using
> 
> conntrack is enabled on-demand by the ruleset these days, such monitor
> process could be created _before_ the ruleset is loaded?
It's not so easy :) There are several modules in the system.

I understand it's "sad" to keep nf_conntrack_events=1, but this change breaks
the backward compatibility. A container migrated to a host with a recent kernel
is broken.
Usually, in the networking stack, sysctl are added to keep the legacy behavior
and enable new systems to use "modern" features. There are a lot of examples :)
Florian Westphal June 6, 2024, 8:53 a.m. UTC | #5
Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
> I understand it's "sad" to keep nf_conntrack_events=1, but this change breaks
> the backward compatibility. A container migrated to a host with a recent kernel
> is broken.
> Usually, in the networking stack, sysctl are added to keep the legacy behavior
> and enable new systems to use "modern" features. There are a lot of examples :)

Weeks of work down the drain.  I wonder if we can make any changes aside
from bug fixes in the future.
Nicolas Dichtel June 6, 2024, 1:07 p.m. UTC | #6
Le 06/06/2024 à 10:53, Florian Westphal a écrit :
> Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
>> I understand it's "sad" to keep nf_conntrack_events=1, but this change breaks
>> the backward compatibility. A container migrated to a host with a recent kernel
>> is broken.
>> Usually, in the networking stack, sysctl are added to keep the legacy behavior
>> and enable new systems to use "modern" features. There are a lot of examples :)
> 
> Weeks of work down the drain.  I wonder if we can make any changes aside
> from bug fixes in the future.
The commit doesn't remove the optimization, it only keeps the existing behavior.
Systems that require this optimization, could still turn nf_conntrack_event to 2.
Pablo Neira Ayuso June 26, 2024, 11:41 a.m. UTC | #7
Hi Nicolas,

On Tue, Jun 04, 2024 at 03:54:38PM +0200, Nicolas Dichtel wrote:
> Since the below commit, there are regressions for legacy setups:
> 1/ conntracks are created while there are no listener
> 2/ a listener starts and dumps all conntracks to get the current state
> 3/ conntracks deleted before the listener has started are not advertised
> 
> This is problematic in containers, where conntracks could be created early.
> This sysctl is part of unsafe sysctl and could not be changed easily in
> some environments.
> 
> Let's switch back to the legacy behavior.

Maybe it is possible to annotate destroy events in a percpu area if
the conntrack extension is not available. This code used to follow
such approach time ago.
Nicolas Dichtel July 3, 2024, 7:37 a.m. UTC | #8
Hi Pablo,

Le 26/06/2024 à 13:41, Pablo Neira Ayuso a écrit :
> Hi Nicolas,
> 
> On Tue, Jun 04, 2024 at 03:54:38PM +0200, Nicolas Dichtel wrote:
>> Since the below commit, there are regressions for legacy setups:
>> 1/ conntracks are created while there are no listener
>> 2/ a listener starts and dumps all conntracks to get the current state
>> 3/ conntracks deleted before the listener has started are not advertised
>>
>> This is problematic in containers, where conntracks could be created early.
>> This sysctl is part of unsafe sysctl and could not be changed easily in
>> some environments.
>>
>> Let's switch back to the legacy behavior.
> 
> Maybe it is possible to annotate destroy events in a percpu area if
> the conntrack extension is not available. This code used to follow
> such approach time ago.
Thanks for the feedback. I was wondering if just sending the destroy event would
be possible. TBH, I'm not very familiar with this part of the code, I need to
dig a bit. I won't have time for this right now, any help would be appreciated.
Pablo Neira Ayuso July 15, 2024, 2:19 p.m. UTC | #9
On Wed, Jul 03, 2024 at 09:37:59AM +0200, Nicolas Dichtel wrote:
> Hi Pablo,
> 
> Le 26/06/2024 à 13:41, Pablo Neira Ayuso a écrit :
> > Hi Nicolas,
> > 
> > On Tue, Jun 04, 2024 at 03:54:38PM +0200, Nicolas Dichtel wrote:
> >> Since the below commit, there are regressions for legacy setups:
> >> 1/ conntracks are created while there are no listener
> >> 2/ a listener starts and dumps all conntracks to get the current state
> >> 3/ conntracks deleted before the listener has started are not advertised
> >>
> >> This is problematic in containers, where conntracks could be created early.
> >> This sysctl is part of unsafe sysctl and could not be changed easily in
> >> some environments.
> >>
> >> Let's switch back to the legacy behavior.
> > 
> > Maybe it is possible to annotate destroy events in a percpu area if
> > the conntrack extension is not available. This code used to follow
> > such approach time ago.
>
> Thanks for the feedback. I was wondering if just sending the destroy event would
> be possible. TBH, I'm not very familiar with this part of the code, I need to
> dig a bit. I won't have time for this right now, any help would be appreciated.

I will take a look and get back to you.
diff mbox series

Patch

diff --git a/Documentation/networking/nf_conntrack-sysctl.rst b/Documentation/networking/nf_conntrack-sysctl.rst
index c383a394c665..edc04f99e1aa 100644
--- a/Documentation/networking/nf_conntrack-sysctl.rst
+++ b/Documentation/networking/nf_conntrack-sysctl.rst
@@ -34,13 +34,15 @@  nf_conntrack_count - INTEGER (read-only)
 
 nf_conntrack_events - BOOLEAN
 	- 0 - disabled
-	- 1 - enabled
-	- 2 - auto (default)
+	- 1 - enabled (default)
+	- 2 - auto
 
 	If this option is enabled, the connection tracking code will
 	provide userspace with connection tracking events via ctnetlink.
-	The default allocates the extension if a userspace program is
-	listening to ctnetlink events.
+	The 'auto' allocates the extension if a userspace program is
+	listening to ctnetlink events. Note that conntracks created
+	before the first listener has started won't trigger any netlink
+	event.
 
 nf_conntrack_expect_max - INTEGER
 	Maximum size of expectation table.  Default value is
diff --git a/net/netfilter/nf_conntrack_ecache.c b/net/netfilter/nf_conntrack_ecache.c
index 69948e1d6974..4c8559529e18 100644
--- a/net/netfilter/nf_conntrack_ecache.c
+++ b/net/netfilter/nf_conntrack_ecache.c
@@ -334,7 +334,7 @@  bool nf_ct_ecache_ext_add(struct nf_conn *ct, u16 ctmask, u16 expmask, gfp_t gfp
 }
 EXPORT_SYMBOL_GPL(nf_ct_ecache_ext_add);
 
-#define NF_CT_EVENTS_DEFAULT 2
+#define NF_CT_EVENTS_DEFAULT 1
 static int nf_ct_events __read_mostly = NF_CT_EVENTS_DEFAULT;
 
 void nf_conntrack_ecache_pernet_init(struct net *net)