diff mbox series

[nf-next] netfilter: conntrack: collect start time as early as possible

Message ID 20241026105030.75254-1-fw@strlen.de
State Superseded, archived
Headers show
Series [nf-next] netfilter: conntrack: collect start time as early as possible | expand

Commit Message

Florian Westphal Oct. 26, 2024, 10:50 a.m. UTC
Sample start time at allocation time, not when the conntrack entry
is inserted into the hashtable.

In most cases this makes very little difference, but there are
cases where there is significant delay beteen allocation and
confirmation, e.g. when packets get queued to userspace.

Sampling as early as possible exposes this extra delay to userspace
via ctnetlink.

Reported-by: Nadia Pinaeva <n.m.pinaeva@gmail.com>
Fixes: a992ca2a0498 ("netfilter: nf_conntrack_tstamp: add flow-based timestamp extension")
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/net/netfilter/nf_conntrack_timestamp.h | 12 ++++++------
 net/netfilter/nf_conntrack_core.c              | 16 ++--------------
 2 files changed, 8 insertions(+), 20 deletions(-)

Comments

Pablo Neira Ayuso Oct. 28, 2024, 11:09 p.m. UTC | #1
Hi Florian,

On Sat, Oct 26, 2024 at 12:50:13PM +0200, Florian Westphal wrote:
> Sample start time at allocation time, not when the conntrack entry
> is inserted into the hashtable.

Back at the time, long time ago, I remember to have measured a
performance impact on this.

> In most cases this makes very little difference, but there are
> cases where there is significant delay beteen allocation and
> confirmation, e.g. when packets get queued to userspace.

I delayed this to insertion time because packet could dropped before,
rendering this conntrack timestamp useless? There is no event
reporting for conntrack that never get confirmed.
Florian Westphal Oct. 29, 2024, 7:16 a.m. UTC | #2
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Sat, Oct 26, 2024 at 12:50:13PM +0200, Florian Westphal wrote:
> > Sample start time at allocation time, not when the conntrack entry
> > is inserted into the hashtable.
> 
> Back at the time, long time ago, I remember to have measured a
> performance impact on this.

You mean when enabling timestamp + conntracks get dropped before
confirm, correct?

> > In most cases this makes very little difference, but there are
> > cases where there is significant delay beteen allocation and
> > confirmation, e.g. when packets get queued to userspace.
> 
> I delayed this to insertion time because packet could dropped before,
> rendering this conntrack timestamp useless? There is no event
> reporting for conntrack that never get confirmed.

Sure, but the "issue" is that the reported start time doesn't account
for a possible delay.  I did not measure huge delta before/after this
patch but if you have e.g. nfqueue in between alloc+confirm then the
start timestamp will account for that delay after this patch.
Pablo Neira Ayuso Oct. 29, 2024, 9:53 a.m. UTC | #3
On Tue, Oct 29, 2024 at 08:16:24AM +0100, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > On Sat, Oct 26, 2024 at 12:50:13PM +0200, Florian Westphal wrote:
> > > Sample start time at allocation time, not when the conntrack entry
> > > is inserted into the hashtable.
> > 
> > Back at the time, long time ago, I remember to have measured a
> > performance impact on this.
> 
> You mean when enabling timestamp + conntracks get dropped before
> confirm, correct?

Yes.

> > > In most cases this makes very little difference, but there are
> > > cases where there is significant delay beteen allocation and
> > > confirmation, e.g. when packets get queued to userspace.
> > 
> > I delayed this to insertion time because packet could dropped before,
> > rendering this conntrack timestamp useless? There is no event
> > reporting for conntrack that never get confirmed.
> 
> Sure, but the "issue" is that the reported start time doesn't account
> for a possible delay.  I did not measure huge delta before/after this
> patch but if you have e.g. nfqueue in between alloc+confirm then the
> start timestamp will account for that delay after this patch.

I see. I think the question is what this start timestamp is. For me,
it is the start time since the conntrack is _confirmed_ which is what
we expose to userspace via ctnetlink and /proc interface.

Is this user trying to trying to profile nfqueue? Why does this user
assume conntrack allocation time is the right spot to push the
timestamp on the ct?

On top of this, at that time I made this, I measured ~20-25%
performance drop to get this accurate timestamp, probably this is
cheaper now in modern equipment?
Florian Westphal Oct. 29, 2024, 11:18 a.m. UTC | #4
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > I delayed this to insertion time because packet could dropped before,
> > > rendering this conntrack timestamp useless? There is no event
> > > reporting for conntrack that never get confirmed.
> > 
> > Sure, but the "issue" is that the reported start time doesn't account
> > for a possible delay.  I did not measure huge delta before/after this
> > patch but if you have e.g. nfqueue in between alloc+confirm then the
> > start timestamp will account for that delay after this patch.
> 
> I see. I think the question is what this start timestamp is. For me,
> it is the start time since the conntrack is _confirmed_ which is what
> we expose to userspace via ctnetlink and /proc interface.

Sure, its a question on definition as to what "flow start time" means.
See below for yet another proposal.

> Is this user trying to trying to profile nfqueue? Why does this user
> assume conntrack allocation time is the right spot to push the
> timestamp on the ct?

If I understood correctly its about using conntrack events + timestamp
extension to get (passive) RTT measurements.

> On top of this, at that time I made this, I measured ~20-25%
> performance drop to get this accurate timestamp, probably this is
> cheaper now in modern equipment?

I think cost depends on the clock source, hopefully most systems do not
have a too expensive one nowadays.

What about using skb_tstamp() first?  If skb already had rx tstamp
enabled, we'd get an even more accurate start time (packet arrival),
even before ct got allocated.
diff mbox series

Patch

diff --git a/include/net/netfilter/nf_conntrack_timestamp.h b/include/net/netfilter/nf_conntrack_timestamp.h
index 57138d974a9f..76515353829d 100644
--- a/include/net/netfilter/nf_conntrack_timestamp.h
+++ b/include/net/netfilter/nf_conntrack_timestamp.h
@@ -23,18 +23,18 @@  struct nf_conn_tstamp *nf_conn_tstamp_find(const struct nf_conn *ct)
 #endif
 }
 
-static inline
-struct nf_conn_tstamp *nf_ct_tstamp_ext_add(struct nf_conn *ct, gfp_t gfp)
+static inline void nf_ct_tstamp_ext_add(struct nf_conn *ct, gfp_t gfp)
 {
 #ifdef CONFIG_NF_CONNTRACK_TIMESTAMP
 	struct net *net = nf_ct_net(ct);
+	struct nf_conn_tstamp *tstamp;
 
 	if (!net->ct.sysctl_tstamp)
-		return NULL;
+		return;
 
-	return nf_ct_ext_add(ct, NF_CT_EXT_TSTAMP, gfp);
-#else
-	return NULL;
+	tstamp = nf_ct_ext_add(ct, NF_CT_EXT_TSTAMP, gfp);
+	if (tstamp)
+		tstamp->start = ktime_get_real_ns();
 #endif
 };
 
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 9db3e2b0b1c3..ed1870096519 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -976,18 +976,6 @@  static void nf_ct_acct_merge(struct nf_conn *ct, enum ip_conntrack_info ctinfo,
 	}
 }
 
-static void __nf_conntrack_insert_prepare(struct nf_conn *ct)
-{
-	struct nf_conn_tstamp *tstamp;
-
-	refcount_inc(&ct->ct_general.use);
-
-	/* set conntrack timestamp, if enabled. */
-	tstamp = nf_conn_tstamp_find(ct);
-	if (tstamp)
-		tstamp->start = ktime_get_real_ns();
-}
-
 /**
  * nf_ct_match_reverse - check if ct1 and ct2 refer to identical flow
  * @ct1: conntrack in hash table to check against
@@ -1111,7 +1099,7 @@  static int nf_ct_resolve_clash_harder(struct sk_buff *skb, u32 repl_idx)
 	 */
 	loser_ct->status |= IPS_FIXED_TIMEOUT | IPS_NAT_CLASH;
 
-	__nf_conntrack_insert_prepare(loser_ct);
+	refcount_inc(&loser_ct->ct_general.use);
 
 	/* fake add for ORIGINAL dir: we want lookups to only find the entry
 	 * already in the table.  This also hides the clashing entry from
@@ -1295,7 +1283,7 @@  __nf_conntrack_confirm(struct sk_buff *skb)
 	   weird delay cases. */
 	ct->timeout += nfct_time_stamp;
 
-	__nf_conntrack_insert_prepare(ct);
+	refcount_inc(&ct->ct_general.use);
 
 	/* Since the lookup is lockless, hash insertion must be done after
 	 * starting the timer and setting the CONFIRMED bit. The RCU barriers