diff mbox series

[v2] net: ip/tnl: Set iph->id only when don't fragment is not set

Message ID 20191124132418.GA13864@fuckup
State Rejected
Delegated to: David Miller
Headers show
Series [v2] net: ip/tnl: Set iph->id only when don't fragment is not set | expand

Commit Message

Oliver Herms Nov. 24, 2019, 1:24 p.m. UTC
In IPv4 the identification field ensures that fragments of different datagrams
are not mixed by the receiver. Packets with Don't Fragment (DF) flag set are not
to be fragmented in transit and thus don't need an identification.
From RFC 6864 "Updated Specification of the IPv4 ID Field" section 4.1:
"The IPv4 ID field MUST NOT be used for purposes other than fragmentation and
reassembly."
Calculating the identification takes significant CPU time.
This patch will increase IP tunneling performance by ~10% unless DF is not set.
However, DF is set by default which is best practice.
For security reason this patch will set ID to zero when ID is not needed to
prevent leaking random information.

Signed-off-by: Oliver Herms <oliver.peter.herms@gmail.com>
---
Changes in v2:
  - Add source of assertion into commit message.
  - Correct check of DF flag.
  - Set iph->id to 0 when DF if not set to avoid leaking information.

 net/ipv4/ip_tunnel_core.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Comments

David Miller Nov. 25, 2019, 10:41 p.m. UTC | #1
From: Oliver Herms <oliver.peter.herms@gmail.com>
Date: Sun, 24 Nov 2019 14:24:18 +0100

> From RFC 6864 "Updated Specification of the IPv4 ID Field" section 4.1:

Just reading the abstract of this RFC I cannot take it seriously:

	This document updates the specification of the IPv4 ID field
	in RFCs 791, 1122, and 2003 to more closely reflect current
	practice...

"more closely reflect current practice" ?!?!

That statement is a joke right?

Linux generates the bulk of the traffic on the internet and we've had
the current behavior of the ID field for decades.

Therefore, I don't think even the premise of this document is valid.

These are all red flags to me, and I think we should keep the current
behavior.

I'm not applying your patch, sorry.
Oliver Herms Nov. 26, 2019, 7:10 p.m. UTC | #2
Hi Dave,

On 25.11.19 23:41, David Miller wrote:
> From: Oliver Herms <oliver.peter.herms@gmail.com>
> Date: Sun, 24 Nov 2019 14:24:18 +0100
> 
>> From RFC 6864 "Updated Specification of the IPv4 ID Field" section 4.1:
> 
> Just reading the abstract of this RFC I cannot take it seriously:
> 
> 	This document updates the specification of the IPv4 ID field
> 	in RFCs 791, 1122, and 2003 to more closely reflect current
> 	practice...
> 
> "more closely reflect current practice" ?!?!
> 
> That statement is a joke right?
> 
> Linux generates the bulk of the traffic on the internet and we've had
> the current behavior of the ID field for decades.
> 
> Therefore, I don't think even the premise of this document is valid.
> 
> These are all red flags to me, and I think we should keep the current
> behavior.
> 
> I'm not applying your patch, sorry.
> 
I totally understand your argument.

What do you think about making this configurable via sysctl and make the current
behavior the default? I would also like to make this configurable for other 
payload types like TCP and UDP. IMHO there the ID is unnecessary, too, when DF is set.

Would you be willing to merge a patch that offers this?

Thanks
Oliver
David Miller Nov. 26, 2019, 8:51 p.m. UTC | #3
From: Oliver Herms <oliver.peter.herms@gmail.com>
Date: Tue, 26 Nov 2019 20:10:52 +0100

> Would you be willing to merge a patch that offers this?

No.

There is lots of precendence for incrementing the ID, look
at SLIP compression:

	deltaS = ntohs(ip->id) - ntohs(cs->cs_ip.id);
	if(deltaS != 1){
		cp = encode(cp,deltaS);
		changes |= NEW_I;
	}

That's just one example, it won't compress unless the ID
field is (unconditionally) incrementing.

The RFC being discussed here is poorly constructed and is founded on a
false basis of reality.  It's the usual tone deaf IETF stuff... and
Linux will not participate.

Thanks.
Eric Dumazet Nov. 26, 2019, 10:45 p.m. UTC | #4
On 11/26/19 11:10 AM, Oliver Herms wrote:
> 
> What do you think about making this configurable via sysctl and make the current
> behavior the default? I would also like to make this configurable for other 
> payload types like TCP and UDP. IMHO there the ID is unnecessary, too, when DF is set.
>

Certainly not.

I advise you to look at GRO layer (at various stages, depending on linux version)

You can not 'optimize [1]' the sender and break receivers ( including old ones )

[1] Look at ip_select_ident_segs() : the per-socket id generator makes
ID generation quite low cost, there is no real issue here.
Oliver Herms Nov. 26, 2019, 11:32 p.m. UTC | #5
Hi everyone,

On 26.11.19 23:45, Eric Dumazet wrote:
> 
> 
> On 11/26/19 11:10 AM, Oliver Herms wrote:
>>
>> What do you think about making this configurable via sysctl and make the current
>> behavior the default? I would also like to make this configurable for other 
>> payload types like TCP and UDP. IMHO there the ID is unnecessary, too, when DF is set.
>>
> 
> Certainly not.
> 
> I advise you to look at GRO layer (at various stages, depending on linux version)
> 
> You can not 'optimize [1]' the sender and break receivers ( including old ones )
> 
> [1] Look at ip_select_ident_segs() : the per-socket id generator makes
> ID generation quite low cost, there is no real issue here.
> 

ip_select_ident_segs() is not the issue. The issue is with __ip_select_ident
that calls ip_idents_reserve. That consumes significant amount of CPU time here
while not adding any value (for my use case of company internal IPIP tunneling 
in a well defined environment to be fair).

Here is a flame graph: https://tinyurl.com/s9qv9fx
I'm curious for ideas on how to make this more efficient.
Using a simple incrementation here, as with sockets, would solve my problem well enough.

Thoughts?

Thanks
Oliver
Eric Dumazet Nov. 27, 2019, 12:23 a.m. UTC | #6
On 11/26/19 3:32 PM, Oliver Herms wrote:
> Hi everyone,
> 
> On 26.11.19 23:45, Eric Dumazet wrote:
>>
>>
>> On 11/26/19 11:10 AM, Oliver Herms wrote:
>>>
>>> What do you think about making this configurable via sysctl and make the current
>>> behavior the default? I would also like to make this configurable for other 
>>> payload types like TCP and UDP. IMHO there the ID is unnecessary, too, when DF is set.
>>>
>>
>> Certainly not.
>>
>> I advise you to look at GRO layer (at various stages, depending on linux version)
>>
>> You can not 'optimize [1]' the sender and break receivers ( including old ones )
>>
>> [1] Look at ip_select_ident_segs() : the per-socket id generator makes
>> ID generation quite low cost, there is no real issue here.
>>
> 
> ip_select_ident_segs() is not the issue. The issue is with __ip_select_ident
> that calls ip_idents_reserve. That consumes significant amount of CPU time here
> while not adding any value (for my use case of company internal IPIP tunneling 
> in a well defined environment to be fair).
> 
> Here is a flame graph: https://tinyurl.com/s9qv9fx
> I'm curious for ideas on how to make this more efficient.
> Using a simple incrementation here, as with sockets, would solve my problem well enough.
> 
> Thoughts?
> 

Oliver, TCP flows do not call  __ip_select_ident()

Pleas do not touch TCP payload.

Thank you.
Eric Dumazet Nov. 27, 2019, 12:28 a.m. UTC | #7
On 11/26/19 3:32 PM, Oliver Herms wrote:
> Using a simple incrementation here, as with sockets, would solve my problem well enough.
>

I have to ask : Are you aware that linux is SMP OS ?

If on a mostly idle host, two packets need a different ID, using a " simple incrementation" 
wont fit the need.

sockets are protected against concurrent increments by their lock.
Oliver Herms Nov. 27, 2019, 12:19 p.m. UTC | #8
On 27.11.19 01:28, Eric Dumazet wrote:
> 
> 
> On 11/26/19 3:32 PM, Oliver Herms wrote:
>> Using a simple incrementation here, as with sockets, would solve my problem well enough.
>>
> 
> I have to ask : Are you aware that linux is SMP OS ?
> 
> If on a mostly idle host, two packets need a different ID, using a " simple incrementation" 
> wont fit the need.
> 
> sockets are protected against concurrent increments by their lock.
> 
I know and I'm not going to mess around with TCP.
I've double checked and found that for non IP tunnel traffic (UDP, TCP, etc.) the cheap function
ip_select_ident_segs() is being used. That is absolutely fine. Nothing to optimize here.

For IP tunnels __ip_select_ident is being called.
And that one is way more expensive than ip_select_ident_segs().

ip_select_ident_segs() increments a counter (yes, I'm aware it is protected by lock).
If somehow __ip_select_ident could be refactored to work in a similar fashion that
would solve my problem.

Thanks
Oliver
diff mbox series

Patch

diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index 1452a97914a0..ea12c8cc9e83 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -73,7 +73,11 @@  void iptunnel_xmit(struct sock *sk, struct rtable *rt, struct sk_buff *skb,
 	iph->daddr	=	dst;
 	iph->saddr	=	src;
 	iph->ttl	=	ttl;
-	__ip_select_ident(net, iph, skb_shinfo(skb)->gso_segs ?: 1);
+
+	if (likely((iph->frag_off & htons(IP_DF)) == htons(IP_DF)))
+		iph->id = 0;
+	else
+		__ip_select_ident(net, iph, skb_shinfo(skb)->gso_segs ?: 1);
 
 	err = ip_local_out(net, sk, skb);