Message ID | 20191124132418.GA13864@fuckup |
---|---|
State | Rejected |
Delegated to: | David Miller |
Headers | show |
Series | [v2] net: ip/tnl: Set iph->id only when don't fragment is not set | expand |
From: Oliver Herms <oliver.peter.herms@gmail.com> Date: Sun, 24 Nov 2019 14:24:18 +0100 > From RFC 6864 "Updated Specification of the IPv4 ID Field" section 4.1: Just reading the abstract of this RFC I cannot take it seriously: This document updates the specification of the IPv4 ID field in RFCs 791, 1122, and 2003 to more closely reflect current practice... "more closely reflect current practice" ?!?! That statement is a joke right? Linux generates the bulk of the traffic on the internet and we've had the current behavior of the ID field for decades. Therefore, I don't think even the premise of this document is valid. These are all red flags to me, and I think we should keep the current behavior. I'm not applying your patch, sorry.
Hi Dave, On 25.11.19 23:41, David Miller wrote: > From: Oliver Herms <oliver.peter.herms@gmail.com> > Date: Sun, 24 Nov 2019 14:24:18 +0100 > >> From RFC 6864 "Updated Specification of the IPv4 ID Field" section 4.1: > > Just reading the abstract of this RFC I cannot take it seriously: > > This document updates the specification of the IPv4 ID field > in RFCs 791, 1122, and 2003 to more closely reflect current > practice... > > "more closely reflect current practice" ?!?! > > That statement is a joke right? > > Linux generates the bulk of the traffic on the internet and we've had > the current behavior of the ID field for decades. > > Therefore, I don't think even the premise of this document is valid. > > These are all red flags to me, and I think we should keep the current > behavior. > > I'm not applying your patch, sorry. > I totally understand your argument. What do you think about making this configurable via sysctl and make the current behavior the default? I would also like to make this configurable for other payload types like TCP and UDP. IMHO there the ID is unnecessary, too, when DF is set. Would you be willing to merge a patch that offers this? Thanks Oliver
From: Oliver Herms <oliver.peter.herms@gmail.com> Date: Tue, 26 Nov 2019 20:10:52 +0100 > Would you be willing to merge a patch that offers this? No. There is lots of precendence for incrementing the ID, look at SLIP compression: deltaS = ntohs(ip->id) - ntohs(cs->cs_ip.id); if(deltaS != 1){ cp = encode(cp,deltaS); changes |= NEW_I; } That's just one example, it won't compress unless the ID field is (unconditionally) incrementing. The RFC being discussed here is poorly constructed and is founded on a false basis of reality. It's the usual tone deaf IETF stuff... and Linux will not participate. Thanks.
On 11/26/19 11:10 AM, Oliver Herms wrote: > > What do you think about making this configurable via sysctl and make the current > behavior the default? I would also like to make this configurable for other > payload types like TCP and UDP. IMHO there the ID is unnecessary, too, when DF is set. > Certainly not. I advise you to look at GRO layer (at various stages, depending on linux version) You can not 'optimize [1]' the sender and break receivers ( including old ones ) [1] Look at ip_select_ident_segs() : the per-socket id generator makes ID generation quite low cost, there is no real issue here.
Hi everyone, On 26.11.19 23:45, Eric Dumazet wrote: > > > On 11/26/19 11:10 AM, Oliver Herms wrote: >> >> What do you think about making this configurable via sysctl and make the current >> behavior the default? I would also like to make this configurable for other >> payload types like TCP and UDP. IMHO there the ID is unnecessary, too, when DF is set. >> > > Certainly not. > > I advise you to look at GRO layer (at various stages, depending on linux version) > > You can not 'optimize [1]' the sender and break receivers ( including old ones ) > > [1] Look at ip_select_ident_segs() : the per-socket id generator makes > ID generation quite low cost, there is no real issue here. > ip_select_ident_segs() is not the issue. The issue is with __ip_select_ident that calls ip_idents_reserve. That consumes significant amount of CPU time here while not adding any value (for my use case of company internal IPIP tunneling in a well defined environment to be fair). Here is a flame graph: https://tinyurl.com/s9qv9fx I'm curious for ideas on how to make this more efficient. Using a simple incrementation here, as with sockets, would solve my problem well enough. Thoughts? Thanks Oliver
On 11/26/19 3:32 PM, Oliver Herms wrote: > Hi everyone, > > On 26.11.19 23:45, Eric Dumazet wrote: >> >> >> On 11/26/19 11:10 AM, Oliver Herms wrote: >>> >>> What do you think about making this configurable via sysctl and make the current >>> behavior the default? I would also like to make this configurable for other >>> payload types like TCP and UDP. IMHO there the ID is unnecessary, too, when DF is set. >>> >> >> Certainly not. >> >> I advise you to look at GRO layer (at various stages, depending on linux version) >> >> You can not 'optimize [1]' the sender and break receivers ( including old ones ) >> >> [1] Look at ip_select_ident_segs() : the per-socket id generator makes >> ID generation quite low cost, there is no real issue here. >> > > ip_select_ident_segs() is not the issue. The issue is with __ip_select_ident > that calls ip_idents_reserve. That consumes significant amount of CPU time here > while not adding any value (for my use case of company internal IPIP tunneling > in a well defined environment to be fair). > > Here is a flame graph: https://tinyurl.com/s9qv9fx > I'm curious for ideas on how to make this more efficient. > Using a simple incrementation here, as with sockets, would solve my problem well enough. > > Thoughts? > Oliver, TCP flows do not call __ip_select_ident() Pleas do not touch TCP payload. Thank you.
On 11/26/19 3:32 PM, Oliver Herms wrote: > Using a simple incrementation here, as with sockets, would solve my problem well enough. > I have to ask : Are you aware that linux is SMP OS ? If on a mostly idle host, two packets need a different ID, using a " simple incrementation" wont fit the need. sockets are protected against concurrent increments by their lock.
On 27.11.19 01:28, Eric Dumazet wrote: > > > On 11/26/19 3:32 PM, Oliver Herms wrote: >> Using a simple incrementation here, as with sockets, would solve my problem well enough. >> > > I have to ask : Are you aware that linux is SMP OS ? > > If on a mostly idle host, two packets need a different ID, using a " simple incrementation" > wont fit the need. > > sockets are protected against concurrent increments by their lock. > I know and I'm not going to mess around with TCP. I've double checked and found that for non IP tunnel traffic (UDP, TCP, etc.) the cheap function ip_select_ident_segs() is being used. That is absolutely fine. Nothing to optimize here. For IP tunnels __ip_select_ident is being called. And that one is way more expensive than ip_select_ident_segs(). ip_select_ident_segs() increments a counter (yes, I'm aware it is protected by lock). If somehow __ip_select_ident could be refactored to work in a similar fashion that would solve my problem. Thanks Oliver
diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c index 1452a97914a0..ea12c8cc9e83 100644 --- a/net/ipv4/ip_tunnel_core.c +++ b/net/ipv4/ip_tunnel_core.c @@ -73,7 +73,11 @@ void iptunnel_xmit(struct sock *sk, struct rtable *rt, struct sk_buff *skb, iph->daddr = dst; iph->saddr = src; iph->ttl = ttl; - __ip_select_ident(net, iph, skb_shinfo(skb)->gso_segs ?: 1); + + if (likely((iph->frag_off & htons(IP_DF)) == htons(IP_DF))) + iph->id = 0; + else + __ip_select_ident(net, iph, skb_shinfo(skb)->gso_segs ?: 1); err = ip_local_out(net, sk, skb);
In IPv4 the identification field ensures that fragments of different datagrams are not mixed by the receiver. Packets with Don't Fragment (DF) flag set are not to be fragmented in transit and thus don't need an identification. From RFC 6864 "Updated Specification of the IPv4 ID Field" section 4.1: "The IPv4 ID field MUST NOT be used for purposes other than fragmentation and reassembly." Calculating the identification takes significant CPU time. This patch will increase IP tunneling performance by ~10% unless DF is not set. However, DF is set by default which is best practice. For security reason this patch will set ID to zero when ID is not needed to prevent leaking random information. Signed-off-by: Oliver Herms <oliver.peter.herms@gmail.com> --- Changes in v2: - Add source of assertion into commit message. - Correct check of DF flag. - Set iph->id to 0 when DF if not set to avoid leaking information. net/ipv4/ip_tunnel_core.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)