diff mbox

Flow classifier proto-dst and TOS (and proto-src)

Message ID 1318697463.7169.21.camel@ganymede
State Rejected, archived
Delegated to: David Miller
Headers show

Commit Message

Dan Siemon Oct. 15, 2011, 4:51 p.m. UTC
cls_flow.c: flow_get_proto_dst()

The proto-dst key returns the destination port for UDP, TCP and a few
other protocols [see proto_ports_offset()]. For ICMP and IPIP it falls
back to:

return addr_fold(skb_dst(skb)) ^ (__force u16)skb->protocol;

Since Linux maintains a dst_entry for each TOS value this causes the
returned value to be affected by the TOS which is unexpected and
probably broken.

Is there a reason why this doesn't return 0 for protocols that don't
have a notion of source and destination ports? It seems very odd to me
that a value which is not at all related to the traffic on the wire is
returned for this key.

There is a somewhat similar situation with flow_get_proto_src(). Here
the fallback value is:

return addr_fold(skb->sk);

It looks like this is 0 when the traffic doesn't originate locally and
even for local traffic I don't understand why the use of a effectively
random number here is useful.

For a long winded explanation of how I discovered this see:
http://www.coverfire.com/archives/2011/10/15/linux-flow-classifier-proto-dst-and-tos/

Below is a simple patch which makes these functions fallback to
returning 0 when the protocol doesn't have the notion of ports.

Signed-off-by: Dan Siemon <dan@coverfire.com>

Comments

Eric Dumazet Oct. 17, 2011, 6:09 a.m. UTC | #1
Le samedi 15 octobre 2011 à 12:51 -0400, Dan Siemon a écrit :
> cls_flow.c: flow_get_proto_dst()
> 
> The proto-dst key returns the destination port for UDP, TCP and a few
> other protocols [see proto_ports_offset()]. For ICMP and IPIP it falls
> back to:
> 
> return addr_fold(skb_dst(skb)) ^ (__force u16)skb->protocol;
> 
> Since Linux maintains a dst_entry for each TOS value this causes the
> returned value to be affected by the TOS which is unexpected and
> probably broken.

Hi Dan

I think Patrick did this on purpose, because of of the lack of
perturbation in cls_flow.c : If all these frames were mapped to a single
flow, they might interfere with an other regular flow and hurt it.

I dont qualify existing code as buggy. Its about fallback behavior
anyway (I dont think its even documented)

If you have too many frames going to the fallback, then this classifier
is probably not the one you should use ?

Hint : You can change your filter to use this classifier only on TCP/UDP
trafic, and use another one on other protocols : Coupled to your qdisc
rules, you even can limit to X percent the bandwidth allocated to this
trafic

We could argue that if TOS value of two packets is different, then
packets belong to different flows as well. [ It seems we currently lack
a FLOW_KEY_TOS : that could be a usefull addition ]


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dan Siemon Oct. 24, 2011, 1:03 a.m. UTC | #2
On Mon, 2011-10-17 at 08:09 +0200, Eric Dumazet wrote:
> Le samedi 15 octobre 2011 à 12:51 -0400, Dan Siemon a écrit :
> > cls_flow.c: flow_get_proto_dst()
> > 
> > The proto-dst key returns the destination port for UDP, TCP and a few
> > other protocols [see proto_ports_offset()]. For ICMP and IPIP it falls
> > back to:
> > 
> > return addr_fold(skb_dst(skb)) ^ (__force u16)skb->protocol;
> > 
> > Since Linux maintains a dst_entry for each TOS value this causes the
> > returned value to be affected by the TOS which is unexpected and
> > probably broken.
> 
> Hi Dan
> 
> I think Patrick did this on purpose, because of of the lack of
> perturbation in cls_flow.c : If all these frames were mapped to a single
> flow, they might interfere with an other regular flow and hurt it.
> 
> I dont qualify existing code as buggy. Its about fallback behavior
> anyway (I dont think its even documented)

Thanks for the review Eric.

Won't virtually all uses of proto-dst also use the dst key anyway? In
which case this fallback does nothing except make the TOS effect the
hash output because the dst will be the same and dst_entry would be the
same if it wasn't for the different TOS (by far the common case). I
don't see the value of the unintuitive behavior.

I'm not certain this is a problem but also note that including TOS will
mean that packets within a tunnel will be reordered if 'tos inherit' is
set on the tunnel and only the typical src,dst,proto,proto-src,proto-dst
is used. Again, probably not expected.

> If you have too many frames going to the fallback, then this classifier
> is probably not the one you should use ?

If you have significant traffic in tunnels then any 5-tuple approach is
going to present problems unless you look into the tunnel (like my other
patch :) )

> Hint : You can change your filter to use this classifier only on TCP/UDP
> trafic, and use another one on other protocols : Coupled to your qdisc
> rules, you even can limit to X percent the bandwidth allocated to this
> trafic
> 
> We could argue that if TOS value of two packets is different, then
> packets belong to different flows as well. [ It seems we currently lack
> a FLOW_KEY_TOS : that could be a usefull addition ]
Eric Dumazet Oct. 24, 2011, 4:02 a.m. UTC | #3
Le dimanche 23 octobre 2011 à 21:03 -0400, Dan Siemon a écrit :
> On Mon, 2011-10-17 at 08:09 +0200, Eric Dumazet wrote:
> > Le samedi 15 octobre 2011 à 12:51 -0400, Dan Siemon a écrit :
> > > cls_flow.c: flow_get_proto_dst()
> > > 
> > > The proto-dst key returns the destination port for UDP, TCP and a few
> > > other protocols [see proto_ports_offset()]. For ICMP and IPIP it falls
> > > back to:
> > > 
> > > return addr_fold(skb_dst(skb)) ^ (__force u16)skb->protocol;
> > > 
> > > Since Linux maintains a dst_entry for each TOS value this causes the
> > > returned value to be affected by the TOS which is unexpected and
> > > probably broken.
> > 
> > Hi Dan
> > 
> > I think Patrick did this on purpose, because of of the lack of
> > perturbation in cls_flow.c : If all these frames were mapped to a single
> > flow, they might interfere with an other regular flow and hurt it.
> > 
> > I dont qualify existing code as buggy. Its about fallback behavior
> > anyway (I dont think its even documented)
> 
> Thanks for the review Eric.
> 
> Won't virtually all uses of proto-dst also use the dst key anyway? In
> which case this fallback does nothing except make the TOS effect the
> hash output because the dst will be the same and dst_entry would be the
> same if it wasn't for the different TOS (by far the common case). I
> don't see the value of the unintuitive behavior.
> 
> I'm not certain this is a problem but also note that including TOS will
> mean that packets within a tunnel will be reordered if 'tos inherit' is
> set on the tunnel and only the typical src,dst,proto,proto-src,proto-dst
> is used. Again, probably not expected.

Hmm, it seems cls_flow now supports "perturbation", so my prior argument
is not true anymore. I guess your patch would be fine, but I prefer
waiting Patrick review on it ;)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/sched/cls_flow.c b/net/sched/cls_flow.c
index 6994214..7527e61 100644
--- a/net/sched/cls_flow.c
+++ b/net/sched/cls_flow.c
@@ -150,7 +150,7 @@  static u32 flow_get_proto_src(struct sk_buff *skb)
 	}
 	}
 
-	return addr_fold(skb->sk);
+	return 0;
 }
 
 static u32 flow_get_proto_dst(struct sk_buff *skb)
@@ -192,7 +192,7 @@  static u32 flow_get_proto_dst(struct sk_buff *skb)
 	}
 	}
 
-	return addr_fold(skb_dst(skb)) ^ (__force u16)skb->protocol;
+	return 0;
 }
 
static u32 flow_get_iif(const struct sk_buff *skb)