diff mbox

[RFC] : ingress socket filter by mark

Message ID 1255897680.4815.63.camel@dogo.mojatatu.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

jamal Oct. 18, 2009, 8:28 p.m. UTC
On Sun, 2009-10-18 at 19:28 +0200, Eric Dumazet wrote:

> I vote for extending BPF, and not adding the price of a compare
> for each packet. Only users wanting mark filtering should pay the price.

To be honest it nagged me as well;->
So here's a basic patch stolen from a patch i just saw that you
posted;-> I still havent tested. Let me know if it looks reasonable...

cheers,
jamal

Comments

Maciej Żenczykowski Oct. 18, 2009, 11:09 p.m. UTC | #1
I haven't looked at the patches, but I do not believe requiring
marking to be symmetric to be a good idea.

Example:
- a relatively complex router/load balancer setup
- normal (no mark) packets get routed/load balanced to destinations
which are healthy
- a separate job which health checks the destinations (and updates the
no mark routing table on destination health state transitions) - it
uses socket marking to select a separate routing table which throws
all load at a specific destination host.

ie. the socket marking is used to explicitly direct load at a specific
destination host.
Obviously only the transmit path is affected.  There is no marking
happening on the receive path.

Another example would be tunneling. I'd envision something like:
ip tunnel add vtun0 mode gre remote ... local ... tos inherit ttl 64 mark 0x1234

ip rule add fwmark 0x1234 lookup 250
ip route add 192.168.1.0/24 dev eth0 table 250
ip route add default via 192.168.1.1 dev eth0 table 250
ip route add default dev vtun0 table main

(obviously this is just an example and not fully fleshed out,
furthermore ip tunnel doesn't currently support mark, nor do the
tunnel modules themselves)

If you do want to allow explicit incoming mark filtering use another
socket option (SO_MARK_RCV) and a different/new socket field.

>> I vote for extending BPF, and not adding the price of a compare
>> for each packet. Only users wanting mark filtering should pay the price.

I agree that being able to filter on mark in bpf makes a lot of sense.
I wonder if we're not hitting the filters potentially before the mark
is set though (on receive at least)...
I'm nowhere near sure but I think packets get diverted/cloned to
tcpdump before they hit the ip stack (and thus potentially get marked
by ip(6)table mangle rules)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
jamal Oct. 19, 2009, 12:12 p.m. UTC | #2
On Sun, 2009-10-18 at 16:09 -0700, Maciej Żenczykowski wrote:
> 
> I agree that being able to filter on mark in bpf makes a lot of sense.

I agree as well - i posted a patch yesterday; i just tested it and it
works so i will formally post it shortly.

> I wonder if we're not hitting the filters potentially before the mark
> is set though (on receive at least)...
> I'm nowhere near sure but I think packets get diverted/cloned to
> tcpdump before they hit the ip stack (and thus potentially get marked
> by ip(6)table mangle rules)

There are many ways to mark the packets before they get to the socket.
tc ingress provides at least two ways (ipt action and recently posted
patch by me on skbedit); iptables as well.

cheers,
jamal

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 1354aaf..909193e 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -123,7 +123,8 @@  struct sock_fprog	/* Required for SO_ATTACH_FILTER. */
 #define SKF_AD_IFINDEX 	8
 #define SKF_AD_NLATTR	12
 #define SKF_AD_NLATTR_NEST	16
-#define SKF_AD_MAX	20
+#define SKF_AD_MARK 	20
+#define SKF_AD_MAX	24
 #define SKF_NET_OFF   (-0x100000)
 #define SKF_LL_OFF    (-0x200000)
 
diff --git a/net/core/filter.c b/net/core/filter.c
index d1d779c..e3987e1 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -303,6 +303,9 @@  load_b:
 		case SKF_AD_IFINDEX:
 			A = skb->dev->ifindex;
 			continue;
+		case SKF_AD_MARK:
+			A = skb->mark;
+			continue;
 		case SKF_AD_NLATTR: {
 			struct nlattr *nla;