Message ID | 50D8413C.8050508@openwrt.org |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
On 12-12-24 06:49 AM, Felix Fietkau wrote: > > After I added it as an experiment, I got distracted with other projects > again and forgot about submitting it. Take a look at the code - if the > approach is reasonable, I'll submit this thing for inclusion soon. > Excellent ;-> Simple and elegant. Usable as is - some minor comments. First nitpick: The name is not very reflective, how about: GetMarkFromConntrack or something along those lines? > +static int tcf_connmark(struct sk_buff *skb, const struct tc_action *a, > + struct tcf_result *res) > +{ > + struct nf_conn *c; > + enum ip_conntrack_info ctinfo; > + int proto; > + int r; > + > + if (skb->protocol == htons(ETH_P_IP)) { > + if (skb->len < sizeof(struct iphdr)) > + goto out; > + proto = PF_INET; > + } else if (skb->protocol == htons(ETH_P_IPV6)) { > + if (skb->len < sizeof(struct ipv6hdr)) > + goto out; > + proto = PF_INET6; > + } else > + goto out; > + I would have said that this action is probably also not useful for egress qdisc path since skb->mark would already be set. It maybe worth checking skb->tc_verd and skipping overhead of nf_conntrack_in() call. Look at act_mirred for such a check. > + r = nf_conntrack_in(dev_net(skb->dev), proto, NF_INET_PRE_ROUTING, skb); > + if (r != NF_ACCEPT) > + goto out; > + > + c = nf_ct_get(skb, &ctinfo); > + if (!c) > + goto out; > + > + skb->mark = c->mark; > + nf_conntrack_put(skb->nfct); > + skb->nfct = NULL; > + > +out: > + return TC_ACT_PIPE; Ok, perhaps set tcf_action in (iproute2) user space to TC_ACT_PIPE then just return policy->tcf_action here. Even better is to have a different TC_ACT_XXX returned for failure vs success... Your success path becomes TC_ACT_PIPE and let the user program the failure branch optionally. This would allow for branching to different actions if success/failure, example: if mark is found { if mark is 0xa redirect to ifb0 else redirect to ifb1 } else set mark to 3 then redirect to ifb9 etc. Not sure if that made sense. I am under the influence of nyquil ;-> cheers, jamal -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Felix, On Mon, Dec 24, 2012 at 12:49:16PM +0100, Felix Fietkau wrote: > On 2012-12-24 12:34 PM, Jamal Hadi Salim wrote: > > > > Some good news Yury. > > I am told Felix Fietkau <nbd@openwrt.org> (on CC) actually > > already solved this issue and it is a feature in openwrt. I > > cant find the code. > > > > Felix - Yury is trying to retrieve skb->mark fields from > > netfilter connmark. My understanding is you have written > > such an action. Can you please point us to it - and any > > reason you havent submitted this for inclusion in kernel > > proper? > After I added it as an experiment, I got distracted with other projects > again and forgot about submitting it. Take a look at the code - if the > approach is reasonable, I'll submit this thing for inclusion soon. > > - Felix > > --- /dev/null > +++ b/net/sched/act_connmark.c > @@ -0,0 +1,137 @@ > +/* > + * Copyright (c) 2011 Felix Fietkau <nbd@openwrt.org> > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms and conditions of the GNU General Public License, > + * version 2, as published by the Free Software Foundation. > + * > + * This program is distributed in the hope it will be useful, but WITHOUT > + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or > + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for > + * more details. > + * > + * You should have received a copy of the GNU General Public License along with > + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple > + * Place - Suite 330, Boston, MA 02111-1307 USA. > + */ > + > +#include <linux/module.h> > +#include <linux/init.h> > +#include <linux/kernel.h> > +#include <linux/skbuff.h> > +#include <linux/rtnetlink.h> > +#include <linux/pkt_cls.h> > +#include <linux/ip.h> > +#include <linux/ipv6.h> > +#include <net/netlink.h> > +#include <net/pkt_sched.h> > +#include <net/act_api.h> > + > +#include <net/netfilter/nf_conntrack.h> > +#include <net/netfilter/nf_conntrack_core.h> > + > +#define TCA_ACT_CONNMARK 20 > + > +#define CONNMARK_TAB_MASK 3 > +static struct tcf_common *tcf_connmark_ht[CONNMARK_TAB_MASK + 1]; > +static u32 connmark_idx_gen; > +static DEFINE_RWLOCK(connmark_lock); > + > +static struct tcf_hashinfo connmark_hash_info = { > + .htab = tcf_connmark_ht, > + .hmask = CONNMARK_TAB_MASK, > + .lock = &connmark_lock, > +}; > + > +static int tcf_connmark(struct sk_buff *skb, const struct tc_action *a, > + struct tcf_result *res) > +{ > + struct nf_conn *c; > + enum ip_conntrack_info ctinfo; > + int proto; > + int r; > + > + if (skb->protocol == htons(ETH_P_IP)) { > + if (skb->len < sizeof(struct iphdr)) > + goto out; > + proto = PF_INET; > + } else if (skb->protocol == htons(ETH_P_IPV6)) { > + if (skb->len < sizeof(struct ipv6hdr)) > + goto out; > + proto = PF_INET6; > + } else > + goto out; > + > + r = nf_conntrack_in(dev_net(skb->dev), proto, NF_INET_PRE_ROUTING, skb); conntrack needs to see defragmented packets, you have to call nf_defrag_ipv4 / _ipv6 respectively before that. This also changes the semantics of the raw table in iptables since it will now see packet with conntrack already attached. So this would also break -j CT --notrack. This needs more thinking. I can appreciate the value of calling conntrack from different points of the packet traversal, but there are a couple of thing we have to resolve before allowing that. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Pablo, On 12-12-24 08:12 AM, Pablo Neira Ayuso wrote: > > conntrack needs to see defragmented packets, you have to call > nf_defrag_ipv4 / _ipv6 respectively before that. > This should not be too hard to do - although my thinking says this should be a separate action. > This also changes the semantics of the raw table in iptables since it > will now see packet with conntrack already attached. So this would > also break -j CT --notrack. > Is there a flag we can check which says a flow is not to be tracked? Doesnt nf_conntrack_in() fail if --no track is set? > This needs more thinking. I can appreciate the value of calling > conntrack from different points of the packet traversal, but there are > a couple of thing we have to resolve before allowing that. There is user need for this Pablo - as you can see from what Felix deployed it seems to be used a lot more wider audience dependency. What do we need to do to get this to work properly? cheers, jamal -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Jamal, On Mon, Dec 24, 2012 at 09:05:42AM -0500, Jamal Hadi Salim wrote: > On 12-12-24 08:12 AM, Pablo Neira Ayuso wrote: > > > >conntrack needs to see defragmented packets, you have to call > >nf_defrag_ipv4 / _ipv6 respectively before that. > > > > This should not be too hard to do - although my thinking says this > should be a separate action. > > >This also changes the semantics of the raw table in iptables since it > >will now see packet with conntrack already attached. So this would > >also break -j CT --notrack. > > Is there a flag we can check which says a flow is not to be tracked? > Doesnt nf_conntrack_in() fail if --no track is set? The notrack dummy conntrack (consider it a flag) is attached in prerouting raw table. By attaching conntracks at ingress, the notrack flag will be ignored. Note that this also breaks conntrack templates via -j CT, that allows us to set custom conntrack timeouts, zones and helpers at prerouting raw. Basically, ct templates are attached via -j CT, this template is munched by nf_conntrack_in, which adds the corresponding ct features based on the template information. > >This needs more thinking. I can appreciate the value of calling > >conntrack from different points of the packet traversal, but there are > >a couple of thing we have to resolve before allowing that. > > There is user need for this Pablo - as you can see from what Felix > deployed it seems to be used a lot more wider audience dependency. > What do we need to do to get this to work properly? The conntrack code needs to be generalized to allow creating conntrack with features all at once (so we can remove the template infrastructure). Even after that, we'll still have that -j CT rules will be ignored if you're using, let's name it, act_ct from ingress to attach the conntrack to it. With the current approach you're using, people will see conntracks in the iptables raw table, that breaks the current semantics. We'll have the netfilter workshop by Q1/Q2 2013 (still TBA), I think this is material for discussion in it. cheers, Pablo -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Dec 24, 2012 at 07:19:43PM +0100, Pablo Neira Ayuso wrote: > Hi Jamal, > > On Mon, Dec 24, 2012 at 09:05:42AM -0500, Jamal Hadi Salim wrote: > > On 12-12-24 08:12 AM, Pablo Neira Ayuso wrote: > > > > > >conntrack needs to see defragmented packets, you have to call > > >nf_defrag_ipv4 / _ipv6 respectively before that. > > > > > > > This should not be too hard to do - although my thinking says this > > should be a separate action. > > > > >This also changes the semantics of the raw table in iptables since it > > >will now see packet with conntrack already attached. So this would > > >also break -j CT --notrack. > > > > Is there a flag we can check which says a flow is not to be tracked? > > Doesnt nf_conntrack_in() fail if --no track is set? > > The notrack dummy conntrack (consider it a flag) is attached in > prerouting raw table. By attaching conntracks at ingress, the notrack > flag will be ignored. Note that this also breaks conntrack templates > via -j CT, that allows us to set custom conntrack timeouts, zones and > helpers at prerouting raw. > > Basically, ct templates are attached via -j CT, this template is > munched by nf_conntrack_in, which adds the corresponding ct features > based on the template information. I'm still spinning around this and I don't come with some easy solution that doesn't break the existing semantics. One possibility can be to drop the ct reference after leaving ingress, so the lookup happens again in prerouting after the raw table to attach it again and no ct is seen in the raw table but: 1) it's suboptimal in case users have rules using ct at ingress and in iptables. 2) the conntrack template infrastructure needs to be reworked/replaced by something more flexible to attach features to conntracks, so we can still attach features for conntrack entries that were created at ingress (so helpers / custom timeouts / notrack don't break). -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
--- /dev/null +++ b/net/sched/act_connmark.c @@ -0,0 +1,137 @@ +/* + * Copyright (c) 2011 Felix Fietkau <nbd@openwrt.org> + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple + * Place - Suite 330, Boston, MA 02111-1307 USA. + */ + +#include <linux/module.h> +#include <linux/init.h> +#include <linux/kernel.h> +#include <linux/skbuff.h> +#include <linux/rtnetlink.h> +#include <linux/pkt_cls.h> +#include <linux/ip.h> +#include <linux/ipv6.h> +#include <net/netlink.h> +#include <net/pkt_sched.h> +#include <net/act_api.h> + +#include <net/netfilter/nf_conntrack.h> +#include <net/netfilter/nf_conntrack_core.h> + +#define TCA_ACT_CONNMARK 20 + +#define CONNMARK_TAB_MASK 3 +static struct tcf_common *tcf_connmark_ht[CONNMARK_TAB_MASK + 1]; +static u32 connmark_idx_gen; +static DEFINE_RWLOCK(connmark_lock); + +static struct tcf_hashinfo connmark_hash_info = { + .htab = tcf_connmark_ht, + .hmask = CONNMARK_TAB_MASK, + .lock = &connmark_lock, +}; + +static int tcf_connmark(struct sk_buff *skb, const struct tc_action *a, + struct tcf_result *res) +{ + struct nf_conn *c; + enum ip_conntrack_info ctinfo; + int proto; + int r; + + if (skb->protocol == htons(ETH_P_IP)) { + if (skb->len < sizeof(struct iphdr)) + goto out; + proto = PF_INET; + } else if (skb->protocol == htons(ETH_P_IPV6)) { + if (skb->len < sizeof(struct ipv6hdr)) + goto out; + proto = PF_INET6; + } else + goto out; + + r = nf_conntrack_in(dev_net(skb->dev), proto, NF_INET_PRE_ROUTING, skb); + if (r != NF_ACCEPT) + goto out; + + c = nf_ct_get(skb, &ctinfo); + if (!c) + goto out; + + skb->mark = c->mark; + nf_conntrack_put(skb->nfct); + skb->nfct = NULL; + +out: + return TC_ACT_PIPE; +} + +static int tcf_connmark_init(struct nlattr *nla, struct nlattr *est, + struct tc_action *a, int ovr, int bind) +{ + struct tcf_common *pc; + + pc = tcf_hash_create(0, est, a, sizeof(*pc), bind, + &connmark_idx_gen, &connmark_hash_info); + if (IS_ERR(pc)) + return PTR_ERR(pc); + + tcf_hash_insert(pc, &connmark_hash_info); + + return ACT_P_CREATED; +} + +static inline int tcf_connmark_cleanup(struct tc_action *a, int bind) +{ + if (a->priv) + return tcf_hash_release(a->priv, bind, &connmark_hash_info); + return 0; +} + +static inline int tcf_connmark_dump(struct sk_buff *skb, struct tc_action *a, + int bind, int ref) +{ + return skb->len; +} + +static struct tc_action_ops act_connmark_ops = { + .kind = "connmark", + .hinfo = &connmark_hash_info, + .type = TCA_ACT_CONNMARK, + .capab = TCA_CAP_NONE, + .owner = THIS_MODULE, + .act = tcf_connmark, + .dump = tcf_connmark_dump, + .cleanup = tcf_connmark_cleanup, + .init = tcf_connmark_init, + .walk = tcf_generic_walker, +}; + +MODULE_AUTHOR("Felix Fietkau <nbd@openwrt.org>"); +MODULE_DESCRIPTION("Connection tracking mark restoring"); +MODULE_LICENSE("GPL"); + +static int __init connmark_init_module(void) +{ + return tcf_register_action(&act_connmark_ops); +} + +static void __exit connmark_cleanup_module(void) +{ + tcf_unregister_action(&act_connmark_ops); +} + +module_init(connmark_init_module); +module_exit(connmark_cleanup_module); --- a/net/sched/Kconfig +++ b/net/sched/Kconfig @@ -670,6 +670,19 @@ config NET_ACT_CSUM To compile this code as a module, choose M here: the module will be called act_csum. +config NET_ACT_CONNMARK + tristate "Connection Tracking Marking" + depends on NET_CLS_ACT + depends on NF_CONNTRACK + depends on NF_CONNTRACK_MARK + ---help--- + Say Y here to restore the connmark from a scheduler action + + If unsure, say N. + + To compile this code as a module, choose M here: the + module will be called act_connmark. + config NET_CLS_IND bool "Incoming device classification" depends on NET_CLS_U32 || NET_CLS_FW --- a/net/sched/Makefile +++ b/net/sched/Makefile @@ -16,6 +16,7 @@ obj-$(CONFIG_NET_ACT_PEDIT) += act_pedit obj-$(CONFIG_NET_ACT_SIMP) += act_simple.o obj-$(CONFIG_NET_ACT_SKBEDIT) += act_skbedit.o obj-$(CONFIG_NET_ACT_CSUM) += act_csum.o +obj-$(CONFIG_NET_ACT_CONNMARK) += act_connmark.o obj-$(CONFIG_NET_SCH_FIFO) += sch_fifo.o obj-$(CONFIG_NET_SCH_CBQ) += sch_cbq.o obj-$(CONFIG_NET_SCH_HTB) += sch_htb.o