Message ID | 1428095784-7091-1-git-send-email-ast@plumgrid.com |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
On 04/03/2015 11:16 PM, Alexei Starovoitov wrote: > BPF programs attached to ingress and egress qdiscs see inconsistent skb->data. > For ingress L2 header is already pulled, whereas for egress it's present. > That makes writing programs more difficult. > Make them consistent by pushing L2 before running the programs and pulling > it back afterwards. > Similar approach is taken by skb_defer_rx_timestamp() which does push/pull > before calling ptp_classify_raw()->BPF_PROG_RUN(). > > Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Thanks for looking into this. This ends up going via ingress_enqueue(), right? Maybe it would be better to add a new netlink attribute for ingress qdisc there that sets a flag in ingress_qdisc_data to pull the header space before calling tc_classify() and restore it later on? So, it would be configurable from tc. Would that work? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 4/3/15 2:46 PM, Daniel Borkmann wrote: > On 04/03/2015 11:16 PM, Alexei Starovoitov wrote: >> BPF programs attached to ingress and egress qdiscs see inconsistent >> skb->data. >> For ingress L2 header is already pulled, whereas for egress it's present. >> That makes writing programs more difficult. >> Make them consistent by pushing L2 before running the programs and >> pulling >> it back afterwards. >> Similar approach is taken by skb_defer_rx_timestamp() which does >> push/pull >> before calling ptp_classify_raw()->BPF_PROG_RUN(). >> >> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> > > Thanks for looking into this. This ends up going via ingress_enqueue(), yes. > right? Maybe it would be better to add a new netlink attribute for > ingress qdisc there that sets a flag in ingress_qdisc_data to pull the > header space before calling tc_classify() and restore it later on? > So, it would be configurable from tc. Would that work? you mean a flag that will affect all classifiers? I'm not sure other classifiers care. Noone complained for years. I think it would be overdesign. Here the fix is trivial, which is my preference. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04/03/2015 11:52 PM, Alexei Starovoitov wrote: > On 4/3/15 2:46 PM, Daniel Borkmann wrote: >> On 04/03/2015 11:16 PM, Alexei Starovoitov wrote: >>> BPF programs attached to ingress and egress qdiscs see inconsistent >>> skb->data. >>> For ingress L2 header is already pulled, whereas for egress it's present. >>> That makes writing programs more difficult. >>> Make them consistent by pushing L2 before running the programs and >>> pulling >>> it back afterwards. >>> Similar approach is taken by skb_defer_rx_timestamp() which does >>> push/pull >>> before calling ptp_classify_raw()->BPF_PROG_RUN(). >>> >>> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> >> >> Thanks for looking into this. This ends up going via ingress_enqueue(), > > yes. > >> right? Maybe it would be better to add a new netlink attribute for >> ingress qdisc there that sets a flag in ingress_qdisc_data to pull the >> header space before calling tc_classify() and restore it later on? >> So, it would be configurable from tc. Would that work? > > you mean a flag that will affect all classifiers? I'm not sure other > classifiers care. Noone complained for years. I think it would be > overdesign. Here the fix is trivial, which is my preference. But the 'defect' is actually on the ingress qdisc side, right, not the classifier itself ... so if we do this in the classifier, we add two extra branches to the output path, which would never be taken. Plus, other classifiers wanting to look into ethernet headers would then also need to pull from /within/ their classifier as well. What about classic BPF users? I don't think fixing this in ingress qdisc is overdesign, but rather the better place to fix it. ;) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 4/3/15 3:10 PM, Daniel Borkmann wrote: > On 04/03/2015 11:52 PM, Alexei Starovoitov wrote: >> On 4/3/15 2:46 PM, Daniel Borkmann wrote: >>> On 04/03/2015 11:16 PM, Alexei Starovoitov wrote: >>>> BPF programs attached to ingress and egress qdiscs see inconsistent >>>> skb->data. >>>> For ingress L2 header is already pulled, whereas for egress it's >>>> present. >>>> That makes writing programs more difficult. >>>> Make them consistent by pushing L2 before running the programs and >>>> pulling >>>> it back afterwards. >>>> Similar approach is taken by skb_defer_rx_timestamp() which does >>>> push/pull >>>> before calling ptp_classify_raw()->BPF_PROG_RUN(). >>>> >>>> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> >>> >>> Thanks for looking into this. This ends up going via ingress_enqueue(), >> >> yes. >> >>> right? Maybe it would be better to add a new netlink attribute for >>> ingress qdisc there that sets a flag in ingress_qdisc_data to pull the >>> header space before calling tc_classify() and restore it later on? >>> So, it would be configurable from tc. Would that work? >> >> you mean a flag that will affect all classifiers? I'm not sure other >> classifiers care. Noone complained for years. I think it would be >> overdesign. Here the fix is trivial, which is my preference. > > But the 'defect' is actually on the ingress qdisc side, right, not > the classifier itself ... so if we do this in the classifier, we add > two extra branches to the output path, which would never be taken. > > Plus, other classifiers wanting to look into ethernet headers would > then also need to pull from /within/ their classifier as well. What > about classic BPF users? > > I don't think fixing this in ingress qdisc is overdesign, but rather > the better place to fix it. ;) Strongly disagree. 1. there shouldn't be a choice at all for bpf. Because not pulling l2 means it's bug. 2. adding a flag means adding it to iproute2 with default off and making users forgetting it from time to time and have no way of knowing why their programs all of a sudden stopped working. classic falls under the same rules. It doesn't make sense at all to run a program on packet without L2 header. It's very odd both for classic and extended programs. Two 'if' conditions in critical path is bogus argument, since these checks would be there in ingress as well. Same critical path. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04/04/2015 12:17 AM, Alexei Starovoitov wrote: ... > 1. there shouldn't be a choice at all for bpf. Because not pulling l2 > means it's bug. Yep, correct. You would also loose context for a possible dissection, at best you only have skb->protocol. > 2. adding a flag means adding it to iproute2 with default off and making > users forgetting it from time to time and have no way of knowing why > their programs all of a sudden stopped working. > > classic falls under the same rules. It doesn't make sense at all to run > a program on packet without L2 header. It's very odd both for classic > and extended programs. Yep. > Two 'if' conditions in critical path is bogus argument, since these > checks would be there in ingress as well. Same critical path. Why bogus? There would be no such test on the normal egress path, where this is irrelevant. I wasn't talking about ingress here. I see the point regarding the user option. So, why not adding a flag to tcf_proto_ops a la `.flags = CLS_REQUIRES_L2` that gets propagated to tcf_proto, and only ingress_enqueue() would need to test if the classifier imposes that requirement, so it can push/pull. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 4/3/15 3:54 PM, Daniel Borkmann wrote: > On 04/04/2015 12:17 AM, Alexei Starovoitov wrote: > ... >> 1. there shouldn't be a choice at all for bpf. Because not pulling l2 >> means it's bug. > > Yep, correct. You would also loose context for a possible dissection, > at best you only have skb->protocol. > >> 2. adding a flag means adding it to iproute2 with default off and making >> users forgetting it from time to time and have no way of knowing why >> their programs all of a sudden stopped working. >> >> classic falls under the same rules. It doesn't make sense at all to run >> a program on packet without L2 header. It's very odd both for classic >> and extended programs. > > Yep. > >> Two 'if' conditions in critical path is bogus argument, since these >> checks would be there in ingress as well. Same critical path. > > Why bogus? There would be no such test on the normal egress path, > where this is irrelevant. I wasn't talking about ingress here. > > I see the point regarding the user option. So, why not adding a flag > to tcf_proto_ops a la `.flags = CLS_REQUIRES_L2` that gets propagated > to tcf_proto, and only ingress_enqueue() would need to test if the > classifier imposes that requirement, so it can push/pull. ok. that sounds better, but neither tcf_proto nor tcf_proto_ops have 'flags' field today... well, I guess it's time to add flags there. Probably add 'flags' to tcf_proto_ops only and do fl->ops->flags in ingress_enqueue()? Will respin. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 4/3/15 4:04 PM, Alexei Starovoitov wrote: > On 4/3/15 3:54 PM, Daniel Borkmann wrote: >> On 04/04/2015 12:17 AM, Alexei Starovoitov wrote: >> ... >>> 1. there shouldn't be a choice at all for bpf. Because not pulling l2 >>> means it's bug. >> >> Yep, correct. You would also loose context for a possible dissection, >> at best you only have skb->protocol. >> >>> 2. adding a flag means adding it to iproute2 with default off and making >>> users forgetting it from time to time and have no way of knowing why >>> their programs all of a sudden stopped working. >>> >>> classic falls under the same rules. It doesn't make sense at all to run >>> a program on packet without L2 header. It's very odd both for classic >>> and extended programs. >> >> Yep. >> >>> Two 'if' conditions in critical path is bogus argument, since these >>> checks would be there in ingress as well. Same critical path. >> >> Why bogus? There would be no such test on the normal egress path, >> where this is irrelevant. I wasn't talking about ingress here. >> >> I see the point regarding the user option. So, why not adding a flag >> to tcf_proto_ops a la `.flags = CLS_REQUIRES_L2` that gets propagated >> to tcf_proto, and only ingress_enqueue() would need to test if the >> classifier imposes that requirement, so it can push/pull. > > ok. that sounds better, but neither tcf_proto nor tcf_proto_ops have > 'flags' field today... well, I guess it's time to add flags there. > Probably add 'flags' to tcf_proto_ops only and do fl->ops->flags in > ingress_enqueue()? > > Will respin. nope. will take it back. that doesn't work, since this check cannot be done in ingress_enqueue(), because it sees the pointer to first filter only, so both TCQ_F_INGRESS flag and CLS_REQUIRES_L2 flag need to be checked inside tc_classify_compat() which is a lot worse than my current patch. So I prefer this patch still :) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04/04/2015 01:11 AM, Alexei Starovoitov wrote: > On 4/3/15 4:04 PM, Alexei Starovoitov wrote: >> On 4/3/15 3:54 PM, Daniel Borkmann wrote: ... >>> I see the point regarding the user option. So, why not adding a flag >>> to tcf_proto_ops a la `.flags = CLS_REQUIRES_L2` that gets propagated >>> to tcf_proto, and only ingress_enqueue() would need to test if the >>> classifier imposes that requirement, so it can push/pull. >> >> ok. that sounds better, but neither tcf_proto nor tcf_proto_ops have >> 'flags' field today... well, I guess it's time to add flags there. I don't think it would be a big problem. >> Probably add 'flags' to tcf_proto_ops only and do fl->ops->flags in >> ingress_enqueue()? Something along that line, yeah. >> Will respin. > > nope. will take it back. > that doesn't work, since this check cannot be done in ingress_enqueue(), > because it sees the pointer to first filter only, so both TCQ_F_INGRESS > flag and CLS_REQUIRES_L2 flag need to be checked inside So on a quick glance, we're calling into cls_bpf_classify() in tp->classify() (net/sched/cls_api.c +265), so all remaining filters in that list we're traversing in cls_bpf_classify() are all BPF filters, no? Have to grab some sleep for now, will be on travel tomorrow. Anyway, worst case it could still be refactored later. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04/04/2015 01:26 AM, Daniel Borkmann wrote: > On 04/04/2015 01:11 AM, Alexei Starovoitov wrote: ... >> nope. will take it back. >> that doesn't work, since this check cannot be done in ingress_enqueue(), >> because it sees the pointer to first filter only, so both TCQ_F_INGRESS >> flag and CLS_REQUIRES_L2 flag need to be checked inside > > So on a quick glance, we're calling into cls_bpf_classify() in tp->classify() > (net/sched/cls_api.c +265), so all remaining filters in that list we're > traversing in cls_bpf_classify() are all BPF filters, no? I see, you mean the classifier chain, not the chain of filters within the cls_bpf classifier, ok. > Have to grab some sleep for now, will be on travel tomorrow. Anyway, worst > case it could still be refactored later. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 4/3/15 4:48 PM, Daniel Borkmann wrote: > On 04/04/2015 01:26 AM, Daniel Borkmann wrote: >> On 04/04/2015 01:11 AM, Alexei Starovoitov wrote: > ... >>> nope. will take it back. >>> that doesn't work, since this check cannot be done in ingress_enqueue(), >>> because it sees the pointer to first filter only, so both TCQ_F_INGRESS >>> flag and CLS_REQUIRES_L2 flag need to be checked inside >> >> So on a quick glance, we're calling into cls_bpf_classify() in >> tp->classify() >> (net/sched/cls_api.c +265), so all remaining filters in that list we're >> traversing in cls_bpf_classify() are all BPF filters, no? > > I see, you mean the classifier chain, not the chain of filters within > the cls_bpf classifier, ok. yes. the chain of classifiers can have different types, so we cannot check it once in ingress_enqueue(). As you said we can refactor it later. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04/04/2015 02:14 AM, Alexei Starovoitov wrote: > On 4/3/15 4:48 PM, Daniel Borkmann wrote: >> On 04/04/2015 01:26 AM, Daniel Borkmann wrote: >>> On 04/04/2015 01:11 AM, Alexei Starovoitov wrote: >> ... >>>> nope. will take it back. >>>> that doesn't work, since this check cannot be done in ingress_enqueue(), >>>> because it sees the pointer to first filter only, so both TCQ_F_INGRESS >>>> flag and CLS_REQUIRES_L2 flag need to be checked inside >>> >>> So on a quick glance, we're calling into cls_bpf_classify() in >>> tp->classify() >>> (net/sched/cls_api.c +265), so all remaining filters in that list we're >>> traversing in cls_bpf_classify() are all BPF filters, no? >> >> I see, you mean the classifier chain, not the chain of filters within >> the cls_bpf classifier, ok. > > yes. the chain of classifiers can have different types, so we > cannot check it once in ingress_enqueue(). > As you said we can refactor it later. Too many indirections. :/ Yes, I'm fine with that, thanks. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Alexei Starovoitov <ast@plumgrid.com> Date: Fri, 3 Apr 2015 14:16:24 -0700 > + if (skb_headroom(skb) < ETH_HLEN) > + return -1; > + __skb_push(skb, ETH_HLEN); ... > + if (tp->q->flags & TCQ_F_INGRESS) > + __skb_pull(skb, ETH_HLEN); Please use the actual device's L2 header length, via dev->hard_header_len, rather than hard coding ethernet. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c index 5c4171c5d2bd..b1cefd2037c9 100644 --- a/net/sched/cls_bpf.c +++ b/net/sched/cls_bpf.c @@ -66,6 +66,12 @@ static int cls_bpf_classify(struct sk_buff *skb, const struct tcf_proto *tp, struct cls_bpf_prog *prog; int ret = -1; + if (tp->q->flags & TCQ_F_INGRESS) { + if (skb_headroom(skb) < ETH_HLEN) + return -1; + __skb_push(skb, ETH_HLEN); + } + /* Needed here for accessing maps. */ rcu_read_lock(); list_for_each_entry_rcu(prog, &head->plist, link) { @@ -86,6 +92,9 @@ static int cls_bpf_classify(struct sk_buff *skb, const struct tcf_proto *tp, } rcu_read_unlock(); + if (tp->q->flags & TCQ_F_INGRESS) + __skb_pull(skb, ETH_HLEN); + return ret; }
BPF programs attached to ingress and egress qdiscs see inconsistent skb->data. For ingress L2 header is already pulled, whereas for egress it's present. That makes writing programs more difficult. Make them consistent by pushing L2 before running the programs and pulling it back afterwards. Similar approach is taken by skb_defer_rx_timestamp() which does push/pull before calling ptp_classify_raw()->BPF_PROG_RUN(). Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> --- Looks like noone has tried bpf with ingress qdisc before, otherwise this problem would have been found sooner. This patch is probably not needed for 'net', since tc+bpf infra only starting to come alive. net/sched/cls_bpf.c | 9 +++++++++ 1 file changed, 9 insertions(+)