Message ID | 1472241532-11682-6-git-send-email-daniel@zonque.org |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
On 08/26/2016 09:58 PM, Daniel Mack wrote:
> If the cgroup associated with the receiving socket has an eBPF
> programs installed, run them from __dev_queue_xmit().
>
> eBPF programs used in this context are expected to either return 1 to
> let the packet pass, or != 1 to drop them. The programs have access to
> the full skb, including the MAC headers.
>
> Note that cgroup_bpf_run_filter() is stubbed out as static inline nop
> for !CONFIG_CGROUP_BPF, and is otherwise guarded by a static key if
> the feature is unused.
>
> Signed-off-by: Daniel Mack <daniel@zonque.org>
> ---
>  net/core/dev.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index a75df86..17484e6 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -141,6 +141,7 @@
>  #include <linux/netfilter_ingress.h>
>  #include <linux/sctp.h>
>  #include <linux/crash_dump.h>
> +#include <linux/bpf-cgroup.h>
>
>  #include "net-sysfs.h"
>
> @@ -3329,6 +3330,11 @@ static int __dev_queue_xmit(struct sk_buff *skb, void *accel_priv)
>  	if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_SCHED_TSTAMP))
>  		__skb_tstamp_tx(skb, NULL, skb->sk, SCM_TSTAMP_SCHED);
>
> +	rc = cgroup_bpf_run_filter(skb->sk, skb,
> +				   BPF_ATTACH_TYPE_CGROUP_INET_EGRESS);
> +	if (rc)
> +		return rc;

This would leak the whole skb by the way.

Apart from that, could this be modeled w/o affecting the forwarding path (at some
local output point where we know to have a valid socket)? Then you could also drop
the !sk and sk->sk_family tests, and we wouldn't need to replicate parts of what
clsact is doing as well. Hmm, maybe access to src/dst mac could be handled to be
just zeroes since not available at that point?

>  	/* Disable soft irqs for various locks below. Also
>  	 * stops preemption for RCU.
>  	 */
>
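The leak Borkmann points out is a buffer-ownership problem: once `__dev_queue_xmit()` is entered, the transmit path owns the skb, so any early return that drops the packet must also free it (in the kernel, with `kfree_skb()`). The sketch below is a userspace model of that rule, not kernel code; `struct pkt`, `pkt_alloc()`, `pkt_free()`, `run_egress_filter()`, `hand_to_driver()` and `xmit()` are hypothetical stand-ins for the skb, its allocator, `kfree_skb()`, `cgroup_bpf_run_filter()`, the driver hand-off and `__dev_queue_xmit()`, and using `-EPERM` as the drop value is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sk_buff. */
struct pkt {
	int verdict; /* what the attached program would return; 1 = pass */
};

static int live_pkts; /* packets allocated but not yet freed */

static struct pkt *pkt_alloc(int verdict)
{
	struct pkt *p = malloc(sizeof(*p));

	if (p) {
		p->verdict = verdict;
		live_pkts++;
	}
	return p;
}

/* Stand-in for kfree_skb(). */
static void pkt_free(struct pkt *p)
{
	free(p);
	live_pkts--;
}

/* Stand-in for the egress filter: 0 lets the packet through,
 * a negative errno value means "drop" (assumed convention). */
static int run_egress_filter(const struct pkt *p)
{
	return p->verdict == 1 ? 0 : -EPERM;
}

/* Stand-in for the driver: it consumes (frees) the packet. */
static int hand_to_driver(struct pkt *p)
{
	pkt_free(p);
	return 0;
}

/* Stand-in for __dev_queue_xmit(): the callee owns p on every path. */
static int xmit(struct pkt *p)
{
	int rc = run_egress_filter(p);

	if (rc) {
		/* Returning here without this free is the leak. */
		pkt_free(p);
		return rc;
	}
	return hand_to_driver(p);
}
```

With the free on the drop path, both a passing and a dropped packet leave `live_pkts` back at zero.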
On Tue, Aug 30, 2016 at 12:03:23AM +0200, Daniel Borkmann wrote:
> On 08/26/2016 09:58 PM, Daniel Mack wrote:
> >If the cgroup associated with the receiving socket has an eBPF
> >programs installed, run them from __dev_queue_xmit().
> >
> >eBPF programs used in this context are expected to either return 1 to
> >let the packet pass, or != 1 to drop them. The programs have access to
> >the full skb, including the MAC headers.
> >
> >Note that cgroup_bpf_run_filter() is stubbed out as static inline nop
> >for !CONFIG_CGROUP_BPF, and is otherwise guarded by a static key if
> >the feature is unused.
> >
> >Signed-off-by: Daniel Mack <daniel@zonque.org>
> >---
> > net/core/dev.c | 6 ++++++
> > 1 file changed, 6 insertions(+)
> >
> >diff --git a/net/core/dev.c b/net/core/dev.c
> >index a75df86..17484e6 100644
> >--- a/net/core/dev.c
> >+++ b/net/core/dev.c
> >@@ -141,6 +141,7 @@
> > #include <linux/netfilter_ingress.h>
> > #include <linux/sctp.h>
> > #include <linux/crash_dump.h>
> >+#include <linux/bpf-cgroup.h>
> >
> > #include "net-sysfs.h"
> >
> >@@ -3329,6 +3330,11 @@ static int __dev_queue_xmit(struct sk_buff *skb, void *accel_priv)
> > 	if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_SCHED_TSTAMP))
> > 		__skb_tstamp_tx(skb, NULL, skb->sk, SCM_TSTAMP_SCHED);
> >
> >+	rc = cgroup_bpf_run_filter(skb->sk, skb,
> >+				   BPF_ATTACH_TYPE_CGROUP_INET_EGRESS);
> >+	if (rc)
> >+		return rc;
>
> This would leak the whole skb by the way.
>
> Apart from that, could this be modeled w/o affecting the forwarding path (at some
> local output point where we know to have a valid socket)? Then you could also drop
> the !sk and sk->sk_family tests, and we wouldn't need to replicate parts of what
> clsact is doing as well. Hmm, maybe access to src/dst mac could be handled to be
> just zeroes since not available at that point?
>
> > 	/* Disable soft irqs for various locks below. Also
> > 	 * stops preemption for RCU.
> > 	 */
> >

Given this patchset only affects AF_INET and AF_INET6, why not put the hooks at ip_output and ip6_output?
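The `!sk` and `sk->sk_family` tests Borkmann refers to are needed because a device-level hook also sees skbs without an owning inet socket, such as forwarded packets, which typically carry no socket at all. Below is a hedged userspace sketch of such a guard; `struct fake_sock`, `struct fake_pkt` and `should_run_egress_prog()` are hypothetical names, and only `AF_INET`, `AF_INET6` and `AF_UNIX` come from the real `<sys/socket.h>`. Which of the two checks could be dropped depends on where the hook finally lands, which is what this part of the thread is debating.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical stand-ins for struct sock and struct sk_buff. */
struct fake_sock {
	int family; /* AF_INET, AF_INET6, AF_UNIX, ... */
};

struct fake_pkt {
	struct fake_sock *sk; /* NULL when no socket owns the packet */
};

/* Decide whether the cgroup egress program should see this packet:
 * only packets owned by an IPv4 or IPv6 socket qualify. */
static bool should_run_egress_prog(const struct fake_pkt *pkt)
{
	if (pkt->sk == NULL)
		return false; /* e.g. forwarded traffic */
	return pkt->sk->family == AF_INET || pkt->sk->family == AF_INET6;
}
```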
On 08/30/2016 12:03 AM, Daniel Borkmann wrote:
> On 08/26/2016 09:58 PM, Daniel Mack wrote:
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index a75df86..17484e6 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -141,6 +141,7 @@
>>  #include <linux/netfilter_ingress.h>
>>  #include <linux/sctp.h>
>>  #include <linux/crash_dump.h>
>> +#include <linux/bpf-cgroup.h>
>>
>>  #include "net-sysfs.h"
>>
>> @@ -3329,6 +3330,11 @@ static int __dev_queue_xmit(struct sk_buff *skb, void *accel_priv)
>>  	if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_SCHED_TSTAMP))
>>  		__skb_tstamp_tx(skb, NULL, skb->sk, SCM_TSTAMP_SCHED);
>>
>> +	rc = cgroup_bpf_run_filter(skb->sk, skb,
>> +				   BPF_ATTACH_TYPE_CGROUP_INET_EGRESS);
>> +	if (rc)
>> +		return rc;
>
> This would leak the whole skb by the way.

Ah, right.

> Apart from that, could this be modeled w/o affecting the forwarding path (at some
> local output point where we know to have a valid socket)? Then you could also drop
> the !sk and sk->sk_family tests, and we wouldn't need to replicate parts of what
> clsact is doing as well. Hmm, maybe access to src/dst mac could be handled to be
> just zeroes since not available at that point?

Hmm, I wonder where this hook could be put instead then. When placed in
ip_output() and ip6_output(), the mac headers cannot be pushed before
running the program, resulting in bogus skb data from the eBPF program.

Also, if I read the code correctly, ip[6]_output is not called for
multicast packets.

Any other ideas?


Thanks,
Daniel
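Mack's concern is about what a program finds at each offset of the packet data. A filter written for the device-level view in the posted patch, where the data starts with the Ethernet header, reads the EtherType at byte offset 12. If the same filter runs where the data starts at the IP header, offset 12 falls inside the IPv4 header, on the first bytes of the source address. The userspace sketch below shows this with a hand-built frame; `read_ethertype()` and `ETH_HLEN_BYTES` are hypothetical names, and the frame bytes (including the zero checksum) are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN_BYTES 14 /* dst MAC (6) + src MAC (6) + EtherType (2) */

/* A made-up Ethernet frame carrying a 20-byte IPv4 header. */
static const uint8_t frame[] = {
	0x02, 0x00, 0x00, 0x00, 0x00, 0x01, /* dst MAC */
	0x02, 0x00, 0x00, 0x00, 0x00, 0x02, /* src MAC */
	0x08, 0x00,                         /* EtherType: IPv4 */
	0x45, 0x00, 0x00, 0x54,             /* version/IHL, TOS, total length */
	0x00, 0x00, 0x40, 0x00,             /* ID, flags/fragment offset */
	0x40, 0x01, 0x00, 0x00,             /* TTL, protocol (ICMP), checksum */
	0xc0, 0xa8, 0x00, 0x01,             /* src 192.168.0.1 */
	0xc0, 0xa8, 0x00, 0x02,             /* dst 192.168.0.2 */
};

/* Filter logic that assumes data starts at the MAC header:
 * the EtherType is the big-endian 16-bit value at offset 12. */
static uint16_t read_ethertype(const uint8_t *data, size_t len)
{
	if (len < ETH_HLEN_BYTES)
		return 0; /* too short to hold an Ethernet header */
	return (uint16_t)((data[12] << 8) | data[13]);
}
```

Called on `frame` it returns the IPv4 EtherType `0x0800`; called on `frame + ETH_HLEN_BYTES` (an L3 view) it silently returns `0xc0a8`, the first half of the source address.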
On 09/05/2016 04:22 PM, Daniel Mack wrote:
> On 08/30/2016 12:03 AM, Daniel Borkmann wrote:
>> On 08/26/2016 09:58 PM, Daniel Mack wrote:
>
>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>> index a75df86..17484e6 100644
>>> --- a/net/core/dev.c
>>> +++ b/net/core/dev.c
>>> @@ -141,6 +141,7 @@
>>>  #include <linux/netfilter_ingress.h>
>>>  #include <linux/sctp.h>
>>>  #include <linux/crash_dump.h>
>>> +#include <linux/bpf-cgroup.h>
>>>
>>>  #include "net-sysfs.h"
>>>
>>> @@ -3329,6 +3330,11 @@ static int __dev_queue_xmit(struct sk_buff *skb, void *accel_priv)
>>>  	if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_SCHED_TSTAMP))
>>>  		__skb_tstamp_tx(skb, NULL, skb->sk, SCM_TSTAMP_SCHED);
>>>
>>> +	rc = cgroup_bpf_run_filter(skb->sk, skb,
>>> +				   BPF_ATTACH_TYPE_CGROUP_INET_EGRESS);
>>> +	if (rc)
>>> +		return rc;
>>
>> This would leak the whole skb by the way.
>
> Ah, right.
>
>> Apart from that, could this be modeled w/o affecting the forwarding path (at some
>> local output point where we know to have a valid socket)? Then you could also drop
>> the !sk and sk->sk_family tests, and we wouldn't need to replicate parts of what
>> clsact is doing as well. Hmm, maybe access to src/dst mac could be handled to be
>> just zeroes since not available at that point?
>
> Hmm, I wonder where this hook could be put instead then. When placed in
> ip_output() and ip6_output(), the mac headers cannot be pushed before
> running the program, resulting in bogus skb data from the eBPF program.

But as it stands right now, RX will only see a subset of packets at the sk_filter()
layer (depending on where it's called in the proto handler implementation, so it
might not even include all control messages, for example), whereas the TX hook goes
so far as to 'see' everything, including forwarded packets, in the sense that, with
the hook enabled, we know a priori that these kinds of skbs will pass through the
cgroup_bpf_run_filter() handler only to skip it eventually anyway.
What about letting such progs see /only/ local skbs for RX and TX, with skb->data from L3 onwards (iirc, that would be similar to what current sk_filter() programs see)?
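Under Borkmann's proposal, a program would see `skb->data` starting at the network header on both RX and TX, which he recalls matches what current `sk_filter()` programs see. A filter written for that view can branch on the IP version nibble in the first byte and needs no knowledge of the link layer, which fits a hook limited to AF_INET and AF_INET6. This is a hedged userspace sketch; `l3_version()` and `allow_packet()` are hypothetical, and the policy (drop IPv4 traffic to 10.0.0.0/8, pass everything else) is purely illustrative. It keeps the posted convention of returning 1 to pass and any other value to drop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* IP version (4 or 6) from the first byte of an L3 view, 0 if empty. */
static unsigned int l3_version(const uint8_t *data, size_t len)
{
	return len ? (unsigned int)(data[0] >> 4) : 0;
}

/* Illustrative policy: return 1 to let the packet pass, 0 to drop it.
 * Drops IPv4 packets destined to 10.0.0.0/8 (hypothetical rule). */
static int allow_packet(const uint8_t *data, size_t len)
{
	switch (l3_version(data, len)) {
	case 4:
		if (len < 20)
			return 0; /* truncated IPv4 header */
		/* byte 16 is the first octet of the IPv4 destination */
		return data[16] == 10 ? 0 : 1;
	case 6:
		return 1;
	default:
		return 0; /* not IPv4 or IPv6 */
	}
}
```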
diff --git a/net/core/dev.c b/net/core/dev.c
index a75df86..17484e6 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -141,6 +141,7 @@
 #include <linux/netfilter_ingress.h>
 #include <linux/sctp.h>
 #include <linux/crash_dump.h>
+#include <linux/bpf-cgroup.h>
 
 #include "net-sysfs.h"
 
@@ -3329,6 +3330,11 @@ static int __dev_queue_xmit(struct sk_buff *skb, void *accel_priv)
 	if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_SCHED_TSTAMP))
 		__skb_tstamp_tx(skb, NULL, skb->sk, SCM_TSTAMP_SCHED);
 
+	rc = cgroup_bpf_run_filter(skb->sk, skb,
+				   BPF_ATTACH_TYPE_CGROUP_INET_EGRESS);
+	if (rc)
+		return rc;
+
 	/* Disable soft irqs for various locks below. Also
 	 * stops preemption for RCU.
 	 */
If the cgroup associated with the sending socket has eBPF programs
installed, run them from __dev_queue_xmit().

eBPF programs used in this context are expected to either return 1 to
let the packet pass, or any other value to drop it. The programs have
access to the full skb, including the MAC headers.

Note that cgroup_bpf_run_filter() is stubbed out as a static inline no-op
for !CONFIG_CGROUP_BPF, and is otherwise guarded by a static key if the
feature is unused.

Signed-off-by: Daniel Mack <daniel@zonque.org>
---
 net/core/dev.c | 6 ++++++
 1 file changed, 6 insertions(+)
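The commit message describes two mechanisms: the verdict convention (1 passes, anything else drops) and a compile-time stub for `!CONFIG_CGROUP_BPF`. The following is a userspace sketch of both, not the kernel's implementation: `CGROUP_BPF_ENABLED` stands in for the Kconfig symbol, a function pointer stands in for the attached program, and mapping a drop to `-EPERM` is an assumption. The static key mentioned in the message, which lets the kernel patch the check out at runtime while no program is attached, is approximated here by a plain NULL test.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in for CONFIG_CGROUP_BPF; remove to compile the no-op stub. */
#define CGROUP_BPF_ENABLED 1

struct fake_pkt; /* opaque stand-in for struct sk_buff */

/* Stand-in for an attached eBPF program: returns its raw verdict. */
typedef int (*egress_prog_t)(const struct fake_pkt *pkt);

#ifdef CGROUP_BPF_ENABLED
static egress_prog_t attached_prog; /* NULL while nothing is attached */

static int prog_pass(const struct fake_pkt *pkt)  { (void)pkt; return 1; }
static int prog_other(const struct fake_pkt *pkt) { (void)pkt; return 2; }

/* Translate the verdict: 1 passes (returns 0), anything else drops. */
static int run_filter(const struct fake_pkt *pkt)
{
	if (!attached_prog) /* the kernel uses a static key for this test */
		return 0;
	return attached_prog(pkt) == 1 ? 0 : -EPERM;
}
#else
/* Compiled-out variant: a static inline no-op that always passes. */
static inline int run_filter(const struct fake_pkt *pkt)
{
	(void)pkt;
	return 0;
}
#endif
```

Note that a program returning 2 is treated as a drop, exactly like one returning 0; only 1 lets the packet through.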