
[v8,01/11] bpf: add XDP prog type for early driver filter

Message ID: 1468309894-26258-2-git-send-email-bblanco@plumgrid.com
State: Changes Requested, archived
Delegated to: David Miller

Commit Message

Brenden Blanco July 12, 2016, 7:51 a.m. UTC
Add a new bpf prog type that is intended to run in early stages of the
packet rx path. Only minimal packet metadata will be available, hence a
new context type, struct xdp_md, is exposed to userspace. So far only
expose the packet start and end pointers, and only in read mode.
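
To illustrate what a program can do with only the start and end pointers, here is a minimal user-space sketch. The names `xdp_ctx_sketch`, `drop_ethertype_sketch` and the `SKETCH_XDP_*` values are hypothetical stand-ins, not part of this patch; the point is only that every read is preceded by a bounds check against the end pointer, which is the pattern the verifier is expected to require.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical copy of the action values so the sketch builds in user space. */
enum xdp_action_sketch {
	SKETCH_XDP_ABORTED = 0,
	SKETCH_XDP_DROP,
	SKETCH_XDP_PASS,
};

/* Hypothetical stand-in for the context: packet start and end only. */
struct xdp_ctx_sketch {
	const uint8_t *data;
	const uint8_t *data_end;
};

#define ETH_HLEN_SKETCH 14

/* Drop frames carrying the given EtherType, pass everything else. */
static int drop_ethertype_sketch(const struct xdp_ctx_sketch *ctx,
				 uint16_t ethertype)
{
	const uint8_t *p = ctx->data;
	uint16_t proto;

	/* Prove the Ethernet header is in bounds before touching it.
	 * Dropping truncated frames is a policy choice of this sketch.
	 */
	if ((size_t)(ctx->data_end - p) < ETH_HLEN_SKETCH)
		return SKETCH_XDP_DROP;

	/* EtherType is big-endian at bytes 12..13 of the header. */
	proto = (uint16_t)((p[12] << 8) | p[13]);
	return proto == ethertype ? SKETCH_XDP_DROP : SKETCH_XDP_PASS;
}
```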

An XDP program must return one of the well known enum values, all other
return codes are reserved for future use. Unfortunately, this
restriction is hard to enforce at verification time, so take the
approach of warning at runtime when such programs are encountered. Out
of bounds return codes should alias to XDP_ABORTED.
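
A rough sketch of how a driver's receive path might act on the return value, under the assumption that unknown codes are treated like XDP_ABORTED after a one-time warning. The enum is copied from this patch; `rx_verdict_sketch`, `handle_action_sketch` and `warn_invalid_action_sketch` are hypothetical names (the last one stands in for bpf_warn_invalid_xdp_action()), and no real driver is claimed to look like this.

```c
#include <stdio.h>

/* Values as defined in this patch. */
enum xdp_action {
	XDP_ABORTED = 0,
	XDP_DROP,
	XDP_PASS,
};

/* Hypothetical driver-side outcome for one received frame. */
enum rx_verdict_sketch {
	RX_FREE_BUFFER,
	RX_TO_STACK,
};

static unsigned int invalid_action_count;

/* Stand-in for bpf_warn_invalid_xdp_action(): print only the first time. */
static void warn_invalid_action_sketch(unsigned int act)
{
	if (invalid_action_count++ == 0)
		fprintf(stderr,
			"Illegal XDP return value %u, expect packet loss\n",
			act);
}

static enum rx_verdict_sketch handle_action_sketch(unsigned int act)
{
	switch (act) {
	case XDP_PASS:
		return RX_TO_STACK;
	case XDP_ABORTED:
	case XDP_DROP:
		return RX_FREE_BUFFER;
	default:
		/* Reserved code: warn, then behave as for XDP_ABORTED. */
		warn_invalid_action_sketch(act);
		return RX_FREE_BUFFER;
	}
}
```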

Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
---
 include/linux/filter.h   | 18 +++++++++++
 include/uapi/linux/bpf.h | 20 ++++++++++++
 kernel/bpf/verifier.c    |  1 +
 net/core/filter.c        | 79 ++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 118 insertions(+)

Comments

Jesper Dangaard Brouer July 12, 2016, 1:14 p.m. UTC | #1
On Tue, 12 Jul 2016 00:51:24 -0700 Brenden Blanco <bblanco@plumgrid.com> wrote:

> Add a new bpf prog type that is intended to run in early stages of the
> packet rx path. Only minimal packet metadata will be available, hence a
> new context type, struct xdp_md, is exposed to userspace. So far only
> expose the packet start and end pointers, and only in read mode.
> 
> An XDP program must return one of the well known enum values, all other
> return codes are reserved for future use. Unfortunately, this
> restriction is hard to enforce at verification time, so take the
> approach of warning at runtime when such programs are encountered. Out
> of bounds return codes should alias to XDP_ABORTED.

This is going to be a serious usability problem, when XDP gets extended
with newer features.

How can users know what XDP capabilities a given driver supports?

If the user loads an XDP program using capabilities not available in the
driver, then all his traffic will be hard dropped.


My proposal is to NOT allow XDP programs to be loaded if they want to use
return-codes/features not supported by the driver.  Thus, eBPF programs
register/annotate their needed capabilities upfront (I guess this could
also help HW offload engines).
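
One way to picture the load-time check proposed above is as a capability bitmask comparison. This is only a sketch: the `XDP_CAP_*_SKETCH` flags and `xdp_check_caps_sketch` are hypothetical and do not exist in the kernel, and how a program's needed capabilities would be collected is left open.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical capability bits, one per return code or feature. */
#define XDP_CAP_DROP_SKETCH	(1u << 0)
#define XDP_CAP_PASS_SKETCH	(1u << 1)
#define XDP_CAP_TX_SKETCH	(1u << 2)

/* Refuse the load if the program needs anything the driver lacks. */
static int xdp_check_caps_sketch(uint32_t prog_needs, uint32_t driver_has)
{
	uint32_t missing = prog_needs & ~driver_has;

	return missing ? -EOPNOTSUPP : 0;
}
```

With a check like this, a program that needs an unsupported feature fails at load time instead of having its traffic dropped at runtime.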
Tom Herbert July 12, 2016, 2:52 p.m. UTC | #2
On Tue, Jul 12, 2016 at 6:14 AM, Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
>
> On Tue, 12 Jul 2016 00:51:24 -0700 Brenden Blanco <bblanco@plumgrid.com> wrote:
>
>> Add a new bpf prog type that is intended to run in early stages of the
>> packet rx path. Only minimal packet metadata will be available, hence a
>> new context type, struct xdp_md, is exposed to userspace. So far only
>> expose the packet start and end pointers, and only in read mode.
>>
>> An XDP program must return one of the well known enum values, all other
>> return codes are reserved for future use. Unfortunately, this
>> restriction is hard to enforce at verification time, so take the
>> approach of warning at runtime when such programs are encountered. Out
>> of bounds return codes should alias to XDP_ABORTED.
>
> This is going to be a serious usability problem, when XDP gets extended
> with newer features.
>
> How can users know what XDP capabilities a given driver supports?
>
We talked about this at the XDP summit and I have some slides on this
(hope to have slides posted shortly). Drivers, or more generally XDP
platforms, will provide a list of APIs that they support. APIs would be
contained in a sort of common specification file and would have
some name like basic_xdp_api, adv_switch_api, etc. An XDP program is
written using one of the published APIs, and the API it uses is
expressed as part of the program. At runtime (possibly compile time)
the API used by the program is checked against the list of APIs for
the target platform; if the API is not in the set then the program is
not allowed to be loaded. To whatever extent possible we should also
try to verify that the program adheres to the API at load and compile
time. In the event that a program violates the API it claims to be
using, such as returning an invalid return code for the API, that is an
error condition.
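
The check described above could be sketched as a lookup of the program's declared API name in the platform's published list. The function `platform_supports_api_sketch` is hypothetical; only the example names basic_xdp_api and adv_switch_api come from the text, and where the declared name would be stored in the program is not decided here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if the API a program declares is in the platform's list. */
static bool platform_supports_api_sketch(const char *const *apis, size_t n,
					 const char *wanted)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (strcmp(apis[i], wanted) == 0)
			return true;
	}
	return false;
}
```

A loader would refuse the program whenever this returns false for the target platform.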

> If the user loads an XDP program using capabilities not available in the
> driver, then all his traffic will be hard dropped.
>
>
> My proposal is to NOT allow XDP programs to be loaded if they want to use
> return-codes/features not supported by the driver.  Thus, eBPF programs
> register/annotate their needed capabilities upfront (I guess this could
> also help HW offload engines).
>
Exactly. We just need to define exactly how this is done ;-)

One open issue is whether XDP needs to be binary compatible across
platforms or source code compatible (the latter would also require a
recompile for each platform). Personally I prefer source code compatible
since that might allow platforms to implement the API specifically for
their needs (like the ordering of fields in a metadata structure).

Tom

> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   Author of http://www.iptv-analyzer.org
>   LinkedIn: http://www.linkedin.com/in/brouer
Jakub Kicinski July 12, 2016, 4:08 p.m. UTC | #3
On Tue, 12 Jul 2016 07:52:53 -0700, Tom Herbert wrote:
> On Tue, Jul 12, 2016 at 6:14 AM, Jesper Dangaard Brouer
> <brouer@redhat.com> wrote:
> >
> > On Tue, 12 Jul 2016 00:51:24 -0700 Brenden Blanco <bblanco@plumgrid.com> wrote:
> >  
> >> Add a new bpf prog type that is intended to run in early stages of the
> >> packet rx path. Only minimal packet metadata will be available, hence a
> >> new context type, struct xdp_md, is exposed to userspace. So far only
> >> expose the packet start and end pointers, and only in read mode.
> >>
> >> An XDP program must return one of the well known enum values, all other
> >> return codes are reserved for future use. Unfortunately, this
> >> restriction is hard to enforce at verification time, so take the
> >> approach of warning at runtime when such programs are encountered. Out
> >> of bounds return codes should alias to XDP_ABORTED.  
> >
> > This is going to be a serious usability problem, when XDP gets extended
> > with newer features.
> >
> > How can users know what XDP capabilities a given driver supports?

I'm personally not a big fan of return codes.  I'm hoping we can express
most decisions by setting fields in metadata.  It provides a better
structure and is easier to detect/translate/optimize.

> We talked about this at the XDP summit and I have some slides on this
> (hope to have slides posted shortly). Drivers, or more generally XDP
> platforms, will provide a list of APIs that they support. APIs would be
> contained in a sort of common specification file and would have
> some name like basic_xdp_api, adv_switch_api, etc. An XDP program is
> written using one of the published APIs, and the API it uses is
> expressed as part of the program. At runtime (possibly compile time)
> the API used by the program is checked against the list of APIs for
> the target platform; if the API is not in the set then the program is
> not allowed to be loaded. To whatever extent possible we should also
> try to verify that the program adheres to the API at load and compile
> time. In the event that a program violates the API it claims to be
> using, such as returning an invalid return code for the API, that is an
> error condition.

+1

> > If the user loads an XDP program using capabilities not available in the
> > driver, then all his traffic will be hard dropped.
> >
> >
> > My proposal is to NOT allow XDP programs to be loaded if they want to use
> > return-codes/features not supported by the driver.  Thus, eBPF programs
> > register/annotate their needed capabilities upfront (I guess this could
> > also help HW offload engines).

Not sure we need annotation; use of an API will probably be easily
detectable (call of a function, access to a metadata field).  The verifier
could collect that info in-kernel with little effort.

> Exactly. We just need to define exactly how this is done ;-)
> 
> One open issue is whether XDP needs to be binary compatible across
> platforms or source code compatible (the latter would also require a
> recompile for each platform). Personally I prefer source code compatible
> since that might allow platforms to implement the API specifically for
> their needs (like the ordering of fields in a metadata structure).

Since XDP is so close to hardware by design, I think we could consider
introducing a per-HW translation stage between the verifier and host JIT.
For the extended metadata problem, for example, we could define metadata as:

meta {
	void *packet_start;
	void *packet_end;
	struct standard_meta *extended_meta;
}

standard_meta {
	vlan;
	timestamp;
	hash;
	...
}

A program coming from user space would just use standard_meta, but a
per-driver/per-HW translator would patch it up.  The extended_meta pointer
would probably become a pointer to a HW-specific queue descriptor at
runtime.  The per-HW translator stage would change the offsets and add
necessary checks and fix-ups.  It could even be possible to translate
an extended_meta access into an access to metadata prepended to the frame;
the translator would need a hint from the verifier on how to get the packet
pointer.

I haven't thought this through in detail; it's just an idea which would
help us to keep binary compatibility.  HW-specific optimizations in
a user space compiler would be great, but breaking binary compatibility
is a bit of a scary step.
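
A very rough sketch of the offset patching such a translator stage could perform. Everything here is hypothetical: the field types in `standard_meta_sketch`, the layout of `hw_desc_sketch`, and the reduction of a load instruction to an offset and size are assumptions made only to show the idea of rewriting a generic offset into a device-specific one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical generic layout that programs are written against. */
struct standard_meta_sketch {
	uint16_t vlan;
	uint32_t hash;
	uint64_t timestamp;
};

/* Hypothetical RX descriptor layout of one particular NIC. */
struct hw_desc_sketch {
	uint64_t timestamp;
	uint32_t flags;
	uint32_t hash;
	uint16_t vlan;
};

/* A metadata load, reduced to the offset and size it reads. */
struct meta_load_sketch {
	size_t off;
	size_t size;
};

/* Rewrite a standard_meta offset to this device's descriptor offset.
 * Returns 0 on success, -1 if the field is unknown to this device.
 */
static int translate_meta_load_sketch(struct meta_load_sketch *ld)
{
	switch (ld->off) {
	case offsetof(struct standard_meta_sketch, vlan):
		ld->off = offsetof(struct hw_desc_sketch, vlan);
		return 0;
	case offsetof(struct standard_meta_sketch, hash):
		ld->off = offsetof(struct hw_desc_sketch, hash);
		return 0;
	case offsetof(struct standard_meta_sketch, timestamp):
		ld->off = offsetof(struct hw_desc_sketch, timestamp);
		return 0;
	default:
		return -1;
	}
}
```

Rejecting unknown offsets at translation time would keep a program from loading on hardware that cannot supply the field.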
Alexei Starovoitov July 13, 2016, 4:14 a.m. UTC | #4
On Tue, Jul 12, 2016 at 07:52:53AM -0700, Tom Herbert wrote:
> On Tue, Jul 12, 2016 at 6:14 AM, Jesper Dangaard Brouer
> <brouer@redhat.com> wrote:
> >
> > On Tue, 12 Jul 2016 00:51:24 -0700 Brenden Blanco <bblanco@plumgrid.com> wrote:
> >
> >> Add a new bpf prog type that is intended to run in early stages of the
> >> packet rx path. Only minimal packet metadata will be available, hence a
> >> new context type, struct xdp_md, is exposed to userspace. So far only
> >> expose the packet start and end pointers, and only in read mode.
> >>
> >> An XDP program must return one of the well known enum values, all other
> >> return codes are reserved for future use. Unfortunately, this
> >> restriction is hard to enforce at verification time, so take the
> >> approach of warning at runtime when such programs are encountered. Out
> >> of bounds return codes should alias to XDP_ABORTED.
> >
> > This is going to be a serious usability problem, when XDP gets extended
> > with newer features.
> >
> > How can users know what XDP capabilities a given driver supports?
> >
> We talked about this at the XDP summit and I have some slides on this
> (hope to have slides posted shortly). Drivers, or more generally XDP
> platforms, will provide a list of APIs that they support. APIs would be
> contained in a sort of common specification file and would have
> some name like basic_xdp_api, adv_switch_api, etc. An XDP program is
> written using one of the published APIs, and the API it uses is
> expressed as part of the program. At runtime (possibly compile time)
> the API used by the program is checked against the list of APIs for
> the target platform; if the API is not in the set then the program is
> not allowed to be loaded. To whatever extent possible we should also
> try to verify that the program adheres to the API at load and compile
> time. In the event that a program violates the API it claims to be
> using, such as returning an invalid return code for the API, that is an
> error condition.
> 
> > If the user loads an XDP program using capabilities not available in the
> > driver, then all his traffic will be hard dropped.
> >
> >
> > My proposal is to NOT allow XDP programs to be loaded if they want to use
> > return-codes/features not supported by the driver.  Thus, eBPF programs
> > register/annotate their needed capabilities upfront (I guess this could
> > also help HW offload engines).
> >
> Exactly. We just need to define exactly how this is done ;-)

yep. As we discussed at the summit there is a basic set of commands
drop/pass/tx that all drivers are expected to perform if they claim
xdp support and there will be driver/hw specific extensions.
We should be able to exercise all hw capabilities in a vendor neutral way.
It's a bit of a contradiction, obviously. We want xdp to be generic
and at the same time allow hw specific extensions.

> One open issue is whether XDP needs to be binary compatible across
> platforms or source code compatible (the latter would also require a
> recompile for each platform). Personally I prefer source code compatible
> since that might allow platforms to implement the API specifically for
> their needs (like the ordering of fields in a metadata structure).

I agree that for some cases we have to give up on binary compatibility.
I think it will be similar to the cpu world, where applications
are compiled for different flavors of instruction set.
For example, we might have an encryption facility that is available
for the sw path and offloaded into the most capable NICs.
I don't think annotations will solve that, since annotations
are hints for when the compiler/user space can be trusted, which is
not the case here. It's more likely that we'll have driver
specific metadata that will hook into the verifier framework, so
we can make sure at load time that memory accesses and helper
calls are valid. That will make programs using hw specific
extensions loadable only on the given hw, which I think
is inevitable, since we're operating at the lowest level,
right next to hw and any sw abstraction/generalization will
only slow things down.
skb is a generic facility where physical and virtual devices
look the same to programs. xdp is a hw specific place.
If we try to generalize things too much at xdp level,
it will become skb-like. So beyond drop/pass/tx I can only
think of very few 'for all drivers' features.

Patch

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 6fc31ef..15d816a 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -368,6 +368,11 @@  struct bpf_skb_data_end {
 	void *data_end;
 };
 
+struct xdp_buff {
+	void *data;
+	void *data_end;
+};
+
 /* compute the linear packet data range [data, data_end) which
  * will be accessed by cls_bpf and act_bpf programs
  */
@@ -429,6 +434,18 @@  static inline u32 bpf_prog_run_clear_cb(const struct bpf_prog *prog,
 	return BPF_PROG_RUN(prog, skb);
 }
 
+static inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
+				   struct xdp_buff *xdp)
+{
+	u32 ret;
+
+	rcu_read_lock();
+	ret = BPF_PROG_RUN(prog, (void *)xdp);
+	rcu_read_unlock();
+
+	return ret;
+}
+
 static inline unsigned int bpf_prog_size(unsigned int proglen)
 {
 	return max(sizeof(struct bpf_prog),
@@ -509,6 +526,7 @@  bool bpf_helper_changes_skb_data(void *func);
 
 struct bpf_prog *bpf_patch_insn_single(struct bpf_prog *prog, u32 off,
 				       const struct bpf_insn *patch, u32 len);
+void bpf_warn_invalid_xdp_action(u32 act);
 
 #ifdef CONFIG_BPF_JIT
 extern int bpf_jit_enable;
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 262a7e8..4282d44 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -94,6 +94,7 @@  enum bpf_prog_type {
 	BPF_PROG_TYPE_SCHED_CLS,
 	BPF_PROG_TYPE_SCHED_ACT,
 	BPF_PROG_TYPE_TRACEPOINT,
+	BPF_PROG_TYPE_XDP,
 };
 
 #define BPF_PSEUDO_MAP_FD	1
@@ -437,4 +438,23 @@  struct bpf_tunnel_key {
 	__u32 tunnel_label;
 };
 
+/* User return codes for XDP prog type.
+ * A valid XDP program must return one of these defined values. All other
+ * return codes are reserved for future use. Unknown return codes will result
+ * in packet drop.
+ */
+enum xdp_action {
+	XDP_ABORTED = 0,
+	XDP_DROP,
+	XDP_PASS,
+};
+
+/* user accessible metadata for XDP packet hook
+ * new fields must be added to the end of this structure
+ */
+struct xdp_md {
+	__u32 data;
+	__u32 data_end;
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index e206c21..a8d67d0 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -713,6 +713,7 @@  static int check_ptr_alignment(struct verifier_env *env, struct reg_state *reg,
 	switch (env->prog->type) {
 	case BPF_PROG_TYPE_SCHED_CLS:
 	case BPF_PROG_TYPE_SCHED_ACT:
+	case BPF_PROG_TYPE_XDP:
 		break;
 	default:
 		verbose("verifier is misconfigured\n");
diff --git a/net/core/filter.c b/net/core/filter.c
index 10c4a2f..2d770f5 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2369,6 +2369,12 @@  tc_cls_act_func_proto(enum bpf_func_id func_id)
 	}
 }
 
+static const struct bpf_func_proto *
+xdp_func_proto(enum bpf_func_id func_id)
+{
+	return sk_filter_func_proto(func_id);
+}
+
 static bool __is_valid_access(int off, int size, enum bpf_access_type type)
 {
 	if (off < 0 || off >= sizeof(struct __sk_buff))
@@ -2436,6 +2442,44 @@  static bool tc_cls_act_is_valid_access(int off, int size,
 	return __is_valid_access(off, size, type);
 }
 
+static bool __is_valid_xdp_access(int off, int size,
+				  enum bpf_access_type type)
+{
+	if (off < 0 || off >= sizeof(struct xdp_md))
+		return false;
+	if (off % size != 0)
+		return false;
+	if (size != 4)
+		return false;
+
+	return true;
+}
+
+static bool xdp_is_valid_access(int off, int size,
+				enum bpf_access_type type,
+				enum bpf_reg_type *reg_type)
+{
+	if (type == BPF_WRITE)
+		return false;
+
+	switch (off) {
+	case offsetof(struct xdp_md, data):
+		*reg_type = PTR_TO_PACKET;
+		break;
+	case offsetof(struct xdp_md, data_end):
+		*reg_type = PTR_TO_PACKET_END;
+		break;
+	}
+
+	return __is_valid_xdp_access(off, size, type);
+}
+
+void bpf_warn_invalid_xdp_action(u32 act)
+{
+	WARN_ONCE(1, "Illegal XDP return value %u, expect packet loss\n", act);
+}
+EXPORT_SYMBOL_GPL(bpf_warn_invalid_xdp_action);
+
 static u32 bpf_net_convert_ctx_access(enum bpf_access_type type, int dst_reg,
 				      int src_reg, int ctx_off,
 				      struct bpf_insn *insn_buf,
@@ -2587,6 +2631,29 @@  static u32 bpf_net_convert_ctx_access(enum bpf_access_type type, int dst_reg,
 	return insn - insn_buf;
 }
 
+static u32 xdp_convert_ctx_access(enum bpf_access_type type, int dst_reg,
+				  int src_reg, int ctx_off,
+				  struct bpf_insn *insn_buf,
+				  struct bpf_prog *prog)
+{
+	struct bpf_insn *insn = insn_buf;
+
+	switch (ctx_off) {
+	case offsetof(struct xdp_md, data):
+		*insn++ = BPF_LDX_MEM(bytes_to_bpf_size(FIELD_SIZEOF(struct xdp_buff, data)),
+				      dst_reg, src_reg,
+				      offsetof(struct xdp_buff, data));
+		break;
+	case offsetof(struct xdp_md, data_end):
+		*insn++ = BPF_LDX_MEM(bytes_to_bpf_size(FIELD_SIZEOF(struct xdp_buff, data_end)),
+				      dst_reg, src_reg,
+				      offsetof(struct xdp_buff, data_end));
+		break;
+	}
+
+	return insn - insn_buf;
+}
+
 static const struct bpf_verifier_ops sk_filter_ops = {
 	.get_func_proto		= sk_filter_func_proto,
 	.is_valid_access	= sk_filter_is_valid_access,
@@ -2599,6 +2666,12 @@  static const struct bpf_verifier_ops tc_cls_act_ops = {
 	.convert_ctx_access	= bpf_net_convert_ctx_access,
 };
 
+static const struct bpf_verifier_ops xdp_ops = {
+	.get_func_proto		= xdp_func_proto,
+	.is_valid_access	= xdp_is_valid_access,
+	.convert_ctx_access	= xdp_convert_ctx_access,
+};
+
 static struct bpf_prog_type_list sk_filter_type __read_mostly = {
 	.ops	= &sk_filter_ops,
 	.type	= BPF_PROG_TYPE_SOCKET_FILTER,
@@ -2614,11 +2687,17 @@  static struct bpf_prog_type_list sched_act_type __read_mostly = {
 	.type	= BPF_PROG_TYPE_SCHED_ACT,
 };
 
+static struct bpf_prog_type_list xdp_type __read_mostly = {
+	.ops	= &xdp_ops,
+	.type	= BPF_PROG_TYPE_XDP,
+};
+
 static int __init register_sk_filter_ops(void)
 {
 	bpf_register_prog_type(&sk_filter_type);
 	bpf_register_prog_type(&sched_cls_type);
 	bpf_register_prog_type(&sched_act_type);
+	bpf_register_prog_type(&xdp_type);
 
 	return 0;
 }