Message ID | 1460090930-11219-2-git-send-email-bblanco@plumgrid.com |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
On Thu, 7 Apr 2016 21:48:47 -0700 Brenden Blanco <bblanco@plumgrid.com> wrote: > Add two new set/get netdev ops for drivers implementing the > BPF_PROG_TYPE_PHYS_DEV filter. > > Signed-off-by: Brenden Blanco <bblanco@plumgrid.com> [...] > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > index cb4e508..3acf732 100644 > --- a/include/linux/netdevice.h > +++ b/include/linux/netdevice.h [...] > @@ -1102,6 +1103,14 @@ struct tc_to_netdev { > * appropriate rx headroom value allows avoiding skb head copy on > * forward. Setting a negative value resets the rx headroom to the > * default value. > + * int (*ndo_bpf_set)(struct net_device *dev, struct bpf_prog *prog); > + * This function is used to set or clear a bpf program used in the > + * earliest stages of packet rx. The prog will have been loaded as > + * BPF_PROG_TYPE_PHYS_DEV. The callee is responsible for calling > + * bpf_prog_put on any old progs that are stored, but not on the passed > + * in prog. > + * bool (*ndo_bpf_get)(struct net_device *dev); > + * This function is used to check if a bpf program is set on the device. > * This interface for the entire device, right. I can imagine that users want to attach a eBPF program per RX queue. Can we adapt the interface to support this? (could default to adding it all queues) I'm also wondering if we should add a "flags" parameter. Or maybe we can extend 'struct bpf_prog' with I have in mind. When the eBPF program is attached to a RX queue, I want to know if the program want to modify packet-data, up-front. The problem is that most drivers use dma_sync, which means that data is considered 'read-only' (the "considered" part depend on DMA engine, and we might find a DMA loop-hole for some configs). If the program want to write, the driver have the option of reconfiguring the driver routine to use dma_unmap, before handing over the page. Or driver can also choose to realloc the specific RX ring queue pages as single pages (using dma_map/unmap consistently). This also allow us to give a return code indicating given driver does not support writable packet-pages (rejecting program).
On Fri, Apr 08, 2016 at 11:38:58AM +0200, Jesper Dangaard Brouer wrote: > > On Thu, 7 Apr 2016 21:48:47 -0700 Brenden Blanco <bblanco@plumgrid.com> wrote: > > > Add two new set/get netdev ops for drivers implementing the > > BPF_PROG_TYPE_PHYS_DEV filter. > > > > Signed-off-by: Brenden Blanco <bblanco@plumgrid.com> > [...] > > > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > > index cb4e508..3acf732 100644 > > --- a/include/linux/netdevice.h > > +++ b/include/linux/netdevice.h > [...] > > @@ -1102,6 +1103,14 @@ struct tc_to_netdev { > > * appropriate rx headroom value allows avoiding skb head copy on > > * forward. Setting a negative value resets the rx headroom to the > > * default value. > > + * int (*ndo_bpf_set)(struct net_device *dev, struct bpf_prog *prog); > > + * This function is used to set or clear a bpf program used in the > > + * earliest stages of packet rx. The prog will have been loaded as > > + * BPF_PROG_TYPE_PHYS_DEV. The callee is responsible for calling > > + * bpf_prog_put on any old progs that are stored, but not on the passed > > + * in prog. > > + * bool (*ndo_bpf_get)(struct net_device *dev); > > + * This function is used to check if a bpf program is set on the device. > > * > > This interface for the entire device, right. I can imagine that users > want to attach a eBPF program per RX queue. Can we adapt the interface > to support this? (could default to adding it all queues) > Currently yes, for the entire device. I don't see rx queue exposed in common setlink APIs. Wouldn't this be available only through ethtool, generally? That would be a significant change to the api, but not a lot of code. I would defer to others on which is cleaner. An alternative could be to always run the program, but expose the queue number in struct bpf_phys_dev_md. That is not as flexible since the program is still shared, but maybe still useful. > > I'm also wondering if we should add a "flags" parameter. Or maybe we > can extend 'struct bpf_prog' with I have in mind. > > When the eBPF program is attached to a RX queue, I want to know if the > program want to modify packet-data, up-front. > > The problem is that most drivers use dma_sync, which means that data is > considered 'read-only' (the "considered" part depend on DMA engine, and > we might find a DMA loop-hole for some configs). > If the program want to write, the driver have the option of > reconfiguring the driver routine to use dma_unmap, before handing over > the page. Or driver can also choose to realloc the specific RX ring > queue pages as single pages (using dma_map/unmap consistently). > This also allow us to give a return code indicating given driver does > not support writable packet-pages (rejecting program). When write-mode is enabled for this prog type, we'll add the flag. I don't want to add unused flags prematurely. When we add such support, it should be available in the bpf_prog struct, similar to the cb_access or dst_needed bool fields. > > -- > Best regards, > Jesper Dangaard Brouer > MSc.CS, Principal Kernel Engineer at Red Hat > Author of http://www.iptv-analyzer.org > LinkedIn: http://www.linkedin.com/in/brouer
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index cb4e508..3acf732 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -61,6 +61,7 @@ struct wireless_dev; /* 802.15.4 specific */ struct wpan_dev; struct mpls_dev; +struct bpf_prog; void netdev_set_default_ethtool_ops(struct net_device *dev, const struct ethtool_ops *ops); @@ -1102,6 +1103,14 @@ struct tc_to_netdev { * appropriate rx headroom value allows avoiding skb head copy on * forward. Setting a negative value resets the rx headroom to the * default value. + * int (*ndo_bpf_set)(struct net_device *dev, struct bpf_prog *prog); + * This function is used to set or clear a bpf program used in the + * earliest stages of packet rx. The prog will have been loaded as + * BPF_PROG_TYPE_PHYS_DEV. The callee is responsible for calling + * bpf_prog_put on any old progs that are stored, but not on the passed + * in prog. + * bool (*ndo_bpf_get)(struct net_device *dev); + * This function is used to check if a bpf program is set on the device. * */ struct net_device_ops { @@ -1292,6 +1301,9 @@ struct net_device_ops { struct sk_buff *skb); void (*ndo_set_rx_headroom)(struct net_device *dev, int needed_headroom); + int (*ndo_bpf_set)(struct net_device *dev, + struct bpf_prog *prog); + bool (*ndo_bpf_get)(struct net_device *dev); }; /** @@ -3251,6 +3263,7 @@ int dev_get_phys_port_id(struct net_device *dev, int dev_get_phys_port_name(struct net_device *dev, char *name, size_t len); int dev_change_proto_down(struct net_device *dev, bool proto_down); +int dev_change_bpf_fd(struct net_device *dev, int fd); struct sk_buff *validate_xmit_skb_list(struct sk_buff *skb, struct net_device *dev); struct sk_buff *dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev, struct netdev_queue *txq, int *ret); diff --git a/net/core/dev.c b/net/core/dev.c index 273f10d..7cf749c 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -94,6 +94,7 @@ #include <linux/ethtool.h> #include <linux/notifier.h> #include <linux/skbuff.h> +#include <linux/bpf.h> #include <net/net_namespace.h> #include <net/sock.h> #include <net/busy_poll.h> @@ -6483,6 +6484,43 @@ int dev_change_proto_down(struct net_device *dev, bool proto_down) EXPORT_SYMBOL(dev_change_proto_down); /** + * dev_change_bpf_fd - set or clear a bpf program for a device + * @dev: device + * @fd: new program fd or negative value to clear + * + * Set or clear a bpf program for a device + */ +int dev_change_bpf_fd(struct net_device *dev, int fd) +{ + const struct net_device_ops *ops = dev->netdev_ops; + struct bpf_prog *prog = NULL; + int err; + + if (!ops->ndo_bpf_set) + return -EOPNOTSUPP; + if (!netif_device_present(dev)) + return -ENODEV; + + if (fd >= 0) { + prog = bpf_prog_get(fd); + if (IS_ERR(prog)) + return PTR_ERR(prog); + + if (prog->type != BPF_PROG_TYPE_PHYS_DEV) { + bpf_prog_put(prog); + return -EINVAL; + } + } + + err = ops->ndo_bpf_set(dev, prog); + if (err < 0 && prog) + bpf_prog_put(prog); + + return err; +} +EXPORT_SYMBOL(dev_change_bpf_fd); + +/** * dev_new_index - allocate an ifindex * @net: the applicable net namespace *
Add two new set/get netdev ops for drivers implementing the BPF_PROG_TYPE_PHYS_DEV filter. Signed-off-by: Brenden Blanco <bblanco@plumgrid.com> --- include/linux/netdevice.h | 13 +++++++++++++ net/core/dev.c | 38 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 51 insertions(+)