From patchwork Thu Jul 26 14:40:24 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Toshiaki Makita X-Patchwork-Id: 949753 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="WmjKrZDD"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 41bvsr1xFdz9ryl for ; Fri, 27 Jul 2018 00:41:04 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730524AbeGZP6M (ORCPT ); Thu, 26 Jul 2018 11:58:12 -0400 Received: from mail-pl0-f65.google.com ([209.85.160.65]:40007 "EHLO mail-pl0-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729385AbeGZP6M (ORCPT ); Thu, 26 Jul 2018 11:58:12 -0400 Received: by mail-pl0-f65.google.com with SMTP id s17-v6so922731plp.7 for ; Thu, 26 Jul 2018 07:41:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=kCvFyIF60tzPmtz4gtKG6yaANHbgdKH1t1woD6Dz7DA=; b=WmjKrZDDwf+XnuuEr1EVemR/dZdos0tcuju879mpU8FDuqjzQ9arqP+FsJkxfTvz3N 3DMqjW81Fn0nN81rdrYP3WCcLE6ks3heWOkihtZZbQHNMkHXWD6/u6OydZEexaKOoSO2 FaH4WM8T9Z0EbdHNCGRwMVMVwHCKZ1RxZoFsfg/CmDFQRLFAoSEhjCWAwTSSvTTQfh2v B+/JvajmyW/Csa0GhVE1W8aPCM/8g5/buU0jbnK0Cp92vg8jaYp+e+zxP8UCTo+bj88Y Gkzc+rrgATBcQOyRN85vtBAL9d7dROFt6KBxYydsBpeycbdFURqVSYmY/P187Jiv987g MO1w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=kCvFyIF60tzPmtz4gtKG6yaANHbgdKH1t1woD6Dz7DA=; b=Ef2awQGl9PPM05ebzBdzxM6w8TWWWJlgA+C+taze5JpuXc1GZ/3pOOVOXIv3E+J7wm CeRIXqUpuIn+NolxH8VmzMFMMieQIbezfzIk49wocPgc/yMjogzS30XCWvqg8Yy8r+pn BI0bHQsJ17eyYIyDhqO5N4p3sLV5zzSeqr4IzCQPNb+ViEVHWL9I6VQbV8DKzLlUiQfC Vuxt44socp4si5Rxu8A1dzd2QpQmwAiLnWRC8IN0ZcAjwLAMcX3qd4HaSYwvQwqj9j/y UliS+TC65tbZBvd1upEs+WUIwvoqgU7XFfbjM7DLuGjV7dyLw/u2dA7GhcuuxIX2DEFP 1pkA== X-Gm-Message-State: AOUpUlFlECZArc9qkVSLpmJiHxx8HNa2uhtKamQOW8X7XCYS5CRtaZwo tTm4rcC8hS9wzqaE4aOHFCwhh45n X-Google-Smtp-Source: AAOMgpfUJjSLEFRSZxoGsUgYEU42HPs8Qa96k+7BudFbMMjZvjbcr6hEbqcT+SFgLvqDCuYrdZlXbQ== X-Received: by 2002:a17:902:6bc8:: with SMTP id m8-v6mr2291474plt.162.1532616061940; Thu, 26 Jul 2018 07:41:01 -0700 (PDT) Received: from localhost.localdomain (i153-145-22-9.s42.a013.ap.plala.or.jp. [153.145.22.9]) by smtp.gmail.com with ESMTPSA id p3-v6sm2649982pfo.130.2018.07.26.07.40.59 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Thu, 26 Jul 2018 07:41:01 -0700 (PDT) From: Toshiaki Makita To: netdev@vger.kernel.org, Alexei Starovoitov , Daniel Borkmann Cc: Toshiaki Makita , Jesper Dangaard Brouer , Jakub Kicinski Subject: [PATCH v5 bpf-next 1/9] net: Export skb_headers_offset_update Date: Thu, 26 Jul 2018 23:40:24 +0900 Message-Id: <20180726144032.2116-2-toshiaki.makita1@gmail.com> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20180726144032.2116-1-toshiaki.makita1@gmail.com> References: <20180726144032.2116-1-toshiaki.makita1@gmail.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Toshiaki Makita This is needed for veth XDP which does skb_copy_expand()-like operation. v2: - Drop skb_copy_header part because it has already been exported now. 
Signed-off-by: Toshiaki Makita --- include/linux/skbuff.h | 1 + net/core/skbuff.c | 3 ++- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index fd3cb1b247df..f6929688853a 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1035,6 +1035,7 @@ static inline struct sk_buff *alloc_skb_fclone(unsigned int size, } struct sk_buff *skb_morph(struct sk_buff *dst, struct sk_buff *src); +void skb_headers_offset_update(struct sk_buff *skb, int off); int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask); struct sk_buff *skb_clone(struct sk_buff *skb, gfp_t priority); void skb_copy_header(struct sk_buff *new, const struct sk_buff *old); diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 266b954f763e..f5670e6ab40c 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -1291,7 +1291,7 @@ struct sk_buff *skb_clone(struct sk_buff *skb, gfp_t gfp_mask) } EXPORT_SYMBOL(skb_clone); -static void skb_headers_offset_update(struct sk_buff *skb, int off) +void skb_headers_offset_update(struct sk_buff *skb, int off) { /* Only adjust this if it actually is csum_start rather than csum */ if (skb->ip_summed == CHECKSUM_PARTIAL) @@ -1305,6 +1305,7 @@ static void skb_headers_offset_update(struct sk_buff *skb, int off) skb->inner_network_header += off; skb->inner_mac_header += off; } +EXPORT_SYMBOL(skb_headers_offset_update); void skb_copy_header(struct sk_buff *new, const struct sk_buff *old) { From patchwork Thu Jul 26 14:40:25 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Toshiaki Makita X-Patchwork-Id: 949755 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; 
envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="iP83QAIW"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 41bvsv25Bmz9ryl for ; Fri, 27 Jul 2018 00:41:07 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730610AbeGZP6P (ORCPT ); Thu, 26 Jul 2018 11:58:15 -0400 Received: from mail-pg1-f196.google.com ([209.85.215.196]:34731 "EHLO mail-pg1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729385AbeGZP6P (ORCPT ); Thu, 26 Jul 2018 11:58:15 -0400 Received: by mail-pg1-f196.google.com with SMTP id y5-v6so1306752pgv.1 for ; Thu, 26 Jul 2018 07:41:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=8k9qjZmyJ2wbr/FKyHtiMAaKyN3BfL/ZRZ8hvsFa3RY=; b=iP83QAIWpeCLHpHRzIS5NvskfkzG2ITjbGP6QaXePONrXk3FJI5Vib/kcVelovatdr bo3LdwySjQ7OvBCIFWgRiT4R/oMe/mCog1zD+++ZDG7exeuGU+LtOsdQKKK7ZM5Kiytj 8KdnyJsWK8rtYIX5YW7Idq/DNT0/KjE9ScVlkekmizwu+hTx/Trf+hh6OvqcvqOd30RS SQ8VJQa0AM6ljd7LkYF+sZzOOD5HbzPcGzi7lLemvWDJvTg6bt4oD8I47+Tu7IMraubk jfI4L3Rr+P0z9ltwgbc2l0M9IKbUaHh+Q2c/ANXn5tYAt8sfWqrWtpWkrYyRyhiLjgla h0PQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=8k9qjZmyJ2wbr/FKyHtiMAaKyN3BfL/ZRZ8hvsFa3RY=; b=tV9Zz6T9Zf55CqLTq4KoB/qNQ7CL49P+CuIM18rqSfLf7BR5FEFHK9ed3Z4RBP5b0k L1ZBmfXORZUzz/++LoxRG70WurU6ayZy/TSQr6RjBKJYhy7eYBIPdcsf7ELO0rcOyKgT 897iqvF6ElS6EzBGZEuWA5HcXwS4gpKePGlCBaqbMenypJ7pcdPCsP8gjUBL8xqDre0F RJd1uaZOvGrUTVVBzDvRrlHY7C8mezI2R/Okps/uCm5cFyM7DmFRgfSCefJgtQakCQd1 
X92nQYNe6EPFU/ITWyXJb2C3yuGcHYqu4cVM0AEgHqsf+SNGArG5oYwtA6hiWHjivnzn UDsQ== X-Gm-Message-State: AOUpUlFlPACA5/DcqSTxFLsvdqZv0liOxVsm4AReVs1+YSZSLGIj9UlB 6PlU1U2RRsbhYPL9+5U73sjYdMOK X-Google-Smtp-Source: AAOMgpfJHsXqezbNk/NeYUkcusTwrmaEDuB9hED8vctCI1CK2ycIKwcwrTP6MN8kXUETs8qFsjppZw== X-Received: by 2002:a63:a919:: with SMTP id u25-v6mr2263914pge.211.1532616064710; Thu, 26 Jul 2018 07:41:04 -0700 (PDT) Received: from localhost.localdomain (i153-145-22-9.s42.a013.ap.plala.or.jp. [153.145.22.9]) by smtp.gmail.com with ESMTPSA id p3-v6sm2649982pfo.130.2018.07.26.07.41.02 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Thu, 26 Jul 2018 07:41:03 -0700 (PDT) From: Toshiaki Makita To: netdev@vger.kernel.org, Alexei Starovoitov , Daniel Borkmann Cc: Toshiaki Makita , Jesper Dangaard Brouer , Jakub Kicinski Subject: [PATCH v5 bpf-next 2/9] veth: Add driver XDP Date: Thu, 26 Jul 2018 23:40:25 +0900 Message-Id: <20180726144032.2116-3-toshiaki.makita1@gmail.com> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20180726144032.2116-1-toshiaki.makita1@gmail.com> References: <20180726144032.2116-1-toshiaki.makita1@gmail.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Toshiaki Makita This is the basic implementation of veth driver XDP. Incoming packets are sent from the peer veth device in the form of skb, so this is generally doing the same thing as generic XDP. This itself is not so useful, but it is a starting point to implement other useful veth XDP features like TX and REDIRECT. This introduces NAPI when XDP is enabled, because XDP now heavily relies on NAPI context. Use ptr_ring to emulate NIC ring. Tx function enqueues packets to the ring and peer NAPI handler drains the ring. Currently only one ring is allocated for each veth device, so it does not scale in multiqueue environments. This can be resolved by allocating rings on a per-queue basis later. 
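For readers following the ring/NAPI description, here is a rough user-space sketch of the scheduling logic: the Tx side enqueues and schedules the poller only if notifications are not already masked, and the poller unmasks only after draining less than its budget, then re-checks the ring. This is a single-threaded model, not the driver code. All model_* names are hypothetical, C11 atomics stand in for smp_mb()/smp_store_mb(), and a counter stands in for napi_schedule(). The ring indices are plain integers, so real concurrent use would need more than this.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_RING_SIZE 256 /* mirrors VETH_RING_SIZE in the patch */

/* Hypothetical model of the per-device ring plus notify flag. */
struct model_ring {
	void *slot[MODEL_RING_SIZE];
	size_t head;               /* next slot to consume */
	size_t tail;               /* next slot to fill */
	atomic_bool notify_masked; /* stands in for rx_notify_masked */
	int schedules;             /* counts simulated napi_schedule() calls */
};

static bool model_ring_empty(const struct model_ring *r)
{
	return r->head == r->tail;
}

/* Tx side: enqueue, then kick the poller unless it is already pending.
 * Returns false when the ring is full (the packet would be dropped). */
static bool model_xmit(struct model_ring *r, void *pkt)
{
	if (r->tail - r->head == MODEL_RING_SIZE)
		return false;
	r->slot[r->tail++ % MODEL_RING_SIZE] = pkt;

	/* Write the ring before reading the mask. */
	atomic_thread_fence(memory_order_seq_cst);
	if (!atomic_load(&r->notify_masked)) {
		atomic_store(&r->notify_masked, true);
		r->schedules++;
	}
	return true;
}

/* Rx side: drain up to budget entries, unmask only if we ran dry. */
static int model_poll(struct model_ring *r, int budget)
{
	int done = 0;

	while (done < budget && !model_ring_empty(r)) {
		r->head++;
		done++;
	}

	if (done < budget) {
		/* Write the mask before re-reading the ring. */
		atomic_store(&r->notify_masked, false);
		if (!model_ring_empty(r)) {
			atomic_store(&r->notify_masked, true);
			r->schedules++;
		}
	}
	return done;
}
```

In this model, three back-to-back model_xmit() calls bump schedules only once, and a model_poll() that uses up its whole budget leaves the mask set, so no extra kick is needed while a poll is still pending.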
Note that NAPI is not used but netif_rx is used when XDP is not loaded, so this does not change the default behaviour. v3: - Fix race on closing the device. - Add extack messages in ndo_bpf. v2: - Squashed with the patch adding NAPI. - Implement adjust_tail. - Don't acquire consumer lock because it is guarded by NAPI. - Make poll_controller noop since it is unnecessary. - Register rxq_info on enabling XDP rather than on opening the device. Signed-off-by: Toshiaki Makita --- drivers/net/veth.c | 373 ++++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 366 insertions(+), 7 deletions(-) diff --git a/drivers/net/veth.c b/drivers/net/veth.c index a69ad39ee57e..78fa08cb6e24 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -19,10 +19,18 @@ #include #include #include +#include +#include +#include +#include +#include #define DRV_NAME "veth" #define DRV_VERSION "1.0" +#define VETH_RING_SIZE 256 +#define VETH_XDP_HEADROOM (XDP_PACKET_HEADROOM + NET_IP_ALIGN) + struct pcpu_vstats { u64 packets; u64 bytes; @@ -30,9 +38,16 @@ struct pcpu_vstats { }; struct veth_priv { + struct napi_struct xdp_napi; + struct net_device *dev; + struct bpf_prog __rcu *xdp_prog; + struct bpf_prog *_xdp_prog; struct net_device __rcu *peer; atomic64_t dropped; unsigned requested_headroom; + bool rx_notify_masked; + struct ptr_ring xdp_ring; + struct xdp_rxq_info xdp_rxq; }; /* @@ -98,11 +113,43 @@ static const struct ethtool_ops veth_ethtool_ops = { .get_link_ksettings = veth_get_link_ksettings, }; -static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev) +/* general routines */ + +static void __veth_xdp_flush(struct veth_priv *priv) +{ + /* Write ptr_ring before reading rx_notify_masked */ + smp_mb(); + if (!priv->rx_notify_masked) { + priv->rx_notify_masked = true; + napi_schedule(&priv->xdp_napi); + } +} + +static int veth_xdp_rx(struct veth_priv *priv, struct sk_buff *skb) +{ + if (unlikely(ptr_ring_produce(&priv->xdp_ring, skb))) { + 
dev_kfree_skb_any(skb); + return NET_RX_DROP; + } + + return NET_RX_SUCCESS; +} + +static int veth_forward_skb(struct net_device *dev, struct sk_buff *skb, bool xdp) { struct veth_priv *priv = netdev_priv(dev); + + return __dev_forward_skb(dev, skb) ?: xdp ? + veth_xdp_rx(priv, skb) : + netif_rx(skb); +} + +static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev) +{ + struct veth_priv *rcv_priv, *priv = netdev_priv(dev); struct net_device *rcv; int length = skb->len; + bool rcv_xdp = false; rcu_read_lock(); rcv = rcu_dereference(priv->peer); @@ -111,7 +158,10 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev) goto drop; } - if (likely(dev_forward_skb(rcv, skb) == NET_RX_SUCCESS)) { + rcv_priv = netdev_priv(rcv); + rcv_xdp = rcu_access_pointer(rcv_priv->xdp_prog); + + if (likely(veth_forward_skb(rcv, skb, rcv_xdp) == NET_RX_SUCCESS)) { struct pcpu_vstats *stats = this_cpu_ptr(dev->vstats); u64_stats_update_begin(&stats->syncp); @@ -122,14 +172,15 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev) drop: atomic64_inc(&priv->dropped); } + + if (rcv_xdp) + __veth_xdp_flush(rcv_priv); + rcu_read_unlock(); + return NETDEV_TX_OK; } -/* - * general routines - */ - static u64 veth_stats_one(struct pcpu_vstats *result, struct net_device *dev) { struct veth_priv *priv = netdev_priv(dev); @@ -179,18 +230,253 @@ static void veth_set_multicast_list(struct net_device *dev) { } +static struct sk_buff *veth_build_skb(void *head, int headroom, int len, + int buflen) +{ + struct sk_buff *skb; + + if (!buflen) { + buflen = SKB_DATA_ALIGN(headroom + len) + + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); + } + skb = build_skb(head, buflen); + if (!skb) + return NULL; + + skb_reserve(skb, headroom); + skb_put(skb, len); + + return skb; +} + +static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv, + struct sk_buff *skb) +{ + u32 pktlen, headroom, act, metalen; + void *orig_data, *orig_data_end; + int size, 
mac_len, delta, off; + struct bpf_prog *xdp_prog; + struct xdp_buff xdp; + + rcu_read_lock(); + xdp_prog = rcu_dereference(priv->xdp_prog); + if (unlikely(!xdp_prog)) { + rcu_read_unlock(); + goto out; + } + + mac_len = skb->data - skb_mac_header(skb); + pktlen = skb->len + mac_len; + size = SKB_DATA_ALIGN(VETH_XDP_HEADROOM + pktlen) + + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); + if (size > PAGE_SIZE) + goto drop; + + headroom = skb_headroom(skb) - mac_len; + if (skb_shared(skb) || skb_head_is_locked(skb) || + skb_is_nonlinear(skb) || headroom < XDP_PACKET_HEADROOM) { + struct sk_buff *nskb; + void *head, *start; + struct page *page; + int head_off; + + page = alloc_page(GFP_ATOMIC); + if (!page) + goto drop; + + head = page_address(page); + start = head + VETH_XDP_HEADROOM; + if (skb_copy_bits(skb, -mac_len, start, pktlen)) { + page_frag_free(head); + goto drop; + } + + nskb = veth_build_skb(head, + VETH_XDP_HEADROOM + mac_len, skb->len, + PAGE_SIZE); + if (!nskb) { + page_frag_free(head); + goto drop; + } + + skb_copy_header(nskb, skb); + head_off = skb_headroom(nskb) - skb_headroom(skb); + skb_headers_offset_update(nskb, head_off); + if (skb->sk) + skb_set_owner_w(nskb, skb->sk); + consume_skb(skb); + skb = nskb; + } + + xdp.data_hard_start = skb->head; + xdp.data = skb_mac_header(skb); + xdp.data_end = xdp.data + pktlen; + xdp.data_meta = xdp.data; + xdp.rxq = &priv->xdp_rxq; + orig_data = xdp.data; + orig_data_end = xdp.data_end; + + act = bpf_prog_run_xdp(xdp_prog, &xdp); + + switch (act) { + case XDP_PASS: + break; + default: + bpf_warn_invalid_xdp_action(act); + case XDP_ABORTED: + trace_xdp_exception(priv->dev, xdp_prog, act); + case XDP_DROP: + goto drop; + } + rcu_read_unlock(); + + delta = orig_data - xdp.data; + off = mac_len + delta; + if (off > 0) + __skb_push(skb, off); + else if (off < 0) + __skb_pull(skb, -off); + skb->mac_header -= delta; + off = xdp.data_end - orig_data_end; + if (off != 0) + __skb_put(skb, off); + skb->protocol = 
eth_type_trans(skb, priv->dev); + + metalen = xdp.data - xdp.data_meta; + if (metalen) + skb_metadata_set(skb, metalen); +out: + return skb; +drop: + rcu_read_unlock(); + kfree_skb(skb); + return NULL; +} + +static int veth_xdp_rcv(struct veth_priv *priv, int budget) +{ + int i, done = 0; + + for (i = 0; i < budget; i++) { + struct sk_buff *skb = __ptr_ring_consume(&priv->xdp_ring); + + if (!skb) + break; + + skb = veth_xdp_rcv_skb(priv, skb); + + if (skb) + napi_gro_receive(&priv->xdp_napi, skb); + + done++; + } + + return done; +} + +static int veth_poll(struct napi_struct *napi, int budget) +{ + struct veth_priv *priv = + container_of(napi, struct veth_priv, xdp_napi); + int done; + + done = veth_xdp_rcv(priv, budget); + + if (done < budget && napi_complete_done(napi, done)) { + /* Write rx_notify_masked before reading ptr_ring */ + smp_store_mb(priv->rx_notify_masked, false); + if (unlikely(!__ptr_ring_empty(&priv->xdp_ring))) { + priv->rx_notify_masked = true; + napi_schedule(&priv->xdp_napi); + } + } + + return done; +} + +static int veth_napi_add(struct net_device *dev) +{ + struct veth_priv *priv = netdev_priv(dev); + int err; + + err = ptr_ring_init(&priv->xdp_ring, VETH_RING_SIZE, GFP_KERNEL); + if (err) + return err; + + netif_napi_add(dev, &priv->xdp_napi, veth_poll, NAPI_POLL_WEIGHT); + napi_enable(&priv->xdp_napi); + + return 0; +} + +static void veth_napi_del(struct net_device *dev) +{ + struct veth_priv *priv = netdev_priv(dev); + + napi_disable(&priv->xdp_napi); + netif_napi_del(&priv->xdp_napi); + priv->rx_notify_masked = false; + ptr_ring_cleanup(&priv->xdp_ring, __skb_array_destroy_skb); +} + +static int veth_enable_xdp(struct net_device *dev) +{ + struct veth_priv *priv = netdev_priv(dev); + int err; + + if (!xdp_rxq_info_is_reg(&priv->xdp_rxq)) { + err = xdp_rxq_info_reg(&priv->xdp_rxq, dev, 0); + if (err < 0) + return err; + + err = xdp_rxq_info_reg_mem_model(&priv->xdp_rxq, + MEM_TYPE_PAGE_SHARED, NULL); + if (err < 0) + goto err; + + err = 
veth_napi_add(dev); + if (err) + goto err; + } + + rcu_assign_pointer(priv->xdp_prog, priv->_xdp_prog); + + return 0; +err: + xdp_rxq_info_unreg(&priv->xdp_rxq); + + return err; +} + +static void veth_disable_xdp(struct net_device *dev) +{ + struct veth_priv *priv = netdev_priv(dev); + + rcu_assign_pointer(priv->xdp_prog, NULL); + veth_napi_del(dev); + xdp_rxq_info_unreg(&priv->xdp_rxq); +} + static int veth_open(struct net_device *dev) { struct veth_priv *priv = netdev_priv(dev); struct net_device *peer = rtnl_dereference(priv->peer); + int err; if (!peer) return -ENOTCONN; + if (priv->_xdp_prog) { + err = veth_enable_xdp(dev); + if (err) + return err; + } + if (peer->flags & IFF_UP) { netif_carrier_on(dev); netif_carrier_on(peer); } + return 0; } @@ -203,6 +489,9 @@ static int veth_close(struct net_device *dev) if (peer) netif_carrier_off(peer); + if (priv->_xdp_prog) + veth_disable_xdp(dev); + return 0; } @@ -228,7 +517,7 @@ static void veth_dev_free(struct net_device *dev) static void veth_poll_controller(struct net_device *dev) { /* veth only receives frames when its peer sends one - * Since it's a synchronous operation, we are guaranteed + * Since it has nothing to do with disabling irqs, we are guaranteed * never to have pending data when we poll for it so * there is nothing to do here. 
* @@ -276,6 +565,72 @@ static void veth_set_rx_headroom(struct net_device *dev, int new_hr) rcu_read_unlock(); } +static int veth_xdp_set(struct net_device *dev, struct bpf_prog *prog, + struct netlink_ext_ack *extack) +{ + struct veth_priv *priv = netdev_priv(dev); + struct bpf_prog *old_prog; + struct net_device *peer; + int err; + + old_prog = priv->_xdp_prog; + priv->_xdp_prog = prog; + peer = rtnl_dereference(priv->peer); + + if (prog) { + if (!peer) { + NL_SET_ERR_MSG_MOD(extack, "Cannot set XDP when peer is detached"); + err = -ENOTCONN; + goto err; + } + + if (dev->flags & IFF_UP) { + err = veth_enable_xdp(dev); + if (err) { + NL_SET_ERR_MSG_MOD(extack, "Setup for XDP failed"); + goto err; + } + } + } + + if (old_prog) { + if (!prog && dev->flags & IFF_UP) + veth_disable_xdp(dev); + bpf_prog_put(old_prog); + } + + return 0; +err: + priv->_xdp_prog = old_prog; + + return err; +} + +static u32 veth_xdp_query(struct net_device *dev) +{ + struct veth_priv *priv = netdev_priv(dev); + const struct bpf_prog *xdp_prog; + + xdp_prog = priv->_xdp_prog; + if (xdp_prog) + return xdp_prog->aux->id; + + return 0; +} + +static int veth_xdp(struct net_device *dev, struct netdev_bpf *xdp) +{ + switch (xdp->command) { + case XDP_SETUP_PROG: + return veth_xdp_set(dev, xdp->prog, xdp->extack); + case XDP_QUERY_PROG: + xdp->prog_id = veth_xdp_query(dev); + return 0; + default: + return -EINVAL; + } +} + static const struct net_device_ops veth_netdev_ops = { .ndo_init = veth_dev_init, .ndo_open = veth_open, @@ -290,6 +645,7 @@ static const struct net_device_ops veth_netdev_ops = { .ndo_get_iflink = veth_get_iflink, .ndo_features_check = passthru_features_check, .ndo_set_rx_headroom = veth_set_rx_headroom, + .ndo_bpf = veth_xdp, }; #define VETH_FEATURES (NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_HW_CSUM | \ @@ -451,10 +807,13 @@ static int veth_newlink(struct net *src_net, struct net_device *dev, */ priv = netdev_priv(dev); + priv->dev = dev; rcu_assign_pointer(priv->peer, peer); 
priv = netdev_priv(peer); + priv->dev = peer; rcu_assign_pointer(priv->peer, dev); + return 0; err_register_dev: From patchwork Thu Jul 26 14:40:26 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Toshiaki Makita X-Patchwork-Id: 949756 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="kpZt/RUF"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 41bvsy06clz9rxx for ; Fri, 27 Jul 2018 00:41:10 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730827AbeGZP6S (ORCPT ); Thu, 26 Jul 2018 11:58:18 -0400 Received: from mail-pf1-f194.google.com ([209.85.210.194]:45109 "EHLO mail-pf1-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729385AbeGZP6S (ORCPT ); Thu, 26 Jul 2018 11:58:18 -0400 Received: by mail-pf1-f194.google.com with SMTP id i26-v6so653540pfo.12 for ; Thu, 26 Jul 2018 07:41:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=QPcPsrdx+/eXIjWNkILeDqmC5vtHGQBTdML3+OfVlrc=; b=kpZt/RUFK9TbDG8yYA+FdTIDh2v5SozsVrlRjTa5wfgI7orKVNoVNgGEcUZKOMt4mh WT6p073in86eH70/R/ZvdZ/rCRssB20gKQj8EbrbQhUzBFNmQjXF+vBNa0DbJJVvJOIY 2/+u7YWcD+6teGab6rrOInc7LtnUvfeIt19pzdkgRFWA9S1Y6I1jIhR75qCyiH9Ox12G hYriiVaXSAPAt000EWOmnQohymFghIwDo7iyyF6gpc+smqzHMtlu3Ou5SDDDjcCBoRMh 
JOJCLb4CKne5lhKipvFu6pPmawKlmBrbPwz7injLgLCfjOK02IfcipwDr607IkarDL3s 9iJw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=QPcPsrdx+/eXIjWNkILeDqmC5vtHGQBTdML3+OfVlrc=; b=LUVo/JJoxdiHxoNpjffIX3NlnZgAJu+t1pHHUad1iWGi50m4JZzr0yTrU/nREa4BLQ 7QIcXLMoBZm4DfNpAWgGNK0uiu9vuIK4kOO1wO3Vqv3xhmTmAxljSf4UavX/VR6QshDg XcN46Qu65Pxh9OVetbLSDsGkP5AJXrVGw9m6Xv/G0SAa2EM+/BmbwQxBEaybgrbSy6oO 32eQ6TmIzeqRloXkPACmLcFOVjnVKRo3JbPbXawn0LiK0deM5XbjDYlOUr3fWTEhvAx0 h/Hi7mosjjKjZua7uxptUCCM/nIC9dPsGFmssp3LWKgUsvRQoE1UK6Ggo3BZFvDDYCG/ foug== X-Gm-Message-State: AOUpUlHgmy2h83xXoATR/cZ8ZHMBWwusgdt23JqypZh4RoQCfShvUxuV h1dJmMgf8VvT0DyL3B41zVeWb4p2 X-Google-Smtp-Source: AAOMgpc1CvA+RqwwFW5t1I3sBpYlRSw31A1gCChL9rbjMC6mo3bxVL7vN9OH/oeYFVJ40Jzy5K6uQA== X-Received: by 2002:a63:214f:: with SMTP id s15-v6mr2189673pgm.267.1532616067532; Thu, 26 Jul 2018 07:41:07 -0700 (PDT) Received: from localhost.localdomain (i153-145-22-9.s42.a013.ap.plala.or.jp. 
[153.145.22.9]) by smtp.gmail.com with ESMTPSA id p3-v6sm2649982pfo.130.2018.07.26.07.41.04 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Thu, 26 Jul 2018 07:41:06 -0700 (PDT) From: Toshiaki Makita To: netdev@vger.kernel.org, Alexei Starovoitov , Daniel Borkmann Cc: Toshiaki Makita , Jesper Dangaard Brouer , Jakub Kicinski Subject: [PATCH v5 bpf-next 3/9] veth: Avoid drops by oversized packets when XDP is enabled Date: Thu, 26 Jul 2018 23:40:26 +0900 Message-Id: <20180726144032.2116-4-toshiaki.makita1@gmail.com> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20180726144032.2116-1-toshiaki.makita1@gmail.com> References: <20180726144032.2116-1-toshiaki.makita1@gmail.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Toshiaki Makita All oversized packets including GSO packets are dropped if XDP is enabled on receiver side, so don't send such packets from peer. Drop TSO and SCTP fragmentation features so that veth devices themselves segment packets with XDP enabled. Also cap MTU accordingly. v4: - Don't auto-adjust MTU but cap max MTU. 
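As a worked example of the MTU cap, the sketch below evaluates the same expression the patch uses (page size minus XDP headroom, minus the link-layer header, minus the aligned skb_shared_info) for one assumed configuration. The concrete numbers are assumptions, not values taken from this series: 4 KiB pages, NET_IP_ALIGN of 0 as on x86, 64-byte cache lines, and a 320-byte struct skb_shared_info. The model_* names are hypothetical.

```c
#include <stdbool.h>

/* Assumed configuration; real values depend on arch and kernel config. */
#define MODEL_PAGE_SIZE      4096u
#define MODEL_XDP_HEADROOM    256u /* XDP_PACKET_HEADROOM */
#define MODEL_NET_IP_ALIGN      0u /* assumption: x86 value */
#define MODEL_CACHE_BYTES      64u /* assumption: SMP_CACHE_BYTES */
#define MODEL_SHINFO_SIZE     320u /* assumption: sizeof(struct skb_shared_info) */

#define MODEL_VETH_HEADROOM (MODEL_XDP_HEADROOM + MODEL_NET_IP_ALIGN)

/* Round up to a cache-line multiple, like SKB_DATA_ALIGN(). */
static unsigned int model_data_align(unsigned int x)
{
	return (x + MODEL_CACHE_BYTES - 1) & ~(MODEL_CACHE_BYTES - 1);
}

/* Same shape as the max_mtu computation in veth_xdp_set(). */
static unsigned int model_max_mtu(unsigned int hard_header_len)
{
	return MODEL_PAGE_SIZE - MODEL_VETH_HEADROOM - hard_header_len -
	       model_data_align(MODEL_SHINFO_SIZE);
}

/* Same shape as the size check in veth_xdp_rcv_skb(): pktlen includes
 * the MAC header; anything that does not fit in one page is dropped. */
static bool model_fits_in_page(unsigned int pktlen)
{
	unsigned int size = model_data_align(MODEL_VETH_HEADROOM + pktlen) +
			    model_data_align(MODEL_SHINFO_SIZE);

	return size <= MODEL_PAGE_SIZE;
}
```

Under these assumptions an Ethernet peer (hard_header_len of 14) is capped at 4096 - 256 - 14 - 320 = 3506 bytes, and a full-sized frame at that MTU (3520 bytes with the MAC header) fills the page exactly.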
Signed-off-by: Toshiaki Makita --- drivers/net/veth.c | 47 +++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 45 insertions(+), 2 deletions(-) diff --git a/drivers/net/veth.c b/drivers/net/veth.c index 78fa08cb6e24..1b4006d3df32 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -542,6 +542,23 @@ static int veth_get_iflink(const struct net_device *dev) return iflink; } +static netdev_features_t veth_fix_features(struct net_device *dev, + netdev_features_t features) +{ + struct veth_priv *priv = netdev_priv(dev); + struct net_device *peer; + + peer = rtnl_dereference(priv->peer); + if (peer) { + struct veth_priv *peer_priv = netdev_priv(peer); + + if (peer_priv->_xdp_prog) + features &= ~NETIF_F_GSO_SOFTWARE; + } + + return features; +} + static void veth_set_rx_headroom(struct net_device *dev, int new_hr) { struct veth_priv *peer_priv, *priv = netdev_priv(dev); @@ -571,6 +588,7 @@ static int veth_xdp_set(struct net_device *dev, struct bpf_prog *prog, struct veth_priv *priv = netdev_priv(dev); struct bpf_prog *old_prog; struct net_device *peer; + unsigned int max_mtu; int err; old_prog = priv->_xdp_prog; @@ -584,6 +602,15 @@ static int veth_xdp_set(struct net_device *dev, struct bpf_prog *prog, goto err; } + max_mtu = PAGE_SIZE - VETH_XDP_HEADROOM - + peer->hard_header_len - + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); + if (peer->mtu > max_mtu) { + NL_SET_ERR_MSG_MOD(extack, "Peer MTU is too large to set XDP"); + err = -ERANGE; + goto err; + } + if (dev->flags & IFF_UP) { err = veth_enable_xdp(dev); if (err) { @@ -591,14 +618,29 @@ static int veth_xdp_set(struct net_device *dev, struct bpf_prog *prog, goto err; } } + + if (!old_prog) { + peer->hw_features &= ~NETIF_F_GSO_SOFTWARE; + peer->max_mtu = max_mtu; + } } if (old_prog) { - if (!prog && dev->flags & IFF_UP) - veth_disable_xdp(dev); + if (!prog) { + if (dev->flags & IFF_UP) + veth_disable_xdp(dev); + + if (peer) { + peer->hw_features |= NETIF_F_GSO_SOFTWARE; + peer->max_mtu = 
ETH_MAX_MTU; + } + } bpf_prog_put(old_prog); } + if ((!!old_prog ^ !!prog) && peer) + netdev_update_features(peer); + return 0; err: priv->_xdp_prog = old_prog; @@ -643,6 +685,7 @@ static const struct net_device_ops veth_netdev_ops = { .ndo_poll_controller = veth_poll_controller, #endif .ndo_get_iflink = veth_get_iflink, + .ndo_fix_features = veth_fix_features, .ndo_features_check = passthru_features_check, .ndo_set_rx_headroom = veth_set_rx_headroom, .ndo_bpf = veth_xdp, From patchwork Thu Jul 26 14:40:27 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Toshiaki Makita X-Patchwork-Id: 949757 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="lh8eQi2H"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 41bvt10pHwz9rxx for ; Fri, 27 Jul 2018 00:41:13 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731230AbeGZP6V (ORCPT ); Thu, 26 Jul 2018 11:58:21 -0400 Received: from mail-pg1-f193.google.com ([209.85.215.193]:40443 "EHLO mail-pg1-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729385AbeGZP6V (ORCPT ); Thu, 26 Jul 2018 11:58:21 -0400 Received: by mail-pg1-f193.google.com with SMTP id x5-v6so1297142pgp.7 for ; Thu, 26 Jul 2018 07:41:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; 
h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=ReKgz+Ex2WfpgItxlZNClL0L5GsAfUBRmfmJ5Joag1s=; b=lh8eQi2H1aV3Ls6cKwg6XjRjWxTz89Ij+8DFhO6CZkLdctMGMgKjS6sfk67UVw/yNX kzfY9qHI2CMiu2LqPQF0dU8S2saNsYERO1CTX1D5u6cNqlUlh+lpdXXJLrtKO2iNU1tU Ou5UhVqjdTWeb06Dl9GBLzPkWiASzDZqyAoyqdYzBSKy7l6VyJjj19avK0F2GsVyTLBb bYC3A2IGl2Yoy4D4f6FVr8rHdWzxYPGPhQSbLyPiAZ1ztHFuGfVtTEteQzKsWqCFNN8x Dt3REzwE+5Mb/xbmfH2SoMJKDviBQD8sxmuz9gc174VgYH4GM4nKwz+3uolRl9edWu5J 3A9Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=ReKgz+Ex2WfpgItxlZNClL0L5GsAfUBRmfmJ5Joag1s=; b=jQYww0EAidgiVGfvjrDWAavfIvjiRD8nx77vSsq/txqLRMcZjy/7z3+YMk/hdwHHsb oUKsNae4V4ovZRPQ9qkquHWDa7PpJ58slbFgDR8yvQKKfA/Mgdti01oLdw2uotVx/GwT u2DtCZDuzoRBG0Oa2HANVive/Yj7d6Obl33Ecsy2jb1DBCXtdTKNMHJtKSutS//Smnnb E+H7+kYOSYFDipDtv5lRjGd6os+tJqcIDgco01AufWnh5WCs/WujKHTNnr+YOFpnLbxA AIY1hD/ltwax0APIIbdvOEKvQ8hYA5dxGZTmG0+MB/qZQZNOXhmvOSCs/h9EQrFYyw0h bcHw== X-Gm-Message-State: AOUpUlE0NKaLZ2o8VTOKUq3rMmkUbf/gaTKRCrGv2wxDuXX1VvXJnNGs Ma/AdtNsWTPTNJdKi6QvzzjYkrqS X-Google-Smtp-Source: AAOMgpeupvSaZq4Pk/0l3vLlQ22JAUmtV7uWAB7EACshM7BW4vQJ3eiUOqaEfoecftVfNx1yHAVFUg== X-Received: by 2002:a65:4107:: with SMTP id w7-v6mr2171681pgp.302.1532616070140; Thu, 26 Jul 2018 07:41:10 -0700 (PDT) Received: from localhost.localdomain (i153-145-22-9.s42.a013.ap.plala.or.jp. 
[153.145.22.9]) by smtp.gmail.com with ESMTPSA id p3-v6sm2649982pfo.130.2018.07.26.07.41.07 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Thu, 26 Jul 2018 07:41:09 -0700 (PDT) From: Toshiaki Makita To: netdev@vger.kernel.org, Alexei Starovoitov , Daniel Borkmann Cc: Toshiaki Makita , Jesper Dangaard Brouer , Jakub Kicinski Subject: [PATCH v5 bpf-next 4/9] veth: Handle xdp_frames in xdp napi ring Date: Thu, 26 Jul 2018 23:40:27 +0900 Message-Id: <20180726144032.2116-5-toshiaki.makita1@gmail.com> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20180726144032.2116-1-toshiaki.makita1@gmail.com> References: <20180726144032.2116-1-toshiaki.makita1@gmail.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Toshiaki Makita This is preparation for XDP TX and ndo_xdp_xmit. This allows the napi handler to handle xdp_frames through the xdp ring as well as sk_buffs. v3: - Revert v2 change around rings and use a flag to differentiate skb and xdp_frame, since bulk skb xmit makes little performance difference for now. v2: - Use another ring instead of using a flag to differentiate skb and xdp_frame. This approach makes bulk skb transmit possible in veth_xmit later. - Clear xdp_frame fields in skb->head. - Implement adjust_tail. 
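The flag approach mentioned in the v3 note can be illustrated with a small user-space sketch of low-bit pointer tagging: because both object types are at least 2-byte aligned, bit 0 of a ring entry is free to say which type it points to. Everything named model_* here is hypothetical. In particular, model_xdp_to_ptr() is only an assumed counterpart to the veth_is_xdp_frame()/veth_ptr_to_xdp() helpers in this patch, which does not itself add a tagging function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_XDP_FLAG ((uintptr_t)1) /* plays the role of VETH_XDP_FLAG */

/* Stand-ins for struct xdp_frame and struct sk_buff. */
struct model_frame { int len; };
struct model_skb   { int len; };

/* Tag a frame pointer before putting it on the shared ring. */
static void *model_xdp_to_ptr(struct model_frame *frame)
{
	/* The scheme only works if bit 0 is otherwise unused. */
	assert(((uintptr_t)frame & MODEL_XDP_FLAG) == 0);
	return (void *)((uintptr_t)frame | MODEL_XDP_FLAG);
}

static bool model_is_xdp_frame(void *ptr)
{
	return ((uintptr_t)ptr & MODEL_XDP_FLAG) != 0;
}

/* Strip the tag to recover the original frame pointer. */
static struct model_frame *model_ptr_to_xdp(void *ptr)
{
	return (struct model_frame *)((uintptr_t)ptr & ~MODEL_XDP_FLAG);
}

/* Consumer side: dispatch on the tag, as veth_xdp_rcv() does. */
static int model_entry_len(void *ptr)
{
	if (model_is_xdp_frame(ptr))
		return model_ptr_to_xdp(ptr)->len;
	return ((struct model_skb *)ptr)->len;
}
```

An untagged skb pointer passes through unchanged, so one ring can carry both kinds of entry and the consumer pays a single bit test per entry. Converting pointers through uintptr_t and back is implementation-defined in ISO C, though it behaves as expected on the platforms the kernel targets.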
Signed-off-by: Toshiaki Makita Acked-by: John Fastabend --- drivers/net/veth.c | 87 ++++++++++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 82 insertions(+), 5 deletions(-) diff --git a/drivers/net/veth.c b/drivers/net/veth.c index 1b4006d3df32..ef22d991f678 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -22,12 +22,12 @@ #include #include #include -#include #include #define DRV_NAME "veth" #define DRV_VERSION "1.0" +#define VETH_XDP_FLAG BIT(0) #define VETH_RING_SIZE 256 #define VETH_XDP_HEADROOM (XDP_PACKET_HEADROOM + NET_IP_ALIGN) @@ -115,6 +115,24 @@ static const struct ethtool_ops veth_ethtool_ops = { /* general routines */ +static bool veth_is_xdp_frame(void *ptr) +{ + return (unsigned long)ptr & VETH_XDP_FLAG; +} + +static void *veth_ptr_to_xdp(void *ptr) +{ + return (void *)((unsigned long)ptr & ~VETH_XDP_FLAG); +} + +static void veth_ptr_free(void *ptr) +{ + if (veth_is_xdp_frame(ptr)) + xdp_return_frame(veth_ptr_to_xdp(ptr)); + else + kfree_skb(ptr); +} + static void __veth_xdp_flush(struct veth_priv *priv) { /* Write ptr_ring before reading rx_notify_masked */ @@ -249,6 +267,61 @@ static struct sk_buff *veth_build_skb(void *head, int headroom, int len, return skb; } +static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv, + struct xdp_frame *frame) +{ + int len = frame->len, delta = 0; + struct bpf_prog *xdp_prog; + unsigned int headroom; + struct sk_buff *skb; + + rcu_read_lock(); + xdp_prog = rcu_dereference(priv->xdp_prog); + if (likely(xdp_prog)) { + struct xdp_buff xdp; + u32 act; + + xdp.data_hard_start = frame->data - frame->headroom; + xdp.data = frame->data; + xdp.data_end = frame->data + frame->len; + xdp.data_meta = frame->data - frame->metasize; + xdp.rxq = &priv->xdp_rxq; + + act = bpf_prog_run_xdp(xdp_prog, &xdp); + + switch (act) { + case XDP_PASS: + delta = frame->data - xdp.data; + len = xdp.data_end - xdp.data; + break; + default: + bpf_warn_invalid_xdp_action(act); + case XDP_ABORTED: + 
trace_xdp_exception(priv->dev, xdp_prog, act); + case XDP_DROP: + goto err_xdp; + } + } + rcu_read_unlock(); + + headroom = frame->data - delta - (void *)frame; + skb = veth_build_skb(frame, headroom, len, 0); + if (!skb) { + xdp_return_frame(frame); + goto err; + } + + memset(frame, 0, sizeof(*frame)); + skb->protocol = eth_type_trans(skb, priv->dev); +err: + return skb; +err_xdp: + rcu_read_unlock(); + xdp_return_frame(frame); + + return NULL; +} + static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv, struct sk_buff *skb) { @@ -358,12 +431,16 @@ static int veth_xdp_rcv(struct veth_priv *priv, int budget) int i, done = 0; for (i = 0; i < budget; i++) { - struct sk_buff *skb = __ptr_ring_consume(&priv->xdp_ring); + void *ptr = __ptr_ring_consume(&priv->xdp_ring); + struct sk_buff *skb; - if (!skb) + if (!ptr) break; - skb = veth_xdp_rcv_skb(priv, skb); + if (veth_is_xdp_frame(ptr)) + skb = veth_xdp_rcv_one(priv, veth_ptr_to_xdp(ptr)); + else + skb = veth_xdp_rcv_skb(priv, ptr); if (skb) napi_gro_receive(&priv->xdp_napi, skb); @@ -416,7 +493,7 @@ static void veth_napi_del(struct net_device *dev) napi_disable(&priv->xdp_napi); netif_napi_del(&priv->xdp_napi); priv->rx_notify_masked = false; - ptr_ring_cleanup(&priv->xdp_ring, __skb_array_destroy_skb); + ptr_ring_cleanup(&priv->xdp_ring, veth_ptr_free); } static int veth_enable_xdp(struct net_device *dev) From patchwork Thu Jul 26 14:40:28 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Toshiaki Makita X-Patchwork-Id: 949758 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) 
header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="KxQ1XacJ"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 41bvt35XnMz9rxx for ; Fri, 27 Jul 2018 00:41:15 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731301AbeGZP6X (ORCPT ); Thu, 26 Jul 2018 11:58:23 -0400 Received: from mail-pg1-f193.google.com ([209.85.215.193]:34741 "EHLO mail-pg1-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729385AbeGZP6X (ORCPT ); Thu, 26 Jul 2018 11:58:23 -0400 Received: by mail-pg1-f193.google.com with SMTP id y5-v6so1306969pgv.1 for ; Thu, 26 Jul 2018 07:41:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=HgUMi9TFcT7sWFJMvCAvXL4+hamgEwYXdM6RnN6X0Cw=; b=KxQ1XacJTmY/b9HM1kMXxrByVFHZA2NOEZO1z/ov1ECB6MD7Wp7KjwDngDgPbEB7hW m1LSZMaQIlfMPgljdnuCzcd7gnGzkB02Io73QvIsJ7bW7ax0riyUNqJ2Nh7O+nUUt/8+ O4gKSSkQCfWfHRNX/GQCAQf2JMKOHF5iwgEY1P/QIY126jkGO9EOVrHnBkHwxUsz9WvI n7DZYhqTzkzGO85ygC4h09yVqA+e3hZ5k1P1cOPRq14fX+plGhNQHKJ7J052gzcukrRp 6rIeDsfbvjQWz4NTOrzA7fCFGXt1njFOR6XnBmXsZa+n2Gow/S/I5mhTpILEZaOERUOH w5AQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=HgUMi9TFcT7sWFJMvCAvXL4+hamgEwYXdM6RnN6X0Cw=; b=dlggTopBZ9BWVTct5niU8sDMogITlE5dZpDH2O9X4+dl2XZElEiBQ1/oBIUyVlAgJL vjsOvJuwELaicyfXdgBAXd+tKr+kNV4uRmIFEtUNXFikdUN06YkmEJZ1EXiFsFH/9fXp ZpH1FCIW+wUIxqSsWW66S+OWmEecIPecwhxXQOutT9c56Qg4t2uWjtXYmwPoEUAKjd73 oygJQY/VCXUNx4yEg4MkpTQ6LTeZIj64JXHvDoG6YLqc6bhsLN9T3sZc9CZ+DO8smJaX cclvUKZ4pCIdCQeDH2eX40vZP81PKjpqzrbrxh4RVuu920fnStg1ULr3k7/igMSwaVYS ZwMQ== X-Gm-Message-State: AOUpUlEzS0sqDXEM/+fmgGXmiXP7iJsEoi37bTomHG9PjgR3cJScpzpX 
Fpv7fNh3Zb5WIfdeeUbb1pvQZVT+ X-Google-Smtp-Source: AAOMgpdzlcX61uOOdZQMjnHLaiLp35EpKDrijXk3LWI8ZjfF+zqNXxL77hXeGxYvL87bs1D8Bxn6sA== X-Received: by 2002:a65:6455:: with SMTP id s21-v6mr2168671pgv.394.1532616072775; Thu, 26 Jul 2018 07:41:12 -0700 (PDT) Received: from localhost.localdomain (i153-145-22-9.s42.a013.ap.plala.or.jp. [153.145.22.9]) by smtp.gmail.com with ESMTPSA id p3-v6sm2649982pfo.130.2018.07.26.07.41.10 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Thu, 26 Jul 2018 07:41:11 -0700 (PDT) From: Toshiaki Makita To: netdev@vger.kernel.org, Alexei Starovoitov , Daniel Borkmann Cc: Toshiaki Makita , Jesper Dangaard Brouer , Jakub Kicinski Subject: [PATCH v5 bpf-next 5/9] veth: Add ndo_xdp_xmit Date: Thu, 26 Jul 2018 23:40:28 +0900 Message-Id: <20180726144032.2116-6-toshiaki.makita1@gmail.com> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20180726144032.2116-1-toshiaki.makita1@gmail.com> References: <20180726144032.2116-1-toshiaki.makita1@gmail.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Toshiaki Makita This allows a NIC's XDP to redirect packets to veth. The destination veth device enqueues redirected packets to the napi ring of its peer, then they are processed by XDP on its peer veth device. This can be thought of as an XDP program calling another XDP program via REDIRECT, when the peer enables driver XDP. Note that when the peer veth device does not enable driver XDP, redirected packets will be dropped because the peer is not ready for NAPI. v4: - Don't use xdp_ok_fwd_dev() because checking IFF_UP is not necessary. Add comments about it and check only MTU. v2: - Drop the part converting xdp_frame into skb when XDP is not enabled. - Implement bulk interface of ndo_xdp_xmit. - Implement XDP_XMIT_FLUSH bit and drop ndo_xdp_flush. 
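For illustration only, the bulk interface with a length-only check can be sketched in userspace C. This is a rough sketch under stated assumptions, not the driver code: the demo_* names and the fixed-size ring are hypothetical, locking and frame freeing are omitted, and a dropped frame is only counted.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_RING_SIZE 4

/* Hypothetical stand-ins for the peer's ring and for xdp_frame. */
struct demo_ring  { void *slot[DEMO_RING_SIZE]; size_t used; };
struct demo_frame { unsigned int len; };

/* Returns 0 on success, -1 when the ring is full. */
static int demo_ring_produce(struct demo_ring *ring, void *ptr)
{
	if (ring->used == DEMO_RING_SIZE)
		return -1;
	ring->slot[ring->used++] = ptr;
	return 0;
}

/*
 * Enqueue up to n frames. A frame is dropped when it is longer than
 * max_len or when the ring has no room. Returns the number of frames
 * actually queued, i.e. n minus the drops.
 */
static int demo_bulk_xmit(struct demo_ring *ring, struct demo_frame **frames,
			  int n, unsigned int max_len)
{
	int i, drops = 0;

	for (i = 0; i < n; i++) {
		if (frames[i]->len > max_len ||
		    demo_ring_produce(ring, frames[i]))
			drops++;
	}
	return n - drops;
}
```

Reporting the number of frames accepted, rather than failing the whole batch, lets the caller account for partial success; a max_len of 1518 would correspond to a 1500-byte MTU plus a 14-byte Ethernet header and a 4-byte VLAN tag.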
Signed-off-by: Toshiaki Makita Acked-by: John Fastabend --- drivers/net/veth.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 51 insertions(+) diff --git a/drivers/net/veth.c b/drivers/net/veth.c index ef22d991f678..acdb1c543f4b 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -17,6 +17,7 @@ #include #include #include +#include #include #include #include @@ -125,6 +126,11 @@ static void *veth_ptr_to_xdp(void *ptr) return (void *)((unsigned long)ptr & ~VETH_XDP_FLAG); } +static void *veth_xdp_to_ptr(void *ptr) +{ + return (void *)((unsigned long)ptr | VETH_XDP_FLAG); +} + static void veth_ptr_free(void *ptr) { if (veth_is_xdp_frame(ptr)) @@ -267,6 +273,50 @@ static struct sk_buff *veth_build_skb(void *head, int headroom, int len, return skb; } +static int veth_xdp_xmit(struct net_device *dev, int n, + struct xdp_frame **frames, u32 flags) +{ + struct veth_priv *rcv_priv, *priv = netdev_priv(dev); + struct net_device *rcv; + unsigned int max_len; + int i, drops = 0; + + if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK)) + return -EINVAL; + + rcv = rcu_dereference(priv->peer); + if (unlikely(!rcv)) + return -ENXIO; + + rcv_priv = netdev_priv(rcv); + /* Non-NULL xdp_prog ensures that xdp_ring is initialized on receive + * side. This means an XDP program is loaded on the peer and the peer + * device is up. 
+ */ + if (!rcu_access_pointer(rcv_priv->xdp_prog)) + return -ENXIO; + + max_len = rcv->mtu + rcv->hard_header_len + VLAN_HLEN; + + spin_lock(&rcv_priv->xdp_ring.producer_lock); + for (i = 0; i < n; i++) { + struct xdp_frame *frame = frames[i]; + void *ptr = veth_xdp_to_ptr(frame); + + if (unlikely(frame->len > max_len || + __ptr_ring_produce(&rcv_priv->xdp_ring, ptr))) { + xdp_return_frame_rx_napi(frame); + drops++; + } + } + spin_unlock(&rcv_priv->xdp_ring.producer_lock); + + if (flags & XDP_XMIT_FLUSH) + __veth_xdp_flush(rcv_priv); + + return n - drops; +} + static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv, struct xdp_frame *frame) { @@ -766,6 +816,7 @@ static const struct net_device_ops veth_netdev_ops = { .ndo_features_check = passthru_features_check, .ndo_set_rx_headroom = veth_set_rx_headroom, .ndo_bpf = veth_xdp, + .ndo_xdp_xmit = veth_xdp_xmit, }; #define VETH_FEATURES (NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_HW_CSUM | \ From patchwork Thu Jul 26 14:40:29 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Toshiaki Makita X-Patchwork-Id: 949759 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="sfzn5CqM"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 41bvt60Xzrz9rxx for ; Fri, 27 Jul 2018 00:41:18 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731355AbeGZP60 
(ORCPT ); Thu, 26 Jul 2018 11:58:26 -0400 Received: from mail-pl0-f66.google.com ([209.85.160.66]:38385 "EHLO mail-pl0-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729385AbeGZP60 (ORCPT ); Thu, 26 Jul 2018 11:58:26 -0400 Received: by mail-pl0-f66.google.com with SMTP id b1-v6so923582pls.5 for ; Thu, 26 Jul 2018 07:41:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=q1Wg6IHMW9ivPuWscNFuYqJxfrwna1FYaMPvy9uGFWQ=; b=sfzn5CqMLp6UKBK7/hw5qnR9LS5N/XFgMFeSuiJySZzqAnstwISJzoZf6LJxMKJbUP Rbf8NN8t5gqmJvJ6DwtDxIbWSc4XT5bWH34RYyIa9KzVNy7GmUSwhneZUQ99lTphxuY7 PG0ppGfO5wsonhSrrTiT7mDNE0yl4OQT/O7Y+m4VDzI3XzwJajjd0+q93abQAGgi9nsk OaAUMVH39wmsY+GT2tOt9CaX86zvMn4DdrZnoqbv5gwtbLdtirph2QoFJayvUbu1uquW JLFEn8CtPF5F9npCdle82y5wwsIN7kszsw3MWzN/ps4ZndGOTuTLc/5QwIy1ZlWLg7UD KwMw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=q1Wg6IHMW9ivPuWscNFuYqJxfrwna1FYaMPvy9uGFWQ=; b=i3xzp2Vcm0hvAP2Wab/Orz/FP6VOnKcnYoLRdeOnOz2c11vvdUgMVPDsk8E43scH4G RxVyXESsVfLK5IYAJ84tVqSL/duvtvRO50Xcg1xwTbaCCpL1aXFDlzdmCC2csPtGJ1wf NClMwWXlUtxp6mRlZy6HfgERhj1ZuU+NZdbHaqm4mAaii70Te23WQvW1UM9PPBcutGHE M6YsuAZCoGEgao1n6EbQ2RuH5m0k3BfAI1Yoz+6ZXtinhEinbmEswqk/BbDnpVaDmpOe rjD8XM77v729/HYYjREZKStLa1supgqZ3w32uLnT6yylQvxBCjIYlH49uLP8yi0ZqKvS 34Hg== X-Gm-Message-State: AOUpUlEUsxaFwTW8BEY4ijqB/CAVPB2aMdNU80nR19m+KQnxXbai4w1x SdOlgANOSA56DTECtRw3nTnfPeil X-Google-Smtp-Source: AAOMgpf2bb3SYj5kKGlCY2JtbLL0K8rWd0utaTp3PopjdkBI8h2YIvlA7B87z7FuVGwwfr75lRD4mw== X-Received: by 2002:a17:902:42a3:: with SMTP id h32-v6mr2236765pld.72.1532616075427; Thu, 26 Jul 2018 07:41:15 -0700 (PDT) Received: from localhost.localdomain (i153-145-22-9.s42.a013.ap.plala.or.jp. 
[153.145.22.9]) by smtp.gmail.com with ESMTPSA id p3-v6sm2649982pfo.130.2018.07.26.07.41.12 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Thu, 26 Jul 2018 07:41:14 -0700 (PDT) From: Toshiaki Makita To: netdev@vger.kernel.org, Alexei Starovoitov , Daniel Borkmann Cc: Toshiaki Makita , Jesper Dangaard Brouer , Jakub Kicinski Subject: [PATCH v5 bpf-next 6/9] bpf: Make redirect_info accessible from modules Date: Thu, 26 Jul 2018 23:40:29 +0900 Message-Id: <20180726144032.2116-7-toshiaki.makita1@gmail.com> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20180726144032.2116-1-toshiaki.makita1@gmail.com> References: <20180726144032.2116-1-toshiaki.makita1@gmail.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Toshiaki Makita We are going to add kern_flags field in redirect_info for kernel internal use. In order to avoid function call to access the flags, make redirect_info accessible from modules. Also as it is now non-static, add prefix bpf_ to redirect_info. Signed-off-by: Toshiaki Makita --- include/linux/filter.h | 10 ++++++++++ net/core/filter.c | 29 +++++++++++------------------ 2 files changed, 21 insertions(+), 18 deletions(-) diff --git a/include/linux/filter.h b/include/linux/filter.h index c73dd7396886..4717af8b95e6 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -537,6 +537,16 @@ struct sk_msg_buff { struct list_head list; }; +struct bpf_redirect_info { + u32 ifindex; + u32 flags; + struct bpf_map *map; + struct bpf_map *map_to_flush; + unsigned long map_owner; +}; + +DECLARE_PER_CPU(struct bpf_redirect_info, bpf_redirect_info); + /* Compute the linear packet data range [data, data_end) which * will be accessed by various program types (cls_bpf, act_bpf, * lwt, ...). Subsystems allowing direct data access must (!) 
diff --git a/net/core/filter.c b/net/core/filter.c index 104d560946da..acf322296535 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -2080,19 +2080,12 @@ static const struct bpf_func_proto bpf_clone_redirect_proto = { .arg3_type = ARG_ANYTHING, }; -struct redirect_info { - u32 ifindex; - u32 flags; - struct bpf_map *map; - struct bpf_map *map_to_flush; - unsigned long map_owner; -}; - -static DEFINE_PER_CPU(struct redirect_info, redirect_info); +DEFINE_PER_CPU(struct bpf_redirect_info, bpf_redirect_info); +EXPORT_SYMBOL_GPL(bpf_redirect_info); BPF_CALL_2(bpf_redirect, u32, ifindex, u64, flags) { - struct redirect_info *ri = this_cpu_ptr(&redirect_info); + struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); if (unlikely(flags & ~(BPF_F_INGRESS))) return TC_ACT_SHOT; @@ -2105,7 +2098,7 @@ BPF_CALL_2(bpf_redirect, u32, ifindex, u64, flags) int skb_do_redirect(struct sk_buff *skb) { - struct redirect_info *ri = this_cpu_ptr(&redirect_info); + struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); struct net_device *dev; dev = dev_get_by_index_rcu(dev_net(skb->dev), ri->ifindex); @@ -3198,7 +3191,7 @@ static int __bpf_tx_xdp_map(struct net_device *dev_rx, void *fwd, void xdp_do_flush_map(void) { - struct redirect_info *ri = this_cpu_ptr(&redirect_info); + struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); struct bpf_map *map = ri->map_to_flush; ri->map_to_flush = NULL; @@ -3243,7 +3236,7 @@ static inline bool xdp_map_invalid(const struct bpf_prog *xdp_prog, static int xdp_do_redirect_map(struct net_device *dev, struct xdp_buff *xdp, struct bpf_prog *xdp_prog) { - struct redirect_info *ri = this_cpu_ptr(&redirect_info); + struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); unsigned long map_owner = ri->map_owner; struct bpf_map *map = ri->map; u32 index = ri->ifindex; @@ -3283,7 +3276,7 @@ static int xdp_do_redirect_map(struct net_device *dev, struct xdp_buff *xdp, int xdp_do_redirect(struct net_device *dev, 
struct xdp_buff *xdp, struct bpf_prog *xdp_prog) { - struct redirect_info *ri = this_cpu_ptr(&redirect_info); + struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); struct net_device *fwd; u32 index = ri->ifindex; int err; @@ -3315,7 +3308,7 @@ static int xdp_do_generic_redirect_map(struct net_device *dev, struct xdp_buff *xdp, struct bpf_prog *xdp_prog) { - struct redirect_info *ri = this_cpu_ptr(&redirect_info); + struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); unsigned long map_owner = ri->map_owner; struct bpf_map *map = ri->map; u32 index = ri->ifindex; @@ -3366,7 +3359,7 @@ static int xdp_do_generic_redirect_map(struct net_device *dev, int xdp_do_generic_redirect(struct net_device *dev, struct sk_buff *skb, struct xdp_buff *xdp, struct bpf_prog *xdp_prog) { - struct redirect_info *ri = this_cpu_ptr(&redirect_info); + struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); u32 index = ri->ifindex; struct net_device *fwd; int err = 0; @@ -3397,7 +3390,7 @@ EXPORT_SYMBOL_GPL(xdp_do_generic_redirect); BPF_CALL_2(bpf_xdp_redirect, u32, ifindex, u64, flags) { - struct redirect_info *ri = this_cpu_ptr(&redirect_info); + struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); if (unlikely(flags)) return XDP_ABORTED; @@ -3421,7 +3414,7 @@ static const struct bpf_func_proto bpf_xdp_redirect_proto = { BPF_CALL_4(bpf_xdp_redirect_map, struct bpf_map *, map, u32, ifindex, u64, flags, unsigned long, map_owner) { - struct redirect_info *ri = this_cpu_ptr(&redirect_info); + struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); if (unlikely(flags)) return XDP_ABORTED; From patchwork Thu Jul 26 14:40:30 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Toshiaki Makita X-Patchwork-Id: 949760 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org 
Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="e7Pn+RL2"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 41bvt851lFz9ryl for ; Fri, 27 Jul 2018 00:41:20 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731411AbeGZP63 (ORCPT ); Thu, 26 Jul 2018 11:58:29 -0400 Received: from mail-pl0-f66.google.com ([209.85.160.66]:38389 "EHLO mail-pl0-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729385AbeGZP63 (ORCPT ); Thu, 26 Jul 2018 11:58:29 -0400 Received: by mail-pl0-f66.google.com with SMTP id b1-v6so923622pls.5 for ; Thu, 26 Jul 2018 07:41:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=UTbcJqmwq1G2q7bqKIpA/E/Lfq8X3SISlY+roGvkfuc=; b=e7Pn+RL2nzBi5jWneUVzt7pVoOtdqslUXwGuL1fi7DDYtUc8d0OtAfQiytt03TmMFP CaqFKSG8K/Hi1mRrXFDKvqP3RD8KuVU3gJtZpBbSuHUCfAJetJhm8oHyWZrFU/A7t2fA vi+eUqNaYCyHLmHrL9hjzovEx5JdCLtn5nRHkr+ukXFT34pJmG999PSOLFUo9rPsocV5 8513pd/u3poj1BJwMSUSdh/jjsaKuLJkxf9cbgVk3y7U16M+Fo6U5X1RtcK+r6uh0kyQ An3b42JTrh5Rqe6pepfn1qIOTup8bhaj9JG8FLuP9TEkgwG+gIgJcCzpUQYcgY6kq+xa E+hw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=UTbcJqmwq1G2q7bqKIpA/E/Lfq8X3SISlY+roGvkfuc=; b=TZjSUKXgB9bf6PARSYuvkc8hhnm4L90SCod8abVUbcUQBW8pSYxBS02voVBD7JOrTy sZ7G2TmqG2PtmBaK0Fe3kzueFYEJO9/3npON3c26Cmemhief0zmDOIp1DQo6ZQp29EWa 
sUxBuzMzjuebH41POkYnXMQqhPKO0gqYLfsqz+gCyrD7tiu6ArmqfoTem7qeTzSYwrmD LY1KYKBlTlS7Rhv7VxMEKuUcEyaHDgquDxh1btsVqhi3w0KvVPoesEnTC50HTelXz7Lq Unz2vqBBibLFtDk7PFE6qaIRQEWwBHsPFfsyPhdnwM3znGh1J5BVuWsF2vx6D/EzPV9/ 2HDA== X-Gm-Message-State: AOUpUlFtWNGTDHaUJcpjYcabwYep4TshQgo9w2TIIcfXRaokS/yfSnuN H6cABjOaml34YOL/MZxVdHIlgO98 X-Google-Smtp-Source: AAOMgpebmCjPmsoTTGUX5WGmmFCtt4l7dRVB0KsPFUqt1mRs3OypWWdS3QGaUrsYbZWan5vewSNJXA== X-Received: by 2002:a17:902:7e06:: with SMTP id b6-v6mr2246757plm.230.1532616078123; Thu, 26 Jul 2018 07:41:18 -0700 (PDT) Received: from localhost.localdomain (i153-145-22-9.s42.a013.ap.plala.or.jp. [153.145.22.9]) by smtp.gmail.com with ESMTPSA id p3-v6sm2649982pfo.130.2018.07.26.07.41.15 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Thu, 26 Jul 2018 07:41:17 -0700 (PDT) From: Toshiaki Makita To: netdev@vger.kernel.org, Alexei Starovoitov , Daniel Borkmann Cc: Toshiaki Makita , Jesper Dangaard Brouer , Jakub Kicinski Subject: [PATCH v5 bpf-next 7/9] xdp: Helpers for disabling napi_direct of xdp_return_frame Date: Thu, 26 Jul 2018 23:40:30 +0900 Message-Id: <20180726144032.2116-8-toshiaki.makita1@gmail.com> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20180726144032.2116-1-toshiaki.makita1@gmail.com> References: <20180726144032.2116-1-toshiaki.makita1@gmail.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Toshiaki Makita We need some mechanism to disable napi_direct when calling xdp_return_frame_rx_napi() from certain contexts. When veth gains support for XDP_REDIRECT, it will redirect packets which were themselves redirected from other devices. On redirection veth will reuse the xdp_mem_info of the redirection source device to make return_frame work. But in this case .ndo_xdp_xmit() called from veth redirection uses an xdp_mem_info which is not guarded by NAPI, because the .ndo_xdp_xmit() is not called directly from the rxq which owns the xdp_mem_info. 
This approach introduces a flag in bpf_redirect_info, together with helper functions to use it, to indicate that napi_direct should be disabled even when the _rx_napi variant is used. A NAPI handler that wants to use this flag needs to call xdp_set_return_frame_no_direct() before processing packets, and call xdp_clear_return_frame_no_direct() after xdp_do_flush_map() before exiting NAPI. v4: - Use bpf_redirect_info for storing the flag instead of xdp_mem_info to avoid per-frame copy cost. Signed-off-by: Toshiaki Makita --- include/linux/filter.h | 25 +++++++++++++++++++++++++ net/core/xdp.c | 6 ++++-- 2 files changed, 29 insertions(+), 2 deletions(-) diff --git a/include/linux/filter.h b/include/linux/filter.h index 4717af8b95e6..2b072dab32c0 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -543,10 +543,14 @@ struct bpf_redirect_info { struct bpf_map *map; struct bpf_map *map_to_flush; unsigned long map_owner; + u32 kern_flags; }; DECLARE_PER_CPU(struct bpf_redirect_info, bpf_redirect_info); +/* flags for bpf_redirect_info kern_flags */ +#define BPF_RI_F_RF_NO_DIRECT BIT(0) /* no napi_direct on return_frame */ + /* Compute the linear packet data range [data, data_end) which * will be accessed by various program types (cls_bpf, act_bpf, * lwt, ...). Subsystems allowing direct data access must (!) 
@@ -775,6 +779,27 @@ static inline bool bpf_dump_raw_ok(void) struct bpf_prog *bpf_patch_insn_single(struct bpf_prog *prog, u32 off, const struct bpf_insn *patch, u32 len); +static inline bool xdp_return_frame_no_direct(void) +{ + struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + + return ri->kern_flags & BPF_RI_F_RF_NO_DIRECT; +} + +static inline void xdp_set_return_frame_no_direct(void) +{ + struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + + ri->kern_flags |= BPF_RI_F_RF_NO_DIRECT; +} + +static inline void xdp_clear_return_frame_no_direct(void) +{ + struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + + ri->kern_flags &= ~BPF_RI_F_RF_NO_DIRECT; +} + static inline int xdp_ok_fwd_dev(const struct net_device *fwd, unsigned int pktlen) { diff --git a/net/core/xdp.c b/net/core/xdp.c index 57285383ed00..3dd99e1c04f5 100644 --- a/net/core/xdp.c +++ b/net/core/xdp.c @@ -330,10 +330,12 @@ static void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct, /* mem->id is valid, checked in xdp_rxq_info_reg_mem_model() */ xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params); page = virt_to_head_page(data); - if (xa) + if (xa) { + napi_direct &= !xdp_return_frame_no_direct(); page_pool_put_page(xa->page_pool, page, napi_direct); - else + } else { put_page(page); + } rcu_read_unlock(); break; case MEM_TYPE_PAGE_SHARED: From patchwork Thu Jul 26 14:40:31 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Toshiaki Makita X-Patchwork-Id: 949761 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none 
dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="LefLiRKX"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 41bvtC2wtMz9s0w for ; Fri, 27 Jul 2018 00:41:23 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731429AbeGZP6b (ORCPT ); Thu, 26 Jul 2018 11:58:31 -0400 Received: from mail-pl0-f66.google.com ([209.85.160.66]:38395 "EHLO mail-pl0-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729385AbeGZP6b (ORCPT ); Thu, 26 Jul 2018 11:58:31 -0400 Received: by mail-pl0-f66.google.com with SMTP id b1-v6so923684pls.5 for ; Thu, 26 Jul 2018 07:41:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=0ll2prJca9ZDOvA1VCYrDrE9RShIqEZVu3buZlRDjfQ=; b=LefLiRKXMQpaIOgAt8HHzF0AxLc0qT7TvfnW0Yj+Uw98t1rKZsgC4CRLXhLFuYfaR6 rDj1Q69NMFlgy3pM4B3YdcT/YYl2rmK5RWt/n/LN7bU06+cHbuc3FxKQ2yHC+l4UCTyY bjb1DRwpQPgE3zLSCq+u2qNrJl8nQxiZZakd/fvHiTnseBZlqmTKF/cUb+B79AQISgrB A7n4lYAPtXLQjIJlSz+Qu25sbzYKxSB+jX4AoHmqTCnpqQcKBwbJLc2fmN9xK328asvc w3Hb1to+umriNu2VvWIN2E/u8aeIlKccDmkptvUcyrUqIkGOMNLQbkGB7y0dIoEzmoMk TFLA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=0ll2prJca9ZDOvA1VCYrDrE9RShIqEZVu3buZlRDjfQ=; b=e1iISjiUIs+7hNdierUtXuQe0gyqPQ6Z+m6/X5t/ig6nTGbIYWANdyrqopydJwXvDL X5TgCKUQmsyaMkx6GSg2gq47shaQOSF1F2jsoqE3DKhLpck2b9nEXNkusDilh+6YiGt5 dJe3hn7zCysjBARO/xZmdr8SmXnpnhQifHe926SncYcQdXinHGol2Oyvk5q0m6V8K67E cBj+hxtKgcU844AMN6XzEpBD+jgoGNOvU01/phWQybfPfsc04iB+iNbeY89IdP9TZkQk uxGrj3nVcPuduNf3OVERjH+RvjLVGba7GGRyn8oae/xTTTRUp8AA3qBtPfnVST9NtpAU c5HQ== X-Gm-Message-State: AOUpUlFVkUnDb2u36xt8sonkrtZT1mImtaIxdC9qBarbhjiewiY/dBsJ 
KQVZ4lEHpMfF9fDRMkilZSnYtrnD X-Google-Smtp-Source: AAOMgpfAqoGmTcp2tcVHeqtrrDFE4Hbab++MYSMjYmbBYs1iFqQpuxmo0jZLPI8X593PpC+ugDBJmw== X-Received: by 2002:a17:902:d70d:: with SMTP id w13-v6mr2276833ply.40.1532616080823; Thu, 26 Jul 2018 07:41:20 -0700 (PDT) Received: from localhost.localdomain (i153-145-22-9.s42.a013.ap.plala.or.jp. [153.145.22.9]) by smtp.gmail.com with ESMTPSA id p3-v6sm2649982pfo.130.2018.07.26.07.41.18 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Thu, 26 Jul 2018 07:41:20 -0700 (PDT) From: Toshiaki Makita To: netdev@vger.kernel.org, Alexei Starovoitov , Daniel Borkmann Cc: Toshiaki Makita , Jesper Dangaard Brouer , Jakub Kicinski Subject: [PATCH v5 bpf-next 8/9] veth: Add XDP TX and REDIRECT Date: Thu, 26 Jul 2018 23:40:31 +0900 Message-Id: <20180726144032.2116-9-toshiaki.makita1@gmail.com> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20180726144032.2116-1-toshiaki.makita1@gmail.com> References: <20180726144032.2116-1-toshiaki.makita1@gmail.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Toshiaki Makita This allows further redirection of xdp_frames like NIC -> veth--veth -> veth--veth (XDP) (XDP) (XDP) The intermediate XDP, redirecting packets from NIC to the other veth, reuses xdp_mem_info from NIC so that page recycling of the NIC works on the destination veth's XDP. In this way return_frame is not fully guarded by NAPI, since another NAPI handler on another cpu may use the same xdp_mem_info concurrently. Thus disable napi_direct by xdp_set_return_frame_no_direct() during the NAPI context. v4: - Use xdp_[set|clear]_return_frame_no_direct() instead of a flag in xdp_mem_info. v3: - Fix double free when veth_xdp_tx() returns a positive value. - Convert xdp_xmit and xdp_redir variables into flags. 
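For illustration only, the two ideas in this description (collecting TX/REDIRECT outcomes as flags so each flush runs once per poll, and keeping napi_direct disabled for the whole poll) can be sketched in userspace C. This is a simplified sketch, not the driver code: the demo_* names are hypothetical, a plain global stands in for the per-CPU flag, and the flushes are modelled as counters.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the per-poll "what did we transmit" flags. */
#define DEMO_XDP_TX    (1u << 0)
#define DEMO_XDP_REDIR (1u << 1)

enum demo_verdict { DEMO_PASS, DEMO_TX, DEMO_REDIRECT };

/* Stand-in for the per-CPU no-direct flag; a plain global here. */
static bool demo_no_direct;
/* Counters standing in for the two flush operations. */
static int demo_tx_flushes, demo_redir_flushes;

/* Effective napi_direct for a frame return: forced off while the flag is set. */
static bool demo_effective_napi_direct(bool napi_direct)
{
	return napi_direct && !demo_no_direct;
}

/* One poll pass over n verdicts; returns the number of packets handled. */
static int demo_poll(const enum demo_verdict *verdicts, int n)
{
	unsigned int xdp_xmit = 0;
	int i;

	assert(n >= 0);
	demo_no_direct = true;		/* set before processing packets */

	for (i = 0; i < n; i++) {
		if (verdicts[i] == DEMO_TX)
			xdp_xmit |= DEMO_XDP_TX;
		else if (verdicts[i] == DEMO_REDIRECT)
			xdp_xmit |= DEMO_XDP_REDIR;
	}

	/* Flush each path at most once per poll, however many packets used it. */
	if (xdp_xmit & DEMO_XDP_TX)
		demo_tx_flushes++;
	if (xdp_xmit & DEMO_XDP_REDIR)
		demo_redir_flushes++;

	demo_no_direct = false;		/* clear only after the flushes */
	return n;
}
```

Clearing the flag only after both flushes matters because frames may still be returned while a flush is in progress.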
Signed-off-by: Toshiaki Makita --- drivers/net/veth.c | 119 +++++++++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 110 insertions(+), 9 deletions(-) diff --git a/drivers/net/veth.c b/drivers/net/veth.c index acdb1c543f4b..60397a8ea2e9 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -32,6 +32,10 @@ #define VETH_RING_SIZE 256 #define VETH_XDP_HEADROOM (XDP_PACKET_HEADROOM + NET_IP_ALIGN) +/* Separating two types of XDP xmit */ +#define VETH_XDP_TX BIT(0) +#define VETH_XDP_REDIR BIT(1) + struct pcpu_vstats { u64 packets; u64 bytes; @@ -45,6 +49,7 @@ struct veth_priv { struct bpf_prog *_xdp_prog; struct net_device __rcu *peer; atomic64_t dropped; + struct xdp_mem_info xdp_mem; unsigned requested_headroom; bool rx_notify_masked; struct ptr_ring xdp_ring; @@ -317,10 +322,42 @@ static int veth_xdp_xmit(struct net_device *dev, int n, return n - drops; } +static void veth_xdp_flush(struct net_device *dev) +{ + struct veth_priv *rcv_priv, *priv = netdev_priv(dev); + struct net_device *rcv; + + rcu_read_lock(); + rcv = rcu_dereference(priv->peer); + if (unlikely(!rcv)) + goto out; + + rcv_priv = netdev_priv(rcv); + /* xdp_ring is initialized on receive side? 
*/ + if (unlikely(!rcu_access_pointer(rcv_priv->xdp_prog))) + goto out; + + __veth_xdp_flush(rcv_priv); +out: + rcu_read_unlock(); +} + +static int veth_xdp_tx(struct net_device *dev, struct xdp_buff *xdp) +{ + struct xdp_frame *frame = convert_to_xdp_frame(xdp); + + if (unlikely(!frame)) + return -EOVERFLOW; + + return veth_xdp_xmit(dev, 1, &frame, 0); +} + static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv, - struct xdp_frame *frame) + struct xdp_frame *frame, + unsigned int *xdp_xmit) { int len = frame->len, delta = 0; + struct xdp_frame orig_frame; struct bpf_prog *xdp_prog; unsigned int headroom; struct sk_buff *skb; @@ -344,6 +381,29 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv, delta = frame->data - xdp.data; len = xdp.data_end - xdp.data; break; + case XDP_TX: + orig_frame = *frame; + xdp.data_hard_start = frame; + xdp.rxq->mem = frame->mem; + if (unlikely(veth_xdp_tx(priv->dev, &xdp) < 0)) { + trace_xdp_exception(priv->dev, xdp_prog, act); + frame = &orig_frame; + goto err_xdp; + } + *xdp_xmit |= VETH_XDP_TX; + rcu_read_unlock(); + goto xdp_xmit; + case XDP_REDIRECT: + orig_frame = *frame; + xdp.data_hard_start = frame; + xdp.rxq->mem = frame->mem; + if (xdp_do_redirect(priv->dev, &xdp, xdp_prog)) { + frame = &orig_frame; + goto err_xdp; + } + *xdp_xmit |= VETH_XDP_REDIR; + rcu_read_unlock(); + goto xdp_xmit; default: bpf_warn_invalid_xdp_action(act); case XDP_ABORTED: @@ -368,12 +428,13 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv, err_xdp: rcu_read_unlock(); xdp_return_frame(frame); - +xdp_xmit: return NULL; } static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv, - struct sk_buff *skb) + struct sk_buff *skb, + unsigned int *xdp_xmit) { u32 pktlen, headroom, act, metalen; void *orig_data, *orig_data_end; @@ -444,6 +505,26 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv, switch (act) { case XDP_PASS: break; + case XDP_TX: + get_page(virt_to_page(xdp.data)); + 
consume_skb(skb); + xdp.rxq->mem = priv->xdp_mem; + if (unlikely(veth_xdp_tx(priv->dev, &xdp) < 0)) { + trace_xdp_exception(priv->dev, xdp_prog, act); + goto err_xdp; + } + *xdp_xmit |= VETH_XDP_TX; + rcu_read_unlock(); + goto xdp_xmit; + case XDP_REDIRECT: + get_page(virt_to_page(xdp.data)); + consume_skb(skb); + xdp.rxq->mem = priv->xdp_mem; + if (xdp_do_redirect(priv->dev, &xdp, xdp_prog)) + goto err_xdp; + *xdp_xmit |= VETH_XDP_REDIR; + rcu_read_unlock(); + goto xdp_xmit; default: bpf_warn_invalid_xdp_action(act); case XDP_ABORTED: @@ -474,9 +555,15 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv, rcu_read_unlock(); kfree_skb(skb); return NULL; +err_xdp: + rcu_read_unlock(); + page_frag_free(xdp.data); +xdp_xmit: + return NULL; } -static int veth_xdp_rcv(struct veth_priv *priv, int budget) +static int veth_xdp_rcv(struct veth_priv *priv, int budget, + unsigned int *xdp_xmit) { int i, done = 0; @@ -487,10 +574,12 @@ static int veth_xdp_rcv(struct veth_priv *priv, int budget) if (!ptr) break; - if (veth_is_xdp_frame(ptr)) - skb = veth_xdp_rcv_one(priv, veth_ptr_to_xdp(ptr)); - else - skb = veth_xdp_rcv_skb(priv, ptr); + if (veth_is_xdp_frame(ptr)) { + skb = veth_xdp_rcv_one(priv, veth_ptr_to_xdp(ptr), + xdp_xmit); + } else { + skb = veth_xdp_rcv_skb(priv, ptr, xdp_xmit); + } if (skb) napi_gro_receive(&priv->xdp_napi, skb); @@ -505,9 +594,11 @@ static int veth_poll(struct napi_struct *napi, int budget) { struct veth_priv *priv = container_of(napi, struct veth_priv, xdp_napi); + unsigned int xdp_xmit = 0; int done; - done = veth_xdp_rcv(priv, budget); + xdp_set_return_frame_no_direct(); + done = veth_xdp_rcv(priv, budget, &xdp_xmit); if (done < budget && napi_complete_done(napi, done)) { /* Write rx_notify_masked before reading ptr_ring */ @@ -518,6 +609,12 @@ static int veth_poll(struct napi_struct *napi, int budget) } } + if (xdp_xmit & VETH_XDP_TX) + veth_xdp_flush(priv->dev); + if (xdp_xmit & VETH_XDP_REDIR) + xdp_do_flush_map(); + 
xdp_clear_return_frame_no_direct(); + return done; } @@ -564,6 +661,9 @@ static int veth_enable_xdp(struct net_device *dev) err = veth_napi_add(dev); if (err) goto err; + + /* Save original mem info as it can be overwritten */ + priv->xdp_mem = priv->xdp_rxq.mem; } rcu_assign_pointer(priv->xdp_prog, priv->_xdp_prog); @@ -581,6 +681,7 @@ static void veth_disable_xdp(struct net_device *dev) rcu_assign_pointer(priv->xdp_prog, NULL); veth_napi_del(dev); + priv->xdp_rxq.mem = priv->xdp_mem; xdp_rxq_info_unreg(&priv->xdp_rxq); } From patchwork Thu Jul 26 14:40:32 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Toshiaki Makita X-Patchwork-Id: 949762 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="u/ghAT/h"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 41bvtG2ktKz9ryl for ; Fri, 27 Jul 2018 00:41:26 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731472AbeGZP6e (ORCPT ); Thu, 26 Jul 2018 11:58:34 -0400 Received: from mail-pl0-f68.google.com ([209.85.160.68]:47001 "EHLO mail-pl0-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729385AbeGZP6e (ORCPT ); Thu, 26 Jul 2018 11:58:34 -0400 Received: by mail-pl0-f68.google.com with SMTP id t17-v6so916420ply.13 for ; Thu, 26 Jul 2018 07:41:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; 
s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=25jWibDkIkiuouCcXVYwucJgAzalMwAdlR7VAB5RAg8=; b=u/ghAT/hIOxiTIH3Q3/UOFUrvZ5A4KDYrG/vzW1/aexF+WpBiL6tzI8A/s79qVRzzE C9ls4ALVyJdQy/aYMTBnrXOR05dD8gwXV+1N4RDQdfqUJIsWJ/Ql3SDw6kR/ymgnUzZg R5UTJy0FworCQ/WryvfwdfyDP/IcP9Y00S/COHWKB8dwltIdFAWz0YFJnEvlS+bQzoX2 CodryDoMbNtIEil3aHpFbB7mpbwqvI4EnfGMD7xsX7JU6D3PJfmVug2aYTjAmIXDKVGv 1JbKfkWNJSWMK84GuoRX0VkFpSaZ0tH7IJ6BmTdacgBXctbAlRsj9u+t6lZ0mmh8y8DK nl9g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=25jWibDkIkiuouCcXVYwucJgAzalMwAdlR7VAB5RAg8=; b=A1BZQhqEEaFbTKeD3niHp1RA1FeqRLFTQIA04/hdx97LURQXuYEn7+Cb4/b5n5GZTb MVcOsdZD2diHBd1VTepzvgfSLaNejhuU/Vk0+B0V+5LFoMYn0awERCIrkwcm5cCRNYvD BZplQONbf3VDa37IuVuVSoE+FdYuoCaMvlnJtNX7XW4O9stIXxDHqaP4R7yzZm0VkyJN D8Djrh94chpCIcbXC4DpqGsHUXR8c2MQzu8a9OtZDs/bvAmy+Gq3KdQ2hw0Ayk+xhbO/ j9OlDpRI14W/3nsBhN3sR2m1+61UiRhfbkHcuIqK9MBU2i2jjQoCNariHUHMwDxaVlXq embw== X-Gm-Message-State: AOUpUlEZbaX20yKYVtv94zaawZ71O1O5W8AACKqHj6lbgLxIQ9IBVRyw FM1ep6qYF+7pHZ0vkwi1g4h2zlcF X-Google-Smtp-Source: AAOMgpebRlocS8acChEZGw7O2E/JEY4OtAQcwOBBq3VdTgbq2zfuEhki1fluryW1yYjoKVSHDpb2dg== X-Received: by 2002:a17:902:8a92:: with SMTP id p18-v6mr2290245plo.148.1532616083617; Thu, 26 Jul 2018 07:41:23 -0700 (PDT) Received: from localhost.localdomain (i153-145-22-9.s42.a013.ap.plala.or.jp. 
[153.145.22.9]) by smtp.gmail.com with ESMTPSA id p3-v6sm2649982pfo.130.2018.07.26.07.41.21 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Thu, 26 Jul 2018 07:41:22 -0700 (PDT)
From: Toshiaki Makita
To: netdev@vger.kernel.org, Alexei Starovoitov , Daniel Borkmann
Cc: Toshiaki Makita , Jesper Dangaard Brouer , Jakub Kicinski
Subject: [PATCH v5 bpf-next 9/9] veth: Support per queue XDP ring
Date: Thu, 26 Jul 2018 23:40:32 +0900
Message-Id: <20180726144032.2116-10-toshiaki.makita1@gmail.com>
X-Mailer: git-send-email 2.14.3
In-Reply-To: <20180726144032.2116-1-toshiaki.makita1@gmail.com>
References: <20180726144032.2116-1-toshiaki.makita1@gmail.com>
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

From: Toshiaki Makita

Move XDP and napi related fields in veth_priv to the newly created veth_rq structure.

When xdp_frames are enqueued from ndo_xdp_xmit and XDP_TX, the rxq is selected by the current CPU.

When skbs are enqueued from the peer device, the rxq is a one-to-one mapping of the peer's txq. This imposes the restriction that the number of rxqs must not be less than the number of peer txqs, but leaves open the possibility of bulk skb xmit in the future, because the txq lock would make it possible to remove the rxq ptr_ring lock.

v3:
- Add extack messages.
- Fix array overrun in veth_xmit.
Signed-off-by: Toshiaki Makita --- drivers/net/veth.c | 278 ++++++++++++++++++++++++++++++++++++----------------- 1 file changed, 188 insertions(+), 90 deletions(-) diff --git a/drivers/net/veth.c b/drivers/net/veth.c index 60397a8ea2e9..3059b897ecea 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -42,20 +42,24 @@ struct pcpu_vstats { struct u64_stats_sync syncp; }; -struct veth_priv { +struct veth_rq { struct napi_struct xdp_napi; struct net_device *dev; struct bpf_prog __rcu *xdp_prog; - struct bpf_prog *_xdp_prog; - struct net_device __rcu *peer; - atomic64_t dropped; struct xdp_mem_info xdp_mem; - unsigned requested_headroom; bool rx_notify_masked; struct ptr_ring xdp_ring; struct xdp_rxq_info xdp_rxq; }; +struct veth_priv { + struct net_device __rcu *peer; + atomic64_t dropped; + struct bpf_prog *_xdp_prog; + struct veth_rq *rq; + unsigned int requested_headroom; +}; + /* * ethtool interface */ @@ -144,19 +148,19 @@ static void veth_ptr_free(void *ptr) kfree_skb(ptr); } -static void __veth_xdp_flush(struct veth_priv *priv) +static void __veth_xdp_flush(struct veth_rq *rq) { /* Write ptr_ring before reading rx_notify_masked */ smp_mb(); - if (!priv->rx_notify_masked) { - priv->rx_notify_masked = true; - napi_schedule(&priv->xdp_napi); + if (!rq->rx_notify_masked) { + rq->rx_notify_masked = true; + napi_schedule(&rq->xdp_napi); } } -static int veth_xdp_rx(struct veth_priv *priv, struct sk_buff *skb) +static int veth_xdp_rx(struct veth_rq *rq, struct sk_buff *skb) { - if (unlikely(ptr_ring_produce(&priv->xdp_ring, skb))) { + if (unlikely(ptr_ring_produce(&rq->xdp_ring, skb))) { dev_kfree_skb_any(skb); return NET_RX_DROP; } @@ -164,21 +168,22 @@ static int veth_xdp_rx(struct veth_priv *priv, struct sk_buff *skb) return NET_RX_SUCCESS; } -static int veth_forward_skb(struct net_device *dev, struct sk_buff *skb, bool xdp) +static int veth_forward_skb(struct net_device *dev, struct sk_buff *skb, + struct veth_rq *rq, bool xdp) { - struct veth_priv *priv = 
netdev_priv(dev); - return __dev_forward_skb(dev, skb) ?: xdp ? - veth_xdp_rx(priv, skb) : + veth_xdp_rx(rq, skb) : netif_rx(skb); } static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev) { struct veth_priv *rcv_priv, *priv = netdev_priv(dev); + struct veth_rq *rq = NULL; struct net_device *rcv; int length = skb->len; bool rcv_xdp = false; + int rxq; rcu_read_lock(); rcv = rcu_dereference(priv->peer); @@ -188,9 +193,15 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev) } rcv_priv = netdev_priv(rcv); - rcv_xdp = rcu_access_pointer(rcv_priv->xdp_prog); + rxq = skb_get_queue_mapping(skb); + if (rxq < rcv->real_num_rx_queues) { + rq = &rcv_priv->rq[rxq]; + rcv_xdp = rcu_access_pointer(rq->xdp_prog); + if (rcv_xdp) + skb_record_rx_queue(skb, rxq); + } - if (likely(veth_forward_skb(rcv, skb, rcv_xdp) == NET_RX_SUCCESS)) { + if (likely(veth_forward_skb(rcv, skb, rq, rcv_xdp) == NET_RX_SUCCESS)) { struct pcpu_vstats *stats = this_cpu_ptr(dev->vstats); u64_stats_update_begin(&stats->syncp); @@ -203,7 +214,7 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev) } if (rcv_xdp) - __veth_xdp_flush(rcv_priv); + __veth_xdp_flush(rq); rcu_read_unlock(); @@ -278,12 +289,18 @@ static struct sk_buff *veth_build_skb(void *head, int headroom, int len, return skb; } +static int veth_select_rxq(struct net_device *dev) +{ + return smp_processor_id() % dev->real_num_rx_queues; +} + static int veth_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames, u32 flags) { struct veth_priv *rcv_priv, *priv = netdev_priv(dev); struct net_device *rcv; unsigned int max_len; + struct veth_rq *rq; int i, drops = 0; if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK)) @@ -294,30 +311,31 @@ static int veth_xdp_xmit(struct net_device *dev, int n, return -ENXIO; rcv_priv = netdev_priv(rcv); + rq = &rcv_priv->rq[veth_select_rxq(rcv)]; /* Non-NULL xdp_prog ensures that xdp_ring is initialized on receive * side. 
This means an XDP program is loaded on the peer and the peer * device is up. */ - if (!rcu_access_pointer(rcv_priv->xdp_prog)) + if (!rcu_access_pointer(rq->xdp_prog)) return -ENXIO; max_len = rcv->mtu + rcv->hard_header_len + VLAN_HLEN; - spin_lock(&rcv_priv->xdp_ring.producer_lock); + spin_lock(&rq->xdp_ring.producer_lock); for (i = 0; i < n; i++) { struct xdp_frame *frame = frames[i]; void *ptr = veth_xdp_to_ptr(frame); if (unlikely(frame->len > max_len || - __ptr_ring_produce(&rcv_priv->xdp_ring, ptr))) { + __ptr_ring_produce(&rq->xdp_ring, ptr))) { xdp_return_frame_rx_napi(frame); drops++; } } - spin_unlock(&rcv_priv->xdp_ring.producer_lock); + spin_unlock(&rq->xdp_ring.producer_lock); if (flags & XDP_XMIT_FLUSH) - __veth_xdp_flush(rcv_priv); + __veth_xdp_flush(rq); return n - drops; } @@ -326,6 +344,7 @@ static void veth_xdp_flush(struct net_device *dev) { struct veth_priv *rcv_priv, *priv = netdev_priv(dev); struct net_device *rcv; + struct veth_rq *rq; rcu_read_lock(); rcv = rcu_dereference(priv->peer); @@ -333,11 +352,12 @@ static void veth_xdp_flush(struct net_device *dev) goto out; rcv_priv = netdev_priv(rcv); + rq = &rcv_priv->rq[veth_select_rxq(rcv)]; /* xdp_ring is initialized on receive side? 
*/ - if (unlikely(!rcu_access_pointer(rcv_priv->xdp_prog))) + if (unlikely(!rcu_access_pointer(rq->xdp_prog))) goto out; - __veth_xdp_flush(rcv_priv); + __veth_xdp_flush(rq); out: rcu_read_unlock(); } @@ -352,7 +372,7 @@ static int veth_xdp_tx(struct net_device *dev, struct xdp_buff *xdp) return veth_xdp_xmit(dev, 1, &frame, 0); } -static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv, +static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq, struct xdp_frame *frame, unsigned int *xdp_xmit) { @@ -363,7 +383,7 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv, struct sk_buff *skb; rcu_read_lock(); - xdp_prog = rcu_dereference(priv->xdp_prog); + xdp_prog = rcu_dereference(rq->xdp_prog); if (likely(xdp_prog)) { struct xdp_buff xdp; u32 act; @@ -372,7 +392,7 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv, xdp.data = frame->data; xdp.data_end = frame->data + frame->len; xdp.data_meta = frame->data - frame->metasize; - xdp.rxq = &priv->xdp_rxq; + xdp.rxq = &rq->xdp_rxq; act = bpf_prog_run_xdp(xdp_prog, &xdp); @@ -385,8 +405,8 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv, orig_frame = *frame; xdp.data_hard_start = frame; xdp.rxq->mem = frame->mem; - if (unlikely(veth_xdp_tx(priv->dev, &xdp) < 0)) { - trace_xdp_exception(priv->dev, xdp_prog, act); + if (unlikely(veth_xdp_tx(rq->dev, &xdp) < 0)) { + trace_xdp_exception(rq->dev, xdp_prog, act); frame = &orig_frame; goto err_xdp; } @@ -397,7 +417,7 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv, orig_frame = *frame; xdp.data_hard_start = frame; xdp.rxq->mem = frame->mem; - if (xdp_do_redirect(priv->dev, &xdp, xdp_prog)) { + if (xdp_do_redirect(rq->dev, &xdp, xdp_prog)) { frame = &orig_frame; goto err_xdp; } @@ -407,7 +427,7 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv, default: bpf_warn_invalid_xdp_action(act); case XDP_ABORTED: - trace_xdp_exception(priv->dev, xdp_prog, act); + trace_xdp_exception(rq->dev, xdp_prog, 
act); case XDP_DROP: goto err_xdp; } @@ -422,7 +442,7 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv, } memset(frame, 0, sizeof(*frame)); - skb->protocol = eth_type_trans(skb, priv->dev); + skb->protocol = eth_type_trans(skb, rq->dev); err: return skb; err_xdp: @@ -432,8 +452,7 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv, return NULL; } -static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv, - struct sk_buff *skb, +static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq, struct sk_buff *skb, unsigned int *xdp_xmit) { u32 pktlen, headroom, act, metalen; @@ -443,7 +462,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv, struct xdp_buff xdp; rcu_read_lock(); - xdp_prog = rcu_dereference(priv->xdp_prog); + xdp_prog = rcu_dereference(rq->xdp_prog); if (unlikely(!xdp_prog)) { rcu_read_unlock(); goto out; @@ -496,7 +515,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv, xdp.data = skb_mac_header(skb); xdp.data_end = xdp.data + pktlen; xdp.data_meta = xdp.data; - xdp.rxq = &priv->xdp_rxq; + xdp.rxq = &rq->xdp_rxq; orig_data = xdp.data; orig_data_end = xdp.data_end; @@ -508,9 +527,9 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv, case XDP_TX: get_page(virt_to_page(xdp.data)); consume_skb(skb); - xdp.rxq->mem = priv->xdp_mem; - if (unlikely(veth_xdp_tx(priv->dev, &xdp) < 0)) { - trace_xdp_exception(priv->dev, xdp_prog, act); + xdp.rxq->mem = rq->xdp_mem; + if (unlikely(veth_xdp_tx(rq->dev, &xdp) < 0)) { + trace_xdp_exception(rq->dev, xdp_prog, act); goto err_xdp; } *xdp_xmit |= VETH_XDP_TX; @@ -519,8 +538,8 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv, case XDP_REDIRECT: get_page(virt_to_page(xdp.data)); consume_skb(skb); - xdp.rxq->mem = priv->xdp_mem; - if (xdp_do_redirect(priv->dev, &xdp, xdp_prog)) + xdp.rxq->mem = rq->xdp_mem; + if (xdp_do_redirect(rq->dev, &xdp, xdp_prog)) goto err_xdp; *xdp_xmit |= VETH_XDP_REDIR; rcu_read_unlock(); @@ -528,7 
+547,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv, default: bpf_warn_invalid_xdp_action(act); case XDP_ABORTED: - trace_xdp_exception(priv->dev, xdp_prog, act); + trace_xdp_exception(rq->dev, xdp_prog, act); case XDP_DROP: goto drop; } @@ -544,7 +563,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv, off = xdp.data_end - orig_data_end; if (off != 0) __skb_put(skb, off); - skb->protocol = eth_type_trans(skb, priv->dev); + skb->protocol = eth_type_trans(skb, rq->dev); metalen = xdp.data - xdp.data_meta; if (metalen) @@ -562,27 +581,26 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv, return NULL; } -static int veth_xdp_rcv(struct veth_priv *priv, int budget, - unsigned int *xdp_xmit) +static int veth_xdp_rcv(struct veth_rq *rq, int budget, unsigned int *xdp_xmit) { int i, done = 0; for (i = 0; i < budget; i++) { - void *ptr = __ptr_ring_consume(&priv->xdp_ring); + void *ptr = __ptr_ring_consume(&rq->xdp_ring); struct sk_buff *skb; if (!ptr) break; if (veth_is_xdp_frame(ptr)) { - skb = veth_xdp_rcv_one(priv, veth_ptr_to_xdp(ptr), + skb = veth_xdp_rcv_one(rq, veth_ptr_to_xdp(ptr), xdp_xmit); } else { - skb = veth_xdp_rcv_skb(priv, ptr, xdp_xmit); + skb = veth_xdp_rcv_skb(rq, ptr, xdp_xmit); } if (skb) - napi_gro_receive(&priv->xdp_napi, skb); + napi_gro_receive(&rq->xdp_napi, skb); done++; } @@ -592,25 +610,25 @@ static int veth_xdp_rcv(struct veth_priv *priv, int budget, static int veth_poll(struct napi_struct *napi, int budget) { - struct veth_priv *priv = - container_of(napi, struct veth_priv, xdp_napi); + struct veth_rq *rq = + container_of(napi, struct veth_rq, xdp_napi); unsigned int xdp_xmit = 0; int done; xdp_set_return_frame_no_direct(); - done = veth_xdp_rcv(priv, budget, &xdp_xmit); + done = veth_xdp_rcv(rq, budget, &xdp_xmit); if (done < budget && napi_complete_done(napi, done)) { /* Write rx_notify_masked before reading ptr_ring */ - smp_store_mb(priv->rx_notify_masked, false); - if 
(unlikely(!__ptr_ring_empty(&priv->xdp_ring))) { - priv->rx_notify_masked = true; - napi_schedule(&priv->xdp_napi); + smp_store_mb(rq->rx_notify_masked, false); + if (unlikely(!__ptr_ring_empty(&rq->xdp_ring))) { + rq->rx_notify_masked = true; + napi_schedule(&rq->xdp_napi); } } if (xdp_xmit & VETH_XDP_TX) - veth_xdp_flush(priv->dev); + veth_xdp_flush(rq->dev); if (xdp_xmit & VETH_XDP_REDIR) xdp_do_flush_map(); xdp_clear_return_frame_no_direct(); @@ -621,56 +639,90 @@ static int veth_poll(struct napi_struct *napi, int budget) static int veth_napi_add(struct net_device *dev) { struct veth_priv *priv = netdev_priv(dev); - int err; + int err, i; - err = ptr_ring_init(&priv->xdp_ring, VETH_RING_SIZE, GFP_KERNEL); - if (err) - return err; + for (i = 0; i < dev->real_num_rx_queues; i++) { + struct veth_rq *rq = &priv->rq[i]; + + err = ptr_ring_init(&rq->xdp_ring, VETH_RING_SIZE, GFP_KERNEL); + if (err) + goto err_xdp_ring; + } - netif_napi_add(dev, &priv->xdp_napi, veth_poll, NAPI_POLL_WEIGHT); - napi_enable(&priv->xdp_napi); + for (i = 0; i < dev->real_num_rx_queues; i++) { + struct veth_rq *rq = &priv->rq[i]; + + netif_napi_add(dev, &rq->xdp_napi, veth_poll, NAPI_POLL_WEIGHT); + napi_enable(&rq->xdp_napi); + } return 0; +err_xdp_ring: + for (i--; i >= 0; i--) + ptr_ring_cleanup(&priv->rq[i].xdp_ring, veth_ptr_free); + + return err; } static void veth_napi_del(struct net_device *dev) { struct veth_priv *priv = netdev_priv(dev); + int i; - napi_disable(&priv->xdp_napi); - netif_napi_del(&priv->xdp_napi); - priv->rx_notify_masked = false; - ptr_ring_cleanup(&priv->xdp_ring, veth_ptr_free); + for (i = 0; i < dev->real_num_rx_queues; i++) { + struct veth_rq *rq = &priv->rq[i]; + + napi_disable(&rq->xdp_napi); + napi_hash_del(&rq->xdp_napi); + } + synchronize_net(); + + for (i = 0; i < dev->real_num_rx_queues; i++) { + struct veth_rq *rq = &priv->rq[i]; + + netif_napi_del(&rq->xdp_napi); + rq->rx_notify_masked = false; + ptr_ring_cleanup(&rq->xdp_ring, veth_ptr_free); + } } 
static int veth_enable_xdp(struct net_device *dev) { struct veth_priv *priv = netdev_priv(dev); - int err; + int err, i; - if (!xdp_rxq_info_is_reg(&priv->xdp_rxq)) { - err = xdp_rxq_info_reg(&priv->xdp_rxq, dev, 0); - if (err < 0) - return err; + if (!xdp_rxq_info_is_reg(&priv->rq[0].xdp_rxq)) { + for (i = 0; i < dev->real_num_rx_queues; i++) { + struct veth_rq *rq = &priv->rq[i]; - err = xdp_rxq_info_reg_mem_model(&priv->xdp_rxq, - MEM_TYPE_PAGE_SHARED, NULL); - if (err < 0) - goto err; + err = xdp_rxq_info_reg(&rq->xdp_rxq, dev, i); + if (err < 0) + goto err_rxq_reg; + + err = xdp_rxq_info_reg_mem_model(&rq->xdp_rxq, + MEM_TYPE_PAGE_SHARED, + NULL); + if (err < 0) + goto err_reg_mem; + + /* Save original mem info as it can be overwritten */ + rq->xdp_mem = rq->xdp_rxq.mem; + } err = veth_napi_add(dev); if (err) - goto err; - - /* Save original mem info as it can be overwritten */ - priv->xdp_mem = priv->xdp_rxq.mem; + goto err_rxq_reg; } - rcu_assign_pointer(priv->xdp_prog, priv->_xdp_prog); + for (i = 0; i < dev->real_num_rx_queues; i++) + rcu_assign_pointer(priv->rq[i].xdp_prog, priv->_xdp_prog); return 0; -err: - xdp_rxq_info_unreg(&priv->xdp_rxq); +err_reg_mem: + xdp_rxq_info_unreg(&priv->rq[i].xdp_rxq); +err_rxq_reg: + for (i--; i >= 0; i--) + xdp_rxq_info_unreg(&priv->rq[i].xdp_rxq); return err; } @@ -678,11 +730,17 @@ static int veth_enable_xdp(struct net_device *dev) static void veth_disable_xdp(struct net_device *dev) { struct veth_priv *priv = netdev_priv(dev); + int i; - rcu_assign_pointer(priv->xdp_prog, NULL); + for (i = 0; i < dev->real_num_rx_queues; i++) + rcu_assign_pointer(priv->rq[i].xdp_prog, NULL); veth_napi_del(dev); - priv->xdp_rxq.mem = priv->xdp_mem; - xdp_rxq_info_unreg(&priv->xdp_rxq); + for (i = 0; i < dev->real_num_rx_queues; i++) { + struct veth_rq *rq = &priv->rq[i]; + + rq->xdp_rxq.mem = rq->xdp_mem; + xdp_rxq_info_unreg(&rq->xdp_rxq); + } } static int veth_open(struct net_device *dev) @@ -839,6 +897,12 @@ static int 
veth_xdp_set(struct net_device *dev, struct bpf_prog *prog, goto err; } + if (dev->real_num_rx_queues < peer->real_num_tx_queues) { + NL_SET_ERR_MSG_MOD(extack, "XDP expects number of rx queues not less than peer tx queues"); + err = -ENOSPC; + goto err; + } + if (dev->flags & IFF_UP) { err = veth_enable_xdp(dev); if (err) { @@ -973,13 +1037,31 @@ static int veth_validate(struct nlattr *tb[], struct nlattr *data[], return 0; } +static int veth_alloc_queues(struct net_device *dev) +{ + struct veth_priv *priv = netdev_priv(dev); + + priv->rq = kcalloc(dev->num_rx_queues, sizeof(*priv->rq), GFP_KERNEL); + if (!priv->rq) + return -ENOMEM; + + return 0; +} + +static void veth_free_queues(struct net_device *dev) +{ + struct veth_priv *priv = netdev_priv(dev); + + kfree(priv->rq); +} + static struct rtnl_link_ops veth_link_ops; static int veth_newlink(struct net *src_net, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { - int err; + int err, i; struct net_device *peer; struct veth_priv *priv; char ifname[IFNAMSIZ]; @@ -1032,6 +1114,12 @@ static int veth_newlink(struct net *src_net, struct net_device *dev, return PTR_ERR(peer); } + err = veth_alloc_queues(peer); + if (err) { + put_net(net); + goto err_peer_alloc_queues; + } + if (!ifmp || !tbp[IFLA_ADDRESS]) eth_hw_addr_random(peer); @@ -1060,6 +1148,10 @@ static int veth_newlink(struct net *src_net, struct net_device *dev, * should be re-allocated */ + err = veth_alloc_queues(dev); + if (err) + goto err_alloc_queues; + if (tb[IFLA_ADDRESS] == NULL) eth_hw_addr_random(dev); @@ -1079,22 +1171,28 @@ static int veth_newlink(struct net *src_net, struct net_device *dev, */ priv = netdev_priv(dev); - priv->dev = dev; + for (i = 0; i < dev->real_num_rx_queues; i++) + priv->rq[i].dev = dev; rcu_assign_pointer(priv->peer, peer); priv = netdev_priv(peer); - priv->dev = peer; + for (i = 0; i < peer->real_num_rx_queues; i++) + priv->rq[i].dev = peer; rcu_assign_pointer(priv->peer, 
dev); return 0; err_register_dev: + veth_free_queues(dev); +err_alloc_queues: /* nothing to do */ err_configure_peer: unregister_netdevice(peer); return err; err_register_peer: + veth_free_queues(peer); +err_peer_alloc_queues: free_netdev(peer); return err; }
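
The rx queue selection described in the 9/9 changelog can be illustrated with a small userspace sketch. This is not kernel code: struct fake_dev and the three helper names are hypothetical stand-ins introduced only for illustration, and the cpu argument takes the place of smp_processor_id(). The sketch only mirrors the arithmetic of veth_select_rxq(), the bounds check in veth_xmit(), and the queue-count check added to veth_xdp_set().

```c
#include <assert.h>

/* Hypothetical stand-in for the two net_device fields the sketch needs. */
struct fake_dev {
	unsigned int real_num_rx_queues;
	unsigned int real_num_tx_queues;
};

/*
 * XDP path (ndo_xdp_xmit and XDP_TX): the peer rxq is chosen from the
 * current CPU, as veth_select_rxq() does in the patch. The cpu value is
 * passed in explicitly because smp_processor_id() is kernel-only.
 */
static unsigned int select_rxq_by_cpu(const struct fake_dev *rcv,
				      unsigned int cpu)
{
	assert(rcv->real_num_rx_queues > 0);
	return cpu % rcv->real_num_rx_queues;
}

/*
 * skb path: the sender's txq index maps one-to-one to the peer rxq index.
 * Returns -1 when the peer has no such rxq, which corresponds to the
 * case where veth_xmit() leaves rq as NULL and skips the XDP ring.
 */
static int select_rxq_by_txq(const struct fake_dev *rcv, unsigned int txq)
{
	return txq < rcv->real_num_rx_queues ? (int)txq : -1;
}

/*
 * Restriction checked when attaching an XDP program: the device needs at
 * least as many rx queues as its peer has tx queues. Returns 1 if the
 * configuration is acceptable, 0 otherwise.
 */
static int xdp_queue_check(const struct fake_dev *dev,
			   const struct fake_dev *peer)
{
	return dev->real_num_rx_queues >= peer->real_num_tx_queues;
}
```

With 4 rx queues on the receiving side, CPU 6 would land on rxq 2 in the XDP path, while an skb sent on txq 4 would find no matching rxq. That mismatch is the situation the new check in veth_xdp_set() rejects up front.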