From patchwork Sun Jul 22 15:13:07 2018
X-Patchwork-Submitter: Toshiaki Makita
X-Patchwork-Id: 947480
X-Patchwork-Delegate: bpf@iogearbox.net
From: Toshiaki Makita
To: netdev@vger.kernel.org, Alexei Starovoitov, Daniel Borkmann
Cc: Toshiaki Makita, Jesper Dangaard Brouer
Subject: [PATCH v3 bpf-next 7/8] veth: Add XDP TX and REDIRECT
Date: Mon, 23 Jul 2018 00:13:07 +0900
Message-Id: <20180722151308.5480-8-toshiaki.makita1@gmail.com>
In-Reply-To: <20180722151308.5480-1-toshiaki.makita1@gmail.com>
References: <20180722151308.5480-1-toshiaki.makita1@gmail.com>

This allows further redirection of xdp_frames like

 NIC   -> veth--veth -> veth--veth
 (XDP)          (XDP)         (XDP)

The intermediate XDP, redirecting packets from NIC to the other veth,
reuses xdp_mem_info from NIC so that page recycling of the NIC works on
the destination veth's XDP.
In this way return_frame is not fully guarded by NAPI, since another
NAPI handler on another cpu may use the same xdp_mem_info concurrently.
Thus disable napi_direct by XDP_MEM_RF_NO_DIRECT flag.

v3:
- Fix double free when veth_xdp_tx() returns a positive value.
- Convert xdp_xmit and xdp_redir variables into flags.

Signed-off-by: Toshiaki Makita
---
 drivers/net/veth.c | 119 +++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 110 insertions(+), 9 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 57187e955fea..0323a4ca74e2 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -32,6 +32,10 @@
 #define VETH_RING_SIZE		256
 #define VETH_XDP_HEADROOM	(XDP_PACKET_HEADROOM + NET_IP_ALIGN)
 
+/* Separating two types of XDP xmit */
+#define VETH_XDP_TX		BIT(0)
+#define VETH_XDP_REDIR		BIT(1)
+
 struct pcpu_vstats {
 	u64			packets;
 	u64			bytes;
@@ -45,6 +49,7 @@ struct veth_priv {
 	struct bpf_prog		*_xdp_prog;
 	struct net_device __rcu	*peer;
 	atomic64_t		dropped;
+	struct xdp_mem_info	xdp_mem;
 	unsigned		requested_headroom;
 	bool			rx_notify_masked;
 	struct ptr_ring		xdp_ring;
@@ -311,10 +316,42 @@ static int veth_xdp_xmit(struct net_device *dev, int n,
 	return n - drops;
 }
 
+static void veth_xdp_flush(struct net_device *dev)
+{
+	struct veth_priv *rcv_priv, *priv = netdev_priv(dev);
+	struct net_device *rcv;
+
+	rcu_read_lock();
+	rcv = rcu_dereference(priv->peer);
+	if (unlikely(!rcv))
+		goto out;
+
+	rcv_priv = netdev_priv(rcv);
+	/* xdp_ring is initialized on receive side? */
+	if (unlikely(!rcu_access_pointer(rcv_priv->xdp_prog)))
+		goto out;
+
+	__veth_xdp_flush(rcv_priv);
+out:
+	rcu_read_unlock();
+}
+
+static int veth_xdp_tx(struct net_device *dev, struct xdp_buff *xdp)
+{
+	struct xdp_frame *frame = convert_to_xdp_frame(xdp);
+
+	if (unlikely(!frame))
+		return -EOVERFLOW;
+
+	return veth_xdp_xmit(dev, 1, &frame, 0);
+}
+
 static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv,
-					struct xdp_frame *frame)
+					struct xdp_frame *frame,
+					unsigned int *xdp_xmit)
 {
 	int len = frame->len, delta = 0;
+	struct xdp_frame orig_frame;
 	struct bpf_prog *xdp_prog;
 	unsigned int headroom;
 	struct sk_buff *skb;
@@ -338,6 +375,31 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv,
 			delta = frame->data - xdp.data;
 			len = xdp.data_end - xdp.data;
 			break;
+		case XDP_TX:
+			orig_frame = *frame;
+			xdp.data_hard_start = frame;
+			xdp.rxq->mem = frame->mem;
+			xdp.rxq->mem.flags |= XDP_MEM_RF_NO_DIRECT;
+			if (unlikely(veth_xdp_tx(priv->dev, &xdp) < 0)) {
+				trace_xdp_exception(priv->dev, xdp_prog, act);
+				frame = &orig_frame;
+				goto err_xdp;
+			}
+			*xdp_xmit |= VETH_XDP_TX;
+			rcu_read_unlock();
+			goto xdp_xmit;
+		case XDP_REDIRECT:
+			orig_frame = *frame;
+			xdp.data_hard_start = frame;
+			xdp.rxq->mem = frame->mem;
+			xdp.rxq->mem.flags |= XDP_MEM_RF_NO_DIRECT;
+			if (xdp_do_redirect(priv->dev, &xdp, xdp_prog)) {
+				frame = &orig_frame;
+				goto err_xdp;
+			}
+			*xdp_xmit |= VETH_XDP_REDIR;
+			rcu_read_unlock();
+			goto xdp_xmit;
 		default:
 			bpf_warn_invalid_xdp_action(act);
 		case XDP_ABORTED:
@@ -362,12 +424,13 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv,
 err_xdp:
 	rcu_read_unlock();
 	xdp_return_frame(frame);
-
+xdp_xmit:
 	return NULL;
 }
 
 static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv,
-					struct sk_buff *skb)
+					struct sk_buff *skb,
+					unsigned int *xdp_xmit)
 {
 	u32 pktlen, headroom, act, metalen;
 	void *orig_data, *orig_data_end;
@@ -438,6 +501,26 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv,
 	switch (act) {
 	case XDP_PASS:
 		break;
+	case XDP_TX:
+		get_page(virt_to_page(xdp.data));
+		consume_skb(skb);
+		xdp.rxq->mem = priv->xdp_mem;
+		if (unlikely(veth_xdp_tx(priv->dev, &xdp) < 0)) {
+			trace_xdp_exception(priv->dev, xdp_prog, act);
+			goto err_xdp;
+		}
+		*xdp_xmit |= VETH_XDP_TX;
+		rcu_read_unlock();
+		goto xdp_xmit;
+	case XDP_REDIRECT:
+		get_page(virt_to_page(xdp.data));
+		consume_skb(skb);
+		xdp.rxq->mem = priv->xdp_mem;
+		if (xdp_do_redirect(priv->dev, &xdp, xdp_prog))
+			goto err_xdp;
+		*xdp_xmit |= VETH_XDP_REDIR;
+		rcu_read_unlock();
+		goto xdp_xmit;
 	default:
 		bpf_warn_invalid_xdp_action(act);
 	case XDP_ABORTED:
@@ -468,9 +551,15 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv,
 	rcu_read_unlock();
 	kfree_skb(skb);
 	return NULL;
+err_xdp:
+	rcu_read_unlock();
+	page_frag_free(xdp.data);
+xdp_xmit:
+	return NULL;
 }
 
-static int veth_xdp_rcv(struct veth_priv *priv, int budget)
+static int veth_xdp_rcv(struct veth_priv *priv, int budget,
+			unsigned int *xdp_xmit)
 {
 	int i, done = 0;
 
@@ -481,10 +570,12 @@ static int veth_xdp_rcv(struct veth_priv *priv, int budget)
 		if (!ptr)
 			break;
 
-		if (veth_is_xdp_frame(ptr))
-			skb = veth_xdp_rcv_one(priv, veth_ptr_to_xdp(ptr));
-		else
-			skb = veth_xdp_rcv_skb(priv, ptr);
+		if (veth_is_xdp_frame(ptr)) {
+			skb = veth_xdp_rcv_one(priv, veth_ptr_to_xdp(ptr),
+					       xdp_xmit);
+		} else {
+			skb = veth_xdp_rcv_skb(priv, ptr, xdp_xmit);
+		}
 
 		if (skb)
 			napi_gro_receive(&priv->xdp_napi, skb);
@@ -499,9 +590,10 @@ static int veth_poll(struct napi_struct *napi, int budget)
 {
 	struct veth_priv *priv =
 		container_of(napi, struct veth_priv, xdp_napi);
+	unsigned int xdp_xmit = 0;
 	int done;
 
-	done = veth_xdp_rcv(priv, budget);
+	done = veth_xdp_rcv(priv, budget, &xdp_xmit);
 
 	if (done < budget && napi_complete_done(napi, done)) {
 		/* Write rx_notify_masked before reading ptr_ring */
@@ -512,6 +604,11 @@ static int veth_poll(struct napi_struct *napi, int budget)
 		}
 	}
 
+	if (xdp_xmit & VETH_XDP_TX)
+		veth_xdp_flush(priv->dev);
+	if (xdp_xmit & VETH_XDP_REDIR)
+		xdp_do_flush_map();
+
 	return done;
 }
 
@@ -558,6 +655,9 @@ static int veth_enable_xdp(struct net_device *dev)
 		err = veth_napi_add(dev);
 		if (err)
 			goto err;
+
+		/* Save original mem info as it can be overwritten */
+		priv->xdp_mem = priv->xdp_rxq.mem;
 	}
 
 	rcu_assign_pointer(priv->xdp_prog, priv->_xdp_prog);
@@ -575,6 +675,7 @@ static void veth_disable_xdp(struct net_device *dev)
 
 	rcu_assign_pointer(priv->xdp_prog, NULL);
 	veth_napi_del(dev);
+	priv->xdp_rxq.mem = priv->xdp_mem;
 	xdp_rxq_info_unreg(&priv->xdp_rxq);
 }
 
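
Note for readers: the xdp_xmit flag handling in veth_poll() can be
illustrated outside the kernel. What follows is only a minimal userspace
sketch, not kernel code and not part of the patch. The two flag values
mirror the patch, but enum verdict, rcv_one(), poll_batch() and the two
counters are hypothetical stand-ins invented for illustration. It shows
each packet OR-ing a bit into a per-poll word, so that each kind of flush
runs at most once per poll regardless of how many packets took that path.

```c
#include <assert.h>

/* Same bit layout as the patch; BIT() is spelled out for userspace. */
#define VETH_XDP_TX	(1U << 0)
#define VETH_XDP_REDIR	(1U << 1)

/* Hypothetical stand-ins for the XDP program's verdicts. */
enum verdict { V_PASS, V_TX, V_REDIRECT };

/* Hypothetical counters standing in for the two flush calls. */
static int tx_flushes;
static int redir_flushes;

/* Handle one packet and record which transmit path it queued work on. */
static void rcv_one(enum verdict v, unsigned int *xdp_xmit)
{
	switch (v) {
	case V_TX:
		*xdp_xmit |= VETH_XDP_TX;
		break;
	case V_REDIRECT:
		*xdp_xmit |= VETH_XDP_REDIR;
		break;
	default:
		break;
	}
}

/* One poll: process the batch, then flush each path at most once. */
static int poll_batch(const enum verdict *pkts, int n)
{
	unsigned int xdp_xmit = 0;
	int i;

	assert(n >= 0);

	for (i = 0; i < n; i++)
		rcv_one(pkts[i], &xdp_xmit);

	if (xdp_xmit & VETH_XDP_TX)
		tx_flushes++;
	if (xdp_xmit & VETH_XDP_REDIR)
		redir_flushes++;

	return n;
}
```

Calling poll_batch() on a batch that contains several V_TX verdicts and
one V_REDIRECT verdict increments each counter exactly once, and a batch
of only V_PASS verdicts increments neither.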