Message ID:    1558609008-2590-2-git-send-email-makita.toshiaki@lab.ntt.co.jp
State:         Changes Requested
Delegated to:  BPF Maintainers
Series:        veth: Bulk XDP_TX
Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> writes:

> XDP_TX is similar to XDP_REDIRECT as it essentially redirects packets to
> the device itself. XDP_REDIRECT has bulk transmit mechanism to avoid the
> heavy cost of indirect call but it also reduces lock acquisition on the
> destination device that needs locks like veth and tun.
>
> XDP_TX does not use indirect calls but drivers which require locks can
> benefit from the bulk transmit for XDP_TX as well.

XDP_TX happens on the same device, so there's an implicit bulking
happening because of the NAPI cycle. So why is an additional mechanism
needed (in the general case)?

-Toke
On 2019/05/23 20:11, Toke Høiland-Jørgensen wrote:
> Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> writes:
>
>> XDP_TX is similar to XDP_REDIRECT as it essentially redirects packets to
>> the device itself. XDP_REDIRECT has bulk transmit mechanism to avoid the
>> heavy cost of indirect call but it also reduces lock acquisition on the
>> destination device that needs locks like veth and tun.
>>
>> XDP_TX does not use indirect calls but drivers which require locks can
>> benefit from the bulk transmit for XDP_TX as well.
>
> XDP_TX happens on the same device, so there's an implicit bulking
> happening because of the NAPI cycle. So why is an additional mechanism
> needed (in the general case)?

Not sure what the implicit bulking you mention is. XDP_TX calls
.ndo_xdp_xmit() for each packet, and it acquires a lock in veth and
tun. To avoid this, we need additional storage for bulking like devmap
for XDP_REDIRECT.
Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> writes:

> On 2019/05/23 20:11, Toke Høiland-Jørgensen wrote:
>> Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> writes:
>>
>>> XDP_TX is similar to XDP_REDIRECT as it essentially redirects packets to
>>> the device itself. XDP_REDIRECT has bulk transmit mechanism to avoid the
>>> heavy cost of indirect call but it also reduces lock acquisition on the
>>> destination device that needs locks like veth and tun.
>>>
>>> XDP_TX does not use indirect calls but drivers which require locks can
>>> benefit from the bulk transmit for XDP_TX as well.
>>
>> XDP_TX happens on the same device, so there's an implicit bulking
>> happening because of the NAPI cycle. So why is an additional mechanism
>> needed (in the general case)?
>
> Not sure what the implicit bulking you mention is. XDP_TX calls
> .ndo_xdp_xmit() for each packet, and it acquires a lock in veth and
> tun. To avoid this, we need additional storage for bulking like devmap
> for XDP_REDIRECT.

The bulking is in veth_poll(), where veth_xdp_flush() is only called at
the end. But see my other reply to the veth.c patch for the lock
contention issue...

-Toke
diff --git a/include/net/xdp.h b/include/net/xdp.h
index 0f25b36..30b36c8 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -84,6 +84,13 @@ struct xdp_frame {
 	struct net_device *dev_rx; /* used by cpumap */
 };
 
+#define XDP_TX_BULK_SIZE 16
+struct xdp_tx_bulk_queue {
+	struct xdp_frame *q[XDP_TX_BULK_SIZE];
+	unsigned int count;
+};
+DECLARE_PER_CPU(struct xdp_tx_bulk_queue, xdp_tx_bq);
+
 /* Clear kernel pointers in xdp_frame */
 static inline void xdp_scrub_frame(struct xdp_frame *frame)
 {
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 4b2b194..0622f2d 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -40,6 +40,9 @@ struct xdp_mem_allocator {
 	struct rcu_head rcu;
 };
 
+DEFINE_PER_CPU(struct xdp_tx_bulk_queue, xdp_tx_bq);
+EXPORT_PER_CPU_SYMBOL_GPL(xdp_tx_bq);
+
 static u32 xdp_mem_id_hashfn(const void *data, u32 len, u32 seed)
 {
 	const u32 *k = data;
XDP_TX is similar to XDP_REDIRECT as it essentially redirects packets to
the device itself. XDP_REDIRECT has bulk transmit mechanism to avoid the
heavy cost of indirect call but it also reduces lock acquisition on the
destination device that needs locks like veth and tun.

XDP_TX does not use indirect calls but drivers which require locks can
benefit from the bulk transmit for XDP_TX as well.

This patch adds per-cpu queues which can be used for bulk transmit on
XDP_TX. I did not add functions like enqueue/flush but exposed the queue
directly because we should avoid indirect calls on XDP_TX.

Note that the queue must be flushed, i.e. "count" member needs to be set
to 0, when a NAPI handler which used this queue exits. Otherwise packets
left in the queue will be transmitted from totally unintentional devices.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
---
 include/net/xdp.h | 7 +++++++
 net/core/xdp.c    | 3 +++
 2 files changed, 10 insertions(+)