Message ID:   1408887738-7661-4-git-send-email-dborkman@redhat.com
State:        RFC, archived
Delegated to: David Miller
From: Daniel Borkmann <dborkman@redhat.com>
Date: Sun, 24 Aug 2014 15:42:18 +0200

> This adds a first use-case of deferred tail pointer flushing
> for AF_PACKET's TX_RING in QDISC_BYPASS mode.
>
> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>

If last_queue changes, you'll need to force a flush, does that
end up happening with your changes here?  I really couldn't
tell for sure.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
On 08/25/2014 07:57 AM, David Miller wrote:
> From: Daniel Borkmann <dborkman@redhat.com>
> Date: Sun, 24 Aug 2014 15:42:18 +0200
>
>> This adds a first use-case of deferred tail pointer flushing
>> for AF_PACKET's TX_RING in QDISC_BYPASS mode.
>>
>> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
>
> If last_queue changes, you'll need to force a flush, does that
> end up happening with your changes here?  I really couldn't
> tell for sure.

Yes indeed, I noticed that later on as well. :) I will fix that
up and resubmit the series, thanks.
On Sun, 24 Aug 2014 15:42:18 +0200
Daniel Borkmann <dborkman@redhat.com> wrote:

> This adds a first use-case of deferred tail pointer flushing
> for AF_PACKET's TX_RING in QDISC_BYPASS mode.

Testing with trafgen. I've updated patch 1/3 to NOT call mmiowb()
during this testing; see why in my other post.

trafgen cmdline:
 trafgen --cpp --dev eth5 --conf udp_example01.trafgen -V --cpus 1
 * Only use 1 CPU
 * default is mmap
 * default is QDISC_BYPASS mode

BASELINE (no patches): trafgen QDISC_BYPASS and mmap:
 - tx:1562539 pps

With PACKET_FLUSH_THRESH=8, and QDISC_BYPASS and mmap:
 - tx:1683746 pps

Improvement:
 + 121207 pps
 - 46 ns (1/1562539*10^9)-(1/1683746*10^9)

This is a significant improvement! :-)
On Mon, 25 Aug 2014 15:54:02 +0200
Jesper Dangaard Brouer <brouer@redhat.com> wrote:

> On Sun, 24 Aug 2014 15:42:18 +0200
> Daniel Borkmann <dborkman@redhat.com> wrote:
>
> > This adds a first use-case of deferred tail pointer flushing
> > for AF_PACKET's TX_RING in QDISC_BYPASS mode.
>
> Testing with trafgen. I've updated patch 1/3 to NOT call mmiowb()
> during this testing; see why in my other post.
>
> trafgen cmdline:
>  trafgen --cpp --dev eth5 --conf udp_example01.trafgen -V --cpus 1
>  * Only use 1 CPU
>  * default is mmap
>  * default is QDISC_BYPASS mode
>
> BASELINE (no patches): trafgen QDISC_BYPASS and mmap:
>  - tx:1562539 pps
>
> With PACKET_FLUSH_THRESH=8, and QDISC_BYPASS and mmap:
>  - tx:1683746 pps
>
> Improvement:
>  + 121207 pps
>  - 46 ns (1/1562539*10^9)-(1/1683746*10^9)
>
> This is a significant improvement! :-)

I'm unfortunately seeing a regression if I'm NOT bypassing the qdisc
layer but still use mmap. Trafgen has an option --qdisc-path for this.
(I believe most other solutions don't set the QDISC_BYPASS socket
option.)

trafgen command:
 # trafgen --cpp --dev eth5 --conf udp_example01.trafgen -V --qdisc-path --cpus 1
 * still use mmap
 * choose normal qdisc code path via --qdisc-path

BASELINE (no patches): trafgen using --qdisc-path and mmap:
 - tx:1371307 pps

(Patched): trafgen using --qdisc-path and mmap:
 - tx:1345999 pps

Regression:
 * 25308 pps slower than before
 * 13.71 nanosec slower (1/1345999*10^9)-(1/1371307*10^9)

How can we explain this?!?

As can be deduced from the baseline numbers, the cost of the qdisc
path is fairly high, at 89.24 ns ((1/1371307*10^9)-(1/1562539*10^9)).
(This is a bit higher than I expected based on my data from:
 http://people.netfilter.org/hawk/presentations/nfws2014/dp-accel-qdisc-lockless.pdf
 where I measured it to be 60ns.)

(Does this make sense?): The results above say we can save 46ns by
delaying tailptr updates. But the qdisc path itself adds 89ns of
delay between packets, which is then too large to take advantage of
the tailptr win. (I'm not sure this explains the issue... feel free
to come up with a better explanation.)
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 0dfa990..27457e8 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -216,7 +216,8 @@ static void prb_fill_vlan_info(struct tpacket_kbdq_core *,
 static void packet_flush_mclist(struct sock *sk);
 
 struct packet_skb_cb {
-	unsigned int origlen;
+	u32 enforce_flush:1,
+	    origlen:31;
 	union {
 		struct sockaddr_pkt pkt;
 		struct sockaddr_ll ll;
@@ -237,8 +238,11 @@ struct packet_skb_cb {
 static void __fanout_unlink(struct sock *sk, struct packet_sock *po);
 static void __fanout_link(struct sock *sk, struct packet_sock *po);
 
+#define PACKET_FLUSH_THRESH	8
+
 static int packet_direct_xmit(struct sk_buff *skb)
 {
+	bool flush = PACKET_SKB_CB(skb)->enforce_flush;
 	struct net_device *dev = skb->dev;
 	netdev_features_t features;
 	struct netdev_queue *txq;
@@ -261,9 +265,12 @@ static int packet_direct_xmit(struct sk_buff *skb)
 
 	HARD_TX_LOCK(dev, txq, smp_processor_id());
 	if (!netif_xmit_frozen_or_drv_stopped(txq)) {
-		ret = netdev_start_xmit(skb, dev);
-		if (ret == NETDEV_TX_OK)
+		ret = __netdev_xmit_only(skb, dev);
+		if (ret == NETDEV_TX_OK) {
+			if (flush)
+				__netdev_xmit_flush(dev, queue_map);
 			txq_trans_update(txq);
+		}
 	}
 	HARD_TX_UNLOCK(dev, txq);
 
@@ -313,7 +320,7 @@ static u16 __packet_pick_tx_queue(struct net_device *dev, struct sk_buff *skb)
 	return (u16) raw_smp_processor_id() % dev->real_num_tx_queues;
 }
 
-static void packet_pick_tx_queue(struct net_device *dev, struct sk_buff *skb)
+static u16 packet_pick_tx_queue(struct net_device *dev, struct sk_buff *skb)
 {
 	const struct net_device_ops *ops = dev->netdev_ops;
 	u16 queue_index;
@@ -327,6 +334,7 @@ static void packet_pick_tx_queue(struct net_device *dev, struct sk_buff *skb)
 	}
 
 	skb_set_queue_mapping(skb, queue_index);
+	return queue_index;
 }
 
 /* register_prot_hook must be invoked with the po->bind_lock held,
@@ -2237,7 +2245,8 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
 	unsigned char *addr;
 	int len_sum = 0;
 	int status = TP_STATUS_AVAILABLE;
-	int hlen, tlen;
+	int hlen, tlen, pending = 0;
+	u16 last_queue = 0;
 
 	mutex_lock(&po->pg_vec_lock);
 
@@ -2276,18 +2285,22 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
 		ph = packet_current_frame(po, &po->tx_ring,
					  TP_STATUS_SEND_REQUEST);
 		if (unlikely(ph == NULL)) {
-			if (need_wait && need_resched())
+			if (need_wait && need_resched()) {
+				if (packet_use_direct_xmit(po) && pending > 0) {
+					__netdev_xmit_flush(dev, last_queue);
+					pending = 0;
+				}
 				schedule();
+			}
 			continue;
 		}
 
 		status = TP_STATUS_SEND_REQUEST;
 		hlen = LL_RESERVED_SPACE(dev);
-		tlen = dev->needed_tailroom;
-		skb = sock_alloc_send_skb(&po->sk,
-				hlen + tlen + sizeof(struct sockaddr_ll),
-				0, &err);
+		tlen = dev->needed_tailroom;
+		skb = sock_alloc_send_skb(&po->sk, hlen + tlen +
+					  sizeof(struct sockaddr_ll), 0, &err);
 
 		if (unlikely(skb == NULL))
 			goto out_status;
 
@@ -2319,13 +2332,18 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
 			}
 		}
 
-		packet_pick_tx_queue(dev, skb);
+		last_queue = packet_pick_tx_queue(dev, skb);
 
 		skb->destructor = tpacket_destruct_skb;
 		__packet_set_status(po, ph, TP_STATUS_SENDING);
 		packet_inc_pending(&po->tx_ring);
 
 		status = TP_STATUS_SEND_REQUEST;
+		if (pending >= PACKET_FLUSH_THRESH) {
+			PACKET_SKB_CB(skb)->enforce_flush = 1;
+			pending = -1;
+		}
+
 		err = po->xmit(skb);
 		if (unlikely(err > 0)) {
 			err = net_xmit_errno(err);
@@ -2340,7 +2358,11 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
 			 * let's treat it like congestion or err < 0
 			 */
 			err = 0;
+		} else {
+			/* Successfully sent out. */
+			pending++;
 		}
+
 		packet_increment_head(&po->tx_ring);
 		len_sum += tp_len;
 	} while (likely((ph != NULL) ||
@@ -2354,11 +2376,12 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
 
 	err = len_sum;
 	goto out_put;
-
 out_status:
 	__packet_set_status(po, ph, status);
 	kfree_skb(skb);
 out_put:
+	if (packet_use_direct_xmit(po) && pending > 0)
+		__netdev_xmit_flush(dev, last_queue);
 	dev_put(dev);
 out:
 	mutex_unlock(&po->pg_vec_lock);
@@ -2561,6 +2584,8 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
 	if (unlikely(extra_len == 4))
 		skb->no_fcs = 1;
 
+	PACKET_SKB_CB(skb)->enforce_flush = 1;
+
 	err = po->xmit(skb);
 	if (err > 0 && (err = net_xmit_errno(err)) != 0)
 		goto out_unlock;
This adds a first use-case of deferred tail pointer flushing
for AF_PACKET's TX_RING in QDISC_BYPASS mode.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
---
 net/packet/af_packet.c | 49 +++++++++++++++++++++++++++++++++++++------------
 1 file changed, 37 insertions(+), 12 deletions(-)