Message ID | tkrat.83117f4523f0d928@s5r6.in-berlin.de |
---|---|
State | Not Applicable, archived |
Delegated to: | David Miller |
Headers | show |
On Sat, 2010-11-13 at 23:07 +0100, Stefan Richter wrote: > This prevents firewire-net from submitting write requests in fast > succession until failure due to all 64 transaction labels were used up > for unfinished split transactions. The netif_stop/wake_queue API is > used for this purpose. > > Without this stop/wake mechanism, datagrams were simply lost whenever > the tlabel pool was exhausted. Plus, tlabel exhaustion by firewire-net > also prevented other unrelated outbound transactions to be initiated. > > The high watermark is set to considerably less than 64 (I chose 8) > because peers which run current Linux firewire-ohci are still easily > saturated by this (i.e. some datagrams are dropped with ack-busy-* > events), depending on the hardware at transmitter and receiver side. > > I did not see changes to resulting throughput that were discernible from > the usual measuring noise. To do: Revisit the choice of queue depth > once firewire-ohci's AR DMA was improved. > > I wonder what a good net_device.tx_queue_len value is. I just set it > to the same value as the chosen watermark for now. > > Note: This removes some netif_wake_queue from reception code paths. > They were apparently copy&paste artefacts from a nonsensical > netif_wake_queue use in the older eth1394 driver. This belongs only > into the transmit path. > > Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> > --- > Update 2: Maxim told me to de-obfuscate status tracking. I realized > that netif_queue_stopped can be used for that and thereby noticed bogus > usages of it in the rx path. In fact after lot of testing I see that original patch, '[PATCH 4/4] firewire: net: throttle TX queue before running out of tlabels' works the best here. With AR fixes, I don't see even a single fwnet_write_complete error on ether side. However the 'update 2' (maybe update 1 too, didn't test), lowers desktop->laptop throughput somewhat. (250 vs 227 Mbits/s). I tested this many times. Actuall raw troughput possible with UDP stream and ether no throttling or higher packets in flight count (I tested 50/30), it 280 Mbits/s. BTW, I still don't understand fully why my laptop sends only at 180 Mbits/s pretty much always regardless of patches or TCP/UDP. I also tested performance impact of other patches, and it is too small to see through the noise. Not bad, ah? From complete trainwreck, the IP over 1394, turned out into very stable and fast connection that beats 100 Mbit ethernet a bit. Now next on my list a POC (Piece of cake) items. I need to figure out why s2ram hoses the network connection. In fact usually, firewire-ohci does work, and reload of firewire-net restores the connection. Also, I need to add all required bits to make firewire-net work with NM. I need to make resets more robust. Currently after cable plug it takes some time until connection starts working. So thanks again, especially to Clemens Ladisch for the hardest fixes that made all this possible. And of course feel free to not merge my AR rewrite, it is mostly done as a prof of concept to see if my hardware is buggy or not. I am sure these patches can be done better. Best regards, Maxim Levitsky -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Maxim Levitsky wrote: > In fact after lot of testing I see that original patch, > '[PATCH 4/4] firewire: net: throttle TX queue before running out of > tlabels' works the best here. > With AR fixes, I don't see even a single fwnet_write_complete error on > ether side. Well, that version missed that the rx path opened up the tx queue again. I.e. it did not work as intended. > However the 'update 2' (maybe update 1 too, didn't test), lowers > desktop->laptop throughput somewhat. > (250 vs 227 Mbits/s). I tested this many times. > > Actuall raw troughput possible with UDP stream and ether no throttling > or higher packets in flight count (I tested 50/30), it 280 Mbits/s. Good, I will test deeper queues with a few different controllers here. As long as we keep a margin to 64 so that other traffic besides IPover1394 still has a chance to acquire transaction labels, it's OK. > BTW, I still don't understand fully why my laptop sends only at 180 > Mbits/s pretty much always regardless of patches or TCP/UDP. If it is not CPU bound, then it is because Ricoh did not optimize the AR DMA unit as well as Texas Instruments did.
On Sun, 2010-11-14 at 10:25 +0100, Stefan Richter wrote: > Maxim Levitsky wrote: > > In fact after lot of testing I see that original patch, > > '[PATCH 4/4] firewire: net: throttle TX queue before running out of > > tlabels' works the best here. > > With AR fixes, I don't see even a single fwnet_write_complete error on > > ether side. > > Well, that version missed that the rx path opened up the tx queue again. I.e. > it did not work as intended. > > > However the 'update 2' (maybe update 1 too, didn't test), lowers > > desktop->laptop throughput somewhat. > > (250 vs 227 Mbits/s). I tested this many times. > > > > Actuall raw troughput possible with UDP stream and ether no throttling > > or higher packets in flight count (I tested 50/30), it 280 Mbits/s. > > Good, I will test deeper queues with a few different controllers here. As > long as we keep a margin to 64 so that other traffic besides IPover1394 still > has a chance to acquire transaction labels, it's OK. Just tested the 'update 2' with 8-16 margin. Gives me ~250 Mbits/s TCP easily, and ~280 Mbit/s UDP. Pretty much the maximum its possible to get out of this hardware. > > > BTW, I still don't understand fully why my laptop sends only at 180 > > Mbits/s pretty much always regardless of patches or TCP/UDP. > > If it is not CPU bound, then it is because Ricoh did not optimize the AR DMA > unit as well as Texas Instruments did. You mean AT, because in the fast case (desktop->laptop), the TI transmits and Ricoh receives. In slow case Ricoh receives and TI transmits. Anyway speeds of new stack beat the old one by significant margin. Best regards, Maxim Levitsky -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Index: b/drivers/firewire/net.c =================================================================== --- a/drivers/firewire/net.c +++ b/drivers/firewire/net.c @@ -28,8 +28,14 @@ #include <asm/unaligned.h> #include <net/arp.h> -#define FWNET_MAX_FRAGMENTS 25 /* arbitrary limit */ -#define FWNET_ISO_PAGE_COUNT (PAGE_SIZE < 16 * 1024 ? 4 : 2) +/* rx limits */ +#define FWNET_MAX_FRAGMENTS 25 /* arbitrary limit */ +#define FWNET_ISO_PAGE_COUNT (PAGE_SIZE < 16*1024 ? 4 : 2) + +/* tx limits */ +#define FWNET_MAX_QUEUED_DATAGRAMS 8 /* should keep AT DMA busy enough */ +#define FWNET_MIN_QUEUED_DATAGRAMS 2 +#define FWNET_TX_QUEUE_LEN FWNET_MAX_QUEUED_DATAGRAMS /* ? */ #define IEEE1394_BROADCAST_CHANNEL 31 #define IEEE1394_ALL_NODES (0xffc0 | 0x003f) @@ -641,8 +647,6 @@ static int fwnet_finish_incoming_packet( net->stats.rx_packets++; net->stats.rx_bytes += skb->len; } - if (netif_queue_stopped(net)) - netif_wake_queue(net); return 0; @@ -651,8 +655,6 @@ static int fwnet_finish_incoming_packet( net->stats.rx_dropped++; dev_kfree_skb_any(skb); - if (netif_queue_stopped(net)) - netif_wake_queue(net); return -ENOENT; } @@ -784,15 +786,10 @@ static int fwnet_incoming_packet(struct * Datagram is not complete, we're done for the * moment. */ - spin_unlock_irqrestore(&dev->lock, flags); - - return 0; + retval = 0; fail: spin_unlock_irqrestore(&dev->lock, flags); - if (netif_queue_stopped(net)) - netif_wake_queue(net); - return retval; } @@ -892,6 +889,13 @@ static void fwnet_free_ptask(struct fwne kmem_cache_free(fwnet_packet_task_cache, ptask); } +/* Caller must hold dev->lock. */ +static void dec_queued_datagrams(struct fwnet_device *dev) +{ + if (--dev->queued_datagrams == FWNET_MIN_QUEUED_DATAGRAMS) + netif_wake_queue(dev->netdev); +} + static int fwnet_send_packet(struct fwnet_packet_task *ptask); static void fwnet_transmit_packet_done(struct fwnet_packet_task *ptask) @@ -908,7 +912,7 @@ static void fwnet_transmit_packet_done(s /* Check whether we or the networking TX soft-IRQ is last user. */ free = (ptask->outstanding_pkts == 0 && ptask->enqueued); if (free) - dev->queued_datagrams--; + dec_queued_datagrams(dev); if (ptask->outstanding_pkts == 0) { dev->netdev->stats.tx_packets++; @@ -979,7 +983,7 @@ static void fwnet_transmit_packet_failed /* Check whether we or the networking TX soft-IRQ is last user. */ free = ptask->enqueued; if (free) - dev->queued_datagrams--; + dec_queued_datagrams(dev); dev->netdev->stats.tx_dropped++; dev->netdev->stats.tx_errors++; @@ -1064,7 +1068,7 @@ static int fwnet_send_packet(struct fwne if (!free) ptask->enqueued = true; else - dev->queued_datagrams--; + dec_queued_datagrams(dev); spin_unlock_irqrestore(&dev->lock, flags); @@ -1083,7 +1087,7 @@ static int fwnet_send_packet(struct fwne if (!free) ptask->enqueued = true; else - dev->queued_datagrams--; + dec_queued_datagrams(dev); spin_unlock_irqrestore(&dev->lock, flags); @@ -1249,6 +1253,15 @@ static netdev_tx_t fwnet_tx(struct sk_bu struct fwnet_peer *peer; unsigned long flags; + spin_lock_irqsave(&dev->lock, flags); + + /* Can this happen? */ + if (netif_queue_stopped(dev->netdev)) { + spin_unlock_irqrestore(&dev->lock, flags); + + return NETDEV_TX_BUSY; + } + ptask = kmem_cache_alloc(fwnet_packet_task_cache, GFP_ATOMIC); if (ptask == NULL) goto fail; @@ -1267,9 +1280,6 @@ static netdev_tx_t fwnet_tx(struct sk_bu proto = hdr_buf.h_proto; dg_size = skb->len; - /* serialize access to peer, including peer->datagram_label */ - spin_lock_irqsave(&dev->lock, flags); - /* * Set the transmission type for the packet. ARP packets and IP * broadcast packets are sent via GASP. @@ -1291,7 +1301,7 @@ static netdev_tx_t fwnet_tx(struct sk_bu peer = fwnet_peer_find_by_guid(dev, be64_to_cpu(guid)); if (!peer || peer->fifo == FWNET_NO_FIFO_ADDR) - goto fail_unlock; + goto fail; generation = peer->generation; dest_node = peer->node_id; @@ -1345,7 +1355,8 @@ static netdev_tx_t fwnet_tx(struct sk_bu max_payload += RFC2374_FRAG_HDR_SIZE; } - dev->queued_datagrams++; + if (++dev->queued_datagrams == FWNET_MAX_QUEUED_DATAGRAMS) + netif_stop_queue(dev->netdev); spin_unlock_irqrestore(&dev->lock, flags); @@ -1356,9 +1367,9 @@ static netdev_tx_t fwnet_tx(struct sk_bu return NETDEV_TX_OK; - fail_unlock: - spin_unlock_irqrestore(&dev->lock, flags); fail: + spin_unlock_irqrestore(&dev->lock, flags); + if (ptask) kmem_cache_free(fwnet_packet_task_cache, ptask); @@ -1415,7 +1426,7 @@ static void fwnet_init_dev(struct net_de net->addr_len = FWNET_ALEN; net->hard_header_len = FWNET_HLEN; net->type = ARPHRD_IEEE1394; - net->tx_queue_len = 10; + net->tx_queue_len = FWNET_TX_QUEUE_LEN; SET_ETHTOOL_OPS(net, &fwnet_ethtool_ops); }
This prevents firewire-net from submitting write requests in fast succession until failure due to all 64 transaction labels were used up for unfinished split transactions. The netif_stop/wake_queue API is used for this purpose. Without this stop/wake mechanism, datagrams were simply lost whenever the tlabel pool was exhausted. Plus, tlabel exhaustion by firewire-net also prevented other unrelated outbound transactions to be initiated. The high watermark is set to considerably less than 64 (I chose 8) because peers which run current Linux firewire-ohci are still easily saturated by this (i.e. some datagrams are dropped with ack-busy-* events), depending on the hardware at transmitter and receiver side. I did not see changes to resulting throughput that were discernible from the usual measuring noise. To do: Revisit the choice of queue depth once firewire-ohci's AR DMA was improved. I wonder what a good net_device.tx_queue_len value is. I just set it to the same value as the chosen watermark for now. Note: This removes some netif_wake_queue from reception code paths. They were apparently copy&paste artefacts from a nonsensical netif_wake_queue use in the older eth1394 driver. This belongs only into the transmit path. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> --- Update 2: Maxim told me to de-obfuscate status tracking. I realized that netif_queue_stopped can be used for that and thereby noticed bogus usages of it in the rx path. drivers/firewire/net.c | 59 ++++++++++++++++++++++++----------------- 1 file changed, 35 insertions(+), 24 deletions(-)