From patchwork Wed Jun 3 03:17:04 2009
X-Patchwork-Submitter: Rusty Russell
X-Patchwork-Id: 28028
X-Patchwork-Delegate: davem@davemloft.net
From: Rusty Russell
To: Herbert Xu
Cc: netdev@vger.kernel.org, virtualization@lists.linux-foundation.org,
	David Miller
Subject: Re: [PATCH 2/4] virtio_net: return NETDEV_TX_BUSY instead of queueing an extra skb.
Date: Wed, 3 Jun 2009 12:47:04 +0930
Message-Id: <200906031247.05591.rusty@rustcorp.com.au>
In-Reply-To: <20090602234532.GA5417@gondor.apana.org.au>
References: <200905292346.04815.rusty@rustcorp.com.au>
	<200906022325.57961.rusty@rustcorp.com.au>
	<20090602234532.GA5417@gondor.apana.org.au>

On Wed, 3 Jun 2009 09:15:32 am Herbert Xu wrote:
> On Tue, Jun 02, 2009 at 11:25:57PM +0930, Rusty Russell wrote:
> > Or, we could just "return NETDEV_TX_BUSY;".  I like that :)
>
> No you should fix it so that you check the queue status after
> transmitting a packet so we never get into this state in the
> first place.

We could figure out whether we can take the worst-case packet, and
underutilize our queue.  And fix the other *67* drivers.  Of course that
doesn't even work, because we return NETDEV_TX_BUSY from dev.c!

"Hi, core netdevs here.  Don't use NETDEV_TX_BUSY.  Yeah, we can't figure
out how to avoid it either.  But y'know, just hack something together."

Herbert, we are *better* than this!

How's this?  Tested with the virtio_net driver here.

[RFC] net: fix double-tcpdump problem with NETDEV_TX_BUSY.

Herbert shares a disdain for drivers returning TX_BUSY, because network
taps see packets twice when it's used.  Unfortunately, it's ubiquitous.

This patch marks packets by (ab)using the "peeked" bit in the skb.  This
bit is currently used for packets queued in a socket; we reset it in
dev_queue_xmit and set it when we hand the packet to dev_queue_xmit_nit.
We also reset it on incoming packets: this is safe, but it might be
sufficient to reset it only in the loopback driver?

---

diff --git a/net/core/dev.c b/net/core/dev.c
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1678,8 +1678,10 @@ int dev_hard_start_xmit(struct sk_buff *
 	int rc;
 
 	if (likely(!skb->next)) {
-		if (!list_empty(&ptype_all))
+		if (!list_empty(&ptype_all) && !skb->peeked) {
 			dev_queue_xmit_nit(skb, dev);
+			skb->peeked = true;
+		}
 
 		if (netif_needs_gso(dev, skb)) {
 			if (unlikely(dev_gso_segment(skb)))
@@ -1796,6 +1798,8 @@ int dev_queue_xmit(struct sk_buff *skb)
 	struct Qdisc *q;
 	int rc = -ENOMEM;
 
+	skb->peeked = false;
+
 	/* GSO will handle the following emulations directly. */
 	if (netif_needs_gso(dev, skb))
 		goto gso;
@@ -1942,6 +1946,8 @@ int netif_rx(struct sk_buff *skb)
 	if (!skb->tstamp.tv64)
 		net_timestamp(skb);
 
+	skb->peeked = false;
+
 	/*
 	 * The code is rearranged so that the path is the most
 	 * short when CPU is congested, but is still operating.
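
For context, a minimal sketch (not part of the patch, and not from any
existing driver) of the "check the queue status after transmitting"
pattern Herbert is asking for above.  Everything prefixed my_ and struct
my_priv are hypothetical stand-ins for a driver's TX ring; only
MAX_SKB_FRAGS, netdev_priv(), netif_stop_queue()/netif_wake_queue(),
netif_queue_stopped() and NETDEV_TX_OK are real kernel API.  The point is
the stop/wake dance around a worst-case space check, which is what lets
->hard_start_xmit avoid ever returning NETDEV_TX_BUSY:

#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* Hypothetical TX-ring helpers: stand-ins for whatever the hardware uses. */
struct my_ring;
extern void my_ring_add(struct my_ring *ring, struct sk_buff *skb);
extern void my_ring_kick(struct my_ring *ring);
extern void my_ring_reclaim_sent(struct my_ring *ring);
extern unsigned int my_ring_free_slots(const struct my_ring *ring);

struct my_priv {
	struct net_device *dev;
	struct my_ring *ring;
};

/* Worst case: every fragment plus the linear part plus a header slot. */
#define MY_WORST_CASE_SLOTS (MAX_SKB_FRAGS + 2)

static int my_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
	struct my_priv *priv = netdev_priv(dev);

	/* The queue is only awake while a worst-case skb fits, so this
	 * add cannot fail and we never return NETDEV_TX_BUSY. */
	my_ring_add(priv->ring, skb);
	my_ring_kick(priv->ring);

	/* The step under discussion: re-check capacity *after* queueing. */
	if (my_ring_free_slots(priv->ring) < MY_WORST_CASE_SLOTS)
		netif_stop_queue(dev);

	return NETDEV_TX_OK;
}

/* TX-completion path: reclaim descriptors and wake the queue once there
 * is room for another worst-case packet. */
static void my_tx_done(struct my_priv *priv)
{
	my_ring_reclaim_sent(priv->ring);
	if (netif_queue_stopped(priv->dev) &&
	    my_ring_free_slots(priv->ring) >= MY_WORST_CASE_SLOTS)
		netif_wake_queue(priv->dev);
}

The trade-off is exactly the one raised above: sizing the check for
MAX_SKB_FRAGS keeps the queue stopped even when smaller packets would
still fit, and every driver that returns TX_BUSY today would need the
same surgery.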