From patchwork Fri Mar 20 20:36:47 2009
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Jesse Brandeburg
X-Patchwork-Id: 24770
X-Patchwork-Delegate: davem@davemloft.net
Return-Path:
X-Original-To: patchwork-incoming@ozlabs.org
Delivered-To: patchwork-incoming@ozlabs.org
Received: from vger.kernel.org (vger.kernel.org [209.132.176.167])
	by ozlabs.org (Postfix) with ESMTP id 1DE81DDDFF
	for ; Sat, 21 Mar 2009 07:37:09 +1100 (EST)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1757914AbZCTUgy (ORCPT ); Fri, 20 Mar 2009 16:36:54 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1757692AbZCTUgx (ORCPT ); Fri, 20 Mar 2009 16:36:53 -0400
Received: from mga01.intel.com ([192.55.52.88]:48652 "EHLO mga01.intel.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1757218AbZCTUgv (ORCPT ); Fri, 20 Mar 2009 16:36:51 -0400
Received: from fmsmga001.fm.intel.com ([10.253.24.23])
	by fmsmga101.fm.intel.com with ESMTP; 20 Mar 2009 13:28:45 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="4.38,396,1233561600";
	d="scan'208";a="674680300"
Received: from plxs0284.pdx.intel.com ([10.7.76.148])
	by fmsmga001.fm.intel.com with ESMTP; 20 Mar 2009 13:40:35 -0700
Received: from jbrandeb-desk1.amr.corp.intel.com
	(jbrandeb-desk1.amr.corp.intel.com [134.134.3.173])
	by plxs0284.pdx.intel.com (8.12.11.20060308/8.12.10/MailSET/Hub)
	with ESMTP id n2KKalWH006293; Fri, 20 Mar 2009 13:36:47 -0700
Date: Fri, 20 Mar 2009 13:36:47 -0700 (Pacific Daylight Time)
From: "Brandeburg, Jesse"
To: Dave Boutcher
cc: Eric Dumazet, "netdev@vger.kernel.org",
	"e1000-devel@lists.sourceforge.net"
Subject: Re: IGMP Join dropping multicast packets
In-Reply-To: <91bdcedb0903181851v2b3190bekb832df227910e844@mail.gmail.com>
Message-ID:
References: <91bdcedb0903141316j2dbf4160wb348a5a9e3bde8ad@mail.gmail.com>
	<49BC69D5.5000002@cosmosbay.com>
	<91bdcedb0903151904x1066ac24h63557b588e7c4967@mail.gmail.com>
	<49BEA1ED.4010907@cosmosbay.com>
	<91bdcedb0903172050td2ef895he48168987ad94472@mail.gmail.com>
	<91bdcedb0903181851v2b3190bekb832df227910e844@mail.gmail.com>
User-Agent: Alpine 2.00 (WNT 1167 2008-08-23)
ReplyTo: "Brandeburg, Jesse"
X-X-Sender: amrjbrandeb@imapmail.glb.intel.com
MIME-Version: 1.0
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

On Wed, 18 Mar 2009, Dave Boutcher wrote:
> If you go back in this thread I had a dead easy unprivileged user-land testcase
> that causes frame loss.  We ran into this in a production environment
> (and I kind of glossed over how long it took to figure out why the hell we
> were dropping frames...you can only increase rmem_max so many times ;-)
> OTOH not that many people use multicast, and even fewer notice a few
> dropped frames, so the priority is probably lowish.
>
> On the other other hand, I'm working in the financial trading space these days,
> where Linux is pretty much king....and they're all about multicast.

here is a patch proposal [RFC] only, I've just briefly tested it for e1000
parts.  If you want to give it a spin I would appreciate feedback.


[RFC] e1000: fix loss of multicast packets

From: Jesse Brandeburg

e1000 (and e1000e, igb, ixgbe, ixgb) all do a series of operations each
time a multicast address is added.  The flow goes something like

1) stack adds one multicast address
2) stack passes whole current list of unicast and multicast addresses to
   driver
3) driver clears entire list in hardware
4) driver programs each multicast address using iomem in a loop

This was causing multicast packets to be lost during the reprogramming
process.

reference with test program:
http://kerneltrap.org/mailarchive/linux-netdev/2009/3/14/5160514/thread

Thanks to Dave Boutcher for his report and test program.

This driver fix prepares an array all at once in memory and programs it in
one shot to the hardware, not requiring an "erase" cycle.
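
For illustration only, and not part of the patch: the "prepare in memory, then program in one shot" idea can be sketched in user space roughly as below. The names mta_shadow_build, mta_shadow_commit and MTA_REGS are hypothetical, the hash values are assumed to be already computed (the driver gets them from e1000_hash_mc_addr()), and a plain array stands in for the MTA registers. Only the register/bit arithmetic and the top-down write order are taken from the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 128 x 32-bit entries = 512 bytes, the size the patch passes to kzalloc() */
#define MTA_REGS 128u

/* Hypothetical helper: fold precomputed hash values into an in-memory
 * shadow of the multicast table, using the same bit math as the patch. */
static void mta_shadow_build(uint32_t shadow[MTA_REGS],
                             const uint32_t *hashes, size_t n)
{
	size_t k;

	memset(shadow, 0, MTA_REGS * sizeof(shadow[0]));
	for (k = 0; k < n; k++) {
		uint32_t reg = (hashes[k] >> 5) & 0x7F; /* which 32-bit entry */
		uint32_t bit = hashes[k] & 0x1F;        /* which bit inside it */

		/* unsigned shift, so bit 31 is well defined in ISO C */
		shadow[reg] |= (uint32_t)1 << bit;
	}
}

/* Hypothetical stand-in for the register writes: overwrite the "hardware"
 * table in place, highest index first, so there is no pass during which
 * the whole table has been zeroed. */
static void mta_shadow_commit(volatile uint32_t *hw_mta,
                              const uint32_t shadow[MTA_REGS],
                              unsigned int reg_count)
{
	int i;

	assert(reg_count <= MTA_REGS);
	for (i = (int)reg_count - 1; i >= 0; i--)
		hw_mta[i] = shadow[i];
}
```

A caller would build the shadow from the current multicast list and then commit it once; an entry whose bits are unchanged by the update is rewritten with the same value rather than cleared first. The sketch leaves out the 82544 odd-offset workaround and the final write flush that the real driver needs.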
It would still be possible for packets to be dropped while the receiver is
off during reprogramming.

Signed-off-by: Jesse Brandeburg
CC: Dave Boutcher
---

 drivers/net/e1000/e1000_main.c |   40 +++++++++++++++++++++++++++++++---------
 1 files changed, 31 insertions(+), 9 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 26474c9..65697ab 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -2328,6 +2328,12 @@ static void e1000_set_rx_mode(struct net_device *netdev)
 	int mta_reg_count = (hw->mac_type == e1000_ich8lan) ?
 				E1000_NUM_MTA_REGISTERS_ICH8LAN :
 				E1000_NUM_MTA_REGISTERS;
+	u32 *mcarray = kzalloc(512, GFP_ATOMIC);
+
+	if (!mcarray) {
+		DPRINTK(PROBE, ERR, "memory allocation failed\n");
+		return;
+	}
 
 	if (hw->mac_type == e1000_ich8lan)
 		rar_entries = E1000_RAR_ENTRIES_ICH8LAN;
@@ -2394,22 +2400,38 @@ static void e1000_set_rx_mode(struct net_device *netdev)
 	}
 	WARN_ON(uc_ptr != NULL);
 
-	/* clear the old settings from the multicast hash table */
-
-	for (i = 0; i < mta_reg_count; i++) {
-		E1000_WRITE_REG_ARRAY(hw, MTA, i, 0);
-		E1000_WRITE_FLUSH();
-	}
-
 	/* load any remaining addresses into the hash table */
 
 	for (; mc_ptr; mc_ptr = mc_ptr->next) {
+		u32 hash_reg, hash_bit, mta;
 		hash_value = e1000_hash_mc_addr(hw, mc_ptr->da_addr);
-		e1000_mta_set(hw, hash_value);
+		hash_reg = (hash_value >> 5) & 0x7F;
+		hash_bit = hash_value & 0x1F;
+		mta = (1 << hash_bit);
+		mcarray[hash_reg] |= mta;
 	}
 
+	/* write the hash table completely, write from bottom to avoid
+	 * stupid write combining chipsets, and flushing each write */
+	for (i = mta_reg_count - 1; i >= 0 ; i--) {
+		/* If we are on an 82544 and we are trying to write an odd
+		 * offset in the MTA, save off the previous entry before
+		 * writing and restore the old value after writing.
+		 */
+		if ((hw->mac_type == e1000_82544) && ((i & 1) == 1)) {
+			u32 temp = E1000_READ_REG_ARRAY(hw, MTA, (i - 1));
+			E1000_WRITE_REG_ARRAY(hw, MTA, i, mcarray[i]);
+			E1000_WRITE_REG_ARRAY(hw, MTA, (i - 1), temp);
+		} else {
+			E1000_WRITE_REG_ARRAY(hw, MTA, i, mcarray[i]);
+		}
+	}
+	E1000_WRITE_FLUSH();
+
 	if (hw->mac_type == e1000_82542_rev2_0)
 		e1000_leave_82542_rst(adapter);
+
+	kfree(mcarray);
 }
 
 /* Need to wait a few seconds after link up to get diagnostic information from