Message ID | AANLkTimRwNqPYh1MgXPhh3hHep7Koc3OJCCroEj_scqg@mail.gmail.com |
---|---|
State | Rejected, archived |
Delegated to: | David Miller |
Headers | show |
Junchang Wang <junchangwang@gmail.com> : > Traffic stats counters (rx_bytes and tx_bytes) in net_device are > "unsigned long". On 32-bit systems, they wrap around every few > minutes, giving out wrong answers to the amount of traffic. To get the > right message, another available approach is "ethtool -S". However, > r8169 didn't support those two counters so far. ethtool is - imvho - biased toward hardware, not toward software emulation. If the packets are short enough, replace "_bytes" by "_packets", "_minutes" by "_hours" or "_every_day" and the same kind of problem appear. You can fix the application at zero cost in the kernel: poll < 34 s and update the application counters with the kernel counters increment.
From: Junchang Wang <junchangwang@gmail.com> Date: Tue, 25 May 2010 22:19:46 +0800 > Traffic stats counters (rx_bytes and tx_bytes) in net_device are > "unsigned long". On 32-bit systems, they wrap around every few > minutes, giving out wrong answers to the amount of traffic. To get the > right message, another available approach is "ethtool -S". However, > r8169 didn't support those two counters so far. > > Add traffic counters tx_bytes and rx_bytes with 64-bit width for > ethtool. On 32-bit systems, gcc treats each one as two 32-bit > variables, making the increment not "atomic". But there is no sync > issue since the updates to the counters are serialized by driver logic > in any case. Results provided by ethtool maybe slightly biased if the > read and update operations are interleaved. But the results are much > better than the original ones that always fall into the range from 0 > to 4GiB. > > Signed-off-by: Junchang Wang <junchangwang@gmail.com> I absolutely do not want to see drivers start doing this, so right off the bat I am not going to apply this patch. If the problem is that people want 64-bit counters available for core statistics on 32-bit systems, we do not fix that problem by hacking every single driver to provide them side-band via ethtool. First of all, we now have "struct rtnl_link_stats64" in linux/if_link.h, it's there to start combating this problem generically, for every device, rather than the way you are trying handle it only for one specific driver at a time. So that's the area where you should start looking to solve these kinds of problem. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi David, Thanks for your advice. > > If the problem is that people want 64-bit counters available for core > statistics on 32-bit systems, we do not fix that problem by hacking > every single driver to provide them side-band via ethtool. Most NICs have provided those two 64-bit counters in hardware. They work fine even in 32-bit systems and don't need new 64-bit counters any more. Frankly, r8169 is the first Gbps NIC I have seen that does not support those two counters. So I thought changing upper layer is immoderate and tried to provide a cheap but valuable way. > > First of all, we now have "struct rtnl_link_stats64" in > linux/if_link.h, it's there to start combating this problem > generically, for every device, rather than the way you are trying > handle it only for one specific driver at a time. Thanks for your advice. I'll go deep into it and see how we can solve this problem.
Hi Francois, > > If the packets are short enough, replace "_bytes" by "_packets", "_minutes" > by "_hours" or "_every_day" and the same kind of problem appear. r8169 has provided 64-bit hardware counters for #packets, #error_packets, etc. They works fine even on 32-bit systems. What we really need is just counter rx_bytes and tx_bytes. > > You can fix the application at zero cost in the kernel: poll < 34 s and > update the application counters with the kernel counters increment. Thanks for you advice.
On Tue, 2010-05-25 at 16:15 -0700, David Miller wrote: [...] > If the problem is that people want 64-bit counters available for core > statistics on 32-bit systems, we do not fix that problem by hacking > every single driver to provide them side-band via ethtool. > > First of all, we now have "struct rtnl_link_stats64" in > linux/if_link.h, it's there to start combating this problem > generically, for every device, rather than the way you are trying > handle it only for one specific driver at a time. > > So that's the area where you should start looking to solve these kinds > of problem. My understand of the current situation is as follows; correct me if any of this is wrong: The standard counters in struct net_device_stats have type unsigned long which is the native word size and so can be read and updated automatically. Net drivers can update counters from the data path without any interlocking with their ndo_get_stats implementation or the networking core code which reads them. The values returned by ndo_get_stats (by reference) are exposed: - Through procfs (/proc/net/dev) as columns of numbers - Through sysfs (/sys/class/net/*/stats/*) as single numeric strings - Through netlink (IFLA_STATS and IFLA_STATS64) as 32-bit or 64-bit values in binary structures Changing the counter types to u64 for 32-bit architectures would remove atomicity and expose half-updated counters to userland. Changing the driver interface significantly so that atomicity is not needed would require changes to hundreds of drivers. Assuming the above is all correct, I think we can only solve this with a gradual change (as for net_device_ops). The following might work: 1. a. Define struct net_device_stats64 identically to rtnl_link_stats64. b. Add net_device_ops::ndo_get_stats64, the implementation of which will return a pointer to such a structure. The referenced structure must only be updated atomically, except within the call to ndo_get_stats64. (For 64-bit architectures, these could be macro aliases to the existing structure and operations.) c. On 32-bit architectures, insert unsigned long padding after each member of struct net_device_stats. d. Add an anonymous union in net_device; move stats into the union and add struct net_device_stats64 stats64. e. Change dev_get_stats() to return a pointer to struct net_device_stats64, and to call ndo_get_stats64 if available in preference to ndo_get_stats. Adjust callers accordingly. f. Either change dev_get_stats() to clear the padding members, or change drivers to ensure that the padding is cleared. 2. Change drivers to define ndo_get_stats64, following the rule stated in 1b. 3. Much later, remove net_device_stats. Ben.
On Wed, 02 Jun 2010 22:34:29 +0100 Ben Hutchings <bhutchings@solarflare.com> wrote: > Changing the counter types to u64 for 32-bit architectures would remove > atomicity and expose half-updated counters to userland. Changing the > driver interface significantly so that atomicity is not needed would > require changes to hundreds of drivers. Another big issue is maintaining ABI compatibility for /proc and ioctl interfaces. So bigger values would only be available through netlink, and most applications using counters don't use netlink.
On Wednesday 02 June 2010 23:59:12 Stephen Hemminger wrote: > On Wed, 02 Jun 2010 22:34:29 +0100 > Ben Hutchings <bhutchings@solarflare.com> wrote: > > > Changing the counter types to u64 for 32-bit architectures would remove > > atomicity and expose half-updated counters to userland. Changing the > > driver interface significantly so that atomicity is not needed would > > require changes to hundreds of drivers. > > Another big issue is maintaining ABI compatibility for /proc and ioctl > interfaces. So bigger values would only be available through netlink, > and most applications using counters don't use netlink. Doesn't the /proc interface already allow 64 bit values in case of 64 bit hosts running the same 32 bit user space? For the ioctl interfaces, we can of course introduce additional ioctl commands that always deal with 64 bit values, many other subsystems have been extended in similar ways. We don't mandate that a program built against new kernel headers must work on old kernels, so it is possibly to actually change the definitions, as long as the kernel still supports the old interfaces for existing binaries. We should still have a good reason even for breaking the ABI in ways that we don't guarantee to be stable. Arnd -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2010-06-03 at 01:38 +0200, Arnd Bergmann wrote: > On Wednesday 02 June 2010 23:59:12 Stephen Hemminger wrote: > > On Wed, 02 Jun 2010 22:34:29 +0100 > > Ben Hutchings <bhutchings@solarflare.com> wrote: > > > > > Changing the counter types to u64 for 32-bit architectures would remove > > > atomicity and expose half-updated counters to userland. Changing the > > > driver interface significantly so that atomicity is not needed would > > > require changes to hundreds of drivers. > > > > Another big issue is maintaining ABI compatibility for /proc and ioctl > > interfaces. So bigger values would only be available through netlink, > > and most applications using counters don't use netlink. > > Doesn't the /proc interface already allow 64 bit values in case of > 64 bit hosts running the same 32 bit user space? Yes. And the most widely used consumer of /proc/net/dev, ifconfig, already parses them as 64-bit values. > For the ioctl interfaces, we can of course introduce additional > ioctl commands that always deal with 64 bit values, many other > subsystems have been extended in similar ways. [...] Are there any ioctl interfaces to net_device_stats? I didn't spot any. Ben.
Hi Ben, On Wed, Jun 02, 2010 at 10:34:29PM +0100, Ben Hutchings wrote: >Changing the counter types to u64 for 32-bit architectures would remove >atomicity and expose half-updated counters to userland. Changing the >driver interface significantly so that atomicity is not needed would >require changes to hundreds of drivers. > >Assuming the above is all correct, I think we can only solve this with a >gradual change (as for net_device_ops). The following might work: > I realized the network team doesn't care about 64-bit counters (especially rx_*) on 32-bit systems. A similar disscussion can be found here: http://www.gossamer-threads.com/lists/linux/kernel/282631?search_string=64%20stats;#282631 And Eric just gave a explanation why they stand by that point. Updating rx_* counters in core network will dirty a cache line. --Junchang -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le vendredi 04 juin 2010 à 09:59 +0800, Junchang Wang a écrit : > Hi Ben, > > On Wed, Jun 02, 2010 at 10:34:29PM +0100, Ben Hutchings wrote: > >Changing the counter types to u64 for 32-bit architectures would remove > >atomicity and expose half-updated counters to userland. Changing the > >driver interface significantly so that atomicity is not needed would > >require changes to hundreds of drivers. > > > >Assuming the above is all correct, I think we can only solve this with a > >gradual change (as for net_device_ops). The following might work: > > > I realized the network team doesn't care about 64-bit counters (especially rx_*) on 32-bit systems. A similar disscussion can be found here: > http://www.gossamer-threads.com/lists/linux/kernel/282631?search_string=64%20stats;#282631 > Well, the outlined discussion is 8 years old. Some things changed so we probably have more possibilities today. per_cpu infrastructure is pretty cool now for example. If stats are updated only by the NIC, we can have 64 bit stats with nearly 0 cost. Real problem is some stats are updated by software. Upgrading them to 64 bits would need atomic64 to coordinate writers and readers, or making sure reader use appropriate locks to serialize with the writer. > And Eric just gave a explanation why they stand by that point. Updating rx_* counters in core network will dirty a cache line. Sorry, I realize I was a bit wrong in my last mail. In txq_trans_update(txq) we only update the time (in jiffies) of last transmission, using a txq field instead of dev->trans_start) txq->tx_bytes, txq->tx_packets, txq->tx_dropped _might_ be used by _drivers_ to update tx counters, instead of dev->stats{tx_bytes, tx_packets, tx_dropped}, especially by multiqueue drivers. txq->tx... updates are better because they are multiqueue compatable (no need to use atomic ops), and they use a cache line already dirtied because we hold txq lock at transmission time (unless for LLTX drivers) (For an example, check drivers/net/macvlan.c, macvlan_start_xmit() and commit 2c11455321f3) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c index 217e709..19a2748 100644 --- a/drivers/net/r8169.c +++ b/drivers/net/r8169.c @@ -510,6 +510,8 @@ struct rtl8169_private { struct mii_if_info mii; struct rtl8169_counters counters; + u64 tx_bytes; + u64 rx_bytes; u32 saved_wolopts; }; @@ -1162,6 +1164,8 @@ static void rtl8169_set_msglevel(struct net_device *dev, u32 value) static const char rtl8169_gstrings[][ETH_GSTRING_LEN] = { "tx_packets", "rx_packets", + "tx_bytes", + "rx_bytes", "tx_errors", "rx_errors",
Traffic stats counters (rx_bytes and tx_bytes) in net_device are "unsigned long". On 32-bit systems, they wrap around every few minutes, giving out wrong answers to the amount of traffic. To get the right message, another available approach is "ethtool -S". However, r8169 didn't support those two counters so far. Add traffic counters tx_bytes and rx_bytes with 64-bit width for ethtool. On 32-bit systems, gcc treats each one as two 32-bit variables, making the increment not "atomic". But there is no sync issue since the updates to the counters are serialized by driver logic in any case. Results provided by ethtool maybe slightly biased if the read and update operations are interleaved. But the results are much better than the original ones that always fall into the range from 0 to 4GiB. Signed-off-by: Junchang Wang <junchangwang@gmail.com> --- drivers/net/r8169.c | 30 +++++++++++++++++++----------- 1 file changed, 19 insertions(+), 11 deletions(-) "rx_missed", @@ -1236,17 +1240,19 @@ static void rtl8169_get_ethtool_stats(struct net_device *dev, data[0] = le64_to_cpu(tp->counters.tx_packets); data[1] = le64_to_cpu(tp->counters.rx_packets); - data[2] = le64_to_cpu(tp->counters.tx_errors); - data[3] = le32_to_cpu(tp->counters.rx_errors); - data[4] = le16_to_cpu(tp->counters.rx_missed); - data[5] = le16_to_cpu(tp->counters.align_errors); - data[6] = le32_to_cpu(tp->counters.tx_one_collision); - data[7] = le32_to_cpu(tp->counters.tx_multi_collision); - data[8] = le64_to_cpu(tp->counters.rx_unicast); - data[9] = le64_to_cpu(tp->counters.rx_broadcast); - data[10] = le32_to_cpu(tp->counters.rx_multicast); - data[11] = le16_to_cpu(tp->counters.tx_aborted); - data[12] = le16_to_cpu(tp->counters.tx_underun); + data[2] = tp->tx_bytes; + data[3] = tp->rx_bytes; + data[4] = le64_to_cpu(tp->counters.tx_errors); + data[5] = le32_to_cpu(tp->counters.rx_errors); + data[6] = le16_to_cpu(tp->counters.rx_missed); + data[7] = le16_to_cpu(tp->counters.align_errors); + data[8] = le32_to_cpu(tp->counters.tx_one_collision); + data[9] = le32_to_cpu(tp->counters.tx_multi_collision); + data[10] = le64_to_cpu(tp->counters.rx_unicast); + data[11] = le64_to_cpu(tp->counters.rx_broadcast); + data[12] = le32_to_cpu(tp->counters.rx_multicast); + data[13] = le16_to_cpu(tp->counters.tx_aborted); + data[14] = le16_to_cpu(tp->counters.tx_underun); } static void rtl8169_get_strings(struct net_device *dev, u32 stringset, u8 *data) @@ -4412,6 +4418,7 @@ static void rtl8169_tx_interrupt(struct net_device *dev, dev->stats.tx_bytes += len; dev->stats.tx_packets++; + tp->tx_bytes += len; rtl8169_unmap_tx_skb(tp->pci_dev, tx_skb, tp->TxDescArray + entry); @@ -4567,6 +4574,7 @@ static int rtl8169_rx_interrupt(struct net_device *dev, dev->stats.rx_bytes += pkt_size; dev->stats.rx_packets++; + tp->rx_bytes += pkt_size; } /* Work around for AMD plateform. */ -- -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html