diff mbox

r8169: Add values missing in @get_stats64 from HW counters

Message ID 1439888664-28620-1-git-send-email-vinschen@redhat.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Corinna Vinschen Aug. 18, 2015, 9:04 a.m. UTC
The r8169 driver collects statistical information returned by
@get_stats64 by counting them in the driver itself, even though many
(but not all) of the values are already collected by tally counters
(TCs) in the NIC.  Some of these TC values are not returned by
@get_stats64.  Especially the received multicast packages are missing
from /proc/net/dev.

Rectify this by fetching the TCs and returning them from
rtl8169_get_stats64.

The counters collected in the driver obviously disappear as soon as the
driver is unloaded so after a driver is loaded the counters always start
at 0.  The TCs are only reset by a power cycle and there's no known way
to reset them programatically.  Without further considerations the values
collected by the driver would not match up against the TC values.

Therefore introduce an addition to the rtl8169_private struct and a
function rtl8169_init_tc_counter_offset to store the TCs at first
rtl_open.  Use these values as offsets in rtl8169_get_stats64.

Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
---
 drivers/net/ethernet/realtek/r8169.c | 62 ++++++++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)

Comments

Francois Romieu Aug. 18, 2015, 9:40 p.m. UTC | #1
Corinna Vinschen <vinschen@redhat.com> :
> The r8169 driver collects statistical information returned by
> @get_stats64 by counting them in the driver itself, even though many
> (but not all) of the values are already collected by tally counters
> (TCs) in the NIC.  Some of these TC values are not returned by
> @get_stats64.  Especially the received multicast packages are missing
> from /proc/net/dev.
> Rectify this by fetching the TCs and returning them from
> rtl8169_get_stats64.

It is not exactly a some / many / not all / confused [*] picture
- get_stats64 provides software managed values
- ethtool provides hardware only values

[*] ok, rx_missed is kinda special.

> The counters collected in the driver obviously disappear as soon as the
> driver is unloaded so after a driver is loaded the counters always start
> at 0.  The TCs are only reset by a power cycle and there's no known way
> to reset them programatically. Without further considerations the values
> collected by the driver would not match up against the TC values.

I'd rather:
1. keep software and hardware stats separated
2. push to userspace the responsibility to offset device specific hardware
   values jumps

My 0.02 €.
David Miller Aug. 19, 2015, 2:05 a.m. UTC | #2
From: Francois Romieu <romieu@fr.zoreil.com>
Date: Tue, 18 Aug 2015 23:40:17 +0200

> Corinna Vinschen <vinschen@redhat.com> :
>> The r8169 driver collects statistical information returned by
>> @get_stats64 by counting them in the driver itself, even though many
>> (but not all) of the values are already collected by tally counters
>> (TCs) in the NIC.  Some of these TC values are not returned by
>> @get_stats64.  Especially the received multicast packages are missing
>> from /proc/net/dev.
>> Rectify this by fetching the TCs and returning them from
>> rtl8169_get_stats64.
> 
> It is not exactly a some / many / not all / confused [*] picture
> - get_stats64 provides software managed values
> - ethtool provides hardware only values
> 
> [*] ok, rx_missed is kinda special.
> 
>> The counters collected in the driver obviously disappear as soon as the
>> driver is unloaded so after a driver is loaded the counters always start
>> at 0.  The TCs are only reset by a power cycle and there's no known way
>> to reset them programatically. Without further considerations the values
>> collected by the driver would not match up against the TC values.
> 
> I'd rather:
> 1. keep software and hardware stats separated
> 2. push to userspace the responsibility to offset device specific hardware
>    values jumps
> 
> My 0.02 €.

Any value that is provided by ->get_stats64() should be provided by whatever
mechanism is available and determined to be "prudent" for the device in
question.  If the device is not tracking the event, this means pure software
counters.

If the device tracks the event, you should attempt to use the hardware
facility to provide the ->get_stats64() values.

If the hardware provides counters not starting at zero on driver load,
one option to handle that is to load the initial values of all of
those counters and calculate diffs at ->get_stats64() time.

Otherwise, if you don't want to calculate diffs, you must use software
computed counters.

Ethtool stats are for values provided by the hardware which are outside of
the collection defined by ->get_stats64().  Some drivers also use it for
providing per-RX-queue and per-TX-queue statistics.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Corinna Vinschen Aug. 19, 2015, 9:10 a.m. UTC | #3
On Aug 18 19:05, David Miller wrote:
> From: Francois Romieu <romieu@fr.zoreil.com>
> Date: Tue, 18 Aug 2015 23:40:17 +0200
> 
> > Corinna Vinschen <vinschen@redhat.com> :
> >> The r8169 driver collects statistical information returned by
> >> @get_stats64 by counting them in the driver itself, even though many
> >> (but not all) of the values are already collected by tally counters
> >> (TCs) in the NIC.  Some of these TC values are not returned by
> >> @get_stats64.  Especially the received multicast packages are missing
> >> from /proc/net/dev.
> >> Rectify this by fetching the TCs and returning them from
> >> rtl8169_get_stats64.
> > 
> > It is not exactly a some / many / not all / confused [*] picture
> > - get_stats64 provides software managed values
> > - ethtool provides hardware only values
> > 
> > [*] ok, rx_missed is kinda special.
> > 
> >> The counters collected in the driver obviously disappear as soon as the
> >> driver is unloaded so after a driver is loaded the counters always start
> >> at 0.  The TCs are only reset by a power cycle and there's no known way
> >> to reset them programatically. Without further considerations the values
> >> collected by the driver would not match up against the TC values.
> > 
> > I'd rather:
> > 1. keep software and hardware stats separated
> > 2. push to userspace the responsibility to offset device specific hardware
> >    values jumps
> > 
> > My 0.02 €.
> 
> Any value that is provided by ->get_stats64() should be provided by whatever
> mechanism is available and determined to be "prudent" for the device in
> question.  If the device is not tracking the event, this means pure software
> counters.
> 
> If the device tracks the event, you should attempt to use the hardware
> facility to provide the ->get_stats64() values.
> 
> If the hardware provides counters not starting at zero on driver load,
> one option to handle that is to load the initial values of all of
> those counters and calculate diffs at ->get_stats64() time.

That's what my code is doing, just for a limited set of values.

I was already wondering why some values available in the hardware tally
counters are additionally counted in software.  I was planning to
propose to use the hardware counters in @get_stats64 in the first place
and only count values in the driver which are not provided by the
hardware


Corinna
Corinna Vinschen Aug. 19, 2015, 9:13 a.m. UTC | #4
On Aug 19 02:51, Hayes Wang wrote:
> [...]
> > > The TCs are only reset by a power cycle and there's no known way
> > > to reset them programatically.
> 
> It could be cleared by setting bit 0, such as rtl_tally_reset() of r8152.

Thanks for this hint.  I'll give it a try.  Is it safe to assume that
this is implemented in all NICs covered by r8169?  Otherwise it's more
safe to use offset values as in my patch, I think.


Thanks,
Corinna
Corinna Vinschen Aug. 19, 2015, 1:07 p.m. UTC | #5
On Aug 19 09:31, Hayes Wang wrote:
> Corinna Vinschen [mailto:vinschen@redhat.com]
> > Sent: Wednesday, August 19, 2015 5:13 PM
> [...]
> > > It could be cleared by setting bit 0, such as rtl_tally_reset() of r8152.
> > 
> > Is it safe to assume that this is implemented in all NICs covered by r8169? 
> 
> It is supported from RTL8111C. That is, RTL_GIGA_MAC_VER_19 and later.

Thanks.  In that case I would prefer the same generic method for all
chip versions, so I'd opt for storing the offset values at rtl_open
time as my patch is doing right now.  Is that acceptable?

If so, wouldn't it make even more sense to use the hardware collected
information in @get_stats64 throughout, except for the numbers collected
*only* in software?  

I would be willing to propose a matching patch.


Thanks,
Corinna
Corinna Vinschen Aug. 19, 2015, 2:24 p.m. UTC | #6
On Aug 19 15:07, Corinna Vinschen wrote:
> On Aug 19 09:31, Hayes Wang wrote:
> > Corinna Vinschen [mailto:vinschen@redhat.com]
> > > Sent: Wednesday, August 19, 2015 5:13 PM
> > [...]
> > > > It could be cleared by setting bit 0, such as rtl_tally_reset() of r8152.
> > > 
> > > Is it safe to assume that this is implemented in all NICs covered by r8169? 
> > 
> > It is supported from RTL8111C. That is, RTL_GIGA_MAC_VER_19 and later.
> 
> Thanks.  In that case I would prefer the same generic method for all
> chip versions, so I'd opt for storing the offset values at rtl_open
> time as my patch is doing right now.  Is that acceptable?
> 
> If so, wouldn't it make even more sense to use the hardware collected
> information in @get_stats64 throughout, except for the numbers collected
> *only* in software?  
> 
> I would be willing to propose a matching patch.

It just occured to me that the combination of resetting the counters on
post-RTL_GIGA_MAC_VER_19 chips plus offset handling would be quite
nice, because it would reset also the small 16 and 32 bit counters.

So I'd like to propose a patch which combines both techniques, if that's
an acceptable way to go forward.

Btw., does setting the reset bit in CounterAddrLow work the same way as
setting the CounterDump flag?  I.e, does the driver have to wait for the
hardware to set the bit to 0 again to be sure the reset is finished?


Thanks in advance,
Corinna
diff mbox

Patch

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index f790f61..e7c7955 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -747,6 +747,14 @@  struct rtl8169_counters {
 	__le16	tx_underun;
 };
 
+struct rtl8169_tc_offsets {
+	bool	inited;
+	__le64	tx_errors;
+	__le32	tx_multi_collision;
+	__le32	rx_multicast;
+	__le16	tx_aborted;
+};
+
 enum rtl_flag {
 	RTL_FLAG_TASK_ENABLED,
 	RTL_FLAG_TASK_SLOW_PENDING,
@@ -824,6 +832,7 @@  struct rtl8169_private {
 
 	struct mii_if_info mii;
 	struct rtl8169_counters counters;
+	struct rtl8169_tc_offsets tc_offset;
 	u32 saved_wolopts;
 	u32 opts1_mask;
 
@@ -2220,6 +2229,38 @@  static void rtl8169_update_counters(struct net_device *dev)
 	dma_free_coherent(d, sizeof(*counters), counters, paddr);
 }
 
+static void rtl8169_init_tc_counter_offset(struct net_device *dev)
+{
+	struct rtl8169_private *tp = netdev_priv(dev);
+
+	/*
+	 * rtl8169_init_tc_counter_offset is called from rtl_open.  The tally
+	 * counters are only reset by a power cycle, while the counter values
+	 * collected by the driver are reset at every driver unload/load cycle.
+	 * There's also no (documented?) way to reset the tally counters
+	 * programatically.
+	 *
+	 * To make sure the HW values returned by @get_stats64 match the SW
+	 * values, we collect the initial values at first open(*) and use them
+	 * as offsets to normalize the values returned by @get_stats64.
+	 *
+	 * (*) We can't call rtl8169_init_tc_counter_offset from rtl_init_one
+	 * for the reason stated in rtl8169_update_counters; CmdRxEnb is only
+	 * set at open time by rtl_hw_start.
+	 */
+
+	if (tp->tc_offset.inited)
+		return;
+
+	rtl8169_update_counters(dev);
+
+	tp->tc_offset.tx_errors = tp->counters.tx_errors;
+	tp->tc_offset.tx_multi_collision = tp->counters.tx_multi_collision;
+	tp->tc_offset.rx_multicast = tp->counters.rx_multicast;
+	tp->tc_offset.tx_aborted = tp->counters.tx_aborted;
+	tp->tc_offset.inited = true;
+}
+
 static void rtl8169_get_ethtool_stats(struct net_device *dev,
 				      struct ethtool_stats *stats, u64 *data)
 {
@@ -7631,6 +7672,8 @@  static int rtl_open(struct net_device *dev)
 
 	rtl_hw_start(dev);
 
+	rtl8169_init_tc_counter_offset(dev);
+
 	netif_start_queue(dev);
 
 	rtl_unlock_work(tp);
@@ -7689,6 +7732,25 @@  rtl8169_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
 	stats->rx_fifo_errors	= dev->stats.rx_fifo_errors;
 	stats->rx_missed_errors = dev->stats.rx_missed_errors;
 
+	/*
+	 * Fetch additonal counter values missing in stats collected by driver
+	 * from tally counters.
+	 */
+	rtl8169_update_counters(dev);
+
+	/*
+	 * Subtract values fetched during initalization.
+	 * See rtl8169_init_tc_counter_offset for a description why we do that.
+	 */
+	stats->tx_errors = le64_to_cpu(tp->counters.tx_errors) -
+		le64_to_cpu(tp->tc_offset.tx_errors);
+	stats->collisions = le32_to_cpu(tp->counters.tx_multi_collision) -
+		le32_to_cpu(tp->tc_offset.tx_multi_collision);
+	stats->multicast = le32_to_cpu(tp->counters.rx_multicast) -
+		le32_to_cpu(tp->tc_offset.rx_multicast);
+	stats->tx_aborted_errors = le16_to_cpu(tp->counters.tx_aborted) -
+		le16_to_cpu(tp->tc_offset.tx_aborted);
+
 	return stats;
 }