[3/3] Revert "e1000e: Do not read ICR in Other interrupt"

Message ID	20180126091236.13044-4-bpoirier@suse.com
State	Awaiting Upstream, archived
Delegated to:	David Miller
Headers	show Return-Path: <netdev-owner@vger.kernel.org> From: Benjamin Poirier <bpoirier@suse.com> To: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Alexander Duyck <alexander.duyck@gmail.com>, intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 3/3] Revert "e1000e: Do not read ICR in Other interrupt" Date: Fri, 26 Jan 2018 18:12:36 +0900 Message-Id: <20180126091236.13044-4-bpoirier@suse.com> In-Reply-To: <20180126091236.13044-1-bpoirier@suse.com> References: <20180126091236.13044-1-bpoirier@suse.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk
Series	e1000e: Revert interrupt handling changes \| expand [0/3] e1000e: Revert interrupt handling changes [1/3] Partial revert "e1000e: Avoid receiver overrun interrupt bursts" [2/3] Revert "e1000e: Separate signaling for link check/link up" [3/3] Revert "e1000e: Do not read ICR in Other interrupt"

Message ID

20180126091236.13044-4-bpoirier@suse.com

State

Awaiting Upstream, archived

Delegated to:

David Miller

Headers

From: Benjamin Poirier <bpoirier@suse.com>
To: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>,
	intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH 3/3] Revert "e1000e: Do not read ICR in Other interrupt"
Date: Fri, 26 Jan 2018 18:12:36 +0900
Message-Id: <20180126091236.13044-4-bpoirier@suse.com>
In-Reply-To: <20180126091236.13044-1-bpoirier@suse.com>
References: <20180126091236.13044-1-bpoirier@suse.com>
Sender: netdev-owner@vger.kernel.org
Precedence: bulk

Series

e1000e: Revert interrupt handling changes | expand

Commit Message

Benjamin Poirier Jan. 26, 2018, 9:12 a.m. UTC

This reverts commit 16ecba59bc333d6282ee057fb02339f77a880beb.

It was reported that emulated e1000e devices in vmware esxi 6.5 Build
7526125 do not link up after commit 4aea7a5c5e94 ("e1000e: Avoid receiver
overrun interrupt bursts"). Some tracing shows that after
e1000e_trigger_lsc() is called, ICR reads out as 0x0 in e1000_msix_other()
on emulated e1000e devices. In comparison, on real e1000e 82574 hardware,
icr=0x80000004 (_INT_ASSERTED | _LSC) in the same situation.

Some experimentation showed that this flaw in vmware e1000e emulation can
be worked around by not setting Other in EIAC. This is how it was before
commit 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt").

Since the ICR read in the Other interrupt handler has already been
restored, this patch effectively reverts the remainder of commit
16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt").

Fixes: 4aea7a5c5e94 ("e1000e: Avoid receiver overrun interrupt bursts")
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

Comments

Alexander Duyck Jan. 26, 2018, 9:01 p.m. UTC | #1

On Fri, Jan 26, 2018 at 1:12 AM, Benjamin Poirier <bpoirier@suse.com> wrote:
> This reverts commit 16ecba59bc333d6282ee057fb02339f77a880beb.
>
> It was reported that emulated e1000e devices in vmware esxi 6.5 Build
> 7526125 do not link up after commit 4aea7a5c5e94 ("e1000e: Avoid receiver
> overrun interrupt bursts"). Some tracing shows that after
> e1000e_trigger_lsc() is called, ICR reads out as 0x0 in e1000_msix_other()
> on emulated e1000e devices. In comparison, on real e1000e 82574 hardware,
> icr=0x80000004 (_INT_ASSERTED | _LSC) in the same situation.
>
> Some experimentation showed that this flaw in vmware e1000e emulation can
> be worked around by not setting Other in EIAC. This is how it was before
> commit 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt").
>
> Since the ICR read in the Other interrupt handler has already been
> restored, this patch effectively reverts the remainder of commit
> 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt").
>
> Fixes: 4aea7a5c5e94 ("e1000e: Avoid receiver overrun interrupt bursts")
> Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
> ---
>  drivers/net/ethernet/intel/e1000e/netdev.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
> index ed103b9a8d3a..fffc1f0e3895 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -1916,6 +1916,13 @@ static irqreturn_t e1000_msix_other(int __always_unused irq, void *data)
>         struct e1000_hw *hw = &adapter->hw;
>         u32 icr = er32(ICR);
>
> +       /* Certain events (such as RXO) which trigger Other do not set
> +        * INT_ASSERTED. In that case, read to clear of icr does not take
> +        * place.
> +        */
> +       if (!(icr & E1000_ICR_INT_ASSERTED))
> +               ew32(ICR, E1000_ICR_OTHER);
> +

This piece doesn't make sense to me. Why are we clearing OTHER if
ICR_INT_ASSERTED is not set? The original code that was removed was in
commit 4d432f67ff00 "e1000e: Remove unreachable code" was setting IMS
and returning, not clearing the ICR register. I would argue that the
code is probably unreachable and if we just have the checks for OTHER
and LSC then we should be taking care of all of this in the task
anyway. All this code the code in the original was doing was
re-enabling the interrupt via IMS so we probably don't need this bit
as long as we are clearing OTHER and LSC when they are set so that we
can get future interrupts.

>         if (icr & adapter->eiac_mask)
>                 ew32(ICS, (icr & adapter->eiac_mask));
>
> @@ -2033,7 +2040,6 @@ static void e1000_configure_msix(struct e1000_adapter *adapter)
>                        hw->hw_addr + E1000_EITR_82574(vector));
>         else
>                 writel(1, hw->hw_addr + E1000_EITR_82574(vector));
> -       adapter->eiac_mask |= E1000_IMS_OTHER;
>
>         /* Cause Tx interrupts on every write back */
>         ivar |= BIT(31);
> @@ -2258,7 +2264,7 @@ static void e1000_irq_enable(struct e1000_adapter *adapter)
>
>         if (adapter->msix_entries) {
>                 ew32(EIAC_82574, adapter->eiac_mask & E1000_EIAC_MASK_82574);
> -               ew32(IMS, adapter->eiac_mask | E1000_IMS_LSC);
> +               ew32(IMS, adapter->eiac_mask | E1000_IMS_OTHER | E1000_IMS_LSC);
>         } else if (hw->mac.type >= e1000_pch_lpt) {
>                 ew32(IMS, IMS_ENABLE_MASK | E1000_IMS_ECCER);
>         } else {
> --
> 2.15.1
>

Benjamin Poirier Jan. 29, 2018, 7:28 a.m. UTC | #2

On 2018/01/26 13:01, Alexander Duyck wrote:
> On Fri, Jan 26, 2018 at 1:12 AM, Benjamin Poirier <bpoirier@suse.com> wrote:
> > This reverts commit 16ecba59bc333d6282ee057fb02339f77a880beb.
> >
> > It was reported that emulated e1000e devices in vmware esxi 6.5 Build
> > 7526125 do not link up after commit 4aea7a5c5e94 ("e1000e: Avoid receiver
> > overrun interrupt bursts"). Some tracing shows that after
> > e1000e_trigger_lsc() is called, ICR reads out as 0x0 in e1000_msix_other()
> > on emulated e1000e devices. In comparison, on real e1000e 82574 hardware,
> > icr=0x80000004 (_INT_ASSERTED | _LSC) in the same situation.
> >
> > Some experimentation showed that this flaw in vmware e1000e emulation can
> > be worked around by not setting Other in EIAC. This is how it was before
> > commit 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt").
> >
> > Since the ICR read in the Other interrupt handler has already been
> > restored, this patch effectively reverts the remainder of commit
> > 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt").
> >
> > Fixes: 4aea7a5c5e94 ("e1000e: Avoid receiver overrun interrupt bursts")
> > Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
> > ---
> >  drivers/net/ethernet/intel/e1000e/netdev.c | 10 ++++++++--
> >  1 file changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
> > index ed103b9a8d3a..fffc1f0e3895 100644
> > --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> > +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> > @@ -1916,6 +1916,13 @@ static irqreturn_t e1000_msix_other(int __always_unused irq, void *data)
> >         struct e1000_hw *hw = &adapter->hw;
> >         u32 icr = er32(ICR);
> >
> > +       /* Certain events (such as RXO) which trigger Other do not set
> > +        * INT_ASSERTED. In that case, read to clear of icr does not take
> > +        * place.
> > +        */
> > +       if (!(icr & E1000_ICR_INT_ASSERTED))
> > +               ew32(ICR, E1000_ICR_OTHER);
> > +
> 
> This piece doesn't make sense to me. Why are we clearing OTHER if
> ICR_INT_ASSERTED is not set?

Datasheet §10.2.4.1 ("Interrupt Cause Read Register") says that ICR read
to clear only occurs if INT_ASSERTED is set. This corresponds to what I
observed.

However, while working on these issues, I noticed that when there is an rxo
event, INT_ASSERTED is not always set even though the interrupt is raised. I
think this is a hardware flaw.

For example, if doing
ew32(ICS, E1000_ICS_LSC | E1000_ICS_OTHER);
we enter e1000_msix_other() and two consecutive reads of ICR result in
0x81000004
0x00000000

If doing
ew32(ICS, E1000_ICS_RXO | E1000_ICS_OTHER);
we enter e1000_msix_other() and two consecutive reads of ICR result in
0x01000041
0x01000041

Consequently, we must clear OTHER manually from ICR, otherwise the
interrupt is immediately re-raised after exiting the handler.

These observations are the same whether the interrupt is triggered via a
write to ICS or in hardware.

Furthermore, I tested that this behavior is the same for other Other
events (MDAC, SRPD, ACK, MNG). Those were tested via a write to ICS
only, not in hardware.

This is a version of the test patch that I used to trigger lsc and rxo in
software and hardware. It applies over this patch series.

diff --git a/drivers/net/ethernet/intel/e1000e/defines.h b/drivers/net/ethernet/intel/e1000e/defines.h
index 0641c0098738..f54e7ac9c934 100644
--- a/drivers/net/ethernet/intel/e1000e/defines.h
+++ b/drivers/net/ethernet/intel/e1000e/defines.h
@@ -398,6 +398,7 @@
 #define E1000_ICR_LSC           0x00000004 /* Link Status Change */
 #define E1000_ICR_RXSEQ         0x00000008 /* Rx sequence error */
 #define E1000_ICR_RXDMT0        0x00000010 /* Rx desc min. threshold (0) */
+#define E1000_ICR_RXO           0x00000040 /* rx overrun */
 #define E1000_ICR_RXT0          0x00000080 /* Rx timer intr (ring 0) */
 #define E1000_ICR_ECCER         0x00400000 /* Uncorrectable ECC Error */
 /* If this bit asserted, the driver should claim the interrupt */
diff --git a/drivers/net/ethernet/intel/e1000e/ethtool.c b/drivers/net/ethernet/intel/e1000e/ethtool.c
index 003cbd605799..4933c1beac74 100644
--- a/drivers/net/ethernet/intel/e1000e/ethtool.c
+++ b/drivers/net/ethernet/intel/e1000e/ethtool.c
@@ -1802,98 +1802,20 @@ static void e1000_diag_test(struct net_device *netdev,
 			    struct ethtool_test *eth_test, u64 *data)
 {
 	struct e1000_adapter *adapter = netdev_priv(netdev);
-	u16 autoneg_advertised;
-	u8 forced_speed_duplex;
-	u8 autoneg;
-	bool if_running = netif_running(netdev);
+	struct e1000_hw *hw = &adapter->hw;
 
 	pm_runtime_get_sync(netdev->dev.parent);
 
 	set_bit(__E1000_TESTING, &adapter->state);
 
-	if (!if_running) {
-		/* Get control of and reset hardware */
-		if (adapter->flags & FLAG_HAS_AMT)
-			e1000e_get_hw_control(adapter);
-
-		e1000e_power_up_phy(adapter);
-
-		adapter->hw.phy.autoneg_wait_to_complete = 1;
-		e1000e_reset(adapter);
-		adapter->hw.phy.autoneg_wait_to_complete = 0;
-	}
-
 	if (eth_test->flags == ETH_TEST_FL_OFFLINE) {
-		/* Offline tests */
-
-		/* save speed, duplex, autoneg settings */
-		autoneg_advertised = adapter->hw.phy.autoneg_advertised;
-		forced_speed_duplex = adapter->hw.mac.forced_speed_duplex;
-		autoneg = adapter->hw.mac.autoneg;
-
-		e_info("offline testing starting\n");
-
-		if (if_running)
-			/* indicate we're in test mode */
-			e1000e_close(netdev);
-
-		if (e1000_reg_test(adapter, &data[0]))
-			eth_test->flags |= ETH_TEST_FL_FAILED;
-
-		e1000e_reset(adapter);
-		if (e1000_eeprom_test(adapter, &data[1]))
-			eth_test->flags |= ETH_TEST_FL_FAILED;
-
-		e1000e_reset(adapter);
-		if (e1000_intr_test(adapter, &data[2]))
-			eth_test->flags |= ETH_TEST_FL_FAILED;
-
-		e1000e_reset(adapter);
-		if (e1000_loopback_test(adapter, &data[3]))
-			eth_test->flags |= ETH_TEST_FL_FAILED;
-
-		/* force this routine to wait until autoneg complete/timeout */
-		adapter->hw.phy.autoneg_wait_to_complete = 1;
-		e1000e_reset(adapter);
-		adapter->hw.phy.autoneg_wait_to_complete = 0;
-
-		if (e1000_link_test(adapter, &data[4]))
-			eth_test->flags |= ETH_TEST_FL_FAILED;
-
-		/* restore speed, duplex, autoneg settings */
-		adapter->hw.phy.autoneg_advertised = autoneg_advertised;
-		adapter->hw.mac.forced_speed_duplex = forced_speed_duplex;
-		adapter->hw.mac.autoneg = autoneg;
-		e1000e_reset(adapter);
-
-		clear_bit(__E1000_TESTING, &adapter->state);
-		if (if_running)
-			e1000e_open(netdev);
+		// LSC, RXO, MDAC, SRPD, ACK, MNG
+		ew32(ICS, E1000_ICR_RXO | E1000_ICR_OTHER);
 	} else {
-		/* Online tests */
-
-		e_info("online testing starting\n");
-
-		/* register, eeprom, intr and loopback tests not run online */
-		data[0] = 0;
-		data[1] = 0;
-		data[2] = 0;
-		data[3] = 0;
-
-		if (e1000_link_test(adapter, &data[4]))
-			eth_test->flags |= ETH_TEST_FL_FAILED;
-
-		clear_bit(__E1000_TESTING, &adapter->state);
-	}
-
-	if (!if_running) {
-		e1000e_reset(adapter);
-
-		if (adapter->flags & FLAG_HAS_AMT)
-			e1000e_release_hw_control(adapter);
+		ew32(ICS, E1000_ICR_LSC | E1000_ICR_OTHER);
 	}
 
-	msleep_interruptible(4 * 1000);
+	clear_bit(__E1000_TESTING, &adapter->state);
 
 	pm_runtime_put_sync(netdev->dev.parent);
 }
diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index fffc1f0e3895..5b3a0feaf052 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -46,6 +46,10 @@
 
 #include "e1000.h"
 
+DEFINE_RATELIMIT_STATE(rx_ratelimit_state, 2 * HZ, 1);
+DEFINE_RATELIMIT_STATE(other_ratelimit_state, 2 * HZ, 1);
+DEFINE_RATELIMIT_STATE(other_ratelimit_state2, 2 * HZ, 1);
+
 #define DRV_EXTRAVERSION "-k"
 
 #define DRV_VERSION "3.2.6" DRV_EXTRAVERSION
@@ -936,6 +940,9 @@ static bool e1000_clean_rx_irq(struct e1000_ring *rx_ring, int *work_done,
 	int cleaned_count = 0;
 	bool cleaned = false;
 	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
+	static unsigned int count;
+
+	mdelay(10);
 
 	i = rx_ring->next_to_clean;
 	rx_desc = E1000_RX_DESC_EXT(*rx_ring, i);
@@ -1067,6 +1074,16 @@ static bool e1000_clean_rx_irq(struct e1000_ring *rx_ring, int *work_done,
 
 	adapter->total_rx_bytes += total_rx_bytes;
 	adapter->total_rx_packets += total_rx_packets;
+
+	count++;
+	if (__ratelimit(&rx_ratelimit_state)) {
+		static unsigned int max;
+		max = max(max, total_rx_packets);
+		trace_printk("rx %u now, max %u, %u rounds\n",
+			     total_rx_packets, max, count);
+		count = 0;
+	}
+
 	return cleaned;
 }
 
@@ -1914,14 +1931,30 @@ static irqreturn_t e1000_msix_other(int __always_unused irq, void *data)
 	struct net_device *netdev = data;
 	struct e1000_adapter *adapter = netdev_priv(netdev);
 	struct e1000_hw *hw = &adapter->hw;
-	u32 icr = er32(ICR);
+	static unsigned int count;
+	u32 icr2, icr = er32(ICR);
 
 	/* Certain events (such as RXO) which trigger Other do not set
 	 * INT_ASSERTED. In that case, read to clear of icr does not take
 	 * place.
 	 */
+	/*
 	if (!(icr & E1000_ICR_INT_ASSERTED))
 		ew32(ICR, E1000_ICR_OTHER);
+	*/
+
+	icr2 = er32(ICR);
+
+	count++;
+	if (__ratelimit(&other_ratelimit_state)) {
+		trace_printk("icr 0x%08x icr2 0x%08x count %u\n", icr, icr2,
+			     count);
+		count = 0;
+	}
+	if (icr & E1000_ICR_RXO && icr & E1000_ICR_INT_ASSERTED &&
+	    __ratelimit(&other_ratelimit_state2)) {
+		trace_printk("special icr 0x%08x icr2 0x%08x\n", icr, icr2);
+	}
 
 	if (icr & adapter->eiac_mask)
 		ew32(ICS, (icr & adapter->eiac_mask));

Alexander Duyck Jan. 29, 2018, 5:22 p.m. UTC | #3

On Sun, Jan 28, 2018 at 11:28 PM, Benjamin Poirier <bpoirier@suse.com> wrote:
> On 2018/01/26 13:01, Alexander Duyck wrote:
>> On Fri, Jan 26, 2018 at 1:12 AM, Benjamin Poirier <bpoirier@suse.com> wrote:
>> > This reverts commit 16ecba59bc333d6282ee057fb02339f77a880beb.
>> >
>> > It was reported that emulated e1000e devices in vmware esxi 6.5 Build
>> > 7526125 do not link up after commit 4aea7a5c5e94 ("e1000e: Avoid receiver
>> > overrun interrupt bursts"). Some tracing shows that after
>> > e1000e_trigger_lsc() is called, ICR reads out as 0x0 in e1000_msix_other()
>> > on emulated e1000e devices. In comparison, on real e1000e 82574 hardware,
>> > icr=0x80000004 (_INT_ASSERTED | _LSC) in the same situation.
>> >
>> > Some experimentation showed that this flaw in vmware e1000e emulation can
>> > be worked around by not setting Other in EIAC. This is how it was before
>> > commit 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt").
>> >
>> > Since the ICR read in the Other interrupt handler has already been
>> > restored, this patch effectively reverts the remainder of commit
>> > 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt").
>> >
>> > Fixes: 4aea7a5c5e94 ("e1000e: Avoid receiver overrun interrupt bursts")
>> > Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
>> > ---
>> >  drivers/net/ethernet/intel/e1000e/netdev.c | 10 ++++++++--
>> >  1 file changed, 8 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
>> > index ed103b9a8d3a..fffc1f0e3895 100644
>> > --- a/drivers/net/ethernet/intel/e1000e/netdev.c
>> > +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
>> > @@ -1916,6 +1916,13 @@ static irqreturn_t e1000_msix_other(int __always_unused irq, void *data)
>> >         struct e1000_hw *hw = &adapter->hw;
>> >         u32 icr = er32(ICR);
>> >
>> > +       /* Certain events (such as RXO) which trigger Other do not set
>> > +        * INT_ASSERTED. In that case, read to clear of icr does not take
>> > +        * place.
>> > +        */
>> > +       if (!(icr & E1000_ICR_INT_ASSERTED))
>> > +               ew32(ICR, E1000_ICR_OTHER);
>> > +
>>
>> This piece doesn't make sense to me. Why are we clearing OTHER if
>> ICR_INT_ASSERTED is not set?
>
> Datasheet §10.2.4.1 ("Interrupt Cause Read Register") says that ICR read
> to clear only occurs if INT_ASSERTED is set. This corresponds to what I
> observed.
>
> However, while working on these issues, I noticed that when there is an rxo
> event, INT_ASSERTED is not always set even though the interrupt is raised. I
> think this is a hardware flaw.

I agree. I need to check with our silicon team to see what we can determine.

> For example, if doing
> ew32(ICS, E1000_ICS_LSC | E1000_ICS_OTHER);
> we enter e1000_msix_other() and two consecutive reads of ICR result in
> 0x81000004
> 0x00000000
>
> If doing
> ew32(ICS, E1000_ICS_RXO | E1000_ICS_OTHER);
> we enter e1000_msix_other() and two consecutive reads of ICR result in
> 0x01000041
> 0x01000041

This is interesting. So the ICR is doing the clear on read, so that
answers the question I had about the earlier patch.

One thought on this.. Is there any reason why you are limiting this to
only the OTHER bit? It seems like RXO and the other causes that aren't
supposed to be included in the mask should probably be cleared as
well, are they auto-cleared, ignored, or is there some advantage to
leaving them set?

> Consequently, we must clear OTHER manually from ICR, otherwise the
> interrupt is immediately re-raised after exiting the handler.
>
> These observations are the same whether the interrupt is triggered via a
> write to ICS or in hardware.
>
> Furthermore, I tested that this behavior is the same for other Other
> events (MDAC, SRPD, ACK, MNG). Those were tested via a write to ICS
> only, not in hardware.
>
> This is a version of the test patch that I used to trigger lsc and rxo in
> software and hardware. It applies over this patch series.

I plan to look into this some more over the next few days. Ideally if
we could mask these "OTHER" interrupts besides the LSC we could comply
with all the needed bits for MSI-X. My concern is that we are still
stuck reading the ICR at this point because of this and it is going to
make dealing with MSI-X challenging on 82574 since it seems like the
intention was that you weren't supposed to be reading the ICR when
MSI-X is enabled based on the list of current issues and HW errata.

At this point it seems like the interrupts is firing and the
INT_ASSERTED is all we really need to be checking for if I understand
this all correctly. Basically if LSC is set it will trigger OTHER and
INT_ASSERTED, if any of the other causes are set they are only setting
OTHER.

> diff --git a/drivers/net/ethernet/intel/e1000e/defines.h b/drivers/net/ethernet/intel/e1000e/defines.h
> index 0641c0098738..f54e7ac9c934 100644
> --- a/drivers/net/ethernet/intel/e1000e/defines.h
> +++ b/drivers/net/ethernet/intel/e1000e/defines.h
> @@ -398,6 +398,7 @@
>  #define E1000_ICR_LSC           0x00000004 /* Link Status Change */
>  #define E1000_ICR_RXSEQ         0x00000008 /* Rx sequence error */
>  #define E1000_ICR_RXDMT0        0x00000010 /* Rx desc min. threshold (0) */
> +#define E1000_ICR_RXO           0x00000040 /* rx overrun */
>  #define E1000_ICR_RXT0          0x00000080 /* Rx timer intr (ring 0) */
>  #define E1000_ICR_ECCER         0x00400000 /* Uncorrectable ECC Error */
>  /* If this bit asserted, the driver should claim the interrupt */
> diff --git a/drivers/net/ethernet/intel/e1000e/ethtool.c b/drivers/net/ethernet/intel/e1000e/ethtool.c
> index 003cbd605799..4933c1beac74 100644
> --- a/drivers/net/ethernet/intel/e1000e/ethtool.c
> +++ b/drivers/net/ethernet/intel/e1000e/ethtool.c
> @@ -1802,98 +1802,20 @@ static void e1000_diag_test(struct net_device *netdev,
>                             struct ethtool_test *eth_test, u64 *data)
>  {
>         struct e1000_adapter *adapter = netdev_priv(netdev);
> -       u16 autoneg_advertised;
> -       u8 forced_speed_duplex;
> -       u8 autoneg;
> -       bool if_running = netif_running(netdev);
> +       struct e1000_hw *hw = &adapter->hw;
>
>         pm_runtime_get_sync(netdev->dev.parent);
>
>         set_bit(__E1000_TESTING, &adapter->state);
>
> -       if (!if_running) {
> -               /* Get control of and reset hardware */
> -               if (adapter->flags & FLAG_HAS_AMT)
> -                       e1000e_get_hw_control(adapter);
> -
> -               e1000e_power_up_phy(adapter);
> -
> -               adapter->hw.phy.autoneg_wait_to_complete = 1;
> -               e1000e_reset(adapter);
> -               adapter->hw.phy.autoneg_wait_to_complete = 0;
> -       }
> -
>         if (eth_test->flags == ETH_TEST_FL_OFFLINE) {
> -               /* Offline tests */
> -
> -               /* save speed, duplex, autoneg settings */
> -               autoneg_advertised = adapter->hw.phy.autoneg_advertised;
> -               forced_speed_duplex = adapter->hw.mac.forced_speed_duplex;
> -               autoneg = adapter->hw.mac.autoneg;
> -
> -               e_info("offline testing starting\n");
> -
> -               if (if_running)
> -                       /* indicate we're in test mode */
> -                       e1000e_close(netdev);
> -
> -               if (e1000_reg_test(adapter, &data[0]))
> -                       eth_test->flags |= ETH_TEST_FL_FAILED;
> -
> -               e1000e_reset(adapter);
> -               if (e1000_eeprom_test(adapter, &data[1]))
> -                       eth_test->flags |= ETH_TEST_FL_FAILED;
> -
> -               e1000e_reset(adapter);
> -               if (e1000_intr_test(adapter, &data[2]))
> -                       eth_test->flags |= ETH_TEST_FL_FAILED;
> -
> -               e1000e_reset(adapter);
> -               if (e1000_loopback_test(adapter, &data[3]))
> -                       eth_test->flags |= ETH_TEST_FL_FAILED;
> -
> -               /* force this routine to wait until autoneg complete/timeout */
> -               adapter->hw.phy.autoneg_wait_to_complete = 1;
> -               e1000e_reset(adapter);
> -               adapter->hw.phy.autoneg_wait_to_complete = 0;
> -
> -               if (e1000_link_test(adapter, &data[4]))
> -                       eth_test->flags |= ETH_TEST_FL_FAILED;
> -
> -               /* restore speed, duplex, autoneg settings */
> -               adapter->hw.phy.autoneg_advertised = autoneg_advertised;
> -               adapter->hw.mac.forced_speed_duplex = forced_speed_duplex;
> -               adapter->hw.mac.autoneg = autoneg;
> -               e1000e_reset(adapter);
> -
> -               clear_bit(__E1000_TESTING, &adapter->state);
> -               if (if_running)
> -                       e1000e_open(netdev);
> +               // LSC, RXO, MDAC, SRPD, ACK, MNG
> +               ew32(ICS, E1000_ICR_RXO | E1000_ICR_OTHER);
>         } else {
> -               /* Online tests */
> -
> -               e_info("online testing starting\n");
> -
> -               /* register, eeprom, intr and loopback tests not run online */
> -               data[0] = 0;
> -               data[1] = 0;
> -               data[2] = 0;
> -               data[3] = 0;
> -
> -               if (e1000_link_test(adapter, &data[4]))
> -                       eth_test->flags |= ETH_TEST_FL_FAILED;
> -
> -               clear_bit(__E1000_TESTING, &adapter->state);
> -       }
> -
> -       if (!if_running) {
> -               e1000e_reset(adapter);
> -
> -               if (adapter->flags & FLAG_HAS_AMT)
> -                       e1000e_release_hw_control(adapter);
> +               ew32(ICS, E1000_ICR_LSC | E1000_ICR_OTHER);
>         }
>
> -       msleep_interruptible(4 * 1000);
> +       clear_bit(__E1000_TESTING, &adapter->state);
>
>         pm_runtime_put_sync(netdev->dev.parent);
>  }
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
> index fffc1f0e3895..5b3a0feaf052 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -46,6 +46,10 @@
>
>  #include "e1000.h"
>
> +DEFINE_RATELIMIT_STATE(rx_ratelimit_state, 2 * HZ, 1);
> +DEFINE_RATELIMIT_STATE(other_ratelimit_state, 2 * HZ, 1);
> +DEFINE_RATELIMIT_STATE(other_ratelimit_state2, 2 * HZ, 1);
> +
>  #define DRV_EXTRAVERSION "-k"
>
>  #define DRV_VERSION "3.2.6" DRV_EXTRAVERSION
> @@ -936,6 +940,9 @@ static bool e1000_clean_rx_irq(struct e1000_ring *rx_ring, int *work_done,
>         int cleaned_count = 0;
>         bool cleaned = false;
>         unsigned int total_rx_bytes = 0, total_rx_packets = 0;
> +       static unsigned int count;
> +
> +       mdelay(10);
>
>         i = rx_ring->next_to_clean;
>         rx_desc = E1000_RX_DESC_EXT(*rx_ring, i);
> @@ -1067,6 +1074,16 @@ static bool e1000_clean_rx_irq(struct e1000_ring *rx_ring, int *work_done,
>
>         adapter->total_rx_bytes += total_rx_bytes;
>         adapter->total_rx_packets += total_rx_packets;
> +
> +       count++;
> +       if (__ratelimit(&rx_ratelimit_state)) {
> +               static unsigned int max;
> +               max = max(max, total_rx_packets);
> +               trace_printk("rx %u now, max %u, %u rounds\n",
> +                            total_rx_packets, max, count);
> +               count = 0;
> +       }
> +
>         return cleaned;
>  }
>
> @@ -1914,14 +1931,30 @@ static irqreturn_t e1000_msix_other(int __always_unused irq, void *data)
>         struct net_device *netdev = data;
>         struct e1000_adapter *adapter = netdev_priv(netdev);
>         struct e1000_hw *hw = &adapter->hw;
> -       u32 icr = er32(ICR);
> +       static unsigned int count;
> +       u32 icr2, icr = er32(ICR);
>
>         /* Certain events (such as RXO) which trigger Other do not set
>          * INT_ASSERTED. In that case, read to clear of icr does not take
>          * place.
>          */
> +       /*
>         if (!(icr & E1000_ICR_INT_ASSERTED))
>                 ew32(ICR, E1000_ICR_OTHER);
> +       */
> +
> +       icr2 = er32(ICR);
> +
> +       count++;
> +       if (__ratelimit(&other_ratelimit_state)) {
> +               trace_printk("icr 0x%08x icr2 0x%08x count %u\n", icr, icr2,
> +                            count);
> +               count = 0;
> +       }
> +       if (icr & E1000_ICR_RXO && icr & E1000_ICR_INT_ASSERTED &&
> +           __ratelimit(&other_ratelimit_state2)) {
> +               trace_printk("special icr 0x%08x icr2 0x%08x\n", icr, icr2);
> +       }
>
>         if (icr & adapter->eiac_mask)
>                 ew32(ICS, (icr & adapter->eiac_mask));

Benjamin Poirier Feb. 8, 2018, 6:40 a.m. UTC | #4

On 2018/01/29 09:22, Alexander Duyck wrote:
[...]
> 
> > Consequently, we must clear OTHER manually from ICR, otherwise the
> > interrupt is immediately re-raised after exiting the handler.
> >
> > These observations are the same whether the interrupt is triggered via a
> > write to ICS or in hardware.
> >
> > Furthermore, I tested that this behavior is the same for other Other
> > events (MDAC, SRPD, ACK, MNG). Those were tested via a write to ICS
> > only, not in hardware.
> >
> > This is a version of the test patch that I used to trigger lsc and rxo in
> > software and hardware. It applies over this patch series.
> 
> I plan to look into this some more over the next few days. Ideally if
> we could mask these "OTHER" interrupts besides the LSC we could comply
> with all the needed bits for MSI-X. My concern is that we are still
> stuck reading the ICR at this point because of this and it is going to
> make dealing with MSI-X challenging on 82574 since it seems like the
> intention was that you weren't supposed to be reading the ICR when
> MSI-X is enabled based on the list of current issues and HW errata.

I totally agree with you that it looks like the msi-x interface was
designed so you don't need to read icr. That's also why I was happy to
go that direction with the (now infamous) commit 16ecba59bc33 ("e1000e:
Do not read ICR in Other interrupt", v4.5-rc1).

However, we looked at it before and there seems to be no way to mask
individual Other interrupt causes (masking rxo but getting lsc). Because
of that, I think we have to keep reading icr in the Other interrupt
handler.

> 
> At this point it seems like the interrupts is firing and the
> INT_ASSERTED is all we really need to be checking for if I understand
> this all correctly. Basically if LSC is set it will trigger OTHER and
> INT_ASSERTED, if any of the other causes are set they are only setting
> OTHER.

I think that's right and it's related to the fact that currently LSC is
set in IMS but not the other causes. Since we have to read icr (as I
wrote above) but we want to avoid reading it without INT_ASSERTED set
(as per errata 12) the solution will be to set all of the causes related
to Other in IMS. Patches incoming...

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index ed103b9a8d3a..fffc1f0e3895 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -1916,6 +1916,13 @@  static irqreturn_t e1000_msix_other(int __always_unused irq, void *data)
 	struct e1000_hw *hw = &adapter->hw;
 	u32 icr = er32(ICR);
 
+	/* Certain events (such as RXO) which trigger Other do not set
+	 * INT_ASSERTED. In that case, read to clear of icr does not take
+	 * place.
+	 */
+	if (!(icr & E1000_ICR_INT_ASSERTED))
+		ew32(ICR, E1000_ICR_OTHER);
+
 	if (icr & adapter->eiac_mask)
 		ew32(ICS, (icr & adapter->eiac_mask));
 
@@ -2033,7 +2040,6 @@  static void e1000_configure_msix(struct e1000_adapter *adapter)
 		       hw->hw_addr + E1000_EITR_82574(vector));
 	else
 		writel(1, hw->hw_addr + E1000_EITR_82574(vector));
-	adapter->eiac_mask |= E1000_IMS_OTHER;
 
 	/* Cause Tx interrupts on every write back */
 	ivar |= BIT(31);
@@ -2258,7 +2264,7 @@  static void e1000_irq_enable(struct e1000_adapter *adapter)
 
 	if (adapter->msix_entries) {
 		ew32(EIAC_82574, adapter->eiac_mask & E1000_EIAC_MASK_82574);
-		ew32(IMS, adapter->eiac_mask | E1000_IMS_LSC);
+		ew32(IMS, adapter->eiac_mask | E1000_IMS_OTHER | E1000_IMS_LSC);
 	} else if (hw->mac.type >= e1000_pch_lpt) {
 		ew32(IMS, IMS_ENABLE_MASK | E1000_IMS_ECCER);
 	} else {