Message ID | 20180118065054.29844-1-bpoirier@suse.com |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
Series | [RFC] e1000e: Remove Other from EIAC. | expand |
On 2018/01/18 15:50, Benjamin Poirier wrote: > It was reported that emulated e1000e devices in vmware esxi 6.5 Build > 7526125 do not link up after commit 4aea7a5c5e94 ("e1000e: Avoid receiver > overrun interrupt bursts", v4.15-rc1). Some tracing shows that after > e1000e_trigger_lsc() is called, ICR reads out as 0x0 in e1000_msix_other() > on emulated e1000e devices. In comparison, on real e1000e 82574 hardware, > icr=0x80000004 (_INT_ASSERTED | _OTHER) in the same situation. > > Some experimentation showed that this flaw in vmware e1000e emulation can > be worked around by not setting Other in EIAC. This is how it was before > 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1). vmware folks, please comment.
On 2018/01/18 15:50, Benjamin Poirier wrote: > It was reported that emulated e1000e devices in vmware esxi 6.5 Build > 7526125 do not link up after commit 4aea7a5c5e94 ("e1000e: Avoid receiver > overrun interrupt bursts", v4.15-rc1). Some tracing shows that after > e1000e_trigger_lsc() is called, ICR reads out as 0x0 in e1000_msix_other() > on emulated e1000e devices. In comparison, on real e1000e 82574 hardware, > icr=0x80000004 (_INT_ASSERTED | _OTHER) in the same situation. (_INT_ASSERTED | _LSC) > > Some experimentation showed that this flaw in vmware e1000e emulation can > be worked around by not setting Other in EIAC. This is how it was before > 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1). > > Fixes: 4aea7a5c5e94 ("e1000e: Avoid receiver overrun interrupt bursts") > Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
On Wed, Jan 17, 2018 at 10:50 PM, Benjamin Poirier <bpoirier@suse.com> wrote: > It was reported that emulated e1000e devices in vmware esxi 6.5 Build > 7526125 do not link up after commit 4aea7a5c5e94 ("e1000e: Avoid receiver > overrun interrupt bursts", v4.15-rc1). Some tracing shows that after > e1000e_trigger_lsc() is called, ICR reads out as 0x0 in e1000_msix_other() > on emulated e1000e devices. In comparison, on real e1000e 82574 hardware, > icr=0x80000004 (_INT_ASSERTED | _OTHER) in the same situation. Isn't 0x80000004 (_INT_ASSERTED | _LSC)? The assumption I based my patch on was that the VMware code was sending _OTHER instead of _LSC to trigger LSC events. As such in my version of the workaround I just went through and did the testing if the _RXO bit was set, otherwise I assumed that whatever event was received must have been meant to trigger an _LSC type event since that worked in the past. > Some experimentation showed that this flaw in vmware e1000e emulation can > be worked around by not setting Other in EIAC. This is how it was before > 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1). Did this actually set the _LSC bit or was it just giving you the _OTHER bit? The ICR for that combination would come out 0x81000000. > Fixes: 4aea7a5c5e94 ("e1000e: Avoid receiver overrun interrupt bursts") > Signed-off-by: Benjamin Poirier <bpoirier@suse.com> > --- > > Jeff, I'm sending as RFC since it looks like a problem that should be fixed > in vmware. If you'd like to have the workaround in e1000e, I'll submit. I would appreciate it if you could review/test the patch I submitted for the same issue. Specifically I would want to make certain that it still addresses the original RXO interrupt burst issue your reported. Thanks. - Alex
On Thu, 18 Jan 2018, Benjamin Poirier wrote: > On 2018/01/18 15:50, Benjamin Poirier wrote: > > It was reported that emulated e1000e devices in vmware esxi 6.5 Build > > 7526125 do not link up after commit 4aea7a5c5e94 ("e1000e: Avoid receiver > > overrun interrupt bursts", v4.15-rc1). Some tracing shows that after > > e1000e_trigger_lsc() is called, ICR reads out as 0x0 in e1000_msix_other() > > on emulated e1000e devices. In comparison, on real e1000e 82574 hardware, > > icr=0x80000004 (_INT_ASSERTED | _OTHER) in the same situation. > > > > Some experimentation showed that this flaw in vmware e1000e emulation can > > be worked around by not setting Other in EIAC. This is how it was before > > 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1). > > vmware folks, please comment. Thank you for bringing this to our attention. Using the reported build (ESX 6.5, 7526125) and 4.15.0-rc8+ kernel (which has the said patch), I could bring up e1000e interface (version: 3.2.6-k), get dhcp address and even do large file downloads without difficulty. Could you give us more pointers on how we may be able to reproduce this locally? Was there anything different with the configuration when the issue was observed? Is the issue consistently reproducible? Thanks, Shri
On 2018/01/18 18:42, Shrikrishna Khare wrote: > > > On Thu, 18 Jan 2018, Benjamin Poirier wrote: > > > On 2018/01/18 15:50, Benjamin Poirier wrote: > > > It was reported that emulated e1000e devices in vmware esxi 6.5 Build > > > 7526125 do not link up after commit 4aea7a5c5e94 ("e1000e: Avoid receiver > > > overrun interrupt bursts", v4.15-rc1). Some tracing shows that after > > > e1000e_trigger_lsc() is called, ICR reads out as 0x0 in e1000_msix_other() > > > on emulated e1000e devices. In comparison, on real e1000e 82574 hardware, > > > icr=0x80000004 (_INT_ASSERTED | _OTHER) in the same situation. > > > > > > Some experimentation showed that this flaw in vmware e1000e emulation can > > > be worked around by not setting Other in EIAC. This is how it was before > > > 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1). > > > > vmware folks, please comment. > > Thank you for bringing this to our attention. > > Using the reported build (ESX 6.5, 7526125) and 4.15.0-rc8+ kernel (which > has the said patch), I could bring up e1000e interface (version: 3.2.6-k), > get dhcp address and even do large file downloads without difficulty. > > Could you give us more pointers on how we may be able to reproduce this > locally? Was there anything different with the configuration when the > issue was observed? Is the issue consistently reproducible? It's consistently reproducible, however I noticed that once in a while there is a genuine "Other" interrupt that comes in and triggers the link status change. The problem is with interrupts that are triggered via a write to ICS (such as in e1000e_trigger_lsc()). Can you reproduce a problem if you do: ip link set ethX down ip link set ethX up If you're building your own kernel, you can add the following patch and cat /sys/kernel/debug/tracing/trace_pipe For me it shows on v4.15-rc8: <...>-2578 [000] .... 83527.938321: e1000e_trigger_lsc: trigger_lsc <...>-2578 [000] d.h. 83527.938398: e1000_msix_other: icr 0x0 With the patch that I submitted, it shows: wickedd-1329 [002] .N.. 20.123545: e1000e_trigger_lsc: trigger_lsc <idle>-0 [000] d.h. 20.123630: e1000_msix_other: icr 0x81000004 <idle>-0 [000] d.h. 20.123654: e1000_msix_other: lsc <idle>-0 [000] d.h. 20.123676: e1000_msix_other: mod_timer diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c index 9f18d39bdc8f..16620ce840fc 100644 --- a/drivers/net/ethernet/intel/e1000e/netdev.c +++ b/drivers/net/ethernet/intel/e1000e/netdev.c @@ -1918,22 +1918,29 @@ static irqreturn_t e1000_msix_other(int __always_unused irq, void *data) bool enable = true; icr = er32(ICR); + trace_printk("icr 0x%x\n", icr); + if (icr & E1000_ICR_RXO) { + trace_printk("rxo\n"); ew32(ICR, E1000_ICR_RXO); enable = false; /* napi poll will re-enable Other, make sure it runs */ if (napi_schedule_prep(&adapter->napi)) { + trace_printk("napi schedule\n"); adapter->total_rx_bytes = 0; adapter->total_rx_packets = 0; __napi_schedule(&adapter->napi); } } if (icr & E1000_ICR_LSC) { + trace_printk("lsc\n"); ew32(ICR, E1000_ICR_LSC); hw->mac.get_link_status = true; /* guard against interrupt when we're going down */ - if (!test_bit(__E1000_DOWN, &adapter->state)) + if (!test_bit(__E1000_DOWN, &adapter->state)) { + trace_printk("mod_timer\n"); mod_timer(&adapter->watchdog_timer, jiffies + 1); + } } if (enable && !test_bit(__E1000_DOWN, &adapter->state)) @@ -4221,6 +4228,8 @@ static void e1000e_trigger_lsc(struct e1000_adapter *adapter) { struct e1000_hw *hw = &adapter->hw; + trace_printk("trigger_lsc\n"); + if (adapter->msix_entries) ew32(ICS, E1000_ICS_LSC | E1000_ICS_OTHER); else
On 2018/01/18 07:51, Alexander Duyck wrote: > On Wed, Jan 17, 2018 at 10:50 PM, Benjamin Poirier <bpoirier@suse.com> wrote: > > It was reported that emulated e1000e devices in vmware esxi 6.5 Build > > 7526125 do not link up after commit 4aea7a5c5e94 ("e1000e: Avoid receiver > > overrun interrupt bursts", v4.15-rc1). Some tracing shows that after > > e1000e_trigger_lsc() is called, ICR reads out as 0x0 in e1000_msix_other() > > on emulated e1000e devices. In comparison, on real e1000e 82574 hardware, > > icr=0x80000004 (_INT_ASSERTED | _OTHER) in the same situation. > > Isn't 0x80000004 (_INT_ASSERTED | _LSC)? The assumption I based my Yes. The numeric value is correct. I made a mistake when writing down the flag names. > patch on was that the VMware code was sending _OTHER instead of _LSC > to trigger LSC events. As such in my version of the workaround I just It's not so deterministic, sadly. In my tests, upon entry into e1000_msix_other() after e1000e_trigger_lsc(), with no workaround patch I've seen: icr=0x0 icr=0x3d Reserved RXDMT0 Reserved LSC TXDW icr=0x46 RXO LSC TXQE and someone else reported: icr=0x3c Reserved RXDMT0 Reserved LSC > went through and did the testing if the _RXO bit was set, otherwise I > assumed that whatever event was received must have been meant to > trigger an _LSC type event since that worked in the past. > > > Some experimentation showed that this flaw in vmware e1000e emulation can > > be worked around by not setting Other in EIAC. This is how it was before > > 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1). > > Did this actually set the _LSC bit or was it just giving you the > _OTHER bit? The ICR for that combination would come out 0x81000000. With my patch, after e1000e_trigger_lsc(), it results in icr=0x81000004 on real and emulated hardware. IMO, the resulting icr read is cleaner than with your patch but it depends on an undocumented quirk of the emulated vmware e1000e, so I don't know which of the two workarounds is more desirable. If you'd like to stick with your patch though, I think that you should definitely rewrite it as: diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c index 9f18d39bdc8f..68c0bcb8287f 100644 --- a/drivers/net/ethernet/intel/e1000e/netdev.c +++ b/drivers/net/ethernet/intel/e1000e/netdev.c @@ -1928,7 +1928,12 @@ static irqreturn_t e1000_msix_other(int __always_unused irq, void *data) __napi_schedule(&adapter->napi); } } - if (icr & E1000_ICR_LSC) { + if (icr & E1000_ICR_LSC || !(icr & E1000_ICR_RXO)) { + /* We assume if the RXO bit is not set that this is a + * link status change event. This is needed due to emulated + * versions of the device that may not correctly populate + * the LSC bit. + */ ew32(ICR, E1000_ICR_LSC); hw->mac.get_link_status = true; /* guard against interrupt when we're going down */ > > > Fixes: 4aea7a5c5e94 ("e1000e: Avoid receiver overrun interrupt bursts") > > Signed-off-by: Benjamin Poirier <bpoirier@suse.com> > > --- > > > > Jeff, I'm sending as RFC since it looks like a problem that should be fixed > > in vmware. If you'd like to have the workaround in e1000e, I'll submit. > > I would appreciate it if you could review/test the patch I submitted > for the same issue. Specifically I would want to make certain that it > still addresses the original RXO interrupt burst issue your reported. I've tested both my patch and yours; they both allow link up on vmware; link up on real 82574 and rxo mitigation on real 82574. I couldn't conveniently test rxo on vmware. > > Thanks. > > - Alex >
On 2018/01/19 17:59, Benjamin Poirier wrote: > On 2018/01/18 07:51, Alexander Duyck wrote: > > On Wed, Jan 17, 2018 at 10:50 PM, Benjamin Poirier <bpoirier@suse.com> wrote: > > > It was reported that emulated e1000e devices in vmware esxi 6.5 Build > > > 7526125 do not link up after commit 4aea7a5c5e94 ("e1000e: Avoid receiver > > > overrun interrupt bursts", v4.15-rc1). Some tracing shows that after > > > e1000e_trigger_lsc() is called, ICR reads out as 0x0 in e1000_msix_other() > > > on emulated e1000e devices. In comparison, on real e1000e 82574 hardware, > > > icr=0x80000004 (_INT_ASSERTED | _OTHER) in the same situation. > > > > Isn't 0x80000004 (_INT_ASSERTED | _LSC)? The assumption I based my > > Yes. The numeric value is correct. I made a mistake when writing down > the flag names. > > > patch on was that the VMware code was sending _OTHER instead of _LSC > > to trigger LSC events. As such in my version of the workaround I just > > It's not so deterministic, sadly. In my tests, upon entry into > e1000_msix_other() after e1000e_trigger_lsc(), with no workaround patch > I've seen: > icr=0x0 > icr=0x3d > Reserved RXDMT0 Reserved LSC TXDW > icr=0x46 > RXO LSC TXQE > and someone else reported: > icr=0x3c > Reserved RXDMT0 Reserved LSC > > > went through and did the testing if the _RXO bit was set, otherwise I > > assumed that whatever event was received must have been meant to > > trigger an _LSC type event since that worked in the past. ^... Re-reading that part, my thoughts are that it worked in the past, before 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1), because (presumably) Other was not set in EIAC. It worked after 16ecba59bc33 but before 4aea7a5c5e94 ("e1000e: Avoid receiver overrun interrupt bursts", v4.15-rc1) because e1000_msix_other() didn't read the value of icr. If it had, it would've found a bogus value, which is what's happening after 4aea7a5c5e94. I wonder if we're not just getting some uninitialized value from the emulation code... which makes me kind of uneasy about your approach of trying to make sense of the value. Maybe Shrikrishna can clarify where the icr value is coming from when Other is set in EIAC. > > > > > Some experimentation showed that this flaw in vmware e1000e emulation can > > > be worked around by not setting Other in EIAC. This is how it was before > > > 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1). > > > > Did this actually set the _LSC bit or was it just giving you the > > _OTHER bit? The ICR for that combination would come out 0x81000000. > > With my patch, after e1000e_trigger_lsc(), it results in icr=0x81000004 > on real and emulated hardware. > > IMO, the resulting icr read is cleaner than with your patch but it > depends on an undocumented quirk of the emulated vmware e1000e, so I > don't know which of the two workarounds is more desirable. > > If you'd like to stick with your patch though, I think that you should > definitely rewrite it as: > > diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c > index 9f18d39bdc8f..68c0bcb8287f 100644 > --- a/drivers/net/ethernet/intel/e1000e/netdev.c > +++ b/drivers/net/ethernet/intel/e1000e/netdev.c > @@ -1928,7 +1928,12 @@ static irqreturn_t e1000_msix_other(int __always_unused irq, void *data) > __napi_schedule(&adapter->napi); > } > } > - if (icr & E1000_ICR_LSC) { > + if (icr & E1000_ICR_LSC || !(icr & E1000_ICR_RXO)) { > + /* We assume if the RXO bit is not set that this is a > + * link status change event. This is needed due to emulated > + * versions of the device that may not correctly populate > + * the LSC bit. > + */ > ew32(ICR, E1000_ICR_LSC); > hw->mac.get_link_status = true; > /* guard against interrupt when we're going down */ > > > > > > Fixes: 4aea7a5c5e94 ("e1000e: Avoid receiver overrun interrupt bursts") > > > Signed-off-by: Benjamin Poirier <bpoirier@suse.com> > > > --- > > > > > > Jeff, I'm sending as RFC since it looks like a problem that should be fixed > > > in vmware. If you'd like to have the workaround in e1000e, I'll submit. > > > > I would appreciate it if you could review/test the patch I submitted > > for the same issue. Specifically I would want to make certain that it > > still addresses the original RXO interrupt burst issue your reported. > > I've tested both my patch and yours; they both allow link up on vmware; > link up on real 82574 and rxo mitigation on real 82574. I couldn't > conveniently test rxo on vmware. > > > > > Thanks. > > > > - Alex > >
On Fri, Jan 19, 2018 at 5:36 AM, Benjamin Poirier <benjamin.poirier@gmail.com> wrote: > On 2018/01/19 17:59, Benjamin Poirier wrote: >> On 2018/01/18 07:51, Alexander Duyck wrote: >> > On Wed, Jan 17, 2018 at 10:50 PM, Benjamin Poirier <bpoirier@suse.com> wrote: >> > > It was reported that emulated e1000e devices in vmware esxi 6.5 Build >> > > 7526125 do not link up after commit 4aea7a5c5e94 ("e1000e: Avoid receiver >> > > overrun interrupt bursts", v4.15-rc1). Some tracing shows that after >> > > e1000e_trigger_lsc() is called, ICR reads out as 0x0 in e1000_msix_other() >> > > on emulated e1000e devices. In comparison, on real e1000e 82574 hardware, >> > > icr=0x80000004 (_INT_ASSERTED | _OTHER) in the same situation. >> > >> > Isn't 0x80000004 (_INT_ASSERTED | _LSC)? The assumption I based my >> >> Yes. The numeric value is correct. I made a mistake when writing down >> the flag names. >> >> > patch on was that the VMware code was sending _OTHER instead of _LSC >> > to trigger LSC events. As such in my version of the workaround I just >> >> It's not so deterministic, sadly. In my tests, upon entry into >> e1000_msix_other() after e1000e_trigger_lsc(), with no workaround patch >> I've seen: >> icr=0x0 >> icr=0x3d >> Reserved RXDMT0 Reserved LSC TXDW >> icr=0x46 >> RXO LSC TXQE >> and someone else reported: >> icr=0x3c >> Reserved RXDMT0 Reserved LSC >> >> > went through and did the testing if the _RXO bit was set, otherwise I >> > assumed that whatever event was received must have been meant to >> > trigger an _LSC type event since that worked in the past. > ^... > > Re-reading that part, my thoughts are that it worked in the past, before > 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1), > because (presumably) Other was not set in EIAC. It worked after > 16ecba59bc33 but before 4aea7a5c5e94 ("e1000e: Avoid receiver overrun > interrupt bursts", v4.15-rc1) because e1000_msix_other() didn't read the > value of icr. If it had, it would've found a bogus value, which is > what's happening after 4aea7a5c5e94. I wonder if we're not just getting > some uninitialized value from the emulation code... which makes me kind > of uneasy about your approach of trying to make sense of the value. > Maybe Shrikrishna can clarify where the icr value is coming from when > Other is set in EIAC. For now I still say we let my current patch go as-is in order to address the immediate issue. We can follow-up and do a more refined version of things once we get the final word from VMware on how this actually works. If nothing else the current patch appears to resolve the currently reported issue and is already submitted. I'm of the mind that we need to cut down on the code thrash. This driver is supposed to have been in a "maintenance" mode for the last year or so as there aren't being any new parts added is my understanding. As-is I don't see any reason why 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1) was submitted or accepted in the first place. I don't see any notes about it fixing any bug or addressing any issue and it seems like that is the start of all the issues we have been having recently with RXO triggering more interrupts, various link issues, and this most recent VMware issue. - Alex
On 2018/01/19 08:22, Alexander Duyck wrote: > On Fri, Jan 19, 2018 at 5:36 AM, Benjamin Poirier > <benjamin.poirier@gmail.com> wrote: > > On 2018/01/19 17:59, Benjamin Poirier wrote: > >> On 2018/01/18 07:51, Alexander Duyck wrote: > >> > On Wed, Jan 17, 2018 at 10:50 PM, Benjamin Poirier <bpoirier@suse.com> wrote: > >> > > It was reported that emulated e1000e devices in vmware esxi 6.5 Build > >> > > 7526125 do not link up after commit 4aea7a5c5e94 ("e1000e: Avoid receiver > >> > > overrun interrupt bursts", v4.15-rc1). Some tracing shows that after > >> > > e1000e_trigger_lsc() is called, ICR reads out as 0x0 in e1000_msix_other() > >> > > on emulated e1000e devices. In comparison, on real e1000e 82574 hardware, > >> > > icr=0x80000004 (_INT_ASSERTED | _OTHER) in the same situation. > >> > > >> > Isn't 0x80000004 (_INT_ASSERTED | _LSC)? The assumption I based my > >> > >> Yes. The numeric value is correct. I made a mistake when writing down > >> the flag names. > >> > >> > patch on was that the VMware code was sending _OTHER instead of _LSC > >> > to trigger LSC events. As such in my version of the workaround I just > >> > >> It's not so deterministic, sadly. In my tests, upon entry into > >> e1000_msix_other() after e1000e_trigger_lsc(), with no workaround patch > >> I've seen: > >> icr=0x0 > >> icr=0x3d > >> Reserved RXDMT0 Reserved LSC TXDW > >> icr=0x46 > >> RXO LSC TXQE > >> and someone else reported: > >> icr=0x3c > >> Reserved RXDMT0 Reserved LSC > >> > >> > went through and did the testing if the _RXO bit was set, otherwise I > >> > assumed that whatever event was received must have been meant to > >> > trigger an _LSC type event since that worked in the past. > > ^... > > > > Re-reading that part, my thoughts are that it worked in the past, before > > 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1), > > because (presumably) Other was not set in EIAC. It worked after > > 16ecba59bc33 but before 4aea7a5c5e94 ("e1000e: Avoid receiver overrun > > interrupt bursts", v4.15-rc1) because e1000_msix_other() didn't read the > > value of icr. If it had, it would've found a bogus value, which is > > what's happening after 4aea7a5c5e94. I wonder if we're not just getting > > some uninitialized value from the emulation code... which makes me kind > > of uneasy about your approach of trying to make sense of the value. > > Maybe Shrikrishna can clarify where the icr value is coming from when > > Other is set in EIAC. > > For now I still say we let my current patch go as-is in order to > address the immediate issue. We can follow-up and do a more refined > version of things once we get the final word from VMware on how this > actually works. If nothing else the current patch appears to resolve > the currently reported issue and is already submitted. > > I'm of the mind that we need to cut down on the code thrash. This > driver is supposed to have been in a "maintenance" mode for the last > year or so as there aren't being any new parts added is my > understanding. As-is I don't see any reason why 16ecba59bc33 ("e1000e: > Do not read ICR in Other interrupt", v4.5-rc1) was submitted or > accepted in the first place. I don't see any notes about it fixing any > bug or addressing any issue and it seems like that is the start of all > the issues we have been having recently with RXO triggering more > interrupts, various link issues, and this most recent VMware issue. I'm sorry to say but you're the one who suggested that change: http://lkml.iu.edu/hypermail/linux/kernel/1510.3/03528.html On 2015/10/28 23:08, Alexander Duyck wrote: > On 10/22/2015 05:32 PM, Benjamin Poirier wrote: [...] > > I would argue your first patch probably didn't go far enough to remove dead > code. Specifically you should only ever get into this function if LSC is > set. There are no other causes that should trigger this. As such you could > probably remove the ICR read, and instead replace it with an ICR write of > the LSC bit since OTHER is already cleared via EIAC. >
On 2018/01/20 07:45, Benjamin Poirier wrote: [...] > > > > I'm of the mind that we need to cut down on the code thrash. This > > driver is supposed to have been in a "maintenance" mode for the last > > year or so as there aren't being any new parts added is my > > understanding. As-is I don't see any reason why 16ecba59bc33 ("e1000e: > > Do not read ICR in Other interrupt", v4.5-rc1) was submitted or > > accepted in the first place. I don't see any notes about it fixing any > > bug or addressing any issue and it seems like that is the start of all > > the issues we have been having recently with RXO triggering more > > interrupts, various link issues, and this most recent VMware issue. > > I'm sorry to say but you're the one who suggested that change: > > http://lkml.iu.edu/hypermail/linux/kernel/1510.3/03528.html > > On 2015/10/28 23:08, Alexander Duyck wrote: > > On 10/22/2015 05:32 PM, Benjamin Poirier wrote: > [...] > > > > I would argue your first patch probably didn't go far enough to remove dead > > code. Specifically you should only ever get into this function if LSC is > > set. There are no other causes that should trigger this. As such you could > > probably remove the ICR read, and instead replace it with an ICR write of > > the LSC bit since OTHER is already cleared via EIAC. > > ... The assumption that "There are no other causes that should trigger this." turned out to be wrong and that caused the RXO problems. Clearing OTHER via EIAC is causing the problems with vmware now. I don't think you foresaw those problems back in 2015 and neither did I.
On Fri, Jan 19, 2018 at 2:55 PM, Benjamin Poirier <benjamin.poirier@gmail.com> wrote: > On 2018/01/20 07:45, Benjamin Poirier wrote: > [...] >> > >> > I'm of the mind that we need to cut down on the code thrash. This >> > driver is supposed to have been in a "maintenance" mode for the last >> > year or so as there aren't being any new parts added is my >> > understanding. As-is I don't see any reason why 16ecba59bc33 ("e1000e: >> > Do not read ICR in Other interrupt", v4.5-rc1) was submitted or >> > accepted in the first place. I don't see any notes about it fixing any >> > bug or addressing any issue and it seems like that is the start of all >> > the issues we have been having recently with RXO triggering more >> > interrupts, various link issues, and this most recent VMware issue. >> >> I'm sorry to say but you're the one who suggested that change: >> >> http://lkml.iu.edu/hypermail/linux/kernel/1510.3/03528.html >> >> On 2015/10/28 23:08, Alexander Duyck wrote: >> > On 10/22/2015 05:32 PM, Benjamin Poirier wrote: >> [...] >> > >> > I would argue your first patch probably didn't go far enough to remove dead >> > code. Specifically you should only ever get into this function if LSC is >> > set. There are no other causes that should trigger this. As such you could >> > probably remove the ICR read, and instead replace it with an ICR write of >> > the LSC bit since OTHER is already cleared via EIAC. >> > > > ... The assumption that "There are no other causes that should trigger > this." turned out to be wrong and that caused the RXO problems. Clearing > OTHER via EIAC is causing the problems with vmware now. I don't think > you foresaw those problems back in 2015 and neither did I. Well that explains why I felt like I was explaining things to a younger version of myself. I was a bit more relaxed in terms of being willing to make arbitrary changes a few years ago. I tend to be a bit more conservative now, at least as far as having justifications as to why I want to do things. With any change you end up taking on risk, and so usually a patch has a justification as to why you are making the change. Unfortunately at the time I didn't have all the information and based my suggestion on a bad assumption. I would guess at the time I was thinking of doing general code cleanup. Other drivers such as igb work this way, but it led us down the path we are on now where we are having to make one fix after another. It is leading in the opposite direction of maintainability as this is all becoming more complex. Suggesting this was a bad decision on my part at the time. I'm only human, I make mistakes.. :-) With further review of the code I am seeing various other issues that could still pop up as I am not certain we should even have the "other" interrupt messing with the NAPI polling or packet accounting logic at all. The question I would have at this point is if we revert 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1) and all the related fixes for it, what do we end up with? It seems like the code is slowly heading back in the direction of where it was originally anyway since there have been a number of partial reverts. I'm wondering what would happen if we were to just short-cut that and audit the patches involved to see what we really need and don't. Your patch as proposed is essentially another step in that direction. I'm thinking we may want to drop my currently proposed fix for now and instead look at going through and figure out what changes after that first one are still really needed. It doesn't look like my fix will make it for 4.15 anyway so we might as well focus on making certain to have things as solid as possible by the time 4.16-rc1 rolls around. Thanks. - Alex
On 2018/01/20 09:21, Alexander Duyck wrote: > On Fri, Jan 19, 2018 at 2:55 PM, Benjamin Poirier > <benjamin.poirier@gmail.com> wrote: > > On 2018/01/20 07:45, Benjamin Poirier wrote: > > [...] > >> > > >> > I'm of the mind that we need to cut down on the code thrash. This > >> > driver is supposed to have been in a "maintenance" mode for the last > >> > year or so as there aren't being any new parts added is my > >> > understanding. As-is I don't see any reason why 16ecba59bc33 ("e1000e: > >> > Do not read ICR in Other interrupt", v4.5-rc1) was submitted or > >> > accepted in the first place. I don't see any notes about it fixing any > >> > bug or addressing any issue and it seems like that is the start of all > >> > the issues we have been having recently with RXO triggering more > >> > interrupts, various link issues, and this most recent VMware issue. > >> > >> I'm sorry to say but you're the one who suggested that change: > >> > >> http://lkml.iu.edu/hypermail/linux/kernel/1510.3/03528.html > >> > >> On 2015/10/28 23:08, Alexander Duyck wrote: > >> > On 10/22/2015 05:32 PM, Benjamin Poirier wrote: > >> [...] > >> > > >> > I would argue your first patch probably didn't go far enough to remove dead > >> > code. Specifically you should only ever get into this function if LSC is > >> > set. There are no other causes that should trigger this. As such you could > >> > probably remove the ICR read, and instead replace it with an ICR write of > >> > the LSC bit since OTHER is already cleared via EIAC. > >> > > > > > ... The assumption that "There are no other causes that should trigger > > this." turned out to be wrong and that caused the RXO problems. Clearing > > OTHER via EIAC is causing the problems with vmware now. I don't think > > you foresaw those problems back in 2015 and neither did I. > > Well that explains why I felt like I was explaining things to a > younger version of myself. I was a bit more relaxed in terms of being > willing to make arbitrary changes a few years ago. I tend to be a bit > more conservative now, at least as far as having justifications as to > why I want to do things. With any change you end up taking on risk, > and so usually a patch has a justification as to why you are making > the change. > > Unfortunately at the time I didn't have all the information and based > my suggestion on a bad assumption. I would guess at the time I was > thinking of doing general code cleanup. Other drivers such as igb work > this way, but it led us down the path we are on now where we are > having to make one fix after another. It is leading in the opposite > direction of maintainability as this is all becoming more complex. > Suggesting this was a bad decision on my part at the time. I'm only > human, I make mistakes.. :-) Thanks for the introspection. > > With further review of the code I am seeing various other issues that > could still pop up as I am not certain we should even have the "other" > interrupt messing with the NAPI polling or packet accounting logic at > all. The question I would have at this point is if we revert > 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1) > and all the related fixes for it, what do we end up with? The patch I submitted for the current vmware issue actually finishes reverting commit 16ecba59bc33. I believe the relevant commits to consider are: 16ecba59bc33 e1000e: Do not read ICR in Other interrupt (v4.5-rc1) a61cfe4ffad7 e1000e: Do not write lsc to ics in msi-x mode (v4.5-rc1) 0a8047ac68e5 e1000e: Fix msi-x interrupt automask (v4.5-rc1) 19110cfbb34d e1000e: Separate signaling for link check/link up (v4.15-rc1) 4aea7a5c5e94 e1000e: Avoid receiver overrun interrupt bursts (v4.15-rc1) 4110e02eb45e e1000e: Fix e1000_check_for_copper_link_ich8lan return value. (v4.15-rc8) (submitted) e1000e: Remove Other from EIAC. commit 4aea7a5c5e94 essentially reverts a61cfe4ffad7 and part of 16ecba59bc33 (ICR read). The submitted patch reverts the rest of 16ecba59bc33 (EIAC clearing of Other). > It seems > like the code is slowly heading back in the direction of where it was > originally anyway since there have been a number of partial reverts. > I'm wondering what would happen if we were to just short-cut that and > audit the patches involved to see what we really need and don't. > > Your patch as proposed is essentially another step in that direction. > I'm thinking we may want to drop my currently proposed fix for now and > instead look at going through and figure out what changes after that > first one are still really needed. If the patch that I submitted for the current vmware issue is merged, the significant commits that are left are: 0a8047ac68e5 e1000e: Fix msi-x interrupt automask (v4.5-rc1) Fixes a problem in the irq disabling of the napi implementation. 19110cfbb34d e1000e: Separate signaling for link check/link up (v4.15-rc1) Fixes link flapping caused by a race condition in link detection. It was found because the Other interrupt was being triggered sort of spuriously by rxo. 4aea7a5c5e94 e1000e: Avoid receiver overrun interrupt bursts (v4.15-rc1) Fixes Other interrupt bursts during sustained rxo conditions. I think all of those are still needed, or if they're removed, they need to be reimplemented differently. > It doesn't look like my fix will > make it for 4.15 anyway so we might as well focus on making certain to > have things as solid as possible by the time 4.16-rc1 rolls around. > > Thanks. > > - Alex
On Sun, Jan 21, 2018 at 11:12 PM, Benjamin Poirier <benjamin.poirier@gmail.com> wrote: > On 2018/01/20 09:21, Alexander Duyck wrote: >> On Fri, Jan 19, 2018 at 2:55 PM, Benjamin Poirier >> <benjamin.poirier@gmail.com> wrote: >> > On 2018/01/20 07:45, Benjamin Poirier wrote: >> > [...] >> >> > >> >> > I'm of the mind that we need to cut down on the code thrash. This >> >> > driver is supposed to have been in a "maintenance" mode for the last >> >> > year or so as there aren't being any new parts added is my >> >> > understanding. As-is I don't see any reason why 16ecba59bc33 ("e1000e: >> >> > Do not read ICR in Other interrupt", v4.5-rc1) was submitted or >> >> > accepted in the first place. I don't see any notes about it fixing any >> >> > bug or addressing any issue and it seems like that is the start of all >> >> > the issues we have been having recently with RXO triggering more >> >> > interrupts, various link issues, and this most recent VMware issue. >> >> >> >> I'm sorry to say but you're the one who suggested that change: >> >> >> >> http://lkml.iu.edu/hypermail/linux/kernel/1510.3/03528.html >> >> >> >> On 2015/10/28 23:08, Alexander Duyck wrote: >> >> > On 10/22/2015 05:32 PM, Benjamin Poirier wrote: >> >> [...] >> >> > >> >> > I would argue your first patch probably didn't go far enough to remove dead >> >> > code. Specifically you should only ever get into this function if LSC is >> >> > set. There are no other causes that should trigger this. As such you could >> >> > probably remove the ICR read, and instead replace it with an ICR write of >> >> > the LSC bit since OTHER is already cleared via EIAC. >> >> > >> > >> > ... The assumption that "There are no other causes that should trigger >> > this." turned out to be wrong and that caused the RXO problems. Clearing >> > OTHER via EIAC is causing the problems with vmware now. I don't think >> > you foresaw those problems back in 2015 and neither did I. >> >> Well that explains why I felt like I was explaining things to a >> younger version of myself. I was a bit more relaxed in terms of being >> willing to make arbitrary changes a few years ago. I tend to be a bit >> more conservative now, at least as far as having justifications as to >> why I want to do things. With any change you end up taking on risk, >> and so usually a patch has a justification as to why you are making >> the change. >> >> Unfortunately at the time I didn't have all the information and based >> my suggestion on a bad assumption. I would guess at the time I was >> thinking of doing general code cleanup. Other drivers such as igb work >> this way, but it led us down the path we are on now where we are >> having to make one fix after another. It is leading in the opposite >> direction of maintainability as this is all becoming more complex. >> Suggesting this was a bad decision on my part at the time. I'm only >> human, I make mistakes.. :-) > > Thanks for the introspection. > >> >> With further review of the code I am seeing various other issues that >> could still pop up as I am not certain we should even have the "other" >> interrupt messing with the NAPI polling or packet accounting logic at >> all. The question I would have at this point is if we revert >> 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1) >> and all the related fixes for it, what do we end up with? > > The patch I submitted for the current vmware issue actually finishes > reverting commit 16ecba59bc33. > > I believe the relevant commits to consider are: > > 16ecba59bc33 e1000e: Do not read ICR in Other interrupt (v4.5-rc1) > a61cfe4ffad7 e1000e: Do not write lsc to ics in msi-x mode (v4.5-rc1) > 0a8047ac68e5 e1000e: Fix msi-x interrupt automask (v4.5-rc1) > > 19110cfbb34d e1000e: Separate signaling for link check/link up > (v4.15-rc1) > 4aea7a5c5e94 e1000e: Avoid receiver overrun interrupt bursts (v4.15-rc1) > > 4110e02eb45e e1000e: Fix e1000_check_for_copper_link_ich8lan return > value. (v4.15-rc8) > > (submitted) e1000e: Remove Other from EIAC. > > commit 4aea7a5c5e94 essentially reverts a61cfe4ffad7 and part of > 16ecba59bc33 (ICR read). The submitted patch reverts the rest of > 16ecba59bc33 (EIAC clearing of Other). > >> It seems >> like the code is slowly heading back in the direction of where it was >> originally anyway since there have been a number of partial reverts. >> I'm wondering what would happen if we were to just short-cut that and >> audit the patches involved to see what we really need and don't. >> >> Your patch as proposed is essentially another step in that direction. >> I'm thinking we may want to drop my currently proposed fix for now and >> instead look at going through and figure out what changes after that >> first one are still really needed. > > If the patch that I submitted for the current vmware issue is merged, > the significant commits that are left are: > > 0a8047ac68e5 e1000e: Fix msi-x interrupt automask (v4.5-rc1) > Fixes a problem in the irq disabling of the napi implementation. This one I fully agree is still needed. > 19110cfbb34d e1000e: Separate signaling for link check/link up > (v4.15-rc1) > Fixes link flapping caused by a race condition in link > detection. It was found because the Other interrupt was being > triggered sort of spuriously by rxo. This one is somewhat iffy. I am not sure if the patch description really matches what it is doing. It doesn't appear to do what it says it is trying to do since clearing get_link_status will still trigger a link up, it just takes an extra 2 seconds. I think there may be issues if you aren't using autoneg, as I don't see how you are getting the link to report up other than the fact that mac->get_link_status has been cleared but we are reporting a pseduo-error. In addition it is only really needed after the RXO problem was introduced which really didn't exist until after we stopped checking for LSC. One interesting test we may want to look at is to see if there is an additional delay in a link coming up for a non-autoneg setup. If we find an additional 2 second delay then I would be even more confident that this patch has a bug. > 4aea7a5c5e94 e1000e: Avoid receiver overrun interrupt bursts (v4.15-rc1) > Fixes Other interrupt bursts during sustained rxo conditions. So the RXO problem probably didn't exist until we stopped checking for the OTHER and LSC bits in the "other" interrupt handler. Yes there would be more "other" cause interrupts, but they shouldn't have been causing much in the way of issues since the get_link_status value never changed. Personally I would lean more toward the option of reverting this patch and instead just focus on testing OTHER and LSC as we originally were so that we don't risk messing up NAPI by messing with ring state from a non-ring interrupt. I will try to get to these later this week if you would like. Unfortunately I don't have any of these devices in any of my development systems so I have to go chase one down. Otherwise you are free to take these on and tell me if I have made another flawed assumption somewhere, but I am thinking the RXO issue goes away if we get the original "other" interrupt routine back to where it was. So the last bit in all this ends up being that because of 0a8047ac68e5 e1000e: Fix msi-x interrupt automask (v4.5-rc1) we don't seem to auto-clear interrupt causes anymore on ICR read. I am not certain what the impact of this is. I would be interested in finding out if a cause left set will trigger an interrupt storm or if it just goes quiet when we just leave the value high. If it goes quiet then that in itself might solve the RXO interrupt burst problem if we don't clear it. Otherwise we need to make certain to clear all of the causes that can trigger the "other" interrupt to fire regardless of if we service the events or not. Thanks. - Alex
On 2018/01/22 10:01, Alexander Duyck wrote: [...] > > > > If the patch that I submitted for the current vmware issue is merged, > > the significant commits that are left are: > > > > 0a8047ac68e5 e1000e: Fix msi-x interrupt automask (v4.5-rc1) > > Fixes a problem in the irq disabling of the napi implementation. > > This one I fully agree is still needed. > > > 19110cfbb34d e1000e: Separate signaling for link check/link up > > (v4.15-rc1) > > Fixes link flapping caused by a race condition in link > > detection. It was found because the Other interrupt was being > > triggered sort of spuriously by rxo. > > This one is somewhat iffy. I am not sure if the patch description > really matches what it is doing. It doesn't appear to do what it says > it is trying to do since clearing get_link_status will still trigger a > link up, it just takes an extra 2 seconds. I think there may be issues > if you aren't using autoneg, as I don't see how you are getting the > link to report up other than the fact that mac->get_link_status has > been cleared but we are reporting a pseduo-error. In addition it is > only really needed after the RXO problem was introduced which really > didn't exist until after we stopped checking for LSC. One interesting > test we may want to look at is to see if there is an additional delay > in a link coming up for a non-autoneg setup. If we find an additional > 2 second delay then I would be even more confident that this patch has > a bug. It seems like you're right but I didn't look into this part of the problem in detail yet. I'll get back to it. > > > 4aea7a5c5e94 e1000e: Avoid receiver overrun interrupt bursts (v4.15-rc1) > > Fixes Other interrupt bursts during sustained rxo conditions. > > So the RXO problem probably didn't exist until we stopped checking for > the OTHER and LSC bits in the "other" interrupt handler. Yes there > would be more "other" cause interrupts, but they shouldn't have been > causing much in the way of issues since the get_link_status value > never changed. Personally I would lean more toward the option of I agree. I tested rxo behavior on commit 4d432f67ff00 ("e1000e: Remove unreachable code", v4.5-rc1) which is before any significant change in that area. (I force rxo by adding mdelay(10) to e1000_clean_rx_irq and sending a netperf UDP_STREAM from another host). In case of sustained rxo condition, we get repeated Other interrupts. Handling these irqs is useless work that could be avoided when the system is already overloaded but it doesn't lead to misbehavior like the race condition described in the log of commit 19110cfbb34d ("e1000e: Separate signaling for link check/link up", v4.15-rc1). However, I noticed something unexpected. It seems like reading ICR doesn't clear every bit that's set in IAM, most notably not rxo. In a different test, I was doing a single write of RXO | OTHER to ICS, then two subsequent reads of icr gave 0x01000041. OTOH, writing a bit to ICS reliably clears it. So if you want to remove RXO interrupt mitigation, you should at least add a write of RXO to ICR, to clear it. On my system it reduced Other interrupts from ~17000/s to ~1700/s when using the mdelay testing approach. > reverting this patch and instead just focus on testing OTHER and LSC > as we originally were so that we don't risk messing up NAPI by messing > with ring state from a non-ring interrupt. > > I will try to get to these later this week if you would like. > Unfortunately I don't have any of these devices in any of my > development systems so I have to go chase one down. Otherwise you are > free to take these on and tell me if I have made another flawed > assumption somewhere, but I am thinking the RXO issue goes away if we > get the original "other" interrupt routine back to where it was. > > So the last bit in all this ends up being that because of 0a8047ac68e5 > e1000e: Fix msi-x interrupt automask (v4.5-rc1) we don't seem to > auto-clear interrupt causes anymore on ICR read. I am not certain what > the impact of this is. I would be interested in finding out if a cause > left set will trigger an interrupt storm or if it just goes quiet when > we just leave the value high. If it goes quiet then that in itself > might solve the RXO interrupt burst problem if we don't clear it. > Otherwise we need to make certain to clear all of the causes that can > trigger the "other" interrupt to fire regardless of if we service the > events or not. In MSI-X mode, as long as Other is not set in ICR, nothing will happen even if the bits related to Other (LSC, RXO, MDAC, SRPD, ACK, MNG) are set. However, that doesn't solve the rxo interrupt burst because an rxo condition in hardware sets both RXO and Other, so it triggers an interrupt.
On Wed, Jan 24, 2018 at 12:35 AM, Benjamin Poirier <benjamin.poirier@gmail.com> wrote: > On 2018/01/22 10:01, Alexander Duyck wrote: > [...] >> > >> > If the patch that I submitted for the current vmware issue is merged, >> > the significant commits that are left are: >> > >> > 0a8047ac68e5 e1000e: Fix msi-x interrupt automask (v4.5-rc1) >> > Fixes a problem in the irq disabling of the napi implementation. >> >> This one I fully agree is still needed. >> >> > 19110cfbb34d e1000e: Separate signaling for link check/link up >> > (v4.15-rc1) >> > Fixes link flapping caused by a race condition in link >> > detection. It was found because the Other interrupt was being >> > triggered sort of spuriously by rxo. >> >> This one is somewhat iffy. I am not sure if the patch description >> really matches what it is doing. It doesn't appear to do what it says >> it is trying to do since clearing get_link_status will still trigger a >> link up, it just takes an extra 2 seconds. I think there may be issues >> if you aren't using autoneg, as I don't see how you are getting the >> link to report up other than the fact that mac->get_link_status has >> been cleared but we are reporting a pseduo-error. In addition it is >> only really needed after the RXO problem was introduced which really >> didn't exist until after we stopped checking for LSC. One interesting >> test we may want to look at is to see if there is an additional delay >> in a link coming up for a non-autoneg setup. If we find an additional >> 2 second delay then I would be even more confident that this patch has >> a bug. > > It seems like you're right but I didn't look into this part of the problem in > detail yet. I'll get back to it. > >> >> > 4aea7a5c5e94 e1000e: Avoid receiver overrun interrupt bursts (v4.15-rc1) >> > Fixes Other interrupt bursts during sustained rxo conditions. >> >> So the RXO problem probably didn't exist until we stopped checking for >> the OTHER and LSC bits in the "other" interrupt handler. Yes there >> would be more "other" cause interrupts, but they shouldn't have been >> causing much in the way of issues since the get_link_status value >> never changed. Personally I would lean more toward the option of > > I agree. I tested rxo behavior on commit 4d432f67ff00 ("e1000e: Remove > unreachable code", v4.5-rc1) which is before any significant change in that > area. (I force rxo by adding mdelay(10) to e1000_clean_rx_irq and sending a > netperf UDP_STREAM from another host). In case of sustained rxo condition, we > get repeated Other interrupts. Handling these irqs is useless work that could > be avoided when the system is already overloaded but it doesn't lead to > misbehavior like the race condition described in the log of commit > 19110cfbb34d ("e1000e: Separate signaling for link check/link up", v4.15-rc1). > > However, I noticed something unexpected. It seems like reading ICR doesn't > clear every bit that's set in IAM, most notably not rxo. In a different test, > I was doing a single write of RXO | OTHER to ICS, then two subsequent reads of > icr gave 0x01000041. OTOH, writing a bit to ICS reliably clears it. So if you > want to remove RXO interrupt mitigation, you should at least add a write of > RXO to ICR, to clear it. On my system it reduced Other interrupts from > ~17000/s to ~1700/s when using the mdelay testing approach. That makes sense. I would say we should probably just put together a mask of all possible causes for "other" interrupts that we don't actually care about and just clear them explicitly when we clear the "other" bit. >> reverting this patch and instead just focus on testing OTHER and LSC >> as we originally were so that we don't risk messing up NAPI by messing >> with ring state from a non-ring interrupt. >> >> I will try to get to these later this week if you would like. >> Unfortunately I don't have any of these devices in any of my >> development systems so I have to go chase one down. Otherwise you are >> free to take these on and tell me if I have made another flawed >> assumption somewhere, but I am thinking the RXO issue goes away if we >> get the original "other" interrupt routine back to where it was. >> >> So the last bit in all this ends up being that because of 0a8047ac68e5 >> e1000e: Fix msi-x interrupt automask (v4.5-rc1) we don't seem to >> auto-clear interrupt causes anymore on ICR read. I am not certain what >> the impact of this is. I would be interested in finding out if a cause >> left set will trigger an interrupt storm or if it just goes quiet when >> we just leave the value high. If it goes quiet then that in itself >> might solve the RXO interrupt burst problem if we don't clear it. >> Otherwise we need to make certain to clear all of the causes that can >> trigger the "other" interrupt to fire regardless of if we service the >> events or not. > > In MSI-X mode, as long as Other is not set in ICR, nothing will happen even if > the bits related to Other (LSC, RXO, MDAC, SRPD, ACK, MNG) are set. However, > that doesn't solve the rxo interrupt burst because an rxo condition in > hardware sets both RXO and Other, so it triggers an interrupt. Right. I expect we will still have more interrupts. But by just clearing it and moving on we at least resolve the issue without introducing possible new ones. I would say we could probably clear everything that is on that list that is not LSC explicitly since we really don't care about those events. If that changes then in the future we could avoid clearing them until we capture an actual event. Thanks. - Alex
On Wed, Jan 17, 2018 at 10:50 PM, Benjamin Poirier <bpoirier@suse.com> wrote: > It was reported that emulated e1000e devices in vmware esxi 6.5 Build > 7526125 do not link up after commit 4aea7a5c5e94 ("e1000e: Avoid receiver > overrun interrupt bursts", v4.15-rc1). Some tracing shows that after > e1000e_trigger_lsc() is called, ICR reads out as 0x0 in e1000_msix_other() > on emulated e1000e devices. In comparison, on real e1000e 82574 hardware, > icr=0x80000004 (_INT_ASSERTED | _OTHER) in the same situation. > > Some experimentation showed that this flaw in vmware e1000e emulation can > be worked around by not setting Other in EIAC. This is how it was before > 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1). > > Fixes: 4aea7a5c5e94 ("e1000e: Avoid receiver overrun interrupt bursts") > Signed-off-by: Benjamin Poirier <bpoirier@suse.com> > --- Hi Benjamin, How would you feel about resubmitting this patch for net? We have some issues that have come up and it would be useful to have this fixed in the kernel sooner rather than later. I would be okay with us applying it for now while we work on coming up with a more complete solution. Thanks. - Alex
On 2018/01/30 11:46, Alexander Duyck wrote: > On Wed, Jan 17, 2018 at 10:50 PM, Benjamin Poirier <bpoirier@suse.com> wrote: > > It was reported that emulated e1000e devices in vmware esxi 6.5 Build > > 7526125 do not link up after commit 4aea7a5c5e94 ("e1000e: Avoid receiver > > overrun interrupt bursts", v4.15-rc1). Some tracing shows that after > > e1000e_trigger_lsc() is called, ICR reads out as 0x0 in e1000_msix_other() > > on emulated e1000e devices. In comparison, on real e1000e 82574 hardware, > > icr=0x80000004 (_INT_ASSERTED | _OTHER) in the same situation. > > > > Some experimentation showed that this flaw in vmware e1000e emulation can > > be worked around by not setting Other in EIAC. This is how it was before > > 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1). > > > > Fixes: 4aea7a5c5e94 ("e1000e: Avoid receiver overrun interrupt bursts") > > Signed-off-by: Benjamin Poirier <bpoirier@suse.com> > > --- > > Hi Benjamin, > > How would you feel about resubmitting this patch for net? > > We have some issues that have come up and it would be useful to have > this fixed in the kernel sooner rather than later. I would be okay > with us applying it for now while we work on coming up with a more > complete solution. Ok, I've resent it in its original form. Once it's in mainline I'll rebase the cleanups.
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces@osuosl.org] On > Behalf Of Benjamin Poirier > Sent: Tuesday, January 30, 2018 11:31 PM > To: Alexander Duyck <alexander.duyck@gmail.com> > Cc: Netdev <netdev@vger.kernel.org>; intel-wired-lan <intel-wired- > lan@lists.osuosl.org>; linux-kernel@vger.kernel.org > Subject: Re: [Intel-wired-lan] [RFC PATCH] e1000e: Remove Other from EIAC. > > On 2018/01/30 11:46, Alexander Duyck wrote: > > On Wed, Jan 17, 2018 at 10:50 PM, Benjamin Poirier <bpoirier@suse.com> > wrote: > > > It was reported that emulated e1000e devices in vmware esxi 6.5 Build > > > 7526125 do not link up after commit 4aea7a5c5e94 ("e1000e: Avoid > receiver > > > overrun interrupt bursts", v4.15-rc1). Some tracing shows that after > > > e1000e_trigger_lsc() is called, ICR reads out as 0x0 in e1000_msix_other() > > > on emulated e1000e devices. In comparison, on real e1000e 82574 > hardware, > > > icr=0x80000004 (_INT_ASSERTED | _OTHER) in the same situation. > > > > > > Some experimentation showed that this flaw in vmware e1000e > emulation can > > > be worked around by not setting Other in EIAC. This is how it was before > > > 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1). > > > > > > Fixes: 4aea7a5c5e94 ("e1000e: Avoid receiver overrun interrupt bursts") > > > Signed-off-by: Benjamin Poirier <bpoirier@suse.com> > > > --- > > > > Hi Benjamin, > > > > How would you feel about resubmitting this patch for net? > > > > We have some issues that have come up and it would be useful to have > > this fixed in the kernel sooner rather than later. I would be okay > > with us applying it for now while we work on coming up with a more > > complete solution. > > Ok, I've resent it in its original form. Once it's in mainline I'll > rebase the cleanups. Tested-by: Aaron Brown <aaron.f.brown@intel.com> > _______________________________________________ > Intel-wired-lan mailing list > Intel-wired-lan@osuosl.org > https://lists.osuosl.org/mailman/listinfo/intel-wired-lan
> From: Brown, Aaron F > Sent: Thursday, February 1, 2018 8:30 PM > To: 'Benjamin Poirier' <bpoirier@suse.com>; Alexander Duyck > <alexander.duyck@gmail.com> > Cc: Netdev <netdev@vger.kernel.org>; intel-wired-lan <intel-wired- > lan@lists.osuosl.org>; linux-kernel@vger.kernel.org > Subject: RE: [Intel-wired-lan] [RFC PATCH] e1000e: Remove Other from EIAC. > > > From: Intel-wired-lan [mailto:intel-wired-lan-bounces@osuosl.org] On > > Behalf Of Benjamin Poirier > > Sent: Tuesday, January 30, 2018 11:31 PM > > To: Alexander Duyck <alexander.duyck@gmail.com> > > Cc: Netdev <netdev@vger.kernel.org>; intel-wired-lan <intel-wired- > > lan@lists.osuosl.org>; linux-kernel@vger.kernel.org > > Subject: Re: [Intel-wired-lan] [RFC PATCH] e1000e: Remove Other from > EIAC. > > > > On 2018/01/30 11:46, Alexander Duyck wrote: > > > On Wed, Jan 17, 2018 at 10:50 PM, Benjamin Poirier > <bpoirier@suse.com> > > wrote: > > > > It was reported that emulated e1000e devices in vmware esxi 6.5 Build > > > > 7526125 do not link up after commit 4aea7a5c5e94 ("e1000e: Avoid > > receiver > > > > overrun interrupt bursts", v4.15-rc1). Some tracing shows that after > > > > e1000e_trigger_lsc() is called, ICR reads out as 0x0 in > e1000_msix_other() > > > > on emulated e1000e devices. In comparison, on real e1000e 82574 > > hardware, > > > > icr=0x80000004 (_INT_ASSERTED | _OTHER) in the same situation. > > > > > > > > Some experimentation showed that this flaw in vmware e1000e > > emulation can > > > > be worked around by not setting Other in EIAC. This is how it was > before > > > > 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1). > > > > > > > > Fixes: 4aea7a5c5e94 ("e1000e: Avoid receiver overrun interrupt > bursts") > > > > Signed-off-by: Benjamin Poirier <bpoirier@suse.com> > > > > --- > > > > > > Hi Benjamin, > > > > > > How would you feel about resubmitting this patch for net? > > > > > > We have some issues that have come up and it would be useful to have > > > this fixed in the kernel sooner rather than later. I would be okay > > > with us applying it for now while we work on coming up with a more > > > complete solution. > > > > Ok, I've resent it in its original form. Once it's in mainline I'll > > rebase the cleanups. > > Tested-by: Aaron Brown > <aaron.f.brown@intel.com> Stupid line wrap ... been that kind of day. Tested-by: Aaron Brown <aaron.f.brown@intel.com> > > > _______________________________________________ > > Intel-wired-lan mailing list > > Intel-wired-lan@osuosl.org > > https://lists.osuosl.org/mailman/listinfo/intel-wired-lan
diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c index 9f18d39bdc8f..625a4c9a86a4 100644 --- a/drivers/net/ethernet/intel/e1000e/netdev.c +++ b/drivers/net/ethernet/intel/e1000e/netdev.c @@ -1918,6 +1918,8 @@ static irqreturn_t e1000_msix_other(int __always_unused irq, void *data) bool enable = true; icr = er32(ICR); + ew32(ICR, E1000_ICR_OTHER); + if (icr & E1000_ICR_RXO) { ew32(ICR, E1000_ICR_RXO); enable = false; @@ -2040,7 +2042,6 @@ static void e1000_configure_msix(struct e1000_adapter *adapter) hw->hw_addr + E1000_EITR_82574(vector)); else writel(1, hw->hw_addr + E1000_EITR_82574(vector)); - adapter->eiac_mask |= E1000_IMS_OTHER; /* Cause Tx interrupts on every write back */ ivar |= BIT(31); @@ -2265,7 +2266,7 @@ static void e1000_irq_enable(struct e1000_adapter *adapter) if (adapter->msix_entries) { ew32(EIAC_82574, adapter->eiac_mask & E1000_EIAC_MASK_82574); - ew32(IMS, adapter->eiac_mask | E1000_IMS_LSC); + ew32(IMS, adapter->eiac_mask | E1000_IMS_OTHER | E1000_IMS_LSC); } else if (hw->mac.type >= e1000_pch_lpt) { ew32(IMS, IMS_ENABLE_MASK | E1000_IMS_ECCER); } else {
It was reported that emulated e1000e devices in vmware esxi 6.5 Build 7526125 do not link up after commit 4aea7a5c5e94 ("e1000e: Avoid receiver overrun interrupt bursts", v4.15-rc1). Some tracing shows that after e1000e_trigger_lsc() is called, ICR reads out as 0x0 in e1000_msix_other() on emulated e1000e devices. In comparison, on real e1000e 82574 hardware, icr=0x80000004 (_INT_ASSERTED | _OTHER) in the same situation. Some experimentation showed that this flaw in vmware e1000e emulation can be worked around by not setting Other in EIAC. This is how it was before 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1). Fixes: 4aea7a5c5e94 ("e1000e: Avoid receiver overrun interrupt bursts") Signed-off-by: Benjamin Poirier <bpoirier@suse.com> --- Jeff, I'm sending as RFC since it looks like a problem that should be fixed in vmware. If you'd like to have the workaround in e1000e, I'll submit. --- drivers/net/ethernet/intel/e1000e/netdev.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)