Message ID | 20180815062110.16155-1-jian-hong@endlessm.com |
---|---|
State | Superseded, archived |
Delegated to: | David Miller |
Series | r8169: don't use MSI-X on RTL8106e |
From: <jian-hong@endlessm.com>
Date: Wed, 15 Aug 2018 14:21:10 +0800

> Found the ethernet network on ASUS X441UAR doesn't come back on resume
> from suspend when using MSI-X. The chip is RTL8106e - version 39.

Heiner, please take a look at this.

You recently disabled MSI-X on RTL8168g for similar reasons.

Now that we've seen two chips like this, maybe there is some other problem afoot.

Thanks.
On 16.08.2018 21:21, David Miller wrote:
> From: <jian-hong@endlessm.com>
> Date: Wed, 15 Aug 2018 14:21:10 +0800
>
>> Found the ethernet network on ASUS X441UAR doesn't come back on resume
>> from suspend when using MSI-X. The chip is RTL8106e - version 39.
>
> Heiner, please take a look at this.
>
> You recently disabled MSI-X on RTL8168g for similar reasons.
>
> Now that we've seen two chips like this, maybe there is some other
> problem afoot.
>

Thanks for the hint. I saw it already and just contacted Realtek to ask whether they are aware of any MSI-X issues with particular chip versions. With the chip versions I have access to, MSI-X works fine.

There's also the theoretical option that the issues are caused by broken BIOSes. But so far only chip versions have been reported which are very similar, at least with regard to version number (2x VER_40, 1x VER_39). So they may share some buggy component.

Let's see whether Realtek can provide some hint. If more chip versions are reported having problems with MSI-X, then we could switch to a whitelist or disable MSI-X in general.

Heiner

> Thanks.
>
From: Heiner Kallweit <hkallweit1@gmail.com> Date: Thu, 16 Aug 2018 21:37:31 +0200 > On 16.08.2018 21:21, David Miller wrote: >> From: <jian-hong@endlessm.com> >> Date: Wed, 15 Aug 2018 14:21:10 +0800 >> >>> Found the ethernet network on ASUS X441UAR doesn't come back on resume >>> from suspend when using MSI-X. The chip is RTL8106e - version 39. >> >> Heiner, please take a look at this. >> >> You recently disabled MSI-X on RTL8168g for similar reasons. >> >> Now that we've seen two chips like this, maybe there is some other >> problem afoot. >> > Thanks for the hint. I saw it already and just contacted Realtek > whether they are aware of any MSI-X issues with particular chip > versions. With the chip versions I have access to MSI-X works fine. > > There's also the theoretical option that the issues are caused by > broken BIOS's. But so far only chip versions have been reported > which are very similar, at least with regard to version number > (2x VER_40, 1x VER_39). So they may share some buggy component. > > Let's see whether Realtek can provide some hint. > If more chip versions are reported having problems with MSI-X, > then we could switch to a whitelist or disable MSI-X in general. It could be that we need to reprogram some register(s) on resume, which normally might not be needed, and that is what is causing the problem with some chips.
On 16.08.2018 21:39, David Miller wrote: > From: Heiner Kallweit <hkallweit1@gmail.com> > Date: Thu, 16 Aug 2018 21:37:31 +0200 > >> On 16.08.2018 21:21, David Miller wrote: >>> From: <jian-hong@endlessm.com> >>> Date: Wed, 15 Aug 2018 14:21:10 +0800 >>> >>>> Found the ethernet network on ASUS X441UAR doesn't come back on resume >>>> from suspend when using MSI-X. The chip is RTL8106e - version 39. >>> >>> Heiner, please take a look at this. >>> >>> You recently disabled MSI-X on RTL8168g for similar reasons. >>> >>> Now that we've seen two chips like this, maybe there is some other >>> problem afoot. >>> >> Thanks for the hint. I saw it already and just contacted Realtek >> whether they are aware of any MSI-X issues with particular chip >> versions. With the chip versions I have access to MSI-X works fine. >> >> There's also the theoretical option that the issues are caused by >> broken BIOS's. But so far only chip versions have been reported >> which are very similar, at least with regard to version number >> (2x VER_40, 1x VER_39). So they may share some buggy component. >> >> Let's see whether Realtek can provide some hint. >> If more chip versions are reported having problems with MSI-X, >> then we could switch to a whitelist or disable MSI-X in general. > > It could be that we need to reprogram some register(s) on resume, > which normally might not be needed, and that is what is causing the > problem with some chips. > Indeed. That's what I'm checking with Realtek. In the register list in the r8169 driver there's one entry which seems to indicate that there are MSI-X specific settings. However this register isn't used, and the r8168 vendor driver uses only MSI. And there are no public datasheets.
[+cc Marc, Thomas, Christoph, linux-pci) (beginning of thread at [1]) On Thu, Aug 16, 2018 at 09:50:48PM +0200, Heiner Kallweit wrote: > On 16.08.2018 21:39, David Miller wrote: > > From: Heiner Kallweit <hkallweit1@gmail.com> > > Date: Thu, 16 Aug 2018 21:37:31 +0200 > > > >> On 16.08.2018 21:21, David Miller wrote: > >>> From: <jian-hong@endlessm.com> > >>> Date: Wed, 15 Aug 2018 14:21:10 +0800 > >>> > >>>> Found the ethernet network on ASUS X441UAR doesn't come back on resume > >>>> from suspend when using MSI-X. The chip is RTL8106e - version 39. > >>> > >>> Heiner, please take a look at this. > >>> > >>> You recently disabled MSI-X on RTL8168g for similar reasons. > >>> > >>> Now that we've seen two chips like this, maybe there is some other > >>> problem afoot. > >>> > >> Thanks for the hint. I saw it already and just contacted Realtek > >> whether they are aware of any MSI-X issues with particular chip > >> versions. With the chip versions I have access to MSI-X works fine. > >> > >> There's also the theoretical option that the issues are caused by > >> broken BIOS's. But so far only chip versions have been reported > >> which are very similar, at least with regard to version number > >> (2x VER_40, 1x VER_39). So they may share some buggy component. > >> > >> Let's see whether Realtek can provide some hint. > >> If more chip versions are reported having problems with MSI-X, > >> then we could switch to a whitelist or disable MSI-X in general. > > > > It could be that we need to reprogram some register(s) on resume, > > which normally might not be needed, and that is what is causing the > > problem with some chips. > > > Indeed. That's what I'm checking with Realtek. > In the register list in the r8169 driver there's one entry which > seems to indicate that there are MSI-X specific settings. > However this register isn't used, and the r8168 vendor driver > uses only MSI. And there are no public datasheets. 
Do we have any information about these chip versions in other systems? Or other devices using MSI-X in the same ASUS system? It seems possible that there's some PCI core or suspend/resume issue with MSI-X and this patch just avoids it without fixing the root cause.

It might be useful to have a kernel.org bugzilla with the complete dmesg, "sudo lspci -vv" output, and /proc/interrupts contents archived for future reference.

[1] https://lkml.kernel.org/r/20180815062110.16155-1-jian-hong@endlessm.com
On 08/16/2018 12:50 PM, Heiner Kallweit wrote: > On 16.08.2018 21:39, David Miller wrote: >> From: Heiner Kallweit <hkallweit1@gmail.com> >> Date: Thu, 16 Aug 2018 21:37:31 +0200 >> >>> On 16.08.2018 21:21, David Miller wrote: >>>> From: <jian-hong@endlessm.com> >>>> Date: Wed, 15 Aug 2018 14:21:10 +0800 >>>> >>>>> Found the ethernet network on ASUS X441UAR doesn't come back on resume >>>>> from suspend when using MSI-X. The chip is RTL8106e - version 39. >>>> >>>> Heiner, please take a look at this. >>>> >>>> You recently disabled MSI-X on RTL8168g for similar reasons. >>>> >>>> Now that we've seen two chips like this, maybe there is some other >>>> problem afoot. >>>> >>> Thanks for the hint. I saw it already and just contacted Realtek >>> whether they are aware of any MSI-X issues with particular chip >>> versions. With the chip versions I have access to MSI-X works fine. >>> >>> There's also the theoretical option that the issues are caused by >>> broken BIOS's. But so far only chip versions have been reported >>> which are very similar, at least with regard to version number >>> (2x VER_40, 1x VER_39). So they may share some buggy component. >>> >>> Let's see whether Realtek can provide some hint. >>> If more chip versions are reported having problems with MSI-X, >>> then we could switch to a whitelist or disable MSI-X in general. >> >> It could be that we need to reprogram some register(s) on resume, >> which normally might not be needed, and that is what is causing the >> problem with some chips. >> > Indeed. That's what I'm checking with Realtek. > In the register list in the r8169 driver there's one entry which > seems to indicate that there are MSI-X specific settings. > However this register isn't used, and the r8168 vendor driver > uses only MSI. And there are no public datasheets. 
Stupid question, but shouldn't we be asking the reporter to try again with:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bfdd19ad80f203f42f05fd32a31c678c9c524ef9

applied? The original report shows the Generic PHY driver being used, not the Realtek PHY driver. Is this possibly contributing to the problem?
On 20.08.2018 20:44, Bjorn Helgaas wrote: > [+cc Marc, Thomas, Christoph, linux-pci) > (beginning of thread at [1]) > > On Thu, Aug 16, 2018 at 09:50:48PM +0200, Heiner Kallweit wrote: >> On 16.08.2018 21:39, David Miller wrote: >>> From: Heiner Kallweit <hkallweit1@gmail.com> >>> Date: Thu, 16 Aug 2018 21:37:31 +0200 >>> >>>> On 16.08.2018 21:21, David Miller wrote: >>>>> From: <jian-hong@endlessm.com> >>>>> Date: Wed, 15 Aug 2018 14:21:10 +0800 >>>>> >>>>>> Found the ethernet network on ASUS X441UAR doesn't come back on resume >>>>>> from suspend when using MSI-X. The chip is RTL8106e - version 39. >>>>> >>>>> Heiner, please take a look at this. >>>>> >>>>> You recently disabled MSI-X on RTL8168g for similar reasons. >>>>> >>>>> Now that we've seen two chips like this, maybe there is some other >>>>> problem afoot. >>>>> >>>> Thanks for the hint. I saw it already and just contacted Realtek >>>> whether they are aware of any MSI-X issues with particular chip >>>> versions. With the chip versions I have access to MSI-X works fine. >>>> >>>> There's also the theoretical option that the issues are caused by >>>> broken BIOS's. But so far only chip versions have been reported >>>> which are very similar, at least with regard to version number >>>> (2x VER_40, 1x VER_39). So they may share some buggy component. >>>> >>>> Let's see whether Realtek can provide some hint. >>>> If more chip versions are reported having problems with MSI-X, >>>> then we could switch to a whitelist or disable MSI-X in general. >>> >>> It could be that we need to reprogram some register(s) on resume, >>> which normally might not be needed, and that is what is causing the >>> problem with some chips. >>> >> Indeed. That's what I'm checking with Realtek. >> In the register list in the r8169 driver there's one entry which >> seems to indicate that there are MSI-X specific settings. >> However this register isn't used, and the r8168 vendor driver >> uses only MSI. 
And there are no public datasheets.
>
> Do we have any information about these chip versions in other systems?
> Or other devices using MSI-X in the same ASUS system? It seems
> possible that there's some PCI core or suspend/resume issue with MSI-X
> and this patch just avoids it without fixing the root cause.
>

I'm in contact with Realtek, and according to them a few chip versions seem to clear MSI-X table entries on resume from suspend. I'm checking with them how this could be fixed or worked around. Worst case we may have to disable MSI-X in general.

> It might be useful to have a kernel.org bugzilla with the complete
> dmesg, "sudo lspci -vv" output, and /proc/interrupts contents archived
> for future reference.
>
> [1] https://lkml.kernel.org/r/20180815062110.16155-1-jian-hong@endlessm.com
>
On 20.08.2018 22:40, Florian Fainelli wrote: > On 08/16/2018 12:50 PM, Heiner Kallweit wrote: >> On 16.08.2018 21:39, David Miller wrote: >>> From: Heiner Kallweit <hkallweit1@gmail.com> >>> Date: Thu, 16 Aug 2018 21:37:31 +0200 >>> >>>> On 16.08.2018 21:21, David Miller wrote: >>>>> From: <jian-hong@endlessm.com> >>>>> Date: Wed, 15 Aug 2018 14:21:10 +0800 >>>>> >>>>>> Found the ethernet network on ASUS X441UAR doesn't come back on resume >>>>>> from suspend when using MSI-X. The chip is RTL8106e - version 39. >>>>> >>>>> Heiner, please take a look at this. >>>>> >>>>> You recently disabled MSI-X on RTL8168g for similar reasons. >>>>> >>>>> Now that we've seen two chips like this, maybe there is some other >>>>> problem afoot. >>>>> >>>> Thanks for the hint. I saw it already and just contacted Realtek >>>> whether they are aware of any MSI-X issues with particular chip >>>> versions. With the chip versions I have access to MSI-X works fine. >>>> >>>> There's also the theoretical option that the issues are caused by >>>> broken BIOS's. But so far only chip versions have been reported >>>> which are very similar, at least with regard to version number >>>> (2x VER_40, 1x VER_39). So they may share some buggy component. >>>> >>>> Let's see whether Realtek can provide some hint. >>>> If more chip versions are reported having problems with MSI-X, >>>> then we could switch to a whitelist or disable MSI-X in general. >>> >>> It could be that we need to reprogram some register(s) on resume, >>> which normally might not be needed, and that is what is causing the >>> problem with some chips. >>> >> Indeed. That's what I'm checking with Realtek. >> In the register list in the r8169 driver there's one entry which >> seems to indicate that there are MSI-X specific settings. >> However this register isn't used, and the r8168 vendor driver >> uses only MSI. And there are no public datasheets. 
>
> Stupid question, but should not we be asking the reporter to try again with:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bfdd19ad80f203f42f05fd32a31c678c9c524ef9
>
> applied? The original report shows the Generic PHY being used, not the
> Realtek PHY driver being used, is this possibly contributing to the problem?
>

I don't think it's related, because falling back to MSI fixes the issue for the reporter. Also, some chip versions report a generic Realtek PHY ID which isn't covered by any Realtek PHY driver. These chip versions seem to work fine with the generic PHY driver. So he may or may not have the Realtek PHY drivers enabled. But indeed, it would be good to have this info to get the full picture.

See also the mail I wrote a few minutes ago, which describes what we know so far about the reason for the MSI-X issue.
On 20/08/18 19:44, Bjorn Helgaas wrote: > [+cc Marc, Thomas, Christoph, linux-pci) > (beginning of thread at [1]) > > On Thu, Aug 16, 2018 at 09:50:48PM +0200, Heiner Kallweit wrote: >> On 16.08.2018 21:39, David Miller wrote: >>> From: Heiner Kallweit <hkallweit1@gmail.com> >>> Date: Thu, 16 Aug 2018 21:37:31 +0200 >>> >>>> On 16.08.2018 21:21, David Miller wrote: >>>>> From: <jian-hong@endlessm.com> >>>>> Date: Wed, 15 Aug 2018 14:21:10 +0800 >>>>> >>>>>> Found the ethernet network on ASUS X441UAR doesn't come back on resume >>>>>> from suspend when using MSI-X. The chip is RTL8106e - version 39. >>>>> >>>>> Heiner, please take a look at this. >>>>> >>>>> You recently disabled MSI-X on RTL8168g for similar reasons. >>>>> >>>>> Now that we've seen two chips like this, maybe there is some other >>>>> problem afoot. >>>>> >>>> Thanks for the hint. I saw it already and just contacted Realtek >>>> whether they are aware of any MSI-X issues with particular chip >>>> versions. With the chip versions I have access to MSI-X works fine. >>>> >>>> There's also the theoretical option that the issues are caused by >>>> broken BIOS's. But so far only chip versions have been reported >>>> which are very similar, at least with regard to version number >>>> (2x VER_40, 1x VER_39). So they may share some buggy component. >>>> >>>> Let's see whether Realtek can provide some hint. >>>> If more chip versions are reported having problems with MSI-X, >>>> then we could switch to a whitelist or disable MSI-X in general. >>> >>> It could be that we need to reprogram some register(s) on resume, >>> which normally might not be needed, and that is what is causing the >>> problem with some chips. >>> >> Indeed. That's what I'm checking with Realtek. >> In the register list in the r8169 driver there's one entry which >> seems to indicate that there are MSI-X specific settings. >> However this register isn't used, and the r8168 vendor driver >> uses only MSI. 
And there are no public datasheets.
>
> Do we have any information about these chip versions in other systems?
> Or other devices using MSI-X in the same ASUS system? It seems
> possible that there's some PCI core or suspend/resume issue with MSI-X
> and this patch just avoids it without fixing the root cause.
>
> It might be useful to have a kernel.org bugzilla with the complete
> dmesg, "sudo lspci -vv" output, and /proc/interrupts contents archived
> for future reference.

The one system I have with a Realtek chip seems happy enough with MSI-X, but it never gets suspended.

There is a comment in the patch that I don't quite get:

> It is the IRQ 127 - PCI-MSI used by enp2s0. However, lspci lists MSI is
> disabled and MSI-X is enabled which conflicts to the interrupt table.

What do you mean by "conflicts"? With what? Another question is whether you've loaded any firmware (some versions of the Realtek HW seem to require it).

For posterity, here is some data from my own system. I don't know whether it has any relevance to the problem at hand.

Thanks,

M.

[ 2.624963] r8169 0000:02:00.0 eth0: RTL8168g/8111g, 5a:fe:ad:ce:11:00, XID 4c000800, IRQ 26
[ 2.633398] r8169 0000:02:00.0 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]

26: 50 997005 0 0 MSI 1048576 Edge enp2s0

02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
	Subsystem: Realtek Semiconductor Co., Ltd. RTL8111/8168 PCI Express Gigabit Ethernet controller
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 25
	Region 0: I/O ports at 1000 [size=256]
	Region 2: Memory at 100004000 (64-bit, prefetchable) [size=4K]
	Region 4: Memory at 100000000 (64-bit, prefetchable) [size=16K]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000 Data: 0000
	Capabilities: [70] Express (v2) Endpoint, MSI 01
		DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
		DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 4096 bytes
		DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
		LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Via message/WAKE#
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
			Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
		Vector table: BAR=4 offset=00000000
		PBA: BAR=4 offset=00000800
	Capabilities: [d0] Vital Product Data
pcilib: sysfs_read_vpd: read failed: Input/output error
		Not readable
	Capabilities: [100 v1] Advanced Error Reporting
		UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta: RxErr+ BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
	Capabilities: [140 v1] Virtual Channel
		Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb: Fixed- WRR32- WRR64- WRR128-
		Ctrl: ArbSelect=Fixed
		Status: InProgress-
		VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
			Status: NegoPending- InProgress-
	Capabilities: [160 v1] Device Serial Number 00-00-00-00-00-00-00-00
	Capabilities: [170 v1] Latency Tolerance Reporting
		Max snoop latency: 0ns
		Max no snoop latency: 0ns
	Kernel driver in use: r8169
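For readers who want to see where the "MSI: Enable-" and "MSI-X: Enable+" flags in the lspci output above come from, here is a minimal Python sketch that walks the standard PCI capability list in a raw config-space dump. The function names and the sample buffer are made up for illustration; the sample is synthetic and only mimics the capability offsets shown above, it is not data from any real device. Note also that reading `config` from sysfs as an unprivileged user normally returns only the first 64 bytes, so the capabilities at 0x40 and above would not be visible.

```python
import struct

PCI_STATUS = 0x06            # status register; bit 4 = capability list present
PCI_CAPABILITY_LIST = 0x34   # offset of the first capability pointer
PCI_CAP_ID_MSI = 0x05
PCI_CAP_ID_MSIX = 0x11


def walk_capabilities(cfg: bytes):
    """Yield (cap_id, offset) for each entry in the standard capability list."""
    status, = struct.unpack_from("<H", cfg, PCI_STATUS)
    if not status & 0x10:
        return
    pos = cfg[PCI_CAPABILITY_LIST] & 0xFC
    seen = set()                       # guard against looping lists
    while pos and pos not in seen and pos + 1 < len(cfg):
        seen.add(pos)
        yield cfg[pos], pos
        pos = cfg[pos + 1] & 0xFC


def msi_state(cfg: bytes) -> dict:
    """Decode the enable bits of the MSI and MSI-X capabilities, if present."""
    state = {}
    for cap_id, pos in walk_capabilities(cfg):
        if pos + 8 > len(cfg):         # truncated dump, cannot decode
            continue
        ctrl, = struct.unpack_from("<H", cfg, pos + 2)   # message control
        if cap_id == PCI_CAP_ID_MSI:
            state["msi"] = {"offset": pos, "enabled": bool(ctrl & 0x0001)}
        elif cap_id == PCI_CAP_ID_MSIX:
            table, = struct.unpack_from("<I", cfg, pos + 4)
            state["msix"] = {
                "offset": pos,
                "enabled": bool(ctrl & 0x8000),      # MSI-X Enable
                "masked": bool(ctrl & 0x4000),       # Function Mask
                "count": (ctrl & 0x07FF) + 1,        # table size is stored as N-1
                "table_bar": table & 0x7,
                "table_offset": table & 0xFFFFFFF8,
            }
    return state


def build_sample_config() -> bytes:
    """Synthetic 256-byte config space (illustrative only, not real data)."""
    cfg = bytearray(256)
    struct.pack_into("<H", cfg, PCI_STATUS, 0x0010)
    cfg[PCI_CAPABILITY_LIST] = 0x40
    # (offset, capability id, next pointer)
    for off, cap_id, nxt in [(0x40, 0x01, 0x50), (0x50, 0x05, 0x70),
                             (0x70, 0x10, 0xB0), (0xB0, 0x11, 0xD0),
                             (0xD0, 0x03, 0x00)]:
        cfg[off], cfg[off + 1] = cap_id, nxt
    struct.pack_into("<H", cfg, 0x50 + 2, 0x0080)      # MSI: 64bit+, Enable-
    struct.pack_into("<H", cfg, 0xB0 + 2, 0x8003)      # MSI-X: Enable+, Count=4
    struct.pack_into("<I", cfg, 0xB0 + 4, 0x00000004)  # vector table: BAR=4, offset 0
    struct.pack_into("<I", cfg, 0xB0 + 8, 0x00000804)  # PBA: BAR=4, offset 0x800
    return bytes(cfg)


if __name__ == "__main__":
    for name, info in msi_state(build_sample_config()).items():
        print(name, info)
```

On a real system the same `msi_state()` could be fed the contents of `/sys/bus/pci/devices/0000:02:00.0/config` read as root, taken once before suspend and once after resume.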
From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Mon, 20 Aug 2018 22:46:48 +0200

> I'm in contact with Realtek and according to them few chip versions
> seem to clear MSI-X table entries on resume from suspend. Checking
> with them how this could be fixed / worked around.
> Worst case we may have to disable MSI-X in general.

I worry that if the chip does this, and somehow MSI-X is enabled and an interrupt is generated, the chip will write to the cleared out MSI-X address. This will either write garbage into memory or cause a bus error and require PCI error recovery.

It also looks like your test patch doesn't fix things for people who have tested it.

Hmmm...
On 21.08.2018 21:31, David Miller wrote:
> From: Heiner Kallweit <hkallweit1@gmail.com>
> Date: Mon, 20 Aug 2018 22:46:48 +0200
>
>> I'm in contact with Realtek and according to them few chip versions
>> seem to clear MSI-X table entries on resume from suspend. Checking
>> with them how this could be fixed / worked around.
>> Worst case we may have to disable MSI-X in general.
>
> I worry that if the chip does this, and somehow MSI-X is enabled and
> an interrupt is generated, the chip will write to the cleared out
> MSI-X address. This will either write garbage into memory or cause
> a bus error and require PCI error recovery.
>
> It also looks like your test patch doesn't fix things for people who
> have tested it.
>

The test patch was based on the first info from Realtek, which made me think that the base address of the MSI-X table is cleared, which obviously is not the case.

After some further tests it seems that the solution isn't as simple as storing the MSI-X table entries on suspend and restoring them on resume. On my system (where MSI-X works fine) the MSI-X table entries on resume are partially different from the ones on suspend.

Unfortunately I don't have affected test hardware. Currently I'm waiting for further feedback from Realtek.

> Hmmm...
>
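To make the "table entries differ between suspend and resume" observation concrete, here is a small read-only Python sketch that decodes two raw snapshots of an MSI-X table and lists the fields that changed per vector. It relies only on the standard MSI-X entry layout (16 bytes per vector: message address low, message address high, message data, vector control with the mask in bit 0). How the raw bytes are captured is left open, the names `MsixEntry`, `parse_msix_table` and `diff_tables` are hypothetical, and all values in the example are invented.

```python
import struct
from typing import List, NamedTuple, Tuple


class MsixEntry(NamedTuple):
    address: int   # 64-bit message address
    data: int      # 32-bit message data
    masked: bool   # vector control, bit 0


def parse_msix_table(raw: bytes, count: int) -> List[MsixEntry]:
    """Decode `count` 16-byte MSI-X table entries from a raw snapshot."""
    entries = []
    for i in range(count):
        lo, hi, data, ctrl = struct.unpack_from("<IIII", raw, i * 16)
        entries.append(MsixEntry((hi << 32) | lo, data, bool(ctrl & 1)))
    return entries


def diff_tables(before: List[MsixEntry],
                after: List[MsixEntry]) -> List[Tuple[int, str, object, object]]:
    """Return (vector, field, old, new) for every field that changed."""
    changes = []
    for vec, (old, new) in enumerate(zip(before, after)):
        for field in MsixEntry._fields:
            a, b = getattr(old, field), getattr(new, field)
            if a != b:
                changes.append((vec, field, a, b))
    return changes


if __name__ == "__main__":
    # Invented snapshots: vector 0 is wiped and masked after "resume".
    snap_suspend = struct.pack("<IIII", 0xFEE00000, 0, 0x4021, 0) + \
                   struct.pack("<IIII", 0xFEE00000, 0, 0x4022, 1)
    snap_resume = struct.pack("<IIII", 0, 0, 0, 1) + \
                  struct.pack("<IIII", 0xFEE00000, 0, 0x4022, 1)
    for vec, field, old, new in diff_tables(parse_msix_table(snap_suspend, 2),
                                            parse_msix_table(snap_resume, 2)):
        print(f"vector {vec}: {field} {old!r} -> {new!r}")
```

This is a diagnostic aid only; it does not write anything back to the device.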
On 21.08.2018 10:28, Marc Zyngier wrote: > On 20/08/18 19:44, Bjorn Helgaas wrote: >> [+cc Marc, Thomas, Christoph, linux-pci) >> (beginning of thread at [1]) >> >> On Thu, Aug 16, 2018 at 09:50:48PM +0200, Heiner Kallweit wrote: >>> On 16.08.2018 21:39, David Miller wrote: >>>> From: Heiner Kallweit <hkallweit1@gmail.com> >>>> Date: Thu, 16 Aug 2018 21:37:31 +0200 >>>> >>>>> On 16.08.2018 21:21, David Miller wrote: >>>>>> From: <jian-hong@endlessm.com> >>>>>> Date: Wed, 15 Aug 2018 14:21:10 +0800 >>>>>> >>>>>>> Found the ethernet network on ASUS X441UAR doesn't come back on resume >>>>>>> from suspend when using MSI-X. The chip is RTL8106e - version 39. >>>>>> >>>>>> Heiner, please take a look at this. >>>>>> >>>>>> You recently disabled MSI-X on RTL8168g for similar reasons. >>>>>> >>>>>> Now that we've seen two chips like this, maybe there is some other >>>>>> problem afoot. >>>>>> >>>>> Thanks for the hint. I saw it already and just contacted Realtek >>>>> whether they are aware of any MSI-X issues with particular chip >>>>> versions. With the chip versions I have access to MSI-X works fine. >>>>> >>>>> There's also the theoretical option that the issues are caused by >>>>> broken BIOS's. But so far only chip versions have been reported >>>>> which are very similar, at least with regard to version number >>>>> (2x VER_40, 1x VER_39). So they may share some buggy component. >>>>> >>>>> Let's see whether Realtek can provide some hint. >>>>> If more chip versions are reported having problems with MSI-X, >>>>> then we could switch to a whitelist or disable MSI-X in general. >>>> >>>> It could be that we need to reprogram some register(s) on resume, >>>> which normally might not be needed, and that is what is causing the >>>> problem with some chips. >>>> >>> Indeed. That's what I'm checking with Realtek. >>> In the register list in the r8169 driver there's one entry which >>> seems to indicate that there are MSI-X specific settings. 
>>> However this register isn't used, and the r8168 vendor driver
>>> uses only MSI. And there are no public datasheets.
>>
>> Do we have any information about these chip versions in other systems?
>> Or other devices using MSI-X in the same ASUS system? It seems
>> possible that there's some PCI core or suspend/resume issue with MSI-X
>> and this patch just avoids it without fixing the root cause.
>>
>> It might be useful to have a kernel.org bugzilla with the complete
>> dmesg, "sudo lspci -vv" output, and /proc/interrupts contents archived
>> for future reference.
>
> The one system I have with a Realtek chip seems happy enough with MSI-X,
> but it never gets suspended.

Other owners of affected chip versions had the same experience: MSI-X works fine until resume from suspend.

> There is comment in the patch that I don't quite get:
>
>> It is the IRQ 127 - PCI-MSI used by enp2s0. However, lspci lists MSI is
>> disabled and MSI-X is enabled which conflicts to the interrupt table.
>
> What do you mean by "conflicts"? With what? Another question is whether
> you've loaded any firmware (some versions of the Realtek HW seem to require
> it).
>

These "conflicts" were a misunderstanding which was clarified with the reporter. "PCI-MSI" as the irq chip name in the /proc/interrupts output was interpreted to mean that an MSI irq is used, not an MSI-X irq. The firmware is for the PHY only; that's at least my experience on the chip versions I have for testing.

> For the posterity, some data from my own system, which I don't know if it
> has any relevance to the problem at hand.
>
> Thanks,
>
> M.
>
> [ 2.624963] r8169 0000:02:00.0 eth0: RTL8168g/8111g, 5a:fe:ad:ce:11:00, XID 4c000800, IRQ 26
> [ 2.633398] r8169 0000:02:00.0 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
>
> 26: 50 997005 0 0 MSI 1048576 Edge enp2s0
>
> 02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c) > Subsystem: Realtek Semiconductor Co., Ltd. RTL8111/8168 PCI Express Gigabit Ethernet controller > Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- > Latency: 0, Cache Line Size: 64 bytes > Interrupt: pin A routed to IRQ 25 > Region 0: I/O ports at 1000 [size=256] > Region 2: Memory at 100004000 (64-bit, prefetchable) [size=4K] > Region 4: Memory at 100000000 (64-bit, prefetchable) [size=16K] > Capabilities: [40] Power Management version 3 > Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+) > Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- > Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+ > Address: 0000000000000000 Data: 0000 > Capabilities: [70] Express (v2) Endpoint, MSI 01 > DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us > ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W > DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported- > RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop- > MaxPayload 128 bytes, MaxReadReq 4096 bytes > DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend- > LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us > ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+ > LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+ > ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- > LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- > DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Via message/WAKE# > DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled > LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis- > Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- > Compliance 
De-emphasis: -6dB > LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1- > EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest- > Capabilities: [b0] MSI-X: Enable+ Count=4 Masked- > Vector table: BAR=4 offset=00000000 > PBA: BAR=4 offset=00000800 > Capabilities: [d0] Vital Product Data > pcilib: sysfs_read_vpd: read failed: Input/output error > Not readable > Capabilities: [100 v1] Advanced Error Reporting > UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- > UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- > UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- > CESta: RxErr+ BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr- > CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+ > AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn- > Capabilities: [140 v1] Virtual Channel > Caps: LPEVC=0 RefClk=100ns PATEntryBits=1 > Arb: Fixed- WRR32- WRR64- WRR128- > Ctrl: ArbSelect=Fixed > Status: InProgress- > VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans- > Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256- > Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff > Status: NegoPending- InProgress- > Capabilities: [160 v1] Device Serial Number 00-00-00-00-00-00-00-00 > Capabilities: [170 v1] Latency Tolerance Reporting > Max snoop latency: 0ns > Max no snoop latency: 0ns > Kernel driver in use: r8169 > >
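Since the "PCI-MSI" chip name in /proc/interrupts does not say whether a vector is MSI or MSI-X, one way to avoid the misunderstanding described above is to look at the device's `msi_irqs` directory in sysfs, where each allocated vector is a file containing `msi` or `msix`. The following Python sketch reads that directory; the helper name `irq_modes` is made up, and the device address in the usage line is simply the one from the log quoted above.

```python
from pathlib import Path


def irq_modes(device_dir) -> dict:
    """Map IRQ number -> 'msi' or 'msix' for one PCI device directory in sysfs."""
    msi_dir = Path(device_dir) / "msi_irqs"
    if not msi_dir.is_dir():
        return {}   # no MSI/MSI-X vectors allocated (e.g. legacy INTx in use)
    return {int(p.name): p.read_text().strip() for p in msi_dir.iterdir()}


if __name__ == "__main__":
    # Device address taken from the dmesg excerpt in this thread; adjust as needed.
    print(irq_modes("/sys/bus/pci/devices/0000:02:00.0"))
```

Comparing the returned IRQ numbers with the first column of /proc/interrupts shows which mode the driver actually ended up with.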
On Tue, 21 Aug 2018, Heiner Kallweit wrote:
> On 21.08.2018 21:31, David Miller wrote:
> > From: Heiner Kallweit <hkallweit1@gmail.com>
> > Date: Mon, 20 Aug 2018 22:46:48 +0200
> >
> >> I'm in contact with Realtek and according to them few chip versions
> >> seem to clear MSI-X table entries on resume from suspend. Checking
> >> with them how this could be fixed / worked around.
> >> Worst case we may have to disable MSI-X in general.
> >
> > I worry that if the chip does this, and somehow MSI-X is enabled and
> > an interrupt is generated, the chip will write to the cleared out
> > MSI-X address. This will either write garbage into memory or cause
> > a bus error and require PCI error recovery.
> >
> > It also looks like your test patch doesn't fix things for people who
> > have tested it.
> >
> The test patch was based on the first info from Realtek which made me
> think that the base address of the MSI-X table is cleared, what
> obviously is not the case.
>
> After some further tests it seems that the solution isn't as simple
> as storing the MSI-X table entries on suspend and restore them on
> resume. On my system (where MSI-X works fine) MSI-X table entries
> on resume are partially different from the ones on suspend.

Which is not a surprise. Please don't try to fiddle with that at the driver level. The irq and PCI core code are the ones in charge, and if you restore at the wrong point then all hell breaks loose.

Can you please do the following:

1) Store the PCI config space at suspend time
2) Compare the PCI config space at resume time and print the difference

Do that on a working and a non-working version of Realtek NICs.

Thanks,

tglx
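The two steps requested above can be approximated from userspace with a short Python sketch that snapshots a device's config space through sysfs and prints every 32-bit word that differs between two snapshots. This is only an approximation of what was asked for: a snapshot taken after resume sees the state after the PCI core has already restored it, so an in-kernel comparison in the driver's suspend/resume path would be closer to the request. The helper names are hypothetical.

```python
def read_config_space(bdf: str, sysfs_root: str = "/sys/bus/pci/devices") -> bytes:
    """Read the raw config space of one device, e.g. bdf='0000:02:00.0'.

    Reading beyond the first 64 bytes normally requires root.
    """
    with open(f"{sysfs_root}/{bdf}/config", "rb") as f:
        return f.read()


def diff_config_space(before: bytes, after: bytes):
    """Return (offset, old, new) for every little-endian dword that differs."""
    length = min(len(before), len(after)) & ~3   # whole dwords only
    changes = []
    for off in range(0, length, 4):
        old = int.from_bytes(before[off:off + 4], "little")
        new = int.from_bytes(after[off:off + 4], "little")
        if old != new:
            changes.append((off, old, new))
    return changes


def print_diff(before: bytes, after: bytes) -> None:
    for off, old, new in diff_config_space(before, after):
        print(f"{off:#05x}: {old:08x} -> {new:08x}")
```

A typical session would call `read_config_space()` before suspending, save the result, call it again after resume, and pass both buffers to `print_diff()`, once on a working and once on a non-working NIC.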
On 22.08.2018 13:44, Thomas Gleixner wrote:
> On Tue, 21 Aug 2018, Heiner Kallweit wrote:
>> On 21.08.2018 21:31, David Miller wrote:
>>> From: Heiner Kallweit <hkallweit1@gmail.com>
>>> Date: Mon, 20 Aug 2018 22:46:48 +0200
>>>
>>>> I'm in contact with Realtek and according to them few chip versions
>>>> seem to clear MSI-X table entries on resume from suspend. Checking
>>>> with them how this could be fixed / worked around.
>>>> Worst case we may have to disable MSI-X in general.
>>>
>>> I worry that if the chip does this, and somehow MSI-X is enabled and
>>> an interrupt is generated, the chip will write to the cleared out
>>> MSI-X address. This will either write garbage into memory or cause
>>> a bus error and require PCI error recovery.
>>>
>>> It also looks like your test patch doesn't fix things for people who
>>> have tested it.
>>>
>> The test patch was based on the first info from Realtek which made me
>> think that the base address of the MSI-X table is cleared, what
>> obviously is not the case.
>>
>> After some further tests it seems that the solution isn't as simple
>> as storing the MSI-X table entries on suspend and restore them on
>> resume. On my system (where MSI-X works fine) MSI-X table entries
>> on resume are partially different from the ones on suspend.
>
> Which is not a surprise. Please don't try to fiddle with that at the driver
> level. The irq and PCI core code are the ones in charge and if you'd
> restore at the wrong point then hell breaks loose.
>
Instead of spending a lot of effort on a workaround which may not be
acceptable, it may be better to fall back to MSI on all affected chip
versions. For two chip versions which were reported to have these issues
we're doing this already. I asked Realtek whether they have an overview
which chip versions are affected, let's see ..

The Realtek chips provide an alternative, register-based way to access
the MSI-X table, and their Windows driver seems to use it. See here:
https://patchwork.kernel.org/patch/4149171/

But as we handle all MSI-X basics in the PCI core, this isn't an option.

> Can you please do the following:
>
>   1) Store the PCI config space at suspend time
>   2) Compare the PCI config space at resume time and print the difference
>
> Do that on a working and a non-working version of Realtek NICs.
>
> Thanks,
>
> 	tglx
2018-08-23 3:49 GMT+08:00 Heiner Kallweit <hkallweit1@gmail.com>:
> On 22.08.2018 13:44, Thomas Gleixner wrote:
>> On Tue, 21 Aug 2018, Heiner Kallweit wrote:
>>> On 21.08.2018 21:31, David Miller wrote:
>>>> From: Heiner Kallweit <hkallweit1@gmail.com>
>>>> Date: Mon, 20 Aug 2018 22:46:48 +0200
>>>>
>>>>> I'm in contact with Realtek and according to them few chip versions
>>>>> seem to clear MSI-X table entries on resume from suspend. Checking
>>>>> with them how this could be fixed / worked around.
>>>>> Worst case we may have to disable MSI-X in general.
>>>>
>>>> I worry that if the chip does this, and somehow MSI-X is enabled and
>>>> an interrupt is generated, the chip will write to the cleared out
>>>> MSI-X address. This will either write garbage into memory or cause
>>>> a bus error and require PCI error recovery.
>>>>
>>>> It also looks like your test patch doesn't fix things for people who
>>>> have tested it.
>>>>
>>> The test patch was based on the first info from Realtek which made me
>>> think that the base address of the MSI-X table is cleared, what
>>> obviously is not the case.
>>>
>>> After some further tests it seems that the solution isn't as simple
>>> as storing the MSI-X table entries on suspend and restore them on
>>> resume. On my system (where MSI-X works fine) MSI-X table entries
>>> on resume are partially different from the ones on suspend.
>>
>> Which is not a surprise. Please don't try to fiddle with that at the driver
>> level. The irq and PCI core code are the ones in charge and if you'd
>> restore at the wrong point then hell breaks loose.
>>
> Instead of spending a lot of effort on a workaround which may not be
> acceptable, it may be better to fall back to MSI on all affected chip
> versions. For two chip versions which were reported to have these issues
> we're doing this already. I asked Realtek whether they have an overview
> which chip versions are affected, let's see ..
>
> The Realtek chips provide an alternative, register-based way to access
> the MSI-X table, and their Windows driver seems to use it. See here:
> https://patchwork.kernel.org/patch/4149171/
>
> But as we handle all MSI-X basics in the PCI core, this isn't an option.
>
>
>> Can you please do the following:

Tested on ASUS X441UAR equipped with RTL8106e.
This is the laptop whose ethernet does not come back after resume, if
it does not fall back to MSI.

Here is the full dmesg:
https://gist.github.com/starnight/e65a97c9bf2d558926895ab76974687e

>>   1) Store the PCI config space at suspend time

Before suspend:

dev@endless:~$ sudo lspci -xnnvvs 02:00.0
02:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8101/2/6E PCI Express Fast/Gigabit Ethernet controller [10ec:8136] (rev 07)
        Subsystem: ASUSTeK Computer Inc. RTL810xE PCI Express Fast Ethernet controller [1043:200f]
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 16
        Region 0: I/O ports at e000 [size=256]
        Region 2: Memory at ef100000 (64-bit, non-prefetchable) [size=4K]
        Region 4: Memory at e0000000 (64-bit, prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000 Data: 0000
        Capabilities: [70] Express (v2) Endpoint, MSI 01
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 10.000W
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Via message/WAKE#
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
                Vector table: BAR=4 offset=00000000
                PBA: BAR=4 offset=00000800
        Capabilities: [d0] Vital Product Data
pcilib: sysfs_read_vpd: read failed: Input/output error
                Not readable
        Capabilities: [100 v1] Advanced Error Reporting
                UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [140 v1] Virtual Channel
                Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb: Fixed- WRR32- WRR64- WRR128-
                Ctrl: ArbSelect=Fixed
                Status: InProgress-
                VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                        Status: NegoPending- InProgress-
        Capabilities: [160 v1] Device Serial Number 01-00-00-00-36-4c-e0-00
        Capabilities: [170 v1] Latency Tolerance Reporting
                Max snoop latency: 3145728ns
                Max no snoop latency: 3145728ns
        Kernel driver in use: r8169
        Kernel modules: r8169
00: ec 10 36 81 07 04 10 00 07 00 00 02 10 00 00 00
10: 01 e0 00 00 00 00 00 00 04 00 10 ef 00 00 00 00
20: 0c 00 00 e0 00 00 00 00 00 00 00 00 43 10 0f 20
30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 01 00 00

>>   2) Compare the PCI config space at resume time and print the difference

After resume:

dev@endless:~$ sudo lspci -xnnvvs 02:00.0
02:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8101/2/6E PCI Express Fast/Gigabit Ethernet controller [10ec:8136] (rev 07)
        Subsystem: ASUSTeK Computer Inc. RTL810xE PCI Express Fast Ethernet controller [1043:200f]
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 16
        Region 0: I/O ports at e000 [size=256]
        Region 2: Memory at ef100000 (64-bit, non-prefetchable) [size=4K]
        Region 4: Memory at e0000000 (64-bit, prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000 Data: 0000
        Capabilities: [70] Express (v2) Endpoint, MSI 01
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 10.000W
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Via message/WAKE#
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
                Vector table: BAR=4 offset=00000000
                PBA: BAR=4 offset=00000800
        Capabilities: [d0] Vital Product Data
pcilib: sysfs_read_vpd: read failed: Input/output error
                Not readable
        Capabilities: [100 v1] Advanced Error Reporting
                UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [140 v1] Virtual Channel
                Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb: Fixed- WRR32- WRR64- WRR128-
                Ctrl: ArbSelect=Fixed
                Status: InProgress-
                VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                        Status: NegoPending- InProgress-
        Capabilities: [160 v1] Device Serial Number 01-00-00-00-36-4c-e0-00
        Capabilities: [170 v1] Latency Tolerance Reporting
                Max snoop latency: 3145728ns
                Max no snoop latency: 3145728ns
        Kernel driver in use: r8169
        Kernel modules: r8169
00: ec 10 36 81 07 04 10 00 07 00 00 02 10 00 00 00
10: 01 e0 00 00 00 00 00 00 04 00 10 ef 00 00 00 00
20: 0c 00 00 e0 00 00 00 00 00 00 00 00 43 10 0f 20
30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 01 00 00

After comparing, there is no difference between before suspend and
after resume.

Regards,
Jian-Hong Pan

>> Do that on a working and a non-working version of Realtek NICs.
>>
>> Thanks,
>>
>> 	tglx
On Thu, Aug 23, 2018 at 06:46:28PM +0800, Jian-Hong Pan wrote:
> > On 22.08.2018 13:44, Thomas Gleixner wrote:
> >> Can you please do the following:
>
> Tested on ASUS X441UAR equipped with RTL8106e.
> This is the laptop whose ethernet does not come back after resume, if
> it does not fall back to MSI.
> ...

> dev@endless:~$ sudo lspci -xnnvvs 02:00.0
> ...
> 00: ec 10 36 81 07 04 10 00 07 00 00 02 10 00 00 00
> 10: 01 e0 00 00 00 00 00 00 04 00 10 ef 00 00 00 00
> 20: 0c 00 00 e0 00 00 00 00 00 00 00 00 43 10 0f 20
> 30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 01 00 00
>
> After comparing, there is no difference between before suspend and
> after resume.

It'd be better to compare the hex data directly and ignore the lspci
decoding, since lspci doesn't decode everything. You only dumped the
first 0x40 bytes of config space, and all capabilities, including the
MSI and MSI-X capabilities, are past that:

>         Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
>         Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
>                 Vector table: BAR=4 offset=00000000
>                 PBA: BAR=4 offset=00000800

In addition, some of the MSI-X information for this device is in BAR
4. "lspci -xxx" will dump all config space, and you can use a tool
like http://cmp.felk.cvut.cz/~pisa/linux/rdwrmem.c or
https://github.com/billfarrow/pcimem to dump the BAR contents.
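[Editor's sketch] To make the point about capabilities living past offset 0x40 concrete, here is a minimal user-space sketch that walks the capability list of a raw config-space buffer and decodes the MSI-X capability by hand. The register offsets and the capability ID follow the PCI specification; the helper names (find_cap, decode_msix, struct msix_info, sample_cfg) are hypothetical. The sample bytes are assumed from this device's lspci decoding (capabilities at 0x40, 0x50, 0x70, 0xb0, 0xd0) and the "lspci -xxx" dump posted in the follow-up; all other bytes are left zero.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_STATUS          0x06
#define PCI_STATUS_CAP_LIST 0x10 /* device implements a capability list */
#define PCI_CAPABILITY_LIST 0x34 /* offset of the first capability pointer */
#define PCI_CAP_ID_MSIX     0x11

/* Only the bytes needed for the capability chain of this device. */
static const uint8_t sample_cfg[256] = {
	[0x06] = 0x10,                /* Status: capability list present */
	[0x34] = 0x40,                /* first capability at 0x40 */
	[0x40] = 0x01, [0x41] = 0x50, /* Power Management, next 0x50 */
	[0x50] = 0x05, [0x51] = 0x70, /* MSI, next 0x70 */
	[0x70] = 0x10, [0x71] = 0xb0, /* PCI Express, next 0xb0 */
	[0xb0] = 0x11, [0xb1] = 0xd0, /* MSI-X, next 0xd0 */
	[0xb2] = 0x03, [0xb3] = 0x80, /* Message Control = 0x8003 */
	[0xb4] = 0x04,                /* table: BIR 4, offset 0 */
	[0xb8] = 0x04, [0xb9] = 0x08, /* PBA: BIR 4, offset 0x800 */
	[0xd0] = 0x03,                /* Vital Product Data, end of list */
};

struct msix_info {
	int enabled;
	unsigned int table_size; /* number of vectors */
	unsigned int table_bir;
	uint32_t table_offset;
	unsigned int pba_bir;
	uint32_t pba_offset;
};

/* Return the config-space offset of capability cap_id, or 0 if absent. */
static uint8_t find_cap(const uint8_t *cfg, uint8_t cap_id)
{
	uint8_t pos;
	int ttl = 48; /* guard against a looping list */

	assert(cfg);
	if (!(cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST))
		return 0;

	pos = cfg[PCI_CAPABILITY_LIST] & 0xfc;
	while (pos >= 0x40 && ttl-- > 0) {
		if (cfg[pos] == cap_id)
			return pos;
		pos = cfg[pos + 1] & 0xfc;
	}
	return 0;
}

/* Read a little-endian 32-bit value. */
static uint32_t rd32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the MSI-X capability located at offset pos. */
static struct msix_info decode_msix(const uint8_t *cfg, uint8_t pos)
{
	struct msix_info info;
	uint16_t ctrl;
	uint32_t table, pba;

	assert(cfg && pos >= 0x40 && pos <= 0xf4);

	ctrl = (uint16_t)(cfg[pos + 2] | (cfg[pos + 3] << 8));
	table = rd32(&cfg[pos + 4]);
	pba = rd32(&cfg[pos + 8]);

	info.enabled = (ctrl >> 15) & 1;
	info.table_size = (ctrl & 0x07ffu) + 1u;
	info.table_bir = table & 0x7u;
	info.table_offset = table & ~(uint32_t)0x7;
	info.pba_bir = pba & 0x7u;
	info.pba_offset = pba & ~(uint32_t)0x7;
	return info;
}
```

With the sample bytes, the decoded values agree with what lspci printed for this device: MSI-X enabled, 4 vectors, vector table in BAR 4 at offset 0 and PBA in BAR 4 at offset 0x800. The capability itself only points at the table; the table entries live in BAR 4 memory and are not visible in a config-space dump.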
2018-08-23 21:38 GMT+08:00 Bjorn Helgaas <helgaas@kernel.org>:
> On Thu, Aug 23, 2018 at 06:46:28PM +0800, Jian-Hong Pan wrote:
>> > On 22.08.2018 13:44, Thomas Gleixner wrote:
>> >> Can you please do the following:
>>
>> Tested on ASUS X441UAR equipped with RTL8106e.
>> This is the laptop whose ethernet does not come back after resume, if
>> it does not fall back to MSI.
>> ...
>
>> dev@endless:~$ sudo lspci -xnnvvs 02:00.0
>> ...
>> 00: ec 10 36 81 07 04 10 00 07 00 00 02 10 00 00 00
>> 10: 01 e0 00 00 00 00 00 00 04 00 10 ef 00 00 00 00
>> 20: 0c 00 00 e0 00 00 00 00 00 00 00 00 43 10 0f 20
>> 30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 01 00 00
>>
>> After comparing, there is no difference between before suspend and
>> after resume.
>
> It'd be better to compare the hex data directly and ignore the lspci
> decoding, since lspci doesn't decode everything. You only dumped the
> first 0x40 bytes of config space, and all capabilities, including the
> MSI and MSI-X capabilities, are past that:
>
>>         Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
>>         Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
>>                 Vector table: BAR=4 offset=00000000
>>                 PBA: BAR=4 offset=00000800
>
> In addition, some of the MSI-X information for this device is in BAR
> 4. "lspci -xxx" will dump all config space, and you can use a tool
> like http://cmp.felk.cvut.cz/~pisa/linux/rdwrmem.c or
> https://github.com/billfarrow/pcimem to dump the BAR contents.

Tested again on ASUS X441UAR equipped with RTL8106e, without falling
back to MSI. Used lspci and https://github.com/billfarrow/pcimem

Here is the status before suspend:

dev@endless:~$ sudo lspci -xxxs 02:00.0
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8101/2/6E PCI Express Fast/Gigabit Ethernet controller (rev 07)
00: ec 10 36 81 07 04 10 00 07 00 00 02 10 00 00 00
10: 01 e0 00 00 00 00 00 00 04 00 10 ef 00 00 00 00
20: 0c 00 00 e0 00 00 00 00 00 00 00 00 43 10 0f 20
30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 01 00 00
40: 01 50 c3 ff 08 00 00 00 00 00 00 00 00 00 00 00
50: 05 70 80 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 10 b0 02 02 c0 8d 90 05 10 20 10 00 11 7c 47 00
80: 42 01 11 10 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 1f 08 0c 00 00 04 00 00 02 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 11 d0 03 80 04 00 00 00 04 08 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

dev@endless:~$ sudo ~/pcimem/pcimem /sys/devices/pci0000\:00/0000\:00\:1c.4/0000\:02\:00.0/resource4 0 b*16384
[sudo] password for dev:
/sys/devices/pci0000:00/0000:00:1c.4/0000:02:00.0/resource4 opened.
Target offset is 0x0, page size is 4096
mmap(0, 16384, 0x3, 0x1, 3, 0x0)
PCI Memory mapped to address 0x7f15186d1000.
0x0000: 0x38
0x0001: 0x03
0x0002: 0xE0
0x0003: 0xFE
0x0004: 0x00
...
0x0010: 0x41
0x0011: 0x72
.
.
.
0x003C: 0x01
0x003D: 0x00
...
0x1000: 0x38
0x1001: 0x03
.
.
.

After resume:

dev@endless:~$ sudo lspci -xxxs 02:00.0
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8101/2/6E PCI Express Fast/Gigabit Ethernet controller (rev 07)
00: ec 10 36 81 07 04 10 00 07 00 00 02 10 00 00 00
10: 01 e0 00 00 00 00 00 00 04 00 10 ef 00 00 00 00
20: 0c 00 00 e0 00 00 00 00 00 00 00 00 43 10 0f 20
30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 01 00 00
40: 01 50 c3 ff 08 00 00 00 00 00 00 00 00 00 00 00
50: 05 70 80 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 10 b0 02 02 c0 8d 90 05 10 20 10 00 11 7c 47 00
80: 42 01 11 10 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 1f 08 0c 00 00 04 00 00 02 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 11 d0 03 80 04 00 00 00 04 08 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

dev@endless:~$ sudo ~/pcimem/pcimem /sys/devices/pci0000\:00/0000\:00\:1c.4/0000\:02\:00.0/resource4 0 b*16384
/sys/devices/pci0000:00/0000:00:1c.4/0000:02:00.0/resource4 opened.
Target offset is 0x0, page size is 4096
mmap(0, 16384, 0x3, 0x1, 3, 0x0)
PCI Memory mapped to address 0x7f8d68dd5000.
0x0000: 0xFF
...

The config space is the same, but the values in BAR 4 are wrong after
resume: they all read back as 0xFF.

Regards,
Jian-Hong Pan
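[Editor's sketch] The BAR 4 observation above can be checked mechanically. The sketch below works on a byte buffer that was already read out of the BAR (for example with pcimem); it does not map the device itself. It tests whether the buffer is all 0xFF and extracts the low 32 bits of the message address from an MSI-X table entry, relying on the standard 16-byte entry layout (address low, address high, data, vector control). The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MSIX_ENTRY_SIZE 16 /* addr lo, addr hi, data, vector control */

/* Return 1 if every byte in the buffer is 0xFF, 0 otherwise. */
static int region_is_all_ones(const uint8_t *buf, size_t len)
{
	size_t i;

	assert(buf);
	for (i = 0; i < len; i++) {
		if (buf[i] != 0xff)
			return 0;
	}
	return len > 0;
}

/* Low 32 bits of the message address of MSI-X table entry idx. */
static uint32_t msix_entry_addr_lo(const uint8_t *table, unsigned int idx)
{
	const uint8_t *e;

	assert(table);
	e = table + (size_t)idx * MSIX_ENTRY_SIZE;
	return (uint32_t)e[0] | ((uint32_t)e[1] << 8) |
	       ((uint32_t)e[2] << 16) | ((uint32_t)e[3] << 24);
}
```

Applied to the first four bytes of the dump taken before suspend (38 03 E0 FE), the low address dword is 0xFEE00338, which lies in the 0xFEExxxxx range used for MSI messages on x86. After resume the same bytes read 0xFF. All-ones reads from a memory BAR commonly mean the read was not answered by the device at all, but that interpretation is an assumption here, not something the thread has confirmed.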
diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 0d9c3831838f..0efa977c422d 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -7071,17 +7071,20 @@ static int rtl_alloc_irq(struct rtl8169_private *tp)
 {
 	unsigned int flags;

-	if (tp->mac_version <= RTL_GIGA_MAC_VER_06) {
+	switch (tp->mac_version) {
+	case RTL_GIGA_MAC_VER_01 ... RTL_GIGA_MAC_VER_06:
 		RTL_W8(tp, Cfg9346, Cfg9346_Unlock);
 		RTL_W8(tp, Config2, RTL_R8(tp, Config2) & ~MSIEnable);
 		RTL_W8(tp, Cfg9346, Cfg9346_Lock);
 		flags = PCI_IRQ_LEGACY;
-	} else if (tp->mac_version == RTL_GIGA_MAC_VER_40) {
+		break;
+	case RTL_GIGA_MAC_VER_39 ... RTL_GIGA_MAC_VER_40:
 		/* This version was reported to have issues with resume
 		 * from suspend when using MSI-X
 		 */
 		flags = PCI_IRQ_LEGACY | PCI_IRQ_MSI;
-	} else {
+		break;
+	default:
 		flags = PCI_IRQ_ALL_TYPES;
 	}