Message ID | 20230514103634.235917-1-mail@mariushoch.de |
---|---|
Headers | show |
Series | i2c: i801: Force no IRQ for Dell Latitude E7450 | expand |
Hi Marius, On Sun, 14 May 2023 12:36:32 +0200, Marius Hoch wrote: > The Dell Latitude E7450 uses IRQ 18 for the accelerometer, > but also claims that the SMBus uses IRQ 18. This will > result in: > > i801_smbus 0000:00:1f.3: PCI INT C: failed to register GSI > i801_smbus 0000:00:1f.3: Failed to enable SMBus PCI device (-16) > i801_smbus: probe of 0000:00:1f.3 failed with error -16 The i2c-i801 driver supports shared IRQ. If this fails, this means that the other driver is not passing IRQF_SHARED when registering the interrupt. Which driver is this? I'd rather check whether sharing the IRQ is possible, rather that falling back to polling, which has a performance cost. > Force the SMBus IRQ to IRQ_NOTCONNECTED in this case, so that > we fall back to polling, which also seems to be what the (very > dated) Windows 7 drivers on the Dell Latitude E7450 do. What makes you think so?
Hi, thank you for the reply! On 23/05/2023 20:03, Jean Delvare wrote: > Hi Marius, > > On Sun, 14 May 2023 12:36:32 +0200, Marius Hoch wrote: >> The Dell Latitude E7450 uses IRQ 18 for the accelerometer, >> but also claims that the SMBus uses IRQ 18. This will >> result in: >> >> i801_smbus 0000:00:1f.3: PCI INT C: failed to register GSI >> i801_smbus 0000:00:1f.3: Failed to enable SMBus PCI device (-16) >> i801_smbus: probe of 0000:00:1f.3 failed with error -16 > The i2c-i801 driver supports shared IRQ. If this fails, this means that > the other driver is not passing IRQF_SHARED when registering the > interrupt. Which driver is this? I'd rather check whether sharing the > IRQ is possible, rather that falling back to polling, which has a > performance cost. I don't think this is a conflict rather than a completely bogus entry: smo8800 uses IRQ 18 (the freefall sensor). For the SMBus in acpi_pci_irq_enable, acpi_register_gsi fails for GSI 18 with IRQ 255 (dev->irq), independently from the presence of the dell_smo8800 module. Now looking into this again, seeing dev->irq at 255 seems very suspicious here? Doesn't that mean not connected (although I'm not sure how this relates to it supposedly having GSI 18)? > >> Force the SMBus IRQ to IRQ_NOTCONNECTED in this case, so that >> we fall back to polling, which also seems to be what the (very >> dated) Windows 7 drivers on the Dell Latitude E7450 do. > What makes you think so? > According to the Windows 7 device manager IRQ view, the SMBus has no IRQ assigned, which I assumed implies that polling is used. If there is another way to check this on Windows 7, please let me know. Cheers, Marius
Hi Marius, On Sat, 3 Jun 2023 11:24:02 +0200, Marius Hoch wrote: > On 23/05/2023 20:03, Jean Delvare wrote: > > On Sun, 14 May 2023 12:36:32 +0200, Marius Hoch wrote: > >> The Dell Latitude E7450 uses IRQ 18 for the accelerometer, > >> but also claims that the SMBus uses IRQ 18. This will > >> result in: > >> > >> i801_smbus 0000:00:1f.3: PCI INT C: failed to register GSI > >> i801_smbus 0000:00:1f.3: Failed to enable SMBus PCI device (-16) > >> i801_smbus: probe of 0000:00:1f.3 failed with error -16 > > The i2c-i801 driver supports shared IRQ. If this fails, this means that > > the other driver is not passing IRQF_SHARED when registering the > > interrupt. Which driver is this? I'd rather check whether sharing the > > IRQ is possible, rather that falling back to polling, which has a > > performance cost. > I don't think this is a conflict rather than a completely bogus entry: > smo8800 uses IRQ 18 (the freefall sensor). You're probably right. I admit I misread your report originally and thought requesting the IRQ was failing. But actually the failure happens before that, when enabling the PCI device. So its not related to sharing the interrupt. > For the SMBus in acpi_pci_irq_enable, acpi_register_gsi fails for GSI 18 > with IRQ 255 (dev->irq), independently from the presence of the > dell_smo8800 module. > > Now looking into this again, seeing dev->irq at 255 seems very > suspicious here? Doesn't that mean not connected (although I'm not sure > how this relates to it supposedly having GSI 18)? I admit I don't know. I'm not familiar with how GSI numbers relate to IRQ numbers. I think I understand that GSI numbers are an ACPI thing, and the ACPI layer is responsible for mapping these to actual IRQ numbers? Is there a GSI-to-IRQ table available somewhere as part of the ACPI tables? If so, it would be interesting to disassemble the ACPI tables on your system and check what this looks like for you. If this is a bug in the ACPI data then it might be worth booting with acpi=noirq and see if it helps. This option might break other things though (like free fall detection or thermal management) so be cautious. IRQ number 255 indeed looks suspicious, but I'm also not aware of this being a special value (nr_irqs is defined as an unsigned int, which suggest that large IRQ numbers, albeit unusual on desktop and laptop systems, are supported and frequently seen on large server systems), so the i2c-i801 driver has no reason to handle it in a particular way. Out of curiosity, did you check for a BIOS update for your laptop? Did you look at BIOS option to see if by any chance enabling/disabling the SMBus interrupt is an option there? I'm also curious how you collected the IRQ value. Did you boot with some debug kernel parameter, like dyndbg="file pci_irq.c +p"? Did you manage to figure out where in the function call chain (starting with pcim_enable_device) the failure actually happens? Even if IRQ value 255 is most probably wrong in your case, I'm surprised that this causes an error at device activation time, rather than when later requesting the IRQ. > >> Force the SMBus IRQ to IRQ_NOTCONNECTED in this case, so that > >> we fall back to polling, which also seems to be what the (very > >> dated) Windows 7 drivers on the Dell Latitude E7450 do. > > What makes you think so? > > According to the Windows 7 device manager IRQ view, the SMBus has no IRQ > assigned, which I assumed implies that polling is used. If there is > another way to check this on Windows 7, please let me know. That's a reasonable assumption, and not being familiar with Windows, I don't have any other suggestion. However that doesn't necessarily mean that interrupts can't work. After all, the original i2c-i801 Linux driver also did not support interrupts.
Hi again Marius, On Sun, 4 Jun 2023 16:01:32 +0200, Jean Delvare wrote: > IRQ number 255 indeed looks suspicious, but I'm also not aware of this > being a special value (nr_irqs is defined as an unsigned int, which > suggest that large IRQ numbers, albeit unusual on desktop and laptop > systems, are supported and frequently seen on large server systems), so > the i2c-i801 driver has no reason to handle it in a particular way. OK, I stand corrected. There's this interesting comment in drivers/acpi/pci_irq.c:acpi_pci_irq_valid(): /* * On x86 irq line 0xff means "unknown" or "no connection" * (PCI 3.0, Section 6.2.4, footnote on page 223). */ So that's probably what you are seeing on your Dell laptop. Unfortunately this function is static inline, so we can't call it from the i2c-i801 driver. It's called by acpi_pci_irq_enable() but only if (gsi < 0), which isn't the case on your system.
Hi Jean, Dne 04. 06. 23 v 16:01 Jean Delvare napsal(a): > I admit I don't know. I'm not familiar with how GSI numbers relate to > IRQ numbers. I think I understand that GSI numbers are an ACPI thing, > and the ACPI layer is responsible for mapping these to actual IRQ > numbers? Is there a GSI-to-IRQ table available somewhere as part of the > ACPI tables? If so, it would be interesting to disassemble the ACPI > tables on your system and check what this looks like for you. You need to check _PRT method of PCI0 device in APIC mode. This will tell you to what GSI (APIC/pin) it goes. To check you need to have a look to the DSDT table and decompile it. You can obtain it by running acpidump > tables.txt and the acpixtract -a tables.txt and finally running iasl -d dsdt.asl. Then, because the SMBUS lives on bus0, you just need to check _PRT method under PCI0 device for the entry of 001fffff (INT C). If this entry exists it will tell you where is it connected. I assume this has no entry and then as a last chance Linux tries the PCI IRQ entry in the configuration space gets queried. And this has 0xff which is telling no IRQ connected. The southbridge has a IRQ routing configuration register which can be used to verify if this is routed anywhere or really left "unconnected". This is usually in the the RCBA base + something register. Have a look to "D31IP" register: SMBus Pin (SMIP) — R/W. Indicates which pin the SMBus controller drives as its interrupt. bits 15:12 If there is 0, it is not routed anywhere. Also you need to check "D31IR" where the PIN C is going: Interrupt C Pin Route (ICR) — R/W. Indicates which physical pin on the PCH is connected to the INTC# pin reported for device 31 functions. The PIRQA corresponds to the PIN 16 of IOAPIC etc. If you need more info on that feel free to contact me. I can try to help. Thanks, Rudolf
Hi Rudolf, thanks for the reply. On 04/06/2023 22:41, Rudolf Marek wrote: > Hi Jean, > > Dne 04. 06. 23 v 16:01 Jean Delvare napsal(a): >> I admit I don't know. I'm not familiar with how GSI numbers relate to >> IRQ numbers. I think I understand that GSI numbers are an ACPI thing, >> and the ACPI layer is responsible for mapping these to actual IRQ >> numbers? Is there a GSI-to-IRQ table available somewhere as part of the >> ACPI tables? If so, it would be interesting to disassemble the ACPI >> tables on your system and check what this looks like for you. > > You need to check _PRT method of PCI0 device in APIC mode. > This will tell you to what GSI (APIC/pin) it goes. > To check you need to have a look to the DSDT table and decompile > it. You can obtain it by running acpidump > tables.txt and the > acpixtract -a tables.txt > and finally running iasl -d dsdt.asl. > > Then, because the SMBUS lives on bus0, you just need to check _PRT method > under PCI0 device for the entry of 001fffff (INT C). > If this entry exists it will tell you where is it connected. The PCI0 device's _PRT, when PICM is true, returns AR00. That contains: Package (0x04) { 0x001FFFFF, 0x02, Zero, 0x12 }, So according to this IRQ (=GSI?) 18 should be used (which, as mentioned earlier is also used for the freefall device). (In acpi_pci_irq_enable) acpi_register_gsi fails for this (with gsi=18) and afterwards dev->irq is at 255 (which might just be an initial value? dev->irq is only set in acpi_pci_irq_enable afterwards). > I assume this has no entry and then as a last chance Linux tries the > PCI IRQ entry > in the configuration space gets queried. And this has 0xff which is > telling no IRQ connected. > > The southbridge has a IRQ routing configuration register which can be > used to verify > if this is routed anywhere or really left "unconnected". This is > usually in the the RCBA base + something > register. Have a look to "D31IP" register: > > SMBus Pin (SMIP) — R/W. Indicates which pin the SMBus controller > drives as its > interrupt. bits 15:12 > > If there is 0, it is not routed anywhere. Also you need to check > "D31IR" where the PIN C is going: > > Interrupt C Pin Route (ICR) — R/W. Indicates which physical pin on the > PCH is > connected to the INTC# pin reported for device 31 functions. > > The PIRQA corresponds to the PIN 16 of IOAPIC etc. > > If you need more info on that feel free to contact me. I can try to help. I skipped these steps (after identifying the _PRT entry) as it seems to me that we have a ACPI entry here (it's just not functional), thus this information would presumably be of no help. Further help in debugging this would be much appreciated. In order to further see why acpi_register_gsi failed, I also got the irqdomain debug output and this also didn't help me (except that it doesn't register a domain for our SMBus, like "irq: Added domain IR-PCI-MSI-0000:00:XX"). > > > Thanks, > Rudolf >
Hi Jean, thanks again for all the helpful replies! On 04/06/2023 16:01, Jean Delvare wrote: > Hi Marius, > > On Sat, 3 Jun 2023 11:24:02 +0200, Marius Hoch wrote: >> On 23/05/2023 20:03, Jean Delvare wrote: >>> On Sun, 14 May 2023 12:36:32 +0200, Marius Hoch wrote: >>>> The Dell Latitude E7450 uses IRQ 18 for the accelerometer, >>>> but also claims that the SMBus uses IRQ 18. This will >>>> result in: >>>> >>>> i801_smbus 0000:00:1f.3: PCI INT C: failed to register GSI >>>> i801_smbus 0000:00:1f.3: Failed to enable SMBus PCI device (-16) >>>> i801_smbus: probe of 0000:00:1f.3 failed with error -16 >>> The i2c-i801 driver supports shared IRQ. If this fails, this means that >>> the other driver is not passing IRQF_SHARED when registering the >>> interrupt. Which driver is this? I'd rather check whether sharing the >>> IRQ is possible, rather that falling back to polling, which has a >>> performance cost. >> I don't think this is a conflict rather than a completely bogus entry: >> smo8800 uses IRQ 18 (the freefall sensor). > You're probably right. I admit I misread your report originally and > thought requesting the IRQ was failing. But actually the failure > happens before that, when enabling the PCI device. So its not related > to sharing the interrupt. > >> For the SMBus in acpi_pci_irq_enable, acpi_register_gsi fails for GSI 18 >> with IRQ 255 (dev->irq), independently from the presence of the >> dell_smo8800 module. >> >> Now looking into this again, seeing dev->irq at 255 seems very >> suspicious here? Doesn't that mean not connected (although I'm not sure >> how this relates to it supposedly having GSI 18)? > I admit I don't know. I'm not familiar with how GSI numbers relate to > IRQ numbers. I think I understand that GSI numbers are an ACPI thing, > and the ACPI layer is responsible for mapping these to actual IRQ > numbers? Is there a GSI-to-IRQ table available somewhere as part of the > ACPI tables? If so, it would be interesting to disassemble the ACPI > tables on your system and check what this looks like for you. > > If this is a bug in the ACPI data then it might be worth booting with > acpi=noirq and see if it helps. This option might break other things > though (like free fall detection or thermal management) so be cautious. I just booted with acpi=noirq, the PCI device no longer fails to be enabled and the device got assigned IRQ 19 now (according to lspci -v/ proc/interrupts), while the freefall device remained at IRQ 18. Interestingly dmesg is full of spam from the freefall device (endlessly reporting that freefall got detected, probably indicating a problem in IRQ handling, yikes). Booting without the smo8800 module results in: [root@fedora ~]# dmesg | grep -i smbus [ 20.042515] i801_smbus 0000:00:1f.3: PCI->APIC IRQ transform: INT C -> IRQ 19 [ 20.042548] i801_smbus 0000:00:1f.3: SPD Write Disable is set [ 20.042574] i801_smbus 0000:00:1f.3: SMBus using PCI interrupt [ 20.051270] i801_smbus 0000:00:1f.3: Accelerometer lis3lv02d is present on SMBus but its address is unknown, skipping registration [ 20.253942] i801_smbus 0000:00:1f.3: Transaction timeout [ 20.461962] i801_smbus 0000:00:1f.3: Transaction timeout The "Transaction timeout" messages might indicate that interrupt routing isn't actually working? > > IRQ number 255 indeed looks suspicious, but I'm also not aware of this > being a special value (nr_irqs is defined as an unsigned int, which > suggest that large IRQ numbers, albeit unusual on desktop and laptop > systems, are supported and frequently seen on large server systems), so > the i2c-i801 driver has no reason to handle it in a particular way. > > Out of curiosity, did you check for a BIOS update for your laptop? Did > you look at BIOS option to see if by any chance enabling/disabling the > SMBus interrupt is an option there? There is a newer BIOS version, yes. I didn't yet apply it, though. Reading up on the kernel bug, I doubt it will fix these issues (it would be interesting to see though, but this might make this bug unreproducible for me). > > I'm also curious how you collected the IRQ value. Did you boot with > some debug kernel parameter, like dyndbg="file pci_irq.c +p"? > > Did you manage to figure out where in the function call chain (starting > with pcim_enable_device) the failure actually happens? Even if IRQ > value 255 is most probably wrong in your case, I'm surprised that this > causes an error at device activation time, rather than when later > requesting the IRQ. I'm not sure anymore IRQ 255 is actually being set from ACPI or PCI registers but might just be a pre-initialization value (see my reply to Rudolf). > >>>> Force the SMBus IRQ to IRQ_NOTCONNECTED in this case, so that >>>> we fall back to polling, which also seems to be what the (very >>>> dated) Windows 7 drivers on the Dell Latitude E7450 do. >>> What makes you think so? >> According to the Windows 7 device manager IRQ view, the SMBus has no IRQ >> assigned, which I assumed implies that polling is used. If there is >> another way to check this on Windows 7, please let me know. > That's a reasonable assumption, and not being familiar with Windows, I > don't have any other suggestion. However that doesn't necessarily mean > that interrupts can't work. After all, the original i2c-i801 Linux > driver also did not support interrupts. > Cheers, Marius
On Sun, 18 Jun 2023 15:42:40 +0200, Marius Hoch wrote: > I just booted with acpi=noirq, the PCI device no longer fails to be > enabled and the device got assigned IRQ 19 now (according to lspci -v/ > proc/interrupts), while the freefall device remained at IRQ 18. > Interestingly dmesg is full of spam from the freefall device (endlessly > reporting that freefall got detected, probably indicating a problem in > IRQ handling, yikes). Unfortunately, while acpi=noirq can be useful for testing purposes and bug investigation, there's no guarantee that a modern x86 system can actually work properly without ACPI-based PCI routing. > Booting without the smo8800 module results in: > [root@fedora ~]# dmesg | grep -i smbus > [ 20.042515] i801_smbus 0000:00:1f.3: PCI->APIC IRQ transform: INT C > -> IRQ 19 > [ 20.042548] i801_smbus 0000:00:1f.3: SPD Write Disable is set > [ 20.042574] i801_smbus 0000:00:1f.3: SMBus using PCI interrupt > [ 20.051270] i801_smbus 0000:00:1f.3: Accelerometer lis3lv02d is > present on SMBus but its address is unknown, skipping registration > [ 20.253942] i801_smbus 0000:00:1f.3: Transaction timeout > [ 20.461962] i801_smbus 0000:00:1f.3: Transaction timeout > > The "Transaction timeout" messages might indicate that interrupt routing > isn't actually working? Indeed. This means the driver waited for an interrupt but was never called back.
Hello! In your log is one important information. On Sunday 18 June 2023 15:42:40 Marius Hoch wrote: > Hi Jean, > > thanks again for all the helpful replies! > > On 04/06/2023 16:01, Jean Delvare wrote: > > Hi Marius, > > > > On Sat, 3 Jun 2023 11:24:02 +0200, Marius Hoch wrote: > > > On 23/05/2023 20:03, Jean Delvare wrote: > > > > On Sun, 14 May 2023 12:36:32 +0200, Marius Hoch wrote: > > > > > The Dell Latitude E7450 uses IRQ 18 for the accelerometer, > > > > > but also claims that the SMBus uses IRQ 18. This will > > > > > result in: > > > > > > > > > > i801_smbus 0000:00:1f.3: PCI INT C: failed to register GSI > > > > > i801_smbus 0000:00:1f.3: Failed to enable SMBus PCI device (-16) > > > > > i801_smbus: probe of 0000:00:1f.3 failed with error -16 > > > > The i2c-i801 driver supports shared IRQ. If this fails, this means that > > > > the other driver is not passing IRQF_SHARED when registering the > > > > interrupt. Which driver is this? I'd rather check whether sharing the > > > > IRQ is possible, rather that falling back to polling, which has a > > > > performance cost. > > > I don't think this is a conflict rather than a completely bogus entry: > > > smo8800 uses IRQ 18 (the freefall sensor). > > You're probably right. I admit I misread your report originally and > > thought requesting the IRQ was failing. But actually the failure > > happens before that, when enabling the PCI device. So its not related > > to sharing the interrupt. > > > > > For the SMBus in acpi_pci_irq_enable, acpi_register_gsi fails for GSI 18 > > > with IRQ 255 (dev->irq), independently from the presence of the > > > dell_smo8800 module. > > > > > > Now looking into this again, seeing dev->irq at 255 seems very > > > suspicious here? Doesn't that mean not connected (although I'm not sure > > > how this relates to it supposedly having GSI 18)? > > I admit I don't know. I'm not familiar with how GSI numbers relate to > > IRQ numbers. I think I understand that GSI numbers are an ACPI thing, > > and the ACPI layer is responsible for mapping these to actual IRQ > > numbers? Is there a GSI-to-IRQ table available somewhere as part of the > > ACPI tables? If so, it would be interesting to disassemble the ACPI > > tables on your system and check what this looks like for you. > > > > If this is a bug in the ACPI data then it might be worth booting with > > acpi=noirq and see if it helps. This option might break other things > > though (like free fall detection or thermal management) so be cautious. > I just booted with acpi=noirq, the PCI device no longer fails to be enabled > and the device got assigned IRQ 19 now (according to lspci -v/ > proc/interrupts), while the freefall device remained at IRQ 18. > Interestingly dmesg is full of spam from the freefall device (endlessly > reporting that freefall got detected, probably indicating a problem in IRQ > handling, yikes). > > Booting without the smo8800 module results in: > [root@fedora ~]# dmesg | grep -i smbus > [ 20.042515] i801_smbus 0000:00:1f.3: PCI->APIC IRQ transform: INT C -> > IRQ 19 > [ 20.042548] i801_smbus 0000:00:1f.3: SPD Write Disable is set > [ 20.042574] i801_smbus 0000:00:1f.3: SMBus using PCI interrupt > [ 20.051270] i801_smbus 0000:00:1f.3: Accelerometer lis3lv02d is present > on SMBus but its address is unknown, skipping registration The accelerometer with "conflicting" interrupt line is connected via that same SMBus. You could be able to detects its address via i2cdetect and manually load via: echo lis3lv02d [address] > /sys/.../new_device My guess, are not all devices on SMBus using same shared interrupt line?
On Sun, May 14, 2023 at 12:36:32PM +0200, Marius Hoch wrote: > The Dell Latitude E7450 uses IRQ 18 for the accelerometer, > but also claims that the SMBus uses IRQ 18. This will > result in: > > i801_smbus 0000:00:1f.3: PCI INT C: failed to register GSI > i801_smbus 0000:00:1f.3: Failed to enable SMBus PCI device (-16) > i801_smbus: probe of 0000:00:1f.3 failed with error -16 > > Force the SMBus IRQ to IRQ_NOTCONNECTED in this case, so that > we fall back to polling, which also seems to be what the (very > dated) Windows 7 drivers on the Dell Latitude E7450 do. > > This was tested on Dell Latitude E7450. > > I chose to explicitly list all affected devices here, but > alternatively it would be possible to do this programmatically: > If the initial pcim_enable_device fails and we're on (any) > Dell Latitude, re-try with IRQ_NOTCONNECTED. > > Marius Hoch (2): > i2c: i801: Force no IRQ for Dell Latitude E7450 > i2c: i801: Force no IRQ for further Dell Latitudes > > drivers/i2c/busses/i2c-i801.c | 34 ++++++++++++++++++++++++++++++++++ > 1 file changed, 34 insertions(+) Where are we with this patch set? Other solution found already? Discussion stalled? Too much other things going on?