Message ID | 20240530012727.324611-2-jjang@nvidia.com |
---|---|
State | Handled Elsewhere |
Series | selftest: drivers: Add support its msi hwirq checking |
[+cc Thomas]

On Wed, May 29, 2024 at 06:27:27PM -0700, Joseph Jang wrote:
> Validate there are no duplicate ITS-MSI hwirqs from the
> /sys/kernel/irq/*/hwirq.
>
> One example log show 2 duplicated MSI entries in the /proc/interrupts.
>
> 150: 0 ... ITS-MSI 3355443200 Edge pciehp
> 152: 0 ... ITS-MSI 3355443200 Edge pciehp

I don't know how ITS-MSI works, so I don't know whether it's an error
that both entries mention 3355443200.

3355443200 == 0xc8000000, which looks like it could be an address or
address/data pair or something, and it does make sense to me that if
two devices write the same MSI address/data, it should result in the
same IRQ.

It seems like maybe this is a generic issue, i.e., if this is a
problem, maybe it would affect *other* kinds of MSI too, not just
ITS-MSI?

> Kernel patch ("PCI/MSI: Fix MSI hwirq truncation") [1] fix above issue.
> [1]: https://lore.kernel.org/all/20240115135649.708536-1-vidyas@nvidia.com/
>
> Reviewed-by: Matthew R. Ochs <mochs@nvidia.com>
> Signed-off-by: Joseph Jang <jjang@nvidia.com>
> ---
>  tools/testing/selftests/drivers/irq/Makefile | 5 +++++
>  .../selftests/drivers/irq/its-msi-irq-test.sh | 20 +++++++++++++++++++
>  2 files changed, 25 insertions(+)
>  create mode 100644 tools/testing/selftests/drivers/irq/Makefile
>  create mode 100755 tools/testing/selftests/drivers/irq/its-msi-irq-test.sh
>
> diff --git a/tools/testing/selftests/drivers/irq/Makefile b/tools/testing/selftests/drivers/irq/Makefile
> new file mode 100644
> index 000000000000..569df5de22ee
> --- /dev/null
> +++ b/tools/testing/selftests/drivers/irq/Makefile
> @@ -0,0 +1,5 @@
> +# SPDX-License-Identifier: GPL-2.0
> +
> +TEST_PROGS := its-msi-irq-test.sh
> +
> +include ../../lib.mk
> diff --git a/tools/testing/selftests/drivers/irq/its-msi-irq-test.sh b/tools/testing/selftests/drivers/irq/its-msi-irq-test.sh
> new file mode 100755
> index 000000000000..87c88674903f
> --- /dev/null
> +++ b/tools/testing/selftests/drivers/irq/its-msi-irq-test.sh
> @@ -0,0 +1,20 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +
> +if [ -z "$(grep "ITS-MSI" /proc/interrupts)" ]; then
> +	echo "SKIP: no ITS-MSI irq."
> +	exit 4
> +fi
> +
> +# Get ITS-MSI hwirq list from /sys/kernel/irq/*/hwirq.
> +its_msi_irq_list=$(grep "ITS-MSI" /sys/kernel/irq/*/chip_name |

Is there a limit on the size of the "*" expansion here?

> +	awk -F ':' '{print $1}' |
> +	xargs -I {} sh -c 'cat $(dirname {})/hwirq' | sort -V)
> +
> +# Check whether could find duplicated its-msi hwirq or not.
> +if [ -n "$(echo "$its_msi_irq_list" | uniq -cd)" ]; then
> +	echo "ERROR: find duplicated its-msi hwirq."
> +	exit 1
> +fi
> +
> +exit 0
> --
> 2.34.1
>
On Wed, Jul 31 2024 at 14:24, Bjorn Helgaas wrote:
> On Wed, May 29, 2024 at 06:27:27PM -0700, Joseph Jang wrote:
>> Validate there are no duplicate ITS-MSI hwirqs from the
>> /sys/kernel/irq/*/hwirq.
>>
>> One example log show 2 duplicated MSI entries in the /proc/interrupts.
>>
>> 150: 0 ... ITS-MSI 3355443200 Edge pciehp
>> 152: 0 ... ITS-MSI 3355443200 Edge pciehp
>
> I don't know how ITS-MSI works, so I don't know whether it's an error
> that both entries mention 3355443200.
>
> 3355443200 == 0xc8000000, which looks like it could be an address or
> address/data pair or something, and it does make sense to me that if
> two devices write the same MSI address/data, it should result in the
> same IRQ.

That was an issue with truncation which got fixed some time ago:

  https://lore.kernel.org/all/20240115135649.708536-1-vidyas@nvidia.com/

> It seems like maybe this is a generic issue, i.e., if this is a
> problem, maybe it would affect *other* kinds of MSI too, not just
> ITS-MSI?

It's the same for ALL interrupts whether MSI or not. The requirement is
that for any interrupt chip all hardware interrupt numbers related to a
particular chip must be unique.

Adding a ITS-MSI specific parser is just wrong. It's a generic problem
and has absolutely nothing to do with ITS or MSI.

Aside of that the proposed parser does not even work anymore on 6.11
because we switched ARM[64] over to per device domains during the merge
window.

So if we want a selftest for the correctness of the hardware interrupt
numbers then it should grab the per interrupt sysfs entry 'chip_name'
and 'hwirq' pairs and do an analysis per 'chip_name' whether all
hardware interrupt numbers for a chip are unique.

Thanks,

        tglx
On Wed, Jul 31 2024 at 22:42, Thomas Gleixner wrote:
> Aside of that the proposed parser does not even work anymore on 6.11
> because we switched ARM[64] over to per device domains during the merge
> window.
>
> So if we want a selftest for the correctness of the hardware interrupt
> numbers then it should grab the per interrupt sysfs entry 'chip_name'
> and 'hwirq' pairs and do an analysis per 'chip_name' whether all
> hardware interrupt numbers for a chip are unique.

I just hacked up a 20 lines snake script to analyze it and indeed that
produces duplicates because some interrupt chips do not have unique chip
names as they are shared between interrupt domains and the chip names
are constant.

There are several ways to handle this:

  1) Amend /sys/kernel/irq/$N/chip_name with the irq domain name

  2) Expose the irq domain name in /sys/kernel/irq/$N/domain_name

  3) Utilize the existing /sys/kernel/debug/irq/ mechanism

#1 Does change the output of chip_name, but that is a kernel internal
   detail anyway so there is no real UABI concern.

#2 has the advantage that it does not change the output of chip_name
   but it consumes more memory for a dubious value.

#3 has the downside that it requires CONFIG_GENERIC_IRQ_DEBUGFS=y and is
   root only, but that should be not a problem for testing. We have
   other selftests which have Kconfig dependencies and root
   requirements. The upside is that it does not require kernel changes.

No real strong opinion either way, but all of that is better than a ITS
specific parser which fails to work on the next kernel version.

Thanks,

        tglx
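For illustration only, the generic per-'chip_name' analysis discussed in
this thread could look roughly like the sketch below. It is not the
script mentioned above and not part of the posted patch. The function
name find_duplicate_hwirqs and its optional directory argument are
hypothetical; the argument exists only so the logic can be pointed at a
fake tree. As noted above, chip names are shared between interrupt
domains, so any duplicate this reports on a real system may be a false
positive until the domain name is also taken into account.

```shell
#!/bin/bash
# Sketch only: print every "chip_name hwirq" pair that occurs more than
# once under an irq sysfs directory.
#
# $1 is a hypothetical override for the directory to scan; it defaults
# to /sys/kernel/irq.
find_duplicate_hwirqs() {
	local irq_dir="${1:-/sys/kernel/irq}"
	local d chip hwirq

	for d in "$irq_dir"/*/; do
		chip=$(cat "${d}chip_name" 2>/dev/null)
		hwirq=$(cat "${d}hwirq" 2>/dev/null)
		# Skip interrupts that have no chip or no hwirq assigned.
		[ -n "$chip" ] && [ -n "$hwirq" ] || continue
		printf '%s %s\n' "$chip" "$hwirq"
	done | sort | uniq -d
}
```

A selftest wrapper could call the function without arguments and fail
when its output is non-empty, for example
dups=$(find_duplicate_hwirqs); [ -z "$dups" ] || exit 1. Looping over the
directories inside the shell also avoids passing the whole "*" expansion
to an external command as arguments.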
diff --git a/tools/testing/selftests/drivers/irq/Makefile b/tools/testing/selftests/drivers/irq/Makefile
new file mode 100644
index 000000000000..569df5de22ee
--- /dev/null
+++ b/tools/testing/selftests/drivers/irq/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+
+TEST_PROGS := its-msi-irq-test.sh
+
+include ../../lib.mk
diff --git a/tools/testing/selftests/drivers/irq/its-msi-irq-test.sh b/tools/testing/selftests/drivers/irq/its-msi-irq-test.sh
new file mode 100755
index 000000000000..87c88674903f
--- /dev/null
+++ b/tools/testing/selftests/drivers/irq/its-msi-irq-test.sh
@@ -0,0 +1,20 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+if [ -z "$(grep "ITS-MSI" /proc/interrupts)" ]; then
+	echo "SKIP: no ITS-MSI irq."
+	exit 4
+fi
+
+# Get ITS-MSI hwirq list from /sys/kernel/irq/*/hwirq.
+its_msi_irq_list=$(grep "ITS-MSI" /sys/kernel/irq/*/chip_name |
+	awk -F ':' '{print $1}' |
+	xargs -I {} sh -c 'cat $(dirname {})/hwirq' | sort -V)
+
+# Check whether could find duplicated its-msi hwirq or not.
+if [ -n "$(echo "$its_msi_irq_list" | uniq -cd)" ]; then
+	echo "ERROR: find duplicated its-msi hwirq."
+	exit 1
+fi
+
+exit 0