diff mbox series

[1/1] selftest: drivers: Add support its msi hwirq checking

Message ID 20240530012727.324611-2-jjang@nvidia.com
State Handled Elsewhere
Series selftest: drivers: Add support its msi hwirq checking

Commit Message

Joseph Jang May 30, 2024, 1:27 a.m. UTC
Validate that there are no duplicate ITS-MSI hwirqs in
/sys/kernel/irq/*/hwirq.

One example log shows two duplicated MSI entries in /proc/interrupts.

150: 0 ... ITS-MSI 3355443200 Edge      pciehp
152: 0 ... ITS-MSI 3355443200 Edge      pciehp

Kernel patch ("PCI/MSI: Fix MSI hwirq truncation") [1] fixes the above issue.
[1]: https://lore.kernel.org/all/20240115135649.708536-1-vidyas@nvidia.com/

Reviewed-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Joseph Jang <jjang@nvidia.com>
---
 tools/testing/selftests/drivers/irq/Makefile  |  5 +++++
 .../selftests/drivers/irq/its-msi-irq-test.sh | 20 +++++++++++++++++++
 2 files changed, 25 insertions(+)
 create mode 100644 tools/testing/selftests/drivers/irq/Makefile
 create mode 100755 tools/testing/selftests/drivers/irq/its-msi-irq-test.sh

Comments

Bjorn Helgaas July 31, 2024, 7:24 p.m. UTC | #1
[+cc Thomas]

On Wed, May 29, 2024 at 06:27:27PM -0700, Joseph Jang wrote:
> Validate there are no duplicate ITS-MSI hwirqs from the
> /sys/kernel/irq/*/hwirq.
> 
> One example log show 2 duplicated MSI entries in the /proc/interrupts.
> 
> 150: 0 ... ITS-MSI 3355443200 Edge      pciehp
> 152: 0 ... ITS-MSI 3355443200 Edge      pciehp

I don't know how ITS-MSI works, so I don't know whether it's an error
that both entries mention 3355443200.

3355443200 == 0xc8000000, which looks like it could be an address or
address/data pair or something, and it does make sense to me that if
two devices write the same MSI address/data, it should result in the
same IRQ.

It seems like maybe this is a generic issue, i.e., if this is a
problem, maybe it would affect *other* kinds of MSI too, not just
ITS-MSI?

> Kernel patch ("PCI/MSI: Fix MSI hwirq truncation") [1] fix above issue.
> [1]: https://lore.kernel.org/all/20240115135649.708536-1-vidyas@nvidia.com/
> 
> Reviewed-by: Matthew R. Ochs <mochs@nvidia.com>
> Signed-off-by: Joseph Jang <jjang@nvidia.com>
> ---
>  tools/testing/selftests/drivers/irq/Makefile  |  5 +++++
>  .../selftests/drivers/irq/its-msi-irq-test.sh | 20 +++++++++++++++++++
>  2 files changed, 25 insertions(+)
>  create mode 100644 tools/testing/selftests/drivers/irq/Makefile
>  create mode 100755 tools/testing/selftests/drivers/irq/its-msi-irq-test.sh
> 
> diff --git a/tools/testing/selftests/drivers/irq/Makefile b/tools/testing/selftests/drivers/irq/Makefile
> new file mode 100644
> index 000000000000..569df5de22ee
> --- /dev/null
> +++ b/tools/testing/selftests/drivers/irq/Makefile
> @@ -0,0 +1,5 @@
> +# SPDX-License-Identifier: GPL-2.0
> +
> +TEST_PROGS := its-msi-irq-test.sh
> +
> +include ../../lib.mk
> diff --git a/tools/testing/selftests/drivers/irq/its-msi-irq-test.sh b/tools/testing/selftests/drivers/irq/its-msi-irq-test.sh
> new file mode 100755
> index 000000000000..87c88674903f
> --- /dev/null
> +++ b/tools/testing/selftests/drivers/irq/its-msi-irq-test.sh
> @@ -0,0 +1,20 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +
> +if [ -z "$(grep "ITS-MSI" /proc/interrupts)" ]; then
> +	echo "SKIP: no ITS-MSI irq."
> +	exit 4
> +fi
> +
> +# Get ITS-MSI hwirq list from /sys/kernel/irq/*/hwirq.
> +its_msi_irq_list=$(grep "ITS-MSI" /sys/kernel/irq/*/chip_name |

Is there a limit on the size of the "*" expansion here?

> +				   awk -F ':' '{print $1}' |
> +				   xargs -I {} sh -c 'cat $(dirname {})/hwirq' | sort -V)
> +
> +# Check whether could find duplicated its-msi hwirq or not.
> +if [ -n "$(echo "$its_msi_irq_list" | uniq -cd)" ]; then
> +	echo "ERROR: find duplicated its-msi hwirq."
> +	exit 1
> +fi
> +
> +exit 0
> -- 
> 2.34.1
>

Thomas Gleixner July 31, 2024, 8:42 p.m. UTC | #2
On Wed, Jul 31 2024 at 14:24, Bjorn Helgaas wrote:
> On Wed, May 29, 2024 at 06:27:27PM -0700, Joseph Jang wrote:
>> Validate there are no duplicate ITS-MSI hwirqs from the
>> /sys/kernel/irq/*/hwirq.
>> 
>> One example log show 2 duplicated MSI entries in the /proc/interrupts.
>> 
>> 150: 0 ... ITS-MSI 3355443200 Edge      pciehp
>> 152: 0 ... ITS-MSI 3355443200 Edge      pciehp
>
> I don't know how ITS-MSI works, so I don't know whether it's an error
> that both entries mention 3355443200.
>
> 3355443200 == 0xc8000000, which looks like it could be an address or
> address/data pair or something, and it does make sense to me that if
> two devices write the same MSI address/data, it should result in the
> same IRQ.

That was an issue with truncation which got fixed some time ago:

  https://lore.kernel.org/all/20240115135649.708536-1-vidyas@nvidia.com/
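
The arithmetic behind the collision can be sketched in a few lines of
shell. This is only an illustration under stated assumptions: that the
hwirq is formed by shifting the PCI domain number left by 27 bits, and
that the buggy path kept only the low 32 bits of the result. The domain
numbers 0x19 and 0x39 and both helper names are hypothetical.

```shell
#!/bin/bash
# Assumed layout (not taken from this thread): hwirq = domain_nr << 27,
# ORed with bus/devfn bits, which are left at zero here.

# Value that survives when only the low 32 bits are kept.
hwirq_truncated() {
	echo $(( ($1 << 27) & 0xFFFFFFFF ))
}

# Value when the full width is preserved.
hwirq_full() {
	echo $(( $1 << 27 ))
}

# 0x19 and 0x39 are illustrative domain numbers that differ only in bit 5.
for domain in 0x19 0x39; do
	printf 'domain %s: truncated=%s full=%s\n' "$domain" \
		"$(hwirq_truncated "$domain")" "$(hwirq_full "$domain")"
done
```

Under these assumptions both domains yield 3355443200 (0xc8000000) once
truncated, while the full-width values differ.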

> It seems like maybe this is a generic issue, i.e., if this is a
> problem, maybe it would affect *other* kinds of MSI too, not just
> ITS-MSI?

It's the same for ALL interrupts whether MSI or not.

The requirement is that for any interrupt chip all hardware interrupt
numbers related to a particular chip must be unique.

Adding an ITS-MSI specific parser is just wrong. It's a generic problem
and has absolutely nothing to do with ITS or MSI.

Aside of that the proposed parser does not even work anymore on 6.11
because we switched ARM[64] over to per device domains during the merge
window.

So if we want a selftest for the correctness of the hardware interrupt
numbers then it should grab the per interrupt sysfs entry 'chip_name'
and 'hwirq' pairs and do an analysis per 'chip_name' whether all
hardware interrupt numbers for a chip are unique.
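
A minimal sketch of such a generic check, assuming the per interrupt
directories under /sys/kernel/irq each provide readable 'chip_name' and
'hwirq' files. The function name check_unique_hwirqs and its optional
directory argument are hypothetical; the argument exists only so the
check can be pointed at a fake tree.

```shell
#!/bin/bash
# Sketch: report (chip_name, hwirq) pairs that occur more than once.
# check_unique_hwirqs is a hypothetical helper, not part of the patch.
check_unique_hwirqs() {
	local root="${1:-/sys/kernel/irq}"
	local dir chip hwirq dups

	dups=$(
		for dir in "$root"/*/; do
			[ -r "${dir}chip_name" ] && [ -r "${dir}hwirq" ] || continue
			chip=$(cat "${dir}chip_name")
			hwirq=$(cat "${dir}hwirq")
			# Skip entries where either attribute is empty.
			[ -n "$chip" ] && [ -n "$hwirq" ] || continue
			printf '%s %s\n' "$chip" "$hwirq"
		done | sort | uniq -d
	)

	if [ -n "$dups" ]; then
		echo "ERROR: duplicated chip_name/hwirq pairs:"
		echo "$dups"
		return 1
	fi
	echo "PASS: all chip_name/hwirq pairs are unique"
	return 0
}
```

Calling check_unique_hwirqs without arguments scans /sys/kernel/irq.
Iterating over the glob in a shell loop also avoids handing the whole
expansion to an external command, so the argument-size limit for exec
does not apply. Chip names that are shared between interrupt domains can
still produce false positives with this approach.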

Thanks,

        tglx

Thomas Gleixner July 31, 2024, 11:12 p.m. UTC | #3
On Wed, Jul 31 2024 at 22:42, Thomas Gleixner wrote:
> Aside of that the proposed parser does not even work anymore on 6.11
> because we switched ARM[64] over to per device domains during the merge
> window.
>
> So if we want a selftest for the correctness of the hardware interrupt
> numbers then it should grab the per interrupt sysfs entry 'chip_name'
> and 'hwirq' pairs and do an analysis per 'chip_name' whether all
> hardware interrupt numbers for a chip are unique.

I just hacked up a 20 lines snake script to analyze it and indeed that
produces duplicates because some interrupt chips do not have unique chip
names as they are shared between interrupt domains and the chip names
are constant.

There are several ways to handle this:

  1) Amend /sys/kernel/irq/$N/chip_name with the irq domain name

  2) Expose the irq domain name in /sys/kernel/irq/$N/domain_name

  3) Utilize the existing /sys/kernel/debug/irq/ mechanism

#1 Does change the output of chip_name, but that is a kernel internal
   detail anyway so there is no real UABI concern.

#2 has the advantage that it does not change the output of chip_name but
   it consumes more memory for a dubious value.

#3 has the downside that it requires CONFIG_GENERIC_IRQ_DEBUGFS=y and is
   root only, but that should be not a problem for testing. We have other
   selftests which have Kconfig dependencies and root requirements. The
   upside is that it does not require kernel changes.
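
A rough sketch of option #3, under assumptions about the debugfs output
format: each file in /sys/kernel/debug/irq/irqs/ is taken to contain
"domain:" and "hwirq:" lines, with the first of each describing the top
level domain of that interrupt. The function name check_debugfs_hwirqs
and its optional directory argument are hypothetical.

```shell
#!/bin/bash
# Sketch: report (domain, hwirq) pairs that occur more than once, using
# the first "domain:" and "hwirq:" line of each per-interrupt debugfs
# file. The file format is an assumption, as is the helper name.
check_debugfs_hwirqs() {
	local root="${1:-/sys/kernel/debug/irq/irqs}"
	local f dups

	dups=$(
		for f in "$root"/*; do
			[ -f "$f" ] && [ -r "$f" ] || continue
			awk '
				$1 == "domain:" && !have_domain { domain = $2; have_domain = 1 }
				$1 == "hwirq:" && !have_hwirq { hwirq = $2; have_hwirq = 1 }
				END {
					# Ignore interrupts without a named domain.
					if (have_domain && have_hwirq && domain != "")
						print domain, hwirq
				}
			' "$f"
		done | sort | uniq -d
	)

	if [ -n "$dups" ]; then
		echo "ERROR: duplicated domain/hwirq pairs:"
		echo "$dups"
		return 1
	fi
	echo "PASS: all domain/hwirq pairs are unique"
	return 0
}
```

On a real system this needs root and CONFIG_GENERIC_IRQ_DEBUGFS=y, and
a selftest would skip (exit 4, as in the patch) when the directory is
absent. Keying on the domain name rather than the chip name is what
sidesteps the shared chip name problem described above.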

No real strong opinion either way, but all of that is better than an ITS
specific parser which fails to work on the next kernel version.

Thanks,

        tglx

Patch

diff --git a/tools/testing/selftests/drivers/irq/Makefile b/tools/testing/selftests/drivers/irq/Makefile
new file mode 100644
index 000000000000..569df5de22ee
--- /dev/null
+++ b/tools/testing/selftests/drivers/irq/Makefile
@@ -0,0 +1,5 @@ 
+# SPDX-License-Identifier: GPL-2.0
+
+TEST_PROGS := its-msi-irq-test.sh
+
+include ../../lib.mk
diff --git a/tools/testing/selftests/drivers/irq/its-msi-irq-test.sh b/tools/testing/selftests/drivers/irq/its-msi-irq-test.sh
new file mode 100755
index 000000000000..87c88674903f
--- /dev/null
+++ b/tools/testing/selftests/drivers/irq/its-msi-irq-test.sh
@@ -0,0 +1,20 @@ 
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+if [ -z "$(grep "ITS-MSI" /proc/interrupts)" ]; then
+	echo "SKIP: no ITS-MSI irq."
+	exit 4
+fi
+
+# Get ITS-MSI hwirq list from /sys/kernel/irq/*/hwirq.
+its_msi_irq_list=$(grep "ITS-MSI" /sys/kernel/irq/*/chip_name |
+				   awk -F ':' '{print $1}' |
+				   xargs -I {} sh -c 'cat $(dirname {})/hwirq' | sort -V)
+
+# Check whether could find duplicated its-msi hwirq or not.
+if [ -n "$(echo "$its_msi_irq_list" | uniq -cd)" ]; then
+	echo "ERROR: find duplicated its-msi hwirq."
+	exit 1
+fi
+
+exit 0