[SRU,J,v2,0/2] amd/iommu: Fix warnings on AMD systems after booting into kdump kernel

Message ID 20240913083225.715406-1-ghadi.rahme@canonical.com

Ghadi Elie Rahme Sept. 13, 2024, 8:32 a.m. UTC
BugLink: https://bugs.launchpad.net/bugs/2080378

[Impact]
On some AMD systems, booting into a kdump kernel shows a few IOMMU warnings during early boot. These warnings have not yet been observed to cause any issues, but there is an upstream fix for them. Currently only focal-HWE and jammy 5.15 are affected; newer kernel releases already have the fix. The stack traces look like the following:

    [ 9.125703] WARNING: CPU: 0 PID: 1 at drivers/iommu/amd/init.c:829 iommu_init_irq+0x2f2/0x3c0
    [ 9.134223] Modules linked in:
    [ 9.137283] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.15.0-107-generic #117~20.04.1-Ubuntu
    [ 9.145716] Hardware name: <hidden>
    [ 9.153111] RIP: 0010:iommu_init_irq+0x2f2/0x3c0
    [ 9.157729] Code: 90 ff 85 c0 0f 84 e8 fd ff ff be 01 00 00 00 44 89 ef 89 45 94 e8 2e dc 90 ff 4c 89 e7 e8 b6 cf 90 ff 8b 45 94 e9 6c fd ff ff <0f> 0b 31 c0 e9 63 fd ff ff 0f 0b 31 c0 e9 5a fd ff ff 31 c9 48 c7
    [ 9.176475] RSP: 0018:ffffa005000fbd00 EFLAGS: 00010202
    [ 9.181703] RAX: 0000000000000198 RBX: ffff9335af44a000 RCX: ffffa00500100000
    [ 9.188838] RDX: ffffa00500100000 RSI: ffff9335c05b9140 RDI: ffff9335c05b95c8
    [ 9.195970] RBP: ffffa005000fbd70 R08: ffffffffffffffff R09: 0000000000000000
    [ 9.203101] R10: ffffffe000000000 R11: 0000000000000025 R12: ffff9335c0468cc0
    [ 9.210231] R13: 000000000000001a R14: ffff9335b0151600 R15: 0000000000000006
    [ 9.217362] FS: 0000000000000000(0000) GS:ffff9336aec00000(0000) knlGS:0000000000000000
    [ 9.225446] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 9.231185] CR2: 0000000000000000 CR3: 000002006c810000 CR4: 0000000000350ef0
    [ 9.238318] Call Trace:
    [ 9.240763] <TASK>
    [ 9.242869] ? show_regs.cold+0x1a/0x1f
    [ 9.246710] ? iommu_init_irq+0x2f2/0x3c0
    [ 9.250722] ? __warn+0x8b/0xe0
    [ 9.253868] ? iommu_init_irq+0x2f2/0x3c0
    [ 9.257883] ? report_bug+0xd5/0x110
    [ 9.261461] ? handle_bug+0x39/0x90
    [ 9.264956] ? exc_invalid_op+0x19/0x70
    [ 9.268794] ? asm_exc_invalid_op+0x1b/0x20
    [ 9.272980] ? iommu_init_irq+0x2f2/0x3c0
    [ 9.276993] ? e820__memblock_setup+0x89/0x89
    [ 9.281353] state_next+0x3f5/0x6ba
    [ 9.284847] ? e820__memblock_setup+0x89/0x89
    [ 9.289206] iommu_go_to_state+0x28/0x31
    [ 9.293131] amd_iommu_init+0x15/0x4f
    [ 9.296797] ? e820__memblock_setup+0x89/0x89
    [ 9.301150] pci_iommu_init+0x1a/0x48
    [ 9.304817] do_one_initcall+0x48/0x1e0
    [ 9.308655] kernel_init_freeable+0x284/0x2f1
    [ 9.313016] ? rest_init+0x100/0x100
    [ 9.316593] kernel_init+0x1b/0x150
    [ 9.320078] ? rest_init+0x100/0x100
    [ 9.323658] ret_from_fork+0x22/0x30
    [ 9.327238] </TASK>
    [ 9.329431] ---[ end trace 6113ebe8cb8ce54f ]---

The commit that fixes the issue is:

* c5e1a1eb9279 ("iommu/amd: Simplify and Consolidate Virtual APIC (AVIC) Enablement")

However, a follow-up commit fixes compiler warnings introduced by the above commit:

* be280ea763f7 ("iommu/amd: Fix compile warning in init code")

[Test Plan]

1- On a machine using an AMD CPU and running the focal-hwe or jammy 5.15 kernel, make sure kdump is configured following these steps: https://ubuntu.com/server/docs/kernel-crash-dump

2- Trigger a kernel panic. This can be done using the command:

$ echo c > /proc/sysrq-trigger

3- When the machine boots into the kdump kernel, IOMMU warnings show up in dmesg during the early phases of the boot process.

4- After applying the two commits and repeating step 2, no IOMMU warnings should show up in dmesg.

[Fix]
Only the first commit in the list below is required to fix the bug. The second one is good to have because it avoids compilation warnings introduced by the first:

* c5e1a1eb9279 ("iommu/amd: Simplify and Consolidate Virtual APIC (AVIC) Enablement")
* be280ea763f7 ("iommu/amd: Fix compile warning in init code")

[Where problems could occur]

* The IOMMU could fail to initialize on AMD systems after applying these commits.

* There is a chance these commits do not fix the IOMMU warnings for all AMD system configurations.

Joerg Roedel (1):
  iommu/amd: Fix compile warning in init code

Suravee Suthikulpanit (1):
  iommu/amd: Simplify and Consolidate Virtual APIC (AVIC) Enablement

 drivers/iommu/amd/init.c | 95 +++++++++++++++++++++++++---------------
 1 file changed, 59 insertions(+), 36 deletions(-)

Comments

Mehmet Basaran Oct. 6, 2024, 12:28 p.m. UTC | #1
Patch [1/2] looks good.

Patch [2/2] differs from the upstream commit
be280ea763f7db492e0e30ba22873433aea0f468. In the upstream commit,
free_ga_log() is not placed entirely inside the CONFIG_IRQ_REMAP
guard. When I followed the callers of free_ga_log(), I found code
paths that are not guarded by CONFIG_IRQ_REMAP. So if CONFIG_IRQ_REMAP
is not defined, this module will not compile. The call chain, from
callee up through its callers, is:

free_ga_log
 free_iommu_one
  free_iommu_all
   free_iommu_resources
    state_next
     iommu_go_to_state
      amd_iommu_init
       amd_iommu_detect
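
To illustrate the concern, here is a minimal standalone sketch. It is not
the code from init.c. SKETCH_IRQ_REMAP, struct iommu_sketch and the
*_sketch functions are hypothetical stand-ins for CONFIG_IRQ_REMAP and
the real functions. It only shows why a function with unguarded callers
should keep its definition outside the #ifdef and guard just its body:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for CONFIG_IRQ_REMAP; remove to model "not set". */
#define SKETCH_IRQ_REMAP 1

/* Hypothetical stand-in for the per-IOMMU state. */
struct iommu_sketch {
	void *ga_log;	/* only used when SKETCH_IRQ_REMAP is defined */
};

/*
 * The definition always exists; only the body is guarded. The function
 * therefore compiles to a no-op when the option is off, and callers
 * need no #ifdef of their own.
 */
void free_ga_log_sketch(struct iommu_sketch *iommu)
{
	assert(iommu != NULL);
#ifdef SKETCH_IRQ_REMAP
	free(iommu->ga_log);
	iommu->ga_log = NULL;
#endif
}

/*
 * Unguarded caller, standing in for free_iommu_one(). If the whole
 * definition above were moved inside "#ifdef SKETCH_IRQ_REMAP", this
 * call would reference an undeclared function whenever the option is
 * off, and the build would fail.
 */
void free_iommu_one_sketch(struct iommu_sketch *iommu)
{
	free_ga_log_sketch(iommu);
}
```

Building the sketch with and without the SKETCH_IRQ_REMAP define
succeeds in both cases. Moving the whole of free_ga_log_sketch() inside
the guard makes the second build fail.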


Nacked-by: Mehmet Basaran <mehmet.basaran@canonical.com>