[0/1,SRU,Noble] Fix amdgpu hangs on DCN 3.5 at bootup

Message ID 20240620084042.515243-1-vicamo.yang@canonical.com
Series Fix amdgpu hangs on DCN 3.5 at bootup

Message

You-Sheng Yang June 20, 2024, 8:40 a.m. UTC
BugLink: https://bugs.launchpad.net/bugs/2066233

[Impact]

Newer VBIOS releases for DCN 3.5 bumped the version of the IntegratedInfo table
from 2.2 to 2.3. The new version uses the same structure, but version 2.3 is
missing from the construct_integrated_info() parser, which leads to a NULL
pointer dereference.

```
Call Trace:
<TASK>
? show_regs+0x72/0x90
? __die+0x25/0x80
? page_fault_oops+0x154/0x4c0
? ttm_bo_kmap+0x11d/0x310 [ttm]
? dma_resv_wait_timeout+0x48/0xe0
? do_user_addr_fault+0x30e/0x6e0
? exc_page_fault+0x84/0x1b0
? asm_exc_page_fault+0x27/0x30
? dcn35_clk_mgr_construct+0x183/0x2210 [amdgpu]
? dcn35_clk_mgr_construct+0x15a/0x2210 [amdgpu]
? dcn35_hwseq_create+0x23/0x470 [amdgpu]
```

[Fix]

The fix landed upstream in v6.9-rc7: 9a35d205f466 ("drm/amd/display: Atom
Integrated System Info v2_2 for DCN35")
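
For readers unfamiliar with this kind of failure, here is a minimal sketch of
how a missing version case in a table parser can end in a NULL result, and how
a single added case label addresses it. This is not the code in
bios_parser2.c: every type and function name below is a hypothetical stand-in,
and the only assumption taken from this cover letter is that 2.3 shares the
2.2 layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the real amdgpu types. */
struct table_revision {
	unsigned int major;
	unsigned int minor;
};

struct integrated_info {
	bool populated;
};

/* Pretend parser for the layout shared by table versions 2.2 and 2.3. */
static bool parse_v2_2_layout(struct integrated_info *info)
{
	info->populated = true;
	return true;
}

/* Before the fix: minor revision 3 is not recognised, so NULL is returned. */
struct integrated_info *construct_before_fix(struct table_revision rev,
					     struct integrated_info *storage)
{
	bool ok = false;

	if (rev.major == 2) {
		switch (rev.minor) {
		case 2:
			ok = parse_v2_2_layout(storage);
			break;
		default:
			break;
		}
	}
	return ok ? storage : NULL;
}

/* After the fix: one added case label routes 2.3 to the 2.2 parser. */
struct integrated_info *construct_after_fix(struct table_revision rev,
					    struct integrated_info *storage)
{
	bool ok = false;

	if (rev.major == 2) {
		switch (rev.minor) {
		case 2:
		case 3: /* assumed: same structure as 2.2 */
			ok = parse_v2_2_layout(storage);
			break;
		default:
			break;
		}
	}
	return ok ? storage : NULL;
}

/* Usage example: a caller that uses the result unchecked would hit the NULL. */
void demo_check(void)
{
	struct integrated_info a = { false }, b = { false };
	struct table_revision v2_3 = { 2, 3 };

	assert(construct_before_fix(v2_3, &a) == NULL);
	assert(construct_after_fix(v2_3, &b) != NULL && b.populated);
}
```

In this sketch the change is one added line, which is consistent with the
one-insertion diffstat of this series.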

[Test Case]

With the fix applied, AMDGPU should initialize successfully at boot, without
the NULL pointer dereference dump.

[Where problems could occur]

None expected. This only adds a new hardware revision that uses the same data.

[Other Info]

Although this landed in v6.9-rc7, every older kernel version that is planned
to support the new VBIOS update should also be fixed. So far, linux/noble and
linux-oem-6.8/noble have been nominated by the chip vendor.

Gabe Teeger (1):
  drm/amd/display: Atom Integrated System Info v2_2 for DCN35

 drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Manuel Diewald June 20, 2024, 9:05 a.m. UTC | #1
On Thu, Jun 20, 2024 at 04:40:41PM +0800, You-Sheng Yang wrote:
> BugLink: https://bugs.launchpad.net/bugs/2066233

Acked-by: Manuel Diewald <manuel.diewald@canonical.com>
Paolo Pisati June 20, 2024, 1:57 p.m. UTC | #2
On Thu, Jun 20, 2024 at 04:40:41PM +0800, You-Sheng Yang wrote:
> BugLink: https://bugs.launchpad.net/bugs/2066233

Clean cherry-pick.

Acked-by: Paolo Pisati <paolo.pisati@canonical.com>
Stefan Bader June 21, 2024, 2:10 p.m. UTC | #3
On 20.06.24 10:40, You-Sheng Yang wrote:
> BugLink: https://bugs.launchpad.net/bugs/2066233

Applied to noble:linux/master-next. Thanks.

-Stefan