
[0/8,SRU,OEM-5.17] System hang during S3 test

Message ID 20220921031924.2354693-1-acelan.kao@canonical.com

Message

AceLan Kao Sept. 21, 2022, 3:19 a.m. UTC
From: "Chia-Lin Kao (AceLan)" <acelan.kao@canonical.com>

BugLink: https://launchpad.net/bugs/1990330

[Impact]
The system hangs during S3 (suspend-to-RAM) testing on a platform with an
Intel(R) Pentium(R) Silver N6005 @ 2.00GHz CPU.

[Fix]
The issue cannot be reproduced with the v5.18-rc1 kernel, so I bisected the
kernel and found the following commit:
   567511462387 mm/memcg: protect memcg_stock with a local_lock_t
To be safe, I backported the whole patch series containing that commit:
   https://www.spinics.net/lists/cgroups/msg31595.html
However, the issue could still be reproduced after applying those patches on
top of the 5.17 OEM kernel. A second round of bisecting showed that the
following commit is also required:
   a74c6c00b1cb mm/memremap: avoid calling kasan_remove_zero_shadow() for device private memory
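The first round of bisecting searches for the commit that *fixes* the hang rather than one that introduces it. A minimal sketch of that search, assuming an ordered list of candidate commits (oldest first, the last one known good, e.g. v5.18-rc1) and a hypothetical `is_fixed(commit)` predicate standing in for "build, boot, run the S3 test". With git itself, `git bisect start --term-old=broken --term-new=fixed` runs the same kind of search.

```python
from typing import Callable, Sequence


def first_fixing_commit(commits: Sequence[str],
                        is_fixed: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_fixed() is True.

    Assumptions (illustrative only): commits are ordered oldest to newest,
    the last entry is known to be fixed, and once a commit is fixed every
    later commit stays fixed.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is fixed
    while lo < hi:
        mid = (lo + hi) // 2
        if is_fixed(commits[mid]):
            hi = mid       # the fix is at mid or earlier
        else:
            lo = mid + 1   # the fix landed after mid
    return commits[hi]
```

Each iteration halves the range, so roughly log2(N) test runs are needed for N candidate commits; that matters when every test is a full suspend/resume cycle.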

[Test]
Ran the S3 test 400 times on the target machine; the system kept working normally.
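The message does not say which tool drove the 400 iterations. A minimal sketch of such a loop, assuming util-linux `rtcwake -m mem -s <seconds>` is used to suspend and wake via the RTC alarm (this requires root, and `mem` only means S3 when `/sys/power/mem_sleep` is set to `deep`). The suspend step is injectable so the loop can be exercised without actually suspending.

```python
import subprocess
from typing import Callable


def rtcwake_suspend(seconds: int = 10) -> bool:
    """Suspend to RAM and wake after `seconds` via the RTC alarm (needs root)."""
    result = subprocess.run(["rtcwake", "-m", "mem", "-s", str(seconds)],
                            check=False)
    return result.returncode == 0


def run_s3_cycles(cycles: int,
                  suspend: Callable[[], bool] = rtcwake_suspend) -> int:
    """Run up to `cycles` suspend/resume iterations and return how many completed.

    Stops at the first iteration whose suspend call reports failure. A real
    hang never returns, so in practice the completed count should also be
    logged somewhere persistent after each cycle.
    """
    completed = 0
    for _ in range(cycles):
        if not suspend():
            break
        completed += 1
    return completed
```

For example, `run_s3_cycles(400)` would attempt 400 real suspend/resume cycles.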

[Where problems could occur]
The impact is hard to evaluate. However, memory usage stayed low during the
overnight test, which suggests there is no memory leak, and no follow-up fixes
for the applied commits could be found in the upstream or linux-next trees.
All of the patches are from v5.18-rc1, so they only need to be applied to the
oem-5.17 kernel; QA will test them thoroughly.
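One way to back the "memory usage stayed low" observation with numbers is to snapshot `/proc/meminfo` before and after the overnight run and compare. A minimal sketch, assuming "used" is approximated as `MemTotal - MemAvailable` (an illustrative choice, not what the original test necessarily measured); the function names are hypothetical.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: first numeric value}.

    Most fields are in kB; counters such as HugePages_Total have no unit.
    """
    fields: Dict[str, int] = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def used_kb(meminfo_text: str) -> int:
    """Approximate used memory as MemTotal - MemAvailable, in kB."""
    info = parse_meminfo(meminfo_text)
    return info["MemTotal"] - info["MemAvailable"]


def growth_kb(before: str, after: str) -> int:
    """Change in approximate used memory between two snapshots, in kB."""
    return used_kb(after) - used_kb(before)
```

A steadily growing value across many suspend/resume cycles would point at a leak; a flat one is consistent with the overnight result, though it does not prove the absence of a small leak.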

Johannes Weiner (1):
  mm/memcg: opencode the inner part of obj_cgroup_uncharge_pages() in
    drain_obj_stock()

Miaohe Lin (1):
  mm/memremap: avoid calling kasan_remove_zero_shadow() for device
    private memory

Michal Hocko (1):
  mm/memcg: revert ("mm/memcg: optimize user context object stock
    access")

Sebastian Andrzej Siewior (4):
  mm/memcg: disable threshold event handlers on PREEMPT_RT
  mm/memcg: protect per-CPU counter by disabling preemption on
    PREEMPT_RT where needed.
  mm/memcg: protect memcg_stock with a local_lock_t
  mm/memcg: disable migration instead of preemption in
    drain_all_stock().

Yosry Ahmed (1):
  memcg: add per-memcg total kernel memory stat

 .../admin-guide/cgroup-v1/memory.rst          |   2 +
 Documentation/admin-guide/cgroup-v2.rst       |   5 +
 include/linux/memcontrol.h                    |   1 +
 mm/memcontrol.c                               | 257 +++++++++++-------
 mm/memremap.c                                 |   3 +-
 5 files changed, 173 insertions(+), 95 deletions(-)

Comments

Timo Aaltonen Sept. 21, 2022, 10:40 a.m. UTC | #1
AceLan Kao wrote on 21.9.2022 at 6.19:

applied to oem-5.17, thanks